Data Analysis & Analytics

All LGO students master data analytics skills. The summer curriculum incorporates analytics coursework using Python and R, and many electives popular with LGOs have a data analysis focus. Some students choose an analytics internship, in which a large component of the research work is devoted to large data sets and complex analysis.

Many internship projects that LGOs complete have a significant data analysis component. Students use data to make decision recommendations on supply chains, new products, optimized systems, and more. But some projects go above and beyond using data. These projects create analytics tools and predictive frameworks to make statistically sound business decisions.

Predicting Surgical Inpatient Discharges

Jonathan Zanger (LGO ’18)

Company: Massachusetts General Hospital (MGH)
Location: Boston, MA

Problem: At MGH and hospitals nationwide, managing capacity of beds on surgical inpatient floors is vital to avoid overcrowding and maintain patient care. The current process for managing bed capacity is based on historical averages of discharge rates and staff experience, and still results in frequent capacity problems. Jonathan and the MGH team saw an opportunity to create a predictive model of patients likely to be discharged within 24 hours and focus the staff on any barriers to these patients’ discharge.

Predicting Surgical Patients’ Discharges at Massachusetts General Hospital
Predicting Surgical Patients’ Discharges using Machine Learning techniques

Approach: Jonathan started by defining milestones towards post-operative recovery based on groupings of surgery types. Each surgical recovery has a “checklist” of events, such as stabilized temperature or cessation of IV medications, that have to occur before discharge. Jonathan aggregated these checklists and included lab results and medications as factors that could help gauge a patient’s readiness to leave the hospital.

Working with the MGH team, Jonathan also identified barriers to discharge, which could include administrative issues (a physical therapy consultation that was ordered but not completed), clinical indicators requiring further care in the hospital, or situations in the patient’s home that would be a challenge for recovery. Jonathan was able to use text analytics to capture indicators from medical staff notes in patient records.

To integrate all of these data inputs, Jonathan developed a prediction algorithm based on a neural network and other machine learning techniques. He used historical data for surgical patient admissions and discharges in the prior year to train and test the algorithm. Jonathan compared the predicted discharges with records of actual patient discharges.
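One common way to summarize a comparison between predicted and actual discharges is precision at k: the share of the k highest-ranked patients who were in fact discharged. The sketch below illustrates that calculation only. The function name, scores, and outcomes are invented for illustration and are not MGH data or Jonathan's code.

```python
def precision_at_k(scores, discharged, k):
    """Fraction of the k highest-scored patients who were actually discharged.

    scores     -- predicted discharge scores, one per patient (hypothetical)
    discharged -- 1 if the patient was discharged in the window, else 0
    """
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of patients")
    # Rank patient indices by predicted score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(discharged[i] for i in top_k) / k


# Made-up example values, for illustration only.
scores = [0.91, 0.15, 0.78, 0.66, 0.42, 0.88]
discharged = [1, 0, 1, 0, 1, 1]

top3 = precision_at_k(scores, discharged, k=3)
top4 = precision_at_k(scores, discharged, k=4)
print(top3, top4)
```

A metric like this rewards the model for ranking the right patients first, which matches how a short daily list of likely discharges would be used on the floor.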

Impact: The algorithm was remarkably accurate: of the 10 patients that Jonathan’s model ranked most likely to be discharged, over 97% were discharged within 48 hours. Jonathan created a discharge prediction web tool that was implemented during his internship at MGH, allowing clinical staff to identify potential barriers to the discharge of patients and avoid unnecessary days in the hospital. Because MGH uses the same electronic records system as 75% of US hospitals, Jonathan’s pioneering model can potentially address capacity problems nationwide. Hailed as being in a “league of his own,” Jonathan won the 2018 LGO Best Thesis Award for this work.

Integrated Supply and Production Network Design

Renata Bakousseva (LGO ’16)

Location: Akron, OH

Problem: The tire industry needs to deliver a large portfolio of products to customers in a timely and cost-efficient manner. Renata’s host company wanted to improve customer service and deliver new growth opportunities while eliminating waste. The company optimized its manufacturing system for high-volume products with low demand variation signals. The system is used for all products regardless of demand characteristics. This results in higher holding cost, stale inventory, lost sales, and higher total delivered cost. The company wanted to develop a more responsive production system to reduce strain on the supply network, reduce total delivered cost, and improve product fulfillment.

Approach: During her analytics internship, Renata analyzed a portfolio of low-volume products and found a relationship between lot size and production cost in both of the explored manufacturing systems.

The model shows that greater lot sizes reduce manufacturing cost but also increase inventory cost, resulting in a higher total delivered cost.

Impact: Previously, the company’s operations used only manufacturing cost to decide how to deliver the product. Renata showed the impact of inventory cost as part of the total delivered cost. She also showed that lot size correlated with production cost. This result is important because it refutes the preconceived notion that manufacturing cost is fixed regardless of lot size.
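The tension between manufacturing cost and inventory cost can be illustrated with the textbook economic order quantity (EOQ) model, in which a fixed setup cost per production run is spread over the lot while holding cost grows with average inventory. The sketch below is a generic illustration, not Renata's model. The demand, setup cost, and holding cost figures are invented.

```python
import math


def annual_cost(lot_size, annual_demand, setup_cost, holding_cost):
    """Annual setup cost plus annual holding cost for a given lot size."""
    setup = (annual_demand / lot_size) * setup_cost  # runs per year x cost per run
    holding = (lot_size / 2) * holding_cost          # average inventory x cost per unit-year
    return setup + holding


def economic_order_quantity(annual_demand, setup_cost, holding_cost):
    """Lot size that minimizes annual_cost under the classic EOQ assumptions."""
    return math.sqrt(2 * annual_demand * setup_cost / holding_cost)


# Illustrative values only (assumptions, not company data).
D, S, H = 1200, 150.0, 4.0
q_star = economic_order_quantity(D, S, H)  # sqrt(90000) = 300.0

for q in (150, 300, 600):
    print(q, annual_cost(q, D, S, H))
```

With these numbers, lots smaller than the optimum pay more in setups and lots larger than the optimum pay more in holding cost, so a decision based on manufacturing cost alone would favor lots that are too large.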

Renata also analyzed various algorithms that optimize the product lot size and job scheduling. She used the economic order quantity (EOQ) model and a mixed integer program to assess lot size dynamics. She also tested a few bin-packing heuristics, which showed how much time the company could save by scheduling automatically and supported the case for automating the scheduling process.
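The source does not say which bin-packing heuristics were tested. As one well-known example, the sketch below shows first-fit decreasing, applied here to a hypothetical setting where jobs with integer durations are packed into shifts of fixed length. The job durations and shift length are invented.

```python
def first_fit_decreasing(job_sizes, capacity):
    """Pack jobs into as few bins as the first-fit decreasing heuristic finds.

    Jobs are sorted largest first; each job goes into the first bin with
    enough remaining capacity, and a new bin is opened when none fits.
    """
    bins = []   # jobs assigned to each bin
    loads = []  # current total load of each bin
    for size in sorted(job_sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"job of size {size} exceeds bin capacity {capacity}")
        for i, load in enumerate(loads):
            if load + size <= capacity:
                bins[i].append(size)
                loads[i] += size
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
            loads.append(size)
    return bins


# Hypothetical job durations (hours) and shift length.
jobs = [4, 8, 1, 4, 2, 1]
schedule = first_fit_decreasing(jobs, capacity=10)
print(schedule)
```

A heuristic like this runs in a fraction of a second and gives a good, though not always optimal, packing, which is why it is a natural baseline before turning to an exact method such as a mixed integer program.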

Automating Decisions Using Machine Learning

Shai Ben Nun (LGO ’17)

Company: Amazon
Location: Seattle, WA

Problem: Amazon would like to automate as many processes as possible, but that can mean automating processes that make important decisions for a business unit. Shai set out to automate such decisions by learning from past ones, using hundreds of thousands of records of employees’ work, data that had not previously been used. Machine learning (ML) techniques were a natural fit for this analytics internship.

Approach: Shai defined a framework to identify opportunities and develop solutions. He used the “Stow Problem” as an example for many potential applications. Stow (storing things) is the most complex decision process first-tier associates have at Amazon. If an automated machine can succeed in stowing items, there is a high likelihood that Amazon can automate other processes.

Shai focused on the first piece of the puzzle needed to automate stow: predicting whether an item can be put into a bin. This decision combines physical factors such as the product’s dimensions, the product’s material, and the bin’s empty space. It also combines stow guidelines (such as the direction an item should be placed in the bin). The recorded work from employees’ stow tasks helped to analyze factors for which no direct data is available, such as material.
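Predicting whether an item fits in a bin is a binary classification problem. The source does not name the model Shai used, so the sketch below substitutes a plain logistic regression trained by gradient descent, with a single hypothetical feature: the ratio of item volume to free bin volume. The training data is invented, and a real model would use many more features.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(stow succeeds) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def stow_probability(fill_ratio, w, b):
    """Predicted probability that an item with this fill ratio can be stowed."""
    return sigmoid(w * fill_ratio + b)


# Invented past stow attempts: item volume / free bin volume, and the outcome
# (1 = associate stowed the item, 0 = item did not go into the bin).
fill_ratios = [0.1, 0.2, 0.3, 0.4, 0.5, 0.9, 1.0, 1.1, 1.3, 1.5]
stowed = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

w, b = train_logistic(fill_ratios, stowed)
print(stow_probability(0.2, w, b), stow_probability(1.2, w, b))
```

The fitted weight is negative, so the predicted probability falls as the item takes up more of the free space. Factors such as material, which are not recorded directly, would have to be inferred from patterns in the associates' past decisions.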

Shai’s model could be used as a performance management tool.

Impact: Once Amazon can predict which bins are free, it can suggest that information to the associate to help make their job easier. Another important application is constructive feedback. Since Amazon can predict the average outcome, the model can track each associate’s decisions and compare them to the average. This can help new associates learn their job faster. The model will produce an informative report that can allow both managers and associates to see setbacks and make improvements.

Cross-Channel Predictive Analytics for Retail Distribution Decisions

JB Coles (LGO ’17)

Company: Zara/INDITEX
Location: A Coruña, Spain

Problem: Zara launched an online retail store in late 2010 to meet fast fashion demand in the e-commerce space. In addition to an increased customer base, an e-commerce sales channel can let Zara gain unique insight into customer engagement and preferences using website data. This insight can help the company better anticipate customer needs and serve them even more rapidly. JB’s project investigated potential operational improvements in both the physical and e-commerce businesses using data collected by

Approach: First, JB did some analysis to understand the currently available data, identify data sets of interest, and develop a project road map using identified data sets. He reviewed the available data, associated entity relationship diagrams (ERD), and web analytics practices to understand current data capabilities. JB found a number of areas where cross-channel data could be used in future work. These focus areas form the foundation of the road map for future predictive analytics work with

JB’s analysis showed ways to lower inventory while still maintaining customer service.

In the second phase, JB improved the accuracy of demand prediction across both sales channels. He created a demand forecast model using the identified data sets. JB transformed raw data sets into meaningful features that could train initial forecasting models. Finally, JB tested his models against unseen data and demonstrated a measurable performance improvement in forecast accuracy relative to existing models.
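Testing a forecast against unseen data usually means holding out the most recent periods and comparing each model's error on them. The sketch below shows that comparison for two simple stand-in models, a naive last-value forecast and a three-period moving average. Neither is JB's model, and the demand series is invented.

```python
def mean_absolute_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def naive_forecasts(series, start):
    """One-step-ahead forecasts: each period is predicted by the previous value."""
    return [series[t - 1] for t in range(start, len(series))]


def moving_average_forecasts(series, start, window=3):
    """One-step-ahead forecasts: each period is predicted by the mean of the
    preceding `window` values (requires start >= window)."""
    return [sum(series[t - window:t]) / window for t in range(start, len(series))]


# Invented weekly demand; the last two weeks are held out as unseen data.
weekly_demand = [10, 12, 11, 13, 12, 14, 13, 15]
holdout_start = 6
actual = weekly_demand[holdout_start:]

baseline_mae = mean_absolute_error(actual, naive_forecasts(weekly_demand, holdout_start))
candidate_mae = mean_absolute_error(
    actual, moving_average_forecasts(weekly_demand, holdout_start)
)
print(baseline_mae, candidate_mae)
```

Because every forecast uses only values from earlier periods, the comparison reflects how each model would have performed had it been in use at the time.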

Impact: Zara can use this model to reduce required inventory while maintaining high service levels. This initial investigation can continue with future LGO internships to explore where e-commerce data can lead to operational improvements.

A Predictive Model for Power Pole Testing

Boyan Kelchev (LGO ’17)

Company: Pacific Gas & Electric (PG&E)
Location: San Francisco, CA

Problem: Pacific Gas & Electric Company’s distribution system includes approximately 2.4 million wood utility poles. The Pole Test & Treat (PTT) program inspects those poles and tries to use chemical treatments or structural reinforcements to prolong the poles’ service life while identifying when they need replacement. PTT inspects poles every 10 years. PG&E asked Boyan to improve PTT’s mission by using extensive data collected since the program began in the mid-1990s. With this analytics internship, PG&E wanted to use modern statistical methods to better understand and predict decay in their wood poles.

Boyan used predictive modeling to help PG&E plan for the future.

Approach: First, Boyan used available data to understand how various variables generally behaved. He ran hypothesis tests to find the main reasons a pole was rejected during inspections. He then developed a model that estimated the overall rejection rates of different kinds of poles. The result was a prediction with a mean absolute percentage error of about 30%. Finally, he used the model’s results to simulate how often PG&E will reject poles in the future.
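Mean absolute percentage error (MAPE) averages the size of each prediction error relative to the actual value. The sketch below shows the calculation on invented rejection rates for three hypothetical groups of poles. The numbers are not PG&E data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case raises an error.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Invented rejection rates for three hypothetical pole groups.
actual_rates = [0.10, 0.20, 0.05]
predicted_rates = [0.13, 0.14, 0.06]

error_pct = mape(actual_rates, predicted_rates)
print(round(error_pct, 1))
```

Because the error is relative, a miss of one percentage point counts for more on a group with a low rejection rate than on a group with a high one.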

Impact: Boyan’s model helps PG&E to better budget and plan for future work. The simulation highlighted a well-known problem in the utility industry: aging infrastructure. The relatively low average age of poles and the low replacement rates observed in the past few inspection cycles mean that PG&E will likely experience a drastic increase in rejection rates as the average age of its pole population grows. Planning for the related increase in manpower and work hours will be of great importance to PG&E in the next few decades.