Data Analysis & Analytics

All LGO students master data analytics skills. The summer curriculum incorporates analytics coursework using Python and R, and many electives popular with LGOs have a data analysis focus. Some students choose an analytics internship, in which a large component of the research work is devoted to large data sets and complex analysis.

Many internship projects that LGOs complete have a significant data analysis component. Students use data to make decision recommendations on supply chains, new products, optimized systems, and more. But some projects go above and beyond using data. These projects create analytics tools and predictive frameworks to make statistically sound business decisions.

Improving Care Systems and Processes to Reduce Heart Failure Admissions

Mariam Al-Meer (LGO ’17)

Company: Massachusetts General Hospital (MGH)
Location: Boston, MA

Problem: Heart failure (HF) is a complex chronic condition, representing a considerable burden for hospitals nationwide. During her internship, Mariam looked at ways to improve how HF patients received care. She wanted to identify system-level operational recommendations, with a focus on improving outpatient monitoring. The hospital hoped that this would translate into fewer HF patients coming to the hospital.

Mariam’s final analysis from her project at MGH.

Approach: First, Mariam completed an extensive map of the HF care pathway. She then carried out a detailed retrospective analysis of admitted HF patients’ medical records to investigate where their admissions originated and their general behavior. As a result, she showed that the majority of HF admissions originated from the Emergency Department (ED) on weekdays between the hours of 9am and 6pm.

Mariam could then compare this data with the patients’ primary care physician and cardiologist appointments. About 57% of hospitalized HF patients had no scheduled follow-up appointments in the two weeks prior to their admissions. Similarly, around 43% had no scheduled appointments in the eight weeks after leaving the hospital. These two time periods are critical for acute HF decompensation.

Targeted outpatient care needed to be a priority. Mariam proposed predictive models to identify patients at greatest risk of a first hospital admission following encounters with their primary care providers and/or cardiologists in any given year. She performed logit-linear regressions on multiple prior first admissions and identified several significant predictors, including clinical risk factors, socio-demographic features, and medication histories.
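The idea of turning a regression into a risk score can be sketched in a few lines. The example below is only an illustration, not Mariam's model: it fits a plain logistic regression (used here as a stand-in for the logit-linear regressions mentioned above) by gradient descent on synthetic data. The two features (a scaled clinical risk index and a binary socio-demographic flag), the coefficients in TRUE_W, and all function names are assumptions made up for the example.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=600):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def risk_score(w, x):
    """Predicted probability of a first admission for feature vector x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Synthetic patients (hypothetical, NOT MGH data).
# Feature 0: clinical risk index scaled to [0, 1]; feature 1: binary flag.
random.seed(0)
TRUE_W = [-3.0, 3.0, 2.0]  # invented coefficients: bias, feature 0, feature 1
X, y = [], []
for _ in range(300):
    x = [random.random(), float(random.random() < 0.3)]
    p = sigmoid(TRUE_W[0] + TRUE_W[1] * x[0] + TRUE_W[2] * x[1])
    X.append(x)
    y.append(1 if random.random() < p else 0)

weights = fit_logistic(X, y)
low = risk_score(weights, [0.1, 0.0])   # low index, flag absent
high = risk_score(weights, [0.9, 1.0])  # high index, flag present
print(f"low-risk score: {low:.2f}, high-risk score: {high:.2f}")
```

Sorting patients by such a score is one way a provider could decide who receives limited outpatient resources first. A real study would also need held-out validation and checks on the significance of each predictor.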

Impact: This project highlighted the need for improved outpatient monitoring strategies to prevent HF decompensation and hospitalizations. The final model developed a risk scoring mechanism for HF patients (based on their likelihood of a first admission), which health care providers could use to manage limited outpatient resources. Her study showed that non-physiologic features (e.g., marital status, ability to speak English, estimated patient income) also predicted HF hospitalization. Through these features specifically, the project also identified room for improvement: the hospital can assign remote monitoring space based interventional resources for MGH HF patients.

Integrated Supply and Production Network Design

Renata Bakousseva (LGO ’16)

Location: Akron, OH

Problem: The tire industry needs to deliver a large portfolio of products to customers in a timely and cost-efficient manner. Renata’s host company wanted to improve customer service and deliver new growth opportunities while eliminating waste. The company had optimized its manufacturing system for high-volume products with low demand variation, but it used that system for all products regardless of demand characteristics. This resulted in higher holding cost, stale inventory, lost sales, and higher total delivered cost. The company wanted to develop a more responsive production system to reduce strain on the supply network, reduce total delivered cost, and improve product fulfillment.

Approach: During her analytics internship, Renata analyzed a portfolio of low-volume products and found a relationship between lot size and production cost in both of the explored manufacturing systems.

The model shows that greater lot sizes reduce manufacturing cost but also increase inventory cost, resulting in a higher total delivered cost.

Impact: Previously, the company’s operations only used manufacturing cost to decide how to deliver the product. Renata showed the impact of inventory cost as part of the total delivered cost. She also showed that lot size correlated with production cost. This result is important because it refutes the preconceived notion that manufacturing cost is fixed regardless of lot size.
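The trade-off between per-lot manufacturing cost and inventory cost can be illustrated with the textbook economic order quantity (EOQ) model. This is a generic sketch, not the company's cost model: the demand, setup cost, and holding cost figures are invented, and the function names are hypothetical.

```python
import math


def total_annual_cost(lot_size, demand, setup_cost, holding_cost):
    """Yearly setup cost plus yearly cost of holding the average inventory."""
    setups_per_year = demand / lot_size
    average_inventory = lot_size / 2
    return setups_per_year * setup_cost + average_inventory * holding_cost


def eoq(demand, setup_cost, holding_cost):
    """Lot size that minimizes total_annual_cost in the classic EOQ model."""
    return math.sqrt(2 * demand * setup_cost / holding_cost)


# Invented figures for one low-volume product.
demand = 1200        # units per year
setup_cost = 150.0   # cost of one production changeover
holding_cost = 4.0   # cost of holding one unit for one year

q_star = eoq(demand, setup_cost, holding_cost)  # 300.0
for lot in (100, q_star, 900):
    cost = total_annual_cost(lot, demand, setup_cost, holding_cost)
    print(f"lot size {lot:6.0f}: total cost {cost:7.1f}")
```

With these numbers, both a small lot (100) and a large lot (900) cost more than the EOQ lot of 300. Small lots pay for frequent setups, while large lots pay for inventory.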

Renata also analyzed various algorithms that optimize the product lot size and job scheduling. She used EOQ and a Mixed Integer Program to assess lot size dynamics. She also tested a few bin-packing algorithm heuristics, which showed how much time the company could save by scheduling mechanically, therefore proving the need to automate the scheduling process.
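As an illustration of what a bin-packing heuristic does in a scheduling setting, the sketch below uses first-fit decreasing, one common heuristic; the source does not say which heuristics Renata tested. Jobs are treated as items, and each machine shift is a bin with a fixed number of hours. The job durations and shift length are invented.

```python
def first_fit_decreasing(durations, capacity):
    """Assign jobs to as few shifts as the heuristic finds.

    Jobs are sorted longest first, and each one goes into the first
    shift that still has room; a new shift is opened when none does.
    """
    shifts = []  # each shift is a list of job durations
    loads = []   # hours already used in each shift
    for d in sorted(durations, reverse=True):
        if d > capacity:
            raise ValueError(f"job of {d}h does not fit in a {capacity}h shift")
        for i, load in enumerate(loads):
            if load + d <= capacity:
                shifts[i].append(d)
                loads[i] += d
                break
        else:  # no existing shift had room
            shifts.append([d])
            loads.append(d)
    return shifts


jobs = [4, 8, 1, 4, 2, 1, 7, 3]  # invented job durations in hours
schedule = first_fit_decreasing(jobs, capacity=10)
print(schedule)  # [[8, 2], [7, 3], [4, 4, 1, 1]]
```

Here the 30 hours of work fit into three 10-hour shifts, which is the minimum possible. In general the heuristic does not guarantee an optimal packing, which is one reason to compare it against an exact method such as a mixed integer program.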

Automating Decisions Using Machine Learning

Shai Ben Nun (LGO ’17)

Company: Amazon
Location: Seattle, WA

Problem: Amazon would like to automate as many processes as possible. But that could involve automating processes that make important decisions for a business unit. Shai attempted to use past decisions to automate processes, drawing on hundreds of thousands of records of employees’ work that had not previously been utilized. Machine learning (ML) techniques were a natural fit for this analytics internship.

Approach: Shai defined a framework to identify opportunities and develop solutions. He used the “Stow Problem” as an example for many potential applications. Stow (storing things) is the most complex decision process that first-tier associates have at Amazon. If an automated machine can succeed in stowing items, there is a high likelihood that Amazon can automate other processes.

Shai focused on the first piece of the puzzle needed to automate stow: predicting whether an item can be put into a bin. This decision combines physical factors such as the product’s dimensions, the product’s material, and the bin’s empty space. It also combines stow guidelines (such as the direction an item should be placed in the bin). The recorded work from employees’ stow tasks helped to analyze factors that are not directly accessible, such as material.

Shai’s model could be used as a performance management tool.

Impact: Once Amazon can predict which bins are free, it can pass that information to the associate to make the job easier. Another important application is constructive feedback. Since Amazon can predict the average outcome, the model can track each associate’s decisions and compare them to the average. This can help new associates learn their job faster. The model will produce an informative report that can allow both managers and associates to see setbacks and make improvements.

Cross-Channel Predictive Analytics for Retail Distribution Decisions

JB Coles (LGO ’17)

Company: Zara/INDITEX
Location: A Coruña, Spain

Problem: Zara launched an online retail store in late 2010 to meet fast fashion demand in the e-commerce space. In addition to an increased customer base, an e-commerce sales channel can let Zara gain unique insight into customer engagement and preferences using website data. This insight can help the company better anticipate customer needs and serve them even more rapidly. JB’s project investigated potential operational improvements in both the physical and e-commerce businesses using data collected by Zara.com.

Approach: First, JB did some analysis to understand the currently available data, identify data sets of interest, and develop a project road map using identified data sets. He reviewed the available data, associated entity relationship diagrams (ERD), and web analytics practices to understand current data capabilities. JB found a number of areas where cross-channel data could be used in future work. These focus areas form the foundation of the road map for future predictive analytics work with Zara.com.

JB’s analysis showed ways to lower inventory while still maintaining customer service.

In the second phase, JB improved the accuracy of demand prediction across both sales channels. He created a demand forecast model using the identified data sets. JB transformed raw data sets into meaningful features that could train initial forecasting models. Finally, JB tested his models against unseen data and demonstrated a measurable performance improvement in forecast accuracy relative to existing models.
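Testing a forecast against unseen data usually means fitting on earlier periods and scoring on later ones. The sketch below shows that workflow under stated assumptions: the weekly `web_views` and `sales` series are invented, the single online feature is a hypothetical stand-in for the cross-channel features JB built, and the baseline (the historical mean) is not Zara's existing model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented weekly data: online page views and units sold for one product.
web_views = [10, 12, 15, 11, 18, 20, 14, 16, 22, 19, 13, 17]
sales = [26, 28, 35, 29, 39, 46, 33, 36, 50, 43, 30, 41]

# Hold out the most recent four weeks; the model never sees them in training.
split = 8
train_x, test_x = web_views[:split], web_views[split:]
train_y, test_y = sales[:split], sales[split:]

# Baseline: always predict the average of past sales.
baseline_pred = [sum(train_y) / len(train_y)] * len(test_y)

# Candidate: linear model that uses the online signal as a feature.
intercept, slope = fit_line(train_x, train_y)
model_pred = [intercept + slope * x for x in test_x]

baseline_mae = mean_absolute_error(test_y, baseline_pred)
model_mae = mean_absolute_error(test_y, model_pred)
print(f"baseline MAE: {baseline_mae:.2f}, model MAE: {model_mae:.2f}")
```

Splitting by time rather than at random keeps future information out of the training set, which matters for any forecast that will be used on weeks that have not happened yet.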

Impact: Zara can use this model to reduce required inventory while maintaining high service levels. This initial investigation can continue with future LGO internships to explore where e-commerce data can lead to operational improvements.

A Predictive Model for Power Pole Testing

Boyan Kelchev (LGO ’17)

Company: Pacific Gas & Electric (PG&E)
Location: San Francisco, CA

Problem: Pacific Gas & Electric Company’s distribution system includes approximately 2.4 million wood utility poles. The Pole Test & Treat (PTT) program inspects those poles and tries to use chemical treatments or structural reinforcements to prolong the poles’ service life while identifying when they need replacement. PTT inspects poles every 10 years. PG&E asked Boyan to improve PTT’s mission by using extensive data collected since the program began in the mid-1990s. With this analytics internship, PG&E wanted to use modern statistical methods to better understand and predict decay in their wood poles.

Boyan used predictive modeling to help PG&E plan for the future.

Approach: First, Boyan used available data to understand how various variables generally behaved. He ran hypothesis tests to find the main reasons a pole was rejected during inspections. He then developed a model that estimated the overall rejection rates of different kinds of poles. The result was a prediction with a mean absolute percentage error of about 30%. Finally, he used the model’s results to simulate how often PG&E will reject poles in the future.
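For reference, mean absolute percentage error (MAPE) averages the size of each prediction error relative to the observed value. The sketch below shows the calculation on invented rejection rates for four hypothetical pole categories; none of these numbers come from PG&E.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Invented rejection rates (share of inspected poles rejected) per category.
observed = [0.04, 0.10, 0.20, 0.05]
estimated = [0.05, 0.08, 0.26, 0.04]

error = mape(observed, estimated)
print(f"MAPE: {error:.2f}%")  # MAPE: 23.75%
```

Read this way, a MAPE of about 30% means that a category with a true rejection rate of 10% would, on average, be estimated at roughly 7% or 13%.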

Impact: Boyan’s model helps PG&E to better budget and plan for future work. The simulation highlighted a well-known problem in the utility industry: aging infrastructure. The relatively low average age of poles and the low replacement rates observed in the past few inspection cycles mean that PG&E will likely experience a drastic increase in rejection rates as the average age of its pole population grows. Planning for the related increase in manpower and work hours will be of great importance to PG&E in the next few decades.