Data Analysis & Analytics

All LGO students master data analytics skills during their time at LGO. The summer curriculum incorporates analytics coursework, and many electives popular with LGOs in both the MBA and engineering programs have a data analysis focus.

Furthermore, many internship projects that LGOs complete have a significant data analysis component. Students use data to make decision recommendations on supply chains, new products, optimized systems, and more. But some projects go above and beyond using data. These projects create analytics tools and predictive frameworks to make statistically sound business decisions.

Wires-Down Predictive Modeling and Preventative Measures

Greg Eschelbach (LGO ’16)

Company: Pacific Gas & Electric (PG&E)
Location: San Francisco, CA

Problem: In 2012, PG&E identified overhead wire-down events as an important metric for safety and reliability. These events have many causes, including trees falling on wires, wild animals, cars crashing into poles, and older assets simply failing. Greg hypothesized that a statistical approach could identify the key drivers of wire-down events and help direct preventative measures such as tree-trimming, routine inspections, and replacement of older assets.

Power line failure analytics
Daily data at a PG&E line segment.

Approach: Greg aggregated about 15 different data sources (system properties, vegetation data, weather and drought data, loading data, inspection data and outage data) and then geospatially assigned weather data to each asset. He used clustering analysis to create standardized weather “environments” of the aggregated factors. Finally, he developed an algorithm that used millions of individual tree-trimming records to describe the vegetation environment surrounding each line segment.
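
The clustering step can be illustrated with a minimal sketch. The source does not say which clustering method Greg used; plain k-means is assumed here, and the feature names and values are hypothetical, already-scaled stand-ins for the aggregated weather factors.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical per-asset features: (wind index, drought index), both scaled to 0..1.
weather = [(0.1, 0.2), (0.9, 0.8), (0.15, 0.25), (0.85, 0.9), (0.2, 0.1), (0.95, 0.85)]
labels, centroids = kmeans(weather, k=2)
print(labels)  # cluster label ("weather environment") for each asset
```

Each cluster label then acts as one standardized "environment" that can be fed to a downstream model as a single categorical factor instead of many raw weather variables.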

During his project, Greg used regression analysis and algorithms to best model the wire-down events. Out-of-sample testing was used to determine the best models, and the test/train sets were iteratively rotated to show robustness over time. The best results were achieved using logistic regression models.
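
A rough sketch of this kind of evaluation follows. It assumes the rotation means holding out one year at a time and training on the rest, which the source does not spell out. The single feature, the records, and the gradient-descent fit are all illustrative, not Greg's actual model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def accuracy(model, xs, ys):
    """Share of records whose predicted class (threshold 0.5) matches the label."""
    w, b = model
    hits = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical records: year -> list of (vegetation density score, wire-down 0/1).
records = {
    2012: [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)],
    2013: [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)],
    2014: [(0.15, 0), (0.35, 0), (0.65, 1), (0.95, 1)],
}

# Rotate the held-out year: train on the other years, test on the one left out.
scores = {}
for held_out in records:
    train = [row for yr, rows in records.items() if yr != held_out for row in rows]
    test = records[held_out]
    model = fit_logistic([x for x, _ in train], [y for _, y in train])
    scores[held_out] = accuracy(model, [x for x, _ in test], [y for _, y in test])

print(scores)  # out-of-sample accuracy per held-out year
```

Similar scores across all held-out years suggest the model is not tuned to one unusual season.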

Greg created two failure-mode-specific models, one for vegetation failures and one for equipment failures. These models provide predicted hot spot locations for future failures and insight into the key factors causing the failures. Thus, PG&E can target preventative measures for each failure mode using the models’ results.

Impact: Greg ran failure-mode-specific models on the entire PG&E overhead line network to calculate a likelihood of failure for each 150-foot segment. These segment-level likelihoods were then aggregated upwards to their source-side devices to determine a cumulative failure rating for each source-side device segment (several miles in length). This provided PG&E with a sorted ranking of all segments to prioritize tree-trimming and re-conductoring. Several hundred miles of at-risk lines have been scheduled for repairs in 2016 based on the model.
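
The roll-up from 150-foot segments to source-side devices could look something like the sketch below. The source does not give the aggregation formula. This version assumes segment failures are independent and rates each device by the probability that at least one of its segments fails. Device names and probabilities are made up.

```python
from collections import defaultdict

# Hypothetical (source-side device, segment failure probability) pairs.
segments = [
    ("dev-A", 0.10), ("dev-A", 0.20),
    ("dev-B", 0.05), ("dev-B", 0.05),
    ("dev-C", 0.50),
]

# Probability that every segment behind a device survives (independence assumed).
survival = defaultdict(lambda: 1.0)
for device, p in segments:
    survival[device] *= 1.0 - p

# Cumulative rating = probability that at least one segment fails.
rating = {device: 1.0 - s for device, s in survival.items()}

# Sorted ranking, highest risk first, to prioritize preventative work.
ranking = sorted(rating, key=rating.get, reverse=True)
print(ranking)
```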

Cannibalization in Retail Stores and Sales Forecast

Evelyne Kong (LGO ’15)

Company: Inditex, S.A. (Zara)
Location: A Coruña, Spain

Problem: To reduce stockouts and overstock at the end of each season, it’s essential for Zara to accurately forecast demand in its 1,600 stores worldwide. In the company’s existing forecasting method, products are considered independently of each other. Evelyne focused on understanding the cannibalization and complementarity effects among products within a store to improve the store demand forecast at the product level and the replenishment decision-making process.

Analytics Retail Store Inventory
Modeling the cannibalization effects within a Zara store.

Approach: Evelyne analyzed store sales data and information collected by Zara’s new RFID system. She then created a new product classification system to easily identify similar or dissimilar garments. To assess different scenarios, she developed models of a store’s demand and the substitution and complementary effects among articles. She then created a sales forecast model at the product level.
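
As a toy illustration of what a substitution effect looks like in sales data, the sketch below compares one product's average daily sales on days when a similar product is on display against days when it is not. This is far simpler than the demand models described above, and the observations are invented for the example.

```python
from statistics import mean

# Hypothetical daily observations for trouser A in one store:
# (units of A sold, whether similar trouser B was on display that day)
days = [(12, True), (11, True), (18, False), (13, True), (17, False), (19, False)]

with_b = mean(units for units, b_on_display in days if b_on_display)
without_b = mean(units for units, b_on_display in days if not b_on_display)

# A positive uplift when B is absent hints that B cannibalizes A's sales.
substitution_uplift = without_b - with_b
print(with_b, without_b, substitution_uplift)
```

A real analysis would also control for price, display position, and seasonality before attributing the difference to cannibalization.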

Impact: The sales forecast model showed statistically significant substitution effects. The display, availability, and price covariates proved to have the greatest effects on product demand variations. Compared with the current forecasting method, the resulting model improved demand forecast accuracy in the trouser category by 16% for the summer season and 8% for the winter season.

Integrated Supply and Production Network Design

Renata Bakousseva (LGO ’16)

Location: Akron, OH

Problem: The tire industry needs to deliver a large portfolio of products to customers in a timely and cost-efficient manner. Renata’s host company wanted to improve customer service and deliver new growth opportunities while eliminating waste through supply and production network integration.

The company’s manufacturing system is optimized for high-volume products with low demand variation signals. The system is used for all products regardless of demand characteristics. This results in higher holding cost, stale inventory, lost sales, and higher total delivered cost. The company wanted to develop a more responsive production system to reduce strain on the supply network, reduce total delivered cost, and improve product fulfillment.

Approach: Renata analyzed a portfolio of low-volume products and found a relationship between lot size and production cost in both of the explored manufacturing systems.

Analytics Internship - Tire Production
The model shows that greater lot sizes reduce manufacturing cost but also increase inventory cost, resulting in a higher total delivered cost.

Impact: Previously, the company’s operations only used manufacturing cost to decide how to deliver the product. Renata showed the impact of inventory cost as part of the total delivered cost. Her analysis also established that lot size is correlated with production cost. This result is important because it refutes the preconceived notion that manufacturing cost is fixed regardless of lot size.
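
The lot-size trade-off can be made concrete with a textbook cost model, sketched below. It assumes steady demand, a fixed setup cost per lot, and a holding cost proportional to unit cost. These are the assumptions behind the classic economic order quantity formula, not necessarily those of Renata's model, and all parameter values are hypothetical.

```python
import math

def cost_per_unit(lot_size, annual_demand, setup_cost, unit_cost, holding_rate):
    """Return (manufacturing, inventory, total delivered) cost per unit."""
    manufacturing = unit_cost + setup_cost / lot_size   # setup spread over the lot
    avg_inventory = lot_size / 2                        # steady demand assumed
    inventory = holding_rate * unit_cost * avg_inventory / annual_demand
    return manufacturing, inventory, manufacturing + inventory

# Hypothetical low-volume product.
params = dict(annual_demand=1000, setup_cost=50.0, unit_cost=10.0, holding_rate=0.25)

candidates = [50, 100, 200, 400, 800]
totals = {q: cost_per_unit(q, **params)[2] for q in candidates}
best = min(totals, key=totals.get)

# Closed-form minimizer of the same cost curve: sqrt(2 * D * S / (h * c)).
eoq = math.sqrt(
    2 * params["annual_demand"] * params["setup_cost"]
    / (params["holding_rate"] * params["unit_cost"])
)
print(totals, best, eoq)
```

In this toy setup, manufacturing cost per unit keeps falling as lots grow, but past the minimum the inventory cost rises faster, so judging on manufacturing cost alone would favor lots that are too large.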

Renata also analyzed various algorithms that optimize product lot size and job scheduling. EOQ and a Mixed Integer Program were both used to assess lot size dynamics. The latter produced more cost-efficient and production-efficient results because it was more flexible with time scale and considered manufacturing capacity. She also tested a few bin-packing heuristics, which showed how much time could be saved by scheduling algorithmically rather than by hand. These results made the case for automating the scheduling process.
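
For a sense of what a bin-packing heuristic does in a scheduling context, here is a minimal first-fit-decreasing sketch. The source does not say which heuristics Renata tested, so this is one common example. Treating jobs as durations packed into fixed-length shifts is an assumption made for illustration.

```python
def first_fit_decreasing(job_hours, shift_hours):
    """Pack jobs into shifts: longest job first, each into the first shift with room."""
    shifts = []  # each shift is a list of job durations
    for job in sorted(job_hours, reverse=True):
        if job > shift_hours:
            raise ValueError(f"job of {job}h does not fit in a {shift_hours}h shift")
        for shift in shifts:
            if sum(shift) + job <= shift_hours:
                shift.append(job)
                break
        else:
            # No existing shift has room, so open a new one.
            shifts.append([job])
    return shifts

jobs = [5, 3, 4, 2, 6, 1, 3]  # hypothetical job durations in hours
schedule = first_fit_decreasing(jobs, shift_hours=8)
print(schedule)
```

The heuristic runs in a fraction of a second even for thousands of jobs, which is the kind of time saving over manual scheduling that the paragraph above refers to.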

Predictive Storm Damage Modeling and Repair Crew Optimization

Sean Whipple (LGO ’14)

Company: National Grid
Location: Waltham, MA

Problem: Extreme weather frequently damages utility infrastructure, resulting in service interruption and costly repairs. If utility companies could predict where and when damage will occur, they could optimize repair crews’ workflow, lowering costs and minimizing service breakdowns. National Grid saw predictive analytics as an opportunity to provide significantly better service.

Utilizing outage predictions to assign crews.

Approach: Sean first developed a model to predict outages on the network using weather and outage data from the last six major storms. He included variables such as land cover, altitude, and other physical features to improve the model’s accuracy. He then used data mining and machine learning techniques, specifically a classification tree, to predict outages on the network. Finally, Sean and his MIT team created an optimization model to allocate repair crews across National Grid staging locations when the model predicted damage.
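
The hand-off from outage predictions to crew assignments can be sketched with a simple stand-in. The code below is not the optimization model Sean's team built, which the source does not describe. It is a largest-remainder proportional split, assuming crews are interchangeable and each staging location's need scales with its predicted outage count. Location names and counts are hypothetical.

```python
def allocate_crews(predicted_outages, total_crews):
    """Split crews across staging locations in proportion to predicted outages."""
    total = sum(predicted_outages.values())
    if total == 0:
        return {site: 0 for site in predicted_outages}
    quotas = {s: total_crews * n / total for s, n in predicted_outages.items()}
    plan = {s: int(q) for s, q in quotas.items()}  # whole crews first
    leftover = total_crews - sum(plan.values())
    # Hand remaining crews to the sites with the largest fractional remainder.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - plan[s], reverse=True)
    for site in by_remainder[:leftover]:
        plan[site] += 1
    return plan

# Hypothetical predicted outage counts per staging location.
predicted = {"north": 42, "central": 17, "south": 8}
plan = allocate_crews(predicted, total_crews=10)
print(plan)
```

A full optimization model would add constraints this sketch ignores, such as travel time between locations, crew skills, and uncertainty in the predictions.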

Impact: National Grid can now use robust data to predict how much damage an incoming storm will produce and automatically plan how to allocate repair crews most efficiently. This allows National Grid to respond to outages despite uncertainties, leading to more effective responses when weather-related outages occur. For his work, Sean won the Best Thesis Award of his graduating year.