Data Analysis & Analytics

All LGO students build strong data analytics skills. The summer curriculum incorporates analytics coursework using Python and R, and many electives popular with LGOs have a data analysis focus. Some students also choose an analytics internship, in which a large share of the research work is devoted to large data sets and complex analysis.

Many internship projects that LGOs complete have a significant data analysis component. Students use data to make recommendations on supply chains, new products, optimized systems, and more. Some projects go beyond simply using data: they create analytics tools and predictive frameworks that support statistically sound business decisions.

Text Analytics to Inform Deviation Root Cause Analysis in Biomanufacturing

Lois Nersesian (LGO ’22)

Engineering Department: Chemical Engineering
Company: Amgen
Location: Cambridge, MA

Problem: Whenever a major deviation from a defined process, product, or system requirement occurs, steps are taken to determine potential causal factors and prevent future issues. Data from historical deviations could provide valuable insights; however, the unstructured nature of the reports makes it difficult to aggregate and use effectively. Amgen wanted a way to structure this data and draw meaningful insights from it.

Photo: LGO '22 students on internship with Amgen.

Approach: Lois began by targeting the process of determining potential causal factors for new deviations. To enable this, she explored several analytical methods for adding structure to the unstructured dataset, including unsupervised clustering, explicit text extraction, and process-driven step assignment. The results of each method were combined in a report for investigators.
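
As an illustration of the unsupervised-clustering piece, a minimal Python (scikit-learn) sketch might group historical deviation descriptions by TF-IDF similarity. The file name, column name, and cluster count below are assumptions for illustration, not Amgen's actual pipeline:

```python
# Minimal sketch: cluster unstructured deviation descriptions with TF-IDF + k-means.
# File name, column name, and cluster count are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

deviations = pd.read_csv("deviation_reports.csv")    # assumed export of historical reports
texts = deviations["description"].fillna("")         # free-text deviation description

vectorizer = TfidfVectorizer(stop_words="english", max_features=5000, ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

kmeans = KMeans(n_clusters=20, random_state=0, n_init=10)  # cluster count is a guess
deviations["cluster"] = kmeans.fit_predict(X)

# Top terms per cluster give investigators a rough label for each group of similar deviations.
terms = vectorizer.get_feature_names_out()
for c, center in enumerate(kmeans.cluster_centers_):
    top = [terms[i] for i in center.argsort()[::-1][:8]]
    print(f"cluster {c}: {', '.join(top)}")
```

In the project, cluster output like this would be only one of the three structuring methods combined in the report for investigators.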

Impact: The project delivered a set of methods for adding structure to deviations and their causal factors, along with a proof-of-concept tool showcasing the results. A tool that centers on causal factors proposed in historical deviations has the potential to improve the efficiency and accuracy of future deviation investigations. If the methods Lois developed were fully implemented, investigators of a new deviation could use the tool to see information about similar past deviations. The solution would also help identify trends in deviations and root causes, leading to durable fixes that prevent future deviations.

Deep Learning Models of Scanner/Vision Tunnel Performance in Sortation Subsystems

Felix Dumont (LGO ’21)

Engineering Department: Electrical Engineering and Computer Science
Company: Amazon

Problem: At Amazon’s large crossbelt sorter sites, the target is 98% scanner read performance, yet the average read success rate is 80-90%, driving a large amount of manual rework and recirculation that reduces sorter utilization. Existing mechanisms for deep-diving scanner issues make it extremely difficult to categorize no-reads (unsuccessful scans) as operational versus actual equipment issues. As a result, Amazon has very little visibility into no-read causes across sites and cannot put together an effective improvement plan.

Thesis diagram (Felix Dumont, LGO '21): while the algorithms used are well known to deep learning experts, they had yet to be applied to fault classification at Amazon.

Approach: To address the no-read issues he observed across Amazon's fulfillment network, Felix built a pipeline on Amazon Web Services (AWS) to process scanner images. He then trained a deep learning ResNet model through AWS SageMaker to assign a fault reason to each image. Finally, a user interface allowed operations managers to see which sites were lagging behind, launch deep-dives, and test operational or equipment fixes.
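
The SageMaker training code itself isn't described here, but the core fine-tuning step of a ResNet fault classifier typically looks something like the following PyTorch/torchvision sketch; the fault-class count, image paths, and hyperparameters are assumptions rather than details from Felix's model:

```python
# Hedged sketch of fine-tuning a ResNet classifier to assign fault reasons to scanner images.
# Class count, paths, and hyperparameters are illustrative only.
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

NUM_FAULT_CLASSES = 6  # assumed number of fault reasons (e.g., label damage, occlusion, equipment)

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_data = datasets.ImageFolder("scanner_images/train", transform=transform)
loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_FAULT_CLASSES)  # replace the ImageNet head

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(5):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

In the actual project, training ran through AWS SageMaker rather than on a local machine, and the user interface consumed the model's per-image fault predictions.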

Impact: Felix’s solution let engineers and operations managers understand the causes of no-reads at their sites and empowered them to address the issues. Despite the subjective nature of some labels, the models Felix developed showed less than 2% aggregated error on his validation set and less than 5% error on previously unseen scanners or sites, correctly reporting the site-specific trends and issues. A conservative entitlement is approximately $2.2MM in annual savings for the pilot sites, with the potential to save significantly more if adopted across the network.

Improving Asset Utilization and Manufacturing Production Capacity Using Analytics

Noa Ghersin (LGO ’20)

Engineering Department: Mechanical Engineering
Company: Boeing
Location: Everett, WA

Figure (Ghersin, Boeing, 2020): data-based job allocations reduce workload variance and create a fairer work distribution.

Problem: Boeing wanted to leverage digital solutions to improve Overall Equipment Effectiveness (OEE) for the thousands of machines it uses to manufacture airplane components. Noa’s project focused on increasing machine utilization and manufacturing production capacity in support of Boeing’s vertical integration strategy. With increasing competition and an impetus to lower manufacturing costs, manufacturers like Boeing’s Interiors Responsibility Center (IRC) were looking to leverage IoT technology to transform not only what they manufacture, but how.

Approach: Noa’s analysis included personnel interviews, observational time studies, a review of historical machine data, and value stream mapping. She built an analytical tool based on mixed-integer programming to dictate optimal job allocations in the IRC’s CNC router workstation, replacing a previously manual task, and a discrete event simulation of the workstation to quantify the efficiencies gained from the tool. Resource state analyses of the simulated operations uncovered additional operational inefficiencies. Noa also offered a methodology for data-based strategic decision-making that leverages linear programming to account for ordered strategic priorities.
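
The exact formulation of the allocation tool isn't given in this summary; as a hedged illustration, a stripped-down version of the underlying assignment problem (assigning jobs to CNC routers while minimizing the busiest machine's load) could be written with PuLP as below. The job times and machine names are invented:

```python
# Simplified sketch of a job-allocation MIP: assign each job to one CNC router while
# minimizing the heaviest machine load. Data and structure are hypothetical.
import pulp

jobs = {"J1": 3.0, "J2": 5.5, "J3": 2.0, "J4": 4.5, "J5": 1.5}  # processing hours (assumed)
machines = ["R1", "R2"]

prob = pulp.LpProblem("cnc_job_allocation", pulp.LpMinimize)
assign = pulp.LpVariable.dicts("assign", (jobs, machines), cat=pulp.LpBinary)
makespan = pulp.LpVariable("makespan", lowBound=0)

prob += makespan  # objective: minimize the busiest machine's total load

for j in jobs:                     # every job goes to exactly one machine
    prob += pulp.lpSum(assign[j][m] for m in machines) == 1
for m in machines:                 # each machine's load is bounded by the makespan
    prob += pulp.lpSum(jobs[j] * assign[j][m] for j in jobs) <= makespan

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for j in jobs:
    chosen = next(m for m in machines if assign[j][m].value() > 0.5)
    print(f"{j} -> {chosen}")
```

A real model would add the capacity constraints and ordered strategic priorities Noa describes; this sketch only captures the core job-to-machine assignment.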

Impact: Discrete event simulation projected that replacing human-dictated job allocations with the analytical tool would yield higher throughput, enabling Boeing to better utilize its existing assets. A survey following a pilot of the tool revealed that analytics-based job allocations increased job satisfaction among 70% of employees in the CNC router workstation. What-if analyses simulating other potential interventions led to the identification of alternative staffing and material-storage schemes associated with a 65% to 100% reduction in overtime hours.

Improving Project Timelines Using AI / ML To Detect Forecasting Errors

David Goldberg (LGO ’19)

Engineering Department: Electrical Engineering and Computer Science
Company: Amgen
Location: Thousand Oaks, CA

Figure (Goldberg, Amgen, 2019): better data and optimized resources could save Amgen millions of dollars.

Problem: Across industries and functions, data is a core ingredient driving decisions and actions, but can we trust our data? Accurate and robust forecasting is critical for optimizing recommendations and decisions around a biotechnology company’s drug pipeline. David’s project targeted recurring errors in Amgen’s data that needed to be detected and corrected more effectively to improve capacity management and decision making.

Approach: David developed a novel data analytics tool to detect and flag potential errors within Amgen’s capacity management forecasting. The tool works in an automated manner, combining statistical analysis, artificial intelligence, and machine learning, and its framework and techniques can be applied more broadly to detect anomalies and errors in other datasets across industries and functions. User interaction allowed the tool to learn from past experience and improve over time. Flagging and correcting the data helped overcome errors that could ultimately have hampered Amgen’s ability to roll out drugs to patients efficiently.
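
The specific models behind David's tool aren't detailed here; one hedged illustration of the general idea is an unsupervised anomaly detector, such as scikit-learn's IsolationForest, that flags forecast rows whose feature pattern looks unusual relative to history. The file name, column names, and contamination rate below are assumptions:

```python
# Hedged sketch: flag suspicious rows in a capacity-management forecast with an
# unsupervised anomaly detector. File, columns, and threshold are assumptions.
import pandas as pd
from sklearn.ensemble import IsolationForest

forecast = pd.read_csv("capacity_forecast.csv")  # assumed export of the forecast data
features = forecast[["forecast_volume", "pct_change_vs_prior", "lead_time_weeks"]]

detector = IsolationForest(contamination=0.01, random_state=0)  # ~1% of rows flagged (assumed)
forecast["flag"] = detector.fit_predict(features)               # -1 = flagged, 1 = normal

flagged = forecast[forecast["flag"] == -1]
print(f"{len(flagged)} rows flagged for analyst review")
```

Analyst feedback on flagged rows could then be fed back as labels, echoing the way user interaction let David's tool improve over time.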

Impact: By the time David completed his project, the tool had identified 893 corrected errors with a 99.2% accuracy rate and an estimated business impact of $77.798M in optimized resources. Following the paradigm of intelligent augmentation (IA), the tool empowered employees by saving them the time of sifting through thousands of lines and hundreds of thousands of data points; the human user could now make decisions based on the tool’s output.