How Machine Learning Can Improve Workplace Incident Classification

May 29, 2025

In my previous articles, I discussed the ESAW methodology - a standardized system for classifying workplace safety incidents - and explored the role of diagnostic analytics in enhancing workplace safety. Accurate incident classification is the cornerstone of effective analysis and prevention strategies. Machine learning offers great potential to streamline and improve this critical task.

Machine Learning Incident Classification Triage System

Why Pattern Recognition Beats Keyword Matching

Most EHS software already includes basic automation—keyword-based "auto-tagging" or decision trees that categorize an incident based on specific words like "fall" or "chemical." However, these systems are brittle. They fail when the narrative is nuanced or when a worker uses non-standard terminology.
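The brittleness is easy to reproduce. The sketch below is a minimal keyword tagger, not the logic of any particular EHS product; the rule table, category names, and sample narratives are all made up for illustration.

```python
from typing import Optional

# Hypothetical trigger words and categories, chosen only for this example.
KEYWORD_RULES = {
    "fall": "Slips, trips and falls",
    "chemical": "Chemical exposure",
}

def keyword_tag(narrative: str) -> Optional[str]:
    """Return the category of the first trigger word found, or None."""
    text = narrative.lower()
    for keyword, category in KEYWORD_RULES.items():
        if keyword in text:
            return category
    return None

reports = [
    # Contains the trigger word: tagged as a fall.
    "Worker had a fall from the ladder",
    # Same kind of event, different wording: no tag at all.
    "Worker lost footing on the ladder and landed on the floor",
    # Contains the trigger word, but nobody fell: tagged as a fall anyway.
    "Pallet stack began to fall apart during unloading",
]

for report in reports:
    print(keyword_tag(report), "<-", report)
```

The second report is missed and the third is mislabeled, even though a human reader would have no trouble with either.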

Machine learning (ML) stops looking for specific "trigger words" and starts looking at patterns. Instead of following a strict rulebook, the system makes a calculated guess based on how thousands of similar incidents were handled in the past. This offers several structural advantages:

The success of this approach depends on the training data and the choice of the machine learning algorithm. When done correctly, machine learning is a powerful tool for improving both the accuracy and the speed of incident classification.

The Architecture of an Incident Classifier

Developing a classification model is a multi-stage process. While custom models were previously the only option, the industry is now shifting toward **Large Language Models (LLMs)**. These are systems—like the ones behind ChatGPT—that have been trained on vast amounts of text to understand human language contextually. The classic workflow remains essential for understanding how the system learns:
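To make "how the system learns" concrete, here is a minimal sketch of a classic text classifier, assuming the workflow boils down to collecting labeled narratives, turning text into word counts, training, and predicting. It uses a small naive Bayes model written in plain Python. This is not the model behind any specific tool, and the six training narratives and their labels are invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count labels and per-label word frequencies from (narrative, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for narrative, label in examples:
        label_counts[label] += 1
        for token in tokenize(narrative):
            word_counts[label][token] += 1
            vocab.add(token)
    return label_counts, word_counts, vocab

def predict(model, narrative):
    """Return (best_label, probability) using Laplace-smoothed naive Bayes."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)
        for token in tokenize(narrative):
            if token in vocab:  # words never seen in training carry no signal
                score += math.log((word_counts[label][token] + 1) / denom)
        log_scores[label] = score
    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    best = max(weights, key=weights.get)
    return best, weights[best] / sum(weights.values())

# Invented training data; a real model needs thousands of labeled reports.
TRAINING = [
    ("slipped on wet floor and fell near the loading dock", "Fall"),
    ("lost footing on ladder and landed on the ground", "Fall"),
    ("tripped over cable and fell down the stairs", "Fall"),
    ("hand caught in hydraulic press", "Caught in or between"),
    ("fingers pinched between conveyor rollers", "Caught in or between"),
    ("arm caught between machine guard and frame", "Caught in or between"),
]

model = train(TRAINING)
label, confidence = predict(model, "employee's hand was caught in a hydraulic press")
print(label, round(confidence, 2))
```

Note that `predict` returns a probability alongside the label. That number is what a triage strategy can later act on.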

Introducing the Incident Classification Tool

To demonstrate the practical application of this process, I've developed an incident classification tool that uses machine learning to automate and improve the task. It can be accessed at https://incident-classification-tool.streamlit.app

This tool was trained using a publicly available dataset of OSHA accident and injury data from Kaggle (https://www.kaggle.com/datasets/ruqaiyaship/osha-accident-and-injury-data-1517/data). This dataset contains detailed information about workplace incidents, including the nature of the injury, the part of the body affected, the event type, and the environmental factors involved.

How the Tool Works

The tool simplifies incident classification into a few key steps:

Example Scenario

To illustrate how this works in practice, imagine an incident report describes an employee's hand being caught in a hydraulic press. The tool would analyze this description, considering factors like the environmental factor (pinch point), the nature of the injury (laceration), the body part affected (hand), and the event type (caught in or between). Based on this analysis, it would classify the incident according to the chosen classification system.
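As a rough illustration, the factors in this scenario could be held in a small structured record before a final class is assigned. The class and field names below are hypothetical and are not taken from the tool's code or from the OSHA dataset's column names.

```python
from dataclasses import dataclass, asdict

@dataclass
class IncidentFactors:
    """Hypothetical container for the four factors named in the scenario."""
    environmental_factor: str
    nature_of_injury: str
    body_part: str
    event_type: str

# The hydraulic press example, expressed as structured data.
press_incident = IncidentFactors(
    environmental_factor="pinch point",
    nature_of_injury="laceration",
    body_part="hand",
    event_type="caught in or between",
)

print(asdict(press_incident))
```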

The Triage Strategy: Dealing with Accuracy

It is a mistake to view machine learning as a binary success or failure. How useful a model is in practice depends on its **Confidence Score**: a percentage attached to each prediction that tells you how sure the model is about that classification, the digital equivalent of a "check engine" light. Accuracy is a separate measure, and it varies by category. While a model might achieve 90% accuracy on high-volume categories like "Trips and Falls," it may drop to 40% for rare, complex events, where a reasonably calibrated model will also report lower confidence.
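The gap between a common category and a rare one only shows up when accuracy is computed per category rather than as a single overall figure. The sketch below does that on a made-up evaluation set; the "Confined space" category and all counts are invented to mirror the 90% and 40% figures above.

```python
from collections import defaultdict

def accuracy_by_category(results):
    """Given (true_label, predicted_label) pairs, return accuracy per true label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_label, predicted in results:
        total[true_label] += 1
        if predicted == true_label:
            correct[true_label] += 1
    return {label: correct[label] / total[label] for label in total}

# Invented evaluation results: 10 common incidents, 5 rare ones.
results = (
    [("Trips and Falls", "Trips and Falls")] * 9
    + [("Trips and Falls", "Struck by object")] * 1
    + [("Confined space", "Confined space")] * 2
    + [("Confined space", "Trips and Falls")] * 3
)

by_category = accuracy_by_category(results)
overall = sum(t == p for t, p in results) / len(results)

print(by_category)
print(round(overall, 2))
```

In this toy data the overall figure sits at 11 correct out of 15, which hides how poorly the rare category performs.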

In a mature EHS data architecture, this 40% isn't a failure; it's a **triage signal**. Instead of a human reading all 1,000 reports to find the 5 critical ones, the ML model might handle 600 routine cases with high confidence and flag the remaining 400 for expert review. This ensures your limited time is spent precisely where the data is most ambiguous.
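A triage rule of this kind can be sketched in a few lines. The threshold of 0.80, the `Prediction` structure, and the synthetic batch of 1,000 predictions are assumptions made for illustration; a real threshold would need to be tuned against reviewed cases.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    report_id: int
    label: str
    confidence: float  # between 0.0 and 1.0

REVIEW_THRESHOLD = 0.80  # hypothetical cut-off, not a recommended value

def triage(predictions, threshold=REVIEW_THRESHOLD):
    """Split predictions into auto-accepted and flagged-for-review lists."""
    auto_accepted, needs_review = [], []
    for p in predictions:
        if p.confidence >= threshold:
            auto_accepted.append(p)
        else:
            needs_review.append(p)
    # Put the least certain cases at the top of the reviewer's queue.
    needs_review.sort(key=lambda p: p.confidence)
    return auto_accepted, needs_review

# Synthetic batch: 600 confident routine cases, 400 uncertain ones.
predictions = (
    [Prediction(i, "Trips and Falls", 0.95) for i in range(600)]
    + [Prediction(600 + i, "Caught in or between", 0.40 + (i % 40) / 100)
       for i in range(400)]
)

auto_accepted, needs_review = triage(predictions)
print(len(auto_accepted), len(needs_review))
```

Sorting the review queue by ascending confidence is one possible design choice. Sorting by potential severity would be another, if that information is available.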

One limitation remains: ML does not fix "Garbage In, Garbage Out." If a field worker cannot complete a detailed narrative while wearing cut-resistant gloves in the rain, the model will have no signal to process. The focus must remain on the quality of the field-level reporting system that feeds the model.

From Data Entry to Data Governance

By shifting to an ML-driven triage system, safety professionals move up the value chain. This transition offers three structural benefits:

Conclusion

By prioritizing data standardization and using machine learning to filter the noise, we can create safer and healthier workplaces. This technology doesn't replace the safety professional; it moves you from the filing cabinet to the decision table.