Apr 11, 2026
EHS leading indicators are routinely adopted by convention, not validation. Toolbox talk completion, near-miss counts, and inspection frequencies are standard metrics primarily because they are administratively easy to count, not because they are predictive. They are inherited from legacy corporate spreadsheets and software vendor defaults without any verification of their local signal.
ISO 45001:2018 Annex A specifically calls for "statistical operations" to reveal relationships, patterns and trends, yet the standard lacks the verification logic to prove predictive value. Consequently, most programs operate on pure institutional habit, rolling over metrics because they "feel important" rather than because they demonstrate statistical utility. This is superstition with a spreadsheet, not a safety strategy.
To solve this, validation must be treated as a system diagnostic. The true value of lag-correlation testing is not finding a handful of perfect proactive metrics, but exposing the reality of the metrics management currently trusts.
An inspection metric that reliably predicts incidents at a Texas refinery will degenerate into useless noise at a plant in São Paulo if the local reporting culture is compromised. The corporate dashboard will display the exact same metric, but the mathematical signal will be dead. The difference between "data" and "indicator" lies in the site-specific interaction of risk, trust, and operational latency. When an enterprise deploys a generic, global metric list without site-level validation, it is managing an illusion.
This article establishes a validation methodology for safety metrics and provides a browser-based Safety Metrics Screener for lag-correlation testing. The Screener is a free browser tool embedded in this article — no installation or server-side data transfer required. The methodology functions as a diagnostic tool. It does not just identify leading indicators, it audits the relationship between every metric and its outcome to expose the true operational structure of your management system.
Since the early foundations of Industrial Accident Prevention, safety management has separated lagging indicators (fatalities, injuries) from leading indicators (activities that predict them). The problem is that this classification is almost entirely qualitative. A metric receives the "leading" label simply because it describes a proactive task. If you accept this label without statistical proof, you aren't tracking predictive risk. You are just counting administrative tasks and calling it proactive safety. It creates a dangerous illusion of oversight.
Lagging indicators (incident rates, lost-time injury rates, fatality counts) measure what has already happened. They are essential for reporting and trend analysis, but they cannot prevent the next event. A leading indicator is defined by its relationship to those lagging outcomes: it must move before them and in the opposite direction.
The core question it asks is simple: in the months where this metric was high, were incidents lower one, two, or three months later? This statistical relationship is called lag correlation, and it is what this methodology tests.
The Campbell Institute's expert panel — drawn from EHS leaders at Cummins, Honeywell, ExxonMobil, Fluor, and others — defined a leading indicator as a measure that is simultaneously proactive, preventive, and predictive. All three criteria must be satisfied. A metric that is only predictive — one that moves before incidents but does not prevent them — does not qualify as "leading".
Lag-correlation testing audits two of those three criteria statistically — the predictive and preventive criteria. The third, proactive, cannot be tested by correlation: it is a structural property of the metric itself — whether a drop in the metric can trigger a preventive intervention. That is a management design question, not a data question. It is addressed in Section 2 before the statistical test begins.
The predictive criterion is tested by lag structure: the metric's correlation with future incidents must be strictly stronger than its correlation with concurrent incidents. If the peak correlation occurs at lag 0 (the same month), the metric is concurrent — regardless of its proactive label. The preventive criterion is tested by direction: the correlation must be negative, meaning higher activity predicts fewer incidents. A metric that peaks at a future lag with a positive correlation is predictive but not preventive. It moves before incidents rise, not before incidents fall. That is not a leading indicator — the screener will classify it as a Forewarning signal. Forewarning is a separate classification: it has temporal structure but signals risk accumulation rather than prevention. Acting on it as a control measure will not reduce risk; it tells you the window in which to act before the system fails.
This methodology uses lag-correlation testing to empirically audit these assumptions. But before using math to dismantle false assumptions, candidate metrics must pass four structural tests.
Before validation comes selection. Most EHS programmes build metric sets by adopting industry defaults or copying previous sites. Approving a vendor's pre-configured list without testing for a local signal is the same strategic failure. A generic candidate list tells you what other programmes found useful. It does not confirm predictive utility in your specific operational context.
Four properties determine whether a candidate is even worth validating. The first two — Temporally Prior and Actionable — test whether a metric can theoretically predict and prevent incidents. The final two — Sensitivity and Consistency — are the data prerequisites that determine whether the math can run at all.
The metric must occur before the outcome it predicts. A metric that moves concurrently with incidents is not a leading indicator. If there is no temporal priority, there is nothing to predict.
Temporal priority is the predictive criterion. But direction is equally required: a metric that rises before incidents rise (such as high overtime spiking before fatigue-related injuries) has temporal priority but fails as a leading indicator. The screener classifies this as a Forewarning signal.
A change in the metric must trigger an intervention that can prevent the outcome. This is the proactive criterion: the metric must enable a preventive action, not just signal that conditions are worsening. If inspection completion drops to 60%, deploying supervisors closes the gap. Tracking a metric with no intervention pathway isn't just useless—it's a documented liability.
From a governance perspective, validating a predictive signal shifts the legal threshold of foreseeability. Once a risk precursor is statistically validated, it constitutes "documented knowledge" in your management system. If that signal is then ignored without a recorded intervention, it becomes significantly harder to argue that a subsequent incident was not foreseeable. Validation should follow Governance: do not run the math until you have the authority to pull the operational brake. (See Section 13).
A leading indicator must vary enough over time to produce a detectable signal. If a compliance rate remains at a constant 99% while incident rates fluctuate, the measurement has failed to capture the variability of the field. Mathematically, data without variation provides no predictive value; it cannot identify changes in operational risk because the data series contains no new information.
Definitional changes, system migrations, collection gaps, and target-chasing all corrupt the time series. A training rate calculated against total headcount one month, then against permanent staff the next, is not the same metric. The denominator changed. The time series is broken. No statistical method corrects for a broken measurement definition. This is the property most frequently violated in real EHS data.
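Both data prerequisites can be checked mechanically before any correlation is run. Below is a minimal pre-screen sketch in Python, assuming the data has been loaded into pandas with a `month` column in YYYY-MM format; the column names and the 0.05 flatness threshold are illustrative, not part of the screener.

```python
import pandas as pd

def prescreen(df: pd.DataFrame, metric: str) -> dict:
    """Check the Sensitivity and Consistency prerequisites for one metric column.

    Assumes a 'month' column in YYYY-MM format and one column per metric
    (illustrative layout); the 0.05 flatness threshold is a rough default.
    """
    months = pd.PeriodIndex(df["month"], freq="M")
    expected = pd.period_range(months[0], months[-1], freq="M")
    series = df[metric].astype(float)

    mean = series.mean()
    cv = series.std(ddof=1) / mean if mean else float("nan")  # coefficient of variation

    return {
        "rows": len(df),
        "gaps_or_disorder": not months.equals(expected),   # Consistency check
        "missing_values": int(series.isna().sum()),
        "coefficient_of_variation": round(float(cv), 3),
        "too_flat_to_test": bool(cv < 0.05),               # Sensitivity check
    }
```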
The following table crystallizes how our operational requirements map directly to Campbell's definitional criteria to ensure the validation math remains grounded in safety theory:
| Campbell Criterion | Article Property | Operational Test |
|---|---|---|
| Proactive | Actionable | Management Test: Does a drop in this metric trigger a mandatory, pre-authorized intervention? |
| Preventive | Direction | Performance Test: Does an increase in this activity demonstrate a reduction in incident rates? |
| Predictive | Temporally Prior | Forewarning Test: Does the signal occur early enough to allow for a response before accidents happen? |
Resources like the Practical Guide to Leading Indicators from the Campbell Institute provide a useful candidate pool. But they are a starting point, not a validation. Consensus ends exactly at the point where the list meets your data. Whether inspection completion at your site predicts incidents one month later — or not at all — is something only your time series can answer. That is where the validation framework begins.
The validation framework is a three-stage system diagnostic: candidate selection, statistical testing, and operational confirmation. The Screener implements the mathematical engine in Stage 2, while the framework provides the structural context needed to transform raw correlation into a management finding.
Apply the four properties from Section 2 as a prioritization filter. Candidates that lack sensitivity or actionability are technically eligible for screening but are pre-disqualified from "Leading" status. Testing these suspected reactive or static metrics is still often worthwhile: the results provide the empirical evidence needed to authorize retiring redundant indicators or reconfiguring the reporting systems behind them.
Do not test every metric simply because it is available. Testing dozens of variables simultaneously introduces statistical noise and unnecessary administrative burden. The objective is to identify a reliable set of predictive indicators, not to quantify every data point available in your records.
Lag correlation identifies the time-delay between a leading indicator today and an incident outcome in the future. By testing offsets—typically at 1, 2, and 3 intervals—we identify whether a metric truly precedes incidents or simply reacts to them. This identifies the maximum response window: the time you have to intervene before the historical incident pattern repeats.
The analysis validates two requirements: Timing and Direction. Timing confirms that the strongest relationship occurs in the future. Direction confirms whether the relationship is preventive (negative correlation) or represents an escalation of risk (positive correlation). A positive correlation at the best future lag produces a Forewarning classification regardless of the signal's strength. The strength of the relationship is expressed as the correlation coefficient r, where |r| denotes its absolute value — stripped of sign — ranging from 0 (no relationship) to 1 (perfect relationship). Direction is carried by the sign; magnitude is read from |r|.
Test reliability depends on data density. While 24 data points (e.g., 24 months) provide a baseline for identifying seasonal trends, the fundamental requirement is a dataset with enough incident variability to produce a valid signal. For shorter timelines, you can increase statistical power by increasing granularity—shifting from monthly to weekly records—provided the data is clean enough to support the higher frequency.
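To make the mechanics concrete, here is a minimal sketch of the lag test in Python, assuming a monthly pandas DataFrame with one column per metric and one outcome column. The column names and the example output are illustrative; the Screener implements the same logic in the browser.

```python
import pandas as pd

def lag_profile(df: pd.DataFrame, metric: str, outcome: str,
                max_lag: int = 3, method: str = "spearman") -> dict:
    """Correlate this month's metric with the outcome 0..max_lag months later.

    A negative correlation at a future lag means higher activity preceded
    fewer incidents -- the pattern a leading indicator must show.
    """
    profile = {}
    for lag in range(max_lag + 1):
        # shift(-lag) pulls the outcome forward, pairing month t's metric
        # with month (t + lag)'s incidents.
        profile[lag] = df[metric].corr(df[outcome].shift(-lag), method=method)
    return profile

# Illustrative output for a 36-month series:
# lag_profile(monthly, "hazard_closure_rate", "incident_rate")
# -> {0: 0.16, 1: -0.31, 2: -0.56, 3: -0.22}
```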
Statistical classification is a starting point, not a conclusion. Every result — Leading, Forewarning, Concurrent, or Weak — must be interrogated operationally before it becomes a management finding:
If a metric is classified as "Forewarning", verify whether it tracks a genuine inflation of risk (like overtime-induced fatigue) or is simply a reactive count. This confirmation determines whether you govern it as an early-warning signal or re-evaluate the measurement.
Before running the screener, understand what it measures: the time-relationship and direction of your safety data. Four technical constraints determine what the results can and cannot tell you.
1. Directional Scope. Correlation is tested across 1 to 3 months (Lag 1 to 3) into the future. The current month (Lag 0) is excluded because monthly data is too coarse to distinguish the sequence of events.
If a safety talk on the 1st prevents an incident on the 25th, the monthly record registers both in the same period. The math cannot tell if the talk prevented the accident, or if the accident triggered the talk as a reactive response. Because of this limitation, we treat the current month strictly as a measure of how the system reacts to events. For a metric to be validated as Leading, it must show strength in the months that follow, where a clear preventive sequence exists.
Metrics with inherently longer lead-times (such as annual competency cycles) will appear as Weak. This does not confirm the absence of a signal — it confirms the absence of a short-term one. Validating cycles beyond 90 days requires at least 48–60 months of continuous history.
2. Exposure Rate-Normalisation. Raw counts are measures of volume, not efficiency. A count of inspections or hazard reports tells you how much activity occurred, but it doesn't tell you how well you are managing the risk.
If man-hours increase, both activity counts and incident counts will rise together. This isn't a "leading" signal; it's just the byproduct of having more people on site. This shared exposure is not safety performance. To find a real predictive signal, you must normalize the data (e.g., incidents per 200,000 hours) to measure the efficiency of the control relative to the risk exposure.
3. Time-Series Integrity. Changes to reporting definitions or data collection methods mid-dataset invalidate the results. If a new hazard reporting app was introduced or the definition of "Near Miss" was changed during the period, the statistical relationship will track the change in administrative process rather than the change in risk. Do not run the validation on datasets that combine different reporting methodologies.
4. Sample Volume. A minimum of 12 months of data is required, though reliability requires 24–36 months. On smaller datasets, one or two incidents will skew the results and create false signals. If you have less than 2 years of data, use the findings only to investigate the metric — not as a justification to change your safety strategy.
With these constraints in place, the Screener can be applied directly to your data.
The Safety Metrics Screener runs entirely in your browser—no installation required, no data sent to a server.
The date column uses YYYY-MM format (e.g., 2024-01). Rows must be in chronological order with no gaps. To convert daily logs into the required monthly CSV, use SUM for event counts (e.g., total inspections) and AVG for rate-based metrics (e.g., monthly compliance percentage or average headcount) to preserve the proportional signal across disparate month lengths.
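One way to perform that conversion, assuming a daily log with a `date` column (the file name, column names, and the use of pandas are all illustrative):

```python
import pandas as pd

daily = pd.read_csv("daily_log.csv", parse_dates=["date"])  # illustrative file and column names

monthly = (
    daily.set_index("date")
         .resample("MS")                 # one row per calendar month
         .agg({
             "inspections": "sum",       # event counts: SUM
             "compliance_pct": "mean",   # rates and percentages: AVG
             "headcount": "mean",
         })
         .reset_index()
)
monthly["month"] = monthly["date"].dt.strftime("%Y-%m")     # YYYY-MM, as the screener expects
monthly.drop(columns="date").to_csv("screener_input.csv", index=False)
```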
All activity metrics are rate-normalised (per 100 employees or per 200,000 hours) — not raw counts — in line with the Exposure Rate-Normalisation constraint in Section 4.
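A minimal sketch of that normalisation step, assuming the raw counts and exposure figures are available as columns in the monthly table (the column names are illustrative):

```python
import pandas as pd

def add_rates(monthly: pd.DataFrame) -> pd.DataFrame:
    """Convert raw monthly counts into exposure-normalised rates.

    Assumes columns named incident_count, hazard_report_count, hours_worked and
    headcount (illustrative). 200,000 hours ~ 100 full-time workers for a year.
    """
    out = monthly.copy()
    out["incident_rate"] = out["incident_count"] / out["hours_worked"] * 200_000
    out["hazard_report_rate"] = out["hazard_report_count"] / out["headcount"] * 100
    return out
```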
The calculation identifies the relationship between every metric and its outcomes at lags of 1 to 3 months into the future. As described in Section 4, Lag 0 (the current month) is ignored for prediction because monthly data cannot separate cause from effect within the same 30-day period.
Before running the calculation, select the methodology that matches your data density. Pearson correlation measures linear relationships and works well for high-volume, proportional activity metrics. However, real incident counts at well-run sites are often "sparse" (e.g., 0, 0, 1, 0, 0, 2). Spearman rank correlation handles these rare events better by looking at the rank of the month rather than the raw number. If your data includes zeroes and sparse outcomes like Lost Time Injuries, use Spearman to prevent outliers from distorting the Pearson result.
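The difference is easy to see on a sparse series. The sketch below uses scipy on made-up numbers purely for illustration:

```python
from scipy.stats import pearsonr, spearmanr

# Sparse incident counts typical of a well-run site, plus an activity series.
incidents = [0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 0, 3]
inspections = [41, 44, 30, 42, 45, 22, 40, 33, 46, 44, 47, 18]

r_p, p_p = pearsonr(inspections, incidents)
r_s, p_s = spearmanr(inspections, incidents)

# Pearson is pulled around by the handful of non-zero months;
# Spearman ranks the months, so the zero-heavy outcome column distorts it less.
print(f"Pearson  r = {r_p:+.2f} (p = {p_p:.3f})")
print(f"Spearman r = {r_s:+.2f} (p = {p_s:.3f})")
```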
The screener assigns one of four classifications: Leading, Concurrent, Forewarning, or Weak. The result is based solely on the statistical evidence — it tells you the structure of the relationship, not whether the metric belongs in your management system.
1. Leading — peak |r| occurs at lag 1–3, is at least 0.08 stronger than lag 0, the correlation is negative (higher activity predicts fewer incidents), and peak |r| ≥ 0.30. Before treating this result as a validated leading indicator, confirm the Actionability check from Section 2: does a drop in this metric trigger a pre-approved intervention? If not, the statistical result stands, but the metric cannot be governed as a leading indicator. Acting on a signal without a pre-approved response procedure creates the legal exposure described in Section 13.
2. Concurrent — peak |r| occurs at lag 0, or peaks at a future lag but the gain over lag 0 is less than 0.08. In both cases the lead is too weak to distinguish from a reactive pattern — the metric moves with incidents, not reliably ahead of them. These are candidates for the Fix-or-Retire Review (Section 10): examine the reporting process before retiring the metric.
3. Forewarning — peak |r| occurs at lag 1–3, is at least 0.08 stronger than lag 0, the correlation is positive, and peak |r| ≥ 0.30. The metric moves ahead of the outcome — not because it prevents it, but because it tracks its build-up (e.g., overtime rising before a fatigue-related incident cluster).
4. Weak — peak |r| remains below 0.30 across all lags. No meaningful relationship between this metric and future incidents is detectable in the data.
The 0.08 gain threshold is not a fixed standard. On datasets under 24 months, raising it to 0.10–0.12 reduces the risk of classifying random variance as a predictive signal.
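Expressed as code, the classification rules reduce to a few comparisons. The sketch below assumes a lag profile of the form {lag: r}, such as the one sketched earlier; the thresholds are the defaults described above, exposed as parameters so they can be raised on short datasets.

```python
def classify(profile: dict, min_strength: float = 0.30, min_gain: float = 0.08) -> str:
    """Classify a lag profile {lag: r} covering lag 0 through lag 3."""
    peak = max(profile.values(), key=abs)                 # strongest |r| at any lag
    if abs(peak) < min_strength:
        return "Weak"                                     # no usable relationship

    best_future_lag = max((lag for lag in profile if lag >= 1),
                          key=lambda lag: abs(profile[lag]))
    best_r = profile[best_future_lag]
    if abs(best_r) - abs(profile[0]) < min_gain:
        return "Concurrent"                               # peaks at lag 0, or lead too weak to trust

    return "Leading" if best_r < 0 else "Forewarning"     # sign carries the preventive test

# classify({0: 0.16, 1: -0.31, 2: -0.56, 3: -0.22})  ->  "Leading"
```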
The table below shows how |r| magnitude maps to classification and what each range means in practice:
| \|r\| range | Classification | Interpretation |
|---|---|---|
| 0.00 – 0.30 | No signal | No predictive utility. The metric is either disconnected from risk drivers or the data collection is too inconsistent to produce a signal. |
| 0.30 – 0.50 | Moderate signal | Typical range for real EHS data. If negative (−r): validated leading indicator. If positive (+r): Forewarning — the metric has temporal structure but fails the preventive criterion. It tracks deterioration, not prevention. |
| 0.50 – 0.70 | Strong signal | High predictive confidence. If negative (−r): sufficient to justify management intervention based on metric trends alone. If positive (+r): strong Forewarning signal — do not treat as a control measure. Investigate the mechanism before acting. |
| > 0.70 | Suspect — verify data | Statistically improbable for activity metrics tested against incident outcomes. Direction still applies, but the magnitude warrants scrutiny: likely target chasing, data manipulation, or the metric being a direct mathematical derivative of the outcome. |
At low-incident sites, |r| = 0.30–0.50 is typically the realistic ceiling. Datasets with long runs of zero-incident months cause Pearson correlation to become unreliable — a handful of non-zero months will dominate the calculation and inflate or distort the coefficient. Use Spearman instead: it ranks months rather than using raw values, so the zero-heavy outcome column does not collapse the result. Also check that the direction of the signal holds consistently across rolling 12-month windows, rather than relying on a single full-period result.
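One way to run that rolling-window check, reusing the shift-and-correlate approach from the earlier lag-profile sketch (the window length, lag, and column names are illustrative):

```python
import pandas as pd

def rolling_direction(df: pd.DataFrame, metric: str, outcome: str,
                      lag: int = 2, window: int = 12) -> pd.Series:
    """Spearman r at one fixed lag, recomputed over each rolling 12-month window.

    If the sign flips between windows, do not trust the full-period result.
    """
    future = df[outcome].shift(-lag)
    values = [
        df[metric].iloc[start:start + window].corr(
            future.iloc[start:start + window], method="spearman")
        for start in range(len(df) - window - lag + 1)
    ]
    return pd.Series(values, name=f"{metric}_lag{lag}_r")
```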
The screener returns a ranked, color-coded table — Green: Leading, Orange: Forewarning, Yellow: Concurrent, Grey: Weak — with the full lag profile for each metric shown inline. Export the results to CSV for management review or audit documentation.
The dataset includes six candidate metrics tested against Incident Rate (per 200,000 hours) as the outcome: Hazard Closure Rate (%), Overtime Rate (hrs/employee), Near Miss Rate (per 100 employees), Observation Rate (per 100 employees), Training Rate (hrs/employee), and Total Working Hours.
This is not a sanitized example built to demonstrate a clean result. The dataset contains one validated leading indicator, metrics that respond to incidents rather than precede them, and signals too weak to act on — the full diagnostic range you would encounter in a real audit.
The screener does not produce a clean list of leading indicators. It classifies each metric by what the data shows — not by what the metric is assumed to do. All findings used Spearman Rank Correlation, the method suited to sparse, zero-heavy incident data (as explained in Section 5). For context, correlations in real EHS data rarely exceed |r| = 0.50 — a result of 0.56 is strong; 0.37, which this dataset returns for two of the six metrics, reflects a reactive pattern, not a predictive one.
Training Rate failed to show a meaningful signal, returning a Weak result across all lags (peak |r| = −0.10). While the negative direction is theoretically preventive, the effect size is negligible, explaining only 1% of incident variance (r² = 0.01). This suggests the training program operates as a pre-scheduled administrative activity rather than a responsive risk control. Because the monthly training volume remains nearly identical regardless of the site's shifting risk profile, it functions as a statistical constant. This lack of independent movement explains the "No Signal" result: a static activity cannot correlate with, or predict, volatile incident outcomes. Apply the Fix-or-Retire Review (Section 10) to the Content layer to align curriculum with field risk drivers.
Near Miss Rate returned a Concurrent result (r = +0.37 at lag 0, p < 0.05) with no forward predictive power. This identifies a reactive reporting pattern: reporting volume spikes concurrently with incident clusters—rising from a baseline of 93 to 107 during high-incident periods—rather than preceding them. This statistical co-occurrence confirms that near misses are being surfaced in response to heightened supervisory scrutiny following an event, not through a proactive routine. Apply the Fix-or-Retire Review (Section 10) to the Collection layer to investigate reporting suppression.
Observation Rate also returned a Concurrent result (r = +0.37, p < 0.05), with the signal collapsing immediately after the current month (Lag 1: r = +0.03). This sharp drop in correlation confirms a total lack of forward predictive structure. Observations are being recorded as a reaction to trouble or administrative pressure rather than as a proactive early warning. This identifies a system where volume-based quotas have decoupled reporting from reality, turning the process into a lagging documentation exercise. Apply the Fix-or-Retire Review (Section 10) to the Incentives layer to restore signal integrity.
Total Working Hours returned a Weak result (peak |r| = +0.27) and failed to reach statistical significance. The data confirms that headcount is not a proxy for risk: the total volume of work hours explains 7% of the site's incident variance (r² = 0.07). This demonstrates that aggregate hour totals are a measure of operational scale, not risk intensity. Because the metric does not distinguish between a high-risk maintenance shutdown and routine administration, it provides no predictive signal for incident clusters. Total hours should be managed on operational dashboards, as they offer no empirical value for safety forewarning.
Hazard Closure Rate is the site's primary early-warning signal, returning a Leading result (r = −0.56, p < 0.001) at a 60-day lag. The data reveals that administrative speed is a direct driver of field safety: resolving hazards today creates a measurable reduction in incidents two months later. The relationship is negligible in the current month (r = +0.16 at lag 0) but more than triples in magnitude, and reverses direction, by 60 days (r = −0.56 at lag 2). This confirms that closure speed isn't just an office metric; it is a proactive control that protects the site from future risk, explaining nearly one-third of the total incident variance (r² = 0.31).
Overtime Rate validated as a Forewarning indicator (r = +0.55, p < 0.001) at a 30-day lag. The data reveals that the impact of overtime is cumulative: it has almost no relationship to incidents in the current month (r = +0.14 at lag 0) but becomes a strong driver of risk 30 days later. This confirms that fatigue takes time to build. An overtime spike today is not an immediate crisis, but it marks the start of a 30-day "danger zone" in which incident clusters become significantly more likely as the system's capacity to absorb strain is exhausted.
The table below summarises all findings, sorted by classification result.
| Metric Name | Lag (mo) | Correlation (r) | Effect (r²) | Classification |
|---|---|---|---|---|
| Hazard Closure Rate (%) | 2 | −0.56 | 0.31 | Leading |
| Overtime Rate (hrs/emp) | 1 | +0.55 | 0.30 | Forewarning |
| Near Miss Rate (per 100 emp) | 0 | +0.37 | 0.14 | Concurrent |
| Observation Rate (per 100 emp) | 0 | +0.37 | 0.14 | Concurrent |
| Training Rate (hrs/emp) | 2 | −0.10 | 0.01 | Weak |
| Total Working Hours | 2 | +0.27 | 0.07 | Weak |
A Concurrent result means the metric fails the predictive criterion. This happens in two cases: the peak correlation occurs at Lag 0 (the same month as the incident), or it peaks at a future lag but the gain over Lag 0 is too small (below 0.08) to confirm a genuine lead. In both cases the metric is a record of what has already occurred rather than a forecast of what is coming. When a metric labeled "proactive" returns a Concurrent result, it is empirical evidence that the activity is triggered by incidents, not by a prevention routine.
The classification is site-specific, representing an audit of the local reporting culture. If Near Miss Rate is a validated leading indicator at Site A but Concurrent at Site B (as seen in Finding 2), Site A has built a reporting culture where risks surface independently. In contrast, Site B's reporting only spikes in response to trouble. For reactive cultures, the Fix-or-Retire Review (Section 10) provides the diagnostic steps needed to investigate whether the metric can be reconfigured or if the reporting system itself is broken.
While Concurrent metrics cannot predict the next failure, they are essential for verifying that the work is actually getting done. Tracking whether your administration fulfills its immediate promises—such as closing investigation actions on time—is a vital "health check" for safety performance. These metrics prove your management process is functioning, even if they hold no power to forecast the future.
A Forewarning indicator passes the predictive criterion but fails the preventive criterion—the metric moves before incidents, but it rises as risk increases. While the instinctive response is to discard these results, they provide a strategic warning of systemic strain. They identify precisely when the site's operational demand is exceeding its safe capacity.
Consider Finding 6 (Overtime Rate). Every spike in overtime represents a measurable accumulation of fatigue. The data shows that this strain does not cause immediate failure; instead, incident clusters peak one month later. In this context, the metric is an early warning of system exhaustion. It identifies the moment where the site has run out of "safety margin."
Strategic leaders use these results to determine their lead time for intervention. If a Forewarning indicator is validated with a 30-day lag, the system is giving you a one-month head start to act. This is the timeline to increase supervisory presence or suspend non-essential work before the accumulated fatigue results in an injury. These metrics tell a leader exactly how much time they have left to deploy controls before the system fails.
A Weak or Concurrent result does not automatically mean the metric is worthless. It may mean the data collection is broken, the reporting culture suppresses the signal, or the metric is measuring the wrong layer of activity. Before retiring a metric, investigate the source of the failure.
If a site has recorded zero incidents for 24 months, you cannot validate a metric as leading — there is no outcome variance to correlate against. In this case, apply the same lag-correlation method to the activity chain itself: test whether inspection frequency reliably precedes corrective action closure rates. If it does, the activity chain has temporal structure even if the incident outcome does not.
Three structural layers determine whether a metric can carry a signal: the Collection layer (how and when the data is captured), the Incentives layer (whether reporting is rewarded, penalised, or quota-driven), and the Content layer (whether the measured activity actually addresses the site's risk drivers).
Apply the protocol only if at least one of the following applies:
If none of these apply and the metric remains Weak after investigation, retire it. A metric that carries no predictive or compliance value is administrative overhead.
Once individual metrics are classified and weak signals retired, the next question is whether your validated metrics work better together than alone.
A single correlation isolates one metric, but it cannot detect where metrics overlap or miss context. Two metrics might appear strong individually but simply repeat the same information. Conversely, two weak metrics may form a strong predictive signal only when viewed together.
The Combined Metric Analyzer measures how much of the historical variation in a target outcome the selected metrics can collectively explain. The output is a score from 0 to 1: a score of 0.50 means the model accounts for half of the historical ups and downs in that outcome. The result is broken down into relative percentages for each metric; the metric with the highest share is the Primary driver. Any metric contributing less than 1% to the model is redundant, adding no new information.
To measure the gain from combining metrics, compare the combined score against the strongest individual baseline. If the combined score is higher, the signals are complementary. If it matches the single-metric baseline, the additional metrics add no explanatory value — use the simpler model. Always focus your investigation on the Primary driver, as it accounts for most of the model's explanatory power for that specific risk pattern.
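A minimal sketch of that comparison, assuming an ordinary least-squares model on standardised metrics. The relative weights below are shares of the absolute standardised coefficients, which is one simple heuristic and not necessarily the exact method the Analyzer uses.

```python
import numpy as np

def combined_score(X: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
    """Return (R^2, relative weight per metric) for a combined linear model.

    X: one column per candidate metric (already rate-normalised, no constant
    columns); y: the outcome series. Weights are shares of the absolute
    standardised coefficients -- a simple heuristic, not a causal attribution.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)      # standardise predictors
    yz = (y - y.mean()) / y.std()                  # standardise outcome
    A = np.column_stack([np.ones(len(yz)), Xz])    # add intercept
    coef, *_ = np.linalg.lstsq(A, yz, rcond=None)  # ordinary least squares
    r2 = 1.0 - np.var(yz - A @ coef) / np.var(yz)  # share of variance explained
    betas = np.abs(coef[1:])
    return float(r2), betas / betas.sum()

# Compare r2 against the best single metric's r-squared (its peak correlation,
# squared). If the combined score is no higher, keep the simpler model.
```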
Using the specimen data against Incident Rate, the combined model identifies Overtime Rate as the Primary driver, carrying 77% of the total weight. The site leader should investigate the overtime trend and implement fatigue controls before the next incident spike.
Validation produces two types of output: signal classifications for each metric and, where multiple leading indicators exist, a combined model identifying the primary driver. Both feed directly into decisions — which metrics to act on, which to reassign, and which to retire. Implementation follows a five-step sequence:
The enterprise will maintain a global metric list — and should, for comparing performance across sites. That list is a reporting standard, not a predictive one. A metric that is Leading at a high-hazard manufacturing site may be Weak at a logistics hub with different risk drivers. Which metrics actually predict incidents varies by site — validate locally. The screener output, not the metric list, is what gets localised.
Intervene immediately on signal drops. If your leading metric has a 2-month lag, a drop today is a 60-day warning. The lag period is your maximum response window (see Section 3). Assign a named owner and a response deadline the same day the drop is detected.
The signal response rules above tell you which metrics to act on and when. They do not tell you how to read the overall health of a site at a glance. That requires combining your validated leading indicators into a single composite — the Safety Performance Index — which translates your screener output into a Green / Amber / Red operational signal. That is covered in the next article in this series: Building the Safety Performance Index.
Validation does not end with classification. Once the screener confirms that a metric reliably precedes incidents, you are no longer operating reactively — you have evidence that you knew risk was rising before the incident occurred. That shift carries legal and ethical weight, which is the subject of Section 13.
Operating a predictive framework brings a specific duty of care. Once you demonstrate that a metric reliably precedes incidents, you possess foreknowledge of risk. The core legal question is never whether you ran the validation, but what you operationally did once the predictive signal was identified.
A validated leading indicator constitutes a formal warning light in your management system. The duty of care is triggered once a risk is recognized or becomes mathematically foreseeable.
A recorded management response to a signal is your primary legal defence. The legal exposure sits in the gap between knowing a signal exists and closing the control it points to. This varies by jurisdiction — involving legal counsel is a necessary step. The legal risk is highest in the Interim Period — the window between the moment a signal is confirmed and the moment a control response is documented and deployed. If your validation audit identifies a new leading indicator, your duty of care is triggered immediately. Do not conduct validation audits as "informal research" without a pre-authorized plan to act on any positive results — at minimum, that plan should name who is responsible, what response is required, and within what timeframe.
True predictive capability is not a software feature. It begins with the integrity of your data and ends with a decision. How to turn your validated signals into a live operational dashboard — the Safety Performance Index — is covered in the next article in this series: Building the Safety Performance Index.