
Achieving System Safety by Resilience Engineering

Erik Hollnagel
Industrial Safety Chair, École des Mines de Paris, France. E-mail: erik.hollnagel@cindy.ensmp.fr
Professor, University of Linköping, Sweden. E-mail: eriho@ida.liu.se

Erik Hollnagel 2006

Safety as a non-event
SAFE SYSTEM = NOTHING UNWANTED HAPPENS
Daily operation (status quo), then an unexpected event, then an unwanted outcome (accidents, incidents).
Prevention of unwanted events: reduce likelihood.
Protection against unwanted outcomes: reduce consequences.

Safety management must prevent/protect against both KNOWN and UNKNOWN risks. Safety management requires THINKING about how accidents can HAPPEN.

What has happened? What may happen?


Looking at the past (accident model): Looking into the future (risk model)
Simple linear: Component failures
Complex linear: Combination of failures and degraded defences
Non-linear*: Performance variability coincidences

* outcomes are not proportional to inputs, and cannot be derived from a simple combination of inputs

Simple, linear cause-effect model


Assumption: Accidents are the (natural) culmination of a series of events or circumstances, which occur in a specific and recognisable order.

Domino model (Heinrich, 1930)

Hazards-risks: Due to component failures (technical, human, organisational), hence looking for failure probabilities (event tree, PRA/HRA).
Consequence: Accidents are prevented by finding and eliminating possible causes. Safety is ensured by improving the organisation's ability to respond.

Complex, linear cause-effect model


Assumption: Accidents result from a combination of active failures (unsafe acts) and latent conditions (hazards).

Swiss cheese model (Reason, 1990)

Hazards-risks: Due to degradation of components (organisational, human, technical), hence looking for drift, degradation and weaknesses.
Consequence: Accidents are prevented by strengthening barriers and defences. Safety is ensured by measuring/sampling performance indicators.

Non-linear accident model


Assumption: Accidents result from unexpected combinations (resonance) of normal performance variability.
[Figure: network of coupled functions, each with connection points labelled I, O, P, R, T and C. Functions shown: FAA, Maintenance oversight, Certification, Aircraft design, Interval approvals, End-play checking, Limiting stabilizer movement, Mechanics, Lubrication, Jackscrew replacement, Jackscrew up-down movement, Horizontal stabilizer movement, Aircraft pitch control. Couplings shown: Aircraft design knowledge, Procedures, High workload, Equipment, Expertise, Redundant design, Allowable end-play, Excessive end-play, Controlled stabilizer movement, Limited stabilizer movement, Grease.]

Functional Resonance Accident Model

Hazards-risks: Emerge from combinations of normal variability (socio-technical system), hence looking for ETTO* and sacrificing decisions.
Consequence: Accidents are prevented by monitoring and damping variability. Safety requires constant ability to anticipate future events.

* ETTO = Efficiency-Thoroughness Trade-Off

Safety management and control


The purpose of safety management is to ensure that nothing unwanted happens. An SMS must therefore be able to control a dynamic process or organisation to ensure that performance remains within predetermined safety limits.

[Figure: feedback control loop. Labels: Setpoint, Controller and actuating device, Process, Disturbance, Output, Sensor.]

Key concepts:
- Process model (nature of activity)
- Measurements (performance indicators, output)
- Possibilities for control (means of intervention)
- Nature of threats (disturbances, noise)
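
The loop named on this slide (setpoint, controller, process, sensor, disturbance) can be sketched as a small simulation. This is only an illustrative sketch, not a model from the slides: the integrating controller, the additive process, the `gain` value and the function name `simulate_feedback` are all assumptions chosen to keep the example short.

```python
def simulate_feedback(setpoint, disturbances, gain=0.5):
    """Simulate a simple feedback loop and return the output at each step.

    Assumed (hypothetical) model: the process output is the control action
    plus the disturbance, and the controller accumulates a correction that
    is proportional to the measured error.
    """
    control = 0.0    # current control action (controller and actuating device)
    measured = 0.0   # last sensor reading of the output
    outputs = []
    for disturbance in disturbances:
        error = setpoint - measured      # compare measurement with setpoint
        control += gain * error          # compensate for the observed error
        output = control + disturbance   # process, affected by the disturbance
        measured = output                # sensor feeds the output back
        outputs.append(output)
    return outputs


# A constant disturbance is gradually compensated for by the feedback loop.
outputs = simulate_feedback(setpoint=10.0, disturbances=[2.0] * 30)
print(round(outputs[0], 3), round(outputs[-1], 3))
```

In this sketch the controller only reacts after the sensor has reported a deviation, which is the defining property of feedback control: performance is corrected once it has already drifted from the setpoint.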

Safety management as feedback control


How can changes be brought about? What are the control options/tools?

[Figure: the Safety Management System as a feedback controller. Labels: required safety level; Safety Management System; environment (external variability); process (internal variability); performance; performance indicators; reporting threshold.]

Nature of threats: regular, irregular, unexampled.
Accident model: simple linear, complex linear, non-linear.
Delays in effects? Delays in feedback?

Knowing what may happen


Murphy's law: everything that can go wrong sooner or later will go wrong. If there's more than one way to do a job and one of those ways will end in disaster, then somebody will do it that way.

[Figure: probability (p) plotted against consequence, with regions labelled "Known (safe)" and "Unknown (unsafe)".]

Requisite imagination: Where is the cut-off point?

There is an infinite number of ways in which something can go wrong. The problem is to find those that are unlikely yet potentially serious.
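
The search for events that are unlikely yet potentially serious can be illustrated as a simple screening over probability and consequence. This is a hedged sketch only: the `Scenario` class, the function `unlikely_but_serious`, the thresholds and every number in the scenario list are hypothetical and invented for illustration, not data from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    probability: float   # assumed likelihood estimate (hypothetical)
    consequence: float   # assumed cost if it happens (hypothetical units)


def unlikely_but_serious(scenarios, p_cutoff, min_consequence):
    """Return scenarios below the probability cut-off that still carry a
    consequence at or above the given threshold."""
    return [
        s for s in scenarios
        if s.probability < p_cutoff and s.consequence >= min_consequence
    ]


# Made-up scenarios, purely for illustration.
scenarios = [
    Scenario("routine component failure", 0.2, 10.0),
    Scenario("loss of main and back-up systems", 0.001, 5000.0),
    Scenario("minor paperwork error", 0.0005, 1.0),
]

flagged = unlikely_but_serious(scenarios, p_cutoff=0.01, min_consequence=1000.0)
print([s.name for s in flagged])
```

A screening like this only covers scenarios that someone has already imagined and listed, so it does not answer the question of where the cut-off point should be.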

Regular threats
(Westrum, 2006)

Events that occur so often that the organisation can learn how to respond.
Examples: Medication errors that only affect a single patient. Transportation accidents (collision between vehicles). Process or component failure (loss of mass, loss of energy).

Regular threats are covered by standard methods (HAZOP, Fault Trees, FMECA, etc.).

Their likelihood and severity (cost) are so high that they must be dealt with. Solutions can be based on standard responses, typically elimination or barriers.

[Figure: probability (p) plotted against cost, with p = 0.01 marked.]

Irregular threats
(Westrum, 2006)

One-off (singular) events, but so many, so rare, and so different that a standard response is impossible.
Examples: Apollo 13 moon mission accident. Epidemics (BSE, H5N1). Simultaneous loss of main and back-up systems.

Irregular threats are imaginable but usually completely unexpected. They are discounted by standard methods.

Their likelihood is so low that defences are not cost effective, even if consequences are serious. Solutions require interaction and improvisation. Standard responses are insufficient.

[Figure: probability (p) plotted against cost, with p = 0.01 marked.]

Unexampled events
(Westrum, 2006)

Events that are virtually impossible to imagine and which exceed the organisation's collective experience.
Examples: Chernobyl. New Orleans flooding (2005). Attack on the WTC (9/11).

Even when unexampled events are imaginable, they are normally discounted as impossible. Their likelihood is so low that defences are not viable, even if consequences are catastrophic. Solutions require the ability to cope, i.e., dynamically to self-organize, formulate and monitor responses.

[Figure: probability (p) plotted against cost, with p = 0.01 marked.]

Reactive organisation
Accident: Surprise! Scrambling for action.
Safety planning (preparing for regular threats), then accident: activate ready-made plans.

Interactive (attentive) organisation


Safety planning: preparing for irregular threats.
Prepared and alert, looking for expected situations. Occasional health checks using pre-defined indicators.
Accident: situation assessment, quick replanning; then evaluation, learning.

Proactive (resilient) organisation


Safety planning: preparing for unexampled events.
Alert and observant. Constantly self-critical and inquisitive. Alternative ways of functioning.
Accident: situation assessment, reorganisation; then evaluation, learning.

Some examples
Type of organisation: Examples
Reactive (brittle, no resilience): Mont Blanc Tunnel fire (March 26 1999); Swedish government after Tsunami (December 26 2004); Homeland Security and FEMA after Hurricane Katrina (August 29 2005)
Interactive (robust, partial resilience): The aviation industry; nuclear power plants; hospitals
Proactive (full resilience): Toyota (as innovative manufacturer); people of London after bombing, July 7 2005; Israeli hospitals (bus bombings)

Success and failure


Failure is normally explained as a breakdown or malfunctioning of a system and/or its components. This view assumes that success and failure are of a fundamentally different nature.

Individuals and organisations must adjust to the current conditions in everything they do. Because information, resources and time are finite, such adjustments will always be approximate.

Success is due to the ability of organisations, groups and individuals correctly to make these adjustments, in particular to anticipate failures before they occur. Failure is due to the absence of that ability, either temporarily or permanently.

Safety must encompass strengthening this ability, rather than just avoiding or eliminating failures.

Surprises and responses


Organisation's view on surprises
Reactive: Disturbances, or disrupting events, which challenge the proper functioning of a process.
Interactive (attentive): Exceptions that must be regimented. Uncertainty about the future.
Proactive (resilient): A need constantly to update definitions of the difference between success and failure. A recognition that models and plans are likely to be incomplete or wrong, despite best efforts.

Focus of organisation's response
Try to keep process under control and ensure people do not exceed given limits.
Improve ability to detect and to respond when challenged. Prepare routines and plans.
Identify the variability that organisation should be aware of; ensure ability to cope with these variations.
Search for the boundaries of own assessments in order to learn and revise.

From reactive to proactive control

[Figure: control loop combining anticipatory control (feedforward) with compensatory control (feedback). Labels: Target state (setpoint), Disturbance, Process, Output, Sensor.]

You cannot drive a car by looking in the rear-view mirror! The main tool for looking ahead should NOT be to look back.
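
The difference between compensatory (feedback) and anticipatory (feedforward) control can be sketched with a small simulation. This is an illustrative sketch under stated assumptions, not a model from the slides: the additive process, the integrating feedback term, the `forecast` argument and the function name `simulate_control` are hypothetical, and the forecast is assumed to be perfectly accurate, which real anticipation rarely is.

```python
def simulate_control(setpoint, disturbances, forecast=None, gain=0.5):
    """Return the absolute deviation from the setpoint at each step.

    forecast: anticipated disturbance per step (feedforward). If None,
    the loop relies on feedback alone.
    """
    control = 0.0
    measured = 0.0
    errors = []
    for step, disturbance in enumerate(disturbances):
        # Compensatory control: correct for the deviation already observed.
        control += gain * (setpoint - measured)
        # Anticipatory control: counteract the disturbance that is expected.
        anticipated = forecast[step] if forecast is not None else 0.0
        output = control - anticipated + disturbance   # assumed process model
        measured = output                              # sensor reading
        errors.append(abs(setpoint - output))
    return errors


# A disturbance appears at step 20; the feedforward case has anticipated it.
disturbances = [0.0] * 20 + [5.0] * 20
feedback_only = simulate_control(10.0, disturbances)
with_feedforward = simulate_control(10.0, disturbances, forecast=disturbances)
print(max(feedback_only[20:]) > max(with_feedforward[20:]))
```

With feedback alone, the deviation is only corrected after the disturbance has already shown up in the output. With an accurate forecast the disturbance is cancelled as it arrives, which is the point of looking ahead rather than back.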

SMS as feedforward control


How can changes be brought about? What are the control options/tools?

[Figure: the Safety Management System as a feedforward controller. Labels: anticipation (irregularities, disturbances, threats); customers, regulators; safety values and targets; environment (external variability); process (internal variability); Safety Management System; performance; performance indicators; reporting threshold.]

Nature of threats: regular, irregular, unexampled.
Accident model: simple linear, complex linear, non-linear.
Delays in effects? Delays in feedback?

Components of resilience
Anticipation: knowing what to expect (knowledge).
Attention: knowing what to look for (competence).
Response: knowing what to do, i.e. a rational response (resources).

[Figure labels: dynamic developments, learning, updating, strategic decision making.]

Resilience and safety management


Resilience is the intrinsic ability of an organisation to keep or recover a stable state, thereby allowing it to continue operations after a major mishap or in the presence of continuous stress.

A practice of Resilience Engineering must comprise the following critical components:
- Ways to analyse, measure and monitor the resilience of organisations in their operating environment.
- Tools and methods to improve an organisation's resilience vis-à-vis the environment.
- Techniques to model and predict the short- and long-term effects of change and decisions on risk.

If you want to know more about RE ...
