Achieving System Safety by Resilience Engineering IET - System - Safety - Hollnagel

Achieving System Safety by Resilience Engineering
Erik Hollnagel
Industrial Safety Chair, École des Mines de Paris, France. E-mail: erik.hollnagel@cindy.ensmp.fr
Professor, University of Linköping, Sweden. E-mail: eriho@ida.liu.se
Erik Hollnagel 2006
Safety as a non-event
SAFE SYSTEM = NOTHING UNWANTED HAPPENS
Prevention of unwanted events: reduce likelihood.
Protection against unwanted outcomes: reduce consequences.
[Figure: Daily operation (status quo) -> Unexpected event (accidents, incidents) -> Unwanted outcome]
Safety management must prevent/protect against both KNOWN and UNKNOWN risks. Safety management requires THINKING about how accidents can HAPPEN
What has happened? What may happen?

Looking at the past (accident model) | Looking into the future (risk model)
Simple linear | Component failures
Complex linear | Combination of failures and degraded defences
Non-linear* | Performance variability coincidences
* outcomes are not proportional to inputs, and cannot be derived from a simple combination of inputs
Simple, linear cause-effect model

Assumption: Accidents are the (natural) culmination of a series of events or circumstances, which occur in a specific and recognisable order.
Domino model (Heinrich, 1930)
Hazards-risks: Due to component failures (technical, human, organisational), hence looking for failure probabilities (event tree, PRA/HRA).
Consequence: Accidents are prevented by finding and eliminating possible causes. Safety is ensured by improving the organisation's ability to respond.
Complex, linear cause-effect model

Assumption: Accidents result from a combination of active failures (unsafe acts) and latent conditions (hazards).
Swiss cheese model (Reason, 1990)
Hazards-risks: Due to degradation of components (organisational, human, technical), hence looking for drift, degradation and weaknesses.
Consequence: Accidents are prevented by strengthening barriers and defences. Safety is ensured by measuring/sampling performance indicators.
Non-linear accident model

Assumption: Accidents result from unexpected combinations (resonance) of normal performance variability.
[Figure: network of coupled functions, each described by six aspects: Input (I), Output (O), Precondition (P), Resource (R), Time (T) and Control (C). Labels shown include FAA, maintenance oversight, certification, aircraft design and aircraft design knowledge, interval approvals, redundant design, end-play checking (allowable and excessive end-play), lubrication (grease), jackscrew replacement, jackscrew up-down movement, horizontal stabilizer movement (limiting, controlled, limited), aircraft pitch control, mechanics, procedures, equipment, expertise and high workload.]
Functional Resonance Accident Model
Hazards-risks: Emerge from combinations of normal variability (socio-technical system), hence looking for ETTO* and sacrificing decisions.
Consequence: Accidents are prevented by monitoring and damping variability. Safety requires constant ability to anticipate future events.
* ETTO = Efficiency-Thoroughness Trade-Off
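The idea that an unwanted outcome can arise from a combination of normal variability, without any single function failing, can be illustrated with a small Monte Carlo sketch. This is not the FRAM method itself, which is qualitative. It assumes, purely for illustration, that each function's deviation is uniformly distributed within its own tolerance and that deviations simply add up; the function name `resonance_rate` and all parameter values are hypothetical.

```python
import random


def resonance_rate(n_functions=5, spread=1.0, limit=3.0, trials=20000, seed=0):
    """Estimate how often normal variability combines into an excursion.

    Each function deviates uniformly within [-spread, spread], which is
    always inside the safety limit on its own. The combined effect is
    modelled as a plain sum (an illustrative assumption, not part of FRAM).
    Returns (rate of single-function excursions, rate of combined excursions).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    single_exceed = 0
    combined_exceed = 0
    for _ in range(trials):
        deviations = [rng.uniform(-spread, spread) for _ in range(n_functions)]
        # No individual function can exceed the limit when spread < limit.
        if any(abs(d) > limit for d in deviations):
            single_exceed += 1
        # The combination occasionally does.
        if abs(sum(deviations)) > limit:
            combined_exceed += 1
    return single_exceed / trials, combined_exceed / trials


if __name__ == "__main__":
    single, combined = resonance_rate()
    print(f"single-function excursions: {single:.4f}")
    print(f"combined excursions:        {combined:.4f}")
```

With these assumed values no single function ever leaves its limits, yet the combination does so in a small fraction of trials. Looking for component failures would find nothing to fix, which is why the slide points to monitoring and damping variability instead.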
Safety management and control

The purpose of safety management is to ensure that nothing unwanted happens. An SMS must therefore be able to control a dynamic process or organisation to ensure that performance remains within predetermined safety limits.
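[Figure: feedback control loop. Setpoint -> Controller and actuating device -> Process -> Output. A Sensor returns the output to the controller; a Disturbance acts on the process.]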
Key concepts:
- Process model (nature of activity)
- Measurements (performance indicators, output)
- Possibilities for control (means of intervention)
- Nature of threats (disturbances, noise)
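The loop of setpoint, controller, process, sensor and disturbance can be sketched as a few lines of simulation. This is a minimal illustration, assuming a simple proportional controller and a process whose output just accumulates corrections and disturbances; the function name `simulate_feedback` and the gain value are hypothetical, not something the slides prescribe.

```python
def simulate_feedback(setpoint, disturbances, gain=0.5):
    """Simulate a proportional feedback loop, one step per disturbance.

    Assumed toy model: the process output changes by the controller's
    correction plus whatever disturbance arrives in that step.
    Returns the output after each step.
    """
    output = setpoint  # start in the desired state
    history = []
    for disturbance in disturbances:
        measured = output              # sensor: observe the current output
        error = setpoint - measured    # controller: compare with the setpoint
        correction = gain * error      # actuating device: act on the error
        output = output + correction + disturbance  # process under disturbance
        history.append(output)
    return history


if __name__ == "__main__":
    # A single disturbance of +10 at the first step, then quiet.
    trace = simulate_feedback(setpoint=100.0, disturbances=[10.0] + [0.0] * 9)
    print([round(x, 2) for x in trace])
```

The deviation is halved at every step after the disturbance, so the output drifts back towards the setpoint. The correction only starts once the deviation has already been measured, which is the defining limitation of purely compensatory control.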
Safety management as feedback control

[Figure: the Safety Management System as the controller in a feedback loop. The required safety level is the setpoint; the SMS acts on the process (internal variability), which is exposed to the environment (external variability); performance is fed back through performance indicators, subject to a reporting threshold.]
- How can changes be brought about? What are the control options/tools?
- Nature of threats: regular, irregular, unexampled
- Accident model: simple linear, complex linear, non-linear
- Delays in effects? Delays in feedback?
Knowing what may happen

Murphy's law: everything that can go wrong sooner or later will go wrong. If there's more than one way to do a job and one of those ways will end in disaster, then somebody will do it that way.
[Figure: probability (p) plotted against consequence, dividing events into known (safe) and unknown (unsafe).]
Requisite imagination: Where is the cut-off point?
There is an infinite number of ways in which something can go wrong. The problem is to find those that are unlikely yet potentially serious.
Regular threats
(Westrum, 2006)
Events that occur so often that the organisation can learn how to respond.
Examples:
- Medication errors that only affect a single patient.
- Transportation accidents (collision between vehicles).
- Process or component failure (loss of mass, loss of energy).
[Figure: probability (p) plotted against cost, with p = 0.01 marked.]
Regular threats are covered by standard methods (HAZOP, Fault Trees, FMECA, etc.).
Their likelihood and severity (cost) are so high that they must be dealt with. Solutions can be based on standard responses, typically elimination or barriers.
Irregular threats
(Westrum, 2006)
One-off (singular) events, but so many, so rare, and so different that a standard response is impossible.
Examples:
- Apollo 13 moon mission accident.
- Epidemics (BSE, H5N1).
- Simultaneous loss of main and back-up systems.
[Figure: probability (p) plotted against cost, with p = 0.01 marked.]
Irregular threats are imaginable but usually completely unexpected. They are discounted by standard methods.
Their likelihood is so low that defences are not cost effective, even if consequences are serious. Solutions require interaction and improvisation. Standard responses are insufficient.
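The cost-effectiveness argument can be made concrete with a small expected-loss comparison. This is only a sketch of the conventional reasoning the slide describes, assuming that expected loss is probability times cost and that a defence removes the whole risk; the `Threat` class, the helper names and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    probability: float  # likelihood per period (illustrative value)
    cost: float         # loss if the event occurs (illustrative units)


def expected_loss(threat: Threat) -> float:
    """Conventional risk measure: likelihood multiplied by consequence."""
    return threat.probability * threat.cost


def defence_is_cost_effective(threat: Threat, defence_cost: float) -> bool:
    """A defence 'pays off' only if it costs less than the loss it averts.

    Assumes, for simplicity, that the defence removes the risk entirely.
    """
    return expected_loss(threat) > defence_cost


if __name__ == "__main__":
    regular = Threat("regular threat", probability=0.1, cost=1_000.0)
    irregular = Threat("irregular threat", probability=1e-5, cost=1_000_000.0)
    for threat in (regular, irregular):
        verdict = defence_is_cost_effective(threat, defence_cost=50.0)
        print(f"{threat.name}: defence worthwhile = {verdict}")
```

Under these assumed numbers the regular threat justifies a standard defence, while the irregular threat does not, even though its consequence is a thousand times larger. That is the sense in which standard methods discount irregular threats, and why the slide calls for interaction and improvisation rather than more barriers.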
Unexampled events
(Westrum, 2006)
Events that are virtually impossible to imagine and which exceed the organisation's collective experience.
Examples:
- Chernobyl.
- New Orleans flooding (2005).
- Attack on the WTC (9/11).
[Figure: probability (p) plotted against cost, with p = 0.01 marked.]
Even when unexampled events are imaginable, they are normally discounted as impossible. Their likelihood is so low that defences are not viable, even if consequences are catastrophic. Solutions require the ability to cope, i.e., dynamically to self-organize, formulate and monitor responses.
Reactive organisation
[Figure: two timelines.
Without safety planning: Accident -> Surprise! -> Scrambling for action.
With safety planning (preparing for regular threats): Accident -> Activate ready-made plans.]
Interactive (attentive) organisation

[Figure: timeline. Safety planning (preparing for irregular threats) -> Accident -> Situation assessment, quick replanning -> Evaluation, learning.]
Prepared and alert. Looking for expected situations.
Occasional health checks using pre-defined indicators.

Proactive (resilient) organisation

[Figure: timeline. Safety planning (preparing for unexampled events) -> Accident -> Situation assessment, reorganisation -> Evaluation, learning.]
Alternative ways of functioning.
Alert and observant.
Constantly self-critical and inquisitive.
Some examples
Type of organisation, with examples:
Reactive (brittle, no resilience)
- Mont Blanc Tunnel fire (March 26 1999)
- Swedish government after Tsunami (December 26 2004)
- Homeland Security and FEMA after Hurricane Katrina (August 29 2005)
Interactive (robust, partial resilience)
- The aviation industry
- Nuclear power plants
- Hospitals
Proactive (full resilience)
- Toyota (as innovative manufacturer)
- People of London after bombing, July 7 2005
- Israeli hospitals (bus bombings)
Success and failure

Failure is normally explained as a breakdown or malfunctioning of a system and/or its components. This view assumes that success and failure are of a fundamentally different nature.
- Individuals and organisations must adjust to the current conditions in everything they do. Because information, resources and time are finite, such adjustments will always be approximate.
- Success is due to the ability of organisations, groups and individuals correctly to make these adjustments, in particular to anticipate failures before they occur.
- Failure is due to the absence of that ability, either temporarily or permanently.
- Safety must encompass strengthening this ability, rather than just avoiding or eliminating failures.
Surprises and responses

Reactive
- Organisation's view on surprises: Disturbances, or disrupting events, which challenge the proper functioning of a process.
- Focus of organisation's response: Try to keep process under control and ensure people do not exceed given limits. Improve ability to detect and to respond when challenged.
Interactive (attentive)
- Organisation's view on surprises: Exceptions that must be regimented. Uncertainty about the future.
- Focus of organisation's response: Prepare routines and plans. Identify the variability that the organisation should be aware of; ensure ability to cope with these variations.
Proactive (resilient)
- Organisation's view on surprises: A need constantly to update definitions of the difference between success and failure. A recognition that models and plans are likely to be incomplete or wrong, despite best efforts.
- Focus of organisation's response: Search for the boundaries of own assessments in order to learn and revise.
From reactive to proactive control
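[Figure: control loop combining anticipatory control (feedforward) with compensatory control (feedback). The target state (setpoint) and the sensed output drive the feedback path; the disturbance is anticipated on the feedforward path; both act on the Process, which produces the Output read by the Sensor.]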
You cannot drive a car by looking in the rear-view mirror! The main tool for looking ahead should NOT be to look back
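The difference between compensatory and anticipatory control can be sketched by adding a feedforward term to a simple feedback loop. This is an illustrative toy model, assuming a proportional feedback controller and a feedforward action that simply counteracts the predicted disturbance; the function name `worst_deviation` and all values are hypothetical.

```python
def worst_deviation(setpoint, disturbances, predicted, gain=0.5):
    """Return the largest deviation from the setpoint over the run.

    disturbances: what actually hits the process at each step.
    predicted:    what the organisation anticipated for each step
                  (all zeros means purely reactive, feedback-only control).
    """
    output = setpoint
    worst = 0.0
    for actual, expected in zip(disturbances, predicted):
        feedback = gain * (setpoint - output)  # compensatory: uses past output
        feedforward = -expected                # anticipatory: uses the forecast
        output = output + feedback + feedforward + actual
        worst = max(worst, abs(output - setpoint))
    return worst


if __name__ == "__main__":
    actual = [0.0, 0.0, 10.0, 0.0, 0.0]
    print(worst_deviation(100.0, actual, predicted=[0.0] * 5))                  # feedback only
    print(worst_deviation(100.0, actual, predicted=[0.0, 0.0, 6.0, 0.0, 0.0]))  # partial anticipation
    print(worst_deviation(100.0, actual, predicted=actual))                     # perfect anticipation
```

In this toy model the feedback-only run takes the full disturbance before it can react, a partly correct forecast reduces the excursion, and a perfect forecast removes it. Since forecasts are rarely perfect, the feedback path is still needed to absorb whatever the anticipation missed.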
SMS as feedforward control

[Figure: the Safety Management System in a control loop with anticipation. Safety values and targets (customers, regulators) feed the SMS; anticipation covers the environment (external variability: irregularities, disturbances, threats); the SMS acts on the process (internal variability); performance is fed back through performance indicators, subject to a reporting threshold.]
- How can changes be brought about? What are the control options/tools?
- Nature of threats: regular, irregular, unexampled
- Accident model: simple linear, complex linear, non-linear
- Delays in effects? Delays in feedback?
Components of resilience
[Figure: three components of resilience set against dynamic developments, connected by learning and updating, with strategic decision making as a further label.]
- Knowledge: knowing what to expect (anticipation)
- Competence: knowing what to look for (attention)
- Resources: knowing what to do (rational response)
Resilience and safety management

Resilience is the intrinsic ability of an organisation to keep or recover a stable state, thereby allowing it to continue operations after a major mishap or in the presence of continuous stress.
A practice of Resilience Engineering must comprise the following critical components:
- Ways to analyse, measure and monitor the resilience of organisations in their operating environment.
- Tools and methods to improve an organisation's resilience vis-à-vis the environment.
- Techniques to model and predict the short- and long-term effects of change and decisions on risk.
If you want to know more about RE ...
