Overview
Liviu Miclea 2016
Safety – Complexity – Responsibility
How does it fit?
Examples: Space
1986: Space Shuttle Challenger explodes shortly after launch.
Examples: Military
Examples: Road
2003: The German toll system "Toll Collect" cannot start service, among other things due to software problems in the vehicle on-board units.
2004: DaimlerChrysler recalls 680,000 cars because of problems with the electronically controlled brake system.
2004: Chrysler recalls 2.7 million cars due to a defect in the automated gear-shift system.
2009: Toyota recalls 3.8 million cars due to problems with the accelerator pedal and with the ABS software.
Example: Energy
Examples: Healthcare
1985–1987: Due to software errors, five patients die after radiation overdoses applied by the Therac-25.
2001: In Panama, five patients die due to radiation overdoses caused by incomplete and misunderstood software application conditions.
2010: The NY Times reports that, after computer crashes, radiation overdoses were applied due to deleted or outdated information.
Examples: Maritime
1987: The Herald of Free Enterprise ferry leaves harbour with its bow doors open and capsizes; 193 fatalities.
1994: The Estonia ferry sinks in the Baltic Sea due to a damaged bow door; 852 fatalities.
Examples: Civil Aviation
Examples: Railway
Complexity: Some trends
Increasingly shortened technology innovation cycles
Increasing performance and complexity of technical systems
Shortened product development cycles due to intensified international competition and time pressure
Ease of modification and complex interaction of components often lead to unexpected emergent behaviour, in particular in software-intensive systems
Accidents involve systems as a whole although the components are checked, tested, assessed, certified, proven in use, ... ("system accidents")
Superficially, the causes often seem to be software or human errors...
What is safety culture?
Goals, objectives and restrictions
Terminology
Hazards are caused by failures or failure conditions.
Fault: abnormal condition that may cause a reduction in, or loss of, the capability of a functional unit to perform its intended function.
Failure: inability of an item to perform its intended function.
Failure condition or failure mode: often used to identify a specific failure.
Compliance vs. Assurance
Motivation
Assumptions:
A zero-risk (complex) technological system is not feasible.
Any risk analysis makes (implicitly or explicitly) decisions about the optimal use of (limited) financial resources as well as the minimization of the expected damage resulting from use of a system.
Decisions are also taken without any risk analysis.
Maximization of technical safety generally does not lead to minimal risk.
A system is safe if the risk associated with its use is below the tolerable risk.
A safe system is not a zero-risk system.
Normative Requirements – IEC 62278 (EN 50126)
Risk: Definition
Example: risk of injury (rom: ranire) for a 22-year-old male Swiss car driver
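IEC 62278 defines risk as the combination of the frequency of occurrence of harm and the severity of that harm. A minimal numerical sketch of that combination; all figures below are assumed for illustration and are not the Swiss driver statistics from the slide:

```python
# Risk = frequency of occurrence of harm x severity/probability of the harm
# (the combination used in IEC 62278). All numbers are illustrative.

accident_rate = 2e-4           # accidents per driver-year (assumed)
p_injury_given_accident = 0.3  # probability an accident causes injury (assumed)

individual_risk = accident_rate * p_injury_given_accident
print(f"individual risk of injury: {individual_risk:.1e} per year")
```

The result would then be compared against a risk tolerability criterion, as on the following slides.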
Hazard (rom: pericol, risc)
"Hazard: A physical situation with a potential for human injury (ranire)." (IEC 62278)
Alternative:
"A hazard is a state or set of conditions of a system (or an object) that, together with other conditions in the environment of the system (or object), will lead inevitably to an accident (loss event). [...] A hazard is defined with respect to the environment of the system or component. [...] What constitutes a hazard depends upon where the boundaries of the system are drawn. [...]" (Leveson: Safeware, 1995)
Hazards, Causes and Accidents
[Diagram: causes at subsystem level lead to hazards at subsystem level and then at system level, which in turn lead to accidents. Causes lie inside the subsystem and system boundaries; the consequences (accidents) lie outside the system boundary.]
Basic process (based on EN 50129)
Risk analysis:
1. System definition
2. Hazard identification
3. Consequence analysis
4. Loss analysis
5. Risk assessment
Result: hazards (H) and tolerable hazard rates (THR)
Hazard control (apportionment of qualitative and quantitative requirements):
6. Causal analysis
7. CCF analysis
8. SIL allocation
Functions: apportionment of quantitative requirements via the SIL table -> functions with THR and SIL
Design + implementation: components with failure rates (FR) and SIL
Definition of safety requirements
[Flow: Starting from the system definition, analyse operation and identify hazards; estimate hazard rates; identify hazard consequences (accidents, near misses, safe states) and record them in a hazard log; determine the risk and, using risk tolerability criteria, derive tolerable hazard rates (THR); the results feed the system requirements specification (safety requirements) and the system design analysis.]
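The information gathered in the flow above can be sketched as a data structure. The class, field names and figures below are hypothetical, chosen only to illustrate a hazard-log entry and its THR check:

```python
# A hypothetical hazard-log entry: the hazard, its estimated rate, its
# consequences, and whether the estimated rate meets the tolerable
# hazard rate (THR). All names and values are illustrative.
from dataclasses import dataclass, field

@dataclass
class HazardLogEntry:
    hazard: str
    estimated_rate: float  # estimated hazard rate, per hour
    thr: float             # tolerable hazard rate, per hour
    consequences: list = field(default_factory=list)  # accidents, near misses, safe states

    def tolerable(self) -> bool:
        # The hazard is acceptable if its estimated rate does not exceed the THR.
        return self.estimated_rate <= self.thr

entry = HazardLogEntry(
    hazard="undetected failure of light signals",  # example cause from a later slide
    estimated_rate=5e-6,
    thr=7e-6,
    consequences=["accident: collision at crossing", "near miss"],
)
print(entry.tolerable())  # -> True
```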
Process view (for each hazard)
[Flow: The risk analysis yields hazards and THRs; a hazard and causal analysis is performed on the system functions; using the SIL table, THR and SIL are determined for the safety-related functions, together with application conditions; failure rates (FR) are determined for the system elements from the system design description; the results feed the subsystem requirements specification.]
Independence
Functional independence implies that there are neither systematic nor random faults which cause a set of functions to fail simultaneously.
[Fault-tree example (level crossing): a system-level hazard with its THR is apportioned to independent causes, e.g. undetected failure of the switch-in function (1E-7/h), undetected failure of the distant signal (7E-6/h), undetected failure of the light signals (7E-6/h) and undetected failure of the barriers (7E-6/h). The apportionment rests on independence checks, architecture assumptions and warnings/application conditions. The SIL table is then used to determine THR + SIL for the subsystems.]
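Under this independence assumption the system-level hazard rate is approximately the sum of the rates of the independent causes, so an apportionment can be checked by summing. A sketch using the rates from the fault-tree example; the system-level THR value here is assumed purely for illustration:

```python
# Check a THR apportionment under functional independence: if the causes
# are independent, the system hazard rate is roughly the sum of the
# apportioned cause rates, which must not exceed the system THR.

THR = 3e-5  # assumed tolerable hazard rate for the system-level hazard, per hour

apportioned = {  # rates from the fault-tree example, per hour
    "undetected failure of switch-in function": 1e-7,
    "undetected failure of distant signal": 7e-6,
    "undetected failure of light signals": 7e-6,
    "undetected failure of barriers": 7e-6,
}

total = sum(apportioned.values())
assert total <= THR, f"apportionment violates THR: {total:.1e} > {THR:.1e}"
print(f"sum of apportioned rates: {total:.2e} /h (THR = {THR:.1e} /h)")
```

Note that this additive check ignores common-cause failures, which is exactly why the process includes a separate CCF analysis.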
SIL table
SAFETY INTEGRITY LEVEL   Tolerable Hazard Rate (THR, per hour and per function)
4                        10^-9 ≤ THR < 10^-8
3                        10^-8 ≤ THR < 10^-7
2                        10^-7 ≤ THR < 10^-6
1                        10^-6 ≤ THR < 10^-5
SILs are assigned to safety functions and are "inherited" by the components implementing the safety functions.
Balance by means of a SIL allocation table.
Procedure described in detail in EN 50129.
Same table as in IEC 61508 (but continuous operation mode only).
The approach is based on heuristics and experience rather than scientific evidence.
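The table is a simple lookup from a THR band to a SIL. A minimal sketch, assuming the continuous-mode THR bands above:

```python
# Map a tolerable hazard rate (THR, per hour and per function) to a safety
# integrity level using the EN 50129 / IEC 61508 continuous-mode bands.

def sil_for_thr(thr_per_hour):
    """Return the SIL for a given THR, or None if THR is outside all bands."""
    bands = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7),
             (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]
    for sil, low, high in bands:
        if low <= thr_per_hour < high:
            return sil
    return None  # e.g. THR >= 1e-5 (no SIL required) or < 1e-9 (beyond SIL 4)

print(sil_for_thr(5e-9))  # -> 4
print(sil_for_thr(3e-7))  # -> 2
```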
Conclusions
Architectures
Single Board Controller Model
1oo1 Architecture
1oo2 Architecture
2oo2 Architecture
1oo1D Architecture
2oo3 Architecture
2oo3 Architecture: Single-Fault Degradation Models
2oo2D Architecture
1oo2D Architecture
Comparing Architectures
Single Board Controller Model Results
- Highest safety -> 1oo2 (lowest PFD), then 2oo3 (PFD ≈ 3× that of 1oo2)
- 2oo2 has the lowest PFS
Comparing Architectures
Single Safety Controller Model Results for All Architectures
- 1oo1D offers better safety than 1oo1, but at the cost of a higher false-trip rate
- Lowest PFD -> 1oo2D with comparison diagnostics (many implementations); drawback: higher PFS
- 2oo3, 1oo2D and 1oo2D with comparison diagnostics -> excellent safety
- 2oo2, 2oo2D, 2oo3, 1oo2D and 1oo2D with comparison diagnostics -> excellent operation without an excessive false-trip rate (low PFS)
- 2oo2D -> the BEST compromise if the automatic diagnostics are excellent (IEC 61508); better than 2oo3
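The PFD rankings above can be reproduced from the simplified average-PFD approximations of IEC 61508-6, neglecting common-cause failures and automatic diagnostics. A sketch, with an assumed dangerous-undetected failure rate and proof-test interval:

```python
# Simplified average-PFD approximations for common voting architectures
# (IEC 61508-6 low-demand formulas, no common cause, no diagnostics).
# lam_du: dangerous-undetected failure rate per hour (assumed value below);
# t_proof: proof-test interval in hours.

def pfd_1oo1(lam_du, t_proof):
    return lam_du * t_proof / 2

def pfd_1oo2(lam_du, t_proof):
    return (lam_du * t_proof) ** 2 / 3

def pfd_2oo2(lam_du, t_proof):
    return lam_du * t_proof

def pfd_2oo3(lam_du, t_proof):
    return (lam_du * t_proof) ** 2  # exactly 3x the 1oo2 value

lam, t = 1e-6, 8760.0  # assumed: 1e-6 /h per channel, annual proof test
results = {name: f(lam, t) for name, f in
           [("1oo1", pfd_1oo1), ("1oo2", pfd_1oo2),
            ("2oo2", pfd_2oo2), ("2oo3", pfd_2oo3)]}
# Ordering matches the slide: 1oo2 < 2oo3 < 1oo1 < 2oo2.
for name, pfd in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: PFD_avg = {pfd:.2e}")
```

With these formulas the 2oo3 PFD is exactly three times that of 1oo2, which is the "3X1oo2" remark on the earlier comparison slide.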