Chapter coverage:
System failure
Failure detection and analysis
Improving process reliability
Recovery
© Nigel Slack, Stuart Chambers & Robert Johnston, 2004 Operations Management, 4E: Chapter 19
19.2
Failure
• There is always a chance that things might go wrong; we
must accept this, not ignore it.
• Critical failure:
– Loss of customers
– High downtime
– High repair cost
– Injury or loss of life (damaging company reputation)
• Non-critical failure: lesser effect
• Organizations must discriminate between failures and give
priority to critical ones. This requires understanding why
things fail and how to measure the impact of failure.
19.3
Failure as an Opportunity
• All failures can be traced back to some kind of human
failure.
– A machine failure might have been caused by
someone’s poor design or maintenance.
– A delivery failure might have been someone’s error in
managing the supply schedule.
• Failures are rarely a matter of random chance.
– They can be controlled to a certain extent.
– Organizations can learn from failure and change accordingly.
• Failure is an opportunity to examine what went wrong and
plan for its elimination.
19.4
System Failure
Why things fail:
1) Failure resulting from within the operation:
• Design failure
• Facilities failure
• People failure
2) Failure resulting from material or information inputs
• Supplier failure
3) Failure resulting from customer actions
• Customer failure
19.5
Why Things Fail
Design failure:
– Operations may look fine on paper but cannot cope with
real circumstances.
– Type 1: A characteristic of demand was overlooked or
miscalculated.
• A bearing factory is designed to produce 100 bearings
per day but customers demand 125 bearings per
day.
– Type 2: The circumstances under which the operation
has to work are not as expected.
• A factory building designed to house stationary
machinery fails when it is used to house a
vibrating machine.
19.6
Why Things Fail
Facilities failure:
– All facilities (machines, equipment, buildings, fittings)
are liable to ‘break down’.
– Type 1: Partial breakdown
• Worn-out carpet in a hotel
• Machine can only run at half its normal rate
– Type 2: Complete breakdown
• Sudden stop of operation
– It is the effect of the breakdown that is important: some
breakdowns could paralyse the whole operation.
– Some failures have a significant cumulative impact.
19.7
Why Things Fail
People failure:
– Type 1: ‘Errors’ are mistakes in judgement.
• A manager’s decision to continue running the plant
with a partially failed heat exchanger resulted in a
more expensive complete breakdown.
– Type 2: ‘Violations’ are acts which are contrary to
defined operating procedures.
• A machine operator’s failure to lubricate the
bearings of the motor resulted in the bearings
overheating and failing.
19.8
Why Things Fail
Supplier failure:
– A supplier’s failure to:
• deliver,
• deliver on time, or
• deliver quality goods and services
can lead to failure within an operation.
Customer failure:
– Customer failure can result when customers misuse
products and services.
• Example: Someone loading a 14 kg washing machine
with 18 kg of clothes will cause the machine to fail.
19.9
Measuring Failure
– There are three main ways of measuring failure:
• Failure rates – how often a failure occurs
• Reliability – the chances of failure occurring
• Availability – the amount of available useful
operating time
19.10
Measuring Failure
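• Failure rate (FR), measured either against the products tested
or against operating time:
$$\text{FR} = \frac{\text{number of failures}}{\text{total number of products tested}}$$
$$\text{FR} = \frac{\text{number of failures}}{\text{operating time}}$$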
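The two failure-rate measures can be sketched in a few lines of Python. This is only an illustration: the function names and the test figures (4 failures among 50 products over 800 operating hours) are hypothetical and do not come from the slides.

```python
def failure_rate_per_item(failures: int, items_tested: int) -> float:
    """Failure rate as a proportion of the products tested."""
    if items_tested <= 0:
        raise ValueError("items_tested must be positive")
    return failures / items_tested


def failure_rate_over_time(failures: int, operating_time: float) -> float:
    """Failure rate as failures per unit of operating time."""
    if operating_time <= 0:
        raise ValueError("operating_time must be positive")
    return failures / operating_time


# Hypothetical test data (an assumption, not from the slides):
# 4 failures among 50 products, observed over 800 operating hours.
fr_items = failure_rate_per_item(4, 50)       # failures per product tested
fr_time = failure_rate_over_time(4, 800.0)    # failures per operating hour
print(f"{fr_items:.2%} of products failed; {fr_time:.4f} failures per hour")
```

The first measure suits batch testing of products; the second suits equipment that runs continuously.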
19.11
Measuring Failure
Failure over time – the ‘bath-tub’ curve
• At different stages during the life of anything, the
probability of it failing will be different.
• The failure pattern of most physical entities follows the
bath-tub curve.
19.13
Bath-Tub Curve
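[Figure: bath-tub curve. Failure rate plotted against time across
three stages: infant-mortality stage, normal-life stage and
wear-out stage, with times x and y marked on the time axis.]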
19.14
Reliability
– Measures the probability that a system, product or service
will perform as expected over time.
– Values between 0 and 1 (0 to 100% reliability)
– Used to relate the reliability of the parts of a system to the
reliability of the whole system.
• If components in a system are all interdependent, a
failure in any individual component will cause the
whole system to fail.
• Hence, the reliability of the whole system, Rs, is:
Rs = R1 × R2 × R3 × … × Rn
Where: R1 = reliability of component 1
R2 = reliability of component 2
R3 = reliability of component 3
Etc…
19.15
Worked Example
An automated pizza-making machine in a food manufacturer’s factory
has five major components, with individual reliabilities (the probability
of the component not failing) as follows:
Dough mixer Reliability = 0.95
Dough roller and cutter Reliability = 0.99
Tomato paste applicator Reliability = 0.97
Cheese applicator Reliability = 0.90
Oven Reliability = 0.98
If one of these parts of the production system fails, the whole system
will stop working. Thus the reliability of the whole system is:
Rs = 0.95 × 0.99 × 0.97 × 0.90 × 0.98 = 0.805
Worked Example
Notes:
– The reliability of the whole system is only about 0.8 even
though the reliability of every individual component is higher.
– If the system had more components, its reliability would be
lower.
– E.g. for a system with 10 components having reliability of
0.99 each, the reliability of the system is about 0.9, BUT if
the system has 50 components having reliability of 0.99 each,
the reliability of the system falls to about 0.6 (0.99^50 ≈ 0.605).
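The product rule for interdependent components is easy to check numerically. The sketch below uses the component reliabilities from the pizza-making example; the function name is an illustrative choice, and `math.prod` assumes Python 3.8 or later.

```python
import math


def system_reliability(component_reliabilities):
    """Reliability of a system that fails if any one component fails."""
    values = list(component_reliabilities)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie between 0 and 1")
    return math.prod(values)


# Mixer, roller and cutter, tomato paste applicator, cheese applicator, oven.
pizza_machine = [0.95, 0.99, 0.97, 0.90, 0.98]
print(round(system_reliability(pizza_machine), 3))   # 0.805
print(round(system_reliability([0.99] * 10), 3))     # 0.904
print(round(system_reliability([0.99] * 50), 3))     # 0.605
```

Running the same function on 10 and then 50 components of reliability 0.99 shows how quickly system reliability falls as components are added.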
19.17
Availability
– Availability is the degree to which the operation is
ready to work.
– An operation is not available if it has either failed or is
being repaired following a failure.
Availability (A):
$$A = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
Where
MTBF = mean time between failures
MTTR = mean time to repair
$$\text{MTBF} = \frac{\text{operating hours}}{\text{number of failures}}$$
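As a rough illustration of the availability calculation, the sketch below derives MTBF from operating hours and a failure count, then combines it with MTTR. The function names and the figures (900 operating hours, 9 failures, 5 hours mean repair time) are hypothetical and not taken from the slides.

```python
def mean_time_between_failures(operating_hours: float, number_of_failures: int) -> float:
    """MTBF = operating hours / number of failures."""
    if number_of_failures <= 0:
        raise ValueError("number_of_failures must be positive")
    return operating_hours / number_of_failures


def availability(mtbf: float, mttr: float) -> float:
    """A = MTBF / (MTBF + MTTR)."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("mtbf must be positive and mttr non-negative")
    return mtbf / (mtbf + mttr)


# Hypothetical figures (assumptions for illustration only).
mtbf = mean_time_between_failures(900.0, 9)   # 100 hours between failures
a = availability(mtbf, 5.0)                   # 100 / (100 + 5)
print(f"MTBF = {mtbf:.0f} h, availability = {a:.3f}")
```

Availability improves either by failing less often (higher MTBF) or by repairing faster (lower MTTR).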
19.18
Improving system reliability: stopping things going wrong
Recovery: coping when things do go wrong
19.22
Failure analysis:
1. Accident investigation
• Trained staff analyse the cause of the accident.
• Make recommendations to minimize or eradicate the chance of
the failure happening again.
• Use specialized investigation techniques suited to the type
of accident.
2. Product liability
• Ensure all products are traceable.
• Products can be traced back to the process, the components
from which they were produced and the supplier who
supplied them.
• Goods can be recalled if necessary.
19.23
3. Complaint analysis
• Complaints and compliments are recorded and taken
seriously.
• Cheap and easily available source of information
about errors.
• Involves tracking number of complaints over time.
4. Critical incident analysis
• Requires customers to identify the elements of
products or services they found either satisfying or
not satisfying.
• Especially used in service operations.
19.26
Fault-tree analysis for below-temperature
food being served to customers
[Figure: fault tree. Top event: food served to customer is below
temperature. Contributing events shown: food is cold; plate is
cold. Key: AND node, OR node.]
19.27
To be continued…
19.30
Fail-safeing
• Called poka-yoke in Japan.
• Based on the principle that human mistakes are to some
extent inevitable.
• The objective is to prevent mistakes from becoming
defects.
• Poka-yokes are simple (preferably inexpensive) devices
or systems which are incorporated into a process to
prevent inadvertent operator mistakes resulting in a
defect.
19.31
Maintenance
• Maintenance is the method used by organizations to
avoid failure by taking care of their physical facilities.
• Important to organizations whose physical facilities
play a central role in creating their goods and services.
Benefits of maintenance:
• Enhanced safety
• Increased reliability
• Higher quality
• Lower operating costs
• Longer life span
• Higher end value
19.32
Benefits of Maintenance
• Enhanced safety – Well maintained facilities are less
likely to behave in an unpredictable or non-standard
way, or fail outright, all of which would pose a hazard
to staff.
• Increased reliability – This leads to less time lost while
facilities are repaired, less disruption to the normal
activities of the operation, and less variation in output
rates.
• Higher quality – Badly maintained equipment is more
likely to perform below standard and cause quality
errors.
19.33
Benefits of Maintenance
• Lower operating costs – Many pieces of process
technology run more efficiently when regularly
serviced.
• Longer life span – Regular care prolongs the effective
life of facilities by reducing the problems in operation
whose cumulative effect causes deterioration.
• Higher end value – Well maintained facilities are
generally easier to dispose of into the second-hand
market.
19.34
Approaches to maintenance
1. Run to breakdown (RTB)
• Allowing the facilities to continue operating until
they fail.
• Maintenance work is performed after failure has
taken place.
• Appropriate when the effect of failure is neither
catastrophic nor frequent, e.g. it does not paralyze the
whole operation.
• Regular checks are sufficient.
19.35
Approaches to maintenance
2. Preventive maintenance (PM)
• Attempts to eliminate or reduce the chances of
failure by servicing the facilities at pre-planned
intervals.
• Used when the consequence of failure is
considerably more serious.
• Can be used to detect impending failures.
Remedial actions can be planned for, thus
improving overall efficiency.
• The useful life of certain components can be
increased beyond their recommended life span.
19.37
Approaches to maintenance
4. Mixed maintenance strategies
• Most operations adopt a mixture of these
approaches because different elements of their
facilities have different characteristics.
19.38
Approaches to maintenance
5. Run to breakdown versus preventive maintenance
• The more frequently preventive maintenance is
carried out, the lower the chance of a facility breaking
down.
• The cost of preventive maintenance is often high.
19.40
Cost of Breakdown
[Figure: costs of breakdown.]
19.41
[Figure: costs plotted against the amount of preventive
maintenance. Curves: cost of providing preventive maintenance,
cost of breakdowns, and total cost. The ‘optimum’ level of
preventive maintenance is marked.]
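The trade-off between the cost of providing preventive maintenance and the cost of breakdowns can be sketched numerically. Everything specific here is an assumption made for illustration: the linear preventive-maintenance cost, the breakdown cost that falls as maintenance increases, the constants 10 and 1000, and the reading of the ‘optimum’ as the level with the lowest total cost.

```python
def pm_cost(level: int) -> float:
    """Assumed cost of providing preventive maintenance: rises with the amount done."""
    return 10.0 * level


def breakdown_cost(level: int) -> float:
    """Assumed cost of breakdowns: falls as more preventive maintenance is done."""
    return 1000.0 / (1 + level)


def total_cost(level: int) -> float:
    """Total cost is the sum of the two curves."""
    return pm_cost(level) + breakdown_cost(level)


# Search a hypothetical range of maintenance levels for the lowest total cost.
levels = range(0, 51)
optimum = min(levels, key=total_cost)
print(optimum, total_cost(optimum))  # 9 190.0
```

Changing the shape of either curve moves the optimum, so the result depends entirely on the assumed cost functions.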
19.43
[Figure: actual cost of breakdowns (vertical axis: costs).]
19.44
[Figure: total cost and cost of breakdowns (vertical axis: costs).]
19.45
Notes:
• In practice the cost of PM does not increase as steeply as
indicated in Model 1.
– Model 1 assumes that all maintenance jobs must be
carried out by a specialist maintenance team, but Model
2 recognizes that operators themselves can carry out
simple, in-process maintenance. Etc…
• The cost of breakdown could be higher than indicated in
Model 1.
– A breakdown may cost more than the cost of repair
and the cost of the stoppage itself: a stoppage can
also undermine the stability of the operation.
19.47
6. Failure distributions
• The shape of the failure probability distribution of a
facility can determine if it benefits from preventive
maintenance.
[Figure: probability of failure plotted against time for Machine A
and Machine B, with times x and y marked on the time axis.]
19.48
Notes:
• Machine A
– The probability that it will break down before time x is
relatively low.
– It has a high probability of breaking down between
times x and y.
– If preventive maintenance is carried out just before
point x, the chances of breakdown can be reduced.
19.49
Notes:
• Machine B
– It has a relatively high probability of breaking down at
any time.
– Its failure probability increases gradually as it passes
through time x.
– Carrying out preventive maintenance at point x, or at any
other point, cannot dramatically reduce the probability of
failure.
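The contrast between the two machines can be illustrated by asking how likely each is to fail before a planned maintenance point x. The distributions below are assumptions chosen only to mimic the two shapes: Machine A's failure times are taken as normally distributed around a typical life, and Machine B's as exponential (equally likely to fail at any age), which simplifies the gradually rising curve described above. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def prob_failure_before_normal(x: float, mean: float, sd: float) -> float:
    """P(failure before x) when failure times cluster around a typical life."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)


def prob_failure_before_exponential(x: float, mean_life: float) -> float:
    """P(failure before x) when failure is equally likely at any age."""
    return 1.0 - math.exp(-x / mean_life)


x = 80.0  # hypothetical preventive maintenance point, in operating hours
p_a = prob_failure_before_normal(x, mean=100.0, sd=10.0)    # Machine A (assumed)
p_b = prob_failure_before_exponential(x, mean_life=100.0)   # Machine B (assumed)
print(f"Machine A fails before x with probability {p_a:.3f}")
print(f"Machine B fails before x with probability {p_b:.3f}")
```

Under these assumptions Machine A rarely fails before x, so maintenance at x pre-empts most of its failures, whereas Machine B fails before x more often than not, so scheduled maintenance at x has much less to offer.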
19.53
Example:
Suppose the screws on a machine become loose. Each week
it jams up and is passed to maintenance to be fixed.
• A ‘repair-level’ maintenance engineer will simply
repair it and hand it back to production.
• A ‘prevention-level’ maintenance engineer will spot
the weekly pattern to the problem and tighten the
screws in advance of their loosening.
• An ‘improvement-level’ maintenance engineer will
recognize that there is a design problem and modify
the machine so that the problem cannot recur.
19.58
Example:
Take the process illustrated in Slide 19.59. This is a simple
shredding process which prepares vegetables prior to
freezing. The part of the process which requires the most
maintenance attention is the cutter sub-assembly. However,
the cutters have several modes of failure:
1) They require changing because they have worn out
through usage.
2) They have been damaged by small stones entering the
process.
3) They have shaken loose because they were not fitted
correctly.
19.59 One part in one process can have several
different failure modes, each of which
requires a different approach
[Figure: the shredding process and its cutters, with three plots of
failures against time, each paired with a solution. Recoverable
labels: cutter ‘wear out’ failure pattern (solution: preventive
maintenance before end of useful life); cutter ‘shake loose’
failure pattern.]
19.60
The End