
Reliability

Definition
The reliability of a component or system is the probability that it will be able to perform its function when
required, for a specified time and in a particular environment.
Function?
A component performs its function if it does not fail. However, there may be many levels of functioning (or failing). A motor that will not start has clearly failed by most people's understanding. But what if an indicator light / gauge does not work? Thus the required level of functioning must be clarified.
Environment?
The conditions wherein the component is to be used must be considered, i.e.
o the physical environment (temperature, humidity, dust)
o the user (their skills/knowledge/experience)
o maintenance (servicing schedule etc.)


Time?
The longer a component is in use, the more likely it is to fail. However, time need not be measured in
conventional time units (hours / months / years), it may be distance travelled (cars), machine cycles etc.
Probability?
Only in the most trivial of situations will we be able to assign a deterministic reliability measure. More typically, there is a degree of uncertainty and this is quantified using probability.
Further concepts and terminology
Design life or mission time
The design life is the period of time (however measured) that a component is required to last, or is stipulated to last, e.g. the time period for which it is guaranteed. Guarantee is something of a misnomer though, as the probability of a component surviving for its design life or longer will usually be less than 100%.
Components V System (and sub-systems)
In simple terms, we will take a component to be one of a collection of atomic units that collectively make up our system (or sometimes, device).
For example, our system might be a desktop computer base station and the components might be the motherboard, the hard disk, the CD-ROM drive, the cooling fan and the power supply.
Of course, from the perspective of the manufacturer of these parts, they would regard their particular part as the system, made up from a different set of components. And similarly in turn with the makers of these components.
Conversely, the manager of a production line might see a complete PC as just a single component amongst many that make up his system of interest.
Mean time to failure (MTTF) and mean time between failures (MTBF)
A frequently used metric for a component's reliability is the MTTF (average life for a non-repairable device) or MTBF (average up-time for a repairable device). There may be a subtle difference between the two as some devices, upon repair, may not be restored to their full health and thus the MTBF may be getting shorter with time.

Overview of topics considered in this module


Example
To give an overview of the material covered on this module we will consider a hypothetical example of a
manufacturer who produces cooling fans for computers.
(Diagram: the producer's and the consumer's perspectives on the stages below.)
1. Design
It should be intuitively obvious that building in reliability at the design stage should be an objective in the
development of any new product. FMECA (Failure modes, effects and criticality analysis) is a technique used
to identify, prioritise and eliminate potential failures from a component or system.
2. Prototype testing
By testing a product (until it fails) the reason for failure can be ascertained and used to inform an improvement
in the design, leading to a longer MTTF. The longer the testing, the more the reliability grows. How long
should the testing programme be if a specified MTTF is to be achieved? (Duane models)
3. Guarantee
The MTTF is not sufficient information to quantify the reliability of a component. A light bulb and a cell battery may both have a MTTF of 1,000 hours but their lifespans would be described using very different mathematical models or probability density functions (PDF). In addition to the PDF, the related functions of reliability and hazard are useful in understanding the reliability of a component.
Failure test data (possibly censored) might be obtained to inform the selection of a suitable lifespan model. From this it might be determined that a component has, say, a 99% probability of surviving for 2 years or longer, and so the producer may be happy to offer a two-year guarantee. Of course the consumer may wish to conduct their own investigations to determine the reliability of a component.
4. Sale
Frequently it is not possible to individually test all items output from a producer. The consumer may devise a sampling plan, i.e. select a sample of output from a batch and purchase the entire batch if the number of defective items in the sample is sufficiently small.
5. System Reliability
The consumer may wish to assemble a system consisting of many components, each with their own reliability
measure. It is of interest to know the overall reliability of the system. If this is unacceptably low, the system
may incorporate a degree of redundancy. For example, a dual powered calculator (solar + battery) utilizes
redundancy as the device will continue to function if either (but not both!) the solar panel or battery fails.
6. Machine breakdown
Machines frequently deteriorate from their optimal operating state to ones where they contribute less value. If we know the likelihood of machines changing state (from one shift/day to the next) we can calculate the average time a machine spends in each of its possible states, and hence its average worth.

Density, distribution, reliability and hazard function

Example
Consider an electronic device and a study on its life span in
years.
We might track the life of 50 such devices and compile the
following table and histogram.
Years    1   2   3   4   5   6   7   8
Failed   4   8  14  11   7   3   2   1

We could use this histogram to estimate survival probabilities of similar devices.


E.g. Pr(failure within 2 years) = 12/50. Pr(survive for 5 or more years) = 13/50.
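These empirical estimates are easy to reproduce in code. Below is a minimal Python sketch (the frequency table is the 50-device table above; the variable names are ours):

    # Empirical survival estimates from the 50-device failure table
    failures = {1: 4, 2: 8, 3: 14, 4: 11, 5: 7, 6: 3, 7: 2, 8: 1}
    n = sum(failures.values())                      # 50 devices in total

    p_fail_by_2 = sum(c for yr, c in failures.items() if yr <= 2) / n
    p_survive_5_plus = sum(c for yr, c in failures.items() if yr >= 5) / n
    print(p_fail_by_2)          # 0.24 (= 12/50)
    print(p_survive_5_plus)     # 0.26 (= 13/50)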
If we convert our frequencies to probabilities:
Years     1     2     3     4     5     6     7     8
Failed   0.08  0.16  0.28  0.22  0.14  0.06  0.04  0.02
Our probability estimates amount to taking an area under the
histogram.

We could imagine repeating this procedure for a large number


of devices with their survival time being recorded as a
continuous rather than a discrete quantity.
This would smooth our histogram giving something like that
shown opposite.
This is a probability density function (PDF), sometimes (incorrectly) called a distribution function, f(t).
Note that the only requirements for a function f(t) to be a valid density function are that f(t) ≥ 0 and ∫₀∞ f(t) dt = 1.
Notes
o Taking the lower limit to be 0 rather than −∞ assumes t ≥ 0, which is usually the case in reliability.
o The upper limit of ∞ really means the maximum age of the device, which sometimes is not bounded.
Rather than the empirical (based on data) development of a PDF outlined above, sometimes a parametric PDF
can be arrived at and this is the approach that we will usually take.
For each of the following, verify that the function is a valid density and sketch it. Evaluate the probability of a component with life T given by this density surviving for the time specified.
Problem 1
f(t) = 0.25, 0 ≤ t ≤ 4; evaluate Pr(1.5 ≤ T ≤ 2.3). (This is the uniform PDF.)
Problem 2
f(t) = t/2 for 0 ≤ t ≤ 1 and (1/6)(4 − t) for 1 < t ≤ 4; evaluate Pr(T > 1). (This is the triangular PDF.)

Problem 3
f(t) = t², 0 ≤ t ≤ 2
This is NOT a valid density.

Problem 4
f(t) = 2e^(−2t), 0 ≤ t < ∞; evaluate Pr(T < 3).
This last PDF is the (negative) exponential and is used a lot in reliability.
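As a quick check of the last density, the sketch below (assuming scipy and numpy are installed) verifies that it integrates to 1 and evaluates Pr(T < 3):

    import numpy as np
    from scipy.integrate import quad

    f = lambda t: 2 * np.exp(-2 * t)       # Problem 4 density on [0, inf)

    total, _ = quad(f, 0, np.inf)          # should be 1 for a valid density
    p_fail_by_3, _ = quad(f, 0, 3)         # Pr(T < 3)
    print(round(total, 6))                 # 1.0
    print(round(p_fail_by_3, 4))           # 0.9975 (= 1 - e^(-6))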

The distribution function or cumulative distribution function


The cumulative distribution function F(t) is Pr(T ≤ t) = ∫₀ᵗ f(u) du. Thus F′(t) = f(t).

Problem 5
Find F(t) for the valid density functions above.
For a component, F(t) is Pr(survival time < t).

The reliability function


In reliability, it is usually more useful to work with R(t) = Pr(survival time > t).
Clearly R(t) = 1 − F(t). Also, R′(t) = −f(t).
Note that F(0) = 0 and F(∞) = 1, and conversely for R(t), i.e. R(0) = 1 and R(∞) = 0.
Sketches of the reliability functions
A sketch of a reliability function can be very helpful when selecting from two competing components.
Opposite are two sketches, each showing two reliability
functions.
In the left sketch, A is clearly better than B for all t.
In the right sketch, C is superior to D early in its life but the situation is reversed as the components age. Which one might be preferred overall would depend on the context.

MTTF or MTBF
MTTF
The mean time to failure is MTTF = ∫₀∞ t f(t) dt (equivalently, ∫₀∞ R(t) dt).
Problem 6
Find the MTTF for f(t) = 0.25, 0 ≤ t ≤ 4.
Solution
MTTF = ∫₀⁴ 0.25t dt = 0.25 × 4²/2 = 2 years.
Exercise
Find the MTTF for the valid density functions above.
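The MTTF integrals can equally be evaluated numerically. A short sketch using scipy (assumed installed), applied to the uniform density of Problem 6 and the exponential density of Problem 4:

    import numpy as np
    from scipy.integrate import quad

    # MTTF = integral of t*f(t) dt
    mttf_uniform, _ = quad(lambda t: t * 0.25, 0, 4)
    print(mttf_uniform)                                   # 2.0 years

    mttf_exponential, _ = quad(lambda t: t * 2 * np.exp(-2 * t), 0, np.inf)
    print(mttf_exponential)                               # 0.5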

Hazard function

The hazard function h(t), or λ(t), is the failure rate of the device at time t.
Example
Suppose h(t) = 2t + 3, t in years.
Then at time 0, h(0) = 3 failures/year. At time 2, h(2) has increased to 7 failures per year.
Note that this is an instantaneous rate of change. The latter does not mean that the device will fail, on average, 7 times between year 2 and year 3, as the failure rate will be 9 by the end of that year. But if we take short time periods we can get good approximate answers.
Problem 7
How many failures would you expect on average in one day, for 4 year old devices?
Solution
The failure rate for 4 year old devices is h(4) = 11 per year; 1 day ≈ 0.00274 years, so about 0.030 failures.
When a rate like this is << 1, we can interpret it as a probability, i.e. a 3.0% chance of a failure.
Or in general, Pr(failure in a short interval δt) ≈ h(t)·δt.
How does h(t) relate to f(t) and R(t)?
Consider the event of a device that has survived to time t failing in the next short interval δt.
This is given by the light shaded area / total shaded area (in the sketch of the density).
If δt is small, the light shaded area ≈ a rectangle of area f(t)·δt.
The total shaded area = Pr(T > t) = R(t), by definition.
Thus the probability = f(t)·δt / R(t).
But as per the example above, this probability = h(t)·δt.
Thus h(t)·δt = f(t)·δt / R(t), so h(t) = f(t)/R(t) = −R′(t)/R(t).
Note that as h(t) dt = −(1/R) dR, we have R(t) = exp(−∫₀ᵗ h(u) du).
Problem 8
Verify that R(t) = 1 − 0.5t, 0 ≤ t ≤ 2 years, is a valid reliability function and sketch the function. Find the distribution, the density and the hazard function and sketch each of these.
Solution
R(0) = 1, R(2) = 0, so R is valid. (Note that the reliability function tells us that the unit will definitely not survive beyond 2 years. Such a definitive bound on the life of a component is somewhat atypical.)
Sketch?
Clearly R(t) is a straight line with intercept 1 and slope −0.5.

F(t) = 1 − R(t) = 0.5t.

Sketch?
The sketch of F(t) is a mirror reflection of R(t).

f(t) = F′(t) = 0.5.
This is the uniform density on [0, 2].
Thus a histogram of a large number of such units would show life spans occurring with about equal frequency in any time interval over the two years.
h(t) = f(t)/R(t) = 0.5/(1 − 0.5t) = 1/(2 − t)
Note h(0) = 0.5, h(0.5) = 0.667, h(1) = 1, h(1.5) = 2.
Thus the unit failure rate is increasing with time.
As t gets closer to 2, the failure rate gets closer to ∞.
Problem 9
Verify that R(t) = (t − 4)²/16, 0 ≤ t ≤ 4 years, is a valid reliability function and sketch R(t). Find and sketch f(t) and h(t) and interpret these functions.
Solution
R(0) = 1, R(4) = 0, so R is valid.
Sketches?

R(t) = (t − 4)²/16

f(t) = −R′(t) = (4 − t)/8

h(t) = f/R = 2/(4 − t)

Note that in this example the hazard function is increasing with time (we could prove this using differentiation)
Exercise
If f(t) = 0.1, 0 ≤ t ≤ 10, show that f(t) is valid and find and sketch F(t), R(t) and h(t).
If R(t) = (3 − t)/3, 0 ≤ t ≤ 3, show that R(t) is valid and find and sketch f(t), F(t) and h(t).
Example
Suppose we start with the hazard function, say, h(t) = 3, 0 ≤ t < ∞. Thus the unit failure rate is constant at 3 failures per year, i.e. it does not increase (or decrease) with time.
Components which may only exist in two states (perfect or destroyed), or equivalently are prone to catastrophic failure, or equivalently again have a life span with a memoryless property, would have such a hazard function.
For example, plates/cups etc., or some electrical/electronic components such as bulbs/fuses etc.
We can easily show that f(t) = 3e^(−3t), 0 ≤ t < ∞, and thus R(t) = e^(−3t), 0 ≤ t < ∞.
This density function is called the (negative) exponential distribution and is used extensively in reliability. We will look at it in more detail in the next section. Note that it does not impose the rather artificial condition that a unit will definitely have failed by a specified time.

Exercise
R(t) = (t + 1)⁻¹, t ≥ 0
Find f(t) and h(t) and show that h(t) is a decreasing function.
Note that a decreasing hazard function might seem unusual as it suggests that the failure rate diminishes with time. However, in some situations there may be a sizable number of defective units in a batch whose weaknesses are exposed by virtue of them failing early in life (infant mortality).
Once these flawed units are burned off, the survivors may exhibit a more or less constant hazard rate for a period of time, but as they approach the end of their life span the hazard rate may increase.
This gives rise to the so-called bathtub hazard function.

Exercise
Show that R(t) = 1 − 0.2t for 0 ≤ t ≤ 5 is a valid reliability function and find f, F and h. Interpret h.
Note
Most of the models we have looked at thus far (uniform density, triangular density etc.) had the advantage that they were easily manipulated mathematically, but the disadvantage that they are rarely adequate to model a real world device. (An obvious failing for many of them is that they have a finite maximum life.)
We will next consider some more useful life models, but these are typically much more difficult to manipulate analytically and we will need to resort to numerical methods / statistical tables to utilise them.
Each of these distributions has a form that models a life pattern. We can fit the same model to different components by modifying the model parameters.
For example, the fundamental pattern of the life of a battery used in a watch may be fundamentally the same as that of the bit of an electric drill, but the average life span as well as the degree of variation may differ.

Summary of function properties and relations

Function              Symbol   Definition                           Condition for validity
Density               f(t)     (i) F′(t)   (ii) −R′(t)              f(t) ≥ 0, ∫₀∞ f(t) dt = 1
Reliability           R(t)     (i) 1 − F(t)   (ii) ∫ₜ∞ f(u) du      R(0) = 1, R(∞) = 0
Distribution          F(t)     (i) 1 − R(t)   (ii) ∫₀ᵗ f(u) du      F(0) = 0, F(∞) = 1
Hazard                h(t)     (i) f(t)/R(t)   (ii) −R′(t)/R(t)     h(t) ≥ 0
Mean time to failure  MTTF     (i) ∫₀∞ t f(t) dt   (ii) ∫₀∞ R(t) dt MTTF ≥ 0
Life modelling
Exponential (Parameter λ)
f(t) = λe^(−λt), R(t) = e^(−λt), h(t) = λ, MTTF = 1/λ (0 ≤ t < ∞)
Note that h(t) is a constant function, i.e. the failure rate of devices remains constant over time. I.e. if λ = 2.4 failures per year then this failure rate is the same for a brand new device as it is for a 10 year old one.
Such a model might be applied to a device that fails catastrophically, i.e. it has only two operational states: either it is perfect or it is destroyed.
In particular, it does not wear over time; a 10 year old device is as good as a new one. This is sometimes called the memoryless property as the past experience of the device is not remembered (and cannot be determined by examining the device).
An example of a device where this model might be appropriate might be, say, a piece of crockery (plate, cup etc.). A cup only has two states: either fully intact and perfect, or broken and destroyed (we will set aside the complication of a cup that may be cracked and thus in an in-between state). Thus, when it fails, it does so catastrophically. A cup does not remember its history, i.e. it does not wear. The newest cup in your house has the same risk of being broken in, say, the next month as a cup that has been there for 20 years.
Many electronic devices can be reasonably modelled with the exponential, as can electrical fuses or light bulbs. A bulb, for example, tends to have two states, perfect and broken, and does not tend to wear. An electrical surge might induce the catastrophic failure.
An engineer might well argue that a bulb does wear. Gas may escape from the bulb housing or the filament will degrade with heat. Nevertheless, it may still be the case that an exponential model fits quite well.
Problem 8
The life of a PCB is thought to follow an exponential distribution with a MTTF of 18 months.
(a) Write down f(t), h(t) and R(t) for the PCB
(b) If the design life of the component is to be one year, what is the reliability?
(c) What design life would the manufacturers stipulate if they wish to guarantee their product for this time
and be 80% certain that it would survive for at least this time?
(d) What should the MTTF be if a two year guarantee is offered and only 1% of devices fail in this time?
(e) How likely is it that the device will fail in the MTTF, i.e. in 18 months?
Solution
Using years as our time unit, MTTF = 1.5 years. Parameter λ = 1/MTTF = 0.6667 failures per year.
(a) f(t) = 0.6667e^(−0.6667t), R(t) = e^(−0.6667t), h(t) = 0.6667
(b) R(1) = 0.513 (51%)
(c) 0.8 = e^(−0.6667t) gives t = −ln(0.8)/0.6667 = 0.335 years, or about 4 months.
(d) Let λ be the required parameter of the new design: 0.99 = e^(−2λ) gives λ = −ln(0.99)/2 = 0.00503, so MTTF = 1/λ ≈ 199 years!
(e) R(1.5) = e^(−1) = 0.368, so about 37% survive to the MTTF.
Note that the probability of a device surviving to the MTTF is 37% (i.e. 63% fail by then) in this and any other exponential scenario. Intuitively you may think that, by definition, there would be a 50% chance of a device surviving until the mean time to failure, but this is not so.
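The whole solution can be reproduced with a few lines of Python (standard library only); this is a sketch of parts (b) to (e):

    import math

    mttf = 1.5                    # years
    lam = 1 / mttf                # constant failure rate, per year

    R = lambda t: math.exp(-lam * t)
    print(R(1))                   # (b) ~0.513
    print(-math.log(0.8) / lam)   # (c) ~0.335 years, about 4 months
    print(2 / -math.log(0.99))    # (d) required MTTF, ~199 years
    print(R(1.5))                 # (e) ~0.368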

Normal or Gaussian (Parameters μ and σ)

As before, if we begin by looking at the hazard function we get a sense of the life experience of a device modelled with the normal. Up to a point, the failure rate is virtually zero, but thereafter it steadily rises. Thus the device does age with time.
Examples of devices that might well be modelled with the normal include a range of mechanical devices, batteries and knives (blade sharpness).
The density function is quite informative here too, in that it suggests that there is some modal time for the life of devices, with other survival times symmetric about this centre. Frequently life times can be asymmetrical, which would give rise to a larger right tail (device more likely to survive for a time greater than rather than less than the modal time) or (less likely) a larger left tail (device less likely to survive for a time greater than rather than less than the modal time). Thus we will need other models that allow asymmetrical distributions.
Problem 9
The life of a AAA battery when used in a flash light is thought to be normally distributed with mean 80 hours and standard deviation 3 hours.
(a) If the design life of the battery is 75 hours, what is the reliability?
(b) What design life K would the manufacturers stipulate if they wish to guarantee the battery for this time
and be 80% certain that it would survive for at least this time?
Solution
The normal does not lend itself to mathematical manipulation so we must use numerical methods (tables) or software (e.g. Excel).
(a) Let T be the life time of the battery. T ~ N(80, 3²). Pr(T > 75) = 1 − Pr(T < 75) = 1 − Pr(Z < (75 − 80)/3)
= 1 − Pr(Z < −1.67) = 1 − Φ(−1.67) = 1 − (1 − Φ(1.67)) = Φ(1.67) = 0.9525 (95%)
(b) We need Pr(T > K) = 0.8, i.e. Pr(T < K) = 0.2, so (K − 80)/3 = Φ⁻¹(0.2) = −0.84, giving K = 80 − 3(0.84) = 77.5 hours.
We can obtain these answers in Excel with
(a) =1-NORMDIST(75, 80, 3, TRUE) and (b) =NORMINV(1-0.8, 80, 3)
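The same answers can be obtained in Python with scipy.stats.norm, which plays the role of the Excel functions above (sf is the survival function 1 − F, ppf the inverse CDF); a sketch:

    from scipy.stats import norm

    # Battery life T ~ N(80, 3^2), times in hours
    print(norm.sf(75, loc=80, scale=3))     # (a) reliability at 75 h, ~0.952
    print(norm.ppf(0.2, loc=80, scale=3))   # (b) design life with R = 0.8, ~77.5 h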
Exercise
A robotic arm has a life that is normally distributed with mean 50,000 cycles and SD 8,000 cycles.
(a) If the design life is 40,000 cycles, what is the reliability?
(b) What design life would give a reliability of 98%?
(c) A design life of 60,000 cycles is to be specified with a reliability of 90%. If the SD remains at 8,000,
what would the mean number of cycles need to be?

Lognormal (Parameters t50 and σ)

The lognormal is more versatile than the normal in that it is possible, by modifying the parameters, to construct very different models. In the sketches below, each of the two parameters t50 and σ is varied, holding the other fixed, and the impact on f(t), R(t) and h(t) can be observed.

An examination of the densities shows that we can readily model a component life with a large right tail, something we could not do with the normal.
However, the hazard has an unusual shape, suggesting a rapidly increasing failure rate early in life and then a gradual decrease. It is not easy to conceive of a device that might have such a hazard function. Nevertheless, it is extensively used in reliability.
Problem 10
An industrial water pump has a life that is lognormally distributed with t50 = 20 million cycles and σ = 2.3.
(a) Calculate the reliability if the design life is to be 1 million cycles.
(b) What design life would give a reliability of 99%?
Solution
R(t) = 1 − Φ(ln(t/t50)/σ)

(a) R(10⁶) = 1 − Φ(ln(10⁶/(2×10⁷))/2.3) = 1 − Φ(ln(0.05)/2.3) = 1 − Φ(−1.30) = 0.9032

(b) 0.99 = 1 − Φ(z), so Φ(z) = 0.01 and z = −2.326 = ln(t/(2×10⁷))/2.3, giving t = 2×10⁷ × exp(2.3 × (−2.326)) ≈ 94,906 cycles.
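A sketch of the same calculations in Python, assuming scipy is available (norm.cdf plays the role of Φ):

    import math
    from scipy.stats import norm

    t50, sigma = 20e6, 2.3                  # median life (cycles) and shape

    R = lambda t: 1 - norm.cdf(math.log(t / t50) / sigma)
    print(R(1e6))                           # (a) ~0.903

    # (b) design life for R = 0.99
    print(t50 * math.exp(sigma * norm.ppf(0.01)))   # ~95,000 cycles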
Exercise
Software used to control an industrial process is thought to encounter critical errors (necessitating the stopping
of the process) in a time that has a lognormal distribution with t50 = 15 weeks and σ = 4.7.
(a) What is the likelihood of no critical errors in one 4 week period?
(b) What time period would be such that the probability of no critical errors would be 85%?

Weibull (Parameters β and θ)

As with the lognormal, the Weibull facilitates densities with large right tails. However it is more versatile in terms of the hazard functions it permits. Like the lognormal, it is extensively used in reliability. Note that the exponential is a special case of the Weibull where β = 1.
Problem 11
A component's life span is described by a Weibull PDF with β = 2.3 and θ = 4.5 months.
(a) Determine the reliability if the design life is to be 6 months. Sketch and interpret the hazard function.
(b) What design life would give a reliability of 99%
Solution
(a) R(t) = exp(−(t/θ)^β), so R(6) = exp(−(6/4.5)^2.3) = 0.144.
(b) ln(R) = −(t/θ)^β, so t = θ(−ln R)^(1/β) = 0.61 months ≈ 18 days.
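The corresponding Python sketch (standard library only):

    import math

    beta, theta = 2.3, 4.5                  # Weibull shape and scale (months)

    R = lambda t: math.exp(-(t / theta) ** beta)
    print(R(6))                                       # (a) ~0.144

    # (b) design life for R = 0.99: t = theta * (-ln R)^(1/beta)
    print(theta * (-math.log(0.99)) ** (1 / beta))    # ~0.61 months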

Exercise
A device's life span is described by a Weibull PDF with β = 0.5 and θ = 20 years.
(a) Determine the reliability if the design life is to be 2 years.
(b) What design life would give a reliability of 90%?

12

Reliability Block Diagrams (RBD)

Previously we have considered how we might obtain the reliability of a single component using a particular probability model (normal, Weibull etc.) and parameter selection. We next consider how to combine these component reliabilities to get an overall measure of the system reliability.
Problem 12
A hiker takes a flash lamp on a trip. The lamp consists of three components whose lives are suitably modelled
to give reliability measures, taking the trip duration to be their design life. The components are the battery, the
light bulb and the switch which have reliability measures of 95%, 98% and 90% respectively.
What is the reliability of the flash lamp?
We assume the events of individual components failing are statistically independent. This in effect means that if a component fails, this does not affect the probability of a different component failing.
This is often an unwarranted assumption. For example, if the flash lamp's bulb failed because it was dropped from a height, it is clearly reasonable to expect that the other components may also have failed.
Solution
Clearly the system can only work if all the components work (i.e. there is no redundancy). Thus we arrange our
components in series.

To determine the system reliability we take the product of the component reliabilities. This holds for any number of components. I.e. Rsystem = 0.95 × 0.98 × 0.90 = 0.8379, or about 84%.
We can rationalise the correctness of the rule as follows. 95% of the time (i.e. 95 out of every 100 trips on average), the battery will not fail. Of these 95, 90% of the time (i.e. 85.5 times on average) the switch will also not fail, and of these 85.5, 98% of the time (i.e. 83.79 times on average) the bulb will not fail either.

The overall reliability of components arranged in series is always less than the reliability of any one of
the individual components.

Problem 13
A hospital has a main electricity supply and an independent backup generator. There is a 2% chance that in one week the mains supply will fail and a 5% chance that the generator will fail. How reliable is the hospital's electricity supply?
Solution
Our components, the mains supply (M) and generator (G), are such that for a design life of one week, R(M) = 0.98 and R(G) = 0.95. Clearly there is a degree of redundancy here as, unlike our flash light example, we can permit one of the components to fail, provided that not both fail.
We can arrange our system using the RBD shown opposite.
This, for obvious reasons, is called a parallel arrangement.

To rationalise the rule for determining the reliability of this system, note that the system only fails when both components fail. The mains fails 2% of the time (i.e. on average 20 times in every 1,000 weeks) and on these 20 occasions the generator fails 5% of the time (i.e. on average 1 out of the 20). Thus both will fail one time in 1,000 and will not fail on the other 999 times on average. So the reliability should be 99.9%.
I.e. Rsystem = 1 − (1 − 0.98)(1 − 0.95) = 0.999, or 99.9%.

Note
This holds for any number of components. For example, for 3 components in parallel with reliabilities R1, R2 and R3, the reliability of the system is 1 − (1 − R1)(1 − R2)(1 − R3), etc.

The overall reliability of components arranged in parallel is always more than the reliability of any one
of the individual components.

Take care when interpreting probabilities that are close to 1 (or 0). There is a considerable difference
between 0.999 and 0.9999. A device with the former reliability will fail one time in 1,000 on average; a
device with the latter will fail one time in 10,000 on average and so in a certain sense, is 10 times better.
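The series and parallel rules are easily captured as two small Python helpers; the sketch below reruns the flash lamp and hospital examples:

    from math import prod

    def series(rs):
        # System works only if every component works
        return prod(rs)

    def parallel(rs):
        # System fails only if every component fails
        return 1 - prod(1 - r for r in rs)

    print(series([0.95, 0.98, 0.90]))   # flash lamp, ~0.8379
    print(parallel([0.98, 0.95]))       # mains + generator, 0.999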

Mixed systems (parallel and serial arrangements)


In practice, complex systems are likely to have a mix of serial and parallel arrangements of components. The reliability of such a system is obtained by replacing each grouping of similar (i.e. either all serial or all parallel) components by a single component which has the same reliability measure as the group. A series of reliability diagrams such as the one shown below may be helpful.
The diagram shows an original system
on the left, which by suitable grouping
of components is reduced after 5 steps
to a simple single component on the
right, whose reliability will equal that
of the overall system.

Problem 14
An important data set sits on the hard disk of a PC. 3 backups exist: one on to a set of 2 CD-Rs, and two more, each on to a set of 3 zip disks. However, corresponding disks from each zip disk backup set are interchangeable. Both zip disk backup sets are password protected with the same password. In addition, the data set on the PC and the CD-R backup set are also password protected with the same password, which is different from the one used for the zip disk backups. Draw a reliability diagram for this system, labelling the various components.
If the reliability of each component (i.e. media does not fail, password not forgotten) is as given below,
determine the reliability of the system.
Component     Hard disk   CD     Zip   Password
Reliability    0.98       0.95   0.9   0.96
Exercise
Find the reliability of the system represented
by the RBD shown opposite.

Active V passive redundancy
A system for which the RBD has any parallel arrangement has an element of redundancy.
A redundant component may be passive or
active. In the case of the former, it is not
activated until required (e.g. a generator,
spare tyre etc).
An active redundant component might be
a jet engine or additional structural support
on a fair ground ride.
Note that passive redundancy tends to give
rise to greater reliability.
Example
The RBD below represents the system that is the tyres on a car in the cases where there is no spare (left) and where there is a spare (right).

No spare tyre
With a spare tyre
Problem 15
Suppose that for a cross-continent car trip, the risk of a tyre failing is 2% and that tyre failures are independent
events. Determine the reliability of the tyre system in the cases where
(a) There is no spare
(b) There is one spare
(c) There are two spares
Solution
(a) Rsystem = (0.98)⁴ = 0.922 (92%)
(b) Let X = # tyres that fail on the trip. X ~ bin(5, 0.02), so Pr(X = k) = 5Ck (0.02)^k (0.98)^(5−k).
Rsystem = Pr(X ≤ 1) = Pr(X = 0) + Pr(X = 1) = 0.9961 (failure on about 1 trip in 260)
(c) 0.999847 (failure on about 1 trip in 6540)
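A sketch of these calculations using scipy.stats.binom (binom.cdf(k, n, p) gives Pr(X ≤ k)):

    from scipy.stats import binom

    p = 0.02                          # risk of any one tyre failing
    print(0.98 ** 4)                  # (a) no spare, ~0.9224
    print(binom.cdf(1, 5, p))         # (b) one spare: at most 1 of 5 fails, ~0.9961
    print(binom.cdf(2, 6, p))         # (c) two spares: at most 2 of 6 fail, ~0.99985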
Exercise
Do you think it is reasonable to assume that the events of tyre failures are independent?
Do you think it is reasonable to assume that the risk of the spare failing is the same as that of the in-use tyres?
Exercise
Sketch the RBD diagram for a 4 propeller plane that needs, in order to fly:
(i) All four engines
(ii) One engine
(iii) Two engines, but one on each side
(iv) Two engines
If there is one chance in 1,000 of an engine failing on a flight, calculate the reliability for each of the above.

Problem 16
An engineer must make a PowerPoint presentation to an important client. The presentation requires a computer
with internet access, connected to a data projector. The conference centre has this equipment but there is a risk
that any of these components may fail at the start of or during the presentation. Specifically, there is a 1%
chance that the PC will fail, a 4% chance that the projector will fail and a 10% chance that the internet
connection will fail.
(a) Draw a RBD and calculate the likelihood of a successful presentation
(b) For increased reliability, the engineer takes a laptop with independent internet access as well as a portable data projector. Assuming he must either use all his own components or all of the centre's components, by how much has he enhanced the likelihood of a successful presentation? (Assume the same component failure likelihoods as above.)
(c) Repeat (b) if it were in fact possible to interchange components (e.g. use his own laptop and Internet access but the centre's data projector).
In all cases express the risk of failure in the form one chance in N.
Solution
(RBD and calculations done in class)
(a) 85.5% or about 1 chance in 7 of failure
(b) 97.9% or about 1 chance in 48 of failure
(c) 98.9% or about 1 chance in 86 of failure
Notes
o The redundancy employed in (b) is called high level redundancy (i.e. the system is duplicated) whilst that employed in (c) is called low level redundancy (i.e. the components are duplicated). It should be obvious that low level redundancy enhances reliability by more.
o Intuitively some might think that the engineer having two presentation systems doubles his reliability. Typically the reliability enhancement will be far greater than this.
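A short Python sketch comparing the three arrangements of Problem 16 makes the high/low level distinction concrete:

    r_pc, r_proj, r_net = 0.99, 0.96, 0.90

    single = r_pc * r_proj * r_net          # (a) one system in series
    high = 1 - (1 - single) ** 2            # (b) duplicate the whole system
    low = ((1 - (1 - r_pc) ** 2) *          # (c) duplicate each component
           (1 - (1 - r_proj) ** 2) *
           (1 - (1 - r_net) ** 2))
    print(single, high, low)                # ~0.855, ~0.979, ~0.988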
Exercise
Redo the problem if Internet access on the laptop (as well as on the PC) is provided via Wi-Fi by the conference centre.
In the last exercise, the events of Internet access failure on the two computers are of course not independent. In this example it is easy to deal with the lack of independence, but often it is more challenging. We will not, on this module, explore in any great detail the lack of independence of component failures and will always assume independence. But you should consider for each problem the reasonableness of making such assumptions.


Fitting Life models

Earlier we considered a number of different models (Weibull, normal, lognormal, exponential) that might be used to describe the life of a component or device, and how to calculate a reliability measure once a model form and parameters were known. An obvious question to consider is how to obtain the parameters once the model form is known, or even how to identify a suitable model form in the first place.
We begin with the problem of obtaining parameters and assume that the model form is known. Our methods require us to employ simple linear regression, so we begin with a brief review.
Simple regression
An internet user wishes to estimate the download speed of their internet connection and from their computer's log obtains a random sample of 8 files which were downloaded. The size of each file and the time each took to download are tabulated below.
File size (Mb)         9   6   8   2   4   7   8   3
Download time (sec)   33  17  25   6  12  22  27  14

The first logical step is to graph the data in some way. For this kind of data the scatter diagram is appropriate. The file size is plotted on the horizontal or X axis and the download time is plotted on the vertical or Y axis. Note that the points do not lie on a smooth curve so we do not join them together.
A couple of points are immediately clear from the scatter diagram.
The points tend to run from bottom left to top right in an approximately linear fashion (a guess at this line is shown in the diagram above). This indicates that the download time tends to be long when the file is large and short when the file is small, as might reasonably be expected.
There is not an exact (or functional) linear relationship between the two variables. However we might postulate that there is a linear trend in the data, which can be characterised by the dotted line. The equation of this trend line can be used to describe the relationship between our two variables, taking account of the noise in our data, and may be used, for example, to make predictions.
For example, using Excel or most modern scientific calculators we can determine three pertinent parameters: the line intercept a (about 0.136), the line slope b (about 3.296) and what is called the correlation coefficient r (about 0.96).
Thus the regression line has equation Ŷ = a + bX = 0.136 + 3.296X, where X is file size and Y is download time. So we might predict that a file of size 5 Mb might take 0.136 + 3.296(5) = 16.6 seconds to download.
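In Python, numpy recovers the same three parameters; a sketch using the data above:

    import numpy as np

    size = np.array([9, 6, 8, 2, 4, 7, 8, 3])         # file size (Mb)
    time = np.array([33, 17, 25, 6, 12, 22, 27, 14])  # download time (sec)

    b, a = np.polyfit(size, time, 1)                  # slope, then intercept
    r = np.corrcoef(size, time)[0, 1]
    print(a, b, r)              # ~0.136, ~3.296, ~0.96
    print(a + b * 5)            # predicted time for a 5 Mb file, ~16.6 s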
We will apply regression analysis to failure test data and use the regression parameters to calculate suitable
parameters for our life models.


Reliability Testing Summary Sheet


For all probability plots, take F(i) = (i − 0.3)/(N + 0.4), where i is the ordered sequence (out of N) of failure times.

The regression line has equation Ŷ = a + bX.

Distribution   R(t)                    X       Y                   Parameters
Exponential    exp(−λt)                t       ln(1/(1 − F))       λ = b
Normal         1 − Φ((t − μ)/σ)        t       Φ⁻¹(F)              σ = 1/b, μ = −a/b
Lognormal      1 − Φ(ln(t/t50)/σ)      ln(t)   Φ⁻¹(F)              σ = 1/b, t50 = exp(−a/b)
Weibull        exp(−(t/θ)^β)           ln(t)   ln(ln(1/(1 − F)))   β = b, θ = exp(−a/b)

Note
This F value calculated above is called the median rank and is an estimate of the distribution function for the number of devices that have failed by time t. Other simpler estimates, such as i/N or i/(N+1), are sometimes used. Whichever formula is used, the interval [0, 1] is sub-divided.
When N is large, there is little difference between the three formulae. A comparison of the three is shown below for N = 10.

i                    1      2      3      4      5      6      7      8      9      10
A (i−0.3)/(N+0.4)   0.067  0.163  0.260  0.356  0.452  0.548  0.644  0.740  0.837  0.933
B i/N               0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1
C i/(N+1)           0.091  0.182  0.273  0.364  0.455  0.545  0.636  0.727  0.818  0.909

The idea of probability plotting is to transform the distribution function F(t) to make it linear, i.e. in the form Y = a + bX, where X and Y are the variables and a and b are the intercept and slope respectively.
We can determine how to do this once we know the distribution form for any particular model, and from this we can obtain the model's parameters from the regression parameters. And in turn, we can estimate the reliability for a given design life. This is done in the table above for four models, but the algebraic details are shown below for the normal model and for the Weibull model.

Normal model?
R = 1 − Φ((t − μ)/σ), so F = Φ((t − μ)/σ) (remember that F = 1 − R).
Then Φ⁻¹(F) = (t − μ)/σ = (1/σ)t − μ/σ = bt + a. So b = 1/σ, giving σ = 1/b, and a = −μ/σ, giving μ = −a/b.
We plot t on the X axis and Φ⁻¹(F) on the Y, as given in the table.


Weibull model?
R = exp(−(t/θ)^β), so F = 1 − exp(−(t/θ)^β) and 1 − F = exp(−(t/θ)^β), i.e. 1/(1 − F) = exp((t/θ)^β).
Taking logs twice: ln(ln(1/(1 − F))) = β ln(t/θ) = β ln(t) − β ln(θ).
So b = β and a = −β ln(θ) = −b ln(θ), giving −a/b = ln(θ), i.e. θ = exp(−a/b).
We plot ln(t) on the X axis and ln(ln(1/(1 − F))) on the Y, again as given in the table.
Exercise
Verify the table entries for the lognormal and the exponential distributions.

Fitting an exponential model


Problem 17
An experiment is conducted in order to fit a suitable life model to a circuit board. The investigator is confident
that the exponential is the correct model form and thus needs to estimate the single parameter . The time of
failure for 8 components (months) is as tabulated below.
i   1     2     3     4      5      6      7      8
t  3.10  4.38  8.83  10.70  10.80  14.31  21.45  28.61

Notes
o We assume for now that the data is not censored, i.e. the test concluded when all 8 devices had failed.
o We also assume that there is no reliability growth (unlike a Duane modelling approach), i.e. we do not use information from device failures to improve the design, leading to a longer MTTF.
o The 3.10 months until the failure of the first device may have been found by noting that one device lasted for 3.10 months, or perhaps two devices running together with the first failing after 1.55 months, or possibly 5 devices with the first failure after 0.62 months, etc.
o We refer to the Reliability Testing Summary Sheet to find the required parameter.
Solution
i   1      2      3      4      5      6      7      8
X   3.10   4.38   8.83   10.70  10.80  14.31  21.45  28.61
F   0.083  0.202  0.321  0.440  0.560  0.679  0.798  0.917
Y   0.09   0.23   0.39   0.58   0.82   1.13   1.60   2.48

Details of the construction of this table and the chart (both done in Excel) will be expanded upon in class.
The pertinent output is the regression line y = 0.0927x − 0.2697, which may also be found on a calculator.
Thus a = −0.2697 and b = 0.0927, and our summary sheet tells us that λ = b = 0.0927.
So the failure rate for the component is 0.0927 failures/month, which gives a MTTF of 1/0.0927 ≈ 10.8 months.
Note also that if the design life is 12 months then the reliability is e^(−0.0927×12) = 0.33. So we might expect to see one third of a batch of such devices survive beyond 12 months. In our data we see that 3 of the 8 actually did survive beyond 12 months, which is broadly consistent.
Note that the high r² value (close to 1) as well as a physical inspection of the plot suggests that the exponential is perhaps a reasonably good model.
Take care to distinguish between the coefficient of determination r² (but written R² in Excel) and R, the reliability measure.
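The whole fitting procedure is only a few lines of Python; a sketch for the circuit board data, following the summary sheet (numpy assumed available):

    import numpy as np

    t = np.array([3.10, 4.38, 8.83, 10.70, 10.80, 14.31, 21.45, 28.61])
    N = len(t)
    F = (np.arange(1, N + 1) - 0.3) / (N + 0.4)   # median ranks
    y = np.log(1 / (1 - F))                       # exponential transform

    b, a = np.polyfit(t, y, 1)                    # slope, intercept
    lam = b                                       # summary sheet: lambda = b
    print(lam, 1 / lam)     # ~0.093 failures/month, MTTF ~10.8 months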
Exercise
A batch of 12 components is tested until failure and the times of failure in minutes are 5, 12, 15, 22, 20, 40, 55,
70, 95, 120, 140 and 180.

(a) Fit an exponential life model to this data.


(b) Calculate the reliability if the design life is 20 minutes and determine if this is consistent with the
observed data.

Fitting a normal model


Problem 18
The makers of a mobile phone wish to model the phone's stand-by battery life span with a normal distribution.
15 phones are left on standby until the battery dies and these times (hours) are as follows.
76.8, 77.6, 78.3, 79.9, 80.2, 80.4, 80.7, 81, 81.5, 84.8, 84.9, 85.1, 88.2, 89.2 and 89.5
Fit the model and discuss the validity of an assertion by the makers that the stand-by battery time is 80 hours.
Solution
i   1      2      3      4      5      6      7      8      9      10     11     12     13     14     15
X   76.8   77.6   78.3   79.9   80.2   80.4   80.7   81.0   81.5   84.8   84.9   85.1   88.2   89.2   89.5
F   0.045  0.110  0.175  0.240  0.305  0.370  0.435  0.500  0.565  0.630  0.695  0.760  0.825  0.890  0.955
Y   -1.69  -1.22  -0.93  -0.71  -0.51  -0.33  -0.16  0.00   0.16   0.33   0.51   0.71   0.93   1.22   1.69

Again, details of the construction of this table and the chart will be expanded upon in class. Both the chart and
the r2 value suggest that the normal is a reasonable model.

The regression line is y = 0.2179x − 17.985, so a = −17.985 and b = 0.2179.
Our summary sheet tells us that σ = 1/b = 4.59 and μ = −a/b = 82.54.
If the design life is 80 hours then the reliability will be 71%, and this is broadly consistent with the observed data, where 10 of the 15, or 67%, of batteries survive for 80 hours or longer. So the manufacturer's claim of 80 hours will usually (71% of the time) be achieved, but in a substantial minority of cases it will not.
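The analogous Python sketch for the normal fit (scipy's norm.ppf supplies the inverse of Φ):

    import numpy as np
    from scipy.stats import norm

    t = np.array([76.8, 77.6, 78.3, 79.9, 80.2, 80.4, 80.7, 81.0,
                  81.5, 84.8, 84.9, 85.1, 88.2, 89.2, 89.5])
    F = (np.arange(1, len(t) + 1) - 0.3) / (len(t) + 0.4)
    y = norm.ppf(F)                               # normal transform

    b, a = np.polyfit(t, y, 1)
    sigma, mu = 1 / b, -a / b
    print(mu, sigma)                              # ~82.5 h and ~4.6 h
    print(norm.sf(80, mu, sigma))                 # reliability at 80 h, ~0.71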

Model form
In the circuit board example the exponential model form was selected a priori. Similarly with the normal model form for the mobile phone battery example. The nature of these components suggests the model form, but perhaps the normal model might be a better fit to the first example? Or the exponential to the second? If we fit the alternative models in each case we get the plots shown below.

Exponential model fitted to battery data


Normal model fitted to circuit board data
The r² values are summarised below for all four models. As there is no issue with the linear fit of the regression lines, we may opt for the model with the larger r² value in each case.
Thus we would model the battery as normal and the circuit board as exponential.

               Normal   Exponential
Battery        0.9354   0.8995
Circuit board  0.8758   0.9784

Exercise
A batch of 26 drill bits is tested until failure and the times of failure in hours are given below.
17.5  18.9  19.2  24.2  25.6  27.5  28.1  28.1  28.6  28.8  30.6  32.3  32.5  33.6  33.7  34.5  34.7  35.9  36.4  36.9  36.9  37.7  43.6  44.3  46.4  51.3
o Fit a normal model to these data and calculate the reliability for a design life of 30 hours.
o Repeat the analysis but now fit an exponential model and decide which of the two models is the best fit.

Fitting a lognormal model


Problem 19
The times in days between failures of a generator are recorded below.
65  70  72  75  78  80  82  83  84  90  92  100  101  102  103  109  112  116  120  132
Fit a lognormal to these data and determine the reliability for a mission time of 65 days.
Solution
The data is tabulated below and the plot is shown above.
As there are no obvious issues with the plot we can proceed to determine the parameters of the lognormal to be σ = 0.21 and t50 = 91.6 days.
The reliability for a life of 65 days will be 0.949.
Exercise
Use Excel to decide if either an exponential or normal model might better fit the test data here.

i    t    F      X      Y
1    65   0.034  4.174  -1.821
2    70   0.083  4.248  -1.383
3    72   0.132  4.277  -1.115
4    75   0.181  4.317  -0.910
5    78   0.230  4.357  -0.738
6    80   0.279  4.382  -0.585
7    82   0.328  4.407  -0.444
8    83   0.377  4.419  -0.312
9    84   0.426  4.431  -0.185
10   90   0.475  4.500  -0.061
11   92   0.525  4.522   0.061
12   100  0.574  4.605   0.185
13   101  0.623  4.615   0.312
14   102  0.672  4.625   0.444
15   103  0.721  4.635   0.585
16   109  0.770  4.691   0.738
17   112  0.819  4.718   0.910
18   116  0.868  4.754   1.115
19   120  0.917  4.787   1.383
20   132  0.966  4.883   1.821

Exercise
Estimate the reliability of a component whose life is thought to be lognormally distributed, if the design life is 15 years, using the failure data 4, 8, 17, 19, 21, 35, 38, 41, 48 and 94.

Fitting a Weibull model


Problem 20
Failure times in months in a test of an industrial compressor are 16.1, 18.5, 22.5, 24.8, 25.4, 26.3, 28.4, 28.6, 30.2, 31.1, 34, 38, 40.7 and 43.2. Fit a Weibull model and estimate the reliability for a design life of 2 years.
Solution

The data is tabulated below and the plot is shown above.
Again, as there are no obvious issues with the plot, we can proceed to determine the parameters of the Weibull to be β = 4.03 and θ = 32.1 months.
The reliability for a life of 24 months will be 0.733.

i    t     F     X     Y
1    16.1  0.05  2.78  -3.00
2    18.5  0.12  2.92  -2.07
3    22.5  0.19  3.11  -1.57
4    24.8  0.26  3.21  -1.21
5    25.4  0.33  3.23  -0.93
6    26.3  0.40  3.27  -0.69
7    28.4  0.47  3.35  -0.47
8    28.6  0.53  3.35  -0.27
9    30.2  0.60  3.41  -0.08
10   31.1  0.67  3.44   0.11
11   34.0  0.74  3.53   0.31
12   38.0  0.81  3.64   0.52
13   40.7  0.88  3.71   0.76
14   43.2  0.95  3.77   1.11

Exercise
Fit a Weibull model to the test data (days) shown below and determine the reliability if the mission life is 21 days.
26.0  30.0  36.5  48.4  48.4  49.0  54.0  54.3  54.7  66.2  75.6  87.5  104.4  106.0  111.7  137.1
Problem 21
Select a suitable model (using the Weibull, normal, lognormal and exponential as candidates) using the following failure test data (weeks) and estimate the reliability when the design life is one year.
8.6  16.6  22.6  23.7  25.2  32.0  37.4  39.1  39.6  39.6  50.5  58.8  63.0  65.9  70.3  72.4  78.3  89.5  100.3  105.7  111.1  135.8  144.5  152.7

Solution
We do not know the model form here, so we consider a number of different candidates and construct the appropriate probability plot for each.
(Probability plots: normal, lognormal, Weibull and exponential.)

The exponential is clearly not appropriate but all the others are reasonable. We select the lognormal by virtue of
it having a higher r2 value than any other candidate.
Exercise
Show that a design life of 52 weeks leads to a reliability of about 62%.
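This comparison can be scripted; the sketch below computes the r² of each probability plot for the Problem 21 data (r² taken as the squared correlation of the plotted points; numpy and scipy assumed available):

    import numpy as np
    from scipy.stats import norm

    t = np.array([8.6, 16.6, 22.6, 23.7, 25.2, 32.0, 37.4, 39.1, 39.6, 39.6,
                  50.5, 58.8, 63.0, 65.9, 70.3, 72.4, 78.3, 89.5, 100.3,
                  105.7, 111.1, 135.8, 144.5, 152.7])
    F = (np.arange(1, len(t) + 1) - 0.3) / (len(t) + 0.4)

    def r2(x, y):
        return np.corrcoef(x, y)[0, 1] ** 2

    print(r2(t, norm.ppf(F)))                          # normal
    print(r2(np.log(t), norm.ppf(F)))                  # lognormal
    print(r2(np.log(t), np.log(np.log(1 / (1 - F)))))  # Weibull
    print(r2(t, np.log(1 / (1 - F))))                  # exponential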


Censored data

In many situations it is not feasible to wait for all devices in a reliability test to fail before analysing the data. If
the experiment is terminated (by design or necessity) before all the test devices fail then the resultant data is
said to be censored.

Singly censored data


There are two types of singly censored data, type I and type II. Type I fixes the duration of the experiment whereas type II fixes the number of devices that will be permitted to fail.
Example
The ICBs in a new brand of washing machine are to be tested; 50 units are selected for the test.
Type I censored data would result if the experimenter decides that the experiment should stop after 200 hours
(perhaps 11 of the 50 ICBs will have failed by then).
Type II censored data would result if the experimenter decides that the experiment should proceed until 20 ICB failures are observed (perhaps this would take 437 hours).
Type I is also called right-censored data as, with respect to the time line, what would have happened after the test was terminated (i.e. when the other devices would have failed) is unknown. Thus data on the right of the time axis (after the end of the test) is not recorded, and thus censored.

Multiply-censored data
Sometimes test units have to be removed before the experiment is completed. Perhaps a power unit being applied to an ICB fails, or liquid spills on it. Or perhaps, for pragmatic reasons, a device has to be removed from the test as it is needed elsewhere. Whatever the reason, it would be erroneous to ignore this when fitting life models.
Recall that we calculated F above using the formula (i − 0.3)/(N + 0.4), where i was the index of the failure of each device (i.e. i = 1, 2, 3, ...) and N was the number of test units. We calculate F in a different way when there is censored data.
Problem 22
10 devices are tested until all have failed. Calculate the F values if the 4th, 6th and 9th unit were censored.
Solution
The calculations are shown in the table below.
i is the index of the failure number as before.
O is the outcome for each device: failed (F) or censored (C).
The factor (N + 1 − i)/(N + 2 − i) is used if the device failed, or 1 if it has been removed (censored).
R is the product of the previous R value and the current factor for a failed device (it is not relevant for a censored device).
F = 1 − R as before.

i    O   factor   R      F
1    F   0.909    0.909  0.091
2    F   0.900    0.818  0.182
3    C   1        -      -
4    F   0.875    0.716  0.284
5    F   0.857    0.614  0.386
6    C   1        -      -
7    F   0.800    0.491  0.509
8    F   0.750    0.368  0.632
9    C   1        -      -
10   F   0.500    0.184  0.816

Note
1. If there are no censored data then the F values are the same as those obtained by i/(N + 1).
2. The above procedure, which uses the factor (N + 1 − i)/(N + 2 − i), is attributed to Herd and Johnson. If instead the factor (N − i)/(N + 1 − i) is used, the procedure is called the Kaplan-Meier procedure.
Exercise
1. Determine the F values above using the Kaplan-Meier procedure.
2. Determine the F values using both the Herd & Johnson and the Kaplan-Meier procedures when there are 16 test units and
(i) The 3rd, 5th and 10th are censored.
(ii) Type II singly-censoring is used, with the test terminating after 10 units have failed.
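A Python sketch of the Herd and Johnson procedure, reproducing the F column of Problem 22 ('F' marks a failure, 'C' a censored unit):

    def herd_johnson(outcomes):
        N = len(outcomes)
        R, F_values = 1.0, []
        for i, o in enumerate(outcomes, start=1):
            if o == 'F':
                R *= (N + 1 - i) / (N + 2 - i)   # failed: shrink R
                F_values.append(round(1 - R, 3))
            else:
                F_values.append(None)            # censored: no F value
        return F_values

    print(herd_johnson(['F', 'F', 'C', 'F', 'F', 'C', 'F', 'F', 'C', 'F']))
    # [0.091, 0.182, None, 0.284, 0.386, None, 0.509, 0.632, None, 0.816]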

Accelerated life testing


As the name suggests, accelerated life testing addresses the obvious problem of the tardiness with which failure test data can be obtained. There are, broadly speaking, two types.
Compress-time testing is appropriate when a device is not required to run continuously (e.g. not, say, a time piece or a refrigeration unit) and simply requires that the device be used more extensively than would be the case in normal circumstances.
For example, a door handle may be tested by repeatedly opening and closing it, or an engine starter may be turned on and off repeatedly, or a flash light operated continuously to test the battery duration.
Clearly there are potential shortcomings with accelerated testing. For example, a device may heat up if the testing is accelerated too much, and a warmed up device may not be typical of normal operation.
Advanced-stress testing involves placing a greater load on a device or subjecting it to a harsher environment.
For example, electronic devices may be deliberately heated for testing purposes.

Reliability growth - Duane Models


Earlier we considered which of a number of candidate distributions best modelled some test failure data and then calculated their parameters. The mean of the selected distribution gives the MTTF, perhaps the most useful summary statistic for the life of any device.
The test data used is assumed to be post production, i.e. using the released version of the device. Of course efforts would have been made during the production process to enhance the reliability, and this could be achieved by increasing the MTTF (regardless of what the PDF might be).
For example, a prototype might be developed and tested until it fails. An analysis might suggest that an alternative design might have prevented, or made less likely, the failure. Thus the change is implemented, the testing of the prototype continues, and the cycle is repeated a number of times.
The distinction between development MTTF and release MTTF is represented in the graph opposite.
The initial MTTF for the prototype is 10 but with ongoing design improvements this is gradually increased (i.e. improved) towards a MTTF of 30.
When the reliability growth process is complete (or stopped), the release version of the device has been developed and this has a MTTF of 30.
Suppose we obtained the following data, with failure time in hours.

Failure      1     2      3     4      5     6      7      8      9      10
Time       103   315    801  1183   1345  2957   3909   5702   7261   8245
Mc = T/N   103   157.5  267  295.8  269   492.8  558.4  712.8  806.8  824.5

(There may be a number of identical prototypes involved in the test. So, for example, if 4 units are tested and the first failure is after 7 hours, then we record 28 hours of operation without failure. If we had 6 units and the first failure was after 5 hours we would record 30, etc.)
We record Mc (cumulative), which can be taken as an estimate of the MTTF. This is increasing over time because every time we find and eliminate a deficiency in the design we would expect the MTBF to increase.
An obvious question of interest would be: what is the instantaneous MTTF Mi at the end of this test process?
We might expect it to be more than the final measure of 824.5, as this is calculated on data from when the design flaws were not all eliminated.
Duane developed the model ln(Mc) = ln(T/N) = α ln(T) + c, and Mi = Mc/(1 − α).
We plot T/N v T on log-log paper as below and estimate the slope α by taking two points on the (estimated) regression line.
To find the slope of a log-log graph you must calculate not (y2 − y1)/(x2 − x1) but
(ln(y2) − ln(y1))/(ln(x2) − ln(x1)) = ln(y2/y1)/ln(x2/x1)

      x      y    ln(x)  ln(y)
1   10000   880   9.21   6.78
2     120   100   4.79   4.61

α = (6.78 − 4.61)/(9.21 − 4.79) ≈ 0.492

As an alternative to working with log-log graphs, we can log transform both T/N and T (to get ln(T/N) and ln(T)) and use simple linear regression to find the slope of the trend line b (labelled α in Duane models), to get α = 0.492 as above, i.e.

i     T      Mc = T/N   ln(T)   ln(Mc)
1      103    103.0     4.635   4.635
2      315    157.5     5.753   5.059
3      801    267.0     6.686   5.587
4     1183    295.8     7.076   5.690
5     1345    269.0     7.204   5.595
6     2957    492.8     7.992   6.200
7     3909    558.4     8.271   6.325
8     5702    712.8     8.649   6.569
9     7261    806.8     8.890   6.693
10    8245    824.5     9.017   6.715

So Mi = 824.5/(1 − 0.492) ≈ 1623 hours.


We can use this model for two purposes.
1. To estimate the MTBF for the component that has been tested for a given time
2. To estimate the amount of testing needed to achieve a desired MTBF

We can answer such questions using the regression equation ln(Mc) = α ln(T) + c and the relationship between the instantaneous and cumulative MTTF, Mi = Mc/(1 − α).
I.e. ln(T/N) = ln(Mc) = α ln(T) + c, so Mc = exp(α ln(T) + c) and Mi = Mc/(1 − α) = exp(α ln(T) + c)/(1 − α).
If we want T for a given Mi we transpose this equation to get T = exp((ln((1 − α)Mi) − c)/α).
So here Mi = exp(0.492 ln(T) + 2.255)/(1 − 0.492) and
T = exp((ln((1 − 0.492)Mi) − 2.255)/0.492).
Problem 23
(a) What is the (instantaneous) MTTF at the end of the test?
(b) If we want a MTTF of 800 hours, how long should the testing continue?
Solution
Using these formulae above we get (a) 561 and (b) 563 and 2052 respectively
Exercise
What MTTF might be expected after 50,000 hours of testing? (3867)
How much testing would be needed if the MTTF is to be 6 months? (64,400)

Exercise
Estimate the MTTF at the end of the test for the following failure data (days). Also, find the MTTF after 200 days and how long it would take to achieve a MTBF of 300.
10  35  60  103  161  275  495  867  1682  2918
Solution
i) 735  ii) 114  iii) 994

Exercise
Estimate the MTTF at the end of the test for the following failure data (hours). Also, find the MTTF after 500 hours and how long it would take to achieve a MTTF that would be double that obtained at the end of the test.
10  50  90  160  250  350  600  900  1500  2000  3000  4000
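A Python sketch of the Duane fit for the worked data above (numpy's polyfit estimates the slope α and intercept c on the log-log scale):

    import numpy as np

    T = np.array([103, 315, 801, 1183, 1345, 2957, 3909, 5702, 7261, 8245])
    Mc = T / np.arange(1, len(T) + 1)           # cumulative MTTF estimates

    alpha, c = np.polyfit(np.log(T), np.log(Mc), 1)
    print(alpha, c)                             # ~0.492 and ~2.255

    Mi = lambda t: np.exp(alpha * np.log(t) + c) / (1 - alpha)
    print(Mi(T[-1]))                            # instantaneous MTTF at test end

    # testing time needed for a target instantaneous MTTF
    T_for = lambda m: np.exp((np.log((1 - alpha) * m) - c) / alpha)
    print(T_for(800))                           # ~2,000 hours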


Acceptance Sampling
Introduction
A manufacturer (producer) is mass producing items for a customer (consumer). However, a percentage of the items are defective and the parties have agreed that a defective rate of 3% or less in any batch is acceptable.
If all items in the batch can be inspected (100% inspection) then the decision to accept or reject a batch is trivial and can be depicted using the operating characteristic (OC) curve shown opposite.
PA is the probability of acceptance (i.e. that the batch will be accepted), and p is the proportion of defects in the batch. Thus, provided that p ≤ 3%, the batch will definitely be accepted (purchased); if p > 3% it will definitely not be accepted.
What if, as is frequently the case, it is not feasible to perform 100% inspection and a smaller sample is used?
For example, suppose samples of size n = 20 are randomly selected from each batch and the batch is purchased if the number of defects in the sample is no more than the acceptance number ac = 2.
In this case we can only consider, for a given defective proportion p, the probability of the batch being accepted, and we do so using the binomial distribution. For example, we can compile the following table.
p     0%      5%     10%    20%    30%
PA   100.0%  92.5%  67.7%  20.6%  3.5%

If we plot these values, PA v p, and interpolate the plot between these values we get the OC curve for our sampling plan as shown below.

Note that in contrast to the ideal curve where we employ 100% inspection, it is no longer certain that the batch will be purchased when p ≤ 3%. For example, if p = 2% then PA = 99.3% (Exercise).
Conversely, if p = 10% then PA = 67.7%, i.e. two out of every three batches will be purchased even when the proportion of defectives is 10%, more than 3 times the maximum considered acceptable.
The consumer would probably not be happy with this sampling plan and would demand a more stringent plan be used, perhaps samples of size 30 but with the same acceptance number of 2.
The corresponding OC curve for this plan is shown (dashed) below, along with the earlier one.
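An OC curve is one line of scipy; a sketch reproducing the table above and the more stringent n = 30 plan:

    import numpy as np
    from scipy.stats import binom

    def oc(n, ac, p):
        # probability of accepting a batch with defective proportion p
        return binom.cdf(ac, n, p)

    p = np.array([0.0, 0.05, 0.10, 0.20, 0.30])
    print(oc(20, 2, p))     # [1.0, 0.925, 0.677, 0.206, 0.035]
    print(oc(30, 2, p))     # the (30, 2) plan, lower acceptance throughout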

We can tell from this new OC curve that the chance that the consumer will purchase the batch when p is 10% has been reduced to about 40%.
But of course there is a corresponding increase in the chance of the producer failing to sell a batch with a tolerable number of defectives. For example, the chance of a batch with only 2% defective items being sold has dropped from 99.3% to 97.8%.
It is simply impossible to achieve the ideal OC curve when sampling is used. At best, we can specify two points
on the OC curve and determine a sampling plan that interpolates these two points.
Problem 24
Determine the OC curve that interpolates the two points shown in the curve.
The points are (8%, 95%) and (20%, 10%).
The value p = 8% is called the acceptable quality level (AQL) and p = 20% is the rejectable quality level (RQL).
Essentially, the batch should probably not be rejected when p = AQL and probably not be accepted when p = RQL.
In our example, these probabilities are 5% and 10% and are called the producer's and consumer's risk respectively.
The producer's risk is labelled α and the consumer's risk β.
Solution
RQL/AQL = 20/8 = 2.500. As α = 5% (0.05) and β = 10% (0.1) we refer to the first column of the tables by Cameron below. In this column, and for ac = 9, we find the value 2.618, and this is the smallest value in the column that exceeds 2.500. Thus our acceptance number is 9.
To determine the sample size, divide the n·AQL entry for this row by the AQL. Thus 5.426/0.08 = 68.
Thus our sampling plan is (n, ac) = (68, 9).
When p = 8% we get PA = 95.7%, so α = 4.3%. When p = 20% we get PA = 10.3%, so β = 10.3%.
We don't get α and β to be exactly 5% and 10% respectively, as tabular methods are imprecise.
Tables by Cameron (values of the ratio RQL/AQL; the sample size n is found from the n·AQL column).

α = 0.05
ac    β=0.10    β=0.05    β=0.01    n·AQL
0     44.890    58.404    89.781     0.052
1     10.946    13.349    18.681     0.355
2      6.509     7.699    10.280     0.818
3      4.890     5.675     7.352     1.366
4      4.057     4.646     5.890     1.970
5      3.549     4.023     5.017     2.613
6      3.206     3.604     4.435     3.286
7      2.957     3.303     4.019     3.981
8      2.768     3.074     3.707     4.695
9      2.618     2.895     3.462     5.426
10     2.497     2.750     3.265     6.169
11     2.397     2.630     3.104     6.924
12     2.312     2.528     2.968     7.690
13     2.240     2.442     2.852     8.464
14     2.177     2.367     2.752     9.246
15     2.122     2.302     2.665    10.035
16     2.073     2.244     2.588    10.831
17     2.029     2.192     2.520    11.633
18     1.990     2.145     2.458    12.442
19     1.954     2.103     2.403    13.254
20     1.922     2.065     2.352    14.072

α = 0.01
ac    β=0.10    β=0.05    β=0.01    n·AQL
0    229.105   298.073   458.210     0.010
1     26.184    31.933    44.686     0.149
2     12.206    14.439    19.278     0.436
3      8.115     9.418    12.202     0.823
4      6.249     7.156     9.072     1.279
5      5.195     5.889     7.343     1.785
6      4.520     5.082     6.253     2.330
7      4.050     4.524     5.506     2.906
8      3.705     4.115     4.962     3.507
9      3.440     3.803     4.548     4.130
10     3.229     3.555     4.222     4.771
11     3.058     3.354     3.959     5.428
12     2.915     3.188     3.742     6.099
13     2.795     3.047     3.559     6.782
14     2.692     2.927     3.403     7.477
15     2.603     2.823     3.269     8.181
16     2.524     2.732     3.151     8.895
17     2.455     2.652     3.048     9.616
18     2.393     2.580     2.956    10.346
19     2.337     2.516     2.874    11.082
20     2.287     2.458     2.799    11.825
We can use the binomial distribution (and Excel) to check our calculations and compile an OC curve. The
required function in MS Excel is BINOMDIST(ac, n, p, TRUE).
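If Excel is not to hand, the same tabulation can be scripted. A minimal sketch in Python (assuming SciPy), for the (68, 9) plan of Problem 24:

```python
from scipy.stats import binom

n, ac = 68, 9
for p in (0.02, 0.05, 0.08, 0.12, 0.16, 0.20):
    # binom.cdf(ac, n, p) plays the role of BINOMDIST(ac, n, p, TRUE)
    print(f"p = {p:.0%}   PA = {binom.cdf(ac, n, p):.1%}")
```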
Problem 25
Construct a sampling plan if we wish to fix the producer's risk at 5% and the consumer's risk at 10%. Assume
an AQL of 7% and an RQL of 15% have been agreed.
Solution
(n, ac) = (132, 14)
Exercise
Devise a sampling plan for the following
AQL    RQL    α     β
6%     17%    1%    5%
8%     20%    1%    1%
8%     10%    5%    1%
In each case sketch the OC curve and determine the precision of this method using the binomial.
Determine the probability of the batch being accepted if the proportion of defectives in the batch is
(i) the AQL + 1%
(ii) the AQL - 1%
Markov Chains: The Machine Breakdown Problem
Problem 26
A stationery shop offers a photocopying service to the public and has one machine to handle all jobs. Due to
heavy usage, the quality of copies may deteriorate. Distinct states (excellent, good, acceptable and
unacceptable) for the copying machine have been identified and the associated costs (refunds/repairs/lost business)
per day estimated. These values are tabulated below, as well as the likelihood of the machine deteriorating to a
worse state. (The machine will not spontaneously upgrade to an improved state.) Once the machine enters the
unacceptable state, it is repaired and returned to a state of excellent. Calculate the average net profit per day if
the average worth of the copier per day is as stated in the table.
                          To
From            Good   Acceptable   Unacceptable   Worth / day
Excellent       20%    10%          5%             300
Good            -      15%          10%            230
Acceptable      -      -            30%            100
Unacceptable    -      -            -              0

(An unacceptable machine is repaired, returning to excellent with probability 100%.)
Solution
We must find the steady state s = (e, g, a, u), which we do by solving P.s = s, where P is the probability transition
matrix. (e is the proportion of time that the machine is in a state of excellence, and similarly for g, a and u.)

Use e + g + a + u = 1 rather than the last matrix row, as the latter is merely a linear combination of the
first three rows.
Thus we can generate the following equations. (This is a problem requiring four variables in four linear
equations to be solved. This is routine but tedious. A good strategy is to express all the states in
terms of the highest one, e.)

0.65e + u = e                 ⇒  u = 0.35e
0.20e + 0.75g = g             ⇒  0.25g = 0.2e  ⇒  g = 0.80e
0.10e + 0.15g + 0.70a = a     ⇒  0.3a = 0.1e + 0.15(0.80e) = 0.22e  ⇒  a = 0.73e
e + g + a + u = 1             ⇒  e(1 + 0.80 + 0.73 + 0.35) = 1  ⇒  e = 0.35

Thus u = 0.12, a = 0.25 and g = 0.28.
The steady state vector is (e, g, a, u) ≈ (35%, 28%, 25%, 12%).
Thus the expected daily worth is (300)(0.35) + (230)(0.28) + (100)(0.25) = 194.40.
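For larger chains this elimination becomes tedious, so the linear system can be handed to a solver. A minimal sketch in Python (assuming NumPy; steady_state is an illustrative helper, not a library routine):

```python
import numpy as np

def steady_state(P):
    """Solve P s = s with sum(s) = 1, for a column-stochastic matrix P."""
    k = P.shape[0]
    A = P - np.eye(k)
    A[-1, :] = 1.0                 # replace the redundant last row...
    b = np.zeros(k); b[-1] = 1.0   # ...with the normalisation equation
    return np.linalg.solve(A, b)

# Columns are the "from" states E, G, A, U of the photocopier.
P = np.array([[0.65, 0.00, 0.00, 1.00],
              [0.20, 0.75, 0.00, 0.00],
              [0.10, 0.15, 0.70, 0.00],
              [0.05, 0.10, 0.30, 0.00]])
s = steady_state(P)
print(s)                                   # approx. (0.347, 0.277, 0.254, 0.121)
print(np.array([300, 230, 100, 0]) @ s)    # approx. 193.3 (194.40 with the rounded hand values)
```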
There are a number of properties (explicit and implicit) in the process described above which are necessary for
the process to be Markov but are very restrictive.
1. The transition probabilities are constant over time.
2. The time interval between successive states remains constant.
3. The probability of the machine being in any given state at time n depends only on its state at
time n-1.
4. Each column of the probability transition matrix must sum to 1.
Problem 27
A firm leases earthmoving machines at different daily rates depending on the state of repair of the machine. The
3 possible states a machine might be in are nominally labelled perfect, useable and broken down. 100 per day is
charged for a perfect machine and 70 per day is charged for a useable machine. Over a period of one day,
there is a 20% chance that a perfect machine will deteriorate to a state of useable and a 5% chance that it will
break down. There is a 10% chance that a machine in a useable state will break down. Machines which break
down are repaired and returned to the perfect state the day after they break down. Also, machines never
spontaneously improve to a better state. If capital depreciation and maintenance costs for a machine are 80 per
day, is it economically viable for the firm to continue leasing them?
Solution
The steady state (p, u, b) is (0.308, 0.615, 0.077) and the earnings are 73.85 per day.
As the maintenance cost is 80, the net income is -6.15. So no, it is not viable.
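The steady_state helper sketched under Problem 26 can be reused to verify this (assuming the same imports):

```python
P = np.array([[0.75, 0.00, 1.00],
              [0.20, 0.90, 0.00],
              [0.05, 0.10, 0.00]])   # columns: from perfect, useable, broken down
s = steady_state(P)                  # approx. (0.308, 0.615, 0.077)
print(np.array([100, 70, 0]) @ s)    # approx. 73.85, below the 80 daily cost
```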
Problem 28
An industrial robot can be in a state of excellent, acceptable or unacceptable. Each day there is an 8% chance
that the robot will deteriorate from a state of excellence to a state of acceptable and a 5% chance that
the robot will deteriorate from a state of acceptable to a state of unacceptable. Also, there is a 2% chance that
the robot will deteriorate from a state of excellence to a state of unacceptable in a day. When the robot reaches a
state of unacceptable, it is serviced and returned to a state of excellence. Also, the robot will not upgrade itself
from a poorer to a better state. What proportion of the time is the robot in a state of acceptable or better?
Solution
The steady state (e, a, u) is (0.370, 0.593, 0.037) so Pr(acceptable or better) = 96.3%
Problem 29
A car hire firm has a fleet of 50 cars and classifies their condition into the categories excellent, average, fair
and unacceptable. In any given month, there is a 10% chance that an excellent car will deteriorate to a state of
average, a 12% chance that it will deteriorate to a state of fair and a 3% chance that it will deteriorate to a state
of unacceptable. There is a 5% chance that an average car will deteriorate to fair and an 8% chance that it will
deteriorate to unacceptable. There is a 4% chance that a fair car will deteriorate to unacceptable. All
unacceptable cars are overhauled and returned to a state of excellence within a month. If the rental charges
for excellent, average and fair cars are 60, 40 and 25 per day respectively, determine the expected average
income the firm might hope to make when all 50 cars are leased out (unacceptable cars earning nothing).
Solution
The steady state (e, a, f, u) = (0.167, 0.129, 0.662, 0.042) and the average value per car is 31.74.
Thus the average income from the 50 cars = 50 × 31.74 ≈ 1,587.
FMECA
Failure Modes, Effects and Criticality Analysis (FMECA) is a tool used in the design stage of a system to
enhance its reliability. It is a bottom-up approach in that it looks at the low-level components that make up
the system and considers how failure of these components can induce system failures.
Some of the advantages of FMECA are:
It affords the designer an overview of the system from a reliability perspective.
It highlights the vulnerable parts of the system, allowing remedial action to be undertaken.
It identifies any operational constraints imposed by the design.
Example
We will consider an FMECA example for an electric screwdriver, taking it to consist of the components
power unit, motor, switch, shank and bit.
(1) Item     (2) Function               (3) Failure mode    (4) Cause                  (5) Failure   (6) Component        (7) Imm.         (8) Next      (9) Severity   (10) Criticality
                                                                                       mode ratio    failure rate         effect           effect        S              (rate × ratio × S)
                                                                                                     (per 10 hours op)

Power unit   -                          Charge loss         Age                        0.7           2                    Low power        No torque     1.0            1.4
Motor        -                          Low power           Low charge                 0.5           1                    Reduced torque   -             0.6            0.3
Motor        -                          Burn out            Misuse / age               0.3           1                    No torque        -             1.0            0.3
Switch       On/off, speed, direction   No on/off           Electronics failure        0.6           1                    No torque        -             1.0            0.6
Switch       On/off, speed, direction   No variable speed   Electronics failure        0.2           1                    Reduced torque   -             0.4            0.08
Shank        -                          Crack / break       Manufacture flaw, age      0.9           0.1                  Reduced torque   No torque     0.6            0.054
Bit          -                          Tip damage          Wrong bit / wrong speed    0.7           2                    Bit slips        Bit damage    0.4            0.56
Bit          -                          Rust                Improper storage           0.2           2                    Bit slips        Bit damage    0.4            0.16

                                                                                                                          Total criticality:                            3.454
1. Component name: Some FMECA templates add further code numbers / references as well. There is a separate entry for
each failure mode (note that there are two entries for bit).
2. Function: Often self-evident. Clarify if not (e.g. the Switch component)
3. Failure mode: How does the failure manifest itself? (e.g. no/reduced power, bit slips etc.)
4. Cause: E.g. misuse, age, fault in component etc.
5. Failure mode ratio: For the bit, the FMR for tip damage is 70% and for rust is 20%. These values might be ascertained
subjectively by the designer. Ideally we would like the failure mode ratios for a particular component to sum to 1 but this is
usually impossible, as it is unlikely that every possible failure mode can be identified.
6. Component failure rate: For a component (not a failure mode). E.g. the bit component may fail twice in every 10 hours of
operation.
7. Immediate effect: The immediate impact of the failure
8. Next effect: The knock-on effect, if any. E.g. if the bit slips, it will wear and be damaged.
9. Severity: A high value (close to 1) will be specified when the effects of component failure are severe; lower values are used for
less severe consequences. E.g. no torque has severity 1 but reduced torque only 0.6. There may be a degree of subjectivity
in assigning these values.
10. Criticality: the product of the failure mode ratio, the component failure rate and the severity S; it gives us a measure of how
critical each failure mode is. Note that the sum of the criticality values (3.454) gives an overall measure of the system and
could be used to compare the reliabilities of competing designs. As charge loss, with a contribution of 1.4, contributes more
to this total than any other failure mode, the designer might well focus on seeking improvements or alternatives to this
component.
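As a check on the arithmetic, the criticality column can be recomputed mechanically. A minimal sketch in Python, assuming the values as tabulated above:

```python
# (failure mode, component failure rate, failure mode ratio, severity S)
rows = [
    ("power unit: charge loss", 2.0, 0.7, 1.0),
    ("motor: low power",        1.0, 0.5, 0.6),
    ("motor: burn out",         1.0, 0.3, 1.0),
    ("switch: no on/off",       1.0, 0.6, 1.0),
    ("switch: no var. speed",   1.0, 0.2, 0.4),
    ("shank: crack/break",      0.1, 0.9, 0.6),
    ("bit: tip damage",         2.0, 0.7, 0.4),
    ("bit: rust",               2.0, 0.2, 0.4),
]
crit = {name: rate * ratio * sev for name, rate, ratio, sev in rows}
print(max(crit, key=crit.get))       # charge loss, criticality 1.4
print(round(sum(crit.values()), 3))  # 3.454, the system total
```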