and
Stochastic processes
Lecture Overview
Reliability Data Analysis
Complex Systems
Reliability/Hazard Function
Stochastic Process
Reliability Models
Reliability Analysis
Statistical analysis of failure modes and of the actions carried out, such as preventive
maintenance or repair, gives an approximate assessment of system reliability and of the
actions needed to restore the performance of the system at all levels.
Reliability data
The analysis of reliability data depends on the types of observations available. Systems
on test that fail yield complete information on the time to failure, provided monitoring
is continuous.
For systems that have not failed we have only partial information. Such data are called
time-censored.
If all systems start operating at the same time and observation stops at a fixed time, the
censoring is single; this is type I censoring. If instead observation terminates at the instant
of the rth failure, where r is a predetermined integer, the failures are censored according to
type II censoring.
If systems start operating at different time points in an interval (0, t], and the observation
terminates at time t, we have multiple censoring of data. If a system is known to have failed
before observation started, we have left censoring. The other type of censored information,
where the system is still in operation when monitoring ends, is called right censoring.
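The two single-censoring schemes can be sketched in a few lines of Python. This is only an illustration: the lifetimes, the cut-off time and the function names are hypothetical and do not come from the plant data.

```python
from typing import List, Tuple

def censor_type_1(lifetimes: List[float], t_end: float) -> List[Tuple[float, int]]:
    """Type I: observation stops at the fixed time t_end.
    Returns (observed_time, failed), where failed = 1 for an observed failure
    and failed = 0 for a right-censored unit."""
    return [(min(x, t_end), 1 if x <= t_end else 0) for x in lifetimes]

def censor_type_2(lifetimes: List[float], r: int) -> List[Tuple[float, int]]:
    """Type II: observation stops at the instant of the r-th failure."""
    t_r = sorted(lifetimes)[r - 1]
    return [(min(x, t_r), 1 if x <= t_r else 0) for x in lifetimes]

# Hypothetical lifetimes, all units started at time 0.
lifetimes = [12.0, 30.0, 45.0, 80.0, 150.0]
type1 = censor_type_1(lifetimes, t_end=50.0)
type2 = censor_type_2(lifetimes, r=2)
```

Under type I the number of failures is random and the stopping time is fixed; under type II it is the other way round.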
Reliability Data
[Figure: sources of reliability data. Manufacturer and customer statistics yield failure events ("bad news") and failure-free, censored times ("good news").]
[Figure: observed information on renewals against time (axis 0 to 2400), marking installation of new equipment, start of data collection and end of data collection.]
Equipment Reliability
System reliability
Complex system
xi
ci
0
1
0
0
0
0
0
0
0
0
1
1
1
4
13
27
8
148
92
13
13
67
29
12
1
1
1
1
1
1
0
1
1
1
0
1
1
1
37
28
38
20
28
44
3
56
64
8
62
8
46
1
1
1
1
0
1
1
1
1
0
0
1
1
22
51
51
15
18
1
26
37
36
2
12
27
102
1
1
1
1
1
1
0
1
1
1
0
1
1
3
21
6
26
15
35
44
61
84
12
65
43
4
1
1
1
1
0
1
0
1
0
1
0
0
1
Distribution Plot
[Figure: plant schematic showing ID Fans A and B, boiler, FD Fans A and B, wind box, coal mills, PA Fans A and B, and BUS MAIN H.]
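Testing evidence
[Figure: timeline of plant state during a maintenance job, with stages labelled WOR, WOC, PFW, SPR and ROMP and events failure, entry, issued, repair, do work, off and closed; intervals measured in days.]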
The sequence of activities varies from job to job. The interval between
failure and isolation is a priority. Considering the activities stated above,
work out the state of the equipment at a specific time. How precisely is
the state of the plant known?
Failure Modes
Critical failure: A failure which is sudden and causes cessation of one or more
fundamental functions. This failure requires immediate corrective maintenance action in
order to return the item to a satisfactory condition.
Degraded failure: A failure which is gradual, partial or both. Such a failure does not cease
the fundamental functions but compromises one or more functions. The functions may be
compromised by any combination of reduced, increased or erratic output, and the failure
may lead to a critical failure.
Plant Database
Plant data are monitored in the control room, collected and stored in the
following sources:
Genysis Database
WO_NUM
85700
85700
85800
85800
2AA
203700
"C"PFMILLCOAL/AIROUTLETTEMPIND'N
292900
2AA01
293000
350500
350500
351100
351100
351500
351500
733000
733000
PULVERISEDFUELMILLS---CARRYOUT
1AA033014
293000
PULVERISEDFUELMILLS---CARRYOUT
203700
292900
MO_TYPE
"A"PFMILL---HOLEATWESTROPEBOX
3AA06
"F"PFMILL---LARGECRACKINWELD
2AA0716
"G"PFMILLREJECTSSYSTEM&
CONTROLS
4AA06
"F"PFMILL---CCRPAFLOWINDICATION
4AA03
"C"PFMILL---PLEASEINSPECT3.3Kv
4AA0316
"C"PFMILLREJECTSSYSTEM&
CONTROLS
FAILURE
CAUSE
DURATION
(DAYS)
DATE
12/01/2000
ENTERED
20/01/2000
CLOSED
12/01/2000
ENTERED
60
210
09/08/2000
CLOSED
24/01/2000
ENTERED
413
24/01/2000
CLOSED
31/01/2000
ENTERED
12
15
15/02/2000
CLOSED
01/02/2000
ENTERED
12
14
15/02/2000
CLOSED
04/02/2000
ENTERED
50
17
21/02/2000
CLOSED
04/02/2000
ENTERED
408
05/02/2000
CLOSED
05/02/2000
ENTERED
18
05/02/2000
CLOSED
13/03/2000
ENTERED
15/03/2000
CLOSED
Efor Database
Date on     Unit MW  Duration  MWH     Cause                              Type
09/01/1999  57       96        5521    Mills/fuel quality                 FR
15/01/1999  47       167       7917    Mills/fuel quality                 FR
20/01/1999  150      0.5       75      Mill fire                          FR
29/01/1999  28       48        1338    Mills/coal condition               FR
26/02/1999  101      25        2522    Gearbox lub. oil pp & mill avail   PR
14/07/1999  104      20.66     2149    Mills                              FR
10/08/1999  56       72        4032    Mill availability                  PR
12/08/1999  80       1.16      93      Mills                              FR
13/08/1999  44                 44      Mills                              FR
30/08/1999  50       5.25      263     Mills                              FR
31/08/1999  74                 296     Mills                              FR
23/09/1999  100      0.5       50      Loss of mill                       FR
07/10/1999  50       17.5      875     Mills                              FR
14/10/1999  76       9.17      697     Mills                              FR
12/01/2000  103      6.5       669.5   Mills                              FR
03/02/2000  115      0.33      37.95   Coal feeder                        FR
03/02/2000  80       1.5       115     Mills                              FR
17/02/2000  70                 350     Lost mill                          FR
24/02/2000  70       0.5       35      Mill                               FR
PI
Date        Time      Unit   Mill A    Mill B    Mill C    Mill D    Mill E    Mill F    Mill G    Mill H
                      L1L    L1BMEA01  L1BMEA02  L1BMEA03  L1BMEA04  L1BMEA05  L1BMEA06  L1BMEA07  L1BMEA08
01/01/2003  00:00:00  314.3  0.0       54.1      50.3      52.7      0.0       0.0       60.0      0.0
01/01/2003  01:00:00  314.2  0.0       56.4      50.8      57.4      0.0       0.0       60.0      0.0
01/01/2003  02:00:00  317.1  0.0       51.2      53.1      54.6      0.0       0.0       61.5      0.0
01/01/2003  03:00:00  318.1  0.0       51.4      54.7      56.1      0.0       0.0       58.6      0.0
01/01/2003  04:00:00  312.7  0.0       49.3      53.2      56.9      0.0       0.0       57.5      0.0
01/01/2003  05:00:00  312.6  0.0       49.3      55.0      54.6      0.0       0.0       58.0      0.0
01/01/2003  06:00:00  314.9  0.0       51.7      52.4      54.3      0.0       0.0       56.6      0.0
01/01/2003  07:00:00  311.8  0.0       51.7      47.8      57.2      0.0       0.0       56.6      0.0
01/01/2003  08:00:00  313.4  0.0       49.6      49.2      54.4      0.0       0.0       57.6      0.0
01/01/2003  09:00:00  311.8  0.0       50.1      50.9      51.3      0.0       0.0       57.1      0.0
01/01/2003  10:00:00  312.7  0.0       45.8      55.4      52.3      0.0       0.0       55.1      0.0
01/01/2003  11:00:00  312.2  0.0       48.0      55.1      52.3      0.0       0.0       57.0      0.0
01/01/2003  12:00:00  314.2  0.0       48.0      54.3      54.1      0.0       0.0       56.9      0.0
01/01/2003  13:00:00  315.8  0.0       48.0      54.1      51.6      0.0       0.0       56.0      0.0
01/01/2003  14:00:00  322.6  0.0       48.4      53.4      50.5      0.0       0.0       57.7      0.0
01/01/2003  15:00:00  266.7  0.0       1.1       53.5      50.0      0.0       0.0       57.7      0.0
01/01/2003  16:00:00  349.3  0.0       1.1       46.5      48.8      0.0       48.1      57.1      0.0
01/01/2003  17:00:00  361.7  0.0       1.1       48.8      49.0      0.0       52.4      57.1      0.0
01/01/2003  18:00:00  365.4  0.0       1.1       48.8      50.2      0.0       53.3      59.3      0.0
01/01/2003  19:00:00  318.7  0.0       1.1       48.8      50.0      0.0       52.0      59.4      0.0
01/01/2003  20:00:00  312.2  0.0       1.1       50.6      48.9      0.0       51.4      58.0      0.0
01/01/2003  21:00:00  311.6  0.0       1.1       50.6      50.8      0.0       51.5      58.5      0.0
01/01/2003  22:00:00  306.4  0.0       1.1       45.8      49.0      0.0       56.5      58.5      0.0
01/01/2003  23:00:00  314.1  0.0       0.0       50.2      51.0      0.0       59.0      61.1      0.0
PI data information
Date        Time   Motor current  State
01/01/2003  15:00  1.1            0
15/01/2003  4:00   51.9           1
15/01/2003  22:00  0.3            0
16/01/2003  1:00   51.8           1
16/01/2003  15:00  0.7            0
14/02/2003  11:00  47.2           1
26/02/2003  3:00   0.2            0
26/05/2003  7:00   45.3           1
27/05/2003  14:00  0.5            0
29/05/2003  10:00  53.3           1
01/06/2003  5:00   0.8            0
06/06/2003  8:00   56.8           1
10/06/2003  22:00  0.2            0
11/06/2003  7:00   56.6           1
13/07/2003  18:00  0.7            0
14/07/2003  20:00  51.8           1
19/07/2003  1:00   0.8            0
22/07/2003  4:00   42.6           1
23/07/2003  5:00   1.7            0
24/07/2003  9:00   68.4           1
22/08/2003  10:00  0.1            0
14/11/2003  1:00   47.4           1
06/12/2003  8:00   0.6            0
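As a rough sketch of how such state-change records can be turned into uptime and downtime durations, the Python below uses the first five records of the PI data. It assumes that state 1 means the mill is running and state 0 means it is not (suggested by the motor current, but not stated in the slides); the function name is hypothetical.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M"

# First five state changes from the PI data (date and time, state).
records = [
    ("01/01/2003 15:00", 0),
    ("15/01/2003 4:00", 1),
    ("15/01/2003 22:00", 0),
    ("16/01/2003 1:00", 1),
    ("16/01/2003 15:00", 0),
]

def split_durations(records):
    """Return (uptimes, downtimes) in hours between successive state changes.
    Assumption: state 1 = running, state 0 = not running."""
    parsed = [(datetime.strptime(ts, FMT), state) for ts, state in records]
    uptimes, downtimes = [], []
    for (t0, state), (t1, _) in zip(parsed, parsed[1:]):
        hours = (t1 - t0).total_seconds() / 3600.0
        (uptimes if state == 1 else downtimes).append(hours)
    return uptimes, downtimes

uptimes, downtimes = split_durations(records)
```

The last record only opens a new interval, so it contributes no duration until the next state change is observed (a right-censored time).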
Shift logs
MILL
UNIT1
UNIT2
UNIT3
UNIT4
service
available
available
overhaul
available
available
overhaul
available
available
service
available
available
available
available
available
available
available
available
service
available
available
available
service
available
available
available
available
service
available
overhaul
available
available
For illustrative purposes, data extracted from the shift logs are presented in this
form. There are 8 mills in each unit and 7 are required for full production load.
Availability is interpreted as the number of mills operational in a given time
interval. Availability of mills in the plant is interrupted by services and overhauls.
[Figure: four panels, one each for Unit 1, Unit 2, Unit 3 and Unit 4.]
Statistical Analysis
Downtime Analysis
We conducted a Laplace test on the series of downtime data. Mills A, E and H
tested positive for a decreasing mean downtime, and mill B tested positive for
an increasing mean downtime.
Mill E appears to have a change point in the series at about the 20th event. The
Laplace test on the mill E data after the 20th event suggests no trend.
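For reference, a minimal Python sketch of the usual time-truncated Laplace trend statistic for event times is given below. The slides do not say which variant of the test was applied to the downtime series, so this form, the function name and the example data are assumptions for illustration only.

```python
import math

def laplace_statistic(event_times, t_end):
    """Laplace trend statistic for events observed on (0, t_end].
    Under a homogeneous Poisson process it is approximately standard normal.
    Positive values indicate events concentrated late in the window,
    negative values indicate events concentrated early."""
    n = len(event_times)
    mean_time = sum(event_times) / n
    return (mean_time - t_end / 2.0) / (t_end * math.sqrt(1.0 / (12.0 * n)))

# Hypothetical event times on an observation window of length 100.
u_even = laplace_statistic([10, 20, 30, 40, 50, 60, 70, 80, 90], t_end=100.0)
u_late = laplace_statistic([70, 80, 90, 95], t_end=100.0)
```

Values of the statistic beyond about ±1.96 are usually taken as evidence of a trend at the 5% level.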
Uptime analysis
While we analyzed these data through correlations, the results are best
illustrated by categorizing the data into long and short durations.
The correlation is not strong; for example, the autocorrelation coefficient was 0.227. We
therefore conclude that there is no trend in the uptimes for mills A, B, C, D, F and G, and
propose treating the times between events in the data set as independent and identically
distributed.
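A sample autocorrelation coefficient of the kind quoted above can be computed as in the following Python sketch. The slides do not state the lag or the estimator used, so the lag-1 default and the function name are assumptions, and the uptimes are hypothetical.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((v - mean) ** 2 for v in series)
    covariance_sum = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum

# Hypothetical uptimes (hours) between successive events.
uptimes = [120.0, 95.0, 130.0, 60.0, 180.0, 75.0, 110.0, 140.0]
r1 = autocorrelation(uptimes, lag=1)
```

A coefficient close to zero is consistent with treating successive uptimes as independent.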
Uptime analysis
Mill H appears to have a change point at about the 100th event rather than a
continuously changing mean uptime.
Mill E appears to have a mean uptime that is changing predictably.
For the actual uptimes rather than the cumulative data, the series showed no sign of
autocorrelation but did exhibit signs of heteroscedasticity, with a linearly increasing
variance.
The residuals were obtained by estimating the successive uptimes from the first
differences of the polynomial model that was fitted to the cumulative data. We divided the
residuals by their sequential rank (i.e. ri/i) to stabilize the variability within the data. We
see that there were unusually large residuals for the first three events, and there appears to
be a skew towards positive residuals throughout the data.
This strongly suggests an increasing mean residual uptime, which is consistent with
models such as a Pareto distribution, obtained through a mixture of exponentials.
Reliability requirements
However, maintenance actions and renewal can show that a system can
meet its reliability requirements.
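Preliminaries on life distributions
Let T denote the lifetime of a system, with life distribution
\[ F(t) = P(T \le t), \quad 0 < t < \infty. \]
The probability density function f(t) corresponding to F(t) is its derivative (if it
exists). This is a non-negative valued function such that
\[ F(t) = \int_0^t f(u)\,du, \quad 0 \le t < \infty. \]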
Reliability function
The reliability function R(t) = P(T > t) = 1 - F(t) is the probability that the
lifetime of the system will exceed t.
Another important function related to the life distribution is the
hazard function.
Hazard function
This can be used to represent the instantaneous hazard of a system which has survived t units
of time. The hazard is given as
\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t}, \]
where
\[ P(t < T \le t + \Delta t \mid T > t) = \frac{P(t < T \le t + \Delta t,\ T > t)}{P(T > t)} = \frac{P(t < T \le t + \Delta t)}{P(T > t)}. \]
From the multiplication law of probability, notice that h(t)Δt is approximately, for small Δt, the
probability that a system still functioning at age t will fail during the time interval (t, t + Δt).
Hence
\[ h(t) = \frac{1}{1 - F(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{f(t)}{R(t)}. \]
Hazard Function
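The expected life is
\[ E(T) = \int_0^\infty t f(t)\,dt. \]
The equation can be rewritten in order that the expected life can be
computed as
\[ \int_0^\infty t f(t)\,dt = \int_0^\infty \left( \int_0^t ds \right) f(t)\,dt = \int_0^\infty \left( \int_s^\infty f(t)\,dt \right) ds = \int_0^\infty R(t)\,dt. \]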
It is important to distinguish between repairable and nonrepairable items when predicting reliability measures.
There are three basic ways in which the pattern of failure can
change with time. The hazard rate may be decreasing,
increasing or constant.
Repairable system
For items that are repaired when they fail, the reliability
is the probability that failure will not occur in the period of
interest, when more than one failure can occur.
The failure rate, or rate of occurrence of failures (ROCOF), of repairable items can vary
with time according to three trends: constant failure rate (CFR), decreasing failure rate
(DFR) and increasing failure rate (IFR).
A repairable system can show a decreasing failure rate (DFR) when reliability
is improved by progressive repair, as defective parts are replaced by good parts.
An increasing failure rate (IFR) occurs when wear-out failure modes begin
to predominate. The pattern of failure can be illustrated on a bathtub curve.
Example
Suppose the hazard of a given system is constant in time, that is h(t) = λ
for all values of 0 ≤ t < ∞. Then the reliability function is
\[ R(t) = \exp\left( -\int_0^t \lambda\,ds \right) = e^{-\lambda t}, \quad t \ge 0. \]
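The constant-hazard case can be checked numerically with a short Python sketch. The rate value, the integration limits and the function names are illustrative assumptions; the point is that f(t)/R(t) returns the constant hazard and that integrating R(t) gives the expected life 1/λ.

```python
import math

def reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant hazard lam."""
    return math.exp(-lam * t)

def distribution(t, lam):
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(t, lam)

def density(t, lam):
    """f(t) = lam * exp(-lam * t), the derivative of F(t)."""
    return lam * math.exp(-lam * t)

def hazard(t, lam):
    """h(t) = f(t) / R(t)."""
    return density(t, lam) / reliability(t, lam)

def mean_life_numeric(lam, upper, steps):
    """Trapezoidal approximation of the integral of R(t) over [0, upper]."""
    h = upper / steps
    total = 0.5 * (reliability(0.0, lam) + reliability(upper, lam))
    for k in range(1, steps):
        total += reliability(k * h, lam)
    return total * h

lam = 0.5  # hypothetical hazard rate
mean_life = mean_life_numeric(lam, upper=60.0, steps=6000)
```

With this rate the numerical mean life is close to 1/λ = 2, and the hazard is the same at every age.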
[Figure: three panels showing the distribution function F(t), the reliability function R(t) and the hazard function h(t) against time from 0 to 10.]
Weibull Function
\[ F(t) = 1 - \exp\left[ -\left( \frac{t}{\eta} \right)^{\beta} \right], \quad t \ge 0, \]
where β > 0 and η > 0 are the shape and scale parameters respectively.
Weibull Distribution
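\[ R(t) = \exp\left[ -\left( \frac{t}{\eta} \right)^{\beta} \right] \]
Weibull Reliability
[Figure: Weibull reliability curves for β > 1, β = 1 and β < 1.]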
The term infant mortality stems from the high mortality of infants.
Electronic and mechanical systems may initially have a high failure rate.
Manufacturers provide production acceptance tests, burn-in and
environmental test screening to end the infant mortality before shipment to
clients. Therefore β < 1 leads us to suspect that
If the dominant failure mode of a component has β < 1, and the component
survives infant mortality, it will improve with age. Conditional on survival, the
failure rate decreases and the reliability increases.
Tutorial Questions
Tutorial (A)
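\[ R(100) = \exp\left[ -\left( \frac{100}{250} \right)^{2.5} \right] = 0.904. \]
\[ R(t) = 0.95 = \exp\left[ -\left( \frac{t}{250} \right)^{2.5} \right], \quad \text{so} \quad \left( \frac{t}{250} \right)^{2.5} = -\ln(0.95) = 0.051. \]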
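The tutorial numbers can be reproduced with a short Python sketch, using the shape 2.5 and scale 250 that appear in the worked answer. The function names are hypothetical, and inverting the reliability to get the time at which R(t) = 0.95 is the natural next step rather than something the slide shows explicitly.

```python
import math

def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

def weibull_time_for_reliability(r, shape, scale):
    """Solve R(t) = r for t: t = scale * (-ln r) ** (1 / shape)."""
    return scale * (-math.log(r)) ** (1.0 / shape)

r_100 = weibull_reliability(100.0, shape=2.5, scale=250.0)
t_95 = weibull_time_for_reliability(0.95, shape=2.5, scale=250.0)
```

Here `r_100` agrees with the 0.904 quoted above, and `t_95` comes out at roughly 76 time units.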
Stochastic processes
Stochastic Process
Set of random variables, or observations of the same
random variable over time: {X(t), t ≥ 0} (continuous-parameter) or
{X_n, n = 0, 1, ...} (discrete-parameter)
For example, a repair model can be used to determine the optimal time
for preventive maintenance before a failure occurs.
The continuum is time and the highly localized events are failures,
which are assumed to occur at instants within the continuum (Crowder
et al., 1991).
Poisson Process
Let {X(t), t ≥ 0} be a stochastic process where X(t) is the number of
events (arrivals) up to time t. Assume X(0) = 0 and
(i) Pr(arrival occurs between t and t + Δt) = λΔt + o(Δt),
where o(Δt) is some quantity such that \lim_{\Delta t \to 0} o(\Delta t)/\Delta t = 0
(ii) Pr(more than one arrival between t and t + Δt) = o(Δt)
(iii) If t < u < v < w, then X(w) - X(v) is independent of X(u) - X(t).
Let p_n(t) = P(n arrivals occur during the interval (0, t)). Then
\[ p_n(t) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}, \quad n \ge 0. \]
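The Poisson probabilities can be evaluated directly, as in the Python sketch below. The rate and time values are hypothetical; the checks simply confirm that the probabilities sum to one and that the mean number of arrivals is the rate multiplied by the time.

```python
import math

def poisson_pmf(n, rate, t):
    """Probability of exactly n arrivals in (0, t) for a Poisson process."""
    mean = rate * t
    return math.exp(-mean) * mean ** n / math.factorial(n)

rate, t = 0.2, 10.0  # hypothetical arrival rate and interval length
probabilities = [poisson_pmf(n, rate, t) for n in range(60)]
total_probability = sum(probabilities)
mean_arrivals = sum(n * p for n, p in enumerate(probabilities))
```

The probability of no arrival, `probabilities[0]`, equals exp(-rate * t), which is the exponential reliability function seen earlier.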
Poisson Process
Let T1, T2, T3, ... be the times of successive failures of the system and
let Xi = Ti - Ti-1 be the time between failure i - 1 and failure i, where T0
= 0. The Ti and Xi are random variables and we define ti and xi to be
their corresponding realized values. We can define N(t) as the number
of failures within the interval (0, t].
[Figure: time line starting from new, with successive failures at times t1, t2, t3 and t4.]
Intensity function
The important point in system failure data analysis is that failures occur in a
specific sequence, and that the rate at which they occur can be increasing,
decreasing or constant.
The point process models that have been applied to repairable system
reliability are the homogeneous Poisson process (HPP), the non-homogeneous
Poisson process (NHPP) and the superimposed renewal process (SRP).
The modelling terms used in reliability for components (parts) and for systems
have been thoroughly confused by reliability engineers and scientists.
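Non-homogeneous Poisson process (NHPP)
Characteristics of NHPP
\[ N(0) = 0 \]
\[ N(t) - N(s) \text{ is independent of } N(s) \quad \text{(independence of increments)} \]
\[ N(t) - N(s) \sim \mathrm{Po}\left( \int_s^t \lambda(u)\,du \right) \]
where
\[ \lambda(t) = \frac{dE[N(t)]}{dt}. \]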
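One common way to simulate an NHPP is thinning: generate candidate events from a homogeneous process at a bounding rate and keep each with probability equal to the ratio of the intensity to that bound. The Python sketch below assumes a loglinear intensity exp(a + b t) with hypothetical parameters; none of the values are fitted to the plant data.

```python
import math
import random

def loglinear_intensity(t, a=-4.0, b=0.005):
    """Hypothetical loglinear intensity exp(a + b * t), increasing for b > 0."""
    return math.exp(a + b * t)

def simulate_nhpp(intensity, rate_bound, t_end, rng):
    """Simulate event times on (0, t_end] by thinning.
    rate_bound must be at least as large as intensity(t) on the interval."""
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate_bound)
        if t > t_end:
            return events
        if rng.random() <= intensity(t) / rate_bound:
            events.append(t)

def expected_count(intensity, s, t, steps=10000):
    """Midpoint-rule approximation of the integral of the intensity over (s, t]."""
    h = (t - s) / steps
    return sum(intensity(s + (k + 0.5) * h) for k in range(steps)) * h

t_end = 400.0
rng = random.Random(1)
events = simulate_nhpp(loglinear_intensity, loglinear_intensity(t_end), t_end, rng)
mean_count = expected_count(loglinear_intensity, 0.0, t_end)
```

The number of simulated events varies from run to run around `mean_count`, which is the Poisson mean of N(t) - N(s) for s = 0 and t = 400.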
NHPP
There are many connections between the NHPP and a distribution of time
of failure.
Renewal Process
The renewal process is defined as a
sequence of independent and identically
distributed non-negative random variables
X1, X2, X3, ... which with probability 1 are not
all zero. Hence it is a generalisation of the
HPP.
Example
[Figure: time line starting from new, with successive failures at times t1, t2, t3 and t4.]
[Figure: diagram relating the non-homogeneous Poisson process and the renewal process, with the labels partial repair, rejuvenation, maximal repair and correction.]
t 0 t si
where
i 1
si
0 t
s1 s N t
0 t
constant (exponential)
0 t t
0 t t
happy system   sad system   noncommittal system
15             177          51
27             65           43
32             51           27
43             43           177
51             32           15
65             27           65
177            15           32
happy
system
sad
system
noncommittal
system
-33.7
-33.7
-35.5
loglinear
-32.4
-28.6
-31.0
power-law
-29.2
-32.0
-34.7
constant
-35.5
-35.5
-35.5
loglinear
-34.8
-34.8
-34.8
power-law
-35.1
-35.1
-35.1
constant
-35.5
-35.5
-35.5
loglinear
-34.8
-32.0
-35.2
power-law
-35.0
-31.8
-35.3
maximal repair
minimal repair
Baseline
Intensity
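Two of the baseline intensities can be checked against the three data sets with a short Python sketch. It assumes that the 7 values per system are times between failures (increasing for the happy system, decreasing for the sad system), that observation ends at the last failure, and that the power-law intensity has the form a b t^(b-1); these are my reading of the slides, and the function names are hypothetical.

```python
import math

# Times between failures, read from the three-system data in the slides.
SYSTEMS = {
    "happy": [15, 27, 32, 43, 51, 65, 177],
    "sad": [177, 65, 51, 43, 32, 27, 15],
    "noncommittal": [51, 43, 27, 177, 15, 65, 32],
}

def failure_times(gaps):
    """Cumulative failure times from the times between failures."""
    total, times = 0.0, []
    for gap in gaps:
        total += gap
        times.append(total)
    return times

def loglik_constant(gaps):
    """Maximised log-likelihood for a constant intensity (HPP)."""
    n, t_end = len(gaps), float(sum(gaps))
    rate = n / t_end
    return n * math.log(rate) - rate * t_end

def loglik_power_law(gaps):
    """Maximised log-likelihood for the intensity a * b * t ** (b - 1),
    assuming observation ends at the last failure."""
    times = failure_times(gaps)
    n, t_end = len(times), times[-1]
    b = n / sum(math.log(t_end / t) for t in times)
    a = n / t_end ** b
    return (n * math.log(a) + n * math.log(b)
            + (b - 1.0) * sum(math.log(t) for t in times) - a * t_end ** b)

constant = {name: loglik_constant(gaps) for name, gaps in SYSTEMS.items()}
power_law = {name: loglik_power_law(gaps) for name, gaps in SYSTEMS.items()}
```

The constant intensity gives about -35.5 for all three systems because it ignores the order of the failures, while the power-law fit gives about -35.0 for the happy system and about -31.8 for the sad system, in line with values listed in the log-likelihood table.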
[Figure: three log-likelihood (loglik) plots.]
[Figure: intensity functions for the happy, sad and noncommittal systems; time axis to 410, intensity axis to 0.18.]
Revision: Possible Questions/Problems