05 - Reliability Analysis and Stochastic Processes

Lecture Overview
Reliability Data Analysis
Complex Systems
Reliability/Hazard Function
Stochastic Process
Reliability Models
Fundamentals of Reliability Analysis
The collection and analysis of reliability data require a systematic approach, with a clear definition of reliability parameters and a comprehensive collection and analysis procedure.
Weibull analysis is considered an important technique for modelling the failure process, as we shall see later. Other methods are relevant in certain situations. Simpler methods of analysis can be beneficial, particularly in the initial stages of a project.
Presenting results in a clear and concise form, with emphasis on the benefits obtained from the analysis, is an essential ingredient of any study.
Recall reliability definition

The term reliability generally expresses a certain degree of assurance
that a system will operate successfully in a specified environment
during a certain time period.
This concept is dynamic and it does not refer just to instantaneous
events. If a system fails, that does not necessarily imply that it is
unreliable. Every piece of equipment fails once in a while.
The question is how frequently failures occur in a specified period of time.
Reliability Analysis
Reliability analysis requires recognizing and modelling the system characteristics, including the post-failure behaviour of components and the contribution of each component failure or failure mode to the overall system condition.
Statistical analysis of failure modes and of the actions carried out, such as preventive maintenance or repair, can give an approximate assessment of system reliability and of the actions involved in restoring the performance of the system at all levels.
Sometimes reliability requirements are based on practical experience and engineering intuition. This usually requires statistical data about technical systems, so the determination requires both expertise and statistics. The ideal condition for determining reliability requirements exists when system performance can be measured in common units, such as the cost of its production and maintenance, its operating characteristics and its maintenance.
Reliability data
The analysis of reliability data depends on the types of observations available. Observations of systems on test that fail yield complete information on the time to failure if monitoring is continuous.
For systems that have not failed, we have only partial information. Such data are called time-censored.
If all systems start operating at the same time and observation stops at a fixed, predetermined time, the censoring is single, called type I censoring. If observation terminates at the instant of the rth failure, where r is a predetermined integer, the data are censored according to type II censoring.
If systems start operating at different time points in an interval (0, t] and observation terminates at time t, we have multiple censoring of the data. If a system is known to have failed before observation started, we have left censoring. The other type of censored information, where the system is still in operation when monitoring terminates, is called right censoring.
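The type I and type II schemes can be sketched in Python with NumPy. The helper names `type_i_censor` and `type_ii_censor` and the lifetimes are hypothetical; the sketch assumes all units start together (single censoring) and have distinct lifetimes, and returns observed times with an indicator (1 = failure observed, 0 = right-censored).

```python
import numpy as np

def type_i_censor(lifetimes, c):
    """Type I (time) censoring: every unit is watched from time 0 until a fixed time c.
    Returns observed times min(T, c) and indicators (1 = failure seen, 0 = right-censored)."""
    t = np.asarray(lifetimes, dtype=float)
    return np.minimum(t, c), (t <= c).astype(int)

def type_ii_censor(lifetimes, r):
    """Type II (failure) censoring: observation stops at the r-th failure.
    Units still running then are right-censored at that time (output is sorted)."""
    t = np.sort(np.asarray(lifetimes, dtype=float))
    stop = t[r - 1]
    return np.minimum(t, stop), (np.arange(t.size) < r).astype(int)

# Hypothetical lifetimes (hours) for six units placed on test together
lifetimes = [120.0, 340.0, 75.0, 510.0, 260.0, 900.0]
obs1, d1 = type_i_censor(lifetimes, c=300.0)  # stop the test at 300 h
obs2, d2 = type_ii_censor(lifetimes, r=3)     # stop at the 3rd failure
```

In type I censoring the number of failures is random and the test length fixed; in type II it is the other way round.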
Identifying Suitable Distributions

To understand the failure or repair process of a system, knowledge of the characteristics of the theoretical distributions, together with statistical analysis of the data, will assist in selecting a failure or repair distribution. An ideal approach is to:
Construct a histogram of the failure or repair times
Compute descriptive statistics
Analyse empirical failure rate
Use prior knowledge of the failure process
Use properties of the theoretical distribution
Construct a probability plot
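As a minimal sketch of the descriptive-statistics and probability-plot steps, the Python below computes summary statistics and Weibull probability-plot coordinates. It assumes Bernard's median-rank approximation and a least-squares line on the Weibull axes; both are common choices rather than a method prescribed here, and the helper names are hypothetical.

```python
import numpy as np

def describe(times):
    """Descriptive statistics. A coefficient of variation near 1 is what an
    exponential distribution gives (its mean equals its standard deviation)."""
    t = np.asarray(times, dtype=float)
    mean, sd = t.mean(), t.std(ddof=1)
    return {"n": t.size, "mean": mean, "sd": sd, "cv": sd / mean}

def weibull_plot(times):
    """Weibull probability-plot coordinates with Bernard's median ranks
    F_i ~ (i - 0.3) / (n + 0.4). A Weibull sample lies near the line
    ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta), so a least-squares
    line gives rough estimates of beta (slope) and eta."""
    t = np.sort(np.asarray(times, dtype=float))
    n = t.size
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    x, y = np.log(t), np.log(-np.log(1.0 - F))
    beta, intercept = np.polyfit(x, y, 1)
    return x, y, beta, np.exp(-intercept / beta)

# Check on hypothetical data placed exactly at Weibull(beta=2.5, eta=250) quantiles
n = 10
F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
sample = 250.0 * (-np.log(1.0 - F)) ** (1 / 2.5)
x, y, beta_hat, eta_hat = weibull_plot(sample)
```

Plotting `y` against `x` (for example with matplotlib) gives the probability plot; marked curvature suggests the Weibull model is not adequate.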
Reliability Data
The reliability function can be defined as
$$R(t) = P(\text{system operates during } [0, t)),$$
where P(A) denotes the probability of an event A. To understand this, it is necessary to understand the concepts of probability and random variables.
[Figure: data collected for a repairable electronic system, involving the manufacturer, the customer and the data collector. Failures give event statistics ("bad news"); failure-free times give censored times ("good news").]
The systems were progressively introduced into service and were NOT operated continuously or in a uniform manner.
Source: Ansell and Phillips (1994)
Example of Reliability Data

[Figure: data example from a group of 10 repairable electronic systems, plotted against time (0 to 2400 hours) from the installation of new equipment, through the start of data collection, to the end of data collection. X = failure; 0 = last time withdrawn; lines show failure-free time; failure times are labelled t0, t1, t2, ..., ts and renewals r1, r2, ..., rs.]
[Figure: method of data collection for mechanical equipment fitted to a fleet; X denotes an observed renewal.]
Equipment Reliability
A piece of equipment is an assembly of components that performs a specific function.
It will fail as a result of component failure (assuming no catastrophic failure) and can be restored by replacing the failed component.
The mapping of the component failure history of a piece of equipment is shown in the diagram.
Mechanical components may exhibit constant, decreasing or increasing failure rates. In certain circumstances it may be necessary to determine these characteristics differently.
System reliability
Systems are more complex; they generally comprise a combination of equipment (in series and parallel).
During the system lifetime, components may be replaced (renewed) and equipment repaired.
Modifications may also be carried out to improve performance or meet operational requirements.
System reliability techniques are needed to quantify the overall equipment reliability and to make sure the system operates at 100 percent of its required capability.
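For the series and parallel combinations mentioned above, a minimal sketch, assuming independent units and active (hot) redundancy; the component values and the pump/valve layout are hypothetical.

```python
from math import prod

def series(reliabilities):
    """Series system: works only if every (independent) unit works."""
    return prod(reliabilities)

def parallel(reliabilities):
    """Active parallel system: fails only if every (independent) unit fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical example: two pumps in parallel, feeding one valve in series
r_pumps = parallel([0.90, 0.90])    # 1 - 0.1 * 0.1 = 0.99
r_system = series([r_pumps, 0.95])  # 0.99 * 0.95 = 0.9405
```

Larger structures can be reduced by nesting these two functions, block by block.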
Complex system
Complex repairable systems in business, industry, medicine and nature frequently incorporate preventive maintenance actions in an attempt to improve operational performance and reliability.
These typically involve systematic inspection, detection and eradication of partial or incipient failures.
Several researchers have proposed mathematical models for such systems, though most of these contain fundamental flaws and serve only as statistical approximations, as we shall see.
Complex repairable systems
A complex system is any structure of more than one component that performs a particular function. A complex repairable system is a system that, after it has failed to perform properly, can be restored to satisfactory performance by any method except complete replacement of the entire system (Crowder et al., 1991).
It is therefore extremely important that the assumed stochastic process accurately characterizes system failure. Typical systems include industrial and domestic machinery, such as production lines, motor vehicles and computers. They also include biological and ecological structures, such as the human body and natural ecosystems. We can imagine other applications arising in society and commerce, but we concentrate here on industrial systems, which benefit greatly from reliability and maintenance modelling.
Refinery Pump Data

Main pump in a petroleum refinery; collection period 7 years; 65 event observations.
First half: 15 CM and 11 PM. Second half: 29 CM and 10 PM (CM = corrective maintenance, PM = preventive maintenance).
xi = inter-event times (days); ci = censoring indicator variable (0 = preventive maintenance, 1 = corrective maintenance). Values are listed in event order, 13 events per block.
Events 1-13
xi: 34 14 81 86 156 20 96 47 45 971 88 30 4
ci: 0 1 0 0 0 0 0 0 0 0 1 1 1
Events 14-26
xi: 4 13 27 8 148 92 13 13 67 29 12 1 (one value missing in the source)
ci: 1 1 1 1 1 0 1 1 1 0 1 1 1
Events 27-39
xi: 37 28 38 20 28 44 3 56 64 8 62 8 46
ci: 1 1 1 1 0 1 1 1 1 0 0 1 1
Events 40-52
xi: 22 51 51 15 18 1 26 37 36 2 12 27 102
ci: 1 1 1 1 1 1 0 1 1 1 0 1 1
Events 53-65
xi: 3 21 6 26 15 35 44 61 84 12 65 43 4
ci: 1 1 1 1 0 1 0 1 0 1 0 0 1
Model: Power-Law Process

Estimation Method: Maximum Likelihood
Parameter Estimates
Parameter | Estimate | Standard Error | 95% Normal CI Lower | 95% Normal CI Upper
Shape | 0.518739 | 0.064 | 0.407537 | 0.660284
Scale | 0.0499202 | 0.051 | 0.0067508 | 0.369145
Trend Tests
Test | MIL-Hdbk-189 | Laplace's | Anderson-Darling
Test Statistic | 250.61 | -7.39 | 28.92
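The power-law process and two of the trend tests reported above can be sketched as follows. This is a minimal NumPy version assuming time-truncated data (observation stops at a fixed time T) and the intensity λβt^(β−1); the statistical package behind the table may use a different scale parametrization, so a scale estimate need not match the printed value. Event times are cumulative, so inter-event times such as the refinery xi must be summed first. In the printed output, a negative Laplace statistic (−7.39) and a shape estimate below 1 both point to a decreasing rate of occurrence of events.

```python
import numpy as np

def plp_mle(event_times, T):
    """MLEs for the power-law process with intensity lam*beta*t**(beta-1),
    observed on (0, T] (time-truncated): beta = n / sum ln(T/t_i), lam = n / T**beta."""
    t = np.asarray(event_times, dtype=float)
    n = t.size
    beta = n / np.sum(np.log(T / t))
    return beta, n / T ** beta

def laplace_statistic(event_times, T):
    """Approximately N(0, 1) under an HPP; large positive values suggest a
    rising rate of occurrence, large negative values a falling one."""
    t = np.asarray(event_times, dtype=float)
    n = t.size
    return (t.mean() - T / 2.0) / (T * np.sqrt(1.0 / (12.0 * n)))

def mil_hdbk_189_statistic(event_times, T):
    """2 * sum ln(T/t_i): chi-square with 2n degrees of freedom under an HPP
    (time-truncated). Small values point to deterioration, large to improvement."""
    t = np.asarray(event_times, dtype=float)
    return 2.0 * np.sum(np.log(T / t))

# Hypothetical inter-event gaps that lengthen over time, turned into event times
gaps = np.array([5.0, 12.0, 20.0, 31.0, 45.0])
times = np.cumsum(gaps)   # 5, 17, 37, 68, 113
T = 120.0                 # end of observation
beta, lam = plp_mle(times, T)
U = laplace_statistic(times, T)
chi2 = mil_hdbk_189_statistic(times, T)
```

For these lengthening gaps the Laplace statistic is negative and the shape estimate is below 1, the same direction as the refinery pump results.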
Distribution Plot
Coal-Fired Power Generating Plant Unit
[Figure: schematic of a generating unit showing the boiler, wind box, coal mills, bus main, ID fans A and B, FD fans A and B, and PA fans A and B.]
Process Failure/Maintenance Activity

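[Figure: sequence of failure/maintenance activities against plant state: failure, WOR entry, PFW issued, SPR, repair, ROMP, PFW off, WOC closed. Interval annotations range from hours to days. Labels include "important period in the process", "interval priority", "testing evidence", "WOC: what to do, who to do it" and "do work".]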
The sequence of activities varies from job to job, and the interval between failure and isolation is a priority. Consider the activities above: work out the state of the equipment at a specific time, and how precisely the state of the plant is known.
Failure Modes
The failure mode is defined as the effect by which a failure is observed on the item, rather than the effect a failure has on the system containing the item. Specifying the equipment boundary improves the standard selection of equipment classes.
We classify failures as:
Mechanical failure (process failure)
Electrical failure
Equipment failure, etc.
Classification of failure modes

Information regarding failure modes is further classified as follows.
Critical failure: A failure that is sudden and causes cessation of one or more fundamental functions. It requires immediate corrective maintenance action to return the item to a satisfactory condition.
Degraded failure: A failure that is gradual, partial or both. It does not stop the fundamental functions but compromises one or more of them, through any combination of reduced, increased or erratic output, and it may lead to a critical failure.
Incipient failure: An imperfection in the state or condition of an item or piece of equipment such that a degraded or critical failure can be expected to result if corrective maintenance is not carried out.
Plant Database
Data are monitored in the control room, collected and stored in:
Genysis (VAX platform)
Efor database
PI database
Shift logs
Genysis Database
[Table: extract of pulverised-fuel (PF) mill work orders entered between January and March 2000. Columns: WO_NUM, EQUIP, WORK/DESCRIPTION, MO_TYPE, FAILURE CAUSE, DURATION (DAYS), DATE (ENTERED/CLOSED).]
Work-order numbers (each with an ENTERED and a CLOSED record): 85700, 85800, 203700, 292900, 293000, 350500, 351100, 351500, 733000
Equipment codes: 2AA, 2AA01, 1AA033014, 3AA06, 2AA0716, 4AA06, 4AA03, 4AA0316
Work descriptions: "C" PF MILL COAL/AIR OUTLET TEMP IND'N; PULVERISED FUEL MILLS---CARRY OUT (two entries); "A" PF MILL---HOLE AT WEST ROPE BOX; "F" PF MILL---LARGE CRACK IN WELD; "G" PF MILL REJECTS SYSTEM & CONTROLS; "F" PF MILL---CCR PA FLOW INDICATION; "C" PF MILL---PLEASE INSPECT 3.3kV; "C" PF MILL REJECTS SYSTEM & CONTROLS
Entered/closed dates: 12/01/2000 to 20/01/2000; 12/01/2000 to 09/08/2000; 24/01/2000 to 24/01/2000; 31/01/2000 to 15/02/2000; 01/02/2000 to 15/02/2000; 04/02/2000 to 21/02/2000; 04/02/2000 to 05/02/2000; 05/02/2000 to 05/02/2000; 13/03/2000 to 15/03/2000
Failure-cause and duration values appearing in the extract: 60, 210, 413, 12, 15, 12, 14, 50, 17, 408, 18
Efor Database
Date on | MW | Duration (h) | MWh | Cause | Type
09/01/1999 | 57 | 96 | 5521 | Mills/fuel quality | FR
15/01/1999 | 47 | 167 | 7917 | Mills/fuel quality | FR
20/01/1999 | 150 | 0.5 | 75 | Mill fire | FR
29/01/1999 | 28 | 48 | 1338 | Mills/coal condition | FR
26/02/1999 | 101 | 25 | 2522 | Gearbox lub. oil pp & mill avail. | PR
14/07/1999 | 104 | 20.66 | 2149 | Mills | FR
10/08/1999 | 56 | 72 | 4032 | Mill availability | PR
12/08/1999 | 80 | 1.16 | 93 | Mills | FR
13/08/1999 | 44 | - | 44 | Mills | FR
30/08/1999 | 50 | 5.25 | 263 | Mills | FR
31/08/1999 | 74 | - | 296 | Mills | FR
23/09/1999 | 100 | 0.5 | 50 | Loss of mill | FR
07/10/1999 | 50 | 17.5 | 875 | Mills | FR
14/10/1999 | 76 | 9.17 | 697 | Mills | FR
12/01/2000 | 103 | 6.5 | 669.5 | Mills | FR
03/02/2000 | 115 | 0.33 | 37.95 | Coal feeder | FR
03/02/2000 | 80 | 1.5 | 115 | Mills | FR
17/02/2000 | 70 | - | 350 | Lost mill | FR
24/02/2000 | 70 | 0.5 | 35 | Mill | FR
PI
Hourly PI records for 01/01/2003 (tags: Unit = L1L; Mill A to Mill H = L1BMEA01 to L1BMEA08)
Time | Unit | Mill A | Mill B | Mill C | Mill D | Mill E | Mill F | Mill G | Mill H
00:00 | 314.3 | 0.0 | 54.1 | 50.3 | 52.7 | 0.0 | 0.0 | 60.0 | 0.0
01:00 | 314.2 | 0.0 | 56.4 | 50.8 | 57.4 | 0.0 | 0.0 | 60.0 | 0.0
02:00 | 317.1 | 0.0 | 51.2 | 53.1 | 54.6 | 0.0 | 0.0 | 61.5 | 0.0
03:00 | 318.1 | 0.0 | 51.4 | 54.7 | 56.1 | 0.0 | 0.0 | 58.6 | 0.0
04:00 | 312.7 | 0.0 | 49.3 | 53.2 | 56.9 | 0.0 | 0.0 | 57.5 | 0.0
05:00 | 312.6 | 0.0 | 49.3 | 55.0 | 54.6 | 0.0 | 0.0 | 58.0 | 0.0
06:00 | 314.9 | 0.0 | 51.7 | 52.4 | 54.3 | 0.0 | 0.0 | 56.6 | 0.0
07:00 | 311.8 | 0.0 | 51.7 | 47.8 | 57.2 | 0.0 | 0.0 | 56.6 | 0.0
08:00 | 313.4 | 0.0 | 49.6 | 49.2 | 54.4 | 0.0 | 0.0 | 57.6 | 0.0
09:00 | 311.8 | 0.0 | 50.1 | 50.9 | 51.3 | 0.0 | 0.0 | 57.1 | 0.0
10:00 | 312.7 | 0.0 | 45.8 | 55.4 | 52.3 | 0.0 | 0.0 | 55.1 | 0.0
11:00 | 312.2 | 0.0 | 48.0 | 55.1 | 52.3 | 0.0 | 0.0 | 57.0 | 0.0
12:00 | 314.2 | 0.0 | 48.0 | 54.3 | 54.1 | 0.0 | 0.0 | 56.9 | 0.0
13:00 | 315.8 | 0.0 | 48.0 | 54.1 | 51.6 | 0.0 | 0.0 | 56.0 | 0.0
14:00 | 322.6 | 0.0 | 48.4 | 53.4 | 50.5 | 0.0 | 0.0 | 57.7 | 0.0
15:00 | 266.7 | 0.0 | 1.1 | 53.5 | 50.0 | 0.0 | 0.0 | 57.7 | 0.0
16:00 | 349.3 | 0.0 | 1.1 | 46.5 | 48.8 | 0.0 | 48.1 | 57.1 | 0.0
17:00 | 361.7 | 0.0 | 1.1 | 48.8 | 49.0 | 0.0 | 52.4 | 57.1 | 0.0
18:00 | 365.4 | 0.0 | 1.1 | 48.8 | 50.2 | 0.0 | 53.3 | 59.3 | 0.0
19:00 | 318.7 | 0.0 | 1.1 | 48.8 | 50.0 | 0.0 | 52.0 | 59.4 | 0.0
20:00 | 312.2 | 0.0 | 1.1 | 50.6 | 48.9 | 0.0 | 51.4 | 58.0 | 0.0
21:00 | 311.6 | 0.0 | 1.1 | 50.6 | 50.8 | 0.0 | 51.5 | 58.5 | 0.0
22:00 | 306.4 | 0.0 | 1.1 | 45.8 | 49.0 | 0.0 | 56.5 | 58.5 | 0.0
23:00 | 314.1 | 0.0 | 0.0 | 50.2 | 51.0 | 0.0 | 59.0 | 61.1 | 0.0
PI data information
Event descriptions for mill B, unit 1
Date | Time | Motor current | State | Event description
01/01/2003 | 15:00 | 1.1 | 0 | Mill B not producing and oos
15/01/2003 | 4:00 | 51.9 | 1 | Mill B is running
15/01/2003 | 22:00 | 0.3 | 0 | B FD fan not making contact causing start faults
16/01/2003 | 1:00 | 51.8 | 1 | B mill running
16/01/2003 | 15:00 | 0.7 | 0 | B FD fan to investigate at next opportunity
14/02/2003 | 11:00 | 47.2 | 1 | B mill is running
26/02/2003 | 3:00 | 0.2 | 0 | Mill B not producing and unit is off
26/05/2003 | 7:00 | 45.3 | 1 | B mill is running
27/05/2003 | 14:00 | 0.5 | 0 | PFW issued for mill B not producing
29/05/2003 | 10:00 | 53.3 | 1 | B mill is running
01/06/2003 | 5:00 | 0.8 | 0 | B feeder drag chain
06/06/2003 | 8:00 | 56.8 | 1 | B mill is running
10/06/2003 | 22:00 | 0.2 | 0 | Service
11/06/2003 | 7:00 | 56.6 | 1 | B mill is running
13/07/2003 | 18:00 | 0.7 | 0 | B mill is not running
14/07/2003 | 20:00 | 51.8 | 1 | B g/box not controlling below 50%
19/07/2003 | 1:00 | 0.8 | 0 | B feeder carter g/box
22/07/2003 | 4:00 | 42.6 | 1 | B mill is on and running
23/07/2003 | 5:00 | 1.7 | 0 | B PFW, carter g/box, east roller faulty, o/h to plan
24/07/2003 | 9:00 | 68.4 | 1 | B operating
22/08/2003 | 10:00 | 0.1 | 0 | B east roller faulty, o/h
14/11/2003 | 1:00 | 47.4 | 1 | B mill is running
06/12/2003 | 8:00 | 0.6 | 0 | Internal inspection of rejects required
Shift logs
MILL | UNIT1 | UNIT2 | UNIT3 | UNIT4
 | service | available | available | overhaul
 | available | available | overhaul | available
 | available | service | available | available
 | available | available | available | available
 | available | available | service | available
 | available | available | service | available
 | available | available | available | service
 | available | overhaul | available | available
For illustrative purposes, data extracted from the shift logs are presented in this form. There are 8 mills in each unit, and 7 are required for full production load. Availability is interpreted as the number of mills operational in a given time interval. The availability of mills in the plant is interrupted by services and overhauls.
Data on mill availability

Information about the number of mills available in each unit between 1999 and 2005 was extracted from the daily shift logs, and the graphs below present mill availability.
[Figure: mill availability, with panels for Unit 1, Unit 2, Unit 3 and Unit 4.]
Statistical Analysis
We conducted a study of the 2003 PI data records. The objective was to assess the suitability of developing a renewal process to model the availability of the mills. Using the state variable described above, we calculated the number of consecutive hours that a mill remained in state 1 (running) and in state 0 (not running).
We explored these data for trend, seasonality and autocorrelation, and finally fitted a suitable model to the data. The analysis considers first the downtime data and secondly the uptime data.
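The step of turning state records into consecutive hours in state 1 and state 0 can be sketched with the Python standard library. The six (timestamp, state) pairs are the first mill B records from the PI data information listed above; the function name `sojourn_hours` is hypothetical. The time spent after the last record is not known, so that final stay is censored and left out.

```python
from datetime import datetime

# First six mill B records (date, time, state) from the PI data information;
# state 1 = running, state 0 = not running.
events = [
    ("01/01/2003 15:00", 0),
    ("15/01/2003 04:00", 1),
    ("15/01/2003 22:00", 0),
    ("16/01/2003 01:00", 1),
    ("16/01/2003 15:00", 0),
    ("14/02/2003 11:00", 1),
]

def sojourn_hours(records, fmt="%d/%m/%Y %H:%M"):
    """Return (state, hours) for each completed stay: the time from one state
    change to the next. The stay after the last record is unknown (censored)."""
    parsed = [(datetime.strptime(ts, fmt), state) for ts, state in records]
    return [(s0, (t1 - t0).total_seconds() / 3600.0)
            for (t0, s0), (t1, _) in zip(parsed, parsed[1:])]

stays = sojourn_hours(events)
downtimes = [h for s, h in stays if s == 0]  # hours in state 0
uptimes = [h for s, h in stays if s == 1]    # hours in state 1
```

The resulting `downtimes` and `uptimes` lists are the series examined for trend and autocorrelation in the analysis that follows.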
Downtime Analysis
We conducted a Laplace test on the series of downtime data. Mills A, E and H tested positive for a decreasing mean downtime, and mill B tested positive for an increasing mean downtime.
Mill E appears to have a change point in the series at about the 20th event. The Laplace test on the mill E data after the 20th event suggests no trend.
There was no convincing evidence of autocorrelation in the data. We therefore conclude that there is no trend in the downtimes of the remaining mills and propose treating them as independent and identically distributed (IID) data.
Uptime analysis
Although we analyzed these data through correlations, the results are best illustrated by categorizing the data into long and short durations.
The correlation is not strong; for example, the autocorrelation coefficient was 0.227. We therefore conclude that there is no trend in the uptimes for mills A, B, C, D, F and G and propose treating the times between events in these data sets as independent and identically distributed data.
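A lag-1 autocorrelation check like the one used for the uptimes can be sketched as below. The ±1.96/√n band is a rough large-sample rule for an IID series, assumed here for illustration: a coefficient of 0.227 falls outside it only when n exceeds about (1.96/0.227)² ≈ 75, so its significance depends on the length of each mill's series. The function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, lag=1):
    """Sample autocorrelation at the given lag: lag-k autocovariance divided by
    the lag-0 autocovariance, both taken about the overall sample mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-lag] * d[lag:]) / np.sum(d * d))

def iid_bound(n, z=1.96):
    """Rough 95% band +/- z/sqrt(n) for the autocorrelation of an IID series."""
    return z / np.sqrt(n)

def looks_uncorrelated(x, lag=1):
    """True if the lag-k coefficient lies inside the rough IID band."""
    return abs(autocorrelation(x, lag)) < iid_bound(len(x))
```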
Uptime analysis
Mill H appears to have a change point at about the 100th event rather than a continuously changing mean uptime.
Mill E appears to have a mean uptime that is changing predictably.
We fitted a second-degree polynomial to the cumulative uptime for mill E and obtained R² = 99.2%. The model implies that the mean uptime increases linearly from one event to the next.
For the actual uptimes, rather than the cumulative data, the series showed no sign of autocorrelation but did exhibit heteroscedasticity, with a linearly increasing variance.
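The mill E trend model can be sketched with NumPy: fit a second-degree polynomial to cumulative uptime, take first differences to get the implied (linearly increasing) mean uptimes, and divide the residuals by their rank i as described in the residual analysis. The function name and the test data are hypothetical.

```python
import numpy as np

def quadratic_uptime_trend(uptimes):
    """Fit S_i = c*i**2 + b*i + a to cumulative uptime S_i (i = 1..n).
    Returns the coefficients, R^2 of the cumulative fit, the implied mean
    uptimes S_hat(i) - S_hat(i-1) (linear in i), and residuals divided by i."""
    x = np.asarray(uptimes, dtype=float)
    i = np.arange(1, x.size + 1)
    S = np.cumsum(x)
    coeffs = np.polyfit(i, S, 2)                  # [c, b, a]
    S_hat = np.polyval(coeffs, i)
    r2 = 1.0 - np.sum((S - S_hat) ** 2) / np.sum((S - S.mean()) ** 2)
    mean_uptime = np.diff(np.polyval(coeffs, np.arange(0, x.size + 1)))
    scaled_residuals = (x - mean_uptime) / i      # r_i / i
    return coeffs, r2, mean_uptime, scaled_residuals

# Hypothetical uptimes growing exactly linearly: x_i = 2 + 3i
coeffs, r2, mean_uptime, scaled = quadratic_uptime_trend(2.0 + 3.0 * np.arange(1, 11))
```

Because S_hat(i) − S_hat(i−1) = c(2i − 1) + b, a quadratic cumulative fit always implies mean uptimes that change linearly with the event number.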
Residuals adjusted for linearly increasing variance for Mill E
We estimated the successive uptimes from the first differences of the polynomial model fitted to the cumulative data and divided the residuals by their sequential rank (i.e. r_i/i) to stabilize the variability within the data. There were unusually large residuals for the first three events, and there appears to be a skew towards positive residuals throughout the data.
Mean residual uptime for Mill A
The mean residual uptime plot strongly suggests an increasing mean residual uptime, which is consistent with models such as the Pareto distribution, obtained as a mixture of exponential distributions.
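To show why a mixture of exponentials gives an increasing mean residual uptime, the sketch below estimates the empirical mean residual life m(t) = E[T − t | T > t] from simulated data. It assumes, for illustration, gamma-distributed rates, which makes the marginal distribution Pareto type II (Lomax) with m(t) = (b + t)/(a − 1); the parameters and function name are hypothetical.

```python
import numpy as np

def empirical_mrl(times, ages):
    """Empirical mean residual life m(t): the average of (T - t) over the
    observations with T > t; NaN where no observation exceeds t."""
    x = np.asarray(times, dtype=float)
    out = []
    for t in ages:
        tail = x[x > t] - t
        out.append(tail.mean() if tail.size else np.nan)
    return np.array(out)

# Hypothetical exponential mixture: each uptime is exponential with a rate drawn
# from a gamma(shape a, rate b) distribution, so the marginal is Lomax.
a, b = 3.0, 10.0
rng = np.random.default_rng(1)
rates = rng.gamma(shape=a, scale=1.0 / b, size=200_000)
uptimes = rng.exponential(scale=1.0 / rates)
m = empirical_mrl(uptimes, ages=[0.0, 5.0, 10.0])  # theory (b + t)/(a - 1): 5, 7.5, 10
```

For a single exponential the same plot would be flat, since its mean residual life is constant.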
Reliability requirements
These requirements should be well known in advance in order to determine a satisfactory level of confidence.
However, maintenance actions and renewals can show that a system is able to meet its reliability requirements.
In reliability terms, repair is often assumed to renew a system to its original condition, so that the time to failure of a repaired system is treated like the time to failure of a non-repairable item.
This assumption is very unrealistic for probability modelling and leads to distorted statistical analysis, Ascher and Feingold (1984).
Preliminaries on life distributions
The failure distribution represents an attempt to describe mathematically the length of life of a device, Barlow and Proschan (1996).
On the basis of actual observations of time to failure, we work with the cumulative (life) distribution function F(t), the probability that the lifetime T does not exceed t:
$$F(t) = P(T \le t), \quad 0 < t < \infty.$$
The probability density function f(t) corresponding to F(t) is its derivative (if it exists). This is a non-negative function such that
$$F(t) = \int_0^t f(u)\,du, \quad 0 \le t < \infty.$$
Reliability function
The reliability function, R(t), of a system having life distribution F(t) is
$$R(t) = 1 - F(t) = P(T > t).$$
This is the probability that the lifetime of the system will exceed t.
Another important function related to the life distribution is the hazard function.
The hazard function is the ratio of the density function to the reliability function.
Hazard function
This can be used to represent the instantaneous hazard of a system that has survived t units of time. The hazard is given as
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t},$$
where, from the multiplication law of probability,
$$P(t < T \le t + \Delta t \mid T > t) = \frac{P(t < T \le t + \Delta t,\; T > t)}{P(T > t)} = \frac{P(t < T \le t + \Delta t)}{P(T > t)}.$$
Notice that h(t)Δt is approximately, for small Δt, the probability that a system still functioning at age t will fail during the time interval (t, t + Δt].
By definition, P(t < T ≤ t + Δt) = F(t + Δt) − F(t), so
$$h(t) = \frac{1}{1 - F(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{f(t)}{R(t)}.$$
Hazard Function
The hazard function is of great importance to practitioners, and the expression can be used in estimating:
The time to failure (or time between failures)
Repair crew size for a given repair policy
Availability of a system
Warranty cost
Behaviour of system failures with time
Mean Time To Failure (MTTF)
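The mean time to failure (MTTF), μ, is the expected value of T.
The general definition of the expected value of a lifetime random variable T is
$$E(T) = \int_0^\infty t\, f(t)\,dt.$$
Writing t as the integral of ds from 0 to t and exchanging the order of integration, the expected life can be computed from the reliability function:
$$\mu = \int_0^\infty t\, f(t)\,dt = \int_0^\infty \left(\int_0^t ds\right) f(t)\,dt = \int_0^\infty \left(\int_s^\infty f(t)\,dt\right) ds = \int_0^\infty R(s)\,ds.$$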
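The identity MTTF = ∫R(t)dt can be checked numerically. The sketch below uses a plain trapezoidal rule and compares the result with the known Weibull mean ηΓ(1 + 1/β); the helper name is hypothetical and the parameters are taken from the tutorial question.

```python
import math
import numpy as np

def mttf(reliability, upper, n=200_001):
    """MTTF = integral of R(t) over [0, inf), approximated by the trapezoidal rule
    on [0, upper]; choose upper large enough that R(upper) is negligible."""
    t = np.linspace(0.0, upper, n)
    r = reliability(t)
    return float(np.sum((r[1:] + r[:-1]) * np.diff(t)) / 2.0)

beta, eta = 2.5, 250.0                            # Weibull shape and scale
R = lambda t: np.exp(-(t / eta) ** beta)
numeric = mttf(R, upper=10 * eta)
closed_form = eta * math.gamma(1.0 + 1.0 / beta)  # known Weibull mean
```

For the exponential case, R(t) = e^(−t/100) gives an MTTF of 100 in the same way.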
Repairable and non-repairable items
It is important to distinguish between repairable and non-repairable items when predicting reliability measures.
For a non-repairable item such as a light bulb, the reliability is the survival probability over the expected life.
During the item's life, the instantaneous probability of the first and only failure is called the hazard rate.
The pattern of failure with time (non-repairable systems)
There are three basic ways in which the pattern of failure can change with time. The hazard rate may be decreasing, increasing or constant.
A decreasing hazard rate is observed in items that become less likely to fail as their survival time increases, for example electronic equipment and parts.
A constant hazard rate is characteristic of failures caused by excess load or stress occurring at a constant average rate.
An increasing hazard rate is characteristic of wear-out, where items become more likely to fail as they age.
Repairable system
For items that are repaired when they fail, reliability is the probability that failure will not occur in the period of interest, when more than one failure can occur.
It can also be expressed as the rate of occurrence of failures (ROCOF).
Repairable systems can also be characterised by the mean time between failures (MTBF), but only under the condition of a constant failure rate.
The pattern of failure for repairable systems
The failure rate, or ROCOF, of repairable items can vary with time according to three trends: constant (CFR), decreasing (DFR) or increasing (IFR).
A constant failure rate (CFR) is indicative of externally induced failures; it is typical of complex systems subject to repair and overhaul, where different parts exhibit different patterns of failure.
A repairable system can show a decreasing failure rate (DFR) when reliability is improved by progressive repair, as defective parts are replaced by good parts.
An increasing failure rate (IFR) occurs when wear-out failure modes of parts begin to predominate. The pattern of failure can be illustrated on a bathtub curve.
Example
Suppose the hazard of a given system is constant in time, that is h(t) = λ for all values of 0 ≤ t < ∞. Then the reliability function is
$$R(t) = \exp\left(-\int_0^t \lambda\,du\right) = e^{-\lambda t}, \quad t \ge 0.$$
This reliability function corresponds to the exponential life distribution, having cumulative distribution function
$$F(t) = 1 - e^{-\lambda t}, \quad t \ge 0.$$
Graph of F(t), R(t) and h(t) for exponential distribution
[Figure: three panels for the exponential distribution, plotted for 0 ≤ t ≤ 10: cumulative distribution function F(t), reliability function R(t) and hazard function h(t).]
Weibull Function
The Weibull distribution is the asymptotic distribution of the smallest extreme for an initial underlying distribution that is bounded below.
The Weibull model gives a non-linear (power-law) expression for the hazard function. It is used when the hazard cannot be represented as a linear function of time.
The cumulative distribution function of a Weibull variate T is given as
$$F(t) = 1 - \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \quad t \ge 0,$$
where β > 0 and η > 0 are the shape and scale parameters respectively.
The Weibull Model
In the context of reliability modelling, extreme value distributions for the minimum are frequently encountered.
For example, if a system consists of n identical components and the system fails when the first of these components fails, then system failure times are the minimum of n random component failure times.
Extreme value theory says that, provided the component failure-time distribution behaves like a power of t near zero (whatever its form elsewhere), the system model will approach a Weibull distribution as n becomes large.
Weibull Distribution
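The density function is expressed as
$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1} \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \quad t \ge 0,$$
the hazard is of the form
$$h(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1},$$
and the reliability function is given as
$$R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right].$$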
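The three Weibull expressions above translate directly into NumPy functions. This sketch (function names hypothetical) can be used to confirm that h(t) = f(t)/R(t), that β = 1 gives the constant hazard 1/η of the exponential case, and that the hazard rises for β > 1 and falls for β < 1.

```python
import numpy as np

def weibull_R(t, beta, eta):
    """Reliability R(t) = exp[-(t/eta)^beta]."""
    return np.exp(-(np.asarray(t, dtype=float) / eta) ** beta)

def weibull_f(t, beta, eta):
    """Density f(t) = (beta/eta) (t/eta)^(beta-1) exp[-(t/eta)^beta]."""
    z = np.asarray(t, dtype=float) / eta
    return (beta / eta) * z ** (beta - 1) * np.exp(-z ** beta)

def weibull_h(t, beta, eta):
    """Hazard h(t) = (beta/eta) (t/eta)^(beta-1)."""
    return (beta / eta) * (np.asarray(t, dtype=float) / eta) ** (beta - 1)

t = np.array([10.0, 50.0, 200.0])  # ages in hours (t > 0 avoids t**negative at 0)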
General Failure Curve
Weibull Reliability
β: shape parameter of the distribution
η: scale parameter of the distribution
β > 1: we get the increasing hazard rate reliability function (wear-out region of the bathtub curve)
β = 1: the model reduces to the exponential reliability function (constant failure rate region)
β < 1: we get the decreasing hazard rate reliability function (early failure region)
Together, these cases can be used to measure the overall reliability.
β < 1 Implies Infant Mortality
The term infant mortality stems from the high mortality of infants.
Electronic and mechanical systems may initially have a high failure rate.
Manufacturers use production acceptance tests, burn-in and environmental test screening to end infant mortality before shipment to clients. Therefore β < 1 leads us to suspect:
Inadequate burn-in or stress screening
Production problems, misassembly, quality control
Overhaul problems
Solid-state electronic failures
If the dominant failure mode of a component has β < 1 and the component survives infant mortality, it will improve with age: conditional on survival, the failure rate decreases and the reliability increases.
β = 1 Implies Random Failures
By random we mean that the failures are independent of time. These failure modes are ageless: an old part is as good as new if the failure mode is random. Therefore we might suspect:
Maintenance errors, human errors
Failures due to nature, foreign object damage, lightning strikes
A mixture of data from three or more failure modes (assuming they all have different β)
Here again, overhauls are not appropriate.
A Weibull with β = 1 is identical to the exponential distribution.
1.0 < β < 4 Implies Early Wear-Out
If these failures occur within the design life, they are unpleasant surprises.
There are many mechanical failure modes in this class:
Low-cycle fatigue
Most bearing failures
Corrosion, erosion
Overhaul or part replacement at low B lives (the age by which a small percentage of parts, such as 1% or 10%, are expected to have failed) may be cost effective.
The period for overhaul is read off the Weibull plot at the appropriate B life.
Tutorial Questions
An item is known to have a failure time that is Weibull distributed with characteristic life 250 h and shape parameter 2.5.
What is the reliability at 100 h, and at what time is the reliability 95%?
Tutorial (A)
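$$R(100) = \exp\left[-\left(\frac{100}{250}\right)^{2.5}\right] = 0.904.$$
$$R(t) = 0.95 = \exp\left[-\left(\frac{t}{250}\right)^{2.5}\right] \;\Rightarrow\; \left(\frac{t}{250}\right)^{2.5} = -\ln(0.95) = 0.0513 \;\Rightarrow\; t = 250\,(0.0513)^{1/2.5} \approx 76.2 \text{ hours}.$$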
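The tutorial arithmetic can be checked in a few lines of Python by evaluating R(t) and inverting it as t = η(−ln R)^(1/β). The function names are hypothetical.

```python
import math

beta, eta = 2.5, 250.0  # shape and characteristic life from the question

def reliability(t):
    """Weibull reliability R(t) = exp[-(t/eta)^beta]."""
    return math.exp(-(t / eta) ** beta)

def time_for_reliability(r):
    """Invert R(t) = r:  t = eta * (-ln r)^(1/beta)."""
    return eta * (-math.log(r)) ** (1.0 / beta)

r100 = reliability(100.0)
t95 = time_for_reliability(0.95)
```

Evaluating `t95` gives about 76.2 hours, and substituting it back into `reliability` returns 0.95; since R(100) ≈ 0.904 is already below 0.95, the 95% point must come before 100 hours.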
Stochastic processes
The word stochastic derives from the Greek στοχάζεσθαι (to aim, to guess) and means random or chance.
A stochastic process may be thought of as a family of random variables depending on a parameter (usually time).
Stochastic processes are ways of quantifying the dynamic relationships of sequences of random events, Taylor and Karlin (1994). They are descriptions of random phenomena changing with time. Such phenomena occur in complex repairable systems, such as industrial machinery, and in other fields, and have attracted increasing attention in recent years.
Stochastic Process
A set of random variables, or observations of the same random variable over time: {X_t, t ≥ 0} (continuous-parameter) or {X_n, n = 0, 1, ...} (discrete-parameter).
X_t may be either discrete-valued or continuous-valued.
A counting process is a discrete-valued, continuous-parameter stochastic process that increases by one each time some event occurs. The value of the process at time t is the number of events that have occurred up to (and including) time t.
Basic concept of stochastic processes
A stochastic model predicts a set of possible outcomes weighted by their probabilities. Such models play an important role in many areas of application, and they can be used to analyze the inherent reliability of many processes.
Stochastic models give insight into the uncertainties affecting managerial decisions, Taylor and Karlin (1994).
For example, a repair model can be used to determine the optimal time for preventive maintenance before a failure occurs.
Stochastic point processes
Stochastic point processes have been applied to repairable systems. They are mathematical models characterized by highly localized events distributed randomly in a continuum.
Here the continuum is time and the highly localized events are failures, which are assumed to occur at instants within the continuum, Crowder et al. (1991).
The whole body of techniques developed for point processes is potentially applicable to system failure data.
Stochastic point process
A stochastic point process represents the successive arrival and inter-arrival times of system failures, under the assumption that a system is operated whenever possible and that repair times are negligible.
The pattern of failures necessarily develops in calendar time. If a system is sometimes shut down when it is not under repair, the exact connection to calendar time disappears, but the successive failures are still ordered in calendar time, Ascher and Feingold (1984).
Operating time can then be used for the reliability study.
Poisson Process
Let {X(t), t ≥ 0} be a stochastic process where X(t) is the number of events (arrivals) up to time t. Assume X(0) = 0 and
(i) Pr(an arrival occurs between t and t + Δt) = λΔt + o(Δt), where o(Δt) is some quantity such that o(Δt)/Δt → 0 as Δt → 0;
(ii) Pr(more than one arrival between t and t + Δt) = o(Δt);
(iii) if t < u < v < w, then X(w) − X(v) is independent of X(u) − X(t).
Let p_n(t) = P(n arrivals occur during the interval (0, t]). Then
$$p_n(t) = \frac{e^{-\lambda t}(\lambda t)^n}{n!}, \quad n = 0, 1, 2, \ldots$$
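The Poisson probabilities p_n(t) can be evaluated directly. The rate below is hypothetical (λ = 0.002 failures per hour over t = 1000 hours, so λt = 2), and the function name is illustrative.

```python
import math

def poisson_pn(n, lam, t):
    """p_n(t) = exp(-lam*t) (lam*t)**n / n!  for a Poisson process with rate lam."""
    m = lam * t
    return math.exp(-m) * m ** n / math.factorial(n)

lam, t = 0.002, 1000.0  # hypothetical: 0.002 failures per hour over 1000 hours
p0 = poisson_pn(0, lam, t)                                # no failures: exp(-2)
p_at_most_2 = sum(poisson_pn(k, lam, t) for k in range(3))  # exp(-2) * (1 + 2 + 2)
```

Note that p_0(t) = e^(−λt) is the exponential reliability function, which links the Poisson process to the constant-hazard example.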
Poisson Process
[Figure: time axis starting from a new system, with failures at t1, t2, t3 and t4.]
Let T1, T2, T3, ... be the times to successive failures of the system, and let Xi = Ti − Ti−1 be the time between failure i − 1 and failure i, where T0 = 0. The Ti and Xi are random variables, and we define ti and xi to be their corresponding realized values. We define N(t) as the number of failures in the interval (0, t].
Intensity function
The intensity function of a stochastic point process is
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{\Pr\{N(t + \Delta t) - N(t) \ge 1\}}{\Delta t}.$$
A point process is said to be regular, or orderly, if
$$\Pr\{N(t + \Delta t) - N(t) \ge 2\} = o(\Delta t),$$
that is, if independent failures cannot occur simultaneously. We will assume this property throughout.
Point process models
The important point in system failure data analysis is that failures occur in a specific sequence, and the rate at which they occur can be increasing, decreasing or constant.
The point process models that have been applied to repairable system reliability are the homogeneous Poisson process (HPP), the non-homogeneous Poisson process (NHPP) and the superimposed renewal process (SRP).
Modelling terms used in reliability for components (parts) and for systems have been thoroughly confused by reliability engineers and scientists.
Non-homogeneous Poisson process (NHPP)
Most repairs involve the replacement of a very small fraction of a system's constituent parts.
It is therefore plausible to assume that system reliability after repair is essentially the same as it was immediately before the failure (minimal repair).
This assumption leads to the NHPP as a system reliability model.
Characteristics of NHPP
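$$N(0) = 0 \quad \text{(system initialisation at time } t = 0\text{)}$$
$$N(t) - N(s) \text{ is independent of } N(s), \quad 0 \le s < t \quad \text{(independence of increments)}$$
$$N(t) - N(s) \sim \mathrm{Po}\left(\int_s^t \lambda(u)\,du\right),$$
where
$$\lambda(t) = \frac{d\,E[N(t)]}{dt}.$$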
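To illustrate these NHPP properties, the sketch simulates a power-law NHPP by Lewis-Shedler thinning (a standard simulation technique, not one described in this lecture) and compares the mean and variance of the number of events on (0, T] with Λ(T) = ∫λ(u)du. The parameters and function names are hypothetical.

```python
import numpy as np

def plp_intensity(t, lam, beta):
    """Power-law intensity lambda(t) = lam * beta * t**(beta - 1)."""
    return lam * beta * t ** (beta - 1.0)

def plp_mean(t, lam, beta):
    """Lambda(t) = integral of lambda(u) over (0, t] = lam * t**beta = E[N(t)]."""
    return lam * t ** beta

def nhpp_thinning(intensity, lam_max, T, rng):
    """Lewis-Shedler thinning: propose points from an HPP with rate lam_max on
    (0, T] and keep a point at time t with probability intensity(t) / lam_max.
    Requires intensity(t) <= lam_max throughout (0, T]."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam_max)
        if t > T:
            return np.array(times)
        if rng.uniform() <= intensity(t) / lam_max:
            times.append(t)

lam, beta, T = 0.001, 2.0, 100.0      # hypothetical; Lambda(T) = 10
rate = lambda t: plp_intensity(t, lam, beta)
rng = np.random.default_rng(42)
# beta > 1 makes the intensity increasing, so rate(T) bounds it on (0, T]
counts = np.array([nhpp_thinning(rate, rate(T), T, rng).size for _ in range(5000)])
```

Because N(T) is Poisson for an NHPP, both `counts.mean()` and `counts.var()` should be close to Λ(T) = 10.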
NHPP
There are many connections between the NHPP and the distribution of time to failure.
A system of age t modelled by an NHPP can, for some purposes, be considered to have age x (numerically equal to t).
This results from the independent increments property of the NHPP (a non-stationary process): the intensity after a failure depends only on the current age, not on the failure history.
Homogeneous Poisson process (HPP)
The most straightforward way to define the HPP is as a sequence of independent and identically distributed (IID) exponential xi.
Several equivalent definitions refer to the HPP as an orderly stochastic process with stationary, independent increments.
71
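The IID-exponential definition translates directly into a simulation. The sketch below uses a hypothetical rate of 0.2 failures per unit time and a horizon of 50, and checks that the resulting count $N(50)$ has mean and variance close to $\lambda t = 10$, as expected for Poisson counts.

```python
import random
import statistics


def simulate_hpp(rate, horizon, rng):
    """Failure times of an HPP on (0, horizon], built from IID exponential gaps."""
    times = []
    clock = rng.expovariate(rate)
    while clock <= horizon:
        times.append(clock)
        clock += rng.expovariate(rate)
    return times


rng = random.Random(42)
counts = [len(simulate_hpp(0.2, 50.0, rng)) for _ in range(5000)]
# For an HPP, N(t) is Poisson, so its mean and variance both equal rate * t = 10.
print(statistics.mean(counts), statistics.variance(counts))
```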
Renewal Process
The renewal process is defined as a sequence of independent and identically distributed non-negative random variables $X_1, X_2, X_3, \ldots$ which, with probability 1, are not all zero. Since the $X_i$ need not be exponentially distributed, it is a generalisation of the HPP.
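The generalisation is easy to see in code: the same simulation loop produces an HPP or a more general renewal process depending only on the gap distribution supplied. In the sketch below the Weibull gap distribution and all parameter values are hypothetical.

```python
import random


def simulate_renewal(sample_gap, horizon):
    """Event times of a renewal process on (0, horizon].

    sample_gap() must return IID non-negative inter-event times X_i
    (not all zero, otherwise the loop would never reach the horizon).
    """
    times = []
    clock = sample_gap()
    while clock <= horizon:
        times.append(clock)
        clock += sample_gap()
    return times


rng = random.Random(1)
# Exponential gaps with mean 5 give an HPP; Weibull gaps with scale 5 and
# shape 3 give a renewal process whose gaps are less variable.
hpp_like = simulate_renewal(lambda: rng.expovariate(0.2), 50.0)
wear_out = simulate_renewal(lambda: rng.weibullvariate(5.0, 3.0), 50.0)
print(len(hpp_like), len(wear_out))
```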
Example
[Figure: a new system followed by failures at times t1, t2, t3, t4 on a time axis]
N(t) = number of failures up to time t
H(t) = history of failures up to time t
Reliability Model relationships
[Diagram: MODELS, linking minimal repair to the non-homogeneous Poisson process and maximal repair to the renewal process, with partial repair between them; the labels Rejuvenation and Correction also appear]
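Proportional Intensities Model (Cox, 1972)
$$\lambda(t) = \lambda_0(t) \prod_{i=1}^{N(t)} s_i$$
where
$s_i$ = constant scaling factor associated with the $i$th event ($s_1, \ldots, s_{N(t)}$)
$\lambda_0(t)$ = baseline intensity function
Choices of baseline intensity:
$\lambda_0(t) = a$: constant (exponential)
$\lambda_0(t) = e^{a + bt}$: loglinear (truncated Gumbel)
$\lambda_0(t) = a t^{b}$: power law (Weibull)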
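A minimal sketch of evaluating this intensity, assuming a single common scaling factor $s$ for every event ($s_i = s$) and the baseline parametrizations with hypothetical parameters $a$ and $b$. With $s = 1$ each event leaves the intensity unchanged (minimal repair); $s < 1$ would lower it after each event. The function names are illustrative.

```python
import math


def baseline_intensity(kind, t, a, b):
    """Hypothetical baseline intensities lambda_0(t) with parameters a, b."""
    if kind == "constant":
        return a                       # exponential
    if kind == "loglinear":
        return math.exp(a + b * t)     # truncated Gumbel
    if kind == "power-law":
        return a * t ** b              # Weibull
    raise ValueError(f"unknown baseline: {kind}")


def pim_intensity(t, event_times, kind, a, b, s):
    """lambda(t) = lambda_0(t) * s ** (number of events strictly before t).

    Assumes every event scales the intensity by the same constant s.
    """
    n_before = sum(1 for u in event_times if u < t)
    return baseline_intensity(kind, t, a, b) * s ** n_before


events = [20.0, 55.0]                  # hypothetical event times
print(pim_intensity(80.0, events, "constant", 0.02, 0.0, 0.5))  # 0.02 * 0.5**2 = 0.005
```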
Hypothetical Data from Ascher and Feingold (1984)
Times between successive failures:

Failure   happy system   sad system   noncommittal system
1         15             177          51
2         27             65           43
3         32             51           27
4         43             43           177
5         51             32           15
6         65             27           65
7         177            15           32
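Converting the data above into cumulative failure times makes the structure visible: all three systems have the same seven gaps in different orders, and each reaches its seventh failure at time 410. Any model that depends only on the set of gaps, such as a renewal process, therefore cannot tell the systems apart. The variable names are illustrative.

```python
from itertools import accumulate

# Times between successive failures for the three hypothetical systems.
GAPS = {
    "happy": [15, 27, 32, 43, 51, 65, 177],
    "sad": [177, 65, 51, 43, 32, 27, 15],
    "noncommittal": [51, 43, 27, 177, 15, 65, 32],
}

# Cumulative failure times T_i = X_1 + ... + X_i for each system.
FAILURE_TIMES = {name: list(accumulate(gaps)) for name, gaps in GAPS.items()}

for name, times in FAILURE_TIMES.items():
    print(name, times)
# happy [15, 42, 74, 117, 168, 233, 410]
# sad [177, 242, 293, 336, 368, 395, 410]
# noncommittal [51, 94, 121, 298, 313, 378, 410]
```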
Log-likelihoods for Hypothetical Data

Model                  Baseline intensity   happy system   sad system   noncommittal system
PIM (partial repair)   constant             -33.7          -33.7        -35.5
                       loglinear            -32.4          -28.6        -31.0
                       power-law            -29.2          -32.0        -34.7
maximal repair         constant             -35.5          -35.5        -35.5
                       loglinear            -34.8          -34.8        -34.8
                       power-law            -35.1          -35.1        -35.1
minimal repair         constant             -35.5          -35.5        -35.5
                       loglinear            -34.8          -32.0        -35.2
                       power-law            -35.0          -31.8        -35.3
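Two rows of this table are simple enough to check in closed form. The sketch below assumes each system is observed until its seventh failure at time 410 (failure truncation) and uses the power-law parametrization $\lambda(t) = (\beta/\theta)(t/\theta)^{\beta-1}$, one common choice that may differ from the one used in the original study. Under these assumptions it reproduces the minimal repair constant and power-law entries to one decimal place. The PIM and loglinear fits need numerical optimisation and are not attempted here.

```python
import math
from itertools import accumulate

GAPS = {
    "happy": [15, 27, 32, 43, 51, 65, 177],
    "sad": [177, 65, 51, 43, 32, 27, 15],
    "noncommittal": [51, 43, 27, 177, 15, 65, 32],
}


def hpp_loglik(times):
    """Maximised log-likelihood of an HPP observed up to the last failure."""
    n, end = len(times), times[-1]
    rate = n / end                                   # MLE of the constant intensity
    return n * math.log(rate) - rate * end


def power_law_loglik(times):
    """Maximised log-likelihood of a power-law NHPP (minimal repair).

    Intensity (beta / theta) * (t / theta) ** (beta - 1), observed up to
    the last failure (failure truncation).
    """
    n, end = len(times), times[-1]
    beta = n / sum(math.log(end / t) for t in times[:-1])    # MLE of beta
    theta = end / n ** (1.0 / beta)                           # MLE of theta
    log_intensities = sum(
        math.log(beta / theta) + (beta - 1) * math.log(t / theta) for t in times
    )
    return log_intensities - (end / theta) ** beta           # minus Lambda(end)


for name, gaps in GAPS.items():
    times = list(accumulate(gaps))
    print(name, round(hpp_loglik(times), 1), round(power_law_loglik(times), 1))
# happy -35.5 -35.0
# sad -35.5 -31.8
# noncommittal -35.5 -35.3
```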
Contour Plots for Log-likelihood Fits
[Figure: log-likelihood contour plots for the power-law fit to the happy data, the loglinear fit to the sad data and the loglinear fit to the noncommittal data]
Intensity Functions for Chosen Models
[Figure: fitted intensity functions for the happy, sad and noncommittal systems, intensity (0 to 0.18) against time (0 to 410)]
Revision: Possible Questions/Problems
Definition of reliability, hazard and stochastic processes
Show the reliability function and give an applied example
Derivation of important equations:
Hazard Function
Stochastic Point Process
Weibull Model
Point Process Models
Intensity Function
Bathtub Curves and their significance to repairable and non-repairable systems
