Risk Analysis For Information and Systems Engineering: INSE 6320 - Week 6

INSE 6320 -- Week 6
Risk Analysis for Information and Systems Engineering
Reliability
Expert Opinion
Midterm Review
Dr. A. Ben Hamza
Concordia University

Reliability Theory
Let T be a random variable representing the failure time or lifetime of a physical system. For this system, the probability that it will fail by time t is:
$$F(t) = P[T \le t] = \int_0^t f(u)\,du$$
The probability of the system surviving until time t is:
$$R(t) = P[T > t] = 1 - F(t) = \int_t^\infty f(u)\,du$$
Failure rate: the probability per unit time that a failure will occur in the interval [t1, t2], given that a failure has not occurred before time t1. This is written as:
$$\frac{P[t_1 < T \le t_2 \mid T > t_1]}{t_2 - t_1} = \frac{P[t_1 < T \le t_2]}{(t_2 - t_1)\,P[T > t_1]} = \frac{F(t_2) - F(t_1)}{(t_2 - t_1)\,R(t_1)}$$
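As a rough numerical illustration of these definitions, the sketch below evaluates F(t), R(t), and the interval failure rate for an assumed exponential lifetime. The choice of distribution, the rate of 0.001 failures per hour, and all function names are illustrative assumptions and do not come from the slides.

```python
import math

RATE = 0.001  # assumed failure rate per hour (illustrative value only)

def failure_cdf(t, rate=RATE):
    """F(t) = P[T <= t] for an assumed exponential lifetime."""
    return 1.0 - math.exp(-rate * t)

def reliability(t, rate=RATE):
    """R(t) = P[T > t] = 1 - F(t)."""
    return 1.0 - failure_cdf(t, rate)

def interval_failure_rate(t1, t2, rate=RATE):
    """(F(t2) - F(t1)) / ((t2 - t1) * R(t1)): conditional failure probability per unit time."""
    numerator = failure_cdf(t2, rate) - failure_cdf(t1, rate)
    return numerator / ((t2 - t1) * reliability(t1, rate))

print(f"F(1000) = {failure_cdf(1000):.4f}")
print(f"R(1000) = {reliability(1000):.4f}")
print(f"failure rate on [1000, 1001] = {interval_failure_rate(1000, 1001):.6f}")
```

For a short interval the interval failure rate is close to the assumed rate, which is what the limiting hazard-rate definition would suggest.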
Reliability
Reliability: The probability that an item will perform its intended function without
failure under stated conditions for a specified period of time.
Failure: The termination of the ability of the product to perform its intended function.
Reliability provides a quantitative statement of the chance that an item will operate without failure for a given period of time in the environment for which it was designed.
In its simplest and most general form, reliability is the probability of success.
To perform reliability calculations, reliability must first be defined explicitly. It is not enough to say that reliability is a probability. A probability of what?
Reliability is performance over time: the probability that something will work when you want it to.
Reliability Terms
Mean Time To Failure (MTTF) for non-repairable systems
Mean Time Between Failures (MTBF) for repairable systems
Reliability Probability (survival) R(t)
Failure Probability (cumulative distribution function) F(t) = 1 - R(t)
Failure Probability Density f(t)
Failure Rate function (hazard rate) h(t)
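Important Relationships:
$$R(t) + F(t) = 1$$
$$f(t) = h(t)\exp\left(-\int_0^t h(u)\,du\right) = \frac{dF(t)}{dt}$$
$$R(t) = 1 - F(t) = \exp\left(-\int_0^t h(u)\,du\right)$$
$$\mathrm{MTTF} = \int_0^\infty t f(t)\,dt = \int_0^\infty R(t)\,dt$$
$$F(t) = \int_0^t f(u)\,du$$
$$h(t) = f(t)/R(t)$$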
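These relationships can be checked numerically. The sketch below assumes a linearly increasing hazard h(t) = 2t (an arbitrary illustrative choice), builds R(t) and f(t) from it, and compares the two MTTF integrals. The trapezoid integrator, the truncation point, and all names are assumptions made for this example.

```python
import math

def hazard(t, a=2.0):
    """Assumed linearly increasing hazard h(t) = a * t (illustrative)."""
    return a * t

def integrate(func, lo, hi, steps):
    """Composite trapezoid rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width

def reliability(t):
    """R(t) = exp(-integral from 0 to t of h(u) du)."""
    return math.exp(-integrate(hazard, 0.0, t, steps=200))

def density(t):
    """f(t) = h(t) * R(t)."""
    return hazard(t) * reliability(t)

UPPER = 10.0  # truncation point standing in for infinity; R(10) is negligible here

mttf_from_r = integrate(reliability, 0.0, UPPER, steps=2000)
mttf_from_f = integrate(lambda t: t * density(t), 0.0, UPPER, steps=2000)

print(f"MTTF from integral of R(t):     {mttf_from_r:.6f}")
print(f"MTTF from integral of t * f(t): {mttf_from_f:.6f}")
```

With this hazard, R(t) = exp(-t^2), so both integrals should agree with the closed form sqrt(pi)/2.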
Failure Distribution function (or unreliability): Probability that the product fails at some time prior to t.
$$F(t) = P(T \le t)$$
Failure Density function: f(t) Δt is approximately the probability of the product failing in a short interval [t, t + Δt].
$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt} = -R'(t)$$
Reliability function: Probability that the item does not fail before time t.
$$R(t) = P(T > t) = 1 - F(t)$$
Hazard function: Measure of proneness to failure as a function of age, t.
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)} = -\frac{d \log R(t)}{dt}$$
Cumulative hazard: the hazard accumulated up to time t.
$$H(t) = \int_0^t h(u)\,du = -\log R(t)$$

Example: Exponential Model
Constant Failure Rate
$$h(t) = f(t)/R(t) = \lambda$$
$$f(t) = \lambda \exp(-\lambda t), \quad \lambda > 0,\ t \ge 0$$
$$R(t) = \exp(-\lambda t) = 1 - F(t)$$
$$R(x \mid t) = P(T > t + x \mid T > t) = \frac{e^{-\lambda (t + x)}}{e^{-\lambda t}} = e^{-\lambda x} = R(x)$$
$$\mathrm{MTTF} = \frac{1}{\lambda}$$
$$R(\mathrm{MTTF}) = e^{-\lambda\,\mathrm{MTTF}} = e^{-1} = 0.367879$$
Memory-less property implies that a used unit is just as reliable as one that is new; i.e., there is no wear-out.
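The memoryless property and the value of R at the MTTF can be checked with a few lines of code. This is a minimal sketch for the exponential model; the rate of 0.002 failures per hour and the function names are assumptions chosen only for illustration.

```python
import math

def exp_reliability(t, lam):
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)

def conditional_reliability(x, t, lam):
    """R(x | t) = P(T > t + x | T > t) = R(t + x) / R(t)."""
    return exp_reliability(t + x, lam) / exp_reliability(t, lam)

LAM = 0.002        # assumed failure rate per hour (illustrative)
MTTF = 1.0 / LAM   # mean time to failure for the exponential model

print(f"R(100) for a new unit:            {exp_reliability(100, LAM):.6f}")
print(f"R(100 | survived 5000 hours):     {conditional_reliability(100, 5000, LAM):.6f}")
print(f"R(MTTF):                          {exp_reliability(MTTF, LAM):.6f}")
```

The first two printed values should coincide, which is the sense in which a used unit is as reliable as a new one under this model.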
MTTF and MTBF
One of the measures of the system's reliability is the mean time to failure (MTTF). It should not be confused with the mean time between failures (MTBF). We refer to the expected time between two successive failures as the MTTF when the system is non-repairable.
When the system is repairable, we refer to it as the MTBF.
$$\mathrm{MTBF} = \mathrm{MTTF} = \int_0^\infty R(t)\,dt = \int_0^\infty t f(t)\,dt = E(T)$$
For a repairable item, MTBF is the ratio of the cumulative operating time to the number of failures for that item.
Example (repairable system): A motor is repaired and returned to service six times during its life and provides 45,000 hours of service. Calculate MTBF.
$$\mathrm{MTBF} = \frac{\text{Total operating time}}{\text{Number of failures}} = \frac{45000}{6} = 7500 \text{ hours}$$

Example: Weibull Model
$$f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right], \quad \beta > 0,\ \eta > 0,\ t \ge 0$$
$$R(t) = \exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right] = 1 - F(t)$$
$$h(t) = f(t)/R(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}$$
$$\mathrm{MTTF} = \eta\int_0^\infty t^{1/\beta} e^{-t}\,dt = \eta\,\Gamma\left(\frac{1}{\beta} + 1\right)$$
β is the Shape Parameter and η is the Characteristic Lifetime.
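The motor example and the Weibull formulas can be sketched in code as follows. The symbols beta (shape) and eta (characteristic lifetime) follow the usual Weibull convention, and the parameter values in the demonstration are assumptions chosen only for illustration.

```python
import math

def mtbf(total_operating_time, number_of_failures):
    """MTBF for a repairable item: cumulative operating time / number of failures."""
    return total_operating_time / number_of_failures

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))

def weibull_mttf(beta, eta):
    """MTTF = eta * Gamma(1 + 1 / beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

# Motor example from the slide: 45,000 hours of service, six failures.
print(f"Motor MTBF: {mtbf(45000, 6):.0f} hours")

# Assumed Weibull parameters (illustrative only).
BETA, ETA = 2.0, 1000.0
print(f"Weibull MTTF: {weibull_mttf(BETA, ETA):.1f} hours")
print(f"R(eta): {weibull_reliability(ETA, BETA, ETA):.6f}")
```

Whatever the shape parameter, R(eta) equals exp(-1), which is why eta is called the characteristic lifetime; with beta = 1 the MTTF reduces to eta, matching the exponential model.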
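Versatility of Weibull Model
Failure Rate:
$$h(t) = f(t)/R(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}$$
[Figure: bathtub curve of failure rate h(t) versus time t, showing an Early Life Region (0 < β < 1), a Constant Failure Rate Region (β = 1), and a Wear-Out Region (β > 1)]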
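To see how the shape parameter moves the Weibull hazard between the three regions of the bathtub curve, the sketch below evaluates h(t) at a few times for three shape values. The characteristic lifetime of 1000 hours, the sample times, and the helper names are assumptions for illustration.

```python
def weibull_hazard(t, beta, eta):
    """h(t) = (beta / eta) * (t / eta) ** (beta - 1), evaluated for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def hazard_region(beta):
    """Name the bathtub-curve region associated with a shape parameter."""
    if beta < 1:
        return "early life (decreasing failure rate)"
    if beta == 1:
        return "constant failure rate"
    return "wear-out (increasing failure rate)"

ETA = 1000.0  # assumed characteristic lifetime in hours

for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, beta, ETA) for t in (100.0, 500.0, 2000.0)]
    print(beta, hazard_region(beta), [f"{r:.2e}" for r in rates])
```

The hazard is evaluated only at t > 0 because for shape values below 1 it is unbounded at t = 0.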
Failure Rate Function
Increasing failure rate (IFR) vs. decreasing failure rate (DFR): $h(t) \uparrow$ or $h(t) \downarrow$, respectively.
Examples
$h(t) = c$, where c is a constant
$h(t) = at$, where $a > 0$
$h(t) = \dfrac{1}{t + 1}$ for $t > 0$
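As a worked check (not part of the original slide), applying $R(t) = \exp\left(-\int_0^t h(u)\,du\right)$ to the three example hazards gives the corresponding reliability functions:

$$h(t) = c: \quad R(t) = \exp\left(-\int_0^t c\,du\right) = e^{-ct}$$
$$h(t) = at: \quad R(t) = \exp\left(-\int_0^t au\,du\right) = e^{-at^2/2}$$
$$h(t) = \frac{1}{t + 1}: \quad R(t) = \exp\left(-\int_0^t \frac{du}{u + 1}\right) = e^{-\ln(t + 1)} = \frac{1}{t + 1}$$

The first hazard is constant (the exponential model), the second is increasing (IFR), and the third is decreasing (DFR).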
System Reliability Evaluation
A system (or a product) is a collection of components arranged according to a specific design in order to achieve desired functions with acceptable performance and reliability measures.
Clearly, the type of components used, their qualities, and the design configuration in which they are arranged have a direct effect on the system performance and its reliability. For example, a designer may use a smaller number of high-quality components and configure them in such a way as to result in a highly reliable system, or a designer may use a larger number of lower-quality components and configure them differently in order to achieve the same level of reliability.
Once the system is configured, its reliability must be evaluated and compared with an acceptable reliability level. If it does not meet the required level, the system should be redesigned and its reliability should be re-evaluated.
Reliability Block Diagram (RBD) Technique
The first step in evaluating a system's reliability is to construct a reliability block diagram, which is a graphical representation of the components of the system and how they are connected.
The purpose of the RBD technique is to represent failure and success criteria pictorially and to use the resulting diagram to evaluate system reliability.
Benefits:
The pictorial representation means that models are easily understood and therefore readily checked.
Block diagrams are used to identify the relationship between elements in the system. The overall system reliability can then be calculated from the reliabilities of the blocks using the laws of probability.
Block diagrams can be used for the evaluation of system availability provided that both the repair of blocks and failures are independent events, i.e. provided the time taken to repair a block is dependent only on the block concerned and is independent of repair to any other block.

Typical RBD configurations and related formulae
Series System
[Figure: blocks A, B, C, ..., Z connected in series between Input and Output]
The reliability of the system is given by
$$R(t) = R_A(t)\,R_B(t)\,R_C(t)\cdots R_Z(t)$$
The interpretation can be stated as any unit failing causes the system as a whole to fail.
Parallel System
[Figure: blocks X and Y connected in parallel between Input and Output]
The reliability of the system is given by:
$$R(t) = 1 - (1 - R_X(t))(1 - R_Y(t))$$
The units X and Y operate in such a way that the system will survive as long as at least one of the units survives.
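The series and parallel formulas translate directly into code. The sketch below assumes independent blocks, as the product formulas do; the function names and the block reliabilities are illustrative assumptions.

```python
import math

def series_reliability(reliabilities):
    """Series system: every block must survive, so the reliabilities multiply."""
    return math.prod(reliabilities)

def parallel_reliability(reliabilities):
    """Parallel system: fails only if every block fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

blocks = [0.9, 0.9]  # assumed block reliabilities at some fixed time t

print(f"series:   {series_reliability(blocks):.4f}")
print(f"parallel: {parallel_reliability(blocks):.4f}")
```

With the same two blocks, the series arrangement is less reliable than either block alone, while the parallel arrangement is more reliable than either.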
System Configuration Models
Typical RBD configurations and related formulae
Series/Parallel System
When blocks such as X and Y themselves comprise sub-blocks in series, block diagrams of the type shown below are obtained.
[Figure: two parallel branches between Input and Output; branch X is the series chain A1, B1, C1, ..., Z1 and branch Y is the series chain A2, B2, C2, ..., Z2]
$$R_X(t) = R_{A1}(t)\,R_{B1}(t)\,R_{C1}(t)\cdots R_{Z1}(t)$$
$$R_Y(t) = R_{A2}(t)\,R_{B2}(t)\,R_{C2}(t)\cdots R_{Z2}(t)$$
Thus, the reliability of the system is given by
$$R(t) = 1 - (1 - R_X(t))(1 - R_Y(t))$$
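A series/parallel system can be evaluated by first collapsing each series branch into a single equivalent block and then combining the branches in parallel. The sketch below assumes independent blocks; the branch reliabilities and names are made up for illustration.

```python
import math

def series_reliability(reliabilities):
    """Reliability of blocks in series (product of block reliabilities)."""
    return math.prod(reliabilities)

def parallel_reliability(reliabilities):
    """Reliability of blocks in parallel (one minus product of unreliabilities)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

# Assumed sub-block reliabilities for the two branches (illustrative values).
branch_x = [0.95, 0.90, 0.99]
branch_y = [0.90, 0.85, 0.98]

r_x = series_reliability(branch_x)              # R_X(t)
r_y = series_reliability(branch_y)              # R_Y(t)
r_system = parallel_reliability([r_x, r_y])     # 1 - (1 - R_X)(1 - R_Y)

print(f"R_X = {r_x:.4f}, R_Y = {r_y:.4f}, system = {r_system:.4f}")
```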
Software Reliability
Basic definitions:
Software reliability: probability that the software will not cause a failure for some specified time.
Failure: divergence in expected external behavior.
Fault: cause/representation of an error, i.e., a bug
Error: a programmer mistake (misinterpretation of specifications?)
Basic question: How to estimate the growth in software reliability as its errors are being removed?
Major issues:
testing (how much? when to stop?)
field use (# of trained personnel? support staff?)
Software reliability growth models: observe past failure history and give an estimate of the future failure behavior; about 40 models have been proposed.

Software Reliability Models
Software reliability models can be classified into many different groups; some of the more prominent (better known) groups include:
Error seeding: estimates the number of errors in a program. Errors are divided into indigenous errors and induced (seeded) errors. The unknown number of indigenous errors is estimated from the number of induced errors and the ratio of the two types of errors obtained from the testing data.
Reliability growth: measures and predicts the improvement of reliability through the testing process using a growth function to represent the process. Independent variables of the growth function could be time, number of test cases (or testing stages), and the dependent variables can be reliability, failure rate or cumulative number of errors detected.
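The error seeding estimate described above can be sketched as a simple proportion: if testing finds the same fraction of indigenous errors as of seeded errors, the total number of indigenous errors follows from the counts. The function name, the equal-detectability assumption, and the numbers below are illustrative and not taken from the slides.

```python
def estimate_indigenous_errors(seeded_total, seeded_found, indigenous_found):
    """Estimate total indigenous errors, assuming seeded and indigenous
    errors are equally likely to be detected during testing."""
    if seeded_found <= 0:
        raise ValueError("at least one seeded error must be found to form the ratio")
    return indigenous_found * seeded_total / seeded_found

# Assumed testing data: 100 errors seeded, 20 of them found, 30 indigenous errors found.
estimated_total = estimate_indigenous_errors(seeded_total=100, seeded_found=20, indigenous_found=30)
estimated_remaining = estimated_total - 30

print(f"estimated indigenous errors: {estimated_total:.0f}")
print(f"estimated still undetected:  {estimated_remaining:.0f}")
```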
Reliability and Availability

A simple measure of reliability can be given as: MTBF = MTTF + MTTR, where
MTBF is mean time between failures
MTTF is mean time to failure
MTTR is mean time to repair
Availability can be defined as the probability that the system is still operating within requirements at a given point in time and can be given as:
$$\text{Availability} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \times 100\%$$
Availability is more sensitive to MTTR, which is an indirect measure of the maintainability of software.
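A small sketch of the availability formula follows. The MTTF and MTTR values are assumed for illustration, and the function returns a fraction rather than a percentage.

```python
def mtbf(mttf, mttr):
    """MTBF = MTTF + MTTR."""
    return mttf + mttr

def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR), returned as a fraction in [0, 1]."""
    return mttf / (mttf + mttr)

# Assumed values in hours (illustrative only).
MTTF_HOURS, MTTR_HOURS = 990.0, 10.0

print(f"MTBF: {mtbf(MTTF_HOURS, MTTR_HOURS):.0f} hours")
print(f"availability: {availability(MTTF_HOURS, MTTR_HOURS):.2%}")
print(f"availability with MTTR halved: {availability(MTTF_HOURS, MTTR_HOURS / 2):.2%}")
```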
Nonhomogeneous Poisson process (NHPP) models:
provide an analytical framework for describing the software failure phenomenon during testing.
the main issue is to estimate the mean value function of the cumulative number of failures experienced up to a certain point in time.
a key example of this approach is the series of Musa models
A typical measure (failures per unit time) is the failure intensity (rate) given as:
$$\lambda(t) = \frac{\text{number of failures in } [t,\ t + \Delta t]}{\Delta t}$$
where t = program CPU time (in a time shared computer) or wall clock time (in an embedded system).
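Software Reliability Growth models are generally black box: there is no easy way to account for a change in the operational profile.
Operational profile: description of the input events expected to occur in actual software operation, i.e., how it will be used in practice.
The consequence is that we are unable to go from test to field.
Many models have been proposed; perhaps the most prominent is the Musa Basic model:
Failure Intensity (FI) is the number of failures per unit time.
Assume that the decrement in the failure intensity (FI) function (the derivative with respect to the number of expected failures) is constant.
This implies that the FI is a function of the average number of failures experienced at any given point in time.

Example:
Assume a program will experience 100 failures in infinite time. It has now experienced 50 failures. The initial failure intensity was 10 failures/cpu hour.
Here $\lambda_0 = 10$ failures/cpu hour is the initial failure intensity, $\nu_0 = 100$ is the total number of failures in infinite time, $\mu = 50$ is the number of failures experienced so far, and $\tau$ is the execution time in cpu hours.
The current failure intensity is:
$$\lambda(\mu) = \lambda_0\left(1 - \frac{\mu}{\nu_0}\right) = 10\left(1 - \frac{50}{100}\right) = 5 \text{ failures/cpu hour}$$
The number of failures experienced after 10 cpu hours is:
$$\mu(\tau) = \nu_0\left[1 - \exp\left(-\frac{\lambda_0 \tau}{\nu_0}\right)\right] = 100\left[1 - \exp\left(-\frac{10\tau}{100}\right)\right]$$
$$\mu(10) = 100[1 - \exp(-1)] \approx 63 \text{ failures}$$
For 100 hours:
$$\mu(100) = 100\left[1 - \exp\left(-\frac{10 \times 100}{100}\right)\right] = 100[1 - \exp(-10)] \approx 100 \text{ failures}$$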
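The worked example above can be reproduced with a short script. This is a minimal sketch of the Musa basic model formulas; the function and parameter names (lambda0 for the initial failure intensity, nu0 for the total failures in infinite time, tau for execution time) are naming assumptions made for this example.

```python
import math

def failure_intensity(mu, lambda0, nu0):
    """lambda(mu) = lambda0 * (1 - mu / nu0): intensity after mu expected failures."""
    return lambda0 * (1.0 - mu / nu0)

def expected_failures(tau, lambda0, nu0):
    """mu(tau) = nu0 * (1 - exp(-lambda0 * tau / nu0)): expected failures by execution time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))

LAMBDA0 = 10.0  # initial failure intensity, failures per cpu hour (from the example)
NU0 = 100.0     # total failures in infinite time (from the example)

print(f"current intensity after 50 failures: {failure_intensity(50, LAMBDA0, NU0):.1f} failures/cpu hour")
print(f"expected failures after 10 cpu hours:  {expected_failures(10, LAMBDA0, NU0):.1f}")
print(f"expected failures after 100 cpu hours: {expected_failures(100, LAMBDA0, NU0):.2f}")
```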
Musa Basic Model
$$\lambda(\mu) = \lambda_0\left(1 - \frac{\mu}{\nu_0}\right)$$
where:
$\lambda_0$ is the initial failure intensity at the start of execution.
$\mu$ is the average (expected) number of failures at any point in time.
$\nu_0$ is the total number of failures over infinite time.
The average number of failures at any point in time is given as:
$$\mu(\tau) = \nu_0\left[1 - \exp\left(-\frac{\lambda_0}{\nu_0}\tau\right)\right]$$
where $\tau$ is the execution time.

Expert Opinion
Expert Opinion techniques involve consultation with experts, who use their experience and understanding of the system to arrive at an estimate of its cost.
Only used when more objective techniques are not applicable
Used to corroborate or adjust objective data
Cross check historical based estimate
Use for high level, low fidelity estimating
Last resort
Tip: Expert opinion is the least regarded and most dangerous method, but it is seductively easy. Most lexicons do not even admit it as a technique, but it is included here for completeness.
Expert Opinion Advantages/Disadvantages
Advantages
An expert can factor in differences between past project experiences and new techniques, architectures or applications involved in the future project
Good cross check of other estimates from a Subject Matter Expert (SME) point of view
Allows perspective to an estimate that may be overlooked without SME
Disadvantages
Expert judgment is only as good as the estimator, who has his own biases
Completely subjective without use of other techniques
Low-to-nil credibility

What makes a good expert?
Credibility!
Someone who has the ear of the Program manager. You should use the same person that the program manager relies upon for the most critical information.
Technical specialist or engineer who is knowledgeable about the program under question.

How to obtain information
It helps to be a good lawyer and a good detective.
Ask clear, logical, probing questions.
Never simply ask a question and then walk away; use the following approach:
Help the specialists think through their own answers. Do you mean? Would that be the same in another situation?
Ask questions in more than one way. Clarification. Their answers might change based on a clarification question.
Look for uncertainty in their answers. Was their response confident or reluctant?
Evaluate the information obtained. Does it make sense? Could you explain it to someone else?

Main Points for Midterm Exam
What to Study
Some topics are more important than others.
Spend your time on the right stuff.
Don't waste time on topics we haven't emphasized in class.
How to Prepare for the midterm
Focus on the main topics
Go back to your assignment. There is a lot of good review in there.
Make a list of your problem areas.
Eliminate any topics/problems not mentioned on the lecture slides.
Keep the class notes as a guideline.
Read the relevant textbook chapters on the covered topics.
Bring a Calculator on the day of the exam.
Only one double-sided sheet of formulas and notes is allowed.
Midterm Exam Coverage: Lectures 1-to-5, and Assignment 1
Main Topics for Midterm Exam
Risk vs. Uncertainty
Probability of Events. Probability Distributions.
Individual Risk vs. Societal Risk
Weibull Analysis
Survival Analysis
Event Trees and F-N Curves
Fault Trees:
Block diagram for engineering systems (series, parallel, series-parallel, etc.)
Cut Sets and Minimal Cut Sets
Equivalent Fault Tree
Probability of Occurrence of Fault Events
Probability of Occurrence (System Failure) of Top Event