RELIABILITY • Failure does not occur if the software is not used. However in hardware reliability, material deterioration can cause failure even when the system is not in use.
• In software reliability, failures are caused by
incorrect logic, incorrect statements, or incorrect input data. In hardware reliability, failures are caused by material deterioration, random failures, design errors, misuse, and environmental factors. HARDWARE VERSUS SOFTWARE RELIABILITY • Software failures are rarely preceded by warnings while hardware failures are usually preceded by warnings.
• Software reliability tends to change continually
during test periods. While hardware reliability may change during certain periods such as initial burn in or the end of useful life however it has a much greater tendency then software value. HARDWARE RELIABILITY GRAPH
The Bathtub Curve
SOFTWARE RELIABILITY GRAPH RELIABILITY METRICS
• Reliability metrics are used to quantitatively express the
reliability of the software product.
• Measuring the software reliability is a difficult problem
because we don't have a good understanding about the nature of software. It is difficult to find a suitable way to measure software reliability, and most of the aspects related to software reliability.
• Some reliability metrics which can be used to quantify
the reliability of the software product are discussed in next slide. MEAN TIME TO FAILURE (MTTF)
• MTTF is defined as the time interval between the
successive failures. An MTTF of 200 means that one failure can be expected every 200 time units. • The time units are totally dependent on the system & it can even be specified in the number of transactions. • MTTF is relevant for systems with long transactions. For example, it is suitable for computeraided design systems where a designer will work on a design for several hours as well as for Word-processor systems. MEAN TIME TO REPAIR (MTTR)
• Once the failure occur sometime is required to fix
the error.
• MTTR measures the average time it takes to track
the errors causing the failure & to fix them. MEAN TIME BETWEEN FAILURE (MTBF)
• We can combine MTTF &MTTR metrics to get the
MTBF metric.
• MTBF = MTTF + MTTR
• Thus, an MTBF of 300 indicates that once the failure
occurs, the next failure is expected to occur only after 300 hours. In this case the time measurements are real time & not the execution time as in MTTF. RATE OF OCCURRENCE OF FAILURE (ROCOF) • It is the number of failures occurring in unit time interval.The number of unexpected events over a particular time of operation.
• ROCOF is the frequency of occurrence with which
unexpected behavior is likely to occur.
• An ROCOF of 0.02 means that two failures are likely
to occur in each 100 operational time unit steps. It is also called failure intensity metric. PROBABILITY OF FAILURE ON DEMAND (POFOD) • POFOD is defined as the probability that the system will fail when a service is requested. It is the number of system failures given a number of systems inputs.
• POFOD is the likelihood that the system will fail when
a service request is made. A POFOD of 0.1 means that one out of a ten service requests may result in failure.
• POFOD is an important measure for safety critical
systems. POFOD is appropriate for protection systems where services are demanded occasionally AVAILABILITY (AVAIL)
• Availability is the probability that the system is
available for use at a given time.
• It takes into account the repair time & the restart
time for the system.
• An availability of 0.995 means that in every 1000
time units, the system is likely to be available for 995 of these. CLASSIFICATION OF FAILURES
• Transient :Transient failures occur only for certain
input values while invoking a function.It is a temporary error that is likely to disappear soon.
• Permanent:Permanent failures occur for all input
values while invoking a function of the system.
• Recoverable:When a recoverable failure occurs,the
system can recover without having to shutdown and restart the system(with or without opertor intervention). CLASSIFICATION OF FAILURES
• Unrecoverable:In this,the system may need to be
restarted.
• Cosmetic:These classes of failure cause only minor
irritations and don’t lead to incorrect results. Example-A situation where mouse button has to be clicked twice instead of once to invoke a given function through the graphical user interface.