Applying Reliability Models to the Space Shuttle

Norman F. Schneidewind, Naval Postgraduate School
Ted W. Keller, IBM Federal Services Co.

Real project experience shows that reliability models can predict reliability and help develop test strategies. This case study reports on IBM's approach to the space shuttle's on-board software.
Reliability models are a powerful tool for predicting, controlling, and assessing software reliability. In combination, these functions let an organization determine if the reliability goals it sets for its software have been met.
At a recent conference, many practitioners reported on the increasing, and increasingly successful, use of reliability measurement.¹ One of the most important examples was its use on the US space shuttle.² The space-shuttle case study is an excellent example of how a real project team can evaluate candidate reliability models and select the models that best match the software's failure history.

In this article, we share the experience of a team at IBM Federal Services Company in Houston, which evaluated many reliability models, tried to validate them for use on this project, and selected the Schneidewind model to predict the reliability of the shuttle's on-board software.
0740-7459/92/0700/0028/$03.00 © 1992 IEEE, JULY 1992
Satisfying assumptions. To ensure that statistical modeling successfully predicts reliability, you must thoroughly understand precisely how the predictions are to be interpreted and applied, and by whom. Business and military decisions could vary significantly in response to perceptions of reliability, based on interpretations of the predictions and their credibility.

To validate a model's appropriateness for an application, you must address each assumption the model makes. For example, the Schneidewind model⁴ assumes that a system is modified only in immediate response to an observed failure. It assumes that the process used to correct the code is constant, implying that for each error corrected there is an inherent, fixed probability of introducing additional errors. It also assumes that all code in a program is homogeneous from the standpoint of execution history.

For many systems that are sequentially upgraded, these assumptions appear at first to represent significant incompatibilities. However, as this case study illustrates, these restrictions can be accommodated by carefully analyzing the elements of a complete software system and its associated processes. If you cannot directly apply your failure data to a reliability model, consider breaking your systems and processes into smaller elements that can more accurately satisfy assumptions and constraints. If you think of each software version as a combination of code subsets that have a known failure history and homogeneous execution history, you can more readily accommodate a model's assumptions.

The IBM team used this approach to deal with the Schneidewind model's assumptions. The shuttle's Primary Avionics Software Subsystem is modified frequently, using a constantly improving process. Successive versions have been released to NASA since 1980, each an upgrade of the preceding version.

Figure 1 shows one way to depict a sequentially upgraded system. Code developed for a system's first release is the original version, labeled version A in Figure 1. Because all the code in version A was released for the first time, the subset of the total system that was new was in fact the entire version. This subset is labeled new-code subset a in Figure 1; all of subset a begins operation when version A is released.

When the system is updated and rereleased as version B, only the lines of code that were modified or added are included in new-code subset b. The remainder of subset a is carried over from version A and has been operating since version A was released. Subset b, on the other hand, begins operation when version B is released. All of version B is carried over to version C, unless it is modified, in which case it becomes part of new-code subset c.

Figure 1. Depiction of a sequentially upgraded system. The first release is the original version, labeled version A. All the code in version A is new, so subset a is the entire system. The white arrows show when each new-code subset begins operation. When the system is updated and rereleased as version B, only the lines of code that were modified or added are included in new-code subset b. All of version B is carried over to version C, unless it is modified, in which case it becomes part of new-code subset c. This process applies to each successive system release.
MODEL APPLICATION
The team uses each new-code subset's failure and operational execution-time histories to generate a reliability prediction for the new code in each version. This approach places every line of code in PASS into one new-code subset or another, depending on the version for which it was written. PASS is represented as a composite system of separate new-code components, each with an execution history and a reliability, connected in series.

By comparing calculated and actual failure histories, the IBM team evaluated several ways to represent a composite system mathematically. In the end, they judged a standard statistical expression to best fit the actual failure data. This expression describes the probability of an event based on a serial relationship of multiple elements that each have different probabilities.

In other words, it represents the failure prediction for the overall system as the reciprocal of the sum of the reciprocals of the failure predictions of each individual element. The composite T_PASS measure, the estimated average execution time until the next failure, is computed by

T_PASS = 1/(1/T_a + 1/T_b + 1/T_c + 1/T_d + ...)    (1)

where T_a, T_b, T_c, T_d, and so on are estimates of the time until the next failure for each new-code subset, determined by applying the model to each subset individually.

Because you must assign code to a new-code subset only when it fails, you need not perform the unreasonable task of breaking down the entire system, line by line, into subsets. You must know which subset a line belongs to only when it is defective. This is what makes this approach so feasible.
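Equation 1 can be sketched in a few lines of code. This is a minimal illustration, not the team's tooling, and the per-subset estimates below are invented values rather than the shuttle data:

```python
# Composite time-to-next-failure for new-code subsets connected in
# series (Equation 1). Each subset estimate T_a, T_b, ... comes from
# applying the reliability model to that subset individually.

def composite_ttf(subset_estimates):
    """Return 1 / (1/T_a + 1/T_b + ...), the composite estimated
    average execution time until the next system failure."""
    return 1.0 / sum(1.0 / t for t in subset_estimates)

# Hypothetical per-subset estimates, in days of execution time:
t_pass = composite_ttf([400.0, 250.0, 300.0])
```

Because the subsets are in series, the composite estimate is always smaller than the smallest per-subset estimate, which matches the intuition that adding failure-prone new code can only shorten the expected time to the next system failure.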
Estimating execution time. The team estimates the execution time of PASS segments by analyzing test-case records of digital flight simulations as well as records of actual shuttle operations. They count test-case executions as operational executions only if the simulation fidelity matches actual operational conditions very closely. They never count prerelease test execution time for the new code actually being tested as operational execution time.

You can eliminate the tedious manual activity of estimating execution time, and increase the accuracy of your estimates, if your environment can be designed to record execution time automatically.
Validation. The IBM team selected several models for evaluation, according to how compatible their assumptions were with PASS. To validate each model, they compared its reliability predictions to PASS's actual failure history.

Significant operational failures are virtually nonexistent in PASS, a certified man-rated system. To have a larger statistical sample for validation, the team included failures of all magnitudes. By considering every fault detected in any operational-like execution, whether the user was aware of the fault or not, they identified about 100 failures from 1980 to 1990.

The unit for the failure data was days. Depending on the granularity and accuracy of your historical data and your system's failure intensity, you may find hours or even seconds to be more appropriate units.
In this case, the team used the failure data for each of six dates between 1986 and 1989 to obtain six PASS reliability predictions using the Schneidewind model. For each of the six predictions, they computed the predicted mean time between failures by assuming that the next failure did in fact occur on the predicted date. They then compared each prediction to the actual mean time between failures as of that date.

The Schneidewind model appears to provide the most accurate fit to the 12 years of failure data from this project. For all six dates, the Schneidewind model's reliability predictions were about 15 percent less than the actual average time between failures. On the basis of the accuracy and consistency of these predictions relative to other models, the IBM team selected this statistical method to model PASS reliability.
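The comparison the team describes, a predicted mean time between failures checked against the observed average, can be sketched as follows. The failure history and the prediction here are invented for illustration; they are not the PASS data:

```python
# Compare a model's predicted mean time between failures (MTBF)
# with the actual MTBF observed up to a given date. All numbers
# below are hypothetical.

def actual_mtbf(failure_days):
    """Actual mean time between failures, in days, from a sorted
    list of cumulative failure times."""
    intervals = [b - a for a, b in zip(failure_days, failure_days[1:])]
    return sum(intervals) / len(intervals)

def prediction_error(predicted_mtbf, failure_days):
    """Relative error of the prediction against the observed MTBF.
    A negative value means the prediction is conservative (low)."""
    actual = actual_mtbf(failure_days)
    return (predicted_mtbf - actual) / actual

# Hypothetical cumulative failure days, and a prediction that is
# about 15 percent conservative, as the team observed for PASS:
history = [0, 30, 55, 95, 120, 160]
err = prediction_error(27.2, history)
```

A consistently negative error of this kind is what let the team treat the model's output as a credible lower bound rather than a point estimate.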
Credibility. The two most critical factors in establishing a reliability model's credibility are how it is validated and how its predictions are interpreted. For example, you can interpret a conservative prediction as providing a margin of confidence in the system's reliability, if the predicted reliability already exceeds an established acceptable level.

You may not be able to validate that you can predict reliability precisely, but you can demonstrate that you can, with high confidence, predict a lower bound on reliability within a specified environment. If you can use historical failure data at a series of previous dates (and you have the actual data for the failure history following those dates), you should be able to compare the predictions to the actual reliability and evaluate the model's performance.

You should take all these factors into consideration when establishing validation criteria. This will also significantly enhance the credibility of your predictions among those who must make decisions on the basis of your results.

Analysis. After much study and analysis, the IBM team concluded that the Schneidewind model's conservative performance was due to:

♦ Conservative execution-time estimates for each version. We plan to improve the accuracy of execution-time estimates and use the model's weighting capabilities to more heavily weight the most recent failure history in the predictions.

♦ Slight process improvements implemented during the development cycle of some versions (violating the model's assumption of a constant process).

♦ Detection and removal of latent faults before they became failures in execution (violating the implied assumption that the software is corrected only when a failure is encountered). The model's prediction, which is based on an assumed fault density remaining in the software until the next failure occurs, will underpredict the time to next failure if the fault density decreases between failures.

The IBM team is applying the model to predict a conservative lower bound for PASS reliability. They are also performing independent statistical analyses using the same failure data to compute 95 percent upper and lower confidence intervals. The analyses, which use various classical statistical formulas, have further confirmed both the reasonableness and the relative conservatism of the model's results. You can find formulas appropriate for your application in any standard statistics textbook.

RELIABILITY AND TESTING

If you don't have a testing strategy, test costs are likely to get out of control. Without a strategy, each module you test may get an equal amount of resources. You must treat modules unequally! Allocate more test time, effort, and funds to modules with the highest predicted number of failures.

You can use a reliability model to predict failures, F(t1,t2), during the interval t1,t2, where t could be execution time or tester labor time for a single module (in this case, t means execution time). We predict failures at t1 for a continuous interval that has endpoints at t1+1 and t2. We recommend that you allocate test time to your modules in proportion to F(t1,t2).

You update the model's parameters and predictions according to the actual number of failures, X(0,t1), during the time interval 0,t1. As Figure 2 shows, we predict F(t1,t2) at t1 during t1,t2, on the basis of the model and X(0,t1). In Figure 2, t_m is the total available test time for a single module, but you could make t2 equal to t_m (that is, you could predict to the end of the test period).

Using these updated predictions, you may reallocate test resources. Of course, it can be disruptive to reallocate too frequently. Instead, you could predict and reallocate at major milestones like the formal review of test results.

Equations. Using the Schneidewind model and PASS as an example, here are the minimum equations necessary to illustrate how to use prediction to allocate test resources. The mathematical details of this model have been published elsewhere.⁴,⁵ Although the details of the calculations are omitted here, the following equations illustrate the approach.
F(t) = (α/β)[1 − exp(−βt)]    (2)

F(t1,t2) = (α/β)[exp(−βt1) − exp(−βt2)]    (3)

R(t) = (α/β) exp(−βt)    (4)

where α and β are the model parameters estimated from the observed failure data (α is the initial failure rate and β is the rate at which the failure rate decays). So, given n modules, you should allocate test-execution time, T_i, for each module i according to

T_i = [F_i(t1,t2) / Σ(j=1..n) F_j(t1,t2)] × T    (5)

where T is the total available test-execution time.
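Equations 2 through 5 translate directly into code. This is a minimal sketch; the function names and the parameter values in the demo are illustrative choices, not the PASS estimates:

```python
import math

# Schneidewind-model prediction functions (Equations 2-4) and the
# proportional test-time allocation of Equation 5.

def cumulative_failures(alpha, beta, t):
    """F(t): expected cumulative failures by time t (Equation 2)."""
    return (alpha / beta) * (1.0 - math.exp(-beta * t))

def interval_failures(alpha, beta, t1, t2):
    """F(t1,t2): expected failures during t1,t2 (Equation 3)."""
    return (alpha / beta) * (math.exp(-beta * t1) - math.exp(-beta * t2))

def remaining_failures(alpha, beta, t):
    """R(t): expected failures remaining at time t (Equation 4)."""
    return (alpha / beta) * math.exp(-beta * t)

def allocate_test_time(params, t1, t2, total_time):
    """Split total_time across modules in proportion to each
    module's predicted failures F_i(t1,t2) (Equation 5).
    params is a list of (alpha, beta) pairs, one per module."""
    predicted = [interval_failures(a, b, t1, t2) for a, b in params]
    total = sum(predicted)
    return [total_time * f / total for f in predicted]

# Hypothetical parameters for two modules; allocate 100 time periods
# over the prediction interval 20,30:
allocation = allocate_test_time([(1.8, 0.14), (1.3, 0.10)], 20, 30, 100.0)
```

By construction, F(t1,t2) = F(t2) − F(t1), and the allocations in Equation 5 always sum to the total available test time; both properties are easy to check against the implementation.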
Figure 2. Reliability-prediction time scale. We predict F(t1,t2), at t1 during the time interval t1,t2, based on the model and X(0,t1). On the time scale, t_m is the total available test time for a single module.
[Table 1 (the total failures observed for modules 1, 2, and 3 during 0,20 and the estimated parameters α and β) and Table 2 (the predictions for the interval 20,30, including F(20,30), R(20), the allocated test time T in periods, and predicted versus actual failures) are not fully recoverable from this copy. Among the surviving values: α = 1.7642 and β = .1411 for one module, and predicted failure totals of 12.95, 12.50, and 14.4 against actuals of 13 for modules 1 and 2.]
t2 = {ln[(α/β)/R(t2)]}/β    (6)

R(t2) = p(α/β)    (7)

where p is the desired fraction of remaining failures at t2. Substituting Equation 7 in Equation 6 gives

t2 = [ln(1/p)]/β    (8)
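Equation 8 answers a practical planning question: how long must a module be tested before the expected remaining failures fall to a chosen fraction p of the total? A small sketch, using the surviving β value from Table 1 purely as an illustration:

```python
import math

# Test time needed to reduce expected remaining failures to a
# fraction p of the total expected failures (Equation 8). Note
# that alpha cancels out of Equations 6 and 7, so only beta and
# p are needed.

def time_for_remaining_fraction(beta, p):
    """t2 = ln(1/p) / beta: the test-execution time at which
    R(t2) = p * (alpha/beta)."""
    return math.log(1.0 / p) / beta

# With beta = 0.1411, time to halve the remaining failures (p = 0.5):
t_half = time_for_remaining_fraction(0.1411, 0.5)
```

Because the relationship is logarithmic, each further halving of p costs the same additional test time, which is why driving the remaining-failure fraction toward zero becomes progressively expensive.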
We applied the Statistical Modeling and Estimation of Reliability Functions for Software measurement program⁶ (available free from William Farr, Naval Surface Warfare Center, Code K52, Dahlgren, VA 22448) to the observed failure data to obtain estimates of α and β. Table 1 shows the total number of failures observed during 0,20 and the estimated parameters.

We obtained the predictions for the interval 20,30 in Table 2 with the equations just described. The prediction of F(20,30) led to the prediction of T, the allocated number of test-execution time periods. The number of additional failures that were later observed as testing continued is shown for comparison.
♦ The actual amount of test time required, starting at 0, for the last failure to occur: 64 for module 1 and 44 for module 2. This quantity comes from the observed failure history.

[Figure: predicted test time t2 versus remaining-failure fraction p (from .002 to .01) for modules 1 and 2; only the key and axis labels are recoverable in this copy.]
ACKNOWLEDGMENTS
We thank E. Gomez, D.O. Hamilton, and J.K. Orr of the IBM Federal Services Co., Houston, Texas, for their roles in the space-shuttle reliability study. This research was supported by William Farr of the Naval Surface Warfare Center and Raymond Paul of the Army Operational Test and Evaluation Command.
REFERENCES
1. "Applications I" and "Applications II," Proc. Int'l Symp. Software Reliability Eng., IEEE CS Press, Los Alamitos, Calif., 1991, pp. 144-167, 172-191.
2. T. Keller et al., "Practical Applications of Software Reliability Models," Proc. Int'l Symp. Software Reliability Eng., IEEE CS Press, Los Alamitos, Calif., 1991, pp. 76-78.
3. IEEE Standard Glossary of Software Engineering Terminology, ANSI/IEEE Std 729, IEEE, New York, 1983.
4. N. Schneidewind, "Analysis of Error Processes in Computer Software," Proc. Int'l Conf. Reliable Software, IEEE CS Press, Los Alamitos, Calif., 1975, pp. 337-346.
5. W.H. Farr, "A Survey of Software Reliability Modeling and Estimation," Tech. Report NSWC TR 82-171, Naval Surface Weapons Center, Dahlgren, Va., 1983.
6. W.H. Farr and O.D. Smith, "Statistical Modeling and Estimation of Reliability Functions for Software: User's Guide," Tech. Report NAVSWC TR-84-373, Rev. 2, Naval Surface Weapons Center, Dahlgren, Va., Mar. 1991.
7. J.D. Musa and A.F. Ackerman, "Quantifying Software Validation," IEEE Software, May 1989, pp. 19-27.
Ted W. Keller is a senior technical staff member at IBM Federal Services Co., where he manages the organization responsible for integrating schedules and commitments, configuration control, control boards, quality and reliability metrics, and certification of flight readiness for the space shuttle's Primary Avionics Software System.

Keller received a BS in aerospace engineering from the University of Tennessee. He is co-vice chair of the AIAA Space-Based Observation Systems Committee on Standards, Software Reliability Working Group, and a member of the IEEE Computer Society.

Address questions about this article to Schneidewind at the Naval Postgraduate School, Code AS/SS, Monterey, CA 93943; Internet n.schneidewind@compmail.com.