Applying Reliability Models to the Space Shuttle

Norman F. Schneidewind, Naval Postgraduate School
Ted W. Keller, IBM Federal Services Co.

Real project experience shows that reliability models can predict reliability and help develop test strategies. This case study reports on IBM's approach to the space shuttle's on-board software.
Reliability models are a powerful tool for predicting, controlling, and assessing software reliability. In combination, these functions let an organization determine if the reliability goals it sets for its software have been met.
At a recent conference, many practitioners reported on the increasing, and increasingly successful, use of reliability measurement.¹ One of the most important examples was its use on the US space shuttle.² The space-shuttle case study is an excellent example of how a real project team can evaluate candidate reliability models and select the models that best match the software's failure history.

In this article, we share the experience of a team at IBM Federal Services Company in Houston, which evaluated many reliability models, tried to validate them for use on this project, and selected the Schneidewind model to predict the reliability of the shuttle's on-board software.
0740-7459/92/0700/0028/$03.00 © 1992 IEEE, JULY 1992
Satisfying assumptions. To ensure that statistical modeling successfully predicts reliability, you must thoroughly understand precisely how the predictions are to be interpreted and applied, and by whom. Business and military decisions could vary significantly in response to perceptions of reliability, based on interpretations of the predictions and their credibility.

To validate a model's appropriateness for an application, you must address each assumption the model makes. For example, the Schneidewind model⁴ assumes that a system is modified only in immediate response to an observed failure. It assumes that the process used to correct the code is constant, implying that for each error corrected there is an inherent, fixed probability of introducing additional errors. It also assumes that all code in a program is homogeneous from the standpoint of execution history.

For many systems that are sequentially upgraded, these assumptions appear at first to represent significant incompatibilities. However, as this case study illustrates, these restrictions can be accommodated by carefully analyzing the elements of a complete software system and its associated processes. If you cannot directly apply your failure data to a reliability model, consider breaking your systems and processes into smaller elements that can more accurately satisfy assumptions and constraints. If you think of each software version as a combination of code subsets that have a known failure history and homogeneous execution history, you can more readily accommodate a model's assumptions.

The IBM team used this approach to deal with the Schneidewind model's assumptions. The shuttle's Primary Avionics Software Subsystem is modified frequently, using a constantly improving process. Successive versions have been released to NASA since 1980, each an upgrade of the preceding version.

Figure 1 shows one way to depict a sequentially upgraded system. Code developed for a system's first release is the original version, labeled version A in Figure 1. Because all the code in version A was released for the first time, the subset of the total system that was new was in fact the entire version. This subset is labeled new-code subset a in Figure 1; all of subset a begins operation when version A is released.

When the system is updated and rereleased as version B, only the lines of code that were modified or added are included in new-code subset b. The remainder of subset a is carried over from version A and has been operating since version A was released. Subset b, on the other hand, begins operation when version B is released. All of version B is carried over to version C, unless it is modified, in which case it becomes part of new-code subset c.

Figure 1. Depiction of a sequentially upgraded system. The first release is the original version, labeled version A. All the code in version A is new, so subset a is the entire system. The white arrows show when each new-code subset begins operation. When the system is updated and rereleased as version B, only the lines of code that were modified or added are included in new-code subset b. All of version B is carried over to version C, unless it is modified, in which case it becomes part of new-code subset c. This process applies to each successive system release.
MODEL APPLICATION
The team uses each new-code subset's failure and operational execution-time histories to generate a reliability prediction for the new code in each version. This approach places every line of code in PASS into one new-code subset or another, depending on the version for which it was written. PASS is represented as a composite system of separate new-code components, each with an execution history and a reliability, connected in series.

By comparing calculated and actual failure histories, the IBM team evaluated several ways to represent a composite system mathematically. In the end, they judged a standard statistical expression to best fit the actual failure data. This expression describes the probability of an event based on a serial relationship of multiple elements that each have different probabilities.

In other words, it represents the failure prediction for the overall system as the reciprocal of the sum of the reciprocals of the failure predictions of each individual element. The composite T_PASS measure, the estimated average execution time until the next failure, is computed by

T_PASS = 1/(1/T_a + 1/T_b + 1/T_c + 1/T_d + ...)    (1)

where T_a, T_b, T_c, T_d, and so on are estimates of the time until the next failure for each new-code subset, determined by applying the model to each subset individually.

Because you must assign code to a new-code subset only when it fails, you need not perform the unreasonable task of breaking down the entire system, line by line, into subsets. You must know which subset a line belongs to only when it is defective. This is what makes this approach so feasible.
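Equation 1 can be sketched in a few lines of code. This is a minimal illustration, not the team's tooling, and the per-subset estimates below are invented values rather than the shuttle data:

```python
# Composite time-to-next-failure for new-code subsets connected in
# series (Equation 1). Each subset estimate T_a, T_b, ... comes from
# applying the reliability model to that subset individually.

def composite_ttf(subset_estimates):
    """Return 1 / (1/T_a + 1/T_b + ...), the composite estimated
    average execution time until the next system failure."""
    return 1.0 / sum(1.0 / t for t in subset_estimates)

# Hypothetical per-subset estimates, in days of execution time:
t_pass = composite_ttf([400.0, 250.0, 300.0])
```

Because the subsets are in series, the composite estimate is always smaller than the smallest per-subset estimate, which matches the intuition that adding failure-prone new code can only shorten the expected time to the next system failure.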
Estimating execution time. The team estimates the execution time of PASS segments by analyzing test-case records of digital flight simulations as well as records of actual shuttle operations. They count test-case executions as operational executions only if the simulation fidelity matches actual operational conditions very closely. They never count prerelease test execution time for the new code actually being tested as operational execution time.

You can eliminate the tedious manual activity of estimating execution time, and increase the accuracy of your estimates, if your environment can be designed to record execution time automatically.
Validation. The IBM team selected several models for evaluation, according to how compatible their assumptions were with PASS. To validate each model, they compared its reliability predictions to PASS's actual failure history.

Significant operational failures are virtually nonexistent in PASS, a certified man-rated system. To have a larger statistical sample for validation, the team included failures of all magnitudes. By considering every fault detected in any operational-like execution, whether the user was aware of the fault or not, they identified about 100 failures from 1980 to 1990.

The unit for the failure data was days. Depending on the granularity and accuracy of your historical data and your system's failure intensity, you may find hours or even seconds to be more appropriate units.
In this case, the team used the failure data for each of six dates between 1986 and 1989 to obtain six PASS reliability predictions using the Schneidewind model. For each of the six predictions, they computed the predicted mean time between failures by assuming that the next failure did in fact occur on the predicted date. They then compared each prediction to the actual mean time between failures as of that date.

The Schneidewind model appears to provide the most accurate fit to the 12 years of failure data from this project. For all six dates, the Schneidewind model's reliability predictions were about 15 percent less than the actual average time between failures. On the basis of the accuracy and consistency of these predictions relative to other models, the IBM team selected this statistical method to model PASS reliability.
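The comparison the team describes, a predicted mean time between failures checked against the observed average, can be sketched as follows. The failure history and the prediction here are invented for illustration; they are not the PASS data:

```python
# Compare a model's predicted mean time between failures (MTBF)
# with the actual MTBF observed up to a given date. All numbers
# below are hypothetical.

def actual_mtbf(failure_days):
    """Actual mean time between failures, in days, from a sorted
    list of cumulative failure times."""
    intervals = [b - a for a, b in zip(failure_days, failure_days[1:])]
    return sum(intervals) / len(intervals)

def prediction_error(predicted_mtbf, failure_days):
    """Relative error of the prediction against the observed MTBF.
    A negative value means the prediction is conservative (low)."""
    actual = actual_mtbf(failure_days)
    return (predicted_mtbf - actual) / actual

# Hypothetical cumulative failure days, and a prediction that is
# about 15 percent conservative, as the team observed for PASS:
history = [0, 30, 55, 95, 120, 160]
err = prediction_error(27.2, history)
```

A consistently negative error of this kind is what let the team treat the model's output as a credible lower bound rather than a point estimate.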
Credibility. The two most critical factors in establishing a reliability model's credibility are how it is validated and how its predictions are interpreted. For example, you can interpret a conservative prediction as providing a margin of confidence in the system's reliability, if the predicted reliability already exceeds an established acceptable level.

You may not be able to validate that you can predict reliability precisely, but you can demonstrate that you can, with high confidence, predict a lower bound on reliability within a specified environment. If you can use historical failure data at a series of previous dates (and you have the actual data for the failure history following those dates), you should be able to compare the predictions to the actual reliability and evaluate the model's performance.

You should take all these factors into consideration when establishing validation criteria. This will also significantly enhance the credibility of your predictions among those who must make decisions on the basis of your results.

Analysis. After much study and analysis, the IBM team concluded that the Schneidewind model's conservative performance was due to:

♦ Conservative execution-time estimates for each version. We plan to improve the accuracy of execution-time estimates and use the model's weighting capabilities to more heavily weight the most recent failure history in the predictions.

♦ Slight process improvements implemented during the development cycle of some versions (violating the model's assumption of a constant process).

♦ Detection and removal of latent faults before they became failures in execution (violating the implied assumption that the software is corrected only when a failure is encountered). The model's prediction, which is based on an assumed fault density remaining in the software until the next failure occurs, will underpredict the time to next failure if the fault density decreases between failures.

The IBM team is applying the model to predict a conservative lower bound for PASS reliability. They are also performing independent statistical analyses using the same failure data to compute 95 percent upper and lower confidence intervals. The analyses, which use various classical statistical formulas, have further confirmed both the reasonableness and the relative conservatism of the model's results. You can find formulas appropriate for your application in any standard statistics textbook.

RELIABILITY AND TESTING

If you don't have a testing strategy, test costs are likely to get out of control. Without a strategy, each module you test may get an equal amount of resources. You must treat modules unequally! Allocate more test time, effort, and funds to modules with the highest predicted number of failures.

You can use a reliability model to predict failures, F(t1,t2), during the interval t1,t2, where t could be execution time or tester labor time for a single module (in this case, t means execution time). We predict failures at t1 for a continuous interval that has endpoints at t1+1 and t2. We recommend that you allocate test time to your modules in proportion to F(t1,t2).

You update the model's parameters and predictions according to the actual number of failures, X(0,t1), during the time interval 0,t1. As Figure 2 shows, we predict F(t1,t2) at t1 during t1,t2, on the basis of the model and X(0,t1). In Figure 2, t_m is the total available test time for a single module, but you could make t2 equal to t_m (that is, you could predict to the end of the test period).

Using these updated predictions, you may reallocate test resources. Of course, it can be disruptive to reallocate too frequently. Instead, you could predict and reallocate at major milestones like the formal review of test results.

Equations. Using the Schneidewind model and PASS as an example, here are the minimum equations necessary to illustrate how to use prediction to allocate test resources. The mathematical details of this model have been published elsewhere.⁴,⁵ Although the details of the calculations are omitted here, the following equations illustrate the approach.
F(t) = (α/β)[1 − exp(−βt)]    (2)

F(t1,t2) = (α/β)[exp(−βt1) − exp(−βt2)]    (3)

R(t) = (α/β) exp(−βt)    (4)

where α and β are the model parameters estimated from the observed failure data (α is the initial failure rate and β is the rate at which the failure rate decays). So, given n modules, you should allocate test-execution time, T_i, for each module i according to

T_i = [F_i(t1,t2) / Σ(j=1..n) F_j(t1,t2)] × T    (5)

where T is the total available test-execution time.
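Equations 2 through 5 translate directly into code. This is a minimal sketch; the function names and the parameter values in the demo are illustrative choices, not the PASS estimates:

```python
import math

# Schneidewind-model prediction functions (Equations 2-4) and the
# proportional test-time allocation of Equation 5.

def cumulative_failures(alpha, beta, t):
    """F(t): expected cumulative failures by time t (Equation 2)."""
    return (alpha / beta) * (1.0 - math.exp(-beta * t))

def interval_failures(alpha, beta, t1, t2):
    """F(t1,t2): expected failures during t1,t2 (Equation 3)."""
    return (alpha / beta) * (math.exp(-beta * t1) - math.exp(-beta * t2))

def remaining_failures(alpha, beta, t):
    """R(t): expected failures remaining at time t (Equation 4)."""
    return (alpha / beta) * math.exp(-beta * t)

def allocate_test_time(params, t1, t2, total_time):
    """Split total_time across modules in proportion to each
    module's predicted failures F_i(t1,t2) (Equation 5).
    params is a list of (alpha, beta) pairs, one per module."""
    predicted = [interval_failures(a, b, t1, t2) for a, b in params]
    total = sum(predicted)
    return [total_time * f / total for f in predicted]

# Hypothetical parameters for two modules; allocate 100 time periods
# over the prediction interval 20,30:
allocation = allocate_test_time([(1.8, 0.14), (1.3, 0.10)], 20, 30, 100.0)
```

By construction, F(t1,t2) = F(t2) − F(t1), and the allocations in Equation 5 always sum to the total available test time; both properties are easy to check against the implementation.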
Figure 2. Reliability-prediction time scale. We predict F(t1,t2), at t1 during the time interval t1,t2, based on the model and X(0,t1). On the time scale, t_m is the total available test time for a single module.
[Table 1 (the total failures observed for modules 1, 2, and 3 during 0,20 and the estimated parameters α and β) and Table 2 (the predictions for the interval 20,30, including F(20,30), R(20), the allocated test time T in periods, and predicted versus actual failures) are not fully recoverable from this copy. Among the surviving values: α = 1.7642 and β = .1411 for one module, and predicted failure totals of 12.95, 12.50, and 14.4 against actuals of 13 for modules 1 and 2.]
t2 = {ln[(α/β)/R(t2)]}/β    (6)

R(t2) = p(α/β)    (7)

where p is the desired fraction of remaining failures at t2. Substituting Equation 7 in Equation 6 gives

t2 = [ln(1/p)]/β    (8)
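Equation 8 answers a practical planning question: how long must a module be tested before the expected remaining failures fall to a chosen fraction p of the total? A small sketch, using the surviving β value from Table 1 purely as an illustration:

```python
import math

# Test time needed to reduce expected remaining failures to a
# fraction p of the total expected failures (Equation 8). Note
# that alpha cancels out of Equations 6 and 7, so only beta and
# p are needed.

def time_for_remaining_fraction(beta, p):
    """t2 = ln(1/p) / beta: the test-execution time at which
    R(t2) = p * (alpha/beta)."""
    return math.log(1.0 / p) / beta

# With beta = 0.1411, time to halve the remaining failures (p = 0.5):
t_half = time_for_remaining_fraction(0.1411, 0.5)
```

Because the relationship is logarithmic, each further halving of p costs the same additional test time, which is why driving the remaining-failure fraction toward zero becomes progressively expensive.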
We applied the Statistical Modeling and Estimation of Reliability Functions for Software measurement program⁶ (available free from William Farr, Naval Surface Warfare Center, Code K52, Dahlgren, VA 22448) to the observed failure data to obtain estimates of α and β. Table 1 shows the total number of failures observed during 0,20 and the estimated parameters.

We obtained the predictions for the interval 20,30 in Table 2 with the equations just described. The prediction of F(20,30) led to the prediction of T, the allocated number of test-execution time periods. The number of additional failures that were later observed as testing continued is shown for comparison.
♦ The actual amount of test time required, starting at 0, for the last failure to occur: 64 for module 1 and 44 for module 2. This quantity comes from the observed failure history.

[Figure: predicted test time t2 versus remaining-failure fraction p (from .002 to .01) for modules 1 and 2; only the key and axis labels are recoverable in this copy.]
ACKNOWLEDGMENTS
We thank E. Gomez, D.O. Hamilton, and J.K. Orr of the IBM Federal Services Co., Houston, Texas, for their roles in the space-shuttle reliability study. This research was supported by William Farr of the Naval Surface Warfare Center and Raymond Paul of the Army Operational Test and Evaluation Command.
REFERENCES
1. "Applications I" and "Applications II," Proc. Int'l Symp. Software Reliability Eng., IEEE CS Press, Los Alamitos, Calif., 1991, pp. 144-167, 172-191.
2. T. Keller et al., "Practical Applications of Software Reliability Models," Proc. Int'l Symp. Software Reliability Eng., IEEE CS Press, Los Alamitos, Calif., 1991, pp. 76-78.
3. IEEE Standard Glossary of Software Engineering Terminology, ANSI/IEEE Std 729, IEEE, New York, 1983.
4. N. Schneidewind, "Analysis of Error Processes in Computer Software," Proc. Int'l Conf. Reliable Software, IEEE CS Press, Los Alamitos, Calif., 1975, pp. 337-346.
5. W.H. Farr, "A Survey of Software Reliability Modeling and Estimation," Tech. Report NSWC TR 82-171, Naval Surface Weapons Center, Dahlgren, Va., 1983.
6. W.H. Farr and O.D. Smith, "Statistical Modeling and Estimation of Reliability Functions for Software: User's Guide," Tech. Report NAVSWC TR-84-373, Rev. 2, Naval Surface Weapons Center, Dahlgren, Va., Mar. 1991.
7. J.D. Musa and A.F. Ackerman, "Quantifying Software Validation," IEEE Software, May 1989, pp. 19-27.
Ted W. Keller is a senior technical staff member at IBM Federal Services Co., where he manages the organization responsible for integrating schedules and commitments, configuration control, control boards, quality and reliability metrics, and certification of flight readiness for the space shuttle's Primary Avionics Software System.

Keller received a BS in aerospace engineering from the University of Tennessee. He is co-vice chair of the AIAA Space-Based Observation Systems Committee on Standards, Software Reliability Working Group, and a member of the IEEE Computer Society.

Address questions about this article to Schneidewind at the Naval Postgraduate School, Code AS/SS, Monterey, CA 93943; Internet n.schneidewind@compmail.com.