Improving System Reliability by Failure-Mode Avoidance Including Four Concept Design Strategies
Don Clausing1,* and Daniel D. Frey2
1 Massachusetts Institute of Technology (retired)
2 Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 3-449D, Cambridge, MA 02139
IMPROVING SYSTEM RELIABILITY BY FAILURE-MODE AVOIDANCE
Received 25 March 2005; Accepted 6 June 2005, after one or more revisions
Published online in Wiley InterScience (www.interscience.wiley.com).
DOI 10.1002/sys.20034
ABSTRACT
To be reliable, a system must be robust—it must avoid failure modes even in the presence
of a broad range of conditions including harsh environments, changing operational demands,
and internal deterioration. This paper discusses and codifies techniques for robust system
design that operate by expanding the range of conditions under which the system functions.
A distinction is introduced between one-sided and two-sided failure modes, and four strate-
gies are presented for creating larger windows between sets of one-sided failure modes. Each
strategy is illustrated through two examples from industrial practice. For each strategy, one
example is from paper handling and another is from jet engines. By showing that every strategy
has been successfully applied to each system, we seek to illustrate that the strategies are
widely applicable and highly effective. © 2005 Wiley Periodicals, Inc. Syst Eng 8: 245–261, 2005
246 CLAUSING AND FREY
Figure 1. Two ways of thinking about uncertainty—probabilistic and naturalistic. Medical doctors were asked questions in two
different formats. Their answers are graphed here as dots and the correct answer is annotated. These results suggest that the more
naturalistic formulation leads to far more accurate judgments by professional practitioners [adapted from Gigerenzer and Edwards,
2003].
set of conditions that a system is likely to experience in its lifecycle. Although an approximate set of conditions can be defined, it will surely miss some important combinations of conditions. Later on, these unanticipated operating conditions may arise and the system may cease to function. When this happens, it is tempting to say that, since the condition was not specified, the system did not actually fail, that the system was misused. It is essential for systems engineers to recognize that nature does not care what systems engineers think the “specified operating conditions” are. When the system fails to function under the conditions the system actually experiences, that constitutes a failure. This point is well understood by some reliability engineers. For example, Thomas, Ayers, and Pecht [2002] discuss “trouble not identified” warranty returns in the auto industry and conclude: “[I]t must not be assumed that a returned module that passes tests associated with an engineering specification is good,” p. 650. Because of uncertainty regarding specified operating conditions, we argue that an effective approach is to increase the set of conditions under which the system operates and do this as quickly and economically as one can manage within the time available. This implies that systems engineers should not spend much energy on predicting field reliability but instead use that same energy to increase field reliability [Clausing, 1994].

It seems that the creative design work that leads to reliability improvement is a very natural activity and is consistent with our “failure-mode avoidance” conception of reliability.

We propose that thinking of reliability as failure-mode avoidance can have real advantages, especially in the early stages of system design or in a long-term scenario such as technology development. In early stages of system design, probability theory may be too quantitative for the task at hand. Probability density functions imply a level of precision in modeling the scenario that is often unwarranted, especially during early development. As a project advances through its development stages the probabilistic view of reliability becomes increasingly useful. Analysis of reliability using probability theory is useful for component selection, system validation, and the management of
field-service operations. The value of the failure mode avoidance conception of reliability is greatest for technology strategy, systems architecting, concept design, and for some robust parameter design activities, all done early during the development of the system.

2. REVIEW OF RELATED WORK

This paper is intended to help engineers with the early-stage, conceptual phase of design. Therefore, an important related development is the Theory of Inventive Problem Solving (sometimes described by the acronyms TRIZ or TIPS). The theory was first described by Altschuller [1984] and was recently placed in a broader context of innovation by Clausing and Fey [2004]. The theory is based on a study of thousands of patents that revealed patterns among inventive solutions. An important underlying hypothesis is that inventive problems can be viewed as conflicts which the inventive solutions resolve. This enabled large numbers of patents to be organized in a useful taxonomy. It has also given rise to commercial software products that facilitate the use of the theory by professional practitioners. However, we note that many patents claim robustness as their primary advantage: they do not deliver new functions, but deliver existing functions over a broader range of conditions. While TRIZ is helpful in development of new functions and elimination of harmful side effects, it does not seem to support reliability innovations to the extent we desire. Therefore, this paper analyzes patents and seeks new patterns of inventive engineering work.

A development in reliability engineering closely related to this paper is the “physics-of-failure” (PoF) approach developed at the Computer Aided Life Cycle Engineering (CALCE) Electronic Products and Systems Center at the University of Maryland. The first instance in archival literature of the term “physics of failure” is Pecht et al. [1990], which emphasizes use of a physics-based model for reliability prediction and design for reliability. This approach has been extended to product development by Pecht and Dasgupta [1995] and to accelerated life testing by Kimseng et al. [1999]. This paper builds upon the conception of physics-of-failure and seeks to extend this conception to the earliest, creative phases of system design.

An important development in reliability engineering is robust parameter design pioneered by Genichi Taguchi [Taguchi, 1993]. For any design concept, there is a potentially large space of control factor settings that will nominally place the function at the desired target value. In robust parameter design, the engineer explores the design space seeking changes that will make the system more robust while still keeping the performance on target. Taguchi’s method employs orthogonal arrays to explore the design space. At the same time, outer arrays or compounded noises are used to explore the range of possible operating conditions. Signal to noise ratios are used as measures of the robustness of the system and guide the engineer to preferable levels of the control factors.

Taguchi’s philosophy of robust design is consistent with the approach to reliability engineering discussed here. Taguchi rejected the “goal post” mentality inherent in tolerance limits and specifications. His notion of a quality-loss function replaced consideration of defect rates and process yields with an emphasis on reducing variance followed by adjustment to target. Taguchi encouraged engineers to deliberately expose designs to harsh conditions in experiments. To do this requires a transformation in the culture of an engineering organization. The emphasis must shift from demonstrating adequate performance with high statistical confidence to aggressive improvement followed by adequate confirmation.

Robust parameter design is among the most important developments in systems engineering in the 20th century. These methods seem to have accounted for a significant part of the quality differential that made Japanese manufacturing so dominant during the 1970s. The methods were subsequently adopted outside of Japan. The timing of that adoption in the West corresponded closely with improvement in quality that improved competitiveness of North American and European manufacturers. Robust design methods were surely a significant part of both the rise of Japanese industry and the response to that competitive challenge. Robust design methods have continued to be refined and are still an active area of systems engineering innovation.

Another approach relevant to this paper known as “operating window methods” was developed and practiced at Xerox Corporation in the 1970s. The operating window is the set of conditions under which the system operates without failure. In operating window methods, reliability is improved by making the operating window larger. Clausing [2004] described the approach in detail in a recent issue of Technometrics, but the essence of the approach is simple enough to present here:

1. Increase the value of the noise factors so that the failure rate is high.
2. Change the value of the control factors to seek a broader operating window at a fixed failure rate.

This approach was used, for example, to improve the reliability of paper handling machines. At Xerox, paper stacks were designed and constructed to deliberately
produce a large magnitude of variation. The papers varied in their weight, surface condition, geometry, and so on. These paper stacks were similar to the worst stacks one would encounter in field use, and, in conjunction with operation near the limit of the operating window, they brought about higher failure rates than would normally be encountered, on the order of 1 in 10 rather than 1 in 10,000. These high failure rates enabled the engineers to more quickly discern how the failure rate changed with changes in the control factors such as stack forces, feed belt angles, and so on. This approach worried managers since they observed the machines jamming with high frequency, but they eventually came to understand why this was needed. As a consequence the engineers were able to quickly converge to more reliable machine configurations.

Despite the use of failure rates as a measure of performance, the operating window method is, upon closer examination, consistent with Taguchi’s quality philosophy. Because failure rates were greatly increased by applying aggressive noises, improvements could be made rapidly, even though they sacrificed the ability to accurately predict field reliability. The term “operating window” may seem to imply an emphasis on goal posts, but in fact the “customer-specified” limits are viewed as irrelevant and the expansion of actual physical limits is valued instead.

Operating window methods continue to be an active area of research in quality engineering. Joseph and Wu [2004] showed that under certain conditions a failure rate of 50% maximizes the information gained from robust design using an operating window. As an example, they carried out a case study wherein line width in a lithography process was set at a much finer pitch than actually needed in practice. The control factor settings that improved the robustness at the finer pitch also improved the robustness at the pitch needed in operation. The basic concept of operating windows was therefore further corroborated.

While retaining the benefits of Taguchi’s quality philosophy, operating window methods may have a further advantage. In operating window methods, the progress in reliability is measured in physical terms by the size of the operating window. This may be preferable to measuring results with a more abstract measure such as signal to noise ratios. For example, operating window methods encouraged engineers at Xerox to devise ways to double the range of paper weights the machine could feed rather than contemplate how to increase signal to noise ratios by 6 decibels. As previously discussed, cognitive psychology suggests there is an advantage in maintaining a connection to physical quantities rather than probabilistic measures. We propose that a mental connection to the physics and logic of the system is even more critical for early stage system design than it is for later stage parameter design.

As discussed in this section, the basic concept of operating windows is to seek a larger set of conditions under which the system functions. While the idea is very simple, implementation is challenging, requiring deep knowledge of the system and the creativity to develop the needed design innovations. This paper seeks to help engineers implement early stage robustness work via operating window methods. The next section covers some theoretical developments. The subsequent sections present specific strategies for implementation.

3. OPERATING WINDOWS AND FAILURE MODES FORMALIZED

This section develops a formal treatment of operating windows and failure modes. The details developed here are not regarded by the authors as necessary for implementing the four strategies presented in this paper. The formal framework may, however, justify the approach and will be helpful to those who seek a deeper understanding of the strategies. However, those readers who are primarily interested in the operational aspects could skip to Section 4.

To formalize the idea of operating windows, it is helpful to define failure modes mathematically. A failure-mode criterion is an inequality that applies to a functional response of a system Yi(X, Z) > Li or Yi(X, Z) < Ui. The criteria are defined such that, if the criteria are satisfied, the failure will not occur. The inputs X and Z are vectors of physical variables in the engineering system. The physical variables are sorted into two types, not necessarily disjoint: noise factors Z and control factors X. The control factors are variables the designer may change during the parameter-design phase of systems engineering. The noise factors are physical variables that vary in the environment, manufacture, or lifecycle of the system. Yi is a functional response of the system and the mapping Yi(X, Z) describes the physical or logical process by which the system responds to the control and noise factors. Li and Ui are lower and upper limits on a response, defined so that crossing either limit constitutes a system failure.

To illustrate these ideas, consider a jet engine. A functional response of an engine is the thrust it develops. If thrust were to fall below some prescribed limit, we could define that condition as a failure. The thrust is affected by control factors such as the chord of the fan blades. The thrust is also affected by noise factors such as the inlet temperature and angle of attack of the free stream into the engine inlet. A reliable engine is
designed so that the thrust is within acceptable limits over a wide range of the noise factors.

To make these ideas operational, we have found it necessary to introduce a distinction between two types of failure modes: one-sided and two-sided failure modes [Clausing and Frey, 2004]. A one-sided failure mode is a functional response and the associated physical process Yi(X, Z) with either a lower or upper limit but not both. A common one-sided failure mode is plastic deformation of a material. When plastic deformation is unacceptable or reaches a prescribed limit, the designer will define that as a failure. Plastic deformation often occurs when a level of stress is exceeded, so the failure criterion would naturally fit the form Yi(X, Z) < Ui where Yi denotes stress in physical units such as pounds per square inch. If there is no parallel failure mode for low values of stress, then it is most natural to think of plastic deformation as a one-sided failure mode.

A two-sided failure mode is a functional response and the associated physical process Yi(X, Z) with both a lower and an upper limit. Two-sided failure modes are frequently found in measurement or metering functions within a system. If a measuring system is inaccurate, the designer will regard it as a failure when the readings are too high or too low compared to the true quantity, so the failure criterion would naturally fit the form Li < Yi(X, Z) < Ui where Yi denotes, for example, measurement error in physical units such as volts.

Note that, given the definitions here, a two-sided failure mode is driven by the same physical process description Yi at both the high and low failure-mode boundaries. Thus, a single noise factor like ambient temperature can be limited from above and below by a single physical phenomenon. For example, a fluid metering system may operate in a limited temperature range due to the fact that the fluid viscosity is a function of temperature. This single physical phenomenon of temperature dependence of viscosity may make too much fluid flow at high temperatures and too little fluid flow at low temperatures. In this paper, a two-sided failure mode is necessarily governed by a single set of failure-mode physics.

In the presence of a two-sided failure mode, robust parameter design is critical. Figure 2 depicts a two-sided failure mode applied to a response. The operating conditions give rise to a variation in the functional response; therefore, the response has a probability distribution p(Yi). In the scenario on the left side of Figure 2, the variability is so wide that it cannot be accommodated within the limits between the failure mode boundaries. If robust parameter design were applied, the sensitivity of the response would be reduced, resulting in a tighter distribution of the response enabling both sides of the failure mode to be avoided. Thus, robust parameter design is essential in the presence of two-sided failure modes and, indeed, much of the research in robust design is oriented toward scenarios with two-sided failure modes. This paper by contrast concerns itself primarily with one-sided failure modes, which seem to admit a wider range of robust design approaches.

Figure 2. Robust parameter design accomplishes failure-mode avoidance in the presence of two-sided failure modes.

It is common for a single noise factor to be limited from above and below by two different physical failure modes. Here, this situation is characterized as an operating window between two one-sided failure modes rather than a two-sided failure mode. To illustrate the difference, consider fluid metering again. It is possible that an upper limit on the noise factor of temperature is set by the physical process of boiling while the lower limit on temperature is set by the previously discussed increase in viscosity with reduced temperature. It therefore seems more natural to consider two failure modes governed by two different functional responses, Yi(X, Z) < Ui and Yi+1(X, Z) > Li+1. The difference here is reflected in the fact that the two responses have different indices. In theory this seems minor, but in practice we regard this as highly significant. Robust parameter design might still be applied with success, but it seems that other approaches will also be applicable. All of the
four strategies presented here illustrate specific alternatives for such cases.

Now that we have defined failure-mode criteria of various types, we may define the operating window formally. The operating window is the set of noise conditions Z that satisfy the full set of failure-mode criteria:

\[
\left\{ Z \;\middle|\;
\begin{array}{ll}
Y_i(X, Z) \ge L_i & \text{for all } i \text{ with lower-limited one-sided failure modes} \\
Y_i(X, Z) \le U_i & \text{for all } i \text{ with upper-limited one-sided failure modes} \\
L_i \le Y_i(X, Z) \le U_i & \text{for all } i \text{ with two-sided failure modes}
\end{array}
\right\}
\]

To simplify the notation, we will compress this down to {Z|Yi(X, Z) ∈ Wi for all i} where Wi therefore defines a window, which may be one-sided or two-sided.

A goal of systems engineering is to make the system robust by adding more points to this set. Given this concept of failure-mode criteria and operating windows, it is possible to identify design changes that improve reliability without any recourse to probability. The development principle is to add points to the operating window as rapidly as possible. The theorem below formalizes this concept for parameter design.

Operating Windows and Parameter Design -- If the design parameters of a system are changed from X to X′ and the new operating window holds the old operating window as a subset, {Z|Yi(X, Z) ∈ Wi for all i} ⊆ {Z|Yi(X′, Z) ∈ Wi for all i}, then reliability has improved.

This theorem is mathematically straightforward. Reliability is traditionally defined as the probability that the system does not fail, where probability and probability density are defined so that integration over a set gives a probability. Since probability density cannot be negative, integrating over a set must give a larger or equal probability than integrating over any of its subsets. Although mathematically basic, the theorem may be important in system design due to its practical implications. The probability density function of the noise factors is generally known only very approximately. If changes can be identified that meet the conditions of the theorem above, then reliability can be improved in spite of our ignorance about the probability density function of the noise factors.

A graphical illustration of this theorem is Figure 3 in which one axis represents a single noise factor and another axis represents a single control factor. Two different functional responses define constraints within the space defined. At the initial setting of the control factor X1, there is an operating window. A change is made in the control factor setting making it X1′. Since the new range of the noise factor Z1 completely contains the old range of Z1, the operating window has been increased and reliability has been improved.

Figure 3. Robust parameter design can accomplish failure-mode avoidance in the presence of multiple one-sided failure modes.

It is instructive to consider coupling among failure modes. In pursuing robustness to the ith failure mode, the designer may consider changing the value of control factor Xk. If a change in Xk that affects the set satisfying the ith failure-mode criterion also changes the set satisfying the jth failure-mode criterion, then we say that failure-mode criteria i and j are coupled by control factor Xk. This definition of coupling is consistent with the definition of coupling among equations in mathematics [Borowski and Borwein, 1991]. The definition
is also similar to the definition of coupling in Axiomatic Design [Suh, 1990] except that coupling occurs among failure modes rather than functional requirements. It should be evident that the two failure modes in Figure 3 are coupled by control factor X1. In this instance, however, the coupling is not such that it negatively affects the robust parameter design process. The theorem’s conditions are satisfied and reliability improvements may proceed despite the coupling. However, it should also be clear that when failure modes are not coupled, robust parameter design may be simpler to accomplish.

In the absence of coupling, any control factor Xi affects at most one failure-mode criterion. Once the direction of the dependence is determined, the operating window can be increased by sequentially maximizing or minimizing the size of the set as a function of that single control factor. This is frequently accomplished by driving the value of the control factor to its technical or architectural limits. An example of this is found in paper-feeding machinery. A higher friction coefficient of the feed rolls helps to prevent misfeeds and does not particularly encourage multifeeds. For this reason, developers of paper handlers worked to increase the friction coefficient of feed rolls as far as technically feasible. Even though these technical developments improved the system, the reliability was still not sufficient and further improvements had to be sought. Because of this phenomenon, in any system that is fairly mature, it is common for the parameters that do not couple multiple failure modes to be set near their physical or architectural limits. Since consideration of uncoupled parameters is straightforward, much of the attention in systems engineering is directed to dealing with parameters that are coupled to multiple failure modes.

It is often necessary to consider the operating window with respect to two or more noise factors simultaneously. This requires a representation of multidimensional failure-mode boundaries. Figure 4 is a graphical depiction of a two-dimensional operating window formed between three one-sided failure-mode criteria. A key distinction between Figure 3 and Figure 4 is that in Figure 4 two noise factors are represented rather than one. In addition, no control factors are represented using an axis. Instead Figure 4 represents the operating window at two distinct design configurations X and X′. The shape and size of the window can vary with the design parameters. A useful goal, as before, is to add points to the operating window without removing any points. This condition holds in Figure 4, so the change in design will improve the system’s reliability.

The theorem discussed previously applies to parameter design, but the idea depicted in Figure 4 can be readily extended to conceptual design in which not only the control factors are changed, but the functional response of the system is modified as well. All that is required is the idea that the functional response itself can be varied as well as the control factors.

Operating Windows and Conceptual Design -- If the conceptual design of a system is changed, including a change in functional responses Yi to Yi′ and the corresponding design parameter changes from X to X′, and the new operating window holds the old operating window as a subset, {Z|Yi(X, Z) ∈ Wi for all i} ⊆ {Z|Yi′(X′, Z) ∈ Wi for all i}, then reliability has improved.

At the earliest stages of system design when our latitude to make changes is greatest, it is these types of conceptual changes that are most critical to find and implement. Although robust parameter design has been a valued development in systems engineering, large changes in system reliability observed over time cannot be explained by parameter design alone. As an example, vehicles sold today are far more reliable than those that
were sold 30 years ago. The majority of that reliability improvement is due to the scores of system design and technological changes made over these decades. Electronic spark timing replaced the distributor. Fuel injection systems replaced carburetion. In addition there were many other less widely known innovations that created large improvements in reliability. We propose that at an early stage of system design, many design opportunities exist that meet the criteria of the theorems presented here. Much of early stage, conceptual reliability engineering can therefore be undertaken without any probabilistic modeling, freeing up engineers for deep thought about patterns of innovation in reliability engineering. This is a principal message of this paper, and it will be emphasized by presenting specific strategies for carrying out this suggestion.

4. FOUR STRATEGIES FOR IMPROVED ROBUSTNESS

Up to this point, this paper has focused on the interrelated concepts of reliability, robustness, and one-sided failure modes. From this point forward, the paper concentrates on strategies to avoid one-sided failure modes. All of these strategies involve concept design rather than parameter design. The design changes considered here are not only changes in the values of design parameters but also additions of new features or components, changes in the configuration of the system, or even new inventions. We present four strategies along these lines:

1. Relax a constraint limit on an uncoupled control factor.
2. Use physics of incipient failure to avoid failure.
3. Create two distinct operating modes for two different demand conditions.
4. Exploit interdependence between two operating-window system variables.

defined in Section 3. Such control factors should be maximized or minimized to create the greatest possible distance from the affected one-sided failure mode consistent with any constraints on the control factor. As the system is placed under greater demands over time due to system evolution and competition, the operating window afforded under the current system constraints may become insufficient. Under these circumstances, the constraint can often be relaxed by making changes in the system architecture or by changes in technology. The relaxed constraint enables further changes to the uncoupled control factor, which opens the operating window.

Primary Case Study: Paper Feeder. As an industrial example, we present the Xerox paper feeder that first went into production in 1981, and has appeared in many different Xerox copiers and printers. This paper feeder is known as a friction-retard feeder (Fig. 5).

The feedbelt rests on the paper stack, and drags the top sheet forward. The friction of the retard roll holds back (retards) the second sheet if it tries to come through. Thus, the retard roll prevents multifeeds (feeding of more than one sheet). Therefore, the wrap angle between the feedbelt and the retard roll only affects the failure mode of multifeeds. The other primary failure mode is misfeeds (no sheet is fed). This failure mode is not affected by the wrap angle between the feedbelt and the retard roll. Because multifeeds are reduced by a large wrap angle and misfeeds are unaffected, it is clear that the wrap angle should be as big as possible.

Despite the desirability of having a large wrap angle, the previous-generation feeder (ca. 1975) had a wrap angle of only 13°, which was constrained by the system architecture. In the new design that first went into
Figure 6. The architecture on the left has a nearly linear paper path, U.S. Patent # 3,390,725 [Jones and Van Deluyster, 1976]. A
newer architecture on the right has a looping paper path, which enabled a larger wrap angle, U.S. Patent # 4,475,732 [Clausing
et al., 1984].
production in 1981 the wrap angle was increased to 45°. This large improvement in wrap angle was enabled by a change in the total system architecture. In large copiers and printers the next subsystem after the paper feeder is the registration subsystem, which aligns the sheet with the image. In the new design the architecture was changed so that the paper came out of the feeder and turned down to reach the registration subsystem (Fig. 6), which was underneath the feeder. This enabled the wrap angle to be greatly increased. This architecture also reduced the width of the copier/printer, which is desirable. This paper feeder with the large wrap angle has been very successful in many generations of Xerox copiers and printers.

Supplementary Case Study—Jet Engines. A similar approach was used to improve the reliability of axial-flow fans in jet engines. A fan is a component of modern high by-pass commercial jet engines that provides a significant increase in the total mass flow, and therefore improvement in propulsive efficiency. A critical failure mode of such fans is flutter vibration due to the length of the blades and their exposure to inlet flow distortions. It had long been known that increasing the chord of a fan blade stiffened the blade and thereby reduced the incidence of the failure mode of flutter, but the chord of the blade was limited by constraints on weight [Koff, 2004]. Eventually, new technologies for manufacturing hollow blades enabled engine manufacturers to increase chords significantly without added weight. For example both Patent #4,345,877 [Monroe, 1980] and Patent #4,720,244 [Kluppel and Monroe, 1987] contributed to these advances. Wide-chord fans provided much greater resistance to flutter and have thereby greatly improved engine reliability. As in the case of wrap angles in paper feeders, innovation enabled a critical parameter to be pushed past its previous constraints to move a one-sided failure-mode boundary and increase the operating window.

Summary of the Strategy. When a system variable only affects one of the one-sided failure modes, take its value to its constraint limit. If the operating window is still not large enough, seek new architectures or technologies that relax the constraint.

4.2. Use Physics of Incipient Failure To Avoid Failure

In some systems the physics of the incipient failure can be used to prevent or delay the failure mode. All one-sided failure modes are associated with underlying physical phenomena. In many cases the failure mode exhibits distinct physical mechanisms that become active as the onset of the failure mode is approached. In some systems there exists an opportunity to exploit the physics of incipient failure to increase the size of the operating window.

Primary Case Study—Jet Engines. An example is afforded by the use of shaped grooves in compressor casings in modern jet engines. An axial flow compressor is comprised of multiple alternating stages of rotor assemblies and stators. To limit engine complexity and weight, a large pressure rise per stage is desired so that the desired pressure rise in the compressor can be accomplished with a small number of stages. However, the pressure increase of each stage is limited by a failure mode of aerodynamic stall and surge. A stall involves separation of airflow from a blade, which at any given time may affect only one stage or even a group of stages.
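The trade-off between pressure rise per stage and stage count, noted in the primary case study on compressors, can be illustrated with a rough sketch. It assumes that every stage delivers the same pressure ratio and that stage ratios simply multiply; both are simplifications introduced here, and the function name and the numerical values are hypothetical rather than taken from any engine.

```python
def stages_needed(overall_pressure_ratio: float, stage_pressure_ratio: float) -> int:
    """Count identical stages needed to reach an overall pressure ratio.

    Assumption (not from the paper): all stages have the same pressure
    ratio and the overall ratio is the product of the stage ratios.
    """
    if stage_pressure_ratio <= 1.0:
        raise ValueError("each stage must raise the pressure (ratio > 1)")
    count = 0
    achieved = 1.0
    # Add stages one at a time until the target ratio is met.
    while achieved < overall_pressure_ratio:
        achieved *= stage_pressure_ratio
        count += 1
    return count


# Hypothetical numbers: a higher per-stage ratio needs fewer stages,
# but pushes each stage closer to the stall and surge boundary.
modest = stages_needed(30.0, 1.3)
aggressive = stages_needed(30.0, 1.5)
print(modest, aggressive)
```

Under these assumed numbers the more aggressive stage loading saves several stages, which is why a technology that widens the distance to the stall and surge failure mode is valuable.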
Figure 7. The arrangements of slots in an axial flow compressor. Adapted from U.S. Patent #4,086,022 [Freeman and Moritz,
1978].
A compressor surge generally refers to a complete flow breakdown throughout the compressor. The value of airflow and pressure ratio at which a surge occurs is termed the “surge point” and “surge margin” is a term for the difference between the airflow and compression ratio at which it will normally be operated and the airflow and compression ratio at which a surge will occur. Thus, we can readily interpret surge margin as the distance from the one-sided failure mode of compressor surge.

In the late 1970s new technologies known as “casing treatments” were developed. In one casing treatment technology assigned to Rolls Royce, Patent #4,086,022 [Freeman and Moritz, 1978], a series of angled channels are placed in the casing of the compressor extending from the leading edge of the rotors and extending just aft of the trailing edge (see Fig. 7). If a surge begins to occur, then “a rotating annulus of pressurized gas will begin to build up about the tips of the blades”. Because of the geometry of the slots, “the annulus of air will be directed into the slots … thus reducing or eliminating the surge” [Freeman and Moritz, 1978, p. 5].

To understand how the casing treatments are related to the operating window, it is useful to consider Figure 8 adapted from Cumpsty [1997]. The abscissa in the figure is mass flow of air into the engine. The mass flow
Figure 8. The effect of casing treatment on surge of jet engine compressors [adapted from Cumpsty, 1997].
in an engine may vary due to changes in inlet conditions caused by atmospheric conditions or aircraft maneuvers; therefore, mass flow is a noise factor as defined in Section 3. The ordinate in Figure 8 is pressure rise across a stage of the compressor. When conditions are at their nominal state, the engine will generally remain on the operating line with mass flow and pressure rise both changing as a function of the throttle position set by the pilot. At a fixed throttle position, when mass flow is reduced due to maneuvers or environmental conditions, the state of the engine moves toward the surge line as indicated in step 1 of Figure 8. This pushes the engine off the operating line and toward the failure-mode boundary. The amount of mass-flow drop that can be tolerated before failure (step 3a or step 3b) is sometimes called the “surge margin” which we interpret as an indication of the operating window size. The technology described in Patent #4,086,022 can be viewed as a means to exploit the incipient failure-mode physics (the rotating annulus of air—step 2) to increase the surge margin. The treatments are designed so that the incipient physics will lead to a pressure relief across the stage (step 3b). The advanced casing treatment “increased fan stall margin by a staggering 20% under distorted inlet flow and with little loss in efficiency.” [Koff, 2004, p. 582].

Supplementary Case Study—Paper Feeder. A similar approach was used to improve the reliability of paper feeders. For friction-retard paper feeders, the stack force between the feedbelt and the paper stack is a critical system variable. If it is too large the multifeed rate will be excessive. If the stack force is too small, the misfeed rate will be excessive. Therefore, there is an operating window between these two one-sided failure modes (Fig. 9).

When the range of papers is moderate, it is easy to develop a sufficient operating window so that both the multifeed rate and the misfeed rate are very small. However, for the large range of papers that are typically used in large production copiers and printers, it is very difficult, or impossible, to develop a sufficient operating window, as shown on the left of Figure 9.

On the left hand side of Figure 9, it is evident that no single value of stack force will simultaneously avoid both multifeeds and misfeeds over the full range of paper weights. This was still true after robust parameter design had been completed, so there was little hope to improve it further beyond the great improvement that had already been achieved.

The problem was resolved through the development of a “stack force relief/enhancement” technology, U.S. Patent #4,561,644 [Clausing, 1985]. This technology uses two different values of the stack force, a small value for most papers, and a larger value for heavy papers (as depicted on the right side of Fig. 9). Under normal conditions, the stack force is set to the small value. For most common paper weights this works very reliably. If a larger paper weight is used, a misfeed condition may begin to emerge. A sensor near the retard roll is designed to sense the arrival of the lead edge of the sheet. If an incipient misfeed occurs, the paper will not arrive within the desired time period. Under this
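The stack-force discussion in the paper feeder case study can be sketched in code. The sketch shows two things: how an operating window between misfeed and multifeed boundaries can vanish when the paper range is wide, and how a lead-edge timing check could select between two stack-force values. All names, units, thresholds, and timings below are hypothetical. The rule that a late lead edge triggers the larger force is an assumption inferred from the description of the two force values, not a statement of the patented mechanism.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Paper:
    """Hypothetical one-sided failure boundaries for one paper type."""
    name: str
    misfeed_below: float    # stack force below this value risks misfeeds
    multifeed_above: float  # stack force above this value risks multifeeds


def operating_window(papers: Sequence[Paper]) -> Optional[Tuple[float, float]]:
    """Range of a single stack force that avoids both failure modes.

    Returns None when no single value works for every paper.
    """
    low = max(p.misfeed_below for p in papers)
    high = min(p.multifeed_above for p in papers)
    return (low, high) if low < high else None


def choose_stack_force(lead_edge_arrival_ms: Optional[float],
                       deadline_ms: float,
                       small_force: float,
                       large_force: float) -> float:
    """Pick a stack force from the lead-edge sensor reading.

    Assumed rule: if the lead edge is missing (None) or arrives after the
    deadline, treat it as an incipient misfeed and use the larger force.
    """
    late = lead_edge_arrival_ms is None or lead_edge_arrival_ms > deadline_ms
    return large_force if late else small_force


# Hypothetical boundaries in arbitrary force units.
light = Paper("light", misfeed_below=0.5, multifeed_above=1.5)
medium = Paper("medium", misfeed_below=0.8, multifeed_above=2.0)
heavy = Paper("heavy", misfeed_below=2.2, multifeed_above=4.0)

print(operating_window([light, medium]))         # a window exists
print(operating_window([light, medium, heavy]))  # no single force works
print(choose_stack_force(40.0, 50.0, small_force=1.0, large_force=3.0))
print(choose_stack_force(None, 50.0, small_force=1.0, large_force=3.0))
```

With these assumed boundaries, the small force sits inside the window for light and medium papers, while the large force is applied only when the sensor indicates that a heavy sheet is failing to feed.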
G. Gigerenzer, Ecological intelligence: An adaptation for frequencies, The evolution of mind, D.D. Cummins and C. Allen (Editors), Oxford University Press, New York, 1998, http://www.mpib-berlin.mpg.de/dok/full/gg/ggejuevm_/ggejuevm_.html.
G. Gigerenzer and A. Edwards, Simple tools for understanding risks: From innumeracy to insight, Br Med J 327 (2003), 741–744.
H. Jones and J.W. Van Deluyster, Multiple sheet feeding system for electrostatographic printing machines, U.S. Pat. #3,930,725, 1976.
R.V. Joseph and C.F.J. Wu, Failure amplification method: An information maximization approach to categorical response optimization, Technometrics 46(1) (2004), 1–12.
K. Kimseng, M. Hoit, N. Tiwari, and M. Pecht, Physics-of-failure assessment of a cruise control module, Microelectron Reliab 39(10) (1999), 1423–1444.
G.E. Kluppel and R.C. Monroe, Fan blade for an axial flow fan and method of forming same, U.S. Pat. #4,720,244, 1987.
B.L. Koff, Gas turbine technology evolution: A designer’s perspective, AIAA J Propulsion Power 18(14) (2004), 577–595.
A.H. Lefebvre, Gas turbine combustion, Taylor & Francis, Philadelphia, 1999.
S.J. Markowski, R.P. Lohmann, and R.S. Reilly, Vorbix burner: A new approach to gas turbine combustors, ASME J Eng Power 98(1) (1976), 123–129.
R.C. Monroe, Axial flow fans and blades therefore, U.S. Pat. #4,345,877, 1980.
M. Pecht and A. Dasgupta, Physics-of-failure: An approach to reliable product development, J Inst Environ Sci 38 (1995), 30–34.
M. Pecht, A. Dasgupta, D. Barker, and C.T. Leonard, The reliability physics approach to failure prediction modeling, Qual Reliab Eng Int 6 (1990), 267–273.
S.S. Rao, Reliability-based design, McGraw Hill, New York, 1992.
C.V. Sidwell, On the impact of variability and assembly on turbine cooling flow and oxidation life, Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, 2004.
M. Silverberg, Interrupted jet air knife for sheet separator, U.S. Pat. #4,275,877, 1981.
K.K. Stange, Air floatation bottom feeder, U.S. Pat. #4,014,537, 1977.
N.P. Suh, The principles of design, Oxford University Press, New York, 1990.
G. Taguchi, Taguchi on robust technology development, ASME Press, New York, 1993.
D.A. Thomas, K. Ayers, and M. Pecht, The trouble not identified phenomenon in automotive electronics, Microelectron Reliab 42(4–5) (2002), 641–651.
I.A. Ushakov (Editor), Handbook of reliability engineering, Wiley, New York, 1994.
R.M. Washam, Dry low NOX combustion system for utility gas turbine, ASME Paper 83-JPGC-GT-13, ASME, New York, 1983.
Don Clausing received the B.S. degree in mechanical engineering from Iowa State University in 1952.
After working for nine years he again became a full-time student, and received his M.S. (1962) and Ph.D.
(1966) degrees from the California Institute of Technology (Caltech). He worked in industry for a total
of 29 years before becoming a half-time faculty member at MIT from 1986 until 2000. Starting about
1975 he has had a role in the major improvements in product development and systems engineering that
have enhanced the competitiveness of many commercial industries. This includes the publication (1994)
of his book Total Quality Development—World-Class Concurrent Engineering. He now has a new book
(2004), co-authored with Victor Fey, Effective Innovation—The Development of Winning Technologies.
Clausing has long been a leader in robust design, a key to reliable systems. During the 1970s he led in the
development of the operating-window method to achieve robustly reliable systems.
Dan Frey earned the B.S. degree in aeronautical engineering from Rensselaer Polytechnic Institute in
1987. After serving as a Naval Officer for 4 years, he earned his M.S. from the University of Colorado in
1993 and Ph.D. from the Massachusetts Institute of Technology in 1997. Since then, he has been a faculty
member conducting research in robust design, statistics, design methodology, and systems engineering.
He currently holds a dual key faculty position at MIT in the Department of Mechanical Engineering and
in the Engineering Systems Division.