Linear Optimization and Extensions

Library of Congress Cataloging-in-Publication Data

Fang, Shu-Cherng.
  Linear optimization and extensions : theory and algorithms /
  Shu-Cherng Fang, Sarat Puthenpura.
    p. cm.
  Includes bibliographical references and index.
  ISBN 0-13-915265-2
  1. Linear programming. I. Puthenpura, Sarat. II. Title.
T57.74.F37 1993
519.7'2--dc20    92-38501
                      CIP
The authors and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
Contents

Preface
1  Introduction
Bibliography
Index
Preface
Since G. B. Dantzig first proposed the celebrated simplex method around 1947, the wide
applicability of linear programming models and the evolving mathematical theory and
computational methodology under these models have attracted an immense amount of
interest from both practitioners and academicians. In particular, in 1979, L. G. Khachian
proved that the ellipsoid method of N. Z. Shor, D. B. Yudin, and A. S. Nemirovskii could
outperform the simplex method in theory by exhibiting polynomial-time performance;
and, in 1984, N. Karmarkar designed a polynomial-time interior-point algorithm that
rivals the simplex method even in practice. These three methods present different and
yet fundamental approaches to solving linear optimization problems.
This book provides a unified view that treats the simplex, ellipsoid, and interior-
point methods in an integrated manner. It is written primarily as a textbook for those
graduate students who are interested in learning state-of-the-art techniques in the area of
linear programming and its natural extensions. In addition, the authors hope it will serve
as a useful handbook for people who pursue research and development activities in the
relatively new field of interior-point methods for optimization.
We have organized the book into ten chapters. In the first chapter, we introduce
the linear programming problems with modeling examples and provide a short review
of the history of linear programming. In the second chapter, basic terminologies are
defined to build the fundamental theory of linear programming and to form a geomet-
ric interpretation of the underlying optimization process. The third chapter covers the
classical simplex method, in particular the revised simplex method. Duality theory,
the dual simplex method, the primal-dual method, and sensitivity analysis are the topics
of Chapter 4. In the fifth chapter, we look into the concept of computational complexity and show that the simplex method, in the worst-case analysis, exhibits exponential complexity. Hence the ellipsoid method is introduced as the first polynomial-time algorithm for linear programming. From this point onward, we focus on the nonsimplex
approaches. Naturally, the sixth chapter is centered around the recent advances of Kar-
markar's algorithm and its polynomial-time solvability. Chapter 7 essentially covers
the affine scaling variants, including the primal, dual, and primal-dual algorithms, of
Karmarkar's method. The concepts of central trajectory and path-following are also in-
cluded. The eighth chapter reveals the insights of interior-point methods from both the
algebraic and geometric viewpoints. It provides a platform for the comparison of differ-
ent interior-point algorithms and the creation of new algorithms. In Chapter 9, we extend
the results of interior-point-based linear programming techniques to quadratic and convex
optimization problems with linear constraints. The important implementation issues for
computer programming are addressed in the last chapter. Without understanding these
issues, it is impossible to have serious software development that achieves the expected
computational performance.
The authors see three key elements in mastering linear optimization and its exten-
sions, namely, (1) the intuitions generated by geometric interpretation, (2) the properties
proven by algebraic expressions, and (3) the algorithms validated by computer imple-
mentation; and the book is written with emphasis on both theory and algorithms. Hence
it is implied that a user of this book should have some basic understanding in math-
ematical analysis, linear algebra, and numerical methods. Since an ample number of
good reference books are available in the market, we decided not to include additional
mathematical preliminaries.
This book pays special attention to the practical implementation of algorithms.
Time has proven that the practical value of an algorithm, and hence its importance among
practitioners, is largely determined by its numerical performance including robustness,
convergence rate, and ease of computer implementation. With the advent of digital
computer technology, iterative solution methods for optimization have become extremely
popular. Actually, this book explains various algorithms in the framework of an iterative
scheme with three principal aspects: (a) how to obtain a starting solution, (b) how to
check if a current solution is optimal, and (c) how to move to an improved solution. We
have attempted to cast all the algorithms discussed in the book within the purview of
this philosophy. In this manner, computer implementation follows naturally.
The material in this book has been used by the authors to teach several graduate
courses at North Carolina State University, University of Pennsylvania, and Rutgers
University since 1988. According to our experience, Chapters 1 through 6 together with
a brief touch of Chapter 7 comprise the material for a one-semester first graduate course
in Linear Programming. A review of Chapters 3 and 5 together with Chapters 6 through
10 could serve for another one-semester course in Advanced Linear Programming, or
Special Topics on Interior-Point Methods. This book can also be used as a "cookbook"
for computer implementation of various optimization algorithms, without actually going
deep into the theoretical aspects. For this purpose, after introducing each algorithm, we
have included a step-by-step implementation recipe.
We have tried to incorporate the most salient results on the subject matter into this
book. Despite our efforts, however, owing to the tremendous ongoing research activities
in the field of interior-point methods, we may have unintentionally left out some of the
important and recent work in the area.
ACKNOWLEDGMENTS
Writing this book has been a long and challenging task. We could not have carried
on this endeavor without persistent help and encouragement from our colleagues and
friends, in addition to our families. The first and foremost of such people is Mr. Steve
Chen, Head of the Global Network and Switched Services Planning Department, AT&T
Bell Laboratories. He envisioned the importance of this work and provided us with
tremendous support in time, equipment, periodic suggestions for improving the book,
and every other aspect one can think of. Also, in particular, we wish to thank Professors
Romesh Saigal (University of Michigan), Jong-Shi Pang (Johns Hopkins University),
Jie Sun (Northwestern University), Robert J. Vanderbei (Princeton University), and Yin-
Yu Ye (University of Iowa) for reviewing our book proposal and/or drafts; Professor
Salah E. Elmaghraby (North Carolina State University) for encouraging and scheduling
one of us in the teaching of linear programming courses; Professor Elmor L. Peterson
(North Carolina State University) for his invaluable advisory work; Dr. L. P. Sinha
and Mr. W. Radwill (AT&T Bell Laboratories) for their valuable support and constant
encouragement. Also, successful completion of this work would not have been possible
without the support we received from Dr. Phyllis Weiss (AT&T Bell Laboratories).
Besides, we would like to thank Drs. Jun-Min Liu, Lev Slutsman, David Houck Jr.,
Mohan Gawande, Gwo-Min Jan (AT&T Bell Laboratories), and Dr. Ruey-Lin Sheu
(North Carolina State University) for their constructive suggestions. We express also the
greatest appreciation to those students who have tolerated the unpolished manuscript and
helped us improve the quality of this book. The final thanks go to Dr. Bruce Loftis of the
North Carolina Supercomputing Center, the Cray Research Grants, and our publisher,
Prentice Hall.
Shu-Cherng Fang
Raleigh, North Carolina
Sarat Puthenpura
Murray Hill, New Jersey
Linear Optimization and Extensions

1
Introduction
The linear programming problem was first conceived by G. B. Dantzig around 1947 while
he was working as a Mathematical Advisor to the United States Air Force Comptroller on
developing a mechanized planning tool for a deployment, training, and logistical supply
program. The work led to his 1948 publication, "Programming in a Linear Structure."
The name "linear programming" was coined by T. C. Koopmans and Dantzig in the
summer of 1948, and an effective "simplex method" for solving linear programming
problems was proposed by Dantzig in 1949. In the short period between 1947 and
1949, a major part of the foundation of linear programming was laid. As early as 1947,
Koopmans began pointing out that linear programming provided an excellent framework
for the analysis of classic economic theories.
Linear programming was not, however, born overnight. Prior to 1947, mathemati-
cians had studied systems of linear inequalities, the core of the mathematical theory of
linear programming. The investigation of such systems can be traced to Fourier's work
in 1826. Since then, quite a few mathematicians have considered related subjects. In
particular, the optimality conditions for functions with inequality constraints in the finite-
dimensional case appeared in W. Karush's master's thesis in 1939, and various special
cases of the fundamental duality theorem of linear programming were proved by others.
Also, as early as 1939, L. V. Kantorovich pointed out the practical significance of a
restricted class of linear programming models for production planning and proposed a
rudimentary algorithm for their solution. Unfortunately, Kantorovich's work remained
neglected in the U.S.S.R. and unknown elsewhere until long after linear programming
had been well established by G. B. Dantzig and others.
Linear programming kept evolving in the 1950s and 1960s. The theory has been
enriched and successful applications have been reported. In 1975, the topic came to
public attention when the Royal Swedish Academy of Sciences awarded the Nobel Prize
in economic science to L. V. Kantorovich and T. C. Koopmans "for their contributions
to the theory of optimum allocation of resources." Yet another dramatic development
in linear programming came to public attention in 1979: L. G. Khachian proved that
the so-called "ellipsoid method" of N. Z. Shor, D. B. Yudin, and A. S. Nemirovskii,
which differs radically from the simplex method, could outperform the simplex method
in theory. Unlike the simplex method, which might take an exponential number of
iterations to reach an optimal solution, the ellipsoid method finds an optimal solution of
a linear programming problem in a polynomial-time bound. Newspapers around the world
published reports of this result as if the new algorithm could solve the most complicated
and large-scale resource allocation problems in no time. Unfortunately, the theoretic
superiority of the ellipsoid method could not be realized in practical applications.
In 1984, a real breakthrough came from N. Karmarkar's "projective scaling algo-
rithm" for linear programming. The new algorithm not only outperforms the simplex
method in theory but also shows its enormous potential for solving very large scale
practical problems. Karmarkar's algorithm is again radically different from the simplex
method: it approaches an optimal solution from the interior of the feasible domain. This
interior-point approach has become the focal point of research interests in recent years.
Various theoretic developments and real implementations have been reported, and further
results are expected.
1.2 THE LINEAR PROGRAMMING PROBLEM

In this section, we first introduce a linear programming problem in its standard form, then
discuss the embedded assumptions of linear programming, and finally show a mechanism
to convert any general linear programming problem into the standard form.

A linear programming problem in its standard form is

    Minimize    z = c_1 x_1 + c_2 x_2 + ... + c_n x_n
    subject to  a_i1 x_1 + a_i2 x_2 + ... + a_in x_n = b_i,   for i = 1, ..., m     (1.1)
                x_j >= 0,   for j = 1, ..., n

in which x_1, x_2, ..., x_n are nonnegative decision variables to be determined and c_1, c_2,
..., c_n are cost coefficients associated with the decision variables, such that the objective
function z = c_1 x_1 + c_2 x_2 + ... + c_n x_n is to be minimized. Moreover, sum_{j=1}^{n} a_ij x_j = b_i
denotes the ith technological constraint for i = 1, ..., m, where a_ij, for i = 1, ..., m
and j = 1, ..., n, are the technological coefficients and b_i, for i = 1, ..., m, are the
right-hand-side coefficients.
A linear programming problem (in standard form) is to find a specific nonnegative
value for each decision variable such that the objective function achieves its minimum
at this particular solution while all the technological constraints are satisfied.
If we denote x = (x_1, ..., x_n)^T, c = (c_1, ..., c_n)^T, b = (b_1, ..., b_m)^T, and A =
the m x n matrix (a_ij), then the above linear programming problem can be written in matrix
notation as follows:

    Minimize    c^T x
    subject to  Ax = b                    (1.2)
                x >= 0
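To make the standard form concrete, it can be held as plain lists, with small helpers to evaluate the objective and to test whether a candidate point is feasible. The sketch below is our illustration, not the book's code, and the data are made up for the example:

```python
# Hedged sketch: representing min c^T x s.t. Ax = b, x >= 0 with plain
# Python lists, plus helpers to evaluate c^T x and to test feasibility.

def objective(c, x):
    """Objective value c^T x."""
    return sum(cj * xj for cj, xj in zip(c, x))

def is_feasible(A, b, x, tol=1e-9):
    """Check x >= 0 componentwise and Ax = b within tolerance."""
    if any(xj < -tol for xj in x):
        return False
    return all(abs(sum(aij * xj for aij, xj in zip(row, x)) - bi) <= tol
               for row, bi in zip(A, b))

# illustrative data: two equality constraints in four variables
A = [[1, 1, 1, 0],
     [2, 1, 0, 1]]
b = [40, 60]
c = [-1, -2, 0, 0]
x = [0, 40, 0, 20]           # a candidate solution
print(is_feasible(A, b, x))  # True
print(objective(c, x))       # -80
```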
1. Proportionality assumption: For each decision variable x_j, for j = 1, ..., n, its
   contribution to the objective function z and to each constraint

       sum_{j=1}^{n} a_ij x_j = b_i,   for i = 1, ..., m,

   is directly proportional to its value. In other words, if the value of x_j is doubled, then
   2x_j contributes exactly 2c_j units to the objective function and 2a_ij units to the ith
   constraint. No set-up cost for starting the activity is realized.
2. Additivity assumption: The contribution to the objective function or any techno-
logical constraint of any decision variable is independent of the values of other
decision variables. There are no interaction or substitution effects among the de-
cision variables. The total contribution is the sum of the individual contributions
of each decision variable.
3. Divisibility assumption: Each decision variable is allowed to assume any fractional
value. In other words, noninteger values for the decision variables are permitted.
4. Certainty assumption: Each parameter (the cost coefficient Cj, the technological
coefficient aij, and the right-hand-side coefficient bi) is known with certainty. No
probabilistic or stochastic element is involved in a linear programming problem.
It is clearly seen that a nonlinear function could violate the proportionality assump-
tion and additivity assumption, an integer requirement on the decision variables could
ruin the divisibility assumption, and a probabilistic scenario could rule out the certainty
assumption. Although the embedded assumptions seem to be very restrictive, linear
programming models are nonetheless among the most widely used models today.
The standard form of a linear program deals with a linear minimization problem with nonnegative decision variables and linear equality constraints. In general, a linear program
is a problem of minimizing or maximizing a linear objective function with restricted or
unrestricted decision variables in the presence of linear equality and/or inequality con-
straints. Here we introduce a mechanism to convert any general linear program into the
standard form.
An inequality constraint of the form sum_{j=1}^{n} a_ij x_j >= b_i can be converted into an
equality constraint by subtracting a nonnegative surplus variable e_i, namely,

    sum_{j=1}^{n} a_ij x_j - e_i = b_i,   e_i >= 0

On the other hand, a linear equation sum_{j=1}^{n} a_ij x_j = b_i can be converted into a pair of
inequalities, namely,

    sum_{j=1}^{n} a_ij x_j <= b_i   and   sum_{j=1}^{n} a_ij x_j >= b_i

A maximization problem can be converted into a minimization problem by negating the
objective function, since

    maximum ( sum_{j=1}^{n} c_j x_j ) = - minimum ( sum_{j=1}^{n} -c_j x_j )

Consider the problem

    Minimize    sum_{j=1}^{n} c_j x_j
    subject to  sum_{j=1}^{n} a_ij x_j >= b_i,   for i = 1, ..., m
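The conversion mechanism above is mechanical, so it can be sketched in code. The function below is our illustration (names and interface are assumptions, not the book's): it negates the objective for maximization, splits each free variable x = y - z with y, z >= 0, and adds one slack or surplus variable per inequality.

```python
# Hedged sketch: mechanically converting a general linear program into
# standard form  min c'^T x'  s.t.  A'x' = b',  x' >= 0.

def to_standard_form(c, A, b, senses, free, maximize=False):
    """c, b: lists; A: list of rows; senses[i] in {'<=', '>=', '='};
    free: set of indices of variables unrestricted in sign."""
    n = len(c)
    c = [-cj for cj in c] if maximize else list(c)
    # Split each free variable x_j = y_j - z_j (negative copies appended).
    cols = [(j, +1) for j in range(n)] + [(j, -1) for j in sorted(free)]
    c2 = [s * c[j] for (j, s) in cols]
    A2 = [[s * row[j] for (j, s) in cols] for row in A]
    # Add one slack (<=) or surplus (>=) variable per inequality.
    m = len(b)
    for i, sense in enumerate(senses):
        if sense == '=':
            continue
        coef = 1.0 if sense == '<=' else -1.0
        for k in range(m):
            A2[k].append(coef if k == i else 0.0)
        c2.append(0.0)
    return c2, A2, list(b)

# maximize 3x1 + 2x2, s.t. x1 + x2 <= 4, x1 - x2 >= 1, x2 free
c2, A2, b2 = to_standard_form([3, 2], [[1, 1], [1, -1]], [4, 1],
                              ['<=', '>='], free={1}, maximize=True)
print(c2)  # [-3, -2, 2, 0.0, 0.0]
print(A2)  # [[1, 1, -1, 1.0, 0.0], [1, -1, 1, 0.0, -1.0]]
```

The column order chosen here (original variables, then the negative copies of free variables, then slack/surplus variables) is one convention among several; any fixed ordering works as long as it is applied consistently to c and A.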
Modeling a problem is always an art. Although linear programming has long proved
its merit as an effective model of numerous applications, still there is no fixed rule of
modeling. In this section we present some classic examples of situations that have natural
formulations, from which we see that a general practice is to define decision variables
first. Each decision variable is associated with a certain activity of interest, and the
value of a decision variable may represent the level of the associated activity. Once the
decision variables are defined, the objective function usually represents the gain or loss of
taking these activities at different levels, and each technological constraint depicts certain
interrelationships among those activities. However, many sophisticated applications go
far beyond the general practice.
    x_1 >= 0, x_2 >= 0, ..., x_n >= 0

    subject to  sum_{j=1}^{n} a_ij x_j - x_i = b_i,   for i = 1, ..., m

    x_1 + y_1 - z_1 = 0
        ...
    x_n + y_n - z_n = 0

and the nonnegativity constraints

    x_j >= 0,  y_j >= 0,  z_j >= 0,   for j = 1, ..., n

After converting, we have a standard-form linear program:

    Minimize    sum_{j=1}^{n} (-p_j z_j + p_j y_j + r x_j)
    subject to  x_1 + y_1 - z_1 = 0
                    ...
                x_n + y_n - z_n = 0
Suppose that there are m different widths specified by customers, say w_1, w_2, ..., w_m,
and customers require b_i subrolls of width w_i, for i = 1, ..., m. For a master roll with
width w (of course, w_i <= w for each i), there are many ways to cut it into subrolls. For
example, suppose subrolls of widths 3, 5, and 7 are to be cut from a master roll of width 10. We can cut a
master roll to produce three subrolls of width 3, zero subrolls of width 5, and zero subrolls
of width 7; or cut to produce one subroll of width 3, zero subrolls of width 5, and one
subroll of width 7; or cut to produce zero subrolls of width 3, two subrolls of width 5,
and zero subrolls of width 7; and so on. Each such way is called a feasible cutting pattern.
Although the total number of all possible cutting patterns may become huge, the number of
feasible cutting patterns is always finite, say n. If we let a_ij be the number of subrolls of
width w_i obtained by cutting one master roll according to pattern j, then
    sum_{i=1}^{m} a_ij w_i <= w

is required for the pattern to be feasible. Now define x_j to be the number of master rolls
cut according to the jth feasible pattern, and the cutting-stock problem becomes an integer
linear programming problem:

    Minimize    sum_{j=1}^{n} x_j
    subject to  sum_{j=1}^{n} a_ij x_j >= b_i,   for i = 1, ..., m
                x_j >= 0 and integer,   for j = 1, ..., n
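The finiteness of the set of feasible cutting patterns can be seen by brute-force enumeration. The sketch below is our illustration (not the book's code): it lists every nonzero integer pattern for the width-10 example with subroll widths 3, 5, and 7.

```python
# Enumerate all feasible cutting patterns: nonzero integer count-vectors a
# with a1*3 + a2*5 + a3*7 <= 10.

from itertools import product

def feasible_patterns(widths, w):
    """All nonzero integer count-vectors a with sum(a[i]*widths[i]) <= w."""
    ranges = [range(w // wi + 1) for wi in widths]
    return [a for a in product(*ranges)
            if 0 < sum(ai * wi for ai, wi in zip(a, widths)) <= w]

patterns = feasible_patterns([3, 5, 7], 10)
for p in patterns:
    print(p)
print(len(patterns), "feasible patterns")  # 8
```

The pattern (3, 0, 0) is the "three subrolls of width 3" cut mentioned above, and (1, 0, 1) is the "one of width 3 and one of width 7" cut.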
This is a book on linear programming and its extensions. The authors see three key
elements in the mastering of linear programming, namely, (1) the intuitions generated
by geometric interpretation, (2) the properties proven by algebraic expressions, and
(3) the algorithms validated by computer implementation.
The first step of learning is to "see" problems and have a feeling about them. In
this way, we are led to understand the known properties and conjecture new ones. The
second step is to translate geometric properties into algebraic expressions and to develop
algebraic skills to manipulate them in proving new results. Once the problems are
understood and basic results are obtained, the third step is to develop solution procedures.
Since the most important characteristic of a high-speed computer is its ability to perform
repetitive operations very efficiently, linear programming algorithms are introduced in
an iterative scheme and validated by computer implementations.
The basic philosophy of solving a linear programming problem via an iterative
scheme is to start from a rough solution and successively improve the current solution
until a set of desired optimality conditions are met. In this book, we treat the simplex
method, the ellipsoid method, and Karmarkar's algorithm and its variants from this
integrated iterative approach. The layout of the book is as follows. We provide simple
geometry of linear programming in Chapter 2, introduce the classic simplex method in
Chapter 3, and study the fascinating duality theory and sensitivity analysis in Chapter 4.
From the complexity point of view, we further introduce Khachian's ellipsoid method
in Chapter 5 and Karmarkar's algorithm in Chapter 6. The affine scaling algorithms,
as variants of Karmarkar's algorithm, are the topics of Chapter 7. The insights of the
interior-point methods are discussed in Chapter 8. Then we extend our horizon to touch
on convex quadratic programming in Chapter 9. Finally we wrap up the book by
studying the computer implementation issues in Chapter 10.
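The iterative philosophy just described — obtain a starting solution, test optimality, move to an improved solution — can be written as a generic skeleton. The code below is our hedged illustration (the names are assumptions, not the book's), with a toy usage example:

```python
# Generic iterative-scheme skeleton: (a) starting solution, (b) optimality
# test, (c) improvement step, repeated until the test passes.

def iterative_solve(starting_solution, is_optimal, improve, max_iters=1000):
    x = starting_solution()
    for _ in range(max_iters):
        if is_optimal(x):
            return x
        x = improve(x)
    return x

# toy usage: minimize |x - 3| over the integers by unit steps
x_star = iterative_solve(lambda: 0,
                         lambda x: x == 3,
                         lambda x: x + 1 if x < 3 else x - 1)
print(x_star)  # 3
```

The simplex method, the ellipsoid method, and the interior-point algorithms all fit this template; they differ in what a "solution" is, how optimality is certified, and how the improvement step is computed.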
REFERENCES

1.1. Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D., Linear Programming and Network Flows, 2d
ed., John Wiley, New York (1990).
1.2. Bland, R. G., Goldfarb, D., and Todd, M. J., "The ellipsoid method: a survey," Operations
Research 29, 1039-1091 (1981).
1.3. Borgwardt, K. H., The Simplex Method: A Probabilistic Analysis, Springer-Verlag, Berlin
(1987).
1.4. Chvatal, V., Linear Programming, Freeman, San Francisco (1983).
1.5. Dantzig, G. B., "Maximization of a linear function of variables subject to linear inequalities,"
Activity Analysis of Production and Allocation, edited by T. C. Koopmans, John Wiley, New
York, 339-347 (1951).
1.6. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
1.7. Gass, S. I., Linear Programming: Methods and Applications, 2d ed., McGraw-Hill, New York
(1964).
1.8. Gilmore, P. C., and Gomory, R. E., "A linear programming approach to the cutting-stock
problem," Operations Research 9, 849-859 (1961).
1.9. Gilmore, P. C., and Gomory, R. E., "A linear programming approach to the cutting-stock
problem-Part II," Operations Research 11, 863-888 (1963).
1.10. Goldfarb, D., and Todd, M. J., "Linear Programming," in Optimization, Handbook in Op-
erations Research and Management Science, ed. by Nemhauser, G. L. and Rinnooy Kan,
A. H. G., Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
1.11. Hooker, J. N., "Karmarkar's linear programming algorithm," Interfaces 16, 75-90 (1986).
1.12. Kantorovich, L. V., "Mathematical methods of organizing and planning production" (in Rus-
sian), Publication House of the Leningrad State University, Leningrad (1939), (English trans-
lation) Management Science 6, 366-422 (1959-60).
1.13. Karmarkar, N., "A new polynomial time algorithm for linear programming," Combinatorica
4, 373-395 (1984).
1.14. Karush, W., "Minima of functions of several variables with inequalities as side constraints,"
Master's thesis, Department of Mathematics, University of Chicago (1939).
1.15. Khachian, L. G., "A polynomial algorithm in linear programming" (in Russian), Doklady
Akademiia Nauk SSSR 224, 1093-1096 (1979); (English translation) Soviet Mathematics Doklady
20, 191-194 (1979).
1.16. Luenberger, D. G., Introduction to Linear and Nonlinear Programming, 2d ed., Addison-
Wesley, Reading, MA (1973).
1.17. Murty, K. G., Linear Programming, John Wiley, New York (1983).
1.18. Shamir, R., "The efficiency of the simplex method: a survey," Management Science 33,
301-334 (1987).
EXERCISES
1.1. Convert the following linear programming problems into standard form:
     (a) Minimize 4x_1 + sqrt(2) x_2 - 0.35x_3
         subject to
         x_1, x_3 >= 0
     (b) Maximize -3.1x_1 + 2 sqrt(2) x_2 - x_3
         subject to 100x_1 - 20x_2 = 7
         x_2 >= 0, x_3 <= 10
1.2. Consider a linear programming problem:
         subject to x_1 + x_2 <= 10
                    x_1 - 3x_2 <= 2
1.5. CHIPCO produces two kinds of memory chips (Chip-1 and Chip-2) for computer usage.
The unit selling price is $15 for Chip-1 and $25 for Chip-2. To make one Chip-1, CHIPCO
has to invest 3 hours of skilled labor, 2 hours of unskilled labor, and 1 unit of raw material.
To make one Chip-2, it takes 4 hours of skilled labor, 3 hours of unskilled labor, and 2 units
of raw material. The company has 100 hours of skilled labor, 70 hours of unskilled labor,
and 30 units of raw material available. The sales contract signed by CHIPCO requires that
at least 3 units of Chip-2 have to be produced and any fractional quantity is acceptable.
Can you formulate a linear program to help CHIPCO determine its optimal product
mix?
1.6. Assignment problem. Five persons (A, B, C, D, E) are assigned to work on five different
projects. The following table shows how long it takes for a specific person to finish a
specific project:
                 Project #
            1    2    3    4    5
      A     5    5    7    4    8
      B     6    5    8    3    7
      C     6    8    9    5   10
      D     7    6    6    3    6
      E     6    7   10    6   11
The standard wage is $60 per person per day. Suppose that one person is assigned
to do one project and every project has to be covered by one person. Can you formulate
this problem as an integer linear program?
1.7. INTER-TRADE company buys no-brand textile outlets from China, India, and the Philippines, ships to either Hong Kong or Taiwan for packaging and labeling, and then ships to the
United States or France for sale. The transportation costs between sources and destinations
can be read from the following table:
2
Geometry of Linear Programming

Consider a linear programming problem in its standard form:

    Minimize    c^T x
    subject to  Ax = b                    (2.1)
                x >= 0
where c and x are n-dimensional column vectors, A an m x n matrix, and b an m-dimensional
column vector. Usually, A is called the constraint matrix, b the right-hand-side
vector, and c the cost vector. Note that we can always assume that b >= 0, since for
any component b_i < 0, multiplying both sides of the ith constraint by -1 results
in a new positive right-hand-side coefficient. Now we define P = {x in R^n | Ax = b,
x >= 0} to be the feasible domain or feasible region of the linear program. When P is not
void, the linear program is said to be consistent. For a consistent linear program with a
2.2 HYPERPLANES, HALFSPACES, AND POLYHEDRAL SETS
that intersect at the hyperplane H. Removing H results in two disjoint open halfspaces

    H_L^0 = {x in R^n | a^T x < beta}          (2.5)

and

    H_U^0 = {x in R^n | a^T x > beta}          (2.6)

[Figure 2.1: the hyperplane H = {x in R^n | a^T x = beta}]
depict the contours of the linear objective function, and the cost vector c becomes the
normal of its contour hyperplanes.
We further define a polyhedral set or polyhedron to be a set formed by the intersection
of a finite number of closed halfspaces. If the intersection is nonvoid and bounded,
it is called a polytope. For a linear program in its standard form, if we denote a_i to be
the ith row of the constraint matrix A and b_i the ith element of the right-hand vector b,
then we have m hyperplanes

    H_i = {x in R^n | a_i^T x = b_i},   i = 1, ..., m

and the feasible domain P becomes the intersection of these hyperplanes and the first
orthant of R^n. Notice that each hyperplane H_i is an intersection of two closed halfspaces
H_L and H_U, and the first orthant of R^n is the intersection of n closed halfspaces {x in
R^n | x_i >= 0} (i = 1, 2, ..., n). Hence the feasible domain P is a polyhedral set. An
optimal solution of the linear program can be easily identified if we see how the contour
hyperplanes formed by the cost vector c intersect with the polyhedron formed by the
constraints.
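Deciding which side of a hyperplane a point lies on is a single inner product. The sketch below is our illustration (not the book's code); the labels H, H_L, and H_U follow the notation above:

```python
# Classify a point of R^n relative to the hyperplane H = {x | a^T x = beta}
# and its two open halfspaces.

def classify(a, beta, x, tol=1e-9):
    """Return 'H', 'H_L' (a^T x < beta), or 'H_U' (a^T x > beta)."""
    v = sum(ai * xi for ai, xi in zip(a, x))
    if abs(v - beta) <= tol:
        return 'H'
    return 'H_L' if v < beta else 'H_U'

# the hyperplane x1 + x2 = 40 in R^2
a, beta = [1, 1], 40
print(classify(a, beta, [10, 30]))  # prints H
print(classify(a, beta, [0, 0]))    # prints H_L
print(classify(a, beta, [30, 30]))  # prints H_U
```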
Example 2.1

Consider the following linear programming problem:

    Minimize    -x_1 - 2x_2
    subject to  x_1 + x_2 + x_3 = 40
                2x_1 + x_2 + x_4 = 60
                x_1, x_2, x_3, x_4 >= 0

Although it has four variables, the feasible domain can be represented as a two-dimensional
graph defined by

    x_1 + x_2 <= 40
    2x_1 + x_2 <= 60
    x_1, x_2 >= 0
A more detailed study of polyhedral sets and polytopes requires the following definition:
Consequently, we know the set of all affine combinations of distinct points x_1, x_2 in R^n is
the whole line determined by these two points, while the set of all convex combinations
is the line segment joining x_1 and x_2. Obviously each convex combination is an affine
combination, but the converse statement holds only when x_1 = x_2.
Following the previous definition, for a nonempty subset S of R^n, we say S is
affine if S contains every affine combination of any two points x_1, x_2 in S; S is convex if
S contains every convex combination of any two points x_1, x_2 in S.
It is clear that affine sets are convex, but convex sets need not be affine. Moreover,
the intersection of a collection (either finite or infinite) of affine sets is either empty or
affine and the intersection of a collection (either finite or infinite) of convex sets is either
empty or convex.
We may notice that hyperplanes are affine (and hence convex), but closed halfspaces
are convex only (not affine). Hence the linear manifold (the solution set of a finite system
of linear equations) {x in R^n | Ax = b} is affine (and hence convex), but the feasible
domain P of our linear program is convex only.
[Figure 2.3]
One very important fact to point out here is that the intersection set of the polyhedral
set P and the supporting hyperplane with the negative cost vector -c as its normal
provides optimal solutions to our linear programming problem. This fact will be proved
in Exercise 1.6, and this is the key idea of solving linear programming problems by
"graphic method." Figure 2.4 illustrates this situation for Example 2.1.

[Figure 2.4: supporting hyperplane with normal -c = (1, 2)^T]
Extreme points of a polyhedral set are geometric entities, while the basic feasible solutions
of a system of linear equations and inequalities are defined algebraically. When these
two basic concepts are linked together, we have algebraic tools, guided by geometric
intuition, to solve linear programming problems.
The definition of extreme points is stated here: A point x in a convex set C is said
to be an extreme point of C if x is not a convex combination of any other two distinct
points in C. In other words, an extreme point is a point that does not lie strictly within
the line segment connecting two other points of the convex set. From the pictures of
convex polyhedral sets, especially in lower-dimensional spaces, it is clear to see that the
extreme points are those "vertices" of a convex polyhedron. A formal proof is left as an
exercise.
To characterize those extreme points of the feasible domain P = {x in R^n | Ax =
b, x >= 0} of a given linear program in its standard form, we may assume that A is an
m x n matrix with m <= n. We also denote the jth column of A by A_j, for j = 1, 2, ..., n.
Then we know y_1, y_2 in P and x = (1/2)y_1 + (1/2)y_2. In other words, x is not an extreme
point of P.

(<= side): Suppose that x is not an extreme point; then x = lambda y_1 + (1 - lambda) y_2 for
some distinct y_1, y_2 in P and 0 < lambda < 1. Since y_1, y_2 >= 0 and 0 < lambda < 1, the last n - p
components of y_1 must be zero. Consequently, we have a nonzero vector w = x - y_1
such that Aw = A(x - y_1) = Ax - Ay_1 = b - b = 0. This shows that the columns of A are
linearly dependent.
    x = [ x_B ]
        [ x_N ]

For a component in x_B, its corresponding column is in the basis B, and we call it a basic
variable. Similarly, those components in x_N are called nonbasic variables. Since B is
a nonsingular m x m matrix, we can always set all nonbasic variables to zero, i.e.,
x_N = 0, and solve the system of equations B x_B = b for the basic variables. Then the vector

    x = [ B^{-1} b ]
        [    0     ]
By noticing that every basic feasible solution is a basic solution, we have the next
corollary.
Corollary 2.1.2. For a given linear program in its standard form, there are at
most C(n, m) extreme points in its feasible domain P.
A very important fact to mention is that the correspondence between basic feasible
solutions and extreme points of P, as described in Corollary 2.2, in general is not one-
to-one. Corresponding to each basic feasible solution there is a unique extreme point
in P, but corresponding to each extreme point in P there may be more than one basic
feasible solution.
Consider the polytope P defined by

    P = {x in R^4 | x_1 + x_2 + x_3 = 10, x_1 + x_4 = 10, x_1, x_2, x_3, x_4 >= 0}     (2.10)

or, equivalently for its graph in Figure 2.5, we have

    P = {x in R^2 | x_1 + x_2 <= 10, x_1 <= 10, x_1, x_2 >= 0}     (2.11)

Note that P has three extreme points in Figure 2.5, namely,

    A = (0, 0),  B = (0, 10),  and  C = (10, 0)
[Figure 2.5]
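The many-to-one correspondence between basic feasible solutions and extreme points can be checked directly on this polytope. The sketch below is our illustration, not the book's code: it tries every pair of columns of A as a candidate basis B, solves B x_B = b with the nonbasic variables fixed at zero, and keeps the nonnegative solutions.

```python
# Enumerate basic feasible solutions of
# P = {x in R^4 | x1 + x2 + x3 = 10, x1 + x4 = 10, x >= 0}.

from itertools import combinations

A = [[1, 1, 1, 0],
     [1, 0, 0, 1]]
b = [10, 10]

def solve2x2(B, b):
    """Solve the 2x2 system B y = b by Cramer's rule; None if singular."""
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    if det == 0:
        return None
    return [(b[0] * B[1][1] - B[0][1] * b[1]) / det,
            (B[0][0] * b[1] - b[0] * B[1][0]) / det]

bfs = []
for cols in combinations(range(4), 2):          # at most C(4, 2) = 6 bases
    B = [[A[i][j] for j in cols] for i in range(2)]
    xB = solve2x2(B, b)
    if xB is None or min(xB) < 0:
        continue                                 # singular or infeasible
    x = [0.0] * 4
    for j, v in zip(cols, xB):
        x[j] = v
    bfs.append((cols, tuple(x)))

for cols, x in bfs:
    print(cols, x)
```

Running this yields five basic feasible solutions but only three distinct points; the bases {A_1, A_2}, {A_1, A_3}, and {A_1, A_4} all correspond to the extreme point C = (10, 0), confirming that the correspondence is not one-to-one.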
the value of one nonbasic variable from zero to positive and decreasing the value of
one basic variable from positive to zero. This is the basic concept of pivoting in the
simplex method to be studied in the next chapter. Geometrically, adjacent extreme points
of P are linked together by an edge of P, and pivoting leads one to move from one
extreme point to its adjacent neighbor along the edge direction. This can be clearly seen
in Figure 2.2.
Figure 2.6
This idea of "convex resolution" can be verified for a general polyhedral set with
the help of the following definition: An extremal direction of a polyhedral set P is a
nonzero vector d ∈ Rⁿ such that for each x0 ∈ P the ray {x ∈ Rⁿ | x = x0 + λd, λ ≥ 0}
is contained in P. Note that, in the convex analysis literature, it is usually called the
direction of recession.
From the definition of the feasible domain P, we see that a nonzero vector d ∈ Rⁿ
is an extremal direction of P if and only if Ad = 0 and d ≥ 0. Also, P is unbounded if
and only if P has an extremal direction. Using extreme points and extremal directions,
every point in P can be well represented by the following theorem.
x = Σ_{i∈I} λ_i v_i + d (2.9)
where Σ_{i∈I} λ_i = 1, λ_i > 0 for i ∈ I, and d is either the zero vector or an extremal
direction of P.
A proof using the mathematical induction method on the number of positive com-
ponents of the given vector x ∈ P is included at the end of this chapter as an exercise.
A direct consequence of the resolution theorem confirms our observation made at
the beginning of this section. The resolution theorem reveals one fundamental property
of linear programming for algorithm design.
It is important to point out that Theorem 2.3 does not rule out the possibility of
having an optimal solution at a nonextreme point. It simply says that among all the
optimal solutions to a given linear programming problem, at least one of them is an
extreme point.
The fundamental theorem of linear programming shows that one of the extreme points of
the feasible domain P is an optimal solution to a consistent linear programming problem
unless the problem is unbounded. This fundamental property has guided the design of
algorithms for linear programming.
One of the most intuitive ways of solving a linear programming problem is the
graphical method, as we discussed before. We draw a graph of the feasible domain P first.
Then at each extreme point v of P, using the negative cost vector −c as the normal vector,
we draw a hyperplane H. If P is contained in the halfspace H_L, then H is a desired
supporting hyperplane and v is an optimal solution to the given linear programming
problem. This method provides us a clear picture, but it is limited to those problems
whose feasible domains can be drawn in three-dimensional, or lower, spaces only.
Another straightforward method is the enumeration method. Since an extreme point
corresponds to a basic feasible solution, it must be a basic solution. We can generate
all basic solutions by choosing m linearly independent columns from the columns of
constraint matrix A and solving the corresponding system of linear equations. Among
all basic solutions, we identify feasible ones and take the optimal one as our solution.
The deficiency of this method is its laborious computation; it becomes impractical
when the number C(n, m) becomes large.
The rest of this book is devoted to designing efficient iterative algorithms for linear
programming. There are two basic approaches. One is the well-known simplex method,
the other is the newly developed interior-point approach. Focusing on finding an optimal
extreme point, the simplex approach starts with one extreme point, hops to a better
neighboring extreme point along the boundary, and finally stops at an optimal extreme
point. Because the method is well designed, rarely do we have to visit too many extreme
points before an optimal one is found. But, in the worst case, this method may still visit
all nonoptimal extreme points.
Unlike the simplex method, the interior-point method stays in the interior of P
and tries to position a current solution as the "center of universe" in finding a better
direction for the next move. By properly choosing step lengths, an optimal solution is
finally achieved after a number of iterations. This approach takes more effort, hence
more computational time, in finding a moving direction than the simplex method, but
better moving directions result in fewer iterations. Therefore the interior-point approach
has become a rival of the simplex method and gathered much attention.
Figure 2. 7 shows the fundamental difference between these two approaches.
Figure 2.7
EXERCISES
2.1. Prove that a linear program with a bounded feasible domain must be bounded, and give a
counterexample to show that the converse statement need not be true.
2.2. Let S be a subset of Rⁿ. For each of the following assertions, either prove it or provide a
counterexample in R² to disprove it:
(a) If S is convex, then S is (i) affine; (ii) a cone; (iii) a polyhedron; (iv) a polytope.
(b) If S is affine, then S is (i) convex; (ii) a cone; (iii) a polyhedron; (iv) a polytope.
(c) If S is a cone, then S is (i) convex; (ii) affine; (iii) a polyhedron; (iv) a polytope.
(d) If S is a polyhedron, then S is (i) convex; (ii) affine; (iii) a cone; (iv) a polytope.
Exercises 27
(e) If S is a polytope, then S is (i) convex; (ii) affine; (iii) a cone; (iv) a polyhedron.
2.3. Let H = {x ∈ Rⁿ | aᵀx = β} be a hyperplane. Show that H is affine and convex.
2.4. Suppose C1, C2, ..., Cp are p (> 0) convex subsets of Rⁿ. Prove or disprove the following
assertions:
(a) ∩_{i=1}^p C_i is convex.
(b) ∪_{i=1}^p C_i is convex.
2.5. Use the results of Exercises 2.3 and 2.4 to show that P = {x ∈ Rⁿ | Ax = b, x ≥ 0} is a
convex polyhedron.
2.6. To make the graphic method work, prove that the intersection set of the feasible domain
P and the supporting hyperplane whose normal is given by the negative cost vector -c
provides the optimal solutions to a given linear programming problem.
2.7. Let P = {(x1, x2) ∈ R² | x1 + x2 ≤ 40, 2x1 + x2 ≤ 60, x1 ≤ 20; x1, x2 ≥ 0}. Do the
following:
(a) Draw the graph of P.
(b) Convert P to the standard equality form.
(c) Generate all basic solutions.
(d) Find all basic feasible solutions.
(e) For each basic feasible solution, point out its corresponding extreme point in the graph
of P.
(f) Which extreme points correspond to degenerate basic feasible solutions?
2.8. For P as defined in Exercise 2.7, use the graphic method to solve linear programming
problems with the following objective functions:
(a) z = −x2;
(b) z = −x1 − x2;
(c) z = −2x1 − x2;
(d) z = −x1;
(e) z = −x1 + x2.
What conclusion can be reached on the optimal solution set P*?
2.9. Show that the set of all optimal solutions to a linear programming problem is a convex
set. Now, can you construct a linear programming problem which has exactly two different
optimal solutions? Why?
2.10. Prove that for a degenerate basic feasible solution with p < m positive elements, its
corresponding extreme point of P may correspond to up to C(n − p, n − m) different basic
feasible solutions at the same time.
2.11. Let M be the 2 × 2 identity matrix. Show that
(a) M_C, the convex cone generated by the columns of M, is the first orthant of R².
(b) M_C is the smallest convex cone that contains the column vectors (1, 0)ᵀ and
(0, 1)ᵀ.
2.12. Given a nonempty set S ⊂ Rⁿ, show that the set of all affine (convex, convex conical)
combinations of points in S is an affine (convex, convex conical) set which is identical to
the intersection of all affine (convex, convex conical) sets containing S.
2.13. To prove the resolution theorem by the induction method, we let p be the number of positive
components of x ∈ P. When p = 0, x = 0 is obviously an extreme point of P. Assume
that the theorem holds for p = 0, 1, ..., k, and that x has k + 1 positive components.
If x is an extreme point, then there is nothing to prove. If x is not an extreme point, we
let xᵀ = (x̄ᵀ | 0), where x̄ᵀ = (x1, ..., x_{k+1}) > 0, and A = [Ā | N]. Then Theorem 2.1
shows that the columns of Ā are linearly dependent; in other words, there exists a vector
w̄ ∈ R^{k+1}, w̄ ≠ 0, such that Āw̄ = 0. We define wᵀ = (w̄ᵀ, 0) ∈ Rⁿ; then w ≠ 0 and
Aw = Āw̄ = 0. There are three possibilities: w̄ ≥ 0, w̄ ≤ 0, and w̄ has both positive and
negative components.
For w̄ ≥ 0, consider x(ε) = x + εw and pick ε* to be the largest negative value of ε such
that x* = x(ε*) has at least one more zero component than x. Then follow the induction
hypothesis to show the theorem holds. Similarly, show that in the remaining two cases, the
theorem still holds.
2.14. For a linear programming problem with a nonempty feasible domain P = {x ∈ Rⁿ | Ax =
b, x ≥ 0}, prove that every extreme point of P is a vertex of P, and the converse statement
is also true.
3
The Revised Simplex Method
In Chapter 2 we have seen that if the optimal solution set of a linear programming
problem is nonempty, then it contains at least one extreme point of the polyhedral set
of the feasible domain. Thus an intuitive way to solve a linear programming problem
is to traverse from one extreme point to a neighboring extreme point in a systematic
fashion until we reach an optimal one. This is the basic idea of the simplex method and
its variants. However, in doing so, as in any other iterative scheme, we have to resolve
three important issues: (1) How do we start with an extreme point? (2) How do we move
from one extreme point to a better neighboring extreme point in an "efficient" way? (3)
When do we stop the process? This chapter addresses these issues for the simplex
method with an emphasis on the so-called revised simplex method, which provides a
computationally efficient implementation for linear programming.
30 The Revised Simplex Method Chap. 3
The first step is to find a valid and yet convenient starting point. The choice of a
starting point may affect the overall efficiency of an iterative scheme. It varies widely
from one method to another. If a method is very sensitive to its starting point, it is cer-
tainly worth spending additional computational effort and time in finding a good starting
point. Otherwise, we should spend minimum effort on it. Sometimes mathematical
transformations are employed to transform a given problem into an equivalent form for
a quick admissible starting point. Once the transformed problem is solved, its solution
could then be used to obtain a solution to the original problem. In general, finding a
starting point is not an easy task; it may take as much as half of the total computational
effort. We shall study different starting mechanisms in later sections and chapters.
The second step of an iterative scheme is to check if we have reached our goal or
not. For an optimization problem, this means testing for optimality of a solution. This
test has to be carried out at each iteration for the current solution in hand. When the
result turns out to be positive, the iterative scheme is terminated. Otherwise, we go to
the third step for further improvement. The testing process usually requires a stopping
rule, or stopping criterion for an iterative scheme. Once again, a computationally simple
stopping rule is preferred for an efficient iterative scheme, since it is performed at each
iteration.
If the stopping rule is met, we have achieved our goal. Otherwise, we proceed
to make further improvement in getting closer to our goal. This is usually done by
moving from a current solution to a better one. To do so we need two elements: (1) a
good direction of movement, and (2) an appropriate step length along the good direction.
A good direction should point to a better result, and the step length describes how far
we should proceed along the direction. Needless to say, the efficiency of an iterative
method depends strongly on the mechanism of finding a good direction and appropriate
step-length. In general, the synthesis of the direction of movement and the associated step
length calculation constitute the bulk of computation for an iterative scheme. Therefore
special attention should be paid to this aspect to achieve speed and efficiency in practical
implementations.
Bearing these ideas in mind, we shall study the guiding principles of the simplex
method for solving linear programming problems. For computational efficiency, we
focus on the revised simplex method, which is a systematic procedure for implementing
the steps of the original simplex method in a smaller array.
The simplex method was first conceived in the summer of 1947 by G. B. Dantzig.
Over the past four decades, although many variants of the simplex method have been developed
to improve its performance, the basic ideas have remained the same. We study the basic
ideas in this section.
Considering the fundamental theorem of linear programming, we know that if the
feasible domain P = {x ∈ Rⁿ | Ax = b, x ≥ 0} is nonempty, then the minimum objective
value z = cᵀx over P either is unbounded or is attainable at an extreme point of P.
This motivates the simplex method to restrict its iterations to the extreme points of P
only. It starts with an extreme point of P, checks for optimality, and then moves to
another extreme point with improved objective value if the current extreme point is
not optimal. Owing to the correspondence between extreme points and basic feasible
solutions as described in Corollary 2.2, the simplex method can be described in terms of
basic feasible solutions in an iterative scheme:
For Step 1, two commonly used mechanisms of finding a starting basic feasible
solution are the two-phase method and the big-M method. We shall introduce these two
mechanisms in Section 3.4. Once a starting point is obtained, in Step 2 it is checked
whether the current solution achieves the optimum. A stopping rule called nonnegative
reduced costs will be introduced in Section 3.3 for this purpose. If the objective cost
function can be further reduced, the stopping rule will be violated and the simplex method
proceeds to Step 3 to find an improved basic feasible solution. Under the assumption of
nondegeneracy, from Chapter 2, we know that each basic feasible solution has n - m
adjacent basic feasible solutions, which can be reached by moving along corresponding
edge directions from the current solution with appropriate step lengths. The simplex
method chooses an edge direction that leads to an adjacent basic feasible solution with
improved objective value. This is the so-called pivoting process, which will be discussed
in Section 3.3.
In order to introduce the simplex method in algebraic terms, we standardize some nota-
tion here. For a given basic feasible solution x*, we can always denote it by
x* = [x*_B; x*_N]
where the elements of the vector x*_B represent the basic variables and the elements of
the vector x*_N represent the nonbasic variables. Needless to say, x*_B ≥ 0 and x*_N = 0
for the basic feasible solution. Corresponding to the basic variables x*_B and nonbasic
variables x*_N, we partition A and c as A = [B | N] and cᵀ = [c_Bᵀ | c_Nᵀ], and write
x = [x_B; x_N]
with both x_B and x_N being nonnegative. Hence the linear programming problem defined
by (3.1) becomes
Minimize z = c_Bᵀ x_B + c_Nᵀ x_N (3.3a)
subject to B x_B + N x_N = b; x_B ≥ 0, x_N ≥ 0 (3.3b)
From (3.3b) we have
x_B = B⁻¹b − B⁻¹N x_N (3.4)
and substituting (3.4) into the objective function (3.3a) yields
z = c_Bᵀ B⁻¹ b + rᵀ [x_B; x_N] (3.5)
where
rᵀ = cᵀ − c_Bᵀ B⁻¹ A (3.6)
Note that if every component of the vector r is nonnegative, then (3.5) implies
z ≥ c_Bᵀ B⁻¹ b for every feasible x = [x_B; x_N] ≥ 0
In this case the current basic feasible solution x* is an optimal solution. On the other
hand, if any component of r is negative, its corresponding element of x_N may be increased
from zero to some positive value (or, equivalently, a nonbasic variable is brought into the
basis) to gain a reduction in the objective value z. Hence the vector r is named the reduced
cost vector, and its components are called reduced costs. Summarizing the previous
discussion, we have derived the following result:
Theorem 3.1. If
x* = [x*_B; x*_N] = [B⁻¹b; 0] ≥ 0
is a basic feasible solution with nonnegative reduced cost vector r, given by Equation
(3.6), then x* is an optimal solution to the linear programming problem (3.1).
Let x* be a current basic feasible solution with B being its corresponding basis, N the
nonbasis, B the index set of basic variables in x*, and N the index set of nonbasic
variables. Moreover, for each nonbasic variable x_q (q ∈ N), let c_q be the cost coefficient
associated with it and N_q the column in N that corresponds to x_q. Then Theorem 3.1
says that if
r_q = c_q − c_Bᵀ B⁻¹ N_q ≥ 0 for each q ∈ N (3.8)
then we can terminate the simplex method with an optimal solution x*. Otherwise, we
have to move to another basic feasible solution for some potential improvement in the
objective value.
Note that N_q = A_q for each q in N, since they represent the same columns.
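The stopping rule (3.8) is cheap to evaluate once B is factored; the sketch below (our own illustration, with made-up data) forms the full reduced cost vector of (3.6) via the simplex multipliers:

```python
import numpy as np

def reduced_costs(A, c, basis):
    """Reduced cost vector r^T = c^T - c_B^T B^{-1} A of Equation (3.6)."""
    B = A[:, basis]
    y = np.linalg.solve(B.T, c[basis])   # simplex multipliers: y = B^{-T} c_B
    return c - A.T @ y

# Illustrative data: with basis {x4, x5, x6}, B is the identity, so r equals c.
A = np.array([[2.0, 0.0, 0.0, 1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0, 0.0, 0.0, 1.0]])
c = np.array([-1.0, -1.0, -1.0, 0.0, 0.0, 0.0])
r = reduced_costs(A, c, [3, 4, 5])
print(r, (r >= 0).all())   # negative entries flag candidates to enter the basis
```

By construction r is zero on the basic columns, so in practice only the nonbasic components need be inspected, exactly as (3.8) states.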
After taking care of Step 2, we now focus on the process of moving to a basic feasible
solution with improved objective value. The process includes finding a good moving
direction (direction of translation) and an appropriate step length.
Figure 3.1
Consequently, we require
cᵀ d_q = [c_Bᵀ | c_Nᵀ] [−B⁻¹A_q; e_q] = c_q − c_Bᵀ B⁻¹ A_q = r_q < 0 (3.14)
Note again that A_q and N_q are the same column vector, and Equation (3.14) actually
requires the reduced cost r_q < 0 to assure that the corresponding edge direction is a good
direction of translation.
Summarizing our findings, we have the following theorem.
Theorem 3.2. Let
x* = [B⁻¹b; 0]
be a basic feasible solution to the linear programming problem defined by (3.1) with basis
B. If the reduced cost rq < 0, for some nonbasic variable xq, then the edge direction dq
given by (3.10) leads to an improvement in the objective value.
However, when x* is degenerate, d_q may become an infeasible edge direction that forces
a step length α = 0. In this case, no actual translation happens, and we stay at the same
extreme point with two different representations in terms of basic variables.
Also note that for a feasible edge direction d_q with r_q < 0, if d_q ≥ 0, then x* + αd_q
is always feasible as long as α > 0. Therefore, as α approaches infinity, the given linear
programming problem becomes unbounded below, and we have the following result.
Step Length. Once a good edge direction d_q given by (3.10) is selected as the
direction of translation at the current basic feasible solution x*, we have to determine an
appropriate step length α ≥ 0 such that x* + αd_q becomes a new basic feasible solution,
by bringing in the nonbasic variable x_q and dropping a basic variable to form a new
basis. By Theorem 3.3, we know that if d_q ≥ 0 is a feasible direction, then the given
linear programming problem is unbounded below. In case d_q has negative components,
since Ad_q = 0 has been verified before, in order to keep x* + αd_q ≥ 0, we need to
choose α according to the following formula:
α = min_{j ∈ B} { −x*_j / d_j | d_j < 0 } (3.15)
where x*_j is the jth element of x*, B is the index set of basic variables, and d_j is the
component in d_q corresponding to the basic variable x_j.
This formula is referred to in the literature as the minimum ratio test. It determines
which basic variable will become nonbasic (with zero value) as the nonbasic variable xq
is introduced to the new basis. It is not difficult to show that under the assumption of
nondegeneracy, there is a unique basic variable x_p (p ∈ B) which provides a positive
step length α leading to a distinct basic feasible solution. Also note that at a degenerate
point, the step length obtained by (3.15) may become zero. In this case, although in
theory we have changed our basis, we actually stay at the same extreme point of P.
This process of changing basis is sometimes called the pivoting process. By pivot-in we
mean a nonbasic variable entering the new basis, and pivot-out a basic variable leaving
the current basis.
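The minimum ratio test (3.15) and the resulting pivot-out choice can be sketched as follows (our own illustration; ties are broken here by basis position, which is only one of the possible tie-breaking rules discussed below):

```python
def min_ratio_step(x_basic, d_basic):
    """Minimum ratio test of (3.15) on the basic components of x* and d_q.

    Returns (alpha, leaving_position), or None when no component of d is
    negative (the ray x* + alpha*d_q stays feasible for all alpha >= 0).
    """
    candidates = [(-xj / dj, k)
                  for k, (xj, dj) in enumerate(zip(x_basic, d_basic))
                  if dj < 0]
    if not candidates:
        return None
    return min(candidates)   # smallest ratio wins; ties broken by position

# x_B = (1, 1, 1) and basic part of d_q = (-2, 0, -1): alpha = min(1/2, 1) = 1/2,
# so the basic variable in position 0 leaves the basis.
print(min_ratio_step([1.0, 1.0, 1.0], [-2.0, 0.0, -1.0]))
```

A zero component of x_basic paired with a negative direction component yields α = 0, which is exactly the degenerate case described above.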
Sec. 3.3 Algebra of the Simplex Method 37
Just like the pivot-in process, there may be several basic variables achieving the
minimum ratio at the same time. Among these candidates for pivot-out, different variants
of the simplex method pick different candidates. But no evidence supports a particular
variant of the simplex method for all cases.
The following theorem summarizes the discussions in this subsection:
Theorem 3.4. Let
x* = [B⁻¹b; 0]
be a basic feasible solution to the linear programming problem defined by (3.1) with
basis B. If a reduced cost r_q < 0 is found for some nonbasic variable x_q, then the edge
direction dq given by (3.10) together with the step length a determined by (3.15) lead to
another basic feasible solution whose objective value is no worse than that of the current
one.
Note that if d_q is a feasible direction and α > 0, then the adjacent basic feasible
solution obtained in Theorem 3.4 represents a distinct extreme point with improved
objective value. However, when α = 0, we stay at the same extreme point with the
same objective value. Moreover, based on Theorems 3.1-3.4, we can sketch the key
steps of the simplex algorithm as follows:
Step 1: Find a basic feasible solution x with a basis matrix B and nonbasis
matrix N.
Step 2: Compute the reduced cost r_q for each nonbasic variable x_q according to
Equation (3.8). If r_q ≥ 0 for each nonbasic variable, then stop: the current basic
feasible solution is optimal. Otherwise, go to Step 3.
Step 3: Compute the direction of movement d_q with r_q < 0 according to (3.10). If
d_q ≥ 0, then the linear programming problem is unbounded. Otherwise, compute
the step length α according to (3.15), update the current basic feasible solution by
x ← x + αd_q, and update the corresponding basis matrix. Go to Step 2.
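Steps 1-3 can be assembled into a compact dense-algebra sketch (our own, not the book's revised procedure; it uses the most-negative-reduced-cost entering rule and includes no anti-cycling safeguard, so it is reliable only on nondegenerate problems):

```python
import numpy as np

def simplex(A, b, c, basis, tol=1e-9):
    """Steps 1-3 above: price, test optimality, pivot.

    `basis` must index a starting basic feasible solution (B^{-1} b >= 0).
    Returns (x, 'optimal') or (None, 'unbounded').
    """
    m, n = A.shape
    basis = list(basis)
    while True:
        B = A[:, basis]
        x_B = np.linalg.solve(B, b)
        y = np.linalg.solve(B.T, c[basis])       # simplex multipliers
        r = c - A.T @ y                          # reduced costs, cf. (3.6)/(3.8)
        if (r >= -tol).all():                    # Step 2: stopping rule
            x = np.zeros(n)
            x[basis] = x_B
            return x, 'optimal'
        q = int(np.argmin(r))                    # entering variable
        d_B = -np.linalg.solve(B, A[:, q])       # basic part of edge direction
        if (d_B >= -tol).all():                  # Step 3: unboundedness check
            return None, 'unbounded'
        # minimum ratio test (3.15): pick the leaving basic variable
        alpha, p = min((-x_B[k] / d_B[k], k) for k in range(m) if d_B[k] < -tol)
        basis[p] = q

# The polytope of (2.10) with cost z = -x1 - x2, starting from the basis {x3, x4}.
A = np.array([[1.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]])
b = np.array([10.0, 10.0])
c = np.array([-1.0, -1.0, 0.0, 0.0])
x, status = simplex(A, b, c, [2, 3])
print(status, x, c @ x)
```

On this data the sketch moves from the origin to the optimal extreme point in one pivot, reaching objective value −10.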
Example 3.1
Minimize −x1 − x2 − x3
subject to 2x1 + x4 = 1
2x2 + x5 = 1
2x3 + x6 = 1
x1, ..., x6 ≥ 0
Note that
b = [1 1 1]ᵀ
and
c = [−1 −1 −1 0 0 0]ᵀ
Step 1: Let us pick x4, x5, and x6 as basic variables; then
B = [1 0 0
     0 1 0
     0 0 1]
B = {4, 5, 6}, N = {1, 2, 3}
x + αd_1 = [0 0 0 1 1 1]ᵀ + (1/2) [1 0 0 −2 0 0]ᵀ = [1/2 0 0 0 1 1]ᵀ
Note that the new basic variables are x5, x6, x1 and the nonbasic variables are x2, x3, x4.
Moreover,
x_B = [x5 x6 x1]ᵀ = [1 1 1/2]ᵀ
x_N = [x2 x3 x4]ᵀ = [0 0 0]ᵀ
Sec. 3.4 Starting the Simplex Method 39
N = [A2 A3 A4] = [0 0 1
                  2 0 0
                  0 2 0]
By now we have completed one simplex iteration. It is easy to verify that the
new solution corresponds to the vertex (1/2, 0, 0) with a reduced objective value
-1/2. If we go back to Step 2 for two more iterations, we can reach an optimal
solution
(x1 x2 x3 x4 x5 x6)ᵀ = [1/2 1/2 1/2 0 0 0]ᵀ
with an objective value -3/2.
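The stated optimum can be cross-checked with an off-the-shelf solver (a sketch of our own using SciPy's linprog; the constraint 2x3 + x6 = 1 is inferred from the example's data and optimal solution):

```python
from scipy.optimize import linprog

# Example 3.1: min -x1 - x2 - x3
# s.t. 2x1 + x4 = 1, 2x2 + x5 = 1, 2x3 + x6 = 1, x >= 0
res = linprog(c=[-1, -1, -1, 0, 0, 0],
              A_eq=[[2, 0, 0, 1, 0, 0],
                    [0, 2, 0, 0, 1, 0],
                    [0, 0, 2, 0, 0, 1]],
              b_eq=[1, 1, 1],
              bounds=[(0, None)] * 6)
print(res.x, res.fun)   # x = (1/2, 1/2, 1/2, 0, 0, 0), objective -3/2
```

Each x_i (i = 1, 2, 3) is capped at 1/2 by its own constraint, so the maximizing choice of all three is unique and the objective value −3/2 follows directly.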
As we discussed before, under the assumption of nondegeneracy, at each
iteration the step length α in Step 3 is always positive. This leads the simplex
method to a distinct extreme point with lower objective value after each iteration.
Hence the simplex method is not going to revisit any extreme point that has been
visited before. Since the total number of extreme points of the feasible domain P
is finite, we have the following theorem:
In the presence of degeneracy, the simplex method may be trapped into an endless
loop without termination. This phenomenon is called cycling, and we shall study it in
Section 3.5.
For some linear programming problems, it is easy to find a starting basic feasible solution.
But this task can be as difficult as finding an optimal solution from a given basic feasible
solution. In this section we introduce two commonly used starting mechanisms.
Consider the linear programming problem defined by (3.1). Without loss of generality,
we can assume b ≥ 0. Remember that we have n variables and m constraints. We let
xᵃ = (x1ᵃ, x2ᵃ, ..., xmᵃ)ᵀ ∈ Rᵐ be an m-dimensional vector of artificial variables, and
[x; xᵃ] = [0; b]
is a basic feasible solution to the Phase I problem. Also note that the Phase I problem is
always bounded below by 0, since xᵃ ≥ 0 is required. Therefore, applying the simplex
method (with a cycling prevention mechanism to be discussed later) to a Phase I problem
always results in an optimal solution
[x⁰; xᵃ*]
There are two possible cases:
Case 1: xᵃ* ≠ 0. If xᵃ* ≠ 0, then the original problem is infeasible, since if the
original problem had a feasible solution x, then
[x; 0]
would be feasible to the Phase I problem with zero objective value. This violates the optimality
of the solution
[x⁰; xᵃ*]
Case 2: xᵃ* = 0. In this case, if the current basis does not contain any artificial
variable, then x⁰ forms a starting basic feasible solution to the original problem. In
particular, if it is nondegenerate, x⁰ has exactly m positive elements to form the
basis. On the other hand, if it is degenerate with at least one artificial variable remaining
in the basis, say x_iᵃ = 0 is the kth basic variable in the current basis, then we denote by
e_k the m-dimensional vector with its kth element equal to 1 and the rest equal to 0,
and consider the value e_kᵀB⁻¹A_q for each nonbasic variable x_q associated with
the current optimal solution.
There are two possibilities:
1. If e_kᵀB⁻¹A_q ≠ 0 for some nonbasic variable x_q, then we can bring x_q = 0 into the current
basis as a basic variable to replace x_iᵃ. In this case, the optimal solution to the
Phase I problem provides a starting basis without any artificial variable in it for
the original linear programming problem.
2. If e_kᵀB⁻¹A_q = 0 for every nonbasic variable x_q, then we know the kth row of the
constraint set Ax = b is redundant. In this case we can remove that redundant row
from the original constraints and restart the Phase I problem.
Unlike the two-phase method, the big-M method imposes a large penalty M > 0 for
each artificial variable and solves the following linear programming problem:
Minimize z = Σ_{j=1}^n c_j x_j + Σ_{i=1}^m M x_iᵃ (3.17a)
[x; xᵃ] = [0; b]
is a starting basic feasible solution, and M can be thought of as the penalty to be paid
for xᵃ ≠ 0. In theory, when M is chosen to be large enough, the artificial variables will
not appear in the final solution. In reality, we may raise a fundamental issue, namely:
how big should M be? This is a very important issue in implementation. Consider the
following simple example:
Example 3.2
Minimize x1
It is clear that x̄ᵀ = (1, 0, 0, 0) and x̃ᵀ = (0, 0, 0, ε) are two basic feasible solutions
to the big-M problem with objective values 1 and εM, respectively. Since x̄ corresponds
to a basic feasible solution to the original problem but x̃ does not, we have to make sure
that 1 < εM, or M > 1/ε, for any given ε > 0. Consequently we see the difficulty of
choosing M for a general implementation.
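Since Example 3.2's data are not reproduced in full here, the toy problem below is a stand-in of our own: min x1 + M·xa subject to x1 + xa = 1, x1, xa ≥ 0. It exhibits the same failure mode when M is too small (using SciPy's linprog):

```python
from scipy.optimize import linprog

def solve_big_m(M):
    """Hypothetical toy big-M problem: min x1 + M*xa, s.t. x1 + xa = 1, x >= 0."""
    res = linprog(c=[1.0, M],
                  A_eq=[[1.0, 1.0]], b_eq=[1.0],
                  bounds=[(0, None)] * 2)
    return res.x

print(solve_big_m(0.5))    # penalty too small: the artificial xa stays positive
print(solve_big_m(100.0))  # M large enough: xa driven to zero, x1 = 1 recovered
```

With M = 0.5 the solver prefers the artificial endpoint (0, 1) at cost 0.5, a "solution" that is infeasible for the underlying problem; any M > 1 restores the intended answer, mirroring the condition M > 1/ε derived above.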
When the simplex method is applied to solve the big-M problem (3.17) with
sufficiently large M > 0, since the problem is feasible, we either arrive at an optimal
solution to the big-M problem or conclude that the big-M problem is unbounded below.
But what can we say about the original problem in each. case?
Case 1: The big-M problem has a finite optimum at (x*, xᵃ*). In this case, if
xᵃ* = 0, then x* is an optimal solution to the original linear programming problem. The
reason is that, for each feasible solution x of the original linear program,
[x; 0]
is a feasible solution to the big-M problem. Hence we know
cᵀx = cᵀx + M·0 ≥ cᵀx* + M Σ_{i=1}^m x_iᵃ* = cᵀx*
On the other hand, if xᵃ* ≠ 0, then we can conclude that the original linear programming
problem has no feasible solution. To show this, we assume that the original problem
has a feasible solution x. Then
[x; 0]
is a feasible solution to the big-M problem and
cᵀx = cᵀx + M·0 ≥ cᵀx* + M Σ_{i=1}^m x_iᵃ*
But this inequality is impossible, since M is sufficiently large and at least one x_iᵃ* > 0.
Therefore x could not be a feasible solution to the original problem.
We have seen that the step length α can assume zero value if one or more of the basic
variables involved in the minimum ratio test turn out to be zero. In this case, the current
basic feasible solution is degenerate, and the new basic feasible solution stays at the
same extreme point although the new basis is different. In other words, although it
appears that geometrically we are stagnant at an extreme point of the feasible domain P,
algebraically we are not. Thus technically we can proceed with the simplex iterations
even if a = 0, but the real danger is that at some point we might return to the old basis.
This phenomenon is called cycling. Note that for a degenerate basic feasible solution x
with p (< m) positive components, we may have up to
C(n − p, n − m) = (n − p)! / [(n − m)! (m − p)!]
different bases corresponding to the same extreme point x.
Sec. 3.5 Degeneracy and Cycling 43
The following example, given
by E. M. L. Beale in 1955, shows that the simplex method could be trapped into cycling
if the largest reduction rule is used for entering the basis and the minimum ratio
test with the smallest-index rule as tie-breaker is used for leaving the basis.
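The count C(n − p, n − m) of bases sharing one degenerate extreme point is easy to check directly; for the vertex C = (10, 0) of the polytope (2.10) (n = 4, m = 2, p = 1) it gives 3 (a check of our own):

```python
from math import comb, factorial

def bases_per_degenerate_point(n, m, p):
    """C(n - p, n - m) = (n - p)! / ((n - m)! (m - p)!), as in the text."""
    return comb(n - p, n - m)

# Vertex C = (10, 0) of (2.10): n = 4, m = 2, p = 1 positive component.
print(bases_per_degenerate_point(4, 2, 1))   # 3 bases, one extreme point
# Cross-check against the factorial form:
assert bases_per_degenerate_point(4, 2, 1) == factorial(3) // (factorial(2) * factorial(1))
```

The three bases are {x1, x2}, {x1, x3}, and {x1, x4}: any basis containing the single positive variable x1 together with one of the zero-valued variables represents the same point.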
Example 3.3
Minimize −(3/4)x4 + 20x5 − (1/2)x6 + 6x7
subject to x1 + (1/4)x4 − 8x5 − x6 + 9x7 = 0
x2 + (1/2)x4 − 12x5 − (1/2)x6 + 3x7 = 0
x3 + x6 = 1
x1, ..., x7 ≥ 0
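As a numerical cross-check of our own (the coefficients below follow the standard statement of Beale's 1955 example), a modern cycling-safe solver finds the finite optimum that the cycling simplex variant never reaches:

```python
from scipy.optimize import linprog

# Beale's cycling example; variables x1..x7, equality constraints, x >= 0.
c = [0, 0, 0, -0.75, 20, -0.5, 6]
A_eq = [[1, 0, 0, 0.25, -8, -1, 9],
        [0, 1, 0, 0.5, -12, -0.5, 3],
        [0, 0, 1, 0, 0, 1, 0]]
b_eq = [0, 0, 1]
res = linprog(c=c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 7)
print(res.fun)   # optimal value -1.25
```

The optimum −5/4 is attained at x = (3/4, 0, 0, 1, 0, 1, 0): with x4 = x6 = 1 the two negative-cost variables are at their largest feasible values, and increasing x5 or x7 only raises the objective.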
Mx = [b; 0] (3.18)
where M is the n × n matrix whose first m rows are those of A and whose remaining
rows are the unit vectors e_jᵀ for the nonbasic indices j ∈ N. At a degenerate extreme
point we have, in addition,
x_p = 0 (3.20)
for some basic variable x_p. Hence x is overdetermined by more than n linear equations.
Therefore, some edge directions lead to infeasibility, as shown in Figure 3.1.
Note that the matrix M is called the fundamental matrix, and each edge direction
d_q is a column of M⁻¹ which corresponds to a nonbasic variable x_q.
Having looked at the trap of cycling, we need some means to prevent it from happening.
As we have seen, cycling can occur only when degeneracy is encountered. Since it is
intuitively clear that degenerate basic feasible solutions can be eliminated by slightly
perturbing the constraint parameters (resulting in only a slightly perturbed optimal value
and optimal solutions), it should not be surprising that cycling can be prevented. Among
several methods available for the prevention of cycling, two most commonly used are
the lexicographic rule proposed by G. B. Dantzig, A. Orden, and P. Wolfe in 1955, and
Bland's rule due to R. G. Bland in 1977.
Observe that in the absence of degeneracy the objective values at each iteration
of the simplex method form a strictly decreasing monotone sequence that guarantees no
basis will be repeated. When degeneracy is involved, the sequence is no longer strictly
decreasing. To prevent revisiting the same basis, we need to incorporate another index
to keep some strictly monotone property for cycling prevention.
Basically, the lexicographic rule is used to select a leaving variable from the current
basis. It ensures no cycles by the fact that while the objective value c_BᵀB⁻¹b may
remain constant in the presence of degeneracy, the vector [c_BᵀB⁻¹b | c_BᵀB⁻¹] can be kept
lexicographically monotone decreasing.
In this rule, we first use the minimum ratio test (3.15) to decide the pivot-out candidates. If the test generates a unique index, then the corresponding variable leaves the basis. In case there is a tie among several indices, we restrict ourselves to these indices and conduct another minimum ratio test, with the value of x_{j_i} replaced by its corresponding element in the vector B^{-1} A_{p_1}, where A_{p_1} is the column in A corresponding to the basic variable x_{p_1} with the smallest index. If the tie is still unbroken, we conduct further minimum ratio tests on the still-tied indices by using B^{-1} A_{p_2}, where A_{p_2} is the column in A corresponding to the basic variable x_{p_2} with the second smallest index, and so forth. In the exercises, we show that by the time all m columns of the basic variables have been used, the tie must be broken, and the unique index leads to a lexicographically monotone-decreasing sequence of [c_B^T B^{-1} b | c_B^T B^{-1}].
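The tie-breaking sequence above can be sketched in code. The sketch below is illustrative (the function name, dense-array representation, and tolerance are assumptions; real simplex codes work with a factorized basis). It forms the full lexicographic ratio vector for each tied row at once, which is equivalent to running the successive ratio tests column by column:

```python
import numpy as np

def lexicographic_leaving_row(Binv_b, Binv_A, q):
    """Choose the leaving row by the lexicographic rule.

    Binv_b : B^{-1} b, current values of the basic variables (m-vector)
    Binv_A : B^{-1} A, all updated columns of A (m x n)
    q      : index of the entering column; y = B^{-1} A_q supplies the
             ratio denominators (rows with y_i <= 0 never block the step).
    """
    y = Binv_A[:, q]
    candidates = [i for i in range(len(Binv_b)) if y[i] > 1e-12]
    # Rank each candidate row i by the tuple
    #   ((B^{-1}b)_i, (B^{-1}A)_{i,0}, ..., (B^{-1}A)_{i,n-1}) / y_i.
    # Comparing tuples entry by entry reproduces the successive
    # tie-breaking ratio tests described in the text.
    def ratio_vector(i):
        return tuple(np.concatenate(([Binv_b[i]], Binv_A[i, :])) / y[i])
    return min(candidates, key=ratio_vector)
```

For instance, with Binv_b = [4, 4] and an entering column [2, 2]^T the plain minimum ratio test ties at 2; the later entries of the ratio vectors then break the tie deterministically.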
Bland's rule is very simple. It specifies the choice of both the entering and leaving
variables. In this rule, variables are first ordered in sequence, then
Sec. 3.7 The Revised Simplex Method 45
1. Among all nonbasic variables with negative reduced costs, choose the one with the
smallest index to enter the basis.
2. When there is a tie in the minimum ratio test, choose the basic variable with the
smallest index to leave the basis.
Bland's rule actually creates the following monotone property (to be proved in an exercise): if a variable x_q enters the basis, then it cannot leave the basis until some other variable with a larger index, which was nonbasic when x_q entered, also enters the basis.
This monotone property prevents cycling, because in a cycle any variable that enters the
basis must also leave the basis, which implies that there is some largest indexed variable
that enters and leaves the basis. This certainly contradicts the monotone property.
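Both parts of Bland's rule reduce to "take the smallest index," which makes a sketch almost trivial (the function names and list-based bookkeeping here are illustrative assumptions):

```python
def bland_entering(reduced_costs, tol=1e-12):
    """Rule 1: among nonbasic variables with negative reduced cost,
    return the smallest index; None signals optimality."""
    for j, rj in enumerate(reduced_costs):
        if rj < -tol:
            return j
    return None

def bland_leaving(tied_rows, basis):
    """Rule 2: among rows tied in the minimum ratio test, return the
    row whose basic variable has the smallest index."""
    return min(tied_rows, key=lambda i: basis[i])
```

For reduced costs [0, 2, -1, -3] the entering variable is the one with index 2, not the most negative one; and if rows 0 and 2 tie while carrying basic variables x_5 and x_3, row 2 leaves.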
B d = −A_q   (3.23)

Step 6 (check for unboundedness): If d ≥ 0, then STOP; the problem is unbounded below.
Step 7 (leave the basis and step-length): Find an index j_p and a step-length α according to

α = −x_{j_p}/d_{j_p} = min_i { −x_{j_i}/d_{j_i} | d_{j_i} < 0 }   (3.24)

and update

x_q ← α   (3.25)

x_{j_i} ← x_{j_i} + α d_{j_i}   for i ≠ p   (3.26)

B ← B + (A_q − A_{j_p}) e_p^T   (3.27)

B ← B ∪ {q} \ {j_p}   (3.28)
Go to Step 1.
Note that B^{-1} is implicitly calculated in both (3.21) and (3.23). One can use the well-known Gauss-Jordan elimination method for solving these systems of equations. However, a more popular implementation, used in most modern computer packages, is the LU factorization method, since it is more efficient, accurate, and numerically stable. This method is particularly preferred when the problem is sparse and large-scale.
The basic idea of the LU factorization method is to triangularize the matrix B as the product of a lower triangular matrix L and an upper triangular matrix U. In this way, solving (3.21) becomes solving

U^T L^T w = c_B   (3.29)

To do so, we first solve

U^T y = c_B   (3.30)
by the forward solve process, in which we obtain the first element of y directly from
the first equation in (3.30), and then substitute it into the second equation in (3.30) for
the second element of y, and so on. Once y is obtained, then w can be obtained by
solving
L^T w = y   (3.31)
This time, since LT is upper triangular, w can be easily solved by the backward solve
process, in which we obtain the last element of w directly from the last equation in
(3.31), and then substitute it into the second last equation in (3.31) for the second last
element of w, and so on. Similar techniques work for solving (3.23), too. For a more serious implementation, one is encouraged to learn how to obtain the L and U factors, how to update them, and how to use scaling techniques for numerical accuracy. These topics will be covered in Chapter 10.
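As a concrete illustration of the two triangular solves (a minimal dense sketch; production LU codes add pivoting, sparsity handling, and factor updates):

```python
import numpy as np

def forward_solve(L, b):
    """Solve L y = b for lower triangular L by forward substitution."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def backward_solve(U, b):
    """Solve U x = b for upper triangular U by backward substitution."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

def solve_BTw(L, U, c_B):
    """Solve B^T w = c_B given B = LU: first U^T y = c_B (U^T is
    lower triangular, so a forward solve), then L^T w = y (L^T is
    upper triangular, so a backward solve)."""
    y = forward_solve(U.T, c_B)
    return backward_solve(L.T, y)
```

Given B = LU, solve_BTw performs exactly the two passes described above: a forward solve with U^T followed by a backward solve with L^T.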
Also note that in (3.27), e_p is an m-vector with one as its pth element and zero everywhere else. The right-hand side of (3.27) uses matrix operations to replace the column A_{j_p} by A_q in the basis B. The reader can easily check this. As to (3.28), it simply means that we add the index q and drop j_p in the index set B.
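The update (3.27) can be checked directly: the outer product (A_q − A_{j_p}) e_p^T is zero everywhere except in column p. A small sketch (the function name is illustrative):

```python
import numpy as np

def replace_basis_column(B, A_q, A_jp, p):
    """Update (3.27): B <- B + (A_q - A_jp) e_p^T. The outer product
    touches only column p, so the old basic column A_jp is replaced
    by the entering column A_q."""
    m = B.shape[0]
    e_p = np.zeros(m)
    e_p[p] = 1.0
    return B + np.outer(A_q - A_jp, e_p)
```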
Given below are two examples, one nondegenerate and one with degenerate basic feasible solutions, which illustrate how the revised simplex method works.
Example 3.4 Nondegenerate
Consider the following linear programming problem:

Minimize −3x1 − 2x2
subject to x1 + x2 + x3 = 40
2x1 + x2 + x4 = 60
x1, x2, x3, x4 ≥ 0
Note that this problem and Example 2.1 have the same feasible domain but different objective functions. The graph of its feasible domain is given in Figure 2.2. To
solve this problem, we first note that

A = [ 1 1 1 0 ]        c^T = [ −3 −2 0 0 ]
    [ 2 1 0 1 ],

and start from the basic feasible solution x^0 = [0 0 40 60]^T with the basis matrix

B = [ 1 0 ]
    [ 0 1 ]

index set B = {3, 4}, and c_B = [0 0]^T. This solution corresponds to the extreme point (0, 0) in Figure 2.2, and the objective value at this point is zero. The revised simplex method proceeds as follows.
Step 1: Solve B^T w = c_B, i.e.,

[ 1 0 ] w = [ 0 ]
[ 0 1 ]     [ 0 ]

which implies that w = [0 0]^T.
Step 2: Compute the reduced costs. The most negative one is r1 = −3, so x1 enters the basis. Solving Bd = −A1 gives d = [−1 −2]^T, and the minimum ratio test determines the step-length

α = −x4/d4 = min_{3≤i≤4} { −x_i/d_i | d_i < 0 } = min { −40/−1, −60/−2 } = 30
The new basic feasible solution is adjacent to the vertex (x1, x2) = (0, 0) but has a smaller objective value (−90 as opposed to 0).
On the next iteration, the revised simplex method will step to the vertex whose first
two coordinates are (20, 20), and this one will turn out to be optimal. Its objective
value is -100.
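The reported optimum can be double-checked by brute force. The data below restate Example 3.4 with the objective taken as minimize −3x1 − 2x2 (reconstructed from the iterates reported above, so treat the coefficients as an assumption); enumerating the vertices of the feasible domain in (x1, x2)-space confirms the optimal vertex (20, 20) with value −100:

```python
from itertools import combinations
import numpy as np

# Assumed example data: minimize -3*x1 - 2*x2
# subject to x1 + x2 <= 40, 2*x1 + x2 <= 60, x >= 0.
A = np.array([[1.0, 1.0], [2.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([40.0, 60.0, 0.0, 0.0])
c = np.array([-3.0, -2.0])

# Enumerate candidate vertices as intersections of two constraint
# lines, keep the feasible ones, and take the best objective value.
vertices = []
for i, j in combinations(range(len(b)), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) > 1e-9:
        x = np.linalg.solve(M, b[[i, j]])
        if np.all(A @ x <= b + 1e-9):
            vertices.append(x)

best = min(vertices, key=lambda x: c @ x)
print(best, c @ best)   # [20. 20.] -100.0
```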
Example 3.5 Degenerate
Now we add one more constraint, x1 ≤ 30, to make the previous example degenerate at (30, 0), and consider the following problem:

Minimize −3x1 − 2x2
subject to x1 + x2 + x3 = 40
2x1 + x2 + x4 = 60
x1 + x5 = 30
x1, x2, x3, x4, x5 ≥ 0
Figure 3.2 is a two-dimensional graph of the feasible domain P. Note that the
extreme point (30, 0) is now "overdetermined" as the intersection of three lines, instead
of two.
Figure 3.2
As in Example 3.4, we start from the extreme point (0, 0), which corresponds to the basic feasible solution x^0 = [0 0 40 60 30]^T with the basis matrix

B = [ 1 0 0 ]
    [ 0 1 0 ]
    [ 0 0 1 ]

and the index set B = {3, 4, 5}.
Carrying out one iteration of the revised simplex method as we did in Example 3.4, the new solution becomes x^1 = [30 0 10 0 0]^T, a degenerate solution obtained from the negative reduced cost r1 and the minimum ratio test with α = 30 and a tie between x4 and x5. This means x1 is entering the basis and either x4 or x5 is leaving the basis. Let us choose x5 to leave the basis. Therefore our current basis
becomes
B = [ 1 0 1 ]
    [ 0 1 2 ]
    [ 0 0 1 ]

with the index set B = {3, 4, 1}.
Continuing one more iteration, it can easily be checked that w = [0 0 −3]^T, r2 < 0, d = [−1 −1 0]^T, and α = min { −10/−1, −0/−1 } = 0. Therefore we know that
x2 enters the basis, x4 leaves the basis, and the step length is zero. Updating the related information, we have reached a new basic feasible solution x^2 = [30 0 10 0 0]^T with a new basis index set B = {3, 2, 1}. Note that x^1 = x^2. This means we actually stay at the same extreme point (30, 0) in Figure 3.2, owing to the zero step length.
After one more iteration, it can be checked that x5 enters the basis to replace x3, and the current basic feasible solution x^3 = [20 20 0 0 10]^T is an optimal solution.
Note that in the previous example the revised simplex method avoids the cycling problem even without any cycling-prevention mechanism. But this is not true in general.
In this chapter, we have seen the basic concepts behind the simplex method and have developed a computationally attractive procedure to implement the revised simplex method.
In the next chapter we will study the duality theory of linear programming and develop two more implementation schemes for the simplex method, namely the dual simplex method and the primal-dual simplex method. The performance of the simplex method
will be discussed in Chapter 5.
REFERENCES
3.1. Bartels, R. H., and Golub, G. H., "The simplex method for linear programming using LU
decomposition," Communications of the ACM 12, 266-268 (1969).
3.2. Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D., Linear Programming and Network Flows (2d
ed.), John Wiley, New York (1990).
3.3. Beale, E. M. L., "Cycling in the dual simplex algorithm," Naval Research Logistics Quarterly
2, 269-276 (1955).
3.4. Bland, R. G., "New finite pivoting rules for the simplex method," Mathematics of Operations
Research 2, 103-107 (1977).
3.5. Chvatal, V., Linear Programming, Freeman, San Francisco (1983).
3.6. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
3.7. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Redwood City, CA (1991).
3.8. Goldfarb, D., and Todd, M. J., "Linear programming," in Optimization, Handbooks in Operations Research and Management Science, Vol. 1, ed. by Nemhauser, G. L., and Rinnooy Kan, A. H. G., 73-170, Elsevier-North Holland, Amsterdam (1989).
3.9. Golub, G. H., and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press,
Baltimore (1983).
3.10. Kotiah, T., and Slater, N., "On two-server Poisson queues with two types of customers,"
Operations Research 21, 597-603 (1973).
3.11. Luenberger, D. G., Introduction to Linear and Nonlinear Programming (2d ed.), Addison-
Wesley, Reading, MA (1973).
EXERCISES
3.1. Consider a linear programming problem with feasible domain P = {x ∈ R^n | Ax ≤ b, x ≥ 0}. If b ≥ 0, suggest an easy way to find a starting basic feasible solution.
3.2. Consider a linear programming problem in its standard form with P = {x ∈ R^n | Ax = b, x ≥ 0}.
(a) Let d ∈ R^n. Show that a necessary condition for d to be a feasible direction is that Ad = 0.
(b) Suppose x = (x1, ..., xn)^T ∈ P with x_i > 0 whenever d_i ≠ 0. Show that there exists a scalar α > 0 such that x + αd ≥ 0.
3.3. Consider the following linear programming problem:
(b) Applying the Gaussian elimination method to the three constraints, you can express basic
variables x3, x4, and xs in terms of nonbasic variables x1 and x2. Now reformulate the
linear programming problem in terms of two nonbasic variables.
(c) Draw a two-dimensional graph for the reformulated linear programming problem and
explain why you can represent the feasible domain P of the original linear program by
a two-dimensional graph, although P c R 5 .
(d) Mark the basic feasible solution of (a) on the two-dimensional graph.
(e) Calculate B- 1A and B- 1b and compare the results with those in (b). What is your
conclusion and explanation?
(f) Go back to (a) and write down the direction vectors d 1 and d2 . Compute the corre-
sponding reduced costs.
(g) Between d 1 and d2 , which direction leads to a potential reduction in the objective value?
This is your direction of translation. How far can we proceed along that direction without
violating the nonnegativity constraints? This is your step length.
(h) Take the direction of translation and step length of (g) to move to a new solution. Show
that the new solution is not only a basic feasible solution but also an adjacent extreme
point of the previous one.
(i) Update all data in (a). Is the new solution optimal? Why?
(j) Redo (b) and (c) in terms of the new basic and nonbasic variables. Notice that the new graph is different from the previous two-dimensional graph. Why?
(k) Summarize what you have learned from (a) to (j).
(l) In general, if a given linear programming problem in its standard form has n variables and n − 2 nonredundant constraints, you can always have a two-dimensional graphic representation for it. Why?
3.4. Complete the simplex iterations for Example 3.1.
3.5. In Case 2 of the two-phase method, prove the following statement: "If e_k^T B^{-1} A_q = 0 for every nonbasic variable x_q, then the kth row of the constraint set Ax = b must be redundant."
3.6. In Case 1 of the big-M method, verify that the inequality

c̄^T x̄ = c^T x + M Σ_{i=1}^{m} x_i^a ≤ c^T x* + M Σ_{i=1}^{m} x_i^{a*}

holds.
3.8. Show that for a degenerate basic feasible solution x with p (< m) positive components, we may have up to

C(n − p, n − m) = (n − p)! / ((n − m)! (m − p)!)

different bases.
3.12. Consider the linear system Bx = y, where B is a given 4 × 4 matrix and y a given right-hand-side vector.
(a) Show that B can be factorized as the product of a lower triangular matrix L and an upper triangular matrix U.
(b) Define w = Ux, then solve Lw = y for w by the forward solve process.
(c) Solve Ux = w for x by the backward solve process.
3.13. Complete the simplex iterations of Example 3.4.
3.14. Complete the simplex iterations of Example 3.5.
3.15. Draw a detailed flow chart of the revised simplex method with the two-phase method for
computer implementation.
3.16. Develop computer codes based on the flow chart of the last exercise and test the following
problems:
(a) Minimize x1 + x2 + x3 − 3x4 + 6x5 + 4x6
subject to x2 + x3 − x4 + 4x5 + x6 = 3
x1, x2, x3, x4, x5, x6 ≥ 0
(b) Minimize −x4 + x5 + x6 + 2x7
subject to x1 + x4 + x5 + x6 + x7 = 1
(e) Minimize −x3
subject to x1 ≤ 1
x2 ≥ 0.00000001 x1
x2 ≤ 1 − 0.00000001 x1
x3 ≥ 0.00000001 x2
x3 ≤ 1 − 0.00000001 x2
x1, x2, x3 ≥ 0
3.17. Analyze the computer outputs of Exercise 3.16 and comment on the special properties of each subproblem.
4
Duality Theory and Sensitivity Analysis
The notion of duality is one of the most important concepts in linear programming.
Basically, associated with each linear programming problem (we may call it the primal
problem), defined by the constraint matrix A, the right-hand-side vector b, and the cost
vector c, there is a corresponding linear programming problem (called the dual problem)
which is constructed by the same set of data A, b, and c. A pair of primal and dual
problems are closely related. The interesting relationship between the primal and dual
reveals important insights into solving linear programming problems.
To begin this chapter, we introduce a dual problem for the standard-form linear
programming problem. Then we study the fundamental relationship between the primal
and dual problems. Both the "strong" and "weak" duality theorems will be presented.
An economic interpretation of the dual variables and dual problem further exploits the
concepts in duality theory. These concepts are then used to derive two important simplex
algorithms, namely the dual simplex algorithm and the primal dual algorithm, for solving
linear programming problems.
We conclude this chapter with sensitivity analysis, the study of the effects of changes in the parameters (A, b, and c) of a linear programming problem on its optimal solution. In particular, we study the effects of changing the cost vector, changing the right-hand-side vector, adding and removing a variable, and adding and removing a constraint in linear programming.
56 Duality Theory and Sensitivity Analysis Chap. 4
w^T A_q ≤ c_q,   ∀q ∈ N

and

w^T A_p = c_p,   ∀p ∈ B

or, equivalently,

A^T w ≤ c   (4.2)

Moreover,

b^T w = w^T b = c_B^T B^{-1} b = c_B^T x_B = c^T x
Example 4.1
The dual linear program of Example 2.1 becomes
w1, w2 ≤ 0
Example 4.2
For a linear programming problem in the "inequality form," i.e.,

Minimize c^T x
subject to Ax ≥ b, x ≥ 0

we can convert this problem into its standard form and then derive its dual problem.
As we are required to show in an exercise, it is the following:
Maximize b^T w
subject to A^T w ≤ c, w ≥ 0
These two linear programming problems are sometimes called a symmetric pair of primal
and dual programs, owing to the symmetric structure observed.
Note that both the primal and dual problems are defined by the same data set (A, b, c).
In this section, we study the fundamental relationship between the pair. First we show
that the concept of dual problem is well defined in the sense that we can choose either
one of the primal-dual pair as the primal problem and the other one becomes its dual
problem.
Lemma 4.1. Given a primal linear program, the dual problem of the dual linear
program becomes the original primal problem.
Proof. Let us start with problem (4.1). Its dual problem (4.3) may be expressed as a problem (4.5) in standard form. Note that problem (4.5) is in its standard form, so its dual problem becomes

Maximize z̄ = c^T w̄   (4.6a)
subject to [ A ; −A ; I ] w̄ ≤ [ −b ; b ; 0 ],   w̄ unrestricted   (4.6b)

Setting x = −w̄, the constraints of (4.6) force Ax = b and x ≥ 0, and maximizing c^T w̄ is equivalent to minimizing c^T x; that is, problem (4.6) is exactly the original problem (4.1).
Next we show that the primal (minimization) problem is always bounded below by
the dual (maximization) problem and the dual (maximization) problem is always bounded
above by the primal (minimization) problem, if they are feasible.
Several corollaries can be immediately obtained from the weak duality theorem:
Corollary 4.1.2. If the primal problem is unbounded below, then the dual prob-
lem is infeasible.
Proof Whenever the dual problem has a feasible solution w 0 , the weak duality
theorem prevents the primal objective from falling below bT w 0 .
Corollary 4.1.3. If the dual problem is unbounded above, then the primal prob-
lem is infeasible.
Note that the converse of either of the two foregoing corollaries is not true. For example, when the primal problem is infeasible, the dual could be either unbounded above or infeasible. However, if the primal is infeasible and the dual is feasible, then the dual must be unbounded above. Concrete examples are presented in the exercises.
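The weak duality bound is easy to observe numerically. The data below are a toy symmetric pair chosen for illustration (not taken from the text):

```python
import numpy as np

# A small symmetric primal-dual pair:
#   (P) min c^T x  s.t.  A x >= b, x >= 0
#   (D) max b^T w  s.t.  A^T w <= c, w >= 0
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 5.0])

x = np.array([2.0, 2.0])   # primal feasible: A @ x >= b, x >= 0
w = np.array([1.0, 1.0])   # dual feasible: A.T @ w <= c, w >= 0

assert np.all(A @ x >= b) and np.all(x >= 0)
assert np.all(A.T @ w <= c) and np.all(w >= 0)

# Weak duality: every dual feasible w bounds every primal feasible x
# from below.
assert b @ w <= c @ x
print(b @ w, c @ x)   # 10.0 16.0
```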
With these results, a stronger result can be stated as the following important theorem.

Theorem 4.2 (Strong Duality Theorem).
1. If either the primal or the dual linear program has a finite optimal solution, then
so does the other and they achieve the same optimal objective value.
2. If either problem has an unbounded objective value, then the other has no feasible
solution.
Proof. For the first claim, without loss of generality, let us assume that the primal problem has reached a finite optimum at a basic feasible solution x. If we apply the revised simplex method at x and define w^T = c_B^T B^{-1}, then

c − A^T w = [ c_B ] − [ B^T ] w = r ≥ 0   (4.8)
            [ c_N ]   [ N^T ]

and

c^T x = c_B^T x_B = c_B^T B^{-1} b = w^T b = b^T w   (4.9)
Owing to Corollary 4.1.1, we know w is an optimal solution to the dual linear program.
The proof of the second claim is a direct consequence of Corollary 4.1.2 and
Corollary 4.1.3.
The strong duality theorem has several implications. First of all, it says there is no duality gap between the primal and dual linear programs, i.e., c^T x* = b^T w*. This is not generally true for nonlinear programming problems. Second, in the proof of Theorem 4.2, the simplex multipliers (see Section 7 of Chapter 3), or Lagrange multipliers, become the vector w of dual variables. Furthermore, at each iteration of the revised simplex method, the dual vector w maintains the property c^T x = b^T w. However, unless all components of the reduced cost vector r are nonnegative, w is not dual feasible. Thus, the revised simplex method maintains primal feasibility and a zero duality gap while seeking dual feasibility. Needless to say, the simplex multipliers w* corresponding to a primal optimal solution x* form a dual optimal solution.
A celebrated application of the duality theorem is in establishing the existence of solutions to systems of equalities and inequalities. The following result, known as Farkas' lemma, concerns this aspect.
Consider the symmetric pair of primal and dual linear programs

Minimize c^T x   (P)
subject to Ax ≥ b, x ≥ 0

Maximize b^T w   (D)
subject to A^T w ≤ c, w ≥ 0
For the primal problem, we define

s = Ax − b ≥ 0   (4.13)

as the primal slackness vector. For the dual problem, we define

r = c − A^T w ≥ 0   (4.14)
as the dual slackness vector. Notice that s is an m-dimensional vector and r an n-
dimensional vector. Moreover, for any primal feasible solution x and dual feasible
solution w, we know

0 ≤ r^T x + s^T w = (c^T − w^T A) x + w^T (Ax − b) = c^T x − b^T w   (4.15)
Therefore, the quantity r^T x + s^T w is equal to the duality gap between the primal feasible solution x and the dual feasible solution w. This duality gap vanishes if, and only if,

r^T x = 0 and s^T w = 0   (4.16)
In this case, x becomes an optimal primal solution and w an optimal dual solution. Since all vectors x, w, r, and s are nonnegative, Equation (4.16) requires that "either r_j = 0 or x_j = 0 for j = 1, ..., n" and "either s_i = 0 or w_i = 0 for i = 1, ..., m." Hence (4.16) is called the complementary slackness conditions. This important result can be summarized as the following theorem:
Theorem 4.4 (Complementary Slackness Theorem). Let x be a feasible solution to the primal problem and w a feasible solution to the dual problem. Then x and w are optimal solutions to their respective problems if, and only if, the complementary slackness conditions (4.16) are satisfied.
As to the primal-dual pair of linear programs in the standard form, i.e.,

Minimize c^T x   (P)
subject to Ax = b, x ≥ 0

Maximize b^T w   (D)
subject to A^T w ≤ c

since the primal constraints are tight equalities (the slackness vector s vanishes), the condition w^T s = 0 is automatically met. Therefore, the complementary slackness conditions are simplified to r^T x = 0.
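The simplified conditions give a compact optimality certificate for the standard-form pair; the sketch below (the function name and tolerance are assumptions) checks primal feasibility, dual feasibility, and r^T x = 0:

```python
import numpy as np

def is_optimal_pair(A, b, c, x, w, tol=1e-9):
    """Certify optimality of (x, w) for the standard-form pair:
    Ax = b, x >= 0 (primal feasibility), r = c - A^T w >= 0 (dual
    feasibility), and r^T x = 0 (complementary slackness)."""
    r = c - A.T @ w
    primal_feasible = np.allclose(A @ x, b, atol=tol) and np.all(x >= -tol)
    dual_feasible = np.all(r >= -tol)
    complementary = abs(r @ x) <= tol
    return primal_feasible and dual_feasible and complementary
```

For min x1 + 2x2 subject to x1 + x2 = 1, x ≥ 0, the pair x = (1, 0), w = (1) passes all three checks, while x = (0, 1) fails complementary slackness against the same w.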
With this knowledge, we can state the Karush-Kuhn-Tucker (K-K-T) conditions for linear programming problems as follows:
Theorem 4.5 (K-K-T optimality conditions for LP). Given a linear programming problem in its standard form, a vector x is an optimal solution to the problem if, and only if, there exist vectors w and r such that
1. Ax = b, x ≥ 0 (primal feasibility);
2. A^T w + r = c, r ≥ 0 (dual feasibility);
3. r^T x = 0 (complementary slackness).
So far, we have seen that the dual linear program uses the same set of data as the primal problem, provides a lower bound on the primal objective value, and provides insight into the necessary and sufficient conditions for optimality. In this section, we intend to explain the meaning of the dual variables and give an economic interpretation of the dual problem.
Given a linear programming problem in its standard form, the primal problem can be viewed as a process of providing different services (x ≥ 0) to meet a set of customer demands (Ax = b) in the least expensive manner, with minimum cost (min c^T x).
For a nondegenerate optimal solution x* obtained by the revised simplex method, we have
In this way, the dual linear program leads the manufacturer to come up with a least-cost
plan in which the purchasing prices are acceptable to the "smart" supplier.
The foregoing scenarios not only provide economic interpretations of the primal and dual linear programming problems, but also explain the implications of the complementary slackness conditions. Assume that the manufacturer already has b_i (i = 1, ..., m) units of resource i on hand. Then,
1. the ith component w_i* of the optimal dual vector represents the maximum marginal price that the manufacturer is willing to pay in order to get an additional unit of resource i from the supplier;
2. when the ith resource is not fully utilized (i.e., a_i x* < b_i, where a_i is the ith row of A and x* is an optimal primal solution), the complementary slackness condition requires that w_i* = 0, which means the manufacturer is not willing to pay a penny to get an additional amount of that resource;
3. when the supplier asks too much (i.e., when A_j^T w* > c_j, where A_j is the jth column of A), the complementary slackness condition requires that x_j* = 0, which means that the manufacturer is no longer willing to produce any amount of product j.
Many other interpretations of the dual variables, dual problems, and complementary
slackness conditions can be found in the exercises.
With the concept of duality in mind, we now study a variant of the revised simplex
method. Basically, this variant is equivalent to applying the revised simplex method to
the dual linear program of a given linear programming problem. Hence we call it the
dual simplex method.
Recall that the basic philosophy of the revised simplex method is to keep primal feasibility and the complementary slackness conditions while seeking dual feasibility at its optimal solution. Similarly, the dual simplex method keeps dual feasibility and the complementary slackness conditions but seeks primal feasibility at its optimum.
Let us start with a basis matrix B which results in a dual feasible solution w such that

B^T w = c_B and r_N = c_N − N^T w ≥ 0   (4.18)
Define the corresponding primal basic solution by x_B = B^{-1} b and x_N = 0. Then

Ax = [B | N] [ B^{-1} b ] = b   (4.20)
             [    0    ]
and

c^T x = c_B^T x_B = c_B^T B^{-1} b = w^T b   (4.21)
Therefore, the dual feasibility and complementary slackness conditions are satisfied in this setting. However, primal feasibility is not satisfied unless x_B = B^{-1} b ≥ 0. In other words, before reaching an optimal solution, there exists at least one p ∈ B (the index set of basic variables in the primal problem) such that x_p < 0. The dual simplex method will reset x_p = 0 (that is, drop x_p from the basic variables) and choose an "appropriate" nonbasic variable x_q ∉ B to enter the basis. Of course, during this pivoting process, the dual feasibility and complementary slackness conditions should be maintained. This is the key idea behind the dual simplex method.
Note that the complementary slackness conditions are always satisfied because of the way we defined w and x; hence we only have to concentrate on dual feasibility. Remember that, in Chapter 3, we showed that dual feasibility is associated with the reduced cost vector

r = [ r_B ]
    [ r_N ]

and the fundamental matrix

M = [ B  N ]
    [ 0  I ]

with its inverse

M^{-1} = [ B^{-1}  −B^{-1}N ]
         [ 0        I      ]

Thus the information on the dual variables and dual feasibility is embedded in the following equation:

(w^T | r_N^T) = c^T M^{-1} = (c_B^T B^{-1} | c_N^T − c_B^T B^{-1} N)   (4.22)
Needless to say, after each pivot a new basic variable is introduced to replace an old one, which results in a new fundamental matrix that produces new information on the dual according to Equation (4.22). Therefore, in order to maintain dual feasibility, we exploit the matrix M^{-1} first.
Note that the fundamental matrix M is an n × n matrix, and a direct inversion requires O(n^3) elementary operations. In order to reduce the computational effort, and also to reveal the new dual information in an explicit form, we introduce the Sherman-Morrison-Woodbury formula to modify the inverse of the fundamental matrix after each pivot.
We first investigate the change of the fundamental matrix (from M to M̄) after each pivot. In this case, we assume that x_p leaves the basis and x_q enters the basis.
Sec. 4.5 The Dual Simplex Method 67
Let e_j be an n-dimensional unit vector with 1 as its jth component and 0 for the rest. Then the new fundamental matrix M̄ can be obtained according to

M̄ = M + e_q (e_p − e_q)^T   (4.23)
Example 4.4
Assume that x^T = [x1 x2 x3 x4 x5], that x1, x2 are basic variables, that x3, x4, x5 are nonbasic, and that, correspondingly,

M = [ 1 2 3 4 5 ]
    [ 5 6 7 8 9 ]
    [ 0 0 1 0 0 ]
    [ 0 0 0 1 0 ]
    [ 0 0 0 0 1 ]

Suppose that x1 is leaving the basis (p = 1) and x5 is entering the basis (q = 5). The new fundamental matrix is given by

M̄ = M + e_5 [ (1 0 0 0 0) − (0 0 0 0 1) ]

  = [ 1 2 3 4 5 ]
    [ 5 6 7 8 9 ]
    [ 0 0 1 0 0 ]
    [ 0 0 0 1 0 ]
    [ 1 0 0 0 0 ]
The inverse matrix of the new fundamental matrix can be obtained with the help
of the following Sherman-Morrison-Woodbury formula.
Lemma 4.2. Let M be an n × n nonsingular matrix, and let u, v ∈ R^n satisfy 1 + v^T M^{-1} u ≠ 0. Then

(M + u v^T)^{-1} = M^{-1} − ( M^{-1} u v^T M^{-1} ) / ( 1 + v^T M^{-1} u )

Proof. Multiplying M + u v^T by the right-hand side above gives

(M + u v^T) [ M^{-1} − ( M^{-1} u v^T M^{-1} ) / ( 1 + v^T M^{-1} u ) ]
  = I + u v^T M^{-1} − ( (1 + v^T M^{-1} u) u v^T M^{-1} ) / ( 1 + v^T M^{-1} u )
  = I + u v^T M^{-1} − u v^T M^{-1} = I
Note that once M^{-1} is known, the inverse of (M + u v^T) can be found in O(n^2) elementary operations. This is sometimes called the rank-one updating method.
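Lemma 4.2 transcribes directly into code; the sketch below (illustrative names) performs the O(n^2) update and can be checked against a full re-inversion:

```python
import numpy as np

def sherman_morrison_update(M_inv, u, v):
    """Rank-one update of a known inverse in O(n^2):
    (M + u v^T)^{-1} = M^{-1} - (M^{-1} u v^T M^{-1}) / (1 + v^T M^{-1} u).
    Assumes 1 + v^T M^{-1} u != 0."""
    Mu = M_inv @ u
    vM = v @ M_inv
    return M_inv - np.outer(Mu, vM) / (1.0 + v @ Mu)
```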
To derive the inverse of the new fundamental matrix M̄, we let u = e_q and v = e_p − e_q. Then Lemma 4.2 implies that

M̄^{-1} = M^{-1} − ( M^{-1} e_q (e_p − e_q)^T M^{-1} ) / ( 1 + (e_p − e_q)^T M^{-1} e_q )   (4.24)

Notice that e_q^T M^{-1} is the qth row of M^{-1}; hence it is e_q^T itself. Consequently,

M̄^{-1} = M^{-1} − ( M^{-1} e_q [ e_p^T M^{-1} − e_q^T ] ) / ( 1 + e_p^T M^{-1} e_q − e_q^T e_q )

        = M^{-1} − ( M^{-1} e_q [ e_p^T M^{-1} − e_q^T ] ) / ( e_p^T M^{-1} e_q )   (4.25)
(4.27)
or
(4.28)
We further define

u^T = e_p^T B^{-1}   (4.29)

w̄ = w + γ u   (4.32)
r̄_j = r_j − γ y_j,   for j ∈ N   (4.36)

If there exists j ∈ N such that y_j < 0, then −r_j/y_j ≥ −γ is required. Hence we must choose q such that

0 ≤ −γ = −r_q/y_q ≤ −r_j/y_j,   ∀ y_j < 0, j ∈ N   (4.37)

In other words, we should choose q so that the minimum ratio test

−r_q/y_q = min { −r_j/y_j | y_j < 0, j ∈ N }   (4.38)

is satisfied.
4. In case y_j ≥ 0 for all j ∈ N, then we know

y_j = u^T A_j = e_p^T B^{-1} A_j ≥ 0,   ∀j ∈ N   (4.39)

Therefore,

e_p^T B^{-1} A ≥ 0^T   (4.40)

Consequently, for any feasible x ≥ 0, we see that e_p^T B^{-1} A x ≥ 0. Notice that e_p^T B^{-1} A x = e_p^T B^{-1} b = e_p^T x_B = x_p. Hence (4.39) implies that x_p ≥ 0, which contradicts our assumption (of the dual simplex approach) that x_p < 0. This in turn implies that there is no feasible solution to the primal problem.
Incorporating the above observations into the dual simplex approach, we can now present
a step-by-step procedure of the dual simplex method for computer implementation. It
solves a linear programming problem in its standard form.
Step 1 (starting with a dual feasible basic solution): Given a basis B = [A_{j_1}, A_{j_2}, A_{j_3}, ..., A_{j_m}] of the constraint matrix A in the primal problem, with an index set B = {j_1, j_2, j_3, ..., j_m}, a dual basic feasible solution w can be obtained by solving the system of linear equations

B^T w = c_B

Compute the associated reduced cost vector r with

r_j = c_j − w^T A_j,   ∀j ∉ B
Step 2 (checking for optimality): Compute the vector x_B of primal basic variables by solving

B x_B = b

If x_B ≥ 0, then STOP; the current primal solution (with nonbasic variables set to zero) is optimal. Otherwise go to Step 3.
Step 3 (leaving the basis): Choose a basic variable x_{j_p} < 0 with index j_p ∈ B.
Step 4 (checking for infeasibility): Compute u by solving the system of linear equations

B^T u = e_p

Also compute y_j = u^T A_j for each j ∉ B. If y_j ≥ 0 for all j ∉ B, then STOP; the primal problem is infeasible. Otherwise go to Step 5.

Step 5 (entering the basis): Find the entering index q by the minimum ratio test

−r_q/y_q = min { −r_j/y_j | y_j < 0, j ∉ B }

Set

−r_q/y_q = −γ
Step 6 (updating the reduced costs):

r_j ← r_j − γ y_j,   ∀j ∉ B, j ≠ q
r_q ← 0
r_{j_p} ← −γ
Step 7 (updating current solution and basis): Compute d by solving

B d = −A_q

Set

x_q ← α = x_{j_p}/y_q = −x_{j_p}/d_{j_p}

x_{j_i} ← x_{j_i} + α d_{j_i},   ∀ j_i ∈ B, i ≠ p

B ← B + (A_q − A_{j_p}) e_p^T

B ← B ∪ {q} \ {j_p}

Go to Step 2.
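Steps 1 through 7 assemble into the following compact sketch. It uses dense NumPy solves, has no anti-cycling safeguards, and recomputes w, r, and x_B from scratch at each iteration instead of applying the incremental updates, so treat it as a teaching sketch rather than the book's implementation; the starting basis passed in must be dual feasible:

```python
import numpy as np

def dual_simplex(A, b, c, basis, max_iter=100):
    """Dual simplex sketch for min c^T x s.t. Ax = b, x >= 0.
    Returns an optimal x, or None if the primal is infeasible."""
    m, n = A.shape
    basis = list(basis)
    for _ in range(max_iter):
        B = A[:, basis]
        w = np.linalg.solve(B.T, c[basis])          # Step 1: B^T w = c_B
        r = c - A.T @ w                             # reduced costs (>= 0)
        x_B = np.linalg.solve(B, b)                 # Step 2: B x_B = b
        if np.all(x_B >= -1e-9):                    # primal feasible: optimal
            x = np.zeros(n)
            x[basis] = x_B
            return x
        p = int(np.argmin(x_B))                     # Step 3: x_{j_p} < 0 leaves
        u = np.linalg.solve(B.T, np.eye(m)[p])      # Step 4: B^T u = e_p
        y = A.T @ u                                 # y_j = u^T A_j
        cand = [j for j in range(n)
                if j not in basis and y[j] < -1e-9]
        if not cand:                                # all y_j >= 0: infeasible
            return None
        q = min(cand, key=lambda j: -r[j] / y[j])   # Step 5: dual ratio test
        basis[p] = q                                # Steps 6-7: pivot
    raise RuntimeError("iteration limit reached")
```

For instance, min 2x1 + 3x2 subject to x1 + x2 ≥ 3 and x1 ≥ 1 becomes, in standard form with surplus variables x3 and x4, a problem whose surplus basis {x3, x4} is dual feasible but primal infeasible; the sketch pivots to the optimum x = (3, 0, 0, 2).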
Example 4.5
Consider the following linear programming problem:

subject to x1 + x2 + x3 = 2

Step 1 (starting): Solving B^T w = c_B, we obtain a dual solution w. Computing r_j, ∀j ∉ B, we have r2 = 1 and r3 = 2, which implies that w is dual feasible.
Step 2 (checking for optimality): Since x_B has a negative component (x4 = −1 < 0), we continue; the minimum ratio test gives

−r2/y2 = min { −1/−1, −2/−1 } = 1 = −γ
r4 = −γ = 1 and r3 = 2 − γ y3 = 1

(also note that r2 has been changed from 1 to 0 as x2 enters the basis).
Step 7 (updating current solution and basis): Solving for d in the equation Bd = −A2, we obtain d. Also

x2 = α = x4/y2 = 1
x1 = 2 − 1 × 1 = 1
Thus the new primal vector has x 1 = x2 = 1 (and nonbasic variables x3 = X4 = 0).
Since it is nonnegative, we know it is an optimal solution to the original linear program.
The corresponding optimal basis B becomes
To start the dual simplex method, we need a basis matrix B which ensures a dual basic
feasible solution. In contrast with the artificial variable technique introduced in Chapter
3 to obtain a starting primal basic feasible solution for the revised simplex algorithm,
a popular method called the artificial constraint technique is used for the dual simplex
method. Basically, we can choose any nonsingular m × m submatrix B of A, and add one artificial constraint

Σ_{j∉B} x_j ≤ M

with a very large positive number M to the original problem. In this way, an additional slack variable x_{n+1} is added, and B ∪ {n + 1} becomes an index set of basic variables for the new problem. Among those nonbasic variables, choose the one with the minimum
value of the reduced cost r_j as the entering variable and x_{n+1} as the leaving variable. It
can be shown that a dual basic feasible solution can be identified by performing such a
single pivot.
Another way to obtain a dual basic feasible solution is by solving the following linear programming problem (possibly by applying the revised simplex method):

Minimize c^T x   (4.41a)
subject to Ax = Be, x ≥ 0   (4.41b)
where B is any m x m nonsingular submatrix of A and e is a vector of all ones. Note
that problem (4.41) has a starting basic feasible solution

x = [ x_B ] = [ e ]
    [ x_N ]   [ 0 ]

for the revised simplex method. If this leads to an optimal solution, the corresponding
dual solution can be chosen as an initial dual basic feasible solution. On the other hand,
if problem (4.41) becomes unbounded, we can show that the original linear program is
also unbounded. Hence no dual feasible solution can be found. This is left as an exercise
to the reader.
Before concluding this section, we would like to point out three facts:
1. Solving a linear program in its standard form by the dual simplex method is math-
ematically equivalent to solving its dual linear program by the revised (primal)
simplex method.
2. Solving a linear program by the dual simplex method requires about the same
amount of effort as the revised (primal) simplex method.
3. The dual simplex method is very handy in sensitivity analysis with an additional
constraint. This topic will be discussed in later sections.
As we discussed before, the dual simplex method starts with a basic feasible solution
of the dual problem and defines a corresponding basic solution for the primal problem
such that the complementary slackness conditions are met. Through a series of pivoting
operations, the method maintains the dual feasibility and complementary slackness con-
ditions and tries to attain the primal feasibility. Once the primal feasibility is achieved,
the K-K-T optimality conditions guarantee an optimal solution. In this section, we study
the so-called primal-dual method, which is very similar to the dual simplex approach but
allows us to start with a nonbasic dual feasible solution.
Consider a linear programming problem in its standard form, which we may refer to as the "original problem." Let w be a dual feasible (possibly nonbasic) solution. Then we know that c_j ≥ w^T A_j, ∀j, where A_j represents the jth column of the constraint matrix A. We are particularly interested in those binding (or tight) constraints and denote the index set T = {j | w^T A_j = c_j}. According to the complementary slackness theorem
74 Duality Theory and Sensitivity Analysis Chap.4
(Theorem 4.4), T is also the index set of primal variables which may assume positive
values. Now we consider the following linear programming problem:
Minimize z = eT xa (4.42a)
subject to Σ_{j∈T} Aj xj + xa = b (4.42b)
xj ≥ 0, ∀j ∈ T, and xa ≥ 0 (4.42c)
where xa is an m-dimensional vector of artificial variables.
Note that problem (4.42) only includes a subset of primal variables in the original
problem, hence it is called the restricted primal problem associated with the original one.
Also note that the following result is true.
Lemma 4.3. If the restricted primal problem has an optimal solution with zero
objective value, then the solution must be an optimal solution to the original problem.
Proof. Assume that (xT*, xa*) is an optimal solution to the restricted problem with zero
objective value. Since the optimal objective value of the restricted primal problem is
zero, we have xa* = 0 in this optimal solution. Therefore we can use xj* to construct a
primal feasible solution x to the original problem such that xj = xj* ≥ 0, ∀j ∈ T, and
xj = 0, ∀j ∉ T. Note that the restricted problem was defined on the basis of an existing
dual feasible solution w with cj = wT Aj, ∀j ∈ T, and cj > wT Aj, ∀j ∉ T. It is clear
that the complementary slackness conditions are satisfied in this case, since
(cj − wT Aj)xj = 0, ∀j. Thus the
K-K-T conditions are satisfied and the proof is complete.
If the optimal objective value of the restricted primal problem is not zero, say
z* > 0, then xT* is not good enough to define a primal feasible solution to the original
problem. In other words, a new dual feasible solution is needed to reconstruct the
restricted primal problem with reduced value of z*. In doing so, we also would like to
make sure that only new primal variables whose index does not belong to T are passed
on to the new restricted primal problem. To achieve our goal, let us consider the dual
problem of the restricted primal problem (4.42), i.e.,
Maximize z' = yT b (4.43a)
subject to yT Aj ≤ 0, ∀j ∈ T (4.43b)
y ≤ e, y unrestricted (4.43c)
Let y* be an optimal solution to this problem. Then the complementary slackness
conditions imply that y*T Aj ≤ 0, for j ∈ T. Only for those j ∉ T with y*T Aj > 0,
Sec. 4.6 The Primal Dual Method 75
the corresponding primal variable xj could be passed on to the restricted primal problem
with potential for lowering the value of z*. (Why?) More precisely, we may consider y*
as a moving direction in translating the current dual feasible solution w to a new dual
solution w', i.e., we define
w' = w + αy*, for some α > 0
Hence we have
cj − w'T Aj = cj − (w + αy*)T Aj = (cj − wT Aj) − α(y*T Aj) (4.44)
Now, for each j ∈ T, since cj − wT Aj = 0 and y*T Aj ≤ 0, we know cj − w'T Aj ≥ 0.
In order to keep w' dual feasible, we have to consider those j ∉ T with y*T Aj > 0.
Given the fact that cj − wT Aj ≥ 0, ∀j ∉ T, we can properly choose α > 0 according to
the following formula:
α = (ck − wT Ak) / (y*T Ak) = min { (cj − wT Aj) / (y*T Aj) | j ∉ T, y*T Aj > 0 } (4.45)
such that cj − w'T Aj ≥ 0, ∀j ∉ T. In particular, ck − w'T Ak = 0 and cj − w'T Aj ≥ 0,
for j ∉ T and j ≠ k. Then the primal variable xk is a candidate to enter the basis of the
new restricted primal problem, in addition to those primal variables in the basis of the
current restricted problem.
Following this process of adding primal variables into the restricted problem, we
may end up with one of the following two situations. Case 1: the optimal objective
value of a new restricted primal problem becomes zero. Then Lemma 4.3 assures us that
an optimal solution to the original problem is reached. Case 2: the optimal objective
value of a new restricted primal problem is still greater than zero, but y*T Aj ≤ 0, ∀j ∉ T.
Then we can show that the original primal problem is infeasible and its dual problem is
unbounded.
Summarizing the discussions in the previous section, we can write down a step-by-step
procedure for the primal-dual simplex method for computer implementation.
Step 5 (enter the basis of the restricted primal): Find an index k such that
(ck − wT Ak) / (y*T Ak) = min { (cj − wT Aj) / (y*T Aj) | j ∉ T, y*T Aj > 0 }
Also define a step length
α = (ck − wT Ak) / (y*T Ak)
Add the primal variable xk into the basis to form a new restricted primal problem.
Step 6 (update the dual feasible vector): Set
w ← w + αy*
Go to Step 1.
Note that the mechanisms of generating a starting dual feasible solution for
the dual simplex method can be applied here to initiate the primal-dual method.
The following example illustrates the procedures of the primal-dual algorithm.
Example 4.6
Minimize −2x1 − x2
subject to x1 + x2 + x3 = 2
x1 + x4 = 1
x1, x2, x3, x4 ≥ 0
Minimize x1^a + x2^a
subject to x2 + x1^a = 2
x2^a = 1
x2, x1^a, x2^a ≥ 0
Maximize 2y1 + y2
subject to y1 ≤ 0
y1 ≤ 1
y2 ≤ 1
y1, y2 unrestricted
Since x2 and x2^a are basic variables of the restricted primal, it follows from the
complementary slackness conditions that the first and third constraints of its dual
problem are tight. Therefore,
y* = [0, 1]T
is an optimal solution to this problem. We take it as the direction of translation for
the dual vector w.
Step 4 (check infeasibility/unboundedness): Now we proceed to compute the
values y*T Aj for j E { 1, 3, 4}. It can be easily verified that these values are 1, 0,
and 1 respectively. Therefore we continue.
Step 5 (enter the basis of the restricted primal): Compute cj − wT Aj for
j = 1, 3, 4. The values are 2, 1, and 3, respectively. Therefore,
α = min { 2/1, 3/1 } = 2
and k = 1. This implies that x1 should also enter the basis in addition to x2.
Step 6 (update the dual feasible vector): The new dual vector becomes
w ← w + αy* = (−1, −3)T + 2 (0, 1)T = (−1, −1)T
So far we have just completed one iteration and a new restricted primal
problem is generated:
Minimize x1^a + x2^a
subject to x1 + x2 + x1^a = 2
x1 + x2^a = 1
x1, x2, x1^a, x2^a ≥ 0
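The iteration just completed can be checked numerically. The sketch below recomputes the tight set T, the entering index k, the step length α, and the updated dual vector for the data of Example 4.6. The initial dual vector w = (−1, −3) does not survive explicitly in the text and is inferred here from the reported reduced costs (2, 1, 3), so treat it as an assumption of the sketch.

```python
# One primal-dual iteration on the data of Example 4.6 (columns 0-indexed).
A = [[1.0, 1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 1.0]]
c = [-2.0, -1.0, 0.0, 0.0]
w = [-1.0, -3.0]        # assumed initial dual feasible vector (see lead-in)
ystar = [0.0, 1.0]      # optimal dual solution of the restricted primal

def col(j):
    return [row[j] for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

reduced = [c[j] - dot(w, col(j)) for j in range(4)]   # c_j - w^T A_j
T = [j for j in range(4) if abs(reduced[j]) < 1e-12]  # tight set: column 1 (x2)
ratios = {j: reduced[j] / dot(ystar, col(j))
          for j in range(4)
          if j not in T and dot(ystar, col(j)) > 1e-12}
k = min(ratios, key=ratios.get)   # entering column: 0 (x1)
alpha = ratios[k]                 # step length: 2.0
w_new = [wi + alpha * yi for wi, yi in zip(w, ystar)]
print(T, k, alpha, w_new)         # [1] 0 2.0 [-1.0, -1.0]
```

The printed values match the example: x1 enters with α = 2, and the new dual vector is (−1, −1).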
Given a linear programming problem in its standard form, the problem is completely
specified by the constraint matrix A, the right-hand-side vector b, and the cost vector c.
We assume that the linear programming problem has an optimal solution x* for a given
data set (A, b, c). In many cases, we find the data set (A, b, c) needs to be changed
within a range after we have obtained x*, and we are interested in finding the new optimal
solutions accordingly. Conceptually, we can of course solve a set of linear programming
problems, each one with a modified data value within the range. But this may become
an extremely expensive task in reality. The knowledge of sensitivity analysis or post-
optimality analysis will lead us to understand the implications of changing input data on
the optimal solutions.
Assume that x* is an optimal solution with basis B and nonbasis N of a linear program-
ming problem:
Minimize cT x
subject to Ax = b, x ≥ 0
Let c' = [c'B ; c'N] be a perturbation of the cost vector, partitioned according to the
basis B and nonbasis N, such that the cost vector changes according to the formula
c(α) = c + αc' (4.46)
where α ∈ R.
We are specifically interested in finding an upper bound ᾱ and a lower bound
α̲ such that the current optimal solution x* remains optimal for the linear programming
problem with a new cost vector whenever α̲ ≤ α ≤ ᾱ. The geometric concept behind the
Sec. 4.7 Sensitivity Analysis 79
Figure 4.2 (perturbation of the cost vector over the feasible domain Ax = b: as α increases, the objective direction −c rotates toward −(c + αc'))
effect of the above perturbation of c on x* is illustrated in Figure 4.2. When the scale of
perturbation is small, x* may remain optimal. But a large-scale perturbation could lead
to a different optimal solution.
In order to find the stable range for the current optimal solution x* with basis B,
we focus for a moment on the revised simplex method. Notice that since the feasible
domain {x ∈ Rn | Ax = b, x ≥ 0} remains the same, x* stays feasible in the linear
program with the perturbed cost vector c + αc'. Moreover, x* stays optimal if the
reduced cost vector, computed with the perturbed cost vector, satisfies the requirement that
(c + αc')NT − (c + αc')BT B⁻¹N ≥ 0 (4.47)
In other words, we require
(cN + αc'N)T − (cB + αc'B)T B⁻¹N ≥ 0 (4.48)
We now define
rNT = cNT − cBT B⁻¹N (4.49)
and
r'NT = c'NT − c'BT B⁻¹N (4.50)
Then, as long as
rNT + α r'NT ≥ 0 (4.51)
x* stays optimal for the linear programming problem with a perturbed cost vector c.
Therefore, denoting N̄ as the index set of nonbasic variables, we can determine that
ᾱ = min { −rj / r'j | j ∈ N̄, r'j < 0 }
and
α̲ = max { −rj / r'j | j ∈ N̄, r'j > 0 }
(with ᾱ = +∞ or α̲ = −∞ when the corresponding index set is empty).
Observation 3. When α is within the stable range, the current solution x* remains
optimal and the optimal objective value is a linear function of α in that range. As α moves
just beyond either the lower or the upper bound, Figure 4.2 indicates that
a neighboring vertex will become a new optimal solution with another stable range.
This can be repeated again and again, and the optimal objective function z*(α) becomes
a piecewise linear function, with breakpoints at the bounds on α for the various bases.
We can further prove that z*(α) is actually a concave piecewise linear
function as shown in Figure 4.3.
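Condition (4.51) yields the stable range componentwise: each nonbasic index j contributes a lower bound −rj/r'j when r'j > 0 and an upper bound −rj/r'j when r'j < 0. A minimal sketch, using made-up reduced-cost data rather than values from the text:

```python
def stable_range(rN, rpN):
    """Range of alpha with r_j + alpha * r'_j >= 0 for every nonbasic j."""
    lo, hi = float("-inf"), float("inf")
    for r, rp in zip(rN, rpN):
        if rp > 0:                 # requires alpha >= -r/rp
            lo = max(lo, -r / rp)
        elif rp < 0:               # requires alpha <= -r/rp
            hi = min(hi, -r / rp)
        # rp == 0 imposes no restriction, since r >= 0 already holds
    return lo, hi

# Illustrative nonbasic reduced costs and their perturbations:
print(stable_range([3.0, 1.0, 2.0], [1.0, -2.0, 0.0]))   # (-3.0, 0.5)
```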
As in the previous section, let us assume that x* is an optimal solution with basis B and
nonbasis N to the linear programming problem
Minimize cT x
subject to Ax = b, x ≥ 0
This time we incur a perturbation b' in the right-hand-side vector and consider the
following linear program:
Minimize z(α) = cT x (4.55a)
subject to Ax = b + αb', x ≥ 0 (4.55b)
for α ∈ R.
Figure 4.3 (the optimal objective value z*(α): a concave piecewise linear function of α)
Figure 4.4 (perturbation of the right-hand-side vector: the feasible domain Ax = b shifts as α changes)
Note that because the right-hand-side vector has been changed, x* need not be
feasible any more. But we are specifically interested in finding an upper bound ᾱ and a
lower bound α̲ such that the current basis B still serves as an optimal basis for the linear
programming problem with the new right-hand-side vector whenever α̲ ≤ α ≤ ᾱ. The
geometric implications of this problem are depicted in Figure 4.4.
In order to declare that B is an optimal basis, we have to check two conditions,
namely,
1. The reduced cost vector rT = cNT − cBT B⁻¹N is nonnegative.
2. The basic solution
xB = B⁻¹(b + αb')
is nonnegative.
The first condition is obviously satisfied, since the cost vector c, the basis B, and
the nonbasis N remain the same as before. The second condition is not necessarily true,
owing to the change αb', unless B⁻¹(b + αb') ≥ 0.
To find the stable range for α, we let b̄ = B⁻¹b and b̄' = B⁻¹b'. Thus b̄ + αb̄' ≥ 0
is required for the second condition. Consequently, we can define
ᾱ = min { −b̄i / b̄'i | b̄'i < 0 }
and
α̲ = max { −b̄i / b̄'i | b̄'i > 0 }
For α̲ ≤ α ≤ ᾱ, the vector of basic variables becomes
xB = B⁻¹(b + αb') = B⁻¹b + αB⁻¹b' = xB* + αB⁻¹b'
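The same componentwise reasoning gives the right-hand-side range. Below is a sketch with an illustrative 2 × 2 basis; the matrix B, the vector b, and the perturbation b' are made up for demonstration and are not from the text.

```python
# Right-hand-side ranging: keep B^{-1}(b + alpha*b') >= 0.
B = [[2.0, 0.0],
     [1.0, 1.0]]
b = [4.0, 3.0]
bprime = [1.0, 0.0]          # hypothetical perturbation direction

def solve2(M, v):
    """Solve a 2x2 system M x = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]

b_bar = solve2(B, b)         # B^{-1} b   -> [2.0, 1.0]
bp_bar = solve2(B, bprime)   # B^{-1} b'  -> [0.5, -0.5]
lo = max([-bi / bpi for bi, bpi in zip(b_bar, bp_bar) if bpi > 0],
         default=float("-inf"))
hi = min([-bi / bpi for bi, bpi in zip(b_bar, bp_bar) if bpi < 0],
         default=float("inf"))
print(lo, hi)                # -4.0 2.0
```

Within this range the basis stays optimal, and the basic solution moves linearly along xB* + αB⁻¹b'.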
So far, we have dealt with the changes in the cost vector and the right-hand-side vector.
In this section, we proceed to analyze the situation with changes in the constraint matrix.
In general, the changes made in the constraint matrix may result in a different optimal
basis and a different optimal solution, and performing the sensitivity analysis is not a simple task.
Here we deal only with four simpler cases, namely adding and removing a variable and
adding and removing a constraint. As in previous sections, we still assume that the
original linear programming problem has an optimal solution x* = [B⁻¹b | 0] with an
optimal basis B such that the constraint matrix can be partitioned as A = [B | N].
Case 1 (adding a new variable). Suppose that a new decision variable, say Xn+l,
is identified after we obtained the optimal solution x* of the original linear program. Let
us also assume that cn+1 is the cost coefficient associated with xn+1 and that An+1 is the
associated column in the new constraint matrix. We would like to find an optimal solution
to the new linear programming problem:
Minimize cT x + cn+1 xn+1
subject to Ax + An+1 xn+1 = b, x ≥ 0, xn+1 ≥ 0
Note that we can set xn+1 = 0; then
[x* ; 0]
becomes a basic feasible solution to the new linear program. Hence the simplex algorithm
can be initiated right away. Remember that, since x* is an optimal solution to the original
problem, the reduced costs rj, for j = 1, ..., n, must remain nonnegative. Therefore, we
only have to check the additional reduced cost rn+1 = cn+1 − cBT B⁻¹An+1.
If rn+1 ≥ 0, then the current solution x* with xn+1 = 0 is an optimal solution to the
new problem and we do not have to do anything. On the other hand, if rn+1 < 0, then
Xn+ 1 should be included in the basis as a basic variable. Therefore, we can continue the
simplex algorithm to find an optimal solution to the new linear programming problem.
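Pricing the new column requires only the simplex multipliers wT = cBT B⁻¹ and one inner product. A sketch with an illustrative basis inverse; all data below are made up for demonstration.

```python
# Pricing a new column: r_{n+1} = c_{n+1} - c_B^T B^{-1} A_{n+1}.
Binv = [[1.0, 0.0],
        [-1.0, 1.0]]    # hypothetical B^{-1}
cB = [3.0, 2.0]
c_new, A_new = 4.0, [1.0, 2.0]

w = [sum(cB[i] * Binv[i][j] for i in range(2)) for j in range(2)]  # w^T = c_B^T B^{-1}
r_new = c_new - sum(wj * aj for wj, aj in zip(w, A_new))
print(w, r_new)   # [1.0, 2.0] -1.0  -> r_{n+1} < 0, so x_{n+1} should enter the basis
```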
subject to Ax = b, x ≥ 0
Since the constraints are not altered, we know x* can serve as an initial basic
feasible solution to this problem for the revised simplex algorithm. Moreover, if the
simplex method finds the optimal objective value of the Phase I problem is not zero,
then the new linear programming problem obtained by removing the variable Xk from
the original problem must be infeasible. On the other hand, if the simplex method finds
an optimal solution x' with zero objective value for the Phase I problem, then we can
take x' as an initial basic feasible solution to the new linear program without the variable
Xk. In a finite number of iterations, either an optimal solution can be found for this new
problem, or the unboundedness can be detected.
Case 3 (adding a constraint). This time a new constraint is imposed after solv-
ing a linear programming problem. For simplicity, we assume the additional constraint
To solve this new linear program, first notice that the additional constraint may
shrink the original feasible domain. If x* remains feasible, then of course it
remains optimal. But the feasible domain may exclude x*, as shown in Figure 4.5. In
this case, we do not even have a basic feasible solution to start the simplex algorithm.
Also notice that B is no longer a basis in the new problem. In fact, if the additional
constraint is not redundant, the dimensionality of any new basis becomes m + 1, instead
of m.
Figure 4.5 (the new constraint may leave x* feasible, in which case it remains optimal, or may cut x* off from the new feasible domain)
To solve the new problem with an additional constraint, we add a slack variable
xn+1 and consider the following linear programming problem:
Minimize cBT xB + cNT xN
subject to B xB + N xN = b
am+1,BT xB + am+1,NT xN + xn+1 = bm+1
xB ≥ 0, xN ≥ 0, xn+1 ≥ 0
where am+1,B and am+1,N are the subrows of am+1 corresponding to xB and xN, respectively.
We now pass the slack variable xn+1 into the basis and consider a new basis B̄ defined by
B̄ = [ B 0 ; am+1,BT 1 ] (4.60)
whose inverse is
B̄⁻¹ = [ B⁻¹ 0 ; −am+1,BT B⁻¹ 1 ] (4.61)
Consequently,
xB̄ = B̄⁻¹ [ b ; bm+1 ] (4.62)
Then
x̄ = [ xB̄ ; 0 ]
is a basic solution (not necessarily feasible) to the new problem with an additional
constraint. Moreover, we can show the following result.
Lemma 4.4. Let B be an optimal basis of the original linear programming prob-
lem. If x̄, essentially defined by (4.62), is nonnegative, then it is an optimal solution to
the new linear programming problem with the additional constraint.
Proof. Since the basic solution x̄ is nonnegative, it is a basic feasible solution. In
order to declare it an optimal solution, we need to show that the reduced cost of each
nonbasic variable is nonnegative, i.e.,
Example 4.7
Consider the problem,
Minimize −2x1 − x2
subject to x1 + x2 + x3 = 2
x1, x2, x3 ≥ 0
Suppose that the additional constraint x1 ≤ 1 is imposed. Adding a slack variable x4, the new problem becomes
Minimize −2x1 − x2
subject to x1 + x2 + x3 = 2
x1 + x4 = 1
x1, x2, x3, x4 ≥ 0
with cB = [−2, 0]T. The dual solution is defined by wT = cBT B⁻¹. For the reduced costs rj
(j = 2, 3), we have r2 = 1 and r3 = 2, which implies that w is dual feasible. However, since
xB = [x1 ; x4] = B⁻¹b = [2 ; −1]
we know the corresponding primal is infeasible. Therefore we can restore the primal feasi-
bility by the dual simplex method. The rest follows Example 4.5 exactly to the end.
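The numbers in Example 4.7 can be reproduced with a few inner products. The basis is taken here to be {x1, x4} with cB = (−2, 0); this choice is inferred from the reduced costs r2 = 1, r3 = 2 and the basic solution reported above, so treat it as an assumption of the sketch.

```python
# Verifying Example 4.7: B = [A1 A4], with B^{-1} computed by hand.
A = [[1.0, 1.0, 1.0, 0.0],
     [1.0, 0.0, 0.0, 1.0]]
c = [-2.0, -1.0, 0.0, 0.0]
b = [2.0, 1.0]
cB = [-2.0, 0.0]
Binv = [[1.0, 0.0],
        [-1.0, 1.0]]    # inverse of B = [[1, 0], [1, 1]]

w = [sum(cB[i] * Binv[i][j] for i in range(2)) for j in range(2)]
r = [c[j] - sum(w[i] * A[i][j] for i in range(2)) for j in range(4)]
xB = [sum(Binv[i][k] * b[k] for k in range(2)) for i in range(2)]
print(w)    # [-2.0, 0.0]
print(r)    # [0.0, 1.0, 2.0, 0.0]  (r2 = 1, r3 = 2: w is dual feasible)
print(xB)   # [2.0, -1.0]           (negative component: primal infeasible)
```

The negative basic variable confirms that the dual simplex method is the natural tool to restore primal feasibility from here.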
Case 4 (removing a constraint). This case is more complicated than the ones
we have considered so far. However, if the constraint, say akT x ≤ bk, that we wish
to remove is nonbinding, i.e., akT x* < bk, then it can be removed without affecting the
optimality of the current optimal solution. To check whether the kth constraint is binding,
we simply look at the dual variable wk: if wk = 0, then the complementary slackness
conditions allow the constraint to be nonbinding.
On the other hand, if we want to remove a binding constraint, the task becomes
difficult. We may have to solve the new linear programming problem from the beginning.
In this chapter, we have introduced the fundamental concept of duality theory in linear
programming. Two variants of the simplex algorithm, namely the dual simplex algorithm
Exercises 87
and the primal-dual algorithm, have been derived based on this very concept. We also
studied post-optimality analysis, which could assess the sensitivity of an optimal solution
or optimal basis with respect to various changes made in the input data of a linear
programming problem.
4.1. Balinski, M. L., and Gomory, R. E., "A mutual primal-dual simplex method," in Recent
Advances in Mathematical Programming, ed. R. L. Graves and P. Wolfe, McGraw Hill, New
York (1963).
4.2. Balinski, M. L., and Tucker, A. W., "Duality theory of linear programs: A constructive
approach with applications," SIAM Review 3, 499-581 (1969).
4.3. Barnes, J. W., and Crisp, R. M., "Linear programming: a survey of general purpose algorithms," AIIE Transactions 7, No. 3, 49-63 (1975).
4.4. Chvatal, V., Linear Programming, Freeman, San Francisco (1983).
4.5. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton,
NJ (1963).
4.6. Farkas, J., "Theorie der einfachen Ungleichungen," Journal für die reine und angewandte Mathematik 124, 1-27 (1902).
4.7. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Redwood City, CA (1991).
4.8. Goldfarb, D., and Todd, M. J., "Linear Programming," in Optimization, Handbook in Opera-
tions Research and Management Science, ed. Nemhauser, G. L. and Rinnooy Kan, A. H. G.,
Vol. 1, 73-170, Elsevier-North Holland, Amsterdam (1989).
4.9. Lemke, C. E., "The dual method for solving the linear programming problem," Naval Re-
search Logistics Quarterly 1, No. 1 (1954).
4.10. Luenberger, D. G., Introduction to Linear and Nonlinear Programming, 2d ed., Addison-
Wesley, Reading, MA (1973).
4.11. Peterson, E. L., An Introduction to Linear Optimization, Lecture notes, North Carolina State
University, Raleigh, NC (1990).
EXERCISES
4.1. Prove that the symmetric pair in Example 4.2 are indeed a pair of primal and dual problems
by converting the primal problem into its standard form first.
4.2. Find the linear dual program of the following problems:
(a) Minimize 9x1 + 6x2
subject to 3x1 + 8x2 ≥ 4
5x1 + 2x2 ≥ 7
x1, x2 ≥ 0
subject to Ax ≤ b, x ≤ 0
(b-2) Maximize bT w
subject to AT w ≥ c, w ≤ 0
x1 ≤ 0, x2 ≥ 0, x3 unrestricted
4.5. Find the dual problems of the following linear programming problems:
(a) Minimize cT x
subject to Ax ≥ b, x ≥ 0
(b) Maximize bT w
subject to AT w ≤ c, w ≥ 0
(c) Minimize cT x
(d) Minimize cT x
subject to Ax = 0, Σ_{i=1}^n xi = 1, x ≥ 0
(This is the famous Karmarkar's standard form which will be studied in Chapter 6.)
(e) Minimize Σ_{i=1}^N xi
subject to xi − Σ_{j=1}^N α p_{ij}^k xj ≥ R_i^k, ∀i = 1, 2, ..., N, ∀k = 1, 2, ..., l
xi unrestricted
where P^k is an N × N (probability) matrix for k = 1, 2, ..., l, α ∈ (0, 1), and R_i^k ∈ R+, ∀i, k.
(This is the policy iteration problem in dynamic programming.)
4.6. Construct an example to show that both the primal and dual linear problems have no
feasible solutions. This indicates that the infeasibility of one problem does not imply the
unboundedness of the other one in a primal-dual pair.
4.7. For an infeasible linear program, show that if its dual linear program is feasible, then the
dual must be unbounded.
4.8. For a linear programming problem (P) in its standard form, assume that A is an m x n
matrix with full row rank. Answer the following questions with reasons.
(a) For each basis B, let wT(B) = cBT B⁻¹ be the vector of simplex multipliers. Is w(B)
always a feasible solution to its dual problem?
(b) Can every dual feasible solution be represented as wT(B) = cBT B⁻¹ for some basis B?
(c) Since A has full row rank, can we guarantee that (P) is nondegenerate?
(d) If (P) is nondegenerate, can we guarantee that its dual is also nondegenerate?
(e) Is it possible that both (P) and its dual are degenerate?
(f) Is it possible that (P) has a unique optimal solution with finite objective value but its
dual problem is infeasible?
(g) Is it possible that both (P) and its dual are unbounded?
(h) When (P) and its dual are both feasible, show that the duality gap vanishes.
4.9. Consider a two-person zero-sum game with the following pay-off matrix to the row player:
Strategies    1    2    3
1             2   -1    0
2            -3    1    1
(This means the row player has two strategies and the column player has three
strategies. If the row player chooses his/her second strategy and the column player chooses
his/her third strategy, then the column player has to pay the row player $1.)
Let XJ, x2, and x3 be the probabilities with which the column player selects his/her
first, second, and third strategies over many plays of the game. Keep in mind that the
column player wishes to minimize the maximal expected payoff to the row player.
(a) What linear program will help the column player to determine his probability distribution
of selecting different strategies?
(b) Find the dual problem of the above linear program.
Destination
           1    2    3    4
        1  7    2   -2    8
Origin  2 19    5   -2   12
        3  5    8   -9    3
and assuming that w = (0, 3, -4, 7, 2, -5, 7)T is an optimal dual solution, find an optimal
solution to the original (primal) problem.
4.11. Closely related to Farkas' theorem of alternatives is Farkas' transposition theorem: "There
is a solution x to the linear system Ax = b and x ≥ 0 if, and only if, bT w ≥ 0 whenever
AT w ≥ 0." Prove Farkas' transposition theorem.
4.12. Show that there is a solution x to the linear system Ax ≤ b if, and only if, bT w ≥ 0 whenever
AT w = 0 and w ≥ 0. This result is called Gale's transposition theorem.
4.13. Show that there is a solution x to the linear system Ax ≤ b and x ≥ 0 if, and only if,
bT w ≥ 0 whenever AT w ≥ 0 and w ≥ 0.
4.14. Prove Gordan's transposition theorem: There is a solution x to the strict homogeneous linear
system Ax < 0 if, and only if, w = 0 is the only solution with AT w = 0 and w ≥ 0.
4.15. Use Farkas' lemma to construct a proof of the strong duality theorem of linear programming.
4.16. Why is x* an optimal solution to the linear programming problem with new demands in
Section 4.4.1 ?
4.17. Show that, in applying the primal-dual method, if we end with a restricted primal problem
with positive optimal objective value and y*T Aj ≤ 0, ∀j ∉ T, then the original primal
problem is infeasible and its dual is unbounded.
4.18. Consider the following linear program:
Minimize 2x1 + x2 − x3
subject to x1 + 2x2 + x3 ≤ 8
−x1 + x2 − 2x3 ≤ 4
x1, x2, x3 ≥ 0
First, use the revised simplex method to find the optimal solution and its optimal dual
variables. Then use sensitivity analysis to answer the following questions.
(a) Find a new optimal solution if the cost coefficient of x2 is changed from 1 to 6.
(b) Find a new optimal solution if the coefficient of x 2 in the first constraint is changed
from 2 to 0.25.
(c) Find a new optimal solution if we add one more constraint x2 + x3 = 3.
(d) If you were to choose between increasing the right-hand side of the first and the second
constraints, which one would you choose? Why? What is the effect of this increase on
the optimal value of the objective function?
(e) Suppose that a new activity x6 is proposed with a unit cost of 4 and a consumption
vector A6 = (1, 2)T. Find a corresponding optimal solution.
5
Complexity Analysis
and the Ellipsoid Method
The simplex approach described in previous chapters has been an extremely efficient
computational tool ever since it was introduced by G. B. Dantzig in 1947. For certain
problems, however, at least in theory, the method was shown to be very inefficient. This
leads to the study of the computational complexity of linear programming. The worst-
case analysis shows that the simplex method and its variants may take an exponential
number (depending on the problem size) of pivots to reach an optimal solution and the
method may become impractical in solving very large scale general linear programming
problems. Therefore, research work has been directed to finding an algorithm for linear
programming with polynomial complexity.
The first such algorithm was proposed by L. G. Khachian in 1979, based on the
method of central sections and the method of generalized gradient descent with space
dilation, which were developed for nonlinear optimization by several other Soviet math-
ematicians. In theory, Khachian's ellipsoid method has a better time bound than the
simplex method, but it seems to be of little practical value at least at the present time.
The practical performance of the variants of the simplex method is far better than that
of the ellipsoid method.
In this chapter we start with the concept of computational complexity, discuss the
performance of the simplex method in the context of complexity analysis, then introduce
the basic ideas of the ellipsoid method, and conclude with the performance of Khachian's
algorithm.
Sec. 5.1 Concepts of Computational Complexity 93
The concept of complexity analysis was introduced in the 1970s to evaluate the perform-
ance of an algorithm. The worst-case analysis measures the degree of difficulty in
problem solving under the worst scenario. The computational complexity provides us
an index of assessing the growth in computational effort of an algorithm as a function
of the size of the problem in the worst-case analysis. The complexity of an algorithm
is usually measured in this context by the number of elementary operations such as
additions, multiplications, and comparisons, which depends on the algorithm and the
total size of the input data in binary representation.
For a general iterative scheme, as discussed in Chapter 3, its complexity is deter-
mined by the product of the total number of iterations and the number of operations at
each iteration. The total number of iterations certainly depends on the accuracy level
required, while the number of elementary operations depends upon the binary represen-
tation of the input size. Consider a linear programming problem
Minimize cT x (5.1a)
subject to Ax = b, x =:: 0. (5.1b)
where A is an m x n matrix with m, n =:: 2, b E Rm, c, x E Rn, and the input data is
all integer (possibly converted from some rational data to this form). By specifying the
values of m, n, A, b, c, we define an instance of the linear program. If we further define
the input length of an instance to be the number of binary bits needed to record all the
data of the problem and denote it by L, then the size of an instance of the problem can
be represented by the triplet (m, n, L). Consequently, the complexity of an algorithm for
linear programming becomes a function of the triplet, namely f(m, n, L). If there exists
a constant r > 0 such that the total number of elementary operations required by the
algorithm on any instance of the problem is no more than r f(m, n, L), we say the
algorithm is of order of complexity O(f(m, n, L)). When the complexity
function f (m, n, L) is a polynomial function of m, n and L, the algorithm is said to be
polynomially bounded or to be of polynomial complexity. Otherwise, the algorithm is a
nonpolynomial-time algorithm.
Notice that in the binary system, it takes (r + 1) bits to represent a positive integer
ξ ∈ [2^r, 2^{r+1}) for a nonnegative integer r. Therefore, for a positive integer ξ, we require
⌈log(1 + ξ)⌉ binary bits to represent it, where ⌈·⌉ denotes the round-up integer value.
Adding one more bit for the sign, a total of 1 + ⌈log(1 + |ξ|)⌉ binary bits are needed for
encoding an arbitrary integer ξ. For linear program (5.1), the input length is given by
L = ⌈1 + log(1 + m)⌉ + ⌈1 + log(1 + n)⌉ + Σ_{j=1}^n {1 + ⌈1 + log(1 + |cj|)⌉}
  + Σ_{i=1}^m Σ_{j=1}^n {1 + ⌈1 + log(1 + |aij|)⌉} + Σ_{i=1}^m {1 + ⌈1 + log(1 + |bi|)⌉} (5.2)
94 Complexity Analysis and the Ellipsoid Method Chap. 5
In our complexity analysis, since only an upper bound on the computational effort is
required, we do not need an exact L in defining the size of an instance of a problem. A
common estimate is given by
L = ⌈ Σ_{j=1}^n {1 + log(1 + |cj|)} + Σ_{i=1}^m Σ_{j=1}^n {1 + log(1 + |aij|)}
  + Σ_{i=1}^m {1 + log(1 + |bi|)} ⌉ (5.3)
or
L = Σ_{j=1}^n ⌈1 + log(1 + |cj|)⌉ + Σ_{i=1}^m Σ_{j=1}^n ⌈1 + log(1 + |aij|)⌉
  + Σ_{i=1}^m ⌈1 + log(1 + |bi|)⌉ (5.4)
We now proceed to show that the simplex method is not of polynomial complexity,
although a vast amount of practical experience has confirmed that in most cases the
number of iterations is a linear function of m and a sublinear function of n.
The computational complexity of the simplex method depends upon the total number of
iterations and the number of elementary operations required at each iteration. Different
implementation details result in different complexity. Variants of the simplex method
were designed to achieve better computational performance. Following the computational
procedure in Chapter 3, it is not difficult to estimate that the revised simplex method
requires about m(n − m) + (m + 1)² multiplications and m(n + 1) additions at each
iteration. As to Dantzig's original simplex method, it requires about m(n − m) + n + 1
multiplications and m(n − m + 1) additions at each iteration. The key point is that both
of them are of order O(mn).
How many iterations are required? Each iteration of the simplex method and its
variants hops from one extreme point to a neighboring extreme point. For a linear
programming problem in its standard form, the feasible domain contains up to C(n, m)
extreme points that an algorithm could possibly visit, and C(n, m) can grow exponentially
in m and n, so an exponential number of pivots cannot be ruled out.
The first such example was given by V. Klee and G. L. Minty in 1971 to show that
Dantzig's simplex method may traverse all 2^n extreme points before reaching the optimal
solution.
Example 5.1 (Klee-Minty's example)
For 0 < δ < 1/2, consider
Maximize xn (5.5a)
subject to x1 ≤ 1, δ xi−1 ≤ xi ≤ 1 − δ xi−1, i = 2, 3, ..., n (5.5b)
xi ≥ 0, i = 1, 2, ..., n (5.5c)
Obviously the origin is a basic feasible solution. If we start with the origin and apply
the largest-reduction rule to the entering nonbasic variables, the simplex method takes
2^n − 1 iterations and visits every extreme point of the feasible domain. For n = 2 and
n = 3, Figures 5.1 and 5.2 illustrate the situation. A mathematical proof based on a
linear transformation of the example is included in Exercise 5.3.
Figure 5.1 (the Klee-Minty example for n = 2: the simplex path starts at x0 = (0, 0) and passes through x1 = (1, δ))
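The spanning path of Example 5.1 has a simple recursive structure: the path for dimension n follows the path for dimension n − 1 with xn = δxn−1, then retraces it in reverse with xn = 1 − δxn−1. The sketch below assumes the constraint form δxi−1 ≤ xi ≤ 1 − δxi−1 and checks that the objective xn strictly increases through all 2^n vertices, so 2^n − 1 pivots are made.

```python
# Generate the Klee-Minty simplex path recursively and verify its length
# and the strict increase of the objective x_n along it.
def km_path(n, delta):
    if n == 1:
        return [[0.0], [1.0]]
    prev = km_path(n - 1, delta)
    first = [v + [delta * v[-1]] for v in prev]            # x_n = delta * x_{n-1}
    second = [v + [1.0 - delta * v[-1]] for v in reversed(prev)]
    return first + second

path = km_path(3, 0.25)
assert len(path) == 2 ** 3                       # all 8 vertices are visited
obj = [v[-1] for v in path]                      # objective values x_n ...
assert all(a < b for a, b in zip(obj, obj[1:]))  # ... strictly increase: 7 pivots
print(path[0], path[1], path[-1])   # [0.0, 0.0, 0.0] [1.0, 0.25, 0.0625] [0.0, 0.0, 1.0]
```

For n = 2 this reproduces the path of Figure 5.1: (0, 0) → (1, δ) → (1, 1 − δ) → (0, 1).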
Variants of the simplex method may change the entering or leaving rules (pivot-
ing scheme) to avoid traversing every extreme point. But different bad examples were
reported for different variants. This leads us to believe that the simplex method and its
variants are of exponential complexity.
However, the bad examples rarely happen in real-world problems. It has been
observed in the past forty years that real-life problems of moderate size require the
simplex method to take 4m to 6m iterations to complete two phases. It is conjectured
that, for n large relative to m, the number of iterations is expected to be α × m, where
the expected value of α is less than log₂(2 + n/m). Similar results were confirmed by Monte Carlo experiments
Figure 5.2 (the Klee-Minty example for n = 3)
with artificial probability distributions. Hence the expected computational effort of the
simplex method is of O(m²n).
When sparsity issues are addressed, a regression equation of the form K m^α n d^0.33
usually provides a better fit for the complexity of the simplex method, where K is a
constant, 1.25 < α < 2.5, and d is the number of nonzero elements in the matrix A divided
by nm. This explains why the simplex method is efficient in practice, although it is of
exponential complexity in theory.
After the simplex method was realized to be of exponential complexity, a major theo-
retical question arose: "Is there a polynomial-time algorithm for linear programming?"
An affirmative answer was finally provided by L. G. Khachian in 1979. He showed how
one could adapt the ellipsoid method for convex programming (of which linear program-
ming is a special case) developed by N. Z. Shor, D. B. Yudin, and A. S. Nemirovskii to
give a linear programming algorithm of polynomial complexity. More precisely, Yudin
and Nemirovskii showed that the ellipsoid method related to Shor's work approximates
the exact solution within any given tolerance epsilon > 0 in a number of iterations which is
polynomial in both the size of the input data and log(1/epsilon). Khachian further proved that
when the method is applied to linear programming problems with integer coefficients,
even an exact solution can be obtained in polynomial time. In this section, we introduce
the basic ideas of the ellipsoid method for linear programming.
Consider a system of n variables in m (strict) linear inequalities, i.e.,

Ax < b     (5.6)
Sec. 5.3 Basic Ideas of the Ellipsoid Method 97
with A being an m x n matrix, x E Rn, and b E Rm. Our objective is to find a solution
of (5.6) if it exists. The ellipsoid method starts with a spheroid whose radius is large
enough to include a solution of the system of inequalities if one exists. Denoting the set
of solutions in the initial spheroid by P, the algorithm proceeds by constructing a series
of ellipsoids E_k at the kth iteration such that P ⊆ E_k. The ellipsoids are constructed in
such a way that their volumes shrink geometrically. Since the volume of P can be proven to
be positive when P ≠ ∅, one can show that after a polynomial number of iterations the
algorithm either finds that the center point of the current ellipsoid is a solution or concludes
that no solution exists for (5.6).
We now describe the method in geometric terms. Given a nonnegative number r
and a point z E Rn, a spheroid (sphere) centered at z with radius r in the n-dimensional
Euclidean space is defined by
S(z, r) = {x ∈ R^n | Σ_{i=1}^n (x_i − z_i)^2 ≤ r^2} = {x ∈ R^n | (x − z)^T (x − z) ≤ r^2}     (5.7)
The volume of S(z, r) is denoted by vol(S(z, r)). Given an n × n nonsingular matrix A
and a point c ∈ R^n, an affine transformation T(A, c) maps every point x ∈ R^n to a new
point A(x − c) ∈ R^n. An ellipsoid is the image of the unit sphere S(0, 1) under some
affine transformation. Therefore an ellipsoid can be represented by

E = {x ∈ R^n | (x − c)^T A^T A (x − c) ≤ 1}     (5.8)

The point c is defined to be the center of E, and the volume of E is then given by

vol(E) = det(A^{-1}) × vol(S(0, 1))     (5.9)
where det(A^{-1}) is the determinant of the inverse matrix of A. By a half-ellipsoid (1/2)E,
we mean the intersection of E with a halfspace whose bounding hyperplane H = {x ∈
R^n | a^T x = β}, for some vector a ∈ R^n and scalar β, passes through the center of E. In
other words, we may define

(1/2)E = {x ∈ E | a^T x ≤ a^T c}     (5.10)
Example 5.2
In Figure 5.3, E = S(0, 1) is the 2-dimensional unit sphere, and the shaded area is (1/2)E,
given by the intersection of E with the halfspace {(x_1, x_2) ∈ R^2 | x_1 ≥ 0}. Passing
through the points (1, 0), (0, 1), and (0, −1), a new ellipsoid Ē containing (1/2)E has
center c = (1/3, 0)^T and is represented in the form (5.8) by

A = [ 3/2    0
       0    √3/2 ]     with det(A) = 3√3/4
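Formula (5.9) can be checked numerically on the matrix of Example 5.2, recalling that vol(S(0, 1)) = pi in R^2:

```python
import math

# Ellipsoid of Example 5.2, represented as in (5.8) by A = diag(3/2, sqrt(3)/2)
A = [[3 / 2, 0.0],
     [0.0, math.sqrt(3) / 2]]

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # determinant = 3*sqrt(3)/4

# (5.9): vol(E) = det(A^{-1}) * vol(S(0, 1)) = (1 / det A) * pi in the plane
vol_E = (1 / det_A) * math.pi
```

Since det(A) > 1, the new ellipsoid has strictly smaller volume than the unit sphere, which is exactly the shrinkage the method relies on.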
We can further extend the result in Example 5.2 to the n-dimensional case. For E =
S(0, 1) ⊆ R^n with the half-ellipsoid (1/2)E = {x ∈ E | x_1 ≥ 0}, we can construct a new
[Figure 5.3: the unit sphere E = S(0, 1), the half-ellipsoid (1/2)E, and the new ellipsoid Ē through (1, 0), (0, 1), and (0, −1).]

ellipsoid Ē whose center is at

( 1/(n+1), 0, ..., 0 )

and

vol(Ē) = ( n/(n+1) ) ( n^2/(n^2 − 1) )^{(n−1)/2} × vol(E)
[Figure 5.4: a general half-ellipsoid and the smaller ellipsoid containing it.]

A general half-ellipsoid can be reduced to the unit-sphere case by an appropriate affine
transformation. In Exercise 5.6, we can further
prove that

( n/(n+1) ) ( n^2/(n^2 − 1) )^{(n−1)/2} ≤ e^{−1/(2(n+1))}     for any integer n > 1
Lemma 5.1. Every half-ellipsoid (1/2)E is contained in an ellipsoid Ē whose volume
is less than e^{−1/(2(n+1))} times the volume of E.
Lemma 5.2. The smallest ellipsoid E containing a convex polyhedral set P has
its center in P.
Lemma 5.2 actually suggests an iterative scheme to solve the system of inequalities
(5.6). Here is the basic idea: if part of the solution set of (5.6) forms a convex polyhedron
P contained in an ellipsoid E_k at the kth iteration, then we can check the center
of E_k. If the center of E_k belongs to P, we have found a solution to (5.6). Otherwise, we can
replace E_k by a smaller ellipsoid E_{k+1} ⊂ E_k and repeat the process. Since Lemma 5.1
indicates that the volume of E_k shrinks at least by the factor e^{−1/(2(n+1))} after
each iteration, this iterative scheme requires only a polynomial number of iterations to
reach a conclusion, if we know where to start and when to terminate.
To start up the iterative scheme, consider the following known result (to be proven
in Exercise 5.7):
Lemma 5.3. If the system of inequalities (5.6) has any solution, then it has a
solution x E Rn such that
|x_j| ≤ 2^L,   j = 1, 2, ..., n     (5.13)

where L is the input size given by (5.3) with c_j = 0 for all j.
Lemma 5.4. If the system of inequalities (5.6) has a solution, then the volume
of its solution set inside the cube {x ∈ R^n | |x_i| ≤ 2^L, i = 1, ..., n} is at least 2^{−(n+1)L}.
Hence we can terminate the iterative scheme when vol(E_k) < 2^{−(n+1)L}. In this
case, (5.6) has no solution.
Summarizing Lemmas 5.1-5.4, the basic geometry of the ellipsoid method for solving
a system of strict linear inequalities (5.6) is as follows: start with the sphere S(0, 2^L),
which must contain a solution if one exists; at each iteration, test whether the center of
the current ellipsoid solves (5.6); if not, replace the current ellipsoid by the smaller
ellipsoid of Lemma 5.1 containing the half-ellipsoid cut off by a violated inequality; and
conclude that no solution exists once the volume falls below 2^{−(n+1)L}.
Following the basic ideas described in the previous section, we introduce the ellipsoid
method for linear programming in this section. The major task is to construct Ē of Step
2 in algebraic terms. To do so, we let a_i^T = (a_i1, a_i2, ..., a_in) ∈ R^n for i = 1, ..., m,
and rewrite system (5.6) as

a_i^T x < b_i,   i = 1, 2, ..., m     (5.14)
Moreover, we let E_k be an ellipsoid defined by {x ∈ R^n | (x − x^k)^T B_k^{-1} (x − x^k) ≤ 1}, where
x^k is the center of E_k and B_k^{-1} = A_k^T A_k for some affine transformation matrix A_k. Notice
that when A_k is nonsingular, B_k^{-1} is positive definite. Furthermore, if x^k is not a solution
of (5.14), then there exists an index i such that a_i^T x^k ≥ b_i, and every potential solution falls
in the half-ellipsoid (1/2)E_k = {x ∈ E_k | a_i^T x ≤ a_i^T x^k} = {x ∈ E_k | −a_i^T x ≥ −a_i^T x^k}. Refer
Sec. 5.4 Ellipsoid Method for Linear Programming 101
back to Figure 5.4. Based on the three parameters of step τ = 1/(n + 1), dilation
δ = n^2/(n^2 − 1), and expansion σ = 2/(n + 1) defined by (5.12), the new ellipsoid E_{k+1}
is defined by its center

x^{k+1} = x^k − τ (B_k a_i) / (a_i^T B_k a_i)^{1/2}     (5.15)

and

B_{k+1} = δ ( B_k − σ (B_k a_i)(B_k a_i)^T / (a_i^T B_k a_i) )     (5.16)
Since B_k^{-1} is symmetric and positive definite, it can be shown that B_{k+1}^{-1} is also a
symmetric positive definite matrix, and the set

{x ∈ R^n | (x − x^k)^T B_k^{-1} (x − x^k) ≤ 1, a_i^T x < b_i}     (5.17)

is contained in E_{k+1}, whose volume is at most

( n/(n+1) ) ( n^2/(n^2 − 1) )^{(n−1)/2}     times the volume of E_k
Now, the ellipsoid method for solving a system of strict linear inequalities can be
described in algebraic terms:
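In outline, each iteration tests the current center x^k against (5.14) and, on a violated row i, updates x^k and B_k. The sketch below is a minimal Python illustration, assuming the standard center and shape updates referred to as (5.15) and (5.16); for brevity the volume-based stopping test is replaced by a plain iteration cap:

```python
import numpy as np

def ellipsoid_method(A, b, L, max_iter=1000):
    """Solve the strict system Ax < b by the ellipsoid method.

    Starts from the sphere S(0, 2^L), i.e. B_0 = 2^(2L) I, and repeatedly
    applies the center/shape updates (5.15)-(5.16) on a violated row.
    Exact arithmetic is assumed; the volume test is replaced by max_iter.
    """
    m, n = A.shape
    x = np.zeros(n)
    B = (2.0 ** (2 * L)) * np.eye(n)
    for _ in range(max_iter):
        violated = [i for i in range(m) if A[i] @ x >= b[i]]
        if not violated:
            return x                        # the center solves Ax < b
        a = A[violated[0]]
        Ba = B @ a
        aBa = a @ Ba
        x = x - Ba / ((n + 1) * np.sqrt(aBa))                # (5.15)
        B = (n * n / (n * n - 1.0)) * (
            B - (2.0 / (n + 1)) * np.outer(Ba, Ba) / aBa)    # (5.16)
    return None                             # no conclusion within max_iter

# Data of Example 5.3: x1 < 0, x2 < 0, with L = 2 + log2(5) so 2^(2L) = 400.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
x = ellipsoid_method(A, b, L=2 + np.log2(5))
```

With this data the sketch terminates after two updates at x = (−20/3, −40√3/9), matching Example 5.3.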
Notice that after each iteration vol(E_k) is reduced at least by a factor of e^{−1/(2(n+1))}.
Since the starting volume is vol(S(0, 2^L)) and the smallest ending volume is 2^{−(n+1)L},
the total number of iterations is at most a constant times n^2 L. This is shown in
Exercise 5.11. Also notice that the formulas (5.12), (5.15), and (5.16) for x^{k+1} and
B_{k+1} assume exact arithmetic. For a real implementation, one must use finite-precision
arithmetic to approximate exact numbers. This may cause computational errors. Never-
theless, Khachian indicated that taking 23L bits of precision before the decimal point
and 38nL bits after the point is sufficient for rounding approximation to an exact
number. However, if the values of x^{k+1} and B_{k+1} are rounded to this specified number
of bits, the ellipsoid E_{k+1} may not contain the required half-ellipsoid. Khachian further
showed that if B_{k+1} is multiplied by a factor 2^{1/(4n^2)}, which is slightly larger than 1, then
E_{k+1} will always contain the desired half-ellipsoid. This guarantees that the ellipsoid method
terminates within O(n^2 L) iterations with an exact solution. For our limited interests,
unless otherwise noted we will assume throughout this chapter that exact arithmetic is
used.
The following simple example shows how the algorithm works.
Example 5.3
Let (5.6) be given by

x_1 < 0
x_2 < 0

In this case a_1^T = (1, 0), a_2^T = (0, 1), b^T = (0, 0), L = 2 + log_2 5, and 2^{2L} = 400. The
algorithm starts with x^0 = (0, 0)^T and

B_0 = diag(400, 400)

and terminates in two iterations with

x^1 = (−20/3, 0)^T,         B_1 = diag(1600/9, 1600/3)
x^2 = (−20/3, −40√3/9)^T,   B_2 = diag(6400/27, 6400/27)

The geometric picture is given in [Figure 5.5].
The duality theory in Chapter 4 indicates that in order to solve a linear programming
problem in its canonical form, i.e.,

Maximize c^T x     (5.18a)
subject to Ax ≤ b,  x ≥ 0     (5.18b)

we only need to consider the following system of inequalities:

c^T x = b^T w     (5.19a)
Ax ≤ b,  x ≥ 0     (5.19b)
A^T w ≥ c,  w ≥ 0     (5.19c)
Sec. 5.5 Performance of the Ellipsoid Method for LP 103
We know that system (5.19) is solvable if and only if the original problem (5.18) has
a feasible solution and a finite optimum. Moreover, if (x, w) is a solution to system
(5.19), then x is an optimal solution to (5.18). Notice that system (5.19) is not of the
strict inequality form, and the ellipsoid method may not be applicable. Fortunately, we
can perturb system (5.19) by a very small number 2-L to convert the weak inequality
form to strict inequalities. The following lemma validates this perturbation scheme.
Hence the ellipsoid method can be applied to system (5.19) with a perturbation
factor 2-L for strict inequalities. In this way, it is clearly seen that a polynomial time
algorithm for linear equalities yields a polynomial time algorithm for linear programming
problems.
The third disadvantage is due to sparsity. So far, the ellipsoid method does not
seem able to exploit sparsity. We may start with a very simple matrix B_0, but the
number of fill-in elements in B_k grows very rapidly. Thus even if the number of
iterations could be reduced significantly, the ellipsoid method would still have problems
in solving large-scale linear programming problems.
After all, a fundamental difficulty is due to the limitations of finite-precision arith-
metic. It is unlikely that any reasonable implementation of the method would be of
polynomial time.
To improve the slow convergence of the ellipsoid method, variations of the basic ellipsoid
method were developed. Since the number of iterations depends upon the volume ratio of
E_{k+1} to E_k, research has been conducted on generating smaller ellipsoids at each iteration by
considering deep cuts, surrogate cuts, and parallel cuts. Researchers have also replaced
the role of ellipsoids in the basic method by certain polyhedra called simplices. In this
section we discuss some of these modifications.
In the basic ellipsoid method, suppose that x^k violates the ith constraint of (5.14);
the ellipsoid E_{k+1} constructed according to (5.15) and (5.16) contains the half-ellipsoid
(1/2)E_k = {x ∈ E_k | a_i^T x ≤ a_i^T x^k}. In reality we only require that E_{k+1} contain the smaller
portion {x ∈ E_k | a_i^T x < b_i} ⊂ E_k. Hence we may obtain an ellipsoid of smaller volume
by using the deep cut a_i^T x ≤ b_i instead of the cut a_i^T x ≤ a_i^T x^k, which passes through the
center of E_k. This is illustrated in Figure 5.6.
[Figure 5.6: a deep cut versus a cut through the center of E_k.]
Sec. 5.6 Modifications of the Basic Algorithm 105
[Figure 5.7: the deep cut x_1 ≥ t on the unit sphere.]
To derive the formulas for the smaller ellipsoid, we consider the basic case where
E is the unit sphere S(0, 1) and one of the inequalities in (5.14) reads x_1 ≥ t for some
0 ≤ t < 1. As Figure 5.7 shows, the feasible set P defined by (5.13) and (5.14) can be
included in an ellipsoid

Ē = {x ∈ R^n | ( (n+1)/(n(1−t)) )^2 ( x_1 − (1+nt)/(n+1) )^2 + ( (n^2 − 1)/(n^2 (1 − t^2)) ) Σ_{j=2}^n x_j^2 ≤ 1}     (5.20)
whose center is

( (1 + nt)/(n + 1), 0, ..., 0 )

and volume is

( n(1 − t)/(n + 1) ) ( n^2(1 − t^2)/(n^2 − 1) )^{(n−1)/2} × vol(S(0, 1))

where, at the kth iteration,

t = (a_i^T x^k − b_i) / (a_i^T B_k a_i)^{1/2}     (5.23)
Computing t for each inequality in (5.14), if any one t is greater than or equal to one,
then system (5.14) has no feasible solution. Otherwise we can select the deepest cut, the
one with the largest t, for constructing E_{k+1}. Conceptually, deep cuts should lead to faster
volume reduction, and hence faster convergence of the ellipsoid method. But, as reported by
researchers, the improvement obtained can be rather disappointing.
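Selecting the deepest cut amounts to evaluating t from (5.23) for every row and taking the largest; a small sketch (the function and test data are illustrative):

```python
import numpy as np

def deepest_cut(A, b, x, B):
    """t_i = (a_i^T x - b_i) / sqrt(a_i^T B a_i) for each row of (5.14).

    If some t_i >= 1, system (5.14) has no feasible solution (index None);
    otherwise return the index of the deepest cut, the row with largest t_i.
    """
    t = np.array([(A[i] @ x - b[i]) / np.sqrt(A[i] @ B @ A[i])
                  for i in range(len(b))])
    if np.any(t >= 1.0):
        return None, t
    return int(np.argmax(t)), t

# The system x1 < 0, x2 < 0 tested at x = (10, -5) with B = 400 I:
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
i, t = deepest_cut(A, b, np.array([10.0, -5.0]), 400 * np.eye(2))
```

Here t = (0.5, −0.25), so the first row gives the deepest cut; a negative t_i simply means row i is already satisfied at x.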
A surrogate cut aggregates several constraints of (5.14): for any u ≥ 0, the inequality
u^T Ax ≤ u^T b is valid as long as u_i ≥ 0 for i = 1, ..., m, since no points that satisfy (5.14)
are cut off by this inequality. It can be shown that the deepest surrogate cut at the kth
iteration of the ellipsoid method is the one whose u_i coefficients are obtained by solving

maximize_{u ≥ 0}   u^T (Ax^k − b) / (u^T A B_k A^T u)^{1/2}     (5.24)

where A is defined in (5.6). In practice, since solving (5.24) requires a substantial amount
of computation, only surrogate cuts generated from two constraints are considered.
Figure 5.8 illustrates a surrogate cut.
[Figure 5.8: a surrogate cut combining two constraints.]
Parallel cuts arise when (5.14) contains a pair of parallel constraints a_i^T x < b_i and
a_j^T x < b_j, where a_j = −a_i and −b_j < b_i. We then consider how to use the two constraints
simultaneously to generate a new ellipsoid. At the kth iteration, we let
α = (a_i^T x^k − b_i) / (a_i^T B_k a_i)^{1/2} and β = (a_j^T x^k − b_j) / (a_i^T B_k a_i)^{1/2}.
Suppose that αβ < 1/n and α ≤ −β ≤ 1; then formulas (5.15) and (5.16) with new
parameters, including

ρ = [ 4(1 − α^2)(1 − β^2) + n^2(α^2 − β^2)^2 ]^{1/2}     (5.26a)

generate E_{k+1} containing the slice {x ∈ E_k | −b_j ≤ a_i^T x ≤ b_i} of E_k. Figure 5.9 shows
parallel constraints on the unit sphere.
[Figure 5.9: parallel cuts on the unit sphere centered at x^k.]
In the early development stage of the ellipsoid method, A. Yu. Levin already used
simplices rather than ellipsoids to achieve iterative volume reductions. This can be
viewed as a polyhedral version of the ellipsoid method. After Khachian proposed his
method, this idea regained researcher interest.
To describe this idea, we assume that there are n + 1 points v^0, v^1, ..., v^n in R^n
such that no hyperplane passes through all of them. Then the convex hull (as defined in
Chapter 2) generated by these n + 1 points forms a simplex S(v^0, ..., v^n). It is
obvious that a simplex in R^2 is a triangle, and in R^3 a tetrahedron. The center of this
simplex is the point defined by
simplex is the point defined by
c = ( 1/(n+1) ) Σ_{i=0}^n v^i     (5.27)
for x ∈ S. Moreover, we let e(v^k) = max{e(v^i) | i = 0, ..., n}. Since (1/2)S ≠ ∅, we have
e(v^k) > 0. We now define scalars d_i by

(5.29a)

and

v̄^i = v^k + d_i (v^i − v^k),   i = 0, 1, ..., n     (5.29b)

It is straightforward to show that the new simplex S(v̄^0, ..., v̄^n) contains the half-simplex
(1/2)S(v^0, ..., v^n), and vol(S(v̄^0, ..., v̄^n)) < e^{−1/(2(n+1)^2)} vol(S(v^0, ..., v^n)).
Linear programming practitioners have long taken the efficiency of the simplex method
for granted. However, worst-case analysis shows the algorithm is of exponential
complexity. The gap between practical efficiency and worst-case behavior still requires
substantial effort to reach full understanding.
On the other hand, Khachian's algorithm is of polynomial complexity, which settles
a significant theoretical question about the degree of difficulty of linear programming
problems. However, even with considerable modification, the ellipsoid method seems to
be inferior to the simplex method for practical computation.
Although the ellipsoid method has also shown its theoretical significance in solving
nonlinear and combinatorial optimization problems where the constraints are known only
implicitly and may be exponential in number, a polynomial-time algorithm with better
performance in practice for linear programming problems was still in high demand. In 1984,
N. Karmarkar finally provided a promising result and stimulated exciting developments
in this area. We shall study Karmarkar's algorithm in the next chapter.
5.1. Bland, R. G., Goldfarb, D., and Todd, M. J., "The ellipsoid method: a survey," Operations
Research 29, 1039-1091 (1981).
5.2. Borgwardt, K. H., The Simplex Method: A Probabilistic Analysis, Springer-Verlag, Berlin
(1987).
5.3. Burrell, B. P., and Todd, M. J., "The ellipsoid method generates dual variables," Mathematics
of Operations Research 10, 688-700 (1985).
EXERCISES
5.1. Compare the graphs of f_1(n) = n^2, f_2(n) = n^3, f_3(n) = 2^n, f_4(n) = 100n^2, and
f_5(n) = (0.001)2^n, for n ≥ 0.
(a) Does a quadratic algorithm always perform better than a cubic algorithm? Why?
(b) Does a polynomial algorithm always perform better than an exponential algorithm?
Why?
5.2. Show that C(n, m) ≥ 2^m for nonnegative integers n and m with n ≥ 2m.
5.3. Consider the Klee-Minty example, letting θ = 1/ε and using the linear transformation w_1 = x_1,
w_i = (x_i − εx_{i−1})/ε^{i−1} for i = 2, ..., n. Show that problem (5.5) is equivalent to

Maximize Σ_{i=1}^n w_i
subject to
w_1 ≤ 1
w_i + 2 Σ_{k=1}^{i−1} w_k ≤ θ^{i−1}   for i = 2, ..., n
w_j ≥ 0,   j = 1, 2, ..., n
It has n + 1 extreme points v^0, ..., v^n which do not lie on a common hyperplane. Therefore the volume
of the polytope is at least

(1/n!) | det [ v^0  v^1  ⋯  v^n
                1    1   ⋯   1  ] |
5.9. When B_k is a symmetric positive definite matrix, prove that B_{k+1} defined by (5.16) has the
same property.
5.10. Show that the set defined by (5.17) is contained in the ellipsoid
E_{k+1} = {x ∈ R^n | (x − x^{k+1})^T B_{k+1}^{-1} (x − x^{k+1}) ≤ 1}.
5.11. Consider the ellipsoid method in Section 5.3. What is the volume of E_0? Show that the
total number of iterations needed is O(n^2 L).
5.12. Use Farkas' lemma in Chapter 4 to prove Lemma 5.3.
5.13. Show that the ellipsoid defined by (5.20) has its center at ((1 + nt)/(n + 1), 0, ..., 0) and
has volume equal to

( n(1 − t)/(n + 1) ) ( n^2(1 − t^2)/(n^2 − 1) )^{(n−1)/2}

times the volume of the unit sphere.
5.14. Prove that in R^n, the ellipsoid Ē given by Equations (5.15), (5.16), (5.22), and (5.23) contains
the desired feasible solution set, and determine the volume of Ē.
5.15. Consider a simple system of linear inequalities x_1 > 1/2, x_2 > 1/2. Solve the problem by
the basic ellipsoid method and by the modified method with deep cuts. Does the idea of deep
cuts help?
5.16. Consider Exercise 5.15. Generate the surrogate cut x_1 + x_2 > 1 and then apply the modified
ellipsoid method to solve the problem.
5.17. Consider a simple system of linear inequalities x_1 > 1/4, x_1 < 1/2, x_2 < 1/2. Solve the
problem by the ellipsoid method with parallel cuts.
5.18. Prove that the deepest surrogate cut at the kth iteration of the ellipsoid method is the one
whose u_i coefficients are obtained by solving (5.24).
5.19. In generating parallel cuts, if b_j = −b_i, calculate the parameters τ, σ, and δ. Compare the
ranks of B_k and B_{k+1} and conclude that E_{k+1} becomes flat in the direction of a_i.
5.20. For any x ∈ S(v^0, ..., v^n), we have

x = Σ_{i=0}^n u_i v^i
5.21. Prove that the ratio r between the volume of the new simplex and the volume of a given simplex
in Lemma 5.6 is less than e^{−1/(2(n+1)^2)}. [Hint: Note the facts that v̄^k = v^k; each v̄^i with
i ≠ k lies on the line passing through v^k and v^i; and the distance from v^k to v̄^i equals
the distance from v^k to v^i divided by d_i. Hence

r = Π_{i≠k} (1/d_i). ]
6

Karmarkar's Projective Scaling Algorithm
Sec. 6.1 Basic Ideas of Karmarkar's Algorithm 113
upon two key factors: (1) How many steps (iterations) does it take? (2) How much
computation does it involve in each iteration?
The simplex method starts with an extreme point and keeps moving to a better
neighboring extreme point at each iteration until an optimal solution or infeasibility
is reached. In this scheme, the computational work at each iteration is minimized by
limiting the searches to only those edge directions which lead to adjacent extreme points.
But, as the Klee-Minty example showed, the simplex method may have to travel a
long path on the boundary of the feasible domain and visit almost every extreme point
before it stops. This boundary approach suffers from heavy computation in large-scale
applications, since the feasible domain may contain a huge number of extreme points.
Therefore one alternative idea is to travel across the interior of the feasible domain along
a "shorter path" in order to reduce the total number of iterations. However, this interior-
point approach usually requires the consideration of all feasible directions for a better
movement at each iteration. In other words, the new philosophy is to reduce the number
of iterations at the expense of heavier computation at each iteration.
In general, it is not an easy task to identify the "best direction of movement" among
all feasible directions at a particular interior point of the feasible domain. However,
Karmarkar noticed two fundamental insights, assuming the feasible domain is a polytope.
1. If the current interior solution is near the center of the polytope, then it makes sense
to move in the direction of steepest descent of the objective function to achieve a
minimum value.
2. Without changing the problem in any essential way, an appropriate transformation
can be applied to the solution space such that the current interior solution is placed
near the center in the transformed solution space.
The first insight can be observed in Figure 6.1. Since x 1 is near the center of
the polytope, we can improve the solution substantially by moving it in a direction of
steepest descent. But if an off-center point x 2 is so moved, it will soon be out of the
feasible domain before much improvement is made.
Figure 6.1
Karmarkar observed the second insight via the so-called projective transformation,
whereby straight lines remain straight lines while angles and distances are distorted such that
114 Karmarkar's Projective Scaling Algorithm Chap. 6
we can view any interior point as the center of the polytope in a distorted picture. One
can use imagination to verify this with Figure 6.1 by viewing it at an angle and distance
that makes x2 appear to be near the center of the polytope. Such a distortion scarcely
alters anything essential to the problem but merely looks at it from a different viewpoint.
With these two fundamental insights, the basic strategy of Karmarkar's projective
scaling algorithm is straightforward. We take an interior solution, transform the solution
space so as to place the current solution near the center of the polytope in the transformed
space, and then move it in the direction of steepest descent, but not all the way to the
boundary of the feasible domain in order to have it remain as an interior solution. Then
take the inverse transformation to map the improved solution back to the original solution
space as a new interior solution. We repeat the process until an optimum is obtained
with the desired accuracy.
Following the basic strategy of projective scaling, Karmarkar's algorithm has a
preferred standard form for linear programming:

Minimize c^T x     (6.1a)
subject to Ax = 0     (6.1b)
e^T x = 1,  x ≥ 0     (6.1c)

where A is an m × n matrix of full row rank, e^T = (1, 1, ..., 1) is an
n-vector of all ones, and c, x ∈ R^n.
A feasible solution vector x of problem (6.1) is defined to be an interior solution
if every variable x_i is strictly positive. Note from (6.1c) that the feasible domain is a
bounded set, hence it is a polytope. A consistent problem in Karmarkar's standard
form certainly has a finite infimum. Karmarkar made two assumptions for his algorithm.
We shall see later how a linear programming problem can be cast into Karmarkar's
standard form satisfying the two assumptions. Here are a couple of examples that fit our
description.
Example 6.1
Minimize −x_1
subject to x_2 − x_3 = 0
x_1 + x_2 + x_3 = 1,  x ≥ 0
Sec. 6.2 Karmarkar's Standard Form 115
Example 6.2
Minimize −x_1 − 2x_2 + 4x_5
subject to x_2 − x_3 = 0
x_1 + x_2 + x_3 + x_4 + x_5 = 1,  x ≥ 0
The feasible domain of (6.1) lies on the simplex Δ = {x ∈ R^n | e^T x = 1, x ≥ 0}. The
radius of the sphere circumscribing Δ is

R = √((n − 1)/n)     (6.3)

[Figure 6.2: the simplex Δ with its circumscribed and inscribed spheres.]

and the radius of the sphere inscribed in Δ is

r = 1/√(n(n − 1))     (6.4)
Let x̄ be an interior point of Δ, i.e., x̄_i > 0 for i = 1, ..., n and Σ_{i=1}^n x̄_i = 1. We can
define an n × n diagonal matrix

X̄ = diag(x̄) = diag(x̄_1, x̄_2, ..., x̄_n)     (6.5)

It is obvious that matrix X̄ is nonsingular and its inverse X̄^{-1} is also a diagonal
matrix, with 1/x̄_i as its ith diagonal element for i = 1, ..., n. Moreover, we can
define a projective transformation T_x̄ from Δ to Δ such that

T_x̄(x) = X̄^{-1} x / (e^T X̄^{-1} x)   for each x ∈ Δ     (6.6)
Example 6.3
Consider the simplex Δ in R^3 as shown in Figure 6.3. Let x = (1, 0, 0)^T, y = (0, 1, 0)^T,
z = (0, 0, 1)^T, a = (3/10, 1/10, 3/5)^T, b = (1/3, 0, 2/3)^T, c = (0, 1/7, 6/7)^T,
d = (3/4, 1/4, 0)^T.
Since point a is an interior point, we can define

X̄_a = diag(3/10, 1/10, 3/5)

Then we have

X̄_a^{-1} = diag(10/3, 10, 5/3)

Moreover, we see that T_a(x) = (1, 0, 0)^T, T_a(y) = (0, 1, 0)^T, T_a(z) = (0, 0, 1)^T, T_a(a) =
(1/3, 1/3, 1/3)^T, T_a(b) = (1/2, 0, 1/2)^T, T_a(c) = (0, 1/2, 1/2)^T, T_a(d) = (1/2, 1/2, 0)^T.
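The mappings in Example 6.3 can be checked directly from definition (6.6); a short sketch:

```python
import numpy as np

def T(xbar, x):
    """Projective transformation (6.6): Xbar^{-1} x / (e^T Xbar^{-1} x)."""
    w = x / xbar                 # Xbar^{-1} x, since Xbar = diag(xbar)
    return w / w.sum()

a = np.array([3 / 10, 1 / 10, 3 / 5])    # interior point of Example 6.3
b = np.array([1 / 3, 0.0, 2 / 3])
c = np.array([0.0, 1 / 7, 6 / 7])
d = np.array([3 / 4, 1 / 4, 0.0])

center = T(a, a)    # a is mapped to the center (1/3, 1/3, 1/3) of the simplex
```

Note that the vertices of the simplex are fixed points of T_a, while every interior point, in particular a itself, is pulled toward the center.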
Sec. 6.3 Karmarkar's Projective Scaling Algorithm 117
[Figure 6.3: the simplex Δ in R^3 with vertices x = (1, 0, 0)^T, y = (0, 1, 0)^T, z = (0, 0, 1)^T.]

Example 6.3 showed that scale and angle in the transformed space are
distorted such that a current interior point, in this case point a, becomes the center of Δ.
In general, we can show the following results:
Consider a linear programming problem in Karmarkar's standard form (6.1). Its feasible
domain is a polytope formed by the intersection of the null space of the constraint matrix
A, i.e., {x | Ax = 0}, and the simplex Δ in R^n. Let x̄ > 0 be an interior feasible solution;
then the projective transformation T_x̄ maps x ∈ Δ to

y = T_x̄(x) = X̄^{-1} x / (e^T X̄^{-1} x)     (6.8)
Plugging the value of x into problem (6.1) according to Equation (6.8), and remembering
that T_x̄ maps Δ onto Δ, we have a corresponding problem in the transformed space,
namely,

minimize c^T X̄ y / (e^T X̄ y)     (6.1'a)
subject to A X̄ y = 0,  e^T y = 1,  y ≥ 0

If we let

B = [ A X̄
      e^T ]     (6.9)

then any direction d ∈ R^n in the null space of matrix B, i.e., Bd = 0, is a feasible
direction of movement for y. But remember that the distance from the center of Δ to its
closest boundary is given by the radius r in Equation (6.4). Therefore, if we denote the
norm of d by ||d||, then

y(α) = e/n + α r (d/||d||)     (6.10)
remains feasible to problem (6.1') as long as d lies in the null space of matrix B and
0 ≤ α ≤ 1. In particular, if 0 ≤ α < 1, then y(α) remains an interior solution, and its
inverse image

x(α) = T_x̄^{-1}(y(α)) = X̄ y(α) / (e^T X̄ y(α))     (6.11)
becomes a new interior solution to the original problem (6.1). Also note that since

r = 1/√(n(n − 1)) > 1/n

we may replace Equation (6.10) by

y(α) = e/n + (α/n)(d/||d||)     (6.10')
c^T x^k ≤ 2^{−L} (c^T x^0)

x^{k+1} = X_k y^{k+1} / (e^T X_k y^{k+1})

Set k = k + 1; go to Step 2.
Here L is chosen to be the problem size as defined in Chapter 5, or a multiple of the problem size
such that 2^{−L} < δ for a given tolerance δ > 0. We shall prove in the next section that
if the step size α is chosen to be 1/3, then the algorithm terminates in O(nL) iterations.
But for real applications, a larger value of α tends to speed up convergence.
Example 6.4
Solve Example 6.1 by Karmarkar's algorithm.
First we see that the linear programming problem is in Karmarkar's standard form,
which satisfies both assumptions (A1) and (A2). Hence we start with

x^0 = (1/3, 1/3, 1/3)^T

and note that A = [0, 1, −1] and c^T = (−1, 0, 0).
Now check Step 2. From Equation (5.4), we can choose L = 20 and easily see that
the objective value at x^0 is too high. Therefore we have to find a better solution.
For Step 3, we define

X_0 = diag(1/3, 1/3, 1/3)

then AX_0 = [0, 1/3, −1/3] and

B_0 = [ 0  1/3  −1/3
        1   1     1  ]

Moreover, the moving direction is given by

d^0 = −[I − B_0^T (B_0 B_0^T)^{-1} B_0] X_0 c = (2/9, −1/9, −1/9)^T

with norm ||d^0|| = √6/9. For purposes of illustration, we choose α = 1/√6 to obtain a new
solution in the transformed space

y^1 = (1/3, 1/3, 1/3)^T + (1/3)(1/√6)(9/√6)(2/9, −1/9, −1/9)^T = (4/9, 5/18, 5/18)^T

Hence the new interior feasible solution is given by

x^1 = X_0 y^1 / (e^T X_0 y^1) = (4/9, 5/18, 5/18)^T

Continuing this iterative process, Karmarkar's algorithm will stop at the optimal
solution x* = (1, 0, 0)^T. It is worth mentioning that if we take α = 6/√6 = √6 > 1, then
y^1 = (1, 0, 0)^T and x^1 = x*. Hence direction d^0 really points to the optimal solution.
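The computation in Example 6.4 can be reproduced step by step; a minimal sketch of one projective-scaling iteration, assuming the direction and update formulas above:

```python
import numpy as np

def karmarkar_step(A, c, x, alpha):
    """One iteration of Karmarkar's algorithm at the interior point x.

    Builds B = [A X; e^T], projects the scaled cost X c onto null(B) to get
    the moving direction d, steps from the center e/n as in (6.10'), and maps
    back to the original space by the inverse transformation (6.11).
    """
    n = len(x)
    X = np.diag(x)
    B = np.vstack([A @ X, np.ones(n)])
    P = np.eye(n) - B.T @ np.linalg.inv(B @ B.T) @ B   # projector onto null(B)
    d = -P @ (X @ c)                                   # projected steepest descent
    y = np.ones(n) / n + (alpha / n) * d / np.linalg.norm(d)
    x_new = (X @ y) / np.sum(X @ y)                    # inverse image (6.11)
    return d, x_new

A = np.array([[0.0, 1.0, -1.0]])
c = np.array([-1.0, 0.0, 0.0])
x0 = np.ones(3) / 3
d0, x1 = karmarkar_step(A, c, x0, alpha=1 / np.sqrt(6))
```

With alpha = 1/sqrt(6) this yields d^0 = (2/9, −1/9, −1/9) and x^1 = (4/9, 5/18, 5/18), and with the oversized step alpha = sqrt(6) it lands exactly on the optimum (1, 0, 0), matching the remark above.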
In this section we show that Karmarkar's algorithm terminates in O(nL) iterations under
assumptions (A1) and (A2). The key to proving this polynomial-time solvability is to find
Sec. 6.4 Polynomial-Time Solvability 121
an appropriate step-length α such that the objective value after each iteration decreases
at a geometric rate. In particular, Karmarkar showed that, for α = 1/3,

c^T x^k ≤ e^{−k/5n} (c^T x^0)     (6.13)

In this way, for L (or a multiple of it) large enough such that 2^{−L}(c^T x^0) ≈ 0, we need
only choose k satisfying

e^{−k/5n} ≤ 2^{−L}     (6.14)

Then we can terminate the algorithm at the precision level we want. Taking the natural
logarithm of (6.14), we see the requirement becomes

k ≥ (5 log_e 2) nL

In other words, if k > 5nL, the algorithm can be terminated with c^T x^k < ε. Hence
Karmarkar's algorithm requires only a polynomial number O(nL) of iterations.
Notice that (6.13) is equivalent to

n log_e(c^T x^k) ≤ n log_e(c^T x^0) − k/5

or

n log_e(c^T x^{k+1}) ≤ n log_e(c^T x^k) − 1/5   at each iteration

This shows that the requirement (6.13) will be met if at each iteration we can reduce the
function value of n log_e(c^T x) by at least a constant 1/5. Remember that the direction
of movement in Karmarkar's algorithm was chosen to be the projected negative gradient
in order to reduce the function value of c^T X_k y, which is clearly different from the desired
function n log_e(c^T x). To link these two different objectives together, Karmarkar defined
a potential function for each interior point x of Δ and cost vector c as follows:

f(x) = Σ_{j=1}^n log_e (c^T x / x_j)     (6.18)
Two simple properties can be derived from this definition. First, in the transformed
solution space, we have a corresponding potential function

f'(y) = Σ_{j=1}^n log_e (c^T X_k y / y_j)     (6.19a)

Remember that y = T_{x^k}(x); hence we have

f'(y) = Σ_{j=1}^n log_e (c^T x / x_j) + Σ_{j=1}^n log_e x_j^k = f(x) + log_e (det X_k)     (6.19b)

where det X_k is the determinant of the diagonal matrix X_k.
The previous equation shows that the potential function is invariant under the
projective transformation T_{x^k}, which satisfies the relation

f'(y) = f(x) + log_e (det X_k)     (6.20)
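The invariance relation (6.20) can be checked numerically for arbitrary interior points; the random data below is purely illustrative:

```python
import numpy as np

def potential(c, x):
    """Karmarkar's potential function (6.18): sum_j log(c^T x / x_j)."""
    return np.sum(np.log(c @ x / x))

rng = np.random.default_rng(0)
xbar = rng.uniform(0.1, 1.0, 4)      # interior scaling point (plays x^k)
x = rng.uniform(0.1, 1.0, 4)
c = rng.uniform(0.1, 1.0, 4)         # positive cost, so c^T x > 0

y = (x / xbar) / np.sum(x / xbar)    # y = T_xbar(x), as in (6.8)
lhs = potential(np.diag(xbar) @ c, y)            # f'(y), with cost scaled to X c
rhs = potential(c, x) + np.log(np.prod(xbar))    # f(x) + log det(X), as in (6.20)
```

The two quantities agree up to rounding error, since the simplex normalization in (6.8) cancels inside each ratio c^T X y / y_j.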
The second property is based on the observation that x^0 = e/n is the center of Δ.
Therefore if we can reduce the potential function value below f'(e/n) by a constant in the
transformed solution space at each iteration, then f(x^k) is reduced by the same amount
after each iteration in the original space. In particular, if we can show that

f'(y^{k+1}) ≤ f'(e/n) − 1/5     (6.21)

then

f(x^{k+1}) ≤ f(x^k) − 1/5

Consequently, we have

f(x^k) ≤ f(x^0) − k/5   for k = 1, 2, ...

or

n log_e(c^T x^k) − Σ_{j=1}^n log_e x_j^k ≤ n log_e(c^T x^0) − Σ_{j=1}^n log_e x_j^0 − k/5

Note that x^0 is at the center e/n of Δ and the function value of Σ_{j=1}^n log_e x_j over Δ is
maximized at the center of Δ; hence condition (6.17)

n log_e(c^T x^k) ≤ n log_e(c^T x^0) − k/5

is immediately achieved, which guarantees the polynomial-time termination of Karmarkar's
algorithm.
The remaining work is to show that condition (6.21) holds for an appropriately
chosen step-length α in Karmarkar's algorithm. Recall from (6.19a) that

f'(y) = n log_e(c^T X_k y) − Σ_{j=1}^n log_e y_j

and

y = e/n + (α/n)(d/||d||)   for some 0 ≤ α ≤ 1
Proof. Note that the direction vector d is obtained as the projection of the negative
scaled cost vector −X_k c; hence c^T X_k d = −||d||^2. Then we have

c^T X_k y = (c^T X_k e)/n − (α/n) ||d||
Moreover, we define S'(e/n, β) to be the spheroid in the transformed space which has
center e/n and radius β ≥ 0. In this way, if we take

β = R = √((n − 1)/n)

then y(R) is the minimizer of the following problem:

Minimize c^T X_k y
subject to A X_k y = 0
y ∈ S'(e/n, R)

which is a relaxation of the problem

Minimize c^T X_k y
subject to A X_k y = 0
e^T y = 1,  y ≥ 0

Notice that the latter problem is closely related to problem (6.1). By Karmarkar's second
assumption (A2), we know its optimum value is zero. Hence we know the optimal
objective value of the relaxed problem is nonpositive and

c^T X_k y(R) = (c^T X_k e)/n − R ||d|| ≤ 0

This implies that

(1/n) ||d|| ≥ (c^T X_k e)/(Rn)
− Σ_{j=1}^n log_e y_j ≤ − Σ_{j=1}^n log_e (1/n) + α^2 / (2(1 − α)^2)     (6.23)
Proof. Since

y ∈ S'(e/n, α/n)

we know

y_j ≥ 1/n − α/n

and hence n y_j ≥ 1 − α, for j = 1, 2, ..., n. Taking the Taylor series expansion of
log_e(1 + (n y_j − 1)), for each j there is a μ_j between 1 and n y_j such that

log_e(n y_j) = (n y_j − 1) − (n y_j − 1)^2 / (2 μ_j^2)

and, combining the two bounds,

f'(y) ≤ f'(e/n) − α + α^2 / (2(1 − α)^2)   for appropriate α

In particular, for α = 1/3,

f'(y) ≤ f'(e/n) − 1/3 + 1/8 = f'(e/n) − 5/24
Therefore condition (6.21) is satisfied, and we have the following result as a major
theorem for polynomial-time solvability.
Theorem 6.1. Under assumptions (A1) and (A2), if the step-length is chosen
to be α = 1/3, then Karmarkar's algorithm stops in O(nL) iterations.
Our objective is to convert this problem into the standard form (6.1) required by
Karmarkar, while satisfying the assumptions (A1) and (A2). We shall first see how to
convert problem (6.24) into Karmarkar's form and then discuss the two assumptions.
The key feature of Karmarkar's standard form is the simplex structure, which of
course results in a bounded feasible domain. Thus we want to regularize problem (6.24)
by adding a bounding constraint

e^T x ≤ Q

for some positive integer Q derived from feasibility and optimality considerations. In the worst case, we can choose Q = 2^L, where L is the problem size. If this constraint is binding at optimality with an objective value of magnitude −2^{O(L)}, then we can show that the given problem (6.24) is unbounded.

By introducing a slack variable x_{n+1}, we have a new linear program:

Minimize c^T x   (6.25a)

subject to Ax = b   (6.25b)

e^T x + x_{n+1} = Q   (6.25c)
Assumption (A2) requires the optimal objective value of a given linear program to be
zero. For those linear programming problems with a known optimal objective value, this
assumption can be easily taken care of. But for those with unknown optimal objective
values, we have to figure out a process to obtain that piece of information.
Originally, Karmarkar used the so-called sliding objective function method to handle
the problem. We let z* be the unknown optimum value of the objective function and pick an arbitrary value z̄. Suppose we run Karmarkar's algorithm pretending that z̄ is the minimum value of the objective function, i.e., we try to minimize c^T x − z̄ for the given linear program. We also modify Step 3 of Karmarkar's algorithm as follows:

"After finding y^{k+1}, we check if

(c^T X_k y^{k+1}) / (e^T X_k y^{k+1}) < z̄

If so, we choose a point ȳ^{k+1} on the line segment between e/n and y^{k+1} such that

(c^T X_k ȳ^{k+1}) / (e^T X_k ȳ^{k+1}) = z̄"

In this way, for z̄ ≥ z*, the algorithm eventually produces a solution that achieves the assumed minimum z̄. On the other hand, for z̄ < z*, eventually we get a proof that the assumed minimum is lower than the actual minimum, by noticing that Karmarkar's iteration is no longer able to produce a constant reduction in the potential function.
With this modification, we can describe the sliding objective function method as
follows. Given that a lower bound l and an upper bound u on the objective function
are known (otherwise, we can take l = −2^{O(L)} and u = 2^{O(L)} to start with), we further define a tentative lower bound l' and upper bound u' by

l' = l + (1/3)(u − l)   (6.29)

and

u' = l + (2/3)(u − l)   (6.30)

We pretend that l' is the minimum value of the objective function and run the modified algorithm. Karmarkar showed that in a polynomial number of iterations, the algorithm either identifies that l' is lower than the actual minimum or finds a feasible solution with an objective value lower than u'. For suppose l' is not lower than the actual minimum; then the constant reduction in the potential function in each iteration will force c^T x to be lower than u'. When l' is found to be too low or u' is too high, we replace l by l' or u by u' correspondingly and rerun the algorithm. Since the range u − l ≥ 0 shrinks geometrically after each run, we know that in O(nL) runs the range is reduced from 2^{O(L)} to 2^{−O(L)} and an optimal solution will be identified.
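The interval-shrinking logic of the sliding objective function method can be sketched as follows. The oracle argument below stands in for a full run of the modified algorithm: in the actual method the answer comes from watching whether the constant potential-function reduction stalls, while here, purely for illustration, it compares against a known optimum.

```python
def sliding_objective(l, u, assumed_min_too_low, tol=1e-6):
    """Shrink [l, u] around the unknown optimum by a factor of 2/3 per run.

    assumed_min_too_low(lp) answers: "is the assumed minimum lp lower than
    the actual minimum?" (detected via the potential function in the real
    method; supplied as an oracle here).
    """
    while u - l > tol:
        lp = l + (u - l) / 3.0        # tentative lower bound (6.29)
        up = l + 2.0 * (u - l) / 3.0  # tentative upper bound (6.30)
        if assumed_min_too_low(lp):
            l = lp                    # l' was too low: raise the lower bound
        else:
            u = up                    # a solution below u' exists: lower the upper bound
    return l, u

# demonstration with a made-up optimum z* = 3.7
z_star = 3.7
l, u = sliding_objective(-2.0**10, 2.0**10, lambda lp: lp < z_star)
print(l, u)   # a tiny interval containing z_star
```

Either branch leaves the optimum inside the interval, and the width shrinks by exactly one third per run, which is the geometric shrinkage used in the complexity count above.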
Another way to handle the unknown optimal objective value is to use the information of dual variables. Consider the dual of the linear programming problem (6.1). We have

Maximize z   (6.31a)

subject to Σ_{i=1}^m a_{ij} w_i + z ≤ c_j,   j = 1, 2, ..., n   (6.31b)

w ∈ R^m,  z ∈ R   (6.31c)

Notice that the dual problem (6.31) is always feasible, since we can choose any values of w_1, w_2, ..., w_m and let

z = min_{j=1,...,n} ( c_j − Σ_{i=1}^m a_{ij} w_i )   (6.32)

so that (w, z) becomes a feasible solution to problem (6.31). For simplicity, we can write (6.31b) as

A^T w + z e ≤ c   (6.31b')

and write (6.32) as

z = min_j (c − A^T w)_j   (6.32')
If a given linear program (6.1) satisfies assumption (A2), then we know z ≤ 0 in the dual problem (6.31). Moreover, any dual feasible solution (w, z) provides a lower bound z for the optimal objective value z* of problem (6.1). One immediate question is, how do we define dual variables associated with each iteration of Karmarkar's algorithm? With this definition in hand, we then discuss how to use the dual information to handle problems with unknown optimal objective values.
To get a hint on the definition of dual variables at each iteration, we first consider the form of the dual variables (w*, z*) at optimum. Assume that x* is the optimal solution to problem (6.1) and denote the matrix X* = diag (x*_1, ..., x*_n). At optimum, we know A^T w* ≤ c. By complementary slackness, we further have X* A^T w* = X* c. In order to represent w* in terms of x*, we multiply both sides by A X*. Hence we have

A (X*)^2 A^T w* = A (X*)^2 c   (6.33)

This suggests that we might obtain good dual solutions by defining

w^k = (A X_k^2 A^T)^{-1} A X_k^2 c   (6.34)

and

z^k = min_j (c − A^T w^k)_j   (6.35)

at each iteration of Karmarkar's algorithm. This is indeed true under the nondegeneracy assumption, owing to the following theorem:
Theorem 6.2. Under the assumptions (A1) and (A2), if the iterates {x^k} defined in Karmarkar's algorithm converge to a nondegenerate basic feasible solution x* of problem (6.1), then {(w^k, z^k)} defined by (6.34) and (6.35) converges to an optimal solution of its dual problem (6.31).

Proof. Let X̄* be the principal submatrix of X* corresponding to the basic variables in x*, and let Ā be the basis matrix of the given linear program corresponding to x*. Then Ā has rank m and so does Ā X̄*. Hence we know Ā (X̄*)^2 Ā^T is nonsingular. Consequently, A (X*)^2 A^T = Ā (X̄*)^2 Ā^T is nonsingular.

By definition (6.34), we know (A X_k^2 A^T) w^k = A X_k^2 c for k = 1, 2, .... Noticing that the matrix A X_k^2 A^T converges to the nonsingular matrix A (X*)^2 A^T and the vector A X_k^2 c converges to A (X*)^2 c, it follows that w^k converges to the unique solution w* of Equation (6.33). But we already know that the optimal solution to problem (6.31) also satisfies Equation (6.33); hence {(w^k, z^k)} must converge to the optimal dual solution.
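As a numerical sketch, the dual estimates (6.34)-(6.35) are straightforward to compute. The small problem below is an illustrative assumption (in Karmarkar's form with optimum value 0), not an example from the text.

```python
import numpy as np

def dual_estimates(A, c, x):
    """Dual estimates (6.34)-(6.35): w = (A X^2 A^T)^{-1} A X^2 c, z = min_j (c - A^T w)_j."""
    AX2 = A * (x ** 2)                     # columns of A scaled by x_j^2, i.e. A X_k^2
    w = np.linalg.solve(AX2 @ A.T, AX2 @ c)
    z = np.min(c - A.T @ w)
    return w, z

# illustrative problem: min 2x1 + 2x2  s.t.  x1 - x2 = 0 on the simplex (optimum 0)
A = np.array([[1.0, -1.0, 0.0]])
c = np.array([2.0, 2.0, 0.0])
x = np.array([0.03, 0.01, 0.96])           # an interior iterate

w, z = dual_estimates(A, c, x)
# (w, z) is feasible for the dual (6.31), and z lower-bounds the optimum z* = 0
print(w, z, np.max(A.T @ w + z - c))
```

By construction z = min_j (c − A^T w)_j, so A^T w + z e ≤ c always holds, exactly as in (6.32).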
A related approach, proposed by M. J. Todd and B. P. Burrell, generates dual estimates at each iteration. Their basic idea is to incorporate dual information {(w^k, z^k)} into Karmarkar's algorithm, with {z^k} being monotonically nondecreasing, so that z^k can be used as an estimate of the unknown optimum value of the objective function.

Notice that for a primal feasible solution x, c^T x − z^k = c^T x − z^k e^T x = (c − z^k e)^T x; therefore we define

c(z^k) = c − z^k e   (6.36)
In this way, when z* is unknown, we can consider replacing c by c(z^k) in the objective function at the kth iteration as an estimate. Now, assume that we can modify Karmarkar's algorithm by finding a sequence of feasible solutions x^k, w^k, and z^k such that condition (6.40) holds at each iteration, for k = 0, 1, .... Then, before the optimum is reached, we know z^k ≤ z* < c^T x^k. Moreover, (6.37) and (6.40) directly imply that c^T x^k ≤ c^T x^0. Together with the definition of the potential function (6.18) and inequality (6.40), we know that

f(x^k; c(z*)) ≤ f(x^0; c(z*)) − k/5   (6.41)
Therefore, the modified algorithm will converge in the same way as Karmarkar's algorithm. The remaining question is how to construct such a sequence of improved solutions.

For k = 0, since we know how to take care of assumption (A1), we can choose

x^0 = e/n

and a corresponding z^0. Then (6.37)-(6.40) are clearly satisfied. We are now interested in knowing how to find x^{k+1}, w^{k+1}, and z^{k+1} satisfying (6.37)-(6.40), given that we have proceeded through the kth iteration. Before doing so, we need some notation and a key lemma. First, for a p × n matrix M with rank p, we denote by

P_M = I − M^T (M M^T)^{-1} M

the projection mapping onto the null space of M, i.e., {d ∈ R^n | M d = 0}. Also denote

P_e = I − (e e^T)/n
and

B = [ A ; e^T ]   (6.42)

that is, B is the matrix A with the row e^T appended. Suppose that A has full row rank and A e = 0; then B has full row rank and P_B = P_e P_A = P_A P_e.
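This relationship between the projections — projecting onto the null space of B is the composition of the two simpler projections whenever Ae = 0 — is easy to check numerically; the matrix below (whose rows sum to zero, so that Ae = 0) is an illustrative assumption.

```python
import numpy as np

def null_proj(M):
    """P_M = I - M^T (M M^T)^{-1} M, the projection onto the null space of M."""
    n = M.shape[1]
    return np.eye(n) - M.T @ np.linalg.solve(M @ M.T, M)

n = 4
A = np.array([[1.0, -1.0, 0.0,  0.0],
              [1.0,  1.0, 1.0, -3.0]])   # each row sums to 0, so A e = 0
e = np.ones(n)

P_A = null_proj(A)
P_e = np.eye(n) - np.outer(e, e) / n
P_B = null_proj(np.vstack([A, e]))       # B = [A; e^T]

# under Ae = 0 the two projections commute and compose to P_B
print(np.allclose(P_B, P_e @ P_A), np.allclose(P_B, P_A @ P_e))
```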
Lemma 6.3. In applying the modified Karmarkar's algorithm with a given cost vector ĉ ∈ R^n and explicit constraint matrix Â such that Â e = 0, let d^k = −P_B ĉ, ŵ = (Â Â^T)^{-1} Â ĉ, and ẑ = min_j (ĉ − Â^T ŵ)_j. Then we have

ĉ^T ( e/n + (α/n) d^k/‖d^k‖ ) ≤ (1 − α/n) (ĉ^T e)/n + (α/n) ẑ
Proof. Since d^k is the projection of −ĉ, we have ‖d^k‖^2 = ĉ^T P_B ĉ = −ĉ^T d^k, and

ĉ^T ( e/n + (α/n) d^k/‖d^k‖ ) = (ĉ^T e)/n − (α/n) ‖d^k‖

Thus it suffices to show that

‖d^k‖ ≥ (ĉ^T e)/n − ẑ

Notice that

d^k = −P_B ĉ = −P_e P_Â ĉ = −P_e (ĉ − Â^T ŵ) = −( ĉ − Â^T ŵ − e e^T (ĉ − Â^T ŵ)/n )

Since Â e = 0, we get

d^k_j = −(ĉ − Â^T ŵ)_j + (ĉ^T e)/n,   j = 1, 2, ..., n

For some i, we have

ẑ = (ĉ − Â^T ŵ)_i

hence

d^k_i = (ĉ^T e)/n − ẑ ≥ 0   and   ‖d^k‖ ≥ d^k_i = (ĉ^T e)/n − ẑ
With the help of Lemma 6.3, we now show how to find x^{k+1}, w^{k+1}, and z^{k+1} after the kth iteration. Let w̄ = (A X_k^2 A^T)^{-1} A X_k^2 c(z^k) and z̄ = min_j (c − A^T w̄)_j. There are two cases, depending upon whether z̄ ≤ z^k.
Sec. 6.6 Handling Problems with Unknown Optimal Objective Values 133
Case 1. If z̄ ≤ z^k, then z̄ will not be a better estimate than z^k. We shall focus on satisfying (6.37) and (6.40). In this case, since

min_j (c(z^k) − A^T w̄)_j ≤ 0

and x^k ∈ F, we have

min_j (X_k c(z^k) − X_k A^T w̄)_j ≤ 0   (6.44)

Note that

X_k c(z^k) − X_k A^T w̄ = P_{A X_k} X_k c(z^k) = P_{A X_k} (X_k c − z^k x^k)

If we denote u = P_{A X_k} X_k c and v = P_{A X_k} x^k, then

X_k c(z^k) − X_k A^T w̄ = u − z^k v

and (6.44) becomes min_j (u − z^k v)_j ≤ 0. In this case we keep z^{k+1} = z^k.

Case 2. If z̄ > z^k, then z̄ provides a better estimate, and we can find z^{k+1} with z^k < z^{k+1} ≤ z̄ such that

min_j (u − z^{k+1} v)_j = 0

Thus z^k < z^{k+1} ≤ z*. Combining (6.40) with the definition of the potential function, we can show that inequality (6.48) holds, so that Lemma 6.3 can be applied with ĉ = X_k c(z^{k+1}), Â = A X_k, and B = B_k. Since the corresponding ẑ = 0, the potential function f( · ; c(z^{k+1})) can be reduced by at least 1/5, as before, by moving in the original space along the direction

d^k = −X_k P_{B_k} X_k (c − z^{k+1} e)
Combining the analysis of both cases, we state the modified step in Karmarkar's algorithm as follows:

At iteration k with x^k, w^k, and z^k, set X_k = diag (x^k) and compute

u = P_{A X_k} X_k c,   v = P_{A X_k} x^k

If min_j (u − z^k v)_j ≤ 0, then set z^{k+1} = z^k. Otherwise, find z^{k+1} > z^k with

min_j (u − z^{k+1} v)_j = 0

Then set

w^{k+1} = (A X_k^2 A^T)^{-1} A X_k^2 c(z^{k+1})

x̄^{k+1} = x^k + (1/(3n)) d^k/‖d^k‖

and set x^{k+1} = x̄^{k+1}/(e^T x̄^{k+1}).
The modified algorithm then generates a sequence {x^k} of primal feasible solutions and a sequence {(w^k, z^k)} of dual solutions such that both c^T x^k and z^k converge to the unknown optimal objective value z*.
Sec. 6.7 Unconstrained Convex Dual Approach 135
As pointed out in the previous section, the dual problem of Karmarkar's linear program inherits some interesting properties. In this section, we show that, given an arbitrarily small number ε > 0, an ε-optimal solution to a general linear program in Karmarkar's standard form can be found by solving an unconstrained convex programming problem.
Let us focus on the linear programming problem (6.1) and its dual problem (6.31) with an additional assumption that problem (6.1) has a strictly interior feasible solution x such that x_j > 0 for j = 1, ..., n. We consider the following simple geometric inequality:

Σ_{j=1}^n e^{y_j} ≥ Π_{j=1}^n ( e^{y_j}/x_j )^{x_j}   (6.49)

Setting y_j = (Σ_{i=1}^m a_{ij} w_i − c_j)/μ + log_e x_j in (6.49), taking logarithms, and multiplying by μ, we obtain

Σ_{j=1}^n ( Σ_{i=1}^m a_{ij} w_i − c_j ) x_j ≤ μ Σ_{j=1}^n x_j log_e x_j + μ log_e { Σ_{j=1}^n exp [ ( Σ_{i=1}^m a_{ij} w_i − c_j ) / μ ] }   (6.51)

which holds true for arbitrary w_i ∈ R, i = 1, 2, ..., m, and x_j > 0, j = 1, 2, ..., n, with Σ_{j=1}^n x_j = 1, and μ > 0. Moreover, inequality (6.51) becomes an equality if and only if

x_j = exp [ ( Σ_{i=1}^m a_{ij} w_i − c_j ) / μ ] / Σ_{l=1}^n exp [ ( Σ_{i=1}^m a_{il} w_i − c_l ) / μ ],   j = 1, 2, ..., n   (6.52)
Now, let us assume that the n-vector x also satisfies the constraint (6.1b) of the linear programming problem. Then

Σ_{j=1}^n a_{ij} x_j = 0,   i = 1, 2, ..., m

and the left-hand side of (6.51) reduces to −c^T x, so that

−μ log_e { Σ_{j=1}^n exp [ ( Σ_{i=1}^m a_{ij} w_i − c_j ) / μ ] } ≤ c^T x + μ Σ_{j=1}^n x_j log_e x_j   (6.54)

Defining

h(w; μ) = −μ log_e { Σ_{j=1}^n exp [ ( Σ_{i=1}^m a_{ij} w_i − c_j ) / μ ] }   (6.55)

it can be shown that h(w; μ) is a strictly concave function of w. Also, under the assumption that there is a feasible interior solution to the linear programming problem (6.1), inequality (6.54) implies that h(w; μ) is bounded from above. Hence a unique maximum solution w* exists.
Setting the derivatives of h(w; μ) at w* to zero, we have

Σ_{j=1}^n a_{ij} x*_j = 0,   i = 1, 2, ..., m   (6.56)

where

x*_j = exp [ ( Σ_{i=1}^m a_{ij} w*_i − c_j ) / μ ] / Σ_{l=1}^n exp [ ( Σ_{i=1}^m a_{il} w*_i − c_l ) / μ ],   j = 1, 2, ..., n   (6.57)

Then, equation (6.56) implies that x* satisfies the constraint (6.1b), and equation (6.57) implies that x* satisfies the constraints (6.1c). Hence x* is a feasible solution to problem (6.1). Moreover, each x*_j satisfies the condition specified in (6.52) with w replaced by w*, and hence (6.54) becomes an equality with x and w being replaced by x* and w*, respectively. We summarize the previous results as follows:

Theorem 6.3. Let w* be the unique maximum of the concave function h(w; μ) with μ > 0. If x* is defined by Equation (6.57), then

h(w*; μ) = c^T x* + μ Σ_{j=1}^n x*_j log_e x*_j
Consequently, we know h(w*; μ) approaches c^T x* as μ goes to 0. Hence, when μ is sufficiently small, we can find a near-optimal solution x* to the linear programming problem (6.1) by solving an unconstrained maximization problem of the concave objective function h(w; μ), or equivalently, minimizing an unconstrained convex function −h(w; μ). The remaining question is, "How small should μ be such that x* obtained by (6.57) is ε-optimal, i.e., c^T x* − z* ≤ ε?"

To answer this question, we define

z* = min_j (c − A^T w*)_j   (6.59)

Taking the logarithm of x*_j as defined in Equation (6.57) and multiplying the result by μ, we have

μ log_e x*_j = ( Σ_{i=1}^m a_{ij} w*_i − c_j ) + h(w*; μ)
Moreover, from the theory of linear programming, we know 0 ≤ c^T x* − z*. Let ι be an index at which the minimum in (6.59) is attained, so that z* = h(w*; μ) − μ log_e x*_ι with x*_ι = max_j x*_j. Then

0 ≤ c^T x* − z* = μ ( log_e x*_ι − Σ_{j=1}^n x*_j log_e x*_j ) = μ Σ_{j=1}^n x*_j log_e ( x*_ι / x*_j )   (6.63)

Since Σ_{j=1}^n x*_j = 1, we have

Π_{j=1}^n ( x*_ι / x*_j )^{x*_j} ≤ Σ_{j=1}^n x*_j ( x*_ι / x*_j ) = n x*_ι   (6.64)

Since 1 ≥ x*_ι, we have log_e (n x*_ι) ≤ log_e n. Therefore,

Σ_{j=1}^n x*_j log_e ( x*_ι / x*_j ) ≤ log_e n   (6.66)

and c^T x* − z* ≤ μ log_e n. Thus, given ε > 0, if we choose μ = ε/log_e n and let w* be the unique minimum of the convex function −h(w; μ), with x* defined by Equation (6.57), then

c^T x* − z* ≤ ε   (6.68)

and (x*; w*, z*) becomes an ε-optimal solution pair to the primal problem (6.1) and its dual problem (6.31).
The following example illustrates the unconstrained dual approach to linear programming problems in Karmarkar's standard form.
Example 6.5

Minimize −x_3

subject to x_1 − x_2 = 0

x_1 + x_2 + x_3 = 1,  x ≥ 0

It is easy to see that (0, 0, 1) is the optimal solution. In this case, we have a corresponding unconstrained convex programming problem:

Minimize μ log_e { exp[z/μ] + exp[−z/μ] + exp[1/μ] }

subject to z ∈ R
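A short numerical sketch of this example (the gradient-descent loop and the value μ = 0.05 are illustrative choices, not from the text): minimizing the convex function above and recovering x* from (6.57) lands near the optimal vertex (0, 0, 1).

```python
import numpy as np

A = np.array([[1.0, -1.0, 0.0]])    # explicit constraint x1 - x2 = 0
c = np.array([0.0, 0.0, -1.0])      # objective -x3
mu = 0.05                           # entropy parameter

def x_of_w(w):
    """Primal recovery (6.57): a softmax of (A^T w - c)/mu."""
    t = (A.T @ w - c) / mu
    t = t - t.max()                 # shift for numerical stability
    p = np.exp(t)
    return p / p.sum()

# the gradient of -h(w; mu) is A x(w); a few descent steps suffice here
w = np.array([0.3])
for _ in range(200):
    w = w - 0.5 * (A @ x_of_w(w))

x_star = x_of_w(w)
print(x_star, c @ x_star)   # close to (0, 0, 1) and to the optimal value -1
```

With μ = ε/log_e 3 this recovery is ε-optimal, matching the bound derived above.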
6.7.2 Extension

The work in the previous section actually suggests that we consider a perturbed problem (P_μ):

Minimize c^T x + μ Σ_{j=1}^n x_j log_e x_j

subject to Ax = 0

e^T x = 1

x ≥ 0
together with its unconstrained dual problem (D_μ):

Maximize h(w; μ)

subject to w ∈ R^m

Under the assumption that problem (6.1), and hence (P_μ), has an interior feasible solution, there is no duality gap between problems (P_μ) and (D_μ). Moreover, when A has full row rank, for any given tolerance ε > 0, by choosing

μ = ε/log_e n

the optimal solution w* of problem (D_μ) generates a primal feasible solution x* of problem (6.1), according to Equation (6.57), and a dual feasible solution (w*, z*) of problem (6.31), according to Equation (6.59), such that |c^T x* − z*| ≤ ε.
For a linear programming problem in its standard form, we consider a corresponding perturbed problem (P'_μ):

Minimize c^T x + μ Σ_{j=1}^n x_j log_e x_j

subject to Ax = b,  x ≥ 0,   for μ > 0   (6.69)

With an additional assumption that problem (P'_μ) has a bounded feasible domain, a sufficiently small μ can be determined such that the optimal solution w* of the corresponding dual problem (D'_μ) generates an ε-optimal solution x* to the original linear programming problem in standard form.
Karmarkar's projective scaling algorithm has stimulated a great amount of research interest in linear programming. Since the work was introduced in 1984, many variants have been proposed and many more are to come. The fundamental difference between Karmarkar's algorithm and simplex methods is the philosophy of moving in the interior versus moving on the boundary of the polytope. It is not true that Karmarkar-based interior-point methods are going to replace the simplex methods, at least in the foreseeable future. Both approaches are very sensitive to the structure of problems, and their performance is heavily affected by the sophistication of implementation. A hybrid method that uses the interior approach at the beginning for drastic reduction and then shifts to the simplex method for a final basic feasible solution seems attractive. We shall study the interior-point methods further and discuss implementation issues in coming chapters.
References for Further Reading

6.1. Anstreicher, K. M., "A combined phase I-phase II projective algorithm for linear programming," Mathematical Programming 43, 209-223 (1989).
6.2. Anstreicher, K. M., "On the performance of Karmarkar's algorithm over a sequence of iterations," SIAM Journal on Optimization 1, 22-29 (1991).
6.3. Bayer, D., and Lagarias, J. C., "Karmarkar's linear programming algorithm and Newton's
method," Mathematical Programming 50, 291-330 (1991).
6.4. Fang, S. C., "A new unconstrained convex programming approach to linear programming," OR Report No. 243, North Carolina State University, Raleigh, NC (1990); Zeitschrift für Operations Research 36, 149-161 (1992).
6.5. Fang, S. C., and Tsao, J. H-S., "Solving standard form linear programs via unconstrained
convex programming approach with a quadratically convergent global algorithm," OR Report
No. 259, North Carolina State University, Raleigh, NC (1991).
6.6. Gay, D., "A variant of Karmarkar's linear programming algorithm for problems in standard
form," Mathematical Programming 37, 81-90 (1987).
6.7. Hooker, J. N., "Karmarkar's linear programming algorithm," Interfaces 16, 75-90 (1986).
6.8. Karmarkar, N., "A new polynomial time algorithm for linear programming," Proceedings of
the 16th Annual ACM Symposium on the Theory of Computing, 302-311 (1984).
6.9. Karmarkar, N., "A new polynomial time algorithm for linear programming," Combinatorica 4, 373-395 (1984).
6.10. Kojima, M., "Determining basic variables of optimal solutions in Karmarkar's new LP algo-
rithm," Algorithmica 1, 499-515 (1986).
6.11. Kortanek, K. O., and Zhu, J., "New purification algorithms for linear programming," Naval Research Logistics 35, 571-583 (1988).
6.12. Monteiro, R. C., "Convergence and boundary behavior of the projective scaling trajectories
for linear programming," Mathematics of Operations Research 16, No. 4 (1991).
6.13. Rajasekera, J. R., and Fang, S. C., "On the convex programming approach to linear programming," Operations Research Letters 10, 309-312 (1991).
6.14. Shanno, D. F., "Computing Karmarkar projection quickly," Mathematical Programming 41,
61-71 (1988).
6.15. Shub, M., "On the asymptotic behavior of the projective rescaling algorithm for linear pro-
gramming," Journal of Complexity 3, 258-269 (1987).
6.16. Stone, R. E., and Tovey, C. A., "The simplex and projective scaling algorithm as iteratively
reweighted least squares methods," SIAM Review 33, 220-237 (1991).
6.17. Tapia, R. A., and Zhang, Y., "Cubically convergent method for locating a nearby vertex in
linear programming," Journal of Optimization Theory and Applications 67, 217-225 (1990).
6.18. Tapia, R. A., and Zhang, Y., "An optimal-basis identification technique for interior-point
linear programming algorithms," Linear Algebra and Its Applications, 152, 343-363 (1991).
6.19. Todd, M. J., and Burrell, B. P., "An extension to Karmarkar's algorithm for linear programming using dual variables," Algorithmica 1, 409-424 (1986).
6.20. Todd, M. J., and Ye, Y., "A centered projective algorithm for linear programming," Mathe-
matics of Operations Research 15, 508-529 (1990).
6.21. Vanderbei, R. J., "Karmarkar's algorithm and problems with free variables," Mathematical
Programming 43, 31-44 (1989).
6.22. Ye, Y., "Karmarkar's algorithm and the ellipsoidal method," Operations Research Letters 4,
177-182 (1987).
6.23. Ye, Y., "Recovering optimal basic variables in Karmarkar's polynomial algorithm for linear
programming," Mathematics of Operations Research 15, 564-572 (1990).
6.24. Ye, Y., and Kojima, M., "Recovering optimal dual solutions in Karmarkar's polynomial
algorithm for linear programming," Mathematical Programming 39, 305-317 (1987).
EXERCISES
6.1. Show that the distance between the center e/n of the simplex Δ = {x ∈ R^n | e^T x = 1, x ≥ 0} and any of its vertices is given by

R = √(n − 1)/√n

and the distance between the center and any facet of Δ is given by

r = 1/√(n(n − 1))
6.2. For a projective transformation T_x̄, prove results (T1) through (T6). What can one say about its inverse transformation?
6.3. Does the projective transformation T_x̄ map a line segment in Δ to a line segment? Why?
6.4. Why is x(α) in Equation (6.11) an interior feasible solution to problem (6.1)? Prove it.
6.5. Show that if matrix A in Equation (6.9) has full rank, then the matrix B B^T is invertible and hence the direction d in Equation (6.12) is well defined.
6.6. Carry out one more iteration of Example 6.4. Is it closer to the optimal solution?
6.7. Show that the function
7 AFFINE SCALING ALGORITHMS

Since its introduction in 1984, Karmarkar's projective scaling algorithm has become
the most notable interior-point method for solving linear programming problems. This
pioneering work has stimulated a flurry of research activities in the field. Among all
reported variants of Karmarkar's original algorithm, the affine scaling approach has especially attracted researchers' attention. This approach uses a simple affine transformation to replace Karmarkar's original projective transformation and allows one to work on linear programming problems in standard form. The special simplex structure required by Karmarkar's algorithm is thus relaxed.
The basic affine scaling algorithm was first presented by I. I. Dikin, a Soviet mathematician, in 1967. Later, in 1985, the work was independently rediscovered by E. Barnes, and by R. Vanderbei, M. Meketon, and B. Freedman. They proposed using the (primal) affine scaling algorithm to solve (primal) linear programs in standard form and established a convergence proof of the algorithm. A similar algorithm, the so-called dual affine scaling algorithm, was designed and implemented by I. Adler, N. Karmarkar, M. G. C. Resende, and G. Veiga for solving (dual) linear programs in inequality form. Compared to the relatively cumbersome projective transformation, the implementation of both the primal and dual affine scaling algorithms becomes quite straightforward. These two algorithms are currently the most widely tested variants and exhibit promising results, although the theoretical proof of polynomial-time complexity was lost in the simplified transformation. In fact, the work of N. Megiddo and M. Shub indicated that the trajectory leading to the optimal solution provided by the basic affine scaling algorithms depends upon the starting solution. A bad starting solution, which is too close to a vertex of the feasible domain, could result in a long journey traversing all vertices. Nevertheless, the polynomial-time complexity of the primal and dual affine scaling algorithms can be reestablished by incorporating a logarithmic barrier function on the walls of the positive orthant to prevent an interior solution from being "trapped" by the
Sec. 7.1 Primal Affine Scaling Algorithm 145
boundary behavior. Along this direction, a third variant, the so-called primal-dual affine scaling algorithm, was presented and analyzed by R. Monteiro, I. Adler, and M. G. C. Resende, and also by M. Kojima, S. Mizuno, and A. Yoshise, in 1987. The theoretical issue of polynomial-time complexity was successfully addressed.
In this chapter, we introduce and study the abovementioned variants of affine scaling, using the integrated theme of an iterative scheme. Attention will be focused on the three basic elements of an iterative scheme, namely, (1) how to start, (2) how to synthesize a good direction of movement, and (3) how to stop an iterative algorithm.
Consider a linear programming problem in standard form:

Minimize c^T x   (7.1a)

subject to Ax = b,  x ≥ 0   (7.1b)

where A is an m × n matrix of full row rank, b is an m-dimensional column vector, and c and x are n-dimensional column vectors.
Notice that the feasible domain of problem (7.1) is defined by

P = { x ∈ R^n | Ax = b, x ≥ 0 }

We further define the relative interior of P (with respect to the affine space {x | Ax = b}) as

P^0 = { x ∈ R^n | Ax = b, x > 0 }   (7.2)

An n-vector x is called an interior feasible point, or interior solution, of the linear programming problem if x ∈ P^0. Throughout this book, for any interior-point approach, we always make the fundamental assumption that

P^0 ≠ ∅
There are several ways to find an initial interior solution to a given linear programming
problem. The details will be discussed later. For the time being, we simply assume that
an initial interior solution x0 is available and focus on the basic ideas of the primal affine
scaling algorithm.
(1) if the current interior solution is near the center of the polytope, then it makes sense
to move in the direction of steepest descent of the objective function to achieve a
minimum value;
(2) without changing the problem in any essential way, an appropriate transformation
can be applied to the solution space such that the current interior solution is placed
near the center in the transformed solution space.
Recall that in Karmarkar's algorithm, the simplex structure Δ and its center point e/n = (1/n, 1/n, ..., 1/n)^T were purposely introduced for the realization of the above insights. When we work directly on the standard-form problem, the simplex structure is no longer available, and the feasible domain could become an unbounded polyhedral set. All the structure remaining is the intersection of the affine space {x ∈ R^n | Ax = b} formed by the explicit constraints and the nonnegative orthant {x ∈ R^n | x ≥ 0} required by the nonnegativity constraints. It is obvious that the nonnegative orthant does not have a real "center" point. However, if we position ourselves at the point e = (1, 1, ..., 1)^T, at least we still keep equal distance from each facet, or "wall," of the nonnegative orthant. As long as the moving distance is less than one unit, any new point that moves from e remains in the interior of the nonnegative orthant. Consequently, if we are able to find an appropriate transformation that maps a current interior solution to the point e, then, in parallel with Karmarkar's projective scaling algorithm, we can state a modified strategy as follows.
"Take an interior solution, apply the appropriate transformation to the solution space so as to place the current solution at e in the transformed space, and then move in the direction of steepest descent in the null space of the transformed explicit constraints, but not all the way to the nonnegativity walls, in order to remain an interior solution. Then take the inverse transformation to map the improved solution back to the original solution space as a new interior solution. Repeat this process until the optimality or other stopping conditions are met."
To be more specific, given a current interior solution x^k > 0, we define the diagonal scaling matrix

X_k = diag (x^k) = diag (x_1^k, x_2^k, ..., x_n^k)   (7.3)

It is obvious that the matrix X_k is nonsingular, with an inverse matrix X_k^{-1} which is also a diagonal matrix, with 1/x_i^k as its ith diagonal element for i = 1, ..., n.
Sec. 7.1 Primal Affine Scaling Algorithm 147
Figure 7.1
We then apply the affine scaling transformation T_k to "center" the image of the current solution at e. By the relationship x = X_k y shown in (7.5), in the transformed solution space we have a corresponding linear programming problem
transformed solution space, we have a corresponding linear programming problem
Minimize (ck) T y (7.1'a)
subject to Aky = b, y 2:: 0 (7.1'b)
where ck = Xkc and Ak = AXk.
In Problem (7.1'), the image of xk, i.e., yk = Tk(xk), becomes e that keeps unit
distance away from the walls of the nonnegative orthant. Just as we discussed in Chapter
6, if we move along a direction d~ that lies in the null space of the matrix Ak = AXk for
an appropriate step-length ak > 0 , then the new point yk+l = e + akd; remains interior
feasible to problem (7.1'). Moreover, its inverse image xk+I = Tk- 1 (yk+ 1) = Xkyk+I
becomes a new interior solution to problem (7 .1 ).
Since our objective is to minimize the value of the objective function, the strategy of steepest descent applies. In other words, we want to project the negative gradient −c^k onto the null space of the matrix A_k to create a good direction d_y^k with improved objective value in the transformed space. In order to do so, we first define the null space projection matrix

P_k = I − A_k^T (A_k A_k^T)^{-1} A_k = I − X_k A^T (A X_k^2 A^T)^{-1} A X_k   (7.6)

and then take d_y^k = P_k (−c^k) = −P_k X_k c.
Figure 7.2
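As a quick numerical check of (7.6), the direction d_y^k = −P_k X_k c lies in the null space of A_k = A X_k, so a move from e in the transformed space preserves Ax = b. The data here are those of the numerical example later in this section.

```python
import numpy as np

A = np.array([[1.0, -1.0, 1.0, 0.0],
              [0.0,  1.0, 0.0, 1.0]])
b = np.array([15.0, 15.0])
c = np.array([-2.0, 1.0, 0.0, 0.0])
x = np.array([10.0, 2.0, 7.0, 13.0])      # an interior solution: Ax = b, x > 0

Xk = np.diag(x)
Ak = A @ Xk
Pk = np.eye(4) - Ak.T @ np.linalg.solve(Ak @ Ak.T, Ak)   # projection matrix (7.6)
dy = -Pk @ (Xk @ c)

print(np.linalg.norm(Ak @ dy))            # ~0: dy lies in the null space of A X_k
x_new = Xk @ (np.ones(4) + 0.5 * dy / np.abs(dy).max())  # a safe interior step
print(A @ x_new - b, x_new)               # Ax stays equal to b; x_new stays positive
```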
Now we are in a position to translate, in the transformed solution space, the current interior solution y^k = e along the direction d_y^k to a new interior solution y^{k+1} > 0 with an improved objective value. In doing so, we have to choose an appropriate step-length α_k > 0 such that

y^{k+1} = e + α_k d_y^k > 0   (7.8)

Notice that if d_y^k ≥ 0, then α_k can be any positive number without leaving the interior region. On the other hand, if (d_y^k)_i < 0 for some i, then α_k has to be smaller than 1/(−(d_y^k)_i). Therefore we can choose 0 < α < 1 and apply the minimum ratio test

α_k = min_i { α/(−(d_y^k)_i) | (d_y^k)_i < 0 }   (7.9)
This implies that x^{k+1} is indeed an improved solution whenever the moving direction d_y^k ≠ 0. Moreover, we have the following lemmas:

Lemma 7.1. If there exists an x^k ∈ P^0 with d_y^k > 0, then the linear programming problem (7.1) is unbounded.

Proof. Since d_y^k is in the null space of the constraint matrix A X_k and d_y^k > 0, we know y^{k+1} = e + α_k d_y^k is feasible for problem (7.1') for any α_k > 0. Consequently, letting α_k go to positive infinity, Equation (7.12) implies that c^T x^{k+1} approaches minus infinity, where x^{k+1} = x^k + α_k X_k d_y^k ∈ P.
Lemma 7.2. If there exists an x^k ∈ P^0 with d_y^k = 0, then the objective function of problem (7.1) is constant over P.

Proof. If d_y^k = 0, then X_k c = X_k A^T u^k, where u^k = (A X_k^2 A^T)^{-1} A X_k^2 c. Since X_k^{-1} exists, it follows that (u^k)^T A = c^T. Now, for any feasible solution x,

c^T x = (u^k)^T A x = (u^k)^T b

Since (u^k)^T b does not depend on x, the value of c^T x remains constant over P.

Lemma 7.3. If the linear programming problem (7.1) is bounded below and its objective function is not constant, then the sequence {c^T x^k | k = 1, 2, ...} is well-defined and strictly decreasing.

Proof. This is a direct consequence of Lemmas 7.1 and 7.2 and Equation (7.12).
Notice that when r^k ≥ 0, the dual estimate w^k becomes a dual feasible solution, and (x^k)^T r^k = e^T X_k r^k becomes the duality gap of the feasible solution pair (x^k, w^k), i.e.,

c^T x^k − b^T w^k = (x^k)^T r^k = e^T X_k r^k   (7.14)

If e^T X_k r^k = 0 with r^k ≥ 0, then we have achieved primal feasibility at x^k, dual feasibility at w^k, and the complementary slackness conditions. In other words, x^k is primal optimal and w^k is dual optimal.
Based on the above discussions, here we outline an iterative procedure for the
primal affine scaling algorithm.
Step 1 (initialization): Set k = 0 and find x^0 > 0 such that A x^0 = b. (Details will be discussed later.)

Step 2 (computation of dual estimates): Compute the vector of dual estimates

w^k = (A X_k^2 A^T)^{-1} A X_k^2 c

where X_k is a diagonal matrix whose diagonal elements are the components of x^k.

Step 3 (computation of reduced costs): Calculate the reduced cost vector

r^k = c − A^T w^k

Step 4 (check for optimality): If r^k ≥ 0 and e^T X_k r^k ≤ ε (a given small positive number), then STOP; x^k is primal optimal and w^k is dual optimal. Otherwise, go to the next step.

Step 5 (obtain the direction of translation): Compute the direction

d_y^k = −X_k r^k

Step 6 (check for unboundedness and constant objective value): If d_y^k > 0, then STOP; the problem is unbounded. If d_y^k = 0, then also STOP; x^k is primal optimal. Otherwise go to Step 7.

Step 7 (compute step-length): Compute the step-length

α_k = min_i { α/(−(d_y^k)_i) | (d_y^k)_i < 0 },   where 0 < α < 1

Step 8 (move to the new solution): Set x^{k+1} = x^k + α_k X_k d_y^k, replace k by k + 1, and go to Step 2.
Consider the following example:

Minimize −2x_1 + x_2

subject to x_1 − x_2 + x_3 = 15

x_2 + x_4 = 15

x_1, x_2, x_3, x_4 ≥ 0
In this case,

A = [ 1  −1  1  0
      0   1  0  1 ],   b = [15  15]^T,   and   c = [−2  1  0  0]^T

Let us start with, say, x^0 = [10  2  7  13]^T, which is an interior feasible solution. Hence,

X_0 = diag (10, 2, 7, 13)

Moreover, the dual estimate is

w^0 = (A X_0^2 A^T)^{-1} A X_0^2 c ≈ [−1.33353  −0.00771]^T
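The steps above assemble into a small program. The sketch below runs the primal affine scaling iteration on this example; α = 0.9 and the iteration limit are illustrative choices, and the first dual estimate reproduces the value of w^0 computed here.

```python
import numpy as np

def primal_affine_scaling(A, b, c, x, alpha=0.9, eps=1e-8, max_iter=500):
    """Sketch of Steps 2-8 of the primal affine scaling procedure."""
    for _ in range(max_iter):
        AX2 = A * x**2                              # A X_k^2
        w = np.linalg.solve(AX2 @ A.T, AX2 @ c)     # Step 2: dual estimates
        r = c - A.T @ w                             # Step 3: reduced costs
        if r.min() >= -eps and x @ r <= eps:        # Step 4: optimality test
            break
        dy = -x * r                                 # Step 5: direction d_y^k = -X_k r^k
        if dy.min() >= 0:                           # Step 6: unbounded or constant objective
            break
        a_k = alpha * np.min(1.0 / -dy[dy < 0])     # Step 7: minimum ratio test
        x = x + a_k * x * dy                        # Step 8: x^{k+1} = x^k + a_k X_k d_y^k
    return x, w

A = np.array([[1.0, -1.0, 1.0, 0.0],
              [0.0,  1.0, 0.0, 1.0]])
b = np.array([15.0, 15.0])
c = np.array([-2.0, 1.0, 0.0, 0.0])
x0 = np.array([10.0, 2.0, 7.0, 13.0])

# first dual estimate, matching the value of w^0 above
AX2 = A * x0**2
w0 = np.linalg.solve(AX2 @ A.T, AX2 @ c)
print(w0)                 # approximately (-1.33353, -0.00771)

x, w = primal_affine_scaling(A, b, c, x0)
print(x, c @ x)           # approaches the vertex (30, 15, 0, 0) with value -45
```

Every iterate satisfies Ax = b exactly, because the direction X_k d_y^k lies in the null space of A; only the positivity walls limit the step.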
To study the convergence of the primal affine scaling algorithm, we make two assumptions:

1. The linear programming problem under consideration has a bounded feasible domain with nonempty interior.

2. The linear programming problem is both primal nondegenerate and dual nondegenerate.

The first assumption rules out the possibility of terminating the primal affine scaling algorithm with unboundedness. It can be further shown (see Exercise 7.5) that these two assumptions imply that (i) the matrix A X_k is of full rank for every x^k ∈ P and (ii) the vector r^k has at most m zeros for every w^k ∈ R^m.
We start with some simple facts.
Lemma 7.4. When the primal affine scaling algorithm applies, lim_{k→∞} X_k r^k = 0.
Proof Since {cT xk} is monotonically decreasing and bounded below (by the first
assumption), the sequence converges. Hence Equations (7.12) and (7.9) imply that
The reader may recall that the above result is exactly the complementary slackness condition introduced in Chapter 4. Let us define C ⊂ P to be the set in which complementary slackness holds. That is,

C = { x^k ∈ P | X_k r^k = 0 }   (7.15)
Furthermore, we introduce D ⊂ P to be the set in which the dual feasibility condition holds, i.e.,

D = { x^k ∈ P | r^k = c - A^T w^k ≥ 0 }   (7.16)
In view of the optimality conditions of the linear programming problem, it is easy
to prove the following result.
Lemma 7.5. For any x ∈ C ∩ D, x is an optimal solution to the linear programming problem (7.1).
We are now ready to prove that the sequence {x^k} generated by the primal affine scaling algorithm does converge to an optimal solution of problem (7.1). First, we show that

x_j^{k+1} = x_j^k - α_k (x_j^k)^2 r_j^k
Since (x_j^k)^2 r_j^k < 0, we have x_j^{k+1} > x_j^k > 0 for all k ≥ K, which contradicts the fact that x_j^k → x_j^* = 0. Hence we know our assumption must be wrong and x^* is primal optimal.
The remaining work is to show that the sequence {xk} indeed converges.
Theorem 7.2. The sequence {xk} generated by the primal affine scaling algorithm
is convergent.
Proof Since the feasible domain is nonempty, closed, and bounded, owing to
compactness the sequence {xk} has at least one accumulation point in P, say x*. Our
objective is to show that x* is also the only accumulation point of {xk} and hence it
becomes the limit of {xk}.
Noting that rk(·) is a continuous function of xk and applying Lemma 7.4, we can
conclude that x* E C. Furthermore, the nondegeneracy assumption implies that every
element in C, including x^*, must be a basic feasible solution (a vertex of P). Hence we can denote its nonbasic variables by x_N^* and define N as the index set of these nonbasic variables. In addition, for any δ > 0, we define a "δ-ball" around x^* by

B_δ = { x^k ∈ P | x_N^k < δe }
Let r^* be the reduced cost vector corresponding to x^*. The primal and dual nondegeneracy assumptions ensure that we can find an ε > 0 such that

min_{j∈N} |r_j^*| > ε
Recalling that
we have
More results on the convergence of the affine scaling algorithm under degeneracy
have appeared recently. Some references are included at the end of this chapter for
further information.
Many implementation issues need to be addressed. In this section, we focus on the starting mechanisms, checking for optimality, and finding an optimal basic feasible solution.
Starting the primal affine scaling algorithm. Parallel to our discussion for the revised simplex method, here we introduce two mechanisms, namely, the big-M method and the two-phase method, for finding an initial interior feasible solution. The first method is more easily implemented and suitable for most applications. However, serious commercial implementations often consider the second method for stability.
Big-M Method. In this method, we add one more artificial variable x_a, associated with a large positive number M, to the original linear programming problem so that e = (1, 1, ..., 1)^T ∈ R^{n+1} becomes an initial interior feasible solution to the following problem:

Minimize  c^T x + M x_a   (7.19a)
subject to  Ax + (b - Ae) x_a = b   (7.19b)
x ≥ 0,  x_a ≥ 0   (7.19c)
Compared to the big-M method for the revised simplex method, here we have only n + 1 variables, instead of n + m. When the primal affine scaling algorithm is applied to the big-M problem (7.19) with a sufficiently large M, since the problem is feasible, we either arrive at an optimal solution of the big-M problem or conclude that the problem is unbounded. Similar to the discussion in Chapter 4, if the artificial variable remains positive in the final solution (x^*, x_a^*) of the big-M problem, then the original linear programming problem is infeasible. Otherwise, either the original problem is identified as unbounded below, or x^* solves the original linear programming problem.
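The construction of the big-M problem can be sketched as follows. The artificial column b - Ae is one standard choice (an assumption here, consistent with (7.19): it makes the all-ones vector feasible with strictly positive components), and M must dominate the original costs.

```python
import numpy as np

def big_m_problem(A, b, c, M=1e6):
    """Augment (A, b, c) so that e = (1, ..., 1) in R^{n+1} is interior feasible.

    The artificial column v = b - Ae is one standard construction; M is a
    large cost attached to the artificial variable x_a.
    """
    v = b - A @ np.ones(A.shape[1])
    A_big = np.hstack([A, v[:, None]])      # [A | b - Ae]
    c_big = np.append(c, M)                 # objective c^T x + M x_a
    return A_big, c_big
```

By construction A_big e = Ae + (b - Ae) = b, so the all-ones vector in R^{n+1} is an interior feasible starting point.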
Minimize  u   (7.20a)
subject to  Ax + (b - Ax^0) u = b   (7.20b)
x ≥ 0,  u ≥ 0   (7.20c)

where x^0 > 0 is arbitrary. Then (x^0; u^0 = 1) is an interior feasible solution to the Phase-I problem. Hence the primal affine scaling algorithm can be applied to solve this problem. Moreover, since the Phase-I problem is bounded below by 0, the primal affine scaling algorithm will always terminate with an optimal solution, say (x^*; u^*). Again, similar to the discussion in Chapter 4, if u^* > 0, then the original linear programming problem is infeasible. Otherwise, since the Phase-I problem treats the problem in a higher-dimensional space, we can show that, except for very rare cases with measure zero, x^* > 0 becomes an initial interior feasible solution to the original problem.
Note that the difference in dimensionality between the original and Phase-I problems could cause extra computation for a simpleminded implementation. First of all, owing to numerical imprecision in computers, the optimal solution x^* obtained from Phase I could be infeasible for the original problem. In other words, we need to restore primal feasibility before the second-phase computation. Second, the difference in dimensionality between the fundamental matrices A X_k^2 A^T (of the original problem) and Ā X̄_k^2 Ā^T (of the Phase-I problem) could prevent us from using the same "symbolic factorization template" (to be discussed in Chapter 10) for fast computation of their inverse matrices. Therefore, it would be helpful if we could operate the Phase-I iterations in the original n-dimensional space.
In order to do so, let us assume we are at the kth iteration of applying the primal affine scaling algorithm to the Phase-I problem. We denote
Ā = [A | v],
Remember that the gradient of the Phase-I objective function is given by c̄ = (0, ..., 0, 1)^T ∈ R^{n+1}; hence the moving direction in the original space of the Phase-I problem is given by

d̄^k = -X̄_k [I - Ā_k^T (Ā_k Ā_k^T)^{-1} Ā_k] c̄^k   (7.21)

where c̄^k = X̄_k c̄. If we further define

w^k = (Ā_k Ā_k^T)^{-1} Ā_k c̄^k   (7.22)

then we have

d̄^k = -X̄_k (c̄^k - Ā_k^T w^k)   (7.23)

Simple calculation results in

Ā_k Ā_k^T = A X_k^2 A^T + (u^k)^2 v v^T   (7.24)

and

Ā_k c̄^k = [A X_k | u^k v] (0, ..., 0, u^k)^T = (u^k)^2 v   (7.25)
(7.28)
Observing the relations above, we further have

d̄^k = (1 / ((u^k)^{-2} + γ)) [ X_k^2 A^T (A X_k^2 A^T)^{-1} (b - A x^0) ; -1 ]   (7.29)

Notice that the scalar multiplier in (7.29) will be absorbed into the step-length, and the last element of the moving direction is -1. Hence we know that the algorithm tries to reduce u all the time. From this expression, we clearly see that the computation of d̄^k can be performed in the original n-dimensional space, and the template for the factorization of A X_k^2 A^T can be used for both Phase I and Phase II.
In order to compute the step-length, we consider that
An interesting and important point to observe here is that the Phase-I iterations may be initiated at any time (even during the Phase-II iterations). Once we detect that the feasibility of a current iterate has been lost owing to numerical inaccuracies that stem from the finite word length of computers, Phase-I iterations can be applied to restore feasibility. Hence this is sometimes called a "dynamic infeasibility correction" procedure. Sophisticated implementations should have this feature built in, since the primal method is quite sensitive to numerical truncation and round-off errors.
Having determined the starting mechanisms, we focus on the stopping rules for the
implementation of the primal affine scaling algorithm.
Stopping rules. As we mentioned earlier, once the K-K-T conditions are met,
an optimal solution pair is found. Hence we use the conditions of (1) primal feasibility,
(2) dual feasibility, and (3) complementary slackness as the stopping rules. However, in
real implementations these conditions are somewhat relaxed to accommodate the numerical difficulties due to limitations of machine accuracy.
Let x^k be a current solution obtained by applying the primal affine scaling algorithm. The primal feasibility condition requires that

A x^k = b,  with x^k ≥ 0   (7.30)

In implementation, this condition is measured by the relative residual

σ_p = ||A x^k - b|| / (||b|| + 1),  with x^k ≥ 0   (7.31)
Note that, for x^k ≥ 0, if σ_p is small enough, we may accept x^k as primal feasible. The addition of 1 to the denominator of (7.31) ensures numerical stability in the computation.
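The three relaxed measures can be computed as below. Only σ_p follows the form (7.31) shown above; the forms of σ_d (dual feasibility) and σ_c (complementary slackness) are analogous normalized residuals assumed here for illustration.

```python
import numpy as np

def stopping_measures(A, b, c, x, w):
    """Relaxed K-K-T measures. sigma_p follows (7.31); sigma_d and sigma_c
    are assumed normalized residuals of the same flavor."""
    r = c - A.T @ w                                                  # reduced costs
    sigma_p = np.linalg.norm(A @ x - b) / (np.linalg.norm(b) + 1)    # primal feasibility
    sigma_d = np.linalg.norm(np.minimum(r, 0)) / (np.linalg.norm(c) + 1)  # dual feasibility
    sigma_c = abs(c @ x - b @ w) / (abs(b @ w) + 1)                  # duality gap
    return sigma_p, sigma_d, sigma_c
```

At an exact optimal primal-dual pair all three measures vanish; in practice the iterations stop when each falls below its chosen threshold.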
In practice, we choose σ_p, σ_d, and σ_c as sufficiently small positive numbers and use them to decide whether the current iterate meets the stopping rules. According to the authors' experience, we have observed the following behavior of the primal affine scaling algorithm:
Finding a basic feasible solution. Notice that, just like Karmarkar's algorithm, at each iteration the current solution of the primal affine scaling algorithm always stays in the interior of the feasible domain P. In order to obtain a basic feasible solution, the purification scheme and related techniques described in Chapter 6 can be applied here. The value of the potential function p(x) becomes larger when x is closer to a positivity wall x_j = 0. Hence it creates a force to "push" x away from too close an approach to
Figure 7.3  An objective step (along d_x^{k-1}) followed by a potential step (along d_p^k) yields the recentered solution, shown relative to the constant-objective planes and the optimal solution x^*.
a boundary by minimizing p(x). With the potential function, we focus on solving the following "potential push" problem:

Note that (7.38b) requires the solution of problem (7.38) to be an interior feasible solution to the original linear programming problem; (7.38c) requires it to keep the same objective value as x^k; and minimizing p(x) forces the solution away from the positivity walls. Therefore, we can take the optimal solution of problem (7.38) as x̂^k.
Similar to our discussion of the Phase-I problem, if we directly apply the primal affine scaling algorithm to solve the potential push problem, we have a mismatch in dimensionality, since problem (7.38) has one more constraint than the original linear programming problem. In order to implement the potential push method in a framework consistent with the primal affine scaling algorithm, we need to take care of requirement (7.38c) separately. Also notice that we do not really need to find an optimal solution to the potential push problem. Any feasible solution to problem (7.38) with an improved value of p(x) can be adopted as x̂^k.
One way to achieve this goal is to take x^k as an initial solution to problem (7.38), then project the negative gradient of p(x) onto the null space of the constraint matrix A as a potential moving direction, say p^k. But in order to keep the same objective value, we first project the negative gradient of the objective function c^T x onto the null space of A and denote it by g. Then, the recentering (or push) direction d_p^k is taken to be the component of p^k that is orthogonal to g. Finally, along this direction, we conduct a line search for an optimal step-length.
p^k = -P(∇p(x^k)) = [I - A^T (A A^T)^{-1} A] (e/x^k)   (7.39)

where

e/x^k = (1/x_1^k, 1/x_2^k, ..., 1/x_n^k)^T
Similarly,

g = -[I - A^T (A A^T)^{-1} A] c   (7.40)
We now decompose p^k into two components, one along g and the other orthogonal to it. The first component can be expressed as μg for some scalar μ, since it is along the direction of g. Therefore the orthogonal component can be expressed as

d_p^k = p^k - μg   (7.41)

Moreover, the orthogonality condition requires that

(d_p^k)^T g = 0   (7.42)

which determines the value of μ by

μ = (p^k)^T g / (g^T g)   (7.43)

and, consequently,

d_p^k = p^k - [ ((p^k)^T g) / (g^T g) ] g   (7.44)
Figure 7.4  Decomposition of p^k into the component μg along g and the orthogonal component d_p^k.
(7.45)
Hence we only have to search for κ in the interval (0, κ̄) such that p(x^k + κ d_p^k) assumes a minimum value.
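A sketch of the recentering computation (7.39)-(7.44) follows, assuming the potential p(x) = -Σ_j log_e x_j, so that its negative gradient is the vector e/x used in (7.39); the line search over κ is omitted. This is an illustration, not the authors' code.

```python
import numpy as np

def recentering_direction(A, c, x):
    """Potential-push direction: project the gradients, then orthogonalize.

    Assumes p(x) = -sum_j ln(x_j), whose negative gradient is e/x.
    """
    n = len(x)
    # orthogonal projector onto the null space of A
    P = np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)
    p = P @ (1.0 / x)            # p^k, Equation (7.39)
    g = P @ (-c)                 # g,   Equation (7.40)
    mu = (p @ g) / (g @ g)       # Equation (7.43)
    return p - mu * g            # d_p^k, Equation (7.44)
```

By construction A d_p^k = 0 and g^T d_p^k = 0; and since d_p^k lies in the null space of A, also c^T d_p^k = 0, so a move along d_p^k preserves both feasibility and the objective value, as (7.38b)-(7.38c) require.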
Several issues are worth mentioning here:
1. When the potential push method is applied after each iteration of the primal affine
scaling algorithm, since P needs to be evaluated only once for all iterations and
a binary search is relatively inexpensive, the evaluation of the potential function
required during the search becomes the most time-consuming operation associated
with the potential push.
2. The purpose of applying the potential push is to gain faster convergence by staying away from the boundary. If the extra speed of convergence obtained by the potential push appears to be marginal, then it is not worth spending any major effort on it. Some coarse adjustments are good enough in this case. According to the authors' experience, no more than four or five searches per affine scaling iteration are needed to estimate x̂^k.
3. Recall that Karmarkar's potential function is given by (6.18), namely,

f(x; c) = n log_e (c^T x) - Σ_{j=1}^{n} log_e x_j

Hence f(x; c) = n log_e (c^T x) - p(x), assuming that c^T x > 0. When the potential push is applied after each iteration of the primal affine scaling, we see that the first term in f(x; c) is reduced by the affine scaling and the second term is reduced by the potential push. Thus the flavor of Karmarkar's approach is preserved.
4. Since the flavor of Karmarkar's potential function is preserved, it is conjectured that primal affine scaling together with the potential push could result in a polynomial-time algorithm. But so far, no rigorous complexity proof has been provided.
Logarithmic barrier function method. Another way to stay away from the positivity walls is to incorporate a barrier function, with extremely high values along the boundaries {x ∈ R^n | x_j = 0 for some 1 ≤ j ≤ n}, into the original objective function. Minimizing this new objective function will automatically push a solution away from the positivity walls. The logarithmic barrier method considers the following nonlinear optimization problem:

Minimize  F_μ(x) = c^T x - μ Σ_{j=1}^{n} log_e x_j   (7.46a)
subject to  Ax = b,  x > 0   (7.46b)
It follows that
(7.48)
and

d_μ = -(1/μ) X [I - X A^T (A X^2 A^T)^{-1} A X] (Xc - μe)   (7.49a)

Taking the given solution to be x = x^k and comparing d_μ with the primal affine scaling moving direction d_x^k, we see that
While classical barrier function theory requires that x^k solve problem (7.46) explicitly before μ = μ_k is reduced, C. Gonzaga has pointed out that there exist μ_0 > 0, 0 < ρ < 1, and α > 0 such that choosing d_{μ_k} by (7.49), x^{k+1} = x^k + α d_{μ_k}, and μ_{k+1} = ρ μ_k yields convergence to an optimal solution x^* of the original linear programming problem in O(√n L) iterations. This could result in a polynomial-time affine scaling algorithm with complexity O(n^3 L). A simple and elegant proof is due to C. Roos and J.-Ph. Vial, similar to the one proposed by R. Monteiro and I. Adler for the primal-dual algorithm.
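The direction (7.49a) can be computed directly. Below is a small sketch; note that A d_μ = 0 automatically, because the bracketed matrix is the orthogonal projector onto the null space of AX.

```python
import numpy as np

def barrier_direction(A, c, x, mu):
    """Log-barrier moving direction, a sketch of Equation (7.49a)."""
    n = len(x)
    X = np.diag(x)
    AX = A @ X
    # projector onto the null space of AX
    P = np.eye(n) - AX.T @ np.linalg.solve(AX @ AX.T, AX)
    return -(1.0 / mu) * X @ (P @ (X @ c - mu * np.ones(n)))
```

Any step along d_μ therefore preserves the equality constraints Ax = b, and μ controls the trade-off between reducing c^T x and centering.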
The dual affine scaling algorithm also consists of three key parts, namely, starting with
an interior dual feasible solution, moving to a better interior solution, and stopping with
an optimal dual solution. We shall discuss the starting mechanisms and stopping rules
in later sections. In this section we focus on the iterates.
Given that at the kth iteration we have an interior dual feasible solution (w^k; s^k) such that A^T w^k + s^k = c and s^k > 0, our objective is to find a good moving direction (d_w^k; d_s^k) together with an appropriate step-length β_k > 0 such that a new interior solution (w^{k+1}; s^{k+1}) is generated by

w^{k+1} = w^k + β_k d_w^k   (7.51a)
s^{k+1} = s^k + β_k d_s^k   (7.51b)

which satisfies

A^T w^{k+1} + s^{k+1} = c   (7.52a)
s^{k+1} > 0   (7.52b)

and

b^T w^{k+1} ≥ b^T w^k   (7.52c)
Plugging (7.51) into (7.52a) and remembering that A^T w^k + s^k = c, we have a requirement for the moving direction, namely,

A^T d_w^k + d_s^k = 0   (7.53a)

In order to get a better objective value, we plug (7.51a) into (7.52c), which results in another requirement for the moving direction:

b^T d_w^k ≥ 0   (7.53b)

To take care of (7.52b), the affine scaling method is applied. The basic idea is to recenter s^k at e = (1, 1, ..., 1)^T ∈ R^n in the transformed space so that the distance to each positivity wall is known. In this way, any movement within a unit distance certainly preserves the positivity requirement.
Similar to what we did in the primal affine scaling algorithm, we define an affine scaling matrix S_k = diag(s^k), which is a diagonal matrix with s_i^k as its ith diagonal element. In this way, S_k^{-1} s^k = e, and every s-variable is transformed (or scaled) into a new variable u ≥ 0 such that

u = S_k^{-1} s   (7.54a)

and

u^k = S_k^{-1} s^k = e   (7.54b)

Moreover, if d_u^k is a direction of cost improvement in the transformed space, then its corresponding direction in the original space is given by

d_s^k = S_k d_u^k   (7.54c)
Now we can study the iterates of the dual affine scaling algorithm in the transformed (or scaled) space. In order to synthesize a good moving direction in the transformed space, requirement (7.53a) implies that

A^T d_w^k + d_s^k = 0  ⟹  A^T d_w^k + S_k d_u^k = 0  ⟹  S_k^{-1} A^T d_w^k + d_u^k = 0  ⟹  S_k^{-1} A^T d_w^k = -d_u^k
then we have

b^T d_w^k = b^T Q_k Q_k^T b = ||Q_k^T b||^2 ≥ 0
1. If d_u^k = 0, then the dual problem has a constant objective value in its feasible domain and (w^k; s^k) is dual optimal.
2. If d_u^k ≥ 0 (but ≠ 0), then problem (7.50) is unbounded.
3. Otherwise, compute the step-length

β_k = α × min_i { s_i^k / (-(d_s^k)_i) : (d_s^k)_i < 0 },  where 0 < α < 1
Note that, similar to the way we defined dual estimates in the primal affine scaling algorithm, if we define

x^k = S_k^{-2} A^T d_w^k   (7.57)

then A x^k = A S_k^{-2} A^T d_w^k = b. Therefore, x^k can be viewed as a "primal estimate" in the dual affine scaling algorithm. Once the primal estimate satisfies x^k ≥ 0, it becomes a primal feasible solution with duality gap c^T x^k - b^T w^k. Moreover, if c^T x^k - b^T w^k = 0, then (w^k; s^k) must be dual optimal and x^k primal optimal. This information can be used to define stopping rules for the dual affine scaling algorithm.
Based on the basic ideas discussed in the previous section, we outline the dual affine scaling algorithm here.

Step 4 (computation of the primal estimate): Compute the primal estimate

x^k = -S_k^{-2} d_s^k
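Putting the pieces together, the dual affine scaling iterations can be sketched as follows. The direction d_w^k solves (A S_k^{-2} A^T) d_w = b (so that b^T d_w^k ≥ 0), d_s^k = -A^T d_w^k per (7.53a), the primal estimate follows (7.57), and the ratio-test step-length is the rule of item 3 above. The stopping test based on the primal estimate and the duality gap is a simplified version of the rules discussed below; this is an illustration, not the authors' code.

```python
import numpy as np

def dual_affine_scaling(A, b, c, w, alpha=0.99, eps=1e-8, max_iter=2000):
    """A sketch of the dual affine scaling iterations (7.51)-(7.57)."""
    w = np.asarray(w, dtype=float)
    s = c - A.T @ w                          # dual slacks, assumed > 0 initially
    for _ in range(max_iter):
        S2inv = 1.0 / s**2                   # diagonal of S_k^{-2}
        dw = np.linalg.solve(A @ (S2inv[:, None] * A.T), b)
        ds = -A.T @ dw                       # keeps A^T w + s = c
        x = -S2inv * ds                      # primal estimate (7.57)
        if np.all(x >= -eps) and c @ x - b @ w <= eps:
            return x, w, s                   # optimal primal-dual pair
        if np.all(ds >= 0):
            return x, w, s                   # constant objective or unbounded
        beta = alpha * np.min(s[ds < 0] / -ds[ds < 0])   # ratio test
        w, s = w + beta * dw, s + beta * ds
    return x, w, s
```

For the example of this section, w^0 = (-3, -2.5)^T is an assumed interior dual feasible start (A^T w^0 < c), not the book's starting point; from it the iterates approach w^* = (-2, -1)^T with objective value -45.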
d_s^0 = -A^T d_w^0 = [-23.53211  -11.14679  -23.53211  -34.67890]^T

Then

β_0 = (0.99 × 1) / 23.53211 = 0.04207
Updating the dual variables completes one iteration of the dual affine scaling algorithm. Iterating again, we obtain

w^2 = (-2.00962, -1.10149)^T

This time x^2 > 0, and the duality gap has been drastically reduced, clearly closer to zero. The reader may carry out more iterations and verify that the optimal value is attained at w^* = (-2, -1)^T and s^* = (0, 0, 2, 1)^T, with an optimal objective value of -45. The corresponding primal solution x^* is located at (30, 15, 0, 0)^T.
In this section we introduce two methods, the "big-M method" and the "upper-bound method," for finding an initial interior dual feasible solution for the dual affine scaling algorithm. Then we discuss the stopping rules and report some computational experience with dual affine scaling.
Starting the dual affine scaling algorithm. The problem here is to find (w^0; s^0) such that A^T w^0 + s^0 = c and s^0 > 0. Note that, in the special case c > 0, we can immediately choose w^0 = 0 and s^0 = c as an initial interior feasible solution for the dual affine scaling algorithm. Unfortunately, this special case does not arise every time, and we have to depend upon other methods to start the dual affine scaling algorithm.
Big-M Method. One of the most widely used methods to start the dual affine
scaling is the big-M method. In this method, we add one more artificial variable, say wa,
and a large positive number M. Then consider the following "big-M" linear programming
problem:
Maximize bT w + M wa
p_i = { 1   if c_i ≤ 0
      { 0   if c_i > 0
Minimize c^T x
The success of this method depends upon the choice of M. It has to be sufficiently large so that at least one optimal solution of problem (7.50) is included. If the original linear programming problem is unbounded, the choice of M becomes a real problem.
Stopping rules for dual affine scaling. For the dual affine scaling algorithm, we still use the K-K-T conditions as the optimality test. Note that dual feasibility is maintained by the algorithm throughout the entire iterative procedure. Hence we only need to check primal feasibility and complementary slackness.
Combining (7.56c) and (7.57), we see that the primal estimate is given by

x^k = S_k^{-2} A^T (A S_k^{-2} A^T)^{-1} b   (7.59)

It is easy to see that the explicit constraints Ax = b are automatically satisfied for any x^k defined according to formula (7.59). Therefore, if x^k ≥ 0, then it must be primal feasible. Also note that, if we convert problem (7.50) into a standard-form linear programming problem and apply the primal affine scaling algorithm to it, the associated dual estimates eventually result in formula (7.59).
Once we have reached dual feasibility at w^k and primal feasibility at x^k, the complementary slackness is measured by σ_c = c^T x^k - b^T w^k. When σ_c is smaller than a given threshold, we can terminate the dual affine scaling algorithm.
Experiences with dual affine scaling. In light of the fact that the dual affine scaling algorithm is equivalent to the primal affine scaling algorithm applied to the dual problem, convergence properties similar to those of the primal affine scaling algorithm can be established. The computational effort in each
iteration of the dual affine scaling is about the same as in the primal affine scaling. To be more specific, the computational bottleneck of the primal affine scaling is inverting the matrix A X_k^2 A^T, and the bottleneck of the dual affine scaling is inverting the matrix A S_k^{-2} A^T. But these two matrices have exactly the same structure, although they use different scalings. Any numerical method, for example Cholesky factorization, that improves the computational efficiency of one algorithm definitely improves the performance of the other.
Based on the authors' experience, we have observed the following characteristics
of the dual affine scaling algorithm:
3. The dual affine scaling algorithm is still sensitive to dual degeneracy, but less
sensitive to primal degeneracy.
4. The dual affine scaling algorithm improves its dual objective function in a very
fast fashion. However, attaining primal feasibility is quite slow.
As with the primal affine scaling algorithm, there is no theoretical proof that the dual affine scaling algorithm runs in polynomial time. The philosophy of "staying away from the boundary" to gain faster convergence also applies here. In this section, we introduce the power series method and the logarithmic barrier function method to improve the performance of the dual affine scaling algorithm.
as a first-order differential equation and attempt to find a solution function that describes the continuous curve. This smooth curve is called a continuous trajectory, and the moving direction d_x^k = X_k d_y^k at each iteration is simply the tangential direction (or first-order approximation) of this curve at an interior point of P.
As we can see from Figure 7.5, the first-order approximation easily deviates from the continuous trajectory. A higher-order approximation may stay closer to the continuous trajectory that leads to an optimal solution. The basic idea of the power series method is to find higher-order approximations of the continuous trajectory in terms of truncated power series.
Figure 7.5  A continuous trajectory leading to the optimal solution x^*, together with its first-order approximation.
Sec. 7.2 Dual Affine Scaling Algorithm 173
The same idea applies to the dual affine scaling algorithm. As a matter of fact, a continuous version of the dual affine scaling algorithm may be obtained by letting β_k → 0 and solving Equation (7.51) as a system of ordinary differential equations. These equations specify a vector field on the interior of the feasible domain. Our objective here is to generate higher-order approximations of the continuous trajectories by means of truncated power series.
Combining (7.56b) and (7.56c), we first write a system of differential equations
corresponding to (7.51) as follows:
(7.60a)
(7.60b)
(7.6la)
and
to obtain a kth (k ≥ 1) order approximation of w(β) and s(β), we need to compute w^{(j)} and s^{(j)} for j = 1, ..., k. But Equation (7.60b) implies that

(7.64)

where M^{(0)} = M(0) = A S(0)^{-2} A^T, in which S(0)^{-2} is the diagonal matrix with 1/(s_i^0)^2 as its ith diagonal element, and

Σ_{j=0}^{k} [ k! / (j!(k-j)!) ] [ d^{(k-j)} M(β) / dβ^{(k-j)} ] [ d^{(j+1)} w(β) / dβ^{(j+1)} ] = 0   (7.65b)
Hence our focus shifts to finding M^{(k-j+1)} w^{(j)} for j = 1, 2, ..., k and computing w^{(k+1)} in a recursive fashion.
Remembering that M(β) = A S(β)^{-2} A^T, we let Y(β) = S(β)^{-2} be the diagonal matrix with 1/(s_i(β))^2 as its ith diagonal element. Then, we have
(7.67)
and
Z(β) Y(β) = I.
Consequently,

∀ k ≥ 1,   (7.69)

where Y^{(0)} = Y(0), the diagonal matrix with 1/(s_i^0)^2 as its ith diagonal element, and Z^{(0)} = Z(0), the diagonal matrix with (s_i^0)^2 as its ith diagonal element.
Now, if we know the Z^{(j)}'s, then Y^{(k)} can be obtained from Equation (7.69). But this is an easy job, since Z(β) = S(β)^2. Taking the kth derivative of both sides, we have

Z^{(k)} = Σ_{j=0}^{k} S^{(k-j)} S^{(j)}   (7.70)

where S^{(j)} for each j is the diagonal matrix that takes the ith component of s^{(j)}, i.e., s_i^{(j)}, as its ith diagonal element. In particular, S^{(0)} = S(0), the diagonal matrix with s_i^0 as its ith diagonal element.
Summarizing what we have derived so far, we can start with the initial conditions and proceed with

w^{(1)} = [M(0)]^{-1} b = [A S(0)^{-2} A^T]^{-1} b,

Remembering that Z^{(0)} = Z(0) = diag((s^0)^2) and Y^{(0)} = Y(0) = [Z(0)]^{-1}, from Equation (7.70) we can derive Z^{(1)}. Then Y^{(1)} can be derived from Equation (7.69).
Now, recursively, we can compute w^{(2)} by Equation (7.68); compute s^{(2)} by

(7.71)

for k = 2; compute Z^{(2)} by Equation (7.70); and compute Y^{(2)} by Equation (7.69). Proceeding with this recursive procedure, we can approximate w(β) and s(β) by power series up to the desired order.
Notice that the first-order power series approximation results in the same moving direction as the dual affine scaling algorithm at the current solution. In order to get a higher-order approximation, additional matrix multiplications and additions are needed. However, only one matrix inversion, that of A S(0)^{-2} A^T, is involved, and it is needed anyway by the first-order approximation. Since matrix inversion dominates the computational
By appropriately choosing the barrier parameter μ and the step-length at each iteration, C. Roos and J.-Ph. Vial provided a very simple and elegant polynomiality proof of the dual affine scaling with logarithmic barrier function. Their algorithm terminates in at most O(√n L) iterations. Earlier, J. Renegar had derived a polynomial-time dual algorithm based upon the method of centers and Newton's method for linear programming problems.
Instead of using (7.72), J. Renegar considers the following function:

f(w, β) = t log_e (b^T w - β) + Σ_{j=1}^{n} log_e (c_j - A_j^T w)   (7.74)

where β is an underestimate of the optimal value of the dual objective function (like the idea used by Todd and Burrell) and t is allowed to be a free variable. A straightforward calculation of one Newton step at a current solution (w^k; s^k) results in a moving direction

(7.75)

where

γ = [ b^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} e + b^T w^k - β ] / [ (b^T w^k - β)^2 / t + b^T (A S_k^{-2} A^T)^{-1} b ]

By carefully choosing values of t and a sequence of better estimates {β_k}, J. Renegar showed that his dual method converges in O(√n L) iterations and results in a polynomial-time algorithm with a total complexity of O(n^{3.5} L) arithmetic operations. Subsequently, P. M. Vaidya improved the complexity to O(n^3 L) arithmetic operations. The relationship between Renegar's method and the logarithmic barrier function method can be clearly seen by comparing (7.73) and (7.75).
As in the simplex approach, in addition to primal affine scaling and dual affine scaling, there is a primal-dual algorithm. The primal-dual interior-point algorithm is based on the logarithmic barrier function approach. The idea of using the logarithmic barrier function method for convex programming problems can be traced back to K. R. Frisch in 1955. After Karmarkar's algorithm was introduced in 1984, the logarithmic barrier function method was reconsidered for solving linear programming problems. P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright used this method to develop a projected Newton barrier method and showed an equivalence to Karmarkar's projective scaling algorithm in 1985. N. Megiddo provided a theoretical analysis of the logarithmic barrier method and proposed a primal-dual framework in 1986. Using this framework, M. Kojima, S. Mizuno, and A. Yoshise presented a polynomial-time primal-dual algorithm for linear programming problems in 1987. Their algorithm was shown to converge in at most O(nL) iterations with a requirement of O(n^3) arithmetic operations per iteration. Hence the total complexity is O(n^4 L) arithmetic operations. Later, R. C. Monteiro and I. Adler refined the primal-dual algorithm to converge in at most O(√n L) iterations
with O(n^{2.5}) arithmetic operations required per iteration, resulting in a total of O(n^3 L) arithmetic operations.
(A1) The set S = {x ∈ R^n | Ax = b, x > 0} is nonempty.
(A2) The set T = {(w; s) ∈ R^m × R^n | A^T w + s = c, s > 0} is nonempty.
(A3) The constraint matrix A has full row rank.

Under these assumptions, it is clearly seen from the duality theorem that problems (P) and (D) have optimal solutions with a common optimal value. Moreover, the sets of optimal solutions of (P) and (D) are bounded.
Note that, for x > 0 in (P), we may apply the logarithmic barrier function technique and consider the following family of nonlinear programming problems (P_μ):

Minimize  c^T x - μ Σ_{j=1}^{n} log_e x_j
subject to  Ax = b,  x > 0

Under assumptions (A1) and (A2), and assuming that (P) has a bounded feasible region, we see that problem (P_μ) is indeed feasible and attains a unique minimum at x(μ) for each μ > 0. Consequently, the system (7.76) has a unique solution (x; w; s) ∈ R^n × R^m × R^n. Hence we have the following lemma:
Lemma 7.5. Under assumptions (A1) and (A2), both problem (P_μ) and system (7.76) have a unique solution.
Observe that system (7.76) also provides the necessary and sufficient conditions (the K-K-T conditions) for (w(μ); s(μ)) to be a maximum solution of the following program (D_μ):

Maximize  b^T w + μ Σ_{j=1}^{n} log_e s_j
subject to  A^T w + s = c,  s > 0
Therefore, as μ → 0, the duality gap g(μ) converges to zero. This implies that x(μ) and w(μ) indeed converge to the optimal solutions of problems (P) and (D), respectively. Hence we have the following result:
(d_x^k; d_w^k; d_s^k) and step-length β_k at each iteration. To measure the "deviation" from the curve Γ at each (x^k; w^k; s^k), we introduce the following notation, for k = 0, 1, 2, ...:

φ_i^k = x_i^k s_i^k,   for i = 1, 2, ..., n   (7.79a)
φ_min^k = min_i φ_i^k   (7.79b)
φ_ave^k = (x^k)^T s^k / n   (7.79c)
θ^k = φ_ave^k / φ_min^k   (7.79d)
Obviously, θ^k ≥ 1, and (x^k; w^k; s^k) ∈ Γ if and only if θ^k = 1. As we shall see in later sections, when the deviation θ^0 at the initial point (x^0; w^0; s^0) ∈ S × T is large, the primal-dual algorithm reduces not only the duality gap but also the deviation. With suitably chosen parameters, the sequence of points {(x^k; w^k; s^k) ∈ S × T} generated by the primal-dual algorithm satisfies the inequalities

c^T x^{k+1} - b^T w^{k+1} ≤ (1 - 2/(nθ^k))(c^T x^k - b^T w^k)   (7.80a)
θ^{k+1} - 2 ≤ (1 - 1/(n + 1))(θ^k - 2),  if 2 < θ^k   (7.80b)
(7.80c)
The first inequality (7.80a) ensures that the duality gap decreases monotonically. From the remaining two inequalities we see that the deviation θ^k becomes smaller than 3 in at most O(n log θ^0) iterations, and then the duality gap converges to 0 linearly with a convergence rate of at least (1 - 2/(3n)).
We are now in a position to develop the key steps of the primal-dual algorithm. Let us begin by synthesizing a direction of translation (moving direction) (d_x^k; d_w^k; d_s^k) at a current point (x^k; w^k; s^k) such that the translation is made along the curve Γ to a new point (x^{k+1}; w^{k+1}; s^{k+1}). This task is accomplished by applying the famous Newton's method to the system of equations (7.76a)-(7.76c).
Newton direction. Newton's method is one of the most commonly used techniques for finding a root of a system of nonlinear equations via successive linear approximations of the system. To be more specific, suppose that F(z) is a nonlinear mapping from R^n to R^n and we need to find a z^* ∈ R^n such that F(z^*) = 0. By using the multivariable Taylor series expansion (say, at z = z^k), we obtain a linear approximation:

F(z^k + Δz) ≈ F(z^k) + J(z^k) Δz   (7.81)

where J(z^k) is the Jacobian matrix whose (i, j)th element is given by

[ ∂F_i(z) / ∂z_j ]_{z = z^k}
Sec. 7.3 The Primal-Dual Algorithm 181
and Δz is a translation vector. Setting the left-hand side of (7.81) to zero, since it is
evaluated at a root of F(z) = 0, we have the linear system

J(z^k) Δz = −F(z^k)   (7.82)

A solution vector of equation (7.82) provides one Newton iterate from z^k to z^{k+1} =
z^k + d_z^k, with a Newton direction d_z^k and a unit step-length. When J(z*) is nonsingular and
the starting point z^0 is "close enough" to z*, Newton's method converges quadratically
to z*. But this spectacular convergence rate is only a "local" behavior. For a general
nonlinear mapping F(z), if z^0 is not close enough to z*, the Newton iteration may diverge
hopelessly.
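Newton's method itself is easy to state in code. The following sketch is our own illustration (not from the text); it applies the iteration (7.82) with a unit step-length to a small two-dimensional system:

```python
import numpy as np

def newton_system(F, J, z0, tol=1e-10, max_iter=50):
    """Find a root of F(z) = 0 by solving J(z^k) dz = -F(z^k) at each step."""
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        Fz = F(z)
        if np.linalg.norm(Fz) < tol:
            break
        dz = np.linalg.solve(J(z), -Fz)   # Newton direction, unit step-length
        z = z + dz
    return z

# Example system: F(z) = (z1^2 - 2, z1*z2 - 1), root (sqrt(2), 1/sqrt(2)).
F = lambda z: np.array([z[0] ** 2 - 2.0, z[0] * z[1] - 1.0])
J = lambda z: np.array([[2.0 * z[0], 0.0], [z[1], z[0]]])
z_star = newton_system(F, J, [1.0, 1.0])
```

Starting "close enough" (here from (1, 1)) the iterates converge rapidly, in line with the local quadratic convergence described above.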
Let us focus on the nonlinear system (7.76a)-(7.76c). Assume that we are at a point
(x^k; w^k; s^k) for some μ^k > 0, such that x^k, s^k > 0. The Newton direction (d_x^k; d_w^k; d_s^k)
is determined by the following system of linear equations:

[ A     0     0   ] [ d_x^k ]     [ t^k ]
[ 0     A^T   I   ] [ d_w^k ]  =  [ u^k ]      (7.83)
[ S_k   0     X_k ] [ d_s^k ]     [ v^k ]

where X_k and S_k are the diagonal matrices formed by x^k and s^k, respectively. Multiplying
it out, we have

A d_x^k = t^k   (7.84a)
A^T d_w^k + d_s^k = u^k   (7.84b)
S_k d_x^k + X_k d_s^k = v^k   (7.84c)

where

t^k = b − A x^k,   u^k = c − A^T w^k − s^k,   v^k = μ^k e − X_k S_k e   (7.85)
Notice that if x^k ∈ S and (w^k; s^k) ∈ T, then t^k = 0 and u^k = 0 correspondingly.
To solve system (7.83), we multiply both sides of Equation (7.84b) by A X_k S_k^{-1}.
Then we have

A X_k S_k^{-1} A^T d_w^k + A X_k S_k^{-1} d_s^k = A X_k S_k^{-1} u^k   (7.86)

Now from Equation (7.84c), we have

d_s^k = X_k^{-1} v^k − X_k^{-1} S_k d_x^k   (7.87)

Following (7.85), we denote X_k^{-1} v^k = μ^k X_k^{-1} e − S_k e as p^k. Using Equation (7.84a) in
the above equation produces

A X_k S_k^{-1} d_s^k = A X_k S_k^{-1} p^k − t^k   (7.88)

Substituting Equation (7.88) back into Equation (7.86) yields

d_w^k = [A X_k S_k^{-1} A^T]^{-1} (A X_k S_k^{-1} (u^k − p^k) + t^k)   (7.89a)
and

d_s^k = u^k − A^T d_w^k   (7.89b)
d_x^k = X_k S_k^{-1} [ p^k − d_s^k ]   (7.89c)
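For concreteness, formulas (7.89a)-(7.89c) can be transcribed into a short Python sketch. This is our own illustration (it assumes dense NumPy arrays and a full-row-rank A; the function name is ours):

```python
import numpy as np

def primal_dual_newton_direction(A, b, c, x, w, s, mu):
    """Newton direction at (x; w; s), x, s > 0, following (7.89a)-(7.89c)."""
    t = b - A @ x                  # primal infeasibility
    u = c - A.T @ w - s            # dual infeasibility
    p = mu / x - s                 # p = X^{-1} v, with v = mu*e - X S e
    D2 = x / s                     # diagonal of X S^{-1}
    M = (A * D2) @ A.T             # A X S^{-1} A^T
    dw = np.linalg.solve(M, (A * D2) @ (u - p) + t)    # (7.89a)
    ds = u - A.T @ dw                                   # (7.89b)
    dx = D2 * (p - ds)                                  # (7.89c)
    return dx, dw, ds

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([2.0, 2.0]); c = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.5, 1.0]); w = np.array([0.0, 0.0]); s = np.array([0.5, 1.0, 2.0])
dx, dw, ds = primal_dual_newton_direction(A, b, c, x, w, s, mu=0.5)
```

By construction the returned triple satisfies the linear system (7.84a)-(7.84c) exactly, which is a convenient way to check an implementation.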
Again, for (x^k; w^k; s^k) ∈ S × T, Equations (7.89a)-(7.89c) are simplified. Writing
D_k = X_k^{1/2} S_k^{-1/2} and

Q = D_k A^T (A D_k^2 A^T)^{-1} A D_k

since matrix Q is the orthogonal projection matrix onto the column space of matrix
D_k A^T, we see that

d_w^k = −(A D_k^2 A^T)^{-1} A D_k (X_k S_k)^{-1/2} v^k   (7.91a)
D_k d_s^k = Q (X_k S_k)^{-1/2} v^k   (7.91b)
D_k^{-1} d_x^k = (I − Q) (X_k S_k)^{-1/2} v^k   (7.91c)
After obtaining a Newton direction at the kth iteration, the primal-dual algorithm
iterates to a new point according to the following translation:

x^{k+1} = x^k + β_k d_x^k
w^{k+1} = w^k + β_k d_w^k
s^{k+1} = s^k + β_k d_s^k

with an appropriately chosen step-length β_k at the kth iteration such that x^{k+1} ∈ S and
(w^{k+1}; s^{k+1}) ∈ T.
Step-length and penalty parameter. When (x^k; w^k; s^k) ∈ S × T, the primal-
dual algorithm needs two parameters σ and τ, such that 0 ≤ τ < σ < 1, to control the
penalty (or barrier) parameter μ^k and the step-length β_k at the kth iteration.
For the penalty parameter, remembering the notations defined in (7.79), since we
want to reduce the duality gap n φ_ave^k, we may choose the penalty parameter to be a
smaller number by setting

μ^k = σ φ_ave^k   (7.92)

After a translation with step-length β along the Newton direction, the ith complementarity
product becomes

φ_i^k(β) = (x_i^k + β d_{x,i}^k)(s_i^k + β d_{s,i}^k) = (1 − β) x_i^k s_i^k + β μ^k + β² d_{x,i}^k d_{s,i}^k,   i = 1, 2, ..., n   (7.93a)

Moreover, since (d_x^k)^T d_s^k = 0, we see the average complementary slackness, and hence
the duality gap, changes linearly in β, i.e.,

φ_ave^k(β) = (1 − β) φ_ave^k + β μ^k   (7.93b)

Ignoring the quadratic term in (7.93a) and lowering the value μ^k = σ φ_ave^k by a
factor τ < σ, we can define a linear function

ψ^k(β) = (1 − β) φ_min^k + β τ φ_ave^k   (7.94)
The function φ_i^k(β) can be either convex or concave depending upon the sign of d_{x,i}^k d_{s,i}^k.
For a convex piece, since d_{x,i}^k d_{s,i}^k ≥ 0, the curve of φ_i^k(β) lies above the curve of
ψ^k(β) for 0 ≤ β ≤ 1. However, a concave piece of φ_i^k(β) may intersect ψ^k(β), as
shown in Figure 7.6. In order to control the deviation parameter θ^k while reducing the
complementary slackness, we choose

α^k = max { β̂ | φ_i^k(β) ≥ ψ^k(β) for all β ∈ (0, β̂) and i = 1, 2, ..., n }   (7.95)

and take the step-length β_k = min {1, α^k}. The geometrical significance of α^k and β_k is depicted in Figure 7.6. It is clearly seen
from the figure that the choice of β_k depends on the choice of 0 < τ < 1 to ensure the
existence of α^k.
Note that when (x^k; w^k; s^k) ∈ S × T, since (d_x^k; d_w^k; d_s^k) is a solution to (7.84)
with t^k = 0 and u^k = 0, we know that A x^{k+1} = b and A^T w^{k+1} + s^{k+1} = c. Moreover, the
definition of α^k in (7.95) further implies that x^{k+1} > 0 and s^{k+1} > 0. In other words,
(x^{k+1}; w^{k+1}; s^{k+1}) ∈ S × T.
184 Affine Scaling Algorithms Chap. 7
Figure 7.6
s^{k+1} = s^k + β_k d_s^k
Set k ← k + 1 and go to Step 2.
7.3.4 Polynomial-Time Termination
Unlike the pure primal affine scaling and the pure dual affine scaling algorithms, the primal-dual
algorithm is a polynomial-time algorithm. The well-chosen step-length β_k at each iteration
leads to the following nice convergence results:
Theorem 7.3. If the step-length β_k < 1 at the kth iteration, then

β_k ≥ 4(σ − τ) / (n(1 − 2σ + θ^k σ²)) ≥ 4(σ − τ) / (n(1 + σ²) θ^k)   (7.97a)
where
ν = 4(σ − τ) τ / ( n(1 + σ²) + 4(σ − τ) τ )   (7.99)
Note that the first two terms of Equation (7.93a) are bounded below by the first
two terms of Equation (7.99) for all β ∈ [0, 1]. Hence we only have to evaluate the
last quadratic term. By (7.91b) and (7.91c), we know that (X_k S_k)^{-1/2} v^k(μ^k) is the orthogonal sum of the
vectors D_k^{-1} d_x^k and D_k d_s^k. Therefore,
(7.101)
we get
(7.102)
θ^{k+1} − σ/τ ≤ [ (1 − τ θ^k β_k) / (1 − (1 − τ θ^k) β_k) ] (θ^k − σ/τ)   (7.106)

When θ^k ≤ σ/τ, the right-hand side of (7.106) becomes nonpositive. Consequently,
so does the left-hand side, and θ^{k+1} ≤ σ/τ. This proves (7.97d).
On the other hand, when θ^k > σ/τ, (7.106) implies that

θ^{k+1} − σ/τ ≤ (1 − ν)(θ^k − σ/τ)   (7.110)

and the proof is complete.
In view of Theorem 7.3, if k* is the earliest iteration at which θ^{k*} ≤ σ/τ, then

σ/τ < θ^k ≤ (1 − ν)^k (θ^0 − σ/τ) + σ/τ,   ∀ k < k*

and

θ^k ≤ σ/τ,   ∀ k ≥ k*

If such a k* does not exist, then θ^k > σ/τ, ∀ k, and

σ/τ < θ^k ≤ (1 − ν)^k (θ^0 − σ/τ) + σ/τ,   ∀ k
By the inequality (7.97b), the duality gap c^T x^k − b^T w^k attains the given accuracy
ε and the iteration stops in at most

O( n log_e ( (c^T x^0 − b^T w^0) / ε ) )

additional iterations. Hence the primal-dual algorithm terminates in at most
O( n log_e θ^0 + n log_e ((c^T x^0 − b^T w^0)/ε) ) iterations, and (7.80) follows. Also notice that at each iteration of the primal-dual algorithm, the
computational bottleneck is the inversion of the matrix A D_k² A^T. A direct implementation
requires O(n³) elementary operations for matrix inversion and results in an O(n⁴ L)
complexity for the primal-dual algorithm. This complexity can certainly be reduced by
better implementation techniques.
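One standard improvement, offered here as our own illustration and not as the authors' implementation, is to factor the symmetric positive definite matrix A D_k² A^T once per iteration (e.g., by Cholesky) instead of inverting it:

```python
import numpy as np

def solve_normal_equations(A, D2, rhs):
    """Solve (A D^2 A^T) y = rhs by Cholesky factorization instead of inversion."""
    M = (A * D2) @ A.T                 # A D^2 A^T, SPD when A has full row rank
    L = np.linalg.cholesky(M)
    z = np.linalg.solve(L, rhs)        # forward substitution: L z = rhs
    return np.linalg.solve(L.T, z)     # back substitution:    L^T y = z

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
D2 = np.array([1.0, 2.0, 3.0])
y = solve_normal_equations(A, D2, np.array([1.0, 2.0]))
```

Since only a solve (not an explicit inverse) is ever needed, the factorization can also be reused for several right-hand sides within the same iteration.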
In order to apply the primal-dual algorithm, we start with an arbitrary point (x^0; w^0; s^0) ∈
R^n × R^m × R^n such that x^0 > 0 and s^0 > 0.
In case A x^0 = b and A^T w^0 + s^0 = c, we know x^0 ∈ S and (w^0; s^0) ∈ T, and we
have a starting solution for the primal-dual algorithm. Otherwise, consider the following
pair of artificial primal and dual linear programs:
Minimize c^T x + π x_{n+1}
subject to Ax + (b − A x^0) x_{n+1} = b   (AP)
(A^T w^0 + s^0 − c)^T x + x_{n+2} = λ
(x; x_{n+1}; x_{n+2}) ≥ 0

where x_{n+1} and x_{n+2} are two artificial variables and π and λ are sufficiently large positive
numbers to be specified later;

Maximize b^T w + λ w_{m+1}
subject to A^T w + (A^T w^0 + s^0 − c) w_{m+1} + s = c   (AD)
(b − A x^0)^T w + s_{n+1} = π
w_{m+1} + s_{n+2} = 0
(s; s_{n+1}; s_{n+2}) ≥ 0
then (x^0, x^0_{n+1}, x^0_{n+2}) and (w^0, w^0_{m+1}; s^0, s^0_{n+1}, s^0_{n+2}) are feasible solutions to the artificial
problems (AP) and (AD), respectively, where

x^0_{n+1} = 1
x^0_{n+2} = λ − (A^T w^0 + s^0 − c)^T x^0
w^0_{m+1} = −1

and where s^0_{n+1} and s^0_{n+2} are determined by the corresponding equality constraints of (AD).
In this case, the primal-dual algorithm can be applied to the artificial problems
(AP) and (AD) with a known starting solution. Actually, the optimal solutions of (AP)
and (AD) are closely related to those of the original problems (P) and (D). The following
theorem describes this relationship:
Theorem 7.4. Let x* and (w*; s*) be optimal solutions of the original problems
(P) and (D). In addition to (7.112a) and (7.112b), suppose that

λ > (A^T w^0 + s^0 − c)^T x*   (7.112c)

and

π > (b − A x^0)^T w*   (7.112d)
Then the following two statements are true:
(i) A feasible solution (x, Xn+l, Xn+2) of (AP) is a minimizer if and only if x solves
(P) and Xn+l = 0.
(ii) A feasible solution (w, Wm+l; s, sn+I, sn+2) of (AD) is a maximizer if and only if
(w; s) solves (D) and Wm+l = 0.
Proof. Since x* is feasible to (P), if we further define x*_{n+1} = 0 and x*_{n+2} = λ −
(A^T w^0 + s^0 − c)^T x*, then (x*, x*_{n+1}, x*_{n+2}) is feasible to (AP). Suppose that (x, x_{n+1}, x_{n+2})
is feasible to (AP) with x_{n+1} > 0; then

c^T x* + π x*_{n+1} = b^T w* = w*^T (A x + (b − A x^0) x_{n+1})

Note that A^T w* + s* = c, x_{n+1} > 0, and (7.112d). We see that
In the real implementation of the primal-dual algorithm, it is a very difficult task to keep
(xk; wk; sk) E S x T due to numerical problems. Also the choice of the control parameters
greatly affects the performance of the algorithm. Much effort has been devoted to
designing a version of the primal-dual algorithm for practical implementation. In this
section, we introduce one version of the primal-dual algorithm that allows us to start
with an arbitrary point (x 0 ; w0 ; s0 ) with x 0 , s0 > 0. This version produces a sequence of
iterates {(x^k; w^k; s^k)}, with x^k, s^k > 0, which leads to an optimal solution, although the iterates
no longer stay in S × T. It is important to know that, at this moment, there
is no rigorous convergence proof for this version of the primal-dual algorithm, but it is
widely used in many commercial packages.
Moving direction. The basic idea of this version follows from the analysis of
Section 7.3.2. Assume that we are at a point (x^k; w^k; s^k) for some μ^k > 0, such that
x^k, s^k > 0. The Newton direction (d_x^k; d_w^k; d_s^k) is determined by Equations (7.89a)-
(7.89c). Combining (7.89a), (7.89b), and (7.89c), we have

d_x^k = D_k P_k D_k (p^k − u^k) + D_k² A^T (A D_k² A^T)^{-1} t^k   (7.113a)

where D_k² = X_k S_k^{-1} and P_k = I − D_k A^T (A D_k² A^T)^{-1} A D_k, which is the projection matrix
onto the null space of matrix A D_k.
If we further define

d_x^{ctr} = μ^k D_k P_k D_k X_k^{-1} e,   d_x^{obj} = −D_k P_k D_k c,   d_x^{feas} = D_k² A^T (A D_k² A^T)^{-1} t^k

then (7.113a) becomes

d_x^k = d_x^{ctr} + d_x^{obj} + d_x^{feas}   (7.113b)
The first term of (7.113b) is usually called the centering direction since, in light of
the potential push, it is nothing but the projection of the push vector X_k^{-1} e, which helps
the algorithm stay away from the walls of the primal polytope. The second term is called
the objective reduction direction, since it is the projected negative gradient of the primal
objective function, which leads to a reduction in the primal objective. The third
term is called the feasibility direction, since t^k is a measure of primal infeasibility. Also
note that A d_x^{ctr} = 0 and A d_x^{obj} = 0. Hence these two directions are in the null space of
matrix A, and the primal feasibility is solely affected by d_x^{feas}.
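The stated properties of the three components can be verified numerically. The sketch below is our own illustration (helper names are ours): the centering and objective-reduction parts lie in the null space of A, while the feasibility part alone accounts for the residual b − A x:

```python
import numpy as np

def dx_components(A, b, c, x, s, mu):
    """Centering, objective-reduction, and feasibility parts of d_x (cf. (7.113b))."""
    d = np.sqrt(x / s)                       # diagonal of D_k
    AD = A * d                               # A D_k
    M = AD @ AD.T                            # A D_k^2 A^T
    P = np.eye(len(x)) - AD.T @ np.linalg.solve(M, AD)   # projector onto null(A D_k)
    DPD = lambda v: d * (P @ (d * v))        # maps v to D_k P_k D_k v
    dx_ctr = mu * DPD(1.0 / x)               # centering direction
    dx_obj = -DPD(c)                         # objective reduction direction
    dx_feas = d * d * (A.T @ np.linalg.solve(M, b - A @ x))  # feasibility direction
    return dx_ctr, dx_obj, dx_feas

A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([2.0, 2.0]); c = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 0.5, 1.0]); s = np.array([0.5, 1.0, 2.0])
dx_ctr, dx_obj, dx_feas = dx_components(A, b, c, x, s, mu=0.5)
```

Checking A d_x^{ctr} = 0, A d_x^{obj} = 0, and A d_x^{feas} = b − A x confirms the role assigned to each term in the text.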
In practice, if we start with an arbitrary point (x^0; w^0; s^0) with x^0, s^0 > 0, the value
of t^0 might be very large, since x^0 could be far from being feasible. At this point, the
main effort of the algorithm will be in finding a feasible point near the central trajectory.
Once a feasible solution is found (say, at the kth iteration), the algorithm will try to
keep t^{k'} = 0 for all k' ≥ k, except when feasibility is lost due to numerical
truncation or round-off errors. In this way, d_x^{feas} will eventually vanish from the picture.
In a similar fashion one can carry out the analysis of the moving directions on the dual
side, i.e., d_w^k and d_s^k. It is left as an exercise for the reader.
Adjusting Penalty Parameters and Stopping Rules. Notice that the moving
direction at the kth iteration is determined by the value of the penalty parameter μ^k.
Strictly speaking, the translation described above has to be carried out several times for
a fixed value of μ^k, so that the Newton steps actually converge to the central trajectory
corresponding to that μ^k. However, it is apparent that doing so would be an "overkill."
Recall that at optimality μ^k has to be brought to zero to satisfy the complementary slackness.
Therefore, in practical implementations, the value of μ^k is reduced from iteration
to iteration and only one Newton step is carried out for a given value of μ^k.
The way in which μ^k can be reduced at each iteration is suggested by the algorithm
itself. From Equation (7.76c) we see that μ = s^T x / n. Plugging in the values of x^k and
s^k gives us a reasonably good measure of the penalty parameter for the current point.
According to our experience, sometimes a lower value of μ, say σ[(s^k)^T x^k]/n with
σ < 1, can accelerate the convergence of the algorithm. There have been other similar
ideas reported by various authors on the choice of μ^k. Nevertheless, the above simple
rule seems to work well for a variety of practical problems solved by the authors.
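The simple update rule above is a one-liner in code; the following sketch is our own (the function name is hypothetical):

```python
import numpy as np

def next_mu(x, s, sigma=0.1):
    """Practical penalty update: mu = sigma * (s^T x) / n, with 0 < sigma < 1."""
    return sigma * float(s @ x) / len(x)

mu_new = next_mu(np.array([1.0, 2.0]), np.array([3.0, 4.0]), sigma=0.1)
```

With s^T x = 11 and n = 2, the rule returns 0.1 · 11/2 = 0.55; choosing σ closer to 0 forces the complementarity target down more aggressively.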
As far as the stopping rules are concerned, we may check the primal feasibility, dual
feasibility, and complementary slackness. Notice that the primal feasibility is measured
by t^k, dual feasibility by u^k, and complementary slackness by v^k, as defined by (7.85).
u^k = 0,   d_s^k > 0,

then the dual problem (D) is unbounded. If either of these cases happens, STOP.
Otherwise go to the next step.
Step 6 (finding step-lengths): Compute the primal and dual step-lengths

β_P = 1 / max { 1, max_i ( −d_{x,i}^k / (α x_i^k) ) }

and

β_D = 1 / max { 1, max_i ( −d_{s,i}^k / (α s_i^k) ) }

where 0 < α < 1 is a damping factor. Then update

x^{k+1} ← x^k + β_P d_x^k
w^{k+1} ← w^k + β_D d_w^k
s^{k+1} ← s^k + β_D d_s^k
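The damped ratio test of Step 6 can be sketched in Python as follows (our own illustration; the helper name is hypothetical). It keeps the updated variables strictly positive:

```python
import numpy as np

def step_length(z, dz, alpha=0.99):
    """beta = 1 / max{1, max_i(-dz_i / (alpha * z_i))}; keeps z + beta*dz > 0."""
    ratios = -dz / (alpha * z)
    return 1.0 / max(1.0, float(ratios.max()))

beta = step_length(np.array([1.0, 1.0]), np.array([-2.0, 1.0]), alpha=0.99)
# z + beta*dz is approximately (0.01, 1.495): strictly positive
```

When no component of the direction is negative enough to threaten positivity, the formula simply returns a full step of 1.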
Example 7.3
Consider the same problem as in Example 7.1 and Example 7.2. We begin with an arbitrary
assignment of x0 = [1 1 1 l]T, w0 = [0 O]T, s0 = [1 1 1 1]T. With this
information, we see that X_0, S_0, and D_0² are all equal to the identity matrix I, and μ^0 = 1.
We now compute
Therefore,
Although d_x^0 > 0 and c^T d_x^0 < 0, we see from t^0 that the primal is still infeasible at
this moment. Hence we proceed further.
We choose α = 0.99. Using the formula to compute the step-lengths, we find that
β_P = 1.0 and β_D = 1/10.30303 = 0.097059. Therefore the updated solution becomes
and
The new solution x 1 is already primal feasible, which is in tune with our previous
discussions. The reader is urged to carry out more iterations to see that an optimal solution
with
is finally reached.
As we discussed before, ideally it takes several Newton steps for a given penalty parameter
to get onto the central trajectory, although we found that in most cases it is
adequate to carry out only one Newton step for each penalty parameter. In order to track
the continuous central trajectory more closely, we may consider using the power-series
approximation method as we did for the dual affine scaling algorithm.
To simplify the discussion, we choose the smaller of β_P and β_D as a
common step-length β for both the primal and dual iterations and focus on a current
point, say (x^0; w^0; s^0). In the limiting case of β → 0, (7.84) can be rewritten in the
following continuous version:
In this chapter we have studied the basic concepts of affine scaling including the primal,
dual, and primal-dual algorithms. Many extensions have been made to enhance the basic
affine scaling algorithms. However, it is important to understand that the research work
in this area is still ongoing. Different barrier functions including the entropy and inverse
functions have been proposed. Unfortunately, no polynomial convergence result has
been achieved at this moment. A unified treatment will definitely help the development
of the interior-point methods for linear programming. The idea of using interior-point
methods to solve quadratic and convex programming problems with linear constraints
has also been explored by many researchers. We shall study these interesting topics in
later chapters.
References for Further Reading
7.1 Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "An implementation of Kar-
markar's algorithm for linear programming," Mathematical Programming 44, 297-335 (1989).
7.2 Adler, I., and Resende, M.G. C., "Limiting behavior of the affine scaling continuous trajec-
tories for linear programming problems," Mathematical Programming 50, 29-51 (1991).
7.3 Barnes, E. R., "A variation of Karmarkar's algorithm for solving linear programming prob-
lems," Mathematical Programming 36, 174-182 (1986).
7.4 Cavalier, T. M., and Soyster, A. L., "Some computational experience and a modification of
the Karmarkar algorithm," presented at the 12th Symposium on Mathematical Programming,
Cambridge, MA (1985).
7.5 Dikin, I. I., "Iterative solution of problems of linear and quadratic programming" (in Rus-
sian), Doklady Akademiia Nauk USSR 174, 747-748, (English translation) Soviet Mathematics
Doklady 8, 674-675 (1967).
7.6 Frisch, K. R., "The logarithmic potential method of convex programming," Technical Report,
University Institute of Economics, Oslo, Norway (1955).
7.7 Freund, R. M., "Polynomial-time algorithms for linear programming based only on primal
affine scaling and projected gradients of a potential function," Mathematical Programming
51, 203-222 (1991).
7.8 Gill, P. E., Murray, W., Saunders, M.A., Tomlin, J. A., and Wright, M. H., "On projected bar-
rier methods for linear programming and an equivalence to Karmarkar's projective method,"
Mathematical Programming 36, 183-209 (1986).
7.9 Gonzaga, C., "An algorithm for solving linear programming problems in O(n³L) opera-
tions," in Progress in Mathematical Programming: Interior-Point and Related Methods, ed.
N. Megiddo, Springer-Verlag, New York, 1-28 (1989).
7.10 Gonzaga, C., "Polynomial affine algorithms for linear programming," Mathematical Pro-
gramming 49, 7-21 (1990).
7.11 Huard, P., "Resolution of mathematical programming with nonlinear constraints by the
method of centers," in Nonlinear Programming, ed. J. Abadie, North-Holland, Amsterdam,
Holland, 207-219 (1967).
7.12 Karmarkar, N., Lagarias, J. C., Slutsman, L., and Wang, P., "Power series variants of
Karmarkar-type algorithms," AT&T Technical Journal 68, No. 3, 20-36 (1989).
7.13 Kojima, M., Mizuno, S., and Yoshise, A., "A primal-dual interior point method for linear pro-
gramming," in Progress in Mathematical Programming: Interior-Point and Related Methods,
ed. N. Megiddo, Springer-Verlag, New York, 29-48 (1989).
7.14 Megiddo, N., "On the complexity of linear programming," in Advances in Economical Theory,
ed. T. Bewely, Cambridge University Press, Cambridge, 225-268 (1987).
7.15 Megiddo, N., Progress in Mathematical Programming: Interior-Point and Related Methods,
Springer-Verlag, New York (1989).
7.16 Megiddo, N., and Shub, M., "Boundary behavior of interior point algorithms in linear pro-
gramming," Mathematics of Operations Research 14, 97-146 (1989).
7.17 Monteiro, R. C., and Adler, I., "Interior path following primal-dual algorithms. Part I: Linear
programming," Mathematical Programming 44, 27-42 (1989).
7.18 Monteiro, R. C., Adler, I., and Resende, M. C., "A polynomial-time primal-dual affine scaling
algorithm for linear and convex quadratic programming and its power series extension,"
Mathematics of Operations Research 15, 191-214 (1990).
7.19 Renegar, J., "A polynomial-time algorithm based on Newton's method for linear program-
ming," Mathematical Programming 40, 59-93 (1988).
7.20 Roos, C., "A new trajectory following polynomial-time algorithm for linear programming
problem," Journal of Optimization Theory and Applications 63, 433-458 (1989).
7.21 Roos, C., and Vial, J.-Ph., "Long steps with the logarithmic penalty barrier function in linear
programming," in Economic Decision Making: Games, Economics, and Optimization, ed.
J. Gabszevwicz, J.-F. Richard, and L. Wolsey, Elsevier Science Publisher B.V., 433-441
(1990).
7.22 Sun, J., "A convergence proof for an affine-scaling algorithm for convex quadratic program-
ming without nondegeneracy assumptions," manuscript to appear in Mathematical Program-
ming (1993).
7.23 Tseng, P., and Luo, Z. Q., "On the convergence of affine-scaling algorithm," manuscript to
appear in Mathematical Programming 53 (1993).
7.24 Tsuchiya, T., "A study on global and local convergence of interior point algorithms for linear
programming" (in Japanese), PhD thesis, Faculty of Engineering, The University of Tokyo,
Tokyo, Japan (1991).
7.25 Vanderbei, R. J., "Karmarkar's algorithm and problems with free variables," Mathematical
Programming 43, 31-44 (1989).
7.26 Vanderbei, R. J., "ALPO: Another linear program solver," Technical Memorandum, No.
11212-900522-18TM, AT&T Bell Laboratories (1990).
7.27 Vanderbei, R. J., and Lagarias, J. C., "I. I. Dikin's convergence result for the affine-scaling
algorithm," Contemporary Mathematics 114, 109-119 (1990).
7.28 Vanderbei, R. J., Meketon, M. S., and Freedman, B. A., "A modification of Karmarkar's
linear programming algorithm," Algorithmica 1, 395-407 (1986).
7.29 Vaidya, P. M., "An algorithm for linear programming which requires O(((m + n)n² + (m +
n)^{1.5} n)L) arithmetic operations," Mathematical Programming 47, 175-201 (1990).
7.30 Ye, Y., "An O(n³L) potential reduction algorithm for linear programming," Contemporary
Mathematics 114, 91-107 (1990).
7.31 Zhang, Y., Tapia, R. A., and Dennis, J. E., "On the superlinear and quadratic convergence
of primal-dual interior point linear programming algorithms," SIAM Journal on Optimization
2, 304-324 (1992).
EXERCISES
7.1. You are given two algorithms, A and B. Algorithm A solves systems of linear equations;
Algorithm B solves linear programming problems.
(a) How can you use Algorithm A to solve a linear programming problem?
(b) How can you use Algorithm B to solve a system of linear equations?
(c) Combining (a) and (b), what is your conclusion? Why?
7.2. Consider the following linear programming problem:
Minimize -x, + 1
subject to x3 - x4 = 0
(a) Draw a graph of its feasible domain. Notice that (0, 0, 0.5, 0.5) is a vertex. Use the
revised simplex method to find its moving direction at this vertex and display it on the
graph.
(b) Note that (0.01, 0.01, 0.49, 0.49) is an interior feasible solution which is "near" the
vertex in (a). Use Karmarkar's algorithm to find its moving direction at this solution
and display it on the graph.
(c) Use the primal affine scaling algorithm to find its moving direction at
(0.01, 0.01, 0.49, 0.49) and display it on the graph.
(d) Use the primal affine scaling algorithm with logarithmic barrier function to find its
moving direction at (0.01, 0.01, 0.49, 0.49) and display it on a graph.
(e) Compare the directions obtained from (a)-(d). What kind of observations can be made?
Do you have any reason to support your observations?
7.3. Focus on the same linear programming problem as in Exercise 7.2.
(a) Find its dual problem and draw a graph of the dual feasible domain.
(b) Show that (1, -2) is an interior feasible solution to the dual linear program.
(c) Apply the dual affine scaling algorithm to find its moving direction at this point and
display it on the graph of the dual feasible domain.
(d) Is this moving direction pointing to the dual optimal solution?
(e) Apply the dual affine scaling algorithm with logarithmic barrier function to find its
moving direction at this point and display it on the graph of the dual feasible domain.
(f) Is the direction obtained in (e) better than that in (c)? Why?
7.4. Focus on the same linear programming problem again.
(a) Starting with the primal feasible solution x = (0.01, 0.01, 0.49, 0.49) and dual feasible
solution w = (1, -2), apply the primal-dual algorithm as stated in Section 7.3.6 under
"Step-by-Step Procedure" to find its moving directions dx and dw.
(b) Display the moving directions on the corresponding graphs.
(c) Can you make further observations and explain why?
7.5. Given a linear programming problem with bounded feasible domain, if the problem is both
primal and dual nondegenerate and xk is a primal feasible solution, show that
(a) AXk is of full row rank (assuming that m < n).
(b) The set C defined in (7.15) is a set of vertices of the polytope P of primal feasible
domain.
Minimize c^T x
subject to Ax = b
7.8. In this problem, we try to outline a proof showing that the primal affine scaling algorithm with
logarithmic barrier function is a polynomial-time algorithm. This proof is due to Roos and
Vial.
(a) Show that

‖ P_{AX_k} ( X_k c / μ^k − e ) ‖

can be used as a measure for the distance of a given point x^k to the point x*(μ^k) on
the central trajectory. Let us denote this distance measure by δ(x^k, μ^k), i.e.,
(d) Prove that, if δ(x^k, μ^k) < 1, then x^{k+1} is an interior feasible solution to (P). Moreover,
δ(x^{k+1}, μ^k) ≤ δ(x^k, μ^k)². This implies that if we repeatedly replace x^k by x^{k+1}, with
fixed μ^k, then we obtain a sequence of points which converges to x*(μ^k) quadratically.
(e) Choose 0 < θ < 1 and let μ^{k+1} = (1 − θ) μ^k. Show that
Let q^0 = −log_e(n μ^0), and show that the algorithm terminates after at most 6(q − q^0)√n
iterations. The final points x and y(x, μ) are interior solutions satisfying
7.9. For the dual affine scaling algorithm, explain the meaning of "primal estimate" as defined
in (7.59).
7.10. For the primal-dual algorithm, try to decompose d_w^k and d_s^k as we did for d_x^k in (7.113).
Then analyze the different components.
7.11. We take x0 = e, w0 = 0, and s0 = e.
(a) Show that (7.112a) becomes π > 0.
(b) Show that (7.112b) becomes λ > n − c^T e.
(c) What about (7.112c) and (7.112d)?
7.12. Derive the power-series expansions for x(β), w(β), s(β), t(β), u(β), and v(β) in the
primal-dual algorithm.
7.13. Develop computer codes for the primal affine scaling, dual affine scaling, and primal-dual
algorithms and test those problems in Exercise 3.16.
8
Insights into the Interior-Point Methods
For any positive scalar μ, we can incorporate a logarithmic barrier function either into
the primal program P, obtaining a corresponding problem (Program P_μ):

Minimize c^T x − μ Σ_{j=1}^n log_e x_j
subject to Ax = b,  x > 0

or, in an analogous manner, into the dual program D.
Notice that, under the above assumptions, as μ approaches 0, the unique solution to the
system of equations (8.5) solves the given linear program P and its dual problem D.
However, for μ > 0, we can actually approach the solution of XSe − μe = 0 from
different but equivalent algebraic paths. To be more specific, for x_j > 0 and s_j > 0
(j = 1, ..., n), consider the following functions:
f(x_j, s_j) = μ − x_j s_j   (8.6a)
g(x_j, s_j) = μ / x_j − s_j   (8.6b)
h(x_j, s_j) = μ / s_j − x_j   (8.6c)
Sec. 8.1 Moving along Different Algebraic Paths 203
Although they are different in format, the above three functions are all algebraically
equivalent to the condition (8.5c), since

{(x, s) ∈ R^{2n} | f(x_j, s_j) = 0, x_j > 0, s_j > 0, for j = 1, ..., n}
= {(x, s) ∈ R^{2n} | g(x_j, s_j) = 0, x_j > 0, s_j > 0, for j = 1, ..., n}
= {(x, s) ∈ R^{2n} | h(x_j, s_j) = 0, x_j > 0, s_j > 0, for j = 1, ..., n}
= {(x, s) ∈ R^{2n} | XSe − μe = 0, x > 0, s > 0}
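This equivalence of zero sets, together with the fact that f, g, and h differ away from those sets (and hence have different linearizations), can be illustrated numerically. The sketch below is our own:

```python
import numpy as np

mu = 0.7
f = lambda x, s: mu - x * s      # primal-dual path
g = lambda x, s: mu / x - s      # primal log-barrier path
h = lambda x, s: mu / s - x      # dual log-barrier path

# Any (x, s) > 0 with x_j * s_j = mu lies on all three paths simultaneously.
x = np.array([0.5, 1.4, 7.0])
s = mu / x

# Off the common zero set the three functions generally take different values,
# so a Newton step linearizes a different surface in each case.
vals = (f(2.0, 0.5), g(2.0, 0.5), h(2.0, 0.5))
```

The three functions agree on where to go (the common zero set) but disagree on how to get there, which is exactly why they generate three distinct Newton directions.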
In this way solving system (8.5) is equivalent to solving one of the following three
systems:

A^T w + s − c = 0   (8.7a)
Ax − b = 0   (8.7b)
f(x_j, s_j) = 0, for j = 1, ..., n   (8.7c)
x > 0, s > 0   (8.7d)

A^T w + s − c = 0   (8.8a)
Ax − b = 0   (8.8b)
g(x_j, s_j) = 0, for j = 1, ..., n   (8.8c)
x > 0, s > 0   (8.8d)

A^T w + s − c = 0   (8.9a)
Ax − b = 0   (8.9b)
h(x_j, s_j) = 0, for j = 1, ..., n   (8.9c)
x > 0, s > 0   (8.9d)
To solve any one of the above three systems, let us assume that (x^k; w^k; s^k) ∈ R^n ×
R^m × R^n is such that A^T w^k + s^k = c, A x^k = b, x^k > 0, and s^k > 0. We shall apply the
famous Newton method to solve these systems at (x^k; w^k; s^k). Note that only the functions
f, g, and h are nonlinear in these three systems. Therefore, when the Newton method
is applied, we need only linearize them to obtain a moving direction.
Let us focus on system (8.8) first. Taking one Newton step with a linearization of the
function g(x_j, s_j) = 0, we have

0 − g(x_j^k, s_j^k) = [ ∇g(x_j^k, s_j^k) ]^T ( x_j − x_j^k ; s_j − s_j^k )
Substituting (8.6b) for the function g and multiplying it out, we see that

s_j = 2μ / x_j^k − ( μ / (x_j^k)² ) x_j

Since the above equation holds for j = 1, ..., n, by taking the matrix X_k = diag(x^k), we
have

s = 2μ X_k^{-1} e − μ X_k^{-2} x   (8.10)
Moving along the Newton direction, the linear equations (8.8a) and (8.8b) are preserved. By
(8.8a), s = c − A^T w, and (8.10) becomes

x = (1/μ) X_k² (A^T w + 2μ X_k^{-1} e − c)

Multiplying both sides by matrix A and applying (8.8b), we see that

b = Ax = (1/μ) A X_k² (A^T w + 2μ X_k^{-1} e − c)
Consequently,
This time, let us focus on the system (8.9) to show that the dual affine scaling algorithm
with logarithmic barrier function actually takes the Newton direction along the algebraic
path of h(x, s) = 0. Note that one Newton step with the linearization of h(x_j, s_j) = 0
results in

x_j = 2μ / s_j^k − ( μ / (s_j^k)² ) s_j
Note that the above equation holds for j = 1, ..., n. By taking the matrix S_k = diag(s^k), we
have

x = 2μ S_k^{-1} e − μ S_k^{-2} s   (8.11)
Again, moving along the Newton direction preserves the linear equations (8.9a) and
(8.9b). By (8.9b), we have

Δw_k = (1/μ) (A S_k^{-2} A^T)^{-1} b − (A S_k^{-2} A^T)^{-1} A S_k^{-1} e
Comparing this direction to (7.73), we see that the dual affine scaling algorithm with loga-
rithmic barrier function takes the Newton direction along the algebraic path of h(x, s) = 0.
Finally, we work on the system (8.7) to derive the moving directions of the primal-
dual algorithm. Simply by taking one Newton step with a linearization of the function
f(x_j, s_j) = 0, we have

0 − f(x_j^k, s_j^k) = [ ∇f(x_j^k, s_j^k) ]^T ( x_j − x_j^k ; s_j − s_j^k )
S_k Δx_k + X_k Δs_k = μ e − X_k S_k e   (8.12a)
A Δx_k = 0   (8.12b)

and

A^T Δw_k + Δs_k = 0   (8.12c)

Note that (8.12a), (8.12b), and (8.12c) form a system of linear equations with the unknown
variables Δx_k, Δw_k, and Δs_k. Using (8.12b) and (8.12c) to eliminate Δx_k and Δs_k in
(8.12a), we obtain

Δw_k = (A X_k S_k^{-1} A^T)^{-1} A S_k^{-1} (X_k S_k e − μ e)   (8.13a)
Δs_k = −A^T Δw_k   (8.13b)
Δx_k = [S_k^{-1} − X_k S_k^{-1} A^T (A X_k S_k^{-1} A^T)^{-1} A S_k^{-1}] (μ e − X_k S_k e)   (8.13c)
Comparing (8.13) to formula (7.90), we clearly see that the primal-dual algorithm takes
the Newton direction along the algebraic path of f(x, s) = 0.
Now, combining the results we obtained in the previous three subsections results
in the following theorem:
Theorem 8.1. The moving directions of the primal affine scaling algorithm with
logarithmic barrier function, the dual affine scaling algorithm with logarithmic barrier
function, and the primal-dual algorithm are the Newton directions along three different
and yet equivalent algebraic paths that lead to the solution of the Karush-Kuhn-Tucker
conditions (8.5).
8.2 Missing Information
In Chapter 7, the primal approach and dual approach were treated separately as if they
were independent problems. However, Theorem 8.1 indicates that the moving directions
of both the primal affine scaling and dual affine scaling with logarithmic barrier function
are closely related to that of the primal-dual method. Hence we shall further exploit the
dual information in the primal approach and the primal information in the dual approach.
We first study the dual information in the primal affine scaling algorithm. From (8.10),
we have
s = 2μ X_k^{-1} e − μ X_k^{-2} x
= c − A^T (A X_k² A^T)^{-1} A X_k (X_k c − μ e)

after substituting the expression for x obtained above.
Since we are moving along the Newton direction, both the primal and dual feasibility
conditions are preserved. Hence we can define

w = (A X_k² A^T)^{-1} A X_k (X_k c − μ e)
Similar to what we did in the last subsection, we can derive the embedded primal
information of the dual affine scaling. Starting from Equation (8.11), we have
x = 2μ S_k^{-1} e − μ S_k^{-2} s
= 2μ S_k^{-1} e − μ S_k^{-2} [ s^k − (1/μ) A^T (A S_k^{-2} A^T)^{-1} (b − μ A S_k^{-1} e) ]
= μ S_k^{-1} [ e + S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} ( (1/μ) A X_k e − A S_k^{-1} e ) ]
= μ S_k^{-1} [ e + S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} ( (1/μ) S_k X_k e − e ) ]
= μ S_k^{-1} [ I − S_k^{-1} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} ] ( e − (1/μ) S_k X_k e ) + x^k

Hence we know

Δx_k = −[ S_k^{-1} − S_k^{-2} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} ] (X_k S_k e − μ e)
= [ S_k^{-1} − S_k^{-2} A^T (A S_k^{-2} A^T)^{-1} A S_k^{-1} ] v^k(μ)   (8.15)
Comparing (8.15) to (8.13c), we see that the primal moving direction embedded in the
dual affine scaling algorithm with logarithmic barrier function has exactly the same form
as that of the primal-dual algorithm but with a different scaling matrix.
The results we found in the above two subsections can be summarized in the
following theorem:
Theorem 8.2. The form of either the dual moving direction embedded in the
primal affine scaling or the primal moving direction embedded in the dual affine scaling
can be found in the primal-dual algorithm but with different scaling matrices.
The concept of "moving along the Newton direction on different algebraic paths" not
only provides us a unified view to examine the primal affine scaling with logarithmic
barrier function, dual affine scaling with logarithmic barrier function, and primal-dual
algorithms but also serves as a platform to study new interior-point algorithms. At least
in theory there are infinitely many algebraic paths that could lead to the solution of
the given Karush-Kuhn-Tucker conditions, and each path may generate a new moving
direction associated with a potential interior-point algorithm. If a suitable step-length can
be decided at each iteration for a convergence proof, then a new interior-point algorithm
is introduced for further studies. An example of moving along a new path is given below.
Consider the function
Sec. 8.4 Geometric Interpretation of the Moving Directions 209
defined on x_j > 0, s_j > 0, j = 1, 2, ..., n, and μ > 0. In this way, solving system (8.5)
is equivalent to solving the following system:
    A^T w + s − c = 0     (8.16a)
    A x − b = 0     (8.16b)
    r(x_j, s_j) = 0,   for j = 1, ..., n     (8.16c)
    x > 0,   s > 0     (8.16d)
We consider the moving direction at a given point (x^k; w^k; s^k) such that A x^k = b,
A^T w^k + s^k = c, x^k > 0, and s^k > 0. One Newton step at this point with a linearization
of the function r(x_j, s_j) = 0 yields

    −log_e (x_j^k s_j^k / μ) = (1/x_j^k)(x_j − x_j^k) + (1/s_j^k)(s_j − s_j^k)

Since the above expression holds for j = 1, 2, ..., n, its vector form becomes

    X_k^{-1} Δx^k + S_k^{-1} Δs^k = −θ(μ)     (8.17)

where θ(μ) denotes the n-vector whose jth component is log_e (x_j^k s_j^k / μ).
Moreover, moving along the Newton direction preserves the linear equations; hence
we have A Δx^k = 0 and A^T Δw^k + Δs^k = 0. These two equations together with (8.17)
form a system of linear equations in terms of Δx^k, Δw^k, and Δs^k. The solution of this
system becomes

    Δx^k = −[ X_k − X_k S_k^{-1} A^T (A X_k S_k^{-1} A^T)^{-1} A X_k ] θ(μ)     (8.18a)

    Δw^k = (A X_k S_k^{-1} A^T)^{-1} A X_k θ(μ)     (8.18b)
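The directions (8.18) can be verified numerically. The sketch below uses hypothetical random data and checks that the computed (Δx^k, Δw^k, Δs^k) satisfy (8.17) together with A Δx^k = 0 and A^T Δw^k + Δs^k = 0:

```python
import numpy as np

# Newton directions (8.18) for the path r(x_j, s_j) = 0, where
# theta(mu)_j = log(x_j^k s_j^k / mu).
rng = np.random.default_rng(1)
m, n = 2, 5
A = rng.standard_normal((m, n))
xk = rng.uniform(0.5, 2.0, n)
sk = rng.uniform(0.5, 2.0, n)
mu = 0.9
Xk, Sk_inv = np.diag(xk), np.diag(1.0 / sk)
theta = np.log(xk * sk / mu)

M = A @ Xk @ Sk_inv @ A.T                    # A Xk Sk^{-1} A^T
dw = np.linalg.solve(M, A @ Xk @ theta)      # (8.18b)
dx = -(Xk @ theta) + Xk @ Sk_inv @ (A.T @ dw)   # (8.18a), expanded
ds = -A.T @ dw                               # keeps A^T dw + ds = 0
```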
Different geometric viewpoints have been proposed to interpret the moving directions
of each individual affine scaling algorithm. Our objective in this section is to provide
a geometric view which at least interprets the moving directions of the primal affine
with logarithmic barrier, the dual affine with logarithmic barrier, and the primal-dual
algorithms in a unified way. Later on, we show that, for each of the above three
algorithms, an associated minimization problem can be defined such that the solution of
the associated problem becomes the moving direction of the corresponding affine scaling
algorithm.
In order to achieve the goal, the concept of the null space of a matrix needs to be
strengthened through the following two lemmas.

Lemma 8.1. Let A be an m × n matrix with full row rank, and let U be an n × (n − m)
matrix whose columns form a basis of the null space of A. Then

    U (U^T U)^{-1} U^T = I − A^T (A A^T)^{-1} A

Lemma 8.2. Under the same assumptions, if Q is an n × n symmetric positive definite
matrix, then

    U (U^T Q^{-2} U)^{-1} U^T = Q [ I − Q A^T (A Q^2 A^T)^{-1} A Q ] Q

Proof. For each x ∈ R^n, since matrix A has full row rank, x can be decomposed as
the sum of a component in the null space of A and a component in the row space of A;
this establishes Lemma 8.1. For Lemma 8.2, note that the columns of Q^{-1}U form a
basis of the null space of AQ, since Q^{-1}U has full rank. Moreover, (AQ)(Q^{-1}U) =
AU = 0. The result follows from Lemma 8.1.
With these two lemmas, we can start developing a unified geometric interpretation
of the moving directions in different affine scaling algorithms.
For the primal affine scaling algorithm with logarithmic barrier function, consider Pro-
gram P_μ of (8.3). For a positive μ, we define

    p(x) = c^T x − μ Σ_{j=1}^n log_e x_j
Then p(x) is a convex and continuously differentiable function defined over the constraint
set (8.3b). In particular, for a given interior feasible solution xk, we have a first-order
approximation
    p(x) ≈ p(x^k) + [∇p(x^k)]^T (x − x^k)

where ∇p(x^k) = c − μ X_k^{-1} e. Finding the steepest descent direction at x^k is equivalent
to minimizing [∇p(x^k)]^T (x − x^k). Thus we consider a subproblem P_s of P_μ:
where λ ≥ 0 is a Lagrangian multiplier. Taking the partial derivative of L_1 with respect
to h and setting it to zero at the optimum h^k, we have

    U^T ∇p(x^k) + 2λ (U^T X_k^{-2} U) h^k = 0

Because matrix U has full column rank and X_k is a positive diagonal matrix, U^T X_k^{-2} U
is a nonsingular square matrix. Consequently,
Remember that U is an isomorphism between R^{n−m} and the null space of matrix A in
R^n. We transform h^k back to the null space of matrix A by

    Δx^k = U h^k = −(1/2λ) U (U^T X_k^{-2} U)^{-1} U^T ∇p(x^k)     (8.19)

Noting that ∇p(x^k) = c − μ X_k^{-1} e, we apply Lemma 8.2 to (8.19) with Q = X_k. In this
way, we see that
With the same idea, we now consider the dual case. This time, we define

    q(w, s) = b^T w + μ Σ_{j=1}^n log_e s_j

and assume that (w^k, s^k) is a solution to program D_μ of (8.4). In this way, [∇q(w^k, s^k)]^T
= (b^T, μ e^T S_k^{-1}). Since the w-variables are unrestricted, we only have to construct an
ellipsoid in the s-space and consider the following subproblem D_s of program D_μ:
Then Ā^T Ū = 0, and Ū can be considered as an isomorphism between R^m and the null
space of Ā^T, i.e.,

    (Δw^k; Δs^k) = Ū v,   for some v ∈ R^m

In other words, we have Δw^k = v and Δs^k = −A^T v. Moreover, the subproblem D_s becomes
(8.21)
By setting its partial derivative with respect to v to zero at the optimal solution v^k
and applying Lemma 8.2, we eventually have
(8.22a)
and
(8.22b)
Note that 1/(2λ) is only a positive scalar. By comparing (8.22) to (7.73), we conclude
that the moving direction of the dual affine scaling with logarithmic barrier function
algorithm is provided by the solution of the subprogram D_s. This is consistent with the
geometric interpretation we derived for the primal case.
For any primal feasible x and dual feasible (w, s), we have c^T x − b^T w = x^T s, so that

    c^T x − μ Σ_{j=1}^n log_e x_j − ( b^T w + μ Σ_{j=1}^n log_e s_j )
        = x^T s − μ Σ_{j=1}^n log_e (x_j s_j)

The desired primal-dual optimization problem can be defined as a problem which min-
imizes the gap between problems P_μ and D_μ subject to the primal and dual interior
feasibility conditions, i.e., problem (PD)_μ has the following form:
If we define

    Ā = [ A   0    0
          0   A^T  I ]

and use p(x) and q(w, s) to represent the objective functions of P_μ and D_μ, respectively,
then problem (PD)_μ is simplified as follows:
    x > 0,   s > 0
Suppose that (x^k; w^k; s^k) is a feasible solution of (PD)_μ. The steepest descent direction
suggests that we consider the following subproblem (PD)_s:
subject to Ā (Δx^k; Δw^k; Δs^k) = 0

Recalling that U is an isomorphism between R^{n−m} and the null space of A, and that Ū
is an isomorphism between R^m and the null space of Ā^T, we have, more explicitly,

    Δx^k = U u_1,   where u_1 ∈ R^{n−m}     (8.24a)

and

    (Δw^k; Δs^k) = Ū u_2,   where u_2 ∈ R^m     (8.24b)
    Δx^k = −(1/2λ_1) X_k^{1/2} S_k^{-1/2} [ I − X_k^{1/2} S_k^{-1/2} A^T (A X_k S_k^{-1} A^T)^{-1} A X_k^{1/2} S_k^{-1/2} ] X_k^{1/2} S_k^{-1/2} (c − μ X_k^{-1} e)

         = −(1/2λ_1) [ X_k S_k^{-1} − X_k S_k^{-1} A^T (A X_k S_k^{-1} A^T)^{-1} A X_k S_k^{-1} ] (c − A^T w^k − μ X_k^{-1} e)     (8.25a)
Sec. 8.5 General Theory 217

Similarly, we have

    Δw^k = u_2 = −(1/2λ_2) (A X_k S_k^{-1} A^T)^{-1} (b − μ A S_k^{-1} e)
              = −(1/2λ_2) (A X_k S_k^{-1} A^T)^{-1} (A X_k e − μ A S_k^{-1} e)     (8.25b)

and

    Δs^k = −A^T u_2 = (1/2λ_2) A^T (A X_k S_k^{-1} A^T)^{-1} A S_k^{-1} (X_k S_k e − μ e)     (8.25c)
Noting that both λ_1 and λ_2 are nonnegative, and comparing (8.25) to (7.90), we can
confirm that the moving directions of the primal-dual affine scaling algorithm are given
by the solution of the subproblem (PD)_s.
The geometric interpretation of the moving directions of the affine scaling algorithms
suggests that we study two crucial factors. First, we need a symmetric positive definite
scaling matrix to open an appropriate ellipsoid in the null space of the constraint matrix for
consideration. Second, we need an appropriate objective function such that its first-order
approximation is optimized. In the previous section, we have incorporated logarithmic
barrier functions into the original objective and applied diagonal matrices for scaling.
Here we want to further extend this approach to study more general results.
In this subsection, we focus on the primal program P defined by (8.1). Instead of choosing
the logarithmic barrier function, for μ > 0, let us use a general concave barrier function
φ(x) which is well defined and differentiable on the relative interior of the primal feasible
domain, and consider the following problem (Pφ)_μ:

    Minimize   c^T x − μ φ(x)     (8.26a)
    subject to A x = b     (8.26b)
               x > 0     (8.26c)
Under the interior-point assumption (A1) on problem (P), let x^k be a feasible solu-
tion to problem (Pφ)_μ and ∇φ be the gradient of φ. We also let Q^{-1} be an arbitrary
symmetric positive definite matrix and β < 1 be a positive scalar such that the ellipsoid
{x ∈ R^n : ||Q^{-1}(x − x^k)||^2 ≤ β^2} becomes inscribed in the feasible domain of prob-
lem (Pφ)_μ. Our focus is to find a good moving direction vector Δx^k = x − x^k from the
ellipsoid such that x is still feasible, i.e., A Δx^k = 0, and the objective value c^T x − μ φ(x)
is minimized.
Taking the first-order approximation of the objective function at the current interior
solution xk, we have
    c^T x − μ φ(x) ≈ c^T x^k − μ φ(x^k) + [c − μ ∇φ(x^k)]^T Δx^k

Therefore, we focus on the following subproblem (Pφ)_s:

    Minimize   [c − μ ∇φ(x^k)]^T Δx^k     (8.27a)
    subject to A Δx^k = 0     (8.27b)

where λ ≥ 0 is the Lagrange multiplier associated with the inequality constraint. Setting
∂L/∂h = 0 and solving for h results in a solution

    h^k = −(1/2λ) (U^T Q^{-2} U)^{-1} U^T (c − μ ∇φ(x^k))

Consequently, from Lemma 8.2, a moving direction

    Δx^k = −Q [ I − Q A^T (A Q^2 A^T)^{-1} A Q ] Q (c − μ ∇φ(x^k))     (8.29)
is generated for the general primal affine scaling algorithm. Also note that, when φ(x)
is strictly concave and twice differentiable, the Hessian matrix H of −φ(x) becomes
symmetric positive definite. Actually, H is the Hessian of the objective function c^T x −
μ φ(x) of program (Pφ)_μ. If we choose H^{1/2} to be the scaling matrix Q^{-1} (or
equivalently, H = Q^{-2}), then

    Δx^k = −H^{-1/2} [ I − H^{-1/2} A^T (A H^{-1} A^T)^{-1} A H^{-1/2} ] H^{-1/2} (c − μ ∇φ(x^k))     (8.30)

is the projected Newton direction with respect to the general barrier function.
Note that the classic inverse function can be used as a barrier function, i.e.,

    φ(x) = −(1/r) Σ_{j=1}^n 1/x_j^r,    for r > 0

In this case,

    ∇φ(x) = X^{−r−1} e     (8.31a)

and

    H = (r + 1) X^{−r−2}     (8.31b)
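As a sketch with hypothetical data, the projected Newton direction (8.30) under the inverse barrier can be computed without forming H^{1/2} explicitly, since H is diagonal here; the resulting direction must lie in the null space of A:

```python
import numpy as np

# Inverse barrier: grad phi = X^{-r-1} e, H = (r+1) X^{-r-2} (diagonal),
# so (8.30) collapses to dx = -H^{-1}(g - A^T dw) with g = c - mu grad phi
# and dw = (A H^{-1} A^T)^{-1} A H^{-1} g.
rng = np.random.default_rng(2)
m, n, r, mu = 2, 5, 1.0, 0.5
A = rng.standard_normal((m, n))
xk = rng.uniform(0.5, 2.0, n)
c = rng.standard_normal(n)

grad_phi = xk ** (-r - 1)
H_inv = xk ** (r + 2) / (r + 1)          # diagonal of H^{-1}
g = c - mu * grad_phi
AH = A * H_inv                           # A H^{-1} (column scaling)
dw = np.linalg.solve(AH @ A.T, AH @ g)
dx = -H_inv * (g - A.T @ dw)             # (8.30), expanded
```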
In this subsection, we shift our focus to the dual program D defined by (8.2). As in the
general primal affine scaling, we replace the logarithmic barrier function, for μ > 0, by
a general concave barrier function ψ(s) which is well defined and differentiable on the
relative interior of the dual feasible domain. Now consider the following problem (Dψ)_μ:

    Maximize   b^T w + μ ψ(s)     (8.33a)
    subject to A^T w + s = c     (8.33b)
               s > 0     (8.33c)
Under the interior-point assumption (A2) on problem (D), let (w^k; s^k) be a feasible
solution to problem (Dψ)_μ and ∇ψ be the gradient of ψ. Again we let Q^{-1} be an
arbitrary symmetric positive definite matrix and β < 1 be a positive scalar such that the
ellipsoid {s ∈ R^n : ||Q^{-1}(s − s^k)||^2 ≤ β^2} becomes inscribed in the feasible domain of
problem (Dψ)_μ. In order to find good moving direction vectors Δw^k = w − w^k and
Δs^k = s − s^k, we focus on the following subproblem (Dψ)_s:
(8.34b)
(8.34c)
Remember the isomorphism Ū
between the null space of matrix Ā^T = [A^T | I_n] and R^m, such that Δw^k = v and
Δs^k = −A^T v, for v ∈ R^m. A null-space version of problem (Dψ)_s becomes

    Maximize   [ b^T | (μ ∇ψ(s^k))^T ] Ū v     (8.35a)
and
(8.36b)
for the general dual affine scaling algorithm. Also note that, when ψ(s) is strictly
concave and twice differentiable, the Hessian matrix H of −ψ(s) becomes symmetric
positive definite. If we choose H^{1/2} to be the scaling matrix Q^{-1} (or equivalently,
H = Q^{-2}), then the corresponding formulas for Δw^k and Δs^k can be derived. When the
classic inverse function is taken to be the barrier function for the dual approach, i.e.,
    ψ(s) = −(1/r) Σ_{j=1}^n 1/s_j^r,    for r > 0
many algebraic paths that lead to the solutions of the Karush-Kuhn-Tucker conditions.
Moving along the Newton direction of each such path with appropriate step-lengths may
result in a new algorithm for further analysis.
The geometric interpretation relies on the special structure of a corresponding
subproblem. Basically, it takes an appropriate scaling matrix and a scalar to open an
inscribed ellipsoid in the feasible domain such that the inequality constraints can be
replaced. Then we consider the projected (negative) gradient of the objective function
in the null space of the constraint matrix as a potential moving direction. The shape of
the inscribed ellipsoid is certainly determined by the scaling matrix, and the projected
gradient is dependent on the barrier function applied.
Based on the geometric view, a general scheme which generates the moving direc-
tions of the generalized primal affine scaling and dual affine scaling has been included.
As to the generalization of the primal-dual algorithm, the difficulty lies in finding a pair
of primal barrier function φ(x) and dual barrier function ψ(s) such that both programs
(Pφ)_μ and (Dψ)_μ have a common system of Karush-Kuhn-Tucker conditions. If this
can be done, the generalization follows immediately. But so far, except by using the
logarithmic barrier function for both the primal and the dual, no other successful case
has been reported.
REFERENCES

8.1 Gonzaga, C., "Search directions for interior linear programming methods," Algorithmica 6,
153-181 (1991).
8.2 den Hertog, D., Roos, C., and Terlaky, T., "Inverse barrier methods for linear programming,"
Report of the Faculty of Technical Mathematics and Informatics, No. 90-27, Delft University
of Technology (1990).
8.3 Sheu, R. L., and Fang, S. C., "Insights into the interior-point methods," OR Report No. 252,
North Carolina State University, Raleigh, NC (1990); Zeitschrift für Operations Research 36,
200-230 (1992).
8.4 Sheu, R. L., and Fang, S. C., "On the generalized path-following methods for linear pro-
gramming," OR Report No. 261, North Carolina State University, Raleigh, NC (1992).
8.5 Ye, Y., "An extension of Karmarkar's algorithm and the trust region method for quadratic
programming," in Progress in Mathematical Programming: Interior-Point and Related Meth-
ods, ed. N. Megiddo, Springer Verlag, New York, 49-64 (1989).
8.6 Zimmermann, U., "Search directions for a class of projective methods," Zeitschrift für
Operations Research 34, 353-379 (1990).
EXERCISES
8.1. Show that (8.18) is indeed a solution to the system (8.17) together with A Δx^k = 0 and
A^T Δw^k + Δs^k = 0.
8.2. If we define P = U (U^T U)^{-1} U^T = [I − A^T (A A^T)^{-1} A], show that P^2 = P and AP = 0.
8.3. Show that v^k in (8.22a) is indeed an optimal solution to the null-space version of program D_s.
8.4. From (8.19), we know that

    (Δw^k; Δs^k) = (1/2λ) Ū (A S_k^{-2} A^T)^{-1} Ū^T ∇q(w^k, s^k)
                 = (1/2λ) Ū (A S_k^{-2} A^T)^{-1} Ū^T (b; μ S_k^{-1} e)

Then show that Ū (A S_k^{-2} A^T)^{-1} Ū^T is not a projection mapping. Hence the moving direction
in the dual affine scaling with logarithmic barrier function cannot be viewed as the negative
gradient of q(w; s) projected into the null space. The reason is mostly due to the unrestricted
variables w. This phenomenon does not happen for the symmetric dual problem, which
requires both w and s to be nonnegative.
8.6. Derive (8.24a) and (8.24b) from

    t(x, w, s) = −x^T s + Σ_{j=1}^n x_j s_j log_e (x_j s_j / μ)

Now consider the following subproblem:

By choosing X_k^{-1/2} S_k^{1/2} and X_k^{1/2} S_k^{-1/2} as the scaling matrices for x and s,
respectively, show that the solution of this subproblem provides the moving directions (8.18).
8.8. Replace the objective function of (8.3a) by

    c^T x + (μ/r) Σ_{j=1}^n 1/x_j^r
    ψ(s) = −(1/r) Σ_{j=1}^n 1/s_j^r,    for r > 0
and, using H^{1/2} as the scaling matrix Q^{-1}, derive the corresponding dual moving directions
Δw^k and Δs^k.
8.10. Consider using the entropy function

    φ(x) = −Σ_{j=1}^n x_j log_e x_j

and

    ψ(s) = −Σ_{j=1}^n s_j log_e s_j
The linearly constrained convex quadratic programming problems and linear program-
ming problems are closely related in that their feasible domains have the same structure.
In the past, linear programming algorithms have been naturally extended to solve
quadratic programming problems. For example, in 1959 P. Wolfe extended the simplex
method for quadratic programming. Just as in solving linear programming problems, in
the worst-case analysis, the simplex-based algorithm could take an exponential number
of iterations to reach optimality. Another example is that the ellipsoid method proposed
by M. K. Kozlov, S. P. Tarazov, and L. G. Khachian in 1979 led to the first notable
polynomial-time algorithm for solving quadratic programming problems. Similar to the
case of linear programming, despite the theoretic significance, the related implementa-
tion issues made this approach much less attractive. Therefore, it is easy to understand
that after N. Karmarkar proposed his interior-point method for solving linear pro-
gramming problems in 1984, many researchers have devoted their efforts to developing
interior-point methods for quadratic programming.
In this chapter, we look into extending the affine scaling approach to solving
quadratic programming problems. Since the theoretic complexity analysis in this case is
somewhat parallel to that of the linear programming case, we merely focus on introduc-
ing practical implementations and leave out detailed complexity analysis. In particular,
we concentrate on developing the primal affine scaling and primal-dual algorithms for
quadratic programming first and then extend the results to solving general convex pro-
gramming problems with linear constraints.
Sec. 9.1 Convex Quadratic Program with Linear Constraints 225
The concept of duality also applies to quadratic programs. When Q is positive definite,
corresponding to the primal problem (9.1), we have a dual Lagrangian problem:
    Maximize   −(1/2) v^T Q v + b^T w     (9.2a)
    subject to −Q v + A^T w + s = c     (9.2b)
226 Affine Scaling for Convex Quadratic Programming Chap. 9
Figure 9.2
nonsingular and A Q^{-1} A^T must be positive definite. In fact, if Q is assumed to be positive
definite, instead of positive semidefinite, then Q^{-1} exists and A Q^{-1} A^T is positive definite
when A has full row rank. For simplicity, we assume that Q in (9.1a) is symmetric and
positive definite in this chapter.
As mentioned earlier, right after Karmarkar's work with linear programming, many re-
searchers tried and were able to extend Karmarkar's projective scaling algorithm for
solving quadratic programs. For example, S. Kapoor and P. M. Vaidya developed a
projective scaling algorithm which requires O(n 3·67 (log(n + m + 2))(log L)L) arithmetic
operations. Similarly, Y. Ye and E. Tse proposed an algorithm with O(n 4 L 3 ) complex-
ity. However, owing to the complicated nature of the projective transformation, although
these algorithms have polynomial-time bounds, they lack computational support of ef-
fective implementation. Our objective is to introduce some practical implementations
based on the affine scaling approach-in particular, primal affine scaling and primal-dual
algorithms.
With some caution in handling the subtle differences between linear programming and
quadratic programming problems, the primal affine scaling algorithm developed for linear
programs can be extended to solve quadratic programming problems. In accordance with
the philosophy that we have been following throughout the book, we focus on three key
issues for developing an iterative algorithm, namely, (1) obtaining a starting feasible
solution, (2) synthesizing a direction of translation with appropriate step-length for an
improved solution, and (3) finding stopping criteria to terminate the iterative process.
Figure 9.3 (directions of translation toward the optimal solution)
Figure 9.4 (direction of translation toward the optimal solution)
Consider the convex quadratic function z(x) defined by (9.1a). For a symmetric
positive semidefinite matrix Q, we can always find a lower triangular matrix L such that

    Q = L L^T     (9.4)

where L^T is the transpose of L. Because L can be found by the standard Cholesky
factorization process, we usually call it a Cholesky factor of matrix Q. Notice that when
Q is positive definite, both L and L^T are nonsingular.
Now consider a transformation from R^n to R^n such that

    x' = L^T x     (9.5a)

Moreover, we define

    c' = L^{-1} c     (9.5b)

Then (9.1a) becomes

    z(x') = (1/2) x'^T x' + c'^T x'

Note that the transformed convex quadratic function has spheric contours because of the
identical eigenvalues of its defining matrix I. Working in the transformed space, we
expect to avoid the undesirable "zig-zagging" problem. Also note that

    z(x') + (1/2) c'^T c' = (1/2) (x' + c')^T (x' + c')

Since (1/2) c'^T c' is only a constant, minimizing z(x') is equivalent to minimizing
z'(x') = (1/2)(x' + c')^T (x' + c').
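The transformation can be checked numerically. The sketch below uses a hypothetical positive definite Q and verifies that z'(x') differs from z(x) only by the constant (1/2) c'^T c':

```python
import numpy as np

# Change of variables x' = L^T x with Q = L L^T and c' = L^{-1} c turns
# z(x) = (1/2) x^T Q x + c^T x into the spherical form
# z'(x') = (1/2)(x' + c')^T (x' + c'), up to the constant (1/2) c'^T c'.
Q = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])       # hypothetical positive definite Q
c = np.array([1.0, -2.0, 0.5])
L = np.linalg.cholesky(Q)             # lower triangular, Q = L L^T
c_p = np.linalg.solve(L, c)           # c' = L^{-1} c

x = np.array([0.3, 1.2, -0.7])        # arbitrary test point
x_p = L.T @ x                         # x' = L^T x
z = 0.5 * x @ Q @ x + c @ x
z_p = 0.5 * (x_p + c_p) @ (x_p + c_p)
```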
Application to quadratic programming. For a quadratic programming prob-
lem, with symmetric positive definite Q, we shall apply the transformation (9.5a) to it
first, then follow the primal affine scaling idea to generate a direction of translation in
the affinely scaled space by projecting the negative gradient of the transformed quadratic
function onto the null space of the scaled constraints. The actual moving direction is
finally obtained by transforming the direction of translation back to the original space.
In light of previous discussion, the transformed quadratic program becomes:
1
Minimize z' = -(x' + c')r (x' + c') (9.6a)
2
Notice that the new quadratic objective function has spheric contours in the transformed
space, the original constraint Ax = b remains, the transformation x' = L^T x is incorpo-
rated into the constraints, and x' is unrestricted although x is required to be nonnegative.
This transformed problem certainly doubles the number of variables in the original prob-
lem. But from later development, we can see that this causes no special problem, because
once x is known, x' = L^T x is automatically defined.
As in Chapter 7, let us assume that a primal interior feasible solution xk > 0 is
known. By taking its elements as diagonal elements, we form a diagonal scaling matrix
Xk. Remember that the basic idea of affine scaling is to scale the interior solution xk to
be at e = (1, 1, ... , 1) T. Therefore, in the scaled space we have scaled variables
    y = X_k^{-1} x     (9.7a)

As to the unrestricted variables x', they need not be scaled. Hence a scaling matrix is
defined by

    X̄_k = [ X_k  0
            0    I ]     (9.7b)

In this way, we have the following correspondence between the original variables and the
scaled variables:

    (9.7c)

Moreover, if we define

    y ≥ 0     (9.8b)
Moreover, the quadratic objective function z' remains the same in the scaled space, since
only the unrestricted variables x' are involved. The gradient of this objective function at
a given point is

    ∇z' = [ 0 ; x' + c' ]     (9.9a)

and, at the current iterate,

    ∇z'_k = [ 0 ; x'_k + c' ]     (9.9b)
The affine scaling approach suggests that we project the gradient vector ∇z'_k onto the null
space of the constraint matrix U_k and take its negative as a direction of translation
Sec. 9.2 Affine Scaling for Quadratic Programs 231
    w^k = [ w_1^k ; w_2^k ] = (U_k U_k^T)^{-1} U_k ∇z'_k     (9.11)
(9.12b)
    U_k U_k^T = [ A X_k^2 A^T    A X_k^2
                  X_k^2 A^T      X_k^2 + Q^{-1} ]     (9.13)
and
(9.14b)
    I − X_k (X_k^2 + Q^{-1})^{-1} X_k = I − (I + X_k^{-1} Q^{-1} X_k^{-1})^{-1}
        = (I + X_k^{-1} Q^{-1} X_k^{-1} − I)(I + X_k^{-1} Q^{-1} X_k^{-1})^{-1}
        = X_k^{-1} (X_k^2 Q + I)^{-1} X_k
        = X_k^{-1} (Q + X_k^{-2})^{-1} X_k^{-1}

and

    X_k^2 (X_k^2 + Q^{-1})^{-1} = (I + Q^{-1} X_k^{-2})^{-1} = (Q + X_k^{-2})^{-1} Q
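Both identities are easy to confirm numerically; the sketch below uses a hypothetical positive definite Q and a positive diagonal X_k:

```python
import numpy as np

# Check:  I - Xk (Xk^2 + Q^{-1})^{-1} Xk = Xk^{-1} (Q + Xk^{-2})^{-1} Xk^{-1}
# and:    Xk^2 (Xk^2 + Q^{-1})^{-1}     = (Q + Xk^{-2})^{-1} Q
rng = np.random.default_rng(3)
n = 4
B = rng.standard_normal((n, n))
Q = B @ B.T + n * np.eye(n)                  # positive definite
xk = rng.uniform(0.5, 2.0, n)
Xk, Xk_inv = np.diag(xk), np.diag(1.0 / xk)
Q_inv = np.linalg.inv(Q)

lhs1 = np.eye(n) - Xk @ np.linalg.inv(Xk @ Xk + Q_inv) @ Xk
rhs1 = Xk_inv @ np.linalg.inv(Q + Xk_inv @ Xk_inv) @ Xk_inv
lhs2 = Xk @ Xk @ np.linalg.inv(Xk @ Xk + Q_inv)
rhs2 = np.linalg.inv(Q + Xk_inv @ Xk_inv) @ Q
```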
Equation (9.16a) reduces to

    w_2^k = −(X_k^2 + Q^{-1})^{-1} X_k^2 A^T w_1^k − (X_k^2 + Q^{-1})^{-1} (x_k + Q^{-1} c)     (9.18b)
However, from (9.8a), (9.9b), and (9.12b) we know that

    d_y^k = X_k (A^T w_1^k + w_2^k)     (9.19)

Plugging (9.18a) and (9.18b) into (9.19) and carefully rearranging terms, we have a
simple result:

    A^T w_1^k + w_2^k = A^T w_1^k − (X_k^2 + Q^{-1})^{-1} X_k^2 A^T w_1^k − (X_k^2 + Q^{-1})^{-1} (x_k + Q^{-1} c)     (9.21)
Mapping it back to the original x-space by premultiplying by the scaling matrix X_k, we
obtain

    d_x^k = −[ I − H_k A^T (A H_k A^T)^{-1} A ] H_k (Q x_k + c)

where H_k = (Q + X_k^{-2})^{-1}. Compared to the linear programming case, this
projection matrix depends not only upon the affine scaling matrix X_k but also upon the
second-derivative information Q of the objective function. Therefore, the above direction
of translation has a combined effect of pure affine scaling and Newton's method. To be
more specific, when x^k is sufficiently far away from the positivity walls of the first orthant,
H_k is dominated by Q^{-1} and d_x^k behaves like a Newton direction. On the other hand,
as x^k gets close to the positivity walls, H_k is dominated by X_k^2 and d_x^k behaves like a
pure affine scaling direction.
It is also important to observe that, since we are not directly inverting Q, d_x^k exists
even when Q is only positive semidefinite. In this case, if we set Q = 0, then d_x^k reduces
to the direction of translation for the primal affine scaling in the linear programming case.
Figure 9.5 (x becomes negative beyond this point along the direction of translation)
Figure 9.6 (objective value increases beyond this point along the direction of translation)
    (9.24)

It is left to the reader to verify that α_k^2 ≥ 0. With (9.23) and (9.24), finally we obtain an
appropriate step-length

    α_k = min { α_k^1, α_k^2 }     (9.25)
Stopping rules. Similar to what we have for the linear programming case,
based on the Karush-Kuhn-Tucker conditions (9.3), we can check primal feasibility, dual
feasibility, and complementary slackness for optimality. When all these three conditions
are satisfied, we can terminate the algorithm with an optimal solution. To be more
precise, we preselect three sufficiently small positive numbers ε_1, ε_2, and ε_3. At the kth
iteration with a feasible solution x^k, we define

    H_k = (Q + X_k^{-2})^{-1};   w_1^k = (A H_k A^T)^{-1} A H_k (Q x^k + c);   s^k = (Q x^k + c) − A^T w_1^k
Then we stop the algorithm when the following conditions are met:
(I) PRIMAL FEASIBILITY:

    ||A x^k − b|| / (||b|| + 1) ≤ ε_1

(II) DUAL FEASIBILITY:

    either s^k ≥ 0 or the total dual infeasibility is no more than ε_2

(III) COMPLEMENTARY SLACKNESS:

    (x^k)^T s^k ≤ ε_3

If all three conditions are satisfied, then stop with an optimal (or near optimal) solution
x^k. Otherwise, go to the next step.
Step 4 (finding direction of translation): Compute a direction of translation
    d_x^k = −H_k s^k

Step 5 (finding step-length): Compute

    α_k^2 = −[(d_x^k)^T (Q x^k + c)] / [(d_x^k)^T Q d_x^k]

If d_x^k ≥ 0, then set α_k = α_k^2. Otherwise, calculate

    α_k^1 = min_i { −α x_i^k / d_{x,i}^k : d_{x,i}^k < 0 },   with 0 < α < 1,

and set α_k = min { α_k^1, α_k^2 }.
Example 9.1
    Minimize   (1/2)(2x_1^2 + 3x_2^2 + 5x_3^2) + x_1 + 2x_2 − 3x_3
    subject to x_1 + x_2 = 5
               x_2 + x_3 = 10
               x_1, x_2, x_3 ≥ 0

It is clear that

    A = [ 1 1 0          Q = [ 2 0 0
          0 1 1 ],             0 3 0
                               0 0 5 ]

Also note that the unconstrained minimum x = −Q^{-1} c = [−0.5  −0.6667  0.6000]^T is
infeasible; therefore we have to go to Step 2.

Let us start with an initial interior solution x^0 = [0.3532  4.6468  5.3532]^T. (As
a matter of fact, the reader may verify that this solution can be obtained by invoking the
phase-1 linear programming method with a starting vector of all ones.) Now, we have

    H_0 = (Q + X_0^{-2})^{-1} = [ 0.0998  0       0
                                  0       0.3283  0
                                  0       0       0.1986 ]

Therefore,

    s^0 = (Q x^0 + c) − A^T w_1^0 = [5.2756  −1.6044  2.6517]^T

Since the negative component of s^0 is too large, it is not optimal yet. Hence we
proceed to synthesize the direction of translation:

    d_x^0 = −H_0 s^0 = [−0.5267  0.5267  −0.5267]^T

Because d_x^0 is not nonnegative, we set α = 0.99 to calculate α_0^1 = 0.6640. Also we
compute α_0^2 = 1.8098. Therefore the actual step-length is α_0 = 0.6640. Making the
translation, we get a new interior solution

    x^1 = x^0 + α_0 d_x^0 = [0.0035  4.9965  5.0035]^T

The reader may carry out further iterations to verify that the optimal solution is [0  5  5]^T.
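The first iteration of Example 9.1 can be reproduced with a few lines of dense linear algebra, following the algorithm's steps:

```python
import numpy as np

# One primal affine scaling iteration for the QP of Example 9.1:
# Hk = (Q + Xk^{-2})^{-1}, w = (A Hk A^T)^{-1} A Hk (Q xk + c),
# s = (Q xk + c) - A^T w, dx = -Hk s, then the step-length rule (9.25).
Q = np.diag([2.0, 3.0, 5.0])
c = np.array([1.0, 2.0, -3.0])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x0 = np.array([0.3532, 4.6468, 5.3532])

H0 = np.diag(1.0 / (np.diag(Q) + 1.0 / x0**2))   # diagonal since Q is
g = Q @ x0 + c
w0 = np.linalg.solve(A @ H0 @ A.T, A @ H0 @ g)
s0 = g - A.T @ w0
d0 = -H0 @ s0

alpha = 0.99
a1 = min(-alpha * x0[i] / d0[i] for i in range(3) if d0[i] < 0)  # boundary step
a2 = -(d0 @ g) / (d0 @ Q @ d0)                                   # minimizing step
x1 = x0 + min(a1, a2) * d0
```

Running further iterations of the same loop drives the iterates toward the optimal solution [0 5 5]^T.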
9.2.2 Improving Primal Affine Scaling for Quadratic
Programming
Similar to Theorem 7.2, it can be proven that the sequence of iterates {x^k} generated by
the primal affine scaling algorithm for quadratic programming problems indeed converges
to an optimal solution under appropriate assumptions. However, as far as the computational
complexity is concerned, there is no evidence that the algorithm is of polynomial-time
bound. The authors have implemented the algorithm and found satisfactory performance
(9.26)
With these concepts in mind, let us revisit formula (9.22). Noting that H_k = (Q + X_k^{-2})^{-1}
is a symmetric positive definite matrix, we denote by T_k its Cholesky factor, such that
T_k T_k^T = H_k. Substituting into Equation (9.22), we see
(9.29)
Taking a closer look at the primal affine scaling algorithm for quadratic programming,
we see that the most time-consuming work is the computation of the moving direction d_x^k.
We now compare the null-space formulation (9.22) with the range-space formulation (9.28)
for solving a convex quadratic program with n variables and m constraints.

In the null-space formulation, the complexity of inverting (Q + X_k^{-2}) to form H_k
is of O(n^3), forming (A H_k A^T) is of O(mn^2), and inverting (A H_k A^T) is of O(m^3).

On the other hand, in the range-space formulation, the complexity of forming
A_c (Q + X_k^{-2}) A_c^T is of O((n − m) n^2), and inverting this matrix is of O((n − m)^3).

Therefore the following observations can be made:
Potential push method. Just as in the linear programming case, the primal
affine scaling algorithm for quadratic programming may also "get trapped" at a wrong
vertex of the feasible domain. In order to avoid this potential problem, we introduce a
"potential push method" in this subsection and a "logarithmic barrier function method"
in the next subsection. Because these two methods are the extensions of what we had in
Chapter 7, we only outline the results without formal proof.
The basic concept of the "potential push" method is illustrated by Figure 9. 7. An
old solution xk- 1 with a moving direction d~- 1 leads to a current solution xk. However,
the current solution is "off the center" and our objective is to push the "off-centered"
solution to a better position which is away from the boundary but with the same objective
value. In order to do so, we define a potential function by
(9.30)
and try to find a projected normal vector and a projected push vector at x^k such
that the better position can be determined by minimizing the potential function (9.30)
Figure 9.7 (recentering on a constant objective surface)
In real implementation, once these two vectors are found, a binary search procedure is
usually applied to find an approximate minimizer x(t) of the potential function (9.30)
along the path (9.31). According to the authors' experience, the extra computational effort
for the potential push is well repaid for a variety of large-scale quadratic programming
problems. But again, there is no theoretic proof of polynomial-time complexity for the
potential push method.
Sec. 9.3 Primal-Dual Algorithm for Quadratic Programming 241
Minimize (9.33a)
For a given barrier parameter fL > 0, let us focus on problem (9.33). The associated
Karush-Kuhn-Tucker conditions become
    Ax = b,   x > 0      (primal feasibility)     (9.35a)
    −Qx + A^T w + s = c,   s > 0      (dual feasibility)     (9.35b)
    X S e = μ e      (complementary slackness)     (9.35c)
Now assume that (x^k; w^k; s^k) is a current solution with x^k > 0, s^k > 0 for μ_k > 0.
Invoking Newton's method yields a system of equations for the directions of translation.
This system is given by

    [ A    0    0   ] [ d_x^k ]       [ A x^k − b                    ]
    [ −Q   A^T  I   ] [ d_w^k ]  = −  [ −Q x^k + A^T w^k + s^k − c   ]     (9.36)
    [ S_k  0    X_k ] [ d_s^k ]       [ X_k S_k e − μ_k e            ]

where X_k and S_k are the diagonal matrices formed by vectors x^k and s^k, respectively.
Note that (9.36) can be written as

    A d_x^k = t^k,   where t^k = b − A x^k     (9.37a)
    −Q d_x^k + A^T d_w^k + d_s^k = u^k,   where u^k = Q x^k + c − A^T w^k − s^k     (9.37b)
    S_k d_x^k + X_k d_s^k = v^k,   where v^k = μ_k e − X_k S_k e     (9.37c)
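Rather than deriving closed-form directions, the system (9.36) can also be assembled and solved directly, which is convenient for small illustrative problems. The data below are hypothetical (in the spirit of Example 9.1), and the choice μ_k = 0.9 is arbitrary:

```python
import numpy as np

# Assemble and solve the primal-dual Newton system (9.36) for a QP.
Q = np.diag([2.0, 3.0, 5.0])
c = np.array([1.0, 2.0, -3.0])
A = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
b = np.array([5.0, 10.0])
n, m = 3, 2

xk = np.ones(n); wk = np.zeros(m); sk = np.ones(n)   # arbitrary start
mu = 0.9
Xk, Sk = np.diag(xk), np.diag(sk)

K = np.block([
    [A,   np.zeros((m, m)), np.zeros((m, n))],
    [-Q,  A.T,              np.eye(n)],
    [Sk,  np.zeros((n, m)), Xk],
])
rhs = np.concatenate([
    b - A @ xk,                              # t^k
    Q @ xk + c - A.T @ wk - sk,              # u^k
    mu * np.ones(n) - xk * sk,               # v^k
])
d = np.linalg.solve(K, rhs)
dx, dw, ds = d[:n], d[n:n + m], d[n + m:]
```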
(9.47a)
and
(9.47b)
where d_{x,i}^k is the ith component of d_x^k, d_{s,i}^k the ith component of d_s^k, x_i^k the ith
component of x^k, s_i^k the ith component of s^k, and 0 < α < 1 is a constant.

In addition, at each iteration, the barrier parameter μ_k can be updated according to
the formula
After determining how to iterate from one solution to another one, the remaining
work is to start and stop the iterations. Similar to the development in Chapter 7, here we
introduce a version of the primal-dual algorithm which can start with an arbitrary triplet
(x0 , w0 , s0 ) with appropriate dimensions and x0 , s0 > 0. The algorithm terminates when
the Karush-Kuhn-Tucker conditions (9.35) are met.
Based on the basic concepts we discussed in the previous subsection, we list a version of
the primal-dual algorithm for convex quadratic programming. Although no theoretical
proof of convergence and polynomial-time complexity has been derived for this ver-
sion, experiences of solving real-life problems support its computational efficiency and
effectiveness.
Step 1 (starting the algorithm): Set k = 0. Start with any (x0 ; w0 ; s0 ) such that
x 0 > 0 and s0 > 0. Fix E 1, E2 and E3 to be three sufficiently small positive numbers
and 0 <a < 1.
Step 2 (intermediate computations): Compute

    μ_k = a (x^k)^T s^k / n
    t^k = b − A x^k
    u^k = Q x^k + c − A^T w^k − s^k
    μ_k < ε_1,  and
then STOP. The current solution is optimal. Otherwise go to the next step.
(Note: Compute ||u^k|| and ||Q x^k + c|| only for those dual constraints which
are violated, i.e., corresponding to the negative components of u^k. If u^k ≥ 0, then
there is no need to compute this measure of optimality.)
Step 5 (finding step-lengths): Compute the primal and dual step-lengths $\alpha_P^k$ and $\alpha_D^k$ according to (9.47a) and (9.47b) with $0 < \alpha < 1$ (say, 0.99).
It is clear that we are not optimal yet, since the primal vector $x^0$ is not feasible at all. Therefore Step 4 yields the moving directions
$$d_x^0 = [-1.3077\;\; 4.3077\;\; 3.6923]^T, \quad d_w^0 = [-1.9231\;\; 23.1538]^T, \quad d_s^0 = [1.3077\;\; -4.3077\;\; -3.6923]^T$$
If we choose $\alpha = 0.99$, then Step 5 implies that $\alpha_P^0 = 0.7571$ and $\alpha_D^0 = 0.2298$. Following Step 6, we have new solution vectors:
$$x^1 = [1\;\;1\;\;1]^T + 0.7571 \times [-1.3077\;\;4.3077\;\;3.6923]^T = [0.0100\;\;4.2612\;\;3.7953]^T$$
$$s^1 = [1\;\;1\;\;1]^T + 0.2298 \times [1.3077\;\;-4.3077\;\;-3.6923]^T = [1.3005\;\;0.0010\;\;0.1514]^T$$
and
$$w^1 = [0\;\;0]^T + 0.2298 \times [-1.9231\;\;23.1538]^T = [-0.4496\;\;5.3213]^T$$
Note that the new primal vector $x^1$ is closer to satisfying the primal feasibility requirement.
The reader is urged to carry out further iterations in order to terminate the procedure with
an optimal solution.
As indicated earlier, both the convergence and polynomiality properties of the above
version of the primal-dual algorithm have not been fully investigated, although it indeed
works very well for solving many real-life problems. To answer the theoretical questions,
R. C. Monteiro and I. Adler have reported another version of the primal-dual algorithm
which converges to optimality with O(n 3 L) complexity.
Their version requires starting with a solution (x0 ; w0 ; s0 ) such that the primal
feasibility and dual feasibility conditions (9.35a, b) are met. These two conditions are
also required to be maintained at each iteration. In other words, tk and uk must be kept
at 0 for each $k$. In addition, the starting solution $(x^0; w^0; s^0)$ and a corresponding $\mu^0$ are required to satisfy the condition
$$\|f^0 - \mu^0 e\| \le \theta \mu^0$$
where $f^0 = [x_1^0 s_1^0,\; x_2^0 s_2^0,\; \ldots,\; x_n^0 s_n^0]^T$ and $\theta$ is a small positive number, say 0.1.
The algorithm updates the value of the barrier parameter $\mu$ according to the following formula:
$$\mu^{k+1} = \mu^k \left( 1 - \frac{\delta}{\sqrt{n}} \right)$$
where $\delta$ is also a small positive number, say 0.1. Their moving directions are also
synthesized according to Equations (9.43), (9.44) and (9.45) with tk = 0 and uk = 0.
But, instead of evaluating at the real $x^k$ and $s^k$, they introduced the so-called "adjusted" values $\bar{x}^k$ and $\bar{s}^k$ such that
$$\frac{|\bar{x}_i^k - x_i^k|}{|x_i^k|} \le \gamma \quad \text{for each } i$$
and
$$\frac{|\bar{s}_i^k - s_i^k|}{|s_i^k|} \le \gamma \quad \text{for each } i$$
where $\gamma$ is again a small positive number, say 0.1. The moving directions are computed by using $\bar{x}^k$ and $\bar{s}^k$ instead of $x^k$ and $s^k$ in Equations (9.43)-(9.45).
Since both the primal and dual feasibility conditions are maintained, R. C. Monteiro and I. Adler showed that their algorithm takes at most $O(\sqrt{n}\,\max(\log \epsilon^{-1}, \log n, \log \mu^0))$ iterations to terminate with $(x^k)^T s^k \le \epsilon$. With the barrier parameter $\mu^0$ satisfying the condition $\log(\mu^0) = O(L)$, the algorithm stops in $O(\sqrt{n}\,L)$ iterations. Now, at each iteration the complexity analysis is pretty much like the linear programming case. Hence the total complexity can be brought down to $O(n^3 L)$ arithmetic operations.
From a practical point of view, although Monteiro and Adler's primal-dual algo-
rithm is theoretically interesting, it is not computationally attractive. Basically, several
parameters in the algorithm are heuristically chosen, and it is difficult to find "univer-
sally good" values of these parameters. Furthermore, it is not that easy to find a starting
solution satisfying all the requirements, and the computations of xk and sk and related
approximations are found to adversely affect the efficiency of the algorithm.
In this section, we further extend the primal affine scaling algorithm to solve convex programming problems with linear constraints. In a certain sense this extension is expected,
since a convex function can be approximated by a quadratic function in a neighborhood
of each point of interest. Therefore, it is a natural extension from quadratic to convex
programming.
Note that the level surface of the objective function is no longer a hyperplane, although the feasible domain stays the same as in a linear programming problem.
The Karush-Kuhn-Tucker conditions for problem (9.48) are given by
$$Ax = b, \quad x \ge 0 \qquad (9.49a)$$
$$-\nabla f(x) + A^T w + s = 0, \quad s \ge 0 \qquad (9.49b)$$
$$XSe = 0 \qquad (9.49c)$$
Notice that when $\nabla^2 f(x)$ is positive definite, conditions (9.49a)-(9.49c) can be used for the optimality test. Also notice that if $f(x) = \frac{1}{2}x^T Qx + c^T x$, then $\nabla f(x) = Qx + c$ and conditions (9.49a)-(9.49c) become conditions (9.3a)-(9.3c). In other words, linearly constrained quadratic programming is a special case of this general setting.
The basic idea of designing a primal affine scaling algorithm for a linearly constrained convex programming problem is pretty much straightforward. We simply replace the roles of $Q$ and $Qx + c$ in the quadratic programming algorithm by $\nabla^2 f(x)$ and $\nabla f(x)$, respectively, to handle the convex case. In this way, for an interior feasible solution $x^k$, the moving direction is determined by the same formulas with these substitutions.
Once the moving direction and step-length are known, the algorithm iterates from one
interior feasible solution to another. Moreover, the Karush-Kuhn-Tucker conditions are
checked for the termination of the algorithm. To be more precise, we preselect three
sufficiently small positive numbers E 1, E2 , and E3 . At the kth iteration with a feasible
solution xk, we define
Then we stop the algorithm when the following conditions are met:
I!Axk- bjj
llbll + 1
(II) DUAL FEASIBILITY:
Either sk ::: 0 or
(xk)T sk:::: E3
Based on the basic concepts, a primal affine scaling algorithm is assembled as follows
to solve convex programming problems with linear constraints.
Step 3 (checking for optimality): If the primal feasibility, dual feasibility, and complementarity conditions above are met (in particular, either $s^k \ge 0$ or the dual infeasibility is within $\epsilon_2$, and $(x^k)^T s^k \le \epsilon_3$), then stop with an optimal (actually, near-optimal) solution $x^k$. Otherwise, go to the next step.
Step 4 (finding a direction of translation): Compute a direction of translation
$$d_x^k = -H_k s^k$$
Step 5 (computing step-length): Use the line search to find $\alpha_f^k$ by approximating the minimizer of
$$f(\alpha) = f(x^k + \alpha d_x^k)$$
If $d_x^k \ge 0$, then set $\alpha_k = \alpha_f^k$. Otherwise, calculate
$$\bar{\alpha}_k = \min_i \left\{ \frac{-\alpha x_i^k}{(d_x^k)_i} \;\middle|\; (d_x^k)_i < 0 \right\}$$
and set $\alpha_k = \min\{\bar{\alpha}_k, \alpha_f^k\}$.
Step 6 (moving to a new solution): Perform the translation
$$x^{k+1} \leftarrow x^k + \alpha_k d_x^k$$
Set $k \leftarrow k + 1$ and go to Step 2.
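Step 5's safeguarded step-length can be sketched as follows. The golden-section minimizer, the search bracket $[0, 10]$, and the test objective are our illustrative choices, not the book's; the point is only how the line-search value $\alpha_f^k$ is combined with the ratio bound $\bar{\alpha}_k$.

```python
def golden_min(phi, a, b, tol=1e-8):
    # golden-section search for an approximate minimizer of phi on [a, b]
    inv = (5 ** 0.5 - 1) / 2
    c, d = b - inv * (b - a), a + inv * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):
            b, d = d, c
            c = b - inv * (b - a)
        else:
            a, c = c, d
            d = a + inv * (b - a)
    return (a + b) / 2

def step5(f, x, dx, alpha=0.99, upper=10.0):
    # alpha_f approximately minimizes f(x + a*dx) over [0, upper] (the bracket
    # "upper" is an illustrative assumption); if some component of dx is
    # negative, the ratio test caps the step so that x stays strictly positive
    alpha_f = golden_min(lambda a: f([xi + a * di for xi, di in zip(x, dx)]),
                         0.0, upper)
    ratios = [-alpha * xi / di for xi, di in zip(x, dx) if di < 0]
    return min([alpha_f] + ratios)

# illustrative convex objective: f(x) = (x1 - 2)^2 + (x2 - 3)^2
f = lambda x: (x[0] - 2) ** 2 + (x[1] - 3) ** 2
a_k = step5(f, [1.0, 1.0], [1.0, 1.0])   # exact line minimizer is 1.5
```

Since no component of the direction is negative here, the ratio test is inactive and the line-search value is returned unchanged.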
Again, we have to point out that both the convergence proof and modifications for polynomial-time complexity analysis are subject to further investigation. Here we only outline an implementation procedure with no further theoretical indications.
In this chapter, we have seen how the variants of affine scaling can be extended to
solve linearly constrained quadratic programming and convex programming problems.
The basic ideas are quite simple, but more rigorous study on the convergence proof and polynomial-time complexity analysis remains a challenge. Here we merely introduce some
practical implementations. Moreover, from these practical implementations, we learn the
potential of developing a general purpose affine scaling solver for linear, quadratic, and
linearly constrained convex programming problems, because the basic ideas follow the
same logic. Researchers are also studying the linear complementarity problem and other
types of nonlinear programming problems from the interior point of view.
9.1 Cheng, Y-C., Houck, D. J., Jr., Meketon, M.S., Slutsman, L., Vanderbei, R. J., and Wang, P.,
"The AT&T KORBX® System," AT&T Technical Journal 68, No. 3, 7-19 (1989).
9.2 Fang, S. C., and Tsao, J. H-S., "An unconstrained convex programming approach to solv-
ing convex quadratic programming problems," OR Report No. 263, North Carolina State
University, Raleigh, NC (1992).
9.3 Freedman, B. A., Puthenpura, S. C., and Sinha, L. P., "A new Karmarkar-based algorithm
for optimizing convex, non-linear cost functions with linear constraints," Technical Memo-
randum, No. 54142-870217-01TM, AT&T Bell Laboratories (1987).
9.4 Goldfarb, D., and Liu, S., "An O(n 3 L) primal interior point algorithm for convex quadratic
programming," Mathematical Programming 49, 325-340 (1991).
9.5 Jarre, F., "On the convergence of the method of analytical centers when applied to convex programs," manuscript, Institut für Angewandte Mathematik und Statistik, Universität Würzburg, Würzburg, Germany (1987).
9.6 Jarre, F., "The method of analytical centers for smooth convex programs," PhD thesis, Institut für Angewandte Mathematik und Statistik, Universität Würzburg, Würzburg, Germany (1987).
9.7 Kapoor, S., and Vaidya, P.M., "Fast algorithms for convex quadratic programming and mul-
ticommodity flows," Proceedings of the 18th Annual Symposium on Theory of Computing,
Berkeley, CA, 147-159 (1986).
9.8 Kozlov, M. K., Tarasov, S. P., and Khachian, L. G., "Polynomial solvability of convex
quadratic programming" (in Russian), Doklady Akademiia Nauk USSR 5, 1051-1053 (1979).
9.9 Mehrotra, S., and Sun, J., "An algorithm for convex quadratic programming that requires
O(n 3 ·5 L) arithmetic operations," Mathematics of Operations Research 15, 342-363 (1990).
9.10 Mehrotra, S., and Sun, J., "A method of analytic centers for quadratically constrained convex
quadratic programs," SIAM Journal of Numerical Analysis 28, 529-544 (1991).
9.11 Mehrotra, S., and Sun, J., "An interior point method for solving smooth convex programs
based on Newton's method," Contemporary Mathematics 114, 265-284 (1991).
9.12 Monteiro, R. C., and Adler, I., "Interior path following primal-dual algorithms. Part II: convex
quadratic programming," Mathematical Programming 44, 43-66 (1989).
9.13 Monteiro, R. C., Adler, I., and Resende, M. C., "A polynomial-time primal-dual affine scaling
algorithm for linear and convex quadratic programming and its power series extension,"
Mathematics of Operations Research 15, 191-214 (1990).
9.14 Sonnevend, G., "An analytical center for polyhedrons and new classes of global algorithms
for linear (smooth, convex) programming," in Proceedings of the 12th IFIP Conference on
System Modeling and Optimizations, Budapest, Lecture Notes in Control Information Sciences,
Springer-Verlag, New York, 84, 866-876 (1985).
9.15 Wolfe, P., "The simplex method for quadratic programming," Econometrica 27, 382-398
(1959).
9.16 Ye, Y., "An extension of Karmarkar's algorithm and the trust region method for quadratic
programming," in Progress in Mathematical Programming: Interior-Point and Related Meth-
ods, ed. N. Megiddo, Springer-Verlag, New York, 49-64 (1989).
9.17 Ye, Y., and Tse, E., "A polynomial-time algorithm for convex quadratic programming,"
Manuscript, Department of Engineering-Economic Systems, Stanford University, Stanford,
CA (1986).
EXERCISES
9.1. Let L be an n × n symmetric matrix that is positive definite on the null space of matrix A, where A is an m × n matrix with rank m. Show that the matrix
$$\begin{bmatrix} L & A^T \\ A & 0 \end{bmatrix}$$
is nonsingular.
9.2. Let $P = \{x \in R^n \mid -1 \le x_i \le 1,\; i = 1, \ldots, n\}$ and let $f$ be a quadratic norm-function such that $f(x) = -\|x\|^2$.
(a) Verify that any vertex of $P$ is a local minimum point of $f$.
(b) Can you use the algorithms developed in this chapter to solve this problem? Why?
9.3. Solve the following quadratic programming problems:
(a) Minimize $x^2 - 14x + y^2 - 6y - 5$ subject to $x + y \le 2$ and $-x - 2y \ge -3$.
(b) Maximize $6xy - 2x^2 - 9y^2 + 18x - 9y + 10$ subject to $x + 2y \le 12$ and $x, y \ge 0$.
9.4. Show that
(a) $X_k^2 (X_k^2 + Q^{-1})^{-1} = (Q + X_k^{-2})^{-1} Q$.
(b) $I - X_k (X_k^2 + Q^{-1})^{-1} X_k = X_k^{-1} (Q + X_k^{-2})^{-1} X_k^{-1}$.
9.5. Derive Equation (9.24) and verify that $\alpha_f^k \ge 0$.
9.6. Consider the following convex programming (primal) problem
Minimize f(x)
h(x) = 0, XEX
Maximize p(v, w)
where
Prove that $\phi_1^k$ is rotation invariant (i.e., you may rotate $x^k$ such that $x^k \to Rx^k$, where $R$ is orthonormal, $RR^T = I$, without affecting the minimization of $\phi_1^k$) and $\phi_2^k$ is scale invariant (i.e., you may scale $x^k$ without affecting the minimization of $\phi_2^k$).
9.9. For the $(d_x^k, d_s^k)$ pair depicted by Equations (9.32a) and (9.32b), show that
$$(d_x^k)^T Q\, d_s^k = 0$$
such that $c^T v^k = 0$, $Av^k = 0$, and $\rho_k$ is a positive constant. (Note: This is the potential push equation for the LP case.) [Hint: Note that $\theta$ in Equation (9.31) can be chosen as the spectral norm of $Q$.]
9.11. Show that $x(t)$ is a geodesic in the transformed space (where the constant objective surfaces are spherical), and hence the locus of $x(t)$ represents a great circle. [Hint: Show that
$$\left[ \frac{d^2 x'(t)}{dt^2} \right]^T \left[ \frac{dx'(t)}{dt} \right] = 0 \quad \text{where } x'(t) = L^T x(t) \text{ and } LL^T = Q$$
which implies zero acceleration in the tangent space. This is the striking property of geodesics.]
9.12. Assume that the vector $y^0 = e$. Generate a sequence of vectors $\{y^k\}$ by using the relationship
$$y^{k+1} = Qy^k$$
10 Implementation of Interior-Point Algorithms
In recent years the interior-point algorithms have shown their efficiency in solving large-
scale linear and quadratic programming problems with a wide variety of successful ap-
plications. As a matter of fact, some large-scale problems became solvable owing to the
invention of these techniques. However, it is important to understand that implementa-
tion techniques play a key role in the claimed efficiency of these methods. For example,
we can easily implement the primal affine scaling algorithm for linear programming in
APL language in less than an hour, involving less than twenty lines of coding, but the
performance of such an implementation could be far from satisfactory. Many implemen-
tation issues need to be carefully addressed in order to achieve the expected performance.
Nowadays, with the advent of vector/parallel processing capabilities of modern computers, implementation skills are much more involved than ever before. This is particularly
true for any serious commercial software package.
In this chapter, we point out the computational bottleneck of interior-point algo-
rithms and focus on some implementation techniques, including the Cholesky factoriza-
tion, conjugate gradient, and LQ factorization methods, to tackle the bottleneck problem.
By no means does this chapter provide a complete treatment; it only touches the tip of
an iceberg.
So far we have studied Karmarkar's projective scaling algorithm, primal affine scaling
algorithm, dual affine scaling algorithm, primal-dual algorithm, affine scaling with loga-
rithmic barrier function method, affine scaling with power-series method, and extended
affine scaling algorithms for linearly constrained quadratic and convex programming problems.
254 Implementation of Interior-Point Algorithms Chap. 10
The idea of the Cholesky factorization method is quite simple. Instead of solving system (10.1) directly, since the fundamental matrix M in (10.1) is symmetric and positive definite, following Cholesky we first factorize it as the product of an m × m lower triangular matrix L and its transpose $L^T$, i.e., $M = LL^T$. In this way, (10.1) becomes
$$LL^T u = v \qquad (10.2)$$
We further define $z = L^T u$. By solving
$$Lz = v \qquad (10.3)$$
for z first and then solving
$$L^T u = z \qquad (10.4)$$
for u, we find a solution to system (10.1) in two stages. Since L is a lower triangular matrix, we can easily identify $z_1$ first, then $z_2, z_3, \ldots, z_m$ by simple arithmetic operations. Usually this process is called forward solve. Similarly, because $L^T$ is an upper triangular matrix, we can easily identify $u_m$ first, then $u_{m-1}, u_{m-2}, \ldots, u_1$ by simple arithmetic operations. Therefore, it is often called backward solve.
It is easy to understand the advantage of forward solve and backward solve. But
the key to success is to find the Cholesky factor L in an efficient manner. Before
we introduce potential factorization algorithms, let us study a fundamental theorem of
Cholesky factorization.
Write
$$M = \begin{bmatrix} \delta & u^T \\ u & H \end{bmatrix} \qquad (10.5)$$
Then
$$M = \begin{bmatrix} \sqrt{\delta} & 0 \\ u/\sqrt{\delta} & I \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & L_1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & L_1^T \end{bmatrix} \begin{bmatrix} \sqrt{\delta} & u^T/\sqrt{\delta} \\ 0 & I \end{bmatrix} \qquad (10.7)$$
where $L_1$ is the Cholesky factor of $H - uu^T/\delta$. Since $L_1$ is unique, it is clear that
$$L = \begin{bmatrix} \sqrt{\delta} & 0 \\ u/\sqrt{\delta} & L_1 \end{bmatrix}$$
is also unique, and the proof is complete.
We now focus on computing techniques for finding the Cholesky factor L of a symmetric positive definite matrix M. Let $m_{ij}$ and $l_{ij}$ be the $(i, j)$th elements of matrices M and L, respectively, for $i, j = 1, 2, \ldots, m$. Since $M = LL^T$, by matrix multiplication we know that
$$m_{ij} = \sum_{k=1}^{j} l_{ik} l_{jk} \qquad (10.8)$$
Because L is a lower triangular matrix, we need only consider the elements $l_{ij}$ with $i \ge j$. In this case, (10.8) implies that, for $j = 1$,
$$l_{11} = \sqrt{m_{11}} \qquad (10.9a)$$
and
$$l_{i1} = m_{i1} / l_{11}, \qquad i = 2, \ldots, m \qquad (10.9b)$$
Moreover, for $j = 2, 3, \ldots, m$, we can first compute
$$l_{jj} = \left( m_{jj} - \sum_{k=1}^{j-1} l_{jk}^2 \right)^{1/2} \qquad (10.10a)$$
and then
$$l_{ij} = \left( m_{ij} - \sum_{k=1}^{j-1} l_{ik} l_{jk} \right) \Big/ l_{jj}, \qquad i = j+1, \ldots, m \qquad (10.10b)$$
In this scheme, the columns of L are computed one by one, but the part of the matrix
remaining to be factored is not accessed during the scheme. Also because the inner
product of subrows of L is calculated in (10.10), this scheme is called an inner-product
form. The inner-product form certainly is not the only way of computing the Cholesky
factor. As a matter of fact, the proof of Theorem 10.1 itself is a constructive proof.
It suggests a scheme called outer-product form of computing the rows of L one by
one. Details of this new scheme will be provided in the exercises. As to the detailed
implementation, the inner-product form scheme can be easily coded as follows:
Algorithm C-1
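A minimal Python rendering of this inner-product scheme, per (10.9)-(10.10), might look as follows; this is our sketch, not the book's own listing.

```python
def cholesky(M):
    # inner-product form: L is built column by column via (10.9)-(10.10);
    # M is assumed symmetric positive definite
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for j in range(m):
        s = M[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = s ** 0.5
        for i in range(j + 1, m):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

L = cholesky([[4.0, 2.0], [2.0, 2.0]])   # gives [[2, 0], [1, 1]]
```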
Note that Algorithm C-1 is based on the fact that the inner products between the subrows of L and the lower triangular portion of M can be overwritten by the corresponding elements of L (once L is known, we do not need M any longer). To speed up the
performance, we may consider using a computer with vector processing capability. In
Sec. 10.2 The Cholesky Factorization Method 257
this environment, for $i = 1, \ldots, m$ and $j = 1, \ldots, i$, let $\mathbf{m}_{ij} = (m_{i1}, \ldots, m_{ij})^T$ be a column vector and consider the following coding, which overwrites $m_{ij}$ for $l_{ij}$:

Algorithm C-2

l_11 ← √m_11
for i = 2 to m
    l_i1 ← m_i1 / l_11
end
for j = 2 to m
    for i = j to m
        s ← m_ij − (m_i(j−1))^T (m_j(j−1))
        if i = j
            m_ij ← √s
        else
            m_ij ← s / l_jj
        end
    end
end
The difference between Algorithm C-1 and Algorithm C-2 may look subtle, but it clearly
illustrates that detailed implementation on the coding level could gain speed and reduce
the memory requirement. Another interesting issue to note here is that in Algorithms C-1
and C-2, the operations describing the vector inner products are on the subrows of a ma-
trix, which may perform less efficiently if the matrix elements are stored columnwise (as
in the case of FORTRAN). Therefore, if we choose to implement the algorithm in FORTRAN, the code has to be reorganized to allow the operations to access contiguous
memory locations and thereby cut down memory access time. As to C programming,
since the elements are stored rowwise, Algorithms C-1 and C-2 can be implemented
without any degradation in performance due to memory access.
Newer FORTRAN and C compilers also allow the so-called recursive functions
and subroutines. This is an interesting feature, where a function or subroutine can call
itself. This feature can be effectively used in Cholesky factorization. The way recursion
can be invoked varies from compiler to compiler for particular applications, and hence
is beyond our scope. The important message to a serious program developer is to study
the compiler before implementing any interior-point algorithm.
Another important aspect one should not leave out in this discussion is the block
Cholesky factorization.
Knowing that matrix operations can be highly parallelizable (several row or column
operations can be done simultaneously), when we are dealing with large-scale problems
with special block structure in the constraint matrix, we should further study the Cholesky
factorization algorithm.
$$M = \begin{bmatrix} M_{11} & \cdots & M_{1p} \\ \vdots & \ddots & \vdots \\ M_{p1} & \cdots & M_{pp} \end{bmatrix}$$
such that $m = pr$, where $r$ is referred to as the block size. The Cholesky factor of M can be partitioned accordingly as
$$L = \begin{bmatrix} L_{11} & & \\ \vdots & \ddots & \\ L_{p1} & \cdots & L_{pp} \end{bmatrix}$$
By directly equating $LL^T = M$ with block structure, we see that
$$M_{ij} = \sum_{k=1}^{j} L_{ik} L_{jk}^T$$
and hence
$$S_{ij} = M_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk}^T$$
Then $L_{11}$ is the Cholesky factor of $M_{11}$ and, for $p \ge i \ge j \ge 2$, $L_{jj}$ is the Cholesky factor of $S_{jj}$ while $L_{ij}$ is the solution of the matrix equation $Z L_{jj}^T = S_{ij}$. Hence a block Cholesky factorization scheme is obtained:
Algorithm C-3
Note that Algorithm C-3 may use Algorithm C-1 or C-2 to find the Cholesky factors of the block submatrices. Also note that since $L_{jj}^T$ is upper triangular, solving $Z L_{jj}^T = S_{ij}$ is relatively simple. The recursive subroutines can be used here quite efficiently, if
the compiler supports this feature. One key factor that affects the performance of block
Cholesky factorization is the choice of the block size r, which often needs careful thinking
and experimentation. The development of block Cholesky factorization algorithms and
their implementations, especially on vector/parallel processors, is an active research area.
For large-scale problems, it is quite possible that most elements of matrix M have zero
value. The sparsity is measured by the ratio between the number of nonzero elements
and the total number of elements in a matrix. When the sparsity ratio is relatively low,
say 0.01 or even smaller, we say the matrix is a sparse matrix. Otherwise, we have a
dense matrix. However, there is no clear-cut threshold sparsity ratio.
When sparse matrices are involved, it is no longer necessary to keep track of their
every element. Most attention needs to be focused on the "position and value" of nonzero
elements only. The techniques which help us manipulate the sparse matrix operations
are often called the sparse matrix techniques. Many books have been written on this
topic.
As to applying Cholesky factorization methods to a symmetric positive definite
sparse matrix M, the key concern is to prevent the Cholesky factor L from being dense.
The following example shows that a relatively sparse matrix M could have a relatively
dense Cholesky factor L.
Example 10.1
(from A. George and J. W. Liu): Let
$$M = \begin{bmatrix} 4 & 1 & 2 & 0.5 & 2 \\ 1 & 0.5 & 0 & 0 & 0 \\ 2 & 0 & 3 & 0 & 0 \\ 0.5 & 0 & 0 & 0.625 & 0 \\ 2 & 0 & 0 & 0 & 16 \end{bmatrix}$$
Applying Algorithm C-1, we have
$$L = \begin{bmatrix} 2 & & & & \\ 0.5 & 0.5 & & & \\ 1 & -1 & 1 & & \\ 0.25 & -0.25 & -0.5 & 0.5 & \\ 1 & -1 & -2 & -3 & 1 \end{bmatrix}$$
Note that for any permutation matrix P, the matrix $PMP^T$ is still symmetric and positive definite. It is also interesting to note that, in $PMP^T$, P permutes the rows of M and $P^T$ permutes the columns of M.
Example 10.2
For Example 10.1, if we choose a permutation matrix
$$P = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}$$
then the rows and columns of M are permuted as
$$PMP^T = \begin{bmatrix} 16 & 0 & 0 & 0 & 2 \\ 0 & 0.625 & 0 & 0 & 0.5 \\ 0 & 0 & 3 & 0 & 2 \\ 0 & 0 & 0 & 0.5 & 1 \\ 2 & 0.5 & 2 & 1 & 4 \end{bmatrix}$$
with Cholesky factor
$$\tilde{L} = \begin{bmatrix} 4 & & & & \\ 0 & 0.791 & & & \\ 0 & 0 & 1.73 & & \\ 0 & 0 & 0 & 0.707 & \\ 0.5 & 0.632 & 1.15 & 1.41 & 0.129 \end{bmatrix}$$
Compared to the Cholesky factor in Example 10.1, the new Cholesky factor is relatively sparse. Consequently, solving system (10.13) is more efficient than solving system (10.1).
With the abovementioned concept, we understand that the key issue in sparse Cholesky factorization is to find an appropriate permutation P for a given symmetric positive definite matrix M such that the number of fill-ins is minimized. Unfortunately, minimizing
fill-ins is not a simple problem in general. So far, only heuristics have been proposed
by various researchers to provide acceptable, but not necessarily optimal, results.
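The effect of reordering on fill-in can be checked numerically. The sketch below (ours) factors the matrix of Example 10.1 with the original ordering and with the reversed ordering of Example 10.2, and counts the nonzeros of each factor.

```python
def cholesky(M):
    # inner-product Cholesky factorization of a symmetric positive definite M
    m = len(M)
    L = [[0.0] * m for _ in range(m)]
    for j in range(m):
        L[j][j] = (M[j][j] - sum(L[j][k] ** 2 for k in range(j))) ** 0.5
        for i in range(j + 1, m):
            L[i][j] = (M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L

def nonzeros(L, tol=1e-12):
    # count nonzero entries in the lower triangle of L
    return sum(1 for i in range(len(L)) for j in range(i + 1) if abs(L[i][j]) > tol)

M = [[4.0, 1.0, 2.0, 0.5, 2.0],
     [1.0, 0.5, 0.0, 0.0, 0.0],
     [2.0, 0.0, 3.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.625, 0.0],
     [2.0, 0.0, 0.0, 0.0, 16.0]]
perm = [4, 3, 2, 1, 0]                          # the reversal used in Example 10.2
PM = [[M[p][q] for q in perm] for p in perm]    # P M P^T

dense = nonzeros(cholesky(M))    # original ordering: the factor fills in completely
sparse = nonzeros(cholesky(PM))  # reversed ordering: the factor stays sparse
```

With the original ordering all 15 lower-triangular entries are nonzero, while the reordered factor has only 9.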
The most popular fill-in reduction scheme is the so-called minimum degree reorder-
ing algorithm. Here we briefly outline the algorithm, leaving the reader to find detailed
explanations and theoretical insights in other books.
The algorithm may be best understood by graphical illustration. First of all, let us establish a relationship between graphs and matrices. For an (m × m)-dimensional matrix M, we define an ordered graph of M and denote it by $G^M$. In $G^M$, there are m nodes, and node i is connected to node j (where $i \ne j$) by a link if the (i, j)th element of M is not zero, i.e., $m_{ij} \ne 0$. Figure 10.1 illustrates this situation with the help of an example, where the off-diagonal nonzeros of M are depicted by asterisks.
help of an example, where the off-diagonal nonzeros of M are depicted by asterisks.
1
* *
2
*
* 3 * *
* 4 * *
5 *
* * * 6
7
* * *
M
Figure 10.1
Two nodes i and j are adjacent if they are connected by a link. The degree of
a node i, denoted by Deg (i), is defined to be the number of adjacent nodes of i, in
other words, the number of links connected to node i. For example, in Figure 10.1,
Deg (1) = 2, Deg (2) = 1, and Deg (7) = 3.
The idea of the minimum degree reordering algorithm is to eliminate a node with
the minimum degree from the graph, one at a time, until every node is eliminated. Once
a node is eliminated, the degree of nodes may change in the remaining graph and hence
needs to be updated. The node elimination sequence eventually suggests a candidate for the desired permutation matrix P.
We now describe the minimum degree reordering algorithm in terms of the graph
elimination model, where an elimination graph is defined as a graph which is subjected
to the elimination of selected nodes. Here we eliminate the nodes of $G^M$ one by one in a systematic order. At each step the resulting graph is labeled as $G_k^M$, where the subscript k denotes the step number.
At the end, a permuted ordering is obtained by pairing the step number k with the node number i eliminated at that step. Moreover, a permutation matrix P is generated by assigning $p_{ki} = 1$, for $k = 1, \ldots, m$, and all other elements being zero.
Notice that more than one node can assume the minimum degree in Step 2. Differ-
ent heuristics of node selection give different versions of the minimum degree reordering
algorithm. In the simple case without any further information, we may break ties arbi-
trarily.
The following example illustrates the minimum degree algorithm applied to the
example of Figure 10.1 with an arbitrary tie-breaker.
Example 10.3
k = 1, node selected = 2, min. degree = 1;  k = 2, node selected = 5, min. degree = 1.
Finally,
k = 7, node selected = 4, min. degree = 0.
The elimination sequence gives the reordering (node → new position k):
2 → 1,  5 → 2,  1 → 3,  7 → 4,  6 → 5,  3 → 6,  4 → 7
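The greedy elimination loop described above can be sketched as follows. The adjacency-set representation and the tie-breaking by node number are our choices, and the sample star graph is hypothetical, not the graph of Figure 10.1.

```python
def minimum_degree_order(adj):
    # greedy minimum degree elimination: repeatedly remove a node of smallest
    # degree (ties broken by node number); before removal, its neighbors are
    # pairwise connected, modeling the fill produced by elimination
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: (len(adj[u]), u))
        nbrs = adj.pop(v)
        for u in nbrs:
            adj[u].discard(v)
        for u in nbrs:                 # add fill links among the neighbors of v
            adj[u] |= nbrs - {u}
        order.append(v)
    return order

# a hypothetical star graph: node 1 linked to nodes 2, 3, and 4
order = minimum_degree_order({1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}})
```

The center of the star is deferred until its degree drops, so the leaves are eliminated first.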
When the Cholesky factorization method is applied to find moving directions at each iteration of the previously mentioned interior-point algorithms, we need to factorize $M = AD_kA^T$ repeatedly, where $D_k$ is a diagonal matrix with positive diagonal elements for each k. It would be awfully tedious if we had to permute every $AD_kA^T$ in order to reduce fill-ins.
Fortunately, closer observation indicates that although $AD_kA^T$ changes along with the value of $D_k$ at each iteration, the positions of the nonzero elements remain intact. This means the sparsity structure is preserved as in $AA^T$. Therefore, in the implementation
of an interior-point algorithm for large-scale problems, it is advantageous to perform a
symbolic factorization first. In this phase, we focus on $AA^T$ to analyze the positions in
which the nonzero entries of the computational result would occur. The minimum degree
reordering algorithm could be applied to reduce the fill-ins. Once this work is done, we
record the positions of nonzero elements as a template. Then, at each iteration, since
the positions of nonzero elements are known, we need only find the numerical value of
each nonzero element. Correspondingly, we may call it a numerical factorization phase.
Figure 10.2 illustrates this two-phase procedure using block diagrams.
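The symbolic phase can be sketched for the structural pattern alone (our illustration; each row of A is represented only by its set of nonzero column indices):

```python
def pattern_of_AAT(rows):
    # rows[i] is the set of column indices where row i of A is nonzero;
    # (A A^T)_{ij} is structurally nonzero iff rows i and j share a column
    m = len(rows)
    return {(i, j) for i in range(m) for j in range(m) if rows[i] & rows[j]}

# hypothetical 3 x 4 constraint matrix pattern
template = pattern_of_AAT([{0, 1}, {1, 2}, {3}])
```

The resulting template records once and for all which positions must be stored; at each interior-point iteration only the numerical values in those positions change.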
Once the Cholesky factor of matrix M is computed, solving system (10.1) is equivalent
to solving the triangular systems (10.3) and (10.4).
Figure 10.2 (block diagram of the two-phase procedure: minimum degree reordering, followed by symbolic Cholesky factorization, followed by numeric Cholesky factorization)
$$z_1 = v_1 / l_{11} \qquad (10.14a)$$
and
$$z_i = \left( v_i - \sum_{k=1}^{i-1} l_{ik} z_k \right) \Big/ l_{ii} \qquad \text{for } i = 2, \ldots, m \qquad (10.14b)$$
Algorithm F-1

z_1 = v_1 / l_11
for i = 2 to m
    s = 0
    for k = 1 to i − 1
        s ← s + l_ik z_k
    end
    z_i = (v_i − s) / l_ii
end
Similar to Algorithm C-1, since we access matrix L row by row, and inner products
of row vectors are involved in (10.14b), Algorithm F-1 can be modified for vector
processing. This scheme is certainly more appropriate, if matrix L is stored rowwise
(like C programming). We leave this to the reader.
If matrix L is stored column by column and sparsity of the solution vector is being
considered, the following coding scheme is more efficient:
Algorithm F-2

for i = 1 to m
    z_i = v_i / l_ii
    for k = (i + 1) to m
        v_k ← v_k − z_i l_ki
    end
end
Sec. 10.3 The Conjugate Gradient Method 265
The reader is asked to verify that Algorithm F-2 solves system (10.3) and accesses
the matrix L column by column. Note that if vi turns out to be zero at the beginning
of the ith step, then Zi must be zero and the inner loop can be completely skipped.
Hence the sparsity issue is exploited. When a columnwise storage scheme (for example,
FORTRAN) is used, Algorithm F-2 is more efficient.
Backward solve. On the other hand, since $L^T$ is an upper triangular matrix, system (10.4) can be solved by getting $u_m$ from the last equation, $u_{m-1}$ from the second last, ..., and $u_1$ from the first. This forms a backward solve procedure. To be more specific, we have
$$u_m = z_m / l_{mm} \qquad (10.15a)$$
and
$$u_{m-i} = \left( z_{m-i} - \sum_{k=m-i+1}^{m} l_{k(m-i)} u_k \right) \Big/ l_{(m-i)(m-i)} \qquad \text{for } i = 1, \ldots, m-1 \qquad (10.15b)$$
Algorithm B-1

u_m = z_m / l_mm
for i = 1 to m − 1
    s = 0
    for k = m − i + 1 to m
        s ← s + l_k(m−i) u_k
    end
    u_{m−i} = (z_{m−i} − s) / l_(m−i)(m−i)
end
Similar to Algorithm F-1, other coding schemes are available for further consideration. Algorithm B-1 is only one of the simple implementations.
The forward solve and backward solve together with Cholesky factorization have
become the most popular method used by many interior-point algorithms for solving
system (10.1). Other methods, including the conjugate gradient method and the LQ
factorization method, will be introduced in subsequent sections.
In addition to the Cholesky factorization method, the conjugate gradient method can also
be applied to solve system (10.1) with a symmetric positive definite matrix M. The
method was originally suggested by M. R. Hestenes and E. Stiefel in 1952. Like the
steepest descent method, it is classified as an error correction method, which means
that the algorithm starts with an approximated solution (say uk), evaluates an error
function, and then iterates along a direction (say dk) with an appropriate step-length to
reduce the error. Instead of moving directly along the negative gradient direction for
a maximum reduction of the error function, the moving directions are required to be
mutually conjugate with respect to the matrix M, i.e.,
$$(d^k)^T M d^j = 0 \qquad \text{for } k \ne j \qquad (10.16)$$
Figure 10.3 (constant cost surfaces of $h_k$; from the current point $u^k$, the steepest descent direction is shown, and the actual solution u lies at the global minimum of $h_k$)
Note that the steepest descent method suggests that we consider using the negative gradient of $h_k$ with respect to $u^k$ as $d^k$. In this case, we have
$$-\frac{dh_k}{du^k} = -2(Mu^k - v) = 2r^k \qquad (10.22)$$
which means the negative gradient of $h_k$ is proportional to the residual $r^k$. Therefore, the residual vector can be used in place of $d^k$. With this in mind, the conjugate gradient method intends to stay close to $r^k$ while satisfying the conjugacy requirements (10.16). Therefore, when $u^0$ is arbitrarily chosen, we can take $d^0 = r^0$. After that, we may consider taking $d^k$ as the component of $r^k$ orthogonal to $Md^{k-1}$, for $k \ge 1$. In this way, we define
$$d^k = r^k - \frac{(r^k)^T M d^{k-1}}{(d^{k-1})^T M d^{k-1}}\, d^{k-1} \qquad (10.23)$$
The iterates are then updated by
$$u^{k+1} = u^k + \alpha_k d^k, \qquad r^{k+1} = r^k - \alpha_k p^k$$
where $p^k = Md^k$.
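Putting the recurrences together gives the classic Hestenes-Stiefel iteration. The sketch below is a standard textbook rendering (ours, not the book's own code); in exact arithmetic it terminates in at most n steps.

```python
def conjugate_gradient(M, v, u0, tol=1e-10):
    # solve M u = v for symmetric positive definite M: d0 = r0, and each new
    # direction is the component of the residual conjugate to its predecessor
    mv = lambda x: [sum(a * b for a, b in zip(row, x)) for row in M]
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    u = list(u0)
    r = [vi - yi for vi, yi in zip(v, mv(u))]   # initial residual r0 = v - M u0
    d = list(r)
    for _ in range(len(v)):
        p = mv(d)                               # p^k = M d^k
        alpha = dot(r, r) / dot(d, p)
        u = [ui + alpha * di for ui, di in zip(u, d)]
        r_new = [ri - alpha * pi for ri, pi in zip(r, p)]   # r^{k+1} = r^k - alpha p^k
        if dot(r_new, r_new) ** 0.5 < tol:
            break
        beta = dot(r_new, r_new) / dot(r, r)
        d = [ri + beta * di for ri, di in zip(r_new, d)]
        r = r_new
    return u

u = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
# the exact solution of this 2 x 2 system is (1/11, 7/11)
```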
where B is symmetric and positive definite with all eigenvalues being positive and less than one, then the series
$$\sum_{k=0}^{\infty} B^k = I + B + B^2 + \cdots \qquad (10.27)$$
is convergent and
$$M^{-1} = (I - B)^{-1} = \sum_{k=0}^{\infty} B^k \qquad (10.28)$$
Therefore the required matrix inversion can be replaced by matrix multiplications and additions.
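The idea can be sketched as follows (our illustration; a diagonal B is chosen so the exact inverse is easy to check):

```python
def power_series_solve(B, v, r):
    # approximate (I - B)^{-1} v by P_r(B) v = v + B v + B^2 v + ... + B^r v,
    # valid when every eigenvalue of B lies strictly between 0 and 1
    mv = lambda x: [sum(a * b for a, b in zip(row, x)) for row in B]
    u, term = list(v), list(v)
    for _ in range(r):
        term = mv(term)                       # next term B^k v
        u = [ui + ti for ui, ti in zip(u, term)]
    return u

# for this diagonal B, the exact answer is (I - B)^{-1} v = (2, 4/3)
u = power_series_solve([[0.5, 0.0], [0.0, 0.25]], [1.0, 1.0], r=60)
```

Note that only matrix-vector products are needed, never an explicit inverse, which is exactly why this scheme is attractive for large sparse systems.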
However, for a general linear programming problem, the fundamental matrix $M = AD_k^2A^T$ does not necessarily fit this scheme, unless the constraint matrix A is properly manipulated. In this section we introduce the LQ factorization method to achieve this purpose. For simplicity, we focus on the implementation of the primal affine scaling algorithm and leave other algorithms to the reader.
To begin with, we define an (m × n)-dimensional matrix Q to be orthonormal if $QQ^T = I$. Moreover, the norm of matrix Q is defined by
$$\|Q\| = \max_{\|y\|=1} \|Qy\|$$
Similarly,
For an orthonormal matrix Q with full row-rank m < n, we have the following special
property of its matrix norm.
=max
IIYII=l 11
Q(y) II
-T
0
--IIYII=l
max II ( y) II-
0 - 1
and we are done.
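This property is easy to check numerically (an illustrative sketch of ours, with a row-orthonormal Q built from a QR factorization):

```python
import numpy as np

# Build a row-orthonormal Q (Q Q^T = I, full row rank m < n) and check
# that its spectral norm max_{||y||=1} ||Q y|| equals 1.
rng = np.random.default_rng(2)
m, n = 3, 7
Qcols, _ = np.linalg.qr(rng.standard_normal((n, m)))  # n x m, orthonormal columns
Q = Qcols.T                                           # m x n, Q Q^T = I_m
orth_defect = np.linalg.norm(Q @ Q.T - np.eye(m))
norm_Q = np.linalg.norm(Q, 2)   # spectral norm of Q
```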
Note that for a linear programming problem in its standard form, i.e.,

    Minimize    c^T x
    subject to  A x = b,  x ≥ 0

we may factor the constraint matrix as A = LQ, where L is an (m x m) lower triangular
matrix and Q is an (m x n) orthonormal matrix.
Once the LQ factorization is done, the original linear programming problem can be
expressed as
Minimize cT x (10.30a)
subject to Q x = b',  x ≥ 0                                          (10.30b)

where b' = L^{-1} b (which can be obtained by applying a forward solve to the system
L b' = b).
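One way to obtain such a factorization in practice is from a QR factorization of A^T, since A^T = Q_1 R implies A = R^T Q_1^T with L = R^T lower triangular and Q = Q_1^T orthonormal. A hedged Python sketch (our construction on dense toy data; production codes would use sparse-aware routines):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6
A = rng.standard_normal((m, n))

Q1, R = np.linalg.qr(A.T)        # A^T = Q1 R with Q1 (n x m), R (m x m)
L, Q = R.T, Q1.T                 # A = L Q, L lower triangular, Q Q^T = I_m

b = rng.standard_normal(m)
b_prime = np.linalg.solve(L, b)  # forward solve L b' = b

lq_err = np.linalg.norm(L @ Q - A)
orth_err = np.linalg.norm(Q @ Q.T - np.eye(m))
```

Any feasible x then satisfies Q x = L^{-1} A x = L^{-1} b = b', which is exactly (10.30b).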
As mentioned earlier, we shall focus on the primal affine scaling algorithm and
leave similar development of other interior-point algorithms to the reader. Remember
that in each iteration of the primal affine scaling algorithm, most computational time is
spent on finding the dual estimate wk for the moving direction dk. To be more precise,
for the problem (10.30), we need to compute
    w^k = (Q X_k^2 Q^T)^{-1} Q X_k^2 c                               (10.31)
where Xk is a diagonal matrix formed by the current primal solution xk at the kth iteration.
Or, equivalently, wk is a solution vector to the system
    M u = v                                                          (10.32)

where M = Q X_k^2 Q^T and v = Q X_k^2 c.
Now, since Q is an orthonormal matrix, we would like to show that (Q X_k^2 Q^T)^{-1}
can be represented as a convergent power series in matrix form. Then we can compute
wk according to the basic idea mentioned at the beginning of this section. To achieve
this objective, let us focus on the kth solution x^k > 0 and choose

    a > λ_max                                                        (10.33)

where λ_max and λ_min denote the largest and smallest diagonal elements of X_k^2.
In this way, λ_max, λ_min > 0. Moreover, for any a > λ_max, we may write

    (Q X_k^2 Q^T)^{-1} = (1/a) (I - B)^{-1},
    where B = I - (1/a) Q X_k^2 Q^T = Q (I - X_k^2/a) Q^T            (10.34)

Since the largest eigenvalue of B (often called the spectral radius of B and denoted by ρ(B))
satisfies

    ρ(B) = ||B|| = ||Q (I - X_k^2/a) Q^T|| ≤ 1 - λ_min/a < 1         (10.36)

the power series (10.27) applied to this B is convergent.
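The claim ρ(B) < 1 can be verified numerically (our own sketch; the factor 1.1 is an arbitrary choice of a > λ_max):

```python
import numpy as np

# With a larger than the biggest diagonal entry of X_k^2, the matrix
# B = I - (1/a) Q X_k^2 Q^T has spectral radius strictly below one.
rng = np.random.default_rng(4)
m, n = 3, 6
Q1, _ = np.linalg.qr(rng.standard_normal((n, m)))
Q = Q1.T                                # row-orthonormal, Q Q^T = I_m
xk = rng.uniform(0.1, 2.0, size=n)      # current interior point x^k > 0
Xk2 = np.diag(xk**2)
a = 1.1 * np.max(xk**2)                 # a > lambda_max
B = np.eye(m) - (Q @ Xk2 @ Q.T) / a
rho = np.max(np.abs(np.linalg.eigvalsh(B)))   # spectral radius (B symmetric)
```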
Observe that the matrix power series (10.27) can be approximated by a matrix polynomial
Pr(B) of degree r > 0, i.e.,
    (I - B)^{-1} = P_r(B) + E_r(B)                                   (10.37)

where

    P_r(B) = I + B + B^2 + ... + B^r                                 (10.38)

and

    E_r(B) = ∑_{k=r+1}^∞ B^k                                         (10.39)
By (10.37), we have

    ||E_r(B)|| = ||(I - B)^{-1} - P_r(B)|| ≤ ∑_{k=r+1}^∞ ||B||^k = ||B||^{r+1} / (1 - ||B||)    (10.40)

Since ||B|| = ρ(B), (10.40) implies that

    ||E_r(B)|| ≤ ρ(B)^{r+1} / (1 - ρ(B))                             (10.41)

Consequently, for an arbitrarily small tolerance level ε > 0, in order to obtain a good
matrix approximation with ||E_r(B)|| ≤ ε, we simply have to choose an r large enough
such that

    r ≥ ⌈ log([1 - ρ(B)]ε) / log[ρ(B)] - 1 ⌉                         (10.42)
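The truncation degree dictated by (10.42) can be computed directly (a small sketch of ours):

```python
import math

# Smallest integer r with rho^{r+1} / (1 - rho) <= eps, i.e.
# r >= log((1 - rho) * eps) / log(rho) - 1, as in (10.42).
def truncation_degree(rho, eps):
    return math.ceil(math.log((1.0 - rho) * eps) / math.log(rho) - 1.0)

rho, eps = 0.9, 1e-6
r = truncation_degree(rho, eps)
tail_bound = rho ** (r + 1) / (1.0 - rho)
```

Note how quickly r grows as ρ(B) approaches 1, which is the drawback discussed below.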
Algorithm LQ-1
Repeat:
Use "backward solve" to compute u^{(j-1)} for the system of equations
L^T u^{(j-1)} = s^{(j-1)}
j ← j + 1
until j = r + 1.
274 Implementation of Interior-Point Algorithms Chap. 10
With this modification, since A and L are sparse, the sparsity issue of Q is bypassed.
Any sparse matrix multiplication technique can be used here. Also note that, in Step 2,
we no longer have to compute B, and s^{(0)} = w^{(0)} = L^{-1} A X_k^2 c.
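The point of the modification is that products with Q = L^{-1}A and Q^T = A^T L^{-T} can be applied through triangular solves with the sparse factor L, so the possibly dense Q is never formed. A hedged sketch (our own dense toy example; here Q is formed explicitly only to check the answer):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 6
A = rng.standard_normal((m, n))
Q1, R = np.linalg.qr(A.T)
L, Q = R.T, Q1.T                      # A = L Q; Q kept for comparison only

x = rng.standard_normal(n)
u = rng.standard_normal(m)
Qx = np.linalg.solve(L, A @ x)        # Q x   via forward solve  L y = A x
QTu = A.T @ np.linalg.solve(L.T, u)   # Q^T u via backward solve L^T z = u

err1 = np.linalg.norm(Qx - Q @ x)
err2 = np.linalg.norm(QTu - Q.T @ u)
```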
Another potential drawback of Algorithm LQ-1 is the large value of r
required by the algorithm. In theory, we know that as λ_min/λ_max → 0 or, equivalently,
as ρ(B) → 1, the value of r approaches infinity. This means that more and more
iterations of Step 3 are needed, which results in inefficient computation. This inevitably
happens in the interior-point methods, because when we approach an optimal vertex,
even from the interior, λ_min/λ_max still approaches 0.
To overcome this potential problem, we may consider a fixed-point scheme. From
(10.31)-(10.34), we know that
    w^k = (Q X_k^2 Q^T)^{-1} Q X_k^2 c = (1/a) (I - B)^{-1} v

where B = Q (I - X_k^2/a) Q^T and v = Q X_k^2 c. If we further denote w = a w^k, then
w = (I - B)^{-1} v. This implies that

    w = B w + v                                                      (10.43)

In other words, w is a fixed-point solution of (10.43). Hence we can replace Algorithm
LQ-1 by the following iterative scheme:
Algorithm LQ-2
Step 1: Compute B = Q (I - X_k^2/a) Q^T.
Step 2: Set j = 1 and select an arbitrary w^{(0)}.
Repeat:
    w^{(j)} = B w^{(j-1)} + v
    j ← j + 1
until ||w^{(j)} - w^{(j-1)}|| ≤ ε.
Step 3: Assign
    w^k = (1/a) w^{(j)}
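The core of Step 2 is a classical fixed-point iteration w ← Bw + v, which converges whenever ρ(B) < 1. An illustrative sketch (our own toy data, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(5)
Qf, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = Qf @ np.diag([0.2, 0.4, 0.5, 0.7]) @ Qf.T   # symmetric, rho(B) = 0.7
v = rng.standard_normal(4)

w = np.zeros(4)                    # arbitrary starting point w^(0)
for _ in range(300):
    w_next = B @ w + v             # w^(j) = B w^(j-1) + v
    if np.linalg.norm(w_next - w) <= 1e-12:
        w = w_next
        break
    w = w_next

# distance to the true fixed point (I - B)^{-1} v
fp_err = np.linalg.norm(w - np.linalg.solve(np.eye(4) - B, v))
```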
Notice that Algorithm LQ-2 allows an arbitrary starting point w^{(0)}. In practice, when
the primal affine scaling algorithm converges, the dual solution w^k varies little from
iteration to iteration, so the previous dual estimate serves as a good starting point.
Sec. 10.5 Concluding Remarks 275
The proof also shows that the rate of convergence of Algorithm LQ-2 is at least linear
in p(B). As a final remark, it is noteworthy that the iterative scheme presented in
Algorithm LQ-2 may be accelerated. Let 0 < θ < 1 be arbitrary and consider Step 2 in
Algorithm LQ-2 being modified as

    w^{(j)} = θ w^{(j-1)} + (1 - θ) [B w^{(j-1)} + v]

It can be shown that the sequence {w^{(j)}: j = 0, 1, 2, ...} generated by the modified
Algorithm LQ-2 is also a Cauchy sequence. Its convergence rate is at least linear in
ρ(θI + (1 - θ)B). This could improve the rate of convergence. But it is not an easy
problem to find an optimal θ for a general setting.
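The modified step can be sketched as follows (our own toy example; the iteration matrix becomes θI + (1 − θ)B, whose spectral radius governs the rate):

```python
import numpy as np

rng = np.random.default_rng(6)
Qf, _ = np.linalg.qr(rng.standard_normal((4, 4)))
B = Qf @ np.diag([0.2, 0.4, 0.5, 0.7]) @ Qf.T
v = rng.standard_normal(4)
theta = 0.3                        # arbitrary relaxation parameter in (0, 1)

w = np.zeros(4)
for _ in range(600):
    # w^(j) = theta*w^(j-1) + (1 - theta)*(B w^(j-1) + v)
    w = theta * w + (1.0 - theta) * (B @ w + v)

# same fixed point (I - B)^{-1} v, rate governed by rho(theta*I + (1-theta)*B)
relaxed_err = np.linalg.norm(w - np.linalg.solve(np.eye(4) - B, v))
rho_relaxed = np.max(np.abs(np.linalg.eigvalsh(theta * np.eye(4) + (1 - theta) * B)))
```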
A more general model that accelerates convergence of the infinitely summable series by
means of Chebyshev approximation is also under current investigation.
In this chapter we have introduced three methods to overcome the computational bottleneck
of implementing interior-point algorithms. While all three methods (Cholesky, CG,
and LQ) appear promising, the most popular is the Cholesky factorization method. This
is partially due to the fact that a vast amount of literature is available in this area and
it is numerically stable. Moreover, any floating-point exceptions can be easily "tracked"
and "trapped" in the implementations of Cholesky factorization.
Sparsity considerations are very important in solving large-scale problems. In ad-
dition to what has been introduced in this chapter, one idea is to split the constraint
matrix into two parts, a dense part and a sparse part, and treat them separately. For the
sparse part we may use sparse matrix techniques, such as the ones described previously,
while a low rank updating scheme or the conjugate gradient method may be used for the
dense part. This approach could result in substantial reduction in computer run-time for
a number of cases. It is often referred to as the column dropping technique. However, it
requires us to take extreme care to ensure the robustness of the implemented software.
This has been recognized as a challenging task. The major problem is numerical instability.
Even though the original system is nonsingular, the subsystem of equations to
be solved with some columns deleted may turn out to be singular. Another idea is to
solve dual problems as if they were primal. Under this situation, as long as the original
problem does not contain both dense rows and dense columns, we may still be able to
take advantage of one formulation.
Other techniques that have been tested for the implementation of interior-point
methods include (1) matrix reduction via the elimination of singleton columns and corre-
sponding variables, (2) fixing variables to their bounds, whenever applicable, (3) scaling
the data to render numerical stability in computation, and (4) using various acceleration
techniques applied to Cholesky factorization. The authors' limited experience with these
techniques indicates that little improvement can be obtained for overall efficiency in
solving real-life problems.
Finally, we want to mention that there have been attempts to apply decomposition
principles in conjunction with interior-point methods to solve large-scale problems. Ba-
sically, a large-scale problem is decomposed into a restricted master problem and several
subproblems, and then interior-point methods are applied. However, as far as the authors
know, there is no evidence showing any significant improvement.
REFERENCES

10.1. Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "An implementation of Kar-
markar's algorithm for linear programming," Mathematical Programming 44, 297-335 (1989).
10.2. Adler, I., Karmarkar, N., Resende, M.G. C., and Veiga, G., "Data structures and programming
techniques for the implementation of Karmarkar's algorithm," ORSA Journal of Computing
1, 84-106 (1989).
10.3. Cheng, Y-C., Houck, D. J., Jr., Meketon, M.S., Slutsman, L., Vanderbei, R. J., and Wang, P.,
"The AT&T KORBX® System," AT&T Technical Journal 68, No. 3, 7-19 (1989).
10.4. Duff, I. S., Erisman, A. M., and Reid, J. K., Direct Methods for Sparse Matrices, Clarendon
Press, Oxford (1986).
10.5. Gay, D., "Massive memory buys little speed for complete in-core sparse Cholesky factoriza-
tions," Technical Report, AT&T Bell Laboratories (1988).
10.6. George, A., and Liu, J. W., Computer Solution of Large Positive Definite Systems, Prentice-
Hall, Englewood Cliffs, NJ (1981).
10.7. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Optimization,
Vol. 1, Addison-Wesley, Redwood City, CA (1991).
10.8. Golub, G. H., and Van Loan, C. F., Matrix Computations, Johns Hopkins University Press,
Baltimore, MD (1983).
10.9. Hestenes, M. R., and Stiefel, E., "Methods of conjugate gradients for solving linear systems,"
Journal of Research of the National Bureau of Standards 49, 409-436 (1952).
10.10. Housos, E. C., Huang, C. C., and Liu, J. M., "Parallel algorithms for the AT&T KORBX®
System," AT&T Technical Journal 68, No. 3, 37-47 (1989).
10.11. Markowitz, H. M., "The elimination form of the inverse and its application to linear pro-
gramming," Management Science 3, 255-269 (1957).
10.12. Pan, V., "How can we speed up matrix multiplications?", SIAM Review 26, 393-415 (1984).
10.13. Pissanetzky, S., Sparse Matrix Technology, Academic Press, New York (1984).
10.14. Puthenpura, S., Saigal, R., and Sinha, L. P., "Application of LQ factorization in implement-
ing the Karmarkar algorithm and its variants," Technical Memorandum, No. 51173-900205-
01TM, AT&T Bell Laboratories (1990).
10.15. Saigal, R., "An infinitely summable series implementation of interior point methods," Technical
Report 92-37, Department of Industrial and Operations Engineering, University of Michigan,
Ann Arbor, May (1992).
10.16. Saigal, R., "Matrix partitioning methods for interior point algorithms," Technical Report
92-39, Department of Industrial and Operations Engineering, University of Michigan, Ann
Arbor, June (1992).
10.17. Vanderbei, R. J., "An implementation of the minimum-degree algorithm using simple data
structures," Technical Memorandum, No. 11212-900115-02TM, AT&T Bell Laboratories
(1990).
10.18. Vanderbei, R. J., "ALPO: Another linear program solver," Technical Memorandum, No.
11212-900522-18TM, AT&T Bell Laboratories (1990).
10.19. Van Loan, C., "A survey of matrix computations," Technical Report, Cornell University
(1990).
10.20. Wilkinson, J. H., The Algebraic Eigenvalue Problem, Oxford University Press (1965).
EXERCISES
10.7. Given the matrix

    A = [  1   2  -2 ]
        [ -1   3   5 ]

find the Cholesky factor of AA^T.
10.8. Let A be an (m x n)-dimensional matrix (say m > n). Also let B be an n x m matrix such
that BA = I. Show that B exists if and only if the columns of A are linearly independent.
Furthermore, show that B is unique if and only if the rows of A are linearly independent.
Construct such a B for a given A. [Note: B is called the "left inverse" of A.]
10.9. Consider the matrix M in Example 10.1. Compute its Cholesky factor L and then apply
the minimum degree reordering to recompute the Cholesky factor. Remember to compare
these two results.
10.10. Verify that Algorithm F-2 gives the correct answer and accesses L column by column.
10.11. Prove that dk, fork = 1, 2, ... , obtained by (10.23) and (10.24), indeed are mutually
conjugate with respect to matrix M.
10.12. For the matrix A of Exercise 10.7, perform LQ factorization to get the corresponding
matrix Q.
10.13. Provide a simple example, say 1 ≤ m < n ≤ 10, such that A is relatively sparse but Q is
very dense.
10.14. Consider the fixed-point iteration scheme in connection with the LQ factorization tech-
nique, where

    w^{(j)} = θ w^{(j-1)} + (1 - θ) [B w^{(j-1)} + v]

Let κ_max and κ_min be the largest and smallest eigenvalues of B. Show that

    ρ(θI + (1 - θ)B) < ρ(B)   if and only if   0 > θ > -κ_min / (1 - κ_min)
10.15. An advantage of the C programming language is its dynamic memory allocation,
which substantially reduces memory requirements for the implementation of interior-
point algorithms. If you know what "dynamic memory allocation" is, discuss the way
of implementing the minimum degree reordering, the Cholesky factorization, and the
forward/backward solves with the dynamic memory allocation scheme.
Bibliography
1. Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "An implementation of
Karmarkar's algorithm for linear programming," Mathematical Programming 44,
297-335 (1989).
2. Adler, I., Karmarkar, N., Resende, M. G. C., and Veiga, G., "Data structures and
programming techniques for the implementation of Karmarkar's algorithm," ORSA
Journal of Computing 1, 84-106 (1989).
3. Adler, I., and Resende, M. G. C., "Limiting behavior of the affine scaling contin-
uous trajectories for linear programming problems," Mathematical Programming
50, 29-51 (1991).
4. Anstreicher, K. M., "A monotonic projective algorithm for fractional linear pro-
gramming," Algorithmica 1, 483-498 (1986).
5. Anstreicher, K. M., "Linear programming and the Newton barrier flow," Mathe-
matical Programming 41, 363-373 (1988).
6. Anstreicher, K. M., "The worst-case step in Karmarkar's algorithm," Mathematics
of Operations Research 14, 294-302 (1989).
7. Anstreicher, K. M., "A combined phase I-phase II projective algorithm for linear
programming," Mathematical Programming 43, 209-223 (1989).
8. Anstreicher, K. M., "A standard form variant, and safeguarded linesearch, for the
modified Karmarkar's algorithm," Mathematical Programming 47, 337-351 (1990).
9. Anstreicher, K. M., "On the performance ofKarmarkar's algorithm over a sequence
of iterations," SIAM Journal on Optimization 1, 22-29 (1991).
10. Anstreicher, K. M., and Bosch, R. A., "Long steps in an O(n^3 L) algorithm for
linear programming," Mathematical Programming 54, 251-265 (1992).
11. Asic, M. D., Kovacevic-Vujcic, V. V., and Radosavljevic-Nikolic, M. D., "Asymp-
totic behavior of Karmarkar's method for linear programming," Mathematical Pro-
gramming 46, 173-190 (1990).
12. Balinski, M. L., and Gomory, R. E., "A mutual primal-dual simplex method," in
Recent Advances in Mathematical Programming, ed. R. L. Graves and P. Wolfe,
McGraw-Hill, New York (1963).
13. Balinski, M. L., and Tucker, A. W., "Duality theory of linear programs: A con-
structive approach with applications," SIAM Review 11, 347-377 (1969).
14. Barnes, J. W., and Crisp, R. M., "Linear programming: A survey of general purpose
algorithms," AIIE Transactions 7, No. 3, 49-63 (1975).
15. Barnes, E. R., "A variation of Karmarkar's algorithm for solving linear program-
ming problems," Mathematical Programming 36, 174-182 (1986).
16. Bartels, R. H., and Golub, G. H., "The simplex method for linear programming
using LU decomposition," Communications of the ACM 12, 266-268 (1969).
17. Bartholomew, J. W. H., Chandawarkar, A. S., and Puthenpura, S.C., "SIRCIT: A
system for AT&T overseas network planning," Proceedings of INFOCOM '90, San
Francisco (1990).
18. Bayer, D., and Lagarias, J. C., "The nonlinear geometry of linear programming,
I. Affine and projective scaling trajectories, II. Legendre transform coordinates
and central trajectories," Transactions of the American Mathematical Society 314,
499-581 (1989).
19. Bayer, D., and Lagarias, J. C., "Karmarkar's linear programming algorithm and
Newton's method," Mathematical Programming 50, 291-330 (1991).
20. Bazaraa, M. S., Jarvis, J. J., and Sherali, H. D., Linear Programming and Network
Flows, 2d ed., John Wiley, New York (1990).
21. Beale, E. M. L., "Cycling in the dual simplex algorithm," Naval Research Logistics
Quarterly 2, 269-276 (1955).
22. Ben Daya, M., and Shetty, C. M., "Polynomial barrier function algorithms for
convex quadratic programming," Report J 88-5, School of Industrial and Systems
Engineering, Georgia Institute of Technology (1988).
23. Blair, C. E., "The iterative step in the linear programming algorithm of N. Kar-
markar," Algorithmica 1, 537-539 (1987).
24. Bland, R. G., "New finite pivoting rules for the simplex method," Mathematics of
Operations Research 2, 103-107 (1977).
25. Bland, R. G., Goldfarb, D., and Todd, M. J., "The ellipsoid method: A survey,"
Operations Research 29, 1039-1091 (1981).
26. Borgwardt, K. H., The Simplex Method: A Probabilistic Analysis, Springer-Verlag,
Berlin (1987).
27. Burrell, B. P., and Todd, M. J., "The ellipsoid method generates dual variables,"
Mathematics of Operations Research 10, 688-700 (1985).
28. Carolan, W. J., Hill, J. E., Kennington, J. L., Niemi, S., and Wichmann, S. J., "An
empirical evaluation of the KORBX algorithm for military airlift applications,"
Operations Research 38, 240-248 (1990).
29. Cavalier, T. M., and Soyster, A. L., "Some computational experience and a modi-
fication of the Karmarkar algorithm," presented at the 12th Symposium on Math-
ematical Programming, Cambridge, MA (1985).
30. Charnes, A., "Optimality and degeneracy in linear programming," Econometrica
20, 160-170 (1952).
31. Cheng, Y-C., Houck, D. J., Jr., Meketon, M. S., Slutsman, L., Vanderbei, R. J.,
and Wang, P., "The AT&T KORBX® System," AT&T Technical Journal 68, No. 3,
7-19 (1989).
32. Choi, I. C., Monma, C. L., and Shanno, D. F., "Further development of a primal-
dual interior point method," ORSA Journal on Computing 2, No. 4 (1990).
33. Chvatal, V., Linear Programming, Freeman, San Francisco (1983).
34. Dantzig, G. B., "Maximization of a linear function of variables subject to linear
inequalities," Activity Analysis of Production and Allocation, ed. T. C. Koopmans,
John Wiley, New York, 339-347 (1951).
35. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press,
Princeton, NJ (1963).
36. Dantzig, G. B., and Wolfe, P., "Decomposition principle for linear programming,"
Operations Research 8, 101-111 (1960).
37. Dennis, J. E., Jr., Morshedi, A. M., and Turner, K., "A variable-metric variant of
Karmarkar's algorithm for linear programming," Mathematical Programming 39,
1-20 (1987).
38. Dikin, I. I., "Iterative solution of problems of linear and quadratic programming"
(in Russian), Doklady Akademiia Nauk USSR 174, 747-748, (English translation)
Soviet Mathematics Doklady 8, 674-675 (1967).
39. Ding, J., and Li, T. Y., "A polynomial-time predictor-corrector algorithm for a
class of linear complementarity problems," SIAM Journal on Optimization 1, 83-
92 (1991).
40. Duff, I. S., Erisman, A. M., and Reid, J. K., Direct Methods for Sparse Matrices,
Clarendon Press, Oxford (1986).
41. Erickson, J. R., "An iterative primal-dual algorithm for linear programming," Re-
port LITH-MAT-R-1985-10, Department of Mathematics, Linkoping University,
Sweden (1985).
42. Erlander, S., "Entropy in linear programs," Mathematical Programming 21, 137-
151 (1981).
43. Fang, S. C., "A new unconstrained convex programming approach to linear pro-
gramming," OR Report No. 243, North Carolina State University, Raleigh, NC
(1990), Zeitschrift für Operations Research 36, 149-161 (1992).
44. Fang, S. C., Puthenpura, S. C., Saigal, R., and Sinha, L. P., "Solving stochastic pro-
gramming problems via Kalman filter and affine scaling," Technical Memorandum,
No. 51173-900808-0lTM, AT&T Bell Laboratories (1990).
45. Fang, S. C., and Tsao, J. H-S., "Solving standard form linear programs via un-
constrained convex programming approach with a quadratically convergent global
algorithm," OR Report No. 259, North Carolina State University, Raleigh, NC
(1991).
46. Fang, S. C., and Tsao, J. H-S., "An unconstrained convex programming approach
to solving convex quadratic programming problems," OR Report No. 263, North
Carolina State University, Raleigh, NC (1992) (also to appear in Optimization).
47. Farkas, J., "Theorie der einfachen Ungleichungen," Journal für die reine und
angewandte Mathematik 124, 1-27 (1902).
48. Fiacco, A. V., and McCormick, G. P., Nonlinear Programming: Sequential Un-
constrained Minimization Techniques, John Wiley, New York (1968).
49. Frazer, R. J., Applied Linear Programming, Prentice-Hall, Englewood Cliffs, NJ
(1968).
50. Frisch, K. R., "The logarithmic potential method of convex programming," Tech-
nical Report, University Institute of Economics, Oslo, Norway (1955).
51. Freedman, B. A., Puthenpura, S. C., and Sinha, L. P., "A new Karmarkar-based
algorithm for optimizing convex, non-linear cost functions with linear constraints,"
Technical Memorandum, No. 54142-870217-01TM, AT&T Bell Laboratories
(1987).
52. Freund, R. M., "Polynomial-time algorithms for linear programming based only on
primal affine scaling and projected gradients of a potential function," Mathematical
Programming 51, 203-222 (1991).
53. Gacs, P., and Lovasz, L., "Khachian's algorithm for linear programming," Mathe-
matical Programming Study 14, 61-68 (1981).
54. Garcia, C. B., and Zangwill, W. I., Pathways to Solutions, Fixed Points, and Equi-
libria, Prentice Hall, Englewood Cliffs, NJ (1981).
55. Gass, S. I., Linear Programming: Methods and Applications, 2d ed., McGraw-Hill,
New York (1964).
56. Gay, D., "A variant of Karmarkar's linear programming algorithm for problems in
standard form," Mathematical Programming 37, 81-90 (1987).
57. Gay, D., "Massive memory buys little speed for complete in-core sparse Cholesky
factorizations," Technical Report, AT&T Bell Laboratories (1988).
58. George, A., and Liu, J. W., Computer Solution of Large Positive Definite Systems,
Prentice Hall, Englewood Cliffs, NJ (1981).
59. de Ghellinck, G., and Vial, J.-P., "A polynomial Newton method for linear pro-
gramming," Algorithmica 1, 425-453 (1986).
60. de Ghellinck, G., and Vial, J.-P., "An extension of Karmarkar's algorithm for
solving a system of linear homogeneous equations on the simplex," Mathematical
Programming 39, 79-92 (1987).
61. Gill, P. E., Murray, W., Saunders, M. A., Tomlin, J. A., and Wright, M. H.,
"On projected barrier methods for linear programming and an equivalence to Kar-
markar's projective method," Mathematical Programming 36, 183-209 (1986).
62. Gill, P. E., Murray, W., and Wright, M. H., Numerical Linear Algebra and Opti-
mization, Vol. 1, Addison-Wesley, Redwood City, CA (1991).
63. Gilmore, P. C., and Gomory, R. E., "A linear programming approach to the cutting-
stock problem," Operations Research 9, 849-859 (1961).
64. Gilmore, P. C., and Gomory, R. E., "A linear programming approach to the cutting-
stock problem-Part II," Operations Research 11, 863-888 (1963).
65. Goldfarb, D., and Liu, S., "An O(n^3 L) primal interior point algorithm for convex
quadratic programming," Mathematical Programming 49, 325-340 (1991).
66. Goldfarb, D., and Mehrotra, S., "A relaxed version of Karmarkar's method," Math-
ematical Programming 40, 289-315 (1988).
67. Goldfarb, D., and Mehrotra, S., "Relaxed variants of Karmarkar's algorithm for
linear programming with unknown optimal objective value," Mathematical Pro-
gramming 40, 183-196 (1988).
68. Goldfarb, D., and Mehrotra, S., "A self-correcting version of Karmarkar's algo-
rithm," manuscript, IE&OR, Columbia University, New York (1988), to appear in
SIAM Journal of Numerical Analysis.
69. Goldfarb, D., and Todd, M. J., "Linear Programming," in Optimization, Hand-
book in Operations Research and Management Science, ed. Nemhauser, G. L.,
and Rinnooy Kan, A. H. G., Vol. 1, 73-170, Elsevier-North Holland, Amsterdam
(1989).
70. Golub, G. H., and Van Loan, C. F., Matrix Computations, Johns Hopkins University
Press, Baltimore (1983).
71. Gonzaga, C., "Conical projection algorithm for linear programming," Mathematical
Programming 43, 151-173 (1989).
72. Gonzaga, C., "An algorithm for solving linear programming problems in O(n^3 L)
operations," in Progress in Mathematical Programming: Interior-Point and Related
Methods, ed. N. Megiddo, Springer-Verlag, New York, 1-28 (1989).
73. Gonzaga, C., "Polynomial affine algorithms for linear programming," Mathematical
Programming 49, 7-21 (1990).
74. Gonzaga, C., "Large-step path-following methods for linear programming, Part I.
Barrier function method," SIAM Journal on Optimization 1, 268-279 (1991).
75. Gonzaga, C., "Large-step path-following methods for linear programming, Part II.
Potential reduction method," SIAM Journal on Optimization 1, 280-292 (1991).
76. Gonzaga, C., "Search directions for interior linear programming methods," Algo-
rithmica 6, 153-181 (1991).
77. Gonzaga, C., "Interior point algorithms for linear programming with inequality
constraints," Mathematical Programming 52, 209-225 (1991).
78. Grotschel, M., Lovasz, L., and Schrijver, A., The Ellipsoid Method and Combina-
torial Optimization, Springer-Verlag, Heidelberg (1988).
79. Güler, O., den Hertog, D., Roos, C., Terlaky, T., and Tsuchiya, T., "Degeneracy
in interior point methods for linear programming," Reports of the Faculty of Tech-
nical Mathematics and Informatics, No. 91-102, Delft University of Technology,
Netherlands, December (1991).
80. den Hertog, D., Roos, C., and Terlaky, T., "Inverse barrier methods for linear
programming," Report of the Faculty of Technical Mathematics and Informatics,
No. 90-27, Delft University of Technology, Netherlands (1990).
81. den Hertog, D., Roos, C., and Terlaky, T., "A complexity reduction for the long-step
path-following algorithm for linear programming," SIAM Journal on Optimization
2, 71-87 (1992).
82. Hestenes, M. R., and Stiefel, E., "Methods of conjugate gradients for solving linear
systems," Journal of Research of the National Bureau of Standards 49, 409-436 (1952).
83. Ho, J. K., "Recent advances in decomposition," Mathematical Programming Study
31, 119-128 (1987).
84. Hooker, J. N., "Karmarkar's linear programming algorithm," Interfaces 16, 75-90
(1986).
85. Housos, E. C., Huang, C. C., and Liu, J. M., "Parallel algorithms for the AT&T
KORBX® System," AT&T Technical Journal 68, No. 3, 37-47 (1989).
86. Huard, P., "Resolution of mathematical programming with nonlinear constraints by
the method of centers," in Nonlinear Programming, ed. J. Abadie, North-Holland,
Amsterdam, Holland, 207-219 (1967).
87. Ikura, Y., Freedman, B. A., and Sinha, L. P., "A new Karmarkar-based resource-
directed decomposition algorithm with application to large-scale operator schedul-
ing problems," paper presented at the 13th International Mathematical Symposium,
Tokyo, Japan (1988).
88. Iri, M., and Imai, H., "A multiplicative barrier function method for linear program-
ming," Algorithmica 1, 455-482 (1986).
89. Jan, G. M., and Fang, S. C., "A variant of primal affine scaling algorithm for linear
programs," Optimization 22, 681-715 (1991).
90. Jarre, F., "On the convergence of the method of analytical centers when applied to
convex programs," manuscript, Institut für Angewandte Mathematik und Statistik,
Universität Würzburg, Würzburg, Germany (1987).
91. Jarre, F., "The method of analytical centers for smooth convex programs," PhD
thesis, Institut für Angewandte Mathematik und Statistik, Universität Würzburg,
Würzburg, Germany (1987).
92. John, F., "Extremum problems with inequalities as subsidiary conditions," Studies
and Essays, Interscience, New York, 187-204 (1948).
93. Kallio, M., and Porteus, E. L., "A class of methods for linear programming,"
Mathematical Programming 14, 161-168 (1978).
94. Kantorovich, L. V., "Mathematical methods of organizing and planning produc-
tion" (in Russian), Publication House of the Leningrad State University, Leningrad
(1939), (English translation) Management Science 6, 366-422 (1959-60).
95. Kapoor, S., and Vaidya, P. M., "Fast algorithms for convex quadratic programming
and multicommodity flows," Proceedings of the 18th Annual Symposium on Theory
of Computing, Berkeley, CA, 147-159 (1986).
96. Karmarkar, N., "A new polynomial time algorithm for linear programming," Pro-
ceedings of the 16th Annual ACM Symposium on the Theory of Computing, 302-311
(1984).
97. Karmarkar, N., "A new polynomial time algorithm for linear programming," Com-
binatorica 4, 373-395 (1984).
98. Karmarkar, N., Lagarias, J. C., Slutsman, L., and Wang, P., "Power series variants
of Karmarkar-type algorithms," AT&T Technical Journal 68, No. 3, 20-36 (1989).
99. Karmarkar, N., and Ramakrishnan, K. G., "Implementation and computational re-
sults of Karmarkar's algorithm for linear programming, using an iterative method
for computing projections," extended abstract, presented at the 13th Symposium
on Mathematical Programming, Tokyo (1988).
100. Karmarkar, N., and Sinha, L. P., "Application of Karmarkar's algorithm to overseas
telecommunications network planning," paper presented at the 12th International
Symposium of Mathematical Programming, Boston (1985).
101. Karush, W., "Minima of functions of several variables with inequalities as side
constraints," Master's thesis, Department of Mathematics, University of Chicago
(1939).
102. Khachian, L. G., "A polynomial algorithm in linear programming" (in Russian),
Doklady Akademiia Nauk SSSR 224, 1093-1096, (English translation) Soviet Math-
ematics Doklady 20, 191-194 (1979).
103. Khachian, L. G., "Polynomial algorithms in linear programming" (in Russian),
Zhurnal Vychislitel'noi Matematiki i Matematicheskoi Fiziki 20, 51-68, (English
translation) USSR Computational Mathematics and Mathematical Physics 20,
53-72 (1980).
104. Klee, V., and Minty, G. L., "How good is the simplex algorithm?" in Inequalities
III, ed. O. Shisha, Academic Press, New York, 159-179 (1972).
105. Kojima, M., "Determining basic variables of optimal solutions in Karmarkar's new
LP algorithm," Algorithmica 1, 499-515 (1986).
106. Kojima, M., Mizuno, S., and Yoshise, A., "A primal-dual interior point method for
linear programming," in Progress in Mathematical Programming: Interior-Point
and Related Methods, ed. N. Megiddo, Springer-Verlag, New York, 29-48 (1989).
107. Kojima, M., Mizuno, S., and Yoshise, A., "A polynomial-time algorithm for a
class of linear complementarity problems," Mathematical Programming 44, 1-26
(1989).
108. Kojima, M., Mizuno, S., and Yoshise, A., "An O(√n L) iteration potential
reduction algorithm for linear complementarity problems," Mathematical Programming
50, 331-342 (1991).
109. Kortanek, K. 0., and Shi, M., "Convergence results and numerical experiments on a
linear programming hybrid algorithm," European Journal of Operational Research
32, 47-61 (1987).
110. Kortanek, K. 0., and Zhu, J., "New purification algorithms for linear program-
ming," Naval Research Logistics 35, 571-583 (1988).
111. Kotiah, T., and Slater, N., "On two-server Poisson queues with two types of cus-
tomers," Operations Research 21, 597-603 (1973).
112. Kozlov, M. K., Tarasov, S. P., and Khachian, L. G., "Polynomial solvability of
convex quadratic programming" (in Russian), Doklady Akademiia Nauk USSR 5,
1051-1053 (1979).
113. Kuhn, H. W., and Tucker, A. W., "Nonlinear programming," Proceedings of the
Second Berkeley Symposium on Mathematical Statistics and Probability, University
of California Press, Berkeley, CA, 481-492 (1951).
114. Lagarias, J. C., and Todd, M. J., eds., Proceedings of AMS-IME-SIAM Research
Conference on Mathematical Developments Arising from Linear Programming,
Bowdoin College, Brunswick, ME (1988).
115. Lasdon, L. S., "Duality and decomposition in mathematical programming," IEEE
Transactions on Systems Science and Cybernetics 4, 86-100 (1968).
116. Lemke, C. E., "The dual method for solving the linear programming problem,"
Naval Research Logistics Quarterly 1, No. 1 (1954).
117. Lemke, C. E., "The constrained gradient method of linear programming," Journal
of the Society of Industrial and Applied Mathematics 9, 1-17 (1961).
118. Levin, A. Yu., "On an algorithm for the minimization of convex functions" (in
Russian), Doklady Akademiia Nauk USSR 160, 1244-1247, (English translation)
Soviet Mathematics Doklady 6, 286-290 (1965).
119. Luenberger, D. G., Introduction to Linear and Nonlinear Programming, 2d ed.,
Addison-Wesley, Reading, MA (1973).
120. Markowitz, H. M., "The elimination form of the inverse and its application to
linear programming," Management Science 3, 255-269 (1957).
121. McShane, K. A., Monma, C. L., and Shanno, D., "An implementation of a primal-
dual interior point method for linear programming," ORSA Journal on Computing
1, 70-83 (1989).
122. Megiddo, N., "On the complexity of linear programming," in Advances in Economic
Theory, ed. T. Bewley, Cambridge University Press, Cambridge, 225-268 (1987).
288 Bibliography
123. Megiddo, N., "Pathways to the optimal set in linear programming," in Progress in
Mathematical Programming: Interior-Point and Related Methods, ed. N. Megiddo,
Springer-Verlag, New York, 131-158 (1989).
124. Megiddo, N., Progress in Mathematical Programming: Interior-Point and Related
Methods, Springer-Verlag, New York (1989).
125. Megiddo, N., and Shub, M., "Boundary behavior of interior point algorithms in
linear programming," Mathematics of Operations Research 14, 97-146 (1989).
126. Mehrotra, S., "On finding a vertex solution using interior point methods," Linear
Algebra and Its Applications 152, 106-111 (1991).
127. Mehrotra, S., and Sun, J., "An algorithm for convex quadratic programming that
requires O(n^3.5 L) arithmetic operations," Mathematics of Operations Research 15,
342-363 (1990).
128. Mehrotra, S., and Sun, J., "A method of analytic centers for quadratically con-
strained convex quadratic programs," SIAM Journal of Numerical Analysis 28,
529-544 (1991).
129. Mehrotra, S., and Sun, J., "An interior point method for solving smooth convex
programs based on Newton's method," Contemporary Mathematics 114, 265-284
(1991).
130. Monma, C. L., and Morton, A. J., "Computational experience with a dual affine
variant of Karmarkar's method for linear programming," Operations Research Let-
ters 6, 261-267 (1987).
131. Monteiro, R. C., "Convergence and boundary behavior of the projective scaling
trajectories for linear programming," Mathematics of Operations Research 16, No.
4 (1991).
132. Monteiro, R. C., and Adler, I., "Interior path following primal-dual algorithms.
Part I: Linear programming," Mathematical Programming 44, 27-42 (1989).
133. Monteiro, R. C., and Adler, I., "Interior path following primal-dual algorithms.
Part II: Convex quadratic programming," Mathematical Programming 44, 43-66
(1989).
134. Monteiro, R. C., and Adler, I., "An extension of Karmarkar type algorithm to a class
of convex separable programming problems with global linear rate of convergence,"
Mathematics of Operations Research 15, 408-422 (1990).
135. Monteiro, R. C., Adler, I., and Resende, M. C., "A polynomial-time primal-dual
affine scaling algorithm for linear and convex quadratic programming and its power
series extension," Mathematics of Operations Research 15, 191-214 (1990).
136. Murty, K. G., Linear Programming, John Wiley, New York (1983).
137. Nazareth, J. L., "Homotopies in linear programming," Algorithmica 1, 529-536
(1986).
138. Nazareth, J. L., Computer Solution of Linear Programs, Oxford University Press,
Oxford and New York (1987).
156. Saigal, R., "An infinitely summable series implementation of interior point meth-
ods," Technical Report 92-37, Department of Industrial and Operations Engineer-
ing, University of Michigan, Ann Arbor, May (1992).
157. Saigal, R., "Matrix partitioning methods for interior point algorithms," Technical
Report 92-39, Department of Industrial and Operations Engineering, University of
Michigan, Ann Arbor, June (1992).
158. Schultz, G. L., and Meyer, R. R., "An interior point method for block angular
optimization," SIAM Journal on Optimization 1, 583-602 (1991).
159. Shamir, R., "The efficiency of the simplex method: A survey," Management Sci-
ence 33, 301-334 (1987).
160. Shanno, D. F., "Computing Karmarkar projection quickly," Mathematical Pro-
gramming 41, 61-71 (1988).
161. Shanno, D. F., and Marsten, R. E., "On implementing Karmarkar's algorithm,"
Working Paper, Graduate School of Administration, University of California, Davis,
CA (1985).
162. Sheu, R. L., and Fang, S.C., "Insights into the interior-point methods," OR Report
No. 252, North Carolina State University, Raleigh, NC (1990); Zeitschrift für
Operations Research 36, 200-230 (1992).
163. Sheu, R. L., and Fang, S.C., "On the generalized path-following methods for linear
programming," OR Report No. 261, North Carolina State University, Raleigh, NC
(1992).
164. Shor, N. Z., "Utilization of space dilation operation in minimization of convex
functions" (in Russian), Kibernetika 1, 6-12, (English translation) Cybernetics 6,
7-15 (1970).
165. Shor, N. Z., Minimization Methods for Non-differentiable Functions, Springer-
Verlag, Berlin (1985).
166. Shub, M., "On the asymptotic behavior of the projective rescaling algorithm for
linear programming," Journal of Complexity 3, 258-269 (1987).
167. Sonnevend, G., "An analytical center for polyhedrons and new classes of global
algorithms for linear (smooth, convex) programming," in Proceedings of the 12th
IFIP Conference on System Modelling and Optimization, Budapest, Lecture Notes
in Control and Information Sciences, Springer-Verlag, New York, 84, 866-876 (1985).
168. Sonnevend, G., "A new method for solving a set of linear (convex) inequalities and
its application for identification and optimization," Proceedings of the Symposium
on Dynamic Modelling, IFAC-IFORS, Budapest (1986).
169. Stone, R. E., and Tovey, C. A., "The simplex and projective scaling algorithm as
iteratively reweighted least squares methods," SIAM Review 33, 220-237 (1991).
170. Sun, J., "A convergence proof for an affine-scaling algorithm for convex quadratic
programming without nondegeneracy assumptions," manuscript to appear in Math-
ematical Programming (1993).
171. Tapia, R. A., and Zhang, Y., "Cubically convergent method for locating a nearby
vertex in linear programming," Journal of Optimization Theory and Applications
67, 217-225 (1990).
172. Tapia, R. A., and Zhang, Y., "An optimal-basis identification technique for interior-
point linear programming algorithms," Linear Algebra and Its Applications, 152,
343-363 (1991).
173. Tardos, E., "A strongly polynomial algorithm to solve combinatorial linear pro-
grams," Operations Research 34, 250-256 (1986).
174. Todd, M. J., "Large scale linear programming-geometry, working bases, and
factorization," Mathematical Programming 26, 1-20 (1986).
175. Todd, M. J., "Polynomial expected behavior of a pivoting algorithm for linear
complementarity and linear programming problems," Mathematical Programming
35, 173-192 (1986).
176. Todd, M. J., "Improved bounds and containing ellipsoids in Karmarkar's linear
programming algorithm," Mathematics of Operations Research 13, 650-659 (1988).
177. Todd, M. J., "Polynomial algorithms for linear programming," in Advances in
Optimization and Control, ed. H. A. Eiselt and G. Pederzoli, Springer-Verlag,
Berlin, 49-66 (1988).
178. Todd, M. J., "Exploring special structure in Karmarkar's linear programming algo-
rithm," Mathematical Programming 41, 97-113 (1988).
179. Todd, M. J., "Probabilistic models for linear programming," Mathematics of Oper-
ations Research 16, No. 4 (1991).
180. Todd, M. J., and Burrell, B. P., "An extension to Karmarkar's algorithm for linear
programming using dual variables," Algorithmica 1, 409-424 (1986).
181. Todd, M. J., and Ye, Y., "A centered projective algorithm for linear programming,"
Mathematics of Operations Research 15, 508-529 (1990).
182. Tomlin, J. A., "An experimental approach to Karmarkar's projective method for
linear programming," Mathematical Programming Studies 31, 175-191 (1987).
183. Tomlin, J. A., "A note on comparing simplex and interior methods for linear pro-
gramming," in Progress in Mathematical Programming: Interior-Point and Related
Methods, ed. N. Megiddo, Springer-Verlag, New York, 91-103 (1989).
184. Tseng, P., and Luo, Z. Q., "On the convergence of affine-scaling algorithm,"
manuscript to appear in Mathematical Programming 53 (1993).
185. Tsuchiya, T., "Global convergence of the affine scaling methods for degenerated
linear programming problems," Research Memo. No. 373, The Institute of Statis-
tical Mathematics, Tokyo, Japan (1990) (also in Mathematical Programming 52,
377-403 (1991)).
186. Tsuchiya, T., "A study on global and local convergence of interior point algorithms
for linear programming" (in Japanese), PhD thesis, Faculty of Engineering, The
University of Tokyo, Tokyo, Japan (1991).
187. Vanderbei, R. J., "Karmarkar's algorithm and problems with free variables," Math-
ematical Programming 43, 31-44 (1989).
188. Vanderbei, R. J., "An implementation of the minimum-degree algorithm using
simple data structures," Technical Memorandum, No. 11212-900115-02TM, AT&T
Bell Laboratories (1990).
189. Vanderbei, R. J., "ALPO: Another linear program solver," Technical Memorandum,
No. 11212-900522-18TM, AT&T Bell Laboratories (1990).
190. Vanderbei, R. J., and Carpenter, T. J., "Symmetric indefinite system for interior
point method," Technical Report SOR 91-7, Department of Civil Engineering and
Operations Research, Princeton University, NJ (1991).
191. Vanderbei, R. J., and Lagarias, J. C., "I. I. Dikin's convergence result for the
affine-scaling algorithm," Contemporary Mathematics 114, 109-119 (1990).
192. Vanderbei, R. J., Meketon, M. S., and Freedman, B. A., "A modification of Kar-
markar's linear programming algorithm," Algorithmica 1, 395-407 (1986).
193. Van Loan, C., "A survey of matrix computations," Technical Report, Cornell Uni-
versity (1990).
194. Vaidya, P. M., "An algorithm for linear programming which requires O(((m +
n)n^2 + (m + n)^1.5 n)L) arithmetic operations," Mathematical Programming 47, 175-
201 (1990).
195. Wilkinson, J. H., The Algebraic Eigenvalue Problem, Oxford University Press
(1965).
196. Witzgall, C., Boggs, P. T., and Domich, P. 0., "On the convergence behavior
of trajectories for linear programming," Contemporary Mathematics 114, 161-187
(1990).
197. Wolfe, P., "The simplex method for quadratic programming," Econometrica 27,
382-398 (1959).
198. Wright, S. J., "Interior-point methods for optimal control of discrete-time systems,"
Technical Report MCS-P226-0491, Argonne National Laboratories, Chicago, April
(1991).
199. Ye, Y., "Karmarkar's algorithm and the ellipsoidal method," Operations Research
Letters 4, 177-182 (1987).
200. Ye, Y., "A class of potential functions for linear programming," manuscript, De-
partment of Management Sciences, The University of Iowa, Iowa City (1988).
201. Ye, Y., "A further result on the potential reduction algorithm for P-matrix linear
complementarity problem," manuscript, Department of Management Sciences, The
University of Iowa, Iowa City (1988).
202. Ye, Y., "A combinatorial property of analytical centers of polytopes," manuscript,
Department of Management Sciences, The University of Iowa, Iowa City (1989).
203. Ye, Y., "An extension of Karmarkar's algorithm and the trust region method
for quadratic programming," in Progress in Mathematical Programming: Interior-
Point and Related Methods, ed. N. Megiddo, Springer-Verlag, New York, 49-64
(1989).
204. Ye, Y., "An O(n^3 L) potential reduction algorithm for linear programming," Con-
temporary Mathematics 114, 91-107 (1990).
205. Ye, Y., "Recovering optimal basic variables in Karmarkar's polynomial algorithm
for linear programming," Mathematics of Operations Research 15, 564-572 (1990).
206. Ye, Y., "A potential reduction algorithm allowing column generations," SIAM Jour-
nal on Optimization 2, 7-20 (1992).
207. Ye, Y., Guler, 0., Tapia, R. A., and Zhang, Y., "A quadratically convergent
O(√n L)-iteration algorithm for linear programming," Technical Report TR91-26,
Department of Mathematical Sciences, Rice University, Houston, TX (1991).
208. Ye, Y., and Kojima, M., "Recovering optimal dual solutions in Karmarkar's polyno-
mial algorithm for linear programming," Mathematical Programming 39, 305-317
(1987).
209. Ye, Y., and Pardalos, P., "A class of linear complementarity problems solvable in
polynomial time," manuscript, Department of Management Sciences, The Univer-
sity of Iowa, Iowa City (1989).
210. Ye, Y., Tapia, R. A., and Zhang, Y., "A superlinearly convergent O(√n L)-iteration
algorithm for linear programming," Technical Report TR91-22, Department of
Mathematical Sciences, Rice University, Houston, TX (1991).
211. Ye, Y., and Todd, M. J., "Containing and shrinking ellipsoids in the path-following
algorithm," Mathematical Programming 47, 1-9 (1990).
212. Ye, Y., and Tse, E., "A polynomial-time algorithm for convex quadratic program-
ming," manuscript, Department of Engineering-Economic Systems, Stanford Uni-
versity, Stanford, CA (1986).
213. Yudin, D. B., and Nemirovskii, A. S., "Informational complexity and efficient
methods for the solution of convex extremal problems" (in Russian), Ekonomika
i Matematicheskie Metody 12, 357-369, (English translation) Matekon 13, 3-25
(1976).
214. Zhang, Y., Tapia, R. A., and Dennis, J. E., "On the superlinear and quadratic
convergence of primal-dual interior point linear programming algorithms," SIAM
Journal on Optimization 2, 304-324 (1992).
215. Zimmermann, U., "Search directions for a class of projective methods," Zeitschrift
für Operations Research 34, 353-379 (1990).
Index
A

Additivity assumption, 4
Adjacency, 21, 22
Affine combination, 17
Affine hull, 19
Affine scaling, 112, 144
  for QP, 227
  transformation, 146-47
Affine scaling algorithms, 10
Affine transformation, 97, 144
Algebraic paths, 201, 202
  extensions, 208-9
Artificial constraint technique, 72
Artificial variables, 39
Artificial variable technique, 72
Assignment problem, 12

B

Backward solve, 47, 53, 254, 265, 273
Barrier function, 217
  inverse, 218, 220, 222
  logarithmic, 198, 219, 221, 222, 240-41
Basic feasible solution:
  degenerated, 34
  overdetermined, 34, 35
Basic solution, 21
Basic variable, 20
Basis, 20, 32
Binary search, 163
Bland's rule, 44
Block size, 258
Boundary point, 18
Bounding constraint, 127
Bounding hyperplane, 15

C

Cauchy sequence, 275
Centering force, 164
Certainty assumption, 4
Cholesky factor, 229, 259, 260
Cholesky factorization, 256-60
  block, 257, 258
  inner product form, 278
  outer product form, 256, 278
  sparse, 259
Code reorganizing, 257
Column dropping technique, 276
Compilers:
  C, 257
  FORTRAN, 257
G

Gale's transposition theorem, 90
Gauss-Jordan elimination, 46
Geodesics, 252
Gordan's transposition theorem, 90
Graph elimination, 261
Graphical method, 18, 25

H

Hadamard's inequality, 110
Half spaces:
  closed, 15
  open, 15
Hybrid method, 141
Hyperplane, 15

I

Integer linear programming problem, 9
Interior feasible point, 145
Interior point methods, 2, 25, 26, 112, 113
Interior solution (see Interior feasible point)
Inventory-balance constraints, 8
Iterative scheme, 10, 29

K

Karmarkar's algorithm, 10, 113-40
  basic ideas, 113, 114
  convergence, 130
  direction of movement, 118, 119
  polynomial time solvability, 120-25
  sliding objective function method, 128-34
  step-by-step procedure, 119

L

Lagrange dual problem, 251
Lagrange multiplier, 59, 216, 218, 220, 251
Largest reduction rule, 36
Left inverse, 278
Lexicographical rule, 44
Line search, 247
Linear manifold, 17
Linear programming (LP), 1
  canonical form, 5, 102
  fundamental theorem, 24
  inequality form, 57
  problem, bounded, 15
  problem, consistent, 14
  standard form, 2, 3, 6, 11, 12
Lower triangular matrix, 46, 229, 255, 256
LQ factorization method, 268-75
LU factorization, 46

M

Marginal cost, 64
Marginal prices (see Shadow prices)
Matrix polynomial, 272
Minimum degree reordering algorithm, 261-62
Minimum ratio test, 36, 69, 70, 149
Moving direction:
  geometric interpretation, 209-10
  mutually conjugate, 266

N

Newton direction, 181, 201, 203-6, 209
  projected, 218
V

Variables:

Z

Zero slackness, 62
Zig-zagging, 228, 229