
# Math Camp for Economists

Daniel A. Graham

© 2007-2008 Daniel A. Graham

Contents

Preface vii

1 Linear Algebra 1
  1.4 Separating and Supporting Hyperplanes 11
  1.5 Answers 12

2 Matrix Algebra 15
  2.8 Farkas’ Lemma 31
  2.9 Answers 33

3 Topology 35
  3.1 Counting 35
  3.2 Metric Spaces 39
  3.3 Topological Spaces 43
  3.4 Sigma Algebras and Measure Spaces 45
  3.5 Answers 46

4 Calculus 47
  4.1 The First Differential 47
  4.2 Quadratic Forms 49
  4.3 The Second Differential 51
  4.4 Convex and Concave Functions 53
  4.5 Answers 56

5 Optimization 57
  5.1 Optimization 57
  5.2 The Well Posed Problem 67
  5.3 Comparative Statics 70

6 Dynamics 75
  6.1 Dynamic Systems 75
  6.2 Systems of Linear Differential Equations 81
  6.3 Liapunov’s Method 88

Notation 91

Using Mathematica 93
  Basics 93
  Input Expressions 93
  Symbols and Numbers 94
  Using Prior Expressions 96
  Commonly Used Commands 99

Bibliography 103
List of Problems 105
Index 106
Colophon 109
Preface

The attached represents the results of many years of effort to economize on the use of my scarce
mental resources — particularly memory. My goal has been to extract just those mathematical ideas
which are most important to Economics and to present them in a way that emphasizes the approach,
common to virtually all of mathematics, that begins with the phrase “let X be a non-empty set” and
goes on to add a pinch of this and a dash of that.
I believe that Mathematics is both beautiful and useful and, when viewed in the right way, not nearly as
complicated as some would have you believe. For me, the right way is to identify the links connecting
the ideas and, whenever possible, to embed them in a visual setting.
The reader should be aware of two aspects of these notes. First, intuition is emphasized. While “Prove
Theorem 7” might be a common format for exercises inside courses, “State an interesting proposition
and prove it” is far more common outside courses. Intuition is vital for such endeavors. Secondly, use
of the symbolic algebra program Mathematica is emphasized for at least the following reasons:

• Mathematica is better at solving a wide variety of problems than you or I will ever be. Our
comparative advantage is in modeling, not solving.

• Mathematica lowers the marginal cost of asking “What if?” questions, thereby inducing us to ask
more of them. This is a very good thing. One of the best ways of formulating conjectures about
what might be true, for instance, is to examine many specific cases and this is a relatively cheap
endeavor with Mathematica.

• Mathematica encourages formulating solution plans and, in general, top-down thinking. After
all, with it to do the heavy lifting, all that’s left for us is to formulate the problem and plan the
steps. This, too, is a very good thing.

Why Mathematica and not Maple, another popular symbolic algebra program? While there are differences,
both are wonderful programs and it would be difficult to argue that either is better than the other. I’ve
used both and have a slight personal preference for Mathematica.
Dan Graham
Duke University

Chapter 1

Linear Algebra

1.4 Separating and Supporting Hyperplanes 11
1.5 Answers 12

The following is an informal review of that part of linear algebra which will be most important to
subsequent analysis. Please bear in mind that linear algebra is, perhaps, the single most important
tool in Economics and forms the basis for many other important areas of mathematics as well.

## 1.1 Real Spaces

Recall that the Cartesian product of sets, e.g. “Capital Letters” × “Integers” × “Lower Case Letters”, is
itself a set composed of all ordered n-tuples of elements chosen from the respective sets, e.g., (G, 5, f ),
(F , 1, a) and so forth. Note that no multiplication is involved in forming this product. Now introduce
the set consisting of all real numbers, denoted R and called “the reals”, and real n-space is obtained
as the n-fold Cartesian product of the reals with itself:

Rn ≡ R × . . . × R (n times)
   ≡ { (x1, x2, . . . , xn) | xi ∈ R, i = 1, 2, . . . , n }

The origin is the point (0, 0, . . . , 0) ∈ Rn. It will sometimes be written simply as 0 when no confusion should result. An arbitrary element of this set, x ∈ Rn, is sometimes called a point and sometimes called a vector and xi is called the ith component of x. The existence of two terms for the same thing is due, no doubt, to the fact that it is sometimes useful to think of x = (x1, x2), for example, as a point located at x1 on the first axis and x2 on the second axis. Other times it is useful to think of x = (x1, x2) as a directed arrow or vector with its tail at the origin, (0, 0), and its tip at the point x = (x1, x2).

Figure 1.1: Vectors in R2
See Figure 1.1. It is important to realize, on the other hand, that it is hardly ever useful to think of a
vector as a list of its coordinates. Vectors are objects and better regarded as such than as lists of their
components.

## 1.1.1 Equality and Inequalities

Recall that a relation (or binary relation), R, on a set S is a mapping from S × S to {True, False}, i.e.,
for every x, y ∈ S, xRy is either “True” or “False”. This is illustrated in Figure 1.2 for the relation >
on R. Note that points along the “45-degree line” where x = y map into “False”.

Supposing that x, y ∈ Rn, several sorts of relations are possible between these vectors. The vector x is equal to the vector y when each component of x is equal to the corresponding component of y:

Definition 1.1. x = y iff xi = yi, i = 1, 2, . . . , n

The vector x is greater than or equal to the vector y when each component of x is at least as great as the corresponding component of y:

Definition 1.2. x ≥ y iff xi ≥ yi, i = 1, 2, . . . , n.

The vector x is greater than the vector y when each component of x is at least as great as the corresponding component of y and at least one component of x is strictly greater than the corresponding component of y:

Definition 1.3. x > y iff x ≥ y and x ≠ y

Figure 1.2: The Relation > on R
 Problem 1.1. [Answer] Suppose x > y. Does it necessarily follow that x ≥ y?

The vector x is strictly greater than the vector y when each component of x is greater than the corresponding component of y:

Definition 1.4. x ≫ y iff xi > yi, i = 1, 2, . . . , n.

 Problem 1.2. [Answer] Suppose x ≫ y. Does it necessarily follow that x > y?

These definitions are standard and they conform to the conventional usage in the special case in which n = 1. The distinctions are illustrated in Figure 1.3. The shaded area in the left-hand panel represents the set of vectors, y, for which y ≫ (4, 3). Note that (4, 3) does not belong to this set nor does any point directly above (4, 3) nor any point directly to the right of (4, 3). The shaded area in the right-hand panel illustrates the set of vectors, y, for which y ≥ (4, 3). This shaded area differs by including (4, 3), points directly above (4, 3) and points directly to the right of (4, 3). Though not illustrated, the set of y’s for which y > (4, 3) corresponds to the shaded area in the right-hand panel with the point (4, 3) itself removed.

Figure 1.3: Vector Inequalities for x = (4, 3): y ≫ x (left) and y ≥ x (right)

 Problem 1.3. A relation, R, on a set S is transitive iff xRy and yRz implies xRz for all x, y, z ∈ S.
(i) Is = transitive? (ii) What about ≥? (iii) ≫? (iv) >?
 Problem 1.4. A relation, R, on a set S is complete if either xRy or yRx for all x, y ∈ S. When n = 1
it must either be the case that x ≥ y or that y ≥ x (or both). Thus ≥ on R1 is complete. Is ≥ on Rn
complete when n > 1?
 Problem 1.5. [Answer] Consider the case in which n = 1 and x, y ∈ R1. Is there any distinction
between x > y and x ≫ y?
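These componentwise orders are easy to experiment with. The following plain-Python sketch (the text itself uses Mathematica; the function names here are illustrative, not from the text) implements Definitions 1.1-1.4 and checks a few cases against the Figure 1.3 example with x = (4, 3):

```python
# Componentwise vector orders on R^n, following Definitions 1.1-1.4.

def vec_eq(x, y):          # Definition 1.1: x = y
    return all(xi == yi for xi, yi in zip(x, y))

def vec_geq(x, y):         # Definition 1.2: x >= y componentwise
    return all(xi >= yi for xi, yi in zip(x, y))

def vec_gt(x, y):          # Definition 1.3: x > y iff x >= y and x != y
    return vec_geq(x, y) and not vec_eq(x, y)

def vec_sgt(x, y):         # Definition 1.4: x >> y, strictly greater in every component
    return all(xi > yi for xi, yi in zip(x, y))

# The example of Figure 1.3 with x = (4, 3):
assert vec_geq((4, 3), (4, 3)) and not vec_gt((4, 3), (4, 3))
assert vec_gt((5, 3), (4, 3)) and not vec_sgt((5, 3), (4, 3))  # directly right of (4, 3)
assert vec_sgt((5, 4), (4, 3))                                 # implies > and >= as well
```

The last two assertions are exactly Problems 1.1 and 1.2 in miniature: ≫ implies > which implies ≥, but not conversely.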

Figure 1.4: Vector Norms in R1, R2 and R3 (||(-3)|| = 3, ||(3,4)|| = 5 and ||(2,4,4)|| = 6)

The (Euclidean) norm or length of a vector, by an obvious extension of the Pythagorean Theorem, is the square root of the sum of the squares of its components.

Definition 1.5. ‖x‖ ≡ (x1² + x2² + . . . + xn²)^(1/2)

Note that the absolute value of a real number and the norm of a vector in R1 are equivalent — if a ∈ R then ‖a‖ = √(a²) = |a|. The norms of vectors in R2 and R3 are illustrated in Figure 1.4 on the preceding page. The extensions to higher dimensions are analogous.
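As a quick numerical cross-check of Definition 1.5, a plain-Python sketch (illustrative, not part of the text) reproduces the three norms shown in Figure 1.4:

```python
import math

# Euclidean norm of Definition 1.5: ||x|| = sqrt(x1^2 + ... + xn^2).

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

assert norm((-3,)) == 3.0       # in R^1 the norm is the absolute value
assert norm((3, 4)) == 5.0      # the 3-4-5 right triangle
assert norm((2, 4, 4)) == 6.0   # sqrt(4 + 16 + 16) = 6
```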

If x, y ∈ Rn, the dot or inner product of these two vectors is obtained by multiplying the respective components and adding.

Definition 1.6. x · y ≡ x1y1 + x2y2 + . . . + xnyn = ∑ᵢ₌₁ⁿ xiyi

There is a very important geometric interpretation of this dot product. It can be shown that

x · y = ‖x‖‖y‖ cos θ    (1.1)

where θ is the included angle between the two vectors. Recall that the cosine of θ is bigger than, equal to or less than zero depending upon whether θ is less than, equal to or greater than ninety degrees.

Theorem 1. Suppose x, y ∈ Rn with x, y ≠ 0. Then x · y > 0 iff x and y form an acute angle, x · y = 0 iff x and y form a right angle and x · y < 0 iff x and y form an obtuse angle.

Figure 1.5: Angles (x = (4, −3), y = (3, 4), z = (−3, 4), w = (−3, −4))

This theorem is illustrated in Figure 1.5 where (a) x and y form a right angle and x · y = 0, (b) x and
w form a right angle and x · w = 0, (c) x and z form an obtuse angle and x · z = −24 < 0, (d) y and
w form an obtuse angle and y · w = −25 < 0 and (e) y and z form an acute angle and y · z = 7 > 0.
When two vectors form a right angle they are said to be orthogonal. Note that the word “orthogonal” is just the generalization of the word “perpendicular” to Rn. Similarly, orthant is the generalization of the word quadrant to Rn.
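Theorem 1 can be checked directly on the vectors of Figure 1.5. The following Python sketch (function names are illustrative, not from the text) classifies the included angle by the sign of the dot product:

```python
# Classifying the included angle via the sign of the dot product (Theorem 1).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def angle_type(u, v):
    d = dot(u, v)
    if d > 0:
        return "acute"
    if d < 0:
        return "obtuse"
    return "right"   # d == 0: the vectors are orthogonal

# The five pairs discussed in the text for Figure 1.5:
x, y, z, w = (4, -3), (3, 4), (-3, 4), (-3, -4)

assert dot(x, y) == 0 and angle_type(x, y) == "right"
assert dot(x, w) == 0 and angle_type(x, w) == "right"
assert dot(x, z) == -24 and angle_type(x, z) == "obtuse"
assert dot(y, w) == -25 and angle_type(y, w) == "obtuse"
assert dot(y, z) == 7 and angle_type(y, z) == "acute"
```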

 Problem 1.6. The Cauchy-Schwarz inequality states that x · y ≤ ‖x‖‖y‖. Show that this inequality
follows from Equation 1.1.
♦ Query 1.1. When does Cauchy-Schwarz hold as an equality?
 Problem 1.7. Suppose a, x, y ∈ Rn and a · x > a · y. Does it follow that x > y? [Hint: resist any
temptation to “divide both sides by a”.]
 Problem 1.8. In Mathematica a vector is a list, e.g. {1,2,3,4} or Table[j,{j,1,4}] and the dot
product of two vectors is obtained by placing a period between them. Use Mathematica to evaluate the
following dot product:
Table[j, {j,1,100}] . Table[j-50, {j,1,100}]
Do the two vectors form an acute angle, an obtuse angle or are they orthogonal?

The sum of two vectors is obtained by adding the respective components. Supposing that x, y ∈ Rn
we have:
Definition 1.7. x + y ≡ (x1 + y1 , x2 + y2 , . . . , xn + yn )

Note that the sum of two vectors in Rn is itself a vector in Rn. The set Rn is said, therefore, to be closed with respect to the operation of addition.

The addition of two points from R2 is illustrated in Figure 1.6. The addition of x = (5, −3) and y = (−1, 5) yields a point x + y = (4, 2) located at the corner of the parallelogram whose sides are formed by the vectors x and y. Equivalently, x + y = (4, 2) is obtained by moving the vector x parallel to itself until its tail rests at the tip of y, or by moving the vector y parallel to itself until its tail rests at the tip of x.

The scalar product of a real number and a vector is obtained by multiplying each component of the vector by the real number. If α ∈ R then:

Definition 1.8. αx = (αx1, αx2, . . . , αxn).

Note that this product is itself a vector in Rn. The set Rn is said, therefore, to be closed with respect to the operation of scalar multiplication.

Figure 1.6: Vector Addition
Scalar multiplication is illustrated in Figure 1.7. Note that for any choice of α, αx lies along the extended line passing through the origin and the point x. The sign of α determines whether αx will be on the same (α > 0) or opposite (α < 0) side of the origin as x. The magnitude of α determines whether αx will be closer (‖α‖ < 1) or further away (‖α‖ > 1) from the origin than x.

 Problem 1.9. In Mathematica if x and y are vectors and a is a real number, then x+y gives the sum of the two vectors and a x gives the scalar product of a and x. Use Mathematica to evaluate the following:
3 {1,3,5} + 2 {2,4,6}

The norm of αx is

‖αx‖ = (α²x1² + α²x2² + . . . + α²xn²)^(1/2)
     = [α²(x1² + x2² + . . . + xn²)]^(1/2)
     = (α²)^(1/2)(x1² + x2² + . . . + xn²)^(1/2)
     = ‖α‖‖x‖

Figure 1.7: Scalar Product

Multiplying x by α thus produces a new vector that is ‖α‖ times as long as the original vector. It is not difficult to see that αx points in the same direction as x if α is positive and in the opposite direction if α is negative.
 Problem 1.10. In Mathematica the norm of the vector x is given by Norm[x]. What is
Norm[Table[j, {j,1,100}]]
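The derivation of ‖αx‖ = ‖α‖‖x‖ above is easy to verify numerically. A plain-Python sketch (illustrative; the text itself works in Mathematica, and for a scalar α the "norm" ‖α‖ is just the absolute value):

```python
import math

# Check that ||a x|| = |a| ||x|| for a sample vector and several scalars.

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def scale(a, x):
    return tuple(a * xi for xi in x)

x = (1, 3, 5)
for a in (3.0, -2.0, 0.5):
    assert math.isclose(norm(scale(a, x)), abs(a) * norm(x))
```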

## The operations of vector addition and scalar multiplication can be combined.

## 1.2.1 Linear Combinations

Definition 1.9. If x¹, x², . . . , xᵏ are k vectors in Rn and if α1, α2, . . . , αk are real numbers then

z = α1x¹ + α2x² + . . . + αkxᵏ

is a linear combination of the x’s.

## A related concept is that of linear independence.

Definition 1.10. If

α1x¹ + α2x² + . . . + αkxᵏ = (0, 0, . . . , 0)

has no solution (α1, α2, . . . , αk) other than the trivial solution, α = 0, then the vectors are said to be linearly independent. Alternatively, if there were a non-trivial solution, α ≠ 0, then the vectors are said to be linearly dependent.

In the latter case we must have αj ≠ 0 for some j and thus can write:

αj xʲ = − ∑ᵢ≠ⱼ αi xⁱ

or, since αj ≠ 0,

xʲ = − ∑ᵢ≠ⱼ (αi/αj) xⁱ

Thus xʲ is a linear combination of the remaining x’s. It follows that vectors are either linearly independent or one of them can be expressed as a linear combination of the rest.
This is illustrated in Figure 1.8. In the right-hand panel, x and y are linearly dependent and a non-
trivial solution is α = (1, 2). In the left-hand panel, on the other hand, x and y are linearly independent.
Scalar multiples of x lie along the dashed line passing through x and the origin and, similarly, scalar
multiples of y lie along the dashed line passing through y and the origin. The only way to have the
sum of two points selected from these lines add up to the origin is to choose the origin from each line
— the trivial solution α = (0, 0).

Figure 1.8: Linear Independence (left: x = (6, −3), y = (6, 4)) and Dependence (right: x = (4, −8), y = (−2, 4), with (0, 0) = 1x + 2y)
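For two vectors in R2, linear dependence can be tested with the 2 × 2 determinant, since the pair is dependent exactly when one is a scalar multiple of the other. A Python sketch (not from the text) checking both panels of Figure 1.8:

```python
# Two vectors in R^2 are linearly independent iff the determinant
# x1*y2 - x2*y1 is non-zero.

def independent_2d(x, y):
    return x[0] * y[1] - x[1] * y[0] != 0

assert independent_2d((6, -3), (6, 4))        # left-hand panel: independent
assert not independent_2d((4, -8), (-2, 4))   # right-hand panel: dependent

# The non-trivial solution alpha = (1, 2) for the right-hand panel:
combo = tuple(1 * u + 2 * v for u, v in zip((4, -8), (-2, 4)))
assert combo == (0, 0)
```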

Definition 1.11. If L is a non-empty set which is closed with respect to vector addition and scalar multiplication, i.e. (i) x, y ∈ L =⇒ x + y ∈ L and (ii) α ∈ R, x ∈ L =⇒ αx ∈ L, then L is called a linear space.
Definition 1.12. If L is a linear space and L ⊆ M then L is a linear subspace of M.
 Problem 1.11. Which of the following sets are linear subspaces of R3?

1. A point other than the origin? What about the origin?

2. A line segment? A line through the origin? A line not passing through the origin?

3. A plane passing through the origin? A plane not passing through the origin?

4. The non-negative orthant, i.e., {x ∈ R3 | x ≥ 0}?

♦ Query 1.2. Must the intersection of two linear subspaces itself be a linear subspace?
Definition 1.13. The dimension of a linear (sub)space is an integer equaling the largest number of
linearly independent vectors which can be selected from the (sub)space.
 Problem 1.12. [Answer] What are the dimensions of the following subsets of R3?

1. The origin?

2. A line which passes through the origin?

3. A plane which passes through the origin?

Definition 1.14. Given a set {x¹, x², . . . , xᵏ} of k vectors in Rn, the set of all possible linear combinations of these vectors is referred to as the linear subspace spanned by these vectors.

Linear spaces spanned by independent and dependent vectors are illustrated for the case in which
n = 2 in Figure 1.8 on the previous page. Since the two vectors, (6, −3) and (6, 4) in the left-hand
panel are linearly independent, every point in R2 can be obtained as a linear combination of these two
vectors. The point (0, 7), for example, corresponds to −1x + 1y. In the right-hand panel, on the other
hand, the two vectors (4, −8) and (−2, 4), are linearly dependent and the linear subspace spanned by
these vectors is a one-dimensional, strict subset of R2 corresponding to the line which passes through
the two points and the origin.
 Problem 1.13. [Answer] Suppose x, y ∈ Rn with x ≠ 0 and let X = { z ∈ Rn | z = αx, α ∈ R } be the (1-dimensional) linear space spanned by x. The projection of y upon X, denoted ŷ, is defined to be that element of X which is closest to y, i.e. that element ŷ ∈ X for which the norm of the residual of the projection, ‖y − ŷ‖, is smallest. Obtain expressions for both α̂ and ŷ as functions of x and y.
 Problem 1.14. Suppose a, y ∈ Rn and let X = { x ∈ Rn | a · x = 0 } be the linear subspace
orthogonal to a. Obtain an expression for ŷ, the projection of y on X, as a function of a and y.
[See Problem 1.13.]
Definition 1.15. A basis for a linear (sub)space is a set of linearly independent vectors which span the
(sub)space.
Definition 1.16. An orthonormal basis for a linear (sub)space is a basis with two additional properties:

1. The basis vectors are mutually orthogonal, i.e., if xⁱ and xʲ are vectors in the basis, then xⁱ · xʲ = 0.

2. The length of each basis vector is one, i.e., if xⁱ is a vector in the basis, then xⁱ · xⁱ = 1.

 Problem 1.15. [Answer] Exhibit an orthonormal basis for Rn.

## 1.2.2 Affine Combinations

In forming linear combinations of vectors no restriction whatever is placed upon the α’s other than
that they must be real numbers. In the left-hand panel of Figure 1.9 on following page, for example,
every point in the two-dimensional space corresponds to a linear combination of the two vectors. An
affine combination of vectors, on the other hand, is a linear combination which has the additional
restriction that the α’s add up to one.

Definition 1.17. If x¹, x², . . . , xᵏ are k vectors in Rn and if α1, α2, . . . , αk are real numbers with the property that

α1 + α2 + . . . + αk = 1

then

z = α1x¹ + α2x² + . . . + αkxᵏ

is an affine combination of the x’s.
 Problem 1.16. An affine combination of points is necessarily a linear combination as well but not
vice versa. True or false?
An affine space bears the same relationship to affine combinations that a linear space does to linear
combinations:
Definition 1.18. If L is closed with respect to affine combinations, i.e. affine combinations of points in
L are necessarily also in L, then L is called an affine space. If, additionally, L ⊆ M then L is an affine
subspace of M.
The affine subspace spanned by a set of vectors is similarly analogous to the linear subspace spanned
by a set of vectors.
Definition 1.19. Given a set {x¹, x², . . . , xᵏ} of k vectors in Rn, the affine subspace spanned by these vectors is the set of all possible affine combinations of these vectors:

{ z ∈ Rn | z = α1x¹ + . . . + αkxᵏ, α1 + . . . + αk = 1 }

When k = 2, z = α1x¹ + α2x² is an affine combination of x¹ and x², provided that α1 + α2 = 1. Suppose now that x¹ ≠ x², let λ = α1 and (1 − λ) = α2 and rewrite this as z = λx¹ + (1 − λ)x². Rewriting again we have z = λ(x¹ − x²) + x². Note that when λ = 0, z = x². Alternatively, when λ = 1, z = x¹. In general z is obtained by adding a scalar multiple of (x¹ − x²) to x². It is not difficult to see that such points lie on the extended line passing through x¹ and x² — the set of all possible affine combinations of two distinct vectors is simply the line determined by the two vectors. This is illustrated for n = 2 by the middle panel in Figure 1.9.

Figure 1.9: Combinations: linear (left), affine (middle) and convex (right)

 Problem 1.17. A linear subspace is necessarily an affine subspace as well but not vice versa. True or
false?
 Problem 1.18. [Answer] Suppose a is a point in L where L is an affine subspace but not a linear
subspace. Let M be the set obtained by “subtracting” a from L, i.e. M = { z | z = x − a, x ∈ L }. Is M
necessarily a linear subspace?
 Problem 1.19. Suppose x, y ∈ Rn with x and y linearly independent and consider the affine subspace A = {z ∈ Rn | z = λx + (1 − λ)y, λ ∈ R}. Find the projection, ô, of the origin on A.

## 1.2.3 Convex Combinations

If we add the still further requirement that the α’s not only add up to one but also that each is non-
negative, then we obtain a convex combination.
Definition 1.20. If x¹, x², . . . , xᵏ are k vectors in Rn and if α1, α2, . . . , αk are real numbers with the property that

α1 + α2 + . . . + αk = 1
αi ≥ 0, i = 1, . . . , k

then

z = α1x¹ + α2x² + . . . + αkxᵏ

is a convex combination of the x’s.

Again considering the case of k = 2, we know that since the α’s must sum to one, convex combinations of two vectors must lie on the line passing through these vectors. The additional requirement that the α’s must be non-negative means that convex combinations correspond to points on the line between x¹ and x², i.e. the set of all possible convex combinations of two distinct points is the line segment connecting the two points. This is illustrated for n = 2 in Figure 1.9 on the previous page.
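The three kinds of combinations differ only in what is required of the α's. A plain-Python sketch (names illustrative, not from the text) that forms a combination and tests whether its coefficients qualify as affine or convex:

```python
# Linear vs. affine vs. convex combinations (Definitions 1.9, 1.17 and 1.20):
# any real alphas give a linear combination; summing to one makes it affine;
# summing to one with non-negative alphas makes it convex.

def combination(alphas, points):
    n = len(points[0])
    return tuple(sum(a * p[i] for a, p in zip(alphas, points)) for i in range(n))

def is_affine(alphas):
    return abs(sum(alphas) - 1) < 1e-12

def is_convex(alphas):
    return is_affine(alphas) and all(a >= 0 for a in alphas)

x1, x2 = (0, 0), (4, 2)

# coefficients in [0, 1] summing to one trace the segment between x1 and x2;
# coefficients summing to one but outside [0, 1] stay on the line through them.
assert combination((0.5, 0.5), (x1, x2)) == (2.0, 1.0)
assert is_convex((0.5, 0.5))
assert is_affine((2, -1)) and not is_convex((2, -1))
assert not is_affine((1, 1))   # a linear combination that is not affine
```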
 Problem 1.20. A convex combination of points is necessarily an affine combination and thus a linear
combination as well but not vice versa. True or false?

A convex set bears the same relationship to convex combinations that an affine subspace does to affine
combinations:
Definition 1.21. If L ⊆ Rn and L is closed with respect to convex combinations, i.e. convex combinations of points in L are necessarily also in L, then L is called a convex set.
 Problem 1.21. Show that the intersection of two (or more) convex sets in Rn must itself be a convex
set.
Definition 1.22. Given a set L ⊆ Rn, the smallest convex set which contains L is called the convex hull of L. Here “smallest” means the intersection of all convex sets containing the given set.

The convex hull of a set of vectors corresponds to the set of all convex combinations of the vectors
and is thus analogous to the affine space spanned by a set of vectors:
 Problem 1.22. Suppose x, y and z are three linearly independent vectors in R3. Describe the sets which correspond to all (i) linear, (ii) affine and (iii) convex combinations of these three vectors.

## 1.3 The Standard Linear Equation
With the geometrical interpretation of the dot product in mind consider the problem of solving the linear equation

a1x1 + a2x2 + . . . + anxn = 0

or

a · x = 0

where a = (a1, a2, . . . , an) is a known vector of coefficients — called the normal of the equation — and the problem is to find those x’s in Rn which solve the equation. We know that finding such an x is equivalent to finding an x which is orthogonal to a. The solution set,

X(a) ≡ { x ∈ Rn | a · x = 0 }

then must consist of all x’s which are orthogonal to a. This is illustrated for n = 2 in Figure 1.10.

Figure 1.10: a · x = 0
 Problem 1.23. [Answer] Show that X(a) is a linear subspace.
 Problem 1.24. [Answer] What is the dimension of X(a)?
 Problem 1.25. Suppose a, b, y ∈ Rn are linearly independent and let L = {x ∈ Rn | a · x = 0 and b ·
x = 0}. Find an expression for ŷ, the projection of y on L as a function of a, b and y.

Now consider the “non-zero” version of the linear equation

a1 x1 + · · · + an xn = b

or
a·x =b
where b is not necessarily equal to 0 and let

X(a, b) = {x ∈ Rn | a · x = b}

denote the solution set for this equation.

 Problem 1.26. [Answer] Show that X(a, b) is an affine subspace.
 Problem 1.27. When is X(a, b) a linear subspace?
♦ Query 1.3. Which two subsets of a linear space, X, are always linear subspaces?

To provide a geometric characterization of X(a, b), find a point x* that (i) lies in the linear subspace spanned by a and (ii) solves the equation a · x = b. To satisfy (i) it must be the case that x* = λa for some real number λ. To satisfy (ii) it must be the case that a · x* = b. Combining we have a · (λa) = b or λ = b/(a · a) and thus x* = [b/(a · a)]a.
Now suppose that x′ is any solution to a · x = 0. It follows that x* + x′ must solve a · x = b since a · (x* + x′) = a · x* + a · x′ = b + 0 = b. We may therefore obtain solutions to a · x = b simply by adding x* to each solution of a · x = 0. X(a, b) is obtained, in short, by moving X(a) parallel to itself until it passes through x*. The significance of x* is that it is the point in X(a, b) which is closest to the origin. Its norm, moreover, is ‖x*‖ = ‖b‖/‖a‖. Note that x* can be interpreted as the intercept of the solution set with the a “axis”. When b is positive, X(a, b) lies on the same side of the origin as a and a forms a positive dot product (acute angle) with each point in X(a, b). When b is negative, X(a, b) lies on the opposite side of the origin from a and a forms a negative dot product (obtuse angle) with each point in X(a, b).

This x* is illustrated in Figure 1.11 for the case in which a = (4, 3), ‖a‖ = 5, b = −25/2, x* = [b/(a · a)]a = −25/2 × 1/25 × (4, 3) = (−2, −3/2) and

‖x*‖ = √((−2, −3/2) · (−2, −3/2))
     = 5/2
     = ‖b‖/‖a‖

The solution set for the linear equation a · x = b can thus be given the following interpretation: X(a, b) is an affine subspace orthogonal to the normal a and lying a directed distance equal to b/‖a‖ from the origin at the closest point. The term directed distance simply means that X(a, b) lies on the same side of the origin as a if b is positive and on the opposite side if b is negative.

Figure 1.11: a · x = b (a = (4, 3), x* = (−2, −3/2))
This is the standard form for a linear equation. It replaces the familiar slope-intercept form used for n = 2. In this more general form the slope is given by the “orthogonal to a” requirement and the intercept by the point x*, a distance b/‖a‖ out the a “axis”.
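The closest-point formula x* = [b/(a · a)]a can be verified against the worked example of Figure 1.11. A Python sketch (illustrative, not from the text):

```python
import math

# The closest point of X(a, b) to the origin, checked against the
# example a = (4, 3), b = -25/2 worked above.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def closest_point(a, b):
    lam = b / dot(a, a)
    return tuple(lam * ai for ai in a)

a, b = (4, 3), -25 / 2
x_star = closest_point(a, b)

assert x_star == (-2.0, -1.5)                # x* = (-2, -3/2)
assert math.isclose(dot(a, x_star), b)       # x* solves a . x = b
assert math.isclose(math.sqrt(dot(x_star, x_star)), 2.5)   # ||x*|| = |b|/||a||
```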
 Problem 1.28. Suppose b ∈ R, a, y ∈ Rn and let X(a, b) = {x ∈ Rn | a · x = b}. Obtain an
expression for ŷ, the projection of y on X(a, b), as a function of a, b and y. [See Problem 1.13 on
page 7.]

## 1.4 Separating and Supporting Hyperplanes

The solution set X(a, b) bears exactly the same relationship to Rn that a plane does to R3 . For example,
it is linear (either a linear or an affine subspace) and has a dimension equal to n − 1. For these reasons

X(a, b) ≡ { x ∈ Rn | a · x = b }

is called a hyperplane. This hyperplane divides Rn into two associated half spaces

H⁺(a, b) ≡ { x ∈ Rn | a · x ≥ b }
H⁻(a, b) ≡ { x ∈ Rn | a · x ≤ b }

H⁺(a, b) ∩ H⁻(a, b) = X(a, b)

Definition 1.23. If Z ⊂ Rn is an arbitrary set, then X(a, b) is bounding for Z iff Z is entirely contained in one of X(a, b)’s half-spaces, i.e., either Z ⊆ H⁺ or Z ⊆ H⁻.
Definition 1.24. If Z ⊂ Rn is an arbitrary set, then X(a, b) is supporting for Z iff X(a, b) is bounding for Z and X(a, b) “touches” Z, i.e.,

inf { |a · z − b| : z ∈ Z } = 0

These concepts together with the following theorem will prove very useful in subsequent analysis.

Theorem 2 (Minkowski’s Theorem). If Z and W are non-empty, convex and non-intersecting subsets
of Rn , then there exist a ∈ Rn and b ∈ R such that X(a, b) is separating for Z and W , i.e., X(a, b)
(i) is bounding for both Z and W , (ii) contains Z in one half-space and (iii) contains W in the other
half-space.

Minkowski’s Theorem is illustrated for n = 2 in Figure 1.12. In the left-hand panel the antecedent conditions for the theorem are met and the separating hyperplane is illustrated. In the right-hand panel one of the sets is not convex and it is not possible to find a separating hyperplane.

Figure 1.12: Conditions for Minkowski’s Theorem: satisfied (left) and violated (right)
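For finite point sets, checking whether a candidate hyperplane X(a, b) separates Z and W reduces to sign checks on a · z − b. A Python sketch (the sample sets and the candidate pair (a, b) below are made up for illustration, not taken from the figure):

```python
# A hyperplane X(a, b) separates Z and W when Z lies in one half-space
# and W in the other (Definition 1.23 and Theorem 2).

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def separates(a, b, Z, W):
    # Z in the upper half-space H+, W in the lower half-space H-
    return all(dot(a, z) >= b for z in Z) and all(dot(a, w) <= b for w in W)

Z = [(2, 2), (3, 4)]       # hypothetical finite samples from two convex sets
W = [(-1, -1), (0, -2)]
assert separates((1, 1), 1, Z, W)
assert not separates((1, -1), 0, Z, [(5, 0)])
```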

## 1.5 Answers
Problem 1.1 on page 2. Yes. From the “only if” in the definition, x > y =⇒ x ≥ y.
Problem 1.2 on page 2. Yes. From the “only if” in the definition,

x ≫ y =⇒ xi > yi, i = 1, 2, . . . , n
      =⇒ xi ≥ yi, i = 1, 2, . . . , n
      =⇒ x ≥ y

Problem 1.5 on page 3. No. If x > y then at least one component of x must be greater than the corresponding component of y. Since there is only one component, this means that every component of x is greater than the corresponding component of y. Thus x > y =⇒ x ≫ y. The converse also holds.
Problem 1.12 on page 7. The origin has dimension 0. Surprised? Note that α(0, 0, 0) = (0, 0, 0) has an
abundance of non-trivial solutions, e.g. α = 1. A line through the origin has dimension 1 and a plane
through the origin has dimension 2.
Problem 1.13 on page 7. Two facts characterize this projection. (i) Since ŷ ∈ X it must be the case that ŷ = α̂x for some real α̂. (ii) The residual of the projection, y − ŷ, must be orthogonal to every vector in X. Since x ∈ X, fact (ii) implies that (y − ŷ) · x = 0 or y · x = ŷ · x. Combining with (i) yields y · x = α̂x · x or, since ‖x‖ ≠ 0, α̂ = (y · x)/(x · x) and thus ŷ = [(y · x)/(x · x)]x.
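The formulas α̂ = (y · x)/(x · x) and ŷ = α̂x from this answer can be checked numerically. A plain-Python sketch (illustrative):

```python
import math

# Projection of y on the line spanned by x: y_hat = [(y.x)/(x.x)] x.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def project(y, x):
    a_hat = dot(y, x) / dot(x, x)
    return tuple(a_hat * xi for xi in x)

x, y = (1, 0), (3, 4)
y_hat = project(y, x)
assert y_hat == (3.0, 0.0)

# The residual y - y_hat is orthogonal to x, as fact (ii) requires.
resid = tuple(yi - hi for yi, hi in zip(y, y_hat))
assert math.isclose(dot(resid, x), 0, abs_tol=1e-12)
```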
Problem 1.15 on page 7.

x¹ = (1, 0, 0, · · · , 0)
x² = (0, 1, 0, · · · , 0)
  ..
  .
xⁿ = (0, 0, 0, · · · , 1)
Problem 1.18 on page 8. Yes. The argument proceeds in three steps.

1. M is an affine space:
If yⁱ ∈ M, i = 1, . . . , k then it must be the case that yⁱ = xⁱ − a for some xⁱ ∈ L, i = 1, . . . , k. Since L is affine, it follows that ∑ᵢ αi = 1 implies z ≡ ∑ᵢ αixⁱ = ∑ᵢ αi(yⁱ + a) ∈ L. But this means that ∑ᵢ αiyⁱ = z − a ∈ M. Thus M is an affine space.

2. The origin belongs to M:
Since a ∈ L it follows that a − a = 0 ∈ M.

3. An affine space which contains the origin is necessarily a linear space:
Suppose that xⁱ ∈ M, i = 1, . . . , k and βi ∈ R, i = 1, . . . , k. We need to show that the linear combination ∑ᵢ βixⁱ ∈ M. Note that βixⁱ = βixⁱ + (1 − βi)0 ∈ M since xⁱ, 0 ∈ M and M is affine. But then ∑ᵢ βixⁱ + (1 − ∑ᵢ βi)0 = ∑ᵢ βixⁱ ∈ M.

Problem 1.23 on page 10. (i) If x, x′ ∈ X(a) then a · x = a · x′ = 0, a · x + a · x′ = a · (x + x′) = 0 and thus x + x′ ∈ X(a). (ii) If x ∈ X(a) then a · x = 0, αa · x = a · (αx) = 0 and thus αx ∈ X(a).
Problem 1.24 on page 10. Since (i) the dimension of Rn equals n, (ii) a itself spans (occupies) a linear subspace of dimension 1 and (iii) X(a) contains all those x’s which are orthogonal to a, it is not hard to see that there are n − 1 directions left in which to find vectors orthogonal to a. Thus the dimension of X(a) must be equal to n − 1.
Problem 1.26 on page 10. Since x^i ∈ X(a, b) implies a · x^i = b, and Σ_i α_i = 1 for any affine combination Σ_i α_i x^i, it follows that

b = (α_1 + . . . + α_k) b
  = α_1 a · x^1 + . . . + α_k a · x^k
  = a · (α_1 x^1 + . . . + α_k x^k)

and thus Σ_i α_i x^i ∈ X(a, b).

Chapter 2

Matrix Algebra


## 2.1 Linear Spaces

Thus far we have thought of vectors as points in Rn represented by n-tuples of real numbers. This
is a little like thinking of “127 Main Street” as a 15 character text string when, in reality, it’s a house.
Similarly, an n-tuple of real numbers is best regarded as the address of the vector that lives there.

Figure 2.1: A linear space (left) and the corresponding “address space” (right)

All this can be made less abstract by constructing a “coordinate free” linear space using a pencil, ruler,
protractor and a blank sheet of paper. Begin by placing a point on the paper and labeling it o to
represent the origin. Then arbitrarily pick another couple of points, label them x and y and draw
arrows connecting them to o. This is illustrated in the left-hand panel of Figure 2.1.
The lengths, ‖x‖ and ‖y‖, of x and y, respectively, can be measured with the ruler. The scalar multiple
of x by, say, 3/2 can then be obtained by extending x using the ruler until the length is 3/2 times as
long as x. Multiplying by a negative real number, say −2, would require extending x in the opposite
direction until its length is 2 times the original length. The scalar multiple of an arbitrary point z by
the real number a is then obtained by expanding (or contracting) z until its length equals |a|·‖z‖ and
then reversing the direction if a is negative.
To add, say, x and y use the protractor to construct a parallel to y through x and a parallel to
x through y. The intersection of these parallels gives x + y. Adding any other two points would
similarly be accomplished by “completing the parallelogram” formed by the two points.
Note that x and y are linearly independent since ax + by = o has only the trivial solution a = b = 0.
Any other point, z, can be expressed as a linear combination, z = ax + by, for appropriate choices of
the real numbers a and b. This means that the two vectors, x and y, form a basis for our linear space
which, consequently, is 2-dimensional. All this is possible without axes and coordinates.
Now let’s add coordinates by choosing x and y, respectively, as the two basis vectors for our linear
space. The corresponding 2-dimensional “address” space is illustrated in the right-hand panel of Fig-
ure 2.1 where, for example, (1, 0) is the address of x since x lives 1 unit out the first basis vector (x)
and 0 units out the second basis vector (y). In general, (a, b) in the right-hand panel is the address of
the vector ax + by in the left-hand panel.
Definition 2.1. A linear space is an abstract set, L, with a special element called the “origin” and
denoted o together with an operation on pairs of elements in L called “addition” and denoted + and
another operation on elements in L and real numbers called “scalar multiplication” with the property
that for any x, y ∈ L and any a ∈ R: (i) x + o = x, (ii) 0 x = o, (iii) x + y ∈ L and (iv) ax ∈ L.

## 2.2 Real Valued Linear Transformations and Vectors

Suppose that L is an n-dimensional linear space and that b = {b1 , b2 , . . . , bn } ⊂ L is a basis for L. A
real-valued linear transformation on L is a map, T , from L into R with the property that T (ax + by) =
aT (x) + bT (y) for all real numbers a and b and all x, y ∈ L.

Since b is a basis for L, an arbitrary x̂ ∈ L must be expressible as a linear combination of the elements
of b. Thus x̂ = Σ_i x_i b_i where x = (x1 , x2 , . . . , xn ) ∈ Rⁿ and, since T is linear, T (x̂) = T (Σ_i x_i b_i) =
Σ_i T (b_i) x_i = a · x, where a_i ≡ T (b_i) and thus a ∈ Rⁿ. This means that a · x is the image T (x̂)
when x is the address of x̂. It also means that for every real-valued linear transformation on the
n-dimensional linear space, L, there is a corresponding vector, a ∈ Rⁿ, that represents the associated
formula for getting the image of a point from its address.
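As a concrete sketch of this correspondence (in Python rather than the text's Mathematica, with invented values for the images T(b_i)): once those images are tabulated in a vector a, the image of any point is just a dotted with the point's address.

```python
# Hypothetical values a_i = T(b_i) for a linear functional on a
# 3-dimensional space, tabulated as the vector a.
a = [2.0, -1.0, 5.0]

def T_of_address(x):
    """Image of the point whose address is x: T(x-hat) = a . x."""
    return sum(ai * xi for ai, xi in zip(a, x))

# Linearity in terms of addresses: T(2u + 3v) = 2 T(u) + 3 T(v).
u, v = [1.0, 0.0, 2.0], [0.0, 1.0, 1.0]
combo = [2 * ui + 3 * vi for ui, vi in zip(u, v)]
lhs = T_of_address(combo)
rhs = 2 * T_of_address(u) + 3 * T_of_address(v)
```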
♦ Query 2.1. Let L∗ denote the set of all real-valued linear transformations of the linear space L and
define addition and scalar multiplication for elements of L∗ as follows for all f , g ∈ L∗ and α ∈ R:

## (f + g)(x) ≡ f (x) + g(x), ∀x ∈ L

(αf )(x) ≡ αf (x), ∀x ∈ L

L∗ thus defined is called the dual space of L. Is it a linear space and, if so, what is its dimensionality?

## 2.3 Linear Transformations and Matrices

In general, a linear transformation is a mapping that is, well, linear. This means (i) that if x maps into
T (x) and a is a real number, then ax must map into aT (x) and (ii) that if x and y map into T (x) and
T (y), respectively, then x + y must map into T (x) + T (y).

Let’s suppose that the domain and range of the linear transformation are both equal to the 2-dimensional
linear space illustrated in Figure 2.1 on the previous page and construct a linear transformation. Con-
sider the left-hand panel of Figure 2.2. First select the same basis vectors as before, x and y. Now
choose arbitrary points to be the images of these two points and label them T (x) and T (y). You’re
done. That’s right, you have just constructed a linear transformation. To see why simply note that any
point in the domain, z, can be expressed as a linear combination of the basis vectors, z = ax + by.
But then the linearity of T implies that T (z) = T (ax + by) = aT (x) + bT (y). Thus the image of any
point in the domain is completely determined by the starting selection of T (x) and T (y).

Figure 2.2: A linear transformation (left: the chosen images T(x) and T(y); right: the address view with T(x) = (1, 2/3) and T(y) = (1/2, 1))

Note that the T (x) and T (y) in the illustration are linearly independent. This need not be the case:
they could be linearly dependent and span either a one-dimensional linear subspace of L (a line) or a
zero-dimensional linear subspace (the origin). See Problem 2.1 on the following page.

As before, the right-hand panel of Figure 2.2 gives the “address view” of the same linear transforma-
tion. This means that x = (1, 0) maps into T (x) = (1, 2/3), y = (0, 1) maps into T (y) = (1/2, 1) and,

in general, z = (z1 , z2 ) maps into

T (z1 x + z2 y) = z1 T (x) + z2 T (y)
               = z1 (1, 2/3) + z2 (1/2, 1)

               = [ 1    1/2 ] [ z1 ]
                 [ 2/3  1   ] [ z2 ]

Thus the matrix-vector product

[ 1    1/2 ] [ z1 ]
[ 2/3  1   ] [ z2 ]

gives a formula for computing the address of the image of z from the address of z.
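A quick numeric check of this formula (a Python sketch; the helper name mat_vec is ours):

```python
# Address matrix of the transformation of Figure 2.2: the columns are the
# addresses (1, 2/3) and (1/2, 1) of T(x) and T(y).
A = [[1.0, 0.5],
     [2.0 / 3.0, 1.0]]

def mat_vec(A, z):
    """Each component of Az is a row of A dotted with z."""
    return [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row in A]

# z = (3, 6) should map to 3*(1, 2/3) + 6*(1/2, 1) = (6, 8).
image = mat_vec(A, [3.0, 6.0])
```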
 Problem 2.1. Suppose, in the construction of the linear transformation in Figure 2.2 on the preceding
page, that x and y are linearly independent but that T (x) and T (y) were chosen in a way that made
them linearly dependent and, in fact, span only a one-dimensional linear subspace. Discuss the
implications for the image of the domain under the transformation and for the matrix that maps
addresses for the transformation.
Definition 2.2. A linear transformation is a mapping, T , which associates with each x in some n-
dimensional linear space, D, a point T (x) in some m-dimensional linear space, R, with the property
that if x^1 , . . . , x^k ∈ D and α1 , . . . , αk ∈ R then

T ( Σ_i α_i x^i ) = Σ_i α_i T (x^i)

## It is customary to refer to D as the domain and R as the range of the transformation.

While the definition imposes no restriction upon the values of m and n it is convenient to assume for
the moment that m = n and R = D. Suppose that b1 , . . . , bn ∈ D is a basis for both D and R, and let
T (bj ), j = 1, . . . , n be the images of these basis vectors under the transformation. Since T (bj ) belongs
to R = D it can be expressed as a linear combination of the basis vectors:
T (b_j) = Σ_{i=1}^n a_ij b_i

The matrix

[ a11  a12  · · ·  a1n ]
[ a21  a22  · · ·  a2n ]
[  ⋮    ⋮    ⋱     ⋮  ]
[ an1  an2  · · ·  ann ]
obtained in this way has for its jth column the “address” of T (bj ) in terms of the basis, i.e., T (bj )
“lives” a1j out the basis vector b1 , a2j out b2 and so forth.
Similarly, an arbitrary vector x ∈ D can be expressed as a linear combination of the basis vectors
x = Σ_{j=1}^n x_j b_j

and the resulting column vector

[ x1 ]
[ x2 ]
[  ⋮ ]
[ xn ]

can be interpreted as the address of x in terms of the basis.
Now since T is linear,

T (x) = T ( Σ_{j=1}^n x_j b_j )
      = Σ_{j=1}^n x_j T (b_j)
      = Σ_{j=1}^n x_j ( Σ_{i=1}^n a_ij b_i )
      = Σ_{i=1}^n ( Σ_{j=1}^n a_ij x_j ) b_i

so that the matrix-vector product

[ a11  a12  · · ·  a1n ] [ x1 ]
[ a21  a22  · · ·  a2n ] [ x2 ]
[  ⋮    ⋮    ⋱     ⋮  ] [  ⋮ ]
[ an1  an2  · · ·  ann ] [ xn ]

can be interpreted as the address of T (x) in terms of the basis.

A similar result can be established when m 6= n so that to every linear transformation which maps an
n-dimensional linear space into an m-dimensional linear space there corresponds an m by n matrix
for mapping addresses in terms of given bases for the domain and the range, and vice versa. This
being the case, the study of linear transformations centers upon the matrix-vector product Ax or,
equivalently, upon the linear transformations, T , for which D = Rn and R = Rm .
♦ Query 2.2. Suppose m 6= n and thus R 6= D. Let d1 , d2 , . . . , dn ∈ D be a basis for D and r1 , r2 , . . . , rm ∈
R be a basis for R. Derive the formula for mapping the address of x ∈ D into the address of T (x) ∈ R.

Now choose a subset of the domain, Rⁿ, and recall that:

Definition 2.3. The image of X ⊆ Rn under T is

## T (X) ≡ {y ∈ Rm | y = T (x), x ∈ X} = {y ∈ Rm | y = Ax, x ∈ X}

Note that T (Rn ), the set of all linear combinations of the columns of A, is a linear subspace with a
dimension equal to the number of linearly independent columns or rank of A. It is also true that
Rank(A) ≤ min{m, n} since there can’t be more linearly independent columns than there are columns
and since the columns themselves live in Rn . When Rank(A) < n the transformation “collapses” the
domain into a linear subspace. No such collapse takes place when Rank(A) = m = n and T (Rn ) = Rn .

Definition 2.4. Given a mapping T : Rⁿ → Rᵐ, the inverse image of Y ⊆ Rᵐ is

T −1 (Y ) ≡ {x ∈ Rn | T (x) ∈ Y }

Definition 2.5. A mapping is invertible iff the inverse image of any point in the range is a single point
in the domain.

Note that when T is invertible

T⁻¹(T (x)) = x = T (T⁻¹(x))        (2.1)
 Problem 2.2. [Answer] Show that the transformation associated with the matrix A is invertible iff
Rank(A) = m = n.
 Problem 2.3. Suppose that the n × n matrix A is invertible. Does it follow that Ax = 0 =⇒ x = 0?
 Problem 2.4. Suppose that A is an n × n matrix and that Ax = 0 =⇒ x = 0. Does it follow that A is
invertible?

## 2.5 Change of Basis and Similar Matrices

What difference does the choice of a basis make to the matrix that represents the linear transformation
with respect to the basis? Suppose that A is the original matrix, b̂1 , b̂2 , . . . , b̂n is the new basis
and b1 , b2 , . . . , bn is the original basis. Since each original basis vector must be uniquely associated with
a new basis vector and since, as bases, both must be linearly independent, this change of basis defines
a linear transformation which maps the new basis vectors to the old ones and this transformation
must be invertible. Let P be the matrix version of this transformation so that if x̂ is the address of a
vector in terms of the new basis, then P x̂ is the address of the same vector in terms of the old basis.
Since the transformation itself has not changed, it must be the case that x = P x̂ maps into Ax or,
in terms of the new basis, that x̂ maps into P −1 Ax = P −1 AP x̂. Thus B = P −1 AP is the matrix that
represents the transformation with respect to the new basis.
Definition 2.6. Two matrices, A and B, are called similar if there exists an invertible matrix P such
that B = P⁻¹AP .
Theorem 3. Two matrices, A and B, represent the same linear transformation with respect to different
bases iff A and B are similar.

## The power of this result is twofold:

1. From a collection of similar matrices, the simplest or most analytically convenient can be selected
since they all represent the same linear transformation.

2. The set of all linear transformations can be partitioned into sets of similar transformations and
a simplest representative selected from each to form a set of canonical or representative forms.
For example, it can be shown that all 2 × 2 matrices are similar to one of the following three
matrices:

[ a   b ]        [ c  0 ]        [ r  0 ]
[ −b  a ]        [ 0  d ]        [ 1  r ]        (2.2)
Understanding linear transformations of 2-dimensional linear spaces then reduces to understanding these three canonical forms.
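A numeric sketch of similarity in Python (the matrices A and P below are invented for illustration): B = P⁻¹AP represents the same transformation in a new basis, so basis-independent quantities such as the trace and the determinant agree.

```python
A = [[4.0, 3.0], [2.0, 6.0]]
P = [[1.0, 1.0], [0.0, 1.0]]          # an invertible change-of-basis matrix

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

B = mat_mul(mat_mul(inv2(P), A), P)   # similar to A

trace_A = A[0][0] + A[1][1]
trace_B = B[0][0] + B[1][1]
```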

 Problem 2.5. Suppose

A = [ a   b ]
    [ −b  a ]

Show that ‖Ax‖/‖x‖ = √(a² + b²) and cos(θ) = a/√(a² + b²), where θ is the angle between x and Ax,
and thus that this transformation corresponds to a rotation and either a lengthening or a shortening.
Hint: For the first part, try Mathematica with

A = {{a, b}, {-b, a}};

x = {x1, x2};
Assuming[{Element[x1, Reals], Element[x2, Reals], Element[ a, Reals],
Element[b, Reals]}, Simplify[Norm[A.x]/Norm[x]]]
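For readers without Mathematica, the same claim can be spot-checked numerically in pure Python at arbitrarily chosen values of a, b and x:

```python
import math

a, b = 2.0, 1.0                       # arbitrary entries of A
A = [[a, b], [-b, a]]
x = [3.0, -1.0]                       # arbitrary nonzero vector

Ax = [A[0][0] * x[0] + A[0][1] * x[1],
      A[1][0] * x[0] + A[1][1] * x[1]]

def norm(v):
    return math.sqrt(v[0] ** 2 + v[1] ** 2)

stretch = norm(Ax) / norm(x)          # claimed: sqrt(a^2 + b^2)
cos_theta = (x[0] * Ax[0] + x[1] * Ax[1]) / (norm(x) * norm(Ax))
# claimed: cos_theta = a / sqrt(a^2 + b^2)
```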

♦ Query 2.3. Suppose

A = [ c  0 ]
    [ 0  d ]

Interpret the transformation T , i.e., what are the images, T (x) and T (y), of the two basis vectors, x
and y?

## 2.6 Systems of Linear Equations

Solving simultaneous systems of linear equations involves nothing more than identifying the proper-
ties of the inverse image of a linear transformation. To solve the homogeneous system

a11 x1 + · · · + a1n xn = 0
            ⋮
am1 x1 + · · · + amn xn = 0

or Ax = 0 is to find the inverse image of 0 under this linear transformation. Similarly, to solve the
non-homogeneous system
a11 x1 + · · · + a1n xn = b1
            ⋮
am1 x1 + · · · + amn xn = bm

## or Ax = b is to find the inverse image of b under this transformation.

Two distinct views of the matrix-vector product prove useful. In the column view, the vector Ax is
viewed as a linear combination of the columns of A using the components of x as the weights:

[ a11 ]        [ a12 ]                  [ a1n ]
[ a21 ]        [ a22 ]                  [ a2n ]
[  ⋮  ] x1 +  [  ⋮  ] x2 + · · · +    [  ⋮  ] xn        (COL)
[ am1 ]        [ am2 ]                  [ amn ]

In the row view, the components of the vector Ax are viewed as the dot products of the rows of A
with the vector x:

[ a11  a12  · · ·  a1n ] · x
[ a21  a22  · · ·  a2n ] · x        (ROW)
             ⋮
[ am1  am2  · · ·  amn ] · x

## 2.6.1 Homogeneous Equations

Consider the homogeneous system Ax = 0 using the column view. A non-trivial solution (x 6= 0)
is possible iff the columns of A are linearly dependent since a non-trivial linear combination of the
columns using the components of x as weights can only be equal to zero if the columns are linearly
dependent.
The row view confirms this since x must be orthogonal to each row of A and thus to the linear
subspace spanned by the rows of A. This is possible iff Rank(A) = r < n, in which case the rows span
an r-dimensional linear subspace and there are n − r directions left to look for things orthogonal. The
solution set in this case, not surprisingly, is itself a linear subspace of dimension n − r and is called
the null space of A.
 Problem 2.6. Suppose

A = [ 0 1 0 0 0 ]
    [ 0 0 0 1 1 ]
    [ 0 1 0 1 1 ]
    [ 1 1 0 0 1 ]

The Mathematica command

MatrixRank[A]

gives the rank of the matrix A and the command

NullSpace[A]

gives an orthogonal basis for the null space of A. (i) What is the rank of A? (ii) Give an orthogonal
basis for the null space of A. (iii) What is the dimension of the null space of A?

The column view is illustrated in the left-hand panel of Figure 2.3 for the case in which

A = [ 6  −3 ]
    [ 4  −2 ]

Figure 2.3: Non-trivial Solutions for Ax = 0 (column view on the left, row view on the right)

Since rank(A) = 1 there are non-trivial choices for the weights x1 and x2 for which A·1 x1 + A·2 x2 = 0,
e.g., (x1 , x2 ) = (1, 2). The right-hand panel presents the corresponding row view in which the solution
set is a 2 − 1 = 1 dimensional linear subspace orthogonal to the linear subspace spanned by the rows
of A. Note that (x1 , x2 ) = (1, 2) belongs to the solution set.
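A quick Python check that (x1, x2) = (1, 2) is indeed a non-trivial solution, read both ways (both computations are, of course, the same product Ax):

```python
A = [[6.0, -3.0], [4.0, -2.0]]
x = [1.0, 2.0]

# Column view: x1*(column 1) + x2*(column 2).
col_combo = [6.0 * x[0] + (-3.0) * x[1],
             4.0 * x[0] + (-2.0) * x[1]]

# Row view: x is orthogonal to each row of A.
row_dots = [row[0] * x[0] + row[1] * x[1] for row in A]
```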

## 2.6.2 Non-Homogeneous Equations

The non-homogeneous system Ax = b is similar. The column view suggests that a solution is possible
iff b lies in the linear subspace spanned by the columns of A. Put somewhat differently, a solution
is possible iff Rank(A|b) = Rank(A). Given any one such solution, x ∗ , it is possible to obtain all
solutions as follows. Since Ax ∗ = b it follows that if x 0 is any other solution it must be the case
that Ax ∗ = Ax 0 = b or A(x 0 − x ∗ ) = 0. Now we already know that solutions to Ax = 0 form a
linear subspace of dimension n − Rank(A). The solutions to Ax = b must then correspond to the
set obtained by adding x ∗ to each of the solutions to Ax = 0 — an affine subspace of dimension
n − Rank(A).
This is illustrated in Figure 2.4 for the case in which

A = [ 4   3 ]        b = [   5 ]
    [ −3  4 ]            [ −10 ]

Figure 2.4: A Unique Solution for Ax = b (column view on the left, row view on the right)

Since rank(A) = 2 = n, the solutions to Ax = b must form an affine subspace of dimension zero — a single
point — corresponding to the trivial solution of Ax = 0. In the column view illustrated in the
left-hand panel this unique solution for x is obtained by “completing the parallelogram” whose sides
correspond to the columns of A and whose diagonal corresponds to b. It follows that the unique
solution is x = (2, −1). Notice that if the columns of A were chosen as the basis, then the address
of b would be (2, −1). In the row view illustrated in the right-hand panel, the unique solution for x
corresponds to the intersection of

S1 = {x | (4, 3) · (x1 , x2 ) = 5}

a hyperplane orthogonal to the first row of A and lying a directed distance equal to 5/‖(4, 3)‖ = 1
from the origin, and

S2 = {x | (−3, 4) · (x1 , x2 ) = −10}

a hyperplane orthogonal to the second row of A and lying a directed distance equal to −10/‖(−3, 4)‖ =
−2 from the origin.
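The claimed solution can be checked directly in Python: x = (2, −1) reproduces b and lies on both hyperplanes.

```python
A = [[4.0, 3.0], [-3.0, 4.0]]
b = [5.0, -10.0]
x = [2.0, -1.0]                        # the unique solution

Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
on_S1 = 4.0 * x[0] + 3.0 * x[1]        # should equal 5
on_S2 = -3.0 * x[0] + 4.0 * x[1]       # should equal -10
```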

## 2.7 Square Matrices

## 2.7.1 The Inverse of a Matrix

When T : Rⁿ → Rⁿ is invertible, the inverse image of any point, T⁻¹(x), is itself a point. Thus T⁻¹ is
also a linear transformation. As such it has an associated matrix which is denoted, naturally enough,
A⁻¹. A consequence of Equation 2.1 on page 20 is that

A⁻¹Ax = x = AA⁻¹x

or that A⁻¹A = AA⁻¹ = I, where I is the identity matrix

[ 1  0  · · ·  0 ]
[ 0  1  · · ·  0 ]
[ ⋮  ⋮    ⋱   ⋮ ]
[ 0  0  · · ·  1 ]

In particular, AA⁻¹ = I requires that

A_i· · A⁻¹_·j = 1 if i = j, and 0 if i ≠ j        (2.4)

where A_i· is the ith row of A and A⁻¹_·j is the jth column of A⁻¹. The jth column of A⁻¹ must therefore

1. be orthogonal to every row of A other than the jth.

2. form an acute angle with the jth row.

3. be just long enough to make the dot product with the jth row equal to one.

These requirements can be used to construct the inverse geometrically — see Figure 2.5 on the next
page for the case in which n = 2 and

A = [ 4  3 ]        A⁻¹ = [  1/3  −1/6 ]
    [ 2  6 ]              [ −1/9   2/9 ]
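The stated inverse can be verified numerically against Equation 2.4 (a Python sketch; mat_mul is our helper name):

```python
A = [[4.0, 3.0], [2.0, 6.0]]
Ainv = [[1.0 / 3.0, -1.0 / 6.0],
        [-1.0 / 9.0, 2.0 / 9.0]]

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Each row of A dotted with each column of A^{-1} should give the
# identity matrix, per Equation 2.4.
prod = mat_mul(A, Ainv)
```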

In Figure 2.5, R1 is the set of vectors which are orthogonal to the second column of A and form an
acute angle with the first column — the first row of A⁻¹ must belong to this set. Similarly, R2 is the
set of vectors which are orthogonal to the first column of A and form an acute angle with the second
column — the second row of A⁻¹ must belong to this set.

Figure 2.5: Constructing the Inverse

 Problem 2.7. What problem would be encountered in constructing the inverse if the columns of A
were linearly dependent?

 Problem 2.8. The formula for the inverse of a 2 by 2 matrix is:

[ a  b ]⁻¹   =   1/(ad − bc) · [  d  −b ]
[ c  d ]                       [ −c   a ]

Derive the first row of this inverse using Equation 2.4 on the previous page.

 Problem 2.9. “Derive” the formula given in Problem 2.8 using the Mathematica commands
A = {{a,b},{c,d}} and then Inverse[A]//MatrixForm.
What difference would it make to replace //MatrixForm with //InputForm?
 Problem 2.10. [Answer] Gram’s Theorem states that if A is an m by n matrix with m < n and x ∈ Rᵐ
then

AAᵀx = 0  ⟺  Aᵀx = 0

Prove Gram’s Theorem.

This theorem implies, for example, that if Rank(A) = m then Rank(AAᵀ) = m since Aᵀx = 0 has no
solution x ≠ 0, and AAᵀx = 0 must therefore have no solution either.
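The implication can be illustrated numerically (a Python sketch with an invented 2 by 3 matrix of full row rank):

```python
# A 2x3 matrix with linearly independent rows.
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]

# Form the 2x2 Gram matrix A A^T.
AAT = [[sum(A[i][k] * A[j][k] for k in range(3)) for j in range(2)]
       for i in range(2)]

# Its determinant is nonzero, so A A^T x = 0 forces x = 0, as Gram's
# Theorem predicts when Rank(A) = m.
det_AAT = AAT[0][0] * AAT[1][1] - AAT[0][1] * AAT[1][0]
```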

## 2.7.2 Application: Ordinary Least Squares Regression as a Projection

Consider the problem of ordinary least squares regression. In this problem data is available which
describes n observations on each of p exogenous variables and 1 endogenous variable. This is
arranged as follows:

• X: an n by p “exogenous data” matrix each row of which corresponds to an observation and
each column of which corresponds to an exogenous variable. There are more observations than
variables so Rank(X) = p < n.

• y: an n by 1 “endogenous data” vector each “row” of which corresponds to an observation on
the endogenous variable.

The problem is to find the projection, ŷ, of y on S = {z | z = Xβ, β ∈ Rᵖ}. The term “least squares”
derives from the fact that ŷ is the closest point to y in S and thus minimizes the sum of the squares of
the components of the difference — see Figure 2.6 on the following page.
There are two key facts:

• Since ŷ ∈ S, it must be the case that ŷ = X β̂ for some β̂. The problem of finding ŷ thus reduces
to one of finding β̂.

Figure 2.6: Ordinary Least Squares Regression (ŷ is the projection of y on S; y − ŷ is the residual)

• Since ŷ is the projection of y on S, the residual of the projection, y − ŷ, must be orthogonal to
S, the space spanned by the columns of X.

These two facts yield β̂ in three steps:

• Xj · (y − ŷ) = 0, j = 1, . . . , p. To be orthogonal to the space spanned by the columns of X, y − ŷ
must be orthogonal to each column of X.

• Xᵀy = Xᵀŷ = XᵀX β̂. Carry out the multiplication and substitute for ŷ.

• β̂ = (XᵀX)⁻¹Xᵀy. Multiply both sides by (XᵀX)⁻¹, which exists by virtue of Gram’s Theorem.

 Problem 2.11. Suppose x¹ = (1, 0, 2), x² = (2, 0, 1), y = (3, 3, 3) ∈ R³ and let L = { z ∈ R³ | z =
α₁x¹ + α₂x², α₁, α₂ ∈ R } be the linear subspace spanned by x¹ and x². Find ŷ, the projection of y on
L. Hint:

X = {{1, 2}, {0, 0}, {2, 1}}

y = {3,3,3}
X . Inverse[Transpose[X] . X] . Transpose[X] . y
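A pure-Python version of the same computation (the intermediate names are ours), following β̂ = (XᵀX)⁻¹Xᵀy and ŷ = Xβ̂:

```python
X = [[1.0, 2.0], [0.0, 0.0], [2.0, 1.0]]   # columns are x^1 and x^2
y = [3.0, 3.0, 3.0]

XtX = [[sum(X[k][i] * X[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
Xty = [sum(X[k][i] * y[k] for k in range(3)) for i in range(2)]

det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
inv = [[XtX[1][1] / det, -XtX[0][1] / det],
       [-XtX[1][0] / det, XtX[0][0] / det]]
beta = [inv[i][0] * Xty[0] + inv[i][1] * Xty[1] for i in range(2)]
y_hat = [X[k][0] * beta[0] + X[k][1] * beta[1] for k in range(3)]
# The residual y - y_hat is orthogonal to both columns of X.
```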

## 2.7.3 The Determinant

The following, due to Hadley, page 87, is a typical definition of the determinant of a square
matrix — correct but not particularly intuitive.
Definition 2.7. The determinant of an n by n matrix A, written |A|, is
|A| ≡ Σ (±) a1i a2j · · · anr

the sum being taken over all permutations of the second subscripts with a term assigned a plus sign
if (i, j, . . . , r ) is an even permutation of (1, 2, . . . , n), and a minus sign if it is an odd permutation.

When n = 2 this becomes

| a11  a12 |
| a21  a22 |  = a11 a22 − a12 a21

 Problem 2.12. Use Mathematica to derive the formulas for the determinant and inverse of a general
3 × 3 matrix by first entering

A = {{a11, a12, a13}, {a21, a22, a23}, {a31, a32, a33}}

and then Det[A] and Inverse[A]//MatrixForm.

It is often more useful to recognize that the determinant is another “signed magnitude” somewhat
analogous to the dot product which is best understood by examining its sign and its magnitude
separately. In Figure 2.7 the (linearly independent) columns of

A = [ 4   3 ]        |A| = 4 × 4 − 3 × (−3) = 25
    [ −3  4 ]

have been illustrated and a parallelogram or, in this case a square, has been formed by completing the
sides formed by these columns.

The first thing to notice is that movement from the first to the second axis is counter clockwise and
that the movement from the first column to the second is also counter clockwise. Thus the columns
of A have the same orientation as the axes. This means that the determinant has a positive sign.
(Switch the columns and the determinant would be negative.) The magnitude, moreover, corresponds
to the area of this parallelogram.

Figure 2.7: The Determinant in R²: An Oriented Area

Now consider

B = [ 6  −3 ]
    [ 2  −1 ]

The parallelogram formed by these columns is, in this case, degenerate — a segment of a line rather
than an area. The determinant is again equal to the area enclosed within this line interval which, in
this case, is equal to zero.
 Problem 2.13. The “formula” for a 2 by 2 determinant is:

| a  b |
| c  d |  = ad − bc

Show that |ad − bc| is the area of the parallelogram formed by the columns. Hint: let x = (a, c),
y = (b, d) and note that the area of the parallelogram is equal to the length of the base, ‖x‖, times
the altitude, ‖y − ŷ‖, where ŷ is the projection of y on the linear subspace spanned by x.
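The hint can be checked numerically (a Python sketch at one arbitrary, non-orthogonal choice of columns):

```python
import math

a, b, c, d = 4.0, 3.0, 2.0, 6.0        # columns x = (a, c), y = (b, d)
x = (a, c)
y = (b, d)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# Projection of y on the span of x, and the residual (the "altitude").
alpha = dot(y, x) / dot(x, x)
y_hat = (alpha * x[0], alpha * x[1])
resid = (y[0] - y_hat[0], y[1] - y_hat[1])

base = math.sqrt(dot(x, x))
altitude = math.sqrt(dot(resid, resid))
area = base * altitude                  # should equal |ad - bc| = 18
```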

Higher dimensional cases are analogous. The determinant of a 3 by 3 matrix, for example, has a sign
which depends upon whether the columns have the same orientation as the axes and a magnitude
equal to the volume of the parallelepiped formed by the columns. [A parallelepiped is a solid each
face of which is a parallelogram.] See Figure 2.8. When the columns are linearly dependent the
parallelepiped degenerates into a plane area (rank 2) or a line interval (rank 1), both of which have
zero volume and the determinant, accordingly, is equal to zero.

The determinant of an n by n matrix, analogously, has a sign which depends upon the orientation of
the columns and a magnitude equal to the volume of the “hyper” parallelepiped formed by the
columns.

Figure 2.8: The Determinant in R³: An Oriented Volume

 Problem 2.14. Suppose A is an n by n matrix. Provide geometrical interpretations for the following
propositions:

1. Suppose that Â·i = A·i for i ≠ k and

   Â·k = λA·k

   i.e., Â is obtained from A by multiplying the kth column of A by a number λ. Then |Â| = λ|A|.

2. Suppose that Â·i = A·i for i ≠ k and

   Â·k = A·k + Σ_{i≠k} λi A·i

   i.e., Â is obtained from A by adding a linear combination of the other columns to the kth
   column. Then |Â| = |A|.

 Problem 2.15. Suppose A is a non-singular n by n matrix. Show that |Ax| = α |A| for some α ∈ R
which depends only upon x. What is α?

An important application of the determinant is provided by:

Theorem 4 (Cramer’s Rule). If Ax = b with A an n by n matrix and |A| ≠ 0 then

xi = |Bi| / |A|

where Bi is obtained by replacing the ith column of A with b.

The geometrical interpretation of this theorem is quite simple and is illustrated for the case in which
n = 2 in Figure 2.9 on the next page. Note first that the columns of A, labeled A1 and A2 , are linearly
independent and the solution for both x1 and x2 can be obtained by completing the parallelogram:

x1 = ‖b1‖/‖A1‖        x2 = ‖b2‖/‖A2‖

Let’s use Cramer’s Rule to find, say, x1. Since we wish to identify the first component of x we begin
by replacing the first column of A with b to obtain B1. Cramer’s rule then asserts that

x1 = |B1| / |A|

Our task then is to show that

|B1| / |A| = ‖b1‖ / ‖A1‖        (2.5)

Figure 2.9: Using Cramer’s Rule to Solve Ax = b for x1

Note first that |A| is the oriented area of the parallelogram formed by the first and second columns
of A which, in this case, is positive and could be computed by multiplying the length of the “base”,
oA2, by the “altitude”, the distance between the parallel lines ob2 and A1e. Since the parallelogram
with vertices at o, d, e and A2 has the same base and the same altitude, its area is also equal to |A|.
Call this parallelogram PA.

Turning attention to the numerator, |B1| is the area of the parallelogram with vertices at o, b, f and
A2. Call this parallelogram PB. Since PB has the same base as PA, the ratio of the area of PB to the
area of PA, |B1|/|A|, is the same as the ratio of the distance between ob2 and b1f and the distance
between ob2 and A1e. But this is the same as the ratio ‖b1‖/‖A1‖, which establishes Equation 2.5.
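Cramer’s Rule is easy to implement directly for the 2 by 2 case; a Python sketch (cramer2 and det2 are our helper names), checked on the system of Figure 2.4:

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def cramer2(A, b):
    """Solve Ax = b for a 2x2 A with det2(A) != 0 via Cramer's Rule."""
    d = det2(A)
    B1 = [[b[0], A[0][1]], [b[1], A[1][1]]]   # b replaces column 1
    B2 = [[A[0][0], b[0]], [A[1][0], b[1]]]   # b replaces column 2
    return [det2(B1) / d, det2(B2) / d]

x = cramer2([[4.0, 3.0], [-3.0, 4.0]], [5.0, -10.0])   # expect (2, -1)
```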

A particularly important “characterization” of a square matrix is provided by its characteristic roots
and characteristic vectors:

Definition 2.8. If A is an n by n matrix, λ is a scalar and x ≠ 0 is an n by 1 vector, then λ is a characteristic
root of A and x is the associated characteristic vector iff λ and x solve the characteristic equation:

Ax = λx (2.6)

Characteristic roots and vectors are also sometimes called (i) eigenvalues and eigenvectors or (ii) latent
roots and latent vectors.
A fact worth noting about the characteristic roots of a matrix is that they characterize the underlying
linear transformation and are invariant with respect to the choice of basis — recall the discussion of
Section 2.5 on page 20. To see this note that if A is the matrix representation of the linear
transformation T for a particular choice of basis, then to be a characteristic root of A, λ must satisfy

T (x) = λx

But this means that matrices which represent the same linear transformation under alternative choices
of basis, i.e., similar matrices, will have the same characteristic roots.
Since Equation 2.6 can be rewritten as the homogeneous equation

[A − λI]x = 0

it follows that λ is a characteristic root of A iff

|A − λI| = 0

The expansion of this determinant

| a11 − λ    a12      · · ·    a1n     |
|  a21      a22 − λ   · · ·    a2n     |   =  0
|   ⋮          ⋮        ⋱       ⋮     |
|  an1       an2      · · ·   ann − λ  |

is a polynomial in λ with (−λ)n the highest order term [the product of the diagonal elements]. From
the fundamental theorem of algebra we know that such a polynomial will have n, not necessarily
distinct, solutions for λ.
 Problem 2.16. The characteristic roots of A may be either real or complex but if they are complex
they must occur in conjugate pairs so that if λ = a + bi is a root then λ∗ = a − bi must also be a root.
Show that it follows that both the sum of the roots and the product of the roots are necessarily real.

Two elementary facts about characteristic roots are worth noting.

Theorem 5. If A is an n by n matrix with characteristic roots λi , i = 1, . . . , n, then

λ1 + λ2 + · · · + λn = trace(A)
λ1 λ2 · · · λn = |A|

where the trace of A is the sum of the diagonal elements:

trace(A) ≡ Σ_{i=1}^n aii

Since similar matrices must have the same characteristic roots, it follows from Theorem 5, that similar
matrices have the same trace and determinant as well.
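For a 2 by 2 matrix the characteristic polynomial is λ² − trace(A)λ + |A| = 0, so Theorem 5 can be checked by solving this quadratic directly (a Python sketch with an invented matrix whose roots happen to be real):

```python
import math

A = [[1.0, 2.0], [4.0, 3.0]]
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Roots of lambda^2 - tr*lambda + det = 0 (discriminant >= 0 here).
disc = tr * tr - 4.0 * det
lam1 = (tr + math.sqrt(disc)) / 2.0
lam2 = (tr - math.sqrt(disc)) / 2.0
```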
 Problem 2.17. [Answer] Show that Theorem 5 is valid for the 2 by 2 matrix

A = [ a  b ]
    [ c  d ]

 Problem 2.18. What are the characteristic roots of the three canonical matrices given in Equation 2.2
on page 20?
 Problem 2.19. Suppose

A = [ 1 2 3 4 ]
    [ 4 1 2 3 ]
    [ 3 4 1 2 ]
    [ 2 3 4 1 ]

The Mathematica commands for finding the characteristic roots and vectors of a matrix are Eigenvalues[A]
and Eigenvectors[A], respectively. What are the characteristic roots and vectors of A?

## 2.8 Farkas’ Lemma

A final result that will prove very important in subsequent analysis takes us into the realm of linear
inequalities. It states that a system of linear equations will have a solution precisely when another
system of linear inequalities does not have a solution. The importance of this result involves
“indirection” — often it will be easier to establish the existence of a solution to the system of interest
by showing that a solution to the complementary system cannot exist.
Theorem 6 (Farkas’ Lemma). Suppose A is an m by n matrix and b ≠ 0 is a 1 by n row vector. Exactly
one of the following holds:

1. there exists a z ∈ Rⁿ such that Az ≥ 0 and b · z < 0, or

2. there exists a y ∈ Rᵐ with y > 0 such that yA = b.

Figure 2.10: Farkas’ Lemma (left-hand panel: a z with Az ≥ 0 and b · z < 0 exists; right-hand panel: b lies in the cone spanned by the rows of A and yA = b has a solution y > 0)

The basis of this theorem is quite simple and is illustrated in Figure 2.10. Either

1. a vector z exists which forms a non-obtuse angle with every row of A and an obtuse angle with b
(the left-hand panel)

## 2. or b lies in the “cone” generated by the rows of A (the right-hand panel)

The key to understanding Figure 2.10 is to fix the rows of A and rotate b clockwise in moving from the
left-hand panel to the right-hand panel. Initially b lies outside the cone generated by the rows of A. It
follows that there is a vector z for which Az ≥ 0 and for which b · z < 0, i.e., a vector z that makes a
non-obtuse angle with every row of A and an obtuse angle with b.

As b rotates clockwise this solution disappears precisely at the point at which b enters the cone
spanned by the rows of A but then there is a solution to yA = b with y > 0. This solution persists
until b emerges from the cone spanned by the rows of A but at this point there is again a solution to
Az ≥ 0 and b · z < 0.

31
2.8.1 Application: Asset Pricing and Arbitrage

Consider a two period model of asset pricing. There are n assets which can be traded in the first
period at prices p. The first period budget constraint limits an investor endowed with portfolio ŝ to
portfolios satisfying
p · s ≤ p · ŝ (2.7)

Asset prices in the second period are uncertain and depend upon which of m possible “states of
nature” occurs. It is common knowledge when first period trading takes place that the second-period
price of the jth asset will be aij if the ith state occurs. Let A denote the corresponding m × n matrix
of second period prices in which rows correspond to states and columns to assets. Holding portfolio
s would then pay As in the second period, i.e., the ith component of this m-tuple would be the total
value of the portfolio if the ith state occurred.
Note that the components of s are not required to be non-negative. Indeed, negative components
correspond to “short” positions, e.g., s1 = −1 would be interpreted as taking a short position of one
share on the first asset. This means the investor borrows a share of this asset from the market, sells
it for p1 and then uses the receipts to purchase other shares. The catch, of course, is that such loans
must be repaid in the second period. Our investor would thus be required to purchase one share of
the first asset in the second period, whatever its price turns out to be, to repay the first-period loan.
The second-period “solvency” constraint is that the investor must be able to repay such loans or,
equivalently, that holding the portfolio not entail bankruptcy in any state:

As ≥ 0 (2.8)

It is important to realize that the components of As are the commodities that investors care about
— the components of s only matter to the extent that they affect As. Since p is a vector of security
prices and securities are not themselves the focus of interest, the question arises of whether or not
it is possible to identify an m-tuple of “shadow prices”, ρ, of the commodities of interest. Here ρi
would be interpreted as the price of a claim to one dollar contingent upon state i occurring in fictional
markets for such contingent claims. Trading opportunities in these fictional markets would have to be
equivalent to those in the actual markets, i.e., ρ would have to satisfy

p = ρA, ρ > 0 (2.9)

Can we be sure that a solution to Equation 2.9 exists? Well, if we make the association y = ρ, b = p
and z = s, then Farkas’ Lemma states that either Equation 2.9 will have a solution or there will be a
solution, s, to
As ≥ 0, p · s < 0 (2.10)
An s that satisfied Equation 2.10 would be a good thing, too good in fact. It not only satisfies solvency,
As ≥ 0, but also “pumps money” into the pocket of the investor in the first period since p · s < 0. In
the context of the budget constraint, Equation 2.7, this means that

p · (λs) = λp · s ≤ p · ŝ

is satisfied for an arbitrarily large λ and thus that our investor could acquire infinite first period wealth.
This is commonly called an arbitrage opportunity. If we make the reasonable supposition that p and
A preclude such arbitrage opportunities, then the existence of shadow prices satisfying Equation 2.9
is guaranteed.
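A small numerical sketch (hypothetical prices, not from the text) makes the recovery of shadow prices concrete for m = n = 2: solve p = ρA by Cramer's rule and check that ρ > 0.

```python
# Hypothetical 2-state, 2-asset example: rows of A are states, columns assets.
A = [[1.0, 2.0],
     [3.0, 1.0]]
p = [2.0, 1.5]  # first-period asset prices

# p = rho A is the 2 by 2 linear system
#   rho1*A[0][j] + rho2*A[1][j] = p[j],  j = 0, 1,
# solved here by Cramer's rule.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
rho1 = (p[0] * A[1][1] - p[1] * A[1][0]) / det
rho2 = (A[0][0] * p[1] - A[0][1] * p[0]) / det

no_arbitrage = det != 0 and rho1 > 0 and rho2 > 0
print(rho1, rho2, no_arbitrage)  # 0.5 0.5 True
```

With both shadow prices strictly positive, Farkas' Lemma rules out a portfolio s with As ≥ 0 and p · s < 0 for these numbers.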
♦ Query 2.4. Suppose m = n = 2, that the two columns of A, A1 and A2 , are linearly independent and
that p · ŝ = 1, i.e., our investor is worth one dollar in the first period.

32
1. In a graph of the positive quadrant of R2 , illustrate A1 and A2 and the points v1 ≡ A1 /p1 and
v2 ≡ A2 /p2 . Is either v1 > v2 or v2 > v1 consistent with no arbitrage opportunities?
2. Illustrate the budget constraint for contingent claims under the assumption that no arbitrage
opportunities exist. Label the regions corresponding to long positions on both assets, to a long
position on the first asset and a short position on the second and to a short position on the first
asset and a long position on the second.
3. What is the effect in your illustration of adding the solvency constraint to the budget constraint?
4. Is it possible to determine the shadow prices, ρ1 and ρ2 , from your illustration and, if so, how?
♦ Query 2.5. Suppose that no arbitrage opportunities exist and let x = As and x̂ = Aŝ. What is the
budget constraint corresponding to Equation 2.7 on the previous page in terms of x, x̂ and ρ? What
is the solvency constraint corresponding to Equation 2.8 on the previous page?
♦ Query 2.6. Suppose that a new asset is introduced, that Rank(A) = Rank(A|b) where b is the vector
of state-dependent, second-period prices for the new asset, that no arbitrage opportunities exist either
before or after the introduction of the new asset and that p = ρA is the vector of first-period prices of
the original assets. What must be the first-period price of the new asset?
♦ Query 2.7. Suppose that no arbitrage opportunities exist and that there is a riskless portfolio, s ∗ , for
which As ∗ = (1, 1, . . . , 1)T . What is the one-period riskless rate of return? Hint: What is the first-period
cost of buying claims to a sure, second-period dollar?

Problem 2.2 on page 20. Suppose that Rank(A) = m = n and that A is not invertible. Then there must
exist y, x, x′ ∈ Rn with x ≠ x′ such that y = Ax = Ax′. But this means that A(x − x′) = 0 with
(x − x′) ≠ 0 and thus the columns of A are linearly dependent — a contradiction. Conversely, suppose
A is invertible and the columns of A are linearly dependent. Then there exist weights α = (α1 , . . . , αn ),
not all zero, such that Aα = 0. Now choose any x ∈ Rn and note that x − α ≠ x + α and yet
A(x − α) = A(x + α) = Ax — thus A is not invertible.
Problem 2.10 on page 25. Suppose AAT x = 0. Then

xT AAT x = 0 ⇒ (AT x)T (AT x) = 0 ⇒ |AT x| = 0 ⇒ AT x = 0

and, conversely, if AT x = 0 then clearly AAT x = 0.
Problem 2.17 on page 30. Expanding the determinant yields

(a − λ)(d − λ) − bc = 0

or

λ² − (a + d)λ + ad − bc = 0

with roots

λ1 = (a + d + √((a + d)² − 4(ad − bc)))/2
λ2 = (a + d − √((a + d)² − 4(ad − bc)))/2

It follows immediately that

λ1 + λ2 = a + d = trace(A)
λ1 λ2 = ad − bc = |A|

33
34
Chapter 3

Topology

3.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 Countable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 Open Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.2 Closed Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.1 Separation Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.2 Generic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.3 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Sigma Algebras and Measure Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

This chapter draws much from Simmons, surely one of the most beautiful books about mathematics
ever written.

3.1 Counting

The subject of counting begins simply enough with thoughts of the positive integers, 1, 2, 3, . . .,
familiar to all of us. But counting was surely important to human beings even before such symbols
were invented. Imagine a primitive society of sheep herders whose number system was limited to
the symbols “1”, “2”, “3” and “several”, i.e., “more than 3”. How might they have kept track of herds
containing “several” sheep? One simple device might have been to place a stone in a pile for each
sheep in the herd and then, each night, to remove a stone for each sheep accounted for. Stones left in
the pile would then have indicated strays needing to be found.

35
3.1.1 Countable Sets

Similarly, the infinite set

N = {1, 2, 3, . . .}
containing all the positive integers or cardinal numbers, serves as a modern “pile of stones”. While
this set is adequate for counting any non-empty, finite set, in mathematics there are many infinite sets
just as, for the herdsmen, there were many herds with “several” sheep. The simple but profound idea
of a one-to-one correspondence that met the needs of the herdsmen also permits comparing these
infinite sets.
Definition 3.1. Two sets are said to be numerically equivalent if there is a one-to-one correspondence
between the elements of the two sets.
Definition 3.2. A countable set is a set that is numerically equivalent to the positive integers.

Suppose, for example, that we want to compare the set consisting of all positive integers with the set
consisting of all even positive integers. Since the pairing

1 2 3 ··· n ···
2 4 6 ··· 2n ···

establishes a one-to-one correspondence, the two sets must be regarded as having the same number
of elements even though one is a proper subset of the other. This situation is not unusual since every
infinite set can, in fact, be put into a one-to-one correspondence with a proper subset of itself.
Similarly, there are exactly as many perfect squares as there are positive integers because these two
sets can also be put in a one-to-one correspondence:

1    2    3    ···    n    ···
1²   2²   3²   ···    n²   ···

As another example, consider the set of all positive rational numbers, i.e., ratios of positive integers.
Surely this set is larger than the positive integers, right? No. The following array includes every
positive rational number at least once

## 1/1 1/2 1/3 1/4 ···

2/1 2/2 2/3 2/4 ···
3/1 3/2 3/3 3/4 ···
.. .. .. .. ..
. . . . .

and can be put into a one-to-one correspondence with the positive integers as follows:

1 2 3 4 5 6 7 8 9 ···
1/1 1/2 2/1 1/3 2/2 3/1 1/4 2/3 3/2 ···
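The diagonal enumeration above is easy to mechanize; here is a Python sketch generating the correspondence (numerators and denominators, with duplicates such as 2/2 included, exactly as in the text):

```python
# Walk the array by diagonals: diagonal d holds the ratios i/(d + 1 - i), i = 1..d.
def rational_enumeration(count):
    pairs = []
    d = 1
    while len(pairs) < count:
        for i in range(1, d + 1):
            pairs.append((i, d + 1 - i))  # (numerator, denominator)
            if len(pairs) == count:
                break
        d += 1
    return pairs

print(rational_enumeration(9))
# [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), (1, 4), (2, 3), (3, 2)]
```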

 Problem 3.1. Construct a one-to-one correspondence between the set of integers,

{· · · , −2, −1, 0, 1, 2, · · · }

and the set of positive integers.

So how many positive integers are there? The symbol ℵ0 , read “aleph null”, is used to represent the
number of elements or cardinality of the set N. Our list of numbers now includes its first “trans-finite”
number:
1 < 2 < 3 < · · · < ℵ0

36
3.1.2 Uncountable Sets

Not all sets with infinitely many elements are countable. Consider a countable sequence of points of
the form x1 , x2 , x3 , ... where each element xi is either 0 or 1 and a countable listing of these sequences
such as:

s1 = (0, 0, 0, 0, 0, 0, 0, · · · )
s2 = (1, 1, 1, 1, 1, 1, 1, · · · )
s3 = (0, 1, 0, 1, 0, 1, 0, · · · )
s4 = (1, 0, 1, 0, 1, 0, 1, · · · )
s5 = (1, 1, 0, 1, 0, 1, 1, · · · )
s6 = (0, 0, 1, 1, 0, 1, 1, · · · )
s7 = (1, 0, 0, 0, 1, 0, 0, · · · )
..
.

It is possible to build a new sequence s0 in such a way that its first element is different from the first
element of the first sequence in the list, its second element is different from the second element of the
second sequence in the list, and, in general, its nth element is different from the nth element of the
nth sequence in the list. For instance:

s1 = (0, 0, 0, 0, 0, 0, 0, · · · )
s2 = (1, 1, 1, 1, 1, 1, 1, · · · )
s3 = (0, 1, 0, 1, 0, 1, 0, · · · )
s4 = (1, 0, 1, 0, 1, 0, 1, · · · )
s5 = (1, 1, 0, 1, 0, 1, 1, · · · )
s6 = (0, 0, 1, 1, 0, 1, 1, · · · )
s7 = (1, 0, 0, 0, 1, 0, 0, · · · )
..
.
s0 = (1, 0, 1, 1, 1, 0, 1, · · · )

Note that the nth element of s0 is by construction different from the nth element of the nth sequence
in the list above it and thus the new sequence is distinct from all the sequences in the list. From this it
follows that the set T , consisting of all countable sequences of zeros and ones, cannot be put into
a list s1 , s2 , s3 , .... Otherwise, it would be possible by the above process to construct a sequence s0
which would both be in T (because it is a sequence of 0’s and 1’s) and at the same time not in T
(because we deliberately construct it not to be in the list). Therefore T cannot be placed in one-to-one
correspondence with the positive integers. In other words, T is uncountable.
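The diagonal construction is mechanical enough to sketch in a few lines of Python, here applied to the first seven sequences of the list above (truncated to seven terms):

```python
# Flip the n-th element of the n-th sequence to build s0.
listing = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
]
s0 = [1 - listing[n][n] for n in range(len(listing))]
print(s0)                                 # [1, 0, 1, 1, 1, 0, 1], as in the text
print(all(s0 != row for row in listing))  # True: s0 differs from every row
```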
Now consider the binary representation of a number between zero and one where, for example, 1/2
would be represented as 0.1, 1/4 would be 0.01 and so forth. Since the binary representation of a real
number between zero and one must be a countable sequence of zeros and ones preceded by a decimal
point, e.g., 0.1011011100 . . ., and since the number of such sequences is uncountable, it follows that
the set of real numbers lying between zero and one must also be uncountable.
Surely there are more real numbers than those lying between zero and one, right? No, the set of all real
numbers and the set of real numbers between zero and one, or in any other interval, are numerically

37
Figure 3.1: One-to-one Correspondence Between an Interval and the Real Line

equivalent. The one-to-one correspondence is illustrated in Figure 3.1. Simply bend the interval ab
into a semi-circle, rest the result on the real line and then associate an arbitrary point P from the
interval with that point P 0 from the real line which corresponds to the intersection of a line from the
center of the semi-circle through P with the real line.
We now have a new cardinal number, c, called the cardinal number of the continuum and our list of
numbers now includes a second “trans-finite” number:

## 1 < 2 < 3 < · · · < ℵ0 < c

 Problem 3.2. The Cantor set is obtained as follows. First let C1 denote the closed unit interval [0, 1].
Next delete from C1 the open interval (1/3, 2/3) corresponding to the middle third of C1 to get C2 and
note that C2 = [0, 1/3] ∪ [2/3, 1]. Now delete the open middle thirds of the two closed intervals to get

## C3 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1]

Continuing in this fashion we obtain a sequence of closed sets, each of which contains all its succes-
sors. The Cantor set is defined by
C = ∩∞i=1 Ci

1. Each Cn consists of a number of disjoint closed intervals of equal length. How many closed
intervals are there in C30 ?

2. The intervals removed have lengths 1/3, 2/9, 4/27, . . . , 2n−1 /3n , . . . What is the combined length
of the intervals that have been removed? Hint: Let Mathematica evaluate
Sum[2^(n-1)/3^n, {n,1,Infinity}]

You might be surprised at this point to learn that the cardinality of C is equal to c, i.e., the same as C1 .
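If Mathematica is not at hand, the partial sums in the hint can be examined directly with a Python sketch:

```python
# Partial sums of the removed lengths 2^(n-1)/3^n (a geometric series with ratio 2/3).
def removed_length(N):
    return sum(2 ** (n - 1) / 3 ** n for n in range(1, N + 1))

print([round(removed_length(N), 6) for N in (5, 10, 20)])
```

The printed values grow monotonically toward the limit that the infinite sum in the hint evaluates to.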

An interesting consequence is that since the rational numbers are countable but the real numbers are
not, the set of irrational numbers must be uncountable as well or, more poetically:

The rational numbers are spotted along the real line like stars against a black sky, and the
dense blackness of the background is the firmament of the irrationals.
– E. T. Bell

Are there any cardinal numbers between ℵ0 and c? This question, the famous continuum hypothesis,
cannot be settled by the standard axioms of set theory, though Cantor himself thought that no such
number exists. There are, on the other hand, cardinal numbers
larger than c — the number of elements in the class of all subsets of R, for example. This is one
consequence of the following theorem.
Theorem 7. If X is any non-empty set, then the cardinal number of X is less than the cardinal number
of the class of all subsets of X.

38
Suppose, for example, that X = {1}, then there are two subsets, ∅ and {1}. If X = {1, 2}, then there
are four subsets, ∅, {1}, {2} and {1, 2}. Similarly, X = {1, 2, 3} has eight subsets and, in general, if X
has n elements, then there are 2ⁿ subsets.
Continuing into the infinite realm, if X has ℵ0 elements then there are 2ℵ0 > ℵ0 subsets. Which is
larger, c or 2ℵ0 ? As noted above, the cardinality of the unit interval, c, is the same as the cardinality
of the set of all countable sequences of zeros and ones. Consider the one-to-one mapping between
the set of all subsets of the natural numbers and the set of all countable sequences of zeros and ones
defined by
f (S) = (x1S , x2S , . . .)

where

xiS ≡ 1 if i ∈ S and xiS ≡ 0 otherwise

If, for example, S = {2, 3, 5} then f (S) = (0, 1, 1, 0, 1, 0, . . . , 0, . . .). Thus f (S) gives a countable sequence
of zeros and ones for any S ⊂ N. But then it is also true that

f −1 (x) ≡ {j ∈ N | xj = 1}

Thus f −1 exists and f −1 (x) gives a subset of N for any countable sequence of zeros and ones, x. Thus
the cardinality of the continuum is the same as the cardinality of the class of all subsets of the natural
numbers and c = 2ℵ0 .
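The map f and its inverse are easy to illustrate for finite truncations (a Python sketch; the truncation length 8 is arbitrary):

```python
# f sends a subset of {1, ..., n} to its indicator sequence; f_inv recovers it.
def f(S, n):
    return [1 if i in S else 0 for i in range(1, n + 1)]

def f_inv(x):
    return {j + 1 for j, bit in enumerate(x) if bit == 1}

x = f({2, 3, 5}, 8)
print(x)         # [0, 1, 1, 0, 1, 0, 0, 0], matching the example in the text
print(f_inv(x))  # {2, 3, 5}
```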

## 3.2 Metric Spaces

Definition 3.3. Let X be a non-empty set. A metric on X is a function d that maps X × X into the
non-negative real numbers, R+ , and which satisfies the following three conditions:

1. d(x, y) ≥ 0 with d(x, y) = 0 iff x = y
2. d(x, y) = d(y, x)
3. d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality)

Definition 3.4. A metric space is a non-empty set X together with a metric, d, on X.

An example of a metric space is X = R with d(x, y) = |x − y|.

 Problem 3.3. [Answer] Show that d(x, y) = |x − y| for x, y ∈ R satisfies the three requirements
for a metric.

Another example of a metric space is X = Rn with d(x, y) = √((x − y) · (x − y)) = ‖x − y‖.

 Problem 3.4. Show that d(x, y) = √((x − y) · (x − y)) satisfies the three requirements for a metric.
Hint: For the third part remember the Cauchy-Schwarz inequality (Problem 1.6 on page 4).
Definition 3.5. Suppose (X, d) is a metric space. The diameter of a subset S ⊆ X is defined by

d(S) ≡ sup{d(x1 , x2 ) | x1 , x2 ∈ S}

## if the supremum (least upper bound) exists and infinity otherwise.

 Problem 3.5. [Answer] What is the diameter of ∅?
Definition 3.6. Suppose (X, d) is a metric space. Then a subset S ⊆ X is called bounded iff its diameter,
d(S), is finite.

39
3.2.1 Open Sets

Definition 3.7. Suppose (X, d) is a metric space. If x ∈ X and r is a positive real number, then the set

Sr (x) ≡ {y ∈ X | d(x, y) < r }

is called an open sphere with center x and radius r .

Note that an open sphere is always non-empty since it contains its center.

Note also that the term “sphere” should not be taken literally. If X = R and d(x, y) = |x − y|, then
the “sphere” is actually an interval of the real line containing numbers greater than x − r and less than
x + r . If X = R2 and d(x, y) = √((x1 − y1 )² + (x2 − y2 )²) then the sphere is actually a disk (the interior
of a circle).
 Problem 3.6. Let X = R2 and d(x, y) = max{|x1 − y1 |, |x2 − y2 |}. Show that d satisfies the three
requirements for a metric.
 Problem 3.7. In Mathematica

ChebyshevDistance[x,y] = Max[Abs[x-y]]

equals maxi |xi − yi |. Supposing
x = Table[j, {j,1,10}]
and
y = Table[11-j, {j,1,10}]
What is ChebyshevDistance[x,y]?
 Problem 3.8. In Mathematica

EuclideanDistance[x,y] = Norm[x-y]

equals √(Σi (xi − yi )²). Supposing

x = Table[j, {j,1,10}]
and
y = Table[11-j, {j,1,10}]
what is EuclideanDistance[x,y]?
 Problem 3.9. Let X = R2 and d(x, y) = |x1 − y1 | + |x2 − y2 |. Show that d satisfies the three
requirements for a metric.
 Problem 3.10. In Mathematica

ManhattanDistance[x,y] = Total[Abs[x-y]]

equals Σi |xi − yi |. Supposing
x = Table[j, {j,1,10}]
and
y = Table[11-j, {j,1,10}]
what is ManhattanDistance[x,y]? Why do you suppose that it’s called Manhattan distance?
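For readers without Mathematica, the three distances in Problems 3.7, 3.8 and 3.10 can be checked with plain Python (same x and y as in the problems):

```python
x = list(range(1, 11))              # x = Table[j, {j,1,10}]
y = [11 - j for j in range(1, 11)]  # y = Table[11-j, {j,1,10}]

chebyshev = max(abs(a - b) for a, b in zip(x, y))       # max coordinate gap
manhattan = sum(abs(a - b) for a, b in zip(x, y))       # sum of coordinate gaps
euclidean = sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

print(chebyshev, manhattan, euclidean)  # euclidean is sqrt(330)
```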

40
Definition 3.8. A subset G of the metric space X is called an open set if, for any point x in G, there is
a positive real number rx such that Srx (x) ⊆ G.
Theorem 8. Let (X, d) be a metric space. Then both the empty set, ∅, and X itself are open sets.
Theorem 9. Let (X, d) be a metric space. Then a subset G of X is an open set iff it is a countable union
of open spheres.
Theorem 10. Let (X, d) be a metric space. Then

1. Any countable union of open sets is an open set.
2. Any finite intersection of open sets is an open set.

 Problem 3.11. Consider open intervals on R of the form

ai = {x ∈ R | −1/(1 + i) < x < 1 + 1/(1 + i)}

where i is a positive integer. (i) What is ∪∞i=1 ai ? Is it open, closed or neither? (ii) What is ∩ni=1 ai ? Is it
open, closed or neither? (iii) What is ∩∞i=1 ai ? Is it open, closed or neither?

## 3.2.2 Closed Sets

Definition 3.9. If (X, d) is a metric space and A is a subset of X, then a point x ∈ X is a limit point of
A if every open set containing x contains at least one point from A other than x.
Definition 3.10. A subset A of a metric space X is a closed set if it contains all of its limit points.
Theorem 11. Let (X, d) be a metric space. Then

1. A subset A of X is a closed set iff its complement (with respect to X) is an open set.

2. The empty set, ∅, and the set X itself are both closed.

Note that ∅ is both an open and closed set and X is both an open and closed set.
Theorem 12. Let X be a metric space. Then

1. Any countable intersection of closed sets is a closed set.
2. Any finite union of closed sets is a closed set.

 Problem 3.12. Consider closed intervals on R of the form ai = [−1 + 1/(1 + i), 1 − 1/(1 + i)] where
i is a positive integer. (i) What is ∪ni=1 ai ? Is it open, closed or neither? (ii) What is ∪∞i=1 ai ? Is it open,
closed or neither? (iii) What is ∩∞i=1 ai ? Is it open, closed or neither?

3.2.3 Convergence

## Definition 3.11. A sequence on a metric space X is a list of points in X:

{x1 , x2 , x3 , . . . , xn . . .}

which is numerically equivalent to the set of positive integers. Such a sequence might be denoted by
{xi } where it is understood that i ranges over the positive integers.

Unlike a set, order matters in a sequence — there is a first element, a second element and so forth —
and a given element in X can occur more than once.

41
Definition 3.12. A sequence on metric space (X, d) is convergent if there exists a point x ∈ X such
that either of these equivalent conditions hold:

• for each r > 0, there exists a positive integer nr such that n ≥ nr implies d(xn , x) < r .
• for each open sphere centered on x, Sr (x), there exists a positive integer nr such that n ≥ nr
implies xn ∈ Sr (x).

When a sequence converges, the point to which it converges is unique and called the limit point. The
fact that the sequence xn is convergent and that x is its limit point can be expressed in a variety of
equivalent ways:

• xn approaches x
• xn converges to x
• xn → x
• lim xn = x

Every convergent sequence has the property that for any positive real number r , there exists a positive
integer nr such that m, n ≥ nr implies d(xm , xn ) < r . This follows from the triangle inequality
(Definition 3.3 on page 39) since convergence implies that there is a positive integer, nr , such that
n ≥ nr implies d(xn , x) < r /2. Thus
m, n ≥ nr ⇒ d(xm , xn ) ≤ d(xm , x) + d(x, xn ) < r /2 + r /2 = r (3.1)

A sequence for which Equation 3.1 holds is called a Cauchy sequence. Such a sequence might be
described as “trying to converge”. Whether or not it succeeds depends upon the metric space in which
it lives. If, for example, X = {x ∈ R | 0 < x < 1} then the Cauchy sequence xn = 1/n tries to converge
but the point to which it wants to converge, 0, is not in X.
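The Cauchy criterion in Equation 3.1 can be sanity-checked numerically for xn = 1/n (a Python sketch; the tail length checked is arbitrary):

```python
import math

# For a given r > 0, past n_r = ceil(2/r) every term 1/n is within r/2 of 0,
# so any two tail terms are within r of each other.
def cauchy_check(r, tail=200):
    n_r = math.ceil(2 / r)
    terms = [1 / n for n in range(n_r, n_r + tail)]
    return all(abs(a - b) < r for a in terms for b in terms)

print(cauchy_check(0.01))  # True
```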
Definition 3.13. A complete metric space is a metric space in which every Cauchy sequence converges.
A complete metric space thus has the property that every sequence that tries to converge succeeds.
Theorem 13. If Y is a subspace of a complete metric space X, then Y is complete iff Y is a closed set.
Even when a sequence does not converge, it may be possible to extract a subsequence that does
converge. The sequence {0, 1, 0, 1, . . .} on R, for example, does not converge but the subsequence
{1, 1, 1, . . .} obtained by extracting the even terms from the original sequence does converge.
Definition 3.14. A sequence {xi }, i ∈ I on a metric space X is said to have a convergent subsequence
iff there exists a countable but not finite subset J ⊆ I such that the sequence {xj }, j ∈ J converges.

3.2.4 Continuity

Definition 3.15. Suppose (X, dX ) and (Y , dY ) are metric spaces and that f is a mapping of X into Y .
Then f is said to be continuous at a point x ∈ X if either of the following equivalent conditions hold:

• for each rY > 0 there exists an rX > 0 such that dX (x, x′) < rX implies dY (f (x), f (x′)) < rY .
• for each open sphere SrY (f (x)) centered on f (x) there exists an open sphere SrX (x) centered
on x such that f (SrX (x)) ⊆ SrY (f (x)).

The mapping f is said to be continuous iff it is continuous at each point in its domain X.
Theorem 14. Suppose X and Y are metric spaces and f is a mapping of X into Y . Then f is continuous
iff f−1 (Z) is an open set in X whenever Z is an open set in Y .

42
3.3 Topological Spaces

Since Theorem 14 on the previous page expresses continuity solely in terms of open sets without any
direct reference to metrics, the possibility arises of dispensing with metrics altogether and basing
matters on open sets instead. Pursuing this idea further, Theorem 10 on page 41 gives the main
properties of the class of open sets in a metric space. This leads to the following
Definition 3.16. Let X be a non-empty set. A class T of subsets of X is called a topology on X iff it
satisfies the following two conditions:

1. the union of every countable class of sets in T is a set in T .
2. the intersection of every finite class of sets in T is a set in T .

Definition 3.17. A topological space is a non-empty set X together with a topology T on X. The sets in
the class T are called the open sets of the topological space (X, T ).

In other words, the class of open sets is closed with respect to the formation of countable unions and
finite intersections.
Note that the empty set is a subset of every set and thus the empty class of sets is a subset of every
topology. From this it follows that the empty set, as the union of the sets in the empty class, must
belong to every topology. Similarly, the intersection of sets in the empty class must, by elementary set
theory, be equal to the union of the complements of the sets in the empty class, namely X. Thus the
set X must belong to every topology on X.
Definition 3.18. Suppose (X, T ) is a topological space and x is a point in X. Then a neighborhood of x
is an element of T (an open set) that contains x.

The following are examples of topological spaces:

1. X is a metric space and T is the class of sets that are open in the sense of Definition 3.8 on
page 41. This is called the usual topology or the topology induced by the metric and is understood
to be the topology for a metric space unless another is specifically mentioned.

2. Let X be a non-empty set and let T contain the two sets ∅ and X. This is called the trivial topology.

3. Let X be a non-empty set and let T be the class of all subsets of X. This is called the discrete
topology and a space with this topology is called a discrete space.

Suppose X is a non-empty set and that Ta and Tb are two topologies on X. The statement Ta ⊆ Tb
would then mean that each open set in Ta is also an open set in Tb but not necessarily vice versa. In
other words, Tb has all of the open sets of Ta and perhaps others as well. In such circumstances Tb
would be called stronger than Ta or, equivalently, Ta would be called weaker than Tb . Two given
topologies need not be comparable, of course, but every topology is weaker than the discrete topology
(Example 3) and stronger than the trivial topology (Example 2).
Definition 3.19. Suppose (X, T ) is a topological space and Y is a subset of X. Then the relative topology
on Y is defined to be the class of all intersections of open sets in X with Y .
Definition 3.20. Let (X, Tx ) and (Y , Ty ) be topological spaces and f a mapping of X into Y . Then f is
called a continuous mapping iff f −1 (Z) is open in X (belongs to TX ) whenever Z is open in Y (belongs
to TY ).
 Problem 3.13. Consider the mapping from X = R into Y = R defined by

f (x) = x if x ≤ 0
f (x) = 1 + x if x > 0

43

Suppose that both X and Y are metric spaces with the usual metric d(x, y) = |x − y| and with the
topology induced by this metric. Give an example of an open set Z in Y for which f −1 (Z) is not open
in X.
Definition 3.21. If (X, T ) is a topological space, then a set Y ⊆ X is called a closed set iff its complement,
X \ Y , is an open set (belongs to T ).
Theorem 15. Suppose (X, T ) is a topological space. Then

• any countable intersection of closed sets in X is closed.
• any finite union of closed sets in X is closed.

In other words, the class of closed sets is closed with respect to the formation of countable intersec-
tions and finite unions.
The following shows that we could replace “open set” with “closed set” as the primitive element for
defining a topology.
Theorem 16. Suppose X is a non-empty set and that there is a class of subsets of X which is closed
with respect to the formation of countable intersections and finite unions. Then the class of all com-
plements of these sets is a topology on X whose closed sets are precisely those given initially.

## 3.3.1 Separation Properties

One of the most basic requirements of a topological space is that each of its points should be a closed
set. Whether or not this is true depends upon the separation properties of the topology.
Definition 3.22. A T1 -space is a topological space (X, T ) with the property that if x and y are distinct
points in X, then there exist open sets Ox and Oy in T such that x ∈ Ox and y ∈ Oy but x ∉ Oy
and y ∉ Ox .
 Problem 3.14. Which, if any, of the three examples of topological spaces — usual, trivial or discrete
— given on page 43 is a T1 -space?
Theorem 17. A topological space is a T1 -space iff each set containing a single point is a closed set.
 Problem 3.15. [Answer] Prove Theorem 17.
Definition 3.23. A T2 -space or Hausdorff space is a topological space (X, T ) with the property that if x
and y are distinct points in X then there exist open sets Ox and Oy in T such that x ∈ Ox , y ∈ Oy
and Ox ∩ Oy = ∅.

 Problem 3.16. Consider X = R with the usual topology T induced by the metric d(x, y) = |x − y|.
Is (X, T ) a T2 -space?
 Problem 3.17. Is a T2 -space necessarily also a T1 -space?

## 3.3.2 Generic Properties

Definition 3.24. Suppose (X, T ) is a topological space. A set S ⊆ X is called dense if U ∩ S ≠ ∅ for every
non-empty open set U ∈ T .

A set S that is both open and dense has an interesting property. Every point in its complement,
S C = X \ S, can be approximated arbitrarily closely by points in S because S is dense, but no point in
S can be approximated by a point in S C because S is open. For example,

S = {x ∈ R2 | x1 x2 ≠ 1}

is both open and dense in R2 under the usual topology.
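A quick numerical illustration of this density claim (a Python sketch): take a point on the hyperbola x1 x2 = 1, hence in the complement of S, and nudge one coordinate to obtain arbitrarily close points of S.

```python
# (2, 0.5) lies on the hyperbola x1*x2 = 1, so it is in the complement of S.
point = (2.0, 0.5)
nearby = [(2.0, 0.5 + 1 / n) for n in (10, 100, 1000)]

# Each nearby point satisfies x1*x2 != 1, so it belongs to S, and the
# distance to the original point shrinks like 1/n.
print(all(p[0] * p[1] != 1 for p in nearby))  # True
```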

44
Definition 3.25. A property which is true for an open and dense set is called a generic property of the
topological space.

## Loosely speaking, a generic property is one that is “almost always” true.

♦ Query 3.1. Suppose (X, T ) has the trivial topology and that S ⊆ X is both open and dense. What is
S?
♦ Query 3.2. Suppose (X, T ) has the discrete topology and that S ⊆ X is both open and dense. What
is S?

3.3.3 Compactness

Definition 3.26. Suppose (X, T ) is a topological space with S ⊆ X. Then Ca ∈ T for a in some index
set A is called an open cover of S if S ⊆ ∪a∈A Ca . A subclass of an open cover which is itself an open
cover is called a subcover.
Definition 3.27. A compact space is a topological space in which every open cover has a finite subcover.
Theorem 18. Suppose (X, d) is a metric space with the usual topology induced by the metric. Then
the set S ⊆ X is compact

• iff every infinite subset of S has a limit point. (The Bolzano-Weierstrass property)
• iff every sequence in S has a convergent subsequence. (The sequential compactness property)
Theorem 19 (Heine-Borel). A subset of Rn is compact under the usual topology iff it is closed and
bounded.
Theorem 20. Suppose (X, T ) is a topological space. Then

## • Any continuous image of a compact set is compact.

• Any closed subset of a compact set is compact.
• The union of a finite number of compact sets is compact.

There is an important corollary of the first item in Theorem 20:

Theorem 21 (Weierstrass). Suppose (X, T ) is a topological space and that f is a continuous mapping
from X into R. If S is a compact set, then f achieves both a maximum and a minimum in S.
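The Weierstrass theorem can be illustrated numerically. Below is a hedged Python sketch (the particular function and grid are my own choices, not from the text): a continuous function on the compact interval [0, 1] attains its maximum and minimum, and a fine grid search approximates both.

```python
import numpy as np

# Weierstrass: a continuous f on a compact set attains its max and min.
# Here f(x) = x*(1-x)*exp(x) on the compact interval [0, 1]; a fine grid
# approximates the extrema the theorem guarantees exist.
x = np.linspace(0.0, 1.0, 100001)
f = x * (1 - x) * np.exp(x)
print(f.max())   # maximum attained in the interior, near x = (sqrt(5)-1)/2
print(f.min())   # minimum 0 attained at both endpoints
```

On the open interval (0, 1) — not compact — the minimum would not be attained, which is exactly what the compactness hypothesis rules out.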

## 3.4 Sigma Algebras and Measure Spaces

Definition 3.28. A measure space is a non-empty set, X, together with a non-empty class of subsets, S,
which satisfies the following properties.

1. A ∈ S implies AC = X \ A ∈ S.
2. Ai ∈ S, for i = 1, 2, . . ., implies ∪∞i=1 Ai ∈ S.

I.e., S is closed under complementation and countable unions.

A class of sets which satisfies these properties is called a σ -algebra and the members of S are called
the measurable sets in X. Further, if (X, S) is a measure space and (Y , T ) is a topological space, then
f : X → Y is called measurable iff f −1 (V ) ∈ S for every V ∈ T .
 Problem 3.18. Show that if (X, S) is a measure space, then X ∈ S and ∅ ∈ S.
 Problem 3.19. Show that if (X, S) is a measure space, then Ai ∈ S for i = 1, 2, . . ., implies ∩∞i=1 Ai ∈ S.
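Problems 3.18 and 3.19 can be checked on a small finite example. A Python sketch (the particular σ-algebra is my own choice): with S closed under complements and unions, De Morgan's law A ∩ B = (A^C ∪ B^C)^C forces closure under intersections.

```python
from itertools import combinations

# A finite sketch: S = {∅, {1,2}, {3,4}, X} is a sigma-algebra on
# X = {1,2,3,4}. Closure under complements and unions yields closure
# under intersections via De Morgan: A ∩ B = (A^C ∪ B^C)^C.
X = frozenset({1, 2, 3, 4})
S = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), X}

for A in S:
    assert X - A in S                        # closed under complementation
for A, B in combinations(S, 2):
    assert A | B in S                        # closed under (finite) unions
    assert (A & B) == X - ((X - A) | (X - B))  # De Morgan identity
    assert A & B in S                        # hence closed under intersections
print("S is closed under complements, unions and intersections")
```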


Problem 3.3 on page 39. (i) ‖x − y‖ ≥ 0 and ‖x − y‖ = 0 iff x = y. (ii) ‖x − y‖ = ‖−(y − x)‖ = ‖y − x‖ and (iii) ‖x − y‖ + ‖y − z‖ ≥ ‖(x − y) + (y − z)‖ = ‖x − z‖.

Problem 3.5 on page 39. Infinity since the supremum does not exist.
Problem 3.15 on page 44. Since X is a topological space:

x C ).

Chapter 4

Calculus

## 4.1 The First Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.2 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3 The Second Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Convex and Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

## 4.1 The First Differential

Linear functions have such pleasant properties relative to non-linear functions that one is tempted to
erase the latter wherever they occur and replace them with the former. Under certain circumstances,
such replacement is legitimate. Note, to begin with, that the number a and the linear function av provide a good linear approximation of the function f : R → R at x ∈ R if the error involved in the approximation,

ε(v) ≡ f (x + v) − f (x) − av

is small relative to v:

lim|v|→0 ε(v)/|v| = lim|v|→0 [f (x + v) − f (x) − av]/|v| = 0
Similarly, a vector a⃗ and the associated linear function, a⃗ · v, provide a good linear approximation of f : Rn → R at the point x ∈ Rn if the error involved in the approximation,

ε(v) ≡ f (x + v) − f (x) − a⃗ · v

is small relative to the norm of v:

lim‖v‖→0 ε(v)/‖v‖ = lim‖v‖→0 [f (x + v) − f (x) − a⃗ · v]/‖v‖ = 0
Generalizing slightly we have:
Definition 4.1. The m by n matrix A is a good linear approximation of f : Rn → Rm if

lim‖v‖→0 ε(v)/‖v‖ = lim‖v‖→0 ‖f (x + v) − f (x) − Av‖/‖v‖ = 0

where the ith row of A provides the approximation of f i.

This definition subsumes as special cases the circumstances in which m = 1 and A = a⃗ or m = n = 1 and A = a.
Two questions remain regarding the use of linear approximations. Which functions have good linear
approximations? When such approximation exists, how can they be identified? Differential calculus
was invented, in no small part, to provide the answers to these questions.
Definition 4.2. Suppose f : Rn → Rm is a vector-valued function with the real-valued components
f = (f 1 , . . . , f m ), f i : Rn → R. The function f is differentiable at a point x iff there is a matrix A s.t.

kf (x + v) − f (x) − Avk
lim =0
kvk→0 kvk

in which case the matrix A is called the derivative at x and the linear transformation Av is called the
differential at x. The function is differentiable iff it is differentiable at every point in the domain.

The requirement for f to be differentiable at x is, of course, precisely the same as the requirement
for the linear transformation Av to provide a good approximation of f at x. “Differentiable” is thus
synonymous with “has a good linear approximation” and “differential” is synonymous with “best linear
approximation”.
As special cases note that when m = n = 1, the matrix A has but a single element

df (x)
a = f 0 (x) =
dx

the “derivative of f evaluated at x”. Geometrically, a is the slope of f (x) at x and av is the equation
of the “tangent hyperplane”.
 Problem 4.1. Using Mathematica, plot the tangent to the graph of

f (x) = (x + 2)(x² + 1)x(x − 1)(x − 2)

at x = 1.6.
 Problem 4.2. Using Mathematica, determine the critical points of the function specified in Prob-
lem 4.1, i.e., the points on the graph of f where df /dx = 0.
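For readers without Mathematica, a Python sketch of both problems (using numpy.poly1d; purely illustrative — the text asks for Mathematica solutions):

```python
import numpy as np

# Problems 4.1 and 4.2 in Python: build f(x) = (x+2)(x^2+1)x(x-1)(x-2)
# as a polynomial, form the tangent line at x = 1.6 from f and f', and
# find the critical points where df/dx = 0.
f = np.poly1d([1, 2]) * np.poly1d([1, 0, 1]) * np.poly1d([1, 0]) \
    * np.poly1d([1, -1]) * np.poly1d([1, -2])
df = f.deriv()

x0 = 1.6
slope, intercept = df(x0), f(x0) - df(x0) * x0  # tangent: y = f(x0) + f'(x0)(x - x0)
print("tangent: y = %.4f x + %.4f" % (slope, intercept))

crit = np.sort(df.roots[np.isreal(df.roots)].real)  # real roots of df/dx
print("critical points:", crit)
```

Since f has four real roots, Rolle's theorem guarantees at least three real critical points, which the root finder confirms.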

This is illustrated in Figure 4.1 for the case in which f (x) = ln(x) and x = 2:

f (x + v) = f (x) + f ′(x)v + ε(v) or
ln(2 + v) = ln(2) + (1/2)v + ε(v)

[Figure 4.1: Best Linear Approximation]

When m = 1 but n > 1, the matrix A has but a single row

a⃗ = fx (x) = (∂f (x)/∂x1 , . . . , ∂f (x)/∂xn )
which is called the gradient of f : Rn → R evaluated at x.
Consider, for example, the case in which f (x) = x1 2 + x2 , x = (2, 3), f (2, 3) = 7 and fx (2, 3) = (4, 1).
The graph of this function would be a surface in R3 and the differential would be a hyperplane tangent
to this surface at the point (2, 3, 7).

The gradient itself is best illustrated in R2 — see Figure 4.2. The parabola represents a level contour for f , i.e., the inverse image of a particular point in the range. The gradient at the point x = (2, 3) illustrates three very important facts about the gradient: (i) It is orthogonal to a tangent to the level contour at the point at which it is evaluated. (ii) It points in the direction in which the function increases at the greatest rate, i.e., the direction of steepest ascent. (iii) Its length is equal to the slope of the function in the direction of steepest ascent.

[Figure 4.2: The gradient of x1² + x2 at x = (2, 3)]

The latter fact follows from the fact that f (x + v) ≈ f (x) + fx · v. Now take v to be a unit length movement in the direction fx so that v = fx /‖fx ‖, substitute for v and rearrange to get

f (x + fx /‖fx ‖) − f (x) ≈ fx · fx /‖fx ‖ = ‖fx ‖

Recall now that when you graph the function f : R → R corresponding to y = x 2 , for example, you are drawing a picture of the set {(x, y) ∈ R2 | y = x 2 }. Similarly

Definition 4.3. The graph of f : Rn → R is the set {(x, y) ∈ Rn+1 | y = f (x)}.
 Problem 4.3. Describe the graph of y = a · x where a ∈ Rn , i.e., the set G ≡ {(y, x) ∈ Rn+1 | y =
a · x}. What is the gradient of this map? What is the value of y when x = a/kak? Is G a linear
subspace? Is GC open and dense in the usual topology?
 Problem 4.4. Suppose f (x1 , x2 ) = x1 +ln(1+x2 ). Calculate and illustrate fx (x) for x = (0, 0), (1, 0), (0, 1)
and (0, −1). Hint: In Mathematica Log[x] is used to compute the natural log of x.

In general, when f : Rn → Rm :

A = fx (x) =
  [ ∂f 1 (x)/∂x1   · · ·   ∂f 1 (x)/∂xn ]
  [      ...                    ...      ]
  [ ∂f m (x)/∂x1   · · ·   ∂f m (x)/∂xn ]

is called the Jacobian of f . Notice that the ith row of the Jacobian is simply the gradient of the ith component of f .
 Problem 4.5. Suppose that f : Rn → Rm . Show that if f is differentiable at x then it must also be
continuous at x.

Having done the best we can with a linear approximation, it is natural to ask what improvement might be possible with a quadratic approximation, i.e., a mapping Q : Rn → R where

Q(x) = x T Bx = Σi Σj bij xi xj
     = b11 x1² + (b12 + b21 )x1 x2 + (b13 + b31 )x1 x3 + · · · + bnn xn²

and B is an n by n matrix.

49
 Problem 4.6. Show that for any n by n matrix B

x T Bx ≡ x T [(B + B T )/2] x, ∀x

and thus that B can be taken to be a symmetric matrix, bij = bji , without any loss of generality.
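A quick numerical spot-check of Problem 4.6 (a Python sketch; the random B is arbitrary): the quadratic form of any B agrees with that of its symmetric part at every sampled point.

```python
import numpy as np

# Problem 4.6 numerically: x'Bx equals x'((B + B')/2)x for any B, so the
# matrix of a quadratic form may be taken symmetric without loss.
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))                  # an arbitrary, non-symmetric B
S = (B + B.T) / 2                            # its symmetric part
for _ in range(100):
    x = rng.normal(size=4)
    assert abs(x @ B @ x - x @ S @ x) < 1e-10
print("x'Bx == x'((B+B')/2)x on all samples")
```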

Such quadratic forms have a number of interesting properties. Note first that if λ ∈ R then

Q(λx) = (λx)T B(λx) = λ² x T Bx = λ² Q(x)

Quadratic forms are thus homogeneous of the second degree where

Definition 4.4. A function f : Rn → R is homogeneous of degree ρ iff

f (tx) = t ρ f (x), ∀t ≠ 0

The fact that quadratic forms are homogeneous of the second degree means that their graphs are completely determined by their values on the unit sphere, since Q(x) = ‖x‖² Q(x/‖x‖). The various possibilities are classified as follows:

• The quadratic form is positive definite iff Q(x) > 0, ∀x ≠ 0.
• The quadratic form is positive semi-definite iff Q(x) ≥ 0, ∀x ≠ 0.
• The quadratic form is negative definite iff Q(x) < 0, ∀x ≠ 0.
• The quadratic form is negative semi-definite iff Q(x) ≤ 0, ∀x ≠ 0.
• The quadratic form is indefinite iff Q(x ′) > 0 for some x ′ and Q(x ′′) < 0 for some x ′′.
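These classes can be read off the eigenvalues of the symmetric matrix B: all positive means positive definite, all non-negative means positive semi-definite, mixed signs mean indefinite. A Python sketch applying this test to the three example matrices used in this section:

```python
import numpy as np

# Classify a quadratic form x'Bx (B symmetric) by the signs of the
# eigenvalues of B.
def classify(B):
    ev = np.linalg.eigvalsh(B)         # eigenvalues of a symmetric matrix
    if np.all(ev > 0):
        return "positive definite"
    if np.all(ev >= 0):
        return "positive semi-definite"
    if np.all(ev < 0):
        return "negative definite"
    if np.all(ev <= 0):
        return "negative semi-definite"
    return "indefinite"

print(classify(np.array([[1, 0], [0, 4]])))    # positive definite
print(classify(np.array([[1, 0], [0, 0]])))    # positive semi-definite
print(classify(np.array([[1, 0], [0, -1]])))   # indefinite
```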

These possibilities can be nicely illustrated for the case in which n = 2. The case of a positive definite quadratic form where

B = [ 1 0 ]
    [ 0 4 ]

is illustrated in Figure 4.3. Note that the graph is symmetric about the y-axis and lies strictly above the x-plane everywhere other than the origin. Cross sections taken parallel to the x-plane are either ellipses when y > 0, a single point when y = 0 or empty when y < 0. Other positive definite quadratic forms will have similar graphs, i.e., “bowls” which touch the x-plane only at the origin and lie strictly above it everywhere else. Cross sections when y > 0 will either be ellipses or, in degenerate cases, circles.

[Figure 4.3: Positive Definite]

Since −Q(x) is a negative definite quadratic form iff Q(x) is positive definite, we don’t need a new picture for negative definite quadratic forms — simply turn Figure 4.3 upside down.

Figure 4.4 illustrates the graph of the positive semi-definite quadratic form where

B = [ 1 0 ]
    [ 0 0 ]

Again the graph is symmetric about the y-axis and lies on or above the x-plane everywhere other than the origin. Here, however, the graph actually coincides with the x-plane along a line. Cross sections taken parallel to the x-plane are either pairs of parallel lines when y > 0, a single line when y = 0 or empty when y < 0. A degenerate possibility for a positive semi-definite quadratic form is that its graph coincides with the x-plane everywhere — the case in which every element of B is equal to zero. Other possibilities are similar to Figure 4.4 — “troughs” which coincide with the x-plane along a line and lie strictly above it elsewhere.

[Figure 4.4: Positive Semi-Definite]
Once again we need not bother with a graph for the negative version. Since −Q(x) is negative semi-definite iff Q(x) is positive semi-definite, simply turn Figure 4.4 upside down for an illustration of a negative semi-definite quadratic form.

Only the case of an indefinite quadratic form remains and this is illustrated in Figure 4.5 for the case in which

B = [ 1  0 ]
    [ 0 −1 ]

As always, the graph is symmetric about the y-axis. Here the graph sometimes lies above the x-plane and sometimes lies below it. Cross sections taken parallel to the x-plane are pairs of hyperbolas when y > 0 or when y < 0 or a pair of crossing lines when y = 0 [x2 = ±x1 ]. The graph is a “saddle” with regions both above and below the x-plane.

[Figure 4.5: Indefinite]

Armed with quadratic forms we can now say that

Definition 4.5. A vector a and an n by n matrix B provide a good quadratic approximation of f : Rn → R
at the point x if the error involved in the approximation,

µ(v) ≡ f (x + v) − f (x) − a · v − (1/2)v T Bv

is small relative to ‖v‖² :

lim‖v‖→0 µ(v)/‖v‖² = lim‖v‖→0 [f (x + v) − f (x) − a · v − (1/2)v T Bv]/‖v‖² = 0

This means that the quadratic form, v T Bv, is a good (relative to kvk2 ) approximation of the error
left from the linear approximation. [A good quadratic approximation of f : Rn → Rm would similarly
require a vector ai and a matrix B i for each component f i .]
As might be suspected, the second derivative is the key to obtaining the quadratic approximation.
Definition 4.6. f : Rn → R is twice differentiable at x iff ∃ a ∈ Rn and an n by n matrix B s.t.

lim‖v‖→0 [f (x + v) − f (x) − a · v − (1/2)v T Bv]/‖v‖² = 0

in which case B is called the second derivative and the quadratic form 1/2v T Bv is called the second
differential. The function is twice differentiable iff it is twice differentiable at every point in the domain.

As special cases note that when n = 1 the matrix B has but a single element

b = f ′′(x) = d²f (x)/dx²

the “second derivative of f evaluated at x”. Geometrically, b is the slope of f ′(x) at x — the rate at which the slope of f changes at x — or the curvature of f .
When n > 1 the n by n matrix B has the second partial derivatives of f evaluated at x as its elements

B = fxx (x) =
  [ ∂²f (x)/∂x1²      · · ·   ∂²f (x)/∂x1 ∂xn ]
  [      ...                        ...        ]
  [ ∂²f (x)/∂xn ∂x1   · · ·   ∂²f (x)/∂xn²    ]

and is called the Hessian of f .

The role of the second differential in approximating the error left from the linear (first differential) approximation can be illustrated in the context of the example of Figure 4.1 on page 48. The error from the linear approximation of ln(x), ε(v), is plotted in Figure 4.6 together with the best quadratic approximation of this error, namely, the second differential:

f (x + v) − f (x) − f ′(x)v = (1/2!)f ′′(x)v² + µ(v) or
ln(2 + v) − ln(2) − (1/2)v = −(1/2)(1/4)v² + µ(v)

[Figure 4.6: Best Quadratic Approximation]
 Problem 4.7. Suppose g(x1 , x2 , x3 ) = x1^(1/6) x2^(1/3) x3^(1/2). Calculate gx (x) and gxx (x) for x = (1, 8, 9). What is the equation of the hyperplane that is tangent to S = { x ∈ R3 | g(x) = 6 } at x = (1, 8, 9)?
Hint: In Mathematica the gradient and Hessian of a function can be computed using the following:

f = x1^(1/6) x2^(1/3) x3^(1/2);
x = {x1, x2, x3};
grad = D[f, {x}]
hessian = D[f, {x, 2}];
hessian // MatrixForm
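The same computation can be cross-checked without Mathematica. A Python sketch using central finite differences (the step size is my own choice): at x = (1, 8, 9) we have g(x) = 6, so the gradient should come out to (1, 1/4, 1/3).

```python
import numpy as np

# Finite-difference gradient and Hessian of
# g(x1,x2,x3) = x1^(1/6) x2^(1/3) x3^(1/2) at x = (1, 8, 9).
def g(x):
    return x[0] ** (1 / 6) * x[1] ** (1 / 3) * x[2] ** (1 / 2)

x = np.array([1.0, 8.0, 9.0])
h = 1e-5
I = np.eye(3)
grad = np.array([(g(x + h * I[i]) - g(x - h * I[i])) / (2 * h) for i in range(3)])
hess = np.array([[(g(x + h * I[i] + h * I[j]) - g(x + h * I[i] - h * I[j])
                   - g(x - h * I[i] + h * I[j]) + g(x - h * I[i] - h * I[j]))
                  / (4 * h ** 2) for j in range(3)] for i in range(3)])
print(grad)                                  # approximately (1, 1/4, 1/3)
print(np.allclose(hess, hess.T, atol=1e-4))  # Hessians are symmetric
```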

To be twice differentiable at a point is to have a good quadratic approximation at that point — the
second derivative is the quadratic term of the approximation. Twice differentiable functions are also
differentiable, of course, since the definition requires finding the first derivative as well. The first
derivative not only exists if f is twice differentiable at x, it is continuous as well. Putting matters
somewhat loosely, continuity of a function implies an absence of breaks in its graph. To be differen-
tiable implies an additional smoothness — an absence of kinks — since the graph must have unique
tangents wherever it is differentiable. To be twice differentiable means that the graph of the derivative
has no kinks and the function is smoother still. Pressing on, mathematicians use the term smooth to
describe a function which is differentiable at all orders.
 Problem 4.8. Suppose f : R → R. Give an example of an f that is smooth. Give an example of an f
that is differentiable but not smooth.

If you’re guessing that a cubic approximation could be used to approximate the error left from the linear and quadratic approximations and that this would be based on the third derivative, you’re exactly right. Continuing the example from Figure 4.1 on page 48 and Figure 4.6 on the previous page, the error from the linear and quadratic approximations of ln(x) is plotted in Figure 4.7 together with the best cubic approximation of this error, namely, the third differential:

f (x + v) − f (x) − f ′(x)v − (1/2!)f ′′(x)v² = (1/3!)f ′′′(x)v³ + µ(v)    (4.1)

or

ln(2 + v) − ln(2) − (1/2)v + (1/2)(1/4)v² = (1/6)(1/4)v³ + µ(v)

[Figure 4.7: Best Cubic Approximation]

Rearranging Equation 4.1 gives the Taylor series approximation of order three:

f (x + v) = f (x) + f ′(x)v + (1/2!)f ′′(x)v² + (1/3!)f ′′′(x)v³ + µ(v)

Taylor series approximations of higher orders are analogous.

 Problem 4.9. Series[f[x],{x,a,n}] is the Mathematica command for generating an nth order
Taylor series expansion of f [x] with respect to x at x = a. Use it to obtain the 4th order expansion
of 1/Sqrt[1+x] at x = 0.
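For reference, the binomial series gives the 4th order expansion of 1/sqrt(1 + x) at x = 0 as 1 − x/2 + 3x²/8 − 5x³/16 + 35x⁴/128 + O(x⁵); a Python sketch checks it numerically (sample points are my own choices):

```python
import math

# 4th order Taylor coefficients of (1+x)^(-1/2) at x = 0, from the
# binomial series with exponent -1/2.
coef = [1.0, -1 / 2, 3 / 8, -5 / 16, 35 / 128]

def taylor(x):
    return sum(c * x ** k for k, c in enumerate(coef))

for x in (0.1, 0.05, -0.05):
    exact = 1 / math.sqrt(1 + x)
    print(x, exact, taylor(x), abs(exact - taylor(x)))  # error is O(x^5)
```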

## 4.4 Convex and Concave Functions

Convex and concave functions are particularly important in Microeconomics. A function is concave if
and only if a line segment connecting any two points on the graph of the function lies on or below its
graph:
Definition 4.7. The function f : X → R, where X is a convex subset of Rn , is concave iff x, y ∈ X and 0 < a < 1 implies f (ax + (1 − a)y) ≥ af (x) + (1 − a)f (y).

A function is strictly concave if a line segment connecting any two distinct points on the graph of
the function lies below the graph at all points other than the end points. See the left-hand panel
of Figure 4.8 on following page.

[Figure 4.8: Strictly Concave (left) and Convex (right)]

Definition 4.8. The function f : X → R, where X is a convex subset of Rn , is strictly concave iff x, y ∈ X, x ≠ y and 0 < a < 1 implies f (ax + (1 − a)y) > af (x) + (1 − a)f (y).

The function f , on the other hand, is convex iff g(x) = −f (x) is concave. A convex function is
characterized by the fact that line segments connecting two points lie on or above the graph. Similarly,
f is strictly convex iff −f is strictly concave. A strictly convex function is illustrated in the right-hand
panel of Figure 4.8.
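The defining inequalities are easy to probe numerically. A Python sketch (the sample functions are my own choices): −x² satisfies the concavity inequality at random points while eˣ satisfies the reverse (convexity) inequality.

```python
import numpy as np

# Random-point check of the defining inequalities:
#   concave:  f(ax + (1-a)y) >= a f(x) + (1-a) f(y)   for f(x) = -x^2
#   convex:   g(ax + (1-a)y) <= a g(x) + (1-a) g(y)   for g(x) = exp(x)
rng = np.random.default_rng(1)
f = lambda x: -x ** 2
g = np.exp

for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    a = rng.uniform(0, 1)
    m = a * x + (1 - a) * y
    assert f(m) >= a * f(x) + (1 - a) * f(y) - 1e-12   # concavity holds
    assert g(m) <= a * g(x) + (1 - a) * g(y) + 1e-12   # convexity holds
print("concavity of -x^2 and convexity of exp(x) hold on all samples")
```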
 Problem 4.10. Show that if f : Rn → R is a linear transformation then f is both concave and convex.

Concave functions have particularly nice properties:

Theorem 22. Suppose that X is a convex subset of Rn . Then

1. If f : X → R is concave then it is also continuous in the interior of X.

2. If

f (x) = α1 f1 (x) + · · · + αk fk (x)

with αi ≥ 0, i = 1, . . . , k, and if each fi : X → R, i = 1, . . . , k, is concave then f is also concave. [A non-negative linear combination of concave functions must also be concave.]

 Problem 4.11. Illustrate the first part of Theorem 22 for f : R → R by showing that a discontinuous
function cannot be concave.
 Problem 4.12. Prove the second part of Theorem 22.

The second derivatives of concave functions also have intuitive properties:

Theorem 23. Hessians of Concave Functions. Suppose f (x) is a continuously twice differentiable real
valued function on an open convex set X in Rn with Hessian fxx (x). Then

1. The function f is concave (convex) if and only if fxx (x) is negative (positive) semi-definite for all
x ∈ X.

2. The function f is strictly concave (convex) if fxx (x) is negative (positive) definite for all x ∈ X.

Note that the “and only if” is missing from the second proposition. The function f (x) = −(x − 1)⁴ , for example, is strictly concave but its Hessian is only negative semi-definite at x = 1 since f ′′(x) = −12(x − 1)² = 0 at x = 1 — see Figure 4.9.

[Figure 4.9: Strictly Concave but Negative Semi-Definite]

Finally, and perhaps most important to the current discussion:

Definition 4.9. Suppose the function f : X → R, where X is a convex subset of Rn . Then the set Lf (r ) ≡ {z ∈ X | f (z) ≥ r } is called a level set for f .

Theorem 24. If f : X → R, where X is a convex subset of Rn , is concave then the level set Lf (r ) is a convex set for any r ∈ R.

 Problem 4.13. [Answer] Prove Theorem 24.

The converse of this proposition isn’t true — a weaker condition than concavity, namely quasi-concavity, is sufficient to guarantee the convexity of Lf (r ).
Definition 4.10. The function f : X → R, where X is a convex subset of Rn , is quasi-concave iff ∀r ∈ R, the level set Lf (r ) is a convex set and strictly quasi-concave iff Lf (r ) is a strictly convex set.

An equivalent definition which is sometimes useful is

Definition 4.11. The function f : X → R, where X is a convex subset of Rn , is

1. quasi-concave if

f (x ′) ≥ f (x) =⇒ f (ax ′ + (1 − a)x) ≥ f (x), ∀x ′, x ∈ X and 0 ≤ a ≤ 1

2. strictly quasi-concave if

f (x ′) ≥ f (x), x ′ ≠ x =⇒ f (ax ′ + (1 − a)x) > f (x), ∀x ′, x ∈ X and 0 < a < 1
A strictly quasi-concave (but not concave) function is illustrated in Figure 4.10. As you would expect, f is quasi-convex iff −f is quasi-concave and strictly quasi-convex iff −f is strictly quasi-concave.

[Figure 4.10: Strictly Quasi-Concave]

 Problem 4.14. Show that any monotone increasing (or decreasing) function is quasi-concave if X ⊂ R. [The function f : R → R is a monotone increasing (decreasing) function iff x ′ ≥ x implies f (x ′) ≥ (≤) f (x).]
 Problem 4.15. Show that any monotone increasing (or decreasing) function is quasi-convex if X ⊂ R.
 Problem 4.16. Suppose f : R → R is single-peaked, i.e., it is either monotone increasing, monotone decreasing or there exists an x ∗ ∈ R such that f is monotone increasing for x ≤ x ∗ and monotone decreasing for x ≥ x ∗ . Is f quasi-concave? Is f concave?
It is an unfortunate fact that quasi-concave functions share few of the other nice properties of concave functions. Quasi-concave functions are not necessarily continuous — f (x) = 1 if x = 0 and f (x) = 0 if x ≠ 0 with x ∈ R is, for example, quasi-concave but neither concave nor continuous. Non-negative linear combinations of quasi-concave functions, moreover, are not necessarily quasi-concave. The functions x 2 and (x − 1)2 for x ∈ X ≡ [0, 1], for example, are both quasi-concave (but not concave) and yet (1/2)x 2 + (1/2)(x − 1)2 is not quasi-concave. This is illustrated in Figure 4.11 on following page.
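The failure is easy to verify directly in Python: with h(x) = ½x² + ½(x − 1)², quasi-concavity would require h at the midpoint of 0 and 1 to be at least min(h(0), h(1)), and it is not.

```python
# h is the half-half combination of the two quasi-concave functions
# x^2 and (x-1)^2 on [0, 1].
h = lambda x: 0.5 * x ** 2 + 0.5 * (x - 1) ** 2
print(h(0.0), h(1.0), h(0.5))   # 0.5, 0.5, 0.25

# Quasi-concavity would require h(1/2) >= min(h(0), h(1)) = 0.5 — it fails:
assert h(0.5) < min(h(0.0), h(1.0))
print("h is not quasi-concave on [0, 1]")
```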

[Figure 4.11: Quasi-Concave Combinations]

Lastly, consider the bordered Hessian:

B(x) ≡
  [ 0             ∂f (x)/∂x1        · · ·   ∂f (x)/∂xn       ]
  [ ∂f (x)/∂x1    ∂²f (x)/∂x1²      · · ·   ∂²f (x)/∂x1 ∂xn  ]
  [ ...           ...                       ...               ]
  [ ∂f (x)/∂xn    ∂²f (x)/∂xn ∂x1   · · ·   ∂²f (x)/∂xn²     ]

with (k + 1)st successive principal minor

Bk (x) ≡ det
  [ 0             ∂f (x)/∂x1        · · ·   ∂f (x)/∂xk       ]
  [ ∂f (x)/∂x1    ∂²f (x)/∂x1²      · · ·   ∂²f (x)/∂x1 ∂xk  ]
  [ ...           ...                       ...               ]
  [ ∂f (x)/∂xk    ∂²f (x)/∂xk ∂x1   · · ·   ∂²f (x)/∂xk²     ]

k = 1, . . . , n. If f : Rn → R is twice continuously differentiable and quasi-concave then it can be shown that B2 (x) ≥ 0, B3 (x) ≤ 0, . . ., (−1)n Bn (x) ≥ 0 for all x ∈ Rn — not so nice! It is fortunate that second order characterizations of concave and convex functions prove to be much more important than those for their recalcitrant “quasi” cousins.

Problem 4.13 on preceding page. Suppose x, y ∈ Lf (r ). Then f (x), f (y) ≥ r . But concavity then implies that f (λx + (1 − λ)y) ≥ λf (x) + (1 − λ)f (y) ≥ λr + (1 − λ)r = r and λx + (1 − λ)y ∈ Lf (r ).

Chapter 5

Optimization

5.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

## 5.3.3 The Envelope Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

5.1 Optimization

Virtually all of microeconomics is predicated upon the idea that the (economic) behavior of people
can be explained as an attempt on their part to maximize (or minimize) something subject to some
constraints. This view does not necessarily ascribe much intelligence to the people whose behavior is
to be described — the behavior of a rock falling off a cliff can be described, after all, as an attempt on
the part of the rock to minimize its potential energy. It simply means that we microeconomic theorists
always think we can cook up models based on optimizing behavior which will capture the important
aspects of actual behavior.

5.1.1 Unconstrained Optimization

The simplest optimization problem involves no constraints. Here f : Rn → R is the objective function and the objective, accordingly, is to find that x ∗ which makes f (x ∗ ) as large (or as small) as possible:

x ∗ ≡ arg maxx f (x) if f (x ∗ + v) − f (x ∗ ) ≤ 0, ∀v
x ∗ ≡ arg minx f (x) if f (x ∗ + v) − f (x ∗ ) ≥ 0, ∀v

Here the terms arg max and arg min refer to the “arguments” which solve the respective optimization
problem.
Now if f is differentiable and fx (x ∗ ) is its gradient evaluated at the point x ∗ then

f (x ∗ + v) − f (x ∗ ) ≈ fx (x ∗ )v

Thus fx (x ∗ ) = 0 is a necessary condition either for a maximum or a minimum — if it were not zero
then f would increase for some choice of v̂ (not a maximum) and fall for −v̂ (not a minimum either).
The requirement that the gradient “vanish”, fx (x ∗ ) = 0, entails solving the system of n equations:

∂f/∂x1 (x1∗ , . . . , xn∗ ) = 0
∂f/∂x2 (x1∗ , . . . , xn∗ ) = 0
  ...
∂f/∂xn (x1∗ , . . . , xn∗ ) = 0

for the n unknown components of x ∗ . Since this requirement involves the first differential and is a necessary condition it is called the first order necessary condition.
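The first order condition can be illustrated on a toy objective. A Python sketch (the quadratic objective is my own example, not from the text): solving fx(x) = 0 by gradient ascent recovers the maximizer.

```python
import numpy as np

# For f(x) = -(x1 - 1)^2 - (x2 + 2)^2 the first order conditions
#   ∂f/∂x1 = -2(x1 - 1) = 0,  ∂f/∂x2 = -2(x2 + 2) = 0
# give x* = (1, -2).
def grad(x):
    return np.array([-2 * (x[0] - 1), -2 * (x[1] + 2)])

# Solve fx(x) = 0 by gradient ascent from an arbitrary starting point.
x = np.zeros(2)
for _ in range(200):
    x = x + 0.1 * grad(x)
print(x)                                 # converges toward (1, -2)
assert np.allclose(grad(x), 0, atol=1e-8)  # the gradient has vanished
```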
Another necessary condition involves the second differential. If f is twice differentiable with second
derivative fxx (x)

f (x ∗ + v) − f (x ∗ ) ≈ fx (x ∗ )v + (1/2)v T fxx (x ∗ )v
                      ≈ (1/2)v T fxx (x ∗ )v when fx (x ∗ ) = 0

Moreover, since

f (x ∗ + v) − f (x ∗ ) ≤ 0 if fxx (x ∗ ) is negative semidefinite
f (x ∗ + v) − f (x ∗ ) ≥ 0 if fxx (x ∗ ) is positive semidefinite

we have the second order necessary condition:

Theorem 25. For x ∗ to impart a maximum (minimum) to f (x) it must be the case that fxx (x ∗ ) is
negative (positive) semidefinite.

It is sufficient that fxx (x ∗ ) be negative definite for a maximum or positive definite for a minimum, but sufficient conditions aren’t very important in Microeconomics. The reason is that Economic models typically involve the optimization hypothesis that the person whose behavior is being modeled behaves as if he/she were trying to maximize (or minimize) an objective function subject to constraints. Call this optimizing model M. A necessary condition, N, which must be true in order for M to be true provides a testable hypothesis — see Figure 5.1 — if evidence suggests that N is false then M must be false as well. Sufficient conditions, on the other hand, do not provide testable hypotheses — S, for example, is sufficient for M in Figure 5.1 but evidence that S is false does not imply that M is false.

[Figure 5.1: Necessary and Sufficient Conditions for M]

 Problem 5.1. In Mathematica the command Maximize[f,{x,y,...}] maximizes f with respect to x, y, . . . . Describe the output from

f = -(x + 2) (x^2 + 1) x (x - 1) (x - 2);
Plot[f,{x,-2.5,2.5}]
Maximize[f,x]//N

5.1.2 Constrained Optimization

The general constrained maximization problem is

x ∗ = arg maxx f (x)
s.t. g i (x) ≥ 0, i = 1, . . . , m

The requirements that x must satisfy g i (x) ≥ 0, i = 1, . . . , m, are called constraints and the problem
is to find that x ∗ which satisfies the constraints and which imparts the largest value to the objective
function, i.e., x ∗ solves this problem if and only if

• g(x ∗ ) ≥ 0

• for any x̂, f (x̂) > f (x ∗ ) implies g(x̂) ≱ 0.

There are exactly two possibilities for the solution to this problem, x ∗ ,

• There is no x̂ for which f (x̂) > f (x ∗ ). Here the constraints are “not binding” — can be erased
without affecting the solution to the problem. Here the unconstrained conditions hold.

• There is an x̂ for which f (x̂) > f (x ∗ ) but, of course, g(x̂) ≱ 0. Here the constraints are binding
— prevent one from doing as well as would be possible without them — and the unconstrained
conditions do not necessarily hold.

Focusing upon the second possibility for the moment, notice that if the objective function, f , and the
binding constraints, ĝ, are differentiable at x ∗ then

f (x ∗ + v) − f (x ∗ ) ≈ fx (x ∗ )v

and

ĝ(x ∗ + v) − ĝ(x ∗ ) = ĝ(x ∗ + v) ≈ ĝx (x ∗ )v

where ĝx (x ∗ ) denotes the matrix whose rows are the gradients of those constraints which are binding at x ∗ . For a maximum it is intuitive that the inequalities

ĝx (x ∗ )v ≥ 0, fx (x ∗ ) · v > 0 (5.1)

cannot have any solutions since:

1. For small v, the movement from x ∗ to x ∗ + v would satisfy the constraints. This follows from
the fact that the components of ĝ(x ∗ ) are all equal to zero, all other constraints are strictly
positive and the (small) movement to x ∗ + v would thus leave all constraints non-negative.

2. For small v, the movement from x ∗ to x ∗ + v would impart a larger value to the objective
function.

This intuition is basically correct save for pathological cases — see Arrow, Hurwitz and Uzawa for a
discussion of constraint qualifications which are sufficient to eliminate such problems. [The original
Kuhn-Tucker constraint qualification was that the gradients of the binding constraints must be linearly
independent.]
Recall now that Farkas’ Lemma (Theorem 6 on page 31) says that if A is an m by n matrix and b ≠ 0 is a 1 by n row vector then exactly one of the following is true:

1. yA = b, y > 0 has a solution y ∈ Rm
2. Az ≥ 0, b · z < 0 has a solution z ∈ Rn

Now make the association that A = ĝx (x ∗ ), b = −fx (x ∗ ), z = v, y = λ and note that b = −fx (x ∗ ) 6= 0
by virtue of the fact that we are focusing upon the case in which the unconstrained, vanishing gradient,
condition does not hold. Then either

## λĝx (x ∗ ) = −fx (x ∗ ), λ>0 (5.2)

has a solution or Equation 5.1 has a solution. But since there can be no solutions to Equation 5.1 if x ∗
is to solve the maximization problem, it follows from Farkas’ Lemma that Equation Equation 5.2 must
have a solution.
Equation 5.2 is illustrated for a simple maximization problem in Figure 5.2 on the next page. Here there are three constraints and the problem is to

maxx f (x) = (1, 1) · (x1 , x2 )
s.t. g 1 (x) = (1, 0) · (x1 , x2 ) ≥ 0
g 2 (x) = (0, 1) · (x1 , x2 ) ≥ 0
g 3 (x) = 1 + (−1/3, −1) · (x1 , x2 ) ≥ 0 (5.3)

This is an example of a linear programming problem — a problem in which both the objective function
and the constraints are linear.
Note first that the shaded area corresponds to g(x) ≥ 0 — the set of x’s which satisfy all three
constraints. By inspection it can be seen that x ∗ = (3, 0) solves the problem since:

[Figure 5.2: The constraint set (shaded), the level line fx · x = 3 and the gradients g 1x = (1, 0), g 2x = (0, 1), g 3x = (−1/3, −1), fx = (1, 1) and −fx = (−1, −1) plotted at x ∗ = (3, 0)]
• (3, 0) imparts a value of 3 to the objective function

• x’s which make the objective function larger than 3 do not belong to the shaded area.

That this point satisfies Equation 5.2 on the previous page can be seen by plotting, with their tails at
(3, 0), the gradients of the objective function and each of the three constraints. Note that −fx belongs
to the cone generated by g 2 and g 3 , the gradients of the constraints which are actually binding at
(3, 0), since it is possible to express −fx as a positive linear combination of g 2 and g 3 . These positive
weights provide the values of the components of λ in the solution to λĝx = −fx with λ > 0:
[λ2 λ3] [  0     1 ]
        [ −1/3  −1 ]  =  (−1, −1)

or

[λ2 λ3] = [2 3]
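The same multipliers drop out of a two-equation linear solve. A Python sketch of the computation above:

```python
import numpy as np

# The rows of ĝx are the gradients of the binding constraints g2 and g3
# at (3, 0); λ ĝx = -fx is then a linear system for λ = (λ2, λ3).
ghat = np.array([[0.0, 1.0],        # gradient of g2
                 [-1 / 3, -1.0]])   # gradient of g3
fx = np.array([1.0, 1.0])

lam = np.linalg.solve(ghat.T, -fx)  # λ ĝx = -fx  <=>  ĝx' λ' = -fx'
print(lam)                          # [2, 3], both positive as required
assert np.all(lam > 0)
```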
 Problem 5.2. For the problem of Figure 5.2 show geometrically why the point (0, 1) does not sat-
isfy Equation 5.2 on the previous page. [Recall that ĝx (0, 1) contains only the gradients of the con-
straints which are binding at (0, 1).]

If we write the Lagrangian Function

L(x, λ) ≡ f (x) + λ1 g 1 (x) + · · · + λm g m (x)

then, with slight modification, the “Farkas’ Lemma” requirement for a maximum, Equation 5.2 on
the previous page, becomes the Kuhn-Tucker Conditions which, given the constraint qualification, are
necessary for a solution to the maximization problem:

Lx (x ∗ , λ∗ ) = 0 (5.4)
λ∗ · Lλ (x ∗ , λ∗ ) = 0 (5.5)
Lλ (x ∗ , λ∗ ) ≥ 0 (5.6)
λ∗ ≥ 0 (5.7)

Equation 5.4, Equation 5.5 and Equation 5.7 are the solution to Equation 5.2 on page 60 save for the fact that λ is now allowed to be equal to zero — this to incorporate the case in which none of the constraints is binding. Equation 5.6 is merely a restatement of the constraints. Equation 5.5 and Equation 5.7 form the so-called complementary slackness condition which prevents non-binding constraints from playing a role in Equation 5.4. Equation 5.5 says that

λ1∗ g 1 (x ∗ ) + · · · + λm∗ g m (x ∗ ) = 0

Since each term in this summation is the product of two non-negative numbers (Equation 5.6 and Equation 5.7), it follows that g i (x ∗ ) > 0 implies λi∗ = 0 and, conversely, λj∗ > 0 implies g j (x ∗ ) = 0. If the ith constraint is not binding, g i (x ∗ ) > 0, then the associated Lagrangian multiplier, λi∗ , must be equal to zero. Conversely, if λi∗ > 0 then the corresponding constraint must be binding, g i (x ∗ ) = 0. Thus Equation 5.4, Equation 5.5 and Equation 5.7 are equivalent to Equation 5.2 on page 60.
 Problem 5.3. Is it possible for both λj∗ = 0 and g j (x ∗ ) = 0 to hold in an optimal solution? If so, provide an example of an optimization problem in which this would be true. Otherwise show why this is not possible.
The complementary slackness condition is consistent with the shadow price interpretation of the Lagrangian multipliers in which λi is the value of relaxing the ith constraint. This can be expressed more precisely as follows. Introduce a vector of parameters, a = (a1 , . . . , am ), into the generic optimization problem as follows
x ∗ (a) ≡ arg max f (x)
x

s.t. g 1 (x) + a1 ≥ 0
..
.
g m (x) + am ≥ 0

The presence of the a’s in the constraints means that the optimal solution(s) will depend upon which values are selected for these a’s. The notation x∗(a) reflects this functional dependence — it is simply the set of x’s which solve the problem for the given a. Now the value of the objective function also depends upon a. This can be expressed by
F (a) ≡ max f (x)
x

s.t. g 1 (x) + a1 ≥ 0
..
.
g m (x) + am ≥ 0

Notice that F (a), unlike x∗(a), is necessarily a single-valued function — there can be but one value for the maximized objective function. [In the previous example F (a) = a.] Provided that it is differentiable, then

λ_i = ∂F (0)/∂a_i      (5.8)

represents the rate at which the (optimized) value of the objective function increases as the ith constraint is relaxed or as a_i is increased from zero.

 Problem 5.4. In Mathematica the command Maximize[{f,cons},{x,y,...}] maximizes f with re-
spect to x,y,... subject to the constraints, cons. How well does Mathematica do with the illustrative
problem in Equation 5.3 on page 60?
Maximize[{x1+x2, x1>=0, x2>=0, 1-x1/3-x2>=0},{x1,x2}]

## 5.1.3 Roots of Symmetric Matrices

The Kuhn-Tucker first-order conditions can themselves be used to derive some results related to the
second-order conditions.
Theorem 26. If A is symmetric and if either

x⁰ = arg max_x x^T Ax  s.t. x^T x ≤ 1
λ⁰ = max_x x^T Ax  s.t. x^T x ≤ 1

or

x⁰ = arg min_x x^T Ax  s.t. x^T x ≥ 1
λ⁰ = min_x x^T Ax  s.t. x^T x ≥ 1

then x⁰ is a characteristic vector of A and λ⁰ is the associated characteristic root.

 Problem 5.5. Show, for any symmetric n by n matrix A, that the gradient of the associated quadratic form is given by

∂(x^T Ax)/∂x = 2Ax

For the maximization problem we have the Lagrangian function

Lmax = x^T Ax + λ[1 − x^T x]

while for the minimization problem we maximize −x^T Ax and have

Lmin = −x^T Ax + λ[x^T x − 1]

Thus

−Lmin = Lmax

and in either case a necessary condition is that

Lx = 2Ax − 2λx = 0

or

Ax = λx

Since an optimal x must satisfy the characteristic equation, it follows that an optimal x must be a characteristic vector of A and the Lagrangian multiplier λ must be the associated characteristic root. Pre-multiply both sides of the last expression by x^T to obtain

x^T Ax = λx^T x = λ · 1 = λ

Thus the Lagrangian multiplier/characteristic root is, in fact, equal to the optimized value of the objective function.
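As a numerical check of Theorem 26, the sketch below (in Python, with a sample symmetric matrix A of my own choosing, not from the text) maximizes the quadratic form over a fine grid on the unit circle and confirms that the maximizer and maximized value satisfy Ax = λx.

```python
import math

# Grid search of x'Ax over the unit circle x = (cos t, sin t).
# A is a sample symmetric matrix whose characteristic roots, by hand, are 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]

def qform(x):
    """The quadratic form x'Ax for a 2-vector x."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

N = 20000
best_k = max(range(N), key=lambda k: qform((math.cos(2 * math.pi * k / N),
                                            math.sin(2 * math.pi * k / N))))
x0 = (math.cos(2 * math.pi * best_k / N), math.sin(2 * math.pi * best_k / N))
lam0 = qform(x0)   # should equal the largest characteristic root of A
```

The grid maximizer turns out to lie along (1, 1)/√2 (up to sign), the characteristic vector associated with the larger root.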
Figure 5.3: Characteristic Roots and Vectors

Theorem 26 is illustrated in Figure 5.3 for the case in which

A = [ 1   0
      0  −1 ]

The characteristic roots and associated characteristic vectors for this indefinite quadratic form are

λ1 = 1;  x¹ = (1, 0) or (−1, 0)
λ2 = −1; x² = (0, 1) or (0, −1)

Moving around the unit circle, x^T x = 1, the point x¹ (x²) corresponds to the point at which x^T Ax is furthest above (below) the x-plane and λ1 (λ2) is the actual distance from the x-plane to the graph of x^T Ax at this point.
Theorem 27. It is necessary and sufficient for the symmetric matrix A to be

1. a positive definite quadratic form that all the characteristic roots of A be positive.

2. a positive semi-definite quadratic form that all the characteristic roots of A be non-negative.

3. an indefinite quadratic form that at least one of the characteristic roots of A be positive and at
least one be negative.

Note that it is sufficient to explore the “unit circle” {x ∈ Rn | x^T x = 1} since the sign of the quadratic form at an arbitrary point x ≠ 0 must be the same as the sign at the point x/|x|

(x/|x|)^T A (x/|x|) = (1/|x|²) x^T Ax,  ∀x ≠ 0

and the latter point is on the unit circle. Suppose, for example, that A is positive definite. Then since x^T Ax > 0 for all x ≠ 0 it follows that

λ ≡ min_x { x^T Ax : x^T x ≥ 1 } > 0

In view of Theorem 26 on page 63, λ must be the smallest of all of the characteristic roots, so it follows that all of the roots must be positive. Conversely, if all of the roots are positive then the minimization
problem has a strictly positive solution and the quadratic form is positive definite. A similar argument
holds if the quadratic form is positive semi-definite, save for the fact that the smallest root may
actually be equal to zero. Finally, if the quadratic form is indefinite then its minimum on the unit circle
will clearly be negative and its maximum will be positive and there will, accordingly, be characteristic
roots of both signs. Conversely, if there are both positive and negative characteristic roots then the
quadratic form takes on both positive and negative values and is therefore indefinite.
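For the 2 × 2 case the characteristic roots are available in closed form, so the classification of Theorem 27 can be coded directly. A Python sketch (the test matrices are my own, except for the indefinite matrix from Figure 5.3):

```python
import math

# Classify a symmetric 2x2 matrix [[a, b], [b, c]] by the signs of its
# characteristic roots, computed in closed form from the characteristic
# polynomial: lambda = (a+c)/2 +/- sqrt(((a-c)/2)^2 + b^2).

def roots_2x2(a, b, c):
    """Characteristic roots of [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return mean - disc, mean + disc

def classify(a, b, c):
    """Definiteness of the quadratic form x'Ax per Theorem 27."""
    lo, hi = roots_2x2(a, b, c)
    if lo > 0:
        return "positive definite"
    if lo >= 0:
        return "positive semi-definite"
    if hi < 0:
        return "negative definite"
    if hi <= 0:
        return "negative semi-definite"
    return "indefinite"
```

For the matrix of Figure 5.3 the roots are 1 and −1, so `classify(1, 0, -1)` reports an indefinite form.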
Finally, in view of Theorem 25 on page 58:
Theorem 28. Suppose f : Rn → R is twice continuously differentiable. For x∗ to impart a maximum (minimum) to f (x) it must be the case that the characteristic roots of fxx (x∗) are non-positive (non-negative).
 Problem 5.6. Consider the (unconstrained) problem of minimizing with respect to x and y the func-
tion
f = x^4 + 3 x^2 y + 5 y^2 + x + y;
Using Mathematica, first plot this function, then solve the minimization problem and finally check the
solution by examining the gradient and the characteristic roots (eigenvalues) of the hessian.

Plot3D[f, {x, -3, 3}, {y, -6, 6}]

min = Minimize[f, {x, y}] // N
sol = Last[min]
gradient[f_, vars_List] := Map[D[f, #] &, vars]
hessian[f_, vars_List] := Outer[D, gradient[f, vars], vars]
hes = hessian[f, {x, y}] /. sol
Eigenvalues[hes]

Comment on the output and indicate whether or not Mathematica’s solution satisfies the first and
second order conditions for a minimum.
 Problem 5.7. Repeat Problem 5.6 this time for the function
f = x^3 + y^3 + 2 x^2 + 4 y^2 + 6
and using FindMaximum[f, {{x, 0}, {y, -8/3}}] to find a local maximum of f starting at x = 0
and y = -8/3, this time restricting the plot range to {x,-2.5,1}, {y,-3.5,1}. Are any warnings
issued by Mathematica regarding the solution? Is the solution actually a local maximum, a local
minimum or a local saddle point?

 Problem 5.8. Consider a “knapsack problem” (integer programming problem) in which five items are
available for packing in the knapsack. Let the integer xi = {0, 1} denote whether (1) or not (0) the
ith item is included. The benefit to including the items are, respectively, 14, 10, 15, 8 and 9 and the
corresponding weights are 6, 8, 5, 6 and 4. Which items should be included if the goal is to maximize
the total benefit subject to the constraint that the total weight not exceed 18? Hint: Use Mathematica:

Maximize[{14x1+10x2+15x3+8x4+9x5,
6x1+8x2+5x3+6x4+4x5 <= 18,
0<=x1<=1, 0<=x2<=1, 0<=x3<=1, 0<=x4<=1, 0<=x5<=1,
Element[{x1,x2,x3,x4,x5}, Integers]},
{x1,x2,x3,x4,x5}]
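Since the problem has only five 0/1 variables, the Mathematica hint can be cross-checked by complete enumeration of the 2⁵ = 32 candidate packings. A Python sketch of the brute-force search (it computes the best packing rather than stating it here):

```python
from itertools import product

# Brute-force check for the knapsack problem of Problem 5.8.
benefits = [14, 10, 15, 8, 9]
weights = [6, 8, 5, 6, 4]
capacity = 18

def best_packing():
    """Return (total benefit, inclusion pattern) of the best feasible packing."""
    best = (0, (0,) * 5)
    for x in product((0, 1), repeat=5):          # all 32 inclusion patterns
        weight = sum(w * xi for w, xi in zip(weights, x))
        if weight <= capacity:                   # feasibility: weight at most 18
            value = sum(b * xi for b, xi in zip(benefits, x))
            best = max(best, (value, x))
    return best
```

Enumeration is only viable because the problem is tiny; for larger knapsacks one would use dynamic programming or an integer-programming solver such as Mathematica's `Maximize`.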

 Problem 5.9. Mathematica uses the optimized command LinearProgramming[c,m,b] to solve the
linear programming problem of minimizing c · x subject to the constraints that mx ≥ b and x ≥ 0
where c and x are n-tuples, m is an m × n matrix and b is an m-tuple. Use it to solve the “transporta-
tion problem” in which there are three sources of supply with quantities 47, 36 and 52, respectively,
and four destinations with demands 38, 34, 29 and 34, respectively. In this problem x will be the
12-tuple x11,...,x34 where xIJ denotes the quantity shipped from source I to destination J. The
12-tuple of the corresponding costs, cIJ, of shipping one unit from source I to destination J is
c = {5, 7, 6, 10, 9, 4, 6, 7, 5, 8, 6, 6}
The problem is to minimize the total shipping cost of meeting the specified demands from the avail-
able sources. There are seven constraints, three specifying that total shipments from each of the three
sources cannot exceed the available amounts and another four specifying that the total shipments
to each of the four destinations cannot be less than the demands. To express the constraints in the
required ≥ format we then need, for example, the first row of m to be
{-1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0}
and the first component of b to be −47. Complete the specification of m and b and give the solution
from LinearProgramming[c,m,b]. What is the total shipping cost in the optimal solution? How many
of the constraints are binding in the optimal solution?
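One way to avoid writing out the twelve-column rows of m by hand is to generate them programmatically. The Python sketch below builds m and b in the layout described above (supply rows negated to fit the required ≥ format); it constructs the constraint data only and does not solve the linear program itself.

```python
# Build the constraint matrix m and right-hand side b for
# LinearProgramming[c, m, b].  The decision vector is the 12-tuple
# (x11, ..., x34), ordered source-by-source as in the text.
supplies = [47, 36, 52]
demands = [38, 34, 29, 34]
n_src, n_dst = len(supplies), len(demands)

m = []
b = []
for i in range(n_src):                    # -sum_j xij >= -supply_i
    row = [0] * (n_src * n_dst)
    for j in range(n_dst):
        row[i * n_dst + j] = -1
    m.append(row)
    b.append(-supplies[i])
for j in range(n_dst):                    # sum_i xij >= demand_j
    row = [0] * (n_src * n_dst)
    for i in range(n_src):
        row[i * n_dst + j] = 1
    m.append(row)
    b.append(demands[j])
```

The first row and first component of b reproduce the example given in the problem statement; the remaining six rows follow the same pattern.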

## 5.1.4 The Parametric Problem

The typical “optimizing model of behavior” in microeconomics adds one final ingredient — parameters
— to the constrained optimization problem:

x(a) ≡ arg max_x f (x, a)
 s.t. g^i (x, a) ≥ 0, i = 1, . . . , m      (5.9)

F (a) ≡ max_x f (x, a)
 s.t. g^i (x, a) ≥ 0, i = 1, . . . , m

where

x ∈ Rn  the explained or endogenous variables
a ∈ Rp  the parameters or exogenous variables
x(a)  the reduced form, a point-to-set mapping

Note that this is a generic optimization problem in the sense that it incorporates:

1. Minimization: replace min h(x, a) with max f (x, a) where f (x, a) ≡ −h(x, a)
2. Arbitrary inequality constraints: replace w(x, a) ≤ z(x, a) with g(x, a) ≡ z(x, a) − w(x, a) ≥ 0
3. Equality constraints: replace h(x, a) = 0 with the two constraints g 1 (x, a) ≡ h(x, a) ≥ 0 and
g 2 (x, a) ≡ −h(x, a) ≥ 0

The vector x represents the endogenous variables whose values will be chosen, according to the model,
by some person as if to maximize the objective function subject to the constraints. The business of
the model is to predict the values of these variables — hence the term “endogenous”. The vector a
represents the parameters or exogenous variables whose values are not explained by the model but
rather are exogenously specified — the “dial settings” for the experiment. In consumption theory, for
example, a would correspond to the consumer’s income and the market prices of consumption goods
and x to the quantities of these consumption goods demanded. The mapping, x(a), would in this
case be called the consumer’s demand function.
The purpose of an optimizing model is to explain the choice of x in experiments which are charac-
terized by the given values of a or, in short, to ascertain the properties of the point-to-set mapping
x(a) — the term “point-to-set” reflects the fact that the image of a may be a set rather than a single
point. This mapping might be called the reduced form of the model. It is a sort of “bottom line” for the
model since it expresses the endogenous or explained variables directly as functions of the exogenous
or explanatory variables. To predict the result of an experiment simply stick the description of the
experiment, a, into this function and out pops the set of possible results, x(a).
Models can be more or less specific. In the typical example problem or exercise both the objective
function and constraints are explicitly stated and it is possible to determine the exact properties of
x(a) since it is generally possible to solve for the reduced form explicitly. In most “real” analysis,
on the other hand, neither the objective function nor the constraints are explicitly stated. Instead
qualitative properties of these functions are given and the analytical challenge becomes to trace these
qualitative properties through the optimization problem to discover what qualitative properties they
entail for the reduced form.
A “conservation of information” law holds for this deductive process — the more (or less) specific
the information about the objective function and constraints the more (or less) specific will be the
derived information about the reduced form. In a model in which all functions are explicitly stated,
for example, it might be possible to deduce that an increase in a given sales tax by five cents would
reduce the given consumer’s consumption of the taxed commodity by ten pints per week. In a less
specific model, on the other hand, it might only be possible to state that an increase in the sales tax
would entail a reduction (by some indeterminate amount) in the quantity consumed. Such results are
called comparative static effects and are typically expressed as propositions concerning the signs of
partial derivatives of the reduced form.

## 5.2 The Well Posed Problem

Certain potential problems of optimization models are best recognized and eliminated as early in the
process as possible:

1. Is it possible for x(a) to be empty, i.e, to contain no solutions whatever? Since this amounts
to the prediction that nothing is possible, this would be a rather serious, indeed fatal, flaw in a
model.
2. Is it possible for x(a) to contain more than one solution? The limiting case here would be
that anything is possible or that the model has no predictive value — it doesn’t rule out any
possibilities whatever.

3. Is it possible for a single-valued x(a) not to be smooth, i.e., discontinuous or non-differentiable?
The “pre Chaos Theory” view of the world of actual experiments was that small changes in exogenous parameters produce correspondingly small changes in endogenous variables — that x(a)
is smooth. The “post Chaos Theory” view allows for “butterfly effects” — e.g. the possibility
that the small change in air currents caused by a butterfly’s wings could cause a large change in
weather patterns. Whatever one’s view of such matters, computing comparative static effects is
certainly much easier when they can be obtained as (partial) derivatives.

## 5.2.1 The Existence Problem

The optimization problem will have no solutions only for the most poorly posed problems. A suggestive example is the problem of finding the largest real number which is strictly less than one. There is no solution for this constrained maximization problem, nor is there one for the unconstrained problem of simply choosing the largest real number. The failure in both problems stems from the same source — the feasible set, the set of points which satisfy the constraint(s), is not compact.
Recall from Theorem 21 on page 45 that a continuous function must attain a maximum and a minimum
on a compact set and from Theorem 19 on page 45 that a subset of Rn will be compact if and only if
it is closed and bounded.
Note the failure of the feasible sets in the two examples: the set of real numbers strictly less than one is not closed and the set of all real numbers is not bounded. Compactness turns out to be just the right requirement for feasible sets in view of these two related facts. Stock assumptions which assure that a problem has at least one solution are thus (i) a continuous objective function and (ii) constraints which yield a compact feasible set.

## 5.2.2 The Uniqueness Problem

Now it would also be nice if x(a) were a function and not a point-to-set mapping since it’s nicer to have a model which makes a unique prediction regarding behavior than one which merely enumerates a collection of possibilities. The predictive power of the model is greater and comparative statics results, which require functions, become possible. Unfortunately the “arguments” which solve maximization problems need not generally be unique. There are, however, special classes of problems for which the solutions are necessarily unique and, thus, for which x(a) is a function. One of the most important of these involves problems whose objective functions and constraints are quasi-concave.
The connection of quasi-concavity to the uniqueness of solutions to Equation 5.9 on page 66 is imme-
diate. Suppose, for example, that the constraints, g i (x, a), are (strictly) quasi-concave functions in x.
Then the set of x’s which satisfy g i (x, a) ≥ 0 is simply a level set for the (strictly) quasi-concave func-
tion and thus a (strictly) convex set. The feasible set, the set of x’s which satisfy all of the constraints,
as an intersection of (strictly) convex sets must itself be (strictly) convex.
Now add the fact that the objective function, f (x, a), is quasi-concave in x and x(a) must then be a
convex set. This follows from two observations. If any two distinct points belong to x(a) then (i) both
points must belong to the feasible set and (ii) both points must impart the same value to the objective
function. It follows from the first and the convexity of the feasible set that any point along the line
segment connecting the two points must belong to the feasible set. It follows from the second and the
quasi-concavity of the objective function that any point along this line segment must impart a value
to the objective function which is at least as great as the value at the endpoints. But the value of the objective function at any point along the line segment also cannot exceed the value at the endpoints
else the end points could not have belonged to x(a). Hence the value of the objective function must
be constant along the line segment and the entire line segment must then belong to x(a).
If the constraints are quasi-concave and the objective function is strictly quasi-concave then x(a) can contain at most a single point. To see this suppose to the contrary that x, x′ ∈ x(a) with x ≠ x′ and pursue the argument of the previous paragraph. Any point along the line segment connecting x and x′ must be feasible and must now impart a value to the objective function which is, because of strict quasi-concavity, strictly greater than the value at either endpoint. Since this contradicts the optimality of x and x′, it cannot be the case that x(a) contains two or more points — either x(a) is empty or it contains a unique solution and can be regarded as a function.

## 5.2.3 The Differentiability Problem

Optimization models in which the mapping x(a) is not only a function but differentiable as well are particularly important in economics. In such models comparative static effects are easily expressed as the partial derivatives of x(a).
Suppose that in Equation 5.9 on page 66 the constraints are quasi-concave and the objective function
is strictly quasi-concave in x and differentiable in both x and a. Then the problem has a unique
solution (from strict quasi-concavity) which can be characterized by the Kuhn-Tucker Conditions (from
differentiability in x). The Lagrangian Function is
L(x, λ, a) ≡ f (x, a) + Σ_{j=1}^{m} λ_j g^j (x, a)

and the Kuhn–Tucker necessary conditions are

Lx (x(a), λ(a), a) = 0
λ(a) · Lλ (x(a), λ(a), a) = 0
Lλ (x(a), λ(a), a) ≥ 0
λ(a) ≥ 0

Now choose values for the exogenous variables, a = â, suppose the non-binding constraints have been
eliminated and write the Kuhn-Tucker conditions as
Lx (x̂, λ̂, â) = 0
g(x̂, â) = 0      (5.10)
where
L(x, λ, â) ≡ f (x, â) + λg(x, â)
g(x, â) ≡ [g 1 (x, â), . . . , g m (x, â)]

When Equation 5.10 is solved for x(â) will the result be differentiable? The Implicit Function Theorem
provides the answer to this question.
Theorem 29 (Implicit Function Theorem). Suppose hi (y, a), i = 1, . . . , l, are real-valued and continuously differentiable functions on Rl × Rp with y ∈ Y , an open subset of Rl , and a ∈ A, an open subset of Rp . Let hi (ŷ, â) = 0, i = 1, . . . , l. Provided that the determinant of the Jacobian does not vanish, |hy (ŷ, â)| ≠ 0, there exists a continuously differentiable function y(a) such that ŷ = y(â) and hi (y(a), a) = 0, i = 1, . . . , l, for all a in some neighborhood of â.

Intuitively, this theorem says that if you fix a = â you can regard h as a mapping which takes a point y in Rl and produces another point, h(y, â), in Rl . If this mapping is differentiable it has a good linear approximation given by the Jacobian hy . If this Jacobian is non-singular then the linear approximation is invertible. This means that it’s possible to find out what y mapped into zero when a = â, namely, ŷ = y(â). The theorem states that it is not only possible to find ŷ but, locally at least, that it is also possible to find the function y(a) and that this function will be differentiable.
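A one-dimensional illustration may help. In the Python sketch below (the function h is my own example, not from the text), h(y, a) = y³ + y − a has h_y = 3y² + 1 > 0 everywhere, so the theorem guarantees a differentiable y(a); the implied slope −h_a/h_y = 1/(3y² + 1) can be checked against a finite difference.

```python
# Implicit Function Theorem in one dimension: h(y, a) = y^3 + y - a.
# Since h_y = 3y^2 + 1 > 0 everywhere, y(a) exists and is differentiable,
# with y'(a) = -h_a / h_y = 1 / (3 y(a)^2 + 1).

def y_of_a(a, tol=1e-12):
    """Solve y^3 + y - a = 0 for y by bisection."""
    lo, hi = -abs(a) - 1, abs(a) + 1      # h(lo) < 0 < h(hi) by construction
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** 3 + mid - a < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a_hat = 2.0
y_hat = y_of_a(a_hat)                               # here y(2) = 1 exactly
slope_ift = 1 / (3 * y_hat ** 2 + 1)                # slope from the theorem
slope_fd = (y_of_a(a_hat + 1e-6) - y_of_a(a_hat - 1e-6)) / 2e-6  # check
```

The two slopes agree, which is exactly what the theorem promises: the implicitly defined y(a) is differentiable and its derivative is computable from the partials of h alone.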
To use the Implicit Function Theorem, let the first order conditions of Equation 5.10 on preceding
page correspond to the hi , let l correspond to n + m and let y correspond to (x, λ). Then under
the conditions of the implicit function theorem there is a continuously differentiable function y(a) =
[x(a), λ(a)] such that

Lx (x(a), λ(a), a) = 0
(5.11)
g(x(a), a) = 0

## 5.3 Comparative Statics

## 5.3.1 The Classic Approach

Since the Implicit Function Theorem assures that x(a) is differentiable, we can differentiate Equa-
tion 5.11 with respect to ak to obtain

[ Lxx  gx^T ] [ ∂x/∂ak ]   [ Lxak ]
[ gx    0   ] [ ∂λ/∂ak ] + [ Lλak ] = 0

Since the matrix on the left side of this expression (the Jacobian of the hi ’s) is assumed to be non-singular, the inverse exists and we have

[ ∂x/∂ak ]     [ Lxx  gx^T ]⁻¹ [ Lxak ]
[ ∂λ/∂ak ] = − [ gx    0   ]   [ Lλak ]      (5.12)

Equation 5.12 is called the fundamental equation of comparative statics. It is apparent from this equation that the inverse of the bordered Hessian

H = [ Lxx  gx^T
      gx    0   ]

is the key to comparative static results in the “classical approach”. In this approach second order necessary conditions provide local information and concavity of f and the g^i ’s provides global information.
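Equation 5.12 can be checked on a problem small enough to solve by hand. In the Python sketch below (my own example, not from the text), the problem is to maximize −(x1² + x2²) subject to x1 + x2 − a = 0. The solution is x1 = x2 = a/2 with λ = a, so the bordered system should return ∂x1/∂a = ∂x2/∂a = 1/2 and ∂λ/∂a = 1.

```python
# Check of the fundamental equation of comparative statics on
# max -(x1^2 + x2^2) s.t. x1 + x2 - a = 0.
# Here Lxx = -2I, gx = [1, 1], Lxa = [0, 0], and L_{lambda a} = -1.

def solve3(M, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    A = [row[:] + [r] for row, r in zip(M, rhs)]   # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[r][3] / A[r][r] for r in range(3)]

# Bordered system [[Lxx, gx^T], [gx, 0]] and right-hand side -[Lxa; L_{lambda a}]
M = [[-2.0, 0.0, 1.0],
     [0.0, -2.0, 1.0],
     [1.0, 1.0, 0.0]]
rhs = [0.0, 0.0, 1.0]          # -[0, 0, -1]
dx1_da, dx2_da, dlam_da = solve3(M, rhs)
```

The computed derivatives match the values read off the explicit solution, confirming that Equation 5.12 reproduces the comparative statics of this problem.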
 Problem 5.10. Suppose that the unconstrained optimization problem, maxx f (x, a), has a unique solution x∗(a) where x, a ∈ R and ∂²f (x, a)/∂x² < 0.

1. Show that Sign(dx ∗ (a)/da) = Sign(∂ 2 f (x, a)/∂x∂a) where ∂ 2 f (x, a)/∂x∂a is evaluated at x =
x ∗ (a).

2. Interpret this result geometrically by plotting ∂f (x, a)/∂x for a given value of a in a graph with
x plotted on the horizontal axis.

5.3.2 The Envelope Theorem

An important alternative approach to comparative statics is provided by the envelope theorem. Con-
sider a controlled experiment in which only the ith exogenous variable will be changed. The envelope
theorem concerns the following question. How does

F (a) ≡ f (x(a), a)
     = f (x(a), a) + λ(a) · g(x(a), a)      (5.13)

change with ai ? Put more precisely, what is the (partial) derivative of F (a) with respect to ai ? Consider
the right hand side of this expression and notice that as ai changes there are two types of effects upon
the value of f (x(a), a). The first corresponds to the (partial) derivative of f (x, a) with respect to ai
evaluated at x = x(a). This is the direct effect and is denoted ∂f (x(a), a)/∂ai . The second, or indirect
effect, corresponds to the effect of the change in ai upon the components of x, ∂xj (a)/∂ai , and the
effect of these changes in the components of x upon f (·), ∂f (·)/∂xj . Using the chain rule, the total
effect is the sum of these two components:

∂F (a)/∂ai ≡ ∂f (x(a), a)/∂ai + Σ_{j=1}^{n} [∂f (x, a)/∂xj ] [∂xj (a)/∂ai ]

The envelope theorem says, quite simply, that the second — and complicated — term in this expression
is zero:
Theorem 30 (Envelope Theorem). Suppose

x(a) ≡ arg max_x f (x, a)
 s.t. g(x, a) ≥ 0

and

F (a) = f (x(a), a)
L(a) = f (x(a), a) + λ(a)g (x(a), a)

where x(a) and λ(a) are functions satisfying the Kuhn-Tucker conditions (Equation 5.4 through Equation 5.7). Provided that both F (·) and L(·) are differentiable

∂F (a)/∂ai = ∂L(x, λ, a)/∂ai

where

∂L(x, λ, a)/∂ai = ∂f (x, a)/∂ai + λ · ∂g(x, a)/∂ai

and all partial derivatives are evaluated at x = x(a) and λ = λ(a).

The basis of this surprising fact and the reason it is called the envelope theorem is illustrated in Figure 5.4. In this example there are no constraints and a has a single component:

f (x, a) = [a − (x − a)²]/2

x∗(a) = arg max_x [a − (x − a)²]/2 = a

F (a) = max_x [a − (x − a)²]/2 = f (x∗(a), a) = a/2

Figure 5.4 plots F (a), the maximized value of the objective function as a changes and as x is continually changed to remain optimal for the new values of a. The slope of F (a) at a point corresponds, of course, to the total derivative dF (a)/da. A series of other curves are plotted in which x is held constant at the value which would be optimal for a specific value of a; e.g., f (7, a) corresponds to holding x constant at x = 7 — the value of x which is optimal when a = 7 — and then varying a. The slope of one of these fixed-x curves at a particular point is the partial derivative ∂f (x, a)/∂a.

Figure 5.4: The Envelope Theorem without Constraints

Note that the graph of F (a) is the “envelope” of the fixed-x curves. The reason for this is simple. None of the fixed-x curves can ever lie above F (a) which, after all, is the maximized value of f (x, a) over all possible choices of x. The fixed-x curves can touch F (a), however. Consider the point at which a = 7, for example. Since x = 7 is optimal for this value of a, it follows that F (7) = f (7, 7). The fixed-x curve f (7, a) thus touches F (a) at a = 7. A similar story holds for the other fixed-x curves, each of which contributes, in this case, a single point to the envelope F (a).

Now if two curves touch at a point but do not cross they are tangent to one another. If, moreover,
both curves are differentiable, then their slopes are well defined at all points and their slopes must be
equal at points of tangency. This is the envelope theorem — the slope of a fixed x curve must be the
same as the slope of the envelope curve at the point of tangency.
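The tangency can be confirmed numerically for the example above. A Python sketch comparing a finite-difference slope of the envelope F (a) with the slope of the fixed-x curve at a = 7:

```python
# Envelope theorem check for f(x, a) = (a - (x - a)^2)/2, where the optimal
# x is x*(a) = a and the envelope is F(a) = a/2.

def f(x, a):
    return (a - (x - a) ** 2) / 2

def F(a):
    """Maximized value; the maximizer is x = a."""
    return f(a, a)

a_hat, h = 7.0, 1e-6
dF = (F(a_hat + h) - F(a_hat - h)) / (2 * h)                 # slope of envelope
df_fixed_x = (f(a_hat, a_hat + h) - f(a_hat, a_hat - h)) / (2 * h)  # fixed-x slope
```

Both slopes come out to 1/2: the envelope F (a) and the fixed-x curve f (7, a) are tangent at a = 7, exactly as Figure 5.4 depicts.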

A slightly more complex illustration of the envelope theorem is provided by the following problem:

arg max_x x²
 s.t. x ≤ a

The Lagrangian function for this problem is

L(x, λ, a) ≡ x 2 + λ(a − x)

and the solution is given by

x ∗ (a) = a
λ∗ (a) = 2a
F (a) ≡ L(x ∗ (a), λ∗ (a), a)
= a2

Here “fixed x” curves are obtained by holding both x and λ constant at values which would be optimal for some a and then letting a vary. Note again that F (a) is the envelope of these curves and that, for example, the tangency of F (a) with L(3, 6, a) at a = 3 means that

dF (3)/da = ∂L(x∗(3), λ∗(3), 3)/∂a

Figure 5.5: The Envelope Theorem with Constraints

 Problem 5.11. Consider the problem

max_{x≥0} min{x1 , x2 }
 s.t. p1 x1 + p2 x2 ≤ m

with parameters p1 , p2 , m > 0.

1. Show that the solution to this problem is given by

x∗_i (p1 , p2 , m) = m/(p1 + p2),  i = 1, 2
F (p1 , p2 , m) = m/(p1 + p2)

## 2. Is the objective function differentiable? Is F (p1 , p2 , m) differentiable?

 Problem 5.12. Solve Problem 5.11 using Mathematica for the case in which p1 = 2, p2 = 3 and
m = 10.
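As a check on Problem 5.11 that does not rely on calculus (the objective min{x1, x2} is not differentiable), the Python sketch below does a grid search for the data of Problem 5.12, p1 = 2, p2 = 3, m = 10; the claimed solution is x1 = x2 = m/(p1 + p2) = 2 with F = 2.

```python
# Grid search for max min(x1, x2) s.t. p1 x1 + p2 x2 <= m, x >= 0,
# with p1 = 2, p2 = 3, m = 10.  Since the objective is nondecreasing in x2,
# it is enough to spend the remaining budget entirely on x2.
p1, p2, m = 2.0, 3.0, 10.0

best_val, best_x = -1.0, None
steps = 2000
for i in range(steps + 1):
    x1 = 5.0 * i / steps                 # x1 ranges over [0, m/p1] = [0, 5]
    x2 = (m - p1 * x1) / p2              # rest of the budget goes to x2
    if x2 >= 0:
        val = min(x1, x2)
        if val > best_val:
            best_val, best_x = val, (x1, x2)
```

The search confirms the closed-form answer: the maximum of the “Leontief” objective is found where x1 = x2 on the budget line.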

## 5.3.3 The Envelope Approach

The connection of the Envelope Theorem to comparative statics depends on a special characteristic of
problems frequently encountered in microeconomics. The endogenous variables in such problems are
the quantities chosen of various commodities and the exogenous parameters are the market prices
of these same commodities. The objective function, moreover, commonly depends upon the market
value of the quantities chosen — the dot product of prices and quantities. Such problems have the
special form:

max_x a · x
 s.t. g(x) ≥ 0      (5.14)

Since the Lagrangian for Equation 5.14 is

L(x, λ, a) = a · x + λg(x)

the Envelope Theorem implies that

∂F (a)/∂ai = ∂L(x, λ, a)/∂ai = xi (a)

and
∂²F (a)/∂ai ∂aj = ∂xi (a)/∂aj
Here the comparative static effects, ∂xi (a)/∂aj , are obtained as the second partial derivatives of F (a).
In this approach global information about the concavity or convexity of F (a) provides global informa-
tion about the comparative static effects.
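A small numerical instance of this envelope approach (my own example, not from the text): maximizing a · x subject to 1 − x^T x ≥ 0 gives F (a) = |a| and x(a) = a/|a|, so ∂F/∂a_i should equal x_i (a).

```python
import math

# Envelope approach: F(a) = max a.x subject to x.x <= 1.
# The maximizer is x(a) = a/|a| and the value is F(a) = |a|.

def F(a1, a2):
    """Maximized value of a.x over the unit disk, i.e. |a|."""
    return math.hypot(a1, a2)

a1, a2, h = 3.0, 4.0, 1e-6
x1 = a1 / F(a1, a2)                                  # x_1(a) = a_1/|a|
dF_da1 = (F(a1 + h, a2) - F(a1 - h, a2)) / (2 * h)   # envelope slope in a_1
```

The finite-difference slope of F reproduces the optimal x1, which is the first-derivative claim above; differentiating once more would deliver the comparative static effects ∂xi/∂aj directly from F.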

Chapter 6

Dynamics

## 6.1 Dynamic Systems

A dynamic system consists of a state space, X, a set of possible times, T , and a transition function, F : T × X → X, satisfying:

– F (0, x) = x for all x ∈ X

– F (t1 + t2 , x) = F (t2 , F (t1 , x)) = F (t1 , F (t2 , x)) for all x ∈ X and all t1 , t2 ∈ T

The state space, X, is typically taken to be a topological space and the set of possible times, T , is taken
either to be the real numbers for continuous time systems or the integers for discrete time systems. For
continuous time systems, F is often called a flow and for discrete time systems a map.
For a given initial condition, x(0),
x(t) ≡ F (x(0), t)

is often called a path or trajectory and represents the entire future, t > 0, and history, t < 0, of the
system that was in state x(0) at t = 0.
Interest focuses upon the existence of equilibria, i.e., x ∗ for which F (x ∗ , t) = x ∗ for all t and, when
an equilibrium exists, the stability of the equilibrium:
Definition 6.1. The equilibrium x ∗ is stable iff for every neighborhood of x ∗ , U , there is another
neighborhood of x ∗ , V , such that x(0) ∈ V implies x(t) ∈ U for every t > 0.
If, additionally, V can be chosen so that limt→∞ x(t) = x ∗ then x ∗ is called asymptotically stable.

Now if F is differentiable then we can define

f (x) ≡ ∂F (x, 0)/∂t
and thus obtain the system of differential equations

ẋ = f (x) (6.1)

From a modeling perspective, it is often easier to begin by specifying the model either as a system of differential equations, Equation 6.1, for a continuous time system or as a system of difference equations for a discrete time system

∆x = f (x)      (6.2)

If the good news is that f is easier to specify, the bad news is that F is still needed and must now be
obtained by solving (integrating) f , a task which is often quite difficult to do manually but, fortunately,
easy to do with a computer and software such as Mathematica — see the introduction to this program,
Using Mathematica, on page 93.

## 6.1.1 Example: Predator and Prey

The predator-prey or Lotka-Volterra equations provide a very nice example of how f is specified and
then solved for F . These equations are a pair of first order, non-linear, differential equations frequently
used to describe the dynamics of biological systems in which two species interact, one a predator and
one its prey. They were proposed independently by Alfred J. Lotka in 1925 and Vito Volterra in 1926.
Here X = R2+ , T = R+ and f is given by

ẋ ≡ dx/dt = (a − by)x
ẏ ≡ dy/dt = (cx − d)y
where

• x is the population of the prey (for example, rabbits) and y is the population of the predator (for example, wolves)

• ẋ and ẏ are the growth rates of the two populations against time

## • a, b, c, d are parameters representing the interaction of the two species

The logic is that wolves are bad for rabbits so rabbit growth, ẋ, is decreasing in wolves, y. Rabbits, on
the other hand, are good for wolves, so wolf growth, ẏ, is increasing in rabbits, x.
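Before turning to Mathematica, the system can be integrated directly. The Python sketch below uses a fourth-order Runge-Kutta step with a = b = c = d = 1 and checks that the quantity V(x, y) = x − ln x + y − ln y barely drifts along the computed path. (That V is constant along every orbit is a standard fact about the Lotka-Volterra system, not derived in the text; it is why the solutions turn out to be closed curves.)

```python
import math

# RK4 integration of the predator-prey system with a = b = c = d = 1:
#   xdot = (1 - y) x,  ydot = (x - 1) y.
# The quantity V(x, y) = x - ln x + y - ln y is conserved along orbits.

def rhs(x, y):
    return (1 - y) * x, (x - 1) * y

def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(x, y)
    k2 = rhs(x + h * k1[0] / 2, y + h * k1[1] / 2)
    k3 = rhs(x + h * k2[0] / 2, y + h * k2[1] / 2)
    k4 = rhs(x + h * k3[0], y + h * k3[1])
    return (x + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6,
            y + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6)

def V(x, y):
    return x - math.log(x) + y - math.log(y)

x, y = 0.2, 0.2                   # same initial condition as in the text
v0 = V(x, y)
for _ in range(10000):            # integrate to t = 10 with step h = 0.001
    x, y = rk4_step(x, y, 0.001)
drift = abs(V(x, y) - v0)
```

A near-zero drift in V is numerical evidence that the trajectory starting at (0.2, 0.2) stays on a single closed orbit around the rest point, which is what the phase diagrams below display.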

What can be said about F ? Well, the first equation implies that dx/dt = 0 when y = a/b and the second equation implies that dy/dt = 0 when x = d/c. This means that the state space, the positive quadrant of R², can be divided into the four regions illustrated in Figure 6.1. For y > a/b it must be the case that ẋ < 0 and for y < a/b it must be the case that ẋ > 0. Arrows are used to reflect this fact — left-pointing above y = a/b and right-pointing below y = a/b. Similarly, for x > d/c, ẏ > 0 and for x < d/c, ẏ < 0. Upward-pointing arrows to the right of x = d/c and downward-pointing arrows to the left of x = d/c reflect this fact.

Figure 6.1: Predator and Prey

Next, paths consistent with the arrow requirements are added. Note that these must cross y = a/b vertically since ẋ = 0 everywhere along this line. Similarly, they must cross x = d/c horizontally since ẏ = 0 everywhere along this line. The resulting picture is suggestive — there is a rest point at (d/c, a/b) and, away from this rest point, paths seem to flow around the rest point in a counter-clockwise direction.

All this hasn’t been too difficult, but Mathematica can do better. The vector field plot for this system when all the parameters are equal to one is illustrated in Figure 6.2. In this type of plot, a grid of (x, y) coordinates is selected and then the vector (ẋ, ẏ) evaluated at the grid point is plotted at each of the grid points. These vectors thus point in the direction of the flow with a length that is proportional to the velocity of the flow. All Mathematica needs to produce this plot is the command

VectorFieldPlot[{(1-y)x, (x-1)y}, {x, 0, 4}, {y, 0, 4}]

Figure 6.2: Vector Field Plot

But wait, there’s more. We can obtain the complete path for a given initial condition as simply as

sol1 = NDSolve[{D[x[t], t] == (1 - y[t]) x[t],
    D[y[t], t] == (x[t] - 1) y[t],
    x[0] == 0.2, y[0] == 0.2}, {x[t], y[t]}, {t, 0, 10}]

and then plot the results for this combined with a couple of other initial conditions in Figure 6.3 with the command

ParametricPlot[Evaluate[{sol1, sol2, sol3}], {t, 0, 10}]

Note that the solutions are indeed orbits, i.e., if the initial state is (0.2, 0.2), for example, then the path begins on the outermost of the illustrated orbits and then forever follows this orbit counter-clockwise around and around the rest point at (1, 1). Whatever the initial state, the story is similar — there is an orbit through the initial state and the system then stays on that orbit forever.

Figure 6.3: Predator and prey

 Problem 6.1. Is the equilibrium at (1, 1) in the predator and prey model stable? Is it asymptotically stable?

 Problem 6.2. Use Mathematica to examine the simultaneous system of differential equations

ẋ = (1/2)x² − y
ẏ = x − y

with the following

Needs["VectorFieldPlots`"];
vf = VectorFieldPlot[{1/2 x^2-y, x-y}, {x,-1,1}, {y,-1,1}];
Show[vf, Axes -> True]
sols = Table[{x[t], y[t]} /. NDSolve[{x'[t] == 1/2 x[t]^2 - y[t],
    y'[t] == x[t] - y[t], x[0] == n, y[0] == 0.5}, {x[t], y[t]},
{t, 0, 20}], {n, -1, 1, 0.5}];
ph = ParametricPlot[Evaluate[sols], {t, 0, 20},
PlotRange -> {{-1, 1}, {-1, 1}}]
Show[{vf, ph}, Axes -> True]
 Problem 6.3. Use Mathematica to examine the differential equation

ẋ = 3xt²

with the following

Needs["VectorFieldPlots`"];
vf = VectorFieldPlot[{1, 3 x t^2}, {t, -2, 1}, {x, -1.5, 1.5}];
Show[vf, Axes -> True]
sol = x[t] /. DSolve[x'[t] == 3 x[t] t^2, x[t], t][[1]]
solutions = Plot[Evaluate[Table[sol /. C[1] -> n, {n, -.5, .5, .25}]],
    {t, -2, 1}]
Show[{vf, solutions}, Axes -> True]

## 6.1.2 Example: The Logistic Map

The logistic map is often cited as an example of how complex, chaotic behavior can arise from very
simple non-linear dynamic systems. It models the population of a single species where (i) reproduction
means the population will increase at a rate proportional to the current population but (ii) starvation
means that the population will decrease at a rate proportional to the value obtained by taking the
theoretical “capacity” of the environment less the current population. Mathematically this can be
written as the difference equation
xt+1 = a xt(1 − xt)

where xt is a number between zero and one representing the population at time t, x0 represents the initial population at time 0, and a is a positive number representing a combined rate for reproduction and starvation.

Since both xt and 1 − xt are positive but less than one, axt(1 − xt) will be less than xt if a < 1. Thus with a between zero and one, the population will eventually die out for any initial population.

When a is between 1 and 2, the population quickly stabilizes on the rest point value, x∗, determined by

xt+1 = xt = x∗ ⇒ x∗ = ax∗(1 − x∗) ⇒ x∗ = (a − 1)/a

for any initial population. This is illustrated in Figure 6.4 for the case in which a = 1.5. Integer values of t are plotted along the horizontal axis and xt is plotted on the vertical axis. The different paths correspond to different values of x0.

Figure 6.4: Asymptotically Stable

When a is between 2 and 3, the population will also eventually stabilize on the same value but first oscillates around that value for some time. When a is between 3 and approximately 3.54 the population eventually enters a cycle among points and then remains in this cycle forever. This is illustrated in Figure 6.5. In the left-hand panel a = 3.2 and the population eventually enters a two point cycle. In the right-hand panel a = 3.5 and the population eventually enters a four point cycle. This stable, four point cycle is also illustrated in the left-hand panel of Figure 6.7 on the following page.

Figure 6.5: Cycles: Two Point (left) and Four Point (right)

 Problem 6.4. Is the equilibrium at (a − 1)/a in the logistic map stable for a between 3 and 3.54? Is
it asymptotically stable?
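These claims are easy to check numerically outside Mathematica as well; a minimal Python sketch (the helper name, burn-in length, and rounding tolerance are my own choices):

```python
def logistic_tail(a, x0=0.2, burn=1000, keep=8):
    """Iterate x -> a x (1 - x) from x0, discard a transient, return the tail."""
    x = x0
    for _ in range(burn):
        x = a * x * (1 - x)
    tail = []
    for _ in range(keep):
        x = a * x * (1 - x)
        tail.append(x)
    return tail

# a = 1.5: the tail sits on the rest point x* = (a - 1)/a = 1/3
print(logistic_tail(1.5)[-1])

# a = 3.2: the tail alternates between the two points of a cycle
cycle = sorted(set(round(v, 6) for v in logistic_tail(3.2)))
print(cycle)   # two distinct values
```

After a long burn-in the a = 1.5 orbit is glued to (a − 1)/a = 1/3, while the a = 3.2 orbit visits exactly two distinct values, as in the left-hand panel of Figure 6.5.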

Saving the best for last, when a is approximately 3.57 the system becomes chaotic. This is illustrated for a = 3.7 in Figure 6.6 on the following page.

Chaotic behavior can be described as extreme sensitivity to initial conditions. Figure 6.7 on the following page shows the last 40 of 120 values of the logistic map for x0 = 0.02. In the left-hand panel a = 3.5 and in the right-hand panel a = 3.7. Each panel has two plots, one computed with the default 16 digit accuracy and another with 80 digit accuracy. While the two plots are indistinguishable in the non-chaotic, left-hand panel, there is a substantial difference in the chaotic, right-hand panel where the two plots, which started from the same initial conditions, appear less and less related as the slight “rounding” differences accumulate with each passing period.

Figure 6.7: Numerical Sensitivity: non-chaotic (left) and chaotic (right)
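The same sensitivity is visible in ordinary double-precision Python. In the sketch below (the initial gap of 10⁻¹⁰ and the horizon of 200 periods are my own choices), two copies of the map with a = 3.7 start almost together; the gap is still tiny after ten periods but eventually grows as large as the attractor itself:

```python
a = 3.7
x, y = 0.2, 0.2 + 1e-10        # initial conditions differing by 1e-10
gap10, max_gap = 0.0, 0.0
for t in range(1, 201):
    x = a * x * (1 - x)
    y = a * y * (1 - y)
    gap = abs(x - y)
    max_gap = max(max_gap, gap)
    if t == 10:
        gap10 = gap
print(gap10, max_gap)          # still tiny at t = 10; order one later
```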

An interesting historical note is that Edward Lorenz was using a numerical computer model to rerun a
weather prediction in 1961, when, as a shortcut on a number in the sequence, he entered the decimal
.506 instead of entering the full .506127 that the computer would hold. The result was a completely
different weather scenario. Learning of this, one meteorologist remarked that one flap of a seagull’s
wings could change the course of weather forever. In later speeches and papers, Lorenz adopted
the more poetic and now familiar phrase “butterfly effect” to describe this characteristic of chaotic
systems.

Hopefully, you are now convinced (i) that many strange and wonderful things are possible even with
quite simple dynamic systems and (ii) that you should focus on modeling and leave the task of solving
to Mathematica (or some similar system).

## 6.2 Systems of Linear Differential Equations

Much of the remainder of this chapter will focus on the system of first-order, homogeneous differential
equations given by
ẋ = Ax (6.3)

where A is an n × n matrix and both ẋ and x are elements of Rⁿ. Why focus on this system? It’s important in its own right, and it can be used to understand the behavior of many other systems.

Suppose, for example, that we have the linear second-order system

d²x/dt² + a(dx/dt) + bx = 0

then we can introduce new variables x1 = x and x2 = dx/dt, note that dx2/dt = d²x/dt², and get the
first-order linear system

ẋ1 = x2
ẋ2 = −bx1 − ax2

so that Equation 6.3 holds with

A = [  0   1 ]
    [ −b  −a ]

Higher order linear systems can always be reduced to first-order linear systems by introducing new
variables in this way.
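The reduction is also a convenient way to solve such equations numerically. A Python sketch (the RK4 stepper is my own; taking a = 0 and b = 1 so the equation is ẍ + x = 0, whose solution through x(0) = 1, ẋ(0) = 0 is cos t, chosen just so there is a known answer to compare against):

```python
import math

a, b = 0.0, 1.0          # xddot + a xdot + b x = 0  becomes  xddot + x = 0
A = [[0.0, 1.0],         # first-order form with (x1, x2) = (x, xdot)
     [-b, -a]]

def deriv(s):
    return [A[0][0]*s[0] + A[0][1]*s[1],
            A[1][0]*s[0] + A[1][1]*s[1]]

def rk4(s, dt):
    k1 = deriv(s)
    k2 = deriv([s[i] + dt/2*k1[i] for i in range(2)])
    k3 = deriv([s[i] + dt/2*k2[i] for i in range(2)])
    k4 = deriv([s[i] + dt*k3[i] for i in range(2)])
    return [s[i] + dt/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(2)]

s, dt, t = [1.0, 0.0], 0.001, 0.0    # x(0) = 1, xdot(0) = 0
while t < math.pi - 1e-9:
    s = rk4(s, dt)
    t += dt
print(s[0], math.cos(t))             # the two agree
```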

Suppose, on the other hand, that we have the non-homogeneous first-order system

ẏ = Ay + b

Assuming that A is invertible, we can solve Ay + b = 0 for the rest-point y ∗ = −A−1 b. Now define
the new variables x ≡ y − y ∗ to “translate the origin” to y ∗ , note that ẋ = ẏ and y = x − A−1 b and
substitute to obtain
ẋ = A(x − A−1 b) + b = Ax − b + b = Ax

and once again Equation 6.3 holds.

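A quick numerical check of this translation (plain Python; the particular A and b below are arbitrary choices of mine, not from the text):

```python
# Check the rest point y* = -A^{-1} b for a 2x2 example.
A = [[-2.0, 1.0],
     [0.0, -1.0]]
b = [3.0, 2.0]

det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[ A[1][1]/det, -A[0][1]/det],
        [-A[1][0]/det,  A[0][0]/det]]

ystar = [-(Ainv[0][0]*b[0] + Ainv[0][1]*b[1]),
         -(Ainv[1][0]*b[0] + Ainv[1][1]*b[1])]

residual = [A[0][0]*ystar[0] + A[0][1]*ystar[1] + b[0],
            A[1][0]*ystar[0] + A[1][1]*ystar[1] + b[1]]
print(ystar, residual)   # residual is (0, 0): ydot vanishes at y*
```

Since Ay∗ + b = 0, the translated system ẋ = Ax has its rest point at the origin, as claimed.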
Finally suppose that we have the non-linear, first-order system

ẋ1 = f¹(x1, x2, . . . , xn)
ẋ2 = f²(x1, x2, . . . , xn)
  ⋮
ẋn = fⁿ(x1, x2, . . . , xn)

or, more succinctly, ẋ = f(x). Supposing that f is differentiable, we can let Ax̂ ≡ fx(x̂) where fx(x̂), the Jacobian of f evaluated at x̂, has ∂fⁱ(x̂)/∂xj as its ij element, and then examine the linear system

ẋ = Ax̂ x

to understand the behavior of the non-linear system near x̂.
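As a concrete illustration (outside Mathematica), the sketch below computes a finite-difference Jacobian of the predator and prey system at its rest point (1, 1) and then the characteristic roots of the resulting 2 × 2 matrix via its trace and determinant; the helper names are mine:

```python
import cmath

# Predator and prey with all parameters one: f(x, y) = ((1-y)x, (x-1)y)
def f(x, y):
    return ((1 - y)*x, (x - 1)*y)

def jacobian(f, x, y, h=1e-6):
    """Central-difference Jacobian of a planar map at (x, y)."""
    fx = [(f(x + h, y)[i] - f(x - h, y)[i]) / (2*h) for i in range(2)]
    fy = [(f(x, y + h)[i] - f(x, y - h)[i]) / (2*h) for i in range(2)]
    return [[fx[0], fy[0]],
            [fx[1], fy[1]]]

J = jacobian(f, 1.0, 1.0)              # analytically [[0, -1], [1, 0]]
tr = J[0][0] + J[1][1]
det = J[0][0]*J[1][1] - J[0][1]*J[1][0]
disc = cmath.sqrt(tr*tr - 4*det)
roots = ((tr + disc)/2, (tr - disc)/2)
print(J, roots)
```

The roots come out purely imaginary, which matches the closed orbits seen in Figure 6.3.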

Better still, we need not consider all A matrices when studying Equation 6.3 since similar matrices
represent only a change of basis and must, therefore, exhibit the same underlying behavior. Further,
for any class of similar matrices, there is a particularly simple, “diagonal” form known as the Jordan
canonical form. Here the matrix which is similar to A takes the form:

[ D            ]
[ C  D         ]
[    ⋱  ⋱      ]
[      C  D    ]
[         C  D ]

where the C and D blocks are 2 × 2 matrices. Each of the D blocks takes one of the following three forms

[  a  b ]
[ −b  a ]        (6.4)

[ c  0 ]
[ 0  d ]        (6.5)

[ e  0 ]
[ 1  e ]        (6.6)

where a ± bi is a pair of complex conjugate characteristic roots of A, c and d are real characteristic
roots of A and e is a repeated real characteristic root of A.

Each of the C blocks is either

[ 1  0 ]
[ 0  1 ]

or

[ 0  0 ]
[ 0  0 ]

with the former case holding only if the adjacent D blocks correspond to repeated characteristic roots.

Suppose, for example, that A is a 4 × 4 matrix with characteristic roots 2, 2, 1 − i and 1 + i. The Jordan canonical form would then be

[ 2  0  0   0 ]
[ 1  2  0   0 ]
[ 0  0  1  −1 ]        (6.7)
[ 0  0  1   1 ]

It should be noted that the elements of the Jordan canonical form discussed thus far are all real
numbers. In this real version of the Jordan canonical form complex roots are represented by blocks
of the sort given in Equation 6.4 on the previous page in which the real part of the root is given by
the diagonal term and the imaginary part by the off diagonal terms. If we allow the matrix to have
complex numbers then in the complex version of the Jordan canonical form blocks like
" #
a − bi 0
0 a + bi

replace those in Equation 6.4 on the previous page. It should also be noted that in some discussions the C blocks corresponding to repeated roots appear above the diagonal instead of below. The classic reference for this is Hirsch and Smale.
 Problem 6.5. In Mathematica, the command JordanDecomposition[M] returns two matrices, S and
J, which satisfy S⁻¹MS = J. Here J is the complex version of the Jordan canonical form and S is
the similarity transformation for obtaining it from the matrix M. Both matrices can be conveniently
displayed with the command
Map[MatrixForm, {S, J} = JordanDecomposition[M]]
What is the result of applying this to the matrix in Equation 6.7?

Since the system de-couples into 2 × 2 blocks in the Jordan form, we can effectively limit attention to
the simple forms represented in Equation 6.4 to Equation 6.6 on the previous page.

## 6.2.1 Complex Roots

Here we consider the system corresponding to Equation 6.4 on the previous page:

[ ẋ1 ]   [  a  b ] [ x1 ]
[ ẋ2 ] = [ −b  a ] [ x2 ]

with solution

[ x1(t) ]        [  cos(bt)  sin(bt) ] [ x1(0) ]
[ x2(t) ] = e^at [ −sin(bt)  cos(bt) ] [ x2(0) ]        (6.8)
Note that limt→∞ (x1 (t), x2 (t)) equals (0, 0) iff a < 0. Thus the system is stable iff a, the real part of
the pair of complex conjugate roots, is negative. The vector field and trajectories for this system for
various interesting cases are illustrated below:
 Problem 6.6. Use Mathematica’s DSolve to confirm Equation 6.8.
 Problem 6.7. How would these plots change with changes in the magnitude but not the sign of a?
 Problem 6.8. How would these plots change with changes in the sign and magnitude of b?
♦ Query 6.1. What is the best linear approximation of the predator and prey system at the rest point,
(x, y) = (1, 1)? Describe the behavior of the solutions to this linearization.
♦ Query 6.2. What is the best linear approximation of the system in Problem 6.2 on page 78 at the rest
point, (x, y) = (0, 0)? What about the rest point at −2, −2? Describe the behavior of the solutions to
both linearizations.
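Problem 6.6 asks for a symbolic confirmation; a numerical one is also easy outside Mathematica. The Python sketch below (the RK4 stepper and the choice a = 0.4, b = 1, matching Figure 6.8, are mine) integrates the system and compares the result with the closed form in Equation 6.8:

```python
import math

a, b = 0.4, 1.0   # the unstable case illustrated in Figure 6.8

def closed_form(t, x10, x20):
    """Equation 6.8: growth factor e^{at} times a rotation."""
    e = math.exp(a*t)
    return (e*( math.cos(b*t)*x10 + math.sin(b*t)*x20),
            e*(-math.cos(b*t)*x20*0 - math.sin(b*t)*x10 + math.cos(b*t)*x20))

def deriv(s):
    return (a*s[0] + b*s[1], -b*s[0] + a*s[1])

def rk4(s, dt):
    k1 = deriv(s)
    k2 = deriv((s[0] + dt/2*k1[0], s[1] + dt/2*k1[1]))
    k3 = deriv((s[0] + dt/2*k2[0], s[1] + dt/2*k2[1]))
    k4 = deriv((s[0] + dt*k3[0], s[1] + dt*k3[1]))
    return (s[0] + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            s[1] + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

s = (1.0, 0.0)
for _ in range(1000):     # integrate to t = 1
    s = rk4(s, 0.001)
print(s, closed_form(1.0, 1.0, 0.0))   # the two agree
```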

Figure 6.8: Unstable: complex roots with positive real part (a = 0.4 and b = 1)

Figure 6.9: Stable: Complex roots with negative real parts (a = −0.4 and b = 1)


Here we consider the system corresponding to Equation 6.5 on page 82:

[ ẋ1 ]   [ c  0 ] [ x1 ]
[ ẋ2 ] = [ 0  d ] [ x2 ]

with solution

[ x1(t) ]   [ e^ct    0   ] [ x1(0) ]
[ x2(t) ] = [  0    e^dt  ] [ x2(0) ]        (6.9)

Note that limt→∞ (x1(t), x2(t)) equals (0, 0) iff c, d < 0. Thus the system is stable iff both of the (real)
roots are negative. The vector field and trajectories for this system for various interesting cases are
illustrated below.
Figure 6.11: Stable: negative and unequal real roots (c = −0.5 and d = −1)

 Problem 6.9. Use Mathematica’s DSolve to confirm Equation 6.9.

 Problem 6.10. How would these plots change with changes in the magnitudes of c and d?


Figure 6.12: Stable: negative and equal real roots (c = −1 and d = −1)

Figure 6.13: Saddle: both positive and negative real roots (c = −1 and d = 1)

 Problem 6.11. Is the system described in Figure 6.13 on the previous page stable? Is it unstable?
What is the set of initial conditions from which the system converges to the rest point? Is this set
generic in R2 ? What is the set of initial conditions from which the system does not converge to the
rest point? Is this set generic in R2 ?

" # " #" #
ẋ1 r 0 x1
=
ẋ2 1 r x2

## with solution " # " #" #

x1 (t) rt 1 0 x1 (0)
=e (6.10)
x2 (t) t 1 x2 (0)

Note that whether limt→∞ (x1(t), x2(t)) equals (0, 0) depends upon limt→∞ t·e^rt·x1(0). If r < 0 this can be written as

limt→∞ t / e^|r|t

which is of the indeterminate form ∞/∞ and l’Hospital’s rule can therefore be applied. Differentiating numerator and denominator then gives

limt→∞ t / e^|r|t = limt→∞ 1 / (|r|·e^|r|t) = 0

Thus the system is stable iff the repeated real root r is negative. The vector field and trajectories for
this system for various interesting cases are illustrated below.

 Problem 6.12. Use Mathematica’s DSolve to confirm Equation 6.10.

## 6.2.4 Summary

Solutions to all of the generic 2 × 2 systems that make up the diagonal blocks in the Jordan canonical
form have now been displayed. The key facts are the following:

1. There is a unique rest-point at the origin.

2. If the real parts of all of the characteristic roots are negative, then all paths converge to the origin.
3. If the real parts of all of the characteristic roots are positive then all paths other than those
starting at the origin diverge from the origin.
4. Real parts with opposite signs are associated with “saddles” where some paths converge to the
origin and others diverge.
5. Complex roots are associated with orbits. When the real part is negative, the orbit “decays” and paths converge on the origin. When the real part is positive, the orbit “explodes”. When the real part is zero, i.e., the root is a pure imaginary number, then the orbit is itself a stable set.

## 6.3 Liapunov’s Method

For linear systems, the question of stability is easily resolved. Just examine the characteristic roots.
For non-linear systems, issues are more complicated. So far we have no way of determining stability
except by actually finding all the solutions to ẋ = f (x) which may be difficult if not impossible.
Fortunately, another, indirect method is possible.
Theorem 31 (Liapunov). Suppose x ∗ is an equilibrium for ẋ = f (x) and that there exists a continuous
map, v, from U , a neighborhood of x ∗ , into R which is differentiable on U \ {x ∗ } and satisfies

1. v(x∗) = 0 and x ≠ x∗ implies v(x) > 0

2. v̇ ≤ 0 for x ∈ U \ {x ∗ }
Then x ∗ is stable and v is called a Liapunov function. Furthermore, if
3. v̇ < 0 for x ∈ U \ {x ∗ }
Then x ∗ is asymptotically stable and v is called a strict Liapunov function.

It should be emphasized that Theorem 31 can be applied without solving Equation 6.1 on page 76. On
the other hand, there is no direct method for finding Liapunov functions — often trial and error and
considerable ingenuity is required.
Here’s an example of the use of this method. The van der Pol equation is a second-order differential equation that can be reduced to the following system of first-order, non-linear equations:

ẋ1 = x2
ẋ2 = −x1 + a(x2³/3 − x2)    where a > 0

Is the equilibrium at x∗ = (0, 0) stable? For this system we try (inspired guess) the function v = (1/2)(x1² + x2²). Since this is a positive-definite quadratic form, the requirement that v(x) > 0 for x ≠ x∗ = 0 is satisfied. It remains to show that v̇ < 0:

v̇ = x1ẋ1 + x2ẋ2
  = x1x2 + x2(−x1 + a(x2³/3 − x2))
  = a(x2⁴/3 − x2²)
  = −ax2²(3 − x2²)/3

which is clearly negative for |x| < √3. Thus v is in fact a strict Liapunov function and x∗, therefore, is asymptotically stable for |x| < √3 ≈ 1.73.
For completeness, the vector field and solution plots are illustrated in Figure 6.15.


Figure 6.15: The van der Pol equation (a = 1)
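The Liapunov argument can also be corroborated numerically (outside Mathematica): integrate the system from a point inside the region |x| < √3 and watch v decline along the path. A Python sketch with a = 1, the case plotted above (the RK4 stepper and starting point are my own choices):

```python
a = 1.0   # the case plotted in Figure 6.15

def deriv(s):
    x1, x2 = s
    return (x2, -x1 + a*(x2**3/3 - x2))

def rk4(s, dt):
    k1 = deriv(s)
    k2 = deriv((s[0] + dt/2*k1[0], s[1] + dt/2*k1[1]))
    k3 = deriv((s[0] + dt/2*k2[0], s[1] + dt/2*k2[1]))
    k4 = deriv((s[0] + dt*k3[0], s[1] + dt*k3[1]))
    return (s[0] + dt/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            s[1] + dt/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

v = lambda s: 0.5*(s[0]**2 + s[1]**2)   # the Liapunov function from the text
s = (0.5, 0.5)
v_start = v(s)
for _ in range(20000):        # integrate to t = 20
    s = rk4(s, 0.001)
print(v_start, v(s))          # v falls toward zero along the path
```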

 Problem 6.13. Is v = x − log(x) + y − log(y) + k a Liapunov function for the predator and prey
system

ẋ = (1 − y)x
ẏ = (x − 1)y

Notation

× “the cartesian product”

If A = {a, b} and B = {c, d} then A × B = {(a, c), (a, d), (b, c), (b, d)}. Similarly, R² = R × R. In general, A ≡ ∏_{j=1}^m Aj ≡ A1 × A2 × · · · × Am is the set of m-tuples of the form (a1, a2, . . . , am) where a1 ∈ A1, a2 ∈ A2, and so forth.

⊂ “is a subset of”

R+ ⊂ R. Similarly ∩ “intersect” and ∪ “union”.

Aᶜ the complement of A

\ “set minus”

A \ B = A ∩ Bᶜ

∈ “is an element of” or “belongs to”

5/7 ∈ R+, −3 ∉ R+

∀ “for all” or “for every”

x + 1 > x ∀x ∈ R

∃ “there exists” or “for at least one”

If x ∈ R then ∃y ∈ R such that y > x

≡ “is defined to be”

A ∩ B ≡ {x | x ∈ A and x ∈ B}

⇒ “implies”

ẋ the derivative of x(t) with respect to time

Using Mathematica

Basics

I am preparing this document using the notebook interface for the Mac version of Mathematica so the keyboard shortcuts I describe may be a bit different if you’re using either the Windows or Unix versions. Though Mathematica has both a notebook or graphics interface and a character-based interface, you will most likely be using the notebook interface and this differs little between operating systems. When you save your notebook you’ll get a standard ascii notebook file that is exactly the same no matter which operating system you’re using. You could email this file to a friend, for example, and he/she would be able to use it no matter which operating systems the two of you are using. You could also select File/SaveasSpecial/HTML or File/SaveasSpecial/TEX, as I will do with this document to display it on the internet.
When you start Mathematica, the screen will be blank save for a horizontal line across the top of
the screen. This line represents the insertion point for a "cell". A cell can contain text, such as
this paragraph, a headline, such as "Introduction to Mathematica" at the beginning of this document,
input to be processed by Mathematica, output returned by Mathematica and so forth. The default is
input but you can change this while the horizontal line is visible by selecting from the Format/Style menu. Once you start typing, the line will disappear and the characters you type will appear together with a "bracket" in the right-hand margin. You can click on this bracket and make another selection from the Format/Style menu. If you move the cursor to the end (or beginning) of a cell, a new insertion line will appear where you can, once again, select a style and enter new material. You can also, of course, return to any existing cell and make any changes you like.

Input Expressions

Simply enter an expression, say (2+3)∧ 3, in an input cell and press the shift key and the enter key at
the same time. Mathematica will process your input, label your input cell with "In[#]" and return its
output in a cell labeled "Out[#]" where "#" is the number Mathematica assigns to this matched pair
of cells.
In: (2 + 3) ∧ 3
Out: 125
If you edit your input cell, "In[#]" will disappear to remind you that your new input has not yet been
processed.
The operators are what you would expect, with * or space for multiplication and ∧ for exponentiation.
Watch out for the following common sources of consternation:

Spaces

3x will be interpreted as 3 times x and not as a variable named 3x. On the other hand, x3 will be
interpreted as a variable named x3 and not as the product of x times 3. If what you want is x times 3
and you want to put the x first, then you must put a * or a space between the x and the 3.

Cases

Mathematica is case sensitive, e.g., y and Y are the names of two different variables. Mathematica’s
built in variables, like Pi, and functions, like Log, always have at least the first letter capitalized and
are usually not abbreviated but are spelled out in full. You can avoid conflicts by not capitalizing the
first letter of your own variables and functions.

Delimiters

Mathematica uses the delimiters ( ), [ ], { } and [[ ]] for completely different purposes. Parentheses
are used only for grouping, e.g, (x+y)∧ 2. Square brackets are used only to provide the argument(s) to
a function, e.g., Log. Curly braces are used only to delimit the elements of a list, e.g., mylist={x,y}
defines mylist to be a list containing two elements - the variables x and y. Double square brackets are used only to refer to elements of a list, e.g., mylist[[2]] refers to the second element of mylist - the variable y.

## Symbols and Numbers

You are not limited to numeric entries, Mathematica will just as happily process symbolic expressions.
In: x + y ∧ 3

Out: x + y³

In: Expand[(x + y) ∧ 3]

Out: x³ + 3x²y + 3xy² + y³
When processing numbers, Mathematica’s default is to treat them as exact:
In: 1/3 + 2/5 + 3/7

Out: 122/105
If you want a decimal approximation you can either ask for it explicitly using the built-in numerical
value function, N:
In: N [1/3 + 2/5 + 3/7]
Out: 1.1619
or you can enter a decimal in your input:
In: 1/3 + .4 + 3/7
Out: 1.1619
Mathematica can compute very large numbers

In: 200!

Out: 78865786736479050355236321393218506229513597768717326329474253
3244359449963403342920304284011984623904177212138919638830257
6427902426371050619266249528299311134628572707633172373969889
4392244562145166424025403329186413122742829485327752424240757
3903240321257405579568660226031904170324062351700858796178922
2227896237038973747200000000000000000000000000000000000000000
00000000

or very precise approximations such as the value of Pi carried out to 400 decimal places

In: N[Pi, 400]

Out: 3.141592653589793238462643383279502884197169399375105820974944
5923078164062862089986280348253421170679821480865132823066470
9384460955058223172535940812848111745028410270193852110555964
4622948954930381964428810975665933446128475648233786783165271
2019091456485669234603486104543266482133936072602491412737245
8700660631558817488152092096282925409171536436789259036001133
0530548820466521384146951941511609

or complicated expressions

In: Sum[1/j, {j, 1, 100}]

Out: 14466636279520351160221518043104131447711 / 2788815009188499086581352357412492142272

Note the use of {j, 1, 100} to represent a list of values for j going from 1 to 100. This "range" operator
is widely used. For another example, consider the 2 dimensional plot

In: Plot[...]

Out: −Graphics−

In: Plot3D[x ∧ 2 − y ∧ 2, {x, −1, 1}, {y, −1, 1}]

Out: −SurfaceGraphics−
where both x and y go from -1 to +1.
By the way, if you forget how a function works, just put the cursor after the name of the function, e.g., after the Plot3D, click on Help and then Find Selected Function and the complete documentation on the relevant function will pop up. Mathematica’s on-line help system is the best I’ve seen.

## Using "%" to refer to the previous expression

The percent sign, %, is shorthand for the results of the previous calculation, e.g.,

In: Expand[(x + y) ∧ 3]

Out: x³ + 3x²y + 3xy² + y³

In: Factor[%]

Out: (x + y)³

## Alternatively, you can name an expression using "="

In: myexpression = x ∧ 2 − 8x + 15
Out: 15 − 8x + x 2
and later refer to the equation by name
In: mysolutions = Solve [myexpression == 0]
Out: {{x → 3} , {x → 5}}
Note the use of the single equality to name or define an expression and the use of the double equality
for "equals". Note also that the output of "Solve" is a list of the two solutions to the equation. To
refer, say, to the second solution you would use mysolutions[].


## Using "/." to substitute values into an expression

You can substitute values into an expression using the substitution operator "/.",e.g.,
In: x ∧ 2/.x → 3
Out: 9
where the rightarrow is gotten by typing "-" and then ">", or
In: (a + b) ∧ 2 /. a + b → x − y

Out: (x − y)²

or
In: myexpression/.mysolutions
Out: {0, 0}
In the last case, the list of "mysolutions", namely x->3 and x->5, are substituted one at a time into "myexpression". The fact that the result in each case is zero confirms that 3 and 5 are both solutions to myexpression==0.

## You can define your own functions using ":=".

In: f [z_] :=z∧ 2 − 8z + 15
Note the use of "z_" on the left to refer to "z" on the right though you could equally well use "a_" and
"a" or any other such pair of symbols. Now f can be used exactly as you would any other function,
e.g., to confirm visually "mysolutions".
In: Plot [f [x] , {x, 0, 8} , PlotStyle → RGBColor [1, 0, 0]]


Out: −Graphics−
You can also define piecewise functions

In: g[x_] := 1 − x /; x ≤ 1
    g[x_] := x − 1 /; x > 1

In: Plot[g[x], {x, −4, 4}]

Out: −Graphics−

## Using "Simplify" to simplify an expression

In: mess = (3 + 7x + 8x ∧ 2 + 5x ∧ 3 + x ∧ 4) / (3 + 10x + 18x ∧ 2 + 14x ∧ 3 + 3x ∧ 4)

Out: (3 + 7x + 8x² + 5x³ + x⁴) / (3 + 10x + 18x² + 14x³ + 3x⁴)

In: Simplify[mess]

Out: (1 + x + x²) / (1 + 2x + 3x²)

Forgetting

Mathematica never forgets. If you ever enter x=3, even by mistake, then the variable named x will
thereafter be replaced by the number 3. Deleting the cell containing the definition won’t help. To
remove the definition from memory you need to enter Clear[x]:
In: x=3
Out: 3
In: x
Out: 3
In: Clear [x]
In: x
Out: x
You can remove all of your own definitions either with the following magic:
In: Clear["Global`*"]
In: myexpression

Out: myexpression
or by choosing "Kernel / Quit Kernel / Local" from the drop-down menu. You can then make any
changes you like and choose "Kernel / Evaluation /Evaluate Notebook" from the drop-down menu to
redo all the calculations in your notebook.

## Using "D" to differentiate

Taking the derivative of an expression with respect to a variable is such a common operation that it is
one of the few Mathematica commands that is abbreviated. Here are some examples.
In: D[3x ∧ 2, x]

Out: 6x
In: D [f [x] , x]
Out: f′[x]
The partial derivative of a function of x and y with respect to y
In: D[3x ∧ 2 + 2x y + 2y ∧ 3, y]

Out: 2x + 6y 2
Note: In the expression "2 times x times y" , there’s a space between the x and the y, otherwise the
expression would be interpreted as 2 times the variable named xy.
The generalization of the "product rule" to three functions

In: D[a[x] b[x] c[x], x]

Out: b[x] c[x] a′[x] + a[x] c[x] b′[x] + a[x] b[x] c′[x]

The generalization of the "chain rule" to three functions

In: D[a[b[c[x]]], x]

Out: a′[b[c[x]]] b′[c[x]] c′[x]

Using "Integrate"

## One that even I could do without much effort

In: Integrate[x ∧ 3, x]

Out: x⁴/4
and one that I couldn’t
In: Integrate[Sin[x] ∧ 2, x]

Out: x/2 − (1/4) Sin[2x]
Is the last one correct?
In: D [%, x]

Out: 1/2 − (1/2) Cos[2x]
In: Simplify [%]
Out: Sin[x]²

Yes. Note that for indefinite integrals, Mathematica does not display the constant of integration.
For a definite integral give the range of integration
In: Integrate[x ∧ 3, {x, 0, 1}]

Out: 1/4

Using "Solve"

 
In: Solve[...]

Out: {{x → 0}, {x → 1/8}}

or a more complicated one:

In: e = x ∧ 3 − 2x + 9;
In: Solve[e == 0] //TableForm
1/3  1 √ 1/3
2 (81− 6465)

2
x → −2 3 81− 6465
√ − 32/3
( )
1/3 √  √ 1/3
√  (1−i 3) 12 (81− 6465)

Out: x → 1 + i 3 2

3(81− 6465)
+ 232/3
1/3 √ 1 √ 1/3
√  (1+i 3) 2 (81− 6465)

2
x → 1 − i 3 3 81− 6465√ + 232/3
( )
Note that we get one real solution, the first, and two complex solutions. Let’s see whether the first solution works:

In: e /. %[[1]]

Out: [a complicated radical expression, omitted here]

Is this really zero?

In: Simplify [%]
Out: 0

To find the maximum of a function of a single variable:

In: payoff = 7x − 4x ∧ 2;
In: Plot [payoff, {x, 0, 2}]
Out: −Graphics−
In: Solve[D[payoff, x] == 0]

Out: {{x → 7/8}}

In: payoff /. %[[1]]

Out: 49/16

In: % // N

Out: 3.0625

To find the maximum of a function of two variables

In: payoff = 7x + 5y − x ∧ 2 − y ∧ 2;
  
In: Plot3D[payoff, {x, 0, 6}, {y, 0, 6}, ViewPoint -> {1.5, −2.8, 1.1}, BoxRatios → {1, 1, .8}]

Out: −SurfaceGraphics−
Note that you can interactively set the "ViewPoint" by putting your cursor at that point in the expression, selecting "Input / 3D View Point Selector" from the drop-down menu and then clicking on "Paste" when you’re happy with the result.
   
In: Solve[{D[payoff, x] == 0, D[payoff, y] == 0}] //TableForm

Out: x → 7/2        y → 5/2

Want to know more about Mathematica? Cheung et al. is a good place to start and Ruskeepää will make you an expert. For a reference, look no further than Mathematica’s superb built-in help system. Enter a command and then press F1 for information about the command and examples of its use, or press F1 when the cursor is not on a command to bring up the entire help system.

Bibliography

C-K. Cheung, G. E. Keough, Robert H. Gross, and Charles Landraitis. Getting Started with Mathematica.
Wiley, 2nd edition, 2005.

Morris Hirsch and Stephen Smale. Differential Equations, Dynamical Systems and Linear Algebra. Academic Press, 1974.

Heikki Ruskeepää. Mathematica Navigator. Academic Press, 2nd edition, 2004.

George F. Simmons. Introduction to Topology and Modern Analysis. McGraw Hill, 1963.

List of Problems

Chapter 1 1
1.1: 2 1.2: 2 1.3: 3 1.4: 3 1.5: 3 1.6: 4 ♦1.1: 4 1.7: 4 1.8: 4 1.9: 5 1.10: 5
1.11: 6 ♦1.2: 7 1.12: 7 1.13: 7 1.14: 7 1.15: 7 1.16: 8 1.17: 8 1.18: 8 1.19: 8
1.20: 9 1.21: 9 1.22: 9 1.23: 10 1.24: 10 1.25: 10 1.26: 10 1.27: 10 ♦1.3: 10
1.28: 11

Chapter 2 15
♦2.1: 17 2.1: 18 ♦2.2: 19 2.2: 20 2.3: 20 2.4: 20 2.5: 20 ♦2.3: 21 2.6: 22 2.7: 25
2.8: 25 2.9: 25 2.10: 25 2.11: 26 2.12: 27 2.13: 27 2.14: 28 2.15: 28 2.16: 30
2.17: 30 2.18: 30 2.19: 30 ♦2.4: 32 ♦2.5: 33 ♦2.6: 33 ♦2.7: 33

Chapter 3 35
3.1: 36 3.2: 38 3.3: 39 3.4: 39 3.5: 39 3.6: 40 3.7: 40 3.8: 40 3.9: 40
3.10: 40 3.11: 41 3.12: 41 3.13: 43 3.14: 44 3.15: 44 3.16: 44 3.17: 44 ♦3.1: 45
♦3.2: 45 3.18: 45 3.19: 45

Chapter 4 47
4.1: 48 4.2: 48 4.3: 49 4.4: 49 4.5: 49 4.6: 50 4.7: 52 4.8: 53 4.9: 53
4.10: 54 4.11: 54 4.12: 54 4.13: 55 4.14: 55 4.15: 55 4.16: 55

Chapter 5 57
5.1: 59 5.2: 61 5.3: 62 5.4: 63 5.5: 63 5.6: 65 5.7: 65 5.8: 66 5.9: 66
5.10: 70 5.11: 73 5.12: 73

Chapter 6 75
6.1: 78 6.2: 78 6.3: 78 6.4: 79 6.5: 83 6.6: 83 6.7: 83 6.8: 83 ♦6.1: 83 ♦6.2: 83
6.9: 85 6.10: 85 6.11: 87 6.12: 87 6.13: 89

Chapter 93

Index

T1-space, 44
T2-space, 44
σ-algebra, 45

affine
  space, 8
  spanned by vectors, 8
  subspace, 8
arbitrage, 32
arg max, 58
arg min, 58
asymptotically stable, 76, 88

Bolzano-Weierstrass property, 45
bordered Hessian, 70
bounded, 39
bounding, 11
butterfly effect, 81

canonical, 20
Cantor set, 38
cardinal number of the continuum, 38
cardinality, 36
Cartesian product, 1
Cauchy-Schwarz inequality, 4
characteristic equation, 29
characteristic root, 29
characteristic vector, 29
closed, 44
closed set, 41
compact space, 45
comparative static effects, 67
complementary slackness condition, 62
complete, 3
component, 2
concave, 53
  strictly, 54
constraint qualifications, 60
constraints, 59
continuous at a point, 42
continuous mapping, 43
convergent subsequence, 42
convex, 54
  strictly, 54
convex hull, 9
convex set, 9
countable set, 36
Cramer’s Rule, 28
critical points, 48
curvature, 52

derivative
  first, 48
  second, 52
determinant, 26
diameter, 39
difference equations, 76
differentiable, 48
  at a point, 48
  twice, 52
differential
  first, 48
  second, 52
differential equations, 76
directed distance, 11
discrete space, 43
dual space, 17
dynamic systems
  continuous time, 75
  discrete, 75

eigenvalues, 29
eigenvectors, 29
endogenous variables, 67
equilibria, 76
exogenous variables, 67

Farkas’ Lemma, 31
feasible set, 68
first order necessary condition, 58
flow, 75
fundamental equation of comparative statics, 70

generic, 66

generic property, 45
Gram’s Theorem, 25
graph, 49

half spaces, 11
Hausdorff space, 44
Heine-Borel Theorem, 45
Hessian, 52
  bordered, 56
homogeneous of degree ρ, 50
hyperplane, 11

identity matrix, 24
initial condition, 75
inverse image, 19
invertible, 20

Jacobian, 49, 82
Jordan canonical form, 82
  complex, 83
  real, 83

Kuhn-Tucker Conditions, 61

Lagrangian Function, 61
Lagrangian multiplier, 62
latent roots, 29
latent vectors, 29
level
  contour, 49
level set, 55
Liapunov function, 88
  strict, 88
limit point, 41, 42
linear (sub)space
  basis, 7
  dimension, 7
  projection, 7
  space, 6
  spanned by vectors, 7
  subspace, 6
linear equations
  homogeneous, 21
  non-homogeneous, 21
linear programming, 66
linear programming problem, 60
linear transformation, 18
  domain, 18
  image, 19
  range, 18
  real-valued, 16
Lotka-Volterra, 76

map, 75
Mathematica
  Assuming, 20
  ChebyshevDistance, 40
  Det, 27
  dot product, 4
  Eigenvalues, 30
  Eigenvectors, 30
  Element, 20
  EuclideanDistance, 40
  Inverse, 25–27
  ManhattanDistance, 40
  MatrixForm, 27
  MatrixRank, 22
  Norm, 5, 20
  NullSpace, 22
  Reals, 20
  scalar product, 5
  Table, 4
  Transpose, 26
  vector as a list, 4
  vector sum, 5
matrix
  similar, 20
matrix-vector, 19
matrix-vector product
  column view, 21
  row view, 21
measurable, 45
measurable sets, 45
measure space, 45
metric, 39
metric space, 39
  complete, 42
Minkowski’s Theorem, 12
monotone increasing (decreasing) function, 55

neighborhood, 43
norm, 39
normal, 10
null space, 22
numbers
  cardinal, 36
  rational, 36
numerically equivalent, 36

objective function, 58
open cover, 45
open set, 41
open sets, 43
open sphere, 40
ordinary least squares regression, 25

origin, 2
orthant, 4
orthogonal, 4

parameters, 62
path, 76
point, 2
point-to-set mapping, 67
predator-prey, 76
principal minor, 56

quadratic form
  indefinite, 50
  negative definite, 50
  negative semi-definite, 50
  positive definite, 50
  positive semi-definite, 50
quasi-concave, 55
  strictly, 55
quasi-convex, 55
  strictly, 55

rank, 19
real n-space, 1
reduced form, 67
relation, 2
residual of the projection, 7

second order necessary condition, 58
separating, 12
sequence, 41
  Cauchy, 42
  convergent, 42
sequential compactness property, 45
set
  dense, 44
single-peaked, 55
smooth, 53
solution set, 10
stability, 76
stable, 76, 88
state space, 75
subcover, 45
sufficient conditions, 59
supporting, 11
supremum, 39

Taylor series approximation, 53
testable hypothesis, 59
topological space, 43
topology, 43
  discrete, 43
  induced by the metric, 43
  relative, 43
  stronger, 43
  trivial, 43
  usual, 43
  weaker, 43
trace, 30
trajectory, 76
transitive, 3
triangle inequality, 39

van der Pol equation, 88
vector, 2
  affine combination, 8
  convex combination, 9
  dot product ·, 4
  equality =, 2
  included angle, 4
  inequality >, 2
  inner product ·, 4
  linear combination, 6
  linearly dependent, 6
  linearly independent, 6
  norm ‖ ‖, 3
  scalar product, 5
  strict inequality, 2
  sum +, 4
  weak inequality ≥, 2

Weierstrass Theorem, 45

Colophon

This book was prepared using Apple computers (a MacBook and an iMac) running OS X. Graphs were prepared using either OmniGraffle Professional or Mathematica or a combination of the two. The text was prepared using the wonderful editor TextMate with LaTeX's book class and Lucida Bright fonts.
