
for Economists

Daniel A. Graham


Contents

Contents iii

Preface vii

1 Linear Algebra 1

1.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2 Matrix Algebra 15

2.9 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


3 Topology 35

3.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.3 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.4 Sigma Algebras and Measure Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4 Calculus 47

4.1 The First Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.2 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.3 The Second Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.4 Convex and Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5 Optimization 57

5.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5.2 The Well Posed Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.3 Comparative Statics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6 Dynamics 75

6.1 Dynamic Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

6.2 Systems of Linear Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . 81

6.3 Liapunov’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

Notation 91

Using Mathematica 93

Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Input Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

Symbols and Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

Using Prior Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

Commonly Used Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99


Bibliography 103

Index 106

Colophon 109


Preface

The attached represents the results of many years of effort to economize on the use of my scarce

mental resources — particularly memory. My goal has been to extract just those mathematical ideas

which are most important to Economics and to present them in a way that emphasizes the approach,

common to virtually all of mathematics, that begins with the phrase “let X be a non-empty set” and

goes on to add a pinch of this and a dash of that.

I believe that Mathematics is both beautiful and useful and, when viewed in the right way, not nearly as

complicated as some would have you believe. For me, the right way is to identify the links connecting

the ideas and, whenever possible, to embed them in a visual setting.

The reader should be aware of two aspects of these notes. First, intuition is emphasized. While “Prove

Theorem 7” might be a common format for exercises inside courses, “State an interesting proposition

and prove it” is far more common outside courses. Intuition is vital for such endeavors. Second, use

of the symbolic algebra program Mathematica is emphasized, for at least the following reasons:

• Mathematica is better at solving a wide variety of problems than you or I will ever be. Our

comparative advantage is in modeling, not solving.

• Mathematica lowers the marginal cost of asking “What if?” questions, thereby inducing us to ask

more of them. This is a very good thing. One of the best ways of formulating conjectures about

what might be true, for instance, is to examine many specific cases and this is a relatively cheap

endeavor with Mathematica.

• Mathematica encourages formulating solution plans and, in general, top-down thinking. After

all, with it to do the heavy lifting, all that’s left for us is to formulate the problem and plan the

steps. This too, is a very good thing.

Why Mathematica and not Maple, another popular symbolics program? While there are differences,

both are wonderful programs and it would be difficult to argue that either is better than the other. I’ve

used both and have a slight personal preference for Mathematica.

Dan Graham

Duke University


Chapter 1

Linear Algebra


The following is an informal review of that part of linear algebra which will be most important to

subsequent analysis. Please bear in mind that linear algebra is, perhaps, the single most important

tool in Economics and forms the basis for many other important areas of mathematics as well.

Recall that the Cartesian product of sets, e.g. “Capital Letters” × “Integers” × “Lower Case Letters”, is

itself a set composed of all ordered n-tuples of elements chosen from the respective sets, e.g., (G, 5, f ),

(F , 1, a) and so forth. Note that no multiplication is involved in forming this product. Now introduce

the set consisting of all real numbers, denoted R and called “the reals”, and real n-space is obtained

as the n-fold Cartesian product of the reals with itself:

Rⁿ ≡ R × · · · × R (n times) ≡ { (x₁, x₂, . . . , xₙ) | xᵢ ∈ R, i = 1, 2, . . . , n }

The origin, (0, 0, . . . , 0), will be written simply as 0 when no confusion should result. An

arbitrary element of this set, x ∈ Rn , is sometimes called a

point and sometimes called a vector and xi is called the ith

component of x. The existence of two terms for the same

thing is due, no doubt, to the fact that it is sometimes useful

to think of x = (x1 , x2 ), for example, as a point located at

x1 on the first axis and x2 on the second axis. Other times

it is useful to think of x = (x1 , x2 ) as a directed arrow or

vector with its tail at the origin, (0, 0), and its tip at the point

x = (x1 , x2 ).

Figure 1.1: Vectors in R²

See Figure 1.1. It is important to realize, on the other hand, that it is hardly ever useful to think of a

vector as a list of its coordinates. Vectors are objects and better regarded as such than as lists of their

components.

Recall that a relation (or binary relation), R, on a set S is a mapping from S × S to {True, False}, i.e.,

for every x, y ∈ S, xRy is either “True” or “False”. This is illustrated in Figure 1.2 for the relation >

on R. Note that points along the “45-degree line” where x = y map into “False”.

Suppose x, y ∈ Rⁿ. Several comparisons are possible between these vectors. The vector x is equal to the vector y when each component of x is equal to the corresponding component of y:

Definition 1.1. x = y iff xᵢ = yᵢ, i = 1, 2, . . . , n.

The vector x is greater than or equal to the vector y when each component of x is at least as great as the corresponding component of y:

Definition 1.2. x ≥ y iff xᵢ ≥ yᵢ, i = 1, 2, . . . , n.

The vector x is greater than the vector y when each component of x is at least as great as the corresponding component of y and at least one component of x is strictly greater than the corresponding component of y:

Definition 1.3. x > y iff x ≥ y and x ≠ y.

Figure 1.2: The Relation > on R

Problem 1.1. [Answer] Suppose x > y. Does it necessarily follow that x ≥ y?

The vector x is strictly greater than the vector y when each component of x is greater than the

corresponding component of y:

Definition 1.4. x ≫ y iff xᵢ > yᵢ, i = 1, 2, . . . , n.

Problem 1.2. [Answer] Suppose x ≫ y. Does it necessarily follow that x > y?


These definitions are standard and they conform to the conventional usage in the special case in which

n = 1. The distinctions are illustrated in Figure 1.3. The shaded area in the left-hand panel represents

the set of vectors, y, for which y ≫ (4, 3). Note that (4, 3) does not belong to this set nor does any

point directly above (4, 3) nor any point directly to the right of (4, 3). The shaded area in the right-

hand panel illustrates the set of vectors, y, for which y ≥ (4, 3). This shaded area differs by including

(4, 3), points directly above (4, 3) and points directly to the right of (4, 3). Though not illustrated, the

set of y’s for which y > (4, 3) corresponds to the shaded area in the right-hand panel with the point

(4, 3) itself removed.

Figure 1.3: The set of y for which y ≫ x (left) and the set for which y ≥ x (right), where x = (4, 3)
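The three orderings are easy to experiment with in Mathematica. The following is a sketch (the helper names are my own, not from the text) implementing them componentwise and checking the point (5, 3) against x = (4, 3):

vGreaterEqual[x_, y_] := And @@ Thread[x >= y]          (* x ≥ y *)
vGreater[x_, y_] := vGreaterEqual[x, y] && x =!= y      (* x > y *)
vStrictlyGreater[x_, y_] := And @@ Thread[x > y]        (* x ≫ y *)

vGreaterEqual[{5, 3}, {4, 3}]     (* True *)
vGreater[{5, 3}, {4, 3}]          (* True *)
vStrictlyGreater[{5, 3}, {4, 3}]  (* False: the second components are equal *)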

Problem 1.3. A relation, R, on a set S is transitive iff xRy and yRz implies xRz for all x, y, z ∈ S.

(i) Is = transitive? (ii) What about ≥? (iii) ≫? (iv) >?

Problem 1.4. A relation, R, on a set S is complete iff either xRy or yRx for all x, y ∈ S. When n = 1

it must either be the case that x ≥ y or that y ≥ x (or both). Thus ≥ on R1 is complete. Is ≥ on Rn

complete when n > 1?

Problem 1.5. [Answer] Consider the case in which n = 1 and x, y ∈ R1 . Is there any distinction

between x > y and x ≫ y?

Figure 1.4: Norms of vectors: ‖(−3)‖ = 3 in R¹, ‖(3, 4)‖ = 5 in R², and ‖(2, 4, 4)‖ = 6 in R³

The (Euclidean) norm or length of a vector, by an obvious extension of the Pythagorean Theorem, is

the square root of the sum of the squares of its components.


Definition 1.5. ‖x‖ ≡ (x₁² + x₂² + · · · + xₙ²)^{1/2}.

Note that the absolute value of a real number and the norm of a vector in R¹ are equivalent — if a ∈ R then ‖a‖ = √(a²) = |a|. The norms of vectors in R² and R³ are illustrated in Figure 1.4 on the preceding page. The extensions to higher dimensions are analogous.

The dot product of two vectors is obtained by multiplying the respective components and adding.

Definition 1.6. x · y ≡ x₁y₁ + x₂y₂ + · · · + xₙyₙ = Σᵢ₌₁ⁿ xᵢyᵢ

The dot product conveys geometric information about the relationship between two vectors. It can be shown that

x · y = ‖x‖‖y‖ cos θ        (1.1)

where θ is the included angle between the two vectors. Recall that the cosine of θ is bigger than, equal to or less than zero depending upon whether θ is less than, equal to or greater than ninety degrees.

Theorem 1. Suppose x, y ∈ Rⁿ with x, y ≠ 0. Then x · y > 0 iff x and y form an acute angle, x · y = 0 iff x and y form a right angle and x · y < 0 iff x and y form an obtuse angle.

Figure 1.5: Angles

This theorem is illustrated in Figure 1.5 where (a) x and y form a right angle and x · y = 0, (b) x and

w form a right angle and x · w = 0, (c) x and z form an obtuse angle and x · z = −24 < 0, (d) y and

w form an obtuse angle and y · w = −25 < 0 and (e) y and z form an acute angle and y · z = 7 > 0.
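These dot products are easy to verify in Mathematica — a quick check, using the vectors of Figure 1.5:

x = {4, -3}; y = {3, 4}; z = {-3, 4}; w = {-3, -4};
{x . y, x . w, x . z, y . w, y . z}   (* {0, 0, -24, -25, 7} *)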

When two vectors form a right angle they are said to be orthogonal. Note that the word “orthogonal” is

just the generalization of the word “perpendicular” to Rn . Similarly, orthant is the generalization of

the word quadrant to Rn .

Problem 1.6. The Cauchy-Schwarz inequality states that x · y ≤ ‖x‖‖y‖. Show that this inequality

follows from Equation 1.1.

♦ Query 1.1. When does Cauchy-Schwarz hold as an equality?

Problem 1.7. Suppose a, x, y ∈ Rn and a · x > a · y. Does it follow that x > y? [Hint: resist any

temptation to “divide both sides by a”.]

Problem 1.8. In Mathematica a vector is a list, e.g. {1,2,3,4} or Table[j,{j,1,4}] and the dot

product of two vectors is obtained by placing a period between them. Use Mathematica to evaluate the

following dot product:

Table[j, {j,1,100}] . Table[j-50, {j,1,100}]

Do the two vectors form an acute angle, an obtuse angle or are they orthogonal?

The sum of two vectors is obtained by adding the respective components. Supposing that x, y ∈ Rn

we have:

Definition 1.7. x + y ≡ (x1 + y1 , x2 + y2 , . . . , xn + yn )

4

Note that the sum of two vectors in Rn is itself a vector in

Rn . The set Rn is said, therefore, to be closed with respect

to the operation of addition.

The addition of two points from R² is illustrated in Figure 1.6. The addition of x = (5, −3) and y = (−1, 5) yields a point x + y = (4, 2)

located at the corner of the parallelogram whose sides are

formed by the vectors x and y. Equivalently, x + y = (4, 2)

is obtained by moving the vector x parallel to itself until its

tail rests at the tip of y, or by moving the vector y parallel

to itself until its tail rests at the tip of x.

Figure 1.6: Vector Addition

The scalar product of a real number and a vector is obtained

by multiplying each component of the vector by the real num-

ber. If α ∈ R then:

Definition 1.8. αx = (αx1 , αx2 , . . . , αxn ).

Note that this product is itself a vector in Rⁿ. The set Rⁿ is said, therefore, to be closed with respect to the operation of scalar multiplication.

Scalar multiplication is illustrated in Figure 1.7. Note that for any choice of α, αx lies along the extended line passing through the origin and the point x. The sign of α determines whether αx will be on the same (α > 0) or opposite (α < 0) side of the origin as x. The magnitude of α determines whether αx will be closer (‖α‖ < 1) or further away (‖α‖ > 1) than x.

Problem 1.9. In Mathematica, if x and y are vectors (lists) and a is a real number, then x+y gives the sum of the two vectors and a x gives the scalar product of a and x. Use Mathematica to evaluate the following:

3 {1,3,5} + 2 {2,4,6}

The norm of αx is

‖αx‖ = (α²x₁² + α²x₂² + · · · + α²xₙ²)^{1/2}
     = [α²(x₁² + x₂² + · · · + xₙ²)]^{1/2}
     = (α²)^{1/2} (x₁² + x₂² + · · · + xₙ²)^{1/2}
     = ‖α‖‖x‖

Figure 1.7: Scalar Product

Multiplying x by α thus produces a new vector that is ‖α‖ times as long as the original vector. It is not

difficult to see that αx points in the same direction as x if α is positive and in the opposite direction

if α is negative.
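The identity ‖αx‖ = ‖α‖‖x‖ can also be confirmed symbolically in Mathematica; the following is a minimal sketch for n = 2 (the variable names are mine):

Assuming[Element[{x1, x2, a}, Reals],
  Simplify[Norm[a {x1, x2}] == Abs[a] Norm[{x1, x2}]]]   (* True *)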

Problem 1.10. In Mathematica the norm of the vector x is given by Norm[x]. What is

Norm[Table[j, {j,1,100}]]


1.2.1 Linear Combinations

Definition 1.9. If x¹, x², . . . , xᵏ are k vectors in Rⁿ and α₁, α₂, . . . , αₖ are real numbers, then

z = α₁x¹ + α₂x² + · · · + αₖxᵏ

is called a linear combination of the x's.

Definition 1.10. If

α₁x¹ + α₂x² + · · · + αₖxᵏ = (0, 0, . . . , 0)

has no solution (α₁, α₂, . . . , αₖ) other than the trivial solution, α = 0, then the vectors are said to be linearly independent. Alternatively, if there is a non-trivial solution, α ≠ 0, then the vectors are said to be linearly dependent.

In the latter case we must have αⱼ ≠ 0 for some j and thus can write:

αⱼ xʲ = − Σᵢ≠ⱼ αᵢ xⁱ

or, since αⱼ ≠ 0,

xʲ = − Σᵢ≠ⱼ (αᵢ/αⱼ) xⁱ

Thus x j is a linear combination of the remaining x’s. It follows that vectors are either linearly inde-

pendent or one of them can be expressed as a linear combination of the rest.

This is illustrated in Figure 1.8. In the right-hand panel, x and y are linearly dependent and a non-

trivial solution is α = (1, 2). In the left-hand panel, on the other hand, x and y are linearly independent.

Scalar multiples of x lie along the dashed line passing through x and the origin and, similarly, scalar

multiples of y lie along the dashed line passing through y and the origin. The only way to have the

sum of two points selected from these lines add up to the origin is to choose the origin from each line

— the trivial solution α = (0, 0).

Figure 1.8: Linearly independent vectors, x = (6, −3) and y = (6, 4) (left), and linearly dependent vectors, x = (4, −8) and y = (−2, 4) (right)
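A quick way to test vectors for linear independence in Mathematica is to stack them as the rows of a matrix and compute its rank — full rank means independence. A sketch, checking the two pairs from Figure 1.8:

MatrixRank[{{6, -3}, {6, 4}}]    (* 2: the left-hand pair is independent *)
MatrixRank[{{4, -8}, {-2, 4}}]   (* 1: the right-hand pair is dependent *)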

Definition 1.11. If L is a non-empty set which is closed with respect to vector addition and scalar

multiplication, i.e. (i) x, y ∈ L =⇒ x + y ∈ L and (ii) α ∈ R, x ∈ L =⇒ αx ∈ L, then L is called a

linear space.

Definition 1.12. If L is a linear space and L ⊆ M then L is a linear subspace of M.

Problem 1.11. Which of the following sets are linear subspaces of R3 ?


2. A line segment? A line through the origin? A line not passing through the origin?

3. A plane passing through the origin? A plane not passing through the origin?

♦ Query 1.2. Must the intersection of two linear subspaces itself be a linear subspace?

Definition 1.13. The dimension of a linear (sub)space is an integer equaling the largest number of

linearly independent vectors which can be selected from the (sub)space.

Problem 1.12. [Answer] What are the dimensions of the following subsets of R3 ?

1. The origin?

Definition 1.14. Given a set {x 1 , x 2 , . . . , x k } of k vectors in Rn , the set of all possible linear combina-

tions of these vectors is referred to as the linear subspace spanned by these vectors.

Linear spaces spanned by independent and dependent vectors are illustrated for the case in which

n = 2 in Figure 1.8 on the previous page. Since the two vectors, (6, −3) and (6, 4) in the left-hand

panel are linearly independent, every point in R2 can be obtained as a linear combination of these two

vectors. The point (0, 7), for example, corresponds to −1x + 1y. In the right-hand panel, on the other

hand, the two vectors (4, −8) and (−2, 4), are linearly dependent and the linear subspace spanned by

these vectors is a one-dimensional, strict subset of R2 corresponding to the line which passes through

the two points and the origin.

Problem 1.13. [Answer] Suppose x, y ∈ Rⁿ with x ≠ 0 and let X = { z ∈ Rⁿ | z = αx, α ∈ R } be the (1-dimensional) linear space spanned by x. The projection of y upon X, denoted ŷ, is defined to be that element of X which is closest to y, i.e. that element ŷ ∈ X for which the norm of the residual of the projection, ‖y − ŷ‖, is smallest. Obtain expressions for both α̂ and ŷ as functions of x and y.

Problem 1.14. Suppose a, y ∈ Rn and let X = { x ∈ Rn | a · x = 0 } be the linear subspace

orthogonal to a. Obtain an expression for ŷ, the projection of y on X, as a function of a and y.

[See Problem 1.13.]

Definition 1.15. A basis for a linear (sub)space is a set of linearly independent vectors which span the

(sub)space.

Definition 1.16. An orthonormal basis for a linear (sub)space is a basis with two additional properties:

1. The basis vectors are mutually orthogonal, i.e., if x i and x j are vectors in the basis, then x i · x j =

0.

2. The length of each basis vector is one, i.e., if x i is a vector in the basis, then x i · x i = 1.

In forming linear combinations of vectors no restriction whatever is placed upon the α’s other than

that they must be real numbers. In the left-hand panel of Figure 1.9 on following page, for example,

every point in the two-dimensional space corresponds to a linear combination of the two vectors. An

affine combination of vectors, on the other hand, is a linear combination which has the additional

restriction that the α’s add up to one.


Definition 1.17. If x¹, x², . . . , xᵏ are k vectors in Rⁿ and if α₁, α₂, . . . , αₖ are real numbers with the property that

Σᵢ₌₁ᵏ αᵢ = 1

then

z = Σᵢ₌₁ᵏ αᵢ xⁱ

is an affine combination of the x's.

Problem 1.16. An affine combination of points is necessarily a linear combination as well but not

vice versa. True or false?

An affine space bears the same relationship to affine combinations that a linear space does to linear

combinations:

Definition 1.18. If L is closed with respect to affine combinations, i.e. affine combinations of points in

L are necessarily also in L, then L is called an affine space. If, additionally, L ⊆ M then L is an affine

subspace of M.

The affine subspace spanned by a set of vectors is similarly analogous to the linear subspace spanned

by a set of vectors.

Definition 1.19. Given a set {x¹, x², . . . , xᵏ} of k vectors in Rⁿ, the affine subspace spanned by these vectors is the set of all possible affine combinations of these vectors:

{ z ∈ Rⁿ | z = Σᵢ₌₁ᵏ αᵢ xⁱ, Σᵢ₌₁ᵏ αᵢ = 1 }

Consider the case k = 2, so that z = α₁x¹ + α₂x². Supposing now that x¹ ≠ x², let λ = α₁ and (1 − λ) = α₂ and rewrite this as z = λx¹ + (1 − λ)x². Rewriting again we have z = λ(x¹ − x²) + x². Note that when λ = 0, z = x². Alternatively, when λ = 1, z = x¹. In

general z is obtained by adding a scalar multiple of (x 1 − x 2 ) to x 2 . It is not difficult to see that such

points lie on the extended line passing through x 1 and x 2 — the set of all possible affine combinations

of two distinct vectors is simply the line determined by the two vectors. This is illustrated for n = 2

by the middle panel in Figure 1.9.

Figure 1.9: Combinations: linear (left), affine (middle) and convex (right)

Problem 1.17. A linear subspace is necessarily an affine subspace as well but not vice versa. True or

false?

Problem 1.18. [Answer] Suppose a is a point in L where L is an affine subspace but not a linear

subspace. Let M be the set obtained by “subtracting” a from L, i.e. M = { z | z = x − a, x ∈ L }. Is M

necessarily a linear subspace?

Problem 1.19. Suppose x, y ∈ Rn with x and y linearly independent and consider the affine sub-

space A = {z ∈ Rⁿ | z = λx + (1 − λ)y, λ ∈ R}. Find the projection, ô, of the origin on A.


1.2.3 Convex Combinations

If we add the still further requirement that the α's not only add up to one but also that each is non-negative, then we obtain a convex combination.

Definition 1.20. If x¹, x², . . . , xᵏ are k vectors in Rⁿ and if α₁, α₂, . . . , αₖ are real numbers with the property that

Σᵢ₌₁ᵏ αᵢ = 1 and αᵢ ≥ 0, i = 1, . . . , k

then

z = Σᵢ₌₁ᵏ αᵢ xⁱ

is a convex combination of the x's.

Again considering the case of k = 2, we know that since the α’s must sum to one, convex combinations

of two vectors must lie on the line passing through these vectors. The additional requirement that the

α’s must be non-negative means that convex combinations correspond to points on the line between

x 1 and x 2 , i. e. the set of all possible convex combinations of two distinct points is the line segment

connecting the two points. This is illustrated for n = 2 in Figure 1.9 on the previous page.

Problem 1.20. A convex combination of points is necessarily an affine combination and thus a linear

combination as well but not vice versa. True or false?

A convex set bears the same relationship to convex combinations that an affine subspace does to affine

combinations:

Definition 1.21. If L ⊆ Rⁿ and L is closed with respect to convex combinations, i.e. convex combinations of points in L are necessarily also in L, then L is called a convex set.

Problem 1.21. Show that the intersection of two (or more) convex sets in Rn must itself be a convex

set.

Definition 1.22. Given a set L ⊆ Rn , the smallest convex set which contains L is called the convex hull

of L. [Here “smallest” means the intersection of all convex sets containing the given set.]

The convex hull of a set of vectors corresponds to the set of all convex combinations of the vectors

and is thus analogous to the affine space spanned by a set of vectors:

Problem 1.22. Suppose x, y and z are three, linearly independent vectors in R3 . Describe the sets

which correspond to all (i) linear, (ii) affine and (iii) convex combinations of these three vectors.


With the geometrical interpretation of the dot product in mind consider the problem of solving the linear equation

a₁x₁ + a₂x₂ + · · · + aₙxₙ = 0

or

a · x = 0

where a = (a₁, a₂, . . . , aₙ) is a known vector of coefficients — called the normal of the equation — and the problem is to find those x's in Rⁿ which solve the equation. We know that finding such an x is equivalent to finding an x which is orthogonal to a. The solution set,

X(a) ≡ { x ∈ Rⁿ | a · x = 0 }

is illustrated for n = 2 in Figure 1.10.

Figure 1.10: a · x = 0

Problem 1.23. [Answer] Show that X(a) is a linear subspace.

Problem 1.24. [Answer] What is the dimension of X(a)?

Problem 1.25. Suppose a, b, y ∈ Rn are linearly independent and let L = {x ∈ Rn | a · x = 0 and b ·

x = 0}. Find an expression for ŷ, the projection of y on L as a function of a, b and y.

Now consider the more general linear equation

a₁x₁ + · · · + aₙxₙ = b

or

a · x = b

where b is not necessarily equal to 0 and let

X(a, b) = { x ∈ Rⁿ | a · x = b }

Problem 1.26. [Answer] Show that X(a, b) is an affine subspace.

Problem 1.27. When is X(a, b) a linear subspace?

♦ Query 1.3. Which two subsets of a linear space, X, are always linear subspaces?

To provide a geometric characterization of X(a, b), find a point x ∗ that (i) lies in the linear subspace

spanned by a and (ii) solves the equation a · x = b. To satisfy (i) it must be the case that x ∗ = λa for

some real number λ. To satisfy (ii) it must be the case that a · x ∗ = b. Combining we have a · (λa) = b

or λ = b/(a · a) and thus x ∗ = [b/(a · a)]a.
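Both properties of x∗ are easy to check in Mathematica; here is a sketch using the example of Figure 1.11 discussed below:

a = {4, 3}; b = -25/2;
xstar = (b/(a . a)) a    (* {-2, -3/2}: a scalar multiple of a *)
a . xstar == b           (* True: xstar solves the equation *)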

Now suppose that x 0 is any solution to a · x = 0. It follows that x ∗ + x 0 must solve a · x = b since

a · (x ∗ + x 0 ) = a · x ∗ + a · x 0 = b + 0 = b. We may therefore obtain solutions to a · x = b simply

by adding x ∗ to each solution of a · x = 0. X(a, b) is obtained, in short, by moving X(a) parallel

to itself until it passes through x ∗ . The significance of x ∗ is that it is the point in X(a, b) which is

closest to the origin. Its norm, moreover, is ‖x∗‖ = ‖b‖/‖a‖. Note that x∗ can be interpreted as

the intercept of the solution set with the a “axis”. When b is positive, X(a, b) lies on the same side

of the origin as a and a forms a positive dot product (acute angle) with each point in X(a, b). When b

is negative, X(a, b) lies on the opposite side of the origin from a and a forms a negative dot product

(obtuse angle) with each point in X(a, b).


This x∗ is illustrated in Figure 1.11 for the case in which a = (4, 3), ‖a‖ = 5, b = −25/2, x∗ = [b/(a · a)]a = −25/2 × 1/25 × (4, 3) = (−2, −3/2) and

‖x∗‖ = √((−2, −3/2) · (−2, −3/2)) = 5/2 = ‖b‖/‖a‖

The solution set for the linear equation a · x = b can thus be given the following interpretation: X(a, b) is an affine subspace orthogonal to the normal a and lying a directed distance equal to b/‖a‖ from the origin at the closest point. The term directed distance simply means that X(a, b) lies on the same side of the origin as a if b is positive and on the opposite side if b is negative.

Figure 1.11: a · x = b

This is the standard form for a linear equation. It replaces

the familiar slope-intercept form used for n = 2. In this more general form the slope is given by the

“orthogonal to a” requirement and the intercept by the point x∗ lying a directed distance b/‖a‖ out the a “axis”.

Problem 1.28. Suppose b ∈ R, a, y ∈ Rn and let X(a, b) = {x ∈ Rn | a · x = b}. Obtain an

expression for ŷ, the projection of y on X(a, b), as a function of a, b and y. [See Problem 1.13 on

page 7.]

The solution set X(a, b) bears exactly the same relationship to Rn that a plane does to R3 . For example,

it is linear (either a linear or an affine subspace) and has a dimension equal to n − 1. For these reasons

X(a, b) ≡ { x ∈ Rn | a · x = b }

is called a hyperplane. This hyperplane divides Rn into two associated half spaces

H + (a, b) ≡ { x ∈ Rn | a · x ≥ b }

H − (a, b) ≡ { x ∈ Rn | a · x ≤ b }

Definition 1.23. If Z ⊂ Rn is an arbitrary set, then X(a, b) is bounding for Z iff Z is entirely contained

in one of X(a, b)’s half-spaces, i.e., either Z ⊆ H + or Z ⊆ H − .

Definition 1.24. If Z ⊂ Rⁿ is an arbitrary set, then X(a, b) is supporting for Z iff X(a, b) is bounding for Z and X(a, b) “touches” Z, i.e.,

inf_{z ∈ Z} |a · z − b| = 0

These concepts together with the following theorem will prove very useful in subsequent analysis.


Theorem 2 (Minkowski’s Theorem). If Z and W are non-empty, convex and non-intersecting subsets

of Rn , then there exist a ∈ Rn and b ∈ R such that X(a, b) is separating for Z and W , i.e., X(a, b)

(i) is bounding for both Z and W , (ii) contains Z in one half-space and (iii) contains W in the other

half-space.

Minkowski’s Theorem is illustrated for n = 2 in Figure 1.12. In the left-hand panel the antecedent

conditions for the theorem are met and the separating hyperplane is illustrated. In right-hand panel

one of the sets is not convex and it is not possible to find a separating hyperplane.


Figure 1.12: Conditions for Minkowski’s Theorem: satisfied (left) and violated (right)

1.5 Answers

Problem 1.1 on page 2. Yes. From the “only if” in the definition, x > y =⇒ x ≥ y.

Problem 1.2 on page 2. Yes. From the “only if” in the definition,

x ≫ y =⇒ xᵢ > yᵢ, i = 1, 2, . . . , n =⇒ xᵢ ≥ yᵢ, i = 1, 2, . . . , n =⇒ x ≥ y

Problem 1.5 on page 3. No. If x > y then at least one component of x must be greater than the

corresponding component of y. Since there is only one component, this means that every component

of x is greater than the corresponding component of y. Thus x > y =⇒ x ≫ y. The converse also

holds.

Problem 1.12 on page 7. The origin has dimension 0. Surprised? Note that α(0, 0, 0) = (0, 0, 0) has an

abundance of non-trivial solutions, e.g. α = 1. A line through the origin has dimension 1 and a plane

through the origin has dimension 2.

Problem 1.13 on page 7. Two facts characterize this projection. (i) Since ŷ ∈ X it must be the case

that ŷ = α̂x for some real α̂. (ii) The residual of the projection, y − ŷ, must be orthogonal to every

vector in X. Since x ∈ X fact (ii) implies that (y − ŷ) · x = 0 or y · x = ŷ · x. Combining with (i) yields

y · x = α̂ x · x or, since ‖x‖ ≠ 0, α̂ = (y · x)/(x · x) and thus ŷ = [(y · x)/(x · x)] x.
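This formula is simple to package as a Mathematica function; a sketch (the function name is my own):

proj[y_, x_] := (y . x)/(x . x) x   (* projection of y on the span of x *)
proj[{3, 3, 3}, {1, 0, 2}]          (* {9/5, 0, 18/5} *)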

Problem 1.15 on page 7.

x¹ = (1, 0, 0, . . . , 0)
x² = (0, 1, 0, . . . , 0)
⋮
xⁿ = (0, 0, 0, . . . , 1)


Problem 1.18 on page 8. Yes. The argument proceeds in three steps.

1. M is an affine space: If yⁱ ∈ M, i = 1, . . . , k then it must be the case that yⁱ = xⁱ − a, i = 1, . . . , k, for some xⁱ ∈ L, i = 1, . . . , k. Since L is affine, it follows that Σᵢ αᵢ = 1 implies z ≡ Σᵢ αᵢ xⁱ = Σᵢ αᵢ (yⁱ + a) ∈ L. But this means that Σᵢ αᵢ yⁱ = z − a ∈ M. Thus M is an affine space.

2. 0 ∈ M: Since a ∈ L it follows that a − a = 0 ∈ M.

3. M is closed with respect to linear combinations: Suppose that xⁱ ∈ M, i = 1, . . . , k and βᵢ ∈ R, i = 1, . . . , k. We need to show that the linear combination Σᵢ βᵢ xⁱ ∈ M. Note that βᵢ xⁱ = βᵢ xⁱ + (1 − βᵢ)0 ∈ M since xⁱ, 0 ∈ M and M is affine. But then Σᵢ βᵢ xⁱ + (1 − Σᵢ βᵢ)0 is an affine combination of points of M and equals Σᵢ βᵢ xⁱ, so Σᵢ βᵢ xⁱ ∈ M.

Problem 1.23 on page 10. (i) If x, x′ ∈ X(a) then a · x = 0 and a · x′ = 0, so a · (x + x′) = a · x + a · x′ = 0 and thus x + x′ ∈ X(a); (ii) if x ∈ X(a) then a · x = 0, αa · x = a · (αx) = 0 and thus αx ∈ X(a).

Problem 1.24 on page 10. Since (i) the dimension of Rn equals n, (ii) a itself spans (occupies) a linear

subspace of dimension 1 and (iii) X(a) contains all those x's which are orthogonal to a, it is not hard

to see that there are n − 1 directions left in which to find vectors orthogonal to a. Thus the dimension

of X(a) must be equal to n − 1.

Problem 1.26 on page 10. Since xⁱ ∈ X(a, b) implies a · xⁱ = b and Σᵢ αᵢ = 1 for any affine combination Σᵢ αᵢ xⁱ, it follows that

b = (α₁ + . . . + αₖ)b = α₁ a · x¹ + . . . + αₖ a · xᵏ = a · (α₁x¹ + . . . + αₖxᵏ)

and thus Σᵢ αᵢ xⁱ ∈ X(a, b).


Chapter 2

Matrix Algebra


Thus far we have thought of vectors as points in Rn represented by n-tuples of real numbers. This

is a little like thinking of “127 Main Street” as a 15 character text string when, in reality, it’s a house.

Similarly, an n-tuple of real numbers is best regarded as the address of the vector that lives there.


Figure 2.1: A linear space (left) and the corresponding “address space” (right)

All this can be made less abstract by constructing a “coordinate free” linear space using a pencil, ruler,

protractor and a blank sheet of paper. Begin by placing a point on the paper and labeling it o to

represent the origin. Then arbitrarily pick another couple of points, label them x and y and draw

arrows connecting them to o. This is illustrated in the left-hand panel of Figure 2.1.

The lengths, ‖x‖ and ‖y‖, of x and y, respectively, can be measured with the ruler. The scalar product

of x and, say 3/2, can then be obtained by extending x using the ruler until the length is 3/2 times as

long as x. Multiplying by a negative real number, say −2, would require extending x in the opposite

direction until it’s length is 2 times the original length. The scalar multiple of an arbitrary point z by

the real number a is then obtained by expanding (or contracting) z until its length equals kakkzk and

then reversing the direction if a is negative.

To add, say, x and y use the protractor to construct a parallel to y through x and a parallel to

x through y. The intersection of these parallels gives x + y. Adding any other two points would

similarly be accomplished by “completing the parallelogram” formed by the two points.

Note that x and y are linearly independent since ax + by = o has only the trivial solution a = b = 0.

Any other point, z, can be expressed as a linear combination, z = ax + by, for appropriate choices of

the real numbers a and b. This means that the two vectors, x and y, form a basis for our linear space

which, consequently, is 2-dimensional. All this is possible without axes and coordinates.

Now let’s add coordinates by choosing x and y, respectively, as the two basis vectors for our linear

space. The corresponding 2-dimensional “address” space is illustrated in the right-hand panel of Fig-

ure 2.1 where, for example, (1, 0) is the address of x since x lives 1 unit out the first basis vector (x)

and 0 units out the second basis vector (y). In general, (a, b) in the right-hand panel is the address of

the vector ax + by in the left-hand panel.

Definition 2.1. A linear space is an abstract set, L, with a special element called the “origin” and

denoted o together with an operation on pairs of elements in L called “addition” and denoted + and

another operation on elements in L and real numbers called “scalar multiplication” with the property

that for any x, y ∈ L and any a ∈ R: (i) x + o = x, (ii) 0 x = o, (iii) x + y ∈ L and (iv) ax ∈ L.

Suppose that L is an n-dimensional linear space and that b = {b₁, b₂, . . . , bₙ} ⊂ L form a basis for L. A real-valued linear transformation on L is a map, T, from L into R with the property that T(ax + by) = aT(x) + bT(y) for all real numbers a and b and all x, y ∈ L.


Since b is a basis for L, an arbitrary x̂ ∈ L must be expressible as a linear combination of the elements of b. Thus x̂ = Σᵢ xᵢbᵢ where x = (x₁, x₂, . . . , xₙ) ∈ Rⁿ and, since T is linear, T(x̂) = T(Σᵢ xᵢbᵢ) = Σᵢ T(bᵢ)xᵢ = a · x, where aᵢ ≡ T(bᵢ) and thus a ∈ Rⁿ. This means that a · x gives the image, T(x̂), when x is the address of x̂. It also means that for every real-valued linear transformation on the n-dimensional linear space, L, there is a corresponding vector, a ∈ Rⁿ, that represents the associated formula for getting the image of a point from its address.

♦ Query 2.1. Let L∗ denote the set of all real-valued linear transformations of the linear space L and

define addition and scalar multiplication for elements of L∗ as follows for all f, g ∈ L∗ and α ∈ R:

(f + g)(x) ≡ f(x) + g(x), ∀x ∈ L
(αf)(x) ≡ αf(x), ∀x ∈ L

L∗ thus defined is called the dual space of L. Is it a linear space and, if so, what is its dimensionality?

In general, a linear transformation is a mapping that is, well, linear. This means (i) that if x maps into

T (x) and a is a real number, then ax must map into aT (x) and (ii) that if x and y map into T (x) and

T (y), respectively, then x + y must map into T (x) + T (y).

Let’s suppose that the domain and range of the linear transformation are both equal to the 2-dimensional

linear space illustrated in Figure 2.1 on the previous page and construct a linear transformation. Con-

sider the left-hand panel of Figure 2.2. First select the same basis vectors as before, x and y. Now

choose arbitrary points to be the images of these two points and label them T (x) and T (y). You’re

done. That’s right, you have just constructed a linear transformation. To see why simply note that any

point in the domain, z, can be expressed as a linear combination of the basis vectors, z = ax + by.

But then the linearity of T implies that T (z) = T (ax + by) = aT (x) + bT (y). Thus the image of any

point in the domain is completely determined by the starting selection of T (x) and T (y).

Figure 2.2: Constructing a linear transformation (left) and its address view (right), with T(x) = (1, 2/3) and T(y) = (1/2, 1)

Note that the T (x) and T (y) in the illustration are linearly independent. This need not be the case,

they could be linearly dependent and span either a one-dimensional linear subspace of L, a line, or a

zero-dimensional linear subspace, the origin. See Problem 2.1 on following page.

As before, the right-hand panel of Figure 2.2 gives the “address view” of the same linear transforma-

tion. This means that x = (1, 0) maps into T (x) = (1, 2/3), y = (0, 1) maps into T (y) = (1/2, 1) and,


in general, z = (z₁, z₂) maps into

T(z₁x + z₂y) = z₁T(x) + z₂T(y) = (1, 2/3)z₁ + (1/2, 1)z₂

Thus the matrix-vector product

$$\begin{bmatrix} 1 & 1/2 \\ 2/3 & 1 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

gives a formula for computing the address of the image of z from the address of z.
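Evaluating this matrix-vector product in Mathematica reproduces the addresses above — a quick check:

A = {{1, 1/2}, {2/3, 1}};
A . {1, 0}     (* {1, 2/3}: the address of T(x) *)
A . {0, 1}     (* {1/2, 1}: the address of T(y) *)
A . {z1, z2}   (* {z1 + z2/2, 2 z1/3 + z2} *)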

Problem 2.1. Suppose, in the construction of the linear transformation in Figure 2.2 on preceding

page, that x and y are linearly independent but that T (x) and T (y) were chosen in a way that made

them linearly dependent and, in fact, span only a one-dimensional linear subspace. Discuss the impli-

cations for the image of the domain under the transformation and for the matrix that maps addresses

for the transformation.

Definition 2.2. A linear transformation is a mapping, T, which associates with each x in some n-dimensional linear space, D, a point T(x) in some m-dimensional linear space, R, with the property that if x¹, . . . , xᵏ ∈ D and α₁, . . . , αₖ ∈ R then

T(Σᵢ αᵢ xⁱ) = Σᵢ αᵢ T(xⁱ)

While the definition imposes no restriction upon the values of m and n it is convenient to assume for

the moment that m = n and R = D. Suppose that b1 , . . . , bn ∈ D is a basis for both D and R, and let

T (bj ), j = 1, . . . , n be the images of these basis vectors under the transformation. Since T (bj ) belongs

to R = D it can be expressed as a linear combination of the basis vectors:

T(bⱼ) = Σᵢ₌₁ⁿ aᵢⱼ bᵢ

The matrix

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$

obtained in this way has for its jth column the “address” of T(bⱼ) in terms of the basis, i.e., T(bⱼ) “lives” a₁ⱼ out the basis vector b₁, a₂ⱼ out b₂ and so forth.

Similarly, an arbitrary vector x ∈ D can be expressed as a linear combination of the basis vectors

x = Σⱼ₌₁ⁿ xⱼ bⱼ

so that the column vector

$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

can be interpreted as the address of x in terms of the basis.

Now since T is linear,

T(x) = T(Σⱼ₌₁ⁿ xⱼ bⱼ) = Σⱼ₌₁ⁿ xⱼ T(bⱼ) = Σⱼ₌₁ⁿ xⱼ Σᵢ₌₁ⁿ aᵢⱼ bᵢ = Σᵢ₌₁ⁿ (Σⱼ₌₁ⁿ aᵢⱼ xⱼ) bᵢ

so that the address of T(x) in terms of the basis is given by the matrix-vector product

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$

A similar result can be established when m ≠ n so that to every linear transformation which maps an

n-dimensional linear space into an m-dimensional linear space there corresponds an m by n matrix

for mapping addresses in terms of given bases for the domain and the range, and vice versa. This

being the case, the study of linear transformations centers upon the matrix-vector product Ax or,

equivalently, upon the linear transformations, T , for which D = Rn and R = Rm .

♦ Query 2.2. Suppose m ≠ n and thus R ≠ D. Let d₁, d₂, . . . , dₙ ∈ D be a basis for D and r₁, r₂, . . . , rₘ ∈

R be a basis for R. Derive the formula for mapping the address of x ∈ D into the address of T (x) ∈ R.

Definition 2.3. The image of X ⊆ Rⁿ under T is T(X) ≡ { T(x) | x ∈ X }.

Note that T (Rn ), the set of all linear combinations of the columns of A, is a linear subspace with a

dimension equal to the number of linearly independent columns or rank of A. It is also true that

Rank(A) ≤ min{m, n} since there can’t be more linearly independent columns than there are columns

and since the columns themselves live in Rᵐ. When Rank(A) < n the transformation “collapses” the

domain into a linear subspace. No such collapse takes place when Rank(A) = m = n and T (Rn ) = Rn .

Definition 2.4. The inverse image of Y ⊆ Rᵐ under T is T⁻¹(Y) ≡ { x ∈ Rⁿ | T(x) ∈ Y }.


Definition 2.5. A mapping is invertible iff the inverse image of any point in the range is a single point

in the domain.

T⁻¹(T(x)) = x = T(T⁻¹(x))        (2.1)

Problem 2.2. [Answer] Show that the transformation associated with the matrix A is invertible iff

Rank(A) = m = n.

Problem 2.3. Suppose that the n × n matrix A is invertible. Does it follow that Ax = 0 =⇒ x = 0?

Problem 2.4. Suppose that A is an n × n matrix and that Ax = 0 =⇒ x = 0. Does it follow that A is

invertible?

What difference does the choice of a basis make to the matrix that represents the linear transformation with respect to the basis? Suppose that A is the original matrix, b₁, b₂, . . . , bₙ is the original basis and b̂₁, b̂₂, . . . , b̂ₙ is the new basis. Since each original basis vector must be uniquely associated with

a new basis vector and since, as bases, both must be linearly independent, this change of basis defines

a linear transformation which maps the new basis vectors to the old ones and this transformation

must be invertible. Let P be the matrix version of this transformation so that if x̂ is the address of a

vector in terms of the new basis, then P x̂ is the address of the same vector in terms of the old basis.

Since the transformation itself has not changed, it must be the case that x = P x̂ maps into Ax or,

in terms of the new basis, that x̂ maps into P −1 Ax = P −1 AP x̂. Thus B = P −1 AP is the matrix that

represents the transformation with respect to the new basis.
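A sketch in Mathematica (the particular P is an arbitrary invertible choice of mine) illustrating that B = P⁻¹AP represents the same transformation — for instance, the two matrices share their characteristic roots, discussed later in this chapter:

A = {{1, 1/2}, {2/3, 1}};   (* the matrix from Figure 2.2 *)
P = {{1, 1}, {0, 1}};       (* an invertible change-of-basis matrix *)
B = Inverse[P] . A . P;
Simplify[Sort[Eigenvalues[A]] == Sort[Eigenvalues[B]]]   (* True *)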

Definition 2.6. Two matrices, A and B, are called similar if there exists an invertible matrix P such

that B = P −1 AP .

Theorem 3. Two matrices, A and B, represent the same linear transformation with respect to different

bases iff A and B are similar.

1. From a collection of similar matrices, the simplest or most analytically convenient can be selected

since they all represent the same linear transformation.

2. The set of all linear transformations can be partitioned into sets of similar transformations and

a simplest representative selected from each to form a set of canonical or representative forms.

For example, it can be shown that all 2 × 2 matrices are similar to one of the following three matrices:

$$\begin{bmatrix} a & b \\ -b & a \end{bmatrix} \qquad \begin{bmatrix} c & 0 \\ 0 & d \end{bmatrix} \qquad \begin{bmatrix} r & 0 \\ 1 & r \end{bmatrix} \tag{2.2}$$

Understanding linear transformations of 2-dimensional linear spaces then reduces to understanding these three canonical forms.

" #

a b

A=

−b a


Show that ‖Ax‖/‖x‖ = √(a² + b²) and cos(θ) = a/√(a² + b²), where θ is the angle between x and Ax, and thus that this transformation corresponds to a rotation and either a lengthening or a shortening. Hint: for the first part, try Mathematica with

A = {{a, b}, {-b, a}};
x = {x1, x2};
Assuming[{Element[x1, Reals], Element[x2, Reals], Element[a, Reals],
  Element[b, Reals]}, Simplify[Norm[A.x]/Norm[x]]]

" #

c 0

A=

0 d

Interpret the transformation T , i.e., what are the images, T (x) and T (y), of the two basis vectors, x

and y?

Solving simultaneous systems of linear equations involves nothing more than identifying the properties of the inverse image of a linear transformation. To solve the homogeneous system

a₁₁x₁ + · · · + a₁ₙxₙ = 0
⋮
aₘ₁x₁ + · · · + aₘₙxₙ = 0

or Ax = 0 is to find the inverse image of 0 under this linear transformation. Similarly, to solve the non-homogeneous system

a₁₁x₁ + · · · + a₁ₙxₙ = b₁
⋮
aₘ₁x₁ + · · · + aₘₙxₙ = bₘ

or Ax = b is to find the inverse image of b.

Two distinct views of the matrix-vector product prove useful. In the column view, the vector Ax is viewed as a linear combination of the columns of A using the components of x as the weights:

$$Ax = \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} x_1 + \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix} x_2 + \cdots + \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix} x_n \tag{COL}$$

In the row view, the components of the vector Ax are viewed as the dot products of the rows of A with the vector x:

$$Ax = \begin{bmatrix} [\,a_{11}\ a_{12}\ \cdots\ a_{1n}\,] \cdot x \\ [\,a_{21}\ a_{22}\ \cdots\ a_{2n}\,] \cdot x \\ \vdots \\ [\,a_{m1}\ a_{m2}\ \cdots\ a_{mn}\,] \cdot x \end{bmatrix} \tag{ROW}$$
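Both views are easy to check in Mathematica; a sketch using the matrix from Figure 2.3 below:

A = {{6, -3}, {4, -2}}; x = {1, 2};
A . x                                     (* {0, 0} *)
x[[1]] A[[All, 1]] + x[[2]] A[[All, 2]]   (* column view: {0, 0} *)
{A[[1]] . x, A[[2]] . x}                  (* row view: {0, 0} *)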


2.6.1 Homogeneous Equations

Consider the homogeneous system Ax = 0 using the column view. A non-trivial solution (x 6= 0)

is possible iff the columns of A are linearly dependent since a non-trivial linear combination of the

columns using the components of x as weights can only be equal to zero if the columns are linearly

dependent.

The row view confirms this since x must be orthogonal to each row of A and thus to the linear

subspace spanned by the rows of A. This is possible iff Rank(A) = r < n in which case the rows span

an r-dimensional linear subspace and there are n − r directions left to look for things orthogonal. The

solution set in this case, not surprisingly, is itself a linear subspace of dimension n − r and is called

the null space of A.

Problem 2.6. Suppose

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 & 1 \end{bmatrix}$$

In Mathematica the command

MatrixRank[A]

gives the rank of the matrix A and the command

NullSpace[A]

gives an orthogonal basis for the null space of A. (i) What is the rank of A? (ii) Give an orthogonal basis for the null space of A. (iii) What is the dimension of the null space of A?

The column view is illustrated in the left-hand panel of Figure 2.3 for the case in which

$$A = \begin{bmatrix} 6 & -3 \\ 4 & -2 \end{bmatrix}$$

Figure 2.3: Solving Ax = 0: the column view (left) and the row view (right)

Since rank(A) = 1 there are non-trivial choices for the weights x1 and x2 for which A·1 x1 + A·2 x2 = 0,

e.g., (x1 , x2 ) = (1, 2). The right-hand panel presents the corresponding row view in which the solution

set is a 2 − 1 = 1 dimensional linear subspace orthogonal to the linear subspace spanned by the rows

of A. Note that (x1 , x2 ) = (1, 2) belongs to the solution set.


2.6.2 Non-Homogeneous Equations

The non-homogeneous system Ax = b is similar. The column view suggests that a solution is possible

iff b lies in the linear subspace spanned by the columns of A. Put somewhat differently, a solution

is possible iff Rank(A|b) = Rank(A). Given any one such solution, x ∗ , it is possible to obtain all

solutions as follows. Since Ax ∗ = b it follows that if x 0 is any other solution it must be the case

that Ax ∗ = Ax 0 = b or A(x 0 − x ∗ ) = 0. Now we already know that solutions to Ax = 0 form a

linear subspace of dimension n − Rank(A). The solutions to Ax = b must then correspond to the

set obtained by adding x ∗ to each of the solutions to Ax = 0 — an affine subspace of dimension

n − Rank(A).

This is illustrated in Figure 2.4 for the case in which

$$A = \begin{bmatrix} 4 & 3 \\ -3 & 4 \end{bmatrix} \qquad b = \begin{bmatrix} 5 \\ -10 \end{bmatrix}$$

Figure 2.4: Solving Ax = b: the column view (left) and the row view (right)

Since rank(A) = 2 = n, the solutions to Ax = b must be an affine subspace of dimension zero — a single point — which corresponds to the trivial solution for Ax = 0. In the column view illustrated in the

left-hand panel this unique solution for x is obtained by “completing the parallelogram” whose sides

correspond to the columns of A and whose diagonal corresponds to b. It follows that the unique

solution is x = (2, −1). Notice that if the columns of A were chosen as the basis, then the address

of b would be (2, −1). In the row view illustrated in the right-hand panel, the unique solution for x

corresponds to the intersection of

S1 = {x | (4, 3) · (x1 , x2 ) = 5}

a hyperplane orthogonal to the first row of A and lying a directed distance equal to 5/‖(4, 3)‖ = 1

from the origin and

S2 = {x | (−3, 4) · (x1 , x2 ) = −10}

a hyperplane orthogonal to the second row of A and lying a directed distance equal to −10/‖(−3, 4)‖ = −2 from the origin.
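In Mathematica the unique solution can be obtained directly — a quick check of this example:

A = {{4, 3}, {-3, 4}}; b = {5, -10};
LinearSolve[A, b]   (* {2, -1} *)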


2.7 Square Matrices

When T : Rⁿ → Rⁿ is invertible, the inverse image of any point, T⁻¹(x), is itself a point. Thus T⁻¹ is also a linear transformation. As such it has an associated matrix which is denoted, naturally enough,

A−1 . A consequence of Equation 2.1 on page 20 is that

A⁻¹Ax = x = AA⁻¹x

or that A⁻¹A = I = AA⁻¹ where

$$I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

is the n by n identity matrix. In terms of the rows of A and the columns of A⁻¹ this requires

$$A_{i\cdot} \cdot A^{-1}_{\cdot j} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{2.4}$$

where Aᵢ· is the ith row of A and A⁻¹·ⱼ is the jth column of A⁻¹. The jth column of A⁻¹ must therefore

1. be orthogonal to every row of A other than the jth,
2. form an acute angle with the jth row of A, and
3. be just long enough to make the dot product with the jth row equal to one.

These requirements can be used to construct the inverse geometrically — see Figure 2.5 on the next page for the case in which n = 2 and

$$A = \begin{bmatrix} 4 & 3 \\ 2 & 6 \end{bmatrix} \qquad A^{-1} = \begin{bmatrix} 1/3 & -1/6 \\ -1/9 & 2/9 \end{bmatrix}$$


In Figure 2.5, R1 is the set of vectors which are orthogonal to the second column of A and form an acute angle with the first column — the first row of A⁻¹ must belong to this set. Similarly, R2 is the set of vectors which are orthogonal to the first column of A and form an acute angle with the second column — the second row of A⁻¹ must belong to this set.

Figure 2.5: Constructing the Inverse

Problem 2.7. What problem would be encountered in constructing the inverse if the columns of A were linearly dependent?

Problem 2.8. The formula for the inverse of a 2 by 2 matrix is:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Derive the first row of this inverse using Equation 2.4 on the previous page.

Problem 2.9. Check your answer to Problem 2.8 using the Mathematica commands A = {{a,b},{c,d}} and then Inverse[A]//MatrixForm. What difference would it make to replace //MatrixForm with //InputForm?

Problem 2.10. [Answer] Gram's Theorem states that if A is an m by n matrix with m < n and x ∈ Rᵐ then

AAᵀx = 0 ⟺ Aᵀx = 0

Prove Gram's Theorem.

This theorem implies, for example, that if Rank(A) = m then Rank(AAᵀ) = m since Aᵀx = 0 has no solution, x ≠ 0, and AAᵀx = 0 must therefore have no solution either.

Consider the problem of ordinary least squares regression. In this problem data is available which describes n observations on each of p exogenous variables and 1 endogenous variable. This is arranged as an n by p matrix, X, each column of which corresponds to an exogenous variable. There are more observations than variables so Rank(X) = p < n. The observations on the endogenous variable are arranged as an n by 1 vector, y.

The problem is to find the projection, ŷ, of y on S = {z | z = Xβ, β ∈ Rp }. The term “least squares”

derives from the fact that ŷ is the closest point to y in S and thus minimizes the sum of the squares of

the components of the difference — see Figure 2.6 on following page.

There are two key facts:

• Since ŷ ∈ S, it must be the case that ŷ = X β̂ for some β̂. The problem of finding ŷ thus reduces

to one of finding β̂.


Figure 2.6: The projection of y on S and the residual from the projection

• Since ŷ is the projection of y on S, the residual of the projection, y − ŷ = y − X β̂, must be orthogonal to S, the space spanned by the columns of X. In particular, y − X β̂ must be orthogonal to each column of X, i.e., Xᵀ(y − X β̂) = 0 or Xᵀy = XᵀX β̂.

• β̂ = (XᵀX)⁻¹Xᵀy. Multiply both sides of Xᵀy = XᵀX β̂ by (XᵀX)⁻¹, which exists by virtue of Gram's Theorem.

Problem 2.11. Suppose x¹ = (1, 0, 2), x² = (2, 0, 1), y = (3, 3, 3) ∈ R³ and let L = { z ∈ R³ | z = α₁x¹ + α₂x², α₁, α₂ ∈ R } be the linear subspace spanned by x¹ and x². Find ŷ, the projection of y on L. Hint:

X = Transpose[{{1, 0, 2}, {2, 0, 1}}];
y = {3, 3, 3};
X . Inverse[Transpose[X] . X] . Transpose[X] . y

The following, due to Hadley [1961], page 87, is a typical definition of the determinant of a square

matrix — correct but not particularly intuitive.

Definition 2.7. The determinant of an n by n matrix A, written |A|, is

|A| ≡ Σ (±) a₁ᵢ a₂ⱼ · · · aₙᵣ

the sum being taken over all permutations of the second subscripts with a term assigned a plus sign if (i, j, . . . , r) is an even permutation of (1, 2, . . . , n), and a minus sign if it is an odd permutation. For a 2 by 2 matrix this gives

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$


Problem 2.12. Use Mathematica to derive the formulas for the determinant and inverse of a general

3 × 3 matrix by first entering
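The input itself is truncated in this copy of the text; one natural way to complete it (my completion, not necessarily the author's original entry) is:

A = Array[a, {3, 3}];   (* a general 3 x 3 matrix with symbolic entries a[i, j] *)
Det[A]
Inverse[A] // Simplify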

It is often more useful to recognize that the determinant is another “signed magnitude” somewhat analogous to the dot product which is best understood by examining its sign and its magnitude separately. In Figure 2.7 the (linearly independent) columns of

$$A = \begin{bmatrix} 4 & 3 \\ -3 & 4 \end{bmatrix} \qquad |A| = 4 \times 4 - 3 \times (-3) = 25$$

have been illustrated and a parallelogram or, in this case a square, has been formed by completing the sides formed by these columns.

Note that the movement from the first axis to the second axis is counter clockwise and that the movement from the first column to the second is also counter clockwise. Thus the columns of A have the same orientation as the axes. This means that the determinant has a positive sign. (Switch the columns and the determinant would be negative.) The magnitude, moreover, corresponds to the area of this parallelogram.

Figure 2.7: The Determinant in R²: An Oriented Area

Had the columns been linearly dependent, as in

$$B = \begin{bmatrix} 6 & -3 \\ 2 & -1 \end{bmatrix}$$

the parallelogram would be degenerate — a segment of a line rather than an area. The determinant is again equal to the area enclosed within this line interval which, in this case, is equal to zero.

Problem 2.13. The “formula” for a 2 by 2 determinant is:

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

Show that |ad − bc| is the area of the parallelogram formed by the columns. Hint: let x = (a, c), y = (b, d) and note that the area of the parallelogram is equal to the length of the base, ‖x‖, times the altitude, ‖y − ŷ‖, where ŷ is the projection of y on the linear subspace spanned by x.
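The hint is easy to carry out in Mathematica for the matrix of Figure 2.7 — a sketch (note that |A| = |Aᵀ|, so the rows can stand in for the columns here):

x = {4, -3}; y = {3, 4};
yhat = (y . x)/(x . x) x;                    (* projection of y on the span of x *)
Norm[x] Norm[y - yhat] == Abs[Det[{x, y}]]   (* True: base times altitude equals |det| *)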


Higher dimensional cases are analogous. The determinant of a 3 by 3 matrix, for example, has a sign which depends upon whether the columns have the same orientation as the axes and a magnitude equal to the volume of the parallelepiped formed by the columns. [A parallelepiped is a solid each face of which is a parallelogram.] See Figure 2.8. When the columns are linearly dependent the parallelepiped degenerates into a plane area (rank 2) or a line interval (rank 1), both of which have zero volume and the determinant, accordingly, is equal to zero.

The determinant of an n by n matrix, analogously, has a sign which depends upon the orientation of the columns and a magnitude equal to the volume of the “hyper” parallelepiped formed by the columns.

Figure 2.8: The Determinant in R³: An Oriented Volume

Problem 2.14. Suppose A is an n by n matrix. Provide geometrical interpretations for the following propositions:

1. Suppose

Â·ₖ = λA·ₖ

i.e., Â is obtained from A by multiplying the kth column of A by a number λ. Then |Â| = λ|A|.

2. Suppose

Â·ₖ = A·ₖ + Σᵢ≠ₖ λᵢ A·ᵢ

i.e., Â is obtained from A by adding a linear combination of the other columns to the kth column. Then |Â| = |A|.

Problem 2.15. Suppose A is a non-singular n by n matrix. Show that |Ax| = α |A| for some α ∈ R

which depends only upon x. What is α?

Theorem 4 (Cramer's Rule). If Ax = b with A an n by n matrix and |A| > 0 then

xᵢ = |Bᵢ| / |A|

where Bᵢ is the matrix obtained from A by replacing its ith column with b.

The geometrical interpretation of this theorem is quite simple and is illustrated for the case in which n = 2 in Figure 2.9 on the next page. Note first that the columns of A, labeled A₁ and A₂, are linearly independent and the solution for both x₁ and x₂ can be obtained by completing the parallelogram, decomposing b as b = b₁ + b₂ with b₁ proportional to A₁ and b₂ proportional to A₂:

x₁ = ‖b₁‖ / ‖A₁‖        x₂ = ‖b₂‖ / ‖A₂‖


Let's use Cramer's Rule to find, say, x₁. Since we wish to identify the first component of x we begin by replacing the first column of A with b to obtain B₁. Cramer's rule then asserts that

x₁ = |B₁| / |A|

Our task then is to show that

|B₁| / |A| = ‖b₁‖ / ‖A₁‖        (2.5)

Consider first the denominator. |A| is the area of the parallelogram formed by the first and second columns of A which, in this case, is positive and could be computed by multiplying the length of the “base”, oA₂, by the “altitude”, the distance between the parallel lines ob₂ and A₁e. Since the parallelogram with vertices at o, d, e and A₂ has the same base and the same altitude, its area is also equal to |A|. Call this parallelogram P_A.

Turning attention to the numerator, |B₁| is the area of the parallelogram with vertices at o, b, f and A₂. Call this parallelogram P_B. Since P_B has the same base as P_A, the ratio of the area of P_B to the area of P_A, |B₁| / |A|, is the same as the ratio of the distance between ob₂ and b₁f and the distance between ob₂ and A₁e. But this is the same as the ratio ‖b₁‖/‖A₁‖, which establishes Equation 2.5.

Figure 2.9: Using Cramer's Rule to Solve Ax = b for x₁
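Cramer's Rule is also easy to check numerically; a sketch using the system from Figure 2.4:

A = {{4, 3}, {-3, 4}}; b = {5, -10};
B1 = Transpose[{b, A[[All, 2]]}];   (* replace the first column of A with b *)
B2 = Transpose[{A[[All, 1]], b}];   (* replace the second column of A with b *)
{Det[B1]/Det[A], Det[B2]/Det[A]}    (* {2, -1}, agreeing with LinearSolve *)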

2.7 Characteristic Roots and Vectors

We turn now to characteristic roots and characteristic vectors:

Definition 2.8. If A is an n by n matrix, λ is a scalar and x ≠ 0 is an n by 1 vector, then λ is a characteristic root of A and x is the associated characteristic vector iff λ and x solve the characteristic equation:

Ax = λx (2.6)

Characteristic roots and vectors are also sometimes called (i) eigenvalues and eigenvectors or (ii) latent

roots and latent vectors.

A fact worth noting about the characteristic roots of a matrix is that they characterize the underlying

linear transformation and are invariant with respect to the choice of basis — recall the discussion of

Section 2.3 on page 17. To see this note that if A is the matrix representation of the linear transfor-

mation T for a particular choice of basis, then to be a characteristic root of A, λ must satisfy

T (x) = λx

But this means that matrices which represent the same linear transformation under alternative choices

of basis, i.e., similar matrices, will have the same characteristic roots.

Since Equation 2.6 can be rewritten as the homogeneous equation

$$[A - \lambda I]x = 0$$

it follows that λ is a characteristic root of A iff

$$|A - \lambda I| = \begin{vmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22}-\lambda & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}-\lambda \end{vmatrix} = 0$$

The left-hand side is a polynomial in λ with (−λ)ⁿ the highest order term [the product of the diagonal elements]. From the fundamental theorem of algebra we know that such a polynomial will have n, not necessarily distinct, solutions for λ.

Problem 2.16. The characteristic roots of A may be either real or complex but if they are complex

they must occur in conjugate pairs so that if λ = a + bi is a root then λ∗ = a − bi must also be a root.

Show that it follows that both the sum of the roots and the product of the roots are necessarily real.

Theorem 5. If A is an n by n matrix with characteristic roots λᵢ, i = 1, . . . , n, then

$$\sum_{i=1}^{n} \lambda_i = \operatorname{trace}(A) \qquad\text{and}\qquad \prod_{i=1}^{n} \lambda_i = |A|$$

where

$$\operatorname{trace}(A) \equiv \sum_{i=1}^{n} a_{ii}$$

Since similar matrices must have the same characteristic roots, it follows from Theorem 5, that similar

matrices have the same trace and determinant as well.
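Theorem 5 is also easy to check numerically in Mathematica; a minimal sketch, using a hypothetical 3 by 3 matrix:

A = N[{{2, 1, 0}, {1, 3, 1}, {0, 1, 4}}];
Chop[Total[Eigenvalues[A]] - Tr[A]]        (* 0: the roots sum to the trace *)
Chop[(Times @@ Eigenvalues[A]) - Det[A]]   (* 0: the roots multiply to the determinant *)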

Problem 2.17. [Answer] Show that Theorem 5 is valid for the 2 by 2 matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

Problem 2.18. What are the characteristic roots of the three canonical matrices given in Equation 2.2

on page 20?

Problem 2.19. Suppose

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 1 & 2 & 3 \\ 3 & 4 & 1 & 2 \\ 2 & 3 & 4 & 1 \end{bmatrix}$$

The Mathematica commands for finding the characteristic roots and vectors of a matrix are Eigenvalues[A] and Eigenvectors[A], respectively. What are the characteristic roots and vectors of A?
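Entering the matrix and applying the two commands named above is all that is required; a minimal sketch:

A = {{1, 2, 3, 4}, {4, 1, 2, 3}, {3, 4, 1, 2}, {2, 3, 4, 1}};
Eigenvalues[A]
Eigenvectors[A]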


2.8 Farkas’ Lemma

A final result that will prove very important in subsequent analysis takes us into the realm of linear inequalities. It states that a system of linear equations will have a solution precisely when another system of linear inequalities does not have a solution. The importance of this result involves "indirection" — often it will be easier to establish the existence of a solution to the system of interest by showing that a solution to the complementary system cannot exist.

Theorem 6 (Farkas' Lemma). Suppose A is an m by n matrix and b ≠ 0 is a 1 by n row vector. Exactly one of the following holds:

1. there exists a y ≥ 0 such that yA = b, or

2. there exists a z such that Az ≥ 0 and b · z < 0.

[Figure 2.10: Farkas' Lemma. Left-hand panel: b lies outside the cone spanned by the rows of A and Az ≥ 0, b · z < 0 has a solution. Right-hand panel: b lies inside the cone and yA = b, y ≥ 0 has a solution.]

The basis of this theorem is quite simple and is illustrated in Figure 2.10. Either

1. a vector z exists which forms a non-obtuse angle with every row of A and an obtuse angle with b (the left-hand panel), or

2. b lies within the cone spanned by the rows of A so that yA = b has a non-negative solution y (the right-hand panel).

The key to understanding Figure 2.10 is to fix the rows of A and rotate b clockwise in moving from the left-hand panel to the right-hand panel. Initially b lies outside the cone generated by the rows of A. It follows that there is a vector z for which Az ≥ 0 and for which b · z < 0, i.e., a vector z that makes a non-obtuse angle with every row of A and an obtuse angle with b.

As b rotates clockwise this solution disappears precisely at the point at which b enters the cone spanned by the rows of A, but then there is a solution to yA = b with y ≥ 0. This solution persists until b emerges from the cone spanned by the rows of A, but at this point there is again a solution to Az ≥ 0 and b · z < 0.


2.8.1 Application: Asset Pricing and Arbitrage

Consider a two period model of asset pricing. There are n assets which can be traded in the first

period at prices p. The first period budget constraint limits an investor endowed with portfolio ŝ to

portfolios satisfying

p · s ≤ p · ŝ (2.7)

Asset prices in the second period are uncertain and depend upon which of m possible "states of nature" occurs. It is common knowledge when first period trading takes place that the second-period price of the jth asset will be aᵢⱼ if the ith state occurs. Let A denote the corresponding m × n matrix

of second period prices in which rows correspond to states and columns to assets. Holding portfolio

s would then pay As in the second period, i.e., the ith component of this m-tuple would be the total

value of the portfolio if the ith state occurred.

Note that the components of s are not required to be non-negative. Indeed, negative components

correspond to “short” positions, e.g., s1 = −1 would be interpreted as taking a short position of one

share on the first asset. This means the investor borrows a share of this asset from the market, sells

it for p1 and then uses the receipts to purchase other shares. The catch, of course, is that such loans

must be repaid in the second period. Our investor would thus be required to purchase one share of

the first asset in the second period, whatever its price turns out to be, to repay the first-period loan.

The second-period “solvency” constraint is that the investor must be able to repay such loans or that

holding the portfolio not entail bankruptcy in any state

As ≥ 0 (2.8)

It is important to realize that the components of As are the commodities that investors care about

— the components of s only matter to the extent that they affect As. Since p is a vector of security prices and securities are not themselves the focus of interest, the question arises of whether or not it is possible to identify an m-tuple of "shadow prices", ρ, of the commodities of interest. Here ρᵢ would be interpreted as the price of a claim to one dollar contingent upon state i occurring in a fictional shadow market. For such shadow prices to be interesting, trade opportunities in the shadow market would have to be equivalent to those in the actual markets, i.e., ρ would have to satisfy

$$\rho A = p, \quad \rho \ge 0 \tag{2.9}$$

Can we be sure that a solution to Equation 2.9 exists? Well, if we make the association y = ρ, b = p

and z = s, then Farkas’ Lemma states that either Equation 2.9 will have a solution or there will be a

solution, s, to

As ≥ 0, p · s < 0 (2.10)

An s that satisfied Equation 2.10 would be a good thing, too good in fact. It not only satisfies solvency,

As ≥ 0, but also “pumps money” into the pocket of the investor in the first period since p · s < 0. In

the context of the budget constraint, Equation 2.7, this means that

p · (λs) = λp · s ≤ p · ŝ

is satisfied for an arbitrarily large λ and thus that our investor could acquire infinite first period wealth.

This is commonly called an arbitrage opportunity. If we make the reasonable supposition that p and

A preclude such arbitrage opportunities, then the existence of shadow prices satisfying Equation 2.9

is guaranteed.
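A minimal Mathematica sketch of recovering shadow prices from Equation 2.9, using hypothetical two-state, two-asset numbers (rows of A are states, columns are assets):

A = {{2, 1}, {1, 3}};               (* second-period state-contingent prices *)
p = {1, 1};                         (* first-period asset prices *)
rho = LinearSolve[Transpose[A], p]  (* solves rho.A == p; yields {2/5, 1/5} *)
(* both components are positive, so p lies in the cone spanned by the rows
   of A and no arbitrage opportunity exists *)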

♦ Query 2.4. Suppose m = n = 2, that the two columns of A, A1 and A2, are linearly independent and that p · ŝ = 1, i.e., our investor is worth one dollar in the first period.


1. In a graph of the positive quadrant of R², illustrate A1 and A2 and the points v1 ≡ A1/p1 and v2 ≡ A2/p2. Is either v1 > v2 or v2 > v1 consistent with no arbitrage opportunities?

2. Illustrate the budget constraint for contingent claims under the assumption that no arbitrage

opportunities exist. Label the regions corresponding to long positions on both assets, to a long

position on the first asset and a short position on the second and to a short position on the first

asset and a long position on the second.

3. What is the effect in your illustration of adding the solvency constraint to the budget constraint?

4. Is it possible to determine the shadow prices, ρ1 and ρ2 , from your illustration and, if so, how?

♦ Query 2.5. Suppose that no arbitrage opportunities exist and let x = As and x̂ = Aŝ. What is the

budget constraint corresponding to Equation 2.7 on the previous page in terms of x, x̂ and ρ? What

is the solvency constraint corresponding to Equation 2.8 on the previous page?

♦ Query 2.6. Suppose that a new asset is introduced, that Rank(A) = Rank(A|b) where b is the vector

of state-dependent, second-period prices for the new asset, that no arbitrage opportunities exist either

before or after the introduction of the new asset and that p = ρA is the vector of first-period prices of

the original assets. What must be the first-period price of the new asset?

♦ Query 2.7. Suppose that no arbitrage opportunities exist and that there is a riskless portfolio, s ∗ , for

which As ∗ = (1, 1, . . . , 1)T . What is the one-period riskless rate of return? Hint: What is the first-period

cost of buying claims to a sure, second-period dollar?

2.9 Answers

Problem 2.2 on page 20. Suppose that Rank(A) = m = n and that A is not invertible. Then there must exist y, x, x′ ∈ Rⁿ with x ≠ x′ such that y = Ax = Ax′. But this means that A(x − x′) = 0 with (x − x′) ≠ 0 and thus the columns of A are linearly dependent — a contradiction. Conversely, suppose A is invertible and the columns of A are linearly dependent. Then there exist weights α = (α1, . . . , αn) ≠ 0 such that Aα = 0. Now choose any x ∈ Rⁿ and note that x − α ≠ x + α and yet A(x − α) = A(x + α) = Ax — thus A is not invertible.

Problem 2.10 on page 25. Suppose AAᵀx = 0. Then

$$x^T A A^T x = 0 \;\Rightarrow\; (A^T x)^T (A^T x) = 0 \;\Rightarrow\; \|A^T x\| = 0 \;\Rightarrow\; A^T x = 0$$

and, conversely, if Aᵀx = 0 then clearly AAᵀx = 0.

Problem 2.17 on page 30. Expanding the determinant yields

$$(a - \lambda)(d - \lambda) - bc = 0$$

or

$$\lambda^2 - (a + d)\lambda + ad - bc = 0$$

Using the quadratic formula yields

$$\lambda_1 = \frac{a + d + \sqrt{(a+d)^2 - 4(ad - bc)}}{2} \qquad \lambda_2 = \frac{a + d - \sqrt{(a+d)^2 - 4(ad - bc)}}{2}$$

so that

$$\lambda_1 + \lambda_2 = a + d = \operatorname{trace}(A) \qquad\text{and}\qquad \lambda_1 \lambda_2 = ad - bc = |A|$$


Chapter 3

Topology

3.1 Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 Countable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.2 Uncountable Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.2.1 Open Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.4 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.1 Separation Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.3 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Sigma Algebras and Measure Spaces . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

This chapter draws much from Simmons [1963], surely one of the most beautiful books about mathe-

matics ever written.

3.1 Counting

The subject of counting begins simply enough with thoughts of the positive integers, 1, 2, 3, . . .,

familiar to all of us. But counting was surely important to human beings even before such symbols

were invented. Imagine a primitive society of sheep herders whose number system was limited to

the symbols “1”, “2”, “3” and “several”, i.e., “more than 3”. How might they have kept track of herds

containing “several” sheep? One simple device might have been to place a stone in a pile for each

sheep in the herd and then, each night, to remove a stone for each sheep accounted for. Stones left in

the pile would then have indicated strays needing to be found.


3.1.1 Countable Sets

The set

N = {1, 2, 3, . . .}

containing all the positive integers or cardinal numbers, serves as a modern "pile of stones". While

this set is adequate for counting any non-empty, finite set, in mathematics there are many infinite sets

just as, for the herdsmen, there were many herds with “several” sheep. The simple but profound idea

of a one-to-one correspondence that met the needs of the herdsmen also permits comparing these

infinite sets.

Definition 3.1. Two sets are said to be numerically equivalent if there is a one-to-one correspondence between the elements of the two sets.

Definition 3.2. A countable set is a set that is numerically equivalent to the positive integers.

Suppose, for example, that we want to compare the set consisting of all positive integers with the set consisting of all even positive integers. Since the pairing

1 2 3 ··· n ···

2 4 6 ··· 2n ···

establishes a one-to-one correspondence, the two sets must be regarded as having the same number

of elements even though one is a proper subset of the other. This situation is not unusual since every

infinite set can, in fact, be put into a one-to-one correspondence with a proper subset of itself.

Similarly, there are exactly as many perfect squares as there are positive integers because these two sets can also be put in a one-to-one correspondence:

1   2   3   · · ·   n   · · ·
1²  2²  3²  · · ·   n²  · · ·

As another example, consider the set of all positive rational numbers, i.e., ratios of positive integers. Surely this set is larger than the positive integers, right? No. The following array includes every positive rational number at least once

1/1  1/2  1/3  1/4  · · ·
2/1  2/2  2/3  2/4  · · ·
3/1  3/2  3/3  3/4  · · ·
 ⋮    ⋮    ⋮    ⋮

and can be put into a one-to-one correspondence with the positive integers by traversing its diagonals:

1    2    3    4    5    6    7    8    9    · · ·
1/1  1/2  2/1  1/3  2/2  3/1  1/4  2/3  3/2  · · ·

Problem 3.1. Show that the set of all integers, {· · · , −2, −1, 0, 1, 2, · · · }, is countable.

So how many positive integers are there? The symbol ℵ₀, read "aleph null", is used to represent the number of elements or cardinality of this set. Our list of numbers now includes its first "trans-finite" number:

1 < 2 < 3 < · · · < ℵ0


3.1.2 Uncountable Sets

Not all sets with infinitely many elements are countable. Consider a countable sequence of points of

the form x1 , x2 , x3 , ... where each element xi is either 0 or 1 and a countable listing of these sequences

such as:

s1 = (0, 0, 0, 0, 0, 0, 0, · · · )

s2 = (1, 1, 1, 1, 1, 1, 1, · · · )

s3 = (0, 1, 0, 1, 0, 1, 0, · · · )

s4 = (1, 0, 1, 0, 1, 0, 1, · · · )

s5 = (1, 1, 0, 1, 0, 1, 1, · · · )

s6 = (0, 0, 1, 1, 0, 1, 1, · · · )

s7 = (1, 0, 0, 0, 1, 0, 0, · · · )

..

.

It is possible to build a new sequence s0 in such a way that its first element is different from the first element of the first sequence in the list, its second element is different from the second element of the second sequence in the list, and, in general, its nth element is different from the nth element of the nth sequence in the list. For instance:

s1 = ([0], 0, 0, 0, 0, 0, 0, · · · )
s2 = (1, [1], 1, 1, 1, 1, 1, · · · )
s3 = (0, 1, [0], 1, 0, 1, 0, · · · )
s4 = (1, 0, 1, [0], 1, 0, 1, · · · )
s5 = (1, 1, 0, 1, [0], 1, 1, · · · )
s6 = (0, 0, 1, 1, 0, [1], 1, · · · )
s7 = (1, 0, 0, 0, 1, 0, [0], · · · )
 ⋮
s0 = ([1], [0], [1], [1], [1], [0], [1], · · · )

Note that the bracketed (diagonal) element of s0 is in every case different from the bracketed element in the corresponding sequence above it and thus the new sequence is distinct from all the sequences in the list. From this it

follows that the set T , consisting of all countable sequences of zeros and ones, cannot be put into

a list s1 , s2 , s3 , .... Otherwise, it would be possible by the above process to construct a sequence s0

which would both be in T (because it is a sequence of 0’s and 1’s) and at the same time not in T

(because we deliberately construct it not to be in the list). Therefore T cannot be placed in one-to-one

correspondence with the positive integers. In other words, T is uncountable.
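The diagonal construction is mechanical enough to hand to Mathematica; a minimal sketch using the seven sequences listed above, truncated to seven terms:

seqs = {{0, 0, 0, 0, 0, 0, 0}, {1, 1, 1, 1, 1, 1, 1}, {0, 1, 0, 1, 0, 1, 0},
        {1, 0, 1, 0, 1, 0, 1}, {1, 1, 0, 1, 0, 1, 1}, {0, 0, 1, 1, 0, 1, 1},
        {1, 0, 0, 0, 1, 0, 0}};
s0 = Table[1 - seqs[[n, n]], {n, Length[seqs]}]
(* {1, 0, 1, 1, 1, 0, 1} -- differs from the nth sequence in its nth place *)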

Now consider the binary representation of a number between zero and one where, for example, 1/2

would be represented as 0.1, 1/4 would be 0.01 and so forth. Since the binary representation of a real

number between zero and one must be a countable sequence of zeros and ones preceded by a binary point, e.g., 0.1011011100 . . ., and since the number of such sequences is uncountable, it follows that

the set of real numbers lying between zero and one must also be uncountable.

Surely there are more real numbers than those lying between zero and one, right? No, the set of all real

numbers and the set of real numbers between zero and one, or in any other interval, are numerically equivalent. The one-to-one correspondence is illustrated in Figure 3.1. Simply bend the interval ab into a semi-circle, rest the result on the real line and then associate an arbitrary point P from the interval with that point P′ from the real line which corresponds to the intersection of a line from the center of the semi-circle through P with the real line.

[Figure 3.1: One-to-One Correspondence Between an Interval and the Real Line]

We now have a new cardinal number, c, called the cardinal number of the continuum and our list of numbers now includes a second "trans-finite" number:

1 < 2 < 3 < · · · < ℵ₀ < c

Problem 3.2. The Cantor set is obtained as follows. First let C1 denote the closed unit interval [0, 1]. Next delete from C1 the open interval (1/3, 2/3) corresponding to the middle third of C1 to get C2 and note that C2 = [0, 1/3] ∪ [2/3, 1]. Now delete the open middle thirds of the two closed intervals to get C3. Continuing in this fashion we obtain a sequence of closed sets, each of which contains all its successors. The Cantor set is defined by

$$C = \cap_{i=1}^{\infty} C_i$$

1. Each Cn consists of a number of disjoint closed intervals of equal length. How many closed

intervals are there in C30 ?

2. The intervals removed have lengths 1/3, 2/9, 4/27, . . . , 2n−1 /3n , . . . What is the combined length

of the intervals that have been removed? Hint: Let Mathematica evaluate

Sum[2^(n-1)/3^n, {n,1,Infinity}]

You might be surprised at this point to learn that the cardinality of C is equal to c, i.e., the same as C1 .

An interesting consequence is that since the rational numbers are countable but the real numbers are

not, the set of irrational numbers must be uncountable as well or, more poetically:

The rational numbers are spotted along the real line like stars against a black sky, and the

dense blackness of the background is the firmament of the irrationals.

– E. T. Bell

Are there any cardinal numbers between ℵ₀ and c? No one knows the answer to this question though Cantor himself thought that no such number exists. There are, on the other hand, cardinal numbers

larger than c — the number of elements in the class of all subsets of R, for example. This is one

consequence of the following theorem.

Theorem 7. If X is any non-empty set, then the cardinal number of X is less than the cardinal number

of the class of all subsets of X.


Suppose, for example, that X = {1}; then there are two subsets, ∅ and {1}. If X = {1, 2}, then there are four subsets, ∅, {1}, {2} and {1, 2}. Similarly, X = {1, 2, 3} has eight subsets and, in general, if X has n elements, then there are 2ⁿ subsets.

Continuing into the infinite realm, if X has ℵ0 elements then there are 2ℵ0 > ℵ0 subsets. Which is

larger, c or 2ℵ0 ? As noted above, the cardinality of the unit interval, c, is the same as the cardinality

of the set of all countable sequences of zeros and ones. Consider the one-to-one mapping between

the set of all subsets of the natural numbers and the set of all countable sequences of zeros and ones

defined by

$$f(S) = (x_1^S, x_2^S, \ldots)$$

where

$$x_i^S \equiv \begin{cases} 1 & \text{if } i \in S \\ 0 & \text{otherwise} \end{cases}$$

If, for example, S = {2, 3, 5} then f(S) = (0, 1, 1, 0, 1, 0, . . . , 0, . . .). Thus f(S) gives a countable sequence of zeros and ones for any S ⊂ N. But then it is also true that

$$f^{-1}(x) \equiv \{\, j \in N \mid x_j = 1 \,\}$$

Thus f⁻¹ exists and f⁻¹(x) gives a subset of N for any countable sequence of zeros and ones, x. Thus

the cardinality of the continuum is the same as the cardinality of the class of all subsets of the natural

numbers and c = 2ℵ0 .
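The correspondence is easy to exhibit in Mathematica; a minimal sketch for the example S = {2, 3, 5}, truncated to the first eight terms:

s = {2, 3, 5};
Table[If[MemberQ[s, i], 1, 0], {i, 8}]   (* {0, 1, 1, 0, 1, 0, 0, 0} *)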

3.2 Metric Spaces

Definition 3.3. Let X be a non-empty set. A metric or norm on X is a function d that maps X × X into the non-negative real numbers, R₊, and which satisfies the following three conditions:

1. d(x, y) ≥ 0 with d(x, y) = 0 iff x = y.

2. d(x, y) = d(y, x).

3. d(x, z) ≤ d(x, y) + d(y, z) (the triangle inequality).

Definition 3.4. A metric space (X, d) is a non-empty set X together with a metric d defined on X.

An example of a metric space is X = R with d(x, y) = |x − y|.

Problem 3.3. [Answer] Show that d(x, y) = |x − y| for x, y ∈ R satisfies the three requirements for a metric.

Problem 3.4. Show that d(x, y) = √((x − y) · (x − y)) satisfies the three requirements for a metric. Hint: For the third part remember the Cauchy-Schwarz inequality (Problem 1.6 on page 4).

Definition 3.5. Suppose (X, d) is a metric space. The diameter of a subset S ⊆ X is defined by

d(S) ≡ sup{d(x1 , x2 ) | x1 , x2 ∈ S}

Problem 3.5. [Answer] What is the diameter of ∅?

Definition 3.6. Suppose (X, d) is a metric space. Then a subset S ⊆ X is called bounded iff its diameter,

d(S), is finite.


3.2.1 Open Sets

Definition 3.7. Suppose (X, d) is a metric space. If x ∈ X and r is a positive real number, then the set

$$S_r(x) \equiv \{\, y \in X \mid d(x, y) < r \,\}$$

is called the open sphere with center x and radius r.

Note that an open sphere is always non-empty since it contains its center.

Note also that the term "sphere" should not be taken literally. If X = R and d(x, y) = |x − y|, then the "sphere" is actually an interval of the real line containing numbers greater than x − r and less than x + r. If X = R² and d(x, y) = √((x1 − y1)² + (x2 − y2)²) then the sphere is actually a disk (the interior of a circle).

Problem 3.6. Let X = R² and d(x, y) = max{|x1 − y1|, |x2 − y2|}. Show that d satisfies the three requirements for a metric.

Problem 3.7. In Mathematica

ChebyshevDistance[x,y] = Max[Abs[x-y]]

equals maxᵢ |xᵢ − yᵢ|. Supposing

x = Table[j, {j,1,10}]

and

y = Table[11-j, {j,1,10}]

what is ChebyshevDistance[x,y]?

Problem 3.8. In Mathematica

EuclideanDistance[x,y] = Norm[x-y]

equals √(Σᵢ (xᵢ − yᵢ)²). Supposing

x = Table[j, {j,1,10}]

and

y = Table[11-j, {j,1,10}]

what is EuclideanDistance[x,y]?

Problem 3.9. Let X = R² and d(x, y) = |x1 − y1| + |x2 − y2|. Show that d satisfies the three requirements for a metric.

Problem 3.10. In Mathematica

ManhattanDistance[x,y] = Total[Abs[x-y]]

equals Σᵢ |xᵢ − yᵢ|. Supposing

x = Table[j, {j,1,10}]

and

y = Table[11-j, {j,1,10}]

what is ManhattanDistance[x,y]? Why do you suppose that it's called Manhattan distance?


Definition 3.8. A subset G of the metric space X is called an open set if, for any point x in G, there is a positive real number rₓ such that $S_{r_x}(x) \subseteq G$.

Theorem 8. Let (X, d) be a metric space. Then both the empty set, ∅, and X itself are open sets.

Theorem 9. Let (X, d) be a metric space. Then a subset G of X is an open set iff it is a countable union of open spheres.

Theorem 10. Let (X, d) be a metric space. Then

1. the union of any countable class of open sets is an open set, and

2. the intersection of any finite class of open sets is an open set.

Problem 3.11. Consider open intervals on R of the form aᵢ = (−1 − 1/(1 + i), 1 + 1/(1 + i)) where i is a positive integer. (i) What is $\cup_{i=1}^{\infty} a_i$? Is it open, closed or neither? (ii) What is $\cap_{i=1}^{n} a_i$? Is it open, closed or neither? (iii) What is $\cap_{i=1}^{\infty} a_i$? Is it open, closed or neither?

Definition 3.9. If (X, d) is a metric space and A is a subset of X, then a point x ∈ X is a limit point of

A if every open set containing x contains at least one point from A other than x.

Definition 3.10. A subset A of a metric space X is a closed set if it contains all of its limit points.

Theorem 11. Let (X, d) be a metric space. Then

1. A subset A of X is a closed set iff its complement (with respect to X) is an open set.

2. The empty set, ∅, and the set X itself are both closed.

Note that ∅ is both an open and closed set and X is both an open and closed set.

Theorem 12. Let X be a metric space. Then

1. the intersection of any countable class of closed sets is a closed set, and

2. the union of any finite class of closed sets is a closed set.

Problem 3.12. Consider closed intervals on R of the form aᵢ = [−1 + 1/(1 + i), 1 − 1/(1 + i)] where i is a positive integer. (i) What is $\cup_{i=1}^{n} a_i$? Is it open, closed or neither? (ii) What is $\cup_{i=1}^{\infty} a_i$? Is it open, closed or neither? (iii) What is $\cap_{i=1}^{\infty} a_i$? Is it open, closed or neither?

3.2.3 Convergence

Definition 3.11. A sequence on a set X is an ordered collection of points in X,

{x1, x2, x3, . . . , xn . . .}

which is numerically equivalent to the set of positive integers. Such a sequence might be denoted by {xᵢ} where it is understood that i ranges over the positive integers.

Unlike a set, order matters in a sequence — there is a first element, a second element and so forth —

and a given element in X can occur more than once.


Definition 3.12. A sequence on metric space (X, d) is convergent if there exists a point x ∈ X such

that either of these equivalent conditions hold:

• for each r > 0, there exists a positive integer nr such that n ≥ nr implies d(xn , x) < r .

• for each open sphere centered on x, Sr (x), there exists a positive integer nr such that n ≥ nr

implies xn ∈ Sr (x).

When a sequence converges, the point to which it converges is unique and called the limit point. The

fact that the sequence xn is convergent and that x is its limit point can be expressed in a variety of

equivalent ways:

• xn approaches x

• xn converges to x

• xn → x

• lim xn = x

Every convergent sequence has the property that for any positive real number r, there exists a positive integer nᵣ such that m, n ≥ nᵣ implies d(xₘ, xₙ) < r. This follows from the triangle inequality (Definition 3.3 on page 39) since convergence implies that there is a positive integer, nᵣ, such that n ≥ nᵣ implies d(xₙ, x) < r/2. Thus

$$m, n \ge n_r \;\Rightarrow\; d(x_m, x_n) \le d(x_m, x) + d(x, x_n) < \frac{r}{2} + \frac{r}{2} = r \tag{3.1}$$

A sequence for which Equation 3.1 holds is called a Cauchy sequence. Such a sequence might be described as "trying to converge". Whether or not it succeeds depends upon the metric space in which it lives. If, for example, X = {x ∈ R | 0 < x < 1} then the Cauchy sequence xₙ = 1/n tries to converge but the point to which it wants to converge, 0, is not in X.

Definition 3.13. A complete metric space is a metric space in which every Cauchy sequence converges.

A complete metric space thus has the property that every sequence that tries to converge succeeds.

Theorem 13. If Y is a subspace of a complete metric space X, then Y is complete iff Y is a closed set.

Even when a sequence does not converge, it may be possible to extract a subsequence that does converge. The sequence {0, 1, 0, 1, . . .} on R, for example, does not converge but the subsequence {1, 1, 1, . . .} obtained by extracting the even terms from the original sequence does converge.

Definition 3.14. A sequence {xi }, i ∈ I on a metric space X is said to have a convergent subsequence

iff there exists a countable but not finite subset J ⊆ I such that the sequence {xj }, j ∈ J converges.

3.2.4 Continuity

Definition 3.15. Suppose (X, d_X) and (Y, d_Y) are metric spaces and that f is a mapping of X into Y. Then f is said to be continuous at a point x ∈ X if either of the following equivalent conditions hold:

• for each r_Y > 0 there exists an r_X > 0 such that d_X(x, x′) < r_X implies d_Y(f(x), f(x′)) < r_Y.

• for each open sphere $S_{r_Y}(f(x))$ centered on f(x) there exists an open sphere $S_{r_X}(x)$ centered on x such that $f(S_{r_X}(x)) \subseteq S_{r_Y}(f(x))$.

The mapping f is said to be continuous iff it is continuous at each point in its domain X.

Theorem 14. Suppose X and Y are metric spaces and f is a mapping of X into Y. Then f is continuous iff f⁻¹(Z) is an open set in X whenever Z is an open set in Y.


3.3 Topological Spaces

Since Theorem 14 on the previous page expresses continuity solely in terms of open sets without any

direct reference to metrics, the possibility arises of dispensing with metrics altogether and basing

matters on open sets instead. Pursuing this idea further, Theorem 10 on page 41 gives the main

properties of the class of open sets in a metric space. This leads to the following

Definition 3.16. Let X be a non-empty set. A class T of subsets of X is called a topology on X iff it satisfies the following two conditions:

1. the union of any countable class of sets in T is a set in T, and

2. the intersection of any finite class of sets in T is a set in T.

Definition 3.17. A topological space is a non-empty set X together with a topology T on X. The sets in

the class T are called the open sets of the topological space (X, T ).

In other words, the class of open sets is closed with respect to the formation of countable unions and

finite intersections.

Note that the empty set is a subset of every set and thus the empty class of sets is a subset of every

topology. From this it follows that the empty set, as the union of the sets in the empty class, must

belong to every topology. Similarly, the intersection of sets in the empty class must, by elementary set

theory, be equal to the union of the complements of the sets in the empty class, namely X. Thus the

set X must belong to every topology on X.

Definition 3.18. Suppose (X, T ) is a topological space and x is a point in X. Then a neighborhood of x

is an element of T (an open set) that contains x.

Three examples of topological spaces will prove useful:

1. X is a metric space and T is the class of sets that are open in the sense of Definition 3.8 on

page 41. This is called the usual topology or the topology induced by the metric and is understood

to be the topology for a metric space unless another is specifically mentioned.

2. Let X be a non-empty set and let T contain only the two sets ∅ and X. This is called the trivial topology.

3. Let X be a non-empty set and let T be the class of all subsets of X. This is called the discrete

topology and a space with this topology is called a discrete space.

Suppose X is a non-empty set and that Ta and Tb are two topologies on X. The statement Ta ⊆ Tb

would then mean that each open set in Ta is also an open set in Tb but not necessarily vice versa. In

other words, Tb has all of the open sets of Ta and perhaps others as well. In such circumstances Tb

would be called stronger than Tₐ or, equivalently, Tₐ would be called weaker than T_b. Two topologies need not be comparable, of course, but all topologies will be weaker than the discrete topology (Example 3) and stronger than the trivial topology (Example 2).

Definition 3.19. Suppose (X, T ) is a topological space and Y is a subset of X. Then the relative topology

on Y is defined to be the class of all intersections of open sets in X with Y .

Definition 3.20. Let (X, T_X) and (Y, T_Y) be topological spaces and f a mapping of X into Y. Then f is called a continuous mapping iff f⁻¹(Z) is open in X (belongs to T_X) whenever Z is open in Y (belongs to T_Y).

Problem 3.13. Consider the mapping from X = R into Y = R defined by

$$f(x) = \begin{cases} x & \text{if } x \le 0 \\ 1 + x & \text{if } x > 0 \end{cases}$$


Suppose that both X and Y are metric spaces with the usual metric d(x, y) = |x − y| and with the topology induced by this metric. Give an example of an open set Z in Y for which f⁻¹(Z) is not open in X.

Definition 3.21. If (X, T ) is a topological space, then a set Y ⊆ X is called a closed set iff its complement,

X \ Y , is an open set (belongs to T ).

Theorem 15. Suppose (X, T) is a topological space. Then

1. the intersection of any countable class of closed sets is a closed set, and

2. the union of any finite class of closed sets is a closed set.

In other words, the class of closed sets is closed with respect to the formation of countable intersections and finite unions.

The following shows that we could replace “open set” with “closed set” as the primitive element for

defining a topology.

Theorem 16. Suppose X is a non-empty set and that there is a class of subsets of X which is closed

with respect to the formation of countable intersections and finite unions. Then the class of all com-

plements of these sets is a topology on X whose closed sets are precisely those given initially.

One of the most basic requirements of a topological space is that each of its points should be a closed

set. Whether or not this is true depends upon the separation properties of the topology.

Definition 3.22. A T₁-space is a topological space (X, T) with the property that if x and y are distinct points in X, then there exist open sets Oₓ and O_y in T such that x ∈ Oₓ and y ∈ O_y but x ∉ O_y and y ∉ Oₓ.

Problem 3.14. Which, if any, of the three examples of topological spaces — usual, trivial or discrete

— given on page 43 is a T1 -space?

Theorem 17. A topological space is a T1 -space iff each set containing a single point is a closed set.

Problem 3.15. [Answer] Prove Theorem 17.

Definition 3.23. A T₂-space or Hausdorff space is a topological space (X, T) with the property that if x and y are distinct points in X then there exist open sets Oₓ and O_y in T such that x ∈ Oₓ, y ∈ O_y and Oₓ ∩ O_y = ∅.

Problem 3.16. Consider X = R with the usual topology T induced by the metric d(x, y) = |x − y|. Is (X, T) a T₂-space?

Problem 3.17. Is a T2 -space necessarily also a T1 -space?

Definition 3.24. Suppose (X, T) is a topological space. A set S ⊆ X is called dense if U ∩ S ≠ ∅ for every non-empty open set U ∈ T.

A set S that is both open and dense has an interesting property. Every point in its complement, S^C = X \ S, can be approximated arbitrarily closely by points in S because S is dense, but no point in S can be approximated by a point in S^C because S is open. For example,

$$S = \{\, x \in \mathbb{R}^2 \mid x_1 x_2 \ne 1 \,\}$$

is an open and dense subset of R².


Definition 3.25. A property which is true for an open and dense set is called a generic property of the

topological space.

♦ Query 3.1. Suppose (X, T ) has the trivial topology and that S ⊆ X is both open and dense. What is

S?

♦ Query 3.2. Suppose (X, T ) has the discrete topology and that S ⊆ X is both open and dense. What

is S?

3.3.3 Compactness

Definition 3.26. Suppose (X, T) is a topological space with S ⊆ X. Then a class of open sets Cₐ ∈ T, for a in some index set A, is called an open cover of S if S ⊆ ∪_{a∈A} Cₐ. A subclass of an open cover which is itself an open cover is called a subcover.

Definition 3.27. A compact space is a topological space in which every open cover has a finite subcover.

Theorem 18. Suppose (X, d) is a metric space with the usual topology induced by the metric. Then

the set S ⊆ X is compact

• iff every infinite subset of S has a limit point. (The Bolzano-Weierstrass property)

• iff every sequence in S has a convergent subsequence. (The sequential compactness property)

Theorem 19 (Heine-Borel). A subset of Rn is compact under the usual topology iff it is closed and

bounded.

Theorem 20. Suppose (X, T ) is a topological space. Then

• Any closed subset of a compact set is compact.

• The union of a finite number of compact sets is compact.

Theorem 21 (Weierstrass). Suppose (X, T ) is a topological space and that f is a continuous mapping

from X into R. If S is a compact set, then f achieves both a maximum and a minimum in S.

3.4 Sigma Algebras and Measure Spaces

Definition 3.28. A measure space is a non-empty set, X, together with a non-empty class of subsets, S, which satisfies the following properties.

1. A ∈ S implies A^C = X \ A ∈ S.

2. Aᵢ ∈ S, for i = 1, 2, . . ., implies $\cup_{i=1}^{\infty} A_i \in S$.

A class of sets which satisfies these properties is called a σ -algebra and the members of S are called

the measurable sets in X. Further, if (X, S) is a measure space and (Y , T ) is a topological space, then

f : X → Y is called measurable iff f −1 (V ) ∈ S for every V ∈ T .

Problem 3.18. Show that if (X, S) is a measure space, then X ∈ S and ∅ ∈ S.

Problem 3.19. Show that if (X, S) is a measure space, then Aᵢ ∈ S for i = 1, 2, . . ., implies $\cap_{i=1}^{\infty} A_i \in S$.


3.5 Answers

Problem 3.3 on page 39. (i) |x − y| ≥ 0 and |x − y| = 0 iff x = y. (ii) |x − y| = |−(y − x)| = |y − x| and (iii) |x − y| + |y − z| ≥ |(x − y) + (y − z)| = |x − z|.

Problem 3.5 on page 39. Infinity since the supremum does not exist.

Problem 3.15 on page 44. Suppose (X, T) is a T₁-space and x ∈ X. For each y ≠ x there is an open set O_y with y ∈ O_y and x ∉ O_y. Then {x}^C = ∪_{y≠x} O_y is open as a union of open sets, so {x} is closed. Conversely, suppose each single-point set is closed and that x and y are distinct points in X. Since X is a topological space, Oₓ = {y}^C and O_y = {x}^C are open sets with x ∈ Oₓ, y ∈ O_y, x ∉ O_y and y ∉ Oₓ.


Chapter 4

Calculus

4.1 The First Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 Quadratic Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

4.3 The Second Differential . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.4 Convex and Concave Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

4.5 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

4.1 The First Differential

Linear functions have such pleasant properties relative to non-linear functions that one is tempted to erase the latter wherever they occur and replace them with the former. Under certain circumstances, such replacement is legitimate. Note, to begin with, that the number a and the linear function ax provide a good linear approximation of the function f : R → R at x ∈ R if the error involved in the approximation,

$$\varepsilon(v) \equiv f(x + v) - f(x) - av$$

is small relative to v:

$$\lim_{\|v\| \to 0} \frac{\varepsilon(v)}{\|v\|} = \lim_{\|v\| \to 0} \frac{f(x + v) - f(x) - av}{\|v\|} = 0$$

Similarly, a vector $\vec{a}$ and the associated linear function $\vec{a} \cdot v$ provide a good linear approximation of f : Rⁿ → R at the point x ∈ Rⁿ if the error involved in the approximation,

$$\varepsilon(v) \equiv f(x + v) - f(x) - \vec{a} \cdot v$$

is small relative to the norm of v:

$$\lim_{\|v\| \to 0} \frac{\varepsilon(v)}{\|v\|} = \lim_{\|v\| \to 0} \frac{f(x + v) - f(x) - \vec{a} \cdot v}{\|v\|} = 0$$

Generalizing slightly we have:

Definition 4.1. The m by n matrix A is a good linear approximation of f : Rⁿ → Rᵐ if

$$\lim_{\|v\| \to 0} \frac{\|\varepsilon(v)\|}{\|v\|} = \lim_{\|v\| \to 0} \frac{\|f(x + v) - f(x) - Av\|}{\|v\|} = 0$$


This definition subsumes as special cases the circumstances in which m = 1 and A = $\vec{a}$ or m = n = 1 and A = a.

Two questions remain regarding the use of linear approximations. Which functions have good linear

approximations? When such approximation exists, how can they be identified? Differential calculus

was invented, in no small part, to provide the answers to these questions.

Definition 4.2. Suppose f : Rⁿ → Rᵐ is a vector-valued function with the real-valued components f = (f¹, . . . , fᵐ), fⁱ : Rⁿ → R. The function f is differentiable at a point x iff there is a matrix A s.t.

$$\lim_{\|v\| \to 0} \frac{\|f(x + v) - f(x) - Av\|}{\|v\|} = 0$$

in which case the matrix A is called the derivative at x and the linear transformation Av is called the differential at x. The function is differentiable iff it is differentiable at every point in the domain.

The requirement for f to be differentiable at x is, of course, precisely the same as the requirement

for the linear transformation Av to provide a good approximation of f at x. “Differentiable” is thus

synonymous with “has a good linear approximation” and “differential” is synonymous with “best linear

approximation”.

As special cases note that when m = n = 1, the matrix A has but a single element

$$a = f'(x) = \frac{df(x)}{dx}$$

the "derivative of f evaluated at x". Geometrically, a is the slope of f(x) at x and av is the equation of the "tangent hyperplane".

Problem 4.1. Using Mathematica, plot the tangent to the graph of

at x = 1.6.

Problem 4.2. Using Mathematica, determine the critical points of the function specified in Prob-

lem 4.1, i.e., the points on the graph of f where df /dx = 0.

This is illustrated in Figure 4.1 for the case in which f(x) = ln(x), x = 2 and

$$\ln(2 + v) = \ln(2) + \frac{1}{2}v + \mu(v)$$

[Figure 4.1: Best Linear Approximation]

When m = 1 but n > 1, the matrix A has but a single row

$$\vec{a} = f_x(x) = \left( \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right)$$

which is called the gradient of f : Rⁿ → R evaluated at x.

Consider, for example, the case in which f(x) = x1² + x2, x = (2, 3), f(2, 3) = 7 and fx(2, 3) = (4, 1). The graph of this function would be a surface in R³ and the differential would be a hyperplane tangent to this surface at the point (2, 3, 7).
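A quick check of this gradient in Mathematica, using D for the partial derivatives:

f = x1^2 + x2;
{D[f, x1], D[f, x2]} /. {x1 -> 2, x2 -> 3}   (* {4, 1} *)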


The gradient itself is best illustrated in R² — see Figure 4.2. The parabola represents a level contour for f, i.e., the inverse image of a particular point in the range. The gradient at the point x = (2, 3) illustrates three very important facts about the gradient: (i) It is orthogonal to a tangent to the level contour at the point at which it is evaluated. (ii) It points in the direction in which the function increases at the greatest rate, i.e., the direction of steepest ascent. (iii) Its length is equal to the slope of the function in the direction of steepest ascent.

The latter fact follows from the fact that f(x + v) ≈ f(x) + fx · v. Now take v to be a unit length movement in the direction fx so that v = fx/‖fx‖, substitute for v and rearrange to get

$$f\left( x + \frac{f_x}{\|f_x\|} \right) - f(x) \approx f_x \cdot \frac{f_x}{\|f_x\|} = \|f_x\|$$

[Figure 4.2: The Gradient of x1² + x2 at x = (2, 3)]

Recall now that when you graph the function f : R → R corresponding to y = x², for example, you are drawing a picture of the set {(x, y) ∈ R² | y = x²}. Similarly

Definition 4.3. The graph of f : Rⁿ → R is the set {(x, y) ∈ Rⁿ⁺¹ | y = f(x)}.

Problem 4.3. Describe the graph of y = a · x where a ∈ Rⁿ, i.e., the set G ≡ {(y, x) ∈ Rⁿ⁺¹ | y = a · x}. What is the gradient of this map? What is the value of y when x = a/‖a‖? Is G a linear subspace? Is G^C open and dense in the usual topology?

Problem 4.4. Suppose f (x1 , x2 ) = x1 +ln(1+x2 ). Calculate and illustrate fx (x) for x = (0, 0), (1, 0), (0, 1)

and (0, −1). Hint: In Mathematica Log[x] is used to compute the natural log of x.

In general, when f : Rⁿ → Rᵐ:

$$A = f_x(x) = \begin{bmatrix} \partial f^1(x)/\partial x_1 & \cdots & \partial f^1(x)/\partial x_n \\ \vdots & \ddots & \vdots \\ \partial f^m(x)/\partial x_1 & \cdots & \partial f^m(x)/\partial x_n \end{bmatrix}$$

is called the Jacobian of f. Notice that the ith row of the Jacobian is simply the gradient of the ith component of f.

Problem 4.5. Suppose that f : Rn → Rm . Show that if f is differentiable at x then it must also be

continuous at x.

4.2 Quadratic Forms

Having done the best we can with a linear approximation, it is natural to ask what improvement might be possible with a quadratic approximation, i.e., a mapping Q : Rⁿ → R where

$$Q(x) = x^T B x = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij} x_i x_j$$

and B is an n by n matrix.


Problem 4.6. Show that for any n by n matrix B

$$x^T B x \equiv x^T \left[ \frac{B + B^T}{2} \right] x, \quad \forall x$$

and thus that B can be taken to be a symmetric matrix, bᵢⱼ = bⱼᵢ, without any loss of generality.
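A one-line numerical check of this identity in Mathematica, with a hypothetical non-symmetric B:

B = {{1, 4}, {0, 3}}; x = {x1, x2};
Expand[x.B.x] == Expand[x.((B + Transpose[B])/2).x]   (* True *)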

Such quadratic forms have a number of interesting properties. Note first that if λ ∈ R then

$$Q(\lambda x) = (\lambda x)^T B (\lambda x) = \lambda^2 x^T B x = \lambda^2 Q(x)$$

Definition 4.4. A function f : Rⁿ → R is homogeneous of degree ρ iff

$$f(tx) = t^{\rho} f(x), \quad \forall t \ne 0$$

The fact that quadratic forms are homogeneous of the second degree means that their graphs are symmetric about the y-axis since

$$Q(-x) = (-1)^2 Q(x) = Q(x)$$

• The quadratic form is positive definite iff Q(x) > 0 for all x ≠ 0 and positive semi-definite iff Q(x) ≥ 0 for all x.

• The quadratic form is negative definite iff Q(x) < 0 for all x ≠ 0 and negative semi-definite iff Q(x) ≤ 0 for all x.

• The quadratic form is indefinite iff Q(x′) > 0 for some x′ and Q(x″) < 0 for some x″.

These possibilities can be nicely illustrated for the case in which n = 2. The positive definite quadratic form associated with

$$B = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}$$

is illustrated in Figure 4.3. Note that the graph is symmetric about the y-axis and lies strictly above the x-plane everywhere other than the origin. Cross sections taken parallel to the x-plane are either ellipses when y > 0, a single point when y = 0 or empty when y < 0. Other positive definite quadratic forms will have similar graphs, i.e., "bowls" which touch the x-plane only at the origin and lie strictly above it everywhere else. Cross sections when y > 0 will either be ellipses or, in degenerate cases, circles.

[Figure 4.3: Positive Definite]

Since −Q(x) is a negative definite quadratic form iff Q(x) is positive definite, we don't need a new picture for negative definite quadratic forms — simply turn Figure 4.3 upside down.


Figure 4.4 illustrates the graph of the positive semi-definite quadratic form associated with

$$B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$

Again the graph is symmetric about the y-axis and lies on or above the x-plane everywhere other than the origin. Here, however, the graph actually coincides with the x-plane along a line. Cross sections taken parallel to the x-plane are either pairs of parallel lines when y > 0, a single line when y = 0 or empty when y < 0. A degenerate possibility for a positive semi-definite quadratic form is that its graph coincides with the x-plane everywhere — the case in which every element of B is equal to zero. Other possibilities are similar to Figure 4.4 — "troughs" which coincide with the x-plane along a line and lie strictly above it elsewhere.

[Figure 4.4: Positive Semi-Definite]

Once again we need not bother with a graph for the negative version. Since −Q(x) is negative semi-definite iff Q(x) is positive semi-definite, simply turn Figure 4.4 upside down for an illustration of a negative semi-definite quadratic form.

Only the case of an indefinite quadratic form remains and this is illustrated in Figure 4.5 for the case in which

$$B = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$

As always, the graph is symmetric about the y-axis. Here the graph sometimes lies above the x-plane and sometimes lies below it. Cross sections taken parallel to the x-plane are hyperbolas when y ≠ 0 and crossing lines when y = 0 [x2 = ±x1]. Other indefinite quadratic forms are similar — "saddles" with regions both above and below the x-plane.

[Figure 4.5: Indefinite]

4.3 The Second Differential

Definition 4.5. A vector a and an n by n matrix B provide a good quadratic approximation of f : Rⁿ → R at the point x if the error involved in the approximation,

$$\mu(v) \equiv f(x + v) - f(x) - a \cdot v - \frac{1}{2} v^T B v$$

is small relative to ‖v‖²:

$$\lim_{\|v\| \to 0} \frac{\mu(v)}{\|v\|^2} = \lim_{\|v\| \to 0} \frac{f(x + v) - f(x) - a \cdot v - \frac{1}{2} v^T B v}{\|v\|^2} = 0$$


This means that the quadratic form, v^T B v, is a good (relative to ‖v‖²) approximation of the error left from the linear approximation. [A good quadratic approximation of f : Rⁿ → Rᵐ would similarly require a vector aⁱ and a matrix Bⁱ for each component fⁱ.]

As might be suspected, the second derivative is the key to obtaining the quadratic approximation.

Definition 4.6. f : Rⁿ → R is twice differentiable at x iff ∃ a ∈ Rⁿ and an n by n matrix B s.t.

$$\lim_{\|v\| \to 0} \frac{f(x + v) - f(x) - a \cdot v - \frac{1}{2} v^T B v}{\|v\|^2} = 0$$

in which case B is called the second derivative and the quadratic form ½ v^T B v is called the second differential. The function is twice differentiable iff it is twice differentiable at every point in the domain.

As special cases note that when n = 1 the matrix B has but a single element

$$b = f''(x) = \frac{d^2 f(x)}{dx^2}$$

the "second derivative of f(x) evaluated at x". Geometrically, b is the slope of f′(x) at x — the rate at which the slope of f changes at x — or the curvature of f.

When n > 1 the n by n matrix B has the second partial derivatives of f evaluated at x as its elements

$$B = f_{xx}(x) = \begin{bmatrix} \partial^2 f(x)/\partial x_1^2 & \cdots & \partial^2 f(x)/\partial x_1 \partial x_n \\ \vdots & \ddots & \vdots \\ \partial^2 f(x)/\partial x_n \partial x_1 & \cdots & \partial^2 f(x)/\partial x_n^2 \end{bmatrix}$$

The role of the second differential in approximating the error left from the linear approximation can be illustrated in the context of the example of Figure 4.1 on page 48. The error from the linear approximation of ln(x), ε(v), is plotted in Figure 4.6 together with the best quadratic approximation of this error, namely, the second differential

$$f(x + v) - f(x) - f'(x)v = \frac{1}{2!} f''(x) v^2 + \mu(v) \quad\text{or}$$

$$\ln(2 + v) - \ln(2) - \frac{1}{2}v = -\frac{1}{2} \cdot \frac{1}{4} v^2 + \mu(v)$$

[Figure 4.6: Best Quadratic Approximation]

Problem 4.7. Suppose $g(x_1, x_2, x_3) = x_1^{1/6} x_2^{1/3} x_3^{1/2}$. Calculate gx(x) and gxx(x) for x = (1, 8, 9). What is the equation of the hyperplane that is tangent to S = { x ∈ R³ | g(x) = 6 } at x = (1, 8, 9)?

Hint: In Mathematica the gradient and hessian of a function can be computed using the following:

gradient[f_,x_List]:=Map[D[f, #] &, x]

hessian[f_,x_List]:=Outer[D,gradient[f,x],x]

f = x1^(1/6) x2^(1/3) x3^(1/2);

x = {x1,x2,x3}

gradient[f,x]

hessian[f,x]//MatrixForm


To be twice differentiable at a point is to have a good quadratic approximation at that point — the

second derivative is the quadratic term of the approximation. Twice differentiable functions are also

differentiable, of course, since the definition requires finding the first derivative as well. The first

derivative not only exists if f is twice differentiable at x, it is continuous as well. Putting matters

somewhat loosely, continuity of a function implies an absence of breaks in its graph. To be differen-

tiable implies an additional smoothness — an absence of kinks — since the graph must have unique

tangents wherever it is differentiable. To be twice differentiable means that the graph of the derivative

has no kinks and the function is smoother still. Pressing on, mathematicians use the term smooth to

describe a function which is differentiable at all orders.

Problem 4.8. Suppose f : R → R. Give an example of an f that is smooth. Give an example of an f

that is differentiable but not smooth.

If you're guessing that a cubic approximation could be used to approximate the error left from the linear and quadratic approximations and that this would be based on the third derivative, you're exactly right. Continuing the example from Figure 4.1 on page 48 and Figure 4.6 on the previous page, the error from the linear and quadratic approximations of ln(x) is plotted in Figure 4.7 together with the best cubic approximation of this error, namely, the third differential

$$f(x + v) - f(x) - f'(x)v - \frac{1}{2!} f''(x) v^2 = \frac{1}{3!} f'''(x) v^3 + \mu(v) \quad\text{or} \tag{4.1}$$

$$\ln(2 + v) - \ln(2) - \frac{1}{2}v + \frac{1}{2} \cdot \frac{1}{4} v^2 = \frac{1}{6} \cdot \frac{1}{4} v^3 + \mu(v)$$

[Figure 4.7: Best Cubic Approximation]

Rearranging Equation 4.1 gives the Taylor series approximation of order three:

$$f(x + v) = f(x) + f'(x)v + \frac{1}{2!} f''(x) v^2 + \frac{1}{3!} f'''(x) v^3 + \mu(v)$$

Problem 4.9. Series[f[x],{x,a,n}] is the Mathematica command for generating an nth order

Taylor series expansion of f [x] with respect to x at x = a. Use it to obtain the 4th order expansion

of 1/Sqrt[1+x] at x = 0.

4.4 Convex and Concave Functions

Convex and concave functions are particularly important in Microeconomics. A function is concave if and only if a line segment connecting any two points on the graph of the function lies on or below its graph:

Definition 4.7. The function f : X → R, where X is a convex subset of Rⁿ, is concave iff x, y ∈ X and 0 < a < 1 implies f(ax + (1 − a)y) ≥ af(x) + (1 − a)f(y).

A function is strictly concave if a line segment connecting any two distinct points on the graph of the function lies below the graph at all points other than the end points. See the left-hand panel of Figure 4.8 on the following page.


[Figure 4.8: A Strictly Concave Function (left) and a Strictly Convex Function (right)]

Definition 4.8. The function f : X → R, where X is a convex subset of Rⁿ, is strictly concave iff x, y ∈ X, x ≠ y and 0 < a < 1 implies f(ax + (1 − a)y) > af(x) + (1 − a)f(y).

The function f , on the other hand, is convex iff g(x) = −f (x) is concave. A convex function is

characterized by the fact that line segments connecting two points lie on or above the graph. Similarly,

f is strictly convex iff −f is strictly concave. A strictly convex function is illustrated in the right-hand

panel of Figure 4.8.

Problem 4.10. Show that if f : Rⁿ → R is a linear transformation then f is both concave and convex.

Theorem 22. Suppose that X is a convex subset of Rⁿ. Then

1. If f : X → R is concave, then f is continuous on the interior of X.

2. If

$$f(x) = \sum_{i=1}^{k} \alpha_i f_i(x)$$

where each fᵢ : X → R is concave and each αᵢ ≥ 0, then f is concave. [In other words, a non-negative linear combination of concave functions must also be concave.]

Problem 4.11. Illustrate the first part of Theorem 22 for f : R → R by showing that a discontinuous

function cannot be concave.

Problem 4.12. Prove the second part of Theorem 22.

Theorem 23. Hessians of Concave Functions. Suppose f (x) is a continuously twice differentiable real

valued function on an open convex set X in Rn with Hessian fxx (x). Then

1. The function f is concave (convex) if and only if fxx (x) is negative (positive) semi-definite for all

x ∈ X.

2. The function f is strictly concave (convex) if fxx (x) is negative (positive) definite for all x ∈ X.


Note that the "and only if" is missing from the second proposition. The function f(x) = −(x − 1)⁴, for example, is strictly concave but its Hessian is only negative semi-definite at x = 1 since f″(x) = −12(x − 1)² = 0 at x = 1 — see Figure 4.9.

[Figure 4.9: Strictly Concave but Negative Semi-Definite]

Finally, and perhaps most important to the current discussion:

Definition 4.9. Suppose f : X → R where X is a convex subset of Rⁿ. Then the set Lf(r) ≡ {z ∈ Rⁿ | f(z) ≥ r} ⊂ X is called a level set for f.

Theorem 24. If f : X → R is concave, then the level set Lf(r) is a convex set for any r ∈ R.

Problem 4.13. [Answer] Prove Theorem 24.

The converse of this proposition isn't true — a weaker condition than concavity, namely quasi-concavity, is sufficient to guarantee the convexity of Lf(r).

Definition 4.10. The function f : X → R, where X is a convex subset of Rⁿ, is quasi-concave iff ∀r ∈ R, the level set Lf(r) is a convex set and strictly quasi-concave iff Lf(r) is a strictly convex set.

An equivalent characterization:

Definition 4.11. The function f : X → R where X is a convex subset of Rⁿ is

1. quasi-concave iff x, y ∈ X and 0 < a < 1 implies f(ax + (1 − a)y) ≥ min{f(x), f(y)}, and

2. strictly quasi-concave iff x, y ∈ X, x ≠ y and 0 < a < 1 implies f(ax + (1 − a)y) > min{f(x), f(y)}.

A strictly quasi-concave function is illustrated in Figure 4.10. As you would expect, f is quasi-convex iff −f is quasi-concave.

[Figure 4.10: Strictly Quasi-Concave]

Problem 4.14. Show that any monotone increasing (or decreasing) function is quasi-concave if X ⊂ R. [f is a monotone increasing (decreasing) function iff x′ ≥ x implies f(x′) ≥ (≤) f(x).]

Problem 4.15. Show that any monotone increasing (or decreasing) function is quasi-convex if X ⊂ R.

Problem 4.16. Suppose f : R → R is single-peaked, i.e., there exists an x* ∈ R such that f is monotone increasing for x ≤ x* and monotone decreasing for x ≥ x*. Is f quasi-concave? Is f concave?

It is an unfortunate fact that quasi-concave functions share few of the other nice properties of concave functions. Quasi-concave functions are not necessarily continuous — f(x) = 1 if x = 0 and f(x) = 0 if x ≠ 0 with x ∈ R is, for example, quasi-concave but neither concave nor continuous. Non-negative linear combinations of quasi-concave functions, moreover, are not necessarily quasi-concave. The functions x² and (x − 1)² for x ∈ X ≡ [0, 1], for example, are both quasi-concave (but not concave) and yet ½x² + ½(x − 1)² is not quasi-concave. This is illustrated in Figure 4.11 on the following page.


Lastly, consider the bordered Hessian:

$$B(x) \equiv \begin{bmatrix} 0 & \partial f(x)/\partial x_1 & \cdots & \partial f(x)/\partial x_n \\ \partial f(x)/\partial x_1 & \partial^2 f(x)/\partial x_1^2 & \cdots & \partial^2 f(x)/\partial x_1 \partial x_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial f(x)/\partial x_n & \partial^2 f(x)/\partial x_n \partial x_1 & \cdots & \partial^2 f(x)/\partial x_n^2 \end{bmatrix}$$

[Figure 4.11: Non-Negative Combinations of Quasi-Concave Functions]

and let Bₖ(x) denote the determinant of the bordered Hessian formed from the first k variables:

$$B_k(x) \equiv \begin{vmatrix} 0 & \partial f(x)/\partial x_1 & \cdots & \partial f(x)/\partial x_k \\ \partial f(x)/\partial x_1 & \partial^2 f(x)/\partial x_1^2 & \cdots & \partial^2 f(x)/\partial x_1 \partial x_k \\ \vdots & \vdots & \ddots & \vdots \\ \partial f(x)/\partial x_k & \partial^2 f(x)/\partial x_k \partial x_1 & \cdots & \partial^2 f(x)/\partial x_k^2 \end{vmatrix}$$

A necessary condition for f to be quasi-concave is that B₂(x) ≥ 0, B₃(x) ≤ 0, . . ., (−1)ⁿBₙ(x) ≥ 0 for all x ∈ Rⁿ — not so nice! It is fortunate that second order characterizations of concave and convex functions prove to be much more important than those for their recalcitrant "quasi" cousins.

4.5 Answers

Problem 4.13 on page 55. Suppose x, y ∈ Lf(r). Then f(x), f(y) ≥ r. But concavity then implies that f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) ≥ λr + (1 − λ)r = r and thus λx + (1 − λ)y ∈ Lf(r).


Chapter 5

Optimization

5.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.2 The Well Posed Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Comparative Statics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

5.1 Optimization

Virtually all of microeconomics is predicated upon the idea that the (economic) behavior of people

can be explained as an attempt on their part to maximize (or minimize) something subject to some

constraints. This view does not necessarily ascribe much intelligence to the people whose behavior is

to be described — the behavior of a rock falling off a cliff can be described, after all, as an attempt on

the part of the rock to minimize its potential energy. It simply means that we microeconomic theorists

always think we can cook up models based on optimizing behavior which will capture the important

aspects of actual behavior.


5.1.1 Unconstrained Optimization

The simplest optimization problem involves no constraints. Here f : Rⁿ → R is the objective function and the objective, accordingly, is to find that x* which makes f(x*) as large (or as small) as possible:

$$x^* \equiv \begin{cases} \arg\max_x f(x) & \text{if } f(x^* + v) - f(x^*) \le 0, \ \forall v \\ \arg\min_x f(x) & \text{if } f(x^* + v) - f(x^*) \ge 0, \ \forall v \end{cases}$$

Here the terms arg max and arg min refer to the "arguments" which solve the respective optimization problem.

Now if f is differentiable and fx (x ∗ ) is its gradient evaluated at the point x ∗ then

f (x ∗ + v) − f (x ∗ ) ≈ fx (x ∗ )v

Thus fx (x ∗ ) = 0 is a necessary condition either for a maximum or a minimum — if it were not zero

then f would increase for some choice of v̂ (not a maximum) and fall for −v̂ (not a minimum either).

The requirement that the gradient "vanish", fx(x*) = 0, entails solving the system of n equations:

$$\frac{\partial f}{\partial x_1}(x_1^*, \ldots, x_n^*) = 0$$
$$\frac{\partial f}{\partial x_2}(x_1^*, \ldots, x_n^*) = 0$$
$$\vdots$$
$$\frac{\partial f}{\partial x_n}(x_1^*, \ldots, x_n^*) = 0$$

for the n unknown components of x*. Since this requirement involves the first differential and is a necessary condition it is called the first order necessary condition.

Another necessary condition involves the second differential. If f is twice differentiable with second

derivative fxx (x)

1 T

f (x ∗ + v) − f (x ∗ ) ≈ fx (x ∗ )v + v fxx (x ∗ )v

2

1 T

≈ v fxx (x ∗ )v when fx (x ∗ ) = 0

2

Moreover, since

(

∗ ∗ ≤0

f (x + v) − f (x )

≥0 if fxx (x ∗ ) is positive semidefinite

Theorem 25. For x ∗ to impart a maximum (minimum) to f (x) it must be the case that fxx (x ∗ ) is

negative (positive) semidefinite.
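Continuing the sketch above, the second order condition can be checked at the candidate point by examining the eigenvalues of the (hypothetical) objective's Hessian:

f = -(x1 - 1)^2 - (x2 - 2)^2 + x1 x2;            (* same hypothetical objective *)
h = {{D[f, x1, x1], D[f, x1, x2]}, {D[f, x2, x1], D[f, x2, x2]}};
Eigenvalues[h]
(* {-3, -1}: both negative, so fxx is negative definite -- consistent with a maximum *)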


It is sufficient that fxx(x*) be negative definite for a maximum or positive definite for a minimum, but sufficient conditions aren't very important in Microeconomics. The reason is that economic models typically involve the optimization hypothesis that the person whose behavior is being modeled behaves as if he/she were trying to maximize (or minimize) an objective function subject to constraints. Call this optimizing model M. A necessary condition, N, which must be true in order for M to be true provides a testable hypothesis — see Figure 5.1 — if evidence suggests that N is false then M must be false as well. Sufficient conditions, on the other hand, do not provide testable hypotheses — S, for example, is sufficient for M in Figure 5.1 but evidence that S is false does not imply that M is false.

[Figure 5.1: Necessary and Sufficient Conditions for M]

Problem 5.1. In Mathematica the command Maximize[f,{x,y,...}] maximizes f with respect to x, y, . . . . Describe the output from

Plot[f,{x,-2.5,2.5}]

Maximize[f,x]//N

The generic constrained optimization problem is

maxx f(x)
s.t. gi(x) ≥ 0, i = 1, . . . , m

The requirements that x must satisfy g i (x) ≥ 0, i = 1, . . . , m, are called constraints and the problem

is to find that x ∗ which satisfies the constraints and which imparts the largest value to the objective

function, i.e., x ∗ solves this problem if and only if

• g(x∗) ≥ 0, and
• there is no x̂ satisfying g(x̂) ≥ 0 for which f(x̂) > f(x∗).

There are exactly two possibilities for the solution to this problem, x∗:

• There is no x̂ for which f (x̂) > f (x ∗ ). Here the constraints are “not binding” — can be erased

without affecting the solution to the problem. Here the unconstrained conditions hold.

• There is an x̂ for which f(x̂) > f(x∗) but, of course, g(x̂) ≱ 0. Here the constraints are binding

— prevent one from doing as well as would be possible without them — and the unconstrained

conditions do not necessarily hold.

Focusing upon the second possibility for the moment, notice that if the objective function, f , and the

binding constraints, ĝ, are differentiable at x ∗ then

f (x ∗ + v) − f (x ∗ ) ≈ fx (x ∗ )v


and

ĝ(x∗ + v) − ĝ(x∗) ≈ ĝx(x∗)v

where ĝx (x ∗ ) denotes the matrix whose rows are the gradients of those constraints which are binding

at x∗. For a maximum it is intuitive that the inequalities

ĝx(x∗)v ≥ 0
fx(x∗)v > 0        (5.1)

can have no solution v, for if such a v existed then:

1. For small v, the movement from x ∗ to x ∗ + v would satisfy the constraints. This follows from

the fact that the components of ĝ(x ∗ ) are all equal to zero, all other constraints are strictly

positive and the (small) movement to x ∗ + v would thus leave all constraints non-negative.

2. For small v, the movement from x ∗ to x ∗ + v would impart a larger value to the objective

function.

This intuition is basically correct save for pathological cases — see Arrow, Hurwicz and Uzawa for a

discussion of constraint qualifications which are sufficient to eliminate such problems. [The original

Kuhn-Tucker constraint qualification was that the gradients of the binding constraints must be linearly

independent.]

Recall now that Farkas’ Lemma (Theorem 6 on page 31) says that if A is an m by n matrix and b 6= 0

is a 1 by n row vector then exactly one of the following is true:

• yA = b has a solution y ≥ 0, or
• Az ≥ 0 and bz < 0 has a solution z.

Now make the association that A = ĝx(x∗), b = −fx(x∗), z = v, y = λ and note that b = −fx(x∗) ≠ 0 by virtue of the fact that we are focusing upon the case in which the unconstrained, vanishing gradient, condition does not hold. Then either

λĝx(x∗) = −fx(x∗), λ ≥ 0        (5.2)

has a solution or Equation 5.1 has a solution. But since there can be no solutions to Equation 5.1 if x∗ is to solve the maximization problem, it follows from Farkas' Lemma that Equation 5.2 must have a solution.

Equation 5.2 is illustrated for a simple maximization problem in Figure 5.2 on the next page. Here there are three constraints and the problem is to

maxx x1 + x2        (5.3)
s.t. g1(x) = (1, 0) · (x1, x2) ≥ 0
     g2(x) = (0, 1) · (x1, x2) ≥ 0
     g3(x) = 1 + (−1/3, −1) · (x1, x2) ≥ 0

This is an example of a linear programming problem — a problem in which both the objective function

and the constraints are linear.

Note first that the shaded area corresponds to g(x) ≥ 0 — the set of x’s which satisfy all three

constraints. By inspection it can be seen that x ∗ = (3, 0) solves the problem since:


[Figure 5.2: the feasible set and the gradients fx = (1, 1), −fx = (−1, −1), g1x = (1, 0), g2x = (0, 1) and g3x = (−1/3, −1), plotted with tails at (3, 0)]

• x∗ = (3, 0) satisfies all three constraints (belongs to the shaded area) and imparts the value fx · x∗ = 3 to the objective function.

• x’s which make the objective function larger than 3 do not belong to the shaded area.

That this point satisfies Equation 5.2 on the previous page can be seen by plotting, with their tails at

(3, 0), the gradients of the objective function and each of the three constraints. Note that −fx belongs

to the cone generated by g 2 and g 3 , the gradients of the constraints which are actually binding at

(3, 0), since it is possible to express −fx as a positive linear combination of g 2 and g 3 . These positive

weights provide the values of the components of λ in the solution to λĝx = −fx with λ > 0:

" #

h i 0 1

λ2 λ3 = (−1, −1)

−1/3 −1

or h i h i

λ2 λ3 = 2 3
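The same weights can be recovered in Mathematica, a quick sketch assuming the binding gradients above (λĝx = −fx is equivalent to ĝxᵀλᵀ = −fxᵀ):

gxhat = {{0, 1}, {-1/3, -1}};             (* rows: the gradients of g2 and g3 *)
LinearSolve[Transpose[gxhat], {-1, -1}]   (* {2, 3}: the multipliers λ2 and λ3, both positive *)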

Problem 5.2. For the problem of Figure 5.2 show geometrically why the point (0, 1) does not sat-

isfy Equation 5.2 on the previous page. [Recall that ĝx (0, 1) contains only the gradients of the con-

straints which are binding at (0, 1).]

If we define the Lagrangian function

L(x, λ) ≡ f(x) + Σ_{i=1}^{m} λi gi(x)

then, with slight modification, the “Farkas’ Lemma” requirement for a maximum, Equation 5.2 on

the previous page, becomes the Kuhn-Tucker Conditions which, given the constraint qualification, are

necessary for a solution to the maximization problem:

Lx(x∗, λ∗) = 0        (5.4)
λ∗ · Lλ(x∗, λ∗) = 0   (5.5)
Lλ(x∗, λ∗) ≥ 0        (5.6)
λ∗ ≥ 0                (5.7)


Equation 5.4 on preceding page, Equation 5.5 on preceding page and Equation 5.7 on preceding page

are the solution to Equation 5.2 on page 60 save for the fact that λ is now allowed to be equal to zero

— this to incorporate the case in which none of the constraints is binding. Equation 5.6 on preceding

page is merely a restatement of the constraints. Equation 5.5 on preceding page and Equation 5.7

on preceding page form the so-called complementary slackness condition which prevents non-binding

constraints from playing a role in Equation 5.4 on preceding page. Equation 5.5 on preceding page

says that

Σ_{i=1}^{m} λ∗i gi(x∗) = 0

Since each term in this summation is the product of two non-negative numbers (Equation 5.6 on preceding page and Equation 5.7 on preceding page), it follows that gi(x∗) > 0 implies λ∗i = 0 and, conversely, λ∗j > 0 implies gj(x∗) = 0. If the ith constraint is not binding, gi(x∗) > 0, then the associated Lagrangian multiplier, λ∗i, must be equal to zero. Conversely, if λ∗i > 0 then the corresponding constraint must be binding, gi(x∗) = 0. Thus Equation 5.4, Equation 5.5 and Equation 5.7 on preceding page are equivalent to Equation 5.2 on page 60.

Problem 5.3. Is it possible for both λ∗j = 0 and gj(x∗) = 0 to hold in an optimal solution? If so,

provide an example of an optimization problem in which this would be true. Otherwise show why this

is not possible.

The complementary slackness condition is consistent with the shadow price interpretation of the La-

grangian multipliers in which λi is the value of relaxing the ith constraint. This can be expressed more

precisely as follows. Introduce a vector of parameters, a = (a1 , . . . , am ) into the generic optimization

problem as follows

x∗(a) ≡ arg maxx f(x)
s.t. g1(x) + a1 ≥ 0
     ⋮
     gm(x) + am ≥ 0

The presence of the a’s in the constraints means that the optimal solution(s) will depend upon which

values are selected for these a’s. The notation x ∗ (a) reflects this functional dependence — it is simply

the set of x’s which solve the problem for the given a. Now the value of the objection function also

depends upon a. This can be expressed by

F(a) ≡ maxx f(x)
s.t. g1(x) + a1 ≥ 0
     ⋮
     gm(x) + am ≥ 0

Notice that F (a), unlike x ∗ (a), is necessarily a single valued function — there can be but one value for

the maximized objective function. [In the previous example F (a) = a.] Provided that it is differentiable

then

λi = ∂F(0)/∂ai        (5.8)

represents the rate at which the (optimized) value of the objective function increases as the ith con-

straint is relaxed or as ai is increased from zero.
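A sketch which checks Equation 5.8 numerically on a hypothetical problem of mine (not from the text): maximize −(x − 2)² subject to 1 − x + a ≥ 0, so the constraint binds, F(a) = −(a − 1)² and the multiplier at a = 0 is 2:

F[a_?NumericQ] := First[Maximize[{-(x - 2)^2, 1 - x + a >= 0}, x]]
(F[0.001] - F[0])/0.001    (* approximately 2, the value of λ at a = 0 *)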



Problem 5.4. In Mathematica the command Maximize[{f,cons},{x,y,...}] maximizes f with re-

spect to x,y,... subject to the constraints, cons. How well does Mathematica do with the illustrative

problem in Equation 5.3 on page 60?

Maximize[{x1+x2, x1>=0, x2>=0, 1-x1/3-x2>=0},{x1,x2}]

The Kuhn-Tucker first-order conditions can themselves be used to derive some results related to the

second-order conditions.

Theorem 26. If A is symmetric and if either

x0 = arg maxx xᵀAx s.t. xᵀx ≤ 1   and   λ0 = maxx xᵀAx s.t. xᵀx ≤ 1

or

x0 = arg minx xᵀAx s.t. xᵀx ≥ 1   and   λ0 = minx xᵀAx s.t. xᵀx ≥ 1

then x0 is a characteristic vector of A and λ0 is the associated characteristic root (the largest root for the maximization problem and the smallest for the minimization problem).

Problem 5.5. Show, for any symmetric n by n matrix A, that the gradient of the associated quadratic form is given by

∂(xᵀAx)/∂x = 2Ax

The Lagrangians for the two problems are

Lmax = xᵀAx + λ[1 − xᵀx]
Lmin = −xᵀAx + λ[xᵀx − 1]

Thus

−Lmin = Lmax

and in either case a necessary condition is that

Lx = 2Ax − 2λx = 0

or

Ax = λx


Since an optimal x must satisfy the characteristic equation it follows that an optimal x must be a

characteristic vector of A and the Lagrangian multiplier λ must be the associated characteristic root.

Pre-multiply both sides of the last expression by xᵀ to obtain

xᵀAx = λxᵀx = λ · 1 = λ

Thus the Lagrangian multiplier/characteristic root is, in fact, equal to the optimized value of the

objective function.
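This is easy to check in Mathematica, a sketch with a hypothetical symmetric matrix:

m = {{2, 1}, {1, 2}};
Eigenvalues[m]                                            (* the roots are 3 and 1 *)
Maximize[{{x, y} . m . {x, y}, x^2 + y^2 <= 1}, {x, y}]
(* the maximized value is 3, the largest characteristic root *)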

[Figure 5.3: the graph of the quadratic form xᵀAx over the unit circle]

Theorem 26 on preceding page is illustrated in Figure 5.3 for the case in which

A = [ 1    0 ]
    [ 0   −1 ]

The characteristic roots and associated characteristic vectors for this indefinite quadratic form are

λ1 = 1;  x1 = (1, 0) or (−1, 0)
λ2 = −1; x2 = (0, 1) or (0, −1)

Moving around the unit circle, xᵀx = 1, the point x1 (x2) corresponds to the point at which xᵀAx is furthest above (below) the x-plane and λ1 (λ2) is the actual distance from the x-plane to the graph of xᵀAx at this point.

Theorem 27. It is necessary and sufficient for the symmetric matrix A to be

1. a positive definite quadratic form that all the characteristic roots of A be positive.

2. a positive semi-definite quadratic form that all the characteristic roots of A be non-negative.

3. an indefinite quadratic form that at least one of the characteristic roots of A be positive and at

least one be negative.


Note that it is sufficient to explore the “unit circle” {x ∈ Rn | xᵀx = 1} since the sign of the quadratic form at an arbitrary point x ≠ 0 must be the same as the sign at the point x/|x|

(x/|x|)ᵀ A (x/|x|) = (1/|x|²) xᵀAx,  ∀x ≠ 0

and the latter point is on the unit circle. Suppose, for example, that A is positive definite. Then since xᵀAx > 0 for all x ≠ 0 it follows that

λ ≡ minx { xᵀAx s.t. xᵀx ≥ 1 } > 0

In view of Theorem 26 on page 63, λ must be the smallest of all of the characteristic roots and it follows

that all of the roots must be positive. Conversely, if all of the roots are positive then the minimization

problem has a strictly positive solution and the quadratic form is positive definite. A similar argument

holds if the quadratic form is positive semi-definite, save for the fact that the smallest root may

actually be equal to zero. Finally, if the quadratic form is indefinite then its minimum on the unit circle

will clearly be negative and its maximum will be positive and there will, accordingly, be characteristic

roots of both signs. Conversely, if there are both positive and negative characteristic roots then the

quadratic form takes on both positive and negative values and is therefore indefinite.
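Theorem 27 makes the classification mechanical. For the matrix of Figure 5.3, for example:

Eigenvalues[{{1, 0}, {0, -1}}]
(* one positive and one negative root, so the quadratic form is indefinite *)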

Finally, in view of Theorem 25 on page 58:

Theorem 28. Suppose f : Rn → R is continuously twice differentiable. For x∗ to impart a maximum

(minimum) to f (x) it must be the case that the characteristic roots of fxx (x ∗ ) are non-positive (non-

negative).

Problem 5.6. Consider the (unconstrained) problem of minimizing with respect to x and y the func-

tion

f = x^4 + 3 x^2 y + 5 y^2 + x + y;

Using Mathematica, first plot this function, then solve the minimization problem and finally check the

solution by examining the gradient and the characteristic roots (eigenvalues) of the hessian.

min = Minimize[f, {x, y}] // N

sol = Last[min]

gradient[f_, vars_List] := Map[D[f, #] &, vars]

hessian[f_, vars_List] := Outer[D, gradient[f, vars], vars]

gradient[f, {x, y}] /. sol

hes = hessian[f, {x, y}] /. sol

Eigenvalues[hes]

Comment on the output and indicate whether or not Mathematica’s solution satisfies the first and

second order conditions for a minimum.

Problem 5.7. Repeat Problem 5.6 this time for the function

f = x^3 + y^3 + 2 x^2 + 4 y^2 + 6

and using FindMaximum[f, {{x, 0}, {y, -8/3}}] to find a local maximum of f starting at x = 0

and y = -8/3, this time restricting the plot range to {x,-2.5,1}, {y,-3.5,1}. Are any warnings

issued by Mathematica regarding the solution? Is the solution actually a local maximum, a local

minimum or a local saddle point?


Problem 5.8. Consider a “knapsack problem” (integer programming problem) in which five items are

available for packing in the knapsack. Let the integer xi = {0, 1} denote whether (1) or not (0) the

ith item is included. The benefits of including the items are, respectively, 14, 10, 15, 8 and 9 and the

corresponding weights are 6, 8, 5, 6 and 4. Which items should be included if the goal is to maximize

the total benefit subject to the constraint that the total weight not exceed 18? Hint: Use Mathematica:

Maximize[{14x1+10x2+15x3+8x4+9x5,

6x1+8x2+5x3+6x4+4x5 <= 18,

0<=x1<=1, 0<=x2<=1, 0<=x3<=1, 0<=x4<=1, 0<=x5<=1,

Element[{x1,x2,x3,x4,x5}, Integers]},

{x1,x2,x3,x4,x5}]

Problem 5.9. Mathematica uses the optimized command LinearProgramming[c,m,b] to solve the

linear programming problem of minimizing c · x subject to the constraints that mx ≥ b and x ≥ 0

where c and x are n-tuples, m is an m × n matrix and b is an m-tuple. Use it to solve the “transporta-

tion problem” in which there are three sources of supply with quantities 47, 36 and 52, respectively,

and four destinations with demands 38, 34, 29 and 34, respectively. In this problem x will be the

12-tuple x11,...,x34 where xIJ denotes the quantity shipped from source I to destination J. The

12-tuple of the corresponding costs, cIJ, of shipping one unit from source I to destination J is

c = {5, 7, 6, 10, 9, 4, 6, 7, 5, 8, 6, 6}

The problem is to minimize the total shipping cost of meeting the specified demands from the avail-

able sources. There are seven constraints, three specifying that total shipments from each of the three

sources cannot exceed the available amounts and another four specifying that the total shipments

to each of the four destinations cannot be less than the demands. To express the constraints in the

required ≥ format we then need, for example, the first row of m to be

{-1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0}

and the first component of b to be −47. Complete the specification of m and b and give the solution

from LinearProgramming[c,m,b]. What is the total shipping cost in the optimal solution? How many

of the constraints are binding in the optimal solution?

The typical “optimizing model of behavior” in microeconomics adds one final ingredient — parameters — to the constrained optimization problem:

x(a) ≡ arg maxx f(x, a)        (5.9)
s.t. gi(x, a) ≥ 0, i = 1, . . . , m

where

a ∈ Rp — the parameters or exogenous variables
x(a) — the reduced form, a point-to-set mapping

Note that this is a generic optimization problem in the sense that it incorporates:


1. Minimization: replace min h(x, a) with max f (x, a) where f (x, a) ≡ −h(x, a)

2. Arbitrary inequality constraints: replace w(x, a) ≤ z(x, a) with g(x, a) ≡ z(x, a) − w(x, a) ≥ 0

3. Equality constraints: replace h(x, a) = 0 with the two constraints g 1 (x, a) ≡ h(x, a) ≥ 0 and

g 2 (x, a) ≡ −h(x, a) ≥ 0

The vector x represents the endogenous variables whose values will be chosen, according to the model,

by some person as if to maximize the objective function subject to the constraints. The business of

the model is to predict the values of these variables — hence the term “endogenous”. The vector a

represents the parameters or exogenous variables whose values are not explained by the model but

rather are exogenously specified — the “dial settings” for the experiment. In consumption theory, for

example, a would correspond to the consumer’s income and the market prices of consumption goods

and x to the quantities of these consumption goods demanded. The mapping, x(a), would in this

case be called the consumer’s demand function.

The purpose of an optimizing model is to explain the choice of x in experiments which are charac-

terized by the given values of a or, in short, to ascertain the properties of the point-to-set mapping

x(a) — the term “point-to-set” reflects the fact that the image of a may be a set rather than a single

point. This mapping might be called the reduced form of the model. It is a sort of “bottom line” for the

model since it expresses the endogenous or explained variables directly as functions of the exogenous

or explanatory variables. To predict the result of an experiment simply stick the description of the

experiment, a, into this function and out pops the set of possible results, x(a).

Models can be more or less specific. In the typical example problem or exercise both the objective

function and constraints are explicitly stated and it is possible to determine the exact properties of

x(a) since it is generally possible to solve for the reduced form explicitly. In most “real” analysis,

on the other hand, neither the objective function nor the constraints are explicitly stated. Instead

qualitative properties of these functions are given and the analytical challenge becomes to trace these

qualitative properties through the optimization problem to discover what qualitative properties they

entail for the reduced form.

A “conservation of information” law holds for this deductive process — the more (or less) specific

the information about the objective function and constraints the more (or less) specific will be the

derived information about the reduced form. In a model in which all functions are explicitly stated,

for example, it might be possible to deduce that an increase in a given sales tax by five cents would

reduce the given consumer’s consumption of the taxed commodity by ten pints per week. In a less

specific model, on the other hand, it might only be possible to state that an increase in the sales tax

would entail a reduction (by some indeterminate amount) in the quantity consumed. Such results are

called comparative static effects and are typically expressed as propositions concerning the signs of

partial derivatives of the reduced form.

5.2 The Well Posed Problem

Certain potential problems of optimization models are best recognized and eliminated as early in the

process as possible:

1. Is it possible for x(a) to be empty, i.e, to contain no solutions whatever? Since this amounts

to the prediction that nothing is possible, this would be a rather serious, indeed fatal, flaw in a

model.

2. Is it possible for x(a) to contain more than one solution? The limiting case here would be

that anything is possible or that the model has no predictive value — it doesn’t rule out any

possibilities whatever.


3. Is it possible for a single valued x(a) not to be smooth, i.e. discontinuous or non-differentiable?

The “pre Chaos Theory” view of the world of actual experiments was that small changes in exoge-

nous parameters produce correspondingly small changes in endogenous variables — that x(a)

is smooth. The “post Chaos Theory” view allows for “butterfly effects” — e.g. the possibility

that the small change in air currents caused by a butterfly’s wings could cause a large change in

weather patterns. Whatever one’s view of such matters, computing comparative static effects is

certainly much easier when they can be obtained as (partial) derivatives.

The optimization problem will have no solutions only for the most poorly posed problems. A suggestive example is the problem of finding the largest real number which is strictly less than one. There is no solution for this constrained maximization problem nor is there one for the unconstrained problem of simply choosing the largest real number. The failures in both problems stem from the same

problem — the feasible set, the set of points which satisfy the constraint(s), is not compact.

Recall from Theorem 21 on page 45 that a continuous function must attain a maximum and a minimum

on a compact set and from Theorem 19 on page 45 that a subset of Rn will be compact if and only if

it is closed and bounded.

Note the failure of the feasible sets in the two examples. The set of real numbers strictly less than one

is not closed and the set of all real numbers is not bounded. Compactness turns out to be just the

right requirement for feasible sets in view of these two related facts.

Stock assumptions which assure that a problem has at least one solution are thus (i) a continuous

objective function and (ii) constraints which yield a compact feasible set.

Now it would also be nice if x(a) were a function and not a point-to-set mapping since it's nicer to have

a model which makes a unique prediction regarding behavior than one which merely enumerates a

collection of possibilities. The predictive power of the model is greater and comparative statics results,

which require functions, become possible. Unfortunately the “arguments” which solve maximization

problems need not generally be unique. There are, however, special classes of problems for which the

solutions are necessarily unique and, thus, for which x(a) is a function. One of the most important of

these involves problems whose objective functions and constraints are quasi-concave.

The connection of quasi-concavity to the uniqueness of solutions to Equation 5.9 on page 66 is imme-

diate. Suppose, for example, that the constraints, g i (x, a), are (strictly) quasi-concave functions in x.

Then the set of x’s which satisfy g i (x, a) ≥ 0 is simply a level set for the (strictly) quasi-concave func-

tion and thus a (strictly) convex set. The feasible set, the set of x’s which satisfy all of the constraints,

as an intersection of (strictly) convex sets must itself be (strictly) convex.

Now add the fact that the objective function, f (x, a), is quasi-concave in x and x(a) must then be a

convex set. This follows from two observations. If any two distinct points belong to x(a) then (i) both

points must belong to the feasible set and (ii) both points must impart the same value to the objective

function. It follows from the first and the convexity of the feasible set that any point along the line

segment connecting the two points must belong to the feasible set. It follows from the second and the

quasi-concavity of the objective function that any point along this line segment must impart a value

to the objective function which is at least as great as the value at the endpoints. But the value of the


objective function at any point along the line segment also cannot exceed the value at the end points

else the end points could not have belonged to x(a). Hence the value of the objective function must

be constant along the line segment and the entire line segment must then belong to x(a).

If the constraints are quasi-concave and the objective function is strictly quasi-concave then x(a) can

contain at most a single point. To see this suppose to the contrary that x, x 0 ∈ x(a) with x 6= x 0 and

pursue the argument of the previous paragraph. Any point along the line segment connecting x and

x 0 must be feasible and must now impart a value to the objective function which is, because of strict

quasi-concavity, strictly greater than either of the endpoints. Since this contradicts the assumption

that x 6= x 0 ∈ x(a) it cannot be the case that x(a) contains two or more points — either x(a) is

empty or it contains a unique solution and can be regarded as a function.

Optimization models in which the mapping x(a) is not only a function but differentiable as well are

particularly important in Economics. In such models comparative static effects are easily expressed as

the partial derivatives of x(a).

Suppose that in Equation 5.9 on page 66 the constraints are quasi-concave and the objective function

is strictly quasi-concave in x and differentiable in both x and a. Then the problem has a unique

solution (from strict quasi-concavity) which can be characterized by the Kuhn-Tucker Conditions (from

differentiability in x). The Lagrangian Function is

L(x, λ, a) ≡ f(x, a) + Σ_{j=1}^{m} λj gj(x, a)

and the Kuhn-Tucker conditions are

Lx(x(a), λ(a), a) = 0
λ(a) · Lλ(x(a), λ(a), a) = 0
Lλ(x(a), λ(a), a) ≥ 0
λ(a) ≥ 0

Now choose values for the exogenous variables, a = â, suppose the non-binding constraints have been

eliminated and write the Kuhn-Tucker conditions as

Lx(x̂, λ̂, â) = 0
g(x̂, â) = 0        (5.10)

where

L(x, λ, â) ≡ f(x, â) + λg(x, â)
g(x, â) ≡ [g1(x, â), . . . , gm(x, â)]

When Equation 5.10 is solved for x(â) will the result be differentiable? The Implicit Function Theorem

provides the answer to this question.

Theorem 29 (Implicit Function Theorem). Suppose hi(y, a), i = 1, . . . , l are real-valued and continuously differentiable functions on Rl × Rp with y ∈ Y, an open subset of Rl, and a ∈ A, an open subset of Rp. Let hi(ŷ, â) = 0, i = 1, . . . , l. Provided that the determinant of the Jacobian does not vanish, |hy(ŷ, â)| ≠ 0, there exists a continuously differentiable function y(a) such that ŷ = y(â) and hi(y(a), a) = 0, i = 1, . . . , l for all a in some neighborhood of â.


Intuitively, this theorem says that if you fix a = â you can regard h as a mapping which takes a point

y in Rl and produces another point, h(y, â), in Rl . If this mapping is differentiable it has a good linear

approximation given by the Jacobian hy. If this Jacobian is non-singular then the

linear approximation is invertible. This means that it’s possible to find out what y mapped into zero

when a = â, namely, ŷ = y(â). The theorem states that it is not only possible to find ŷ, but, locally

at least, that it is also possible to find the function y(a) and that this function will be differentiable.
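A minimal sketch of the theorem at work on a hypothetical example of mine: for h(y, a) = y³ + y − a the Jacobian hy = 3y² + 1 never vanishes, so y(a) exists everywhere and its derivative can be obtained by implicit differentiation:

eq = y[a]^3 + y[a] - a == 0;
Solve[D[eq, a], y'[a]]    (* {{y'[a] -> 1/(1 + 3 y[a]^2)}} *)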

5.3 Comparative Statics

To use the Implicit Function Theorem, let the first order conditions of Equation 5.10 on preceding

page correspond to the hi , let l correspond to n + m and let y correspond to (x, λ). Then under

the conditions of the implicit function theorem there is a continuously differentiable function y(a) =

[x(a), λ(a)] such that

Lx(x(a), λ(a), a) = 0
g(x(a), a) = 0        (5.11)

Since the Implicit Function Theorem assures that x(a) is differentiable, we can differentiate Equa-

tion 5.11 with respect to ak to obtain

[ Lxx  gxᵀ ] [ ∂x/∂ak ]   [ Lxak ]
[ gx    0  ] [ ∂λ/∂ak ] + [ Lλak ] = 0

Since the matrix on the left side of this expression (the Jacobian of the hi ’s) is assumed to be non-

singular, the inverse exists and we have

[ ∂x/∂ak ]      [ Lxx  gxᵀ ]⁻¹ [ Lxak ]
[ ∂λ/∂ak ] = − [ gx    0  ]    [ Lλak ]        (5.12)

Equation 5.12 is called the fundamental equation of comparative statics. It is apparent from this equation

that the inverse of the bordered Hessian or

H⁻¹ = [ Lxx  gxᵀ ]⁻¹
      [ gx    0  ]

is the key to comparative static results in the “classical approach”. In this approach second order nec-

essary conditions provide local information and concavity of f and the g i ’s provide global information

about this matrix.
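A sketch of the idea in the simplest, unconstrained, one-variable case (a hypothetical objective of mine, anticipating Problem 5.10):

f = -(x - a)^2;                              (* hypothetical objective *)
xstar = x /. First[Solve[D[f, x] == 0, x]]   (* a *)
{D[xstar, a], D[f, x, a] /. x -> xstar}      (* {1, 2}: both positive, so the signs agree *)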

Problem 5.10. Suppose that the unconstrained optimization problem, maxx f(x, a), has a unique

solution x ∗ (a) where x, a ∈ R and ∂ 2 f (x, a)/∂x 2 < 0.

1. Show that Sign(dx ∗ (a)/da) = Sign(∂ 2 f (x, a)/∂x∂a) where ∂ 2 f (x, a)/∂x∂a is evaluated at x =

x ∗ (a).

2. Interpret this result geometrically by plotting ∂f (x, a)/∂x for a given value of a in a graph with

x plotted on the horizontal axis.


5.3.2 The Envelope Theorem

An important alternative approach to comparative statics is provided by the envelope theorem. Con-

sider a controlled experiment in which only the ith exogenous variable will be changed. The envelope

theorem concerns the following question. How does

F(a) ≡ f(x(a), a)
     = f(x(a), a) + λ(a)g(x(a), a)        (5.13)

change with ai ? Put more precisely, what is the (partial) derivative of F (a) with respect to ai ? Consider

the right hand side of this expression and notice that as ai changes there are two types of effects upon

the value of f (x(a), a). The first corresponds to the (partial) derivative of f (x, a) with respect to ai

evaluated at x = x(a). This is the direct effect and is denoted ∂f (x(a), a)/∂ai . The second, or indirect

effect, corresponds to the effect of the change in ai upon the components of x, ∂xj (a)/∂ai , and the

effect of these changes in the components of x upon f (·), ∂f (·)/∂xj . Using the chain rule, the total

effect is the sum of these two components:

∂F(a)/∂ai ≡ ∂f(x(a), a)/∂ai + Σ_{j=1}^{n} [∂f(x, a)/∂xj] [∂xj(a)/∂ai]

The envelope theorem says, quite simply, that the second — and complicated — term in this expression

is zero:

Theorem 30 (Envelope Theorem). Suppose

x(a) ≡ arg maxx f(x, a)
s.t. g(x, a) ≥ 0

and

F(a) = f(x(a), a)
L(a) = f(x(a), a) + λ(a)g(x(a), a)

where x(a) and λ(a) are functions satisfying the Kuhn-Tucker conditions (Equation 5.4 and Equation 5.7 on page 61). Provided that both F(·) and L(·) are differentiable

∂F(a)/∂ai = ∂L(x, λ, a)/∂ai

where

∂L(x, λ, a)/∂ai = ∂f(x, a)/∂ai + λ ∂g(x, a)/∂ai

evaluated at x = x(a) and λ = λ(a).

The basis of this surprising fact and the reason it is called the envelope theorem is illustrated in Figure 5.4. In this example there are no constraints and a has a single component:

f(x, a) = [a − (x − a)²]/2

x∗(a) = arg maxx [a − (x − a)²]/2 = a

F(a) = maxx [a − (x − a)²]/2 = f(x∗(a), a) = a/2

The graph of F(a) is plotted in Figure 5.4. Along this graph x has changed to remain optimal for the new values of a, so the slope of the graph is the total derivative dF(a)/da. A series of other curves are plotted holding x fixed at values which would be optimal for various specific values of a, e.g., f(7, a) corresponds to fixing x = 7 — the choice which is optimal when a = 7 — and then varying a. The slope of one of these fixed x curves at a particular point is the partial derivative ∂f(x, a)/∂a.

[Figure 5.4: The Envelope Theorem without Constraints — F(a) together with the fixed x curves f(3, a), f(5, a) and f(7, a)]

Note that the graph of F(a) is the “envelope” of the fixed x curves. The reason for this is simple. None of the fixed x curves can ever lie above F(a) which, after all, is the maximized value of f(x, a) over all possible choices of x. The fixed x curves can touch F(a), however. Consider the point at which a = 7 for example. Since x = 7 is optimal for this value of a, it follows that F(7) = f(7, 7). The fixed x curve f(7, a) thus touches F(a) at a = 7. A similar story holds for the other fixed x curves each of which contributes, in this case, a single point to the envelope F(a).

Now if two curves touch at a point but do not cross they are tangent to one another. If, moreover,

both curves are differentiable, then their slopes are well defined at all points and their slopes must be

equal at points of tangency. This is the envelope theorem — the slope of a fixed x curve must be the

same as the slope of the envelope curve at the point of tangency.
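For the example above the tangency is easy to verify symbolically, a quick sketch:

f[x_, a_] := (a - (x - a)^2)/2;
F[a_] := f[a, a]                     (* x*(a) = a, so F(a) = a/2 *)
{F'[7], D[f[7, a], a] /. a -> 7}     (* {1/2, 1/2}: equal slopes at the point of tangency *)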

A slightly more complex illustration of the envelope theorem is provided by the following problem:

x∗(a) ≡ arg maxx x²
s.t. x ≤ a

The Lagrangian is

L(x, λ, a) ≡ x² + λ(a − x)


and the solution is given by

x∗(a) = a
λ∗(a) = 2a
F(a) ≡ L(x∗(a), λ∗(a), a) = a²

In Figure 5.5 a series of curves are plotted holding x and λ constant at values which would be optimal for some a and then letting a vary. Note again that F(a) is the envelope of these curves and that, for example, the tangency of F(a) with L(3, 6, a) at a = 3 means that

dF(a)/da − ∂L(3, 6, a)/∂a = 0

at a = 3.

[Figure 5.5: The Envelope Theorem with Constraints — F(a) together with the fixed (x, λ) curves L(3, 6, a), L(5, 10, a) and L(7, 14, a)]

Problem 5.11. Consider the problem

maxx≥0 min{x1, x2}
s.t. p1x1 + p2x2 ≤ m

with parameters p1, p2, m > 0. Show that

xi∗(p1, p2, m) = m/(p1 + p2), i = 1, 2
F(p1, p2, m) = m/(p1 + p2)

Problem 5.12. Solve Problem 5.11 using Mathematica for the case in which p1 = 2, p2 = 3 and

m = 10.

The connection of the Envelope Theorem to comparative statics depends on a special characteristic of

problems frequently encountered in microeconomics. The endogenous variables in such problems are

the quantities chosen of various commodities and the exogenous parameters are the market prices

of these same commodities. The objective function, moreover, commonly depends upon the market

value of the quantities chosen — the dot product of prices and quantities. Such problems have the

special form:

maxx a · x        (5.14)
s.t. g(x) ≥ 0

with Lagrangian

L(x, λ, a) = a · x + λg(x)


For problems of this form the Envelope Theorem implies that

∂F(a)/∂ai = ∂L(x, λ, a)/∂ai = xi(a)

and

∂²F(a)/∂ai∂aj = ∂xi(a)/∂aj

Here the comparative static effects, ∂xi (a)/∂aj , are obtained as the second partial derivatives of F (a).

In this approach global information about the concavity or convexity of F (a) provides global informa-

tion about the comparative static effects.


Chapter 6

Dynamics


6.1 Dynamic Systems

A dynamic system consists of a state space, X, a set of possible times, T, and a transition function, F, which maps a state and an elapsed time into a new state and satisfies:

– F(0, x) = x for all x ∈ X
– F(t1 + t2, x) = F(t2, F(t1, x)) = F(t1, F(t2, x)) for all x ∈ X and all t1, t2 ∈ T

The state space, X, is typically taken to be a topological space and the set of possible times, T , is taken

either to be the real numbers for continuous time systems or the integers for discrete time systems. For

continuous time systems, F is often called a flow and for discrete time systems a map.

For a given initial condition, x(0),

x(t) ≡ F (x(0), t)


is often called a path or trajectory and represents the entire future, t > 0, and history, t < 0, of the

system that was in state x(0) at t = 0.

Interest focuses upon the existence of equilibria, i.e., x ∗ for which F (x ∗ , t) = x ∗ for all t and, when

an equilibrium exists, the stability of the equilibrium:

Definition 6.1. The equilibrium x ∗ is stable iff for every neighborhood of x ∗ , U , there is another

neighborhood of x ∗ , V , such that x(0) ∈ V implies x(t) ∈ U for every t > 0.

If, additionally, V can be chosen so that limt→∞ x(t) = x ∗ then x ∗ is called asymptotically stable.

For a continuous time system we can define

f(x) ≡ ∂F(x, 0)/∂t

and thus obtain the system of differential equations

ẋ = f (x) (6.1)

From a modeling perspective, it is often easier to begin by specifying the model either as a

system of differential equations, Equation 6.1, for a continuous time system or as a system of difference

equations for a discrete time system

∆x = f (x) (6.2)

If the good news is that f is easier to specify, the bad news is that F is still needed and must now be

obtained by solving (integrating) f , a task which is often quite difficult to do manually but, fortunately,

easy to do with a computer and software such as Mathematica — see the introduction to this program,

Using Mathematica, on page 93.
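Even the simplest growth model, for instance, is one line of Mathematica (a sketch; k is an arbitrary constant):

DSolve[x'[t] == k x[t], x[t], t]
(* {{x[t] -> E^(k t) C[1]}}: exponential growth from an arbitrary initial level C[1] *)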

The predator-prey or Lotka-Volterra equations provide a very nice example of how f is specified and

then solved for F . These equations are a pair of first order, non-linear, differential equations frequently

used to describe the dynamics of biological systems in which two species interact, one a predator and

one its prey. They were proposed independently by Alfred J. Lotka in 1925 and Vito Volterra in 1926.

Here X = R2+, T = R+ and f is given by

ẋ ≡ dx/dt = (a − by)x
ẏ ≡ dy/dt = (cx − d)y

where

• x is the size of the prey (rabbit) population and y is the size of the predator (wolf) population,
• ẋ and ẏ are the growth rates of the two populations against time, and
• a, b, c and d are positive parameters.

The logic is that wolves are bad for rabbits so rabbit growth, ẋ, is decreasing in wolves, y. Rabbits, on

the other hand, are good for wolves, so wolf growth, ẏ, is increasing in rabbits, x.


What can be said about F? Well, the first equation implies that dx/dt = 0 when y = a/b and the second equation implies that dy/dt = 0 when x = d/c. This means that state-space, the positive quadrant of R², can be divided into the four regions illustrated in Figure 6.1. For y > a/b it must be the case that ẋ < 0 and for y < a/b it must be the case that ẋ > 0. Arrows are used to reflect this fact — left-pointing above y = a/b and right-pointing below y = a/b. Similarly for x > d/c, ẏ > 0 and for x < d/c, ẏ < 0. Upward-pointing arrows to the right of x = d/c and downward pointing arrows to the left of x = d/c reflect this fact.

[Figure 6.1: Predator and Prey]

Next paths consistent with the arrow requirements are added. Note that these must cross y = a/b vertically since ẋ = 0 everywhere along this line. Similarly, they must cross x = d/c horizontally since ẏ = 0 everywhere along this line. The resulting picture is suggestive — there is a rest point at (d/c, a/b) and, away from this rest point, paths seem to flow around the rest point in a counter-clockwise direction.

counter-clockwise direction.

All this hasn’t been too difficult, but Mathematica can do better. The

4

vector field plot for this system when all the parameters are equal to

one is illustrated in Figure 6.2. In this type of plot, a grid of (x, y)

coordinates is selected and then the vector (ẋ, ẏ) evaluated at the 3

grid point is plotted at each of the grid points. These vectors thus

point in the direction of the flow with a length that is proportional

to the velocity of the flow. All Mathematica needs to produce this 2

{y, 0, 4}]

But wait, there’s more. We can obtain the complete path for a given

initial condition as simply as 1 2 3 4

sol1 = NDSolve[{D[x[t],t] == (1-y[t])x[t],

D[y[t],t] == (x[t]-1)y[t],\\

x[0]==.2, y[0]==.2\}, {x[t], y[t]}, {t, 0, 10}]

and then plot the results for this combined with a couple of other initial conditions in Figure 6.3 on

following page with the command

ParametricPlot[Evaluate[{sol1,sol2,sol3}], t,0,10]


Note that the solutions are indeed orbits, i.e., if the initial state is (0.2, 0.2), for example, then the path begins on the outermost of the illustrated orbits and then forever follows this orbit counter clockwise around and around the rest point at (1, 1). Whatever the initial state, the story is similar — there is an orbit through the initial state and the system then stays on that orbit forever.

[Figure 6.3: Predator and prey]

Problem 6.1. Is the equilibrium at (1, 1) in the predator and prey model stable? Is it asymptotically stable?

Problem 6.2. Use Mathematica to examine the simultaneous system

ẋ = (1/2)x² − y
ẏ = x − y

with the following:

Needs["VectorFieldPlots‘"];

vf = VectorFieldPlot[{1/2 x^2-y, x-y}, {x,-1,1}, {y,-1,1}];

Show[vf, Axes -> True]

sols = Table[{x[t], y[t]} /. NDSolve[{x’[t] == 1/2 x[t]^2 - y[t],

y’[t] == x[t] - y[t], x[0] == n, y[0] == 0.5}, {x[t], y[t]},

{t, 0, 20}], {n, -1, 1, 0.5}];

ph = ParametricPlot[Evaluate[sols], {t, 0, 20},

PlotRange -> {{-1, 1}, {-1, 1}}]

Show[{vf, ph}, Axes -> True]

Problem 6.3. Use Mathematica to examine the differential equation

ẋ = 3xt 2

with the following

Needs["VectorFieldPlots‘"];

vf = VectorFieldPlot[{1, 3 x t^2}, {t, -2, 1}, {x, -1.5, 1.5}];

Show[vf, Axes -> True]

sol = x[t] /. DSolve[x’[t] == 3 x[t] t^2, x[t], t][[1]]

solutions = Plot[Evaluate[Table[sol /. C[1] -> n, {n, -.5, .5, .25}]],

{t, -2, 1}]

Show[{vf, solutions}, Axes -> True]

The logistic map is often cited as an example of how complex, chaotic behavior can arise from very

simple non-linear dynamic systems. It models the population of a single species where (i) reproduction

means the population will increase at a rate proportional to the current population but (ii) starvation

means that the population will decrease at a rate proportional to the value obtained by taking the

theoretical “capacity” of the environment less the current population. Mathematically this can be

written as the difference equation

xt+1 = axt (1 − xt )

where xt is a number between zero and one representing the population at time t, and thus x0 rep-

resents the initial population at time 0 and a is a positive number representing a combined rate for

reproduction and starvation.
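The map is easy to experiment with, a sketch (the function name logistic is mine):

logistic[a_][x_] := a x (1 - x);
NestList[logistic[1.5], 0.2, 10]                  (* converges toward (a - 1)/a = 1/3 *)
ListLinePlot[NestList[logistic[3.2], 0.02, 40]]   (* the two point cycle discussed below *)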


Since both xt and 1 − xt are positive but less than one, axt(1 − xt) will be less than xt if a < 1. Thus with a between zero and one, the population will eventually die out for any initial population.

When a is between 1 and 2, the population quickly stabilizes on the value x∗ satisfying

xt+1 = xt = x∗ ⇒ x∗ = ax∗(1 − x∗) ⇒ x∗ = (a − 1)/a

This is illustrated in Figure 6.4 for the case in which a = 1.5. Integer values of t are plotted along the horizontal axis and xt is plotted on the vertical axis, beginning from the initial population x0.

[Figure 6.4: Asymptotically Stable]

When a is between 2 and 3, the population will also eventually stabilize on the same value but first oscillates around that value for some time. When a is between 3 and approximately 3.54 the population eventually enters a cycle among a fixed set of points and then remains in this cycle forever. This is illustrated in Figure 6.5. In the left-hand panel a = 3.2 and the population eventually enters a two point cycle. In the right-hand panel a = 3.5 and the population eventually enters a four point cycle. This stable, four point cycle is also illustrated in the left-hand panel of Figure 6.7 on following page.

[Figure 6.5: Cycles: Two Point (left) and Four Point (right)]

Problem 6.4. Is the equilibrium at (a − 1)/a in the logistic map stable for a between 3 and 3.54? Is

it asymptotically stable?

Saving the best for last, when a is approximately 3.57 the system becomes chaotic. This is illustrated

for a = 3.7 in Figure 6.6 on following page.

Chaotic behavior can be described as extreme sensitivity to initial conditions. Figure 6.7 on following

page shows the last 40 of 120 values of the Logistic Map for x[0] = 0.02. In the left-hand panel

a = 3.5 and in the right hand panel a = 3.7. Each panel has two plots, one computed with the default

16 digit accuracy and another with 80 digit accuracy. While the two plots are indistinguishable in the

non-chaotic, left-hand panel, there is a substantial difference in the chaotic, right-hand panel where

the two plots, which started from the same initial conditions, appear less and less related as the slight “rounding” differences accumulate with each passing period.

[Figure 6.6: Chaotic behavior (a = 3.7)]
[Figure 6.7: The last 40 of 120 values of the logistic map for x[0] = 0.02, computed with both 16 and 80 digit accuracy; a = 3.5 (left) and a = 3.7 (right)]

An interesting historical note is that Edward Lorenz was using a numerical computer model to rerun a

weather prediction in 1961, when, as a shortcut on a number in the sequence, he entered the decimal

.506 instead of entering the full .506127 that the computer would hold. The result was a completely different weather scenario. Learning of this, one meteorologist remarked that one flap of a seagull's

wings could change the course of weather forever. In later speeches and papers, Lorenz adopted

the more poetic and now familiar phrase “butterfly effect” to describe this characteristic of chaotic

systems.

Hopefully, you are now convinced (i) that many strange and wonderful things are possible even with

quite simple dynamic systems and (ii) that you should focus on modeling and leave the task of solving

to Mathematica (or some similar system).

6.2 Systems of Linear Differential Equations

Much of the remainder of this chapter will focus on the system of first-order, homogeneous differential equations given by

ẋ = Ax        (6.3)

where A is an n × n matrix and both ẋ and x are elements of Rn . Why focus on this system? It’s

important in its own right and it can be used to understand behavior of many other systems.

If, for example, we have the second-order linear differential equation

d²x/dt² + a(dx/dt) + bx = 0

then we can introduce new variables x1 = x and x2 = dx/dt, note that dx2/dt = d²x/dt², and get the first-order linear system

ẋ1 = x2
ẋ2 = −bx1 − ax2

with

A = [  0    1 ]
    [ −b   −a ]

Higher order linear systems can always be reduced to first-order linear systems by introducing new

variables in this way.

Suppose, on the other hand, that we have the non-homogeneous first-order system

ẏ = Ay + b

Assuming that A is invertible, we can solve Ay + b = 0 for the rest-point y ∗ = −A−1 b. Now define

the new variables x ≡ y − y ∗ to “translate the origin” to y ∗ , note that ẋ = ẏ and y = x − A−1 b and

substitute to obtain

ẋ = A(x − A−1 b) + b = Ax − b + b = Ax
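A sketch of the translation with a hypothetical A and b of mine:

A = {{-1, 0}, {0, -2}}; b = {2, 2};
ystar = -Inverse[A] . b    (* {2, 1}: the rest point; x = y - ystar then satisfies x' = A x *)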


Finally suppose that we have the non-linear, first-order system

ẋ1 = f 1 (x1 , x2 , . . . , xn )

ẋ2 = f 2 (x1 , x2 , . . . , xn )

..

.

ẋn = f n (x1 , x2 , . . . , xn )

or, more succinctly, ẋ = f(x). Supposing that f is differentiable, we can let Ax̂ ≡ fx(x̂) where fx(x̂), the Jacobian of f evaluated at x̂, has ∂f i(x̂)/∂xj as its ij element, and then examine the linear system

ẋ = Ax̂ x

Better still, we need not consider all A matrices when studying Equation 6.3 since similar matrices

represent only a change of basis and must, therefore, exhibit the same underlying behavior. Further,

for any class of similar matrices, there is a particularly simple, “diagonal” form known as the Jordan

canonical form. Here the matrix which is similar to A takes the form:

[ D              ]
[ C  D           ]
[    ⋱  ⋱        ]
[       C  D     ]
[          C  D  ]

where the C and D blocks are 2 × 2 matrices. Each of the D blocks takes one of the following three forms

[  a   b ]        (6.4)
[ −b   a ]

[ c   0 ]        (6.5)
[ 0   d ]

[ e   0 ]        (6.6)
[ 1   e ]

where a ± bi is a pair of complex conjugate characteristic roots of A, c and d are real characteristic roots of A and e is a repeated real characteristic root of A.

" #

1 0

0 1

or

" #

0 0

0 0

with the former case holding only if the adjacent D blocks correspond to repeated characteristic roots.


Suppose, for example, that A is a 4 × 4 matrix with characteristic roots 2, 2, 1 − i and 1 + i. The Jordan canonical form would then be

[ 2   0   0    0 ]
[ 1   2   0    0 ]        (6.7)
[ 0   0   1   −1 ]
[ 0   0   1    1 ]

It should be noted that the elements of the Jordan canonical form discussed thus far are all real

numbers. In this real version of the Jordan canonical form complex roots are represented by blocks

of the sort given in Equation 6.4 on the previous page in which the real part of the root is given by

the diagonal term and the imaginary part by the off diagonal terms. If we allow the matrix to have

complex numbers then in the complex version of the Jordan canonical form blocks like

" #

a − bi 0

0 a + bi

replace those in Equation 6.4 on the previous page. It should also be noted that in some discussions

the C blocks corresponding to repeated roots appear above the diagonal instead of below. The classic

reference for this is Hirsch and Smale [1974].

Problem 6.5. In Mathematica, the command JordanDecomposition[M] returns two matrices, S and

J, which satisfy S −1 MS = J. Here J is the complex version of the Jordan canonical form and S is

the similarity transformation for obtaining it from the matrix M. Both matrices can be conveniently

displayed with the command

Map[MatrixForm, {S, J} = JordanDecomposition[M]]

What is the result of applying this to the matrix in Equation 6.7?

Since the system de-couples into 2 × 2 blocks in the Jordan form, we can effectively limit attention to

the simple forms represented in Equation 6.4 to Equation 6.6 on the previous page.

Here we consider the system corresponding to Equation 6.4 on the previous page:

[ ẋ1 ]   [  a   b ] [ x1 ]
[ ẋ2 ] = [ −b   a ] [ x2 ]

with solution

[ x1(t) ]          [  cos(bt)   sin(bt) ] [ x1(0) ]
[ x2(t) ] = e^{at} [ −sin(bt)   cos(bt) ] [ x2(0) ]        (6.8)

Note that limt→∞ (x1 (t), x2 (t)) equals (0, 0) iff a < 0. Thus the system is stable iff a, the real part of

the pair of complex conjugate roots, is negative. The vector field and trajectories for this system for

various interesting cases are illustrated below:

Problem 6.6. Use Mathematica’s DSolve to confirm Equation 6.8.

Problem 6.7. How would these plots change with changes in the magnitude but not the sign of a?

Problem 6.8. How would these plots change with changes in the sign and magnitude of b?

♦ Query 6.1. What is the best linear approximation of the predator and prey system at the rest point,

(x, y) = (1, 1)? Describe the behavior of the solutions to this linearization.

♦ Query 6.2. What is the best linear approximation of the system in Problem 6.2 on page 78 at the rest

point, (x, y) = (0, 0)? What about the rest point at −2, −2? Describe the behavior of the solutions to

both linearizations.


[Figure 6.8: Unstable: complex roots with positive real part (a = 0.4 and b = 1)]

[Figure 6.9: Stable: complex roots with negative real parts (a = −0.4 and b = 1)]


[Figure 6.10: complex roots with zero real part — orbits]

Here we consider the system corresponding to Equation 6.5 on page 82:

[ ẋ1 ]   [ c   0 ] [ x1 ]
[ ẋ2 ] = [ 0   d ] [ x2 ]

with solution

[ x1(t) ]   [ e^{ct}     0     ] [ x1(0) ]
[ x2(t) ] = [   0      e^{dt}  ] [ x2(0) ]        (6.9)
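More generally, the solution of ẋ = Ax is x(t) = e^{At}x(0), and Mathematica computes the matrix exponential directly, a sketch:

MatrixExp[{{c, 0}, {0, d}} t]
(* {{E^(c t), 0}, {0, E^(d t)}}: the fundamental matrix appearing in Equation 6.9 *)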

Note that limt→∞ (x1(t), x2(t)) equals (0, 0) iff c, d < 0. Thus the system is stable iff both of the (real) roots are negative. The vector field and trajectories for this system for various interesting cases are illustrated below.

[Figure 6.11: Stable: negative and unequal real roots (c = −0.5 and d = −1)]

Problem 6.10. How would these plots change with changes in the magnitudes of c and d?


[Figure 6.12: Stable: negative and equal real roots (c = −1 and d = −1)]

[Figure 6.13: Saddle: both positive and negative real roots (c = −1 and d = 1)]


Problem 6.11. Is the system described in Figure 6.13 on the previous page stable? Is it unstable?

What is the set of initial conditions from which the system converges to the rest point? Is this set

generic in R2 ? What is the set of initial conditions from which the system does not converge to the

rest point? Is this set generic in R2 ?

" # " #" #

ẋ1 r 0 x1

=

ẋ2 1 r x2

x1 (t) rt 1 0 x1 (0)

=e (6.10)

x2 (t) t 1 x2 (0)

When r < 0, e^{rt} → 0 as t → ∞; the only question concerns the product t e^{rt}. Consider

lim_{t→∞} t/e^{|r|t}

which is of the indeterminate form ∞/∞ and l'Hospital's rule can therefore be applied. Differentiating numerator and denominator then gives

lim_{t→∞} t/e^{|r|t} = lim_{t→∞} 1/(|r| e^{|r|t}) = 0

Thus the system is stable iff the repeated real root r is negative. The vector field and trajectories for this system for various interesting cases are illustrated below.

[Figure 6.14: repeated real roots]


6.2.4 Summary

Solutions to all of the generic 2 × 2 systems that make up the diagonal blocks in the Jordan canonical form have now been displayed. The key facts are the following:

1. If the real parts of all of the characteristic roots are negative, then all paths converge to the origin.

2. If the real parts of all of the characteristic roots are positive then all paths other than those starting at the origin diverge from the origin.

3. Real parts with opposite signs are associated with “saddles” where some paths converge to the origin and others diverge.

4. Complex roots are associated with orbits. When the real part is negative, the orbit “decays” and paths converge on the origin. When the real part is positive, the orbit “explodes”. When the real part is zero, i.e., the root is purely imaginary, then the orbit is itself a stable set.

For linear systems, the question of stability is easily resolved. Just examine the characteristic roots.

6.3 Liapunov's Method

For non-linear systems, issues are more complicated. So far we have no way of determining stability except by actually finding all the solutions to ẋ = f(x) which may be difficult if not impossible. Fortunately, another, indirect method is possible.

Theorem 31 (Liapunov). Suppose x∗ is an equilibrium for ẋ = f(x) and that there exists a continuous map, v, from U, a neighborhood of x∗, into R which is differentiable on U \ {x∗} and satisfies

1. v(x∗) = 0 and v(x) > 0 for x ∈ U \ {x∗}

2. v̇ ≤ 0 for x ∈ U \ {x∗}

Then x∗ is stable and v is called a Liapunov function. Furthermore, if

3. v̇ < 0 for x ∈ U \ {x∗}

Then x∗ is asymptotically stable and v is called a strict Liapunov function.

It should be emphasized that Theorem 31 can be applied without solving Equation 6.1 on page 76. On

the other hand, there is no direct method for finding Liapunov functions — often trial and error and

considerable ingenuity is required.

Here’s an example of the use of this method. The van del Pol equation is a second-order differential

equation that can be reduced to the following system of first-order, non-linear equations:

ẋ1 = x2

ẋ2 = −x1 + a(x23 /3 − x2 ) where a > 0

Is the equilibrium at x ∗ = (0, 0) stable? For this system we try (inspired guess) the function v =

1/2(x12 + x22 ). Since this is a positive-definite quadratic form, the requirement that v(x) > 0 for

x 6= x ∗ = 0 is satisfied. It remains to show that v̇ < 0:

v̇ = x1 ẋ1 + x2 ẋ2

= x1 x2 − x1 x2 + a(x24 − x 2 )

= −ax22 (3 − x22 )/3


which is clearly negative for |x| < √3. Thus v is in fact a strict Liapunov function and x∗, therefore, is asymptotically stable for |x| < √3 ≈ 1.73.
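The computation of v̇ can be checked symbolically, a sketch:

v = (x1^2 + x2^2)/2;
xdot = {x2, -x1 + a (x2^3/3 - x2)};
Simplify[{D[v, x1], D[v, x2]} . xdot]
(* a x2^2 (x2^2 - 3)/3, which is negative for 0 < x2^2 < 3 when a > 0 *)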

For completeness, the vector field and solution plots are illustrated in Figure 6.15.

[Figure 6.15: vector field and solution plots for the van der Pol system]

Problem 6.13. Is v = x − log(x) + y − log(y) + k a Liapunov function for the predator and prey

system

ẋ = (1 − y)x

ẏ = (x − 1)y


Notation

If A = {a, b} and B = {c, d} then A × B = {(a, c), (a, d), (b, c), (b, d)}. Similarly, R² = R × R. In general, A ≡ ∏_{j=1}^{m} Aj ≡ A1 × A2 × · · · × Am is the set of m-tuples of the form (a1, a2, . . . , am) where a1 ∈ A1, a2 ∈ A2, and so forth.

Similarly:

∩ “intersect” and ∪ “union”: A ∩ B ≡ {x | x ∈ A and x ∈ B}
⊂ “subset”: R+ ⊂ R
Ac: the complement of A
\ “set minus”: A \ B = A ∩ Bc
∈ “element of” and ∉ “not an element of”: 5/7 ∈ R+, −3 ∉ R+
∀ “for all”: x + 1 > x ∀x ∈ R
∃ “there exists”: if x ∈ R then ∃y ∈ R such that y > x
⇒ “implies”


Using Mathematica

Basics

I am preparing this document using the notebook interface for the Mac version of Mathematica so the keyboard shortcuts I describe may be a bit different if you're using either the Windows or Unix versions. Though Mathematica has both a notebook or graphics interface and a character-based interface, you will most likely be using the notebook interface and this differs little between operating systems. When you save your notebook you'll get a standard ascii notebook file that is exactly the same no matter which operating system you're using. You could email this file to a friend, for example, and he/she would be able to use it no matter which operating systems the two of you are using. You could also select File/Save As Special/HTML or File/Save As Special/TeX, as I will do with this document to display it on the internet.

When you start Mathematica, the screen will be blank save for a horizontal line across the top of

the screen. This line represents the insertion point for a "cell". A cell can contain text, such as

this paragraph, a headline, such as "Introduction to Mathematica" at the beginning of this document,

input to be processed by Mathematica, output returned by Mathematica and so forth. The default is

input but you can change this while the horizontal line is visible by selecting from the Format/Style menu.

Once you start typing, the line will disappear and the characters you type will appear together with a

"bracket" in the right-hand margin. You can click on this bracket and make another selection from

the Format/Style menu if you change your mind about the style you want.

If you move the cursor to the end (or beginning) of a cell, a new insertion line will appear where you

can, once again, select a style and enter new material. You can also, of course, return to any existing

cell and make any changes you like.

Input Expressions

Simply enter an expression, say (2+3)∧ 3, in an input cell and press the shift key and the enter key at

the same time. Mathematica will process your input, label your input cell with "In[#]" and return its

output in a cell labeled "Out[#]" where "#" is the number Mathematica assigns to this matched pair

of cells.

In: (2 + 3) ∧ 3

Out: 125

If you edit your input cell, "In[#]" will disappear to remind you that your new input has not yet been

processed.

The operators are what you would expect, with * or space for multiplication and ∧ for exponentiation.

Watch out for the following common sources of consternation:


Spaces

3x will be interpreted as 3 times x and not as a variable named 3x. On the other hand, x3 will be

interpreted as a variable named x3 and not as the product of x times 3. If what you want is x times 3

and you want to put the x first, then you must put a * or a space between the x and the 3.

Cases

Mathematica is case sensitive, e.g., y and Y are the names of two different variables. Mathematica’s

built in variables, like Pi, and functions, like Log, always have at least the first letter capitalized and

are usually not abbreviated but are spelled out in full. You can avoid conflicts by not capitalizing the

first letter of your own variables and functions.

Delimiters

Mathematica uses the delimiters ( ), [ ], { } and [[ ]] for completely different purposes. Parentheses are used only for grouping, e.g., (x+y)∧2. Square brackets are used only to provide the argument(s) to a function, e.g., Log[5]. Curly braces are used only to delimit the elements of a list, e.g., mylist={x,y} defines mylist to be a list containing two elements - the variables x and y. Double square brackets are used only to refer to elements of a list, e.g., mylist[[2]] refers to the second element of mylist - the variable y.
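Here is a small example using all four delimiters:

In: mylist = {(1 + 2)∧2, Log[1]}

Out: {9, 0}

In: mylist[[1]]

Out: 9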

You are not limited to numeric entries; Mathematica will just as happily process symbolic expressions.

In: x + y∧3

Out: x + y³

In: Expand[(x + y)∧3]

Out: x³ + 3x²y + 3xy² + y³

When processing numbers, Mathematica’s default is to treat them as exact:

In: 1/3 + 2/5 + 3/7

122

Out:

105

If you want a decimal approximation you can either ask for it explicitly using the built-in numerical

value function, N:

In: N [1/3 + 2/5 + 3/7]

Out: 1.1619

or you can enter a decimal in your input:

In: 1/3 + .4 + 3/7

Out: 1.1619
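N also accepts an optional second argument giving the number of significant digits to compute:

In: N[1/3 + 2/5 + 3/7, 10]

Out: 1.161904762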

Mathematica can compute very large numbers


In: 200!

Out: 78865786736479050355236321393218506229513597768717326329474253

3244359449963403342920304284011984623904177212138919638830257

6427902426371050619266249528299311134628572707633172373969889

4392244562145166424025403329186413122742829485327752424240757

3903240321257405579568660226031904170324062351700858796178922

2227896237038973747200000000000000000000000000000000000000000

00000000

or very precise approximations, such as the value of Pi carried out to 400 decimal places

In: N[Pi, 400]

Out: 3.141592653589793238462643383279502884197169399375105820974944

5923078164062862089986280348253421170679821480865132823066470

9384460955058223172535940812848111745028410270193852110555964

4622948954930381964428810975665933446128475648233786783165271

2019091456485669234603486104543266482133936072602491412737245

8700660631558817488152092096282925409171536436789259036001133

0530548820466521384146951941511609

or complicated expressions

In: Sum[1/j, {j, 1, 100}]

Out: 14466636279520351160221518043104131447711 / 2788815009188499086581352357412492142272

Note the use of {j, 1, 100} to represent a list of values for j going from 1 to 100. This "range" operator

is widely used. For another example, consider the two-dimensional plot

In: Plot[…]

[figure omitted: a curve for x from 1 to 6, taking values between −1 and 1]

Out: −Graphics−

In: Plot3D[…]

[figure omitted: a surface plotted over −1 ≤ x ≤ 1, −1 ≤ y ≤ 1]

Out: −SurfaceGraphics−

where both x and y go from -1 to +1.

By the way, if you forget how a function works, just put the cursor after the name of the function, e.g., after the Plot3D, click on Help and then Find Selected Function, and the complete documentation on the relevant function will pop up. Mathematica's on-line help system is the best I've seen.

The percent sign, %, is shorthand for the results of the previous calculation, e.g.,

In: Expand[(x + y)∧3]

Out: x³ + 3x²y + 3xy² + y³

In: Factor[%]

Out: (x + y)³

You can also give an expression a name

In: myexpression = x∧2 − 8x + 15

Out: 15 − 8x + x²

and later refer to the equation by name

In: mysolutions = Solve [myexpression == 0]

Out: {{x → 3} , {x → 5}}

Note the use of the single equality to name or define an expression and the use of the double equality

for "equals". Note also that the output of "Solve" is a list of the two solutions to the equation. To

refer, say, to the second solution you would use mysolutions[[2]].
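For example,

In: mysolutions[[2]]

Out: {x → 5}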



You can substitute values into an expression using the substitution operator "/.", e.g.,

In: x∧2 /. x → 3

Out: 9

where the right arrow is gotten by typing "-" and then ">", or

In: (a + b)∧2 /. a + b → x − y

Out: (x − y)²

or

In: myexpression /. mysolutions

Out: {0, 0}

In the last case, the solutions in "mysolutions", namely x → 3 and x → 5, are substituted one at a time into "myexpression". The fact that the result in each case is zero confirms that 3 and 5 are both solutions to myexpression == 0.
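You can likewise use the substitution operator to extract a bare value from a single solution:

In: x /. mysolutions[[2]]

Out: 5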

You can define your own functions using ":=" and a pattern for the argument

In: f[z_] := z∧2 − 8z + 15

Note the use of "z_" on the left to refer to "z" on the right, though you could equally well use "a_" and "a" or any other such pair of symbols. Now f can be used exactly as you would any other function, e.g., to confirm visually "mysolutions".

In: Plot [f [x] , {x, 0, 8} , PlotStyle → RGBColor [1, 0, 0]]

[figure omitted: graph of f on 0 ≤ x ≤ 8, crossing zero at x = 3 and x = 5]

Out: −Graphics−
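You can also confirm the solutions numerically (assuming f and mysolutions are still in memory):

In: {f[3], f[5]}

Out: {0, 0}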

You can also define piecewise functions

In: g[x_] := 1 − x /; x ≤ 1
    g[x_] := x − 1 /; x > 1

In: Plot[g[x], {x, −4, 4}]

[figure omitted: V-shaped graph with its kink at x = 1]

Out: −Graphics−
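Evaluating g on either side of the kink shows the two pieces in action:

In: {g[0], g[1], g[3]}

Out: {1, 0, 2}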

In: mess = (3 + 7x + 8x∧2 + 5x∧3 + x∧4) / (3 + 10x + 18x∧2 + 14x∧3 + 3x∧4)

Out: (3 + 7x + 8x² + 5x³ + x⁴) / (3 + 10x + 18x² + 14x³ + 3x⁴)

In: Simplify[mess]

Out: (1 + x + x²) / (1 + 2x + 3x²)

Forgetting

Mathematica never forgets. If you ever enter x=3, even by mistake, then the variable named x will

thereafter be replaced by the number 3. Deleting the cell containing the definition won’t help. To

remove the definition from memory you need to enter Clear[x]:

In: x=3

Out: 3

In: x

Out: 3

In: Clear [x]

In: x

Out: x

You can remove all of your own definitions either with the following magic:

In: Clear["Global`*"]

In: myexpression


Out: myexpression

or by choosing "Kernel / Quit Kernel / Local" from the drop-down menu. You can then make any changes you like and choose "Kernel / Evaluation / Evaluate Notebook" from the drop-down menu to redo all the calculations in your notebook.

Taking the derivative of an expression with respect to a variable is such a common operation that it is

one of the few Mathematica commands that is abbreviated. Here are some examples.

In: D[3x∧2, x]

Out: 6x

In: D[f[x], x]

Out: f′[x]

The partial derivative of a function of x and y with respect to y

In: D[3x∧2 + 2x y + 2y∧3, y]

Out: 2x + 6y²

Note: in the expression "2 times x times y", there's a space between the x and the y; otherwise the expression would be interpreted as 2 times a variable named xy.

The generalization of the "product rule" to three functions

In: D [a [x] b [x] c [x] , x]

Out: b [x] c [x] a0 [x] + a [x] c [x] b0 [x] + a [x] b [x] c 0 [x]

The generalization of the "chain rule" to three functions

In: D [a [b [c [x]]] , x]

Out: a0 [b [c [x]]] b0 [c [x]] c 0 [x]
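Higher-order derivatives use a list {variable, order} in place of the variable, e.g., the second derivative:

In: D[x∧5, {x, 2}]

Out: 20x³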

Using "Integrate"

In: Integrate x ∧ 3, x

x4

Out:

4

and one that I couldn’t

In: Integrate Sin [x] ∧ 2, x

x 1

Out: − Sin [2x]

2 4

Is the last one correct?

In: D[%, x]

Out: 1/2 − Cos[2x]/2

In: Simplify[%]

Out: Sin[x]²

Yes. Note that for indefinite integrals, Mathematica does not display the constant of integration.

For a definite integral, give the range of integration

In: Integrate[x∧3, {x, 0, 1}]

Out: 1/4
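When no closed form is available (or needed), NIntegrate computes a definite integral numerically:

In: NIntegrate[Sin[x]∧2, {x, 0, Pi}]

Out: 1.5708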

Using "Solve"

In:

x→0

Out: 1

x→ 8

In: e = x∧3 − 2x + 9;

In: Solve[e == 0] //TableForm

Out:
x → −2 (2/(3(81 − √6465)))^(1/3) − ((81 − √6465)/2)^(1/3)/3^(2/3)
x → (1 + i√3) (2/(3(81 − √6465)))^(1/3) + (1 − i√3) ((81 − √6465)/2)^(1/3)/(2·3^(2/3))
x → (1 − i√3) (2/(3(81 − √6465)))^(1/3) + (1 + i√3) ((81 − √6465)/2)^(1/3)/(2·3^(2/3))

Note that we get one real solution, the first, and two complex solutions. Let’s see whether the first

solution works:

In: e /. %[[1]]

Out: 9 − 2 (−2 (2/(3(81 − √6465)))^(1/3) − ((81 − √6465)/2)^(1/3)/3^(2/3)) + (−2 (2/(3(81 − √6465)))^(1/3) − ((81 − √6465)/2)^(1/3)/3^(2/3))³

In: Simplify [%]

Out: 0


To find the maximum of a function of a single variable:

In: payoff = 7x − 4x∧2;

In: Plot[payoff, {x, 0, 2}]

[figure omitted: inverted parabola on 0 ≤ x ≤ 2, peaking near x = 0.9]

Out: −Graphics−

In: Solve[D[payoff, x] == 0]

Out: {{x → 7/8}}

In: payoff /. %[[1]]

Out: 49/16

In: %//N

Out: 3.0625
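Checking the second-order condition confirms that this critical point is indeed a maximum:

In: D[payoff, {x, 2}]

Out: −8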

To find the maximum of a function of two variables:

In: payoff = 7x + 5y − x∧2 − y∧2;

In: Plot3D[payoff, {x, 0, 6}, {y, 0, 6}, ViewPoint → {1.5, −2.8, 1.1}, BoxRatios → {1, 1, .8}]

[figure omitted: surface over 0 ≤ x ≤ 6, 0 ≤ y ≤ 6 with an interior peak]

Out: −SurfaceGraphics−

Note that you can interactively set the "ViewPoint" by putting your cursor at that point in the expression, selecting "Input / 3D ViewPoint Selector" from the drop-down menu and then clicking on "Paste" when you're happy with the result.

In: Solve[{D[payoff, x] == 0, D[payoff, y] == 0}] //TableForm

Out: x → 7/2   y → 5/2
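Substituting the solution back in gives the maximized value of the objective:

In: payoff /. {x → 7/2, y → 5/2}

Out: 37/2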

Want to know more about Mathematica? Cheung et al. [2005] is a good place to start and Ruskeepää

[2004] will make you an expert. For a reference look no further than Mathematica’s superb built-in

help system. Enter a command and then press F1 for information about the command and examples

of its use or press F1 when the cursor is not on a command to bring up the entire help system.


Bibliography

C-K. Cheung, G. E. Keough, Robert H. Gross, and Charles Landraitis. Getting Started with Mathematica.

Wiley, 2nd edition, 2005.

Morris Hirsch and Stephen Smale. Differential Equations, Dynamical Systems and Linear Algebra. Academic Press, 1974.

Heikki Ruskeepää. Mathematica Navigator: Mathematics, Statistics, and Graphics. Elsevier Academic Press, 2nd edition, 2004.

George F. Simmons. Introduction to Topology and Modern Analysis. McGraw Hill, 1963.


List of Problems

Chapter 1 1

1.1: 2 1.2: 2 1.3: 3 1.4: 3 1.5: 3 1.6: 4 ♦1.1: 4 1.7: 4 1.8: 4 1.9: 5 1.10: 5

1.11: 6 ♦1.2: 7 1.12: 7 1.13: 7 1.14: 7 1.15: 7 1.16: 8 1.17: 8 1.18: 8 1.19: 8

1.20: 9 1.21: 9 1.22: 9 1.23: 10 1.24: 10 1.25: 10 1.26: 10 1.27: 10 ♦1.3: 10

1.28: 11

Chapter 2 15

♦2.1: 17 2.1: 18 ♦2.2: 19 2.2: 20 2.3: 20 2.4: 20 2.5: 20 ♦2.3: 21 2.6: 22 2.7: 25

2.8: 25 2.9: 25 2.10: 25 2.11: 26 2.12: 27 2.13: 27 2.14: 28 2.15: 28 2.16: 30

2.17: 30 2.18: 30 2.19: 30 ♦2.4: 32 ♦2.5: 33 ♦2.6: 33 ♦2.7: 33

Chapter 3 35

3.1: 36 3.2: 38 3.3: 39 3.4: 39 3.5: 39 3.6: 40 3.7: 40 3.8: 40 3.9: 40

3.10: 40 3.11: 41 3.12: 41 3.13: 43 3.14: 44 3.15: 44 3.16: 44 3.17: 44 ♦3.1: 45

♦3.2: 45 3.18: 45 3.19: 45

Chapter 4 47

4.1: 48 4.2: 48 4.3: 49 4.4: 49 4.5: 49 4.6: 50 4.7: 52 4.8: 53 4.9: 53

4.10: 54 4.11: 54 4.12: 54 4.13: 55 4.14: 55 4.15: 55 4.16: 55

Chapter 5 57

5.1: 59 5.2: 61 5.3: 62 5.4: 63 5.5: 63 5.6: 65 5.7: 65 5.8: 66 5.9: 66

5.10: 70 5.11: 73 5.12: 73

Chapter 6 75

6.1: 78 6.2: 78 6.3: 78 6.4: 79 6.5: 83 6.6: 83 6.7: 83 6.8: 83 ♦6.1: 83 ♦6.2: 83

6.9: 85 6.10: 85 6.11: 87 6.12: 87 6.13: 89

Using Mathematica 93


Index

T1 -space, 44 convex, 54

T2 -space, 44 strictly, 54

σ -algebra, 45 convex hull, 9

convex set, 9

affine countable set, 36

space, 8 Cramer’s Rule, 28

spanned by vectors, 8 critical points, 48

subspace, 8 curvature, 52

arbitrage, 32

arg max, 58 derivative

arg min, 58 first, 48

asymptotically stable, 76, 88 second, 52

determinant, 26

Bolzano-Weierstrass property, 45 diameter, 39

bordered Hessian, 70 difference equations, 76

bounded, 39 differentiable, 48

bounding, 11 at a point, 48

butterfly effect, 81 twice, 52

differential

canonical, 20 first, 48

Cantor set, 38 second, 52

cardinal number of the continuum, 38 differential equations, 76

cardinality, 36 directed distance, 11

Cartesian product, 1 discrete space, 43

Cauchy-Schwarz inequality, 4 dual space, 17

characteristic equation, 29 dynamic systems

characteristic root, 29 continuous time, 75

characteristic vector, 29 discrete, 75

closed, 44

closed set, 41 eigenvalues, 29

compact space, 45 eigenvectors, 29

comparative static effects, 67 endogenous variables, 67

complementary slackness condition, 62 equilibria, 76

complete, 3 exogenous variables, 67

component, 2

concave, 53 Farkas’ Lemma, 31

strictly, 54 feasible set, 68

constraint qualifications, 60 first order necessary condition, 58

constraints, 59 flow, 75

continuous at a point, 42 fundamental equation of comparative statics, 70

continuous mapping, 43

convergent subsequence, 42 generic, 66


generic property, 45 map, 75

gradient, 48 Mathematica

Gram’s Theorem, 25 Assuming, 20

graph, 49 ChebyshevDistance, 40

Det, 27

half spaces, 11 dot product, 4

Hausdorff space, 44 Eigenvalues, 30

Heine-Borel Theorem, 45 Eigenvectors, 30

Hessian, 52 Element, 20

bordered, 56 EuclideanDistance, 40

homogeneous of degree ρ, 50 Inverse, 25–27

hyperplane, 11 ManhattanDistance, 40

MatrixForm, 27

identity matrix, 24 MatrixRank, 22

initial condition, 75

Norm, 5, 20

inverse image, 19

NullSpace, 22

invertible, 20

Reals, 20

scalar product, 5

Jacobian, 49, 82

Table, 4

Jordan canonical form, 82

Transpose, 26

complex, 83

vector as a list, 4

real, 83

vector sum, 5

Kuhn-Tucker Conditions, 61 matrix

similar, 20

Lagrangian Function, 61 matrix-vector, 19

Lagrangian multiplier, 62 matrix-vector product

latent roots, 29 column view, 21

latent vectors, 29 row view, 21

level measurable, 45

contour, 49 measurable sets, 45

level set, 55 measure space, 45

Liapunov function, 88 metric, 39

strict, 88 metric space, 39

limit point, 41, 42 complete, 42

linear (sub)space Minkowski’s Theorem, 12

basis, 7 monotone increasing (decreasing) function, 55

dimension, 7

projection, 7 neighborhood, 43

space, 6 norm, 39

spanned by vectors, 7 normal, 10

subspace, 6 null space, 22

linear equations numbers

homogeneous, 21 cardinal, 36

non-homogeneous, 21 rational, 36

linear programming, 66 numerically equivalent, 36

linear programming problem, 60

linear transformation, 18 objective function, 58

domain, 18 open cover, 45

image, 19 open set, 41

range, 18 open sets, 43

real-valued, 16 open sphere, 40

Lotka-Volterra, 76 ordinary least squares regression, 25


origin, 2 discrete, 43

orthant, 4 induced by the metric, 43

orthogonal, 4 relative, 43

stronger, 43

parameters, 62 trivial, 43

path, 76 usual, 43

point, 2 weaker, 43

point-to-set mapping, 67 trace, 30

predator-prey, 76 trajectory, 76

principal minor, 56 transitive, 3

triangle inequality, 39

quadratic approximation, 51

quadratic form, 50 uncountable, 37

indefinite, 50

negative definite, 50 van der Pol equation, 88

negative semi-definite, 50 vector, 2

positive definite, 50 affine combination, 8

positive semi-definite, 50 convex combination, 9

quasi-concave, 55 dot product ·, 4

strictly, 55 equality =, 2

quasi-convex, 55 included angle, 4

strictly, 55 inequality >, 2

inner product ·, 4

rank, 19 linear combination, 6

real n-space, 1 linearly dependent, 6

reduced form, 67 linearly independent, 6

relation, 2 norm k k, 3

residual of the projection, 7 scalar product, 5

strict inequality , 2

second order necessary condition, 58 sum +, 4

separating, 12 weak inequality ≥, 2

sequence, 41

Cauchy, 42 Weierstrass Theorem, 45

convergent, 42

sequential compactness property, 45

set

dense, 44

shadow price, 62

single-peaked, 55

smooth, 53

solution set, 10

stability, 76

stable, 76, 88

state space, 75

subcover, 45

sufficient conditions, 59

supporting, 11

supremum, 39

testable hypothesis, 59

topological space, 43

topology, 43


Colophon

This book was prepared using Apple computers (a MacBook and an iMac) running OS X. Graphs were

prepared using either OmniGraffle Professional or Mathematica or a combination of the two. The text

was prepared using the wonderful editor TextMate with LaTeX's book class and Lucida Bright fonts.
