
Calculus

J.M. Ward
MT1174, 2790174

2011

Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This subject guide is for a '100' course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 4 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk

This guide was prepared for the University of London International Programmes by:
J.M. Ward, Department of Mathematics, London School of Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the author is unable to enter into any correspondence relating to, or arising
from, the guide. If you have any comments on this subject guide, favourable or unfavourable,
please use the form at the back of this guide.

University of London International Programmes


Publications Office
Stewart House
32 Russell Square
London WC1B 5DN
United Kingdom
Website: www.londoninternational.ac.uk
Published by: University of London
© University of London 2011
The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher.
We make every effort to contact copyright holders. If you think we have inadvertently used
your copyright material, please let us know.

Contents

Preface

1 Introduction
  1.1 This subject
  1.2 Reading
  1.3 Online study resources
    1.3.1 The VLE
    1.3.2 Making use of the Online Library
  1.4 Using this guide
  1.5 Examination advice
  1.6 The use of calculators

2 Functions
  2.1 Introduction: What is a function?
    2.1.1 Some elementary functions and their graphs
    2.1.2 Combinations of functions
    2.1.3 Inverse functions
    2.1.4 Identities
    2.1.5 Applications of functions
  2.2 Conic sections
    2.2.1 Parabolae
    2.2.2 Circles
    2.2.3 Ellipses
    2.2.4 Hyperbolae
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

3 Differentiation
  3.1 Introduction: What is differentiation?
  3.2 How to find derivatives
    3.2.1 Standard derivatives
    3.2.2 The rules of differentiation
    3.2.3 Higher-order derivatives
  3.3 Using derivatives
    3.3.1 The meaning of the derivative
    3.3.2 Tangent lines and linear approximations
    3.3.3 Applications of derivatives
    3.3.4 Existence of derivatives
  3.4 Using higher-order derivatives
    3.4.1 Maclaurin series
    3.4.2 Taylor series
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

4 One-variable optimisation
  4.1 Introduction: What is optimisation?
  4.2 Using first-order derivatives
    4.2.1 Increasing and decreasing functions
    4.2.2 Stationary points
    4.2.3 An application: Elasticities revisited
  4.3 Using second-order derivatives
    4.3.1 Second-derivatives and stationary points
    4.3.2 Convex and concave functions
    4.3.3 Points of inflection
  4.4 Curve sketching
    4.4.1 Sketching curves defined by polynomials
    4.4.2 Sketching curves defined using other elementary functions
    4.4.3 Asymptotes and cusps
  4.5 Optimisation
    4.5.1 Constrained optimisation
    4.5.2 What happens when differentiability fails?
    4.5.3 Applications of optimisation
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

5 Integration
  5.1 Introduction: What is integration?
  5.2 How to find indefinite integrals
    5.2.1 Standard integrals
    5.2.2 The basic rules of integration
    5.2.3 Integration by substitution
    5.2.4 Integration by parts
    5.2.5 Using partial fractions to simplify integrands
    5.2.6 Using trigonometric identities to simplify integrands
  5.3 Definite integrals and areas
    5.3.1 Definite integrals and what they represent
    5.3.2 Definite integrals and the other rules of integration
  5.4 Applications of integrals
    5.4.1 Marginal functions revisited
    5.4.2 Consumer and producer surpluses
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

6 Functions of several variables
  6.1 Introduction
  6.2 Surfaces
    6.2.1 Planes
    6.2.2 Contours and sections
  6.3 Partial differentiation
    6.3.1 Sections and partial derivatives
    6.3.2 Finding partial derivatives
    6.3.3 The chain rule
    6.3.4 An application: Homogeneous functions
    6.3.5 Second-order partial derivatives
  6.4 Using partial derivatives
    6.4.1 Tangent planes
    6.4.2 Gradient vectors
    6.4.3 Directional derivatives
    6.4.4 Implicitly defined functions of two variables
    6.4.5 Taylor series
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

7 Two-variable optimisation
  7.1 Introduction
  7.2 Unconstrained optimisation
    7.2.1 Stationary points
    7.2.2 Classifying stationary points
    7.2.3 Convex and concave functions
    7.2.4 Applications
  7.3 Constrained optimisation
    7.3.1 Finding optimal points on the boundary of a region
    7.3.2 The method of Lagrange multipliers
    7.3.3 The meaning of the Lagrange multiplier
    7.3.4 Applications
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

8 Differential equations
  8.1 Introduction: What is a differential equation?
  8.2 First-order ODEs
    8.2.1 Separable first-order ODEs
    8.2.2 Linear first-order ODEs
    8.2.3 Homogeneous first-order ODEs
  8.3 Second-order ODEs
    8.3.1 Homogeneous second-order ODEs
    8.3.2 Non-homogeneous second-order ODEs
  8.4 Systems of first-order ODEs
    8.4.1 Simple systems of first-order ODEs
    8.4.2 Other systems of first-order ODEs
  8.5 Applications of ODEs
    8.5.1 Determining demand functions from elasticities
    8.5.2 Continuous price adjustment
    8.5.3 Continuous cash flows
    8.5.4 Market trends
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises

A Sample examination paper

B Solutions to the sample examination paper

Preface
This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. Where coverage in the main texts is weak, it provides some
additional background material.
I am grateful to Mark Baltovic for his careful reading of a draft of this guide and for his
many helpful comments.


Chapter 1
Introduction
In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise you on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.

1.1 This subject

Calculus, as studied in this Level 1 course, is primarily the study of derivatives and
integrals of functions of one variable and partial derivatives of functions of several
variables.
Our approach here is not just to help you acquire proficiency in techniques and
methods, but also to help you understand some of the theoretical ideas behind these.
For example, after completing this course, you will hopefully understand why the
derivatives of a function allow you to determine where a function of one variable is
optimised. In addition to this, we try to indicate the uses of some of the methods in
applications to economics, finance and related disciplines.
Aims of the course
The broad aims of this course are as follows:
- to enable students to acquire skills in the methods of calculus (including
  multivariate calculus), as required for their use in further mathematics subjects and
  economics-based subjects;
- to prepare students for further courses in mathematics and/or related disciplines.
As emphasised above, however, we do also want you to understand why certain
methods work: this is one of the skills that you should acquire. Indeed, the
examination will not simply test your ability to perform routine calculations; it will also
probe your knowledge and understanding of the principles that underlie the material.
Learning outcomes
We now state the broad learning outcomes of this course, as a whole. At the end of this
course and having completed the essential reading and activities, you should be able to:

- use the concepts, terminology, methods and conventions covered in the course to
  solve mathematical problems in this subject;
- solve unseen mathematical problems involving understanding of these concepts and
  application of these methods;
- see how calculus can be used to solve problems in economics and related subjects;
- demonstrate knowledge and understanding of the underlying principles of calculus.
There are a couple of things that we should stress at this point. Firstly, note the
intention that you will be able to solve unseen problems. This means simply that you
will be expected to be able to use your knowledge and understanding of the material to
solve problems that are not completely standard. This is not something you should
worry unduly about: all topics in mathematics expect this, and you will never be
expected to do anything that cannot be done using the material of this course.
Secondly, we expect you to be able to demonstrate knowledge and understanding and
you might well wonder how you would demonstrate this in the examination. Well, it is
precisely by being able to grapple successfully with unseen, non-routine, questions that
you will indicate that you have a proper understanding of the topic.
Topics covered
Descriptions of the topics to be covered appear in the relevant chapters. However, it is
useful to give a brief overview at this stage.
We start by revising some of the basic ideas that are needed for the study of this course
and, in particular, the idea of a function of one variable. We then introduce derivatives
of such functions and how to find them using the techniques of differentiation. This
enables us to see how such functions are behaving and, in particular, enables us to see
where such functions are optimised. We then introduce integrals of such functions and
how to find them using the techniques of integration. In particular, this will enable us
to see how to relate functions to areas. We then introduce functions of several variables
and develop techniques for finding their partial derivatives. In particular, we will see
how we can use these ideas to see where these slightly more complicated functions are
optimised. Lastly, we introduce the idea of a differential equation and examine methods
for solving them.
Throughout this subject guide, the emphasis will be on the theory as much as on the
methods. That is to say, our aim in this subject is not only to provide you with some
useful techniques and methods from calculus, but also to enable you to understand why
these techniques work.

1.2 Reading

There are many books that would be useful for this subject. We recommend two in
particular, and a couple of others for additional, further reading. (You should note,
however, that there are very many books suitable for this course. Indeed, almost any
text on first-year university calculus will cover the majority of the material.)


Textbook reading is essential as textbooks will provide you with more in-depth
explanations than you will find in this subject guide, and they will also provide many
more examples to study and exercises to work through. The books listed are the ones
we have referred to in this guide.
Essential reading
Detailed reading references in this subject guide refer to the editions of the set
textbooks listed below. New editions of one or more of these textbooks may have been
published by the time you study this course. You can use a more recent edition of any
of the books; use the detailed chapter and section headings and the index to identify
relevant readings. Also check the virtual learning environment (VLE) regularly for
updated guidance on readings.
- Binmore, K. and J. Davies Calculus: Concepts and Methods. (Cambridge:
  Cambridge University Press, 2002, second revised edition) [ISBN 9780521775410].
- Anthony, M. and N. Biggs Mathematics for economics and finance: methods and
  modelling. (Cambridge: Cambridge University Press, 1996) [ISBN 9780521559133].
By and large we will be following Binmore and Davies but, sometimes, we will follow
the simpler treatment found in Anthony and Biggs. Both texts, when used wisely, will
provide you with a large number of examples for you to study and exercises for you to
attempt. It is recommended that you purchase both of these. Another thing you might
like to bear in mind is that some of the material from Binmore and Davies that we omit
here will be useful if you go on to study 176 Further Calculus.
Further reading
Once you have covered the essential reading you are then free to read around the
subject area in any text, paper or online resource. You will need to support your
learning by reading as widely as possible and by thinking about how these principles
apply in the real world. To help you read extensively, you have free access to the VLE
and University of London Online Library (see Section 1.3.2). However, two useful
textbooks that we have referred to in this subject guide are the following.
- Simon, C.P. and L. Blume Mathematics for economists. (New York and London:
  W.W. Norton and Company, 1994) [ISBN 9780393957334].
- Adams, R.A. and C. Essex Calculus: a complete course. (Toronto: Pearson, 2010,
  seventh edition) [ISBN 9780321549280].
Simon and Blume is a useful supplementary text with an emphasis on applications of
the material to economics; whereas Adams and Essex (which is merely an example from
a large range of very similar calculus textbooks) is a detailed calculus textbook which
contains much material which is beyond the scope of this course. Both of these texts are
suitable as sources of additional explanation, examples and exercises, but they are
probably not worth purchasing.

1.3 Online study resources

In addition to the subject guide and the essential reading, it is crucial that you take
advantage of the study resources that are available online for this course, including the
virtual learning environment (VLE) and the Online Library.
You can access the VLE, the Online Library and your University of London email
account via the Student Portal at
http://my.londoninternational.ac.uk
You should receive your login details in your study pack. If you have not, or you have
forgotten your login details, please email uolia.support@london.ac.uk quoting your
student number.

1.3.1 The VLE

The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:
- Self-testing activities: Doing these allows you to test your own understanding of
  subject material.
- Electronic study materials: The printed materials that you receive from the
  University of London are available to download, including updated reading lists
  and references.
- Past examination papers and Examiners' commentaries: These provide advice on
  how each examination question might best be answered.
- A student discussion forum: This is an open space for you to discuss interests and
  experiences, seek support from your peers, work collaboratively to solve problems
  and discuss subject material.
- Videos: There are recorded academic introductions to the subject, interviews and
  debates and, for some courses, audio-visual tutorials and conclusions.
- Recorded lectures: For some courses, where appropriate, the sessions from previous
  years' Study Weekends have been recorded and made available.
- Study skills: Expert advice on preparing for examinations and developing your
  digital literacy skills.
- Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.

1.3.2 Making use of the Online Library

The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library at
http://tinyurl.com/ollathens
you will either need to use your University of London Student Portal login details, or
you will be required to register and use an Athens login.
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages at
www.external.shl.lon.ac.uk/summon/about.php

1.4 Using this guide

We have already mentioned that this guide is not a textbook. It is important that you
read textbooks in conjunction with the guide and that you try problems from the
textbooks. The exercises at the end of the main chapters of this subject guide are a very
useful resource and you should try them once you think you have mastered the material
from the chapter. You should really try these exercises before consulting the solutions,
as simply reading the solutions provided will not help you at all. Sometimes, the
solutions we provide will just be an overview of what is required, i.e. an indication of
how you should answer the questions, but in the examination, you must always show all
of your calculations. It is vital that you develop and enhance your problem-solving skills
and the only way to do this is to try lots of exercises.

1.5 Examination advice

Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the virtual
learning environment (VLE) where you should be advised of any forthcoming changes.
You should also carefully check the rubric/instructions on the paper you actually sit
and follow those instructions.
Remember, it is important to check the VLE for:
- Up-to-date information on examination and assessment arrangements for this
  course.
- Where available, past examination papers and Examiners' commentaries for the
  course which give advice on how each question might best be answered.
This course is assessed by a three-hour unseen written examination. There are no
optional topics in this subject: you should study them all and this is reflected in the
structure of the examination paper. There are five questions (each worth 20 marks) and
all questions are compulsory. A sample examination paper may be found in an appendix
to this subject guide.
Please do not think that the questions in your real examination will necessarily be very
similar to the exercises in this subject guide or those in the sample examination paper.
The examination is designed to test you. You will get examination questions unlike the
questions in this subject guide. The whole point of examining is to see whether you can
apply your knowledge in familiar and unfamiliar settings. The Examiners (nice people
though they are) have an obligation to surprise you! For this reason, it is important
that you try as many examples as possible, from the subject guide and from the
textbooks. This is not so that you can cover any possible type of question the
Examiners can think of! It is so that you get used to confronting unfamiliar questions,
grappling with them, and finally coming up with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.

1.6 The use of calculators

You will not be permitted to use calculators of any type in the examination. This is not
something that you should worry about: the Examiners are interested in assessing that
you understand the key concepts, ideas, methods and techniques, and will set questions
which do not require the use of a calculator.

Chapter 2
Functions

Essential reading
(For full publication details, see Chapter 1.)
- Binmore and Davies (2002) Sections 2.1–2.6, 2.14 and part of 7.1.2.
- Anthony and Biggs (1996) Chapters 1, 2 and parts of 7.
Further reading
- Simon and Blume (1994) Sections 2.1, part of 2.2, 5.1, 5.3, and 5.4, Appendices
  A1.1, parts of A1.2 and A2.16.
- Adams and Essex (2010) Preliminaries parts of P.1–P.7, parts of Sections 3.1–3.3
  and 3.5.
Aims and objectives
The objectives of this chapter are as follows.
- To introduce functions in general and the elementary functions and their graphs in
  particular.
- To see how to find combinations of functions and the inverse of a function (if it
  exists).
- To see how functions can be used in economics-based subjects.
- To introduce conic sections and see how to draw them.
Specific learning outcomes can be found near the end of this chapter.

2.1 Introduction: What is a function?

NOTE: Before you start this chapter, you should make sure that you have
covered the background material in Chapter 1 of 173 Algebra.
Given two sets $A$ and $B$, a function, $f$, from $A$ to $B$ is a rule which takes each element
of $A$ and gives us a unique (or exactly one) element of $B$. We often express the fact that
the function $f$ takes elements from $A$ and gives us elements of $B$ by writing
$f : A \to B$. In such cases, we call the sets $A$ and $B$ the domain and co-domain of the
function respectively.

One way of visualising a function $f : A \to B$ is to think of it as a black box that takes
any $x \in A$, the domain, and applies the rule given by $f$ to it to get the unique output
$f(x) \in B$, the co-domain, i.e.
\[ x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B. \]
Here, we have used $x$ to denote the independent variable as we are free to choose any
element, $x \in A$, from the domain. But, of course, the choice of '$x$' here is not essential as
it is just a dummy variable: we could have used any other letter instead and said
that the function $f : A \to B$ is a black box that takes any $p \in A$ and applies the rule
given by $f$ to it to get the unique output $f(p) \in B$.

It is often convenient to introduce another variable, called the dependent variable, to
stand for the elements of $B$ that the function $f : A \to B$ gives us. For instance, we
could say that this function takes any $x \in A$, the domain, and applies the rule given by
$f$ to it to get the unique output $y \in B$, the co-domain, where $y = f(x)$. Of course, here
the independent variable, $x$, via the rule given by $f$, will determine the value of $y$ and
this is why we think of $y$ as the dependent variable.
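As a rough illustration of the black-box picture, the following Python sketch treats a function as a rule together with a domain and a co-domain. The helper name `apply_function` and the finite sets `A` and `B` are assumptions made purely for this example; the guide itself works with sets of real numbers.

```python
def apply_function(rule, domain, codomain, x):
    """Feed x into the 'black box': check the input, apply the rule, check the output."""
    if x not in domain:
        raise ValueError(f"{x!r} is not an element of the domain")
    y = rule(x)  # the rule gives exactly one output for each input
    if y not in codomain:
        raise ValueError(f"{y!r} is not an element of the co-domain")
    return y


# Hypothetical finite example: A = {0, 1, 2, 3}, B = {0, 1, ..., 9}, f(x) = x^2.
A = {0, 1, 2, 3}
B = set(range(10))


def f(x):
    return x ** 2


y = apply_function(f, A, B, 3)  # x = 3 is the independent variable, y = f(3) the dependent one
print(y)  # prints 9
```

Renaming the input from `x` to `p` changes nothing about the function, which is the sense in which the input letter is a dummy variable.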
For now, we will only be interested in functions whose domain and co-domain are
certain sets of real numbers. In particular, they will either be $\mathbb{R}$ itself or certain subsets
of $\mathbb{R}$ called intervals. Typically, we think of $\mathbb{R}$ as the points on a line so that intervals
are described by line segments. Indeed, for $a, b \in \mathbb{R}$, we will have finite intervals like
\[ (a, b) = \{x \in \mathbb{R} \mid a < x < b\} \quad\text{and}\quad [a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}, \]
which only differ according to whether the end-points, i.e. the elements $a$ and $b$, are in
the set. Of course, we can also have finite intervals where one end-point, but not the
other, is in the set and we denote these by
\[ (a, b] = \{x \in \mathbb{R} \mid a < x \le b\} \quad\text{and}\quad [a, b) = \{x \in \mathbb{R} \mid a \le x < b\}. \]
There are also infinite intervals which will have one finite end-point, say $a \in \mathbb{R}$, and we
denote these by
\[ (-\infty, a] = \{x \in \mathbb{R} \mid x \le a\} \quad\text{and}\quad [a, \infty) = \{x \in \mathbb{R} \mid a \le x\}, \]
if this finite end-point is in the set, or by
\[ (-\infty, a) = \{x \in \mathbb{R} \mid x < a\} \quad\text{and}\quad (a, \infty) = \{x \in \mathbb{R} \mid a < x\} \]
if it isn't. Of course, as we can see by looking at the sets involved when writing these
infinite intervals, the symbols '$\infty$' and '$-\infty$' are not end-points as they are not real
numbers; they are just a notational convenience.
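The interval definitions above amount to a choice between a strict and a non-strict inequality at each end-point. A minimal Python sketch of this follows; the function `in_interval` and its flag names are hypothetical, and `math.inf` is used only as a stand-in for the symbols $\infty$ and $-\infty$, which are never themselves elements of the set.

```python
import math


def in_interval(x, a, b, left_closed, right_closed):
    """Return True if x lies in the interval with end-points a and b.

    left_closed / right_closed say whether a / b themselves belong to the set.
    For an infinite interval, pass -math.inf or math.inf with the matching flag False.
    """
    left_ok = a <= x if left_closed else a < x
    right_ok = x <= b if right_closed else x < b
    return left_ok and right_ok


print(in_interval(1, 1, 2, False, True))           # is 1 in (1, 2]?    False
print(in_interval(2, 1, 2, False, True))           # is 2 in (1, 2]?    True
print(in_interval(5, 2, math.inf, True, False))    # is 5 in [2, inf)?  True
print(in_interval(2, -math.inf, 2, False, False))  # is 2 in (-inf, 2)? False
```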
Putting these ideas together, we find that another way of visualising a function
$f : A \to B$ is its graph which is the set of all points $(x, y) \in \mathbb{R}^2$ such that $y = f(x)$.
Indeed, as a function $f : A \to B$ must give a unique output $y \in B$ for each $x \in A$, its
graph could look like the one illustrated in Figure 2.1(a) but not like the one in
Figure 2.1(b).

[Figure 2.1, panels (a) and (b): curves drawn against the x- and y-axes.]

Figure 2.1: In (a) we have the graph of a function $f : [0, a] \to [b, c]$ as each input, $x \in [0, a]$,
gives a unique output $y \in [b, c]$. In (b), we do not have the graph of a function from $[0, a]$
to $[b, c]$ as each input, $x \in [0, a]$, gives two outputs $y \in [b, c]$.

2.1.1 Some elementary functions and their graphs

We now revise some elementary functions that will be useful in this course and look at
their graphs.
Power functions
A power function is a function $f : \mathbb{R} \to \mathbb{R}$ given by
\[ f(x) = x^n, \]
where $n \in \mathbb{N}$. Depending on the value of $n$, the graphs of these functions look very much
like the ones illustrated in Figure 2.2. In addition to this, we also include the power
function $f(x) = x^0 = 1$ as the function whose graph is a horizontal straight line that
goes through the point $(0, 1)$.
Figure 2.2: (a) When $n = 1$, the graph of the function $f(x) = x^n$ is just the straight line
$y = x$. (b) The graph of the function $f(x) = x^n$ when $n$ is even. (c) The graph of the
function $f(x) = x^n$ when $n \ge 3$ is odd. Of course, in (b) and (c) we are only looking at
the shape of the graph for different values of $n$ without any regard to the scales on the
axes.
In particular, if we let $x \to \infty$ mean that $x$ is positive and getting arbitrarily large (i.e.
we are considering what happens as $x$ takes values far to the right on the $x$-axis) and
$x \to -\infty$ mean that $x$ is negative but getting arbitrarily large in magnitude (i.e. we are
considering what happens as $x$ takes values far to the left on the $x$-axis), we see that:

If $n$ is even, $x^n \to \infty$ as $x \to \infty$ and as $x \to -\infty$.
If $n$ is odd, $x^n \to \infty$ as $x \to \infty$ whereas $x^n \to -\infty$ as $x \to -\infty$.

This insight will be important in Section 4.4 when we consider how to sketch the graphs
of more complicated functions.
Exponential functions
An exponential function with base $a$ is a function $f : \mathbb{R} \to (0, \infty)$ given by
\[ f(x) = a^x, \]
where $a \ne 1$ is a positive real number. Depending on the value of $a$, the graphs of these
functions look very much like the ones illustrated in Figure 2.3.
Figure 2.3: (a) The graph of the function $f(x) = a^x$ when $0 < a < 1$. (b) The graph of the
function $f(x) = a^x$ when $a > 1$. Of course, in both of these graphs we are only looking
at the shape of the graph for different values of $a$ without any regard to the scales on the
axes.
Indeed, looking at these graphs we see that
If $0 < a < 1$, $a^x \to 0$ as $x \to \infty$ and $a^x \to \infty$ as $x \to -\infty$.
If $a > 1$, $a^x \to \infty$ as $x \to \infty$ and $a^x \to 0$ as $x \to -\infty$.

And, as $a^0 = 1$ for any positive $a \ne 1$, the graphs of these functions always go through
the point $(0, 1)$.
Trigonometric functions
The two elementary trigonometric functions that we will be using are the sine and
cosine functions but, unlike what you may have seen before, we will always be using
them for angles that are given in radians instead of degrees. As you may know, we can
easily convert between these two units by using the formula
\[ \text{angle in radians} = \frac{2\pi}{360} \times \text{angle in degrees}, \]
so, in particular, $360^\circ$ is $2\pi$ radians, $180^\circ$ is $\pi$ radians and $90^\circ$ is $\pi/2$ radians. Then,
measuring $\theta$ in radians, we can define the sine and cosine functions for $0 \le \theta \le \pi/2$ by
using the right-angled triangle in Figure 2.4 to get
\[ \sin\theta = \frac{\text{opposite}}{\text{hypotenuse}} \quad\text{and}\quad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}. \]

Figure 2.4: Defining the sine and cosine functions, $\sin\theta$ and $\cos\theta$, for $0 \le \theta \le \pi/2$.

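In particular, by considering the two special triangles in Figure 2.5, we can see that the
values of these functions for some common angles (in radians) are
\[
\begin{array}{c|ccc}
\theta & \pi/6 & \pi/4 & \pi/3 \\ \hline
\sin\theta & \frac{1}{2} & \frac{1}{\sqrt{2}} & \frac{\sqrt{3}}{2} \\
\cos\theta & \frac{\sqrt{3}}{2} & \frac{1}{\sqrt{2}} & \frac{1}{2}
\end{array}
\]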
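The conversion formula and the tabulated values can be spot-checked numerically. The sketch below uses only Python's standard `math` module; the helper name `degrees_to_radians` is an assumption made for this illustration (the library's own `math.radians` does the same job).

```python
import math

def degrees_to_radians(angle_in_degrees):
    # angle in radians = (2*pi / 360) * angle in degrees
    return 2 * math.pi / 360 * angle_in_degrees

# 360, 180 and 90 degrees should give 2*pi, pi and pi/2 radians.
for degrees, radians in ((360, 2 * math.pi), (180, math.pi), (90, math.pi / 2)):
    print(degrees, math.isclose(degrees_to_radians(degrees), radians))

# The tabulated values of sin and cos at pi/6, pi/4 and pi/3.
table = {
    "pi/6": (math.pi / 6, 1 / 2, math.sqrt(3) / 2),
    "pi/4": (math.pi / 4, 1 / math.sqrt(2), 1 / math.sqrt(2)),
    "pi/3": (math.pi / 3, math.sqrt(3) / 2, 1 / 2),
}
for name, (theta, sin_value, cos_value) in table.items():
    matches = (math.isclose(math.sin(theta), sin_value)
               and math.isclose(math.cos(theta), cos_value))
    print(name, matches)
```

The comparisons use `math.isclose` rather than `==` because floating-point arithmetic only approximates values such as $\sqrt{3}/2$.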

Activity 2.1 Recall that we also have the tangent function which, for $0 \le \theta < \pi/2$,
can be defined by using the right-angled triangle in Figure 2.4 to get
\[ \tan\theta = \frac{\text{opposite}}{\text{adjacent}}. \]
Use the triangles in Figure 2.5 to find the values of $\tan\theta$ when $\theta$ is $\pi/6$, $\pi/4$ and $\pi/3$
radians. Incidentally, what are these three angles in degrees?

Figure 2.5: Finding $\sin\theta$ and $\cos\theta$ when (a) $\theta = \pi/4$ radians and (b) when $\theta = \pi/6$ or
$\theta = \pi/3$ radians.
At this point, we'll stop saying that an angle is in radians as, unless explicitly stated
otherwise, this will always be the case.


If we want to extend the definition of the sine and cosine functions to $0 \le \theta \le 2\pi$, we
think of a unit circle and a triangle with a hypotenuse of 1 as illustrated in
Figure 2.6(a) which, for $0 \le \theta \le \pi/2$, gives us a point $(x, y)$ with
\[ x = \cos\theta \quad\text{and}\quad y = \sin\theta, \]
which can be found as before. But, if we now have $\pi/2 \le \theta \le 2\pi$, we get the situation
illustrated in Figure 2.6(b), where we can find the magnitude of $x$ and $y$ using our
original triangle and their sign by considering where the point lies in the $(x, y)$-plane.
For instance, in Figure 2.6(b), the angle $\theta$ could be $5\pi/4$ and so the angle in the triangle
would be $\pi/4$ (i.e. $\frac{5\pi}{4} - \pi = \frac{\pi}{4}$ as the angle subtended by a straight line, in this case
the $x$-axis, is $\pi$). This gives $x$ and $y$ a magnitude of $1/\sqrt{2}$ and their signs would be
negative as $x, y < 0$ so we see that
\[ \sin\frac{5\pi}{4} = -\frac{1}{\sqrt{2}} \quad\text{and}\quad \cos\frac{5\pi}{4} = -\frac{1}{\sqrt{2}}, \]
using the unit circle method.

Figure 2.6: Finding $\sin\theta$ and $\cos\theta$ when $0 \le \theta \le 2\pi$ by considering a unit circle.
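The unit circle method can be sketched in code: find an acute angle in the triangle, compute the magnitudes from it, and then attach signs according to the quadrant. The function name `sin_cos_unit_circle` is hypothetical, and the library functions `math.sin` and `math.cos` are only ever applied to the acute angle, so this is an illustration of the reasoning rather than a practical way to compute these values.

```python
import math

def sin_cos_unit_circle(theta):
    """Return (sin(theta), cos(theta)) for 0 <= theta <= 2*pi.

    Magnitudes come from an acute angle in a right-angled triangle;
    signs come from the quadrant in which the point (x, y) lies.
    """
    if not 0 <= theta <= 2 * math.pi:
        raise ValueError("theta must lie in [0, 2*pi]")
    if theta <= math.pi / 2:          # first quadrant: x >= 0, y >= 0
        acute, sign_x, sign_y = theta, 1, 1
    elif theta <= math.pi:            # second quadrant: x <= 0, y >= 0
        acute, sign_x, sign_y = math.pi - theta, -1, 1
    elif theta <= 3 * math.pi / 2:    # third quadrant: x <= 0, y <= 0
        acute, sign_x, sign_y = theta - math.pi, -1, -1
    else:                             # fourth quadrant: x >= 0, y <= 0
        acute, sign_x, sign_y = 2 * math.pi - theta, 1, -1
    x = sign_x * math.cos(acute)      # x = cos(theta)
    y = sign_y * math.sin(acute)      # y = sin(theta)
    return y, x

# The worked case above: both values should be close to -1/sqrt(2).
sine, cosine = sin_cos_unit_circle(5 * math.pi / 4)
print(sine, cosine)
```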


Activity 2.2 Use the unit circle method to find $\sin\theta$ and $\cos\theta$ if $\theta = 2\pi/3$.
Activity 2.3 Use the unit circle method to find the values of $\sin\theta$ and $\cos\theta$ when
$\theta = 0$ and $\theta = \pi/2$.
Hence deduce the values of these functions when $\theta = \pi$, $\theta = 3\pi/2$ and $\theta = 2\pi$.
If we want to extend the definition of the sine and cosine functions to all $\theta \in \mathbb{R}$, we can
see from the unit circle method that both of these functions are periodic with a
period of $2\pi$, i.e.
\[ \sin(\theta + 2\pi) = \sin\theta \quad\text{and}\quad \cos(\theta + 2\pi) = \cos\theta, \]
and their graphs are illustrated in Figure 2.7. In particular, we observe that
$\cos\theta = \sin(\theta + \frac{\pi}{2})$, i.e. the graph of the cosine function is what we get when we shift the
sine function to the left by $\pi/2$.

Figure 2.7: The graphs of the sine and cosine functions, $\sin\theta$ (solid line) and $\cos\theta$ (dashed
line).

2.1.2 Combinations of functions

The elementary functions we have seen can be combined in various ways to make more
complicated functions. Generally, this is straightforward and works in the way you
would expect, but sometimes there are slight complications and so we revise these
different types of combination here.
Linear combinations of functions
If we have two functions with the same domain and co-domain, say $f : A \to B$ and
$g : A \to B$, we can define a new function which is a linear combination of these two
functions. For instance, if $k$ and $l$ are constants, we would have the new function
$kf + lg : A \to B$ defined by
\[ (kf + lg)(x) = kf(x) + lg(x), \]
for all $x \in A$. In particular, this gives us polynomials, i.e. functions $p_n : \mathbb{R} \to \mathbb{R}$ which
are a linear combination of power functions of the form
\[ p_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \]
where the $a_i$ for $0 \le i \le n$ are real constants. Indeed, if $a_n \ne 0$, we say that this is a
polynomial of degree $n$.
Of course, you have seen polynomials before as, in Chapter 1 of 173 Algebra, you saw
how to solve polynomial equations of the form $p_n(x) = 0$ where $n = 1$ (a linear
equation), $n = 2$ (a quadratic equation) and $n = 3$ (a cubic equation). The information
we get from solving these equations is useful when we come to draw the graphs of
polynomial functions as the next example shows.


Example 2.1 Draw the graphs of the functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ given by
$f(x) = 5$ and $g(x) = x + 2$ on the same axes. At what point(s) do these graphs
intersect?

When we draw graphs, we will often do this by drawing a sketch. Indeed, for a sketch
of the simple functions given here, it suffices to indicate their shape (they are both
straight lines) and where they are relative to the $x$ and $y$-axes (by indicating where
they intersect these axes). So, as we saw in Section 2.1.1, we should expect the graph
of $f(x)$ to be a horizontal line that goes through the point $(0, 5)$ as $f(x) = 5$ for all
$x \in \mathbb{R}$ whereas for $g(x)$, we would expect a straight line that has an
$x$-intercept that occurs when $g(x) = 0$, i.e. when $x = -2$, and a
$y$-intercept that occurs when $x = 0$, i.e. when $g(0) = 2$.
This information allows us to obtain the sketch illustrated in Figure 2.8.
To find the point(s) at which these two graphs intersect, we are looking for the
value(s) of $x$ that make $f(x) = g(x)$, i.e. where $5 = x + 2$. This gives $x = 3$ and we
know that the values of the functions here must satisfy $f(3) = g(3) = 5$ which gives
$(3, 5)$ as the required point of intersection.¹
Figure 2.8: The graphs of the functions $f(x) = 5$ and $g(x) = x + 2$. Notice that these
graphs intersect at the point $(3, 5)$ which we found in Example 2.1.
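The intersection found in Example 2.1 amounts to solving two linear equations simultaneously, which can be sketched as follows. The helper `line_intersection` and its slope/intercept parameters are assumptions made for this illustration; exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def line_intersection(m1, c1, m2, c2):
    """Intersection of y = m1*x + c1 and y = m2*x + c2.

    Returns the point (x, y), or None when the lines are parallel
    (equal slopes), in which case there is no single intersection.
    """
    if m1 == m2:
        return None
    x = Fraction(c2 - c1, m1 - m2)   # from m1*x + c1 = m2*x + c2
    y = m1 * x + c1
    return x, y

# f(x) = 5 is the line with slope 0 and intercept 5;
# g(x) = x + 2 is the line with slope 1 and intercept 2.
point = line_intersection(0, 5, 1, 2)
print(point)
```

The parallel case is handled separately because dividing by the difference of the slopes would otherwise mean dividing by zero.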


We will see how to draw the graphs of polynomial functions where $n = 2$ in
Section 2.2.1 and we will develop a more general method for dealing with the case
where $n \ge 3$ in Section 4.4.
¹ Of course, thinking about the graphs of these functions as the points, $(x, y)$, satisfying the equations
$y = f(x)$ and $y = g(x)$, all we have done here is solve the equations $y = 5$ and $y = x + 2$ simultaneously.

Products and quotients of functions
If we have two functions with the same domain but possibly different co-domains, say
$f : A \to B$ and $g : A \to C$, we can define a new function which is the product of these
two functions. For instance, here we would have the new function $fg : A \to D$ where
$D$ is the possibly new co-domain, defined by
\[ (fg)(x) = f(x)g(x), \]
for all $x \in A$. Of course, we have seen how this works the other way in Chapter 1 of 173
Algebra as the process of factorisation involves writing a polynomial of degree $n$ as the
product of two polynomials, one of degree $m$ and another of degree $p$, with $n = m + p$.
However, the quotient of these two functions is slightly more tricky to deal with as the
function $f/g$ defined by
\[ (f/g)(x) = \frac{f(x)}{g(x)}, \]
only makes sense for those $x \in A$ where $g(x) \ne 0$ as, of course, we can never divide by
zero. As such, when finding the quotient of two functions, we get a function
$f/g : A' \to B$ where $A'$ is a new domain given by
\[ A' = \{x \in A \mid g(x) \ne 0\}. \]
The points at which a quotient is undefined may have interesting consequences for its
graph since they can give rise to vertical asymptotes. But this needn't be the case, as
the next example shows.
Example 2.2 Discuss the behaviour of the functions given by
\[ f(x) = \frac{x + 1}{x - 1} \quad\text{and}\quad g(x) = \frac{x^2 + x - 2}{x - 1}, \]
at the point $x = 1$.
For $f(x)$, the polynomials in the numerator and denominator of the quotient are
defined for all $x \in \mathbb{R}$, but $f$ itself is not defined at $x = 1$ because that would entail
division by zero. As such, $f$ must be a function from $\{x \in \mathbb{R} \mid x \ne 1\}$ to $\mathbb{R}$. Indeed, if
we are considering values of $x$ close to one, i.e. $x \approx 1$, we could say that
\[ f(x) \approx \frac{1 + 1}{x - 1} = \frac{2}{x - 1}, \]
and so we see that:

If we let $x$ go to one from values of $x$ that are larger than one (here we say $x$
goes to 1 from above and write $x \to 1^+$), we see that $x - 1$ is positive and
getting very small, which means that $f(x)$ itself is positive and getting very
large. That is, $f(x)$ is getting arbitrarily large as $x$ goes to 1 from above and we
write this as $f(x) \to \infty$ as $x \to 1^+$.
If we let $x$ go to one from values of $x$ that are smaller than one (here we say $x$
goes to 1 from below and write $x \to 1^-$), we see that $x - 1$ is negative and
getting very small in magnitude, which means that $f(x)$ itself is negative and getting very
large in magnitude. That is, $f(x)$ is negative but getting arbitrarily large in
magnitude as $x$ goes to 1 from below and we write this as $f(x) \to -\infty$ as
$x \to 1^-$.

As such, we see that $f(x)$ has a vertical asymptote at the point $x = 1$ where it is
undefined. The graph of this function is illustrated in Figure 2.9(a) so that you can
see this asymptote and you will understand why its graph looks like this away from
the asymptote after you have covered the material in Section 2.2.4.

For $g(x)$, the polynomials in the numerator and the denominator of the quotient are
defined for all $x \in \mathbb{R}$, but $g$ itself is not defined at $x = 1$ because, again, that would
entail division by zero. As such, $g$ must also be a function from $\{x \in \mathbb{R} \mid x \ne 1\}$ to $\mathbb{R}$.
However, in this case, we notice that $x = 1$ makes both the numerator and the
denominator equal to zero and so, in particular, $x = 1$ must be a root of the
numerator. This means that, if we factorise the numerator, we find that
\[ g(x) = \frac{x^2 + x - 2}{x - 1} = \frac{(x + 2)(x - 1)}{x - 1}, \]
and so, as long as $x \ne 1$, we have
\[ g(x) = x + 2. \]
As such, where it is defined (i.e. for $x \ne 1$) the graph of $g$ is a straight line like the
one sketched in Figure 2.9(b) although, of course, we must exclude the point $(1, 3)$
from this line as $g(x)$ is not defined there. In particular, note that in this case the
function does not have a vertical asymptote at $x = 1$ even though it is undefined
there.
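The contrast between the two functions in Example 2.2 can be seen numerically by evaluating them at points ever closer to $x = 1$. This is only a sketch: the step sizes are arbitrary choices, and a table of values suggests the behaviour without proving it.

```python
def f(x):
    return (x + 1) / (x - 1)

def g(x):
    return (x**2 + x - 2) / (x - 1)

# Approach x = 1 from above (1 + h) and from below (1 - h).
for h in (0.1, 0.01, 0.001):
    print(h, f(1 + h), f(1 - h), g(1 + h), g(1 - h))

# Neither function can be evaluated at x = 1 itself.
for function in (f, g):
    try:
        function(1)
    except ZeroDivisionError:
        print(function.__name__, "is undefined at x = 1")
```

The values of `f` grow without bound in magnitude, with opposite signs on either side, whereas the values of `g` settle towards 3, which matches the excluded point $(1, 3)$ on the line $y = x + 2$.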
We will look at asymptotes in more detail when we see them again in Section 2.2.4 and
Section 4.4.
Figure 2.9: The graphs of the functions $f(x)$ and $g(x)$ from Example 2.2. In (a), the
vertical asymptote at $x = 1$ is indicated by a dashed line. In (b), the point where the
function is undefined is indicated by an open circle.
We can also form quotients using trigonometric functions and, in particular, we can use
the triangle in Figure 2.4 to see that
\[ \tan\theta = \frac{\text{opposite}}{\text{adjacent}} = \frac{\text{opposite}/\text{hypotenuse}}{\text{adjacent}/\text{hypotenuse}} = \frac{\sin\theta}{\cos\theta}, \]
that is, we can think of the tangent function as the quotient
\[ \tan\theta = \frac{\sin\theta}{\cos\theta}, \tag{2.1} \]
which will be defined for $\theta \in \mathbb{R}$ as long as $\cos\theta \ne 0$, i.e. as long as $\theta \ne (2n + 1)\frac{\pi}{2}$ for
$n \in \mathbb{Z}$. At the points where it is undefined this function has vertical asymptotes and its
graph is sketched in Figure 2.10.

Figure 2.10: The graph of the tangent function, $\tan\theta$. Note the vertical
asymptotes when $\theta = (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.

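We can also find the reciprocals of our three trigonometric functions and these are
defined as follows.
The secant function, $\sec\theta = \dfrac{1}{\cos\theta}$, which is defined when $\theta \ne (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.
The cosecant function, $\operatorname{cosec}\theta = \dfrac{1}{\sin\theta}$, which is defined when $\theta \ne n\pi$ for $n \in \mathbb{Z}$.
The cotangent function, $\cot\theta = \dfrac{1}{\tan\theta}$, which is defined when $\theta \ne n\pi$ for $n \in \mathbb{Z}$.

These functions will be especially useful in Section 2.1.4.

Activity 2.4 Show that we also have $\cot\theta = \dfrac{\cos\theta}{\sin\theta}$ as long as $\theta \ne n\pi$ for $n \in \mathbb{Z}$.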

Compositions of functions
If we have two functions, say $f : A \to B$ and $g : B \to C$, then we can define the
composition $g \circ f : A \to C$ to be the function
\[ (g \circ f)(x) = g(f(x)), \]
and here we say that we are applying $g$ after $f$. That is, thinking of this in terms of
black boxes we have
\[ x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B \longrightarrow \boxed{g} \longrightarrow g(f(x)) \in C, \]
i.e. we take an $x \in A$ and apply $f$ to get the output $f(x) \in B$ which we then use as the
input for $g$ yielding the final output $g(f(x)) \in C$ which is the value of $(g \circ f)(x)$.

Example 2.3 Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be the functions $f(x) = x^2$ and
$g(x) = 2x - 1$. What are the functions $g \circ f$ and $f \circ g$?
Here, as the functions both go from $\mathbb{R}$ to $\mathbb{R}$, we can find both of these compositions.
In particular,
$g \circ f$ is the function where
\[ (g \circ f)(x) = g(f(x)) = g(x^2) = 2x^2 - 1, \]
where $(g \circ f) : \mathbb{R} \to \mathbb{R}$.
$f \circ g$ is the function where
\[ (f \circ g)(x) = f(g(x)) = f(2x - 1) = (2x - 1)^2, \]
and $(f \circ g) : \mathbb{R} \to \mathbb{R}$.

Indeed, observe that as $(2x - 1)^2 = 4x^2 - 4x + 1$, these are certainly not the same
function.
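The two compositions in Example 2.3 can be mirrored directly in code, which makes the point that the order of composition matters. The helper `compose` is a hypothetical name introduced for this sketch.

```python
def compose(outer, inner):
    """Return the function 'outer after inner', i.e. x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

def f(x):
    return x**2

def g(x):
    return 2 * x - 1

g_after_f = compose(g, f)   # x -> 2x^2 - 1
f_after_g = compose(f, g)   # x -> (2x - 1)^2

for x in (-1, 0, 2):
    print(x, g_after_f(x), f_after_g(x))
```

A single input where the two outputs differ is enough to show that the compositions are not the same function.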
Activity 2.5 Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be the functions $f(x) = x^2 + 1$ and
$g(x) = 2^x$. What are the functions $g \circ f$ and $f \circ g$?
In particular, we will also need to be able to identify compositions the other way when
we cover the chain rule in Section 3.2.2. For instance, it should be clear that the
function $(x^2 + 5)^3$ is the composition of the function $x^3$ after the function $x^2 + 5$.
Activity 2.6 Explain why the function $(x^2 + 5)^3$ is the composition of the function
$x^3$ after the function $x^2 + 5$.

2.1.3 Inverse functions

If $A$ and $B$ are sets and we have a function $f : A \to B$, we know that this means that
for every $x \in A$ there is a unique $y \in B$ such that $y = f(x)$. Now, if we can define
another function $g : B \to A$, i.e. for every $y \in B$ there is a unique $x \in A$ such that
\[ y = f(x) \quad\text{if and only if}\quad x = g(y), \]
then we call the function, $g$, the inverse of $f$ and denote it by $f^{-1}$. In terms of black
boxes, this means that we have
\[ x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B, \]
for $f$ and, if it exists, we have
\[ y \in B \longrightarrow \boxed{f^{-1}} \longrightarrow f^{-1}(y) \in A, \]
for $f^{-1}$, or more usefully,
\[ f(x) \in B \longrightarrow \boxed{f^{-1}} \longrightarrow x \in A. \]
In particular, this means that if the inverse, $f^{-1}$, of $f$ exists, we see that the
composition $f^{-1}$ after $f$ gives us
\[ x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B \longrightarrow \boxed{f^{-1}} \longrightarrow x \in A, \]
and so $(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x$ whereas the composition $f$ after $f^{-1}$ gives us
\[ y \in B \longrightarrow \boxed{f^{-1}} \longrightarrow f^{-1}(y) \in A \longrightarrow \boxed{f} \longrightarrow y \in B, \]
and so $(f \circ f^{-1})(y) = f(f^{-1}(y)) = y$. That is, the inverse of a function (if it exists)
undoes what the function does and vice versa.
The question, then, is how can we tell whether an inverse function exists? And, if it
does exist, how can we find it? Well, given the function $f : A \to B$, the inverse will exist
if we are able to take $y = f(x)$ and solve it to obtain a unique solution, $x$, in terms of $y$
for every $y \in B$. And, if we can do this, these solutions will tell us what the inverse
function is, i.e. they will allow us to identify the function, $f^{-1}(y)$, by comparison with
$x = f^{-1}(y)$. To make this clear, let's look at an example.
Example 2.4 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x + 2$. Explain why
this function has an inverse and find it.
Using the graph or common sense, we see that the function $f(x) = x + 2$ has an
inverse, since every $y \in \mathbb{R}$ where $y = f(x)$ gives rise to a unique $x \in \mathbb{R}$ given by
$x = y - 2$. As such, we can conclude that the inverse of this function exists and we
have $x = f^{-1}(y) = y - 2$. Of course, we can now write this inverse as $f^{-1}(x) = x - 2$
if we want it in terms of $x$.
Indeed, notice that, if we have the function $f(x)$ and its inverse function $f^{-1}(x)$, the
graph of $f^{-1}$ is the reflection of the graph of $f$ about the line $y = x$. This happens
because any point $(x, y)$ on the curve $y = f(x)$ becomes, under a reflection about the
line $y = x$, a point $(y, x)$ on the curve $x = f(y)$ which is the same as saying that
$y = f^{-1}(x)$!
Activity 2.7 Verify that the curve $y = f^{-1}(x)$ is the reflection about the line $y = x$
of the curve $y = f(x)$ using the function we saw in Example 2.4.
Of course, not every function has an inverse as the next example shows.
Example 2.5 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$. Explain why
this function does not have an inverse.
If we take any $y \in \mathbb{R}$ where $y = f(x)$ this gives us the equation $y = x^2$ and, if we are
considering $x \in \mathbb{R}$, this gives rise to a problem as far as the inverse of $f$ is concerned
because:
If $y < 0$, we get no solution for $x$ as we know that $x \in \mathbb{R}$ means that $y = x^2 \ge 0$.
If $y > 0$, we get two solutions for $x$ as we know that we can get $x = \pm\sqrt{y} \in \mathbb{R}$.

That is, we can find no inverse in this case since we cannot guarantee a unique
solution for $x \in \mathbb{R}$ from the equation $y = x^2$ for all $y \in \mathbb{R}$.
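The difference between Examples 2.4 and 2.5 can be sketched by counting the solutions of $y = f(x)$ for a given $y$. The function names below are hypothetical and chosen only for this illustration.

```python
import math

def f(x):
    return x + 2

def f_inverse(y):
    # Solving y = x + 2 for x gives the unique solution x = y - 2.
    return y - 2

def solutions_of_square(y):
    """Return every real x with x**2 == y."""
    if y < 0:
        return []                 # no real solution
    if y == 0:
        return [0.0]              # exactly one solution
    root = math.sqrt(y)
    return [-root, root]          # two solutions

# For f(x) = x + 2, applying f and then its inverse returns the input.
print(all(f_inverse(f(x)) == x for x in (-3, 0, 7)))

# For x^2, some outputs have no pre-image and others have two.
print(solutions_of_square(-4), solutions_of_square(9))
```

An inverse needs exactly one solution for every $y$ in the co-domain, which is why the first function has one and the second, as a function from $\mathbb{R}$ to $\mathbb{R}$, does not.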

Of course, we can usually get around such problems if we are prepared to restrict the
domain and the co-domain of the function. But, in that case, we would be finding the
appropriate local inverses as opposed to its inverse (which, remember, doesn't exist!).
Activity 2.8 By considering the domains $(-\infty, 0]$ and $[0, \infty)$ and suitably
restricting the co-domain of the function in Example 2.5, find its local inverses.
Let's now look at the inverses of the elementary functions we considered in Section 2.1.1.
Power functions: root functions
If we have the power function $f(x) = x^n$ where $n \in \mathbb{N}$ and $f : [0, \infty) \to [0, \infty)$ we can
see that the inverse is given by
\[ x = f^{-1}(y) = y^{1/n}, \]
and this is called a root function. Thus, we have
\[ x = y^{1/n} \quad\text{if and only if}\quad y = x^n, \]
provided that $x, y \ge 0$. In particular, if $n = 2$, this is the square root function, i.e. we
have $y^{1/2} = \sqrt{y}$.
Activity 2.9 Draw the graph of the power function $f : [0, \infty) \to [0, \infty)$ where
$f(x) = x^2$ and its inverse.
This also works for $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ if $n$ is odd. But, if $n$ is even, the
function $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ does not have an inverse as we saw, for $n = 2$, in
Example 2.5.
Activity 2.10 Explain why we can find an inverse of the function $f : \mathbb{R} \to \mathbb{R}$ where
$f(x) = x^n$ if $n$ is odd. Why doesn't this work if $n$ is even?
Exponential functions: logarithmic functions
If we have the exponential function $f(x) = a^x$ where $f : \mathbb{R} \to (0, \infty)$ and $a \ne 1$ is a
positive real number, the inverse is the function $f^{-1} : (0, \infty) \to \mathbb{R}$ given by
\[ x = f^{-1}(y) = \log_a y, \]
which is the logarithm to base $a$. Thus, we have
\[ x = \log_a y \quad\text{if and only if}\quad y = a^x, \]
provided that $y > 0$.
Activity 2.11 Draw the graph of the exponential function $f : \mathbb{R} \to (0, \infty)$ where
$f(x) = 2^x$ and its inverse, $f^{-1}(x) = \log_2 x$ where $f^{-1} : (0, \infty) \to \mathbb{R}$.
In particular, we see from this that as
$(f \circ f^{-1})(x) = f(f^{-1}(x)) = x$ we have
\[ a^{\log_a x} = x, \]
and as $(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x$ we have
\[ \log_a a^x = x. \]
These results will be useful in Section 2.1.4 when we consider the laws of logarithms.
Trigonometric functions: inverse trigonometric functions
If we want to discuss the inverses of the trigonometric functions sine and cosine, it is
first necessary to restrict their domain due to their oscillatory nature. To do this, we
consider a certain interval of values of $\theta$, called the principal range, so that each value of
the function corresponds to a unique value of $\theta$. Indeed, for the:
sine function, we take the principal range to be the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$ so that the
function $\sin : [-\frac{\pi}{2}, \frac{\pi}{2}] \to [-1, 1]$ where $y = \sin\theta$ has an inverse. This inverse is
denoted by $\sin^{-1}$ (or arcsin) where $\sin^{-1} : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$. Thus, we have
\[ y = \sin\theta \quad\text{if and only if}\quad \theta = \sin^{-1} y, \]
provided that $-\frac{\pi}{2} \le \theta \le \frac{\pi}{2}$ and $-1 \le y \le 1$.

cosine function, we take the principal range to be the interval $[0, \pi]$ so that the
function $\cos : [0, \pi] \to [-1, 1]$ where $y = \cos\theta$ has an inverse. This inverse is
denoted by $\cos^{-1}$ (or arccos) where $\cos^{-1} : [-1, 1] \to [0, \pi]$. Thus, we have
\[ y = \cos\theta \quad\text{if and only if}\quad \theta = \cos^{-1} y, \]
provided that $0 \le \theta \le \pi$ and $-1 \le y \le 1$.

It will also be convenient for us to consider the inverse of the tangent function where, as
well as the oscillations, we need to take care to avoid the asymptotes that occur when
this function is undefined. As such, for the
tangent function, we take the principal range to be the interval $(-\frac{\pi}{2}, \frac{\pi}{2})$ so that the
function $\tan : (-\frac{\pi}{2}, \frac{\pi}{2}) \to \mathbb{R}$ where $y = \tan\theta$ has an inverse. This inverse is denoted
by $\tan^{-1}$ (or arctan) where $\tan^{-1} : \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$. Thus, we have
\[ y = \tan\theta \quad\text{if and only if}\quad \theta = \tan^{-1} y, \]
provided that $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$.

In particular, observe that $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$ are the inverses of the functions sin,
cos and tan respectively and not their reciprocals which we denoted by cosec, sec and
cot respectively in Section 2.1.2!

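Activity 2.12 Find the acute angles $\theta_1$, $\theta_2$ and $\theta_3$ where $\theta_1 = \sin^{-1}\frac{1}{2}$, $\theta_2 = \cos^{-1}\frac{1}{2}$
and $\theta_3 = \tan^{-1} 1$.
Also find $\operatorname{cosec}\theta_1$, $\sec\theta_2$ and $\cot\theta_3$.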

2.1.4 Identities

An expression such as
\[ (x + 1)^2 = x^2 + 2x + 1, \]
which is true for all $x$ is called an identity and, as you know, these are useful when we
need to simplify expressions. In particular, in Chapter 1 of 173 Algebra, you saw that
the power laws dictate that
\[ a^m a^n = a^{m+n}, \qquad \frac{a^m}{a^n} = a^{m-n} \qquad\text{and}\qquad (a^m)^n = a^{mn}, \]
and these are identities that work for any values of $a$, $m$ and $n$ for which both sides are
defined. Indeed, these laws allow us to simplify expressions that may result from
appropriate products, quotients and compositions of power functions or exponential
functions.
Activity 2.13 If $f(x) = x^3$, $g(x) = x^4$ and $h(x) = 2^x$, find the functions $(fg)(x)$,
$(f/g)(x)$ and $(g \circ h)(x)$ simplifying your answers as far as possible.
We now look at some other identities that will be useful in this course.
We now look at some other identities that will be useful in this course.
The laws of logarithms
For any positive real number $a \ne 1$, the laws of logarithms state that
\[ \log_a x + \log_a y = \log_a(xy), \qquad \log_a x - \log_a y = \log_a\!\left(\frac{x}{y}\right) \qquad\text{and}\qquad y\log_a x = \log_a(x^y), \]
provided that all of the terms involved are defined. As you may know, these laws are
easily derived from the power laws we saw above and the fact that
\[ a^{\log_a x} = x, \]
which we saw earlier in Section 2.1.3.
Activity 2.14 Derive the laws of logarithms from the power laws.

It is also useful to note that if $a, b \ne 1$ are positive real numbers, then we have the
change of base formula which states that
\[ \log_a x = \frac{\log_b x}{\log_b a}, \]
and this allows us to write logarithms to base $a$ in terms of logarithms to base $b$.
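These laws are easy to spot-check numerically, which can be a useful sanity check before attempting to derive them. In the sketch below, the helper `log_base` is a hypothetical name and the sample values are arbitrary; a handful of numerical checks is, of course, no substitute for a derivation.

```python
import math

def log_base(a, x):
    """Logarithm of x to base a, computed from natural logarithms."""
    return math.log(x) / math.log(a)

a, x, y = 2.0, 8.0, 5.0

# The three laws of logarithms.
print(math.isclose(log_base(a, x) + log_base(a, y), log_base(a, x * y)))
print(math.isclose(log_base(a, x) - log_base(a, y), log_base(a, x / y)))
print(math.isclose(y * log_base(a, x), log_base(a, x ** y)))

# Change of base, here from base a = 2 to base b = 10.
print(math.isclose(log_base(a, x), math.log10(x) / math.log10(a)))

# The logarithm undoes the exponential: a**(log_a x) = x.
print(math.isclose(a ** log_base(a, x), x))
```

Note that `log_base` is itself an instance of the change of base formula, with $b = e$.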

Activity 2.15 Derive the change of base formula for logarithms.

Trigonometric identities
There are also identities that allow us to simplify various expressions involving the
trigonometric functions. For instance, using the triangle in Figure 2.4, Pythagoras'
theorem allows us to see that
\[ \sin^2\theta + \cos^2\theta = \left(\frac{\text{opposite}}{\text{hypotenuse}}\right)^2 + \left(\frac{\text{adjacent}}{\text{hypotenuse}}\right)^2 = \frac{\text{opposite}^2 + \text{adjacent}^2}{\text{hypotenuse}^2} = \frac{\text{hypotenuse}^2}{\text{hypotenuse}^2} = 1, \]
and so, for acute angles,² we have shown that
\[ \sin^2\theta + \cos^2\theta = 1. \tag{2.2} \]
In particular, for natural numbers $n \ge 2$, note that we commonly abbreviate things like
$(\sin\theta)^n$ by writing them as $\sin^n\theta$. Further, dividing both sides of (2.2) by
$\sin^2\theta$ we get
\[ 1 + \cot^2\theta = \operatorname{cosec}^2\theta, \tag{2.3} \]
and this works as long as $\theta \ne n\pi$ for $n \in \mathbb{Z}$ whereas dividing both sides of (2.2)
by $\cos^2\theta$ we get
\[ \tan^2\theta + 1 = \sec^2\theta, \tag{2.4} \]
and this works as long as $\theta \ne (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$. We call these three identities the
Pythagorean identities as they are simple consequences of Pythagoras' theorem.
Activity 2.16 Use (2.2) to derive the Pythagorean identities (2.3) and (2.4).

Another useful pair of trigonometric identities are the compound-angle formulae given
by
\[ \sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi \quad\text{and}\quad \cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi, \]
which work for all $\theta, \phi \in \mathbb{R}$.

Activity 2.17 Observe from the graphs of the sine and cosine functions in
Figure 2.7 that sine is an odd function, i.e. $\sin(-\theta) = -\sin\theta$, and cosine is an even
function, i.e. $\cos(-\theta) = \cos\theta$. Use these facts and the compound-angle formulae to
show that we also have
\[ \sin(\theta - \phi) = \sin\theta\cos\phi - \cos\theta\sin\phi \quad\text{and}\quad \cos(\theta - \phi) = \cos\theta\cos\phi + \sin\theta\sin\phi, \]
for $\theta, \phi \in \mathbb{R}$.

² Of course, if we consider how we extend the definitions of the sine and cosine functions to all $\theta \in \mathbb{R}$,
it should be clear that this identity is actually true for all $\theta \in \mathbb{R}$.

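Usually, we summarise these four compound-angle formulae by writing them as
\[ \sin(\theta \pm \phi) = \sin\theta\cos\phi \pm \cos\theta\sin\phi \quad\text{and}\quad \cos(\theta \pm \phi) = \cos\theta\cos\phi \mp \sin\theta\sin\phi, \tag{2.5} \]
for $\theta, \phi \in \mathbb{R}$. Indeed, they are especially useful since, setting $\phi = \theta$, we can use them to
obtain the double-angle formulae
\[ \sin(2\theta) = 2\sin\theta\cos\theta \quad\text{and}\quad \cos(2\theta) = \cos^2\theta - \sin^2\theta, \tag{2.6} \]
which work for all $\theta \in \mathbb{R}$. These will be especially useful in Chapter 5.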
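The identities (2.2), (2.5) and (2.6) can be spot-checked at a few sample angles. The sketch below is a numerical check only, not a proof; the function name `check_identities`, the sample angles and the tolerance are all arbitrary choices made for this illustration.

```python
import math

def check_identities(theta, phi, tol=1e-12):
    """Check the identities numerically at the angles theta and phi."""
    s_t, c_t = math.sin(theta), math.cos(theta)
    s_p, c_p = math.sin(phi), math.cos(phi)

    def close(u, v):
        # An absolute tolerance is needed as some values are near zero.
        return math.isclose(u, v, abs_tol=tol)

    return {
        "pythagorean (2.2)": close(s_t**2 + c_t**2, 1.0),
        "sin(theta + phi)": close(math.sin(theta + phi), s_t * c_p + c_t * s_p),
        "sin(theta - phi)": close(math.sin(theta - phi), s_t * c_p - c_t * s_p),
        "cos(theta + phi)": close(math.cos(theta + phi), c_t * c_p - s_t * s_p),
        "cos(theta - phi)": close(math.cos(theta - phi), c_t * c_p + s_t * s_p),
        "sin(2 theta)": close(math.sin(2 * theta), 2 * s_t * c_t),
        "cos(2 theta)": close(math.cos(2 * theta), c_t**2 - s_t**2),
    }

for theta, phi in ((0.3, 1.1), (2.0, -0.7), (4.5, 3.3)):
    print(theta, phi, all(check_identities(theta, phi).values()))
```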


Activity 2.18 Use the compound-angle formulae to derive the double-angle
formulae given above.
Use the Pythagorean identity $\sin^2\theta + \cos^2\theta = 1$ to show that we also have
\[ \cos(2\theta) = 1 - 2\sin^2\theta \quad\text{and}\quad \cos(2\theta) = 2\cos^2\theta - 1, \]
for all $\theta \in \mathbb{R}$.

2.1.5 Applications of functions

In economics and related subjects, functions can be used to represent how one quantity
depends on another. For instance, as the profit that a company makes, $\pi$, would depend
on the quantity of goods sold, $q$, it makes sense to suppose that there is some function
of $q$, say $f$, that tells us the corresponding profit, $\pi$. In this case, we would use an
equation of the form $\pi = f(q)$ to express this dependency and we would have found a
profit function. Moreover, if $f$ is invertible, we could find its inverse function, $f^{-1}$, and
we would use this to find the value of $q$ that corresponds to a given value of $\pi$. In which
case, the dependency would now be given by an equation of the form $q = f^{-1}(\pi)$. We
will look at profit functions properly in Section 4.5.3, but for now, we consider another
application of functions in economics, namely how they can be used to represent
information about supply and demand in a market.
Supply and demand functions
In any given market, there is a good which is supplied by the producers (and demanded
by the consumers) and the general idea is that, for both supply (and demand), if
producers are charging (or consumers are buying) at a price of $p$ per-unit, then the level
of supply (or demand) for that good, $q$, will depend on $p$. Indeed, since each value of $p$
will lead the producers to supply (and the consumers to demand) exactly one quantity
$q$, it makes sense to think of the quantity, $q$, supplied (or demanded) as a function of
the price, $p$. This leads us to a description of the market in terms of two kinds of
function, namely:
If the quantity supplied, $q$, can be written in terms of $p$ then we can identify the
supply function, $q^S$, from the fact that we have $q = q^S(p)$. This tells us the
quantity, $q$, that the producers will supply if the prevailing market price is $p$.
If the quantity demanded, $q$, can be written in terms of $p$ then we can identify the
demand function, $q^D$, from the fact that we have $q = q^D(p)$. This tells us the
quantity, $q$, that the consumers will demand if the prevailing market price is $p$.
In particular, note that, although we have $q$ as a function of $p$ in both of these cases we
follow the practice common in economics and use the vertical axis for $p$ and the
horizontal axis for $q$ when drawing the graphs of these functions. As such, any point on
the graph of these functions is of the form $(q, p)$ where $q = q^S(p)$ for supply and
$q = q^D(p)$ for demand. Also, these functions and their graphs only make economic sense
when $p \ge 0$ and the quantities they yield, $q$, are also non-negative.³

Once we have these functions, we are often interested in the equilibrium point for
the market as this is the point where the supply and demand functions are equal. In
theory, this is the point, $(q^*, p^*)$, where the market stabilises since, at this point, the
per-unit price, $p^*$, is such that the levels of supply and demand are equal, i.e. we have
\[ q^S(p^*) = q^D(p^*). \]
As such, we can find the equilibrium price, $p^*$, by solving the resulting equation and the
corresponding equilibrium quantity, $q^*$, can then be found by, say, using the demand
function as $q^* = q^D(p^*)$. Let's look at a simple example.
Example 2.6 The supply and demand functions for a good are
\[ q^S(p) = p + 1 \quad\text{and}\quad q^D(p) = 3 - p, \]
respectively. Sketch the graphs of these functions and find the equilibrium point.
Here the supply and demand functions are straight lines which can easily be
sketched using the method outlined in Example 2.1 and the results of doing this are
illustrated in Figure 2.11. To find the equilibrium price, $p^*$, we have
\[ q^S(p^*) = q^D(p^*) \implies p^* + 1 = 3 - p^* \implies 2p^* = 2, \]
and so, $p^* = 1$. Then using the demand function, say, we have
\[ q^* = q^D(p^*) = 3 - p^*, \]
and so the equilibrium quantity is $q^* = 3 - 1 = 2$. Consequently, the equilibrium
point is $(q^*, p^*) = (2, 1)$ which, as indicated in Figure 2.11, is the point at which the
two straight lines intersect.
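For linear supply and demand functions such as those in Example 2.6, finding the equilibrium is a matter of solving one linear equation. The sketch below assumes both functions have the form $q = mp + c$; the helper name `linear_equilibrium` and its (slope, intercept) convention are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def linear_equilibrium(supply, demand):
    """Equilibrium (q*, p*) for linear supply and demand functions.

    Each argument is a (slope, intercept) pair describing q = slope*p + intercept.
    The slopes are assumed to differ, so that there is a unique price
    at which the quantities supplied and demanded are equal.
    """
    (m_s, c_s), (m_d, c_d) = supply, demand
    # Solve m_s*p + c_s = m_d*p + c_d for the equilibrium price.
    p_star = Fraction(c_d - c_s, m_s - m_d)
    # Use the demand function to find the equilibrium quantity.
    q_star = m_d * p_star + c_d
    return q_star, p_star

# Supply q = p + 1 has slope 1 and intercept 1;
# demand q = 3 - p has slope -1 and intercept 3.
q_star, p_star = linear_equilibrium((1, 1), (-1, 3))
print(q_star, p_star)
```

The result is returned in the order $(q^*, p^*)$ to match the convention of plotting $q$ on the horizontal axis and $p$ on the vertical axis.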
Usually, the supply and demand functions are invertible and so we can also find the
inverses of these functions. In particular, if they are invertible, we note that:
If the price, $p$, can be written in terms of $q$ then we can identify the inverse supply
function, $p^S$, from the fact that we have $p = p^S(q)$. This tells us the price, $p$, that
the producers will charge if the quantity being supplied is $q$.
If the price, $p$, can be written in terms of $q$ then we can identify the inverse
demand function, $p^D$, from the fact that we have $p = p^D(q)$. This tells us the price,
$p$, that the consumers will pay if the quantity being demanded is $q$.

Figure 2.11: A sketch of the graphs of the supply and demand functions in Example 2.6
indicating the equilibrium point for this market. (Note that this sketch only makes
economic sense when $p \ge 0$.)

³ Although, when drawing their graphs, it is often useful to consider all possible values of $p$ and $q$
before restricting your attention to the economically meaningful ones where $p, q \ge 0$!

Activity 2.19 Decide whether the supply and demand functions in the example
above are invertible. If they are, find the inverse supply and demand functions.
The effects of taxation
Sometimes, in order to control a market, a government will impose an excise tax of $T$
per unit sold. We model such situations by assuming that the tax is paid to the
government by the supplier and so, if the price paid by the consumers in the presence of
this tax is $p$ per unit, the suppliers effectively receive $p - T$ for each unit sold as they
must pay $T$ of each $p$ received to the government. As such, the supply and demand
functions in the presence of the tax, let's call them $q^S_T(p)$ and $q^D_T(p)$ respectively, will be
given by
\[ q^S_T(p) = q^S(p - T) \quad\text{and}\quad q^D_T(p) = q^D(p). \]
That is, the consumers still pay a price of $p$ per unit and so the demand function is
unchanged, but the suppliers now only receive an amount $p - T$ per unit and so the
supply function is modified by the introduction of an excise tax. Of course, the
introduction of an excise tax will affect the equilibrium price and quantity for a market,
i.e. in the presence of such a tax, the new equilibrium point, let's call it $(q^*_T, p^*_T)$, will be
the point where
\[ q^S_T(p^*_T) = q^D_T(p^*_T) \quad\text{or, equivalently,}\quad q^S(p^*_T - T) = q^D(p^*_T), \]
and, using the unchanged demand function, $q^*_T = q^D_T(p^*_T)$ or, equivalently, $q^*_T = q^D(p^*_T)$.
Let's look at how such a tax would affect the market we considered in Example 2.6.


Example 2.7 An excise tax of $T$ per unit is imposed on the market in Example 2.6. Find the new equilibrium point and, by sketching the graph of the new supply function on your earlier sketch, comment on how the equilibrium point for the market has changed. How much of the tax has been passed on to the consumers? What is the maximum tax, $T_m$, that can be imposed if this market is to continue functioning?
If an excise tax of $T$ per unit is imposed, the demand function is still
\[ q^D_T(p) = q^D(p) = 3 - p, \]
but the supply function becomes
\[ q^S_T(p) = q^S(p - T) = p - T + 1, \]
as the suppliers now see an effective price of $p - T$. This means that the equilibrium price in the presence of the tax, $p_T$, is given by
\[ q^S_T(p_T) = q^D_T(p_T) \implies p_T - T + 1 = 3 - p_T \implies 2p_T = 2 + T \implies p_T = 1 + \frac{T}{2}, \]
and so the equilibrium quantity in the presence of tax, $q_T$, is
\[ q_T = q^D_T(p_T) = 3 - \left(1 + \frac{T}{2}\right) = 2 - \frac{T}{2}, \]
if we use the demand function, $q^D_T(p)$ (see footnote 4). Thus, the new equilibrium point is $(2 - T/2, 1 + T/2)$. Sketching the graph of the new supply function, as in Figure 2.12, we see that it is parallel to the old one and the $p$-intercept has increased by $T$. Indeed, as the equilibrium price has increased from $1$ to $1 + T/2$ due to the presence of the tax, half the tax has been passed on to the consumer. Of course, the equilibrium quantity in the presence of the tax must be positive and so, for the market to function, we require that
\[ q_T > 0 \implies 2 - \frac{T}{2} > 0 \implies T < 4, \]
i.e. the maximum tax, $T_m$, that can be imposed is given by $T_m = 4$.
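As a quick numerical check of Example 2.7, the following Python sketch solves the equilibrium condition for linear supply and demand functions in the presence of an excise tax. The function name `excise_tax_equilibrium` and the parameterisation $q^S(p) = a + bp$, $q^D(p) = c - dp$ are our own illustrative assumptions and are not part of the guide.

```python
from fractions import Fraction


def excise_tax_equilibrium(a, b, c, d, tax):
    """Return (q_T, p_T) for supply q = a + b*p and demand q = c - d*p
    when the supplier pays an excise tax of `tax` per unit sold.

    The parameter names a, b, c, d are hypothetical and chosen for this sketch.
    """
    a, b, c, d, tax = map(Fraction, (a, b, c, d, tax))
    # With the tax the supplier sees the price p - tax, so we solve
    # a + b*(p - tax) = c - d*p for p.
    p_t = (c - a + b * tax) / (b + d)
    # The demand function is unchanged by the tax.
    q_t = c - d * p_t
    return q_t, p_t


# Example 2.7 uses q^S(p) = p + 1 and q^D(p) = 3 - p.
for tax in (0, 1, 2, 4):
    q_t, p_t = excise_tax_equilibrium(1, 1, 3, 1, tax)
    print(f"T = {tax}: (q_T, p_T) = ({q_t}, {p_t})")
```

Exact fractions are used so that the output can be compared directly with the formulae $q_T = 2 - T/2$ and $p_T = 1 + T/2$ found above.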


Footnote 4: Alternatively, we could use the supply function
\[ q^S_T(p) = q^S(p - T) = p - T + 1, \]
to find $q_T$. However, we cannot use $q^S(p) = p + 1$ as this no longer holds in the presence of the tax!

[Figure 2.12: Following on from the sketch in Figure 2.11, if an excise tax of $T$ per unit is imposed, the supply set changes as shown (from the line S to the line "new S") and the demand set (the line D) stays the same. Observe how the introduction of this tax affects the equilibrium point for this market, which is now $(2 - \frac{1}{2}T, 1 + \frac{1}{2}T)$. (Note that this sketch only makes economic sense when $p \geq 0$.)]

Alternatively, the government may decide to impose a percentage of the price tax of $100r\%$ (so, for instance, a tax of 5% of the price would correspond to $r = 0.05$) instead of the per unit tax that we have considered so far. So, again assuming that the tax is paid to the government by the supplier, if the price paid by the consumers in the presence of this tax is $p$ per unit, the suppliers effectively receive $p - rp$ for each unit sold as they must pay $rp$ of each $p$ received to the government. As such, the supply and demand functions in the presence of such a tax, let's call them $q^S_r(p)$ and $q^D_r(p)$ respectively, will be given by
\[ q^S_r(p) = q^S(p - rp) \quad\text{and}\quad q^D_r(p) = q^D(p). \]
That is, once again, the consumers still pay a price of $p$ per unit and so the demand function is unchanged, but the suppliers now only receive an amount $p - rp$ per unit and so the supply function is modified by the introduction of a percentage of the price tax. Of course, the introduction of this tax will also affect the equilibrium price and quantity for the market, i.e. in the presence of such a tax, the new equilibrium point, let's call it $(q_r, p_r)$, will be the point where
\[ q^S_r(p_r) = q^D_r(p_r) \quad\text{or, equivalently,}\quad q^S(p_r - rp_r) = q^D(p_r), \]
and, using the unchanged demand function, $q_r = q^D_r(p_r)$ or, equivalently, $q_r = q^D(p_r)$. See, for example, Exercise 2.3 at the end of this chapter.

2.2 Conic sections

So far, we have been dealing with functions that are explicitly defined in terms of an independent variable but, sometimes, we may have an equation relating two variables, say $x$ and $y$, which implicitly defines $y$ as one or more functions of $x$. As it will be useful in various places, we now investigate some important instances of functions defined in this way and their graphs, the so-called conic sections (see footnote 5).

Footnote 5: See, for example, Binmore and Davies (2002) Section 2.14 for a full discussion of the geometric aspects of conic sections and where they come from. Although this is interesting, we will not be delving into these overly geometric aspects of conic sections in this course.

2.2.1 Parabolae

A parabola is a curve whose equation has the form
\[ y = ax^2 + bx + c, \]
where $a \neq 0$, $b$ and $c$ are constants. Indeed, if we complete the square, we can write this in the form
\[ y = a(x - p)^2 + q, \]
for some constants $p$ and $q$. This curve will have a $y$-intercept which we can find by setting $x = 0$ and it may have $x$-intercepts which, if they exist, we can find by setting $y = 0$. It will also have a turning point with coordinates $(p, q)$ which will be a minimum if $a > 0$ and a maximum if $a < 0$. Once we have this information, the parabola should be easy to draw as the next example shows.
Example 2.8 Sketch the parabolae whose equations are
(a) $y = x^2 - 4x + 3$, and
(b) $y = -x^2 + 2x + 3$.
For (a), we are told that $y = x^2 - 4x + 3$ and so we find that:
For the $y$-intercept: Setting $x = 0$ we get $y = 3$.
For the $x$-intercepts: Setting $y = 0$ we get
\[ x^2 - 4x + 3 = 0 \implies (x - 1)(x - 3) = 0, \]
i.e. the $x$-intercepts are $x = 1$ and $x = 3$.
The turning point of the parabola can be found by writing the equation of the parabola in completed square form and, doing this, we get
\[ y = (x - 2)^2 - 1. \]
Here, $a = 1 > 0$ and so we get a minimum at the point $(2, -1)$.
Putting this information together, we then get the sketch in Figure 2.13(a).
For (b), we are told that $y = -x^2 + 2x + 3$ and so we find that:
For the $y$-intercept: Setting $x = 0$ we get $y = 3$.
For the $x$-intercepts: Setting $y = 0$ we get
\[ -x^2 + 2x + 3 = 0 \implies x^2 - 2x - 3 = 0 \implies (x + 1)(x - 3) = 0, \]
i.e. the $x$-intercepts are $x = -1$ and $x = 3$.
The turning point of the parabola can be found by writing the equation of the parabola in completed square form and, doing this, we get
\[ y = -x^2 + 2x + 3 = -[x^2 - 2x] + 3 = -[(x - 1)^2 - 1] + 3 = -(x - 1)^2 + 1 + 3 = -(x - 1)^2 + 4. \]
Here, $a = -1 < 0$ and so we get a maximum at the point $(1, 4)$.
Putting this information together, we then get the sketch in Figure 2.13(b).
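The steps used in Example 2.8 can also be sketched in code. The following Python function is a minimal illustration, under our own naming (`parabola_features` is hypothetical and not part of the guide), of how the turning point and intercepts follow from the coefficients $a$, $b$ and $c$.

```python
import math


def parabola_features(a, b, c):
    """Turning point, y-intercept and x-intercepts of y = a*x^2 + b*x + c."""
    if a == 0:
        raise ValueError("a must be non-zero for a parabola")
    # Completing the square gives y = a*(x - p)^2 + q with:
    p = -b / (2 * a)
    q = c - b * b / (4 * a)
    # Real x-intercepts exist only when the discriminant is non-negative.
    disc = b * b - 4 * a * c
    if disc < 0:
        x_intercepts = []
    elif disc == 0:
        x_intercepts = [p]
    else:
        s = math.sqrt(disc)
        x_intercepts = sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])
    return {
        "turning_point": (p, q),
        "kind": "minimum" if a > 0 else "maximum",
        "y_intercept": c,
        "x_intercepts": x_intercepts,
    }


print(parabola_features(1, -4, 3))   # Example 2.8(a)
print(parabola_features(-1, 2, 3))   # Example 2.8(b)
```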

[Figure 2.13: In (a) we have a sketch of the parabola $y = x^2 - 4x + 3$ from Example 2.8(a). In (b) we have a sketch of the parabola $y = -x^2 + 2x + 3$ from Example 2.8(b).]

Activity 2.20 Given the equation of a parabola in completed square form, i.e.
\[ y = a(x - p)^2 + q, \]
use the fact that $(x - p)^2 \geq 0$ for all $x \in \mathbb{R}$ to explain why the turning point of this parabola will be a minimum if $a > 0$ and a maximum if $a < 0$.

2.2.2 Circles

A circle of radius $r$, centred on the point $(a, b)$, has an equation given by
\[ (x - a)^2 + (y - b)^2 = r^2. \]
Its $x$ and $y$-intercepts can be found by seeing where $y = 0$ and where $x = 0$ respectively. Once we have this information, the circle should be easy to draw as the next example shows.
Example 2.9 Find the radius and centre of the circle whose equation is given by
\[ x^2 - 6x + y^2 - 8y = 0. \]
Sketch the circle.
We are told that
\[ x^2 - 6x + y^2 - 8y = 0, \]
is the equation of a circle and so, completing the square in $x$ and $y$, we find that
\[ (x - 3)^2 - 9 + (y - 4)^2 - 16 = 0 \implies (x - 3)^2 + (y - 4)^2 = 25, \]
and so, comparing this with $(x - a)^2 + (y - b)^2 = r^2$ we see that we have a circle of radius 5 centred on the point $(3, 4)$. We also find that:
For the $x$-intercepts: Setting $y = 0$ we get
\[ (x - 3)^2 + 16 = 25 \implies (x - 3)^2 = 9 \implies x - 3 = \pm 3, \]
i.e. the $x$-intercepts are $x = 6$ and $x = 0$.
For the $y$-intercepts: Setting $x = 0$ we get
\[ 9 + (y - 4)^2 = 25 \implies (y - 4)^2 = 16 \implies y - 4 = \pm 4, \]
i.e. the $y$-intercepts are $y = 8$ and $y = 0$.
Putting this information together, we then get the sketch in Figure 2.14(a).
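The completing-the-square step in Example 2.9 can be sketched in Python as follows. We assume, purely for illustration, that the circle is given in the general form $x^2 + y^2 + dx + ey + f = 0$; the function name `circle_from_general_form` and the coefficient names are hypothetical.

```python
import math


def circle_from_general_form(d, e, f):
    """Centre and radius of x^2 + y^2 + d*x + e*y + f = 0.

    Completing the square gives (x + d/2)^2 + (y + e/2)^2 = d^2/4 + e^2/4 - f.
    """
    a, b = -d / 2, -e / 2
    r_squared = a * a + b * b - f
    if r_squared <= 0:
        raise ValueError("the equation does not describe a circle of positive radius")
    return (a, b), math.sqrt(r_squared)


# Example 2.9: x^2 - 6x + y^2 - 8y = 0.
centre, radius = circle_from_general_form(-6, -8, 0)
print(centre, radius)
```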
[Figure 2.14: In (a) we have a sketch of the circle from Example 2.9. In (b) we have a sketch of the ellipse from Example 2.10.]

2.2.3 Ellipses

An ellipse has an equation of the form
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \]
In particular, an ellipse of this form is effectively a circle centred on the origin that has been squashed and it is easy to draw once we have found its $x$ and $y$-intercepts by seeing where $y = 0$ and where $x = 0$ respectively.
Example 2.10 Sketch the ellipse whose equation is given by
\[ \frac{x^2}{4} + \frac{y^2}{9} = 1. \]
Given that the equation of the ellipse is
\[ \frac{x^2}{4} + \frac{y^2}{9} = 1, \]
we see that the $x$-intercepts, which occur when $y = 0$, are given by
\[ \frac{x^2}{4} = 1 \implies x^2 = 4 \implies x = \pm 2, \]
whereas the $y$-intercepts, which occur when $x = 0$, are given by
\[ \frac{y^2}{9} = 1 \implies y^2 = 9 \implies y = \pm 3. \]
Putting this information together, and bearing in mind that this should look like a circle centred on the origin that has been squashed, we then get the sketch in Figure 2.14(b).

2.2.4 Hyperbolae

A hyperbola can have an equation of the form
\[ \frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \]
This curve will have $x$-intercepts which can be found by setting $y = 0$, but no $y$-intercepts. It will also have slant (or oblique) asymptotes which can be found by writing the equation as
\[ \frac{y^2}{x^2} = b^2 \left( \frac{1}{a^2} - \frac{1}{x^2} \right), \]
so that, as $x \to \pm\infty$, we have $1/x^2 \to 0$ and this leaves us with
\[ \frac{y^2}{x^2} = \frac{b^2}{a^2} \implies y = \pm\frac{b}{a}x, \]
as the equations of the asymptotes. Once we have this information, the hyperbola should be easy to draw as the next example shows.
Example 2.11 Sketch the hyperbola whose equation is given by
\[ \frac{x^2}{4} - \frac{y^2}{9} = 1. \]
Given that the equation of the hyperbola is
\[ \frac{x^2}{4} - \frac{y^2}{9} = 1, \]
we see that the $x$-intercepts, which occur when $y = 0$, are given by
\[ \frac{x^2}{4} = 1 \implies x^2 = 4 \implies x = \pm 2, \]
whereas there are no $y$-intercepts since, setting $x = 0$, we get
\[ -\frac{y^2}{9} = 1 \implies y^2 = -9, \]
which has no real solutions. To find the asymptotes, we write the equation as
\[ \frac{y^2}{x^2} = 9 \left( \frac{1}{4} - \frac{1}{x^2} \right), \]
so that, as $x \to \pm\infty$, we have $1/x^2 \to 0$ and this leaves us with
\[ \frac{y^2}{x^2} = \frac{9}{4} \implies y = \pm\frac{3}{2}x, \]
as the equations of the asymptotes. Putting this information together, we then get the sketch in Figure 2.15(a).
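To see numerically why $y = \frac{3}{2}x$ acts as an asymptote in Example 2.11, the following Python sketch compares the upper branch of the hyperbola with this line for increasing $x$. The helper `upper_branch` is our own illustrative name and simply solves the equation for $y \geq 0$ when $x \geq 2$.

```python
import math


def upper_branch(x):
    """Upper half (y >= 0) of x^2/4 - y^2/9 = 1, valid for x >= 2."""
    return 3 * math.sqrt(x * x / 4 - 1)


# The gap between the asymptote y = 3x/2 and the curve shrinks as x grows.
for x in (2, 10, 100, 1000):
    gap = 1.5 * x - upper_branch(x)
    print(f"x = {x:5d}: asymptote - curve = {gap:.6f}")
```

The gap stays positive but decreases towards zero, which is what we mean when we say that the curve approaches the line as $x \to \infty$.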

[Figure 2.15: In (a) we have a sketch of the hyperbola $\frac{x^2}{4} - \frac{y^2}{9} = 1$ from Example 2.11. In (b) we have a sketch of the rectangular hyperbola $y = 1 + \frac{2}{x - 1}$ from Example 2.12.]

Of course, similar remarks apply to a hyperbola which has an equation of the form
\[ \frac{y^2}{b^2} - \frac{x^2}{a^2} = 1, \]
and, in particular, this curve will have $y$-intercepts but no $x$-intercepts.

Activity 2.21 Sketch the hyperbola whose equation is given by
\[ \frac{y^2}{9} - \frac{x^2}{4} = 1. \]

Lastly, we note that a rectangular hyperbola has an equation of the form
\[ (x - a)(y - b) = c, \]
and this arises when the asymptotes turn out to be the horizontal line $y = b$ and the vertical line $x = a$ as the next example illustrates.
Example 2.12 Sketch the rectangular hyperbola whose equation is given by
\[ (x - 1)(y - 1) = 2. \]
Given that $(x - 1)(y - 1) = 2$, we can see that:
For the $x$-intercept: Setting $y = 0$ we get $(x - 1)(-1) = 2$ or $x - 1 = -2$, i.e. the $x$-intercept is given by $x = -1$.
For the $y$-intercept: Setting $x = 0$ we get $(-1)(y - 1) = 2$ or $y - 1 = -2$, i.e. the $y$-intercept is given by $y = -1$.
Then, by writing the equation as
\[ y = 1 + \frac{2}{x - 1}, \]
we can find the asymptotes by noting that:
For the vertical asymptote: As $x \to 1^+$ we have $y \to \infty$ and as $x \to 1^-$ we have $y \to -\infty$.
For the horizontal asymptote: As $x \to \infty$ we have $y \to 1$ from above and as $x \to -\infty$ we have $y \to 1$ from below.
Putting this information together, we then get the sketch in Figure 2.15(b). In particular, observe that here we have
\[ y = 1 + \frac{2}{x - 1} = \frac{(x - 1) + 2}{x - 1} = \frac{x + 1}{x - 1}, \]
and so this gives us $y = f(x)$ where $f(x)$ is the first function in Example 2.2 which was illustrated in Figure 2.9(a).

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
identify elementary functions and sketch their graphs;
find combinations of elementary functions and inverses (if they exist);
use identities to rewrite expressions involving powers, logarithms and trigonometric
functions;
solve problems from economics-based subjects that involve functions;
identify and sketch conic sections.

Solutions to activities
Solution to activity 2.1
Using the triangles in Figure 2.5 and the definition of the tangent function, it should be clear that
\[ \tan\frac{\pi}{6} = \frac{1}{\sqrt{3}}, \quad \tan\frac{\pi}{4} = 1 \quad\text{and}\quad \tan\frac{\pi}{3} = \sqrt{3}. \]
Indeed, using the fact that
\[ \text{angle in radians} = \frac{2\pi}{360} \times \text{angle in degrees}, \]
we can see that an angle of $\pi/6$, $\pi/4$ or $\pi/3$ radians corresponds to an angle of 30, 45 or 60 degrees respectively.
Solution to activity 2.2
In this case, the unit circle method gives us the situation illustrated in Figure 2.16 and so the angle in the triangle would be $\pi/3$ (i.e. $\pi - \frac{2\pi}{3} = \frac{\pi}{3}$ as the angle subtended by a straight line, in this case the $x$-axis, is $\pi$) giving $x$ a magnitude of $1/2$ and $y$ a magnitude of $\sqrt{3}/2$ whereas their signs would be negative for $x$ (as $x < 0$) and positive for $y$ (as $y > 0$). Thus we see that
\[ \sin\frac{2\pi}{3} = \frac{\sqrt{3}}{2} \quad\text{and}\quad \cos\frac{2\pi}{3} = -\frac{1}{2}, \]
using the unit circle method.

[Figure 2.16: For Activity 2.2, we find $\sin\theta$ and $\cos\theta$ when $\theta = 2\pi/3$ by considering a unit circle.]
Solution to activity 2.3
Using the unit circle in Figure 2.17(a), it should be clear that
\[ \sin 0 = 0 \quad\text{and}\quad \cos 0 = 1, \]
whereas using the unit circle in Figure 2.17(b), it should be clear that
\[ \sin\frac{\pi}{2} = 1 \quad\text{and}\quad \cos\frac{\pi}{2} = 0. \]
Then, using similar reasoning, we should be able to deduce that

θ        sin θ    cos θ
π        0        −1
3π/2     −1       0
2π       0        1

are the other values of the functions $\sin\theta$ and $\cos\theta$ that we seek.
Solution to activity 2.4
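From the definition of $\cot\theta$, we have
\[ \cot\theta = \frac{1}{\tan\theta} = \frac{1}{\;\dfrac{\sin\theta}{\cos\theta}\;} = \frac{\cos\theta}{\sin\theta}, \]
as we know that $\tan\theta = \sin\theta / \cos\theta$. This function is defined as long as $\theta \neq n\pi$ for $n \in \mathbb{Z}$ since, at these values of $\theta$, we have $\tan\theta = 0$ or, equivalently, $\sin\theta = 0$.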

Solution to activity 2.5
Given the functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ where $f(x) = x^2 + 1$ and $g(x) = 2^x$, we see that
$g \circ f$ is the function where
\[ (g \circ f)(x) = g(f(x)) = g(x^2 + 1) = 2^{x^2 + 1}, \]
where $(g \circ f) : \mathbb{R} \to \mathbb{R}$.
$f \circ g$ is the function where
\[ (f \circ g)(x) = f(g(x)) = f(2^x) = (2^x)^2 + 1 = 2^{2x} + 1, \]
and $(f \circ g) : \mathbb{R} \to \mathbb{R}$.
Indeed, observe that as $2^{x^2 + 1} \neq 2^{2x} + 1$, these are certainly not the same function.

[Figure 2.17: For Activity 2.3, we find $\sin\theta$ and $\cos\theta$ by considering a unit circle when (a) $\theta = 0$ and (b) $\theta = \pi/2$.]

Solution to activity 2.6


If we have $f(x) = x^3$ and $g(x) = x^2 + 5$, then the function $(x^2 + 5)^3$ can be written as
\[ (x^2 + 5)^3 = f(x^2 + 5) = f(g(x)) = (f \circ g)(x), \]
i.e. it is the composition we get from applying $f$ after $g$ or, in terms of $x$, it is the composition of the function $x^3$ after the function $x^2 + 5$.
Solution to activity 2.7
By considering the graphs of the functions $f(x) = x + 2$ and $f^{-1}(x) = x - 2$ as illustrated in Figure 2.18, we see that the latter is indeed the reflection in the line $y = x$ of the former. Alternatively, we can see that if $y = x + 2$, a reflection in the line $y = x$ just means replacing all points $(x, y)$ that satisfy this equation with points given by $(y, x)$ to get the new equation $x = y + 2$. But, of course, this gives $y = x - 2$ which is what we wanted.
Solution to activity 2.8
When we considered the function $f : \mathbb{R} \to \mathbb{R}$ in Example 2.5, there were two problems that prevented us from finding an inverse. To counteract these so that we can find the local inverses of this function, we note that:
If we take the co-domain to be the interval $[0, \infty)$ so that we have $y \geq 0$, then we remove the problem that occurs because $y = x^2$ has no solution for $y < 0$.
If we take the two domains given by the intervals $(-\infty, 0]$ and $[0, \infty)$ so that we have $x \leq 0$ and $x \geq 0$ respectively, then we remove the problem that occurs because $y = x^2$ has two solutions for $x \in \mathbb{R}$.

[Figure 2.18: For Activity 2.7, we see that the graph of $f^{-1}(x) = x - 2$ is the reflection of the function $f(x) = x + 2$ about the line $y = x$.]

Indeed, this means that if we consider the function
$f : [0, \infty) \to [0, \infty)$ given by $f(x) = x^2$, then we have
\[ y = f(x) \implies y = x^2 \implies x = \sqrt{y}, \]
as $x \geq 0$ because $x \in [0, \infty)$. Thus, using $x = f^{-1}(y)$, the inverse of this function is $f^{-1}(y) = \sqrt{y}$ or $f^{-1}(x) = \sqrt{x}$ if we want it in terms of $x$.
$f : (-\infty, 0] \to [0, \infty)$ given by $f(x) = x^2$, then we have
\[ y = f(x) \implies y = x^2 \implies x = -\sqrt{y}, \]
as $x \leq 0$ because $x \in (-\infty, 0]$. Thus, using $x = f^{-1}(y)$, the inverse of this function is $f^{-1}(y) = -\sqrt{y}$ or $f^{-1}(x) = -\sqrt{x}$ if we want it in terms of $x$.
In particular, this means that the local inverses of $f : \mathbb{R} \to [0, \infty)$ where $f(x) = x^2$ are $f^{-1}(x) = \sqrt{x}$ when $x \in [0, \infty)$ and $f^{-1}(x) = -\sqrt{x}$ when $x \in (-\infty, 0]$.


Solution to activity 2.9
We saw in Activity 2.8 that the function $f : [0, \infty) \to [0, \infty)$ where $f(x) = x^2$ has an inverse given by $f^{-1}(x) = \sqrt{x}$. The graphs of these two functions are illustrated in Figure 2.19. In particular, observe that the curve $y = \sqrt{x}$ is the reflection about the line $y = x$ of the curve $y = x^2$ and that all three of these curves intersect at the points $(0, 0)$ and $(1, 1)$.
Solution to activity 2.10
From the graphs of the function $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ when $n$ is odd, which we saw in Figure 2.2(a) and (c), it should be clear that the equation $y = f(x)$ has a unique solution, $x$, for all $y \in \mathbb{R}$ and so the inverse of this function exists. In particular, we see that
\[ y = x^n \implies x = y^{1/n} = \sqrt[n]{y}, \]
gives us this unique solution for any $y \in \mathbb{R}$ provided that $n$ is odd and so we have $f^{-1}(y) = \sqrt[n]{y}$ as the inverse function or, indeed, $f^{-1}(x) = \sqrt[n]{x}$ if we want it in terms of $x$.

[Figure 2.19: For Activity 2.9, we see that the graph of $f^{-1}(x) = \sqrt{x}$ is the reflection of the function $f(x) = x^2$ about the line $y = x$.]

However, from the graph of the function $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ when $n$ is even, which we saw in Figure 2.2(b), it should be clear that when
$y < 0$, the equation $y = f(x)$ has no solution for $x$ as we know that $x \in \mathbb{R}$ means that $y = x^n \geq 0$ when $n$ is even.
$y > 0$, the equation $y = f(x)$ has two solutions for $x$ as we know that we can get $x = \pm\sqrt[n]{y} \in \mathbb{R}$ when $n$ is even.
As such, we cannot find a unique solution, $x$, for all $y \in \mathbb{R}$ and so the inverse of this function cannot exist.
Solution to activity 2.11

We saw the graph of a function like $f : \mathbb{R} \to (0, \infty)$ where $f(x) = 2^x$ in Figure 2.3(b) since we have $a = 2 > 1$ here. As such, we find that the graphs of the function $f : \mathbb{R} \to (0, \infty)$ where $f(x) = 2^x$ and its inverse, $f^{-1}(x) = \log_2 x$ where $f^{-1} : (0, \infty) \to \mathbb{R}$, are as illustrated in Figure 2.20. In particular, observe that the curve $y = \log_2 x$ is the reflection about the line $y = x$ of the curve $y = 2^x$.

[Figure 2.20: For Activity 2.11, we see that the graph of $f^{-1}(x) = \log_2 x$ is the reflection of the function $f(x) = 2^x$ about the line $y = x$.]

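Solution to activity 2.12
To find the acute angles $\theta_1$ and $\theta_2$ where $\theta_1 = \sin^{-1}\frac{1}{2}$ and $\theta_2 = \cos^{-1}\frac{1}{2}$, we use the table of values in Section 2.1.1 to see that
\[ \sin\theta_1 = \frac{1}{2} = \sin\frac{\pi}{6} \quad\text{gives us}\quad \theta_1 = \sin^{-1}\frac{1}{2} = \frac{\pi}{6}, \]
and
\[ \cos\theta_2 = \frac{1}{2} = \cos\frac{\pi}{3} \quad\text{gives us}\quad \theta_2 = \cos^{-1}\frac{1}{2} = \frac{\pi}{3}, \]
whereas to find the acute angle $\theta_3$ where $\theta_3 = \tan^{-1} 1$, we use the table we found in Activity 2.1 to see that
\[ \tan\theta_3 = 1 = \tan\frac{\pi}{4} \quad\text{gives us}\quad \theta_3 = \tan^{-1} 1 = \frac{\pi}{4}. \]
We also have
\[ \operatorname{cosec}\theta_1 = \frac{1}{\sin\theta_1} = \frac{1}{1/2} = 2, \quad \sec\theta_2 = \frac{1}{\cos\theta_2} = \frac{1}{1/2} = 2 \quad\text{and}\quad \cot\theta_3 = \frac{1}{\tan\theta_3} = \frac{1}{1} = 1, \]
using the definitions of the reciprocals of our three trigonometric functions, which we saw in Section 2.1.2.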
Solution to activity 2.13
Given that $f(x) = x^3$, $g(x) = x^4$ and $h(x) = 2^x$, we use the definitions of the combinations of functions we need from Section 2.1.2, to get
\[ (f \cdot g)(x) = f(x)g(x) = (x^3)(x^4) = x^7, \]
\[ (f/g)(x) = \frac{f(x)}{g(x)} = \frac{x^3}{x^4} = \frac{1}{x}, \quad\text{and} \]
\[ (g \circ h)(x) = g(h(x)) = g(2^x) = (2^x)^4 = 2^{4x}, \]
where we have used the power laws to simplify our answers. Indeed, observe that for the last function, we can also write $2^{4x} = (2^4)^x = 16^x$.
Solution to activity 2.14
To derive the laws of logarithms, we note that for the first one, we use the power laws and the given fact to get
\[ a^{\log_a x + \log_a y} = a^{\log_a x} \, a^{\log_a y} = xy = a^{\log_a (xy)}, \]
which means that $\log_a x + \log_a y = \log_a (xy)$, for the second one, we similarly get
\[ a^{\log_a x - \log_a y} = \frac{a^{\log_a x}}{a^{\log_a y}} = \frac{x}{y} = a^{\log_a (x/y)}, \]
which means that $\log_a x - \log_a y = \log_a (x/y)$ and for the third one, we get
\[ a^{y \log_a x} = (a^{\log_a x})^y = x^y = a^{\log_a (x^y)}, \]
which means that $y \log_a x = \log_a (x^y)$.

Solution to activity 2.15
We take logarithms to the base $b$ on both sides of the given fact to see that
\[ a^{\log_a x} = x \implies \log_b a^{\log_a x} = \log_b x \implies (\log_a x)(\log_b a) = \log_b x, \]
where we have used the third law of logarithms in the last step. Then, dividing through on both sides by $\log_b a$ (which is non-zero as $a \neq 1$), we get
\[ \log_a x = \frac{\log_b x}{\log_b a}, \]
as required.
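The change-of-base result can be checked numerically. The following Python sketch is only an illustration: the function name `log_base` and its default choice of the natural base $b = e$ are our own assumptions, and it relies on the standard library call `math.log(x, base)`.

```python
import math


def log_base(a, x, b=math.e):
    """Compute log_a(x) as log_b(x) / log_b(a) for any valid base b."""
    if a <= 0 or a == 1 or b <= 0 or b == 1 or x <= 0:
        raise ValueError("bases must be positive and not 1, and x must be positive")
    return math.log(x, b) / math.log(a, b)


# The answer should not depend on the intermediate base b.
print(log_base(2, 8))          # via natural logarithms
print(log_base(2, 8, b=10))    # via logarithms to the base 10
```

Both calls should agree with $\log_2 8 = 3$ up to floating-point rounding, whichever intermediate base is used.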
Solution to activity 2.16
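Starting with $\sin^2\theta + \cos^2\theta = 1$, we divide both sides by $\sin^2\theta$ to get
\[ \frac{\sin^2\theta + \cos^2\theta}{\sin^2\theta} = \frac{1}{\sin^2\theta} \implies \frac{\sin^2\theta}{\sin^2\theta} + \frac{\cos^2\theta}{\sin^2\theta} = \frac{1}{\sin^2\theta} \implies 1 + \left(\frac{\cos\theta}{\sin\theta}\right)^2 = \left(\frac{1}{\sin\theta}\right)^2, \]
so that $1 + \cot^2\theta = \operatorname{cosec}^2\theta$ if we use the definition of $\operatorname{cosec}\theta$ from Section 2.1.2 and the result from Activity 2.4. Then, again starting with $\sin^2\theta + \cos^2\theta = 1$, we divide both sides by $\cos^2\theta$ to get
\[ \frac{\sin^2\theta + \cos^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta} \implies \frac{\sin^2\theta}{\cos^2\theta} + \frac{\cos^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta} \implies \left(\frac{\sin\theta}{\cos\theta}\right)^2 + 1 = \left(\frac{1}{\cos\theta}\right)^2, \]
so that $\tan^2\theta + 1 = \sec^2\theta$ if we use the definition of $\sec\theta$ and (2.1) from Section 2.1.2.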
Solution to activity 2.17
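With the given facts, we can use the compound-angle formula for $\sin(\theta + \phi)$ to see that
\[ \sin(\theta - \phi) = \sin(\theta + (-\phi)) = \sin\theta\cos(-\phi) + \cos\theta\sin(-\phi) = \sin\theta\cos\phi - \cos\theta\sin\phi, \]
and the compound-angle formula for $\cos(\theta + \phi)$ to see that
\[ \cos(\theta - \phi) = \cos(\theta + (-\phi)) = \cos\theta\cos(-\phi) - \sin\theta\sin(-\phi) = \cos\theta\cos\phi + \sin\theta\sin\phi, \]
as required.
Solution to activity 2.18
Using the compound-angle formula
\[ \sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi, \]
with $\phi = \theta$ we get
\[ \sin(\theta + \theta) = \sin\theta\cos\theta + \cos\theta\sin\theta \implies \sin(2\theta) = 2\sin\theta\cos\theta, \]
whereas using the compound-angle formula
\[ \cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi, \]
with $\phi = \theta$ we get
\[ \cos(\theta + \theta) = \cos\theta\cos\theta - \sin\theta\sin\theta \implies \cos(2\theta) = \cos^2\theta - \sin^2\theta, \]
as required. Indeed, since we also have the Pythagorean identity $\sin^2\theta + \cos^2\theta = 1$, we can write this last double-angle formula as
\[ \cos(2\theta) = (1 - \sin^2\theta) - \sin^2\theta = 1 - 2\sin^2\theta, \]
in terms of $\sin^2\theta$, or as
\[ \cos(2\theta) = \cos^2\theta - (1 - \cos^2\theta) = 2\cos^2\theta - 1, \]
in terms of $\cos^2\theta$, as required.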
Solution to activity 2.19
From the graph in Figure 2.11, we can see that the economically meaningful part of the supply function is $q^S : [0, \infty) \to [1, \infty)$ where $q^S(p) = p + 1$ and the economically meaningful part of the demand function is $q^D : [0, 3] \to [0, 3]$ where $q^D(p) = 3 - p$. Clearly, both of these functions are invertible as each $q$ in the co-domain gives rise to a unique $p$ in the domain and we find that
\[ q = p + 1 \implies p = p^S(q) = q - 1, \]
is the inverse supply function, whereas
\[ q = 3 - p \implies p = p^D(q) = 3 - q, \]
is the inverse demand function.


Solution to activity 2.20
Given that
\[ y = a(x - p)^2 + q, \]
we see that
If $a > 0$, then for any $x \in \mathbb{R}$,
\[ (x - p)^2 \geq 0 \implies a(x - p)^2 \geq 0 \implies a(x - p)^2 + q \geq q, \]
i.e. for all $x \in \mathbb{R}$, $y \geq q$ and so the smallest value of $y$ occurs when $y = q$ which, in turn, means that we must have $x = p$. Thus, the turning point of the parabola is a minimum and this occurs at the point $(p, q)$.
If $a < 0$, then for any $x \in \mathbb{R}$,
\[ (x - p)^2 \geq 0 \implies a(x - p)^2 \leq 0 \implies a(x - p)^2 + q \leq q, \]
i.e. for all $x \in \mathbb{R}$, $y \leq q$ and so the largest value of $y$ occurs when $y = q$ which, in turn, means that we must have $x = p$. Thus, the turning point of the parabola is a maximum and this occurs at the point $(p, q)$.

Solution to activity 2.21
Given that the equation of the hyperbola is
\[ \frac{y^2}{9} - \frac{x^2}{4} = 1, \]
we see that there are no $x$-intercepts since, setting $y = 0$, we get
\[ -\frac{x^2}{4} = 1 \implies x^2 = -4, \]
which has no real solutions, whereas we see that the $y$-intercepts, which occur when $x = 0$, are given by
\[ \frac{y^2}{9} = 1 \implies y^2 = 9 \implies y = \pm 3. \]
To find the asymptotes, we write the equation as
\[ \frac{y^2}{x^2} = 9 \left( \frac{1}{4} + \frac{1}{x^2} \right), \]
so that, as $x \to \pm\infty$, we have $1/x^2 \to 0$ and this leaves us with
\[ \frac{y^2}{x^2} = \frac{9}{4} \implies y = \pm\frac{3}{2}x, \]
as the equations of the asymptotes. Putting this information together, we then get the sketch in Figure 2.21.

[Figure 2.21: For Activity 2.21, a sketch of the hyperbola $\frac{y^2}{9} - \frac{x^2}{4} = 1$.]

Exercises

Exercise 2.1
Sketch the graph of the function $f : \{x \in \mathbb{R} \mid x \neq 1, -1\} \to \mathbb{R}$ given by
\[ f(x) = \frac{x^4 - 1}{x^2 - 1}. \]

Exercise 2.2
Use the compound-angle formulae to show that
\[ \tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi}, \]
and hence deduce an expression for $\tan(2\theta)$.

Exercise 2.3
The supply and demand functions for a good are
\[ q^S(p) = p - 4 \quad\text{and}\quad q^D(p) = 8 - p, \]
respectively. Sketch the graphs of these functions and find the equilibrium point.
A percentage [of the price] tax of $100r\%$ is imposed. Find the new equilibrium point and, by sketching the graph of the new supply function on your earlier sketch, comment on how the equilibrium point for the market has changed. How much of the tax has been passed on to the consumers? What is the maximum tax, $r_m$, that can be imposed if this market is to continue functioning?

Exercise 2.4
When selling a quantity, $q$, a firm makes a profit given by
\[ \pi(q) = q^2 + 2q + 2, \]
and the largest quantity it can produce is 10. Sketch the graph of this profit function and deduce the value of $q$ that will yield the greatest profit for this firm.
Explain why the inverse profit function exists and find it.

Exercise 2.5
Sketch the circle and the rectangular hyperbola with equations
\[ x^2 + y^2 = 1 \quad\text{and}\quad 2xy = 1, \]
respectively. At what points do these two curves intersect?

Solutions to exercises

Solution to exercise 2.1
The function $f : \{x \in \mathbb{R} \mid x \neq 1, -1\} \to \mathbb{R}$ given by
\[ f(x) = \frac{x^4 - 1}{x^2 - 1}, \]
is clearly undefined at $x = 1$ and $x = -1$ as these values of $x$ would entail division by zero. However, we notice that factorising the numerator and the denominator we get
\[ f(x) = \frac{(x^2 - 1)(x^2 + 1)}{x^2 - 1}, \]
and so, as long as $x \neq \pm 1$, we have
\[ f(x) = x^2 + 1. \]
This means that, to sketch the graph of $f(x)$, we start by sketching the graph of the parabola
\[ y = x^2 + 1, \]
which has
a $y$-intercept when $x = 0$, i.e. when $y = 1$,
no $x$-intercepts as $y = 0$ gives $x^2 + 1 = 0$ which has no real solutions, and
a turning point which is a minimum at the point $(0, 1)$.
We then exclude the points $(1, 2)$ and $(-1, 2)$ on the parabola at which $f(x)$ itself is undefined to get the sketch in Figure 2.22.

[Figure 2.22: For Exercise 2.1, a sketch of the graph of $f(x)$. (Note that the points at which $f(x)$ is undefined are marked on the sketch.)]


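Solution to exercise 2.2
Using (2.1), we have
\[ \tan(\theta \pm \phi) = \frac{\sin(\theta \pm \phi)}{\cos(\theta \pm \phi)}, \]
and so, using the compound-angle formulae in (2.5), we get
\[ \tan(\theta \pm \phi) = \frac{\sin\theta\cos\phi \pm \cos\theta\sin\phi}{\cos\theta\cos\phi \mp \sin\theta\sin\phi} = \frac{\dfrac{\sin\theta}{\cos\theta} \pm \dfrac{\sin\phi}{\cos\phi}}{1 \mp \dfrac{\sin\theta\sin\phi}{\cos\theta\cos\phi}}, \]
if we divide the numerator and denominator by $\cos\theta\cos\phi$ and cancel where appropriate. Thus, using (2.1) again, we have
\[ \tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi}, \]
as required. Indeed, observe that this only makes sense if $\theta, \phi \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$ as, if this isn't true, we can't divide through by $\cos\theta\cos\phi$ or, equivalently, one of $\tan\theta$ or $\tan\phi$ won't exist.
To deduce a formula for $\tan(2\theta)$, we set $\phi = \theta$ in the formula for $\tan(\theta + \phi)$ to get
\[ \tan(2\theta) = \tan(\theta + \theta) = \frac{\tan\theta + \tan\theta}{1 - \tan\theta\tan\theta} = \frac{2\tan\theta}{1 - \tan^2\theta}. \]
Again, we observe that this only makes sense if $\theta \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$ as, if this isn't true, $\tan\theta$ won't exist.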
Solution to exercise 2.3
Here the supply and demand functions are straight lines which can be easily sketched using the method outlined in Example 2.1 and the results of doing this are illustrated in Figure 2.23(a). To find the equilibrium price, $p^*$, we have
\[ q^S(p^*) = q^D(p^*) \implies p^* - 4 = 8 - p^* \implies 2p^* = 12, \]
and so, $p^* = 6$. Then, using the demand function, say, we have
\[ q^* = q^D(p^*) = 8 - p^*, \]
and so the equilibrium quantity is $q^* = 8 - 6 = 2$. Consequently, the equilibrium point is $(q^*, p^*) = (2, 6)$ which, as indicated in Figure 2.23(a), is the point at which the two straight lines intersect.
If a percentage [of the price] tax of $100r\%$ is imposed (see footnote 6), the demand function is still
\[ q^D_r(p) = q^D(p) = 8 - p, \]
but the supply function becomes
\[ q^S_r(p) = q^S(p - rp) = p - rp - 4, \]
as the suppliers now see an effective price of $p - rp$. This means that the equilibrium price in the presence of tax, $p_r$, is given by
\[ q^S_r(p_r) = q^D_r(p_r) \implies p_r - rp_r - 4 = 8 - p_r \implies p_r(2 - r) = 12 \implies p_r = \frac{12}{2 - r}, \]
and so the equilibrium quantity in the presence of tax, $q_r$, is
\[ q_r = q^D_r(p_r) = 8 - \frac{12}{2 - r} = \frac{16 - 8r - 12}{2 - r} = \frac{4 - 8r}{2 - r}, \]
if we use the demand function, $q^D_r(p)$ (see footnote 7). Sketching the graph of the new supply function, as in Figure 2.23(b), we see that by writing its equation as
\[ p = \frac{q}{1 - r} + \frac{4}{1 - r}, \]
and noting that
\[ \frac{1}{1 - r} \geq 1 \quad\text{and}\quad \frac{4}{1 - r} \geq 4, \]
when considering $0 \leq r \leq 1$, this means that it is steeper than the old one and that the $p$-intercept, which is now
\[ \frac{4}{1 - r} = \frac{4(1 - r) + 4r}{1 - r} = 4 + \frac{4r}{1 - r}, \]
has increased by $4r/(1 - r)$. In this case, as the equilibrium price has increased from 6 to
\[ \frac{12}{2 - r} = \frac{6(2 - r) + 6r}{2 - r} = 6 + \frac{6r}{2 - r}, \]
we see that the consumer pays $6r/(2 - r)$ more. But, as the total tax to be paid by the supplier is given by
\[ rp_r = r\,\frac{12}{2 - r} = \frac{12r}{2 - r}, \]
this means that only half of the tax has been passed on to the consumer in this case. Of course, the equilibrium quantity in the presence of the tax must be positive and so, for the market to function, we require that
\[ q_r > 0 \implies \frac{4 - 8r}{2 - r} > 0 \implies 4 > 8r \implies r < \frac{1}{2}, \]
(bearing in mind that $2 - r > 0$ if $0 \leq r \leq 1$), i.e. the maximum tax, $r_m$, that can be imposed is given by $r_m = 1/2$.

Footnote 6: Here we will start by restricting our attention to the case where $0 \leq r \leq 1$ as, prima facie, these are the values that would appear to be economically sensible. Although, as we will soon see, the economically meaningful values of $r$ will turn out to be $0 \leq r < 1/2$!
Footnote 7: Alternatively, we could use the supply function
\[ q^S_r(p) = q^S(p - rp) = p - rp - 4, \]
to find $q_r$. However, we cannot use $q^S(p) = p - 4$ as this no longer holds in the presence of the tax!

[Figure 2.23: For Exercise 2.3, a sketch of the graphs of the supply and demand functions indicating the equilibrium point of the market when (a) there is no tax and (b) a percentage of the price tax of $100r\%$ is imposed. In (b), the new supply line has $p$-intercept $\frac{4}{1 - r}$ and the new equilibrium point is $\left(\frac{4 - 8r}{2 - r}, \frac{12}{2 - r}\right)$. (Note that these sketches only make economic sense when $q \geq 0$.)]
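The formulae found in the solution to Exercise 2.3 can be checked with a short Python sketch. The function name `percentage_tax_equilibrium` is our own hypothetical choice, and the code assumes the specific functions $q^S(p) = p - 4$ and $q^D(p) = 8 - p$ from that exercise.

```python
from fractions import Fraction


def percentage_tax_equilibrium(r):
    """Return (q_r, p_r) for q^S(p) = p - 4 and q^D(p) = 8 - p
    under a percentage of the price tax of 100r%, with 0 <= r < 2."""
    r = Fraction(r)
    # Solve (p - r*p) - 4 = 8 - p, i.e. p*(2 - r) = 12.
    p_r = Fraction(12) / (2 - r)
    # The demand function is unchanged by the tax.
    q_r = 8 - p_r
    return q_r, p_r


for r in (Fraction(0), Fraction(1, 10), Fraction(1, 4), Fraction(1, 2)):
    q_r, p_r = percentage_tax_equilibrium(r)
    # Share of the tax r*p_r that shows up as a price rise above 6.
    share = (p_r - 6) / (r * p_r) if r != 0 else None
    print(f"r = {r}: (q_r, p_r) = ({q_r}, {p_r}), share passed on = {share}")
```

For each non-zero $r$ the share of the tax passed on to the consumer comes out as one half, and the quantity reaches zero at $r = 1/2$, in agreement with $r_m = 1/2$.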
Solution to exercise 2.4
The firm's profit function is
\[ \pi(q) = q^2 + 2q + 2, \]
and its domain is the interval $[0, 10]$ as $q \geq 0$ since it is a quantity and $q \leq 10$ since the largest quantity it can produce is 10. So, to sketch the graph of this profit function, we start by sketching the parabola
\[ y = q^2 + 2q + 2 = (q + 1)^2 + 1, \]
in completed square form. This has
a $y$-intercept when $q = 0$, i.e. $y = 2$,
no $q$-intercepts as $y = 0$ gives $(q + 1)^2 + 1 = 0$ which has no real solutions, and
a turning point which is a minimum at the point $(-1, 1)$.
We then restrict our attention to the relevant values of $q$, i.e. those that satisfy $0 \leq q \leq 10$, to get a sketch of the graph of the profit function itself as illustrated in Figure 2.24. In particular, as the profit function is increasing on this interval, the greatest profit, $\pi(10) = 122$, occurs when $q = 10$.
Looking at the graph of the profit function, we see that as it is a function $\pi : [0, 10] \to [2, 122]$, its inverse exists since there is a unique $q \in [0, 10]$ such that $y = \pi(q)$ for all $y \in [2, 122]$. Indeed, solving this equation we find that, using the completed square form above, we have
\[ y = (q + 1)^2 + 1 \implies (q + 1)^2 = y - 1 \implies q + 1 = \pm\sqrt{y - 1} \implies q = -1 \pm \sqrt{y - 1}, \]
which gives us two values of $q$ for each value of $y > 1$. However, as we must be getting values of $q \in [0, 10]$ from our inverse function, we take the $+$ sign here (i.e. we discard the $-$ sign) so that we can get the solutions where $q \geq -1$ (instead of getting the solutions where $q \leq -1$ which we don't want). That is, we have found that
\[ q = \pi^{-1}(y) = -1 + \sqrt{y - 1}, \]
is the sought after inverse function in terms of $y$.
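As a sanity check on this inverse, the following Python sketch composes the profit function with the inverse found above. The names `profit` and `inverse_profit` are our own illustrative choices.

```python
import math


def profit(q):
    """Profit function pi(q) = q^2 + 2q + 2 on the domain [0, 10]."""
    if not 0 <= q <= 10:
        raise ValueError("q must lie in [0, 10]")
    return q * q + 2 * q + 2


def inverse_profit(y):
    """Inverse pi^{-1}(y) = -1 + sqrt(y - 1) on the co-domain [2, 122]."""
    if not 2 <= y <= 122:
        raise ValueError("y must lie in [2, 122]")
    return -1 + math.sqrt(y - 1)


# Applying the inverse after the function should return the original quantity.
for q in (0, 2.5, 7, 10):
    print(q, profit(q), inverse_profit(profit(q)))
```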


[Figure 2.24: For Exercise 2.4, a sketch of the graph of the profit function, $\pi(q)$. (Note that dashed parts of the curve are on the parabola but are not part of the graph of the profit function.)]
Solution to exercise 2.5
To sketch the circle and the rectangular hyperbola, we note that:
The circle with equation
\[ x^2 + y^2 = 1, \]
is centred on the origin and has a radius of 1. Indeed, setting $x = 0$, we find that its
$y$-intercepts are $y = \pm 1$ and, setting $y = 0$, we find that its $x$-intercepts are $x = \pm 1$.
The rectangular hyperbola with equation
\[ 2xy = 1 \implies y = \frac{1}{2x}, \]
has the $x$ and $y$-axes, i.e. the lines $y = 0$ and $x = 0$, respectively, as its asymptotes
since
For the vertical asymptote: As $x \to 0^+$ we have $y \to \infty$ and as $x \to 0^-$ we have
$y \to -\infty$.
For the horizontal asymptote: As $x \to \infty$ we have $y \to 0$ from above and as
$x \to -\infty$ we have $y \to 0$ from below.
We also note that it has no $x$-intercepts (as no value of $x$ makes $y = 0$) and no
$y$-intercepts (as no value of $y$ makes $x = 0$).
These curves are illustrated in Figure 2.25.

To find the points of intersection of these two curves we have to solve the equations
\[ x^2 + y^2 = 1 \quad\text{and}\quad 2xy = 1, \]
simultaneously. This can easily be done in two different ways which we give here for
completeness.
Method I: The equation $2xy = 1$ tells us that, say, $y = 1/(2x)$ and substituting
this into the other equation we get
\[ x^2 + y^2 = 1 \implies x^2 + \frac{1}{4x^2} = 1 \implies 4x^4 - 4x^2 + 1 = 0, \]
which is a quadratic equation in $x^2$. By factorising this, say, we find that
\[ (2x^2 - 1)^2 = 0 \implies x^2 = \frac{1}{2} \implies x = \pm\frac{1}{\sqrt{2}}. \]
Then, substituting these back into the equation $y = 1/(2x)$ we get
\[ y = \frac{1}{2x} = \pm\frac{\sqrt{2}}{2} = \pm\frac{1}{\sqrt{2}}, \]
as the corresponding values of $y$.
Method II: We note that, using our equations, we have
\[ (x - y)^2 = x^2 - 2xy + y^2 = (x^2 + y^2) - (2xy) = 1 - 1 = 0, \]
and so any solutions we seek must satisfy $(x - y)^2 = 0$ or, equivalently, $x = y$. If we
substitute this into one of our equations, say $2xy = 1$, we get
\[ 2xy = 1 \implies 2y^2 = 1 \implies y^2 = \frac{1}{2} \implies y = \pm\frac{1}{\sqrt{2}}. \]
Then, using the equation $x = y$ again, we get $x = \pm 1/\sqrt{2}$ as the corresponding
values of $x$.
So, whichever method you choose, we find that the points of intersection of these two
curves are $(1/\sqrt{2}, 1/\sqrt{2})$ and $(-1/\sqrt{2}, -1/\sqrt{2})$, both of which are indicated in
Figure 2.25.
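The two points found above can be checked by substituting them back into both equations. The short Python sketch below does this numerically; the helper names `on_circle` and `on_hyperbola` and the tolerance are our own assumptions, introduced only for illustration.

```python
import math

s = 1 / math.sqrt(2)
points = [(s, s), (-s, -s)]

def on_circle(x, y, tol=1e-12):
    """True if (x, y) satisfies x^2 + y^2 = 1 up to rounding error."""
    return abs(x**2 + y**2 - 1) < tol

def on_hyperbola(x, y, tol=1e-12):
    """True if (x, y) satisfies 2xy = 1 up to rounding error."""
    return abs(2 * x * y - 1) < tol

for x, y in points:
    print((x, y), on_circle(x, y), on_hyperbola(x, y))
```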

Figure 2.25: For Exercise 2.5, a sketch of the circle $x^2 + y^2 = 1$ and the rectangular
hyperbola $2xy = 1$. The indicated points of intersection are $(1/\sqrt{2}, 1/\sqrt{2})$ and
$(-1/\sqrt{2}, -1/\sqrt{2})$.

Chapter 3
Differentiation
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 2.7–2.13.
Anthony and Biggs (1996) Chapter 6 and parts of Chapter 7.
Further reading
Simon and Blume (1994) Sections 2.3–2.7 and 3.6, Chapter 4 and Section 5.5.
Adams and Essex (2010) Sections 2.1–2.7, parts of Sections 3.1 and 3.3, parts of
Sections 4.9 and 4.10.
Aims and objectives
The objectives of this chapter are as follows.
To introduce the idea of a derivative and see how it can be found using various
techniques.
To use derivatives to find tangent lines and approximate functions using various
techniques.
To see how derivatives can be used in economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.

3.1 Introduction: What is differentiation?

Having revised the idea of a function in the previous chapter, we now turn to
differentiation, the process by which we find the derivative of a function. Given a
function, $f$, its derivative at the point $a$, which we denote by $f'(a)$, is given by the
formula
\[ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}, \]
provided that the limit exists. Indeed, when the limit exists, i.e. when we can find a
value for $f'(a)$, we say that the function is differentiable at $a$. Observe that here, we
have introduced the notation
\[ \lim_{h \to 0} g(h), \]
to denote the value¹ of the function $g(h)$ as $h \to 0$ (provided, of course, that there is
such a value) and we call this value the limit of $g(h)$ as $h \to 0$ whereas if there is no
such value, we say that this limit does not exist.² To see how this works in practice, we
can consider a simple example.

¹ That is, a finite real number.
² In 176 Further Calculus, you will do limits properly, but this simple explanation of what is going on
should suffice for our purposes here. In particular, we briefly introduced the notation in Example 2.2
and, with the examples below, you should be able to see what is happening for now. Also, in Section
3.3.4, we will see some examples where a limit fails to exist and we will explain what that means there.

Example 3.1 Use the definition to find the derivative of the function $f(x) = x^2$ at
the point $x = 3$.
We need to find $f'(3)$ and, using the formula above with $a = 3$, we start by looking at
\[ \frac{f(3 + h) - f(3)}{h} = \frac{(3 + h)^2 - 3^2}{h}, \]
which, looking at the numerator, is easily simplified to give
\[ \frac{f(3 + h) - f(3)}{h} = \frac{(9 + 6h + h^2) - 9}{h} = \frac{6h + h^2}{h} = 6 + h. \]
This in turn means that
\[ f'(3) = \lim_{h \to 0} \frac{f(3 + h) - f(3)}{h} = \lim_{h \to 0} (6 + h) = 6, \]
because, in the limit as $h \to 0$, we see that $6 + h$ goes to 6. Consequently, we can see
that the derivative of $f(x)$ at the point $x = 3$, i.e. $f'(3)$, is 6. Indeed, we can say that
the function $f(x) = x^2$ is differentiable at $x = 3$.
Activity 3.1 Use the definition to find the derivative of the function $f(x) = x^2$ at
the point x = 1.
More generally, instead of finding the derivative of $f$ at individual points, we can find
its derivative at a general point, $x$, by finding $f'(x)$. Of course, according to our
formula, this would involve finding
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}, \]
and, provided the limit exists, this will give us another function of $x$. This can then be
used to find the derivative, say $f'(a)$, at an individual point, $a$, by setting $x = a$ in our
result. Let's see how this works.

Example 3.2 Use the definition to find the derivative of the function $f(x) = x^2$ at
the general point $x$ and use this to verify that $f'(3) = 6$ as we found in Example 3.1.
We need to find $f'(x)$ and, using the formula above, we start by looking at
\[ \frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 - x^2}{h}, \]
which, looking at the numerator, is easily simplified to give
\[ \frac{f(x + h) - f(x)}{h} = \frac{(x^2 + 2xh + h^2) - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h. \]
This in turn means that
\[ f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} (2x + h) = 2x, \]
because, in the limit as $h \to 0$, we see that $2x + h$ goes to $2x$. Consequently, we can
see that the derivative of $f(x)$ at the general point $x$, i.e. $f'(x)$, is $2x$ which is also a
function of $x$ as we should expect.³
Having found this, we can substitute $x = 3$ into our result to see that
$f'(3) = 2(3) = 6$ as we found in Example 3.1.
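The limiting process in Examples 3.1 and 3.2 can also be watched numerically. The sketch below is only an illustration (the names `difference_quotient` and `square` and the chosen values of h are ours): it evaluates the quotient $(f(3 + h) - f(3))/h$ for $f(x) = x^2$ with shrinking $h$, and the printed values should behave like $6 + h$.

```python
def difference_quotient(f, a, h):
    """Gradient (f(a + h) - f(a)) / h appearing in the definition of the derivative."""
    return (f(a + h) - f(a)) / h

def square(x):
    return x**2

# As h shrinks, the quotient for f(x) = x^2 at a = 3 approaches f'(3) = 6.
for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, difference_quotient(square, 3.0, h))
```

A computer cannot take $h$ all the way to zero, so this illustrates the limit rather than proving it; the algebra in the examples is what establishes the value.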
Activity 3.2 Use the result in Example 3.2 to verify your answer to Activity 3.1.
At what point is the derivative of $f(x) = x^2$ equal to (a) 16 and (b) 4?


We have seen that a function, $f(x)$, has a derivative, $f'(x)$, which is also a function of $x$.
The process by which we go from a function to its derivative is called differentiation.
That is, when we have a function, $f(x)$, we differentiate it with respect to $x$ and we
sometimes denote this operation by
\[ \frac{d}{dx} f(x) \quad\text{which is read as 'differentiate $f(x)$ with respect to $x$',} \]
and the outcome of this operation will be the sought after derivative which we can write
as
\[ \frac{df}{dx} \quad\text{or}\quad f'(x). \]
If we then want to evaluate the derivative of $f$ at the point $a$, we write
\[ \left.\frac{df}{dx}\right|_{x=a} \quad\text{or}\quad f'(a), \]
depending on which notation we are using.

We will see what derivatives tell us about functions in Section 3.3 and, in particular, we
will see that some functions do not have derivatives at certain points as the limit in the
definition may not exist. But, before we do that, we turn our attention to how we can
find the derivative of a function when we don't want to explicitly use the definition.

3.2 How to find derivatives

The previous section told us how to find derivatives from first principles, but now we
want to explore a more convenient way of finding them. The key idea is that we
introduce standard derivatives which tell us how to differentiate the basic functions that
we saw in the previous chapter. Once we know how to differentiate these, the rules of
differentiation will allow us to differentiate combinations of these functions.

³ Indeed, as this limit exists for all $x \in \mathbb{R}$, we can say that the function $f(x) = x^2$ is
differentiable for all $x \in \mathbb{R}$.

3.2.1 Standard derivatives

In Example 3.2, we used the definition of the derivative to show that the function
$f(x) = x^2$ has a derivative given by $f'(x) = 2x$. We now state some results that will
allow us to differentiate other elementary functions.
Power and root functions
If $n \in \mathbb{Z}$, we can use the definition of the derivative to show that
\[ f(x) = x^n \implies f'(x) = n x^{n-1}. \]
For instance, we note that:
If $n = 0$, we have
\[ f(x) = x^0 = 1 \implies f'(x) = 0x^{-1} = 0, \]
which tells us the derivative of 1.
If $n = 1$, we have
\[ f(x) = x^1 = x \implies f'(x) = 1x^0 = 1, \]
which tells us the derivative of $x$.
If $n = 2$, we have
\[ f(x) = x^2 \implies f'(x) = 2x^1 = 2x, \]
in agreement with what we saw in Example 3.2.
Indeed, we will also use this rule when $n \in \mathbb{Q}$, and so we also have things like
\[ f(x) = x^{1/2} = \sqrt{x} \implies f'(x) = \frac{1}{2} x^{-1/2} = \frac{1}{2\sqrt{x}}, \]
which tells us the derivative of $\sqrt{x}$.

Exponential and logarithmic functions
If we are using $e$ and $\ln$, the derivatives are very simple, i.e.
\[ f(x) = e^x \implies f'(x) = e^x, \]
as $e^x$ is the special function which is equal to its derivative. We also have
\[ f(x) = \ln x \implies f'(x) = \frac{1}{x}, \]
which, as we will see in Activity 3.12, follows from the fact that the function $\ln x$ is the
inverse of $e^x$.
If we have another base, $a$, the derivatives are not so simple. We shall see in Activity 3.9
that
\[ f(x) = a^x \implies f'(x) = a^x \ln a, \]
and, using the change of base formula for logarithms, we will see that
\[ f(x) = \log_a x \implies f'(x) = \frac{1}{x \ln a}, \]
in Section 3.2.2.
Sine and cosine functions
For the sine function we find that
\[ f(x) = \sin x \implies f'(x) = \cos x, \]
and for the cosine function we have
\[ f(x) = \cos x \implies f'(x) = -\sin x. \]

We could, however, have used the fact that the sine and cosine functions are
interdefinable, i.e.
\[ \cos x = \sin\left(x + \frac{\pi}{2}\right) \quad\text{and}\quad \sin x = -\cos\left(x + \frac{\pi}{2}\right), \]
to derive the latter from the former once we have the chain rule (see Exercise 3.2).
Indeed, using these standard derivatives, we can then derive the derivatives of the other
trigonometric functions using their definitions in terms of sine and cosine together with
the rules of differentiation in Section 3.2.2; see, for example, Activity 3.6(c).
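The standard derivatives listed in this section can be spot-checked numerically. The following Python sketch is only an illustration, not a derivation: it compares each stated derivative with a central-difference estimate at one sample point. The helper `central_difference`, the sample point `x = 1.3` and the base `a = 2` are our own choices.

```python
import math

def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) using the symmetric quotient (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3  # sample point (assumption)
a = 2.0  # sample base for a^x and log_a x (assumption)

# Each entry pairs a function with the value of its stated derivative at x.
checks = {
    "x^3": (lambda t: t**3, 3 * x**2),
    "sqrt(x)": (math.sqrt, 1 / (2 * math.sqrt(x))),
    "e^x": (math.exp, math.exp(x)),
    "ln x": (math.log, 1 / x),
    "a^x": (lambda t: a**t, a**x * math.log(a)),
    "log_a x": (lambda t: math.log(t, a), 1 / (x * math.log(a))),
    "sin x": (math.sin, math.cos(x)),
    "cos x": (math.cos, -math.sin(x)),
}

for name, (f, stated) in checks.items():
    print(name, central_difference(f, x), stated)
```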

3.2.2 The rules of differentiation

In Section 2.1.2, we saw that there are several standard ways of making new functions
from old ones. Here, we see how we can use the standard derivatives, i.e. the derivatives
of our basic functions, and rules of differentiation to differentiate new functions that are
created from these basic ones in these standard ways. We start with the most
straightforward of these which allows us to differentiate linear combinations of functions.
The linear combination rule
If $k$ and $l$ are constants, this allows us to differentiate the linear combination,
$kf(x) + lg(x)$, of two functions $f(x)$ and $g(x)$. It states that
\[ \frac{d}{dx}\left[kf(x) + lg(x)\right] = k\frac{df}{dx} + l\frac{dg}{dx}, \]
or, using our shorthand, $(kf + lg)'(x) = kf'(x) + lg'(x)$. Indeed, this gives us three more
basic rules straightaway, i.e. the
constant multiple rule: If $k$ is a constant and $f(x)$ is a function, then
\[ \frac{d}{dx}\left[kf(x)\right] = k\frac{df}{dx}, \]
or, using our shorthand, $(kf)'(x) = kf'(x)$.
sum rule: If $f(x)$ and $g(x)$ are functions, then
\[ \frac{d}{dx}\left[f(x) + g(x)\right] = \frac{df}{dx} + \frac{dg}{dx}, \]
or, using our shorthand, $(f + g)'(x) = f'(x) + g'(x)$.
difference rule: If $f(x)$ and $g(x)$ are functions, then
\[ \frac{d}{dx}\left[f(x) - g(x)\right] = \frac{df}{dx} - \frac{dg}{dx}, \]
or, using our shorthand, $(f - g)'(x) = f'(x) - g'(x)$.
Activity 3.3 Derive the constant multiple, sum and difference rules from the linear
combination rule.
Example 3.3 Using these rules we see that:
if $f(x) = 3x^{1/2}$, then $f'(x) = 3\left(\frac{1}{2}x^{-1/2}\right) = \frac{3}{2}x^{-1/2}$ by the constant multiple rule;
if $f(x) = x^2 + x^{1/2}$, then $f'(x) = 2x + \frac{1}{2}x^{-1/2}$ by the sum rule;
if $f(x) = \cos x - \sin x$, then $f'(x) = -\sin x - \cos x$ by the difference rule;
if $f(x) = 3\ln x - 4e^x$, then $f'(x) = \frac{3}{x} - 4e^x$ by the linear combination rule.
So, in the case of simple combinations of functions such as these, we see that the
derivative of the linear combination is given by the linear combination of the derivatives.
Activity 3.4 Use the rules above to differentiate the following functions with
respect to $x$.
(a) $3\cos x$, (b) $e^x + \cos x$, (c) $3\sin x - 3\ln x$.

Indeed, we can see that using the change of base formula for logarithms from
Section 2.1.4, we have
\[ \log_a x = \frac{\ln x}{\ln a}, \]
and so, using the constant multiple rule, we get
\[ \frac{d}{dx}\left(\log_a x\right) = \frac{d}{dx}\left(\frac{\ln x}{\ln a}\right) = \frac{1}{\ln a}\frac{d}{dx}\left(\ln x\right) = \frac{1}{\ln a}\cdot\frac{1}{x} = \frac{1}{x \ln a}, \]
as mentioned in Section 3.2.1. We now look at the other rules of differentiation, i.e. the
ones that will allow us to differentiate products, quotients and compositions of functions.
The product rule
This allows us to differentiate the product of two functions $f(x)$ and $g(x)$. It states that
\[ \frac{d}{dx}\left[f(x)g(x)\right] = \frac{df}{dx}g(x) + f(x)\frac{dg}{dx}, \]
or, using our shorthand, $[f(x)g(x)]' = f'(x)g(x) + f(x)g'(x)$. Let's have a look at some
examples of how it works.
Example 3.4 Differentiate the function $h(x) = x e^x$ with respect to $x$.
This is the product of the two functions
\[ f(x) = x \quad\text{and}\quad g(x) = e^x, \]
and these give us
\[ f'(x) = 1 \quad\text{and}\quad g'(x) = e^x. \]
As such, the product rule tells us that
\[ h'(x) = (1)(e^x) + (x)(e^x) = (1 + x)e^x, \]
is the derivative of the function $h(x) = x e^x$ with respect to $x$.
Example 3.5 Differentiate the function $h(x) = x \ln x$ with respect to $x$.
This is the product of the two functions
\[ f(x) = x \quad\text{and}\quad g(x) = \ln x, \]
and these give us
\[ f'(x) = 1 \quad\text{and}\quad g'(x) = \frac{1}{x}. \]
As such, the product rule tells us that
\[ h'(x) = (1)(\ln x) + (x)\left(\frac{1}{x}\right) = \ln x + 1, \]
is the derivative of the function $h(x) = x \ln x$ with respect to $x$.

Example 3.6 Differentiate the function $h(x) = e^x \ln x$ with respect to $x$.
This is the product of the two functions
\[ f(x) = e^x \quad\text{and}\quad g(x) = \ln x, \]
and these give us
\[ f'(x) = e^x \quad\text{and}\quad g'(x) = \frac{1}{x}. \]
As such, the product rule tells us that
\[ h'(x) = (e^x)(\ln x) + (e^x)\left(\frac{1}{x}\right) = e^x\left(\ln x + \frac{1}{x}\right), \]
is the derivative of the function $h(x) = e^x \ln x$ with respect to $x$.
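The product rule is mechanical enough to express as a small function. The Python sketch below is an illustrative assumption rather than anything from the guide: `product_rule` takes the two factors and their derivatives, and we compare its output at the sample point $x = 2$ with the simplified answers of Examples 3.4 to 3.6.

```python
import math

def product_rule(f, df, g, dg, x):
    """Return (fg)'(x) = f'(x) g(x) + f(x) g'(x)."""
    return df(x) * g(x) + f(x) * dg(x)

x = 2.0  # sample point (assumption); ln x needs x > 0

# Example 3.4: h(x) = x e^x
d34 = product_rule(lambda t: t, lambda t: 1.0, math.exp, math.exp, x)
# Example 3.5: h(x) = x ln x
d35 = product_rule(lambda t: t, lambda t: 1.0, math.log, lambda t: 1 / t, x)
# Example 3.6: h(x) = e^x ln x
d36 = product_rule(math.exp, math.exp, math.log, lambda t: 1 / t, x)

print(d34, (1 + x) * math.exp(x))
print(d35, math.log(x) + 1)
print(d36, math.exp(x) * (math.log(x) + 1 / x))
```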


Activity 3.5 Use the product rule to differentiate the following functions with
respect to $x$.
(a) $x \sin x$, (b) $e^x \cos x$, (c) $\sin x \cos x$.
What can you deduce about the derivative of $\sin(2x)$ from your answer to (c)?
The quotient rule
This allows us to differentiate the quotient of two functions $f(x)$ and $g(x)$. It states that
\[ \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{\dfrac{df}{dx}g(x) - f(x)\dfrac{dg}{dx}}{[g(x)]^2}, \]
or, using our shorthand,
\[ \left[\frac{f(x)}{g(x)}\right]' = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}. \]
Of course, as we saw in Section 2.1.2, this all assumes that the quotient of the two
functions is defined for the values of $x$ that we are working with, i.e. it only works for
values of $x$ in the domain where $g(x) \ne 0$. Let's have a look at some examples of how it
works.

Example 3.7 For $x \ne 0$, differentiate the function $h(x) = \dfrac{e^x}{x}$ with respect to $x$.
This is the quotient of the two functions
\[ f(x) = e^x \quad\text{and}\quad g(x) = x, \]
and these give us
\[ f'(x) = e^x \quad\text{and}\quad g'(x) = 1. \]
As such, for $x \ne 0$, the quotient rule tells us that
\[ h'(x) = \frac{(e^x)(x) - (e^x)(1)}{x^2} = \frac{x - 1}{x^2}e^x, \]
is the derivative of the function $h(x)$ with respect to $x$.

Example 3.8 For $x \ne 1$, differentiate the function $h(x) = \dfrac{x^3}{\ln x}$ with respect to $x$.⁴
This is the quotient of the two functions
\[ f(x) = x^3 \quad\text{and}\quad g(x) = \ln x, \]
and these give us
\[ f'(x) = 3x^2 \quad\text{and}\quad g'(x) = \frac{1}{x}. \]
As such, for $x \ne 1$, the quotient rule tells us that
\[ h'(x) = \frac{(3x^2)(\ln x) - (x^3)\left(\frac{1}{x}\right)}{[\ln x]^2} = \frac{x^2(3\ln x - 1)}{[\ln x]^2}, \]
is the derivative of $h(x)$ with respect to $x$.

Example 3.9 Differentiate the function $h(x) = \dfrac{\ln x}{e^x}$ with respect to $x$.⁵
This is the quotient of the two functions
\[ f(x) = \ln x \quad\text{and}\quad g(x) = e^x, \]
and these give us
\[ f'(x) = \frac{1}{x} \quad\text{and}\quad g'(x) = e^x. \]
As such, the quotient rule tells us that
\[ h'(x) = \frac{\left(\frac{1}{x}\right)(e^x) - (\ln x)(e^x)}{[e^x]^2} = \frac{(1 - x\ln x)e^x}{x e^{2x}} = \frac{1 - x\ln x}{x e^x}, \]
is the derivative of $h(x)$ with respect to $x$.
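The quotient rule, including its restriction to points where the denominator is non-zero, can be sketched in the same way. In the Python illustration below, the function `quotient_rule`, the choice to raise `ValueError` when $g(x) = 0$, and the sample point $x = 2$ are all our own assumptions; the three calls correspond to Examples 3.7 to 3.9.

```python
import math

def quotient_rule(f, df, g, dg, x):
    """Return (f/g)'(x) = (f'(x) g(x) - f(x) g'(x)) / g(x)^2, which needs g(x) != 0."""
    if g(x) == 0:
        raise ValueError("the quotient is not defined where g(x) = 0")
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

x = 2.0  # sample point (assumption)

# Example 3.7: h(x) = e^x / x
d37 = quotient_rule(math.exp, math.exp, lambda t: t, lambda t: 1.0, x)
# Example 3.8: h(x) = x^3 / ln x
d38 = quotient_rule(lambda t: t**3, lambda t: 3 * t**2, math.log, lambda t: 1 / t, x)
# Example 3.9: h(x) = ln x / e^x
d39 = quotient_rule(math.log, lambda t: 1 / t, math.exp, math.exp, x)

print(d37, (x - 1) / x**2 * math.exp(x))
print(d38, x**2 * (3 * math.log(x) - 1) / math.log(x) ** 2)
print(d39, (1 - x * math.log(x)) / (x * math.exp(x)))
```

Calling `quotient_rule` for Example 3.8 at $x = 1$ raises the error, mirroring the fact that $\ln x = 0$ there.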


Activity 3.6 Use the quotient rule to differentiate the following functions with
respect to $x$ and find the values of $x$ for which the derivatives exist.
(a) $\dfrac{\sin x}{x}$, (b) $\dfrac{e^x}{\cos x}$, (c) $\dfrac{\sin x}{\cos x}$.
What can you deduce about the derivative of $\tan x$ from your answer to (c)?

⁴ Here, $h(x)$ is only defined for $x \ne 1$ since we have $\ln x = 0$ if $x = 1$.
⁵ Observe that as $e^x > 0$ for all $x \in \mathbb{R}$, we don't have to worry about dividing by zero here.

The chain rule


This allows us to differentiate the composition of two functions $f(x)$ and $g(x)$. It states
that
\[ \frac{d}{dx}\left[f(g(x))\right] = \frac{df}{dg}\frac{dg}{dx}, \]
or, using our shorthand, $[f(g(x))]' = f'(g)\,g'(x)$. Let's have a look at some examples of
how it works.
Example 3.10 Differentiate the function $h(x) = (2x + 1)^3$ with respect to $x$.
The function $h(x) = (2x + 1)^3$ is the composition of the functions
\[ f(g) = g^3 \quad\text{and}\quad g(x) = 2x + 1. \]
As such we have
\[ f'(g) = 3g^2 \quad\text{and}\quad g'(x) = 2, \]
and so the chain rule tells us that
\[ h'(x) = (3g^2)(2) = 6g^2 = 6(2x + 1)^2, \]
is the derivative of $h(x)$ with respect to $x$.
Activity 3.7 Verify that this is correct by multiplying out the brackets and
differentiating your new expression for h(x) with respect to x.

Example 3.11 Differentiate the function $h(x) = \sqrt{2x + 1}$ with respect to $x$.
The function $h(x) = \sqrt{2x + 1}$ is the composition of the functions
\[ f(g) = \sqrt{g} = g^{1/2} \quad\text{and}\quad g(x) = 2x + 1. \]
As such we have
\[ f'(g) = \frac{1}{2}g^{-1/2} \quad\text{and}\quad g'(x) = 2, \]
and so the chain rule tells us that
\[ h'(x) = \left(\frac{1}{2}g^{-1/2}\right)(2) = g^{-1/2} = \frac{1}{\sqrt{2x + 1}}, \]
is the derivative of $h(x)$ with respect to $x$.⁶

⁶ In particular, observe that here the original function is only defined if $x \ge -1/2$ whereas the
derivative is only defined if $x > -1/2$ (as, in the derivative, $x = -1/2$ would entail division by zero).

Example 3.12 Differentiate the function $h(x) = e^{x^3 + 2}$ with respect to $x$.
The function $h(x) = e^{x^3 + 2}$ is the composition of the functions
\[ f(g) = e^g \quad\text{and}\quad g(x) = x^3 + 2. \]
As such we have
\[ f'(g) = e^g \quad\text{and}\quad g'(x) = 3x^2, \]
and so the chain rule tells us that
\[ h'(x) = (e^g)(3x^2) = 3x^2 e^{x^3 + 2}, \]
is the derivative of $h(x)$ with respect to $x$.
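The chain rule says to evaluate the outer derivative at the inner function and then multiply by the inner derivative. A minimal Python sketch of that recipe follows; the function name `chain_rule` and the sample point $x = 0.5$ are our assumptions, and the three calls reproduce Examples 3.10 to 3.12.

```python
import math

def chain_rule(df, g, dg, x):
    """Return [f(g(x))]' = f'(g(x)) * g'(x), given the outer derivative df."""
    return df(g(x)) * dg(x)

x = 0.5  # sample point (assumption)

# Example 3.10: h(x) = (2x + 1)^3, outer f(g) = g^3
d310 = chain_rule(lambda u: 3 * u**2, lambda t: 2 * t + 1, lambda t: 2.0, x)
# Example 3.11: h(x) = sqrt(2x + 1), outer f(g) = g^(1/2)
d311 = chain_rule(lambda u: 0.5 * u**-0.5, lambda t: 2 * t + 1, lambda t: 2.0, x)
# Example 3.12: h(x) = e^(x^3 + 2), outer f(g) = e^g
d312 = chain_rule(math.exp, lambda t: t**3 + 2, lambda t: 3 * t**2, x)

print(d310, 6 * (2 * x + 1) ** 2)
print(d311, 1 / math.sqrt(2 * x + 1))
print(d312, 3 * x**2 * math.exp(x**3 + 2))
```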


Activity 3.8 Use the chain rule to differentiate the following functions with respect
to $x$.
(a) $\sin(2x)$, (b) $\ln(\cos x)$, (c) $\ln(e^x)$.
Why should your answer to (c) be obvious?


The chain rule can also be used to derive some useful results.
Activity 3.9 (A useful result)
Using the fact that
\[ a^x = e^{x \ln a}, \]
which we saw in Section 2.1.3, show that
\[ \frac{da^x}{dx} = a^x \ln a. \]
This was mentioned in Section 3.2.1, but there is no need to remember it as you
should be able to derive this result if it is needed.
Activity 3.10 (Deriving the quotient rule)
Derive the quotient rule by writing the quotient
\[ \frac{f(x)}{g(x)} \quad\text{as the product}\quad f(x)[g(x)]^{-1}, \]
and using the product and chain rules to differentiate it with respect to $x$.
Activity 3.11 (Derivatives of inverse functions)
If the function, $f$, has an inverse, $f^{-1}$, then we can let $y = f(x)$ so that $x = f^{-1}(y)$.
Use the chain rule to show that
\[ \frac{d}{dy}\left[f^{-1}(y)\right] = 1 \bigg/ \frac{d}{dx}\left[f(x)\right]. \]
Activity 3.12 We know that if $y = e^x$, then $x = \ln y$. Use the result in
Activity 3.11 and the fact that $(e^x)' = e^x$ to show that the derivative of $\ln y$ with
respect to $y$ is $1/y$.
Using the rules together

Sometimes it will be necessary to apply several of the above rules of differentiation in


order to find a derivative. This is easily done as long as care is taken to recognise what
you are differentiating at each step. Here are two examples that should make the
procedure clear.
Example 3.13 Differentiate the function $l(x) = (x^3 + 1)\ln(x^2 + 4)$ with respect to $x$.
This is the product of the two functions
\[ f(x) = x^3 + 1 \quad\text{and}\quad g(x) = \ln(x^2 + 4), \]
and clearly, $f'(x) = 3x^2$. But to differentiate $g(x)$ we need to use the chain rule
because it is a composition. In this case, we have
\[ g(h) = \ln h \quad\text{and}\quad h(x) = x^2 + 4, \]
which gives us
\[ g'(h) = \frac{1}{h} \quad\text{and}\quad h'(x) = 2x, \]
so that
\[ g'(x) = \left(\frac{1}{h}\right)(2x) = \frac{2x}{h} = \frac{2x}{x^2 + 4}, \]
by the chain rule. Now, putting all of this into the product rule gives us
\[ l'(x) = (3x^2)\ln(x^2 + 4) + (x^3 + 1)\left(\frac{2x}{x^2 + 4}\right) = 3x^2\ln(x^2 + 4) + \frac{2x(x^3 + 1)}{x^2 + 4}, \]
as the derivative of $l(x)$ with respect to $x$.


Example 3.14 Differentiate the function $l(x) = e^{x^2 + x}\ln(x^3 + 1)$ with respect to $x$.
This is the product of the two functions
\[ f(x) = e^{x^2 + x} \quad\text{and}\quad g(x) = \ln(x^3 + 1), \]
and to differentiate $f(x)$ we need to use the chain rule because it is a composition.
In this case, we have
\[ f(h) = e^h \quad\text{and}\quad h(x) = x^2 + x, \]
which gives us
\[ f'(h) = e^h \quad\text{and}\quad h'(x) = 2x + 1, \]
so that
\[ f'(x) = (e^h)(2x + 1) = (2x + 1)e^h = (2x + 1)e^{x^2 + x}, \]
by the chain rule. Then, to differentiate $g(x)$, we need to use the chain rule again
because it is also a composition. In this case, we have
\[ g(h) = \ln h \quad\text{and}\quad h(x) = x^3 + 1, \]
which gives us
\[ g'(h) = \frac{1}{h} \quad\text{and}\quad h'(x) = 3x^2, \]
so that
\[ g'(x) = \left(\frac{1}{h}\right)(3x^2) = \frac{3x^2}{h} = \frac{3x^2}{x^3 + 1}, \]
by the chain rule. Now, putting all of this into the product rule gives us
\[ l'(x) = (2x + 1)e^{x^2 + x}\ln(x^3 + 1) + e^{x^2 + x}\left(\frac{3x^2}{x^3 + 1}\right) = \left[(2x + 1)\ln(x^3 + 1) + \frac{3x^2}{x^3 + 1}\right]e^{x^2 + x}, \]
as the derivative of $l(x)$ with respect to $x$.
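When several rules are combined it is easy to drop a factor, so a numerical cross-check can be reassuring. The sketch below is an illustration under our own assumptions (the helper `central_difference`, the names `l13`, `l14` and the sample point $x = 1$): it compares the final answers of Examples 3.13 and 3.14 with a central-difference estimate of each derivative.

```python
import math

def central_difference(f, x, h=1e-6):
    """Numerical estimate of f'(x) using (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def l13(x):
    """Function from Example 3.13."""
    return (x**3 + 1) * math.log(x**2 + 4)

def l13_prime(x):
    """Derivative found in Example 3.13."""
    return 3 * x**2 * math.log(x**2 + 4) + 2 * x * (x**3 + 1) / (x**2 + 4)

def l14(x):
    """Function from Example 3.14."""
    return math.exp(x**2 + x) * math.log(x**3 + 1)

def l14_prime(x):
    """Derivative found in Example 3.14."""
    return ((2 * x + 1) * math.log(x**3 + 1) + 3 * x**2 / (x**3 + 1)) * math.exp(x**2 + x)

x = 1.0  # sample point (assumption)
print(central_difference(l13, x), l13_prime(x))
print(central_difference(l14, x), l14_prime(x))
```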


Of course, once you can reliably apply the rules, there is no need to show all of the
intermediate working.
Activity 3.13 Use the rules of differentiation to differentiate the following
functions with respect to $x$.
(a) $e^{x^2}\ln(\sin x)$, (b) $\dfrac{\sin(\cos x)}{e^{\sin x}}$, (c) $\sin^2(3x) + \cos^2(3x)$.
Why should your answer to (c) be obvious?

3.2.3 Higher-order derivatives

As we have seen above, when we differentiate a function, $f(x)$, we find that its
derivative, $f'(x)$, is also a function of $x$. In this context, we call $f'(x)$ the first-order
derivative of $f(x)$ and we can differentiate it again to get the second-order derivative,
i.e. we find
\[ \frac{d}{dx}\left(\frac{df}{dx}\right) \quad\text{and we denote this by}\quad \frac{d^2f}{dx^2} \quad\text{or}\quad f''(x). \]
Of course, the second-order derivative will also be a function of $x$ and so we can
differentiate it again to get the third-order derivative, i.e. we find
\[ \frac{d}{dx}\left(\frac{d^2f}{dx^2}\right) \quad\text{and we denote this by}\quad \frac{d^3f}{dx^3} \quad\text{or}\quad f'''(x). \]
We can, of course, do this again and again but the shorthand notation we use can
become a bit unwieldy once we pass the third-order derivative. As such, for $n \ge 4$, we
often write the $n$th-order derivative, i.e.
\[ \frac{d^nf}{dx^n} \quad\text{as}\quad f^{(n)}(x), \]
when we use our shorthand.

Example 3.15 Find the first four derivatives of $f(x) = \sin x$. What is the
relationship between these derivatives of $\sin x$?
We have $f(x) = \sin x$, and so the first-order derivative of $f$ is given by
\[ f'(x) = \frac{d}{dx}\sin x = \cos x. \]
The second-order derivative of $f$ is then
\[ f''(x) = \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d}{dx}\cos x = -\sin x. \]
The third-order derivative of $f$ is then
\[ f'''(x) = \frac{d}{dx}\left(\frac{d^2f}{dx^2}\right) = \frac{d}{dx}(-\sin x) = -\cos x. \]
And, finally, the fourth-order derivative of $f$ is
\[ f^{(4)}(x) = \frac{d}{dx}\left(\frac{d^3f}{dx^3}\right) = \frac{d}{dx}(-\cos x) = \sin x. \]
So, in particular, we see that $f''(x) = -f(x)$, $f'''(x) = -f'(x)$ and $f^{(4)}(x) = f(x)$.
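The relationship $f''(x) = -f(x)$ for $f(x) = \sin x$ can be illustrated numerically. The sketch below uses a standard second-difference estimate of the second derivative; the helper name `second_difference`, the step size and the sample points are our own assumptions.

```python
import math

def second_difference(f, x, h=1e-4):
    """Numerical estimate of f''(x) using (f(x + h) - 2 f(x) + f(x - h)) / h^2."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

# For f(x) = sin x, the estimate of f''(x) should be close to -sin x.
for x in [0.0, 0.5, 1.0, 2.0]:
    print(x, second_difference(math.sin, x), -math.sin(x))
```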
Activity 3.14 Using the pattern inherent in Example 3.15, what is $f^{(n)}(x)$ for
$n \ge 1$?

Activity 3.15 Find the first four derivatives of $f(x) = x e^x$. Hence deduce an
expression for $f^{(n)}(x)$ for $n \ge 1$.

3.3 Using derivatives

Derivatives can be very useful in mathematics and economics, but before we see how,
we need to understand what derivatives represent.

3.3.1 The meaning of the derivative

If we draw the graph of a function, $f$, we get the curve with equation $y = f(x)$. At any
point on this curve, say the point $(a, f(a))$, we can draw a chord (or secant line) that
connects the given point to another point on the curve. For instance, in Figure 3.1, the
line segment $C$ is the chord joining the points $(a, f(a))$ and $(b, f(b))$ on the curve
$y = f(x)$. In particular, we see that the gradient of this chord, let's call it $m_C$, can be
found using the formula
\[ m_C = \frac{f(b) - f(a)}{b - a}, \]
which you should know.

Figure 3.1: The line segment $C$ is the chord joining the points $(a, f(a))$ and $(b, f(b))$ on
the curve $y = f(x)$. This is extended using the dotted lines at both ends so that we can
see what line the chord is a line segment of.

To relate this to the derivative, we take some number, $h \ne 0$, and let $b = a + h$ so that
we now have a chord, $C$, which is joining the points $(a, f(a))$ and $(a + h, f(a + h))$. The
gradient of this chord is then given by
\[ m_C(h) = \frac{f(a + h) - f(a)}{(a + h) - a} = \frac{f(a + h) - f(a)}{h}, \]
and, for $h \ne 0$, this is a function of $h$ since the value of $m_C$ will depend on the value of
$h$ that we choose. In particular, recalling what we saw in Section 3.1, we can see that
\[ f'(a) = \lim_{h \to 0} m_C(h), \]
and we want to see what this is telling us about the function, $f$.

We now consider the chords that join the point $(a, f(a))$ to the points
$(a + h_1, f(a + h_1))$, $(a + h_2, f(a + h_2))$ and $(a + h_3, f(a + h_3))$ where $h_3 > h_2 > h_1 > 0$.
These are the line segments on the lines $C_1$, $C_2$ and $C_3$ which can be seen in Figure 3.2.
That is, we have three points that are getting successively closer to the point $(a, f(a))$
so that we can see the effect of letting $h \to 0$. In particular, as we let $h$ get smaller, we
see that the gradients of these particular chords are decreasing. The question is, do the
gradients of these chords tend to some finite limit as $h \to 0$? That is, does the limit in
our expression for $f'(a)$ above exist?

Figure 3.2: $C_1$, $C_2$ and $C_3$ are three chords of the curve $y = f(x)$ originating from the
point $(a, f(a))$. Observe that as the other end of a chord approaches this point, the chords
pivot about it and their gradients get closer to the gradient of the line, $T$.

Hopefully, in Figure 3.2, you can see that as $h$ gets smaller (i.e. as we consider $C_3$, then
$C_2$ and then $C_1$), the lines are pivoting through the point $(a, f(a))$ and their gradients
are getting closer to the gradient of the line $T$. Indeed, in the limit as $h \to 0$, the lines
we get from extending an arbitrary chord joining the points $(a, f(a))$ and
$(a + h, f(a + h))$ should become the line $T$. In particular, this means that the limit of
$m_C(h)$ as $h \to 0$ exists because it should be equal to the gradient of $T$. This means that
the line $T$, called the tangent to $f$ at the point $(a, f(a))$
goes through the point $(a, f(a))$, and
its gradient is the limit, as $h \to 0$, of $m_C(h)$, i.e. $f'(a)$.

For this reason, we define the gradient of a curve $y = f(x)$ at the point $(a, f(a))$ to be
the gradient of its tangent line at that point and this, as we have seen, is simply the
value of $f'(a)$.

3.3.2 Tangent lines and linear approximations

Now that we know how the tangent lines to a curve are related to derivatives, we can
use derivatives to find the equation of the tangent line to a curve at a given point. This,
in turn, will introduce us to a useful way of performing approximations.


Finding tangent lines
Given that $f'(a)$ is the gradient of the curve $y = f(x)$ at the point $x = a$, we can use
this to find the equation of the tangent line at this point. In particular, the formula for
the gradient of a straight line, i.e.
\[ f'(a) = \frac{y - f(a)}{x - a}, \tag{3.1} \]
gives us the equation of the tangent line as it goes through the point $(a, f(a))$ and its
gradient is given by $f'(a)$. Let's look at a quick example.
Example 3.16 Find the equation of the tangent line to the function $f(x) = x^2$
when $x = 3$.
When $x = 3$, the point on the curve $y = x^2$ is $(3, 9)$ and we know that $f'(3) = 6$ as
$f'(x) = 2x$. Consequently, using (3.1), the equation of the tangent line is given by
\[ 6 = \frac{y - 9}{x - 3} \implies y - 9 = 6x - 18 \implies y = 6x - 9. \]
In particular, when written in this form, we see that the gradient of the line is
indeed 6 and the point $(3, 9)$ does indeed lie on it as $6(3) - 9 = 9$.
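Rearranging (3.1) into the form $y = mx + c$ gives $m = f'(a)$ and $c = f(a) - a f'(a)$, which is easy to compute. The following Python sketch (the function name `tangent_line` is our own, introduced for illustration) reproduces the tangent line of Example 3.16.

```python
def tangent_line(f, df, a):
    """Return (m, c) so that y = m x + c is the tangent to y = f(x) at x = a."""
    m = df(a)          # gradient of the tangent is f'(a)
    c = f(a) - m * a   # chosen so that the line passes through (a, f(a))
    return m, c

m, c = tangent_line(lambda x: x**2, lambda x: 2 * x, 3)
print(m, c)  # prints: 6 -9
```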
Activity 3.16 Find the equation of the tangent line to the function $f(x) = e^x$ when
x = 1.
Linear approximations
One use of tangent lines is that they provide us with a simple way of approximating the
value of a function. For instance, if we have the tangent line to the function $f(x)$ at the
point $x = a$, the equation of its tangent line, i.e.
\[ f'(a) = \frac{y - f(a)}{x - a}, \]
can be rearranged to give us
\[ y = f(a) + (x - a)f'(a). \]
Now, if we pick a value of $x$ that is close to $a$, say $x^*$, the value of $y$ when $x = x^*$, will be
\[ y^* = f(a) + (x^* - a)f'(a), \]
and this will be close to the value of $f(x^*)$ as illustrated in Figure 3.3. Of course, if we
pick a value of $x^*$ which is closer to $a$, the value of $y^*$ will be closer to the value of $f(x^*)$
and we will have a better approximation to the value of $f(x)$ at this point.
As we are approximating the function $f$ by a straight line, we call this a linear
approximation to $f$ around $a$. In particular, we have
\[ f(x) \approx f(a) + (x - a)f'(a), \]
if $x$ is close to $a$. In Section 3.4, we will discuss Taylor series and these will allow us to
find better approximations to $f$ around $a$, but before we do that, let's consider a useful
application of linear approximations.

Figure 3.3: When $x^*$ is close to $a$ we can use the tangent line at $a$ to find $y^*$ which gives
us an approximate value for $f(x^*)$. There will, of course, be an error involved in this
approximation but this can be made smaller if we use values of $x^*$ which are closer to $a$.
Example 3.17 Without using a calculator, find an approximate value of $3e^{-0.1}$.
Given that $f(x) = 3e^{-x}$, we have
\[ f'(x) = -3e^{-x} \implies f'(0) = -3e^0 = -3, \]
and so, using our linear approximation, we get
\[ f(0.1) \approx f(0) + (0.1 - 0)f'(0) = 3e^0 + (0.1)(-3) = 3 - 0.3 = 2.7, \]
i.e. an approximate value of $3e^{-0.1}$ is 2.7. We note in passing that, using a
calculator, we can see that the value of $3e^{-0.1}$ is 2.71 to 2dp and so this is a
pretty good approximation.
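To see how the quality of a linear approximation depends on the distance from $a$, we can compute the approximation and the error for a few values of $x$. The Python sketch below uses the function of Example 3.17; the helper name `linear_approximation` and the extra sample points 0.5 and 0.01 are our own additions.

```python
import math

def linear_approximation(f, df, a, x):
    """Tangent-line estimate f(a) + (x - a) f'(a) of the value f(x) near a."""
    return f(a) + (x - a) * df(a)

def f(x):
    return 3 * math.exp(-x)

def df(x):
    return -3 * math.exp(-x)

errors = []
for x in [0.5, 0.1, 0.01]:
    approx = linear_approximation(f, df, 0.0, x)
    errors.append(abs(approx - f(x)))
    print(x, approx, f(x), errors[-1])
```

The error shrinks quickly as $x$ moves closer to $a = 0$, in line with Figure 3.3.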
Using linear approximations to find changes
As well as allowing us to find approximations to $f$ around $a$, linear approximations can
give us useful information about how the value of $f$ is changing as we move from $a$ to,
say, $a + h$. Using our linear approximation, we see that
\[ f(a + h) \approx f(a) + hf'(a) \implies f'(a) \approx \frac{f(a + h) - f(a)}{h}, \]
and so, if we denote the change in $f$ by $\Delta f$ and the change in $x$ by $\Delta x = h$, we see that
\[ \frac{\Delta f}{\Delta x} \approx f'(a) \quad\text{or}\quad \Delta f \approx f'(a)\Delta x. \]
That is, we can find the approximate value of the change in $f$ if we change $x$ from $a$ to
$a + h$. Of course, the smaller $\Delta x = h$ is, the better our approximation. This is
illustrated in Figure 3.4.

Figure 3.4: Finding an approximate value for $\Delta f$ using a linear approximation to $f$.
Obviously, the smaller the value of the change $\Delta x = h$, the better the approximation for
$\Delta f$ will be.
Example 3.18 Without using a calculator, find the approximate change in $3e^{-x}$ if
$x$ is increased from zero to 0.1. Hence deduce the approximate value of $3e^{-0.1}$.
Given that $f(x) = 3e^{-x}$, we have
\[ f'(x) = -3e^{-x} \implies f'(0) = -3e^0 = -3, \]
so, if $\Delta x = 0.1$ as $x$ increases from 0 to 0.1, we find that
\[ \Delta f \approx f'(0)\Delta x = (-3)(0.1) = -0.3, \]
i.e. the change in $f$ is approximately $-0.3$. Observe that the minus sign is telling us
that when $x$ increases from 0 to 0.1, $f(x)$ is decreasing by approximately 0.3.
This means that using
\[ f(0.1) \approx f(0) + \Delta f = 3e^0 + (-0.3) = 3 - 0.3 = 2.7, \]
we see that the approximate value of $3e^{-0.1}$ is 2.7 as we would expect from the linear
approximation in Example 3.17.

Further, as the derivative of a function gives us information about how $f(x)$ is changing
due to changes in $x$, we often refer to $f'(a)$ as the rate of change of $f(x)$ with respect to
$x$ when $x = a$.

3.3.3 Applications of derivatives

Derivatives are useful in economics and we now introduce two ways in which they can
arise in that subject. The first is their use when discussing marginal functions and the
second is when they are used in the context of elasticities. At this point, we will just
introduce these ideas and see how they might be useful, but they will also be used when
we consider some applications of the material contained in other chapters of this subject
guide.
Marginal functions
In economics, the term marginal denotes the rate of change of a quantity with respect
to a variable on which it depends. For instance, if a firm has a cost function, C(q), this
tells us the cost of producing q units of their product. The marginal cost of the firm,
which we denote by MC(q), would then be given by
$$MC(q) = \frac{dC}{dq}.$$
This is useful since, using what we saw above, we can see that the marginal cost is
telling us (approximately) about how changes in the level of production, $q$, will incur
changes in the costs, $C$. That is, if the level of production is increased by $\Delta q$, i.e. our
production increases from $q$ to $q + \Delta q$, we find that
$$MC(q) = \frac{dC}{dq} \quad\Longrightarrow\quad MC(q) \approx \frac{\Delta C}{\Delta q} \quad\Longrightarrow\quad \Delta C \approx MC(q)\,\Delta q,$$
where $\Delta C = C(q + \Delta q) - C(q)$ is, of course, the resulting increase in costs. In
particular, if $q$ is so large that a change in production of one unit (i.e. $\Delta q = 1$) is small
compared to $q$, we see that
$$\Delta C = C(q + 1) - C(q) \approx MC(q).$$
That is, in these circumstances, the marginal cost tells us (approximately) the extra
cost incurred if the firm wishes to produce one more unit of their good given that they
are already producing $q$ units.
Example 3.19 A firm has a cost function given by
$$C(q) = 1000 + 5q + q^2,$$
in dollars. Find the marginal cost function for this firm and use it to determine the
approximate cost of producing one more unit if the original level of production is 100
units.
The marginal cost function, $MC(q)$, is given by
$$MC(q) = C'(q) = 5 + 2q,$$
and so using the fact that the change in cost, $\Delta C$, is related to the change in
production, $\Delta q$, by
$$\Delta C \approx C'(q)\,\Delta q,$$
we see that an increase in production of one unit, i.e. $\Delta q = 1$, gives rise to an
increase in costs given by
$$\Delta C \approx C'(100)(1) = (5 + 2(100))(1) = 205.$$
That is, if the firm is producing 100 units and they increase their production by one
unit, they will incur additional costs of approximately 205 dollars.
Activity 3.17 By using $C(q + 1) - C(q)$ directly when $q = 100$, determine how
good the approximation found in Example 3.19 is.
Generally then, if $f$ is some economically meaningful function, its derivative is referred
to as the marginal of $f$ and we denote this by $Mf$. For instance, if $R(q)$ is the revenue
function for a firm, the marginal revenue, $MR(q)$, is just $R'(q)$.
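The relationship between a marginal function and the exact one-unit change can be checked numerically. The sketch below uses the cost function from Example 3.19. The names `cost` and `marginal_cost` are illustrative, and the comparison is the one that Activity 3.17 asks for.

```python
def cost(q):
    """Cost function from Example 3.19, C(q) = 1000 + 5q + q^2 (in dollars)."""
    return 1000 + 5 * q + q ** 2


def marginal_cost(q):
    """Marginal cost, MC(q) = C'(q) = 5 + 2q."""
    return 5 + 2 * q


q = 100
approx_extra_cost = marginal_cost(q) * 1  # Delta C is roughly MC(q) * Delta q with Delta q = 1
exact_extra_cost = cost(q + 1) - cost(q)  # C(q + 1) - C(q) computed directly

print("approximate extra cost:", approx_extra_cost)
print("exact extra cost:      ", exact_extra_cost)
print("relative error:        ", abs(exact_extra_cost - approx_extra_cost) / exact_extra_cost)
```

The two figures differ by one dollar, which is small relative to the size of the change because one unit is small compared with $q = 100$.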
Elasticities
Suppose that, as in Section 2.1.5, we have a market where consumers purchase a good
according to the demand function, $q^D(p)$. If the price of this good was to increase from
$p$ to $p + \Delta p$, then there will be a change in the quantity demanded by the consumers
from $q$ to $q + \Delta q$. Indeed, since a rise in price will usually lead to a fall in demand, we
would expect $\Delta q$ to be negative here. In these circumstances, we can see how these
changes are related by noting that
$$\Delta q = q^D(p + \Delta p) - q^D(p) \approx q'(p)\,\Delta p \quad\Longrightarrow\quad \frac{\Delta q}{\Delta p} \approx q'(p),$$
where we have used $q$ to denote the quantity demanded, i.e. $q(p) = q^D(p)$.
Now, suppose that we are interested in the relative change in quantity, $\Delta q/q$, and the
relative change in price, $\Delta p/p$. We can see that the ratio of these two terms is then given
by
$$\frac{\Delta q/q}{\Delta p/p} = \frac{p}{q}\,\frac{\Delta q}{\Delta p} \approx \frac{p}{q}\,q'(p).$$
Indeed, as $\Delta q$ is usually negative (whereas the other terms on the left-hand-side, i.e. $p$,
$q$ and $\Delta p$, are all positive) we would usually expect the right-hand-side to be negative
as well. With this in mind, we define the [price] elasticity of demand, $\varepsilon(p)$, to be
$$\varepsilon(p) = -\frac{p}{q}\,q'(p),$$
where $q = q^D(p)$ and the minus sign is introduced so that, in the usual case where $\Delta q$ is
negative, we can be sure that $\varepsilon(p)$ itself will be positive.⁷ Then, using
$$\frac{\Delta q}{q} \approx -\varepsilon(p)\,\frac{\Delta p}{p},$$
we can see how the relative change in quantity is simply related to the relative change
in price via the elasticity of demand.

⁷ Some books omit the minus sign in their definition of the elasticity of demand, but it will be useful
for us to include it as it is easier to deal with positive quantities.
Example 3.20 Suppose that the demand function for some good is given by
$q^D(p) = 10p^{-r}$ where $r$ is a constant. Find the elasticity of demand. What does this
tell us about the effect of relative changes in price on relative changes in quantity?

Here we have $q = q^D(p) = 10p^{-r}$ where $r$ is a constant and so,
$$q'(p) = -10rp^{-r-1},$$
which means that the elasticity of demand is given by
$$\varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{10p^{-r}}\left(-10rp^{-r-1}\right) = r,$$
i.e. $\varepsilon(p)$ is a constant too. This means that, using
$$\frac{\Delta q}{q} \approx -\varepsilon(p)\,\frac{\Delta p}{p},$$
we see that a relative increase in price of, say, $x\%$ will lead to a relative decrease in
quantity purchased of (approximately) $rx\%$.
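The constant-elasticity result in Example 3.20 can be illustrated numerically. The sketch below is not from the guide: it assumes a hypothetical value $r = 1.5$, estimates $q'(p)$ with a central difference, and then checks that a 1% price rise reduces quantity by roughly $r\%$.

```python
def demand(p, r):
    """Demand function from Example 3.20, q = 10 p^(-r)."""
    return 10 * p ** (-r)


def elasticity(p, r, h=1e-6):
    """Estimate -(p/q) q'(p) using a central difference for q'(p)."""
    dq_dp = (demand(p + h, r) - demand(p - h, r)) / (2 * h)
    return -(p / demand(p, r)) * dq_dp


r = 1.5  # hypothetical value of the constant r, chosen only for illustration

for p in (1.0, 2.0, 5.0):
    print(f"p = {p}: estimated elasticity = {elasticity(p, r):.6f}")

# Effect of a 1% price rise at p = 2 on the relative change in quantity.
p = 2.0
relative_change_q = (demand(1.01 * p, r) - demand(p, r)) / demand(p, r)
print(f"relative change in quantity: {relative_change_q:.5f}")
print(f"estimate -r * 1%:            {-r * 0.01:.5f}")
```

The estimated elasticity should be close to `r` at every price, which is what "constant elasticity" means for this demand function.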
Indeed, we will see, in Section 4.2.3, that elasticities can also give us useful information
about how the revenue, R = pq, generated from selling a quantity, q, at a price of p per
unit will be affected by increases in the price.

3.3.4 Existence of derivatives

Although we will usually be dealing with situations where a function has a derivative at
every point where it is defined, we will occasionally encounter situations where there is
at least one point at which the derivative of a function does not exist. Just so that we
are aware of what this means and the kinds of situation in which it can arise, we
consider some of the most common ways in which a derivative can fail to exist at a
certain point.⁸
Discontinuous functions
If a function is discontinuous at a point, i.e. there is a point at which the function is not
continuous, then the derivative will not exist at that point as the next example
illustrates.
Example 3.21 Show that the derivative of the function defined by
$$f(x) = \begin{cases} 1 & x \ge 0 \\ -1 & x < 0 \end{cases}$$
does not exist when $x = 0$.

This function is illustrated in Figure 3.5(a) and, clearly, as the function is a
continuous horizontal line when $x \ne 0$, its derivative is defined and equal to zero as
long as $x \ne 0$. However, when $x = 0$, the function is discontinuous and its derivative,
if it exists, would be given by
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h}.$$
However, here we cannot just find
$$\frac{f(h) - f(0)}{h},$$
and let $h \to 0$ as we did in Section 3.1 since the value of $f(h)$ is different depending
on whether $h$ is positive or negative. In such cases, we say that the limit we seek, i.e.
$$\lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
exists if, firstly, both of the limits
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} \quad\text{and}\quad \lim_{h \to 0^+} \frac{f(h) - f(0)}{h},$$
exist⁹ and, secondly, if they exist, they must be equal. But, using the given function,
for which $f(0) = 1$, we see that
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(-1) - 1}{h} = \lim_{h \to 0^-} \frac{-2}{h} = \infty,$$
and
$$\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{(1) - 1}{h} = \lim_{h \to 0^+} 0 = 0,$$
i.e. the first of these limits does not exist as $\infty$ is not a value¹⁰ but more of a notational
convenience which tells us that a function is getting arbitrarily large in the limit.
Consequently, we see that
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
fails to exist too and so the derivative of this function does not exist at $x = 0$.

⁸ See, for example, Section 2.8 of Binmore and Davies (2002) for a discussion of some similar cases.
Of course, the graph of a function can also have a discontinuity due to the presence of a
vertical asymptote. In such cases, the function is not actually defined at the value of $x$
where the asymptote occurs and so, because of this, the derivative cannot exist at this
point either.¹¹ In both of these cases, as we can't ascribe a gradient to the function at
these points, the function can't have a tangent line at these points.
⁹ Notice that the former limit allows us to deal with negative $h$ and the latter allows us to deal with
positive $h$. Also recall that the notation $h \to 0^-$ and $h \to 0^+$ was explained in Example 2.2.
¹⁰ That is, it is not a real number.
¹¹ We'll come across this again in Section 4.4.3.

[Figure 3.5 has three panels: (a) the step function with $y = 1$ for $x \ge 0$ and $y = -1$ for $x < 0$, (b) $y = |x|$ and (c) $y = x^{1/3}$.]

Figure 3.5: The graphs of three functions that have no derivative at $x = 0$ as explained in
(a) Example 3.21, (b) Example 3.22 and (c) Example 3.23. We note however that, unlike
the functions in (a) and (b), the function in (c) does have a tangent line at $x = 0$ given
by the vertical line with equation $x = 0$.
Continuous functions with corners
But, even if a function is continuous at every point, the derivative will not exist at
points where the curve changes too sharply, i.e. when the curve has a corner, as the
next example illustrates.
Example 3.22 Show that the derivative of the function $f(x) = |x|$ does not exist
when $x = 0$.

This function is illustrated in Figure 3.5(b) and, clearly, as the function is the
continuous straight line $f(x) = -x$ when $x < 0$ and $f(x) = x$ when $x > 0$, its
derivative is defined and equal to $-1$ when $x < 0$ and $1$ when $x > 0$. However, when
$x = 0$, the function has a corner and its derivative, if it exists, would be given by
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h}.$$
However here, as in Example 3.21, we cannot just find
$$\frac{f(h) - f(0)}{h},$$
and let $h \to 0$ as we did in Section 3.1 since the value of $f(h)$ is different depending
on whether $h$ is positive or negative. In such cases, we again say that the limit we
seek, i.e.
$$\lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
exists if, firstly, both of the limits
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} \quad\text{and}\quad \lim_{h \to 0^+} \frac{f(h) - f(0)}{h},$$
exist¹² and, secondly, if they exist, they must be equal. But, using the given
function, we see that
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(-h) - 0}{h} = \lim_{h \to 0^-} (-1) = -1,$$
and
$$\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h - 0}{h} = \lim_{h \to 0^+} 1 = 1,$$
i.e. both of these limits exist, but they are clearly not equal. Consequently, we see
that
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
fails to exist and so the derivative of this function does not exist at $x = 0$.

Observe that, in this case, the limits as $h \to 0^+$ and as $h \to 0^-$ both exist, but the
problem occurs because they are not equal and so we cannot ascribe a value to the
derivative (i.e. the limit as $h \to 0$) in such situations. In particular, as this means that
we can't ascribe a gradient to $f$ at this point, the function can't have a tangent line
here either.
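The disagreement between the two one-sided limits for $f(x) = |x|$ can be seen by evaluating the difference quotient for small positive and negative $h$. This is an illustrative sketch only, and the helper name `difference_quotient` is an assumption rather than anything defined in the guide.

```python
def difference_quotient(f, a, h):
    """Return (f(a + h) - f(a)) / h, the quantity whose limit defines f'(a)."""
    return (f(a + h) - f(a)) / h


for h in (0.1, 0.01, 0.001):
    from_left = difference_quotient(abs, 0.0, -h)   # h approaches 0 from below
    from_right = difference_quotient(abs, 0.0, h)   # h approaches 0 from above
    print(f"h = {h}: from the left {from_left}, from the right {from_right}")
```

However small `h` is taken, the quotient from the left stays at $-1$ and the quotient from the right stays at $1$, so no single value can be assigned to the derivative at $x = 0$.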
Continuous functions with vertical tangent lines
Also, if a function is continuous at every point, the derivative will not exist at points
where the gradient of the curve becomes infinite, i.e. when the curve has a vertical
tangent line, as the next example illustrates.

Example 3.23 Show that the derivative of the function $f(x) = x^{1/3}$ does not exist
when $x = 0$.

This function is illustrated in Figure 3.5(c) and, clearly, we can see that its
derivative is given by
$$f'(x) = \tfrac{1}{3}x^{-2/3} = \frac{1}{3x^{2/3}},$$
which exists as long as $x \ne 0$. Of course, when $x = 0$, the derivative cannot exist
since, if we were to use this formula, we would have to divide by zero and this is
never allowed. However, we can see from Figure 3.5(c) that the graph of the function
has a vertical tangent line at $x = 0$ which is given by the vertical line with equation
$x = 0$.¹³ Thus, we have a situation where the derivative of the function does not
exist at $x = 0$, but it does have a tangent line at that point.

Observe that, in cases where the tangent line to $f$ at a point is a vertical line we cannot
use (3.1) to find its equation as its derivative is not defined.¹⁴

¹² Again, as in Example 3.21, the former limit allows us to deal with negative $h$ and the latter allows
us to deal with positive $h$.
¹³ Notice that the tangent lines of the function are getting steeper as we move towards $x = 0$ on the
left and shallower as we move away from $x = 0$ on the right.
¹⁴ We'll come across this again in Section 4.4.3.

3.4 Using higher-order derivatives

We have seen that the first derivative of a function, f , can allow us to find a linear
approximation to f around a by using the formula
$$f(x) \approx f(a) + (x - a)f'(a).$$
However, if we use higher-order derivatives, we can get better approximations to $f$
around $a$ by using the formula
$$f(x) = f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \cdots + \frac{(x - a)^n}{n!}f^{(n)}(a) + \cdots, \tag{3.2}$$
which is called the Taylor series for $f(x)$ about $x = a$.¹⁵ You will notice that the
right-hand-side of this formula is an infinite series and, for reasons beyond the scope of
this course, there will generally be conditions that depend on f and a that determine
whether this infinite series does indeed give us the value of f (x) that we expect to get
on the left-hand-side. For now, we just note that these conditions can be used to find a
set of values of x, that includes the point x = a, for which the formula works. Of course,
if the value of x in question does not lie in this set, the formula does not work!
In this course, we will often just use the first few terms from the Taylor series to get an
approximate value of $f(x)$.¹⁶ And, as long as we are considering what this formula tells
us about f (x) when x is close to a, these approximations will generally be more than
adequate. For instance, if we take n = 1 in this formula, i.e. if we take the first two
terms of the Taylor Series, we recover our linear approximation to f around a and, if we
take n = 2, we get
$$f(x) \approx f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a),$$

which is now a quadratic approximation to f around a. Indeed, we have seen how useful
the linear approximation is in Section 3.3.2 and the quadratic approximation will be
useful in the next chapter.

3.4.1 Maclaurin series

Let's start with the relatively simple case of a Maclaurin series which is what we call a
Taylor series about $x = 0$. That is, the Maclaurin series of the function $f(x)$ is found by
setting $a = 0$ in (3.2) to get
$$f(x) = f(0) + xf'(0) + \frac{x^2}{2!}f''(0) + \cdots + \frac{x^n}{n!}f^{(n)}(0) + \cdots. \tag{3.3}$$
To see how this works, let's start by finding a simple Maclaurin series.
¹⁵ See, for instance, Section 2.13 of Binmore and Davies (2002) for an explanation of where this formula
comes from.
¹⁶ It will be an approximation since, if we only keep the first few terms from the beginning of the series,
we lose all the information about the value of $f(x)$ that is contained in the terms we are neglecting.

Example 3.24 Find the Maclaurin series for $e^x$.

Here we have $f(x) = e^x$ so that $f(0) = 1$. We also note that the first three
derivatives of this function are
$$f'(x) = e^x, \quad f''(x) = e^x \quad\text{and}\quad f'''(x) = e^x.$$
Indeed, it should be clear that $f^{(n)}(x) = e^x$ for all $n \ge 1$. Then, to use these in (3.3),
we need to evaluate these derivatives at $x = 0$, i.e. we find that
$$f'(0) = e^0 = 1, \quad f''(0) = e^0 = 1 \quad\text{and}\quad f'''(0) = e^0 = 1.$$
Indeed, it should be clear that $f^{(n)}(0) = e^0 = 1$ for all $n \ge 1$. Consequently, putting
this into (3.3), we get
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots,$$
and this formula works for all $x \in \mathbb{R}$.

In particular, observe that $e^x$ is only equal to the series on the right-hand-side if we
keep all of the terms in this infinite series. Of course, it is not always so easy to find a
Maclaurin series and so let's look at another example.
Example 3.25 Find the Maclaurin series for $(1 + x)^r$ when $r \in \mathbb{Q}$.

Here we have $f(x) = (1 + x)^r$ so that $f(0) = 1$. We also note that the first three
derivatives of this function are
$$f'(x) = r(1 + x)^{r-1}, \quad f''(x) = r(r - 1)(1 + x)^{r-2} \quad\text{and}\quad f'''(x) = r(r - 1)(r - 2)(1 + x)^{r-3}.$$
Indeed, it should be clear that
$$f^{(n)}(x) = r(r - 1) \cdots (r - [n - 1])(1 + x)^{r-n},$$
for all $n \ge 1$. Then, to use these in (3.3), we need to evaluate these derivatives at
$x = 0$, i.e. we find that
$$f'(0) = r, \quad f''(0) = r(r - 1) \quad\text{and}\quad f'''(0) = r(r - 1)(r - 2).$$
Indeed, spotting the pattern, it should be clear that
$$f^{(n)}(0) = r(r - 1) \cdots (r - [n - 1]),$$
for all $n \ge 1$. Consequently, putting this into (3.3), we get
$$(1 + x)^r = 1 + rx + \frac{r(r - 1)}{2!}x^2 + \frac{r(r - 1)(r - 2)}{3!}x^3 + \cdots + \frac{r(r - 1) \cdots (r - [n - 1])}{n!}x^n + \cdots,$$
and this formula works when $|x| < 1$.

In particular, notice that if $r \in \mathbb{Q}$ but $r \notin \mathbb{N}$, this is always an infinite series as, for any
$n \in \mathbb{N}$, we will find that $r - [n - 1] \ne 0$. However, if $r \in \mathbb{N}$, we will find a value of $n$,
namely $n = r + 1$, that makes $r - [n - 1] = 0$ and this will mean that all of the terms
with $n \ge r + 1$ will be zero, i.e. the Maclaurin series will be finite and will terminate at
the term where $n = r$. This is a very special Maclaurin series that you may have
encountered before as the binomial theorem and we look at some examples of this
special case in Activity 3.18.
Activity 3.18 Use the Maclaurin series for $(1 + x)^r$ which we found in
Example 3.25 to find $(1 + x)^2$ and $(1 + x)^3$.
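As well as the two Maclaurin series derived in Examples 3.24 and 3.25, you should also
remember the following
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n\frac{x^{2n+1}}{(2n + 1)!} + \cdots \quad\text{for } x \in \mathbb{R}.$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots \quad\text{for } x \in \mathbb{R}.$$
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n+1}\frac{x^n}{n} + \cdots \quad\text{for } |x| < 1.$$
In particular, observe how these series differ in their first term, the presence of terms of
odd and even degree and the absence of factorials in the series for $\ln(1 + x)$.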

Using Maclaurin series as approximations


As we have seen, a Maclaurin series is an infinite series in powers of $x$ and, by taking a
certain number of terms, we can use it to approximate a function. In particular, we say
that we have the $n$th-order Maclaurin series of a function if we keep all of the terms up
to and including the one in $x^n$ and discard the rest.
Example 3.26 Write down the second and fourth-order Maclaurin series for $\cos x$.

As we saw above, the Maclaurin series for $\cos x$ is given by the infinite series
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n\frac{x^{2n}}{(2n)!} + \cdots.$$
As such, the second-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!},$$
which, since there is no $x^3$ term in the Maclaurin series for $\cos x$, is also the
third-order Maclaurin series for $\cos x$. Similarly, the fourth-order Maclaurin series for
$\cos x$ is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!},$$
which, since there is no $x^5$ term in the Maclaurin series for $\cos x$, is also the
fifth-order Maclaurin series for $\cos x$.

These nth-order Maclaurin series can be used to approximate a function, f (x), for
values of x close to x = 0. In general, there are two factors that determine how accurate
this approximation will be, namely
the value of x we are considering: the closer this value of x is to x = 0, the better
the approximation will be, and
the order of the Maclaurin series we use: the more terms we keep, the better the
approximation will be.
The precise way of determining the accuracy of such approximations in terms of these
two factors will be dealt with in 176 Further Calculus where you will encounter Taylor's
theorem. But, we can see how it works and begin to see how these factors affect the
accuracy of our approximations by considering some examples.
Example 3.27 Use the fourth-order Maclaurin series for $\cos x$ to find an
approximate value for $\cos 1$ and $\cos 2$.
The fourth-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!}.$$
This means that taking $x = 1$, we see that
$$\cos 1 \approx 1 - \frac{1^2}{2} + \frac{1^4}{24} = \frac{13}{24},$$
which is 0.5417 to 4dp. Using a calculator we see that the true value of $\cos 1$ is 0.5403
to 4dp and so this is a good approximation as, to 2dp, it gives us 0.54 either way.
Similarly, taking $x = 2$, we see that
$$\cos 2 \approx 1 - \frac{2^2}{2} + \frac{2^4}{24} = 1 - 2 + \frac{2}{3} = -\frac{1}{3},$$
which is $-0.3333$ to 4dp. Using a calculator we see that the true value of $\cos 2$ is
$-0.4161$ to 4dp and so this is a poor approximation as it isn't even accurate to 1dp.
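The two factors discussed above (distance from $x = 0$ and the order of the series) can be explored with a short script. The following is a sketch rather than anything from the guide. The function name `maclaurin_cos` is an illustrative choice, and it simply sums the terms of the Maclaurin series for $\cos x$ up to a given order.

```python
import math


def maclaurin_cos(x, order):
    """Sum the Maclaurin series for cos x, keeping terms up to x**order."""
    total = 0.0
    for k in range(order // 2 + 1):
        total += (-1) ** k * x ** (2 * k) / math.factorial(2 * k)
    return total


for x in (1.0, 2.0):
    for order in (2, 4, 8):
        approx = maclaurin_cos(x, order)
        error = abs(approx - math.cos(x))
        print(f"x = {x}, order {order}: approx = {approx:.4f}, error = {error:.4f}")
```

For each fixed order the error at $x = 2$ is larger than at $x = 1$, and for each fixed $x$ the error falls as the order increases.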
But, of course, we should expect our approximations to be poor if we move too far away
from x = 0 as, by definition, the Maclaurin series represents how the function is
behaving around x = 0. To see this, consider the curves in Figure 3.6 which illustrate
how the fourth-order Maclaurin series for cos x becomes less accurate at approximating
the function as we move away from x = 0.
The other way in which the accuracy of our approximation to a function can be affected
is the number of terms we take in the Maclaurin series. For instance, the second-order
Maclaurin series for cos x contains less information about the function than the
fourth-order one and so we would expect this to give us a worse approximation. This
can be seen in Figure 3.7, which illustrates how the second-order Maclaurin series is
even less accurate than the fourth-order one as we move away from x = 0.

Figure 3.6: The solid curve is the graph of the function $\cos x$ and the dashed curve is the
graph of the fourth-order Maclaurin series for this function. Observe how the Maclaurin
series moves away from the function as we take values of $x$ further away from $x = 0$.
Using Maclaurin series to approximate other functions
We now look at some ways of finding Maclaurin series for more complicated functions
and see how we can use these to find approximations.
Example 3.28 Find the fourth-order Maclaurin series for $xe^x$.

There are two ways to do this. We could use (3.3) to see that as $f(x) = xe^x$ we have
$f(0) = 0$ and then, using what we found in Activity 3.15 above, i.e.
$$f'(x) = (1 + x)e^x, \quad f''(x) = (2 + x)e^x, \quad f'''(x) = (3 + x)e^x \quad\text{and}\quad f^{(4)}(x) = (4 + x)e^x,$$
we see that
$$f'(0) = 1, \quad f''(0) = 2, \quad f'''(0) = 3 \quad\text{and}\quad f^{(4)}(0) = 4.$$
So, putting this into (3.3) we see that, keeping terms up to $x^4$,
$$0 + x(1) + \frac{x^2}{2!}(2) + \frac{x^3}{3!}(3) + \frac{x^4}{4!}(4) = x + x^2 + \frac{x^3}{2} + \frac{x^4}{6},$$
is the fourth-order Maclaurin series for $xe^x$.

Alternatively, since we know that the Maclaurin series for $e^x$ is given by
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots,$$
we can see that
$$xe^x = x\left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots\right) = x + x^2 + \frac{x^3}{2} + \frac{x^4}{6} + \cdots,$$
as before, if we keep terms up to $x^4$ for the fourth-order Maclaurin series for $xe^x$.

Figure 3.7: The solid curve is the graph of the function $\cos x$, the dotted curve is the graph
of its second-order Maclaurin series and the dashed curve is the graph of its fourth-order
Maclaurin series. Observe how the former less accurately tracks the function than the
latter as we take values of $x$ further away from $x = 0$.


This example illustrates a general point: when asked to find a Maclaurin series of a
certain order, we can always use the definition and differentiation. But, if the
derivatives start to become difficult to calculate, it is always easier to use the Maclaurin
series for the elementary functions (which we saw above) and a little algebra to find
what we are looking for. Let's consider another example to see how we can do this in a
slightly harder situation.
Example 3.29 Find the fourth-order Maclaurin series for $\cos(\ln(1 + x))$.

Here we have $f(x) = \cos(\ln(1 + x))$ which is a composition where $f(x) = \cos y$ with
$y = \ln(1 + x)$. So we need to look at the Maclaurin series for $\cos y$ which is given by
$$\cos y = 1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots,$$
and $y$, in turn, will be given by the Maclaurin series for $\ln(1 + x)$, i.e.
$$y = \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
So, substituting our series for $y$ into our series for $\cos y$, we can see that
$$f(x) = 1 - \frac{1}{2!}\underbrace{\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^2}_{A} + \frac{1}{4!}\underbrace{\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^4}_{B} - \cdots,$$
and we start by looking at how the terms $A$ and $B$ contribute to the series if we are
only interested in terms up to $x^4$. For $A$, we have
$$A = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^2 = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right),$$
so we can multiply each term in the second bracket by the appropriate terms in the
first bracket (taking care to include cross-terms) to get
$$A = (x)(x) - 2\left(\frac{x^2}{2}\right)(x) + 2\left(\frac{x^3}{3}\right)(x) + \left(\frac{x^2}{2}\right)\left(\frac{x^2}{2}\right) + \cdots = x^2 - x^3 + \frac{11}{12}x^4 + \cdots,$$
where $\cdots$ indicates terms we can ignore because their degree is greater than four.
Similarly, for $B$, we have
$$B = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right)^4,$$
which is the bracketed expression
$$x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots,$$
multiplied by itself four times. The terms which arise from this product are obtained
by multiplying together four objects, one from each occurrence of the bracketed
expression. Since the term with lowest power of $x$ in each bracket is $x$, it is only by
taking the $x$ from each bracket that we obtain a term which is at most $x^4$ and so we
get
$$B = x^4 + \cdots,$$
where $\cdots$ indicates terms we can ignore because their degree is greater than four.
Of course, using similar reasoning, we can see that there will be no further terms for
our series as the next term in the $\cos y$ series (i.e. the first one we omitted above) is
$-y^6/6!$ and the smallest term this can yield looks like $x^6$ whose degree is greater
than four.
Therefore, putting this all together, we have
$$\cos(\ln(1 + x)) = 1 - \frac{A}{2!} + \frac{B}{4!} - \cdots = 1 - \frac{1}{2}\left(x^2 - x^3 + \frac{11}{12}x^4 + \cdots\right) + \frac{1}{24}\left(x^4 + \cdots\right) - \cdots = 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12}x^4 + \cdots,$$
and this gives us the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ as we have kept
all of the terms up to $x^4$.
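The substitution argument in Example 3.29 amounts to multiplying polynomials and discarding every term of degree greater than four. The sketch below (an illustration, not the guide's method) does this with exact fractions. Polynomials are stored as coefficient lists `[c0, c1, c2, c3, c4]`, and the helper name `multiply` is an assumption made for this example.

```python
from fractions import Fraction

N = 4  # keep terms up to x^4 only


def multiply(p, q):
    """Multiply two truncated series, discarding terms of degree above N."""
    result = [Fraction(0)] * (N + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= N:
                result[i + j] += a * b
    return result


# y = ln(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + ...
y = [Fraction(0), Fraction(1), Fraction(-1, 2), Fraction(1, 3), Fraction(-1, 4)]

A = multiply(y, y)  # y^2, truncated at x^4
B = multiply(A, A)  # y^4, truncated at x^4

# cos y = 1 - y^2/2! + y^4/4! - ... (the y^6 term cannot contribute below x^6)
series = [Fraction(0)] * (N + 1)
series[0] = Fraction(1)
for k in range(N + 1):
    series[k] += -A[k] / 2 + B[k] / 24

print("A      =", [str(c) for c in A])
print("B      =", [str(c) for c in B])
print("series =", [str(c) for c in series])
```

Using `Fraction` rather than floating-point numbers keeps coefficients such as $11/12$ and $5/12$ exact, so they can be compared directly with the hand calculation.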


Activity 3.19 Find the fourth-order Maclaurin series for cos(ln(1 + x)) by using
the definition and differentiation to verify the answer we found in Example 3.29.
(Notice that it is harder to work it out using this method!)
Once we have the Maclaurin series of a function, f (x), we can use it to estimate the
value of the function at some value of x close to zero as we did above.

Example 3.30 Use the Maclaurin series we found in Example 3.29 to find an
approximate value for $\cos(\ln 1.1)$ and $\cos(\ln 1.9)$.
To find an approximate value for $\cos(\ln 1.1)$, we use the Maclaurin series above to
get the approximation
$$\cos(\ln(1 + x)) \approx 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12}x^4,$$
and then set $x = 0.1$ to get
$$\cos(\ln 1.1) = \cos(\ln(1 + 0.1)) \approx 1 - \frac{0.1^2}{2} + \frac{0.1^3}{2} - \frac{5}{12}(0.1)^4 = 1 - 0.005 + 0.0005 - 0.000042,$$
which is 0.995458 to 6dp. In passing we note that, using a calculator, the true value
is 0.995461 to 6dp and so this is a good approximation as, to 5dp, it gives us 0.99546
either way.
To find an approximate value for $\cos(\ln 1.9)$, we use the approximation above with
$x = 0.9$ to get
$$\cos(\ln 1.9) = \cos(\ln(1 + 0.9)) \approx 1 - \frac{0.9^2}{2} + \frac{0.9^3}{2} - \frac{5}{12}(0.9)^4 = 1 - 0.405 + 0.3645 - 0.273375,$$
which is 0.686125 to 6dp. In passing we note that, using a calculator, the true value
is 0.800987 to 6dp and so this is a poor approximation as it isn't even accurate to
1dp.
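The two estimates in Example 3.30 can be reproduced, and compared with the values a calculator would give, using a few lines of code. This is an illustrative sketch, and the function name `approx_cos_ln` is an assumption.

```python
import math


def approx_cos_ln(x):
    """Fourth-order Maclaurin approximation to cos(ln(1 + x)) from Example 3.29."""
    return 1 - x ** 2 / 2 + x ** 3 / 2 - 5 * x ** 4 / 12


for x in (0.1, 0.9):
    estimate = approx_cos_ln(x)
    true_value = math.cos(math.log(1 + x))
    print(f"x = {x}: estimate = {estimate:.6f}, "
          f"true value = {true_value:.6f}, error = {abs(estimate - true_value):.6f}")
```

The error at $x = 0.9$ is several orders of magnitude larger than at $x = 0.1$, which matches the comparison made in the example.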
Observe that this approximation has deteriorated much more quickly than the one we
used when considering approximate values of $\cos x$ in Example 3.27. We won't pursue
the nature of this sensitivity here, but we do reiterate that we should expect our
approximations to be poor if we move too far away from $x = 0$ for, as we have seen, the
Maclaurin series is there to represent how the function is behaving around $x = 0$.

3.4.2 Taylor series

We now briefly consider what happens when we are looking for the Taylor series for
$f(x)$ around $x = a$ when $a \ne 0$. In this case, we follow the general method outlined
above, but now we have to use (3.2), i.e.
$$f(x) = f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2!}f''(a) + \cdots + \frac{(x - a)^n}{n!}f^{(n)}(a) + \cdots,$$
that we saw earlier.

Example 3.31 Find the Taylor series for $e^x$ around $x = 1$.

Here we have $f(x) = e^x$ so that $f(1) = e$. We also note, as in Example 3.24, that
$f^{(n)}(x) = e^x$ for $n \ge 1$. Then, to use these derivatives in (3.2), we need to evaluate
them at $x = 1$, i.e. we find that $f^{(n)}(1) = e$ for $n \ge 1$. Consequently, putting this into
(3.2), we get
$$e^x = e + (x - 1)e + \frac{(x - 1)^2}{2!}e + \frac{(x - 1)^3}{3!}e + \cdots + \frac{(x - 1)^n}{n!}e + \cdots,$$
as the Taylor series for $e^x$ around $x = 1$.

Activity 3.20 We can write $e^x$ as $e^1 e^{x-1}$ so that values of $x$ around $x = 1$
correspond to values of $x - 1$ around $x = 0$. Use this fact and the Maclaurin series for
$e^x$ which we found in Example 3.24 to derive the result we found in Example 3.31.
Activity 3.21 Find the Taylor series for $e^x$ around $x = 2$.

Using Taylor series as approximations


We can use the Taylor series of a function around x = a to get approximations to the
value of the function for values of x close to x = a in the same way as we used the
Maclaurin series of a function to get approximations to the value of the function for
values of x close to x = 0 in Section 3.4.1. As the ideas are so similar, we will just take a
brief look at how they work.
Example 3.32 Find an approximation to $e^{1.1}$ using (a) the second-order Maclaurin
series for $e^x$ and (b) the second-order Taylor series for $e^x$ around $x = 1$. How do
these approximations compare?
For (a), we know from Example 3.24 that the second-order Maclaurin series for $e^x$ is
given by
$$1 + x + \frac{x^2}{2!},$$
and, using this, we find that
$$e^{1.1} \approx 1 + 1.1 + \frac{1.1^2}{2!} = 2.705.$$
Incidentally, the exact value of $e^{1.1}$ is 3.0042 (to 4dp) and so this approximation
doesn't even agree with this to 1dp.

For (b), we know from Example 3.31 that the second-order Taylor series for $e^x$
around $x = 1$ is given by
$$e + (x - 1)e + \frac{(x - 1)^2}{2!}e,$$
and, using this, we find that
$$e^{1.1} \approx e + (1.1 - 1)e + \frac{(1.1 - 1)^2}{2!}e = 1.105e,$$
which, if we know the value of $e$, gives us 3.0037 (to 4dp). This agrees with the
exact value of $e^{1.1}$ to 3dp.
As we should expect, the answer to (b) gives us a better approximation to $e^{1.1}$ than
the one we found in (a) since $x = 1.1$ is closer to $x = 1$ than it is to $x = 0$. But, on
the other hand, the answer to (a) didn't require us to have any accurate knowledge
of the value of $e$ itself!
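The comparison in Example 3.32 can be reproduced with a single helper that builds the Taylor polynomial of $e^x$ about any point $a$, using the fact that every derivative of $e^x$ at $a$ equals $e^a$. This is a sketch for illustration, and the name `taylor_exp` is an assumption.

```python
import math


def taylor_exp(x, a, order):
    """Taylor polynomial of e^x about x = a, keeping terms up to (x - a)**order."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k)
               for k in range(order + 1))


x = 1.1
maclaurin_estimate = taylor_exp(x, 0.0, 2)  # part (a): expansion about x = 0
taylor_estimate = taylor_exp(x, 1.0, 2)     # part (b): expansion about x = 1
exact = math.exp(x)

print(f"(a) Maclaurin estimate:         {maclaurin_estimate:.4f}")
print(f"(b) Taylor estimate about x=1:  {taylor_estimate:.4f}")
print(f"exact value:                    {exact:.4f}")
```

Note that part (b) relies on `math.exp(a)` to supply the value of $e$, which mirrors the remark that this estimate needs accurate knowledge of $e$ whereas part (a) does not.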
Following on from this example, as we can see in Figure 3.8, we observe that the
Maclaurin series for $e^x$ is most accurate when $x$ is close to $x = 0$ whereas the Taylor
series for $e^x$ about $x = 1$ is most accurate when $x$ is close to $x = 1$. This is, of course,
exactly what we should expect!

Figure 3.8: The solid curve is the graph of the function $e^x$, the dashed curve is the graph
of its second-order Maclaurin series and the dotted curve is the graph of its second-order
Taylor series about $x = 1$. Observe how, as we might expect, the Maclaurin series is more
accurate around $x = 0$ and this Taylor series is more accurate around $x = 1$.

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
find simple derivatives using the definition of the derivative;
find derivatives using standard derivatives and the rules of differentiation;
use the derivative to find tangent lines and use these to approximate functions;
solve problems from economics-based subjects that involve derivatives;
find Maclaurin and Taylor series and use these to approximate functions.


Solutions to activities
Solution to activity 3.1
We need to find the derivative of the function $f(x) = x^2$ at the point $x = -1$, i.e.
$f'(-1)$. So, using the definition of the derivative with $a = -1$, we start by looking at
$$\frac{f(-1 + h) - f(-1)}{h} = \frac{(-1 + h)^2 - (-1)^2}{h},$$
which, looking at the numerator, is easily simplified to give
$$\frac{f(-1 + h) - f(-1)}{h} = \frac{(1 - 2h + h^2) - 1}{h} = \frac{-2h + h^2}{h} = -2 + h.$$
This in turn means that
$$f'(-1) = \lim_{h \to 0} \frac{f(-1 + h) - f(-1)}{h} = \lim_{h \to 0} (-2 + h) = -2,$$
because, in the limit as $h \to 0$, we see that $-2 + h$ goes to $-2$.


Solution to activity 3.2
In Example 3.2, we showed that if $f(x) = x^2$, then $f'(x) = 2x$. This means that
$f'(-1) = -2$ in agreement with what we saw in Activity 3.1.
To find the point at which the derivative of $f(x) = x^2$, i.e. $f'(x) = 2x$, is equal to (a) 16,
we see that
$$f'(x) = 2x = 16 \quad\text{when}\quad x = 8,$$
and (b) 4, we see that
$$f'(x) = 2x = 4 \quad\text{when}\quad x = 2.$$
Solution to activity 3.3
Given the linear combination rule, i.e.
$$\frac{d}{dx}\Big[kf(x) + lg(x)\Big] = k\frac{df}{dx} + l\frac{dg}{dx},$$
we can derive the constant multiple rule by setting $l = 0$ so that
$$\frac{d}{dx}\Big[kf(x)\Big] = \frac{d}{dx}\Big[kf(x) + 0g(x)\Big] = k\frac{df}{dx} + 0\frac{dg}{dx} = k\frac{df}{dx},$$
the sum rule by setting $k = 1$ and $l = 1$ so that
$$\frac{d}{dx}\Big[f(x) + g(x)\Big] = \frac{d}{dx}\Big[1f(x) + 1g(x)\Big] = 1\frac{df}{dx} + 1\frac{dg}{dx} = \frac{df}{dx} + \frac{dg}{dx},$$
and the difference rule by setting $k = 1$ and $l = -1$ so that
$$\frac{d}{dx}\Big[f(x) - g(x)\Big] = \frac{d}{dx}\Big[1f(x) + (-1)g(x)\Big] = 1\frac{df}{dx} + (-1)\frac{dg}{dx} = \frac{df}{dx} - \frac{dg}{dx}.$$

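Solution to activity 3.4
For (a), we use the constant multiple rule to see that
$$\frac{d}{dx}\Big[3\cos x\Big] = 3\frac{d}{dx}\Big[\cos x\Big] = 3(-\sin x) = -3\sin x.$$
For (b), we use the sum rule to see that
$$\frac{d}{dx}\Big[e^x + \cos x\Big] = \frac{d}{dx}\Big[e^x\Big] + \frac{d}{dx}\Big[\cos x\Big] = e^x + (-\sin x) = e^x - \sin x.$$
For (c), we use the linear combination rule to see that
$$\frac{d}{dx}\Big[3\sin x - 3\ln x\Big] = 3\frac{d}{dx}\Big[\sin x\Big] - 3\frac{d}{dx}\Big[\ln x\Big] = 3\cos x - 3\left(\frac{1}{x}\right) = 3\cos x - \frac{3}{x}.$$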

Solution to activity 3.5


For (a), $h(x) = x\sin x$ is the product of the two functions
\[ f(x) = x \quad\text{and}\quad g(x) = \sin x, \]
and these give us
\[ f'(x) = 1 \quad\text{and}\quad g'(x) = \cos x. \]
As such, the product rule tells us that
\[ h'(x) = (1)(\sin x) + (x)(\cos x) = \sin x + x\cos x. \]
For (b), $h(x) = e^x\cos x$ is the product of the two functions
\[ f(x) = e^x \quad\text{and}\quad g(x) = \cos x, \]
and these give us
\[ f'(x) = e^x \quad\text{and}\quad g'(x) = -\sin x. \]
As such, the product rule tells us that
\[ h'(x) = (e^x)(\cos x) + (e^x)(-\sin x) = e^x(\cos x - \sin x). \]
For (c), $h(x) = \sin x\cos x$ is the product of the two functions
\[ f(x) = \sin x \quad\text{and}\quad g(x) = \cos x, \]
and these give us
\[ f'(x) = \cos x \quad\text{and}\quad g'(x) = -\sin x. \]
As such, the product rule tells us that
\[ h'(x) = (\cos x)(\cos x) + (\sin x)(-\sin x) = \cos^2 x - \sin^2 x. \]
Then, using the double angle formulae
\[ \sin x\cos x = \tfrac{1}{2}\sin(2x) \quad\text{and}\quad \cos^2 x - \sin^2 x = \cos(2x), \]
from (2.6), this means that we have
\[ \frac{d}{dx}\left[\tfrac{1}{2}\sin(2x)\right] = \cos(2x), \]
so that, using the constant multiple rule, we can deduce that
\[ \frac{d}{dx}\bigl[\sin(2x)\bigr] = 2\cos(2x). \]
This result will make sense once we have seen the chain rule and, in particular, Activity 3.8(a).
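As a quick numerical sanity check of the three product-rule results in this activity, the following minimal sketch compares each formula with a central-difference estimate of the derivative. The helper name `central_diff`, the step size and the sample point x = 0.7 are illustrative assumptions and are not part of the guide.

```python
import math

def central_diff(f, x, h=1e-6):
    """Central-difference estimate of f'(x); h is an assumed step size."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Each entry pairs a function h(x) with the derivative found by the product rule.
cases = {
    "x sin x": (lambda x: x * math.sin(x),
                lambda x: math.sin(x) + x * math.cos(x)),
    "e^x cos x": (lambda x: math.exp(x) * math.cos(x),
                  lambda x: math.exp(x) * (math.cos(x) - math.sin(x))),
    "sin x cos x": (lambda x: math.sin(x) * math.cos(x),
                    lambda x: math.cos(2 * x)),
}

x0 = 0.7  # arbitrary sample point
errors = {name: abs(central_diff(h, x0) - dh(x0))
          for name, (h, dh) in cases.items()}
for name, err in errors.items():
    print(f"{name}: |numerical - formula| = {err:.2e}")
```

A small discrepancy is expected because the central difference is itself only an approximation; agreement to several decimal places is what suggests the formulae are right.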
Solution to activity 3.6
For (a), $h(x) = \dfrac{\sin x}{x}$ is the quotient of the two functions
\[ f(x) = \sin x \quad\text{and}\quad g(x) = x, \]
and these give us
\[ f'(x) = \cos x \quad\text{and}\quad g'(x) = 1. \]
As such, the quotient rule tells us that
\[ h'(x) = \frac{(\cos x)(x) - (\sin x)(1)}{x^2} = \frac{x\cos x - \sin x}{x^2}. \]
In this case, the original function and the derivative are only defined if $x \neq 0$.
For (b), $h(x) = \dfrac{e^x}{\cos x}$ is the quotient of the two functions
\[ f(x) = e^x \quad\text{and}\quad g(x) = \cos x, \]
and these give us
\[ f'(x) = e^x \quad\text{and}\quad g'(x) = -\sin x. \]
As such, the quotient rule tells us that
\[ h'(x) = \frac{(e^x)(\cos x) - (e^x)(-\sin x)}{[\cos x]^2} = \frac{\cos x + \sin x}{\cos^2 x}\, e^x. \]
In this case, the original function and the derivative are only defined if $\cos x \neq 0$, i.e. if $x \neq (2n+1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.
For (c), $h(x) = \dfrac{\sin x}{\cos x}$ is the quotient of the two functions
\[ f(x) = \sin x \quad\text{and}\quad g(x) = \cos x, \]
and these give us
\[ f'(x) = \cos x \quad\text{and}\quad g'(x) = -\sin x. \]
As such, the quotient rule tells us that
\[ h'(x) = \frac{(\cos x)(\cos x) - (\sin x)(-\sin x)}{[\cos x]^2} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x}. \]
In this case, the original function and the derivative are only defined if $\cos x \neq 0$, i.e. if $x \neq (2n+1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.
Indeed, using the Pythagorean identity $\sin^2 x + \cos^2 x = 1$ and the definitions
\[ \tan x = \frac{\sin x}{\cos x} \quad\text{and}\quad \sec x = \frac{1}{\cos x}, \]
from (2.2), (2.1) and Section 2.1.2, we can deduce that
\[ \frac{d}{dx}\bigl[\tan x\bigr] = \frac{1}{\cos^2 x} = \sec^2 x, \]
as long as $x \neq (2n+1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.


Solution to activity 3.7
Given that $h(x) = (2x+1)^3$, we can multiply out the brackets to get
\[ h(x) = 8x^3 + 12x^2 + 6x + 1, \]
which means that
\[ h'(x) = 24x^2 + 24x + 6 = 6(4x^2 + 4x + 1) = 6(2x+1)^2, \]
in agreement with what we saw in Example 3.10.
Solution to activity 3.8
For (a), $h(x) = \sin(2x)$ is the composition of the functions
\[ f(g) = \sin g \quad\text{and}\quad g(x) = 2x. \]
As such we have
\[ f'(g) = \cos g \quad\text{and}\quad g'(x) = 2, \]
and so the chain rule tells us that
\[ h'(x) = (\cos g)(2) = 2\cos(2x), \]
which agrees with what we found in Activity 3.5(c).
For (b), $h(x) = \ln(\cos x)$ is the composition of the functions
\[ f(g) = \ln g \quad\text{and}\quad g(x) = \cos x. \]
As such we have
\[ f'(g) = \frac{1}{g} \quad\text{and}\quad g'(x) = -\sin x, \]
and so the chain rule tells us that
\[ h'(x) = \left(\frac{1}{g}\right)(-\sin x) = -\frac{\sin x}{\cos x} = -\tan x. \]
For (c), $h(x) = \ln(e^x)$ is the composition of the functions
\[ f(g) = \ln g \quad\text{and}\quad g(x) = e^x. \]
As such we have
\[ f'(g) = \frac{1}{g} \quad\text{and}\quad g'(x) = e^x, \]
and so the chain rule tells us that
\[ h'(x) = \left(\frac{1}{g}\right)(e^x) = \frac{e^x}{e^x} = 1. \]
Of course, this is obvious as $\ln(e^x) = x$ and so its derivative with respect to $x$ is therefore 1.
Solution to activity 3.9
Given that $a^x = e^{x\ln a}$, we use the chain rule with $h(g) = e^g$ and $g(x) = x\ln a$ to get
\[ \frac{da^x}{dx} = \frac{d}{dx}\bigl[e^{x\ln a}\bigr] = \bigl(e^{x\ln a}\bigr)(\ln a) = a^x\ln a, \]
as required.
Solution to activity 3.10
Writing the quotient $f(x)/g(x)$ as the product $f(x)[g(x)]^{-1}$, the product rule gives us
\[ \frac{d}{dx}\Bigl[f(x)[g(x)]^{-1}\Bigr] = \frac{df}{dx}[g(x)]^{-1} + f(x)\left(-[g(x)]^{-2}\frac{dg}{dx}\right), \]
where we have used the chain rule to differentiate $[g(x)]^{-1}$ with respect to $x$. Rewriting this, we then have
\[ \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{\dfrac{df}{dx}\,g(x) - f(x)\,\dfrac{dg}{dx}}{[g(x)]^2}, \]
which is the quotient rule, as required.
Solution to activity 3.11
We have $y = f(x)$ so that $x = f^{-1}(y)$. Thus, differentiating both sides of the latter with respect to $x$, we get
\[ \frac{dx}{dx} = \frac{df^{-1}}{dy}\frac{dy}{dx}, \]
where we have used the chain rule on the right-hand side as $y$ itself is a function of $x$ since $y = f(x)$. This gives us
\[ 1 = \frac{df^{-1}}{dy}\frac{dy}{dx} \implies \frac{df^{-1}}{dy} = 1 \Big/ \frac{df}{dx}, \]
as required.[17] In particular, observe that this formula makes no sense at points where $f'(x) = 0$.

[17] See Section 2.9 of Binmore and Davies (2002) for a geometric view of this result.

Solution to activity 3.12


Here we have $y = f(x) = e^x$ and $x = f^{-1}(y) = \ln y$ so, using the result from Activity 3.11, we see that
\[ \frac{d}{dy}\bigl[\ln y\bigr] = 1 \Big/ \frac{d}{dx}\bigl[e^x\bigr] = \frac{1}{e^x} = \frac{1}{y}, \]
as $(e^x)' = e^x = y$.
Solution to activity 3.13
There is, generally, no need to apply the rules of differentiation in as much detail as we have been using. So, let's do the three examples in this activity quickly.
For (a), we have $h(x) = e^{x^2}\ln(\sin x)$ which is the product of two compositions and so using the product and chain rules we get
\[ h'(x) = \bigl(2x\,e^{x^2}\bigr)\ln(\sin x) + e^{x^2}\left(\frac{\cos x}{\sin x}\right) = \frac{e^{x^2}}{\sin x}\bigl[2x\sin x\ln(\sin x) + \cos x\bigr]. \]
For (b), we have
\[ h(x) = \frac{\sin(\cos x)}{e^{\sin x}}, \]
which is the quotient of two compositions and so using the quotient and chain rules we get
\[ h'(x) = \frac{\cos(\cos x)(-\sin x)\,e^{\sin x} - \sin(\cos x)\,e^{\sin x}\cos x}{[e^{\sin x}]^2} = -\frac{\sin x\cos(\cos x) + \cos x\sin(\cos x)}{e^{\sin x}}. \]
For (c), we have $h(x) = \sin^2(3x) + \cos^2(3x)$ which is the sum of two compositions and so we can easily use the chain rule to see that
\[ h'(x) = 2\sin(3x)\cos(3x)(3) + 2\cos(3x)[-\sin(3x)](3) = 0. \]
Of course, this is obvious as $\sin^2(3x) + \cos^2(3x) = 1$ using (2.2) and so its derivative with respect to $x$ is zero.
Solution to activity 3.14
We have seen that the first four derivatives are given by
\[ f'(x), \quad f''(x) = -f(x), \quad f'''(x) = -f'(x), \quad\text{and}\quad f^{(4)}(x) = -f''(x) = f(x), \]
which returns us to our original function. Indeed, we can then see that the next four derivatives will be given by
\[ f^{(5)}(x) = f'(x), \quad f^{(6)}(x) = -f(x), \quad f^{(7)}(x) = -f'(x) \quad\text{and}\quad f^{(8)}(x) = -f''(x) = f(x), \]
which, again, returns us to our original function. This means that, spotting the pattern, we can see that
\[ f^{(n)}(x) = \begin{cases} f(x) & n = 4, 8, \ldots \\ f'(x) & n = 1, 5, 9, \ldots \\ -f(x) & n = 2, 6, 10, \ldots \\ -f'(x) & n = 3, 7, 11, \ldots \end{cases} \]
for $n \geq 1$.

Solution to activity 3.15


To find the first four derivatives of $x e^x$, we use the product rule to see that
\[ f'(x) = (1)(e^x) + (x)(e^x) = (1 + x)\,e^x, \]
\[ f''(x) = (1)(e^x) + (1 + x)(e^x) = (2 + x)\,e^x, \]
\[ f'''(x) = (1)(e^x) + (2 + x)(e^x) = (3 + x)\,e^x, \text{ and} \]
\[ f^{(4)}(x) = (1)(e^x) + (3 + x)(e^x) = (4 + x)\,e^x. \]
Indeed, spotting the pattern, we can deduce that $f^{(n)}(x) = (n + x)\,e^x$ for $n \geq 1$.
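The pattern can also be checked mechanically. The sketch below is an illustration rather than part of the guide: it represents a function of the form (a + bx)e^x by the pair (a, b), a representation chosen here for convenience, and applies the product rule repeatedly, starting from x e^x.

```python
def differentiate(coeffs):
    """Differentiate (a + b*x) * e^x, returning the new pair (a, b).

    By the product rule, d/dx[(a + b x) e^x] = b e^x + (a + b x) e^x,
    which is ((a + b) + b x) e^x.
    """
    a, b = coeffs
    return (a + b, b)

f = (0, 1)  # x e^x corresponds to a = 0, b = 1
derivs = []
for n in range(1, 9):
    f = differentiate(f)
    derivs.append(f)
    print(f"f^({n})(x) = ({f[0]} + {f[1]}x) e^x")
```

Each step adds one to the constant term and leaves the coefficient of x alone, which is exactly the (n + x)e^x pattern.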
Solution to activity 3.16
Here we have $f(x) = e^x$ so that $f'(x) = e^x$. Then using (3.1), we see that when $x = 1$ we have
\[ f'(1) = \frac{y - f(1)}{x - 1} \implies y - e^1 = e^1(x - 1) \implies y = e\,x, \]
as the equation of the tangent line to the function $f(x) = e^x$ at $x = 1$.
Solution to activity 3.17
Here we have $C(q) = 1000 + 5q + q^2$ and so, when operating at $q = 100$, the increase in cost given an increase in quantity of one is given by
\[ \Delta C = C(101) - C(100) = (1000 + 5(101) + 101^2) - (1000 + 5(100) + 100^2) = 206. \]
This is pretty close to the approximate answer of 205 that we found in Example 3.19, especially if we consider this as a relative error of $1/206 = 0.49\%$ (to 2dp) instead of an absolute error of one.
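The comparison can be reproduced in a few lines. This is a minimal sketch which assumes that the approximate answer of 205 is the marginal cost C'(100) = 5 + 2(100); the function names are illustrative.

```python
def cost(q):
    """Cost function from the activity: C(q) = 1000 + 5q + q^2."""
    return 1000 + 5 * q + q ** 2

def marginal_cost(q):
    """Derivative C'(q) = 5 + 2q (assumed source of the approximation)."""
    return 5 + 2 * q

q = 100
exact_change = cost(q + 1) - cost(q)        # actual increase in cost
approx_change = marginal_cost(q)            # first-order approximation
relative_error = abs(exact_change - approx_change) / exact_change
print(exact_change, approx_change, round(100 * relative_error, 2))
```

The printed values are the exact change, the approximation and the relative error as a percentage.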
Solution to activity 3.18
The Maclaurin series for $(1 + x)^2$ is given by
\[ (1 + x)^2 = 1 + 2x + \frac{(2)(1)}{2!}x^2 = 1 + 2x + x^2, \]
as all terms involving $x^n$ with $n \geq 3$ will have a coefficient of zero. Similarly, the Maclaurin series for $(1 + x)^3$ is given by
\[ (1 + x)^3 = 1 + 3x + \frac{(3)(2)}{2!}x^2 + \frac{(3)(2)(1)}{3!}x^3 = 1 + 3x + 3x^2 + x^3, \]
as all terms involving $x^n$ with $n \geq 4$ will have a coefficient of zero. Of course, this is exactly what we would get if we just multiplied out the brackets in the usual way!
Solution to activity 3.19
To use (3.3), we see that $f(x) = \cos(\ln(1 + x))$ gives
\[ f(0) = \cos(\ln 1) = \cos 0 = 1, \]
and then, finding the first four derivatives of $f(x)$, we get
\[ f'(x) = -\frac{\sin(\ln(1 + x))}{1 + x}, \qquad f''(x) = \frac{\sin(\ln(1 + x)) - \cos(\ln(1 + x))}{(1 + x)^2}, \]
\[ f'''(x) = \frac{3\cos(\ln(1 + x)) - \sin(\ln(1 + x))}{(1 + x)^3}, \qquad f^{(4)}(x) = -10\,\frac{\cos(\ln(1 + x))}{(1 + x)^4}, \]
after some fairly tortuous differentiation. These then give us
\[ f'(0) = 0, \quad f''(0) = -1, \quad f'''(0) = 3 \quad\text{and}\quad f^{(4)}(0) = -10, \]
if we evaluate them at $x = 0$. Consequently, putting these into (3.3), we get
\[ \cos(\ln(1 + x)) = 1 + x(0) + \frac{x^2}{2!}(-1) + \frac{x^3}{3!}(3) + \frac{x^4}{4!}(-10) + \cdots = 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12}x^4 + \cdots, \]
and this gives us the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ in agreement with what we saw before in Example 3.29. Notice, however, that this method involved some fairly complicated differentiation whereas the method in Example 3.29 only involved some simple algebra!
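The step from derivative values to series coefficients is just division by n!, and the resulting polynomial can be compared with the function itself near x = 0. The following sketch is illustrative only: it takes the values 1, 0, -1, 3, -10 for the function and its first four derivatives at x = 0, and the sample points 0.1 and 0.01 are arbitrary choices.

```python
import math
from fractions import Fraction

# f(0), f'(0), f''(0), f'''(0), f''''(0) for f(x) = cos(ln(1 + x)).
derivs_at_0 = [1, 0, -1, 3, -10]

# The Maclaurin coefficient of x^n is f^(n)(0) / n!.
coeffs = [Fraction(d, math.factorial(n)) for n, d in enumerate(derivs_at_0)]
print(coeffs)

def maclaurin(x):
    """Fourth-order Maclaurin polynomial built from the coefficients above."""
    return sum(float(c) * x ** n for n, c in enumerate(coeffs))

for x in (0.1, 0.01):
    exact = math.cos(math.log(1 + x))
    print(x, exact, maclaurin(x), abs(exact - maclaurin(x)))
```

The error shrinks rapidly as x approaches zero, as expected when the first neglected term involves the fifth power of x.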
Solution to activity 3.20
For values of $y$ around $y = 0$ we have the Maclaurin series
\[ e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^n}{n!} + \cdots, \]
so that, if we let $y = x - 1$ for values of $x$ around $x = 1$, we still have values of $y$ around $y = 0$, i.e. we can write
\[ e^{x-1} = 1 + (x - 1) + \frac{(x - 1)^2}{2!} + \frac{(x - 1)^3}{3!} + \cdots + \frac{(x - 1)^n}{n!} + \cdots, \]
which gives us the Taylor series for $e^{x-1}$ for values of $x$ around $x = 1$. So, as $e^x = e^1 e^{x-1}$, this means that
\[ e^x = e + (x - 1)\,e + \frac{(x - 1)^2}{2!}\,e + \frac{(x - 1)^3}{3!}\,e + \cdots + \frac{(x - 1)^n}{n!}\,e + \cdots, \]
is the Taylor series for $e^x$ for values of $x$ around $x = 1$ in agreement with what we found in Example 3.31.
Solution to activity 3.21
To find the Taylor series for $e^x$ around $x = 2$, we can either use (3.2) or the method we saw in Activity 3.20.
Method I: Using (3.2), we have $f(x) = e^x$ so that $f(2) = e^2$. We also note, as in Example 3.24, that $f^{(n)}(x) = e^x$ for $n \geq 1$. Then, to use these derivatives in (3.2), we need to evaluate them at $x = 2$, i.e. we find that $f^{(n)}(2) = e^2$ for $n \geq 1$. Consequently, putting these into (3.2), we get
\[ e^x = e^2 + (x - 2)\,e^2 + \frac{(x - 2)^2}{2!}\,e^2 + \frac{(x - 2)^3}{3!}\,e^2 + \cdots + \frac{(x - 2)^n}{n!}\,e^2 + \cdots, \]
as the Taylor series for $e^x$ around $x = 2$.
Method II: Using the method of Activity 3.20, we know that for values of $y$ around $y = 0$ we have the Maclaurin series
\[ e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^n}{n!} + \cdots, \]
so that, if we let $y = x - 2$ for values of $x$ around $x = 2$, we still have values of $y$ around $y = 0$, i.e. we can write
\[ e^{x-2} = 1 + (x - 2) + \frac{(x - 2)^2}{2!} + \frac{(x - 2)^3}{3!} + \cdots + \frac{(x - 2)^n}{n!} + \cdots, \]
which gives us the Taylor series for $e^{x-2}$ for values of $x$ around $x = 2$. So, as $e^x = e^2 e^{x-2}$, this means that
\[ e^x = e^2 + (x - 2)\,e^2 + \frac{(x - 2)^2}{2!}\,e^2 + \frac{(x - 2)^3}{3!}\,e^2 + \cdots + \frac{(x - 2)^n}{n!}\,e^2 + \cdots, \]
is the Taylor series for $e^x$ for values of $x$ around $x = 2$ in agreement with what we have just found using the other method.
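To see how well this series approximates the function near x = 2, one can evaluate its partial sums. The sketch below is a hedged illustration: the function name `taylor_exp`, the evaluation point x = 2.5 and the numbers of terms are all arbitrary choices made here.

```python
import math

def taylor_exp(x, a=2.0, terms=10):
    """Partial sum e^a * sum_{n < terms} (x - a)^n / n! of the series for e^x about a."""
    return math.exp(a) * sum((x - a) ** n / math.factorial(n) for n in range(terms))

x = 2.5
for terms in (2, 4, 8, 12):
    approx = taylor_exp(x, terms=terms)
    print(terms, approx, abs(approx - math.exp(x)))
```

The error falls quickly as more terms are included, which reflects the n! in the denominators.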

Exercises
Exercise 3.1
Find the derivatives of the following functions.
(a) $e^{\sin x}\cos x$,
(b) $\dfrac{\tan x}{e^{x^2}}$,
(c) $\sin(x\,e^x)$.

Exercise 3.2
Use the compound-angle formulae to show that
\[ \cos x = \sin\left(x + \frac{\pi}{2}\right) \quad\text{and}\quad -\sin x = \cos\left(x + \frac{\pi}{2}\right). \]
Hence use the chain rule to derive the derivative of $\cos x$ from the derivative of $\sin x$.
Exercise 3.3
Verify that the point $(e, e)$ is on the curve with equation
\[ y = x\ln x, \]
and find the equation of the tangent line to the curve at this point.
Consider, for some constants $a$ and $b$, the curve with equation
\[ y = ax^2 + b. \]
For what values of $a$ and $b$ does this curve pass through the point $(e, e)$ with the same tangent line as the one you found above?
Exercise 3.4
Suppose the demand function for a good is
\[ q^D(p) = \frac{1}{\sqrt{1 + p^4}}. \]
Find the elasticity of demand in terms of $p$ and verify that it is positive if $p > 0$.
Exercise 3.5
Find the fourth-order Maclaurin series for $\ln\left(\dfrac{1 + \sin x}{1 + x}\right)$.

Solutions to exercises
Solution to exercise 3.1
We apply the rules of differentiation quickly as we did in Activity 3.13.
(a) The function $h(x) = e^{\sin x}\cos x$ is a product that has the composition $e^{\sin x}$ as one of its terms. As such, applying the product rule we get
\[ h'(x) = \bigl(e^{\sin x}\cos x\bigr)\cos x + e^{\sin x}(-\sin x) = \bigl(\cos^2 x - \sin x\bigr)e^{\sin x}, \]
where we have used the chain rule to differentiate the composition.
(b) The function $h(x) = (\tan x)/e^{x^2}$ is a quotient whose denominator is the composition $e^{x^2}$. As such, applying the quotient rule we get
\[ h'(x) = \frac{(\sec^2 x)\,e^{x^2} - (\tan x)\,e^{x^2}(2x)}{[e^{x^2}]^2} = \frac{\sec^2 x - 2x\tan x}{e^{x^2}}, \]
where we have used the fact, from Activity 3.6(c), that the derivative of $\tan x$ is $\sec^2 x$ and the chain rule to differentiate the composition.
Also note that this derivative can be found by writing the function as $h(x) = (\tan x)\,e^{-x^2}$ and, if we do this, we would use the product rule instead of the quotient rule.
(c) The function $h(x) = \sin(x\,e^x)$ is the composition $\sin x$ after $x\,e^x$ where the latter function is a product. As such, applying the chain rule we get
\[ h'(x) = \cos(x\,e^x)\bigl[(1)\,e^x + x(e^x)\bigr] = (1 + x)\,e^x\cos(x\,e^x), \]
where we have used the product rule to differentiate the product.


Solution to exercise 3.2
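Using the compound-angle formulae from (2.5), we have
\[ \sin\left(x + \frac{\pi}{2}\right) = \sin x\cos\frac{\pi}{2} + \cos x\sin\frac{\pi}{2} = \cos x, \]
and
\[ \cos\left(x + \frac{\pi}{2}\right) = \cos x\cos\frac{\pi}{2} - \sin x\sin\frac{\pi}{2} = -\sin x, \]
as required. Indeed, notice that we have used the facts, from Activity 2.3, that $\sin(\pi/2) = 1$ and $\cos(\pi/2) = 0$.
Now, using the chain rule and the derivative of $\sin x$, we see that
\[ \frac{d}{dx}\left[\sin\left(x + \frac{\pi}{2}\right)\right] = \cos\left(x + \frac{\pi}{2}\right)(1) = \cos\left(x + \frac{\pi}{2}\right), \]
which, using the results we showed above, becomes
\[ \frac{d}{dx}\bigl[\cos x\bigr] = -\sin x, \]
as required.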
Solution to exercise 3.3
Substituting $x = e$ into $y = x\ln x$ we get
\[ y = e\ln e = e(1) = e, \]
and so the point $(e, e)$ lies on this curve. The gradient of the curve at any point is given by the derivative of $f(x) = x\ln x$ and so, using the product rule, we get
\[ f'(x) = (1)\ln(x) + x\left(\frac{1}{x}\right) = \ln(x) + 1. \]
Thus, when $x = e$, the gradient of the curve is given by
\[ f'(e) = \ln(e) + 1 = 1 + 1 = 2, \]
which means that, using (3.1), we get
\[ 2 = \frac{y - e}{x - e} \implies y - e = 2(x - e) \implies y = 2x - e, \]
as the equation of the tangent line to the curve $y = x\ln x$ at the point $(e, e)$.
The curve $y = ax^2 + b$ will have a tangent line at $(e, e)$ which is the same as the one we have just found if, firstly, the curve goes through the point $(e, e)$, i.e. $a$ and $b$ must satisfy
\[ e = a\,e^2 + b, \]
and, secondly, it has the same gradient at $e$, i.e. if the derivative of $g(x) = ax^2 + b$ at $x = e$ is two. That is, as
\[ g'(x) = 2ax \implies g'(e) = 2a\,e, \]
we need $2a\,e$ to be two. Thus, from the second condition, we have
\[ 2a\,e = 2 \implies a = \frac{1}{e}, \]
and, from the first condition, we have
\[ e = \left(\frac{1}{e}\right)e^2 + b \implies b = e - e = 0. \]
Consequently, we see that when $a = 1/e$ and $b = 0$ the curve $y = ax^2 + b$ passes through the point $(e, e)$ with the same tangent line as the one we found above.
Solution to exercise 3.4
We have the demand function
\[ q^D(p) = \frac{1}{\sqrt{1 + p^4}} = (1 + p^4)^{-\frac{1}{2}}, \]
and so, setting $q = q^D(p)$, we can use the chain rule to get the derivative
\[ q'(p) = -\frac{1}{2}(1 + p^4)^{-\frac{3}{2}}(4p^3) = -\frac{2p^3}{(1 + p^4)^{\frac{3}{2}}}. \]
Then, using the definition of the elasticity of demand from Section 3.3.3, we have
\[ \varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{(1 + p^4)^{-\frac{1}{2}}}\left(-\frac{2p^3}{(1 + p^4)^{\frac{3}{2}}}\right) = \frac{2p^4}{1 + p^4}, \]
in terms of $p$. Indeed, when $p > 0$, we have $p^4 > 0$ and $1 + p^4 > 0$, which means that $\varepsilon(p) > 0$ too.
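The closed form for the elasticity can be checked against its definition numerically. This is a minimal sketch, assuming the elasticity is defined as minus (p/q) times q'(p) and estimating q'(p) by a central difference; the function names, step size and sample prices are illustrative.

```python
def demand(p):
    """Demand function q^D(p) = (1 + p^4)^(-1/2)."""
    return (1 + p ** 4) ** -0.5

def elasticity_formula(p):
    """Closed form derived in the solution: 2p^4 / (1 + p^4)."""
    return 2 * p ** 4 / (1 + p ** 4)

def elasticity_numeric(p, h=1e-6):
    """-(p/q) q'(p), with q'(p) estimated by a central difference (h assumed)."""
    dq = (demand(p + h) - demand(p - h)) / (2 * h)
    return -p / demand(p) * dq

for p in (0.5, 1.0, 2.0):
    print(p, elasticity_formula(p), elasticity_numeric(p))
```

Both columns should agree closely, and every value is positive for p > 0.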


Solution to exercise 3.5

We start by noticing that it really is much easier to make use of the standard Maclaurin
series rather than trying to use (3.3) directly on the given function. Especially as, in
order to apply (3.3), we would need to find the first four derivatives of the function to
answer this question and this would get very messy very quickly! Indeed, if we decide to
use the standard Maclaurin series, two methods present themselves.
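Method I: We start by simplifying the function by using the laws of logarithms from Section 2.1.4. This gives us
\[ \ln\left(\frac{1 + \sin x}{1 + x}\right) = \ln(1 + \sin x) - \ln(1 + x), \]
and so, we can easily use the Maclaurin series for $\ln(1 + x)$ from Section 3.4.1, i.e.
\[ \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \]
to get the second term in this difference. Then, using the Maclaurin series for $\sin x$, also from Section 3.4.1, we have
\[ \sin x = x - \frac{x^3}{3!} + \cdots, \]
which means that the first term in this difference is
\[ \begin{aligned} \ln(1 + \sin x) &= \ln\left(1 + \left[x - \frac{x^3}{3!} + \cdots\right]\right) \\ &= \left(x - \frac{x^3}{3!} + \cdots\right) - \frac{1}{2}\left(x - \frac{x^3}{3!} + \cdots\right)^2 + \frac{1}{3}\bigl(x - \cdots\bigr)^3 - \frac{1}{4}\bigl(x - \cdots\bigr)^4 + \cdots, \end{aligned} \]
where we have used the Maclaurin series for $\ln(1 + x)$ again in the second line. Now, as we want to keep terms up to $x^4$, we can see that the brackets in the second term give us
\[ \left(x - \frac{x^3}{3!} + \cdots\right)\left(x - \frac{x^3}{3!} + \cdots\right) = x^2 - 2\left(x\right)\left(\frac{x^3}{3!}\right) + \cdots = x^2 - \frac{x^4}{3} + \cdots, \]
where, here, we're trying to make it clear that each term that arises from this product is obtained by multiplying out the relevant brackets. Further, we see that the brackets in the last two terms will give us $x^3$ and $x^4$ respectively. Overall, then, we have
\[ \begin{aligned} \ln(1 + \sin x) &= \left(x - \frac{x^3}{3!} + \cdots\right) - \frac{1}{2}\left(x^2 - \frac{x^4}{3} + \cdots\right) + \frac{1}{3}\bigl(x^3 + \cdots\bigr) - \frac{1}{4}\bigl(x^4 + \cdots\bigr) + \cdots \\ &= x - \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^4}{12} + \cdots, \end{aligned} \]
for the first part of our difference. Putting these together in our expression for the function, we then have
\[ \begin{aligned} \ln\left(\frac{1 + \sin x}{1 + x}\right) &= \ln(1 + \sin x) - \ln(1 + x) \\ &= \left(x - \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^4}{12} + \cdots\right) - \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots\right) \\ &= -\frac{x^3}{6} + \frac{x^4}{6} + \cdots, \end{aligned} \]
and this gives us the required fourth-order Maclaurin series.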


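Method II: We could also use the Maclaurin series for $\sin x$ which we saw above to get
\[ 1 + \sin x = 1 + x - \frac{x^3}{3!} + \cdots, \]
and the fact that
\[ \frac{1}{1 + x} = (1 + x)^{-1} = 1 - x + x^2 - x^3 + x^4 + \cdots, \]
which, with $r = -1$, follows from a simple application of the Maclaurin series for $(1 + x)^r$ that we saw in Example 3.25. This means that we have
\[ \begin{aligned} \frac{1 + \sin x}{1 + x} &= \left(1 + x - \frac{x^3}{3!} + \cdots\right)\bigl(1 - x + x^2 - x^3 + x^4 + \cdots\bigr) \\ &= 1\bigl(1 - x + x^2 - x^3 + x^4 + \cdots\bigr) + x\bigl(1 - x + x^2 - x^3 + \cdots\bigr) - \frac{x^3}{3!}\bigl(1 - x + \cdots\bigr) + \cdots \\ &= 1 - \frac{x^3}{3!} + \frac{x^4}{3!} + \cdots, \end{aligned} \]
if we want to keep terms up to $x^4$. Then, using the Maclaurin series for $\ln(1 + x)$ which we saw above, we get
\[ \ln\left(\frac{1 + \sin x}{1 + x}\right) = \ln\left(1 + \left[-\frac{x^3}{3!} + \frac{x^4}{3!} + \cdots\right]\right) = -\frac{x^3}{3!} + \frac{x^4}{3!} + \cdots, \]
and this gives us the same fourth-order Maclaurin series as the one we found using the other method.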
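Both methods amount to arithmetic on series truncated at the fourth power of x, which is easy to mechanise with exact fractions. The sketch below is an illustration under stated assumptions: a series is stored as a list of five coefficients (constant term first), and the helper names `mul` and `log1p_series` are invented here.

```python
from fractions import Fraction

ORDER = 4  # keep terms up to x^4

def mul(a, b):
    """Multiply two truncated series given as coefficient lists [c0, ..., c4]."""
    out = [Fraction(0)] * (ORDER + 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j <= ORDER:
                out[i + j] += ai * bj
    return out

def log1p_series(u):
    """ln(1 + u) = u - u^2/2 + u^3/3 - u^4/4 + ..., for u with no constant term."""
    assert u[0] == 0
    result = [Fraction(0)] * (ORDER + 1)
    power = [Fraction(1)] + [Fraction(0)] * ORDER  # starts as u^0
    for n in range(1, ORDER + 1):
        power = mul(power, u)
        sign = 1 if n % 2 else -1
        result = [r + sign * p / n for r, p in zip(result, power)]
    return result

sin_x = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6), Fraction(0)]
x = [Fraction(0), Fraction(1), Fraction(0), Fraction(0), Fraction(0)]

# Method I: ln(1 + sin x) - ln(1 + x).
method_1 = [a - b for a, b in zip(log1p_series(sin_x), log1p_series(x))]

# Method II: ln of the product (1 + sin x)(1 - x + x^2 - x^3 + x^4).
one_plus_sin = [Fraction(1)] + sin_x[1:]
geometric = [Fraction((-1) ** n) for n in range(ORDER + 1)]
ratio = mul(one_plus_sin, geometric)
method_2 = log1p_series([Fraction(0)] + ratio[1:])

print(method_1)
print(method_2)
```

Using `Fraction` avoids rounding, so the two coefficient lists can be compared exactly.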


Chapter 4
One-variable optimisation
Essential reading

(For full publication details, see Chapter 1.)


Binmore and Davies (2002) Sections 4.1–4.3.
Anthony and Biggs (1996) Chapters 8 and 9.
Further reading
Simon and Blume (1994) Chapter 3.
Adams and Essex (2010) Sections 4.4–4.6.
Aims and objectives
The objectives of this chapter are as follows.
To see what first and second-order derivatives tell us about functions.
To see how derivatives and other information about a function can be used to
sketch curves.
To use derivatives to solve problems where a function needs to be optimised.
Specific learning outcomes can be found near the end of this chapter.

4.1 Introduction: What is optimisation?

Having seen how to find derivatives in the previous chapter, we now consider what they
tell us about a function. In particular, we will see that the first-order derivatives of a
function tell us where the function is increasing, stationary or decreasing; and its
second-order derivatives tell us where the function is convex or concave. Indeed, once we
have access to this information about a function we will be able to do two things.
Firstly, we will be able to sketch the curve that represents the graph of a function; and
secondly, we will be able to see where a function is optimised, i.e. we will be able to find
the points where the function takes its largest and smallest values.

4.2 Using first-order derivatives

The first-order derivatives of a function allow us to see whether it is increasing or decreasing. If it is neither increasing nor decreasing, we say that the function is stationary. As we shall see in Section 4.5, stationary points are important when we are finding the points where a function is optimised.

4.2.1 Increasing and decreasing functions

Intuitively, if $f(x)$ is a function defined for all $x \in \mathbb{R}$, we would say that it is
increasing when the values of $f(x)$ get larger as $x$ gets larger, and
decreasing when the values of $f(x)$ get smaller as $x$ gets larger.
Or, more precisely, if $a$ and $b$ are any two points in an interval, $I$, such that $a < b$, then
$f$ is increasing on $I$ if $f(a) < f(b)$, and
$f$ is decreasing on $I$ if $f(a) > f(b)$.
Indeed, we can see that this makes sense by considering the two functions illustrated in Figure 4.1.
[Figure 4.1: two graphs of y = f(x) with the values f(a) and f(b) marked. Panel (a): f is increasing. Panel (b): f is decreasing.]

Figure 4.1: As $x$ increases, (a) $f$ is increasing as its values get larger and (b) $f$ is decreasing as its values get smaller. This can also be seen by taking two values of $x$, say $a$ and $b$, such that $a < b$. In (a), the function is increasing because we have $f(a) < f(b)$ and in (b) the function is decreasing because we have $f(a) > f(b)$.
However, of more interest here is the fact that we can use derivatives to determine whether a function is increasing or decreasing over some interval, $I$. To see how this works, consider that the first-order Taylor approximation to $f(x)$ around $x = a$ is given by
\[ f(x) \simeq f(a) + (x - a)f'(a), \]
and to make this a good approximation, we want $x - a$ to be small. So, if we now consider another value of $x$, say $x = b$, where $b > a$ and $b - a$ is small, we see that this approximation gives us
\[ f(b) \simeq f(a) + (b - a)f'(a). \]
Now, $b - a > 0$, so we just need to know the sign of $f'(a)$ to determine whether $f(b)$ is greater or less than $f(a)$, i.e. whether $f$ is increasing or decreasing as we move from $a$ to $b$. Indeed, we see that
if $f'(a) > 0$, then $f$ is increasing at $a$ because $f(b) > f(a)$, and
if $f'(a) < 0$, then $f$ is decreasing at $a$ because $f(b) < f(a)$.
Indeed, by letting $a$ be any value of $x$, we can generalise this to obtain the following useful result. Let $I$ be an interval, then
if $f'(x) > 0$ for $x \in I$, then $f$ is increasing on $I$, and
if $f'(x) < 0$ for $x \in I$, then $f$ is decreasing on $I$.

Let's look at an example to see how this works.

Example 4.1 Determine the intervals on which the function $f(x) = x^3 - 2x^2 - 15x$ is (a) increasing and (b) decreasing.
Differentiating the function with respect to $x$, we find that
\[ f'(x) = 3x^2 - 4x - 15. \]
This factorises to give us
\[ f'(x) = (3x + 5)(x - 3), \]
and so, by looking at what is happening away from the points $x = -5/3$ and $x = 3$ where $f'(x) = 0$, we see that the sign of this derivative can be found by considering the signs of its two factors, i.e.

               x < -5/3     -5/3 < x < 3     3 < x
    3x + 5         -              +             +
    x - 3          -              -             +
    f'(x)          +              -             +

This means that the function is (a) increasing on the intervals $x < -5/3$ and $x > 3$ where $f'(x) > 0$ and (b) decreasing on the interval $-5/3 < x < 3$ where $f'(x) < 0$ as illustrated in Figure 4.2(a).
A useful consequence of this is that it tells us something about the tangent lines to the function $f(x)$ at points where it is increasing or decreasing. Recall, from Section 3.3.2, that the tangent line to $f(x)$ at the point $x = a$ has an equation given by
\[ y = f(a) + (x - a)f'(a), \]
and, in particular, the gradient of the tangent line is given by $f'(a)$. This means that, if $f(x)$ is increasing (or decreasing) at $x = a$, then $f'(a)$ will be positive (or negative) and this, in turn, means that the tangent line at this point will also be an increasing (or decreasing) function of $x$. This will be useful in a moment, but for now, we can see how this works by looking at Figure 4.3.

Figure 4.2: The graph of $f(x) = x^3 - 2x^2 - 15x$ indicating the points relevant to (a) Examples 4.1, 4.2 and 4.4; (b) Examples 4.5 and 4.6.

[Figure 4.3: two graphs of y = f(x) with the tangent line T at the point (a, f(a)). Panel (a): f and T are increasing. Panel (b): f and T are decreasing.]

Figure 4.3: (a) When $f(x)$ is increasing at $x = a$, its tangent line at the point $(a, f(a))$ will also be increasing as the gradient of the curve (and hence the gradient of the tangent line) at this point is positive. (b) When $f(x)$ is decreasing, its tangent line at the point $(a, f(a))$ will also be decreasing as the gradient of the curve (and hence the gradient of the tangent line) at this point is negative.
At this point, we know what a positive or negative derivative tells us about a function
but you may be wondering what happens when the derivative is neither positive nor
negative. That is, what happens when the derivative is zero? This is very important and
we now turn our attention to that.

4.2.2 Stationary points

When we find a point, say $x = a$, that makes $f'(x) = 0$, the tangent line at that point is horizontal and its Cartesian equation is given by
\[ y = f(a). \]
This means that we will have a function which may look like the one illustrated in Figure 4.4. We call such points, i.e. points where $f'(x) = 0$, stationary points.

Figure 4.4: The point $x = a$ is a stationary point of the function $f(x)$ as $f'(a) = 0$. Observe that this means that the tangent line to $f(x)$ at the point $(a, f(a))$ is a horizontal line.
There are, essentially, four different kinds of stationary point that we will encounter and these depend on how the function is changing as we move through the stationary point in the direction of increasing $x$. In particular, as $x$ is increasing through a stationary point at $x = a$, we have a
local maximum if $f$ changes from being increasing to being decreasing at the stationary point, and a
local minimum if $f$ changes from being decreasing to being increasing at the stationary point.
Of course, $f$ could also be increasing (or decreasing) on both sides of the stationary point and in these cases we have a point of inflection. These four possibilities are illustrated in Figure 4.5 and, in particular, we see that the stationary point we saw earlier in Figure 4.4 is a local minimum.
This provides us with a way of classifying any stationary points we find by looking at the sign of the first-order derivative of the function as we move through a stationary point. This is called the first-order derivative test and it runs as follows. As we move through the stationary point in the direction of increasing $x$, if we find that:
$f'(x)$ changes from positive to negative, i.e. the function goes from being increasing to being decreasing as we pass through the stationary point, then the stationary point is a local maximum.
$f'(x)$ changes from negative to positive, i.e. the function goes from being decreasing to being increasing as we pass through the stationary point, then the stationary point is a local minimum.
And, if the sign of $f'(x)$ does not change, i.e. if the function is increasing (or decreasing) on both sides of the stationary point, then the stationary point is a point of inflection.

Figure 4.5: The four different kinds of stationary point: (a) f increases on both sides, a point of inflection; (b) f increases and then decreases, a local maximum; (c) f decreases and then increases, a local minimum; (d) f decreases on both sides, a point of inflection.

Example 4.2 Find the stationary points of the function given in Example 4.1 and classify them by using the first-order derivative test.
We saw in Example 4.1 that the derivative of the function can be written as
\[ f'(x) = (3x + 5)(x - 3), \]
and so the stationary points of this function, i.e. the points that make $f'(x) = 0$, occur when $x = -5/3$ and $x = 3$ as you can see in Figure 4.2(a).
We can also use what we saw in Example 4.1 to see that, according to the first-order derivative test, the stationary point that occurs when:
$x = -5/3$ is a local maximum as $f$ changes from being increasing to being decreasing (i.e. $f'$ changes from positive to negative) at the stationary point.
$x = 3$ is a local minimum as $f$ changes from being decreasing to being increasing (i.e. $f'$ changes from negative to positive) at the stationary point.
This, of course, can be clearly seen in Figure 4.2(a).
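The first-order derivative test translates directly into a sign check on either side of a stationary point. The following is a minimal sketch, not a robust implementation: the function name `classify` and the offset `delta` are assumptions made here, and the offset must be small enough that no other stationary point lies within it.

```python
def classify(f_prime, a, delta=1e-3):
    """First-order derivative test at a stationary point a (delta is an assumed offset)."""
    left, right = f_prime(a - delta), f_prime(a + delta)
    if left > 0 and right < 0:
        return "local maximum"       # increasing, then decreasing
    if left < 0 and right > 0:
        return "local minimum"       # decreasing, then increasing
    return "point of inflection"     # same sign on both sides

def f_prime(x):
    """f'(x) = (3x + 5)(x - 3) for the function in Example 4.1."""
    return (3 * x + 5) * (x - 3)

results = {a: classify(f_prime, a) for a in (-5 / 3, 3)}
print(results)

# For comparison, x^3 has f'(x) = 3x^2, which is positive on both sides of x = 0.
print(classify(lambda x: 3 * x ** 2, 0))
```

The last line shows the case where the sign of the derivative does not change.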

4.2.3 An application: Elasticities revisited

In Section 3.3.3, we saw that the elasticity of demand, $\varepsilon(p)$, is defined to be
\[ \varepsilon(p) = -\frac{p}{q}\,q'(p), \]
where $q = q^D(p)$ is the demand function and this told us how changes in price will cause changes in the quantity purchased by the consumers. Now, of course, from the point of view of a supplier, you are not interested in such changes per se, but in how such changes affect your revenue. That is, will a change in price, together with the corresponding change in quantity purchased, lead to an increase or decrease in your revenue?
To answer this question, we assume that the supplier is a monopoly, i.e. it is the only supplier of a given product to the market. In such cases, the revenue generated by selling at a price per unit, $p$, is given by
\[ R(p) = pq, \]
where $q = q^D(p)$ is the quantity that will be purchased by the consumers at this price. Indeed, using the product rule to differentiate this with respect to $p$, we find that
\[ R'(p) = q + p\,q'(p) = q\left[1 + \frac{p}{q}\,q'(p)\right] = q\bigl[1 - \varepsilon(p)\bigr], \]
using the definition of $\varepsilon(p)$. So, as $q > 0$ for this to be economically meaningful, we have:
If $\varepsilon(p) > 1$, we see that $R'(p) < 0$ and so a small increase in price leads to a decrease in revenue. In such cases we say that demand for the product is elastic.
If $\varepsilon(p) < 1$, we see that $R'(p) > 0$ and so a small increase in price leads to an increase in revenue. In such cases we say that demand for the product is inelastic.
Thus, even though an increase in price will usually lead to a decrease in the quantity that the consumers will demand, the value of the elasticity (i.e. whether it is greater than or less than one) determines how such changes will affect the revenue (i.e. whether it will decrease or increase).
Example 4.3 Suppose that the demand function for a good is given by $q^D(p) = 20 - 2p$. Determine the values of $p$ that make the demand (a) elastic and (b) inelastic.
In this case, we have $q = q^D(p) = 20 - 2p$ and so the elasticity of demand is given by
\[ \varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{20 - 2p}(-2) = \frac{p}{10 - p}, \]
as long as $p \neq 10$. And, of course, we need values of $p$ where $0 \leq p \leq 10$ in order for the demand function to be economically meaningful.
So, for (a), where we want the values of $p$ that make demand elastic, we see that
\[ \varepsilon(p) > 1 \implies \frac{p}{10 - p} > 1 \implies p > 10 - p, \]
as $10 - p > 0$ since $0 \leq p < 10$ (recall that $p \neq 10$). This means that demand is elastic if $p > 5$ and, in particular, if we have $5 < p < 10$ a small increase in price will lead to a decrease in revenue.
For (b), similar reasoning shows us that demand is inelastic if $p < 5$ and, in particular, if we have $0 \leq p < 5$ a small increase in price will lead to an increase in revenue.
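The link between elasticity and revenue in this example can be checked at a few prices. This is a minimal sketch using the demand function q = 20 - 2p from the example; the function names and the sample prices 2, 5 and 8 are illustrative choices.

```python
def elasticity(p):
    """Elasticity of demand p / (10 - p) for q = 20 - 2p, valid for 0 <= p < 10."""
    return p / (10 - p)

def revenue(p):
    """Monopoly revenue R(p) = p * q = p * (20 - 2p)."""
    return p * (20 - 2 * p)

def marginal_revenue(p):
    """R'(p) = 20 - 4p, which should equal q * (1 - elasticity)."""
    return 20 - 4 * p

for p in (2, 5, 8):
    q = 20 - 2 * p
    print(p, elasticity(p), marginal_revenue(p), q * (1 - elasticity(p)))
```

At p = 2 demand is inelastic and marginal revenue is positive, at p = 8 demand is elastic and marginal revenue is negative, and p = 5 is the boundary case where the elasticity equals one.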

4.3 Using second-order derivatives

The second-order derivative of a function can allow us to infer useful information about the shape of a function. For instance, it can allow us to infer whether a stationary point is a local maximum or a local minimum and, more generally, whether the function is convex or concave. Indeed, once we understand convexity and concavity, we will be in a position to extend our understanding of what we mean by a point of inflection.

4.3.1

Second-derivatives and stationary points

The key to understanding the link between the shape of a function and its
second-derivative is the second-order Taylor approximation to f (x) around x = a, i.e.
f (x) = f (a) + (x a)f (a) +

(x a)2
f (a),
2

and we know that this is a good approximation as long as x a is small. Now, to start
with, lets suppose that f (x) has a stationary point at x = a, i.e. f (a) = 0, so that our
second-order Taylor approximation becomes
f (x) = f (a) +

(x a)2
f (a)
2

f (x) f (a) =

(x a)2
f (a).
2

Here, for all $x$ near the stationary point, the sign of $f(x) - f(a)$ on the left-hand side,
i.e. the relative magnitude of $f(x)$ and $f(a)$, is determined by the sign of $f''(a)$ on the
right-hand side, since $(x - a)^2/2$ is positive whenever $x \ne a$. That is, the sign of
$f(x) - f(a)$ for $x$ near the stationary point is determined by the sign of the second-order
derivative at the stationary point. Indeed, we see that:
• If $f''(a) > 0$, then $f(x) > f(a)$ for all $x$ near to $a$ and so the function always lies
above the horizontal tangent line at $x = a$. This means that the stationary point is
a local minimum as in Figure 4.5(c).
• If $f''(a) < 0$, then $f(x) < f(a)$ for all $x$ near to $a$ and so the function always lies
below the horizontal tangent line at $x = a$. This means that the stationary point is
a local maximum as in Figure 4.5(b).
Thus, the sign of the second-order derivative at a stationary point allows us to infer
whether the stationary point is a local maximum or a local minimum. When we classify
stationary points in this way, we call it the second-order derivative test. However,
observe that if $f''(a) = 0$, then the second-order Taylor approximation tells us nothing
useful about the shape of the function as it reduces to $f(x) \simeq f(a)$.


Example 4.4 Use the second-order derivative test to classify the stationary points
of the function in Example 4.1.

We saw in Example 4.1 that the first-order derivative of $f$ is
$$f'(x) = 3x^2 - 4x - 15,$$
and, in Example 4.2, we saw that its stationary points occur when $x = -5/3$ and
$x = 3$. To use the second-order derivative test, we note that
$$f''(x) = 6x - 4,$$
and then use the fact that
• when $x = -5/3$, $f''(x) = -14 < 0$ and so this is a local maximum,
• when $x = 3$, $f''(x) = 14 > 0$ and so this is a local minimum,
in agreement with what we found in Example 4.2.
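
The test in Example 4.4 can also be checked mechanically. The following is only a minimal sketch: it assumes the function from Example 4.1 is $f(x) = x^3 - 2x^2 - 15x$ (the form used in Example 4.7), and the helper name `classify` is ours, not part of the guide. Exact fractions are used so that $f'(a) = 0$ can be tested without rounding error.

```python
from fractions import Fraction


def f1(x):
    """First-order derivative f'(x) = 3x^2 - 4x - 15."""
    return 3 * x**2 - 4 * x - 15


def f2(x):
    """Second-order derivative f''(x) = 6x - 4."""
    return 6 * x - 4


def classify(a):
    """Apply the second-order derivative test at a stationary point a."""
    assert f1(a) == 0, "a must be a stationary point"
    second = f2(a)
    if second > 0:
        return "local minimum"
    if second < 0:
        return "local maximum"
    return "inconclusive"  # f''(a) = 0: the test tells us nothing


stationary_points = [Fraction(-5, 3), Fraction(3)]
results = {a: classify(a) for a in stationary_points}
for a, kind in results.items():
    print(f"x = {a}: f''(x) = {f2(a)}, so this is a {kind}")
```

Note that the third branch mirrors the remark above: when $f''(a) = 0$ the test is silent and another method is needed.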

4.3.2 Convex and concave functions

More generally, the sign of the second-order derivative of a function tells us whether a
function is convex or concave. Indeed, we find that:
• If $f''(x) > 0$ on some interval, we say that $f$ is convex on that interval.
• If $f''(x) < 0$ on some interval, we say that $f$ is concave on that interval.
To get an idea of what this means, consider that a convex function on an interval, $I$,
has $f''(x) > 0$ for all $x \in I$. So, if we take any particular point, say $a \in I$, the tangent
line to $f$ at $x = a$ has an equation given by
$$y = f(a) + (x - a) f'(a),$$
and so, our second-order Taylor approximation can be written as
$$f(x) \simeq y + \frac{(x - a)^2}{2} f''(a).$$
Now, as $f''(a) > 0$ (recall that $a \in I$ too), we see that $f(x) > y$ for all $x \in I$ where
$x \ne a$, i.e. these values of $f$ always lie above the values from the tangent line to $f$ at
$x = a$, as illustrated in Figure 4.6(a). But, of course, we can use any $a \in I$ when we run
this argument and so a convex function is one which lies above all of its tangent lines,
as illustrated in Figure 4.6(b). In particular, a function must be convex in the
neighbourhood of a local minimum.
A similar argument can be given to show that a concave function always lies below all
of its tangent lines so that, in particular, a function must be concave in the
neighbourhood of a local maximum.

Figure 4.6: The relationship between a convex function and its tangent lines, in two
panels: (a) $f$ lies above the tangent at $a \in I$; (b) $f$ lies above all its tangent lines.
(a) When changing the value of $x$, we can see that the values of $f(x)$ are greater than the
corresponding values of $y$ from the tangent line to $f$ at $a$, i.e. $f$ lies above this tangent
line. (b) By changing the value of $a$, we can see that $f$ lies above all of its tangent lines.

Activity 4.1 Using an argument similar to the one above, explain why a concave
function always lies below all of its tangent lines.

This gives us another, more visual, way of deciding whether a function is convex or
concave, namely:
• A function is convex on some interval if it lies above all of its tangent lines in that
interval.
• A function is concave on some interval if it lies below all of its tangent lines in that
interval.
And, we can see how this all works by continuing with our example.
Example 4.5 Determine the intervals on which the function in Example 4.1 is (a)
convex and (b) concave.

In Example 4.4 we saw that the second-order derivative of the function from
Example 4.1 is given by
$$f''(x) = 6x - 4,$$
so we find that
• $f''(x) > 0$ when $6x - 4 > 0$ which means that $x > 2/3$, and
• $f''(x) < 0$ when $6x - 4 < 0$ which means that $x < 2/3$.

This means that the function is convex on the interval $x > 2/3$ where $f''(x) > 0$ and
concave on the interval $x < 2/3$ where $f''(x) < 0$ as illustrated in Figure 4.2(b).
Indeed, when looking at this figure, observe that when $x > 2/3$ the function lies
above all of its tangent lines in that interval and that when $x < 2/3$ the function lies
below all of its tangent lines in that interval.
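
The tangent-line description of convexity and concavity can be spot-checked numerically. The sketch below again assumes $f(x) = x^3 - 2x^2 - 15x$ for the function in Example 4.1; the sample points and the helper names are our own choices, and checking finitely many points illustrates the property rather than proving it.

```python
def f(x):
    return x**3 - 2 * x**2 - 15 * x


def f1(x):
    return 3 * x**2 - 4 * x - 15


def tangent(a, x):
    """Value at x of the tangent line to f at a: y = f(a) + (x - a) f'(a)."""
    return f(a) + (x - a) * f1(a)


def lies_above_tangents(points):
    """True if f(x) > tangent(a, x) for every pair of distinct sample points."""
    return all(f(x) > tangent(a, x) for a in points for x in points if x != a)


def lies_below_tangents(points):
    """True if f(x) < tangent(a, x) for every pair of distinct sample points."""
    return all(f(x) < tangent(a, x) for a in points for x in points if x != a)


convex_side = [1 + 0.5 * k for k in range(1, 11)]   # sample points with x > 2/3
concave_side = [-0.5 * k for k in range(0, 10)]     # sample points with x < 2/3

print(lies_above_tangents(convex_side))   # convex where x > 2/3
print(lies_below_tangents(concave_side))  # concave where x < 2/3
```

Both tangent point and evaluation point are taken from the same interval, which is exactly what the statements above require.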


4.3.3 Points of inflection

Not all points of inflection are stationary points like the ones we saw in Section 4.2.2.
More generally, a point of inflection is a point where a function changes from being
convex to concave (or vice versa) in a certain well-defined way. Technically, we say that:
• If $f''(a) = 0$ and $f''(x)$ changes sign at $x = a$, then $f$ has a point of inflection at $a$.
As such, we can see that the points indicated in Figure 4.7 as well as the ones we saw
earlier in Figure 4.5(a) and (d) are points of inflection although, of course, only the ones
in Figure 4.5(a) and (d) are stationary points as well.

Figure 4.7: A point of inflection where $f$ changes from (a) convex to concave at $a$ and (b)
concave to convex at $a$. In particular, observe that neither of these points of inflection is
a stationary point because neither of them has a horizontal tangent line, i.e. $f'(a) \ne 0$
in both cases.

Example 4.6 Find any points of inflection of the function in Example 4.1.

We saw in Example 4.5 that the second-order derivative changes sign when $x = 2/3$
and, furthermore, we can see that $f''(2/3) = 0$. This means that the function in
Example 4.1 has a point of inflection when $x = 2/3$.
Indeed, looking at Figure 4.2(b), we can see that when $x = 2/3$, the function changes
from being concave to convex as we should expect from a point of inflection.
However, this point of inflection is not a stationary point because $f'(x) \ne 0$ when
$x = 2/3$.

It is, perhaps, worth stressing that the condition $f''(a) = 0$ on its own is not enough to
guarantee that we have a point of inflection. For instance, the two functions illustrated
in Figure 4.8 both have $f''(0) = 0$, but in neither case does the second derivative change
sign and so we do not have a point of inflection.

Activity 4.2 Show that $f''(0) = 0$ for both of the functions illustrated in
Figure 4.8. How can we infer that they have those shapes by looking at (a) the
first-order derivative and (b) the second-order derivative of the function?

Figure 4.8: The functions (a) $f(x) = x^4 - 1$ and (b) $f(x) = 1 - x^4$. Both of these
functions have $f''(0) = 0$ but neither of them has a point of inflection. (a) This is
convex on both sides of $x = 0$ and the function has a local minimum at that point.
(b) This is concave on both sides of $x = 0$ and the function has a local maximum at
that point. (The dashed curves in these figures represent the curves $y = x^2 - 1$ in (a)
and $y = 1 - x^2$ in (b) for comparison.)

It is also worth noting that the condition that $f''(x)$ changes sign at $x = a$ on its own is
not enough to guarantee that we have a point of inflection either. Of course, if $f''(x)$ is
changing sign at $x = a$ and $f''(a)$ exists, we must have $f''(a) = 0$. But, although we do
not dwell on it here, sometimes we may encounter functions where $f''(a)$ does not exist
even though $f''(x)$ changes sign at $x = a$. We will briefly consider what happens in
these cases when we look at cusps and asymptotes in Section 4.4.3.
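
The two-part condition for a point of inflection can be turned into a small check. This is a hedged sketch: `has_inflection` is a name of our own, the step `h` is an arbitrary small offset, and the cubic's second derivative assumes $f(x) = x^3 - 2x^2 - 15x$ for Example 4.1. The quartic's second derivative, $12x^2$, comes from $f(x) = x^4 - 1$ in Figure 4.8(a).

```python
from fractions import Fraction


def changes_sign(g, a, h):
    """True if g takes opposite signs just to the left and just to the right of a."""
    return g(a - h) * g(a + h) < 0


def has_inflection(f2, a, h=Fraction(1, 1000)):
    """Point of inflection at a: f''(a) = 0 and f''(x) changes sign at x = a."""
    return f2(a) == 0 and changes_sign(f2, a, h)


def cubic_f2(x):
    """f''(x) = 6x - 4 for f(x) = x^3 - 2x^2 - 15x."""
    return 6 * x - 4


def quartic_f2(x):
    """f''(x) = 12x^2 for f(x) = x^4 - 1."""
    return 12 * x**2


print(has_inflection(cubic_f2, Fraction(2, 3)))  # both conditions hold
print(has_inflection(quartic_f2, Fraction(0)))   # f''(0) = 0 but no sign change
```

The second call shows why $f''(a) = 0$ alone is not enough: the second derivative touches zero without changing sign.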

4.4 Curve sketching

One useful application of this material on derivatives and what they tell us about the
shape of a function is curve sketching. The aim here is to illustrate the behaviour of
the curve described by the equation $y = f(x)$ by picking out its main features and
where these features occur by means of a sketch. For most functions we will deal with,
these features include any points where the curve may cross the axes and the location
and nature of any stationary points. But, it may also be necessary to assess how the
curve behaves as $x \to \pm\infty$ and, in particular, to assess whether the function has any
asymptotes. A general method for sketching the curve $y = f(x)$ would therefore involve
us thinking about the following:
• x-intercepts: The x-axis is given by the equation $y = 0$ and so the curve $y = f(x)$
crosses the x-axis at any point $(x, 0)$ for which $f(x) = 0$. Solving this equation will
therefore give us the x-intercepts of the curve if there are any.
• y-intercept: The y-axis is given by the equation $x = 0$ and so the curve crosses the
y-axis at the point $(0, y)$ for which $y = f(0)$. As $f$ is a function, there can be only
one such point and this is the y-intercept.
• Finding stationary points: We can find the stationary points, as we saw above, by
solving the equation $f'(x) = 0$.
• Classifying stationary points: We can also determine whether each of the stationary
points is a local maximum, local minimum or point of inflection by using the
methods outlined above.
• Limiting behaviour in the x-direction: We can determine how $f(x)$ is behaving as
$x \to \infty$ and as $x \to -\infty$.

Of course, in certain cases, it may also be advantageous to think carefully about the
intervals in which the function is increasing (or decreasing) or whether the function is
convex (or concave). But, generally, the method above should suffice when we sketch
most functions.

In particular, observe that a sketch is very different from a plot. A plot involves plotting
certain points and joining them up with little regard to any interesting behaviour the
curve may be exhibiting elsewhere. A sketch, on the other hand, isolates any interesting
behaviour the curve may be exhibiting (such as the ones listed above) and concentrates
on these. Please be aware that there is a difference and in this course, we will always
want to see sketches and not plots!
To see how we can implement the method above, we will start by sketching the
relatively simple curves that arise when f is a polynomial. We will then consider how
we would proceed when the functions are differentiable, but involve other elementary
functions. Then, just so that we are aware of some possible complications, we look at
what happens when our function fails to be differentiable at some points.

4.4.1 Sketching curves defined by polynomials

Given what we have seen so far, the only real obstacle to sketching a polynomial is an
understanding of the limiting behaviour of this kind of function. The key result here is
that, if $f(x)$ is a polynomial, its behaviour as $x$ gets arbitrarily large in magnitude (that
is to say, as $x \to \infty$ or $x \to -\infty$) is determined solely by its leading term, i.e. the one
with the highest power of $x$. Then, with this in mind, we can look at the term with the
highest power of $x$, let's say that this is $x^n$, and note that:
• if $n$ is even, then $x^n \to \infty$ as $x \to \infty$ and as $x \to -\infty$; whereas
• if $n$ is odd, then $x^n \to \infty$ as $x \to \infty$ and $x^n \to -\infty$ as $x \to -\infty$.

Using these facts and noting how the sign of the coefficient of the term with the highest
power of $x$ can influence the sign of the limit, we can determine the limiting behaviour
of any polynomial.
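
One way to see why the leading term takes over is to compare a polynomial with its leading term at increasingly large values of $|x|$. The sketch below uses $f(x) = x^3 - 2x^2 - 15x$, which we assume is the function from Example 4.1; the sample values of $x$ are arbitrary. The ratio settling near 1 is numerical evidence, not a proof.

```python
def f(x):
    return x**3 - 2 * x**2 - 15 * x


def leading_term(x):
    return x**3


# Ratio of the polynomial to its leading term at points of growing magnitude.
sample_points = (10, 1000, -1000, 10**6, -(10**6))
ratios = {x: f(x) / leading_term(x) for x in sample_points}

for x, ratio in ratios.items():
    print(f"x = {x}: f(x) / x^3 = {ratio}")
```

As $|x|$ grows, the lower-order terms contribute less and less to the ratio, so $f(x)$ inherits the sign and growth of $x^3$.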
Activity 4.3 Suppose that $f(x)$ is a polynomial and that, for some constants $a \ne 0$
and $n \in \mathbb{N}$, the term in this polynomial with the highest power of $x$ is $ax^n$.
Determine the behaviour of $f(x)$ as $x \to \infty$ and as $x \to -\infty$ in the cases which arise
according to whether $a$ is positive or negative and whether $n$ is even or odd.

We can now see how to sketch some polynomials and we start by seeing how to sketch
the function that we have been considering throughout this chapter.


Example 4.7 Sketch the curve $y = f(x)$ where $f(x)$ is the function in Example 4.1.

From the earlier examples in this section, we know quite a lot about this function
and, in particular, we have found and classified its stationary points. But, to sketch
this curve, we need to find a bit more information, namely its
• x-intercepts: These occur when $y = 0$ and so we solve the equation given by
$f(x) = 0$, i.e.
$$x^3 - 2x^2 - 15x = 0,$$
which, on taking out the common factor of $x$ and factorising the remaining
quadratic, gives us
$$x(x^2 - 2x - 15) = 0 \quad\Longrightarrow\quad x(x - 5)(x + 3) = 0.$$
Thus, the x-intercepts occur when $x = -3$, $x = 0$ and $x = 5$.
• y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the
y-intercept occurs when $y = 0$. Note, in particular, that this means that the
curve goes through the origin (as we should have expected since one of the
x-intercepts occurs when $x = 0$).
• stationary points: We have found the x-coordinates of the stationary points and
classified them above (see, for instance, Example 4.2). So, all we need to do
here, is use $y = f(x)$ to find the values of $y$ at these points so that we can locate
them on our sketch. Doing this, we find that $f(x)$ has a
  ◦ local maximum when $x = -5/3$ and $y = f(-5/3) = 400/27$, and
  ◦ local minimum when $x = 3$ and $y = f(3) = -36$.
• limiting behaviour: The term with the highest power of $x$ in $f(x)$ is $x^3$ and so
$f(x) \to \infty$ as $x \to \infty$ and $f(x) \to -\infty$ as $x \to -\infty$.

So, using this information, we begin to sketch this curve by roughly indicating these
key features on some axes as in Figure 4.9(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 4.9(b).
In particular, it is worth noting that in this sketch:
• all of the key features are labelled;
• the curve has the right kind of limiting behaviour, i.e. $f(x) \to \infty$ as $x \to \infty$ and
$f(x) \to -\infty$ as $x \to -\infty$; and
• points of inflection which are not stationary points (recall that, in Example 4.6,
we saw that this curve has one when $x = 2/3$) are not usually indicated.
Of course, what we see here is similar to what we saw in Figure 4.2, but a sketch
must include information about all of the relevant key features.
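
The key features found in Example 4.7 can be collected and verified with exact arithmetic before they are marked on the axes. This is a minimal sketch; the dictionary layout and its labels are our own, and the x-values are simply those obtained in the example.

```python
from fractions import Fraction


def f(x):
    return x**3 - 2 * x**2 - 15 * x


features = {
    "x-intercepts": [(x, f(x)) for x in (-3, 0, 5)],   # from x(x - 5)(x + 3) = 0
    "y-intercept": (0, f(0)),
    "local maximum": (Fraction(-5, 3), f(Fraction(-5, 3))),
    "local minimum": (Fraction(3), f(Fraction(3))),
}

for name, value in features.items():
    print(f"{name}: {value}")
```

Using `Fraction` keeps $f(-5/3) = 400/27$ exact, which is the form we would label on the sketch.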

Figure 4.9: Sketching the curve $y = x^3 - 2x^2 - 15x$ in Example 4.7, in two panels: (a)
the key features and (b) the sketch. (a) Using what we have discovered about the key
features of the curve, we can begin to see what it must look like. (b) By joining up
these key features with a nice smooth curve, we get the sketch itself.

Indeed, it can be seen that, unlike plotting a function, sketching it is a bit of an art and
it can only be done well by learning to appreciate what your calculations are telling you
about its appearance. With this in mind, let's sketch a function that we haven't
encountered before.

Example 4.8 Sketch the curve $y = f(x)$ where $f(x) = 2x^4 - 4x^3 + 2x^2$.

We find the key features of this curve according to the list given above, namely
• x-intercepts: These occur when $y = 0$ and so we solve the equation given by
$f(x) = 0$, i.e.
$$2x^4 - 4x^3 + 2x^2 = 0,$$
which, on taking out the common factor of $2x^2$ and factorising the remaining
quadratic, gives us
$$2x^2(x^2 - 2x + 1) = 0 \quad\Longrightarrow\quad 2x^2(x - 1)^2 = 0.$$
Thus, the x-intercepts occur when $x = 0$ and $x = 1$.
• y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the
y-intercept occurs when $y = 0$. Note, in particular, that this means that the
curve goes through the origin (as we should have expected since one of the
x-intercepts occurs when $x = 0$).
• finding the stationary points: These occur when $f'(x) = 0$ and so, noting that
$$f'(x) = 8x^3 - 12x^2 + 4x,$$
we solve the equation
$$8x^3 - 12x^2 + 4x = 0,$$


which, on taking out a common factor of $4x$ and factorising the remaining
quadratic, gives us
$$4x(2x^2 - 3x + 1) = 0 \quad\Longrightarrow\quad 4x(2x - 1)(x - 1) = 0,$$
and so the stationary points occur when $x = 0$, $x = 1/2$ and $x = 1$. Then, we
use $y = f(x)$ to find the values of $y$ at these points so that we can locate them
on the sketch. Doing this, we find that
  ◦ $x = 0$ gives $y = f(0) = 0$,
  ◦ $x = 1/2$ gives $y = f(1/2) = 1/8$, and
  ◦ $x = 1$ gives $y = f(1) = 0$.
So, the stationary points have coordinates given by $(0, 0)$, $(1/2, 1/8)$ and $(1, 0)$.
• classifying the stationary points: Let's use the second-order derivative test here.
We can see that
$$f''(x) = 24x^2 - 24x + 4,$$
and so, looking at the stationary points, we have
  ◦ $f''(0) = 4 > 0$ and so $(0, 0)$ is a local minimum;
  ◦ $f''(1/2) = -2 < 0$ and so $(1/2, 1/8)$ is a local maximum; and
  ◦ $f''(1) = 4 > 0$ and so $(1, 0)$ is a local minimum.
• limiting behaviour: The term with the highest power of $x$ in $f(x)$ is $2x^4$ and so
$f(x) \to \infty$ as $x \to \infty$ and as $x \to -\infty$.

So, using this information, we begin to sketch this curve by roughly indicating these
key features on some axes as in Figure 4.10(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 4.10(b).
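
The differentiation and evaluation steps in Example 4.8 follow a fixed pattern for any polynomial, so they can be sketched with coefficient lists. The representation below (constant term first) and the helper names `evaluate` and `differentiate` are our own illustrative choices, not notation from the guide.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Evaluate a polynomial given as [c0, c1, c2, ...], constant term first."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def differentiate(coeffs):
    """Differentiate term by term: c x^k becomes k c x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


f = [0, 0, 2, -4, 2]      # 2x^4 - 4x^3 + 2x^2
f1 = differentiate(f)     # 8x^3 - 12x^2 + 4x
f2 = differentiate(f1)    # 24x^2 - 24x + 4

candidates = [Fraction(0), Fraction(1, 2), Fraction(1)]
table = [(x, evaluate(f, x), evaluate(f1, x), evaluate(f2, x)) for x in candidates]

for x, y, slope, second in table:
    print(f"x = {x}: f = {y}, f' = {slope}, f'' = {second}")
```

Each row confirms that $f'$ vanishes at the candidate point and shows the sign of $f''$ used to classify it.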

Figure 4.10: Sketching the curve $y = 2x^4 - 4x^3 + 2x^2$ in Example 4.8, in two panels:
(a) the key features and (b) the sketch, with the local maximum at $(1/2, 1/8)$ marked.
(a) Using what we have discovered about the key features of the curve, we can begin to
see what it must look like. (b) By joining up these key features with a nice smooth
curve, we get the sketch itself.

Activity 4.4 Find the points of inflection of the function in Example 4.8.


4.4.2 Sketching curves defined using other elementary functions

When sketching curves defined using other elementary functions the only real obstacle
is, again, an understanding of the limiting behaviour of such functions. For instance, as
we saw in Section 2.1.1, exponential functions like $e^x$ and $e^{-x}$ have very simple limiting
behaviours, i.e.
• $e^x \to \infty$ as $x \to \infty$ and $e^x \to 0$ as $x \to -\infty$; whereas
• $e^{-x} \to 0$ as $x \to \infty$ and $e^{-x} \to \infty$ as $x \to -\infty$.

But, when functions such as these are multiplied by polynomials (say), it is not clear
how this will affect their limiting behaviour. For now, we just state the following fact:[1]
• When an exponential is multiplied by a polynomial, the exponential dominates.
Thus, for example, the function $x^3 e^{-x} \to 0$ as $x \to \infty$ because the exponential $e^{-x} \to 0$
as $x \to \infty$ and this dominates the behaviour of the polynomial, $x^3$, even though
$x^3 \to \infty$ as $x \to \infty$. Let's sketch this curve to see why this is reasonable.
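
Before sketching, a quick numerical look makes the stated fact plausible. The sketch below tabulates $x^3 e^{-x}$ at a few increasingly large values of $x$ (our own arbitrary choices); it illustrates the claim but is not a substitute for the limit techniques mentioned in the footnote.

```python
import math


def g(x):
    """The function x^3 e^(-x)."""
    return x**3 * math.exp(-x)


# Tabulate g at growing x: the polynomial factor grows, yet the product shrinks.
values = [(x, g(x)) for x in (5, 10, 20, 50, 100)]

for x, y in values:
    print(f"x = {x}: x^3 = {x**3}, x^3 e^(-x) = {y:.3e}")
```

Even though $x^3$ reaches a million at $x = 100$, the product is already vanishingly small there.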
Example 4.9 Sketch the curve $y = f(x)$ where $f(x) = x^3 e^{-x}$.

We find the key features of this curve according to the list given above, namely
• x-intercepts: These occur when $y = 0$ and so we solve the equation given by
$f(x) = 0$, i.e.
$$x^3 e^{-x} = 0.$$
But, as $e^{-x} \ne 0$ for all $x \in \mathbb{R}$, we find that the only x-intercept occurs when
$x = 0$.
• y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the
y-intercept occurs when $y = 0$. Note, in particular, that this means that the
curve goes through the origin (as we should have expected since the x-intercept
we found occurs when $x = 0$).
• finding the stationary points: These occur when $f'(x) = 0$ and so, using the
product rule, we get
$$f'(x) = (3x^2)(e^{-x}) + (x^3)(-e^{-x}) = x^2 (3 - x) e^{-x},$$
and so we solve the equation
$$x^2 (3 - x) e^{-x} = 0.$$
But, as $e^{-x} \ne 0$ for all $x \in \mathbb{R}$, we find that the stationary points occur when
$x = 0$ and $x = 3$. Then, we use $y = f(x)$ to find the values of $y$ at these points
so that we can locate them on the sketch. Doing this, we find that
  ◦ $x = 0$ gives $y = f(0) = (0)^3 e^{0} = (0)(1) = 0$, and
  ◦ $x = 3$ gives $y = f(3) = (3)^3 e^{-3} = 27 e^{-3}$.
So, the stationary points have coordinates given by $(0, 0)$ and $(3, 27 e^{-3})$.
• classifying the stationary points: Let's use the second-order derivative test here.
We can use the product rule again to see that
$$f''(x) = (6x - 3x^2)(e^{-x}) + (3x^2 - x^3)(-e^{-x}) = (6x - 6x^2 + x^3) e^{-x},$$
and so, looking at the stationary points, we have
  ◦ $f''(0) = (0) e^{0} = 0$ and so the second derivative test fails! However, as
  $$f'(x) = x^2 (3 - x) e^{-x}$$
  is positive when $x < 0$ and positive when $0 < x < 3$, we can see that this
  function is increasing on both sides of the stationary point at $x = 0$. Thus,
  the first-derivative test tells us that $(0, 0)$ is a point of inflection.
  ◦ $f''(3) = (-9) e^{-3} < 0$ and so $(3, 27 e^{-3})$ is a local maximum.
• limiting behaviour: Using the fact above we would expect the $e^{-x}$ to dominate
and this would mean that $f(x) \to 0$ as $x \to \infty$ whereas, as $x \to -\infty$, we would
expect $f(x) \to -\infty$ as $x^3 \to -\infty$ and $e^{-x} \to \infty$.

Then, using this information, we begin to sketch this curve by roughly indicating
these key features on some axes as in Figure 4.11(a) and then, joining them up with
a nice smooth curve, we get the sketch itself as in Figure 4.11(b).

[1] In 176 Further Calculus we will encounter techniques for finding limits which are much more
sophisticated than the ones that we have seen so far. Once we have these, we will be able to see exactly
why this fact is true and be in a better position to assess the limiting behaviour of curves which are
defined using other elementary functions.
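
When the second-order derivative test fails, as it does at $x = 0$ here, the fallback is to compare the sign of $f'$ on either side of the stationary point. The following sketch does this for $f'(x) = x^2(3 - x)e^{-x}$; the function name `first_derivative_test` and the offset `h = 0.1` are our own assumptions, and `h` must be small enough that no other stationary point lies within that distance.

```python
import math


def f1(x):
    """f'(x) = x^2 (3 - x) e^(-x) for f(x) = x^3 e^(-x)."""
    return x**2 * (3 - x) * math.exp(-x)


def first_derivative_test(derivative, a, h=0.1):
    """Classify a stationary point a by the sign of the derivative either side of it."""
    left, right = derivative(a - h), derivative(a + h)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    if left * right > 0:
        return "point of inflection"  # same sign on both sides
    return "inconclusive"


print(first_derivative_test(f1, 0))  # increasing on both sides of x = 0
print(first_derivative_test(f1, 3))  # increasing, then decreasing
```

At $x = 3$ this agrees with the second-order derivative test, while at $x = 0$ it supplies the classification that test could not.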

Figure 4.11: Sketching the curve $y = x^3 e^{-x}$ in Example 4.9, in two panels: (a) the key
features and (b) the sketch, with the height $27e^{-3}$ of the local maximum marked.
(a) Using what we have discovered about the key features of the curve, we can begin to
see what it must look like. (b) By joining up these key features with a nice smooth
curve, we get the sketch itself.

Activity 4.5 Does the function in Example 4.9 have any other points of inflection?
If so, find them.


Activity 4.6 Sketch the curve y = f (x) where f (x) = x2 ex and find all of its points
of inflection.

4.4.3 Asymptotes and cusps

The method above for sketching $y = f(x)$ assumes, as we generally have throughout
this chapter, that the function, $f(x)$, and its derivatives are well-defined for all $x \in \mathbb{R}$.
But, more generally, there may be points at which the function or some of its
derivatives are not defined. When this happens we start to encounter asymptotes and
cusps. We will not dwell on this a great deal here, but we can use the following
examples to see how this may affect our sketches.
Example 4.10 Sketch the curve $y = (x - 1)^{-1}$.

Here we have $y = f(x)$ where the function, $f(x)$, is given by
$$f(x) = \frac{1}{x - 1},$$
as long as $x \ne 1$. In particular, this means that we have
$$f'(x) = -\frac{1}{(x - 1)^2} \quad\text{and}\quad f''(x) = \frac{2}{(x - 1)^3},$$
and so these derivatives aren't defined at $x = 1$ either.[2] Using these, we can see that
when
• $x < 1$ we have $f(x) < 0$, $f'(x) < 0$ and $f''(x) < 0$, meaning that for these values
of $x$ the function is negative, decreasing and concave; whereas when
• $x > 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) > 0$, meaning that for these values
of $x$ the function is positive, decreasing and convex.
We can also see that the y-intercept of this curve occurs when $y = -1$ and that
$f(x) \to 0$ as $x \to \pm\infty$ which means that this function has a horizontal asymptote
given by $y = 0$. However, the main feature that concerns us here is the vertical
asymptote at $x = 1$ which comes about because
$$\lim_{x \to 1^-} f(x) = -\infty \quad\text{and}\quad \lim_{x \to 1^+} f(x) = \infty,$$
as we should expect to see from our discussion of hyperbolae in Section 2.2.4. The
sketch of this curve is illustrated in Figure 4.12(a).
In particular, observe that in Example 4.10, we have a case like the one mentioned at
the end of Section 4.3.3. That is, the function changes from being concave to convex at
a point, but there is no point of inflection. This happens because the second derivative
of this function does not exist at the point.
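
The one-sided behaviour behind the vertical asymptote in Example 4.10 can be seen by evaluating $f(x) = 1/(x - 1)$ at points that approach $x = 1$ from each side. This is only an illustrative sketch: the step sizes $10^{-k}$ are our own choice, and floating-point values stand in for the limits.

```python
def f(x):
    """f(x) = 1 / (x - 1), defined for x != 1."""
    return 1 / (x - 1)


# Approach x = 1 from the left and from the right in steps of 10^(-k).
left = [f(1 - 10**-k) for k in range(1, 6)]
right = [f(1 + 10**-k) for k in range(1, 6)]

print("from the left: ", left)
print("from the right:", right)
```

The values from the left become ever more negative while those from the right become ever larger, in line with the two one-sided limits above.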
[2] That is, the function and its derivatives are undefined when $x = 1$ as that would require us to divide
by zero and that is never allowed.


Example 4.11 Sketch the curve $y = (x - 1)^{-2}$.

Here we have $y = f(x)$ where the function, $f(x)$, is given by
$$f(x) = \frac{1}{(x - 1)^2},$$
as long as $x \ne 1$. In particular, this means that we have
$$f'(x) = -\frac{2}{(x - 1)^3} \quad\text{and}\quad f''(x) = \frac{6}{(x - 1)^4},$$
and so these derivatives aren't defined at $x = 1$ either.[3] Using these, we can see that
when
• $x < 1$ we have $f(x) > 0$, $f'(x) > 0$ and $f''(x) > 0$, meaning that for these values
of $x$ the function is positive, increasing and convex; whereas when
• $x > 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) > 0$, meaning that for these values
of $x$ the function is positive, decreasing and convex.
We can also see that the y-intercept of this curve occurs when $y = 1$ and that
$f(x) \to 0$ as $x \to \pm\infty$ which means that this function has a horizontal asymptote
given by $y = 0$. However, the main feature that concerns us here is the vertical
asymptote at $x = 1$ which comes about because
$$\lim_{x \to 1^-} f(x) = \infty \quad\text{and}\quad \lim_{x \to 1^+} f(x) = \infty,$$
as now, $f(x)$ is always positive. The sketch of this curve is illustrated in
Figure 4.12(b).

Example 4.12 Sketch the curve $y = (x - 1)^{2/3}$.

Here we have $y = f(x)$ where the function, $f(x)$, is given by
$$f(x) = (x - 1)^{2/3},$$
which is defined for all $x \in \mathbb{R}$. However, this means that we have
$$f'(x) = \frac{2}{3(x - 1)^{1/3}} \quad\text{and}\quad f''(x) = -\frac{2}{9(x - 1)^{4/3}},$$
and so these derivatives aren't defined at $x = 1$.[4] Using these, we can see that when
• $x < 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) < 0$, meaning that for these values
of $x$ the function is positive, decreasing and concave; whereas when
• $x > 1$ we have $f(x) > 0$, $f'(x) > 0$ and $f''(x) < 0$, meaning that for these values
of $x$ the function is positive, increasing and concave.
We can also see that the y-intercept of this curve occurs when $y = 1$. The sketch of
this curve is illustrated in Figure 4.12(c) and we say that this curve has a cusp at
$x = 1$.

[3] Again, the function and its derivatives are undefined when $x = 1$ as that would require us to divide
by zero and that is never allowed.

Figure 4.12: Sketches of the curves in (a) Example 4.10, $y = 1/(x - 1)$, (b) Example 4.11,
$y = 1/(x - 1)^2$, and (c) Example 4.12, $y = (x - 1)^{2/3}$. Observe the behaviour of all
three of these curves at $x = 1$: in (a) and (b)
we have a vertical asymptote at $x = 1$ and in (c) we have a cusp at $x = 1$.
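
The cusp in Example 4.12 shows up in the gradient, which grows without bound in opposite directions on the two sides of $x = 1$. The sketch below evaluates $f'(x) = 2/(3(x - 1)^{1/3})$ near $x = 1$; the helper name `gradient` and the chosen step sizes are our own. A real cube root is built by hand because, in Python, raising a negative float to a fractional power gives a complex number.

```python
import math


def gradient(x):
    """f'(x) = 2 / (3 (x - 1)^(1/3)) for f(x) = (x - 1)^(2/3), with x != 1."""
    u = x - 1
    # Real cube root of u: take the root of |u| and give it the sign of u.
    cube_root = math.copysign(abs(u) ** (1 / 3), u)
    return 2 / (3 * cube_root)


left = [gradient(1 - 10**-k) for k in (1, 3, 6)]
right = [gradient(1 + 10**-k) for k in (1, 3, 6)]

print("gradients to the left of x = 1: ", left)
print("gradients to the right of x = 1:", right)
```

The gradients to the left are negative and growing in magnitude while those to the right are positive and growing, which is the behaviour we associate with a cusp.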

4.5 Optimisation

We have seen how to use derivatives to find and classify the stationary points of a
function and we have seen that a local maximum (or local minimum) is a point where
the function is larger (or smaller) than it is at other nearby points. However, we now
want to find the points, called a global maximum (or global minimum), where the
function is larger (or smaller) than it is at all other points. In such cases, we often say
that we are looking for the points where the function is optimised. We will see that
some functions do not have a global maximum (or a global minimum) even though they
may have a local maximum (or a local minimum).
In order to determine whether a function, $f(x)$, has a global maximum or a global
minimum, it is always useful to ask the following questions.
• Which local maximum gives the largest value of $f(x)$ and which local minimum
gives the smallest value of $f(x)$?
• What is the behaviour of $f(x)$ as $x \to \infty$ and as $x \to -\infty$?

Then, having answered these questions one should be in a position to identify the global
maximum with the largest value of $f$ and the global minimum with the smallest value
of $f$ assuming, of course, that these exist. Indeed, one way of making sense of these
questions and their answers is to sketch the relevant features of the curve $y = f(x)$ and
then, using this sketch, one can easily identify any global maximum or global
minimum that the function may have.
[4] We can see that these derivatives are undefined when $x = 1$ as that would require us to divide by
zero and that is never allowed. Moreover, observe that this function does not have a vertical tangent
line at $x = 1$ because to the left of $x = 1$ the gradient is tending to $-\infty$ and to the right of $x = 1$ the
gradient is tending to $\infty$.


For instance, consider the function whose graph is sketched in Figure 4.13(a) which has
two local maxima and two local minima. If we ask our questions about this function, we
see that:
• Comparing the relevant values, we see that the largest local maximum occurs when
$x = a$ and the smallest local minimum occurs when $x = b$.
• The function tends to zero as $x \to \pm\infty$.

So, in this case, it should be clear that the global maximum occurs when $x = a$ and the
global minimum occurs when $x = b$ as illustrated in Figure 4.13(b).

Figure 4.13: (a) A sketch of a function with two local maxima and two local minima
which tends to zero as $x \to \pm\infty$. (b) This function has a global maximum and a global
minimum as indicated.

However, if we have the function sketched in Figure 4.14(a) and ask our questions
about that, we see that:
• Comparing the relevant values, we see that the largest local maximum occurs when
$x = a$ and the smallest local minimum occurs when $x = b$.
• The function tends to zero in one direction but tends to $-\infty$ in the other.

In this case, as illustrated in Figure 4.14(b), it should be clear that the global maximum
still occurs when $x = a$ but now there is no global minimum since the function takes far
smaller values in the direction where it tends to $-\infty$ than it does at the smallest local
minimum.

Activity 4.7 Use the sketches in Figures 4.9(b), 4.10(b) and 4.11(b) to determine
whether the functions in Examples 4.7, 4.8 and 4.9 have any global maxima or
global minima.

So, in general, we can see that if $f : \mathbb{R} \to \mathbb{R}$ is a function that is differentiable for all
$x \in \mathbb{R}$, then
• its global maximum (or global minimum) can exist if the function is suitably
well-behaved as $x \to \infty$ and $x \to -\infty$; and
• if it exists, its global maximum (or global minimum) must occur at a local
maximum (or a local minimum).
But, having said this, a sketch is still the easiest way to see what is happening. We now
turn to some cases of optimisation where things work slightly differently.
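
The two questions above can be mimicked in a few lines of code. The function used here, $f(x) = x/(1 + x^2)$, is a hypothetical example of our own and does not appear in the guide; it has stationary points at $x = \pm 1$ and tends to zero as $x \to \pm\infty$. Sampling the function far from the origin is only a stand-in for its limiting behaviour, so this sketch suggests the answer rather than proving it.

```python
def f(x):
    """A hypothetical example: f(x) = x / (1 + x^2)."""
    return x / (1 + x**2)


# f'(x) = (1 - x^2) / (1 + x^2)^2, so the stationary points are x = -1 and x = 1.
stationary_values = {-1: f(-1), 1: f(1)}

# Sample far out in each direction as a stand-in for x -> -infinity and x -> infinity.
tail_values = [f(-(10**6)), f(10**6)]

best_max = max(stationary_values, key=stationary_values.get)
best_min = min(stationary_values, key=stationary_values.get)

# A candidate survives only if the function does not go beyond it in the tails.
has_global_max = all(t < stationary_values[best_max] for t in tail_values)
has_global_min = all(t > stationary_values[best_min] for t in tail_values)

print(f"largest stationary value at x = {best_max}, global maximum: {has_global_max}")
print(f"smallest stationary value at x = {best_min}, global minimum: {has_global_min}")
```

If the function instead tended to $-\infty$ in one direction, the tail comparison for the minimum would fail, matching the situation where no global minimum exists.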

Figure 4.14: (a) A sketch of a function with two local maxima and two local minima
which tends to zero in one direction and tends to $-\infty$ in the other. (b) This function has a
global maximum but no global minimum as indicated.

4.5.1 Constrained optimisation

Sometimes, it may be necessary to find the maximum (or minimum) value of $f(x)$ when
the values of $x$ are constrained (or restricted). In such cases, there will be some interval,
such as $x \ge a$ or $a \le x \le b$, and we need to find the maximum (or minimum) value of
$f(x)$ when $x$ can only take these values.
For instance, consider the function whose graph is sketched in Figure 4.15(a) which has
a local minimum and a local maximum in the interval $a \le x \le b$. In this case, we can
see that the maximum and minimum values of $f(x)$ for $x$ in this interval must occur at
one of the points marked on the sketch. And, by comparing the values of $f(x)$ at these
points we can see that the maximum occurs at the local maximum and the minimum
occurs at the local minimum as illustrated in Figure 4.15(b).

Figure 4.15: (a) A sketch of a function in the interval $a \le x \le b$ with a local maximum
and a local minimum. (b) This function has a maximum and a minimum as indicated;
here the maximum is at the local maximum and the minimum is at the local minimum.


However, suppose we have the function whose graph is sketched in Figure 4.16(a) which,
again, has a local minimum and a local maximum in the interval $a \le x \le b$. In this case,
we can again see that the maximum and minimum values of $f(x)$ for $x$ in this interval
must occur at one of the points marked on the sketch. And, by comparing the values of $f(x)$
at these points we can now see that the maximum occurs at the end-point $x = a$ and
the minimum occurs at the end-point $x = b$ as illustrated in Figure 4.16(b).

Figure 4.16: (a) A sketch of a function in the interval $a \le x \le b$ with a local maximum
and a local minimum. (b) This function has a maximum and a minimum as indicated;
here both occur at the end-points of the interval.

Activity 4.8 Use the sketches in Figures 4.9(b), 4.10(b) and 4.11(b) to find the
maximum and minimum values of the functions in Example 4.7 when $-3 \le x \le 5$,
Example 4.8 when $0 \le x \le 1$ and Example 4.9 when $0 \le x \le 3$.

So, in general, suppose that we have the interval $a \le x \le b$ and $f$ is a differentiable
function on this interval. In this case, the maximum (or minimum) value of $f(x)$ will
occur
• either at the local maximum (or local minimum) inside the interval that gives the
largest (or smallest) value of $f(x)$
• or at one of the end-points of the interval, i.e. at $x = a$ or $x = b$, if these give the
largest (or smallest) value of $f(x)$.
This means that we should find the value of $f(x)$ at any local maximum (or local
minimum) inside the interval and its value at the end-points of the interval, i.e. $f(a)$
and $f(b)$. Having done this, the maximum (or minimum) will be the largest (or
smallest) of these values of $f(x)$. But, of course, a sketch is still the easiest way to see
what is happening.
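
The procedure just described, comparing the values at interior stationary points with the values at the end-points, can be sketched as follows. The function is the cubic $f(x) = x^3 - 2x^2 - 15x$ from Example 4.7, but the interval $0 \le x \le 6$ and the helper name `extremes_on_interval` are our own choices for illustration. For simplicity the helper compares all stationary points inside the interval, which includes every local maximum and local minimum there.

```python
from fractions import Fraction


def f(x):
    return x**3 - 2 * x**2 - 15 * x


def extremes_on_interval(func, a, b, stationary_points):
    """Compare func at the end-points a, b and at stationary points strictly inside (a, b)."""
    candidates = [a, b] + [s for s in stationary_points if a < s < b]
    values = {x: func(x) for x in candidates}
    x_max = max(values, key=values.get)
    x_min = min(values, key=values.get)
    return (x_max, values[x_max]), (x_min, values[x_min])


# The stationary points of f are x = -5/3 and x = 3; only x = 3 lies in 0 <= x <= 6.
maximum, minimum = extremes_on_interval(f, 0, 6, [Fraction(-5, 3), Fraction(3)])

print("maximum (x, f(x)):", maximum)
print("minimum (x, f(x)):", minimum)
```

On this interval the maximum occurs at the end-point $x = 6$ while the minimum occurs at the interior local minimum $x = 3$, so both possibilities in the rule above appear in one example.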

4.5.2

What happens when differentiability fails?

Our discussion of optimisation has assumed that the function in question is
differentiable for all relevant values of $x$, whether that means $x \in \mathbb{R}$ or values of $x$ inside
some interval. However, it is important to note that even if the function is not
differentiable at some relevant value(s) of $x$, we may still find that the maximum (or
minimum) value of the function occurs at such a point.
For instance, in Sections 3.3.4 and 4.4.3, we considered some ways in which a function
could fail to be differentiable at a point. Using these as a guide, we can consider the
three functions illustrated in Figure 4.17 which all fail to be differentiable at x = 1.
However, despite this, we see that in all three cases the global maximum of the function
occurs at $x = 1$ even though none of these points is a local maximum (see footnote 5).
[Figure 4.17, panels: (a) discontinuous, (b) corner, (c) cusp]

Figure 4.17: Three functions which are not differentiable at $x = 1$ because (a) the function
is discontinuous at $x = 1$, (b) the function has a corner at $x = 1$ and (c) the function has
a cusp at $x = 1$.

Also, thinking about what we saw in Section 4.4.3, the presence of a vertical asymptote
may also mean that a global maximum or global minimum does not exist. Of course, as
we saw above, a sketch should enable us to see what is happening in any of these cases.
Activity 4.9 Consider the curves sketched in Figures 4.12(a) and (b). Does either of
these curves have a global minimum or a global maximum?
Now suppose that we are only interested in these curves for values of $x$ in the
interval $0 \le x < 1$. Does either of these curves have a maximum or a minimum?

4.5.3 Applications of optimisation

Optimisation problems are very common in economics and we now introduce two ways
in which they can arise in that subject. The first is their use when a firm wants to find
the level of production which maximises its profit; and the second is when a government
wants to find the level of taxation which maximises the revenue generated by a tax that
has been imposed on a market.
5 That is, in all three cases, as $f'(1)$ does not exist it certainly can't be equal to zero!

Profit maximisation
When a firm sells an amount, $q$, it makes a profit given by
$$\pi(q) = R(q) - C(q),$$
where $R(q)$ is the revenue generated by selling this amount and $C(q)$ is the cost of
producing this amount. Obviously, when doing this, the firm will want to sell an
amount $q$ that will maximise its profit. Indeed, whereas the costs involved are
determined by factors intrinsic to the firm, the revenue generated is given by
$$R(q) = pq,$$
where $p$, the price per unit, is determined by the market the firm is selling in.

As an example, consider the case where the firm is a monopoly, i.e. it is the only
supplier of this product to the market. Indeed, as it is the only supplier and the
amount it is supplying is $q$, the price that the consumers will be willing to pay for
this is given by $p = p^D(q)$ where $p^D(q)$ is, as in Section 2.1.5, the inverse demand
function of the market. As such, in this case, the revenue generated by the sale of an
amount $q$ is given by
$$R(q) = q\,p^D(q),$$
and this will yield a profit of
$$\pi(q) = q\,p^D(q) - C(q).$$
Thus, in the case of a monopoly, given the firm's cost function and the inverse demand
function for the market, we should be able to determine the amount, $q$, that the firm
should be selling by finding the value of $q$ that maximises the firm's profit. Let's look at
an example.
Example 4.13 Suppose that a firm is a monopoly with a cost function given by
$$C(q) = q^3 - 10q^2 + 25q + 10,$$
and the inverse demand function for this good is
$$p^D(q) = 10 - q.$$
Find the value of $q$ that will maximise the firm's profit.
This is a constrained optimisation problem as we must have
$q \ge 0$ as $q$ denotes the amount of good being sold, and
$q \le 10$ as, otherwise, the price that the consumers will pay will be negative.

So, we need to maximise the firm's profit, i.e.
$$\pi(q) = q\,p^D(q) - C(q) = q(10 - q) - (q^3 - 10q^2 + 25q + 10) = -q^3 + 9q^2 - 15q - 10,$$
given that $q$ is in the interval given by $0 \le q \le 10$.
To do this, we note that $\pi'(q)$ is given by
$$\pi'(q) = -3q^2 + 18q - 15,$$
and so, as the stationary points occur when $\pi'(q) = 0$, we solve the equation
$$-3q^2 + 18q - 15 = 0 \implies q^2 - 6q + 5 = 0 \implies (q - 1)(q - 5) = 0,$$
to see that the stationary points occur when $q = 1$ and $q = 5$. We can then see that
$$\pi''(q) = -6q + 18,$$
which, using the second-derivative test, tells us that when:
$q = 1$, we have $\pi''(1) = 12 > 0$ and so this is a local minimum.
$q = 5$, we have $\pi''(5) = -12 < 0$ and so this is a local maximum.

This means that the point we seek, i.e. the maximum of the profit function, must
occur at $q = 5$ or at one of the two end-points of our interval. But, using the profit
function, we see that
$$\pi(0) = -10, \quad \pi(5) = 15 \quad \text{and} \quad \pi(10) = -260,$$
which means that the maximum occurs at $q = 5$ because it yields the largest profit.
Thus, $q = 5$ will maximise the firm's profit.
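
As a numerical cross-check of Example 4.13, the following sketch recomputes the stationary points of the profit function $\pi(q) = -q^3 + 9q^2 - 15q - 10$ and compares the profit at the local maximum with the profit at the end-points. The function and variable names are illustrative choices, not notation from this guide.

```python
import math


def profit(q):
    # pi(q) = q(10 - q) - (q^3 - 10q^2 + 25q + 10), expanded.
    return -q**3 + 9 * q**2 - 15 * q - 10


def profit_second(q):
    # Second derivative, used for the second-derivative test.
    return -6 * q + 18


# Stationary points: roots of pi'(q) = -3q^2 + 18q - 15 by the quadratic formula.
a, b, c = -3.0, 18.0, -15.0
root_disc = math.sqrt(b * b - 4 * a * c)
stationary = sorted([(-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a)])

# Candidates for the maximum on 0 <= q <= 10: the end-points and any local maximum.
candidates = [0.0, 10.0] + [q for q in stationary if profit_second(q) < 0]
best_q = max(candidates, key=profit)

print("stationary points:", stationary)
print("profit at candidates:", [(q, profit(q)) for q in candidates])
print("profit-maximising q:", best_q)
```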
Activity 4.10 Sketch the profit function from Example 4.13 to verify that q = 5
does indeed give a maximum. (Do not try to find the q-intercepts here.)
Maximising tax revenue
In Section 2.1.5, we saw how the supply and demand functions for a market are
modified if a tax is imposed. We are now in a position to see what level of tax should be
imposed if the government wants to maximise its tax revenue. For instance, if an excise
tax of $T$ per unit is imposed, then the government's tax revenue, $R(T)$, is given by the
tax per unit multiplied by the number of units sold at equilibrium, i.e.
$$R(T) = q_T T,$$
where $q_T$ is the equilibrium quantity in the presence of the tax. Of course, we can then
use this to find the value of $T$, say $T^*$, that maximises this tax revenue. Let's look at an
example.
Example 4.14 In Example 2.7, we saw how the introduction of an excise tax
affected the market in Example 2.6 and that the maximum tax that can be imposed
is given by $T_m = 4$. What excise tax, $T^*$, should be imposed if the government wants
to maximise its tax revenue, $R(T)$, from this market? Sketch a graph of the tax
revenue, $R(T)$, against $T$ and comment on the relationship between the values of $T_m$
and $T^*$.
This is a constrained optimisation problem as we must have
$T \ge 0$ as $T$ is the tax per unit, and
$T \le T_m$ as, otherwise, the market will cease to function.


So, we need to maximise the tax revenue generated by the tax, $R(T)$, i.e.
$$R(T) = q_T T = \left(2 - \frac{T}{2}\right) T = -\frac{T^2}{2} + 2T,$$
given that $T$ is in the interval given by $0 \le T \le T_m$ with $T_m = 4$.
To do this, we note that $R'(T)$ is given by
$$R'(T) = -T + 2,$$
and so, as the stationary point occurs when $R'(T) = 0$, we see that we have a
stationary point when $T = 2$. We can then see that $R''(T) = -1 < 0$ which, using
the second-derivative test, tells us that this stationary point is a maximum. This
means that the point we seek, i.e. the maximum of the tax revenue function, must
occur at $T = 2$ or at one of the two end-points of our interval. But, using the tax
revenue function, we see that
$$R(0) = 0, \quad R(2) = 2 \quad \text{and} \quad R(4) = 0,$$
which means that the maximum occurs at $T = 2$ because it yields the largest tax
revenue. Thus, we take $T^* = 2$ and, as in the sketch in Figure 4.18, we find that $T^*$
is half-way between no tax (i.e. $T = 0$) and the maximum tax, $T_m = 4$.
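
The arithmetic in Example 4.14 can be checked with a short sketch. It assumes the tax revenue function $R(T) = -T^2/2 + 2T$ on $0 \le T \le 4$ as in the example; the names `tax_revenue`, `T_max` and `T_star` are illustrative only.

```python
def tax_revenue(T):
    # R(T) = q_T * T with q_T = 2 - T/2, expanded.
    return -T**2 / 2 + 2 * T


T_max = 4.0   # the maximum tax T_m from Example 2.7
T_star = 2.0  # the stationary point, from R'(T) = -T + 2 = 0

# Compare the revenue at the end-points and at the stationary point.
values = {T: tax_revenue(T) for T in (0.0, T_star, T_max)}
best_T = max(values, key=values.get)

print("revenue at candidates:", values)
print("revenue-maximising tax:", best_T)
print("half-way between 0 and T_m:", T_max / 2)
```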
Figure 4.18: A sketch of the tax revenue, $R(T)$, generated by an excise tax of $T$ for Example 4.14.

Notice how, in the presence of an excise tax, the tax revenue is maximised at a value of
T half-way between no tax (i.e. T = 0) and the maximum tax that can be imposed (i.e.
Tm = 4).
Of course, if a percentage [of the price] tax of $100r\%$ is imposed, then the government's
tax revenue, $R(r)$, would be given by the tax per unit, $r p_r$, multiplied by the number of
units sold at equilibrium, i.e.
$$R(r) = r p_r q_r,$$
where $p_r$ and $q_r$ are the equilibrium price and quantity in the presence of the tax. Of
course, we can also use this to find the value of $r$, say $r^*$, that maximises this tax
revenue. See, for example, Exercise 4.5.

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:


use first and second-order derivatives to identify the relevant features of a function;
sketch curves by identifying their key features;
optimise functions of one variable;
solve problems from economics-based subjects that involve optimisation.

Solutions to activities

Solution to activity 4.1


A concave function on an interval, $I$, has $f''(x) < 0$ for all $x \in I$. So, if we take any
particular point, say $a \in I$, the tangent line to $f$ at $x = a$ has an equation given by
$$y = f(a) + (x - a) f'(a),$$
and so, our second-order Taylor approximation can be written as
$$f(x) \simeq y + \frac{(x - a)^2}{2} f''(a).$$
Now, as $f''(a) < 0$ (recall that $a \in I$ too), we see that $f(x) < y$ for all $x \in I$ where
$x \ne a$, i.e. these values of $f$ always lie below the values from the tangent line to $f$ at
$x = a$, as illustrated in Figure 4.19(a). But, of course, we can use any $a \in I$ when we
run this argument and so a concave function is one which lies below all its tangent lines,
as illustrated in Figure 4.19(b). In particular, a function must be concave in the
neighbourhood of a local maximum.
[Figure 4.19, panels: (a) $f$ lies below the tangent at $a \in I$, (b) $f$ lies below all its tangent lines]

Figure 4.19: The relationship between a concave function and its tangent lines. (a) When
changing the value of $x$, we can see that the values of $f(x)$ are less than the corresponding
values of $y$ from the tangent line to $f$ at $a$, i.e. $f$ lies below this tangent line. (b) By
changing the value of $a$, we can see that $f$ lies below all of its tangent lines.

Solution to activity 4.2
For $f(x) = x^4 - 1$, we see that
$$f'(x) = 4x^3 \quad \text{and} \quad f''(x) = 12x^2,$$
which means, in particular, that $f'(0) = 0$. Then, looking at the first-order derivative,
we see that $f'(x) < 0$ for $x < 0$ and $f'(x) > 0$ for $x > 0$ which means that the function
is decreasing for $x < 0$ and then increasing for $x > 0$ as shown in Figure 4.8(a). Or,
looking at the second-order derivative, we see that $f''(x) > 0$ for all $x \ne 0$ and so the
function is convex as shown in Figure 4.8(a).
For $f(x) = 1 - x^4$, we see that
$$f'(x) = -4x^3 \quad \text{and} \quad f''(x) = -12x^2,$$
which means, in particular, that $f'(0) = 0$. Then, looking at the first-order derivative,
we see that $f'(x) > 0$ for $x < 0$ and $f'(x) < 0$ for $x > 0$ which means that the function
is increasing for $x < 0$ and then decreasing for $x > 0$ as shown in Figure 4.8(b). Or,
looking at the second-order derivative, we see that $f''(x) < 0$ for all $x \ne 0$ and so the
function is concave as shown in Figure 4.8(b).
Solution to activity 4.3
Suppose that $f(x)$ is a polynomial and that, for some constants $a \ne 0$ and $n \in \mathbb{N}$, the term
in this polynomial with the highest power of $x$ is $ax^n$. We see that, as $x \to \infty$, we have
$f(x) \to \infty$ if $a > 0$ as $ax^n \to \infty$, and
$f(x) \to -\infty$ if $a < 0$ as $ax^n \to -\infty$,

regardless of whether $n$ is even or odd. However, as $x \to -\infty$, we have

$f(x) \to \infty$ if $a > 0$ and $n$ is even as $ax^n \to \infty$,
$f(x) \to -\infty$ if $a > 0$ and $n$ is odd as $ax^n \to -\infty$,
$f(x) \to -\infty$ if $a < 0$ and $n$ is even as $ax^n \to -\infty$, and
$f(x) \to \infty$ if $a < 0$ and $n$ is odd as $ax^n \to \infty$,

where now, it does matter whether $n$ is even or odd.


Solution to activity 4.4
In Example 4.8, we found that
$$f''(x) = 24x^2 - 24x + 4,$$
and so we begin our search for points of inflection by seeing where $f''(x) = 0$. That is,
we solve the equation
$$24x^2 - 24x + 4 = 0 \implies 6x^2 - 6x + 1 = 0 \implies x = \frac{6 \pm \sqrt{12}}{12} = \frac{1}{2} \pm \frac{1}{\sqrt{12}},$$
if we use the quadratic formula. Now, if $f''(x)$ also changes sign at these values of $x$ we
have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = 24x^2 - 24x + 4 = 24\left[x - \left(\frac{1}{2} - \frac{1}{\sqrt{12}}\right)\right]\left[x - \left(\frac{1}{2} + \frac{1}{\sqrt{12}}\right)\right] = 24(x - a)(x - b),$$
if we let $x = a$ and $x = b$ denote the smaller and the larger values of $x$ we are interested
in respectively. This means that, considering the signs of the two factors, we have
$$\begin{array}{c|ccccc}
 & x < a & x = a & a < x < b & x = b & x > b \\ \hline
x - a & - & 0 & + & + & + \\
x - b & - & - & - & 0 & + \\ \hline
f''(x) & + & 0 & - & 0 & +
\end{array}$$
so we see that $f''(x)$ does indeed change sign at $x = a$ and $x = b$. Consequently, both
$x = a$ and $x = b$ where
$$a = \frac{1}{2} - \frac{1}{\sqrt{12}} \quad \text{and} \quad b = \frac{1}{2} + \frac{1}{\sqrt{12}},$$
are points of inflection of the function in Example 4.8.
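
The sign-change argument can also be checked numerically. The sketch below is an assumption-laden illustration rather than part of the solution: it evaluates the second derivative $24x^2 - 24x + 4$ quoted from Example 4.8 slightly to either side of each candidate point, using a step size `h` chosen arbitrarily.

```python
import math


def second_derivative(x):
    # f''(x) as quoted from Example 4.8.
    return 24 * x**2 - 24 * x + 4


# Candidate points from the quadratic formula.
a = 0.5 - 1 / math.sqrt(12)
b = 0.5 + 1 / math.sqrt(12)


def changes_sign(func, x, h=1e-3):
    # True if func takes opposite signs just below and just above x.
    return func(x - h) * func(x + h) < 0


for name, point in (("a", a), ("b", b)):
    print(name, round(point, 4), "sign change:", changes_sign(second_derivative, point))
```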


Solution to activity 4.5
In Example 4.9, we found that
$$f''(x) = (6x - 6x^2 + x^3)\,e^{-x},$$
and so we begin our search for points of inflection by seeing where $f''(x) = 0$. That is,
we solve the equation
$$(6x - 6x^2 + x^3)\,e^{-x} = 0 \implies x(6 - 6x + x^2)\,e^{-x} = 0,$$
which, as $e^{-x} \ne 0$ for all $x \in \mathbb{R}$, gives us $x = 0$ which we have already analysed in the
example and
$$x = \frac{6 \pm \sqrt{12}}{2} = 3 \pm \sqrt{3},$$
if we use the quadratic formula. Now, if $f''(x)$ also changes sign at these values of $x$ we
have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = (6x - 6x^2 + x^3)\,e^{-x} = x\left(x - 3 + \sqrt{3}\right)\left(x - 3 - \sqrt{3}\right) e^{-x} = x(x - a)(x - b)\,e^{-x},$$
if we let $x = a$ and $x = b$ denote the smaller and the larger values of $x$ we are interested
in respectively. This means that, considering the signs of the four factors, we have
$$\begin{array}{c|ccccc}
 & 0 < x < a & x = a & a < x < b & x = b & x > b \\ \hline
x & + & + & + & + & + \\
x - a & - & 0 & + & + & + \\
x - b & - & - & - & 0 & + \\
e^{-x} & + & + & + & + & + \\ \hline
f''(x) & + & 0 & - & 0 & +
\end{array}$$
so we see that $f''(x)$ does indeed change sign at $x = a$ and $x = b$. Consequently, both
$x = a$ and $x = b$ where
$$a = 3 - \sqrt{3} \quad \text{and} \quad b = 3 + \sqrt{3},$$
are also points of inflection of the function in Example 4.9.
Solution to activity 4.6
Here $f(x) = x^2 e^x$ and we find the key features of the curve $y = f(x)$, namely

x-intercepts: These occur when $y = 0$ and so we solve the equation given by
$f(x) = 0$, i.e.
$$x^2 e^x = 0.$$
But, as $e^x \ne 0$ for all $x \in \mathbb{R}$, we find that the only x-intercept occurs when $x = 0$.
y-intercept: This occurs when $x = 0$ and so using $y = f(0)$ we see that the
y-intercept occurs when $y = 0$. Note, in particular, that this means that the curve
goes through the origin (as we should have expected since the x-intercept we found
occurs when $x = 0$).

finding the stationary points: These occur when $f'(x) = 0$ and so, using the
product rule, we get
$$f'(x) = (2x)(e^x) + (x^2)(e^x) = x(2 + x)\,e^x,$$
and so we solve the equation
$$x(2 + x)\,e^x = 0.$$
But, as $e^x \ne 0$ for all $x \in \mathbb{R}$, we find that the stationary points occur when $x = 0$
and $x = -2$. Then, we use $y = f(x)$ to find the values of $y$ at these points so that
we can locate them on the sketch. Doing this, we find that
$x = 0$ gives $y = f(0) = (0)^2 e^0 = (0)(1) = 0$, and
$x = -2$ gives $y = f(-2) = (-2)^2 e^{-2} = 4e^{-2}$.

So, the stationary points have coordinates given by $(0, 0)$ and $(-2, 4e^{-2})$.
classifying the stationary points: Let's use the second-order derivative test here. We
can use the product rule again to see that
$$f''(x) = (2 + 2x)(e^x) + (2x + x^2)(e^x) = (2 + 4x + x^2)\,e^x,$$
and so, looking at the stationary points, we have
$f''(0) = (2)\,e^0 > 0$ and so $(0, 0)$ is a local minimum, and
$f''(-2) = (-2)\,e^{-2} < 0$ and so $(-2, 4e^{-2})$ is a local maximum.

limiting behaviour: Using the fact in Section 4.4.2, we would expect the $e^x$ to
dominate and this would mean that $f(x) \to \infty$ as $x \to \infty$ whereas, as $x \to -\infty$, we
would expect $f(x) \to 0$ as $e^x \to 0$.

Then, using this information, we begin to sketch this curve by roughly indicating these
key features on some axes as in Figure 4.20(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 4.20(b).

[Figure 4.20, panels: (a) The key features, (b) The sketch]

Figure 4.20: Sketching the curve $y = x^2 e^x$. (a) Using what we have discovered about the
key features of the curve, we can begin to see what it must look like. (b) By joining up
these key features with a nice smooth curve, we get the sketch itself.

To find the points of inflection of this function, we start by seeing where $f''(x) = 0$.
That is, we solve the equation
$$(2 + 4x + x^2)\,e^x = 0,$$
which, as $e^x \ne 0$ for all $x \in \mathbb{R}$, gives us
$$x = \frac{-4 \pm \sqrt{8}}{2} = -2 \pm \sqrt{2},$$
if we use the quadratic formula. Now, if $f''(x)$ also changes sign at these values of $x$ we
have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = (2 + 4x + x^2)\,e^x = \left(x + 2 + \sqrt{2}\right)\left(x + 2 - \sqrt{2}\right) e^x = (x - a)(x - b)\,e^x,$$
if we let $x = a$ and $x = b$ denote the smaller and the larger values of $x$ we are interested
in respectively. This means that, considering the signs of the three factors, we have
$$\begin{array}{c|ccccc}
 & x < a & x = a & a < x < b & x = b & x > b \\ \hline
x - a & - & 0 & + & + & + \\
x - b & - & - & - & 0 & + \\
e^x & + & + & + & + & + \\ \hline
f''(x) & + & 0 & - & 0 & +
\end{array}$$
so we see that $f''(x)$ does indeed change sign at $x = a$ and $x = b$. Consequently, both
$x = a$ and $x = b$ where
$$a = -2 - \sqrt{2} \quad \text{and} \quad b = -2 + \sqrt{2},$$
are points of inflection of the function $f(x) = x^2 e^x$.
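
The key features found for $y = x^2 e^x$ can be spot-checked numerically. The sketch below is illustrative only: it takes the derivatives $x(2 + x)e^x$ and $(2 + 4x + x^2)e^x$ as given by the product rule, and the offsets of $0.1$ used to sample either side of each point are arbitrary choices.

```python
import math


def f(x):
    return x**2 * math.exp(x)


def f_prime(x):
    return x * (2 + x) * math.exp(x)


def f_second(x):
    return (2 + 4 * x + x**2) * math.exp(x)


# Stationary points and their classification by the second-derivative test.
for x in (0.0, -2.0):
    kind = "local minimum" if f_second(x) > 0 else "local maximum"
    print("x =", x, "f'(x) =", f_prime(x), "y =", f(x), "->", kind)

# Candidate points of inflection and the sign of f'' on either side of them.
a = -2 - math.sqrt(2)
b = -2 + math.sqrt(2)
print("f'' left of a, between a and b, right of b:",
      f_second(a - 0.1), f_second(-2.0), f_second(b + 0.1))
```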
Solution to activity 4.7
Looking at the figures in question, we have:
Using Figure 4.9(b) we see that the function in Example 4.7 has neither a global
maximum (as $f(x) \to \infty$ as $x \to \infty$) nor a global minimum (as $f(x) \to -\infty$ as
$x \to -\infty$).
Using Figure 4.10(b) we see that the function in Example 4.8 has a global
minimum of zero when $x = 0$ and $x = 1$ but no global maximum as $f(x) \to \infty$ as
$x \to \infty$ or $x \to -\infty$.
Using Figure 4.11(b) we see that the function in Example 4.9 has a global
maximum of $27e^{-3}$ when $x = 3$ but no global minimum as $f(x) \to -\infty$ as
$x \to -\infty$.


Solution to activity 4.8


Looking at the figures in question, we have:
Using Figure 4.9(b) when $-3 \le x \le 5$ we see that the function in Example 4.7 has
a maximum value of $400/27$ when $x = -5/3$ and a minimum value of $-36$ when
$x = 3$.

Using Figure 4.10(b) when $0 \le x \le 1$ we see that the function in Example 4.8 has a
maximum value of $1/8$ when $x = 1/2$ and a minimum value of zero when $x = 0$ and
$x = 1$.
Using Figure 4.11(b) when $0 \le x \le 3$ we see that the function in Example 4.9 has a
maximum value of $27e^{-3}$ when $x = 3$ and a minimum value of zero when $x = 0$.
Solution to activity 4.9
Looking at the figures in question, we have:
Using Figure 4.12(a) we see that the function has neither a global maximum (as
$f(x) \to \infty$ as $x \to 1^+$) nor a global minimum (as $f(x) \to -\infty$ as $x \to 1^-$).
Using Figure 4.12(b) we see that the function has neither a global maximum (as
$f(x) \to \infty$ as $x \to 1$) nor a global minimum (as, even though $f(x) \to 0$ as $x \to \infty$
or $x \to -\infty$, it never gets there).

Now, restricting our attention to $0 \le x < 1$, we can see from the figures that:

Using Figure 4.12(a) we see that the function has a maximum value of 1 when
$x = 0$ but no minimum value as $f(x) \to -\infty$ as $x \to 1^-$.
Using Figure 4.12(b) we see that the function has a minimum value of 1 when
$x = 0$ but no maximum value as $f(x) \to \infty$ as $x \to 1^-$.
Solution to activity 4.10
Using the information in Example 4.13 and noting that $\pi(1) = -17$, we get the sketch in
Figure 4.21 if we are allowed to omit the q-intercepts. From this, we can clearly see that
$q = 5$ gives us the maximum value of $\pi(q)$ for $0 \le q \le 10$.

Figure 4.21: A sketch of the profit function, $\pi(q)$, from Example 4.13 for Activity 4.10. (Note
that, as instructed, we have not found the q-intercepts of this profit function.)

Exercises
Exercise 4.1
For what values of $x$ is the function
$$f(x) = x e^x$$
increasing or decreasing? Use this information to find and classify any stationary points
of this function.
For what values of $x$ is this function convex or concave? Use this information to
determine whether this function has any points of inflection.
Exercise 4.2
Consider the function
$$f(x) = -12 \ln x - x^2 + 10x,$$
where $x > 0$. Find the x-coordinates of the stationary points of $f(x)$ and classify them.
Exercise 4.3
Sketch the curve $y = f(x)$ where
$$f(x) = x^3 + \frac{1}{x^3},$$
for $x \ne 0$. Does this function have a global maximum or a global minimum?


Exercise 4.4
Consider the function given by
$$f(x) = 3x^5 - 25x^3 + 60x.$$
(a) Show that the curve $y = f(x)$ has only one x-intercept and find it.
(b) Find the stationary points of this function and classify them.
(c) Sketch the curve $y = f(x)$.
(d) If the domain of $f$ is restricted to values of $x$ such that $-2 \le x \le 2$, identify the
global maximum and the global minimum of the function $f(x)$.
What are the global maximum and the global minimum of the function if the
domain of $f$ is restricted to values of $x$ such that $-3 \le x \le 3$?


Exercise 4.5
In Exercise 2.3 we saw how the introduction of a percentage [of the price] tax of $100r\%$
affected a market and we found that the maximum tax that can be imposed is given by
$r_m = 1/2$.
What tax, $r^*$, should be imposed if the government wants to maximise its tax revenue,
$R(r)$, from this market? Sketch the graph of the tax revenue function, $R(r)$, for values
of $r$ that make economic sense.

Solutions to exercises
Solution to exercise 4.1
Using the product rule, we see that the derivative of the function $f(x) = x e^x$ is given by
$$f'(x) = (1)\,e^x + x(e^x) = (1 + x)\,e^x,$$
and so, as $e^x > 0$ for all $x \in \mathbb{R}$, we see that the function is
decreasing when $x < -1$ as $f'(x) < 0$, and
increasing when $x > -1$ as $f'(x) > 0$.

In particular, we have a stationary point at $x = -1$ as this makes $f'(x) = 0$ and,
furthermore, this stationary point is a local minimum as $f(x)$ is decreasing before it and
increasing after it.
Using the product rule again, we see that the second-order derivative of the function is
given by
$$f''(x) = (1)\,e^x + (1 + x)(e^x) = (2 + x)\,e^x,$$
and so, as $e^x > 0$ for all $x \in \mathbb{R}$, we see that the function is
concave when $x < -2$ as $f''(x) < 0$, and
convex when $x > -2$ as $f''(x) > 0$.

In particular, we have a point of inflection at $x = -2$ as this makes $f''(x) = 0$ and $f''(x)$
changes sign at this point (i.e. the function changes from being concave to being convex
at this point).
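
One way to gain confidence in derivatives obtained by the product rule is to compare them with a finite-difference estimate. The sketch below does this for $f(x) = x e^x$; the helper `central_difference`, its step size and the sample points are assumptions made for illustration and are not part of the solution.

```python
import math


def f(x):
    return x * math.exp(x)


def f_prime(x):
    return (1 + x) * math.exp(x)


def f_second(x):
    return (2 + x) * math.exp(x)


def central_difference(func, x, h=1e-5):
    # Symmetric finite-difference estimate of the derivative of func at x.
    return (func(x + h) - func(x - h)) / (2 * h)


for x in (-3.0, -2.0, -1.0, 0.0, 1.0):
    print(x,
          "f':", f_prime(x), "vs", central_difference(f, x),
          "f'':", f_second(x), "vs", central_difference(f_prime, x))
```

The printed pairs should agree to several decimal places, and the signs of the two derivatives either side of $x = -1$ and $x = -2$ can be read off from the same output.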
Solution to exercise 4.2
For $x > 0$, we have
$$f(x) = -12 \ln(x) - x^2 + 10x \implies f'(x) = -\frac{12}{x} - 2x + 10.$$
The stationary points of $f(x)$ occur when $f'(x) = 0$ and so we have to solve the equation
$$-\frac{12}{x} - 2x + 10 = 0 \implies \frac{12 + 2x^2 - 10x}{x} = 0 \implies x^2 - 5x + 6 = 0 \implies (x - 2)(x - 3) = 0.$$
Thus the x-coordinates of the stationary points of $f(x)$ are $x = 2$ and $x = 3$.
To classify them, we note that
$$f'(x) = -12x^{-1} - 2x + 10 \implies f''(x) = 12x^{-2} - 2 = \frac{12}{x^2} - 2,$$
which tells us that when
$x = 2$, we have $f''(2) = 3 - 2 = 1 > 0$ and so this is a local minimum.
$x = 3$, we have $f''(3) = \frac{4}{3} - 2 = -\frac{2}{3} < 0$ and so this is a local maximum.

Thus, the function, $f(x)$, has a local minimum when $x = 2$ and a local maximum when
$x = 3$.
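
The classification can be cross-checked numerically. The sketch below assumes the reading $f(x) = -12 \ln x - x^2 + 10x$, which is the one consistent with stationary points at $x = 2$ and $x = 3$ and with the second-derivative values used in this solution; the function names are illustrative.

```python
import math


def f(x):
    # Assumed form of the function: f(x) = -12 ln(x) - x^2 + 10x for x > 0.
    return -12 * math.log(x) - x**2 + 10 * x


def f_prime(x):
    return -12 / x - 2 * x + 10


def f_second(x):
    return 12 / x**2 - 2


for x in (2.0, 3.0):
    kind = "local minimum" if f_second(x) > 0 else "local maximum"
    print("x =", x, "f'(x) =", f_prime(x), "f''(x) =", f_second(x), "->", kind)
```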
Solution to exercise 4.3
To sketch the curve $y = f(x)$ where
$$f(x) = x^3 + \frac{1}{x^3} = x^3 + x^{-3},$$
for $x \ne 0$, we find its key features, namely

x-intercepts: These occur when $y = 0$ and so we solve the equation given by
$$x^3 + \frac{1}{x^3} = 0 \implies \frac{x^6 + 1}{x^3} = 0 \implies x^6 + 1 = 0.$$
But, as $x^6 + 1 > 0$ for all $x \in \mathbb{R}$, we find that this equation has no solutions and so
the curve has no x-intercepts.
y-intercept: This occurs when $x = 0$ and so, as the function is not defined when
$x = 0$, we find that the curve has no y-intercepts.
finding the stationary points: These occur when $f'(x) = 0$ and so, as
$$f'(x) = 3x^2 - 3x^{-4},$$
we have to solve the equation
$$3x^2 - 3x^{-4} = 0 \implies x^2 = \frac{1}{x^4} \implies x^6 = 1,$$
i.e. the stationary points of $f(x)$ occur when $x = \pm 1$. Then, we use $y = f(x)$ to find
the values of $y$ at these points so that we can locate them on the sketch. Doing
this, we find that
$x = 1$ gives $y = f(1) = 1 + 1 = 2$, and
$x = -1$ gives $y = f(-1) = -1 + (-1) = -2$.

So, the stationary points have coordinates given by $(1, 2)$ and $(-1, -2)$.
classifying the stationary points: The second-order derivative of the function is
$$f''(x) = 6x + 12x^{-5} = 6x + \frac{12}{x^5},$$
and so, looking at the stationary points, we have
$f''(1) = 6 + 12 = 18 > 0$ and so $(1, 2)$ is a local minimum.
$f''(-1) = -6 + (-12) = -18 < 0$ and so $(-1, -2)$ is a local maximum.

limiting behaviour: We see that $f(x) \to \infty$ as $x \to \infty$ and $f(x) \to -\infty$ as
$x \to -\infty$. (Note, in particular, that $1/x^3 \to 0$ as $x \to \pm\infty$ and so the limiting
behaviour is determined by the $x^3$ term in $f(x)$.)
In this case, we must also look at what the function is doing near $x = 0$ as it is
undefined there. Indeed, here, because of the $1/x^3$ term in $f(x)$, we have

$f(x) \to \infty$ as $x \to 0^+$, and
$f(x) \to -\infty$ as $x \to 0^-$,

i.e. the curve $y = f(x)$ has a vertical asymptote when $x = 0$. Consequently, using this
information, we can get the sketch in Figure 4.22.

Figure 4.22: A sketch of the curve $y = f(x) = x^3 + \frac{1}{x^3}$ from Exercise 4.3.

Indeed, using this sketch, we can clearly see that this function has neither a global
minimum nor a global maximum. In particular, notice that the local minimum is not
global because our local maximum gives us a smaller value of $f(x)$ and the local
maximum is not global since the local minimum gives us a larger value of $f(x)$!
Solution to exercise 4.4
(a) To find the x-intercepts of the curve $y = f(x)$ we set $y = 0$ and solve the equation
$$3x^5 - 25x^3 + 60x = 0 \implies x(3x^4 - 25x^2 + 60) = 0 \implies x = 0 \text{ or } 3x^4 - 25x^2 + 60 = 0.$$
To deal with this second possibility, we notice that we have a quadratic equation in $x^2$
and so, if we were to use the quadratic formula (say), we get
$$x^2 = \frac{25 \pm \sqrt{25^2 - 4(3)(60)}}{2(3)}.$$
But here, the discriminant is negative as
$$25^2 - 4(3)(60) = 625 - 720 = -95,$$
and so this equation gives us no solutions for $x^2$ and, hence, no solutions for $x$. Thus,
the only solution to $y = 0$ is $x = 0$ and this is, therefore, the only x-intercept of the
curve $y = f(x)$.
(b) The stationary points occur when $f'(x) = 0$ and so, as
$$f'(x) = 15x^4 - 75x^2 + 60,$$
we have to solve the equation
$$15x^4 - 75x^2 + 60 = 0 \implies x^4 - 5x^2 + 4 = 0.$$
This is also a quadratic equation in $x^2$ and, if we factorise (say), we get
$$(x^2 - 4)(x^2 - 1) = 0 \implies x^2 = 4, 1,$$
and this, in turn, gives us $x = \pm 2$ and $x = \pm 1$ as the x-coordinates of the stationary
points.
To classify these stationary points, we find the second derivative of $f(x)$, i.e.
$$f''(x) = 60x^3 - 150x = 30x(2x^2 - 5),$$
and we can see that
If $x = -2$, we have
$$f''(-2) = 30(-2)(2(-2)^2 - 5) = -60(8 - 5) = -180 < 0,$$
and so this is a local maximum. At this point we also have
$$y = f(-2) = 3(-2)^5 - 25(-2)^3 + 60(-2) = -96 + 200 - 120 = -16,$$
and so the coordinates of this point are $(-2, -16)$.
If $x = -1$, we have
$$f''(-1) = 30(-1)(2(-1)^2 - 5) = -30(2 - 5) = 90 > 0,$$
and so this is a local minimum. At this point we also have
$$y = f(-1) = 3(-1)^5 - 25(-1)^3 + 60(-1) = -3 + 25 - 60 = -38,$$
and so the coordinates of this point are $(-1, -38)$.
If $x = 1$, we have
$$f''(1) = 30(1)(2(1)^2 - 5) = 30(2 - 5) = -90 < 0,$$
and so this is a local maximum. At this point we also have
$$y = f(1) = 3(1)^5 - 25(1)^3 + 60(1) = 3 - 25 + 60 = 38,$$
and so the coordinates of this point are $(1, 38)$.
If $x = 2$, we have
$$f''(2) = 30(2)(2(2)^2 - 5) = 60(8 - 5) = 180 > 0,$$
and so this is a local minimum. At this point we also have
$$y = f(2) = 3(2)^5 - 25(2)^3 + 60(2) = 96 - 200 + 120 = 16,$$
and so the coordinates of this point are $(2, 16)$.

(c) We can use the information that we have found so far together with the observation
that the y-intercept occurs when $x = 0$, i.e. when $y = f(0) = 0$, to get the sketch in
Figure 4.23(a).

Figure 4.23: (a) A sketch of the curve $y = f(x)$ from Exercise 4.4(c). (b) For
Exercise 4.4(d), picking out the interval $-2 \le x \le 2$ using vertical dotted lines and
the interval $-3 \le x \le 3$ using vertical dashed lines.

(d) Given that $-2 \le x \le 2$ and looking at the sketch in Figure 4.23(a), it should be
clear that the global maximum and the global minimum of $f(x)$ are at the points $(1, 38)$
and $(-1, -38)$ respectively. If you're unclear about this, this interval is picked out by
the vertical dotted lines in Figure 4.23(b).
If we now have $-3 \le x \le 3$, looking at the sketch in Figure 4.23(a), it should be clear
that the global maximum and the global minimum of $f(x)$ are at the points $(3, 234)$ as
$f(3) = 234$ and $(-3, -234)$ as $f(-3) = -234$ respectively. If you're unclear about this,
this interval is picked out by the vertical dashed lines in Figure 4.23(b).
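
For part (d), the comparison of values at the end-points and at the interior stationary points can be automated. The following sketch is an illustration under stated assumptions: the helper name `extrema_on` is hypothetical and the stationary points $x = \pm 1, \pm 2$ are taken from part (b) rather than recomputed.

```python
def f(x):
    return 3 * x**5 - 25 * x**3 + 60 * x


def extrema_on(a, b, stationary=(-2, -1, 1, 2)):
    """Return ((x_max, f(x_max)), (x_min, f(x_min))) for f on a <= x <= b."""
    # Candidates: the end-points plus the stationary points strictly inside the interval.
    candidates = [a, b] + [x for x in stationary if a < x < b]
    x_max = max(candidates, key=f)
    x_min = min(candidates, key=f)
    return (x_max, f(x_max)), (x_min, f(x_min))


print("on -2 <= x <= 2:", extrema_on(-2, 2))
print("on -3 <= x <= 3:", extrema_on(-3, 3))
```

On the narrower interval the extreme values come from interior stationary points, whereas on the wider interval they move to the end-points.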


Solution to exercise 4.5


As we mentioned at the end of Section 4.5.3, the tax revenue, $R(r)$, generated by this
tax is given by
$$R(r) = (r p_r) q_r,$$
as it is the tax paid per unit sold, i.e. $r p_r$, multiplied by the quantity sold, i.e. $q_r$, if the
market is in equilibrium. If we refer back to Exercise 2.3 for $p_r$ and $q_r$, this then gives us
$$R(r) = r \left(\frac{4 - 8r}{2 - r}\right)\left(\frac{12}{2 - r}\right) = 48\,\frac{r - 2r^2}{(2 - r)^2}.$$
Now, to find the value of $r$, i.e. $r^*$, that maximises $R(r)$, we differentiate it with respect
to $r$ using the quotient and chain rules to get
$$R'(r) = 48\,\frac{(1 - 4r)(2 - r)^2 - (r - 2r^2)[2(2 - r)(-1)]}{(2 - r)^4} = 48\,\frac{(1 - 4r)(2 - r) + 2(r - 2r^2)}{(2 - r)^3},$$
and this simplifies to give us
$$R'(r) = 48\,\frac{2 - 7r}{(2 - r)^3}.$$
This has a stationary point when $R'(r) = 0$, i.e. when $r = 2/7$, and as $R'(r)$ changes
from positive to negative as $r$ goes through this value, we can see that this stationary
point is a local maximum (see footnote 6). Now, in this case, we must have $0 \le r \le r_m$ for the market
to function and so this is a constrained optimisation problem. That is, the maximum we
seek is either the value of $R(r)$ at our local maximum, i.e.
$$R\left(\tfrac{2}{7}\right) = \frac{2}{7}\left(\frac{4 - 8\left(\frac{2}{7}\right)}{2 - \frac{2}{7}}\right)\left(\frac{12}{2 - \frac{2}{7}}\right) = \frac{2}{7}\left(\frac{12/7}{12/7}\right)\left(\frac{12}{12/7}\right) = 2,$$
or its value at one of the end-points, i.e. $r = 0$ or $r = r_m = 1/2$. But,
$$R(0) = 0 < 2 \quad \text{and} \quad R\left(\tfrac{1}{2}\right) = \frac{1}{2}\left(\frac{4 - 8\left(\frac{1}{2}\right)}{2 - \frac{1}{2}}\right)\left(\frac{12}{2 - \frac{1}{2}}\right) = 0 < 2,$$
and so the maximum value of $R(r)$ is 2 and this occurs when $r = 2/7$, i.e. at the local
maximum. Thus, $r^* = 2/7$ and using the information we have so far, we can get the sketch
in Figure 4.24(a) for values of $r$ that make economic sense, i.e. those where
$0 \le r \le 1/2$ (see footnote 7).
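
The values of the tax revenue function used above can be verified with exact rational arithmetic. The sketch below assumes $R(r) = 48(r - 2r^2)/(2 - r)^2$ as in this solution and uses Python's `fractions.Fraction` to avoid rounding; the names `revenue` and `r_star` and the two nearby sample rates are illustrative choices.

```python
from fractions import Fraction


def revenue(r):
    # R(r) = 48 (r - 2r^2) / (2 - r)^2, evaluated exactly when r is a Fraction.
    return 48 * (r - 2 * r**2) / (2 - r) ** 2


r_star = Fraction(2, 7)

# The numerator of R'(r) is proportional to 2 - 7r, which vanishes at r = 2/7.
print("2 - 7r at r*:", 2 - 7 * r_star)

# Compare the revenue at the stationary point, the end-points and two nearby rates.
for r in (Fraction(0), Fraction(1, 4), r_star, Fraction(1, 3), Fraction(1, 2)):
    print("r =", r, "R(r) =", revenue(r))
```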
Aside: As shown in Figure 4.24(b), observe that once we move away from the
economically meaningful values of $r$ (i.e. where $0 \le r \le 1/2$) the graph of $R(r)$ gets
quite complicated. Indeed, note that as
$$R(r) = 48\,\frac{r - 2r^2}{(2 - r)^2},$$
we can see that it has a vertical asymptote when $r = 2$ and, because we can write
$$R(r) = 48\,\frac{r - 2r^2}{(2 - r)^2} = 48\,\frac{-2(r^2 - 4r + 4) - 7r + 8}{4 - 4r + r^2} = -96 - 48\,\frac{7r - 8}{(2 - r)^2},$$
we can see that $R(r) \to -96$ as $r \to \pm\infty$, i.e. we also have a horizontal asymptote here.

6 Alternatively, you can show that this stationary point is a local maximum by showing that $R''(r) < 0$
when $r = 2/7$, but this isn't quite so easy.
7 Note, in particular, that $r^*$ is clearly not half-way between no tax (i.e. $r = 0$) and the maximum tax
(i.e. $r_m = 1/2$) as it was in Example 4.14 when we looked at an excise tax.

Figure 4.24: For Exercise 4.5: (a) A sketch of the graph of $R(r)$ for the economically
meaningful values of $r$, i.e. those between zero (i.e. no tax) and $1/2$ (i.e. the maximum
tax). (b) As an aside, we could have sketched the graph of $R(r)$ for some economically
meaningless values of $r$ (specifically $r < 0$ and $r > 1/2$). Observe, in particular, the
vertical asymptote when $r = 2$ and the horizontal asymptote where $R(r) \to -96$ as
$r \to \pm\infty$. (Note that the details of what is happening in the positive quadrant, which we
saw in (a), have been omitted from (b) for clarity.)


Chapter 5
Integration
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 10.2, parts of 10.3–10.4, 10.5–10.9.

Anthony and Biggs (1996) Chapters 25 and 26.


Further reading
Simon and Blume (1994) Appendix A4.
Adams and Essex (2010) Sections 5.5–5.7, 6.1–6.2 and parts of 6.3.
Aims and objectives
The objectives of this chapter are as follows.
To introduce the idea of an integral and see how it can be found using various
techniques.
To use integrals to find areas.
To see how integrals can be used in economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.

5.1 Introduction: What is integration?

In Chapter 3, we introduced differentiation and saw that a function, $f(x)$, could be
differentiated with respect to $x$ to yield its derivative, which we denoted by
$$\frac{df}{dx} \quad \text{or} \quad f'(x).$$
And, in particular, we saw how to find such derivatives by using the rules of
differentiation and some standard derivatives. Now, given a function, $f(x)$, we want to
make sense of what it means to find the indefinite integral of this function with respect
to $x$, which is denoted by
$$\int f(x)\,dx.$$

In such cases, as we are integrating the function f (x) with respect to x, we call it the
integrand. And, similarly to what we saw before, we will see how to find such integrals
by using the rules of integration and some standard integrals. In particular, the standard
integrals will be closely related to our standard derivatives since the key idea behind our
method for finding integrals will be the idea that integration is the process that
undoes (or reverses) the process of differentiation, i.e. the process of indefinite
integration can be thought of as antidifferentiation and the resulting indefinite integral
can be thought of as an antiderivative.

Consider the functions $F(x)$ and $f(x)$ where we know that $f(x)$ is the derivative¹ of
$F(x)$, i.e.
\[ \frac{dF}{dx} = f(x). \]
Now, using the idea that integration undoes differentiation, i.e. if we integrate $f(x)$
with respect to $x$ we are looking for a function, $F(x)$, whose derivative is $f(x)$, we can
see that
\[ \int f(x)\,dx \text{ must be, more or less, given by } F(x). \]
In such cases, we say that F (x) is an antiderivative of f (x) as opposed to, say, the
indefinite integral.
However, you may wonder why we say that the function, $F(x)$, that we found above is
'an', as opposed to 'the', antiderivative of $f(x)$. The reason for this is that if, instead of
the function $F(x)$ we had the function $F(x) + c$ where $c$ is a constant, then its
derivative would still be $f(x)$, i.e.
\[ \frac{d}{dx}\bigl[F(x) + c\bigr] = f(x), \]
and so, using the reasoning above, we would find that
\[ \int f(x)\,dx \text{ can also, more or less, be given by } F(x) + c, \]
where $c$ is a constant. That is, $F(x) + c$ is also an antiderivative of $f(x)$ for this
constant $c$.
Example 5.1 Show that $4x^2$ and $4x^2 + 1$ are both antiderivatives of $8x$.

$4x^2$ is an antiderivative of $8x$ as we can differentiate $4x^2$ to get $8x$. But, similarly, we
can see that $4x^2 + 1$ is also an antiderivative of $8x$ as we can differentiate $4x^2 + 1$ to
get $8x$.
As such, because this works for any constant $c$ we add to $F(x)$, we say that the
indefinite integral gives us a whole family of antiderivatives which only differ by a
constant, i.e. the choice of $c$. In this way, we say that indefinite integration, i.e. the
process of finding
\[ \int f(x)\,dx, \]
is antidifferentiation, i.e. it seeks all the functions $F(x) + c$ that can be differentiated to
yield $f(x)$ and, as such, every one of these functions will be an antiderivative of $f(x)$.
¹ We say that it is the derivative because differentiation always yields exactly one answer.


5.2. How to find indefinite integrals

Example 5.2 What is $\int 8x\,dx$?

We saw in Example 5.1 that $4x^2$ is an antiderivative of $8x$. This means that
\[ \int 8x\,dx = 4x^2 + c, \]
where $c$ is an arbitrary (i.e. any) constant. Notice that this works because
differentiating $4x^2 + c$ we get $8x$.
Generally speaking then, we have the following.
If $F(x)$ is a function whose derivative is the function $f(x)$, then we have
\[ \int f(x)\,dx = F(x) + c, \]
where $c$ is an arbitrary constant. In particular, we call the
function, $f(x)$, the integrand as it is what we are integrating,
function, $F(x)$, an antiderivative as its derivative is $f(x)$,
constant, $c$, a constant of integration which is completely arbitrary,² and
integral, $\int f(x)\,dx$, an indefinite integral since, in the result, $c$ is arbitrary.
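The idea that every choice of the constant gives another antiderivative can be checked numerically. The following is a minimal sketch, not part of the guide's method: it estimates derivatives with a central difference (the helper name `derivative` and the step size are assumptions made for illustration) and confirms that $4x^2 + c$ differentiates to $8x$ for several values of $c$.

```python
def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x) (an illustrative helper)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def f(x):
    return 8 * x  # the integrand from Examples 5.1 and 5.2

# Each choice of c gives a different antiderivative 4x^2 + c of 8x.
for c in (0.0, 1.0, -3.5):
    F = lambda x, c=c: 4 * x**2 + c
    for x in (-2.0, 0.5, 3.0):
        # The estimated slope of F matches the integrand at every sample point.
        assert abs(derivative(F, x) - f(x)) < 1e-6
```

A numerical check like this only samples a few points, so it supports, rather than proves, the statement that the derivative of $F(x) + c$ is $f(x)$.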

Now that we have the idea, let's see how we're going to actually find the indefinite
integrals of the functions that commonly occur in this course.

5.2 How to find indefinite integrals

The previous section told us how to find indefinite integrals using the antiderivatives,
but now we want to explore a more convenient way of finding them. The key idea is
that we introduce standard integrals which tell us how to integrate the basic functions
that we saw in Chapter 2. Once we know how to integrate these, the rules of integration
will allow us to integrate combinations of these functions.

5.2.1 Standard integrals

In Example 5.2, we used the idea that indefinite integration is antidifferentiation to
show that the function $f(x) = 8x$ has an indefinite integral given by
\[ \int 8x\,dx = 4x^2 + c, \]
where $c$ is an arbitrary constant. We now state some results that will allow us to find
the indefinite integrals of our other basic functions.
² As we can add any constant to $F(x)$ to account for the fact that $F(x) + c$, for any constant $c \in \mathbb{R}$,
is also an antiderivative.


Power functions
If $n \neq -1$, we have
\[ \int x^n\,dx = \frac{x^{n+1}}{n+1} + c, \]
where $c$ is an arbitrary constant and this works because
\[ \frac{d}{dx}\left[\frac{x^{n+1}}{n+1} + c\right] = \frac{(n+1)x^n}{n+1} + 0 = x^n. \]
In particular, if $n = 0$, we have
\[ \int 1\,dx = \int x^0\,dx = x + c, \]
and this works because the derivative of $x + c$ is $1$.


However, if we have $n = -1$, we have
\[ \int x^{-1}\,dx = \int \frac{1}{x}\,dx = \ln|x| + c, \]
where we need the modulus sign in $\ln|x|$ as $x$ may be negative but the logarithm
function is only defined for $x > 0$. This works because, if $x > 0$, we have $|x| = x$ and so
\[ \frac{d\ln|x|}{dx} = \frac{d\ln(x)}{dx} = \frac{1}{x}, \]
whereas if $x < 0$, we have $|x| = -x$ and so
\[ \frac{d\ln|x|}{dx} = \frac{d\ln(-x)}{dx} = \frac{-1}{-x} = \frac{1}{x}, \]
if we use the chain rule.

Exponential and logarithmic functions


If we are using $e$, we have
\[ \int e^x\,dx = e^x + c, \]
where $c$ is an arbitrary constant and this works because
\[ \frac{d}{dx}\bigl[e^x + c\bigr] = e^x. \]
However, there is no nice standard integral for $\ln x$ and so we'll see how to find
\[ \int \ln x\,dx, \]
when we encounter integration by parts in Example 5.20.
If we have another base, $a$, the standard integrals are not so simple. But, we can see that
\[ \int a^x\,dx = \frac{a^x}{\ln a} + c, \]
where $c$ is an arbitrary constant since, using the result from Activity 3.9, we have
\[ \frac{d}{dx}\left[\frac{a^x}{\ln a} + c\right] = \frac{a^x \ln a}{\ln a} + 0 = a^x. \]
However, there is also no nice standard integral for $\log_a x$ and so we'll see how to find
\[ \int \log_a x\,dx, \]
in Activity 5.12 where we will use the change of base formula once we can integrate $\ln x$.
Sine and cosine functions
For the sine and cosine functions we find that
\[ \int \sin x\,dx = -\cos x + c \quad\text{and}\quad \int \cos x\,dx = \sin x + c, \]
where $c$ is an arbitrary constant. The former works because
\[ \frac{d}{dx}\bigl[-\cos x + c\bigr] = -(-\sin x) + 0 = \sin x, \]
whereas the latter works because the derivative of $\sin x$ is $\cos x$.
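The standard integrals of this subsection can be collected into a small table and each one checked by differentiating the claimed antiderivative numerically. This is only a sketch under stated assumptions: the sample values $n = 3$ and $a = 2$, the sample points and the helper names are all chosen here for illustration and do not come from the guide.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x) (an illustrative helper)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Hypothetical sample values: any n != -1 and any a > 0 with a != 1 would do.
n, a = 3, 2.0

# (integrand, antiderivative with c = 0) pairs from Section 5.2.1.
standard_integrals = [
    (lambda x: x**n,        lambda x: x**(n + 1) / (n + 1)),
    (lambda x: 1 / x,       lambda x: math.log(abs(x))),
    (lambda x: math.exp(x), lambda x: math.exp(x)),
    (lambda x: a**x,        lambda x: a**x / math.log(a)),
    (lambda x: math.sin(x), lambda x: -math.cos(x)),
    (lambda x: math.cos(x), lambda x: math.sin(x)),
]

def max_error(points=(-2.0, -0.5, 0.7, 1.9)):
    """Largest gap between F'(x) and f(x) over all pairs and sample points."""
    return max(abs(derivative(F, x) - f(x))
               for f, F in standard_integrals for x in points)
```

The sample points include negative values of $x$ on purpose, so that the role of the modulus sign in $\ln|x|$ is exercised as well.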

5.2.2 The basic rules of integration

In Section 2.1.2, we saw that there are several standard ways of making new functions
from old ones and, in Section 3.2.2, we saw how the rules of differentiation could be
used to differentiate these new functions. Here we will see how we can use standard
integrals, i.e. the integrals of our basic functions, and rules of integration to integrate
the new functions that are created from these basic ones in these standard ways. We
start with the most straightforward of these which allows us to integrate linear
combinations of functions.
The linear combination rule
If $k$ and $l$ are constants, this allows us to integrate the linear combination,
$kf(x) + lg(x)$, of two functions $f(x)$ and $g(x)$. It states that
\[ \int [kf(x) + lg(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx. \]

Indeed, this gives us three more basic rules straightaway, i.e. the
constant multiple rule: If $k$ is a constant and $f(x)$ is a function, then
\[ \int kf(x)\,dx = k\int f(x)\,dx. \]
sum rule: If $f(x)$ and $g(x)$ are functions, then
\[ \int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx. \]
difference rule: If $f(x)$ and $g(x)$ are functions, then
\[ \int [f(x) - g(x)]\,dx = \int f(x)\,dx - \int g(x)\,dx. \]

Activity 5.1 Derive the constant multiple, sum and difference rules from the linear
combination rule.
Activity 5.2

Use antiderivatives to show that the linear combination rule works.


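Example 5.3 Using these rules we see that:
\[ \int 3x^{-1/2}\,dx = 3\,\frac{x^{1/2}}{\frac{1}{2}} + c = 6x^{1/2} + c \quad\text{by the constant multiple rule,} \]
\[ \int \bigl[x^2 + x^{1/2}\bigr]\,dx = \frac{x^3}{3} + \frac{x^{3/2}}{\frac{3}{2}} + c = \frac{1}{3}x^3 + \frac{2}{3}x^{3/2} + c \quad\text{by the sum rule,} \]
\[ \int [\sin x - \cos x]\,dx = -\cos x - \sin x + c \quad\text{by the difference rule, and} \]
\[ \int \left[\frac{3}{x} - 4e^x\right]dx = 3\ln|x| - 4e^x + c \quad\text{by the linear combination rule,} \]
where $c$ is an arbitrary constant.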
So, in the case of linear combinations of functions such as these, we see that the integral
of the linear combination is given by the linear combination of the integrals.
Activity 5.3 Use the rules above to integrate the following functions with respect
to $x$.
(a) $3\cos x$, (b) $e^x + \cos x$, (c) $3\sin x - \frac{3}{x}$.

We now look at the other rules of integration, i.e. the ones that will allow us to
integrate other combinations of functions. But, unlike what we saw with the rules of
differentiation in Section 3.2.2, we shall see that these are harder to apply.

5.2.3 Integration by substitution

Integration by substitution is a way of dealing with integrands that involve the


composition of two functions and, as such, it is closely related to the chain rule of
differentiation. To see how it works, we will start by seeing how integration by
substitution is related to the chain rule and then we will describe how to apply this
rule. We will then apply this rule in some simple examples and then some harder ones.


Why integration by substitution works


We start by noting that the chain rule for differentiation tells us that if
\[ h(x) = (f \circ g)(x) = f(g(x)), \]
then we write $h$ as $f(g)$ so that, on differentiating, we get
\[ \frac{dh}{dx} = \frac{df}{dg}\,\frac{dg}{dx}. \]
But, because of this we can see that
\[ h(x) = (f \circ g)(x) = f(g(x)) \text{ is an antiderivative of } \frac{dh}{dx} = \frac{df}{dg}\,\frac{dg}{dx}, \]
and so, we have
\[ \int \frac{df}{dg}\,\frac{dg}{dx}\,dx = f(g(x)) + c, \]
which is the basis of integration by substitution. However, this is quite hard to apply
and so, as a useful way of applying this rule, we think of
\[ dg \quad\text{as}\quad \frac{dg}{dx}\,dx, \]
so that we have
\[ \int \frac{df}{dg}\,dg = f(g) + c, \]
and this is the key to the method that we shall be using here.
How to integrate by substitution
We can now see how to apply integration by substitution. The basic idea is that, if you
are given an integrand that involves a composition of two functions, this rule of
integration sometimes allows you to turn it into an easier integral by making a
substitution. That is:
The integral involves the derivative of a composition and has the form
\[ \int f(g(x))\,g'(x)\,dx. \]
Write $f(g(x))$ as $f(g)$ and $g'(x)\,dx$ as $dg$. This should give you the easier integral
\[ \int f(g)\,dg. \]
Find this integral and replace all occurrences of $g$ with $g(x)$ to get your final
answer.
Now, to make this clearer, let's look at some examples.
Some simple applications of integration by substitution
Easy integrations by substitution involve an integrand which is nothing more than a
simple composition of two functions and so there can be no doubt about which function
should be $g$. To see this, let's consider what happens when we want to integrate a
simple composition which involves the function $3x + 1$.


Example 5.4 Find $\int (3x + 1)^2\,dx$.

Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac{1}{3}\,dg$. Hence,
substitution gives
\[ \int (3x + 1)^2\,dx = \int g^2\,\frac{1}{3}\,dg = \frac{1}{3}\int g^2\,dg = \frac{1}{3}\cdot\frac{g^3}{3} + c = \frac{(3x + 1)^3}{9} + c, \]
where $c$ is an arbitrary constant.

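Example 5.5 Find $\int \frac{1}{3x + 1}\,dx$.

Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac{1}{3}\,dg$. Hence,
substitution gives
\[ \int \frac{1}{3x + 1}\,dx = \int \frac{1}{g}\,\frac{1}{3}\,dg = \frac{1}{3}\int g^{-1}\,dg = \frac{1}{3}\ln|g| + c = \frac{1}{3}\ln|3x + 1| + c, \]
where $c$ is an arbitrary constant.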

Example 5.6 Find $\int e^{3x+1}\,dx$.

Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac{1}{3}\,dg$. Hence,
substitution gives
\[ \int e^{3x+1}\,dx = \int e^g\,\frac{1}{3}\,dg = \frac{1}{3}\int e^g\,dg = \frac{1}{3}e^g + c = \frac{1}{3}e^{3x+1} + c, \]
where $c$ is an arbitrary constant.


In particular, observe what changes in these examples and what stays the same. Indeed,
just for comparison, we can see what would happen if we had a composition which is
like the one in Example 5.4 but it now involves the function 4x + 7 instead of 3x + 1.
Example 5.7 Find $\int (4x + 7)^2\,dx$.

Taking $g = 4x + 7$ we have $\frac{dg}{dx} = 4$ and so $dg = 4\,dx$, i.e. $dx = \frac{1}{4}\,dg$. Hence,
substitution gives
\[ \int (4x + 7)^2\,dx = \int g^2\,\frac{1}{4}\,dg = \frac{1}{4}\int g^2\,dg = \frac{1}{4}\cdot\frac{g^3}{3} + c = \frac{(4x + 7)^3}{12} + c, \]
where $c$ is an arbitrary constant.


Activity 5.4 In a similar manner, find $\int \frac{1}{4x + 7}\,dx$ and $\int e^{4x+7}\,dx$.

Note that in all of these examples, the substitution works because we have
$g(x) = ax + b$ and hence
\[ \frac{dg}{dx} = a \implies dg = a\,dx \implies \frac{1}{a}\,dg = dx, \]
where $a \neq 0$ and $b$ are constants. Indeed, as we end up with an integrand involving $\frac{1}{a}$,
which is a constant, it can be moved out of the integral using the constant multiple rule
of integration. So, if our integrand is a composition, i.e. $f(g(x))$, and $g(x)$ is a linear
function, i.e. it has the form $ax + b$ where $a \neq 0$ and $b$ are constants, this kind of
substitution will always work and this leads to the general result that
\[ \int f(ax + b)\,dx = \frac{1}{a}F(ax + b) + c, \]
where $F(x)$ is an antiderivative of $f(x)$ and $c$ is an arbitrary constant.
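This general result for a linear inner function can be written as a small helper that turns an antiderivative $F$ of $f$ into an antiderivative of $f(ax + b)$. The sketch below is illustrative only: the function name `integrate_linear_composition` is hypothetical, the constant $c$ is taken to be zero, and the checks use Examples 5.4 and 5.6.

```python
import math

def integrate_linear_composition(F, a, b):
    """Given an antiderivative F of f, return x -> F(a*x + b) / a,
    an antiderivative of f(a*x + b). Requires a != 0."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return lambda x: F(a * x + b) / a

def derivative(G, x, h=1e-6):
    """Central-difference estimate of G'(x) (an illustrative helper)."""
    return (G(x + h) - G(x - h)) / (2 * h)

# Example 5.4: f(u) = u^2 has antiderivative F(u) = u^3 / 3, with a = 3, b = 1.
G = integrate_linear_composition(lambda u: u**3 / 3, 3, 1)
assert abs(G(1.0) - (3 * 1.0 + 1) ** 3 / 9) < 1e-12
assert abs(derivative(G, 1.0) - (3 * 1.0 + 1) ** 2) < 1e-5

# Example 5.6: f(u) = e^u is its own antiderivative, again with a = 3, b = 1.
H = integrate_linear_composition(math.exp, 3, 1)
assert abs(derivative(H, 0.2) - math.exp(3 * 0.2 + 1)) < 1e-5
```

The explicit check for $a = 0$ reflects the requirement $a \neq 0$ in the result, since the factor $\frac{1}{a}$ would otherwise be undefined.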


Activity 5.5 Suppose that $a \neq 0$ and $b$ are constants. Use this result to find an
expression for
\[ \int (ax + b)^n\,dx, \]
when $n$ is a constant. Also find expressions for
\[ \int e^{ax+b}\,dx, \quad \int \sin(ax + b)\,dx \quad\text{and}\quad \int \cos(ax + b)\,dx. \]
What happens if $a = 0$?
Activity 5.6 Using the expressions you found in Activity 5.5, verify your answers
to Activity 5.4.
Some less simple applications of integration by substitution
We will also see slightly harder integrations by substitution where the integrand
involves a composition of two functions multiplied by another function. Even in these
cases, though, there can be little doubt about which function should be $g$. To see this,
let's consider what happens when we want to integrate a simple composition which
involves the function $x^2 + 1$.
Example 5.8 Find $\int (x^2 + 1)^2 x\,dx$.

Taking $g = x^2 + 1$ we have $g'(x) = 2x$ and so $dg = 2x\,dx$, i.e. $x\,dx = \frac{1}{2}\,dg$. Hence,
substitution gives
\[ \int (x^2 + 1)^2 x\,dx = \int g^2\,\frac{1}{2}\,dg = \frac{1}{2}\int g^2\,dg = \frac{1}{2}\cdot\frac{g^3}{3} + c = \frac{(x^2 + 1)^3}{6} + c, \]
i.e. the extra $x$ in the integrand was actually needed for the substitution $g = x^2 + 1$
to work.
Example 5.9 Find $\int \frac{x}{x^2 + 1}\,dx$.

Taking $g = x^2 + 1$ we have $g'(x) = 2x$ and so $dg = 2x\,dx$, i.e. $x\,dx = \frac{1}{2}\,dg$. Hence,
substitution gives
\[ \int \frac{x}{x^2 + 1}\,dx = \int \frac{1}{g}\,\frac{1}{2}\,dg = \frac{1}{2}\int g^{-1}\,dg = \frac{1}{2}\ln|g| + c = \frac{1}{2}\ln|x^2 + 1| + c, \]
i.e. the extra $x$ in the integrand was, again, needed for the substitution $g = x^2 + 1$
to work.

Example 5.10 Find $\int x e^{x^2+1}\,dx$.

Taking $g = x^2 + 1$ we have $g'(x) = 2x$ and so $dg = 2x\,dx$, i.e. $x\,dx = \frac{1}{2}\,dg$. Hence,
substitution gives
\[ \int x e^{x^2+1}\,dx = \int e^{x^2+1} x\,dx = \int e^g\,\frac{1}{2}\,dg = \frac{1}{2}\int e^g\,dg = \frac{1}{2}e^g + c = \frac{1}{2}e^{x^2+1} + c, \]
i.e. the extra $x$ in the integrand was, again, needed for the substitution $g = x^2 + 1$
to work.
In particular, observe what changes in these examples and what stays the same. Indeed,
just for comparison, we can see what would happen if we had a composition which is
like the one in Example 5.8 but it now involves the function $3x^2 + 7$ instead of $x^2 + 1$.
Example 5.11 Find $\int (3x^2 + 7)^2 x\,dx$.

Taking $g = 3x^2 + 7$ we have $g'(x) = 6x$ and so $dg = 6x\,dx$, i.e. $x\,dx = \frac{1}{6}\,dg$. Hence,
substitution gives
\[ \int (3x^2 + 7)^2 x\,dx = \int g^2\,\frac{1}{6}\,dg = \frac{1}{6}\int g^2\,dg = \frac{1}{6}\cdot\frac{g^3}{3} + c = \frac{(3x^2 + 7)^3}{18} + c, \]
i.e. the extra $x$ in the integrand was actually needed for the substitution
$g = 3x^2 + 7$ to work.
Activity 5.7 In a similar manner, find $\int \frac{x}{3x^2 + 7}\,dx$ and $\int x e^{3x^2+7}\,dx$.

To summarise, it is worth noting that in all of these examples, the substitution works
because we have $g(x) = ax^2 + b$ and hence
\[ \frac{dg}{dx} = 2ax \implies dg = 2ax\,dx, \]
where $a \neq 0$ and $b$ are constants. But, $2ax$ is not a constant and so we cannot deal with
this by taking it out of the integral as we did in the last set of examples. However, in
these cases, the substitution still works because we have
\[ \frac{dg}{dx} = 2ax \implies dg = 2ax\,dx \implies \frac{1}{2a}\,dg = x\,dx, \]
and there is also an $x$ in the integrand to facilitate the transition from $dx$ to $dg$.
Indeed, in the absence of this extra $x$, the substitution would produce a more
complicated integral and we would not be able to proceed!
Integration by substitution more generally
The general lesson that we should be drawing from the last two sets of examples is that
integration by substitution works when we have an integrand which is the product of
the composition of two functions $f(g(x))$, and
a constant multiple of $g'(x)$.
The first of these enables us to replace $f(g(x))$ with $f(g)$ and the second enables us to
replace $dx$ with some constant multiple of $dg$. Having done this, the substitution has
turned a hard integral into an easier one and we can proceed. Let's now consider some
more complicated examples.
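Before turning to those examples, the second condition (that the other factor is a constant multiple of the derivative of $g$) can be probed numerically. The following is a rough sketch rather than a rigorous test: the helper `constant_ratio`, its sample points and its tolerance are all assumptions made for illustration, and it assumes the derivative of $g$ is non-zero at every sample point. It is applied to the integrands of Examples 5.14 and 5.21, which appear later in this section.

```python
def derivative(g, x, h=1e-6):
    """Central-difference estimate of g'(x) (an illustrative helper)."""
    return (g(x + h) - g(x - h)) / (2 * h)

def constant_ratio(other_factor, g, points=(0.5, 1.0, 1.5, 2.0), tol=1e-6):
    """Return k if other_factor(x) is close to k * g'(x) at every sample
    point, else None. A numerical hint only: agreement at a few points
    is not a proof, and g'(x) must be non-zero at each point."""
    ratios = [other_factor(x) / derivative(g, x) for x in points]
    k = ratios[0]
    return k if all(abs(r - k) < tol for r in ratios) else None

# Example 5.14: g = x^3 + 3x + 7 and the other factor is x^2 + 1.
k_good = constant_ratio(lambda x: x**2 + 1, lambda x: x**3 + 3 * x + 7)

# Example 5.21: g = x^2 + 1 but the other factor is x^2, not x.
k_bad = constant_ratio(lambda x: x**2, lambda x: x**2 + 1)
```

Here `k_good` comes out close to $\frac{1}{3}$, matching the factor that appears in Example 5.14, whereas `k_bad` is `None` because the ratio $x^2/(2x)$ changes with $x$.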
Example 5.12 Find $\int (x^3 + x^2)^7 (3x^2 + 2x)\,dx$.

Here the composition is $(x^3 + x^2)^7$ and so we take $g = x^3 + x^2$. As such, we have
\[ \frac{dg}{dx} = 3x^2 + 2x, \]
which is the other part of the product in the integrand, i.e. this substitution will
work. Thus, we see that
\[ dg = (3x^2 + 2x)\,dx, \]
and so the substitution gives
\[ \int (x^3 + x^2)^7 (3x^2 + 2x)\,dx = \int g^7\,dg = \frac{g^8}{8} + c = \frac{(x^3 + x^2)^8}{8} + c. \]
Here, the extra $3x^2 + 2x$ in the integrand was needed for the substitution
$g = x^3 + x^2$ to work.

Example 5.13 Find $\int \frac{2x + 2}{x^2 + 2x + 2}\,dx$.

Here the composition is $(x^2 + 2x + 2)^{-1}$ and so we take $g = x^2 + 2x + 2$. As such, we
have
\[ \frac{dg}{dx} = 2x + 2, \]
which is the other part of the product in the integrand, i.e. this substitution will
work. Thus, we see that
\[ dg = (2x + 2)\,dx, \]
and so the substitution gives
\[ \int \frac{2x + 2}{x^2 + 2x + 2}\,dx = \int \frac{1}{g}\,dg = \ln|g| + c = \ln|x^2 + 2x + 2| + c. \]
Here, the extra $2x + 2$ in the integrand was needed for the substitution
$g = x^2 + 2x + 2$ to work.

Example 5.14 Find $\int (x^2 + 1)\,e^{x^3+3x+7}\,dx$.

Here the composition is $e^{x^3+3x+7}$ and so we take $g = x^3 + 3x + 7$. As such, we have
\[ \frac{dg}{dx} = 3x^2 + 3 = 3(x^2 + 1), \]
which is a constant multiple of the other part of the product in the integrand, i.e.
this substitution will work. Thus, we see that
\[ dg = 3(x^2 + 1)\,dx \implies \frac{1}{3}\,dg = (x^2 + 1)\,dx, \]
and so the substitution gives
\[ \int (x^2 + 1)\,e^{x^3+3x+7}\,dx = \int e^g\,\frac{1}{3}\,dg = \frac{1}{3}\int e^g\,dg = \frac{1}{3}e^g + c = \frac{1}{3}e^{x^3+3x+7} + c. \]
Here, the extra $x^2 + 1$ in the integrand was needed for the substitution
$g = x^3 + 3x + 7$ to work.

Activity 5.8 Find $\int x \sin(x^2)\,dx$.

Integration by substitution with trigonometric functions


Sometimes we can straightforwardly apply what we have just seen to find integrals that
involve compositions of trigonometric functions as the following examples show.
Example 5.15 Find $\int \sin^2 x \cos x\,dx$.

Here the composition is $\sin^2 x$ and so we take $g = \sin x$. As such, we have
\[ \frac{dg}{dx} = \cos x, \]
which is the other part of the product in the integrand, i.e. this substitution will
work. Thus, we see that
\[ dg = \cos x\,dx, \]
and so the substitution gives
\[ \int \sin^2 x \cos x\,dx = \int g^2\,dg = \frac{g^3}{3} + c = \frac{1}{3}\sin^3 x + c. \]
Here, of course, the extra $\cos x$ in the integrand was needed for the substitution
$g = \sin x$ to work.

Activity 5.9 Find $\int \cos^2 x \sin x\,dx$.

Indeed, as the next example shows, this kind of substitution allows us to find another
useful result.
Example 5.16 Find $\int \tan x\,dx$.

In (2.1), we saw that
\[ \tan x = \frac{\sin x}{\cos x}, \]
which means that the composition is $(\cos x)^{-1}$ and so we take $g = \cos x$. As such, we
have
\[ \frac{dg}{dx} = -\sin x, \]
which, up to a minus, is the other part of the product in the integrand, i.e. this
substitution will work. Thus, we see that
\[ dg = -\sin x\,dx, \]
and so the substitution gives
\[ \int \tan x\,dx = \int \frac{\sin x}{\cos x}\,dx = -\int \frac{dg}{g} = -\ln|g| + c = -\ln|\cos x| + c. \]
Here, of course, the extra $\sin x$ in the integrand was needed for the substitution
$g = \cos x$ to work.

Activity 5.10 Find $\int \cot x\,dx$.

However, not every trigonometric substitution is so easy to spot as the next example
shows.
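Example 5.17 Find $\int \frac{dx}{(x + a)^2 + b^2}$.

Here, for reasons that will soon become apparent, we make the substitution
$x + a = b\tan\theta$. As such, differentiating both sides of this expression with respect to
$\theta$, we have
\[ \frac{dx}{d\theta} = b\sec^2\theta \implies dx = b\sec^2\theta\,d\theta. \]
This means that our integral becomes
\[ \int \frac{dx}{(x + a)^2 + b^2} = \int \frac{b\sec^2\theta}{b^2\tan^2\theta + b^2}\,d\theta = \int \frac{\sec^2\theta}{b\sec^2\theta}\,d\theta = \int \frac{d\theta}{b}, \]
if we use the trigonometric identity $\tan^2\theta + 1 = \sec^2\theta$ from (2.4). This then gives us
\[ \int \frac{d\theta}{b} = \frac{\theta}{b} + c = \frac{1}{b}\tan^{-1}\left(\frac{x + a}{b}\right) + c, \]
since $x + a = b\tan\theta$ and where $c$ is an arbitrary constant. Thus, we have found that
\[ \int \frac{dx}{(x + a)^2 + b^2} = \frac{1}{b}\tan^{-1}\left(\frac{x + a}{b}\right) + c, \]
which is another useful result.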
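Because the substitution in Example 5.17 is harder to spot, it is reassuring to check the final formula by differentiating it numerically. This is a minimal sketch: the sample constants $a = 1$ and $b = 2$, the sample points and the function names are assumptions chosen for illustration.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x) (an illustrative helper)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def arctan_antiderivative(x, a, b):
    """(1/b) * arctan((x + a) / b), the result of Example 5.17 with c = 0."""
    return math.atan((x + a) / b) / b

def integrand(x, a, b):
    return 1 / ((x + a) ** 2 + b ** 2)

a, b = 1.0, 2.0  # hypothetical sample constants with b != 0

# Largest gap between the slope of the claimed antiderivative and the integrand.
worst = max(abs(derivative(lambda t: arctan_antiderivative(t, a, b), x)
                - integrand(x, a, b))
            for x in (-3.0, -1.0, 0.0, 2.5))
```

Python's `math.atan` plays the role of $\tan^{-1}$ here, i.e. the inverse tangent and not the reciprocal.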


Activity 5.11 Find $\int \frac{dx}{x^2 + 2x + 2}$. (Hint: Complete the square in the
denominator.)
We will see other examples of how trigonometric identities can be used when finding
integrals in Section 5.2.6.

5.2.4 Integration by parts

Integration by parts is a way of dealing with integrands which involve the product of
two functions and, as such, it is closely related to the product rule of differentiation. To
see how it works, we will start by seeing how integration by parts is related to the
product rule and then we will describe how to apply this rule. We will then see some
examples of how it can be applied.
Why integration by parts works
We start by noting that the product rule for differentiation tells us that
\[ \frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x). \]
So, integrating both sides with respect to $x$, we get
\[ \int \frac{d}{dx}[f(x)g(x)]\,dx = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx, \]
which, on noting that integration undoes differentiation, yields
\[ f(x)g(x) = \int f'(x)g(x)\,dx + \int f(x)g'(x)\,dx. \]
Rearranging this then gives us
\[ \int f(x)g'(x)\,dx = f(x)g(x) - \int f'(x)g(x)\,dx, \]
and we call this new rule integration by parts.


How to integrate by parts
Observe that integration by parts allows us to write one integral in terms of another
and so a successful application of this rule requires a good choice of $f(x)$ and $g'(x)$, i.e.
one where it is straightforward to integrate $g'(x)$ and the new integral is easier to find
than the old one. That is:
The integral involves a product of two functions and has the form
\[ \int f(x)g'(x)\,dx. \]
Choose $f(x)$ and $g'(x)$ so that we can differentiate $f(x)$ to get $f'(x)$ and
straightforwardly integrate $g'(x)$ to get $g(x)$.
Apply the formula and make sure that the new integral, $\int f'(x)g(x)\,dx$, is easier to
integrate.
If it is, proceed. If it is not, then you have been unwise in your choice of $f(x)$ and
$g'(x)$.
Let's look at some simple examples of how it works.

Example 5.18 Find $\int x e^x\,dx$.

Here we have a product and, to apply integration by parts, we choose
\[ f(x) = x \quad\text{and}\quad g'(x) = e^x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 1 \quad\text{and}\quad g(x) = e^x, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule then gives,
\[ \int x e^x\,dx = (x)(e^x) - \int (1)(e^x)\,dx = x e^x - \int e^x\,dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + c = (x - 1)e^x + c, \]
as the answer.
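As with any indefinite integral, this answer can be checked by differentiating it. The following worked line is an added check, not part of the original example; it uses the product rule on $(x - 1)e^x$ and recovers the integrand.

```latex
\[
\frac{d}{dx}\bigl[(x - 1)e^x + c\bigr] = (1)\,e^x + (x - 1)\,e^x + 0 = x e^x .
\]
```

Since the product rule is what integration by parts reverses, it is no surprise that the same rule confirms the result.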


Warning! Observe that if we had chosen $f(x)$ and $g'(x)$ differently, we would have
got
\[ f(x) = e^x \quad\text{and}\quad g'(x) = x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we would have got
\[ f'(x) = e^x \quad\text{and}\quad g(x) = \frac{x^2}{2}, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule then gives,
\[ \int x e^x\,dx = (e^x)\left(\frac{x^2}{2}\right) - \int (e^x)\left(\frac{x^2}{2}\right)dx = \frac{x^2 e^x}{2} - \frac{1}{2}\int x^2 e^x\,dx, \]
and this is bad because the new integral is harder to find.

Example 5.19 Find $\int x \ln x\,dx$.

Here we have a product and, to apply integration by parts, we choose
\[ f(x) = \ln x \quad\text{and}\quad g'(x) = x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac{1}{x} \quad\text{and}\quad g(x) = \frac{x^2}{2}, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule then gives,
\[ \int x \ln x\,dx = (\ln x)\left(\frac{x^2}{2}\right) - \int \left(\frac{1}{x}\right)\left(\frac{x^2}{2}\right)dx = \frac{x^2}{2}\ln x - \int \frac{x}{2}\,dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int x \ln x\,dx = \frac{x^2}{2}\ln x - \int \frac{x}{2}\,dx = \frac{x^2}{2}\ln x - \frac{x^2}{4} + c, \]
as the answer.
Warning! Observe that if we had chosen $f(x)$ and $g'(x)$ differently, we would have
got
\[ f(x) = x \quad\text{and}\quad g'(x) = \ln x. \]
This would have been bad because we can't integrate $g'(x) = \ln x$ to get $g(x)$ at the
moment.
However, having said that, now that we can integrate by parts, we can finally see how
to integrate $\ln x$.


Example 5.20 Find $\int \ln(x)\,dx$.

To do this using integration by parts, we treat the integrand as $1 \cdot \ln(x)$ so that we
have a product, i.e. we want to find,
\[ \int \ln(x)\,dx = \int 1 \cdot \ln(x)\,dx. \]
To apply integration by parts, we choose
\[ f(x) = \ln(x) \quad\text{and}\quad g'(x) = 1, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac{1}{x} \quad\text{and}\quad g(x) = x, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule then gives,
\[ \int 1 \cdot \ln(x)\,dx = (x)(\ln(x)) - \int (x)\left(\frac{1}{x}\right)dx = x\ln(x) - \int 1\,dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int \ln(x)\,dx = x\ln(x) - \int 1\,dx = x\ln(x) - x + c, \]
and so we have now found the integral of $\ln x$ as promised in Section 5.2.1.
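This result is worth checking by differentiation, since it will be used as a standard integral from now on. The line below is an added check, valid for $x > 0$ where $\ln x$ is defined; it applies the product rule to $x\ln(x)$.

```latex
\[
\frac{d}{dx}\bigl[x\ln(x) - x + c\bigr]
  = (1)\ln(x) + (x)\left(\frac{1}{x}\right) - 1 + 0
  = \ln(x) .
\]
```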


Activity 5.12 Use the result in Example 5.20 and the change of base formula for
logarithms to find
\[ \int \log_a x\,dx, \]
which was also promised in Section 5.2.1.
Activity 5.13 Use the result in the previous example to find the integral in
Example 5.19 the other way, i.e. by choosing
\[ f(x) = x \quad\text{and}\quad g'(x) = \ln x, \]
when integrating by parts.


We observe that integration by parts is not useful for all products since, as we saw
above, integrals like
\[ \int (x^2 + 1)^2 x\,dx, \]
in Example 5.8 contain a product and yet they are best dealt with by substitution as
the extra $x$ in the product is a constant multiple of the derivative of $g = x^2 + 1$.


However, integrals like
\[ \int (x^2 + 1)^2 x^2\,dx, \]
would require integration by parts since, now, the extra $x^2$ in the product is not a
constant multiple of the derivative of $g = x^2 + 1$. Indeed, the main skill involved in
finding integrals using these rules is choosing the appropriate method.³ To illustrate
this, let's see how we would find this last integral.
Example 5.21 Find $\int (x^2 + 1)^2 x^2\,dx$.

Here we have a product and, to apply integration by parts, we choose
\[ f(x) = (x^2 + 1)^2 \quad\text{and}\quad g'(x) = x^2, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 2(x^2 + 1)(2x) \quad\text{and}\quad g(x) = \frac{x^3}{3}, \]
where we have used the chain rule to perform the differentiation and suppressed the
arbitrary constant from the integration. Applying the rule then gives,
\[ \int (x^2 + 1)^2 x^2\,dx = (x^2 + 1)^2\left(\frac{x^3}{3}\right) - \int 2(x^2 + 1)(2x)\left(\frac{x^3}{3}\right)dx, \]
and, clearly, the new integral is easier to find because we can easily multiply out the
brackets and integrate term-by-term. Thus, finding this integral, we get
\[ \int (x^2 + 1)^2 x^2\,dx = \frac{x^3}{3}(x^2 + 1)^2 - \frac{4}{3}\int \bigl(x^6 + x^4\bigr)\,dx = \frac{x^3}{3}(x^2 + 1)^2 - \frac{4}{3}\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c, \]
as the answer.
Activity 5.14 Verify that this answer is correct by multiplying out the brackets in
the integrand and integrating term-by-term.
The last two ways of making progress with an integral that we will consider are not rules
of integration, but handy techniques that allow us to rewrite integrands so that we can
see how to integrate them. The first of these uses a particular kind of algebraic identity
known as partial fractions and the second involves the use of trigonometric identities.

5.2.5 Using partial fractions to simplify integrands

Suppose that we have an integrand which is a rational function of two polynomials, say
\[ R(x) = \frac{P(x)}{Q(x)}. \]
³ This is unlike the situation with differentiation where it is always pretty obvious which rule we should
be applying!


In order to apply the method of partial fractions, it must be the case that the degree of
the numerator, i.e. P (x), is less than the degree of the denominator, i.e. Q(x). If this is
the case, we start by looking at how the denominator factorises and then proceed
according to which of the following cases we are in.
Case 1: The denominator has distinct [real] linear factors
If the denominator, $Q(x)$, is of degree $n$ and has $n$ real and distinct roots $a_1, a_2, \ldots, a_n$
then we can write
\[ Q(x) = (x - a_1)(x - a_2) \cdots (x - a_n), \]
i.e. $Q(x)$ has distinct [real] linear factors. In this case, the method of partial fractions
dictates that we can write
\[ R(x) = \frac{P(x)}{(x - a_1)(x - a_2) \cdots (x - a_n)} = \frac{A_1}{x - a_1} + \frac{A_2}{x - a_2} + \cdots + \frac{A_n}{x - a_n}, \]
and we can find the numbers $A_1, A_2, \ldots, A_n$ we require by cross-multiplying on the
right-hand-side, comparing the numerators and letting $x = a_1, x = a_2, \ldots, x = a_n$
respectively. Let's look at a simple example.
Example 5.22 Find $\int \frac{x}{x^2 - x - 2}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the
numerator is less than the degree of the denominator. As such, we can use the
method of partial fractions and, looking at the denominator, we see that
\[ x^2 - x - 2 = (x - 2)(x + 1), \]
so we are in the case where we have distinct linear factors. This means that we can
write
\[ \frac{x}{x^2 - x - 2} = \frac{x}{(x - 2)(x + 1)} = \frac{A_1}{x - 2} + \frac{A_2}{x + 1}, \]
for some constants $A_1$ and $A_2$. To find these constants, we cross-multiply on the
right-hand-side to see that
\[ \frac{x}{(x - 2)(x + 1)} = \frac{A_1(x + 1) + A_2(x - 2)}{(x - 2)(x + 1)}, \]
and so, comparing the numerators, we need
\[ x = A_1(x + 1) + A_2(x - 2). \]
Indeed, setting $x = 2$ on both sides, we see that $2 = 3A_1$ whereas setting $x = -1$ on
both sides, we see that $-1 = -3A_2$. Thus, we have
\[ \frac{x}{x^2 - x - 2} = \frac{x}{(x - 2)(x + 1)} = \frac{2/3}{x - 2} + \frac{1/3}{x + 1}, \]
using the values of $A_1$ and $A_2$ that we have found. Consequently, we find that
\[ \int \frac{x}{x^2 - x - 2}\,dx = \int \left[\frac{2/3}{x - 2} + \frac{1/3}{x + 1}\right]dx = \frac{2}{3}\ln|x - 2| + \frac{1}{3}\ln|x + 1| + c, \]
where $c$ is an arbitrary constant.
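The step of letting $x = a_k$ in the numerator identity can be automated when the denominator is a product of distinct factors $(x - a_k)$. Setting $x = a_k$ kills every term except the one containing $A_k$, which gives $A_k = P(a_k) / \prod_{j \neq k}(a_k - a_j)$. The sketch below implements exactly that with exact fractions; the function name `cover_up` is hypothetical, and it assumes the denominator has leading coefficient 1, as in the factorisation of $Q(x)$ in Case 1.

```python
from fractions import Fraction

def cover_up(P, roots):
    """Coefficients A_k in P(x) / prod(x - a_k) = sum of A_k / (x - a_k),
    for distinct roots a_k and deg P < len(roots)."""
    if len(set(roots)) != len(roots):
        raise ValueError("roots must be distinct")
    coefficients = []
    for k, a_k in enumerate(roots):
        # Product of (a_k - a_j) over every other root a_j.
        denominator = Fraction(1)
        for j, a_j in enumerate(roots):
            if j != k:
                denominator *= Fraction(a_k) - Fraction(a_j)
        coefficients.append(Fraction(P(a_k)) / denominator)
    return coefficients

# Example 5.22: x / ((x - 2)(x + 1)) has roots 2 and -1.
A1, A2 = cover_up(lambda x: x, [2, -1])
```

For Example 5.22 this reproduces $A_1 = 2/3$ and $A_2 = 1/3$. The helper does not handle repeated or irreducible factors, which need the extra terms described in the later cases.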


We observe, in particular, that the degree of the denominator determines how many
constants we have to find.
Case 2: The denominator has a repeated [real] linear factor
If we find that one of the roots, say $a_k$, of the denominator, $Q(x)$, is real and repeated
$m$ times then we replace the term
\[ \frac{A_k}{x - a_k}, \]
in the expansion from Case 1 with the terms
\[ \frac{B_1}{x - a_k} + \frac{B_2}{(x - a_k)^2} + \cdots + \frac{B_m}{(x - a_k)^m}. \]
We then have to find the numbers $B_1, B_2, \ldots, B_m$ as well as any other numbers that
remain from Case 1. Let's look at a simple example.
Example 5.23 Find $\int \frac{x + 3}{(x + 2)(x - 1)^2}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the
numerator is less than the degree of the denominator. As such, we can use the
method of partial fractions and, looking at the denominator, we have
\[ (x + 2)(x - 1)^2, \]
and so we are in the case where we have a repeated linear factor. This means that
we can write
\[ \frac{x + 3}{(x + 2)(x - 1)^2} = \frac{A_1}{x + 2} + \frac{B_1}{x - 1} + \frac{B_2}{(x - 1)^2}, \]
for some constants $A_1$, $B_1$ and $B_2$. To find these constants, we cross-multiply on the
right-hand-side to see that
\[ \frac{x + 3}{(x + 2)(x - 1)^2} = \frac{A_1(x - 1)^2 + B_1(x - 1)(x + 2) + B_2(x + 2)}{(x + 2)(x - 1)^2}, \]
and so, comparing the numerators, we need
\[ x + 3 = A_1(x - 1)^2 + B_1(x - 1)(x + 2) + B_2(x + 2). \]
Indeed, setting $x = -2$ on both sides, we see that $1 = 9A_1$ and setting $x = 1$ on both
sides, we see that $4 = 3B_2$. However, to find $B_1$, we now note that comparing (say)
the coefficient of the $x^2$ term on both sides of this expression we get $0 = A_1 + B_1$
and so $B_1 = -A_1 = -1/9$. Thus, we have
\[ \frac{x + 3}{(x + 2)(x - 1)^2} = \frac{1/9}{x + 2} - \frac{1/9}{x - 1} + \frac{4/3}{(x - 1)^2}, \]
using the values of $A_1$, $B_1$ and $B_2$ that we have found. Consequently, we find that
\[ \int \frac{x + 3}{(x + 2)(x - 1)^2}\,dx = \int \left[\frac{1/9}{x + 2} - \frac{1/9}{x - 1} + \frac{4/3}{(x - 1)^2}\right]dx = \frac{1}{9}\ln|x + 2| - \frac{1}{9}\ln|x - 1| - \frac{4}{3(x - 1)} + c, \]
where $c$ is an arbitrary constant.
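A partial fractions decomposition like the one in Example 5.23 can be checked exactly by evaluating both sides at a few values of $x$ using rational arithmetic. The sketch below is an added check with hypothetical function names. Both numerators have degree at most 2, so agreement at three or more points away from the roots $x = -2$ and $x = 1$ is enough to confirm the identity.

```python
from fractions import Fraction

def original(x):
    """The integrand of Example 5.23."""
    return (x + 3) / ((x + 2) * (x - 1) ** 2)

def decomposition(x):
    """The partial fractions found in Example 5.23."""
    return (Fraction(1, 9) / (x + 2)
            - Fraction(1, 9) / (x - 1)
            + Fraction(4, 3) / (x - 1) ** 2)

# Sample points away from the roots x = -2 and x = 1.
samples = [Fraction(n) for n in (-5, -1, 0, 2, 3, 10)]
all_equal = all(original(x) == decomposition(x) for x in samples)
```

Using `Fraction` rather than floating-point numbers means the comparison is exact, so a sign error in any of the constants would show up as a genuine mismatch.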


We observe, again, that the degree of the denominator determines how many constants
we have to find.
Case 3: The denominator has an irreducible [real] factor
If we find that the denominator, $Q(x)$, has an irreducible [real] factor like $ax^2 + bx + c$,⁴
then we replace the corresponding term in the expansion from Case 1 with the term
\[ \frac{C_1 x + C_2}{ax^2 + bx + c}. \]
We then have to find the numbers $C_1$ and $C_2$ as well as any other numbers that remain
from Case 1. Let's look at a simple example.

Example 5.24 Find $\int \frac{x}{(x - 1)(x^2 + 2x + 2)}\,dx$.

Here the integrand is a rational function of two polynomials and the degree of the
numerator is less than the degree of the denominator. As such, we can use the
method of partial fractions and, looking at the denominator, we have
\[ (x - 1)(x^2 + 2x + 2), \]
and so we are in the case where we have an irreducible factor as $x^2 + 2x + 2$ has no
real roots as, for instance, $b^2 - 4ac$ gives us $2^2 - 4(1)(2) = 4 - 8 = -4 < 0$. This
means that we can write
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{A_1}{x - 1} + \frac{C_1 x + C_2}{x^2 + 2x + 2}, \]
for some constants $A_1$, $C_1$ and $C_2$. To find these constants, we cross-multiply on the
right-hand-side to see that
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{A_1(x^2 + 2x + 2) + (C_1 x + C_2)(x - 1)}{(x - 1)(x^2 + 2x + 2)}, \]
and so, comparing the numerators, we need
\[ x = A_1(x^2 + 2x + 2) + (C_1 x + C_2)(x - 1). \]
Indeed, setting $x = 1$ on both sides, we see that $1 = 5A_1$ and, to find $C_1$, we now note
that comparing the coefficient of the $x^2$ term on both sides of this expression we get
$0 = A_1 + C_1$ and so $C_1 = -A_1 = -1/5$ and comparing the coefficient of the constant
term on both sides we get $0 = 2A_1 - C_2$ and so $C_2 = 2A_1 = 2/5$. Thus, we have
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{1/5}{x - 1} + \frac{(1/5)(-x + 2)}{x^2 + 2x + 2}, \]

⁴ That is, we have a quadratic like $ax^2 + bx + c$ with $b^2 - 4ac < 0$ so we cannot find real roots. This
means that we cannot factorise it using real factors and so we cannot use Case 1 or Case 2 on it.


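using the values of $A_1$, $C_1$ and $C_2$ that we have found. Consequently, we find that
\[ \int \frac{x}{(x - 1)(x^2 + 2x + 2)} \, dx = \int \left( \frac{1/5}{x - 1} - \frac{(1/5)(x - 2)}{x^2 + 2x + 2} \right) dx = \frac{1}{5} \int \left( \frac{1}{x - 1} - \frac{x - 2}{x^2 + 2x + 2} \right) dx. \]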

Now, the integral of the first term is easy but, to deal with the integral of the second term, we note that the derivative of $x^2 + 2x + 2$ is $2x + 2$ (i.e. we are thinking about the substitution $g = x^2 + 2x + 2$ which we saw in Example 5.13). This means that, writing
\[ \frac{x - 2}{x^2 + 2x + 2} = \frac{1}{2} \, \frac{2x - 4}{x^2 + 2x + 2} = \frac{1}{2} \, \frac{2x + 2 - 6}{x^2 + 2x + 2} = \frac{1}{2} \, \frac{2x + 2}{x^2 + 2x + 2} - \frac{3}{x^2 + 2x + 2}, \]
we can see that, completing the square in the denominator of the last term, we have
\[ \int \frac{x}{(x - 1)(x^2 + 2x + 2)} \, dx = \frac{1}{5} \int \left( \frac{1}{x - 1} - \frac{1}{2} \, \frac{2x + 2}{x^2 + 2x + 2} + \frac{3}{(x + 1)^2 + 1} \right) dx \]
\[ = \frac{1}{5} \left( \ln |x - 1| - \frac{1}{2} \ln |x^2 + 2x + 2| + 3 \tan^{-1}(x + 1) \right) + c, \]
where $c$ is an arbitrary constant. Here, we have implicitly made the substitution $g = x^2 + 2x + 2$ in the middle term (as we saw in Example 5.13) and we saw how to integrate the last term in Activity 5.11.
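As a sanity check on Example 5.24, we can differentiate the antiderivative numerically and compare the result with the integrand. The following is a minimal Python sketch under our own assumptions: the function names, the step size and the sample points are illustrative choices and are not part of this guide.

```python
import math

def integrand(x):
    # The integrand of Example 5.24.
    return x / ((x - 1) * (x**2 + 2 * x + 2))

def antiderivative(x):
    # The antiderivative found in Example 5.24, taking c = 0.
    return (math.log(abs(x - 1))
            - 0.5 * math.log(abs(x**2 + 2 * x + 2))
            + 3 * math.atan(x + 1)) / 5

def central_difference(F, x, h=1e-5):
    # Numerical estimate of the derivative F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

# Sample points chosen away from x = 1, where the integrand is undefined.
sample_points = (-3.0, 0.0, 0.5, 2.0, 4.0)
for x in sample_points:
    print(x, integrand(x), central_difference(antiderivative, x))
```

If the partial fractions and the integration are right, the last two columns should agree to several decimal places at every sample point.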
Of course, the key here is that, in this new term the linear expression in the numerator
that we have to find has a degree which is one less than the degree of the irreducible
quadratic expression in the denominator.$^5$ This means that, if we had a repeated
irreducible factor in the denominator, we would have to compensate in a way which is
reminiscent of Case 2 as the next example shows.
Example 5.25 Find
\[ \int \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} \, dx. \]

Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have
\[ (x - 1)(1 + x^2)^2, \]
and so we are in the case where we have a repeated irreducible factor as $x^2 + 1$ has no real roots as, for instance, $b^2 - 4ac$ gives us $0^2 - 4(1)(1) = -4 < 0$. This means that we can write
\[ \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} = \frac{A_1}{x - 1} + \frac{C_1 x + C_2}{1 + x^2} + \frac{D_1 x + D_2}{(1 + x^2)^2}, \]
5

That is, the number of constants we have to find is equal to the degree of the denominator in the
term we are dealing with.


for some constants $A_1$, $C_1$, $C_2$, $D_1$ and $D_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} = \frac{A_1 (1 + x^2)^2 + (C_1 x + C_2)(x - 1)(1 + x^2) + (D_1 x + D_2)(x - 1)}{(x - 1)(1 + x^2)^2}, \]
and so, comparing the numerators, we need
\[ x^4 + x^3 + 2x^2 = A_1 (1 + x^2)^2 + (C_1 x + C_2)(x - 1)(1 + x^2) + (D_1 x + D_2)(x - 1). \]
Indeed, setting $x = 1$ on both sides, we see that $4 = 4A_1$ and, to find $C_1$, we now note that comparing the coefficient of the $x^4$ term on both sides of this expression we get $1 = A_1 + C_1$ and so $C_1 = 1 - A_1 = 0$ and comparing the coefficient of the $x^3$ term on both sides of this expression we get $1 = -C_1 + C_2$ and so $C_2 = 1 + C_1 = 1$. To find $D_1$ and $D_2$ we note that, using what we have found so far, we have
\[ x^4 + x^3 + 2x^2 = (1 + x^2)^2 + (x - 1)(1 + x^2) + (D_1 x + D_2)(x - 1), \]
which means that, comparing the coefficient of $x^2$ on both sides of this expression we get $2 = 2 - 1 + D_1$ and so $D_1 = 1$ whereas comparing the coefficient of $x$ on both sides we get $0 = 0 + 1 - D_1 + D_2$ and so $D_2 = 0$. Thus, we have
\[ \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} = \frac{1}{x - 1} + \frac{1}{1 + x^2} + \frac{x}{(1 + x^2)^2}, \]
using the values of the constants $A_1$, $C_1$, $C_2$, $D_1$ and $D_2$ that we have found. Consequently, we find that
\[ \int \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} \, dx = \int \left( \frac{1}{x - 1} + \frac{1}{1 + x^2} + \frac{x}{(1 + x^2)^2} \right) dx = \ln |x - 1| + \tan^{-1} x - \frac{1}{2(1 + x^2)} + c, \]
where $c$ is an arbitrary constant and we have implicitly used the substitution $u = 1 + x^2$ to work out the integral of the last term.
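The constants found in Example 5.25 can be checked by confirming that the two numerators agree as polynomials. The sketch below is our own illustration (the function names and test points are assumptions, not part of this guide): it uses exact rational arithmetic and relies on the fact that two polynomials of degree at most four which agree at more than four points must be identical.

```python
from fractions import Fraction

def lhs(x):
    # Numerator of the original integrand in Example 5.25.
    return x**4 + x**3 + 2 * x**2

def rhs(x, A1=1, C1=0, C2=1, D1=1, D2=0):
    # Cross-multiplied numerator, using the constants found in Example 5.25.
    return (A1 * (1 + x**2)**2
            + (C1 * x + C2) * (x - 1) * (1 + x**2)
            + (D1 * x + D2) * (x - 1))

# Eleven distinct rational test points: -5/2, -2, ..., 2, 5/2.
points = [Fraction(n, 2) for n in range(-5, 6)]
agree = all(lhs(x) == rhs(x) for x in points)
print(agree)
```

Changing any one of the default constants in `rhs` should make the check fail, which is a quick way to catch an arithmetic slip when comparing coefficients.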
So, once again, we observe that the degree of the denominator determines how many
constants we have to find in all of these examples. Generally speaking, as we are using
partial fractions to help us find integrals, we shouldn't expect to see anything more
complicated than this.

5.2.6 Using trigonometric identities to simplify integrands

In Example 5.15 we saw how to find
\[ \int \sin^2 x \cos x \, dx, \]
by using the substitution $g = \sin x$, but what if you were asked to find
\[ \int \sin^2 x \, dx? \]
In this case, the substitution would not work since we do not have the extra factor of $\cos x$ in the integrand. However, as we shall see in the next example, we can easily find this new integral if we use one of the trigonometric identities that we saw in Section 2.1.4.

Example 5.26 Find $\int \sin^2 x \, dx$.

In Activity 2.18, we saw the double-angle formula
\[ \cos(2x) = 1 - 2 \sin^2 x, \]
which allows us to write the problematic integrand $\sin^2 x$ in terms of the function $\cos(2x)$ which is far easier to integrate. That is, rearranging this trigonometric identity, we have
\[ \sin^2 x = \frac{1}{2} \bigl( 1 - \cos(2x) \bigr), \]
and so we find that
\[ \int \sin^2 x \, dx = \frac{1}{2} \int \bigl( 1 - \cos(2x) \bigr) \, dx = \frac{1}{2} \left( x - \frac{1}{2} \sin(2x) \right) + c, \]
where $c$ is an arbitrary constant.

Activity 5.15 Find $\int \cos^2 x \, dx$.

Indeed, in Example 5.17, we used a substitution that worked because of the trigonometric identity $\tan^2 \theta + 1 = \sec^2 \theta$ to obtain a useful result. Here's another one that is very similar.

Example 5.27 Use the substitution $x + a = b \sin \theta$ to find
\[ \int \frac{dx}{\sqrt{b^2 - (x + a)^2}}. \]

Here, for reasons that will soon become apparent, we make the suggested substitution. As such, differentiating both sides of $x + a = b \sin \theta$ with respect to $\theta$, we have
\[ \frac{dx}{d\theta} = b \cos \theta \implies dx = b \cos \theta \, d\theta. \]
This means that our integral becomes
\[ \int \frac{dx}{\sqrt{b^2 - (x + a)^2}} = \int \frac{b \cos \theta}{\sqrt{b^2 - b^2 \sin^2 \theta}} \, d\theta = \int \frac{\cos \theta}{\cos \theta} \, d\theta = \int d\theta, \]
if we use the trigonometric identity $1 - \sin^2 \theta = \cos^2 \theta$ from (2.2). This then gives us
\[ \int d\theta = \theta + c = \sin^{-1} \left( \frac{x + a}{b} \right) + c, \]
since $x + a = b \sin \theta$ and where $c$ is an arbitrary constant. Thus, we have found that
\[ \int \frac{dx}{\sqrt{b^2 - (x + a)^2}} = \sin^{-1} \left( \frac{x + a}{b} \right) + c, \]
which is another useful result.

As a last example, let's see another way in which trigonometric identities can be used to find an integral.

Example 5.28 Use the substitution $t = \tan \theta$ to find
\[ \int \frac{d\theta}{1 + \cos(2\theta)}. \]

The substitution $t = \tan \theta$ is very useful and so we start by seeing how it can be applied. Firstly, we note that, differentiating both sides with respect to $\theta$, we get
\[ \frac{dt}{d\theta} = \sec^2 \theta, \]
and so, using the trigonometric identity $\sec^2 \theta = 1 + \tan^2 \theta$ from (2.4), this gives us
\[ d\theta = \frac{dt}{1 + t^2}. \]
Secondly, we note that the denominator of our integrand is
\[ 1 + \cos(2\theta) = 1 + \cos^2 \theta - \sin^2 \theta, \]
using a double-angle formula from (2.6) and so we will need to be able to write $\sin \theta$ and $\cos \theta$ in terms of $t$. An easy way to do this is to consider the right-angled triangle in Figure 5.1 as this immediately tells us that
\[ \sin \theta = \frac{t}{\sqrt{1 + t^2}} \quad \text{and} \quad \cos \theta = \frac{1}{\sqrt{1 + t^2}}, \]
and so we see that the denominator of our integrand can be written as
\[ 1 + \cos(2\theta) = 1 + \cos^2 \theta - \sin^2 \theta = 1 + \frac{1}{1 + t^2} - \frac{t^2}{1 + t^2} = \frac{2}{1 + t^2}, \]
in terms of $t$. Thus, returning to the integral, we have
\[ \int \frac{d\theta}{1 + \cos(2\theta)} = \int \frac{1 + t^2}{2} \, \frac{dt}{1 + t^2} = \int \frac{1}{2} \, dt = \frac{t}{2} + c = \frac{1}{2} \tan \theta + c, \]
where $c$ is an arbitrary constant.
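The three expressions in terms of $t$ that drive Example 5.28 can be spot-checked numerically. The sketch below is only an illustration under our own assumptions: the function name and the sample angles are hypothetical choices, and the angles are kept strictly between $-\pi/2$ and $\pi/2$ so that $\cos \theta > 0$, as in the triangle of Figure 5.1.

```python
import math

def discrepancies(theta):
    # Compare each trigonometric quantity with its expression in t = tan(theta).
    t = math.tan(theta)
    hypotenuse = math.sqrt(1 + t**2)
    return (
        abs(math.sin(theta) - t / hypotenuse),
        abs(math.cos(theta) - 1 / hypotenuse),
        abs((1 + math.cos(2 * theta)) - 2 / (1 + t**2)),
    )

# Sample angles (in radians) strictly inside (-pi/2, pi/2).
angles = (-1.2, -0.5, 0.3, 1.0, 1.4)
worst = max(max(discrepancies(theta)) for theta in angles)
print(worst)
```

The printed value is the largest discrepancy over all the sample angles, so it should be zero up to floating-point rounding.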


Generally, as in the last two examples, when an unusual substitution is required in this
unit, it will be given in the question. Indeed, we'll see a little bit more of this kind of
thing in Examples 5.36 and 5.37.

Figure 5.1: A right-angled triangle with $t = \tan \theta$ can have $t$ on the opposite side and 1 on the adjacent side which means that, using Pythagoras' theorem, the hypotenuse must be $\sqrt{1 + t^2}$. With this triangle, we can then quickly deduce the expressions for $\sin \theta$ and $\cos \theta$ in terms of $t$ which are needed for Example 5.28.

5.3 Definite integrals and areas

So far, we have been looking at indefinite integrals and we have been finding them by
using the idea of an antiderivative to deduce standard integrals and rules of integration.
We now turn to the geometric interpretation of an integral and this involves introducing
the idea of a definite integral and seeing what it represents.

5.3.1 Definite integrals and what they represent

In Section 3.3.1 we saw that the derivative of a function, f (x), gave us the gradient of
the curve y = f (x). We now consider what the integral of a function, f (x), tells us about
the curve y = f (x) and see how this comes about through the idea of a definite integral.
What is a definite integral?
Recall that an indefinite integral is so-called since, given a function, $f(x)$, and one of its antiderivatives, $F(x)$, i.e. two functions related by the fact that
\[ \frac{dF}{dx} = f(x), \]
we have
\[ \int f(x) \, dx = F(x) + c, \]
where $c$ is an arbitrary constant. And, indeed, it is this arbitrary constant that makes this integral indefinite as we do not know what $c$ is. In a similar vein, instead of writing
\[ \int f(x) \, dx \quad \text{we could also write} \quad \int_a^b f(x) \, dx, \]
where the constants $a$ and $b$ are called the limits of integration.


In order to work out integrals that look like this we need to know what to do with these limits and the procedure is:

Firstly: deal with the integral. Integrating $f(x)$, we take one of its antiderivatives, $F(x)$, and then write
\[ \int_a^b f(x) \, dx = \Bigl[ F(x) \Bigr]_a^b. \]
In particular, as we shall see below, observe that we no longer need a constant of integration.

Secondly: deal with the limits. By definition, we let
\[ \Bigl[ F(x) \Bigr]_a^b = F(b) - F(a), \]
i.e. we subtract the value of the antiderivative at $x = a$ from its value at $x = b$.

Notice that this means that, if $F(x)$ is an antiderivative of $f(x)$, we have
\[ \int_a^b f(x) \, dx = F(b) - F(a), \]
i.e. the value of the integral depends only on the value of the antiderivative at the points $x = a$ and $x = b$. Thus, this is now a definite integral as it no longer involves an arbitrary constant, $c$.
Activity 5.16 If $F(x)$ is an antiderivative of $f(x)$, show that
\[ \int_a^b f(x) \, dx = \Bigl[ F(x) + c \Bigr]_a^b = F(b) - F(a), \]
if $c$ is a constant. Hence explain why we can omit the constant of integration when evaluating definite integrals.
Another consequence of this discussion is that it allows us to see how to use our basic rules of integration to evaluate definite integrals. For instance, if $k$ and $l$ are constants and $f(x)$ and $g(x)$ are functions, then we can see that the linear combination rule gives us
\[ \int_a^b \bigl[ k f(x) + l g(x) \bigr] \, dx = k \int_a^b f(x) \, dx + l \int_a^b g(x) \, dx, \]
if we are using definite integrals.


Activity 5.17 Following what we saw in Section 5.2.2, write down the constant
multiple rule, the sum rule and the difference rule for definite integrals.
Activity 5.18 Using what we have seen so far, derive the linear combination rule
for definite integrals.
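Now that we have the basic idea, let's see how we can work out a definite integral.

Example 5.29 Evaluate $\int_1^3 (x + 4) \, dx$.

If we follow the two step procedure above, i.e. integrating to find an antiderivative and then dealing with the limits, we get
\[ \int_1^3 (x + 4) \, dx = \left[ \frac{x^2}{2} + 4x \right]_1^3 = \left( \frac{3^2}{2} + 4(3) \right) - \left( \frac{1^2}{2} + 4(1) \right) = \left( \frac{9}{2} + 12 \right) - \left( \frac{1}{2} + 4 \right) = 12, \]
which is the value of this definite integral.

Alternatively, we could use the linear combination rule to get
\[ \int_1^3 (x + 4) \, dx = \int_1^3 x \, dx + \int_1^3 4 \, dx = \left[ \frac{x^2}{2} \right]_1^3 + \Bigl[ 4x \Bigr]_1^3 = \left( \frac{3^2}{2} - \frac{1^2}{2} \right) + \bigl( 4(3) - 4(1) \bigr) = \left( \frac{9}{2} - \frac{1}{2} \right) + (12 - 4) = 12, \]
which is the same answer as before.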
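The two step procedure translates directly into a few lines of code. The sketch below is a minimal illustration, assuming we have already found an antiderivative by hand; the helper name `definite_integral` is our own and not part of this guide.

```python
def F(x):
    # An antiderivative of f(x) = x + 4, with the constant of integration suppressed.
    return x**2 / 2 + 4 * x

def definite_integral(F, a, b):
    # Step two of the procedure: evaluate F(b) - F(a).
    return F(b) - F(a)

value = definite_integral(F, 1, 3)
print(value)  # 12.0

# Adding a constant to the antiderivative makes no difference (compare Activity 5.16).
shifted = definite_integral(lambda x: F(x) + 7, 1, 3)
print(shifted)  # 12.0
```

The second call shows why the constant of integration can be omitted: it is added at the upper limit and subtracted again at the lower limit.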


What definite integrals with non-negative integrands represent

Definite integrals are useful because they tell us about the area under a curve. Specifically, if we have the definite integral
\[ \int_a^b f(x) \, dx, \tag{5.1} \]
where $f(x) \ge 0$ for all $x$ such that $a \le x \le b$,$^6$ we say that we have a non-negative integrand and find that the value of the integral is the area of the region between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ as illustrated in Figure 5.2.
Figure 5.2: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-negative integrand, i.e. $f(x) \ge 0$ for $a \le x \le b$, and so the definite integral in (5.1) gives us the area of this hatched region.
Example 5.30 Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$ which is illustrated in Figure 5.3(a).

There are two ways to find this area:

As this is just a right-angled triangle, the area is just half times base times height, i.e.
\[ \text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4. \]
Thus, the area of the region is four.
6

At the moment we will just accept this caveat. The reason why we need f (x) to be non-negative for
values of x between the limits of integration will become clear very soon.


As we have $y = f(x)$ with $f(x) = 4 - 2x$, we can see from Figure 5.3(a) that $f(x) \ge 0$ between $x = 0$ and $x = 2$. So, as noted above, the area should be given by
\[ \int_0^2 (4 - 2x) \, dx = \Bigl[ 4x - x^2 \Bigr]_0^2 = (4 \cdot 2 - 2^2) - (4 \cdot 0 - 0^2) = (8 - 4) - 0 = 4, \]
which is, again, four.

Consequently, this confirms that the definite integral does give us the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$, at least, when $f(x) \ge 0$ between the vertical lines.
Figure 5.3: Non-negative integrands. (a) For Example 5.30, the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$. (b) For Example 5.31, the region between the parabola $y = 4 - x^2$, the $x$-axis and the vertical lines $x = -1$ and $x = 1$.
However, generally, we won't have a simple geometric way of finding the area under a curve and so we will have to use integration.

Example 5.31 Find the area of the region between the parabola $y = 4 - x^2$, the $x$-axis and the vertical lines $x = -1$ and $x = 1$ which is illustrated in Figure 5.3(b).

As we have $y = f(x)$ with $f(x) = 4 - x^2$, we can see from Figure 5.3(b) that $f(x) \ge 0$ between $x = -1$ and $x = 1$. So, as noted above, the area should be given by
\[ \int_{-1}^{1} (4 - x^2) \, dx = \left[ 4x - \frac{x^3}{3} \right]_{-1}^{1} = \left( 4(1) - \frac{(1)^3}{3} \right) - \left( 4(-1) - \frac{(-1)^3}{3} \right) = \frac{11}{3} - \left( -\frac{11}{3} \right) = \frac{22}{3}, \]
i.e. the area is $7\tfrac{1}{3}$.


Activity 5.19 Observe that the region in the previous example is symmetric about the $y$-axis. Use this observation to explain why the area of this region is two times the area represented by the definite integral,
\[ \int_0^1 (4 - x^2) \, dx, \]
and verify that this does indeed give the correct area.
What definite integrals with non-positive integrands represent

We now start to consider what happens to the definite integral in (5.1) when we can't guarantee that the integrand is non-negative, i.e. what happens if we do not have $f(x) \ge 0$ for all $x$ such that $a \le x \le b$? To simplify matters, we will start by asking: What happens when this condition always fails? That is, what happens when the integrand is non-positive as $f(x) \le 0$ for all $x$ such that $a \le x \le b$.

So what does the definite integral in (5.1) tell us about the area of the region bounded by the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ when we have a non-positive integrand, i.e. when $f(x) \le 0$ for $a \le x \le b$, as illustrated in Figure 5.4? One way of looking at this is to note that,

If $f(x) \le 0$ for all $a \le x \le b$, then $-f(x) \ge 0$ for all $a \le x \le b$.

But, this means that $-f(x)$ gives us a non-negative integrand and the area, $A$, of the region in question is given by
\[ A = \int_a^b -f(x) \, dx = -\int_a^b f(x) \, dx \implies \int_a^b f(x) \, dx = -A, \]
i.e. for non-positive integrands, the definite integral gives us minus the area. Thus, in the case of non-positive integrands, the area is given by the magnitude of the definite integral. Let's have a look at an example.
Figure 5.4: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-positive integrand, i.e. $f(x) \le 0$ for $a \le x \le b$, and so the definite integral in (5.1) gives us minus the area of this hatched region.


Example 5.32 Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 2$ and $x = 4$ which is illustrated in Figure 5.5(a).

There are two ways to find this area:

As this is just a right-angled triangle, the area is just half times base times height, i.e.
\[ \text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4. \]
Thus, the area of the region is four.

As we have $y = f(x)$ with $f(x) = 4 - 2x$, we can see from Figure 5.5(a) that $f(x) \le 0$ between $x = 2$ and $x = 4$. So, looking at the definite integral we get,
\[ \int_2^4 (4 - 2x) \, dx = \Bigl[ 4x - x^2 \Bigr]_2^4 = (4 \cdot 4 - 4^2) - (4 \cdot 2 - 2^2) = (16 - 16) - (8 - 4) = -4, \]
which is minus the answer we would expect. As such, we take the magnitude of this answer and so the area is, again, four.

Consequently, if $f(x) \le 0$ between the vertical lines, the definite integral gives us minus the area and so we take the magnitude of the definite integral to find the area.

Figure 5.5: Negative integrands and their relation to area. The region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines (a) $x = 2$ and $x = 4$ for Example 5.32, and (b) $x = 0$ and $x = 4$ for Example 5.33.


What definite integrals with general integrands represent


We now consider what happens to the definite integral in (5.1) when we can't guarantee that the integrand is non-positive or non-negative, i.e. what happens if $f(x) \ge 0$ for some $x$ such that $a \le x \le b$ but not others? Let's start by considering the simple case where we have an integrand which is neither non-positive nor non-negative because there is some number $c$ such that $a \le c \le b$ where

$f(x) \ge 0$ for all $x$ such that $a \le x \le c$, and
$f(x) \le 0$ for all $x$ such that $c \le x \le b$,

as illustrated in Figure 5.6. One way of looking at this is to note that the definite integral

$\int_a^c f(x) \, dx$ gives us the hatched area, $A_1$, between the vertical lines $x = a$ and $x = c$,
$\int_c^b f(x) \, dx$ gives us minus the hatched area, $A_2$, between the vertical lines $x = c$ and $x = b$.

As such, the hatched area, $A$, between the lines $x = a$ and $x = b$ is given by
\[ A = A_1 + A_2 = \int_a^c f(x) \, dx + \left| \int_c^b f(x) \, dx \right|, \]
which is not the value of the definite integral in (5.1).


Figure 5.6: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-negative integrand for $a \le x \le c$ and a non-positive integrand for $c \le x \le b$, i.e. the definite integral in (5.1) cannot be used to find the area of the region.
Thus, for general integrands, the procedure for finding the area of the region bounded
by the curve y = f (x), the x-axis and the vertical lines x = a and x = b is as follows:
Firstly, determine all the points where the curve crosses the x-axis with
x-coordinates between x = a and x = b.
Secondly, use these points to determine (possibly via a sketch) where the curve is
positive and where the curve is negative.


Thirdly, use this information to determine the areas by finding the appropriate
definite integrals (bearing in mind that the integrands will now be either
non-negative or non-positive).
Fourthly, add up all the areas to find the total area.
To see how this works let's consider a couple of examples.

Example 5.33 Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 4$ which is illustrated in Figure 5.5(b).

As indicated in Figure 5.5(b), the line $y = 4 - 2x$ crosses the $x$-axis when $x = 2$ and this lies between $x = 0$ and $x = 4$. We can also see that the function is non-negative for $0 \le x \le 2$ and non-positive for $2 \le x \le 4$. As such, using our earlier workings in Examples 5.30 and 5.32, we split the total region into two sub-regions to see that:

Between $x = 0$ and $x = 2$ we evaluate the definite integral,
\[ \int_0^2 (4 - 2x) \, dx \quad \text{which gives us } 4, \]
as we saw in Example 5.30. Thus, the area is four here as we have a non-negative integrand.

Between $x = 2$ and $x = 4$ we evaluate the definite integral,
\[ \int_2^4 (4 - 2x) \, dx \quad \text{which gives us } -4, \]
as we saw in Example 5.32. Thus, the area is four here as we have a non-positive integrand.

Consequently, the total area is eight.

We also note, in passing, that the definite integral
\[ \int_0^4 (4 - 2x) \, dx = \Bigl[ 4x - x^2 \Bigr]_0^4 = (4 \cdot 4 - 4^2) - (4 \cdot 0 - 0^2) = (16 - 16) - 0 = 0, \]
and, as this is zero, it most definitely is not giving us the area we seek!
Activity 5.20 Verify that the answer to the previous example is correct by finding
the areas of the triangles involved.
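Example 5.34 Find the area of the region between the parabola $y = 1 - x^2$, the $x$-axis and the vertical lines $x = -2$ and $x = 2$ which is illustrated in Figure 5.7.

As indicated in Figure 5.7, the parabola $y = 1 - x^2$ crosses the $x$-axis when $x = \pm 1$ and these points lie between $x = -2$ and $x = 2$. We can also see that the function is non-negative for $-1 \le x \le 1$ and non-positive for $-2 \le x \le -1$ and $1 \le x \le 2$. As such, we split the total region into three sub-regions to see that:

Between $x = -2$ and $x = -1$ we evaluate the definite integral,
\[ \int_{-2}^{-1} (1 - x^2) \, dx = \left[ x - \frac{x^3}{3} \right]_{-2}^{-1} = \left( -1 - \frac{(-1)^3}{3} \right) - \left( -2 - \frac{(-2)^3}{3} \right) = \left( -1 + \frac{1}{3} \right) - \left( -2 + \frac{8}{3} \right) = -\frac{4}{3}. \]
Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.

Between $x = -1$ and $x = 1$ we evaluate the definite integral,
\[ \int_{-1}^{1} (1 - x^2) \, dx = \left[ x - \frac{x^3}{3} \right]_{-1}^{1} = \left( 1 - \frac{1^3}{3} \right) - \left( -1 - \frac{(-1)^3}{3} \right) = \left( 1 - \frac{1}{3} \right) - \left( -1 + \frac{1}{3} \right) = \frac{4}{3}. \]
Thus, the area is $\frac{4}{3}$ here as we have a non-negative integrand.

Between $x = 1$ and $x = 2$ we evaluate the definite integral,
\[ \int_{1}^{2} (1 - x^2) \, dx = \left[ x - \frac{x^3}{3} \right]_{1}^{2} = \left( 2 - \frac{2^3}{3} \right) - \left( 1 - \frac{1^3}{3} \right) = \left( 2 - \frac{8}{3} \right) - \left( 1 - \frac{1}{3} \right) = -\frac{4}{3}. \]
Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.

Consequently, the total area is $\frac{4}{3} + \frac{4}{3} + \frac{4}{3}$ which is four.

We also note, in passing, that the definite integral,
\[ \int_{-2}^{2} (1 - x^2) \, dx = \left[ x - \frac{x^3}{3} \right]_{-2}^{2} = \left( 2 - \frac{2^3}{3} \right) - \left( -2 - \frac{(-2)^3}{3} \right) = \left( 2 - \frac{8}{3} \right) - \left( -2 + \frac{8}{3} \right) = -\frac{4}{3}, \]
and this is most definitely not giving us the area we seek!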
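The four step procedure for general integrands can be sketched in code as follows. This is only an illustration under stated assumptions: we assume the antiderivative and the crossing points have already been found by hand, the helper name `area_between` is our own, and exact fractions are used so that the pieces match the working in Example 5.34.

```python
from fractions import Fraction

def F(x):
    # Antiderivative of f(x) = 1 - x**2 (constant of integration suppressed).
    return x - x**3 / 3

def area_between(F, a, b, crossings):
    """Total area between y = f(x) and the x-axis for a <= x <= b.

    `crossings` lists the points strictly between a and b where the curve
    crosses the x-axis, so f keeps one sign on each sub-interval.
    """
    points = [a] + sorted(crossings) + [b]
    # One definite integral per sub-interval (each is plus or minus an area).
    pieces = [F(right) - F(left) for left, right in zip(points, points[1:])]
    # Take magnitudes before adding, as in the procedure above.
    return sum(abs(piece) for piece in pieces), pieces

area, pieces = area_between(
    F, Fraction(-2), Fraction(2), crossings=[Fraction(-1), Fraction(1)]
)
print(pieces)  # the three signed definite integrals
print(area)    # the total area

# The single definite integral from -2 to 2 is the signed sum, not the area.
signed_total = F(Fraction(2)) - F(Fraction(-2))
print(signed_total)
```

The design choice here is to return the signed pieces alongside the total, since comparing them with the sum of magnitudes makes the difference between "definite integral" and "area" explicit.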

5.3.2 Definite integrals and the other rules of integration

We have seen how to use the basic rules of integration when dealing with definite
integrals and so we now look at how we can use the other two rules of integration,
namely integration by substitution and integration by parts, in this context.
Integration by substitution
When evaluating a definite integral using integration by substitution we follow the same
procedure as before but now, we also change the limits of integration so that they are
values of g rather than values of x. That is, if we are making the substitution g = g(x)
and we have a definite integral with limits x = a and x = b, after the substitution, the
limits will be g = g(a) and g = g(b) respectively. This is best illustrated by an example.

Figure 5.7: Negative integrands and their relation to area (continued). For Example 5.34, the region between the parabola $y = 1 - x^2$, the $x$-axis and the vertical lines $x = -2$ and $x = 2$.

Example 5.35 Find $\int_0^1 x \, e^{x^2 + 1} \, dx$.

We saw in Example 5.10 that, taking $g = x^2 + 1$, we have
\[ \frac{dg}{dx} = 2x \implies dg = 2x \, dx \implies x \, dx = \frac{1}{2} \, dg. \]
In this case, as we have a definite integral, we also change the limits of integration, i.e.

lower limit: $x = 0$ gives $g = g(0) = 0^2 + 1 = 1$, and
upper limit: $x = 1$ gives $g = g(1) = 1^2 + 1 = 2$.

Hence, the substitution gives
\[ \int_0^1 x \, e^{x^2 + 1} \, dx = \int_1^2 e^g \, \frac{1}{2} \, dg = \frac{1}{2} \int_1^2 e^g \, dg = \frac{1}{2} \Bigl[ e^g \Bigr]_1^2 = \frac{1}{2} \left( e^2 - e^1 \right) = \frac{e}{2} (e - 1), \]
as the answer.

Alternatively, using our indefinite integral from Example 5.10, we saw that integration by substitution gave us
\[ \int x \, e^{x^2 + 1} \, dx = \frac{1}{2} e^{x^2 + 1} + c, \]
and so this means that, if we suppress the constant of integration, we get
\[ \int_0^1 x \, e^{x^2 + 1} \, dx = \left[ \frac{1}{2} e^{x^2 + 1} \right]_0^1 = \frac{1}{2} \left( e^{1^2 + 1} - e^{0^2 + 1} \right) = \frac{1}{2} \left( e^2 - e^1 \right) = \frac{e}{2} (e - 1), \]
as before.
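To see that changing the limits really does preserve the value of the integral in Example 5.35, we can estimate both forms numerically. The sketch below uses composite Simpson's rule, which is a standard numerical method that this guide does not cover; the function `simpson` and the number of subintervals are our own illustrative choices.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * h)
    return total * h / 3

# Original integral: x runs from 0 to 1.
in_x = simpson(lambda x: x * math.exp(x**2 + 1), 0.0, 1.0)

# After the substitution g = x**2 + 1: g runs from 1 to 2.
in_g = simpson(lambda g: 0.5 * math.exp(g), 1.0, 2.0)

# The value found in Example 5.35.
exact = math.e / 2 * (math.e - 1)

print(in_x, in_g, exact)
```

All three printed numbers should agree to many decimal places, whereas keeping the old limits 0 and 1 in the `in_g` integral would give a different value.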


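For a harder example, let's see what happens when we have to make a substitution that works because of our double-angle formulae from Section 2.1.4.

Example 5.36 Use the substitution $x = \sin^2 \theta$ to find
\[ \int_0^1 \sqrt{x} \, \sqrt{1 - x} \, dx. \]

Differentiating both sides of $x = \sin^2 \theta$ with respect to $\theta$, we have
\[ \frac{dx}{d\theta} = 2 \sin \theta \cos \theta \implies dx = 2 \sin \theta \cos \theta \, d\theta, \]
and changing the limits of integration we get

lower limit: $x = 0$ gives $\sin^2 \theta = 0$ and so $\theta = 0$, and
upper limit: $x = 1$ gives $\sin^2 \theta = 1$ and so $\theta = \pi/2$.

Hence, the substitution gives us
\[ \int_0^1 \sqrt{x} \, \sqrt{1 - x} \, dx = \int_0^{\pi/2} (\sin \theta)(\cos \theta)(2 \sin \theta \cos \theta) \, d\theta = \int_0^{\pi/2} 2 \sin^2 \theta \cos^2 \theta \, d\theta, \]
where we have used the trigonometric identity $\cos^2 \theta = 1 - \sin^2 \theta$ from (2.2) to get $\cos \theta$ from the $\sqrt{1 - x}$ in the integrand. Then, using the double-angle formula $\sin(2\theta) = 2 \sin \theta \cos \theta$ from (2.6), we see that this gives us
\[ \int_0^1 \sqrt{x} \, \sqrt{1 - x} \, dx = \frac{1}{2} \int_0^{\pi/2} \sin^2(2\theta) \, d\theta, \]
which we solve using a variation on the method given in Example 5.26, i.e. we note that $\cos(4\theta) = 1 - 2 \sin^2(2\theta)$ from Activity 2.18, so that
\[ \int_0^1 \sqrt{x} \, \sqrt{1 - x} \, dx = \frac{1}{4} \int_0^{\pi/2} \bigl( 1 - \cos(4\theta) \bigr) \, d\theta = \frac{1}{4} \left[ \theta - \frac{1}{4} \sin(4\theta) \right]_0^{\pi/2} = \frac{\pi}{8}, \]
as $\sin(4\theta) = 0$ when $\theta = 0$ or $\theta = \pi/2$.

Lastly, let's see another application of the $t = \tan \theta$ substitution that we saw in Example 5.28.

Example 5.37 Use the substitution $t = \tan \theta$ to find
\[ \int_0^{\pi/2} \frac{d\theta}{4 - 2 \cos^2 \theta}. \]

Using what we saw in Example 5.28, we see that
\[ d\theta = \frac{dt}{1 + t^2} \quad \text{and} \quad \cos^2 \theta = \frac{1}{1 + t^2}, \]
and so, in particular, the denominator of our integrand can be written as
\[ 4 - 2 \cos^2 \theta = 4 - \frac{2}{1 + t^2} = \frac{2 + 4t^2}{1 + t^2}. \]
Also, changing the limits of integration, we get

lower limit: $\theta = 0$ gives $t = \tan 0 = 0$, and
upper limit: $\theta = \pi/2$ gives $t = \tan(\pi/2) = \infty$,

which means that the substitution gives
\[ \int_0^{\pi/2} \frac{d\theta}{4 - 2 \cos^2 \theta} = \int_0^{\infty} \frac{1 + t^2}{2 + 4t^2} \, \frac{dt}{1 + t^2} = \frac{1}{4} \int_0^{\infty} \frac{1}{\frac{1}{2} + t^2} \, dt = \frac{1}{4} \left[ \sqrt{2} \tan^{-1} \bigl( \sqrt{2} \, t \bigr) \right]_0^{\infty} = \frac{\sqrt{2} \, \pi}{8}, \]
as $\tan^{-1} \infty = \pi/2$ and we have used the result we saw in Example 5.17.

Of course, in this example, when we write things like $\tan(\pi/2) = \infty$ or $\tan^{-1}(\infty) = \pi/2$, what we really mean is $\tan \theta \to \infty$ as $\theta \to \pi/2$ and $\tan^{-1} t \to \pi/2$ as $t \to \infty$. This shorthand will be fine for this course, but in 176 Further Calculus, we will see how to do things like this properly.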
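To get a feel for what the shorthand $\tan^{-1}(\infty) = \pi/2$ means in Example 5.37, we can replace the upper limit $\infty$ by a large finite number $T$ and watch the value settle down. The sketch below is only a numerical illustration; the function name and the chosen values of $T$ are our own assumptions.

```python
import math

def integral_up_to(T):
    # (1/4) [ sqrt(2) * atan(sqrt(2) * t) ] evaluated between t = 0 and t = T.
    return math.sqrt(2) / 4 * math.atan(math.sqrt(2) * T)

# The value obtained in Example 5.37 by writing atan(infinity) = pi/2.
limit = math.sqrt(2) * math.pi / 8

upper_limits = (1.0, 10.0, 1e3, 1e6)
gaps = [limit - integral_up_to(T) for T in upper_limits]
for T, gap in zip(upper_limits, gaps):
    print(T, integral_up_to(T), gap)
```

The gap between the finite-limit value and $\sqrt{2}\,\pi/8$ shrinks towards zero as $T$ grows, which is the sense in which the upper limit is "infinity".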
Integration by parts
When evaluating a definite integral using integration by parts we use
\[ \int_a^b f(x) g'(x) \, dx = \Bigl[ f(x) g(x) \Bigr]_a^b - \int_a^b f'(x) g(x) \, dx, \]
i.e. we have to evaluate the $f(x) g(x)$ term using the limits of integration as well as evaluating the new [easier] definite integral.

Example 5.38 Find $\int_0^1 x \, e^x \, dx$.

We saw in Example 5.18 that, to apply integration by parts to this integral, we choose
\[ f(x) = x \quad \text{and} \quad g'(x) = e^x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 1 \quad \text{and} \quad g(x) = e^x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule in the case of a definite integral then gives,
\[ \int_0^1 x \, e^x \, dx = \Bigl[ (x)(e^x) \Bigr]_0^1 - \int_0^1 (1)(e^x) \, dx = \Bigl[ x \, e^x \Bigr]_0^1 - \int_0^1 e^x \, dx, \]
which leads to
\[ \int_0^1 x \, e^x \, dx = \bigl( (1)(e^1) - (0)(e^0) \bigr) - \Bigl[ e^x \Bigr]_0^1 = \left( e^1 - 0 \right) - \left( e^1 - e^0 \right) = 1, \]
as the answer.

Alternatively, using our indefinite integral from Example 5.18, we saw that integration by parts gave us
\[ \int x \, e^x \, dx = (x - 1) \, e^x + c, \]
and so this means that, if we suppress the constant of integration, we get
\[ \int_0^1 x \, e^x \, dx = \Bigl[ (x - 1) \, e^x \Bigr]_0^1 = (1 - 1) \, e^1 - (0 - 1) \, e^0 = 0 - (-e^0) = 1, \]
as before.
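The two sides of the integration by parts formula for definite integrals can be compared numerically for Example 5.38. As a sketch, we again use composite Simpson's rule, a standard numerical method not covered in this guide; the names `simpson`, `boundary` and `remaining` are our own.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] using n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 == 1 else 2
        total += weight * f(a + i * h)
    return total * h / 3

def f(x):
    return x

def f_prime(x):
    return 1.0

def g(x):
    # g(x) = e^x, so that g'(x) = e^x as well.
    return math.exp(x)

a, b = 0.0, 1.0

# Left-hand side: the integral of f(x) g'(x).
direct = simpson(lambda x: f(x) * math.exp(x), a, b)

# Right-hand side: the boundary term minus the integral of f'(x) g(x).
boundary = f(b) * g(b) - f(a) * g(a)
remaining = simpson(lambda x: f_prime(x) * g(x), a, b)
by_parts = boundary - remaining

print(direct, by_parts)
```

Both numbers should be very close to 1, the value found in the example, and the boundary term shows the contribution that is easy to forget when moving from indefinite to definite integrals.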

5.4 Applications of integrals

Integrals can be used in economics and we now introduce two ways in which they can
arise in that subject. The first is what happens when we want to find a cost function
but we only know the marginal cost; and the second introduces the idea of consumer
and producer surpluses.

5.4.1 Marginal functions revisited

Suppose that the cost of producing a quantity, $q$, of goods is given by the cost function, $C(q)$. In Section 3.3.3, we met the idea of the marginal cost, $\mathrm{MC}(q)$, of producing $q$ units which was given by
\[ \mathrm{MC}(q) = \frac{dC}{dq}, \]
and this was useful since the approximation
\[ \Delta C \simeq \mathrm{MC}(q) \, \Delta q, \]
allowed us to estimate the change in costs, $\Delta C$, due to an increase in production of $\Delta q$, i.e. where the quantity produced is increased from $q$ to $q + \Delta q$. We now consider the problem of finding the cost function, $C(q)$, when we are given the marginal cost function, $\mathrm{MC}(q)$. Indeed, as the marginal cost function is the derivative of the cost function, we can see that

$C(q)$ is an antiderivative of $\mathrm{MC}(q)$,

and so,
\[ C(q) = \int \mathrm{MC}(q) \, dq. \tag{5.2} \]
However, this presents us with a problem as finding the indefinite integral on the right-hand-side of (5.2) will yield all the antiderivatives of $\mathrm{MC}(q)$, i.e. a function $C(q)$ that contains an arbitrary constant, whereas we want to find the particular antiderivative that is actually the cost function, i.e. we want to find a particular value of this constant. So, the question is: Which value of the arbitrary constant will give us the cost function? In order to answer this question, we need to be given more information, say the fixed costs associated with this production, so that we can find the right value for this constant. Let's consider an example.


Example 5.39 A company's marginal cost function is given by
\[ \mathrm{MC}(q) = 2q + 100 \, e^q, \]
and its fixed costs are 10,000. What is the cost function, $C(q)$, for this company?

Using (5.2) above, we see that the cost function is given by the integral of the marginal cost, i.e.
\[ C(q) = \int (2q + 100 \, e^q) \, dq = q^2 + 100 \, e^q + c, \]
where $c$ is an arbitrary constant. This tells us, depending on the value of $c$, all of the possible cost functions for this company. But, which one should we take? Obviously, perhaps, we want the one which also gives us fixed costs of 10,000, i.e. we want
\[ C(0) = 10{,}000 \implies 10{,}000 = 0^2 + 100 \, e^0 + c \implies 10{,}000 = 100 + c \implies c = 9{,}900, \]
as the fixed costs are the cost of producing nothing. Thus, the cost function for this company is given by
\[ C(q) = q^2 + 100 \, e^q + 9{,}900, \]
as this function agrees with the question on both the marginal and the fixed costs of production.
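The way the fixed costs pin down the constant of integration in Example 5.39 can be sketched in code. The function names, the default argument and the test quantity below are our own illustrative assumptions; the check at the end simply confirms that the cost function we found has the right fixed costs and the right marginal cost.

```python
import math

def marginal_cost(q):
    # MC(q) = 2q + 100 e^q, as given in Example 5.39.
    return 2 * q + 100 * math.exp(q)

def cost(q, fixed_costs=10_000):
    # General antiderivative q^2 + 100 e^q + c, with c chosen so that C(0) = fixed_costs.
    c = fixed_costs - (0**2 + 100 * math.exp(0))
    return q**2 + 100 * math.exp(q) + c

def central_difference(F, x, h=1e-5):
    # Numerical estimate of the derivative F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

print(cost(0))                                    # the fixed costs
print(marginal_cost(2.0), central_difference(cost, 2.0))
```

Changing `fixed_costs` shifts the whole cost function up or down without changing its derivative, which is why the marginal cost alone cannot determine $C(q)$.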

5.4.2 Consumer and producer surpluses

Suppose that a market has linear supply and demand functions as illustrated in Figure 5.8. As we know from Section 2.1.5, the equilibrium price, $p^*$, and the equilibrium quantity, $q^*$, occur at the point where the graphs of these functions intersect. Indeed, at equilibrium, as the consumers buy $q^*$ units of the good at a price of $p^*$ per unit, they pay an amount $p^* q^*$ to the suppliers and we can think of this as the area of the hatched region in Figure 5.9(b).

However, if the consumers are willing to buy $q^*$ units of the good, it can be argued$^7$ that the consumers would be willing to pay an amount given by
\[ \int_0^{q^*} p^D(q) \, dq, \]
which is the area of the hatched region in Figure 5.9(a). The difference between the area that represents what they would pay and the area that represents what they actually pay, i.e. the area of the hatched region in Figure 5.9(d), is called the consumer surplus. Indeed, this consumer surplus, CS, can be found using the formula
\[ \mathrm{CS} = \int_0^{q^*} p^D(q) \, dq - p^* q^*, \]
and this is the amount that the consumers save by paying what they actually paid instead of what they would have paid.
7

See, for example, Section 25.1 of Anthony and Biggs (1996).

Figure 5.8: Linear supply and demand functions for a market. Note that the equilibrium price, $p^*$, and the equilibrium quantity, $q^*$, occur at the point where the graphs of these functions intersect.
Similarly, if the suppliers are willing to supply $q^*$ units of the good, it can be argued that they need to be paid an amount given by
\[ \int_0^{q^*} p^S(q) \, dq, \]
which is the area of the hatched region in Figure 5.9(c). The difference between the area that represents what they are actually paid and the area that represents what they need to be paid, i.e. the area of the hatched region in Figure 5.9(e), is called the producer surplus. Indeed, this producer surplus, PS, can be found using the formula
\[ \mathrm{PS} = p^* q^* - \int_0^{q^*} p^S(q) \, dq, \]
and this is the amount that the suppliers gain by being paid what they actually receive instead of what they need to receive. Let's look at a simple example.
Example 5.40

A market has an inverse demand function given by

1
pD (q) = 70 q,
3
and an inverse supply function given by
1
pS (q) = 20 + q.
2
Find the equilibrium price and quantity. What are the consumer and producer
surpluses for this market?
The equilibrium quantity, q , makes the prices obtained from the inverse demand
and supply functions equal, i.e.
1
1
5
70 q = 20 + q
=
50 = q
=
q = 60,
3
2
6

Figure 5.9: What people pay or need to be paid. (a) What the consumers would pay for a quantity $q^*$. (b) What the consumers pay for a quantity $q^*$ if the market is at equilibrium. (c) What the suppliers need to be paid for a quantity $q^*$. (d) What the consumers save if they pay for a quantity $q^*$ in a market that is at equilibrium; this is the consumer surplus, i.e. area (a) minus area (b). (e) What the producers gain if they sell a quantity $q^*$ in a market that is at equilibrium; this is the producer surplus, i.e. area (b) minus area (c).
and this means that the equilibrium price, $p^*$, is given by
\[ p^* = 70 - \frac{1}{3}(60) = 70 - 20 = 50, \]
if we use the inverse demand function.
Hence, to find the consumer surplus, CS, we have
\[ CS = \int_0^{q^*} p_D(q)\,dq - p^* q^*, \]
and so we need to find
\[ \int_0^{60} \left(70 - \frac{1}{3}q\right) dq = \left[70q - \frac{q^2}{6}\right]_0^{60} = 70(60) - \frac{1}{6}(60)^2 - 0 = 4{,}200 - 600 = 3{,}600, \]
which means that
\[ CS = 3{,}600 - (50)(60) = 3{,}600 - 3{,}000 = 600, \]


is the consumer surplus. And, to find the producer surplus, PS, we have
\[ PS = p^* q^* - \int_0^{q^*} p_S(q)\,dq, \]
and so we need to find
\[ \int_0^{60} \left(20 + \frac{1}{2}q\right) dq = \left[20q + \frac{q^2}{4}\right]_0^{60} = 20(60) + \frac{1}{4}(60)^2 - 0 = 1{,}200 + 900 = 2{,}100, \]
which means that
\[ PS = (50)(60) - 2{,}100 = 3{,}000 - 2{,}100 = 900, \]
is the producer surplus.
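The arithmetic in Example 5.40 can be reproduced with a short script. The following is only a sketch: the helper names `price` and `area_under` are illustrative choices rather than anything from the guide, and the code relies on both inverse functions being linear so that the integrals can be written down exactly.

```python
from fractions import Fraction

# Inverse demand p_D(q) = 70 - q/3 and inverse supply p_S(q) = 20 + q/2
# from Example 5.40, stored as (intercept, slope) pairs.
demand = (Fraction(70), Fraction(-1, 3))
supply = (Fraction(20), Fraction(1, 2))

def price(line, q):
    """Evaluate a linear inverse function a + b*q (illustrative helper)."""
    a, b = line
    return a + b * q

def area_under(line, q):
    """Exact integral of a + b*t from t = 0 to t = q, i.e. a*q + b*q**2/2."""
    a, b = line
    return a * q + b * q * q / 2

# Equilibrium: 70 - q/3 = 20 + q/2, so q* = (70 - 20) / (1/2 + 1/3).
q_star = (demand[0] - supply[0]) / (supply[1] - demand[1])
p_star = price(demand, q_star)

consumer_surplus = area_under(demand, q_star) - p_star * q_star
producer_surplus = p_star * q_star - area_under(supply, q_star)

print(q_star, p_star, consumer_surplus, producer_surplus)  # 60 50 600 900
```

Exact fractions are used here so that the output matches the hand calculation without rounding.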

However, as both the demand and supply functions are linear in this example, there is an easier way to find the consumer and producer surpluses, as the next Activity shows.
Activity 5.21 Sketch the inverse demand and supply functions in the previous
example and shade in the regions which represent the consumer and producer
surplus. What are the areas of these regions?
Of course, the demand and supply functions that we are given may not be linear and, in
such cases, we would have to use integration to find the consumer and producer
surpluses.
Activity 5.22

The demand for a commodity is given by the equation


p(q + 1) = 231.

If the equilibrium quantity is 10, find the equilibrium price and hence determine the
consumer surplus.

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
find integrals using standard integrals and the rules of integration;
find integrals by simplifying the integrand using partial fractions and trigonometric
identities;
use integrals to find areas;
solve problems from economics-based subjects that involve integrals.


Solutions to activities
Solution to activity 5.1
Given the linear combination rule, i.e.
\[ \int [kf(x) + lg(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx, \]
we can derive the constant multiple rule by setting $l = 0$ so that
\[ \int kf(x)\,dx = \int [kf(x) + 0g(x)]\,dx = k\int f(x)\,dx + 0\int g(x)\,dx = k\int f(x)\,dx, \]
the sum rule by setting $k = 1$ and $l = 1$ so that
\[ \int [f(x) + g(x)]\,dx = \int [1f(x) + 1g(x)]\,dx = 1\int f(x)\,dx + 1\int g(x)\,dx = \int f(x)\,dx + \int g(x)\,dx, \]
and the difference rule by setting $k = 1$ and $l = -1$ so that
\[ \int [f(x) - g(x)]\,dx = \int [1f(x) + (-1)g(x)]\,dx = 1\int f(x)\,dx + (-1)\int g(x)\,dx = \int f(x)\,dx - \int g(x)\,dx. \]

Solution to activity 5.2


Suppose that $F(x)$ and $G(x)$ are antiderivatives of $f(x)$ and $g(x)$ respectively, i.e.
\[ \frac{dF}{dx} = f(x) \quad\text{and}\quad \frac{dG}{dx} = g(x). \]
This means that
\[ k\int f(x)\,dx + l\int g(x)\,dx = kF(x) + lG(x) + c, \]
where $c$ is an arbitrary constant. But, by the linear combination rule for differentiation, we also have
\[ \frac{d}{dx}\left[kF(x) + lG(x) + c\right] = k\frac{dF}{dx} + l\frac{dG}{dx} + 0 = kf(x) + lg(x), \]
which means that $kF(x) + lG(x) + c$ is also an antiderivative of $kf(x) + lg(x)$, i.e.
\[ \int [kf(x) + lg(x)]\,dx = kF(x) + lG(x) + c. \]
Consequently, we have
\[ \int [kf(x) + lg(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx, \]
as they have the same antiderivatives.


Solution to activity 5.3


For (a), use the constant multiple rule to see that
\[ \int 3\cos x\,dx = 3\int \cos x\,dx = 3\sin x + c, \]
where $c$ is an arbitrary constant. For (b), we use the sum rule to see that
\[ \int (e^x + \cos x)\,dx = \int e^x\,dx + \int \cos x\,dx = e^x + \sin x + c, \]
where $c$ is an arbitrary constant. For (c), we use the linear combination rule to see that
\[ \int \left(3\sin x - \frac{3}{x}\right) dx = 3\int \sin x\,dx - 3\int \frac{1}{x}\,dx = 3(-\cos x) - 3\ln|x| + c = -3\cos x - 3\ln|x| + c, \]
where $c$ is an arbitrary constant.


Solution to activity 5.4
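For both of these integrals we use the substitution $g = 4x + 7$ so that we have
\[ \frac{dg}{dx} = 4 \implies dg = 4\,dx \implies dx = \frac{1}{4}\,dg. \]
Hence making this substitution in the first integral we get
\[ \int \frac{1}{4x + 7}\,dx = \int \frac{1}{g}\left(\frac{1}{4}\,dg\right) = \frac{1}{4}\int \frac{1}{g}\,dg = \frac{1}{4}\ln|g| + c = \frac{1}{4}\ln|4x + 7| + c, \]
where $c$ is an arbitrary constant whereas, in the second integral, we get
\[ \int e^{4x+7}\,dx = \int e^{g}\left(\frac{1}{4}\,dg\right) = \frac{1}{4}\int e^{g}\,dg = \frac{1}{4}e^{g} + c = \frac{1}{4}e^{4x+7} + c, \]
where $c$ is an arbitrary constant.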


Solution to activity 5.5
Using the standard integrals as a source of antiderivatives, we see that, if $n \neq -1$ (and $a \neq 0$),
\[ \int (ax + b)^n\,dx = \frac{1}{a}\,\frac{(ax + b)^{n+1}}{n+1} + c, \]
whereas, if $n = -1$, we have
\[ \int (ax + b)^{-1}\,dx = \int \frac{1}{ax + b}\,dx = \frac{1}{a}\ln|ax + b| + c, \]
where $c$ is an arbitrary constant. We also have
\[ \int e^{ax+b}\,dx = \frac{1}{a}e^{ax+b} + c, \qquad \int \sin(ax + b)\,dx = -\frac{1}{a}\cos(ax + b) + c \qquad\text{and}\qquad \int \cos(ax + b)\,dx = \frac{1}{a}\sin(ax + b) + c, \]
where $c$ is an arbitrary constant.
Of course, if $a = 0$, then the dependence on $x$ in the integrand disappears and so we are just integrating a constant, i.e. we have
\[ \int b^n\,dx = x\,b^n + c, \]
for any $n$, as well as
\[ \int e^{b}\,dx = x\,e^{b} + c, \qquad \int \sin b\,dx = x\sin b + c \qquad\text{and}\qquad \int \cos b\,dx = x\cos b + c, \]
where $c$ is an arbitrary constant.
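One way to gain confidence in formulas like these is to differentiate each claimed antiderivative numerically and compare it with the integrand. The sketch below does this with a central difference; the values of `a`, `b`, `n` and the test point `x0` are arbitrary illustrative choices (with a non-zero and n not equal to -1), and the `derivative` helper is not part of the guide.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

a, b, n = 4.0, 7.0, 3  # illustrative values with a != 0 and n != -1

# Each pair is (integrand, claimed antiderivative) for the linear-argument rules.
pairs = [
    (lambda x: (a * x + b) ** n, lambda x: (a * x + b) ** (n + 1) / (a * (n + 1))),
    (lambda x: 1 / (a * x + b), lambda x: math.log(abs(a * x + b)) / a),
    (lambda x: math.exp(a * x + b), lambda x: math.exp(a * x + b) / a),
    (lambda x: math.sin(a * x + b), lambda x: -math.cos(a * x + b) / a),
    (lambda x: math.cos(a * x + b), lambda x: math.sin(a * x + b) / a),
]

x0 = 0.3
# Relative error between the numerical derivative and the integrand at x0.
errors = [abs(derivative(F, x0) - f(x0)) / max(1.0, abs(f(x0))) for f, F in pairs]
print(all(err < 1e-6 for err in errors))
```

A check at a single point is not a proof, of course, but it does catch slips such as a missing factor of 1/a or a wrong sign.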

Solution to activity 5.6


Using what we saw in Activity 5.5 we see that the integrals from Activity 5.4 are, simply,
\[ \int \frac{1}{4x + 7}\,dx = \frac{1}{4}\ln|4x + 7| + c \quad\text{and}\quad \int e^{4x+7}\,dx = \frac{1}{4}e^{4x+7} + c, \]
where $c$ is an arbitrary constant. This is, of course, exactly what we found in Activity 5.4.
Solution to activity 5.7
Taking $g = 3x^2 + 7$ we have $g'(x) = 6x$ and so $dg = 6x\,dx$, i.e. $x\,dx = \frac{1}{6}\,dg$. Hence, in the first integral, this substitution gives
\[ \int \frac{x}{3x^2 + 7}\,dx = \int \frac{1}{g}\left(\frac{1}{6}\,dg\right) = \frac{1}{6}\int \frac{1}{g}\,dg = \frac{1}{6}\ln|g| + c = \frac{1}{6}\ln|3x^2 + 7| + c, \]
where $c$ is an arbitrary constant whereas, in the second integral, this substitution gives
\[ \int x\,e^{3x^2+7}\,dx = \int e^{g}\left(\frac{1}{6}\,dg\right) = \frac{1}{6}\int e^{g}\,dg = \frac{1}{6}e^{g} + c = \frac{1}{6}e^{3x^2+7} + c, \]
where $c$ is an arbitrary constant. In both cases, note that the extra $x$ in the integrand was actually needed for the substitution $g = 3x^2 + 7$ to work.
Solution to activity 5.8
Here the composition is $\sin(x^2)$ and so we take $g = x^2$. As such, we have
\[ \frac{dg}{dx} = 2x \implies x\,dx = \frac{1}{2}\,dg, \]
which is a constant multiple of the other part of the product in the integrand, i.e. this substitution will work. Thus, the substitution gives
\[ \int x\sin(x^2)\,dx = \int \sin(g)\left(\frac{1}{2}\,dg\right) = \frac{1}{2}\int \sin(g)\,dg = -\frac{1}{2}\cos(g) + c = -\frac{1}{2}\cos(x^2) + c, \]
where $c$ is an arbitrary constant. Here, of course, the extra $x$ in the integrand was needed for the substitution $g = x^2$ to work.


Solution to activity 5.9


Here the composition is $\cos^2 x$ and so we take $g = \cos x$. As such, we have
\[ \frac{dg}{dx} = -\sin x, \]
which, up to a minus, is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
\[ dg = -\sin x\,dx, \]
and so the substitution gives
\[ \int \cos^2 x \sin x\,dx = \int g^2\,(-dg) = -\int g^2\,dg = -\frac{g^3}{3} + c = -\frac{1}{3}\cos^3 x + c. \]
Here, of course, the extra $\sin x$ in the integrand was needed for the substitution $g = \cos x$ to work.
Solution to activity 5.10
In Activity 2.4, we saw that
\[ \cot x = \frac{\cos x}{\sin x}, \]
which means that the composition is $(\sin x)^{-1}$ and so we take $g = \sin x$. As such, we have
\[ \frac{dg}{dx} = \cos x, \]
which is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
\[ dg = \cos x\,dx, \]
and so the substitution gives
\[ \int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx = \int \frac{dg}{g} = \ln|g| + c = \ln|\sin x| + c. \]
Here, of course, the extra $\cos x$ in the integrand was needed for the substitution $g = \sin x$ to work.
Solution to activity 5.11
We note that the quadratic expression in the denominator can be written as
\[ x^2 + 2x + 2 = (x + 1)^2 + 1, \]
if we complete the square. As such, we have
\[ \int \frac{dx}{x^2 + 2x + 2} = \int \frac{dx}{(x + 1)^2 + 1} = \tan^{-1}(x + 1) + c, \]
using the result we derived in Example 5.17. (A useful exercise at this point is to try and get this answer by actually making the substitution $x + 1 = \tan\theta$ as we did in that example.)


Solution to activity 5.12


Using the change of base formula for logarithms from Section 2.1.4, i.e.
\[ \log_a(x) = \frac{\ln x}{\ln a}, \]
we have
\[ \int \log_a x\,dx = \frac{1}{\ln a}\int \ln x\,dx = \frac{1}{\ln a}\left[x\ln(x) - x\right] + c = x\log_a(x) - \frac{x}{\ln a} + c, \]
where $c$ is an arbitrary constant.


Solution to activity 5.13
To find $\int x\ln x\,dx$ the other way, i.e. by choosing
\[ f(x) = x \quad\text{and}\quad g'(x) = \ln x, \]
we differentiate $f(x)$ and integrate $g'(x)$ using the result in Example 5.20 to get
\[ f'(x) = 1 \quad\text{and}\quad g(x) = x\ln x - x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives
\[ \begin{aligned}
\int x\ln x\,dx &= x(x\ln x - x) - \int (1)(x\ln x - x)\,dx \\
&= x^2\ln x - x^2 - \int x\ln x\,dx + \frac{x^2}{2} + c \\
&= x^2\ln x - \frac{x^2}{2} - \int x\ln x\,dx + c,
\end{aligned} \]
so that, taking the integral on the right-hand-side over to the left-hand-side, we have
\[ 2\int x\ln x\,dx = x^2\ln x - \frac{x^2}{2} + c \implies \int x\ln x\,dx = \frac{x^2}{2}\ln x - \frac{x^2}{4} + c, \]
where $c$ is an arbitrary constant. Notice that this is the same as the answer we found in Example 5.19 but it is slightly trickier to get and we need to know the answer to Example 5.20.
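As a rough numerical cross-check of this result, one can compare a definite integral of $x\ln x$ computed by Simpson's rule with the difference of the antiderivative at the end-points. This is only a sketch: the interval $[1, 2]$ and the helper names `simpson` and `F` are illustrative assumptions, not part of the activity.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def F(x):
    """Antiderivative found above: (x^2 / 2) ln x - x^2 / 4."""
    return x * x / 2 * math.log(x) - x * x / 4

numeric = simpson(lambda x: x * math.log(x), 1.0, 2.0)
exact = F(2.0) - F(1.0)  # equals 2 ln 2 - 3/4
print(abs(numeric - exact) < 1e-9)
```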
Solution to activity 5.14
Unlike what we saw in Example 5.21, it would actually make more sense to find
\[ \int (x^2 + 1)^2 x^2\,dx, \]
by multiplying out the brackets and integrating term-by-term rather than integrating it by parts. Doing this, we get
\[ \int (x^2 + 1)^2 x^2\,dx = \int (x^4 + 2x^2 + 1)x^2\,dx = \int (x^6 + 2x^4 + x^2)\,dx = \frac{x^7}{7} + \frac{2}{5}x^5 + \frac{x^3}{3} + c, \]
where $c$ is an arbitrary constant. Indeed, to verify that this is the same answer as the one we saw in the example, it is easiest to take the earlier answer and note that
\[ \begin{aligned}
\frac{x^3}{3}(x^2 + 1)^2 - \frac{4}{3}\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c
&= \frac{x^3}{3}(x^4 + 2x^2 + 1) - \frac{4}{3}\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c \\
&= \frac{x^7}{3} + \frac{2}{3}x^5 + \frac{x^3}{3} - \frac{4}{21}x^7 - \frac{4}{15}x^5 + c \\
&= \frac{x^7}{7} + \frac{2}{5}x^5 + \frac{x^3}{3} + c,
\end{aligned} \]
which is what we got above.


Solution to activity 5.15

To find this integral we also use the other double-angle formula from Activity 2.18, namely
\[ \cos(2x) = 2\cos^2 x - 1 \implies \cos^2 x = \frac{1}{2}\left[1 + \cos(2x)\right], \]
as this allows us to write the problematic integrand $\cos^2 x$ in terms of the function $\cos(2x)$ which is far easier to integrate. This means that we have
\[ \int \cos^2 x\,dx = \frac{1}{2}\int \left[1 + \cos(2x)\right] dx = \frac{1}{2}\left[x + \frac{1}{2}\sin(2x)\right] + c, \]
where $c$ is an arbitrary constant.


Solution to activity 5.16
Using the first step, we can see that
\[ \int_a^b f(x)\,dx = \Big[F(x) + c\Big]_a^b, \]
as $F(x) + c$ is also an antiderivative of $f(x)$ if $c$ is a constant. Then, using the second step (and bearing in mind that a constant such as $c$, when evaluated at either $x = a$ or $x = b$, is just $c$), we get
\[ \Big[F(x) + c\Big]_a^b = \big[F(b) + c\big] - \big[F(a) + c\big] = F(b) - F(a), \]
which is exactly what we wanted. That is, including a constant of integration does not affect the value of a definite integral and so we can omit it.
Solution to activity 5.17
For definite integrals, it should be easy to see that we have the
constant multiple rule: If $k$ is a constant and $f(x)$ is a function, then
\[ \int_a^b kf(x)\,dx = k\int_a^b f(x)\,dx. \]
sum rule: If $f(x)$ and $g(x)$ are functions, then
\[ \int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]
difference rule: If $f(x)$ and $g(x)$ are functions, then
\[ \int_a^b [f(x) - g(x)]\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx. \]

Solution to activity 5.18


Suppose that $F(x)$ and $G(x)$ are antiderivatives of $f(x)$ and $g(x)$ respectively, i.e.
\[ \frac{dF}{dx} = f(x) \quad\text{and}\quad \frac{dG}{dx} = g(x). \]
This means that
\[ k\int_a^b f(x)\,dx + l\int_a^b g(x)\,dx = k\Big[F(x)\Big]_a^b + l\Big[G(x)\Big]_a^b = k\big[F(b) - F(a)\big] + l\big[G(b) - G(a)\big]. \]
But, by the linear combination rule for differentiation, we also have
\[ \frac{d}{dx}\big[kF(x) + lG(x)\big] = k\frac{dF}{dx} + l\frac{dG}{dx} = kf(x) + lg(x), \]
which means that $kF(x) + lG(x)$ is also an antiderivative of $kf(x) + lg(x)$, i.e.
\[ \int_a^b [kf(x) + lg(x)]\,dx = \Big[kF(x) + lG(x)\Big]_a^b = \big[kF(b) + lG(b)\big] - \big[kF(a) + lG(a)\big]. \]
Consequently, we have
\[ \int_a^b [kf(x) + lg(x)]\,dx = k\int_a^b f(x)\,dx + l\int_a^b g(x)\,dx, \]
as they have the same values.


Solution to activity 5.19
As the region in Example 5.31 is symmetric about the y-axis it should be clear that we have an area given by
\[ \int_{-1}^{1} (4 - x^2)\,dx = \int_{-1}^{0} (4 - x^2)\,dx + \int_{0}^{1} (4 - x^2)\,dx, \]
where the values of the two integrals on the right-hand-side, i.e. the areas they represent, are equal. As such, we can write
\[ \int_{-1}^{1} (4 - x^2)\,dx = 2\int_{0}^{1} (4 - x^2)\,dx, \]
if we decide to find the second of these integrals. Then, looking at the integral on the right-hand-side, we get
\[ \int_{0}^{1} (4 - x^2)\,dx = \left[4x - \frac{x^3}{3}\right]_0^1 = \left[4(1) - \frac{(1)^3}{3}\right] - \left[4(0) - \frac{(0)^3}{3}\right] = \frac{11}{3}, \]
so that, multiplying this by two, we get an area of 22/3 as before.
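The symmetry argument can be checked exactly using the antiderivative $4x - x^3/3$. This is a small sketch only, and the function name `F` is simply an illustrative label for that antiderivative.

```python
from fractions import Fraction

def F(x):
    """Antiderivative 4x - x^3/3 of 4 - x^2, evaluated exactly."""
    x = Fraction(x)
    return 4 * x - x ** 3 / 3

full = F(1) - F(-1)   # integral over the whole interval [-1, 1]
half = F(1) - F(0)    # integral over the right-hand half [0, 1]

print(full, 2 * half)  # 22/3 22/3
```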


Solution to activity 5.20
Looking at the triangles in Figure 5.5(b), we use 'half times base times height' to see that the area of the triangle on the left is
\[ \frac{1}{2} \times 2 \times 4 = 4, \]
and the area of the triangle on the right is also given by
\[ \frac{1}{2} \times 2 \times 4 = 4. \]
As such, the total area is eight as we found in Example 5.33.
Solution to activity 5.21
A sketch of the inverse supply and demand functions from Example 5.40 is given in
Figure 5.10 and the shaded regions are the consumer and producer surpluses as
indicated. Notice that we have also labelled the equilibrium price and quantity, which
we found in the example, on the sketch. Indeed, from this sketch it should be clear that:
The consumer surplus, CS, is the area of a triangle of base 60 and height 20, i.e. we can use 'half times base times height' to see that
\[ CS = \frac{1}{2} \times 60 \times 20 = 600, \]
and this agrees with what we found in the example.
The producer surplus, PS, is the area of a triangle of base 60 and height 30, i.e. we can use 'half times base times height' to see that
\[ PS = \frac{1}{2} \times 60 \times 30 = 900, \]
and this agrees with what we found in the example.


Solution to activity 5.22
As the demand equation is $p(q + 1) = 231$, we see that the inverse demand function is
\[ p_D(q) = \frac{231}{q + 1}, \]
and an equilibrium quantity, $q^*$, of 10 then gives us an equilibrium price of $p^* = p_D(q^*) = 21$. This means that, using
\[ CS = \int_0^{q^*} p_D(q)\,dq - p^* q^*, \]

Figure 5.10: A sketch of the consumer and producer surpluses for Activity 5.21.

we need to find
\[ \int_0^{10} \frac{231}{q + 1}\,dq = \Big[231\ln|q + 1|\Big]_0^{10} = 231\ln 11 - 231\ln 1 = 231\ln 11, \]
as $\ln 1 = 0$, and this gives us
\[ CS = 231\ln 11 - (21)(10) = 231\ln 11 - 210, \]
for the consumer surplus.
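To get a feel for the size of this answer, the expression can be evaluated numerically. The sketch below assumes nothing beyond the demand equation above; the name `inverse_demand` is an illustrative label.

```python
import math

def inverse_demand(q):
    """Inverse demand p_D(q) = 231 / (q + 1) from Activity 5.22."""
    return 231 / (q + 1)

q_star = 10
p_star = inverse_demand(q_star)  # equilibrium price

# Area under the inverse demand curve on [0, q*] is 231 ln(q* + 1).
area_under_demand = 231 * math.log(q_star + 1)
consumer_surplus = area_under_demand - p_star * q_star

print(p_star, round(consumer_surplus, 2))
```

So the consumer surplus $231\ln 11 - 210$ is roughly 343.9.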

Exercises
Exercise 5.1
Find the following indefinite integrals.
(a) $\int \sin^3 x \cos x\,dx$,
(b) $\int \sin^3 x\,dx$,
(c) $\int (x + 2)\ln x\,dx$.

Exercise 5.2
Find $\displaystyle\int \frac{\sqrt{1 + e^{-x/2}}}{\sqrt{e^{x}}}\,dx$.

Exercise 5.3
Find $\displaystyle\int \frac{x^2}{x^2 - 1}\,dx$.
Exercise 5.4
Use the substitution $t = \tan\dfrac{x}{2}$ to evaluate
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x}. \]

Exercise 5.5
Find the area of the region between the curve $y = x^3$, the x-axis and the vertical lines $x = -1$ and $x = 2$.


Solutions to exercises
Solution to exercise 5.1
For (a), we have to find
\[ \int \sin^3 x \cos x\,dx, \]
and we notice that the integrand involves the composition $\sin^3 x$. This suggests that we should make the substitution $g = \sin x$ and, as this gives us
\[ \frac{dg}{dx} = \cos x \implies dg = \cos x\,dx, \]
which is the other part of the product in the integrand, we can be sure that this will work. So, using this substitution we get
\[ \int \sin^3 x \cos x\,dx = \int g^3\,dg = \frac{g^4}{4} + c = \frac{1}{4}\sin^4 x + c, \]
where $c$ is an arbitrary constant.
For (b), we have to find
\[ \int \sin^3 x\,dx, \]
and we note that, using the trigonometric identity $\sin^2 x = 1 - \cos^2 x$ from (2.2), this can be written as
\[ \int \sin^3 x\,dx = \int (1 - \cos^2 x)\sin x\,dx = \int \sin x\,dx - \int \cos^2 x \sin x\,dx. \]
Of course, the first of these integrals on the right-hand-side is trivial and the other was found in Activity 5.9. So, using this, we find that
\[ \int \sin^3 x\,dx = \int \sin x\,dx - \int \cos^2 x \sin x\,dx = -\cos x - \left(-\frac{1}{3}\cos^3 x\right) + c = -\cos x + \frac{1}{3}\cos^3 x + c, \]
where $c$ is an arbitrary constant.


For (c), we have to find
\[ \int (x + 2)\ln x\,dx, \]
and we note that the integrand is a product. This suggests that we should use integration by parts with
\[ f(x) = \ln x \quad\text{and}\quad g'(x) = x + 2, \]
like we did in Example 5.19. So, differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac{1}{x} \quad\text{and}\quad g(x) = \frac{x^2}{2} + 2x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives,
\[ \int (x + 2)\ln x\,dx = (\ln x)\left(\frac{x^2}{2} + 2x\right) - \int \left(\frac{x^2}{2} + 2x\right)\frac{1}{x}\,dx = \frac{x}{2}(x + 4)\ln x - \int \left(\frac{x}{2} + 2\right) dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int (x + 2)\ln x\,dx = \frac{x}{2}(x + 4)\ln x - \int \left(\frac{x}{2} + 2\right) dx = \frac{x}{2}(x + 4)\ln x - \left(\frac{x^2}{4} + 2x\right) + c, \]
where $c$ is an arbitrary constant.


Solution to exercise 5.2
It makes sense to start by rewriting the integral so that we have
\[ \int \frac{\sqrt{1 + e^{-x/2}}}{\sqrt{e^{x}}}\,dx = \int \sqrt{1 + e^{-x/2}}\;e^{-x/2}\,dx, \]
since, in this form, we can see that we have the composition $\sqrt{1 + e^{-x/2}}$ in the integrand. This suggests that we should make the substitution $g = 1 + e^{-x/2}$ and, as this gives us
\[ \frac{dg}{dx} = -\frac{1}{2}e^{-x/2} \implies -2\,dg = e^{-x/2}\,dx, \]
which is the other part of the product in the integrand, we can be sure that this will work. So, using this substitution we get
\[ \int \sqrt{1 + e^{-x/2}}\;e^{-x/2}\,dx = \int g^{1/2}\,(-2\,dg) = -2\int g^{1/2}\,dg = -2\,\frac{g^{3/2}}{3/2} + c = -\frac{4}{3}\left(1 + e^{-x/2}\right)^{3/2} + c, \]
where $c$ is an arbitrary constant.
Solution to exercise 5.3
The integral
\[ \int \frac{x^2}{x^2 - 1}\,dx, \]
has an integrand which is the quotient of two polynomials. But, as these have the same degree, we cannot use the method of partial fractions on it as it stands. Instead, we start by rewriting the integrand as
\[ \frac{x^2}{x^2 - 1} = \frac{x^2 - 1 + 1}{x^2 - 1} = 1 + \frac{1}{x^2 - 1}, \]
so that now, we can use the method of partial fractions on
\[ \frac{1}{x^2 - 1}, \]
as the degree of its numerator is less than the degree of its denominator. That is, since $x^2 - 1 = (x - 1)(x + 1)$, we have distinct linear factors and so we can write
\[ \frac{1}{x^2 - 1} = \frac{1}{(x - 1)(x + 1)} = \frac{A_1}{x - 1} + \frac{A_2}{x + 1}, \]
for some constants $A_1$ and $A_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{1}{(x - 1)(x + 1)} = \frac{A_1(x + 1) + A_2(x - 1)}{(x - 1)(x + 1)}, \]
and so, comparing the numerators, we need
\[ 1 = A_1(x + 1) + A_2(x - 1). \]
Indeed, setting $x = 1$ on both sides, we see that $1 = 2A_1$ whereas setting $x = -1$ on both sides, we see that $1 = -2A_2$. Thus, we have
\[ \frac{1}{x^2 - 1} = \frac{1/2}{x - 1} + \frac{-1/2}{x + 1}, \]
using the values of $A_1$ and $A_2$ that we have found. Consequently, putting this all together, we find that
\[ \int \frac{x^2}{x^2 - 1}\,dx = \int \left(1 + \frac{1/2}{x - 1} + \frac{-1/2}{x + 1}\right) dx = x + \frac{1}{2}\ln|x - 1| - \frac{1}{2}\ln|x + 1| + c, \]
where $c$ is an arbitrary constant.
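The decomposition used here can be checked exactly at a handful of sample points. This is just a sketch: the sample values are arbitrary rational numbers chosen to avoid $x = \pm 1$, and `lhs` and `rhs` are illustrative names for the two sides of the identity.

```python
from fractions import Fraction

A1, A2 = Fraction(1, 2), Fraction(-1, 2)  # constants found above

def lhs(x):
    """Original integrand x^2 / (x^2 - 1)."""
    return x * x / (x * x - 1)

def rhs(x):
    """Rewritten integrand 1 + A1/(x - 1) + A2/(x + 1)."""
    return 1 + A1 / (x - 1) + A2 / (x + 1)

# Rational sample points, none of which equals 1 or -1.
samples = [Fraction(k, 3) for k in (-10, -5, -2, 2, 5, 10)]
print(all(lhs(x) == rhs(x) for x in samples))  # True
```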


Solution to exercise 5.4
We are asked to evaluate the definite integral
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x}, \]
using the substitution $t = \tan(x/2)$. This substitution, like the substitution $t = \tan\theta$ that we saw in Example 5.28, is very useful and so we start by seeing how it can be applied. Firstly, we note that we can easily write $\sin(x/2)$ and $\cos(x/2)$ in terms of $t$ by using a right-angled triangle like the one in Figure 5.1 as this immediately tells us that
\[ \sin\frac{x}{2} = \frac{t}{\sqrt{1 + t^2}} \quad\text{and}\quad \cos\frac{x}{2} = \frac{1}{\sqrt{1 + t^2}}. \]
So, using the double-angle formula $\cos(2x) = \cos^2 x - \sin^2 x$ from (2.6), we see that the denominator of our integrand can be written as
\[ 2 + \cos x = 2 + \cos^2\frac{x}{2} - \sin^2\frac{x}{2} = 2 + \frac{1}{1 + t^2} - \frac{t^2}{1 + t^2} = \frac{3 + t^2}{1 + t^2}. \]
Secondly, differentiating both sides of $t = \tan(x/2)$ with respect to $x$, we get
\[ \frac{dt}{dx} = \frac{1}{2}\sec^2\frac{x}{2}, \]
and so, since $\sec(x/2)$ is the reciprocal of $\cos(x/2)$ as we saw in Section 2.1.2, we have
\[ dx = \frac{2\,dt}{1 + t^2}, \]
in terms of $t$. Thirdly, as this is a definite integral, we also have to change the limits of integration, i.e.
lower limit: $x = 0$ gives $t = \tan 0 = 0$, and
upper limit: $x = \pi/2$ gives $t = \tan(\pi/4) = 1$.
Thus, returning to the integral, we have
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x} = \int_0^1 \frac{1 + t^2}{3 + t^2}\cdot\frac{2\,dt}{1 + t^2} = 2\int_0^1 \frac{dt}{3 + t^2}, \]
and, using the result we found in Example 5.17, this gives us
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x} = 2\left[\frac{1}{\sqrt{3}}\tan^{-1}\frac{t}{\sqrt{3}}\right]_0^1 = \frac{2}{\sqrt{3}}\left[\tan^{-1}\frac{1}{\sqrt{3}} - \tan^{-1} 0\right] = \frac{2}{\sqrt{3}}\left[\frac{\pi}{6} - 0\right], \]
and so $\dfrac{\pi}{3\sqrt{3}}$ is the answer.
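A quick numerical check helps confirm that the substitution has been carried out correctly. The sketch below approximates both the original integral in $x$ and the transformed integral in $t$ using Simpson's rule and compares them with $\pi/(3\sqrt{3})$; the `simpson` helper and its step count are illustrative choices.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Original integral in x over [0, pi/2].
original = simpson(lambda x: 1 / (2 + math.cos(x)), 0.0, math.pi / 2)
# Transformed integral in t over [0, 1], after t = tan(x/2).
substituted = simpson(lambda t: 2 / (3 + t * t), 0.0, 1.0)
# Closed form obtained above.
exact = math.pi / (3 * math.sqrt(3))

print(abs(original - exact) < 1e-8, abs(substituted - exact) < 1e-8)
```

Agreement between all three values suggests that the integrand, the factor $dx = 2\,dt/(1+t^2)$ and the new limits were all handled consistently.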

Solution to exercise 5.5


To find the area of the region between the curve $y = x^3$, the x-axis and the vertical lines $x = -1$ and $x = 2$, we note that the curve will be similar to what we saw in Figure 2.2(c) and so the region we are looking at is the one illustrated in Figure 5.11.
Figure 5.11: The hatching indicates the region of interest in Exercise 5.5.

In particular, we see that the curve crosses the x-axis when $x = 0$ and that the function is non-positive when $-1 \le x \le 0$ and non-negative when $0 \le x \le 2$. As such, we split the total region into two sub-regions to see that:
Between $x = -1$ and $x = 0$ we evaluate the definite integral,
\[ \int_{-1}^{0} x^3\,dx = \left[\frac{x^4}{4}\right]_{-1}^{0} = \frac{0^4}{4} - \frac{(-1)^4}{4} = -\frac{1}{4}. \]
Thus, the area is $\frac{1}{4}$ here as we have a non-positive integrand.
Between $x = 0$ and $x = 2$ we evaluate the definite integral,
\[ \int_{0}^{2} x^3\,dx = \left[\frac{x^4}{4}\right]_{0}^{2} = \frac{2^4}{4} - \frac{0^4}{4} = \frac{16}{4} = 4. \]
Thus, the area is 4 here as we have a non-negative integrand.
Consequently, the total area of this region is $\frac{1}{4} + 4 = \frac{17}{4}$.
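The reason for splitting the region at $x = 0$ can be seen by comparing the area with the signed integral over the whole interval. The following sketch uses exact fractions; the name `F` is an illustrative label for the antiderivative $x^4/4$.

```python
from fractions import Fraction

def F(x):
    """Antiderivative x^4 / 4 of x^3, evaluated exactly."""
    return Fraction(x) ** 4 / 4

left = F(0) - F(-1)    # signed integral over [-1, 0], which is negative
right = F(2) - F(0)    # signed integral over [0, 2], which is positive

area = abs(left) + abs(right)   # area counts both pieces positively
signed = left + right           # integrating straight across [-1, 2]

print(area, signed)  # 17/4 15/4
```

Integrating straight across the interval gives 15/4 rather than the area 17/4, because the part of the region below the x-axis is subtracted instead of added.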


Chapter 6
Functions of several variables
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 3.1-3.9.
Anthony and Biggs (1996) Chapters 11 and 12.
Further reading
Simon and Blume (1994) parts of 13.1-13.2, parts of 14.1-14.6 and 14.8, parts of 15.1-15.2.
Adams and Essex (2010) parts of Chapter 12.
Aims and objectives
The objectives of this chapter are as follows.
To understand that functions of two variables represent surfaces and see how to
visualise these surfaces using sections and contours.
To introduce partial derivatives and use them in various contexts.
To introduce tangent planes, gradient vectors, directional derivatives and Taylor
series for functions of two variables.
Specific learning outcomes can be found near the end of this chapter.

6.1 Introduction

In Section 2.1, we saw that a function $f : \mathbb{R} \to \mathbb{R}$ was a rule which takes an input, $x \in \mathbb{R}$, and gives us a unique output, $f(x) \in \mathbb{R}$. We now turn our attention to functions of two variables, i.e. functions where the input consists of a pair of numbers, $(x, y) \in \mathbb{R}^2$, and whose output is a unique number $f(x, y) \in \mathbb{R}$.$^1$ In particular, we will mainly be concerned with functions of two variables where the variables are independent, i.e. the value of $x$ can be chosen independently of the value of $y$ and vice versa. As we shall see, functions of two variables often occur in economics and other fields where we might wish to apply mathematical techniques. Two important examples of such functions from economics are:
The production function of a firm, $q(k, l)$, gives the amount it produces when using $k$ units of capital and $l$ units of labour.
The utility function of a consumer, $u(x_1, x_2)$, describes how much utility a consumer derives from a bundle $(x_1, x_2)$ of two goods. As such it enables us to compare the preferences of the consumer when he is confronted with different combinations of these two goods.
These applications will be discussed later because, before we consider what we may want to use them for, we want to know how we can visualise what is going on when we have a function of two variables.

$^1$ The theory we consider extends to the general case where the input consists of $n$ numbers $(x_1, x_2, \ldots, x_n)$. This extension to functions of $n$ variables (with $n \ge 3$) should be obvious and so we do not spend much time on it here. However, although we will mainly be dealing with the two-variable case, we will occasionally consider functions of more than two variables.

6.2 Surfaces

Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function of the two independent variables $x$ and $y$. We can think of


any input (a, b) as a point in the (x, y)-plane and the output will be the corresponding
value of f , i.e. f (a, b), which we can take to be the number c. That is, generally
speaking, each point (x, y) in the (x, y)-plane will have an output given by the
corresponding value of f , i.e. f (x, y), which we can take to be the value of another
variable z. As such, to visualise a function of two variables we need three axes, two to
represent the inputs, i.e. x and y, and one to represent the output, i.e. z. Drawing these
as in Figure 6.1, we take the (x, y)-plane of the inputs to correspond to points where
z = 0, i.e. the input (a, b) is represented on our axes by the point (a, b, 0), and then the
output of z = f (x, y) is represented on our axes by the point (a, b, c) which is a vertical
distance c above the point (a, b, 0) in the (x, y)-plane.
If we do this for all possible inputs $(x, y) \in \mathbb{R}^2$ we obtain a surface in three-dimensional space whose equation is given by $z = f(x, y)$. For instance, the surfaces obtained from three different functions of two variables, namely
\[ f(x, y) = x^2 + y^2, \qquad g(x, y) = x^2 - y^2 \qquad\text{and}\qquad h(x, y) = -x^2 - y^2, \]
are illustrated in Figures 6.2(a), (b) and (c) respectively.


Of course, it would be difficult for us to sketch such surfaces by hand and, indeed, it is
hard enough to even contemplate how and why they look like they do without a
computer. But, as we shall soon see, it is possible to get some feel for what these
surfaces look like by thinking about how we can represent them in a two-dimensional
way. However, before we do that, let's take a moment to look at some far simpler
surfaces than the ones in Figure 6.2, namely those that can arise from linear functions
of two variables, as these turn out to be planes.

Figure 6.1: Representing the point $(a, b, c)$ using the x, y and z-axes in $\mathbb{R}^3$.

6.2.1 Planes

The simplest kind of two-variable function is one which is linear in x and y, i.e. where
z = f (x, y) = ax + by,
for some constants a and b. Such functions represent planes and, generally speaking,
any surface which has an equation of the form
ax + by + cz = d,
where at least one of the constants a, b and c is non-zero will represent a plane. For
what follows, the important kinds of plane are, basically, those that fall into the
following categories:
The (x, y), (y, z) and (x, z)-planes which have equations z = 0, x = 0 and y = 0
respectively. (These are the planes in the middle of the three planes illustrated in
Figures 6.3(a), (b) and (c) respectively.)
Planes parallel to the (x, y), (y, z) and (x, z)-planes which, for some constant c, will
have equations z = c, x = c and y = c respectively. (These are the other planes
illustrated in Figures 6.3(a), (b) and (c) respectively.)
Planes which don't fall into either of the above categories, i.e. those with equations of
the form
ax + by + cz = d,
for some constants a, b, c and d (where at least two of the constants a, b and c are
non-zero) will not overly concern us here even though you will come across them in
Section 2.11 of 173 Algebra.

Figure 6.2: Visualising the surfaces (a) $z = f(x, y) = x^2 + y^2$, (b) $z = g(x, y) = x^2 - y^2$ and (c) $z = h(x, y) = -x^2 - y^2$ in three-dimensions. In particular, observe how (c) is the reflection of (a) in the (x, y)-plane as $h(x, y) = -f(x, y)$.

Figure 6.3: Planes parallel to the (x, y), (y, z) and (x, z)-planes: (a) From bottom, $z = -10, 0, 10$; (b) From left $x = -10, 0, 10$ and (c) From right $y = -10, 0, 10$. (Note, in particular, how the axes are labelled in these pictures.)

6.2.2 Contours and sections

Although curve sketching (which is sketching the graph of a function of one variable) is
important in this course, you will not be asked to sketch surfaces (such as the ones
illustrated above in Figure 6.2) for functions of two variables. However, there are useful
ways of visualising such surfaces which do not involve sketching them in three dimensions.
One of these is to use planes, such as the ones we saw in Figure 6.3, to carve up a
three-dimensional illustration of a surface into two-dimensional representations in terms
of contours and sections. In particular, these ideas may be familiar to you from your
experiences with maps (for contours) and other technical diagrams (for sections).
Horizontal planes and the contours of a surface
One way of visualising a surface is to look at its contours, which are the curves of
intersection that arise when we look at the points of intersection of a surface with
planes that are parallel to the (x, y)-plane. To find the contours, we take a plane


parallel to the (x, y)-plane, say the plane z = c, and find the curve of intersection
between it and the surface z = f (x, y), i.e. the curve with equation c = f (x, y). This
curve is the z = c contour, i.e. the set of points (x, y) which give z = c when we put
them into the equation z = f (x, y).
Example 6.1 Find the $z = 2$ contour of the surface $z = x - y + 4$. Repeat for $z = 4$ and $z = 6$.
To find the $z = 2$ contour of the surface $z = x - y + 4$ we need to find the curve of intersection, which in this case, is given by
\[ 2 = x - y + 4. \]
Rearranging this gives the equation $y = x + 2$ which is the equation of a straight line. Similarly, we find that:
For $z = 4$, the curve of intersection is given by $4 = x - y + 4$ which gives us $y = x$.
For $z = 6$, the curve of intersection is given by $6 = x - y + 4$ which gives us $y = x - 2$.
Thus, we see from these equations that these two contours are straight lines as well.
The surface and its contours are illustrated in Figure 6.4.
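The contour calculation in Example 6.1 amounts to solving $c = x - y + 4$ for $y$. The sketch below does this for the three levels in the example and confirms that points on each line really do map to the stated value of $z$; the function names `f` and `contour_y` are illustrative.

```python
def f(x, y):
    """The surface z = x - y + 4 from Example 6.1."""
    return x - y + 4

def contour_y(x, c):
    """y-value on the z = c contour at a given x, from solving c = x - y + 4."""
    return x + 4 - c

levels = (2, 4, 6)

# y-intercept of each contour line; every contour has gradient 1.
intercepts = {c: contour_y(0, c) for c in levels}

# Every sampled point on a contour should map back to its level c.
on_contour = all(f(x, contour_y(x, c)) == c for c in levels for x in range(-5, 6))

print(intercepts, on_contour)  # {2: 2, 4: 0, 6: -2} True
```

The intercepts 2, 0 and -2 correspond to the lines $y = x + 2$, $y = x$ and $y = x - 2$ found above.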
Figure 6.4: For Example 6.1. (a) The surface $z = x - y + 4$ and, from the bottom, the planes $z = 2, 4, 6$. (b) The curves of intersection of the surface and the planes in (a) with their corresponding values of $z$. (c) The contours: Each line represents a contour (i.e. the points with coordinates $(x, y)$ that map to a particular value of $z$); in this case, the further to the right the line is, the larger the corresponding value of $z$ is, as we have $z = 2, 4, 6$ as we move from left to right. Notice that, here, the contours are parallel straight lines (i.e. they have the same gradient but different y-intercepts).

Activity 6.1 Find the equations of the z = 10, z = 0 and z = 10 contours of the
surface z = 4x + 2y 2 and sketch these in the (x, y)-plane clearly labelling the
value of z which is associated with each contour.


Example 6.2 Find the $z = 16$ contour of the surface $z = x^2 + y^2$. What are the $z = c$ contours of the surface $z = x^2 + y^2$ when (i) $c > 0$, (ii) $c = 0$ and (iii) $c < 0$?
To find the $z = 16$ contour of the surface $z = x^2 + y^2$ we need to find the curve of intersection which, in this case, is simply
\[ x^2 + y^2 = 16. \]
This is the equation of a circle, centred on the origin, with a radius of four.
To find the $z = c$ contours in the three cases indicated we just need to find out what the curve
\[ x^2 + y^2 = c, \]
looks like in the three cases. So, we have:
If $c > 0$, the contour is a circle, centred on the origin, with a radius of $\sqrt{c}$.
If $c = 0$, the contour is the point $(0, 0)$ as this is the only solution to the equation $x^2 + y^2 = 0$.
If $c < 0$, there are no contours as we know that $x^2 + y^2 \ge 0$ for all values of $x$ and $y$.
In particular, notice that $z = 0$ is the smallest value of $z$ that arises from a point on this surface. The surface and three of its contours for $c > 0$ are illustrated in Figure 6.5.
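The three cases in Example 6.2 can be summarised in a small function. This is a sketch only: the function names and the tuple it returns are illustrative choices, and the sample points are generated from the usual parametrisation $(r\cos t, r\sin t)$ of a circle.

```python
import math

def contour_of_paraboloid(c):
    """Describe the z = c contour of z = x^2 + y^2 as a (kind, radius) pair."""
    if c > 0:
        return ("circle", math.sqrt(c))   # circle centred on the origin
    if c == 0:
        return ("point", 0.0)             # just the origin (0, 0)
    return ("empty", None)                # x^2 + y^2 is never negative

def sample_points(c, count=8):
    """A few points (x, y) lying on the z = c contour, if it exists."""
    kind, radius = contour_of_paraboloid(c)
    if kind == "empty":
        return []
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count)) for k in range(count)]

for c in (16, 0, -1):
    print(c, contour_of_paraboloid(c))
```

For $c = 16$ this reports a circle of radius 4.0, and each sampled point satisfies $x^2 + y^2 = 16$ up to rounding.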

Figure 6.5: For Example 6.2. (a) The surface $z = x^2 + y^2$, which we saw in Figure 6.2(a), and the planes $z = 4, 16, 25$. (b) The curves of intersection of the surface and the planes in (a) with their corresponding values of $z$. (c) The contours: Each circle represents a contour (i.e. the points with coordinates $(x, y)$ that map to a particular value of $z$); in this case, the larger the radius of the contour, the larger the corresponding value of $z$ as we have $z = 4, 16, 25$. Notice that, here, the contours are concentric circles (i.e. they have the same centre but different radii).


Activity 6.2 Find the z = 25 contour of the surface z = x2 y 2 . What are the
z = c contours of this surface when (i) c > 0, (ii) c = 0 and (iii) c < 0?
Vertical planes and the sections of a surface
Another way of visualising a surface is to look at its sections, which are the curves of
intersection that arise when we look at the points of intersection of a surface with
planes that are perpendicular to the (x, y)-plane. To find the sections, we take a plane
perpendicular to the (x, y)-plane and find the curve of intersection between it and the
surface z = f (x, y). In particular, in this course, we shall only need to consider sections
that arise from planes that are parallel to the (x, z)-plane (i.e. y = c for some constant
c) or parallel to the (y, z)-plane (i.e. x = c for some constant c).
As such, the easiest sections to sketch are the ones we get when we consider the (x, z)
and (y, z)-planes which are both perpendicular to the (x, y)-plane. In particular, we find
that the section which we get from the:
(x, z)-plane, which has the equation y = 0, is the curve of intersection between it
and the surface z = f (x, y), i.e. the curve with equation z = f (x, 0).
(y, z)-plane, which has the equation x = 0, is the curve of intersection between it
and the surface z = f (x, y), i.e. the curve with equation z = f (0, y).
Let's look at what these sections look like in the case of the two surfaces we considered above when we were looking for contours.

Example 6.3 Find the $(x, z)$ and $(y, z)$-sections of the surface $z = x - y + 4$.

To find these sections of the surface $z = x - y + 4$ we need to find the curves of intersection which, in this case, are given by:

For the $(x, z)$-section, we have $y = 0$ and so the curve of intersection is given by $z = x + 4$ and this is a straight line in the $(x, z)$-plane.

For the $(y, z)$-section, we have $x = 0$ and so the curve of intersection is given by $z = -y + 4$ and this is a straight line in the $(y, z)$-plane.

The surface and these sections are illustrated in Figure 6.6.

Activity 6.3 Find the $(x, z)$ and $(y, z)$-sections of the surface $z = 4x + 2y - 2$ and sketch these in the appropriate planes.
Example 6.4 Find the $(x, z)$ and $(y, z)$-sections of the surface $z = x^2 + y^2$.

To find these sections of the surface $z = x^2 + y^2$ we need to find the curves of intersection which, in this case, are given by:

For the $(x, z)$-section, we have $y = 0$ and so the curve of intersection is given by $z = x^2$ and this is a parabola in the $(x, z)$-plane.

For the $(y, z)$-section, we have $x = 0$ and so the curve of intersection is given by $z = y^2$ and this is a parabola in the $(y, z)$-plane.

The surface and these sections are illustrated in Figure 6.7.

Figure 6.6: For Example 6.3. (a) The surface $z = x - y + 4$ and the planes $x = 0$ (which goes diagonally from bottom left to top right) and $y = 0$ (which goes diagonally from top left to bottom right). (b) The $(x, z)$-section is the line $z = x + 4$. (c) The $(y, z)$-section is the line $z = -y + 4$.

Figure 6.7: For Example 6.4. (a) The surface $z = x^2 + y^2$ and the planes $x = 0$ (which goes diagonally from bottom left to top right) and $y = 0$ (which goes diagonally from top left to bottom right). (b) The $(x, z)$-section is the parabola $z = x^2$. (c) The $(y, z)$-section is the parabola $z = y^2$.

Activity 6.4 Find the (x, z) and (y, z)-sections of the surface z = x2 y 2 and
sketch these in the appropriate planes.
More generally, we may want to look at the sections we get when we consider planes
that are parallel to the (x, z) and (y, z)-planes which we considered above. In
particular, we find that the sections we get from the planes that are parallel to the:


$(x, z)$-plane, which have equations of the form $y = c$ where $c$ is a constant, are the curves of intersection between each such plane and the surface $z = f(x, y)$, i.e. the curves with equation $z = f(x, c)$.

$(y, z)$-plane, which have equations of the form $x = c$ where $c$ is a constant, are the curves of intersection between each such plane and the surface $z = f(x, y)$, i.e. the curves with equation $z = f(c, y)$.

Let's see what these sections look like in the case of the two surfaces we considered above.
Example 6.5 Find the $y = 0, 2, 4$ sections of the surface $z = x - y + 4$.

To find these sections of the surface $z = x - y + 4$ we need to find the curves of intersection which, in this case, are given by:

For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by $z = x + 4$ and this is a straight line in the $(x, z)$-plane. Of course, this is just the $(x, z)$-section we found in Example 6.3!

For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by $z = x - 2 + 4 = x + 2$ and this is a straight line.

For the $y = 4$ section, we have $y = 4$ and so the curve of intersection is given by $z = x - 4 + 4 = x$ and this is a straight line.

Observe that only the first of these sections lives in the $(x, z)$-plane, but we can sketch the other two in this plane to get a feel for how the surface is changing when we look at the sections $y = c$ for different values of $c$. The surface and these sections, when drawn in the $(x, z)$-plane, are illustrated in Figure 6.8.

Figure 6.8: For Example 6.5. (a) The surface $z = x - y + 4$ and the planes $y = 0$, $y = 2$ and $y = 4$ as we move from right to left. (b) The $y = 0$, $y = 2$ and $y = 4$ sections (as we move from top to bottom) all drawn in the $(x, z)$-plane. Note that the $y = 0$ section is the $(x, z)$-section and, of the three sections illustrated, this is the only one that really lives in the $(x, z)$-plane. Also notice that, as the value of $c$ increases when we look at the plane $y = c$, the value of the $z$-intercept decreases when we look at the section.
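The idea of fixing $y = c$ to obtain a function of $x$ alone translates directly into code. The sketch below, in which the helper name `y_section` is an assumption made for illustration, builds the three sections from Example 6.5 and records their $z$-intercepts.

```python
def surface(x, y):
    # The surface from Example 6.5: z = x - y + 4.
    return x - y + 4

def y_section(f, c):
    # The section of z = f(x, y) in the plane y = c, as a function of x only.
    return lambda x: f(x, c)

# Build the y = 0, 2, 4 sections and evaluate each at x = 0 (the z-intercept).
sections = {c: y_section(surface, c) for c in (0, 2, 4)}
z_intercepts = {c: g(0) for c, g in sections.items()}

for c, intercept in z_intercepts.items():
    print(f"y = {c}: z-intercept = {intercept}")
```

The intercepts fall as $c$ rises, which is the behaviour noted in the caption of Figure 6.8.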


Activity 6.5 Find the $x = 0, 2, 4$ sections of the surface $z = x - y + 4$ and sketch them in the $(y, z)$-plane. Of these three sections, which one have we found before and what did we call it? Of these three sections, which is the only one that really lives in the $(y, z)$-plane?
Activity 6.6 Consider the surface $z = 4x + 2y - 2$.

Find the $y = -2, 0, 2$ sections of this surface and sketch them in the $(x, z)$-plane.
Find the $x = -2, 0, 2$ sections of this surface and sketch them in the $(y, z)$-plane.
Example 6.6 Find the $x = 0, 1, 2$ sections of the surface $z = x^2 + y^2$.

To find these sections of the surface $z = x^2 + y^2$ we need to find the curves of intersection which, in this case, are given by:

For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by $z = y^2$ and this is a parabola in the $(y, z)$-plane. Of course, this is just the $(y, z)$-section we found in Example 6.4!

For the $x = 1$ section, we have $x = 1$ and so the curve of intersection is given by $z = 1 + y^2$ and this is a parabola.

For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by $z = 4 + y^2$ and this is a parabola.

Observe that only the first of these sections lives in the $(y, z)$-plane, but we can sketch the other two in this plane to get a feel for how the surface is changing when we look at the sections $x = c$ for different values of $c$. The surface and these sections, when drawn in the $(y, z)$-plane, are illustrated in Figure 6.9.
Activity 6.7 Find the $y = 0, 1, 2$ sections of the surface $z = x^2 + y^2$ and sketch them in the $(x, z)$-plane. Of these three sections, which one have we found before and what did we call it? Of these three sections, which is the only one that really lives in the $(x, z)$-plane?
Activity 6.8

Consider the surface z = x2 y 2 .

Find the y = 0, 1, 2 sections of this surface and sketch them in the (x, z)-plane.
Find the x = 0, 1, 2 sections of this surface and sketch them in the (y, z)-plane.

Figure 6.9: For Example 6.6. (a) The surface $z = x^2 + y^2$ and the planes $x = 0$, $x = 1$ and $x = 2$ as we move from left to right. (b) The $x = 0$, $x = 1$ and $x = 2$ sections all drawn in the $(y, z)$-plane. Note that the $x = 0$ section is the $(y, z)$-section and, of the three sections illustrated, this is the only one that really lives in the $(y, z)$-plane. Notice that, as the value of $c$ increases when we look at the plane $x = c$, the value of the $z$-intercept increases when we look at the section.

6.3 Partial differentiation

In Chapter 3, we saw how to differentiate functions of one variable. Unsurprisingly, perhaps, we can also differentiate functions of two variables using partial differentiation to yield partial derivatives.² In some ways, this will be similar to what we saw when we differentiated functions of one variable to get their derivatives, but as we now have two variables to deal with, things get a little trickier.

6.3.1 Sections and partial derivatives

Consider $f(x, y)$, a function of two independent variables. For a fixed value of $y$, say $y = y_0$, we can look at the function $g(x) = f(x, y_0)$ which is now a function of $x$ only. Clearly, the rate of change of $g(x)$ with respect to $x$ is just the derivative of this function with respect to $x$. But what happens when we want to calculate the rate of change of $f(x, y)$ with respect to $x$ for any fixed value of $y$? To do this we avoid specifying a particular value of $y$ by just assuming that $y$ is a constant and differentiating with respect to $x$. So, given a function $f(x, y)$, we denote the operation of differentiating $f$ with respect to $x$ whilst holding $y$ constant by
$$\frac{\partial f}{\partial x} \quad \text{or, more compactly,} \quad f_x(x, y), \tag{6.1}$$
and call this the partial derivative of $f(x, y)$ with respect to $x$.³ In a similar manner, we can define the partial derivative of $f(x, y)$ with respect to $y$, denoted by
$$\frac{\partial f}{\partial y} \quad \text{or, more compactly,} \quad f_y(x, y), \tag{6.2}$$
which is what we obtain from differentiating $f(x, y)$ with respect to $y$ whilst holding $x$ constant.

² Most of the material in these notes can be generalised to functions with more than two variables. But, in this course, almost without exception, we will be considering functions of two variables.

³ Note that we use the curly-d, i.e. $\partial$, for partial derivatives rather than the normal straight-d, i.e. $d$, which one encounters in the notation $dg/dx$ for the derivative of a function $g(x)$ of one variable. We shall see why it is important to keep these two notions of differentiation separate later. Similarly, we use $f_x(x, y)$ as shorthand for the partial derivative of $f(x, y)$ with respect to $x$ rather than the $g'(x)$ which one encounters as the shorthand for the derivative of a function $g(x)$ of one variable.
Clearly, the partial derivative of $f(x, y)$ with respect to $x$, i.e. the result of differentiating $f(x, y)$ with respect to $x$ whilst holding $y$ constant, is going to be another function of $x$ and $y$. This function of $x$ and $y$ is what is denoted by the symbols in (6.1).

But what does this partial derivative mean? In effect, what we have done when we consider the function $f(x, y)$ for some fixed value of $y$, say $y_0$, is to look at the section of the surface $z = f(x, y)$ we get when $y = y_0$, i.e. the section given by the equation $z = f(x, y_0)$ which lies in the plane $y = y_0$, a plane parallel to the $(x, z)$-plane. Then, when we differentiate $f(x, y_0)$ with respect to $x$, we are finding the gradient of this section, i.e. it tells us how $z = f(x, y_0)$ is varying with $x$. Consequently, this partial derivative is telling us something about the gradient of the surface when we are at the point $(x, y_0)$ and we are looking in the $x$-direction. This will become clear when we look at tangent planes in Section 6.4.1.

Activity 6.9 Describe what the partial derivative of $f(x, y)$ with respect to $y$ evaluated at the point $(x_0, y)$ tells us about the gradient of the surface at the point $(x_0, y)$.

6.3.2 Finding partial derivatives

Calculating the partial derivatives of $f(x, y)$ is only slightly more difficult than finding the derivative of a function of one variable. Recalling that the partial derivative of a function $f(x, y)$ with respect to $x$, i.e. $f_x(x, y)$, is just the derivative of $f(x, y)$ with respect to $x$ whilst holding $y$ constant, to calculate $f_x(x, y)$ we just treat any occurrence of $y$ in $f(x, y)$ as if it were a constant and differentiate $f(x, y)$ with respect to $x$. And, in a similar way, we can find the partial derivative of a function $f(x, y)$ with respect to $y$, i.e. $f_y(x, y)$. Let's look at an example.
Example 6.7 Given that $f(x, y) = x^2 y + 5xy^3 + y^2$, find $f_x(x, y)$ and $f_y(x, y)$.

Let's do this slowly so that we get the idea. To find $f_x(x, y)$, we treat $y$ as if it were a constant and let's say that this constant is $c$. So, we have a function of one variable given by
$$g(x) = f(x, c) = cx^2 + 5c^3 x + c^2,$$
and differentiating this with respect to $x$ gives
$$\frac{dg}{dx} = 2cx + 5c^3.$$
But, $c$ is the constant we're using to represent $y$ and so, replacing all the $c$'s with $y$'s, we have
$$\frac{\partial f}{\partial x} = 2xy + 5y^3,$$
which is the partial derivative of $f(x, y)$ with respect to $x$.

Similarly, to find $f_y(x, y)$, we treat $x$ as if it were a constant and (again) let's say that this constant is $c$. So, we have a function of one variable given by
$$g(y) = f(c, y) = c^2 y + 5cy^3 + y^2,$$
and differentiating this with respect to $y$ gives
$$\frac{dg}{dy} = c^2 + 15cy^2 + 2y.$$
But, $c$ is the constant we're using to represent $x$ and so, replacing all the $c$'s with $x$'s, we have
$$\frac{\partial f}{\partial y} = x^2 + 15xy^2 + 2y,$$
which is the partial derivative of $f(x, y)$ with respect to $y$.
Obviously, there is no need to go through all this detail whenever we calculate a partial derivative: all you have to do is remember what you are keeping constant and then differentiate whatever is left. Let's look at another example.

Example 6.8 Given that $f(x, y) = 3x^3 + 7xy^{-1} + 2y^9$, find $f_x(x, y)$ and $f_y(x, y)$.

Let's do this quickly. To find $f_x(x, y)$, we treat $y$ as a constant and differentiate with respect to $x$ to get
$$\frac{\partial f}{\partial x} = 9x^2 + 7y^{-1}.$$
Similarly, to find $f_y(x, y)$, we treat $x$ as a constant and differentiate with respect to $y$ to get
$$\frac{\partial f}{\partial y} = -7xy^{-2} + 18y^8.$$
And, we're done!
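One way to check a partial derivative is to compare it with a numerical estimate. The sketch below does this for the function in Example 6.7 using central differences; the helper names `partial_x` and `partial_y`, the step size `h` and the test point are all illustrative assumptions, not part of the guide.

```python
def f(x, y):
    # The function from Example 6.7.
    return x**2 * y + 5 * x * y**3 + y**2

def partial_x(func, x, y, h=1e-6):
    # Central-difference estimate of the partial derivative with respect to x:
    # y is held fixed while x is nudged either side.
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    # Central-difference estimate of the partial derivative with respect to y:
    # x is held fixed while y is nudged either side.
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

def fx_exact(x, y):
    # The answer found in Example 6.7.
    return 2 * x * y + 5 * y**3

def fy_exact(x, y):
    # The answer found in Example 6.7.
    return x**2 + 15 * x * y**2 + 2 * y

x0, y0 = 1.5, 2.0  # an arbitrary test point
print(partial_x(f, x0, y0), fx_exact(x0, y0))
print(partial_y(f, x0, y0), fy_exact(x0, y0))
```

The numerical and exact values should agree to several decimal places, which is a useful sanity check when working through the activities.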
Activity 6.10 Given that
$$f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2},$$
find $f_x(x, y)$ and $f_y(x, y)$.


So far, we have calculated the partial derivatives of very simple functions of $x$ and $y$. But, sometimes, we will need to use the chain, product and quotient rules when calculating partial derivatives. Let's look at an example to see how this is done.

Example 6.9 Given that
$$f(x, y) = x\,e^{x + y^2},$$
find $f_x(x, y)$ and $f_y(x, y)$.

We first note that we can write this function as
$$f(x, y) = (x\,e^x)\,e^{y^2},$$
and so, to find $f_x(x, y)$, we treat $e^{y^2}$ as a constant and we differentiate the function $x\,e^x$ using the product rule to get $x \cdot e^x + 1 \cdot e^x$. This gives us
$$\frac{\partial f}{\partial x} = e^{y^2}(x\,e^x + e^x) = (x + 1)\,e^{x + y^2}.$$
To find $f_y(x, y)$, we treat $x\,e^x$ as a constant and we differentiate the function $e^{y^2}$ using the chain rule to get $2y\,e^{y^2}$. This gives us
$$\frac{\partial f}{\partial y} = x\,e^x(2y\,e^{y^2}) = 2xy\,e^{x + y^2}.$$

Activity 6.11 Given that $f(x, y) = \sqrt{x^2 + y^2}$, find $f_x(x, y)$ and $f_y(x, y)$.

6.3.3 The chain rule

Sometimes a function of one variable is defined with reference to a function of two variables. For instance, suppose that the production level, $q$, of a firm depends on the amounts $k$ of capital and $l$ of labour used through the function $q(k, l)$. Suppose also that both $k$ and $l$ change over time in some known way so that we have formulas for $k(t)$ and $l(t)$ where $t$ is a parameter measuring time.⁴ How, we might ask, can we find the rate of change of production with time?
Example 6.10 Given that we have the production function $q(k, l) = kl$ where $k$ and $l$ are functions of time, $t$, given by
$$k(t) = 3 + 2t \quad \text{and} \quad l(t) = 10 - 3t,$$
find the rate of change of production with time.

In this case, we can calculate the production as a function of time by explicitly finding $Q(t) = q(k(t), l(t))$ which, in this case, is
$$Q(t) = k(t)\,l(t) = (3 + 2t)(10 - 3t) = 30 + 11t - 6t^2.$$
And, in particular, we can now differentiate this to find the rate of change of production with time, i.e. we have
$$\frac{dQ}{dt} = 11 - 12t,$$
in this case.
More generally, suppose we are given a function $f$ of two variables $x$ and $y$, both of which are themselves functions of $t$. We can think of this as defining a composite function $F(t) = f(x(t), y(t))$. In the case of a single variable we have a rule, i.e. the chain rule, which enables us to work out the derivative of a composite function. Amazingly, perhaps, there is a similar rule for composite functions of two variables such as the one we have here, which is also known as the chain rule. It states that
$$\frac{dF}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{6.3}$$

⁴ Notice that, since $k$ and $l$ both depend on $t$, we can only pick certain pairs of values, $(k, l)$. That is, in this case, the variables $k$ and $l$ are not independent.

Sometimes, in this context, we call $F'(t)$ the total derivative of $F(t)$ with respect to $t$ (in order to distinguish it from the partial derivatives of $f$ with respect to $x$ and $y$).

To see why the chain rule works, consider that if we change $t$ by a small amount, $\Delta t$, the corresponding change in $F(t)$ is given by
$$\Delta F \simeq \frac{dF}{dt}\,\Delta t,$$
but here, there are two ways in which $F(t) = f(x(t), y(t))$ can change with $t$.

Firstly, $F$ can change with $t$ because $f$ changes with $x$ and $x$ changes with $t$; let's denote this change in $F$ by $\Delta_x F$. In this case, we have
$$\Delta_x F \simeq \frac{\partial f}{\partial x}\,\Delta x,$$
as we are holding $y$ constant to see how $F$ changes with $x$, and this means that
$$\Delta_x F \simeq \frac{\partial f}{\partial x}\frac{dx}{dt}\,\Delta t,$$
as the change in $x$, $\Delta x$, is related to a change in $t$ by $\Delta x \simeq x'(t)\,\Delta t$.

Secondly, $F$ can change with $t$ because $f$ changes with $y$ and $y$ changes with $t$; let's denote this change in $F$ by $\Delta_y F$. In this case, we have
$$\Delta_y F \simeq \frac{\partial f}{\partial y}\,\Delta y,$$
as we are holding $x$ constant to see how $F$ changes with $y$, and this means that
$$\Delta_y F \simeq \frac{\partial f}{\partial y}\frac{dy}{dt}\,\Delta t,$$
as the change in $y$, $\Delta y$, is related to a change in $t$ by $\Delta y \simeq y'(t)\,\Delta t$.

Thus, as the total change in $F$ due to these two changes is given by
$$\Delta F = \Delta_x F + \Delta_y F \simeq \frac{\partial f}{\partial x}\frac{dx}{dt}\,\Delta t + \frac{\partial f}{\partial y}\frac{dy}{dt}\,\Delta t,$$
we can now equate our two expressions for $\Delta F$ and divide through by $\Delta t$ to get the chain rule which we saw above in (6.3). Let's see how we could have used it to answer the question we saw in Example 6.10.
Example 6.11 Consider the functions in Example 6.10. Use the chain rule to find the rate of change of production with time.

Here $q(k, l) = kl$, $k(t) = 3 + 2t$ and $l(t) = 10 - 3t$. In this case, if we again let $Q(t) = q(k(t), l(t))$, the chain rule states that
$$\frac{dQ}{dt} = \frac{\partial q}{\partial k}\frac{dk}{dt} + \frac{\partial q}{\partial l}\frac{dl}{dt}.$$
As such, using this, we can see that
$$\frac{dQ}{dt} = (l)(2) + (k)(-3) = 2(10 - 3t) - 3(3 + 2t) = 11 - 12t,$$
which agrees with our earlier answer.
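The agreement between Examples 6.10 and 6.11 can also be checked mechanically. In the sketch below, the function names `dQ_chain` and `dQ_direct` are assumptions made for illustration; the first assembles the right-hand side of (6.3) term by term and the second uses the derivative of the explicit formula for $Q(t)$.

```python
def k(t):
    return 3 + 2 * t       # capital as a function of time

def l(t):
    return 10 - 3 * t      # labour as a function of time

def dQ_chain(t):
    # Chain rule (6.3) applied to q(k, l) = k * l.
    dq_dk = l(t)           # partial derivative of k*l with respect to k
    dq_dl = k(t)           # partial derivative of k*l with respect to l
    dk_dt = 2              # derivative of 3 + 2t
    dl_dt = -3             # derivative of 10 - 3t
    return dq_dk * dk_dt + dq_dl * dl_dt

def dQ_direct(t):
    # Derivative of Q(t) = 30 + 11t - 6t^2 found in Example 6.10.
    return 11 - 12 * t

for t in (0, 1, 2.5):
    print(t, dQ_chain(t), dQ_direct(t))
```

Both columns of output should coincide for every value of $t$ tried.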


Activity 6.12 Suppose that f (x, y) = x2 y and that x(t) = 2 + 3t and y(t) = t2 + 1.
If F (t) = f (x(t), y(t)), use the chain rule to find the total derivative of F with
respect to t and check your answer by explicitly finding F (t) and differentiating it
with respect to t.
We now consider one of the many useful applications of the chain rule.
The derivative of an implicit function
An equation $g(x, y) = c$ where $c$ is a constant can, in some cases, be rearranged (or solved) to give $y$ as an explicit function of $x$. Once we have done this, we can then differentiate our expression for $y$ with respect to $x$ to find its derivative, $y'(x)$.

Example 6.12 Suppose that $y$ is a function of $x$ defined by the equation $x^2 - y = 7$. Find $y$ as an explicit function of $x$ and hence find $y'(x)$.

As we have $x^2 - y = 7$ we can easily rearrange this to get $y = x^2 - 7$, i.e. we have
$$y(x) = x^2 - 7,$$
if we want $y$ as an explicit function of $x$. In particular, this means that
$$\frac{dy}{dx} = 2x,$$
in this case.

In general, we say that an equation $g(x, y) = c$ defines $y$ implicitly as a function of $x$ if there is a function $y(x)$ which satisfies the equation for a range of values of $x$. But, in general, it may be difficult or impossible to solve the equation $g(x, y) = c$ to find an explicit formula for $y(x)$ as we did in Example 6.12. However, we can [often] still find the derivative $y'(x)$, even if we don't have an explicit expression for $y$ in terms of $x$.
To see how we can do this, consider that if we knew the function, $y(x)$, that satisfied the equation $g(x, y) = c$, we could find a new function, $G(x)$, of $x$ only which would be given by $G(x) = g(x, y(x))$. Then, using the chain rule, we would have
$$\frac{dG}{dx} = \frac{\partial g}{\partial x}\frac{dx}{dx} + \frac{\partial g}{\partial y}\frac{dy}{dx}.$$
But, $G(x) = c$ where $c$ is a constant and so we also have
$$\frac{dG}{dx} = 0 \quad \text{as well as} \quad \frac{dx}{dx} = 1,$$
which means that we are left with
$$0 = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial y}\frac{dy}{dx}.$$
Rearranging this then gives us
$$\frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y},$$
as long as $g_y(x, y) \neq 0$. That is, $y'(x)$ can easily be found by using the partial derivatives of $g$. (But, don't forget the minus sign!)
Example 6.13 In Example 6.12, $y$ was a function of $x$ defined implicitly by the equation $x^2 - y = 7$. Find $y'(x)$ using the result above.

As we have the equation $x^2 - y = 7$ we can write this as $g(x, y) = c$ with $g(x, y) = x^2 - y$ and $c = 7$. Using the above result we can then see that
$$\frac{\partial g}{\partial x} = 2x \quad \text{and} \quad \frac{\partial g}{\partial y} = -1,$$
which means that
$$\frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y} = -\frac{2x}{-1} = 2x,$$
as before.

Example 6.14 Suppose that $y$ is a function of $x$ defined implicitly by the equation
$$x^2 y^3 - 6x^3 y^2 + 2xy = 1.$$
Verify that the point $(x, y) = (1/2, 2)$ satisfies this equation and find the value of the derivative, $y'(x)$, at this point.
The point $(x, y) = (1/2, 2)$ satisfies the equation since, putting $x = 1/2$ and $y = 2$ into the left-hand side, we get
$$\left(\frac{1}{2}\right)^2 (2)^3 - 6\left(\frac{1}{2}\right)^3 (2)^2 + 2\left(\frac{1}{2}\right)(2) = 2 - 3 + 2 = 1,$$
which is what we have on the right-hand side of the equation. We then see that the equation defining $y$ implicitly as a function of $x$ is of the form $g(x, y) = 1$ where $g(x, y) = x^2 y^3 - 6x^3 y^2 + 2xy$. So, according to the formula given above, we have
$$\frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y},$$
and so, since
$$\frac{\partial g}{\partial x} = 2xy^3 - 18x^2 y^2 + 2y \quad \text{and} \quad \frac{\partial g}{\partial y} = 3x^2 y^2 - 12x^3 y + 2x,$$
we have
$$\frac{dy}{dx} = -\frac{2xy^3 - 18x^2 y^2 + 2y}{3x^2 y^2 - 12x^3 y + 2x},$$
as long as $3x^2 y^2 - 12x^3 y + 2x \neq 0$. Thus, given the point $(1/2, 2)$, we can substitute these values into our expression for $y'(x)$: the numerator is $8 - 18 + 4 = -6$ and the denominator is $3 - 3 + 1 = 1$, so the value of the derivative at this point is $-(-6)/1 = 6$.
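The substitution at the end of Example 6.14 is easy to get wrong by hand, so it can help to evaluate the formula in code. This is a minimal sketch; the function names `g_x`, `g_y` and `implicit_slope` are illustrative assumptions.

```python
def g(x, y):
    # Left-hand side of the equation in Example 6.14.
    return x**2 * y**3 - 6 * x**3 * y**2 + 2 * x * y

def g_x(x, y):
    # Partial derivative of g with respect to x, as found in the example.
    return 2 * x * y**3 - 18 * x**2 * y**2 + 2 * y

def g_y(x, y):
    # Partial derivative of g with respect to y, as found in the example.
    return 3 * x**2 * y**2 - 12 * x**3 * y + 2 * x

def implicit_slope(x, y):
    # dy/dx = -(g_x / g_y), valid only where g_y is non-zero.
    denominator = g_y(x, y)
    if denominator == 0:
        raise ZeroDivisionError("the formula requires g_y(x, y) to be non-zero")
    return -g_x(x, y) / denominator

print(g(0.5, 2))               # should equal the right-hand side, 1
print(implicit_slope(0.5, 2))  # value of y'(x) at the point (1/2, 2)
```

The explicit guard on the denominator mirrors the condition $g_y(x, y) \neq 0$ attached to the formula.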


Activity 6.13 Suppose that $y$ is a function of $x$ defined implicitly by the equation
$$x^2 + 2xy = 6 - 3y^3.$$
Verify that the point $(x, y) = (1, 1)$ satisfies this equation and find the value of the derivative, $y'(x)$, at this point.
Extensions of the chain rule
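What we have seen above can be extended. Suppose, for instance, that $g$ is a function of two variables $x$ and $y$, both of which are themselves functions of two variables $k$ and $l$. We can think of this as defining a composite function $G(k, l) = g(x(k, l), y(k, l))$ and an extension of the chain rule then assures us that
$$\frac{\partial G}{\partial k} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial k} + \frac{\partial g}{\partial y}\frac{\partial y}{\partial k} \quad \text{and} \quad \frac{\partial G}{\partial l} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial l} + \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}.$$

To see why the first of these formulae works, consider that if we change $k$ by a small amount, $\Delta k$, whilst holding $l$ constant, the corresponding change in $G(k, l)$ is given by
$$\Delta G \simeq \frac{\partial G}{\partial k}\,\Delta k,$$
but here, there are two ways in which $G(k, l) = g(x(k, l), y(k, l))$ can change with $k$.

Firstly, $G$ can change with $k$ because $g$ changes with $x$ and $x$ changes with $k$; let's denote this change in $G$ by $\Delta_x G$. In this case, we have
$$\Delta_x G \simeq \frac{\partial g}{\partial x}\,\Delta x,$$
as we are holding $y$ constant to see how $G$ changes with $x$, and this means that
$$\Delta_x G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial k}\,\Delta k,$$
as the change in $x$, $\Delta x$, is related to a change in $k$ with $l$ held constant by $\Delta x \simeq x_k(k, l)\,\Delta k$.

Secondly, $G$ can change with $k$ because $g$ changes with $y$ and $y$ changes with $k$; let's denote this change in $G$ by $\Delta_y G$. In this case, we have
$$\Delta_y G \simeq \frac{\partial g}{\partial y}\,\Delta y,$$
as we are holding $x$ constant to see how $G$ changes with $y$, and this means that
$$\Delta_y G \simeq \frac{\partial g}{\partial y}\frac{\partial y}{\partial k}\,\Delta k,$$
as the change in $y$, $\Delta y$, is related to a change in $k$ with $l$ held constant by $\Delta y \simeq y_k(k, l)\,\Delta k$.

Thus, as the total change in $G$ due to these two changes is given by
$$\Delta G = \Delta_x G + \Delta_y G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial k}\,\Delta k + \frac{\partial g}{\partial y}\frac{\partial y}{\partial k}\,\Delta k,$$
we can now equate our two expressions for $\Delta G$ and divide through by $\Delta k$ to get the chain rule for $G_k(k, l)$ which we saw above.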
Activity 6.14 Use a similar argument to the one above to explain why the chain rule formula for $G_l(k, l)$ works.
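And, in a similar manner, if we suppose that $g(x, y, z) = c$ defines $z$ implicitly as a function of $x$ and $y$, we can use this form of the chain rule to derive the formulae
$$\frac{\partial z}{\partial x} = -\frac{\partial g/\partial x}{\partial g/\partial z} \quad \text{and} \quad \frac{\partial z}{\partial y} = -\frac{\partial g/\partial y}{\partial g/\partial z},$$
which will allow us to calculate the partial derivatives of $z$ with respect to $x$ and $y$.

Indeed, to see why the first of these formulae works, we consider that if we knew the function, $z(x, y)$, that satisfied the equation $g(x, y, z) = c$, we could find a new function, $G(x, y)$, of $x$ and $y$ only which is given by $G(x, y) = g(x, y, z(x, y))$. Then, using the chain rule, we have
$$\frac{\partial G}{\partial x} = \frac{\partial g}{\partial x}\frac{dx}{dx} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x},$$
where there is no contribution from $y$ as $y$ is held constant when we differentiate partially with respect to $x$. But, $G(x, y) = c$ where $c$ is a constant and so we also have
$$\frac{\partial G}{\partial x} = 0 \quad \text{as well as} \quad \frac{dx}{dx} = 1,$$
which means that we are left with
$$0 = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}.$$
Rearranging this then gives us
$$\frac{\partial z}{\partial x} = -\frac{\partial g/\partial x}{\partial g/\partial z},$$
as long as $g_z(x, y, z) \neq 0$. That is, $z_x(x, y)$ can easily be found by using the partial derivatives of $g$. (But, don't forget the minus sign!)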
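To see the formula for $z_x(x, y)$ in action, the sketch below uses a hypothetical equation that does not appear in the guide, namely $x^2 + y^2 + z^2 = 14$, chosen because it can also be solved explicitly for $z$. All function names and the test point are assumptions made for illustration.

```python
import math

def g(x, y, z):
    # Hypothetical example (not from the guide): g(x, y, z) = x^2 + y^2 + z^2.
    return x**2 + y**2 + z**2

def z_explicit(x, y, c=14.0):
    # Positive solution of g(x, y, z) = c for z; used only as a check.
    return math.sqrt(c - x**2 - y**2)

def z_x_implicit(x, y, z):
    # z_x = -(g_x / g_z), valid only where g_z is non-zero.
    g_x = 2 * x
    g_z = 2 * z
    if g_z == 0:
        raise ZeroDivisionError("the formula requires g_z to be non-zero")
    return -g_x / g_z

def z_x_numeric(x, y, h=1e-6):
    # Central-difference estimate using the explicit solution, with y held fixed.
    return (z_explicit(x + h, y) - z_explicit(x - h, y)) / (2 * h)

x0, y0 = 1.0, 2.0
z0 = z_explicit(x0, y0)
print(g(x0, y0, z0))             # confirms the point lies on the surface
print(z_x_implicit(x0, y0, z0))  # value from the implicit formula
print(z_x_numeric(x0, y0))       # numerical estimate for comparison
```

The two derivative values should agree closely, even though the implicit formula never needed the explicit expression for $z$.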
Activity 6.15 Use a similar argument to the one above to explain why the formula for $z_y(x, y)$ works.

Activity 6.16 Suppose that $q$ is a function of $k$ and $l$ defined implicitly by the equation
$$q^3 k + k^3 l + qk^2 l = 3.$$
Find the partial derivatives $q_k(k, l)$ and $q_l(k, l)$. What are the values of these partial derivatives at the point where $k = 1$ and $l = 1$?
[Hint: The identity $q^3 + q - 2 = (q - 1)(q^2 + q + 2)$ will be useful.]


6.3.4 An application: Homogeneous functions

Homogeneous functions are important in economics since they allow us to capture the
idea of returns to scale. In this section we will see what it means for a function to be
homogeneous and consider an important theorem about homogeneous functions. The
former will enable us to give an economic interpretation of homogeneous production
functions in terms of returns to scale and the latter will enable us to consider the
economic significance of the marginal products that can be derived from such
production functions.
Homogeneity and returns to scale
We say that a function, $f(x, y)$, is homogeneous of degree $r$ if
$$f(\lambda x, \lambda y) = \lambda^r f(x, y),$$
for any $\lambda \in \mathbb{R}$. Let's start by looking at some examples of homogeneous functions.

Example 6.15 Show that the function $f(x, y) = x^{1/2} y^{1/2}$ is homogeneous of degree one.

Replacing $x$ and $y$ in $f(x, y)$ with $\lambda x$ and $\lambda y$ we get
$$f(\lambda x, \lambda y) = (\lambda x)^{1/2} (\lambda y)^{1/2} = (\lambda^{1/2} x^{1/2})(\lambda^{1/2} y^{1/2}) = \lambda^1 x^{1/2} y^{1/2} = \lambda^1 f(x, y).$$
Comparing this with the definition of homogeneity, i.e.
$$f(\lambda x, \lambda y) = \lambda^r f(x, y),$$
we see that $r = 1$ and so this function is homogeneous of degree one.
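Example 6.16 Show that the function $f(x, y) = \sqrt{x} + \sqrt{y}$ is homogeneous. What is its degree of homogeneity?

Replacing $x$ and $y$ in $f(x, y)$ with $\lambda x$ and $\lambda y$ we get
$$f(\lambda x, \lambda y) = \sqrt{\lambda x} + \sqrt{\lambda y} = \sqrt{\lambda}\sqrt{x} + \sqrt{\lambda}\sqrt{y} = \sqrt{\lambda}\,(\sqrt{x} + \sqrt{y}) = \sqrt{\lambda}\, f(x, y).$$
As $\sqrt{\lambda} = \lambda^{1/2}$, comparing this with the definition of homogeneity, i.e.
$$f(\lambda x, \lambda y) = \lambda^r f(x, y),$$
we see that $r = 1/2$ and so this function is homogeneous of degree one half.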
Example 6.17 Show that the function $f(x, y) = x + y^2$ is not homogeneous.

Replacing $x$ and $y$ in $f(x, y)$ with $\lambda x$ and $\lambda y$ we get
$$f(\lambda x, \lambda y) = (\lambda x) + (\lambda y)^2 = \lambda x + \lambda^2 y^2.$$
Comparing this with the definition of homogeneity, i.e.
$$f(\lambda x, \lambda y) = \lambda^r f(x, y),$$
we see that there is no way of writing $\lambda x + \lambda^2 y^2$ in the form $\lambda^r (x + y^2)$ for any $r$ and so this function is not homogeneous.
In particular, this means that not all functions are homogeneous.

Economically, we can think of homogeneous functions as telling us about how outputs change if we scale up our inputs. To see why, consider what happens if we scale up our inputs by a factor of $\lambda$, i.e. if we increase our bundle of inputs, $(x, y)$, by a factor of $\lambda > 1$ we get the new bundle of inputs $(\lambda x, \lambda y)$. Now if our outputs are determined by a homogeneous function, $f(x, y)$, of degree $r$ we can see that the output from our new bundle, $(\lambda x, \lambda y)$, is given by
$$f(\lambda x, \lambda y) = \lambda^r f(x, y),$$
i.e. we will get $\lambda^r$ times as much as we did from our old bundle, $(x, y)$. That is, scaling inputs by $\lambda$ leads to a scaling of output by $\lambda^r$ if our output is determined by a function which is homogeneous of degree $r$.

In particular, given a function which is homogeneous of degree one, we can see that scaling our inputs by $\lambda > 1$ (i.e. going from the bundle of inputs $(x, y)$ to the bundle of inputs $(\lambda x, \lambda y)$) will scale our output by $\lambda$ (i.e. going from an output of $f(x, y)$ to an output of $\lambda f(x, y)$). That is, we get constant returns to scale: a proportional increase in inputs leads to the same proportional increase in output. Clearly, given functions of degree $r > 0$, this idea can be extended to cover functions with degrees $r \neq 1$ as follows:

If $r > 1$, we get increasing returns to scale as $\lambda > 1$ implies that $\lambda^r > \lambda$.⁵

If $r = 1$, we get constant returns to scale as we saw above.

If $r < 1$, we get decreasing returns to scale as $\lambda > 1$ implies that $\lambda^r < \lambda$.⁶

⁵ That is, a proportional increase in inputs leads to a larger proportional increase in output.

⁶ That is, a proportional increase in inputs leads to a smaller proportional increase in output.

To see how this works, consider the following example.

Example 6.18 A firm invests an amount of capital, $k$, and labour, $l$, in its production process and this yields a production level of $q(k, l)$. What will be the effect on the level of production of quadrupling the amount of capital and labour invested if the production function is homogeneous of degree (a) 1/2, (b) 1 and (c) 3/2?

Quadrupling the amount of capital and labour invested means increasing the investment bundle from $(k, l)$ to $(4k, 4l)$. So, if the production function is homogeneous of degree $r$, the production level will go from $q(k, l)$ to $q(4k, 4l) = 4^r q(k, l)$, i.e. the production level will change by a factor of $4^r$. In particular, this means that if the production function is homogeneous of degree

(a) 1/2, the change will be by a factor of $4^{1/2} = 2$ (i.e. quadrupling inputs doubles production),

(b) 1, the change will be by a factor of $4^1 = 4$ (i.e. quadrupling inputs quadruples production),

(c) 3/2, the change will be by a factor of $4^{3/2} = 8$ (i.e. quadrupling inputs octuples production),

yielding decreasing, constant and increasing returns to scale respectively.
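The link between the degree of homogeneity and returns to scale can be explored numerically. The sketch below is illustrative only: the helper names, the test point $(2, 3)$ and the tolerance are assumptions. It estimates $r$ from the ratio $f(\lambda x, \lambda y)/f(x, y) = \lambda^r$ at a single point, which is a consistency check for a function already known to be homogeneous and not a proof of homogeneity.

```python
import math

def scale_factor(f, x, y, lam):
    # How much output is multiplied by when both inputs are scaled by lam.
    return f(lam * x, lam * y) / f(x, y)

def estimated_degree(f, x, y, lam=4.0):
    # If f is homogeneous of degree r, the scale factor is lam**r,
    # so r = log(scale factor) / log(lam).
    return math.log(scale_factor(f, x, y, lam)) / math.log(lam)

def returns_to_scale(r, tol=1e-9):
    # Classification used in the text, for degrees r > 0.
    if r > 1 + tol:
        return "increasing"
    if r < 1 - tol:
        return "decreasing"
    return "constant"

def example_6_15(x, y):
    return x**0.5 * y**0.5

def example_6_16(x, y):
    return math.sqrt(x) + math.sqrt(y)

for f in (example_6_15, example_6_16):
    r = estimated_degree(f, 2.0, 3.0)
    print(f.__name__, round(r, 6), returns_to_scale(r))

# The three cases of Example 6.18: quadrupling inputs scales output by 4**r.
for r in (0.5, 1.0, 1.5):
    print(r, 4.0**r, returns_to_scale(r))
```

For a function that is not homogeneous, such as the one in Example 6.17, the estimate would change with the point and with $\lambda$, so a single evaluation should not be over-interpreted.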
We now turn to a useful result about homogeneous functions.
Euler's theorem and marginal products

Euler's theorem states that if $f(x, y)$ is a homogeneous function of degree $r$, then
$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = r f(x, y).$$
This follows from a simple application of the chain rule since, using the definition of a function that is homogeneous of degree $r$, we have
$$f(\lambda x, \lambda y) = \lambda^r f(x, y),$$
for any $\lambda \in \mathbb{R}$. As such, differentiating both sides with respect to $\lambda$ and using the chain rule from (6.3) on the left-hand side, we have
$$\frac{\partial f}{\partial u}\frac{du}{d\lambda} + \frac{\partial f}{\partial v}\frac{dv}{d\lambda} = r\lambda^{r-1} f(x, y),$$
if we think of $f(\lambda x, \lambda y)$ as $f(u, v)$ with $u = \lambda x$ and $v = \lambda y$. As $du/d\lambda = x$ and $dv/d\lambda = y$, this then gives us
$$x\frac{\partial f}{\partial u} + y\frac{\partial f}{\partial v} = r\lambda^{r-1} f(x, y),$$
and, if we now set $\lambda = 1$, we get the desired result as we have $u = x$, $v = y$ and $\lambda^{r-1} = 1$.

In this course, a question may involve verifying that Euler's theorem holds for some given homogeneous function. As an example, let's verify that it is true for the two homogeneous functions we considered in Examples 6.15 and 6.16.
Example 6.19 In Example 6.15, we saw that the function $f(x, y) = x^{1/2} y^{1/2}$ is homogeneous of degree one. Verify that Euler's theorem holds for this function.

In this case we can see that
$$\frac{\partial f}{\partial x} = \frac{1}{2} x^{-1/2} y^{1/2} \quad \text{and} \quad \frac{\partial f}{\partial y} = \frac{1}{2} x^{1/2} y^{-1/2}.$$
As such, we can see that the left-hand side of Euler's theorem gives us
$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = x\left(\frac{1}{2} x^{-1/2} y^{1/2}\right) + y\left(\frac{1}{2} x^{1/2} y^{-1/2}\right) = \frac{1}{2} x^{1/2} y^{1/2} + \frac{1}{2} x^{1/2} y^{1/2} = x^{1/2} y^{1/2} = f(x, y),$$
and since the degree of homogeneity of this function is one, we have $f(x, y)$ on the right-hand side of Euler's theorem. Thus, as these two expressions are the same, Euler's theorem holds.


Example 6.20 In Example 6.16, we saw that the function $f(x, y) = \sqrt{x} + \sqrt{y}$ is homogeneous of degree 1/2. Verify that Euler's theorem holds for this function.
In this case we can see that $f(x, y)$ can be written as $f(x, y) = x^{1/2} + y^{1/2}$ and so,
\[ \frac{\partial f}{\partial x} = \frac{1}{2} x^{-1/2} \quad\text{and}\quad \frac{\partial f}{\partial y} = \frac{1}{2} y^{-1/2}. \]
As such, we can see that the left-hand side of Euler's theorem gives us
\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = x\left(\frac{1}{2} x^{-1/2}\right) + y\left(\frac{1}{2} y^{-1/2}\right) = \frac{1}{2} x^{1/2} + \frac{1}{2} y^{1/2} = \frac{1}{2}\left(x^{1/2} + y^{1/2}\right) = \frac{1}{2} f(x, y), \]
and since the degree of homogeneity of this function is a half, we have $\frac{1}{2} f(x, y)$ on the right-hand side of Euler's theorem. Thus, as these two expressions are the same, Euler's theorem holds.
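As an informal cross-check of Examples 6.19 and 6.20, the sketch below estimates the partial derivatives numerically and compares the two sides of Euler's theorem. The helper names (`partial`, `euler_lhs`, `f_619`, `f_620`), the test point and the step size are assumptions made for illustration only; they are not part of the course material.

```python
import math

def partial(f, point, i, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to its i-th argument at the given point."""
    up = list(point)
    down = list(point)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

def euler_lhs(f, x, y):
    """Left-hand side of Euler's theorem: x*f_x + y*f_y."""
    return x * partial(f, (x, y), 0) + y * partial(f, (x, y), 1)

def f_619(x, y):
    # Function from Example 6.19, homogeneous of degree 1.
    return math.sqrt(x) * math.sqrt(y)

def f_620(x, y):
    # Function from Example 6.20, homogeneous of degree 1/2.
    return math.sqrt(x) + math.sqrt(y)

x, y = 2.0, 3.0  # arbitrary test point with x, y > 0 (an assumption)
print(euler_lhs(f_619, x, y), 1.0 * f_619(x, y))  # the two sides for r = 1
print(euler_lhs(f_620, x, y), 0.5 * f_620(x, y))  # the two sides for r = 1/2
```

In each printed pair the two numbers should agree up to the small error of the finite-difference estimate, which is a numerical illustration rather than a proof.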
We now turn to the economic significance of Euler's theorem. Consider a firm that invests an amount of capital, $k$, and labour, $l$, in its production process and this yields a production level of $q(k, l)$. Further, assume that this production function is homogeneous of degree one, i.e. that we have constant returns to scale. Euler's theorem then asserts that
\[ k\frac{\partial q}{\partial k} + l\frac{\partial q}{\partial l} = q. \]
Now, $q_l$ gives us the marginal product of labour, i.e. it measures the change in production if we change the amount of labour. In particular, if we invest one more unit of labour, say by employing one more worker, $q_l$ tells us the resulting change in production.7 As such, it makes sense to say that this extra worker is responsible for this change in production and so, if we assume that we reward workers by giving them goods equal to the quantity they produce, it makes sense to reward this worker with a quantity of goods given by $q_l$. Thus, if all workers produce the same amount, i.e. $q_l$, and there are $l$ (i.e. the amount of labour invested) workers, it makes sense that they should each be rewarded with a quantity of goods equal to $q_l$. As such, the quantity $l q_l$ represents the total quantity of goods that should be given as rewards to the workers (i.e. the labour). A similar argument applies to the quantity $k q_k$, i.e. this should be the total quantity of goods that should be given as rewards to the providers of the capital. Consequently, Euler's theorem tells us that these rewards should add up to the total quantity of goods produced, i.e. all the goods being produced should be distributed amongst the suppliers of capital and the providers of labour. In summary, this says:

7 But, strictly, this is only approximate since if $\Delta q$ is the change in production and $\Delta l$ is the change in labour, the relationship
\[ \frac{\Delta q}{\Delta l} \approx \frac{\partial q}{\partial l} \quad\text{or}\quad \Delta q \approx \frac{\partial q}{\partial l}\,\Delta l, \]
is only an approximation. As such, taking on one more worker (i.e. changing the amount of labour by one) gives $\Delta l = 1$ and hence the change in production, $\Delta q$, is given [approximately] by $\Delta q = q_l$. However, the argument given in these notes can be made precise if we consider the change in production due to an arbitrarily small change in the amount of labour instead of, say, the intuitively more obvious change of one worker.


In a firm with constant returns to scale, if we reward each factor of production (e.g. capital and labour) at a level equal to its marginal product, then the total reward to the factors of production will be the amount produced.
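To make this summary concrete, here is a small sketch using the degree-one function of Example 6.19 relabelled as a production function, $q(k, l) = k^{1/2} l^{1/2}$. That choice, the input levels $k = l = 100$ and all the names in the code are assumptions chosen purely for illustration. The sketch compares the total reward $k q_k + l q_l$ with the output $q$, and also compares the marginal product $q_l$ with the actual change in output from one extra worker.

```python
import math

def q(k, l):
    # Hypothetical production function, homogeneous of degree one.
    return math.sqrt(k * l)

def q_k(k, l):
    # Marginal product of capital: partial derivative of q with respect to k.
    return 0.5 * math.sqrt(l / k)

def q_l(k, l):
    # Marginal product of labour: partial derivative of q with respect to l.
    return 0.5 * math.sqrt(k / l)

k, l = 100.0, 100.0  # assumed input levels

# Rewards paid at marginal products, summed over both factors.
total_reward = k * q_k(k, l) + l * q_l(k, l)

# Actual change in output from employing one more worker.
extra_worker = q(k, l + 1) - q(k, l)

print(total_reward, q(k, l))    # these should agree (constant returns to scale)
print(extra_worker, q_l(k, l))  # these should be close, but not identical
```

The second comparison illustrates why the marginal product only approximates the effect of one extra worker: the match improves as the change in labour becomes small relative to $l$.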

6.3.5 Second-order partial derivatives

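If we have a function $f(x, y)$, we can use partial differentiation to find the new functions $f_x(x, y)$ and $f_y(x, y)$. These new functions are called the first-order partial derivatives of $f$. However, it is also possible to partially differentiate these new functions with respect to $x$ and $y$ to get the second-order partial derivatives of $f$. Obviously, for a function of two variables, there are four second-order partial derivatives, i.e. those that are unmixed:
\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) \quad\text{and}\quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right), \]
and those that are mixed:
\[ \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) \quad\text{and}\quad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right). \]
Or, alternatively, in our more compact notation we have
\[ f_{xx} = (f_x)_x, \quad f_{yy} = (f_y)_y, \quad f_{xy} = (f_x)_y \quad\text{and}\quad f_{yx} = (f_y)_x, \]
respectively. In this course, we will find that the order of partial differentiation in the mixed second-order partial derivatives is unimportant since we will always have $f_{xy} = f_{yx}$. In particular, this fact can serve as a useful check when we are working out second-order partial derivatives.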
Example 6.21 In Example 6.7, we found that the first-order partial derivatives of
\[ f(x, y) = x^2 y + 5xy^3 + y^2, \]
were given by
\[ f_x(x, y) = 2xy + 5y^3 \quad\text{and}\quad f_y(x, y) = x^2 + 15xy^2 + 2y. \]
Find the second-order partial derivatives of $f$.
Partially differentiating $f_x(x, y) = 2xy + 5y^3$ with respect to $x$ and $y$ respectively, we can see that
\[ f_{xx}(x, y) = 2y \quad\text{and}\quad f_{xy}(x, y) = 2x + 15y^2, \]
whereas, partially differentiating $f_y(x, y) = x^2 + 15xy^2 + 2y$ with respect to $x$ and $y$ respectively, we can see that
\[ f_{yx}(x, y) = 2x + 15y^2 \quad\text{and}\quad f_{yy}(x, y) = 30xy + 2. \]
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.
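The answers in Example 6.21 can be sanity-checked numerically. The sketch below estimates each second-order partial derivative by applying a central difference twice; the helper `second_partial`, the step size and the test point $(2, 1)$ are assumptions introduced for illustration, not a method used in the guide.

```python
def f(x, y):
    # Function from Example 6.21.
    return x**2 * y + 5 * x * y**3 + y**2

def second_partial(f, x, y, i, j, h=1e-4):
    """Estimate the derivative with respect to variable j of the derivative
    with respect to variable i, where 0 means x and 1 means y.
    In the guide's notation, (i, j) = (0, 1) corresponds to f_xy = (f_x)_y."""
    def shift(px, py, var, step):
        return (px + step, py) if var == 0 else (px, py + step)

    def first(px, py):
        # Central-difference estimate of the first partial with respect to i.
        return (f(*shift(px, py, i, h)) - f(*shift(px, py, i, -h))) / (2 * h)

    # Central difference of that estimate with respect to variable j.
    return (first(*shift(x, y, j, h)) - first(*shift(x, y, j, -h))) / (2 * h)

x, y = 2.0, 1.0  # assumed test point
print(second_partial(f, x, y, 0, 0), 2 * y)               # f_xx
print(second_partial(f, x, y, 0, 1), 2 * x + 15 * y**2)   # f_xy
print(second_partial(f, x, y, 1, 0), 2 * x + 15 * y**2)   # f_yx
print(second_partial(f, x, y, 1, 1), 30 * x * y + 2)      # f_yy
```

Each printed pair compares the numerical estimate with the formula found in the example, and the two mixed estimates should agree with each other.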


Example 6.22 In Example 6.8, we found that the first-order partial derivatives of
\[ f(x, y) = 3x^3 + 7xy^{-1} + 2y^9, \]
were given by
\[ f_x(x, y) = 9x^2 + 7y^{-1} \quad\text{and}\quad f_y(x, y) = -7xy^{-2} + 18y^8. \]
Find the second-order partial derivatives of $f$.
Partially differentiating $f_x(x, y) = 9x^2 + 7y^{-1}$ with respect to $x$ and $y$ respectively, we can see that
\[ f_{xx}(x, y) = 18x \quad\text{and}\quad f_{xy}(x, y) = -7y^{-2}, \]
whereas, partially differentiating $f_y(x, y) = -7xy^{-2} + 18y^8$ with respect to $x$ and $y$ respectively, we can see that
\[ f_{yx}(x, y) = -7y^{-2} \quad\text{and}\quad f_{yy}(x, y) = 14xy^{-3} + 144y^7. \]
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.


Activity 6.17 Find the second-order partial derivatives of the function in Activity 6.10.

Activity 6.18 Find the first and second-order partial derivatives of
\[ f(x, y) = x^{3/4} y^{1/4}. \]
And, of course, when finding second-order partial derivatives we may need to use the
chain, product and quotient rules.
Example 6.23 In Example 6.9, we found that the first-order partial derivatives of
\[ f(x, y) = x e^{x + y^2}, \]
were given by
\[ f_x(x, y) = (x + 1) e^{x + y^2} \quad\text{and}\quad f_y(x, y) = 2xy e^{x + y^2}. \]
Find the second-order partial derivatives of $f$.
To find the second-order derivatives that arise from $f_x(x, y)$, we first note that we can write it as
\[ \frac{\partial f}{\partial x} = [(x + 1) e^x]\, e^{y^2}. \]
So, to find $f_{xx}(x, y)$, we treat $e^{y^2}$ as a constant and we differentiate the function $(x + 1) e^x$ using the product rule to get $(x + 1) e^x + 1 \cdot e^x$. This gives us
\[ \frac{\partial^2 f}{\partial x^2} = e^{y^2} [(x + 1) e^x + e^x] = (x + 2) e^{x + y^2}. \]

To find $f_{xy}(x, y)$, we treat $(x + 1) e^x$ as a constant and we differentiate the function $e^{y^2}$ using the chain rule to get $2y e^{y^2}$. This gives us
\[ \frac{\partial^2 f}{\partial y\,\partial x} = (x + 1) e^x (2y e^{y^2}) = 2(x + 1) y e^{x + y^2}. \]
To find the second-order derivatives that arise from $f_y(x, y)$, we first note that we can write it as
\[ \frac{\partial f}{\partial y} = 2 (x e^x)(y e^{y^2}). \]
So, to find $f_{yx}(x, y)$, we treat $2y e^{y^2}$ as a constant and we differentiate the function $x e^x$ using the product rule to get $x e^x + 1 \cdot e^x$. This gives us
\[ \frac{\partial^2 f}{\partial x\,\partial y} = 2y e^{y^2} (x e^x + e^x) = 2(x + 1) y e^{x + y^2}. \]
To find $f_{yy}(x, y)$, we treat $2x e^x$ as a constant and we differentiate the function $y e^{y^2}$ using the chain and product rules to get $y(2y e^{y^2}) + e^{y^2}$. This gives us
\[ \frac{\partial^2 f}{\partial y^2} = 2x e^x (2y^2 e^{y^2} + e^{y^2}) = 2x(2y^2 + 1) e^{x + y^2}. \]
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.
Activity 6.19 Find the second-order partial derivatives of the function in Activity 6.11.

Of course, we could go on and discuss higher-order partial derivatives, but we won't as they will not be used in this course.

6.4 Using partial derivatives

We now look at some of the useful things that partial derivatives tell us about functions
of two variables. Before you start this section, you should note that this material makes
use of some ideas from Chapter 2 of 173 Algebra, namely
the dot product of two vectors (see Section 2.8),
displacement and direction vectors (see Section 2.9),
the equation of a plane (see Section 2.11), and
the equation of a hyperplane (see Section 2.12).
Make sure that you understand these before you proceed.

6.4.1 Tangent planes

Suppose that we have a surface whose equation is given by z = f (x, y). If c = f (a, b),
then the point (a, b, c) is on this surface and, if we look at the sections given by x = a


and $y = b$, which are parallel to the $(y, z)$-plane and $(x, z)$-plane respectively, we can find tangent lines in these planes by using the partial derivatives as these tell us how $z$ is changing with $y$ and $x$ respectively at this point. In particular, if $x = a$, the section is given by $z = f(a, y)$ and the tangent line is given by
\[ z = c + f_y(a, b)(y - b), \]
and this lives in the plane $x = a$ which is parallel to the $(y, z)$-plane whereas if $y = b$, the section is given by $z = f(x, b)$ and the tangent line is given by
\[ z = c + f_x(a, b)(x - a), \]
and this lives in the plane $y = b$ which is parallel to the $(x, z)$-plane.
Example 6.24 Show that the point $(1, 1, 2)$ lies on the surface whose equation is $z = x^2 + y^2$. What are the equations of the tangent lines to the $x = 1$ and $y = 1$ sections at this point?
The point $(1, 1, 2)$ lies on the surface $z = x^2 + y^2$ as $2 = 1^2 + 1^2$. Here we have $z = f(x, y)$ with $f(x, y) = x^2 + y^2$ and so, looking at the:

$x = 1$ section, we have
\[ f_y(x, y) = 2y \implies f_y(1, 1) = 2, \]
and so the tangent line, which lives in the plane $x = 1$, has an equation given by
\[ z = 2 + 2(y - 1) = 2y, \]
as we should expect since this section has an equation given by $z = 1 + y^2$. This section and the tangent line are illustrated in Figure 6.10(a).
$y = 1$ section, we have
\[ f_x(x, y) = 2x \implies f_x(1, 1) = 2, \]
and so the tangent line, which lives in the plane $y = 1$, has an equation given by
\[ z = 2 + 2(x - 1) = 2x, \]
as we should expect since this section has an equation given by $z = 1 + x^2$. This section and the tangent line are illustrated in Figure 6.10(b).
In particular, note that these tangent lines live in the planes that define the relevant sections.
Indeed, as we can find two tangent lines that tell us about how the surface z = f (x, y)
is changing in the x and y-directions at the point (a, b, c) by considering the y = b and
x = a sections respectively, we can use these two lines to define the tangent plane to the
surface at this point. The question is: How do we find the equation of this tangent
plane?


Figure 6.10: Tangent lines to the (a) $x = 1$ and (b) $y = 1$ sections of the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ as discussed in Example 6.24. Panel (a) shows the line $z = 2y$ in the plane $x = 1$ and panel (b) shows the line $z = 2x$ in the plane $y = 1$.

Let's assume that both of the partial derivatives, $f_x(x, y)$ and $f_y(x, y)$, are defined at the point $(a, b, c)$. We know, from Section 2.11 of 173 Algebra, that the vector equation of a plane through this point is given by
\[ \begin{pmatrix} u \\ v \\ w \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0, \]
where the vector $(u, v, w)$ is the normal vector to the plane. Indeed, working out this dot product, we find that
\[ u(x - a) + v(y - b) + w(z - c) = 0, \]
is the Cartesian equation of the plane. But, what are $u$, $v$ and $w$? Well, if we assume that we have $w \neq 0$, i.e. the plane we are considering is not vertical, then we can write this as
\[ z = c - \frac{u}{w}(x - a) - \frac{v}{w}(y - b), \]
and, to be a tangent plane, we require that the two tangent lines we found above lie in the plane. In particular, we find that when $x = a$, we must have
\[ z = c - \frac{v}{w}(y - b) \quad\text{giving us}\quad z = c + f_y(a, b)(y - b), \]
and when $y = b$ we must have
\[ z = c - \frac{u}{w}(x - a) \quad\text{giving us}\quad z = c + f_x(a, b)(x - a), \]
which means that we have
\[ -\frac{v}{w} = f_y(a, b) \quad\text{and}\quad -\frac{u}{w} = f_x(a, b). \]
This means that the Cartesian equation of the tangent plane is given by
\[ z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b), \tag{6.4} \]
and writing this as
\[ f_x(a, b)(x - a) + f_y(a, b)(y - b) - (z - c) = 0, \]
we find that the vector equation of the tangent plane is
\[ \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0. \tag{6.5} \]
In particular, we see that
\[ \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix}, \]
is a normal vector to this tangent plane.

Example 6.25 Following on from Example 6.24, find the Cartesian and vector equations of the tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$. Verify that the tangent lines to the $x = 1$ and $y = 1$ sections at this point (found in Example 6.24) lie in this tangent plane.
Using what we found in Example 6.24 and (6.4), it should be clear that the Cartesian equation of the tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ is given by
\[ z - 2 = 2(x - 1) + 2(y - 1) \implies z = 2x + 2y - 2, \]
whereas, using (6.5), its vector equation is
\[ \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} = 0. \]
Of course, if you work out the dot product in the latter, you should get the former!
If we now find the $x = 1$ section of this tangent plane we get
\[ z = 2(1) + 2y - 2 = 2y, \]
which is the tangent line to the $x = 1$ section of the surface and so this must lie in the tangent plane and, similarly, if we find the $y = 1$ section of this tangent plane we get
\[ z = 2x + 2(1) - 2 = 2x, \]
which is the tangent line to the $y = 1$ section of the surface and so this must lie in the tangent plane too. This is illustrated in Figure 6.11.
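The calculation in Examples 6.24 and 6.25 can be reproduced with a few lines of code. The sketch below builds the tangent plane from the Cartesian equation (6.4) for the surface $z = x^2 + y^2$; the function names and the sample points are assumptions made for illustration.

```python
def f(x, y):
    # Surface from Examples 6.24 and 6.25.
    return x**2 + y**2

def f_x(x, y):
    return 2 * x

def f_y(x, y):
    return 2 * y

def tangent_plane(a, b):
    """Return the function z(x, y) describing the tangent plane to
    z = f(x, y) at the point (a, b, f(a, b)), following (6.4)."""
    c = f(a, b)
    return lambda x, y: c + f_x(a, b) * (x - a) + f_y(a, b) * (y - b)

plane = tangent_plane(1.0, 1.0)

print(plane(1.0, 1.0))           # height of the plane at the point of tangency
print(plane(1.0, 3.0), 2 * 3.0)  # x = 1 section compared with z = 2y
print(plane(-2.0, 1.0), 2 * -2.0)  # y = 1 section compared with z = 2x
```

Restricting the plane to $x = 1$ or $y = 1$ recovers the two tangent lines, which is the verification asked for in Example 6.25.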
We note in passing that, if $f$ is differentiable,8 then the tangent plane to $f(x, y)$ at the point $(a, b)$ gives us a linear approximation to $f(x, y)$ at nearby points, i.e.
\[ f(x, y) \approx f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b), \]
which, using vectors, we can write as
\[ f(x, y) \approx f(a, b) + \begin{pmatrix} f_x(a, b) & f_y(a, b) \end{pmatrix} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]

8 That is, if both of its partial derivatives exist.


Figure 6.11: The tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ as discussed in Example 6.25. The lines in this tangent plane, which lie in the $x = 1$ and $y = 1$ planes, are the tangent lines to the $x = 1$ and $y = 1$ sections of the surface respectively.

This prompts us to define the derivative of $f(x, y)$ with respect to the vector $\mathbf{x} = (x, y)$ to be the vector
\[ \frac{df}{d\mathbf{x}} = \begin{pmatrix} f_x(x, y) & f_y(x, y) \end{pmatrix}, \]
so that we can write
\[ f(x, y) \approx f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]
This then gives us something which looks like a Taylor series and we will see more of this in Section 6.4.5. But, before we do this, let's consider another important use of what we have just seen.
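To see how good this linear approximation is near the point of tangency, the following sketch evaluates it for $f(x, y) = x^2 + y^2$ around $(1, 1)$, the surface used in the preceding examples. The function names and the choice of moving the same distance `d` in both variables are assumptions made for illustration.

```python
def f(x, y):
    return x**2 + y**2

def linear_approx(x, y, a=1.0, b=1.0):
    """Tangent-plane approximation to f around (a, b):
    f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b), with f_x = 2x and f_y = 2y."""
    return f(a, b) + 2 * a * (x - a) + 2 * b * (y - b)

def error(d):
    # Approximation error after moving a distance d in both x and y from (1, 1).
    return f(1 + d, 1 + d) - linear_approx(1 + d, 1 + d)

for d in (0.1, 0.01, 0.001):
    print(d, error(d))
```

For this particular function the error works out to $(x - 1)^2 + (y - 1)^2$, so shrinking `d` by a factor of ten shrinks the error by a factor of about one hundred.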

6.4.2 Gradient vectors

The tangent plane to the surface $z = f(x, y)$ at the point $(a, b, c)$, where $c = f(a, b)$, has a Cartesian equation given by
\[ z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b). \]
Now, if we look at the intersection of the surface and its tangent plane with the horizontal plane $z = c$, we find that the surface gives us the contour $c = f(x, y)$ and the tangent plane gives us the line
\[ f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0. \]
Now, this line passes through the point $(a, b)$ and, given that this line is in the tangent plane of the surface at the point $(a, b, c)$, it should be clear that it is the tangent line of this contour at $(a, b)$. In particular, as we can write the equation of this line as
\[ \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \end{pmatrix} = 0, \]
in vector form, we can see that the vector
\[ \nabla f(a, b) = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix}, \tag{6.6} \]
is a normal vector to the tangent line and so it is perpendicular to the contour.


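Example 6.26 Given that $z = f(x, y)$ where $f(x, y) = x^2 + y^2$, find $\nabla f(1, 1)$. Show that this vector is perpendicular to the tangent line to the $z = 2$ contour of this surface at the point $(1, 1)$ and hence deduce that it is perpendicular to this contour at this point.
Here $f(x, y) = x^2 + y^2$ and so we have
\[ \nabla f(x, y) = \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \end{pmatrix}, \]
and, evaluating this at the point $(x, y) = (1, 1)$, we get
\[ \nabla f(1, 1) = \begin{pmatrix} 2 \\ 2 \end{pmatrix}. \]
Then, using (6.6), we see that the Cartesian equation of the tangent line to the $z = 2$ contour at this point9 is given by
\[ \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix} = 0 \implies 2(x - 1) + 2(y - 1) = 0 \implies y = 2 - x. \]
Now, for $x \in \mathbb{R}$, we have points $(x, y)$ on this tangent line given by
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ 2 - x \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} + x \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \]
and so this line lies in the direction given by the vector $(1, -1)^T$. But, of course,
\[ \nabla f(1, 1) \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 2 + (-2) = 0, \]
which means that $\nabla f(1, 1)$ is indeed perpendicular to this tangent line and, in particular, it will be perpendicular to the contour at this point too. This is illustrated in Figure 6.12.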
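The perpendicularity in Example 6.26 can also be checked by describing the contour directly. In the sketch below, the $z = 2$ contour $x^2 + y^2 = 2$ is parametrised by an angle $t$; this parametrisation and the function names are assumptions introduced for illustration and are not used in the guide.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x**2 + y**2."""
    return (2 * x, 2 * y)

def contour_point(t, c=2.0):
    """Point on the contour x**2 + y**2 = c, parametrised by the angle t."""
    r = math.sqrt(c)
    return (r * math.cos(t), r * math.sin(t))

def contour_tangent(t, c=2.0):
    """Derivative of contour_point with respect to t, i.e. a vector
    pointing along the contour at that point."""
    r = math.sqrt(c)
    return (-r * math.sin(t), r * math.cos(t))

t = math.pi / 4                 # this angle gives the point (1, 1)
px, py = contour_point(t)
gx, gy = grad_f(px, py)
tx, ty = contour_tangent(t)

dot = gx * tx + gy * ty         # zero (up to rounding) means perpendicular
print((px, py), (gx, gy), (tx, ty), dot)
```

The tangent vector produced here is a multiple of $(1, -1)^T$, the direction found in the example, and its dot product with the gradient vanishes up to rounding error.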
In general, given a function $f(x, y)$, we call the vector
\[ \nabla f(x, y) = \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix}, \tag{6.7} \]
the gradient of $f$. Indeed, we have seen that $f_x(a, b)$ and $f_y(a, b)$ allow us to see how rapidly $f$ is changing if we move away from the point $(a, b)$ in the $x$ or $y$-direction respectively. Now, we will look at how $\nabla f(a, b)$ allows us to see how rapidly $f$ is changing if we move away from the point $(a, b)$ in any direction.

9 Note that $(x, y) = (1, 1)$ gives $z = f(1, 1) = 2$ and so this point is on the $z = 2$ contour of this surface.


Figure 6.12: The $z = 2$ contour of the surface $z = x^2 + y^2$ and its tangent line, $y = 2 - x$, at the point $(1, 1)$ as discussed in Example 6.26. Observe how the tangent line to the contour at this point is perpendicular to the vector $\nabla f(1, 1)$. (The $x$ and $y$-intercepts of the contour have been omitted for clarity.)

6.4.3 Directional derivatives

Given the function $f(x, y)$, we want to find its derivative, $f_{\hat{\mathbf{u}}}(a, b)$, in the direction of the unit vector $\hat{\mathbf{u}} = (u_1, u_2)^T$.10 Of course, if $\hat{\mathbf{u}}$ is a unit vector in the $x$-direction, i.e.
\[ \hat{\mathbf{u}} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{we should get}\quad f_{\hat{\mathbf{u}}}(a, b) = f_x(a, b), \]
whereas if $\hat{\mathbf{u}}$ is a unit vector in the $y$-direction, i.e.
\[ \hat{\mathbf{u}} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad\text{we should get}\quad f_{\hat{\mathbf{u}}}(a, b) = f_y(a, b), \]
but the question is: What if we are not using either of these two directions?
Consider the point $(a, b, c)$ on the surface $z = f(x, y)$, where $c = f(a, b)$. At this point, we can find the section in the direction $\hat{\mathbf{u}}$, i.e. the curve of intersection of the surface and a plane that contains the point $(a, b, c)$ and the vector $\hat{\mathbf{u}}$. Then, geometrically, we would want to interpret $f_{\hat{\mathbf{u}}}(a, b)$ as the gradient of the tangent line to this section. Now, as $\hat{\mathbf{u}}$ is a unit vector, this means that we have a vector $\mathbf{v}$ given by
\[ \mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \\ f_{\hat{\mathbf{u}}}(a, b) \end{pmatrix}, \]
which lies in the plane and points in the direction of the tangent line. As such, this vector is perpendicular to the normal vector to the surface at this point and so we have
\[ \begin{pmatrix} u_1 \\ u_2 \\ f_{\hat{\mathbf{u}}}(a, b) \end{pmatrix} \cdot \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix} = 0. \]
That is, working out this dot product, we have
\[ u_1 f_x(a, b) + u_2 f_y(a, b) - f_{\hat{\mathbf{u}}}(a, b) = 0, \]


10 That is, we have a direction $\mathbf{u}$ and we work with a unit vector in that direction, i.e. we use $\hat{\mathbf{u}} = (u_1, u_2)^T$ where $|\hat{\mathbf{u}}| = \sqrt{u_1^2 + u_2^2} = 1$.


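or, rearranging,
\[ f_{\hat{\mathbf{u}}}(a, b) = u_1 f_x(a, b) + u_2 f_y(a, b) = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \cdot \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix}, \]
if we rewrite this in terms of inner products. Thus, we can see that the derivative of $f$ at the point $(a, b)$ in the direction of the unit vector $\hat{\mathbf{u}}$ is given by
\[ f_{\hat{\mathbf{u}}}(a, b) = \hat{\mathbf{u}} \cdot \nabla f(a, b), \]
in terms of the gradient of $f$.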
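Example 6.27 Given that $z = f(x, y)$ with $f(x, y) = x^2 + y^2$, find the derivative of $f(x, y)$ in the direction $(1, 2)^T$ at the point $(1, 1)$. What is the derivative of $f$ in the direction $\nabla f(1, 1)$?
We saw in Example 6.26 that the gradient of $f$ at the point $(1, 1)$ is given by
\[ \nabla f(1, 1) = \begin{pmatrix} 2 \\ 2 \end{pmatrix}. \]
So, taking the direction
\[ \mathbf{u} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{we get the unit vector}\quad \hat{\mathbf{u}} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \]
as $|\mathbf{u}|^2 = 1^2 + 2^2 = 5$ and this means that the derivative of $f$ in the direction of this unit vector is given by
\[ f_{\hat{\mathbf{u}}}(1, 1) = \hat{\mathbf{u}} \cdot \nabla f(1, 1) = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{5}}(2 + 4) = \frac{6}{\sqrt{5}}. \]
Similarly, if we take the direction to be $\mathbf{v} = \nabla f(1, 1)$, we have
\[ \mathbf{v} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} \quad\text{so we get the unit vector}\quad \hat{\mathbf{v}} = \frac{1}{\sqrt{8}} \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \]
as $|\mathbf{v}|^2 = 2^2 + 2^2 = 8$ and this means that the derivative of $f$ in the direction of this unit vector is given by
\[ f_{\hat{\mathbf{v}}}(1, 1) = \hat{\mathbf{v}} \cdot \nabla f(1, 1) = \frac{1}{\sqrt{8}} \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{8}}(4 + 4) = \frac{8}{\sqrt{8}}. \]
In particular, observe that the latter is approximately 2.83 (to 2dp) which is larger than the former which is approximately 2.68 (to 2dp).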
Indeed, this leads on to a useful observation about the rate at which $f$ is changing in different directions. We know, from Section 2.9 of 173 Algebra, that if $\theta$ is the angle between the vectors $\hat{\mathbf{u}}$ and $\nabla f(a, b)$, we have
\[ \hat{\mathbf{u}} \cdot \nabla f(a, b) = |\hat{\mathbf{u}}|\,|\nabla f(a, b)| \cos\theta = |\nabla f(a, b)| \cos\theta, \]
as $|\hat{\mathbf{u}}| = 1$ since $\hat{\mathbf{u}}$ is a unit vector. In particular, we can use the fact that $-1 \le \cos\theta \le 1$ to see that
\[ -|\nabla f(a, b)| \le f_{\hat{\mathbf{u}}}(a, b) \le |\nabla f(a, b)|. \]
That is, if $|\nabla f(a, b)| \neq 0$, we can deduce that:
The maximum rate of change of $f$ at the point $(a, b, c)$ is $|\nabla f(a, b)|$ and this occurs when $\theta = 0$, i.e. when the direction is $\mathbf{u} = \nabla f(a, b)$. This is the direction and rate at which $f$ increases most rapidly.
The minimum rate of change of $f$ at the point $(a, b, c)$ is $-|\nabla f(a, b)|$ and this occurs when $\theta = \pi$, i.e. when the direction is $\mathbf{u} = -\nabla f(a, b)$. This is the direction and rate at which $f$ decreases most rapidly.

Indeed, this allows us to see that, at the point $(a, b)$, $f$ is steepest in the direction $\nabla f(a, b)$.11
Example 6.28 Illustrate that the maximum rate of change of $f$ occurs in the direction $\nabla f$ using what we found in Example 6.27.
In Example 6.27, we saw that the rate of change in the direction $\mathbf{v} = \nabla f(1, 1)$ was greater than the rate of change in the direction $\mathbf{u} = (1, 2)^T$ as
\[ f_{\hat{\mathbf{v}}}(1, 1) > f_{\hat{\mathbf{u}}}(1, 1), \]
and we can illustrate this using Figure 6.13. In particular, observe that if we want to move to the $z = 4$ contour from the point $(1, 1)$ on the $z = 2$ contour, it is quickest to go in the direction given by $\nabla f(1, 1)$ as, if we were to go in the direction $\mathbf{u} = (1, 2)^T$, we would have to travel further. Consequently, the rate of change of $z = f(x, y)$ is maximised when we go in the direction given by $\nabla f(1, 1)$ and if we go in another direction, say $\mathbf{u} = (1, 2)^T$, it will be smaller.
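The numbers in Examples 6.27 and 6.28 can be reproduced, and the claim about the maximum rate of change explored, with the short sketch below. The function names and the one-degree scan over directions are assumptions made for illustration; the scan is only a numerical spot check, not a proof.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = x**2 + y**2."""
    return (2 * x, 2 * y)

def directional_derivative(grad, direction):
    """Derivative in the given direction: normalise the direction
    to a unit vector, then take its dot product with the gradient."""
    length = math.hypot(direction[0], direction[1])
    unit = (direction[0] / length, direction[1] / length)
    return unit[0] * grad[0] + unit[1] * grad[1]

g = grad_f(1.0, 1.0)
d_u = directional_derivative(g, (1.0, 2.0))  # direction (1, 2)^T from Example 6.27
d_v = directional_derivative(g, g)           # direction of the gradient itself

# Scan directions (cos t, sin t) in one-degree steps and keep the largest derivative.
best = max(
    directional_derivative(g, (math.cos(math.radians(i)), math.sin(math.radians(i))))
    for i in range(360)
)

print(d_u, d_v, best)
```

The largest value found by the scan should coincide with the derivative in the gradient direction, which in turn equals the length of the gradient vector.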

6.4.4 Implicitly defined functions of two variables

Suppose that we have a surface whose equation is given by $z = f(x, y)$. We could, of course, write this equation as $f(x, y) - z = 0$ and, in this form, the equation is now $g(x, y, z) = 0$ if we take $g$ to be the function of three variables given by
\[ g(x, y, z) = f(x, y) - z. \]
Indeed, more generally, we can see that a surface can be given by an equation of the form $g(x, y, z) = c$ where $g$, a function of three variables, is constrained to take the
11 Of course, if $|\nabla f(a, b)| = 0$ we find that $f_{\hat{\mathbf{u}}}(a, b) = 0$ in all directions, $\hat{\mathbf{u}}$! This will be important in Section 7.2.1.


Figure 6.13: The $z = 2$ and $z = 4$ contours of the surface $z = x^2 + y^2$ and the directions $\nabla f(1, 1)$ and $(1, 2)^T$ at the point $(1, 1)$ as discussed in Example 6.27. Observe how the quickest way to get to the $z = 4$ contour from the point $(1, 1)$ on the $z = 2$ contour is to go in the direction $\nabla f(1, 1)$. (The $x$ and $y$-intercepts of the $z = 2$ contour have been omitted for clarity.)
constant value, $c$. Sometimes, in such cases, we will be able to rearrange what we are given to explicitly find the equation of the surface in the form $z = f(x, y)$. But, what if we can't? That is, what if we can only implicitly define the function $f(x, y)$ through the equation $g(x, y, z) = c$? As we shall see, with minor modifications, we will be able to discuss certain aspects of such a surface using $g$ even if we can't find $f$.
Tangent planes
Technically, a function $g : \mathbb{R}^3 \to \mathbb{R}$ defines a hypersurface in $\mathbb{R}^4$ whose equation is given by $u = g(x, y, z)$. And, although we can't visualise such hypersurfaces because they live in a four-dimensional space, we can easily extend the theory of this chapter to say things about them. For instance, if we have the point $(a, b, c, d)$ where $d = g(a, b, c)$, it should be clear that the Cartesian equation of the tangent hyperplane to the hypersurface at this point is given by
\[ u - d = g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c), \]
which is the analogue of what we saw in (6.4).12 Indeed, rewriting this as
\[ g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) - (u - d) = 0, \]
we can see that the vector equation of this tangent hyperplane is
\[ \begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \\ u - d \end{pmatrix} = 0, \]
which is the analogue of (6.5) and the vector
\[ \begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \\ -1 \end{pmatrix}, \]
12 We could, of course, re-run the argument given in Section 6.4.1 in this new context but we refrain from doing that here.


is therefore one of its normal vectors as we might expect given what we saw before.
Here, however, we are interested in a surface in $\mathbb{R}^3$ whose equation, for some constant $d$, is given by $g(x, y, z) = d$ and this is the $u = d$ contour of the corresponding hypersurface in $\mathbb{R}^4$.13 In particular, we want to be able to find the tangent plane to this surface at a point $(a, b, c)$ where $g(a, b, c) = d$. So, setting $u = d$ in the Cartesian equation of the tangent hyperplane above, we get
\[ g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) = 0, \tag{6.8} \]
and this is the Cartesian equation of the tangent plane we seek. Let's see how this works in practice.
Example 6.29 Following on from Example 6.25, find the Cartesian equation of the tangent plane to the surface $z = x^2 + y^2$ at the point $(1, 1, 2)$ by using the function $g(x, y, z) = x^2 + y^2 - z$.
The surface whose equation is $z = x^2 + y^2$ can be represented by the equation $g(x, y, z) = 0$ with $g(x, y, z) = x^2 + y^2 - z$ and, as such, we have
\[ g_x(x, y, z) = 2x, \quad g_y(x, y, z) = 2y \quad\text{and}\quad g_z(x, y, z) = -1. \]
Thus, using the Cartesian equation for the tangent plane at the point $(a, b, c)$ on the surface $g(x, y, z) = d$ in (6.8), i.e.
\[ g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) = 0, \]
we verify that the point $(1, 1, 2)$ is on the surface as $g(1, 1, 2) = 1^2 + 1^2 - 2 = 0$ and see that
\[ 2(1)(x - 1) + 2(1)(y - 1) + (-1)(z - 2) = 0 \implies 2x + 2y - z = 2, \]
is the Cartesian equation of the tangent plane to the surface at this point in agreement with what we saw in Example 6.25.
But, of course, our real objective here is to see how to find a tangent plane when the
function of two variables which gives the surface is only implicitly defined through an
equation that involves a function of three variables as in the next example.
Example 6.30 Verify that the point $(1, 0, \pi)$ is on the surface whose equation is $x^3 + zy^3 + \sin z = 1$ and find the tangent plane to the surface at that point.
The point $(1, 0, \pi)$ is on the surface as $1^3 + (\pi)(0^3) + \sin\pi = 1 + 0 + 0 = 1$ and we can write the equation of the surface as $g(x, y, z) = 1$ with
\[ g(x, y, z) = x^3 + zy^3 + \sin z. \]
As such, we have
\[ g_x(x, y, z) = 3x^2, \quad g_y(x, y, z) = 3zy^2 \quad\text{and}\quad g_z(x, y, z) = y^3 + \cos z, \]
and using (6.8), we get
\[ 3(1^2)(x - 1) + 3(\pi)(0^2)(y - 0) + (0^3 + \cos\pi)(z - \pi) = 0 \implies 3x - z = 3 - \pi, \]
as the Cartesian equation of the tangent plane to the surface at this point.

13 In much the same way that a contour of a surface in $\mathbb{R}^3$ is a curve in $\mathbb{R}^2$!
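The working in Example 6.30 can be checked numerically. The sketch below evaluates $g$ and its partial derivatives at $(1, 0, \pi)$ and tests whether a chosen point satisfies (6.8); the helper `tangent_plane_residual` and the sample point are assumptions made for illustration.

```python
import math

def g(x, y, z):
    # Function defining the surface g(x, y, z) = 1 in Example 6.30.
    return x**3 + z * y**3 + math.sin(z)

def grad_g(x, y, z):
    """Partial derivatives (g_x, g_y, g_z) as found in Example 6.30."""
    return (3 * x**2, 3 * z * y**2, y**3 + math.cos(z))

def tangent_plane_residual(point, p):
    """Left-hand side of (6.8) evaluated at p, for the tangent plane at `point`.
    A value of zero means that p lies in the tangent plane."""
    normal = grad_g(*point)
    return sum(n * (q - a) for n, q, a in zip(normal, p, point))

point = (1.0, 0.0, math.pi)
print(g(*point))        # should be 1 (up to rounding), so the point is on the surface
print(grad_g(*point))   # the coefficients used in (6.8)

# A sample point chosen to satisfy 3x - z = 3 - pi; its y-coordinate is arbitrary.
sample = (2.0, 5.0, 3.0 + math.pi)
print(tangent_plane_residual(point, sample))
```

Because the coefficient of $(y - 0)$ is zero at this point, the plane does not depend on $y$, which is why any $y$-coordinate works for the sample point.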
Gradient vectors
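If we now write (6.8) in vector form, we get
\[ \begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0, \]
and so we can see that the vector
\[ \nabla g(a, b, c) = \begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \end{pmatrix}, \tag{6.9} \]
is a normal vector to the tangent plane and so it is perpendicular to the surface.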


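Example 6.31 Following on from Example 6.29, find the vector $\nabla g(1, 1, 2)$ where $g(x, y, z) = x^2 + y^2 - z$. Show that this vector is perpendicular to the tangent plane to the surface $g(x, y, z) = 0$ at the point $(1, 1, 2)$ and hence deduce that it is perpendicular to the surface at this point.
Here $g(x, y, z) = x^2 + y^2 - z$ and so we have
\[ \nabla g(x, y, z) = \begin{pmatrix} g_x(x, y, z) \\ g_y(x, y, z) \\ g_z(x, y, z) \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ -1 \end{pmatrix}, \]
and, evaluating this at the point $(1, 1, 2)$, we get
\[ \nabla g(1, 1, 2) = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}. \]
Then, using (6.9), we see that the Cartesian equation of the tangent plane to the surface $g(x, y, z) = 0$ at this point14 is given by
\[ \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} = 0 \implies 2(x - 1) + 2(y - 1) - (z - 2) = 0 \implies 2x + 2y - z = 2. \]
Now, for $x, y \in \mathbb{R}$, we have points $(x, y, z)$ on this tangent plane given by
\[ \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ -2 + 2x + 2y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -2 \end{pmatrix} + x \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \]
and so this plane lies in the directions given by the vectors $(1, 0, 2)^T$ and $(0, 1, 2)^T$. But, of course,
\[ \nabla g(1, 1, 2) \cdot \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = 2 + 0 + (-2) = 0, \]
and
\[ \nabla g(1, 1, 2) \cdot \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = 0 + 2 + (-2) = 0, \]
which means that $\nabla g(1, 1, 2)$ is indeed perpendicular to this tangent plane and, in particular, it will be perpendicular to the surface at this point too.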

In general, given a function $g(x, y, z)$, we call the vector
\[ \nabla g(x, y, z) = \begin{pmatrix} g_x(x, y, z) \\ g_y(x, y, z) \\ g_z(x, y, z) \end{pmatrix}, \]
the gradient of $g$ and, for a function of three variables, this is the analogue of what we saw in (6.7). Of course, we could then extend what we saw in Section 6.4.3, and use this to find the directional derivatives of a function of three variables. This, in turn, would allow us to see how rapidly this function is changing if we move away from a point in a certain direction and, in particular, it would allow us to find the maximum (or minimum) rate of change of such a function and the direction in which it occurs.

6.4.5 Taylor series

We saw in Section 3.4 that a function, $F(t)$, of one variable has a second-order Taylor series given by
\[ F(t) = F(a) + (t - a)F'(a) + \frac{(t - a)^2}{2!} F''(a) + \cdots, \]
around $t = a$. Now, we want to derive the corresponding result for a function, $f(x, y)$, of two variables around the point $(a, b)$ and, from what we saw when we considered tangent planes in Section 6.4.1, we should anticipate that the first two terms of this Taylor series will be given by
\[ f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
where the vector
\[ \frac{df}{d\mathbf{x}} = \begin{pmatrix} f_x(x, y) & f_y(x, y) \end{pmatrix}, \]
is the derivative of $f(x, y)$ with respect to $\mathbf{x} = (x, y)$. So, our main concern here is what the next term will look like.
14 Note that $(x, y, z) = (1, 1, 2)$ gives $g(1, 1, 2) = 1^2 + 1^2 - 2 = 0$ and so this point is on this surface.


If we want to find the Taylor series for a function, $f(x, y)$, around the point $(a, b)$ we need to see what is happening at some nearby point $(x, y)$. Let's say that, in terms of a new variable $t$, these points are related by the equations
\[ x = a + ht \quad\text{and}\quad y = b + kt, \]
for some appropriately small values of the numbers $ht$ and $kt$ since these points are supposed to be close to one another. Indeed, this means that we can define a new function, $F(t)$, of the single variable, $t$, given by
\[ F(t) = f(x(t), y(t)) \quad\text{where}\quad x(t) = a + ht \quad\text{and}\quad y(t) = b + kt, \]
where the idea is that $F(t)$ and its derivatives will allow us to use the Maclaurin series for $F(t)$, i.e.
\[ F(t) = F(0) + tF'(0) + \frac{t^2}{2!} F''(0) + \cdots, \]
to deduce the corresponding Taylor series for $f(x, y)$. In particular, we can see straightaway that
\[ F(0) = f(x(0), y(0)) = f(a, b), \]
which is the first of our anticipated terms. Now we need to find the derivatives $F'(t)$ and $F''(t)$ to see what the other two terms are.
To find $F'(t)$, we use the chain rule from Section 6.3.3 to see that
\[ F'(t) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = h f_x(x(t), y(t)) + k f_y(x(t), y(t)). \]
In particular, this means that
\[ F'(0) = h f_x(x(0), y(0)) + k f_y(x(0), y(0)) = h f_x(a, b) + k f_y(a, b), \]
so we can see that the next term in our Taylor series will be
\[ tF'(0) = ht f_x(a, b) + kt f_y(a, b) = (x - a) f_x(a, b) + (y - b) f_y(a, b) = \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
which is the second of our anticipated terms.
To find the remaining term, we need to find $F''(t)$ by differentiating our expression for $F'(t)$ with respect to $t$ using the chain rule. This gives us
\[ F''(t) = h\left[\frac{\partial f_x}{\partial x}\frac{dx}{dt} + \frac{\partial f_x}{\partial y}\frac{dy}{dt}\right] + k\left[\frac{\partial f_y}{\partial x}\frac{dx}{dt} + \frac{\partial f_y}{\partial y}\frac{dy}{dt}\right] \]
\[ = h\big[h f_{xx}(x(t), y(t)) + k f_{xy}(x(t), y(t))\big] + k\big[h f_{yx}(x(t), y(t)) + k f_{yy}(x(t), y(t))\big], \]
that is,
\[ F''(t) = h^2 f_{xx}(x(t), y(t)) + hk f_{xy}(x(t), y(t)) + kh f_{yx}(x(t), y(t)) + k^2 f_{yy}(x(t), y(t)), \]
and, in particular, this means that
\[ F''(0) = h^2 f_{xx}(a, b) + hk f_{xy}(a, b) + kh f_{yx}(a, b) + k^2 f_{yy}(a, b), \]


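so we can see that the next term in our Taylor series will be
\[ \frac{t^2}{2!} F''(0) = \frac{1}{2!}\Big[(x - a)^2 f_{xx}(a, b) + (x - a)(y - b) f_{xy}(a, b) + (y - b)(x - a) f_{yx}(a, b) + (y - b)^2 f_{yy}(a, b)\Big]. \]
Indeed, if we now define the second derivative of $f(x, y)$ with respect to $\mathbf{x} = (x, y)$ to be the matrix
\[ \frac{d^2 f}{d\mathbf{x}^2} = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix}, \]
it is easily verified that we have
\[ \frac{t^2}{2!} F''(0) = \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
as the next term in our Taylor series.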


Consequently, putting this all together we see that the second-order Taylor series for a
function, $f(x, y)$, of two variables around the point $(a, b)$ is given by
\[ f(x, y) = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!}\begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots, \]
and these terms will be sufficient for our purposes in this course. We will see how this
can be used in the next chapter, but for now, we will just use it to find an
approximation to a function of two variables around a certain point.
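The formula above can be sketched in a few lines of Python. This is only an illustration, not part of the guide: the helper name `taylor2` and the test function $x^2 + 3xy + y$ are assumptions chosen because a quadratic is reproduced exactly by its own second-order Taylor series.

```python
def taylor2(f0, grad, hess, a, b):
    """Return the second-order Taylor polynomial about (a, b) as a function.

    f0   -- the value f(a, b)
    grad -- the pair (f_x, f_y) evaluated at (a, b)
    hess -- ((f_xx, f_xy), (f_yx, f_yy)) evaluated at (a, b)
    """
    def p(x, y):
        dx, dy = x - a, y - b
        # First-order term: (df/dx) applied to the column vector (dx, dy).
        first = grad[0] * dx + grad[1] * dy
        # Second-order term: the quadratic form (dx, dy) H (dx, dy)^T.
        second = (hess[0][0] * dx * dx + hess[0][1] * dx * dy
                  + hess[1][0] * dy * dx + hess[1][1] * dy * dy)
        return f0 + first + second / 2
    return p


# Hypothetical test function: f(x, y) = x^2 + 3xy + y, expanded about (1, 2).
def f(x, y):
    return x**2 + 3 * x * y + y


# Here f_x = 2x + 3y, f_y = 3x + 1, f_xx = 2, f_xy = f_yx = 3, f_yy = 0.
p = taylor2(f(1, 2), (2 * 1 + 3 * 2, 3 * 1 + 1), ((2, 3), (3, 0)), 1, 2)
print(p(1.5, 1.0), f(1.5, 1.0))
```

Because the test function has no terms beyond second order, the two printed values should coincide; for a general function they would only agree approximately near $(a, b)$.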
Example 6.32 Find the second-order Taylor series of the function
$f(x, y) = e^x \cos y$ around the point $(1, 0)$.
The first term of our second-order Taylor series is simply $f(1, 0) = e^1 \cos 0 = e$. We
also see that
\[ \frac{df}{d\mathbf{x}} = \big(f_x(x, y),\ f_y(x, y)\big) = \big(e^x \cos y,\ -e^x \sin y\big), \]
which means that
\[ \left.\frac{df}{d\mathbf{x}}\right|_{(1,0)} = \big(e^1 \cos 0,\ -e^1 \sin 0\big) = (e,\ 0), \]
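and so the second term of our second-order Taylor series is
\[ \left.\frac{df}{d\mathbf{x}}\right|_{(1,0)} \begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix} = (e,\ 0) \begin{pmatrix} x - 1 \\ y \end{pmatrix} = e(x - 1). \]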

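Lastly, we see that
\[ \frac{d^2 f}{d\mathbf{x}^2} = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix} = \begin{pmatrix} e^x \cos y & -e^x \sin y \\ -e^x \sin y & -e^x \cos y \end{pmatrix}, \]
which means that
\[ \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(1,0)} = \begin{pmatrix} e^1 \cos 0 & -e^1 \sin 0 \\ -e^1 \sin 0 & -e^1 \cos 0 \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & -e \end{pmatrix}, \]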


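and so the third term of our second-order Taylor series is
\begin{align*}
\frac{1}{2!}\begin{pmatrix} x - 1 & y - 0 \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(1,0)} \begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix}
&= \frac{1}{2!}\begin{pmatrix} x - 1 & y \end{pmatrix} \begin{pmatrix} e & 0 \\ 0 & -e \end{pmatrix} \begin{pmatrix} x - 1 \\ y \end{pmatrix} \\
&= \frac{1}{2!}\begin{pmatrix} x - 1 & y \end{pmatrix} \begin{pmatrix} e(x - 1) \\ -e y \end{pmatrix} \\
&= \frac{1}{2!}\left[e(x - 1)^2 - e y^2\right].
\end{align*}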

Consequently, putting this all together, we find that
\[ f(x, y) \simeq e + e(x - 1) + \frac{1}{2!}\left[e(x - 1)^2 - e y^2\right], \]
is the second-order Taylor series of $f(x, y) = e^x \cos y$ around the point $(1, 0)$.

Activity 6.20 Find an approximation to $e^{1.1} \cos 0.2$ by using the second-order
Taylor series that we found in Example 6.32.

Activity 6.21 Find the second-order Taylor series in the previous example by using
the Taylor series for $e^x$ about $x = 1$ (see Example 3.31) and the Maclaurin series for
$\cos y$ (see Section 3.4.1).

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
visualise a surface by using sections and contours;
find partial derivatives;
use the chain rule to find derivatives of various kinds;
show that a function is homogeneous and verify Euler's theorem;
solve problems from economics-based subjects that involve partial derivatives;
find tangent planes and gradient vectors;
find directional derivatives and interpret what you have found;
find Taylor series and use these to approximate functions of two variables.


Solutions to activities
Solution to activity 6.1
To find the contours of the surface $z = 4x + 2y - 2$ when we have the given values of $z$,
we note that:
For $z = -10$, the curve of intersection is given by $-10 = 4x + 2y - 2$ which gives us
$y = -2x - 4$.
For $z = 0$, the curve of intersection is given by $0 = 4x + 2y - 2$ which gives us
$y = -2x + 1$.
For $z = 10$, the curve of intersection is given by $10 = 4x + 2y - 2$ which gives us
$y = -2x + 6$.

Thus, we see from these equations that all three of the contours are straight lines. The
sketch of these contours in the (x, y)-plane is illustrated in Figure 6.14.
Figure 6.14: A sketch of the $z = -10$, $z = 0$ and $z = 10$ contours of the surface
$z = 4x + 2y - 2$ in the $(x, y)$-plane for Activity 6.1.

Solution to activity 6.2


To find the $z = -25$ contour of the surface $z = -x^2 - y^2$ we need to find the curve of
intersection which, in this case, is simply
\[ -x^2 - y^2 = -25 \implies x^2 + y^2 = 25. \]
This is the equation of a circle, centred on the origin, with a radius of five.
To find the $z = c$ contours in the three cases indicated we just need to find out what the
curve
\[ -x^2 - y^2 = c \implies x^2 + y^2 = -c, \]
looks like in the three cases. So, we have:
If $c > 0$, there are no contours as we have $-c < 0$ and we know that $x^2 + y^2 \geq 0$ for
all values of $x$ and $y$.
If $c = 0$, the contour is the point $(0, 0)$ as this is the only solution to the equation
$x^2 + y^2 = 0$.
If $c < 0$, the contour is a circle, centred on the origin, with a radius of $\sqrt{-c}$ as we
have $-c > 0$.

In particular, notice that $z = 0$ is the largest value of $z$ that arises from a point on this
surface.
Solution to activity 6.3
To find these sections of the surface $z = 4x + 2y - 2$ we need to find the curves of
intersection, which in this case, are given by:
For the $(x, z)$-section, we have $y = 0$ and so the curve of intersection is given by
$z = 4x - 2$ and this is a straight line in the $(x, z)$-plane.
For the $(y, z)$-section, we have $x = 0$ and so the curve of intersection is given by
$z = 2y - 2$ and this is a straight line in the $(y, z)$-plane.

These sections are illustrated in Figure 6.15.

Figure 6.15: A sketch of (a) the $(x, z)$-section, $z = 4x - 2$, and (b) the $(y, z)$-section,
$z = 2y - 2$, of the surface $z = 4x + 2y - 2$ for Activity 6.3.


Solution to activity 6.4
To find these sections of the surface $z = -x^2 - y^2$ we need to find the curves of
intersection, which in this case, are given by:
For the $(x, z)$-section, we have $y = 0$ and so the curve of intersection is given by
$z = -x^2$ and this is a parabola in the $(x, z)$-plane.
For the $(y, z)$-section, we have $x = 0$ and so the curve of intersection is given by
$z = -y^2$ and this is a parabola in the $(y, z)$-plane.

These sections are illustrated in Figure 6.16.

Figure 6.16: A sketch of (a) the $(x, z)$-section, $z = -x^2$, and (b) the $(y, z)$-section,
$z = -y^2$, of the surface $z = -x^2 - y^2$ for Activity 6.4.


Solution to activity 6.5
To find these sections of the surface $z = x - y + 4$ we need to find the curves of
intersection, which in this case, are given by:
For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by
$z = -y + 4$ and this is a straight line in the $(y, z)$-plane. Of course, this is just the
$(y, z)$-section we found in Example 6.3!
For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by
$z = 2 - y + 4 = -y + 6$ and this is a straight line.
For the $x = 4$ section, we have $x = 4$ and so the curve of intersection is given by
$z = 4 - y + 4 = -y + 8$ and this is a straight line.

Observe that only the first of these sections lives in the (y, z)-plane but, as illustrated
in Figure 6.17, we can also sketch the other two in this plane to get a feel for how the
surface is changing when we look at the sections x = c for different values of c.
Figure 6.17: The $x = 0$, $x = 2$ and $x = 4$ sections of the surface $z = x - y + 4$ for
Activity 6.5.

Solution to activity 6.6


To find the $y = -2, 0, 2$ sections of the surface $z = 4x + 2y - 2$ we need to find the
curves of intersection, which in this case, are given by:
For the $y = -2$ section, we have $y = -2$ and so the curve of intersection is given by
$z = 4x - 4 - 2 = 4x - 6$ and this is a straight line.
For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by
$z = 4x - 2$ and this is a straight line in the $(x, z)$-plane. Of course, this is just the
$(x, z)$-section we found in Activity 6.3 and it is the only one that lives in the
$(x, z)$-plane!
For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by
$z = 4x + 4 - 2 = 4x + 2$ and this is a straight line.

These sections are illustrated in Figure 6.18(a).

Similarly, to find the $x = -2, 0, 2$ sections of the surface $z = 4x + 2y - 2$ we need to find
the curves of intersection, which in this case, are given by:
For the $x = -2$ section, we have $x = -2$ and so the curve of intersection is given by
$z = -8 + 2y - 2 = 2y - 10$ and this is a straight line.
For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by
$z = 2y - 2$ and this is a straight line in the $(y, z)$-plane. Of course, this is just the
$(y, z)$-section we found in Activity 6.3 and it is the only one that lives in the
$(y, z)$-plane!
For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by
$z = 8 + 2y - 2 = 2y + 6$ and this is a straight line.

These sections are illustrated in Figure 6.18(b).

Figure 6.18: A sketch of (a) the $y = -2, 0, 2$ sections and (b) the $x = -2, 0, 2$ sections of
the surface $z = 4x + 2y - 2$ for Activity 6.6.

Solution to activity 6.7


To find these sections of the surface $z = x^2 + y^2$ we need to find the curves of
intersection, which in this case, are given by:
For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by
$z = x^2$ and this is a parabola in the $(x, z)$-plane. Of course, this is just the
$(x, z)$-section we found in Example 6.4!
For the $y = 1$ section, we have $y = 1$ and so the curve of intersection is given by
$z = x^2 + 1$ and this is a parabola.
For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by
$z = x^2 + 4$ and this is a parabola.
Observe that only the first of these sections lives in the $(x, z)$-plane but, as illustrated
in Figure 6.19, we can also sketch the other two in this plane to get a feel for how the
surface is changing when we look at the sections $y = c$ for different values of $c$.

Figure 6.19: The $y = 0$, $y = 1$ and $y = 2$ sections of the surface $z = x^2 + y^2$ for Activity 6.7.

Solution to activity 6.8


To find the $y = 0, 1, 2$ sections of the surface $z = -x^2 - y^2$ we need to find the curves of
intersection, which in this case, are given by:
For the $y = 0$ section, we have $y = 0$ and so the curve of intersection is given by
$z = -x^2$ and this is a parabola in the $(x, z)$-plane. Of course, this is just the
$(x, z)$-section we found in Activity 6.4 and it is the only one that lives in the
$(x, z)$-plane!
For the $y = 1$ section, we have $y = 1$ and so the curve of intersection is given by
$z = -x^2 - 1$ and this is a parabola.
For the $y = 2$ section, we have $y = 2$ and so the curve of intersection is given by
$z = -x^2 - 4$ and this is a parabola.

These sections are illustrated in Figure 6.20(a).

Similarly, to find the $x = 0, 1, 2$ sections of the surface $z = -x^2 - y^2$ we need to find the
curves of intersection, which in this case, are given by:
For the $x = 0$ section, we have $x = 0$ and so the curve of intersection is given by
$z = -y^2$ and this is a parabola in the $(y, z)$-plane. Of course, this is just the
$(y, z)$-section we found in Activity 6.4 and it is the only one that lives in the
$(y, z)$-plane!
For the $x = 1$ section, we have $x = 1$ and so the curve of intersection is given by
$z = -1 - y^2$ and this is a parabola.
For the $x = 2$ section, we have $x = 2$ and so the curve of intersection is given by
$z = -4 - y^2$ and this is a parabola.

These sections are illustrated in Figure 6.20(b).


Figure 6.20: A sketch of (a) the $y = 0, 1, 2$ sections and (b) the $x = 0, 1, 2$ sections of the
surface $z = -x^2 - y^2$ for Activity 6.8.


Solution to activity 6.9
The partial derivative of $f(x, y)$ with respect to $y$, i.e. the result of differentiating
$f(x, y)$ with respect to $y$ whilst holding $x$ constant, is going to be another function of $x$
and $y$. This function of $x$ and $y$ is what is denoted by the symbols in (6.2). What does
this partial derivative mean? In effect, what we have done when we consider the
function $f(x, y)$ for some fixed value of $x$, say $x_0$, is to look at the section of the surface
$z = f(x, y)$ we get when $x = x_0$, i.e. the section given by the equation $z = f(x_0, y)$
which lies in the plane $x = x_0$, parallel to the $(y, z)$-plane. Then, when we
differentiate $f(x_0, y)$ with respect to $y$, we are finding the gradient of this section, i.e. it
tells us how $z = f(x_0, y)$ is varying with $y$. Consequently, this partial derivative is
telling us something about the gradient of the surface when we are at the point $(x_0, y)$
and we are looking in the $y$-direction.
Solution to activity 6.10
Given the function
\[ f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2} = 2x + x^3 y - x y^{-1} + \frac{y^3}{2}, \]
we hold $y$ constant and differentiate with respect to $x$, to see that
\[ \frac{\partial f}{\partial x} = 2 + 3x^2 y - y^{-1} = 2 + 3x^2 y - \frac{1}{y}, \]
and we hold $x$ constant and differentiate with respect to $y$, to see that
\[ \frac{\partial f}{\partial y} = x^3 + x y^{-2} + \frac{3}{2} y^2 = x^3 + \frac{x}{y^2} + \frac{3}{2} y^2. \]
These are the sought after partial derivatives $f_x(x, y)$ and $f_y(x, y)$ respectively.
Solution to activity 6.11
Given the function
\[ f(x, y) = \sqrt{x^2 + y^2} = (x^2 + y^2)^{1/2}, \]
we hold $y$ constant and differentiate with respect to $x$ using the chain rule to get
\[ \frac{\partial f}{\partial x} = \frac{1}{2}(x^2 + y^2)^{-1/2}(2x) = \frac{x}{\sqrt{x^2 + y^2}}, \]
and we hold $x$ constant and differentiate with respect to $y$ using the chain rule to get
\[ \frac{\partial f}{\partial y} = \frac{1}{2}(x^2 + y^2)^{-1/2}(2y) = \frac{y}{\sqrt{x^2 + y^2}}. \]
These are the sought after partial derivatives $f_x(x, y)$ and $f_y(x, y)$ respectively.
Solution to activity 6.12
Here $f(x, y) = x^2 y$, $x(t) = 2 + 3t$ and $y(t) = t^2 + 1$. In this case, if we again let
$F(t) = f(x(t), y(t))$, the chain rule states that
\[ \frac{dF}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \]
As such, using this, we can see that
\[ \frac{dF}{dt} = (2xy)(3) + (x^2)(2t) = 2x(3y + xt), \]
and so, substituting our expressions for $x(t)$ and $y(t)$, we get
\[ \frac{dF}{dt} = 2(2 + 3t)\big[3(t^2 + 1) + (2 + 3t)t\big] = 2(2 + 3t)(6t^2 + 2t + 3). \]
To check this, we note that
\[ F(t) = f(x(t), y(t)) = (2 + 3t)^2 (t^2 + 1), \]
which, using the product and chain rules, gives us
\[ \frac{dF}{dt} = [2(2 + 3t)(3)](t^2 + 1) + (2 + 3t)^2 (2t) = 2(2 + 3t)\big[3(t^2 + 1) + t(2 + 3t)\big], \]
and this agrees with our earlier answer.
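A numerical cross-check of this chain-rule answer can be sketched as follows. The function names and the choice of a central-difference quotient with step `h` are illustrative assumptions, not something the guide prescribes.

```python
def F(t):
    # F(t) = f(x(t), y(t)) with f(x, y) = x^2 y, x = 2 + 3t, y = t^2 + 1.
    x, y = 2 + 3 * t, t**2 + 1
    return x**2 * y


def dF_chain(t):
    # The derivative obtained from the chain rule in the solution above.
    return 2 * (2 + 3 * t) * (6 * t**2 + 2 * t + 3)


def dF_numeric(t, h=1e-6):
    # Central-difference estimate of dF/dt (an approximation only).
    return (F(t + h) - F(t - h)) / (2 * h)


for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, dF_chain(t), dF_numeric(t))
```

The two columns should agree to several decimal places at every sample value of $t$.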


Solution to activity 6.13


We have a function, $y(x)$, which is defined implicitly by the equation
\[ x^2 + 2xy + 3y^3 = 6, \]
and we notice that, at the point $(x, y) = (1, 1)$ we have
\[ (1)^2 + 2(1)(1) + 3(1)^3 = 6, \]
and so this point does indeed satisfy the equation. To find its derivative at this point we
note that we have $g(x, y) = c$ where
\[ g(x, y) = x^2 + 2xy + 3y^3 \quad\text{and}\quad c = 6, \]
and we use the fact that
\[ \frac{dy}{dx} = -\frac{\partial g / \partial x}{\partial g / \partial y}, \]
to get
\[ \frac{dy}{dx} = -\frac{2x + 2y}{2x + 9y^2} = -\frac{2(x + y)}{2x + 9y^2}, \]
as long as $2x + 9y^2 \neq 0$. And, clearly, at the point $(1, 1)$, this gives us
\[ \left.\frac{dy}{dx}\right|_{(1,1)} = -\frac{2(1 + 1)}{2 + 9} = -\frac{4}{11}, \]
as the value of the derivative.

Solution to activity 6.14
We have $G(k, l) = g(x(k, l), y(k, l))$ and we want to explain why the chain rule formula
for $G_l(k, l)$ works. To do this, consider that if we change $l$ by a small amount, $\delta l$, whilst
holding $k$ constant, the corresponding change in $G(k, l)$ is given by
\[ \delta G \simeq \frac{\partial G}{\partial l}\,\delta l, \]
but here, there are two ways in which $G(k, l) = g(x(k, l), y(k, l))$ can change with $l$.
Firstly, $G$ can change with $l$ because $g$ changes with $x$ and $x$ changes with $l$; let's
denote this change in $G$ by $\delta_x G$. In this case, we have
\[ \delta_x G \simeq \frac{\partial g}{\partial x}\,\delta x, \]
as we are holding $y$ constant to see how $G$ changes with $x$ and this means that
\[ \delta_x G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial l}\,\delta l, \]
as the change in $x$, $\delta x$, is related to a change in $l$ with $k$ held constant by
\[ \delta x \simeq x_l(k, l)\,\delta l. \]
Secondly, $G$ can change with $l$ because $g$ changes with $y$ and $y$ changes with $l$; let's
denote this change in $G$ by $\delta_y G$. In this case, we have
\[ \delta_y G \simeq \frac{\partial g}{\partial y}\,\delta y, \]
as we are holding $x$ constant to see how $G$ changes with $y$ and this means that
\[ \delta_y G \simeq \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}\,\delta l, \]
as the change in $y$, $\delta y$, is related to a change in $l$ with $k$ held constant by
\[ \delta y \simeq y_l(k, l)\,\delta l. \]
Thus, as the total change in $G$ due to these two changes is given by
\[ \delta G = \delta_x G + \delta_y G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial l}\,\delta l + \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}\,\delta l, \]
we can now equate our two expressions for $\delta G$ and divide through by $\delta l$ to get the
chain rule for $G_l(k, l)$ which we wanted.

Solution to activity 6.15


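To see why the formula for $z_y(x, y)$ works, we consider that if we knew the function,
$z(x, y)$, that satisfied the equation $g(x, y, z) = c$, we could find a new function, $G(x, y)$,
of $x$ and $y$ only which is given by $G(x, y) = g(x, y, z(x, y))$. Then using the chain rule,
we have
\[ \frac{\partial G}{\partial y} = \frac{\partial g}{\partial y}\frac{dy}{dy} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial y}. \]
But, $G(x, y) = c$ where $c$ is a constant and so we also have
\[ \frac{\partial G}{\partial y} = 0 \quad\text{as well as}\quad \frac{dy}{dy} = 1, \]
which means that we are left with
\[ 0 = \frac{\partial g}{\partial y} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial y}. \]
Rearranging this then gives us the formula we require, i.e.
\[ \frac{\partial z}{\partial y} = -\frac{\partial g / \partial y}{\partial g / \partial z}, \]
as long as $g_z(x, y, z) \neq 0$.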
Solution to activity 6.16
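We have a function, $q(k, l)$, which is defined implicitly by the equation
\[ q^3 k + k^3 l + q k^2 l = 3, \]
and we want to find its partial derivatives with respect to $k$ and $l$. To do this, we
rewrite the equation as $g(q, k, l) = c$ so that we have, say,
\[ g(q, k, l) = q^3 k + k^3 l + q k^2 l \quad\text{and}\quad c = 3, \]
and use the formulas
\[ \frac{\partial q}{\partial k} = -\frac{\partial g / \partial k}{\partial g / \partial q} \quad\text{and}\quad \frac{\partial q}{\partial l} = -\frac{\partial g / \partial l}{\partial g / \partial q}, \]
to see that the partial derivatives are
\[ \frac{\partial q}{\partial k} = -\frac{q^3 + 3k^2 l + 2qkl}{3q^2 k + k^2 l} \quad\text{and}\quad \frac{\partial q}{\partial l} = -\frac{k^3 + q k^2}{3q^2 k + k^2 l}, \]
provided that $3q^2 k + k^2 l \neq 0$. [15]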


Now, to evaluate these partial derivatives at the point where $(k, l) = (1, 1)$, we need to
find the corresponding value of $q$. This can be done by noting that, when we have $k = 1$
and $l = 1$, the equation becomes
\[ q^3 + q - 2 = 0, \]
and, using the hint, we see that this equation can be written as
\[ (q - 1)(q^2 + q + 2) = 0. \]
Indeed, since
\[ q^2 + q + 2 = \left(q + \frac{1}{2}\right)^2 + \frac{7}{4} > 0, \]
for all $q \in \mathbb{R}$, we see that $q = 1$ is the only solution to this equation. Thus, the point we
are interested in has coordinates $(k, l, q) = (1, 1, 1)$ and, at this point, we have
\[ \frac{\partial q}{\partial k} = -\frac{1 + 3 + 2}{3 + 1} = -\frac{6}{4} = -\frac{3}{2} \quad\text{and}\quad \frac{\partial q}{\partial l} = -\frac{1 + 1}{3 + 1} = -\frac{2}{4} = -\frac{1}{2}, \]
as the values of the partial derivatives at this point.


Solution to activity 6.17
In Activity 6.10, we saw that the function
\[ f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2} = 2x + x^3 y - x y^{-1} + \frac{y^3}{2}, \]
had partial derivatives given by
\[ \frac{\partial f}{\partial x} = 2 + 3x^2 y - y^{-1} \quad\text{and}\quad \frac{\partial f}{\partial y} = x^3 + x y^{-2} + \frac{3}{2} y^2. \]
So, partially differentiating $f_x(x, y)$ with respect to $x$ and $y$ respectively, we get
\[ f_{xx}(x, y) = 6xy \quad\text{and}\quad f_{xy}(x, y) = 3x^2 + y^{-2} = 3x^2 + \frac{1}{y^2}, \]
whereas, partially differentiating $f_y(x, y)$ with respect to $x$ and $y$ respectively, we get
\[ f_{yx}(x, y) = 3x^2 + y^{-2} = 3x^2 + \frac{1}{y^2} \quad\text{and}\quad f_{yy}(x, y) = -2x y^{-3} + 3y = -\frac{2x}{y^3} + 3y. \]
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.


[15] Notice that, in particular, we can never have $k = 0$ here as this does not satisfy the equation
$q^3 k + k^3 l + q k^2 l = 3$.

Solution to activity 6.18


Given the function $f(x, y) = x^{3/4} y^{1/4}$, we partially differentiate with respect to $x$ and $y$
respectively to get
\[ f_x(x, y) = \frac{3}{4} x^{-1/4} y^{1/4} \quad\text{and}\quad f_y(x, y) = \frac{1}{4} x^{3/4} y^{-3/4}, \]
as the first-order partial derivatives. Then, for the second-order partial derivatives, we
note that partially differentiating $f_x(x, y)$ with respect to $x$ and $y$ respectively, we get
\[ f_{xx}(x, y) = -\frac{3}{16} x^{-5/4} y^{1/4} \quad\text{and}\quad f_{xy}(x, y) = \frac{3}{16} x^{-1/4} y^{-3/4}, \]
whereas, partially differentiating $f_y(x, y)$ with respect to $x$ and $y$ respectively, we get
\[ f_{yx}(x, y) = \frac{3}{16} x^{-1/4} y^{-3/4} \quad\text{and}\quad f_{yy}(x, y) = -\frac{3}{16} x^{3/4} y^{-7/4}. \]
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.

Solution to activity 6.19


In Activity 6.11, we saw that the function
\[ f(x, y) = \sqrt{x^2 + y^2} = (x^2 + y^2)^{1/2}, \]
had partial derivatives given by
\[ \frac{\partial f}{\partial x} = x(x^2 + y^2)^{-1/2} \quad\text{and}\quad \frac{\partial f}{\partial y} = y(x^2 + y^2)^{-1/2}. \]
So, partially differentiating $f_x(x, y)$ with respect to $x$ using the product and chain rules
we get
\[ f_{xx}(x, y) = (1)(x^2 + y^2)^{-1/2} + (x)\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2x)\right] = \frac{(x^2 + y^2) - x^2}{(x^2 + y^2)^{3/2}} = \frac{y^2}{(x^2 + y^2)^{3/2}}, \]
and partially differentiating $f_x(x, y)$ with respect to $y$ using the chain rule we get
\[ f_{xy}(x, y) = x\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2y)\right] = -\frac{xy}{(x^2 + y^2)^{3/2}}. \]
Similarly, partially differentiating $f_y(x, y)$ with respect to $x$ using the chain rule we get
\[ f_{yx}(x, y) = y\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2x)\right] = -\frac{xy}{(x^2 + y^2)^{3/2}}, \]
and partially differentiating $f_y(x, y)$ with respect to $y$ using the product and chain rules
we get
\[ f_{yy}(x, y) = (1)(x^2 + y^2)^{-1/2} + (y)\left[-\frac{1}{2}(x^2 + y^2)^{-3/2}(2y)\right] = \frac{(x^2 + y^2) - y^2}{(x^2 + y^2)^{3/2}} = \frac{x^2}{(x^2 + y^2)^{3/2}}. \]
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.


Solution to activity 6.20


To find an approximation to $e^{1.1} \cos 0.2$ using the second-order Taylor series in
Example 6.32, we have
\[ e^{1.1} \cos 0.2 \simeq e + e(1.1 - 1) + \frac{1}{2!}\left[e(1.1 - 1)^2 - e(0.2)^2\right] = 1.085\,e, \]
and, using the value of $e$, we find that $e^{1.1} \cos 0.2 \simeq 2.949$ to 3dp. Indeed, as the point
$(1.1, 0.2)$ is close to the point $(1, 0)$ we expect this to be a good approximation. Of
course, the exact value of $e^{1.1} \cos 0.2$ is 2.944 to 3dp and so we can see that our
approximation agrees with this to 1dp.
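The arithmetic in this solution can be reproduced with a short Python sketch. The function name `taylor2_exp_cos` is an illustrative assumption; the formula inside it is the second-order Taylor series from Example 6.32.

```python
import math


def taylor2_exp_cos(x, y):
    # Second-order Taylor series of e^x cos y about (1, 0), as in Example 6.32.
    e = math.e
    return e + e * (x - 1) + 0.5 * (e * (x - 1) ** 2 - e * y**2)


approx = taylor2_exp_cos(1.1, 0.2)
exact = math.exp(1.1) * math.cos(0.2)
print(round(approx, 3), round(exact, 3))  # prints 2.949 2.944
```

The difference between the two values is about 0.005, which is why the approximation and the exact value agree to one decimal place but not to two.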
Solution to activity 6.21
As we saw in Example 3.31, the second-order Taylor series for $e^x$ around $x = 1$ is
\[ e^x \simeq e + (x - 1)e + \frac{(x - 1)^2}{2!}e, \]
and as we saw in Section 3.4.1, the second-order Maclaurin series (i.e. the Taylor series
around $y = 0$) of $\cos y$ is
\[ \cos y \simeq 1 - \frac{y^2}{2!}. \]
This means that, around the point $(1, 0)$, we would have
\[ e^x \cos y \simeq \left[e + (x - 1)e + \frac{(x - 1)^2}{2!}e\right]\left[1 - \frac{y^2}{2!}\right], \]
and, multiplying out the brackets and discarding terms which are more than
second-order in $(x - 1)$ and $y$ since these are small around the point $(1, 0)$, we get
\[ e^x \cos y \simeq e + (x - 1)e + \frac{(x - 1)^2}{2!}e - e\frac{y^2}{2!}, \]
which is the same as what we found in Example 6.32.

Exercises
Exercise 6.1
Find the first and second-order partial derivatives of the function
\[ f(x, y) = 2xy + x^{2a} y^{a}, \]
where $a$ is a constant.
If this function satisfies the equation
\[ x^2 \frac{\partial^2 f}{\partial x^2} - 2y^2 \frac{\partial^2 f}{\partial y^2} - 18 f(x, y) + 36xy = 0, \]
find all possible values of $a$.


Exercise 6.2
For some numbers $\alpha$, $\beta$ and $\gamma$, a function, $f$, takes the form
\[ f(x, y) = \frac{x^{2\alpha} + y^{\beta}}{x^2 + y^{\gamma}}. \]
If $f$ is homogeneous of degree four, find the values of $\alpha$, $\beta$ and $\gamma$. Having found these
values, verify that the function satisfies Euler's theorem.
Exercise 6.3
Suppose that $R(q, p) = e^{q+p}$ and that $p$ is a positive function of $q$ defined implicitly by
the equation
\[ q^2 p + p^2 q + qp = 3. \]
Given that $r(q) = R(q, p(q))$, use the chain rule to find its derivative, $r'(q)$, when $q = 1$.
Exercise 6.4

A function $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by
\[ f(x, y) = x^2 - 2y^2, \]
and the point $P$ has coordinates $(1, -1)$.
(a) Find the direction and rate at which $f$ increases most rapidly at $P$.
(b) Find the rate of change of $f$ at $P$ in the direction $(1, 1)^T$.
(c) Verify that the point $P$ is on the curve
\[ x^2 - 2y^2 = -1, \]
and find the Cartesian equation of the tangent line to this curve at this point.
Exercise 6.5
A function $f : \mathbb{R}^3 \to \mathbb{R}$ is defined by
\[ f(x, y, z) = \ln(xy + z). \]
(a) Find the gradient of $f$ at the point $(a, b, c)$.
(b) Verify that the point $(1, 1, 0)$ is on the surface
\[ \ln(xy + z) = 0, \]
and find the normal vector and the tangent plane to the surface at this point.
(c) Consider the points, $(x, y, z)$, at which the rate of increase of $f$ in the direction
$(x/2, y/2, z)^T$ is equal to two. Show that all of these points lie on the surface with
equation
\[ x^2 + y^2 + 4z^2 = 1. \]


Solutions to exercises
Solution to exercise 6.1
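Given that $f(x, y) = 2xy + x^{2a} y^{a}$ where $a$ is a constant, its first and second-order partial
derivatives are given by
\[ \frac{\partial f}{\partial x} = 2y + 2a x^{2a-1} y^{a}, \quad \frac{\partial^2 f}{\partial x^2} = 2a(2a - 1) x^{2a-2} y^{a} \quad\text{and}\quad \frac{\partial^2 f}{\partial y \partial x} = 2 + 2a^2 x^{2a-1} y^{a-1}, \]
and
\[ \frac{\partial f}{\partial y} = 2x + a x^{2a} y^{a-1}, \quad \frac{\partial^2 f}{\partial y^2} = a(a - 1) x^{2a} y^{a-2} \quad\text{and}\quad \frac{\partial^2 f}{\partial x \partial y} = 2 + 2a^2 x^{2a-1} y^{a-1}. \]
Observe, in particular, that $f_{xy}(x, y) = f_{yx}(x, y)$ as we should expect in this course.
If this function satisfies the equation
\[ x^2 \frac{\partial^2 f}{\partial x^2} - 2y^2 \frac{\partial^2 f}{\partial y^2} - 18 f(x, y) + 36xy = 0, \]
we can substitute in the relevant terms to see that we must have
\[ x^2\left[2a(2a - 1) x^{2a-2} y^{a}\right] - 2y^2\left[a(a - 1) x^{2a} y^{a-2}\right] - 18\left[2xy + x^{2a} y^{a}\right] + 36xy = 0, \]
which can be tidied up to give us
\[ 2a(2a - 1) x^{2a} y^{a} - 2a(a - 1) x^{2a} y^{a} - 36xy - 18 x^{2a} y^{a} + 36xy = 0, \]
and, after further simplification, we get
\[ 2(a^2 - 9) x^{2a} y^{a} = 0. \]
Consequently, as $x, y \in \mathbb{R}$, we must have $a^2 = 9$ which means that $a = \pm 3$ are the
possible values of $a$ if $f$ has to satisfy the given equation.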
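As a quick check of this conclusion, the left-hand side of the equation can be evaluated at a sample point for several values of $a$. The sketch below is illustrative only: the function name `residual` and the sample point $(1.5, 0.7)$ are assumptions, and the second derivatives are the ones found above.

```python
def residual(a, x, y):
    # Left-hand side of x^2 f_xx - 2 y^2 f_yy - 18 f + 36 x y
    # for f(x, y) = 2xy + x^(2a) y^a, using the derivatives found above.
    f = 2 * x * y + x ** (2 * a) * y**a
    f_xx = 2 * a * (2 * a - 1) * x ** (2 * a - 2) * y**a
    f_yy = a * (a - 1) * x ** (2 * a) * y ** (a - 2)
    return x**2 * f_xx - 2 * y**2 * f_yy - 18 * f + 36 * x * y


for a in (3, -3, 2):
    print(a, residual(a, 1.5, 0.7))
```

For $a = 3$ and $a = -3$ the residual should vanish up to rounding error, whereas for a value such as $a = 2$ it should be clearly non-zero.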
Solution to exercise 6.2
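For the function
\[ f(x, y) = \frac{x^{2\alpha} + y^{\beta}}{x^2 + y^{\gamma}}, \]
to be homogeneous of degree four for some numbers $\alpha$, $\beta$ and $\gamma$, we require that
\[ f(\lambda x, \lambda y) = \frac{(\lambda x)^{2\alpha} + (\lambda y)^{\beta}}{(\lambda x)^2 + (\lambda y)^{\gamma}}, \]
is equal to $\lambda^4 f(x, y)$. But, in order for this to happen, we must find that the
numerator is homogeneous, i.e. we have $2\alpha = \beta$ so that
\[ (\lambda x)^{2\alpha} + (\lambda y)^{\beta} = (\lambda x)^{\beta} + (\lambda y)^{\beta} = \lambda^{\beta}(x^{\beta} + y^{\beta}), \]
giving us a numerator whose degree of homogeneity is $\beta = 2\alpha$.
denominator is homogeneous, i.e. we have $\gamma = 2$ so that
\[ (\lambda x)^2 + (\lambda y)^{\gamma} = (\lambda x)^2 + (\lambda y)^2 = \lambda^2 (x^2 + y^2), \]
giving us a denominator whose degree of homogeneity is $\gamma = 2$.
overall degree of homogeneity is four, i.e. we must find that
\[ \frac{(\lambda x)^{2\alpha} + (\lambda y)^{\beta}}{(\lambda x)^2 + (\lambda y)^{\gamma}} = \frac{\lambda^{\beta}(x^{\beta} + y^{\beta})}{\lambda^2 (x^2 + y^2)} = \lambda^{\beta - 2}\,\frac{x^{\beta} + y^{\beta}}{x^2 + y^2} = \lambda^{\beta - 2} f(x, y), \]
is equal to $\lambda^4 f(x, y)$. That is, we must have $\beta - 2 = 4$ so that $\beta = 6$.

Consequently, we find that $\alpha = 3$ (since $2\alpha = \beta$), $\beta = 6$ and $\gamma = 2$ so that our sought
after homogeneous function is
\[ f(x, y) = \frac{x^6 + y^6}{x^2 + y^2}. \]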
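To verify that Euler's theorem holds for this function, we need to show that
\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = 4 f(x, y). \]
To do this, we use the quotient rule to see that
\[ \frac{\partial f}{\partial x} = \frac{(6x^5)(x^2 + y^2) - (x^6 + y^6)(2x)}{(x^2 + y^2)^2} \quad\text{and}\quad \frac{\partial f}{\partial y} = \frac{(6y^5)(x^2 + y^2) - (x^6 + y^6)(2y)}{(x^2 + y^2)^2}, \]
which means that we have
\begin{align*}
x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y}
&= x\left[\frac{(6x^5)(x^2 + y^2) - (x^6 + y^6)(2x)}{(x^2 + y^2)^2}\right] + y\left[\frac{(6y^5)(x^2 + y^2) - (x^6 + y^6)(2y)}{(x^2 + y^2)^2}\right] \\
&= \frac{6x^6 (x^2 + y^2) - 2x^2 (x^6 + y^6) + 6y^6 (x^2 + y^2) - 2y^2 (x^6 + y^6)}{(x^2 + y^2)^2} \\
&= \frac{6(x^6 + y^6)(x^2 + y^2) - 2(x^2 + y^2)(x^6 + y^6)}{(x^2 + y^2)^2} \\
&= \frac{4(x^6 + y^6)}{x^2 + y^2} \\
&= 4 f(x, y),
\end{align*}
as required.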
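Both the homogeneity and Euler's theorem can also be checked numerically at a sample point. In this sketch the helper `partial`, the step size `h` and the sample point $(1.3, -0.8)$ are illustrative assumptions; the partial derivatives are estimated by central differences rather than taken from the quotient-rule expressions above.

```python
def f(x, y):
    return (x**6 + y**6) / (x**2 + y**2)


def partial(func, x, y, wrt, h=1e-6):
    # Central-difference estimate of a first-order partial derivative.
    if wrt == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


x, y = 1.3, -0.8
lhs = x * partial(f, x, y, "x") + y * partial(f, x, y, "y")
rhs = 4 * f(x, y)
print(lhs, rhs)                      # Euler's theorem: x f_x + y f_y = 4 f
print(f(2 * x, 2 * y), 16 * f(x, y))  # homogeneity with lambda = 2: 2^4 = 16
```

Each printed pair should agree up to the small error introduced by the finite-difference step and floating-point rounding.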
Solution to exercise 6.3
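Given that, $r(q) = R(q, p(q))$, the chain rule tells us that
\[ \frac{dr}{dq} = \frac{\partial R}{\partial q}\frac{dq}{dq} + \frac{\partial R}{\partial p}\frac{dp}{dq} = \frac{\partial R}{\partial q} + \frac{\partial R}{\partial p}\frac{dp}{dq}, \]
and so, as $R(q, p) = e^{q+p}$, we have
\[ \frac{dr}{dq} = e^{q+p} + e^{q+p}\frac{dp}{dq} = e^{q+p}\left(1 + \frac{dp}{dq}\right). \]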

Now we need to calculate $p'(q)$ given that $p = p(q)$ is defined through the equation
\[ q^2 p + p^2 q + qp = 3. \]
To do this, we let $G(q, p)$ be the function defined by
\[ G(q, p) = q^2 p + p^2 q + qp, \]
so that the given equation is now $G(q, p) = 3$. With this, we then have
\[ \frac{dp}{dq} = -\frac{\partial G / \partial q}{\partial G / \partial p}, \]
where
\[ \frac{\partial G}{\partial q} = 2qp + p^2 + p \quad\text{and}\quad \frac{\partial G}{\partial p} = q^2 + 2pq + q, \]
which gives us
\[ \frac{dp}{dq} = -\frac{2qp + p^2 + p}{q^2 + 2pq + q}, \]
provided that $q^2 + 2pq + q \neq 0$.


To take stock, so far, we have found that
\[ \frac{dr}{dq} = e^{q+p}\left(1 + \frac{dp}{dq}\right) \quad\text{and}\quad \frac{dp}{dq} = -\frac{2qp + p^2 + p}{q^2 + 2pq + q}, \]
and we need to evaluate this at the point where $q = 1$. In particular, we now need to
find the value of $p$ that corresponds to $q = 1$ if $p = p(q)$ is the positive function of $q$
defined implicitly by the equation
\[ q^2 p + p^2 q + qp = 3. \]
That is, if we set $q = 1$ in this equation we get
\[ p + p^2 + p = 3 \implies p^2 + 2p - 3 = 0 \implies (p + 3)(p - 1) = 0, \]
i.e. the possible values of $p$ are $-3$ and $1$. But, we are told that $p$ is a positive function
of $q$ and so we reject $p = -3$ and take the point where $q = 1$ and $p = 1$ to be the one we
are interested in. Then, at this point, we find that
\[ \frac{dp}{dq} = -\frac{2 + 1 + 1}{1 + 2 + 1} = -1 \implies \frac{dr}{dq} = e^{1+1}(1 + [-1]) = 0, \]
i.e. $r'(q) = 0$ when $q = 1$.


Solution to exercise 6.4
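For (a), the function $f(x, y) = x^2 - 2y^2$ has a gradient vector given by
\[ \nabla f = \begin{pmatrix} 2x \\ -4y \end{pmatrix}, \]
and so at the point $P$, i.e. $(1, -1)$, we have
\[ \nabla f(1, -1) = \begin{pmatrix} 2 \\ 4 \end{pmatrix}, \]
and this is the direction in which $f$ is increasing most rapidly at $P$. We then find that
\[ |\nabla f(1, -1)| = \sqrt{4 + 16} = \sqrt{20}, \]
is the rate of change of $f$ in this direction and so this is the rate at which $f$ increases
most rapidly.
For (b), a unit vector in the direction $\mathbf{v} = (1, 1)^T$ is $\hat{\mathbf{v}} = (1/\sqrt{2}, 1/\sqrt{2})^T$ and so
\[ f_{\hat{\mathbf{v}}}(1, -1) = \hat{\mathbf{v}} \cdot \nabla f(1, -1) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \frac{1}{\sqrt{2}}(2 + 4) = \frac{6}{\sqrt{2}} = 3\sqrt{2}, \]
is the rate of change of $f$ at $P$ in the direction $\mathbf{v}$.

For (c), the point $P$ is on the curve as $1^2 - 2(-1)^2 = 1 - 2 = -1$. To find the equation
of the tangent line to the curve at this point, we use (6.6), to see that
\[ \nabla f(1, -1) \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} = 0 \implies \begin{pmatrix} 2 \\ 4 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} = 0 \implies 2(x - 1) + 4(y + 1) = 0, \]
i.e. $x + 2y = -1$ is the Cartesian equation of the tangent line to the curve at $P$.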
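The three parts of this solution can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names are illustrative, and the point $(3, -2)$ used to test the tangent line is simply one point chosen here that satisfies $x + 2y = -1$.

```python
import math


def grad_f(x, y):
    # Gradient of f(x, y) = x^2 - 2y^2.
    return (2 * x, -4 * y)


gx, gy = grad_f(1, -1)

# (a) The largest rate of increase is the length of the gradient vector.
steepest_rate = math.hypot(gx, gy)

# (b) Directional derivative along v = (1, 1)^T after normalising v.
vx, vy = 1, 1
directional = (gx * vx + gy * vy) / math.hypot(vx, vy)


# (c) Left-hand side of the tangent line equation grad f . (x - 1, y + 1) = 0.
def tangent_lhs(x, y):
    return gx * (x - 1) + gy * (y + 1)


print((gx, gy), steepest_rate, directional, tangent_lhs(3, -2))
```

The printed values should correspond to the gradient $(2, 4)^T$, the rate $\sqrt{20}$, the directional derivative $3\sqrt{2}$, and zero for the sample point on the tangent line.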


Solution to exercise 6.5
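For (a), given the function $f(x, y, z) = \ln(xy + z)$ we have
\[ \nabla f(x, y, z) = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix} = \begin{pmatrix} y/(xy + z) \\ x/(xy + z) \\ 1/(xy + z) \end{pmatrix} = \frac{1}{xy + z}\begin{pmatrix} y \\ x \\ 1 \end{pmatrix}, \]
and so the gradient vector is
\[ \nabla f(a, b, c) = \frac{1}{ab + c}\begin{pmatrix} b \\ a \\ 1 \end{pmatrix}, \]
at the point $(a, b, c)$.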

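For (b), we see that the point $(1, 1, 0)$ is on the surface as $\ln([1][1] + 0) = \ln 1 = 0$ and
the normal vector to the surface at this point is
\[ \nabla f(1, 1, 0) = \frac{1}{(1)(1) + 0}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]
Then, using (6.8), we have
\[ \nabla f(1, 1, 0) \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 0 \end{pmatrix} = 0 \implies \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 0 \end{pmatrix} = 0 \implies 1(x - 1) + 1(y - 1) + 1(z - 0) = 0, \]
i.e. $x + y + z = 2$ is the Cartesian equation of the tangent plane to the surface at the point
$(1, 1, 0)$.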
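For (c), we note that at all points, $(x, y, z)$, we have
\[ \nabla f(x, y, z) = \frac{1}{xy + z}\begin{pmatrix} y \\ x \\ 1 \end{pmatrix}, \]
and that the direction $\mathbf{v} = (x/2, y/2, z)^T$ can be written as
\[ \mathbf{v} = \frac{1}{2}\begin{pmatrix} x \\ y \\ 2z \end{pmatrix}, \]
which means that a unit vector in this direction is given by
\[ \hat{\mathbf{v}} = \frac{1}{\sqrt{x^2 + y^2 + 4z^2}}\begin{pmatrix} x \\ y \\ 2z \end{pmatrix}. \]
The rate of increase of $f$ in the direction of the unit vector $\hat{\mathbf{v}}$ at a point $(x, y, z)$ is then
given by $f_{\hat{\mathbf{v}}}(x, y, z)$, i.e. we have
\[ \hat{\mathbf{v}} \cdot \nabla f(x, y, z) = \frac{xy + xy + 2z}{(xy + z)\sqrt{x^2 + y^2 + 4z^2}} = \frac{2(xy + z)}{(xy + z)\sqrt{x^2 + y^2 + 4z^2}} = \frac{2}{\sqrt{x^2 + y^2 + 4z^2}}, \]
where we have just found the dot product of the two vectors $\hat{\mathbf{v}}$ and $\nabla f(x, y, z)$.
Consequently, when $f_{\hat{\mathbf{v}}}(x, y, z) = 2$, we have points $(x, y, z)$ that satisfy the equation,
\[ 2 = \frac{2}{\sqrt{x^2 + y^2 + 4z^2}} \implies x^2 + y^2 + 4z^2 = 1, \]
as required.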


Chapter 7
Two-variable optimisation
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 4.6, 4.7, 6.3-6.8.
Anthony and Biggs (1996) Chapter 13, parts of Chapters 14 and 21.
Further reading
Simon and Blume (1994) parts of Chapter 17, 18 and 19.
Adams and Essex (2010) parts of Sections 13.1-13.3.

Aims and objectives


The objectives of this chapter are as follows.
To use partial derivatives to solve problems where a function needs to be optimised.
To solve problems where a function needs to be optimised subject to a constraint.
Specific learning outcomes can be found near the end of this chapter.

7.1 Introduction

Having seen how to find partial derivatives and gained some insight into what they tell
us about a function of two variables in the last chapter, we now see how they can be
used to optimise such a function. In particular, we will see how the first-order partial
derivatives allow us to find the stationary points of a function and its second-order
partial derivatives allow us to see whether such a point is a maximum or a minimum. We
will also see how to optimise a function of two variables in cases where the variables are
constrained, i.e. they are required to satisfy some extra condition known as a constraint.

7.2 Unconstrained optimisation

We start by considering unconstrained optimisation, i.e. we are looking for the places
where a function of two variables, $f(x, y)$, attains its maximum or minimum values
when $x$ and $y$ are independent and $(x, y)$ is free to take any value in $\mathbb{R}^2$.

7.2.1 Stationary points

Suppose we have a surface $z = f(x, y)$ whose tangent plane at the point $(a, b, c)$ where
$c = f(a, b)$ is given by (6.4), i.e.
\[ z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b). \]
We define a stationary point of this function to be any point where the tangent plane to
the function is horizontal and so, in this case, the tangent plane would have to be $z = c$.
But, if this is the case, it means that we must have
\[ f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0, \]
for all $x, y \in \mathbb{R}$ which, in turn, means that we must have
\[ f_x(a, b) = 0 \quad\text{and}\quad f_y(a, b) = 0. \]
Thus, we find that the point $(x, y) = (a, b)$ is a stationary point of the function $f(x, y)$ if
both first-order partial derivatives of the function are zero at that point. Consequently,
in order to find the stationary points of a function, $f(x, y)$, we must find all points $(x, y)$
that satisfy the equations
\[ f_x(x, y) = 0 \quad\text{and}\quad f_y(x, y) = 0, \]
simultaneously.
Example 7.1 Find the stationary points of the function
\[ f(x, y) = x^4 + 2x^2 y + 2y^2 + y. \]
The first-order partial derivatives of this function are
\[ f_x(x, y) = 4x^3 + 4xy \quad\text{and}\quad f_y(x, y) = 2x^2 + 4y + 1. \]
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to
solve the simultaneous equations
\[ 4x^3 + 4xy = 0 \quad\text{and}\quad 2x^2 + 4y + 1 = 0. \]
If we start by looking at the first equation, this gives us
\[ 4x^3 + 4xy = 0 \implies 4x(x^2 + y) = 0 \implies x = 0 \text{ or } y = -x^2. \]
And so, to satisfy the second equation with:
$x = 0$ we must have
\[ 2(0)^2 + 4y + 1 = 0 \implies y = -\frac{1}{4}, \]
i.e. $(0, -1/4)$ is a stationary point.
$y = -x^2$ we must have
\[ 2x^2 + 4(-x^2) + 1 = 0 \implies 2x^2 = 1 \implies x^2 = \frac{1}{2} \implies x = \pm\frac{1}{\sqrt{2}}, \]
which in turn gives us
\[ y = -\left(\pm\frac{1}{\sqrt{2}}\right)^2 = -\frac{1}{2}, \]
i.e. $(1/\sqrt{2}, -1/2)$ and $(-1/\sqrt{2}, -1/2)$ are stationary points.

Consequently, the points
\[ \left(0, -\frac{1}{4}\right), \quad \left(\frac{1}{\sqrt{2}}, -\frac{1}{2}\right) \quad\text{and}\quad \left(-\frac{1}{\sqrt{2}}, -\frac{1}{2}\right) \]
are stationary points of this function.
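A simple way to confirm these answers is to substitute each candidate point back into both first-order partial derivatives. The following sketch does exactly that; the variable names are illustrative assumptions, and the derivatives are the ones found in the example.

```python
import math


def f_x(x, y):
    return 4 * x**3 + 4 * x * y


def f_y(x, y):
    return 2 * x**2 + 4 * y + 1


r = 1 / math.sqrt(2)
candidates = [(0.0, -0.25), (r, -0.5), (-r, -0.5)]

# Both partial derivatives should vanish (up to rounding) at each point.
residuals = [(f_x(x, y), f_y(x, y)) for x, y in candidates]
for point, res in zip(candidates, residuals):
    print(point, res)
```

Any residual that was not essentially zero would indicate that the corresponding point fails one of the two simultaneous equations.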


Example 7.2 Find the stationary points of the function
\[ f(x, y) = 4x^3 - 60xy + 5y^2 + 400y - 35. \]
The first-order partial derivatives of this function are
\[ f_x(x, y) = 12x^2 - 60y \quad\text{and}\quad f_y(x, y) = -60x + 10y + 400. \]
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
\[ 12x^2 - 60y = 0 \quad\text{and}\quad -60x + 10y + 400 = 0. \]
We start by simplifying these equations to get
\[ x^2 - 5y = 0 \quad\text{and}\quad -6x + y + 40 = 0, \]
and then notice that the first equation gives us $y = x^2/5$. Substituting this into the second equation then allows us to see that
\[ -6x + \frac{x^2}{5} + 40 = 0 \implies x^2 - 30x + 200 = 0 \implies (x - 20)(x - 10) = 0 \implies x = 10 \text{ or } x = 20, \]
and, since $y = x^2/5$, we have
\[ y = \frac{10^2}{5} = 20 \quad\text{or}\quad y = \frac{20^2}{5} = 80, \]
respectively. Thus, this function has two stationary points, namely the points (10, 20) and (20, 80).
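As a quick numerical sanity check on Examples 7.1 and 7.2, the sketch below evaluates the first-order partial derivatives at each stationary point found above and confirms that both vanish. The function names (`grad_71`, `grad_72`, `is_stationary`) and the tolerance are illustrative choices, not part of the guide.

```python
import math

def grad_71(x, y):
    """First-order partials of f(x, y) = x^4 + 2x^2 y + 2y^2 + y (Example 7.1)."""
    return (4 * x**3 + 4 * x * y, 2 * x**2 + 4 * y + 1)

def grad_72(x, y):
    """First-order partials of f(x, y) = 4x^3 - 60xy + 5y^2 + 400y - 35 (Example 7.2)."""
    return (12 * x**2 - 60 * y, -60 * x + 10 * y + 400)

def is_stationary(grad, point, tol=1e-9):
    """Return True if both partial derivatives are zero (up to tol) at the point."""
    fx, fy = grad(*point)
    return abs(fx) < tol and abs(fy) < tol

r = 1 / math.sqrt(2)
points_71 = [(0.0, -0.25), (r, -0.5), (-r, -0.5)]  # found in Example 7.1
points_72 = [(10.0, 20.0), (20.0, 80.0)]           # found in Example 7.2

for p in points_71:
    print("Example 7.1", p, is_stationary(grad_71, p))
for p in points_72:
    print("Example 7.2", p, is_stationary(grad_72, p))
```

A check like this only confirms that the points found are stationary; it does not show that no other stationary points exist, which is what the algebra in the examples establishes.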
Activity 7.1 Find the stationary points of the function
\[ f(x, y) = x^2 - 4x + y^2 + 4y + 8. \]

Activity 7.2 Find the stationary points of the function
\[ f(x, y) = 3x^3 + 9x^2 - 72x + 2y^3 - 12y^2 - 126y + 19. \]

In particular, notice that at a stationary point, i.e. at a point, (a, b), where
\[ f_x(a, b) = 0 \quad\text{and}\quad f_y(a, b) = 0, \]
we can see that the gradient vector of the function becomes
\[ \nabla f(a, b) = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \mathbf{0}. \]
That is, if we are at a stationary point, we can see that the rate of change of f in any direction given by the unit vector $\hat{\mathbf{u}}$ is zero as
\[ f_{\hat{\mathbf{u}}}(a, b) = \hat{\mathbf{u}} \cdot \nabla f(a, b) = \hat{\mathbf{u}} \cdot \mathbf{0} = 0, \]
which means that at a stationary point, the rate of change of f is zero in all directions.

We have now seen how to find the stationary points of a function, f(x, y), but what do they look like? Generally speaking, we will find that there are three kinds of stationary point, namely local minima, saddle points and local maxima, and these are illustrated in Figure 7.1(a), (b) and (c) respectively. We now consider what criteria we can use to determine exactly what kind of stationary point we have found.
Figure 7.1: (a) local minimum, (b) saddle point, (c) local maximum. Each of these surfaces has the indicated kind of stationary point at (0, 0, 0).

7.2.2 Classifying stationary points

Let's say that we have found that (a, b) is a stationary point of the function, f(x, y). This means that
\[ f_x(a, b) = 0 \quad\text{and}\quad f_y(a, b) = 0, \]
and so, in particular, the derivative of f at this point is given by
\[ \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} = \begin{pmatrix} f_x(a, b) & f_y(a, b) \end{pmatrix} = \begin{pmatrix} 0 & 0 \end{pmatrix} = \mathbf{0}. \]

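However, we saw in Section 6.4.5, that the second-order Taylor series of the function f(x, y) around the point (a, b) is given by
\[ f(x, y) = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots, \]
and so, as (a, b) is a stationary point, we have
\[ f(x, y) - f(a, b) = \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots, \]
provided that the point (x, y) is sufficiently close to the point (a, b). Consequently, if we let K(x, y) be the quantity
\[ \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]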

we can see that: If, for all (x, y) close to [but not equal to] (a, b), we have:
K(x, y) > 0, then f (x, y) > f (a, b) for such points and so the function always lies
above the horizontal tangent plane at (a, b). This means that the stationary point
is a local minimum as in Figure 7.1(a).
K(x, y) < 0, then f (x, y) < f (a, b) for such points and so the function always lies
below the horizontal tangent plane at (a, b). This means that the stationary point
is a local maximum as in Figure 7.1(c).
However, if we find that there are some points (x, y) close to [but not equal to] (a, b)
that make K(x, y) > 0 and others that make K(x, y) < 0, we see that at some points we
have f (x, y) > f (a, b) and so the function lies above the horizontal tangent plane and at
other points we have f (x, y) < f (a, b) and so the function lies below the horizontal
tangent plane. Indeed, as we saw in Figure 7.1(b), this is exactly what happens when
we have a saddle point.
Now, it turns out that,[1] if we use the definition of the second derivative matrix, we have
\[ K(x, y) = \begin{pmatrix} x - a & y - b \end{pmatrix} \begin{pmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{pmatrix} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
which means that, performing the matrix multiplications, we get
\[ K(x, y) = (x - a)^2 f_{xx}(a, b) + 2(x - a)(y - b) f_{xy}(a, b) + (y - b)^2 f_{yy}(a, b), \]
if we assume, as usual, that $f_{xy}(a, b) = f_{yx}(a, b)$.

[1] This is most easily done if we show that the second derivative of f(x, y) at the point (a, b), i.e. the matrix
\[ \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} = \begin{pmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{pmatrix}, \]
is positive definite or negative definite as in Binmore and Davies (2002) Section 6.3. But you won't encounter these concepts until you study 175 Further Linear Algebra and so we merely motivate the result that follows here.

Then, taking out a factor of $f_{xx}(a, b)$ and completing the square, we get[2]
\[ K(x, y) = f_{xx}(a, b) \left[ \left( (x - a) + \frac{f_{xy}(a, b)}{f_{xx}(a, b)} (y - b) \right)^2 + \frac{f_{xx}(a, b) f_{yy}(a, b) - [f_{xy}(a, b)]^2}{[f_{xx}(a, b)]^2} (y - b)^2 \right]. \]
At this point we define the Hessian of f(x, y) to be the function
\[ H(x, y) = f_{xx}(x, y) f_{yy}(x, y) - [f_{xy}(x, y)]^2, \]
so that, finally, we have
\[ K(x, y) = f_{xx}(a, b) \left[ \left( (x - a) + \frac{f_{xy}(a, b)}{f_{xx}(a, b)} (y - b) \right)^2 + \frac{H(a, b)}{[f_{xx}(a, b)]^2} (y - b)^2 \right]. \]

Now, if we look at this carefully, we see that:


If H(a, b) > 0 and fxx (a, b) > 0, then K(x, y) > 0 for all (x, y) close to [but not
equal to] (a, b) and so, as we saw above, this means that the stationary point (a, b)
is a local minimum.
If H(a, b) > 0 and fxx (a, b) < 0, then K(x, y) < 0 for all (x, y) close to [but not
equal to] (a, b) and so, as we saw above, this means that the stationary point (a, b)
is a local maximum.

Indeed, if we find that H(a, b) < 0, we can see that there will be some points (x, y) close
to [but not equal to] (a, b) that make K(x, y) > 0 and others that make K(x, y) < 0. In
this case, as we saw above, this means that the stationary point (a, b) is a saddle point.
In summary, we have now motivated the following method for classifying our stationary
points:
If (a, b) is a stationary point of the function, f (x, y), and the Hessian is defined
to be the function
\[ H(x, y) = f_{xx}(x, y) f_{yy}(x, y) - [f_{xy}(x, y)]^2, \]
then
If H(a, b) > 0 and fxx (a, b) > 0, then this stationary point is a local
minimum.
If H(a, b) > 0 and fxx (a, b) < 0, then this stationary point is a local
maximum.
If H(a, b) < 0, then this stationary point is a saddle point.
In particular, if H(a, b) = 0, we can draw no conclusions about the nature of
the stationary point by using this method.
Let's look at some examples of how this works in practice.

[2] Technically, we have assumed that $f_{xx}(a, b) \neq 0$ here, but if this was not the case we could present a slightly different argument to deal with this problem. However, as we are just trying to motivate what follows instead of providing a rigorous argument for it, we will skip these technicalities here.

Example 7.3 Classify the stationary points we found in Example 7.1.
Using the first-order partial derivatives we found in Example 7.1, we find that the second-order partial derivatives are
\[ f_{xx}(x, y) = 12x^2 + 4y, \quad f_{xy}(x, y) = 4x = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 4, \]
and, as such, the Hessian is given by
\[ H(x, y) = (12x^2 + 4y)(4) - (4x)^2 = 48x^2 + 16y - 16x^2 = 16(2x^2 + y). \]
Evaluating this at each of the stationary points we then find that:
At $(0, -1/4)$, the Hessian is
\[ H(0, -1/4) = 16(-1/4) < 0, \]
and so this is a saddle point.
At $(1/\sqrt{2}, -1/2)$, the Hessian is
\[ H(1/\sqrt{2}, -1/2) = 16(1/2) > 0 \quad\text{and}\quad f_{xx}(1/\sqrt{2}, -1/2) = 6 - 2 > 0, \]
so this is a local minimum.
At $(-1/\sqrt{2}, -1/2)$, the Hessian is
\[ H(-1/\sqrt{2}, -1/2) = 16(1/2) > 0 \quad\text{and}\quad f_{xx}(-1/\sqrt{2}, -1/2) = 6 - 2 > 0, \]
so this is a local minimum.
Thus, the stationary points we found in Example 7.1, i.e.
\[ \left(0, -\frac{1}{4}\right), \quad \left(\frac{1}{\sqrt{2}}, -\frac{1}{2}\right) \quad\text{and}\quad \left(-\frac{1}{\sqrt{2}}, -\frac{1}{2}\right) \]
are a saddle point and two local minima respectively.


Example 7.4 Classify the stationary points we found in Example 7.2.
Using the first-order partial derivatives we found in Example 7.2, we find that the second-order partial derivatives are
\[ f_{xx}(x, y) = 24x, \quad f_{xy}(x, y) = -60 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 10, \]
and, as such, the Hessian is given by
\[ H(x, y) = (24x)(10) - (-60)^2 = 240x - 3600 = 240(x - 15). \]
Evaluating this at each of the stationary points we then find that:
At (10, 20), the Hessian is
\[ H(10, 20) = 240(-5) < 0, \]
and so this is a saddle point.
At (20, 80), the Hessian is
\[ H(20, 80) = 240(5) > 0 \quad\text{and}\quad f_{xx}(20, 80) = 24(20) > 0, \]
so this is a local minimum.
Thus, the stationary points (10, 20) and (20, 80) are a saddle point and a local minimum respectively.
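The classification method used in Examples 7.3 and 7.4 can be sketched in a few lines of code. This is only an illustration: the function name `classify`, its argument order and the helper functions are assumptions made for this sketch, and the second-order partial derivatives are typed in by hand from the two examples.

```python
def classify(fxx, fyy, fxy):
    """Classify a stationary point from the values of its second-order partials.

    Uses the Hessian H = fxx * fyy - fxy**2 as defined in this section.
    """
    hessian = fxx * fyy - fxy**2
    if hessian < 0:
        return "saddle point"
    if hessian > 0:
        # H > 0 forces fxx * fyy > 0, so fxx cannot be zero here.
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"  # H = 0: the method draws no conclusion

def second_partials_73(x, y):
    """(fxx, fyy, fxy) for Example 7.3: fxx = 12x^2 + 4y, fyy = 4, fxy = 4x."""
    return (12 * x**2 + 4 * y, 4, 4 * x)

def second_partials_74(x, y):
    """(fxx, fyy, fxy) for Example 7.4: fxx = 24x, fyy = 10, fxy = -60."""
    return (24 * x, 10, -60)

r = 0.5 ** 0.5  # 1 / sqrt(2)
for point in [(0, -0.25), (r, -0.5), (-r, -0.5)]:
    print("Example 7.3", point, classify(*second_partials_73(*point)))
for point in [(10, 20), (20, 80)]:
    print("Example 7.4", point, classify(*second_partials_74(*point)))
```

Returning "inconclusive" when the Hessian is zero mirrors the last line of the summary: the method simply gives no answer in that case.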
Activity 7.3 Classify the stationary points we found in Activity 7.1.

Activity 7.4 Classify the stationary points we found in Activity 7.2.

Lastly, we have remarked above that in cases where the Hessian is zero at a stationary
point, the method that we have used so far fails. Indeed, in such cases, the stationary
point could be a local minimum, a local maximum or a saddle point and, to determine
which, we would have to think more carefully about what is happening. Let's consider
an example of a function where this kind of problem occurs.

Example 7.5 Find the stationary point of the function $f(x, y) = x^3 - y^3$ and show that we can't determine its nature using the method above. What kind of stationary point do we have here?
The first-order partial derivatives of this function are
\[ f_x(x, y) = 3x^2 \quad\text{and}\quad f_y(x, y) = -3y^2. \]
So, clearly, the only stationary point is at (0, 0). The second-order partial derivatives of this function are given by
\[ f_{xx}(x, y) = 6x, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -6y, \]
and, as such, the Hessian is given by
\[ H(x, y) = (6x)(-6y) - 0^2 = -36xy. \]
Indeed, evaluating this at the stationary point gives H(0, 0) = 0 and so the method we used above fails.
However, if we consider the surface z = f(x, y), notice that the y = 0 section of our function gives $z = f(x, 0) = x^3$. As such, if we look at this section around the stationary point (0, 0) where z = f(0, 0) = 0, we can see that
if x > 0, we have f(x, 0) > f(0, 0) and so this stationary point can't be a local maximum, whereas
if x < 0, we have f(x, 0) < f(0, 0) and so this stationary point can't be a local minimum.
Indeed, if we look at the x = 0 section of our function, i.e. $z = f(0, y) = -y^3$, this leads us to a similar conclusion. In fact, looking at the sections, we can see that this is a kind of saddle point, albeit one which looks different to the one that we saw before in Figure 7.1(b), and it is illustrated in Figure 7.2.

Figure 7.2: Some useful pictures for Example 7.5. (a) The y = 0 section, $z = f(x, 0) = x^3$. (b) The x = 0 section, $z = f(0, y) = -y^3$. (c) The surface $z = f(x, y) = x^3 - y^3$ displaying a different kind of saddle point at (0, 0, 0).
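The argument in Example 7.5 can also be checked numerically. The sketch below, which uses an arbitrarily chosen small step `h`, confirms that the Hessian vanishes at the origin and that the function takes values both above and below f(0, 0) along each of the two sections.

```python
def f(x, y):
    """f(x, y) = x^3 - y^3 from Example 7.5."""
    return x**3 - y**3

def hessian(x, y):
    """H(x, y) = f_xx * f_yy - f_xy^2 = (6x)(-6y) - 0^2."""
    return (6 * x) * (-6 * y) - 0**2

h = 0.1  # small step away from the stationary point (arbitrary choice)

print("H(0, 0) =", hessian(0, 0))
# y = 0 section: z = x^3 rises for x > 0 and falls for x < 0
print("f(h, 0) > f(0, 0):", f(h, 0) > f(0, 0))
print("f(-h, 0) < f(0, 0):", f(-h, 0) < f(0, 0))
# x = 0 section: z = -y^3 falls for y > 0 and rises for y < 0
print("f(0, h) < f(0, 0):", f(0, h) < f(0, 0))
print("f(0, -h) > f(0, 0):", f(0, -h) > f(0, 0))
```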

Activity 7.5 Find the stationary point of the function
\[ f(x, y) = (x - 1)^4 + (y - 1)^4, \]
and show that we can't determine its nature using the method above. What kind of stationary point do we have here?

7.2.3 Convex and concave functions

As we saw in Section 4.3.2, it is often useful to know whether a function is convex or concave. In particular, we will see that, if a function, f(x, y), is convex (or concave) for all $(x, y) \in \mathbb{R}^2$, then a local minimum (or local maximum) is actually a global minimum (or global maximum), i.e. we can find the smallest (or largest) value that the function can attain.
To see how this works, consider that, in the case of a function of one variable, f (x), we
saw in Section 4.3.2 that
f (x) is convex on R if it lies above all of its tangent lines, and
f (x) is concave on R if it lies below all of its tangent lines.
So, analogously, we say that a function of two variables, f(x, y), is
convex on $\mathbb{R}^2$ if it lies above all of its tangent planes, and
concave on $\mathbb{R}^2$ if it lies below all of its tangent planes.
As an example of what this means, it should be clear from what we can see of the
surfaces illustrated in Figure 7.1, that in:


(a) where we have a local minimum, the function is convex because it lies above all
of its tangent planes
(b) where we have a saddle point, the function is neither convex nor concave as,
considering the horizontal tangent plane at (0, 0, 0), some of the function lies
above this tangent plane and the rest of it lies below this tangent plane.
(c) where we have a local maximum, the function is concave because it lies below all
of its tangent planes.
We now want to develop a way of determining whether a function is convex or concave on $\mathbb{R}^2$.
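Suppose that we have a function f(x, y) that is convex. As we saw in Section 6.4.1, at any point (a, b), the tangent plane to this function has a Cartesian equation given by
\[ z = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
and, as this function is convex, it must be the case that for all $(x, y) \in \mathbb{R}^2$, the function lies above this tangent plane, i.e. we must have
\[ f(x, y) \geq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]
However, using the second-order Taylor series for f(x, y) around the point (a, b), this means that we have
\[ f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \geq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}, \]
which simplifies to give us
\[ \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \geq 0, \]
and this just asserts that $K(x, y) \geq 0$ using our notation from Section 7.2.2. However, using what we saw before, this means that we require
\[ H(x, y) \geq 0 \quad\text{and}\quad f_{xx}(x, y) \geq 0, \]
and this is, therefore, our condition for convexity.[3]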


Activity 7.6 Using an argument similar to the one above, explain why a concave function requires that $H(x, y) \geq 0$ and $f_{xx} \leq 0$.
The upshot of this is that we can now see that a function, f(x, y), is
convex on $\mathbb{R}^2$ if, for all $(x, y) \in \mathbb{R}^2$, $H(x, y) \geq 0$ and $f_{xx}(x, y) \geq 0$, and
concave on $\mathbb{R}^2$ if, for all $(x, y) \in \mathbb{R}^2$, $H(x, y) \geq 0$ and $f_{xx}(x, y) \leq 0$.

[3] Again, we have glossed over any complications in our derivation that would occur if $f_{xx}(x, y) = 0$ for some point, (x, y).

Note, in particular, that when testing for convexity or concavity, we can have $H(x, y) = 0$ even though we must have $H(x, y) \neq 0$ when we are classifying stationary points using the method of the previous section. But, it should be clear that if a
function, f (x, y), has a stationary point and it is
convex, then that stationary point is a global minimum.
concave, then that stationary point is a global maximum.
That is, we now have a way of determining whether a local minimum (or a local
maximum) is a global minimum (or a global maximum).
Example 7.6 Show that the function $f(x, y) = x^2 + y^2$ has a global minimum at the point (0, 0, 0).
The first-order partial derivatives of this function are
\[ f_x(x, y) = 2x \quad\text{and}\quad f_y(x, y) = 2y. \]
At a stationary point, we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$, i.e. we must have x = 0 and y = 0. Indeed, as z = f(0, 0) = 0, this means that we have a stationary point at (0, 0, 0).
The second-order partial derivatives of this function are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 2, \]
and, as such, the Hessian is given by
\[ H(x, y) = (2)(2) - 0^2 = 4. \]
So, at the stationary point, we have $H(0, 0) = 4 > 0$ and $f_{xx}(0, 0) = 2 > 0$ which means that this is a local minimum. But, in fact, we have $H(x, y) = 4 \geq 0$ and $f_{xx}(x, y) = 2 \geq 0$ for all $(x, y) \in \mathbb{R}^2$ here and so this function is actually convex on $\mathbb{R}^2$, i.e. the local minimum we have found here is actually a global minimum.
In particular, notice that this should have been obvious since we have z = f(0, 0) = 0 at the stationary point and for all other $x, y \in \mathbb{R}$, we have
\[ z = f(x, y) = x^2 + y^2 > 0, \]
i.e. $f(x, y) \geq f(0, 0)$ for all $x, y \in \mathbb{R}$. Consequently, it should be clear that this function has a global minimum at (0, 0) and this minimum value is zero.
Lastly, we note that these conditions can also be used to determine the regions in the
(x, y)-plane where a function is convex, concave or neither as the next example shows.


Example 7.7 Determine the regions in the (x, y)-plane where the function, $f(x, y) = x^2 - y^3$ is convex, concave or neither.
The first-order partial derivatives of this function are
\[ f_x(x, y) = 2x \quad\text{and}\quad f_y(x, y) = -3y^2, \]
and so the second-order partial derivatives of this function are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -6y, \]
which means that the Hessian is given by
\[ H(x, y) = (2)(-6y) - 0^2 = -12y. \]
As such, we see that:
When $y > 0$, $H(x, y) < 0$ and so the function is neither convex nor concave.
When $y \leq 0$, $H(x, y) \geq 0$ and $f_{xx}(x, y) = 2 \geq 0$ and so the function is convex.
The surface z = f(x, y) defined by this function is illustrated in Figure 7.3. In particular, observe that this function has a stationary point at (0, 0, 0) and that, even though our method for classifying this point fails here (as H(0, 0) = 0), it is clearly a saddle point.

Figure 7.3: The surface z = f(x, y) where $f(x, y) = x^2 - y^3$ from Example 7.7. Observe that this function is convex when $y \leq 0$ but that it is neither convex nor concave when $y > 0$.
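To see the conditions from this section at work on Example 7.7, the sketch below reports which of them hold at a given sample point. The function `region` and its labels are illustrative; note that it only tests the conditions at a single point, whereas convexity itself is a statement about a whole region.

```python
def hessian(x, y):
    """H(x, y) = (2)(-6y) - 0^2 = -12y for f(x, y) = x^2 - y^3."""
    return 2 * (-6 * y) - 0**2

F_XX = 2  # f_xx is constant for this function

def region(x, y):
    """Report which of the section's conditions hold at the point (x, y)."""
    h = hessian(x, y)
    if h >= 0 and F_XX >= 0:
        return "convex"    # H >= 0 and f_xx >= 0
    if h >= 0 and F_XX <= 0:
        return "concave"   # H >= 0 and f_xx <= 0
    return "neither"       # H < 0

for point in [(0, -1), (3, 0), (0, 1), (-2, 0.5)]:
    print(point, region(*point))
```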
Let's now look at some applications of this material.

7.2.4 Applications

Optimisation problems are very common in economics and we now introduce two ways
in which they can arise in that subject. The first is their use in cost minimisation and
the second will be another instance of profit maximisation.


Cost minimisation
Suppose a firm is using quantities x and y of two commodities and this incurs a cost
given by the cost function, C(x, y). One might reasonably ask: What quantities should
they be using if they want to minimise their costs?
Example 7.8 A data processing company employs both senior and junior programmers. A particularly large project will cost
\[ C(x, y) = 2000 + 2x^3 - 12xy + y^2, \]
pounds, where x and y represent the number of junior and senior programmers used respectively. How many employees of each kind should be assigned to the project in order to minimise its cost? What is this minimum cost?
To minimise the cost, we need to find the stationary points of C(x, y) and determine which of them gives us a minimum. So, as before, we start by finding the first-order partial derivatives of C(x, y), i.e.
\[ C_x(x, y) = 6x^2 - 12y \quad\text{and}\quad C_y(x, y) = -12x + 2y. \]
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must have $C_x(x, y) = 0$ and $C_y(x, y) = 0$. Thus, to find the stationary points, we have to solve the simultaneous equations
\[ 6x^2 - 12y = 0 \quad\text{and}\quad -12x + 2y = 0. \]
We start by simplifying these equations to get
\[ x^2 - 2y = 0 \quad\text{and}\quad -6x + y = 0, \]
and then notice that the second equation gives us y = 6x. Substituting this into the first equation then allows us to see that
\[ x^2 - 2(6x) = 0 \implies x^2 - 12x = 0 \implies x(x - 12) = 0 \implies x = 0 \text{ or } x = 12, \]
and, since y = 6x, we have
\[ y = 6(0) = 0 \quad\text{or}\quad y = 6(12) = 72, \]
respectively. Thus, the cost function, C(x, y), has two stationary points, namely the points (0, 0) and (12, 72).
To classify these stationary points, we look at the second-order partial derivatives of C(x, y), which are
\[ C_{xx}(x, y) = 12x, \quad C_{xy}(x, y) = -12 = C_{yx}(x, y) \quad\text{and}\quad C_{yy}(x, y) = 2, \]
and, as such, the Hessian is given by
\[ H(x, y) = (12x)(2) - (-12)^2 = 24x - 144 = 24(x - 6). \]
Evaluating this at each of the stationary points we then find that:
At (0, 0), the Hessian is
\[ H(0, 0) = 24(-6) < 0, \]
and so this is a saddle point.
At (12, 72), the Hessian is
\[ H(12, 72) = 24(+6) > 0 \quad\text{and}\quad C_{xx}(12, 72) = 12(12) > 0, \]
so this is a local minimum.
Consequently, to minimise the cost we want to use 12 junior and 72 senior programmers. If we do this we find that the minimum cost is given by
\[ C(12, 72) = 2000 + 3456 - 10368 + 5184 = 272, \]
i.e. the minimum cost is 272 pounds.[4]
[4] Which, thinking about it, is far less than the value of C(x, y) at the other stationary point since C(0, 0) = 2000.

Profit maximisation

We now describe the problem of maximising the profit of a firm which makes two products, X and Y. Generally, if $p_X$ and $p_Y$ are the selling prices of one unit of X and one unit of Y respectively, then the total revenue, TR(x, y), obtained from producing amounts x of product X and y of product Y is
\[ \mathrm{TR}(x, y) = x p_X + y p_Y. \]
Of course, there are a number of ways in which the prices $p_X$ and $p_Y$ may be related to the quantities x and y. For instance:
If the goods were related, $p_X$ and $p_Y$ could both depend on x and y (e.g. if we were considering a music company producing an album on both CD and cassette).
If the goods were unrelated, $p_X$ and $p_Y$ could depend only on x and y respectively (e.g. a pharmaceuticals company producing paracetamol and insulin).
The firm will also have a joint total cost function, TC(x, y), which tells us how much it costs to produce x units of X and y units of Y. Clearly, given TR(x, y) and TC(x, y), we can consider the profit function of the firm, $\pi(x, y)$, which is given by
\[ \pi(x, y) = \mathrm{TR}(x, y) - \mathrm{TC}(x, y) = x p_X + y p_Y - \mathrm{TC}(x, y), \]
and we can maximise this function of x and y using the techniques described above. Let's look at an example.
Example 7.9 Suppose that a firm is the sole supplier of X and Y (in other words, it has a monopoly on these goods) and that the demands for X and Y, in tonnes, are given by
\[ x = 2 - 2p_X + p_Y \quad\text{and}\quad y = 13 + p_X - 2p_Y, \]
respectively when each tonne of X and Y sells at a price, in pounds, of $p_X$ and $p_Y$, respectively. If the joint total cost function of the firm is $\mathrm{TC}(x, y) = 5 + x^2 - xy + y^2$, find the quantities of X and Y the firm should produce in order to maximise its profit. What are the corresponding prices? What is the maximum profit?
We start by rearranging the equations to find expressions for $p_X$ and $p_Y$.[5] The first equation tells us that $p_Y = x - 2 + 2p_X$ and so substituting this into the second equation yields
\[ y = 13 + p_X - 2(x - 2 + 2p_X) \implies y = 13 + p_X - 2x + 4 - 4p_X \implies 3p_X = 17 - 2x - y. \]
As such, we have
\[ p_X = \frac{17 - 2x - y}{3}, \]
and so substituting this into $p_Y = x - 2 + 2p_X$, we find that
\[ p_Y = x - 2 + 2\left(\frac{17 - 2x - y}{3}\right) \implies p_Y = \frac{3x - 6 + 34 - 4x - 2y}{3} \implies p_Y = \frac{28 - x - 2y}{3}. \]
Consequently, the profit function in this case is given by
\begin{align*}
\pi(x, y) &= x p_X + y p_Y - \mathrm{TC}(x, y) \\
&= x\left(\frac{17 - 2x - y}{3}\right) + y\left(\frac{28 - x - 2y}{3}\right) - (5 + x^2 - xy + y^2) \\
&= \frac{1}{3}\left[(17x - 2x^2 - xy) + (28y - xy - 2y^2) - (15 + 3x^2 - 3xy + 3y^2)\right] \\
\implies \pi(x, y) &= \frac{1}{3}\left[-15 + 17x + 28y - 5x^2 - 5y^2 + xy\right],
\end{align*}
and we can now maximise this profit function using the method above.
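The algebra in Example 7.9 is easy to get wrong, so a numerical spot check can be reassuring. The sketch below, with illustrative function names and arbitrarily chosen sample quantities, checks that the expressions for the prices are consistent with the demand equations and that the expanded profit function agrees with revenue minus cost. It does not carry out the maximisation itself.

```python
def prices(x, y):
    """Prices in terms of quantities: p_X = (17 - 2x - y)/3, p_Y = (28 - x - 2y)/3."""
    return (17 - 2 * x - y) / 3, (28 - x - 2 * y) / 3

def total_cost(x, y):
    """TC(x, y) = 5 + x^2 - xy + y^2."""
    return 5 + x**2 - x * y + y**2

def profit_direct(x, y):
    """Profit computed as revenue minus cost: x p_X + y p_Y - TC(x, y)."""
    p_x, p_y = prices(x, y)
    return x * p_x + y * p_y - total_cost(x, y)

def profit_expanded(x, y):
    """Profit from the final expanded formula in the example."""
    return (-15 + 17 * x + 28 * y - 5 * x**2 - 5 * y**2 + x * y) / 3

def consistent(x, y, tol=1e-9):
    """Check the demand equations and the two profit expressions at (x, y)."""
    p_x, p_y = prices(x, y)
    demand_ok = (abs((2 - 2 * p_x + p_y) - x) < tol
                 and abs((13 + p_x - 2 * p_y) - y) < tol)
    profit_ok = abs(profit_direct(x, y) - profit_expanded(x, y)) < tol
    return demand_ok and profit_ok

for sample in [(0.0, 0.0), (1.0, 2.0), (2.5, 0.5)]:  # arbitrary test quantities
    print(sample, consistent(*sample))
```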

[5] Note that if the price of X was fixed and the price of Y was increased, then the demand for X would rise and the demand for Y would fall. This is the behaviour one might expect if X and Y were two related commodities, e.g. if they were two different types of chocolate bar.

Activity 7.7 Finish the problem started in Example 7.9. That is, find the values of x and y that maximise the profit function $\pi(x, y)$ found in the example, the corresponding prices $p_X$ and $p_Y$, and the maximum profit.

7.3 Constrained optimisation

We now turn our attention to the problem of constrained optimisation, i.e. the problem of optimising a function, f(x, y), in the case where the values of x and y we are considering are constrained by the requirement that they must lie in some region, R, of $\mathbb{R}^2$. In particular, we will see that the optimal point we seek will
either be a point inside the region, in which case it will be a stationary point of f(x, y) that happens to be in the region,
or it will be a point on the boundary of the region, in which case it need not be a stationary point of f(x, y) even though it optimises this function over points in the region.
Of course, in the former case, we can find and classify the stationary point in the region
using the method in the previous section and then, checking that this point is more
optimal than any point on the boundary of the region, we will have our answer. Let's
look at a quick example.
Example 7.10 Minimise the function $f(x, y) = (x - 1)^2 + (y - 1)^2$ given that (x, y) must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + y \leq 3$.
The first-order partial derivatives of this function are
\[ f_x(x, y) = 2(x - 1) \quad\text{and}\quad f_y(x, y) = 2(y - 1), \]
and so, setting these equal to zero, we see that (1, 1) is the only stationary point of this function. The second-order partial derivatives of this function are
\[ f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 2, \]
which means that the Hessian is given by
\[ H(x, y) = (2)(2) - 0^2 = 4, \]
and so we see that $H(1, 1) = 4 > 0$ and $f_{xx}(1, 1) = 2 > 0$ which means that this point is a local minimum. Indeed, as this point satisfies the inequalities given above,[6] this point is in the specified region and so f(1, 1) = 0 is a candidate for the minimum value of f(x, y) for (x, y) that lie in the region. However, we must check that nothing odd is happening due to the points on the boundary of the region and to do this we note that:
If we are on the x = 0 boundary of the region (so, technically, $0 \leq y \leq 3$) we have $f(0, y) = 1 + (y - 1)^2 \geq 1 > 0$.
If we are on the y = 0 boundary of the region (so, technically, $0 \leq x \leq 3$) we have $f(x, 0) = (x - 1)^2 + 1 \geq 1 > 0$.
If we are on the x + y = 3 boundary of the region we have $x = 3 - y$ (and, technically, $0 \leq y \leq 3$) which means that
\[ f(3 - y, y) = (2 - y)^2 + (y - 1)^2 = 2y^2 - 6y + 5 = 2\left(y - \frac{3}{2}\right)^2 + \frac{1}{2}, \]
if we complete the square, but this means that $f(3 - y, y) \geq \frac{1}{2} > 0$.
Thus, we can't find values of f(x, y) as small as f(1, 1) = 0 on any of the boundaries of the region and so the minimum value of f(x, y) for points in this region is zero and this occurs at the point (1, 1).
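A brute-force search gives an independent check of Example 7.10. The sketch below evaluates f on a grid covering the square [0, 3] x [0, 3], discards points outside the region, and keeps the smallest value found. The grid size and the function names are arbitrary choices for illustration, and a grid search is only a sanity check, not a replacement for the argument above.

```python
def f(x, y):
    """Objective from Example 7.10: f(x, y) = (x - 1)^2 + (y - 1)^2."""
    return (x - 1)**2 + (y - 1)**2

def in_region(x, y):
    """Region defined by x >= 0, y >= 0 and x + y <= 3."""
    return x >= 0 and y >= 0 and x + y <= 3

def grid_minimum(steps=60):
    """Search a (steps + 1) x (steps + 1) grid on [0, 3] x [0, 3] for the smallest f."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1):
            x, y = 3 * i / steps, 3 * j / steps
            if in_region(x, y):
                value = f(x, y)
                if best is None or value < best[0]:
                    best = (value, x, y)
    return best  # (minimum value, x, y)

value, x, y = grid_minimum()
print("smallest value found:", value, "at", (x, y))
```

With 60 steps the grid spacing is 0.05, so the point (1, 1) lies exactly on the grid.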

Activity 7.8 Explain why the answer we found in the previous example is obvious!

However, in what follows we will be more interested in solving constrained optimisation problems where the optimal point occurs on the boundary of the region since the methods we have developed so far will not help us in that case.

7.3.1 Finding optimal points on the boundary of a region

Generally speaking, when the optimal point occurs on the boundary of a region, we will
be able to find it by considering the contours of the function we are optimising in
relation to the region we are optimising the function over. Indeed, when doing this, we
will find that we are in one of the two cases below.
The optimal point is at a corner of the boundary
The following example should clarify what we should do in this case.
Example 7.11 Maximise the function $f(x, y) = x^2 + y^2$ given that (x, y) must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + 2y \leq 4$.
We start by sketching the region which is the shaded triangle in Figure 7.4(a) and some typical contours of the surface z = f(x, y). Indeed, notice that here, the contour z = c has equation
\[ x^2 + y^2 = c, \]
and so it will be a circle of radius $\sqrt{c}$ centred on the origin. In the figure, we have sketched the z = 4 and z = 16 contours and, in particular, we notice that as the contours move away from the origin, the value of z increases as indicated by the arrow.
Now, to find the maximum value of f(x, y) in this region we need a point which both
lies in the region, and
gives us the largest value of z.
That is, in this case, we want the point (4, 0) which is a corner of the boundary. In particular, notice that with this point on the z = 16 contour:
we get a higher value of z than we do from any point on a contour with z < 16 (like, say, the z = 2 contour), and
we can't have any point on a contour with z > 16 as none of these contours will give us a point in the region.
That is, the point (4, 0) which gives us z = 16 must indeed maximise the function f(x, y) given that (x, y) must lie in the specified region.
[6] That is, the point (1, 1) clearly satisfies the inequalities $x \geq 0$ and $y \geq 0$ as well as the inequality $x + y \leq 3$ since 1 + 1 = 2 < 3.


Figure 7.4: (a) The region for Example 7.11 is the shaded triangle and the z = 4 and z = 16 contours are indicated. (b) The region for Example 7.12 is the same shaded triangle and the z = 1 and z = 2 contours are indicated. Note, in both cases, the direction in which z increases.
The optimal point is on the boundary but it isn't a corner
This is the case that is going to concern us the most and so, for the moment, we just
look at an example to see what is happening before we come to the recommended
method for solving such problems.

Example 7.12 Maximise the function f(x, y) = xy given that (x, y) must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + 2y \leq 4$.
We start by sketching the region which is the shaded triangle in Figure 7.4(b) and some typical contours of the surface z = f(x, y). Indeed, notice that here, the contour z = c has equation
\[ xy = c, \]
and so it will be a rectangular hyperbola with the x and y-axes as its asymptotes. In the figure, we have sketched the z = 1 and z = 2 contours and, in particular, we notice that as the contours move away from the origin, the value of z increases as indicated by the arrow.
Now, to find the maximum value of f(x, y) in this region we need a point which both
lies in the region, and
gives us the largest value of z.
That is, in this case, we want the point $(x^*, y^*)$ which is not a corner of the boundary. In particular, notice that with this point on the z = 2 contour:
we get a higher value of z than we do from any point on a contour with z < 2 (like, say, the z = 1 contour), and
we can't have any point on a contour with z > 2 as none of these contours will give us a point in the region.
That is, the point $(x^*, y^*)$ which gives us z = 2 must indeed maximise the function f(x, y) given that (x, y) must lie in the specified region. But, how do we find this point?
One way to find this point is to see that it is a point where, for some constant c, we have a contour f(x, y) = c which is both
tangential to the line x + 2y = 4, and
touching the line x + 2y = 4.
Indeed, as the gradient of f(x, y) = c is given by
\[ \frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = -\frac{y}{x}, \]
as we saw in Section 6.3.3 and the gradient of the line x + 2y = 4 is given by
\[ y = 2 - \frac{x}{2} \implies \frac{dy}{dx} = -\frac{1}{2}, \]
the first condition means that we must have a point which satisfies the equation
\[ -\frac{y}{x} = -\frac{1}{2} \implies y = \frac{x}{2}, \]
whereas the second condition means that we must have a point which satisfies the equation x + 2y = 4. Solving these equations simultaneously, we find that this gives us the point $(x^*, y^*) = (2, 1)$.[7]
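The point found in Example 7.12 can be checked by scanning along the edge x + 2y = 4 of the region and recording where f(x, y) = xy is largest. This is a minimal sketch: the function names and the number of sample points are illustrative assumptions, and the scan covers only this one edge, which is where the contour sketch says the maximum lies.

```python
def f(x, y):
    """Objective from Example 7.12: f(x, y) = xy."""
    return x * y

def best_on_line(steps=400):
    """Scan the edge x + 2y = 4 (with 0 <= y <= 2) and return the largest f found."""
    best = None
    for k in range(steps + 1):
        y = 2 * k / steps
        x = 4 - 2 * y  # stay on the line x + 2y = 4
        value = f(x, y)
        if best is None or value > best[0]:
            best = (value, x, y)
    return best  # (maximum value, x, y)

value, x, y = best_on_line()
print("largest value found:", value, "at", (x, y))

# Tangency check at the point found: the contour gradient -y/x should equal
# the gradient of the line, which is -1/2.
print("contour gradient:", -y / x)
```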
[7] And, at this point, z = f(2, 1) = 2 as expected from above. But, in general, we would not know the optimal value of z = f(x, y) beforehand. We have just used it here to help illustrate what is going on.

Now, in such cases, we could always proceed in this way but, as we shall see in a moment, there is a way of turning this idea into a much more general method. And, it is this new method that we will generally use in such cases.

7.3.2 The method of Lagrange multipliers

Suppose that we have been asked to optimise the function, f(x, y), given that (x, y) must lie in some region and, by looking at the contours as above, we have determined that the optimal point occurs on the boundary given by some equation g(x, y) = 0. In particular, we are concerned with the case where the optimal point is not a corner of the boundary, i.e. we want a point where, for some constant c, the contour f(x, y) = c is both
tangential to the boundary given by g(x, y) = 0, and
touching the boundary given by g(x, y) = 0.
Now, for tangency, we require that the gradient of the contour f(x, y) = c, i.e.
\[ \frac{dy}{dx} = -\frac{f_x(x, y)}{f_y(x, y)}, \]
is equal to the gradient of the boundary given by g(x, y) = 0, i.e.
\[ \frac{dy}{dx} = -\frac{g_x(x, y)}{g_y(x, y)}, \]
where we have used what we saw in Section 6.3.3 twice. But, if these are equal, we have

gx (x, y)
fx (x, y)
=
fy (x, y)
gy (x, y)

fx (x, y)
fy (x, y)
=
,
gx (x, y)
gy (x, y)

and we denote this common value by , i.e. we have


=

fy (x, y)
fx (x, y)
=
.
gx (x, y)
gy (x, y)

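Rearranging this we then get two equations, namely
$$f_x(x, y) - \lambda g_x(x, y) = 0 \quad\text{and}\quad f_y(x, y) - \lambda g_y(x, y) = 0,$$
or, more simply,
$$\frac{\partial}{\partial x}\bigl[f(x, y) - \lambda g(x, y)\bigr] = 0 \quad\text{and}\quad \frac{\partial}{\partial y}\bigl[f(x, y) - \lambda g(x, y)\bigr] = 0.$$
So, any point which satisfies these two equations is a point where the contour
f(x, y) = c is tangential to the boundary g(x, y) = 0. We also note that the equation
$$\frac{\partial}{\partial \lambda}\bigl[f(x, y) - \lambda g(x, y)\bigr] = 0 \implies g(x, y) = 0,$$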

and so, any point which satisfies this equation lies on the boundary. Consequently, we
define the Lagrangean to be the function
$$L(x, y, \lambda) = f(x, y) - \lambda g(x, y),$$
and we call $\lambda$ the Lagrange multiplier. In particular, the point we seek will be amongst
the stationary points of the Lagrangean since it must satisfy the equations
$$\frac{\partial L}{\partial x} = 0, \quad \frac{\partial L}{\partial y} = 0 \quad\text{and}\quad \frac{\partial L}{\partial \lambda} = 0,$$
which we have derived above. In such cases, we call the function we are optimising,
f(x, y), the objective function and we call the equation of the boundary, which must be
written in the form g(x, y) = 0, the constraint. Let's see how we can use this method to
solve the constrained optimisation problem we saw in Example 7.12.
Example 7.13 Solve the constrained optimisation problem in Example 7.12 using
the method of Lagrange multipliers.
We have already seen that the optimal point we seek occurs when the function
f(x, y) = xy is tangential to the boundary given by the line x + 2y = 4. Writing the
equation of the line in the form $g(x, y) = x + 2y - 4 = 0$ we see that the Lagrangean
is
$$L(x, y, \lambda) = xy - \lambda(x + 2y - 4),$$
where $\lambda$ is the Lagrange multiplier. We now find the stationary points of the
Lagrangean by finding its first-order partial derivatives, i.e.
$$L_x(x, y, \lambda) = y - \lambda, \quad L_y(x, y, \lambda) = x - 2\lambda \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(x + 2y - 4),$$
and setting them equal to zero to get the equations
$$y - \lambda = 0, \quad x - 2\lambda = 0 \quad\text{and}\quad x + 2y - 4 = 0.$$
We now eliminate $\lambda$ from the first two equations to get
$$\lambda = y = \frac{x}{2} \implies y = \frac{x}{2},$$
and this, as you should expect, is our tangency condition from Example 7.12. On the
other hand, the third equation is just
$$x + 2y = 4,$$
which, as you should expect, is our constraint. Solving these two equations
simultaneously, we then get the point (2, 1) as the only solution and so this must be
the optimal point we seek in agreement with what we found in Example 7.12.
Obviously, at this point, we find that $f(2, 1) = 2$ is the maximum value of f subject
to the constraint.
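As a quick check of Example 7.13, the three stationarity conditions form a linear system in x, y and the multiplier, so they can be solved exactly by elimination. The sketch below is only an illustration: the helper `solve_linear` and the variable names are assumptions made here, not part of the guide.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve the square system a.v = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move a row with a non-zero entry in this column into the pivot position.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot entry is 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


# Unknowns ordered as (x, y, lam):
#   L_x = y - lam = 0,  L_y = x - 2*lam = 0,  constraint x + 2y = 4.
coefficients = [
    [0, 1, -1],
    [1, 0, -2],
    [1, 2, 0],
]
right_hand_side = [0, 0, 4]

x, y, lam = solve_linear(coefficients, right_hand_side)
print(x, y, lam)  # stationary point of the Lagrangean and the multiplier
print(x * y)      # value of f(x, y) = xy at that point
```

The solution agrees with the point (2, 1) found above, and the multiplier equals y there, as the first equation requires.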
Sometimes we will see questions where we are just asked to use this method to solve a
constrained optimisation problem. In such cases, we will be given the objective function,
f (x, y), and the constraint, g(x, y) = 0, which we should be using. In particular, unless
we are explicitly asked to look at contours, we will just apply the method and assume
that the answer we find is the appropriate kind of optimal point.[8] Let's look at an
example of such a problem.
Example 7.14 Given the function
$$f(x, y) = 160x - 3x^2 - 2xy - 2y^2 + 120y - 18,$$
find the maximum value of f(x, y) subject to the constraint x + y = 34.


We write the constraint x + y = 34 as $x + y - 34 = 0$ so that it is in the form
g(x, y) = 0 with $g(x, y) = x + y - 34$. This allows us to write the Lagrangean as
$$L(x, y, \lambda) = 160x - 3x^2 - 2xy - 2y^2 + 120y - 18 - \lambda(x + y - 34),$$
where $\lambda$ is the Lagrange multiplier. To find the stationary points of the Lagrangean
we find its first-order partial derivatives, i.e.
$$L_x(x, y, \lambda) = 160 - 6x - 2y - \lambda, \quad L_y(x, y, \lambda) = -2x - 4y + 120 - \lambda \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(x + y - 34),$$
and set them equal to zero to get the equations
$$160 - 6x - 2y - \lambda = 0, \quad -2x - 4y + 120 - \lambda = 0 \quad\text{and}\quad x + y - 34 = 0.$$

[8] Although, sometimes, the Lagrangean may have several stationary points and, if that happens, it
should be fairly straightforward to see which of these is the one we want.

The first two equations give us
$$\lambda = 160 - 6x - 2y \quad\text{and}\quad \lambda = -2x - 4y + 120,$$
and so we can eliminate $\lambda$ to get
$$160 - 6x - 2y = -2x - 4y + 120 \implies 2y = 4x - 40 \implies y = 2x - 20,$$
whereas the third equation gives us x + y = 34 which is, of course, just our
constraint. So, as this gives $y = 34 - x$, we can use it and the $y = 2x - 20$ that we
have just found to eliminate y and get
$$34 - x = 2x - 20 \implies 3x = 54 \implies x = 18.$$
And, if x = 18, then the constraint $y = 34 - x$ gives us $y = 34 - 18 = 16$. Thus, the
point (18, 16) is the only stationary point of the Lagrangean and so it must be the
optimal point we seek. Thus, the maximum of f(x, y) subject to the constraint
g(x, y) = 0 is f(18, 16) = 2,722.
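The working in Example 7.14 can be checked numerically. The sketch below, with function names chosen here purely for illustration, confirms that all three partial derivatives of the Lagrangean vanish at (18, 16) and compares f there with its value at other points on the line x + y = 34.

```python
from fractions import Fraction


def f(x, y):
    """Objective function from Example 7.14."""
    return 160 * x - 3 * x**2 - 2 * x * y - 2 * y**2 + 120 * y - 18


def lagrangean_partials(x, y, lam):
    """(L_x, L_y, L_lambda) for the constraint x + y - 34 = 0."""
    return (160 - 6 * x - 2 * y - lam,
            -2 * x - 4 * y + 120 - lam,
            -(x + y - 34))


x_star, y_star = Fraction(18), Fraction(16)
lam = 160 - 6 * x_star - 2 * y_star  # from L_x = 0

print(lagrangean_partials(x_star, y_star, lam))
print(f(x_star, y_star), lam)

# Compare with other points on the constraint line y = 34 - x,
# sampled at steps of 1/10 either side of x = 18.
step = Fraction(1, 10)
neighbours = [f(x_star + k * step, 34 - (x_star + k * step))
              for k in range(-50, 51) if k != 0]
print(max(neighbours) < f(x_star, y_star))
```

The sampled comparison is only a sanity check on a finite set of points; it does not replace the contour argument used to justify the method.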
Note that, although we have only used this method to find maxima in the examples
above, it will find minima as well and we will see an example of this when we consider
cost minimisation problems in Section 7.3.4.

7.3.3 The meaning of the Lagrange multiplier

In addition to allowing us to solve certain constrained optimisation problems, the
method of Lagrange multipliers has another use which will be important when we come
to consider its applications in Section 7.3.4. To see this, consider that, when we are
asked to optimise f(x, y) subject to the constraint g(x, y) = c where c is a constant we
would proceed as follows.
Writing the constraint in the form $g(x, y) - c = 0$, we have the Lagrangean
$$L(x, y, \lambda) = f(x, y) - \lambda(g(x, y) - c),$$
where $\lambda$ is the Lagrange multiplier. Its first-order partial derivatives are given by
$$L_x(x, y, \lambda) = f_x(x, y) - \lambda g_x(x, y), \quad L_y(x, y, \lambda) = f_y(x, y) - \lambda g_y(x, y) \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(g(x, y) - c)$$
and we find that the stationary points occur when we set these equal to zero to get the
equations
$$f_x(x, y) - \lambda g_x(x, y) = 0, \quad f_y(x, y) - \lambda g_y(x, y) = 0 \quad\text{and}\quad g(x, y) - c = 0.$$
Now, the first two equations tell us that
$$f_x(x, y) = \lambda g_x(x, y) \quad\text{and}\quad f_y(x, y) = \lambda g_y(x, y),$$
and, clearly, neither of these depends on c. However, when we solve these equations in
the standard way and use the constraint, g(x, y) = c, we find the point $(x^*, y^*)$ which
optimises f(x, y) subject to the constraint. Of course, since we have used the constraint
to find the point $(x^*, y^*)$, the values of $x^*$ and $y^*$ we found will depend on c, i.e. we have
the functions $x^* = x(c)$ and $y^* = y(c)$ of c. In particular, this means that the optimal
value of f(x, y) subject to the constraint that we have found also depends on c, let's call
this F(c), i.e. we have
$$F(c) = f(x^*, y^*) = f(x(c), y(c)).$$
Now, if we differentiate this with respect to c using the chain rule (see Section 6.3.3), we
have
$$\frac{dF}{dc} = \frac{\partial f}{\partial x}\frac{dx}{dc} + \frac{\partial f}{\partial y}\frac{dy}{dc},$$
so that, using our expressions for $f_x(x, y)$ and $f_y(x, y)$ above, we get
$$\frac{dF}{dc} = \lambda\frac{\partial g}{\partial x}\frac{dx}{dc} + \lambda\frac{\partial g}{\partial y}\frac{dy}{dc} = \lambda\left(\frac{\partial g}{\partial x}\frac{dx}{dc} + \frac{\partial g}{\partial y}\frac{dy}{dc}\right).$$
However, given the constraint g(x, y) = c, we see that differentiating both sides with
respect to c we get
$$\frac{\partial g}{\partial x}\frac{dx}{dc} + \frac{\partial g}{\partial y}\frac{dy}{dc} = 1,$$
where we have used the chain rule again on the left-hand side. Putting these last two
equations together, we find that
$$\frac{dF}{dc} = \lambda,$$
i.e. the Lagrange multiplier is the rate of change of the optimal value of f(x, y) subject
to the constraint g(x, y) = c with respect to c. In particular, if we allowed our constraint
to change from g(x, y) = c to $g(x, y) = c + \Delta c$ we would find that the change in the
optimal value of f(x, y) subject to this constraint, i.e. $\Delta F(c)$, is given by
$$\frac{\Delta F}{\Delta c} \simeq \lambda \implies \Delta F \simeq \lambda\,\Delta c,$$
provided that $\Delta c$ is suitably small. Let's see how this works in the context of
Example 7.14.
Example 7.15 Using what we found in Example 7.14, find $\lambda$ and hence find the
approximate change in the maximum value of f(x, y) subject to the constraint
x + y = 34 if the constraint is changed to x + y = 35.
We have found that the maximum value of f(x, y) subject to the constraint
x + y = 34 is f(18, 16) = 2,722. As this occurs at the point (18, 16) we can use either
of the first two equations we found in Example 7.14 to find $\lambda$ so, using the first, we
have
$$160 - 6x - 2y - \lambda = 0 \implies \lambda = 160 - 6(18) - 2(16) = 20.$$
Consequently, using the theory above, we have a change in the constraint from
x + y = 34 to x + y = 35 which gives $\Delta c = 1$ and so the change in the maximum
value of f(x, y) subject to this constraint is approximately 20.
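To see how good this approximation is, we can compute the constrained maximum exactly for nearby values of c. The sketch below is an illustration only and makes one assumption not spelled out in the text: on the line x + y = c the function reduces to a downward-opening quadratic in x, so its maximum sits at the vertex. The name `optimal_value` is chosen here for convenience.

```python
from fractions import Fraction


def f(x, y):
    """Objective function from Example 7.14."""
    return 160 * x - 3 * x**2 - 2 * x * y - 2 * y**2 + 120 * y - 18


def optimal_value(c):
    """F(c): the maximum of f on the line x + y = c.

    Substituting y = c - x gives
        f(x, c - x) = -3x^2 + (40 + 2c)x - 2c^2 + 120c - 18,
    whose vertex is at x = (20 + c)/3.
    """
    x = Fraction(20 + c, 3)
    return f(x, c - x)


lam = 20  # multiplier found in Example 7.15

# Exact change when the constraint moves from c = 34 to c = 35.
exact_change = optimal_value(35) - optimal_value(34)
print(exact_change, float(exact_change), lam)

# A much smaller change in c brings the ratio closer to the multiplier.
dc = Fraction(1, 100)
ratio = (optimal_value(34 + dc) - optimal_value(34)) / dc
print(ratio, float(ratio))
```

The exact change for $\Delta c = 1$ is 55/3, a little below the estimate of 20, whereas the ratio for $\Delta c = 1/100$ is much closer to 20. This is consistent with the proviso that $\Delta c$ should be suitably small.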
We now turn to some applications of constrained optimisation in economics.


7.3.4 Applications

Constrained optimisation problems are very common in economics and we now


introduce two ways in which they can arise in that subject. The first is their use when a
consumer wants to maximise their utility subject to a constraint imposed by their
budget and the second is when a firm wants to minimise its costs subject to a constraint
on its level of production.
Utility maximisation subject to a budget constraint
Suppose that a consumer is interested in buying some combination of two goods. Let's
say the price of the first good is p1 per unit, the price of the second good is p2 per unit
and the consumer has an amount M to spend on them. Indeed, if he wants to purchase
the bundle, (x1, x2), which contains quantities x1 and x2 of the first and second good
respectively, it will cost him
$$p_1 x_1 + p_2 x_2,$$
and he can afford this bundle if he satisfies the budget constraint given by
$$p_1 x_1 + p_2 x_2 \le M,$$
where $x_1, x_2 \ge 0$ as they represent quantities. This gives us a budget set, i.e. the set of
all bundles that the consumer can afford given the prices of the goods and his budget.
Indeed, geometrically, the bundles he can afford are contained in the triangular region
illustrated in Figure 7.5(a).

Figure 7.5: (a) The budget set for our consumer. (b) Adding three contours, u(x1, x2) = c, where the direction in which u(x1, x2) is increasing is as indicated. Clearly, we are interested in the point which is indicated in the figure.
Now, if his utility function is u(x1, x2), the consumer wants to maximise this subject to
the constraint that he must be able to afford the bundle. That is, he must maximise
u(x1, x2) subject to the constraint that the bundle he chooses is in the budget set. Let's
assume that, in this case, the utility function has contours u(x1, x2) = c, where c is a
constant,[9] that look like the ones illustrated in Figure 7.5(b) and that the direction of
increasing utility is as indicated. Indeed, we observe in this case that the maximum
value of u(x1, x2) subject to the constraint imposed by the budget set occurs at the
point indicated, i.e. a point where we have a contour of u(x1, x2) which is both
tangential to the line $p_1 x_1 + p_2 x_2 = M$, and
touching the line $p_1 x_1 + p_2 x_2 = M$.

[9] These contours are called indifference curves as each point on such a contour gives our consumer the
same utility, i.e. he will be indifferent between the bundles represented by points on the same contour.

As such, we could use the method of Lagrange multipliers to solve this problem, i.e. we
would write the constraint as $p_1 x_1 + p_2 x_2 - M = 0$ and use the Lagrangean
$$L(x_1, x_2, \lambda) = u(x_1, x_2) - \lambda(p_1 x_1 + p_2 x_2 - M),$$
to find the point $(x_1^*, x_2^*)$ which maximises the consumer's utility subject to the
constraint. Indeed, having done this, we can define the function
$$U(M) = u(x_1^*, x_2^*),$$
which tells us the maximum utility of the consumer given his budget, M. In particular,
using the theory in Section 7.3.3, we see that the value of the Lagrange multiplier we
get from solving the equations will satisfy
$$\frac{dU}{dM} = \lambda,$$
i.e. it gives us the consumer's marginal utility of [budgetary] money if he is purchasing
in a way that maximises his utility subject to his budget set. Let's look at an example.
Example 7.16 Suppose cats cost 2 each and dogs cost 1 each. If a consumer
has a utility function given by
$$u(x_1, x_2) = x_1^2 x_2^2,$$
when he buys x1 cats and x2 dogs, how many cats and dogs should he buy if he
wants to maximise his utility given that he has M to spend? Find $U(M)$, the
maximum utility he can attain if he has a budget of M, and verify that $U'(M) = \lambda$
where $\lambda$ is the Lagrange multiplier.
In this case, the budget set will be the region defined by the inequalities
$$2x_1 + x_2 \le M,$$
and $x_1, x_2 \ge 0$ which looks like the one in Figure 7.5(a) whereas the contours
u(x1, x2) = c where $u(x_1, x_2) = x_1^2 x_2^2$ look like the ones sketched in Figure 7.5(b). As
such, we are in the situation described above and so we need to maximise u(x1, x2)
subject to the constraint that
$$2x_1 + x_2 = M \implies 2x_1 + x_2 - M = 0,$$
if we want the constraint in the right form. Thus, we have the Lagrangean
$$L(x_1, x_2, \lambda) = x_1^2 x_2^2 - \lambda(2x_1 + x_2 - M),$$
and we seek the points which simultaneously satisfy the equations $L_{x_1}(x_1, x_2, \lambda) = 0$,
$L_{x_2}(x_1, x_2, \lambda) = 0$ and $L_\lambda(x_1, x_2, \lambda) = 0$. The first-order partial derivatives of
$L(x_1, x_2, \lambda)$ are
$$L_{x_1}(x_1, x_2, \lambda) = 2x_1 x_2^2 - 2\lambda, \quad L_{x_2}(x_1, x_2, \lambda) = 2x_1^2 x_2 - \lambda \quad\text{and}\quad L_\lambda(x_1, x_2, \lambda) = -(2x_1 + x_2 - M),$$
and we set these equal to zero to yield the equations
$$2x_1 x_2^2 - 2\lambda = 0, \quad 2x_1^2 x_2 - \lambda = 0 \quad\text{and}\quad 2x_1 + x_2 - M = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$\lambda = x_1 x_2^2 = 2x_1^2 x_2 \implies x_1 x_2 (x_2 - 2x_1) = 0 \implies x_2 = 2x_1,$$
where we reject the solutions where x1 = 0 and x2 = 0 as these give a utility of zero
which, clearly, won't give us the maximum we seek. We then use this new
relationship between x1 and x2 in the third equation, which is just the constraint
2x1 + x2 = M, to get
$$2x_1 + 2x_1 = M \implies 4x_1 = M \implies x_1 = \frac{M}{4},$$
and then, using this in the equation x2 = 2x1, we get x2 = M/2. Thus, these values
of x1 and x2 maximise our consumer's utility if he has a budget of M and his
maximum utility is then given by
$$U(M) = u\left(\frac{M}{4}, \frac{M}{2}\right) = \left(\frac{M}{4}\right)^2 \left(\frac{M}{2}\right)^2 = \frac{M^4}{64},$$
which means that
$$U'(M) = \frac{4M^3}{64} = \frac{M^3}{16}.$$
Of course, we can also find the value of $\lambda$ using, say, the equation
$$\lambda = x_1 x_2^2 = \left(\frac{M}{4}\right)\left(\frac{M}{2}\right)^2 = \frac{M^3}{16},$$
which verifies that $U'(M) = \lambda$.
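The result of Example 7.16 can be spot-checked for a particular budget. The sketch below takes M = 8 as an arbitrary illustrative value (it is not a value used in the guide), compares the utility at (M/4, M/2) with other bundles on the budget line, and approximates U'(M) with a difference quotient. All names are chosen here for illustration.

```python
from fractions import Fraction


def utility(x1, x2):
    """Utility function from Example 7.16."""
    return x1**2 * x2**2


def optimal_bundle(m):
    """Bundle found in Example 7.16: M/4 cats and M/2 dogs."""
    return Fraction(m, 4), Fraction(m, 2)


def max_utility(m):
    """U(M), the utility at the optimal bundle."""
    x1, x2 = optimal_bundle(m)
    return utility(x1, x2)


m = 8  # illustrative budget, chosen arbitrarily
x1, x2 = optimal_bundle(m)
lam = x1 * x2**2  # from L_x1 = 0, i.e. lambda = x1 * x2^2
print(x1, x2, max_utility(m), lam)

# Other bundles on the budget line 2*x1 + x2 = M, with x1 from 0 to M/2.
grid = [Fraction(k, 10) for k in range(0, 5 * m + 1)]
others = [utility(g, m - 2 * g) for g in grid if g != x1]
print(max(others) < max_utility(m))

# Symmetric difference quotient approximating U'(M).
h = Fraction(1, 1000)
slope = (max_utility(m + h) - max_utility(m - h)) / (2 * h)
print(float(slope), float(lam))
```

For this budget the bundle is 2 cats and 4 dogs, and the approximate slope of U agrees closely with the multiplier, as the theory in Section 7.3.3 suggests.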

Activity 7.9 Another consumer has a budget of 4 to buy cats and dogs at the
prices in Example 7.16 and her utility function is u(x1 , x2 ) = 3x1 + x2 when she buys
x1 cats and x2 dogs. Sketch the budget set and some contours u(x1 , x2 ) = c where c
is a constant for this consumer. How many cats and dogs should she buy if she wants
to maximise her utility given her budget?


Cost minimisation subject to a production constraint


Suppose that capital costs v per unit and labour costs w per unit. This means that a
firm which uses an amount k of capital and l of labour will incur costs given by the cost
function
$$C(k, l) = vk + wl.$$
Also suppose that these inputs allow the firm to produce an amount given by the
production function, q(k, l). We want to ask: How much capital and labour should the
firm use if it needs to produce an amount Q of its product? That is, we want to solve
the constrained optimisation problem
minimise C(k, l) subject to the constraint q(k, l) = Q,
where $k, l \ge 0$ as they are quantities. Let's assume that, in this case, the constraint
q(k, l) = Q looks like the curve in Figure 7.6(a) for $k, l \ge 0$. If we also sketch some
contours of the cost function,[10] we can identify the direction in which costs are
decreasing as indicated in Figure 7.6(b). Indeed, we observe in this case that the
minimum value of C(k, l) subject to the constraint q(k, l) = Q occurs at the point
indicated, i.e. a point where we have a contour of C(k, l) which is both
tangential to the constraint q(k, l) = Q, and
touching the constraint q(k, l) = Q.

Figure 7.6: (a) The constraint q(k, l) = Q. (b) Adding three contours, C(k, l) = c, where the direction in which C(k, l) is decreasing is as indicated. Clearly, we are interested in the point which is indicated in the figure.

[10] These contours are called isocosts as each point on such a contour costs the firm the same amount
of money.

As such, we could use the method of Lagrange multipliers to solve this problem, i.e. we
would write the constraint as $q(k, l) - Q = 0$ and use the Lagrangean
$$L(k, l, \lambda) = C(k, l) - \lambda(q(k, l) - Q),$$
to find the point $(k^*, l^*)$ which minimises the costs subject to the constraint. Indeed,
having done this, we can define the function
$$C(Q) = C(k^*, l^*),$$
which tells us the minimum cost of producing an amount, Q. In particular, using the
theory in Section 7.3.3, we see that the value of the Lagrange multiplier we get from
solving the equations will satisfy
$$\frac{dC}{dQ} = \lambda,$$
i.e. it gives us the marginal cost of the firm if it is producing in a way that minimises its
costs subject to the constraint that it is producing an amount, Q. Let's look at an
example.
Example 7.17 Suppose capital, k, costs 16 per unit and labour, l, costs 1 per
unit. If a firm can produce an amount given by the production function
$$q(k, l) = 10k^{1/4} l^{1/4},$$
what values of k and l will minimise the cost of producing Q units? Find $C(Q)$, the
minimum cost of producing Q, and verify that $C'(Q) = \lambda$ where $\lambda$ is the Lagrange
multiplier.
In this case, the constraint q(k, l) = Q will look like the curve in Figure 7.6(a) for
$k, l \ge 0$ and so we are in the situation described above. Indeed, here the cost
function is
$$C(k, l) = 16k + l,$$
and, writing the constraint in the form $q(k, l) - Q = 0$, we get the Lagrangean
$$L(k, l, \lambda) = 16k + l - \lambda(q(k, l) - Q).$$
We seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$,
$L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$ so we find the first-order partial derivatives of
$L(k, l, \lambda)$, i.e.
$$L_k(k, l, \lambda) = 16 - \frac{10}{4}\lambda k^{-3/4} l^{1/4}, \quad L_l(k, l, \lambda) = 1 - \frac{10}{4}\lambda k^{1/4} l^{-3/4} \quad\text{and}\quad L_\lambda(k, l, \lambda) = -\left(10k^{1/4} l^{1/4} - Q\right),$$
and set these equal to zero to yield the equations
$$16 - \frac{5}{2}\lambda k^{-3/4} l^{1/4} = 0, \quad 1 - \frac{5}{2}\lambda k^{1/4} l^{-3/4} = 0 \quad\text{and}\quad 10k^{1/4} l^{1/4} - Q = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$16 - \frac{5}{2}\lambda\frac{l^{1/4}}{k^{3/4}} = 0 \implies \lambda = 16\left(\frac{2}{5}\right)\frac{k^{3/4}}{l^{1/4}}$$
from the first equation, and
$$1 - \frac{5}{2}\lambda\frac{k^{1/4}}{l^{3/4}} = 0 \implies \lambda = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}}$$
from the second equation. As such, we can equate these expressions for $\lambda$ to get
$$16\left(\frac{2}{5}\right)\frac{k^{3/4}}{l^{1/4}} = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}} \implies 16k = l.$$
We then use this new relationship between k and l in the third equation, which is
just the constraint $10k^{1/4} l^{1/4} = Q$, to get
$$Q = 10k^{1/4}(16k)^{1/4} \implies Q = 20k^{1/2} \implies k^{1/2} = \frac{Q}{20} \implies k = \frac{Q^2}{400},$$
and then, using this in the equation l = 16k, we get
$$l = 16\left(\frac{Q^2}{400}\right) = \frac{Q^2}{25}.$$
Thus, these values of k and l minimise the cost of producing Q units. The minimum
cost is then given by
$$C(Q) = C\left(\frac{Q^2}{400}, \frac{Q^2}{25}\right) = 16\left(\frac{Q^2}{400}\right) + \frac{Q^2}{25} = \frac{2Q^2}{25},$$
and so, we have
$$C'(Q) = \frac{4Q}{25}.$$
Of course, we can also find the value of $\lambda$ using, say, the equation
$$\lambda = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}} = \frac{2}{5}\,\frac{(Q^2/25)^{3/4}}{(Q^2/400)^{1/4}} = \frac{4Q}{25},$$
which verifies that $C'(Q) = \lambda$.
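As with the utility example, the formulas in Example 7.17 can be spot-checked for one output level. The sketch below uses Q = 20 as an arbitrary illustrative value and floating-point arithmetic, so comparisons are made up to rounding error. The function names, and the rearrangement of the constraint used to scan other input combinations, are assumptions made here for illustration.

```python
import math


def production(k, l):
    """Production function from Example 7.17."""
    return 10 * k**0.25 * l**0.25


def cost(k, l):
    """Cost function C(k, l) = 16k + l."""
    return 16 * k + l


def cost_minimising_inputs(q):
    """Inputs found in Example 7.17: k = Q^2/400 and l = Q^2/25."""
    return q**2 / 400, q**2 / 25


q = 20.0  # illustrative output level, chosen arbitrarily
k, l = cost_minimising_inputs(q)
lam = (2 / 5) * l**0.75 / k**0.25  # from L_l = 0

print(k, l, production(k, l))
print(cost(k, l), 2 * q**2 / 25)  # minimum cost against 2Q^2/25
print(lam, 4 * q / 25)            # multiplier against C'(Q) = 4Q/25

# Other points on the constraint: solving 10 k^(1/4) l^(1/4) = Q for l
# gives l = Q^4 / (10^4 k). Scan k over a grid and compare costs.
grid = [0.25 + 0.05 * i for i in range(100)]
other_costs = [cost(kk, q**4 / (1e4 * kk)) for kk in grid if abs(kk - k) > 1e-9]
print(min(other_costs) > cost(k, l))
```

Every other sampled input combination that produces Q = 20 costs more than the combination found by the method, which is what we expect of a constrained minimum.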

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
find and classify the stationary points of a function of two variables;
solve problems from economics-based subjects that involve unconstrained
optimisation;
optimise a function in the presence of constraints;
solve problems from economics-based subjects that involve constrained
optimisation.


Solutions to activities
Solution to activity 7.1
The first-order partial derivatives of the function are
$$f_x(x, y) = 2x - 4 \quad\text{and}\quad f_y(x, y) = 2y + 4.$$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve
the simultaneous equations
$$2x - 4 = 0 \quad\text{and}\quad 2y + 4 = 0.$$
But, clearly, the first of these equations gives x = 2 and the second gives $y = -2$. Thus,
$(2, -2)$ is the only stationary point of f(x, y).
Solution to activity 7.2
The first-order partial derivatives of the function are
$$f_x(x, y) = 9x^2 + 18x - 72 \quad\text{and}\quad f_y(x, y) = 6y^2 - 24y - 126.$$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve
the simultaneous equations
$$9x^2 + 18x - 72 = 0 \quad\text{and}\quad 6y^2 - 24y - 126 = 0.$$
Now, notice that the first equation contains no $y$s and the second equation contains no
$x$s. As such, the first equation tells us everything there is to know about x, i.e.
$$9x^2 + 18x - 72 = 0 \implies x^2 + 2x - 8 = 0 \implies (x + 4)(x - 2) = 0 \implies x = -4 \text{ or } x = 2,$$
whereas the second equation tells us everything we need to know about y, i.e.
$$6y^2 - 24y - 126 = 0 \implies y^2 - 4y - 21 = 0 \implies (y + 3)(y - 7) = 0 \implies y = -3 \text{ or } y = 7.$$
As such, since we can take any of the x values with any of the y values we can see that
this function has four stationary points, namely $(-4, -3)$, $(-4, 7)$, $(2, -3)$ and $(2, 7)$.
Solution to activity 7.3
Using the first-order partial derivatives we found in Activity 7.1, we find that the
second-order partial derivatives are
$$f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 2.$$
As these are constants, they take these values at the stationary point (and, indeed, at
all other points). Thus, we can see that the Hessian at the stationary point is given by
$$H(2, -2) = (2)(2) - (0)^2 = 4 > 0 \quad\text{and}\quad f_{xx}(2, -2) = 2 > 0,$$
so this is a local minimum.

Solution to activity 7.4


Using the first-order partial derivatives we found in Activity 7.2, we find that the
second-order partial derivatives are
$$f_{xx}(x, y) = 18x + 18, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 12y - 24,$$
and, as such, the Hessian is given by
$$H(x, y) = (18x + 18)(12y - 24) - 0^2 = 216(x + 1)(y - 2).$$
Evaluating this at each of the stationary points we find that:
At $(-4, -3)$, the Hessian is
$$H(-4, -3) = 216(-3)(-5) > 0 \quad\text{and}\quad f_{xx}(-4, -3) = 18(-4) + 18 < 0,$$
so this is a local maximum.
At $(-4, 7)$, the Hessian is
$$H(-4, 7) = 216(-3)(+5) < 0,$$
and so this is a saddle point.
At $(2, -3)$, the Hessian is
$$H(2, -3) = 216(+3)(-5) < 0,$$
and so this is a saddle point.
At $(2, 7)$, the Hessian is
$$H(2, 7) = 216(+3)(+5) > 0 \quad\text{and}\quad f_{xx}(2, 7) = 18(2) + 18 > 0,$$
so this is a local minimum.
Thus, the stationary point $(-4, -3)$ is a local maximum, $(-4, 7)$ and $(2, -3)$ are saddle
points and $(2, 7)$ is a local minimum.
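The classification above follows a fixed recipe, so it can be written as a short routine. The sketch below hard-codes the second-order partial derivatives from this solution. The function `classify` and its return strings are names chosen here, and the strict inequalities mirror the test as it is used in these solutions.

```python
def f_xx(x, y):
    return 18 * x + 18


def f_yy(x, y):
    return 12 * y - 24


def f_xy(x, y):
    return 0


def classify(x, y):
    """Apply the Hessian test H = f_xx * f_yy - f_xy^2 at the point (x, y)."""
    hessian = f_xx(x, y) * f_yy(x, y) - f_xy(x, y) ** 2
    if hessian < 0:
        return "saddle point"
    if hessian > 0:
        return "local minimum" if f_xx(x, y) > 0 else "local maximum"
    return "test inconclusive"  # H = 0, as happens in Activity 7.5


for point in [(-4, -3), (-4, 7), (2, -3), (2, 7)]:
    print(point, classify(*point))
```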
Solution to activity 7.5
The first-order partial derivatives of this function are
$$f_x(x, y) = 4(x - 1)^3 \quad\text{and}\quad f_y(x, y) = 4(y - 1)^3.$$
So, clearly, the only stationary point is at (1, 1) as this is the only point that makes
$f_x(x, y) = 0$ and $f_y(x, y) = 0$. The second-order partial derivatives of this function are
given by
$$f_{xx}(x, y) = 12(x - 1)^2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 12(y - 1)^2,$$
and, as such, the Hessian is given by
$$H(x, y) = [12(x - 1)^2][12(y - 1)^2] - 0^2 = 144(x - 1)^2 (y - 1)^2.$$
Indeed, evaluating this at the stationary point gives H(1, 1) = 0 and so the method we
used above fails.
However, if we consider the surface z = f(x, y), notice that we have z = f(1, 1) = 0 at
the stationary point and for all other $x, y \in \mathbb{R}$, we have
$$z = f(x, y) = (x - 1)^4 + (y - 1)^4 > 0,$$
i.e. $f(x, y) \ge f(1, 1)$ for all $x, y \in \mathbb{R}$. Consequently, it should be clear that this function
has a local minimum at (1, 1) and this minimum value is zero.[11]
Solution to activity 7.6
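Suppose that we have a function f(x, y) that is concave. As we saw in Section 6.4.1, at
any point (a, b), the tangent plane to this function has a Cartesian equation given by
$$z = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
and, as this function is concave, it must be the case that for all $(x, y) \in \mathbb{R}^2$, the function
lies below this tangent plane, i.e. we must have
$$f(x, y) \le f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}.$$
However, using the second-order Taylor series for f(x, y) around the point (a, b), this
means that we have
$$f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \le f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
which simplifies to give us
$$\begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \le 0,$$
and this just asserts that $K(x, y) \le 0$ using our notation from Section 7.2.2. However,
using what we saw before, this means that we require
$$H(x, y) \ge 0 \quad\text{and}\quad f_{xx}(x, y) \le 0,$$
and this is, therefore, our condition for concavity.[12]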


[11] Actually, this is not only a local minimum, it is a global minimum as this is truly the smallest value
the function can take for $x, y \in \mathbb{R}$.
[12] Again, we have glossed over any complications in our derivation that would occur if $f_{xx}(x, y) = 0$
for some point, (x, y).

Solution to activity 7.7
We have found that the profit function is given by
$$\pi(x, y) = \frac{1}{3}\left(-15 + 17x + 28y - 5x^2 - 5y^2 + xy\right),$$
and, to maximise this, we need to find its stationary points and determine which of
them gives us a maximum. So, we start by finding the first-order partial derivatives of
$\pi(x, y)$, i.e.
$$\pi_x(x, y) = \frac{1}{3}\left(17 - 10x + y\right) \quad\text{and}\quad \pi_y(x, y) = \frac{1}{3}\left(28 - 10y + x\right).$$
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must
have $\pi_x(x, y) = 0$ and $\pi_y(x, y) = 0$. Thus, to find the stationary points, we have to solve
the simultaneous equations
$$10x - y = 17 \quad\text{and}\quad x - 10y = -28.$$
We start by noticing that the first equation gives us $y = 10x - 17$ and so, substituting
this into the second equation, we get
$$x - 10(10x - 17) = -28 \implies 99x = 198 \implies x = 2,$$
and then, using $y = 10x - 17$ again, we get y = 3. Thus, the profit function, $\pi(x, y)$, has
(2, 3) as its only stationary point.
To classify this stationary point, we look at the second-order partial derivatives of
$\pi(x, y)$, which are
$$\pi_{xx}(x, y) = -\frac{10}{3}, \quad \pi_{xy}(x, y) = \frac{1}{3} = \pi_{yx}(x, y) \quad\text{and}\quad \pi_{yy}(x, y) = -\frac{10}{3},$$
and, as such, the Hessian is given by
$$H(x, y) = \left(-\frac{10}{3}\right)\left(-\frac{10}{3}\right) - \left(\frac{1}{3}\right)^2 = \frac{100}{9} - \frac{1}{9} = 11.$$
Clearly, at (2, 3), we have H(2, 3) > 0 and $\pi_{xx}(2, 3) < 0$, which means that the
stationary point we have found is indeed a local maximum. Consequently, to maximise
its profit, the firm should produce 2 tonnes of X and 3 tonnes of Y so that it can sell
them at prices, in pounds, of
$$p_X = \frac{17 - 2(2) - 3}{3} = \frac{10}{3} \simeq 3.33 \quad\text{and}\quad p_Y = \frac{28 - 2 - 2(3)}{3} = \frac{20}{3} \simeq 6.67,$$
respectively and, in doing so, the firm will make a maximum profit of
$$\pi(2, 3) = \frac{1}{3}\left[-15 + 17(2) + 28(3) - 5(2)^2 - 5(3)^2 + (2)(3)\right] = \frac{44}{3} \simeq 14.67,$$
pounds.
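The arithmetic in this solution can be confirmed with exact fractions. The sketch below solves the two stationarity equations by Cramer's rule, which is a swapped-in technique rather than the substitution used above, and then evaluates the profit and the Hessian. The variable names are chosen here for illustration.

```python
from fractions import Fraction


def profit(x, y):
    """Profit function from the solution to Activity 7.7."""
    return Fraction(1, 3) * (-15 + 17 * x + 28 * y - 5 * x**2 - 5 * y**2 + x * y)


# Stationarity equations: 10x - y = 17 and x - 10y = -28,
# written as a*x + b*y = e and c*x + d*y = g.
a, b, e = 10, -1, 17
c, d, g = 1, -10, -28

det = a * d - b * c                 # determinant of the coefficient matrix
x = Fraction(e * d - b * g, det)    # Cramer's rule for x
y = Fraction(a * g - e * c, det)    # Cramer's rule for y

# Hessian from the constant second-order partial derivatives.
hessian = Fraction(-10, 3) * Fraction(-10, 3) - Fraction(1, 3) ** 2

print(x, y, profit(x, y), hessian)
```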
Solution to activity 7.8
Of course, this should have been obvious
either by noting that
$$f(x, y) = (x - 1)^2 + (y - 1)^2 \ge 0,$$
for all points $(x, y) \in \mathbb{R}^2$ with a minimum of zero at (1, 1);
or by observing that as H(x, y) = 4 > 0 and $f_{xx}(x, y) = 2 > 0$ for all points
$(x, y) \in \mathbb{R}^2$, we see that this function is convex and so the stationary point (1, 1) we
found above is a global minimum.
Then, using either of these facts, we see that we have found the minimum of f(x, y) for
all $(x, y) \in \mathbb{R}^2$ and so it must be the minimum in the given region too since it is in that
region.
Solution to activity 7.9
Given the prices in Example 7.16 and the consumer's budget of 4, we see that the
budget set is given by
$$2x_1 + x_2 \le 4,$$
where $x_1, x_2 \ge 0$ as they are quantities. This is sketched in Figure 7.7(a).

We are now asked to sketch some contours u(x1 , x2 ) = c where c is a constant and
u(x1 , x2 ) = 3x1 + x2 ,

for this consumer. Indeed, looking at the budget set, it makes sense to choose the
contours where c = 4 and c = 6 and these are illustrated in Figure 7.7(b). This allows us
to see the direction of increasing utility, which is indicated in the figure, and allows us
to see that the point (2, 0) is the one where we get the highest utility if we are
constrained to stay within the budget set. Consequently, this consumer should buy two
cats and no dogs if she wants to maximise her utility subject to her budget constraint.
Figure 7.7: The sketches for Activity 7.9. (a) The budget set for our consumer. (b) Adding two contours, u(x1, x2) = c, where c = 4 and c = 6. The direction in which u(x1, x2) is increasing is as indicated and we are interested in the point which is indicated in the figure.

Exercises
Exercise 7.1
The function
$$f(x, y) = x^2 \ln y - y \ln y,$$
is defined for y > 0 and all $x \in \mathbb{R}$. Find its stationary points and classify them.


Exercise 7.2
Consider the function
$$f(x, y) = x^{\alpha+1} y^{\beta-1},$$
for x, y > 0 and some constants $\alpha$ and $\beta$. For what values of $\alpha$ and $\beta$ is this function
convex? Sketch the region(s) in the $(\alpha, \beta)$-plane that correspond to these values of $\alpha$
and $\beta$.
Exercise 7.3
Suppose that a firm can sell its product in a domestic and a foreign market and that
the inverse demand functions for these two markets are
$$p_1 = 30 - 4q_1 \quad\text{and}\quad p_2 = 50 - 5q_2,$$

where p1 and p2 are the prices (in pounds) if they sell quantities q1 and q2 (in tonnes) in
the domestic and foreign markets respectively. Given that the total cost function of the
firm (in pounds) is
TC(q) = 10 + 10q,
where q is the quantity produced (in tonnes) and that the firm has a monopoly in both
markets, find the quantities it should sell in these markets if they want to maximise
their profit. What are the corresponding prices? What is the maximum profit?

Exercise 7.4
Use the method of Lagrange multipliers to optimise the function
$$f(x, y) = x^{3/8} y^{2/3},$$
subject to the constraint $x^2 + y^2 = 25$ where x, y > 0.
By sketching the constraint and some contours of f , justify your use of the method of
Lagrange multipliers and determine whether the point you have found maximises or
minimises f subject to the constraint.
Exercise 7.5
Given an amount of capital, k, and labour, l, a firm produces a quantity of goods,
q(k, l), where
q(k, l) = ln k + ln l,
for k, l > 0. Suppose that each unit of capital costs 2 and each unit of labour costs 3.
Use the method of Lagrange multipliers to find the values of k and l that maximise the
firm's production given that their total budget for capital and labour is M.
Hence show that the maximum production the firm can achieve given a budget of M
is given by
$$Q(M) = 2 \ln \frac{M}{2\sqrt{6}},$$
and verify that $Q'(M) = \lambda$ where $\lambda$ is the Lagrange multiplier.


Solutions to exercises
Solution to exercise 7.1
Given that
$$f(x, y) = x^2 \ln y - y \ln y,$$
for y > 0 and all $x \in \mathbb{R}$, we see that the first-order partial derivatives of this function are
$$f_x(x, y) = 2x \ln y \quad\text{and}\quad f_y(x, y) = \frac{x^2}{y} - (\ln y + 1),$$
where we have used the product rule when finding $f_y(x, y)$. At a stationary point, both
of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and
$f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous
equations
$$2x \ln y = 0 \quad\text{and}\quad \frac{x^2}{y} - \ln y - 1 = 0.$$
If we start by looking at the first equation, this gives us
$$x \ln y = 0 \implies x = 0 \text{ or } \ln y = 0 \implies x = 0 \text{ or } y = 1.$$
And so, to satisfy the second equation with:
x = 0 we must have
$$0 - \ln y - 1 = 0 \implies \ln y = -1 \implies y = e^{-1},$$
i.e. $(0, e^{-1})$ is a stationary point.
y = 1 we must have
$$\frac{x^2}{1} - \ln 1 - 1 = 0 \implies x^2 = 1 \implies x = \pm 1,$$
i.e. $(1, 1)$ and $(-1, 1)$ are stationary points.
Consequently, the points $(0, e^{-1})$, $(1, 1)$ and $(-1, 1)$ are stationary points of this
function.
To classify these stationary points, we note that the second-order partial derivatives are
$$f_{xx}(x, y) = 2 \ln y, \quad f_{xy}(x, y) = \frac{2x}{y} = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -\frac{x^2}{y^2} - \frac{1}{y},$$
and, as such, the Hessian is given by
$$H(x, y) = (2 \ln y)\left(-\frac{x^2}{y^2} - \frac{1}{y}\right) - \left(\frac{2x}{y}\right)^2 = -\frac{2(x^2 + y) \ln y + 4x^2}{y^2}.$$
Evaluating this at each of the stationary points we then find that:
At $(0, e^{-1})$, the Hessian is
$$H(0, e^{-1}) = -\frac{2e^{-1} \ln(e^{-1})}{e^{-2}} = 2e > 0 \quad\text{and}\quad f_{xx}(0, e^{-1}) = 2 \ln(e^{-1}) = -2 < 0,$$
as $\ln(e^{-1}) = -1$ and so this is a local maximum.
At $(1, 1)$, the Hessian is
$$H(1, 1) = -\frac{4}{1} < 0,$$
as ln 1 = 0 and so this is a saddle point.
At $(-1, 1)$, the Hessian is
$$H(-1, 1) = -\frac{4}{1} < 0,$$
as ln 1 = 0 and so this is a saddle point.
Thus, the stationary points $(0, e^{-1})$, $(1, 1)$ and $(-1, 1)$ are a local maximum and two
saddle points respectively.
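Because this solution involves logarithms, a floating-point check is a convenient way to confirm the three stationary points and their Hessians. The sketch below simply codes the partial derivatives derived above. The function names are chosen here for illustration and all comparisons are up to rounding error.

```python
import math


def f_x(x, y):
    return 2 * x * math.log(y)


def f_y(x, y):
    return x**2 / y - (math.log(y) + 1)


def hessian(x, y):
    """H = f_xx * f_yy - f_xy^2 for f(x, y) = x^2 ln y - y ln y."""
    f_xx = 2 * math.log(y)
    f_xy = 2 * x / y
    f_yy = -(x**2) / y**2 - 1 / y
    return f_xx * f_yy - f_xy**2


points = [(0.0, math.exp(-1)), (1.0, 1.0), (-1.0, 1.0)]
for x, y in points:
    # Both first-order partial derivatives should be (numerically) zero.
    print((x, y), f_x(x, y), f_y(x, y), hessian(x, y))
```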
Solution to exercise 7.2
We have, for x, y > 0, the function f (x, y) = x+1 y 1 whose first-order partial
derivatives are
fx (x, y) = ( + 1)x y 1

fy (x, y) = ( 1)x+1 y 2 ,

and

and so its second-order partial derivatives are


fxx (x, y) = ( + 1)x1 y 1 ,
fxy (x, y) = ( + 1)( 1)x y 2 = fyx (x, y),

and

fyy (x, y) = ( 1)( 2)x+1 y 3 .

The Hessian for this function can then be written as


H(x, y) = ( + 1)( 1)( 2) ( + 1)2 ( 1)2 x2 y 2(2) ,
and, for f (x, y) to be convex, we need H(x, y) 0 and fxx (x, y) 0, i.e. we need
( + 1)x1 y 1 0

( + 1) 0,

( )

as x, y > 0, and
( + 1)( 1)( 2) ( + 1)2 ( 1)2 x2 y 2(2) 0,
which gives
(+1)( 1) [( 2) ( + 1)( 1)] 0

(+1)( 1)(1) 0,

()

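as $x, y > 0$. So, in order to satisfy both of these inequalities, we have

either $\alpha \ge -1$: so that $\alpha \ge 0$ from $(*)$ which means that we have $\alpha \ge 0$ and, from $(**)$,
    $\beta \ge 1$ and $\alpha + \beta \le 1$ (but we can't have $\alpha \ge 0$, $\beta \ge 1$ and $\alpha + \beta \le 1$!), or
    $\beta \le 1$ and $\alpha + \beta \ge 1$ (see region A in Figure 7.8);

or $\alpha \le -1$: so that $\alpha \le 0$ from $(*)$ which means that we have $\alpha \le -1$ and, from $(**)$,
    $\beta \ge 1$ and $\alpha + \beta \ge 1$ (see region B in Figure 7.8), or
    $\beta \le 1$ and $\alpha + \beta \le 1$ (see region C in Figure 7.8).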

A sketch of the corresponding regions in the $(\alpha, \beta)$-plane is illustrated in Figure 7.8.

[Figure 7.8: The sketch for Exercise 7.2.]

Solution to exercise 7.3


Here the firm is a monopoly and so, as it is the sole supplier of its product in both
markets, when it supplies quantities q1 and q2 to the domestic and foreign markets
respectively, the prices will be given by the inverse demand functions
$$p_1 = 30 - 4q_1 \quad\text{and}\quad p_2 = 50 - 5q_2,$$

respectively.[13] This means that their total revenue is given by
$$\mathrm{TR}(q_1, q_2) = p_1 q_1 + p_2 q_2 = (30 - 4q_1)q_1 + (50 - 5q_2)q_2,$$
and their total costs are given by
$$\mathrm{TC}(q) = 10 + 10q \implies \mathrm{TC}(q_1, q_2) = 10 + 10(q_1 + q_2),$$
as $q = q_1 + q_2$ is the quantity being produced. As such, their profit function is
$$\pi(q_1, q_2) = \mathrm{TR}(q_1, q_2) - \mathrm{TC}(q_1, q_2) = 20q_1 + 40q_2 - 4q_1^2 - 5q_2^2 - 10,$$
and we need to find the values of $q_1$ and $q_2$ that maximise this.

[13] Note that the situation described here, where a producer charges different prices in different markets, is sometimes known as price discrimination.


To do this, we see that the first-order partial derivatives of $\pi(q_1, q_2)$ are
$$\pi_{q_1}(q_1, q_2) = 20 - 8q_1 \quad\text{and}\quad \pi_{q_2}(q_1, q_2) = 40 - 10q_2,$$
and so, as a stationary point occurs when $\pi_{q_1}(q_1, q_2) = 0$ and $\pi_{q_2}(q_1, q_2) = 0$, we need to solve the simultaneous equations
$$20 - 8q_1 = 0 \quad\text{and}\quad 40 - 10q_2 = 0.$$

But, of course, the first equation gives $q_1 = 5/2$ and the second equation gives $q_2 = 4$, which means that $(5/2, 4)$ is the only stationary point of $\pi(q_1, q_2)$.
To check that this is a maximum, we look at the second-order partial derivatives of $\pi(q_1, q_2)$, which are
$$\pi_{q_1 q_1}(q_1, q_2) = -8, \quad \pi_{q_1 q_2}(q_1, q_2) = 0 = \pi_{q_2 q_1}(q_1, q_2) \quad\text{and}\quad \pi_{q_2 q_2}(q_1, q_2) = -10,$$
and, as such, the Hessian is given by
$$H(q_1, q_2) = (-8)(-10) - 0^2 = 80.$$
Clearly, at $(5/2, 4)$, we have $H(5/2, 4) > 0$ and $\pi_{q_1 q_1}(5/2, 4) < 0$ which means that the stationary point we have found is indeed a local maximum. Consequently, to maximise its profit, the firm should supply 5/2 tonnes of its product to the domestic market and 4 tonnes of its product to the foreign market so that it can sell them at prices, in pounds, of
$$p_1 = 30 - 4\left(\frac{5}{2}\right) = 20 \quad\text{and}\quad p_2 = 50 - 5(4) = 30,$$
respectively and, in doing so, the firm will make a maximum profit of
$$\pi(5/2, 4) = 20\left(\frac{5}{2}\right) + 40(4) - 4\left(\frac{5}{2}\right)^2 - 5(4)^2 - 10 = 95,$$
pounds.
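The arithmetic in this solution can be reproduced exactly with rational numbers. The sketch below assumes only the inverse demand and cost functions stated above; the names `profit`, `q1_star`, `q2_star` and `neighbours` are illustrative.

```python
from fractions import Fraction


def profit(q1, q2):
    # Profit = total revenue - total cost, using the functions in Solution 7.3.
    revenue = (30 - 4 * q1) * q1 + (50 - 5 * q2) * q2
    cost = 10 + 10 * (q1 + q2)
    return revenue - cost


# Stationary point from 20 - 8*q1 = 0 and 40 - 10*q2 = 0.
q1_star = Fraction(20, 8)
q2_star = Fraction(40, 10)

max_profit = profit(q1_star, q2_star)
prices = (30 - 4 * q1_star, 50 - 5 * q2_star)

# Nudging either quantity away from the stationary point should lower profit.
step = Fraction(1, 100)
neighbours = [
    profit(q1_star + dq1, q2_star + dq2)
    for dq1 in (-step, 0, step)
    for dq2 in (-step, 0, step)
    if (dq1, dq2) != (0, 0)
]
```

Comparing every entry of `neighbours` with `max_profit` is only a spot check near $(5/2, 4)$; the Hessian argument above is what actually establishes the maximum.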
Solution to exercise 7.4
Writing the constraint in the form $x^2 + y^2 - 25 = 0$, we get the Lagrangean
$$L(x, y, \lambda) = x^{3/8} y^{2/3} - \lambda(x^2 + y^2 - 25),$$
and we seek the points which simultaneously satisfy the equations $L_x(x, y, \lambda) = 0$, $L_y(x, y, \lambda) = 0$ and $L_\lambda(x, y, \lambda) = 0$. So we find the first-order partial derivatives of $L(x, y, \lambda)$, i.e.
$$L_x(x, y, \lambda) = \frac{3}{8} x^{-5/8} y^{2/3} - 2\lambda x, \quad L_y(x, y, \lambda) = \frac{2}{3} x^{3/8} y^{-1/3} - 2\lambda y \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(x^2 + y^2 - 25),$$
and set these equal to zero to yield the equations
$$\frac{3}{8} x^{-5/8} y^{2/3} - 2\lambda x = 0, \quad \frac{2}{3} x^{3/8} y^{-1/3} - 2\lambda y = 0 \quad\text{and}\quad x^2 + y^2 - 25 = 0.$$

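We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$\frac{3}{8} x^{-5/8} y^{2/3} - 2\lambda x = 0 \implies \lambda = \frac{3}{16}\,\frac{y^{2/3}}{x^{13/8}}$$
from the first equation, and
$$\frac{2}{3} x^{3/8} y^{-1/3} - 2\lambda y = 0 \implies \lambda = \frac{1}{3}\,\frac{x^{3/8}}{y^{4/3}}$$
from the second equation. As such, we can equate these expressions for $\lambda$ to get
$$\frac{3}{16}\,\frac{y^{2/3}}{x^{13/8}} = \frac{1}{3}\,\frac{x^{3/8}}{y^{4/3}} \implies y^2 = \frac{16}{9} x^2.$$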

We then use this new relationship between $x$ and $y$ in the third equation, which is just the constraint $x^2 + y^2 = 25$, to get
$$x^2 + \frac{16}{9} x^2 = 25 \implies \frac{25}{9} x^2 = 25 \implies x^2 = 9 \implies x = 3,$$
as $x > 0$. Then, using this in the equation $y^2 = 16x^2/9$, we get
$$y^2 = \frac{16}{9}(3^2) = 16 \implies y = 4,$$
as $y > 0$. Thus, $x = 3$ and $y = 4$ will optimise $f(x, y)$ subject to the constraint.


The constraint is $x^2 + y^2 = 25$ and this is a circle of radius five centred on the origin which, for $x, y > 0$, is illustrated in Figure 7.9(a). The objective function, $f(x, y) = x^{3/8} y^{2/3}$, has contours $f(x, y) = c$, where $c$ is a constant, that look a bit like rectangular hyperbolae as illustrated in Figure 7.9(b). The direction in which $f(x, y)$ is increasing is indicated in this figure along with the point we found above using the Lagrange multiplier method, i.e. a point where we have a contour of $f(x, y)$ which is both tangential to the constraint and touching the constraint. Having seen this, it should be clear that this point will maximise $f$ subject to the constraint.
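The graphical argument can be complemented by a brute-force scan along the constraint. This is a rough numerical sketch rather than the Lagrange multiplier method itself: it assumes the parametrisation $x = 5\cos t$, $y = 5\sin t$ with $0 < t < \pi/2$, which is not used in the guide, and the function name `best_point_on_constraint` is illustrative.

```python
import math


def f(x, y):
    # Objective function from Exercise 7.4.
    return x ** (3 / 8) * y ** (2 / 3)


def best_point_on_constraint(steps=20_000):
    # Walk along the quarter circle x^2 + y^2 = 25 with x, y > 0 and keep
    # the grid point where f is largest.
    best_x, best_y, best_value = 0.0, 0.0, float("-inf")
    for i in range(1, steps):
        t = (math.pi / 2) * i / steps
        x, y = 5 * math.cos(t), 5 * math.sin(t)
        value = f(x, y)
        if value > best_value:
            best_x, best_y, best_value = x, y, value
    return best_x, best_y, best_value


x_best, y_best, f_best = best_point_on_constraint()
```

The scan only samples finitely many points, so `x_best` and `y_best` approximate $(3, 4)$ to within the grid spacing rather than reproducing it exactly.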
[Figure 7.9: The sketches for Exercise 7.4. (a) The constraint $x^2 + y^2 = 25$ for $x, y > 0$. (b) Adding three contours, $f(x, y) = c$, where the direction in which $f(x, y)$ is increasing is as indicated. Clearly, we are interested in the point $(3, 4)$ which is indicated in the figure.]

Solution to exercise 7.5
The firm has $M$ to spend on capital and labour where each unit of capital costs 2 and each unit of labour costs 3. As such, the cost of using $k$ units of capital and $l$ units of labour is $2k + 3l$ and this gives us the constraint $2k + 3l = M$.[14] So, to maximise the quantity
$$q(k, l) = \ln k + \ln l,$$
that the firm can produce subject to the constraint $2k + 3l = M$ where $k, l > 0$ we use the Lagrangean
$$L(k, l, \lambda) = \ln(k) + \ln(l) - \lambda(2k + 3l - M).$$

[14] Strictly, the constraint is $2k + 3l \le M$ where $k, l > 0$, but we can see that if we chose a point where $2k + 3l < M$, we could not maximise the quantity produced since, spending more on capital and labour to get a point where $2k + 3l = M$, we would get a larger quantity. This should make sense if you consider the discussion of budget constraints in Section 7.3.4.

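We seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$, $L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$. The first-order derivatives of $L(k, l, \lambda)$ are
$$L_k(k, l, \lambda) = \frac{1}{k} - 2\lambda, \quad L_l(k, l, \lambda) = \frac{1}{l} - 3\lambda \quad\text{and}\quad L_\lambda(k, l, \lambda) = -(2k + 3l - M),$$
and we set these equal to zero to yield the equations
$$\frac{1}{k} - 2\lambda = 0, \quad \frac{1}{l} - 3\lambda = 0 \quad\text{and}\quad 2k + 3l - M = 0.$$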

We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$\lambda = \frac{1}{2k} = \frac{1}{3l} \implies 3l = 2k \implies k = \frac{3}{2} l.$$

We then use this new relationship between $k$ and $l$ in the third equation, which is just the constraint $2k + 3l = M$, to get
$$2\left(\frac{3}{2} l\right) + 3l = M \implies 6l = M \implies l = \frac{M}{6},$$
and then, using this in the equation $k = 3l/2$, we get
$$k = \frac{3}{2} \cdot \frac{M}{6} = \frac{M}{4}.$$

Thus the values of k and l that maximise q(k, l) subject to the constraint are k = M/4
and l = M/6.
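In this case, the maximum production achievable, given a budget of $M$, is
$$Q(M) = q\left(\frac{M}{4}, \frac{M}{6}\right) = \ln\left(\frac{M}{4}\right) + \ln\left(\frac{M}{6}\right) = \ln\left(\frac{M^2}{24}\right) = 2\ln\left(\frac{M}{2\sqrt{6}}\right),$$
as required. Further, we can find the value of $\lambda$ using, say, the equation
$$\lambda = \frac{1}{2k} \implies \lambda = \frac{1}{2} \cdot \frac{4}{M} = \frac{2}{M},$$
and we can see that
$$Q(M) = 2\ln\left(\frac{M}{2\sqrt{6}}\right)$$
can be written as
$$Q(M) = 2\ln M - 2\ln(2\sqrt{6}) \implies Q'(M) = \frac{2}{M},$$
which verifies that $Q'(M) = \lambda$.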


Note: Although this question is similar to what we saw in Example 7.17, notice that here we are maximising production subject to a budget constraint whereas in Example 7.17 we were minimising costs subject to a production constraint. In particular, this means that you should always read the question carefully to ensure that you are using the correct objective function and constraint! Further, we were not asked to justify the assertion that the optimal point we found was a maximum here and so we haven't, but sometimes, as in Exercise 7.4, we will be asked to provide such a justification.
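The relationship between the Lagrange multiplier and the derivative of the maximum production can also be checked numerically. The sketch below assumes the optimal values $k = M/4$ and $l = M/6$ found in this solution and estimates the derivative of $Q$ with a central difference; the budget value `M = 12.0` and all function names are hypothetical choices for illustration only.

```python
import math


def optimal_inputs(M):
    # Optimal capital and labour from Solution 7.5.
    return M / 4, M / 6


def Q(M):
    # Maximum production for a budget M, i.e. q(k, l) at the optimal inputs.
    k, l = optimal_inputs(M)
    return math.log(k) + math.log(l)


def multiplier(M):
    # Lagrange multiplier from the first-order condition 1/k - 2*lambda = 0.
    k, _ = optimal_inputs(M)
    return 1 / (2 * k)


def Q_prime_estimate(M, h=1e-5):
    # Central-difference estimate of dQ/dM.
    return (Q(M + h) - Q(M - h)) / (2 * h)


M = 12.0  # hypothetical budget, chosen only for illustration
k_opt, l_opt = optimal_inputs(M)
```

For this budget, `Q_prime_estimate(M)` and `multiplier(M)` should agree closely, and `2 * k_opt + 3 * l_opt` should return the budget.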


Chapter 8
Differential equations
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 12.1-12.4 and 12.7-12.8.
Anthony and Biggs (1996) Chapters 27 and 28.
Further reading
Simon and Blume (1994) Sections 24.1-24.3 and Section 25.3.
Adams and Essex (2010) Sections 3.7 and 7.9, parts of Sections 17.1-17.2, 17.4-17.6.
Aims and objectives

The objectives of this chapter are as follows.


To see different types of differential equation and solve them using the given
methods.
To use differential equations to solve problems from economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.

8.1 Introduction: What is a differential equation?

A differential equation is an equation which contains at least one derivative of an unknown function. In this course, we will be concerned with ordinary differential equations (or ODEs), i.e. those which involve functions of only one independent variable.[1] It is often convenient to classify an ODE according to how the highest-order derivative it contains appears in it. That is, we say that the

order of an ODE is given by the order of the highest-order derivative it contains.

degree of an ODE is given by the algebraic degree of the highest-order derivative it contains.

On the whole, we will be concerned with ODEs which are first or second-order and of the first degree.

[1] If the differential equation involves a function with more than one independent variable, then it would contain at least one partial derivative of the function and we would have a partial differential equation.

Activity 8.1 Determine the order and degree of the following ODEs involving the unknown function, $y(x)$.

(a) $\left(\dfrac{dy}{dx}\right)^2 = x\,\dfrac{d^2y}{dx^2}$.

(b) $\dfrac{dy}{dx} = x\,\dfrac{d^2y}{dx^2}$.

(c) $\dfrac{d^3y}{dx^3} = x\left(\dfrac{d^2y}{dx^2}\right)^2$.


Given an ODE, we usually want to solve it. That is, we want to find the unknown
function in a form which does not involve any derivatives, and when we have found the
function in this form we call it a solution to the ODE. In general, we will find that any
given ODE has many solutions and so we get a general solution, i.e. we find the
unknown function up to some arbitrary constants that are not determined by the ODE
itself. Let's look at a very simple example of an ODE (i.e. one that can be solved by
direct integration) to see how things work.

Example 8.1 Solve the ODE
$$\frac{dy}{dx} = 2x + 1.$$
This is a first-order ODE of degree one and it is very easy to solve because we can just integrate both sides to see that
$$\int \frac{dy}{dx}\,dx = \int (2x + 1)\,dx \implies y = x^2 + x + c,$$
where $c$ is an arbitrary constant. As the independent variable is $x$ and the dependent variable is $y$, the unknown function here is $y(x)$, so this gives us
$$y(x) = x^2 + x + c.$$
We call this the general solution to the ODE as any solution to the ODE will have
this form and each of these solutions arises from a different value of the arbitrary
constant, c.
In addition to an ODE, we may also be given conditions which give us extra
information about the function we are interested in. Given this information, we can find
a particular solution, i.e. a solution to the ODE that also satisfies the given conditions.


Example 8.2 Find the solution to the ODE in Example 8.1 that also satisfies the
condition y(0) = 1.
We know that all solutions to the ODE in the previous example have the form
$$y(x) = x^2 + x + c.$$
If, in addition, we want a solution that satisfies the condition $y(0) = 1$, we can set $x = 0$ in both sides of this expression and use the condition to get
$$y(0) = 0^2 + 0 + c \implies 1 = c.$$
That is, if we want to satisfy the condition $y(0) = 1$ as well, we must take $c = 1$ in the general solution. Consequently,
$$y(x) = x^2 + x + 1,$$
is the particular solution to the ODE given that $y(0) = 1$.
Of course, it should be clear from this example that, when we apply different conditions
to the general solution, we can get different values of c and hence different particular
solutions.
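The step from a general solution to a particular solution can be sketched in a few lines. The code below assumes the general solution $y(x) = x^2 + x + c$ from Example 8.1; the helper names are illustrative, and the derivative is only estimated by a central difference, so it is a sanity check rather than a proof.

```python
def general_solution(x, c):
    # General solution of dy/dx = 2x + 1 from Example 8.1.
    return x**2 + x + c


def constant_from_condition(x0, y0):
    # Solve y(x0) = y0 for the arbitrary constant c.
    return y0 - general_solution(x0, 0)


def derivative_estimate(func, x, h=1e-6):
    # Central-difference estimate of the derivative of func at x.
    return (func(x + h) - func(x - h)) / (2 * h)


c = constant_from_condition(0, 1)  # the condition y(0) = 1 from Example 8.2


def particular(x):
    return general_solution(x, c)
```

Calling `constant_from_condition` with a different condition gives a different value of `c`, and hence a different particular solution, which is the point made in the paragraph above.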
Activity 8.2 Find the particular solutions to the ODE in Example 8.1 that also
satisfy the conditions (a) y(0) = 0, (b) y(0) = 1 and (c) y(2) = 7.
Indeed, we solved simple ODEs that looked like this when we considered marginal
functions in Section 5.4.1. Further, as the following example shows, we can also solve
simple higher-order ODEs by direct integration.
Example 8.3 Solve the ODE
$$\frac{d^2y}{dx^2} = 6x + 2.$$
This is a second-order ODE of degree one and, once again, we can begin to solve it by integrating both sides to see that
$$\int \frac{d^2y}{dx^2}\,dx = \int (6x + 2)\,dx \implies \frac{dy}{dx} = 3x^2 + 2x + c,$$
but this does not give us a solution as we still have a derivative in our expression. However, if we integrate both sides again, we get
$$\int \frac{dy}{dx}\,dx = \int (3x^2 + 2x + c)\,dx \implies y = x^3 + x^2 + cx + d,$$
where $d$ is another arbitrary constant. As the independent variable is $x$ and the dependent variable is $y$, the unknown function here is $y(x)$, so this gives us
$$y(x) = x^3 + x^2 + cx + d.$$
This is the general solution to the ODE as any solution to the ODE will have this form and each of these solutions arises from different values of the arbitrary constants, $c$ and $d$.


Of course, if we find that there are several arbitrary constants in the general solution of
an ODE, such as c and d in the general solution to the second-order ODE in
Example 8.3, we will need more conditions in order to determine these constants and
hence find a particular solution.

Example 8.4 Find the solution to the ODE in Example 8.3 that also satisfies the conditions $y(0) = 1$ and $y'(0) = 2$.
We know that all solutions to the ODE in the previous example have the form
$$y(x) = x^3 + x^2 + cx + d.$$
If, in addition, we want a solution that satisfies the condition $y(0) = 1$, we can set $x = 0$ in both sides of this expression and use the condition to get
$$y(0) = 0^3 + 0^2 + c(0) + d \implies 1 = d.$$
We also know that
$$y'(x) = 3x^2 + 2x + c,$$
and so, if we want a solution that satisfies the condition $y'(0) = 2$, we can set $x = 0$ in both sides of this expression and use the condition to get
$$y'(0) = 3(0^2) + 2(0) + c \implies 2 = c.$$
Thus, we see that
$$y(x) = x^3 + x^2 + 2x + 1,$$
is the particular solution to the ODE given that $y(0) = 1$ and $y'(0) = 2$.

More generally, we won't be able to solve ODEs by direct integration and so the
procedure for solving an ODE will usually involve identifying its type and applying the
relevant method. In what follows, we shall see how the form of an ODE allows us to
choose the method that will enable us to solve it in cases where direct integration can't
be used.

8.2 First-order ODEs

In this section we will consider some methods that will allow us to solve certain first-order ODEs of degree one. That is, certain ODEs that have the form
$$\frac{dy}{dx} = f(x, y),$$
where $f(x, y)$ is some given function of the independent variable, $x$, and the dependent variable, $y$.

8.2.1 Separable first-order ODEs

A first-order ODE of degree one that can be written in the form
$$M(x) = N(y)\,\frac{dy}{dx},$$
is called a separable ODE. This is because, in such cases, we have been able to separate the variables so that all occurrences of $x$ occur on the left-hand-side and all occurrences of $y$ occur on the right-hand-side. ODEs of this type can be solved by integrating both sides to get
$$\int M(x)\,dx = \int N(y)\,\frac{dy}{dx}\,dx \implies \int M(x)\,dx = \int N(y)\,dy,$$
using the integration by substitution formula from Section 5.2.3. If we now determine these integrals, we will find the general solution to the ODE.
Example 8.5 Find the general solution to the ODE
$$\frac{dy}{dx} = 2x(y - 1).$$
This ODE is separable as it can be written as
$$2x = \frac{1}{y - 1}\,\frac{dy}{dx},$$
with $M(x) = 2x$ and $N(y) = (y - 1)^{-1}$. Using the method described above, we write this as
$$\int 2x\,dx = \int \frac{dy}{y - 1}$$
and determine the integrals to get
$$x^2 + c = \ln|y - 1|,$$
where $c$ is an arbitrary constant. Taking exponentials of both sides, this gives us
$$|y - 1| = e^{x^2 + c} = e^c e^{x^2}.$$
Now, both sides of this expression are non-negative because of the modulus on the left-hand-side and the exponentials on the right-hand-side. This means that, if we want to remove the modulus, we must allow the possibility that the right-hand-side can give us a negative quantity, i.e. we have
$$y - 1 = \pm e^c e^{x^2} \implies y = 1 \pm e^c e^{x^2}.$$
Then, as the independent variable is $x$ and the dependent variable is $y$, the unknown function here is $y(x)$, so this gives us the general solution
$$y(x) = 1 + A e^{x^2},$$
where $A \in \mathbb{R}$ is an arbitrary constant.[2]
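A general solution like this one can be sanity-checked by substituting it back into the ODE. The sketch below does so numerically, estimating $dy/dx$ with a central difference; the sample values of $x$ and $A$ are arbitrary choices for illustration, and a small residual is evidence of correctness rather than a proof.

```python
import math


def y(x, A):
    # General solution y(x) = 1 + A * exp(x^2) from Example 8.5.
    return 1 + A * math.exp(x**2)


def residual(x, A, h=1e-6):
    # dy/dx - 2x(y - 1), with dy/dx estimated by a central difference.
    dydx = (y(x + h, A) - y(x - h, A)) / (2 * h)
    return dydx - 2 * x * (y(x, A) - 1)


samples = [(x, A) for x in (-1.0, 0.0, 0.5, 1.2) for A in (-2.0, 0.0, 1.0, 3.5)]
worst = max(abs(residual(x, A)) for x, A in samples)
```

The case `A = 0.0` is included deliberately: it corresponds to the constant solution $y = 1$, which the replacement of $\pm e^c$ by an arbitrary real constant allows.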
Of course, having found the general solution to the ODE in this example, we can also
find particular solutions if we are given some conditions.
[2] Here we have replaced $\pm e^c$ with a new constant $A \in \mathbb{R}$ which can take any value.


Activity 8.3 Find the particular solutions to the ODE in Example 8.5 given the
conditions (a) y(0) = 2 and (b) y(0) = 0.
What value of y(1) will give the same particular solution as the one you found in
(a)?

8.2.2 Linear first-order ODEs

A first-order ODE of degree one that can be written in the form
$$\frac{dy}{dx} + P(x)y = Q(x),$$
is called a linear ODE. The procedure for solving such an ODE involves finding an integrating factor, $\mu(x)$, given by
$$\mu(x) = e^{\int P(x)\,dx},$$
where, here, $\int P(x)\,dx$ is just any antiderivative of $P(x)$. Once we have this, we multiply both sides of the ODE by the integrating factor to get
$$\mu(x)\,\frac{dy}{dx} + \mu(x)P(x)y = \mu(x)Q(x). \qquad (8.1)$$
Now, observe that
$$\frac{d\mu}{dx} = \frac{d}{dx}\left[e^{\int P(x)\,dx}\right] = e^{\int P(x)\,dx}\,P(x) = \mu(x)P(x),$$
if we use the chain rule[3] and so, using the product rule, we have
$$\frac{d}{dx}\Big[\mu(x)y(x)\Big] = \mu(x)\,\frac{dy}{dx} + \frac{d\mu}{dx}\,y(x) = \mu(x)\,\frac{dy}{dx} + \mu(x)P(x)y(x),$$
which is the left-hand-side of (8.1). As such, we can write (8.1) as
$$\frac{d}{dx}\Big[\mu(x)y(x)\Big] = \mu(x)Q(x) \implies \mu(x)y(x) = \int \mu(x)Q(x)\,dx,$$
and if we determine the integral on the right-hand-side, we can then find the general solution to the ODE.

[3] When using the chain rule here, we should find that
$$\frac{d}{dx}\int P(x)\,dx = P(x).$$
To see why, note that if $c$ is an arbitrary constant and $F(x)$ is an antiderivative of $P(x)$, i.e. $F'(x) = P(x)$, we have
$$\int P(x)\,dx = F(x) + c \implies \frac{d}{dx}\int P(x)\,dx = \frac{d}{dx}\Big[F(x) + c\Big] = F'(x) = P(x),$$
as expected.


Example 8.6 Find the general solution of the ODE
$$x\,\frac{dy}{dx} - 2y = 6.$$
This ODE is linear as it can be written as
$$\frac{dy}{dx} - \frac{2}{x}\,y = \frac{6}{x},$$
with $P(x) = -2/x$ and $Q(x) = 6/x$. Using the method above, we start by finding the integrating factor, $\mu(x)$, by determining the integral
$$\int P(x)\,dx = -\int \frac{2}{x}\,dx = -2\ln|x| + c,$$
and so we see that $-2\ln x$ is an antiderivative of $-2/x$. This means that the integrating factor is
$$\mu(x) = e^{-2\ln x} = e^{\ln x^{-2}} = x^{-2},$$
and so we have
$$\mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies x^{-2}y(x) = \int 6x^{-3}\,dx \implies x^{-2}y(x) = -3x^{-2} + c,$$
where $c$ is an arbitrary constant. As such, we find that
$$y(x) = -3 + cx^2,$$
is the general solution to our linear ODE.
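As a quick check, which is not part of the original worked example, the general solution can be substituted back into the ODE. The working below uses only the symbols already defined in Example 8.6.

```latex
\begin{align*}
y(x) = -3 + cx^2 &\implies \frac{dy}{dx} = 2cx, \\
x\,\frac{dy}{dx} - 2y &= x(2cx) - 2(-3 + cx^2) = 2cx^2 + 6 - 2cx^2 = 6.
\end{align*}
```

The terms in $c$ cancel for every value of $c$, which is why the constant remains arbitrary.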
Activity 8.4 Observe that the ODE in Example 8.6 can also be written as
$$\frac{2}{x} = \frac{1}{y + 3}\,\frac{dy}{dx}.$$
Verify that the answer we found in that example is correct by solving this separable ODE using the method in Section 8.2.1.
Let's now consider another example where the ODE is linear, but not separable.

Example 8.7 Find the general solution of the ODE
$$\frac{dy}{dx} = y + e^x.$$
This ODE is linear as it can be written as
$$\frac{dy}{dx} - y = e^x,$$
which is linear with $P(x) = -1$ and $Q(x) = e^x$. Using the method above, we start by finding the integrating factor, $\mu(x)$, by determining the integral
$$\int P(x)\,dx = \int -1\,dx = -x + c,$$
and so we see that $-x$ is an antiderivative of $-1$. This means that the integrating factor is
$$\mu(x) = e^{-x},$$
and so we have
$$\mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies e^{-x}y(x) = \int dx \implies e^{-x}y(x) = x + c,$$
where $c$ is an arbitrary constant. As such, we find that
$$y(x) = (x + c)\,e^x,$$
is the general solution to our linear ODE.
Activity 8.5 Verify that the ODE in the previous example is not separable.

8.2.3 Homogeneous first-order ODEs

As we saw in Section 6.3.4, a function $f(x, y)$ is homogeneous of degree $r$ if
$$f(\lambda x, \lambda y) = \lambda^r f(x, y).$$
Using this, we say that a first-order ODE of the form
$$M(x, y) + N(x, y)\,\frac{dy}{dx} = 0,$$
is homogeneous of degree $n$ if the functions $M$ and $N$ are both homogeneous of degree $n$. The procedure for solving such an ODE involves making the substitution $y = xv(x)$ to separate the variables $v$ and $x$ so that we can solve it using the method in Section 8.2.1.
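The defining property of a homogeneous function can be spot-checked numerically before attempting the substitution. The sketch below tests $f(\lambda x, \lambda y) = \lambda^r f(x, y)$ at a few sample points and scale factors; the functions $xy + y^2$ and $-xy$, the sample points and the name `is_homogeneous` are all illustrative choices, and passing the test at finitely many points is only evidence, not a proof.

```python
import math


def is_homogeneous(func, degree, samples, scales=(0.5, 2.0, 3.0)):
    # Numerically test func(t*x, t*y) == t**degree * func(x, y).
    for x, y in samples:
        for t in scales:
            lhs = func(t * x, t * y)
            rhs = t**degree * func(x, y)
            if not math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12):
                return False
    return True


def M(x, y):
    # Sample function x*y + y^2, expected to be homogeneous of degree 2.
    return x * y + y**2


def N(x, y):
    # Sample function -x*y, also expected to be homogeneous of degree 2.
    return -x * y


def not_homogeneous(x, y):
    # Sample function x + y^2, which mixes degrees 1 and 2.
    return x + y**2


points = [(1.0, 2.0), (0.5, -1.5), (3.0, 0.25)]
```

With these definitions, `is_homogeneous(M, 2, points)` and `is_homogeneous(N, 2, points)` should both hold, while `is_homogeneous(not_homogeneous, 2, points)` should fail.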
Example 8.8 Find the general solution of the ODE
$$xy + y^2 - xy\,\frac{dy}{dx} = 0.$$
Here we have functions $M$ and $N$ where
$$M(x, y) = xy + y^2 \quad\text{and}\quad N(x, y) = -xy,$$
and, clearly, they are both homogeneous of degree 2. As such, we introduce a new function, $v(x)$, such that
$$y(x) = xv(x) \implies \frac{dy}{dx} = v(x) + x\,\frac{dv}{dx},$$
if we use the product rule. Using this, our ODE becomes
$$x^2 v + v^2 x^2 - x^2 v\left(v + x\,\frac{dv}{dx}\right) = 0.$$
Cancelling common factors and simplifying, this then becomes the separable ODE
$$\frac{dv}{dx} = \frac{1}{x},$$
which we solve using the method in Section 8.2.1, i.e.
$$\int dv = \int \frac{dx}{x} \implies v(x) = \ln|x| + c,$$
where $c$ is an arbitrary constant. Consequently, using $y(x) = xv(x)$, we find that
$$y(x) = x(\ln|x| + c),$$
is the general solution to our homogeneous ODE.
Activity 8.6 Observe that the ODE in Example 8.8 can also be written as
$$\frac{dy}{dx} - \frac{y}{x} = 1.$$
Verify that the answer we found in that example is correct by solving this linear first-order ODE using the method in Section 8.2.2.
Let's now consider another example where the ODE is homogeneous, but not linear.
Example 8.9 Find the general solution of the ODE
$$x^4 + 5y^4 - 4xy^3\,\frac{dy}{dx} = 0.$$
Here we have functions $M$ and $N$ where
$$M(x, y) = x^4 + 5y^4 \quad\text{and}\quad N(x, y) = -4xy^3,$$
and, clearly, they are both homogeneous of degree 4. As such, we introduce a new function, $v(x)$, such that
$$y(x) = xv(x) \implies \frac{dy}{dx} = v(x) + x\,\frac{dv}{dx},$$
if we use the product rule. Using this, our ODE becomes
$$x^4 + 5x^4 v^4 - 4x^4 v^3\left(v + x\,\frac{dv}{dx}\right) = 0.$$
Cancelling common factors and simplifying, this then becomes the separable ODE
$$\frac{dv}{dx} = \frac{1 + v^4}{4xv^3},$$
which we solve using the method in Section 8.2.1, i.e.
$$\int \frac{4v^3}{1 + v^4}\,dv = \int \frac{dx}{x} \implies \ln|1 + v^4| = \ln|x| + c,$$
where $c$ is an arbitrary constant. So, taking exponentials of both sides, this gives us
$$|1 + v^4| = e^{\ln|x| + c} = e^c e^{\ln|x|} = e^c |x|,$$
so that removing the modulus signs and replacing the arbitrary constant $e^c > 0$ with $A \in \mathbb{R}$, we get
$$v^4 + 1 = Ax \implies v = (Ax - 1)^{1/4},$$
for some arbitrary constant, $A$. Consequently, using $y(x) = xv(x)$, we find that
$$y(x) = x(Ax - 1)^{1/4},$$
is the general solution to our homogeneous ODE.
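Because the algebra in this example is fairly involved, it may help to substitute the answer back into the ODE. The sketch below does this at a few sample points, using a derivative worked out by hand with the product and chain rules; the sample values of $x$ and $A$ are arbitrary, chosen so that $Ax - 1 > 0$, and the function names are illustrative.

```python
def y(x, A):
    # General solution y(x) = x * (A*x - 1)^(1/4) from Example 8.9.
    return x * (A * x - 1) ** 0.25


def dy_dx(x, A):
    # Product and chain rules applied to x * (A*x - 1)^(1/4).
    return (A * x - 1) ** 0.25 + x * (A / 4) * (A * x - 1) ** (-0.75)


def residual(x, A):
    # Left-hand side of the ODE x^4 + 5y^4 - 4xy^3 dy/dx = 0.
    return x**4 + 5 * y(x, A) ** 4 - 4 * x * y(x, A) ** 3 * dy_dx(x, A)


samples = [(x, A) for x in (1.0, 1.5, 2.0) for A in (2.0, 3.0, 5.0)]
worst = max(abs(residual(x, A)) for x, A in samples)
```

The value of `worst` should be zero up to floating-point rounding, which is consistent with the general solution found above.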
Activity 8.7 Verify that the ODE in Example 8.9 is not linear.

Homogeneous ODEs are not the only examples of ODEs that can be solved using the
methods above after some judicious substitution. In this course, if a novel substitution
is needed to make a given ODE solvable, it will usually be given. See, for example,
Exercise 8.2.

8.3 Second-order ODEs

In this section we will consider some methods that will allow us to solve certain second-order ODEs where all occurrences of $y$ and its derivatives are of degree one. In particular, we will be concerned with such ODEs that have the form
$$a\,\frac{d^2y}{dx^2} + b\,\frac{dy}{dx} + cy = f(x),$$
where $a$, $b$ and $c$ are constants and $f(x)$ is some given function of the independent variable, $x$. ODEs of this form are often said to have constant coefficients, referring to the constants multiplying $y$ and its derivatives on the left-hand-side. The method for solving such second-order ODEs is as follows.

8.3.1 Homogeneous second-order ODEs

If the function, $f(x)$, on the right-hand-side of our second-order ODE with constant coefficients is zero, i.e. if our ODE has the form
$$a\,\frac{d^2y}{dx^2} + b\,\frac{dy}{dx} + cy = 0,$$
we say that it is homogeneous.[4] To solve such an ODE, let's suppose that any solution must have the form
$$y(x) = A e^{kx}, \qquad (8.2)$$
where $k$ is a number to be determined and $A$ is an arbitrary constant. Differentiating this twice, we find that
$$\frac{dy}{dx} = Ak e^{kx} \quad\text{and}\quad \frac{d^2y}{dx^2} = Ak^2 e^{kx},$$
and substituting this into the equation we get
$$a(Ak^2 e^{kx}) + b(Ak e^{kx}) + c(A e^{kx}) = 0.$$
Now, we can cancel the $A$ as it is arbitrary and the $e^{kx}$ as it is always non-zero, which leaves us with the auxiliary equation
$$ak^2 + bk + c = 0.$$

[4] Note that this is a different use of the word 'homogeneous' to the one in Sections 6.3.4 and 8.2.3. That is, this is a homogeneous equation whereas in Section 6.3.4 we had homogeneous functions and in Section 8.2.3 we had an ODE which was made up from two such functions in a certain way.

If we solve the auxiliary equation, we can determine the values of $k$ in (8.2) that yield solutions. Of course, when solving a quadratic equation such as this, there are three different cases that can arise, i.e. we can get:

Two real solutions: If the solutions are $k = \alpha$ and $k = \beta$, then we get solutions of the form
$$y(x) = A e^{\alpha x} \quad\text{and}\quad y(x) = B e^{\beta x},$$
where $A$ and $B$ are arbitrary constants. As such, we find that
$$y(x) = A e^{\alpha x} + B e^{\beta x},$$
is the general solution of the second-order ODE.

One real solution: If the solution is $k = \alpha$ (twice), then we get solutions of the form
$$y(x) = A e^{\alpha x} \quad\text{and}\quad y(x) = Bx e^{\alpha x},$$
where $A$ and $B$ are arbitrary constants. As such, we find that
$$y(x) = (A + Bx)\,e^{\alpha x},$$
is the general solution of the second-order ODE.

No real solutions: If the solutions are $k = \alpha \pm \beta\sqrt{-1}$, then, using material which is beyond the scope of this course,[5] we find that
$$y(x) = e^{\alpha x}\Big[A\cos(\beta x) + B\sin(\beta x)\Big],$$
is the general solution of the second-order ODE.

Let's illustrate these three cases by looking at some examples.

[5] If you are interested, this case involves complex numbers which are discussed in Chapter 13 of Binmore and Davies (2002). If you read this, you will then be able to understand the discussion of this type of solution in Section 14.5 of Binmore and Davies (2002). However, as we are not dealing with such things here, you are advised to wait until you tackle complex numbers properly in 175 Further Linear Algebra.


Example 8.10 Find the general solution of the ODE $y'' - y' - 2y = 0$.

As the right-hand-side of this second-order ODE with constant coefficients is zero, it is homogeneous. Its auxiliary equation is given by
$$k^2 - k - 2 = 0 \implies (k - 2)(k + 1) = 0,$$
and so we have two real solutions given by $k = 2$ and $k = -1$. As such, the theory above dictates that
$$y(x) = A e^{2x} + B e^{-x},$$
where $A$ and $B$ are arbitrary constants, is the general solution to this homogeneous second-order ODE.

Example 8.11 Find the general solution of the ODE $y'' + 4y' + 4y = 0$.

As the right-hand-side of this second-order ODE with constant coefficients is zero, it is homogeneous. Its auxiliary equation is given by
$$k^2 + 4k + 4 = 0 \implies (k + 2)^2 = 0,$$
and so we have one real solution given by $k = -2$. As such, the theory above dictates that
$$y(x) = (A + Bx)\,e^{-2x},$$
where $A$ and $B$ are arbitrary constants, is the general solution to this homogeneous second-order ODE.

Example 8.12 Find the general solution of the ODE $y'' - 2y' + 2y = 0$.

As the right-hand-side of this second-order ODE with constant coefficients is zero, it is homogeneous. Its auxiliary equation is given by
$$k^2 - 2k + 2 = 0 \implies (k - 1)^2 + 1 = 0 \implies k - 1 = \pm\sqrt{-1} \implies k = 1 \pm \sqrt{-1},$$
and so we get no real solutions for $k$. As such, the theory above dictates that we take $\alpha = 1$ and $\beta = 1$, so that
$$y(x) = e^{x}\Big[A\cos(x) + B\sin(x)\Big],$$
where $A$ and $B$ are arbitrary constants, is the general solution to this homogeneous second-order ODE.
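The three cases illustrated in Examples 8.10 to 8.12 can be sketched as a small routine that inspects the discriminant of the auxiliary equation. This is a minimal illustration, assuming $a \neq 0$; the function name `solve_auxiliary` and the returned labels are hypothetical, and in the third case the pair returned is $(\alpha, \beta)$ from $k = \alpha \pm \beta\sqrt{-1}$.

```python
import math


def solve_auxiliary(a, b, c):
    # Classify the solutions of a*k^2 + b*k + c = 0, assuming a != 0.
    disc = b * b - 4 * a * c
    if disc > 0:
        # Two real solutions k = alpha and k = beta:
        # y = A*exp(alpha*x) + B*exp(beta*x).
        root = math.sqrt(disc)
        return "two real solutions", ((-b + root) / (2 * a), (-b - root) / (2 * a))
    if disc == 0:
        # One real solution k = alpha (twice): y = (A + B*x)*exp(alpha*x).
        return "one real solution", (-b / (2 * a),)
    # No real solutions, k = alpha +/- beta*sqrt(-1):
    # y = exp(alpha*x)*(A*cos(beta*x) + B*sin(beta*x)).
    alpha = -b / (2 * a)
    beta = math.sqrt(-disc) / abs(2 * a)
    return "no real solutions", (alpha, beta)


example_8_10 = solve_auxiliary(1, -1, -2)
example_8_11 = solve_auxiliary(1, 4, 4)
example_8_12 = solve_auxiliary(1, -2, 2)
```

Testing `disc == 0` exactly is reasonable for small integer coefficients like those in the examples; with general floating-point coefficients a tolerance would be a safer choice.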

8.3.2 Non-homogeneous second-order ODEs

If the function, $f(x)$, on the right-hand-side of our second-order ODE with constant coefficients is non-zero, i.e. it has the form
$$a\,\frac{d^2y}{dx^2} + b\,\frac{dy}{dx} + cy = f(x),$$
with $f(x) \neq 0$, then we say that it is non-homogeneous. To solve such an ODE, we use the following method.

We solve the corresponding homogeneous ODE, to find the function, $y_c(x)$, which is often called the complementary function. That is, we solve
$$a\,\frac{d^2y_c}{dx^2} + b\,\frac{dy_c}{dx} + cy_c = 0,$$
using the auxiliary equation, as in Section 8.3.1, to find $y_c(x)$.

We then seek a function, $y_p(x)$, which is often called the particular integral, that satisfies the non-homogeneous ODE. That is, we want to find a function, $y_p(x)$, that satisfies
$$a\,\frac{d^2y_p}{dx^2} + b\,\frac{dy_p}{dx} + cy_p = f(x),$$
and we will see how to do this in a moment.

Then, having found the complementary function and a particular integral, the general solution to our non-homogeneous ODE is given by
$$y(x) = y_c(x) + y_p(x).$$
That is, the general solution we seek, $y(x)$, is the sum of the two functions we have found.

In particular, observe that the complementary function will contain the two arbitrary constants that make $y(x)$ a general solution whereas the particular integral guarantees that $y(x)$ will give us the correct right-hand-side, i.e. $f(x)$, when we substitute it into the ODE.
Finding particular integrals
To find the particular integral for a given second-order ODE, we look at $f(x)$ and start by taking $y_p(x)$ to be a general function of the same form. For instance, if we find that

$f(x) = a$ for some constant $a$ we take $y_p(x) = \alpha$.
$f(x) = a + bx$ for some constants $a$ and $b$ we take $y_p(x) = \alpha + \beta x$.
$f(x) = a + bx + cx^2$ for some $a$, $b$ and $c$ we take $y_p(x) = \alpha + \beta x + \gamma x^2$.
et cetera.

$f(x) = a e^{rx}$ for some constants $a$ and $r$ we take $y_p(x) = \alpha e^{rx}$.
$f(x) = (a + bx)\,e^{rx}$ for some constants $a$, $b$ and $r$ we take $y_p(x) = (\alpha + \beta x)\,e^{rx}$.
et cetera.

$f(x) = a\sin(rx)$ for some constants $a$ and $r$ we take $y_p(x) = \alpha\sin(rx) + \beta\cos(rx)$.
$f(x) = a\cos(rx)$ for some constants $a$ and $r$ we take $y_p(x) = \alpha\sin(rx) + \beta\cos(rx)$.
et cetera.

Then, by substituting the appropriate general function into our non-homogeneous second-order ODE, we can find the values of the relevant Greek letters and this will then give us the specific function, $y_p(x)$, that will play the role of the particular integral in our solution.
Applying the method
Let's consider an example to see how we would go about determining the particular integral in some of the cases listed above and how we would use this to find the general solution of a non-homogeneous second-order ODE.

Example 8.13 In Example 8.10 we saw that the general solution to the homogeneous second-order ODE $y'' - y' - 2y = 0$ was given by
$$y(x) = A e^{2x} + B e^{-x},$$
where $A$ and $B$ are arbitrary constants. Find the general solution to the non-homogeneous second-order ODE
$$y'' - y' - 2y = f(x),$$
when (i) $f(x) = 8$, (ii) $f(x) = 6x$ and (iii) $f(x) = 20 e^{3x}$.

We know that the complementary function, $y_c(x)$, for this non-homogeneous second-order ODE is given by the general solution to the homogeneous second-order ODE. As such, we know that
$$y_c(x) = A e^{2x} + B e^{-x},$$
where $A$ and $B$ are arbitrary constants. Our first task is to find the particular integral, $y_p(x)$, for each choice of $f(x)$. Once we have this, we can then find the general solution, $y(x)$, of the relevant non-homogeneous second-order ODE by simply taking $y(x) = y_c(x) + y_p(x)$.

For (i), we have $f(x) = 8$ and so we take $y_p(x) = \alpha$ where $\alpha$ is a constant that has to be determined. To find $\alpha$, we note that $y_p'(x)$ and $y_p''(x)$ are both zero which means that substituting them into the non-homogeneous second-order ODE, we get
$$0 - 0 - 2\alpha = 8 \implies \alpha = -4.$$
Thus, $y_p(x) = -4$ is the sought after particular integral and the general solution to our non-homogeneous second-order ODE is
$$y(x) = A e^{2x} + B e^{-x} - 4,$$
using $y(x) = y_c(x) + y_p(x)$.

For (ii), we have $f(x) = 6x$ and so we take $y_p(x) = \alpha + \beta x$ where $\alpha$ and $\beta$ are constants that have to be determined. To find $\alpha$ and $\beta$, we note that $y_p'(x) = \beta$ and $y_p''(x) = 0$ which means that substituting them into the non-homogeneous second-order ODE yields
$$0 - \beta - 2(\alpha + \beta x) = 6x \implies -2\beta x - (2\alpha + \beta) = 6x.$$
Now these two expressions must be the same and so, looking at the coefficient of $x$ on both sides, we see that $\beta$ must be $-3$. Similarly, looking at the constant term on both sides we see that $2\alpha + \beta$ must be zero, so as $\beta = -3$, this means that $\alpha$ must be $3/2$. Thus, $y_p(x) = \frac{3}{2} - 3x$ is the sought after particular integral and the general solution to our non-homogeneous second-order ODE is
$$y(x) = A e^{2x} + B e^{-x} + \frac{3}{2} - 3x,$$
using $y(x) = y_c(x) + y_p(x)$.

For (iii), we have $f(x) = 20 e^{3x}$ and so we take $y_p(x) = \alpha e^{3x}$ where $\alpha$ is a constant that has to be determined. To find $\alpha$, we note that $y_p'(x) = 3\alpha e^{3x}$ and $y_p''(x) = 9\alpha e^{3x}$ which means that substituting them into the non-homogeneous second-order ODE yields
$$9\alpha e^{3x} - 3\alpha e^{3x} - 2(\alpha e^{3x}) = 20 e^{3x} \implies 4\alpha e^{3x} = 20 e^{3x} \implies \alpha = 5.$$
Thus, $y_p(x) = 5 e^{3x}$ is the sought after particular integral and the general solution to our non-homogeneous second-order ODE is
$$y(x) = A e^{2x} + B e^{-x} + 5 e^{3x},$$
using $y(x) = y_c(x) + y_p(x)$.
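Each particular integral found in Example 8.13 can be checked by substituting it into the left-hand-side of the ODE and comparing with $f(x)$. The sketch below does this at a few arbitrary sample points, with the first and second derivatives of each $y_p(x)$ written out by hand; the dictionary `cases` and the function name `residual` are illustrative.

```python
import math


def residual(yp, dyp, d2yp, f, x):
    # Value of y'' - y' - 2y - f(x) for a candidate particular integral.
    return d2yp(x) - dyp(x) - 2 * yp(x) - f(x)


# Each entry holds (y_p, y_p', y_p'', f) for one part of Example 8.13.
cases = {
    "(i)": (lambda x: -4.0, lambda x: 0.0, lambda x: 0.0, lambda x: 8.0),
    "(ii)": (lambda x: 1.5 - 3 * x, lambda x: -3.0, lambda x: 0.0, lambda x: 6 * x),
    "(iii)": (
        lambda x: 5 * math.exp(3 * x),
        lambda x: 15 * math.exp(3 * x),
        lambda x: 45 * math.exp(3 * x),
        lambda x: 20 * math.exp(3 * x),
    ),
}

worst = {
    label: max(abs(residual(*funcs, x)) for x in (-1.0, 0.0, 0.5, 2.0))
    for label, funcs in cases.items()
}
```

Each entry of `worst` should be zero up to rounding. The complementary function is left out deliberately, since it contributes nothing to the right-hand-side.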
A complication
Although we won't spend much time on such things, observe that if the function, $f(x)$,
in our non-homogeneous second-order ODE prompts us to try a particular integral,
$y_p(x)$, that is part of the complementary function (i.e. we can find values of the
arbitrary constants in $y_c(x)$ that make $y_c(x) = y_p(x)$), we have to be more subtle when
we choose our particular integral. However, this subtlety usually involves doing nothing
more than multiplying what we'd normally choose to be our particular integral by $x$.
Let's return to our previous example to see how this works.
Example 8.14 Following on from Example 8.13, find the general solution to the
non-homogeneous second-order ODE
\[ y'' - y' - 2y = f(x), \]
when $f(x) = 18 e^{2x}$.
We know that the complementary function, $y_c(x)$, for this non-homogeneous
second-order ODE is given by
\[ y_c(x) = A e^{2x} + B e^{-x}, \]
where $A$ and $B$ are arbitrary constants. Our task is to find the particular integral,
$y_p(x)$, in the case where $f(x) = 18 e^{2x}$ so that we can deduce the relevant general
solution.

Note: Here we would normally try $y_p(x) = \alpha e^{2x}$ but this is part of the
complementary function since, taking $A = \alpha$ and $B = 0$, we have $y_p(x) = y_c(x)$!
Our first reaction in this case would be to take $y_p(x) = \alpha e^{2x}$ where $\alpha$ is a constant
that has to be determined. To find $\alpha$, we note that $y_p'(x) = 2\alpha e^{2x}$ and $y_p''(x) = 4\alpha e^{2x}$
which means that substituting them into the non-homogeneous second-order ODE,
we get
\[ 4\alpha e^{2x} - 2\alpha e^{2x} - 2(\alpha e^{2x}) = 18 e^{2x}. \]
But now, the left-hand-side turns out to be zero,⁶ meaning that this equation for $\alpha$
has no solutions! That is, we can't determine $\alpha$ if we use this general form for $y_p(x)$!
Thus, the particular integral in this case can't have the general form $y_p(x) = \alpha e^{2x}$ as
we can't find an $\alpha$ that will make it work.
So, following the advice above, we try the next best thing which is our original
choice multiplied by $x$. That is, we try $y_p(x) = \alpha x e^{2x}$ where $\alpha$ is a constant that has
to be determined. To find $\alpha$, we note that writing $y_p(x)$ as $(\alpha x)(e^{2x})$ we can use the
product rule to get
\[ y_p'(x) = (\alpha)(e^{2x}) + (\alpha x)(2 e^{2x}) = (\alpha + 2\alpha x)(e^{2x}), \]
and
\[ y_p''(x) = (2\alpha)(e^{2x}) + (\alpha + 2\alpha x)(2 e^{2x}) = (4\alpha + 4\alpha x)(e^{2x}). \]
So, substituting these into the non-homogeneous second-order ODE, we get
\[ (4\alpha + 4\alpha x)(e^{2x}) - (\alpha + 2\alpha x)(e^{2x}) - 2(\alpha x)(e^{2x}) = 18 e^{2x} \implies 3\alpha e^{2x} = 18 e^{2x}, \]
which means that $\alpha$ can now be determined and is actually equal to 6. Thus,
$y_p(x) = 6x e^{2x}$ is the sought after particular integral and so the general solution to
our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + 6x e^{2x}, \]
using $y(x) = y_c(x) + y_p(x)$.
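As a numerical sanity check on Example 8.14, the following minimal Python sketch estimates the left-hand-side $y'' - y' - 2y$ by central finite differences. The helper name `lhs`, the step size `h` and the sample points are illustrative assumptions and are not part of the method described in the text.

```python
import math

def lhs(y, x, h=1e-3):
    """Approximate y'' - y' - 2y at x using central finite differences."""
    d1 = (y(x + h) - y(x - h)) / (2 * h)
    d2 = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return d2 - d1 - 2 * y(x)

def naive_guess(x):
    # alpha * e^{2x} with alpha = 1: this is part of the complementary function
    return math.exp(2 * x)

def modified_guess(x):
    # the naive guess multiplied by x, with alpha = 6
    return 6 * x * math.exp(2 * x)

points = [0.0, 0.5, 1.0]  # arbitrary sample points

# The naive guess should make the left-hand-side (approximately) zero ...
naive_error = max(abs(lhs(naive_guess, x)) for x in points)
# ... whereas the modified guess should reproduce f(x) = 18 e^{2x}.
modified_error = max(
    abs(lhs(modified_guess, x) - 18 * math.exp(2 * x)) for x in points
)

print(f"naive guess:    max |LHS|            = {naive_error:.2e}")
print(f"modified guess: max |LHS - 18e^(2x)| = {modified_error:.2e}")
```

Both printed values should be small (limited only by the finite-difference error), which is consistent with the naive guess solving the homogeneous ODE and the modified guess solving the non-homogeneous one.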
Another example of this complication arises in Question 3(b) of the sample examination
paper in Appendix A.

⁶ Actually, this shouldn't be a surprise since, taking $A = \alpha$ and $B = 0$ in our complementary function,
we still have a solution to the homogeneous second-order ODE and so putting this into the left-hand-side
must yield zero!

8.4 Systems of first-order ODEs

We now turn our attention to systems of first-order ODEs. For instance, we may be
asked to find the functions $y_1(x)$ and $y_2(x)$ that simultaneously satisfy the ODEs
\[ \frac{dy_1}{dx} = f_1(y_1, y_2, x) \quad\text{and}\quad \frac{dy_2}{dx} = f_2(y_1, y_2, x), \]
where we are given the functions $f_1$ and $f_2$. Generally, $y_1$ and $y_2$ will appear on the
right-hand-sides of both these first-order ODEs and, in such cases, we say that they are
coupled as we can't solve one of them without using information contained in the other.
The procedure that we shall use to solve these involves rewriting the system of
first-order ODEs as a second-order ODE which can then be solved using the method
outlined in the previous section.

8.4.1 Simple systems of first-order ODEs

A simple system of coupled first-order ODEs will only involve linear combinations of
$y_1(x)$ and $y_2(x)$ on the right-hand-side, i.e. it will have the form
\[ \frac{dy_1}{dx} = a y_1(x) + b y_2(x) \quad\text{and}\quad \frac{dy_2}{dx} = c y_1(x) + d y_2(x), \]
for some constants $a$, $b$, $c$ and $d$. The procedure for solving this involves differentiating
the first equation (say) with respect to $x$ so that we get
\[ \frac{d^2 y_1}{dx^2} = a\frac{dy_1}{dx} + b\frac{dy_2}{dx}, \]
and then, using the second equation, we find that
\[ \frac{d^2 y_1}{dx^2} = a\frac{dy_1}{dx} + b\bigl(c y_1(x) + d y_2(x)\bigr), \]
which means that we have
\[ \frac{d^2 y_1}{dx^2} - a\frac{dy_1}{dx} - bc\,y_1(x) - bd\,y_2(x) = 0. \]
Now, the first equation can be rearranged to give
\[ b y_2(x) = \frac{dy_1}{dx} - a y_1(x), \]
and so, if we substitute this in, we end up with
\[ \frac{d^2 y_1}{dx^2} - (a + d)\frac{dy_1}{dx} - (bc - ad)\,y_1(x) = 0, \]
which is an homogeneous second-order ODE with constant coefficients which we can
solve using the method in Section 8.3.1 to find $y_1(x)$. Of course, having done this, we
can then use the first of the original equations (say) to find $y_2(x)$. Let's look at an
example to see how this works.
Example 8.15 Find the functions $y_1(x)$ and $y_2(x)$ that satisfy the system of
first-order ODEs given by
\[ \frac{dy_1}{dx} = 2y_1 + 4y_2 \quad\text{and}\quad \frac{dy_2}{dx} = 3y_1 + 3y_2, \]
with the conditions $y_1(0) = 5$ and $y_2(0) = -2$.

We will solve this by rewriting this system as a second-order ODE in $y_1(x)$. To do
this we note that, rearranging the first ODE gives us
\[ y_2 = \frac{1}{4}\left(\frac{dy_1}{dx} - 2y_1\right), \tag{8.3} \]
and if we differentiate this with respect to $x$ we get
\[ \frac{dy_2}{dx} = \frac{1}{4}\left(\frac{d^2 y_1}{dx^2} - 2\frac{dy_1}{dx}\right). \]
Consequently, if we substitute these two expressions into the second ODE, we get
\[ \frac{1}{4}\left(\frac{d^2 y_1}{dx^2} - 2\frac{dy_1}{dx}\right) = 3y_1 + \frac{3}{4}\left(\frac{dy_1}{dx} - 2y_1\right), \]
and this can be rearranged to get
\[ \frac{d^2 y_1}{dx^2} - 5\frac{dy_1}{dx} - 6y_1 = 0, \]
which is our sought after second-order ODE in $y_1(x)$. As it is an homogeneous
second-order ODE with constant coefficients, this can be solved using the method in
Section 8.3.1. The auxiliary equation is
\[ k^2 - 5k - 6 = 0 \implies (k + 1)(k - 6) = 0, \]
which has two real solutions given by $k = -1$ and $k = 6$ which means that the
general solution for $y_1(x)$ is
\[ y_1(x) = A e^{-x} + B e^{6x}, \]
for arbitrary constants $A$ and $B$. To find the general solution for $y_2(x)$, we note that
using (8.3) and the fact that
\[ y_1'(x) = -A e^{-x} + 6B e^{6x}, \]
we get
\[ y_2(x) = \frac{1}{4}\Bigl([-A e^{-x} + 6B e^{6x}] - 2[A e^{-x} + B e^{6x}]\Bigr) = \frac{1}{4}\left(-3A e^{-x} + 4B e^{6x}\right), \]
in terms of the same arbitrary constants $A$ and $B$ as before. Thus, the general
solution to this system of first-order ODEs is
\[ y_1(x) = A e^{-x} + B e^{6x} \quad\text{and}\quad y_2(x) = -\frac{3}{4} A e^{-x} + B e^{6x}, \]
for arbitrary constants $A$ and $B$.
However, we are also given the conditions $y_1(0) = 5$ and $y_2(0) = -2$ which imply that
\[ 5 = A + B \quad\text{and}\quad -2 = -\frac{3}{4} A + B. \]
Solving these two equations simultaneously, say by subtracting one from the other,
we see that $7 = 7A/4$ which gives $A = 4$ and then, the first equation gives $B = 1$.
Consequently, we find that
\[ y_1(x) = 4 e^{-x} + e^{6x} \quad\text{and}\quad y_2(x) = -3 e^{-x} + e^{6x}, \]
is the particular solution of this system of first-order ODEs given the conditions
$y_1(0) = 5$ and $y_2(0) = -2$.
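The closed-form answer to Example 8.15 can be cross-checked by integrating the system numerically. The sketch below uses a hand-written classical fourth-order Runge-Kutta step (a standard numerical technique, not the method of this section) with the conditions $y_1(0) = 5$ and $y_2(0) = -2$; the function names, step count and end point are illustrative assumptions.

```python
import math

def rk4(f, y0, x_end, steps):
    """Integrate y' = f(x, y) from x = 0 to x_end with classical RK4 (y is a list)."""
    h = x_end / steps
    x, y = 0.0, list(y0)
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(x + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        x += h
    return y

def system(x, y):
    """Right-hand sides of the coupled ODEs in Example 8.15."""
    y1, y2 = y
    return [2 * y1 + 4 * y2, 3 * y1 + 3 * y2]

x_end = 1.0  # arbitrary end point for the comparison
numeric = rk4(system, [5.0, -2.0], x_end, 2000)
exact = [4 * math.exp(-x_end) + math.exp(6 * x_end),
         -3 * math.exp(-x_end) + math.exp(6 * x_end)]
rel_errors = [abs(n - e) / abs(e) for n, e in zip(numeric, exact)]

print("numeric:", numeric)
print("exact:  ", exact)
print("relative errors:", rel_errors)
```

Agreement between the two rows to many significant figures supports the particular solution found above; a large discrepancy would point to a slip in the algebra.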


It is worth noting that systems of equations of the form encountered here can also be
solved using diagonalisation in much the same way as systems of difference equations
are solved in Section 11.2 of 173 Algebra.

8.4.2 Other systems of first-order ODEs

Systems of first-order ODEs become more complicated when they involve more
complicated functions on the right-hand-side. The method for solving them remains the
same, but a little more care must be taken as the following example illustrates.
Example 8.16 Find the functions $y_1(x)$ and $y_2(x)$ that satisfy the system of
first-order ODEs given by
\[ \frac{dy_1}{dx} = -4y_1 + 2y_2 \quad\text{and}\quad \frac{dy_2}{dx} = -2y_1 + 4x^2 + 4, \]
with the conditions $y_1(0) = 1$ and $y_2(0) = 7/2$.
We will solve this by rewriting this system as a second-order ODE in $y_1(x)$. To do
this we note that, rearranging the first ODE gives us
\[ y_2 = \frac{1}{2}\left(\frac{dy_1}{dx} + 4y_1\right), \tag{8.4} \]
and if we differentiate this with respect to $x$ we get
\[ \frac{dy_2}{dx} = \frac{1}{2}\left(\frac{d^2 y_1}{dx^2} + 4\frac{dy_1}{dx}\right). \]
Consequently, if we substitute this derivative into the second ODE, we get
\[ \frac{1}{2}\left(\frac{d^2 y_1}{dx^2} + 4\frac{dy_1}{dx}\right) = -2y_1 + 4x^2 + 4, \]
and this can be rearranged to get
\[ \frac{d^2 y_1}{dx^2} + 4\frac{dy_1}{dx} + 4y_1 = 8x^2 + 8, \tag{8.5} \]
which is our sought after second-order ODE in $y_1(x)$. As it is a non-homogeneous
second-order ODE with constant coefficients, this can be easily solved using the
method of Section 8.3.2. In particular:
The homogeneous second-order ODE that corresponds to (8.5) is
\[ \frac{d^2 y_1}{dx^2} + 4\frac{dy_1}{dx} + 4y_1 = 0, \]
and so the auxiliary equation is
\[ k^2 + 4k + 4 = 0 \implies (k + 2)^2 = 0, \]
which has one real solution given by $k = -2$ (twice). Consequently, the
complementary function for $y_1(x)$ is
\[ y_1(x) = (A + Bx) e^{-2x}, \]
where $A$ and $B$ are arbitrary constants.

The right-hand-side of (8.5) is a quadratic and this suggests that we try a
particular integral of the form
\[ y_1(x) = \alpha x^2 + \beta x + \gamma. \]
We differentiate this twice to get
\[ y_1'(x) = 2\alpha x + \beta \quad\text{and}\quad y_1''(x) = 2\alpha, \]
so that, on substituting these into (8.5), our equation becomes
\[ 2\alpha + 4(2\alpha x + \beta) + 4(\alpha x^2 + \beta x + \gamma) = 8x^2 + 8. \]
Then, equating the coefficients of the terms on both sides we see that, from the
$x^2$ term, we get
\[ 4\alpha = 8 \implies \alpha = 2, \]
which means that, from the $x$ term, we get
\[ 8\alpha + 4\beta = 0 \implies \beta = -4, \]
and so, from the constant term, we get
\[ 2\alpha + 4\beta + 4\gamma = 8 \implies \gamma = 5. \]
Consequently, we see that
\[ y_1(x) = 2x^2 - 4x + 5, \]
is the particular integral for $y_1(x)$.

The general solution to (8.5) is then given by the sum of its complementary
function and its particular integral, i.e. we have
\[ y_1(x) = (A + Bx) e^{-2x} + 2x^2 - 4x + 5, \]
where $A$ and $B$ are arbitrary constants.
We can now use this to find the general solution for $y_2(x)$ since, using (8.4) and the
fact that
\[ \frac{dy_1}{dx} = B e^{-2x} - 2(A + Bx) e^{-2x} + 4x - 4 = (B - 2A - 2Bx) e^{-2x} + 4x - 4, \]
we get
\[ y_2(x) = \frac{1}{2}\Bigl([(B - 2A - 2Bx) e^{-2x} + 4x - 4] + 4[(A + Bx) e^{-2x} + 2x^2 - 4x + 5]\Bigr). \]
So, simplifying this, we find that
\[ y_2(x) = \left(\tfrac{1}{2}B + A + Bx\right) e^{-2x} + 4x^2 - 6x + 8, \]
is the corresponding general solution for $y_2(x)$ in terms of the same arbitrary
constants $A$ and $B$ as before.
Once we have these general solutions, we can use the initial conditions $y_1(0) = 1$ and
$y_2(0) = 7/2$ to get the equations
\[ 1 = A + 5 \quad\text{and}\quad \frac{7}{2} = \frac{1}{2}B + A + 8, \]
which give us $A = -4$ and, hence, $B = -1$. Consequently, using these values, we find
that
\[ y_1(x) = -(4 + x) e^{-2x} + 2x^2 - 4x + 5 \quad\text{and}\quad y_2(x) = -\left(\tfrac{9}{2} + x\right) e^{-2x} + 4x^2 - 6x + 8, \]
are the sought after particular solutions.
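A quick way to test the particular solutions of Example 8.16 is to substitute them back into the two first-order ODEs. The following Python sketch does this numerically, taking the system to be $y_1' = -4y_1 + 2y_2$ and $y_2' = -2y_1 + 4x^2 + 4$ with $y_1(0) = 1$ and $y_2(0) = 7/2$; the derivative is estimated by a central difference, and the step size and sample points are assumptions chosen for illustration.

```python
import math

def y1(x):
    return -(4 + x) * math.exp(-2 * x) + 2 * x**2 - 4 * x + 5

def y2(x):
    return -(4.5 + x) * math.exp(-2 * x) + 4 * x**2 - 6 * x + 8

def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def residuals(x):
    """Left-hand side minus right-hand side for each ODE at x."""
    r1 = derivative(y1, x) - (-4 * y1(x) + 2 * y2(x))
    r2 = derivative(y2, x) - (-2 * y1(x) + 4 * x**2 + 4)
    return r1, r2

sample_points = (0.0, 0.5, 1.0, 2.0)  # arbitrary choice
worst = max(abs(r) for x in sample_points for r in residuals(x))

print("y1(0) =", y1(0.0), " y2(0) =", y2(0.0))
print("largest residual:", worst)
```

The initial values printed should match the given conditions, and the largest residual should be close to zero up to finite-difference error.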

8.5 Applications of ODEs

Differential equations are used widely in economics-based subjects and, in Section 5.4.1,
we saw a very simple application when we considered marginal functions. Here, we will
consider a few more examples that are a bit more sophisticated.

8.5.1 Determining demand functions from elasticities

In Section 3.3.3, we saw that the elasticity of demand, $\varepsilon(p)$, is defined by
\[ \varepsilon(p) = -\frac{p}{q}\frac{dq}{dp}, \]
where $q = q^D(p)$ is the demand function. If we know the elasticity of demand, we can
use this and our knowledge of ODEs to determine the demand function.
Example 8.17 Suppose that the elasticity of demand is a constant, i.e. $\varepsilon(p) = r$ for
all $p$, where $r$ is a positive constant. Find the demand function if $q^D(1) = 2$.
Using the definition of the elasticity of demand, this gives us
\[ -\frac{p}{q}\frac{dq}{dp} = r \implies \frac{1}{q}\frac{dq}{dp} = -\frac{r}{p}, \]
and so this is a separable first-order ODE. Solving this using the method in
Section 8.2.1, we write this as
\[ \int \frac{1}{q}\,dq = -\int \frac{r}{p}\,dp \]
and determine the integrals to get
\[ \ln|q| = -r \ln|p| + c, \]
where $c$ is an arbitrary constant. Then, rewriting this as
\[ \ln|q| = \ln|p|^{-r} + c, \]
we can take exponentials of both sides, to get
\[ q = e^{\ln|p|^{-r} + c} = e^c p^{-r}, \]
where we can remove the modulus signs since, economically, $q$ and $p$ are both
positive. Then, using the fact that $q^D(1) = 2$, we see that $e^c = 2$ and so
\[ q = q^D(p) = \frac{2}{p^r}, \]
is the sought after demand function.
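To see that the demand function found in Example 8.17 really does have constant elasticity, one can estimate $-(p/q)\,dq/dp$ numerically at a few prices. In the sketch below the value `r = 1.5`, the helper names and the sample prices are hypothetical choices made purely for illustration.

```python
def demand(p, r):
    """Demand function from Example 8.17: q = 2 / p^r."""
    return 2 / p**r

def elasticity(q, p, h=1e-6):
    """Estimate -(p/q) dq/dp using a central difference for dq/dp."""
    dq_dp = (q(p + h) - q(p - h)) / (2 * h)
    return -p / q(p) * dq_dp

r = 1.5  # hypothetical value of the constant elasticity
prices = (0.5, 1.0, 2.0, 10.0)  # arbitrary sample prices
estimates = [elasticity(lambda p: demand(p, r), p) for p in prices]

print("q(1) =", demand(1.0, r))
print("estimated elasticities:", estimates)
```

Each estimate should be close to the chosen `r`, whatever the price, and the value at $p = 1$ should reproduce the condition $q^D(1) = 2$.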


Activity 8.8 How does the demand function found in Example 8.17 behave as
$p \to 0^+$ and as $p \to \infty$?

8.5.2 Continuous price adjustment

Suppose that the price of some commodity varies continuously with time and that its
initial price is not equal to its equilibrium price. We might expect that, as time
progresses, the price of the commodity will tend to its equilibrium price but to be sure,
we need to have a model of how the price of the commodity is varying with time. One
such model involves looking at how the rate of change of the price of the commodity is
related to the excess of demand over supply.
Suppose that the price of the commodity as a function of time is $p(t)$ and that the
market for this commodity is governed by the demand function, $q^D(p)$, and the supply
function, $q^S(p)$. This means that, at any time, $t$, as the price is $p(t)$, the quantity being
demanded is given by $q^D(p(t))$ and the quantity being supplied is given by $q^S(p(t))$. As
such, we can define the excess of demand over supply to be the function of $p(t)$ given by
\[ E(p(t)) = q^D(p(t)) - q^S(p(t)), \]
i.e. the difference between these two quantities. Clearly, this means that if $p(t)$ is such
that:
$E(p(t)) > 0$, demand outstrips supply and so the price should rise, i.e. $p'(t) > 0$.
$E(p(t)) = 0$, demand equals supply and we should have equilibrium, i.e. $p'(t) = 0$.
$E(p(t)) < 0$, supply outstrips demand and so the price should fall, i.e. $p'(t) < 0$.
This suggests that the rate of change of the market price with time, i.e. $p'(t)$, should be
given by some function $f$ of the excess of demand over supply, $E(p(t))$, i.e. we have a
model where
\[ \frac{dp}{dt} = f\bigl(E(p(t))\bigr) \quad\text{with}\quad E(p(t)) = q^D(p(t)) - q^S(p(t)). \]
Then, by solving this first-order ODE, we can find out how the market price varies with
time and hence assess the stability of the market by considering what it does as $t \to \infty$.
To see how this works, let's consider an example.

Example 8.18 A market is governed by the demand and supply functions
\[ q^D(p) = 5 - 2p \quad\text{and}\quad q^S(p) = 3p - 1, \]
respectively. If the rate of change of the market price is given by three times the
excess of demand over supply, find the ODE that describes how $p(t)$ changes with
time.
We start by calculating the excess of demand over supply which is given by
\[ E(p(t)) = q^D(p(t)) - q^S(p(t)) = [5 - 2p(t)] - [3p(t) - 1] = 6 - 5p(t). \]
We then know that the rate of change of the market price is given by three times
the excess, i.e.
\[ \frac{dp}{dt} = 3E(p(t)) = 3[6 - 5p(t)] = -15\left(p(t) - \frac{6}{5}\right). \]
This is a separable first-order ODE and we can easily solve it using the method in
Section 8.2.1.
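Before solving the ODE of Example 8.18 analytically, it can be instructive to simulate it. The sketch below applies the forward Euler method (a simple numerical scheme, not the analytic method of Section 8.2.1) to $dp/dt = 3\,[q^D(p) - q^S(p)]$; the initial prices, time horizon and step count are hypothetical values chosen for illustration.

```python
def simulate_price(p0, t_end=2.0, steps=2000):
    """Forward-Euler integration of dp/dt = 3 * (q_D(p) - q_S(p)) from p(0) = p0."""
    h = t_end / steps
    p = p0
    for _ in range(steps):
        excess = (5 - 2 * p) - (3 * p - 1)  # demand minus supply at price p
        p += h * 3 * excess
    return p

# Hypothetical initial prices below, at and above the price where demand equals supply.
for p0 in (0.2, 1.2, 3.0):
    print(f"p(0) = {p0}: p(2) is approximately {simulate_price(p0):.6f}")
```

Under these assumptions the simulated price settles near the price at which $q^D(p) = q^S(p)$, i.e. $p = 6/5$, regardless of the starting value.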
Activity 8.9 Solve the separable first-order ODE found in Example 8.18 and use it
to determine how the market price changes over time if the initial price is p(0). How
does the market price behave in the long-term?

8.5.3 Continuous cash flows

In Section 6.1.5 of 173 Algebra, you saw how to find the balance, $B(t)$, of a bank
account that utilises continuously compounded interest at an annual equivalent rate of
$100r\%$. Another way of thinking about this is to say that, at any time, $t$, the rate of
increase of the balance, $B'(t)$, is given by $rB(t)$. This means that we have
\[ \frac{dB}{dt} = rB(t), \]
and this is a simple separable first-order ODE that can be solved, using the method in
Section 8.2.1, to get
\[ B(t) = P e^{rt}, \]
where $B(0) = P$ is the initial balance. As such, we can see that this way of thinking
about continuous compounding gives us an alternative way of deriving the formula you
saw in Section 6.1.5 of 173 Algebra.
Activity 8.10 Verify that solving this separable first-order ODE will give the
solution above.
However, we can actually use ODEs to find the balance of a bank account which uses
continuously compounded interest in the presence of more complicated investment
schemes. For instance, if we take the bank account above and suppose that money is
added to the account at a rate given by $f(t)$,⁷ we see that the balance, $B(t)$, is now
given by
\[ \frac{dB}{dt} = rB(t) + f(t) \implies \frac{dB}{dt} - rB(t) = f(t), \]
which is a linear first-order ODE. And, of course, we could also have the situation where
money is deducted from the account at a rate given by $f(t)$,⁸ and then we see that the
balance, $B(t)$, would be given by
\[ \frac{dB}{dt} = rB(t) - f(t) \implies \frac{dB}{dt} - rB(t) = -f(t), \]
which is another linear first-order ODE. Let's consider an example.


Example 8.19 Suppose that we have two bank accounts, X and Y, that pay
continuously compounded interest at annual equivalent rates of $100r_X\%$ and $100r_Y\%$
respectively. We initially invest an amount $P_X$ in account X and, at each instant,
pay the interest accrued into account Y whose initial balance is $P_Y$. Find the ODE
that determines the balance in account Y at any time $t \ge 0$.
Let $B_X(t)$ and $B_Y(t)$ denote the balance in accounts X and Y respectively at time $t$.
The first thing to notice is that the rate of change of $B_X(t)$ is given by
\[ \frac{dB_X}{dt} = r_X B_X(t) - r_X B_X(t) = 0, \]
as, at every instant, any interest accrued is immediately deducted from account X so
that it can be paid into account Y. This means that $B_X(t)$ must be a constant and,
in particular, this constant must be the initial balance $P_X$. Thus, we find that
$B_X(t) = P_X$ for all $t \ge 0$ and the interest accrued at each time $t$ (which we
immediately pay into account Y) is given by $r_X P_X$.

The rate of change of $B_Y(t)$ is then given by the sum of $r_Y B_Y(t)$, which is the
continuously compounded interest accrued on the balance in account Y, and $r_X P_X$
which, as we have just seen, is the continuously compounded interest accrued in
account X. That is, for $t \ge 0$, we have
\[ \frac{dB_Y}{dt} = r_Y B_Y(t) + r_X P_X \implies \frac{dB_Y}{dt} - r_Y B_Y(t) = r_X P_X, \]
which is a linear first-order ODE and we can easily solve this, subject to the
condition that $B_Y(0) = P_Y$, using the method in Section 8.2.2.
Activity 8.11 Solve the linear first-order ODE found in Example 8.19 and use it to
determine the balance in account Y at any time $t \ge 0$.

⁷ That is, at each time, $t$, the balance increases by $f(t)$.
⁸ That is, at each time, $t$, the balance decreases by $f(t)$.

8.5.4 Market trends

In some markets, the equilibrium price will change with time and so it is useful for
consumers to try and anticipate trends. That is, the consumer will keep an eye on the
current equilibrium price, but they will also look at the rate at which the price is rising
or falling and whether this rate of change is speeding up or slowing down. We can
represent these three considerations mathematically by using $p(t)$, $p'(t)$ and $p''(t)$
respectively and, by considering how these affect the quantity being supplied or
demanded, we can model how the price itself is varying with time by using an ODE.
Let's look at an example.
Example 8.20 Suppose that the demand for a certain commodity is given by
\[ q^D(p) = 9 - 2p + 6\frac{dp}{dt} - 2\frac{d^2 p}{dt^2}, \]
and that supply is determined by
\[ q^S(p) = -3 + 4p - \frac{dp}{dt} - \frac{d^2 p}{dt^2}. \]
Find the ODE that determines the equilibrium price at any time $t \ge 0$.
Here we have linear supply and demand functions which have been modified to take
a trend into account. To find the equilibrium price at any time $t \ge 0$, we need to
determine the function, $p(t)$, that makes the amount supplied equal to the amount
demanded, i.e.
\[ -3 + 4p(t) - \frac{dp}{dt} - \frac{d^2 p}{dt^2} = 9 - 2p(t) + 6\frac{dp}{dt} - 2\frac{d^2 p}{dt^2}. \]
But, rearranging this, we get the non-homogeneous second-order ODE with constant
coefficients given by
\[ \frac{d^2 p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 12, \]
which we can solve using the method in Section 8.3.2.
Activity 8.12 Solve the second-order ODE found in Example 8.20 and use it to
determine how the equilibrium price changes if $p(0) = 7$ and $p'(0) = 15$. How does
this equilibrium price behave in the long-term?

Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
identify and solve separable, linear and homogeneous first-order ODEs and other first-order ODEs that can be solved by a given substitution;
identify and solve homogeneous and non-homogeneous second-order ODEs with constant coefficients;
solve coupled systems of first-order ODEs by rewriting them as a second-order ODE with constant coefficients;
solve problems from economics-based subjects that involve applications of ODEs.

Solutions to activities
Solution to activity 8.1
Looking at the given ODEs, we see that:
(a) is second-order of first degree,
(b) is second-order of second degree, and
(c) is third-order of first degree.
Here we find the highest order derivative to determine the order and then the algebraic
degree (or power) of this derivative determines the degree.
Solution to activity 8.2
We have the general solution
\[ y(x) = x^2 + x + c, \]
and we want to find the particular solutions corresponding to:

$y(0) = 0$. So, setting $x = 0$ in both sides of this expression and using the condition,
we get
\[ y(0) = 0^2 + 0 + c \implies 0 = c, \]
which means that we must take $c = 0$ in the general solution to see that
\[ y(x) = x^2 + x, \]
is the particular solution to the ODE given that $y(0) = 0$.

$y(0) = -1$. So, setting $x = 0$ in both sides of this expression and using the
condition, we get
\[ y(0) = 0^2 + 0 + c \implies -1 = c, \]
which means that we must take $c = -1$ in the general solution to see that
\[ y(x) = x^2 + x - 1, \]
is the particular solution to the ODE given that $y(0) = -1$.

$y(2) = 7$. So, setting $x = 2$ in both sides of this expression and using the condition,
we get
\[ y(2) = 2^2 + 2 + c \implies 7 = 6 + c, \]
which means that we must take $c = 1$ in the general solution to see that
\[ y(x) = x^2 + x + 1, \]
is the particular solution to the ODE given that $y(2) = 7$. Observe that this is the
same particular solution as the one we found with $y(0) = 1$ in Example 8.2 but that
it arises from a condition that specifies information about $y(x)$ at a different value
of $x$.
Solution to activity 8.3
We have the general solution
\[ y(x) = 1 + A e^{x^2}, \]
and we want to find the particular solutions corresponding to:

$y(0) = 2$. So, setting $x = 0$ in both sides of this expression and using the condition,
we get
\[ y(0) = 1 + A e^0 \implies 2 = 1 + A, \]
which means that we must take $A = 1$ in the general solution to see that
\[ y(x) = 1 + e^{x^2}, \]
is the particular solution to the ODE given that $y(0) = 2$.

$y(0) = 0$. So, setting $x = 0$ in both sides of this expression and using the condition,
we get
\[ y(0) = 1 + A e^0 \implies 0 = 1 + A, \]
which means that we must take $A = -1$ in the general solution to see that
\[ y(x) = 1 - e^{x^2}, \]
is the particular solution to the ODE given that $y(0) = 0$.

If we want a value of $y(1)$ that will give us the same particular solution as the one found
in (a), i.e.
\[ y(x) = 1 + e^{x^2}, \]
we put $x = 1$ into both sides of this expression to get
\[ y(1) = 1 + e^1 = 1 + e. \]
That is, the condition $y(1) = 1 + e$ gives us the same particular solution as the one we
found in (a).
Solution to activity 8.4
Here we have to solve the separable first-order ODE
\[ \frac{1}{y + 3}\frac{dy}{dx} = \frac{2}{x}, \]
with $M(x) = 2/x$ and $N(y) = (y + 3)^{-1}$. Using the method in Section 8.2.1, we write
this as
\[ \int \frac{2}{x}\,dx = \int \frac{dy}{y + 3} \]
and determine the integrals to get
\[ 2\ln|x| + c = \ln|y + 3|, \]
where $c$ is an arbitrary constant. Taking exponentials of both sides we get
\[ |y + 3| = e^{2\ln|x| + c} = e^c e^{\ln|x|^2} = e^c |x|^2. \]
Now, $|x|^2 = x^2$ and removing the modulus on the left-hand-side, we get
\[ y + 3 = \pm e^c x^2 \implies y = -3 \pm e^c x^2, \]
and so, as before, the general solution is
\[ y(x) = -3 + Ax^2, \]
where $A \in \mathbb{R}$ is an arbitrary constant.
Solution to activity 8.5
The given ODE can be left as it is or rearranged to give
\[ \frac{1}{y + e^x}\frac{dy}{dx} = 1, \]
but, either way, it is not separable because we can't separate the variables.

Solution to activity 8.6
Here we have to solve the linear first-order ODE
\[ \frac{dy}{dx} - \frac{y}{x} = 1, \]
with $P(x) = -1/x$ and $Q(x) = 1$. Using the method in Section 8.2.2, we start by finding
the integrating factor, $\mu(x)$, by determining the integral
\[ \int P(x)\,dx = -\int \frac{1}{x}\,dx = -\ln|x| + c, \]
and so we see that $-\ln x$ is an antiderivative of $-1/x$. This means that the integrating
factor is
\[ \mu(x) = e^{-\ln x} = e^{\ln x^{-1}} = x^{-1}, \]
and so we have
\[ \mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies x^{-1}y(x) = \int x^{-1}\,dx \implies x^{-1}y(x) = \ln|x| + c, \]
where $c$ is an arbitrary constant. As such, we have
\[ y(x) = x(\ln|x| + c), \]
which is the same general solution as before.

Solution to activity 8.7
If we try and write the ODE
\[ x^4 + 5y^4 - 4xy^3\frac{dy}{dx} = 0 \]
in the form
\[ \frac{dy}{dx} + P(x)y = Q(x), \]
the best we can do is
\[ \frac{dy}{dx} - \frac{5}{4x}\,y = \frac{x^3}{4y^3}, \]
and this is not linear due to the presence of the $1/y^3$ on the right-hand-side.
Solution to activity 8.8
In Example 8.17, we found that $q^D(p) = 2/p^r$ where $r$ is a positive constant. As such,
we can see that $q^D(p) \to \infty$ as $p \to 0^+$ and $q^D(p) \to 0$ as $p \to \infty$.
Solution to activity 8.9
Using the method in Section 8.2.1, we write the ODE as
\[ \int \frac{dp}{p - \frac{6}{5}} = \int (-15)\,dt \]
and determine the integrals to get
\[ \ln\left|p - \frac{6}{5}\right| = -15t + c, \]
where $c$ is an arbitrary constant. Taking exponentials of both sides, this gives us
\[ \left|p - \frac{6}{5}\right| = e^{-15t + c} = e^c e^{-15t}. \]
Now, we remove the modulus bars and compensate for this loss by replacing $e^c$ (which
must be positive) with the constant $A$ (which can be negative), to get
\[ p(t) = \frac{6}{5} + A e^{-15t}, \]
which is the general solution. Then, given that the initial price is $p(0)$, we see that
\[ p(0) = \frac{6}{5} + A e^0 \implies A = p(0) - \frac{6}{5}, \]
and so, we have the particular solution
\[ p(t) = \frac{6}{5} + \left(p(0) - \frac{6}{5}\right) e^{-15t}, \]
which tells us how the market price changes over time if the initial price is $p(0)$. In
particular, if we have a $p(0)$ such that:
$p(0) > 6/5$, since $e^{-15t} \to 0$ as $t \to \infty$, $p(t)$ will decrease to $6/5$.
$p(0) = 6/5$, we find that $p(t) = 6/5$ for all $t \ge 0$.
$p(0) < 6/5$, since $e^{-15t} \to 0$ as $t \to \infty$, $p(t)$ will increase to $6/5$.

Indeed, as you should be able to verify, $6/5$ is the equilibrium price for this market and
so, in this case, regardless of the choice of $p(0)$, the market is either in equilibrium or
tends to equilibrium in the long-term.

Solution to activity 8.10
To solve the separable first-order ODE
\[ \frac{dB}{dt} = rB(t) \]
we write it as
\[ \int \frac{dB}{B} = \int r\,dt, \]
and determine the integrals to get
\[ \ln|B| = rt + c \implies B = e^{rt + c} = e^c e^{rt}, \]
where we can remove the modulus sign since, economically, $B$ is positive. Then, using
the fact that $B(0) = P$, we see that $e^c = P$ and so
\[ B(t) = P e^{rt}, \]
as we would expect.
Solution to activity 8.11
We have to solve the linear first-order ODE
\[ \frac{dB_Y}{dt} - r_Y B_Y(t) = r_X P_X, \]
subject to the condition that $B_Y(0) = P_Y$. The integrating factor is given by
\[ e^{\int (-r_Y)\,dt} = e^{-r_Y t}, \]
as $-r_Y t$ is an antiderivative of $-r_Y$ and this means that we have
\[ e^{-r_Y t} B_Y = \int r_X P_X e^{-r_Y t}\,dt \implies e^{-r_Y t} B_Y = -P_X \frac{r_X}{r_Y} e^{-r_Y t} + c, \]
where $c$ is an arbitrary constant. As such, our general solution is
\[ B_Y(t) = -P_X \frac{r_X}{r_Y} + c\,e^{r_Y t}. \]
Then, as $B_Y(0) = P_Y$, we have
\[ P_Y = -P_X \frac{r_X}{r_Y} + c \implies c = P_Y + P_X \frac{r_X}{r_Y}, \]
and so the required particular solution is
\[ B_Y(t) = -P_X \frac{r_X}{r_Y} + \left(P_Y + P_X \frac{r_X}{r_Y}\right) e^{r_Y t} = P_Y e^{r_Y t} + P_X \frac{r_X}{r_Y}\left(e^{r_Y t} - 1\right), \]
and this tells us the balance in account Y at any time $t \ge 0$.
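The formula for $B_Y(t)$ can be checked numerically by substituting it back into $dB_Y/dt - r_Y B_Y = r_X P_X$. The sketch below does this with a central-difference derivative; the interest rates and initial balances are hypothetical numbers chosen only to make the check concrete.

```python
import math

r_x, r_y = 0.05, 0.03      # hypothetical interest rates for accounts X and Y
p_x, p_y = 1000.0, 200.0   # hypothetical initial balances

def balance_y(t):
    """Closed-form balance of account Y from the solution to Activity 8.11."""
    return p_y * math.exp(r_y * t) + p_x * (r_x / r_y) * (math.exp(r_y * t) - 1)

def ode_residual(t, h=1e-4):
    """Estimate dB_Y/dt - r_Y * B_Y - r_X * P_X, which should be close to zero."""
    dB = (balance_y(t + h) - balance_y(t - h)) / (2 * h)
    return dB - r_y * balance_y(t) - r_x * p_x

times = (0.0, 1.0, 5.0, 10.0)  # arbitrary sample times
worst = max(abs(ode_residual(t)) for t in times)

print("B_Y(0) =", balance_y(0.0))
print("largest residual:", worst)
```

The printed initial balance should equal the chosen $P_Y$, and the residual should be negligible compared with the size of the balances involved.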


Solution to activity 8.12
To solve the non-homogeneous second-order ODE with constant coefficients given by
\[ \frac{d^2 p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 12, \]
we note that:

The corresponding homogeneous second-order ODE is
\[ \frac{d^2 p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 0, \]
and so the auxiliary equation is
\[ k^2 - 7k + 6 = 0 \implies (k - 1)(k - 6) = 0, \]
which has two real solutions given by $k = 1$ and $k = 6$. Consequently, the
complementary function for $p(t)$ is
\[ p(t) = A e^{t} + B e^{6t}, \]
where $A$ and $B$ are arbitrary constants.

The right-hand-side is a constant and this suggests we try a particular integral of
the form $p(t) = \alpha$. We differentiate this twice to get $p'(t) = 0$ and $p''(t) = 0$ so that,
on substituting these into our equation, we get
\[ 6\alpha = 12 \implies \alpha = 2. \]
Consequently, we see that $p(t) = 2$ is the particular integral for $p(t)$.

The general solution is then given by the sum of its complementary function and
its particular integral, i.e. we have
\[ p(t) = A e^{t} + B e^{6t} + 2, \]
where $A$ and $B$ are arbitrary constants.

Then given the initial condition $p(0) = 7$ we have
\[ 7 = A + B + 2 \implies A + B = 5, \]
and since
\[ p'(t) = A e^{t} + 6B e^{6t}, \]
the other initial condition, $p'(0) = 15$, gives us
\[ 15 = A + 6B. \]
Solving these equations, say by subtracting one from the other, we get $5B = 10$ which
gives us $B = 2$ and so, from the first equation, $A = 3$. Consequently, the particular
solution we seek is
\[ p(t) = 3 e^{t} + 2 e^{6t} + 2, \]
and this describes how the equilibrium price changes with time. Indeed, in the
long-term, as both $3 e^{t}$ and $2 e^{6t}$ tend to infinity as $t \to \infty$, we see that $p(t) \to \infty$ too.

Exercises
Exercise 8.1
Find the general solution of the ODE
\[ \frac{dy}{dx} + \frac{xy}{1 + x^2} = \sqrt{1 + x^2}. \]
What is the particular solution if $y(0) = 1$?
Exercise 8.2
Use the substitution $w(t) = y'(t)$ to show that the ODE
\[ \frac{d^2 y}{dt^2} - \frac{3}{t}\frac{dy}{dt} = 3 \]
can be written as a linear ODE in terms of $w(t)$. Solve this linear ODE for $w(t)$ and
hence find the general solution of the original ODE.
Exercise 8.3
Find the particular solution of the ODE
\[ y''(t) - 5y'(t) + 6y(t) = 10\sin t, \]
given that $y(0) = 0$ and $y'(0) = 1$.
Exercise 8.4
The functions $f(t)$ and $g(t)$ are related by the first-order ODEs
\[ f'(t) = 3f(t) - g(t) \quad\text{and}\quad g'(t) = 3g(t) - f(t). \]
If $f(0) = 2$ and $g(0) = 0$, find these functions.
Exercise 8.5
The elasticity of demand for a good is given by
\[ \varepsilon(p) = \frac{2p^2}{p^2 + 1}, \]
and $q = 4$ when $p = 1$. Find the demand function, $q^D(p)$.

Solutions to exercises
Solution to exercise 8.1
We solve this linear first-order ODE using the method in Section 8.2.2. Here
$P(x) = x/(1 + x^2)$ and we start by seeing that the integral
\[ \int P(x)\,dx = \int \frac{x}{1 + x^2}\,dx = \tfrac{1}{2}\ln|1 + x^2| + c, \]
where we have implicitly used the substitution $u = 1 + x^2$. So, as $\tfrac{1}{2}\ln(1 + x^2)$ is an
antiderivative of $x/(1 + x^2)$, the integrating factor is
\[ \mu(x) = e^{\frac{1}{2}\ln(1 + x^2)} = e^{\ln\sqrt{1 + x^2}} = \sqrt{1 + x^2}. \]
Then, as $Q(x) = \sqrt{1 + x^2}$, we have
\[ \mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies y(x)\sqrt{1 + x^2} = \int (1 + x^2)\,dx \implies y(x)\sqrt{1 + x^2} = x + \frac{x^3}{3} + c, \]
where $c$ is an arbitrary constant. As such, we find that
\[ y(x) = \frac{x}{\sqrt{1 + x^2}} + \frac{x^3}{3\sqrt{1 + x^2}} + \frac{c}{\sqrt{1 + x^2}}, \]
is the general solution of the given ODE.
If $y(0) = 1$, this gives us $c = 1$, and so
\[ y(x) = \frac{3x + x^3 + 3}{3\sqrt{1 + x^2}}, \]
is the sought after particular solution.

Solution to exercise 8.2
Given that $w(t) = y'(t)$, we have $w'(t) = y''(t)$, and so the given ODE, i.e.
\[ \frac{d^2 y}{dt^2} - \frac{3}{t}\frac{dy}{dt} = 3 \quad\text{becomes}\quad \frac{dw}{dt} - \frac{3}{t}\,w(t) = 3, \]
which is the sought after linear ODE for $w(t)$.
We solve this ODE using the method in Section 8.2.2. Here $P(t) = -3/t$ and we start
by seeing that the integral
\[ \int P(t)\,dt = -\int \frac{3}{t}\,dt = -3\ln|t| + c, \]
and so $-3\ln t$ is an antiderivative of $-3/t$ which means that the integrating factor,
$\mu(t)$, is given by
\[ \mu(t) = e^{-3\ln t} = e^{\ln(t^{-3})} = t^{-3}. \]
Then, as $Q(t) = 3$, we have
\[ \mu(t)w(t) = \int \mu(t)Q(t)\,dt \implies t^{-3}w(t) = \int 3t^{-3}\,dt \implies t^{-3}w(t) = -\frac{3}{2}t^{-2} + c, \]
where $c$ is an arbitrary constant. As such, we see that
\[ w(t) = -\frac{3t}{2} + ct^3, \]
is the general solution for $w(t)$.
Then, as $w(t) = y'(t)$, we see that
\[ y(t) = \int w(t)\,dt = \int \left(-\frac{3t}{2} + ct^3\right) dt = -\frac{3}{4}t^2 + \frac{c}{4}t^4 + d, \]
where $d$ is another arbitrary constant. This is the general solution of the original ODE.
Solution to exercise 8.3
The given ODE is a non-homogeneous second-order ODE with constant coefficients and
we solve it using the method of Section 8.3.2. In particular:

The corresponding homogeneous second-order ODE is
\[ y''(t) - 5y'(t) + 6y(t) = 0, \]
and so the auxiliary equation is
\[ k^2 - 5k + 6 = 0 \implies (k - 2)(k - 3) = 0, \]
which has two real solutions given by $k = 2$ and $k = 3$. Consequently, the
complementary function, $y_c(t)$, is
\[ y_c(t) = A e^{2t} + B e^{3t}, \]
for arbitrary constants $A$ and $B$.

The right-hand-side of the given ODE is $10\sin t$ and this suggests that we try a
particular integral of the form
\[ y_p(t) = \alpha\sin t + \beta\cos t. \]
We differentiate this twice to get
\[ y_p'(t) = \alpha\cos t - \beta\sin t \quad\text{and}\quad y_p''(t) = -\alpha\sin t - \beta\cos t, \]
so that, on substituting these into the given ODE, we get
\[ (-\alpha\sin t - \beta\cos t) - 5(\alpha\cos t - \beta\sin t) + 6(\alpha\sin t + \beta\cos t) = 10\sin t. \]
Then, equating the coefficients of the terms on both sides we see that, from the
$\sin t$ term, we get
\[ -\alpha + 5\beta + 6\alpha = 10 \implies \alpha + \beta = 2, \]
and, from the $\cos t$ term, we get
\[ -\beta - 5\alpha + 6\beta = 0 \implies \alpha = \beta, \]
and so, solving these two equations simultaneously, we find that $\alpha = 1$ and $\beta = 1$.
Consequently, we see that
\[ y_p(t) = \sin t + \cos t, \]
is the particular integral.

The general solution is then given by the sum of its complementary function and
its particular integral, i.e. we have
\[ y(t) = A e^{2t} + B e^{3t} + \sin t + \cos t, \]
where $A$ and $B$ are arbitrary constants.

We can now use the initial condition $y(0) = 0$ to see that
\[ 0 = A + B + 0 + 1 \implies A + B = -1, \]
and, as
\[ y'(t) = 2A e^{2t} + 3B e^{3t} + \cos t - \sin t, \]
the initial condition $y'(0) = 1$ gives us
\[ 1 = 2A + 3B + 1 - 0 \implies 2A + 3B = 0. \]
Solving these equations simultaneously then gives us $A = -3$ and $B = 2$ which means
that
\[ y(t) = -3 e^{2t} + 2 e^{3t} + \sin t + \cos t, \]
is the sought after particular solution.
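The answer to Exercise 8.3 can be verified by direct substitution. The Python sketch below codes the solution $y(t) = -3e^{2t} + 2e^{3t} + \sin t + \cos t$ together with its first two derivatives (worked out by hand) and evaluates $y'' - 5y' + 6y - 10\sin t$; the function names and sample times are illustrative assumptions.

```python
import math

def y(t):
    return -3 * math.exp(2 * t) + 2 * math.exp(3 * t) + math.sin(t) + math.cos(t)

def dy(t):
    # first derivative of y, differentiated term by term
    return -6 * math.exp(2 * t) + 6 * math.exp(3 * t) + math.cos(t) - math.sin(t)

def d2y(t):
    # second derivative of y, differentiated term by term
    return -12 * math.exp(2 * t) + 18 * math.exp(3 * t) - math.sin(t) - math.cos(t)

def residual(t):
    """y'' - 5y' + 6y - 10 sin t, which should vanish for a solution of the ODE."""
    return d2y(t) - 5 * dy(t) + 6 * y(t) - 10 * math.sin(t)

checks = {
    "y(0)": y(0.0),
    "y'(0)": dy(0.0),
    "max |residual|": max(abs(residual(t)) for t in (0.0, 0.3, 1.0)),
}
print(checks)
```

The first two entries should reproduce the initial conditions $y(0) = 0$ and $y'(0) = 1$, and the residual should be zero up to floating-point rounding.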
Solution to exercise 8.4
We will solve the given system of first-order ODEs by rewriting it as a second-order ODE in $f(t)$. To do this we note that, rearranging the first ODE gives us
$$g = 3f - \frac{df}{dt} \qquad (8.6)$$
and if we differentiate this with respect to $t$ we get
$$\frac{dg}{dt} = 3\frac{df}{dt} - \frac{d^2 f}{dt^2}.$$
Consequently, if we substitute these two expressions into the second ODE, we get
$$3\frac{df}{dt} - \frac{d^2 f}{dt^2} = 3\left(3f - \frac{df}{dt}\right) - f,$$
and this can be rearranged to get
$$\frac{d^2 f}{dt^2} - 6\frac{df}{dt} + 8f = 0,$$
which is our sought-after second-order ODE in $f(t)$. As it is a homogeneous second-order ODE with constant coefficients, this can be solved using the method of Section 8.3.1. The auxiliary equation is
$$k^2 - 6k + 8 = 0 \implies (k - 2)(k - 4) = 0,$$
which has two real solutions given by $k = 2$ and $k = 4$ which means that the general solution for $f(t)$ is
$$f(t) = A e^{2t} + B e^{4t},$$


for arbitrary constants $A$ and $B$. To find the general solution for $g(t)$, we note that using (8.6) and the fact that
$$f'(t) = 2A e^{2t} + 4B e^{4t},$$
we get
$$g(t) = 3[A e^{2t} + B e^{4t}] - [2A e^{2t} + 4B e^{4t}] = A e^{2t} - B e^{4t},$$
in terms of the same arbitrary constants $A$ and $B$ as before. Thus, the general solution to this system of first-order ODEs is
$$f(t) = A e^{2t} + B e^{4t} \quad\text{and}\quad g(t) = A e^{2t} - B e^{4t},$$
for arbitrary constants $A$ and $B$.
However, we are also given the conditions $f(0) = 2$ and $g(0) = 0$ which imply that
$$2 = A + B \quad\text{and}\quad 0 = A - B.$$
Solving these two equations simultaneously then gives us $A = 1$ and $B = 1$ which means that
$$f(t) = e^{2t} + e^{4t} \quad\text{and}\quad g(t) = e^{2t} - e^{4t},$$
are the sought-after functions.
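The pair of functions can be checked against the system directly. The sketch below assumes the system is $f' = 3f - g$ and $g' = 3g - f$, which is the form implied by (8.6) and the substitution step above; the exercise statement itself is not reproduced here, so treat that form as an assumption.

```python
import math

def f(t):
    return math.exp(2 * t) + math.exp(4 * t)

def g(t):
    return math.exp(2 * t) - math.exp(4 * t)

def df(t):
    # Derivative of f, differentiated by hand.
    return 2 * math.exp(2 * t) + 4 * math.exp(4 * t)

def dg(t):
    # Derivative of g, differentiated by hand.
    return 2 * math.exp(2 * t) - 4 * math.exp(4 * t)

for t in (0.0, 0.3, 1.0):
    # Both differences should be zero up to rounding error.
    print(t, df(t) - (3 * f(t) - g(t)), dg(t) - (3 * g(t) - f(t)))
```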
Solution to exercise 8.5
Using the definition of elasticity with $q = q^D(p)$ and the given expression we have
$$-\frac{p}{q}\frac{dq}{dp} = \frac{2p^2}{p^2 + 1},$$
and this can be written as
$$\frac{1}{q}\frac{dq}{dp} = -\frac{2p}{p^2 + 1},$$
which is a separable ODE. So, using the method of Section 8.2.1, we write this as
$$\int \frac{dq}{q} = -\int \frac{2p}{p^2 + 1}\,dp$$
and determine the integrals to get
$$\ln |q| = -\ln |p^2 + 1| + c,$$
where $c$ is an arbitrary constant.$^9$ Taking exponentials on both sides, this gives us
$$q = e^{-\ln(p^2 + 1) + c} = e^c e^{\ln (p^2 + 1)^{-1}} = e^c (p^2 + 1)^{-1} = \frac{e^c}{p^2 + 1},$$
where we can remove the modulus signs since, economically, $q$ is positive and $p^2 + 1$ is always positive too. Then, using the fact that $q = 4$ when $p = 1$, we see that $e^c = 8$ and so
$$q = q^D(p) = \frac{8}{p^2 + 1},$$
is the sought-after demand function.
9 Here we have implicitly used the substitution $u = 1 + p^2$ to determine the integral on the right-hand side.
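The demand function found in this solution can be checked numerically: its elasticity, $-\frac{p}{q}\frac{dq}{dp}$, should reproduce $\frac{2p^2}{p^2+1}$. The sketch below uses a central-difference approximation for the derivative; the step size `h` is an arbitrary choice made for illustration.

```python
def demand(p):
    # Demand function found above.
    return 8.0 / (p * p + 1.0)

def elasticity(p, h=1e-6):
    # Central-difference estimate of dq/dp, then -(p/q) dq/dp.
    dq_dp = (demand(p + h) - demand(p - h)) / (2 * h)
    return -p / demand(p) * dq_dp

for p in (0.5, 1.0, 2.0, 5.0):
    # The two printed values should agree to several decimal places.
    print(p, elasticity(p), 2 * p * p / (p * p + 1))
```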


Appendix A
Sample examination paper
Important note: This sample examination paper reflects the intended examination
and assessment arrangements for this course in the academic year 2011-2012. The
format and structure of the examination may have changed since the publication of this
subject guide. You can find the most recent examination papers on the VLE where all
changes to the format of the examination are posted.

Calculus
Time allowed: THREE hours.
Candidates should answer all FIVE questions. All questions carry equal marks (20
marks each).
Calculators may not be used for this paper.
1. (a) (i) Find
$$\int t \cos t \, dt.$$
(ii) Show that the differential equation
$$\left( \frac{x^3}{\cos(y/x)} + xy^2 \right) - x^2 y \frac{dy}{dx} = 0,$$
is homogeneous and find its degree of homogeneity.
(iii) Hence find the general solution of the differential equation in (ii) leaving your answer in terms of $y/x$.
(b) A plane, $P$, in $\mathbb{R}^3$ contains the point $(3, 4, -1)$ and has normal $(-4, 8, -4)^T$. Find the Cartesian equation of this plane.
It is known that the surface, $S$, with equation
$$x^2 + y^2 + z^2 = c,$$
for some $c \in \mathbb{R}$ has $P$ as a tangent plane. Find the value of $c$ that makes this the case and find the point on this surface which has $P$ as its tangent plane.
Another surface with equation
$$x^2 + y^2 + \alpha z^2 = \beta,$$
for some $\alpha, \beta \in \mathbb{R}$ intersects $S$ orthogonally at the point $(4, 3, 5)$. Find the values of $\alpha$ and $\beta$ that make this the case.


2. A market has an equilibrium price of 14 and an equilibrium quantity of 6.
(a) If this market's elasticity of demand is given by
$$\varepsilon(p) = \frac{p}{26 - p},$$
find its demand function.
(b) The market's inverse supply function has the form
$$p^S(q) = aq + b,$$
for some numbers $a$ and $b$. Given that the producer surplus is 36, find the values of $a$ and $b$. Hence deduce the supply function, $q^S(p)$, for this market.
(c) An excise (or per-unit) tax of $T$ is imposed on the market. Find the new equilibrium price and quantity of the market.
Hence find the value of $T$ that maximises the tax revenue.
3. (a) A function $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by
$$f(x, y) = x^2 - 2x - y^3 + y^2 + y.$$
Find and classify the stationary points of $f$.
Find the regions, if any, in the $(x, y)$-plane where $f$ is convex, concave or neither.
Does $f$ have a global minimum or a global maximum? Justify your answer.
(b) Find the general solution of the differential equation
$$y''(t) - 2y'(t) + y(t) = e^t.$$
What is the particular solution if $y(0) = 1$ and $y(1) = 0$?


4. If a firm uses amounts $k$ and $l$ of capital and labour respectively, then it can produce an amount $q(k, l) = k^\alpha l^\alpha$ where $0 < \alpha < 1/2$. Supposing that the firm is producing an amount $Q$, use the method of Lagrange multipliers to show that the minimum amount it can spend on capital and labour is given by
$$2\sqrt{vw}\, Q^{\frac{1}{2\alpha}},$$
where each unit of capital costs $v$ and each unit of labour costs $w$. By sketching the constraint and some appropriate contours, you should justify your use of the method of Lagrange multipliers and explain why your answer is a minimum.
The product manufactured by the firm sells at a fixed price, $p$, and the raw materials required to produce each unit cost an amount, $c$, where $c < p$. If the firm acts in a way which minimises its capital and labour costs, use the result just obtained to determine the production level, $Q$, that will maximise its profit.
5. (a) Find the fifth-order Maclaurin series for $e^{\sin x}$.
(b) Determine the integral
$$\int \frac{\cos x}{(1 - \sin x)(2 + \sin x)}\,dx.$$
(c) Find and classify the stationary points of the function
$$f(x) = \frac{x}{12} - \sqrt[3]{x}.$$


Appendix B
Solutions to the sample examination
paper
Question 1.
(a) For (i), we use integration by parts to see that, differentiating the $t$ and integrating the $\cos t$, we get
$$\int t \cos t \, dt = t \sin t - \int \sin t \, dt = t \sin t + \cos t + c,$$
where $c$ is an arbitrary constant.
For (ii), we compare the first-order differential equation with the standard form
$$M(x, y) + N(x, y)\frac{dy}{dx} = 0,$$
to see that
$$M(x, y) = \frac{x^3}{\cos(y/x)} + xy^2 \quad\text{and}\quad N(x, y) = -x^2 y.$$
In this case, this means that we have
$$M(\lambda x, \lambda y) = \frac{(\lambda x)^3}{\cos(\lambda y/\lambda x)} + (\lambda x)(\lambda y)^2 = \lambda^3 M(x, y),$$
and
$$N(\lambda x, \lambda y) = -(\lambda x)^2 (\lambda y) = \lambda^3 N(x, y),$$
i.e. both $M(x, y)$ and $N(x, y)$ are homogeneous of degree 3. Consequently, this differential equation is homogeneous of degree 3.
For (iii), as the differential equation in (ii) is homogeneous, we make the substitution $y(x) = xv(x)$ so that, using the product rule, we have
$$\frac{dy}{dx} = v(x) + x\frac{dv}{dx},$$
and the differential equation becomes
$$\frac{x^3}{\cos v} + x^3 v^2 - x^3 v\left(v + x\frac{dv}{dx}\right) = 0,$$
which, when simplified, yields
$$v \cos v \frac{dv}{dx} = \frac{1}{x},$$


which is a separable differential equation. Rewriting this in the usual way then gives
$$\int v \cos v \, dv = \int \frac{dx}{x} \implies v \sin v + \cos v = \ln |x| + c,$$
where $c$ is an arbitrary constant and we have used (i) to find the integral on the left-hand side. So, using $y(x) = xv(x)$ again, we see that
$$\frac{y}{x} \sin\frac{y}{x} + \cos\frac{y}{x} = \ln |x| + c,$$
is the general solution in terms of $y/x$. (This expression cannot be usefully simplified any further.)
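(b) As the plane, $P$, contains the point $(3, 4, -1)$ and has normal $(-4, 8, -4)^T$, we have
$$\begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \cdot \begin{pmatrix} x - 3 \\ y - 4 \\ z + 1 \end{pmatrix} = 0 \implies -4(x - 3) + 8(y - 4) - 4(z + 1) = 0 \implies x - 2y + z = -6,$$
as its Cartesian equation.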

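The surface, $S$, can be written as $f(x, y, z) = c$ with
$$f(x, y, z) = x^2 + y^2 + z^2,$$
for constant $c$. At any point, $(x, y, z)$, on the surface, its normal vector is given by
$$\nabla f = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix},$$
and in order for this to be in the same direction as the normal to $P$, there must be some $\lambda$ that makes
$$\nabla f = \lambda \begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \implies \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix} = \lambda \begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \implies \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \lambda \begin{pmatrix} -2 \\ 4 \\ -2 \end{pmatrix}.$$
Of course, we also need the point, $(x, y, z)$, to lie on $P$ and so we also have
$$x - 2y + z = -6 \implies (-2\lambda) - 2(4\lambda) + (-2\lambda) = -6 \implies \lambda = \frac{1}{2},$$
i.e. this is the value of $\lambda$ that we need. Thus, the point on $S$ that we seek is $(-1, 2, -1)$ and, using the equation for $S$, we get
$$c = (-1)^2 + (2)^2 + (-1)^2 = 6,$$
as the required value of $c$.
The new surface can be written as $g(x, y, z) = \beta$ with
$$g(x, y, z) = x^2 + y^2 + \alpha z^2,$$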


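for constants $\alpha$ and $\beta$. At any point $(x, y, z)$ on the surface, its normal vector is given by
$$\nabla g = \begin{pmatrix} g_x \\ g_y \\ g_z \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ 2\alpha z \end{pmatrix} \implies \nabla g = \begin{pmatrix} 8 \\ 6 \\ 10\alpha \end{pmatrix},$$
at the point $(4, 3, 5)$. We also see that the normal to $S$ at the point $(4, 3, 5)$ is
$$\nabla f = \begin{pmatrix} 8 \\ 6 \\ 10 \end{pmatrix},$$
and, in order for these two surfaces to intersect orthogonally at this point, we must have
$$\nabla g \cdot \nabla f = 0 \implies \begin{pmatrix} 8 \\ 6 \\ 10\alpha \end{pmatrix} \cdot \begin{pmatrix} 8 \\ 6 \\ 10 \end{pmatrix} = 0 \implies 64 + 36 + 100\alpha = 0 \implies \alpha = -1,$$
as the value of $\alpha$ that we seek. Then, as the point $(4, 3, 5)$ must lie on the new surface, we also have
$$x^2 + y^2 - z^2 = \beta \implies 4^2 + 3^2 - (5^2) = \beta \implies \beta = 0,$$
as the required value of $\beta$.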
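The geometric conditions used in this part can be checked with a few lines of integer arithmetic. The sketch below takes the normal to $P$ to be $(-4, 8, -4)^T$ and writes the second surface as $x^2 + y^2 + \alpha z^2 = \beta$; the helper names are hypothetical and chosen only for this illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def grad_f(pt):
    # Gradient of f = x^2 + y^2 + z^2.
    x, y, z = pt
    return (2 * x, 2 * y, 2 * z)

def grad_g(pt, alpha):
    # Gradient of g = x^2 + y^2 + alpha * z^2.
    x, y, z = pt
    return (2 * x, 2 * y, 2 * alpha * z)

normal_P = (-4, 8, -4)
tangent_point = (-1, 2, -1)

# The point lies on x - 2y + z = -6 and the gradient there is parallel to the normal.
on_plane = tangent_point[0] - 2 * tangent_point[1] + tangent_point[2] == -6
parallel = cross(grad_f(tangent_point), normal_P) == (0, 0, 0)
c = sum(coord ** 2 for coord in tangent_point)

# Orthogonality of the two surfaces at (4, 3, 5) with alpha = -1.
alpha, meeting_point = -1, (4, 3, 5)
orthogonal = dot(grad_g(meeting_point, alpha), grad_f(meeting_point)) == 0
beta = meeting_point[0] ** 2 + meeting_point[1] ** 2 + alpha * meeting_point[2] ** 2

print(on_plane, parallel, c, orthogonal, beta)
```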


Question 2.
(a) The elasticity of demand is given by the formula
$$\varepsilon(p) = -\frac{p}{q}\frac{dq}{dp},$$
where $q = q^D(p)$ is the demand function. In this question, we are told that
$$\varepsilon = \frac{p}{26 - p},$$
and so putting this into the formula above we get
$$-\frac{p}{q}\frac{dq}{dp} = \frac{p}{26 - p} \implies \frac{1}{q}\frac{dq}{dp} = \frac{1}{p - 26},$$
which is a separable differential equation. As such, we solve this by separating the variables and integrating both sides to get
$$\int \frac{1}{q}\,dq = \int \frac{1}{p - 26}\,dp \implies \ln |q| = \ln |p - 26| + c \implies q = A(p - 26),$$
for some arbitrary constant, $A$. Then, using the fact that the equilibrium price is 14 and the equilibrium quantity is 6, we can see that $A$ must satisfy the equation
$$6 = A(14 - 26) \implies A = -\frac{6}{12} = -\frac{1}{2}.$$


Putting this all together, we then see that we have $q = q^D(p)$ where
$$q^D(p) = 13 - \frac{p}{2},$$
is the sought-after demand function.


(b) The producer surplus is given by
$$PS = p^* q^* - \int_0^{q^*} p^S(q)\,dq,$$
where $p^*$ and $q^*$ are the equilibrium price and quantity, and $p^S(q)$ is the inverse supply function. So, using the information given in the question, we have
$$36 = (14)(6) - \int_0^6 (aq + b)\,dq \implies 48 = \left[ a\frac{q^2}{2} + bq \right]_0^6 \implies 48 = 18a + 6b,$$
or, indeed, $8 = 3a + b$ as our first equation for $a$ and $b$. Another equation that needs to be satisfied is
$$14 = 6a + b,$$
as the equilibrium quantity must give the equilibrium price when we use the inverse supply function. We can easily solve these equations for the constants $a$ and $b$ by subtracting one from the other to get $a = 2$ and then, using the first equation again, we get $b = 2$. Consequently, we have
$$p^S(q) = 2q + 2 \quad\text{so that}\quad q^S(p) = \frac{p}{2} - 1,$$
is the supply function for this market.$^1$


(c) In the presence of an excise tax of $T$, the supply function becomes
$$q_T^S(p) = q^S(p - T) = \frac{1}{2}(p - T) - 1,$$
whereas the demand function is unchanged, i.e. $q_T^D(p) = q^D(p)$.
1 Of course, an alternative method here would be to observe that the supply function is a straight line and so the producer surplus is the area of a triangular region whose height is $p^* - b$ and whose width is $q^*$. This means that, if we find the area of this triangle, we have
$$36 = \tfrac{1}{2}(14 - b)(6) \implies 14 - b = 12 \implies b = 2.$$
Then, again using the fact that the equilibrium quantity must give the equilibrium price when we use the inverse supply function, we use $b = 2$ to see that
$$14 = 6a + b \implies a = \frac{14 - 2}{6} = 2,$$
so that, once again, we find that
$$p^S(q) = 2q + 2 \quad\text{so that}\quad q^S(p) = \frac{p}{2} - 1,$$
is the supply function for this market.

This means that, in the presence of the excise tax of $T$, the new equilibrium price is given by
$$q_T^S(p) = q_T^D(p) \implies \frac{1}{2}(p - T) - 1 = 13 - \frac{p}{2} \implies p = 14 + \frac{T}{2},$$
and, using $q_T^D(p)$ say, we see that the new equilibrium quantity is
$$q = 13 - \frac{1}{2}\left(14 + \frac{T}{2}\right) = 6 - \frac{T}{4}.$$
We can now find the tax revenue, $R(T)$, which is the tax per unit, $T$, multiplied by $q$, the amount being sold in the presence of the tax, i.e. we have
$$R(T) = Tq = T\left(6 - \frac{T}{4}\right) = 6T - \frac{T^2}{4}.$$
To see where this is maximised, we start by noting that $R(T)$ has a stationary point when $R'(T) = 0$, i.e. when
$$6 - \frac{T}{2} = 0 \implies T = 12,$$
and since $R''(T) = -1/2 < 0$ this turning point is indeed a maximum. Thus, the tax revenue is maximised when $T = 12$.
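The equilibrium and revenue calculations can be reproduced with exact rational arithmetic. The sketch below checks that the market clears at $p = 14 + T/2$ and then searches integer tax levels from 0 to 24 for the largest revenue; the grid of integer values is an arbitrary choice for illustration, not part of the method in the guide.

```python
from fractions import Fraction

def demand(p):
    # q^D(p) = 13 - p/2
    return 13 - Fraction(p) / 2

def supply_with_tax(p, T):
    # q^S_T(p) = (p - T)/2 - 1
    return (Fraction(p) - T) / 2 - 1

def equilibrium(T):
    # Price from the working above; the assertion confirms supply equals demand there.
    p = 14 + Fraction(T) / 2
    q = demand(p)
    assert supply_with_tax(p, T) == q
    return p, q

def revenue(T):
    # Tax per unit times quantity sold.
    return T * equilibrium(T)[1]

best_T = max(range(0, 25), key=revenue)
print(best_T, revenue(best_T))
```

With no tax the function returns the original equilibrium, price 14 and quantity 6, which is a useful consistency check on the supply and demand functions.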
Question 3.
(a) The first-order partial derivatives of $f(x, y)$ are
$$f_x(x, y) = 2x - 2 \quad\text{and}\quad f_y(x, y) = -3y^2 + 2y + 1.$$
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous equations
$$2x - 2 = 0 \quad\text{and}\quad -3y^2 + 2y + 1 = 0.$$
But, the first equation gives us $x = 1$ and the second equation gives us
$$3y^2 - 2y - 1 = 0 \implies (3y + 1)(y - 1) = 0 \implies y = -\frac{1}{3} \text{ or } 1.$$
Consequently, the points $(1, -1/3)$ and $(1, 1)$ are the stationary points of this function.
The second-order partial derivatives of this function are
$$f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -6y + 2,$$
and, as such, the Hessian is given by
$$H(x, y) = (2)(-6y + 2) - (0)^2 = 4(1 - 3y).$$
Evaluating this at each of the stationary points we then find that:


At $(1, -1/3)$, the Hessian is
$$H(1, -1/3) = 4(2) > 0 \quad\text{and}\quad f_{xx}(1, -1/3) = 2 > 0,$$
so this is a local minimum.
At $(1, 1)$, the Hessian is
$$H(1, 1) = 4(-2) < 0,$$
and so this is a saddle point.
Thus, the stationary points $(1, -1/3)$ and $(1, 1)$ are a local minimum and a saddle point respectively.
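The classification can be reproduced mechanically. The following sketch evaluates the gradient and the Hessian determinant of $f(x, y) = x^2 - 2x - y^3 + y^2 + y$ at the two stationary points using exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

def gradient(x, y):
    # (f_x, f_y) for f(x, y) = x^2 - 2x - y^3 + y^2 + y
    return (2 * x - 2, -3 * y ** 2 + 2 * y + 1)

def hessian_det(x, y):
    f_xx, f_xy, f_yy = 2, 0, -6 * y + 2
    return f_xx * f_yy - f_xy ** 2

def classify(x, y):
    H = hessian_det(x, y)
    if H > 0:
        return "local minimum"  # f_xx = 2 > 0 everywhere, so never a maximum
    if H < 0:
        return "saddle point"
    return "inconclusive"

points = [(Fraction(1), Fraction(-1, 3)), (Fraction(1), Fraction(1))]
for x, y in points:
    print((x, y), gradient(x, y), hessian_det(x, y), classify(x, y))
```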
To see where the function is convex, concave or neither we note that the Hessian is given by
$$H(x, y) = 4(1 - 3y) \quad\text{and}\quad f_{xx}(x, y) = 2,$$
and so we see that:
When $y > 1/3$, $H(x, y) < 0$ and so the function is neither convex nor concave.
When $y \le 1/3$, $H(x, y) \ge 0$ and $f_{xx}(x, y) \ge 0$ and so the function is convex.
The function does not have a global minimum or a global maximum because, if we consider the behaviour of the function when $x = 0$ we have
$$f(0, y) = -y^3 + y^2 + y,$$
and as such, we see that:
As $y \to \infty$, $f(0, y) \to -\infty$ and so $f(x, y)$ cannot have a global minimum.
As $y \to -\infty$, $f(0, y) \to \infty$ and so $f(x, y)$ cannot have a global maximum.
(b) To solve the given non-homogeneous second-order differential equation, we follow the method in Section 8.3.2. In particular:
The corresponding homogeneous second-order ODE is
$$y''(t) - 2y'(t) + y(t) = 0,$$
and so the auxiliary equation is
$$k^2 - 2k + 1 = 0 \implies (k - 1)^2 = 0,$$
which has one real solution given by $k = 1$ (twice). Consequently, the complementary function, $y_c(t)$, is
$$y_c(t) = (At + B) e^t,$$
for arbitrary constants $A$ and $B$.


The right-hand side of the given ODE is $e^t$ and our first reaction in this case would be to take $y_p(t) = \alpha e^t$ where $\alpha$ is a constant that has to be determined. But, this won't work as, taking $A = 0$ and $B = \alpha$, we see that this is part of the complementary function. As such, we multiply by $t$ and try $y_p(t) = \alpha t e^t$ which won't work either because, taking $A = \alpha$ and $B = 0$, we see that this is part of the complementary function as well. Consequently, we multiply by $t$ again and try $y_p(t) = \alpha t^2 e^t$ which, thankfully, will work because it is not part of the complementary function. So, differentiating this using the product rule, we have
$$y_p'(t) = \alpha(2t) e^t + \alpha(t^2) e^t = \alpha(2t + t^2) e^t,$$
and
$$y_p''(t) = \alpha(2 + 2t) e^t + \alpha(2t + t^2) e^t = \alpha(2 + 4t + t^2) e^t,$$
which means that, substituting these into our ODE, we get
$$\alpha(2 + 4t + t^2) e^t - 2\alpha(2t + t^2) e^t + \alpha t^2 e^t = e^t \implies 2\alpha e^t = e^t \implies \alpha = \frac{1}{2}.$$
Consequently, we see that
$$y_p(t) = \frac{t^2}{2} e^t,$$
is the particular integral we seek.
The general solution to our ODE is then given by the sum of its complementary function and its particular integral, i.e. we have
$$y(t) = (At + B) e^t + \frac{t^2}{2} e^t,$$
where $A$ and $B$ are arbitrary constants.


Then, given the conditions $y(0) = 1$ and $y(1) = 0$, we have the equations
$$1 = B e^0 \quad\text{and}\quad 0 = (A + B) e^1 + \frac{e^1}{2},$$
respectively. The first of these gives $B = 1$ and then the second gives
$$0 = A + B + \frac{1}{2} \implies 0 = A + 1 + \frac{1}{2} \implies A = -\frac{3}{2}.$$
Thus, we find that
$$y(t) = \left(-\frac{3}{2}t + 1\right) e^t + \frac{t^2}{2} e^t = \frac{t^2 - 3t + 2}{2} e^t,$$
is the particular solution we seek.
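A short numerical check of this particular solution is sketched below. The derivatives were obtained by hand with the product rule, and the residual of $y'' - 2y' + y = e^t$ is evaluated at a few arbitrary points; the function names are illustrative.

```python
import math

def y(t):
    return (t * t - 3 * t + 2) / 2 * math.exp(t)

def dy(t):
    # Product rule: ((2t - 3) + (t^2 - 3t + 2)) / 2 * e^t
    return (t * t - t - 1) / 2 * math.exp(t)

def d2y(t):
    # Product rule again: ((2t - 1) + (t^2 - t - 1)) / 2 * e^t
    return (t * t + t - 2) / 2 * math.exp(t)

def residual(t):
    # Should vanish (up to rounding) if y solves y'' - 2y' + y = e^t.
    return d2y(t) - 2 * dy(t) + y(t) - math.exp(t)

print(y(0.0), y(1.0), [residual(t) for t in (0.0, 0.5, 2.0)])
```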


Question 4.

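Here the cost function is
$$C(k, l) = vk + wl,$$
and we want to minimise this subject to the constraint $q(k, l) = Q$ where $k, l > 0$. So, writing the constraint in the form $q(k, l) - Q = 0$, we get the Lagrangean
$$L(k, l, \lambda) = vk + wl - \lambda(q(k, l) - Q) = vk + wl - \lambda(k^\alpha l^\alpha - Q),$$
and we seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$, $L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$. As such, we find the first-order partial derivatives of $L(k, l, \lambda)$, i.e.
$$L_k(k, l, \lambda) = v - \lambda\alpha k^{\alpha - 1} l^\alpha, \quad L_l(k, l, \lambda) = w - \lambda\alpha k^\alpha l^{\alpha - 1} \quad\text{and}\quad L_\lambda(k, l, \lambda) = -(k^\alpha l^\alpha - Q),$$
and set these equal to zero to yield the equations
$$v - \lambda\alpha k^{\alpha - 1} l^\alpha = 0, \quad w - \lambda\alpha k^\alpha l^{\alpha - 1} = 0 \quad\text{and}\quad k^\alpha l^\alpha - Q = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$v - \lambda\alpha k^{\alpha - 1} l^\alpha = 0 \implies \lambda = \frac{v}{\alpha k^{\alpha - 1} l^\alpha} = \frac{vk}{\alpha k^\alpha l^\alpha},$$
from the first equation, and
$$w - \lambda\alpha k^\alpha l^{\alpha - 1} = 0 \implies \lambda = \frac{w}{\alpha k^\alpha l^{\alpha - 1}} = \frac{wl}{\alpha k^\alpha l^\alpha},$$
from the second equation. As such, we can equate these expressions for $\lambda$ to get
$$\frac{vk}{\alpha k^\alpha l^\alpha} = \frac{wl}{\alpha k^\alpha l^\alpha} \implies l = \frac{v}{w} k.$$
We then use this new relationship between $k$ and $l$ in the third equation, which is just the constraint $k^\alpha l^\alpha = Q$, to get
$$Q = k^\alpha \left(\frac{v}{w} k\right)^\alpha \implies Q = \left(\frac{v}{w}\right)^\alpha k^{2\alpha} \implies k^{2\alpha} = \left(\frac{w}{v}\right)^\alpha Q \implies k = \sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}},$$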

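and then, using this in the equation $l = vk/w$, we get
$$l = \frac{v}{w} \sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}} = \sqrt{\frac{v}{w}}\, Q^{\frac{1}{2\alpha}}.$$
Thus, these values of $k$ and $l$ minimise the cost of producing $Q$ units. The minimum cost is then given by
$$\hat{C}(Q) = C\left( \sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}}, \sqrt{\frac{v}{w}}\, Q^{\frac{1}{2\alpha}} \right) = v \sqrt{\frac{w}{v}}\, Q^{\frac{1}{2\alpha}} + w \sqrt{\frac{v}{w}}\, Q^{\frac{1}{2\alpha}} = 2\sqrt{vw}\, Q^{\frac{1}{2\alpha}},$$
as required.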


To justify this, we note that the constraint $k^\alpha l^\alpha = Q$ looks a bit like a rectangular hyperbola and, for $k, l > 0$, this is illustrated in Figure B.1(a). The objective function, $C(k, l) = vk + wl$, has contours $C(k, l) = c$, where $c$ is a constant, that are straight lines as illustrated in Figure B.1(b). The direction in which $C(k, l)$ is decreasing is indicated in this figure along with the point we found above using the Lagrange multiplier method, i.e. a point where we have a contour of $C(k, l)$ which is both tangential to the constraint and touching the constraint. Having seen this, it should be clear that this point will minimise $C$ subject to the constraint.
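The minimum-cost formula can also be tested numerically for one choice of parameters. In the sketch below, the values $v = 2$, $w = 8$, $Q = 3$ and $\alpha = 1/4$ are arbitrary assumptions made purely for illustration; the constraint is used to eliminate $l$, and the cost along the constraint is compared with $2\sqrt{vw}\,Q^{1/(2\alpha)}$ over a grid of values of $k$.

```python
import math

def min_cost_formula(v, w, Q, alpha):
    # Closed form obtained with the Lagrange multiplier method.
    return 2 * math.sqrt(v * w) * Q ** (1 / (2 * alpha))

def cost_on_constraint(k, v, w, Q, alpha):
    # From k^alpha * l^alpha = Q we get l = Q^(1/alpha) / k.
    l = Q ** (1 / alpha) / k
    return v * k + w * l

# Illustrative parameter values (assumptions, not taken from the question).
v, w, Q, alpha = 2.0, 8.0, 3.0, 0.25

k_star = math.sqrt(w / v) * Q ** (1 / (2 * alpha))
grid = [0.1 * i for i in range(1, 1001)]  # k from 0.1 to 100.0
best_on_grid = min(cost_on_constraint(k, v, w, Q, alpha) for k in grid)

print(k_star, cost_on_constraint(k_star, v, w, Q, alpha),
      min_cost_formula(v, w, Q, alpha), best_on_grid)
```

For these values the formula gives a cost of 72 at $k = 18$, and no point on the grid does better, which agrees with the tangency argument.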

Figure B.1: (a) The constraint $q(k, l) = Q$. (b) Adding three contours, $C(k, l) = c$, where the direction in which $C(k, l)$ is decreasing is as indicated. Clearly, we are interested in the point which is indicated in the figure.
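Using the given information, we can see that if $Q$ is produced then the revenue generated will be $R(Q) = pQ$ and the costs incurred will be
$$C(Q) = cQ + \hat{C}(Q) + FC = cQ + 2\sqrt{vw}\, Q^{\frac{1}{2\alpha}} + FC,$$
which is the cost of the raw materials plus the costs of capital and labour plus any fixed costs the firm may have. As such, the profit function for the firm is
$$\pi(Q) = R(Q) - C(Q) = pQ - cQ - 2\sqrt{vw}\, Q^{\frac{1}{2\alpha}} - FC,$$
and we want to find the value of $Q$ that maximises this. As such, we find that
$$\pi'(Q) = p - c - 2\sqrt{vw}\, \frac{1}{2\alpha} Q^{\frac{1}{2\alpha} - 1} = p - c - \frac{\sqrt{vw}}{\alpha} Q^{\frac{1 - 2\alpha}{2\alpha}},$$
as the fixed costs, $FC$, are a constant and, setting this equal to zero, we find that
$$\pi'(Q) = 0 \implies Q^{\frac{1 - 2\alpha}{2\alpha}} = \frac{\alpha(p - c)}{\sqrt{vw}} \implies Q = \left( \frac{\alpha(p - c)}{\sqrt{vw}} \right)^{\frac{2\alpha}{1 - 2\alpha}}$$
is the only stationary point. Indeed, notice that this value of $Q$ is positive as $p > c$ and $\alpha > 0$. Furthermore, we have
$$\pi''(Q) = -\frac{\sqrt{vw}}{\alpha} \cdot \frac{1 - 2\alpha}{2\alpha}\, Q^{\frac{1 - 2\alpha}{2\alpha} - 1},$$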


and as this is negative at the stationary point (since $0 < \alpha < 1/2$ implies that $\alpha > 0$ and $1 - 2\alpha > 0$) we see that our stationary point is a local maximum. Thus,
$$Q = \left( \frac{\alpha(p - c)}{\sqrt{vw}} \right)^{\frac{2\alpha}{1 - 2\alpha}}$$
is the value of $Q$ that maximises the firm's profit.


Question 5.
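(a) Using the facts that
$$e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + \cdots,$$
and
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots,$$
we find that, letting $y = \sin x$, we have
$$e^{\sin x} = 1 + \left( x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \right) + \frac{1}{2!}\left( x - \frac{x^3}{3!} + \cdots \right)^2 + \frac{1}{3!}\left( x - \frac{x^3}{3!} + \cdots \right)^3 + \frac{1}{4!}(x - \cdots)^4 + \frac{1}{5!}(x - \cdots)^5 + \cdots,$$
if we keep the relevant terms of the $\sin x$ series when we put them into the series for $e^y$. Then, multiplying out the brackets and, again, keeping the relevant terms we get
$$e^{\sin x} = 1 + \left( x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \right) + \frac{1}{2!}\left( x^2 - 2(x)\frac{x^3}{3!} + \cdots \right) + \frac{1}{3!}\left( x^3 - 3(x)(x)\frac{x^3}{3!} + \cdots \right) + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots,$$
so that, tidying up, this gives us
$$e^{\sin x} = 1 + \left( x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots \right) + \frac{1}{2}\left( x^2 - \frac{x^4}{3} + \cdots \right) + \frac{1}{6}\left( x^3 - \frac{x^5}{2} + \cdots \right) + \frac{x^4}{24} + \frac{x^5}{120} + \cdots,$$
which means that we have
$$e^{\sin x} = 1 + x + \frac{x^2}{2} + 0x^3 - \frac{x^4}{8} - \frac{x^5}{15} + \cdots,$$
in terms up to $x^5$.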
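One way to gain confidence in the coefficients is to compare the fifth-order polynomial with $e^{\sin x}$ for small $x$, where the error should be of order $x^6$. The sketch below does this with the standard `math` module; the sample points are arbitrary.

```python
import math

def maclaurin5(x):
    # Fifth-order Maclaurin polynomial for exp(sin x) found above.
    return 1 + x + x ** 2 / 2 - x ** 4 / 8 - x ** 5 / 15

for x in (0.2, 0.1, 0.05):
    exact = math.exp(math.sin(x))
    # The error should shrink roughly like x^6 as x decreases.
    print(x, exact, maclaurin5(x), abs(exact - maclaurin5(x)))
```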

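(b) We make the substitution $g = \sin x$ so that
$$\frac{dg}{dx} = \cos x \implies \cos x\,dx = dg,$$
and so we have
$$\int \frac{\cos x}{(1 - \sin x)(2 + \sin x)}\,dx = \int \frac{1}{(1 - g)(2 + g)}\,dg.$$
Thus, using partial fractions, we have
$$\frac{1}{(1 - g)(2 + g)} = \frac{A}{1 - g} + \frac{B}{2 + g} \implies 1 = A(2 + g) + B(1 - g),$$
so that, setting $g = 1$ we get $A = 1/3$ and setting $g = -2$ we get $B = 1/3$.
Consequently, we have
$$\int \frac{1}{(1 - g)(2 + g)}\,dg = \int \left( \frac{1/3}{1 - g} + \frac{1/3}{2 + g} \right) dg = \frac{1}{3}\left[ -\ln |1 - g| + \ln |2 + g| \right] + c = \frac{1}{3} \ln \left| \frac{2 + \sin x}{1 - \sin x} \right| + c,$$
as the answer.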
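An antiderivative can always be checked by differentiating it. The sketch below compares a central-difference derivative of $\frac{1}{3}\ln\left|\frac{2+\sin x}{1-\sin x}\right|$ with the integrand at a few points where $\sin x \ne 1$; the step size and sample points are arbitrary choices.

```python
import math

def integrand(x):
    s = math.sin(x)
    return math.cos(x) / ((1 - s) * (2 + s))

def antiderivative(x):
    # Result of the partial-fractions calculation (constant of integration omitted).
    s = math.sin(x)
    return math.log(abs((2 + s) / (1 - s))) / 3

def numerical_derivative(F, x, h=1e-5):
    # Central-difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-1.0, 0.3, 1.0):
    print(x, numerical_derivative(antiderivative, x), integrand(x))
```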
(c) To find the stationary points of the function $f(x)$ we write it as
$$f(x) = \frac{x}{12} - x^{1/3},$$
and so we have
$$f'(x) = \frac{1}{12} - \frac{1}{3} x^{-2/3}.$$
The stationary points occur when $f'(x) = 0$ and so we need to solve the equation
$$\frac{1}{12} - \frac{1}{3x^{2/3}} = 0 \implies \frac{x^{2/3} - 4}{x^{2/3}} = 0,$$
and this is satisfied when
$$x^{2/3} = 4 \implies x^2 = 64 \implies x = \pm 8.$$
To determine their nature, we find the second derivative of $f(x)$, i.e.
$$f''(x) = -\frac{1}{3}\left(-\frac{2}{3}\right) x^{-5/3} = \frac{2}{9x^{5/3}},$$
and we can see that
If $x = 8$, we have $f''(8) > 0$ and so this is a local minimum.
If $x = -8$, we have $f''(-8) < 0$ and so this is a local maximum.
Thus, the stationary points when $x = 8$ and $x = -8$ are a local minimum and a local maximum respectively.
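The two stationary points can be checked numerically, with one caveat: in Python, raising a negative float to a fractional power gives a complex number, so the sketch below defines a real cube root by hand. The helper `real_cbrt` is introduced here for illustration only.

```python
import math

def real_cbrt(x):
    # Real cube root that also works for negative x.
    return math.copysign(abs(x) ** (1 / 3), x)

def f(x):
    return x / 12 - real_cbrt(x)

def df(x):
    # f'(x) = 1/12 - 1/(3 x^(2/3))
    return 1 / 12 - 1 / (3 * real_cbrt(x) ** 2)

def d2f(x):
    # f''(x) = 2 / (9 x^(5/3))
    return 2 / (9 * real_cbrt(x) ** 5)

for x in (8.0, -8.0):
    print(x, f(x), df(x), d2f(x))
```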
