Mathematics
Textbook
CHOO YAN MIN
This book is optimised for viewing in PDF format (click the above link).
The other 700 pages are for things like front matter, TYS questions, appendices,
reproductions of formula lists and syllabuses, and answers to exercises.
The licensor cannot revoke these freedoms as long as you follow the license terms.
• Attribution — You must give appropriate credit, provide a link to the license, and
indicate if changes were made. You may do so in any reasonable manner, but not in any
way that suggests the licensor endorses you or your use.
• NonCommercial — You may not use the material for commercial purposes.
• ShareAlike — If you remix, transform, or build upon the material, you must distribute
your contributions under the same license as the original.
• No additional restrictions — You may not apply legal terms or technological measures
that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain
or where your use is permitted by an applicable exception or limitation. No warranties are
given. The license may not give you all of the permissions necessary for your intended use.
For example, other rights such as publicity, privacy, or moral rights may limit how you use
the material.
The scientist does not study nature because it is useful to do so. He studies it because
he takes pleasure in it, and he takes pleasure in it because it is beautiful.
- Henri Poincaré (1908 [1914], Science and Method, English trans., p. 22).
SYLLABUS ALERT
Where there are any differences between the old and revised syllabuses, I’ll let you know
with a yellow box like this.
• FREE! This book is free. But if you paid any money for it, I certainly hope your money
is going to me! This book is free because:
• DONATE! This book may be free, but donations are more than welcome! Donation
methods in footnote.3
It’s irrational for Homo economicus to donate. But please consider donating because:
1. There are any errors in this book. Please let me know even if it’s something as trivial
as a spelling mistake or a grammatical error.
2. You have absolutely any suggestions for improvement.
3. Any part of this book is less than crystal clear.
Here’s an anecdote about Richard Feynman, the great teacher and physicist:
Feynman was once asked by a Caltech faculty member to explain why spin
1/2 particles obey Fermi-Dirac statistics. He gauged his audience perfectly
and said, “I’ll prepare a freshman lecture on it.” But a few days later he
returned and said, “You know, I couldn’t do it. I couldn’t reduce it to
the freshman level. That means we really don’t understand it.”
I agree: If you can’t explain something simply, you don’t understand it well enough.4
Corollary: An excellent test of whether you understand something is to see if you can explain
it simply to someone else.
If at any point in this textbook, you have read the same passage a few times, tried to reason
it through, and still find things confusing, then it is a failure on MY part. Please let me
know and I will try to rewrite it so that it’s clearer. (There is also the possibility that I
simply messed up! So please let me know if there’s anything confusing!)
I deeply value any feedback, because I’d like to keep improving this textbook
for the benefit of everyone! I am very grateful to all the kind folks who’ve already
written in, allowing me to rid this book of more than a few embarrassing errors.
• LyX rocks!
You’re probably reading this on some device. So I’ve tried to set the font sizes and stuff so
that one can comfortably read this on a device as small as a seven-inch tablet. It should
also be possible to read this on a phone, though somewhat less comfortably. (Please let me
know if you have any feedback about this!)
(I’ll probably be contacting some publishers to see if they want to do a print version of
this, for anyone who prefers it in print.)
4
This quote or some similar variant is often (mis)attributed to Einstein. But as Einstein himself once said, “73% of Einstein
quotes are misattributed.”
5
LaTeX is the typesetting program used by most economists and scientists. But LaTeX can be difficult to use. LyX is a
user-friendly GUI version of LaTeX. LyX has boosted my productivity by countless hours over the years and you should use
LyX too!
Reading maths is not like reading Harry Potter. Most of Harry Potter is fluff. There is
little fluff in maths.
So go slowly. Dwell upon and carefully consider every sentence in this textbook. Make sure
you completely understand what each statement says and why it is true. Reading maths
is very different from reading any other subject matter.
If you don’t quite understand some material, you might be tempted to move forward anyway.
Don’t. In maths, later material usually builds on earlier material. So if you simply move
forward, this will usually cost you more time and frustration in the long run.
Better then to stop right there. Keep working on it until you “get” it. Ask a friend or
a teacher for help. Feel free to even email me! (I’m always interested to know what the
common points of confusion are and how I can better clear them up.)
• Examples and exercises are your best friends. So work through them.
Work through all the examples and exercises. Merely moving your eyeballs is not the same
as working. Working means having pencil and paper by your side and going through each
example/exercise word-by-word, line-by-line.
For example, I might say something like “x² − y² = 0. Thus, (x − y)(x + y) = 0.” If it’s not
obvious to you why the first sentence implies the second, stop right there and work on it
until you understand why. Don’t just let your eyeballs fly over these sentences and pretend
that your brain is “getting” it.
I will often not bother to explain some steps, especially if they simply involve some simple
algebra.
So there’s no need to memorise all the formulae that are already on the list you’re getting.
Note that you get a different list depending on which exam you’re taking — List of Formulae
(MF15) for the old 9740 exam and List of Formulae (MF26) for the revised 9758 exam.
(Both lists are reproduced in Part VIII of this book.) I cannot guarantee though that your
JC will give you the List during your JC common tests and exams.
You’ve probably forgotten some (or most?) of it, but unfortunately, you are still assumed
to know EVERYTHING from O-Level Maths & ‘A’ Maths. (To take H2 Maths, most JCs
require that you at least passed ‘A’ Maths.6 )
See in particular the lists near the end of either the 9740 (old) or the 9758 (revised) syllabus.
Skim through and see if anything looks totally alien to you!
Some chapters (e.g. Chapters 5 and 26) in this textbook will give a quick review of some of
the O-Level Maths material that you may have forgotten but which we’ll use quite often.
• Online Calculators
Google is probably the quickest for simple calculations. Type anything into your
browser’s Google search bar and the answer will instantly show up.
Wolfram Alpha is somewhat more advanced (but also slower). Enter “sin x” for example
and you’ll get graphs, the derivative, the indefinite integral, the Maclaurin series, and a
bunch of other stuff you neither know nor care about.
The Derivative Calculator and the Integral Calculator are probably unbeatable for the
specific purposes of differentiation and integration. Both give step-by-step solutions for
anything you want to differentiate or integrate.
Here is a collection of spreadsheets I made. These spreadsheets are for doing tedious and
repetitive calculations you’ll often encounter in H2 maths (e.g. with vectors, complex
numbers, etc.). As with anything I do, I welcome any feedback you may have about
these spreadsheets. Perhaps in the future I will make a more attractive version of them.
(Instructions: Click “Make a copy” to open up your own independent copy of
this spreadsheet. Enter your input in the yellow cells. Output is produced in
the blue cells. If you mess up anything, simply click the same link and “Make
a copy” again.)
6
Some JCs, like HCI, even require that you got at least a B3 for both Maths & ‘A’ Maths.
There are way too many websites out there catering to primary, secondary, and lower-level
undergraduate maths. Unfortunately, some of them can be awful and can get things wrong.
Three resources I like (though they are probably a bit advanced for JC students) are:
1. Math StackExchange
A great resource where you can ask maths questions and often get them answered fairly
promptly. Note though that this site is mostly frequented by fairly advanced students of
maths (not to mention also mathematicians), so they can be pretty impatient and quick
to downvote questions they perceive to be “stupid”. Nonetheless, if you make an effort to
write down a carefully-crafted question and show also that you’ve made some effort to look
for an answer (either on your own or online), they can be very helpful.7
2. ProofWiki gives succinct and rigorous definitions and proofs. Unfortunately it is very
incomplete.
3. Mathworld.Wolfram is also great, but at times excessively encyclopaedic, at the cost of
clarity and brevity.
And of course, you can find countless free maths textbooks online (some less legal than
others). Two totally illegal8 resources are: LibGen for books and SciHub for articles.9
An old reliable is Bittorrent.
7
There is an entire StackExchange family of websites. The flagship site is StackOverflow where you can ask any programming
question and get it answered amazingly quickly.
8
Well, depending on which jurisdiction you live in. Of course, in Singapore, unless told otherwise, you should assume that
everything is illegal.
9
Note though that these sites are constantly playing whac-a-mole with the fascist authorities so the URLs often change —
if so, simply google to look up the current URLs.
10
Pretty bizarre that in this age of the smartphone, they want you to learn how to use these clunky and now-useless devices
from the ’80s and ’90s. It is the equivalent of learning to program a VCR.
IMHO it’d be much better to teach you some simple programming or Excel (or whatever spreadsheet program). “B-b-
but ... how would such learning be tested in an exam format?” Ay, there’s the rub. In the Singapore education system,
anything that cannot be “examified” is not worth learning.
The good Singaporean is taught that pragmatism is the highest virtue (and obedience
second). She is thus also trained to be a Type #1 student (and indeed a Type #1 human
being).
If you’re a Type #1 student, then this textbook may not be the best use of your time
(though you may still find the TYS and answers useful). Please use instead these resources,
which are provided with the efficient Type #1 student in mind:
• The H2 Mathematics CheatSheet, which contains all the formulae you’ll ever need
on two sides of a single A4 sheet of paper.11
• My H1 Mathematics Textbook, which is written simply, and which covers a subset
of the H2 syllabus.
• The H2 Maths Exercise Book (coming soon), which teaches you how to mindlessly
apply formulae and give the “correct” answer to every exam question.
• My totally awesome tuition classes!
Of course, it is fully intended that this textbook (complemented by a capable teacher) will
help any student get her A. But, as I now explain, getting an A is quite beside the point
of this textbook.
Even a Type #1 student may find this textbook pragmatic, provided she plans to go
on and do more maths (this includes physics, economics, engineering). Let me
explain.
Over the years, the gahmen has made half-hearted attempts to magically transform test-
taking drones into creative innovators (usually involving silly four-letter campaigns like
TSLN 1997, TLLM 2005). Nonetheless, school administrators, teachers, and students alike
remain completely fixated with exams. And who can blame them, given the way the game
is set up?
11
Two things: (1) This CheatSheet does not include many of the formulae already printed in List MF26. (2) It is written
for 9758 (revised) students (so 9740 students may find a few things missing).
Another personal anecdote: While in JC, I remember being deeply mystified by why the
scalar (or dot) product, despite having a simple algebraic definition, could at the same time
also tell us about the cosine of the angle between the two vectors. I never figured it out,16
but it didn’t matter, because this was simply “yet another formula” that we were required
to know, for the sole purpose of answering exam questions.
13
In the 2012 PISA, the top countries were, in descending order, Singapore, Hong Kong, Taiwan, Korea, Macau, Japan,
Liechtenstein, Netherlands, Estonia, Finland. Source: PISA 2012 Results in Focus, p. 10.
14
I am currently accepting bets for this proposition: “By 2050, no born-and-bred Singaporean will have won a Fields medal
or a Nobel Prize (Peace Prize excluded).” We’ll need to work out what exactly “born-and-bred” means, but that can be
ironed out.
15
Lockhart, p. 29.
16
I remember complaining about this to a classmate and his response was “But that’s how we’ve always been taught maths
what. It’s just a bunch of formula.” He was probably right.
Today, of course, the intellectually curious student can easily find the answer on the internet. But at that time, the internet
was not quite so well-developed, so one could not easily find answers online.
Finally, I also hope that this textbook will serve as an authoritative resource to which
teachers and students alike can refer.
This textbook is far from perfect. To quote the motto of a certain neighbourhood secondary
school, the best is yet to be. I hope that with your help, this textbook will be continuously
improved.
So if you have any feedback or spot any errors, please feel free to email me.
As you can tell, I am pretty merciless about criticising others. So please don’t be shy about
pointing out to me the many mistakes that are surely still lurking in this textbook.
Preface 13
1 Sets 38
1.1 In ∈ and Not In ∉ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.2 Greater than >, Less Than <, Positive > 0, and Negative < 0 . . . . . . . . . . 40
1.3 Types of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.4 The Order of the Elements Doesn’t Matter . . . . . . . . . . . . . . . . . . . . 42
1.13 Union ∪ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.14 Intersection ∩ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3 Functions 57
3.1 Formal Mathematical Notation for Functions . . . . . . . . . . . . . . . . . . 59
3.2 EVERY x ∈ D Must be Mapped to EXACTLY ONE y ∈ C . . . . . . . . . . . 63
3.3 Real-Valued Functions of a Real Variable . . . . . . . . . . . . . . . . . . . . . 65
3.4 The Range of a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Creating New Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.6 One-to-One Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.7 Inverse Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.8 Domain Restriction to Create an Invertible Function . . . . . . . . . . . . . . 74
3.9 Composite Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4 Graphs 78
4.1 Graphing with Your TI84 Graphing Calculator . . . . . . . . . . . . . . . . . . 84
6 Intercepts 90
7 Symmetry 92
7.1 Reflection of a Point in a Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.2 Reflection of a Graph in a Line . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.3 Lines of Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
15.2 y = f(x + a) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
15.3 y = af(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
15.4 y = f(ax) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
21 Series 230
21.1 Convergent and Divergent Sequences and Series . . . . . . . . . . . . . . . . . 231
29 Vectors in 3D 289
32 Planes 319
32.1 Planes: Vector to Cartesian Equations . . . . . . . . . . . . . . . . . . . . . . . 326
32.2 Planes: Hessian Normal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
33 Distances 329
33.1 Distance of a Point from a Line . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
33.2 Distance of a Point from a Plane . . . . . . . . . . . . . . . . . . . . . . . . . . 337
34 Angles 341
34.1 Angle between Two Lines (2D) . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
34.2 Angle between Two Lines (3D) . . . . . . . . . . . . . . . . . . . . . . . . . . . 346
34.3 Angle between a Line and a Plane . . . . . . . . . . . . . . . . . . . . . . . . . 349
34.4 Angle between Two Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
V Calculus 439
66.5 The Sum of Two Independent Poisson R.V.’s is a Poisson R.V. . . . . . . . . 619
71.7 Sample Mean and Sample Variance are Unbiased Estimators . . . . . . . . . 684
71.8 The Sample Mean is a Random Variable . . . . . . . . . . . . . . . . . . . . . 687
71.9 The Distribution of the Sample Mean . . . . . . . . . . . . . . . . . . . . . . . 688
91.4 Answers for Ch. 23: Arithmetic Sequences and Series . . . . . . . . . . . . . . 1064
91.5 Answers for Ch. 24: Geometric Sequences and Series . . . . . . . . . . . . . . 1065
94.1 Answers for Ch. 45: Solving Problems Involving Differentiation . . . . . . . . 1121
94.2 Answers for Ch. 46: Maclaurin Series . . . . . . . . . . . . . . . . . . . . . . . 1123
94.3 Answers for Ch. 47: The Indefinite Integral . . . . . . . . . . . . . . . . . . . . 1126
The glory of [maths] is its complete irrelevance to our lives. That’s why it’s so fun!
Paul Lockhart (2009, A Mathematician’s Lament, p. 38).
I have never done anything ‘useful’. No discovery of mine has made, or is likely to make,
directly or indirectly, for good or ill, the least difference to the amenity of the world.
- G.H. Hardy (1940 [1967], A Mathematician’s Apology, p. 150).
The set is the basic building block of mathematics. Informally, a set is a “container” that
usually has some objects in it, but can sometimes also be empty.
Each object in a set is called an element (of that set).
Observations:
• The name of a set is often an upper-case letter; in this case, it is A.
• Mathematical punctuation marks called braces {} are used to denote a set. Listed within
these braces are the elements of the set.
• Elements of the set are separated by commas (,). This mathematical punctuation mark
means “and”.
• Thus, {3, π², Clementi Mall, Love, the colour green} is the set consisting of five elements,
namely 3 and π² and Clementi Mall and Love and the colour green.
• Elements in a set can be almost anything whatsoever! In this example, the
elements included a building (Clementi Mall), an abstract notion (Love), and even a
colour (green). The elements of a set can even be another set! But don’t worry, in the
context of A-level maths, the elements of a set will almost always be numbers.
• When we talk about a set, we refer to both the “container” itself and all the objects in
it.
Exercise 1. B is the set of the first 7 positive integers. Write down B in set notation.
(Answer on p. 1002.)
Exercise 2. C is the set of even prime numbers. Write down C in set notation. (Answer
on p. 1002.)
The mathematical punctuation mark ∈ means “is in”, while ∉ means “is not in”.
Example 2. Let B = {1, 2, 3, 4, 5, 6, 7}. Then 1 ∈ B, 2 ∈ B, 3 ∈ B, etc. You can read these
statements aloud as “1 is in B”, “2 is in B”, “3 is in B”, etc.
We can also write 1, 2, 3 ∈ B (“1, 2, and 3 are in B”).
Also, 8 ∉ B, 9 ∉ B, 10 ∉ B, etc. (“8 is not in B”, “9 is not in B”, “10 is not in B”, etc.).
We can also write 8, 9, 10 ∉ B (“8, 9, and 10 are not in B”).
Example 3. Cow ∈ {Cow, Chicken} reads aloud as “Cow is in the set consisting of Cow
and Chicken”.
Cow, Chicken ∈ {Cow, Chicken} reads aloud as “Cow and Chicken are in the set consisting
of Cow and Chicken”.
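If you happen to know a little programming, here is a minimal sketch of the same statements, assuming Python’s built-in set type as a stand-in for a mathematical set. The keyword in plays the role of ∈ and not in plays the role of ∉. The name farm and the use of strings for the animals are my own illustrative choices.

```python
# Example 2's set B, written as a Python set literal.
B = {1, 2, 3, 4, 5, 6, 7}

print(1 in B)        # "1 is in B"       -> True
print(8 not in B)    # "8 is not in B"   -> True

# "1, 2, and 3 are in B": each listed object is in B.
print(all(x in B for x in (1, 2, 3)))    # True

# Example 3, with strings standing in for the animals.
farm = {"Cow", "Chicken"}
print("Cow" in farm)                     # True
print(all(a in farm for a in ("Cow", "Chicken")))  # True
```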
In this textbook:
• Greater than means “strictly greater than” (>). So I won’t bother saying “strictly”,
unless it’s something I want to emphasise.
• Less than means “strictly less than” (<).
• If I want to say greater than or equal to (≥) or smaller than or equal to (≤), I’ll
say exactly that.
• Positive means “greater than zero” (> 0).
• Negative means “less than zero” (< 0).
• Non-negative means “greater than or equal to zero” (≥ 0).
• Non-positive means “less than or equal to zero” (≤ 0).
• 0 is neither positive nor negative. Instead, 0 is both non-negative and non-positive.
The following taxonomy lists the several types of numbers you’ll encounter in this textbook.
We’ll study imaginary numbers only later on in Part IV of this textbook. For now, all
numbers we’ll consider are real numbers (or reals). We won’t define what real numbers
are. Instead, we’ll simply assume (like in secondary school) that “everyone knows” what
real numbers are.
Infinity (∞) and negative infinity (−∞) are NOT numbers. Informally, ∞ is the “thing”
that is greater than any real number. Similarly, −∞ is the “thing” that is smaller than any
real number. I repeat: INFINITY IS NOT A NUMBER.17
So what exactly are real numbers, infinity, and negative infinity? This is actually a fascinating
question that mathematicians were able to answer satisfactorily only from the 19th
century, but is beyond the scope of the A-levels.
Definition 1. An integer is any one of these real numbers: . . . , −3, −2, −1, 0, 1, 2, 3, . . .
Definition 2. A rational number (or simply a rational) is any real number that can be
expressed as the ratio of two integers (with a non-zero denominator). An irrational number
(or simply an irrational) is any other real number.
Example 5. The number 1.87 is a rational and a real, but it is not an integer.
Example 6. The number π ≈ 3.14159 is an irrational and a real, but it is neither an integer
nor a rational.
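To see Example 5 concretely, here is a small sketch that assumes Python’s standard fractions module. It writes 1.87 as a ratio of two integers and checks that the ratio is not itself an integer. No similar check is offered for π: a computer only ever stores a rational approximation of π, so code like this cannot demonstrate irrationality.

```python
from fractions import Fraction

# 1.87 expressed exactly as a ratio of two integers, in lowest terms.
x = Fraction("1.87")
print(x)                          # 187/100
print(x.numerator, x.denominator) # 187 100

# A rational is an integer exactly when its lowest-terms denominator is 1.
print(x.denominator == 1)         # False, so 1.87 is not an integer
print(Fraction(6, 3).denominator == 1)  # True, because 6/3 = 2
```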
17
Actually, the truth is somewhat more complicated. Under certain special contexts in more advanced mathematics, infinity
is treated as a number. But in this textbook, I’ll simply keep it simple and insist that infinity is not a number.
The order in which we write out the elements of the set does not matter:
Definition 3. Two sets are equal (or identical) if both sets contain exactly the same
elements.
Example 7. There are at least six equivalent ways to write the set of the 3 smallest positive
even numbers: {2, 4, 6} = {2, 6, 4} = {4, 2, 6} = {4, 6, 2} = {6, 2, 4} = {6, 4, 2}.
Example 9. The set of the 3 smallest positive even numbers can be written as {2, 4, 6}. It
can also be written as: {2, 2, 4, 6} or {2, 6, 6, 6, 4, 4}. Repeated elements are simply ignored.
The notation n({2, 4, 6}) denotes the number of elements in the set of the 3 smallest positive
even numbers. Hence, n({2, 4, 6}) = 3. And we also have n({2, 2, 4, 6}) = 3 and
n({2, 6, 6, 6, 4, 4}) = 3.
Example 10. {Cow, Chicken} = {Cow, Cow, Chicken} = {Chicken, Cow, Chicken}. And
n({Cow, Chicken}) = n({Cow, Cow, Chicken}) = n({Chicken, Cow, Chicken}) = 2.
Note that more commonly, the number of elements in the set A is written as ∣A∣. But for
some reason, the A-level syllabus instead uses the notation n(A), so that’s what we’ll use.
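Python’s built-in sets happen to follow the same two rules (order doesn’t matter, repeats are ignored), so they make a handy sanity check. This is only a sketch: len(...) is Python’s counterpart of n(...), and the strings are my stand-ins for the animals.

```python
# Order does not matter.
print({2, 4, 6} == {6, 4, 2})             # True

# Repeated elements are simply ignored.
print({2, 4, 6} == {2, 6, 6, 6, 4, 4})    # True

# len(...) counts distinct elements, just like n(...).
print(len({2, 2, 4, 6}))                  # 3
print(len({"Chicken", "Cow", "Chicken"})) # 2
```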
Exercise 3. W = {Apple, Apple, Apple, Banana, Banana, Apple}. What is n(W )?
(Answer on p. 1002.)
Exercise 4. C is the set of even prime numbers. What is n(C)? (Answer on p. 1002.)
The mathematical punctuation mark “. . . ” is called the ellipsis and means “continue in the
obvious fashion”.
Example 11. D is the set of all odd positive integers smaller than 100. So in set notation,
we can write D = {1, 3, 5, 7, 9, 11, . . . , 99}.
Example 12. T is the set of all negative integers greater than −100. So in set notation, we
can write T = {−99, −98, −97, . . . , −2, −1}.
What is obvious to one person might not be obvious to another. So only use the ellipsis
when you’re confident it will be obvious to your reader! And never be shy to write a few
more of the set’s elements (as I did with the sets above)!
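A computer, unlike a human reader, cannot guess what “the obvious fashion” is, so in code the pattern has to be spelled out. Here is a minimal sketch, assuming Python’s range(start, stop, step), which stops just before stop. It builds D (the odd positive integers below 100) and T (the integers −99, −98, . . . , −1).

```python
# D: start at 1, go up in steps of 2, stop before 100.
D = set(range(1, 100, 2))

# T: start at -99, go up in steps of 1, stop before 0.
T = set(range(-99, 0))

print(sorted(D)[:6])    # [1, 3, 5, 7, 9, 11]
print(max(D))           # 99
print(sorted(T)[:3])    # [-99, -98, -97]
print(sorted(T)[-2:])   # [-2, -1]
```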
Exercise 5. Let D and T be as in the above two examples. What are n(D) and n(T )?
(Answer on p. 1002.)
Example 13. Z⁺ is the set of all positive integers. So, Z⁺ = {1, 2, 3, . . . }. And since Z⁺ is
infinite, we write n(Z⁺) = ∞.
Example 14. Z is the set of all integers. So, Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }. And since
Z is infinite, we write n(Z) = ∞.
Obviously, for an infinite set, we cannot explicitly list out all of its elements. So we’ll often
use ellipses to help out, as we did in the above examples. Alternatively, we can use interval
notation or set-builder notation, which we’ll learn about shortly.
Exercise 6. H is the set of all prime numbers. Write down H in set notation. (Answer on
p. 1002.)
The following sets are so common that they have special symbols:
1. Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . } is the set of all integers. (Z is for Zahl, German for
number.)
2. Q is the set of all rational numbers. (Q is for quoziente, Italian for quotient.)
3. R is the set of all real numbers.
4. C is the set of all complex numbers. (To be studied only in Part IV of this textbook.)
To create a new set that contains only the positive (or negative) elements of the old set,
append a superscript plus (⁺) or minus (⁻) to the name of a set:
1. Z⁺ = {1, 2, 3, . . . } is the set of all positive integers. Z⁻ = {. . . , −3, −2, −1} is the set of all
negative integers.
2. Q⁺ is the set of all positive rational numbers. Q⁻ is the set of all negative rational
numbers.
3. R⁺ is the set of all positive real numbers. R⁻ is the set of all negative real numbers.
As we’ll learn later, there is no such thing as a positive or negative complex number. Hence,
there is no such set named C⁺ or C⁻.
To add the number 0 to a set, append a subscript zero (₀) to its name:
Example 15. The set A = {3, π², Clementi Mall, Love, the colour green}. And so the set
A₀ = {3, π², Clementi Mall, Love, the colour green, 0}.
Example 16. The set B = {1, 2, 3, 4, 5, 6, 7}. And so the set B₀ = {0, 1, 2, 3, 4, 5, 6, 7}.
Adding both a superscript + and a subscript 0 to the name of a set creates a new set that
contains all positive elements of the old set and in addition the number 0.
Similarly, adding both a superscript − and a subscript 0 to the name of a set creates a new
set that contains all negative elements of the old set and in addition the number 0.
Example 17. If V = {−2, −1, 3, 4}, then V⁺ = {3, 4}, V⁻ = {−2, −1}, V₀⁺ = {0, 3, 4}, and
V₀⁻ = {−2, −1, 0}.
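The three naming rules above can be sketched as three tiny functions. The names positive_part, negative_part and with_zero are hypothetical helpers of my own (they are not standard Python), and the sketch assumes every element of the set is a number.

```python
def positive_part(s):
    """Hypothetical helper: keep only the elements greater than 0 (superscript plus)."""
    return {x for x in s if x > 0}

def negative_part(s):
    """Hypothetical helper: keep only the elements less than 0 (superscript minus)."""
    return {x for x in s if x < 0}

def with_zero(s):
    """Hypothetical helper: add the number 0 to the set (subscript zero)."""
    return s | {0}

V = {-2, -1, 3, 4}
print(positive_part(V) == {3, 4})                  # True
print(negative_part(V) == {-2, -1})                # True
print(with_zero(positive_part(V)) == {0, 3, 4})    # True
print(with_zero(negative_part(V)) == {-2, -1, 0})  # True
```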
Exercise 7. If U = {−1, 0, 2}, then what are U⁺, U⁻, U₀, U₀⁺, and U₀⁻? (Answer on p. 1002.)
1. (a, b) is the set of all real numbers that are greater than a and smaller than b. Such a
set is also called an open interval.
2. [a, b] is the set of all real numbers that are greater than or equal to a and smaller than
or equal to b. Such a set is also called a closed interval.
Example 19. The set J = [0, 3] denotes the set of all real numbers that are greater than
or equal to 0 and smaller than or equal to 3. So for example, the numbers 0, √2, 3 ∈ J.
3. (a, b] is the set of all real numbers that are greater than a and smaller than or equal to
b. Such a set is also called a half-open interval or a half-closed interval.
Example 20. The set K = (0, 3] denotes the set of all real numbers that are greater than
0 and smaller than or equal to 3. So for example, the numbers √2, 3 ∈ K, but 0 ∉ K.
4. [a, b) is the set of all real numbers that are greater than or equal to a and smaller than
b. Such a set is also called a half-open interval or a half-closed interval.
Example 21. The set L = [0, 3) denotes the set of all real numbers that are greater than
or equal to 0 and smaller than 3. So for example, the numbers 0, √2 ∈ L, but 3 ∉ L.
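An interval has infinitely many elements, so we cannot list them in a program. What we can do is write a membership test. The sketch below uses a hypothetical function in_interval of my own, where two True/False flags say whether each endpoint is included (square bracket) or excluded (round bracket).

```python
import math

def in_interval(x, a, b, left_closed, right_closed):
    """Hypothetical helper: is x in the interval from a to b?

    left_closed / right_closed are True for a square bracket
    (endpoint included) and False for a round bracket (excluded).
    """
    above = x >= a if left_closed else x > a
    below = x <= b if right_closed else x < b
    return above and below

root2 = math.sqrt(2)

# J = [0, 3]: both endpoints are included.
print(all(in_interval(x, 0, 3, True, True) for x in (0, root2, 3)))  # True

# K = (0, 3]: 0 is excluded, 3 is included.
print(in_interval(0, 0, 3, False, True))   # False
print(in_interval(3, 0, 3, False, True))   # True

# L = [0, 3): 0 is included, 3 is excluded.
print(in_interval(0, 0, 3, True, False))   # True
print(in_interval(3, 0, 3, True, False))   # False
```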
Exercise 8. How many elements does the set Z = [1, 1] contain? (Answer on p. 1002.)
Exercise 9. How many elements does the set Y = (1, 1) contain? (Answer on p. 1002.)
Exercise 10. How many elements does the set X = (1, 1.01) contain? (Answer on p. 1002.)
Exercise 11. Write down R, R⁺, R₀⁺, R⁻, and R₀⁻ in interval notation. (Answer on p. 1002.)
The empty set is literally the set that contains no elements. Hence the name!
Definition 4. The empty set is the set {}. It can also be denoted ∅.
Example 22. In 2016, the set of all Singapore Ministers who are younger than 30 is {} or
∅. This means that there is no Singapore Minister who is younger than 30.
Example 23. The set of all even prime numbers greater than 2 is {} or ∅. This means
that there is no even prime number that is greater than 2.
Example 24. The set of numbers that are greater than 4 and smaller than 4 is {} or ∅.
This means that there is no number that is simultaneously greater than 4 and smaller than
4.
As already mentioned, in this textbook (and also for the A-levels), the elements in a set will
almost always be numbers. But in general, the elements of a set can be (nearly) anything
whatsoever. In other words, a set really and simply is a “container” that can “contain”
(nearly) anything whatsoever.
Indeed, the elements of a set can be other sets, including even the empty set! Here are two
examples to illustrate:
Example 25. The set {∅} is not the same as the set ∅. The former is a set containing a
single element, namely the empty set. The latter is the empty set. It is perhaps clearer if
we rewrite them as {∅} = {{}} and ∅ = {}.
Note that the set {∅} = {{}} is certainly not empty, because it contains a single element
(namely the empty set).
Example 26. The set {∅, 1, {∅}} is the set containing three elements, namely the empty
set, the number 1, and a set containing the empty set.
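Counting elements by machine can help here. One caveat about this sketch: a Python set may only contain unchangeable (“hashable”) objects, so I use frozenset, Python’s unchangeable kind of set, for any set that sits inside another set. The variable names are mine.

```python
empty = frozenset()                   # the empty set
contains_empty = frozenset({empty})   # the set whose only element is the empty set

print(len(empty))                 # 0
print(len(contains_empty))        # 1
print(empty == contains_empty)    # False
print(empty in contains_empty)    # True

# Example 26: the empty set, the number 1, and a set containing the empty set.
example_26 = {empty, 1, contains_empty}
print(len(example_26))            # 3
```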
Example 27. Let M = {1, 2}, N = {1, 2, 3}, and O = {1, 2, 4, 5}. Then M ⊆ N , but N ⊈ M .
Also, M ⊆ O, but O ⊈ M . Further, N ⊈ O and O ⊈ N .
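Example 27 can be checked by machine. In this sketch, which assumes Python’s built-in sets, the operator <= between two sets asks “is the left set a subset of the right set?”.

```python
M = {1, 2}
N = {1, 2, 3}
O = {1, 2, 4, 5}

print(M <= N, N <= M)   # True False
print(M <= O, O <= M)   # True False
print(N <= O, O <= N)   # False False
```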
Exercise 12. State whether Z, Q, and R are subsets of each other. (Answer on p. 1002.)
Exercise 13. True or false: “The set of currently-serving Singapore Prime Ministers is a
subset of the set of currently-serving Singapore Ministers.” (Answer on p. 1002.)
The next fact is useful for showing that two sets are equal.
Fact 1. Two sets are subsets of each other ⟺ They are identical.
1. Two sets are subsets of each other ⟹ they are identical. (The symbol ⟹ stands
for implies or only if.)
2. Two sets are subsets of each other ⟸ they are identical. (The symbol ⟸ stands
for is implied by or if.)
Example 28. Let M = {1, 2}, N = {1, 2, 3}, O = {1, 2, 4, 5}, and P = {1, 2, 3}. Then
M ⊆ N, O, P and M ⊂ N, O, P. In contrast, N ⊆ P, but N ⊄ P; this is because N = P.
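Here is the same example as a sketch in Python, where < between two sets means “is a proper subset of” and <= means “is a subset of”. The last line checks Fact 1 for the particular pair N and P. It is only a spot check on one pair, not a proof.

```python
M = {1, 2}
N = {1, 2, 3}
O = {1, 2, 4, 5}
P = {1, 2, 3}

# M is a proper subset of each of N, O and P.
print(all(M < X for X in (N, O, P)))      # True

# N is a subset of P, but not a proper subset, because N equals P.
print(N <= P, N < P, N == P)              # True False True

# Fact 1 for this pair: subsets of each other exactly when identical.
print((N <= P and P <= N) == (N == P))    # True
```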
Exercise 14. Is the set of all squares (call it S) a proper subset of the set of all rectangles
(call it R)? (Answer on p. 1002.)
Exercise 15. Does A ⊆ B imply that A ⊂ B? (Answer on p. 1003.)
Exercise 16. Does A ⊂ B imply that A ⊆ B? (Answer on p. 1003.)
Exercise 17. True or false statement: “If A is a subset of B, then A is either a proper
subset of or is equal to B.” (Answer on p. 1003.)
Remark 1. The official A-level syllabus uses the symbol ⊆ to mean “subset of” and ⊂ to
mean “proper subset of”. So this is what we’ll use in this textbook.
However, confusingly enough, many writers use the symbol ⊂ to mean “subset of” and ⊊ to
mean “proper subset of”. We will not follow such practice in this textbook. Just to let you
know, in case you get confused while reading other mathematical texts!
Definition 7. The union of the sets A and B (denoted A ∪ B) is the set of elements that
are in A OR in B (or in both).
Example 29. Let T = {1, 2}, U = {3, 4}, and V = {1, 2, 3}. Then T ∪ U = {1, 2, 3, 4},
T ∪ V = {1, 2, 3}, and U ∪ V = {1, 2, 3, 4}. And T ∪ U ∪ V = {1, 2, 3, 4}.
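As a quick check of Example 29, here is a sketch assuming Python’s built-in sets, where the operator | computes the union.

```python
T = {1, 2}
U = {3, 4}
V = {1, 2, 3}

print((T | U) == {1, 2, 3, 4})      # True
print((T | V) == {1, 2, 3})         # True
print((U | V) == {1, 2, 3, 4})      # True
print((T | U | V) == {1, 2, 3, 4})  # True
```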
Exercise 18. Rewrite each of the following sets more simply: (a) [1, 2] ∪ [2, 3]. (b)
(−∞, −3) ∪ [−16, 7). (c) {0} ∪ Z⁺. (Answer on p. 1003.)
Exercise 19. What is the union of the set of squares (S) and the set of rectangles (R)?
(Answer on p. 1003.)
Exercise 20. What is the union of the set of rationals (Q) and the set of irrationals?
(Answer on p. 1003.)
Definition 8. The intersection of the sets A and B (denoted A ∩ B) is the set of elements
that are in A AND B.
Definition 9. Two sets intersect if their intersection contains at least one element (i.e.
A ∩ B ≠ ∅).
Definition 10. Two sets are mutually exclusive or disjoint if their intersection is empty
(i.e. A ∩ B = ∅).
Example 30. Let T = {1, 2}, U = {3, 4}, and V = {1, 2, 3}. Then T ∩ U = ∅, T ∩ V = {1, 2},
and U ∩ V = {3}. And T ∩ U ∩ V = ∅.
Exercise 21. Rewrite each of the following sets more simply: (a) (4, 7] ∩ (6, 9). (b) [1, 2] ∩
[5, 6]. (c) (−∞, −3) ∩ (−16, 7). (Answer on p. 1003.)
Exercise 22. What is the intersection of the set of squares (S) and the set of rectangles
(R)? (Answer on p. 1003.)
Exercise 23. What is the intersection of the set of rationals (Q) and the set of irrationals?
(Answer on p. 1003.)
The set minus (sometimes also called set difference) operator is very convenient. Sadly,
it is not in the A-level syllabus and so I’ll avoid using it in this textbook. Nonetheless, it’s
worth a quick mention.
Definition 11. A set minus B (denoted A ∖ B or A − B) is the set that contains every
element in A that is not also in B.
Example 31. Let T = {1, 2}, U = {3, 4}, and V = {1, 2, 3}. Then T ∖ U = T, T ∖ V = ∅, and
U ∖ V = {4}.
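If you like checking such things on a computer, here is a minimal Python sketch (not part of the syllabus) that recomputes a few of the unions, intersections, and set differences from Examples 29 to 31. The variable names are my own; Python's built-in set type uses | for union, & for intersection, and - for set minus.

```python
# The sets from Examples 29 to 31.
T = {1, 2}
U = {3, 4}
V = {1, 2, 3}

union_TU = T | U   # union of T and U
union_TV = T | V   # union of T and V
inter_TU = T & U   # intersection of T and U (empty, so T and U are disjoint)
inter_UV = U & V   # intersection of U and V
minus_TU = T - U   # T set minus U
minus_TV = T - V   # T set minus V
minus_UV = U - V   # U set minus V

print(union_TU, inter_UV, minus_UV)
```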
Definition 12. The set complement of A (denoted A′ or A^c) is the set of all elements that
are not in A.
Example 32. Consider the set of positive integers. Let A = {2, 4, 6, 8, 10, . . . }. Then in
this context, A′ = {1, 3, 5, 7, 9, 11, . . . }.
Example 33. Consider the set of all reals. Let A = R⁺. Then in this context, A′ = R₀⁻.
Example 34. We roll a die, hoping for an outcome of either 1 or 6. Our set of desired
outcomes may thus be written as A = {1, 6}.
Unfortunately, we do not get any of our desired outcomes. We may thus say that the actual
outcome was an element of the set A′ = {2, 3, 4, 5}.
Set-builder notation is an alternative method of writing down sets. In the current context,
the mathematical punctuation mark colon “∶” will mean “such that”.
Example 35. The set {x ∈ R ∶ x > 0} contains all x ∈ R such that x > 0. In words, this set
contains all real numbers that are positive.
What comes after the colon are the conditions or criteria that x must satisfy, in order to
qualify as a member of the set. Our sets will usually contain only numbers, but here’s an
example to show you how we can write down one particular set of musical artists.
Example 36. The set {x ∶ x is an artist that has had a US Billboard Hot 100 #1 Single}
contains all the artists who have ever had a US Billboard Hot 100 #1 Single.
It will however be more typical for our sets to be sets such as these:
Remark 2. We use the colon ∶ but some writers use instead the pipe ∣.
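Set-builder notation has a close cousin in programming. The Python sketch below is only an illustration under one big assumption: since a computer cannot list all of R, I use a small finite range of integers (my own choice) as a stand-in "universe". The part after "if" plays the role of the condition after the colon.

```python
# A hypothetical finite stand-in for R: the integers from -5 to 5.
universe = range(-5, 6)

# Mirrors {x in universe : x > 0}.
positives = {x for x in universe if x > 0}

# Mirrors {x in universe : x is even and x < 3}.
small_evens = {x for x in universe if x % 2 == 0 and x < 3}

print(sorted(positives), sorted(small_evens))
```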
Exercise 24. Write down R⁻, Q⁻, Z⁻, R₀⁻, Q₀⁻, and Z₀⁻ in set-builder notation. (Answer
on p. 1003.)
Exercise 25. Write down (a, b), [a, b], (a, b], and [a, b) in set-builder notation. (Answer
on p. 1003.)
Exercise 26. Let X = {x ∶ x is a living current or former Prime Minister of Singapore}.
Write down the set X so that all its elements are explicitly stated. (Answer on p. 1003.)
Exercise 27. Rewrite
√ each of the following sets in set-builder notation: (a) (−∞, −3) ∪
(5, ∞). (b) (−∞, 2] ∪ (e, π) ∪ (π, ∞). (c) (−∞, 3) ∩ (0, 7). (Answer on p. 1003.)
This very brief chapter is to warn you against making a common mistake — dividing by
0. Students have little trouble avoiding this mistake if the divisor is obviously a big fat 0.
Instead, students usually make this mistake when the divisor is an unknown constant or
variable that might be 0.
Example 38. Find the values of x for which x(x − 1) = (2x − 2)(x − 1).
Here’s the wrong solution: “Divide both sides by x − 1 to get x = 2x − 2. So x = 2.”
Here’s the correct solution: “Case #1. Suppose x − 1 = 0. Then the given equation is
satisfied. So x = 1 is one possible value for which x(x − 1) = (2x − 2)(x − 1). Case #2.
Now suppose x − 1 ≠ 0. So we can divide both sides by x − 1 to get x = 2x − 2. So x = 2.
Conclusion. The two possible values of x for which x(x − 1) = (2x − 2)(x − 1) are x = 1 and
x = 2.”
Moral of the story. Whenever you divide by a certain quantity, make sure it’s non-zero.
If you’re not sure whether it equals 0, then break up your analysis into two cases, as was
done in the above example: Case #1 — the quantity equals 0 (and see what happens
in this case); Case #2 — the quantity is non-zero (in which case you can go ahead and
divide).
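Here is a quick sanity check of Example 38 in Python. It is only a sketch: it tests integer candidates from −10 to 10 (an arbitrary range I picked) against the original equation, without ever dividing by x − 1, so it cannot lose the solution x = 1.

```python
def satisfies_equation(x):
    """True if x(x - 1) = (2x - 2)(x - 1), checked without any division."""
    return x * (x - 1) == (2 * x - 2) * (x - 1)

# Try every integer candidate in an arbitrary small range.
solutions = [x for x in range(-10, 11) if satisfies_equation(x)]

# The "wrong solution" divides by x - 1 first and so only ever finds x = 2.
wrong_solutions = [x for x in range(-10, 11) if x == 2 * x - 2]

print(solutions, wrong_solutions)
```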
By the way, let’s take this opportunity to clear up another popular misconception — You
may have heard that 1/0 = ∞. This is wrong. 1/0 ≠ ∞. Instead, any non-zero number
divided by 0 is undefined.18 “Undefined” is the mathematician’s way of saying, “You haven’t
told me what you are talking about. So what you are saying is meaningless.”
Exercise 28. What’s wrong with this “proof” that 1 = 0? (Answer on p. 1003.)
18 One exception is 0/0, which is indeterminate. This means that 0/0 is sometimes undefined, but can sometimes be defined
under certain circumstances.
You are probably familiar from secondary school with such statements as: “Let f (x) = x + 8
be a function.” Strictly speaking, this is not the correct way of describing a function.
Remark 3. The codomain is not the same thing as the range. We’ll learn about the range
only in the next section.
Altogether then, a function simply maps (or assigns) each element in the domain to one
(and exactly one) element in the codomain.
According to the mapping rule, “Cow” (in the domain) is mapped to “Produces milk”
(in the codomain) and “Chicken” (in the domain) is mapped to “Produces eggs” (in the
codomain). Every element in the domain is mapped to exactly one element in the codomain.
19 This definition is still informal. See Definition 135 in the Appendices for the exact, formal definition (optional).
According to the mapping rule, “1” (in the domain) is mapped to “2” (in the codomain)
and “2” (in the domain) is mapped to “4” (in the codomain). Every element in the domain
is mapped to exactly one element in the codomain.
According to the mapping rule, “3” (in the domain) is mapped to “3” (in the codomain),
“3.14159” (in the domain) is mapped to “3” (in the codomain), “3.5” (in the domain) is
mapped to “4” (in the codomain), and “3.88” (in the domain) is mapped to “4” (in the
codomain). Every element in the domain is mapped to exactly one element in the codomain.
Or alternatively:
This says that the function’s name is f , its domain is D, and its codomain is C. The last
bit “f ∶ x ↦ f (x)” is the mapping rule and this mapping rule applies to “all x ∈ D” (all
elements in the domain).
To save ourselves a bit of writing, if it’s clear from the context that we’re talking about the
function f , then we’ll omit “f ∶” from the front of the mapping rule. Also, if the mapping
rule applies universally to all elements of the domain, then we also omit the “for all x ∈ D”
at the end.
Altogether then, we will often simply write:
We will sometimes also denote the domain and codomain of f by Dom(f ) and Cod(f ).
In the context of set-builder notation (Section 1.17), the mathematical punctuation mark
colon “∶” stood for “such that”. However, in the context of functions, the colon “∶” stands
instead for “from”. Unfortunately there are only so many symbols and punctuation marks,
so inevitably some symbols will have to play more than one role!
The mathematical punctuation mark “→” (right arrow) simply stands for “to”.
Altogether then, “f ∶ D → C” reads as “f is the function from domain D to codomain C”.
The mathematical punctuation mark “↦” stands for “maps to”. Hence, “x ↦ f (x)” reads
as “x is mapped to f (x)”.
This may seem like an excessively pedantic distinction. But maths is precise and pedantic.
In maths, what we mean is precisely what we say and what we say is precisely what we
mean. There is never any room for ambiguity or alternative interpretations.
More examples:
This says that the function’s name is f , its domain is [0, 1] (the set of all reals between 0
and 1, including 0 and 1), its codomain is R (the set of all reals), and its mapping rule is
that we map each element x in the domain to the element 3x + 4 in the codomain. The
value of f at 0.5 is f (0.5) = 3(0.5) + 4 = 5.5.
What is f (3)? It is not 3(3) + 4 = 13. This is because 3 is not in the domain of f . Hence,
f (3) is simply undefined.
This says that the function’s name is f , its domain is R+ (the set of all positive reals), its
codomain is R (the set of all reals), and its mapping rule is that we map each element x in
the domain to the element ln x in the codomain. The value of f at 2 is f (2) = ln 2 ≈ 0.693.
f (0) is simply undefined, because 0 is not in the domain of f . Likewise, f (a) is undefined,
for any a < 0.
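The two examples above can be mimicked in Python. This is just a sketch with names of my own invention (make_function, in_domain, rule): it bundles a mapping rule with a domain test, so that asking for a value outside the domain raises an error instead of quietly returning a number.

```python
import math

def make_function(in_domain, rule):
    """Bundle a mapping rule with a domain test (both are plain Python callables)."""
    def f(x):
        if not in_domain(x):
            # Outside the domain, the value of the function is simply undefined.
            raise ValueError(f"{x} is not in the domain")
        return rule(x)
    return f

# f : [0, 1] -> R defined by x |-> 3x + 4.
f = make_function(lambda x: 0 <= x <= 1, lambda x: 3 * x + 4)

# g : (positive reals) -> R defined by x |-> ln x.
g = make_function(lambda x: x > 0, math.log)

print(f(0.5), round(g(2), 3))
```

Calling f(3) or g(0) raises a ValueError, which matches the point that these values are undefined.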
Exercise 29. For each of the following functions, write down the value of the function at
1. (a) The function f ∶ R → R is defined by x ↦ x + 1. (b) The function g ∶ [−1, 1] → R is
defined by x ↦ 17x. (c) The function h ∶ Z+ → R is defined by x ↦ 3x . (d) The function
i ∶ Z− → R is defined by x ↦ 3x . (Answer on p. 1004.)
This section simply repeats and emphasises what was already said above.
Can we define a function using the above domain, codomain, and mapping rule?
No. The reason is that the mapping rule fails to specify what “Cow” (an element of the
domain) should be mapped to. It thus fails the requirement that every element in the
domain be mapped to an element in the codomain.
Can we define a function using the above domain, codomain, and mapping rule?
No. The reason is that the mapping rule maps “Cow” (an element of the domain) to more
than one element in the codomain. It thus fails the requirement that every element in the
domain be mapped to exactly one element in the codomain.
• Domain R;
• Codomain [0, 1]; and
• Mapping rule: x ↦ x + 1.
Can we define a function using the above domain, codomain, and mapping rule?
No. The reason is that the mapping rule fails to map some elements in the domain (e.g.
14) to any element in the codomain. It thus fails the requirement that every element in the
domain be mapped to an element in the codomain.
• Domain R;
• Codomain R; and
• Mapping rule: x ↦ ±x.
Can we define a function using the above domain, codomain, and mapping rule?
No. The reason is that the mapping rule maps some elements in the domain (e.g. 14) to
more than one element in the codomain (+14 and −14). It thus fails the requirement that
every element in the domain be mapped to exactly one element in the codomain.
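For small finite sets, the two requirements can be checked mechanically. The Python sketch below is my own illustration (the helper is_function and the sample sets are hypothetical): the rule is stored as a dictionary that sends each element to the set of values the rule assigns it, and the check insists on exactly one assigned value, lying in the codomain.

```python
def is_function(domain, codomain, images):
    """Check a rule on a finite domain.

    `images` maps each element to the set of values the rule assigns it.
    """
    for x in domain:
        assigned = images.get(x, set())
        # Each element needs exactly one image, and it must lie in the codomain.
        if len(assigned) != 1 or not assigned <= codomain:
            return False
    return True

ok = is_function({1, 2}, {2, 4}, {1: {2}, 2: {4}})
unmapped = is_function({1, 2}, {2, 4}, {1: {2}})  # 2 is mapped to nothing
two_images = is_function({0, 14}, {-14, 0, 14}, {0: {0}, 14: {14, -14}})
outside = is_function({0, 14}, {0, 1}, {0: {1}, 14: {15}})  # 15 is not in [0, 1]

print(ok, unmapped, two_images, outside)
```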
For Exercises 30-37: (i) State (yes/no) whether we can define a function using the given
domain, codomain, and rule. (ii) Explain why or why not. (iii) If we can, then write down
the function in formal notation.
Exercise 30. Let the domain be {5, 6, 7}, the codomain be Z+ , and the mapping rule be
x ↦ 2x. (Answer on p. 1004.)
Exercise 31. Let the domain be {0, 3}, the codomain be {3, 4}, and the mapping rule be
(informally) “any larger number will work”. (Answer on p. 1004.)
Exercise 32. Let the domain be {2, 4}, the codomain be {3, 4}, and the mapping rule be
(informally) “any smaller number will work”. (Answer on p. 1004.)
Exercise 33. Let the domain be {1}, the codomain be {1}, and the mapping rule be
(informally) “stay exactly the same”. (Answer on p. 1004.)
Exercise 34. Let the domain be {1}, the codomain be {1, 2}, and the mapping rule be
(informally) “stay exactly the same”. (Answer on p. 1004.)
Exercise 35. Let the domain be {1, 2}, the codomain be {1}, and the mapping rule be
(informally) “stay exactly the same”. (Answer on p. 1004.)
Exercise 36. Let the domain be R, the codomain be R, and the mapping rule be x ↦ √x.
(Answer on p. 1004.)
Exercise 37. Let the domain be R, the codomain be R, and the mapping rule be x ↦ 1/x.
(Answer on p. 1004.)
Exercise 38. How might you change the domain in Exercise 36 so that a function can be
defined? (Answer on p. 1005.)
Exercise 39. How might you change the domain in Exercise 37 so that a function can be
defined? (Answer on p. 1005.)
Definition 13. A function of a real variable is any function whose domain is a subset of
R.
Altogether then, a real-valued function of a real variable is any function whose
domain and codomain are both subsets of R.
Consider the function i ∶ {Cow, Chicken} → Z defined by Cow ↦ 5 and Chicken ↦ 32. This
is a real-valued function, but not a function of a real variable. Thus, it is not a real-valued
function of a real variable.
Almost all functions considered in H2 Maths are real-valued functions of a real variable. So
we’ll see plenty of functions like f , g, and h from the above example, but rarely (if ever)
will we see functions like i or j.
In this textbook, unless otherwise clearly-stated, it may be assumed that all functions are
real-valued functions of a real variable.
Indeed, the range is usually a proper subset of the codomain, as was the case in each of the
following examples.
Example 49. Define f ∶ [0, 1] → R by x ↦ x + 1. Then Range(f ) = f ([0, 1]) = [1, 2].
Example 50. Define f ∶ {2, 3} → R by x ↦ x + 1. Then Range(f ) = f ({2, 3}) = {3, 4}.
The range is often a proper subset of the codomain, but sometimes they can be equal:
Exercise 40. Let the function f ∶ R₀⁺ → R be defined by x ↦ √x. What is the range of f ?
(Answer on p. 1005.)
Exercise 41. Let the function f ∶ Z → Z be defined by x ↦ x². What is the range of f ?
(Answer on p. 1005.)
Exercise 42. Which of the following statements is/are true? (a) “The range of any function
is a subset of its domain.” (b) “The range of any function is a subset of its codomain.” (c)
“The range of any function is a proper subset of its codomain.” (Answer on p. 1005.)
As we shall see, f g shall refer to a function that is entirely different from f ⋅ g, so we must
really be careful to write f ⋅ g when that is what we mean.
We can of course give these four new functions new names (perhaps a single-letter name
for each), but this is not necessary. We can simply write:
(f − g)(1) = 7(1) + 5 − 1³ = 11, (f/g)(1) = [7(1) + 5]/1³ = 12,
where the pairs of parentheses around each of the five new functions are just to be clear
that we are talking about a single, fully-fledged function.
20 Formally, f + g is the function with domain A ∩ B, codomain R, and mapping rule x ↦ f(x) + g(x). Similarly, f − g is the
function with domain A ∩ B, codomain R, and mapping rule x ↦ f(x) − g(x).
f ⋅ g is the function with domain A ∩ B, codomain R, and mapping rule x ↦ f(x)g(x).
f/g is the function with domain {x ∶ x ∈ A ∩ B, g(x) ≠ 0}, codomain R, and mapping rule x ↦ f(x)/g(x). The set
(A ∩ B) ∖ {x ∶ g(x) = 0} is the set of all elements x that are in both A and B, excluding those for which g(x) = 0. This exclusion
is necessary, otherwise f(x)/g(x) may sometimes not be well-defined.
Finally, kf is simply the function with domain A, codomain R, and mapping rule x ↦ kf(x).
Informally, a function is one-to-one (or invertible) if every element in its range is “hit”
exactly once (by exactly one element in the domain). Put another way: every element y in
the range corresponds to exactly one element in the domain. Formally:
Example 54. Consider the function f whose domain is the set {Cow, Chicken}, codomain
is the set {Produces eggs, Produces milk, Guards the home}, and mapping rule is Cow ↦
Produces milk and Chicken ↦ Produces eggs.
This function is one-to-one because each element in the range is “hit” exactly once, as we
can easily verify: Produces eggs is “hit” once by Chicken and Produces milk is “hit” once
by Cow.
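When the domain is a small finite set, the "hit exactly once" test is easy to automate. In the Python sketch below (my own illustration), a function is stored as a dictionary from domain elements to their images, and the hypothetical helper is_one_to_one checks that no image is shared by two domain elements.

```python
def is_one_to_one(mapping):
    """True if no two elements of the domain are mapped to the same image."""
    images = list(mapping.values())
    return len(images) == len(set(images))

# The function f from Example 54.
f = {"Cow": "Produces milk", "Chicken": "Produces eggs"}

# A hypothetical non-example: squaring on {-1, 1} hits 1 twice.
square = {-1: 1, 1: 1}

print(is_one_to_one(f), is_one_to_one(square))
```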
To check whether this function is one-to-one, we need to show that every element y in the
range corresponds to exactly one element x in the domain. To this end, let’s pick any
element y in the range and write: y = x + 1 ⇐⇒ y − 1 = x.
Thus, indeed, this function is one-to-one: every element y in the range corresponds to
exactly one element y − 1 in the domain.
Remark 4. One-to-one or invertible functions are also known as injective functions (or
simply injections), but we won’t use this term in this textbook.
Exercise 43. State and explain whether each of the following functions is one-to-one. (a)
f ∶ R₀⁺ → R is defined by x ↦ √x. (b) g ∶ R₀⁺ → R is defined by x ↦ x². (c) h ∶ R → R is
defined by x ↦ ∣x∣. (d) i ∶ R₀⁺ → R is defined by x ↦ ∣x∣. (e) j ∶ R → R is defined by x ↦ sin x.
(Answer on p. 1006.)
Only invertible functions have inverse functions. If a function is not invertible, then
its inverse function simply does not exist.
Given a one-to-one (or invertible) function f , to find its inverse function f −1 , follow these
steps:
1. Dom (f −1 ) = Range (f ).
2. Cod (f −1 ) = Dom (f ).
3. Write down an expression f −1 (y) that involves only y and show that “f −1 (y) = x ⇐⇒
y = f (x) ”.
y = f(x) ⇐⇒ y = x + 1 ⇐⇒ y − 1 = x, so that f⁻¹(y) = y − 1.
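A simple way to test a candidate inverse is to check that it undoes the original function. The Python sketch below does this for the mapping rules x ↦ x + 1 and y ↦ y − 1 on a handful of sample values that I picked arbitrarily; passing such a spot check is evidence, not a proof.

```python
def f(x):
    return x + 1

def f_inverse(y):
    return y - 1

samples = [-2, 0, 1, 3.5]

# Applying f and then the inverse (or the other way round) should return the input.
undoes_f = all(f_inverse(f(x)) == x for x in samples)
undone_by_f = all(f(f_inverse(y)) == y for y in samples)

print(undoes_f, undone_by_f)
```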
We’ll actually only formally talk about graphs in the next few chapters. But for now, as a
visual aid, I’ll provide the graphs of f −1 (blue) and f (red) anyway.
Observe that f −1 is simply the reflection of f in the line y = x (dotted). Section 7.2 (in
particular Fact 7) will explain why exactly this is so.
y = f(x) ⇐⇒ y = 2x ⇐⇒ 0.5y = x, so that f⁻¹(y) = 0.5y.
So f⁻¹ has mapping rule y ↦ 1/y.
(Note that ∵ is the shorthand symbol for because. Similarly, ∴ is the shorthand symbol
for therefore.)
The condition here that y ≠ 0 is important and goes back to our warning in Chapter
2 (Dividing by Zero). We know for sure that the range of f does not contain 0. This is
why in the last line above, we can safely divide both sides of the equation by y.
Here there are two possibilities for the mapping rule of f⁻¹, namely y ↦ √y and y ↦ −√y.
We must pick one. We know that the domain of f (and hence the codomain of f⁻¹) is
R₀⁺. So we should pick y ↦ √y as the mapping rule of f⁻¹.
Exercise 44. Find the inverse function for each of the following functions. (a) f ∶ R₀⁺ → R
defined by x ↦ √x. (b) g ∶ [−0.5π, 0.5π] → R defined by x ↦ sin x. (c) h ∶ R → R defined by
x ↦ x³. (Answers on p. 1007.)
We saw that some functions are not one-to-one (i.e. are non-invertible). And so for these
functions, an inverse function simply does not exist.
Example 61. We saw in Exercise 43 that the function j ∶ R → R defined by x ↦ sin x was
not one-to-one. However, we can restrict the domain to [−0.5π, 0.5π] to get a brand new
function g ∶ [−0.5π, 0.5π] → R defined by x ↦ sin x. This brand new function g is identical
to the original function j except for its domain. g is one-to-one, as you should verify for
yourself.
We can thus go ahead and construct the inverse function g −1 . Actually, we already did this
in Exercise 44.
We can thus go ahead and construct the inverse function g −1 . I leave this as an exercise for
you.
There is almost always more than one way to restrict the domain of a non-invertible function
to obtain an invertible function. Indeed, a trivial case would be where we restrict its domain
to be the empty set! In which case the function thus formed would certainly be invertible,
though not very interesting (it would have an empty domain and an empty range — so too
would its inverse function).
Exercise 45. (a) Show that the function f ∶ (−∞, 1) ∪ (1, ∞) → R defined by x ↦ 1/(x − 1)²
is not one-to-one. (b) Show that by restricting its domain to (1, ∞), we can create a new
invertible function g (you must prove that this new function is invertible). (c) Then find
the inverse function g⁻¹. (Answer on p. 1008.)
Exercise 46. For the function f in Example 62, let’s instead restrict the domain to [20, 30].
Show that the new function thus obtained is one-to-one and find its inverse. (Answer on
p. 1008.)
Definition 18. Let f and g be functions such that the range of g is a subset of the domain
of f . Then the composite function f g is the function with the same domain as g, the same
codomain as f , and mapping rule x ↦ f (g(x)).
The composite function f g can be read aloud as “f circle g” and is sometimes denoted
f ○ g, especially when we want to make clear that we are not talking about f ⋅ g. But we’ll
rarely use the f ○ g notation, unless there is some risk of confusion with f ⋅ g.
The condition in Definition 18 is important: the range of g must be a subset of the domain of
f in order for the composite function f g to exist. This condition ensures that given any x
from the domain of g, the value g(x) is itself also in the domain of f , so that f (g(x)) is
well-defined.
If this condition fails, then the composite function f g simply does not exist.
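The existence condition can be made concrete with small finite functions. In the Python sketch below (an illustration of my own, with made-up functions f and g stored as dictionaries), the hypothetical helper compose refuses to build f g unless the range of g is a subset of the domain of f.

```python
def compose(f, g):
    """Return the composite fg as a dictionary, applying g first and then f."""
    if not set(g.values()) <= set(f.keys()):
        raise ValueError("range of g is not a subset of the domain of f")
    return {x: f[g[x]] for x in g}

# Hypothetical finite functions: g adds 1 on {1, 2}; f doubles on {2, 3}.
g = {1: 2, 2: 3}
f = {2: 4, 3: 6}

fg = compose(f, g)  # exists, because range(g) = {2, 3} = dom(f)
print(fg)
```

By contrast, compose(g, f) raises a ValueError here: the range of f is {4, 6}, which is not a subset of the domain {1, 2} of g, so the composite g f does not exist.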
Let’s try computing f g(2). We can use the definition of a composite function: f g(2) =
f (g(2)) = f (2 + 1) = f (3) = 6.
Notice that for the composite function f g, we apply the function g first before applying the
function f . So for example, to compute, say f g(7), we compute g(7) first, then compute
f (g(7)). (A common mistake by students is to instinctively read from left to right, and so
apply f first before g.)
Let’s try computing f g(3). We can use the definition of a composite function:
f g(3) = f (g(3)) = f (3²) = f (9) = 10.
We saw that if f is non-invertible, then its inverse function f −1 simply does not exist.
Nonetheless, we could restrict its domain to create a new invertible function g, whose
inverse function g −1 we could then write down.
By analogy, suppose we have functions f and g where g’s range is not a subset of f ’s
domain. Thus, the composite function f g simply does not exist. But we can play a similar
trick: We can restrict the domain of g to create a new function ĝ, so that the range of ĝ is
a subset of f ’s domain. We can then write down the composite function f ĝ.
Fortunately, this is not in the syllabus, so you don’t need to know how to do this. Yay!
Exercise 47. For each of the following pairs of functions f and g, verify that the composite
function f g exists and write it out in full. Also, compute f g(1) and f g(2). (a) The functions
g, f ∶ R → R defined by g ∶ x ↦ x² + 1 and f ∶ x ↦ e^x. (b) The functions g, f ∶ R → R defined
by g ∶ x ↦ e^x and f ∶ x ↦ x² + 1. (c) The functions g, f ∶ R⁻ ∪ R⁺ → R defined by g ∶ x ↦ 1/2x
and f ∶ x ↦ 1/x. (d) The functions g, f ∶ R⁻ ∪ R⁺ → R defined by g ∶ x ↦ 1/x and f ∶ x ↦ 1/2x.
(Answer on p. 1009.)
Example 66. The function f ∶ R → R is defined by x ↦ 2x. The range of f is R and this
is indeed a subset of the domain of f (which is R). So the composite function f f ∶ R → R
exists and is defined by x ↦ f (f (x)) = 2f (x) = 2(2x) = 4x. And so for example f f (3) =
2(2 × 3) = 12.
The composite function f f can instead be written as f². So in the above example, we’d
write f²(3) = 12.
We can, analogously, define the composite function f f² and denote it f³. Using the above
example, f³(x) = 8x and f³(3) = 24.
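Repeated composition is easy to experiment with. The Python sketch below uses a hypothetical helper iterate (my own name) that applies a function to its own output n times; it reproduces the values from Example 66 for the doubling function.

```python
def iterate(f, n):
    """Return the n-fold composite of f with itself (n >= 1)."""
    def composed(x):
        for _ in range(n):
            x = f(x)  # feed the previous output back into f
        return x
    return composed

def f(x):
    return 2 * x

f2 = iterate(f, 2)  # x |-> 4x
f3 = iterate(f, 3)  # x |-> 8x

print(f2(3), f3(3))
```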
But confusingly enough, some writers use the symbol f² to mean “the second derivative of
f ”, f³ to mean “the third derivative of f ”, etc. We won’t follow such practice. Just to let
you know, in case you read other mathematical texts and get confused.
However, we will use f⁽³⁾ to mean “the third derivative of f ”, f⁽⁴⁾ to mean “the fourth
derivative of f ”, etc. This will show up occasionally in Part V (Calculus).
Exercise 48. For each of the following functions f , verify that the composite function f²
exists and write it out in full. Also, compute f²(1) and f²(2). (a) The function f ∶ R → R
defined by x ↦ e^x. (b) The function f ∶ R → R defined by x ↦ 3x + 2. (c) The function
f ∶ R → R defined by x ↦ 2x² + 1. (Answer on p. 1009.)
An ordered pair is a mathematical object. Like a set of two objects, an ordered pair is,
informally, a “container” with two objects, where the objects are listed out with a comma
separating them.
The only difference between a set of two objects and an ordered pair is that order
matters for the latter.
To distinguish an ordered pair from a set of two objects, we use parentheses (instead of
braces).
We also refer to (a, b) as ordered set notation. So (Cow, Chicken) and (−5, 4) are both
examples of ordered pairs, written out in ordered set notation.
Example 68. Let (Cow, Chicken) and (Chicken, Cow) be ordered pairs. Let {Cow,
Chicken} and {Chicken, Cow} be sets.
Recall that for sets, order did not matter. Hence, {Cow, Chicken} = {Chicken, Cow}.
In contrast, for ordered pairs, order does matter. And so (Cow, Chicken) ≠ (Chicken, Cow).
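Python happens to draw the same distinction: tuples (written with parentheses) care about order, while sets (written with braces) do not. The sketch below simply restates Example 68 in code.

```python
pair_1 = ("Cow", "Chicken")
pair_2 = ("Chicken", "Cow")
set_1 = {"Cow", "Chicken"}
set_2 = {"Chicken", "Cow"}

pairs_equal = pair_1 == pair_2  # order matters for ordered pairs
sets_equal = set_1 == set_2     # order does not matter for sets

print(pairs_equal, sets_equal)
```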
Definition 19. An ordered pair of real numbers is any (x, y) where both x and y are real.
Example 69. (−5, 4), (1, 1), and (2, −3) are all ordered pairs of real numbers.
Confusingly, above in Section 1.9 (Intervals), we said that (−5, 4) was a set, namely {x ∈ R ∶
−5 < x < 4}. Here we say instead that (−5, 4) is an ordered pair, consisting of two objects
(−5 and 4), the order of which matters.
Unfortunately this is yet another bit of confusing notation you’ll have to live with. You’ll
have to learn to tell, from the context, whether (−5, 4) is a set of infinitely-many real
numbers or an ordered pair. But don’t worry, this is usually pretty obvious.
Definition 21. The cartesian plane is the set of all ordered pairs of real numbers.
In set-builder notation, the cartesian plane can be written as {(x, y) ∶ x ∈ R, y ∈ R}. This
reads aloud as “the cartesian plane is the set of ordered pairs of real numbers (x, y)”.
In this textbook, we’ll usually only ever look at ordered pairs of real numbers.
Hence, rather than say “ordered pair of real numbers”, we’ll simply say “ordered pair”. And
so whenever you see the notation (x, y), it should be understood that this is an ordered
pair of real numbers (and not cows or chickens).
And so instead of writing the cartesian plane as {(x, y) ∶ x ∈ R, y ∈ R}, we’ll simply write it
as {(x, y)}, with the understanding that x, y are reals.
In the present context, we’ll also simply call any ordered pair of real numbers a point.
(Later on, in the context of three-dimensional geometry, points will also refer to ordered
triples of real numbers.)
Definition 22. In the context of the cartesian plane, the origin is the point (0, 0).
Example 70. The points (or ordered pairs of real numbers) a = (−5, 4), b = (1, 1), and
c = (2, −3) are illustrated graphically on the cartesian plane:
Example 71. The set of three points {a, b, c} = {(−5, 4), (1, 1), (2, −3)} is a graph.
Given a function that is named with a lower-case letter, we will often use the upper-case
version of that same letter to denote that function’s graph. So for example, given the
function f , we often give its graph the name F .
Example 72. Consider the function f ∶ R → R defined by x ↦ x². Its graph may be written
as F = {(x, y) ∶ y = x²}.
We’ve defined graph as a noun. But at the slight risk of confusion, we’ll also use it as a
verb that means “draw in the cartesian plane a given set of points”. So we can say either
“we draw the graph of f ” (graph as a noun), or “we graph f ” (graph as a verb).
To use the above example, we say that (2, 4) and (5, 25) are both points of f .
But since x determines f (x), it is nice but not necessary to specify the complete ordered
pair (x, f (x)). Instead, we can refer to the point simply as x. So in the above example, we
can simply say that “2 and 5 are both points of f ”, with the understanding that what we
really mean is “(2, 4) and (5, 25) are both points of f ”. This is a bit sloppy and at the risk
of some confusion, but will save us a lot of messy notation.
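Since the graph of the squaring function contains infinitely many points, a computer cannot list it, but it can test whether a given point belongs to it. The Python sketch below uses a hypothetical helper on_graph (my own name) that checks the condition y = f(x).

```python
def f(x):
    return x ** 2

def on_graph(point):
    """True if the ordered pair (x, y) satisfies y = f(x)."""
    x, y = point
    return y == f(x)

checks = [on_graph((2, 4)), on_graph((5, 25)), on_graph((2, 5))]
print(checks)
```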
So in the context of functions, x does double duty. It can either refer to an element in the
function’s domain OR it can refer to a point of the function.
On exams though, it is probably safer to simply list out the full co-ordinates, whenever
you’re referring to a point. Just in case your marker is damn niao.
We just learnt about the graph of a function. A graph of an equation is similarly defined:
Example 73. The graph of the equation x² + y² = 1 is simply the set {(x, y) ∶ x² + y² = 1}.
Exercise 49. (a) Can the equation x² + y² = 1 be rewritten into the form of a single
function? (b) Can it be rewritten into the form of two functions? (Answer on p. 1010.)
Exercise 50. Draw the graphs of each of the following equations. (a) y = e^x. (b) y = 3x + 2.
(c) y = 2x² + 1. (Answers on pp. 1010, 1011, and 1012.)
Here’s a super quick revision of some O-Level Maths we’ll be using. If you have severe
difficulty with these exercises, you should go back and review your O-Level Maths material!
For all real numbers x, y, a, and b (provided any denominators are non-zero):
x^a ⋅ x^b = x^(a+b),      (x/y)^a = x^a / y^a,
x^a / x^b = x^(a−b),      x^(−a) = 1 / x^a,
(x^a)^b = x^(ab),         a^(1/b) = ᵇ√a,
(xy)^a = x^a y^a,         a^(c/b) = ᵇ√(a^c) = (ᵇ√a)^c.
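If you'd like reassurance that you have remembered these laws correctly, a numerical spot check helps. The Python sketch below plugs in some arbitrary positive values of my own choosing and compares both sides of each law with math.isclose (floating-point arithmetic is only approximate, so exact equality is too strict). A spot check like this can catch a misremembered law, but it proves nothing.

```python
import math

# Arbitrary positive test values (my own choice).
x, y, a, b, c = 2.0, 3.0, 1.5, 4.0, 3.0

checks = {
    "x^a * x^b = x^(a+b)": math.isclose(x**a * x**b, x**(a + b)),
    "(x/y)^a = x^a / y^a": math.isclose((x / y)**a, x**a / y**a),
    "x^a / x^b = x^(a-b)": math.isclose(x**a / x**b, x**(a - b)),
    "x^(-a) = 1 / x^a": math.isclose(x**(-a), 1 / x**a),
    "(x^a)^b = x^(ab)": math.isclose((x**a)**b, x**(a * b)),
    "(xy)^a = x^a * y^a": math.isclose((x * y)**a, x**a * y**a),
    "a^(c/b) = (b-th root of a)^c": math.isclose(a**(c / b), (a**(1 / b))**c),
}

for law, holds in checks.items():
    print(law, holds)
```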
Exercise 52. (Answer on p. 1014.) Is each of the following true? (If true, explain why.
If false, simply give a counterexample.)
21 By convention, 0⁰ is usually defined to be equal to 1 – this textbook will follow this practice.
Example 76. Here’s a case where there’s just a surd in the denominator:
1/√2 = √2/(√2 × √2) = √2/2.
For more complicated cases, the trick is to use the fact that (a + b)(a − b) = a² − b².
Example 77. 1/(1 + √2) = (1 − √2)/[(1 + √2)(1 − √2)] = (1 − √2)/[1² − (√2)²] = (1 − √2)/(1 − 2) = (1 − √2)/(−1) = √2 − 1.
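Rationalising a denominator changes how a number is written, not its value, and this is easy to confirm numerically. The Python sketch below compares the two forms from Examples 76 and 77 using math.isclose, since floating-point results are approximate.

```python
import math

root2 = math.sqrt(2)

# Example 76: 1/sqrt(2) versus sqrt(2)/2.
example_76_agrees = math.isclose(1 / root2, root2 / 2)

# Example 77: 1/(1 + sqrt(2)) versus sqrt(2) - 1.
example_77_agrees = math.isclose(1 / (1 + root2), root2 - 1)

print(example_76_agrees, example_77_agrees)
```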
1/[x/y + √(x²/y² + 1)] = √(x²/y² + 1) − x/y.
∣z∣ = z if z ≥ 0, and ∣z∣ = −z if z < 0.
Example 78. ∣4∣ = 4 and ∣−4∣ = 4. ∣√2∣ = √2 and ∣−√2∣ = √2.
(b) ∣x∣ ≤ b ⇐⇒ −b ≤ x ≤ b.
(b) ∣x − a∣ ≤ b ⇐⇒ a − b ≤ x ≤ a + b.
Proof. (a) By Fact 2, ∣x − a∣ < b if and only if −b < x − a < b. Rearranging the latter set of
inequalities yields a − b < x < a + b.
(b) Very similar.
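To see the equivalence ∣x − a∣ ≤ b ⇐⇒ a − b ≤ x ≤ a + b in action, the Python sketch below compares the two conditions on a grid of sample points. The values a = 2 and b = 3 and the grid are arbitrary choices of mine, and agreement on a grid is only a sanity check, not a proof.

```python
def in_abs_form(x, a, b):
    return abs(x - a) <= b

def in_interval_form(x, a, b):
    return a - b <= x <= a + b

a, b = 2, 3

# Quarter steps from -10 to 10 (exactly representable as floats).
grid = [k / 4 for k in range(-40, 41)]

agree = all(in_abs_form(x, a, b) == in_interval_form(x, a, b) for x in grid)
satisfying = [x for x in grid if in_abs_form(x, a, b)]

print(agree, min(satisfying), max(satisfying))
```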
22 The absolute value operator ∣⋅∣ is the function with domain R, codomain R₀⁺, and mapping rule x ↦ x if x ≥ 0 and x ↦ −x
if x < 0.
Example 79. The graph below is of the equation y = x + 3. It has horizontal intercept −3
and vertical intercept 3.
[Figure: graph of y = x + 3 on the x- and y-axes.]
Horizontal intercepts are the x-coordinates of the points at which the graph intersects
the horizontal or x-axis. Similarly, vertical intercepts are the y-coordinates of the points
at which the graph intersects the vertical or y-axis.
[Figure: graph of f(x) = x² − 1.]
The A-level exams will often ask you to write down the full co-ordinates of the points at
which a graph (or curve) crosses the axes — this means writing down both the x- and
y-coordinates, and not just the horizontal intercept or the vertical intercept. Here’s an
exercise to help you make this a habit.
Exercise 54. Write down in full the point(s) at which the graph of each of the following
equations crosses the axes: (a) x² + y² = 1. (b) y = x² − 4. (c) y = x² + 2x + 1. (d) y = x² + 2x + 2.
(Answer on p. 1015.)
A reflection of a point in a line is its mirror image point on that line. Formally:
Definition 29. Let a be a point and l1 be a line. Let l2 be the line that is perpendicular
to l1 and runs through a. Let x be the point where l1 and l2 intersect. Then the reflection
of a in l1 is the point a′ on l2 such that the distances ax and a′ x are equal.
[Figure: the point a, the line l1, the perpendicular line l2 through a meeting l1 at x, and the reflection a′.]
Fact 5. Let (a, b) be a point. Its reflection in the line y = x is the point (b, a).
Fact 6. Let (a, b) be a point. Its reflection in the line y = −x is the point (−b, −a).
Example 81. (a) Given the point (3, 17), its reflection in the line y = x is (17, 3) and its
reflection in the line y = −x is (−17, −3).
(b) Given the point (−1, 5), its reflection in the line y = x is (5, −1) and its reflection in the
line y = −x is (−5, 1).
(c) Given the point (0, 0), its reflection in the line y = x is (0, 0) and its reflection in the
line y = −x is (0, 0).
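Facts 5 and 6 translate directly into code. The Python sketch below defines two small helpers (the names are my own) and recomputes the reflections from Example 81.

```python
def reflect_in_y_equals_x(point):
    """Fact 5: (a, b) reflects to (b, a)."""
    a, b = point
    return (b, a)

def reflect_in_y_equals_minus_x(point):
    """Fact 6: (a, b) reflects to (-b, -a)."""
    a, b = point
    return (-b, -a)

for p in [(3, 17), (-1, 5), (0, 0)]:
    print(p, reflect_in_y_equals_x(p), reflect_in_y_equals_minus_x(p))
```

Reflecting twice in the same line brings every point back to where it started, which is a handy way to catch sign errors.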
Exercise 55. For each of the following points, write down their reflections in the lines (i)
y = x; and (ii) y = −x. (a) (3, 17). (b) (−1, 5). (c) (0, 0). (Answer on p. 1016.)
Definition 30. The reflection of a graph G in a line is the graph G′ where each point in
G′ is a reflection of a point in G.
Example 82. The reflection of the graph G = {(x, y) ∶ y = x² + 4} in the line y = 2 is the
graph G′ = {(x, y) ∶ y = −x²}.
G : y = x2 + 4
y=2
line of reflection
G ' : y = -x2
x=0
line of reflection
G ' : y = ln (-x) G : y = ln x
Fact 7 formalises our earlier observation in section 3.7 (Inverse Functions) that the graphs
of f and its inverse f −1 are reflections in the line y = x.
Fact 7. Let f be an invertible function. Then the reflection of the graph of f in the line
y = x is the graph of its inverse function f −1 .
Fact 8. Let (a, a) be a point. Its reflection in the line y = x is (a, a).
Fact 9. Let f be invertible. Suppose f passes through (a, a). Then so too does its inverse
f −1 . And hence, f and f −1 intersect at those points where x = f (x).
The above Fact is useful for finding the intersection points of a function and its inverse.
Example 84. Let f ∶ R → R be the invertible function defined by x ↦ 2x. The graph of f
intersects the graph of f −1 at the point(s) where x = f (x) ⇐⇒ x = 2x ⇐⇒ x = 0. Notice
the intersection point (0, 0) is also on the line y = x. See figure on p. 71.
Example 85. Let f ∶ R+0 → R be the invertible function defined by x ↦ x2 . The graph of
f intersects the graph of f −1 at the point(s) where x = f (x) ⇐⇒ x = x2 ⇐⇒ x(x − 1) = 0
⇐⇒ x = 0, 1. Notice the intersection points (0, 0) and (1, 1) are also on the line y = x. See
figure on p. 73.
Be careful not to make the mistake of believing that f and f −1 can only intersect at points
where x = f (x). A function and its inverse can certainly intersect at points that
are not on the y = x line.
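For instance (this illustration is not taken from the text above, but is easy to verify), the function f ∶ R → R defined by x ↦ 1 − x is its own inverse. So the graphs of f and f⁻¹ coincide everywhere, including at points such as (0, 1) that are not on the line y = x. A minimal Python sketch of this check, with hypothetical helper names:

```python
def f(x):
    """Hypothetical example: f(x) = 1 - x, which is its own inverse."""
    return 1 - x


def f_inverse(y):
    """Solving y = 1 - x for x gives x = 1 - y."""
    return 1 - y


# Collect sampled intersection points of f and its inverse that are NOT on y = x.
sample_xs = [-2, 0, 0.5, 1, 3]
off_diagonal = [(x, f(x)) for x in sample_xs if f(x) == f_inverse(x) and f(x) != x]
print(off_diagonal)
```

Only the sample x = 0.5 is filtered out, because (0.5, 0.5) is the one intersection point that does lie on y = x.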
Example 87. The graph of $y = x^2$ is symmetric in the line x = 0 (which also happens to
be the vertical axis).
[Figure: graph of $y = x^2$, with the reflection line x = 0.]
[Figure: graph of y = 1/x, with the lines y = x and y = −x.]
The syllabus makes almost no mention of limits and none of continuity. Yet differentiation
and integration are built entirely on the concept of limits. Continuity is also almost always
assumed. It is thus well worth spending an hour or two on these concepts, especially since
they’re not difficult and everything will become that much clearer.
(The right arrow symbol “→” means to in the context of functions, but now means
approaches in the context of limits.)
Equivalently, we may say “The limit of f (x) as x approaches 3 is equal to 17.” We write:
Statements #1 and #2 are entirely equivalent. Either may be (informally) interpreted thus:
This interpretation is informal because the phrase “close to” is vague. For the formal
definitions of limits (optional), see Section 88.1 in the Appendices.
The subtle condition “but not equal to” requires emphasis. When considering the limit
of f at 3, we do NOT care about the value f (3). Indeed, we do NOT care even if f (3) is
undefined!
Here’s an example where $\lim_{x \to 3} g(x)$ is well-defined, even though g(3) is not.
Example 90. Graphed below is the function g ∶ (−∞, 3)∪(3, ∞) → R defined by x ↦ 5x+2.
It looks almost exactly like that of f (from the previous example), except there is now a
“hole” (or more formally, a discontinuity) at x = 3.
Nonetheless, it is still true that $\lim_{x \to 3} g(x) = 17$.
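The idea that g(x) gets close to 17 as x gets close to (but stays different from) 3 can be illustrated numerically. This is only a sketch, not a proof: it evaluates g at a few points near 3, and the step sizes are arbitrary choices made for illustration.

```python
def g(x):
    """g(x) = 5x + 2 on (-inf, 3) U (3, inf); deliberately undefined at x = 3."""
    if x == 3:
        raise ValueError("g is undefined at x = 3")
    return 5 * x + 2


# Approach 3 from the left and from the right with ever smaller steps.
for k in range(1, 6):
    step = 10 ** -k
    left, right = g(3 - step), g(3 + step)
    print(f"step={step:g}  g(3-step)={left:.6f}  g(3+step)={right:.6f}")
```

Note that the loop never evaluates g at 3 itself, which mirrors the "but not equal to" condition in the informal interpretation of a limit.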
As $x \to 3$, $i(x) \not\to 17$, or equivalently, $\lim_{x \to 3} i(x) \neq 17$.
Section 8.3 gives more examples of limits. But first, let’s learn about continuity.
Section 88.6 in the Appendices contains additional definitions and results concerning
continuity (optional).
However, it is not continuous at 3, because $\lim_{x \to 3} g(x) = 17$, but g(3) is undefined, and so
$\lim_{x \to 3} g(x) \neq g(3)$.
Altogether, g is continuous at any a ∈ (−∞, 3)∪(3, ∞), because for any a ∈ (−∞, 3)∪(3, ∞),
we have $\lim_{x \to a} g(x) = g(a)$. But g fails to be continuous at 3.
But it is not continuous at 3, because $\lim_{x \to 3} h(x) = 17$, but h(3) = 0 and so $\lim_{x \to 3} h(x) \neq h(3)$.
Altogether, h is continuous at any a ∈ (−∞, 3)∪(3, ∞), because for any a ∈ (−∞, 3)∪(3, ∞),
we have $\lim_{x \to a} h(x) = h(a)$. But h fails to be continuous at 3.
We now turn to examples where limits do not exist. We start with a trivial example.
This is a trivial example because −3 is “far from” the domain of f . So obviously, for all
values of x that are “close to” but not equal to −3, f (x) is undefined and so of course there
is no number L that f (x) is always “close to”!
It’s difficult or even impossible to draw an accurate graph of g near the origin.
In this example, $\lim_{x \to 0} g(x)$ does not exist. The reason is that for all values of x that are
“close to” but not equal to 0, there is no number L that g(x) is “close to”. When x is “close
to” 0, g(x) takes on every value in [−1, 1] infinitely often! And so g(x) can never be said
to be “close to” any one single number L.
Altogether then, g is not continuous at 0. (With a little work, we can actually prove that
g is continuous on R− and also on R+ , but this is beyond the scope of the A-Levels.)
There are infinitely many points along the line y = 1. And there are also infinitely many
points along the line y = 2! It is quite impossible to sketch its graph accurately.
However, $\lim_{x \to 3} h(x)$ does not exist. However we try to restrict x to values that are close to
(but not equal to) 3, h(x) is never close to any one single value; instead, h(x) switches
infinitely often between 1 and 2.
Indeed, $\lim_{x \to a} h(x)$ does not exist for any a ∈ R! However we try to restrict x to values that
are close to (but not equal to) a, h(x) is never close to any one single value; instead, h(x)
switches infinitely often between 1 and 2.
[Figure: graph of h, which is nowhere-continuous.]
Exercise 56. Consider the function f ∶ R → R defined by
$$f(x) = \begin{cases} 1, & \text{if } x \leq 0, \\ 2, & \text{if } x > 0. \end{cases}$$
What are $\lim_{x \to -5} f(x)$, $\lim_{x \to 0} f(x)$, and $\lim_{x \to 5} f(x)$? (Answer on p. 1017.)
This section considers infinite limits, i.e. where as x approaches some number, f (x)
increases (or decreases) without bound.
Example 96. Graphed below are the functions f and g, both with domain (−∞, 3)∪(3, ∞)
and codomain R, defined by $f : x \mapsto 2 + \frac{1}{(x - 3)^2}$ and $g : x \mapsto -2 - \frac{1}{(x - 3)^2}$.
[Figure: graphs of f and g, with the vertical asymptote x = 3.]
Observe that for all values of x that are “close to” but not equal to 3, there is no number
L that f (x) is “close to”. Hence, we say that $\lim_{x \to 3} f(x)$ simply does not exist. Similarly,
$\lim_{x \to 3} g(x)$ does not exist either.
“$\lim_{x \to 3} f(x) = \infty$” must NOT be interpreted to mean that there exists something called
“$\lim_{x \to 3} f(x)$” (no such thing exists); or that this thing is equal to some other thing called “∞”
(recall that ∞ is not a number!). Instead, “$\lim_{x \to 3} f(x) = \infty$” is interpreted informally as:
Again, see Section 88.3 in the Appendices (optional) for the formal definitions.
Example 97. Graphed below is the equation y = tan x. It has two vertical asymptotes
x = ±π/2, because $\lim_{x \to -\pi/2} \tan x = -\infty$ and $\lim_{x \to \pi/2} \tan x = \infty$.
[Figure: graph of y = tan x, with the vertical asymptotes x = −π/2 and x = π/2.]
This section considers limits at infinity (not to be confused with the infinite limits
discussed in the previous section). That is, the behaviour of f (x) as x increases (or
decreases) without bound.
Example 96 (revisited). Reproduced below are the graphs of the functions f and g,
both with domain (−∞, 3) ∪ (3, ∞) and codomain R, defined by $f : x \mapsto 2 + \frac{1}{(x - 3)^2}$ and
$g : x \mapsto -2 - \frac{1}{(x - 3)^2}$.
We already saw that f and g both have vertical asymptote x = 3, because as x → 3, f (x)
increases without bound and g(x) decreases without bound. We now consider instead what
happens as x increases or decreases without bound.
[Figure: graphs of f and g, with their horizontal asymptotes.]
As x increases without bound, f (x) → 2 and g(x) → −2. And as x decreases without
bound, f (x) → 2 and g(x) → −2. We can write these observations as $\lim_{x \to \infty} f(x) = 2$,
$\lim_{x \to \infty} g(x) = -2$, $\lim_{x \to -\infty} f(x) = 2$, and $\lim_{x \to -\infty} g(x) = -2$.
Pedantic point: Infinite limits do not exist. In contrast, limits at infinity DO exist. Here
in this example, $\lim_{x \to 3} f(x)$ does not exist. In contrast, $\lim_{x \to \infty} f(x)$ and $\lim_{x \to -\infty} f(x)$ both exist
(and are both equal to 2).
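A quick numerical sketch contrasts the two behaviours of f. It is illustrative only: the sample points are arbitrary, and tabulating values does not prove anything about a limit.

```python
def f(x):
    """f(x) = 2 + 1/(x - 3)^2, defined for x != 3."""
    return 2 + 1 / (x - 3) ** 2


# Infinite limit: taking x ever closer to 3 makes f(x) ever larger.
near_3 = [f(3 + 10 ** -k) for k in range(1, 5)]

# Limits at infinity: taking |x| ever larger makes f(x) ever closer to 2.
far_right = [f(10 ** k) for k in range(1, 5)]
far_left = [f(-(10 ** k)) for k in range(1, 5)]

print(near_3)
print(far_right)
print(far_left)
```

The first list keeps growing, while the other two lists settle towards 2 from above.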
[Figure: graph of $y = e^x$, with the horizontal asymptote y = 0.]
Example 99. Consider the function f ∶ R− ∪ R+ → R defined by $x \mapsto x + \frac{1}{x}$.
As x increases without bound or decreases without bound, f (x) approaches the line
y = x. Informally, we can also write these observations as $\lim_{x \to \infty} f(x) = x$ and $\lim_{x \to -\infty} f(x) = x$.
More precisely, $f(x) - x = \frac{1}{x} \to 0$ as x → ±∞.
Again, see Section 88.4 in the Appendices (optional) for the formal definition of an oblique
asymptote.
[Figure: graph of f, with the oblique asymptote y = x.]
The problem of finding the derivative is the problem of finding the slope of the tangent to
a graph at a given point.
Graphed below is some function f ∶ R → R. Pick some point A = (a, f (a)). Draw the line l
which is tangent to the graph at the point A.
How do we find the slope of l? Unsure of how to proceed, we try a crude approximation.
Pick some point X1 = (x1 , f (x1 )) that is also on the graph. Consider the line AX1 . What’s
its slope? Slope = Rise ÷ Run and so AX1 has slope $\frac{f(x_1) - f(a)}{x_1 - a}$.
This number serves as our first crude approximation of the slope of l.
How can we improve on this approximation? Simple: just pick some point X2 = (x2 , f (x2 ))
that is closer to A. The line AX2 has slope $\frac{f(x_2) - f(a)}{x_2 - a}$.
This number serves as our second, improved approximation of the slope of l.
At least in theory, we can keep repeating this procedure, by picking points that are ever
closer to A. Our estimates of the slope of l will get ever better. Altogether then, we are
motivated to make the following formal definition of the derivative:
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$
If this limit exists, then we say that f is differentiable at the point a ∈ D and we call this
limit the value of f ’s derivative at the point a ∈ D.
But if this limit does not exist, then we say that f is not differentiable at the point a ∈ D
and the value of f ’s derivative at the point a ∈ D is undefined or does not exist.
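The procedure of picking points ever closer to A can be sketched numerically. The example below assumes a hypothetical function f(x) = x² and the point a = 1 (neither is from the text above); for this choice the secant slopes should approach 2.

```python
def secant_slope(f, a, x):
    """Slope of the line through (a, f(a)) and (x, f(x)), for x != a."""
    return (f(x) - f(a)) / (x - a)


def square(x):
    """Hypothetical example function f(x) = x^2."""
    return x * x


a = 1.0
for k in range(1, 7):
    x = a + 10 ** -k  # a point X_k on the graph, ever closer to A
    print(f"x={x:.6f}  slope of AX = {secant_slope(square, a, x):.6f}")
```

For this particular f, the slope of AX works out algebraically to x + a, so the printed values shrink towards 2 as x approaches 1.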
Indeed, the value of f ’s derivative at any point a < 0 is −1, because for any a < 0,
$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{|x| - |a|}{x - a} = \lim_{x \to a} \frac{-x + a}{x - a} = \lim_{x \to a} -1 = -1.$$
So as x → 0, there is no one single value that the expression $\frac{f(x) - f(0)}{x - 0}$
approaches. So the limit does not exist.
Lagrange’s notation: $f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$.
Leibniz’s notation: $\left.\frac{df(x)}{dx}\right|_{x=a} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$ or $\left.\frac{df}{dx}\right|_{x=a} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$.
Newton’s notation: $\dot{f}(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$.
Some remarks.
But Newton’s notation is sometimes used in physics (especially when the independent
variable is time). You certainly need to know about Newton’s notation because it is on the
A-level syllabus. Nonetheless, this textbook will avoid using Newton’s notation.
• Leibniz’s notation is convenient in that it allows us to interpret $\frac{d}{dx}$ as the “differentiate
with respect to” operator.
Section 9.5 will give some examples of how this operator works.
Define ∆x to be equal to x − a, and ∆f (x) to be equal to f (x) − f (a), so that we can write:
$$\lim_{x \to a} \frac{\Delta f(x)}{\Delta x} = \left.\frac{df}{dx}\right|_{x=a} = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$
For a = −5, a = 2, and a = 0:
Leibniz’s notation: $\left.\frac{df}{dx}\right|_{x=-5} = -1$, $\left.\frac{df}{dx}\right|_{x=2} = 1$, $\left.\frac{df}{dx}\right|_{x=0}$ is undefined,
Newton’s notation: $\dot{f}(-5) = -1$, $\dot{f}(2) = 1$, $\dot{f}(0)$ is undefined.
$$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}.$$
It is tempting to naïvely interpret the expressions in the above equation as fractions, naïvely
apply simple algebra, naïvely cancel out the dy’s, so that the equation is indeed true. But
Another result is the Inverse Function Theorem, which may informally be stated as:
$$\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}}.$$
Again, the naïve interpretation would be of $\frac{dy}{dx}$ and $\frac{dx}{dy}$ as fractions, so that indeed by naïve
algebra, the above equation is true. But again, the correct informal interpretation (easily
seen when written in Leibniz’s notation) is this: “The change in y caused by a small unit
change in x” is equal to “The reciprocal of the change in x caused by a small unit change
in y”.
For a more detailed discussion, see the leading answer to this question on Math
StackExchange.
The Derivative of f
Lagrange’s notation: $f'$.
Leibniz’s notation: $\frac{df(x)}{dx}$ or $\frac{df}{dx}$.
Newton’s notation: $\dot{f}$.
For the next example, I assume you already know that $\frac{d}{dx} cx^2 = 2cx$ and $\frac{d}{dx} cx = c$.
Example 101. Let f ∶ R → R be defined by $f(x) = 7x^2$. Its derivative is the function
f ′ ∶ R → R defined by f ′ (x) = 14x. This derivative may be denoted
$f'$ or $\frac{df(x)}{dx}$ or $\frac{df}{dx}$ or $\dot{f}$.
The value of the derivative of f at 0.5 is $f'(0.5) = \left.\frac{df(x)}{dx}\right|_{x=0.5} = \left.\frac{df}{dx}\right|_{x=0.5} = \dot{f}(0.5) = 7$.
The value of the derivative of f at 1 is $f'(1) = \left.\frac{df(x)}{dx}\right|_{x=1} = \left.\frac{df}{dx}\right|_{x=1} = \dot{f}(1) = 14$.
The value of the derivative of f at 2 is $f'(2) = \left.\frac{df(x)}{dx}\right|_{x=2} = \left.\frac{df}{dx}\right|_{x=2} = \dot{f}(2) = 28$.
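Since f ′(x) = 14x, the values of the derivative at 0.5, 1, and 2 should be 7, 14, and 28. A minimal numerical sketch can confirm this; it uses a symmetric difference quotient with an arbitrarily chosen step h, which is an approximation technique and not the definition of the derivative.

```python
def f(x):
    """f(x) = 7x^2, as in Example 101."""
    return 7 * x ** 2


def central_difference(func, a, h=1e-6):
    """Approximate func'(a) by the symmetric difference quotient."""
    return (func(a + h) - func(a - h)) / (2 * h)


# Compare the numerical estimate with the exact value 14a.
for a in (0.5, 1, 2):
    print(a, round(central_difference(f, a), 4), 14 * a)
```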
The derivative is also known as the first derivative. The second derivative is, similarly, also
a function:
The Second Derivative of f
Lagrange’s notation: $f''$.
Leibniz’s notation: $\frac{d^2 f(x)}{dx^2}$ or $\frac{d^2 f}{dx^2}$.
Newton’s notation: $\ddot{f}$.
Under Leibniz’s notation, since $\frac{d}{dx}$ is the operator, it makes sense to denote the second
derivative of f by $\frac{d^2}{dx^2} f$ or $\frac{d^2 f}{dx^2}$.
$f''$ or $\frac{d^2 f(x)}{dx^2}$ or $\frac{d^2 f}{dx^2}$ or $\ddot{f}$.
The value of the second derivative of f at 0.5 is $f''(0.5) = \left.\frac{d^2 f(x)}{dx^2}\right|_{x=0.5} = \left.\frac{d^2 f}{dx^2}\right|_{x=0.5} = \ddot{f}(0.5) = 14$.
The value of the second derivative of f at 1 is $f''(1) = \left.\frac{d^2 f(x)}{dx^2}\right|_{x=1} = \left.\frac{d^2 f}{dx^2}\right|_{x=1} = \ddot{f}(1) = 14$.
The value of the second derivative of f at 2 is $f''(2) = \left.\frac{d^2 f(x)}{dx^2}\right|_{x=2} = \left.\frac{d^2 f}{dx^2}\right|_{x=2} = \ddot{f}(2) = 14$.
Leibniz’s notation: $\frac{d^3 f}{dx^3}$, $\frac{d^4 f}{dx^4}$, etc.
Newton’s notation: $\overset{3}{\dot{f}}$, $\overset{4}{\dot{f}}$, etc.
Example 101 (revisited). Let f ∶ R → R be defined by $x \mapsto 7x^2$. Its first derivative is the
function f ′ ∶ R → R defined by x ↦ 14x. Its second derivative is the function f ′′ ∶ R → R
defined by x ↦ 14. We have f ′ (2) = 28 and f ′′ (2) = 14.
Its third derivative is the function $f^{(3)}$ ∶ R → R defined by x ↦ 0. Its fourth derivative is
the function $f^{(4)}$ ∶ R → R defined by x ↦ 0. Observe that $f^{(3)} = f^{(4)}$. Indeed, the third and
all higher-order derivatives are identical functions: $f^{(3)} = f^{(4)} = f^{(5)} = \dots$
We have $f^{(3)}(2) = f^{(4)}(2) = f^{(5)}(2) = \dots = 0$. Indeed, for any x ∈ R, we have $f^{(3)}(x) =
f^{(4)}(x) = f^{(5)}(x) = \dots = 0$.
Example 102. “$\frac{d}{dx} x^2 = 2x$” is simply shorthand for this statement:
Example 103. “$\frac{d}{dx} f = g$” is simply shorthand for this statement:
Example 104. “$\frac{d}{dx} (f \cdot g) = g \cdot f' + f \cdot g'$” is simply shorthand for this statement:
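$\frac{d}{dx} k = 0$, $\frac{d}{dx} \sin x = \cos x$,
$\frac{d}{dx} (f \pm g) = f' \pm g'$, $\frac{d}{dx} \cos x = -\sin x$,
$\frac{d}{dx} (kf) = kf'$, $\frac{d}{dx} (f \cdot g) = g \cdot f' + f \cdot g'$,
$\frac{d}{dx} x^k = kx^{k-1}$, $\frac{d}{dx} \frac{f}{g} = \frac{g \cdot f' - f \cdot g'}{g \cdot g}$,
$\frac{d}{dx} e^x = e^x$, $\frac{d}{dx} (f \circ g) = \frac{d(f \circ g)}{dg} \cdot \frac{dg}{dx}$,
$\frac{d}{dx} \ln x = \frac{1}{x}$.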
(My mnemonic for the Quotient Rule is: “Lo-D-Hi minus Hi-D-Lo; cross over and square
the low.”)
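The Product and Quotient Rules can be spot-checked numerically. The sketch below assumes, purely for illustration, f = sin and g = exp evaluated at x = 0.7 (none of these choices come from the text above), and compares each rule with a symmetric difference quotient.

```python
import math


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Hypothetical example functions, with their known derivatives.
f, f_prime = math.sin, math.cos
g, g_prime = math.exp, math.exp

x = 0.7

# The two rules, evaluated exactly as stated in the list of rules.
product_rule = g(x) * f_prime(x) + f(x) * g_prime(x)
quotient_rule = (g(x) * f_prime(x) - f(x) * g_prime(x)) / (g(x) * g(x))

# Independent numerical estimates of the same derivatives.
product_numeric = central_difference(lambda t: f(t) * g(t), x)
quotient_numeric = central_difference(lambda t: f(t) / g(t), x)

print(product_rule, product_numeric)
print(quotient_rule, quotient_numeric)
```

Agreement at one point is of course not a proof; it merely catches sign slips such as swapping the two terms in the numerator of the Quotient Rule.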
Of the above rules, the Chain Rule is the most powerful. We can also write it more elegantly
(if a little imprecisely) as
$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.$$
As discussed above in the historical note (p. 116), thus written, the Chain Rule has a
beautiful informal interpretation: “The change in z caused by a small unit change in x” is
equal to “The change in z caused by a small unit change in y” × “The change in y caused
by a small unit change in x”. This makes perfect sense:
Altogether then, when I add 1 g of Milo (the x-variable) to a cup of water, I’d expect the
water level to rise by 0.6 cm. That is, $dz/dx = 0.6 \text{ cm g}^{-1}$. This is indeed consistent with
$$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} = 2 \times 0.3 = 0.6 \text{ cm g}^{-1}.$$
In case you’ve forgotten how it works, here are a few examples to illustrate:
Example 107. Let g ∶ R → R be defined by $x \mapsto \sqrt{4x - 1}$.
$$g'(x) = \frac{d\sqrt{4x - 1}}{dx} = \frac{d\sqrt{4x - 1}}{d(4x - 1)} \cdot \frac{d(4x - 1)}{dx} = 0.5 (4x - 1)^{-0.5} \cdot 4 = 2 (4x - 1)^{-0.5}.$$
Here’s a more complicated example, where the Chain Rule is applied twice.
Example 108. Let f ∶ R → R be defined by $x \mapsto [\sin(2x - 3) + \cos(5 - 2x)]^3$. Then
$$\begin{aligned}
f'(x) &= \frac{d [\sin(2x - 3) + \cos(5 - 2x)]^3}{dx} \\
&= \frac{d [\sin(2x - 3) + \cos(5 - 2x)]^3}{d[\sin(2x - 3) + \cos(5 - 2x)]} \cdot \frac{d[\sin(2x - 3) + \cos(5 - 2x)]}{dx} \\
&= 3 [\sin(2x - 3) + \cos(5 - 2x)]^2 \left[ \frac{d \sin(2x - 3)}{d(2x - 3)} \frac{d(2x - 3)}{dx} + \frac{d \cos(5 - 2x)}{d(5 - 2x)} \frac{d(5 - 2x)}{dx} \right] \\
&= 3 [\sin(2x - 3) + \cos(5 - 2x)]^2 [\cos(2x - 3) \cdot 2 - \sin(5 - 2x) \cdot (-2)] \\
&= 6 [\sin(2x - 3) + \cos(5 - 2x)]^2 [\cos(2x - 3) + \sin(5 - 2x)].
\end{aligned}$$
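With so many nested steps, it is easy to drop a sign. As a hedged sanity check (the sample points and step size are arbitrary), the following Python sketch compares the final expression for f ′(x) with a numerical estimate of the slope of f.

```python
import math


def f(x):
    """f(x) = [sin(2x - 3) + cos(5 - 2x)]^3, as in Example 108."""
    return (math.sin(2 * x - 3) + math.cos(5 - 2 * x)) ** 3


def f_prime(x):
    """The derivative obtained in Example 108 via the Chain Rule."""
    inner = math.sin(2 * x - 3) + math.cos(5 - 2 * x)
    return 6 * inner ** 2 * (math.cos(2 * x - 3) + math.sin(5 - 2 * x))


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


# The two columns should agree to several decimal places.
for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, f_prime(x), central_difference(f, x))
```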
Corollary 1. $\frac{d}{dx} \tan x = \sec^2 x$, $\frac{d}{dx} \cot x = -\csc^2 x$, and $\frac{d}{dx} \csc x = -\csc x \cot x$.
SYLLABUS ALERT
$\frac{d}{dx} \csc x = -\csc x \cot x$ is in the List of Formulae for 9758 (revised), but not for 9740 (old).
$\frac{d}{dx} \cot x = -\csc^2 x$ and $\frac{d}{dx} \csc x = -\csc x \cot x$.
(b) Assume that mass is constant. Explain why Newton’s Second Law then simplifies into
the more-familiar F = ma, where a is acceleration (i.e. the rate of change of velocity).
In other words, f is differentiable if and only if f ′ has the same domain as f . Similarly, f
is twice-differentiable if and only if f ′′ has the same domain as f .
The condition that the first derivative (or second derivative) exists at every point in the
domain is important. Failing which, we do not consider the function to be differentiable
(or twice-differentiable). The three functions in the next example illustrate:
$$g''(x) = \begin{cases} -2, & \text{for } x < 0, \\ 2, & \text{for } x > 0. \end{cases}$$
But g ′′ (0) does not exist. And so g is differentiable but NOT twice-differentiable.
$$h'(x) = \begin{cases} -2, & \text{for } x < 0, \\ 2, & \text{for } x > 0. \end{cases}$$
But h′ (0) does not exist. So h is not even once-differentiable. (And thus it is certainly not
twice-differentiable either.)
The function i is infinitely-differentiable, with the 6th and higher-order derivatives all hav-
ing the mapping rule x ↦ 0.
The function j is infinitely-differentiable, with every derivative simply being the same
function as j.
g is continuous — you can draw its entire graph without lifting your pencil. However, it is
not differentiable because of the “kink”.
[Figure: graphs of f (both continuous and differentiable), g (continuous, but not differentiable), and h (neither continuous nor differentiable).]
Example 113. Consider the equation $x^2 + y^2 = 1$. What is $\frac{dy}{dx}$?
Method #1. First write y in terms of x: $y = \pm\sqrt{1 - x^2}$. Then differentiate:
$$\frac{dy}{dx} = \pm \frac{-2x}{2\sqrt{1 - x^2}} = \pm \frac{-x}{\sqrt{1 - x^2}} = \frac{\mp x}{\sqrt{1 - x^2}}.$$
Method #2 (implicit differentiation). Directly apply $\frac{d}{dx}$ to the given equation:
$$\frac{d}{dx}(x^2 + y^2) = \frac{d}{dx}(1) \iff 2x + 2y \frac{dy}{dx} = 0 \implies \frac{dy}{dx} = -\frac{x}{y}.$$
If desired, we can plug in $y = \pm\sqrt{1 - x^2}$ to get the same answer as before:
$$\frac{dy}{dx} = -\frac{x}{\pm\sqrt{1 - x^2}} = \frac{\mp x}{\sqrt{1 - x^2}}.$$
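That the two methods agree can also be checked at a concrete point. The sketch below picks, as an arbitrary illustration, the point on the upper half of the circle with x = 0.6 (so y is about 0.8), where both methods should give a slope of about −0.75.

```python
import math


def upper_semicircle(x):
    """y = +sqrt(1 - x^2), the upper branch of x^2 + y^2 = 1."""
    return math.sqrt(1 - x * x)


def central_difference(func, x, h=1e-6):
    """Approximate func'(x) by the symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)


x = 0.6
y = upper_semicircle(x)  # approximately 0.8

implicit_slope = -x / y  # Method #2: dy/dx = -x/y
explicit_slope = central_difference(upper_semicircle, x)  # Method #1, estimated numerically

print(implicit_slope, explicit_slope)
```

On the lower branch the sign of y flips, and so does the sign of the slope, which is what the ∓ in the formula records.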
In the above example, the second method (implicit differentiation) is not obviously superior
to the first. However, it is sometimes difficult (or impossible) to express y in terms of x.
Nonetheless we might still want to compute dy/dx. In such cases, the method of implicit
differentiation is wonderful. The next example illustrates:
Example 114. Consider the equation $x^2 \sqrt{y} + \frac{y}{\cos x} = 1$. What is $\left.\frac{dy}{dx}\right|_{x=0}$ (i.e. what is $\frac{dy}{dx}$
when evaluated at x = 0)?
In this example, it’s difficult to express y in terms of x. But this doesn’t matter, because
we can use implicit differentiation:
Now plug in x = 0:
Corollary 2. $\frac{d}{dx} \sec x = \sec x \tan x$, $\frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$, $\frac{d}{dx} \cos^{-1} x = \frac{-1}{\sqrt{1 - x^2}}$, and
$\frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2}$.
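Proof. To prove that $\frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$, first rewrite $y = \sin^{-1} x$ as x = sin y. Next
apply $\frac{d}{dx}$ (implicit differentiation) to get $1 = \cos y \frac{dy}{dx}$. But $\sin^2 y + \cos^2 y = 1$, and cos y ≥ 0
because $y = \sin^{-1} x$ lies in [−π/2, π/2], so $\cos y = \sqrt{1 - x^2}$. And so,
$$\frac{dy}{dx} = \frac{d}{dx} \sin^{-1} x = \frac{1}{\cos y} = \frac{1}{\sqrt{1 - x^2}}.$$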
Exercise 62 asks you to prove that the derivatives of sec x, $\cos^{-1} x$ and $\tan^{-1} x$ are as claimed.
Exercise 62. Prove that $\frac{d}{dx} \sec x = \sec x \tan x$, $\frac{d}{dx} \cos^{-1} x = \frac{-1}{\sqrt{1 - x^2}}$, and $\frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2}$.
(Answer on p. 1019.)
Note: At x = 0, f is both decreasing and increasing, but neither strictly decreasing nor
strictly increasing. This follows from the formal definitions (below).
Definition 41. Given a function f and a set of points S, we say that f is ...
Exercise 63. Let g ∶ R → R be defined by x ↦ sin x. Identify the sets on which g is
increasing, decreasing, strictly increasing and/or strictly decreasing. (Answer on p. 1020.)
The derivative is the slope of the tangent. And so not surprisingly, the derivative is
intimately related to whether a function is increasing or decreasing. Formally:
1. If f (x) ≥ f (a) for all a ∈ D that are “close to” x, then we call x a maximum point of
f and f (x) a maximum value.
2. If f (x) ≤ f (a) for all a ∈ D that are “close to” x, then we call x a minimum point of
f and f (x) a minimum value.
3. If f (x) > f (a) for all a ∈ D (other than x itself) that are “close to” x, then we call x a strict maximum
point of f and f (x) a strict maximum value.
4. If f (x) < f (a) for all a ∈ D (other than x itself) that are “close to” x, then we call x a strict minimum
point of f and f (x) a strict minimum value.
Of course, a strict maximum point is also a maximum point. And a strict minimum point
is also a minimum point.
Any maximum or minimum point is also known as an extremum (plural: extrema) or an
extreme point.
²³ See p. 954 in the Appendices for the formal definitions.
[Figure: one graph with maximum points x = ±1 and minimum points x = 0, 2; another graph on which every point is a maximum point and every point is a minimum point.]
1. If f (a) ≥ f (x) for all x ∈ D, we call a the global maximum point of f and f (a) the global
maximum value.
2. If f (a) ≤ f (x) for all x ∈ D, we call a the global minimum point of f and f (a) the global
minimum value.
3. If f (a) > f (x) for all x ∈ D ∖ {a}, we call a the strict global maximum point of f and f (a) the
strict global maximum value.
4. If f (a) < f (x) for all x ∈ D ∖ {a}, we call a the strict global minimum point of f and f (a) the
strict global minimum value.
Fact 11. There cannot be more than one strict global maximum point of a function. (Sim-
ilarly, there cannot be more than one strict global minimum point of a function.)
Proof. Suppose for contradiction that two distinct points x1 and x2 are strict global maxi-
mum points of f . Then since x1 is a strict global maximum point, we have f (x1 ) > f (x2 ).
Similarly, since x2 is a strict global maximum point, we have f (x2 ) > f (x1 ). The two
inequalities are contradictory. So it is impossible that two distinct points x1 and x2 are
strict global maximum points of f .
x = ±1 are maximum points. However, they are not global maximum points. Indeed, h
has no global maximum point because $\lim_{x \to \infty} h(x) = \infty$ (“as x increases without bound, h(x)
also increases without bound”). In other words, there is no x such that h(x) ≥ h(a) for all
a ∈ R.
Similarly, x = 0, 2 are minimum points. However, they are not global minimum points.
Indeed, h has no global minimum point because $\lim_{x \to -\infty} h(x) = -\infty$ (“as x decreases without
bound, h(x) also decreases without bound”). In other words, there is no x such that
h(x) ≤ h(a) for all a ∈ R.
[Figure: graph of h, with maximum points x = ±1 and minimum points x = 0, 2.]
We next restrict the domain of h in two ways to create two new functions i and j:
i has three maximum points in total, namely ±1, 2.5. However, only 2.5 is a global maximum
point of i because only i(2.5) ≥ i(x) for all x ∈ [−1.5, 2.5]. Of course, it is also a strict global
maximum point because i(2.5) > i(x) for all other x ∈ [−1.5, 2.5].
i has three minimum points in total, namely −1.5, 0, 2. However, only −1.5 is a global
minimum point of i because only i(−1.5) ≤ i(x) for all x ∈ [−1.5, 2.5]. Of course, it is also
a strict global minimum point because i(−1.5) < i(x) for all other x ∈ [−1.5, 2.5].
[Figure: graphs of i (left) and j (right), with their maximum, minimum, global maximum, and global minimum points labelled.]
Also graphed above (right) is the function j ∶ [−1.2, 2.2] → R defined by $x \mapsto 6x^5 - 15x^4 -
10x^3 + 30x^2$.
Again, there are three maximum points in total, namely ±1, 2.2. However, only −1 is a
global maximum point of j because only j(−1) ≥ j(x) for all x ∈ [−1.2, 2.2]. Of course, it is
also a strict global maximum point because j(−1) > j(x) for all other x ∈ [−1.2, 2.2].
And again, there are three minimum points in total, namely −1.2, 0, 2. However, only 2 is
a global minimum point of j because only j(2) ≤ j(x) for all x ∈ [−1.2, 2.2]. Of course, it is
also a strict global minimum point because j(2) < j(x) for all other x ∈ [−1.2, 2.2].
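The global extrema claimed for i and j can be checked by evaluating the mapping rule at the candidate points. The sketch below rests on two assumptions that are stated here because the text does not spell them out: that i has the same mapping rule as j (only the domain differs), and that the stationary points are −1, 0, 1, 2, which follows from the factorisation 30x(x − 2)(x − 1)(x + 1) of the derivative of this rule.

```python
def h(x):
    """The mapping rule x -> 6x^5 - 15x^4 - 10x^3 + 30x^2 given for j."""
    return 6 * x ** 5 - 15 * x ** 4 - 10 * x ** 3 + 30 * x ** 2


def global_extrema(candidates):
    """Return (global max point, global min point) among the candidate points."""
    return max(candidates, key=h), min(candidates, key=h)


# Candidates: the stationary points inside each domain, plus the two endpoints.
i_candidates = [-1.5, -1, 0, 1, 2, 2.5]  # assumed: i uses the same rule on [-1.5, 2.5]
j_candidates = [-1.2, -1, 0, 1, 2, 2.2]  # j has domain [-1.2, 2.2]

print(global_extrema(i_candidates))
print(global_extrema(j_candidates))
```

Comparing only these candidates is enough here because the rule is a polynomial on a closed interval, so its global extrema can only occur at stationary points or endpoints.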
Nonetheless, these concepts are not difficult to grasp. It is thus well worth learning them,
just so you have a better understanding of how to find maximum and minimum points.
Note also that what we simply call maximum and minimum points are sometimes instead
called local maximum and minimum points, so that they are better contrasted with global
maximum or minimum points.
Exercise 64. (Answer on p. 1020.) For each of the following functions, write down,
if any of these exist, the (i) maximum points, (ii) minimum points, (iii) strict maximum
points, (iv) strict minimum points, (v) global maximum points, (vi) global minimum points,
(vii) strict global maximum points, (viii) strict global minimum points; and also all the
corresponding values of the function at these points.
(a) f ∶ R → R defined by x ↦ 100.
(b) g ∶ R → R defined by $x \mapsto x^2$.
(c) h ∶ [1, 2] → R defined by $x \mapsto x^2$.
Definition 44. A turning point is any point that is both a stationary point and a maximum
or minimum point.
Type A B C D E
Max ✓ ✓
Min ✓ ✓
Strict Max ✓ ✓
Strict Min ✓ ✓
Global Max ✓
Global Min ✓
Strict Global Max ✓
Strict Global Min ✓
Stationary ✓ ✓ ✓
Turning ✓ ✓
Exercise 65. Is each of the following statements true or false? To show that a statement
is false, simply give a counterexample from the above example. If it is true, explain why.
(Answer on p. 1021.)
(a) Every maximum point or minimum point is a stationary point.
(b) Every maximum point or minimum point is a turning point.
(c) Every stationary point is a maximum point or minimum point.
(d) Every turning point is a maximum point or minimum point.
(e) Every turning point is a stationary point.
(f) Every stationary point is a turning point.
Example 120. Consider the set S = [0, 1]. The points 0.2, 1/3, and 0.775 are all in the
interior of S. Indeed, every point x ∈ (0, 1) is in the interior of S.
In contrast, the points 0 and 1 are non-interior points of S.
Example 121. Consider the set S = [0, 0.5) ∪ (0.5, 1]. The points 0.2, 1/3, and 0.775 are all
in the interior of S. Indeed, every point x ∈ (0, 0.5) ∪ (0.5, 1) is in the interior of S.
In contrast, the points 0 and 1 are non-interior points of S.
The point 0.5 is not in the interior of S. It is not even a non-interior point of S, because
it is not in the set S to begin with.
In order for 1 to be a maximum point of f , it must be that to its left, f is increasing; while
to its right, f is decreasing. In other words, to the left of 1, f ′ (x) ≥ 0. While to the right of
1, f ′ (x) ≤ 0. Altogether then, we must have f ′ (1) = 0 — at the maximum point, the slope
of the function must be 0.
Exercise 66. Refer to the above Example. Explain the intuition for why g ′ (−1) = 0.
(Answer on p. 1021.)
In secondary school, you may have been taught that to find the maximum and minimum
points of f , simply follow this procedure:
Unfortunately, the above procedure (let’s call it the Incorrect Recipe) may sometimes
fail. It rests on the false belief that “f ′ (x) = 0 ⇐⇒ x is an extremum”. This is false
because
1. The IET does NOT say, “f ′ (x) = 0 ⟹ x is an extremum.” It is perfectly possible that
f ′ (x) = 0 without x being an extremum.
2. The IET does NOT say, “x is an extremum ⟹ f ′ (x) = 0.” Instead, it says, “x is an
extremum AND an interior point ⟹ f ′ (x) = 0.” Thus, it is perfectly possible that x
is an extremum without f ′ (x) = 0.
The Incorrect Recipe does correctly identify the points B = (−1, f (−1)) and
$C = \left(-\frac{3}{5}, f\left(-\frac{3}{5}\right)\right)$ as maximum and minimum points, respectively. But it makes two mistakes.
Mistake #1: D = (0, 0) is neither a maximum nor a minimum point, contrary to the
Incorrect Recipe.
Mistake #2: A and E are respectively a minimum and a maximum point, but neither is
detected by the Incorrect Recipe.
We now give the Correct Recipe for finding maximum and minimum points:
1. The Correct Recipe demands that you also check the non-interior points, which may
possibly be extrema, but may be overlooked by the Incorrect Recipe.
2. The Correct Recipe does not assume that every single one of our shortlist of points (the
stationary points and the non-interior points) is either a maximum point or a minimum
point. It allows for the possibility that some of these points could be neither.
Example 122. Consider f ∶ [−1, 1] → R defined by $x \mapsto x^3$. Let’s apply the Correct Recipe.
Altogether, we conclude that −1 is the only minimum point and 1 is the only maximum.
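The idea of shortlisting the stationary and non-interior points, and then checking each one, can be sketched in Python for this example. This is a rough numerical stand-in for sketching the graph and is not the recipe itself: the helper `classify`, its sampling radius, and the labels it returns are all assumptions made for illustration.

```python
def f(x):
    """f(x) = x^3 on the domain [-1, 1], as in Example 122."""
    return x ** 3


LOWER, UPPER = -1.0, 1.0


def classify(point, radius=1e-3, samples=200):
    """Compare f(point) with f at nearby points of the domain."""
    offsets = [radius * (2 * k / samples - 1) for k in range(samples + 1)]
    nearby = [point + d for d in offsets if d != 0]
    nearby = [x for x in nearby if LOWER <= x <= UPPER]  # stay inside the domain
    is_max = all(f(point) >= f(x) for x in nearby)
    is_min = all(f(point) <= f(x) for x in nearby)
    if is_max and is_min:
        return "both"
    if is_max:
        return "maximum"
    if is_min:
        return "minimum"
    return "neither"


# Shortlist: the stationary point (3x^2 = 0 only at x = 0) and the two non-interior points.
shortlist = [0.0, LOWER, UPPER]
results = {p: classify(p) for p in shortlist}
print(results)
```

The stationary point 0 comes out as neither a maximum nor a minimum, while the two endpoints are picked up as extrema.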
Exercise 68. For each of the following functions, find all the maximum and minimum
points using the Correct Recipe. (Answer on p. 1022.)
(a) f ∶ R → R defined by x ↦ x. (b) g ∶ [0, 1] → R defined by x ↦ x.
(c) h ∶ R → R defined by $x \mapsto x^4 - 2x^2$. Identify (if any) the global minimum point(s).
[Figure: graph of f, with the tangent line at x = 0 and the regions on which f is concave upwards and concave downwards.]
In contrast, f is concave upwards on R+0 because there, the line segment connecting any
two points on f is above the graph of f .
0 is an inflexion point because this is where the function f changes from being concave
downwards to being concave upwards.
A test for whether a point is an inflexion point is this: Draw the tangent line to the graph
at that point. The point is an inflexion point ⇐⇒ The line is above the graph on one side
of the point and below the graph on the other side (see Fact 95 in the Appendices).
The tangent line to the graph at the point 0 is drawn in green (it coincides with the
horizontal axis). We indeed see that the line is above the graph on the left side of the point
and below the graph on the right side of the point. Therefore, 0 is an inflexion point.
²⁴ These are informal definitions. For the formal definitions, see p. 956 in the Appendices (optional).
$f'(x) = 3x^2$ and
$$f''(x) = 6x \begin{cases} < 0, & \text{for } x \in R^-, \\ = 0, & \text{for } x = 0, \\ > 0, & \text{for } x \in R^+. \end{cases}$$
But this is wrong! It is perfectly possible that f ′′ (x) = 0 without x being an inflexion
point! Here’s an example:
[Figure: graph of g, which is concave upwards everywhere.]
Definition 46. A stationary point of inflexion is simply any point that is both an inflexion
point and a stationary point.
A non-stationary point of inflexion is simply any point that is an inflexion point, but not
a stationary point.
Also illustrated is the tangent line at 0, namely y = x (whose slope is indeed non-zero). Observe that
indeed, to the left of 0, the tangent line is above the graph; while to the right of 0, the
tangent line is below the graph. This serves as a second way to verify that 0 is a point of
inflexion.
[Figure: graph with the tangent line at 0 and the regions on which the function is concave upwards and concave downwards.]
From graphs, it looks like around a maximum turning point a, f must be concave
downwards, i.e. f ′′ (a) < 0. Similarly, around a minimum turning point b, f must be concave
upwards, i.e. f ′′ (b) > 0. The next proposition is thus intuitively plausible.
The third part of the above Proposition must be heavily emphasised: If f ′ (a) = 0 and
f ′′ (a) = 0, then the 2DT tells us absolutely nothing about a! a could be a maximum
point, a minimum point, an inflexion point, or something else altogether!
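To see that these outcomes really can occur, consider three illustrations that are not taken from the text above: x ↦ x⁴, x ↦ −x⁴, and x ↦ x³. Each has first and second derivative equal to 0 at the point 0, yet 0 is respectively a minimum point, a maximum point, and an inflexion point. The Python sketch below checks the maximum and minimum part of this claim by sampling near 0; the sampling radius is arbitrary, and sampling is not a proof.

```python
EXAMPLES = {
    "x^4": lambda x: x ** 4,
    "-x^4": lambda x: -(x ** 4),
    "x^3": lambda x: x ** 3,
}


def behaviour_at_zero(f, radius=0.1, samples=50):
    """Compare f(0) with f at sampled points on both sides of 0."""
    offsets = [radius * k / samples for k in range(1, samples + 1)]
    nearby = [f(-d) for d in offsets] + [f(d) for d in offsets]
    if all(value > f(0) for value in nearby):
        return "strict minimum"
    if all(value < f(0) for value in nearby):
        return "strict maximum"
    return "neither maximum nor minimum"


for name, func in EXAMPLES.items():
    print(name, behaviour_at_zero(func))
```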
We previously gave the Correct Recipe for finding maximum and minimum points. Let’s
now add the 2DT to this recipe:
(a) Check if each of these points is a maximum point, a minimum point, or neither.
If f is not twice-differentiable, then the Enriched Recipe may not work.
Fortunately, most functions in A-levels are twice-differentiable.
2. The only two non-interior points are −1.5 and 0.5. Again by sketching the graph, we see
that −1.5 is a minimum point and 0.5 is a maximum point.
Altogether, we conclude that there are two maximum points — −1 and 0.5 — and two
minimum points — −0.6 and −1.5.
Exercise 69. Use the Enriched Recipe to find the maximum and minimum points of each
of the following functions. (Answer on p. 1024.)
(a) g ∶ R → R defined by x ↦ x⁸ + x⁷ − x⁶.
(b) h ∶ (−π/2, π/2) → R defined by x ↦ tan x.
(c) i ∶ [0, 2π] → R defined by x ↦ sin x + cos x.
The Venn diagram below depicts the five types of points you need to know for the A-levels:
Inflexion, maximum, minimum, stationary, and turning points. To its right is a graph of a
rather-arbitrary function t ∶ D → R designed to illustrate these various points. The x- and
y-coordinates of a are denoted ax and ay ; similarly for other points.
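[Figure: Venn diagram of the five types of points (inflexion, maximum, minimum, stationary, and turning), with example points a to j placed in its regions. To its right, a graph of t on which the same points a to j are marked.]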
• For most functions you’ll ever encounter, most points are like a. For lack of a better
name, we can call such points boring points — a boring point is simply any point that
is not an inflexion, maximum, minimum, stationary, or turning point.
• b is a non-stationary point of inflexion (explicitly excluded from the A-levels).
• c is a stationary point of inflexion.
• A point like d (not illustrated) — a stationary point that is not a maximum, minimum,
or inflexion point — is extremely unusual. You can find an exotic example on p. 960.
• f is both a maximum and minimum point because for all x ∈ D that are “close to”
fx ∈ D, we have t(x) ≤ t (fx ) ≤ t(x).
• The set of turning points is simply the intersection of the set of stationary points and
the set of maximum and minimum points.
• h is a maximum point because t(x) ≤ t (hx ) for all x ∈ D that are “close to” hx .
• j is a minimum point because t(x) ≥ t (jx ) for all x ∈ D that are “close to” jx .
• i is both a maximum and minimum point because there are simply no x ∈ D that are
“close to” ix ∈ D, and thus it is trivially or vacuously true that t(x) ≤ t (ix ) ≤ t(x) for x
that are “close to” ix .25 i is not a stationary point because t′ (ix ) ≠ 0; indeed, t′ (ix ) is
undefined.26
25
A point like ix ∈ D that is not “close to” any other x ∈ D is, aptly enough, called an isolated point.
26
ix is an example of a critical point. A critical point is any point that is either stationary or where the derivative is
undefined. Don’t worry, not something you need to know for the A-levels.
Given the graph of f ′ , you are required to know how to figure out what f looks like. Let’s
start with a very simple example.
The derivative simply gives the slope of f . Since f ′ (x) = 1 for all x, this means that f has
constant slope of 1. We are given moreover that f (0) = 2 (i.e. the vertical intercept is 2).
Altogether then, f (x) = x + 2 and is graphed in red above.
The derivative simply gives the slope of g. Since g′(x) = −1 for all x < 0 and g′(x) = 1 for
all x > 0, this means that g has constant slope of −1 for x < 0 and constant slope of 1 for
x > 0. We are given moreover that lim_{x→0} g(x) = −2, so the two branches of g nearly meet
at (0, −2), with a hole there. Altogether then,
\[ g(x) = \begin{cases} -x - 2, & \text{for } x < 0, \\ x - 2, & \text{for } x > 0. \end{cases} \]
The derivative simply gives the slope of h. Since h′(x) < 0 for all x < 0, h′(0) = 0, and
h′(x) > 0 for all x > 0, this means that h is strictly decreasing on R−, has a turning point
at 0, and is strictly increasing on R+.
Altogether then, even if we don’t know how to figure out what h(x) is, we can at least
roughly sketch the graph of h (in red). (Of course, you probably already know
from secondary school that h(x) = x²/2, but we’re not supposed to know this until we learn
about integration later in this textbook.)
Quadratic equations show up very often in various contexts. So here is a fairly complete if
brisk review of quadratic equations, which you were supposed to have completely mastered
in secondary school.
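\[ ax^2 + bx + c = a\left(x^2 + \frac{b}{a}x + \frac{c}{a}\right). \]
\[ x^2 + \frac{b}{a}x = \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}. \]
\[ \text{Hence, } ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2} + \frac{c}{a}\right] = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right]. \]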
What we just did above is called completing the square. We can use this to compute the
zeros or roots of the equation ax2 + bx + c = 0.
\begin{align*}
ax^2 + bx + c = 0 &\iff a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right] = 0 \\
&\iff \left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \\
&\iff x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\end{align*}
This last expression gives the roots of the equation ax² + bx + c = 0. This expression will
NOT be printed in the A-Level List of Formulae! So be sure you remember it!
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
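As a quick check of the formula, here is a minimal sketch that computes the real roots from a, b, and c. The function name `quadratic_roots` is our own, and returning an empty list when b² − 4ac < 0 is a design choice made for this illustration.

```python
import math

def quadratic_roots(a, b, c):
    """Real roots of ax^2 + bx + c = 0 (a != 0), as a sorted list."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = b * b - 4 * a * c          # the quantity under the square root
    if disc < 0:
        return []                     # no real roots
    if disc == 0:
        return [-b / (2 * a)]         # one repeated root
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

print(quadratic_roots(1, -3, 2))   # x^2 - 3x + 2
print(quadratic_roots(1, 2, 1))    # x^2 + 2x + 1
print(quadratic_roots(1, 0, 1))    # x^2 + 1
```

The three calls illustrate the cases of two real roots, one repeated root, and no real roots.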
Category                    Features
1. a > 0, b² − 4ac > 0      ∪-shaped. Intersects the horizontal axis at two points.
2. a > 0, b² − 4ac = 0      ∪-shaped. Just touches the horizontal axis at the minimum point.
3. a > 0, b² − 4ac < 0      ∪-shaped. Doesn’t intersect the horizontal axis.
4. a < 0, b² − 4ac > 0      ∩-shaped. Intersects the horizontal axis at two points.
5. a < 0, b² − 4ac = 0      ∩-shaped. Just touches the horizontal axis at the maximum point.
6. a < 0, b² − 4ac < 0      ∩-shaped. Doesn’t intersect the horizontal axis.
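The six categories depend only on the sign of a and the sign of b² − 4ac. The following sketch makes that explicit; the function name `classify_parabola` and its return format are assumptions made for illustration.

```python
def classify_parabola(a, b, c):
    """Return (shape, number of points shared with the horizontal axis)
    for the graph of y = ax^2 + bx + c, where a != 0."""
    if a == 0:
        raise ValueError("a must be non-zero")
    shape = "∪-shaped" if a > 0 else "∩-shaped"
    disc = b * b - 4 * a * c
    if disc > 0:
        points = 2      # crosses the axis twice
    elif disc == 0:
        points = 1      # just touches the axis at its turning point
    else:
        points = 0      # never meets the axis
    return shape, points

print(classify_parabola(1, -3, 2))
print(classify_parabola(-1, 2, -1))
print(classify_parabola(3, 0, 5))
```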
\[ \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
– Moreover, we can write
\[ ax^2 + bx + c = a\left(x - \frac{-b + \sqrt{b^2 - 4ac}}{2a}\right)\left(x - \frac{-b - \sqrt{b^2 - 4ac}}{2a}\right). \]
What we have just done is to factorise the expression ax2 + bx + c. Factorisation is often
a useful trick to play.
Notice that if you plug in either of the roots into the right hand side (RHS) of the above
equation, we do indeed get zero, as expected.
\[ ax^2 + bx + c = a\left(x - \frac{-b}{2a}\right)^2 = a\left(x + \frac{b}{2a}\right)^2. \]
– Notice that if you plug x = −b/(2a) into the RHS of the above equation, we do indeed get
zero, as expected.
Exercise 71. For each of the following equations, sketch its graph and identify its intercepts
and turning points (if these exist). (a) y = 2x² + x + 1. (b) y = −2x² + x + 1. (c) y = x² + 6x + 9.
(Answer on p. 1028.)
15.1 y = f (x) + a
The graph of y = f (x) + a is simply the graph of y = f (x) translated (moved) upwards by
a units.
The graph of y = af (x) is simply the graph of f (x) vertically-stretched (outwards from the
horizontal axis) by a stretching factor of a.
The graph of y = f (ax) is simply the graph of f (x) horizontally-stretched (outwards from
the vertical axis) by a stretching factor of 1/a. Or equivalently, the graph of y = f (ax) is
simply the graph of f (x) horizontally-compressed (inwards towards the vertical axis) by a
compression factor of a.
Why a stretching factor of 1/a (and not a)? The reason is that in order for f (x1 ) and
f (ax2 ) to hit the same value, we must have x2 = x1 /a. That is, every x value is scaled by
a factor of 1/a.
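A small numerical check of this claim is sketched below. The particular function f and the value a = 2 are arbitrary choices made for illustration. For each x1, the point x2 = x1/a on the graph of y = f(ax) has the same height as the point x1 on the graph of y = f(x).

```python
def f(x):
    return x**2 - 3 * x   # an arbitrary example function

a = 2.0

def g(x):
    return f(a * x)       # the graph of y = f(ax)

for x1 in [-1.0, 0.5, 3.0]:
    x2 = x1 / a           # every x value is scaled by a factor of 1/a
    print(x1, f(x1), x2, g(x2))
```

In each printed row the second and fourth values agree, which is exactly the statement that the graph has been scaled horizontally by 1/a.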
Equivalently, the blue curve is simply the red curve compressed horizontally (inwards
towards the vertical axis) by a factor of 2.
Notice the green curve is simply the red curve stretched horizontally (outwards from
the vertical axis) by a factor of 1/1.1 and then translated downwards by 1 unit.
The graph of y = ∣f (x)∣ is simply the graph of f (x), but with all points for which f (x) < 0
reflected in the horizontal axis.
The graph of y = f (∣x∣) is simply the graph of f (x), but with all points for which x < 0
reflected in the vertical axis.
Exercise 73. Describe a series of transformations that would transform the graph of y = 1/x
to y = 3 − 1/(5x − 2). (Answer on p. 1030.)
Conic sections are formed from the intersection of a double cone and a 2D cartesian plane.
Take an infinitely large double cone (it goes upwards and downwards forever). Use a 2D
cartesian plane to slice the double cone from all conceivable positions and at all conceivable
angles. The intersection of the plane and the surface of the double cone form curves which,
aptly enough, are called conic sections.
The figure below27 doesn’t show the upper half of the double cone, but you can easily
imagine it. Of the four curves depicted, only the hyperbola also cuts the upper half of the
double cone.
27
Taken from Wikipedia, which has an excellent page on conic sections.
We can prove (but do not do so in this textbook) that in general, a conic section is the
graph of the equation
\[ Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0, \tag{1} \]
where A, B, C, D, E, F are real constants and x and y are the two variables (on the
cartesian plane).
In secondary school, we already learnt in some detail a special case of conic sections: the
quadratic y = ax² + bx + c. This is the special case of equation (1) where
A = a, B = 0, C = 0, D = b, E = −1, and F = c.
28
Strictly speaking, there are also the so-called degenerate conic sections, but we shall ignore these.
1. x²/a² + y²/b² = 1,
2. x²/a² − y²/b² = 1,
3. y²/b² − x²/a² = 1,
4. y = (ax + b)/(cx + d),
5. y = (ax² + bx + c)/(dx + e).
Exercise 74. As per the general form Ax² + Bxy + Cy² + Dx + Ey + F = 0, state for each of
the above five equations what A, B, C, D, E, and F are. Compute the discriminant for each
equation. Hence, conclude that the first equation is of an ellipse and the remaining four are
of hyperbolae.
(Answer on p. 1031.)
The equation x²/a² + y²/b² = 1 describes an ellipse. In this section, we’ll study a special
case of this equation, where a = b = 1. The equation then becomes x² + y² = 1, which is the
unit circle centred on the origin.
By unit circle, we mean that it has radius of unit length, i.e. length 1.
1. Intercepts. The graph intersects the vertical axis at the points (0, −1) and (0, 1) and
the horizontal axis at the points (−1, 0) and (1, 0).
2. Turning points. In this case, it is easy to see that there is a maximum turning point
at (0, 1) and a minimum turning point at (0, −1). But just as an exercise, let’s also try
to find these turning points more rigorously, i.e. through calculus.
So the only stationary point of the function f is 0. We must now determine whether it is
a maximum, minimum, or inflexion point.
Compute the second derivative and evaluate it at the stationary point:
This second derivative is messy and can be further simplified, but in this case there is no
need to simplify it, since all we want is to evaluate it at 0. We have
Squares are a proper subset of rectangles. Similarly, circles are a proper subset of ellipses.
The ellipse can be regarded as the generalisation of the circle.
Why does the equation x²/a² + y²/b² = 1 describe an ellipse? Rewrite the equation as
\[ \left(\frac{x}{a}\right)^2 + \left(\frac{y}{b}\right)^2 = 1. \]
1. First, stretch the graph horizontally, outwards from the vertical axis, by a factor of a.
2. Then stretch the graph vertically, outwards from the horizontal axis, by a factor of b.
1. Intercepts. The graph intersects the vertical axis at the points (0, −b) and (0, b), and
the horizontal axis at the points (−a, 0) and (a, 0).
2. Turning points. Clearly, there are maximum and minimum turning points at (0, b)
and (0, −b). Let’s find these rigorously using calculus.
These are graphed above. Let’s compute the first derivative of f and set it equal to 0:
\[ f'(x) = 0.5|b|\left(1 - \frac{x^2}{a^2}\right)^{-0.5}\left(\frac{-2x}{a^2}\right) = -|b|x\left(1 - \frac{x^2}{a^2}\right)^{-0.5}a^{-2} = 0 \implies x = 0. \]
So the only stationary point of the function f is 0. We can show that it is a maximum
point, by computing the second derivative and evaluating it at 0:
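\begin{align*}
f''(x) &= \frac{d}{dx}\left[-|b|x\left(1 - \frac{x^2}{a^2}\right)^{-0.5}a^{-2}\right] = -a^{-2}|b|\frac{d}{dx}\left[x\left(1 - \frac{x^2}{a^2}\right)^{-0.5}\right] \\
&= -a^{-2}|b|\left[\left(1 - \frac{x^2}{a^2}\right)^{-0.5} - 0.5x\left(1 - \frac{x^2}{a^2}\right)^{-1.5}\left(\frac{-2x}{a^2}\right)\right], \\
f''(0) &= -a^{-2}|b|\left[\left(1 - \frac{0^2}{a^2}\right)^{-0.5} - 0.5(0)\left(1 - \frac{0^2}{a^2}\right)^{-1.5}\left(\frac{-2(0)}{a^2}\right)\right] = -a^{-2}|b| < 0.
\end{align*}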
And since g = −f , g ′ (0) = 0 and g ′′ (0) = a−2 ∣b∣ > 0. That is, the only stationary point of g is
(0, −b) and it is a minimum point.
\[ \frac{(x + c)^2}{a^2} + \frac{(y + d)^2}{b^2} = 1. \]
(i) Sketch its graph. (ii) Write down the points at which it intersects the axes. (iii) Identify
any turning points. (iv) Write down the equations of any lines of symmetry and also (v)
asymptotes.
y = 1/x (graphed) is the first hyperbola we’ll study. It is also the simplest possible hyperbola.
[Figure: graph of y = 1/x, which has two branches. Labelled are the centre (0, 0), the horizontal asymptote y = 0, the vertical asymptote x = 0, and the lines of symmetry y = x and y = −x.]
It turns out that all hyperbolae we’ll study have some common features. They have two
branches. In the case of y = 1/x, one branch is top-right and the other is bottom-left.
5. Two lines of symmetry. Both pass through the centre. Moreover, each line of sym-
metry bisects an angle formed by the two asymptotes.
In the case of y = 1/x, they are y = x and y = −x.
x2 − y 2 = 1 is a hyperbola and so it has two distinct branches. Notice also that if x ∈ (−1, 1),
then there is no value of y for which x2 − y 2 = 1. Hence, the graph of this equation is empty
in the region where x ∈ (−1, 1).
1. Intercepts. The graph crosses the horizontal axis at the points (−1, 0) and (1, 0), but
does not intersect the vertical axis.
2. Turning points. There are none: y = ±√(x² − 1) gives dy/dx = ±x/√(x² − 1), which could
only be zero at x = 0, and the graph has no points there.
3. Asymptotes. We have y = ±√(x² − 1). So as x → ∞, y = ±√(x² − 1) → ±√(x²) = ±x.
(Informally, as x → ∞, the 1 becomes negligible and we can simply ignore it.) And so
the two asymptotes are y = x and y = −x. The two asymptotes are perpendicular and
so this is a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (0, 0).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes.
So they must have slope 1 and −1. Moreover, both pass through the centre (0, 0).
Altogether, we can work out that the lines of symmetry are y = x and y = −x.
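The asymptote claim in item 3 can be checked numerically. The sketch below (the function name `upper_branch` is our own) measures the vertical gap between the line y = x and the upper half of the graph of x² − y² = 1 as x grows.

```python
import math

def upper_branch(x):
    """y-value on the upper half of x^2 - y^2 = 1, valid for |x| >= 1."""
    return math.sqrt(x * x - 1)

for x in [1, 10, 100, 1000]:
    gap = x - upper_branch(x)   # vertical distance between y = x and the curve
    print(x, gap)
```

The gap keeps shrinking towards 0, which is what it means for y = x to be an asymptote.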
\[ \left(\frac{x}{a}\right)^2 - \left(\frac{y}{b}\right)^2 = 1 \]
1. First stretch the graph horizontally, outwards from the vertical axis, by a factor of a.
2. Then stretch the graph vertically, outwards from the horizontal axis, by a factor of b.
The graph’s characteristics are similar to before. Again, (x/a)² − (y/b)² = 1 is a hyperbola
and so it has two distinct branches. Notice also that if x ∈ (−a, a), then there is no value
of y for which x²/a² − y²/b² = 1. Hence, the graph of this equation is empty in the region
where x ∈ (−a, a).
Exam Tip
On the A-level exams, they typically only ask for (i) the intercepts; (ii) the asymptotes;
and (iii) turning points.
Nonetheless, you might as well know about the centre and the two lines of symmetry,
because these concepts are not difficult and will help you to sketch better graphs.
SYLLABUS ALERT
y 2 /b2 − x2 /a2 = 1 is explicitly in the 9758 (revised) but not the 9740 (old) syllabus.
But even if you’re taking 9740, you might as well learn to draw y 2 /b2 − x2 /a2 = 1, because
it’s really simple (since you now know how to draw x2 /a2 − y 2 /b2 = 1).
The graph of the equation y²/b² − x²/a² = 1 is simply the graph we studied in the previous
section, but rotated π/2 clockwise (or anticlockwise).
Let’s summarise the graph’s characteristics. This is a hyperbola and so there are two
distinct branches. Notice also that if y ∈ (−b, b), then there is no value of x for which
y²/b² − x²/a² = 1. Hence, the graph of this equation is empty in the region where
y ∈ (−b, b). The range of y is thus (−∞, −b] ∪ [b, ∞).
If we’d like, we can also find the turning points of y²/b² − x²/a² = 1 more rigorously, that
is, through calculus. As with the circle, although it is not possible to rewrite this equation
into the form of a single function, it is possible to rewrite it into the form of two functions.
Namely, f ∶ R → R defined by x ↦ ∣b∣√((x/a)² + 1) and g ∶ R → R defined by
x ↦ −∣b∣√((x/a)² + 1). (Every real x is allowed here, because (x/a)² + 1 is always positive.)
The graph of the function f is entirely above the horizontal axis, while that of g is entirely
below the horizontal axis.
\[ f'(x) = 0.5|b|\left[\left(\frac{x}{a}\right)^2 + 1\right]^{-0.5}\left(\frac{2x}{a^2}\right) = \frac{|b|}{a}\cdot\frac{x}{\sqrt{x^2 + a^2}}. \]
Hence, the only stationary point of f is (0, b). Let’s check what sort of a stationary point
this is.
\[ f''(x) = \frac{|b|}{a}\cdot\frac{\sqrt{x^2 + a^2} - x(0.5)\left(x^2 + a^2\right)^{-0.5}(2x)}{x^2 + a^2}. \]
And so f′′(0) = ∣b∣/a² > 0. Hence, this is a minimum point.
Similarly, by computing the first derivative of g and doing the work, we can find that the
only stationary point of g is (0, −b) and that this is a maximum point.
Remember long division? Turns out it’ll be useful for dividing polynomials. Here are a
couple primary school examples to jog your memory.
11
7 83
77
6
The quotient is the integer portion of the solution and the remainder is the “left-over”
integer.
Example 140. What’s 470 ÷ 17? By long division, the quotient is 27 with a remainder of
11. So, 470 ÷ 17 = 27 + 11/17.
27
17 470
459
11
In this textbook, we’ll almost always consider only polynomials in one variable. So when
I say polynomial, I’ll always mean a polynomial in one variable, unless otherwise stated.30
Example 141. The expressions 7x − 3 and 4x + 2 are 1st-degree polynomials (in one vari-
able). These are also called linear polynomials. (Polynomials of low degree are often also
called by such special names.)
Example 142. The expressions 3x2 + 4x − 5 and −x2 + 2x + 1 are 2nd-degree polynomials.
These are also called quadratic polynomials.
Example 143. The expressions 2x³ + 2x² + 3x − 1 and −3x³ + 2x² + 3x + 1 are 3rd-degree
polynomials. These are also called cubic polynomials.
Example 144. The expressions 5x4 − 2x3 + 2x2 + 3x − 1 and −9x4 + 3x3 + 2x2 + 3x + 1 are
4th-degree polynomials. These are also called quartic polynomials.
30
Actually, we’ve already secretly studied an example of a polynomial in two variables — the expression on the LHS of the
equation of the conic section: Ax2 + Bxy + Cy 2 + Dx + Ey + F = 0.
Example 145. Say you have an expression (x² + 3)/(x − 1). We might be perfectly content
with this expression. Or we might try to simplify it through long division:
x +1
x − 1 x2 +0x +3
x2 −x +0
x +3
x −1
4
The “quotient” is x + 1 and the “remainder” is 4. Hence,
\[ \frac{x^2 + 3}{x - 1} = x + 1 + \frac{4}{x - 1}. \]
Example 146. Let’s simplify (4x³ + 2x² + 1)/(2x² − x − 1) through long division:
                        2x   + 2
2x² − x − 1 ) 4x³ + 2x² + 0x + 1
              4x³ − 2x² − 2x
                    4x² + 2x + 1
                    4x² − 2x − 2
                          4x + 3
The “quotient” is 2x + 2 and the “remainder” is 4x + 3. Hence,
\[ \frac{4x^3 + 2x^2 + 1}{2x^2 - x - 1} = 2x + 2 + \frac{4x + 3}{2x^2 - x - 1}. \]
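The long-division procedure used in the last two examples can be sketched in a few lines. This is only an illustration under our own conventions: the function name `poly_divmod` is hypothetical, polynomials are given as lists of coefficients with the highest power first, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def poly_divmod(num, den):
    """Divide polynomials given as coefficient lists, highest power first.

    Returns (quotient, remainder), also highest power first.
    """
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    if not den or den[0] == 0:
        raise ValueError("leading coefficient of the divisor must be non-zero")
    quotient = []
    remainder = num[:]
    while len(remainder) >= len(den):
        factor = remainder[0] / den[0]   # next term of the quotient
        quotient.append(factor)
        # Subtract factor * divisor, aligned at the leading term.
        for i, d in enumerate(den):
            remainder[i] -= factor * d
        remainder.pop(0)                 # the leading term is now zero
    return quotient, remainder

# (x^2 + 3) / (x - 1) and (4x^3 + 2x^2 + 1) / (2x^2 - x - 1)
print(poly_divmod([1, 0, 3], [1, -1]))
print(poly_divmod([4, 2, 0, 1], [2, -1, -1]))
```

The two calls reproduce the quotients and remainders found by hand above: x + 1 with remainder 4, and 2x + 2 with remainder 4x + 3.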
(a) (16x + 3)/(5x − 2). (b) (4x² − 3x + 1)/(x + 5). (c) (x² + x + 3)/(−x² − 2x + 1).
\[ y = \frac{ax^2 + bx + c}{dx + e}. \]
To warm up, here we’ll study the special case of the above equation, where a = 0:
\[ y = \frac{bx + c}{dx + e}. \]
        2
x + 1 ) 2x + 1
        2x + 2
            −1
⟹ y = (2x + 1)/(x + 1) = 2 − 1/(x + 1).
[Figure: graph of y = (2x + 1)/(x + 1). Labelled are the centre (−1, 2), the horizontal asymptote y = 2, the vertical asymptote x = −1, and the lines of symmetry y = x + 3 and y = −x + 1.]
         3.5
2x + 4 ) 7x + 3
         7x + 14
             −11
⟹ y = (7x + 3)/(2x + 4) = 3.5 − 11/(2x + 4).
Let’s summarise the graph’s characteristics. This is a hyperbola and so there are two
distinct branches.
1. Intercepts. The graph intersects the vertical axis at the point (0, 0.75) and the hori-
zontal axis at the point (−3/7, 0).
2. There are no turning points.
3. Asymptotes. As x → −2, y → ±∞. And so x = −2 is a vertical asymptote. As x → ±∞,
y → 3.5. And so y = 3.5 is a horizontal asymptote. The two asymptotes are perpendicular
and so this is a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (−2, 3.5).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes.
So they must have slope 1 and −1. Moreover, both pass through the centre (−2, 3.5).
Altogether, we can work out that the lines of symmetry are y = x + 5.5 and y = −x + 1.5.
b/d
dx + e bx +c
bx +be/d
c − be/d
The “quotient” is b/d and the “remainder” is c − be/d. Let’s further simplify this so that x
has no coefficient.
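\begin{align*}
\frac{bx + c}{dx + e} &= \frac{b}{d} + \frac{c - be/d}{dx + e} \\
&= \frac{b}{d} + \frac{c - be/d}{d}\cdot\frac{1}{x + e/d} \\
&= \frac{b}{d} + \frac{cd - be}{d^2}\cdot\frac{1}{x + e/d}.
\end{align*}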
We can thus get from y = 1/x to the above equation, through these transformations:
1. Shift the graph leftwards by e/d units to get the graph of y = 1/(x + e/d).
2. Stretch the graph vertically, outwards from the horizontal axis, by a factor of
(cd − be)/d² to get the graph of y = (cd − be)/[d²(x + e/d)].
3. Finally, shift the graph upwards by b/d units to get the final graph.
Exam Tip
The A-level exams often ask you to list down a series of transformations that will get you
from one graph to another, as was just done.
1. Intercepts. If e = 0, then the graph does not cross the vertical axis. If e ≠ 0, then the
graph intersects the vertical axis at the point (0, c/e). If b = 0, then the graph does not
cross the horizontal axis. If b ≠ 0, then the graph intersects the horizontal axis at the point
(−c/b, 0).
2. There are no turning points.31
3. Asymptotes. As x → −e/d, y → ±∞. And so x = −e/d is a vertical asymptote. As
x → ±∞, y → b/d. And so y = b/d is a horizontal asymptote. The two asymptotes are
perpendicular and so this is a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (−e/d, b/d).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes.
So they must have slope 1 and −1. Moreover, both pass through the centre (−e/d, b/d).
Altogether, we can work out that the lines of symmetry are y = x + e/d + b/d and y =
−x − e/d + b/d.
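The five features just listed can be collected into a small helper. This is a sketch under stated assumptions: the function name `hyperbola_features` and the dictionary keys are our own, and we additionally assume d ≠ 0 and cd − be ≠ 0, since otherwise the graph is not a hyperbola.

```python
from fractions import Fraction

def hyperbola_features(b, c, d, e):
    """Features of y = (bx + c)/(dx + e), assuming d != 0 and cd - be != 0."""
    b, c, d, e = (Fraction(v) for v in (b, c, d, e))
    if d == 0 or c * d - b * e == 0:
        raise ValueError("need d != 0 and cd - be != 0 for a hyperbola")
    cx, cy = -e / d, b / d   # centre = intersection of the two asymptotes
    return {
        "y_intercept": (Fraction(0), c / e) if e != 0 else None,
        "x_intercept": (-c / b, Fraction(0)) if b != 0 else None,
        "vertical_asymptote": cx,      # the line x = -e/d
        "horizontal_asymptote": cy,    # the line y = b/d
        "centre": (cx, cy),
        # Lines of symmetry y = x + k1 and y = -x + k2 through the centre.
        "symmetry_intercepts": (cy - cx, cy + cx),
    }

features = hyperbola_features(7, 3, 2, 4)   # y = (7x + 3)/(2x + 4)
for name, value in features.items():
    print(name, value)
```

For y = (7x + 3)/(2x + 4), this agrees with the earlier worked example: centre (−2, 3.5) and lines of symmetry y = x + 5.5 and y = −x + 1.5.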
Exercise 77. For each of the following equations, sketch its graph and identify its
intercepts, turning points, asymptotes, centre, and lines of symmetry (if there are any of these).
(a) y = (3x + 2)/(x + 2). (b) y = (x − 2)/(−2x + 1). (c) y = (−3x + 1)/(2x + 3).
(Answers on pp. 1034, 1035, and 1036.)
31
See p. 918 in the Appendices (optional) for a proof that y = (bx + c)/(dx + e) has no turning points.
\[ y = \frac{ax^2 + bx + c}{dx + e}. \]
• a = 0, because in that case (ax² + bx + c)/(dx + e) = (bx + c)/(dx + e) and this was
already studied in the last section.
• d = 0, because in that case (ax² + bx + c)/(dx + e) = (ax² + bx + c)/e, which is a
quadratic and which we already studied in secondary school.
• Both c and e are 0, because in that case (ax² + bx)/(dx) = (a/d)x + b/d, which is a
linear expression.
We’ll start with the simplest possible case (a = 1, b = 0, c = 1, d = 1, and e = 0). This is the
equation
\[ y = \frac{x^2 + 1}{x}. \]
[Figure: graph of y = (x² + 1)/x. Labelled are the centre (0, 0), the vertical asymptote x = 0, the oblique asymptote y = x, the minimum and maximum turning points, and the lines of symmetry y = (1 + √2)x and y = (1 − √2)x.]
Do the long division:
    x
x ) x² + 1
    x²
         1
⟹ y = (x² + 1)/x = x + 1/x.
As usual, this is a hyperbola that has two distinct branches. Other features:
1. Intercepts. The graph intersects neither the vertical axis nor the horizontal axis.
2. There are two turning points: (−1, −2) is a maximum turning point and (1, 2) is a
minimum turning point. (To find these, compute the first derivative dy/dx = 1 − 1/x².
Set this equal to 0 to find two stationary points: x = ±1. Use the 2DT to determine
that x = −1 and x = 1 are, respectively, maximum and minimum turning points.)
By observation, y can take on any value except those between these two turning points.
The range of y is thus (−∞, −2] ∪ [2, ∞).
3. Asymptotes. As x → 0, y → ±∞. Hence, there is one vertical asymptote: x = 0. As
x → ±∞, y → x. Hence, there is one oblique asymptote: y = x. The two asymptotes are
not perpendicular and so this is not a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (0, 0).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes
and pass through the centre. You don’t need to learn how to figure out their
equations (but see pp. 919ff. in the Appendices if you’re interested).
        x  + 2
x + 1 ) x² + 3x + 1
        x² +  x
             2x + 1
             2x + 2
                 −1
⟹ y = (x² + 3x + 1)/(x + 1) = x + 2 − 1/(x + 1).
[Figure: graph of y = (x² + 3x + 1)/(x + 1). Labelled are the centre (−1, 1), the vertical asymptote x = −1, the oblique asymptote y = x + 2, and the lines of symmetry y = (1 + √2)x + 2 + √2 and y = (1 − √2)x + 2 − √2.]
As usual, this is a hyperbola that has two distinct branches. Other features:
1. Intercepts. The graph intersects the vertical axis at the point (0, 1) and the horizontal
axis at the points (0.5(−3 + √5), 0) and (0.5(−3 − √5), 0). (The horizontal intercepts
are simply the zeros of the quadratic x² + 3x + 1.)
2. There are no turning points. (Compute dy/dx = 1 + 1/(x + 1)2 . Set this equal to 0 —
there are no stationary points and thus no turning points either.)
By observation, y can take on any value. The range of y is thus R.
3. Asymptotes. As x → −1, y → ±∞. Hence, there is one vertical asymptote: x = −1. As
x → ±∞, y → x+2. Hence, there is one oblique asymptote: y = x+2. The two asymptotes
are not perpendicular and so this is not a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (−1, 1).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes
and pass through the centre. Again, you don’t need to know how to find their equations.
          −2x − 4
−x + 1 ) 2x² + 2x + 1
         2x² − 2x
               4x + 1
               4x − 4
                    5
⟹ y = (2x² + 2x + 1)/(−x + 1) = −2x − 4 + 5/(−x + 1).
As usual, this is a hyperbola that has two distinct branches. Other features:
1. Intercepts. The graph intersects the vertical axis at the point (0, 1), but not the
horizontal axis, because there are no real zeros for the quadratic 2x2 + 2x + 1.
2. There are two turning points: (1 − √2.5, 0.325) and (1 + √2.5, −12.325) are the
minimum and maximum turning points. (Verify this.)
By observation, y can take on any value except those between these two turning points. The
range of y is thus (−∞, −12.325] ∪ [0.325, ∞).
3. Asymptotes. As x → 1, y → ±∞. Hence, there is one vertical asymptote: x = 1. As
x → ±∞, y → −2x − 4. Hence, there is one oblique asymptote: y = −2x − 4. The two
asymptotes are not perpendicular and so this is not a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (1, −6).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes
and pass through the centre. Again, you don’t need to know how to find their equations.
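The turning points quoted in item 2 can be verified numerically. The sketch below relies on the long-division form y = −2x − 4 + 5/(1 − x) found above; the derivative used is our own working, not taken from the text.

```python
import math

def y(x):
    return (2 * x**2 + 2 * x + 1) / (-x + 1)

# From y = -2x - 4 + 5/(1 - x), we get dy/dx = -2 + 5/(1 - x)^2,
# which is zero exactly when (1 - x)^2 = 2.5.
for x in (1 - math.sqrt(2.5), 1 + math.sqrt(2.5)):
    slope = -2 + 5 / (1 - x) ** 2
    print(round(x, 3), round(y(x), 3), round(slope, 9))
```

Both printed slopes are zero (up to rounding), and the y-values round to 0.325 and −12.325, matching the coordinates given above.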
\[ y = \frac{ax^2 + bx + c}{dx + e}, \]
because it gets rather messy. But if you want, you can read about it in the Appendices
(p. 919 onwards).
Exercise 78. For each of the following equations, sketch its graph and identify its
intercepts, turning points, asymptotes, centre, and lines of symmetry (if any of these exist).
(Answers on pp. 1037, 1039, and 1041.)
(a) y = (x² + 2x + 1)/(x − 4). (b) y = (−x² + x − 1)/(x + 1). (c) y = (2x² − 2x − 1)/(x + 4).
A graph (or curve) is simply a set of points. Parametric equations give us an alternative
method to describing the same graph (or curve).
Example 152. Recall that the graph of the equation x2 + y 2 = 1 — i.e. the set S = {(x, y) ∶
x2 + y 2 = 1} — is the unit circle centred on the origin.
[Figure: the unit circle x² + y² = 1. Arrows indicate the instantaneous direction of travel. Labelled points:
t = 0: x = 1, y = 0; vx = 0 ms⁻¹, vy = 1 ms⁻¹; ax = −1 ms⁻², ay = 0 ms⁻².
t = 3π/4: x = −√2/2, y = √2/2; vx = −√2/2 ms⁻¹, vy = −√2/2 ms⁻¹; ax = √2/2 ms⁻², ay = −√2/2 ms⁻².
t = 3π/2: x = 0, y = −1; vx = 1 ms⁻¹, vy = 0 ms⁻¹; ax = 0 ms⁻², ay = 1 ms⁻².]
Example 200 (continued from above). The set S = {(x, y) ∶ x = cos t, y = sin t, 0 ≤ t <
2π} can also be interpreted as tracing the motion of a particle as it moves anti-clockwise
around a circle. x and y give the distances of the particle (in metres) from the origin, in
the x- and y-directions.
We have x = cos t and y = sin t. This says that at any instant of time t, the particle is cos t
metres to the east of the origin and sin t metres to the north of the origin. (Note that if
cos t < 0, then the particle is to the west of the origin. And if sin t < 0, then the particle is
to the south of the origin.)
At time t = 0 s, the particle is at the position (x, y) = (1, 0). At time t = 1 s, the particle
has moved to the position (x, y) ≈ (0.54, 0.84). At time t = π/2 ≈ 1.57 s, the particle has
moved to position (x, y) = (0, 1).
Having interpreted t as time, we can now also easily talk about the velocity and acceler-
ation of the particle at different instants in time.
Example 200 (continued from above). We have x = cos t and y = sin t. From this,
we can easily compute the particle’s velocity in each direction: vx = dx/dt = − sin t and
vy = dy/dt = cos t.
This says that at any instant of time t, the velocity of the particle is − sin t ms-1 in the
x-direction and cos t ms-1 in the y-direction. (Note that if − sin t < 0, then the particle is
moving westwards. And if cos t < 0, then the particle is moving southwards.)
So for example, at time t = 7π/4, its velocity is −sin(7π/4) = √2/2 ms⁻¹ rightwards and
cos(7π/4) = √2/2 ms⁻¹ upwards.
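The position and velocity formulas above are easy to tabulate. Here is a minimal sketch; the function names `position` and `velocity` are our own.

```python
import math

def position(t):
    """(x, y) = (cos t, sin t), in metres."""
    return math.cos(t), math.sin(t)

def velocity(t):
    """(vx, vy) = (-sin t, cos t), in metres per second."""
    return -math.sin(t), math.cos(t)

for t in [0, 1, math.pi / 2, 7 * math.pi / 4]:
    x, y = position(t)
    vx, vy = velocity(t)
    print(f"t={t:.2f}  (x, y)=({x:.2f}, {y:.2f})  (vx, vy)=({vx:.2f}, {vy:.2f})")
```

A positive vx means the particle is moving rightwards (eastwards) and a positive vy means it is moving upwards (northwards), as in the discussion above.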
Exercise 79. (Answer on p. 1043.) Let P be the particle whose position (in metres) is de-
scribed by the set {(x, y) ∶ x = cos t, y = sin t, t ∈ R}, where t is time (seconds). Let Q be the
particle whose position (in metres) is described by the set {(x, y) ∶ x = sin t, y = cos t, t ∈ R}.
(a) How does the starting point (when t = 0) of Q differ from that of P ? (b) What about
the direction of travel?
Observe that if x = a cos t and y = b sin t, then by the same trigonometric identity as before,
x2 /a2 + y 2 /b2 = 1. As it turns out, this gives us a second way of writing the set T :
Similar to before, as t increases from 0 to 2π, we trace out, anti-clockwise, an ellipse centred
on the origin.
At any instant in time t, the particle’s position, velocity, and acceleration are (x, y) =
(a cos t, b sin t), (vx , vy ) = (−a sin t, b cos t), and (ax , ay ) = (−a cos t, −b sin t).
Exercise 80. Let P be the particle whose position (in metres) is described by the set
{(x, y) ∶ x = a cos t, y = b sin t, t ∈ R}, where t is time (seconds). At each of the following
times, state the particle’s position and also its velocity and acceleration in both the x- and
y-directions. (a) t = π/4; (b) t = π/2; (c) t = 2π. (Answer on p. 1043.)
[Figure: graph of x² − y² = 1, marking the particle’s positions at t = 0, 1, 2, 3, 4, and 5. Arrows indicate the instantaneous direction of travel.]
Note that t cannot be a half-integer multiple of π, because then tan t would be undefined.
Again, let’s interpret this as the movement of a particle. Interestingly, the particle always
moves upwards, as we can easily prove — vy = dy/dt = sec2 t > 0 for all t.
At t = 0, the particle is at (x, y) = (1, 0). During t ∈ [0, π/2), the particle moves northeast
along the green segment and flies off towards infinity as t → π/2 ≈ 1.57.
An instant after π/2 seconds, the particle magically reappears “near” infinity in the south-
west. During t ∈ (π/2, π], the particle moves northeast along the blue segment. At t = π,
the particle is at (−1, 0).
During t ∈ [π, 3π/2), the particle moves northwest along the red segment and flies off
towards infinity as t → 3π/2 ≈ 4.71.
An instant after 3π/2 seconds, the particle magically reappears “near” infinity in the south-
east. During t ∈ (3π/2, 2π], the particle moves northwest along the pink segment.
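The description above can be checked numerically. The sketch below assumes the parametrisation x = sec t, y = tan t, which is inferred from the surrounding text (dy/dt = sec² t, and the position (1, 0) at t = 0); the function name `position` is our own.

```python
import math

def position(t):
    """(x, y) = (sec t, tan t); undefined where cos t = 0."""
    return 1 / math.cos(t), math.tan(t)

for t in [0, 1, 2, 3, 4, 5]:
    x, y = position(t)
    vy = 1 / math.cos(t) ** 2   # dy/dt = sec^2 t
    # Each point should satisfy x^2 - y^2 = 1, and vy should be positive.
    print(t, round(x, 3), round(y, 3), round(x * x - y * y, 9), vy > 0)
```

Every sampled point lies on x² − y² = 1 and has a positive upward velocity. The sign of x shows which branch the particle is on at that time.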
(b) Compute dx/dt. And hence make an observation about how the particle travels in the
x-direction.
The graph below indicates six positions of the particle — A, B, C, D, E, and F . (Also
indicated are the directions of travel.) The particle is at these positions at times t = 0, 1,
2, 3, 4, and 5 but not necessarily in that order.
(c) Using only the graphs of s = tan t and s = sec t (above) to guide you and without using
a calculator, state where the particle is, at each of the times t = 0, 1, 2, 3, 4, and 5.
[Figure: graph of the parametric equations x = tan t, y = sec t, marking the six positions A, B, C, D, E, and F. Arrows indicate the instantaneous direction of travel.]
Given a pair of parametric equations that describes a set of points, we can often go in
reverse: We can eliminate the parameter t and describe the same set of points using a
single equation.
[Figure: graph of x = y² + 3y + 2. Arrows indicate the instantaneous direction of travel. Labelled points:
t = −1: x = 0, y = −2; vx = (2t + 1) ms⁻¹ = −1 ms⁻¹, vy = 1 ms⁻¹; ax = 2 ms⁻², ay = 0 ms⁻².
t = 0: x = 0, y = −1; vx = (2t + 1) ms⁻¹ = 1 ms⁻¹, vy = 1 ms⁻¹; ax = 2 ms⁻², ay = 0 ms⁻².
t = 1: x = 2, y = 0; vx = (2t + 1) ms⁻¹ = 3 ms⁻¹, vy = 1 ms⁻¹; ax = 2 ms⁻², ay = 0 ms⁻².]
As an exercise, let’s also compute the velocity and acceleration of the particle.
vx = dx/dt = 2t + 1 and vy = dy/dt = 1. This says that at any instant in time t, the particle
has velocity 2t + 1 ms−1 rightwards and 1 ms−1 upwards.
ax = d2 x/dt2 = 2 and ay = d2 y/dt2 = 0. This says that the particle is always accelerating
rightwards at the rate 2 ms−2 . Moreover, it is never accelerating upwards (this is consistent
with the above finding that its upwards velocity is a constant 1 ms−1 ).
[Figure: the ellipse centred on (−4, 1). Labelled points:
t = π/2: x = −4, y = 4; vx = −2 sin t ms⁻¹ = −2 ms⁻¹, vy = 3 cos t ms⁻¹ = 0 ms⁻¹; ax = −2 cos t ms⁻² = 0 ms⁻², ay = −3 sin t ms⁻² = −3 ms⁻².
t = π: x = −6, y = 1; vx = −2 sin t ms⁻¹ = 0 ms⁻¹, vy = 3 cos t ms⁻¹ = −3 ms⁻¹; ax = −2 cos t ms⁻² = 2 ms⁻², ay = −3 sin t ms⁻² = 0 ms⁻².
t = 3π/2: x = −4, y = −2; vx = −2 sin t ms⁻¹ = 2 ms⁻¹, vy = 3 cos t ms⁻¹ = 0 ms⁻¹; ax = −2 cos t ms⁻² = 0 ms⁻², ay = −3 sin t ms⁻² = 3 ms⁻².]
Write (x + 4)/2 = cos t and (y − 1)/3 = sin t. Using the trigonometric identity
$\cos^2 t + \sin^2 t = 1$, we can rewrite the set as
$\{(x, y) : [(x + 4)/2]^2 + [(y - 1)/3]^2 = 1\}$. This is the ellipse centred on (−4, 1).
As an exercise, let’s also compute the velocity and acceleration of the particle.
vx = dx/dt = −2 sin t and vy = dy/dt = 3 cos t. This says that at any instant in time t, the
particle has velocity 2 sin t ms−1 leftwards and 3 cos t ms−1 upwards.
ax = d²x/dt² = −2 cos t and ay = d²y/dt² = −3 sin t. This says that at any instant in time
t, the particle is accelerating leftwards at the rate 2 cos t ms−2 and upwards at the rate
−3 sin t ms−2 .
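As a numerical sanity check, here is a small sketch that uses the parametrisation read off from (x + 4)/2 = cos t and (y − 1)/3 = sin t. The function names are illustrative only.

```python
import math

# x = -4 + 2 cos t and y = 1 + 3 sin t, from (x + 4)/2 = cos t and (y - 1)/3 = sin t.
def position(t):
    return (-4 + 2 * math.cos(t), 1 + 3 * math.sin(t))

def velocity(t):
    return (-2 * math.sin(t), 3 * math.cos(t))

def acceleration(t):
    return (-2 * math.cos(t), -3 * math.sin(t))

for t in (math.pi / 2, math.pi, 3 * math.pi / 2):
    x, y = position(t)
    # Every position should satisfy [(x + 4)/2]^2 + [(y - 1)/3]^2 = 1.
    assert math.isclose(((x + 4) / 2) ** 2 + ((y - 1) / 3) ** 2, 1.0)
    print(round(x, 6), round(y, 6), velocity(t), acceleration(t))
```

Floating-point rounding means values that are exactly 0 on paper may print as tiny non-zero numbers.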
Given any fraction $\frac{N}{D}$ (where N and D are real numbers with D non-zero), we have
$\frac{N}{D} > 0$ if and only if one of the following is true:
1. N > 0 AND D > 0; OR
2. N < 0 AND D < 0.
The expressions that are in the numerator (N ) and denominator (D) can get pretty com-
plicated. So here are some very simple examples just to warm you up.
Example 157. $\frac{4}{7} > 0$ because both the numerator and denominator are positive.
Example 158. $\frac{-5}{-3} > 0$ because both the numerator and denominator are negative.
Example 159. $\frac{-9}{2} < 0$ because the numerator is negative but the denominator is positive.
Example 160. $\frac{1}{-8} < 0$ because the numerator is positive but the denominator is negative.
Example 161. $\frac{x+3}{3x+2} > 0$ ⇐⇒ one of the following is true:
1. “x + 3 > 0 AND 3x + 2 > 0”; OR
2. “x + 3 < 0 AND 3x + 2 < 0”.
Notice that (1) “x + 3 > 0 AND 3x + 2 > 0” ⇐⇒ “x > −3 AND x > −2/3”, which in turn is
equivalent to the single inequality “x > −2/3”.
Notice that (2) “x + 3 < 0 AND 3x + 2 < 0” ⇐⇒ “x < −3 AND x < −2/3”, which in turn is
equivalent to the single inequality “x < −3”.
Altogether then, $\frac{x+3}{3x+2} > 0$ ⇐⇒ “x > −2/3 OR x < −3” (equivalently,
“x ∈ (−∞, −3) ∪ (−2/3, ∞)”).
Note that I use quotation marks “⋅”, but these are not necessary. Instead, they merely help
to make especially clear which groups of conditions correspond to each other.
Example 162. $\frac{4x-1}{x+2} > 0$ ⇐⇒ one of the following is true:
1. “4x − 1 > 0 AND x + 2 > 0” ⇐⇒ “x > 1/4 AND x > −2” ⇐⇒ “x > 1/4” ; OR
2. “4x − 1 < 0 AND x + 2 < 0” ⇐⇒ “x < 1/4 AND x < −2” ⇐⇒ “x < −2”.
Altogether then, $\frac{4x-1}{x+2} > 0$ ⇐⇒ “x > 1/4 OR x < −2” (equivalently,
“x ∈ (−∞, −2) ∪ (1/4, ∞)”).
Example 163. $\frac{5x+4}{-2x+1} > 0$ ⇐⇒ one of the following is true:
1. “5x + 4 > 0 AND −2x + 1 > 0” ⇐⇒ “x > −4/5 AND x < 1/2” ⇐⇒ “x ∈ (−4/5, 1/2)” ; OR
2. “5x + 4 < 0 AND −2x + 1 < 0” ⇐⇒ “x < −4/5 AND x > 1/2”, but these are mutually
contradictory and thus impossible.
Altogether then, $\frac{5x+4}{-2x+1} > 0$ ⇐⇒ “x ∈ (−4/5, 1/2)”.
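The conclusions of Examples 161 to 163 can be spot-checked by brute force. The sketch below is only a sanity check on a grid of exact rationals, not a proof; the helper name `fraction_is_positive` is made up for illustration.

```python
from fractions import Fraction

def fraction_is_positive(n, d):
    """True when N/D > 0, i.e. N and D are both positive or both negative."""
    return (n > 0 and d > 0) or (n < 0 and d < 0)

# Exact rational sample points from -10 to 10 in steps of 1/20.
xs = [Fraction(k, 20) for k in range(-200, 201)]

for x in xs:
    # Example 161: (x + 3)/(3x + 2) > 0  <=>  x > -2/3 OR x < -3.
    if 3 * x + 2 != 0:
        assert fraction_is_positive(x + 3, 3 * x + 2) == (x > Fraction(-2, 3) or x < -3)
    # Example 162: (4x - 1)/(x + 2) > 0  <=>  x > 1/4 OR x < -2.
    if x + 2 != 0:
        assert fraction_is_positive(4 * x - 1, x + 2) == (x > Fraction(1, 4) or x < -2)
    # Example 163: (5x + 4)/(-2x + 1) > 0  <=>  -4/5 < x < 1/2.
    if -2 * x + 1 != 0:
        assert fraction_is_positive(5 * x + 4, -2 * x + 1) == (Fraction(-4, 5) < x < Fraction(1, 2))

print("all sampled points agree")
```

Points where the denominator is zero are skipped, since the fraction is undefined there.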
When given any inequality that is of a slightly different form, be sure to always convert it
into what I’ll call the standard form $\frac{N}{D} > 0$. Strictly speaking, this is not necessary, but
Example 164. Consider the inequality $\frac{3x-2}{-5x+1} < 3$. This inequality is equivalent to
$$3 - \frac{3x-2}{-5x+1} > 0 \iff \frac{-15x+3-(3x-2)}{-5x+1} > 0 \iff \frac{-18x+5}{-5x+1} > 0.$$
Exercise 83. For what values of x is each of the following inequalities true?
(a) $\frac{2x+1}{3x+2} > 0$. (b) $\frac{x-1}{-4} > 0$. (c) $\frac{-1}{-4} > 0$. (d) $\frac{1}{-4} > 0$.
(e) $\frac{-3x-18}{9x-14} > 0$. (f) $\frac{2x+3}{-x+7} < 9$. (Answers on p. 1048.)
Don’t worry, you are not required to know how to graph the equation
$y = \frac{ax^2 + bx + c}{dx^2 + ex + f}$. But you are required to know how to find the values of x
for which $\frac{ax^2 + bx + c}{dx^2 + ex + f} > 0$. This is just the same game as before, albeit
slightly more complicated.
Example 165. $\frac{2x^2 + x + 3}{-x^2 + 3x + 2} > 0$ ⇐⇒ one of the following is true:
1. “$2x^2 + x + 3 > 0$ AND $-x^2 + 3x + 2 > 0$”; OR
2. “$2x^2 + x + 3 < 0$ AND $-x^2 + 3x + 2 < 0$”.
$y = 2x^2 + x + 3$ is a ∪-shaped quadratic and has no real roots (because the discriminant is
negative). Hence, it is always positive. It is thus impossible that “$2x^2 + x + 3 < 0$ AND
$-x^2 + 3x + 2 < 0$” (Case 2).
We need thus only examine Case 1. As we just said, it is always true that $2x^2 + x + 3 > 0$.
So we need only examine when it is true that $-x^2 + 3x + 2 > 0$.
The equation $y = -x^2 + 3x + 2$ has a ∩-shaped graph and has two real zeros given by:
$$\frac{-3 \pm \sqrt{3^2 - 4(-1)(2)}}{2(-1)} = \frac{-3 \pm \sqrt{17}}{-2} = \frac{3 \mp \sqrt{17}}{2} = 0.5\,(3 \mp \sqrt{17}).$$
Hence, $-x^2 + 3x + 2 > 0$ ⇐⇒ “$x \in (0.5\,(3 - \sqrt{17}),\ 0.5\,(3 + \sqrt{17}))$”.
Altogether then, $\frac{2x^2 + x + 3}{-x^2 + 3x + 2} > 0$ ⇐⇒ “$x \in (0.5\,(3 - \sqrt{17}),\ 0.5\,(3 + \sqrt{17}))$”.
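For readers without a calculator to hand, the same answer can be checked with a few lines of Python. This is a rough numerical sketch rather than part of the method: `real_zeros` is a made-up helper that simply applies the quadratic formula.

```python
import math

def real_zeros(a, b, c):
    """Real zeros of a*x^2 + b*x + c (with a != 0), in increasing order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    return sorted({r1, r2})

print(real_zeros(2, 1, 3))       # numerator: negative discriminant, so no real zeros
lo, hi = real_zeros(-1, 3, 2)    # denominator: two real zeros
print(lo, hi)

# Compare the sign of the fraction with the claimed interval on a grid of step 0.01.
for k in range(-1000, 1001):
    x = k / 100
    den = -x * x + 3 * x + 2
    if den == 0:
        continue
    value = (2 * x * x + x + 3) / den
    assert (value > 0) == (lo < x < hi)
```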
A dirty trick is to use your TI84 to do a quick check that this answer is correct:
The equation $y = 2x^2 + x + 2$ has a ∪-shaped graph and has no real zeros (because the
discriminant is negative). Hence, it is always positive. It is thus impossible that
“$-x^2 + 4x - 1 < 0$ AND $2x^2 + x + 2 < 0$” (Case 2).
We need thus only examine Case 1. As we just said, it is always true that $2x^2 + x + 2 > 0$.
So we need only examine when it is true that $-x^2 + 4x - 1 > 0$.
The equation $y = -x^2 + 4x - 1$ has a ∩-shaped graph and has two real zeros given by:
$$\frac{-4 \pm \sqrt{4^2 - 4(-1)(-1)}}{2(-1)} = \frac{-4 \pm \sqrt{12}}{-2} = \frac{4 \mp \sqrt{12}}{2} = 2 \mp \sqrt{3}.$$
Hence, $-x^2 + 4x - 1 > 0$ ⇐⇒ “$x \in (2 - \sqrt{3},\ 2 + \sqrt{3})$”.
Thus, $\frac{-x^2 + 4x - 1}{2x^2 + x + 2} > 0$ ⇐⇒ $x \in (2 - \sqrt{3},\ 2 + \sqrt{3}) \approx (0.268, 3.732)$.
As usual, let’s check using our TI84:
The equation $y = x^2 + 5x + 4$ has a ∪-shaped graph and has two real zeros given by:
$$\frac{-5 \pm \sqrt{5^2 - 4(1)(4)}}{2(1)} = \frac{-5 \pm \sqrt{9}}{2} = \frac{-5 \pm 3}{2} = -4, -1.$$
The equation $y = -x^2 - 2x + 1$ has a ∩-shaped graph and has two real roots given by:
$$\frac{2 \pm \sqrt{(-2)^2 - 4(-1)(1)}}{2(-1)} = \frac{2 \pm \sqrt{8}}{-2} = -1 \mp \sqrt{2}.$$
Hence, $-x^2 - 2x + 1 > 0$ ⇐⇒ “$x \in (-1 - \sqrt{2},\ \sqrt{2} - 1)$”. Thus:
1. “$x^2 + 5x + 4 > 0$ AND $-x^2 - 2x + 1 > 0$” ⇐⇒ “(x < −4 OR x > −1) AND
$x \in (-1 - \sqrt{2},\ \sqrt{2} - 1)$”.
Since $-4 < -1 - \sqrt{2} < -1 < \sqrt{2} - 1$, this is equivalent to $x \in (-1,\ \sqrt{2} - 1)$.
2. “$x^2 + 5x + 4 < 0$ AND $-x^2 - 2x + 1 < 0$” ⇐⇒ “x ∈ (−4, −1) AND
($x < -1 - \sqrt{2}$ OR $x > \sqrt{2} - 1$)”.
Since $-4 < -1 - \sqrt{2} < -1 < \sqrt{2} - 1$, this is equivalent to $x \in (-4,\ -1 - \sqrt{2})$.
Altogether then, $\frac{x^2 + 5x + 4}{-x^2 - 2x + 1} > 0$ ⇐⇒
$x \in (-4,\ -1 - \sqrt{2}) \cup (-1,\ \sqrt{2} - 1)$. As usual, let’s check using our TI84:
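A comparable check can be sketched in Python using a sign chart, which is a different technique from the case-by-case reasoning above. The zeros of the numerator and denominator split the number line into intervals, and the fraction cannot change sign inside any one interval, so testing one point per interval is enough. All names here are illustrative.

```python
import math

# Critical points: zeros of x^2 + 5x + 4 and of -x^2 - 2x + 1, in increasing order.
critical = sorted([-4.0, -1.0, -1 - math.sqrt(2), -1 + math.sqrt(2)])

def f(x):
    return (x * x + 5 * x + 4) / (-x * x - 2 * x + 1)

# One test point per interval: left of all critical points, between neighbours, right of all.
test_points = ([critical[0] - 1]
               + [(a + b) / 2 for a, b in zip(critical, critical[1:])]
               + [critical[-1] + 1])

bounds = [-math.inf] + critical + [math.inf]
positive_intervals = []
for (left, right), x in zip(zip(bounds, bounds[1:]), test_points):
    if f(x) > 0:
        positive_intervals.append((left, right))

print(positive_intervals)
```

The printed intervals are open intervals; the endpoints themselves are excluded because the fraction is either zero or undefined there.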
The equation $y = x^2 - 4x + 3$ has a ∪-shaped graph and has two real zeros given by:
$$\frac{4 \pm \sqrt{(-4)^2 - 4(1)(3)}}{2(1)} = \frac{4 \pm \sqrt{4}}{2} = 1, 3.$$
The equation $y = x^2 - 2x$ has a ∪-shaped graph and has two real roots given by:
$$\frac{2 \pm \sqrt{(-2)^2 - 4(1)(0)}}{2(1)} = \frac{2 \pm \sqrt{4}}{2} = 0, 2.$$
Exercise 84. Without using a calculator, find the values of x for which each of the following
inequalities is true. (a) $\frac{x^2 + 2x + 1}{x^2 - 3x + 2} > 0$. (b) $\frac{x^2 - 1}{x^2 - 4} > 0$.
(c) $\frac{x^2 - 3x - 18}{-x^2 + 9x - 14} > 0$. (d) $\frac{2x + 5}{-x + 4} > \frac{-3x + 1}{6x - 7}$.
(Answers on pp. 1050, 1051, 1052, and 1053.)
In the TI84:
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n − SIN 0 . 5 . To enter “π”, press the blue 2ND button and then π
(which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will
have entered “x − sin(0.5πx)”.
4. Now press GRAPH and the calculator will graph y = x − sin(0.5πx).
It looks like the horizontal intercepts are close to the origin. Let’s zoom in to see better.
5. Press the (ZOOM) button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Nothing seems to happen. But now press ENTER
and the TI will zoom in a little for you.
It looks like there are 3 horizontal intercepts. To find out what precisely they are, we’ll use
the TI84’s “zero” option.
3. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
4. Press 2 to select the “zero” option. This brings you back to the graph, with a cursor
flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s ZERO function works by you first specifying a “Left Bound” and a “Right Bound”
for x. TI84 will then check to see if there are any horizontal intercepts (i.e. values of x for
which y = 0) within those bounds.
5. Using the < and > arrow keys, move the blinking cursor until it is where you want your
first “Left Bound” to be. For me, I have placed it a little to the left of where I believe
the leftmost horizontal intercept to be.
6. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
7. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is
where you want your first “Right Bound” to be. For me, I have placed it a little to the
right of where I believe the leftmost horizontal intercept is.
8. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the horizontal intercept is. So go ahead and:
9. Press ENTER . TI84 now informs you that there is a “Zero” at “x = −1”, “y = 0” and
places the blinking cursor at precisely that point. This is the first horizontal intercept
we’ve found.
To find each of the other 2 horizontal intercepts, just repeat steps 3 through 9. You
should be able to find that they are at x = 0 and x = 1. Altogether, the 3 intercepts are
x = −1, 0, 1. Based on these and what the graph looks like, we conclude: x > sin (0.5πx)
⇐⇒ x ∈ (−1, 0) ∪ (1, ∞).
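The "Left Bound" and "Right Bound" idea can be imitated in Python. The sketch below uses bisection as a stand-in; how the TI84 actually computes its zeros is not described here, and `zero_between` and the particular bounds chosen are assumptions made for illustration.

```python
import math

def f(x):
    return x - math.sin(0.5 * math.pi * x)

def zero_between(func, left, right, tol=1e-10):
    """Bisection: func(left) and func(right) must not have the same sign."""
    f_left, f_right = func(left), func(right)
    if f_left == 0:
        return left
    if f_right == 0:
        return right
    if f_left * f_right > 0:
        raise ValueError("no sign change between the bounds")
    while right - left > tol:
        mid = (left + right) / 2
        f_mid = func(mid)
        if f_mid == 0:
            return mid
        if f_left * f_mid < 0:
            right = mid                    # the zero lies in the left half
        else:
            left, f_left = mid, f_mid      # the zero lies in the right half
    return (left + right) / 2

# One (left bound, right bound) pair around each intercept seen on the graph.
for left, right in [(-1.3, -0.7), (-0.2, 0.3), (0.7, 1.3)]:
    print(round(zero_between(f, left, right), 6))
```

As with the calculator, each pair of bounds must enclose exactly one intercept for the result to be meaningful.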
Look for the values of x for which x − e − ln x = 0. They are x ≈ 0.0708, 4.1387:
Based on these horizontal intercepts and what the graph looks like, we conclude: x > e+ln x
if and only if x ∈ (0, 0.0708) ∪ (4.1387, ∞).
Exercise 85. Use a graphing calculator to find the values of x for which each of the
following inequalities is true. (a) $x^3 - x^2 + x - 1 > e^x$. (b) $\sqrt{x} > \cos x$.
(c) $\frac{1}{1 - x^2} > x^3 + \sin x$. (Answers on p. 1054.)
Warm-up questions:
Exercise 86. (PSLE-style question.) When Apu was 40 years old, Beng was twice as old
as Caleb. Today, Caleb is 28 years old and Apu is twice as old as Beng. What are the ages
of Apu and Beng today? (If necessary, assume that the age of a person is always an integer
and is fixed between January 1st and December 31st of each year.) (Answer on p. 1055.)
Exercise 87. (O-Level style question.) Planes A and B leave the same point at 12pm.
Plane A travels northeast at a constant speed of 100 km/h. Plane B travels south at a
constant speed of 200 km/h. At 3pm, both planes make an instant turn and start flying
directly towards each other at the same speed. At what time will the two planes collide?
(Answer on p. 1055.)
Definition 48. Given an equation involving a single variable x, a real solution to the
equation is any value of x ∈ R such that the equation is true.
Example 171. The equation x + 5 = 8 has one real solution: 3. The equation $x^2 - 1 = 0$
has two real solutions: −1 and 1. The equation $x^2 - 1 = 8$ has two real solutions: −3 and 3.
The equation $x^3 - 4x = 0$ has three real solutions: −2, 0, and 2.
Definition 49. Given an equation involving a single variable x, a real solution set is the
set of values of x ∈ R such that the equation is true.
Example 173. The real solution set of the equation x + 5 = 8 is {3}. The real solution
set of the equation $x^2 - 1 = 0$ is {−1, 1}. The real solution set of the equation $x^2 - 1 = 8$ is
{−3, 3}. The real solution set of the equation $x^3 - 4x = 0$ is {−2, 0, 2}.
Example 176. Consider the system of equations $y = 0.5x^2 - 1.5$ and y = x. To solve this
system of equations, plug the second equation into the first to get: $x = 0.5x^2 - 1.5$.
Rearranging: $x^2 - 2x - 3 = 0$. Now solve: x = 3, −1. Correspondingly, y = 3, −1. Altogether,
this system of equations has two real solutions: (3, 3) and (−1, −1). Its real solution set is
thus {(3, 3), (−1, −1)}.
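The substitution in Example 176 can be replayed in a few lines. This is only an illustrative sketch that solves the rearranged quadratic with the quadratic formula and then checks each candidate against the first equation.

```python
import math

# Substituting y = x into y = 0.5x^2 - 1.5 and rearranging gives x^2 - 2x - 3 = 0.
a, b, c = 1, -2, -3
disc = b * b - 4 * a * c
xs = sorted([(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)])

# On the line y = x, each solution has y equal to x.
solutions = {(x, x) for x in xs}

for x, y in solutions:
    # Each pair must also satisfy the first equation.
    assert math.isclose(y, 0.5 * x**2 - 1.5)

print(solutions)
```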
Example 177. Consider the system of equations y = ln x and y = x. Observe that for all
x ∈ (0, 1), ln x < 0 and hence x > ln x. Moreover, for x = 1, ln x = 0 < x. Also, for x > 1,
$\frac{d}{dx} \ln x = \frac{1}{x} < 1 = \frac{d}{dx} x$, so the slope of y = x is steeper than that of
y = ln x. Altogether then, for all x > 0, x > ln x. Hence, this system of equations has no real
solutions. Its real solution set is thus ∅ = {}.
Example 178. Consider the system of equations y = x and 2y = 2x. Observe that this
system of equations has infinitely many real solutions, e.g. (1, 1), (2, 2), (2.74, 2.74). There
is thus no way to explicitly list out all its real solutions. However, using set-builder notation,
we can write down its real solution set as {(x, y) ∶ y = x}. This says that every ordered pair
(x, y) such that y = x is a real solution to the given system of equations.
Here I’ll use another method: First rewrite the two equations as a third equation
$y = x^4 - x^3 - 5 - \ln x$. Our goal is to find the horizontal intercepts of this equation, which
will in turn also be the solutions to the above set of equations.
Exercise 90. Using your graphing calculator, solve the following systems of equations.
(Answers on pp. 1057, 1058, and 1059.)
(a) $x^2 + y^2 = 1$, $y = \sin x$. (b) $y = \frac{1}{1 + \sqrt{x}}$, $y = x^5 - x^3 + 2$.
(c) $y = \frac{1}{1 - x^2}$, $y = x^3 + \sin x$.
Recall that an ordered pair (of real numbers) was simply any pair of real numbers, enclosed
by parentheses, and whose order matters (and this was the only difference between an
ordered pair and a set of two objects).
Example 180. (1, 2) and (2, 1) are both ordered pairs with (1, 2) ≠ (2, 1).
Example 181. (1, 2, 3) and (2, 1, 3) are both ordered triples with (1, 2, 3) ≠ (2, 1, 3).
(1, 1, 1, 1) and (2, 4, 1, 3) are both ordered quadruples with (1, 1, 1, 1) ≠ (2, 4, 1, 3).
(2, 2, 3, 2, 2) and (2, 4, 1, 5, 3) are both ordered quintuples with (2, 2, 3, 2, 2) ≠ (2, 4, 1, 5, 3).
We’ll simply call all of these ordered n-tuples or even simply tuples. Hence,
Example 182. (1, 2, 3), (2, 1, 3), (1, 1, 1, 1), (2, 4, 1, 3), (2, 2, 3, 2, 2), and (2, 4, 1, 5, 3) are
all ordered n-tuples. (1, 2, 3) and (2, 1, 3) are ordered 3-tuples or triples. (1, 1, 1, 1) and
(2, 4, 1, 3) are ordered 4-tuples or quadruples. (2, 2, 3, 2, 2) and (2, 4, 1, 5, 3) are ordered
5-tuples or quintuples.
In fact, when talking about tuples, it will be understood that they are ordered, so we’ll
drop the word “ordered” and simply call them tuples (instead of ordered tuples).
Example 183. (1, 2, 3) and (2, 1, 3) are 3-tuples or, equivalently, finite sequences of length
3.
(1, 2, 3, 4) and (2, 4, 1, 3) are 4-tuples or, equivalently, finite sequences of length 4.
(1, 2, 3, 4, 5) and (2, 4, 1, 5, 3) are 5-tuples or, equivalently, finite sequences of length 5.
Example 184. Given the sequence (2, 1, 3), 2 is its first term, 1 is its second term, and
3 is its third term.
Example 185. (2, 4, 6, 8, 10, 12, 14) is a finite sequence of length 7, consisting of the first
seven even positive integers. A corresponding function f for this sequence has
Indeed, the values of the function f (1) = 2, f (2) = 4, f (3) = 6, ..., f (7) = 14 exactly list out
the terms in the finite sequence (2, 4, 6, 8, 10, 12, 14).
Example 186. (2, 5, 12, 23, 38, 57, 80, 107, 138, 173) is a finite sequence of length 10. A
corresponding function f for this sequence has
Indeed, the values of the function f (1) = 2, f (2) = 5, f (3) = 12, f (4) = 23, ..., f (10) = 173
exactly list out the terms in the finite sequence (2, 5, 12, 23, 38, 57, 80, 107, 138, 173).
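To make the idea of a corresponding function concrete, here is a sketch in Python. The rule f(n) = 2n for Example 185 follows from the listed values. The rule used for Example 186, f(n) = 2n² − 3n + 3, is a candidate fitted here from the constant second difference of 4; it is an assumption and may not be the rule the author had in mind. The names `f185` and `f186` are made up.

```python
# Example 185: domain {1, 2, ..., 7}, with f(n) = 2n.
def f185(n):
    return 2 * n

# Example 186: domain {1, 2, ..., 10}.
# Candidate rule only (an assumption fitted to the listed terms).
def f186(n):
    return 2 * n**2 - 3 * n + 3

# Listing f(1), f(2), ... over each domain should reproduce the sequences.
assert [f185(n) for n in range(1, 8)] == [2, 4, 6, 8, 10, 12, 14]
assert [f186(n) for n in range(1, 11)] == [2, 5, 12, 23, 38, 57, 80, 107, 138, 173]
```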
Exercise 91. (Answer on p. 1060.) For each of the following finite sequences, write down
a corresponding function.
(a) (1, 4, 9, 16, 25, 36, 49, 64, 81, 100).
(d) (2, 6, 6, 12, 10, 18, 14, 24, 18, 30, 22, 36, 26, 42).
Indeed, this is how a sequence is usually formally defined.
SYLLABUS ALERT
Recurrence relations are included in the 9740 (old) syllabus, but not in the 9758 (revised)
syllabus. So you can skip this section if you’re taking 9758.
Example 187. (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024) is a finite sequence of length 11. A
corresponding function f for this sequence has
The equation f (n) = 2f (n−1) is an example of a recurrence relation. That is, it describes
how each term in the sequence is generated, depending on what previous terms were.
In this particular example of a sequence, we can easily write down another corresponding
function that does not involve a recurrence relation:
Example 188. (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024) is a finite sequence of length 11. A
corresponding function g for this sequence has
If we can describe a sequence without using a recurrence relation, then we can immediately
compute what each term in the sequence is. So in the case of the finite sequence just given,
we prefer to use the function g rather than the function f as a corresponding function.
In contrast, with a recurrence relation, we need to know what some of the previous terms
are, in order to compute each term. So if possible, we prefer to describe sequences without
using recurrence relations.
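The difference between the two descriptions shows up clearly in code. In the sketch below, `f` follows the recurrence f(n) = 2f(n − 1) with the starting value f(1) = 1 read off the sequence, while `g` uses the closed form g(n) = 2^(n − 1). That closed form is an assumption on my part, chosen because it reproduces the listed terms.

```python
def f(n):
    """Recurrence: f(1) = 1 and f(n) = 2 f(n - 1) for n >= 2."""
    value = 1
    for _ in range(n - 1):
        value = 2 * value      # each term needs the previous one
    return value

def g(n):
    """Assumed closed form g(n) = 2^(n - 1): no earlier terms needed."""
    return 2 ** (n - 1)

terms = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
assert [f(n) for n in range(1, 12)] == terms
assert [g(n) for n in range(1, 12)] == terms
```

Computing f(n) takes n − 1 doubling steps, whereas g(n) is a single evaluation, which is the practical reason for preferring a description without a recurrence relation.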
It is possible to describe the sequence just given without using a recurrence relation, but it
does not come obviously (at least to the untrained eye) and takes a little work, as we’ll see.
A recurrence relation can certainly involve more than just the previous term. In the Fibonacci
sequence, each term (from the third term onwards) is the sum of the previous
two terms: f (n) = f (n − 2) + f (n − 1). This equation is again a recurrence relation.
But in the past ten years’ exams, I haven’t seen a question where the recurrence relation
involves more than just the previous term. So we shall not bother doing many of these.
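For the curious, a recurrence that looks back two terms is still easy to compute. The sketch below generates Fibonacci numbers from f(n) = f(n − 2) + f(n − 1), assuming the usual starting values f(1) = f(2) = 1; the function name is illustrative.

```python
def fibonacci_terms(count):
    """First `count` terms, with f(1) = f(2) = 1 and f(n) = f(n - 2) + f(n - 1) for n >= 3."""
    terms = []
    for n in range(1, count + 1):
        if n <= 2:
            terms.append(1)
        else:
            terms.append(terms[-2] + terms[-1])
    return terms

print(fibonacci_terms(11))
```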
Example 190. (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89) is a finite sequence of length 11, consisting
of the first 11 Fibonacci numbers. A corresponding function f for this sequence has
Exercise 92. Each of the following finite sequences involves a recurrence relation. (Hint:
Each involves only the previous term and also a squared term.) Write down a corresponding
function for each. (a) (3, 4, 9, 64, 3969). (b) (1, 2, 10, 290, 252010). (Answer on p. 1061.)
Example 191. (1, 1, 1, 3, 5, 9, 17, 31, 57, 105, 193) is a finite sequence of length 11. We can
also write it as (an )n≤11 = (a1 , a2 , a3 , . . . , a11 ), where a1 = 1, a2 = 1, a3 = 1, a4 = 3, a5 = 5, ...,
a11 = 193.
Example 192. (1, 1, 1, 2, 2, 3, 4, 5, 7, 9, 12, 16, 21, 28, 37, 49, 65, 86, 114, 151) is a finite se-
quence of length 20. We can also write it as (bn )n≤20 = (b1 , b2 , b3 , . . . , b20 ), where b1 = 1,
b2 = 1, b3 = 1, b4 = 2, b5 = 2, ..., b20 = 151.
Example 193. (2, 4, 6, 8, 10, 12, 14) is a finite sequence of length 7. We can also write it as
(cn )n≤7 = (c1 , c2 , c3 , . . . , c7 ), where c1 = 2, c2 = 4, c3 = 6, ..., c7 = 14.
Example 194. (1, 1, 3, 5, 11, 21, 43, 85, 171, 341, 683) is a finite sequence of length 11. We
can also write it as (dn )n≤11 = (d1 , d2 , d3 , . . . , d11 ), where d1 = 1, d2 = 1, d3 = 3, d4 = 5, d5 = 11,
..., d11 = 683.
We can create new sequences out of old ones, in the “obvious” fashion:
Example 195. Using the sequence (an )n≤11 = (1, 1, 1, 3, 5, 9, 17, 31, 57, 105, 193), here are
some new sequences we can create:
(wn )n≤11 = (an /2)n≤11 = (a1 /2, a2 /2, a3 /2, . . . , a11 /2)
= (1/2, 1/2, 1/2, 3/2, 5/2, 9/2, 17/2, 31/2, 57/2, 105/2, 193/2) = (w1 , w2 , w3 , . . . , w11 ) .
Example 196. Using the sequences (an )n≤11 = (1, 1, 1, 3, 5, 9, 17, 31, 57, 105, 193) and
(dn )n≤11 = (1, 1, 3, 5, 11, 21, 43, 85, 171, 341, 683), here are some new sequences we can create:
(hn )n≤11 = (an /dn )n≤11 = (a1 /d1 , a2 /d2 , a3 /d3 , . . . , a11 /d11 )
= (1, 1, 1/3, 3/5, 5/11, 9/21, . . . , 193/683) = (h1 , h2 , h3 , . . . , h11 ) .
There are of course many other new sequences we can create, whether using only one
sequence, using two sequences, or even using three or more sequences.
Remark 6. You cannot create a new sequence using two finite sequences that are of different
lengths. For example, given two finite sequences (an )n≤11 = (1, 1, 1, 3, 5, 9, 17, 31, 57, 105, 193)
and (cn )n≤7 = (2, 4, 6, 8, 10, 12, 14), there is no such sequence as (an + cn )n≤11 or even
(an + cn )n≤7 . Either of these supposed sequences is simply undefined.
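Term-by-term constructions like those in Examples 195 and 196, together with the restriction in Remark 6, can be sketched as follows. The helper `combine` is hypothetical, and exact fractions are used so that terms such as 1/3 are not rounded.

```python
from fractions import Fraction

def combine(op, first, second):
    """Apply op term by term; sequences of different lengths are rejected (Remark 6)."""
    if len(first) != len(second):
        raise ValueError("sequences have different lengths, so the result is undefined")
    return tuple(op(p, q) for p, q in zip(first, second))

a = (1, 1, 1, 3, 5, 9, 17, 31, 57, 105, 193)
d = (1, 1, 3, 5, 11, 21, 43, 85, 171, 341, 683)
c = (2, 4, 6, 8, 10, 12, 14)

w = tuple(Fraction(term, 2) for term in a)   # (a_n / 2), as in Example 195
h = combine(Fraction, a, d)                  # (a_n / d_n), as in Example 196

print(w[:4])
print(h[:5])
```

Note that `Fraction` stores values in lowest terms, so 9/21 appears as 3/7. Calling `combine` on `a` and `c` raises an error, mirroring the fact that (an + cn) is undefined.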
It turns out that we are rarely interested in finite sequences. Instead, we are much more
interested in infinite sequences, which are a simple extension of the concept of finite sequences.
We can easily extend the concept of finite sequences to infinite sequences, which have
domain Z+ = {1, 2, 3, 4, . . . } (the entire set of positive integers).
Example 197. (2, 4, 6, 8, 10, 12, 14, 16, 18, . . . ) is the infinite sequence consisting of all the
even positive integers. A corresponding function f for this sequence has
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (n) = 2n for all n.
Example 198. (1, 3, 6, 10, 15, 21, 28, 36, 45, 55, . . . ) is the infinite sequence consisting of the
triangular numbers. A corresponding function f for this sequence has
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (1) = 1 and f (n) = 1 + 2 + ⋅ ⋅ ⋅ + n for all n ≥ 2.
Example 199. The infinite sequence (1, 2, 6, 24, 120, 720, 5040, ...) has the corresponding
function f with
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (n) = 1 × 2 × ⋅ ⋅ ⋅ × n = n! for all n.
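A computer cannot list infinitely many terms, but it can produce them on demand. The sketch below models each of the three infinite sequences above as a generator over the domain Z+; `infinite_sequence` is a made-up helper, and only the first few terms are ever computed.

```python
from itertools import count, islice
from math import factorial

def infinite_sequence(rule):
    """Yield rule(1), rule(2), rule(3), ... without end (domain Z+)."""
    return (rule(n) for n in count(1))

evens = infinite_sequence(lambda n: 2 * n)                       # Example 197
triangular = infinite_sequence(lambda n: sum(range(1, n + 1)))   # Example 198: 1 + 2 + ... + n
factorials = infinite_sequence(factorial)                        # Example 199: n!

print(list(islice(evens, 9)))
print(list(islice(triangular, 10)))
print(list(islice(factorials, 7)))
```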
Exercise 93. For each of the following infinite sequences, write down a correspond-
ing function. (a) (1, 4, 9, 16, 25, 36, 49, 64, 81, 100, . . . ). (b) (2, 5, 8, 11, 14, 17, 20, . . . ). (c)
(0.5, 4, 13.5, 32, 62.5, 108, 171.5, . . . ). (d) (2, 6, 6, 12, 10, 18, 14, 24, 18, 30, 22, 36, 26, 42, . . . ).
(Answer on p. 1062.)
(an ) is our shorthand notation for an infinite sequence, where (an ) = (a1 , a2 , a3 , . . . ).
As stated, we are rarely interested in finite sequences. And so whenever we talk about
a sequence, it should be assumed that we are talking about an infinite sequence, unless
otherwise clearly stated.
The idea of creating new sequences carries over from the finite case in the “obvious” fashion.
Example 200.
Let (an ) = (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . . )
and (bn ) = (2, 4, 6, 8, 10, 12, 14, 16, 18, 20, . . . ) .
Then (an + bn ) = (3, 5, 8, 11, 15, 20, 27, 37, 52, 75, . . . ) .
Analogous to Remark 6, you cannot create a new sequence using a finite sequence and an
infinite sequence. Instead, you can only create one using two infinite sequences.
Example 201.
Let (an ) = (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . . )
and (bn )n≤7 = (2, 4, 6, 8, 10, 12, 14) .
Definition 52. Given a finite sequence (an )n≤k , its series is the expression
a1 + a2 + a3 + ⋅ ⋅ ⋅ + ak .
We refer to a1 as the first term of the sequence and also as the first term of the series.
Similarly, a2 is the second term of both the sequence and the series. Etc.
Definition 53. Given a finite sequence (an )n≤k , its sum of series is the number S such
that S = a1 + a2 + a3 + ⋅ ⋅ ⋅ + ak .
Example 202.
Given the sequence (an )n≤8 = (1, 1, 1, 3, 5, 9, 17, 31) ,
its series is the expression 1 + 1 + 1 + 3 + 5 + 9 + 17 + 31
and its sum of series is the number 68.
Example 203.
Given the sequence (bn )n≤7 = (2, 4, 6, 8, 10, 12, 14) ,
its series is the expression 2 + 4 + 6 + 8 + 10 + 12 + 14
and its sum of series is the number 56.
It may seem strange and unnecessary to distinguish between a series and a sum of series.
Aren’t they exactly the same thing?
It turns out that expressions like a1 + a2 + a3 + ⋅ ⋅ ⋅ + ak play an important role in maths and
so we want to reserve a special name for the expression itself and distinguish it from the
sum of series. For example, we might be specifically interested in the series 1 + 2 + 3, rather
than just the sum of series 6.
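One informal way to picture the distinction: in a program, the series is like a piece of text (the expression), while the sum of series is the number obtained by evaluating it. A minimal sketch using the sequence from Example 202:

```python
a = (1, 1, 1, 3, 5, 9, 17, 31)

series = " + ".join(str(term) for term in a)   # the expression, kept as text
sum_of_series = sum(a)                         # the number it evaluates to

print(series, "=", sum_of_series)
```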
Clearly, every finite sequence has a well-defined sum of series – simply add up all the terms
in the finite sequence!
Definition 54. Given an infinite sequence (an ), its series is the expression a1 + a2 + a3 + . . . .
A series that corresponds to a finite sequence is called a finite series, while a series that
corresponds to an infinite sequence is called an infinite series.
Every finite sequence has a sum of series. In contrast, not all infinite sequences do:
Example 204. Consider the sequence (an ) = (1, 1, 1, 1, 1, 1, . . . ). Its series is the expression
1 + 1 + 1 + 1 + 1 + . . . . There is no number equal to 1 + 1 + 1 + 1 + 1 + . . . and so a sum of series
does not exist for this sequence.
Example 205. Consider the sequence (bn ) = (0, 0, 0, 0, 0, 0, . . . ). Its series is the expression
0 + 0 + 0 + 0 + 0 + . . . . The sum of series for this sequence exists and is 0.
Definition 55. An infinite sequence for which a sum of series exists is said to have a
convergent series.
An infinite sequence for which no sum of series exists is said to have a divergent series.
So in the above examples, we say that the sequence (an ) has a divergent series (because its
sum of series does not exist), while the sequence (bn ) has a convergent series (because its
sum of series exists).
Example 206. Consider the sequence (cn ) = (1, −1, 1, −1, 1, −1, . . . ), where the terms sim-
ply alternate between 1 and −1. Its series is the expression 1 − 1 + 1 − 1 + 1 − 1 + . . . . Is there
any number that is equal to 1 − 1 + 1 − 1 + 1 − 1 + . . . ? It’s actually not obvious. On the one
hand, we can pair together every two terms like so:
$$1 - 1 + 1 - 1 + 1 - 1 + \dots = \underbrace{(1 - 1)}_{0} + \underbrace{(1 - 1)}_{0} + \underbrace{(1 - 1)}_{0} + \dots = 0 + 0 + 0 + \dots$$
and happily conclude that the sum of series is 0. But wait a minute ... what if we instead
pair together every two terms like so:
It turns out that the sequence (cn ) = (1, −1, 1, −1, 1, −1, . . . ) has a divergent series. Or
equivalently, a sum of series simply does not exist for this sequence.
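One standard way to see why is to look at running totals, usually called partial sums. That notion has not been introduced in this section, so treat the sketch below as an informal illustration only.

```python
from itertools import accumulate

c = [(-1) ** n for n in range(10)]    # first ten terms: 1, -1, 1, -1, ...
partial_sums = list(accumulate(c))    # 1, 1 - 1, 1 - 1 + 1, ...

print(partial_sums)
```

The running totals keep flipping between 1 and 0 and never settle on a single number.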
Σ is the upper-case Greek letter sigma. An enlarged version of that letter ∑, read aloud
as “sum”, is used to express series in compact notation:
$$1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = \sum_{n=1}^{9} n.$$
The variable n below the ∑ is called the index variable or dummy variable. We could
have named it p or z or x or any other letter (instead of n) and it wouldn’t have mattered.
Hence the name “dummy”.
The “= 1” below the ∑ says that we start counting the index variable from n = 1. We call
the number “1” the starting point.
The “9” above the ∑ is called the stopping point. It says that we should stop adding
once we hit n = 9.
Altogether, the notation $\sum_{n=1}^{9}$ says that we are adding up 9 terms, namely a1 , a2 , ..., a9 .
The expression to the right of the ∑ tells us what each an is. In this example, it is n, which
simply says that for every n, an = n.
Altogether, $\sum_{n=1}^{9} n$ says that we add up a1 through a9 , where each an is simply equal to n.
$$\sum_{n=1}^{9} 1.$$
This says that the starting point is 1 and the ending point is 9. In other words, we add
up a1 , a2 , . . . , a9 , where for each n, an = 1. And so a1 = 1, a2 = 1, etc. Altogether:
$$\sum_{n=1}^{9} 1 = a_1 + a_2 + \dots + a_9 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1.$$
$$\sum_{n=1}^{7} 2n.$$
This says that the starting point is 1 and the ending point is 7. In other words, we add
up a1 , a2 , . . . , a7 , where for each n, an = 2n. And so a1 = 2, a2 = 4, etc. Altogether:
$$\sum_{n=1}^{7} 2n = a_1 + a_2 + \dots + a_7 = 2 \times 1 + 2 \times 2 + \dots + 2 \times 7 = 2 + 4 + 6 + 8 + 10 + 12 + 14.$$
The series 3 + 5 + 7 + 9 + 11 + 13 + 15 can be rewritten as $\sum_{n=1}^{7} (2n + 1)$. The parentheses
help to clarify that we are not talking about $1 + \sum_{n=1}^{7} 2n$.
This says that the starting point is 1 and the ending point is 7. In other words, we add
up a1 , a2 , . . . , a7 , where for each n, an = 2n + 1. And so a1 = 3, a2 = 5, etc. Altogether:
$$\sum_{n=1}^{7} (2n + 1) = a_1 + a_2 + \dots + a_7 = \overbrace{(2 \times 1 + 1)}^{3} + \overbrace{(2 \times 2 + 1)}^{5} + \dots + \overbrace{(2 \times 7 + 1)}^{15}.$$
Example 210. The series 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024 can be written as
$$\sum_{n=1}^{10} 2^n.$$
This says that the starting point is 1 and the ending point is 10. In other words, we
add up a1 , a2 , . . . , a10 , where for each n, $a_n = 2^n$. And so a1 = 2, a2 = 4, etc. Altogether:
$$\sum_{n=1}^{10} 2^n = a_1 + a_2 + \dots + a_{10} = 2^1 + 2^2 + 2^3 + \dots + 2^{10} = 2 + 4 + 8 + \dots + 1024.$$
Example 211. The series 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024 can be written
as
$$\sum_{n=0}^{10} 2^n.$$
This says that the starting point is 0 and the ending point is 10. In other words, we
add up a0 , a1 , a2 , . . . , a10 , where for each n, $a_n = 2^n$. And so a0 = 1, a1 = 2, a2 = 4, etc.
Altogether:
$$\sum_{n=0}^{10} 2^n = a_0 + a_1 + a_2 + \dots + a_{10} = 2^0 + 2^1 + 2^2 + \dots + 2^{10} = 1 + 2 + 4 + \dots + 1024.$$
Exercise 94. Rewrite each of the following in summation notation. (Answer on p. 1063.)
(a) 1 + 4 + 9 + 16 + 25 + 36 + 49 + 64 + 81 + 100.
(b) 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23.
(c) 0.5 + 4 + 13.5 + 32 + 62.5 + 108 + 171.5.
Exercise 95. Find the sum of each of the following series. (Answer on p. 1063.)
(a) $\sum_{n=-2}^{5} (2 - n)^n$. (b) $\sum_{n=16}^{17} (4n + 5)$. (c) $\sum_{x=31}^{33} (x - 3)$.
$$\sum_{n=s}^{k} f(n) = f(s) + f(s + 1) + \dots + f(k).$$
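The general pattern translates almost word for word into code. In the sketch below, `sigma` is a made-up helper name; `range(s, k + 1)` runs through n = s, s + 1, ..., k, so both the starting point and the stopping point are included.

```python
def sigma(f, s, k):
    """Return f(s) + f(s + 1) + ... + f(k) for integers s <= k."""
    return sum(f(n) for n in range(s, k + 1))

# The worked summation examples above, re-evaluated with the helper.
assert sigma(lambda n: n, 1, 9) == 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9
assert sigma(lambda n: 1, 1, 9) == 9
assert sigma(lambda n: 2 * n, 1, 7) == 2 + 4 + 6 + 8 + 10 + 12 + 14
assert sigma(lambda n: 2 ** n, 1, 10) == 2046
assert sigma(lambda n: 2 ** n, 0, 10) == 2047
```

Renaming the lambda's parameter from `n` to any other letter changes nothing, which is the programming analogue of a dummy variable.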
Example 212. Consider the finite sequence (4, 7, 10, 13, 16, 19, 22). A corresponding func-
tion f for this sequence has
Example 213. Consider the infinite sequence (4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34, . . . ). A
corresponding function f for this sequence has
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (1) = 4 and f (n) − f (n − 1) = 3 for all n ≥ 2.
Definition 57. An arithmetic sequence (or an arithmetic progression) is any finite or in-
finite sequence (an ) where an+1 − an is a constant for all n = 1, 2, 3, . . . . We call an+1 − an
the common difference. We call the series for an arithmetic sequence an arithmetic series.
And its sum of series (if it exists at all) is called an arithmetic sum of series.
Example 214. The sequence (an ) = (1, 4, 7, 10, 13, 16, 19, . . . ) is an arithmetic sequence
because an+1 − an is constant for n = 1, 2, 3, . . . .
But the sequence (bn ) = (1, 1, 4, 7, 10, 13, 16, 19, . . . ) is not an arithmetic sequence because
b2 − b1 = 0 ≠ b3 − b2 = 3.
The next fact is intuitively obvious. Clearly, there is no number that, for example,
4 + 7 + 10 + 13 + 16 + 19 + 22 + . . . is equal to.
Fact 12. The infinite arithmetic sequence (an ) has no sum of series, except in the trivial
case where (an ) = (0, 0, 0, 0, 0, 0, . . . ).
Example 215. You’ve probably heard of the apocryphal story about an eight-year-old
Gauss adding up the numbers from 1 to 100 in an instant. The trick is to pair the first
number with the last, the second number with the second last, etc. then use multiplication.
Like this:
$$1 + 2 + 3 + 4 + \dots + 100 = \overbrace{\underbrace{(1 + 100)}_{101} + \underbrace{(2 + 99)}_{101} + \underbrace{(3 + 98)}_{101} + \dots + \underbrace{(50 + 51)}_{101}}^{50 \text{ terms}} = 101 \times 50 = 5050.$$
In general, there is a simple formula for the sum of a finite arithmetic series: (First Term
+ Last Term) × (Number of Terms) ÷ 2.
Fact 13. The finite arithmetic series a1 + a2 + ⋅ ⋅ ⋅ + ak has sum of series $(a_1 + a_k)\,\frac{k}{2}$.
Example 216. Consider the arithmetic sequence (7, 17, 27, 37, . . . , 837). Its common
difference is 10. The difference between the first and last terms is 830. And so the last term
is 830 ÷ 10 = 83 terms after the first. Hence, there are in total 84 terms. By Fact 13, its
sum of series is $(7 + 837) \times \frac{84}{2} = 35448$.
Example 217. Consider the arithmetic sequence (1, 5, 9, 13, 17, 21, 25, 29, 33, . . . , 393). Its
common difference is 4. The difference between the first and last terms is 392. And so the
last term is 392 ÷ 4 = 98 terms after the first. Hence, there are in total 99 terms. By Fact
13, its sum of series is $(1 + 393) \times \frac{99}{2} = 19503$.
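The counting step (how many terms?) and Fact 13 can be bundled into one small function. This is a sketch that assumes integer terms and a positive common difference; `arithmetic_sum` is an illustrative name, and each result is compared against a direct term-by-term sum.

```python
def arithmetic_sum(first, last, difference):
    """Sum of first, first + difference, ..., last, using (first + last) * k / 2.

    Assumes integer terms and a positive common difference.
    """
    steps, remainder = divmod(last - first, difference)
    if remainder != 0:
        raise ValueError("last term is not reachable from the first term")
    k = steps + 1                      # number of terms
    return (first + last) * k // 2

# The sum 1 + 2 + ... + 100 from Example 215, and the series of Example 217.
assert arithmetic_sum(1, 100, 1) == 5050 == sum(range(1, 101))
assert arithmetic_sum(1, 393, 4) == 19503 == sum(range(1, 394, 4))
```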
Exercise 96. Rewrite each of the following arithmetic series in summation notation and
compute their sums. (a) 2+7+12+17+22+27+32+⋅ ⋅ ⋅+997. (b) 3+20+37+54+71+⋅ ⋅ ⋅+1703.
(c) 81 + 89 + 97 + 105 + 113 + ⋅ ⋅ ⋅ + 8081 (Answer on p. 1064.)
Example 218. Consider the finite sequence (1, 2, 4, 8, 16, 32, 64, 128). A corresponding
function f for this sequence has
Example 219. Consider the infinite sequence (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, . . . ). A
corresponding function f for this sequence has
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (1) = 1 and f (n + 1) ÷ f (n) = 2 for all n = 1, 2, 3, . . . .
Definition 58. A geometric sequence (or a geometric progression) is any sequence $(a_n)$
where $a_{n+1} \div a_n$ is constant for all n = 1, 2, 3, . . . . We call $a_{n+1} \div a_n$ the common ratio. We
call the series for a geometric sequence a geometric series. And its sum of series (if it exists
at all) is called a geometric sum of series.
Example 220. The sequence $(a_n)$ = (1, 2, 4, 8, 16, 32, . . . ) is a geometric sequence because
$a_{n+1} \div a_n$ is constant for all n = 1, 2, 3, . . . .
But the sequence $(b_n)$ = (1, 1, 2, 4, 8, 16, 32, . . . ) is not a geometric sequence because
$b_2 \div b_1 = 1 \neq b_3 \div b_2 = 2$.
It turns out that just like with finite arithmetic series, there is a nice formula for the finite
geometric series. Let’s start with the simple case first where the first term is simply 1.
Fact 14. $1 + r + r^2 + r^3 + \dots + r^{n-1} = \dfrac{1 - r^n}{1 - r}$.
The trick used in the above proof is called the method of differences and the A-level
syllabus requires you to know it. The general case of a geometric series follows immediately
from the above:
Fact 15. $a_1 + a_1 r + a_1 r^2 + a_1 r^3 + \dots + a_1 r^{n-1} = a_1 \dfrac{1 - r^n}{1 - r}$.
Example 221. Consider the geometric sequence (1, 2, 4, 8, 16, . . . , 1024). Its common ratio
is 2. The ratio of the last term to the first is $1024 \div 1 = 1024 = 2^{10}$. And so the last term
is 10 terms after the first. Hence, there are in total 11 terms. Thus, its sum of series is
$$1 \times \frac{1 - 2^{11}}{1 - 2} = \frac{-2047}{-1} = 2047.$$
Example 222. Consider the geometric sequence (4, 12, 36, 108, . . . , 8748). Its common
ratio is 3. The ratio of the last term to the first is $8748 \div 4 = 2187 = 3^7$. And so the last
term is 7 terms after the first. Hence, there are in total 8 terms. Thus, its sum of series is
$$4 \times \frac{1 - 3^8}{1 - 3} = 4 \times \frac{-6560}{-2} = 4 \times 3280 = 13120.$$
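The two geometric sums above can be double-checked with a short Python sketch of Fact 15. The name `geometric_sum` is an illustrative assumption; exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction


def geometric_sum(first, ratio, terms):
    """Sum of first + first*ratio + ... + first*ratio**(terms - 1), for ratio != 1."""
    first, ratio = Fraction(first), Fraction(ratio)
    return first * (1 - ratio**terms) / (1 - ratio)   # Fact 15


# Example 221: 1 + 2 + 4 + ... + 1024 has 11 terms.
assert geometric_sum(1, 2, 11) == sum(2**n for n in range(11)) == 2047
# Example 222: 4 + 12 + 36 + ... + 8748 has 8 terms.
assert geometric_sum(4, 3, 8) == sum(4 * 3**n for n in range(8)) == 13120
```

Note that the formula breaks down when the common ratio is 1 (division by zero); in that case the sum is simply the first term times the number of terms.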
Exercise 97. Rewrite each of the following geometric series into summation notation and
compute their sums. (a) 7 + 14 + 28 + 56 + ⋅ ⋅ ⋅ + 448 + 896. (b) 20 + 10 + 5 + ⋅ ⋅ ⋅ + 5/8. (c)
1 + 1/3 + 1/9 + ⋅ ⋅ ⋅ + 1/243. (Answer on p. 1065.)
Perhaps surprisingly, it turns out that under a certain condition, an infinite geometric
sequence can have a sum of series. Again, let’s start with the simple case:
Fact 16. If $|r| < 1$, then $1 + r + r^2 + r^3 + \dots = \dfrac{1}{1 - r}$.
Since $|r| < 1$, it follows that as $n \to \infty$, $r^n \to 0$. Hence, if we take the difference, we have
simply $S - rS = 1$.
And so, $S = \dfrac{1}{1 - r}$.
Fact 17. If $|r| < 1$, then $a_1 + a_1 r + a_1 r^2 + a_1 r^3 + \dots = \dfrac{a_1}{1 - r}$.
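To see why the condition on the common ratio matters, it can help to watch the partial sums approach the claimed limit. The Python sketch below uses illustrative values (first term 3, common ratio 1/2) that are assumptions chosen for this demonstration and do not come from the textbook.

```python
from fractions import Fraction


def partial_sum(first, ratio, terms):
    """Sum of the first `terms` terms of the geometric series (Fact 15)."""
    return first * (1 - ratio**terms) / (1 - ratio)


first, ratio = Fraction(3), Fraction(1, 2)   # illustrative values with |ratio| < 1
limit = first / (1 - ratio)                  # Fact 17 predicts 3 / (1 - 1/2) = 6

for terms in (1, 5, 10, 20):
    gap = limit - partial_sum(first, ratio, terms)
    # The gap equals first * ratio**terms / (1 - ratio), which shrinks towards 0.
    print(terms, float(gap))
```

With a common ratio of absolute value 1 or more, the gap would not shrink, which is why Fact 17 carries the condition ∣r∣ < 1.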
Exercise 98. Rewrite each of the following infinite geometric series in summation notation
and compute its sum. (a) 6 + 9/2 + 27/8 + . . . . (b) 20 + 10 + 5 + . . . . (c) 1 + 1/3 + 1/9 + . . . . (Answer
on p. 1065.)
SYLLABUS ALERT
Proof by the method of mathematical induction is included in the 9740 (old) syllabus, but
not in the 9758 (revised) syllabus. So you can skip this Chapter if you’re taking 9758.
We’ll now learn a new technique called proof by the method of mathematical induc-
tion. It’s pretty difficult, so go real slow.33
Imagine an infinite chain of dominos. Our goal is to knock all of them down. Suppose we
manage to do two things:
Then we will have succeeded. Because once the 1st domino is knocked down, the inductive
step implies that the 2nd domino is also knocked down, and now again by the inductive
step the 3rd domino is also knocked down, and now again by the inductive step the 4th
domino is also knocked down, ..., ad infinitum (to infinity).
33
Which is perhaps why they decided to drop it from the revised 9758 syllabus! It does appear though as the first topic of
Further Maths, which will be revived in 2017 and for which a free textbook will soon be appearing!
Step #1. Let P(k) be (shorthand for) the proposition to be proven. Our goal is to show
that P(k) is true for all k = 1, 2, 3, . . .
Step #2 (the base case). Show that P(1) is true.
Step #3 (the inductive step). Show that P(j) implies P(j + 1) (for all j = 1, 2, 3, . . . ).
Step #1 rarely involves much work. Step #2 is usually, but not always, very easy. Step
#3 is usually the hardest part — on the A-level exams, it usually just involves some (or a
lot of) algebra.
Why does the method of mathematical induction work? Step #2 (the base case) shows
that P(1) is true (“knock down the 1st domino”). Step #3 (the inductive step) then
implies that P(2) is also true (“the falling 1st domino knocks down the 2nd domino”).
Step #3 (the inductive step) then implies that P(3) is also true (“the falling 2nd domino
knocks down the 3rd domino”).
Step #3 (the inductive step) then implies that P(4) is also true (“the falling 3rd domino
knocks down the 4th domino”).
Ad infinitum (to infinity). Thus, we have proven that P(k) is true for all k = 1, 2, 3, . . . , as
desired.
Too abstract? Work through all the examples and exercises and you should find that it is
not very difficult. For our first example, we’ll reprove an earlier fact, but now using the
method of mathematical induction.
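Proof. Step #1. Let P(k) be (shorthand for) the proposition that
$$1 + r + r^2 + r^3 + \dots + r^{k-1} = \frac{1 - r^k}{1 - r}.$$
Step #2 (the base case). P(1) is true because
$$1 = \frac{1 - r^1}{1 - r}. \quad ✓$$
Step #3 (the inductive step). Assume that P(j) is true, i.e.
$$1 + r + r^2 + r^3 + \dots + r^{j-1} \overset{1}{=} \frac{1 - r^j}{1 - r}.$$
Our goal is to show that P(j + 1) is true, i.e.
$$1 + r + r^2 + r^3 + \dots + r^j = \frac{1 - r^{j+1}}{1 - r}.$$
To this end, write:
$$\begin{aligned}
1 + r + r^2 + r^3 + \dots + r^j &= \left(1 + r + r^2 + r^3 + \dots + r^{j-1}\right) + r^j \\
&\overset{1}{=} \frac{1 - r^j}{1 - r} + r^j = \frac{1 - r^j + (1 - r) r^j}{1 - r} = \frac{1 - r^{j+1}}{1 - r}, \text{ as desired.}
\end{aligned}$$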
In this particular instance, the method of mathematical induction was terribly cumbersome,
compared to our earlier four-sentence proof (p. 239). But it turns out that in many other
instances, this method is the best and sometimes the only tool to use.
Let’s try more examples.
$$\sum_{r=1}^{j+1} r^2 = \frac{(j + 1)\left[(j + 1) + 1\right]\left[2(j + 1) + 1\right]}{6}.$$
I just used the “backwards-forwards method”. The order in which I wrote down each line
is given by the numbers above each = sign.
Another trick is to exploit the fact that it has got to work out right. So for example,
it might not immediately be obvious that $2j^2 + 7j + 6 = (j + 2)(2j + 3)$, but you know it
has got to work out right and thus this must surely be true (unless of course you made
some mistake with the algebra somewhere). And if you expand the RHS, you find that this
equation is indeed true.
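Proof. Step #1. Let P(k) be (shorthand for) the proposition that
$$a_1 + a_2 + \dots + a_k = \frac{k(a_1 + a_k)}{2}.$$
Step #2 (the base case). P(1) is true because $a_1 = \dfrac{1 \cdot (a_1 + a_1)}{2}$. ✓
Step #3 (the inductive step). Assume that P(j) is true, i.e.
$$a_1 + a_2 + \dots + a_j \overset{1}{=} \frac{j(a_1 + a_j)}{2}.$$
Our goal is to show that P(j + 1) is true, i.e.
$$a_1 + a_2 + \dots + a_{j+1} = \frac{(j + 1)(a_1 + a_{j+1})}{2}.$$
Let’s first observe that $a_j - a_1 = (j - 1)(a_{j+1} - a_j)$. In words, this equation says: Consider
the difference between the jth term and the first term; it is equal to j − 1 times the difference
between two consecutive terms. Rearranging, we have $a_j \overset{2}{=} \dfrac{(j - 1)a_{j+1} + a_1}{j}$.
Now write:
$$\begin{aligned}
a_1 + a_2 + \dots + a_{j+1} &= (a_1 + a_2 + \dots + a_j) + a_{j+1} \overset{1}{=} \frac{j(a_1 + a_j)}{2} + a_{j+1} \\
&\overset{2}{=} \frac{j\left\{a_1 + \left[(j - 1)a_{j+1} + a_1\right]/j\right\}}{2} + a_{j+1} = \frac{ja_1 + (j - 1)a_{j+1} + a_1}{2} + a_{j+1} \\
&= \frac{(j + 1)a_1 + (j - 1)a_{j+1}}{2} + a_{j+1} = \frac{(j + 1)a_1 + (j - 1)a_{j+1} + 2a_{j+1}}{2} \\
&= \frac{(j + 1)a_1 + (j + 1)a_{j+1}}{2} = \frac{(j + 1)(a_1 + a_{j+1})}{2}, \text{ as desired.}
\end{aligned}$$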
By the way, this shows that $\sum_{r=1}^{n} r^3 = \left(\sum_{r=1}^{n} r\right)^2$.
$$\sum_{r=1}^{n} r^4 = \frac{n(n + 1)(2n + 1)(3n^2 + 3n - 1)}{30}.$$
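Formulas like the two above are proven by induction, but before attempting a proof it is often worth confirming numerically that a formula is plausible. The Python sketch below does this for n = 1 to 50. It is only a sanity check, not a proof, and the helper name `power_sum` is an illustrative assumption.

```python
def power_sum(n, p):
    """Brute-force value of 1**p + 2**p + ... + n**p."""
    return sum(r**p for r in range(1, n + 1))


for n in range(1, 51):
    # The sum of cubes equals the square of the sum of the first n positive integers.
    assert power_sum(n, 3) == power_sum(n, 1) ** 2
    # Closed form for the sum of fourth powers (both sides multiplied by 30).
    assert 30 * power_sum(n, 4) == n * (n + 1) * (2 * n + 1) * (3 * n**2 + 3 * n - 1)
```

If an assertion fails for some small n, the algebra in the attempted proof is where to look for the mistake.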
Vectors
Example 224. The line running through points a and b goes forever, in both directions
(red dotted line). In contrast, the line segment ab is finite. The line ab is a different
mathematical object from the line segment ab.
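[Figure: the line through the points a and b (red dotted line), the line segment ab, and in grey the ray from a in the direction of b.]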
The length of the line segment ab is thus a well-defined concept. In contrast, it makes no
sense to talk about the length of the line ab.
A ray is a portion of a line, beginning at some point along the line, then going towards
infinity. You can think of a ray as a half-infinite-line. The figure above illustrates in grey
the ray that starts from the point a and goes in the direction b.
This textbook will strictly reserve the word ray to mean a half-infinite-line. But you should
know that some other writers use ray to mean a (finite) line segment.
We will not use the degree (°) as a unit of measurement for angles. In this textbook, the
unit of measurement for angles is the radian. As we’ll see in a moment, the radian is
actually a “unitless” unit. So we’ll always write, for example, “π/3” instead of “π/3 rad”.
But just to refresh your memory, 0 rad = 0°, π/4 rad = 45°, π/2 rad = 90°, π rad = 180°, and
2π rad = 360°. (This last sentence is the one and only time in this textbook that we’ll use
degrees as a unit of measurement for angles.)
θ is the zero angle if θ = 0,
θ is an acute angle if θ ∈ (0, π/2),
θ is a right angle if θ = π/2,
θ is an obtuse angle if θ ∈ (π/2, π),
θ is a straight angle if θ = π,
θ is a reflex angle if θ ∈ (π, 2π).
In the figure below, the angle A is acute, R is right, O is obtuse, S is straight, and X is
reflex. The zero angle is not depicted.
By convention, every angle is depicted as a sector of a circle, unless it is a right angle, in
which case it is depicted by a square.
[Figure: the angles A (acute), R (right), O (obtuse), S (straight), and X (reflex).]
Triangles are also given different names, depending on the size of their largest angle. A
triangle is:
[Figure: an obtuse triangle, an acute triangle, and a right triangle.]
Both the sine and cosine functions have domain R and codomain [−1, 1].
The tangent function has domain ⋅ ⋅ ⋅∪(−1.5π, −0.5π)∪(−0.5π, 0.5π)∪(0.5π, 1.5π)∪. . . —i.e.
all reals except half-integer multiples of π. And the tangent function’s codomain is R.
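Draw a unit circle. Then given any point $p = (p_x, p_y)$ on the unit circle and the angle A
that the line segment op makes with the positive x-axis, we define $\sin A = p_y$, $\cos A = p_x$,
and $\tan A = \dfrac{p_y}{p_x}$. Note that the line segment op has length 1.
[Figure: the unit circle, showing the point $p = (p_x, p_y)$ and the angle A between the line segment op and the positive x-axis.]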
In the case where A is acute (the point p is in the top-right quadrant of the cartesian
plane), one mnemonic is “SOH, CAH, TOA”: Sine is Opposite over Hypotenuse, Cosine
is Adjacent over Hypotenuse, and Tangent is Opposite over Adjacent.
Sine and cosine fluctuate between −1 and 1. We describe their fluctuations as being sinu-
soidal. In contrast, tangent fluctuates between −∞ and ∞. At half-integer multiples of π,
the tangent function is undefined.
You don’t need to memorise the following (because you have a calculator). But you will
solve problems a little more quickly if you have these memorised.
x        0     π/6     π/4     π/3     π/2         2π/3     3π/4     5π/6     π
sin x    0     1/2     √2/2    √3/2    1           √3/2     √2/2     1/2      0
cos x    1     √3/2    √2/2    1/2     0           −1/2     −√2/2    −√3/2    −1
tan x    0     √3/3    1       √3      Undefined   −√3      −1       −√3/3    0
For all x for which all expressions are well defined, we have:
$$\tan x = \frac{\sin x}{\cos x}.$$
The following formulae will appear in the List of Formulae you’ll get during exams, so
you don’t need to memorise them. Exam Tip: Whenever you see a question with
trigonometric functions, make sure you have this list right next to you! For all
A, B, P, Q for which all expressions are well-defined, we have:
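$$\begin{aligned}
\tan(A \pm B) &= \frac{\tan A \pm \tan B}{1 \mp \tan A \tan B}, \\
\tan 2A &= \frac{2 \tan A}{1 - \tan^2 A}, \\
\sin P + \sin Q &= 2 \sin\left(\frac{P + Q}{2}\right) \cos\left(\frac{P - Q}{2}\right), \\
\sin P - \sin Q &= 2 \cos\left(\frac{P + Q}{2}\right) \sin\left(\frac{P - Q}{2}\right), \\
\cos P + \cos Q &= 2 \cos\left(\frac{P + Q}{2}\right) \cos\left(\frac{P - Q}{2}\right), \\
\cos P - \cos Q &= -2 \sin\left(\frac{P + Q}{2}\right) \sin\left(\frac{P - Q}{2}\right).
\end{aligned}$$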
We define $\sin^2 A$ to be the square of $\sin A$. One might thus suppose that analogously,
$\sin^{-1} x = 1/\sin x$, but this is not so! Instead:
Definition 59. The arcsine function, denoted $\sin^{-1}$, has domain [−1, 1], codomain (and
range) [−0.5π, 0.5π], and rule x ↦ y where sin y = x.
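The difference between the arcsine and the reciprocal of sine is easy to see numerically. The sketch below uses Python's standard `math` module, where the arcsine is called `math.asin`; the input value 0.5 is an illustrative choice.

```python
import math

x = 0.5
arcsine = math.asin(x)          # the angle y in [-pi/2, pi/2] with sin(y) = x
reciprocal = 1 / math.sin(x)    # a different quantity altogether

print(arcsine)      # close to pi/6
print(reciprocal)   # roughly 2.09

assert math.isclose(arcsine, math.pi / 6)
assert math.isclose(math.sin(arcsine), x)
assert not math.isclose(arcsine, reciprocal)
```

Note also that the reciprocal here is bigger than 0.5π, so it could not even be a value of the arcsine function.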
Below is the graph of the arcsine function. The endpoints (−1, −0.5π) and (1, 0.5π) are
marked with red dots.
[Figure: graph of $y = \sin^{-1} x$, with endpoints at y = −0.5π and y = 0.5π.]
Below is the graph of the arccosine function. The endpoints (−1, π) and (1, 0) are marked
with blue dots.
Note that [0, π] are the principal values of the arccosine function. Why can’t we select
[−0.5π, 0.5π] as the principal values for the arccosine function, like we did for the arcsine
function?34
[Figure: graph of $y = \cos^{-1} x$.]
34
Because then cos−1 (−1), for example, would be undefined.
Below is the graph of the arctangent function. There are two horizontal asymptotes, namely
y = 0.5π and y = −0.5π. That is, as x → ±∞, y → ±0.5π.
Note that (−0.5π, 0.5π) are the principal values of the arctangent function.
[Figure: graph of $y = \tan^{-1} x$ for x from −10 to 10, with horizontal asymptotes y = 0.5π and y = −0.5π.]
Remark 7. This notation can be tremendously confusing, which is why many writers prefer
to write arcsin x, arccos x, and arctan x instead of $\sin^{-1} x$, $\cos^{-1} x$, $\tan^{-1} x$. But the Singapore
Cambridge A-level syllabus does not use the arcsin x, arccos x, or arctan x notation and so
neither shall this textbook.
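[Figure: a triangle with sides a, b, c and angles A, B, C. The height a sin C splits the base b into two parts of lengths b − a cos C and a cos C.]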
Proposition 4. A triangle with sides of lengths a, b, and c and angles A, B, and C has
area 0.5ab sin C.
Proof. The triangle has base b and height a sin C. Hence, its area is 0.5ab sin C.
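Proof. The area of the above triangle is 0.5ab sin C. By symmetry, it is also 0.5bc sin A and
0.5ac sin B. Equate these and divide by 0.5abc:
$$0.5ab \sin C = 0.5bc \sin A = 0.5ac \sin B \iff \frac{\sin C}{c} = \frac{\sin A}{a} = \frac{\sin B}{b} \iff \frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C}.$$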
Proposition 6. (The Law of Cosines.) For a triangle with sides of lengths a, b, and c
and angles A, B, and C, c2 = a2 + b2 − 2ab cos C.
$$\begin{aligned}
c^2 &= (a \sin C)^2 + (b - a \cos C)^2 \\
&= a^2 \sin^2 C + b^2 - 2ab \cos C + a^2 \cos^2 C \\
&= a^2 (\sin^2 C + \cos^2 C) + b^2 - 2ab \cos C \\
&= a^2 + b^2 - 2ab \cos C.
\end{aligned}$$
One perhaps-obvious implication of the Law of Cosines is that the length of any one side
of a triangle is always less than the sum of the lengths of the other two sides.
Proof. $c^2 = a^2 + b^2 - 2ab \cos C = a^2 + b^2 - 2ab + 2ab - 2ab \cos C = (a - b)^2 + 2ab(1 - \cos C) > (a - b)^2$.
Hence, c > a − b or a < b + c.
Example 225. The points a = (−1, 2), b = (3, −1), c = (−1, 1), and d = (3, −2) can be
illustrated graphically on the cartesian plane. The origin (0, 0) is usually named o.
[Figure: the points a, b, c, d, and the origin o plotted on the cartesian plane.]
We will not formally define vectors, because to do so would require more maths than is
covered at A-level. But informally, a vector is an “arrow” with two properties: direction
and length.
[Figure: the cartesian plane showing the vector $\overrightarrow{ab} = \mathbf{v} = (4, -3)$, which has length 5, the vector $\overrightarrow{cd} = \mathbf{v}$, and the vector $\mathbf{u}$.]
Given two points a and b, $\overrightarrow{ab}$ denotes the vector from point a to point b. The word vector
means carrier (in Latin). You may have learnt in biology that mosquitoes are vectors,
because they carry diseases (to humans). In mathematics likewise, a vector carries us
from one point to another.
Example 227. The vector $\overrightarrow{ab}$ carries us from point a to point b. The vector $\overrightarrow{cd}$ carries us
from point c to point d.
Example 228. The vector $\overrightarrow{ab} = (4, -3)$ carries us 4 units to the right and 3 units down. The
vector $\overrightarrow{cd} = (4, -3)$ carries us 4 units to the right and 3 units down. The vector $\mathbf{u} = (2, -1.5)$
carries us 2 units to the right and 1.5 units down.
Note that we’re now using the (x, y) ordered set notation for the third time!35
Do not confuse a point with a vector!
Example 229. The point (4, −3) is a zero-dimensional object. In contrast, the vector
(4, −3) is a two-dimensional object.
The vector (x, y) can also be written as $\begin{pmatrix} x \\ y \end{pmatrix}$.
Example 230. We can write $(4, -3) = \begin{pmatrix} 4 \\ -3 \end{pmatrix}$ and $\mathbf{u} = (2, -1.5) = \begin{pmatrix} 2 \\ -1.5 \end{pmatrix}$.
The notation $\begin{pmatrix} a \\ b \end{pmatrix}$ for vectors is very useful, because as we’ll see shortly, we’ll be doing a
lot of addition and multiplication with vectors, and this notation can help us see better (in
a literal sense). But in print, I’ll often prefer using the (a, b) notation, simply because this
takes up less space.
The point a is called the vector’s tail and the point b is called the vector’s head. This is
potentially confusing, so always remember: a vector carries us from tail to head and
not the other way round!
A vector is defined by two characteristics: direction and length.
It must be stressed that the tail and head of a vector do not matter. Only the
direction and length do. So long as two vectors have the same direction and length, they
are considered to be the exact same vector. The next example illustrates this.
35
So far, we have used (x, y) to denote (i) an open interval — specifically, the set of real numbers greater than x but smaller
than y; (ii) the ordered pair of real numbers x and y; and now also (iii) the vector that carries us x units to the right and
y units up.
Example 232. The vector (0, −1) can carry us from a to c or from b to d. Thus,
$\overrightarrow{bd} = (0, -1) = \overrightarrow{ac}$.
But $\overrightarrow{ca} = (0, 1) \neq \overrightarrow{ac} = (0, -1)$,
and $(0, -0.5) \neq \overrightarrow{ac} = (0, -1)$.
Yet another way of denoting vectors is by a single letter, either with a right arrow overhead
or in bold font. For example, in the figure above, the vector $\overrightarrow{ab}$ or $\overrightarrow{cd}$ is also named using
the letter v, either as $\vec{v}$ or as boldfont $\mathbf{v}$.
Example 233. So altogether, I can write the vector $\overrightarrow{ab}$ in five different ways:
$$\overrightarrow{ab} = \vec{v} = \mathbf{v} = (4, -3) = \begin{pmatrix} 4 \\ -3 \end{pmatrix}.$$
Exercise 102. Using a, b, c, or d from the above figure as the tail and a distinct point as
the head, there are 12 possible vectors. We’ve already written out 4 of these in the last two
examples. Write out the other 8 in ordered set notation. (Answer on p. 1069.)
Definition 62. Given a point $a = (a_1, a_2)$, its position vector is the vector $\mathbf{a} = (a_1, a_2)$.
The position vector of the point a carries us from the origin o to the point a and so it
can also be denoted $\overrightarrow{oa}$. Take care not to confuse the point $a = (a_1, a_2)$ with the vector
$\mathbf{a} = (a_1, a_2)$: they are different objects!
Informally, the zero vector is the vector that carries us nowhere. Formally:
Definition 63. The zero vector is the vector (0, 0) and can be denoted $\mathbf{0}$ or $\vec{0}$.
The analogy is to points in the real world — it makes no sense to talk about the sum of
two locations:
Example 234. Consider the points Paris and Tokyo. The sum Paris + Tokyo = ?? is
undefined. It makes no sense to talk about the sum of two locations.
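[Figure: the points a and b with the vector b − a; the point p, the vector v, and the point p + v; the point q, the vector u, and the point q − u.]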
36
At least in this textbook (and in the A-levels).
Definition 64. Given two points a = (a1 , a2 ) and b = (b1 , b2 ), their difference b−a is defined
to be the vector from a to b, i.e., b − a = (b1 − a1 , b2 − a2 ).
Example 235. Paris − Tokyo = The journey that carries us from Tokyo to Paris. We might
write Paris − Tokyo = (−9000 km, 1000 km), meaning that to get from Tokyo to Paris, we
must travel 9,000 km west and 1,000 km north.
It makes sense to talk about the distance of the journey from Tokyo to Paris. Shortly, we’ll
see that it similarly makes sense to talk about the length of the vector from a to b.
Example 236. (See figure on p. 261.) Given the points a = (−1, 2) and b = (3, −1), their
difference b − a is the vector from a to b, i.e., b − a = (3 − (−1), −1 − 2) = (4, −3).
Definition 65. Given the point p = (p1 , p2 ) and the vector v = (v1 , v2 ), their sum p + v is
defined to be the point p + v = (p1 + v1 , p2 + v2 ).
Example 237. Tokyo + (−9000 km, 1000 km) = Paris. This says that starting from Tokyo,
if we embark on a journey that carries us 9,000 km west and 1,000 km north, then we’ll
end up in Paris.
Example 238. (See figure on p. 261.) Consider the vector (4, −3). If its tail is a = (−1, 2),
then its head is (−1, 2) + (4, −3) = (3, −1) = b. And if its tail is c = (−1, 1), then its head is
(−1, 1) + (4, −3) = (3, −2) = d.
Definition 66. Given the point q = (q1 , q2 ) and the vector u = (u1 , u2 ), their difference
q − u is defined to be the point q − u = (q1 − u1 , q2 − u2 ).
Example 239. Paris − (−9000 km, 1000 km) = Tokyo. This says that starting from Paris,
if we embark on a journey that is the exact opposite of going 9,000 km west and 1,000 km
north (equivalently, we embark on a journey that goes 9,000 km east and 1,000 km south),
then we’ll end up in Tokyo.
Example 240. (See figure on p. 261.) Consider again the vector (4, −3). If its head is
b = (3, −1), then its tail is (3, −1) − (4, −3) = (−1, 2) = a. And if its head is d = (3, −2), then
its tail is (3, −2) − (4, −3) = (−1, 1) = c.
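The three operations just defined (point minus point, point plus vector, point minus vector) can be sketched in a few lines of Python. The function names `vector_between`, `move`, and `move_back` are illustrative assumptions, and both points and vectors are stored as plain (x, y) tuples.

```python
# Points and vectors are both stored as (x, y) tuples; keeping track of which
# is which is up to the reader, just as with the (x, y) notation in the text.
a, b, c, d = (-1, 2), (3, -1), (-1, 1), (3, -2)


def vector_between(tail, head):
    """Definition 64: the vector from the point `tail` to the point `head`."""
    return (head[0] - tail[0], head[1] - tail[1])


def move(point, vector):
    """Definition 65: the point reached by starting at `point` and following `vector`."""
    return (point[0] + vector[0], point[1] + vector[1])


def move_back(point, vector):
    """Definition 66: the point from which `vector` leads to `point`."""
    return (point[0] - vector[0], point[1] - vector[1])


v = vector_between(a, b)
assert v == (4, -3) == vector_between(c, d)            # Example 236
assert move(a, v) == b and move(c, v) == d             # Example 238
assert move_back(b, v) == a and move_back(d, v) == c   # Example 240
```

There is deliberately no function for adding two points, since that operation is left undefined.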
Exercise 103. Consider the vector (4, −3). (a) If it has tail (0, 0), then what is its head?
(b) If it has head (0, 0), then what is its tail? (c) If it has tail (5, 2), then what is its head?
(d) If it has head (5, 2), then what is its tail? (Answer on p. 1069.)
Definition 67. If u = (u1 , u2 ) and v = (v1 , v2 ) are vectors, then their sum, denoted u + v,
is the vector defined by u + v = (u1 + v1 , u2 + v2 ).
Geometrically, if the tail of v is the head of u, then u + v is the vector from the tail of u
to the head of v.
[Figure: the vectors u and v placed head to tail, together with their sum u + v.]
Example 241. (See figure on p. 261.) $\overrightarrow{ab} + \overrightarrow{bc} = (4, -3) + (-4, 2) = (0, -1) = \overrightarrow{ac}$.
Example 242. (See figure on p. 261.) $\overrightarrow{ad} + \overrightarrow{cb} = (4, -4) + (4, -2) = (8, -6)$.
Definition 68. If v = (v1 , v2 ), then its additive inverse, denoted −v, is defined by
−v = (−v1 , −v2 ).
Geometrically, if the vector v is from point a to point b, then −v is the vector from point
b to point a. And so informally, the additive inverse is simply the same vector but flipped
in the opposite direction.
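Example 243. The additive inverse of $\overrightarrow{ab}$ is $\overrightarrow{ba}$. That is, $-\overrightarrow{ab} = \overrightarrow{ba}$.
Example 244. The additive inverse of $\overrightarrow{bc}$ is $\overrightarrow{cb}$. That is, $-\overrightarrow{bc} = \overrightarrow{cb}$.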
Definition 69. Given two vectors u and v, their difference, denoted u − v, is defined to
be the sum of the vectors u and −v. Or equivalently, if u = (u1 , u2 ) and v = (v1 , v2 ), then
u − v is the vector defined by u − v = (u1 − v1 , u2 − v2 ).
Geometrically, if we place the heads of u and v at the same point, then u − v is the vector
from the tail of u to the tail of v.
[Figure: the vectors u and v with their heads at the same point, together with their difference u − v.]
Fact 19. Let p and q be two points with position vectors $\mathbf{p}$ and $\mathbf{q}$. Then $\overrightarrow{pq} = \mathbf{q} - \mathbf{p}$.
Example 245. (See figure on p. 261.) $\mathbf{b} - \mathbf{a} = (3, -1) - (-1, 2) = (4, -3) = \overrightarrow{ab}$.
Example 246. (See figure on p. 261.) $\mathbf{d} - \mathbf{c} = (3, -2) - (-1, 1) = (4, -3) = \overrightarrow{cd}$.
Example 247. (See figure on p. 261.) Without any numbers, we can compute:
$\overrightarrow{ab} - \overrightarrow{cb} = \overrightarrow{ab} + (-\overrightarrow{cb}) = \overrightarrow{ab} + \overrightarrow{bc} = \overrightarrow{ac}$. We can verify with numbers that this is correct:
$\overrightarrow{ab} - \overrightarrow{cb} = (4, -3) - (4, -2) = (0, -1) = \overrightarrow{ac}$. ✓
Example 248. (See figure on p. 261.) $\overrightarrow{ad} - \overrightarrow{cb} = (4, -4) - (4, -2) = (0, -2)$.
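The sum, additive inverse, and difference of vectors translate directly into code. The Python sketch below uses the illustrative names `add`, `negate`, and `subtract`, which are assumptions rather than notation from this textbook.

```python
def add(u, v):
    """Definition 67: componentwise sum of two vectors."""
    return (u[0] + v[0], u[1] + v[1])


def negate(v):
    """Definition 68: the additive inverse of a vector."""
    return (-v[0], -v[1])


def subtract(u, v):
    """Definition 69: u - v is the sum of u and the additive inverse of v."""
    return add(u, negate(v))


# Vectors read off from the points a = (-1, 2), b = (3, -1), c = (-1, 1), d = (3, -2).
ab, bc, cb, ad, ac = (4, -3), (-4, 2), (4, -2), (4, -4), (0, -1)

assert add(ab, bc) == ac               # Example 241
assert add(ad, cb) == (8, -6)          # Example 242
assert negate(bc) == cb                # Example 244
assert subtract(ab, cb) == ac          # Example 247
assert subtract(ad, cb) == (0, -2)     # Example 248
```

Defining `subtract` in terms of `add` and `negate` mirrors the wording of Definition 69, rather than subtracting the components directly.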
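Exercise 105. Using the figure on p. 261, compute each of the following: $\overrightarrow{ac} - \overrightarrow{cb}$, $\overrightarrow{dc} - \overrightarrow{ca}$,
$\overrightarrow{bd} - \overrightarrow{da}$, $\overrightarrow{ad} + \overrightarrow{cd}$, $\overrightarrow{dc} + \overrightarrow{bd}$, and $\overrightarrow{bd} - \overrightarrow{db}$. (Answer on p. 1069.)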
Definition 70. If a moving particle starts at point a and ends at point b, we call $\overrightarrow{ab}$ its
displacement vector.
Example 249. A particle is travelling along the red arc, along the path shown. Its starting
point is in blue and its ending point is in purple. Its displacement vector is thus (2, 2).
[Figure: a particle travelling along a red arc from its starting point (blue) to its ending point (purple), with displacement vector (2, 2).]
As you learnt in secondary school we can calculate the distance between two points using
the Pythagorean Theorem:
The vector $\mathbf{v} = (v_1, v_2)$ goes $v_1$ units right and $v_2$ units up. We are thus motivated to define
its length (or magnitude) as $|\mathbf{v}| = \sqrt{v_1^2 + v_2^2}$.
Example 250 (continued). Another way to find the distance between p and q is to first
find the vector that carries us from p to q. This is $\overrightarrow{pq} = (-2, -2)$. The distance between p
and q is thus simply the length (or magnitude) of this vector: $|\overrightarrow{pq}| = \sqrt{(-2)^2 + (-2)^2} = \sqrt{8}$.
Of course, the distance from p to q is the same as the distance from q to p. So we could
just as well have calculated the length of $\overrightarrow{qp} = (2, 2)$ and gotten the same answer:
$|\overrightarrow{qp}| = \sqrt{2^2 + 2^2} = \sqrt{8}$.
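The length computation can be sketched in Python using the standard library function `math.hypot`, which returns the square root of the sum of the squares of its arguments. The function name `length` is an illustrative assumption.

```python
import math


def length(v):
    """Length (or magnitude) of the 2D vector v = (v1, v2)."""
    return math.hypot(v[0], v[1])


pq = (-2, -2)
qp = (2, 2)

assert math.isclose(length(pq), math.sqrt(8))
assert math.isclose(length(qp), length(pq))   # same distance in either direction
assert length((4, -3)) == 5.0                 # the vector from a = (-1, 2) to b = (3, -1)
```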
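Exercise 106. (Answer on p. 1069.) Using the figure on p. 261, compute each of the
following:
$|\overrightarrow{ac} - \overrightarrow{cb}|$, $|\overrightarrow{dc} - \overrightarrow{ca}|$, $|\overrightarrow{bd} - \overrightarrow{da}|$, $|\overrightarrow{ad} + \overrightarrow{cd}|$, $|\overrightarrow{dc} + \overrightarrow{bd}|$, and $|\overrightarrow{bd} - \overrightarrow{db}|$.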
Exercise 107. (Answer on p. 1069.) In general, given any two vectors u and v, is it true
that
∣u + v∣ = ∣u∣ + ∣v∣?
A scalar is often contrasted with a vector. A vector has both magnitude (or length) and
direction. In contrast, a scalar has magnitude but no direction.
Definition 73. If v = (v1 , v2 ) is a vector and c ∈ R is a scalar, then cv denotes the vector
defined by cv = (cv1 , cv2 ). We call this operation scalar multiplication of a vector.
Graphically, cv is simply the vector that has the same direction as v (or the opposite
direction, if c is negative), but with ∣c∣ times the length. This is formally shown in the next fact.
[Figure: a vector v and its scalar multiple cv.]
Proof.
$$|c\mathbf{v}| = |(cv_1, cv_2)| = \sqrt{(cv_1)^2 + (cv_2)^2} = \sqrt{c^2 v_1^2 + c^2 v_2^2} = |c| \sqrt{v_1^2 + v_2^2} = |c| |\mathbf{v}|.$$
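Exercise 108. Using the figure on p. 261, write down $2\overrightarrow{ab}$, $3\overrightarrow{ac}$, and $4\overrightarrow{ad}$ in ordered set
notation. Verify that $|2\overrightarrow{ab}| = 2|\overrightarrow{ab}|$, $|3\overrightarrow{ac}| = 3|\overrightarrow{ac}|$, and $|4\overrightarrow{ad}| = 4|\overrightarrow{ad}|$. (Answer on p. 1070.)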
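Example 251. Let’s verify that the vectors (1, 0), (0, 1), and $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$ are all unit vectors:
$$|(1, 0)| = \sqrt{1^2 + 0^2} = 1, \quad ✓$$
$$|(0, 1)| = \sqrt{0^2 + 1^2} = 1, \quad ✓$$
$$\left|\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)\right| = \sqrt{\left(\frac{\sqrt{2}}{2}\right)^2 + \left(\frac{\sqrt{2}}{2}\right)^2} = \sqrt{2/4 + 2/4} = 1. \quad ✓$$
Example 252. Let’s verify that the vectors (1, 1) and (−1, −1) are not unit vectors:
$$|(1, 1)| = \sqrt{1^2 + 1^2} = \sqrt{2} \neq 1, \quad ✓$$
$$|(-1, -1)| = \sqrt{(-1)^2 + (-1)^2} = \sqrt{2} \neq 1. \quad ✓$$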
We specially reserve the name $\mathbf{i}$ (or $\vec{i}$) for the unit vector (1, 0), which is the unit vector
that is purely in the direction of the x-axis. Similarly, we specially reserve the name $\mathbf{j}$ (or
$\vec{j}$) for the unit vector (0, 1), which is the unit vector that is purely in the direction of the
y-axis.
And so, using also what we learnt about the sum of and scalar multiplication of vectors,
we can rewrite any vector into the sum of i’s and j’s:
[Figure: the cartesian plane showing the points a, b, and c, with vectors decomposed into steps of i along the x-axis and steps of j or −j along the y-axis.]
Definition 75. The unit vector in the direction v is defined by $\hat{\mathbf{v}} = \dfrac{1}{|\mathbf{v}|} \mathbf{v}$.
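Computing a unit vector amounts to dividing each component by the vector's length. The Python sketch below uses the illustrative vectors (3, 4) and (6, 8), which are assumptions chosen for this demonstration; the names `length` and `unit` are likewise hypothetical.

```python
import math


def length(v):
    """Length of the 2D vector v."""
    return math.hypot(v[0], v[1])


def unit(v):
    """Unit vector in the direction of v (Definition 75)."""
    norm = length(v)
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return (v[0] / norm, v[1] / norm)


u = unit((3, 4))
print(u)                                  # (0.6, 0.8)
assert math.isclose(length(u), 1.0)

# Scaling a vector by a positive number does not change its unit vector.
w = unit((6, 8))
assert all(math.isclose(x, y) for x, y in zip(u, w))
```

The zero vector is excluded because its length is 0, so the division in Definition 75 is undefined.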
Exercise 109. In the figure on p. 261, what are the unit vectors in the directions $\overrightarrow{ab}$, $\overrightarrow{ac}$,
and $\overrightarrow{ad}$? What are the unit vectors in the directions $2\overrightarrow{ab}$, $3\overrightarrow{ac}$, and $4\overrightarrow{ad}$? (Answer on p.
1070.)
Fact 21. If c is a scalar and $\hat{\mathbf{v}}$ is a unit vector, then the vector $c\hat{\mathbf{v}}$ has length ∣c∣.
Informally, two vectors have the same unit vector ⇐⇒ they both point in the same
direction. Formally:
Fact 22. Let a and b be any two non-zero vectors. Then $\hat{\mathbf{a}} = \hat{\mathbf{b}}$ ⇐⇒ a can be written as a
positive scalar multiple of b.
Informally, any vector in the plane can be written as the linear combination of any other
two vectors. Formally:
Fact 23. Let a and b be any two non-zero vectors in the same plane that are not parallel (i.e.
$\hat{\mathbf{a}} \neq \pm\hat{\mathbf{b}}$). Then every vector in the same plane can be written as αa + βb for some α, β ∈ R.
See TYS Exercise 338 (i) for an application of the above fact.
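Finding the coefficients α and β amounts to solving two linear equations in two unknowns. The Python sketch below does this with Cramer's rule, a technique that is not introduced in this textbook and is used here only for illustration. The vectors (2, 1), (1, 3), and (5, 5) are likewise illustrative assumptions.

```python
from fractions import Fraction


def coefficients(a, b, target):
    """Return (alpha, beta) such that alpha*a + beta*b == target, for 2D integer vectors."""
    det = a[0] * b[1] - b[0] * a[1]
    if det == 0:
        raise ValueError("a and b are parallel (or one of them is the zero vector)")
    alpha = Fraction(target[0] * b[1] - b[0] * target[1], det)
    beta = Fraction(a[0] * target[1] - target[0] * a[1], det)
    return alpha, beta


a, b, target = (2, 1), (1, 3), (5, 5)
alpha, beta = coefficients(a, b, target)

# Check the answer by rebuilding the target vector from a and b.
rebuilt = (alpha * a[0] + beta * b[0], alpha * a[1] + beta * b[1])
assert rebuilt == target
print(alpha, beta)   # 2 1
```

The determinant is zero exactly when the two vectors are parallel or one of them is zero, which is the case excluded by the fact above.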
Exercise 110. Given the vectors a = (1, 3) and b = (7, 5), show that each of the following
vectors can be written in the form αa + βb for some α, β ∈ R. (i) (0, 1). (ii) (1, 0). (iii)
(1, 1). (Answer on p. 1070.)
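Theorem 3. Ratio Theorem. Let a, b, and p be points, where p is on the line segment
ab. Let $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{p}$ be the corresponding position vectors. Then
$$\mathbf{p} = \frac{|\overrightarrow{bp}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|} \mathbf{a} + \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|} \mathbf{b}.$$
Or if we let $\lambda = |\overrightarrow{ap}|$ and $\mu = |\overrightarrow{bp}|$, then the above can be rewritten in a form that is perhaps
easier to remember:
$$\mathbf{p} = \frac{\mu \mathbf{a}}{\lambda + \mu} + \frac{\lambda \mathbf{b}}{\lambda + \mu} = \frac{\mu \mathbf{a} + \lambda \mathbf{b}}{\lambda + \mu}.$$
“The point dividing AB in the ratio λ ∶ µ has position vector $\dfrac{\mu \mathbf{a} + \lambda \mathbf{b}}{\lambda + \mu}$.”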
Example 255. Consider the points a = (8, 3) and b = (2, −6). Find the point p that divides
the line segment ab into the ratio 3 ∶ 7.
We have p = 0.7a +0.3b = 0.7(8, 3)+0.3(2, −6) = (6.2, 0.3). Hence, the point is p = (6.2, 0.3).
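The Ratio Theorem is easy to get backwards (the weight µ goes with a and the weight λ goes with b), so a quick computational check can help. In the Python sketch below, the function name `divide_segment` is an illustrative assumption, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def divide_segment(a, b, lam, mu):
    """Point dividing the segment ab in the ratio lam : mu, i.e. (mu*a + lam*b) / (lam + mu)."""
    total = lam + mu
    return tuple(Fraction(mu * ai + lam * bi, total) for ai, bi in zip(a, b))


# Example 255: a = (8, 3), b = (2, -6), ratio 3 : 7.
p = divide_segment((8, 3), (2, -6), 3, 7)
assert p == (Fraction(31, 5), Fraction(3, 10))   # i.e. (6.2, 0.3)
print(float(p[0]), float(p[1]))
```

As a sanity check, the ratio 3 ∶ 7 should give a point closer to a than to b, and indeed (6.2, 0.3) is closer to (8, 3) than to (2, −6).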
Exercise 111. (a) Consider the points a = (1, 2) and b = (3, 4). Find the point p that
divides the line segment ab into the ratio 5 ∶ 6. (b) Consider the points a = (1, 4) and
b = (2, 3). Find the point p that divides the line segment ab into the ratio 5 ∶ 1. (c)
Consider the points a = (−1, 2) and b = (3, −4). Find the point p that divides the line
segment ab into the ratio 2 ∶ 3. (Answer on p. 1070.)
Definition 76. Given two 2D vectors u = (u1 , u2 ) and v = (v1 , v2 ), their scalar product (or
dot product), denoted u ⋅ v, is defined by u ⋅ v = u1 v1 + u2 v2 .
And so to get the scalar product, simply multiply each term of each vector with the corre-
sponding term of the other, then add these up. It’s that simple!
The scalar product is itself simply a scalar (i.e. a real number). Hence the name.
Here is one use of the scalar product: the length of a vector is simply the square root of its
scalar product with itself. Formally:
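Fact 25. Given a vector v, $|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$.
Proof. By Definition 71 (length of vector), $|\mathbf{v}| = \sqrt{v_1^2 + v_2^2}$. By Definition 76 (scalar product),
$\mathbf{v} \cdot \mathbf{v} = v_1 v_1 + v_2 v_2 = v_1^2 + v_2^2$. Hence, $|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$.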
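Fact 26. Let θ ∈ [0, π] be the angle between two non-zero vectors u and v. Then
$\mathbf{u} \cdot \mathbf{v} = |\mathbf{u}| |\mathbf{v}| \cos\theta$.
The above fact37 gives us a very convenient way to calculate the angle between two vectors,
because rearranging, we have:
$$\theta = \cos^{-1}\left(\frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}| |\mathbf{v}|}\right).$$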
37
We have two possible interpretations of the scalar product that are entirely equivalent. We can use either of these
interpretations as our definition and then prove that the other interpretation is true.
(1) In this textbook, we first define the scalar product by u ⋅ v = u1 v1 + u2 v2 , then prove that u ⋅ v = ∣u∣ ∣v∣ cos θ. That is,
we start with the algebraic definition, then prove a geometric property.
(2) In contrast, others may prefer to first define the scalar product by u⋅v = ∣u∣ ∣v∣ cos θ, then prove that u⋅v = u1 v1 +u2 v2 .
That is, we start with the geometric definition, then prove an algebraic property.
Either way, we first define the scalar product one way or the other. We then prove that the alternative statement is
equivalent.
(It is possible that your JC teachers take the second approach, rather than the first, as is done in this textbook.
Or worse, your teachers simply leave you confused as to why the hell u ⋅ v = ∣u∣ ∣v∣ cos θ and at the same time, magically
enough, u ⋅ v = u1 v1 + u2 v2 . This was my experience as a JC student a number of years ago. If this is also your current
experience, hopefully this textbook has helped to clear things up!)
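$$\theta = \cos^{-1}\left(\frac{1 \times 1 + 0 \times 1}{\sqrt{1^2 + 0^2} \times \sqrt{1^2 + 1^2}}\right) = \cos^{-1}\left(\frac{1 + 0}{1 \times \sqrt{2}}\right) = \cos^{-1}\left(\frac{1}{\sqrt{2}}\right) = \frac{\pi}{4}. \quad ✓$$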
Example 260. The vector i = (1, 0) points east. The vector j = (0, 1) points north. We
know the angle between these two vectors is right (i.e. π/2). Let’s check and verify that the
formula works:
$$\theta = \cos^{-1}\left(\frac{1 \times 0 + 0 \times 1}{\sqrt{1^2 + 0^2} \times \sqrt{0^2 + 1^2}}\right) = \cos^{-1}\left(\frac{0 + 0}{1 \times 1}\right) = \cos^{-1} 0 = \frac{\pi}{2}. \quad ✓$$
$$\theta = \cos^{-1}\left(\frac{3 \times (-1) + 2 \times (-4)}{\sqrt{3^2 + 2^2} \times \sqrt{(-1)^2 + (-4)^2}}\right) = \cos^{-1}\left(\frac{-3 - 8}{\sqrt{13} \times \sqrt{17}}\right) = \cos^{-1}\left(\frac{-11}{\sqrt{221}}\right) \approx 2.404.$$
This is an example where the angle is obtuse, i.e. between π/2 and π.
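The angle formula can be sketched in Python with `math.acos`, which is the arccosine with principal values in [0, π]. The function names `dot` and `angle` are illustrative assumptions.

```python
import math


def dot(u, v):
    """Scalar product of two 2D vectors (Definition 76)."""
    return u[0] * v[0] + u[1] * v[1]


def angle(u, v):
    """Angle in [0, pi] between two non-zero 2D vectors."""
    cosine = dot(u, v) / (math.hypot(*u) * math.hypot(*v))
    # Guard against rounding pushing the cosine slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cosine)))


assert math.isclose(angle((1, 0), (1, 1)), math.pi / 4)    # acute
assert math.isclose(angle((1, 0), (0, 1)), math.pi / 2)    # right
assert abs(angle((3, 2), (-1, -4)) - 2.404) < 5e-4         # obtuse
```

The clamping step is a practical precaution only: mathematically the cosine always lies in [−1, 1], but floating-point arithmetic can overshoot by a tiny amount for parallel vectors.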
[Figure: the vectors (3, 2) and (−1, −4), with the angle of 2.404 rad between them.]
• $x > 0 \implies \cos^{-1} x \in [0, \frac{\pi}{2})$, i.e. $\cos^{-1} x$ is an acute (or zero) angle.
• $x = 0 \implies \cos^{-1} x = \frac{\pi}{2}$, i.e. $\cos^{-1} x$ is a right angle.
• $x < 0 \implies \cos^{-1} x \in (\frac{\pi}{2}, \pi]$, i.e. $\cos^{-1} x$ is an obtuse (or straight) angle.
These three observations, together with Fact 26, imply the following Fact, which by the
way was already illustrated by the previous three examples:
Definition 77. Two vectors are orthogonal (or perpendicular or normal) if the angle
between them is right (i.e. equal to π/2).
Exercise 112. First write down the angle between each of the following pairs of vectors
without using the above formula. Then verify that the formula does indeed give you
these correct angles: (a) (2, 0) and (0, 17); (b) (5, 0) and (−3, 0); (c) i and $(1, \sqrt{3}/3)$; (d) i
and $(1, \sqrt{3})$. (Answers on pp. 1071 and 1072.)
Exercise 113. Verify that i and j are orthogonal, by computing their scalar product.
(Answer on p. 1073.)
The scalar product also gives a convenient way of computing the length of the projection
of one vector on another.
Say we have a right triangle (left diagram) where the angle θ and the length a are known.
What is the length b? It is simply ∣a∣ cos θ.
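[Figure: left, a right triangle with angle θ, hypotenuse a, and adjacent side b; right, the vectors a (blue) and b (green) with the projection of a on b (red).]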
Now suppose a (blue) and b (green) are vectors (right diagram). The projection of the
vector a on the vector b is denoted a⊥b (red). Note that a⊥b is itself a vector.
What is the length of the projection? Well, if ∣a∣ is the length of the vector a and θ is the
angle between the two vectors, then the length of the projection is simply ∣a⊥b∣ = ∣a∣ cos θ.
Nicely enough, we actually have a quick alternative method of computing this length. Let
$\hat{\mathbf{b}}$ be the unit vector for b. Then $\mathbf{a} \cdot \hat{\mathbf{b}} = |\mathbf{a}| |\hat{\mathbf{b}}| \cos\theta = |\mathbf{a}| \cos\theta$.
So we have a nice interpretation for a ⋅ b̂ or more correctly ∣a ⋅ b̂∣, since a ⋅ b̂ may sometimes
be negative:
$$(3, 2) \cdot \widehat{(1, 1)} = (3, 2) \cdot \left[\frac{1}{|(1, 1)|}(1, 1)\right] = \frac{1}{\sqrt{2}}(3, 2) \cdot (1, 1) = \frac{1}{\sqrt{2}}(3 \times 1 + 2 \times 1) = \frac{5}{\sqrt{2}} = \frac{5\sqrt{2}}{2}.$$
You should verify for yourself that the length of the projection of (3, 2) on (1000, 1000) is
also $5\sqrt{2}/2$. The length of the vector to be projected, (3, 2), matters, but the length of
the vector onto which it is projected, be it (1, 1) or (1000, 1000), doesn’t matter.
$$(-6, 1) \cdot \widehat{(2, 0)} = (-6, 1) \cdot \left[\frac{1}{|(2, 0)|}(2, 0)\right] = \frac{1}{2}(-6, 1) \cdot (2, 0) = \frac{1}{2}(-6 \times 2 + 1 \times 0) = \frac{-12}{2} = -6.$$
Again, you can verify for yourself that the scalar product of (−6, 1) with the unit vector of
(50000, 0) is also −6, so that the length of the projection of (−6, 1) on (50000, 0) is again
∣−6∣ = 6. Again, the length of the vector to be projected, (−6, 1), matters, but the
length of the vector onto which it is projected, be it (2, 0) or (50000, 0), doesn’t matter.
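The two computations above can be reproduced with a short Python sketch. The names `signed_projection` and `projection_length` are illustrative assumptions rather than notation from this textbook; the first returns a ⋅ b̂ (which may be negative) and the second its absolute value.

```python
import math

def signed_projection(a, b):
    """Scalar product of a with the unit vector of b (may be negative)."""
    length_b = math.sqrt(sum(bi * bi for bi in b))
    return sum(ai * bi for ai, bi in zip(a, b)) / length_b

def projection_length(a, b):
    """Length of the projection of a on b, i.e. |a . b_hat|."""
    return abs(signed_projection(a, b))

# Scaling the vector projected onto leaves the result unchanged.
print(projection_length((3, 2), (1, 1)))
print(projection_length((3, 2), (1000, 1000)))
print(signed_projection((-6, 1), (2, 0)))  # -6.0
print(projection_length((-6, 1), (2, 0)))  # 6.0
```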
Exercise 114. What are the lengths of the projections of (a) (1, 0) on (33, 33) and (b)
(33, 33) on (1, 0)? (Answer on p. 1073.)
The angle between a vector v and the x-axis is simply the angle between v and i = (1, 0).
Similarly, the angle between v and the y-axis is simply the angle between v and j = (0, 1).
Example 264. Consider the angle a between the vector (3, 2) and the x-axis. We have:
We refer to $\alpha = 3/\sqrt{13}$ as the x-direction cosine of the vector (3, 2). By computing
$\cos^{-1} \alpha = \cos^{-1}(3/\sqrt{13}) \approx 0.588$, we find that the angle a between the vector (3, 2) and the
x-axis is 0.588.
[Figure: the vector (3, 2), with a labelled angle of 0.983 rad.]
We refer to $\beta = 2/\sqrt{13}$ as the y-direction cosine of the vector (3, 2). By computing
$\cos^{-1} \beta = \cos^{-1}(2/\sqrt{13}) \approx 0.983$, we find that the angle b between the vector (3, 2) and the
y-axis is 0.983.
Definition 78. Given a vector v, its x-direction cosine α is simply the length of the
projection of v̂ on the x-axis.
Similarly, its y-direction cosine β is simply the length of the projection of v̂ on the y-axis.
Fact 28. Let v be a vector and α and β be its x- and y-direction cosines. Then v̂ = (α, β).
Example 266. The x- and y-direction cosines of the vector (3, 2) are
$$\alpha = \frac{3}{\sqrt{13}} \quad \text{and} \quad \beta = \frac{2}{\sqrt{13}}.$$
Hence, the unit vector in the direction (3, 2) is $\left(\frac{3}{\sqrt{13}}, \frac{2}{\sqrt{13}}\right)$.
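Fact 28 suggests a direct way to compute direction cosines: divide each component of the vector by the vector's length. The following Python sketch does this; the function name `direction_cosines` is an illustrative assumption.

```python
import math

def direction_cosines(v):
    """Components of the unit vector of a non-zero vector v."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

alpha, beta = direction_cosines((3, 2))
print(round(math.acos(alpha), 3))  # 0.588, the angle with the x-axis
print(round(math.acos(beta), 3))   # 0.983, the angle with the y-axis
```

The same function works unchanged for 3D vectors, where it returns the x-, y-, and z-direction cosines.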
Exercise 115. For each of the following vectors, find their x- and y-direction cosines.
Hence write down their unit vectors. (a) (1, 3). (b) (4, 2). (c) (−1, 2). (Answer on p.
1073.)
In two dimensions, we had the cartesian (or two-dimensional) plane with x- and y-axes.
Informally, the x-axis goes to the right and the y-axis goes up. A point was any ordered
pair of real numbers. The origin o = (0, 0) was the intersection point of the two axes. And
relative to the origin, the generic point a = (a1 , a2 ) was the point a1 units to the right and
a2 units up.
In three dimensions, we now instead have the three-dimensional space (3D space).
The x- and y-axes are as before. There is an additional z-axis that, informally, comes
“out of the paper, perpendicular to the plane of the paper, straight towards your face”.
We call this the right hand coordinate system, because if you take your right hand,
stick out your thumb, forefinger, and middle finger so that they are perpendicular, your
thumb represents the x-axis, your forefinger the y-axis, and your middle finger the z-axis.
(Try it!)
(If instead the z-axis goes “into the paper”, then we’d have a left hand coordinate system.
Can you explain why?)
[Figure: the point (a1, a2, a3) in 3D space, with the x- and z-axes labelled.]
Exercise 116. (Answer on p. 1074.) (a) Fill in the blanks. A 3D vector is an “arrow”
that has two characteristics: __________ and __________. Just like a
point, it can be described by an __________ of __________. The vector
a = (a1 , a2 , a3 ) carries us from the origin to _______________.
(b) What other ways are there to denote the vector a = (a1 , a2 , a3 )? (Hint. The unit vector
in the z-axis is now called k.)
(c) Let a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) be points. What are (i) a + b; (ii) $a + \overrightarrow{ob}$; (iii) $\overrightarrow{oa} + \overrightarrow{ob}$;
and (iv) $\overrightarrow{oa} - \overrightarrow{ba}$?
[Figure: the point (a1, a2, a3) in 3D space, with the red, blue, and green dotted lines referred to below.]
First let’s calculate the distance of the red point from the origin, in other words the length
of the red dotted line. By the Pythagorean Theorem, it is $\sqrt{a_2^2 + a_3^2}$.
Now, notice the green dotted line, the red dotted line (length $\sqrt{a_2^2 + a_3^2}$), and the blue dotted
line (length $a_1$) form a right-angled triangle, with the hypotenuse being the green dotted
line. Thus, the length of the green dotted line is (again by the Pythagorean Theorem):
$$\sqrt{a_1^2 + \left(\sqrt{a_2^2 + a_3^2}\right)^2} = \sqrt{a_1^2 + a_2^2 + a_3^2}.$$
Definition 79. The length (or magnitude) of a vector a = (a1 , a2 , a3 ) is denoted ∣a∣ and
defined by $|a| = \sqrt{a_1^2 + a_2^2 + a_3^2}$.
This is very much analogous to the definition of the length (or magnitude) of a 2D vector.
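Definition 79 translates almost word for word into code. The sketch below uses the vector (2, 3, 6), which is an illustrative choice that does not appear in this textbook (it was picked because its length is a whole number); the function names are likewise assumptions.

```python
import math

def length(a):
    """Length (magnitude) of a vector, per Definition 79."""
    return math.sqrt(sum(c * c for c in a))

def unit_vector(a):
    """Vector of length 1 in the same direction as the non-zero vector a."""
    norm = length(a)
    return tuple(c / norm for c in a)

print(length((2, 3, 6)))       # 7.0, since 4 + 9 + 36 = 49
print(unit_vector((2, 3, 6)))
```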
Exercise 117. (Answer on p. 1075.) (a) Compute the lengths of the vectors a = (1, 2, 3),
b = (4, 5, 6), and a − b.
(b) Compute the lengths of the vectors 2a = (2, 4, 6), 3b = (12, 15, 18), and 4(a − b).
(c) Compute the unit vectors in the directions a = (1, 2, 3), b = (4, 5, 6), and a − b.
(d) Compute (1, 2, 3) ⋅ (4, 5, 6) and (−2, 4, −6) ⋅ (1, −2, 3).
(e) Compute the angles (i) between the vectors a = (1, 2, 3) and b = (4, 5, 6); and (ii)
between the vectors u = (−2, 4, −6) and v = (1, −2, 3). (iii) Are the vectors (−2, 4, −6) and
(1, −2, 3) orthogonal?
(g) Find the point that divides the line segment ab in the ratio 2 ∶ 3.
(h) For each of the following vectors, find their x-, y-, and z-direction cosines. And then
write down their unit vectors. (i) (1, 3, −2). (ii) (4, 2, −3). (iii) (−1, 2, −4).
Recall that given two 2D vectors u = (ux , uy ) and v = (vx , vy ), their scalar product was the
scalar defined by u ⋅ v = ux vx + uy vy . We now define a very similar concept.
Definition 80. Given two 2D vectors u = (ux , uy ) and v = (vx , vy ), their vector product (or
cross product), denoted u × v, is the scalar defined by
u × v = ux vy − uy vx .
Example 269. If u = (−1, 4) and v = (2, −3), then u × v = (−1) × (−3) − 4 × 2 = −5.
Ordinary multiplication is commutative. This simply means that given any real numbers
a, b, we have a × b = b × a. For example,
Example 272. If u = (−1, 4) and v = (2, −3), then u × v = (−1) × (−3) − 4 × 2 = −5, but
v × u = 4 × 2 − (−1) × (−3) = 5.
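A small Python sketch of Definition 80 makes the sign flip in Example 272 easy to check. The function name `cross2d` is an illustrative assumption.

```python
def cross2d(u, v):
    """2D vector product u x v = ux*vy - uy*vx (a scalar)."""
    return u[0] * v[1] - u[1] * v[0]

u, v = (-1, 4), (2, -3)
print(cross2d(u, v))  # -5
print(cross2d(v, u))  # 5, so swapping the order flips the sign
```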
Fact 29. Let u and v be two non-zero 2D vectors and θ ∈ [0, π] be the angle between them.
Then the scalar u × v is equal to either ∣u∣ ∣v∣ sin θ or − ∣u∣ ∣v∣ sin θ.
Earlier we already had one formula for calculating the angle between two vectors. Let
θ ∈ [0, π] be the angle between u and v. Then
$$\theta = \cos^{-1}\left(\frac{u \cdot v}{|u| \, |v|}\right).$$
The above Fact now gives us a second formula. Let θ ∈ [0, π/2] be the acute or right angle
between u and v. Then
$$\theta = \sin^{-1}\left|\frac{u \times v}{|u| \, |v|}\right|.$$
However, we’ll stick with using only the first cosine formula. We won’t use the second
sine formula, mainly because, as we’ll see, computing the vector product is very tedious,
especially in the 3D case, where it is a different creature altogether.
38
Footnote 37 explained that the scalar product could be defined in one of two equivalent ways.
Similarly, the vector product can be defined in one of two equivalent ways. We can use either definition and then prove
that the other is true.
(1) In this textbook, we first define the vector product by u × v = ux vy − uy vx ; we then prove that u × v = ± ∣u∣ ∣v∣ sin θ,
where θ is the angle between the two vectors. That is, we start with the algebraic definition, then prove a geometric
property.
The alternative approach is this:
(2) Define the vector product by u × v = ∣u∣ ∣v∣ sin θ if θ ∈ [0, π/2] or u × v = − ∣u∣ ∣v∣ sin θ if θ ∈ (π/2, π]; then prove that
u × v = ux vy − uy vx . That is, we start with the geometric definition, then prove an algebraic property.
SYLLABUS ALERT
Calculation of the area of a triangle or parallelogram is included in the 9740 (old) syllabus,
but not in the 9758 (revised) syllabus. So you can skip this section if you’re taking 9758.
The vector product is also helpful for computing the area of triangles and parallelograms.
Fact 30. The triangle with sides of lengths ∣u∣, ∣v∣, and ∣v − u∣ has area 0.5∣u × v∣.
[Figure: the triangle with sides u, v, and v − u and the height ∣v∣ sin θ, drawn for Case #1 and Case #2.]
Proof. Case #1. If the vectors u and v form an acute or right angle θ, then the area of the
triangle is simply 0.5 × Base × Height or 0.5 ∣u∣ ∣v∣ sin θ. And by Fact 29, 0.5 ∣u∣ ∣v∣ sin θ =
0.5∣u × v∣.
Case #2. And if the vectors u and v form an obtuse angle θ, then the area of the triangle is
again simply 0.5 × Base × Height or 0.5 ∣u∣ ∣v∣ sin(π − θ). Recall that sin(π − θ) = sin π cos θ −
sin θ cos π = sin θ. So again the area of the triangle is 0.5 ∣u∣ ∣v∣ sin θ or 0.5∣u × v∣.
Example 273. Consider the triangle formed by the points (0, 0), (3, 4), and (5, 6). Its
area is simply 0.5 ∣(3, 4) × (5, 6)∣ = 0.5∣3 × 6 − 4 × 5∣ = 1.
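Fact 30 can be turned into a short routine that takes three corner points, forms the two side vectors from the first corner, and halves the absolute value of their 2D vector product. This is a sketch only; the function name `triangle_area` is an assumption.

```python
def triangle_area(p, q, r):
    """Area of the triangle with corners p, q, r (2D points)."""
    u = (q[0] - p[0], q[1] - p[1])  # side vector from p to q
    v = (r[0] - p[0], r[1] - p[1])  # side vector from p to r
    return 0.5 * abs(u[0] * v[1] - u[1] * v[0])

print(triangle_area((0, 0), (3, 4), (5, 6)))  # 1.0, as in Example 273
```

Doubling the result gives the area of the parallelogram with sides u and v.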
[Figure: the parallelogram with sides u and v.]
Proof. Such a parallelogram is simply composed of two of the triangles from Fact 30. And
so its area is simply twice the area of the triangle, or 2 × 0.5∣u × v∣ = ∣u × v∣.
The 3D vector product is very different from the 2D vector product. The latter was simply
a scalar (real number); in contrast, the 3D vector product is instead a VECTOR!
Also previously, we first started with the algebraic definitions. For example, the 3D scalar
product was defined as u⋅v = u1 v1 +u2 v2 +u3 v3 and the 2D vector product as u×v = u1 v2 −u2 v1 .
We then showed that these algebraic definitions were equivalent to some geometric
interpretations.
For the vector product in 3D, I will go the other way round. That is, I will start with
the (very long) geometric definition, then show that it is equivalent to some algebraic
interpretation.
Let’s see what this first property means. Recall that it doesn’t matter where we put the
heads and tails of vectors. So let’s put u and v on the same plane, with their heads at the
same point.
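[Figure: the vectors u and v on a plane, with the angle θ between them and the vector u × v orthogonal to the plane.]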
We see that there are exactly two vectors that are orthogonal to both u and v — the vector
pointing up (green) and the vector pointing down (purple).
The third and last property specifies the length (or magnitude) of u × v.
3. ∣u × v∣ = ∣u∣ ∣v∣ sin θ, where θ ∈ [0, π] is the angle between them.
Fact 32. (a) i × j = k; (b) j × k = i; (c) k × i = j; (d) j × i = −k; (e) k × j = −i; and (f)
i × k = −j.
Proof. In each case, use the right-hand rule to show that properties #1 and #2 of Definition
81 are satisfied.
In each case, the length of the cross product is ∣u∣ ∣v∣ sin θ = 1 × 1 × sin(π/2) = 1. So indeed,
property #3 is also satisfied.
The next proposition gives the promised algebraic interpretation of the vector product.
$$u \times v = \begin{pmatrix} u_y v_z - u_z v_y \\ u_z v_x - u_x v_z \\ u_x v_y - u_y v_x \end{pmatrix}.$$
Proof. Optional, see p. 927 in Appendices. (The proof is actually quite simple. It just
involves some tedious algebra.)
u × v = (2 × 6 − 3 × 5, 3 × 4 − 1 × 6, 1 × 5 − 2 × 4) = (−3, 6, −3) .
Let’s verify that u×v is orthogonal to u, by computing (u × v)⋅u = (−2, −4, −2)⋅(−1, 3, −5) =
2 − 12 + 10 = 0 ✓. Similarly, let’s verify that u × v is orthogonal to v, by computing
(u × v) ⋅ v = (−2, −4, −2) ⋅ (2, −4, 6) = −4 + 16 − 12 = 0 ✓.
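The component formula and the two orthogonality checks above can be sketched in Python as follows. The function names `cross` and `dot` are illustrative assumptions.

```python
def cross(u, v):
    """3D vector product, using the component formula above."""
    ux, uy, uz = u
    vx, vy, vz = v
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)

def dot(u, v):
    """Scalar product of two vectors of equal dimension."""
    return sum(ui * vi for ui, vi in zip(u, v))

print(cross((1, 2, 3), (4, 5, 6)))  # (-3, 6, -3)

u, v = (-1, 3, -5), (2, -4, 6)
w = cross(u, v)
print(w)                     # (-2, -4, -2)
print(dot(w, u), dot(w, v))  # 0 0, so w is orthogonal to both u and v
```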
As in the 2D case of the vector product, here again in the 3D case, the vector product is
anticommutative, i.e. u × v = −v × u (see Exercise 120).
Exercise 118. For each of the following pairs of vectors, compute the vector product and
verify that it is orthogonal to each of the two vectors. (a) u = (0, 1, 2) and v = (3, 4, 5). (b)
u = (−1, −2, −3) and v = (1, 0, 5). (Answer on p. 1077.)
Exercise 119. Verify that in general, u × v is orthogonal to u and v by showing that
(u × v) ⋅ u = 0 and that (u × v) ⋅ v = 0. (Answer on p. 1077.)
Exercise 120. (a) Given u = (1, 2, 3) and v = (4, 5, 6), show that v × u = −u × v. (b) Prove
that in general, the 3D vector product is anti-commutative, i.e. u × v = −v × u. (Answer
on p. 1077.)
Example 278. If u = (1, 2, 3), v = (4, 5, 6), and w = (1, 0, 1), then (u × v)×w = (−3, 6, −3)×
(1, 0, 1) = (6, 0, −6), but u × (v × w) = (1, 2, 3) × (5, 2, −5) = (−16, 20, −8).
ax + by + c = 0.
This says that the line consists of exactly those points (x, y) that satisfy the equation
ax + by + c = 0.
(You may be more familiar with describing lines in the form y = mx+d. This simply involves
a rearrangement of the above equation. But the above equation is preferred because it is
more general — it allows for the possibility that the coefficient on y is 0.)
For convenience (but at the cost of some sloppiness), we may even simply identify the line
with the cartesian equation.
Describing lines using cartesian equations is secondary school stuff. We’ll now learn a
second method of describing lines — through vector equations. In general, any line can be
described in the form
r = p + λv, λ ∈ R,
where r is a generic point on the line, p is some known point on the line, v is a direction
vector of the line, and λ is a parameter that can take any real value.
This says that the line consists of every point r that can be written as (0, 2) + λ(1, 3) for
some real number λ. We call λ a parameter. As λ varies, we get different points of
the line. So for example, corresponding to λ = 0, 1, and −1, the line contains the points
(0, 2) + 0(1, 3) = (0, 2), (0, 2) + 1(1, 3) = (1, 5), and (0, 2)−1(1, 3) = (−1, −1). Of course, it
also contains infinitely many other points, one for each value of λ ∈ R.
We call (1, 3) a direction vector of the line. Note that this direction vector is not unique:
any non-zero scalar multiple thereof, i.e. c(1, 3) with c ∈ R and c ≠ 0, is also a direction vector of the line!
Again, we can either say “the line is described by the vector equation r = (0, 2) +
λ(1, 3)”. OR, for convenience (but at the cost of some sloppiness), we can also say “the line
is the very equation r = (0, 2) + λ(1, 3)”.
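To see how the parameter λ generates points, here is a minimal Python sketch that evaluates r = p + λv for a few values of λ. The function name `point_on_line` is an illustrative assumption.

```python
def point_on_line(p, v, lam):
    """Point p + lam * v on the line through p with direction vector v."""
    return tuple(pi + lam * vi for pi, vi in zip(p, v))

points = [point_on_line((0, 2), (1, 3), lam) for lam in (0, 1, -1)]
print(points)  # [(0, 2), (1, 5), (-1, -1)]
```

Because the function simply pairs up components, it works equally well for lines in 3D space.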
[Figure: the line with cartesian equation 3x − y + 2 = 0 and vector equation r = (0, 2) + λ(1, 3), together with the vector (1, 3).]
This says that the line consists of every point r that can be written as (0, 1) + λ(1, −1)
for some real number λ. Corresponding to λ = 0, 1, and −1, the line contains the points
(0, 1) + 0(1, −1) = (0, 1), (0, 1) + 1(1, −1) = (1, 0), and (0, 1)−1(1, −1) = (−1, 2).
[Figure: the line with cartesian equation x + y − 1 = 0 and vector equation r = (0, 1) + λ(1, −1), together with the vector (1, −1).]
This says that the line consists of every point r that can be written as (0, 3) + λ(1, 0)
for some real number λ. Corresponding to λ = 0, 1, and −1, the line contains the points
(0, 3) + 0(1, 0) = (0, 3), (0, 3) + 1(1, 0) = (1, 3), and (0, 3)−1(1, 0) = (−1, 3).
[Figure: the line with cartesian equation y − 3 = 0 and vector equation r = (0, 3) + λ(1, 0), passing through the point (0, 3).]
Exercise 121. Rewrite each of the following lines into vector equation form. (a) −5x+y+1 =
0. (b) x − 2y − 1 = 0. (c) y − 4 = 0. (d) x − 4 = 0. (Answer on p. 1078.)
$$r \overset{1}{=} p + \lambda \mathbf{v}, \quad \lambda \in \mathbb{R}.$$
Notice that the LHS of this equation is the generic Point r. And the RHS of the equation
is the Point p plus the Vector λv, which equals a Point (see p. 266). So LHS and RHS
do indeed match up.
There is another way to describe a line using a vector equation. We can instead write:
$$\mathbf{r} \overset{2}{=} \mathbf{p} + \lambda \mathbf{v}, \quad \lambda \in \mathbb{R},$$
where now $\mathbf{r}$ is the position vector of a generic point r on the line and $\mathbf{p}$ is the position
vector of some known point p on the line. So now LHS is a Vector and so too is RHS.
Equation $\overset{1}{=}$ said that the line consists of those points r that could be written as p + λv. In
contrast, equation $\overset{2}{=}$ says that the line consists of those points whose position vector $\mathbf{r}$ can
be written as $\mathbf{p} + \lambda \mathbf{v}$. But both equations can equally well describe the very same line. The
difference is a fine and pedantic one and really doesn’t matter much.
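$$r \overset{3}{=} \mathbf{p} + \lambda \mathbf{v}, \quad \lambda \in \mathbb{R}; \qquad \text{WRONG!}$$
$$\text{or} \quad \mathbf{r} \overset{4}{=} p + \lambda \mathbf{v}, \quad \lambda \in \mathbb{R}. \qquad \text{WRONG!}$$
The LHS of $\overset{3}{=}$ is a Point while the RHS of $\overset{3}{=}$ is a Vector. Therefore $\overset{3}{=}$ cannot be true.
The LHS of $\overset{4}{=}$ is a Vector while the RHS of $\overset{4}{=}$ is a Point. Therefore $\overset{4}{=}$ cannot be true.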
As usual, this is all very pedantic, but can serve as a useful test of your understanding.
In the previous section, given the cartesian equation of a line, we worked out its vector
equation. Now given its vector equation, we’ll work out its cartesian equation.
r = p + λv = (p1 , p2 ) + λ(v1 , v2 ),
where λ ∈ R and v is a non-zero vector.39 And so any point (x, y) on this line must satisfy
x = p1 + λv1 and y = p2 + λv2 .
The above are the cartesian equations for a line (on a 2D plane)! But wait a minute ...
isn’t there supposed to be just one equation? Well, if we’d like, we can quite easily combine
them into a single equation by eliminating the parameter λ. In general:
Fact 34. The line with vector equation r = (p1 , p2 ) + λ(v1 , v2 ) (for λ ∈ R) is the line with
cartesian equations as given by the 3 cases below.
(1) $\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2}$, if v1 , v2 ≠ 0;
(2) x = p1 , y is free, if v1 = 0, v2 ≠ 0;
(3) x is free, y = p2 , if v1 ≠ 0, v2 = 0.
Some examples:
39
Otherwise we’d simply be describing the single point p!
We can eliminate λ and reduce the above pair of equations into the single cartesian equation
y = x + 1 or
$$\frac{y - 1}{1} = \frac{x}{1}.$$
Example 285. The line described by the vector equation r = (0, 0) + λ(4, 5), where
λ ∈ R, has cartesian equations x = 4λ and y = 5λ.
As λ varies between −∞ and ∞, this pair of equations gives us the points that are on the
line. For example, when λ = 1, 17, 33, we have the points (4, 5), (68, 85), and (132, 165).
Example 286. The line described by the vector equation r = (3, 1) + λ(0, 2), where
λ ∈ R, has cartesian equations x = 3 and y = 1 + 2λ.
As λ varies between −∞ and ∞, this pair of equations gives us the points that are on the
line. So in fact, the above equations say that x must always be 3 and y is free to vary along
with λ. For example, when λ = 1, 17, 33, we have the points (3, 3), (3, 35), and (3, 67).
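Eliminating λ from x = p1 + λv1 and y = p2 + λv2 gives v2(x − p1) = v1(y − p2), which covers all three cases of Fact 34 at once because it involves no division. The Python sketch below returns the coefficients of the resulting equation in the form ax + by + c = 0; the function name and the choice of this particular (unnormalised) form are assumptions made for illustration.

```python
def cartesian_from_vector(p, v):
    """Coefficients (a, b, c) of a*x + b*y + c = 0 for the 2D line r = p + lam*v."""
    p1, p2 = p
    v1, v2 = v
    if v1 == 0 and v2 == 0:
        raise ValueError("the direction vector must be non-zero")
    # From v2*(x - p1) = v1*(y - p2): v2*x - v1*y - (v2*p1 - v1*p2) = 0.
    a, b = v2, -v1
    c = -(a * p1 + b * p2)
    return a, b, c

print(cartesian_from_vector((0, 2), (1, 3)))  # (3, -1, 2), i.e. 3x - y + 2 = 0
print(cartesian_from_vector((3, 1), (0, 2)))  # (2, 0, -6), i.e. x = 3
```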
Exercise 122. Rewrite each of the following lines into cartesian equation form. (Answer
on p. 1078.)
(a) r = (−1, 1) + λ(3, −2), where λ ∈ R.
Example 287. Consider the line described by the vector equation r = (1, 2, 3) + λ(0, 1, 1),
where λ ∈ R. Corresponding to λ = 0, 1, and −1, the line contains the points (1, 2, 3),
(1, 3, 4), and (1, 1, 2).
[Figure: the line r = (1, 2, 3) + λ(0, 1, 1) in 3D space, passing through the points (1, 2, 3), (1, 3, 4), and (1, 1, 2).]
[Figure: the line r = (0, 0, 0) + λ(1, 0, 0) in 3D space, passing through the points (−1, 0, 0), (0, 0, 0), and (1, 0, 0).]
We now try to work out the cartesian equation of a line in 3D space. Suppose a line can
be described by the vector equation
r = p + λv = (p1 , p2 , p3 ) + λ(v1 , v2 , v3 ),
where λ ∈ R and v is a non-zero vector.40 And so any point (x, y, z) on this line must satisfy
x = p1 + λv1 , y = p2 + λv2 , and z = p3 + λv3 .
The above are the cartesian equations for a line (in 3D space)! These are exactly analogous
to the cartesian equations (p. 31.2) in the 2D case.
Unlike in the 2D case, it is generally impossible to reduce these equations into a single
cartesian equation. However, we can reduce them into two equations.
Fact 35. The line with vector equation r = (p1 , p2 , p3 ) + λ(v1 , v2 , v3 ) where λ ∈ R is the line
with cartesian equations as given by the 7 cases below.
(1) $\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2} = \frac{z - p_3}{v_3}$, if v1 , v2 , v3 ≠ 0; (most common case)
(2) x = p1 , $\frac{y - p_2}{v_2} = \frac{z - p_3}{v_3}$, if v1 = 0, v2 , v3 ≠ 0;
(3) y = p2 , $\frac{x - p_1}{v_1} = \frac{z - p_3}{v_3}$, if v2 = 0, v1 , v3 ≠ 0;
(4) z = p3 , $\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2}$, if v3 = 0, v1 , v2 ≠ 0;
(5) x = p1 , y = p2 , z is free, if v1 , v2 = 0, v3 ≠ 0;
(6) x = p1 , z = p3 , y is free, if v1 , v3 = 0, v2 ≠ 0;
(7) y = p2 , z = p3 , x is free, if v2 , v3 = 0, v1 ≠ 0.
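The seven cases of Fact 35 follow one rule: a coordinate whose direction component is zero is pinned to the corresponding coordinate of p, and the remaining coordinates must give equal ratios. The Python sketch below tests whether a point satisfies the cartesian equations in this way. The function name `on_line` is an assumption, and exact `Fraction` arithmetic is used on the further assumption that all coordinates are integers.

```python
from fractions import Fraction

def on_line(point, p, v):
    """True if `point` satisfies the cartesian equations of r = p + lam*v."""
    if all(vi == 0 for vi in v):
        raise ValueError("the direction vector must be non-zero")
    ratios = set()
    for xi, pi, vi in zip(point, p, v):
        if vi == 0:
            # This coordinate never changes along the line.
            if xi != pi:
                return False
        else:
            ratios.add(Fraction(xi - pi, vi))
    # All ratios with non-zero denominators must agree (they equal lam).
    return len(ratios) == 1

print(on_line((5, 7, 9), (1, 2, 3), (4, 5, 6)))    # True  (case 1)
print(on_line((1, 2, 105), (1, 2, 3), (0, 0, 6)))  # True  (case 5)
print(on_line((2, 2, 9), (1, 2, 3), (0, 0, 6)))    # False (x is not 1)
```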
Example 289. Consider the line described by the vector equation r = (1, 2, 3) + λ(4, 5, 6),
where λ ∈ R. It can be described by the cartesian equations x = 1 + 4λ, y = 2 + 5λ, and
z = 3 + 6λ.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (5, 7, 9), (13, 17, 21), and (69, 87, 105).
By rearranging each equation so that λ is on one side, we can reduce these three equations
to just two:
$$\frac{x - 1}{4} = \frac{y - 2}{5} = \frac{z - 3}{6}.$$
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations.
Example 290. Consider the line described by the vector equation r = (0, 0, 0) + λ(2, 3, 5),
where λ ∈ R. It can be described by the cartesian equations x = 2λ, y = 3λ, and z = 5λ.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (2, 3, 5), (6, 9, 15), and (34, 51, 85).
By rearranging each equation so that λ is on one side, we can reduce these three equations
to just two:
$$\frac{x}{2} = \frac{y}{3} = \frac{z}{5}.$$
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations.
In the case where v1 = 0 (but v2 ≠ 0 and v3 ≠ 0), then this is a line that is on the 2D yz
plane where x = p1 .
Example 291. Consider the line described by the vector equation r = (1, 2, 3) + λ(0, 5, 6),
where λ ∈ R. It can be described by the cartesian equations x = 1, y = 2 + 5λ, and z = 3 + 6λ.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (1, 7, 9), (1, 17, 21), and (1, 87, 105).
By rearranging the second and third equations so that λ is on one side, we can reduce these
three equations to just two:
$$x = 1, \quad \frac{y - 2}{5} = \frac{z - 3}{6}.$$
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations.
Similarly, in the case where v2 = 0 (but v1 ≠ 0 and v3 ≠ 0), then this is a line that is on the
2D xz plane where y = p2 .
Example 292. Consider the line described by the vector equation r = (1, 2, 3) + λ(4, 0, 6),
where λ ∈ R. It can be described by the cartesian equations x = 1 + 4λ, y = 2, and z = 3 + 6λ.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (5, 2, 9), (13, 2, 21), and (69, 2, 105).
By rearranging the first and third equations so that λ is on one side, we can reduce these
three equations to just two:
$$y = 2, \quad \frac{x - 1}{4} = \frac{z - 3}{6}.$$
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations.
Example 293. Consider the line described by the vector equation r = (1, 2, 3) + λ(4, 5, 0),
where λ ∈ R. It can be described by the cartesian equations x = 1 + 4λ, y = 2 + 5λ, and z = 3.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (5, 7, 3), (13, 17, 3), and (69, 87, 3).
By rearranging the first and second equations so that λ is on one side, we can reduce these
three equations to just two:
$$z = 3, \quad \frac{x - 1}{4} = \frac{y - 2}{5}.$$
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations.
We now look at examples where exactly two of v1 , v2 , or v3 are zero (Cases 5, 6, and 7 of
Fact 35).
In the case where v1 = 0 and v2 = 0, but v3 ≠ 0, then this is a line that runs through the
points (p1 , p2 , λ) for λ ∈ R.
Example 294. Consider the line described by the vector equation r = (1, 2, 3) + λ(0, 0, 6),
where λ ∈ R. It can be described by the cartesian equations x = 1, y = 2, and z = 3 + 6λ.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (1, 2, 9), (1, 2, 21), and (1, 2, 105).
We see that x and y must always be equal to 1 and 2. Hence, the above equations simply
reduce to:
x = 1, y = 2.
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations. These are the points (1, 2, λ), where λ can be any real.
Example 295. Consider the line described by the vector equation r = (1, 2, 3) + λ(0, 5, 0),
where λ ∈ R. It can be described by the cartesian equations x = 1, y = 2 + 5λ, and z = 3.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (1, 7, 3), (1, 17, 3), and (1, 87, 3).
We see that x and z must always be equal to 1 and 3. Hence, the above equations simply
reduce to:
x = 1, z = 3.
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations. These are the points (1, λ, 3), where λ can be any real.
In the case where v2 = 0 and v3 = 0, but v1 ≠ 0, then this is a line that runs through the
points (λ, p2 , p3 ) for λ ∈ R.
Example 296. Consider the line described by the vector equation r = (1, 2, 3) + λ(4, 0, 0),
where λ ∈ R. It can be described by the cartesian equations x = 1 + 4λ, y = 2, and z = 3.
As λ varies between −∞ and ∞, these 3 equations give us the points that are on the line.
For example, when λ = 1, 3, 17, we have the points (5, 2, 3), (13, 2, 3), and (69, 2, 3).
We see that y and z must always be equal to 2 and 3. Hence, the above equations simply
reduce to:
y = 2, z = 3.
That is, this is the line that contains the points (x, y, z) which satisfy the above cartesian
equations. These are the points (λ, 2, 3), where λ can be any real.
Exercise 123. Rewrite each of the following vector equation descriptions of lines into
cartesian equations describing the same line. (a) r = (−1, 1, 1) + λ(3, −2, 1), where λ ∈ R.
(b) r = (5, 6, 1) + λ(7, 8, 1), where λ ∈ R. (c) r = (0, −3, 1) + λ(3, 0, 1), where λ ∈ R. (d)
r = (9, 9, 9) + λ(1, 0, 0), where λ ∈ R. (Answer on p. 1079.)
“If A is the point with position vector a = a1 i + a2 j + a3 k and the direction vector b is given
by b = b1 i + b2 j + b3 k, then the straight line through A with direction vector b has cartesian
equation
$$\frac{x - a_1}{b_1} = \frac{y - a_2}{b_2} = \frac{z - a_3}{b_3} \ (= \lambda).\text{”}$$
By the way, the above statement printed in the 9740 (old) List of Formulae is false (*gasp*),
because it fails to specify that b1 , b2 , b3 must be non-zero. (The correct statement was just
given as Fact 35.)
Consider for example, the point (0, 0, 0) and the direction vector b given by b = j + k. Then
contrary to the above statement, the straight line through A with direction vector b does
not have cartesian equation
$$\frac{x}{0} = \frac{y}{1} = \frac{z}{1},$$
because x/0 is undefined. This is the common mistake to which I devoted an entire chapter
(Chapter 2) earlier in this book. This seems like a very pedantic point to make, but dividing
by zero has been the cause of the downfall of many a student (and in this case some folks
at MOE or wherever the heck these things are written).
$$\frac{3x - 4}{6} = \frac{2y - 18}{5} = \frac{z - 1}{3}.$$
In order to directly apply Fact 35, you must make sure that the coefficients on x, y,
and z are all 1! So first rewrite the above into
$$\frac{x - 4/3}{2} = \frac{y - 9}{2.5} = \frac{z - 1}{3}.$$
And now by Fact 35, we can immediately describe this line by the vector equation r =
(4/3, 9, 1) + λ(2, 2.5, 3), for λ ∈ R.
$$\frac{5x}{2} = \frac{y - 13}{6} = \frac{3z - 14}{8}.$$
Rewrite these into
$$\frac{x}{2/5} = \frac{y - 13}{6} = \frac{z - 14/3}{8/3}.$$
And so by Fact 35, we can immediately describe this line by the vector equation r =
(0, 13, 14/3) + λ(2/5, 6, 8/3), for λ ∈ R.
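The rewriting step in the two examples above amounts to dividing the numerator and denominator of each fraction by the coefficient on the variable: (at − b)/c becomes (t − b/a)/(c/a). Here is a Python sketch of that step using exact fractions. The function name `line_from_terms` and the (a, b, c) encoding of each fraction are assumptions made for illustration, and the sketch handles only the case where all three fractions are present with a ≠ 0 and c ≠ 0.

```python
from fractions import Fraction

def line_from_terms(terms):
    """Each term (a, b, c) encodes the fraction (a*t - b) / c.

    Returns (point, direction) for the vector equation r = point + lam*direction.
    """
    point = tuple(Fraction(b, a) for a, b, c in terms)
    direction = tuple(Fraction(c, a) for a, b, c in terms)
    return point, direction

# (3x - 4)/6 = (2y - 18)/5 = (z - 1)/3
point, direction = line_from_terms([(3, 4, 6), (2, 18, 5), (1, 1, 3)])
print(point)      # fractions equal to (4/3, 9, 1)
print(direction)  # fractions equal to (2, 5/2, 3)
```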
Exercise 124. Rewrite each of the following cartesian equation descriptions of lines into
a vector equation describing the same line. (Answer on p. 1080.)
(a) $\frac{7x - 2}{5} = \frac{0.3y - 5}{7} = \frac{8z}{7}$. (b) $2x = 3y = 5z$.
(c) $17x - 4 = \frac{3y - 1}{2} = 3z$. (d) $\frac{x - 3}{2} = \frac{5z - 2}{7}$, $3y = 11$.
Definition 82. A set of points is said to be collinear if there is some line that contains
all of these points.
Any two points are always collinear — simply take the line that passes through both of
them.
In contrast, three points may not be collinear. To check whether three points are collinear,
1. First take the line that passes through two of the points.
2. Then check whether the third point is on this line.
[Figure: left, points a, b, and c that are not collinear; right, points a, b, and c that are collinear.]
Then check whether c is on the line: Is there λ such that c = (7, 8, 9) = (1, 2, 3) + λ(3, 3, 3)?
Rearranging, we have (6, 6, 6) = λ(3, 3, 3), which we can write out as:
6 = 3λ, 6 = 3λ, 6 = 3λ.
Clearly, all three of the above equations are true if λ = 2. And so c is also on the line.
Hence, the three points are collinear.
Example 301. Are the points d = (1, 0, 0), e = (0, 1, 0), and f = (0, 0, 1) collinear?
First take the line through d and e. The vector from d to e is (−1, 1, 0) and the line passes
through d. Hence, the line can be written as r = (1, 0, 0) + λ(−1, 1, 0) (λ ∈ R).
Then check whether f is on the line: Is there λ such that f = (0, 0, 1) = (1, 0, 0) + λ(−1, 1, 0)?
Rearranging, we have (−1, 0, 1) = λ(−1, 1, 0), which we can write out as:
−1 = −λ, 0 = λ, 1 = 0.
Clearly, there is no λ such that the above three equations can be true. And so the point f
is not on the line through d and e. Hence, the three points are not collinear.
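The two-step check described above (take the line through two of the points, then ask whether a single λ places the third point on it) can be sketched in Python as follows. The function name `collinear` is an assumption, integer coordinates are assumed so that `Fraction` gives exact arithmetic, and in the first call the middle point (4, 5, 6) is an assumed choice consistent with the direction vector (3, 3, 3) used in the text.

```python
from fractions import Fraction

def collinear(a, b, c):
    """True if the point c lies on the line through the points a and b."""
    v = tuple(bi - ai for ai, bi in zip(a, b))  # direction vector from a to b
    w = tuple(ci - ai for ai, ci in zip(a, c))  # we need w = lam * v
    if all(vi == 0 for vi in v):
        return True  # a and b coincide, and any two points are collinear
    k = next(i for i, vi in enumerate(v) if vi != 0)
    lam = Fraction(w[k], v[k])  # the only value of lam that can work
    return all(wi == lam * vi for wi, vi in zip(w, v))

print(collinear((1, 2, 3), (4, 5, 6), (7, 8, 9)))  # True  (lam = 2)
print(collinear((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # False (Example 301)
```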
Exercise 125. Determine whether each of the following sets of three points is collinear. (a)
a = (3, 1, 2), b = (1, 6, 5), and c = (0, −1, 0). (b) a = (1, 2, 4), b = (0, 0, 1), and c = (3, 6, 10).
(Answer on p. 1080.)
1. u ⊥ v ⇐⇒ u ⋅ v = 0.
In words: Two vectors are orthogonal if and only if their scalar product is 0.
2. Since the plane is a flat surface, there must be some vector n that is orthogonal
(perpendicular) to this plane.
That is, n is orthogonal to every vector on the plane. We call n the plane’s normal
vector (hence the use of the letter n).
Is the normal vector unique? No, because any other vector cn (where c is any non-zero scalar) serves
equally well as a normal vector. In the figure below, n is a normal vector to the illustrated
plane. So too is 0.5n. And so too is −n.
But otherwise, besides cn, there are no other vectors that are also orthogonal to
the plane. That is, any vector that cannot be written in the form cn is not orthogonal to
the plane.
[Figure: a plane, with black vectors that are on the plane and the normal vectors n, 0.5n, and −n.]
(r − p) ⋅ n = 0.
[Figure: a plane with normal vector n, points p and r on the plane, and a point q not on the plane; n is normal to r − p (a vector on the plane), but not to q − p (a vector not on the plane).]
Now consider any point q that is not on the plane. We can construct the vector q − p.
This vector q − p does not lie on the plane and must therefore not be orthogonal to n, the
plane’s normal vector. So, for any point q that does not lie on the plane, we have
(q − p) ⋅ n ≠ 0.
Fact 36. Suppose a plane contains point p and has normal vector n. Then the plane
contains exactly those points r such that
(r − p) ⋅ n = 0.
(r − p) ⋅ n = 0 ⇐⇒ r ⋅ n − p ⋅ n = 0 ⇐⇒ r ⋅ n = p ⋅ n.
Now, p is known (it is the position vector of a point p known to be on the plane). So too
is n (it is the plane’s normal vector). Thus, p ⋅ n is simply some known number.
So we can describe the plane even more simply by the vector equation
r ⋅ n = d,
where d = p ⋅ n.
$$d = p \cdot n = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 1 \times 1 + 2 \times 1 + 3 \times 0 = 3.$$
We thus conclude that the plane may be described by the vector equation r ⋅ (1, 1, 0) = 3.
This says that the plane contains exactly every point r, whose position vector r satisfies the
above equation. For example, the points r1 = (3, 0, 0), r2 = (0, 3, 5), and r3 = (1, 2, −1) are
on the plane, because their position vectors r1 = (3, 0, 0), r2 = (0, 3, 5), and r3 = (1, 2, −1)
satisfy the above equation, as we can easily verify:
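$$r_1 \cdot n = \begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 3 \times 1 + 0 \times 1 + 0 \times 0 = 3.$$
$$r_2 \cdot n = \begin{pmatrix} 0 \\ 3 \\ 5 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 0 \times 1 + 3 \times 1 + 5 \times 0 = 3.$$
$$r_3 \cdot n = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 1 \times 1 + 2 \times 1 + (-1) \times 0 = 3.$$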
Lest you be sceptical that a plane could be described so simply, let’s verify that two vectors
on the plane are indeed orthogonal to the normal vector n. First consider r2 −r1 = (0, 3, 5)−
(3, 0, 0) = (−3, 3, 5) — this is a vector on the plane. We can verify that indeed
$$(r_2 - r_1) \cdot n = \begin{pmatrix} -3 \\ 3 \\ 5 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = -3 \times 1 + 3 \times 1 + 5 \times 0 = 0.$$
Next consider p − r3 = (1, 2, 3) − (1, 2, −1) = (0, 0, 4) — this is also a vector on the plane. We
can verify that indeed
$$(p - r_3) \cdot n = \begin{pmatrix} 0 \\ 0 \\ 4 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = 0 \times 1 + 0 \times 1 + 4 \times 0 = 0.$$
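The checks in this example are all instances of one test: a point r is on the plane exactly when r ⋅ n = d, where d = p ⋅ n. A minimal Python sketch, with the function names `plane_constant` and `on_plane` chosen here purely for illustration:

```python
def dot(u, v):
    """Scalar product of two vectors of equal dimension."""
    return sum(ui * vi for ui, vi in zip(u, v))

def plane_constant(p, n):
    """The number d in the vector equation r . n = d."""
    return dot(p, n)

def on_plane(r, n, d):
    """True if the point r satisfies r . n = d."""
    return dot(r, n) == d

n = (1, 1, 0)
d = plane_constant((1, 2, 3), n)
print(d)  # 3
print([on_plane(r, n, d) for r in [(3, 0, 0), (0, 3, 5), (1, 2, -1)]])
# [True, True, True]
print(on_plane((0, 0, 0), n, d))  # False, since the origin gives 0, not 3
```

The exact comparison with `==` is fine for integer coordinates; with floating-point coordinates a small tolerance would be needed instead.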
$$d = p \cdot n = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -1 \\ 1 \end{pmatrix} = 0 \times 2 + 0 \times (-1) + 1 \times 1 = 1.$$
We thus conclude that the plane may be described by the vector equation r ⋅ (2, −1, 1) = 1.
This says that the plane contains exactly every point r, whose position vector r satisfies
the above equation. For example, the points r1 = (1, 1, 0), r2 = (0, 1, 2), and r3 = (1, 2, 1)
are on the plane, because their position vectors r1 = (1, 1, 0), r2 = (0, 1, 2), and r3 = (1, 2, 1)
satisfy the above equation, as we can easily verify:
$$r_1 \cdot n = (1, 1, 0) \cdot (2, -1, 1) = 1 \times 2 + 1 \times (-1) + 0 \times 1 = 1.$$
$$r_2 \cdot n = (0, 1, 2) \cdot (2, -1, 1) = 0 \times 2 + 1 \times (-1) + 2 \times 1 = 1.$$
$$r_3 \cdot n = (1, 2, 1) \cdot (2, -1, 1) = 1 \times 2 + 2 \times (-1) + 1 \times 1 = 1.$$
Lest you be sceptical that a plane could be described so simply, let’s verify that two vectors
on the plane are indeed orthogonal to the normal vector n. First consider r2 −r1 = (0, 1, 2)−
(1, 1, 0) = (−1, 0, 2) — this is a vector on the plane. We can verify that indeed
$$(r_2 - r_1) \cdot n = (-1, 0, 2) \cdot (2, -1, 1) = (-1) \times 2 + 0 \times (-1) + 2 \times 1 = 0.$$
Next consider p − r3 = (0, 0, 1) − (1, 2, 1) = (−1, −2, 0) — this is also a vector on the plane.
We can verify that indeed
$$(p - r_3) \cdot n = (-1, -2, 0) \cdot (2, -1, 1) = (-1) \times 2 + (-2) \times (-1) + 0 \times 1 = 0.$$
Example 304. A plane contains the points a = (1, 2, 3), b = (4, 5, 8), and c = (2, 3, 5).
Both vectors $\overrightarrow{ab} = (3, 3, 5)$ and $\overrightarrow{ac} = (1, 1, 2)$ are on the plane. Hence, a normal vector to the
plane is $\overrightarrow{ab} \times \overrightarrow{ac} = n = (1, -1, 0)$.
Since a ⋅ n = −1, the plane can be described by the vector equation r ⋅ (1, −1, 0) = −1.
Example 305. A plane contains the points a = (1, 0, 0), b = (0, 1, 0), and c = (0, 0, 1).
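Both vectors $\overrightarrow{ab} = (-1, 1, 0)$ and $\overrightarrow{ac} = (-1, 0, 1)$ are on the plane. Hence, a normal vector to
the plane is $\overrightarrow{ab} \times \overrightarrow{ac} = n = (1, 1, 1)$.
Since a ⋅ n = 1, the plane can be described by the vector equation r ⋅ (1, 1, 1) = 1.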
Example 306. A plane contains the points a = (0, 0, 3) and b = (1, 4, 5), and the vector
v = (3, 2, 1).
Both vectors $\overrightarrow{ab} = (1, 4, 2)$ and v = (3, 2, 1) are on the plane. Hence, a normal vector to the
plane is $\overrightarrow{ab} \times v = n = (0, 5, -10)$.
Since a ⋅ n = −30, the plane can be described by the vector equation r ⋅ (0, 5, −10) = −30.
Example 307. A plane contains the points a = (8, −2, 0) and b = (3, 6, 9), and the vector
v = (0, 1, 1).
Both vectors $\overrightarrow{ab} = (-5, 8, 9)$ and v = (0, 1, 1) are on the plane. Hence, a normal vector to
the plane is $\overrightarrow{ab} \times v = n = (-1, 5, -5)$.
Since a ⋅ n = −18, the plane can be described by the vector equation r ⋅ (−1, 5, −5) = −18.
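The recipe used in Examples 304 to 307 (take two vectors on the plane, form their vector product to get a normal vector n, then set d = a ⋅ n) can be sketched in a few lines of Python. The function names below are hypothetical helpers chosen for illustration.

```python
def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    """Scalar (dot) product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_from_points(a, b, c):
    """Plane through three points: returns (n, d) with r . n = d."""
    n = cross(sub(b, a), sub(c, a))
    return n, dot(a, n)


def plane_from_points_and_vector(a, b, v):
    """Plane through points a and b that also contains the vector v."""
    n = cross(sub(b, a), v)
    return n, dot(a, n)


# Example 304: gives n = (1, -1, 0) and d = -1.
n304, d304 = plane_from_points((1, 2, 3), (4, 5, 8), (2, 3, 5))
# Example 306: gives n = (0, 5, -10) and d = -30.
n306, d306 = plane_from_points_and_vector((0, 0, 3), (1, 4, 5), (3, 2, 1))
```

If the three points are collinear (or the two vectors are parallel), the computed n is the zero vector and no single plane is determined; the sketch does not guard against that case.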
Let n = (a, b, c) be the normal vector of a plane. Let p = (p1 , p2 , p3 ) be a point on the plane.
Then the plane can be described by the vector equation
r ⋅ n = p ⋅ n,
where r = (x, y, z) is the position vector of a generic point on the plane. Writing out the
vectors in the above equation explicitly, we have:
$$(x, y, z) \cdot (a, b, c) = (p_1, p_2, p_3) \cdot (a, b, c), \quad \text{or} \quad ax + by + cz = ap_1 + bp_2 + cp_3.$$
This last equation is the cartesian equation description of the same plane. Note, once
again, that d = ap1 + bp2 + cp3 is simply some known number. So this cartesian equation
simply says that the plane contains exactly those points (x, y, z) that satisfy the equation
ax + by + cz = d.
Example 308. The plane with vector equation r ⋅ (1, 1, 0) = 3 has cartesian equation
x + y = 3.
Example 309. The plane with vector equation r ⋅ (2, −1, 1) = 1 has cartesian equation
2x − y + z = 1.
Example 310. The plane with vector equation r ⋅ (1, −1, 0) = −1 has cartesian equation
x − y = −1.
Example 311. The plane with vector equation r ⋅ (1, 1, 1) = 1 has cartesian equation
x + y + z = 1.
Example 312. The plane with vector equation r ⋅ (0, 5, −10) = −30 has cartesian equation
5y − 10z = −30.
Example 313. The plane with vector equation r ⋅ (−1, 5, −5) = −18 has cartesian equation
−x + 5y − 5z = −18.
r ⋅ (a, b, c) = d ⇐⇒ ax + by + cz = d.
Example 315. Given a plane with cartesian equation 2x + 3z = −5, we immediately know
that it has vector equation r ⋅ (2, 0, 3) = −5.
Here’s a nice observation: Every plane that contains the origin (0, 0, 0) can be written in
the form ax + by + cz = 0. Conversely, every plane that does not contain the origin can be
written in the form ax + by + cz = 1. Formally:
Proof. Given a plane r ⋅ n = d, the origin is on the plane (and thus satisfies this equation)
⇐⇒ 0 ⋅ n = d = 0.
SYLLABUS ALERT
The following statement is in the old but not the new List of Formulae.
n1 x + n2 y + n3 z + d = 0 where d= −a ⋅ n.”
Exercise 126. Find the vector and cartesian equations that describe the plane containing
each of the following sets of three points: (Answer on p. 1081.)
(a) a = (7, 3, 4), b = (8, 3, 4), and c = (9, 3, 7).
(b) a = (8, 0, 2), b = (4, 4, 3), and c = (2, 7, 2).
(c) a = (8, 5, 9), b = (8, 4, 5), and c = (5, 6, 0).
Exercise 127. Write down the vector equations of the planes whose cartesian equations
are as given: (Answer on p. 1082.)
Example 316. The plane with vector equation r ⋅ (1, 0, 1) = 11 or cartesian equation x + z =
11 can be described in an infinite number of ways. For example, the same plane can also
be described by any of the following four equations: r ⋅ (2, 0, 2) = 22, r ⋅ (1/11, 0, 1/11) = 1,
2x + 2z = 22, and x/11 + z/11 = 1.
If you talk about the plane r ⋅ (2, 0, 2) = 22 and I talk about the plane x/11 + z/11 = 1, it
may take us a moment to realise that we are talking about the exact same plane. To save
ourselves such trouble, it may be desirable to describe planes in a standardised form, called
the Hessian normal form.
This involves simply picking the unit normal vector n̂ as our normal vector. However,
there are two possible unit normal vectors, one pointing “up” and the other pointing “down”.
We will choose the unit normal vector that ensures that p ⋅ n̂ ≥ 0, so that the RHS of our
vector or cartesian equation in Hessian normal form is always non-negative.
Example 317. Consider the plane r ⋅ (1, 0, 1) = 11 or x + z = 11. We have
$\widehat{(1, 0, 1)} = (\sqrt{2}/2, 0, \sqrt{2}/2)$. And so the plane can be rewritten in Hessian normal form as
$r \cdot (\sqrt{2}/2, 0, \sqrt{2}/2) = 11\sqrt{2}/2$ or $(\sqrt{2}/2)x + (\sqrt{2}/2)z = 11\sqrt{2}/2$.
Notice that in the Hessian normal form, the number dˆ = p ⋅ n̂ is uniquely defined. Indeed,
it is the distance of the plane from the origin! (We’ll prove this in section 33.2.)
Example 318. Consider the plane r ⋅ (8, 1, 3) = −3 or 8x + y + 3z = −3. We have
$\widehat{(8, 1, 3)} = (8/\sqrt{74}, 1/\sqrt{74}, 3/\sqrt{74})$. Note though that right now, the RHS is negative. So in order
to ensure that dˆ ≥ 0 (as required by the Hessian normal form), we need simply reverse the
sign of our unit normal vector. That is, we should pick $(-8/\sqrt{74}, -1/\sqrt{74}, -3/\sqrt{74})$ as our
unit normal vector. Altogether then, the plane can be rewritten in Hessian normal form as
$r \cdot (-8/\sqrt{74}, -1/\sqrt{74}, -3/\sqrt{74}) = 3/\sqrt{74}$ or $(-8/\sqrt{74})x - (1/\sqrt{74})y - (3/\sqrt{74})z = 3/\sqrt{74}$.
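The conversion to Hessian normal form is mechanical: divide by the length of the normal vector and, if the right-hand side comes out negative, flip the sign of both sides. Here is a minimal Python sketch; the function name `hessian_normal_form` is an illustrative choice, and floating-point arithmetic is assumed.

```python
import math


def hessian_normal_form(n, d):
    """Rewrite the plane r . n = d as r . n_hat = d_hat with |n_hat| = 1, d_hat >= 0.

    When d == 0 both unit normals give d_hat == 0; this sketch simply
    keeps the direction of n in that case.
    """
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("the normal vector must be non-zero")
    sign = -1.0 if d < 0 else 1.0  # reverse the normal if the RHS is negative
    n_hat = tuple(sign * c / length for c in n)
    d_hat = sign * d / length
    return n_hat, d_hat


# Example 317: r . (1, 0, 1) = 11
n_hat_317, d_hat_317 = hessian_normal_form((1, 0, 1), 11)
# Example 318: r . (8, 1, 3) = -3 (the sign of the normal is reversed)
n_hat_318, d_hat_318 = hessian_normal_form((8, 1, 3), -3)
```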
Exercise 128. Rewrite each of the following planes’ vector equation into Hessian normal
form. (Answer on p. 1082.)
Before we proceed, here are some useful things to remember. A line can be fully determined
by
41 If the two points are a and b, then the vector must be distinct from $c\,\overrightarrow{ab}$, for any c ∈ R.
Definition 83. The foot of the perpendicular from a point a to a line l is the point b on
the line l that is closest to the point a. The distance between the point a and the line l is
the length of the line segment ab.
[Figure: a point a, a line l, and the foot of the perpendicular b, with the distance between a and b marked.]
Note that the line ab must be perpendicular to the line l. Hence the name foot of the
perpendicular.
Rather than try to memorise the following proposition, it’s easier to just remember how
the proof works:
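Proof. Let b be the foot of the perpendicular from the point to the line.
(a) Pick any known point on the line; here the obvious choice is p. Consider the right-angled
triangle △bpa. It has hypotenuse of length $|\overrightarrow{pa}|$ and base of length $|\overrightarrow{pa} \cdot \hat{v}|$ (refer
to the diagram above). Hence, by the Pythagorean Theorem, the length of line segment ab
(or the distance between the point a and the line l) is:
$$\sqrt{|\overrightarrow{pa}|^2 - (\overrightarrow{pa} \cdot \hat{v})^2}, \quad \text{as desired.}$$
(b) The point b is a distance $|\overrightarrow{pa} \cdot \hat{v}|$ away from the point p, heading in the direction $\overrightarrow{pb}$.
Hence
$$b \overset{*}{=} p + |\overrightarrow{pa} \cdot \hat{v}| \, \widehat{\overrightarrow{pb}}.$$
There are two possible cases to examine.
Case #1: $\hat{v}$ is pointing in the same direction as $\overrightarrow{pb}$.
Then $\widehat{\overrightarrow{pb}} = \hat{v}$ and $\overrightarrow{pa} \cdot \hat{v} > 0$, so that $|\overrightarrow{pa} \cdot \hat{v}| = \overrightarrow{pa} \cdot \hat{v}$. Altogether then, $\overset{*}{=}$ becomes
$b = p + |\overrightarrow{pa} \cdot \hat{v}| \, \hat{v} = p + (\overrightarrow{pa} \cdot \hat{v}) \, \hat{v}$, as desired. ✓
Case #2: $\hat{v}$ and $\overrightarrow{pb}$ are pointing in opposite directions.
Then $\widehat{\overrightarrow{pb}} = -\hat{v}$ and $\overrightarrow{pa} \cdot \hat{v} < 0$, so that $|\overrightarrow{pa} \cdot \hat{v}| = -\overrightarrow{pa} \cdot \hat{v}$. Altogether then, $\overset{*}{=}$ becomes
$b = p + (-\overrightarrow{pa} \cdot \hat{v})(-\hat{v}) = p + (\overrightarrow{pa} \cdot \hat{v})(\hat{v})$, as desired. ✓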
On p. 930 in the Appendices (optional), I give another proof of the above Proposition using
calculus. The idea of this second proof will be illustrated in the last two examples of this
section.
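The two formulas established in the proof above, distance $\sqrt{|\overrightarrow{pa}|^2 - (\overrightarrow{pa} \cdot \hat{v})^2}$ and foot $p + (\overrightarrow{pa} \cdot \hat{v})\hat{v}$, translate directly into code. The sketch below is one possible Python rendering; the function name and argument order are assumptions made for illustration, and floating-point arithmetic is used.

```python
import math


def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    """Scalar (dot) product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def foot_and_distance_to_line(a, p, v):
    """Foot of the perpendicular from a to the line r = p + lambda*v, and the distance."""
    pa = sub(a, p)
    v_len = math.sqrt(dot(v, v))
    v_hat = tuple(c / v_len for c in v)
    t = dot(pa, v_hat)  # signed length of the projection of pa onto v_hat
    b = tuple(pi + t * c for pi, c in zip(p, v_hat))
    # Pythagoras; max() guards against a tiny negative value from rounding
    distance = math.sqrt(max(dot(pa, pa) - t * t, 0.0))
    return b, distance


# The point a = (1, 2, 3) and the line r = (0, 1, 2) + lambda*(9, 1, 3)
b, distance = foot_and_distance_to_line((1, 2, 3), (0, 1, 2), (9, 1, 3))
```

Because t keeps its sign, the same line of code covers both cases of the proof, whether v points towards b or away from it.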
And so $(\overrightarrow{pa} \cdot \hat{v})^2 = 169/91$. Hence, the length of the side is
$$\sqrt{3 - \frac{169}{91}} = \sqrt{\frac{104}{91}} = \sqrt{\frac{8}{7}} \approx 1.069.$$
[Figure: the point a, the line l through p = (0, 1, 2), and the foot of the perpendicular b; the distance between a and b is 1.069.]
Note that in this example, v and $\overrightarrow{pb}$ do point in the same direction and we have $\overrightarrow{pa} \cdot \hat{v} > 0$.
In contrast, in the next example, v and $\overrightarrow{pb}$ will point in opposite directions and we will
have $\overrightarrow{pa} \cdot \hat{v} < 0$.
[Figure: the point a, the line l through p = (3, 2, 1), and the foot of the perpendicular b; the figure labels the distance between a and b as 2.823.]
(b) The point a = (8, 0, 2) and the line l described by r = (4, 4, 3) + λ(2, 7, 2).
(c) The point a = (8, 5, 9) and the line l described by r = (8, 4, 5) + λ(5, 6, 0).
Example 321. Consider the point a = (1, 2, 3) and the line r = (0, 1, 2) + λ(9, 1, 3) (λ ∈ R).
The distance between a and a generic point r on the line is
$$|a - r| = |(1, 2, 3) - (9\lambda, 1 + \lambda, 2 + 3\lambda)| = |(1 - 9\lambda, 1 - \lambda, 1 - 3\lambda)|$$
$$= \sqrt{(1 - 9\lambda)^2 + (1 - \lambda)^2 + (1 - 3\lambda)^2} = \sqrt{91\lambda^2 - 26\lambda + 3}.$$
Our goal is to find the point r on the line that is closest to the point a. In other
words, our goal is to minimise $\sqrt{91\lambda^2 - 26\lambda + 3}$. So we can look for the minimum point
of $\sqrt{91\lambda^2 - 26\lambda + 3}$.
To simplify matters, note that minimising $\sqrt{91\lambda^2 - 26\lambda + 3}$ is the same as minimising
$91\lambda^2 - 26\lambda + 3$. So we might as well look for the minimum point of $91\lambda^2 - 26\lambda + 3$. To this end:
$$\frac{d}{d\lambda}\left(91\lambda^2 - 26\lambda + 3\right) = 182\lambda - 26 \overset{\text{set}}{=} 0 \iff \lambda = \frac{26}{182} = \frac{1}{7}.$$
Altogether then, the point b on the line l that is closest to the point a has parameter λ = 1/7.
So b = (0, 1, 2) + 1/7(9, 1, 3) = 1/7(9, 8, 17).
And the distance between a and l (or equivalently, the length of the line segment ab) is
$$\sqrt{91\lambda^2 - 26\lambda + 3} = \sqrt{91\left(\frac{1}{7}\right)^2 - 26\left(\frac{1}{7}\right) + 3} = \sqrt{\frac{8}{7}}.$$
Of course, these are the same as what we found in Example 319 a few pages ago.
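It may help to see why the calculus method and the projection method must agree. The following is a sketch of the general computation for a point a and a line r = p + λv, using only the notation already introduced; the symbols f and λ* are labels introduced here for convenience.

```latex
\begin{aligned}
f(\lambda) &= |a - (p + \lambda v)|^2
            = |\overrightarrow{pa}|^2 - 2\lambda\,(\overrightarrow{pa} \cdot v) + \lambda^2 |v|^2, \\
f'(\lambda) &= -2\,(\overrightarrow{pa} \cdot v) + 2\lambda |v|^2 = 0
            \iff \lambda^* = \frac{\overrightarrow{pa} \cdot v}{|v|^2}, \\
f(\lambda^*) &= |\overrightarrow{pa}|^2 - \frac{(\overrightarrow{pa} \cdot v)^2}{|v|^2}
            = |\overrightarrow{pa}|^2 - (\overrightarrow{pa} \cdot \hat{v})^2 .
\end{aligned}
```

In Example 321, $\overrightarrow{pa} = (1, 1, 1)$ and v = (9, 1, 3), so λ* = 13/91 = 1/7 and the squared distance is 3 − 169/91 = 8/7, matching the values found above.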
$$|a - r| = |(-1, 0, 1) - (3 + 5\lambda, 2 + \lambda, 1 + 2\lambda)| = |(-4 - 5\lambda, -2 - \lambda, -2\lambda)|$$
$$= \sqrt{(-4 - 5\lambda)^2 + (-2 - \lambda)^2 + (-2\lambda)^2} = \sqrt{30\lambda^2 + 44\lambda + 20}.$$
And the distance between a and l (or equivalently, the length of the line segment ab) is
$$\sqrt{30\lambda^2 + 44\lambda + 20} = \sqrt{30\left(\frac{-11}{15}\right)^2 + 44\left(\frac{-11}{15}\right) + 20} = \sqrt{\frac{58}{15}}.$$
Of course, these are the same as what we found in Example 320 a few pages ago.
Exercise 130. For each of the following, use the second method (calculus) to find (i) the
distance between the given point a and the given line l; and also (ii) the point b on the line
that is closest to a. (Answers on p. 1086.)
(a) The point a = (7, 3, 4) and the line l described by r = (8, 3, 4) + λ(9, 3, 7).
(b) The point a = (8, 0, 2) and the line l described by r = (4, 4, 3) + λ(2, 7, 2).
(c) The point a = (8, 5, 9) and the line l described by r = (8, 4, 5) + λ(5, 6, 0).
Definition 84. The foot of the perpendicular from a point a to a plane P is the point b on
the plane P that is closest to the point a. The distance between the point a and the plane
P is the length of the line segment ab.
[Figure: a point a, a plane containing the point p, and the foot of the perpendicular b, with the distance between a and b marked.]
Proposition 9. Given a point a (with position vector a) and a plane given in Hessian
normal form r ⋅ n̂ = dˆ:
(a) The distance between the point and the plane is ∣dˆ − a ⋅ n̂∣; and
(b) The foot of the perpendicular from the point to the plane is the point a + (dˆ − a ⋅ n̂) n̂.
Proof. Let b be the foot of the perpendicular from the point to the plane.
(a) Pick any point p on the plane. The length of the line segment ab, and hence also the
distance between the point and the plane, is simply the length of the projection of $\overrightarrow{ap}$ on
the plane’s normal vector, which is simply $|\overrightarrow{ap} \cdot \hat{n}| = |(p - a) \cdot \hat{n}| = |\hat{d} - a \cdot \hat{n}|$, as desired.
(b) The point b is a distance $|\hat{d} - a \cdot \hat{n}|$ away from a, heading in the direction $\overrightarrow{ab}$. Hence,
$$b \overset{*}{=} a + |\hat{d} - a \cdot \hat{n}| \, \widehat{\overrightarrow{ab}}.$$
There are two possible cases to examine.
Case #1: If $\hat{n}$ is pointing in the same direction as $\overrightarrow{ab}$, then $\hat{n} = \widehat{\overrightarrow{ab}}$. Moreover
$\overrightarrow{ap} \cdot \hat{n} = \hat{d} - a \cdot \hat{n} > 0$, so that $|\hat{d} - a \cdot \hat{n}| = \hat{d} - a \cdot \hat{n}$. Altogether then, $\overset{*}{=}$ becomes
$b = a + (\hat{d} - a \cdot \hat{n}) \, \hat{n}$, as desired. ✓
Case #2: If $\hat{n}$ and $\overrightarrow{ab}$ are pointing in opposite directions, then $\hat{n} = -\widehat{\overrightarrow{ab}}$. Moreover
$\overrightarrow{ap} \cdot \hat{n} = \hat{d} - a \cdot \hat{n} < 0$, so that $|\hat{d} - a \cdot \hat{n}| = -(\hat{d} - a \cdot \hat{n})$. Altogether then, $\overset{*}{=}$ becomes
$b = a - (\hat{d} - a \cdot \hat{n})(-\hat{n}) = a + (\hat{d} - a \cdot \hat{n}) \, \hat{n}$, as desired. ✓
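Proposition 9 can be tried out numerically with a short Python sketch. The function name below is hypothetical. The sketch normalises the given normal vector but does not bother to make dˆ non-negative: both the distance |dˆ − a ⋅ n̂| and the foot a + (dˆ − a ⋅ n̂) n̂ are unchanged if n̂ and dˆ both change sign.

```python
import math


def dot(u, v):
    """Scalar (dot) product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def foot_and_distance_to_plane(a, n, d):
    """Foot of the perpendicular from a to the plane r . n = d, and the distance."""
    n_len = math.sqrt(dot(n, n))
    n_hat = tuple(c / n_len for c in n)
    d_hat = d / n_len
    gap = d_hat - dot(a, n_hat)  # the signed quantity d_hat - a . n_hat
    b = tuple(ai + gap * c for ai, c in zip(a, n_hat))
    return b, abs(gap)


# The point a = (1, 2, 3) and the plane r . (1, 1, 1) = 3
b, distance = foot_and_distance_to_plane((1, 2, 3), (1, 1, 1), 3)
```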
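$$r \cdot \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right) = \frac{3}{\sqrt{3}} = \sqrt{3}.$$
So $\hat{n} = \frac{\sqrt{3}}{3}(1, 1, 1)$, $\hat{d} = \sqrt{3}$, and $a \cdot \hat{n} = 2\sqrt{3}$.
Altogether then, the distance between the point and the plane is $|\hat{d} - a \cdot \hat{n}| = |\sqrt{3} - 2\sqrt{3}| = \sqrt{3}$
and the foot of the perpendicular is
$$a + (\hat{d} - a \cdot \hat{n}) \, \hat{n} = (1, 2, 3) + (\sqrt{3} - 2\sqrt{3}) \frac{\sqrt{3}}{3}(1, 1, 1) = (0, 1, 2).$$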
By the way, notice that in this example, n points in the opposite direction from $\overrightarrow{ab}$. And
so $\widehat{\overrightarrow{ab}} = -\hat{n}$. And moreover, $\hat{d} - a \cdot \hat{n} < 0$.
[Figure (not to scale): the point a = (1, 2, 3), the plane, and the points p = (0, 1, 2) and b = (0, 1, 2) on it, with the distance between a and b marked.]
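$$r \cdot \left(\frac{1}{\sqrt{14}}, \frac{2}{\sqrt{14}}, \frac{3}{\sqrt{14}}\right) = \frac{32}{\sqrt{14}}.$$
So $\hat{n} = \frac{1}{\sqrt{14}}(1, 2, 3)$, $\hat{d} = \frac{32}{\sqrt{14}}$, and $a \cdot \hat{n} = 0$.
Altogether then, the distance between the point and the plane is $|\hat{d} - a \cdot \hat{n}| = \left|\frac{32}{\sqrt{14}} - 0\right| = \frac{32}{\sqrt{14}}$
and the foot of the perpendicular is
$$a + (\hat{d} - a \cdot \hat{n}) \, \hat{n} = (0, 0, 0) + \left(\frac{32}{\sqrt{14}} - 0\right) \frac{1}{\sqrt{14}}(1, 2, 3) = \frac{16}{7}(1, 2, 3).$$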
By the way, notice that in this example, n points in the same direction as $\overrightarrow{ab}$. And so
$\widehat{\overrightarrow{ab}} = \hat{n}$. And moreover, $\hat{d} - a \cdot \hat{n} > 0$.
[Figure (not to scale): the point a = (0, 0, 0), the plane containing p = (4, 5, 6), and the foot of the perpendicular b, with the distance between a and b marked.]
Consider two lines on the 2D cartesian plane that are parallel (and thus either do not
intersect or are identical). We define the angle between them to be 0.
Now consider two lines that intersect (see diagram below). Taking their intersection point
to be the vertex, A and B are, respectively, the acute and obtuse angles between the two
lines. Of course, there is the possibility that the two lines are perpendicular, in which case
A and B are both right (i.e. equal to π/2).
So when talking about “the angle between two lines”, there is some potential for confusion.
Are we talking about angle A or angle B?
By convention, the angle between two lines is the smaller angle. (Also, on the A-level
exams, they are usually quite careful to specify that they want the acute angle, so that
there is no confusion.)
Example 325. Consider the lines (on the 2D cartesian plane) r = (1, 3) + λ(2, 1) and
r = (−1, −1) + λ(1, 3) (λ ∈ R). The angle θ between their direction vectors v1 = (2, 1) and
v2 = (1, 3) is given by
$$\theta = \cos^{-1}\left(\frac{v_1 \cdot v_2}{|v_1| |v_2|}\right) = \cos^{-1}\left(\frac{(2, 1) \cdot (1, 3)}{|(2, 1)| |(1, 3)|}\right) = \cos^{-1}\left(\frac{5}{\sqrt{5}\sqrt{10}}\right) \approx 0.785.$$
[Figure: the two lines plotted on the xy-plane, with the vector (1, 3) and the angle A = 0.785 marked.]
$$\theta = \cos^{-1}\left(\frac{v_1 \cdot v_2}{|v_1| |v_2|}\right) = \cos^{-1}\left(\frac{(-2, 3) \cdot (3, 1)}{|(-2, 3)| |(3, 1)|}\right) = \cos^{-1}\left(\frac{-3}{\sqrt{13}\sqrt{10}}\right) \approx 1.837.$$
This is the obtuse angle between the two lines. So the acute angle between the two lines is
A = π − 1.837 = 1.305.
[Figure: the lines r = (0, 0) + λ(−2, 3) and r = (1, 0) + λ(3, 1) plotted on the xy-plane, with the vector (−2, 3), the acute angle A = 1.305, and the obtuse angle B = 1.837 marked.]
So the two vectors are parallel. Which means that the two lines are parallel and so by
definition, the angle between the two lines is 0.
[Figure: the two parallel lines plotted on the xy-plane, each labelled with its vector equation.]
Visualising lines in 3D space is difficult. Which is why we tackled the 2D case first.
It turns out that we compute angles between two lines in 3D space in exactly the same way
as in the 2D case.
1. If two lines are parallel, then again we define the angle between them to be 0.
2. If two lines intersect, then again we take their intersection point to be the vertex and
take the smaller angle formed to be the angle between the two lines.
On the 2D cartesian plane, the above were the only two possibilities — two lines either are
parallel or intersect. In contrast, in 3D space, there is the third possibility that two lines
neither are parallel nor intersect! As we’ll learn in section 35.1, any two lines that
neither are parallel nor intersect are called skew lines.
What is the angle between two skew lines, given that they do not intersect?
3. Given two skew lines, translate one of them so that they intersect. Examine the angle
between the two now-intersecting lines. This is defined to be the angle between the two
skew lines.
[Figure: two skew lines in 3D space (axes x and z labelled); one of the lines is translated so that they intersect, and the angle A between them is marked.]
So once again, given any two lines, the angle between them is simply the angle between
their direction vectors. So again the scalar product comes in handy.
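Since the angle between two lines depends only on their direction vectors, it can be computed the same way in 2D and in 3D. Below is a minimal Python sketch (the function name is illustrative) that applies the scalar product formula and then returns the smaller of the two possible angles, following the convention stated earlier.

```python
import math


def dot(u, v):
    """Scalar (dot) product; works for 2D and 3D vectors alike."""
    return sum(ui * vi for ui, vi in zip(u, v))


def angle_between_lines(v1, v2):
    """Angle (in radians, between 0 and pi/2) between lines with directions v1, v2."""
    cos_theta = dot(v1, v2) / (math.sqrt(dot(v1, v1)) * math.sqrt(dot(v2, v2)))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp away rounding error
    theta = math.acos(cos_theta)  # angle between the direction vectors
    return min(theta, math.pi - theta)  # by convention, take the smaller angle


angle_2d = angle_between_lines((2, 1), (1, 3))        # pi/4, about 0.785
angle_3d = angle_between_lines((9, 1, 3), (3, 2, 1))  # about 0.459
```

Position vectors of points on the lines are not needed at all, which is why translating one of two skew lines does not change the answer.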
$$\theta = \cos^{-1}\left(\frac{v_1 \cdot v_2}{|v_1| |v_2|}\right) = \cos^{-1}\left(\frac{(9, 1, 3) \cdot (3, 2, 1)}{|(9, 1, 3)| |(3, 2, 1)|}\right) = \cos^{-1}\left(\frac{32}{\sqrt{91}\sqrt{14}}\right) \approx 0.459.$$
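Example 330. Consider the lines r = (−1, 2, 3) + λ(0, 1, 0) and r = (0, 0, 0) + λ(8, −3, 5)
(λ ∈ R). The angle θ between their direction vectors v1 = (0, 1, 0) and v2 = (8, −3, 5) is given by
$$\theta = \cos^{-1}\left(\frac{v_1 \cdot v_2}{|v_1| |v_2|}\right) = \cos^{-1}\left(\frac{-3}{\sqrt{1}\sqrt{98}}\right) \approx 1.879.$$
So the obtuse angle between the two lines is 1.879. And the angle between the two lines is
1.263.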
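Example 331. Consider the lines r = (1, 3, 3) + λ(1, 5, 3) and r = (7, 4, 7) + λ(7, −2, 1)
(λ ∈ R). The angle θ between their direction vectors v1 = (1, 5, 3) and v2 = (7, −2, 1) is given by
$$\theta = \cos^{-1}\left(\frac{v_1 \cdot v_2}{|v_1| |v_2|}\right) = \cos^{-1}\left(\frac{7 - 10 + 3}{\sqrt{35}\sqrt{54}}\right) = \cos^{-1}(0) = \frac{\pi}{2}.$$
So the two lines are perpendicular and the angle between them is right (i.e. π/2).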
Exercise 133. Find the angle between each of the following pairs of lines. (Answer on p.
1090.)
(a) r = (−1, 2, 3) + λ(−1, 1, 0) and r = (0, 0, 0) + λ(2, −3, 4) (λ ∈ R).
Fact 38. The angle between the line r = p + λv and the plane r ⋅ n = d is
$$A = \sin^{-1}\left|\frac{v \cdot n}{|v| |n|}\right|.$$
Proof. Let θ be the angle between the line’s direction vector and the plane’s normal vector.
θ satisfies cos θ = v ⋅ n/ (∣v∣ ∣n∣).
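If θ is acute (or right), then the angle between the line and the plane is A = π/2 − θ. Thus,
$$\sin A = \sin\left(\frac{\pi}{2} - \theta\right) = \sin\left(\frac{\pi}{2}\right)\cos\theta - \sin\theta\cos\left(\frac{\pi}{2}\right) = \cos\theta = \frac{v \cdot n}{|v| |n|}.$$
Note that if θ ∈ [0, π/2], then v ⋅ n ≥ 0, so that ∣v ⋅ n∣ = v ⋅ n. Altogether, we indeed have
$$\sin A = \left|\frac{v \cdot n}{|v| |n|}\right| \quad \text{or} \quad A = \sin^{-1}\left|\frac{v \cdot n}{|v| |n|}\right|.$$
If instead θ is obtuse, then the angle between the line and the plane is A = θ − π/2. Thus,
$$\sin A = \sin\left(\theta - \frac{\pi}{2}\right) = \sin\theta\cos\left(\frac{\pi}{2}\right) - \sin\left(\frac{\pi}{2}\right)\cos\theta = -\cos\theta = \frac{-v \cdot n}{|v| |n|}.$$
Note that if θ ∈ (π/2, π], then v ⋅ n < 0, so that ∣v ⋅ n∣ = −v ⋅ n. Altogether, we indeed have
$$\sin A = \left|\frac{v \cdot n}{|v| |n|}\right| \quad \text{or} \quad A = \sin^{-1}\left|\frac{v \cdot n}{|v| |n|}\right|.$$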
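$$\sin^{-1}\left|\frac{v \cdot n}{|v| |n|}\right| = \sin^{-1}\left|\frac{(9, 1, 3) \cdot (1, 1, 1)}{|(9, 1, 3)| |(1, 1, 1)|}\right| = \sin^{-1}\left|\frac{13}{\sqrt{91}\sqrt{3}}\right| = \sin^{-1}\left|\frac{\sqrt{13}}{\sqrt{7}\sqrt{3}}\right| \approx 0.906.$$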
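Example 333. The angle between the line r = (4, 2, 3) + λ(1, 0, 1) (λ ∈ R) and the plane
r ⋅ (−1, −1, 1) = 5 is
$$\sin^{-1}\left|\frac{(1, 0, 1) \cdot (-1, -1, 1)}{|(1, 0, 1)| |(-1, -1, 1)|}\right| = \sin^{-1}\left|\frac{0}{\sqrt{2}\sqrt{3}}\right| = 0.$$
Example 334. The angle between the line r = (5, 5, 5) + λ(1, 0, 1) (λ ∈ R) and the plane
r ⋅ (0, 1, 0) = 3 is
$$\sin^{-1}\left|\frac{(1, 0, 1) \cdot (0, 1, 0)}{|(1, 0, 1)| |(0, 1, 0)|}\right| = \sin^{-1}\left|\frac{0}{\sqrt{2}\sqrt{1}}\right| = 0.$$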
Exercise 134. For each of the following, find the angle between the given line and plane.
(Answer on p. 1091.)
(a) r = (−1, 2, 3) + λ(−1, 1, 0) (λ ∈ R) and r ⋅ (3, 4, 5) = 0.
Given two planes P1 and P2 , the angle between them is simply the angle between any two
vectors v1 and v2 on the two planes.
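[Figure: two planes P1 and P2 with normal vectors n1 and n2 and vectors v1 and v2 on the planes; the angle between the two planes and the angle between the two planes’ normal vectors are marked.]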
But the normal vector n1 of the first plane is orthogonal to v1 ; similarly, the normal vector
n2 of the second plane is orthogonal to v2 . And so the angle between v1 and v2 is equal to
the angle between n1 and n2 .
Altogether then, the angle between two planes is simply the angle between their normal
vectors.
Again, there are two possible angles — by convention, we take the smaller one.
$$\theta = \cos^{-1}\left(\frac{n_1 \cdot n_2}{|n_1| |n_2|}\right) = \cos^{-1}\left(\frac{-2}{\sqrt{3}\sqrt{2}}\right) = \cos^{-1}\left(\frac{-2}{\sqrt{6}}\right) \approx 2.526.$$
This is the obtuse angle. So the acute angle between the two planes is π − 2.526 = 0.615
radian.
Example 336. Consider the planes r ⋅ (2, 1, 3) = 26 and r ⋅ (−3, 0, 5) = −25. The angle θ
between the two planes is
$$\theta = \cos^{-1}\left(\frac{n_1 \cdot n_2}{|n_1| |n_2|}\right) = \cos^{-1}\left(\frac{(2, 1, 3) \cdot (-3, 0, 5)}{|(2, 1, 3)| |(-3, 0, 5)|}\right) = \cos^{-1}\left(\frac{9}{\sqrt{14}\sqrt{34}}\right) \approx 1.146.$$
Exercise 135. Find the angle between the two given planes. (a) r ⋅ (−1, −2, −3) = 1 and
r ⋅ (3, 4, 5) = 2. (b) r ⋅ (1, −2, 3) = 3 and r ⋅ (5, 1, 1) = 4. (c) r ⋅ (1, 1, 8) = 5 and r ⋅ (−3, 0, 10) = 6.
(Answer on p. 1092.)
Definition 85. Two lines are parallel if their direction vectors can be written as scalar
multiples of each other.
Example 337. The lines r = (0, 0, 0) + λ(0, 1, 0) and r = (4, 17, 0) + λ(1, 0, 0) (λ ∈ R) are
not parallel, because (0, 1, 0) cannot be written as a scalar multiple of (1, 0, 0).
Example 338. The lines r = (8, 1, 1) + λ(3, 6, 9) and r = (4, 5, 6) + λ(1, 2, 3) (λ ∈ R) are
parallel, because (3, 6, 9) = 3(1, 2, 3).
Any two points are always coplanar — indeed, they are collinear (p1 and p2 in the figure
below). Three points are also always coplanar, although they may not be collinear (p1 , p2 ,
and p3 in the figure below). But four points may not be coplanar (p1 , p2 , p3 , and p4 in the
figure below).
[Figure: a plane containing the points p2 and p3, together with Line 2. Label: “Two points are coplanar. They also lie on the same line.”]
Definition 87. Two lines are coplanar if there is some plane on which both lie. Two lines
that are not coplanar are called skew lines.
Example 339. In the figure above, Line 1 and Line 2 are skew lines. Line 1 lies on the
plane illustrated. Line 2 cuts through the plane and does not intersect Line 1.
1. If they are parallel, then obviously we can construct a plane that contains both lines.
And so the two lines are coplanar.
2. If they are not parallel and they lie on the same plane, then they must intersect. This
is just the familiar fact you learnt in primary school — two non-parallel lines on the
plane must definitely intersect.
Altogether we conclude:
Fact 39. Two lines are coplanar if and only if they (i) are parallel; OR (ii) intersect.
Equivalently, two lines are skew if and only if they (i) are not parallel; AND (ii) do not
intersect.
Example 340. Consider the lines r = (8, 1, 1)+λ(3, 6, 9) and r = (4, 5, 6)+λ(1, 2, 3) (λ ∈ R).
The direction vector of one can be written as the scalar multiple of the other, so they are
parallel. Hence, they are also coplanar; or equivalently, they are not skew.
Example 341. Consider the lines r = (0, 0, 0) + λ(0, 1, 0) and r = (4, 17, 0) + λ(1, 0, 0)
(λ ∈ R). The direction vector of one cannot be written as the scalar multiple of the
other, so they are not parallel. If they intersect, then there are reals α and β such that
(0, 0, 0) + α(0, 1, 0) = (4, 17, 0) + β(1, 0, 0), or
0 = 4 + β, α = 17, and 0 = 0.
α = 17, β = −4 solves the above equations. (What does this mean? This means that the
first line goes through the point (0, 0, 0) + α(0, 1, 0) = (0, 17, 0) and the second line also goes
through the same point (4, 17, 0) + β(1, 0, 0) = (0, 17, 0).)
The two lines intersect at (0, 17, 0). And so they are coplanar — or equivalently, they are
not skew.
If we’d like, we can easily find the plane on which these two lines lie. Remember: All
we need are two distinct vectors and a point to determine a plane. We already have two
distinct vectors, namely the direction vectors of the two lines. Using these, we can find
a normal vector for the plane — namely (0, 1, 0) × (1, 0, 0) = (0, 0, −1). Noting also that
the origin is on the first line and therefore on the plane, we conclude that the plane is
r ⋅ (0, 0, −1) = 0.
$$9\alpha \overset{1}{=} 4 + 3\beta, \quad 1 + \alpha \overset{2}{=} 5 + 2\beta, \quad \text{and} \quad 2 + 3\alpha \overset{3}{=} 6 + \beta.$$
Take 2× $\overset{3}{=}$ minus $\overset{2}{=}$ to get (4 + 6α) − (1 + α) = (12 + 2β) − (5 + 2β) or 3 + 5α = 7 or α = 0.8.
Now from $\overset{2}{=}$, this means that β = −1.6. These do not work if we try plugging them into
$\overset{1}{=}$. Hence, there are no reals α and β that solve the above system of equations. In other
words, the two lines do not intersect.
And so the two lines are not coplanar — or equivalently, they are skew.
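Fact 39 can also be checked without solving simultaneous equations. The Python sketch below swaps in a different but equivalent test, the scalar triple product: two non-parallel lines r = p1 + λv1 and r = p2 + λv2 are coplanar exactly when (p2 − p1) ⋅ (v1 × v2) = 0. The function name is hypothetical, and the skew pair in the usage lines is an assumption inferred from the three equations in the example above.

```python
def sub(u, v):
    """Component-wise difference u - v."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    """Scalar (dot) product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def classify_lines(p1, v1, p2, v2):
    """Classify two 3D lines as 'parallel', 'intersecting', or 'skew'.

    Exact comparisons with zero are used, so integer (or otherwise
    exact) inputs are assumed.
    """
    n = cross(v1, v2)
    if n == (0, 0, 0):
        return "parallel"      # direction vectors are scalar multiples
    if dot(sub(p2, p1), n) == 0:
        return "intersecting"  # not parallel but coplanar
    return "skew"


case_340 = classify_lines((8, 1, 1), (3, 6, 9), (4, 5, 6), (1, 2, 3))    # 'parallel'
case_341 = classify_lines((0, 0, 0), (0, 1, 0), (4, 17, 0), (1, 0, 0))   # 'intersecting'
# Assumed lines behind the three equations above (inferred, not stated in the text):
case_skew = classify_lines((0, 1, 2), (9, 1, 3), (4, 5, 6), (3, 2, 1))   # 'skew'
```

Unlike the method in the text, this test does not produce the intersection point itself; for that, the simultaneous equations still have to be solved.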
Exercise 136. Determine whether each of the following pairs of lines is coplanar or skew.
If they are coplanar, find the plane that contains both of them. (Answer on p. 1093.)
(a) r = (8, 1, 5) + λ(3, 2, 1) and r = (1, 2, 3) + λ(5, 6, 7) (λ ∈ R).
Definition 88. A line with direction vector v and a plane with normal vector n are parallel
if v ⋅ n = 0 (i.e. v and n are perpendicular).
The above definition makes sense, because if the line is perpendicular to the plane’s normal
vector, then the line must be parallel to the plane itself.
Fact 40. Given a plane and a line, there are three possible cases (illustrated below):
1. The line and plane are parallel and do not intersect at all.
2. The line and plane are parallel and the line lies completely on the plane.
3. The line and plane are not parallel and intersect at exactly one point.
[Figure: a plane together with Line 1, Line 2, and Line 3, illustrating the three cases.]
Note that if a line and a plane are parallel, then either (i) they do not intersect at all; or
(ii) the line lies completely on the plane.
• So if a line and a plane are parallel and you can prove that they share at least one
intersection point, then it must be that the line lies completely on the plane.
• Conversely, if a line and a plane are parallel and you can prove that there is at least one
point on the line that is not on the plane (or that there is at least one point on the plane
that is not on the line), then it must be that they do not intersect at all.
Plug in a generic point of the line into the equation for the plane:
$$[(3, 5, 5) + \lambda(9, 1, 3)] \cdot (1, 1, 1) = 3 \implies 3 + 9\lambda + 5 + \lambda + 5 + 3\lambda = 3 \implies 13 + 13\lambda = 3 \implies \lambda = \frac{-10}{13}.$$
So the intersection point is $(3, 5, 5) - \frac{10}{13}(9, 1, 3)$.
Example 344. Consider the line r = (3, 5, 5) + λ(9, 1, 3) (λ ∈ R) and the plane r ⋅ (1, 0, −3) =
−6. We have (9, 1, 3) ⋅ (1, 0, −3) = 0 and so they are parallel.
There are two possibilities. Either they do not intersect at all OR the line lies completely
on the plane.
Since (3, 5, 5) ⋅ (1, 0, −3) = −12 ≠ −6, the point (3, 5, 5) is on the line but is not on the plane.
Hence, the line and the plane do not intersect at all.
Example 345. Consider the line r = (3, 5, 3) + λ(9, 1, 3) (λ ∈ R) and the plane r ⋅ (1, 0, −3) =
−6. We have (9, 1, 3) ⋅ (1, 0, −3) = 0 and so they are parallel.
There are two possibilities. Either they do not intersect at all OR the line lies completely
on the plane.
Since (3, 5, 3) ⋅ (1, 0, −3) = −6, the point (3, 5, 3) on the line is also on the plane.
Since they are parallel and share at least one intersection point, it must be that the line
lies completely on the plane.
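The three cases of Fact 40 can be decided with two scalar products. Here is a minimal Python sketch under the assumption of integer inputs (so that `fractions.Fraction` gives an exact intersection point); the function name and the returned labels are illustrative.

```python
from fractions import Fraction


def dot(u, v):
    """Scalar (dot) product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def classify_line_plane(p, v, n, d):
    """Relationship between the line r = p + lambda*v and the plane r . n = d.

    Returns (label, point); point is the intersection point in the
    non-parallel case and None otherwise. Integer inputs are assumed.
    """
    if dot(v, n) != 0:
        # Plug a generic point of the line into the plane's equation
        lam = Fraction(d - dot(p, n), dot(v, n))
        point = tuple(pi + lam * vi for pi, vi in zip(p, v))
        return "intersect at one point", point
    if dot(p, n) == d:
        return "line lies on plane", None
    return "parallel, no intersection", None


case_344 = classify_line_plane((3, 5, 5), (9, 1, 3), (1, 0, -3), -6)
case_345 = classify_line_plane((3, 5, 3), (9, 1, 3), (1, 0, -3), -6)
```

In the parallel case it is enough to test the single point p, exactly as argued above: a parallel line either has every point on the plane or none.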
Exercise 137. For each of the following, determine whether the given line and plane are
(i) parallel but do not intersect; (ii) parallel with the line lying completely on the plane; or
(iii) intersect at exactly one point. (Answer on p. 1094.)
(a) r = (4, 5, 6) + λ(2, 3, 5) (λ ∈ R) and r ⋅ (−10, 0, 4) = −26.
Definition 89. Two planes are parallel if their normal vectors can be written as scalar
multiples of each other.
(Note that an alternative definition is this: “Two planes are parallel if they do not intersect.”
We will show that these two definitions are equivalent.)
Imagine that two planes intersect at some line, which we’ll call the intersection line.
Since this intersection line is on both planes, it must also be perpendicular to the normal
vectors of both planes. In other words, it must have direction vector n1 × n2 . The next
fact is thus not surprising (although actually proving it takes a little work).
Fact 41. Two non-parallel planes with normal vectors n1 and n2 intersect at all if and only
if they intersect along a line with direction vector n1 × n2 (i.e. the line is perpendicular to
both n1 and n2 ).
Fact 42. Given two planes, there are three possible cases:
1. The two planes are parallel and exactly identical.
2. The two planes are parallel and do not intersect at all.
3. The two planes are not parallel and share an intersection line with direction vector
n1 × n2 (where n1 , n2 are the normal vectors of the two planes).
[Figure: planes P1, P2, and P3, with normal vectors n2 and n3 shown; the intersection line of P2 and P3 has direction vector n2 × n3.]
Note that analogous to our study of two lines, if two planes are parallel, then either (i) they
do not intersect at all; or (ii) they are identical.
• So if two planes are parallel and you can prove that they share at least one intersection
point, then it must be that the two planes are identical.
• Conversely, if two planes are parallel and you can prove that there is at least one point
on one plane that is not on the other plane, then it must be that they do not intersect
at all.
And in the case where they are not parallel, to find the intersection line, simply find a point
p where the two planes intersect. Then the intersection line is simply
r = p + λ(n1 × n2 ), λ ∈ R.
$$7x + y + z \overset{1}{=} 42, \quad x + y + 2z \overset{2}{=} 6.$$
There are infinitely many points where the two planes intersect. So why don’t we look for an
intersection point where x = 0? I’ll call this the “plug in x = 0” trick.
In which case $\overset{2}{=}$ minus $\overset{1}{=}$ yields z = −36 and y = 78. Hence, the intersection line is r =
(0, 78, −36) + λ(1, −13, 6) (λ ∈ R).
Example 348. Consider the planes r ⋅ (1, 1, 1) = 12 and r ⋅ (−1, −1, 0) = −1.
Clearly, (1, 1, 1) cannot be written as a scalar multiple of (−1, −1, 0). So the two planes are
not parallel and share an intersection line whose direction vector is (1, 1, 1) × (−1, −1, 0) =
(1, −1, 0).
$$x + y + z \overset{1}{=} 12, \quad -x - y \overset{2}{=} -1.$$
Again, we can play the “plug in x = 0” trick. In which case $\overset{2}{=}$ says that y = 1 and now from
$\overset{1}{=}$, we have z = 11. And so (0, 1, 11) is an intersection point of the two planes. Hence, the
intersection line is r = (0, 1, 11) + λ(1, −1, 0) (λ ∈ R).
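The "plug in x = 0" trick can be automated. The sketch below is one possible Python version with a hypothetical function name. As a design choice of this sketch (not something prescribed by the text), it sets to zero the coordinate in which the direction vector n1 × n2 has the largest magnitude; that component is non-zero whenever the planes are not parallel, which guarantees that the remaining two equations in two unknowns have a unique solution. Integer inputs are assumed so that the point is exact.

```python
from fractions import Fraction


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def intersection_line(n1, d1, n2, d2):
    """Intersection line of the planes r . n1 = d1 and r . n2 = d2.

    Returns (point, direction), or None if the planes are parallel.
    """
    direction = cross(n1, n2)
    if direction == (0, 0, 0):
        return None  # parallel planes: identical or disjoint, no single line
    # Coordinate k is set to zero; coordinates i and j are solved for.
    k = max(range(3), key=lambda m: abs(direction[m]))
    i, j = [m for m in range(3) if m != k]
    # 2x2 system solved by Cramer's rule; det equals +/- direction[k], so it is non-zero
    det = n1[i] * n2[j] - n1[j] * n2[i]
    point = [Fraction(0)] * 3
    point[i] = Fraction(d1 * n2[j] - n1[j] * d2, det)
    point[j] = Fraction(n1[i] * d2 - d1 * n2[i], det)
    return tuple(point), direction


# Example 348: planes r . (1, 1, 1) = 12 and r . (-1, -1, 0) = -1
point, direction = intersection_line((1, 1, 1), 12, (-1, -1, 0), -1)
```

The point returned may differ from the one found by hand if a different coordinate is set to zero, but it lies on the same line; any point on the line serves equally well as p in r = p + λ(n1 × n2).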
$$y + 3z \overset{1}{=} 0, \quad -x + y + 3z \overset{2}{=} 2.$$
Here the direction vector of the intersection line has x-coordinate 0. So the “plug in x = 0”
trick might not work. And indeed it doesn’t, because if we plug in x = 0, then $\overset{1}{=}$ and $\overset{2}{=}$ are
contradictory.
So let’s try the “plug in y = 0” trick instead, which I know will work because the
y-coordinate of the direction vector of the intersection line is non-zero (it’s −3). Then from
$\overset{1}{=}$ we have z = 0 and now from $\overset{2}{=}$ we have x = −2. And so (−2, 0, 0) is an intersection point
of the two planes. Hence, the intersection line is
r = (−2, 0, 0) + λ(0, −3, 1) (λ ∈ R).
Alternatively, we could also have used the “plug in z = 0” trick instead, which again I
know will work because the z-coordinate of the direction vector of the intersection line is
non-zero (it’s 1). Then from $\overset{1}{=}$ we have y = 0 and now from $\overset{2}{=}$ we have x = −2. And so again
we find that (−2, 0, 0) is an intersection point of the two planes. And so again we would
have concluded that the intersection line is
r = (−2, 0, 0) + λ(0, −3, 1) (λ ∈ R).
So the two planes are not identical and do not intersect at all.
Example 351. Consider the planes r ⋅ (4, 0, 3) = 32 and r ⋅ (−8, 0, −6) = −64.
Clearly, (4, 0, 3) can be written as a scalar multiple of (−8, 0, −6) and so the two planes are
parallel.
Let’s check if they are identical. The point (8, 0, 0) is on the first plane. It is also on the
second plane because:
$$(8, 0, 0) \cdot (-8, 0, -6) = -64.$$
Since the two planes are parallel and share at least one intersection point, it must be that
the two planes are exactly identical.
Exercise 138. For each of the following, determine whether the given pair of planes are
parallel and identical, parallel and do not intersect, or are not parallel. If they are not
parallel, determine also their intersection line. (Answer on p. 1095.)
(a) r ⋅ (4, 9, 3) = 61 and r ⋅ (1, 1, 2) = 19.
SYLLABUS ALERT
The relationship between three planes is included in the 9740 (old) syllabus, but not in the
9758 (revised) syllabus. So you can skip this section if you’re taking 9758.
• P1 and P2 ;
• P1 and P3 ; and
• P2 and P3 .
To find the relationship between the 3 planes is simply to find the relationships between
each of these 3 pairs of planes. This can be insanely tedious, but there is nothing new here.
Everything follows from what you learnt in the previous sections.
Let’s nonetheless give a summary of the possibilities. Given three planes, we have 3 possible
cases, each of which can be broken up into several sub-cases, for a total of 8 distinct
possibilities.
[Figures: 3 parallel, identical planes; 3 parallel, non-intersecting planes; 3 parallel planes, where P1 and P2 are identical.]
(a) The first 2 planes are identical. And so here we are really back to the situation of
two non-parallel planes, which we already covered in detail in the previous section.
They intersect along a line.
(b) The first 2 planes are not identical. And so the non-parallel plane intersects each
of the other two planes along a separate line of intersection.
(a) None of the intersection lines intersect with each other. That is, each pair of planes
simply intersects along some distinct intersection line.
(b) All 3 intersection lines are identical. So all 3 planes intersect along the same inter-
section line.
(c) The 3 intersection lines and thus all 3 planes intersect at a single point.
To determine which of the above sub-cases we’re in, we must determine the relation between
each pair of intersection lines. This is tedious, but nothing new.
Step #2. Find the 3 intersection lines along which each pair of planes intersect.
The planes P1 and P2 share an intersection line with direction vector (1, 0, 0) × (0, 1, 0) =
(0, 0, 1) and so their intersection line is r = (0, 0, 0) + λ(0, 0, 1) (λ ∈ R). Call this line l1 .
The planes P1 and P3 share an intersection line with direction vector (1, 0, 0) × (0, 0, 1) =
(0, −1, 0) and so their intersection line is r = (0, 0, 0) + λ(0, −1, 0) (λ ∈ R). Call this line l2 .
The planes P2 and P3 share an intersection line with direction vector (0, 1, 0) × (0, 0, 1) =
(1, 0, 0) and so their intersection line is r = (0, 0, 0) + λ(1, 0, 0) (λ ∈ R). Call this line l3 .
l1 and l2 are not parallel, but they do intersect at the point (0, 0, 0) and so that is also their
only intersection point.
l1 and l3 are not parallel, but they do intersect at the point (0, 0, 0) and so that is also their
only intersection point.
l2 and l3 are not parallel, but they do intersect at the point (0, 0, 0) and so that is also their
only intersection point.
Conclusion.
Altogether, we conclude that the 3 intersection lines intersect at a single point. Hence, the
3 planes also intersect at a single point. (So we are in Case 3c.)
P3 is not parallel to either of the first two planes, since (0, 1, 1) cannot be written as a
scalar multiple of (1, 1, 0) or (−2, −2, 0).
They are not, because (1, 0, 0) is on P1 but is not on P2 , as we can easily verify — (1, 0, 0) ⋅
(−2, −2, 0) = −2 ≠ −4. So we are in Case 2b.
Step #3. Find the intersection lines. There are two — one shared by P1 and
P3 and the other shared by P2 and P3 . (P1 and P2 are distinct, parallel planes
and thus do not intersect at all.)
The intersection line of P1 and P3 has direction vector (1, 1, 0) × (0, 1, 1) = (1, −1, 1). Let’s
find a point (x, y, z) at which P1 and P3 intersect: the equations for the planes are x + y = 1 (equation 1) and
y + z = 1 (equation 2).
Using the “plug in x = 0” trick, we see that they intersect at (0, 1, 0). Hence, their inter-
section line is r = (0, 1, 0) + λ(1, −1, 1) (λ ∈ R). Call this line l1 .
The intersection line of P2 and P3 must also have direction vector (1, −1, 1). Let’s find a
point (x, y, z) at which P2 and P3 intersect: the equations for the planes are −2x − 2y = −4 (equation 1) and
y + z = 1 (equation 2).
Using the “plug in x = 0” trick, we see that they intersect at (0, 2, −1). Hence, their
intersection line is r = (0, 2, −1) + λ(1, −1, 1) (λ ∈ R). Call this line l2 .
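The steps above (vector product for the direction, then the “plug in x = 0” trick for a point) can be sketched in Python. This is only an illustration: the helper names `cross` and `point_with_x_zero` are made up here, and each plane is assumed to be given as a normal vector with integer entries together with the constant on the right-hand side.

```python
from fractions import Fraction

def cross(u, v):
    """Vector product of two 3-vectors."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def point_with_x_zero(n1, d1, n2, d2):
    """Solve n1.r = d1 and n2.r = d2 after setting x = 0 (Cramer's rule in y and z)."""
    det = n1[1]*n2[2] - n1[2]*n2[1]   # equals the x-coordinate of cross(n1, n2)
    if det == 0:
        raise ValueError("the 'plug in x = 0' trick does not work for this pair of planes")
    y = Fraction(d1*n2[2] - n1[2]*d2, det)
    z = Fraction(n1[1]*d2 - d1*n2[1], det)
    return (Fraction(0), y, z)

# The three planes of the example: x + y = 1, -2x - 2y = -4, y + z = 1.
P1 = ((1, 1, 0), 1)
P2 = ((-2, -2, 0), -4)
P3 = ((0, 1, 1), 1)

direction = cross(P1[0], P3[0])
point_l1 = point_with_x_zero(P1[0], P1[1], P3[0], P3[1])
point_l2 = point_with_x_zero(P2[0], P2[1], P3[0], P3[1])
print(direction, point_l1, point_l2)
```

The determinant check mirrors the remark that the trick works only when the x-coordinate of the direction vector is non-zero.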
Exercise 139. What is the relationship between the 3 planes P1 , P2 , and P3 , given by
r ⋅ (1, 0, 1) = 1, r ⋅ (0, 1, −1) = −1, and r ⋅ (1, 1, 0) = 2? (Answer on p. 1096.)
Complex Numbers
We’ll start by defining the imaginary unit, then work our way to complex numbers.
Definition 90. The imaginary unit, denoted i, is a number that satisfies i² = −1.
Using the imaginary unit, we can construct other purely imaginary numbers:
Definition 91. A purely imaginary number is any real, non-zero multiple of the imaginary
unit. That is, a purely imaginary number is any bi, where b ∈ R with b ≠ 0.
(We specify that b ≠ 0 because 0i = 0 is not a purely imaginary number, but a real number.)
Example 354. i + i = 2i = 2√−1 is purely imaginary. So too are −i = −√−1 and πi = π√−1.
i is both the imaginary unit and a purely imaginary number.
We can add real numbers to purely imaginary numbers to form imaginary numbers:
Notice that here in contrast, we do not specify that b ≠ 0. The reason is that complex
numbers include all real numbers.
Example 356. 10 and 17 are complex and real. 2 + 9i and 3 − 2i are complex and imaginary.
2i is complex, imaginary, and purely imaginary. i is complex, imaginary, purely imaginary,
and also the imaginary unit.
We denoted the set of real numbers by the symbol R. We now denote the set of complex
numbers by the symbol C.
Definition 94. The set of all complex numbers, denoted C, is defined as {a + bi∣a, b ∈ R}.
The set of reals is a proper subset of the set of complex numbers — formally,
Fact 43. R ⊂ C.
Complex numbers are thus the extension of the concept of real numbers. On the next page
is a modified version of our taxonomy of numbers from p. 41, with the complex numbers
fleshed out:
Example 357. 1, −1, i, and −i are all complex numbers. We do say that 1 is a positive
real number and −1 is a negative real number.
But we do not say that i is a positive complex number or that −i is a negative complex
number.
In fact, we do not even say that 1 is a positive complex number or that −1 is a negative
complex number.
Exercise 140. Fill in the following table. The first column has been done for you. (Answer
on p. 1097.)
√ √
Is this ... 13 − 2i 3i 0 4 4 + 2i i 3
A complex number? Yes
A real number? No
An imaginary number? Yes
A purely imaginary number? No
The imaginary unit? No
Definition 95. Given a complex number z = a + bi, its real part is a and is denoted Re(z).
Similarly, its imaginary part is b and is denoted Im(z).
It is also often convenient to write complex numbers in ordered pair notation, with the first
term being the real part and the second term being the imaginary.
Of course, two complex numbers z and w are equal if and only if (i) their real parts are
equal; AND (ii) their imaginary parts are equal.
Example 364. Suppose z = 3 + bi and w = a − 17i are equal. Then it must be that a = 3
and b = −17.
Exercise 141. Exactly two of the following complex numbers are identical. Find out which
two. (Answer on p. 1097.)
√ √
1 2 3 1 π π 3 π
a= √ − i, b = √ − √ i, c = sin − sin i, d= − cos (− ) i.
2 2 2 2 3 3 2 4
The familiar arithmetic operations work the same way on imaginary numbers as they do
on real numbers. Addition and subtraction are especially simple.
In general,
z + w = (a + c, b + d) and z − w = (a − c, b − d).
Exercise 142. For each of the following, compute z + w and z − w. (Answer on p. 1098.)
√
(a) z = −5 + 2i, w = 7 + 3i. (b) z = 3 − i, w = 11 + 2i. (c) z = 1 + 2i, w = 3 − 2i.
Below are listed the powers of i. Note that the cycle repeats after every fourth power,
because i⁴ = 1.
i¹ = i, i² = i × i = −1, i³ = i × i² = −i, i⁴ = i × i³ = 1,
i⁵ = i × i⁴ = i, i⁶ = i × i⁵ = −1, i⁷ = i × i⁶ = −i, i⁸ = i × i⁷ = 1,
etc.
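The four-step cycle can be expressed as a table lookup. The sketch below is only an illustration (the function name `power_of_i` is made up here); it uses Python's built-in complex numbers, where `1j` plays the role of i.

```python
def power_of_i(n):
    """Return i**n for an integer n, using the cycle of length 4."""
    cycle = [1, 1j, -1, -1j]   # i^0, i^1, i^2, i^3
    return cycle[n % 4]

# The first eight powers of i, as listed above.
print([power_of_i(n) for n in range(1, 9)])
```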
zw = (2 − i)(−1 + i) = −2 + 2i + i − i²
= −2 + 3i − i² = −2 + 3i − (−1) = −1 + 3i.
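The same expansion can be done in ordered pair notation. The sketch below assumes the standard rule (a, b)(c, d) = (ac − bd, ad + bc), which follows from expanding (a + bi)(c + di) with i² = −1; the function name `multiply` is made up for illustration.

```python
def multiply(z, w):
    """Multiply z = (a, b) and w = (c, d), i.e. (a + bi)(c + di)."""
    a, b = z
    c, d = w
    return (a*c - b*d, a*d + b*c)

product = multiply((2, -1), (-1, 1))     # (2 - i)(-1 + i)
builtin = (2 - 1j) * (-1 + 1j)           # same product with Python's complex type
print(product, builtin)
```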
Exercise 144. For each of the following, compute zw. (Answer on p. 1098.)
√
(a) z = −5 + 2i, w = 7 + 3i. (b) z = 3 − i, w = 11 + 2i. (c) z = 1 + 2i, w = 3 − 2i.
Recall that to rationalise a surd in the denominator (section 5.2), we used a trick involving
conjugate pairs.
Example 371. 3/(1 + √5) = 3/(1 + √5) × (1 − √5)/(1 − √5) = 3(1 − √5)/(1 − 5) = 3(√5 − 1)/4.
This is a realisation (“make real”) that helps get rid of any complex numbers. Example:
(b) 1/i = (1/i) × (−i)/(−i) = −i/(−i²) = −i/1 = −i.
In general,
Exercise 146. For each of the following z, write down its conjugate z ∗ and hence compute
its reciprocal (i.e. 1/z). (a) z = −5 + 2i. (b) z = 3 − i. (c) z = 1 + 2i. (Answer on p. 1098.)
Example 373.
(a) (−2 + i)/(3i) = (−2 + i)/(3i) × (−3i)/(−3i) = (6i − 3i²)/(−9i²) = (6i + 3)/9.
(b) (3 + i)/(1 − i) = (3 + i)/(1 − i) × (1 + i)/(1 + i) = (3 + i)(1 + i)/(1² − i²) = (3 + 3i + i + i²)/(1 + 1) = (2 + 4i)/2 = 1 + 2i.
(c) (1 + i)/(3 − 2i) = (1 + i)/(3 − 2i) × (3 + 2i)/(3 + 2i) = (3 + 2i + 3i + 2i²)/(9 + 4) = (1 + 5i)/13.
(d) (2 − i)/(−1 + i) = (2 − i)/(−1 + i) × (−1 − i)/(−1 − i) = (−2 − 2i + i + i²)/(1 + 1) = (−3 − i)/2 = −1.5 − 0.5i.
In general,
z/w = (z/w) × (w∗/w∗) = zw∗/(c² + d²) = ((ac + bd)/(c² + d²), (bc − ad)/(c² + d²)).
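The general formula translates directly into code. The sketch below is an illustration only: the function name `divide` is made up, z = (a, b) and w = (c, d) are assumed to have integer parts, and `fractions.Fraction` is used so that the results stay exact.

```python
from fractions import Fraction

def divide(z, w):
    """Return z/w as (real part, imaginary part), for z = (a, b) and w = (c, d)."""
    a, b = z
    c, d = w
    denom = c*c + d*d            # w times its conjugate
    if denom == 0:
        raise ZeroDivisionError("w must be non-zero")
    return (Fraction(a*c + b*d, denom), Fraction(b*c - a*d, denom))

print(divide((3, 1), (1, -1)))    # (3 + i)/(1 - i)
print(divide((1, 1), (3, -2)))    # (1 + i)/(3 - 2i)
print(divide((2, -1), (-1, 1)))   # (2 - i)/(-1 + i)
```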
Exercise 147. Rewrite each of the following fractions into the form a + bi. (Answer on p.
1099.)
√
1 + 3i 2 − 3i 2 − πi 11 + 2i −3 7 − 2i
(a) . (b) . (c) √ . (d) . (e) . (f) .
−i 1+i 3 − 2i i 2+i 5+i
x = (−b ± √(b² − 4ac))/(2a).
x = (−b ± √(b² − 4ac))/(2a) = (3 ± √1)/2 = 1, 2.
Now, armed with our new concept of imaginary numbers, we can completely dispense with
the requirement that b² − 4ac ≥ 0. We can simply say that ax² + bx + c = 0 ALWAYS has
complex roots, given by
x = (−b ± √(b² − 4ac))/(2a).
Example 375. Consider the equation x² − 2x + 2 = 0. Its discriminant is negative: b² − 4ac =
(−2)² − 4(1)(2) = −4 < 0. It has two imaginary (and thus also complex) roots, given by
x = (−b ± √(b² − 4ac))/(2a) = (2 ± √−4)/2 = 1 ± (√4 × √−1)/2 = 1 ± 2i/2 = 1 ± i.
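The quadratic formula with a possibly negative discriminant can be sketched using Python's standard `cmath` module, whose `sqrt` returns a complex square root. The function name `quadratic_roots` is made up for illustration, and a ≠ 0 is assumed.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a*x**2 + b*x + c = 0 (a != 0), allowing complex roots."""
    root = cmath.sqrt(b*b - 4*a*c)   # complex square root of the discriminant
    return ((-b + root) / (2*a), (-b - root) / (2*a))

r1, r2 = quadratic_roots(1, -2, 2)   # x^2 - 2x + 2 = 0
print(r1, r2)
```

For real coefficients and a negative discriminant, the two values returned are conjugates of each other.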
Notice that 1 + i is a root of the given quadratic equation. And interestingly enough, so
too is 1 − i.
It turns out that in general, a quadratic equation with real coefficients has roots that come
in conjugate pairs. That is, if x + yi is a root, then so too is its conjugate x − yi.42 More
examples:
42
This is not terribly surprising if you examine the general solution for the quadratic equation: the ±√(b² − 4ac) bit corresponds precisely to the imaginary part.
x = (−b ± √(b² − 4ac))/(2a) = (−1 ± √−11)/6 = −1/6 ± (√11 × √−1)/6 = −1/6 ± (√11/6)i.
Exercise 148. Find the roots for each of the following quadratic equations. (Answer on
p. 1100.)
Recall from p. 187 that a polynomial of degree n in one variable is any expression
a₀xⁿ + a₁xⁿ⁻¹ + a₂xⁿ⁻² + ⋯ + aₙ₋₁x + aₙ where each aᵢ is a constant and x is the variable.
Proof. The proof of this theorem is way too advanced and so omitted from this book.43
There are sometimes “repeated solutions” or what are more formally called multiple roots,
as the next example illustrates.
Example 381. x³ − 6x² + 12x − 8 = 0 has three (repeated) solutions, namely 2, 2, and 2.
We call 2 a multiple root (indeed a triple root).
43
But see this MathOverflow Q&A if you’re interested.
Example 382. x¹⁷ + 3x⁴ − 2x + 1 is a polynomial of degree 17. I may not know what the
solutions to x¹⁷ + 3x⁴ − 2x + 1 = 0 are, but I know from the Fundamental Theorem of Algebra
that there MUST be 17 solutions (though some may possibly be repeated).
            x² + x − 6
x² + 1 ) x⁴ + x³ − 5x² + x − 6
         x⁴ + 0  + x²
              x³ − 6x²
              x³ + 0   + x
                 − 6x² + 0 − 6
                 − 6x² + 0 − 6
                             0.
Altogether, the four zeros of the given polynomial are ±i, 2, and −3.
           x² + 2x + 5
x − 5 ) x³ − 3x² − 5x − 25
        x³ − 5x²
             2x²
             2x² − 10x
                   5x − 25
                   5x − 25
                         0.
So x³ − 3x² − 5x − 25 = (x − 5)(x² + 2x + 5). I’m unable to easily see how x² + 2x + 5 can be
factorised. So let me just use the quadratic formula:
x = (−2 ± √(2² − 4(1)(5)))/2 = −1 ± √(1 − 5) = −1 ± √−4 = −1 ± 2i.
Altogether then, the three zeros of the given polynomial are 5 and −1 ± 2i.
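A quick way to double-check a list of zeros is to substitute each one back into the polynomial. The sketch below does this with Horner's method (a standard evaluation scheme, not something used in the text); the function name `evaluate` is made up for illustration, and coefficients are listed from the highest power down to the constant.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial at x by Horner's method (highest power first)."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

cubic = [1, -3, -5, -25]        # x^3 - 3x^2 - 5x - 25
quartic = [1, 1, -5, 1, -6]     # x^4 + x^3 - 5x^2 + x - 6

for zero in (5, -1 + 2j, -1 - 2j):
    print(zero, evaluate(cubic, zero))
for zero in (1j, -1j, 2, -3):
    print(zero, evaluate(quartic, zero))
```

Every printed value should be zero (possibly shown as a complex zero).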
Exercise 150. Each of the following polynomials has 1 as a zero. Find the other zeros.
(Answer on p. 1101.)
(a) x³ + x² − 2. (b) x⁴ − x² − 2x + 2.
Example 386. If told that i solves 4x⁴ + 5x² + 1 = 0, we know immediately that its conjugate
−i also solves the same equation. Similarly, if told also that 0.5i solves the same equation,
we know immediately that its conjugate −0.5i also solves the same equation.
The condition that all coefficients ak are real is important. The above theorem
does not apply if any of the coefficients are imaginary.
Example 388. √2/2 + i√2/2 solves x² = i.
However, its conjugate √2/2 − i√2/2 does not (as you should verify yourself).
Exercise 151. Each of the following polynomials has 2 − 3i as a zero. Find the other zeros.
(Answer on p. 1102.)
(a) x⁴ − 6x³ + 18x² − 14x − 39. (b) −2x⁴ + 21x³ − 93x² + 229x − 195.
The complex plane (or Argand diagram) gives us a nice geometric interpretation: The
complex numbers are simply points on the plane. The real axis is the horizontal
or x-axis. The imaginary axis is the vertical or y-axis.
Example 390. In the figure below, marked in red are the real numbers −3, 0, π, and 2,
which may be written in ordered pair notation as (−3, 0), (0, 0), (π, 0), (2, 0). Points on
the horizontal axis are real numbers.
In blue are the purely imaginary numbers −4i and 3i, which may be written in ordered pair
notation as (0, −4) and (0, 3). Points on the vertical axis are purely imaginary numbers.
In green are the “impure” imaginary numbers 1 + i, −3 + 2i, 1 − 3i, and −4 − i, which may
be written in ordered pair notation as (1, 1), (−3, 2), (1, −3), and (−4, −1). Points not on
either axis are “impure” imaginary numbers.
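[Figure: Argand diagram with horizontal axis x and vertical axis y, each running from −5 to 5, showing the red, blue, and green points described above.]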
Exercise 152. Illustrate the complex numbers 1, −3, 2i, 1 + 2i, and −1 − 3i on a single
Argand diagram. (Answer on p. 1103.)
44
The differences between C and R² in fact run deeper. See e.g. this discussion.
To write a complex number in standard form — i.e. z = x + iy, we need only two pieces
of information: its real part (x) and its imaginary part (y).
We now write a complex number in polar form. Again, we need only two pieces of
information: the modulus, denoted ∣z∣, and the argument, denoted arg z. Informally, the
modulus is the length of the position vector of z; the argument is the angle the position
vector of z makes with the positive x-axis.
Example 391. The complex number −3 = (−3, 0) has modulus ∣−3∣ = 3 and argument
arg(−3) = π. The complex number −4i = (0, −4) has modulus ∣−4i∣ = 4 and argument
arg(−4i) = −π/2. The complex number 3 + 3i = (3, 3) has modulus ∣3 + 3i∣ = √(3² + 3²) = 3√2 and
argument arg(3 + 3i) = π/4.
[Figure: Argand diagram with horizontal axis x and vertical axis y, showing the points −3 = (−3, 0) and 3 + 3i = (3, 3).]
Definition 96. The modulus function has domain C, codomain R, and mapping rule z ↦
√(x² + y²). The modulus of z is denoted ∣z∣.
In contrast, it is tricky to write down a formal definition of the argument function. One
problem is this: Angles are periodic.
Example 392. Consider again the complex number 3 + 3i = (3, 3). The angle it makes
with the positive x-axis is π/4.
But angles are periodic. Equivalently, angles come full circle every 2π radians. So it would make
just as much sense to say that the angle is 9π/4. Or 17π/4. Or −7π/4. Or indeed any
π/4 + 2kπ, where k is any integer.
To overcome this problem, we shall somewhat arbitrarily choose (−π, π] as our principal
values. Thus, arg(3 + 3i) shall be uniquely defined to be the value π/4 and nothing else.
Another problem is this: We are tempted to simply define arg(x + yi) = tan⁻¹(y/x).
Unfortunately, the tan⁻¹ function has range (−π/2, π/2), whereas, as we just decided,
arg should have codomain (−π, π]. To overcome this, altogether, the argument function is
defined as follows:
Definition 97. The argument function has domain C, codomain (−π, π], and mapping
rule as given below:

arg z =
    tan⁻¹(y/x),        if x > 0 (top-right and bottom-right quadrants),
    Undefined,         if x = 0 = y (the origin),
    π/2,               if x = 0, y > 0 (the positive y-axis),
    −π/2,              if x = 0, y < 0 (the negative y-axis),
    tan⁻¹(y/x) + π,    if x < 0, y ≥ 0 (top-left quadrant, including the negative x-axis),
    tan⁻¹(y/x) − π,    if x < 0, y < 0 (bottom-left quadrant).

(My mnemonic for the above: “arg z = tan⁻¹(y/x). Top left +π. Bottom left −π.”)
We now illustrate and explain the above definition:
[Figure: the position vectors of a green, a red, and a blue point (x, y), with arg z marked.]
• If x > 0 (top-right and bottom-right quadrants), then define arg(x + yi) = tan⁻¹(y/x).
The green point in the figure above illustrates. The angle that the position vector of the
green (x, y) makes with the positive x-axis is indeed simply tan⁻¹(y/x).
• If x = 0, y = 0 (the origin), then arg(x + yi) is undefined. In other words, we leave arg 0
undefined.45
• If x = 0, y > 0 (positive vertical axis), then define arg(x + yi) = arg(yi) = π/2.
• If x = 0, y < 0 (negative vertical axis), then define arg(x + yi) = arg(yi) = −π/2.
• If x < 0, y ≥ 0 (top-left quadrant plus the negative horizontal axis), then define
arg(x + yi) = tan⁻¹(y/x) + π.
The red point illustrates. The angle its position vector makes with the negative x-axis
is tan⁻¹(y/∣x∣). And so arg(x + yi) = π − tan⁻¹(y/∣x∣). Observe that tan⁻¹(y/∣x∣) =
tan⁻¹(y/(−x)) = −tan⁻¹(y/x). Thus, arg(x + yi) = π − tan⁻¹(y/∣x∣) = tan⁻¹(y/x) + π.
• If x < 0, y < 0 (bottom-left quadrant), then define arg(x + yi) = tan⁻¹(y/x) − π.
The blue point illustrates. The angle its position vector makes with the negative x-axis is
tan⁻¹(∣y∣/∣x∣). And so arg(x + yi) = tan⁻¹(∣y∣/∣x∣) − π. Observe that ∣y∣/∣x∣ = (−y)/(−x) =
y/x. Thus, arg(x + yi) = tan⁻¹(∣y∣/∣x∣) − π = tan⁻¹(y/x) − π.
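The six cases of the definition can be written out as a short function. This is a sketch for illustration (the name `arg` is chosen here to match the text); it is compared against the standard library's `math.atan2(y, x)`, which returns the same principal value for non-zero inputs of this kind.

```python
import math

def arg(x, y):
    """Principal argument of x + yi in (-pi, pi], following the six cases above."""
    if x > 0:
        return math.atan(y / x)
    if x == 0 and y == 0:
        raise ValueError("arg 0 is left undefined")
    if x == 0:
        return math.pi / 2 if y > 0 else -math.pi / 2
    if y >= 0:                       # x < 0: top-left quadrant or negative x-axis
        return math.atan(y / x) + math.pi
    return math.atan(y / x) - math.pi  # x < 0, y < 0: bottom-left quadrant

for x, y in [(3, 3), (-3, 0), (0, -4), (5, -2), (-4, 7), (-1, -3)]:
    print((x, y), arg(x, y), math.atan2(y, x))
```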
45
Some writers define arg 0 = 0, but we shall not do this.
Fact 48. (a) z is purely imaginary (z is on the vertical axis) ⇐⇒ arg z = ±π/2.
Exercise 154. (Answer on p. 1105.) Where on the complex plane must a complex number
be, if its argument is ...
(a) Positive? (b) Negative? (c) 0? (d) π/2? (e) −π/2? (f) > π/2? (g) < −π/2?
Fact 49. Let z be a complex number with ∣z∣ = r and arg z = θ. Then z = r (cos θ + i sin θ).
∣z∣ = √(5² + (−2)²) = √29 and arg z = tan⁻¹(−2/5) ≈ −0.381.
So in polar form, z = √29 [cos(−0.381) + i sin(−0.381)].
∣z∣ = √(1² + 3²) = √10 and arg z = tan⁻¹(3/1) ≈ 1.249.
So in polar form, z = √10 [cos(1.249) + i sin(1.249)].
∣z∣ = √((−4)² + 7²) = √65 and arg z = tan⁻¹(7/(−4)) + π ≈ 2.090.
So in polar form, z = √65 [cos(2.090) + i sin(2.090)].
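These conversions can be checked with Python's standard `cmath` module: `cmath.polar(z)` returns the pair (modulus, principal argument) and `cmath.rect(r, theta)` goes back to standard form. The wrapper name `to_polar` is made up for illustration.

```python
import cmath

def to_polar(z):
    """Return (modulus, principal argument) of the complex number z."""
    return cmath.polar(z)

for z in (5 - 2j, 1 + 3j, -4 + 7j):
    r, theta = to_polar(z)
    # Print the squared modulus and the argument rounded to 3 decimal places.
    print(z, round(r**2), round(theta, 3), cmath.rect(r, theta))
```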
Exercise 155. Rewrite each of the following complex numbers in polar form: 1, −3, 2i,
1 + 2i, and −1 − 3i. (Answer on p. 1105.)
Fact 50. Let z be a complex number with ∣z∣ = r and arg z = θ. Then z = re^(iθ).
Proof. ∣z∣ = r and arg z = θ ⟹ z = r(cos θ + i sin θ) ⟺ z = re^(iθ), where the first step
uses Fact 49 and the second step uses the Euler Formula.
Example 396. The number z = 5 − 2i = (5, −2) has modulus √(5² + (−2)²) = √29 and
argument tan⁻¹(−2/5) ≈ −0.381. Hence, we can also write z = √29 e^(i(−0.381)).
Example 397. The number z = 1 + 3i = (1, 3) has modulus √(1² + 3²) = √10 and argument
tan⁻¹(3/1) ≈ 1.249. Hence, we can also write z = √10 e^(i(1.249)).
Example 398. The number z = −4 + 7i = (−4, 7) has modulus √((−4)² + 7²) = √65 and
argument tan⁻¹(7/(−4)) + π ≈ 2.090. Hence, we can also write z = √65 e^(i(2.090)).
Exercise 156. Rewrite each of the following complex numbers in exponential form: 1, −3,
2i, 1 + 2i, and −1 − 3i. (Answer on p. 1105.)
Now that we know how to write complex numbers in polar and exponential forms, the
arithmetic of complex numbers becomes even easier.
Fact 51. Product of two complex numbers. Let z and w be complex numbers. Then
This is the complex number with modulus rs and which makes an angle θ + φ with the
positive x-axis.
Note though that θ + φ may not be in (−π, π]. Thus, rather than say that arg(zw) =
arg z + arg w, we instead say that arg (zw) = arg z + arg w + 2kπ (where k = −1, 0, 1 ensures
that arg z + arg w + 2kπ ∈ (−π, π]).
Here is an alternative quicker proof of the above fact, using the exponential form.
Proof. Let z = re^(iθ) and w = se^(iφ). Then zw = rse^(i(θ+φ)). This is the complex number with
modulus rs and which makes an angle θ + φ with the positive x-axis.
∣z∣ = √(5² + (−2)²) = √29, and arg z = tan⁻¹(−2/5),
∣w∣ = √(1² + 3²) = √10, and arg w = tan⁻¹(3/1),
⟹ ∣zw∣ = √29 × √10 = √290, and
arg(zw) = tan⁻¹(−2/5) + tan⁻¹(3/1) + 2kπ ≈ 0.869 + 2kπ = 0.869 (k = 0).
Notice that here arg z + arg w ≈ 0.869 ∈ (−π, π]. So arg z + arg w is already a principal
value and we can simply set k = 0 or arg(zw) = arg z + arg w.
So zw ≈ √290 (cos 0.869 + i sin 0.869) = √290 e^(i(0.869)).
√290 cos[tan⁻¹(−2/5) + tan⁻¹(3/1)] = 11, and √290 sin[tan⁻¹(−2/5) + tan⁻¹(3/1)] = 13.
∣z∣ = √((−4)² + 7²) = √65, and arg z = tan⁻¹[7/(−4)] + π,
∣w∣ = √(1² + (−6)²) = √37, and arg w = tan⁻¹(−6/1),
⟹ ∣zw∣ = √65 × √37 = √2405, and
arg(zw) = tan⁻¹[7/(−4)] + π + tan⁻¹(−6/1) + 2kπ ≈ 0.684 + 2kπ = 0.684 (k = 0).
Notice that here arg z + arg w ≈ 0.684 ∈ (−π, π]. So arg z + arg w is already a principal
value and we can simply set k = 0 or arg(zw) = arg z + arg w.
So zw ≈ √2405 (cos 0.684 + i sin 0.684) = √2405 e^(i(0.684)).
√2405 cos[tan⁻¹(−7/4) + π + tan⁻¹(−6/1)] = 38, and √2405 sin[tan⁻¹(−7/4) + π + tan⁻¹(−6/1)] = 31.
∣z∣ = √((−3)² + 4²) = 5, and arg z = tan⁻¹[4/(−3)] + π,
∣w∣ = √((−5)² + 2²) = √29, and arg w = tan⁻¹[2/(−5)] + π,
⟹ ∣zw∣ = 5√29, and
arg(zw) = (tan⁻¹[4/(−3)] + π) + (tan⁻¹[2/(−5)] + π) + 2kπ ≈ 4.975 + 2kπ = −1.308 (k = −1).
Notice that here arg z + arg w ≈ 4.975 ∉ (−π, π]. So we need to set k = −1 to get
arg(zw) = arg z + arg w − 2π ≈ −1.308 ∈ (−π, π], so that arg(zw) is indeed a principal
value.
So zw ≈ 5√29 [cos(−1.308) + i sin(−1.308)] = 5√29 e^(i(−1.308)).
To get zw in standard form, use a calculator. You’ll get
5√29 × cos[tan⁻¹(−4/3) + π + tan⁻¹(−2/5) + π] = 7 and 5√29 × sin[tan⁻¹(−4/3) + π + tan⁻¹(−2/5) + π] = −26.
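The rule “multiply the moduli, add the arguments, then shift by a multiple of 2π” can be sketched as follows. The helper names `principal` and `polar_product` are made up for illustration; `cmath.polar` and `cmath.rect` are from Python's standard library.

```python
import cmath
import math

def principal(angle):
    """Shift angle by a multiple of 2*pi so that it lies in (-pi, pi]."""
    while angle > math.pi:
        angle -= 2 * math.pi
    while angle <= -math.pi:
        angle += 2 * math.pi
    return angle

def polar_product(z, w):
    """Modulus and principal argument of zw, computed from those of z and w."""
    r, theta = cmath.polar(z)
    s, phi = cmath.polar(w)
    return r * s, principal(theta + phi)

for z, w in [(5 - 2j, 1 + 3j), (-4 + 7j, 1 - 6j), (-3 + 4j, -5 + 2j)]:
    modulus, argument = polar_product(z, w)
    print(round(argument, 3), cmath.rect(modulus, argument), z * w)
```

In the third pair the raw sum of arguments is about 4.975, so the first loop subtracts 2π once, which corresponds to k = −1.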
Exercise 157. Write down zw in polar and exponential forms, for each of the following
pair of z and w. (a) z = 1, w = −3. (b) z = 2i, w = 1 + 2i. (c) z = −1 − 3i, w = 3 + 4i. (Answer
on p. 1106.)
Fact 52. Ratio of two complex numbers. Let z and w be complex numbers. Then
∣z/w∣ = ∣z∣/∣w∣, and arg(z/w) = arg z − arg w + 2kπ,
= (r/s) [cos(θ − φ) + i sin(θ − φ)].
This is the complex number with modulus r/s and argument θ − φ + 2kπ (where k is the
unique integer such that θ − φ + 2kπ ∈ (−π, π]).
Here is an alternative quicker proof of the above fact, using the exponential form.
Proof. Let z = re^(iθ) and w = se^(iφ). Then z/w = (r/s)e^(i(θ−φ)). This is the complex number
with modulus r/s and argument θ − φ + 2kπ (where k is the unique integer such that
θ − φ + 2kπ ∈ (−π, π]).
∣z∣ = √29, arg z = tan⁻¹(−2/5), ∣w∣ = √10, arg w = tan⁻¹(3/1)
⟹ ∣z/w∣ = √2.9, arg(z/w) = tan⁻¹(−2/5) − tan⁻¹(3/1) + 2kπ ≈ −1.630 (k = 0)
⟹ z/w ≈ √2.9 [cos(−1.630) + i sin(−1.630)] = √2.9 e^(i(−1.630)).
∣z∣ = √65, arg z = tan⁻¹(7/(−4)) + π, ∣w∣ = √37, arg w = tan⁻¹(−6/1)
⟹ ∣z/w∣ = √(65/37), arg(z/w) = tan⁻¹(7/(−4)) + π − tan⁻¹(−6/1) + 2kπ ≈ 3.496 + 2kπ ≈ −2.788 (k = −1)
⟹ z/w ≈ √(65/37) [cos(−2.788) + i sin(−2.788)] = √(65/37) e^(i(−2.788)).
∣z∣ = 5, arg z = tan⁻¹(4/(−3)) + π, ∣w∣ = √29, arg w = tan⁻¹(2/(−5)) + π
⟹ ∣z/w∣ = 5/√29, arg(z/w) = tan⁻¹(4/(−3)) − tan⁻¹(2/(−5)) + 2kπ ≈ −0.547 + 2kπ = −0.547 (k = 0)
⟹ z/w ≈ (5/√29) [cos(−0.547) + i sin(−0.547)] = (5/√29) e^(i(−0.547)).
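For ratios, the integer k can also be computed directly rather than found by trial. The sketch below assumes the closed form k = ⌊(π − α)/(2π)⌋ for a raw angle α, which is the unique integer placing α + 2kπ in (−π, π]; the function name `polar_quotient` is made up for illustration.

```python
import cmath
import math

def polar_quotient(z, w):
    """Return (modulus of z/w, principal argument of z/w, the integer k used)."""
    r, theta = cmath.polar(z)
    s, phi = cmath.polar(w)
    raw = theta - phi
    k = math.floor((math.pi - raw) / (2 * math.pi))   # unique k with raw + 2k*pi in (-pi, pi]
    return r / s, raw + 2 * k * math.pi, k

for z, w in [(5 - 2j, 1 + 3j), (-4 + 7j, 1 - 6j), (-3 + 4j, -5 + 2j)]:
    modulus, argument, k = polar_quotient(z, w)
    print(round(argument, 3), k, cmath.rect(modulus, argument), z / w)
```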
Exercise 158. For each of the following pairs of z and w, write down z/w in polar and
exponential forms. (Answer on p. 1107.)
Fact 53 expresses the sine and cosine functions as weighted sums of the exponential func-
tions. It is not in the syllabus, but made a sudden first-time appearance on the 2015 A-level
exams (Exercise 356), just to screw students over.
Proof. By the Euler Formula, e^(iθ) = cos θ + i sin θ. Moreover, e^(−iθ) = cos(−θ) + i sin(−θ) = cos θ −
i sin θ, where the second equality uses the properties cos x = cos(−x) and sin(−x) = −sin x.
Hence, (e^(iθ) + e^(−iθ))/2 = (cos θ + i sin θ + cos θ − i sin θ)/2 = cos θ, as desired.
Similarly, (e^(iθ) − e^(−iθ))/(2i) = (cos θ + i sin θ − cos θ + i sin θ)/(2i) = sin θ, also as desired.
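The two identities just proved can be spot-checked numerically. This is only a sanity check under floating-point arithmetic, not a proof; the function names `cos_via_exp` and `sin_via_exp` are made up for illustration.

```python
import cmath
import math

def cos_via_exp(theta):
    """(e^(i*theta) + e^(-i*theta)) / 2"""
    return (cmath.exp(1j * theta) + cmath.exp(-1j * theta)) / 2

def sin_via_exp(theta):
    """(e^(i*theta) - e^(-i*theta)) / (2i)"""
    return (cmath.exp(1j * theta) - cmath.exp(-1j * theta)) / 2j

for theta in (0.0, 0.7, 2.5, -1.2):
    print(theta, cos_via_exp(theta), math.cos(theta), sin_via_exp(theta), math.sin(theta))
```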
The 2015 question was about the sum (or difference) of two complex numbers that
have the same modulus. Here’s a similar example:
We can play a similar trick to figure out the modulus and argument of z − w:
where k, m = −1, 0, 1 are to ensure that 0.5(θ + φ) + 2kπ ∈ (−π, π] and 0.5(θ + φ + π) + 2mπ ∈
(−π, π].
Exercise 159. Let z = 3e^(i(0.2π)) and w = 3e^(i(−0.9π)). By mimicking the steps in Example 405,
find z + w and z − w in exact polar and exponential forms. (Answer on p. 1108.)
SYLLABUS ALERT
If you’re taking the 9758 (revised) exam, you are done with Part IV: Complex Numbers.
The remaining chapters in Part IV cover the following, which are on the 9740 (old) syllabus
but not on the 9758 (revised) syllabus:
In secondary school, we learnt to do some geometry using cartesian equations. And in Part
III (Vectors), we learnt to do some geometry using vector equations. Now, we’ll learn to
do some geometry using complex equations!
Given two complex numbers z = x + iy and w = a + ib, their sum is simply the complex
number z + w = (x + a) + (y + b)i.
We already know how to interpret z = (x, y) and w = (a, b) as points on the plane. This
gives us a nice geometric interpretation: z +w = (x+a, y +b) is likewise a point on the plane.
We can also interpret z = (x, y) and w = (a, b) as position vectors. And thus as usual, the
sum of two vectors is itself a vector: z + w = (x + a, y + b).
[Figures: the position vectors z = (x, y) and w = (a, b), together with their sum z + w = (x + a, y + b) and their difference z − w = (x − a, y − b).]
Note that in general, ∣z + w∣ ≠ ∣z∣ + ∣w∣ and ∣z − w∣ ≠ ∣z∣ − ∣w∣. This is perhaps obvious
from the above figures, bearing in mind Corollary 3 (the sum of the lengths of any
two sides of a triangle is always greater than the length of the third side).
With sums and differences, there was an exact analogy to vectors. In contrast, with products
and ratios of complex numbers, there is no analogy to vectors. In particular, the
product of two complex numbers has nothing to do with the scalar product or vector
product of their position vectors.
Nonetheless, we do have nice geometric interpretations. We already know from Fact 51
that the product of two complex numbers z and w is simply the complex number zw with
[Figures: the points z = (a, b) and w = (c, d) on the complex plane.]
This is the complex number with modulus r and angle −θ with the positive x-axis.
[Figure: z = (x, y) and its conjugate z* = (x, −y).]
A locus (plural: loci) is a set of points that satisfy some condition (or conditions). We’ve
actually already encountered plenty of loci in Part I (Functions and Graphs), so this is
nothing new. This chapter reviews loci involving cartesian equations (and inequalities).
The goal is to prepare you for the next chapter, where we look at loci involving complex
equations (and inequalities).
42.1 Circles
Example 406. {(x, y) ∶ x² + y² = 1} is the set of all points (x, y) in the cartesian plane that
satisfy the condition x² + y² = 1. Graphically, this locus describes the unit circle
centred on the origin. (To be clear, it includes only the circumference of the circle.)
[Figures: the loci {(x, y): x² + y² = 1}, {(x, y): x² + y² ≤ 1}, {(x, y): x² + y² < 1}, {(x, y): x² + y² ≥ 1}, and {(x, y): y = x}.]
Example 413. Graphically, the locus {(x, y) ∶ y ≥ x} describes the set of all points above
the line y = x, including the line itself. Again, this is a closed half-plane.
Graphically, the locus {(x, y) ∶ y > x} describes the set of all points above the line y = x,
but excluding the line itself. Again, this is an open half-plane.
Example 414. Let (a, b) and (c, d) be points. The locus of points that are equidistant to
(a, b) and (c, d) is the line illustrated below. This is because if you pick any point (e.g. P )
on the line, it is indeed equidistant to (a, b) and (c, d). And if you pick any point (e.g. Q)
not on the line, it must be either closer to (a, b) or closer to (c, d) — in this case, Q is
closer to (a, b) than to (c, d).
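Setting the squared distances to (a, b) and (c, d) equal and expanding gives the linear equation 2(c − a)x + 2(d − b)y = c² + d² − (a² + b²). The sketch below computes these coefficients; the function name `equidistant_line` and the sample points (0, 0) and (2, 4) are made up for illustration.

```python
def equidistant_line(p, q):
    """Return (A, B, C) such that A*x + B*y = C is the set of points equidistant to p and q."""
    a, b = p
    c, d = q
    if p == q:
        raise ValueError("the two points must be distinct")
    return (2 * (c - a), 2 * (d - b), c*c + d*d - (a*a + b*b))

def squared_distance(p, q):
    return (p[0] - q[0])**2 + (p[1] - q[1])**2

A, B, C = equidistant_line((0, 0), (2, 4))
print(A, B, C)

# A point on the line, such as (3, 1), is equally far from both points.
print(A*3 + B*1 == C, squared_distance((3, 1), (0, 0)), squared_distance((3, 1), (2, 4)))
```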
[Figure: the points (a, b) and (c, d), the line of points equidistant to them, and the points P and Q.]
Exercise 162. (a) Find the cartesian equation of the line that is equidistant to the points
(1, 4) and (−5, 0).
(b) Describe in words the set {(x, y) ∶ ∣(x − 17, y − 3)∣ = ∣(x + 2, y + 11)∣}. Then rewrite the
cartesian equation ∣(x − 17, y − 3)∣ = ∣(x + 2, y + 11)∣ into the form ay + bx + c = 0. (Answer
on p. 1113.)
46
If we’d like, we can further simplify this equation. If d − b ≠ 0, then it can be rewritten as
y = [(a − c)/(d − b)] x + [c² + d² − (a² + b²)]/[2(d − b)].
If instead d − b = 0, then it can be rewritten as
x = [c² + d² − (a² + b²)]/[2(c − a)].
{(x, y) ∶ x² + y² = 1, y = x} = {(−√2/2, −√2/2), (√2/2, √2/2)}.
[Figures: the line {(x, y): y = x} with the circle {(x, y): x² + y² = 1} and their intersection {(x, y): y = x, x² + y² = 1}; the line {(x, y): y = x} with the disc {(x, y): x² + y² ≤ 1} and their intersection {(x, y): y = x, x² + y² ≤ 1}.]
Exercise 163. Sketch on a cartesian plane the locus {(x, y) ∶ x2 + y 2 = 1, x > 0}. (Answer
on p. 1114.)
43.1 Circles
On an Argand diagram (or complex plane), the locus {z ∈ C ∶ ∣z∣ = 1} simply describes the
unit circle centred on the origin, as we now prove:
Let z = (x, y). Then ∣z∣ = √(x² + y²) and so the equation ∣z∣ = 1 is equivalent to √(x² + y²) = 1
or x² + y² = 1. Hence,
{z ∈ C ∶ ∣z∣ = 1} = {(x, y) ∶ x² + y² = 1}.
But we already saw in the previous chapter that the locus {(x, y) ∶ x2 + y 2 = 1} describes
the unit circle centred on the origin.
Loci involving complex equations (or inequalities) can usually be easily transformed into a
familiar cartesian equation (or inequality).
[Figure: {z : ∣z∣ = 1} = {(x, y): x² + y² = 1}.]
(b) Let c be some fixed complex number. Prove that the locus {z ∈ C ∶ ∣z − c∣ = r} is the
circle of radius r centred on the point c.
Let b and c be fixed complex numbers. The equation ∣z − c∣ = ∣z − b∣ is simply the condition
that z is equidistant to b and c.
Hence, the locus {z ∈ C ∶ ∣z − c∣ = ∣z − b∣} simply describes the points that are equidistant to
b and c. And as we showed earlier, such a locus is simply a line.
[Figure: the line {z : ∣z − b∣ = ∣z − c∣}.]
Exercise 165. Let b and c be fixed complex numbers. What is the locus of complex
numbers z that satisfy each of the following inequalities? (a) ∣z − c∣ ≤ ∣z − b∣. (b) ∣z − c∣ <
∣z − b∣. (c) ∣z − c∣ ≥ ∣z − b∣. (d)∣z − c∣ > ∣z − b∣. (Answer on p. 1115.)
The locus {z ∈ C ∶ arg z = α} describes the set of points z whose argument is α. It is thus
the ray (or half-line) which starts from but excludes the origin and which makes an angle
α with the positive x-axis. The figure below illustrates.
The point A is in the locus, because indeed arg A = α. In contrast, the point B is not in
the locus, because arg B ≠ α.
Note importantly that points along the dotted red ray, such as C, are not in the locus,
because arg C = α − π ≠ α.
Moreover, the origin is not in the locus, because arg 0 is undefined.
[Figure: the ray {z : arg z = α}, the angle α, and the points A and B.]
If we really wanted to, we could rewrite the complex equation arg z = α into cartesian form.
But it turns out that in this case, the cartesian form is more complicated. And so we’ll
just stick with the equation arg z = α.
The point b is in the locus, because indeed arg(b − a) = α. In contrast, the point c is not in
the locus, because arg(c − a) ≠ α.
Note importantly that points along the dotted red ray, such as d, are not in the locus,
because arg(d − a) = α − π ≠ α.
Moreover, the point a is not in the locus, because arg(a − a) = arg 0 is undefined.
[Figure: the ray {z : arg(z − a) = α}, the angle α, and the points b, c, and d.]
[Figure: the rays {z : arg z = α} and {z : arg z = β}, with the angles α and β.]
Exercise 166. What is {z ∈ C ∶ ∣z∣ = 1, −π < arg z < 0}? (Answer on p. 1115.)
47
See Exercises 360 (2013), 364 (2011), and 370 (2008).
Definition 98. A chord is a line segment connecting any two points on a circle’s circum-
ference.
Here are a few properties of the circle (which you are supposed to still remember from
O-levels) and which would definitely have been useful in some complex loci questions in
the past ten years’ A-levels.
Fact 56. Let A be a point exterior to a circle. Let B and C be the points at which the
tangents from A touch the circle. Let O be the centre of the circle.
(a) The line through A and O (i) bisects the angle ∠BAC; (ii) is the perpendicular bisector
of the chord BC; and (iii) passes through the points D and E, which are the points on the
circle that are respectively closest to and furthest from A.
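[Figure: a circle with centre O, the exterior point A, the tangents from A touching the circle at B and C, the chord BC and its perpendicular bisector, and the points D and E.]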
Here’s an example that illustrates the uses of the above properties of the circle.
∣z + 4 + 2i∣ = 1 describes a unit circle centred on the point C = (−4, −2). Even if not asked
for, you should make a quick sketch to help yourself see better.
By the above fact, ∣z∣ is maximised at F and minimised at N , where F and N lie on the
line through the origin and the circle’s centre.
(a) The maximum value of ∣z∣ is the length OF = OC + CF = √((−4)² + (−2)²) + 1 = √20 + 1.
The minimum value of ∣z∣ is the length ON = OC − CN = √((−4)² + (−2)²) − 1 = √20 − 1.
(b) Consider △CAN. The line through F, C, N, and the origin is y = 0.5x. So AN =
0.5CA. Moreover, CA² + AN² = CN² = 1² = 1.
Altogether then, CA² + 0.25CA² = 1 or CA² = 4/5 or CA = 2/√5. And AN = 1/√5. Hence,
N = (−4 + 2/√5, −2 + 1/√5).
Symmetrically, F = (−4 − 2/√5, −2 − 1/√5).
[Figure: the circle ∣z + 4 + 2i∣ = 1 with centre C, the line y = 0.5x through the origin O and the centre of the circle, and the points A, F, U, and D marked.]
z satisfies ∣z + 4 + 2i∣ = 1. (c) What are the maximum and minimum possible values of arg z?
(d) For what values of z is arg z maximised and minimised?
(c) The points U and D at which arg z is maximised and minimised are also where the tan-
gents OU and OD from the origin touch the circle. By the above fact, OU is perpendicular
to CU . Similarly, OD is perpendicular to CD.
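The angle that the lower half of the line y = 0.5x makes with the positive x-axis is $\theta = \tan^{-1} 0.5 - \pi$.
The angle ∠COU is $\sin^{-1}(1/\sqrt{20})$, because △OUC is right-angled at U, with CU = 1 and $OC = \sqrt{20}$. Hence, the maximum value of arg z is $\arg U = \theta + \angle COU = \tan^{-1} 0.5 - \pi + \sin^{-1}(1/\sqrt{20})$.
Symmetrically, the minimum value of arg z is $\arg D = \theta - \angle COD = \theta - \angle COU = \tan^{-1} 0.5 - \pi - \sin^{-1}(1/\sqrt{20})$.
(d) △ODC is right-angled. So $OD^2 + CD^2 = OC^2$. $OC = \sqrt{20}$ and CD = 1. Hence, $OD = \sqrt{20 - 1} = \sqrt{19}$. Altogether then, $|D| = \sqrt{19}$ and $\arg D = \tan^{-1} 0.5 - \pi - \sin^{-1}(1/\sqrt{20})$.
Symmetrically, we also have $|U| = \sqrt{19}$ and $\arg U = \tan^{-1} 0.5 - \pi + \sin^{-1}(1/\sqrt{20})$.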
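The extreme values of ∣z∣ and arg z on this circle can be checked numerically. The following is only a rough sketch: it samples many points on the circle ∣z + 4 + 2i∣ = 1 and compares the largest and smallest modulus and argument against the closed-form expressions. The sample count `n_samples` is an arbitrary choice made for illustration.

```python
import cmath
import math

centre = complex(-4, -2)   # centre C of the circle |z + 4 + 2i| = 1
radius = 1.0
n_samples = 200_000        # assumption: dense enough for about 6 decimal places

# Points z on the circle, parametrised by an angle in [0, 2*pi).
points = [centre + radius * cmath.exp(2j * math.pi * k / n_samples)
          for k in range(n_samples)]

moduli = [abs(z) for z in points]
arguments = [cmath.phase(z) for z in points]   # principal values

theta = math.atan(0.5) - math.pi       # argument of the centre C
phi = math.asin(1 / math.sqrt(20))     # angle between OC and each tangent from O

print(max(moduli), math.sqrt(20) + 1)  # largest |z| against sqrt(20) + 1
print(min(moduli), math.sqrt(20) - 1)  # smallest |z| against sqrt(20) - 1
print(max(arguments), theta + phi)     # largest arg z
print(min(arguments), theta - phi)     # smallest arg z
```

Each printed pair should agree to several decimal places. Sampling is crude, but it is a quick way to catch a sign error in the inverse trigonometric expressions.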
Exercise 167. The complex number z satisfies the equation ∣z − 2 − 2i∣ = 1. (a) What are
the maximum and minimum possible values of ∣z∣? (b) For what values of z is ∣z∣ maximised
and minimised? (c) What are the maximum and minimum possible values of arg z? (d)
For what values of z is arg z maximised and minimised? (Answer on p. 1116f.)
Theorem 7. De Moivre’s Theorem. $(\cos\theta + i\sin\theta)^n = \cos(n\theta) + i\sin(n\theta)$.
Proof. cos θ + i sin θ is the complex number with modulus 1 and argument θ. So by Fact 51, $(\cos\theta + i\sin\theta)^n$ is the complex number with modulus $1^n = 1$ and argument nθ + 2kπ (where k is the unique integer such that nθ + 2kπ is a principal value). This complex number can be written as cos(nθ) + i sin(nθ).
Proof. $(\cos\theta + i\sin\theta)^n \overset{1}{=} (e^{i\theta})^n \overset{2}{=} e^{i(n\theta)} \overset{3}{=} \cos(n\theta) + i\sin(n\theta)$, where $\overset{1}{=}$ and $\overset{3}{=}$ use the Euler Formula (Theorem 6) and $\overset{2}{=}$ uses the law of exponents $(x^a)^b = x^{ab}$, which applies even when a is imaginary.
Corollary 5. $[r(\cos\theta + i\sin\theta)]^n = r^n[\cos(n\theta) + i\sin(n\theta)]$.
Or equivalently, $(re^{i\theta})^n = r^n e^{i(n\theta)}$.
Or equivalently, if ∣z∣ = r and arg z = θ, then $|z^n| = r^n$ and $\arg z^n = n\theta + 2k\pi$ (where k is the unique integer such that nθ + 2kπ is a principal value).
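$$|z| = \sqrt{2}, \quad \arg z = \frac{\pi}{4} + 2k\pi = \frac{\pi}{4} \ (k = 0),$$
$$|z^2| = (\sqrt{2})^2 = 2, \quad \arg z^2 = 2\left(\frac{\pi}{4}\right) + 2k\pi = \frac{\pi}{2} \ (k = 0),$$
$$|z^3| = (\sqrt{2})^3 = 2\sqrt{2}, \quad \arg z^3 = 3\left(\frac{\pi}{4}\right) + 2k\pi = \frac{3\pi}{4} \ (k = 0),$$
$$|z^4| = (\sqrt{2})^4 = 4, \quad \arg z^4 = 4\left(\frac{\pi}{4}\right) + 2k\pi = \pi \ (k = 0),$$
$$|z^5| = (\sqrt{2})^5 = 4\sqrt{2}, \quad \arg z^5 = 5\left(\frac{\pi}{4}\right) + 2k\pi = -\frac{3\pi}{4} \ (k = -1), \text{ etc.}$$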
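The moduli and principal arguments listed above can be reproduced with Python’s standard `cmath` module. This is a minimal sketch that assumes z = 1 + i, which is the complex number with modulus √2 and argument π/4; the helper name `power_modulus_and_argument` is made up for illustration.

```python
import cmath
import math

def power_modulus_and_argument(z, n):
    """Return (|z^n|, arg z^n), with the argument given as a principal value."""
    w = z ** n
    return abs(w), cmath.phase(w)

z = 1 + 1j   # assumed example value: modulus sqrt(2), argument pi/4
for n in range(1, 6):
    modulus, argument = power_modulus_and_argument(z, n)
    # Print the argument as a multiple of pi for easy comparison.
    print(n, round(modulus, 6), round(argument / math.pi, 6), "pi")
```

For n = 5 the argument 5π/4 is not a principal value, so `cmath.phase` reports −3π/4, which corresponds to choosing k = −1.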
$$|z| = \sqrt{1^2 + 0.4^2} = \sqrt{1.16}, \quad \arg z = \tan^{-1}(0.4) + 2k\pi,$$
$$|z^3| = 1.16\sqrt{1.16}, \quad \arg z^3 = 3\tan^{-1}(0.4) + 2k\pi,$$
$$|z^5| = 1.16^2\sqrt{1.16}, \quad \arg z^5 = 5\tan^{-1}(0.4) + 2k\pi, \text{ etc.}$$
The powers of z = 1 + 0.4i, up to the 14th, are illustrated in the figure below.
(b) Given z = −5 + 12i, find ∣z∣ and arg z. Hence find $|z^8|$ and $\arg z^8$. Write down $(-5 + 12i)^8$ in exponential form. (Answer on p. 1118.)
Exercise 170. For each of the given values of z, compute $z^{10}$, expressing your answer in all three forms (polar, exponential, and standard). (a) z = −1 − i. (b) z = 2 + i. (c) z = 1 − 3i. (Answer on p. 1119.)
Example 421. What are the roots to the equation $z^3 = 1 + i$? That is, for what values of z is the given equation true?
A naïve application of de Moivre’s Theorem might suggest that
$$|z^3| = 2^{1/2} \text{ and } \arg z^3 = \pi/4 \implies |z| = (2^{1/2})^{1/3} = 2^{1/6} \text{ and } \arg z = (\pi/4)/3 = \pi/12.$$
This is not incorrect, but it gives us only one root to the equation $z^3 = 1 + i$, namely $z = 2^{1/6} e^{i(\pi/12)}$.
In contrast, the Fundamental Theorem of Algebra tells us that since the equation $z^3 = 1 + i$ involves a degree-3 polynomial, it should have 3 roots. We’ve just found one root. How do we find the other two?
The trick is to recognise that $z^3 = 2^{1/2} e^{i\pi/4}$ can also be written as $z^3 = 2^{1/2} e^{i(\pi/4 + 2k\pi)}$, for any integer k. This is because if you plug in any integer k, you will always get $2^{1/2} e^{i(\pi/4 + 2k\pi)} = 2^{1/2} e^{i(\pi/4)}$. The reason is that $e^{i(2\pi)} = 1$.
We then have $z = (z^3)^{1/3} = [2^{1/2} e^{i(\pi/4 + 2k\pi)}]^{1/3} = 2^{1/6} e^{i(\pi/4 + 2k\pi)/3}$, for any integer k. Now in contrast to before, different integers k will yield us distinct values for $z = 2^{1/6} e^{i(\pi/4 + 2k\pi)/3}$.
In particular, if we pick values of k so that the values of (π/4 + 2kπ)/3 are principal values, that is, if we pick k = 0, ±1, we have
Observe that beautifully enough, the roots of the equation $z^3 = 1 + i$ lie on a circle: in particular, the circle of radius $2^{1/6}$ centred on the origin. Moreover, each root can be obtained by rotating another root 2π/3 radians about the origin.
1. If n is odd, then simply pick k = 0, ±1, ±2, …, ±(n − 1)/2. (E.g., if n = 15, then pick k = 0, ±1, ±2, …, ±7.)
2. If n is even AND $\arg z^n > 0$, then simply pick k = 0, ±1, ±2, …, ±(n/2 − 1), −n/2. (E.g., if n = 16 and $\arg z^n > 0$, then pick k = 0, ±1, ±2, …, ±7, −8.)
3. If n is even AND $\arg z^n \le 0$, then simply pick k = 0, ±1, ±2, …, ±(n/2 − 1), n/2. (E.g., if n = 16 and $\arg z^n \le 0$, then pick k = 0, ±1, ±2, …, ±7, 8.)
You can easily verify that in each case, we do indeed have n roots (just count them). See Fact 90 in the Appendices for a proof (or explanation) of why the above values of k ensure that we have n distinct principal values for arg z.
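The three rules above for choosing k can be sketched in code. The functions `k_values` and `nth_roots` below are hypothetical helpers written for illustration only; they use the standard `cmath` module, and `w` stands for the known value of $z^n$.

```python
import cmath
import math

def k_values(n, arg_w):
    """Integers k for which (arg_w + 2*k*pi)/n is a principal value."""
    if n % 2 == 1:                        # Rule 1: n odd
        m = (n - 1) // 2
        return list(range(-m, m + 1))
    half = n // 2
    if arg_w > 0:                         # Rule 2: n even, arg z^n > 0
        return list(range(-half, half))   # -n/2, ..., n/2 - 1
    return list(range(-half + 1, half + 1))   # Rule 3: -n/2 + 1, ..., n/2

def nth_roots(w, n):
    """All n roots z of z^n = w, each with a principal argument."""
    modulus, arg_w = abs(w), cmath.phase(w)
    return [cmath.rect(modulus ** (1 / n), (arg_w + 2 * k * math.pi) / n)
            for k in k_values(n, arg_w)]

for root in nth_roots(1 + 1j, 3):
    # Each root has modulus 2^(1/6); the arguments differ by 2*pi/3.
    print(root, abs(root), cmath.phase(root))
```

Cubing each printed root should give back 1 + i up to rounding error, which is a quick sanity check on the choice of k.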
$$z = \begin{cases} 13^{1/4} e^{i[\pi - \tan^{-1}(12/5)]/4}, & (k = 0), \\ 13^{1/4} e^{i[3\pi - \tan^{-1}(12/5)]/4}, & (k = 1), \\ 13^{1/4} e^{i[-\pi - \tan^{-1}(12/5)]/4}, & (k = -1), \\ 13^{1/4} e^{i[-3\pi - \tan^{-1}(12/5)]/4}, & (k = -2). \end{cases}$$
Notice that the eight possible values of w are on a circle whose radius is just slightly shorter
than the red circle. (Only the red circle is illustrated.)
Calculus
Part I already covered differentiation. This chapter merely ties up some loose ends.
The Inverse Function Theorem (IFT) simply says that “The change in y caused by a small
unit change in x (dy/dx)” is the inverse of “the change in x caused by a small unit change
in y (dx/dy)”.48 That is,
$$\frac{dy}{dx} = \frac{1}{\dfrac{dx}{dy}}.$$
Example 424. Suppose that adding 1 g of Milo (the x-variable) to a cup of water increases the volume of water by 2 cm³ (the y-variable). That is, dy/dx = 2 cm³ g⁻¹.
Then dx/dy = 0.5 g cm⁻³. That is, if instead we had wanted to increase the volume of water by 1 cm³, we should have added 0.5 g of Milo to the water.
Example 425. Let x ∈ [−π/2, π/2]. Let y = sin x. Suppose we wish to find dx/dy in terms of x.
Method #1 (longer method using Corollary 2). $y = \sin x \implies x = \sin^{-1} y$. So
$$\frac{dx}{dy} = \frac{d}{dy}\sin^{-1} y = \frac{1}{\sqrt{1 - y^2}} = \frac{1}{\sqrt{1 - \sin^2 x}} = \frac{1}{\cos x}.$$
Method #2 (quicker method using the IFT). $\frac{dy}{dx} = \cos x \implies \frac{dx}{dy} = \frac{1}{\cos x}$.
Exercise 172. Suppose $x^2 y + \sin x = 0$. Find $\frac{dy}{dx}$. Hence write down $\frac{dx}{dy}$. (You may leave your answers expressed in terms of x and y.) (Answer on p. 1121.)
48
This is informal. For the formal statement of the IFT (optional), see p. 961 in the Appendices.
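“Informal Fact”. $\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt}$ (provided $\frac{dx}{dt} \ne 0$).
Here is an informal “proof” of the above informal fact. By the Chain Rule,
$$\frac{dy}{dx} \overset{1}{=} \frac{dy}{dt}\frac{dt}{dx}.$$
By the IFT,
$$\frac{dt}{dx} \overset{2}{=} \frac{1}{\dfrac{dx}{dt}}.$$
Plugging $\overset{2}{=}$ into $\overset{1}{=}$ yields the desired result:
$$\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt}.$$
See p. 962 in the Appendices for a formal version of the above Fact.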
Example 426. Let $x = t^5 + t$ and $y = t^6 - t$. Find $\left.\frac{dy}{dx}\right|_{t=0}$.
$$\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{6t^5 - 1}{5t^4 + 1}.$$
So $\left.\frac{dy}{dx}\right|_{t=0} = -1$. It would be much more difficult (perhaps even impossible) if instead we first tried to express y in terms of x, then compute $\frac{dy}{dx}$.
Exercise 173. Let $x = \cos t + t^2$ and $y = e^t - t^3$. Find $\frac{dy}{dx}$. (Answer on p. 1121.)
Fact 57. The line with slope m through the point (a, b) has equation y − b = m(x − a).
Fact 58. Given a line with slope m, its perpendicular has slope $-\frac{1}{m}$.
$$\left.\frac{dy}{dx}\right|_{t=0} = \left.\frac{dy}{dt} \div \frac{dx}{dt}\right|_{t=0} = \left.\frac{6t^5 - 1}{5t^4 + 1}\right|_{t=0} = -1.$$
So the tangent line at the point t = 0 or (0, 0) has slope −1. Thus, the normal line at this point has slope 1. Its equation is thus y − 0 = 1(x − 0) or more simply y = x.
The points where this normal line intersects the curve are thus given by the system of equations y = x, $x = t^5 + t$, and $y = t^6 - t$. Putting these together, we have $t^5 + t = t^6 - t \iff t(t^5 - t^4 - 2) = 0$. So t = 0 or t ≈ 1.45 (calculator). (We know by the Fundamental Theorem of Algebra that there must be six roots altogether. In this case, only two are real, while the other four are complex.)
So the normal line intersects the curve C again at the point where t ≈ 1.45 or where (x, y) ≈ (7.88, 7.88).
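The value t ≈ 1.45 is found by calculator in the text. As a rough illustration of what a calculator might be doing, here is a minimal bisection sketch for the nonzero real root of $t^5 - t^4 - 2 = 0$. The bracket [1, 2] and the tolerance are assumptions chosen for this example.

```python
def g(t):
    """The factor t^5 - t^4 - 2 whose real root gives the second intersection."""
    return t**5 - t**4 - 2

def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f changes sign on the interval."""
    assert f(lo) * f(hi) < 0, "f must change sign on [lo, hi]"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid          # root lies in the left half
        else:
            lo = mid          # root lies in the right half
    return (lo + hi) / 2

t = bisect(g, 1.0, 2.0)          # g(1) < 0 < g(2)
x, y = t**5 + t, t**6 - t        # the point on the curve at this t
print(t, x, y)
```

The printed x and y should coincide (the point lies on y = x) and round to 7.88.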
Example 428. We unload sand onto a flat surface at a steady rate of 0.01 m³ s⁻¹. Assume the unloaded sand always forms a perfect cone whose height and base diameter are always equal.
Let’s find the rate at which the base area of the cone is increasing, at the instant t = 20 s.
First, recall that a cone with base radius r and height h has volume
$$V = \frac{1}{3}\pi r^2 h.$$
Since the base diameter equals the height (or h = 2r), we can rewrite this as
$$V = \frac{2}{3}\pi r^3.$$
Let $A = \pi r^2$ be the base area. The rate at which the base area is increasing is
$$\frac{dA}{dt} = 2\pi r\frac{dr}{dt} = \frac{dV}{dt} \div r,$$
where the second equality uses $\frac{dV}{dt} = 2\pi r^2 \frac{dr}{dt}$.
The volume of the sand is always increasing at a rate 0.01 m³ s⁻¹. That is:
$$\frac{dV}{dt} = 0.01 \text{ m}^3\,\text{s}^{-1}.$$
At t = 20 s, the volume is V = 0.01 × 20 = 0.2 m³, so that $r^3 = \frac{3V}{2\pi} = \frac{0.3}{\pi}$. Hence,
$$\left.\frac{dA}{dt}\right|_{t=20} = 0.01 \div \left(\frac{0.3}{\pi}\right)^{1/3} \approx 0.0219 \text{ m}^2\,\text{s}^{-1}.$$
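The arithmetic in this related-rates example can be replayed in a few lines. This is only a sketch of the computation, using the relations V = (2/3)πr³ and dA/dt = (dV/dt) ÷ r from the example; the variable names are chosen for illustration.

```python
import math

dV_dt = 0.01          # m^3 per second, the steady unloading rate
t = 20                # seconds

V = dV_dt * t                             # volume of sand at time t (starting from 0)
r = (3 * V / (2 * math.pi)) ** (1 / 3)    # from V = (2/3) * pi * r^3
dA_dt = dV_dt / r                         # from dA/dt = (dV/dt) / r

print(r, dA_dt)   # dA_dt should round to 0.0219 m^2 per second
```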
(b) Use the Pythagorean Theorem to express l in terms of r and h. Hence express l solely
in terms of h.
(c) Now express the total external surface area A (excludes the base) solely in terms of h.
(d) Show that $\frac{dA}{dh} = \frac{3}{2} \cdot \frac{\pi - \frac{6}{h^3}}{A}$. Hence conclude that the only stationary point is $h = \left(\frac{6}{\pi}\right)^{1/3}$.
(e) Use the quotient rule to show that
$$\frac{d^2A}{dh^2} = \frac{9}{4} \cdot \frac{\frac{12}{h^4}A^2 - \left(\pi - \frac{6}{h^3}\right)^2}{A^3}.$$
(f) Consider the numerator of $\frac{d^2A}{dh^2}$. Replace $A^2$ with the expression for A that you found in (c). Now fully expand this numerator. Observe that it is a quadratic and prove that it is always positive.
(g) Hence conclude that the stationary point we found is indeed the global minimum.
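Example 429. Define f ∶ [0, 2] → R by x ↦ x − sin(0.5πx). We can easily find the minimum point of f analytically:
$$\frac{df}{dx} = 1 - \frac{\pi}{2}\cos\left(\frac{\pi}{2}x\right) = 0 \iff \cos\left(\frac{\pi}{2}x\right) = \frac{2}{\pi} \iff x = \frac{2}{\pi}\cos^{-1}\frac{2}{\pi} \approx 0.560664181.$$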
It looks like starting at x = 0, the function is decreasing, then hits a minimum point, then
keeps increasing. Our goal now is to find out what that minimum point is.
4. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
5. Press 3 to select the “minimum” option. This brings you back to the graph, with a
cursor flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s MINIMUM function works by you first choosing a “Left Bound” and a “Right
Bound” for x. TI84 will then look for the minimum point within your chosen bounds.
6. Using the < and > arrow keys, move the blinking cursor until it is where you want your
first “Left Bound” to be. For me, I have placed it a little to the left of where I believe
the minimum point to be.
7. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
8. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is
where you want your first “Right Bound” to be. For me, I have placed it a little to the
right of where I believe the minimum point to be.
9. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the minimum point is. So go ahead and:
10. Press ENTER . TI84 now informs you that there is a “Minimum” at “X = .56066485”, “Y = −.2105137” and places the cursor at precisely that point. This is our desired minimum point.
(Notice there’s a slight error, because the TI84 uses slightly-imprecise numerical methods.
Analytically, we found that the minimum point was x ≈ 0.560664181, while the TI84 claims
it is “X = .56066485”.)
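To see how a numerical routine can land close to, but not exactly on, the analytic answer, here is a small ternary-search sketch for the minimum of f(x) = x − sin(0.5πx) on [0, 2]. This is not a description of the TI84’s actual algorithm, which is not specified here; it is merely one simple numerical method, and the iteration count is an arbitrary assumption.

```python
import math

def f(x):
    return x - math.sin(0.5 * math.pi * x)

def ternary_min(func, lo, hi, iterations=200):
    """Locate the minimum of a function that decreases then increases on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            hi = m2           # the minimum cannot lie to the right of m2
        else:
            lo = m1           # the minimum cannot lie to the left of m1
    return (lo + hi) / 2

x_numeric = ternary_min(f, 0.0, 2.0)
x_exact = (2 / math.pi) * math.acos(2 / math.pi)   # analytic answer from the example
print(x_numeric, x_exact, f(x_numeric))
```

Because f is very flat near its minimum, the numerical x typically agrees with the analytic value only to about seven or eight decimal places, even though the function value itself is accurate to many more.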
This example will also illustrate how to graph parametric equations on the TI84.
$$\left.\frac{dy}{dx}\right|_{t=1} = \left.\frac{6t^5 - 1}{5t^4 + 1}\right|_{t=1} = \frac{5}{6}.$$
Notice that strangely enough, the graph seems to be empty for the region where x < 0. But clearly there are values for which x < 0. For example, t = −1.1 ⟹ (x, y) ≈ (−2.71, 2.87). So why isn’t the TI84 graphing this?
The reason is that by default, the TI84 graphs only the region where 0 ≤ t ≤ 2π (at least this is so for my particular calculator). We can easily adjust this:
4. Press the WINDOW button to bring up a menu of WINDOW options.
5. Using the arrow keys, the number pad, and the ENTER key as is appropriate, change
Tmin and Tmax to your desired values. In my case, I decided somewhat randomly to
enter Tmin = −10 and Tmax= 10.
6. Then press GRAPH again and the calculator will graph the given pair of parametric
equations, now for the region Tmin ≤ t ≤ Tmax, where Tmin and Tmax are whatever
you chose.
Actually, the last few steps were really not necessary, if all we wanted was to find $\left.\frac{dy}{dx}\right|_{t=1}$, as we do now:
7. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu, which once again looks a little different
under the current parametric setting.
8. Press 2 to select the “dy/dx” option. This brings you back to the graph.
Nothing seems to be happening. But now, simply ...
9. Press 1 and now the bottom left of the screen changes to display “T = 1”.
10. Hit ENTER . What you’ve just done is to ask the calculator to calculate $\frac{dy}{dx}$ at the point where t = 1. The calculator tells you that “dy/dx = .83333528”.
Again, there’s a slight error: the exact correct answer is $\frac{dy}{dx} = \frac{5}{6} = 0.8333\ldots$, so again the TI84 is a tiny bit off.
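A numerical derivative of a parametric curve can be sketched with central differences. The snippet below is an assumption about how one might approximate dy/dx at t = 1 for $x = t^5 + t$, $y = t^6 - t$; it is not the TI84’s documented method, and the step size `h` is chosen arbitrarily.

```python
def x(t):
    return t**5 + t

def y(t):
    return t**6 - t

def dydx(t, h=1e-5):
    """Approximate dy/dx = (dy/dt) / (dx/dt) using central differences."""
    dx_dt = (x(t + h) - x(t - h)) / (2 * h)
    dy_dt = (y(t + h) - y(t - h)) / (2 * h)
    return dy_dt / dx_dt

print(dydx(1.0), 5 / 6)   # the approximation against the exact value 5/6
```

The two printed numbers agree to many decimal places but not exactly, which mirrors the small discrepancy seen on the calculator.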
You can easily imagine what a “∞-degree polynomial” is. Only we don’t call it a “∞-degree
polynomial”. Instead, we call it a power series.
$$\sum_{i=0}^{\infty} a_i x^i = a_0 + a_1 x + a_2 x^2 + \dots,$$
$$1 + x + x^2 + x^3 + x^4 + x^5 + \dots = \frac{1}{1 - x}.$$
For H2 Maths, the only power series we’ll be interested in is called the Maclaurin series.
$$M(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n + \dots,$$
Definition 101. Let f be an n-times differentiable function. The nth-order Maclaurin series of f at x is denoted $M_n(x)$ and is defined as the nth-degree polynomial (or finite series)
$$M_n(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n,$$
$$M(x) = 1 + 1x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \dots = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$
Exercise 176. Write down the third-order Maclaurin series for each of the following func-
tions: (Answer on p. 1123.)
(a) f ∶ R → R defined by $x \mapsto (1 + x)^n$,
Remark 8. The A-level syllabuses make no mention of the Taylor series and so we won’t
talk about it. But just so you know, the Maclaurin series is simply a special case of the
Taylor series — specifically, it is the Taylor series about 0.
The Maclaurin series is simply an (infinite) series. And as we saw in Part II, an infinite
series may or may not be convergent.
For what exactly this mysterious “nice” property is, see section 88.14 (optional).
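The following table is in the List of Formulae you get, so there is no need to memorise it.
$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \dots + \frac{x^n}{n!} f^{(n)}(0) + \dots$$
$$(1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \dots + \frac{n(n-1)\dots(n-r+1)}{r!}x^r + \dots \quad (|x| < 1)$$
$$e^x = 1 + x + \frac{x^2}{2!} + \dots + \frac{x^r}{r!} + \dots \quad (\text{all } x)$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + \frac{(-1)^r x^{2r+1}}{(2r+1)!} + \dots \quad (\text{all } x)$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots + \frac{(-1)^r x^{2r}}{(2r)!} + \dots \quad (\text{all } x)$$
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots + \frac{(-1)^{r+1} x^r}{r} + \dots \quad (-1 < x \le 1)$$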
1. The first row of the above table says that if x is a value at which the function f satisfies the “nice” property, then f(x) is equal to the Maclaurin series of f at x.
2. The second row says that g ∶ R → R defined by $x \mapsto (1 + x)^n$ satisfies the “nice” property for all x ∈ (−1, 1). Thus, for all x ∈ (−1, 1), we have $(1 + x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \dots$ We say that (−1, 1) is the range of values for which g has a convergent Maclaurin series.49
49
We should be careful to state that if n < 0, then the domain should be restricted to exclude −1.
In the syllabus, these five particular Maclaurin series are called standard series.
Example 435. Here we will not rigorously prove that $e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$ for all x ∈ R. Instead, we will merely verify that this equation is “plausible”, for x = 0, 1, 5. (Try these out yourself using the sheet “Maclaurin series” at the usual link.)
Exercise 177. (Tedious, use the sheet named “Maclaurin series” at the usual link.) Verify that for $x = 0, \frac{\pi}{2}, 2\pi$, it is similarly “plausible” that $\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$ (Answer on p. 1124.)
One important practical use of Maclaurin series is that finite-order Maclaurin series can be
used as approximations.
We see that it tends to be that the higher the order of the Maclaurin series, the better the
approximation.
I emphasise the phrase tends to be, because the approximation can sometimes get worse
before it gets better, especially if we’re looking at a value that is far from 0. The next
example illustrates.
The 0th-order Maclaurin series gets it exactly right. But each subsequent finite-order
Maclaurin series then drifts ever further from 0! Having computed the 5th- and 6th-order
Maclaurin series, it certainly does not look like the approximations will get any better. Yet
if we persevere, we find that
M7 (2π) = M8 (2π) ≈ −30.159, M9 (2π) =M10 (2π) ≈ 11.900, M11 (2π) =M12 (2π) ≈ −3.195,
M13 (2π) =M14 (2π) ≈ 0.625, M15 (2π) =M16 (2π) ≈ −0.093, M17 (2π) =M18 (2π) ≈ 0.011,
M19 (2π) =M20 (2π) ≈ −0.001, M21 (2π) =M22 (2π) ≈ 0.000, ...
Indeed, Mn (2π) ≈ 0.000 for all n ≥ 21. So it does indeed look like the Maclaurin series for
sin x converges. Graphed below are sin x and M21 (x). We see that M21 (x) almost perfectly
approximates sin x for x ∈ [−7, 7]. But for larger values, M21 (x) veers far away from sin x.
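The partial sums quoted above can be recomputed directly. The helper `maclaurin_sin` below is a hypothetical name for illustration; it simply adds up the terms of the standard series for sin x up to the requested order.

```python
import math

def maclaurin_sin(x, n):
    """The nth-order Maclaurin polynomial of sin, evaluated at x."""
    total = 0.0
    # Only odd powers x^(2r+1) with 2r + 1 <= n contribute.
    for r in range(0, (n - 1) // 2 + 1):
        total += (-1) ** r * x ** (2 * r + 1) / math.factorial(2 * r + 1)
    return total

x = 2 * math.pi
for n in (0, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21):
    print(n, round(maclaurin_sin(x, n), 3))
```

The printed values first swing far away from sin 2π = 0 and only later settle down, which is the “worse before better” behaviour described in the text.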
[Figure: graphs of sin x and M21(x), with the x-axis running from −12 to 12.]
Graphed below are y = sin x, M1 (x), . . . , M10 (x). We see that the 1st-order Maclaurin series
M1 (x) = x is indeed a good approximation for values of x that are close to 0, but terrible
for larger values.
Low-order Maclaurin series work well as approximations, provided we are looking at small
values of x (i.e. values that are close to 0).
But for large x, even if the Maclaurin series eventually converges, low-order Maclaurin
series may fare very poorly as approximations. Indeed, as we saw on the previous page, for
sufficiently large values of x, even a relatively-high-order Maclaurin series like M21 (x) will
fare poorly as an approximation!
Example 438. Consider k ∶ R → R defined by x ↦ ln(1 + x). The range of values for which the Maclaurin series converges is (−1, 1]. Suppose we pick x = 2, which is certainly outside this range. Then we have k(2) = ln 3 ≈ 1.099. Let’s see what the finite-order Maclaurin series look like:
$$M_0(2) = 0, \quad M_1(2) = 2, \quad M_2(2) = 0, \quad M_3(2) = 2\tfrac{2}{3},$$
$$M_4(2) = -1\tfrac{1}{3}, \quad M_5(2) = 5\tfrac{1}{15}, \quad M_6(2) = -5.6, \quad M_7(2) \approx 12.686,$$
$$M_8(2) \approx -19.314, \quad M_9(2) \approx 37.575, \quad M_{10}(2) \approx -64.825, \quad M_{11}(2) \approx 121.356.$$
Unlike before, further perseverance will not pay off here. Indeed, the Maclaurin series will
grow without bound. For example, M50 (2) ≈ −14.9 trillion! The Maclaurin series simply
does not converge for x = 2. So there is no reason to expect any finite-order Maclaurin
series to be a good approximation.
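The divergence at x = 2 is easy to reproduce with exact rational arithmetic. The sketch below uses Python’s standard `fractions` module; `maclaurin_log1p` is an illustrative name for a function that sums the first n terms of the series for ln(1 + x).

```python
from fractions import Fraction

def maclaurin_log1p(x, n):
    """The nth-order Maclaurin polynomial of ln(1 + x), as an exact fraction."""
    x = Fraction(x)
    return sum(Fraction((-1) ** (r + 1)) * x ** r / r for r in range(1, n + 1))

for n in range(0, 12):
    value = maclaurin_log1p(2, n)
    print(n, value, float(value))   # exact fraction and decimal value

print(float(maclaurin_log1p(2, 50)))   # a very large negative number
```

The partial sums alternate in sign and grow in size, since the rth term $2^r / r$ itself grows without bound.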
Informally, if two power series converge, then so too does their product; and to get this
product, simply multiply the two series together as if they were finite polynomials.50
Example 439. For all x ∈ R, $\sin x = 0 + 1x + 0x^2 - \frac{1}{3!}x^3 + \dots$ and $\cos x = 1 + 0x - \frac{1}{2!}x^2 + 0x^3 + \dots$.
Thus, for all x ∈ R, we have $\sin x \cos x = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \dots$, where
Constant Term: $c_0 = 0 \times 1 = 0$,
Coefficient on $x$: $c_1 = 0 \times 0 + 1 \times 1 = 1$,
Coefficient on $x^2$: $c_2 = 0 \times \left(-\frac{1}{2!}\right) + 1 \times 0 + 0 \times 1 = 0$,
Coefficient on $x^3$: $c_3 = 0 \times 0 + 1 \times \left(-\frac{1}{2!}\right) + 0 \times 0 + \left(-\frac{1}{3!}\right) \times 1 = -\frac{2}{3}$.
$$\sin x \cos x = 0 + 1x + 0x^2 + \left(-\frac{2}{3}\right)x^3 + \dots = x - \frac{2}{3}x^3 + \dots$$
The expression on the RHS is, of course, simply also the Maclaurin series for sin x cos x.
You are asked to show this in Exercise 178.
Exercise 178. Let f ∶ R → R be defined by x ↦ sin x cos x. Evaluate f (0), f ′ (0), f ′′ (0),
and f (3) (0). Hence, write down the 3rd-order Maclaurin series for f and verify that this is
consistent with what we found in Example 439. (Answer on p. 1125.)
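Multiplying two power series “as if they were finite polynomials” amounts to collecting, for each power $x^k$, every product $a_i b_{k-i}$. Here is a minimal sketch of that bookkeeping with exact fractions; `multiply_series` is a hypothetical helper, and the coefficient lists are truncated at $x^3$.

```python
from fractions import Fraction

def multiply_series(a, b):
    """Coefficients c_0, c_1, ... of the product of two truncated power series."""
    n = min(len(a), len(b))
    # c_k is the sum of a_i * b_(k-i) over i = 0, ..., k.
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

# Coefficients of x^0, x^1, x^2, x^3 for sin x and cos x.
sin_coeffs = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6)]
cos_coeffs = [Fraction(1), Fraction(0), Fraction(-1, 2), Fraction(0)]

product = multiply_series(sin_coeffs, cos_coeffs)
print(product)   # coefficients of sin x cos x up to x^3
```

Only coefficients up to the shorter truncation are returned, because higher ones would need terms that were discarded.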
The next example illustrates that one must be careful about when the Maclaurin series is
convergent:
50
This assertion is formally stated and proven at Fact 97 in the Appendices (optional).
Constant Term: $c_0 = 0 \times 0 = 0$,
Coefficient on $x$: $c_1 = 0 \times 1 + 1 \times 0 = 0$,
Coefficient on $x^2$: $c_2 = 0 \times \left(-\frac{1}{2}\right) + 1 \times 1 + 0 \times 0 = 1$,
Coefficient on $x^3$: $c_3 = 0 \times \frac{1}{3} + 1 \times \left(-\frac{1}{2}\right) + 0 \times 1 + \left(-\frac{1}{3!}\right) \times 0 = -\frac{1}{2}$.
And so $\sin x \ln(1 + x) = 0 + 0x + 1x^2 + \left(-\frac{1}{2}\right)x^3 + \dots = x^2 - \frac{1}{2}x^3 + \dots$, for x ∈ (−1, 1]. This set is simply the intersection of R and (−1, 1], which are respectively the ranges of values on which the Maclaurin series for sin x and ln(1 + x) converge.
The expression on the RHS is, of course, simply also the Maclaurin series for sin x ln(1 + x). You are asked to show this in Exercise 179.
Exercise 179. Let f ∶ R → R be defined by x ↦ sin x ln(1+x). Evaluate f (0), f ′ (0), f ′′ (0),
and f (3) (0). Hence, write down the 3rd-order Maclaurin series for f and verify that this is
consistent with what we found in Example 440. (Answer on p. 1125.)
$$f(g(c)) = a_0 + a_1 g(c) + a_2 [g(c)]^2 + \dots$$
That is, to get f(g(c)), simply “plug in” g(c) into the power series for f.51
Thus, $f(g(x)) = (1 + 2x)^{-1} = 1 - (2x) + (2x)^2 - (2x)^3 + \dots$ for all g(x) = 2x ∈ (−1, 1).
Equivalently,
$$f(g(x)) = e^{x^2} = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \dots \text{ for all } x \in \mathbb{R}.$$
In the case where g also has a convergent Maclaurin series, we can likewise also simply
“plug in” the Maclaurin series for g.52 Example:
51
For a more careful and formal version of this assertion, see Fact 98 in the Appendices (optional).
52
Again, for a more careful and formal version of this assertion, see Fact 99 in the Appendices (optional).
$$f(g(x)) = \frac{1}{1 + \sin x} = 1 - g(x) + [g(x)]^2 - [g(x)]^3 + \dots$$
$$= 1 - \left(x - \frac{x^3}{3!} + \dots\right) + \left(x - \frac{x^3}{3!} + \dots\right)^2 - \left(x - \frac{x^3}{3!} + \dots\right)^3 + \dots$$
$$= 1 - x + x^2 + x^3\left(\frac{1}{3!} - 1\right) + \dots = 1 - x + x^2 - \frac{5}{6}x^3 + \dots$$
Finding the general term for a Maclaurin series is explicitly excluded from the A-level syllabuses. Usually you’ll just have to write down the first few terms.
Method #2 (direct method). Let h(x) = 1/(1 + sin x). We have h(0) = 1. We also have
$$\left.\frac{dh}{dx}\right|_{x=0} = \left.\frac{-\cos x}{(1 + \sin x)^2}\right|_{x=0} = -1,$$
$$\left.\frac{d^2h}{dx^2}\right|_{x=0} = \left.\frac{(1 + \sin x)^2 \sin x + 2\cos^2 x\,(1 + \sin x)}{(1 + \sin x)^4}\right|_{x=0} = \left.\frac{(1 + \sin x)\sin x + 2\cos^2 x}{(1 + \sin x)^3}\right|_{x=0}$$
$$= \left.\frac{\sin x + 2 - \sin^2 x}{(1 + \sin x)^3}\right|_{x=0} = 2,$$
Thus, $\frac{1}{1 + \sin x} = 1 + (-1)x + \frac{2}{2!}x^2 + \frac{-5}{3!}x^3 + \dots = 1 - x + x^2 - \frac{5}{6}x^3 + \dots$
In the above example, I gave two methods. Use whichever seems to be easier or quicker.
Here’s another example:
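$$\sec x = \frac{1}{\cos x} = \frac{1}{1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots} = \left[1 - \left(\frac{x^2}{2!} - \frac{x^4}{4!} + \dots\right)\right]^{-1}$$
$$= 1 + \left(\frac{x^2}{2!} - \frac{x^4}{4!} + \dots\right) + \left(\frac{x^2}{2!} - \frac{x^4}{4!} + \dots\right)^2 + \dots$$
$$= 1 + \frac{x^2}{2!} + x^4\left[-\frac{1}{4!} + \left(\frac{1}{2!}\right)^2\right] + \dots = 1 + \frac{x^2}{2} + \frac{5x^4}{24} + \dots$$
Method #2 (direct method). Let f(x) = sec x. Then f(0) = sec 0 = 1. And
$$f^{(4)}(x) = 12\sec x\,[f'(x)]^2 + 6\sec^2 x\, f''(x) - f''(x) \implies f^{(4)}(0) = 5.$$
Thus, $\sec x = 1 + 0x + \frac{1}{2!}x^2 + \frac{0}{3!}x^3 + \frac{5}{4!}x^4 + \dots = 1 + \frac{x^2}{2!} + \frac{5x^4}{24} + \dots$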
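A quick numerical comparison can make truncated series like these more believable. The sketch below evaluates the two truncated series, $1 - x + x^2 - \frac{5}{6}x^3$ for 1/(1 + sin x) and $1 + \frac{x^2}{2} + \frac{5x^4}{24}$ for sec x, at a few small values of x chosen arbitrarily for illustration.

```python
import math

def inv_one_plus_sin_series(x):
    """Third-order truncation of the Maclaurin series of 1/(1 + sin x)."""
    return 1 - x + x**2 - 5 * x**3 / 6

def sec_series(x):
    """Fourth-order truncation of the Maclaurin series of sec x."""
    return 1 + x**2 / 2 + 5 * x**4 / 24

for x in (0.1, 0.01):
    exact_1 = 1 / (1 + math.sin(x))
    exact_2 = 1 / math.cos(x)
    # The gaps shrink rapidly as x gets closer to 0.
    print(x, exact_1 - inv_one_plus_sin_series(x), exact_2 - sec_series(x))
```

A wrong coefficient would show up as a gap that shrinks too slowly as x decreases.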
Exercise 180. Write down the third-order Maclaurin series for sin [ln(1 + x)]. State also
the range of values for which the Maclaurin series converges. (Answer on p. 1125.)
$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + \dots$$
Then the coefficients in the above power series are as given by the Maclaurin series. That is, for each i = 0, 1, 2, …, we have
$$a_i = \frac{f^{(i)}(0)}{i!}.$$
The above theorem is merely a tantalising hint of why the Maclaurin series “works”. This is
because the theorem merely says this: If we make the very big assumption that the infinitely-
differentiable function f can be written down as a power series, then the coefficients of the
power series are as given by the Maclaurin series.
But this is not very useful, because — how do we know that the function can be written
down as a power series? For a continuation of this discussion, see section 88.14 in the
Appendices.
Definition 102. Given functions f and F , we call F an indefinite integral (or antiderivative
or primitive) of f if for all x in the domain of f ,
F ′ (x) = f (x),
$$F = \int f\,dx \quad \text{or} \quad \frac{dF}{dx} = f.$$
The statement “the value of F at 5 is 25” can be written as F (5) = 25. It can also be
written as
$$\sum_{i=1}^{n} i = \sum_{r=1}^{n} r.$$
Similarly, the statement F = ∫ f(x) dx is equivalent to any of the following three statements,
because the letters x, a, b, c, etc. are merely “dummy” variables:
So the statement “the value of F at 5 is 25” can also be written F (5) = 25 or any of the
following four statements:
In general:
Proof. Since F ′ (x) = f (x) for all x, we also have G′ (x) = F ′ (x) + C ′ = F ′ (x) + 0 = f (x) for
all x. And so by definition, G is also an indefinite integral of f .
Example 448. Say f has indefinite integral F defined by $F(x) = \sin\left(e^{x^2 - 3x + 5}\right)$. Suppose G is another indefinite integral of f. Then it must be that F(x) = G(x) + C, for some C ∈ R.
Formally:
Fact 60. If F and G are both indefinite integrals of f , then there exists some C ∈ R such
that F (x) = G(x) + C for all x.
Proof. Since F and G are both indefinite integrals of f , by definition, F ′ (x) = G′ (x) for all
x. And thus (F − G)′ (x) = 0 for all x. But the only functions whose derivative is always 0
are constant functions.53 Thus, F (x) − G(x) = C, for all x, for some C ∈ R.
(b) F and G seem to be very different functions. Yet both are indefinite integrals of f .
Why does this not contradict our assertion that “the indefinite integral is unique up to a
constant”?
53
The alert reader will note that this assertion has not actually been proven in this textbook. We’ll simply take it for granted
that “the only functions whose derivative is 0 are constant functions”.
As before with our notation for differentiation, let’s be clear (pedantic). To take an example,
the notation
∫ sin x dx = − cos x + C
54
This shorthand statement fails to mention the domains and codomains of the function and its indefinite integral. However, the careful writer will “of course” have specified these nearby.
Proposition 10. Let k, n ∈ R be constants with n ≠ −1. Let f and g be functions with indefinite integrals F and G. Then
$$\int k\,dx = kx + C, \qquad \int \sin x\,dx = -\cos x + C,$$
$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + C \ (x \ne 0 \text{ if } n < 0), \qquad \int \cos x\,dx = \sin x + C,$$
$$\int x^{-1}\,dx = \ln|x| + C \ (x \ne 0), \qquad \int f(x) \pm g(x)\,dx = F(x) \pm G(x) + C,$$
$$\int e^x\,dx = e^x + C, \qquad \int k f(x)\,dx = kF(x) + C.$$
Proof. In general, to prove that ∫ f(x) dx = F, it suffices to prove that F′(x) = f(x) for all x.
And so to prove that $\int x^{-1}\,dx = \ln|x| + C$, it suffices to prove that $\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. This we now do. First note that
$$\ln|x| + C = \begin{cases} \ln x + C, & \text{for } x > 0, \\ \ln(-x) + C, & \text{for } x < 0. \end{cases}$$
Thus,
$$\frac{d}{dx}(\ln|x| + C) = \begin{cases} \dfrac{1}{x}, & \text{for } x > 0, \\ \dfrac{-1}{-x} = \dfrac{1}{x}, & \text{for } x < 0. \end{cases}$$
And so indeed $\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0.
You are asked to prove the remaining rules of integration in Exercise 182.
Exercise 182. Prove the remaining rules of integration listed in Proposition 10. (Answer
on p. 1127.)
No need to memorise the following rules of integration, because the List of Formulae con-
tains a (slightly less general) version.
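(a) $\int \frac{1}{x^2 + a^2}\,dx = \frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right) + C$,
(b) $\int \frac{1}{\sqrt{a^2 - x^2}}\,dx = \sin^{-1}\left(\frac{x}{a}\right) + C$, for ∣x∣ < a,
(c) $\int \frac{1}{x^2 - a^2}\,dx = \frac{1}{2a}\ln\left|\frac{x - a}{x + a}\right| + C$, for x ≠ a,
(d) $\int \frac{1}{a^2 - x^2}\,dx = \frac{1}{2a}\ln\left|\frac{a + x}{a - x}\right| + C$, for x ≠ a,
(e) $\int \tan x\,dx = \ln|\sec x| + C$, for x not an odd multiple of $\frac{\pi}{2}$,
(h) $\int \sec x\,dx = \ln|\sec x + \tan x| + C$, for x not an odd multiple of $\frac{\pi}{2}$,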
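Proof. We prove only (a), (c), and (e). (You are asked to prove the remaining rules of integration in Exercise 183.)
(a) By Corollary 2, $\frac{d}{dx}\tan^{-1}x = \frac{1}{x^2 + 1}$. Hence,
$$\frac{d}{dx}\left[\frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right) + C\right] = \frac{1}{a} \cdot \frac{1}{\left(\frac{x}{a}\right)^2 + 1} \cdot \frac{1}{a} = \frac{1}{x^2 + a^2}.$$
So indeed $\int \frac{1}{x^2 + a^2}\,dx = \frac{1}{a}\tan^{-1}\left(\frac{x}{a}\right) + C$.
(c) Let x ≠ a. Case #1: $\frac{x - a}{x + a} \ge 0$.
$$\frac{d}{dx}\left(\frac{1}{2a}\ln\left|\frac{x - a}{x + a}\right| + C\right) = \frac{d}{dx}\left(\frac{1}{2a}\ln\frac{x - a}{x + a} + C\right) = \frac{1}{2a}\frac{d}{dx}[\ln(x - a) - \ln(x + a)]$$
$$= \frac{1}{2a}\left(\frac{1}{x - a} - \frac{1}{x + a}\right) = \frac{1}{2a} \cdot \frac{x + a - (x - a)}{(x - a)(x + a)} = \frac{1}{2a} \cdot \frac{2a}{x^2 - a^2} = \frac{1}{x^2 - a^2},$$
so that indeed $\int \frac{1}{x^2 - a^2}\,dx = \frac{1}{2a}\ln\left|\frac{x - a}{x + a}\right| + C$.
Case #2: $\frac{x - a}{x + a} < 0$.
$$\frac{d}{dx}\left(\frac{1}{2a}\ln\left|\frac{x - a}{x + a}\right| + C\right) = \frac{d}{dx}\left(\frac{1}{2a}\ln\frac{a - x}{x + a} + C\right) = \frac{1}{2a}\frac{d}{dx}[\ln(a - x) - \ln(x + a)]$$
$$= \frac{1}{2a}\left(\frac{-1}{a - x} - \frac{1}{x + a}\right) = \frac{1}{2a}\left(\frac{1}{x - a} - \frac{1}{x + a}\right) = \frac{1}{x^2 - a^2},$$
so that again $\int \frac{1}{x^2 - a^2}\,dx = \frac{1}{2a}\ln\left|\frac{x - a}{x + a}\right| + C$.
$$\frac{d}{dx}(\ln|\sec x| + C) = \frac{d}{dx}(\ln\sec x + C) = \frac{\sec x\tan x}{\sec x} = \tan x,$$
$$\frac{d}{dx}(\ln|\sec x| + C) = \frac{d}{dx}[\ln(-\sec x) + C] = \frac{-\sec x\tan x}{-\sec x} = \tan x,$$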
Exercise 183. Prove the remaining rules of integration listed in Proposition 11. (Answers
on pp. 1128, 1129, 1130, and 1131.)
The following indefinite integrals are NOT on the List of Formulae and you are definitely required to know how to derive them on your own!
(a) $\int \sin^2 x\,dx = \frac{1}{2}x - \frac{\sin 2x}{4} + C$,
(b) $\int \cos^2 x\,dx = \frac{1}{2}x + \frac{\sin 2x}{4} + C$,
(c) $\int \tan^2 x\,dx = \tan x - x + C$,
Proof. (a) The trick is to recall the trigonometric identity $\cos 2x = 1 - 2\sin^2 x$ (this is in the List of Formulae, as are several other trig identities). And so:
$$\int \sin^2 x\,dx = \int \frac{1 - \cos 2x}{2}\,dx = \frac{1}{2}x - \frac{\sin 2x}{4} + C.$$
You are asked to prove the remaining rules of integration in Exercise 184.
Exercise 184. Prove the remaining rules of integration listed in Fact 61. (Answer on p.
1132.)
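One informal way to check a proposed indefinite integral is to differentiate it numerically and compare against the integrand. The sketch below does this for the antiderivatives of sin² x and cos² x, using a central difference with an arbitrarily chosen step size `h`. It is a plausibility check only, not a proof.

```python
import math

def derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def F_sin2(x):
    return x / 2 - math.sin(2 * x) / 4    # proposed integral of sin^2 x

def F_cos2(x):
    return x / 2 + math.sin(2 * x) / 4    # proposed integral of cos^2 x

for x in (0.3, 1.0, 2.5):
    # Each pair should agree closely if the antiderivatives are right.
    print(derivative(F_sin2, x), math.sin(x) ** 2)
    print(derivative(F_cos2, x), math.cos(x) ** 2)
```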
The method of integration by substitution (IBS) is the Chain Rule in reverse. Before
we explain why it works, here are two examples of how it works.55
First, observe that $\cot x = \frac{\cos x}{\sin x}$. Next, observe that $\frac{d}{dx}\sin x = \cos x$. Let u = sin x (this is our substitution), so that we also have $\frac{du}{dx} = \cos x$. So:
$$\int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx = \int \frac{1}{u}\frac{du}{dx}\,dx.$$
So far, nothing unusual has happened. Now we’re going to do something strange, which is to take that last expression and merrily cancel out the dx’s:
$$\int \frac{1}{u}\frac{du}{dx}\,dx = \int \frac{1}{u}\,du + C_1.$$
Didn’t we repeatedly insist earlier that the derivative $\frac{du}{dx}$ is NOT a fraction? So why are we allowed to “merrily” cancel out the dx’s!? Shortly we’ll explain why this move is legitimate. For now, let us blindly persevere:
$$\int \frac{1}{u}\,du + C_1 = \ln|u| + C = \ln|\sin x| + C.$$
Another example, before we explain why exactly we can “merrily” cancel out the dx’s:
Example 450. Let’s find $\int 2x\cos x^2\,dx$. Let $u = x^2$, so that we also have $\frac{du}{dx} = 2x$. Now,
$$\int 2x\cos x^2\,dx = \int \frac{du}{dx}\cos u\,dx.$$
55
Actually we secretly already used this method a few times above, though not very explicitly.
$$\int f \cdot \frac{du}{dx}\,dx = \int f\,du + C.$$
Proof. Let $P = \int f \cdot \frac{du}{dx}\,dx$ and $Q = \int f\,du$. In other words, $\frac{dP}{dx} \overset{1}{=} f \cdot \frac{du}{dx}$ and $\frac{dQ}{du} \overset{2}{=} f$.
Using first the Chain Rule and then $\overset{2}{=}$, we have $\frac{dQ}{dx} = \frac{dQ}{du} \cdot \frac{du}{dx} \overset{3}{=} f \cdot \frac{du}{dx}$.
Examining $\overset{1}{=}$ and $\overset{3}{=}$, we see that P and Q are both indefinite integrals for $f \cdot \frac{du}{dx}$. And so by Fact 60 (uniqueness of the indefinite integral up to a constant), P and Q must be equal (or differ by at most a constant). That is, P = Q + C or
$$\int f \cdot \frac{du}{dx}\,dx = \int f\,du + C.$$
The above result says that when doing integration, we are allowed to “merrily” do two things:
1. Replace $\frac{du}{dx}\,dx$ with du (“cancel out the dx’s from $\frac{du}{dx}\,dx$ to get du”);
2. Replace du with $\frac{du}{dx}\,dx$ (“multiply du by $\frac{dx}{dx} = 1$ to get $\frac{du}{dx}\,dx$”).
Of course, we are not actually doing any such things as “cancelling out the dx’s” or “multiplying by $\frac{dx}{dx} = 1$”. These are merely mnemonics. Instead, all we are doing is appealing to the above theorem.56
Let’s try more examples, now that we have a better understanding of how this works:
56
This is analogous to the Inverse Function Theorem, which states that $\frac{dy}{dx} = 1 / \frac{dx}{dy}$. The IFT is true NOT because $\frac{dy}{dx}$ and $\frac{dx}{dy}$ are fractions. Nonetheless, as a convenient mnemonic, we can pretend that the IFT holds because $\frac{dy}{dx}$ and $\frac{dx}{dy}$ are fractions, even though strictly speaking, such thinking is wrong.
$$\int e^{\sin x} \cos x \, dx = \int e^u \frac{du}{dx} \, dx \overset{1}{=} \int e^u \, du + C_1 = e^u + C = e^{\sin x} + C,$$
where $\overset{1}{=}$ uses Theorem 9. Purely as a mnemonic, we may think of this step $\overset{1}{=}$ as “cancelling out the dx’s”, even though strictly speaking, we are doing no such thing; instead, we are appealing to Theorem 9.
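If you'd like some reassurance that a substitution has produced the right answer, you can always differentiate the result and compare with the integrand. Here is a minimal numerical sketch of that check in Python. The helper `central_difference`, the step size, and the sample points are illustrative choices of ours, not part of the method itself.

```python
import math

def central_difference(F, x, h=1e-6):
    """Approximate F'(x) with a symmetric difference quotient."""
    return (F(x + h) - F(x - h)) / (2 * h)

def integrand(x):
    return math.exp(math.sin(x)) * math.cos(x)

def candidate(x):
    # Proposed antiderivative e^(sin x); the constant C is omitted
    return math.exp(math.sin(x))

sample_points = [-2.0, -0.5, 0.3, 1.0, 2.5]
max_error = max(abs(central_difference(candidate, x) - integrand(x))
                for x in sample_points)
print(max_error < 1e-6)
```

Of course, agreement at a handful of points is only a sanity check, not a proof.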
Example 452. Let’s find $\int (x^3 + 5x^2 - 3x + 2)^{50} (3x^2 + 10x - 3) \, dx$. One method would be to fully expand the integrand to get a 152nd-degree polynomial, then integrate this polynomial term-by-term. This is doable, but absurdly tedious.
A better method is to observe that $3x^2 + 10x - 3 = \frac{d}{dx} (x^3 + 5x^2 - 3x + 2)$. Thus, let $u = x^3 + 5x^2 - 3x + 2$. Then we can write
$$\int (x^3 + 5x^2 - 3x + 2)^{50} (3x^2 + 10x - 3) \, dx = \int u^{50} \frac{du}{dx} \, dx \overset{1}{=} \int u^{50} \, du + C_1 = \frac{u^{51}}{51} + C = \frac{(x^3 + 5x^2 - 3x + 2)^{51}}{51} + C,$$
where once again $\overset{1}{=}$ uses Theorem 9.
In the next three examples, we go in the “opposite direction”. That is, instead of “cancelling out the dx’s” as was done in the previous few examples, we instead “multiply by $\frac{dx}{dx} = 1$”.
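$$\int \sqrt{1 - u^2} \, du = \int \cos x \, du = \int \cos x \frac{dx}{dx} \, du \overset{1}{=} \int \cos x \frac{du}{dx} \, dx + C_1$$
$$= \int \cos x \cos x \, dx + C_1 = \int \cos^2 x \, dx + C_1 \overset{2}{=} \frac{x}{2} + \frac{\sin 2x}{4} + C$$
$$= \frac{1}{2} \sin^{-1} u + \frac{2 \sin x \cos x}{4} + C = \frac{\sin^{-1} u + u \sqrt{1 - u^2}}{2} + C,$$
where $\overset{1}{=}$ uses Theorem 9 and $\overset{2}{=}$ uses Fact 61.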
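Example 454. Let’s find $\int \frac{x^2}{\sqrt[3]{1 + 2x}} \, dx$. We’ll use the substitution $u^3 = 1 + 2x$. Note that $x = \frac{1}{2}(u^3 - 1)$ and $\frac{dx}{du} = \frac{3}{2} u^2$. So
$$\int \frac{x^2}{\sqrt[3]{1 + 2x}} \, dx = \int \frac{\left[\frac{1}{2}(u^3 - 1)\right]^2}{\sqrt[3]{u^3}} \, dx = \int \frac{(u^3 - 1)^2}{4u} \, dx = \int \frac{(u^3 - 1)^2}{4u} \frac{du}{du} \, dx$$
$$\overset{1}{=} \int \frac{(u^3 - 1)^2}{4u} \frac{dx}{du} \, du + C_1 = \int \frac{(u^3 - 1)^2}{4u} \left(\frac{3}{2} u^2\right) du + C_1 = \int \frac{3u (1 - 2u^3 + u^6)}{8} \, du + C_1$$
$$= \frac{3}{8} \int u - 2u^4 + u^7 \, du + C_1 = \frac{3}{8} \left(\frac{u^2}{2} - \frac{2u^5}{5} + \frac{u^8}{8}\right) + C = \frac{3}{8} u^2 \left(\frac{1}{2} - \frac{2u^3}{5} + \frac{u^6}{8}\right) + C,$$
where $\overset{1}{=}$ uses Theorem 9. (The last line is just further simplification, which is nice but not necessary.)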
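$$\cos^2 x = \frac{1}{\sec^2 x} = \frac{1}{1 + \tan^2 x} = \frac{1}{1 + u^2} \quad \text{and} \quad \frac{dx}{du} = \frac{1}{1 + u^2}.$$
So
$$\int \frac{1}{1 + \frac{3}{1 + u^2}} \, dx = \int \frac{1}{1 + \frac{3}{1 + u^2}} \frac{du}{du} \, dx \overset{1}{=} \int \frac{1}{1 + \frac{3}{1 + u^2}} \frac{dx}{du} \, du + C_1 = \int \frac{1}{1 + \frac{3}{1 + u^2}} \cdot \frac{1}{1 + u^2} \, du + C_1$$
$$= \int \frac{1}{1 + u^2 + 3} \, du + C_1 = \int \frac{1}{2^2 + u^2} \, du + C_1 \overset{2}{=} \frac{1}{2} \tan^{-1} \left(\frac{u}{2}\right) + C = \frac{1}{2} \tan^{-1} \left(\frac{\tan x}{2}\right) + C,$$
where $\overset{1}{=}$ uses Theorem 9 (“multiply by $\frac{du}{du} = 1$”) and $\overset{2}{=}$ uses Proposition 11.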
Usually, the hard part is to figure out the appropriate substitution to make. Fortunately,
in the A-level exams, you’ll always be told what substitution to make.
Exercise 185. (Answers on pp. 1133 and 1134.) (a) (i) Use the substitution $x = 3 \sec u$ to find $\int \frac{9}{x^2 \sqrt{x^2 - 9}} \, dx$.
(ii) Now use instead the substitution $x = \sqrt{\frac{9}{1 - u}}$ to find $\int \frac{9}{x^2 \sqrt{x^2 - 9}} \, dx$.
(iii) Show that $\sin (\sec^{-1} y) = \sqrt{1 - \frac{1}{y^2}}$. Then explain why your answers in (i) and (ii) are consistent.
(b) (i) Use the substitution $x = \frac{3}{2} \tan u$ to find $\int \frac{x^3}{(4x^2 + 9)^{3/2}} \, dx$.
(ii) Now use instead the substitution $u = 4x^2 + 9$ to find $\int \frac{x^3}{(4x^2 + 9)^{3/2}} \, dx$.
(iii) Show that $\cos (\tan^{-1} y) = \frac{1}{\sqrt{1 + y^2}}$. Then explain why your answers in (i) and (ii) are consistent.
To choose v′, use the rule of thumb DETAIL: D stands for $\frac{dv}{dx}$, and the remaining letters stand for Exponential, Trig, Algebraic, Inverse trig, Log. (This is because exponential functions are easiest to integrate, followed by trigonometric functions, etc.)
$$\int \underbrace{x}_{u} \underbrace{e^x}_{v'} \, dx = uv - \int u' v \, dx = x e^x - \int e^x \, dx = x e^x - e^x = e^x (x - 1).$$
$$\int \underbrace{x^2}_{u} \underbrace{e^x}_{v'} \, dx = uv - \int u' v \, dx = x^2 e^x - \int 2x e^x \, dx$$
The problem of finding the definite integral is the problem of finding the area under a curve.
The problem of finding the derivative is the problem of finding the slope of the tangent.
The two Fundamental Theorems of Calculus (FTCs) show that, surprisingly enough, these
two problems are intimately (indeed inversely) related.
This chapter is a largely-informal discussion of the intuition behind the FTCs.
Given a continuous real-valued function f , its area function is denoted A and is, infor-
mally, defined by the mapping A(c) = “Area bounded by the graph of f , the horizontal
axis, and the vertical lines x = 0 and x = c”.
Example 458. Graphed below is the continuous function $f : \mathbb{R}_0^+ \to \mathbb{R}$ defined by $f(x) = \sqrt{x} + 1$.
The area A(6) is highlighted in red. It is the area bounded by the graph of f , the horizontal
axis, and the vertical lines x = 0 and x = 6.
Using a graphing calculator, A(6) = 15.79795897... Is there a way I can figure this out
without a graphing calculator? Here’s one possible approach — let’s approximate the area
by using three rectangles.
We’ll use three rectangles of equal width — so each rectangle has width 2. The leftmost
rectangle will occupy the interval [0, 2], the middle rectangle will occupy [2, 4], and the
rightmost rectangle will occupy [4, 6].
For each rectangle, we choose its height to be the lowest value attained by the function in
that interval. In the interval [0, 2], the lowest value attained by f is f (0). So the leftmost
blue rectangle has height f (0) and thus area Base × Height = 2f (0).
Similarly, the middle green rectangle has height f (2), because in the interval [2, 4], the
lowest value attained by f is f (2). Hence, it has area Base × Height = 2f (2).
The rightmost grey rectangle has height f (4), because in the interval [4, 6], the lowest value
attained by f is f (4). Hence, it has area Base × Height = 2f (4).
Altogether, the total area of these three rectangles is
$$SL3 = 2f(0) + 2f(2) + 2f(4) = 2 \left[(\sqrt{0} + 1) + (\sqrt{2} + 1) + (\sqrt{4} + 1)\right] \approx 12.828,$$
where SL3 stands for “Lower Sum in the case of 3 rectangles with equal width”. This is our
very first approximation of the area A(6). We see that this is a fairly poor approximation,
because the true area is A(6) = 15.79795897... Nonetheless, it is useful — we know that SL3
is a lower bound for A(6). That is, we know that SL3 ≤ A(6).
We’ll next try a different approximation — SU 3 . Can you guess what this involves?
We’ll again use three rectangles of equal width (width 2), occupying intervals [0, 2], [2, 4],
and [4, 6]. The difference now is that for each rectangle, we choose its height to be the
highest value attained by the function in that interval. In the interval [0, 2], the highest
value attained by f is f (2). So the leftmost blue rectangle has height f (2) and thus area
Base × Height = 2f (2).
Similarly, the middle green rectangle has height f (4), because in the interval [2, 4], the
highest value attained by f is f (4). Hence, it has area Base × Height = 2f (4).
The rightmost grey rectangle has height f(6), because in the interval [4, 6], the highest value attained by f is f(6). Hence, it has area Base × Height = 2f(6).
Altogether, the total area of these three rectangles is
$$SU3 = 2f(2) + 2f(4) + 2f(6) = 2 \left[(\sqrt{2} + 1) + (\sqrt{4} + 1) + (\sqrt{6} + 1)\right] \approx 17.727,$$
where SU 3 stands for “Upper Sum in the case of 3 rectangles with equal width”. This
is our second approximation of the area A(6). We see that again, this is a fairly poor
approximation, because the true area is A(6) = 15.79795897.... Nonetheless, it is again
useful — we know that SU 3 is an upper bound for A(6). That is, we know that A(6) ≤ SU 3 .
Can we do better than this? Yes, certainly. An obvious follow-up would be to increase the
number of rectangles we use. Let’s next use 6 rectangles instead.
We’ll now use six rectangles of equal width (width 1), occupying intervals [0, 1], [1, 2], [2, 3], [3, 4], [4, 5], and [5, 6]. To calculate the Lower Sum SL6, we give the first rectangle height of f(0), the second f(1), ..., the sixth f(5). So each rectangle has, respectively, area 1 × f(0), 1 × f(1), ..., and 1 × f(5). Hence, SL6 = f(0) + f(1) + f(2) + f(3) + f(4) + f(5) ≈ 14.382.
Analogously, to calculate the Upper Sum SU6, we give the first rectangle height of f(1), the second f(2), ..., the sixth f(6). So each rectangle has, respectively, area 1 × f(1), 1 × f(2), ..., and 1 × f(6). Hence, SU6 = f(1) + f(2) + f(3) + f(4) + f(5) + f(6) ≈ 16.832.
Once again, A(6) has lower and upper bounds SL6 and SU 6 . That is, 14.382 ≈ SL6 ≤ A(6) ≤
SU 6 ≈ 16.832.
You can see where this is going. We can get ever better lower and upper bounds, by
increasing the number of rectangles we use.
Let n be the number of rectangles we use. We will always have SLn ≤ A(6) ≤ SU n .
Indeed, this “slim rectangles approach” is exactly how the area function is formally and
rigorously defined — see Section 88.17 in the Appendices for the details (optional).
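The lower and upper sums above are easy to reproduce on a computer. The Python sketch below assumes, as in the example, that f(x) = √x + 1 on [0, 6] and that f is increasing, so that the lowest and highest values on each interval sit at its left and right endpoints. The function name `lower_and_upper_sums` is our own.

```python
import math

def f(x):
    return math.sqrt(x) + 1

def lower_and_upper_sums(n, a=0.0, b=6.0):
    """Lower and upper sums with n equal-width rectangles.

    Because f is increasing, the lowest value on each interval is at the
    left endpoint and the highest is at the right endpoint.
    """
    width = (b - a) / n
    lower = sum(width * f(a + i * width) for i in range(n))
    upper = sum(width * f(a + (i + 1) * width) for i in range(n))
    return lower, upper

for n in (3, 6, 60, 600):
    lower, upper = lower_and_upper_sums(n)
    print(n, round(lower, 3), round(upper, 3))
```

For n = 3 and n = 6 this reproduces the four sums computed by hand above. For larger n, the two bounds squeeze ever closer around A(6).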
It appears then that we need to do more maths to figure out how to add up all these “infinitely-many, infinitely-slim rectangles”. But it turns out that there is an absolutely fantastic shortcut we can use.
Given a function f , we sketched an idea of how to find its area function A — approximate
the area under the curve using “infinitely-many, infinitely-slim rectangles” and add up the
total area of these rectangles. This though was merely a sketch of an idea. How do we go
about adding up the area of these “infinitely-many, infinitely-slim rectangles”? Easier said
than done!
It turns out though that we’ll take an entirely different approach. Strangely enough, instead
of finding the area function A, we shall try to find the area function’s derivative A′ .
This seems utterly bizarre. If we don’t know what A is in the first place, how could we
possibly figure out what A′ is? This is analogous to asking someone, who has no idea where
Singapore is, to find the Singapore Flyer!
But surprisingly, it turns out to be much easier to find A′ than it is to find A! We’ll recycle
the example from the last section:
Consider the thin green vertical strip. This green strip is roughly rectangular in shape —
its left, right, and bottom edges are all straight. Only its upper edge is not straight.
This green strip’s area is exactly A(x) − A(6). Moreover, we know that its base is x − 6, its left side is f(6), and its right side is f(x). Hence,
$$(x - 6) f(6) < A(x) - A(6) < (x - 6) f(x).$$
Rearranging, we have
$$f(6) < \frac{A(x) - A(6)}{x - 6} < f(x).$$
Now consider what happens if we pick another x that is slightly smaller but still larger
than 6. Then the above pair of inequalities will still hold. Indeed, for all x > 6, the above
pair of inequalities hold. If we let x approach 6, the above pair of inequalities becomes
$$\lim_{x \to 6} f(6) \le \lim_{x \to 6} \frac{A(x) - A(6)}{x - 6} \le \lim_{x \to 6} f(x).$$
(For why the strict inequalities < became weak inequalities ≤, either you simply trust me
or see Fact 7 in the Appendices.)
Of course, $\lim_{x \to 6} f(6)$ is simply f(6). And by the continuity of f, $\lim_{x \to 6} f(x) = f(6)$. Hence,
$$f(6) \le \lim_{x \to 6} \frac{A(x) - A(6)}{x - 6} \le f(6),$$
which means of course that $\lim_{x \to 6} \frac{A(x) - A(6)}{x - 6} = f(6)$. But wait a second ... what is $\lim_{x \to 6} \frac{A(x) - A(6)}{x - 6}$? It is simply the value of the derivative of A at 6!!
We thus conclude that astonishingly enough, A′ (6) = f (6). And this is more generally true
— given a continuous function f , the derivative of its area function is simply the original
function itself! This is the First Fundamental Theorem of Calculus.
In words, the FTC1 says that the derivative of the area function of a continuous function is simply the function itself! Equivalently, an indefinite integral (or antiderivative) of a continuous function is the area function.57
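We can watch the FTC1 at work numerically. The Python sketch below approximates the area function A of f(x) = √x + 1 by the midpoint rule (a standard numerical method that we are swapping in for the "slim rectangles"; it is not the formal definition). It then computes the difference quotient (A(x) − A(6))/(x − 6) for x ever closer to 6. The names `area` and `difference_quotient` are ours.

```python
import math

def f(x):
    return math.sqrt(x) + 1

def area(c, n=20_000):
    """Approximate A(c), the area under f between 0 and c, by the midpoint rule."""
    width = c / n
    return width * sum(f((i + 0.5) * width) for i in range(n))

def difference_quotient(x, c=6.0):
    """(A(x) - A(c)) / (x - c), the slope whose limit is A'(c)."""
    return (area(x) - area(c)) / (x - c)

for x in (7.0, 6.5, 6.1, 6.01):
    print(x, round(difference_quotient(x), 4), round(f(6.0), 4))
```

As x approaches 6, the middle column should creep towards the last column, which is f(6).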
Exercise 188. Why did I use the indefinite article an, rather than the definite article the,
in the last sentence above? (Answer on p. 1136.)
Example 459. Graphed below is the velocity v (m s⁻¹) of a car as a function of time t (s).
Recall that the area under the graph is the distance travelled by the car. For example, the
shaded red area A(5) is the total distance travelled by the car after 5 s.
But the derivative of the distance travelled with respect to time is precisely the velocity!
Hence, this example illustrates the FTC1: the derivative of the area under the graph of a
function is precisely the function itself!
[Figure: velocity (m s⁻¹) against time (s); the area A(5) under the graph is shaded red.]
57
For a formal, rigorous statement of FTC1 and its proof, see section 88.17 in the Appendices.
Example 460. Define $f : \mathbb{R}_0^+ \to \mathbb{R}$ by $f(x) = \sqrt{x} + 1$. The definite integral $\int_1^3 f \, dx$ (simply the area under f, between 1 and 3) is highlighted in blue. Similarly, the definite integral $\int_5^8 f \, dx$ (simply the area under f, between 5 and 8) is highlighted in red.
IMPORTANT REMARK
The indefinite integral $\int f \, dx$ and the definite integral $\int_p^q f \, dx$ have very similar
names and notation. But do not make the mistake of believing that we’ve simply defined
them so that they’re similar — we have not.
It is the two FTCs that establish the connection between the two. This is what makes the
FTCs remarkable and surprising.
And it is because of this connection that we give these two distinctly-defined mathematical
objects such similar names and notation.
58
For the formal definition of the definite integral, see section 88.17 in the Appendices.
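Thus, $\int_p^q f \, dx = A(q) - A(p)$. From this and also with the aid of the FTC1, we can easily prove the FTC2.
$$\int_p^q f \, dx = \left. \int f \, dx \right|_{x=q} - \left. \int f \, dx \right|_{x=p}.$$
$$\int_p^q f \, dx = A(q) - A(p)$$
$$= \left[ \left. \int f \, dx \right|_{x=q} + C \right] - \left[ \left. \int f \, dx \right|_{x=p} + C \right]$$
$$= \left. \int f \, dx \right|_{x=q} - \left. \int f \, dx \right|_{x=p}.$$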
To repeat:
A priori, there is no reason to believe that the two are in any way related. It is the two FTCs that establish their remarkable relationship:
$$\int_a^b f \, dx = \left. \int f \, dx \right|_{x=b} - \left. \int f \, dx \right|_{x=a}.$$
To compute $\int_a^b f \, dx$, one method would have been to painfully add up the area of the “infinitely-many infinitely-slim rectangles”. Thanks to the FTCs, we have a wonderful alternative method that is much easier:
Example 461. Find the exact area bounded by the curve $y = x^2$ and the horizontal lines y = 1 and y = 2.
It’s always helpful to make a quick sketch (given below). Our desired area is labelled A below. To find a desired area, there are usually multiple methods, some quicker than others.
Method #1. The entire rectangle A + B + C + D has area $2 \times 2\sqrt{2} = 4\sqrt{2}$. B has area
$$\int_{-\sqrt{2}}^{-1} x^2 \, dx = \left[ \frac{x^3}{3} \right]_{-\sqrt{2}}^{-1} = -\frac{1}{3} - \left( -\frac{2\sqrt{2}}{3} \right) = \frac{2\sqrt{2} - 1}{3}.$$
By symmetry, D has the same area as B. C has area 1 × 2. Hence, A has area
$$A + B + C + D - (B + C + D) = 4\sqrt{2} - \left( \frac{2\sqrt{2} - 1}{3} + 2 + \frac{2\sqrt{2} - 1}{3} \right) = \frac{4}{3} (2\sqrt{2} - 1).$$
Method #2. The right branch of the parabola $y = x^2$ has equation $x = \sqrt{y}$. The right half of the area A is $\int_{y=1}^{y=2} x \, dy = \int_{y=1}^{y=2} \sqrt{y} \, dy = \frac{2}{3} \left[ y^{3/2} \right]_1^2 = \frac{2}{3} (2\sqrt{2} - 1)$. Hence, $A = \frac{4}{3} (2\sqrt{2} - 1)$.
[Figure: sketch of the parabola y = x² and the lines y = 1 and y = 2, with regions labelled A, B, C, and D.]
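Since both methods give the same exact value, a quick numerical cross-check is cheap. The Python sketch below follows the idea of Method #2: at height y, the region between the two branches of the parabola has width 2√y. It sums thin horizontal strips using the midpoint rule. The helper `midpoint_integral` and the number of strips are our own choices.

```python
import math

def midpoint_integral(g, a, b, n=10_000):
    """Approximate the definite integral of g on [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# At height y, the region between the two branches of y = x^2 has width 2*sqrt(y)
numeric_area = midpoint_integral(lambda y: 2 * math.sqrt(y), 1.0, 2.0)
exact_area = 4 / 3 * (2 * math.sqrt(2) - 1)
print(abs(numeric_area - exact_area) < 1e-8)
```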
Exercise 189. Find the exact area bounded by the curve $y = x^3$, the horizontal lines y = 1 and y = 2, and the vertical axis. (Answer on p. 1137.)
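Example 462. Find the area A bounded by the curve $y = x^2$ and the line $y = x + 1$.
By the quadratic formula, the curve and line intersect at the points $x = \frac{1 \pm \sqrt{5}}{2}$.
$$\int_{(1 - \sqrt{5})/2}^{(1 + \sqrt{5})/2} x + 1 - x^2 \, dx = \left[ \frac{x^2}{2} + x - \frac{x^3}{3} \right]_{(1 - \sqrt{5})/2}^{(1 + \sqrt{5})/2}$$
$$= \left[ \frac{(1 + \sqrt{5})^2}{2^3} + \frac{1 + \sqrt{5}}{2} - \frac{(1 + \sqrt{5})^3}{3 \cdot 2^3} \right] - \left[ \frac{(1 - \sqrt{5})^2}{2^3} + \frac{1 - \sqrt{5}}{2} - \frac{(1 - \sqrt{5})^3}{3 \cdot 2^3} \right]$$
$$= \left[ \frac{6 + 2\sqrt{5}}{8} + \frac{1 + \sqrt{5}}{2} - \frac{16 + 8\sqrt{5}}{24} \right] - \left[ \frac{6 - 2\sqrt{5}}{8} + \frac{1 - \sqrt{5}}{2} - \frac{16 - 8\sqrt{5}}{24} \right]$$
$$= \left[ \frac{3 + \sqrt{5}}{4} + \frac{1 + \sqrt{5}}{2} - \frac{2 + \sqrt{5}}{3} \right] - \left[ \frac{3 - \sqrt{5}}{4} + \frac{1 - \sqrt{5}}{2} - \frac{2 - \sqrt{5}}{3} \right] = \frac{7 + 5\sqrt{5}}{12} - \frac{7 - 5\sqrt{5}}{12} = \frac{5\sqrt{5}}{6}.$$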
Exercise 190. Find the exact area bounded by the curve y = sin x and the line y = 0.5, for x ∈ (0, π/2). (Answer on p. 1138.)
$$= 2 \left[ x - \frac{x^3}{3} + \frac{x^2}{2} \right]_{0.5(1 - \sqrt{5})}^{0.5(1 + \sqrt{5})} = \frac{5\sqrt{5}}{3},$$
where we’ve simply recycled our tedious calculations from the previous example.
Exercise 191. Find the exact area bounded by the curves $y = 2 - x^2$ and $y = x^2 + 1$. (Answer on p. 1138.)
But of course, an area is simply a magnitude, so we’ll take the absolute value and conclude that the desired area is $\frac{32}{3}$.
Exercise 192. Find the exact area bounded by the curve $y = x^4 - 16$ and the x-axis. (Answer on p. 1139.)
Example 465. Consider the curve described by the equations $x = t^3 - 2$ and $y = 4 - t^5$. Find the exact area bounded by the curve, the lines x = −2 and x = −1, and the horizontal axis.
It helps to graph this curve on your graphing calculator:
$$\int_{x=-2}^{x=-1} y \, dx = \int_{x=-2}^{x=-1} 4 - t^5 \, dx = \int_{x=-2}^{x=-1} (4 - t^5) \, 3t^2 \, dt = \int_{t=0}^{t=1} (4 - t^5) \, 3t^2 \, dt = \left[ 4t^3 - \frac{3t^8}{8} \right]_0^1 = 4 - \frac{3}{8} = \frac{29}{8}.$$
Example 466. Consider the line y = 1. Rotate it about the x-axis to form an (infinite) 3D cylinder. Now consider the finite portion of the cylinder between x = 1 and x = 2. By a primary school formula, its volume is Base Area × Height = $\pi 1^2 \times (2 - 1) = \pi$.
We can also compute this same volume using integration. The intuition is that we’re adding up infinitely-many infinitely-thin circle-shaped slices, laid on their sides, from x = 1 to x = 2 (left to right). The face of each of these circles has area $\pi y^2$. In this particular example, y is constant (simply 1). Thus, the total volume is
$$\int_1^2 \pi y^2 \, dx = \int_1^2 \pi \, dx = [\pi x]_1^2 = \pi.$$
$$\int_0^2 \pi y^2 \, dx = \int_0^2 \pi (3x)^2 \, dx = 9\pi \left[ \frac{x^3}{3} \right]_0^2 = 24\pi.$$
Now consider instead the finite portion of the cone between x = 3 and x = 5. This looks
like a pedestal tilted sideways (not illustrated). We can easily compute its volume using
integration:
$$\int_3^5 \pi y^2 \, dx = \int_3^5 \pi (3x)^2 \, dx = 9\pi \left[ \frac{x^3}{3} \right]_3^5 = 294\pi.$$
Computing its volume using geometric formulae is possible, if slightly more tedious. The finite portion of the cone between x = 0 and x = 3 has volume $V_1 = \frac{1}{3} \pi r^2 h = \frac{1}{3} \pi 9^2 \times 3 = 81\pi$. The finite portion of the cone between x = 0 and x = 5 has volume $V_2 = \frac{1}{3} \pi r^2 h = \frac{1}{3} \pi 15^2 \times 5 = 375\pi$. Hence, the desired volume is $V = V_2 - V_1 = 375\pi - 81\pi = 294\pi$.
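The "thin slices" intuition can be imitated directly on a computer: chop the interval into many thin disks and add up their volumes. The Python sketch below does this for the line y = 3x between x = 3 and x = 5, and compares the result with the difference of the two cone volumes. The function `disk_volume` and the number of disks are our own illustrative choices.

```python
import math

def disk_volume(radius, a, b, n=100_000):
    """Sum the volumes of n thin disks of thickness (b - a)/n.

    radius(x) is the distance from the x-axis to the curve at x,
    sampled at the middle of each disk.
    """
    thickness = (b - a) / n
    return sum(math.pi * radius(a + (i + 0.5) * thickness) ** 2 * thickness
               for i in range(n))

numeric = disk_volume(lambda x: 3 * x, 3.0, 5.0)
by_cones = math.pi * 15**2 * 5 / 3 - math.pi * 9**2 * 3 / 3
print(numeric / math.pi, by_cones / math.pi)
```

Both printed numbers should be very close to 294, in agreement with the two calculations above.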
Example 468. Consider the curve $y = x^2$. Find its volume of rotation about the y-axis, from y = 0 to y = 5.
In this case, there are no familiar geometric formulae we can apply. So we really just have to compute this volume using integration. Again, the intuition is that we’re adding up infinitely-many infinitely-thin circle-shaped slices, but this time these circle-shaped slices are stacked from bottom to top, from y = 0 to y = 5. The face of each of these circles has area $\pi x^2$, where in this particular example, $x^2 = y$. Thus, the total volume is
$$\int_0^5 \pi x^2 \, dy = \int_0^5 \pi y \, dy = \pi \left[ \frac{y^2}{2} \right]_0^5 = 12.5\pi.$$
Exercise 194. Compute the volume of rotation of y = sin x about the x-axis from x = 0 to
x = π. (Answer on p. 1139.)
Example 469. Use your TI84 to find the approximate area bounded by the curve y = esin x
and the horizontal axis, between x = 1 and x = 2.
51.1 $\frac{dy}{dx} = f(x)$
$\frac{dy}{dx} = f(x)$ is simply equivalent to $y = \int f \, dx$.
Example 470. Solve $\frac{dy}{dx} = x^2$. Easy: $y = \int x^2 \, dx = \frac{x^3}{3} + C$, where as usual C is the constant of integration.
This is the general solution to the given differential equation. It is general because C is
free to vary and so there are many possible solutions for y.
Example 471. Solve $\frac{dy}{dx} = \sin x$.
y = ∫ sin x dx = − cos x + C, where as usual C is the constant of integration. Again, this is
the general solution to the given differential equation.
If we are given the initial condition that $x = 0 \implies y = 1$, then we can write 1 = − cos 0 + C and find that C = 2. We thus have that y = − cos x + 2. This is the particular solution to the given differential equation (with given initial condition).
Exercise 195. Find the general solution of $\frac{dy}{dx} = e^x \sin x$. Find also the particular solution, if given also the initial condition $x = 0 \implies y = 1$. (Answer on p. 1140.)
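$$\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}} \quad \left( \text{for } \frac{dy}{dx} \neq 0 \right).$$
So given $\frac{dy}{dx} = f(y)$, rearrange to get $\frac{dx}{dy} = \frac{1}{f(y)}$ (for f(y) ≠ 0). Equivalently, $x = \int \frac{1}{f(y)} \, dy$.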
Example 472. Solve $\frac{dy}{dx} = y^2$.
Rearrange to get $\frac{dx}{dy} = \frac{1}{y^2}$ (for $y^2 \neq 0$ or y ≠ 0). Hence, $x = \int \frac{1}{y^2} \, dy = \frac{-1}{y} + C$ (for y ≠ 0).
This is the general solution to the given differential equation.
We will often be asked to express y in terms of x. If so, we can easily rearrange to get $y = \frac{1}{C - x}$ (for x ≠ C). This is also the general solution to the given differential equation!
If given also the initial condition $x = 0 \implies y = 1$, then we have
$$1 = \frac{1}{C - 0} \implies C = 1.$$
Thus, $y = \frac{1}{1 - x}$ is the particular solution to the given differential equation (with given initial condition).
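One way to double-check a particular solution is to let a computer march along the differential equation step by step and see whether it lands on the same curve. The Python sketch below uses the classical fourth-order Runge-Kutta method (a standard numerical technique, not something used in this textbook) to follow dy/dx = y² from the initial condition x = 0, y = 1 up to x = 0.5. The function name `rk4`, the endpoint 0.5, and the step count are our own choices.

```python
def rk4(f, x0, y0, x_end, steps=1_000):
    """Classical fourth-order Runge-Kutta for dy/dx = f(x, y)."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

numeric = rk4(lambda x, y: y * y, 0.0, 1.0, 0.5)
exact = 1 / (1 - 0.5)  # particular solution y = 1/(1 - x) evaluated at x = 0.5
print(abs(numeric - exact) < 1e-8)
```

Note that we stop well short of x = 1, where the particular solution y = 1/(1 − x) is undefined.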
x = − ln(csc y + cot y) + C
That is, for each given value of x, there are infinitely-many possible values of y (one for
each integer m).
But now suppose we have the initial condition $x = 3 \implies y = \frac{\pi}{2}$. In this case, we have
$$3 = -\ln \left| \csc \frac{\pi}{2} + \cot \frac{\pi}{2} \right| + C = -\ln |1| + C = C,$$
so that C = 3. We may write $y = 2 (\cot^{-1} e^{3 - x} + 2m\pi)$. Moreover, plugging in the same values for x and y, we see that
$$\frac{\pi}{2} = 2 (\cot^{-1} e^{3 - 3} + 2m\pi) = \frac{\pi}{2} + 4m\pi.$$
Hence, m = 0 and $y = 2 \cot^{-1} e^{3 - x}$. This is the particular solution to the given differential equation (with given initial condition).
Exercise 196. Find the general solution of $\frac{dy}{dx} = y^2 + 1$. Find also the particular solution, given also the initial condition $x = 0 \implies y = 1$. (Answer on p. 1140.)
$\frac{d^2 y}{dx^2} = f(x)$ is equivalent to $\frac{dy}{dx} = \int f \, dx$, which in turn is equivalent to $y = \int \left( \int f \, dx \right) dx$.
Example 474. Solve $\frac{d^2 y}{dx^2} = x^2$.
$\frac{dy}{dx} = \int x^2 \, dx = \frac{x^3}{3} + C_1$. Next, $y = \int \frac{x^3}{3} + C_1 \, dx = \frac{x^4}{12} + C_1 x + C_2$. This is the general solution to the given differential equation.
$$2 = \frac{1^4}{12} + 1 C_1 + 1 \implies C_1 = \frac{11}{12}.$$
Hence $y = \frac{x^4}{12} + \frac{11}{12} x + 1$ is the particular solution.
Example 475. Solve $\frac{d^2 y}{dx^2} = \sin x$.
$\frac{dy}{dx} = \int \sin x \, dx = -\cos x + C_1$. Next, $y = \int -\cos x + C_1 \, dx = -\sin x + C_1 x + C_2$. This is the general solution to the given differential equation.
$$1 = -\sin 0 + 0 C_1 + C_2 \implies C_2 = 1,$$
$$2 = -\sin \pi + \pi C_1 + 1 \implies C_1 = \frac{1}{\pi}.$$
Hence $y = -\sin x + \frac{1}{\pi} x + 1$ is the particular solution.
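A particular solution of a second-order equation has to pass three tests: it must satisfy both of the given conditions, and its second derivative must equal the right-hand side. The Python sketch below runs these tests numerically for y = − sin x + x/π + 1, reading the two conditions off the equations above (y = 1 when x = 0, and y = 2 when x = π). The helper `second_difference` and the sample points are our own.

```python
import math

def y(x):
    return -math.sin(x) + x / math.pi + 1

def second_difference(g, x, h=1e-4):
    """Approximate g''(x) by a central second difference."""
    return (g(x + h) - 2 * g(x) + g(x - h)) / (h * h)

# The two conditions used to pin down C1 and C2
print(abs(y(0.0) - 1) < 1e-12, abs(y(math.pi) - 2) < 1e-12)

# The second derivative should match sin x at every sample point
worst = max(abs(second_difference(y, x) - math.sin(x))
            for x in (0.5, 1.0, 2.0, 3.0))
print(worst < 1e-6)
```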
Exercise 197. Find the general solution of $\frac{d^2 y}{dx^2} = e^x \sin x$. Find also the particular solution, given also that $x = 0 \implies y = 1$. (Answer on p. 1141.)
Example 476. A plate of bacteria grows at a rate that is inversely proportional to the
number of bacteria. Express the number of bacteria as a function of time.
Let x be the number of bacteria. Let t be time. We are given that the growth rate of x is inversely proportional to x. In other words, $\frac{dx}{dt} = \frac{k}{x}$, for some constant k ∈ R. Rearranging, we have $\frac{dt}{dx} = \frac{x}{k}$. Thus,
$$t = \int \frac{x}{k} \, dx = \frac{x^2}{2k} + C.$$
Further rearranging, we have $x = \pm \sqrt{2k(t - C)}$, where of course the negative root may be rejected. Hence, $x = \sqrt{2k(t - C)}$.
$$1 \overset{a}{=} \sqrt{2k(-C)} \quad \text{and} \quad 2 \overset{b}{=} \sqrt{2k(1 - C)}.$$
From $\overset{a}{=}$, we have C = −1/(2k). Plug this into $\overset{b}{=}$ and we have 4 = 2k(1 + 1/(2k)) = 2k + 1 or k = 3/2. Hence C = −1/3. Altogether then, the particular solution is $x = \sqrt{3t + 1}$.
Momentum is defined as the product of mass m and velocity v. Newton’s Second Law of
Motion states that force is the rate of change of momentum.
(b) (i) Write down Newton’s Second Law in the form of an equation.
(ii) Assume that mass m is constant. Explain why $F = m \frac{dv}{dt}$.
Now suppose M and m are, respectively, the masses of the Earth and a small ball. Assume
that
(c) The small ball is initially held at rest, x m above the surface of the Earth. It is then released. Let v be the velocity of the ball. Explain why $\frac{GM}{r^2} = -\frac{dv}{dt}$. (In particular, explain why there is a negative sign.)
$$\int_{R+x}^{R} \frac{GM}{r^2} \, dr = \int_{R+x}^{R} -\frac{dv}{dt} \, dr.$$
Let vs be the velocity at which the ball hits the surface of the Earth.
(d) (i) Show that the LHS of the above equation is equal to $GM \left( -\frac{1}{R} + \frac{1}{R + x} \right)$.
(ii) Show that the RHS of the above equation is equal to $-\frac{v_s^2}{2}$. (Hint 1: Use integration by substitution. Hint 2: What is $\frac{dr}{dt}$?)
(iii) Hence show that $v_s = -\sqrt{2GM \left( \frac{1}{R} - \frac{1}{R + x} \right)}$. Again, explain why $v_s$ is negative.
Suppose instead that the small ball is initially at rest on the surface of the earth. It is then propelled upwards at a velocity V.
(e) Explain why the ball will reach a maximum height of x m, where $V = \sqrt{2GM \left( \frac{1}{R} - \frac{1}{R + x} \right)}$, before falling back down to the earth.
(f) The escape velocity $v_e$ is the velocity with which we must propel the ball upwards (from its initial resting position on the surface of the earth). Explain why $v_e = \sqrt{\frac{2GM}{R}}$.
(g) Given that $G = 6.674 \times 10^{-11}$ m³ kg⁻¹ s⁻², $M = 5.972 \times 10^{24}$ kg, and R = 6,371 km, compute $\sqrt{\frac{2GM}{R}}$ (express your answer in km s⁻¹, correct to 4 significant figures).
SYLLABUS ALERT
This is in the 9740 (old) syllabus, but not in the 9758 (revised) syllabus. So you can skip
this section if you’re taking 9758.
Example 477. The general solution to $\frac{dy}{dx} = x^2$ is $y = \int x^2 \, dx = \frac{x^3}{3} + C$.
The corresponding family of solution curves is the set of equations $\left\{ y = \frac{x^3}{3} + C : C \in \mathbb{R} \right\}$.
This family is illustrated below.
Exercise 199. Sketch five members of the family of solution curves for $\frac{d^2 y}{dx^2} = x$, given also that $x = 0 \implies y = 1$. (Answer on p. 1144.)
How many arrangements or permutations are there of the three letters in CAT? For
example, one possible permutation of CAT is TCA.
To solve this problem, one possible method is the method of enumeration. That is,
simply list out (enumerate) all the possible permutations.
To help us count more efficiently, we’ll learn about four basic principles of counting:
Example 478. For lunch today, I can either go to the food court or the hawker centre. At
the food court, I have 2 choices: ramen or briyani. At the hawker centre, I have 3 choices:
bak chor mee, nasi lemak, or kway teow.
Altogether then, I have 2 + 3 = 5 choices of what to eat for lunch today.
The Addition Principle (AP). I have to choose a destination, out of two possible areas.
At area #1, there are p possible destinations to choose from. At area #2, there are q possible
destinations to choose from.
The Addition Principle (AP) simply states that I have, in total, p + q different choices.
(Just so you know, the AP is sometimes also called the Second Principle of Counting
or the Rule of Sum or the Disjunctive Rule.)
Of course, the AP generalises to cases where there are more than just 2 “areas”. It may
seem a little silly, but just to illustrate, let’s use the AP to tackle the CAT problem:
59
See section 89.1 in the Appendices (optional) for a more precise statement of the AP.
Case #1. First letter is an A. Then the next two letters are either CT or TC — 2
possibilities.
Case #2. First letter is a C. Then the next two letters are either AT or TA — 2 possibilities.
Case #3. First letter is a T. Then the next two letters are either AC or CA — 2 possibilities.
Altogether then, by the AP, there are 2 + 2 + 2 = 6 possibilities. That is, there are 6 possible
permutations of the letters in CAT. These are illustrated in the tree diagram below.
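For a word as short as CAT, the method of enumeration is something a computer can do in an instant. The Python sketch below lists every permutation using the standard library's `itertools.permutations`, then groups them by first letter to mirror the three cases above. The variable names are ours.

```python
from itertools import permutations

arrangements = ["".join(p) for p in permutations("CAT")]
print(arrangements)

# Group by first letter, mirroring the three cases of the Addition Principle
by_first_letter = {}
for word in arrangements:
    by_first_letter.setdefault(word[0], []).append(word)
print({letter: len(words) for letter, words in by_first_letter.items()})
```

Each first letter should account for 2 arrangements, for 2 + 2 + 2 = 6 in total.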
Exercise 201. How many permutations are there of the letters in the word DEED? Illus-
trate your answer with a tree diagram similar to that given in the CAT example above.
(Answer on p. 1145.)
Example 480. For lunch today, I can either have prata or horfun. For dinner tonight, I
can have McDonald’s, KFC, or Pizza Hut.
Enumeration shows that I have a total of 6 possible choices for my two meals today:
Alternatively, we can use the Multiplication Principle (MP). I have 2 choices for lunch
and 3 choices for dinner. Hence, for my two meals today, I have in total 2 × 3 = 6 possible
choices.
The Multiplication Principle (MP). I have to choose two destinations, one from each
of two possible areas. At area #1, there are p possible destinations to choose from. At area
#2, there are q possible destinations to choose from.
The Multiplication Principle (MP) simply states that I have, in total, p × q different choices.
Of course, the MP generalises to cases where there are more than just 2 “areas”. Here’s an
example where we have to make 3 decisions:
60
See section 89.1 in the Appendices (optional) for a more precise statement of the MP.
(SF, BPC, A), (SF, BPC, B), (SF, BPC, C), (SF, CF, A),
(SF, CF, B), (SF, CF, C), (BN, BPC, A), (BN, BPC, B),
(BN, BPC, C), (BN, CF, A), (BN, CF, B), (BN, CF, C).
Example 482. Problem: How many four-letter words can be formed using the letters in
the 26-letter alphabet?
Let’s rephrase this problem so that it is clearly in the framework of the MP. We have 4
blank spaces to be filled:
_ _ _ _.
1 2 3 4
These 4 blank spaces correspond to 4 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space? Decision #4: What letter to
put in the fourth blank space?
For Decision #1, we can put A, B, C, ..., or Z. So we have 26 choices for Decision #1.
For Decision #2, we can again put A, B, C, ..., or Z. So we again have 26 choices for
Decision #2.
We likewise have 26 choices for Decision #3 and also 26 choices for Decision #4.
Altogether then, by the MP, there are $26 \times 26 \times 26 \times 26 = 26^4 = 456{,}976$ ways to make our four decisions.
Solution: There are $26^4 = 456{,}976$ possible four-letter words that can be formed using the 26-letter alphabet.
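If you don't trust the MP, you can ask a computer to make all four decisions in every possible way and count the results. The Python sketch below does exactly that with `itertools.product`, which generates every way of filling the 4 blank spaces with a letter from A to Z. The variable `count` is our own name.

```python
from itertools import product
from string import ascii_uppercase

# Each element of the product is one way of filling the four blank spaces
count = sum(1 for _ in product(ascii_uppercase, repeat=4))
print(count, count == 26 ** 4)
```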
_ _.
1 2
These 2 blank spaces correspond to 2 decisions to be made. Decision #1: What number to
put in the first blank space? Decision #2: What letter to put in the second blank space?
For Decision #1, we can put 1, 2, 3, ..., or 18. So we have 18 choices for Decision #1.
For Decision #2, we can put A, B, C, D, E, or F. So we have 6 choices for Decision #2.
Altogether then, by the MP, there are 18 × 6 = 108 ways to make our two decisions. In other
words, there are 108 possible outcomes from rolling these two dice.
(If necessary, it is tedious but not difficult to enumerate them: 1A, 1B, 1C, 1D, 1E, 1F,
2A, 2B, ..., 17E, 17F, 18A, 18B, 18C, 18D, 18E, and 18F.)
Exercise 202. A club has a shortlist of 3 men for president, 5 animals for vice-president,
and 10 women for club mascot. How many possible ways are there to choose the president,
the vice-president, and the mascot? (Answer on p. 1146.)
Example 484. For lunch today, I can either go to the food court or the hawker centre. At
the food court, I have 4 choices of cuisine: Chinese, Indian, Malay, and Western. At the
hawker centre, I have 3 choices of cuisine: Chinese, Malay, and Thai.
There are 2 choices of cuisine that are common to both the food court and the hawker
centre (Chinese and Malay).
Why do we subtract 2? If we simply added the 4 choices available at the food court to the
3 available at the hawker centre, then we’d double-count the Chinese and Malay cuisines,
which are available at both the food court and the hawker centre. And so we must subtract
the 2 cuisines that are at both locations.
Hence, by the IEP, there are 10 + 4 − 2 = 12 integers that are divisible by either 2 or 5.
(These are namely 2, 4, 5, 6, 8, 10, 12, 14, 15, 16, 18, and 20.)
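The IEP count can be checked with sets. The Python sketch below assumes that the integers in question are 1 to 20, which matches the twelve numbers just listed. It builds the set of multiples of 2 and the set of multiples of 5, then compares the size of their union with "add the two sizes, subtract the overlap". The variable names are ours.

```python
numbers = range(1, 21)  # assumed range: the integers 1 to 20
div_by_2 = {n for n in numbers if n % 2 == 0}
div_by_5 = {n for n in numbers if n % 5 == 0}
both = div_by_2 & div_by_5     # counted twice if we simply add
either = div_by_2 | div_by_5   # what we actually want to count

print(len(div_by_2), len(div_by_5), len(both), len(either))
print(sorted(either))
```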
Exercise 204. (Answer on p. 1147.) The food court has 4 types of cuisine: Chinese,
Indonesian, Korean, and Western. The hawker centre has 3: Chinese, Malay, and Western.
A restaurant has 3: Chinese, Japanese, or Malay.
In total, how many different types of cuisine are there? Illustrate your answer with a Venn
diagram.
61
See section 89.1 in the Appendices (optional) for a more precise statement of the IEP.
Example 486. The food court has 4 types of cuisine: Chinese, Malay, Indian, and Other.
I’m at the food court but don’t feel like eating Malay or Chinese. So by the Complements
Principle (CP), I have 4 − 2 = 2 possible choices of cuisine (Indian and Other).
The Complements Principle (CP). There are p possible destinations. I must choose
one. I rule out q of the possible destinations.
Exercise 205. There are 10 Southeast Asian countries, of which 3 (Brunei, Indonesia, and
the Philippines) are not on the mainland. How many mainland Southeast Asian countries
are there that a European tourist can visit? (Answer on p. 1147.)
62
See section 89.1 in the Appendices (optional) for a more precise statement of the CP.
In this chapter, we’ll use the MP to generate several more methods of counting.
But first, some notation you should find familiar from secondary school:
Definition 103. Let $n \in \mathbb{Z}_0^+$. Then n-factorial, denoted n!, is defined by $n! = n \times (n - 1) \times \cdots \times 1$ for n ≥ 1 and 0! = 1.
Example 488. Problem: How many permutations (or arrangements) are there of the three
letters in the word CAT?
Let’s rephrase this problem in the framework of the MP. Consider three blank spaces:
_ _ _.
1 2 3
These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
For Decision #1, we can put C, A, or T. So we have 3 choices for Decision #1.
Having already used up a letter in Decision #1, we are left with two letters. So we have 2
choices for Decision #2.
Having already used up a letter in Decision #1 and another in Decision #2, we are left
with just one letter. So we have only 1 choice for Decision #3.
Altogether then, by the MP, there are 3×2×1 = 3! = 6 possible ways of making our decisions.
This is also the number of ways there are to arrange the three letters in the word CAT.
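The six arrangements can also be listed by computer. The following is only a sketch, using Python’s standard `itertools.permutations`; the name `cat_perms` is ours.

```python
from itertools import permutations
from math import factorial

# Each permutation is a tuple of letters; join it back into a word.
cat_perms = ["".join(p) for p in permutations("CAT")]

print(cat_perms)       # ['CAT', 'CTA', 'ACT', 'ATC', 'TCA', 'TAC']
print(len(cat_perms))  # 6
print(len(cat_perms) == factorial(3))  # True
```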
Example 489. Problem: How many permutations are there of the 13 letters in the
word UNPREDICTABLY?
Again, let’s rephrase this problem in the framework of the MP. Consider 13 blank spaces:
_ _ _ _ _ _ _ _ _ _ _ _ _.
1 2 3 4 5 6 7 8 9 10 11 12 13
These 13 blank spaces correspond to 13 decisions to be made. Decision #1: What letter
to put in the first blank space? Decision #2: What letter to put in the second blank space?
... Decision #13: What letter to put in the 13th blank space?
For Decision #1, we can put any of the 13 letters. So we have 13 choices for Decision #1.
For Decision #2, having already used up a letter in Decision #1, we are left with 12 letters.
So we have 12 choices for Decision #2.
For Decision #3, having already used up a letter in Decision #1 and another letter in
Decision #2, we are left with 11 letters. So we have 11 choices for Decision #3.
For Decision #13, having already used up a letter in Decision #1, another in Decision #2,
another in Decision #3, ..., and another in Decision #12, we are left with one letter. So
we have 1 choice for Decision #13.
Altogether then, by the MP, there are 13 × 12 × ⋯ × 2 × 1 = 13! = 6,227,020,800 possible
ways of making our decisions. This is also the number of ways there are to arrange the 13
letters in the word UNPREDICTABLY.
Consider n empty spaces. We are to fill them with n distinct objects.
_ _ _ . . . _.
1 2 3 n
For space #1, we have n possible choices. For space #2, we have n − 1 possible choices
(because one object was already placed in space #1). ... And finally for space #n, we have
only 1 object left and thus only 1 choice. By the MP then, there are n × (n − 1) × ⋅ ⋅ ⋅ × 1 = n!
possible ways of filling in these n spaces with the n distinct objects.
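The argument above can be mimicked in a few lines of Python. This is just a sketch: the function name `count_fillings` is ours, and the brute-force comparison is only feasible for small n.

```python
from itertools import permutations
from math import factorial

def count_fillings(n):
    """Multiply the number of choices for each space: n, n - 1, ..., 1."""
    total = 1
    for choices in range(n, 0, -1):
        total *= choices
    return total

# Compare with the built-in factorial and with brute-force enumeration.
for n in range(8):
    brute_force = sum(1 for _ in permutations(range(n)))
    assert count_fillings(n) == factorial(n) == brute_force

print(count_fillings(3))   # 6
print(count_fillings(13))  # 6227020800
```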
Example 490. The word COWDUNG has seven distinct letters. Hence, there are 7! = 5040
permutations of the letters in the word COWDUNG.
63 This is informal because, amongst other omissions, we haven’t yet given a precise definition of the term permutation.
In the previous section, we saw that there are 3! permutations of the three letters in the
word CAT and 13! permutations of the 13 letters in the word UNPREDICTABLY. We
made an important note: In each of these words, there was no repeated letter.
We now consider permutations of a set where some elements are repeated.
Example 491. How many permutations are there of the three letters in the word SEE?
A naïve application of the MP would suggest that the answer is 3! = 6. This is wrong.
Enumeration shows that there are only 3 possible permutations:
EES, ESE, SEE.
To see why a naïve application of the MP fails, set up the problem in the framework of the
MP. Consider 3 blank spaces:
_ _ _.
1 2 3
These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
For Decision #1, we can put E or S. So we have 2 choices for Decision #1.
But now the number of choices available for Decision #2 depends on what we chose for
Decision #1! (If we chose E in Decision #1, then we again have 2 choices for Decision
#2. But if instead we chose S in Decision #1, then we now have only 1 choice for Decision
#2.) This violates the implicit but important assumption in the MP that the number of
choices available in one decision is independent of the choice made in the other decision.
Hence, the MP does not directly apply.
The reason SEE has only 3 possible permutations (instead of 3! = 6) is that it contains a
repeated element, namely E. But why would this make any difference?
To understand why, let’s rename the second E as Ê, so that the word SEE is now
transformed into a new word SEÊ. From the three letters of this new word, we’d again have
3! = 6 possible permutations:
SEÊ, SÊE, ESÊ, ÊSE, EÊS, ÊES.
Restricting attention to the two letters EÊ, we see that there are 2! = 2 ways to permute
these two letters. Hence, any single permutation (in the case where we do not distinguish
between the two E’s) corresponds to 2 possible permutations (in the case where we do). The
figure below illustrates how the 3 permutations of SEE correspond to the 6 permutations
in SEÊ.
Hence, when we do not distinguish between the two E’s, there are only half as many possible
permutations.
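Here is a small Python sketch of the same comparison. The labels "E1" and "E2" are our stand-ins for E and Ê; storing the arrangements in a set removes any duplicates.

```python
from itertools import permutations

# The two E's are distinguished (like E and Ê): all 3! arrangements differ.
labelled = set(permutations(["S", "E1", "E2"]))

# The two E's are identical: duplicates collapse inside the set.
unlabelled = set(permutations("SEE"))

print(len(labelled), len(unlabelled))  # 6 3
print(sorted("".join(p) for p in unlabelled))  # ['EES', 'ESE', 'SEE']
```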
Example 492. How many permutations are there of the four letters in the word SASS?
The answer is 4!/3! = 4. Let’s see why.
If we distinguish between the three S’s, perhaps by calling them S, Ŝ, and S̄, then we’d
have 4! = 24 possible permutations of the letters in the word SAŜS̄.
But amongst the three S’s themselves, we have 3! = 6 possible permutations: SŜS̄, SS̄Ŝ,
ŜSS̄, S̄SŜ, ŜS̄S, and S̄ŜS. So distinguishing between the three S’s increases by 6-fold the
number of possible permutations. Working backwards, the word SASS thus has one-sixth
as many permutations as SAŜS̄. That is, SASS has 4!/3! = 4 possible permutations.
The figure below illustrates how the 4 possible permutations of SASS correspond to the 24
possible permutations of SAŜS̄.
Answer: $\frac{4!}{2!\,2!}$.
In the numerator, the 4! corresponds to the total of 4 letters. In the denominator, the 2!
corresponds to the 2 D’s and the 2! corresponds to the 2 E’s. Where do these numbers
come from?
Let x be the number of permutations we are looking for. If we distinguish between the
two D’s, then we’d increase by 2!-fold the number of possible permutations, to x ⋅ 2!. If,
in addition, we distinguish between the 2 E’s, then we’d increase again by 2!-fold the
number of possible permutations, to x ⋅ 2! ⋅ 2!. But we know that if all 4 letters are
distinct, then there are 4! possible permutations. Therefore,
$$x \cdot 2! \cdot 2! = 4!, \quad \text{so that} \quad x = \frac{4!}{2!\,2!} = 6.$$
You can go back and check that this answer is consistent with our answer for Exercise 201
(above).
Answer: $\frac{8!}{2!\,5!}$.
In the numerator, the 8! corresponds to the total of 8 letters. In the denominator, the 2!
corresponds to the 2 E’s and the 5! corresponds to the 5 S’s. Where do these come from?
Let y be the number of permutations we are looking for. If we distinguish between the
two E’s, then we’d increase by 2!-fold the number of possible permutations, to y ⋅ 2!. If,
in addition, we distinguish between the 5 S’s, then we’d increase again by 5!-fold the
number of possible permutations, to y ⋅ 2! ⋅ 5!. But we know that if all 8 letters are
distinct, then there are 8! possible permutations. Therefore,
$$y \cdot 2! \cdot 5! = 8!, \quad \text{so that} \quad y = \frac{8!}{2!\,5!} = 168.$$
Fact 63. Consider n objects, only k of which are distinct. Let $r_1, r_2, \ldots, r_k$ be the
numbers of times the 1st, 2nd, ..., and kth distinct objects appear. (So $r_1 + r_2 + \cdots + r_k = n$.)
Then the number of possible ways to permute these n objects is
$$\frac{n!}{r_1!\, r_2! \cdots r_k!}.$$
More examples:
Example 495. How many permutations are there of the six letters in the word BANANA?
We have three distinct letters — B, A, and N. The letter B appears 1 time. The letter A
appears 3 times. The letter N appears 2 times. Hence, by the above Fact, the number of
possible permutations of these 6 letters is
$$\frac{6!}{1!\,3!\,2!} = 60.$$
Of course, 1! is simply equal to 1. So for the denominator, we shall usually not bother to
write out any 1!. So we will normally instead write that the number of permutations of
BANANA is:
$$\frac{6!}{3!\,2!} = 60.$$
Example 496. How many permutations are there of the 11 letters in the word
MISSISSIPPI?
We have four distinct letters — M, I, S, and P. The letter M appears 1 time. The letter I
appears 4 times. The letter S appears 4 times. The letter P appears 2 times. Hence, by
the above Fact, the number of possible permutations of these 11 letters is
$$\frac{11!}{4!\,4!\,2!} = 34{,}650.$$
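The formula in Fact 63 is easy to turn into a short function. What follows is a sketch only: the function name `count_permutations` is ours, and we use `collections.Counter` from Python’s standard library to count how often each letter appears.

```python
from collections import Counter
from math import factorial

def count_permutations(word):
    """n! divided by r! for each distinct letter that appears r times."""
    total = factorial(len(word))
    for repeats in Counter(word).values():
        total //= factorial(repeats)
    return total

print(count_permutations("SEE"))          # 3
print(count_permutations("SASS"))         # 4
print(count_permutations("BANANA"))       # 60
print(count_permutations("MISSISSIPPI"))  # 34650
```

When every letter is distinct, each r equals 1 and the function simply returns n!.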
Exercise 207. There are 3 identical white tiles and 4 identical black tiles. How many ways
are there of arranging these 7 tiles in a row? (Answer on p. 1148.)
Informal Definition. Two circular permutations are equivalent if one can be transformed
into the other by means of a rotation.
Example 497. There are 3! = 6 (linear) permutations of CAT. That is, there are 3! = 6
possible ways to fill the letters C, A, and T into these 3 linearly-arranged spaces:
_ _ _.
1 2 3
In contrast, there are only 2! = 2 circular permutations of CAT. That is, there are only
2! = 2 possible ways to fill the same letters into these 3 circularly-arranged spaces:
The three seemingly-different arrangements above are considered to be the same circular
permutation. This is because any arrangement is simply a rotation of another. Take the
left red arrangement, rotate it clockwise by one-third of a circle to get the middle green
arrangement. Repeat the rotation to get the right blue arrangement.
The second and only other circular arrangement of CAT is shown below. Again, these
three seemingly-different arrangements are considered to be the same circular permutation.
This is because any arrangement is simply a rotation of another. Take the left black ar-
rangement, rotate it clockwise by one-third of a circle to get the middle pink arrangement.
Repeat the rotation to get the right orange arrangement.
Note, importantly, that the arrangement (or three arrangements) below cannot be rotated
to get the arrangement (or three arrangements) above. Hence, the arrangement below is
indeed distinct from the arrangement above.
It turns out that in general, if we have n distinct objects, there are (n − 1)! ways to arrange
them in a circle. So here there are only (3 − 1)! = 2! = 2 ways to arrange CAT in a circle.
Proof. Given n distinct objects, any 1 circular permutation can be rotated n times to obtain
n distinct (linear) permutations. Hence, there are n times as many (linear) permutations
as there are circular permutations.
But we already know that there are n! (linear) permutations of n distinct objects. Hence,
there are n!/n = (n − 1)! circular permutations of n distinct objects.
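For small n, we can confirm the (n − 1)! count by brute force. The sketch below is ours (including the name `count_circular`): it rotates every linear permutation so that its smallest object comes first, so that all rotations of the same circular permutation end up looking identical, and then counts the distinct results.

```python
from itertools import permutations
from math import factorial

def count_circular(objects):
    """Count arrangements of distinct objects, treating rotations as equal."""
    seen = set()
    for p in permutations(objects):
        i = p.index(min(p))       # position of the smallest object
        seen.add(p[i:] + p[:i])   # rotate so that it comes first
    return len(seen)

print(count_circular("CAT"))  # 2

# Agreement with (n - 1)! for a few small values of n.
for n in range(1, 7):
    assert count_circular(range(n)) == factorial(n - 1)
```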
Exercise 208. How many ways are there to seat 10 people in a circle? (Answer on p.
1148.)
Note that if there are repeated objects, then the problem is considerably more difficult. See
Section 89.2 in the Appendices for a brief discussion.
Example 498. Using the 26-letter alphabet, how many 3-letter words can we form that
have no repeated letters? This, of course, is simply the problem of filling in these 3 empty
spaces using 26 distinct elements. For space #1, we have 26 possible choices. For space
#2, we have 25. And for space #3, we have 24.
_ _ _.
1 2 3
By the MP then, the number of ways to fill the three spaces is 26 × 25 × 24. This is also the
number of three-letter words with no repeated letters.
Problems like the above example crop up often enough to motivate a new piece of notation:
Definition 104. Let n, k be positive integers with n ≥ k. Then P (n, k), read aloud as n
permute k, is defined by
$$P(n, k) = \frac{n!}{(n-k)!}.$$
P (n, k) answers the following question: “Given n distinct objects and k spaces (where
k ≤ n), how many ways are there to fill the k spaces?”
Just so you know, P(n, k) is also variously denoted $nPk$, $P^n_k$, ${}_nP_k$, etc., but we’ll
stick solely with P(n, k) in this textbook.
Example 498 (continued from above). The number of 3-letter words without repeated
letters is simply P (26, 3) = 26!/23! = 26 × 25 × 24.
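Here is a small Python sketch of Definition 104. The function name `P` simply mirrors our notation. We also compare against `math.perm`, which we assume is available (it was added to Python’s standard library in version 3.8).

```python
from math import factorial, perm

def P(n, k):
    """Number of ways to fill k ordered spaces using n distinct objects."""
    return factorial(n) // factorial(n - k)

print(P(26, 3))      # 15600
print(26 * 25 * 24)  # 15600
print(perm(26, 3))   # 15600
```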
Example 499. Problem: Using the 22-letter Phoenician alphabet, how many 4-letter words
can we form that have no repeated letters?
This, of course, is simply the problem of filling in these 4 empty spaces using 22 distinct
elements. So the answer is P(22, 4) = 22!/18! = 22 × 21 × 20 × 19 words.
Exercise 209. Out of a committee of 11 members, how many ways are there to choose a
president and a vice-president? (Answer on p. 1148.)
Example 500. At a dance party, there are 7 heterosexual married couples (and thus 14
people in total). Problem #1. How many ways are there of arranging them in a line,
with the restriction that every person is next to his or her partner?
Think of there as being 7 units (each unit being a couple). There are 7! ways to arrange
these 7 units in a line. Within each unit, there are 2 possible arrangements. Hence, in
total, there are $7! \times 2^7$ possible arrangements.
Problem #2. Repeat the above problem, but now for a circle, rather than a line.
There are 6! ways to arrange the 7 units in a circle. Within each unit, there are 2 possible
arrangements. Hence, in total, there are $6! \times 2^7$ possible arrangements.
Problem #3. How many ways are there of arranging them in a circle, with the restriction
that every man is to the right of his wife?
There are 6! ways to arrange the 7 units in a circle. Within each unit, there is only 1
possible arrangement. Hence, in total, there are 6! possible arrangements.
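With 7 couples, brute force is out of the question (14! is far too large). But we can sanity-check the reasoning of Problems #1 and #2 on a smaller case. The sketch below assumes only 3 couples; all names in it are ours. For the circle, we pin one person to seat 0 so that rotations are not counted more than once.

```python
from itertools import permutations
from math import factorial

N = 3  # number of couples (small on purpose; the example above has 7)
people = [(couple, role) for couple in range(N) for role in "WH"]

def all_with_partner(seating, circular):
    """True if every person sits next to his or her partner."""
    n = len(seating)
    for i, (couple, _) in enumerate(seating):
        if circular:
            neighbours = [(i - 1) % n, (i + 1) % n]
        else:
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        if all(seating[j][0] != couple for j in neighbours):
            return False
    return True

# Problem #1 (line): expect N! * 2^N.
line_count = sum(all_with_partner(p, False) for p in permutations(people))

# Problem #2 (circle): pin the first person to seat 0; expect (N - 1)! * 2^N.
first, rest = people[0], people[1:]
circle_count = sum(
    all_with_partner((first,) + p, True) for p in permutations(rest)
)

print(line_count, factorial(N) * 2**N)        # 48 48
print(circle_count, factorial(N - 1) * 2**N)  # 16 16
```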
Example 501. (I assume you’re familiar with the standard 52-card deck.)
Problem #1. Using a standard 52-card deck, how many ways are there of arranging any
3 cards in a line, with the restriction that no two cards of the same suit are next to each
other?
This is the problem of filling in 3 spaces with 52 distinct objects. For space #1, we have
52 possible choices.
_ _ _.
1 2 3
For space #2, having picked a card of suit X for space #1, we must pick a card from some
other suit Y. And so there are only 39 possible choices (we have three suits available —
that’s 3 × 13 = 39).
For space #3, having picked a card of suit Y for space #2, we must pick a card from some
other suit Z. Note that suit Z can be the same as suit X. And so there are 38 possible choices
(we have three suits available, less the card used for space #1: that’s 3 × 13 − 1 = 38).
By the MP, there are thus 52 × 39 × 38 = 77,064 possible arrangements.
Problem #2. Repeat the above problem, but now for a circle, rather than a line.
One subtle thing is that, in addition to space #1 being of a different suit from space #2
and space #2 being of a different suit from space #3, we must also have that space #3 is
of a different suit from space #1. Thus, there are 52 × 39 × 26 possible ways to fill in these
three spaces, if they were in a line.
Since they are instead in a circle, there are 52 × 39 × 26 ÷ 3 possible ways to arrange three
cards in a circle, with the condition that no two cards of the same suit are next to each
other.
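Both problems are small enough to check by brute force. In the sketch below, the card representation (a rank from 1 to 13 paired with a suit letter) and all names are ours. For the circle, we keep one representative of each set of three rotations.

```python
from itertools import permutations

# A card is a (rank, suit) pair; 4 suits of 13 ranks each.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]

def suit(card):
    return card[1]

# Problem #1 (line): neighbouring cards must differ in suit.
line_count = sum(
    1
    for a, b, c in permutations(deck, 3)
    if suit(a) != suit(b) and suit(b) != suit(c)
)

# Problem #2 (circle): all three suits must differ, and the three rotations
# of an arrangement count as one, so we keep only the smallest rotation.
circle_arrangements = {
    min((a, b, c), (b, c, a), (c, a, b))
    for a, b, c in permutations(deck, 3)
    if len({suit(a), suit(b), suit(c)}) == 3
}
circle_count = len(circle_arrangements)

print(line_count, 52 * 39 * 38)         # 77064 77064
print(circle_count, 52 * 39 * 26 // 3)  # 17576 17576
```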
Exercise 210. (Answer on p. 1148.) There are 4 brothers and 3 sisters. In how many
ways can they be arranged ...
(a) in a line, without any 2 brothers being next to each other?
(b) in a line, without any 2 sisters being next to each other?
(c) in a circle, without any 2 brothers being next to each other?
(d) in a circle, without any 2 sisters being next to each other?
P (n, k) is the number of ways we can fill k (ordered) spaces using n distinct objects.
In contrast, C(n, k) is the number of ways of choosing k out of n distinct objects.
Equivalently, it is the same problem of filling k spaces using n distinct objects, except
that now order does not matter.
Example 502. Suppose we have a committee of 13 members and wish to select a president
and a vice-president. This is equivalent to the problem of filling in 2 spaces, given 13
distinct objects.
_ _.
1 2
By the MP, there are P(13, 2) = 13 × 12 = 156 ways of doing so.
Suppose instead that we want to choose two co-presidents. How many ways are there of
doing so?
This is simply the same problem as before — again we want to fill in 2 spaces, given 13
distinct objects. The only difference now is that the order of the 2 chosen objects
does not matter. So the answer must be that there are P(13, 2)/2! = 78 ways of choosing
the two co-presidents.
Example 503. How many ways are there of choosing 5 cards out of a standard 52-card
deck?
_ _ _ _ _.
1 2 3 4 5
First, how many ways are there to fill 5 spaces using 52 distinct objects (where order
matters)? Answer: P(52, 5) = 52 × 51 × 50 × 49 × 48 = 311,875,200.
And so if we don’t care about order, we must adjust this number by dividing by 5! to get
P(52, 5)/5! = 2,598,960. So the answer is that to choose 5 cards out of a 52-card deck,
there are 2,598,960 ways.
The above examples suggest that, in general, to choose k out of n given distinct objects,
there are P (n, k)/k! possible ways. This motivates the following definition:
$$C(n, k) = \frac{P(n, k)}{k!} = \frac{n!}{(n-k)!\,k!}.$$
It turns out that C(n, k) appears so often in maths that it has many alternative notations;
one of the most common is $\binom{n}{k}$.
“n choose k” also has several names, such as the combination, the combinatorial
number, and even the binomial coefficient. Shortly, we’ll see why the name binomial
coefficient makes sense.
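The definition of C(n, k) translates directly into Python. This is a sketch under the assumption that `math.perm` and `math.comb` are available (Python 3.8 or later); the function name `C` just mirrors our notation.

```python
from itertools import combinations
from math import comb, factorial, perm

def C(n, k):
    """Ways to choose k out of n distinct objects: P(n, k) / k!."""
    return perm(n, k) // factorial(k)

print(C(13, 2))  # 78       (two co-presidents out of 13 members)
print(C(52, 5))  # 2598960  (5 cards out of a 52-card deck)

# Brute-force check for the committee: list every unordered pair.
pairs = list(combinations(range(13), 2))
print(len(pairs))  # 78
```

Python’s built-in `comb(n, k)` computes the same number, so `comb(52, 5)` also gives 2598960.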
Exercise 211 gives an alternate expression for C(n, k) which you’ll often find very useful.
Exercise 211. (Answer on p. 1150.) Show that:
$$C(n, k) = \frac{n \times (n-1) \times (n-2) \times \cdots \times (n-k+1)}{k!}.$$
Exercise 212. Compute C(4, 2), C(6, 4), and C(7, 3). (Answer on p. 1150.)
Exercise 213. We wish to form a basketball team, consisting of 1 centre, 2 forwards, and
2 guards. We have available 3 centres, 7 forwards, and 5 guards. How many ways are there
of forming a team? (Answer on p. 1150.)
Proof. Choosing k out of n objects is the same as choosing which n − k out of n objects to
ignore.
$$C(100, 70) = \frac{100!}{30!\,70!}.$$
This is the same as the number of ways to choose the 30 men that will not be used for the
task:
$$C(100, 30) = \frac{100!}{70!\,30!}.$$
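A quick numerical check of this symmetry, as a sketch using Python’s `math.comb` (assumed available, Python 3.8 or later); the variable name `symmetric` is ours.

```python
from math import comb

# Choosing 70 out of 100 is the same as choosing which 30 to leave out.
print(comb(100, 70) == comb(100, 30))  # True

# The same symmetry, checked for every n up to 20 and every valid k.
symmetric = all(
    comb(n, k) == comb(n, n - k)
    for n in range(21)
    for k in range(n + 1)
)
print(symmetric)  # True
```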
Pascal’s Triangle consists of a triangle of numbers. If we adopt the convention that the
topmost row is row 0 and the leftmost term of each row is the 0th term, then the nth row,
kth term is the number C(n, k):
              1
             1 1
            1 2 1
           1 3 3 1
          1 4 6 4 1
        1 5 10 10 5 1
      1 6 15 20 15 6 1
    1 7 21 35 35 21 7 1
              ⋮
It turns out that beautifully enough, each term is equal to the sum of the two terms above
it. The next exercise asks you to verify several instances of this:
Exercise 214. Verify the following: (a) C(1, 0) + C(1, 1) = C(2, 1); (b) C(4, 2) + C(4, 3) =
C(5, 3); (c) C(17, 2) + C(17, 3) = C(18, 3). (Answer on p. 1150.)
Suppose we do choose the last object. Then we have to choose another k − 1 objects, out
of the first n objects. There are C(n, k − 1) ways of doing so.
Altogether then, by the Addition Principle, there are C(n, k) + C(n, k − 1) ways of choosing
k out of n + 1 distinct objects.
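The rule that each term is the sum of the two terms above it gives a simple way to generate Pascal’s Triangle. Here is a sketch (the function name `pascal_rows` is ours), which we then compare against `math.comb`.

```python
from math import comb

def pascal_rows(n_rows):
    """Build the first n_rows rows of Pascal's Triangle (row 0 is [1])."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Each interior term is the sum of the two terms above it.
        interior = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + interior + [1])
    return rows

rows = pascal_rows(8)
for row in rows:
    print(row)

# The nth row, kth term should be C(n, k).
for n, row in enumerate(rows):
    assert row == [comb(n, k) for k in range(n + 1)]
```

The last row printed is `[1, 7, 21, 35, 35, 21, 7, 1]`.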
Poincaré’s quote is especially true in combinatorics. In this section, we’ll learn why C (n, k)
can be called the combination and also the binomial coefficient.
Verify for yourself that the following equations are true:
$(1 + x)^0 = 1,$
$(1 + x)^1 = 1 + x,$
$(1 + x)^2 = 1 + 2x + x^2,$
$(1 + x)^3 = 1 + 3x + 3x^2 + x^3,$
$(1 + x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4,$
$(1 + x)^5 = 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5,$
$(1 + x)^6 = 1 + 6x + 15x^2 + 20x^3 + 15x^4 + 6x^5 + x^6,$
$(1 + x)^7 = 1 + 7x + 21x^2 + 35x^3 + 35x^4 + 21x^5 + 7x^6 + x^7.$
⋮
Each of the expressions on the RHS is called a binomial series. Each can also be called
the binomial expansion of $(1 + x)^n$.
Notice anything interesting? No? Try this exercise:
It turns out that somewhat surprisingly, the coefficients of the binomial expansions of
$(1 + x)^n$ are simply $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$. As an additional
exercise, you should verify for yourself that this is also true for n = 0 through n = 6.
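If you’d rather let a computer do the expanding, here is a sketch. We represent a polynomial by its list of coefficients (constant term first) and repeatedly multiply by (1 + x). The function name `expand_one_plus_x` is ours.

```python
from math import comb

def expand_one_plus_x(n):
    """Coefficients of (1 + x)^n, listed from the constant term upwards."""
    coeffs = [1]  # the polynomial 1
    for _ in range(n):
        # Multiplying by (1 + x): new[i] = old[i] + old[i - 1].
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

print(expand_one_plus_x(2))  # [1, 2, 1]
print(expand_one_plus_x(7))  # [1, 7, 21, 35, 35, 21, 7, 1]

# The coefficients agree with C(n, 0), C(n, 1), ..., C(n, n).
for n in range(11):
    assert expand_one_plus_x(n) == [comb(n, k) for k in range(n + 1)]
```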
There are several ways to explain why the combinatorial numbers also happen to be the
binomial coefficients. Here we’ll give only the combinatorial explanation:
$$(1 + x)^2 = (1 + x)(1 + x) = 1 \cdot 1 + 1 \cdot x + x \cdot 1 + x \cdot x.$$
For 1 ⋅ x, we “chose” 1 from the first (1 + x) and x from the second (1 + x).
For x ⋅ 1, we “chose” x from the first (1 + x) and 1 from the second (1 + x).
So from the two (1 + x)’s in the product, there are C(2, 1) = 2 ways to choose 1 of the x’s.
Altogether then, the coefficient on $x^0$ is C(2, 0) (“choose 0 of the x’s”), that on $x^1$ is C(2, 1)
(“choose 1 of the x’s”), and that on $x^2$ is C(2, 2) (“choose 2 of the x’s”). That is:
$$(1 + x)^2 = C(2, 0) + C(2, 1)x + C(2, 2)x^2 = 1 + 2x + x^2.$$
Exercise 216. (Answer on p. 1151.) Mimicking what was just done above, explain why
By plugging x = 1, y = 1 into the last fact, we see that $(1 + 1)^n = 2^n$ is the sum of the terms
in the nth row of Pascal’s triangle:
$$2^n = C(n, 0) + C(n, 1) + \cdots + C(n, n).$$
There’s a nice combinatorial interpretation of the above fact (Poincaré’s quote at work
again).
Consider the set S = {A, B}. S has $2^2 = 4$ subsets: ∅ = {}, {A}, {B}, and S = {A, B}.
Now consider the set T = {A, B, C}. T has $2^3 = 8$ subsets: ∅ = {}, {A}, {B}, {C}, {A, B},
{A, C}, {B, C}, and T = {A, B, C}.
In general, if a set has n elements, how many subsets does it have? We can couch this in
the framework of the Multiplication Principle — this is really a sequence of n decisions of
whether or not to include each element in the subset. There are 2 choices for each decision.
Thus, there are $2^n$ choices altogether. In other words, using a set of n elements, we can
form $2^n$ subsets.
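We can list the subsets of a small set by computer and count them. The sketch below (the function name `subsets` is ours) builds the subsets size by size using `itertools.combinations`, which also shows how many subsets there are of each size.

```python
from itertools import combinations
from math import comb

def subsets(elements):
    """All subsets of the given elements, grouped by size 0, 1, ..., n."""
    elements = sorted(elements)
    return [
        set(c)
        for k in range(len(elements) + 1)
        for c in combinations(elements, k)
    ]

T = {"A", "B", "C"}
all_subsets = subsets(T)
print(len(all_subsets))  # 8

# Number of subsets of each size k = 0, 1, 2, 3.
sizes = [sum(1 for s in all_subsets if len(s) == k) for k in range(4)]
print(sizes)       # [1, 3, 3, 1]
print(sum(sizes))  # 8
```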
But of course, this must in turn be equal to the sum of the following:
...
Thus,
Exercise 218. Using what you’ve learnt, write down $(3 + x)^4$. (Answer on p. 1152.)
Exercise 219. (Answer on p. 1152.) (a) The Tan family has 4 sons and the Wong family
has 3 daughters. Using the sons and daughters from these two families, how many ways
are there of forming 2 heterosexual couples?
(b) The Lee family has 6 sons and the Ho family has 9 daughters. Using the sons and
daughters from these two families, how many ways are there of forming 5 heterosexual
couples?
Example 505. We want to know how much material to purchase, in order to build a fence
around a field. We might go through these steps:
1. Formulate a mathematical model: Our field is the shape of a rectangle, with length
100 m and breadth 50 m.
2. Analyse: The rectangle has perimeter 100 + 50 + 100 + 50 = 300 m.
3. Apply the results of our analysis: We need to buy enough material to build a
300-metre long fence.
Step #1: Formulate a mathematical model. That is, describe the real-world scenario in
mathematical language and concepts.
This first step is arguably the most important. It is often subjective — not everyone will
agree that your mathematical model is the most appropriate for the scenario at hand.
To use the above example, the field may not be a perfect rectangle, so some may object
to your description of the field as a rectangle. Nonetheless, you may decide that all things
considered, the rectangle is a good mathematical model.
Step #2: Analyse. This involves using maths and the rules of logic. (A-level maths exams tend to be mostly
concerned with this second step.)
In the above example, this second step simply involved computing the perimeter of the
rectangle — 100 + 50 + 100 + 50 = 300 m. Of course, for the A-levels, you can expect the
analysis to be more challenging than this.
Note that this second step, in contrast to the first, is supposed to be completely watertight,
non-subjective, and with no room for disagreement. After all, hardly anyone reasonable
could disagree that a perfect rectangle with length 100 m and breadth 50 m has perimeter
300 m.
We’ve secretly always been using mathematical modelling; we just haven’t always been
terribly explicit about it. The foregoing discussion was placed here because, with probability
and statistical models, we want to be especially clear that we are doing mathematical
modelling.
Real-world scenarios often involve chance. We can model such scenarios mathematically.
For this purpose, we’ll use a mathematical object named the experiment, typically
denoted E.64
An experiment E = (S, Σ, P) is an ordered triple65 composed of three objects, called the
sample space S, the event space Σ (upper-case sigma), and the probability function
P, where
Examples:
64 An experiment is often instead called a probability triple or probability space or (probability) measure space.
65 Previously, in the only ordered triples we encountered, the three terms were always simply real numbers. Here however,
the first two terms are sets and the third is a function. Nonetheless, this is all the same an ordered triple, albeit a more
complicated one.
1. Sample space S = {H, T}.
The choice of the sample space belongs to Step #1 (Formulate a mathematical model) in
the process of mathematical modelling. It is subjective and open to disagreement.
For example, John (another scientist) might argue that the coin sometimes lands exactly
on its edge. This is exceedingly unlikely but nonetheless possible — one empirical estimate
is that the US 5-cent coin has probability 1 in 6000 of landing on its edge when flipped
(source). So John might denote this third possible outcome X and his sample space would
instead be S = {H, T, X}.
The event space is simply the set of events. In other words, the event space is the set of
all subsets of S.*
As we saw in Section 54.3, given any finite set S, there are $2^{|S|}$ possible subsets of S. In
general, given a finite sample space S, the corresponding event space Σ always simply
contains $2^{|S|}$ events. And so here, since there are 2 possible outcomes, there are, altogether,
$2^2 = 4$ possible events.
If the real-world outcome of the coin flip is Heads, then our interpretation (in terms of our
model) is that “the events {H} and {H, T } occur”. If the real-world outcome of the coin
flip is Tails, then our interpretation (in terms of our model) is that “the events {T } and
{H, T } occur”.
The event ∅ never occurs, whatever the real-world outcome is. And the event S = {H, T }
always occurs, whatever the real-world outcome is.
The mathematical modeller is free to select the sample space S she deems most appropriate.
However, once she has selected the sample space S, the event space Σ is automatically
determined by the rules of maths. There is no room for interpretation. Hence, the selection
of the event space Σ belongs to Step #2 (Analysis) in the process of mathematical modelling.
So likewise, John, who chooses S = {H, T, X} as his sample space, has no freedom to choose
his event space Σ. It is automatically Σ = {∅, {H}, {T }, {X}, {H, T }, {H, X}, {T, X}, S}
(consists of 8 elements).
3. Probability function P ∶ Σ → R.
The probability function simply assigns to each event a number (between 0 and 1) called
a probability. So here, if heads and tails are “equally likely” (or the coin is “unbiased” or
“fair”), then it makes sense to assign P({H}) = P({T}) = 0.5, together with P(∅) = 0 and
P({H, T}) = 1.
The mathematical modeller has no freedom over the domain Σ and codomain R of the
probability function. However, she does have freedom to choose the mapping rule she
deems most appropriate. Hence, the act of choosing the mapping rule belongs to Step #1
(Formulation) in the process of mathematical modelling.
So here, if told that heads and tails are “equally likely” (or that the coin is “unbiased” or
“fair”), the mathematical modeller would naturally choose to assign probability 0.5 to each
of the events {H} and {T }.
John, who chooses S = {H, T, X} as his sample space, might instead assign probability
1/6000 to the event {X} and probability 5999/12000 to each of the events {H} and {T }.
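It’s worth checking that John’s three probabilities add up to 1. A sketch using exact fractions from Python’s standard `fractions` module (the variable names are ours):

```python
from fractions import Fraction

p_edge = Fraction(1, 6000)               # P({X})
p_heads = p_tails = Fraction(5999, 12000)  # P({H}) and P({T})

total = p_edge + p_heads + p_tails
print(total)  # 1
```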
It is correct and proper to write P ({H}) = P ({T }) = 0.5. It is incorrect and improper to
write P (H) = P (T ) = 0.5. This is because the function P is of events (sets of outcomes)
and NOT of outcomes themselves.
Nonetheless, we will often allow ourselves to be sloppy and write the “incorrect and im-
proper” P (H) = P (T ) = 0.5. This is because the notation P ({H}) = P ({T }) = 0.5 can get
rather messy. But you should always remember, even as you write P (H) = P (T ) = 0.5,
that this is technically incorrect.
1. Sample space S = {1, 2, 3, 4, 5, 6}.
2. Event space:
Σ = {∅, {1} , {2} , . . . , {6} , {1, 2} , {1, 3} , . . . , {5, 6} , {1, 2, 3} , {1, 2, 4} , . . . , {4, 5, 6} , . . . . . . , S}
There are 6 possible outcomes and thus $2^6 = 64$ possible events. The event space, given
above, is simply the set of all possible events.
If the real-world outcome of the die roll is 3, then the interpretation (in terms of our
model) is that the following 32 events occur: {3}, {1, 3}, {2, 3}, . . . , {1, 2, 3}, {1, 3, 4}, . . . ,
S = {1, 2, 3, 4, 5, 6}. (These are simply the events that contain the outcome 3.)
Similarly, if the real-world outcome of the die roll is 5, then the interpretation is that 32
events occur. You should be able to list all 32 of these events on your own.
3. Probability function P ∶ Σ → R.
If the die is “unbiased” or “fair”, then it makes sense to assign
$$P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \frac{1}{6}.$$
What about the other 58 events? It makes sense to assign, for example, $P(\{1, 3, 5, 6\}) = \frac{4}{6}$.
In general, the mapping rule of the probability function can be fully specified as: For any
event A ∈ Σ,
$$P(A) = \frac{|A|}{|S|} = \frac{|A|}{6}.$$
In words, given any event A, its probability P(A) is simply the number of elements it
contains, divided by 6.
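The whole die-roll experiment fits in a few lines of Python. This is a sketch only: we represent events as frozensets, use exact fractions for probabilities, and the names `event_space` and `P` are ours.

```python
from fractions import Fraction
from itertools import combinations

# 1. Sample space.
S = frozenset(range(1, 7))

# 2. Event space: the set of all subsets of S.
event_space = [
    frozenset(c)
    for k in range(len(S) + 1)
    for c in combinations(sorted(S), k)
]

# 3. Probability function: P(A) = |A| / |S|.
def P(event):
    return Fraction(len(event), len(S))

print(len(event_space))                        # 64
print(sum(1 for A in event_space if 3 in A))   # 32 events occur if we roll a 3
print(P({1, 3, 5, 6}))                         # 2/3 (that is, 4/6)
```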
• S, the sample space, is simply any set (interpreted as the set of possible outcomes in a
real-world scenario involving chance).
• Σ, the event space, is the set of possible events.
• P, the probability function, has domain Σ, codomain R, and must satisfy the three
Kolmogorov axioms (to be discussed below in Definition 107).
For the probability function P, the mathematical modeller is free to choose the mapping
rule she deems most appropriate. The only restriction is that P satisfies three axioms,
called the Kolmogorov Axioms, to be discussed in the next section.
Exercise 220. (Answers on pp. 1153, 1154, and 1155.) Consider each of the following
real-world scenarios.
(a) You pick, at random, a card from a standard 52-card deck.
(b) You flip two fair coins.
(c) You roll two fair dice.
Model each of the above real-world scenarios as an experiment, by following steps (i) - (iii):
(iv) In each scenario, explain briefly how John, another scientist, might justify choosing a
different sample space, event space, and probability function.
An axiom (or postulate) is a statement that is simply accepted as being true, without
justification or proof.
Example 508. Euclid’s parallel axiom says that “Two non-parallel lines in the plane
eventually intersect”. Historically, this axiom was accepted as a “self-evident truth”, with-
out need for justification or proof.
However, in the 19th century, mathematicians discovered “non-Euclidean geometries”, in
which the parallel axiom did not hold. These turned out to have significant implications
for maths, philosophy, and physics.
The above example illustrates that an axiom is not an eternal and immutable truth. Instead,
it is merely a statement that some mathematicians tentatively accept as being true. Having
listed a bunch of axioms, mathematicians then study their implications.
In probability theory, we impose three axioms on the probability function. These can be
thought of as restrictions on what the probability function looks like. Informally:
Formally:
Definition 107. We say that a function P satisfies the three Kolmogorov axioms if:
In case you’ve forgotten, two sets are disjoint if they have no elements in common.
Obviously, P(∅) = 0 (the probability that the empty event occurs is 0). Previously, you’ve
probably taken this and other “obvious” properties for granted. Now we’ll prove that they
follow from the Kolmogorov axioms.
Recall that given any set A, its complement Ac (sometimes also denoted A′ ) is defined to
be “everything else” — more precisely, Ac is the set of all elements that are not in A.
Proposition 12. Let P be a probability function and A, B be events. Then P satisfies the
following properties:
You may recognise that the Complements and the Inclusion-Exclusion properties are anal-
ogous to the CP and IEP from counting.
Venn diagrams are helpful for illustrating probabilities. Those below help to illustrate
four of the above five properties.
Example 509. Flip three fair coins. Model this as an experiment E = (S, Σ, P), where
$$P(HHH) = P(HHT) = \cdots = P(TTT) = \frac{1}{8},$$
and more generally, for any event A ∈ Σ, $P(A) = \frac{|A|}{8}$.
Problem: Suppose there is at least 1 tail. Find the probability that there are at least 2 heads.
There are 7 possible outcomes where there is at least 1 tail: HHT, HTH, HTT, THH,
THT, TTH, and TTT. Each is equally likely to occur. Of these, 3 outcomes involve at
least 2 heads (HHT, HTH, and THH). Thus, given there is at least 1 tail, the probability
that there are at least 2 heads is simply 3/7.
The above analysis was somewhat informal. Here is a more formal analysis.
Let A be the event that there are at least 2 heads: A = {HHT, HTH, THH, HHH}.
Let B be the event that there is at least 1 tail: B = {HHT, HTH, HTT, THH, THT, TTH, TTT}.
A ∩ B is thus the event that there are at least 2 heads and at least 1 tail: A ∩ B =
{HHT, HTH, THH}.
P(A∣B) = P(A ∩ B)/P(B) = (3/8)/(7/8) = 3/7.
Hence, given that B has occurred, the probability that A has also occurred is simply
0.2/0.6 = 1/3. (The information that P(A) = 0.5 is irrelevant.) Formally:
P(A∣B) = P(A ∩ B)/P(B) = 0.2/0.6 = 1/3.
Definition 108. Let P be a probability function and A, B ∈ Σ be events. Then the
conditional probability of A given B is denoted P(A∣B) and is defined by:
P(A∣B) = P(A ∩ B)/P(B).
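As a rough check of Definition 108, the three-coin example (Example 509) can be worked out by brute-force enumeration. The sketch below is only an illustration: the helper names prob and cond_prob are my own, and it assumes all 8 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coin flips: 'HHH', 'HHT', ..., 'TTT' (assumed equally likely).
S = ["".join(flips) for flips in product("HT", repeat=3)]

def prob(event):
    """P(event) = |event| / |S| when all outcomes are equally likely."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), as in Definition 108 (needs P(B) > 0)."""
    return prob(a & b) / prob(b)

A = {s for s in S if s.count("H") >= 2}  # at least 2 heads
B = {s for s in S if s.count("T") >= 1}  # at least 1 tail

print(cond_prob(A, B))  # 3/7
```

Using Fraction keeps the arithmetic exact, so the output can be compared directly with the hand calculation.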
Exercise 222. Roll two dice. Given that the sum of the two dice rolls is 8, what is the
probability that we rolled at least one even number? (Answer on p. 1157.)
Definition 109. The conditional probability fallacy (CPF) is the mistaken belief that
P (A∣B) = P (B∣A)
is always true.
Fact 69. (a) If P(A) < P(B), then P(A∣B) < P(B∣A).
Proof. By definition, P(A∣B) = P(A ∩ B)/P(B) and P(B∣A) = P(B ∩ A)/P(A).
Thus, P(A∣B) = [P(A)/P(B)] ⋅ P(B∣A). And so,
The CPF is also known as the confusion of the inverse or the inverse fallacy. In
different contexts, it is also known variously as the base-rate fallacy, false-positive
fallacy, or prosecutor’s fallacy.
Formally, this reasoning is flawed because P(Vomit) is probably much larger than P(Ebola).
Thus, P (Vomit∣Ebola) is probably much larger than P (Ebola∣Vomit).
Example 512. Sally buys a 4D ticket every week. One day, she wins the first prize. To
her astonishment, she wins the first prize again the following week.
Her jealous cousin Ah Kow makes a police report, based on the following reasoning:
“Without cheating, the probability that Sally wins the first prize two weeks in a row is 1 in
100 million. Given that she did win first prize two weeks in a row, the probability that she
didn’t cheat must likewise be 1 in 100 million. In other words, there is almost no chance
that Sally didn’t cheat.”
Let’s rephrase Ah Kow’s reasoning more formally. Let A and B be the events “Sally
wins the first prize two weeks in a row” and “Sally didn’t cheat”, respectively. We know
that P (A∣B) = 0.00000001. By the CPF, we have P (A∣B) = P (B∣A). Hence, P (B∣A) =
0.00000001. Equivalently, there is probability 0.99999999 that Sally cheated.
Formally, this reasoning is flawed because P(B) is probably much larger than P (A). Thus,
P (B∣A) is probably much larger than P (A∣B).
The test result returns positive (i.e. it says that the randomly-chosen person has smallpox).
What is the probability that this person actually has smallpox?
In words, it is easy to confuse “the probability of a positive test result conditional on having
smallpox” with “the probability of having smallpox conditional on a positive test result”.
Formally, this is the CPF. One starts with P (+∣S) = 0.99 and confusedly concludes that
P (S∣+) = 0.99 — this person almost certainly has smallpox.
In fact, as we now show, despite testing positive, the person is very unlikely to have
smallpox. The correct answer is P(S∣+) ≈ 1/10,000! In the steps below, each =* simply uses the
definition of conditional probability (Definition 108):
P(S∣+) =* P(S ∩ +)/P(+) =* P(S)P(+∣S) / [P(S)P(+∣S) + P(S^c)P(+∣S^c)]
= (1/1000000 ⋅ 0.99) / (1/1000000 ⋅ 0.99 + 999999/1000000 ⋅ 0.01) = 0.00009899029 ≈ 1/10,000.
This example illustrates how far astray the CPF can lead one.
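The smallpox calculation can be reproduced in a few lines. This is only a sketch: the function name posterior and its parameter names are my own, and the three numbers plugged in are the ones appearing in the calculation above (prevalence 1/1,000,000, P(+∣S) = 0.99, P(+∣S^c) = 0.01).

```python
def posterior(prior, p_pos_given_s, p_pos_given_not_s):
    """P(S|+) = P(S)P(+|S) / [P(S)P(+|S) + P(S^c)P(+|S^c)]."""
    numerator = prior * p_pos_given_s
    denominator = numerator + (1 - prior) * p_pos_given_not_s
    return numerator / denominator

p = posterior(prior=1 / 1_000_000, p_pos_given_s=0.99, p_pos_given_not_s=0.01)
print(p)  # roughly 0.000099, i.e. about 1 in 10,000
```

Changing the prior to, say, 0.5 shows how strongly the answer depends on how rare the disease is.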
This erroneous reasoning led to Sally Clark being convicted for murdering her two babies.
(Some of you may have noticed that the “expert” actually also made another mistake. But
we’ll examine this only in the next chapter.)
It turns out that not only laypersons and court prosecutors commit the CPF. As we’ll see
later, even academic researchers also often commit the CPF, when it comes to interpreting
the results of a null hypothesis significance test (Chapter 72).
Example 515. Consider all the families in the world that have two children, of whom at
least one is a boy. Randomly pick one of these families. What is the probability that both
children in this family are boys?
Think about it (set aside this book) before reading the answer below.
We already know that one child is a boy. So intuition might suggest that “obviously”, the answer is 1/2.
Intuition would be wrong. It goes astray by failing to recognise that there are three equally likely ways that a family
with two children can have at least one boy: BB, BG, or GB. The answer is in fact 1/3:
P(BB) / [P(BB) + P(BG) + P(GB)] = (1/4) / (1/4 + 1/4 + 1/4) = 1/3.
In 2010, the following variant of the above Martin Gardner problem was presented.
Those familiar with the previous problem might think, “Well, this is exactly the same as
the two-boys problem, except with an obviously-irrelevant bit of information about the boy
being born on a Tuesday. So the answer must be the same as before: 1/3.”
It turns out though that, surprisingly, the Tuesday bit of information makes a big difference.
The answer is 13/27 ≈ 0.481. This is much closer to 0.5 than to 1/3!
Case      First child               Second child             Probability
B_T B     Boy born on Tuesday       Boy (born on any day)    P(B_T B) = (1/14) ⋅ (1/2) = 7/196
B_T G     Boy born on Tuesday       Girl                     P(B_T G) = (1/14) ⋅ (1/2) = 7/196
B_N B_T   Boy not born on Tuesday   Boy born on Tuesday      P(B_N B_T) = (6/14) ⋅ (1/14) = 6/196
G B_T     Girl                      Boy born on Tuesday      P(G B_T) = (1/2) ⋅ (1/14) = 7/196
Altogether then, amongst two-child families with at least one boy born on a Tuesday, the
proportion that have two boys is
[P(B_T B) + P(B_N B_T)] / [P(B_T B) + P(B_T G) + P(B_N B_T) + P(G B_T)]
= (7/196 + 6/196) / (7/196 + 7/196 + 6/196 + 7/196) = 13/27.
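If the 13/27 still feels suspicious, both versions of the two-boys problem can be checked by listing every family. The sketch below assumes (as the table above does) that each child is equally likely to be any of the 14 (sex, weekday) combinations, independently of the other child. The names cond, both_boys, and so on are my own, and numbering Tuesday as day 1 is arbitrary.

```python
from fractions import Fraction
from itertools import product

# Assumption: each child is one of 14 equally likely (sex, weekday) combinations.
children = list(product("BG", range(7)))
families = list(product(children, repeat=2))  # 196 equally likely ordered pairs

TUESDAY = 1  # arbitrary label for one of the 7 weekdays

def cond(event, given):
    """Proportion of families satisfying `event` among those satisfying `given`."""
    kept = [f for f in families if given(f)]
    return Fraction(sum(1 for f in kept if event(f)), len(kept))

def both_boys(f):
    return all(sex == "B" for sex, _ in f)

def at_least_one_boy(f):
    return any(sex == "B" for sex, _ in f)

def at_least_one_tuesday_boy(f):
    return any(child == ("B", TUESDAY) for child in f)

print(cond(both_boys, at_least_one_boy))          # 1/3
print(cond(both_boys, at_least_one_tuesday_boy))  # 13/27
```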
Informally, two events A and B are independent if the probability that both occur is
simply the product of the probabilities that each occurs. Independence is thus analogous
to the MP from counting. Formally:
P(A ∩ B) = P(A)P(B).
Fact 70. Suppose P(B) ≠ 0. Then A, B are independent events ⇐⇒ P(A∣B) = P(A).
Proof. By definition of conditional probabilities, P(A∣B) =¹ P(A ∩ B)/P(B). By definition
of independence, P(A ∩ B) =² P(A)P(B). Plugging =² into =¹, we have P(A∣B) = P(A), as
desired.
• S = {HH, HT, TH, TT},
• Σ contains 2^4 = 16 elements, and
• P({HH}) = P({HT}) = P({TH}) = P({TT}) = 1/4.
Let H1 be the event that the first coin flip is Heads — that is, H1 = {HH, HT }. Analogously
define T1 , H2 , and T2 .
The intuitive idea of independence is easy to grasp. If we say that the two coin flips are
independent, what we mean is that the following four conditions are true:
1. H1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is heads.)
2. H1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is heads.)
3. T1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is tails.)
4. T1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is tails.)
Formally:
1. P (H1 ∩ H2 ) = P({HH}) = P (H1 ) P (H2 ) = P({HH, HT })P({HH, T H}) = 0.5 ⋅ 0.5 =
0.25.
2. P (H1 ∩ T2 ) = P({HT }) = P (H1 ) P (T2 ) = P({HH, HT })P({HT, T T }) = 0.5 ⋅ 0.5 = 0.25.
3. P (T1 ∩ H2 ) = P({T H}) = P (T1 ) P (H2 ) = P({T H, T T })P({HH, T H}) = 0.5 ⋅ 0.5 = 0.25.
4. P (T1 ∩ T2 ) = P({T T }) = P (T1 ) P (T2 ) = P({T H, T T })P({HT, T T }) = 0.5 ⋅ 0.5 = 0.25.
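The four checks above can also be done mechanically. This is a minimal sketch, assuming the four outcomes are equally likely. The helper names prob and independent are my own.

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]  # two fair coin flips, assumed equally likely

def prob(event):
    return Fraction(len(event), len(S))

def independent(a, b):
    """Events A and B are independent if P(A ∩ B) = P(A)P(B)."""
    return prob(a & b) == prob(a) * prob(b)

H1 = {s for s in S if s[0] == "H"}  # first flip is heads
T1 = set(S) - H1                    # first flip is tails
H2 = {s for s in S if s[1] == "H"}  # second flip is heads
T2 = set(S) - H2                    # second flip is tails

for a, b in [(H1, H2), (H1, T2), (T1, H2), (T1, T2)]:
    print(sorted(a & b), independent(a, b))  # each pair prints True

print(independent(H1, T1))  # False: H1 and T1 cannot both occur
```

The last line is a reminder that not every pair of events is independent: H1 and T1 are disjoint, so P(H1 ∩ T1) = 0 while P(H1)P(T1) = 0.25.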
Now consider the event “Heads” E1 = {H1, H2, H3, H4, H5, H6}, and the event “Roll an
odd number” E2 = {H1, H3, H5, T1, T3, T5}. These two events E1 and E2 are independent,
as we now verify:
P(E1∣E2) = P(E1 ∩ E2)/P(E2) = (3/12)/(6/12) = 1/2 = P(E1).
More broadly, we can even say that the coin flip and die roll are independent. Informally,
this means that the outcome of the coin flip has no influence on the outcome of the die roll,
and vice versa.
The idea of independence is a little tricky to illustrate on a Venn diagram. I’ll try anyway.
We compute
P(A∣B) = P(A ∩ B)/P(B) = 0.02/0.1 = 0.2.
We observe that P(A) = 0.2 = P(A∣B). And so by Fact 70, we conclude that the events A
and B are independent.
Flip two fair coins. Let H1 be the event that the first coin flip is heads, H2 be the event
that the second is heads, and T1 be the event that the first flip is tails. Show that
The idea of independence is intuitively easy to grasp. Indeed, so much so that students
often assume that “everything is independent”. This is a mistake. Unless you’re explicitly
told, NEVER assume that two events are independent.
Here are two examples where the assumption of independence is plausible:
Example 520. The event “coin-flip #1 is heads” and the event “coin-flip #2 is heads” are
probably independent.
Example 521. The event “die-roll #1 is 3” and the event “die-roll #2 is 6” are probably
independent.
Here are two examples where the assumption of independence is not plausible:
Example 522. The event “Google’s share price rises today” is probably not independent
of the event “Apple’s share price rises today”.
Example 523. The event “it rains in Singapore today” is probably not independent of the
event “it rains in Kuala Lumpur today”.
By simply multiplying together probabilities, the “expert” implicitly assumed that the two
events — “sudden death of baby #1” and “sudden death of baby #2” — are independent.
But as any doctor will tell you, if your family has a history of heart attack, diabetes, or
pretty much any other ailment, then you may be at higher risk (than the average person)
of suffering the same.
And so, it may well be that in any given year, a random person has probability 0.001 of
dying of a heart attack. It does not however follow that in any given year, a random family
has probability 0.001² = 0.000001 of two deaths by heart attack.
Similarly, it may be that if one baby in a family has already suddenly died, a second baby
is at higher risk (than the average baby) of suddenly dying.
Exercise 226. (Answer on p. 1158.) Say the probability that a randomly-chosen person
is or was an NBA player is one in a million. (This is probably about right, since there’ve
only ever been 4,000 or so NBA players since the late 1940s.)
The Barry family had four players in the NBA — the father Rick Barry and three of his
four sons Jon, Brent, and Drew. (The oldest son Scooter didn’t make the NBA but was
still good enough to play professionally in other basketball leagues around the world.)
(1/1,000,000)⁴ = 1/1,000,000,000,000,000,000,000,000.
This is equal to the probability of buying a 4D number on six consecutive weeks, and
winning first prize every time. Is the journalist correct?
P(A ∩ B) = P(A)P(B),
P(B ∩ C) = P(B)P(C),
P(A ∩ C) = P(A)P(C).
A, B, C are independent if in addition to the above three conditions being true, it is also
true that
P(A ∩ B ∩ C) = P(A)P(B)P(C).
It is tempting to believe that pairwise independence implies independence. That is, if the
first three conditions listed above are true, then so is the fourth. Alas, this is false, as the
next exercise demonstrates:
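Here is one standard counterexample, checked by enumeration. It is my own choice of events and not necessarily the one used in the exercise: flip two fair coins, and let A be “first flip is heads”, B be “second flip is heads”, and C be “both flips are the same”.

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]  # two fair coin flips, assumed equally likely

def prob(event):
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == "H"}   # first flip is heads
B = {s for s in S if s[1] == "H"}   # second flip is heads
C = {s for s in S if s[0] == s[1]}  # both flips are the same

# The three pairwise conditions all hold ...
pairwise = (
    prob(A & B) == prob(A) * prob(B)
    and prob(B & C) == prob(B) * prob(C)
    and prob(A & C) == prob(A) * prob(C)
)
# ... but the fourth condition fails.
triple = prob(A & B & C) == prob(A) * prob(B) * prob(C)

print(pairwise, triple)                               # True False
print(prob(A & B & C), prob(A) * prob(B) * prob(C))   # 1/4 1/8
```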
The Monty Hall Problem is probably the world’s most famous probability puzzle. It takes
less than a minute to state. Yet its counter-intuitive answer confuses nearly everyone.
You’re at a gameshow. There are three boxes, labelled #1, #2, and #3. One box contains
one year’s worth of a Singapore minister’s salary. The other two are empty.
You are asked to pick one box (but you are not allowed to open it yet).
The host, who knows where the minister’s salary is, opens one of the other two boxes, to
reveal that it is empty. Important: The host is not allowed to open the box that contains
the minister’s salary; he must always open a box that is empty.
You’re now given a choice: Stay (with your original choice) or switch (to the other unopened
box). What should you do?
To illustrate:
Example 525. Say you pick Box #2. The host then opens an empty Box #1. You’re now
given a choice: Stay (with Box #2) or switch (to Box #3). Which do you choose?
Take as long as you need to think about this problem, before turning to the
next page for the answer.
Yes; you should switch. The first door has a 1/3 chance
of winning, but the second door has a 2/3 chance.
1. The probability that the minister’s salary is in the box you picked is 1/3. The probability
that the minister’s salary is in either of the other two boxes is 2/3. Of the other two boxes,
the gameshow host (who knows where the salary is) helps you eliminate one of them. So
the remaining unopened box still has probability 2/3 of containing the minister’s salary.
2. Imagine instead that there are 100 boxes, of which one contains the minister’s salary
and the others are empty. You pick one. Of the remaining 99, the gameshow host opens
98. You are again given the choice: Should you stay or switch? In this more extreme
version of the game, it is perhaps more obvious that your originally-picked box has only
probability 1/100 of containing the minister’s salary, while the only other unopened box
has probability 99/100 of the same. Therefore, you should switch.
3. Say you originally pick Box #1. There are three possible cases, each occurring with
probability 1/3:
Not switching wins you the minister’s salary only in Case A (1/3 probability).
Switching wins you the minister’s salary in Cases B and C (2/3 probability).
66
Marilyn vos Savant was, briefly, on the Guinness Book of Records as the person with the world’s highest IQ, until Guinness
retired this category because IQ tests were considered to be too unreliable.
Unfortunately for the above letter writers, Marilyn was correct and they were wrong.
The best way to convince the sceptical is through simulations — try this Google spreadsheet.
Or if you don’t trust computers, do an actual experiment:
Class Activity
Form pairs. One person is the gameshow host and the other is the contestant. The host
decides where the prize is (Box #1, #2, or #3). The contestant then picks a box. The
host then tells the contestant which one of the other two boxes is empty. The contestant
then decides whether to stay or switch.
Repeat as many times as you have time for. Record the proportion of times that the
contestant should have switched. You should find that this proportion is about 2/3.
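If you’d rather let a computer play the game many times, the class activity can be simulated as below. This is only a sketch under the rules stated earlier: the function names play and win_rate are my own, and I assume that when the host has two empty boxes to choose from, he picks between them at random.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    boxes = [1, 2, 3]
    prize = rng.choice(boxes)
    pick = rng.choice(boxes)
    # The host opens an empty box that is not the contestant's pick.
    opened = rng.choice([b for b in boxes if b != pick and b != prize])
    if switch:
        # Switch to the only box that is neither picked nor opened.
        pick = next(b for b in boxes if b != pick and b != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(win_rate(switch=False))  # close to 1/3
print(win_rate(switch=True))   # close to 2/3
```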
67
You can read more of these letters at her website.
Example 526. (The birthday problem.) What is the smallest number n of people in a
room, such that it is more likely than not, that at least 2 people in the room share the same
birthday?*
Hence, the probability that at least 2 persons share the same birthday is
The smallest integer n for which the above probability is at least 0.5 is 23. (Wolfram
Alpha.) That is, perhaps surprisingly, with just 23 people, it is more likely than not that
at least 2 persons share a birthday.
*Assume there are no leap years (every year has 365 days). Also, assume each person’s birthday is equally likely to be on
any day of the year and does not depend on the birthday of anyone else.
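The figure of 23 can be checked with a short loop. The sketch below uses the assumptions in the footnote (365 equally likely and independent birthdays) and computes the probability of a shared birthday as 1 minus the probability that all n birthdays are distinct. The function name p_shared is my own.

```python
def p_shared(n, days=365):
    """P(at least 2 of n people share a birthday) = 1 - P(all n birthdays distinct)."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (days - i) / days
    return 1 - p_distinct

n = 1
while p_shared(n) < 0.5:
    n += 1

print(n)  # 23
```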
Informally, a random variable is a function that assigns a real number (you can think of
this as a “numerical code”) to each possible outcome s. We call any such real number an
observed value of X.
Example 527. Model a fair coin-flip with the usual experiment E = (S, Σ, P), where
• S = {H, T }.
• Σ = {∅, {H} , {T } , S}.
• P ∶ Σ → R is defined by P(∅) = 0, P({H}) = P({T}) = 0.5, and P(S) = 1.
Let X ∶ S → R be the random variable that indicates whether the coin-flip is heads. That
is, the observed value of X is X(H) = 1 if the outcome is heads and X(T ) = 0 if the
outcome is tails.
Formally:
Students often confuse a random variable with an observed value of the random variable.
This confusion is, of course, simply the confusion between a function and the value taken
by the function.
Example 527 (continued from above). X is a function with domain S and codomain
R. X is therefore a random variable.
If the outcome of the coin-flip is heads, we do not say that X is 1. Instead, we say that
the observed value of X is 1.
If the outcome of the coin-flip is tails, we do not say that X is 0. Instead, we say that the
observed value of X is 0.
Remember: A random variable X is a function that can take on many possible real
number values. Each such value x = X(s) is called an observed value of X.
The notation “X ≥ k”, “X > k”, “X ≤ k”, “X < k”, “a ≤ X ≤ b”, etc. are similarly defined.
Example 527 (continued from above). X(H) = 1 and X(T ) = 0. So we can write:
Now let’s try some other arbitrary number like 13.71. Notice there is no outcome s such
that X(s) = 13.71. Thus:
Y = 15.5 denotes the event {s ∈ S ∶ Y(s) = 15.5} = {H, T}, and P(Y = 15.5) = 1.
Example 528. Flip two fair coins. Model this with the usual experiment, where S =
{HH, HT, T H, T T }.
Let X ∶ S → R indicate whether the two coin flips are the same and Y ∶ S → R count the
number of heads. That is,
Y (HH) = 2, Y (HT ) = 1, Y (T H) = 1, Y (T T ) = 0.
And
P(Y = 0) = 0.25, P(Y = 1) = 0.5, P(Y = 2) = 0.25, and P(Y = k) = 0, for any k ≠ 0, 1, 2.
Another example:
S = {A♠, K♠, ..., 2♠, A♥, K♥, ..., 2♥, A♦, K♦, ..., 2♦, A♣, K♣, ..., 2♣}.
X ∶ S → R is the High Card Point count (used in the game of bridge). I.e.,
Thus,
P(X = 0) = 36/52, P(X = 1) = 4/52, P(X = 2) = 4/52,
P(X = 3) = 4/52, P(X = 4) = 4/52, P(X = k) = 0,
for any k ≠ 0, 1, 2, 3, 4.
Thus,
P(Y = 0) = 39/52, P(Y = 1) = 13/52, P(Y = k) = 0, for any k ≠ 0, 1.
S = {[the 36 ordered pairs of die faces, shown as dice images in the original]}.
X([dice image]) = 7 and X([dice image]) = 5.
The table below says that P (X = 2) = 1/36, because there is only one way the event X = 2
can occur. And P (X = 3) = 2/36, because there are two ways the event X = 3 can occur.
Exercise 228. (Continuation of the above example.) (Answer on p. 1159.) (a) Complete
the above table.
Consider the event E, described in words as “the sum of the two dice is at least 10”.
Example 530 (continued from above). Continue with the same roll-two-fair-dice
example, with X again being the random variable that is the sum of the two dice. We had
X([dice image]) = 7 and X([dice image]) = 5.
Y([dice image]) = 10 and Y([dice image]) = 4.
Remember: random variables are simply functions. And thus, we can manipulate random
variables just like we manipulate any functions.
(X + Y)([dice image]) = 17 and (X + Y)([dice image]) = 9.
(XY)([dice image]) = 70 and (XY)([dice image]) = 20.
(4X − 5Y)([dice image]) = −22 and (4X − 5Y)([dice image]) = 0.
Exercise 230. (Answer on p. 1160.) Model a fair die-roll with the usual experiment
E = (S, Σ, P). Define the function X ∶ S → R by the mapping rule X(1) = 1, X(2) = 2,
X(3) = 3, X(4) = 4, X(5) = 5, and X(6) = 6.
Exercise 231. For each of the following real-world scenarios, write down, in precise
mathematical notation (i) the experiment E = (S, Σ, P); (ii) what the random variable X is; and
(iii) P(X = k), for all possible k. (Answers on pp. 1160 and 1161.)
(a) Flip 4 (fair) coins. Let the random variable X be a count of the number of heads.
(b) Roll 3 (fair) dice. Let the random variable X be the sum of the three dice. (Tedious.)
Example 531. Flip two fair coins. Model this with the usual experiment where S =
{HH, HT, T H, T T }.
Let X ∶ S → R indicate whether the two coin flips were the same and Y ∶ S → R count the
number of heads. That is,
Then X = 0, Y = 0 is the event that the two coin flips were not the same AND the number of
heads was 0. By observation, this event is the empty set. Thus, P (X = 0, Y = 0) = P (∅) = 0.
X = 1, Y = 0 is the event that the two coin flips were the same AND the number of heads
was 0. By observation, this event is {T T }. Thus, P (X = 1, Y = 0) = P ({T T }) = 0.25.
P (X = 0, Y = 1) = 0.5, P (X = 1, Y = 1) = 0,
P (X = 0, Y = 2) = 0, P (X = 1, Y = 2) = 0.25.
Example 531 (continued from above). Flip two fair coins. We say the two coin-flips
are independent. Informally, the outcome of one doesn’t affect the other. Knowing that
the first coin-flip is heads tells us nothing about the second coin-flip.
A little more formally, let A and B be the random variables indicating whether the first and
second coin-flip are heads (respectively). That is, A = 1 if the first coin-flip is heads and
A = 0 otherwise; and B = 1 if the second coin-flip is heads and B = 0 otherwise. Then the
informal statement “the two coin-flips are independent” may be translated into the formal
statement “the random variables A and B are independent”.
Formally:
Let’s restate the above definition more explicitly. Suppose X can take on values x1 , x2 , . . . , xn
and Y can take on values y1 , y2 , . . . , ym . Then to say that X and Y are independent is to
say that all of the following n × m pairs of events are independent
(X = x1, Y = y1), (X = x1, Y = y2), ..., (X = x1, Y = ym),
(X = x2, Y = y1), (X = x2, Y = y2), ..., (X = x2, Y = ym),
⋮
(X = xn, Y = y1), (X = xn, Y = y2), ..., (X = xn, Y = ym).
Again, A and B are the random variables indicating whether the first and second coin-flips
are heads (respectively).
We now verify that indeed, P (A = a, B = b) = P(A = a)P(B = b) for all possible values of a
and b:
P (A = a, B = b) P(A = a)P(B = b)
a = 0, b = 0 P ({T T }) = 0.25 P ({T H, T T }) P ({HT, T T }) = 0.5 × 0.5, ✓
a = 1, b = 0 P ({HT }) = 0.25 P ({HH, HT }) P ({HT, T T }) = 0.5 × 0.5, ✓
a = 0, b = 1 P ({T H}) = 0.25 P ({T H, T T }) P ({HH, T H}) = 0.5 × 0.5, ✓
a = 1, b = 1 P ({HH}) = 0.25 P ({HH, HT }) P ({HH, T H}) = 0.5 × 0.5. ✓
Exercise 232. Flip two fair coins. Let X ∶ S → R indicate whether the two coin flips were
the same and Y ∶ S → R count the number of heads. Are X and Y independent random
variables? (Answer on p. 1163.)
Earlier we warned against blithely assuming that any two events are independent. Here we
can repeat this warning: Unless explicitly told (or you have a good reason), do not assume
that two random variables are independent.
The assumption of independence is a strong one. There are many scenarios where it is
plausible. For example, the flips of two coins are probably independent. The rolls of two
dice are probably independent.
There are, however, also many scenarios where it is not plausible. Today’s changes in
the share prices of Google and Apple are probably not independent. Today’s rainfall in
Singapore and in Kuala Lumpur are probably not independent.
Nonetheless, the assumption of independence is frequently — and incorrectly — made even
when it is implausible. The reason is that the maths is easy if we assume independence —
we can simply multiply probabilities together. Unfortunately, incorrectly assuming inde-
pendence has sometimes had tragic consequences, as we saw in the Sally Clark case.
Note that X takes on a value 1 with probability 1/6. Similarly, it takes on a value 2 with
probability 1/6. Etc. Hence, the expected value of X, denoted E [X] is given by:
E[X] = (1/6)⋅1 + (1/6)⋅2 + (1/6)⋅3 + (1/6)⋅4 + (1/6)⋅5 + (1/6)⋅6 = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5.
E[X] is thus simply a weighted average of the possible values of X, where the weights are
the probability weights.
That is, a random variable is discrete if it takes on finitely many possible values.
We can now formally define the expected value of a discrete random variable:
E[X] = ∑_{k ∈ Range(X)} P(X = k) ⋅ k.
We call E[X] the expected value (or mean) of X. We often write µX = E[X] or even
µ = E[X] (if it is clear from the context that we’re talking about the mean of X).
68
The correct definition is this: A random variable is discrete if its range is finite or countably-infinite. I avoid giving this
correct definition because this would require explaining what “countably-infinite” means.
E[X] = ∑_{k ∈ Range(X)} P(X = k) ⋅ k
= P(X = 1) ⋅ 1 + P(X = 2) ⋅ 2 + P(X = 3) ⋅ 3 + P(X = 4) ⋅ 4 + P(X = 5) ⋅ 5 + P(X = 6) ⋅ 6
= (1/6)⋅1 + (1/6)⋅2 + (1/6)⋅3 + (1/6)⋅4 + (1/6)⋅5 + (1/6)⋅6 = 3.5.
E[Y] = ∑_{k ∈ Range(Y)} P(Y = k) ⋅ k
= P(Y = 2) ⋅ 2 + P(Y = 3) ⋅ 3 + P(Y = 4) ⋅ 4 + P(Y = 5) ⋅ 5 + ⋯ + P(Y = 12) ⋅ 12
= (1/36)⋅2 + (2/36)⋅3 + (3/36)⋅4 + (4/36)⋅5 + (5/36)⋅6 + (6/36)⋅7 + (5/36)⋅8 + (4/36)⋅9 + (3/36)⋅10 + (2/36)⋅11 + (1/36)⋅12
= (2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12)/36 = 252/36 = 7.
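Both expected values above can be computed directly from the definition E[X] = ∑ P(X = k) ⋅ k. The sketch below assumes equally likely outcomes. The helper names distribution and expectation are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(outcomes, rv):
    """Map each value k of the random variable to P(rv = k), for equally likely outcomes."""
    counts = Counter(rv(s) for s in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in counts.items()}

def expectation(dist):
    """E[X] = sum of P(X = k) * k over all values k in the range of X."""
    return sum(p * k for k, p in dist.items())

one_die = list(range(1, 7))
two_dice = list(product(range(1, 7), repeat=2))

EX = expectation(distribution(one_die, lambda s: s))             # one die roll
EY = expectation(distribution(two_dice, lambda s: s[0] + s[1]))  # sum of two dice

print(EX, EY)  # 7/2 7
```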
As it turns out, it is generally true that E[X + Y ] = E[X] + E[Y ] (as we’ll see in the next
section). So if we knew this, then the problem would be very easy:
E[X + Y] = E[X] + E[Y] = 1 + 1/3 = 4/3.
But as an exercise, let’s pretend we don’t know that E[X + Y ] = E[X] + E[Y ]. We thus
have to work out E[X + Y ] the hard way:
P(X + Y = 0) = (1/2) ⋅ (1/2) ⋅ (5/6) ⋅ (5/6) = 25/144,
P(X + Y = 1) = C(2, 1) ⋅ (1/2) ⋅ (1/2) ⋅ (5/6) ⋅ (5/6) + (1/2) ⋅ (1/2) ⋅ C(2, 1) ⋅ (5/6) ⋅ (1/6) = 50/144 + 10/144 = 60/144,
where C(2, 1) = 2 is the binomial coefficient “2 choose 1”.
You are asked to complete the rest of this problem in the exercise below.
Exercise 233. Complete the above example by following these steps: (a) Compute
P (X + Y = 2). (b) Compute P (X + Y = 3). (c) Compute P (X + Y = 4). (d) Now com-
pute E[X + Y ]. (Answer on p. 1163.)
Example 536. Let 5 be a constant random variable on some experiment E = (S, Σ, P).
That is, 5 ∶ S → R is the function defined by s ↦ 5. (Note that the symbol 5 does double
duty by denoting both a function and a real number.) Then not surprisingly,
Function Number
↓ ↓
E [5] = 5 .
That is, on average, we expect the random variable 5 to take on the value 5.
Fact 71. If the constant random variable c maps every outcome to the number c, then
E[c] = c.
(Source: Singapore Pools, “Rules for the 4-D Game”, Version 1.11, 17/11/15. PDF.)
∑_{i=1}^{n} (a_i + b_i) = ∑_{i=1}^{n} a_i + ∑_{i=1}^{n} b_i and ∑_{i=1}^{n} (k a_i) = k ∑_{i=1}^{n} a_i.
Example 538. The differentiation operator d/dx is an example of a linear transformation,
because it satisfies both additivity and homogeneity of degree 1:
d/dx (f(x) + g(x)) = d/dx f(x) + d/dx g(x) and d/dx (k f(x)) = k d/dx f(x).
Example 539. The square-root operator √⋅ is not a linear transformation. In general, we
do not have
√(x + y) = √x + √y or √(kx) = k√x.
(x + y)² = x² + y² or (kx)² = kx².
Proposition 13. The expectation operator E is linear. That is, if X and Y are random
variables and c is a constant, then
(a) Additivity: E[X + Y ] = E [X] + E [Y ],
(b) Homogeneity of degree 1: E[cX] = cE [X].
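Note that Proposition 13 does not ask for X and Y to be independent. The sketch below checks both properties on a small example of my own choosing: flip two fair coins, let X indicate whether the first flip is heads and let Y count the number of heads. These two are clearly not independent (for instance, X = 1 rules out Y = 0). The helper name E is my own.

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]  # two fair coin flips, assumed equally likely

def E(rv):
    """Expected value of a random variable rv: S -> R under equally likely outcomes."""
    return sum(Fraction(1, len(S)) * rv(s) for s in S)

def X(s):
    return 1 if s[0] == "H" else 0  # indicates whether the first flip is heads

def Y(s):
    return s.count("H")             # counts the number of heads

# (a) Additivity: E[X + Y] = E[X] + E[Y].
print(E(lambda s: X(s) + Y(s)), E(X) + E(Y))  # 3/2 3/2

# (b) Homogeneity of degree 1: E[cX] = cE[X], here with c = 5.
print(E(lambda s: 5 * X(s)), 5 * E(X))        # 5/2 5/2
```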
Example 541. I stake $100 on each of two different 4D numbers for Saturday’s drawing
(“big” game). (So that’s $200 total.)
Let X and Y be my winnings (excluding my original stake) from the first and second
numbers (respectively). Now, X and Y are certainly not independent because for example,
if my first number wins first prize, then my second number cannot possibly also win first
prize.
Nonetheless, despite X and Y not being independent, the linearity of the expectation
operator tells us that
Example 542. Consider a random variable X that is equally likely to take on one of 5
possible values: 0, 1, 2, 3, 4. Its mean is
µX = ∑ P(X = k) ⋅ k = (1/5)⋅0 + (1/5)⋅1 + (1/5)⋅2 + (1/5)⋅3 + (1/5)⋅4 = 2.
Now consider another random variable Y that is equally likely to take on one of 5 possible
values: −8, −3, 2, 7, 12. Coincidentally, its mean is the same:
µY = ∑ P(Y = k) ⋅ k = (1/5)⋅(−8) + (1/5)⋅(−3) + (1/5)⋅2 + (1/5)⋅7 + (1/5)⋅12 = 2.
The random variables X and Y share the same mean. However, there is an obvious differ-
ence: Y is “more spread out”.
What, precisely, do we mean when we say that one random variable is “more spread out”
than another?
Our goal in this section is to invent a measure of “spread-outness”. We’ll call this the
variance and denote the variance of any random variable X by V [X].
It’s not at all obvious how the variance should be defined. One possibility is to define the
variance as the weighted average of the deviations from the mean.
V[X] = ∑ P(X = k) ⋅ (k − µ)
= (1/5)⋅(0 − µ) + (1/5)⋅(1 − µ) + (1/5)⋅(2 − µ) + (1/5)⋅(3 − µ) + (1/5)⋅(4 − µ)
= (1/5)⋅(0 − 2) + (1/5)⋅(1 − 2) + (1/5)⋅(2 − 2) + (1/5)⋅(3 − 2) + (1/5)⋅(4 − 2)
= −2/5 − 1/5 + 0 + 1/5 + 2/5 = 0.
Hmm. This works out to be 0. Is that just a weird coincidence? Let’s try the same for Y :
V[Y] = ∑ P(Y = k) ⋅ (k − µ)
= (1/5)⋅(−8 − µ) + (1/5)⋅(−3 − µ) + (1/5)⋅(2 − µ) + (1/5)⋅(7 − µ) + (1/5)⋅(12 − µ)
= (1/5)⋅(−8 − 2) + (1/5)⋅(−3 − 2) + (1/5)⋅(2 − 2) + (1/5)⋅(7 − 2) + (1/5)⋅(12 − 2)
= −2 − 1 + 0 + 1 + 2 = 0.
∑_k P(X = k) ⋅ (k − µ) = ∑_k P(X = k) ⋅ k − ∑_k P(X = k) ⋅ µ
= µ − µ ∑_k P(X = k) = µ − µ ⋅ 1 = 0,
where ∑_k P(X = k) ⋅ k = µ and ∑_k P(X = k) = 1.
So our first proposed definition of the variance — the weighted average of the deviations
from the mean — is always equal to 0. Intuitively, the reason is that the negative deviations
(corresponding to those values below the mean) exactly cancel out the positive deviations
(corresponding to those values above the mean).
This proposed definition is thus quite useless. We cannot use it to say things like Y is
“more spread out” than X.
This suggests a second approach: define the variance to be the weighted average of the
absolute deviations from the mean.
For X, the weighted average of the absolute deviations from the mean is
V[X] = ∑ P(X = k) ⋅ ∣k − µ∣
= (1/5)⋅∣0 − µ∣ + (1/5)⋅∣1 − µ∣ + (1/5)⋅∣2 − µ∣ + (1/5)⋅∣3 − µ∣ + (1/5)⋅∣4 − µ∣
= (1/5)⋅∣0 − 2∣ + (1/5)⋅∣1 − 2∣ + (1/5)⋅∣2 − 2∣ + (1/5)⋅∣3 − 2∣ + (1/5)⋅∣4 − 2∣
= 2/5 + 1/5 + 0 + 1/5 + 2/5 = 6/5.
V[Y] = ∑ P(Y = k) ⋅ ∣k − µ∣
= (1/5)⋅∣−8 − µ∣ + (1/5)⋅∣−3 − µ∣ + (1/5)⋅∣2 − µ∣ + (1/5)⋅∣7 − µ∣ + (1/5)⋅∣12 − µ∣
= (1/5)⋅∣−8 − 2∣ + (1/5)⋅∣−3 − 2∣ + (1/5)⋅∣2 − 2∣ + (1/5)⋅∣7 − 2∣ + (1/5)⋅∣12 − 2∣
= 2 + 1 + 0 + 1 + 2 = 6.
Wonderful! So we can now use this second proposed definition of the variance to say things
like “Y is more spread out than X”.
This second proposed definition seems perfectly satisfactory. Yet for some bizarre reason,
we won’t use it! Instead, we’ll define the variance to be the weighted average of the
squared deviations from the mean.
For X, the weighted average of the squared deviations from the mean is
V[X] = ∑ P(X = k) ⋅ (k − µ)²
= (1/5)⋅(0 − µ)² + (1/5)⋅(1 − µ)² + (1/5)⋅(2 − µ)² + (1/5)⋅(3 − µ)² + (1/5)⋅(4 − µ)²
= (1/5)⋅(0 − 2)² + (1/5)⋅(1 − 2)² + (1/5)⋅(2 − 2)² + (1/5)⋅(3 − 2)² + (1/5)⋅(4 − 2)²
= 4/5 + 1/5 + 0 + 1/5 + 4/5 = 2.
Formally,
Definition 118. Let µ = E[X]. Then the variance operator is denoted V and is the
function that maps each random variable X to a real number c, given by the mapping rule
V[X] = E[(X − µ)²].
We call V[X] the variance of X. This is often also instead written as σX² or even more
simply as σ² (if it is clear from the context that we’re talking about the variance of X).
So to calculate the variance, we do this: Consider all the possible values that X can take.
Take the difference between these values and the mean of X. Square them. Then take the
probability-weighted average of these squared numbers.
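The recipe just described translates almost word for word into code. The sketch below applies it to the random variables X and Y of Example 542. The helper names mean and variance, and the representation of a random variable as a dictionary from values to probabilities, are my own choices.

```python
from fractions import Fraction

def mean(dist):
    """E[X] = sum of P(X = k) * k, where dist maps each value k to P(X = k)."""
    return sum(p * k for k, p in dist.items())

def variance(dist):
    """V[X] = probability-weighted average of the squared deviations from the mean."""
    mu = mean(dist)
    return sum(p * (k - mu) ** 2 for k, p in dist.items())

X = {k: Fraction(1, 5) for k in (0, 1, 2, 3, 4)}
Y = {k: Fraction(1, 5) for k in (-8, -3, 2, 7, 12)}

print(mean(X), variance(X))  # 2 2
print(mean(Y), variance(Y))  # 2 50
```

So under the squared-deviation definition, Y (variance 50) is indeed “more spread out” than X (variance 2), even though both have mean 2.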
More examples:
V[X] = E[(X − µ)²] = E[(X − 3.5)²]
= (1/6)(2.5² + 1.5² + 0.5² + 0.5² + 1.5² + 2.5²) = 35/12 ≈ 2.92.
So the variance of the die roll is 35/12 ≈ 2.92. This means that the expected squared deviation
of X from its mean µ = 3.5 is 35/12 ≈ 2.92.
Example 544. Roll two fair dice. Let the random variable Y be the sum of the two dice.
We already know from Example 534 that µ = 7. So, using also our findings from Exercise
228,
V[Y] = E[(Y − µ)²] = E[(Y − 7)²]
= (1/36)⋅5² + (2/36)⋅4² + (3/36)⋅3² + (4/36)⋅2² + (5/36)⋅1² + (6/36)⋅0² + (5/36)⋅1²
+ (4/36)⋅2² + (3/36)⋅3² + (2/36)⋅4² + (1/36)⋅5²
= 2(25 + 32 + 27 + 16 + 5)/36 = 210/36 = 70/12 ≈ 5.83.
So the variance of the sum of two dice is 70/12 ≈ 5.83. This means that on average, the square
of the deviation of Y from its mean µ = 7 is 70/12 ≈ 5.83.
As the above examples suggest, calculating the variance can be tedious. Fortunately, there
is a shortcut:
Fact 72. Let X be a random variable with mean µ = E[X]. Then V[X] = E[X²] − µ².
Proof. Using the definition of variance, the linearity of the expectation operator (Proposi-
tion 13), and the fact that µ is a constant, we have
V[X] = E[(X − µ)²] = E[X² + µ² − 2Xµ] = E[X²] + E[µ²] − 2E[Xµ]
= E[X²] + µ² − 2µE[X] = E[X²] + µ² − 2µ ⋅ µ = E[X²] − µ².
Example 543 (continued from above). Let the random variable X be the outcome of
the roll of a fair die. We already know that µ = 3.5. So compute
E[X²] = P(X = 1) ⋅ 1² + P(X = 2) ⋅ 2² + ⋯ + P(X = 6) ⋅ 6² = (1/6)(1² + 2² + ⋯ + 6²) = 91/6.
Hence, V[X] = E[X²] − µ² = 91/6 − 3.5² = 182/12 − 147/12 = 35/12.
Example 544 (continued from above). Let the random variable Y be the sum of two
rolled dice. We already know from Example 534 that µ = 7. So, using also our findings
from Exercise 228,
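$$\begin{aligned}
E[Y^2] &= P(Y = 2)\cdot 2^2 + P(Y = 3)\cdot 3^2 + \dots + P(Y = 12)\cdot 12^2 \\
&= \frac{1}{36}\cdot 2^2 + \frac{2}{36}\cdot 3^2 + \frac{3}{36}\cdot 4^2 + \dots + \frac{1}{36}\cdot 12^2 = \frac{1974}{36}.
\end{aligned}$$
Hence, $V[Y] = E[Y^2] - \mu^2 = \frac{1974}{36} - 7^2 = \frac{1974}{36} - \frac{1764}{36} = \frac{210}{36} = \frac{70}{12} \approx 5.83$.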
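As a cross-check on Examples 543 and 544, here is a minimal Python sketch that computes each variance twice: once from the definition and once from the shortcut. The function names and the dictionary representation of a probability distribution are illustrative choices, not notation from this textbook. Exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction
from itertools import product


def mean(dist):
    """Expected value of a distribution given as {value: probability}."""
    return sum(p * x for x, p in dist.items())


def variance_by_definition(dist):
    """V[X] = E[(X - mu)^2]."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())


def variance_by_shortcut(dist):
    """V[X] = E[X^2] - mu^2."""
    mu = mean(dist)
    return sum(p * x ** 2 for x, p in dist.items()) - mu ** 2


# X: outcome of one fair die.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Y: sum of two fair dice, built by listing all 36 equally likely outcomes.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, Fraction(0)) + Fraction(1, 36)

print(variance_by_definition(die), variance_by_shortcut(die))
print(variance_by_definition(two_dice), variance_by_shortcut(two_dice))
```

Both methods should agree: 35/12 for one die, and 35/6 (which is the same number as 70/12) for the sum of two dice.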
Exercise 235. Let the random variable Z be the sum of three rolled dice. Find V[Z].
(Answer on p. 1165.)
A constant random variable cannot vary. So not surprisingly, the variance of a constant
random variable is 0.
Fact 73. Let c be a constant random variable (i.e. it maps every outcome to the real number
c). Then
V[c] = 0.
Proof. Use Fact 594: $V[c] = E[c^2] - (E[c])^2 = c^2 - c^2 = 0$.
Let X be a random variable. Then E [X] has the same unit of measure as X. In contrast,
V [X] uses the squared unit.
Example 545. There are 100 dumbbells in a gym, of which 30 have weight 5 kg and the
remaining 70 have weight 10 kg. Let X be the weight of a randomly-chosen dumbbell.
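Then the mean of X is $E[X] = 0.3\cdot 5 + 0.7\cdot 10 = 8.5$ kg, while the variance of X is $V[X] = 0.3\cdot(5 - 8.5)^2 + 0.7\cdot(10 - 8.5)^2 = 5.25$ kg$^2$.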
To get a measure of “spread” that uses the original unit of measure, we simply take the square root of the variance. This is called the standard deviation.
Definition 119. Let X be a random variable and V[X] be its variance. Then the standard
deviation of X is defined as
$$SD[X] = \sqrt{V[X]}.$$
The variance of a random variable X is often denoted $\sigma_X^2$ or even more simply as $\sigma^2$ (if it is clear from the context that we’re talking about the variance of X).
Correspondingly, the standard deviation of X is often denoted $\sigma_X$ or $\sigma$.
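To see the units at work, here is a short Python sketch for the dumbbells of Example 545 (30 weigh 5 kg, 70 weigh 10 kg). The variable names are illustrative only. The mean comes out in kg, the variance in kg², and the standard deviation is back in kg.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X, the weight (in kg) of a randomly chosen dumbbell.
weights = {5: Fraction(30, 100), 10: Fraction(70, 100)}

mean_kg = sum(p * x for x, p in weights.items())                         # kg
variance_kg2 = sum(p * (x - mean_kg) ** 2 for x, p in weights.items())   # kg^2
sd_kg = sqrt(float(variance_kg2))                                        # kg

print(mean_kg, variance_kg2, round(sd_kg, 4))
```

The mean is 17/2 = 8.5 kg and the variance is 21/4 = 5.25 kg², so the standard deviation is the square root of 5.25, roughly 2.29 kg.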
Exercise 236. There are 100 rulers in a bookstore, of which 35 have length 20 cm and the remaining 65 have length 30 cm. Let Y be the length of a randomly-chosen ruler. Find the mean, variance, and standard deviation of Y. (Be sure to include the units of measurement.) (Answer on p. 1165.)
The variance operator is not linear. However, given independence, the variance operator
does satisfy additivity and homogeneity of degree 2.
Proposition 14. Let X and Y be independent random variables and c be a constant. Then
(a) Additivity: V[X + Y ] = V [X] + V [Y ],
(b) Homogeneity of degree 2: $V[cX] = c^2 V[X]$.
With the above, it becomes much easier than before to find the variance of the sum of 2
dice, 3 dice, or indeed n dice.
Now roll three fair dice. Let $X_3$, $X_4$, and $X_5$ be the respective outcomes. Let Z be the sum of the three dice (i.e. $Z = X_3 + X_4 + X_5$). Again, assuming independence, we have
$$V[Z] = V[X_3 + X_4 + X_5] = V[X_3] + V[X_4] + V[X_5] = \frac{105}{12}.$$
Again, compare this quick computation to the work you had to do in Exercise 235!
Now, let A be double the outcome of a die roll (i.e. A = 2X). Note importantly that A ≠ Y .
Y is the sum of two independent die rolls. In contrast, A is double the outcome of a single
die roll. Indeed, by Proposition 14, we see that
$$V[A] = V[2X] = 4V[X] = \frac{140}{12} \neq V[Y].$$
Similarly, let B be triple the outcome of a die roll (i.e. B = 3X). Note importantly that
B ≠ Z. Z is the sum of three independent die rolls. In contrast, B is triple the outcome of
a single die roll. Indeed, by Proposition 14, we see that
$$V[B] = V[3X] = 9V[X] = \frac{315}{12} \neq V[Z].$$
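The contrast between adding independent dice and scaling a single die can be checked by brute force. The sketch below (helper name `variance` is an illustrative choice) lists every equally likely outcome and computes the variance directly, without using Proposition 14.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Variance of a random variable equally likely to take each listed value."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n


faces = range(1, 7)

var_one_die = variance(list(faces))                                        # V[X]
var_two_dice = variance([a + b for a, b in product(faces, repeat=2)])      # V[Y]
var_three_dice = variance([sum(r) for r in product(faces, repeat=3)])      # V[Z]
var_double = variance([2 * a for a in faces])                              # V[A] = V[2X]
var_triple = variance([3 * a for a in faces])                              # V[B] = V[3X]

print(var_one_die, var_two_dice, var_three_dice, var_double, var_triple)
```

The sums of independent dice have variances 2V[X] and 3V[X], whereas doubling or tripling one die gives 4V[X] and 9V[X].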
Exercise 237. The weight of a fish in a pond is a random variable with mean µ kg and variance $\sigma^2$ kg$^2$. (Include the units of measurement in your answers.) (Answer on p. 1165.)
(a) If two fish are caught and the weights of these fish are independent of each other, what
are the mean and variance of the total weight of the two fish?
(b) If one fish is caught and an exact clone is made of it, what are the mean and variance
of the total weight of the fish and its clone?
(c) If two fish are caught and the weights of these fish are not independent of each other,
what are the mean and variance of the total weight of the two fish?
Why is the variance defined as the weighted average of squared deviations from the mean?
1. First, we tried defining the variance as the weighted average of deviations from the mean,
i.e. V[X] = E[X − µ]. But this was no good, because this quantity would always be equal to 0 (see footnote 69).
2. Next, we tried defining the variance as the weighted average of absolute deviations from
the mean, i.e. V[X] = E [∣X − µ∣]. This seemed to work well enough. But yet for some
bizarre reason, we choose not to use this definition.
3. Instead, we choose to use this definition:
$$V[X] = E\left[(X - \mu)^2\right].$$
Why do we prefer using squared (rather than absolute) deviations as our definition of
variance? The conventional view is that the squared deviations definition is superior to
the absolute deviations definition (but see Gorard (2005) and Taleb (2014) for dissenting
views). Here are some reasons for believing the squared deviations definition to be superior:
– The algebra is easier when dealing with squares than with absolute values.
– Differentiation is easier (observe that $x^2$ is differentiable everywhere but ∣x∣ is not differentiable at 0).
– Variances are additive: If X and Y are independent, then V [X + Y ] = V [X] + V [Y ].
In contrast, if we use the definition V[X] = E [∣X − µ∣], then variances are no longer
additive.
• Tradition (inertia).
– A century or two ago, some Europeans preferred using squared to absolute deviations.
And so we’re stuck with using this.
See also these Stack Exchange Q&A discussions: [1], [2], [3], [4], and [5].
Footnote 69: This is easily proven: E[X − µ] = E[X] − E[µ] = µ − µ = 0.
Here’s another example of a probability problem that can be stated very simply, yet has counter-intuitive results.
Example 547. Keep flipping a fair coin until you get a sequence of HH (two heads in a
row). Let X be the number of flips taken.
Now, keep flipping a fair coin until you get a sequence of HT . Let Y be the number of flips
taken.
Intuition might suggest that “obviously”, $\mu_X = \mu_Y$. Intuition would be wrong. It turns out that, surprisingly enough, $\mu_X = 6$ and $\mu_Y = 4$!
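A simulation makes the claim in Example 547 easier to believe. The Python sketch below is only a Monte Carlo estimate under assumed settings (50,000 trials per pattern, a fixed seed), so it will return values close to 6 and 4 rather than exactly those numbers. The function names are illustrative.

```python
import random


def flips_until(pattern, rng):
    """Flip a fair coin until the most recent flips spell `pattern`; return the flip count."""
    recent = ""
    count = 0
    while not recent.endswith(pattern):
        # Keep only as many recent flips as the pattern is long.
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
    return count


def average_wait(pattern, trials=50_000, seed=0):
    """Estimate the expected number of flips needed to see `pattern`."""
    rng = random.Random(seed)
    return sum(flips_until(pattern, rng) for _ in range(trials)) / trials


mean_hh = average_wait("HH")
mean_ht = average_wait("HT")
print(mean_hh, mean_ht)
```

One way to build intuition for the gap: after an H, a failed attempt at HT (another H) still leaves you one flip away from success, whereas a failed attempt at HH (a T) sends you back to the start.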
Example 548. Now suppose we flip a fair coin 10,001 times. This gives us a sequence of 10,000 pairs of consecutive coin-flips.
For example, if the 10,001 coin-flips are HHTHT…, then the first four pairs of consecutive coin-flips are HH, HT, TH, and HT.
Let A be the proportion of the 10,000 pairs of consecutive coin-flips that are HH. Let B be the proportion of the 10,000 pairs of consecutive coin-flips that are HT.
In the previous example, we saw that it took, on average, 6 flips before getting HH and 4 flips before getting HT. So “obviously”, we’d expect a smaller proportion to be HH’s. That is, $\mu_A < \mu_B$.
Sadly, we would again be wrong! It turns out that $\mu_A = \mu_B = 1/4$! This Google spreadsheet simulates 10,001 coin-flips and calculates A and B.
If you’re interested, the results given in the above two examples are formally proven in Fact
103 in the Appendices.
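The experiment in Example 548 is easy to reproduce. The sketch below is a single simulated run with an arbitrary fixed seed (an assumption made for repeatability), so A and B will each land near 1/4 rather than exactly on it.

```python
import random


def pair_proportions(num_flips=10_001, seed=0):
    """Flip a fair coin num_flips times; return the proportions of HH and HT pairs."""
    rng = random.Random(seed)
    flips = [rng.choice("HT") for _ in range(num_flips)]
    # Consecutive pairs: (flip 1, flip 2), (flip 2, flip 3), and so on.
    pairs = [a + b for a, b in zip(flips, flips[1:])]
    return pairs.count("HH") / len(pairs), pairs.count("HT") / len(pairs)


A, B = pair_proportions()
print(A, B)
```

Each pair of consecutive flips is HH with probability 1/4 and HT with probability 1/4, which is why both proportions have mean 1/4 even though the waiting times in Example 547 differ.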
Example 549. Flip a coin. We can model this with a Bernoulli trial with probability of
success (heads) 0.5:
Formally:
Note that we can denote the two elements of the sample space with any symbols. We could
use 0 — standing for failure — and 1 — standing for success. Or we could use T and H,
as was done in the example above.
Example 551. 90% of H2 Maths students pass their H2 Maths A-level exams. We ran-
domly pick a H2 Maths student and see if she passes her H2 Maths A-level exam.
We can model this with a Bernoulli trial with probability of success 0.9:
• Sample space S = {F, P },
• Event space Σ = {∅, {F }, {P }, S},
• Probability function P({F }) = 0.1 and P({P }) = 0.9.
The corresponding Bernoulli random variable is simply the random variable Y ∶ S → R
defined by Y ({F }) = 0 and Y ({P }) = 1. Its probability distribution is given by P (Y = 0) =
0.1 and P(Y = 1) = 0.9.
Fact 74. A Bernoulli random variable T with probability of success p has mean p and
variance p(1 − p).
Proof. E[T ] = P (T = 0) ⋅ 0 + P (T = 1) ⋅ 1 = (1 − p) ⋅ 0 + p ⋅ 1 = p.
For the variance, first compute
$$E[T^2] = P(T = 0)\cdot 0^2 + P(T = 1)\cdot 1^2 = (1 - p)\cdot 0 + p\cdot 1^2 = p.$$
Hence, $V[T] = E[T^2] - (E[T])^2 = p - p^2 = p(1 - p)$.
Informally, the binomial random variable simply counts the number of successes in a
sequence of n identical, but independent Bernoulli trials.
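X is an example of a binomial random variable with parameters 3 and $\frac{1}{2}$.
X can take on values 0, 1, 2, or 3 (corresponding to the number of heads).
$$P(X = 0) = \binom{3}{0}\left(\frac{1}{2}\right)^0\left(\frac{1}{2}\right)^3 = \frac{1}{8}, \qquad P(X = 1) = \binom{3}{1}\left(\frac{1}{2}\right)^1\left(\frac{1}{2}\right)^2 = \frac{3}{8},$$
$$P(X = 2) = \binom{3}{2}\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^1 = \frac{3}{8}, \qquad P(X = 3) = \binom{3}{3}\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^0 = \frac{1}{8}.$$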
Formally:
$$X = T_1 + T_2 + \dots + T_n.$$
$$P(Y = 0) = \binom{2}{0}\, 0.9^0\, 0.1^2 = 0.01,$$
$$P(Y = 1) = \binom{2}{1}\, 0.9^1\, 0.1^1 = 0.18,$$
$$P(Y = 2) = \binom{2}{2}\, 0.9^2\, 0.1^0 = 0.81.$$
In words, the probability that both fail is 0.01, the probability that exactly one passes is
0.18, and the probability that both pass is 0.81.
$$P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n-k}.$$
In summary:
$$P(X = k) = \binom{n}{k}\, p^k (1 - p)^{n-k}.$$
Example 554. Let X be the number of heads when 10 fair coins are flipped.
Then X ∼ B(10, 0.5). And the probability that exactly 8 coins are heads is:
$$P(X = 8) = \binom{10}{8}\, 0.5^8\, 0.5^2 = \frac{45}{1024}.$$
$$= \binom{20}{18}\, 0.9^{18}\, 0.1^{2} + \binom{20}{19}\, 0.9^{19}\, 0.1^{1} + \binom{20}{20}\, 0.9^{20}\, 0.1^{0} \approx 0.677.$$
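The binomial formula translates directly into code. Here is a minimal Python sketch (the function name `binomial_pmf` is an illustrative choice) that reproduces the value 45/1024 from Example 554 and the sum of the three terms with n = 20 and p = 0.9 shown just above.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): choose k successes out of n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Example 554: exactly 8 heads in 10 fair coin-flips.
p_eight_heads = binomial_pmf(8, 10, 0.5)

# The three-term sum above: k = 18, 19, 20 with n = 20 and p = 0.9.
p_at_least_18 = sum(binomial_pmf(k, 20, 0.9) for k in (18, 19, 20))

print(p_eight_heads, round(p_at_least_18, 3))
```

`math.comb(n, k)` is the binomial coefficient "n choose k" and is available in Python 3.8 and later.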
Example 556. Problem: Three machines each have, independently, probability 0.3 of fail-
ure. What is the expected number of failures? What is the variance of the number of
failures?
Solution: Let Z ∼ B(3, 0.3) be the number of failures. Then
Hence,
$$\begin{aligned}
E[Z] &= P(Z = 1)\cdot 1 + P(Z = 2)\cdot 2 + P(Z = 3)\cdot 3 \\
&= \binom{3}{1}\, 0.3^1\, 0.7^2\cdot 1 + \binom{3}{2}\, 0.3^2\, 0.7^1\cdot 2 + \binom{3}{3}\, 0.3^3\, 0.7^0\cdot 3 \\
&= 0.441 + 0.378 + 0.081 = 0.9.
\end{aligned}$$
Now,
$$\begin{aligned}
E[Z^2] &= P(Z = 1)\cdot 1^2 + P(Z = 2)\cdot 2^2 + P(Z = 3)\cdot 3^2 \\
&= \binom{3}{1}\, 0.3^1\, 0.7^2\cdot 1^2 + \binom{3}{2}\, 0.3^2\, 0.7^1\cdot 2^2 + \binom{3}{3}\, 0.3^3\, 0.7^0\cdot 3^2 \\
&= 0.441 + 0.756 + 0.243 = 1.44.
\end{aligned}$$
Hence, $V[Z] = E[Z^2] - (E[Z])^2 = 1.44 - 0.9^2 = 0.63$.
It turns out though that there is a much quicker formula for finding the mean and variance
of any binomial random variable.
(You can verify that this formula works for the last example: n = 3, p = 0.3, and thus
E[Z] = np = 0.9.)
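The check suggested above can also be done numerically. The sketch below computes the mean and variance of B(n, p) the long way, straight from the probabilities, and compares them with np and np(1 − p). The variance formula np(1 − p) is the standard one for the binomial (it also follows from Fact 74 together with the additivity in Proposition 14); it is stated here as an assumption because the formula itself is not reproduced in the text above.

```python
from math import comb


def binomial_mean_and_variance(n, p):
    """Mean and variance of B(n, p), computed directly from the probabilities."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    second_moment = sum(k ** 2 * q for k, q in enumerate(pmf))
    return mean, second_moment - mean ** 2


n, p = 3, 0.3  # Example 556: three machines, each failing with probability 0.3
mean_z, var_z = binomial_mean_and_variance(n, p)

print(mean_z, n * p)            # both should be about 0.9
print(var_z, n * p * (1 - p))   # both should be about 0.63
```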
SYLLABUS ALERT
The Poisson distribution is in the 9740 (old) syllabus, but not in the 9758 (revised) syllabus.
So you can skip this chapter if you’re taking 9758.
The Poisson process is the continuous-time analogue of the Bernoulli process (see footnote 70). And in parallel, the Poisson random variable is the limit of the binomial random variable.
Example 557. The long-term average number of murders per year in Singapore is 2.4.
How might we model the rate at which murders are committed in Singapore?
Let’s assume that the rate at which murders are committed satisfies two properties:
1. (Time-homogeneity.) The probability that there are k murders in any fixed time
interval is constant.
For example, the probability that there are 2 murders in the first 90 days of the year, is
the same as the probability that there are 2 murders in the last 90 days of the year. As
another example, the probability that there is 1 murder on January 10th is the same as the
probability that there is 1 murder on August 5th.
2. (Independence.) The probability that there is a murder at any given moment does
not depend on the number of murders that have already been committed that year.
For example, the probability that there is a murder in December does not depend on how
many murders were committed between January and November.
Then an appropriate model might be the Bernoulli process. Let us say that each month,
there is a murder with probability 2.4/12 = 0.2, and no murder with probability 0.8. The
number of murders each month may thus be modelled by a Bernoulli random variable T
with parameter 0.2.
By assumption, the number of murders in one month has no influence on the number of
murders in another month. Thus, the number of murders in a given year can be modelled
by the binomial random variable X ∼ B(12, 0.2). Equivalently, $X = T_1 + T_2 + \dots + T_{12}$.
(Notice the number 0.2 was chosen so that E[X] = np = 12×0.2 = 2.4 matches the long-term
average number of murders per year.)
This model is reasonably good, but suffers from at least two flaws: It implicitly assumes
that
• In any given month, there can be at most 1 murder; and
• In any given year, there can be at most 12 murders.
These two implicit assumptions are somewhat unrealistic.
In the above model, what we did was to partition the year into 12 time intervals. If we instead partitioned the year into 365 time intervals, the above two implicit assumptions would be relaxed.
Let’s say that each day, there is probability 2.4/365 ≈ 0.00658 of a murder and probability
1−2.4/365 ≈ 0.99342 of no murders. The number of murders each day may thus be modelled
by a Bernoulli random variable U with parameter 2.4/365.
Thus, the number of murders in a given year can be modelled by the binomial random variable $Y \sim B\left(365, \frac{2.4}{365}\right)$. Equivalently,
$$Y = U_1 + U_2 + \dots + U_{365}.$$
(Again, the number $\frac{2.4}{365}$ was deliberately chosen so that $E[Y] = np = 365 \times \frac{2.4}{365} = 2.4$ matches the long-term average number of murders per year.)
This second model $Y \sim B\left(365, \frac{2.4}{365}\right)$ is probably better than the first model X ∼ B(12, 0.2).
But why stop at partitioning the year into 365 days?
In general, we can model the number of murders by the binomial random variable $Z \sim B\left(n, \frac{2.4}{n}\right)$. Taking the above reasoning to the extreme, we can instead partition the year into infinitely-many infinitely-short time intervals. That is, we can let n → ∞. And as it turns out, as n → ∞, the binomial random variable Z approaches something called the Poisson random variable with parameter 2.4. That is,
$$\lim_{n\to\infty} Z \sim \text{Po}(2.4).$$
The following result establishes that the limit of a binomial random variable is a Poisson random variable (see footnote 71).
Theorem 13. Let λ > 0. Let $X_n \sim B\left(n, \frac{\lambda}{n}\right)$. Let $Y = \lim_{n\to\infty} X_n$. Then Y is a Poisson random variable with parameter λ.
Proof. This proof is actually also not too difficult. It just involves some algebra and ma-
nipulation of limits. But as usual, I’ll put it in the Appendices (on p. 988).
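Theorem 13 can be watched in action numerically. The sketch below compares the probabilities of B(n, 2.4/n) with those of Po(2.4) for n = 12, 365, and 100,000, and reports the largest gap over k = 0, 1, …, 12. It assumes the standard Poisson probabilities P(Y = k) = λ^k e^(−λ)/k!; the function names and the cut-off at k = 12 are illustrative choices.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Po(lam), using the standard Poisson probabilities."""
    return lam ** k * exp(-lam) / factorial(k)


def max_gap(n, lam=2.4, k_max=12):
    """Largest difference between the B(n, lam/n) and Po(lam) probabilities for k <= k_max."""
    return max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


for n in (12, 365, 100_000):
    print(n, max_gap(n))
```

The gap shrinks as n grows, which is the sense in which the monthly model, the daily model, and ever finer partitions approach the Poisson model.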
Footnote 71: By the way, the Poisson random variable is a discrete random variable because although its range is not finite, its range is countably-infinite.
The Poisson random variable is typically used to model the number of “occurrences” or
“arrivals” of some phenomenon, within a given timespan or space. We already saw one
example where it could be deployed (murders in Singapore).
In general, the Poisson random variable is an appropriate model if:
Example 558. Consider the number of goals scored in a given football match. Arguably,
an appropriate model for this number is a Poisson random variable, because arguably,
Suppose that, on average, the number of goals scored in a football match is 2.3. We can
model the number of goals scored with the Poisson random variable X ∼ Po(λ = 2.3).
By definition of the Poisson random variable, the probability that 0 goals are scored is $P(X = 0) = \frac{2.3^0 e^{-2.3}}{0!} = e^{-2.3} \approx 0.100$.
Now, no model is perfect. We do not know and may never know the exact processes
governing when a goal will be scored, a public mass shooting committed, or a supernova
observed. Nonetheless, we can argue that the Poisson random variable works reasonably
well as a model. We can make use of this model to analyse the phenomenon at hand.
If we choose not to use the Poisson random variable, then our alternatives are to:
• Find some alternative model that works better than the Poisson random variable.
• Shrug our shoulders and say that the phenomenon cannot be analysed mathematically.
The first alternative, if it exists, is great. The second is anti-intellectual and not very useful.
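Once a Poisson model is adopted, its probabilities are straightforward to compute. The sketch below assumes the standard Poisson probabilities P(X = k) = λ^k e^(−λ)/k! and applies them to the football model X ∼ Po(2.3) from Example 558. The function names are illustrative, and the choice of k = 0 and k ≤ 2 is only for demonstration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return lam ** k * exp(-lam) / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), as a finite sum of probabilities."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


p_no_goals = poisson_pmf(0, 2.3)          # equals e^(-2.3)
p_at_most_two_goals = poisson_cdf(2, 2.3)
p_more_than_two_goals = 1 - p_at_most_two_goals

print(p_no_goals, p_at_most_two_goals, p_more_than_two_goals)
```

Probabilities of the form "more than k" are found by subtracting P(X ≤ k) from 1, since the Poisson random variable has no upper limit on the values it can take.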
Exercise 240. This exercise revisits Example 559. Suppose the number of public mass
shootings in the US in a given year can be modelled by X, a Poisson random variable
with parameter λ = 4.2. Compute the probability that there are more than 5 public mass
shootings in the US in a given year. (Answer on p. 1167.)
Exercise 241. This exercise revisits Example 560. Suppose the number of supernovae
observed in a millennium can be modelled by Y , a Poisson random variable with param-
eter λ = 3.7. Compute the probability that there are no supernovae observed in a given
millennium. (Answer on p. 1167.)
It turns out that interestingly (and conveniently) enough, the mean and variance of X ∼
Po(λ) are both equal to λ.
Proof. The proof is actually not too difficult, given what we know about Maclaurin series.
But as usual, I’ll put it in the Appendices (p. 987).
This implies that if X ∼ B(n, p), n is “large enough”, and p is “small enough”, then the random variable Y ∼ Po(λ = np) serves as a “good” approximation for the random variable X.
The following example illustrates why we might want to approximate the binomial distri-
bution with the Poisson distribution.
In the old days, it would have been tedious to compute the above probability. So one might
instead have preferred to use the Poisson approximation.
Now, it would have been easy to find P(Y ≤ 10), because one would have had a print copy
of a Poisson table, partly reproduced below. A Poisson table tells us what the value of
P(Y ≤ k) is, for various possible values of λ and the number k, given that Y ∼ Po(λ). (For
the full table, see sheet titled “Poisson Table” at the usual link.)
Reading off the table, we have P(Y ≤ 10) ≈ 0.9574. We thus conclude: The probability that
at most 10 machines break down in a given month is approximately 0.9574.
You might wonder, “Well, weren’t there similarly also binomial tables that one could read
off of? If so, why then would we need to go through the trouble of approximating the
binomial with the Poisson and then refer to the Poisson table? We could just directly refer
to the binomial tables.”
Now, observe that to print a Poisson table, we need only be concerned with the Poisson
parameter λ and the number k. That’s a total of 2 parameters. So we can, in a single
table, list a lot of information.
In contrast, to print a binomial table, we have two binomial parameters n and p, and in
addition the number k. That’s a total of 3 parameters. So a binomial table really involved
multiple binomial tables, one for each value of n! (See this example.) Typically, the tables
would end at some small-ish value of n (20 in the linked table). Whereas in this particular
example, we would have needed the binomial tables all the way to n = 300!
And so, even though there were binomial tables, these were limited and would typically
not have furnished the desired information. This, then, was one big reason for using the
Poisson approximation, at least in the old days.
But today, it is no more difficult to compute P(X ≤ 10) than it is to compute P(Y ≤ 10).
For example, using my spreadsheet titled “Binomial” (at the usual link), one can simply
punch in n = 300 and p = 0.02 and read off that the exact solution to our problem P(X ≤ 10)
is approximately 0.9590.
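Both numbers quoted above can be reproduced without tables or a spreadsheet. The sketch below computes the exact binomial probability P(X ≤ 10) for X ∼ B(300, 0.02) and its Poisson approximation P(Y ≤ 10) for Y ∼ Po(6). It assumes the standard Poisson probabilities λ^k e^(−λ)/k!, and the function names are illustrative.

```python
from math import comb, exp, factorial


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k, lam):
    """P(Y <= k) for Y ~ Po(lam)."""
    return sum(lam ** i * exp(-lam) / factorial(i) for i in range(k + 1))


n, p = 300, 0.02
exact = binomial_cdf(10, n, p)       # binomial value, about 0.9590
approx = poisson_cdf(10, n * p)      # Poisson approximation with lambda = np = 6

print(round(exact, 4), round(approx, 4), abs(exact - approx))
```

The two answers differ by less than 0.002, which gives a feel for how "good" the approximation is when n = 300 and p = 0.02.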
Exercise 242. Suppose the number of deaths by lightning strikes in Singapore in a given year can be modelled by the random variable $X \sim B\left(5500000, 10^{-6}\right)$. (Answer on p. 1168.)
(a) What is an appropriate interpretation of the numbers 5500000 and $10^{-6}$?
(b) Using a suitable approximation (and justify your use of this approximation), find the
probability that at least 5 people are killed by lightning strikes in Singapore in a given year.
Formally:
Theorem 14. Suppose X ∼ Po (λ) and Y ∼ Po (µ) are independent Poisson random vari-
ables. Then X + Y ∼ Po (λ + µ).
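Proof. (Optional.) We’ll prove that the probability distribution of X + Y is that of the Poisson random variable with parameter λ + µ.
$$\begin{aligned}
P(X + Y = k) &= \sum_{i=0}^{k} P(X + Y = k, X = i) \\
&= \sum_{i=0}^{k} P(Y = k - i, X = i) \overset{1}{=} \sum_{i=0}^{k} P(Y = k - i)\, P(X = i) \\
&= \sum_{i=0}^{k} p_Y(k - i)\, p_X(i) = \sum_{i=0}^{k} \frac{\mu^{k-i} e^{-\mu}}{(k - i)!}\cdot\frac{\lambda^{i} e^{-\lambda}}{i!} \\
&= e^{-(\lambda+\mu)} \sum_{i=0}^{k} \frac{1}{(k - i)!\, i!}\, \mu^{k-i} \lambda^{i} \\
&= \frac{e^{-(\lambda+\mu)}}{k!} \sum_{i=0}^{k} \frac{k!}{(k - i)!\, i!}\, \mu^{k-i} \lambda^{i} \\
&= \frac{e^{-(\lambda+\mu)}}{k!} \sum_{i=0}^{k} \binom{k}{i} \mu^{k-i} \lambda^{i} \overset{2}{=} \frac{e^{-(\lambda+\mu)}}{k!}\, (\lambda + \mu)^k,
\end{aligned}$$
where $\overset{1}{=}$ uses the independence of X and Y and $\overset{2}{=}$ uses Fact 67.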
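Theorem 14 can be spot-checked numerically by carrying out the sum over i from the proof. In the sketch below, the parameter values 1.5 and 2.5 are arbitrary illustrative choices, and the Poisson probabilities are taken to be the standard λ^k e^(−λ)/k!.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return lam ** k * exp(-lam) / factorial(k)


def pmf_of_sum(k, lam, mu):
    """P(X + Y = k) for independent X ~ Po(lam) and Y ~ Po(mu), summing over X = i."""
    return sum(poisson_pmf(i, lam) * poisson_pmf(k - i, mu) for i in range(k + 1))


lam, mu = 1.5, 2.5  # illustrative parameters only
largest_gap = max(
    abs(pmf_of_sum(k, lam, mu) - poisson_pmf(k, lam + mu)) for k in range(15)
)
print(largest_gap)  # differs from 0 only by floating-point rounding
```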
A trivial reason for this is that the difference of two independent Poisson random variables can take on negative values. In contrast, the Poisson random variable never takes on negative values. To illustrate:
Example 562 (continued from above). Reproduced from above: There are 34 ma-
chines in Room A and 42 in Room B. In any given month, each machine in Room A
has, independently, probability 0.03 of breaking down; and each machine in Room B has,
independently, probability 0.02 of breaking down.
Let A ∼ B(34, 0.03) and B ∼ B(42, 0.02). We now show that B − A is not a Poisson random
variable.
The ranges of A and B are both $\mathbb{Z}_0^+ = \{0, 1, 2, \dots\}$. Thus, the range of B − A is $\mathbb{Z} = \{\dots, -2, -1, 0, 1, 2, \dots\}$. By the definition of a Poisson random variable then (Definition 122), B − A cannot possibly be a Poisson random variable.
So far, all examples of random variables we’ve seen have been discrete. For example, the
binomial random variable X ∼ B (n, p) is discrete, because Range (X) = {0, 1, 2, . . . , n} is
finite.
We’ll now look at continuous random variables. Informally, a random variable Y is con-
tinuous if its range takes on a continuum of values.
For H2 Maths, you need only learn about one continuous random variable: the normal
random variable (subject of the next chapter).
Nonetheless, we’ll first look at another continuous random variable that is not in the syl-
labus. This is the continuous uniform random variable. It is much simpler than
the normal random variable and can thus help build up your intuition of how continuous
random variables work.
A line measuring exactly 1 metre in length is drawn on the floor. It is about to rain. Let
X be the position of the first rain-drop that hits the line. X is measured as the distance
(in metres) from the left-most point of the line.
So for example, if the first rain-drop hits the left-most point of the line, then x = 0. If it
hits the exact midpoint of the line, then x = 0.5. And if it hits the right-most point, then
x = 1.
• The range of X is [0, 1] (the first rain-drop can hit any point along the line); and
• X is equally likely to take on any value in the interval [0, 1] (the first rain-drop is equally
likely to hit any point along the line).
Recall that previously with any discrete random variable Y , we could find its probability
distribution. That is, we could find P (Y = k) (the probability that Y takes on the value
k). For example, if Y ∼ B(3, 0.5) modelled the number of heads in three coin-flips, then the probability that there was exactly one head was $P(Y = 1) = \binom{3}{1}\, 0.5^1\, 0.5^2 = \frac{3}{8}$.
Now, in contrast, for any continuous random variable X, strangely enough, there is
zero probability that X takes on any particular value! For example, if X ∼ U [0, 1], then
P (X = 0.37) = 0. That is, there is zero probability that X takes on the value of 0.37!
Footnote 72: But strangely enough, zero probability is not the same thing as impossible. For example, we’d say that
• There is zero probability, but it is not impossible that X ∼ U [0, 1] takes on the value 0.37.
• There is zero probability and it is impossible that X ∼ U [0, 1] takes on the value 1.2.
(Actually, rather than use the word “impossible”, mathematicians prefer saying “almost never”, which has a precise
definition.)
Similarly, the probability that X takes on values between 0.16 and 0.35 is simply 0.35 − 0.16 = 0.19. That is, $P(0.16 \le X \le 0.35) = 0.35 - 0.16 = 0.19$.
The above observations suggest that it may be useful to define a new concept, called the
cumulative distribution function.
The CDF simply tells us the probability that X takes on values less than or equal to k, for
every k ∈ R. Formally:
$$F_X(k) = P(X \le k).$$
It turns out that every random variable can be uniquely defined by giving its
CDF. For example, the continuous uniform random variable is formally defined thus:
Definition 124. X is the continuous uniform random variable on [0, 1] if its CDF $F_X : \mathbb{R} \to \mathbb{R}$ is defined by
$$F_X(k) = \begin{cases} 0, & \text{if } k < 0, \\ k, & \text{if } k \in [0, 1], \\ 1, & \text{if } k > 1. \end{cases}$$
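Definition 124 is short enough to write out as code. The sketch below (function names are illustrative) implements the CDF of X ∼ U[0, 1] and uses it to find the probability that X falls in an interval, as the difference of two CDF values.

```python
def uniform_cdf(k):
    """F_X(k) = P(X <= k) for the continuous uniform random variable on [0, 1]."""
    if k < 0:
        return 0.0
    if k > 1:
        return 1.0
    return float(k)


def prob_between(a, b):
    """P(a <= X <= b) for a <= b, computed as F_X(b) - F_X(a)."""
    return uniform_cdf(b) - uniform_cdf(a)


print(prob_between(0.16, 0.35))  # about 0.19, up to floating-point rounding
print(prob_between(0.5, 0.75))   # 0.25
print(prob_between(-2, 3))       # 1.0, since the whole range [0, 1] is covered
```

Note that for a continuous random variable it makes no difference whether the endpoints a and b are included.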
Armed with the concept of the CDF, the formal definition of a continuous random variable
can be simply stated:
Note that every random variable (discrete, continuous, or otherwise) has a cumulative
distribution function (CDF).
Footnote 73: Or countably-infinite.
P (X ≤ k) = P (X < k) .
That is, whether an inequality is strict makes no difference. The reason is that by the third Kolmogorov axiom (additivity), $P(X \le k) = P(X < k) + P(X = k) = P(X < k) + 0 = P(X < k)$.
Thus, for continuous random variables, it doesn’t matter whether inequalities are strict or
weak.
P (0.2 ≤ X ≤ 0.5) = P (0.2 < X ≤ 0.5) = P (0.2 ≤ X < 0.5) = P (0.2 < X < 0.5) .
Definition 126. Let X be a random variable whose CDF $F_X$ is differentiable. Then the probability density function (PDF) of X is the function $f_X : \mathbb{R} \to \mathbb{R}$ defined by
$$f_X(k) = \frac{d}{dk} F_X(k).$$
The PDF has an intuitive interpretation. The area under the PDF between points a and
b is equal to P (a ≤ X ≤ b). This, of course, is simply a consequence of the Fundamental
Theorems of Calculus:
$$\int_a^b f_X(k)\,dk = \int_a^b \frac{d}{dk} F_X(k)\,dk \overset{\text{FTC}}{=} F_X(b) - F_X(a) = P(X \le b) - P(X \le a) = P(a \le X \le b).$$
For any a ≤ b, the area under the PDF between a and b is precisely P (a ≤ X ≤ b). For
example, there is probability 0.25 (red area) that X takes on values between 0.5 and 0.75.
There is probability 0.1 (blue area) that X takes on values between 0.2 and 0.3.
Exercise 244. The continuous uniform random variable Y ∼ U[3, 5] is equally likely to
take on values between 3 and 5, inclusive. (a) Write down its CDF FY . (b) Write down
and graph its PDF fY . (c) Compute, and also illustrate on your graph, the quantities
P (3.1 ≤ Y ≤ 4.6) and P (4.8 ≤ Y ≤ 4.9). (Answer on p. 1169.)
Footnote 74: Note that although every random variable has a CDF, not every random variable has a PDF. In particular, if the random variable’s CDF is not differentiable, then by our definition here, the random variable does not have a PDF.
The standard normal (or Gaussian) random variable (SNRV) is very important. In
fact, it is so important that we usually reserve the letter Z for it, and the Greek letters φ
and Φ (lower- and upper-case phi) for its PDF and CDF.
1. Z is a SNRV.
2. Z is a random variable with the standard normal distribution.
3. Z ∼ N (0, 1).
Definition 127. Z is called a standard normal random variable (SNRV) if its PDF $\phi : \mathbb{R} \to \mathbb{R}$ is defined by:
$$\phi(a) = \frac{1}{\sqrt{2\pi}}\, e^{-0.5a^2}.$$
For the A-levels, you need not remember this complicated-looking PDF. Nor need you
understand where it comes from.
The normal PDF is often also referred to as the bell curve, due to its resemblance to a
bell (kinda).
As with the continuous uniform, for any a ≤ b, the area under the normal PDF between a and b gives us precisely P(a ≤ Z ≤ b). For example, the probability that Z takes on values between 0.5 and 0.75 is Φ(0.75) − Φ(0.5) ≈ 0.7734 − 0.6915 = 0.0819. And the probability that Z takes on values between 0.2 and 0.3 is Φ(0.3) − Φ(0.2) ≈ 0.6179 − 0.5793 = 0.0386.
$$\Phi(a) = P(Z \le a) = \int_{-\infty}^{a} \phi(x)\,dx = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-0.5x^2}\,dx.$$
Unfortunately, this last integral has no simpler expression (mathematicians would say that
it has no “closed-form expression”). Instead, as we’ll soon see, we have to use the so-called
Z-tables (or a graphing calculator) to look up values of Φ(k).
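Outside the exam hall, Φ can also be evaluated on a computer. The sketch below shows two routes, neither of which comes from this textbook: (i) the standard identity Φ(a) = ½(1 + erf(a/√2)), where erf is the error function provided by Python’s `math` module, and (ii) a crude numerical approximation of the integral by the midpoint rule, with the lower limit −∞ replaced by −10 (an assumption that costs almost nothing, since the area to the left of −10 is negligible).

```python
from math import erf, exp, pi, sqrt


def phi(x):
    """PDF of the standard normal random variable."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def Phi(a):
    """CDF of the standard normal random variable, via the error function."""
    return 0.5 * (1 + erf(a / sqrt(2)))


def Phi_by_midpoint_rule(a, lower=-10.0, steps=100_000):
    """Approximate the area under phi from `lower` to a using thin rectangles."""
    width = (a - lower) / steps
    return width * sum(phi(lower + (i + 0.5) * width) for i in range(steps))


print(Phi(2.51))                   # about 0.99396
print(Phi_by_midpoint_rule(2.51))  # agrees with the line above to many decimal places
```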
The next fact summarises the properties of the normal distribution. Some of these proper-
ties are illustrated in the figure that follows.
Fact 78. Let Z ∼ N(0, 1) and φ and Φ be its PDF and CDF.
1. Φ(∞) = 1. (As with any random variable, the area under the entire PDF is 1.)
2. φ(a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has a surprising impli-
cation: however large a is, there is always some non-zero probability that Z ≥ a.)
3. E [Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and compute $\phi(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399$.)
5. V [Z] = 1. (The variance of Z is 1.)
6. P (Z ≤ a) = P (Z < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
1. Press the blue 2ND button and then DISTR (which corresponds to the VARS button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
The TI84 is now asking for your lower and upper bounds. Since Φ(2.51) = Φ(2.51)−Φ(−∞),
your lower bound is −∞ and your upper bound is 2.51.
3. But there’s no way to enter −∞ on your TI84. So instead, you’ll enter $-10^{99}$, which is simply a very large negative number. To do so, press (-) , the blue 2ND button, EE (which corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Now to enter your upper bound. First press , (this simply demarcates your lower and
upper bounds). Then enter your upper bound 2.51 by pressing 2 . 5 1 . Then press
ENTER . Your TI84 says that the answer is Φ(2.51) ≈ 0.99396.
Example 566. We’ll find Φ(2.51), Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4) using Z-tables.
Refer to the Z-tables on p. 633. (These are the exact same tables that appear on the List
of Formulae you’ll get during exams.)
• To find Φ(2.51), look at the row labelled 2.5 and the column labelled 1 — read off the
number 0.9940. We thus have Φ(2.51) = 0.9940.
• To find Φ(−2.51), note that the table does not explicitly give values of Φ(z), if z < 0.
But we can exploit the fact that the standard normal is symmetric about the mean µ = 0.
This fact implies that Φ(−z) = 1 − Φ(z). Hence, Φ(−2.51) = 1 − Φ(2.51) = 0.0060.
• To find Φ(1.372), first look at the row labelled 1.3 and the column labelled 7, and read off the number 0.9147. This tells us that Φ(1.37) = 0.9147. Now look at the right end of the table (where it says “ADD”). Since the third decimal place of 1.372 is 2, we look under the column labelled 2, which tells us to ADD 3 (that is, add 3 to the fourth decimal place). Thus, Φ(1.372) = 0.9147 + 0.0003 = 0.9150.
• To find P (−4 ≤ Z ≤ 4), the Z-tables printed are actually useless, because they only go
to 2.99. So you can just write P (−4 ≤ Z ≤ 4) ≈ 1.
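The four table look-ups in Example 566 can be double-checked on a computer. The sketch below uses `statistics.NormalDist` from Python’s standard library (available in Python 3.8 and later), whose default parameters are mean 0 and standard deviation 1. This is simply a convenient checking tool, not a method from the syllabus.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

phi_2_51 = Z.cdf(2.51)
phi_minus_2_51 = Z.cdf(-2.51)
phi_1_372 = Z.cdf(1.372)
p_within_4 = Z.cdf(4) - Z.cdf(-4)

print(round(phi_2_51, 4), round(phi_minus_2_51, 4), round(phi_1_372, 4), p_within_4)
```

The last value is extremely close to 1 without being exactly 1, which is why writing P(−4 ≤ Z ≤ 4) ≈ 1 is reasonable.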
Exercise 245. Using both the Z-tables and your graphing calculator, find the following:
(a) P (Z ≥ 1.8). (b) P (−0.351 < Z < 1.2). (Answer on p. 1170.)
z    0 1 2 3 4 5 6 7 8 9    ADD: 1 2 3 4 5 6 7 8 9
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 4 8 12 16 20 24 28 32 36
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 4 8 12 16 20 24 28 32 36
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 4 8 12 15 19 23 27 31 35
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 4 7 11 15 19 22 26 30 34
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 4 7 11 14 18 22 25 29 32
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 3 7 10 14 17 20 24 27 31
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 3 7 10 13 16 19 23 26 29
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 3 6 9 12 15 18 21 24 27
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 3 5 8 11 14 16 19 22 25
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 3 5 8 10 13 15 18 20 23
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 2 5 7 9 12 14 16 19 21
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 2 4 6 8 10 12 14 16 18
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 2 4 6 7 9 11 13 15 17
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 2 3 5 6 8 10 11 13 14
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1 3 4 6 7 8 10 11 13
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1 2 4 5 6 7 8 10 11
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1 2 3 4 5 6 7 8 9
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1 2 3 4 4 5 6 7 8
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1 1 2 3 4 4 5 6 6
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 1 1 2 2 3 4 4 5 5
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 0 1 1 2 2 3 3 4 4
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 0 1 1 2 2 2 3 3 4
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 0 1 1 1 2 2 2 3 3
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 0 1 1 1 1 2 2 2 2
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 0 0 1 1 1 1 1 2 2
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 0 0 0 1 1 1 1 1 1
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 0 0 0 0 1 1 1 1 1
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 0 0 0 0 0 1 1 1 1
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 0 0 0 0 0 0 0 1 1
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 0 0 0 0 0 0 0 0 0
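Consider σZ + µ, itself a random variable. Since E [Z] = 0 and V [Z] = 1, it follows that
$$\mathrm{E}[\sigma Z + \mu] = \sigma\,\mathrm{E}[Z] + \mu = \mu \quad\text{and}\quad \mathrm{V}[\sigma Z + \mu] = \sigma^2\,\mathrm{V}[Z] = \sigma^2.$$
It turns out that σZ + µ is a normal random variable with mean µ and variance σ²: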
Definition 128. X is called a normal random variable with mean µ and variance σ² if its
PDF fX ∶ R → R is defined by:
$$f_X(a) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-0.5\left(\frac{a-\mu}{\sigma}\right)^2}.$$
Once again, for the A-levels, you need not remember this complicated-looking PDF. Nor
need you understand where it comes from.
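If you’d like to see Definition 128 in action anyway, here is a minimal Python sketch (Python is my choice for illustration, not something the syllabus asks for). The names normal_pdf and phi are made up, and phi assumes that the SNRV’s PDF is φ(a) = e^(−0.5a²)/√(2π). The loop compares the two functions at a few arbitrary points, taking µ = 0 and σ = 1.

```python
import math

def normal_pdf(a, mu=0.0, sigma=1.0):
    """PDF of N(mu, sigma^2) evaluated at a, following Definition 128."""
    z = (a - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def phi(a):
    """PDF of the SNRV Z ~ N(0, 1) (assumed form, see the lead-in)."""
    return math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)

# With mu = 0 and sigma = 1, the two columns should agree.
for a in (-2.0, 0.0, 0.5, 3.0):
    print(a, normal_pdf(a), phi(a))
```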
Exercise 246. Let X ∼ N(µ, σ²). Verify that if µ = 0 and σ² = 1, then for all a ∈ R, we
have fX (a) = φ(a). What can you conclude? (Answer on p. 1171.)
Thus, we can easily transform any normal random variable into the SNRV:
Corollary 6. If $X \sim N(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} = Z \sim N(0, 1)$. Equivalently, $X = \sigma Z + \mu$.
(Just to be clear, two random variables are identical if their CDFs are identical.)
Exercise 247. Using Fact 79, prove that if $X \sim N(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} = Z \sim N(0, 1)$.
(Answer on p. 1171.)
The above corollary gives us an alternative method for computing probabilities associated
with normal random variables. In general, if X ∼ N (µ, σ²), then
$$P(X \le c) = P(\sigma Z + \mu \le c) = P\left(Z \le \frac{c-\mu}{\sigma}\right) = \Phi\left(\frac{c-\mu}{\sigma}\right).$$
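The standardisation above can be sketched in a few lines of Python. This is only an illustration: the function names Phi and normal_cdf are mine, and Phi is written using the error function from Python’s math module, which is one standard way of expressing the SNRV’s CDF.

```python
import math

def Phi(z):
    """CDF of the SNRV Z ~ N(0, 1), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(c, mu, var):
    """P(X <= c) for X ~ N(mu, var), found by standardising to Z."""
    return Phi((c - mu) / math.sqrt(var))

print(round(Phi(1.0), 4))             # compare with the Z-table entry for z = 1.00
print(round(normal_cdf(2, 1, 2), 4))  # P(X <= 2) for X ~ N(1, 2)
```

Note that normal_cdf takes the variance σ² (as in the notation N(µ, σ²)) and converts it to σ internally.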
Fact 80. Let X ∼ N (µ, σ²) and let fX and FX be the PDF and CDF of X.
1. FX (∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random
variable.)
2. fX (a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising
implication that no matter how large a is, there is always some non-zero probability that
X ≥ a.)
3. E [X] = µ. (The mean of X is µ.)
4. The PDF fX reaches a global maximum at the mean µ. (In fact, we can go ahead and
compute $f_X(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \approx \frac{0.399}{\sigma}$.)
5. V [X] = σ². (The variance of X is σ².)
6. P (X ≤ a) = P (X < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(X = a) = 0.)
7. The PDF fX is symmetric about the mean µ. This has several implications:
(a) P (X ≥ µ + a) = P (X ≤ µ − a) = FX (µ − a).
(b) Since P (X ≥ µ + a) = 1 − P (X ≤ µ + a) = 1 − FX (µ + a), it follows that FX (µ − a) =
1 − FX (µ + a) or, equivalently, FX (µ + a) = 1 − FX (µ − a).
(c) FX (µ) = 1 − FX (µ) = 0.5.
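A numerical check is no substitute for a proof, but it can be reassuring. The sketch below uses NormalDist from Python’s standard statistics module. The particular choices X ∼ N(2, 3) and a = 1.3 are arbitrary and purely for illustration.

```python
from statistics import NormalDist
import math

mu, sigma = 2.0, 3.0 ** 0.5        # arbitrary example: X ~ N(2, 3)
X = NormalDist(mu, sigma)
a = 1.3                            # arbitrary distance from the mean

peak = X.pdf(mu)                   # value of the PDF at the mean
left_tail = X.cdf(mu - a)          # P(X <= mu - a)
right_tail = 1 - X.cdf(mu + a)     # P(X >= mu + a)

print(peak, 1 / (sigma * math.sqrt(2 * math.pi)))
print(left_tail, right_tail)
print(X.cdf(mu))
```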
Exercise 248. Prove all of the properties listed in Fact 80. (Hint: Use Corollary 6 to
convert X into the SNRV. Then simply apply Fact 78.) (Answer on p. 1172.)
1. Press the blue 2ND button and then VARS (which corresponds to the DISTR button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
3. Enter the lower bound −10⁹⁹ by pressing (-) , the blue 2ND button, EE (which corresponds
to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Enter the upper bound 2 by pressing , and 2 . (Don’t press ENTER yet!!).
Previously, we didn’t bother telling the TI84 our mean µ and standard deviation σ.
And so by default, if we pressed ENTER at this point, the TI84 simply assumed that we
wanted the SNRV Z ∼ N(0, 1). Now we’ll tell the TI84 what µ and σ are:
Finding P (H < 2), P (I < 2), P (−1 < G < 1), P (−1 < H < 1), and P (−1 < I < 1) is similar:
[Calculator screenshots, in order: P (H < 2) and P (I < 2); P (−1 < G < 1); P (−1 < H < 1); P (−1 < I < 1).]
Since I has mean µ = 2, we should have exactly P (I < 2) = 0.5. So here the TI84 has
actually made a small error in reporting instead that P (I < 2) ≈ 0.5000000005.
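$$P(G < 2) = P\left(Z < \frac{2-\mu_G}{\sigma_G} = \frac{2-(-1)}{\sqrt{0.1}} \approx 9.4868\right) = \Phi(9.4868) \approx 1,$$
$$P(H < 2) = P\left(Z < \frac{2-\mu_H}{\sigma_H} = \frac{2-1}{\sqrt{2}} \approx 0.7071\right) = \Phi(0.7071) \approx 0.7601,$$
$$P(I < 2) = P\left(Z < \frac{2-\mu_I}{\sigma_I} = \frac{2-2}{\sqrt{3}} = 0\right) = \Phi(0) = 0.5,$$
$$P(-1 < G < 1) = P\left(0 = \frac{-1-(-1)}{\sqrt{0.1}} < Z < \frac{1-(-1)}{\sqrt{0.1}} \approx 6.3246\right) = \Phi(6.3246) - \Phi(0) \approx 1 - \Phi(0) = 0.5,$$
$$P(-1 < H < 1) = P\left(-1.4142 \approx \frac{-1-1}{\sqrt{2}} < Z < \frac{1-1}{\sqrt{2}} = 0\right) = \Phi(0) - \Phi(-1.4142) \approx 0.5 - [1 - \Phi(1.4142)] = \Phi(1.4142) - 0.5 \approx 0.9213 - 0.5 = 0.4213,$$
$$P(-1 < I < 1) = P\left(-1.7321 \approx \frac{-1-2}{\sqrt{3}} < Z < \frac{1-2}{\sqrt{3}} \approx -0.5774\right) = \Phi(-0.5774) - \Phi(-1.7321) = 1 - \Phi(0.5774) - [1 - \Phi(1.7321)] \approx 0.9584 - 0.7182 = 0.2402.$$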
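For readers without a TI84 to hand, the same six probabilities can be reproduced with a rough Python stand-in for the calculator’s “normalcdf” option. This is a sketch only: the helper normalcdf is my own (it is not the calculator’s actual code), it takes the variance rather than the standard deviation, and the (mean, variance) pairs for G, H, and I are read off the worked computations above.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, var=1.0):
    """P(lower < X < upper) for X ~ N(mu, var)."""
    X = NormalDist(mu, var ** 0.5)
    return X.cdf(upper) - X.cdf(lower)

# (mean, variance) for each random variable, as used in the computations above.
params = {"G": (-1, 0.1), "H": (1, 2), "I": (2, 3)}

for name, (mu, var) in params.items():
    below_2 = normalcdf(-1e99, 2, mu, var)   # -1e99 plays the role of "minus infinity"
    between = normalcdf(-1, 1, mu, var)
    print(name, round(below_2, 4), round(between, 4))
```

Small differences in the fourth decimal place, relative to values found using the Z-tables, are to be expected because the tables involve rounding.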
Exercise 249. Let X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2). Using both the Z-tables and your
graphing calculator, find the following: (a) P (X ≥ 1) and P (Y ≥ 1). (b) P (−2 ≤ X ≤ −1.5)
and P (−2 ≤ Y ≤ −1.5). (Answer on p. 1173.)
Theorem 15. If X and Y are independent normal random variables, then X + Y is also a
normal random variable. Moreover, X − Y is also a normal random variable.
Proof. Omitted.
Corollary 7. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent and a, b ∈ R
be constants. Then $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ and more generally,
$aX + bY \sim N(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2)$.
Moreover, $X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$ and more generally,
$aX - bY \sim N(a\mu_X - b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2)$.
Examples:
(a) What is the probability that their total weight is greater than 405 kg?
(b) What is the probability that one is more than 10% heavier than the other?
(a) Let X1 ∼ N (200, 50) and X2 ∼ N (200, 50) be the weights of the first and second sumo
wrestlers. Then X1 + X2 ∼ N (400, 100). Thus,
$$P(X_1 + X_2 > 405) = P\left(Z > \frac{405-400}{\sqrt{100}}\right) = P(Z > 0.5) = 1 - \Phi(0.5) \approx 1 - 0.6915 = 0.3085.$$
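Here is a hedged Python sketch of how Corollary 7 was used in part (a). The helper linear_combination is a name I’ve made up. It simply applies the corollary’s formulas for the mean and variance of aX + bY, and it assumes (as the corollary requires) that the two weights are independent.

```python
from statistics import NormalDist

def linear_combination(a, mu_x, var_x, b, mu_y, var_y):
    """Distribution of aX + bY for independent normal X and Y (Corollary 7)."""
    mean = a * mu_x + b * mu_y
    var = a ** 2 * var_x + b ** 2 * var_y
    return NormalDist(mean, var ** 0.5)

total = linear_combination(1, 200, 50, 1, 200, 50)   # X1 + X2
p_total = 1 - total.cdf(405)                         # P(X1 + X2 > 405)
print(total.mean, total.variance, round(p_total, 4))
```

Passing b = −1.1 instead would give the distribution of X1 − 1.1X2, since the coefficient is squared in the variance.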
(b) Our goal is to find p = P (X1 > 1.1X2 ) + P (X2 > 1.1X1 ). This is the probability that
the first sumo wrestler is more than 10% heavier than the second, plus the probability that
the second is more than 10% heavier than the first. Of course, by symmetry, these two
probabilities are equal. Thus, p = 2 × P (X1 > 1.1X2 ). Now,
But X1 − 1.1X2 ∼ N (200 − 1.1 ⋅ 200, 50 + 1.1² ⋅ 50) = N (−20, 110.5). Thus,
$$P(X_1 > 1.1X_2) = P(X_1 - 1.1X_2 > 0) = P\left(Z > \frac{0-(-20)}{\sqrt{110.5}}\right)$$
(b) What is the probability that a caught fish weighs more than 9 times as much as a
caught shrimp?
(a) Let S be the total weight of 4 caught fish and 50 caught shrimp. Note, importantly,
that it would be wrong to write S = 4X + 50Y , because 4X + 50Y would be 4 times the
weight of a single caught fish, plus 50 times the weight of a single caught shrimp.
In contrast, we want S to be the sum of the weights of 4 independent fish and 50 independent
shrimp. Thus, we should instead write S = X1 + X2 + X3 + X4 + Y1 + Y2 + ⋅ ⋅ ⋅ + Y50 , where
• X1 ∼ N (1, 0.4), X2 ∼ N (1, 0.4), X3 ∼ N (1, 0.4), and X4 ∼ N (1, 0.4) are the weights of
each caught fish.
• Y1 ∼ N (0.1, 0.1), Y2 ∼ N (0.1, 0.1), . . . , and Y50 ∼ N (0.1, 0.1) are the weights of each
caught shrimp.
(Note by the way that in contrast, 4X + 50Y ∼ N (9, 4² × 0.4 + 50² × 0.1) = N (9, 256.4), which
has a rather different variance!)
(b) P (X > 9Y ) = P (X − 9Y > 0). But X − 9Y ∼ N (1 − 9 × 0.1, 0.4 + 9² × 0.1) = N (0.1, 8.5).
Thus, P (X − 9Y > 0) ≈ 0.5137 (calculator).
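The difference between “the sum of 4 independent fish” and “4 times one fish” is easy to get wrong, so here is a small Python sketch of the arithmetic. It assumes, as in the example, that all the weights are independent, with each fish distributed N(1, 0.4) and each shrimp N(0.1, 0.1). The variable names are mine.

```python
from statistics import NormalDist

fish_mu, fish_var = 1.0, 0.4        # each fish:   N(1, 0.4)
shrimp_mu, shrimp_var = 0.1, 0.1    # each shrimp: N(0.1, 0.1)

# (a) S = X1 + ... + X4 + Y1 + ... + Y50: independent terms, so variances add.
s_mean = 4 * fish_mu + 50 * shrimp_mu
s_var = 4 * fish_var + 50 * shrimp_var

# Contrast: 4X + 50Y scales ONE fish and ONE shrimp, so coefficients are squared.
scaled_var = 4 ** 2 * fish_var + 50 ** 2 * shrimp_var

# (b) X - 9Y ~ N(1 - 9 * 0.1, 0.4 + 9^2 * 0.1)
d_mean = fish_mu - 9 * shrimp_mu
d_var = fish_var + 9 ** 2 * shrimp_var
p_b = 1 - NormalDist(d_mean, d_var ** 0.5).cdf(0)   # P(X - 9Y > 0)

print(s_mean, s_var, scaled_var)
print(round(d_mean, 4), round(d_var, 4), round(p_b, 4))
```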
(a) Find the probability that their total water and electricity utility bill in any given month
exceeds $100.
(b) Find the probability that their total water and electricity utility bill in any given year
exceeds $1,000.
(c) Then what is the maximum value of x, in order for the probability that the total utility
bill in a given month exceeds $100 to be 0.1 or less?
Proof. The proof is a little advanced and thus entirely omitted from this book.
What does it mean for one random variable to “converge in distribution” to another? This
is a little beyond the scope of the A-levels, but informally, this means that as n → ∞,
the random variable $\sum_{i=1}^{n} X_i$ becomes “ever more” like the random variable with distribution
$N(n\mu, n\sigma^2)$.
How large is “large enough”? The most common rule-of-thumb is that n ≥ 30 is “large
enough”, so that’s what we’ll use in this book, even though this is somewhat arbitrary.
Indeed, if the original distribution from which the random variables are drawn is not “nice
enough”, then n ≥ 30 may not be “large enough”. (Informally, a distribution is “nice
enough” if it is, among other things, fairly symmetric, fairly unimodal, and not too
skewed.)
You can safely assume that all distributions you’ll ever encounter in the A-levels are “nice
enough”, so that the n ≥ 30 rule-of-thumb works. But whenever you use the CLT normal
approximation, you should be clear to state that you assume the distribution is “nice
enough”.
The CLT says that since n = 100 ≥ 30 is large enough and the distribution is “nice enough”
(we are assuming this), the random variable X can be approximated by the normal random
variable Y ∼ N (100 × 3.5, 100 × 35/12) = N (350, 3500/12).
P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360).
Note however that X is a discrete random variable, so that P(X ≥ 360) ≠ P(X > 360).
More specifically,
In contrast, Y is a continuous random variable, so that P(Y ≥ 360) = P(Y > 360). Hence, if
we simply use the approximations P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360),
then implicitly we’d be saying that P(X = 360) = 0, which is blatantly false.
To correct for this, we perform the so-called continuity correction. This says that we’ll
instead use the approximations
P(X ≥ 360) ≈ P(Y ≥ 359.5) and P(X > 360) ≈ P(Y ≥ 360.5).
Thus, P(X ≥ 360) ≈ P(Y ≥ 359.5) ≈ 0.2890 (calculator) and P(X > 360) ≈ P(Y ≥ 360.5) ≈
0.2693.
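To see how well the continuity correction does here, we can compare against the exact distribution of the sum of 100 dice. The Python sketch below is illustrative only. It builds the exact distribution by repeatedly adding one die at a time (a brute-force convolution that I’ve chosen for simplicity) and assumes the 100 rolls are independent rolls of a fair die.

```python
from statistics import NormalDist

n = 100
die_mean = sum(range(1, 7)) / 6                               # 3.5
die_var = sum((k - die_mean) ** 2 for k in range(1, 7)) / 6   # 35/12
Y = NormalDist(n * die_mean, (n * die_var) ** 0.5)            # N(350, 3500/12)

approx_ge = 1 - Y.cdf(359.5)    # approximates P(X >= 360)
approx_gt = 1 - Y.cdf(360.5)    # approximates P(X > 360)

# Exact distribution of the sum of n dice: start with a sum of 0, add one die at a time.
dist = {0: 1.0}
for _ in range(n):
    nxt = {}
    for total, p in dist.items():
        for face in range(1, 7):
            nxt[total + face] = nxt.get(total + face, 0.0) + p / 6
    dist = nxt

exact_ge = sum(p for total, p in dist.items() if total >= 360)
exact_gt = sum(p for total, p in dist.items() if total > 360)
print(round(approx_ge, 4), round(exact_ge, 4))
print(round(approx_gt, 4), round(exact_gt, 4))
```

The gap between exact_ge and exact_gt is exactly P(X = 360), which is the quantity the uncorrected approximation would wrongly treat as zero.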
Note that if the random variable to be approximated is itself continuous, then there is no
need to perform the continuity correction. This is illustrated in Exercise 252 below.
Exercise 251. Let X be the random variable that is the sum of 30 rolls of a fair die. Find
P(100 ≤ X ≤ 110). (Answer on p. 1175.)
SYLLABUS ALERT
This is in the 9740 (old) syllabus, but not in the 9758 (revised) syllabus. So you can skip
this subsection if you’re taking 9758.
The binomial distribution is discrete. Thus, when using the normal distribution to approx-
imate it, we must use the continuity correction.
Example 575. Flip a fair coin 1,000 times. Let X be the number of heads. Find the
probability that there are more than 510 heads.
The CLT says that since n = 1000 ≥ 30 is large enough and the distribution is “nice
enough” (we are assuming this), X can be approximated by the normal random variable
Y ∼ N (1000 × 0.5, 1000 × 0.25) = N (500, 250). Thus, using also the continuity correction,
we have
This turns out to be a decent approximation because the exact probability, computed using
the binomial distribution, is P (X > 510) ≈ 0.2533.
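Both numbers can be checked with a short Python sketch. The normal approximation uses Y ∼ N(500, 250) with the continuity correction, so that P(X > 510) is approximated by P(Y ≥ 510.5). The exact value sums the binomial probabilities directly, using exact integer arithmetic via math.comb. The variable names are mine.

```python
from math import comb
from statistics import NormalDist

n, p = 1000, 0.5
Y = NormalDist(n * p, (n * p * (1 - p)) ** 0.5)    # N(500, 250)

approx = 1 - Y.cdf(510.5)                          # continuity-corrected P(X > 510)

# Exact: with p = 0.5, every outcome has probability 1 / 2^n.
exact = sum(comb(n, k) for k in range(511, n + 1)) / 2 ** n
print(round(approx, 4), round(exact, 4))
```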
Exercise 253. In a school of 1000 students, each student has probability 0.9 of passing
the A-level H2 Maths exam. Find the probability that there are at least 920 passes in the
school. (You may make additional assumptions, but you should state them.) (Answer on
p. 1175.)
SYLLABUS ALERT
This is in the 9740 (old) syllabus, but not in the 9758 (revised) syllabus. So you can skip
this subsection if you’re taking 9758.
The Poisson distribution is discrete. Thus, when using the normal distribution to approxi-
mate it, we must use the continuity correction.
Example 576. Suppose the monthly number of murders in Singapore can be modelled by
X ∼ Po (0.1). Assuming that the month-to-month numbers of murders are independent,
find the probability that there are more than 10 murders in a given 10-year span.
The number of murders in 10 years is Y = X1 + X2 + ⋅ ⋅ ⋅ + X120 , where X1 ∼ Po (0.1),
X2 ∼ Po (0.1), . . . , X120 ∼ Po (0.1) are the numbers of murders in each of the 120 months.
The CLT says that since n = 120 ≥ 30 is large enough and the distribution is “nice
enough” (we are assuming this), Y can be approximated by the normal random variable
T ∼ N (120 × 0.1, 120 × 0.1) = N (12, 12). Thus, using also the continuity correction, we have
This is a decent approximation, because the exact probability, computed using the Poisson
distribution, is P(Y > 10) ≈ 0.6528.
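Here is a minimal Python sketch of the comparison. For the exact value, it relies on the fact that the sum of independent Poisson random variables is itself Poisson, so that Y ∼ Po(12). For the approximation, it uses T ∼ N(12, 12) with the continuity correction, so that P(Y > 10) is approximated by P(T ≥ 10.5). The variable names are mine.

```python
from math import exp, factorial
from statistics import NormalDist

lam = 120 * 0.1                      # Y ~ Po(12): 120 independent months, each Po(0.1)
T = NormalDist(lam, lam ** 0.5)      # N(12, 12)

approx = 1 - T.cdf(10.5)             # continuity-corrected P(Y > 10)

# Exact: P(Y > 10) = 1 - P(Y <= 10), summing the Poisson probabilities for k = 0, ..., 10.
exact = 1 - sum(exp(-lam) * lam ** k / factorial(k) for k in range(11))
print(round(approx, 4), round(exact, 4))
```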
Exercise 254. Suppose the daily number of fatalities from motor vehicle accidents can be
modelled by X ∼ Po(0.5). Find the probability that there are more than 200 fatalities from
car accidents in any given year. (You may make additional assumptions, but you should
state them.) (Answer on p. 1176.)
The Fundamental Theorems of Calculus and the CLT are the most profound and amazing
results you’ll learn in H2 maths. This chapter briefly explains why the CLT is so amazing
and why the normal distribution is ubiquitous.
The normal distribution is ubiquitous in nature. The classic example is human height.75
Example 577. Below is a histogram of the heights of the 4,060 NBA players who ever
played in an NBA game (through the end of the 2016 season). (Heights are reported in feet
and a whole number of inches, where 1 in = 2.54 cm and 1 ft = 12 in, so that 1 ft = 30.48
cm.) The histogram has 28 bins and (arguably) looks normal (bell-shaped).
The width of each bin is 1 inch. For example, the red bin says 410 players have had reported
heights of 6 ft 7 in (approx. 200 cm). The pink (leftmost) bin is barely visible and says
only 1 player has had a reported height of 5 ft 3 in (approx. 160 cm). The blue (rightmost)
bin is also barely visible and says that only 2 players have had reported heights of 7 ft 7 in
(approx. 231 cm). The average or mean height is approx. 6 ft 6 in (approx. 198 cm).
75 Data: Excel spreadsheet. Source: Basketball-Reference.com (retrieved June 15th, 2016). Caveats: (1) For some reason,
out of the 4060 players in that database at the time of retrieval, there was exactly one player (George Karl) whose height
was not listed. Wikipedia lists George Karl’s height as 6 ft 2 in, so that is what I have used for his height. (2) By NBA, I
actually mean the BAA (1946-1949), the NBA (1949-present), and the ABA (1967-1976), combined. (3) As is well-known
among basketball fans, the listed heights of NBA players are not accurate and can sometimes be off by as much as 2 to 3
inches (5 to 7.5 cm). (See this recent Wall Street Journal article.)
What made the book especially controversial were its claims that intelligence was largely
heritable and that black Americans had lower intelligence than whites. The figure above is
taken from p. 279 of the book. It suggests that
• Black IQ is normally distributed, with a mean of around 80.
• White IQ is normally distributed, with a mean of around 105.
(Source: YouTube.)
Question:
We will try to answer this question, but only after we’ve illustrated how the Central Limit
Theorem works.
Example 580. Flip many fair coins. Model each with the Bernoulli random variables T1 ,
T2 , T3 , . . . , each with probability of success (heads) 0.5.
Let Xn = T1 + T2 + ⋅ ⋅ ⋅ + Tn ∼ B (n, 0.5).
On this and the next page are the histograms of the distributions of X7 , X8 , X9 , X10 , X20 ,
X30 , X40 , X50 , and X100 . Observe that as n grows, the shape of the probability distribution
of Xn looks ever more bell-shaped. This is exactly what the CLT says.
Example 581. Flip many biased coins, each with probability 0.9 of heads. Model each
with the Bernoulli random variables Y1 , Y2 , Y3 , . . . , each with probability of success (heads)
0.9.
Let Sn = Y1 + Y2 + ⋅ ⋅ ⋅ + Yn be the number of heads in the first n coin-flips. (By the way,
Sn ∼ B (n, 0.9).)
On this and the next page are the histograms of the distributions of S1 , S2 , . . . , and S10 .
S1 has probability 0.1 of taking on the value 0 and 0.9 of taking on the value 1. S2 has probability
0.01 of taking on the value 0, 0.18 of taking on the value 1, and 0.81 of taking on the
value 2. S3 has probability 0.001 of taking on the value 0, 0.027 of taking on the value
1, 0.243 of taking on the value 2, and 0.729 of taking on the value 3. S4 has probability
0.0001 of taking on the value 0, 0.0036 of taking on the value 1, 0.0486 of taking on the
value 2, 0.2916 of taking on the value 3, and 0.6561 of taking on the value 4. Etc.
It certainly does not look like the distribution Sn is becoming increasingly bell-curved.
Well, let’s see.
Below are the histograms of the distributions of S20 , S30 , S40 , S50 , and S100 . Remarkably
enough, as n grows, the shape of the probability distribution of Sn looks ever more bell-
shaped. As promised by the CLT.
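If you’d like to generate these distributions yourself, here is a minimal Python sketch. The function binomial_pmf lists the probabilities for Sn ∼ B(n, 0.9). The function skewness is an extra that goes a little beyond this book: it uses the standard formula (1 − 2p)/√(np(1 − p)) for the skewness of a binomial distribution, one rough measure of how lopsided a distribution is (0 means perfectly symmetric). Both function names are mine.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(S_n = 0), ..., P(S_n = n)] for S_n ~ B(n, p)."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def skewness(n, p):
    """Skewness of B(n, p); it shrinks towards 0 as n grows."""
    return (1 - 2 * p) / (n * p * (1 - p)) ** 0.5

# The distributions of S1, S2, S3, and S4.
for n in (1, 2, 3, 4):
    print(n, [round(q, 4) for q in binomial_pmf(n, 0.9)])

# The lopsidedness fades as n grows.
for n in (10, 100, 1000):
    print(n, round(skewness(n, 0.9), 4))
```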
Examples to illustrate:
Example 582. Assume that human height is entirely determined by 1000 independent
genes (assume all human beings have these 1000 genes).
Assume that each of these 1000 genes is associated with an independent random variable
X1 , X2 , . . . , X1000 , each identically distributed with mean µX and variance σX². Assume
also that human height is simply equal to the sum of these random variables. That is, a
human being’s height is simply given by H = X1 + X2 + ⋅ ⋅ ⋅ + X1000 .
Then the CLT says that since n = 1000 is “large enough”, H will be approximately normally
distributed, with mean 1000µX and variance 1000σX². Amongst the world’s 7.4 billion
people, there will be some very short people and some very tall people, but most people
will be near the mean height 1000µX .
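We can mimic this toy model with a small simulation. The Python sketch below is hypothetical throughout: it assumes each gene’s contribution is uniformly distributed on [0, 1] (so µX = 1/2 and σX² = 1/12), a choice I’ve made only because it is simple and clearly not bell-shaped. It also simulates only 2,000 people rather than 7.4 billion.

```python
import random
import statistics

random.seed(0)                       # fixed seed so that the run is repeatable
N_GENES, N_PEOPLE = 1000, 2000

def height():
    # Hypothetical gene effect: each X_i is uniform on [0, 1].
    return sum(random.random() for _ in range(N_GENES))

heights = [height() for _ in range(N_PEOPLE)]
mean_h = statistics.fmean(heights)   # should be near 1000 * (1/2) = 500
sd_h = statistics.stdev(heights)     # should be near sqrt(1000 / 12), about 9.13

# For a normal distribution, roughly 68% of values lie within 1 SD of the mean.
within_1sd = sum(abs(h - mean_h) <= sd_h for h in heights) / N_PEOPLE
print(round(mean_h, 1), round(sd_h, 2), round(within_1sd, 3))
```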
Assume there are 300 independent factors that determine the exact weight of a mooncake.
Assume that each of these 300 factors is associated with an independent random variable
Y1 , Y2 , . . . , Y300 , each identically distributed with mean µY and variance σY². Assume also
that the weight of a mooncake is simply given by W = Y1 + Y2 + ⋅ ⋅ ⋅ + Y300 .
Then the CLT says that since n = 300 is “large enough”, W will be approximately normally
distributed, with mean 300µY and variance 300σY². Amongst the millions of mooncakes
produced, there will be some very light mooncakes and some very heavy mooncakes, but
most mooncakes will be near the mean weight 300µY .
Mathematical modellers often assume that “everything is normal”. There are three justifi-
cations for this:
1. We have strong empirical evidence that many things in nature are normally-distributed.
2. We have a strong theoretical reason (the CLT) for why this might be so.
3. The normal distribution is easy to handle (because e.g. the maths is easy, compared to
some other distributions).
However, many things are not normally-distributed. It is thus a mistake to assume that
“everything is normal”.
One example of a common but non-normal distribution found in nature is the Pareto
distribution. We’ll skip the formal details. Informally, it is called the Pareto Principle or
the 80-20 Rule and businesspersons say things like:
Let’s see if the points scored in the NBA resemble the Pareto distribution.77 In particular,
is it the case that 20% of NBA players have scored 80% of the points?
77 Source: Basketball-Reference.com. Caveats: (1) The data were retrieved on June 15th, 2016, so the points scored are
between 1946 and that date. (2) By NBA, I actually mean the BAA (1946-1949), the NBA (1949-present), and the ABA
(1967-1976), combined. Dataset.
The grand total number of points ever scored in the NBA is 11,565,923, of which
8,424,242 (or 72.8%) were scored by the top 20% of players (812 of them). So it appears
that the 80-20 Rule is a reasonably good description of the distribution of total points
scored by players! In contrast, the normal distribution is obviously not a good description.
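The “share of the top 20%” calculation is simple enough to sketch in Python. The function top_share is a name I’ve made up. The first two lines of output re-do the arithmetic from the figures quoted above (with 4,060 players in total), while the toy list at the end is invented purely for illustration and is not NBA data.

```python
def top_share(values, top_fraction=0.2):
    """Share of the grand total contributed by the largest `top_fraction` of values."""
    ranked = sorted(values, reverse=True)
    k = round(len(ranked) * top_fraction)
    return sum(ranked[:k]) / sum(ranked)

# Figures quoted in the text: points scored by the top 20%, and the grand total.
print(round(8_424_242 / 11_565_923, 3))   # share scored by the top 20%
print(round(0.2 * 4060))                  # number of players in the top 20%

# Hypothetical toy data (NOT real NBA data): career points of ten players.
toy = [5000, 3000, 400, 300, 300, 250, 250, 200, 200, 100]
print(top_share(toy))
```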
It’s fairly obvious to anyone who bothers graphing the data that “points scored in the
NBA” is not normally-distributed. There are however instances where this is less obvious.
One is thus more likely to mistakenly assume a normal distribution. A famous and tragic
example of this is given by the financial markets.
Let qi be the % change in closing value on day i, as compared to day i − 1. For example, on
June 14th, 2016, the DJIA closed at 17,674.82. On June 15th, 2016, it closed at 17,640.17,
34.65 points lower than the previous day’s close. Thus,
$$q_{20160615} = \frac{-34.65}{17{,}674.82} \approx -0.20\%.$$
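In code, the same calculation is a one-liner. Here is a minimal Python sketch, where pct_change is a name I’ve made up for illustration.

```python
def pct_change(prev_close, close):
    """Percentage change in closing value, relative to the previous day's close."""
    return 100 * (close - prev_close) / prev_close

q = pct_change(17_674.82, 17_640.17)   # June 14th to June 15th, 2016
print(round(q, 2))
```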
The graph here is of q, on 36,044 consecutive trading days (over 131 years). In black are
those days when the DJIA rose; in red are when it fell.
Can you spot the single largest one-day fall in the DJIA? (We’ll talk about this singular
day shortly.)
The graph here is also of q, but in the form of a histogram. Each bin has width 0.1%
(except the leftmost and rightmost bins). For example, on 2,204 days (out of 36,044),
q ∈ (−0.1%, 0%] (the DJIA fell by between 0.1% and 0%).
On 78 days, q ≤ −5% (the DJIA fell by more than 5%). On 70 days, q > 5% (the DJIA rose
by more than 5%).
It seems reasonable to say that q is normally-distributed (at least if we ignore the leftmost
and rightmost bins).
The sample mean and standard deviation (from the 36,044 observations) are µ ≈ 0.023%
and σ ≈ 1.064%. So let’s suppose q were normally-distributed with mean µ and variance
σ².
If so, then we’d predict (as per the properties of the normal distribution) that:
1. 0.6827 of the time, q is within 1 standard deviation of the mean, i.e. q ∈ (−1.04%, 1.09%).
2. 0.954 of the time, q is within 2 standard deviations of the mean, i.e. q ∈ (−2.10%, 2.15%).
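These two predictions can be reproduced with a few lines of Python. This is a sketch under the supposition just made, namely that q is normally distributed with µ = 0.023 and σ = 1.064 (both in %). The helper within is a name I’ve made up.

```python
from statistics import NormalDist

mu, sigma = 0.023, 1.064            # sample mean and standard deviation, in %
Q = NormalDist(mu, sigma)

def within(k):
    """P(q is within k standard deviations of the mean), if q were normal."""
    return Q.cdf(mu + k * sigma) - Q.cdf(mu - k * sigma)

for k in (1, 2):
    lo, hi = mu - k * sigma, mu + k * sigma
    print(k, round(lo, 3), round(hi, 3), round(within(k), 4))
```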
In addition to the above “evidence”, we might make the following theoretical argument:
Share prices are affected by a myriad random and arguably-independent factors. Hence,
by the CLT, we’d expect share prices (and thus q as well) to be normally-distributed.
But as it actually turned out, during these 131 years, the DJIA rose or fell by more than ...
1. ... 5% 148 times.
2. ... 7% 40 times.
3. ... 10% 10 times.
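To appreciate how badly these counts clash with the normal model, we can ask how many such days the model would have predicted. The Python sketch below assumes, purely for the sake of argument, that q is normally distributed with µ = 0.023 and σ = 1.064 (in %) on each of the 36,044 days. The function expected_days is a name I’ve made up.

```python
from statistics import NormalDist

mu, sigma, n_days = 0.023, 1.064, 36_044
Q = NormalDist(mu, sigma)

def expected_days(threshold):
    """Expected number of days with a rise or fall of more than `threshold` %."""
    p = Q.cdf(-threshold) + (1 - Q.cdf(threshold))
    return n_days * p

# Observed counts (from the list above) versus what the normal model predicts.
for threshold, observed in ((5, 148), (7, 40), (10, 10)):
    print(threshold, observed, expected_days(threshold))
```

For very large thresholds, the computed tail probability is so tiny that floating-point arithmetic may report it as zero.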
Probability: Given a known model, what can we say about the data we’ll observe?
Statistics: Given observed data, what can we say about the model?
Statistics question: “Suppose we observe HHH. Then what can we say about p?”
(Different statisticians will give different answers.)
In the real world, we will almost never know what p “truly” is. Instead, we usually only
have some limited data observations (such as observing HHH).
Probability is about making heroic assumptions about what p is, in order to draw inferences
about what the observed data will look like.
In contrast, statistics is about using limited, observed data to draw statistical inferences
about the model and its parameters.
Example 587. Ann and Bob are two infinitely-intelligent persons. Ann believes that the
probability of rain tomorrow is 0.2 and Bob believes that it is 0.6.
• Objectivist view: There is some single, “correct” probability p of rain tomorrow. Per-
haps no one (except some Supreme Being up above) will ever know what exactly p is.
But in any case, we can say that exactly one of the following must be true:
• Subjectivist view: A probability is not some objective, rational thing that exists outside
the mind of any human being. There is no “correct” probability. Instead, a probability
is merely
Thus, Ann and Bob can legitimately disagree about the probability of rain tomorrow,
without either being wrong. After all, the numbers 0.2 and 0.6 are merely their personal,
subjective degrees of belief in the likelihood of rain tomorrow.
Bruno de Finetti (1906-1985) was perhaps the most famous and extreme subjectivist ever.
In a preface to a book, he provocatively declared, with his CAPSLOCK key stuck, that
In this textbook (and for the A-levels), we will be strict objectivists. The main practical
implications of being an objectivist are illustrated in the following examples:
78 Theory of Probability, v. 1, 1990 edition, p. x. (Originally published as Teoria delle probabilità in 1970.)
Objectivist interpretation: Ann and Bob cannot both be correct. The suspect is either
innocent (with probability 1) or guilty (with probability 1).
In fact, we can go even further and say that both Ann and Bob are talking nonsense. It
is nonsensical to say things like the suspect is “probably” innocent (or “probably” guilty),
because the suspect either is innocent or not.
Subjectivist interpretation: Ann and Bob are perfectly well-entitled to their beliefs.
Moreover, it is perfectly meaningful to say things like the suspect is “probably” innocent
(or “probably” guilty). Ann and Bob do not know for sure whether the suspect is innocent
or guilty. They are thereby perfectly well-entitled to speak probabilistically about the
innocence or guilt of the suspect.
Example 589. We flip a coin 100 times and get 100 heads.
Given these observed data (100 heads out of 100 flips), what can we say (what statistical
inference can we make) about whether or not the coin is fair?
Subjectivist answer: The coin is probably not fair. (This is perhaps the answer that
most laypersons would give.)
Objectivist answer: The coin either is fair (with probability 1) or isn’t fair (with prob-
ability 1). Subjectivist statements like the coin is “probably” not fair are nonsensical.
Most untrained laypersons are innately subjectivist. Yet in this book (and also for the
A-levels), you’ll be trained to think like strict objectivists.
Note though that it is not the case that one school of thought is correct and the other
wrong. Both the objectivist and subjectivist schools of thought have merit. The growing
consensus amongst statisticians is to take the best of both worlds.
Nonetheless, in this textbook, we learn only the objectivist interpretation. Not because it
is necessarily superior, but rather because
71.1 Population
Definition 129. A population is any ordered set (i.e. vector) of objects we’re interested
in.
A population can be finite or infinite. But to keep things simple, we’ll look at examples
where it is finite.
Example 590. The two candidates for the 2016 Bukit Batok SMC By-Election are Dr.
Chee Soon Juan and PAP Guy. It is the night of the election and voting has just closed.
Our objects-of-interest are the 23,570 valid ballots cast. (A ballot is simply a piece of paper
on which a vote is recorded. The words ballot and vote are often used interchangeably.)
Arrange the ballots in any arbitrary order. Let v1 = 1 if the first ballot is in favour of Dr.
Chee and v1 = 0 otherwise. Similarly and more generally, for any i = 2, 3, . . . , 23570, let
vi = 1 if the ith ballot is in favour of Dr. Chee and vi = 0 otherwise.
Our population here is simply the ordered set P = (v1 , v2 , . . . , v23570 ). So in this example,
the population is simply an ordered set of 1s and 0s.
The population mean µ is simply the average across all population values. The population
variance σ² is a measure of the variation across all population values. Formally:79
Definition 130. Given a finite population P = (v1 , v2 , . . . , vk ), the population mean µ and
population variance σ² are defined by
$$\mu = \frac{\sum_{i=1}^{k} v_i}{k} = \frac{v_1 + v_2 + \dots + v_k}{k} \quad\text{and}\quad \sigma^2 = \frac{\sum_{i=1}^{k} (v_i-\mu)^2}{k} = \frac{(v_1-\mu)^2 + (v_2-\mu)^2 + \dots + (v_k-\mu)^2}{k}.$$
Example 590 (continued from above). Suppose that of the 23,570 votes, 9,142 were
for Dr. Chee and the remaining against. So the vector (v1 , v2 , . . . , v23570 ) contains 9,142 1s
and 14,428 0s.
In this particular example, the population values are binary (either 0 or 1). And so we have
a nice alternative interpretation: the population mean is also the population proportion.
In this case, it is the proportion of the population who voted for Dr. Chee. So here the
proportion of votes for Dr. Chee is about 0.3879.
$$\sigma^2 = \frac{(v_1-\mu)^2 + (v_2-\mu)^2 + \dots + (v_n-\mu)^2}{n} = \frac{9142\left(1-\frac{9142}{23570}\right)^2 + 14428\left(0-\frac{9142}{23570}\right)^2}{23570} \approx 0.2374.$$
As usual, the variance tells us about the degree to which the vi ’s vary. Of course, in this
example, we already know that the vi ’s can take on only two values — 0 and 1. So the
variance isn’t terribly interesting or informative in this example. In particular, it doesn’t
tell us anything more that the population mean didn’t already tell us (indeed, it can be
shown that in this example, σ² = µ − µ²).
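Here is a minimal Python sketch of these two calculations. It uses Python’s Fraction type so that the identity between the variance and µ − µ² can be checked exactly, rather than only up to rounding. The variable names are mine.

```python
from fractions import Fraction

ones, zeros = 9142, 14428            # votes for Dr. Chee, votes against
k = ones + zeros                     # population size: 23570

mu = Fraction(ones, k)               # population mean (here, also the proportion)

# Population variance: average squared deviation from mu, grouping the 1s and the 0s.
var = (ones * (1 - mu) ** 2 + zeros * (0 - mu) ** 2) / k

print(float(mu), float(var))
print(var == mu - mu ** 2)           # exact check of the identity for 0/1 populations
```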
79 In the case of an infinite population, the definitions of µ and σ² must be adjusted slightly, but the intuition is the same.
Informally, a parameter is some number we’re interested in and which may be calculated
based on the population.
Voting has just closed. In a few hours’ time (after the vote-counting is done), we will know
what exactly µ is. But right now, we still don’t know what µ is.
Suppose we are impatient and want to know right away what µ might be. In other words,
suppose we want to get an estimate of the true value of µ. What are some possible
methods of getting a quick estimate of µ?
One possibility is to observe a random sample of 100 votes and count the proportion of
these 100 votes that are in favour of Dr. Chee. So for example, say we do this and observe
that 39 out of the 100 votes are for Dr. Chee. That is, we find that the observed sample
mean (which in this context can also be called the observed sample proportion) is
0.39. Then we might conclude:
Based on this observed random sample of 100 votes, we estimate that µ is 0.39.
The layperson might be content with this. But the statistician digs a little deeper and asks
questions such as:
• How do we know if this estimate is “good”?
• What are the criteria to determine whether an estimate is “good”?
We’ll now try to address, if only to a limited extent, these questions. But to do so, we must
first precisely define terms like sample and estimate.
1. The range of possible values taken on by the objects in the population; and
2. The proportion of the population that takes on each possible value.
Example 590 (continued from above). The population is P = (v1 , v2 , . . . , v23570 ), the
ordered set of 23,570 ballots. Suppose that of these, 9,142 are votes for Dr. Chee (hence
recorded as 1s) and the remaining 14,428 are for PAP Guy (hence recorded as 0s).
Then the distribution of the population can informally be described in words as:
• A proportion 9142/23570 of the population are 1s, and
• A proportion 14428/23570 of the population are 0s.
Then the distribution of the population can informally be described in words as:
• A proportion 1/6 of the population are 2s;
• A proportion 2/6 of the population are 3s;
• A proportion 1/6 of the population are 4s; and
• A proportion 2/6 of the population are 7s.
80 Formally, we’d define the population distribution as a function. Indeed, some writers define the population itself as the
distribution function.
Informally, to observe a random sample of size n, we follow this procedure: Imagine the
23,570 ballots are in a single big bag.
1. Randomly pull out one ballot. Record the vote (either we write x1 = 1, if the vote was
for Dr. Chee, or we write x1 = 0, if it wasn’t).
2. Put this ballot back in (this second step is why we call it sampling with replacement).
3. Repeat the above n times in total, so as to record the values of x1 , x2 , . . . , xn .
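The three steps above can be sketched in Python as follows. This is only an illustration: it uses the supposed vote counts from this example (9,142 ballots for Dr. Chee and 14,428 against), the function name observe_random_sample is mine, and the seed 2016 is arbitrary.

```python
import random

population = [1] * 9142 + [0] * 14428     # the 23,570 ballots, recorded as 1s and 0s

def observe_random_sample(n, rng):
    """Draw n ballots WITH replacement; return the observed values x_1, ..., x_n."""
    # rng.choice never removes anything from the bag, so each draw
    # sees the full population again (steps 1 and 2, repeated n times).
    return [rng.choice(population) for _ in range(n)]

rng = random.Random(2016)                 # arbitrary seed, for repeatability
sample = observe_random_sample(100, rng)
print(sum(sample) / len(sample))          # proportion of 1s in the observed sample
```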
Definition 131. Let P be a population. Then the random vector (i.e. ordered set of
random variables) (X1 , X2 , . . . , Xn ) is a random sample of size n from the population P if
An example to illustrate:
Let X1 , X2 , and X3 be independent random variables, each with the same distribution as
the population. That is, for each i = 1, 2, 3,
$$P(X_i = 0) = \frac{14428}{23570} \quad\text{and}\quad P(X_i = 1) = \frac{9142}{23570}.$$
In this textbook, we’ll be very careful to distinguish between a random sample (which is
a vector of random variables) and an observed random sample (which is a vector of real
numbers).
This may be contrary to the practice of your teachers or indeed even the A-level exams.
Definition 132. Let (X1 , X2 , . . . , Xn ) be a random sample of size n. Then the corresponding
sample mean X̄ and the sample variance S² are the random variables defined
by:
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n},$$
$$S^2 = \frac{(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \dots + (X_n-\bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{n-1}.$$
(The List of Formulae you get during exams will contain the observed sample variance.)
Note that strangely enough, the denominator of S² is n − 1, rather than n as one might
expect. As we’ll see later, there is a good reason for this.
By the way, there are two other formulae for calculating the sample variance:
Fact 81. Let S = (X1 , X2 , . . . , Xn ) be a random sample of size n. Let X̄ be the sample
mean and S 2 be the sample variance. Let a ∈ R be a constant. Then
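\[ \text{(a)} \quad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[\sum_{i=1}^{n} X_i\right]^2}{n}}{n-1} \quad \text{and} \quad \text{(b)} \quad S^2 = \frac{\sum_{i=1}^{n} (X_i - a)^2 - \frac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2}{n}}{n-1}. \]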
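As a quick sanity check (not a proof), here's a minimal Python sketch that computes the sample variance from the definition and from formulae (a) and (b), using exact fractions. The observed values in `xs` and the constant `a = 4` are hypothetical.

```python
from fractions import Fraction

def s2_definition(xs):
    """Sample variance straight from the definition (denominator n - 1)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def s2_shifted(xs, a=0):
    """Formula (b); with a = 0 it reduces to formula (a)."""
    n = len(xs)
    d = [x - a for x in xs]
    return (sum(v * v for v in d) - Fraction(sum(d) ** 2, n)) / (n - 1)

xs = [2, 4, 4, 10]  # hypothetical observed values
print(s2_definition(xs), s2_shifted(xs), s2_shifted(xs, a=4))
```

All three printed values agree, whatever constant `a` we pick.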
• The sample mean X̄ (a random variable) vs. the observed sample mean x̄ (a real
number).
• The sample variance S 2 (a random variable) vs. the observed sample variance s2
(a real number).
Example 672 (continued from above). Let (X1 , X2 , X3 ) be a random sample of size 3.
The corresponding sample mean X̄ and sample variance S 2 are these random variables:
\[ \bar{X} = \frac{X_1 + X_2 + X_3}{3}, \qquad S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2}{3-1}. \]
Suppose our observed random sample of size 3 is (1, 0, 0). Then the corresponding ob-
served sample mean x̄ and observed sample variance s2 are these real numbers:
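\[ \bar{x} = \frac{x_1 + x_2 + x_3}{n} = \frac{1 + 0 + 0}{3} = \frac{1}{3}, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{n-1} = \frac{\left(1 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2}{3-1} = \frac{1}{3}. \]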
Suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the corresponding
observed sample mean x̄ and observed sample variance s2 are these real numbers:
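\[ \bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{n} = \frac{0 + 1 + 0 + 0 + 1}{5} = \frac{2}{5} = 0.4, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2 + (x_5 - \bar{x})^2}{n-1} \]
\[ = \frac{\left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2}{5-1} = 0.3. \]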
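If you'd like to check such calculations by computer, here's a minimal Python sketch using the standard library's `statistics.variance`, which (like our definition) divides by n − 1. Using fractions keeps the arithmetic exact; the dictionary name `results` is just for illustration.

```python
from fractions import Fraction
from statistics import mean, variance

results = {}
for sample in [(1, 0, 0), (0, 1, 0, 0, 1)]:
    xs = [Fraction(x) for x in sample]
    # mean(xs) is the observed sample mean; variance(xs) divides by n - 1.
    results[sample] = (mean(xs), variance(xs))
    print(sample, results[sample][0], results[sample][1])
```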
Example 672 (continued from above). It is the night of the election and polling has
just closed. We still do not know the true proportion µ that voted for Dr. Chee.
We decide to get a random sample of size 3: (X1 , X2 , X3 ). The corresponding sample mean
X̄3 = (X1 + X2 + X3 ) /3 shall be an estimator for µ. (Informally, an estimator is a method
for generating “guesses” for some unknown parameter, in this case µ.)
This estimator is used to generate estimates (“guesses”) for µ. For every observed
random sample, the estimator generates an estimate.
Suppose our observed random sample of size 3 is (1, 0, 0). We calculate the corresponding
observed sample mean to be x̄ = 1/3. We say that x̄ = 1/3 is an estimate for µ.
(By the way, unless we are extremely lucky, it is highly unlikely that the true value of the
unknown parameter µ is precisely 1/3. After all, 1/3 is merely an estimate obtained from
a single observed random sample of size 3.)
Suppose instead that our observed random sample of size 3 were (0, 1, 1). Then the cor-
responding observed sample mean would be x̄ = 2/3. We’d instead say that x̄ = 2/3 is our
estimate for µ.
There is also more than one estimator we can use. For example, suppose instead that we
decide to get a random sample of size 5: (X1 , X2 , X3 , X4 , X5 ). We shall instead use the
corresponding sample mean X̄ = (X1 + X2 + X3 + X4 + X5 ) /5 as our estimator for µ. And
so for example suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the
corresponding observed sample mean is x̄ = 0.4, and x̄ = 0.4 would be our estimate for µ.
Now, are these estimators and estimates “good” or “reliable”? How much should we
trust them? These are questions that we’ll address in the next section.
Example 592. Suppose we wish to find the average height µ (in cm) of an adult male.
As a practical matter, it would be quite difficult to locate and record the height of every
adult male in the world. So instead, what we might do is to randomly pick 4 adult males
and record their heights. This gives us a random sample (H1 , H2 , H3 , H4 ) of heights. The
corresponding sample mean is the random variable H̄ = (H1 + H2 + H3 + H4 ) /4. H̄ shall
serve as our estimator for µ.
Suppose our observed random sample is (h1 , h2 , h3 , h4 ) = (178, 165, 182, 175). Then the
corresponding observed sample mean is h̄ = (178 + 165 + 182 + 175) /4 = 175.
Thus, h̄ = 175 serves as an estimate (or “guess”) of the true average male height µ.
Again, are the estimator H̄ and estimate h̄ = 175 “good” or “reliable”? How much should
we trust them? These are questions that we’ll address in the next section.
\[ \sum_{i=1}^{8} x_i = 1320 \quad \text{and} \quad \sum_{i=1}^{8} x_i^2 = 218360. \]
Then the observed sample mean x̄ and the observed sample variance s2 are
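\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1320}{8} = 165, \]
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{218360 - \frac{1320^2}{8}}{7} = 80. \]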
And our estimates for µ and σ 2 are, respectively, 165 cm and 80 cm2 .
\[ \sum_{i=1}^{8} (x_i - 160) = 72 \quad \text{and} \quad \sum_{i=1}^{8} (x_i - 160)^2 = 1560. \]
Then the observed sample mean x̄ and the observed sample variance s2 are
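\[ \bar{x} = 160 + \frac{\sum_{i=1}^{n} (x_i - 160)}{n} = 160 + \frac{72}{8} = 169, \]
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - 160)^2 - \frac{\left[\sum_{i=1}^{n} (x_i - 160)\right]^2}{n}}{n-1} = \frac{1560 - \frac{72^2}{8}}{7} \approx 130.3. \]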
And our estimates for µ and σ 2 are, respectively, 169 cm and 130.3 cm2 .
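Both calculations above follow the same recipe, which can be sketched as one small Python function. This is only an illustration: the function name `estimates_from_sums` and its parameter names are made up, and setting `a = 0` corresponds to being given the raw sums.

```python
def estimates_from_sums(n, sum_d, sum_d2, a=0):
    """Return (x_bar, s2) given n, the sum of (x_i - a), and the sum of (x_i - a)^2."""
    x_bar = a + sum_d / n                          # shift the mean back by a
    s2 = (sum_d2 - sum_d ** 2 / n) / (n - 1)       # Fact 81(b); shifting doesn't change s2
    return x_bar, s2

raw = estimates_from_sums(8, 1320, 218360)         # sums of x_i and x_i^2
shifted = estimates_from_sums(8, 72, 1560, a=160)  # sums of (x_i - 160) and (x_i - 160)^2
print(raw, shifted)
```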
Exercise 256. (Answer on p. 1177.) Let X be the random variable that is the weight (in
kg) of an American. Suppose we are interested in estimating the true population mean µ
and variance σ 2 of X. We get an observed random sample of size 10: (x1 , x2 , . . . , x10 ).
(a) Suppose you are told that \( \sum_{i=1}^{10} x_i = 1885 \) and \( \sum_{i=1}^{10} x_i^2 = 378265 \). Find the observed sample
mean x̄ and observed sample variance s².
(b) Suppose you are instead told that \( \sum_{i=1}^{10} (x_i - 50) = 1885 \) and \( \sum_{i=1}^{10} (x_i - 50)^2 = 378265 \). Find
the observed sample mean x̄ and observed sample variance s².
Earlier we asked: How do we decide if an estimator and the estimates it generates are
“good”? How do we know whether to trust any given estimate?
For H2 Maths, we’ll learn only about one (important) criterion for deciding whether an
estimator is “good”. This is unbiasedness. Informally, an estimator is unbiased if on
average, the estimator “gets it right”. Formally:
Definition 133. Let X be a random variable and θ ∈ R be a parameter (i.e. just some real
number). We say that X is an unbiased estimator for θ if
E [X] = θ.
The next proposition says that the sample mean X̄ is an unbiased estimator for the
population mean µ; and the sample variance S 2 is an unbiased estimator for the
population variance σ 2 .
Proposition 15. Let (X1 , X2 , . . . , Xn ) be a random sample of size n drawn from a distri-
bution with population mean µ and population variance σ 2 . Let X̄ be the sample mean and
S 2 be the sample variance. Then
(a) E [X̄] = µ. And
(b) E [S 2 ] = σ 2 .
Proof. You are asked to prove (a) in Exercise 258. For the proof of (b), see p. 993 in the
Appendices (optional).
Proposition 15(b) is the reason why, strangely enough, we define the sample variance with
n − 1 in the denominator:
\[ S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n-1}. \]
As defined, S 2 is an unbiased estimator for the population variance σ 2 . This, then, is the
reason why we define it like this.
Some writers call S 2 the unbiased sample variance, but we shall not bother doing so. We’ll
simply call S 2 the sample variance.
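To see the role of n − 1 concretely, here's a minimal Python sketch that computes the expected value of the sample variance exactly, by listing all 2³ = 8 possible samples of size 3 from a 0/1 population. It assumes P(Xi = 1) = 9142/23570 as in the earlier election example, and uses the standard fact that a 0/1 random variable with P(X = 1) = p has variance p(1 − p). The function name `expected_variance` is made up.

```python
from fractions import Fraction
from itertools import product

p = Fraction(9142, 23570)      # assumed P(X_i = 1)
sigma2 = p * (1 - p)           # population variance of a 0/1 variable
n = 3

def expected_variance(divisor):
    """Exact expectation of sum((X_i - X_bar)^2) / divisor over all 2^n samples."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=n):
        prob = Fraction(1)
        for x in xs:
            prob *= p if x == 1 else 1 - p     # independence: multiply probabilities
        x_bar = Fraction(sum(xs), n)
        total += prob * sum((x - x_bar) ** 2 for x in xs) / divisor
    return total

print(expected_variance(n - 1) == sigma2)   # dividing by n - 1
print(expected_variance(n) == sigma2)       # dividing by n
```

With n − 1 in the denominator the expectation equals σ² exactly; with n it comes out as (n − 1)/n times σ², which is too small.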
Suppose two observed random samples of size 3 are (x1 , x2 , x3 ) = (1, 0, 0) and (x1 , x2 , x3 ) =
(1, 0, 1). The corresponding observed sample means are x̄1 = 1/3 and x̄2 = 2/3. These are
two possible estimates (“guesses”) of the true population proportion µ.
Unless we’re extremely lucky, it’s unlikely that either of these two estimates is exactly
correct. Nonetheless, what the above unbiasedness proposition tells us is this:
Suppose the unknown population mean is µ = 0.39. We draw the following 10 observed
random samples of size 3 (table below). For each sample i, we calculate the corresponding
observed sample mean x̄i .
Sample i   x1   x2   x3   x̄i
1          1    0    1    2/3
2          0    0    0    0
3          0    1    0    1/3
4          1    0    0    1/3
5          0    1    1    2/3
6          1    0    0    1/3
7          0    0    0    0
8          0    0    0    0
9          0    0    1    1/3
10         1    1    0    2/3
Note that every estimate x̄i is wrong. Indeed, since the sample mean X̄i can only take on
values 0, 1/3, 2/3, or 1, the estimates can never possibly be equal to the true µ = 0.39.
Nonetheless, what the above proposition says informally is that on average, the estimate
gets it correct. Formally, E [X̄] = µ = 0.39.
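The claim can be checked exactly for this example. Here's a minimal Python sketch, assuming (as above) µ = 0.39 and samples of size 3. It lists the four values X̄ can take together with their probabilities, then computes E[X̄]. The variable names are made up.

```python
from fractions import Fraction
from math import comb

mu = Fraction(39, 100)
n = 3

# P(X_bar = k/n) = P(exactly k ones among the n draws), a binomial probability.
dist = {Fraction(k, n): comb(n, k) * mu**k * (1 - mu)**(n - k) for k in range(n + 1)}
expected = sum(value * prob for value, prob in dist.items())

print(sorted(dist))        # the only values X_bar can take: 0, 1/3, 2/3, 1
print(float(expected))     # 0.39
```

So no single estimate can equal 0.39, yet the probability-weighted average of the possible estimates is exactly 0.39.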
For a demonstration that you can play around with, try this Google spreadsheet.
Exercise 258. Prove that E [X̄] = µ. (This is part (a) of Proposition 15). (Answer on p.
1178.)
Exercise 259. Suppose we flip a coin 10 times. The first 7 flips are heads and the next 3
are tails. Let 1 denote heads and 0 denote tails. (Answer on p. 1178.)
(a) Write down, in formal notation, our observed random sample, the observed sample
mean, and observed sample variance.
(b) Are these observed sample mean and variance unbiased estimates for the true population
mean and variance?
(c) Can we conclude that this is a biased coin (i.e. the true population mean is not 0.5)?
This section is just to repeat, stress, and emphasise that the sample mean X̄ is itself a
random variable. This is an important point.
Indeed, the sample mean X̄ is both (i) a random variable; and (ii) an estimator. In
contrast, an observed sample mean x̄ is both (i) a real number; and (ii) an estimate.
We’ve shown that E [X̄] = µ. This equation can be interpreted in two equivalent ways:
• The expected value of the sample mean equals the population mean µ.
• The sample mean is an unbiased estimator for the population mean µ.
We now give the variance of the sample mean. It turns out to be equal to the population
variance σ 2 , divided by the sample size n.
Fact 82. \( V[\bar{X}] = \frac{\sigma^2}{n} \).
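Before proving this, you might like to see it hold numerically. Here's a minimal Python sketch for a 0/1 population, assuming a hypothetical P(Xi = 1) = 0.3 and using the fact that the number of 1s in the sample is binomial. This is a check for a few values of n, not a proof.

```python
from fractions import Fraction
from math import comb

p = Fraction(3, 10)            # hypothetical P(X_i = 1)
sigma2 = p * (1 - p)           # population variance of a 0/1 variable

def variance_of_sample_mean(n):
    """Exact V[X_bar] for n independent 0/1 draws."""
    probs = {Fraction(k, n): comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
    mean = sum(v * q for v, q in probs.items())
    return sum((v - mean) ** 2 * q for v, q in probs.items())

for n in (1, 4, 25):
    print(n, variance_of_sample_mean(n), sigma2 / n)
```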
Exercise 260. Prove Fact 82. (Hint: Note that \( \bar{X} = \frac{1}{n} (X_1 + X_2 + \cdots + X_n) \) and X1 , X2 ,
. . . , Xn are independent.) (Answer on p. 1178.)
Exercise 261. For each of the following terms, give a formal definition and an intuitive
explanation. (State whether each term is a random variable or a real number.) For sim-
plicity, you may assume that the finite population is given by P = (x1 , x2 , . . . , xk ). (Answer
on p. 1179.)
\[ \bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \]
Proof. Corollary 7 tells us that the sum of normal random variables is itself a normal
random variable. So X1 + X2 + ⋅ ⋅ ⋅ + Xn is a normal random variable.
Fact 79 tells us that a linear transformation of a normal random variable is itself a normal
random variable. So X̄n = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is a normal random variable.
In the previous sections, we already showed that X̄n has mean µ and variance σ 2 /n.
Altogether then, \( \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \).
\[ \bar{X}_n = \frac{X_1 + X_2 + \cdots + X_n}{n}. \]
Then \( \lim_{n \to \infty} \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right) \).
Proof. The CLT says that if n is “large enough”, then X1 +X2 +⋅ ⋅ ⋅+Xn is well-approximated
by the normal distribution N (nµ, nσ 2 ).
And so it follows from Fact 79 (a linear transformation of a normal random variable is itself
a normal random variable) that X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is well-approximated by the
normal distribution \( N\left(\mu, \frac{\sigma^2}{n}\right) \).
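To get a feel for how good the approximation can be, here's a minimal Python sketch comparing an exact probability with its normal approximation. Everything in it is an assumption chosen for illustration: a 0/1 population with P(Xi = 1) = 0.3, a sample of size n = 100, and the event X̄ ≤ 0.35. The continuity correction of 0.5/n is my addition (it accounts for X̄ only taking values that are multiples of 1/n).

```python
from math import comb, erf, sqrt

def normal_cdf(x, mean, var):
    """P(Z <= x) for Z ~ N(mean, var)."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

n, p = 100, 0.3                      # hypothetical sample size and P(X_i = 1)
mu, sigma2 = p, p * (1 - p)          # population mean and variance

# Exact P(X_bar <= 0.35) = P(X_1 + ... + X_n <= 35), a binomial probability.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(36))
# Normal approximation N(mu, sigma2 / n), with a continuity correction.
approx = normal_cdf(0.35 + 0.5 / n, mu, sigma2 / n)

print(round(exact, 4), round(approx, 4))
```

The two printed numbers should be close to each other.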
In the next chapter, we’ll make greater use of the two results given in this section.
Example 594. Suppose we’re interested in the average height of a Singaporean. The only
way to know this for sure is to survey every single Singaporean. This, however, is not
practical.
Instead, we have only the resources to survey 100 individuals. We decide to go to a bas-
ketball court and measure the heights of 100 people there. We thereby gather an ob-
served sample of size 100: (x1 , x2 , . . . , x100 ). We find that the average individual’s height is
x̄ = ∑ xi /100 = 179 cm.
The reason is that our observed sample of size 100 was non-random. We picked a basketball
court, where the individuals are overwhelmingly (i) male; and (ii) taller than average. Our
estimate x̄ = 179 cm is thus probably biased upwards.
Example 595. Suppose we’re interested in what the average Singaporean family spends
on food each month. The only way to know this for sure is to survey every single family in
Singapore. This, however, is not practical.
Instead, we have only the resources to survey 100 families. We decide to go to Sixth
Avenue and randomly ask 100 families living there what they reckon they spend on food
each month. We thereby gather an observed sample of size 100: (x1 , x2 , . . . , x100 ). We find
that the average family spends x̄ = ∑ xi /100 = $2,700 on food each month.
Is x̄ = $2,700 an unbiased estimate of the average monthly spending on food by a
Singaporean family? Intuitively, we know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked an unusually
affluent neighbourhood. Our estimate x̄ = $2,700 is thus probably biased upwards.
SYLLABUS ALERT
This is in the 9740 (old) syllabus, but not in the 9758 (revised) syllabus. So you can skip
this section if you’re taking 9758.
The problem with this is that we might be unlucky and get disproportionately many males
or females. For example, we might get an observed random sample of 60 males and 40
females. Since males are taller than females, our observed sample mean height x̄ would
probably be an overestimate of the true µ.
To reduce such bad luck, we might first divide the population into 2 strata: Male and
Female. Assume that 0.49 of the population is male and 0.51 of the population is female.
Then within the Male stratum, we randomly pick 49 individuals; and within the Female
stratum, we randomly pick 51 individuals. This is called stratified sampling.
The sex ratio of our random sample is thus guaranteed (by design) to closely match that
of the population.
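Here's a minimal Python sketch of stratified sampling with proportional allocation. It's only an illustration under made-up assumptions: a population of 10,000 heights, 49% male and 51% female, with heights generated from invented normal distributions. Within each stratum it picks individuals without replacement.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Pick from each stratum a number of individuals proportional to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)      # this stratum's share of the n picks
        sample[name] = rng.sample(members, k)    # random picks within the stratum
    return sample

# Hypothetical population of heights (cm).
gen = random.Random(0)
population = {
    "Male": [gen.gauss(172, 7) for _ in range(4900)],
    "Female": [gen.gauss(160, 6) for _ in range(5100)],
}
picked = stratified_sample(population, n=100, seed=1)
print({name: len(heights) for name, heights in picked.items()})
```

In general the rounded stratum sizes needn't add up to exactly n; here they do because 49 and 51 are whole numbers.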
81
More formally, stratified sampling reduces the variance of the estimator.
Example 597. Again, we want to know the average height µ of a Singaporean. We decide to
use a random sample of 100 individuals. The sample mean height X̄ will be our estimator
for µ.
Again, we divide the population into 2 strata: Male and Female. Assume that 0.49 of the
population is male and 0.51 of the population is female.
In stratified sampling, we would have randomly picked 49 males and 51 females. In contrast,
in quota sampling, the interviewer is given the freedom to choose any 49 males and
any 51 females, in whatever way the interviewer deems appropriate.
A small advantage of quota sampling is that it gives the interviewer more flexibility and
can thus speed up the collection of data.
The big disadvantage with quota sampling is that the interviewer may unwittingly introduce
biases. For example, told simply to choose 49 males and 51 females, the interviewer might
choose respondents who are more attractive, more friendly-looking, of the same race, etc.
Example 598. The 1948 US Presidential Election featured Harry Truman (Democrat) vs
Thomas Dewey (Republican). In the months leading to the election, polls were almost
unanimous that there’d be a landslide victory for Dewey. They were wrong.
The following is taken from p. 19 of a 1949 report analysing the 1948 polling disaster.
Every poll gave Dewey a sizeable lead. Every poll was wrong.
Poll            Dewey    Truman   T − D
Gallup Poll     49.5%    44.5%    −5.0%
Crossley Poll   49.9%    44.8%    −5.1%
Roper Poll      52.2%    37.1%    −15.1%
Actual          45.1%    49.5%    +4.4%
This polling disaster was immortalised by the following photograph of Truman holding up
a copy of the Chicago Daily Tribune. On the morning following the election, the Tribune
had decided to run with the frontpage headline “DEWEY DEFEATS TRUMAN”, even
before knowing the official result.
What explains this polling disaster? One explanation (among many — see cited report)
was that the pollsters used quota sampling.
“Interviewers were each given an assigned quota specifying the number of men, women,
and persons of various economic levels to be interviewed.” It was believed that this would
reduce “the biases that might result if interviewers chose their respondents freely” (p. 12,
id.)
However, other than being subject to these quotas, interviewers had complete freedom to
choose their respondents. They could thus operate as “biased” selecting devices (p. 84). For
example, an interviewer who was himself likely to vote Dewey might tend (unconsciously or
otherwise) to choose respondents who were themselves also likely to vote Dewey (p. 136ff).
Polling has improved since 1948. For example, pollsters have now moved away from quota
sampling. Nonetheless, polling disasters still occur occasionally. Here’s a recent example:
Alas, these polls got it terribly wrong. On the day of the referendum itself, the “Leave”
campaign won 52 to 48. (Note though that the pollsters, having learnt the lesson of 1948,
probably didn’t use quota sampling. So the use of quota sampling was probably not one of
the explanations for why the Brexit polling got it wrong.)
To be fair, pollsters were not the only ones who got it wrong. The financial markets
had also expected “Remain” to win. This was dramatically illustrated by the plunge in the
value of the British pound on the night of the referendum, as the results came in:
Example 600. We want to know the average weight µ of the 28,311 undergraduate
students enrolled at NUS. We decide to use a random sample of 100 individuals. The sample
mean weight X̄ will be our estimator for µ.
We have a complete list of all the students. We order them alphabetically from student #1
to student #28,311.
We then get every 250th student on the list, until we have 100 individuals. That is, we
get student #250, student #500, student #750, . . . , and finally student #25,000. This is
called systematic sampling.
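The selection rule just described can be sketched in a few lines of Python. The list `students` simply holds the student numbers in list order, and the function name `systematic_sample` is made up for illustration.

```python
students = list(range(1, 28312))     # student #1 to student #28,311, in list order

def systematic_sample(items, step, size):
    """Take every step-th item (the step-th, 2*step-th, ...) until we have `size` items."""
    return items[step - 1::step][:size]

chosen = systematic_sample(students, step=250, size=100)
print(chosen[:3], chosen[-1], len(chosen))
```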
The advantage of systematic sampling is that it takes away the freedom of interviewers to
choose their respondents. The sample is thus more likely to be truly random.
The disadvantage of systematic sampling is that it may, by coincidence, introduce a large
and systematic bias. Here’s an example.
As a result, we’ll probably obtain an overestimate of the true average household spending
in this HDB estate.
Here’s a quick sketch of how Null Hypothesis Significance Testing (NHST) works:
Example 602. A piece of equipment has probability θ of breaking down. We have many
pieces of the same type of equipment. Assume the rates of breakdown across the pieces of
equipment are identical and independent.
4. Write down a test statistic. In this case, an obvious test statistic is the sample number
of failures T = X1 + X2 + X3 + X4 + X5 . Our observed test statistic is thus t = x1 + x2 +
x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
5. Now ask, how likely is it that — if H0 were true — our test statistic would have been
“at least as extreme as” that actually observed? That is, what is the probability
In this case, the p-value is the probability of observing a random sample where 1 or fewer
pieces of equipment broke down, assuming H0 ∶ θ = 0.6 were true. That is,
p = P (T ≤ t = 1∣H0 ) .
Now, remember that T is a random variable. In fact, it’s a binomial random variable.
Assuming H0 to be true, we have T ∼ B (n, θ) = B (5, 0.6). Thus,
\[ p = P(T \le 1 \mid H_0) = P(T = 0 \mid H_0) + P(T = 1 \mid H_0) = \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 = 0.08704. \]
This says that if H0 were true, then the probability of observing a test statistic as extreme
as the one we actually observed is only 0.08704. We might interpret this relatively small
p-value as casting doubt on or providing evidence against H0 .
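The same p-value can be computed with a short Python sketch. The helper name `binom_pmf` is made up; it's just the binomial probability formula.

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(T = k) for T ~ B(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

n, theta0, t = 5, 0.6, 1
# "At least as extreme" here means 1 or fewer breakdowns: P(T <= 1 | H0).
p_value = sum(binom_pmf(k, n, theta0) for k in range(t + 1))
print(round(p_value, 5))
```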
1. Null hypothesis H0 (e.g. “this equipment has probability 0.6 of breaking down”).
2. Alternative hypothesis HA (e.g. “this equipment has probability less than 0.6 of
breaking down”). The test is either one-tailed or two-tailed, depending on HA .
4. A test statistic T (which simply maps each observed random sample to a real number.)
5. The p-value of the observed sample. This is the probability that — assuming H0 were
true — T takes on values that are at least “as extreme as” the actual observed test
statistic t.
In particular, if p < α, then we say that we reject H0 at the significance level α. And
if p ≥ α, then we say that we fail to reject H0 at the significance level α.
Note importantly that to reject H0 (at some significance level α) does NOT mean that H0
is false and HA is true. Similarly, failure to reject H0 does NOT mean that H0 is true and
HA is false. More on this below.
Another example of NHST, now slightly more formally and carefully presented.
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
We pre-select α = 0.05 as our significance level. This is the arbitrary threshold at which
we’ll say we reject (or fail to reject) H0 .
We gather a random sample of 100 votes: (X1 , X2 , . . . , X100 ). Our test statistic is the
number of votes in favour of Dr. Chee, given by
T = X1 + X2 + ⋅ ⋅ ⋅ + X100 .
Suppose that in our observed random sample (x1 , x2 , . . . , x100 ), we find that 39 are in favour
of Dr. Chee. Our observed test statistic is thus t = 39.
We now ask: What is the probability that — assuming H0 were true — T takes on values
that are at least “as extreme as” the actual observed test statistic t? That is, what is the
p-value of the observed sample?
Now, assuming H0 were true, T is a binomial random variable with parameters 100 and
0.3. That is, T ∼ B (n, p) = B (100, 0.3). So:
And since p ≈ 0.03398 < α = 0.05, we can also say that we reject H0 at the α = 0.05
significance level.
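Here's a minimal Python sketch of the p-value calculation for this example, assuming (as in the text) T ∼ B(100, 0.3) under H0 and an observed test statistic t = 39. Since HA is that µ > 0.3, "at least as extreme" means T ≥ 39. The helper name `binom_pmf` is made up.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(T = k) for T ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, mu0, t, alpha = 100, 0.3, 39, 0.05
p_value = sum(binom_pmf(k, n, mu0) for k in range(t, n + 1))   # P(T >= 39 | H0)
print(round(p_value, 5), "reject H0" if p_value < alpha else "fail to reject H0")
```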
When performing NHST, we will assiduously avoid saying things like “H0 is true”, “H0 is
false”, “HA is true”, or “HA is false”. Instead, we will stick strictly to saying either “we
reject H0 at the significance level α” or “we fail to reject H0 at the significance level α”.
Each of these two statements has a very precise meaning. The first says that p < α. The
second says that p ≥ α. Nothing more and nothing less.
Exercise 262. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level,
whether the coin is biased towards heads. (Answer on p. 1180.)
In the previous section, all the NHST we did were one-tailed tests.82 For example, in the
NHST done for Dr. Chee, we had
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
This was a one-tailed test because the alternative hypothesis HA was that µ was to the
right of 0.3.
If instead we changed the alternative hypothesis to:
H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.
Then this would be called a two-tailed test, because the alternative hypothesis HA is that
µ is either to the left or to the right of 0.3.
We now repeat the examples done in the previous section, but with HA tweaked so that we
instead have two-tailed tests. The difference is that the p-value is calculated differently.
82
By the way, the more common convention is to say “one-tailed” and “two-tailed” tests, rather than “one-tail” and “two-
tail” tests, as is the norm in Singapore (similar to those “Close for break” signs you sometimes see). But after some
consultation with my grammatical experts, I have been told that both are equally correct.
H0 ∶ θ = 0.6,
HA ∶ θ ≠ 0.6.
Say we observe the same random sample as before: (x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 1, 0).
The difference now is how the p-value (of the observed sample) is calculated. In words, the
p-value gives the likelihood that our test statistic is “at least as extreme as” that actually
observed — assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≤ t = 1.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean both
the event T ≤ t = 1 and the event that T is as far away on the other side of E [T ∣H0 ] = 3.
The second event is, specifically, T ≥ 5. Altogether then, the p-value is given by
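\[ p = P(T \le 1 \text{ or } T \ge 5 \mid H_0) = P(T \le 1 \mid H_0) + P(T = 5 \mid H_0) = 0.08704 + 0.6^5 = 0.08704 + 0.07776 = 0.1648. \]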
Since p = 0.1648 ≥ α = 0.1, we say that we fail to reject H0 at the α = 0.1 significance
level.
Observe that previously, under the one-tailed test, we could reject H0 at the α = 0.1
significance level, because there p = 0.08704. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
In general, all else equal, the p-value for an observed random sample is greater under a
two-tailed test than under a one-tailed test. Thus, under a two-tailed test, we are less
likely to reject H0 .
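The two-tailed rule used above (count every outcome at least as far from E[T∣H0] as the observed t, on either side) can be sketched in Python as follows. The function name `two_tailed_p` is made up, and the tiny tolerance `1e-9` is only there to guard against floating-point rounding.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(T = k) for T ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_tailed_p(t, n, p0):
    """P(T is at least as far from n*p0 as t, on either side | H0)."""
    distance = abs(t - n * p0)
    return sum(binom_pmf(k, n, p0) for k in range(n + 1)
               if abs(k - n * p0) >= distance - 1e-9)

print(round(two_tailed_p(1, 5, 0.6), 4))   # the equipment example
```

For the equipment example, the outcomes counted are T = 0, 1, and 5.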
H0 ∶µ = 0.3,
HA ∶µ ≠ 0.3.
Say we observe the same random sample as before: (x1 , x2 , . . . , x100 ), in which 39 votes were
in favour of Dr. Chee. So again our observed test statistic is t = x1 + x2 + ⋅ ⋅ ⋅ + x100 = 39.
The difference now is how the p-value (of the observed sample) is calculated. In words, the
p-value gives the likelihood that our test statistic is “at least as extreme as” that actually
observed — assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≥ t = 39.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean both
the event T ≥ t = 39 and the event that T is as far away on the other side of E [T ∣H0 ] = 30.
The second event is, specifically, T ≤ 21. Altogether then, the p-value is given by
Since p = 0.06281 ≥ α = 0.05, we say that we fail to reject H0 at the α = 0.05 significance
level.
Again observe that previously, under the one-tailed test, we could reject H0 at the α = 0.05
significance level, because there p = 0.03398. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
Exercise 263. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level,
whether the coin is biased. (Answer on p. 1180.)
However, NHST is widely misunderstood, misinterpreted, and misused even within scientific
communities. It has long been heavily criticised. In March 2016, the American Statistical
Association even issued an official policy statement on how NHST should be used!
p = P (D∣H0 ) ,
where D stands for the observed data and H0 stands for the null hypothesis. The p-value
answers the following question: — assuming H0 were true, what’s the probability that we’d
get data “at least as extreme” as those actually observed (D)?
Say we get a p-value of 0.03. We should then say simply that
However, instead of merely saying the above, some researchers may instead conclude that:
Do you see the error here? The researcher has gone from the finding that p = P (D∣H0 ) = 0.03
to the conclusion that P (H0 ∣D) = 0.03. This is precisely the Conditional Probability Fallacy
(CPF), which we discussed at length in subsection 56.1.
The error is the same as leaping from “A lottery ticket buyer who doesn’t cheat has a small
probability q of winning” to “Jane bought a lottery ticket and won. Therefore, there is only
probability q that she didn’t cheat.”
The p-value is NOT the probability that H0 is true.83 Instead, it is the probability that
— assuming H0 were true — we would have gotten data “at least as extreme” as those
actually observed. This is an important difference. But it is also a subtle one, which is why
even researchers get confused.
83
Indeed, under the objectivist view, such a statement is nonsensical anyway, because H0 is either true or not true; it makes
no sense to talk probabilistically about whether H0 is true.
Example 603. On the night of the 2016 Bukit Batok SMC By-Election, the Elections
Department announced* that based on a sample count of 900 ballots,
What does the above gobbledygook mean? Let µ be the true proportion of votes won by
Dr. Chee. Let X̄ be the sample proportion and x̄ be the observed sample proportion.
It’s clear enough what the 39% means — they randomly counted 900 ballots and found
(after accounting for any spoilt votes) that x̄ = 39% were in favour of Dr. Chee.
What’s less clear is what the 95% confidence level and ±4% margin of error mean.
Here are three possible interpretations of what is meant. Only one is correct.
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between 0.35 and 0.43.
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between µ − 0.04 and µ + 0.04.
Take a moment to understand what each of the above interpretations says. Then decide
which you think is the correct interpretation, before turning to the next page.
Unfortunately, the correct interpretation is also the one that says the least. It is Interpre-
tation #3 — “with probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04)”.
This interpretation says merely that if we were somehow able to repeatedly observe random
samples of size 900, then we’d find that 0.95 of the corresponding observed sample means
will be in (µ − 0.04, µ + 0.04). Which isn’t saying much, because first of all, we have only one
observed random sample; we do not get to repeatedly observe random samples. Secondly,
this still doesn’t tell us much about µ, which is what we’re really interested in.
The correct interpretation (Interpretation #3) is the least interesting interpretation. Per-
haps this explains why journalists often prefer to give an incorrect interpretation.
*E.g. the article “Margin of Ignorance” (backup) begins by reporting poll results that Kerry-Edwards was supported by 51%
of voters, while Bush-Cheney was supported by 45%. The author then ridicules other journalists for their misinterpretation
of these data. (He also claims, incorrectly, that polling is based on the Central Limit Theorem.) He then triumphantly
gives the “correct” explanation: “95 times out of 100 the true Kerry-Edwards number will fall between 47 and 55 and the
Bush-Cheney number will fall between 41 and 49.” This, of course, is what we called incorrect Interpretation #1 above.
See section 89.9 in the Appendices for a discussion of where the Elections Department’s
±4% margin of error comes from.
Example 604. On the night of the 2016 Bukit Batok SMC By-Election, a website called
Mothership.sg wrote:
“Based on the sample count of 100 votes,* it was revealed at 9.26pm that the SDP Sec-Gen
received 39 percent of votes. In other words, Chee would score 35 per cent in the worst
case scenario and 43 per cent in the best case scenario.”
This is the most absurd misinterpretation of the margin of error I have ever seen.**
Let’s see what the correct worst- and best-case scenarios are.
Suppose that in the observed random sample of 900 votes, exactly 39% or 0.39 × 900 = 351
were votes for Dr. Chee and the remaining 549 were for PAP Guy. Then:
• Worst-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of Dr. Chee. That is, Dr. Chee won only 351 votes
and PAP Guy won the remaining 23,570 − 351 = 23,219 votes. So the correct worst-case
scenario is that Dr. Chee won ≈ 1.5% of the votes.
• Best-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of PAP Guy. That is, PAP Guy won only 549 votes
and Dr. Chee won the remaining 23,570 − 549 = 23,021 votes. So the correct best-case
scenario is that Dr. Chee won ≈ 97.7% of the votes.
These worst- and best-case scenarios are admittedly unlikely. Nonetheless, they are pos-
sible scenarios all the same. The journalist’s purported worst- and best-case scenarios are
completely wrong.
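The arithmetic behind these two scenarios can be checked with a few lines of Python (the variable names are made up):

```python
total_votes = 23570
sample_size = 900
sample_for_chee = 351                              # 39% of the 900 sampled ballots
sample_for_other = sample_size - sample_for_chee   # the remaining 549

# Worst case: Dr. Chee's only votes are the 351 in the sample.
worst = sample_for_chee / total_votes
# Best case: every ballot outside the sample is for Dr. Chee.
best = (total_votes - sample_for_other) / total_votes
print(f"{worst:.1%} to {best:.1%}")
```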
*By the way, even this basic fact was wrong. The sample count was not 100 votes. Instead, it was 900 votes, consisting of
100 votes from each of 9 polling stations.
Moreover, the Mothership.sg journalist failed to report the confidence level of 95%, either because he didn’t know what it
meant or because he didn’t think it important. But it is important. It is pointless to inform the reader about the margin of
error without also specifying the confidence level.
**You can find several misinterpretations of the margin of error collected in this academic paper: “Erring in the Margin of
Error”. None is as absurdly bad as the one here.
SYLLABUS ALERT
This is new to the 9758 (revised) syllabus. So skip this section if you’re taking 9740 (old).
Informally, the critical region is the set of values of the observed test statistic t for which
we would reject the null hypothesis. The critical region is thus sometimes also called the
rejection region.
And the critical value(s) is (are) the exact value(s) of the observed test statistic t at
which we are just able to reject the null hypothesis.
Say that as before, we have a one-tailed test where the two competing hypotheses are:
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
Say that as before, in our observed random sample of 100 votes, 39 are in favour of Dr.
Chee, so that our observed test statistic is t = 39.
We calculated that the corresponding p-value is 0.03398 and so we were able to reject H0
at the α = 0.05 significance level.
We now calculate the critical region and the critical value. We can calculate that if
t = 38, then the corresponding p-value is ≈ 0.053 (you should verify this for yourself). And
so we would be unable to reject H0 .
We thus conclude that the critical value is 39, because this is the value of t at which we
are just able to reject H0 .
And the critical region is the set {39, 40, 41, . . . , 100}. These are the values at which we’d
be able to reject H0 at the α = 0.05 significance level.
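The search for the critical value can be sketched in Python. This is only a minimal sketch, assuming (as in the example) that under H0 the test statistic T is binomial with n = 100 and success probability 0.3. The function names upper_tail and critical_value_upper are made up for illustration.

```python
from math import comb

def upper_tail(t, n, p):
    """P(T >= t) when T ~ Binomial(n, p): the one-tailed p-value."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def critical_value_upper(n, p, alpha):
    """Smallest t whose one-tailed p-value is at most alpha."""
    for t in range(n + 1):
        if upper_tail(t, n, p) <= alpha:
            return t
    return None  # no value of t leads to rejection

n, p0, alpha = 100, 0.3, 0.05
c = critical_value_upper(n, p0, alpha)
print("critical value:", c)
print("p-value at t = 38:", round(upper_tail(38, n, p0), 3))
print("p-value at t = 39:", round(upper_tail(39, n, p0), 5))
```

The critical region is then every t from the critical value up to n.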
Say that as before, we have a two-tailed test where the two competing hypotheses are:
H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.
The significance level is again α = 0.05. Again, the observed random sample of 100 votes
contains 39 in favour of Dr. Chee, so that our observed test statistic is t = 39.
We calculate that if t = 40, then the corresponding p-value is ≈ 0.03745 (you should verify
this for yourself). Thus, the critical values are 20 and 40, because these are the values of t
at which we are just able to reject H0 .
The critical region is the set {0, 1, . . . , 20, 40, 41, . . . , 100}. These are the values at which
we’d be able to reject H0 at the α = 0.05 significance level.
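The two-tailed case can be sketched the same way. The sketch below assumes that "at least as extreme" means "at least as far from the mean np = 30 as the observed t". That reading is an assumption, chosen because it reproduces the p-value of about 0.03745 quoted for t = 40. The names pmf and two_tailed_p are made up for illustration.

```python
from math import comb

def pmf(k, n, p):
    """P(T = k) when T ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_tailed_p(t, n, p):
    """Total probability of outcomes at least as far from the mean n*p as t."""
    centre = n * p
    gap = abs(t - centre)
    # The small tolerance guards against floating-point error in n*p.
    return sum(pmf(k, n, p) for k in range(n + 1)
               if abs(k - centre) >= gap - 1e-9)

n, p0, alpha = 100, 0.3, 0.05
region = [t for t in range(n + 1) if two_tailed_p(t, n, p0) <= alpha]
lower = [t for t in region if t < n * p0]
upper = [t for t in region if t > n * p0]
print("critical values:", max(lower), "and", min(upper))
```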
Exercise 264. (Answer on p. 1181.) We flip a coin 20 times. What are the critical region
and critical value(s) in
(a) A test, at the 5% significance level, of whether the coin is biased towards heads.
(b) A test, at the 5% significance level, of whether the coin is biased.
Example 605. The weight (in mg) of a grain of sand is X ∼ N (µ, 9). Our unknown
parameter of interest is the true population mean µ (i.e. the true average weight of a grain
of sand). Our “guess” is that µ = 5. We thus write down two competing hypotheses:
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
We take a random sample of size 4 — (X1 , X2 , X3 , X4 ). Our test statistic is the sample
mean X̄ = (X1 + X2 + X3 + X4 ) /4.
Our observed random sample is (x1 , x2 , x3 , x4 ) = (3, 9, 11, 7). That is, we randomly pick
four grains of sand that happen to have weights 3, 9, 11, and 7 mg. Then the observed test
statistic is
x̄ = (3 + 9 + 11 + 7)/4 = 7.5.
The p-value is the probability that the test statistic X̄ takes on values “at least as extreme
as” our observed test statistic x̄ = 7.5, assuming H0 ∶ µ = 5 were true. Note that if H0 were
true, then X̄ ∼ N (µ, σ 2 /n) = N (5, 9/4). Thus, the p-value is given by
p = P (Z ≥ (7.5 − 5)/√(9/4)) + P (Z ≤ (2.5 − 5)/√(9/4)) ≈ 0.04779 + 0.04779 = 0.09558.
Thus, we reject H0 at the α = 0.1 significance level. However, we would fail to reject H0 at
the α = 0.05 significance level.
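The calculation in Example 605 can be checked with Python's standard library. This is a minimal sketch, not the calculator method. The function name z_test_two_tailed is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_two_tailed(xbar, mu0, sigma2, n):
    """Two-tailed p-value for H0: mu = mu0 when the population variance is known."""
    z = (xbar - mu0) / sqrt(sigma2 / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

sample = [3, 9, 11, 7]
p = z_test_two_tailed(mean(sample), mu0=5, sigma2=9, n=len(sample))
print(round(p, 5))
for alpha in (0.1, 0.05):
    print(alpha, "reject H0" if p <= alpha else "fail to reject H0")
```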
Sample size n   Distribution of X   σ²        Test
Any             Normal              Known     Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large           Any                 Known     Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large           Any                 Unknown   Z-test: (X̄ − µ)/(s/√n) ∼ N(0, 1).
Exercise 265. The Singapore daily high temperature (in °C) can be modelled by X ∼
N (µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the
true average daily high temperature). Your friend guesses that µ = 34. You gather
the following data on daily high temperatures, of 10 randomly-chosen days in 2015:
(35, 35, 31, 32, 33, 34, 31, 34, 35, 34). Test your friend’s hypothesis, at the α = 0.05 signifi-
cance level. (Be sure to write down your null and alternative hypotheses.) (Answer on p.
1182.)
We’ll recycle the same example from the previous section. Before, we knew that X was
normally distributed. Now the big difference is that we have absolutely no idea what
distribution X comes from!
To compensate, we require also that our random sample is “large enough”, so that the
CLT-approximation can be used.
Example 606. The weight (in mg) of a grain of sand is X ∼ (µ, 9). (This says simply that
X is distributed with mean µ and variance 9.) Our unknown parameter of interest is the
true population mean µ (i.e. the true average weight of a grain of sand). Again, we “guess”
that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
This time, we’ll take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test
statistic is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Recall the magic of the CLT. Even if we have absolutely no idea what distribution X
is drawn from, provided n is sufficiently large, X̄ is approximately normally distributed. So here,
since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately,
the normal distribution N (µ, σ 2 /n). So, if H0 were true, then we have, approximately,
X̄ ∼ N (µ, σ 2 /n) = N (5, 9/100).
Say the observed sample mean is
x̄ = (x1 + x2 + ⋅ ⋅ ⋅ + x100)/100 = 5.6.
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by
p = P (Z ≥ (5.6 − 5)/√(9/100)) + P (Z ≤ (4.4 − 5)/√(9/100)) = P (Z ≥ 2) + P (Z ≤ −2) ≈ 0.0455.
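The CLT claim used in this example can be explored with a small simulation. The sketch below is purely illustrative. It assumes an exponential distribution with mean 1 and variance 1 (our choice, not taken from the example) and checks how often the mean of n = 100 draws lands within 1.96 standard errors of the true mean. If X̄ is approximately normal, this should happen about 95% of the time.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Exponential distribution with rate 1: mean 1, variance 1, heavily skewed.
    return sum(random.expovariate(1.0) for _ in range(n)) / n

n, trials = 100, 2000
mu, sigma = 1.0, 1.0
half_width = 1.96 * sigma / sqrt(n)
inside = sum(abs(sample_mean(n) - mu) <= half_width for _ in range(trials))
coverage = inside / trials
print(coverage)  # expected to be close to 0.95
```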
Exercise 266. The Singapore daily high temperature (in °C) can be modelled by X ∼
(µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the true
average daily high temperature). Your friend guesses that µ = 34. You gather the data
on daily high temperatures, of 100 randomly-chosen days in 2015 and find the observed
sample average temperature to be 33.4 °C. Test your friend’s hypothesis, at the α = 0.05
significance level. (Be sure to write down your null and alternative hypotheses. Also, clearly
state where you use the CLT.) (Answer on p. 1182.)
We’ll recycle the same example from the previous section. Again, we have absolutely no
idea what distribution X comes from. And again, the random sample is large enough, so
that the CLT can be used.
But now, σ 2 is unknown. This turns out to be no big deal. We can simply replace σ 2
with the observed unbiased sample variance s2 , and do the same thing as before.
Example 607. The weight (in mg) of a grain of sand is X ∼ (µ, σ 2 ). (This says simply
that X is distributed with mean µ and variance σ 2 .) Our unknown parameter of interest
is the true population mean µ (i.e. the true average weight of a grain of sand). Again, we
“guess” that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
Again, we take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test statistic
is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Again, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately,
the normal distribution N (µ, σ²/n). So, if H0 were true, then we have, approximately,
X̄ ∼ N (µ, σ²/n) = N (5, σ²/100). Since the sample variance S² is an unbiased estimator for
σ², it is plausible that we also have, approximately, X̄ ∼ N (µ, s²/n) = N (5, s²/100), where
s² is the observed sample variance.
Say the observed sample mean and observed sample variance we get are:
x̄ = (x1 + x2 + ⋅ ⋅ ⋅ + x100)/100 = 5.6 and s² = ∑ (xi − x̄)²/(n − 1) = 8,
where the sum runs over i = 1, . . . , 100.
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
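p-value is given by
p = P (Z ≥ (5.6 − 5)/√(8/100)) + P (Z ≤ (4.4 − 5)/√(8/100)) ≈ P (Z ≥ 2.121) + P (Z ≤ −2.121) ≈ 0.0339.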
Exercise 267. The Singapore daily high temperature (in °C) can be modelled by X ∼
(µ, σ 2 ). Our unknown parameter of interest is the true population mean µ (i.e. the true
average daily high temperature). Your friend guesses that µ = 34. You gather the data
on daily high temperatures, of 100 randomly-chosen days in 2015. Your observed sample
mean temperature is 33.4 °C and your observed sample variance is 11.2 °C2 . Test your
friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and
alternative hypotheses. Also, clearly state where you use the CLT.) (Answer on p. 1183.)
SYLLABUS ALERT
This is in the 9740 (old) syllabus, but not in the 9758 (revised) syllabus. So you can skip
this section if you’re taking 9758.
We’ll recycle the same example from section 72.5. The big difference is that now σ 2 (the
variance of X) is unknown:
Example 608. The weight (in mg) of a grain of sand is X ∼ N (µ, σ 2 ). Our unknown
parameter of interest is the true population mean µ (i.e. the true average weight of a grain
of sand). Again, we “guess” that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
This time, we have resources only to take a random sample of size 4 — (X1 , X2 , X3 , X4 ).
Suppose our observed random sample is (x1 , x2 , x3 , x4 ) = (3, 9, 11, 7). That is, we randomly
pick four grains of sand with weights 3, 9, 11, and 7 mg. Then the observed test statistic is
x̄ = (3 + 9 + 11 + 7)/4 = 7.5.
Note that if H0 were true, then X̄ ∼ N (µ, σ 2 /n) = N (5, σ 2 /4). It is however no good if we
do not know what σ 2 is. Here’s an idea. Why not we use the unbiased sample variance s2
in place of σ 2 ? We can easily calculate s2 :
s² = ∑ (xi − x̄)²/(n − 1) = [(3 − 7.5)² + (9 − 7.5)² + (11 − 7.5)² + (7 − 7.5)²]/(4 − 1) = 35/3.
Wonderful. Can we now conclude, as we did in the previous section, that since the sample
variance S² is an unbiased estimator for σ², we have, approximately, X̄ ∼ N (µ, s²/n) =
N (5, (35/3)/4)? Sadly, we cannot do so here in the small sample case. We cannot simply
replace the unknown true variance σ² with the unbiased sample variance s² and say that
we have a good approximate distribution.
Instead, it turns out that the random variable (X̄ − µ)/(s/√n) has Student’s t-distribution with
n − 1 degrees of freedom (the proof of this fact is omitted from this textbook). Equivalently,
we may write (X̄ − µ)/(s/√n) = Tn−1 . Just like with the normal distribution, we can use our
graphing calculator to calculate probabilities associated with the t-distribution. (There are
also tables we can use.)
Tν is the random variable with Student’s t-distribution with ν degrees of freedom. Tν has a
rather complicated PDF that we shall relegate to the Appendices (see Definition 170). All
you need know is that
• Tν has mean 0.
• Its PDF looks very similar to that of the standard normal distribution, except it is
shorter and fatter.
• As ν → ∞, its PDF looks taller and thinner, until eventually it coincides exactly with Z.
Observe that previously, when σ 2 was known, we had p = 0.09558 and we were able to reject
H0 at the α = 0.1 significance level. Now in contrast, when σ 2 is unknown, the p-value is
much larger (p ≈ 0.2394) and we are unable to reject H0 at the same significance level.
(This observation holds in general. All else equal, it is harder to reject H0 when running a
t-test than when running a Z-test.)
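The p-value of about 0.2394 quoted above can be reproduced without a graphing calculator. The sketch below is one possible approach, not the textbook's method. It assumes the standard PDF of the t-distribution and integrates it numerically with Simpson's rule, since Python's standard library has no t-distribution. The function names t_pdf and t_two_tailed_p are made up for illustration.

```python
from math import gamma, pi, sqrt
from statistics import mean, variance

def t_pdf(x, nu):
    """PDF of Student's t-distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_two_tailed_p(t, nu, steps=2000):
    """P(|T_nu| >= |t|), integrating the PDF over [0, |t|] by Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 1 - 2 * (total * h / 3)

sample = [3, 9, 11, 7]
n = len(sample)
s2 = variance(sample)                      # unbiased sample variance, 35/3
t_obs = (mean(sample) - 5) / sqrt(s2 / n)  # observed t-statistic under H0: mu = 5
p = t_two_tailed_p(t_obs, n - 1)
print(round(t_obs, 4), round(p, 4))
```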
SYLLABUS ALERT
This is new to the 9758 (revised) syllabus. So skip this section if you’re taking 9740 (old).
Example 609. We flip a coin 100 times. We get 100 heads. What can we say about the
coin?
This is an open-ended question, to which there can be many different answers. Here’s the
answer we’re taught to give for H2 Maths:
H0 ∶ µ = 0.5,
HA ∶ µ ≠ 0.5.
Our test statistic T is the number of heads (out of 100 coin-flips). Our observed test
statistic t is 100. The corresponding p-value (note that this is a two-tailed test) is
p = P (T ≥ 100) + P (T ≤ 0) = 2 × 0.5^100 ≈ 1.58 × 10^−30.
We note also that we can easily reject H0 at any of the conventional significance levels
(α = 0.1, α = 0.05, or α = 0.01).
Exercise 269. (Answer on p. 1184.) We observe the weights (in kg) of a random sample
of 50 Singaporeans: (x1 , x2 , . . . , x50 ). We observe that ∑ xi /50 = 68 and ∑ xi² /50 = 5000.
A friend claims that the average American is heavier than the average Singaporean. It
is known that the average American weighs 75 kg. Is your friend correct? If you make
any assumptions or approximations, make clear exactly where you do so. (Hint: Use Fact
81(a)).
In this chapter, we’ll be interested in the relationship between two sets of data.
Example 610. We measure the heights and weights of 10 adult male Singaporeans. Their
heights (in cm) and weights (in kg) are given in this table:
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
We call (hi , wi ) observation i. So for example, observation 5 is (178, 72) and observation
9 is (150, 44).
We can plot a scatter diagram of these 10 persons’ weights (vertical axis) against their
heights (horizontal).
[Figure: scatter diagram of weight (kg) against height (cm), with the line of best fit shown as a black dotted line.]
The black dotted line is called a line of best fit. Shortly (section 73.4), we’ll learn how
to construct this line of best fit.
The more closely the data points in the above scatter diagram lie to a straight line, the more
strongly linearly-correlated are weight and height. So here with these particular data,
the linear correlation between weight and height seems strong. In the next section, we’ll
learn about the product moment correlation coefficient, which is a way to precisely
quantify the degree to which two sets of data are linearly-correlated.
Because the line of best fit is upward-sloping, we can also say that the linear correlation is
positive.
i         1     2     3     4    ...   361
ti (°C)   27.3  29.5  31.1  32   ...   30.2
pi (mm)   0     0.2   0     0    ...   12.4
[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with the line of best fit shown as a black dotted line.]
Again, the black dotted line is a line of best fit. The data points do not seem close to this
line. Thus, it seems that the linear correlation between temperature and rainfall is weak.
The line of best fit is downward-sloping and so we say that the linear correlation is negative.
Exercise 270. (Answer on p. 1185.) The table below shows the prices charged (p) and
the number of haircuts (q) given by 5 different barbers, during June 2016.
Draw a scatter diagram with price on the horizontal axis. Plot also what you think looks
like a line of best fit.
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
In the previous section, we used a scatter diagram to determine if there was a plausible
linear relationship between two sets of data. This, though, was a very crude method.
A more precise measure of the degree to which two sets of data are linearly correlated is
called the product moment correlation coefficient (PMCC). Formally:
Definition 134. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of real numbers.
The product moment correlation coefficient (PMCC) is the following real number:
r = ∑ (xi − x̄)(yi − ȳ) / [√(∑ (xi − x̄)²) √(∑ (yi − ȳ)²)].
1. −1 ≤ r ≤ 1. (Surprisingly, this can be proven using vectors: Fact 105 in the Appendices.)
2. We say the linear correlation is positive if r > 0 and negative if r < 0.
3. If r = 1, the linear correlation is positive and perfect.
8. r is merely a measure of linear correlation and nothing else. Two variables may be very
closely related but not linearly-correlated. For example, data generated by the quadratic
model yi = xi² may have a very low r.
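The last point can be illustrated with a short sketch. It assumes the usual PMCC formula, r = ∑(xi − x̄)(yi − ȳ) / [√∑(xi − x̄)² √∑(yi − ȳ)²]. The data are made up: x-values placed symmetrically about 0, with y equal to the square of x.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
r_quadratic = pmcc(xs, [x**2 for x in xs])     # y is completely determined by x
r_linear = pmcc(xs, [2 * x + 1 for x in xs])   # an exactly linear relationship
print(r_quadratic, r_linear)
```

Here the quadratic data give r = 0 even though y is an exact function of x, while the linear data give r = 1.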
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
[Figure: scatter diagram of weight (kg) against height (cm), as before.]
h̄ = (182 + 165 + 173 + 155 + 178 + 174 + 169 + 160 + 150 + 190)/10 = 169.6,
w̄ = (81 + 70 + 71 + 53 + 72 + 75 + 69 + 60 + 44 + 80)/10 = 67.5,
∑ (hi − h̄)(wi − w̄) = (182 − h̄)(81 − w̄) + ⋅ ⋅ ⋅ + (190 − h̄)(80 − w̄) = 1237,
√[∑ (hi − h̄)²] = √[(182 − 169.6)² + ⋅ ⋅ ⋅ + (190 − 169.6)²] ≈ 37.180640,
√[∑ (wi − w̄)²] = √[(81 − 67.5)² + ⋅ ⋅ ⋅ + (80 − 67.5)²] ≈ 35.418922,
where each sum runs over i = 1, . . . , 10. Thus, r ≈ 1237/(37.180640 × 35.418922) ≈ 0.9393.
As expected, r > 0 (the linear correlation is positive or, equivalently, the line of best fit is
upward-sloping). Moreover, r is close to 1 (the linear correlation is very strong).
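The arithmetic for the height and weight data can be reproduced step by step. This is a minimal sketch that mirrors the hand calculation rather than the calculator method. The variable names are made up for illustration.

```python
from math import sqrt

h = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # heights (cm)
w = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # weights (kg)

n = len(h)
h_bar = sum(h) / n
w_bar = sum(w) / n
s_hw = sum((hi - h_bar) * (wi - w_bar) for hi, wi in zip(h, w))
root_s_hh = sqrt(sum((hi - h_bar) ** 2 for hi in h))
root_s_ww = sqrt(sum((wi - w_bar) ** 2 for wi in w))
r = s_hw / (root_s_hh * root_s_ww)

print(h_bar, w_bar, round(s_hw, 4))
print(round(root_s_hh, 6), round(root_s_ww, 6), round(r, 4))
```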
i         1     2     3     4    ...   361
ti (°C)   27.3  29.5  31.1  32   ...   30.2
pi (mm)   0     0.2   0     0    ...   12.4
[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), as before.]
r ≈ −0.1623.
As expected, r < 0 (the linear correlation is negative or, equivalently, the line of best fit is
downward-sloping). Moreover, r is fairly close to 0 (the linear correlation is weak).
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
Correlation does not imply causation. This saying has now become a cliché. Doesn’t make
it any less true.
[Figure: chart comparing spending on science (in $ billion) with the number of hanging suicides.]
The PMCC is r ≈ 0.99789126. So the two sets of data are almost perfectly linearly-
correlated. But of course, this doesn’t mean that spending on science causes suicides
or that suicides cause spending on science. More likely, the correlation is simply spurious.
A comic from xkcd:
Example 610 (continued from above). We suspect that the heights and weights of
adult male Singaporeans are linearly-correlated. We thus write down this linear model:
w = a + bh.
Recall the quote: “All models are wrong, but some are useful.” The model w = a + bh is
unlikely to be exactly correct. But hopefully it will be useful.
We recycle the data from earlier. These, along with the scatter diagram, are reproduced
for convenience.
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
[Figure: scatter diagram of weight (kg) against height (cm), with three candidate lines of best fit, one of them a black dotted line.]
The basic idea of linear regression is this: Find the line that “best fits” the given data.
Drawn in the figure above are three plausible candidates for the “line of best fit”. But there
can only be one line of best fit. Which is it?
At the end of the day, we’ll choose the black dotted line as “the” line of best fit. But why?
This will be answered in the next section.
p = a + bt.
Again, our goal is to get estimates for the unknown parameters a and b (do you expect b
to be positive or negative?).
i         1     2     3     4    ...   361
ti (°C)   27.3  29.5  31.1  32   ...   30.2
pi (mm)   0     0.2   0     0    ...   12.4
[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with several candidate lines of best fit, one of them a black dotted line.]
Again, drawn in the figure above are several plausible candidates for the “line of best fit”.
It turns out that the black dotted line will be “the” line of best fit.
There are different methods for determining “the” line of best fit. Different methods will, in
general, give different lines of best fit.
The method we’ll learn in H2 Maths is the most basic and most standard method. It is
called the method of ordinary least squares (OLS).
Let’s assume there is some true linear model, which may be written as y = a+bx. As always,
we stick to the objectivist interpretation. The parameters a and b have some true, fixed
values. However, they are unknown (and may forever be unknown).
Nonetheless, we’ll try to do our best and get estimates for a and b. These estimates will be
denoted â and b̂. And our line of best fit will then be y = â + b̂x.
How do we find this line of best fit? Intuitively, this will be the line to which the data
points are “as close as possible”. But there are many ways to define the term “as close
as possible”. For example, we could try to minimise the sum of the distances between the
points and the line. But we shall not do this.
Instead, we’ll use the method of OLS:
1. Measure the vertical distance of each data point (xi , yi ) from the line. This is called the
residual and is denoted ûi .
2. Our goal is to find the line y = â + b̂x that minimises ∑ ûi². This quantity is called the
Sum of Squared Residuals (SSR).
Example:
[Figure: scatter diagram of weight (kg) against height (cm), with a horizontal candidate line of best fit and the residuals marked.]
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 65 65 65 65 65 65 65 65 65 65
ûi = wi − ŵi (kg) 16 5 6 −12 7 10 4 −5 −21 15
The second last row of the above table gives, for each person with height hi , the correspond-
ing predicted weight ŵi (as per our candidate line of best fit). The residual ûi (last row) is
then defined as the vertical distance between the data point and the weight predicted by
the candidate line of best fit.
The SSR is ∑ ûi² = 16² + 5² + 6² + (−12)² + 7² + 10² + 4² + (−5)² + (−21)² + 15² = 1317,
where the sum runs over i = 1, . . . , 10.
Can we do better than this? That is, can we find another candidate line of best fit whose
SSR is smaller than 1317?
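The SSR of any candidate line can be computed mechanically. In the sketch below, the function name ssr is made up for illustration. The first call uses the horizontal candidate line w = 65 from the table above. The second candidate, w = −85 + 0.9h, is a hypothetical trial line chosen purely to show that a smaller SSR is possible.

```python
h = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # heights (cm)
w = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # weights (kg)

def ssr(a, b, xs, ys):
    """Sum of squared residuals of the candidate line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

ssr_horizontal = ssr(65, 0, h, w)   # horizontal candidate line w = 65
ssr_trial = ssr(-85, 0.9, h, w)     # hypothetical trial line w = -85 + 0.9h
print(ssr_horizontal, ssr_trial)
```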
Fact 85. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of data. The OLS
regression line of y on x is y − ȳ = b̂ (x − x̄), where
(i) b̂ = ∑ (xi − x̄)(yi − ȳ) / ∑ (xi − x̄)², or equivalently
(ii) b̂ = (∑ xi yi − n x̄ ȳ) / (∑ xi² − n x̄²).
Moreover, the regression line can also be written in the form y = â + b̂x, where b̂ is as given
above and â = ȳ − b̂x̄.
Proof. We want to find â and b̂ such that the line y = â + b̂x has the smallest SSR possible.
The residual ûi is defined as the vertical distance between (xi , yi ) and the line y = â + b̂x.
That is, ûi = yi − (â + b̂xi ).
Thus, the SSR is ∑ ûi² = ∑ [yi − (â + b̂xi )]².
We wish to minimise the SSR, by choosing appropriate values of â and b̂. This involves the
following pair of first order conditions:84
∂(∑ ûi²)/∂â = 0, ∂(∑ ûi²)/∂b̂ = 0.
The remainder of the proof simply involves taking derivatives and doing the algebra, and
is continued on p. 999 in the Appendices.
Remark 9. Whenever we simply say regression line or line of best fit, it may safely be
assumed that we are talking about the OLS regression line.
84 There’s a bit of hand-waving here.
h̄ = 169.6, w̄ = 67.5, ∑ (hi − h̄)² = 1382.4, ∑ (hi − h̄)(wi − w̄) = 1237,
where each sum runs over i = 1, . . . , n. So b̂ = 1237/1382.4 ≈ 0.8948 and â = w̄ − b̂h̄ ≈ −84.26.
Thus, the regression line is w − 67.5 = 0.8948 (h − 169.6) or w = â + b̂h = −84.26 + 0.8948h.
[Figure: scatter diagram of weight (kg) against height (cm), with the OLS line of best fit and the residuals marked.]
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 78.6 63.4 70.5 54.4 75.0 71.4 67.0 58.9 50.0 85.8
ûi = wi − ŵi (kg) 2.4 6.6 0.5 −1.4 −3.0 3.6 2.0 1.1 −6.0 −5.8
The SSR for the actual line of best fit is ∑ ûi² = 2.4² + ⋅ ⋅ ⋅ + (−5.8)² ≈ 147.6 (summing over
i = 1, . . . , 10). This is much better than the SSR of 1317 that we found for the previous
candidate line of best fit, which was simply a horizontal line.
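The OLS estimates and the resulting SSR can be computed directly from the data. This is a minimal sketch, assuming the formula b̂ = ∑(xi − x̄)(yi − ȳ)/∑(xi − x̄)² together with â = ȳ − b̂x̄. The function name ols is made up for illustration.

```python
h = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # heights (cm)
w = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # weights (kg)

def ols(xs, ys):
    """Return (a_hat, b_hat) for the OLS regression line y = a_hat + b_hat*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b_hat = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - b_hat * xbar, b_hat

a_hat, b_hat = ols(h, w)
ssr_ols = sum((wi - (a_hat + b_hat * hi)) ** 2 for hi, wi in zip(h, w))
print(round(a_hat, 2), round(b_hat, 4), round(ssr_ols, 1))
```

Because the code keeps full precision instead of the rounded residuals in the table, its SSR may differ from a hand calculation in the later decimal places.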
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
q̂i
ûi = qi − q̂i
Example 612. We’ll find the PMCC and the regression line for these data:
i 1 2 3 4 5
xi 1 7 3 11 8
yi 14 5 6 4 4
2. Press the blue 2ND button and then CATALOG (which corresponds to the 0 button).
This brings up the CATALOG menu.
3. Using the down arrow key ∨ , scroll down until the cursor is on DiagnosticOn.
4. Press ENTER once. And press ENTER a second time. The TI84 now says “DONE”,
telling you that the Diagnostic option has been turned on.
The above steps need only be performed once. Unless of course you’ve just reset your
calculator (as is required before each exam). In which case you have to go through the
above steps again.
(Be careful to note that the TI84 uses the symbol “a” for the coefficient for x, whereas in
the A-level List of Formulae, they use b instead. Don’t get these mixed up!)
[Figure: TI84 screenshots after Steps 9, 10, 11, and 12.]
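As a cross-check on the calculator, the PMCC and regression line for the Example 612 data can also be computed by hand or in code. The sketch below assumes the "sums of products" form b̂ = (∑xiyi − n x̄ ȳ)/(∑xi² − n x̄²), with the analogous expression for the y-values. The variable names are made up for illustration.

```python
from math import sqrt

x = [1, 7, 3, 11, 8]
y = [14, 5, 6, 4, 4]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
sxx = sum(xi * xi for xi in x) - n * xbar**2
syy = sum(yi * yi for yi in y) - n * ybar**2

b_hat = sxy / sxx               # coefficient of x (the TI84 calls this "a")
a_hat = ybar - b_hat * xbar     # intercept
r = sxy / sqrt(sxx * syy)
print(round(r, 4), round(a_hat, 4), round(b_hat, 4))
```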
Exercise 273. Using your TI84, find the PMCC between q and p, and also find the regres-
sion line of q on p (see data below). Verify that your answer for this exercise is the same
as those in the last two exercises. (Answer on p. 1187.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
Given any value of x, we call the corresponding ŷ = b̂ (x − x̄) + ȳ the fitted value or the
predicted value. One use of the regression line is that it can help us predict (or “guess”)
the value of y, even for x for which we have no data.
Example 610 (height and weight example revisited). Say we want to guess the weight
of an adult male Singaporean who is 185 cm tall. Using our regression line, we predict that
his weight is ŵh=185 = 0.8948 × 185 − 84.26 ≈ 81.3 kg. This is called interpolation, because
we are predicting the weight of a person whose height is between two of our observations.
Say instead we want to guess the weight of an adult male Singaporean who is 210 cm tall.
Using our regression line, we predict that his weight is ŵh=210 = 0.8948 × 210 − 84.26 ≈ 103.6
kg. This is called extrapolation, because we are predicting the weight of a person whose
height is beyond our rightmost observation.
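The two predictions can be sketched in code. The sketch uses the rounded regression line w = −84.26 + 0.8948h and labels a prediction as interpolation when the height lies between the smallest and largest observed heights. The function name predict_weight is made up for illustration.

```python
A_HAT, B_HAT = -84.26, 0.8948   # rounded regression line: w = A_HAT + B_HAT*h
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]

def predict_weight(h):
    """Fitted weight (kg) at height h (cm), with a label for the kind of prediction."""
    inside = min(heights) <= h <= max(heights)
    return A_HAT + B_HAT * h, "interpolation" if inside else "extrapolation"

results = {h: predict_weight(h) for h in (185, 210)}
for h, (w_hat, kind) in results.items():
    print(h, round(w_hat, 1), kind)
```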
i          1     2     3     4     5     6     7     8     9     10    -     -
hi (cm)    182   165   173   155   178   174   169   160   150   190   185   210
wi (kg)    81    70    71    53    72    75    69    60    44    80    -     -
ŵi (kg)    78.6  63.4  70.5  54.4  75.0  71.4  67.0  58.9  50.0  85.8  81.3  103.6
[Figure: scatter diagram with height (cm) on the horizontal axis, now running from 145 to 215.]
This, though, is not a very satisfying explanation for why extrapolation is “less reliable”
than interpolation. It merely leads to another question: “Why should a prediction be more
reliable if done between two known observations, than if done to the right of the right-most
observation (or to the left of the left-most observation)?”
We won’t give an adequate answer to this latter question. Instead, we’ll simply give a
bunch of examples to illustrate the dangers of extrapolation:
Example 613. A man on a diet weighs 115 kg in Week #1. Here’s a chart of his weight
loss.
The OLS line of best fit suggests that he has been losing about 0.5 kg a week.
He forgot to record his weight on Week #6. By interpolation, we “predict” that his weight
that week was 112.5 kg. This is probably a reliable guess.
By extrapolation, we predict that his weight on Week #201 will be 15 kg. This guess is
obviously absurd. It requires that he keep losing 0.5 kg a week for nearly 4 years.
The OLS line of best fit suggests that he has been growing by about 1 cm a month.
He forgot to record his height in Month #6. By interpolation, we “predict” that his height
that month was 165 cm. This is probably a reliable guess.
By extrapolation, we predict that his height in Month #101 will be 260 cm. This guess is
obviously absurd. It requires that he keep growing by 1 cm a month for 8-plus years.
Example 615. Russell’s Chicken (Problems of Philosophy, 1912, Google Books link).
The man who has fed the chicken every day throughout its life at last wrings its neck instead,
showing that more refined views as to the uniformity of nature would have been useful to the
chicken. ... The mere fact that something has happened a certain number of times causes
animals and men to expect that it will happen again. Thus our instincts certainly cause
us to believe the sun will rise to-morrow, but we may be in no better a position than the
chicken which unexpectedly has its neck wrung.
F0 = 2^(2^0) + 1 = 3,
F1 = 2^(2^1) + 1 = 5,
F2 = 2^(2^2) + 1 = 17,
F3 = 2^(2^3) + 1 = 257,
F4 = 2^(2^4) + 1 = 65537.
Remarkably, the first five Fermat numbers are all prime. This observation led Fermat to
conjecture (guess) in the 17th century that all Fermat numbers are prime. This was an act
of extrapolation.
Unfortunately, Fermat’s act of extrapolation was wrong. About a century later, Euler
showed that F5 = 2^(2^5) + 1 = 4294967297 = 641 × 6700417 is composite (not prime).
Today, the Fermat numbers F5 , F6 , . . . , F32 are all known to be composite. Indeed, it was
shown in 1964 that F32 is composite. Over half a century later, it is not yet known if
F33 = 2^(2^33) + 1 is prime or composite. F33 is an unimaginably huge number, with 2,585,827,973
digits.
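The claims about the first few Fermat numbers are easy to check by computer. The sketch below assumes the usual definition, Fn = 2^(2^n) + 1. It uses plain trial division, which is fine for F0 to F4 but hopeless for the larger Fermat numbers. The function names are made up for illustration.

```python
def fermat(n):
    """The n-th Fermat number, 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Primality by trial division; fine for numbers as small as F4 = 65537."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

first_five = [fermat(n) for n in range(5)]
print(first_five, all(is_prime(f) for f in first_five))
print(fermat(5), fermat(5) == 641 * 6700417)  # Euler's factorisation of F5
```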
On his second day at school, he learns that the Chinese character for the number 2 is
written as two horizontal strokes.
On his third day at school, he learns that the Chinese character for the number 3 is written
as three horizontal strokes.
After his third day at school, Ah Beng decides he’ll skip at least the next few Chinese
classes, because he thinks he knows how to write the Chinese characters for the numbers 4
and above. 4 simply consists of four horizontal strokes; 5 simply consists of five horizontal
strokes; etc. Unfortunately, Ah Beng’s act of extrapolation is wrong.
The characters for the numbers 4 through 10 look instead like this:
4 四, 5 五, 6 六, 7 七, 8 八, 9 九, 10 十
Example 618. Moore’s Law. In 1965, Gordon Moore observed that the number of
components that could be crammed onto each integrated circuit doubled every year. He
predicted that this rate of progress would continue at least through 1975.
In 1975, he adjusted his prediction to a more modest rate of doubling every two years. Thus
far, this latter prediction has held up remarkably well, as the following graph (taken from
Nature) shows.
Unfortunately, as stated in the same Nature article, it “has become increasingly obvious to
everyone involved” that “Moore’s law ... is nearing its end”.
This is considerably quicker than the rate at which the annual US defense budget and US
Gross National Product (GNP) grows. Extrapolating, he concluded:
• In 2054, the entire annual US defense budget will be spent on a single aircraft.
• Early in the 22nd century, the entire US GNP will be spent on a single aircraft.
Except so far they have been right on track. In a 2010 Economist article, Augustine was
quoted as saying, “We are right on target. Unfortunately nothing has changed.” That article
also presented an updated version of Augustine’s Law.
The latest F-35 fighter program is estimated to cost the US Department of Defense US$1.124
trillion. To be fair, that estimate is the cost of the entire program over its projected 60-year
lifespan (through 2070); this includes R&D, the purchase of over 2,000 F-35s, and
operating costs. But still, US$1.124 trillion is a mind-blowing figure.*
*Figure quoted from an April 2016 Defense News story. Note though that the estimate keeps changing.
Exercise 274. Using the data below, “predict” how many haircuts were sold in June 2016
by (a) a barber who charged $7 per haircut; and (b) a barber who charged $200 per haircut.
Which prediction is an act of interpolation and which is an act of extrapolation? Which
prediction do you think is more reliable? (Answer on p. 1187.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
Two variables may have a relationship, but not a linear one. Here we consider cases where
the relationship is quadratic, reciprocal, or logarithmic.
Example 620. Quadratic. Consider the following data. There is a very strong, but not
perfect degree of linear correlation between x and y (r ≈ 0.950). The observations are very
close to, but are not exactly on the OLS line of best fit.
The degree of linear correlation between z and y is near perfect (r ≈ 0.995). The observations
also lie closer to the line of best fit than before.
The degree of linear correlation between z and y is much stronger (r ≈ 0.899). The obser-
vations also lie closer to the line of best fit.
The degree of linear correlation between z and y is much stronger (r ≈ 0.978). The obser-
vations also lie closer to the line of best fit.
i 1 2 3 4 5
xi 1 2 3 4 5
yi 10.59 10.54 27.30 33.84 56.6
(a) Plot the above data in a scatter diagram and find the PMCC.
It’s much more interesting to live not knowing than to have answers which might be wrong.
- Richard Feynman (1981, YouTube).
The A-level examiners85 want you to say, mindlessly and formulaically, that
Regurgitating the above sentence will earn you your full mark. But in fact, without the
“all else equal” clause, it is nonsense. And since it is almost never true that “all else is
equal”, it is almost always nonsense.
In every introductory course or text on statistics, one is told that the PMCC is merely
a relatively-unimportant consideration, in deciding between models. Yet somehow, the
A-level examiners seem to consider the PMCC an all-important consideration.
Here’s a quick example to illustrate.
Example 623. (From the 2015 exam — see Exercise 458 below.) In an experiment the
following information was gathered about air pressure P , measured in inches of mercury,
at different heights above sea-level h, measured in feet.
h 2000 5000 10000 15000 20000 25000 30000 35000 40000 45000
P 27.8 24.9 20.6 16.9 13.8 11.1 8.89 7.04 5.52 4.28
The exam first asks us to find the PMCCs between (a) h and P ; (b) ln h and P ; and (c)
√h and P . The answers are (a) ra ≈ −0.980731; (b) rb ≈ −0.974800; and (c) rc ≈ −0.998638.
The A-level exam then says, “Using the most appropriate case ..., find the equation which
best models air pressure at different heights.” The “correct” answer is that (c) P = a + b√h
is the “most appropriate” model, simply because the PMCC there is the largest in absolute value.
But this is utter nonsense. One does not conclude that one model is “more appropriate”
than another simply because its PMCC is 0.018 larger. Small measurement errors or plain
bad luck could easily explain these tiny differences in PMCCs.
Moreover, even if one model has r = 0.9 and another has r = 0.4, it does not automatically
follow that the first model is “more appropriate” than the second. In deciding which
statistical model to use, there are very many considerations, of which the PMCC is a
relatively-unimportant one.
Sadly, in the Singapore education system, what I consider to be the correct answer would
not have gotten you any marks. Instead, one is taught that there must always be one
single, simplistic, formulaic, definitive, “correct” answer. This is a convenient substitute
for thinking.
As it turns out, the “most correct” linear model — based on the actual barometric formula
(see subsection 89.10.1 in the Appendices) — is actually the following:
\[ \ln P = a + b \ln\left(1 + \frac{L}{T} h\right). \]
The constants L = −0.0065 kelvin per metre (K m⁻¹) and T = 288.15 kelvin (K) are, respectively, the standard temperature lapse rate (up to 11,000 m above sea level) and the standard temperature (at sea level).
The PMCC for the above model is rd ≈ 0.999998, which is “better” than the cases examined
above. (See this Google spreadsheet for the data and calculations.)
Ten-Year Series
For more practice, try the TYS questions for H1 Maths (in my H1 Maths
Textbook). They’re very similar!
This part lists all the questions from 2006-2015 A-Level exams, sorted into the nine different
parts and in reverse chronological order.
In the older exams, they had the habit of not distinctly numbering different parts within
the same question as parts (i), (ii), etc. So I have sometimes taken the liberty of adding or
modifying such numbers.
Exam Tip
Unless explicitly instructed, you are always allowed to use your graphing calculator, so use
it wherever possible.
Examples of explicit instructions to avoid using your calculator include (but are not limited
to):
• “Without using your calculator ...”
• “Use a non-calculator method ...”
• “Find the exact value of ...”
• “Express your answer in terms of \(\sqrt{3}\) or \(\pi\).”
\[ y = \frac{a}{x^2} + bx + c, \]
where a, b and c are constants. It is given that C passes through the points with coordinates
(1.6, −2.4) and (−0.7, 3.6), and that the gradient of C is 2 at the point where x = 1.
(i) Find the values of a, b and c, giving your answers correct to 3 decimal places. [4]
(ii) Find the x-coordinate of the point where C crosses the x-axis, giving your answer
correct to 3 decimal places. [2]
(iii) One asymptote of C is the line with equation x = 0. Write down the equation of the
other asymptote of C. [1]
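In the spirit of the Exam Tip above, part (i) of this question reduces to three linear equations in a, b and c, which a calculator (or a short script) can solve. The sketch below is one possible check, not the official method; `solve_linear` is a hypothetical helper written for this illustration, and the third equation assumes the derivative dy/dx = −2a/x³ + b.

```python
def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented matrix so the inputs are left untouched.
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    # Back-substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


# Unknowns are (a, b, c) in y = a/x^2 + bx + c.
A = [
    [1 / 1.6 ** 2, 1.6, 1.0],      # C passes through (1.6, -2.4)
    [1 / (-0.7) ** 2, -0.7, 1.0],  # C passes through (-0.7, 3.6)
    [-2.0, 1.0, 0.0],              # dy/dx = -2a/x^3 + b equals 2 at x = 1
]
rhs = [-2.4, 3.6, 2.0]

a, b, c = solve_linear(A, rhs)
print(round(a, 3), round(b, 3), round(c, 3))
```

Substituting the printed values back into the three conditions is a quick way to confirm them before moving on to parts (ii) and (iii).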
Exercise 277. (9740 N2015/I/2. Answer on p. 1190.) (i) Sketch the curve with equation
\[ y = \left| \frac{x + 1}{1 - x} \right|, \]
stating the equations of the asymptotes. On the same diagram, sketch the line with equation
y = x + 2. [3]
(ii) Solve the inequality
\[ \left| \frac{x + 1}{1 - x} \right| < x + 2. \] [3]
Exercise 278. (9740 N2015/I/5. Answer on p. 1192.) (i) State a sequence of transformations that will transform the curve with equation \(y = x^2\) on to the curve with equation \(y = 0.25(x - 3)^2\). [2]
A curve has equation y = f (x) where
\[ f(x) = \begin{cases} 1 & \text{for } 0 \leq x \leq 1, \\ 0.25(x - 3)^2 & \text{for } 1 < x \leq 3, \\ 0 & \text{otherwise.} \end{cases} \]
\[ f : x \mapsto \frac{1}{1 - x^2}, \quad x \in \mathbb{R},\ x > 1. \]
\[ g : x \mapsto \frac{2 + x}{1 - x^2}, \quad x \in \mathbb{R},\ x \neq \pm 1. \]
Find algebraically the range of g, giving your answer in terms of \(\sqrt{3}\) as simply as possible.
[5]
\[ y = \frac{1}{1 - x}, \quad x \in \mathbb{R},\ x \neq 1,\ x \neq 0. \]
[Figure: graph of y = f(x), showing the vertical intercept D = (0, d) and the horizontal intercepts.]
(i) Sketch the curve \(y^2 = f(x)\), stating, in terms of a, b, c and d, the coordinates of any turning points and of the points where the curve crosses the x-axis. [4]
(ii) What can be said about the tangents to the curve \(y^2 = f(x)\) at the points where it crosses the x-axis? [1]
Exercise 282. (9740 N2014/II/1. Answer on p. 1195.) A curve C has parametric equations \(x = 3t^2\), \(y = 6t\).
(i) Find the value of t at the point on C where the tangent has gradient 0.4. [3]
(ii) The tangent at the point \(P(3p^2, 6p)\) on C meets the y-axis at the point D. Find the cartesian equation of the locus of the mid-point of PD as p varies. [4]
\[ y = \frac{x^2 + x + 1}{x - 1}, \quad x \in \mathbb{R},\ x \neq 1. \]
Without using a calculator, find the set of values that y can take. [5]
Exercise 284. (9740 N2013/I/3. Answer on p. 1196.) (i) Sketch the curve with equation
\[ y = \frac{x + 1}{2x - 1}, \]
stating the equations of any asymptotes and the coordinates of the points where the curve
crosses the axes. [4]
\[ \frac{x + 1}{2x - 1} < 1. \] [1]
Exercise 285. (9740 N2013/II/1. Answer on p. 1197.) Functions f and g are defined by
\[ f : x \mapsto \frac{2 + x}{1 - x}, \quad x \in \mathbb{R},\ x \neq 1, \]
\[ g : x \mapsto 1 - 2x, \quad x \in \mathbb{R}. \]
(i) Explain why the composite function fg does not exist. [2]
(ii) Find an expression for gf(x) and hence, or otherwise, find \((gf)^{-1}(5)\). [4]
Group   Under 16 years   Between 16 and 65 years   Over 65 years   Total cost
A       9                6                         4               $162.03
B       7                5                         3               $128.36
C       10               4                         5               $158.50
Write down and solve equations to find the cost of a ticket for each of the age categories.
[4]
\[ g : x \mapsto \frac{x + k}{x - 1}, \quad x \in \mathbb{R},\ x \neq 1, \]
Exercise 288. (9740 N2013/I/3. Answer on p. 1199.) It is given that \(f(x) = x^3 + x^2 - 2x - 4\).
(i) Sketch the graph of \(y = f(x)\). [1]
(ii) Find the integer solution of the equation \(f(x) = 4\), and prove algebraically that there are no other real solutions. [3]
(iii) State the integer solution of the equation \((x + 3)^3 + (x + 3)^2 - 2(x + 3) - 4 = 4\). [1]
(iv) Sketch the graph of \(y = |f(x)|\). [1]
(v) Write down two different cubic equations which between them give the roots of the equation \(|f(x)| = 4\). Hence find all the roots of this equation. [4]
\[ \frac{x^2 + x + 1}{x^2 + x - 2} < 0. \] [4]
Exercise 290. (9740 N2011/I/2. Answer on p. 1200.) It is given that \(f(x) = ax^2 + bx + c\), where a, b, and c are constants.
(i) Given that the curve with equation y = f (x) passes through the points with coordinates
(−1.5, 4.5), (2.1, 3.2) and (3.4, 4.1), find the values of a, b, and c. Give your answers correct
to 3 decimal places. [3]
(ii) Find the set of values of x for which f (x) is an increasing function.
\[ f : x \mapsto \ln(2x + 1) + 3, \quad x \in \mathbb{R},\ x > -\frac{1}{2}. \]
(i) Find \(f^{-1}(x)\) and write down the domain and range of \(f^{-1}\). [4]
(ii) Sketch on the same diagram the graphs of \(y = f(x)\) and \(y = f^{-1}(x)\), giving the equations of any asymptotes and the exact coordinates of any points where the curves cross the x- and y-axes. [4]
(iii) Explain why the x-coordinates of the points of intersection of the curves in part (ii)
satisfy the equation ln(2x + 1) = x − 3, and find the values of these x-coordinates, correct
to 4 significant figures. [3]
Exercise 292. (9740 N2010/I/5. Answer on p. 1202.) The curve with equation \(y = x^3\) is transformed by a translation of 2 units in the positive x-direction, followed by a stretch with scale factor 0.5 parallel to the y-axis, followed by a translation of 6 units in the negative y-direction.
(i) Find the equation of the new curve in the form \(y = f(x)\) and the exact coordinates of the points where this curve crosses the x- and y-axes. Sketch the new curve. [5]
(ii) On the same diagram, sketch the graph of \(y = f^{-1}(x)\), stating the exact coordinates of the points where the graph crosses the x- and y-axes. [3]
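\[ f : x \mapsto \frac{1}{x^2 - 1}, \quad \text{for } x \in \mathbb{R},\ x \neq -1,\ x \neq 1. \]
\[ g : x \mapsto \frac{1}{x - 3}, \quad \text{for } x \in \mathbb{R},\ x \neq 2,\ x \neq 3,\ x \neq 4. \]
\[ fg(x) = \frac{(x - 3)^2}{(4 - x)(x - 2)}. \] [2]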
Exercise 294. (9740 N2009/I/1. Answer on p. 1205.) (i) The first three terms of a sequence are given by \(u_1 = 10\), \(u_2 = 6\), \(u_3 = 5\). Given that \(u_n\) is a quadratic polynomial in n, find \(u_n\) in terms of n. [4]
(ii) Find the set of values of n for which \(u_n\) is greater than 100. [2]
Exercise 295. (9740 N2009/I/6. Answer on p. 1206.) The curve \(C_1\) has equation \(y = \frac{x - 2}{x + 2}\). The curve \(C_2\) has equation \(\frac{x^2}{6} + \frac{y^2}{3} = 1\).
(i) Sketch \(C_1\) and \(C_2\) on the same diagram, stating the exact coordinates of any points of intersection with the axes and the equations of any asymptotes. [4]
(ii) Show algebraically that the x-coordinates of the points of intersection of \(C_1\) and \(C_2\) satisfy the equation \(2(x - 2)^2 = (x + 2)^2 (6 - x^2)\). [2]
(iii) Use your calculator to find these x-coordinates. [2]
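\[ f : x \mapsto \frac{ax}{bx - a}, \quad \text{for } x \in \mathbb{R},\ x \neq \frac{a}{b}, \]
\[ f(x) = \frac{ax + b}{cx + d}, \]
\[ y = \frac{3x - 7}{2x + 1} \]
(a) \(y = \frac{3x - 7}{2x + 1}\),
(b) \(y^2 = \frac{3x - 7}{2x + 1}\),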
including the coordinates of the points where the graphs cross the axes and the equations
of any asymptotes. [5]
(i) \(y = \frac{x}{x^2 - 1}\), stating the equations of the asymptotes, [4]
(ii) \(y^2 = \frac{x}{x^2 - 1}\), making clear the form of the curve at the origin. [3]
Show that the x-coordinates of the points of intersection of the curves \(y = \frac{x}{x^2 - 1}\) and \(y = e^x\) satisfy the equation \(x^2 = 1 + xe^{-x}\).
Use the iterative formula \(x_{n+1} = \sqrt{1 + x_n e^{-x_n}}\), together with a suitable initial value \(x_1\), to find the positive root of this equation correct to 2 decimal places.
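The iterative formula in this question is the kind of thing a graphing calculator does with repeated use of the ANS key. A minimal sketch of the same process follows; the starting value 1.0, the tolerance and the function name `iterate_root` are assumptions made for illustration, not part of the question.

```python
import math


def iterate_root(x1=1.0, tol=1e-8, max_steps=1000):
    """Fixed-point iteration x_{n+1} = sqrt(1 + x_n * exp(-x_n))."""
    x = x1
    for _ in range(max_steps):
        x_next = math.sqrt(1 + x * math.exp(-x))
        # Stop once successive terms agree to within the tolerance.
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not settle within max_steps")


root = iterate_root()
print(f"{root:.2f}")
# Residual of the original equation x^2 = 1 + x e^{-x}; should be close to 0.
print(root ** 2 - (1 + root * math.exp(-root)))
```

In an exam answer, you would still need to write down a few successive iterates to justify the stated accuracy.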
(i) Sketch the graph of y = f (x). Your sketch should indicate the position of the graph in
relation to the origin. [2]
(ii) Find \(f^{-1}(x)\), stating the domain of \(f^{-1}\). [3]
(iii) On the same diagram as in part (i), sketch the graph of \(y = f^{-1}(x)\). [1]
(iv) Write down the equation of the line in which the graph of \(y = f(x)\) must be reflected in order to obtain the graph of \(y = f^{-1}(x)\), and hence find the exact solution of the equation \(f(x) = f^{-1}(x)\). [5]
\[ \frac{2x^2 - x - 19}{x^2 + 3x + 2} - 1 = \frac{x^2 - 4x - 21}{x^2 + 3x + 2}. \] [1]
\[ \frac{2x^2 - x - 19}{x^2 + 3x + 2} > 1. \] [4]
\[ f : x \mapsto \frac{1}{x - 3} \quad \text{for } x \in \mathbb{R},\ x \neq 3, \]
\[ g : x \mapsto x^2 \quad \text{for } x \in \mathbb{R}. \]
(i) Only one of the composite functions fg and gf exists. Give a definition (including the domain) of the composite that exists, and explain why the other composite does not exist.
[3]
Exercise 302. (9740 N2007/I/5. Answer on p. 1212.) Show that the equation \(y = \frac{2x + 7}{x + 2}\) can be written as \(y = A + \frac{B}{x + 2}\), where A and B are constants to be found. Hence state a sequence of transformations which transform the graph of \(y = \frac{1}{x}\) to the graph of \(y = \frac{2x + 7}{x + 2}\). [4]
Sketch the graph of \(y = \frac{2x + 7}{x + 2}\), giving the equations of any asymptotes and the coordinates of any points of intersection with the x- and y-axes.
Exercise 303. (9740 N2007/II/1. Answer on p. 1213.) Four friends buy three different kinds of fruit in the market. When they get home they cannot remember the individual prices per kilogram, but three of them can remember the total amount that they each paid. The weights of fruit and the total amounts paid are shown in the following table.
Assuming that, for each variety of fruit, the price per kilogram paid by each of the friends
is the same, calculate the total amount that Lee Lian paid. [6]
\[ f : x \mapsto \frac{4x + 1}{x - 3}, \quad x \in \mathbb{R},\ x \neq 3. \]
(i) State the equations of the two asymptotes of the graph of y = f (x). [2]
(ii) Sketch the graph of y = f (x), showing its asymptotes and stating the coordinates of
the points of intersection with the axes. [3]
(iii) Find an expression for \(f^{-1}(x)\) and state the domain of \(f^{-1}\). [3]
Exercise 305. (9233 N2006/I/3. Answer on p. 1214.) Functions f and g are defined by
f ∶ x ↦ 5x + 3, x > 0,
\[ g : x \mapsto \frac{3}{x}, \quad x > 0. \]
\[ \frac{x - 9}{x^2 - 9} \leq 1. \] [5]
Remark: This was the old 9233 exam. Technically, inequalities of the form \(\frac{N}{D} \geq 0\) are no longer in the 9740 or 9758 syllabus. Only inequalities of the form \(\frac{N}{D} > 0\) are. But this is not a difficult question and you should go ahead and try it.
Exercise 307. (9740 N2015/I/8. Answer on p. 1216.) Two athletes are to run 20 km by
running 50 laps around a circular track of length 400 m. They aim to complete the distance
in between 1.5 hours and 1.75 hours inclusive.
(i) Athlete A runs the first lap in T seconds and each subsequent lap takes 2 seconds longer
than the previous lap. Find the set of values of T which will enable A to complete the
distance within the required time interval. [4]
(ii) Athlete B runs the first lap in t seconds and the time for each subsequent lap is 2%
more than the time for the previous lap. Find the set of values of t which will enable B to
complete the distance within the required time interval. [4]
(iii) Assuming each athlete completes the 20 km run in exactly 1.5 hours, find the difference
in the athletes’ times for their 50th laps, giving your answer to the nearest second. [3]
Exercise 308. (9740 N2015/II/4. Answer on p. 1217.) Skip part (a) if you’re taking the
9758 (revised) exam.
\[ 1 \times 3 \times 6 + 2 \times 4 \times 7 + 3 \times 5 \times 8 + \cdots + n(n + 2)(n + 5) = \frac{1}{12} n(n + 1)(3n^2 + 31n + 74). \] [6]
(b) (i) Show that \(\frac{2}{4r^2 + 8r + 3}\) can be expressed as \(\frac{A}{2r + 1} + \frac{B}{2r + 3}\), where A and B are constants to be determined. [1]
The sum \(\sum_{r=1}^{n} \frac{2}{4r^2 + 8r + 3}\) is denoted by \(S_n\).
(b) The sum \(S_n\) of the first n terms of a sequence \(u_1, u_2, u_3, \ldots\) is given by \(S_n = 1 - \frac{1}{(n + 1)!}\).
(i) Give a reason why the series \(\sum u_r\) converges, and write down the value of the sum to infinity. [2]
(ii) Find a formula for \(u_n\) in simplified form. [2]
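[Figure: Version 1 (top) shows the points O, A1, A2, ..., A8 on a line, with adjacent points 4 m apart. Version 2 (bottom) shows the points O, A1, A2, A3, A4 on a line, with successive gaps of 4 m, 4 m, 8 m and 16 m.]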
(i) In Version 1 of the exercise (top), the distances between adjacent points are all 4 m.
(a) Find the distance run by an athlete who completes the first 10 stages of Version 1 of
the exercise. [2]
(b) Write down an expression for the distance run by an athlete who completes n stages of
Version 1. Hence find the least number of stages that the athlete needs to complete to run
at least 5 km. [4]
(ii) In Version 2 of the exercise (bottom), the distances between the points are such that \(OA_1 = 4\) m, \(A_1A_2 = 4\) m, \(A_2A_3 = 8\) m and \(A_nA_{n+1} = 2A_{n-1}A_n\). Write down an expression for the distance run by an athlete who completes n stages of Version 2. Hence find the distance from O, and the direction of travel, of the athlete after he has run exactly 10 km using Version 2. [5]
(i) The length of the nth piece of string cut off is p cm. Show that ln p = (An + B) ln 2 +
(Cn + D) ln 3, for constants A, B, C and D to be determined. [3]
(ii) Show that the total length of string cut off can never be greater than 384 cm. [2]
(iii) How many pieces must be cut off before the total length cut off is greater than 380
cm? You must show sufficient working to justify your answer. [4]
Exercise 312. (9740 N2013/I/9. Answer on p. 1222.) Skip this question if you’re taking
the 9758 (revised) exam.
\[ \sum_{r=1}^{n} r(2r^2 + 1) = \frac{1}{2} n(n + 1)(n^2 + n + 1). \] [5]
(ii) It is given that \(f(r) = 2r^3 + 3r^2 + r + 24\). Show that \(f(r) - f(r - 1) = ar^2\), for a constant a to be determined. Hence find a formula for \(\sum_{r=1}^{n} r^2\), fully factorising your answer. [5]
(iii) Find \(\sum_{r=1}^{n} f(r)\). (You should not simplify your answer.) [3]
Exercise 313. (9740 N2012/I/3. Answer on p. 1224.) Skip this question if you’re taking
the 9758 (revised) exam.
A sequence \(u_1, u_2, u_3, \ldots\) is given by \(u_1 = 2\) and \(u_{n+1} = \frac{3u_n - 1}{6}\) for \(n \geq 1\).
(i) Find the exact values of \(u_2\) and \(u_3\). [2]
(ii) It is given that \(u_n \to l\) as \(n \to \infty\). Showing your working, find the exact value of l. [2]
(iii) For this value of l, use the method of mathematical induction to prove that
\[ u_n = \frac{14}{3} \left( \frac{1}{2} \right)^n + l. \] [4]
Exercise 315. (9740 N2011/I/6. Answer on p. 1226.) Skip parts (ii) and (iii) of this
question if you’re taking the 9758 (revised) exam.
(ii) Hence find a formula for \(\sum_{r=1}^{n} \cos(r\theta)\) in terms of \(\sin(n + 0.5)\theta\) and \(\sin(0.5\theta)\). [3]
\[ \sum_{r=1}^{n} \sin(r\theta) = \frac{\cos(0.5\theta) - \cos(n + 0.5)\theta}{2 \sin(0.5\theta)} \]
Exercise 317. (9740 N2010/I/3. Answer on p. 1228.) Skip part (ii) of this question if
you’re taking the 9758 (revised) exam.
Exercise 318. (9740 N2010/II/2. Answer on p. 1229.) Skip part (i) of this question if
you’re taking the 9758 (revised) exam.
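\[ \sum_{r=1}^{n} r(r + 2) = \frac{1}{6} n(n + 1)(2n + 7). \] [5]
\[ \sum_{r=1}^{n} \frac{1}{r(r + 2)} = \frac{3}{4} - \frac{1}{2(n + 1)} - \frac{1}{2(n + 2)}. \] [4]
\[ \frac{1}{n - 1} - \frac{2}{n} + \frac{1}{n + 1} = \frac{A}{n^3 - n}, \]
\[ \sum_{r=2}^{n} \frac{1}{r^3 - r}. \]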
Exercise 320. (9740 N2009/I/5. Answer on p. 1232.) Skip (i) if you’re taking the 9758
(revised) exam.
\[ \sum_{r=1}^{n} r^2 = \frac{1}{6} n(n + 1)(2n + 1). \] [4]
(ii) Find \(\sum_{r=n+1}^{2n} r^2\), giving your answer in fully factorised form. [4]
Exercise 321. (9740 N2009/I/8. Answer on p. 1233.) Two musical instruments, A and
B, consist of metal bars of decreasing lengths.
(i) The first bar of instrument A has length 20 cm and the lengths of the bars form a
geometric progression. The 25th bar has length 5 cm. Show that the total length of all the
bars must be less than 357 cm, no matter how many bars there are. [4]
Instrument B consists of only 25 bars which are identical to the first 25 bars of instrument
A.
(ii) Find the total length, L cm, of all the bars of instrument B and the length of the 13th
bar. [3]
(iii) Unfortunately the manufacturer misunderstands the instructions and constructs in-
strument B wrongly, so that the lengths of the bars are in arithmetic progression with
common difference d cm. If the total length of the 25 bars is still L cm and the length of
the 25th bar is still 5 cm, find the value of d and the length of the longest bar. [4]
The nth term of a sequence is given by \(u_n = n(2n + 1)\), for n ≥ 1. The sum of the first n terms is denoted by \(S_n\). Use the method of mathematical induction to show that \(S_n = \frac{1}{6} n(n + 1)(4n + 5)\) for all positive integers n. [5]
Exercise 323. (9740 N2008/I/10. Answer on p. 1235.) (i) A student saves $10 on 1 January 2009. On the first day of each subsequent month she saves $3 more than in the previous month, so that she saves $13 on 1 February 2009, $16 on 1 March 2009, and so on. On what date will she first have saved over $2000 in total?
(ii) A second student puts $10 on 1 January 2009 into a bank account which pays compound interest at a rate of 2% per month on the last day of each month. She puts a further $10 into the account on the first day of each subsequent month.
(a) How much compound interest has her original $10 earned at the end of 2 years? [2]
(b) How much in total is in the account at the end of 2 years? [3]
(c) After how many complete months will the total in the account first exceed $2000? [4]
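Part (i) of this question can be checked by brute force, adding up the monthly deposits until the running total passes the target. The sketch below is only such a check; the function name `months_to_exceed` and its parameters are illustrative, and the exam expects the arithmetic-progression sum formula rather than a loop.

```python
def months_to_exceed(target=2000, first=10, step=3):
    """Count monthly deposits until the running total first exceeds `target` dollars."""
    total = 0
    deposit = first
    months = 0
    while total <= target:
        total += deposit   # deposit made on the first day of the month
        deposit += step    # next month's deposit is $3 larger
        months += 1
    return months, total


months, total = months_to_exceed()
print(months, total)
```

Month 1 is January 2009, so the month count still has to be converted into a calendar date by hand.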
Exercise 324. (9233 N2008/I/14. Answer on p. 1236.) Skip part (i) of this question if
you’re taking the 9758 (revised) exam.
\[ 1 + 2x + 3x^2 + \cdots + nx^{n-1} = \frac{1 - (n + 1)x^n + nx^{n+1}}{(1 - x)^2}. \]
(i) Use mathematical induction to prove the statement for all positive integers n.
(ii) By considering the expression obtained by integrating each term on the left hand side,
prove the statement without using mathematical induction. [6]
The diagram shows the graph of \(y = e^x - 3x\). The two roots of the equation \(e^x - 3x = 0\) are denoted by α and β, where α < β.
(i) Find the values of α and β, each correct to 3 decimal places. [2]
\[ x_{n+1} = \frac{1}{3} e^{x_n}, \quad \text{for } n \geq 1. \]
(ii) Prove algebraically that, if the sequence converges, then it converges to either α or β. [2]
(iii) Use a calculator to determine the behaviour of the sequence for each of the cases \(x_1 = 0\), \(x_1 = 1\), \(x_1 = 2\). [3]
(iv) By considering \(x_{n+1} - x_n\), prove that
(v) State briefly how the results in part (iv) relate to the behaviours determined in (iii).
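The calculator experiment in part (iii) can also be run as a short script. This is a sketch under stated assumptions: the name `run_sequence`, the tolerance, and the cut-off `bound` (used only to stop before the exponential overflows) are all choices made for this illustration.

```python
import math


def run_sequence(x1, steps=200, tol=1e-10, bound=50.0):
    """Iterate x_{n+1} = exp(x_n) / 3 and report what happens.

    Returns ("converges", limit), ("diverges", last_value) or ("undecided", last_value).
    """
    x = x1
    for _ in range(steps):
        # Treat terms beyond the (arbitrary) bound as divergence.
        if x > bound:
            return ("diverges", x)
        x_next = math.exp(x) / 3
        if abs(x_next - x) < tol:
            return ("converges", x_next)
        x = x_next
    return ("undecided", x)


for start in (0, 1, 2):
    print(start, run_sequence(start))
```

Comparing the printed behaviour for each starting value with the positions of α and β is a useful warm-up for parts (iv) and (v).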
Exercise 328. (9740 N2007/II/2. Answer on p. 1241.) Skip part (i) of this question if
you’re taking the 9758 (revised) exam.
\[ u_{n+1} = u_n - \frac{2n + 1}{n^2 (n + 1)^2}, \]
for all \(n \geq 1\).
(i) Use the method of mathematical induction to prove that \(u_n = \frac{1}{n^2}\). [4]
(ii) Hence find
\[ \sum_{n=1}^{N} \frac{2n + 1}{n^2 (n + 1)^2}. \] [2]
(iii) Give a reason why the series in part (ii) is convergent and state the sum to infinity. [2]
(iv) Use your answer to part (ii) to find
\[ \sum_{n=2}^{N} \frac{2n - 1}{n^2 (n - 1)^2}. \] [2]
\[ \sum_{r=1}^{n} \sin rx = \frac{\cos(0.5x) - \cos[(n + 0.5)x]}{2 \sin(0.5x)}, \]
\[ \sum_{r=1}^{2n} 3^{r+2}. \] [3]
Exercise 331. (9233 N2006/I/1. Answer on p. 1243.) The sum \(S_n\) of the first n terms of a geometric progression is given by \(S_n = 6 - \frac{2}{3^{n-1}}\). Find the first term and the common ratio.
Exercise 332. (9233 N2006/I/11. Answer on p. 1244.) Skip (i) if you’re taking the 9758
(revised) exam.
\[ \sum_{r=1}^{n} r^3 = \frac{1}{4} n^2 (n + 1)^2. \] [4]
Exercise 333. (9740 N2015/I/7. Answer on p. 1245.) Referred to the origin O, points
A and B have position vectors a and b respectively. Point C lies on OA, between O
and A, such that OC ∶ CA = 3 ∶ 2. Point D lies on OB, between O and B, such that
OD ∶ DB = 5 ∶ 6.
(i) Find the position vectors \(\overrightarrow{OC}\) and \(\overrightarrow{OD}\), giving your answers in terms of a and b. [2]
(ii) Show that the vector equation of the line BC can be written as r = 0.6λa + (1 − λ)b,
where λ is a parameter. Find in a similar form the vector equation of the line AD in terms
of a parameter µ. [3]
(iii) Find, in terms of a and b, the position vector of the point E where the lines BC and
AD meet and find the ratio AE ∶ ED. [5]
Exercise 334. (9740 N2015/II/2. Answer on p. 1246.) The line L has equation r =
i − 2j − 4k + λ(2i + 3j − 6k).
(i) Find the acute angle between L and the x-axis. [2]
The point P has position vector 2i + 5j − 6k.
(ii) Find the points on L which are a distance of \(\sqrt{33}\) from P. Hence or otherwise find the point on L which is closest to P. [5]
(iii) Find a cartesian equation of the plane that includes the line L and the point P . [3]
Exercise 335. (9740 N2014/I/3. Answer on p. 1247.) (i) Given that a × b = 0, what can
be deduced about the vectors a and b? [2]
(ii) Find a vector equation of the line m where p and q meet. [4]
(iii) B is a general point on m. Find an expression for the square of the distance AB.
Hence, or otherwise, find the coordinates of the point on m which is nearest to A. [5]
Exercise 337. (9740 N2013/I/1. Answer on p. 1247.) Skip this question if you’re taking
the 9758 (revised) exam.
(i) Given that µ = 3, find the coordinates of the point of intersection of p, q and r.
(ii) Given instead that µ = 0, describe the relationship between p, q and r.
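[Figure: diagram showing the origin O and the points A, B, C, M and N, with the vectors a and b marked.]
The origin O and the points A, B and C lie in the same plane, where \(\overrightarrow{OA} = a\), \(\overrightarrow{OB} = b\) and \(\overrightarrow{OC} = c\) (see diagram).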
(i) Explain why c can be expressed as c = λa + µb, for constants λ and µ. [1]
The point N is on AC such that AN ∶ N C = 3 ∶ 4.
(ii) Write down the position vector of N in terms of a and c. [1]
(iii) It is given that the area of triangle ON C is equal to the area of triangle OM C, where
M is the mid-point of OB. By finding the areas of these triangles in terms of a and b, find
λ in terms of µ in the case where λ and µ are both positive. [5]
Exercise 340. (9740 N2012/I/5. Answer on p. 1250.) Skip part (i) of this question if
you’re taking the 9758 (revised) exam.
Referred to the origin O, the points A and B have position vectors a and b such that
a = i − j + k and b = i + 2j. The point C has position vector c given by c = λa + µb, where
λ and µ are positive constants.
(i) Given that the area of triangle OAC is \(\sqrt{126}\), find µ. [4]
(ii) Given instead that µ = 4 and that \(OC = 5\sqrt{3}\), find the possible coordinates of C. [4]
Exercise 341. (9740 N2012/I/9. Answer on p. 1250.) (i) Find a vector equation of the
line through the points A and B with position vectors 7i+8j+9k and −i−8j+k respectively.
(ii) The perpendicular to this line from the point C with position vector i + 8j + 3k meets
the line at the point N . Find the position vector of N and the ratio AN ∶ N B. [5]
[Figure: diagram showing the origin O and the points A, B, P, Q and M.]
Referred to the origin O, the points A and B are such that \(\overrightarrow{OA} = a\) and \(\overrightarrow{OB} = b\). The point P on OA is such that OP ∶ PA = 1 ∶ 2, and the point Q on OB is such that OQ ∶ QB = 3 ∶ 2. The mid-point of PQ is M (see diagram).
(i) Find \(\overrightarrow{OM}\) in terms of a and b and show that the area of triangle OMP can be written as \(k\,|a \times b|\), where k is a constant to be found. [6]
(ii) The vectors a and b are now given by a = 2pi − 6pj + 3pk and b = i + j − 2k, where p is
a positive constant. Given that a is a unit vector,
(a) find the exact value of p, [2]
(b) give a geometrical interpretation of ∣a ⋅ b∣, [1]
Exercise 344. (9740 N2010/I/1. Answer on p. 1253.) The position vectors a and b are given by a = 2pi + 3pj + 6pk and b = i − 2j + 2k, where p > 0. It is given that |a| = |b|.
(i) Find the exact value of p. [2]
(ii) Show that (a + b) ⋅ (a − b) = 0. [3]
Exercise 345. (9740 N2010/I/10. Answer on p. 1253.) Skip part (iv) of this question if
you’re taking the 9758 (revised) exam.
\[ \frac{x - 10}{-3} = \frac{y + 1}{6} = \frac{z + 3}{9} \quad \text{and} \quad x - 2y - 3z = 0. \]
(iii) Show that the point A with coordinates (−2, 23, 33) lies on l. Find the coordinates of
the point B which is the mirror image of A in p. [3]
(iv) Find the area of triangle OAB, where O is the origin, giving your answer to the nearest
whole number. [3]
Exercise 347. (9740 N2009/II/2. Answer on p. 1254.) Skip part (iv) of this question if
you’re taking the 9758 (revised) exam.
Relative to the origin O, two points A and B have position vectors given by a = 14i+14j+14k
and b = 11i − 13j + 2k respectively.
(i) The point P divides the line AB in the ratio 2 ∶ 1. Find the coordinates of P . [2]
(Note: They should have said line segment rather than line. About this, see section 26.1 —
“Lines vs. Line Segments” on p. 248.)
Exercise 348. (9740 N2008/I/3. Answer on p. 1255.) Skip part (iii) of this question if you’re taking the 9758 (revised) exam. Points O, A, B are such that \(\overrightarrow{OA} = i + 4j - 3k\) and \(\overrightarrow{OB} = 5i - j\), and the point P is such that OAPB is a parallelogram.
(i) Find \(\overrightarrow{OP}\). [1]
(ii) Given that all three planes meet in the line l, find λ and µ. [3]
(iii) Given instead that the three planes have no points in common, what can be said about
the values of λ and µ? [2]
(iv) Find the cartesian equation of the plane which contains l and the point (1, −1, 3). [4]
Exercise 350. (9233 N2008/I/11. Answer on p. 1256.) The cartesian equations of two
lines are
(i) Show that the lines intersect and state the point of intersection. [5]
(ii) Find the acute angle between the lines. [4]
Exercise 351. (9740 N2007/I/6. Answer on p. 1256.) Skip part (iii) of this question if
you’re taking the 9758 (revised) exam. Referred to the origin O, the position vectors of the
points A and B are i − j + 2k and 2i + 4j + k respectively.
(ii) Find the position vector of the point M on the line segment AB such that AM ∶ M B =
1 ∶ 2. [3]
(iii) The point C has position vector −4i + 2j + 2k. Use a vector product to find the exact
area of triangle OAC. [4]
Exercise 353. (9233 N2007/I/7. Answer on p. 1257.) The point P is the foot of the
perpendicular from the point A(1, 3, −2) to the line given by
Exercise 355. (9233 N2006/I/14. Answer on p. 1258.) Skip part (ii) of this question if
you’re taking the 9758 (revised) exam.
The points A, B, C and D have position vectors i − 2j + 5k, i + 3j, 10i + j + 2k and −2i + 4j + 5k respectively, with respect to an origin O. The point P on AB is such that AP ∶ PB = λ ∶ 1 − λ and the point Q on CD is such that CQ ∶ QD = µ ∶ 1 − µ. Find \(\overrightarrow{OP}\) and \(\overrightarrow{OQ}\) in terms of λ and µ respectively. [3]
Exercise 356. (9740 N2015/I/9. Answer on p. 1259.) (a) The complex number w is such that w = a + ib, where a and b are non-zero real numbers. The complex conjugate of w is denoted by \(w^*\). Given that \(\frac{w^2}{w^*}\) is purely imaginary, find the possible values of w in terms of a. [5]
Skip part (b) of this question if you’re taking the 9758 (revised) exam.
(b) The complex number z is such that \(z^5 = -32i\).
(i) Find the modulus and argument of each of the possible values of z. [4]
(ii) Two of these values are \(z_1\) and \(z_2\), where \(0.5\pi < \arg z_1 < \pi\) and \(-\pi < \arg z_2 < -0.5\pi\). Find the exact value of \(\arg(z_1 - z_2)\) in terms of π and show that \(|z_1 - z_2| = 4 \sin(0.2\pi)\). [4]
Exercise 358. (9740 N2014/II/4. Answer on p. 1261.) Skip this question if you’re taking
the 9758 (revised) exam.
(a) The complex number z satisfies ∣z + 5 − i∣ = 4.
(i) Without using a calculator, find an exact expression for \(w^6\). Give your answer in the form \(re^{i\theta}\), where r > 0 and 0 ≤ θ < 2π. [3]
(ii) Without using a calculator, find the three smallest positive whole number values of n for which \(\frac{w^n}{w^*}\) is a real number. [4]
Exercise 360. (9740 N2013/I/8. Answer on p. 1263.) The complex number z is given by \(z = re^{i\theta}\), where r > 0 and 0 ≤ θ ≤ 0.5π.
(i) Given that \(w = (1 - i\sqrt{3})z\), find |w| in terms of r and arg w in terms of θ. [2]
Skip parts (ii) and (iii) of this question if you’re taking the 9758 (revised) exam.
(ii) Given that r has a fixed value, draw an Argand diagram to show the locus of z as θ
varies. On the same diagram, show the corresponding locus of w. You should identify the
modulus and argument of the endpoints of each locus. [4]
(iii) Given that \(\arg\left(\frac{z^{10}}{w^2}\right) = \pi\), find θ. [3]
Exercise 362. (9740 N2012/II/2. Answer on p. 1265.) Skip this question if you’re taking
the 9758 (revised) exam.
The complex number z satisfies the equation ∣z − (7 − 3i)∣ = 4.
(i) Sketch an Argand diagram to illustrate this equation. [2]
(ii) Given that ∣z∣ is as small as possible,
(a) find the exact value of ∣z∣, [2]
(b) hence find an exact expression for z, in the form x + iy. [2]
(iii) It is given instead that −π < arg z ≤ π and that ∣arg z∣ is as large as possible. Find the
value of arg z in radians, correct to 4 significant figures. [3]
Skip parts (iii) and (iv) of this question if you’re taking the 9758 (revised) exam.
(iii) Using a single Argand diagram, sketch the loci
(a) \(|z - z_1| = |z - z_2|\), [1]
(b) \(|z - w_1| = |z - w_2|\), [1]
(iv) Give a reason why there are no points which lie on both of these loci. [1]
Exercise 364. (9740 N2011/II/1. Answer on p. 1267.) Skip this question if you’re taking
the 9758 (revised) exam.
Exercise 366. (9740 N2010/II/1. Answer on p. 1269.) (i) Solve the equation \(x^2 - 6x + 34 = 0\). [2]
(ii) One root of the equation \(x^4 + 4x^3 + x^2 + ax + b = 0\), where a and b are real, is x = −2 + i. Find the values of a and b and the other roots. [5]
Exercise 367. (9740 N2009/I/9. Answer on p. 1270.) (i) Solve the equation \(z^7 - (1 + i) = 0\), giving the roots in the form \(re^{i\alpha}\), where r > 0 and −π < α ≤ π. [5]
(ii) Show the roots on an Argand diagram. [2]
Skip part (iii) of this question if you’re taking the 9758 (revised) exam.
(iii) The roots represented by \(z_1\) and \(z_2\) are such that \(0 < \arg z_1 < \arg z_2 < \frac{\pi}{2}\). Explain why the locus of all points z such that \(|z - z_1| = |z - z_2|\) passes through the origin. Draw this locus on your Argand diagram and find its exact cartesian equation. [5]
Skip part (b) of this question if you’re taking the 9758 (revised) exam.
(b) The complex number z satisfies the relations ∣z∣ ≤ 6 and ∣z∣ = ∣z − 8 − 6i∣.
Exercise 370. (9233 N2008/I/9. Answer on p. 1273.) Skip this question if you’re taking
the 9758 (revised) exam.
In an Argand diagram, the point P represents the complex number z. Clearly labelling
any relevant points, draw three separate diagrams to show the locus of P in each of the
following cases.
Exercise 371. (9233 N2008/II/3. Answer on p. 1275.) (i) Verify that w = 1 − i satisfies the equation \(w^2 = -2i\) and write down the other root of this equation. [3]
(ii) Use the quadratic formula to solve the equation \(z^2 - (3 + 5i)z - 4(1 - 2i) = 0\). [4]
Exercise 372. (9740 N2007/I/3. Answer on p. 1276.) If you’re taking the 9758 (revised) exam, you can skip part (a) but you should still do part (b) of this question.
(a) Sketch, on an Argand diagram, the locus of points representing the complex number z such that \(|z + 2 - 3i| = \sqrt{13}\). [3]
(b) The complex number w is such that \(ww^* + 2w = 3 + 4i\), where \(w^*\) is the complex conjugate of w. Find w in the form a + ib, where a and b are real. [4]
Exercise 375. (9233 N2007/II/5. Answer on p. 1279.) Skip this question if you’re taking
the 9758 (revised) exam.
Illustrate, on an Argand diagram, the locus of a point P representing the complex number z, where \(\arg(z - 2i) = \frac{\pi}{3}\). [3]
Illustrate, using the same Argand diagram, the locus of a point Q representing the complex number z, where \(|z - 4| = |z + 2|\). [2]
Hence find the exact value of z such that \(\arg(z - 2i) = \frac{\pi}{3}\) and \(|z - 4| = |z + 2|\), giving your answer in the form a + ib. [3]
Show that, in this case, \(zz^* = 8 + 4\sqrt{3}\). [2]
Exercise 376. (9233 N2006/I/5. Answer on p. 1280.) Skip this question if you’re taking
the 9758 (revised) exam.
The complex number z satisfies ∣z + 4 − 4i∣ = 3.
(i) Describe, with the aid of a sketch, the locus of the point which represents z in an Argand
diagram. [3]
(ii) Find the least possible value of ∣z − i∣. [2]
Exercise 377. (9233 N2006/I/6. Answer on p. 1281.) (i) Show that the equation \(z^4 - 2z^3 + 6z^2 - 8z + 8 = 0\) has a root of the form ki where k is real. [3]
(ii) Hence solve the equation \(z^4 - 2z^3 + 6z^2 - 8z + 8 = 0\). [3]
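Exercise 378. (9740 N2015/I/3. Answer on p. 1282.) (i) Given that f is a continuous function, explain, with the aid of a sketch, why the value of
\[ \lim_{n \to \infty} \frac{1}{n} \left[ f\left(\frac{1}{n}\right) + f\left(\frac{2}{n}\right) + \cdots + f\left(\frac{n}{n}\right) \right] \]
is \(\int_0^1 f(x)\,dx\).
(ii) Hence evaluate \(\lim_{n \to \infty} \frac{1}{n} \left( \frac{\sqrt[3]{1} + \sqrt[3]{2} + \cdots + \sqrt[3]{n}}{\sqrt[3]{n}} \right)\).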
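The claim in part (i) can be seen numerically: the expression in part (ii) is a right-endpoint Riemann sum of the cube root function on [0, 1], because each term of the sum divided by the cube root of n is the cube root of r/n. The sketch below evaluates it for increasing n; the function name `scaled_cube_root_sum` is an illustrative choice, and this is a numerical check, not the algebraic evaluation the question asks for.

```python
def scaled_cube_root_sum(n):
    """(1/n) * (cbrt(1) + ... + cbrt(n)) / cbrt(n), written as a Riemann sum.

    Each term cbrt(r) / cbrt(n) equals (r/n) ** (1/3), so this is the
    right-endpoint sum of x ** (1/3) on [0, 1] with n equal strips.
    """
    return sum((r / n) ** (1 / 3) for r in range(1, n + 1)) / n


for n in (10, 1000, 100000):
    print(n, scaled_cube_root_sum(n))
```

As n grows, the printed values should settle towards the value of the corresponding definite integral.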
Exercise 379. (9740 N2015/I/4. Answer on p. 1283.) A piece of wire of fixed length d
m is cut into two parts. One part is bent into the shape of a rectangle with sides of length
x m and y m. The other part is bent into the shape of a semicircle, including its diameter.
The radius of the semicircle is x m. Show that the maximum value of the total area of the two shapes can be expressed as \(kd^2\ \text{m}^2\), where k is a constant to be found.
Exercise 380. (9740 N2015/I/6. Answer on p. 1283.) (i) Write down the first three non-zero terms in the Maclaurin series for \(\ln(1 + 2x)\), where −0.5 < x ≤ 0.5, simplifying the coefficients. [2]
(ii) It is given that the three terms found in part (i) are equal to the first three terms in the series expansion of \(ax(1 + bx)^c\) for small x. Find the exact values of the constants a, b and c and use these values to find the coefficient of \(x^4\) in the expansion of \(ax(1 + bx)^c\),
[Diagram: graphs of y = sin x and y = cos x against x from O to 0.5π, with the point P and the regions A1 and A2 labelled.]
(i) Show that $\frac{A_1}{A_2} = \sqrt{2}$. [4]
(ii) The region bounded by y = sin x between O and P, the line $y = 0.5\sqrt{2}$ and the y-axis is
rotated about the y-axis through 360°. Show that the volume of the solid formed is given
by
$$\pi \int_0^{0.5\sqrt{2}} \left(\sin^{-1} y\right)^2 dy.$$ [2]
(iii) Show that the substitution y = sin u transforms the integral in part (ii) to $\pi \int_a^b u^2 \cos u \, du$,
for limits a and b to be determined. Hence find the exact volume. [6]
(iii) Show that the area of the region bounded by C and the x-axis is given by
$$\int_0^{0.5\pi} 9 \sin^4\theta \cos^2\theta \, d\theta.$$
Use your calculator to find the area, giving your answer correct to 3 decimal places. [3]
The line with equation y = ax, where a is a positive constant, meets C at the origin and at
the point P.
(iv) Show that $\tan\theta = \frac{3}{a}$ at P. Find the exact value of a such that the line passes through
the maximum point of C. [3]
Exercise 383. (9740 N2015/II/1. Answer on p. 1285.) As a tree grows, the rate of
increase of its height, h m, with respect to time, t years after planting, is modelled by the
differential equation
$$\frac{dh}{dt} = \frac{1}{10}\sqrt{16 - 0.5h}.$$
(i) State the maximum height of the tree, according to this model. [1]
(ii) Find an expression for t in terms of h, and hence find the time the tree takes to reach
half its maximum height. [5]
Exercise 384. (9740 N2014/I/2. Answer on p. 1286.) The curve C has equation
$x^2 y + xy^2 + 54 = 0$. Without using a calculator, find the coordinates of the point on C at which
the gradient is −1, showing that there is only one such point. [6]
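The question forbids a calculator for the solution itself, but a candidate answer to Exercise 384 can still be checked afterwards. The sketch below assumes my own working: implicit differentiation gives $\frac{dy}{dx} = -\frac{2xy + y^2}{x^2 + 2xy}$, setting this to −1 forces $y^2 = x^2$, the branch y = −x would require 54 = 0, and y = x gives $2x^3 + 54 = 0$. The names `F` and `gradient` are illustrative only.

```python
def F(x: float, y: float) -> float:
    """Left-hand side of the curve equation x^2 y + x y^2 + 54 = 0."""
    return x**2 * y + x * y**2 + 54

def gradient(x: float, y: float) -> float:
    """dy/dx from implicit differentiation (my own working, see lead-in)."""
    return -(2 * x * y + y**2) / (x**2 + 2 * x * y)

# Candidate point from the branch y = x, where 2x^3 + 54 = 0.
x0, y0 = -3.0, -3.0

print("F at candidate:", F(x0, y0))
print("gradient at candidate:", gradient(x0, y0))

# On the other branch y = -x the two cubic terms cancel, leaving 54.
print("F on y = -x at x = 2:", F(2.0, -2.0))
```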
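[Diagram: a curve drawn against the x- and y-axes with origin O; α is marked on the x-axis, −7 on the y-axis, and the point (β, −7) is labelled.]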
(i) Find the value of α, giving your answer correct to 3 decimal places, and find the exact
value of β. [2]
(ii) Evaluate $\int_\beta^\alpha f(x)\,dx$, giving your answer correct to 3 decimal places. [2]
(iii) Find, in terms of $\sqrt{3}$, the area of the finite region bounded by the curve and the line,
for x ≥ 0. [3]
(iv) Show that f (x) = f (−x). What can be said about the six roots of the equation
f (x) = 0? [4]
Exercise 386. (9740 N2014/I/8. Answer on p. 1287.) It is given that $f(x) = \frac{1}{\sqrt{9 - x^2}}$,
where −3 < x < 3.
(ii) Find the binomial expansion for f (x), up to and including the term in $x^6$. Give the
coefficients as exact fractions in their simplest form. [4]
(iii) Hence, or otherwise, find the first four non-zero terms of the Maclaurin series for
$\sin^{-1}\frac{x}{3}$. Give the coefficients as exact fractions in their simplest form. [4]
$$\frac{dx}{dt} = k(1 + x - x^2),$$
where 0 ≤ x ≤ 0.5 and k is a constant. It is given that x = 0.5 and $\frac{dx}{dt} = -0.25$ when t = 0.
(i) Show that k = −0.2. [1]
(a) the exact time taken for the mass of the substance present in the chemical reaction to
become half of its initial value, [1]
(b) the time taken for there to be none of the substance present in the chemical reaction,
giving your answer correct to 3 decimal places. [1]
(iv) Express the solution of the differential equation in the form x = f (t) and sketch the
part of the curve with this equation which is relevant in this context. [5]
(ii) Find the two solutions to the equation in part (i) for which r > 0, giving your answers
correct to 3 decimal places. [2]
(iii) Show that one of the solutions found in part (ii) does not give a stationary value of V .
Hence write down the value of $r_1$ and find the corresponding value of h. [3]
(iv) Sketch the graph showing the volume of the toy as the radius of the hemisphere varies.
[3]
$$\int_0^2 \frac{9x^2 + x - 13}{(2x - 5)(x^2 + 9)}\,dx.$$
Give your answer in the form a ln b + c tan−1 d, where a, b, c and d are rational numbers to
be determined. [9]
$$f(x) = \begin{cases} \sqrt{1 - \dfrac{x^2}{a^2}} & \text{for } -a \le x \le a, \\ 0 & \text{for } a < x < 2a, \end{cases}$$
and that f (x + 3a) = f (x) for all real values of x, where a is a real constant.
(i) Sketch the graph of y = f (x) for −4a ≤ x ≤ 6a. [3]
(ii) Use the substitution x = a sin θ to find the exact value of $\int_{a/2}^{\sqrt{3}a/2} f(x)\,dx$ in terms of a
and π. [5]
Exercise 391. (9740 N2013/I/10. Answer on p. 1292.) The variables x, y and z are
connected by the following differential equations:
$$\frac{dz}{dx} = 3 - 2z \quad (A) \qquad \text{and} \qquad \frac{dy}{dx} = z \quad (B).$$
(i) Given that z < 1.5, solve equation (A) to find z in terms of x.
(ii) Hence find y in terms of x.
(iii) Use the result in part (ii) to show that $\frac{d^2y}{dx^2} = a\frac{dy}{dx} + b$, for constants a and b to be
determined. [3]
You can skip part (iv) if you’re taking the 9758 (revised) exam.
(iv) The result in part (ii) represents a family of curves. Some members of the family are
straight lines. Write down the equations of two of these lines. On a single diagram, sketch
one of your lines together with a non-linear member of the family of curves that has your
line as an asymptote. [4]
(ii) Points P and Q on C have parameters p and q respectively. The tangent at P meets
the tangent at Q at the point R. Show that the x-coordinate of R is $p^2 + pq + q^2$, and find
the y-coordinate of R in terms of p and q. Given that pq = −1, show that R lies on the
curve with equation $x = y^2 + 1$. [5]
A curve L has equation $x = y^2 + 1$. The diagram shows the parts of C and L for which y ≥ 0.
The curves C and L touch at the point M.
(iii) Show that $4t^6 - 3t^2 + 1 = 0$ at M. Hence, or otherwise, find the exact coordinates of M.
[3]
(iv) Find the exact value of the area of the shaded region bounded by C and L for which
y ≥ 0. [6]
[Diagrams: Fig. 1, Fig. 2 and Fig. 3, with the labels A, B, C, a and x.]
(i) Show that the volume V of the prism is given by $V = 0.25x\sqrt{3}\,(a - 2x\sqrt{3})^2$. [3]
(ii) Use differentiation to find, in terms of a, the maximum value of V , proving that it is a
maximum. [6]
Author’s remark: The question should have clearly stated that either a or x is a fixed
constant. Otherwise the volume V of the prism is unbounded — simply blow up both a and
x to ∞! My guess is that the writers of this question intended a to be the fixed constant.
Exercise 394. (9740 N2013/II/3. Answer on p. 1295.) (i) Given that f (x) = ln(1+2 sin x),
find f (0), f ′ (0), f ′′ (0) and f ′′′ (0). Hence write down the first three non-zero terms in the
Maclaurin series for f (x). [7]
(ii) The first two non-zero terms in the Maclaurin series for f (x) are equal to the first two
non-zero terms in the series expansion of $e^{ax} \sin nx$. Using appropriate expansions from the
List of Formulae (MF15), find the constants a and n. Hence find the third non-zero term
of the series expansion of $e^{ax} \sin nx$ for these values of a and n. [5]
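[Diagram: triangle ABC, with a side of length 1, an angle of 0.75π and an angle θ marked.]
(i) Show that $AC = \frac{1}{\cos\theta - \sin\theta}$. [4]
(ii) Given that θ is a sufficiently small angle, show that $AC \approx 1 + a\theta + b\theta^2$, for constants a
and b to be determined. [4]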
Exercise 397. (9740 N2012/I/8. Answer on p. 1297.) The curve C has equation
$x - y = (x + y)^2$. It is given that C has only one turning point.
(i) Show that $1 + \frac{dy}{dx} = \frac{2}{2x + 2y + 1}$. [4]
(ii) Hence, or otherwise, show that $\frac{d^2y}{dx^2} = -\left(1 + \frac{dy}{dx}\right)^3$. [3]
(iii) Hence, state, with a reason, whether the turning point is a maximum or a minimum.
[2]
(i) It is given that the volume of the model is a fixed value k cm$^3$, and the external surface
area is a minimum. Use differentiation to find the values of r and h in terms of k. Simplify
your answers. [7]
(ii) It is given instead that the volume of the model is 200 cm$^3$ and its external surface area
is 180 cm$^2$. Show that there are two possible values of r. Given also that r < h, find the
value of r and the value of h. [5]
(iv) A point P on C has parameter p, where 0 < p < 0.5π. Show that the normal to C at P
crosses the x-axis at the point with coordinates (p, 0). [3]
(ii) Hence find the coordinates of the points Q and R where this tangent meets the x- and
y-axes respectively. [2]
(iii) Find a cartesian equation of the locus of the mid-point of QR as p varies.
Exercise 402. (9740 N2011/I/4. Answer on p. 1301.) (i) Use the first three non-zero
terms of the Maclaurin series for cos x to find the Maclaurin series for g(x), where
$g(x) = \cos^6 x$, up to and including the term in $x^4$. [3]
(ii) (a) Use your answer to part (i) to give an approximation for $\int_0^a g(x)\,dx$ in terms of a,
and evaluate this approximation in the case where $a = \frac{\pi}{4}$. [3]
(i) On separate diagrams, sketch the graphs of y = f (∣x∣) and y = ∣f (x)∣, giving the coordinates
of any points where the graphs meet the x- and y-axes. You should label the graphs
clearly. [3]
(ii) A stone is dropped from a stationary balloon. It leaves the balloon with zero speed, and
t seconds later its speed v metres per second satisfies the differential equation $\frac{dv}{dt} = 10 - 0.1v^2$.
(a) Find t in terms of v. Hence find the exact time the stone takes to reach a speed of 5
metres per second. [5]
(b) Find the speed of the stone after 1 second. [3]
(c) What happens to the speed of the stone for large values of t?
Exercise 405. (9740 N2011/II/2. Answer on p. 1303.) The diagram shows a rectangular
piece of cardboard ABCD of sides n metres and 2n metres, where n is a positive constant.
A square of side x metres is removed from each corner of ABCD. The remaining shape is
now folded along P Q, QR, RS and SP to form an open rectangular box of height x metres.
[Diagram: rectangle ABCD with sides n and 2n, a square of side x marked at a corner, and the inner rectangle PQRS.]
(i) Show that the volume V cubic metres of the box is given by $V = 2n^2x - 6nx^2 + 4x^3$.
(ii) Without using a calculator, find in surd form the value of x that gives a stationary
value of V , and explain why there is only one answer. [6]
Author’s remark: Although not stated, my guess is that the writers of this question intended
a to be the fixed constant, so that is how I approach this question.
(b) The region bounded by the curve $y = \frac{4x}{x^2 + 1}$, the axis and the lines x = 0 and x = 1 is
rotated through 2π radians about the x-axis. Use the substitution x = tan θ to show that
the volume of the solid obtained is given by $16\pi \int_0^{\pi/4} \sin^2\theta \, d\theta$, and evaluate this integral
exactly. [6]
Exercise 407. (9740 N2010/I/2. Answer on p. 1304.) (i) Find the first three terms of
the Maclaurin series for $e^x(1 + \sin 2x)$. [You may use standard results given in the List of
Formulae (MF15).] [3]
(ii) It is given that the first two terms of this series are equal to the first two terms in the
series expansion, in ascending powers of x, of $\left(1 + \frac{4}{3}x\right)^n$. Find n and show that the third
terms in each of these series are equal. [3]
Exercise 408. (9740 N2010/I/4. Answer on p. 1305.) (i) Given that $x^2 - y^2 + 2xy + 4 = 0$,
find $\frac{dy}{dx}$ in terms of x and y. [4]
(ii) For the curve with equation $x^2 - y^2 + 2xy + 4 = 0$, find the coordinates of each point at
which the tangent is parallel to the x-axis. [4]
[Diagram: the x-axis marked, from left to right, with α, −1, O, β, 1 and γ.]
(i) Find the values of β and γ, giving your answers correct to 3 decimal places. [2]
(ii) Find the area of the region bounded by the curve and the x-axis between x = β and
x = γ. [2]
(iii) Use a non-calculator method to find the area of the region bounded by the curve and
the red line, where x ≤ 0. [4]
(iv) Find the set of values of k for which the equation $x^3 - 3x + 1 = k$ has three real distinct
roots. [2]
Exercise 410. (9740 N2010/I/7. Answer on p. 1306.) (i) A bottle containing liquid is
taken from a refrigerator and placed in a room where the temperature is a constant 20 °C.
As the liquid warms up, the rate of increase of its temperature θ °C after time t minutes
is proportional to the temperature difference (20 − θ) °C. Initially the temperature of the
liquid is 10 °C and the rate of increase of the temperature is 1 °C per minute. By setting
up and solving a differential equation, show that $\theta = 20 - 10e^{-0.1t}$. [7]
(ii) Find the time it takes the liquid to reach a temperature of 15 °C, and state what
happens to θ for large values of t. Sketch a graph of θ against t. [4]
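The stated result in Exercise 410 can be sanity-checked numerically. The sketch below is only a check, not the derivation the question asks for. It assumes the model $\frac{d\theta}{dt} = k(20 - \theta)$ with k = 0.1 (from the initial rate of 1 °C per minute at θ = 10), and a candidate time of 10 ln 2 minutes for part (ii) from my own working. The function names are illustrative.

```python
import math

K = 0.1  # assumption: 1 = K * (20 - 10) from the initial conditions

def theta(t: float) -> float:
    """Temperature in degrees C after t minutes, as stated in the question."""
    return 20 - 10 * math.exp(-0.1 * t)

def slope(t: float, h: float = 1e-6) -> float:
    """Central-difference estimate of d(theta)/dt at time t."""
    return (theta(t + h) - theta(t - h)) / (2 * h)

t_15 = 10 * math.log(2)  # candidate time at which theta reaches 15

for t in (0.0, 5.0, t_15, 60.0):
    # The last two columns should agree if theta satisfies the model.
    print(f"t={t:7.3f}  theta={theta(t):8.4f}  "
          f"slope={slope(t):.6f}  K*(20-theta)={K * (20 - theta(t)):.6f}")
```

For large t the printed temperature should creep up towards the room temperature, which is the behaviour part (ii) asks you to state.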
[Diagrams: Box (labelled 3x, x and y) and Lid (labelled 3x, x and ky).]
(i) Use differentiation to find, in terms of k, the value of x which gives a minimum total
external surface area of the box and the lid. [6]
Author’s remark: It is not clear and so I interpret “external surface area” to mean the
external surface area when the lid is placed over the box.
Another possible and entirely reasonable interpretation is that this refers to the total external
surface area when the box and lid are kept apart, as depicted in the diagram! This would
yield a different answer.
(ii) Find also the ratio of the height to the width, $\frac{y}{x}$, in this case, simplifying your answer.
[2]
(iii) Find the values between which $\frac{y}{x}$ must lie. [2]
(iv) Find the value of k for which the box has square ends. [4]
(i) The point P on the curve has parameter p. Show that the equation of the tangent at P
is $(p^2 + 1)x - (p^2 - 1)y = 4p$.
(ii) The tangent at P meets the line y = x at the point A and the line y = −x at the point
B. Show that the area of triangle OAB is independent of p, where O is the origin. [4]
(iii) Find a cartesian equation of C. Sketch C, giving the coordinates of any points where
C crosses the x- and y-axes and the equations of any asymptotes. [4]
Exercise 413. (9740 N2010/II/3. Answer on p. 1309.) (i) Given that $y = x\sqrt{x + 2}$, find
$\frac{dy}{dx}$, expressing your answer as a single algebraic fraction. Hence, show that there is only
one value of x for which the curve $y = x\sqrt{x + 2}$ has a turning point, and state this value.
(ii) A curve has equation $y^2 = x^2(x + 2)$.
(a) Find exactly the possible values of the gradient at the point where x = 0. [2]
(b) Sketch the curve $y^2 = x^2(x + 2)$.
(iii) On a separate diagram sketch the graph of y = f ′ (x), where $f(x) = x\sqrt{x + 2}$. State the
equations of any asymptotes. [2]
Exercise 414. (9740 N2009/I/2. Answer on p. 1311.) Find the exact value of p such that
$$\int_0^1 \frac{1}{4 - x^2}\,dx = \int_0^{\frac{1}{2p}} \frac{1}{\sqrt{1 - p^2x^2}}\,dx.$$ [5]
$$f(x) = \begin{cases} 7 - x^2 & \text{for } 0 < x \le 2, \\ 2x - 1 & \text{for } 2 < x \le 4, \end{cases}$$
(ii) Given that the first two non-zero terms in the Maclaurin series for f (x) are equal to
the first two non-zero terms in the series expansion of $\frac{1}{a + bx^2}$, where a and b are constants,
find a and b in terms of e. [4]
Exercise 417. (9740 N2009/I/11. Answer on p. 1312.) The curve C has equation
y = f (x), where $f(x) = xe^{-x^2}$.
(ii) Find the exact coordinates of the turning points on the curve. [4]
(iii) Use the substitution $u = x^2$ to find $\int_0^n f(x)\,dx$, for n > 0. Hence find the area of the
region between the curve and the positive x-axis. [4]
(iv) Find the exact value of $\int_{-2}^{2} |f(x)|\,dx$.
(v) Find the volume of revolution when the region bounded by the curve, the lines x = 0,
x = 1 and the x-axis is rotated completely about the x-axis. Give your answer correct to 3
significant figures. [2]
Exercise 418. (9740 N2009/II/1. Answer on p. 1313.) The curve C has parametric
equations $x = t^2 + 4t$, $y = t^3 + t^2$.
(iii) The tangent l meets C again at the point Q. Use a non-calculator method to find the
coordinates of Q. [4]
Exercise 420. (9740 N2008/I/1. Answer on p. 1315.) The diagram shows the curve with
equation $y = x^2$. The area of the region bounded by the curve, the lines x = 1, x = 2 and
the x-axis is equal to the area of the region bounded by the curve, the lines y = a, y = 4
and the y-axis, where a < 4. Find the value of a. [4]
[Diagram: the curve, with 1 and 2 marked on the x-axis.]
(iii) What can you say about the gradient of every solution curve as x → ±∞? [1]
You can skip part (iv) if you’re taking the 9758 (revised) exam.
(iv) Sketch, on a single diagram, the graph of the solution found in part (ii), together with
2 other members of the family of solution curves. [3]
Exercise 422. (9740 N2008/I/5. Answer on p. 1316.) (i) Find the exact value of
$\int_0^{1/\sqrt{3}} \frac{1}{1 + 9x^2}\,dx$. [3]
(ii) Find, in terms of n and e, $\int_1^e x^n \ln x\,dx$, where n ≠ −1. [4]
Exercise 423. (9740 N2008/I/6. Answer on p. 1316.) (a) In the triangle ABC, AB = 1,
BC = 3 and ∠ABC = θ radians. Given that θ is a sufficiently small angle, show that
$AC \approx \sqrt{4 + 3\theta^2} \approx a + b\theta^2$, for constants a and b to be determined. [5]
(b) Given that $f(x) = \tan\left(2x + \frac{\pi}{4}\right)$, find f (0), f ′ (0) and f ′′ (0). Hence find the first 3 terms
in the Maclaurin series of f (x). [5]
Exercise 426. (9740 N2008/II/2. Answer on p. 1319.) The diagram shows the curve C
with equation $y^2 = x\sqrt{1 - x}$. The region enclosed by C is denoted by R.
[Diagram: the curve C, with O, 0.5 and 1 marked on the x-axis and 0.5 and −0.5 marked on the y-axis.]
(i) Write down an integral that gives the area of R, and evaluate this integral numerically.
[3]
(ii) The part of R above the x-axis is rotated through 2π radians about the x-axis. By
using the substitution u = 1 − x, or otherwise, find the exact value of the volume obtained.
[3]
(iii) Find the exact x-coordinate of the maximum point of C. [3]
Exercise 428. (9233 N2008/I/3. Answer on p. 1319.) Show that $\int_0^1 xe^{-2x}\,dx = \frac{1}{4} - \frac{3}{4}e^{-2}$.
[5]
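Before attempting the integration by parts in Exercise 428, it can be reassuring to confirm numerically that the target value is right. The sketch below uses composite Simpson's rule, which is my choice of method and not something the question mentions; the helper name `simpson` is hypothetical.

```python
import math

def simpson(f, a: float, b: float, n: int = 1000) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

numeric = simpson(lambda x: x * math.exp(-2 * x), 0.0, 1.0)
closed_form = 0.25 - 0.75 * math.exp(-2)  # the value stated in the question

print("Simpson estimate:", numeric)
print("Stated value:    ", closed_form)
```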
Exercise 430. (9233 N2008/I/6. Answer on p. 1320.) (i) Given that 0 < a < b, sketch the
graph of y = ∣x − a∣ for −b ≤ x ≤ b. [3]
(ii) Find $\int_{-b}^{b} |x - a|\,dx$. [2]
Exercise 431. (9233 N2008/I/8. Answer on p. 1320.) Find the exact value of a for which
$$\int_a^\infty \frac{1}{4 + x^2}\,dx = \int_{1/2}^{\sqrt{3}/2} \frac{1}{\sqrt{1 - x^2}}\,dx.$$ [5]
Exercise 432. (9233 N2008/I/10. Answer on p. 1321.) (i) Prove that the substitution
y = xz reduces the differential equation $xy\frac{dy}{dx} = x^2 + y^2$ to $xz\frac{dz}{dx} = 1$. [3]
(ii) Hence find the solution of the differential equation $xy\frac{dy}{dx} = x^2 + y^2$ for which y = 6 when
x = 2. [5]
Exercise 433. (9233 N2008/I/13. Answer on p. 1321.) A curve is defined by the parametric
equations $x = \cos^3 t$, $y = \sin^3 t$, for $0 < t < \frac{\pi}{4}$.
(i) Show that the equation of the normal to the curve at the point $P(\cos^3 t, \sin^3 t)$ is
$x \cos t - y \sin t = \cos^4 t - \sin^4 t$. [5]
(ii) Prove the identity $\cos^4 t - \sin^4 t \equiv \cos 2t$. [2]
(iii) The normal at P meets the x-axis at A and the y-axis at B. Show that the length of
AB can be expressed in the form k cot 2t, where k is a constant to be found. [5]
Exercise 435. (9233 N2008/II/5. Answer on p. 1322.) (i) Show that the derivative of
the function $\ln(1 + x) - \frac{2x}{x + 2}$ is never negative. [5]
(ii) Hence show that $\ln(1 + x) \ge \frac{2x}{x + 2}$ when x ≥ 0. [3]
Exercise 436. (9740 N2007/I/4. Answer on p. 1322.) The current I in an electric circuit
at time t satisfies the differential equation $4\frac{dI}{dt} = 2 - 3I$.
(i) Find I in terms of t, given that I = 2 when t = 0. [6]
(ii) State what happens to the current in this circuit for large values of t. [1]
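A candidate solution to Exercise 436 can be checked by substituting it back into the differential equation. The sketch below assumes the candidate $I = \frac{2}{3} + \frac{4}{3}e^{-0.75t}$, which comes from my own separation-of-variables working and is not quoted from the book's answer; `current` and `current_rate` are illustrative names.

```python
import math

def current(t: float) -> float:
    """Candidate solution I(t) (assumption: my own working, not the book's)."""
    return 2 / 3 + (4 / 3) * math.exp(-0.75 * t)

def current_rate(t: float) -> float:
    """Derivative dI/dt of the candidate, differentiated by hand."""
    return -math.exp(-0.75 * t)

def residual(t: float) -> float:
    """4 dI/dt - (2 - 3I); zero if the candidate satisfies the equation."""
    return 4 * current_rate(t) - (2 - 3 * current(t))

print("I(0) =", current(0.0))  # should match the initial condition I = 2
for t in (0.0, 1.0, 5.0, 50.0):
    print(f"t={t:5.1f}  I={current(t):.6f}  residual={residual(t):.2e}")
```

The printed values for large t also hint at the answer to part (ii).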
Exercise 437. (9740 N2007/I/11. Answer on p. 1323.) A curve has parametric equations
$x = \cos^2 t$, $y = \sin^3 t$, for 0 ≤ t ≤ 0.5π.
(i) Sketch the curve. [2]
(ii) The tangent to the curve at the point $(\cos^2\theta, \sin^3\theta)$, where 0 < θ < 0.5π, meets the x-
and y-axes at Q and R respectively. The origin is denoted by O. Show that the area of
△OQR is
$$\frac{1}{12}\sin\theta\,(3\cos^2\theta + 2\sin^2\theta)^2.$$ [6]
(iii) Show that the area under the curve for 0 ≤ t ≤ 0.5π is $2\int_0^{0.5\pi} \cos t \sin^4 t\,dt$, and use the
substitution sin t = u to find this area. [5]
(ii) The region R is bounded by the curve $y = x^2 \sin x$, the line x = 0.5π and the part of the
x-axis between 0 and 0.5π. Find
(a) the exact area of R, [5]
(b) the numerical value of the volume of revolution formed when R is rotated completely
about the x-axis, giving your answer correct to 3 decimal places. [2]
Exercise 440. (9233 N2007/I/2. Answer on p. 1325.) Find the first negative coefficient
in the expansion of $(4 + 3x)^{2.5}$ in a series of ascending powers of x, where $|x| < \frac{4}{3}$. Give your
answer as a fraction in its lowest terms. [3]
Exercise 441. (9233 N2007/I/3. Answer on p. 1325.) The region bounded by the curve
$y = \frac{1}{\sqrt{1 + 4x^2}}$, the x-axis and the lines x = 0.5 and $x = 0.5\sqrt{3}$ is rotated through 4 right
angles about the x-axis to form a solid of revolution of volume V . Find the exact value of
V , giving your answer in the form $k\pi^2$. [5]
Exercise 442. (9233 N2007/I/8. Answer on p. 1325.) Use the substitution t = sin u to
show that $\int \frac{(\sin^{-1} t)\cos\left[(\sin^{-1} t)^2\right]}{\sqrt{1 - t^2}}\,dt$ simplifies to $\int u \cos u^2\,du$. [3]
Hence evaluate $\int_0^1 \frac{(\sin^{-1} t)\cos\left[(\sin^{-1} t)^2\right]}{\sqrt{1 - t^2}}\,dt$. [4]
Exercise 443. (9233 N2007/I/10. Answer on p. 1326.) (i) By sketching the graphs of
y = cos x and y = sin x, or otherwise, solve the inequality cos x > sin x for 0 ≤ x ≤ 2π. [3]
(ii) Evaluate $\int_0^{2\pi} |\cos x - \sin x|\,dx$. [5]
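For Exercise 443(ii), a rough numerical estimate is a useful guard against sign errors when splitting the integral at the points where cos x = sin x. The sketch below uses a plain midpoint rule (my choice, not the book's method) and compares it with the candidate value $4\sqrt{2}$ from my own working; `midpoint_rule` is a hypothetical helper.

```python
import math

def midpoint_rule(f, a: float, b: float, n: int = 200_000) -> float:
    """Midpoint-rule estimate of the integral of f over [a, b] using n strips."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

numeric = midpoint_rule(lambda x: abs(math.cos(x) - math.sin(x)), 0.0, 2 * math.pi)
candidate = 4 * math.sqrt(2)  # assumption: value from splitting at pi/4 and 5*pi/4

print("Midpoint estimate:", numeric)
print("Candidate value:  ", candidate)
```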
Exercise 444. (9233 N2007/I/11. Answer on p. 1326.) Use partial fractions to evaluate
$\int_1^4 \frac{5x + 4}{(x - 5)(x^2 + 4)}\,dx$, giving your answer in the form − ln a, where a is a positive integer.
[9]
Exercise 447. (9233 N2006/I/7. Answer on p. 1329.) A hollow cone of semi-vertical angle
45° is held with its axis vertical and vertex downwards (see diagram). At the beginning of
an experiment, it is filled with 390 cm$^3$ of liquid. The liquid runs out through a small hole
at the vertex at a constant rate of 2 cm$^3$ s$^{-1}$. Find the rate at which the depth of the liquid
is decreasing 3 minutes after the start of the experiment. [6]
[Diagram: the cone, with the 45° semi-vertical angle marked.]
Exercise 449. (9233 N2006/I/9. Answer on p. 1329.) (i) Use the derivative of cos θ to
show that $\frac{d}{d\theta}\sec\theta = \sec\theta\tan\theta$. [2]
(ii) Use the substitution x = sec θ − 1 to find the exact value of
$$\int_{\sqrt{2}-1}^{1} \frac{1}{(x + 1)\sqrt{x^2 + 2x}}\,dx.$$ [6]
Exercise 450. (9233 N2006/I/12. Answer on p. 1330.) (i) Express $f(x) = \frac{1 + x - 2x^2}{(2 - x)(1 + x^2)}$
in partial fractions. [4]
(ii) Expand f (x) in ascending powers of x, up to and including the term in $x^2$. [5]
(iii) State the set of values of x for which the expansion is valid. [1]
Exercise 452. (9233 N2006/II/2. Answer on p. 1331.) (i) Given that $z = x/\sqrt{x^2 + 32}$,
show that $dz/dx = 32(x^2 + 32)^{-1.5}$. [3]
(ii) Find the exact value of the area of the region bounded by the curve $y = (x^2 + 32)^{-1.5}$,
the x-axis and the lines x = 2 and x = 7. [3]
Exercise 453. (9740 N2015/II/5. Answer on p. 1332.) You can skip this entire question
if you’re taking the 9758 (revised) exam. The manager of a busy supermarket wishes to
conduct a survey of the opinions of customers of different ages about different types of cola
drink.
(i) Give a reason why the manager would not be able to use stratified sampling. [1]
(ii) Explain briefly how the manager could carry out a survey using quota sampling. [2]
(iii) Give one reason why quota sampling would not necessarily provide a sample which is
representative of the customers of the supermarket. [1]
Exercise 454. (9740 N2015/II/6. Answer on p. 1332.) ‘Droppers’ are small sweets that
are made in a variety of colours. Droppers are sold in packets and the colours of the sweets
in a packet are independent of each other. On average, 25% of Droppers are red.
(i) A small packet of Droppers contains 10 sweets. Find the probability that there are at
least 4 red sweets in a small packet. [2]
You can skip parts (ii) and (iii) if you’re taking the 9758 (revised) exam.
A large packet of Droppers contains 100 sweets.
(ii) Use a suitable approximation, which should be stated, to find the probability that a
large packet contains at least 30 red sweets. [3]
(iii) Yip buys 15 large packets of Droppers. Find the probability that no more than 3 of
these packets contain at least 30 red sweets. [2]
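The three parts of Exercise 454 can be cross-checked with a few lines of code. This is a sketch under stated assumptions: the number of red sweets is modelled as binomial, part (ii) uses a normal approximation with a continuity correction (one common choice of "suitable approximation", not necessarily the one the mark scheme expects), and part (iii) feeds that approximate probability into a second binomial model. The helper `binom_pmf` is hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# (i) Small packet: X ~ B(10, 0.25), find P(X >= 4).
p_small = 1 - sum(binom_pmf(10, k, 0.25) for k in range(4))

# (ii) Large packet: Y ~ B(100, 0.25), approximated by N(25, 18.75).
# Assumption: continuity correction, so P(Y >= 30) is taken as P(N > 29.5).
p_large_approx = 1 - NormalDist(mu=25, sigma=sqrt(18.75)).cdf(29.5)
p_large_exact = sum(binom_pmf(100, k, 0.25) for k in range(30, 101))

# (iii) 15 large packets: W ~ B(15, p_large_approx), find P(W <= 3).
p_at_most_3 = sum(binom_pmf(15, k, p_large_approx) for k in range(4))

print("(i)   P(X >= 4)          ", round(p_small, 4))
print("(ii)  normal approx      ", round(p_large_approx, 4))
print("(ii)  exact binomial     ", round(p_large_exact, 4))
print("(iii) P(at most 3 of 15) ", round(p_at_most_3, 4))
```

Comparing the approximate and exact values in part (ii) shows how much the approximation costs here.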
Exercise 455. (9740 N2015/II/7. Answer on p. 1333.) You can skip this entire question
if you’re taking the 9758 (revised) exam.
The average number of errors per page for a certain daily newspaper is being investigated.
(i) State, in context, two assumptions that need to be made for the number of errors per
page to be well modelled by a Poisson distribution. [2]
Assume that the number of errors per page has the distribution Po(1.3).
(ii) Find the probability that, on one day, there are more than 10 errors altogether on the
first 6 pages. [3]
(iii) The probability that there are fewer than 2 errors altogether on the first n pages of
the newspaper is less than 0.05. Write down an inequality in terms of n to represent this
information, and hence find the least possible value of n. [2]
(i) Find unbiased estimates of the population mean and variance of the mass of pineapples.
You can skip part (ii) of this question if you’re taking the 9758 (revised) exam.
(ii) Test the stall owner’s claim at the 10% level of significance. [7]
Exercise 457. (9740 N2015/II/9. Answer on p. 1334.) For events A, B and C it is given
that P (A) = 0.45, P (B) = 0.4, P (C) = 0.3 and P (A ∩ B ∩ C) = 0.1. It is also given that
events A and B are independent, and that events A and C are independent.
(i) Find P (B∣A). [1]
(ii) Given also that events B and C are independent, find P (A′ ∩ B ′ ∩ C ′ ). [3]
(iii) Given instead that events B and C are not independent, find the greatest and least
possible values of P (A′ ∩ B ′ ∩ C ′ ). [4]
h 2000 5000 10000 15000 20000 25000 30000 35000 40000 45000
P 27.8 24.9 20.6 16.9 13.8 11.1 8.89 7.04 5.52 4.28
(i) Draw a scatter diagram for these values, labelling the axes. [1]
(ii) Write down the number of these arrangements in which the letters are not in alphabetical
order. [1]
(iii) Find the number of different arrangements that can be made with both the A’s together
and both the B’s together. [2]
(iv) Find the number of different arrangements that can be made with no two adjacent
letters the same. [4]
Exercise 460. (9740 N2015/II/12. Answer on p. 1336.) In this question you should state
clearly the values of the parameters of any normal distribution you use.
The masses in grams of apples have the distribution $N(300, 20^2)$ and the masses in grams
of pears have the distribution $N(200, 15^2)$. A certain recipe requires 5 apples and 8 pears.
(i) Find the probability that the total mass of 5 randomly chosen apples is more than 1600
grams. [2]
(ii) Find the probability that the total mass of 5 randomly chosen apples is more than the
total mass of 8 randomly chosen pears. [3]
The recipe requires the apples and pears to be prepared by peeling them and removing the
cores. This process reduces the mass of each apple by 15% and the mass of each pear by
10%.
(iii) Find the probability that the total mass, after preparation, of 5 randomly chosen apples
and 8 randomly chosen pears is less than 2750 grams. [4]
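Exercise 460 asks you to state the parameters of every normal distribution you use, and the sketch below makes those parameters explicit. It assumes all fruit masses are independent (the question does not say so in as many words), so that means add and variances add, with a scale factor c multiplying a variance by c². The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

APPLE_MEAN, APPLE_VAR = 300, 20**2
PEAR_MEAN, PEAR_VAR = 200, 15**2

# (i) Total mass of 5 apples ~ N(1500, 2000).
apples_total = NormalDist(5 * APPLE_MEAN, sqrt(5 * APPLE_VAR))
p_i = 1 - apples_total.cdf(1600)

# (ii) (5 apples) - (8 pears) ~ N(-100, 2000 + 1800); we want P(difference > 0).
difference = NormalDist(5 * APPLE_MEAN - 8 * PEAR_MEAN,
                        sqrt(5 * APPLE_VAR + 8 * PEAR_VAR))
p_ii = 1 - difference.cdf(0)

# (iii) After preparation each apple keeps 85% and each pear 90% of its mass,
# so the total ~ N(0.85*1500 + 0.9*1600, 0.85^2*2000 + 0.9^2*1800).
prepared_mean = 0.85 * 5 * APPLE_MEAN + 0.90 * 8 * PEAR_MEAN
prepared_var = 0.85**2 * 5 * APPLE_VAR + 0.90**2 * 8 * PEAR_VAR
p_iii = NormalDist(prepared_mean, sqrt(prepared_var)).cdf(2750)

print("(i)  ", round(p_i, 4))
print("(ii) ", round(p_ii, 4))
print("(iii)", round(p_iii, 4), "with mean", prepared_mean, "and variance", prepared_var)
```

A frequent slip in part (iii) is to scale the variance by 0.85 and 0.9 instead of their squares; writing the parameters out as above makes that easy to spot.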
Exercise 461. (9740 N2014/II/5. Answer on p. 1337.) You can skip this entire question
if you’re taking the 9758 (revised) exam. An Internet retailer has compiled a list of 10000
regular customers and wishes to carry out a survey of customer opinions involving 5% of
its customers.
(i) Describe how the marketing manager could choose customers for this survey using
systematic sampling. [2]
(ii) Give one advantage and one disadvantage of systematic sampling in this context. [2]
(i) How many different teams can be formed by the club? [2]
One of the midfielders in the club is the brother of one of the attackers in the club.
(ii) How many different teams can be formed which include exactly one of the two brothers?
[3]
The two brothers leave the club. The club manager decides that one of the remaining
midfielders can play either as a midfielder or as a defender.
(iii) How many different teams can now be formed by the club? [3]
Exercise 463. (9740 N2014/II/7. Answer on p. 1338.) Yan is carrying out an experiment
with a fair 6-sided die and a biased 6-sided die, each numbered from 1 to 6.
(i) Yan rolls the fair die 10 times. Find the probability that it shows a 6 exactly thrice. [1]
You can skip parts (ii) and (iii) if you’re taking the 9758 (revised) exam.
(ii) Yan now rolls the fair die 60 times. Use a suitable approximate distribution, which
should be stated, to find the probability that the die shows a 6 between 5 and 8 times,
inclusive. [3]
m 11 20 28 36 40 47 58 62 68 75
P 112800 102600 76500 72000 72000 69000 65800 57000 50600 47600
It is thought that the price after m months can be modelled by one of the formulae
P = am + b, P = c ln m + d,
Exercise 465. (9740 N2014/II/9. Answer on p. 1340.) The number of minutes that the
0815 bus arrives late at my local bus stop has a normal distribution; the mean number of
minutes the bus is late has been 4.3. A new company takes over the service, claiming that
punctuality will be improved. After the new company takes over, a random sample of 10
days is taken and the number of minutes that the bus is late is recorded. The sample mean
is t̄ minutes and the sample variance is $k^2$ minutes$^2$. A test is to be carried out at the
10% level of significance to determine whether the mean number of minutes late has been
reduced.
(i) State appropriate hypotheses for the test, defining any symbols that you use. [2]
You can skip parts (ii) and (iii) of this question if you’re taking the 9758 (revised) exam.
(ii) Given that $k^2 = 3.2$, find the set of values of t̄ for which the result of the test would be
that the null hypothesis is not rejected. [4]
(iii) Given instead that t̄ = 4.0, find the set of values of $k^2$ for which the result of the test
would be to reject the null hypothesis. [3]
Set 1 + + + + × × × ◯ ◯ ⋆
Set 2 + + + × ◯ ◯ ◯ ◯ ⋆ ⋆
Set 3 + + × × × × ◯ ◯ ◯ ⋆
For example, if a + symbol is chosen from set 1, a ◯ symbol is chosen from set 2 and a ⋆
symbol is chosen from set 3, the display would be +◯⋆.
Exercise 467. (9740 N2014/II/11. Answer on p. 1341.) You can skip this entire question
if you’re taking the 9758 (revised) exam.
An art dealer sells both original paintings and prints. (Prints are copies of paintings.) It
is to be assumed that his sales of originals per week can be modelled by the distribution
Po(2) and his sales of prints per week can be modelled by the independent distribution
Po(11).
(i) Find the probability that, in a randomly chosen week,
(ii) The probability that the art dealer sells fewer than 3 originals in a period of n weeks is
less than 0.01. Express this information as an inequality in n, and hence find the smallest
possible integer value of n. [5]
(iii) Using a suitable approximation, which should be stated, find the probability that the
art dealer sells more than 550 prints in a year (52 weeks). [3]
(iv) Give two reasons in context why the assumptions made at the start of this question
may not be valid. [2]
You can skip part (ii) of this question if you’re taking the 9758 (revised) exam.
(ii) Name a more appropriate sampling method, and explain how it can be carried out to
provide the representative sample that the Chief Executive wants. [2]
Exercise 469. (9740 N2013/II/6. Answer on p. 1342.) The continuous random variable
Y has the distribution $N(\mu, \sigma^2)$. It is known that P (Y < 2a) = 0.95 and P (Y < a) = 0.25.
Express µ in the form ka, where k is a constant to be determined. [4]
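One way to check an answer to Exercise 469 is to standardise both probability statements and solve the resulting pair of linear equations. The sketch below does this with the standard library's inverse normal CDF. It assumes a > 0 and works in units of a, so `sigma_over_a` stands for σ/a and `k` for µ/a; both names are mine.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

z_95 = standard_normal.inv_cdf(0.95)  # (2a - mu) / sigma
z_25 = standard_normal.inv_cdf(0.25)  # (a - mu) / sigma, a negative number

# Subtracting the two standardised equations gives a = (z_95 - z_25) * sigma.
sigma_over_a = 1 / (z_95 - z_25)

# Then mu = a - z_25 * sigma, so k = mu / a is:
k = 1 - z_25 * sigma_over_a

print("sigma / a =", round(sigma_over_a, 4))
print("k = mu / a =", round(k, 4))

# Consistency check with a = 1: both stated probabilities should be recovered.
check = NormalDist(mu=k, sigma=sigma_over_a)
print("P(Y < 2a) =", round(check.cdf(2), 4), " P(Y < a) =", round(check.cdf(1), 4))
```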
You can skip part (iii) if you’re taking the 9758 (revised) exam.
(iii) Given instead that n = 60, use a suitable approximation to find the probability that F
is at least 5. State the parameter(s) of the distribution that you use. [3]
Exercise 471. (9740 N2013/II/8. Answer on p. 1343.) For events A and B it is given
that P (A) = 0.7, P (B∣A′ ) = 0.8 and P (A∣B ′ ) = 0.88. Find
(i) P (B ∩ A′ ), [1]
(ii) P (A′ ∩ B ′ ), [2]
(i) Calculate unbiased estimates of the population mean and variance. [2]
You can skip the rest of this question if you’re taking the 9758 (revised) exam.
The manufacturer claims that this model of car will travel 13.8 km per litre on average. It
is given that the distances travelled per litre for cars of this model are normally distributed.
(ii) Stating a necessary assumption, carry out a t-test of the magazine editor’s belief at the
5% significance level. [5]
Exercise 473. (9740 N2013/II/10. Answer on p. 1344.) (i) Sketch a scatter diagram that
might be expected when x and y are related approximately as given in each of the cases
(A), (B) and (C) below. In each case your diagram should include 6 points, approximately
equally spaced with respect to x, and with all x- and y-values positive. The letters a, b, c,
d, e and f represent constants.
(ii) Draw the scatter diagram for these values, labelling the axes. [1]
(iii) Explain which of the three cases in part (i) is the most appropriate for modelling these
values, and calculate the product moment correlation coefficient for this case. [2]
(iv) It is required to estimate the distance travelled at a speed of 110 km h$^{-1}$. Use the case
that you identified in part (iii) to find the equation of a suitable regression line, and use
your equation to find the required estimate. [3]
(ii) the second digit higher than the first digit, [2]
(iii) exactly two letters the same or two digits the same, but not both, [4]
(iv) exactly one vowel (A, E, I, O or U) and exactly one even digit. [4]
Exercise 475. (9740 N2013/II/12. Answer on p. 1346.) You can skip this entire question
if you’re taking the 9758 (revised) exam. A company has two departments and each depart-
ment records the number of employees absent through illness each day. Over a long period
of time it is found that the average numbers absent on a day are 1.2 for the Administration
Department and 2.7 for the Manufacturing Department.
(i) State, in this context, two conditions that must be met for the numbers of absences to
be well modelled by Poisson distributions. Explain why each of your two conditions may
not be met. [3]
For the remainder of this question assume that these conditions are met. You should assume
also that absences in the two departments are independent of each other.
(ii) Find the smallest number of days for which the probability that no employee is absent
through illness from the Administration Department is less than 0.01. [2]
Each employee absent on a day represents one ‘day of absence’. So, one employee absent
for 3 days contributes 3 days of absence, and 5 employees absent on 1 day contribute 5
days of absence.
(iii) Find the probability that, in a 5-day period, the total number of days of absence in
the two departments is more than 20. [3]
(iv) Use a suitable approximation, which should be stated together with its parameter(s),
to find the probability that, in a 60-day period, the total number of days of absence in the
two departments is between 200 and 250 inclusive. [4]
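A rough numerical check for parts (ii) to (iv) is sketched below, assuming (as the question instructs) that the Poisson conditions hold and the departments are independent. The helper poisson_cdf and the variable names are made up for this illustration. The normal approximation in part (iv) uses a continuity correction.

```python
import math
from statistics import NormalDist

ADMIN_RATE = 1.2   # mean absences per day, Administration
MANUF_RATE = 2.7   # mean absences per day, Manufacturing

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Po(lam), summed term by term."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

# (ii) Smallest n with P(no Administration absences in n days) = e^(-1.2 n) < 0.01.
n_days = 1
while math.exp(-ADMIN_RATE * n_days) >= 0.01:
    n_days += 1

# (iii) Total days of absence in 5 days ~ Po(5 * (1.2 + 2.7)) = Po(19.5).
p_more_than_20 = 1 - poisson_cdf(20, 5 * (ADMIN_RATE + MANUF_RATE))

# (iv) Total in 60 days ~ Po(234), approximated by N(234, 234),
# with a continuity correction: P(199.5 < X < 250.5).
lam_60 = 60 * (ADMIN_RATE + MANUF_RATE)
approx = NormalDist(mu=lam_60, sigma=math.sqrt(lam_60))
p_200_to_250 = approx.cdf(250.5) - approx.cdf(199.5)

print(n_days, p_more_than_20, p_200_to_250)
```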
Exercise 478. (9740 N2012/II/7. Answer on p. 1348.) A group of fifteen people consists
of one pair of sisters, one set of three brothers and ten other people. The fifteen people are
arranged randomly in a line.
(i) Find the probability that the sisters are next to each other. [2]
(ii) Find the probability that the brothers are not all next to each other. [2]
(iii) Find the probability that the sisters are next to each other and the brothers are all
next to each other. [2]
(iv) Find the probability that either the sisters are next to each other or the brothers are
all next to each other or both. [2]
Instead the fifteen people are arranged randomly in a circle.
(v) Find the probability that the sisters are next to each other. [1]
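The counting arguments for this exercise can be checked with exact fractions. The sketch below treats each family group as a single block, which is one common approach and not necessarily the one used in the answers. All variable names are made up for this illustration.

```python
from fractions import Fraction
from math import factorial

line_total = factorial(15)   # all arrangements of 15 people in a line

# Treat the 2 sisters as one block: 14 units, 2! internal orders.
p_sisters = Fraction(factorial(14) * factorial(2), line_total)
# Treat the 3 brothers as one block: 13 units, 3! internal orders.
p_brothers = Fraction(factorial(13) * factorial(3), line_total)
# Both blocks at once: 12 units, 2! * 3! internal orders.
p_both = Fraction(factorial(12) * factorial(2) * factorial(3), line_total)

p_sisters_adjacent = p_sisters
p_brothers_not_all_adjacent = 1 - p_brothers
p_both_adjacent = p_both
p_either_or_both = p_sisters + p_brothers - p_both   # inclusion-exclusion

# Circle of 15 people: (15 - 1)! arrangements in total;
# with the sisters as a block, (14 - 1)! * 2! arrangements.
p_circle_sisters = Fraction(factorial(13) * factorial(2), factorial(14))

print(p_sisters_adjacent, p_brothers_not_all_adjacent,
      p_both_adjacent, p_either_or_both, p_circle_sisters)
```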
Week x               1    2    3    4    5    6
Percentage mark y   38   63   67   75   71   82

L    91    92           93
r          -0.929944    -0.929918
(iv) Calculate the value of r for L = 91, giving your answer correct to 6 decimal places. [1]
(v) Use the table and your answer to part (iv) to suggest with a reason which of 91, 92 or
93 is the most appropriate value for L. [1]
(vi) Using the value for L, calculate the values of a and b, and use them to predict the week
in which Amy will obtain her first mark of at least 90%. [4]
(vii) Give an interpretation, in context, of the value of L. [1]
(i) State two conditions needed for the number of gold coins found in a randomly chosen
region of area 1 square metre to be well modelled by a Poisson distribution. [2]
Assume that the number of gold coins in 1 square metre has the distribution Po(0.8).
(ii) Find the probability that in 1 square metre there are at least 3 gold coins. [1]
(iii) It is given that the probability that 1 gold coin is found in x square metres is 0.2.
Write down an equation for x, and solve it numerically given that x < 1. [2]
(iv) Use a suitable approximation to find the probability that in 100 square metres there
are at least 90 gold coins. State the parameter(s) of the distribution that you use. [3]
Pottery shards are also found scattered throughout the site. The number of pottery shards
in 1 square metre is an independent random variable with the distribution Po(3). Use
suitable approximations, whose parameters should be stated, to find
(v) the probability that in 50 square metres the total number of gold coins and pottery
shards is at least 200, [4]
(vi) the probability that in 50 square metres there are at least 3 times as many pottery
shards as gold coins. [3]
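Part (iii) asks for a numerical solution, which on a graphing calculator you'd get from a solver or a graph. The sketch below does the same job with a simple bisection and also checks part (ii). The names poisson_pmf and solve_area are made up for this illustration, and bisection is only one of several methods that would work here.

```python
import math

RATE = 0.8  # mean number of gold coins per square metre

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# (ii) P(at least 3 coins in 1 square metre) = 1 - P(0) - P(1) - P(2).
p_at_least_3 = 1 - sum(poisson_pmf(k, RATE) for k in range(3))

def solve_area(target: float = 0.2, lo: float = 0.0, hi: float = 1.0) -> float:
    """Bisection for x in (lo, hi) with P(exactly 1 coin in x square metres) = target.

    On (0, 1) the function x -> 0.8x * exp(-0.8x) is increasing,
    so there is only one root in that interval.
    """
    def f(x: float) -> float:
        return poisson_pmf(1, RATE * x) - target

    for _ in range(60):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid     # the sign change is in the left half
        else:
            lo = mid     # the sign change is in the right half
    return (lo + hi) / 2

x = solve_area()
print(round(p_at_least_3, 4), round(x, 3))
```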
Exercise 482. (9740 N2011/II/5. Answer on p. 1352.) The continuous random variable X
has the distribution N(µ, σ²). It is known that P (X < 40.0) = 0.05 and P (X < 70.0) = 0.975.
Calculate the values of µ and σ. [4]
Exercise 483. (9740 N2011/II/6. Answer on p. 1352.) You can skip this entire question
if you’re taking the 9758 (revised) exam. It is desired to interview residents of a city suburb
about the types of shop to be opened in a new shopping mall. In particular it is necessary
to interview a representative range of ages.
(i) Explain how a quota sample might be carried out in this context. [2]
(ii) Explain a disadvantage of quota sampling in the context of your answer to part (i). [1]
(iii) State the name of a method of sampling that would not have this disadvantage, and
explain whether it would be realistic to use this method in this context. [2]
(iv) Given that n = 40, use an appropriate approximation to find P (R < 25). State the
parameters of the distribution you use. [4]
Exercise 485. (9740 N2011/II/8. Answer on p. 1354.) (i) Sketch a scatter diagram that
might be expected for the case when x and y are related approximately by y = a + bx²,
where a is positive and b is negative. Your diagram should include 5 points, approximately
equally spaced with respect to x, and with all x- and y-values positive. [1]
The table gives the values of seven observations of bivariate data, x and y.
(ii) Calculate the value of the product moment correlation coefficient, and explain why its
value does not necessarily mean that the best model for the relationship between x and y
is y = c + dx. [2]
(iii) Explain how to use the values obtained by calculating product moment correlation
coefficients to decide, for this data, whether y = a + bx² or y = c + dx is the better model. [1]
(iv) It is desired to use the data in the table to estimate the value of y for which x = 3.2.
Find the equation of the least-squares regression line of y on x². Use your equation to
calculate the desired estimate. [3]
Exercise 487. (9740 N2011/II/10. Answer on p. 1355.) In a factory, the time in minutes
for an employee to install an electronic component is a normally distributed continuous
random variable T . The standard deviation of T is 5.0 and under ordinary conditions
the expected value of T is 38.0. After background music is introduced into the factory, a
sample of n components is taken and the mean time taken for randomly chosen employees
to install them is found to be t̄ minutes. A test is carried out, at the 5% significance level,
to determine whether the mean time taken to install a component has been reduced.
(i) State appropriate hypotheses for the test, defining any symbols you use. [2]
(ii) Given that n = 50, state the set of values of t̄ for which the result of the test would be
to reject the null hypothesis. [3]
(iii) It is given instead that t̄ = 37.1 and the result of the test is that the null hypothesis is
not rejected. Obtain an inequality involving n, and hence find the set of values that n can
take. [4]
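For parts (ii) and (iii), the sketch below works out the critical region and the bound on n for a one-tailed (lower-tail) z-test with known σ. It assumes the null hypothesis µ = 38.0 against the alternative µ < 38.0, which is how I read part (i). The variable names are made up for this illustration.

```python
import math
from statistics import NormalDist

MU_0 = 38.0    # mean under the null hypothesis
SIGMA = 5.0    # known population standard deviation
ALPHA = 0.05   # significance level, lower-tail test

z_crit = NormalDist().inv_cdf(ALPHA)   # about -1.645

# (ii) With n = 50, reject H0 when t_bar < MU_0 + z_crit * SIGMA / sqrt(n).
critical_mean = MU_0 + z_crit * SIGMA / math.sqrt(50)

# (iii) With t_bar = 37.1, H0 is not rejected when
#       (37.1 - MU_0) / (SIGMA / sqrt(n)) > z_crit,
# which rearranges to sqrt(n) < z_crit * SIGMA / (37.1 - MU_0).
n_bound = (z_crit * SIGMA / (37.1 - MU_0)) ** 2
largest_n = math.ceil(n_bound) - 1     # largest integer strictly below the bound

print(round(critical_mean, 2), largest_n)
```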
(i) Find the probability that, in a period of 4 minutes, at least 8 people join the queue. [1]
(ii) The probability that no more than 1 person joins the queue in a period of t seconds is
0.7. Find an equation for t. Hence find the value of t, giving your answer correct to the
nearest whole number. [4]
(iii) The number of people leaving the same queue in a period of 1 minute is a random
variable with the distribution Po(1.8). At 0930 on a certain morning there are 35 people in
the queue. Use appropriate approximations to find the probability that by 0945 there are
at least 24 people in the queue, stating the parameters of any distributions that you use.
(You may assume that the queue does not become empty during this period.) [5]
(iv) Explain why a Poisson model would probably not be valid if applied to a time period
of several hours. [1]
Exercise 491. (9740 N2010/II/6. Answer on p. 1358.) The time required by an employee
to complete a task is a normally distributed random variable. Over a long period it is
known that the mean time required is 42.0 minutes. Background music is introduced in the
workplace, and afterwards the time required, t minutes, is measured for a random sample
of 11 employees. The results are summarised as follows.
You can skip part (ii) of this question if you’re taking the 9758 (revised) exam.
(ii) Test, at the 10% significance level, whether there has been a change in the mean time
required by an employee to complete the task. [7]
For a third event C, it is given that P (C) = 0.5 and that A and C are independent.
Exercise 493. (9740 N2010/II/8. Answer on p. 1359.) The digits 1, 2, 3, 4 and 5 are
arranged randomly to form a five-digit number. No digit is repeated. Find the probability
that
(i) the number is greater than 30000, [1]
Exercise 494. (9740 N2010/II/9. Answer on p. 1359.) In this question you should state
clearly the values of the parameters of any normal distribution you use.
Over a three-month period Ken makes X minutes of peak-rate telephone calls and Y min-
utes of cheap-rate calls. X and Y are independent random variables with the distributions
N(180, 30²) and N(400, 60²) respectively.
(i) Find the probability that, over a three-month period, the number of minutes of cheap-
rate calls made by Ken is more than twice the number of minutes of peak-rate calls. [4]
Peak-rate calls cost $0.12 per minute and cheap-rate calls cost $0.05 per minute.
(ii) Find the probability that, over a three-month period, the total cost of Ken’s calls is
greater than $45. [3]
(iii) Find the probability that the total cost of Ken’s peak-rate calls over two independent
three-month periods is greater than $45. [3]
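All three parts use the fact that a linear combination of independent normal random variables is again normal, with mean ∑cᵢµᵢ and variance ∑cᵢ²σᵢ². The sketch below applies that fact directly. The helper linear_combination is made up for this illustration.

```python
import math
from statistics import NormalDist

# X ~ N(180, 30^2) peak-rate minutes, Y ~ N(400, 60^2) cheap-rate minutes.
MU_X, SD_X = 180, 30
MU_Y, SD_Y = 400, 60
PEAK, CHEAP = 0.12, 0.05   # dollars per minute

def linear_combination(terms):
    """Distribution of sum(c * V) over independent normal variables V.

    `terms` is a list of (coefficient, mean, standard deviation) triples.
    """
    mean = sum(c * mu for c, mu, _ in terms)
    var = sum(c**2 * sd**2 for c, _, sd in terms)
    return NormalDist(mu=mean, sigma=math.sqrt(var))

# (i) P(Y > 2X) = P(Y - 2X > 0), where Y - 2X ~ N(40, 7200).
p_i = 1 - linear_combination([(1, MU_Y, SD_Y), (-2, MU_X, SD_X)]).cdf(0)

# (ii) Total cost 0.12X + 0.05Y ~ N(41.6, 21.96).
p_ii = 1 - linear_combination([(PEAK, MU_X, SD_X), (CHEAP, MU_Y, SD_Y)]).cdf(45)

# (iii) Peak-rate cost over two independent periods: 0.12(X1 + X2) ~ N(43.2, 25.92).
p_iii = 1 - linear_combination([(PEAK, MU_X, SD_X), (PEAK, MU_X, SD_X)]).cdf(45)

print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3))
```

Note that part (iii) uses two independent observations X1 and X2, which is not the same as doubling a single observation: Var(X1 + X2) = 2 × 30², whereas Var(2X) = 4 × 30².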
v 0 4 8 12 16 20 24 28 32 36
F 0 2.5 5.1 8.8 11.2 13.6 17.6 22.0 27.8 33.9
(i) Draw the scatter diagram for these values, labelling the axes clearly. [2]
It is thought that the drag force F can be modelled by one of the formulae
F = a + bv or F = c + dv²
Exercise 496. (9740 N2010/II/11. Answer on p. 1361.) You can skip this entire question
if you’re taking the 9758 (revised) exam. In this question you should state clearly all
distributions that you use, together with the values of the appropriate parameters.
The number of telephone calls received by a call centre in one minute is a random variable
with distribution Po(3).
(i) Find the probability that exactly 8 calls are received in a randomly chosen period of 4
minutes. [2]
(ii) Find the length of time, to the nearest second, for which the probability that no calls
are received is 0.2. [3]
(iii) Use a suitable approximation to find the probability that, on a randomly chosen working
day of 12 hours, more than 2200 calls are received. [4]
A working day of 12 hours on which more than 2200 calls are received is said to be ‘busy’.
(iv) Find the probability that, in six randomly chosen working days, exactly two are busy.
[2]
(v) Use a suitable approximation to find the probability that, in 30 randomly chosen working
days of 12 hours, fewer than 10 are busy. [4]
86. I have changed the wording of this sentence slightly.
Exercise 498. (9740 N2009/II/6. Answer on p. 1362.) The table gives the world record
time, in seconds above 3 minutes 30 seconds, for running 1 mile as at 1st January in various
years.
Exercise 500. (9740 N2009/II/8. Answer on p. 1363.) Find the number of ways in which
the letters of the word ELEVATED can be arranged if
(i) there are no restrictions, [1]
(ii) T and D must not be next to one another, [2]
(iii) consonants (L, V, T, D) and vowels (E, A) must alternate, [3]
(iv) between any two Es there must be at least 2 other letters. [3]
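With only 8 letters, the answers to this exercise can be checked by brute force: list every distinct arrangement and count those that satisfy each condition. This is a check rather than the intended method, and the function names are made up for this illustration.

```python
from itertools import permutations

WORD = "ELEVATED"
VOWELS = set("EA")

# Keep distinct arrangements only: the three Es are indistinguishable.
arrangements = set(permutations(WORD))

def t_next_to_d(arr) -> bool:
    """True if T and D occupy neighbouring positions."""
    return abs(arr.index("T") - arr.index("D")) == 1

def alternates(arr) -> bool:
    """True if no two neighbouring letters are both vowels or both consonants."""
    return all((a in VOWELS) != (b in VOWELS) for a, b in zip(arr, arr[1:]))

def es_well_spaced(arr) -> bool:
    """True if every pair of Es has at least 2 other letters between them."""
    pos = [i for i, ch in enumerate(arr) if ch == "E"]
    return all(later - earlier >= 3 for earlier, later in zip(pos, pos[1:]))

count_i = len(arrangements)
count_ii = sum(not t_next_to_d(a) for a in arrangements)
count_iii = sum(alternates(a) for a in arrangements)
count_iv = sum(es_well_spaced(a) for a in arrangements)

print(count_i, count_ii, count_iii, count_iv)
```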
(i) The mean thickness of n randomly chosen mechanics textbooks is denoted by M̄ cm.
Given that P (M̄ > 2.53) = 0.0668, find the value of n. [3]
(ii) Calculate the probability that 21 mechanics textbooks and 24 statistics textbooks will
fit into a bookshelf of length 1 m. State clearly the mean and variance of any normal
distribution you use in your calculation.
(iii) Calculate the probability that the total thickness of 4 statistics textbooks is less than
three times the thickness of 1 mechanics textbook. State clearly the mean and variance of
any normal distribution you use in your calculation. [3]
(iv) State an assumption needed for your calculation in parts (ii) and (iii). [1]
∑x = 86.4, ∑x² = 835.92.
(iii) Suppose now that the population variance of X is known, and that the assumption
made in part (ii) is still valid. What change would there be in carrying out the test? [1]
You can skip parts (iii) and (iv) of this question if you’re taking the 9758 (revised) exam.
But you should still do part (v).
(iii) Given that n = 240 and p = 0.3, find P (R < 60) using a suitable approximation, which
should be clearly stated.
(iv) Given that n = 240 and p = 0.02, find P (R = 3) using a suitable approximation,
giving your answer correct to 4 decimal places and explaining why the approximation is
appropriate in this case. [3]
(v) Given that n = 20 and P (R = 0 or 1) = 0.2, write down an equation for the value of p,
and find this value numerically. [2]
Exercise 504. (9740 N2008/II/5. Answer on p. 1367.) You can skip this entire question
if you’re taking the 9758 (revised) exam. A school has 950 pupils.
(i) A sample of 50 pupils is to be chosen to take part in a survey. Describe how the sample
could be chosen using systematic sampling. [2]
The purpose of the survey is to investigate pupils’ opinions about the sports facilities
available at the school.
(ii) Give a reason why a stratified sample might be preferable in this context. [2]
Exercise 505. (9740 N2008/II/6. Answer on p. 1367.) You can skip this entire question
if you’re taking the 9758 (revised) exam. In mineral water from a certain source, the mass
of calcium, X mg, in a one-litre bottle is a normally distributed random variable with mean
µ. Based on observations over a long period, it is known that µ = 78. Following a period of
extreme weather, 15 randomly chosen bottles of the water were analysed. The masses of
calcium in the bottles are summarised by
∑x = 1026.0, ∑x² = 77265.90.
Test, at the 5% significance level, whether the mean mass of calcium in a bottle has changed.
[6]
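The arithmetic for this test can be checked as sketched below: compute the unbiased variance estimate from the summary statistics, then the t statistic. Python's standard library has no t-distribution, so the critical value 2.145 (two-tailed 5%, 14 degrees of freedom) is taken from a standard t-table. The variable names are made up for this illustration.

```python
import math

n = 15
sum_x = 1026.0
sum_x2 = 77265.90
MU_0 = 78.0   # long-run mean mass of calcium (mg)

x_bar = sum_x / n
# Unbiased estimate of the population variance from the summary statistics.
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)

# Test statistic for a t-test with n - 1 = 14 degrees of freedom.
t_stat = (x_bar - MU_0) / math.sqrt(s2 / n)

# Two-tailed 5% critical value for t(14), read from a standard t-table.
T_CRIT = 2.145
reject_h0 = abs(t_stat) > T_CRIT

print(x_bar, s2, round(t_stat, 3), reject_h0)
```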
Exercise 507. (9740 N2008/II/8. Answer on p. 1368.) A certain metal discolours when
exposed to air. To protect the metal against discolouring, it is treated with a chemical. In
an experiment, different quantities, x ml, of the chemical were applied to standard samples
of the metal, and the times, t hours, for the metal to discolour were measured. The results
are given in the table.
(i) Calculate the product moment correlation coefficient between x and t, and explain
whether your answer suggests that a linear model is appropriate. [3]
(ii) Draw a scatter diagram for the data. [1]
One of the values t appears to be incorrect.
(iii) Indicate the corresponding point on your diagram by labelling it P , and explain why
the scatter diagram for the remaining points may be consistent with a model of the form
t = a + b ln x. [2]
(iv) Omitting P , calculate least square estimates of a and b for the model t = a + b ln x. [2]
(v) Estimate the value of t at the value of x corresponding to P . [1]
(vi) Comment on the use of the model in part (iv) in predicting the value of t when x = 8.0.
[1]
(i) Use a Poisson distribution to find the probability that in a given week at least 4 grand
pianos are sold. [2]
The mean number of upright pianos sold in a week is 2.6. The sales of the two types of
piano are independent.
(ii) Use a Poisson distribution to find the probability that in a given week the total number
of pianos sold is exactly 4. [2]
(iii) Use a normal approximation to the Poisson distribution to find the probability that
the number of grand pianos sold in a year of 50 weeks is less than 80. [4]
(iv) Explain why the Poisson distribution may not be a good model for the number of grand
pianos sold in a year. [2]
Exercise 510. (9740 N2008/II/11. Answer on p. 1371.) The random variable X has the
distribution N(50, 8²). Given that X1 and X2 are two independent observations of X, find
(i) P (X1 + X2 > 120), [2]
(ii) P (X1 > X2 + 15). [3]
Exercise 512. (9233 N2008/II/23. Answer on p. 1371.) The events A, B and C are such
that P (A) = 0.2, P (C) = 0.4, P (A ∪ B) = 0.4 and P (B ∩ C) = 0.1. Given that A and B are
independent, find P (B) and show that B and C are also independent. [4]
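The algebra here is short enough to check with exact fractions. The sketch below uses P(A ∪ B) = P(A) + P(B) − P(A)P(B), which holds because A and B are independent, and then tests the product rule for B and C. The variable names are made up for this illustration.

```python
from fractions import Fraction

p_a = Fraction(1, 5)          # P(A) = 0.2
p_c = Fraction(2, 5)          # P(C) = 0.4
p_a_or_b = Fraction(2, 5)     # P(A ∪ B) = 0.4
p_b_and_c = Fraction(1, 10)   # P(B ∩ C) = 0.1

# Independence of A and B gives P(A ∩ B) = P(A)P(B), so
# P(A ∪ B) = P(A) + P(B) - P(A)P(B), i.e. P(B) = (P(A ∪ B) - P(A)) / (1 - P(A)).
p_b = (p_a_or_b - p_a) / (1 - p_a)

# B and C are independent exactly when P(B ∩ C) = P(B)P(C).
b_c_independent = (p_b_and_c == p_b * p_c)

print(p_b, b_c_independent)
```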
Exercise 513. (9233 N2008/II/26. Answer on p. 1372.) You can skip this entire question
if you’re taking the 9758 (revised) exam. The number of times that an office photocopying
machine breaks down in a week follows a Poisson distribution with mean 3. Find the
probability that
(i) the machine will break down more than twice in a given week, [2]
(ii) the machine will break down at most three times in a period of four weeks. [3]
(iii) Use a suitable approximation to find the probability that the machine will break down
more than 50 times in a period of 16 weeks. [4]
Exercise 514. (9233 N2008/II/27. Answer on p. 1372.) The masses of a certain type of
electronic component produced by a machine are normally distributed with mean 32.40 g.
The machine is adjusted and a sample of 80 components is now taken and is found to have
a mean mass 32.00 g. The unbiased estimate of the population variance, calculated from
this sample, is 2.892 g².
(i) Test at the 5% significance level whether this indicates a change in the mean. [5]
(ii) Explain what you understand by the phrase ‘at the 5% significance level’ in the context of
this question. [2]
(iii) Find the least level of significance at which this sample would indicate a decrease in
the population mean. [3]
Exercise 516. (9233 N2008/II/30. Answer on p. 1374.) (i) The masses of valves produced
by a machine are normally distributed with mean µ and standard deviation σ. 12% of the
valves have mass less than 86.50 g and 20% have mass more than 92.25 g. Find µ and σ.
[4]
(ii) The setting of the machine is adjusted so that the mean mass of the valves produced is
unchanged, but the standard deviation is reduced. Given that 80% of the valves now have
a mass within 2 g of the mean, find the new standard deviation. [3]
(iii) After the machine has been adjusted, a random sample of n valves is taken. Find the
smallest value of n such that the probability that the sample mean exceeds µ by at least
0.50 g is at most 0.1. [5]
Exercise 517. (9740 N2007/II/5. Answer on p. 1374.) You can skip this entire question
if you’re taking the 9758 (revised) exam. (i) Give a real-life example of a situation in which
quota sampling could be used. Explain why quota sampling would be appropriate in this
situation, and describe briefly any disadvantage that quota sampling has. [4]
(ii) Explain briefly whether it would be possible to use stratified sampling in the situation
you have described in part (i). [1]
Exercise 518. (9740 N2007/II/6. Answer on p. 1375.) In a large population, 24% have
a particular gene A, and 0.3% have gene B.
(i) Find the probability that, in a random sample of 10 people from the population, at most
4 have gene A. [2]
You can skip the rest of this question if you’re taking the 9758 (revised) exam.
A random sample of 1000 people is taken from the population. Using appropriate approx-
imations, find
(ii) the probability that between 230 and 260 inclusive have gene A, [3]
(iii) the probability that at least 2 but fewer than 5 have gene B. [2]
∑x = 4626, ∑x² = 147691.
(i) Find unbiased estimates of the population mean and variance. [2]
(ii) Test, at the 5% significance level, whether the population mean time for a student to
complete the project exceeds 30 hours. [4]
You can skip part (iii) of this question if you’re taking the 9758 (revised) exam.
(iii) State, giving a valid reason, whether any assumptions about the population are needed
in order for the test to be valid. [1]
Exercise 520. (9740 N2007/II/8. Answer on p. 1376.) Chickens and turkeys are sold
by weight. The masses, in kg, of chickens and turkeys are modelled as having independent
normal distributions with means and standard deviations as shown in the table.
(ii) Find the probability of the event that both a randomly chosen chicken has a selling
price exceeding $7 and a randomly chosen turkey has a selling price exceeding $55. [3]
(iii) Find the probability that the total selling price of a randomly chosen chicken and a
randomly chosen turkey is more than $62. [4]
(iv) Explain why the answer to part (iii) is greater than the answer to part (ii). [1]
(b) Find the number of different possible arrangements if men and women alternate. [2]
(c) Find the number of different possible arrangements if each man stands next to his wife
and men and women alternate. [2]
Exercise 522. (9740 N2007/II/10. Answer on p. 1377.) A player throws three darts at a
target. The probability that he is successful in hitting the target with his first throw is 1/8.
For each of his second and third throws, the probability of success is
• twice the probability of success on the preceding throw if that throw was successful,
• the same as the probability of success on the preceding throw if that throw was unsuc-
cessful.
(i) the probability that the third throw is successful given that exactly two of the three
throws are successful. [4]
It is given that the value of the product moment correlation coefficient for this data is
−0.912, correct to 3 decimal places. The scatter diagram for the data is shown below.
[Scatter diagram: x (micrograms per litre) on the vertical axis, from 0 to 100, against t (minutes) on the horizontal axis, from 0 to 300.]
(ii) Calculate the corresponding estimated value of x when t = 300, and comment on the
suitability of the linear model. [2]
The variable y is defined by y = ln x. For the variables y and t,
(iii) calculate the product moment correlation coefficient and comment on its value, [2]
(iv) calculate the equation of the appropriate regression line. [3]
(v) Use a regression line to give the best estimate that you can of the time when the drug
concentration is 15 micrograms per litre. [2]
(i) How many different triangles can be drawn which have the point A as one of the vertices?
[1]
Exercise 525. (9233 N2007/II/23. Answer on p. 1379.) (i) A random sample of size 100
is taken from a population with mean 30 and standard deviation 5. Find an approximate
value for the probability that the sample mean lies between 29.2 and 30.8. [6]
(ii) Giving a reason, state whether it is necessary to make any assumptions about the
distribution of the population. [1]
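A quick check of part (i) is sketched below. It relies on the Central Limit Theorem: for a sample of size 100, the sample mean is approximately N(30, 5²/100), whatever the population distribution. The variable names are made up for this illustration.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 30, 5, 100

# By the Central Limit Theorem, the sample mean is approximately
# N(MU, SIGMA^2 / N) when N is large.
sample_mean = NormalDist(mu=MU, sigma=SIGMA / math.sqrt(N))
p_between = sample_mean.cdf(30.8) - sample_mean.cdf(29.2)

print(round(p_between, 4))
```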
State, with a reason in each case, whether W and B are independent, and whether M and
C are mutually exclusive. [4]
Exercise 527. (9233 N2007/II/26. Answer on p. 1379.) You can skip this entire question
if you’re taking the 9758 (revised) exam. At a fire station, each call-out is classified as either
genuine or false. Call-outs occur at random times. On average, there are two genuine call-
outs in a week, and one false call-out in a two-week period.
(i) Calculate the probability that there are fewer than 6 genuine call-outs in a randomly
chosen two-week period.
(ii) Using a suitable approximation, calculate the probability that the total number of
call-outs in a randomly chosen six-week period exceeds 19.
Exercise 529. (9233 N2006/I/4. Answer on p. 1380.) A box contains 8 balls, of which 3
are identical (and so are indistinguishable from one another) and the other 5 are different
from each other. 3 balls are to be picked out of the box; the order in which they are picked
out does not matter. Find the number of different possible selections of 3 balls. [4]
(Author’s remark: Assume also that the latter 5 balls are each different from the first 3
balls.)
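The box is small enough to count by brute force, as sketched below: pick every set of 3 balls, then merge picks that differ only in which of the identical balls were taken. The labels "I" and a to e are made up for this illustration.

```python
from itertools import combinations

# "I" marks each of the 3 identical balls; the letters a to e are the 5
# different balls (these labels are made up for the illustration).
balls = ["I", "I", "I", "a", "b", "c", "d", "e"]

# Order does not matter and the I's are interchangeable, so two picks are
# the same selection when their sorted labels agree.
selections = {tuple(sorted(pick)) for pick in combinations(balls, 3)}
num_selections = len(selections)

print(num_selections)
```

This agrees with splitting into cases by the number of identical balls chosen: C(5, 3) + C(5, 2) + C(5, 1) + C(5, 0).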
Exercise 530. (9233 N2006/II/23. Answer on p. 1381.) Two fair dice, one red and the
other green, are thrown.
(i) Justifying your conclusion, determine whether A and B are independent. [3]
(ii) Find P (A ∪ B). [2]
(i) Test, at the 5% significance level, whether the mean mass of the contents of a bag is
less than 10 kg. [7]
(ii) Explain, in the context of the question, the meaning of ‘at the 5% significance level’.
[1]
(i) Using this model, find the probability that, in a randomly chosen 200-year period, there
is exactly one severe flood in the first 100 years and exactly one severe flood in the second
100 years. [3]
(ii) Using the same model, and a suitable approximation, find the probability that there
are more than 25 severe floods in 1000 years. [5]
Exercise 533. (9233 N2006/II/28. Answer on p. 1382.) Observations are made of the
speeds of cars on a particular stretch of road during daylight hours. It is found that, on
average, 1 in 80 cars is travelling at a speed exceeding 125 km h⁻¹, and 1 in 10 is travelling
at a speed less than 40 km h⁻¹.
(i) Assuming a normal distribution, find the mean and the standard deviation of this
distribution. [4]
(ii) A random sample of 10 cars is to be taken. Find the probability that at least 7 will be
travelling at a speed in excess of 40 km h⁻¹. [3]
You can skip part (iii) of this question if you’re taking the 9758 (revised) exam.
(iii) A random sample of 100 cars is to be taken. Using a suitable approximation, find the
probability that at most 8 cars will be travelling at a speed less than 40 km h⁻¹. [3]
CONTENTS
PREAMBLE
SYLLABUS AIMS
ASSESSMENT OBJECTIVES (AO)
USE OF A GRAPHING CALCULATOR (GC)
LIST OF FORMULAE AND STATISTICAL TABLES
INTEGRATION AND APPLICATION
SCHEME OF EXAMINATION PAPERS
CONTENT OUTLINE
ASSUMED KNOWLEDGE
MATHEMATICAL NOTATION
PREAMBLE
Mathematics is a basic and important discipline that contributes to the developments and understandings of
sciences and other disciplines. It is used by scientists, engineers, business analysts and psychologists, etc.
to model, understand and solve problems in their respective fields. A good foundation in mathematics and
the ability to reason mathematically are therefore essential for students to be successful in their pursuit of
various disciplines.
H2 Mathematics is designed to prepare students for a range of university courses, including mathematics,
sciences, engineering and related courses, where a good foundation in mathematics is required. It develops
mathematical thinking and reasoning skills that are essential for further learning of mathematics. Through
applications of mathematics, students also develop an appreciation of mathematics and its connections to
other disciplines and to the real world.
SYLLABUS AIMS
The aims of H2 Mathematics are to enable students to:
(a) acquire mathematical concepts and skills to prepare for their tertiary studies in mathematics, sciences,
engineering and other related disciplines
(b) develop thinking, reasoning, communication and modelling skills through a mathematical approach to
problem-solving
(c) connect ideas within mathematics and apply mathematics in the contexts of sciences, engineering and
other related disciplines
(d) experience and appreciate the nature and beauty of mathematics and its value in life and other
disciplines.
AO1 Understand and apply mathematical concepts and skills in a variety of problems, including those
that may be set in unfamiliar contexts, or require integration of concepts and skills from more than
one topic.
AO2 Formulate real-world problems mathematically, solve the mathematical problems, interpret and
evaluate the mathematical solutions in the context of the problems.
AO3 Reason and communicate mathematically through making deductions and writing mathematical
explanations, arguments and proofs.
9758 H2 MATHEMATICS (2017)
Students should be aware that there are limitations inherent in GC. For example, answers obtained by
tracing along a graph to find roots of an equation may not produce the required accuracy.
Kinematics and dynamics (e.g. free fall, projectile motion, collisions): Functions; Calculus; Vectors
Optimisation problems (e.g. maximising strength, minimising surface area): Inequalities; System of linear equations; Calculus
Financial maths (e.g. banking, insurance): Sequences and series; Probability; Sampling distributions
The list illustrates some types of contexts in which the mathematics learnt in the syllabus may be applied,
and is by no means exhaustive. While problems may be set based on these contexts, no assumptions will be
made about the knowledge of these contexts. All information will be self-contained within the problem.
PAPER 1 (3 hours)
A paper consisting of 10 to 12 questions of different lengths and marks based on the Pure Mathematics
section of the syllabus.
There will be at least two questions on application of Mathematics in real-world contexts, including those
from sciences and engineering. Each question will carry at least 12 marks and may require concepts and
skills from more than one topic.
PAPER 2 (3 hours)
A paper consisting of two sections, Sections A and B.
Section A (Pure Mathematics – 40 marks) will consist of 4 to 5 questions of different lengths and marks
based on the Pure Mathematics section of the syllabus.
Section B (Probability and Statistics – 60 marks) will consist of 6 to 8 questions of different lengths and
marks based on the Probability and Statistics section of the syllabus.
There will be at least two questions in Section B on application of Mathematics in real-world contexts,
including those from sciences and engineering. Each question will carry at least 12 marks and may require
concepts and skills from more than one topic.
CONTENT OUTLINE
Knowledge of the content of the O Level Mathematics syllabus and of some of the content of the O Level
Additional Mathematics syllabuses are assumed in the syllabus below and will not be tested directly, but it
may be required indirectly in response to questions on other topics. The assumed knowledge for O Level
Additional Mathematics is appended after this section.
Topic/Sub-topics Content
3 Vectors
Exclude:
• finding the shortest distance between two skew
lines
• finding an equation for the common
perpendicular to two skew lines
5 Calculus
f′(x)e^{f(x)}
sin² x, cos² x, tan² x,
sin mx cos nx, cos mx cos nx and sin mx sin nx
1/(a² + x²), 1/√(a² − x²), 1/(a² − x²) and 1/(x² − a²)
Exclude:
• derivation of formulae
• relationship r² = b₁b₂, where b₁ and b₂ are
regression coefficients
• hypothesis tests
ASSUMED KNOWLEDGE
MATHEMATICAL NOTATION
The list which follows summarises the notation used in Cambridge’s Mathematics examinations. Although
primarily directed towards A Level, the list also applies, where relevant, to examinations at all other levels.
1. Set Notation
∈ is an element of
∉ is not an element of
{x1, x2, …} the set with elements x1, x2, …
{x: …} the set of all x such that
n(A) the number of elements in set A
∅ the empty set
ℰ universal set
A′ the complement of the set A
2. Miscellaneous Symbols
= is equal to
≠ is not equal to
≡ is identical to or is congruent to
≈ is approximately equal to
∝ is proportional to
< is less than
≤; ≯ is less than or equal to; is not greater than
> is greater than
≥; ≮ is greater than or equal to; is not less than
∞ infinity
3. Operations
a+b a plus b
a–b a minus b
a × b, ab, a.b a multiplied by b
a ÷ b, \frac{a}{b}, a/b a divided by b
\sum_{i=1}^{n} a_i a1 + a2 + ... + an
\binom{n}{r} the binomial coefficient n!/(r!(n − r)!), for n, r ∈ ℤ⁺ ∪ {0}, 0 ≤ r ≤ n;
n(n − 1)...(n − r + 1)/r!, for n ∈ ℚ, r ∈ ℤ⁺ ∪ {0}
4. Functions
f the function f
f(x) the value of the function f at x
f: A →B f is a function under which each element of set A has an image in set B
f: x ↦ y the function f maps the element x to the element y
∆x ; δx an increment of x
$\dfrac{dy}{dx}$ the derivative of y with respect to x
$\dfrac{d^n y}{dx^n}$ the nth derivative of y with respect to x
f′(x), f″(x), …, f⁽ⁿ⁾(x) the first, second, … nth derivatives of f(x) with respect to x
$\int_a^b y\,dx$ the definite integral of y with respect to x for values of x between a and b
ln x natural logarithm of x
lg x logarithm of x to base 10
7. Complex Numbers
i the square root of –1
z a complex number, z = x + iy
= r(cos θ + i sin θ), r ∈ ℝ₀⁺
= $re^{i\theta}$, r ∈ ℝ₀⁺
8. Matrices
M a matrix M
M⁻¹ the inverse of the square matrix M
Mᵀ the transpose of the matrix M
det M the determinant of the square matrix M
9. Vectors
a the vector a
$\overrightarrow{AB}$ the vector represented in magnitude and direction by the directed line segment AB
|a| the magnitude of a
$|\overrightarrow{AB}|$ the magnitude of $\overrightarrow{AB}$
f1, f2,… frequencies with which the observations, x1, x2, …occur
p(x) the value of the probability function P(X = x) of the discrete random variable X
p1, p2,… probabilities of the values x1, x2, …of the discrete random variable X
f(x), g(x)… the value of the probability density function of the continuous random variable X
F(x), G(x)… the value of the (cumulative) distribution function P(X ≤ x) of the random variable X
81 New List of Formulae (MF26)
Reproduced on the following pages is List MF26. This new List of Formulae and Statistical Tables will be used for the first time in 2017.
LIST OF FORMULAE
AND
STATISTICAL TABLES
For use from 2017 in all papers for the H1, H2 and H3 Mathematics, H1 Statistics and
H2 Further Mathematics syllabuses.
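Algebraic series
Binomial expansion:
$$(a+b)^n = a^n + \binom{n}{1}a^{n-1}b + \binom{n}{2}a^{n-2}b^2 + \binom{n}{3}a^{n-3}b^3 + \dots + b^n,$$
where n is a positive integer and
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}.$$
Maclaurin expansion:
$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!}f''(0) + \dots + \frac{x^n}{n!}f^{(n)}(0) + \dots$$
$$(1+x)^n = 1 + nx + \frac{n(n-1)}{2!}x^2 + \dots + \frac{n(n-1)\cdots(n-r+1)}{r!}x^r + \dots \quad (|x| < 1)$$
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^r}{r!} + \dots \quad (\text{all } x)$$
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + \frac{(-1)^r x^{2r+1}}{(2r+1)!} + \dots \quad (\text{all } x)$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots + \frac{(-1)^r x^{2r}}{(2r)!} + \dots \quad (\text{all } x)$$
$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots + \frac{(-1)^{r+1} x^r}{r} + \dots \quad (-1 < x \le 1)$$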
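The Maclaurin series above can be checked numerically by summing their first few terms. The following is a minimal sketch, not part of List MF26: the function names and the numbers of terms are illustrative assumptions.

```python
import math

def exp_series(x, terms):
    """Partial sum of 1 + x + x^2/2! + ... using the first `terms` terms."""
    return sum(x**r / math.factorial(r) for r in range(terms))

def sin_series(x, terms):
    """Partial sum of x - x^3/3! + x^5/5! - ... using the first `terms` terms."""
    return sum((-1)**r * x**(2*r + 1) / math.factorial(2*r + 1) for r in range(terms))

def ln1p_series(x, terms):
    """Partial sum of x - x^2/2 + x^3/3 - ...; the series is valid only for -1 < x <= 1."""
    return sum((-1)**(r + 1) * x**r / r for r in range(1, terms + 1))

# Compare each partial sum with the library value.
print(exp_series(1.0, 20), math.e)
print(sin_series(0.5, 10), math.sin(0.5))
print(ln1p_series(0.5, 60), math.log(1.5))
```

The logarithm series converges much more slowly than the other two, which is why the sketch uses more terms for it.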
Trigonometry
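$$\sin(A \pm B) \equiv \sin A \cos B \pm \cos A \sin B$$
$$\cos(A \pm B) \equiv \cos A \cos B \mp \sin A \sin B$$
$$\tan(A \pm B) \equiv \frac{\tan A \pm \tan B}{1 \mp \tan A \tan B}$$
$$\sin 2A \equiv 2 \sin A \cos A$$
$$\cos 2A \equiv \cos^2 A - \sin^2 A \equiv 2\cos^2 A - 1 \equiv 1 - 2\sin^2 A$$
$$\tan 2A \equiv \frac{2 \tan A}{1 - \tan^2 A}$$
$$\sin P + \sin Q \equiv 2 \sin \tfrac{1}{2}(P+Q) \cos \tfrac{1}{2}(P-Q)$$
Principal values:
$$-\tfrac{1}{2}\pi \le \sin^{-1} x \le \tfrac{1}{2}\pi \quad (|x| \le 1)$$
$$0 \le \cos^{-1} x \le \pi \quad (|x| \le 1)$$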
Derivatives
f(x)    f′(x)
$\sin^{-1} x$    $\dfrac{1}{\sqrt{1-x^2}}$
$\cos^{-1} x$    $-\dfrac{1}{\sqrt{1-x^2}}$
$\tan^{-1} x$    $\dfrac{1}{1+x^2}$
Integrals
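f(x)    ∫ f(x) dx
$\dfrac{1}{x^2+a^2}$    $\dfrac{1}{a}\tan^{-1}\left(\dfrac{x}{a}\right)$
$\dfrac{1}{\sqrt{a^2-x^2}}$    $\sin^{-1}\left(\dfrac{x}{a}\right)$    $(|x| < a)$
$\dfrac{1}{x^2-a^2}$    $\dfrac{1}{2a}\ln\left(\dfrac{x-a}{x+a}\right)$    $(x > a)$
$\dfrac{1}{a^2-x^2}$    $\dfrac{1}{2a}\ln\left(\dfrac{a+x}{a-x}\right)$    $(|x| < a)$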
Vectors
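The point dividing AB in the ratio λ : µ has position vector $\dfrac{\mu\mathbf{a} + \lambda\mathbf{b}}{\lambda + \mu}$.
Vector product:
$$\mathbf{a} \times \mathbf{b} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \times \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}$$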
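The two vector formulas above translate directly into code. This is a minimal sketch using plain tuples; the function names `cross` and `divide` are illustrative assumptions and do not appear in the list.

```python
def cross(a, b):
    """Vector product of two 3D vectors, component by component as in the formula."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1)

def divide(a, b, lam, mu):
    """Position vector of the point dividing AB in the ratio lam : mu."""
    return tuple((mu*ai + lam*bi) / (lam + mu) for ai, bi in zip(a, b))

print(cross((1, 0, 0), (0, 1, 0)))            # unit vectors along x and y
print(divide((0, 0, 0), (3, 6, 9), 1, 2))     # one third of the way from A to B
```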
Numerical methods
Trapezium rule (for single strip): $\displaystyle\int_a^b f(x)\,dx \approx \tfrac{1}{2}(b-a)\,[f(a) + f(b)]$
Simpson’s rule (for two strips): $\displaystyle\int_a^b f(x)\,dx \approx \tfrac{1}{6}(b-a)\left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right]$
$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$
$$y_2 = y_1 + h f(x_1, y_1)$$
$$y_2 = y_1 + \frac{h}{2}\,[f(x_1, y_1) + f(x_2, u_2)]$$
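Each numerical-methods formula above is a single update step, so each can be sketched as a one-line function. In this sketch the function names are illustrative assumptions. The formula for $x_2$ is the Newton-Raphson iteration, and the two formulas for $y_2$ are the Euler and improved Euler steps for $dy/dx = f(x, y)$ with step size $h$. The extracted text does not define $u_2$; the code assumes it is the plain Euler estimate $y_1 + h f(x_1, y_1)$ and that $x_2 = x_1 + h$.

```python
def trapezium(f, a, b):
    """Trapezium rule with a single strip."""
    return 0.5 * (b - a) * (f(a) + f(b))

def simpson(f, a, b):
    """Simpson's rule with two strips."""
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

def newton_step(f, df, x1):
    """One Newton-Raphson iteration: x2 = x1 - f(x1)/f'(x1)."""
    return x1 - f(x1) / df(x1)

def euler_step(f, x1, y1, h):
    """One Euler step: y2 = y1 + h f(x1, y1)."""
    return y1 + h * f(x1, y1)

def improved_euler_step(f, x1, y1, h):
    """One improved Euler step (assumes u2 is the Euler estimate and x2 = x1 + h)."""
    u2 = y1 + h * f(x1, y1)
    x2 = x1 + h
    return y1 + h / 2 * (f(x1, y1) + f(x2, u2))

# Approximate the square root of 2 as a root of x^2 - 2 = 0.
x = 1.0
for _ in range(6):
    x = newton_step(lambda t: t*t - 2, lambda t: 2*t, x)
print(x)
```

Simpson's rule is exact for polynomials of degree at most three, so `simpson(lambda x: x**2, 0, 3)` reproduces the exact area, whereas the single-strip trapezium rule is exact only for straight lines.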
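PROBABILITY AND STATISTICS
Standard discrete distributions
Distribution of X    P(X = x)    Mean    Variance
Binomial B(n, p)    $\binom{n}{x} p^x (1-p)^{n-x}$    $np$    $np(1-p)$
Poisson Po(λ)    $e^{-\lambda}\dfrac{\lambda^x}{x!}$    $\lambda$    $\lambda$
Geometric Geo(p)    $(1-p)^{x-1}p$    $\dfrac{1}{p}$    $\dfrac{1-p}{p^2}$
Exponential (continuous; the second column is its probability density function)    $\lambda e^{-\lambda x}$    $\dfrac{1}{\lambda}$    $\dfrac{1}{\lambda^2}$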
THE NORMAL DISTRIBUTION FUNCTION
If Z has a normal distribution with mean 0 and variance 1 then, for each value of z, the table gives the value of Φ(z), where
Φ(z) = P(Z ⩽ z).
For negative values of z use Φ(−z) = 1 − Φ(z).
z 0 1 2 3 4 5 6 7 8 9 ADD: 1 2 3 4 5 6 7 8 9
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 4 8 12 16 20 24 28 32 36
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 4 8 12 16 20 24 28 32 36
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 4 8 12 15 19 23 27 31 35
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 4 7 11 15 19 22 26 30 34
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 4 7 11 14 18 22 25 29 32
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 3 7 10 14 17 20 24 27 31
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 3 7 10 13 16 19 23 26 29
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 3 6 9 12 15 18 21 24 27
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 3 5 8 11 14 16 19 22 25
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 3 5 8 10 13 15 18 20 23
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 2 5 7 9 12 14 16 19 21
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 2 4 6 8 10 12 14 16 18
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 2 4 6 7 9 11 13 15 17
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 2 3 5 6 8 10 11 13 14
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1 3 4 6 7 8 10 11 13
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1 2 4 5 6 7 8 10 11
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1 2 3 4 5 6 7 8 9
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1 2 3 4 4 5 6 7 8
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1 1 2 3 4 4 5 6 6
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 1 1 2 2 3 4 4 5 5
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 0 1 1 2 2 3 3 4 4
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 0 1 1 2 2 2 3 3 4
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 0 1 1 1 2 2 2 3 3
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 0 1 1 1 1 2 2 2 2
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 0 0 1 1 1 1 1 2 2
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 0 0 0 1 1 1 1 1 1
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 0 0 0 0 1 1 1 1 1
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 0 0 0 0 0 1 1 1 1
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 0 0 0 0 0 0 0 1 1
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 0 0 0 0 0 0 0 0 0
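Entries of the table above can be reproduced from the error function, since $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The following is a minimal sketch; the function name `phi` is an illustrative assumption.

```python
import math

def phi(z):
    """Standard normal distribution function, P(Z <= z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Compare with the table rows for z = 1.0 (column 0) and z = 1.9 (column 6).
print(round(phi(1.0), 4))
print(round(phi(1.96), 4))
# The symmetry rule for negative z.
print(phi(-1.0), 1 - phi(1.0))
```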
CRITICAL VALUES FOR THE t-DISTRIBUTION
ν=1 1.000 3.078 6.314 12.71 31.82 63.66 127.3 318.3 636.6
2 0.816 1.886 2.920 4.303 6.965 9.925 14.09 22.33 31.60
3 0.765 1.638 2.353 3.182 4.541 5.841 7.453 10.21 12.92
4 0.741 1.533 2.132 2.776 3.747 4.604 5.598 7.173 8.610
5 0.727 1.476 2.015 2.571 3.365 4.032 4.773 5.894 6.869
6 0.718 1.440 1.943 2.447 3.143 3.707 4.317 5.208 5.959
7 0.711 1.415 1.895 2.365 2.998 3.499 4.029 4.785 5.408
8 0.706 1.397 1.860 2.306 2.896 3.355 3.833 4.501 5.041
9 0.703 1.383 1.833 2.262 2.821 3.250 3.690 4.297 4.781
10 0.700 1.372 1.812 2.228 2.764 3.169 3.581 4.144 4.587
11 0.697 1.363 1.796 2.201 2.718 3.106 3.497 4.025 4.437
12 0.695 1.356 1.782 2.179 2.681 3.055 3.428 3.930 4.318
13 0.694 1.350 1.771 2.160 2.650 3.012 3.372 3.852 4.221
14 0.692 1.345 1.761 2.145 2.624 2.977 3.326 3.787 4.140
15 0.691 1.341 1.753 2.131 2.602 2.947 3.286 3.733 4.073
16 0.690 1.337 1.746 2.120 2.583 2.921 3.252 3.686 4.015
17 0.689 1.333 1.740 2.110 2.567 2.898 3.222 3.646 3.965
18 0.688 1.330 1.734 2.101 2.552 2.878 3.197 3.610 3.922
19 0.688 1.328 1.729 2.093 2.539 2.861 3.174 3.579 3.883
20 0.687 1.325 1.725 2.086 2.528 2.845 3.153 3.552 3.850
21 0.686 1.323 1.721 2.080 2.518 2.831 3.135 3.527 3.819
22 0.686 1.321 1.717 2.074 2.508 2.819 3.119 3.505 3.792
23 0.685 1.319 1.714 2.069 2.500 2.807 3.104 3.485 3.768
24 0.685 1.318 1.711 2.064 2.492 2.797 3.091 3.467 3.745
25 0.684 1.316 1.708 2.060 2.485 2.787 3.078 3.450 3.725
26 0.684 1.315 1.706 2.056 2.479 2.779 3.067 3.435 3.707
27 0.684 1.314 1.703 2.052 2.473 2.771 3.057 3.421 3.689
28 0.683 1.313 1.701 2.048 2.467 2.763 3.047 3.408 3.674
29 0.683 1.311 1.699 2.045 2.462 2.756 3.038 3.396 3.660
30 0.683 1.310 1.697 2.042 2.457 2.750 3.030 3.385 3.646
40 0.681 1.303 1.684 2.021 2.423 2.704 2.971 3.307 3.551
60 0.679 1.296 1.671 2.000 2.390 2.660 2.915 3.232 3.460
120 0.677 1.289 1.658 1.980 2.358 2.617 2.860 3.160 3.373
∞ 0.674 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291
CRITICAL VALUES FOR THE χ²-DISTRIBUTION
P(X ⩽ x) = p.
WILCOXON SIGNED RANK TEST
For each value of n the table gives the largest value of T which will lead to rejection of the null hypothesis at
the level of significance indicated.
Critical values of T
Level of significance
One Tail 0.05 0.025 0.01 0.005
Two Tail 0.1 0.05 0.02 0.01
n=6 2 0
7 3 2 0
8 5 3 1 0
9 8 5 3 1
10 10 8 5 3
11 13 10 7 5
12 17 13 9 7
13 21 17 12 9
14 25 21 15 12
15 30 25 19 15
16 35 29 23 19
17 41 34 27 23
18 47 40 32 27
19 53 46 37 32
20 60 52 43 37
For larger values of n, each of P and Q can be approximated by the normal distribution with mean $\tfrac{1}{4}n(n+1)$ and variance $\tfrac{1}{24}n(n+1)(2n+1)$.
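The critical values and the normal approximation above can be cross-checked by enumerating the exact null distribution of the signed rank statistic. This is a minimal sketch, assuming that under the null hypothesis each of the ranks 1, …, n is independently counted with probability 1/2 (so there are no ties or zero differences); the function names are illustrative assumptions.

```python
def signed_rank_counts(n):
    """counts[t] = number of subsets of the ranks {1, ..., n} whose sum is t."""
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downwards so that each rank is used at most once.
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return counts

def critical_value(n, alpha):
    """Largest t with P(T <= t) <= alpha (one tail), or None if no such t exists."""
    counts = signed_rank_counts(n)
    total = 2 ** n
    cumulative, critical = 0, None
    for t, c in enumerate(counts):
        cumulative += c
        if cumulative > alpha * total:
            break
        critical = t
    return critical

def mean_and_variance(n):
    """Exact mean and variance of the statistic under the null hypothesis."""
    counts = signed_rank_counts(n)
    total = 2 ** n
    mean = sum(t * c for t, c in enumerate(counts)) / total
    var = sum((t - mean) ** 2 * c for t, c in enumerate(counts)) / total
    return mean, var

print(critical_value(10, 0.05))   # compare with the table row n = 10
print(mean_and_variance(10))      # compare with n(n+1)/4 and n(n+1)(2n+1)/24
```

A return value of `None` corresponds to a blank cell in the table, for example n = 6 at the one-tail 0.01 level.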
82 Old (9740) Syllabus
CONTENTS
AIMS
ASSESSMENT OBJECTIVES (AO)
USE OF GRAPHING CALCULATOR (GC)
LIST OF FORMULAE
INTEGRATION AND APPLICATION
SCHEME OF EXAMINATION PAPERS
CONTENT OUTLINE
ASSUMED KNOWLEDGE
MATHEMATICAL NOTATION
AIMS
The syllabus prepares students adequately for university courses including mathematics, physics and
engineering, where more mathematics content is required. The syllabus aims to develop mathematical
thinking and problem solving skills in students. Topics covered include Functions and Graphs, Sequences
and Series, Vectors, Complex Numbers, Calculus, Permutations, Combinations and Probability, Binomial,
Poisson and Normal Distributions, Sampling and Hypothesis Testing, and Correlation and Regression.
Students will learn to analyse, formulate and solve different types of problems. They will also learn to work
with data and perform statistical analyses.
AO1 understand and apply mathematical concepts and skills in a variety of contexts, including the
manipulation of mathematical expressions and use of graphing calculators
AO2 reason and communicate mathematically through writing mathematical explanation, arguments and
proofs, and inferences
AO3 solve unfamiliar problems; translate common realistic contexts into mathematics; interpret and
evaluate mathematical results, and use the results to make predictions, or comment on the context.
9740 H2 MATHEMATICS (2017)
Students should be aware that there are limitations inherent in GC. For example, answers obtained by
tracing along a graph to find roots of an equation may not produce the required accuracy.
LIST OF FORMULAE
Candidates will be provided in the examination with a list of formulae.
PAPER 1 (3 hours)
A paper consisting of about 10 to 12 questions of different lengths and marks based on the Pure
Mathematics section of the syllabus.
PAPER 2 (3 hours)
A paper consisting of 2 sections, Sections A and B.
Section A (Pure Mathematics – 40 marks) will consist of about 3–4 questions of different lengths and marks
based on the Pure Mathematics section of the syllabus.
Section B (Statistics – 60 marks) will consist of about 6–8 questions of different lengths and marks based on
the Statistics section of the syllabus.
CONTENT OUTLINE
Knowledge of the content of the O Level Mathematics syllabus and of some of the content of the O Level
Additional Mathematics syllabus are assumed in the syllabus below and will not be tested directly, but it may
be required indirectly in response to questions on other topics. The assumed knowledge for O Level
Additional Mathematics is appended after this section.
Topic/Sub-topics Content
PURE MATHEMATICS
3 Vectors
Exclude:
• finding the shortest distance between two skew lines
• finding an equation for the common perpendicular to
two skew lines
4 Complex numbers
Exclude:
• loci such as |z − a| = k|z − b|, where k ≠ 1, and arg(z − a) − arg(z − b) = α
• properties and geometrical representation of the nth roots of unity
• use of de Moivre’s theorem to derive trigonometric
identities
5 Calculus
Exclude:
• finding non-stationary points of inflexion
• problems involving small increments and
approximation
STATISTICS
Exclude:
• finding probability density functions and distribution
functions
• calculation of E( X ) and Var( X ) from other probability
density functions
Exclude:
• derivation of formulae
• hypothesis tests
ASSUMED KNOWLEDGE
CALCULUS
MATHEMATICAL NOTATION
The list which follows summarises the notation used in Cambridge’s Mathematics examinations. Although
primarily directed towards A Level, the list also applies, where relevant, to examinations at all other levels.
1. Set Notation
∈ is an element of
∉ is not an element of
{x1, x2, …} the set with elements x1, x2, …
{x: …} the set of all x such that
n(A) the number of elements in set A
∅ the empty set
ℰ universal set
A′ the complement of the set A
ℤ the set of integers, {0, ±1, ±2, ±3, …}
ℤ⁺ the set of positive integers, {1, 2, 3, …}
ℚ the set of rational numbers
ℚ⁺ the set of positive rational numbers, {x ∈ ℚ: x > 0}
ℚ₀⁺ the set of positive rational numbers and zero, {x ∈ ℚ: x ≥ 0}
ℝ the set of real numbers
ℝ⁺ the set of positive real numbers, {x ∈ ℝ: x > 0}
ℝ₀⁺ the set of positive real numbers and zero, {x ∈ ℝ: x ≥ 0}
ℝⁿ the real n-tuples
ℂ the set of complex numbers
⊆ is a subset of
⊂ is a proper subset of
⊈ is not a subset of
2. Miscellaneous Symbols
= is equal to
≠ is not equal to
≡ is identical to or is congruent to
≈ is approximately equal to
∝ is proportional to
< is less than
≤; ≯ is less than or equal to; is not greater than
> is greater than
≥; ≮ is greater than or equal to; is not less than
∞ infinity
3. Operations
a+b a plus b
a–b a minus b
a × b, ab, a.b a multiplied by b
a ÷ b, $\dfrac{a}{b}$, a/b a divided by b
$\sum_{i=1}^{n} a_i$ a1 + a2 + ... + an
$\binom{n}{r}$ the binomial coefficient $\dfrac{n!}{r!\,(n-r)!}$, for n, r ∈ ℤ⁺ ∪ {0}, 0 ≤ r ≤ n;
$\dfrac{n(n-1)\cdots(n-r+1)}{r!}$, for n ∈ ℚ, r ∈ ℤ⁺ ∪ {0}
4. Functions
f function f
f(x) the value of the function f at x
f: A →B f is a function under which each element of set A has an image in set B
f: x ↦ y the function f maps the element x to the element y
f⁻¹ the inverse of the function f
g ∘ f, gf the composite function of f and g which is defined by (g ∘ f)(x) or gf(x) = g(f(x))
∆x ; δx an increment of x
$\dfrac{dy}{dx}$ the derivative of y with respect to x
$\dfrac{d^n y}{dx^n}$ the nth derivative of y with respect to x
f′(x), f″(x), …, f⁽ⁿ⁾(x) the first, second, … nth derivatives of f(x) with respect to x
7. Complex Numbers
i square root of –1
z a complex number, z = x + iy
= r(cos θ + i sin θ), r ∈ ℝ₀⁺
= $re^{i\theta}$, r ∈ ℝ₀⁺
Re z the real part of z, Re (x + iy) = x
Im z the imaginary part of z, Im (x + iy) = y
|z| the modulus of z, |x + iy| = √(x² + y²), |r(cos θ + i sin θ)| = r
arg z the argument of z, arg(r(cos θ + i sin θ)) = θ, –π < θ ≤ π
z* the complex conjugate of z, (x + iy)* = x – iy
8. Matrices
M a matrix M
M⁻¹ the inverse of the square matrix M
Mᵀ the transpose of the matrix M
det M the determinant of the square matrix M
9. Vectors
a the vector a
$\overrightarrow{AB}$ the vector represented in magnitude and direction by the directed line segment AB
â a unit vector in the direction of the vector a
i, j, k unit vectors in the directions of the cartesian coordinate axes
|a| the magnitude of a
$|\overrightarrow{AB}|$ the magnitude of $\overrightarrow{AB}$
A, B, C, etc. events
A∪B union of events A and B
A∩B intersection of the events A and B
P(A) probability of the event A
A' complement of the event A, the event ‘not A’
P(A | B) probability of the event A given the event B
X, Y, R, etc. random variables
x, y, r, etc. value of the random variables X, Y, R, etc.
x1, x2, … observations
f1, f2, … frequencies with which the observations x1, x2, … occur
p(x) the value of the probability function P(X = x) of the discrete random variable X
p1, p2, … probabilities of the values x1, x2, … of the discrete random variable X
f(x), g(x), … the value of the probability density function of the continuous random variable X
F(x), G(x), … the value of the (cumulative) distribution function P(X ≤ x) of the random variable X
E(X) expectation of the random variable X
E[g(X)] expectation of g(X)
Var(X) variance of the random variable X
B(n, p) binomial distribution, parameters n and p
Po(µ) Poisson distribution, mean µ
N(µ, σ²) normal distribution, mean µ and variance σ²
µ population mean
σ² population variance
σ population standard deviation
x̄ sample mean
s² unbiased estimate of population variance from a sample, $s^2 = \dfrac{1}{n-1}\sum (x - \bar{x})^2$
φ probability density function of the standardised normal variable with distribution N(0, 1)
Φ corresponding cumulative distribution function
ρ linear product-moment correlation coefficient for a population
r linear product-moment correlation coefficient for a sample
83 Old List of Formulae (MF15)
Reproduced on the following pages is List MF15. This old List of Formulae and Statistical Tables will be used through 2016.
Part IX
Appendices (Optional)
The discussion in the main text above has not always been complete, precise, and rigorous.
In these appendices, I fill in these gaps. In particular, I give formal definitions, statements
of claims, and proofs of claims.
In general, where there is a trade-off between generality of a result and the simplicity of its
proof, I favour the latter.
84.1 Sets
Fact 1. Two sets are subsets of each other ⇐⇒ they are identical.
Proof. (1) If every element in A is also in B and every element in B is also in A, then both
sets contain exactly the same elements. By Definition 3 then, A = B.
(2) If A = B, then both sets contain exactly the same elements. Hence, every element in A
is also in B and every element in B is also in A.
When a function is given the above definition, there is no formal distinction between a
function and its graph. A function is what we called its graph — that is, a function is a
set of points.
If it seems strange to you that a function is defined to be a set, you might find it stranger
still that every mathematical object can be defined in terms of sets.
For example, in standard set theory, very strangely, the number 0 is defined to be the
empty set {} = ∅. The number 1 is defined to be {0} = {{}} = {∅}. The number 2 is
defined to be {0, 1} = {{} , {{}}} = {∅, {∅}}. The number 3 is defined to be {0, 1, 2} =
{{} , {{}} , {{} , {{}}}} = {∅, {∅} , {∅, {∅}}}. Etc.
Sets are what mathematicians call a primitive notion. That is, sets are left undefined
(though they do have to satisfy certain axioms). But having summoned out of the void
this single undefined object called the set, mathematicians can then define every other
mathematical object based on the set. It is in this sense that the set is the basic building
block out of which every other mathematical object can be built. The idea is to have just
one undefined object, then define everything else based on this single undefined object.
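The construction of 0, 1, 2, 3 from the empty set described above can be mimicked with Python's immutable sets. This is a minimal sketch: the name `successor` is an illustrative assumption, and the rule it uses (each number is the set of all smaller numbers, so the next number is n ∪ {n}) is the standard von Neumann construction, which matches the four examples given in the text.

```python
def successor(n):
    """Next natural number: the set n together with n itself as a new element."""
    return n | frozenset([n])

zero = frozenset()          # 0 = {}
one = successor(zero)       # 1 = {0}
two = successor(one)        # 2 = {0, 1}
three = successor(two)      # 3 = {0, 1, 2}

print(three == frozenset([zero, one, two]))
print(len(three))
```

With this encoding the number n is a set with exactly n elements, which is one reason the construction is convenient.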
Fact 5 (reproduced from p. 92). Let (a, b) be a point. Its reflection in the line y = x
is the point (b, a).
Proof. Let (p, q) be the reflection of the point (a, b) in the line y = x.
Consider the line through the points (a, b) and (p, q). It is perpendicular to the line y = x, whose slope is 1. And so the slope of this line must be −1. Thus, q − b = −1 × (p − a), or
$$q - b = a - p. \tag{1}$$
Now consider the midpoint of the line segment connecting (a, b) and (p, q), namely
$$\left(\frac{a+p}{2}, \frac{b+q}{2}\right).$$
This point must be on the line y = x. And so, $\dfrac{b+q}{2} = \dfrac{a+p}{2}$, or
$$b + q = a + p. \tag{2}$$
Taking (1) plus (2), we have 2q = 2a or q = a. Taking (2) minus (1), we have 2p = 2b or p = b.
Altogether, (p, q) = (b, a), as desired.
Similarly,
Fact 6 (reproduced from p. 92). Let (a, b) be a point. Its reflection in the line y = −x
is the point (−b, −a).
Proof. Let (p, q) be the reflection of the point (a, b) in the line y = −x.
Consider the line through the points (a, b) and (p, q). It is perpendicular to the line y = −x, whose slope is −1. And so the slope of this line must be 1. Thus, q − b = 1 × (p − a), or
$$q - b = p - a. \tag{1}$$
Now consider the midpoint of the line segment connecting (a, b) and (p, q), namely
$$\left(\frac{a+p}{2}, \frac{b+q}{2}\right).$$
This point must be on the line y = −x. And so, $\dfrac{b+q}{2} = -\dfrac{a+p}{2}$, or
$$b + q = -a - p. \tag{2}$$
Taking (1) plus (2), we have 2q = −2a or q = −a. Taking (2) minus (1), we have −2p = 2b or p = −b.
Altogether, (p, q) = (−b, −a), as desired.
Fact 86. Let (p, q) be a point. Its reflection in the line ay + bx + c = 0 is the point
$$\left(\frac{(a^2-b^2)p - 2b(aq+c)}{a^2+b^2},\ \frac{(b^2-a^2)q - 2a(bp+c)}{a^2+b^2}\right).$$
Proof. Consider the line that is perpendicular to the line of reflection and which contains (p, q). It can be written as −by + ax + d = 0, where d is an unknown. Since (p, q) is on this line, we have −bq + ap + d = 0, so that d = bq − ap. So the perpendicular line is −by + ax + bq − ap = 0.
The intersection of the line of reflection and the perpendicular line we just found is given by the system of equations:
$$ay + bx + c = 0 \quad \text{(equation of line of reflection)}, \tag{1}$$
$$-by + ax + bq - ap = 0 \quad \text{(equation of perpendicular line)}. \tag{2}$$
Take $b \times (1)$ plus $a \times (2)$ and do the algebra to get $x = \dfrac{a^2 p - b(aq + c)}{a^2 + b^2}$.
Similarly, take $a \times (1)$ minus $b \times (2)$ and do the algebra to get $y = \dfrac{b^2 q - a(bp + c)}{a^2 + b^2}$.
(x, y) is the midpoint between (p, q) and the reflection point we are looking for. Thus, our reflection point has x- and y-coordinates
$$2x - p = \frac{(a^2-b^2)p - 2b(aq+c)}{a^2+b^2}, \qquad 2y - q = \frac{(b^2-a^2)q - 2a(bp+c)}{a^2+b^2}.$$
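The proof above is constructive: find the foot of the perpendicular, then double it and subtract the original point. The steps can be sketched as follows, where the function name `reflect` is an illustrative assumption and a and b are assumed not both zero.

```python
def reflect(p, q, a, b, c):
    """Reflect the point (p, q) in the line a*y + b*x + c = 0."""
    denom = a*a + b*b
    # Intersection of the line of reflection with the perpendicular through (p, q).
    x = (a*a*p - b*(a*q + c)) / denom
    y = (b*b*q - a*(b*p + c)) / denom
    # (x, y) is the midpoint of (p, q) and its reflection.
    return (2*x - p, 2*y - q)

print(reflect(3, 5, 1, -1, 0))   # line y = x, written as y - x = 0
print(reflect(3, 5, 1, 1, 0))    # line y = -x, written as y + x = 0
```

The two calls recover the special cases of Facts 5 and 6: reflecting (3, 5) in y = x swaps the coordinates, and reflecting it in y = −x swaps and negates them.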
Fact 7 (reproduced from p. 94). Let f be an invertible function. Then the reflection of the graph of f in the line y = x is the graph of its inverse function f⁻¹.
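Fact 87. The graph of $y = \dfrac{bx + c}{dx + e}$ has no turning points.
Proof. Compute
$$\begin{aligned}
\frac{dy}{dx} &= \frac{d}{dx}\left[\frac{b}{d} + \frac{cd - be}{d^2}\cdot\frac{1}{x + e/d}\right] \\
&= \frac{d}{dx}\frac{b}{d} + \frac{d}{dx}\left[\frac{cd - be}{d^2}\cdot\frac{1}{x + e/d}\right] \\
&= 0 + \frac{cd - be}{d^2}\,\frac{d}{dx}\left[\frac{1}{x + e/d}\right] \\
&= \frac{cd - be}{d^2}\left[(-1)\frac{1}{(x + e/d)^2}\right].
\end{aligned}$$
Provided cd − be ≠ 0 (otherwise y is constant), this derivative is non-zero wherever it is defined, so the graph has no turning points.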
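$$\begin{array}{r|l}
 & \dfrac{a}{d}x + \dfrac{b}{d} - \dfrac{ae}{d^2} \\ \hline
dx + e & ax^2 + bx + c \\
 & ax^2 + \dfrac{ae}{d}x \\ \hline
 & \left(b - \dfrac{ae}{d}\right)x + c \\
 & \left(b - \dfrac{ae}{d}\right)x + \left(\dfrac{b}{d} - \dfrac{ae}{d^2}\right)e \\ \hline
 & c + \left(\dfrac{ae}{d^2} - \dfrac{b}{d}\right)e
\end{array}$$
The “quotient” is $\dfrac{a}{d}x + \dfrac{b}{d} - \dfrac{ae}{d^2}$ and the “remainder” is $c + \left(\dfrac{ae}{d^2} - \dfrac{b}{d}\right)e$. Let’s see if we can simplify this so that x in the denominator has no coefficient:
$$\begin{aligned}
\frac{ax^2 + bx + c}{dx + e} &= \frac{a}{d}x + \frac{b}{d} - \frac{ae}{d^2} + \frac{c + \left(\frac{ae}{d^2} - \frac{b}{d}\right)e}{dx + e} \\
&= \frac{a}{d}x + \frac{b}{d} - \frac{ae}{d^2} + \frac{1}{d}\cdot\frac{c + \left(\frac{ae}{d^2} - \frac{b}{d}\right)e}{x + e/d} \\
&= \frac{a}{d}x + \frac{bd - ae}{d^2} + \frac{c + \left(\frac{ae}{d^2} - \frac{b}{d}\right)e}{d}\cdot\frac{1}{x + e/d} \\
&= \frac{a}{d}x + \frac{bd - ae}{d^2} + \frac{d^2 c + (ae - bd)e}{d^3}\cdot\frac{1}{x + e/d}.
\end{aligned}$$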
Recall that to rule out trivial cases, we assumed that d ≠ 0; c and e are not both 0; and
a ≠ 0.
Now in addition, we’ll also assume that $d^2 c + (ae - bd)e \neq 0$ (otherwise the function is a linear function).
We now examine the hyperbola’s intercepts, turning points, asymptotes, centre, and lines
of symmetry.
The horizontal intercepts are given by the roots of the equation $ax^2 + bx + c = 0$. So if $b^2 - 4ac < 0$, then there are no horizontal intercepts. Otherwise the two horizontal intercepts are $\dfrac{-b \pm \sqrt{b^2 - 4ac}}{2a}$ (identical if $b^2 - 4ac = 0$). And so the graph intersects the horizontal axis at the points $\left(\dfrac{-b - \sqrt{b^2 - 4ac}}{2a}, 0\right)$ and $\left(\dfrac{-b + \sqrt{b^2 - 4ac}}{2a}, 0\right)$.
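2. Turning points. Compute $\dfrac{dy}{dx} = \dfrac{a}{d} - \dfrac{d^2 c + (ae - bd)e}{d^3}\cdot\dfrac{1}{(x + e/d)^2}$. Setting this equal to zero, we have $(dx + e)^2 = \dfrac{d^2 c + (ae - bd)e}{a}$. Compute also $\dfrac{d^2 y}{dx^2} = 2\,\dfrac{d^2 c + (ae - bd)e}{(dx + e)^3}$. And thus, provided $\dfrac{d^2 c + (ae - bd)e}{a} > 0$, the turning points are at
$$x = \frac{-e - \sqrt{\dfrac{d^2 c + (ae - bd)e}{a}}}{d} \quad\text{and}\quad x = \frac{-e + \sqrt{\dfrac{d^2 c + (ae - bd)e}{a}}}{d}.$$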
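\[
y = \left(\frac{a}{d} + \sqrt{\frac{a^2}{d^2} + 1}\right)x + \frac{bd - ae}{d^2} + \frac{e}{d}\sqrt{\frac{a^2}{d^2} + 1},
\]
\[
y = \left(\frac{a}{d} - \sqrt{\frac{a^2}{d^2} + 1}\right)x + \frac{bd - ae}{d^2} - \frac{e}{d}\sqrt{\frac{a^2}{d^2} + 1}.
\]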
The proof that the above are indeed the lines of symmetry for $y = \dfrac{ax^2 + bx + c}{dx + e}$ simply involves a load of messy and boring algebra, which we’ll omit.
Here are the formal definitions of convergent and divergent sequences and series.
Definition 136. Let (an) = (a1, a2, a3, . . . ) be an infinite sequence. If for every ε > 0 there exists N such that for any n ≥ N, an ∈ (L − ε, L + ε), then the sequence (an) is convergent; and moreover, it converges to L (L is called the limit of the sequence). We can also write (an) → L.
A sequence that is not convergent is divergent and its limit does not exist.
86.1 Vectors in 2D
Fact 22 (reproduced from p. 277). Let a and b be any two non-zero vectors. Then â = b̂ ⇐⇒ a can be written as a positive scalar multiple of b.
Proof. (⟸) Suppose a = cb for some c > 0. Then $\hat{a} = \widehat{cb} = \dfrac{cb}{c\,|b|} = \dfrac{b}{|b|} = \hat{b}$.
(⟹) Suppose â = b̂. Then $\dfrac{a}{|a|} = \hat{a} = \hat{b}$ and $\dfrac{b}{|b|} = \hat{b}$, so that indeed $a = |a|\,\hat{a} = |a|\,\hat{b} = |a|\,\dfrac{b}{|b|}$ can be written as a positive scalar multiple of b.
Fact 23 (reproduced from p. 277 above). Let a and b be any two vectors in the same
plane with distinct direction vectors. Then every vector in the same plane can be written
as αa + βb for some α, β ∈ R.
Proof. I prove only the 2D case. (For higher dimensions, it is much easier to use linear
algebra, but this is not covered in H2 maths.)
Let a = (a1 , a2 ) and b = (b1 , b2 ). Let c = (c1 , c2 ) be any vector.
Observe that a1b2 ≠ a2b1, because if a1b2 = a2b1, then a1, a2, b1, b2 ≠ 0 (otherwise both a and b are zero vectors) and so $\dfrac{a_1}{a_2} = \dfrac{b_1}{b_2}$, in which case a and b have the same direction vector, contradicting our assumption.
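Then we do indeed have c = (αa1 + βb1, αa2 + βb2) if we pick α and β such that
\[
\alpha a_1 + \beta b_1 \overset{1}{=} c_1, \qquad \alpha a_2 + \beta b_2 \overset{2}{=} c_2.
\]
Taking $b_2 \times \overset{1}{=}$ minus $b_1 \times \overset{2}{=}$ yields $\alpha a_1 b_2 - \alpha a_2 b_1 \overset{3}{=} b_2 c_1 - b_1 c_2$.
Taking $a_2 \times \overset{1}{=}$ minus $a_1 \times \overset{2}{=}$ yields $\beta a_2 b_1 - \beta a_1 b_2 \overset{4}{=} a_2 c_1 - a_1 c_2$.
Since a1b2 ≠ a2b1, we can pick
\[
\alpha = \frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - a_2 b_1} \quad\text{and}\quad \beta = \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}.
\]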
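To see the formulas for α and β in action, here is a minimal Python sketch. The function name `decompose` and the sample vectors are my own choices for illustration and are not part of the proof.

```python
def decompose(a, b, c):
    """Return (alpha, beta) such that c = alpha*a + beta*b for 2D vectors.

    Uses the closed-form expressions for alpha and beta from the proof.
    """
    a1, a2 = a
    b1, b2 = b
    c1, c2 = c
    det = a1 * b2 - a2 * b1  # the quantity a1*b2 - a2*b1 from the proof
    if det == 0:
        raise ValueError("a and b are parallel (or zero), so no unique answer")
    alpha = (b2 * c1 - b1 * c2) / det
    beta = (a1 * c2 - a2 * c1) / det
    return alpha, beta


# Hypothetical example: write c = (5, 5) in terms of a = (1, 2), b = (3, 1).
alpha, beta = decompose((1, 2), (3, 1), (5, 5))
print(alpha, beta)
```

The guard on `det` mirrors the observation in the proof that a1b2 ≠ a2b1 whenever a and b have distinct direction vectors.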
Fact 24 (reproduced from p. 280). Let a, b, and c be vectors. Then a⋅(b+c) = a⋅b+a⋅c.
Moreover, (a + b) ⋅ c = a ⋅ c + b ⋅ c.
Proof. I prove only the 2D case. Let a = (a1 , a2 ), b = (b1 , b2 ), and c = (c1 , c2 ). Then
Fact 26 (reproduced from p. 281). Let u and v be two vectors (of any dimension)
and θ ∈ [0, π] be the angle between them. Then u ⋅ v = ∣u∣ ∣v∣ cos θ.
Proof. Let u and v correspond to two sides of a triangle. Then u−v corresponds to the third
side and θ is the angle opposite this third side. Then by Proposition 6 (Law of Cosines),
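\[
\begin{aligned}
& |u - v|^2 = |u|^2 + |v|^2 - 2\,|u|\,|v|\cos\theta \\
\iff\ & (u - v)\cdot(u - v) = u\cdot u + v\cdot v - 2\,|u|\,|v|\cos\theta \\
\iff\ & u\cdot u + v\cdot v - 2\,u\cdot v = u\cdot u + v\cdot v - 2\,|u|\,|v|\cos\theta \\
\iff\ & -2\,u\cdot v = -2\,|u|\,|v|\cos\theta \\
\iff\ & u\cdot v = |u|\,|v|\cos\theta \\
\iff\ & \cos\theta = \frac{u\cdot v}{|u|\,|v|}.
\end{aligned}
\]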
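The last line gives a practical recipe for computing the angle between two non-zero vectors. Here is a small Python sketch of that recipe. The helper name `angle_between` is my own, and the clamping step is a floating-point precaution rather than part of the mathematics.

```python
import math


def angle_between(u, v):
    """Angle in [0, pi] between two non-zero vectors of equal dimension."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    cos_theta = dot / (norm_u * norm_v)
    # Rounding can push the ratio slightly outside [-1, 1]; clamp it.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


print(angle_between((1, 0), (0, 1)))        # perpendicular vectors
print(angle_between((1, 0, 0), (1, 1, 0)))  # 45 degrees, in radians
```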
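Theorem 3 (reproduced from p. 278). Ratio Theorem. Let a and b be points. Let p be a point on the line segment ab. Then
\[
p = \frac{|\overrightarrow{bp}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,a + \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,b.
\]
\[
\overrightarrow{ap} = \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,(b - a).
\]
Hence,
\[
\begin{aligned}
p &= a + \overrightarrow{ap} \\
&= a + \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,(b - a) \\
&= \left(1 - \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\right)a + \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,b \\
&= \frac{|\overrightarrow{bp}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,a + \frac{|\overrightarrow{ap}|}{|\overrightarrow{ap}| + |\overrightarrow{bp}|}\,b.
\end{aligned}
\]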
Fact 29 (reproduced from p. 294). Let u and v be two non-zero 2D vectors and
θ ∈ [0, π] be the angle between them. Then the scalar u × v is equal to either ∣u∣ ∣v∣ sin θ or
− ∣u∣ ∣v∣ sin θ.
Proof. Let α and β be the angles that u and v make with the positive x-axis (the angles
are measured counter-clockwise). Then
u × v = ux vy − uy vx = ∣u∣ cos α ∣v∣ sin β − ∣u∣ sin α ∣v∣ cos β = ∣u∣ ∣v∣ sin (β − α).
Case #1. If β ≥ α, then θ = β − α and so sin (β − α) = sin θ. Thus, u × v = ∣u∣ ∣v∣ sin θ, as
desired.
Case #2. If α > β, then θ = α − β and so sin (β − α) = sin (−θ) = − sin θ. Thus, u × v =
− ∣u∣ ∣v∣ sin θ, as desired.
Lemma 2. a ⋅ (b × c) = (a × b) ⋅ c.
Proof. We can similarly prove that the parallelepiped with sides of lengths ∣a∣, ∣b∣, and ∣c∣ also has volume (a × b) ⋅ c.
d ⋅ d = d ⋅ [a × (b + c) − (a × b + a × c)]
= d ⋅ [a × (b + c)] − d ⋅ (a × b + a × c)
= d ⋅ [a × (b + c)] − d ⋅ (a × b) − d ⋅ (a × c)
= (d × a) ⋅ (b + c) − (d × a) ⋅ b − (d × a) ⋅ c
= (d × a) ⋅ b + (d × a) ⋅ c − (d × a) ⋅ b − (d × a) ⋅ c
= 0,
where the second and third lines follow from the distributivity of scalar product, and the fourth line uses Lemma 2.
d ⋅ d = 0 ⇐⇒ d = 0. Thus, a × (b + c) = (a × b + a × c), as desired.
\[
u \times v = \begin{pmatrix} u_y v_z - u_z v_y \\ u_z v_x - u_x v_z \\ u_x v_y - u_y v_x \end{pmatrix}.
\]
Proof.
u×v= (ux i + uy j + uz k) × (vx i + vy j + vz k)
= ux i × (vx i + vy j + vz k)
+ uy j × (vx i + vy j + vz k) (distributivity)
+ uz k × (vx i + vy j + vz k)
= ux vx (i × i) + ux vy (i × j) + ux vz (i × k)
+ uy vx (j × i) + uy vy (j × j) + uy vz (j × k) (distributivity)
+ uz vx (k × i) + uz vy (k × j) + uz vz (k × k)
= 0 + ux vy k + ux vz (−j)
+ uy vx (−k) + 0 + uy vz i (Fact 32)
+ uz vx j + uz vy (−i) + 0
= (uy vz − uz vy ) i
+ (uz vx − ux vz ) j
+ (ux vy − uy vx ) k
$= \begin{pmatrix} u_y v_z - u_z v_y \\ u_z v_x - u_x v_z \\ u_x v_y - u_y v_x \end{pmatrix}.$
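The component formula is easy to check by machine. Below is a short Python sketch, with `cross` being my own name for the helper, that implements the three components exactly as they appear in the final line and confirms the familiar products of the unit vectors i, j, and k.

```python
def cross(u, v):
    """Vector product of two 3D vectors, written out component by component."""
    ux, uy, uz = u
    vx, vy, vz = v
    return (uy * vz - uz * vy,
            uz * vx - ux * vz,
            ux * vy - uy * vx)


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j))  # expect k
print(cross(j, k))  # expect i
print(cross(k, i))  # expect j
```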
Fact 34 (reproduced from p. 306). The line with vector equation r = (p1 , p2 ) + λ(v1 , v2 )
(for λ ∈ R) is the line with cartesian equations as given by the 3 cases below.
(1) $\dfrac{x - p_1}{v_1} = \dfrac{y - p_2}{v_2}$, if v1, v2 ≠ 0;
(2) x = p1 , y is free, if v1 = 0, v2 ≠ 0;
(3) x is free, y = p2 , if v2 = 0, v1 ≠ 0;
Eliminate λ by taking v2 times the equation for x minus v1 times the equation for y:
v2 x − v1 y = v2 p1 + λv1 v2 − v1 p2 − λv1 v2
= v2 p1 − v1 p2.
Rearranging, we have v2 (x − p1) = v1 (y − p2).
(1) If v1, v2 ≠ 0, then we can divide both sides by v1 v2 to find that indeed $\dfrac{x - p_1}{v_1} = \dfrac{y - p_2}{v_2}$.
(2) If v1 = 0, v2 ≠ 0, then indeed x = p1 and y is free to vary.
Fact 35 (reproduced from p. 310). The line with vector equation r = (p1 , p2 , p3 ) +
λ(v1 , v2 , v3 ) (for λ ∈ R) is the line with cartesian equations as given by the 7 cases below.
(1) $\dfrac{x - p_1}{v_1} = \dfrac{y - p_2}{v_2} = \dfrac{z - p_3}{v_3}$, if v1, v2, v3 ≠ 0;
(2) x = p1, $\dfrac{y - p_2}{v_2} = \dfrac{z - p_3}{v_3}$, if v1 = 0, v2, v3 ≠ 0;
(3) y = p2, $\dfrac{x - p_1}{v_1} = \dfrac{z - p_3}{v_3}$, if v2 = 0, v1, v3 ≠ 0;
(4) z = p3, $\dfrac{x - p_1}{v_1} = \dfrac{y - p_2}{v_2}$, if v3 = 0, v1, v2 ≠ 0;
v1 v2
(5) x = p1 , y = p2 , z is free, if v1 , v2 = 0, v3 ≠ 0;
(6) x = p1 , z = p3 , y is free, if v1 , v3 = 0, v2 ≠ 0;
(7) y = p2 , z = p3 , x is free, if v2 , v3 = 0, v1 ≠ 0.
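Proof. Write $x \overset{1}{=} p_1 + \lambda v_1$, $y \overset{2}{=} p_2 + \lambda v_2$, and $z \overset{3}{=} p_3 + \lambda v_3$. Taking $v_2 \times \overset{1}{=}$ minus $v_1 \times \overset{2}{=}$ yields:
v2 x − v1 y = v2 p1 + λv1 v2 − v1 p2 − λv1 v2 = v2 p1 − v1 p2.
Rearrange the above into $v_2 x - v_1 y + v_1 p_2 - v_2 p_1 \overset{4}{=} 0$. Similarly, taking $v_3 \times \overset{2}{=}$ minus $v_2 \times \overset{3}{=}$ yields $v_3 y - v_2 z + v_2 p_3 - v_3 p_2 \overset{5}{=} 0$. Finally, taking $v_1 \times \overset{3}{=}$ minus $v_3 \times \overset{1}{=}$ yields $v_1 z - v_3 x + v_3 p_1 - v_1 p_3 \overset{6}{=} 0$.
(1) If v1, v2, v3 ≠ 0, then $\overset{4}{=}$ and $\overset{5}{=}$ become $\dfrac{x - p_1}{v_1} \overset{4}{=} \dfrac{y - p_2}{v_2}$ and $\dfrac{y - p_2}{v_2} \overset{5}{=} \dfrac{z - p_3}{v_3}$.
(2) If instead v1 = 0 but v2, v3 ≠ 0, then $\overset{4}{=}$ and $\overset{5}{=}$ become $x \overset{4}{=} p_1$ and $\dfrac{y - p_2}{v_2} \overset{5}{=} \dfrac{z - p_3}{v_3}$.
I omit the proofs of cases (3) and (4), which are very similar.
(7) If v1 ≠ 0 but v2, v3 = 0, then $\overset{4}{=}$ and $\overset{6}{=}$ become $y \overset{4}{=} p_2$ and $z \overset{6}{=} p_3$.
I omit the proofs of cases (5) and (6), which are very similar.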
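\[
\sqrt{(a_1 - p_1 - \lambda v_1)^2 + (a_2 - p_2 - \lambda v_2)^2 + (a_3 - p_3 - \lambda v_3)^2}
= \sqrt{\lambda^2 |v|^2 + |\overrightarrow{pa}|^2 - 2\lambda\,\overrightarrow{pa}\cdot v}.
\]
\[
\frac{d}{d\lambda}\left[\lambda^2 |v|^2 + |\overrightarrow{pa}|^2 - 2\lambda\,\overrightarrow{pa}\cdot v\right]
= 2\lambda |v|^2 - 2\,\overrightarrow{pa}\cdot v \overset{\text{set}}{=} 0
\iff \lambda = \frac{v\cdot(a - p)}{|v|^2} = \frac{\hat{v}\cdot\overrightarrow{pa}}{|v|}.
\]
(a) Hence, the distance between the point and the line is
\[
\begin{aligned}
\sqrt{\lambda^2 |v|^2 + |\overrightarrow{pa}|^2 - 2\lambda\,\overrightarrow{pa}\cdot v}
&= \sqrt{(\hat{v}\cdot\overrightarrow{pa})^2 + |\overrightarrow{pa}|^2 - 2\,\frac{\hat{v}\cdot\overrightarrow{pa}}{|v|}\,\overrightarrow{pa}\cdot v} \\
&= \sqrt{(\hat{v}\cdot\overrightarrow{pa})^2 + |\overrightarrow{pa}|^2 - 2\,(\hat{v}\cdot\overrightarrow{pa})(\hat{v}\cdot\overrightarrow{pa})} \\
&= \sqrt{(\hat{v}\cdot\overrightarrow{pa})^2 - 2\,(\hat{v}\cdot\overrightarrow{pa})^2 + |\overrightarrow{pa}|^2} \\
&= \sqrt{|\overrightarrow{pa}|^2 - (\hat{v}\cdot\overrightarrow{pa})^2}, \text{ as desired.}
\end{aligned}
\]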
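As a sanity check, here is a rough Python sketch of the closed-form distance from a point a to the line r = p + λv, using the final expression: the square root of ∣pa∣² minus (v̂ ⋅ pa)². The function name and the test point are my own, chosen so that the answer is obvious by inspection.

```python
import math


def distance_point_to_line(a, p, v):
    """Distance from point a to the line through p with direction v."""
    pa = [ai - pi for ai, pi in zip(a, p)]              # vector from p to a
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    v_hat_dot_pa = sum(vi * x for vi, x in zip(v, pa)) / norm_v
    pa_squared = sum(x * x for x in pa)
    # max(...) guards against a tiny negative value caused by rounding.
    return math.sqrt(max(0.0, pa_squared - v_hat_dot_pa ** 2))


# The point (3, 4, 0) is 4 units away from the x-axis.
print(distance_point_to_line((3, 4, 0), (0, 0, 0), (1, 0, 0)))
```

Note that rescaling the direction vector v does not change the answer, since only the unit vector v̂ enters the formula.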
\[
\begin{aligned}
(p + \lambda v)\cdot n &= d \\
p\cdot n + \lambda\, v\cdot n &= d \\
\lambda\, v\cdot n &\overset{1}{=} d - p\cdot n,
\end{aligned}
\]
where the second line uses the distributivity of the scalar product.
Case #1. The plane and line are not parallel.
Then v ⋅ n ≠ 0 and we can divide both sides by v ⋅ n to get
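\[
\lambda = \frac{d - p\cdot n}{v\cdot n}.
\]
Substituting this λ into p + λv gives the intersection point
\[
p + \frac{d - p\cdot n}{v\cdot n}\,v.
\]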
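Case #1 translates directly into a few lines of code. The sketch below is illustrative only: the function names are my own, and returning `None` for the parallel case is my choice for the example, not something the text prescribes.

```python
def dot(x, y):
    """Scalar product of two vectors given as tuples."""
    return sum(xi * yi for xi, yi in zip(x, y))


def line_plane_intersection(p, v, n, d):
    """Intersection of the line r = p + lambda*v with the plane r . n = d.

    Returns None when v . n = 0 (line parallel to the plane), since the
    formula for lambda is then unavailable.
    """
    v_dot_n = dot(v, n)
    if v_dot_n == 0:
        return None
    lam = (d - dot(p, n)) / v_dot_n
    return tuple(pi + lam * vi for pi, vi in zip(p, v))


# The z-axis meets the plane z = 5 at (0, 0, 5).
print(line_plane_intersection((0, 0, 0), (0, 0, 1), (0, 0, 1), 5))
```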
(⟸) Trivial: if they intersect along such a line, then of course they intersect.
(⟹) Suppose the two planes intersect at some point p. So p is on both planes and we have $p\cdot n_1 \overset{1}{=} d_1$ and $p\cdot n_2 = d_2$.
Our goal is to show that a point q is on both planes (i.e. q ⋅ n1 = d1 and q ⋅ n2 = d2 ) if and
only if q = p + λ (n1 × n2 ) (for λ ∈ R). That is, the points of intersection are exactly those
points along the line r = p + λ (n1 × n2 ) (for λ ∈ R).
Any point can be written as q = p + λ (n1 × n2 ) + µv, where λ ∈ R and v is some vector that
is not perpendicular to n1 . Then
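\[
\begin{aligned}
q\cdot n_1 &= (p + \lambda\,(n_1\times n_2) + \mu v)\cdot n_1 \\
&= p\cdot n_1 + \lambda\,(n_1\times n_2)\cdot n_1 + \mu\, v\cdot n_1 && \text{(distributivity of scalar product)} \\
&= d_1 + \lambda\,(n_1\times n_2)\cdot n_1 + \mu\, v\cdot n_1 && \text{(using } \overset{1}{=}\text{)} \\
&= d_1 + \mu\, v\cdot n_1 && (\because\ (n_1\times n_2)\perp n_1)
\end{aligned}
\]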
ax + by + cz + d = 0,
ex + f y + gz + h = 0.
If the two planes are not parallel (i.e. (a, b, c) cannot be written as a scalar multiple of
(e, f, g)), then they share at least one point of intersection.
Proof. Pick any (i1 , i2 , i3 ) such that ai1 + bi2 + ci3 = 0 but ei1 + f i2 + gi3 ≠ 0. (This vector
exists because of the assumption that (a, b, c) cannot be written as a scalar multiple of
(e, f, g).)
Pick any (j1 , j2 , j3 ) such that aj1 + bj2 + cj3 + d = 0.
Consider the point
\[
(j_1, j_2, j_3) - \frac{ej_1 + fj_2 + gj_3 + h}{ei_1 + fi_2 + gi_3}\,(i_1, i_2, i_3),
\]
which lies on both planes:
\[
\underbrace{aj_1 + bj_2 + cj_3 + d}_{=0} - \frac{ej_1 + fj_2 + gj_3 + h}{ei_1 + fi_2 + gi_3}\,\underbrace{(ai_1 + bi_2 + ci_3)}_{=0} = 0, \quad\checkmark
\]
\[
ej_1 + fj_2 + gj_3 + h - \frac{ej_1 + fj_2 + gj_3 + h}{ei_1 + fi_2 + gi_3}\,(ei_1 + fi_2 + gi_3) = 0. \quad\checkmark
\]
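Fact 89. Let b > 0. Then (a) the two square roots of a + bi (i.e. the solutions to the equation x² = a + bi) are
\[
\pm\frac{\sqrt{2}}{2}\left(\sqrt{\sqrt{a^2 + b^2} + a} + i\sqrt{\sqrt{a^2 + b^2} - a}\right).
\]
(b) And the two square roots of a − bi (i.e. the solutions to the equation x² = a − bi) are
\[
\pm\frac{\sqrt{2}}{2}\left(\sqrt{\sqrt{a^2 + b^2} + a} - i\sqrt{\sqrt{a^2 + b^2} - a}\right).
\]
Proof.
\[
\begin{aligned}
\text{(a)}\quad &\left[\pm\frac{\sqrt{2}}{2}\left(\sqrt{\sqrt{a^2 + b^2} + a} + i\sqrt{\sqrt{a^2 + b^2} - a}\right)\right]^2 \\
&= \frac{1}{2}\left[\sqrt{a^2 + b^2} + a - \left(\sqrt{a^2 + b^2} - a\right) + 2i\sqrt{\left(\sqrt{a^2 + b^2} + a\right)\left(\sqrt{a^2 + b^2} - a\right)}\right] \\
&= \frac{1}{2}\left[2a + 2i\sqrt{a^2 + b^2 - a^2}\right] \\
&= a + i\sqrt{b^2} = a + ib.
\end{aligned}
\]
\[
\begin{aligned}
\text{(b)}\quad &\left[\pm\frac{\sqrt{2}}{2}\left(\sqrt{\sqrt{a^2 + b^2} + a} - i\sqrt{\sqrt{a^2 + b^2} - a}\right)\right]^2 \\
&= \frac{1}{2}\left[\sqrt{a^2 + b^2} + a - \left(\sqrt{a^2 + b^2} - a\right) - 2i\sqrt{\left(\sqrt{a^2 + b^2} + a\right)\left(\sqrt{a^2 + b^2} - a\right)}\right] \\
&= \frac{1}{2}\left[2a - 2i\sqrt{a^2 + b^2 - a^2}\right] \\
&= a - i\sqrt{b^2} = a - ib.
\end{aligned}
\]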
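If you’d like to test part (a) numerically, here is a small Python sketch. The function name `sqrt_pair` is my own, and the example a = 3, b = 4 is chosen because the answer, ±(2 + i), is easy to verify by hand.

```python
import math


def sqrt_pair(a, b):
    """Both square roots of a + bi, assuming b > 0, via the closed form in (a)."""
    r = math.hypot(a, b)  # sqrt(a^2 + b^2)
    root = (math.sqrt(2) / 2) * complex(math.sqrt(r + a), math.sqrt(r - a))
    return root, -root


plus_root, minus_root = sqrt_pair(3, 4)
print(plus_root, minus_root)
print(plus_root ** 2, minus_root ** 2)  # both should be close to 3 + 4i
```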
Proof. Write $a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_1 x + a_0 = \sum_{k=0}^{n} a_k x^k$. Since a + bi solves $a_n x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \dots + a_1 x + a_0 = 0$, we have $\sum_{k=0}^{n} a_k (a + bi)^k = 0$. Observe that
\[
\left[\sum_{k=0}^{n} a_k (a + bi)^k\right]^* = 0^* = 0.
\]
Now repeatedly use Lemma 3 (above) to show that the LHS expression equals $\sum_{k=0}^{n} a_k (a - bi)^k$:
\[
\begin{aligned}
\left[\sum_{k=0}^{n} a_k (a + bi)^k\right]^*
&= \sum_{k=0}^{n}\left[a_k (a + bi)^k\right]^*
= \sum_{k=0}^{n} a_k^*\left[(a + bi)^k\right]^* \\
&= \sum_{k=0}^{n} a_k\left[(a + bi)^k\right]^*
= \sum_{k=0}^{n} a_k\left[(a + bi)^*\right]^k
= \sum_{k=0}^{n} a_k (a - bi)^k.
\end{aligned}
\]
Proof. Let f ∶ R → C be defined by $f(\theta) = e^{-i\theta}(\cos\theta + i\sin\theta)$. Now take the derivative.87
$f'(\theta) = (-i)e^{-i\theta}(\cos\theta + i\sin\theta) + e^{-i\theta}(-\sin\theta + i\cos\theta) = 0$. Since the only functions whose derivatives are zero are constant functions, f(θ) = C for some constant C.
Thus, $e^{-i\theta}(\cos\theta + i\sin\theta) = C$ or $\cos\theta + i\sin\theta = Ce^{i\theta}$. Plugging in θ = 0 reveals that C = 1 and yields the desired result.
87
We’re actually cheating a little with this proof here, because we haven’t proven how we can take derivatives of complex-valued functions.
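A numerical check is no substitute for the proof, but it is reassuring. The Python sketch below compares e^{iθ} with cos θ + i sin θ at a few sample angles. The function name `euler_gap` and the sample angles are my own choices.

```python
import cmath
import math


def euler_gap(theta):
    """Absolute difference between exp(i*theta) and cos(theta) + i*sin(theta)."""
    lhs = cmath.exp(1j * theta)
    rhs = complex(math.cos(theta), math.sin(theta))
    return abs(lhs - rhs)


for theta in (0.0, 1.0, math.pi / 3, math.pi, -2.5):
    print(theta, euler_gap(theta))  # each gap should be at rounding level
```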
Proof. In each case, examine the largest and smallest φk . If they are both in the interval
(−π, π], then every φk is also in the interval (−π, π].
1. n is odd.
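Since θ ∈ (−π, π], we have $\dfrac{\theta}{n} \in \left(-\dfrac{\pi}{n}, \dfrac{\pi}{n}\right]$.
\[
\max \varphi_k = \frac{\theta}{n} + \frac{2(n-1)}{2}\cdot\frac{\pi}{n} = \frac{\theta}{n} + (n-1)\frac{\pi}{n} \in \left(-\frac{\pi}{n} + (n-1)\frac{\pi}{n},\ \frac{\pi}{n} + (n-1)\frac{\pi}{n}\right] = \left((n-2)\frac{\pi}{n},\ \pi\right].
\]
And so indeed max φk ∈ (−π, π].
\[
\min \varphi_k = \frac{\theta}{n} - \frac{2(n-1)}{2}\cdot\frac{\pi}{n} = \frac{\theta}{n} - (n-1)\frac{\pi}{n} \in \left(-\frac{\pi}{n} - (n-1)\frac{\pi}{n},\ \frac{\pi}{n} - (n-1)\frac{\pi}{n}\right] = \left(-\pi,\ (2-n)\frac{\pi}{n}\right].
\]
And so indeed min φk ∈ (−π, π].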
In this chapter, I sometimes use the symbols ∀ (“for all”) and ∃ (“there exist(s)”).
Informally, “$\lim_{x\to a} f(x) = L$” means “For all values of x that are close to but not equal to a, f(x) is close to (or possibly even equal to) L.” Formally:
Definition 139. Let f be a real function on a real variable. Let L ∈ R. We say that the
limit of f (x) as x approaches a is L if:
This is an example of how mathematical definitions are formed. First we have some in-
tuitive, informal notion in mind (in this case a limit). Then with a little work, we write
down a formal, precise, rigorous definition to formalise our informal notion. Our rigorous
definition leaves no room for ambiguity or alternative interpretations.
Let’s now revisit our earlier examples, but now using our formal definition.
The game here is to figure out, based on your choice of ε, what δ I should pick in order that x ∈ (a − δ, a + δ)∖{a} implies f(x) ∈ (L − ε, L + ε).
In the next example, δ need not depend on ε. Indeed, δ can simply be any positive number! This is because of the peculiar way the function is defined.
Let ε > 0. Pick δ > 0 (that is, pick δ to be any positive number). Then it is indeed true that for all x ∈ (3 − δ, 3 + δ)∖{3}, we have f(x) = 0 ∈ (0 − ε, 0 + ε). So indeed $\lim_{x\to 3} f(x) = 0$.
Definition 140. Let f be a real function on a real variable. Let L ∈ R. We say that the
left-sided limit of f (x) as x approaches a is L if:
Definition 141. Let f be a real function on a real variable. Let L ∈ R. We say that the
right-sided limit of f (x) as x approaches a is L if:
Definition 142. (Two-Sided Infinite Limit.) We say that the limit of f (x) as x ap-
proaches a is ∞ if:
Definition 143. (Left-Sided Infinite Limit.) We say that the limit of f (x) as x ap-
proaches a from the left is ∞ if:
The definition of the right-sided infinite limit (“$\lim_{x\searrow a} f(x) = \infty$”) is very similar and thus omitted. Likewise with the definitions where the limit is −∞ (instead of ∞).
(Note that where I write x ↗ a, some others instead write x ↑ a or x → a− . And where I
write x ↘ a, some others instead write x ↓ a or x → a+ .)
Definition 145. (Limit at Infinity.) We say that the limit of f(x) as x approaches ∞ is L and write “as x → ∞, f(x) → L” or “$\lim_{x\to\infty} f(x) = L$” if:
Definition 147. (Limit at Infinity.) We say that the limit of f (x) as x approaches ∞
is ax + b if:
4. $\lim_{x\to a}\dfrac{1}{f(x)} = \dfrac{1}{L}$ (L ≠ 0), 5. $\lim_{x\to a}\dfrac{g(x)}{f(x)} = \dfrac{M}{L}$ (L ≠ 0).
Proof. We first write down what the statements $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} g(x) = M$ mean:
1. Let ε1 > 0. Pick small εf, εg so that εf + εg ≤ ε1. Pick the δf, δg that correspond to these εf, εg. Now pick δ1 = min {δf, δg}, so that indeed x ∈ (a − δ1, a + δ1)∖{a} implies
f(x) + g(x) ∈ (L + M − (εf + εg), L + M + (εf + εg)) ⊆ (L + M − ε1, L + M + ε1).
The proof that $\lim_{x\to a}[f(x) - g(x)] = L - M$ is very similar and omitted.
2. Let ε2 > 0. Pick small εf so that kεf ≤ ε2. Pick the δf that corresponds to this εf. Then indeed x ∈ (a − δf, a + δf)∖{a} implies
kf(x) ∈ (k(L − εf), k(L + εf)) ⊆ (kL − ε2, kL + ε2).
= (LM − εf M − εg L + εf εg, LM + εf M + εg L + εf εg) ⊆ (LM − ε3, LM + ε3).
The proof for the other cases (where L, M are not both positive) is similar and omitted.
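4. Let $\epsilon_4 \in \left(0, \left|\dfrac{1}{L}\right|\right)$. Pick $\epsilon_f = \dfrac{\epsilon_4 L^2}{1 + \epsilon_4 L}$. Pick δf that corresponds to this εf. First,
\[
\begin{aligned}
\epsilon_4 \in \left(0, \left|\frac{1}{L}\right|\right)
&\implies 1 + 2\epsilon_4 L \begin{cases} \ge 1, & \text{if } L > 0, \\ \le 1, & \text{if } L < 0 \end{cases} \\
&\implies \frac{1}{1 + 2\epsilon_4 L} \begin{cases} \le 1, & \text{if } L > 0, \\ \ge 1, & \text{if } L < 0 \end{cases}
\implies \frac{L}{1 + 2\epsilon_4 L} \begin{cases} \le L, & \text{if } L > 0, \\ \le L, & \text{if } L < 0 \end{cases} \\
&\iff \frac{L}{1 + 2\epsilon_4 L} \le L
\implies \epsilon_4 L \ge \frac{\epsilon_4 L}{1 + 2\epsilon_4 L} \\
&\implies 1 - \frac{\epsilon_4 L}{1 + 2\epsilon_4 L} \overset{1}{\ge} 1 - \epsilon_4 L.
\end{aligned}
\]
\[
\begin{aligned}
\frac{1}{f(x)} &\in \left(\frac{1}{L + \epsilon_f}, \frac{1}{L - \epsilon_f}\right)
= \left(\frac{1}{L + \frac{\epsilon_4 L^2}{1 + \epsilon_4 L}}, \frac{1}{L - \frac{\epsilon_4 L^2}{1 + \epsilon_4 L}}\right)
= \left(\frac{1 + \epsilon_4 L}{L + 2\epsilon_4 L^2}, \frac{1 + \epsilon_4 L}{L}\right) \\
&= \left(\frac{1}{L}\left(\frac{1 + \epsilon_4 L}{1 + 2\epsilon_4 L}\right), \frac{1}{L} + \epsilon_4\right)
= \left(\frac{1}{L}\left(1 - \frac{\epsilon_4 L}{1 + 2\epsilon_4 L}\right), \frac{1}{L} + \epsilon_4\right) \\
&\overset{2}{\subseteq} \left(\frac{1}{L}\left(1 - \epsilon_4 L\right), \frac{1}{L} + \epsilon_4\right)
= \left(\frac{1}{L} - \epsilon_4, \frac{1}{L} + \epsilon_4\right),
\end{aligned}
\]
where $\overset{2}{\subseteq}$ uses $\overset{1}{\ge}$.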
Proof. Let ε > 0. Pick any εf, εg to be small enough so that εf + εg ≤ ε. Pick the δf, δg that correspond to these εf, εg. Now pick δ1 = min {δ, δf, δg}, so that indeed for all x ∈ (a − δ1, a + δ1)∖{a}, we have h(x) ∈ (L − εf, L + εf) ∩ (L − εg, L + εg) = (L − min {εf, εg}, L + min {εf, εg}) ⊆ (L − ε, L + ε).
Definition 149. A function f is left-continuous at a point a if $\lim_{x\nearrow a} f(x) = f(a)$ and right-continuous at a if $\lim_{x\searrow a} f(x) = f(a)$.
Proof. This is obvious from Fact 91 and the definitions of left- and right-continuity.
Definition 154. A function f is left-differentiable at a point a if $\lim_{x\nearrow a}\dfrac{f(x) - f(a)}{x - a}$ exists and right-differentiable at a if $\lim_{x\searrow a}\dfrac{f(x) - f(a)}{x - a}$ exists.
Proof. This is obvious from Fact 91 and the definitions of left- and right-differentiability.
Proof. Consider the circle (from which the trigonometric functions and the radian were
defined). First restrict attention to θ ∈ (0, 0.5π).
Clearly, for all θ ∈ (0, 0.5π), BC < arc AB. But by definition of the radian and the sine function, we have θ = arc AB and sin θ = BC. Thus, sin θ < θ.
Clearly, the area of △OAD is greater than the area of the circular sector OAB. That is, $0.5\tan\theta > \pi 1^2 \times \dfrac{\theta}{2\pi} = 0.5\theta$. Or tan θ > θ.
Altogether then, for all θ ∈ (0, 0.5π), sin θ < θ < tan θ. Since all three quantities are positive, taking reciprocals reverses the inequalities to give $\dfrac{1}{\sin\theta} > \dfrac{1}{\theta} > \dfrac{\cos\theta}{\sin\theta}$. Multiply by sin θ to get $1 > \dfrac{\sin\theta}{\theta} > \cos\theta$.
We can show this last pair of inequalities also holds for all θ ∈ (−0.5π, 0).
Since $\lim_{\theta\to 0} 1 = 1$ and $\lim_{\theta\to 0}\cos\theta = 1$, we have $\lim_{\theta\to 0}\dfrac{\sin\theta}{\theta} = 1$ (Squeeze Theorem).
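To watch the squeeze happen, here is a short Python sketch that evaluates cos θ, (sin θ)/θ, and 1 at a few small angles. The function name `squeeze_bounds` and the sample angles are my own choices.

```python
import math


def squeeze_bounds(theta):
    """Return (cos(theta), sin(theta)/theta, 1.0) for a non-zero angle theta."""
    return math.cos(theta), math.sin(theta) / theta, 1.0


for theta in (1.0, 0.5, 0.1, 0.01, -0.5):
    lower, middle, upper = squeeze_bounds(theta)
    print(theta, lower, middle, upper)  # middle sits between lower and upper
```

As θ shrinks, the lower bound cos θ rises towards 1 and drags (sin θ)/θ along with it.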
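\[
\begin{array}{lll}
\dfrac{d}{dx}k = 0, & \dfrac{d}{dx}\sin x = \cos x, & \dfrac{d}{dx}\ln x = \dfrac{1}{x}, \\[2ex]
\dfrac{d}{dx}(f \pm g) = f' \pm g', & \dfrac{d}{dx}\cos x = -\sin x, & \\[2ex]
\dfrac{d}{dx}kf = kf', & \dfrac{d}{dx}(f\cdot g) = g\cdot f' + f\cdot g', & \\[2ex]
\dfrac{d}{dx}x^k = kx^{k-1}, & \dfrac{d}{dx}\dfrac{f}{g} = \dfrac{g\cdot f' - f\cdot g'}{g\cdot g}, & \\[2ex]
\dfrac{d}{dx}e^x = e^x, & \dfrac{d}{dx}(f\circ g) = \dfrac{d(f\circ g)}{dg}\cdot\dfrac{dg}{dx}. &
\end{array}
\]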
where the penultimate = used Lemma 4.1. Hence, the derivative of f ± g is the function
f ′ ± g ′ with domain A ∩ B, codomain R, and mapping rule x ↦ f ′ (x) ± g ′ (x). We can write
this in shorthand as d (f ± g) /dx = f ′ ± g ′ .
where $\overset{1}{=}$ and $\overset{2}{=}$ used Lemmata 4 and 5, and $\overset{3}{=}$ uses the fact that the cosine function is continuous (admittedly we haven’t proven this yet, but this should be “obvious”). Hence, the derivative of h is the function with domain R, codomain R, and mapping rule x ↦ cos x. We can write this in shorthand as d sin x/dx = cos x.
Cosine. d cos x/dx = d sin (x + π/2) /dx = cos (x + π/2) = − sin x.
Product Rule. f ⋅ g is the function with domain A ∩ B, codomain R, and mapping rule x ↦ f(x)g(x). For every a ∈ A ∩ B, $\lim_{x\to a}\dfrac{f(x)g(x) - f(a)g(a)}{x - a}$ exists and moreover
\[
\lim_{x\to a}\frac{f(x)g(x) - f(a)g(a)}{x - a}
\]
where $\overset{1}{=}$ and $\overset{2}{=}$ used Lemma 4. Hence, the derivative of f ⋅ g is the function f g′ + gf′ with domain A ∩ B, codomain R, and mapping rule x ↦ f g′ + gf′. We can write this in shorthand as d(f ⋅ g)/dx = f g′ + gf′.
However, the above “proof” commits the cardinal sin of (possibly) dividing by zero, because
there is the possibility that g(x) = g(a) for values of x in the neighbourhood of a!
To get around this irksome technicality, we need to play a little trick. Define
\[
\varphi(x) = \begin{cases} \dfrac{f(g(x)) - f(g(a))}{g(x) - g(a)}, & \text{if } g(x) \ne g(a), \\[2ex] f'(g(a)), & \text{if } g(x) = g(a). \end{cases}
\]
Note that
\[
\lim_{x\to a}\varphi(x) \overset{0}{=} \begin{cases} \lim_{x\to a}\dfrac{f(g(x)) - f(g(a))}{g(x) - g(a)} = f'(g(a)), & \text{if } g(x) \ne g(a), \\[2ex] \lim_{x\to a} f'(g(a)) = f'(g(a)), & \text{if } g(x) = g(a). \end{cases}
\]
$\overset{0}{=}$ will be used shortly. Now, observe that
because if g(x) = g(a), then $\overset{1}{=}$ is clearly true; and if g(x) ≠ g(a), then $\overset{1}{=}$ is again clearly true, because
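Quotient Rule. $\dfrac{d}{dx}\dfrac{f}{g} = \dfrac{d}{dx}\left(f\cdot\dfrac{1}{g}\right) \overset{\times}{=} f'\cdot\dfrac{1}{g} + f\cdot\dfrac{d}{dx}\dfrac{1}{g} \overset{P,C}{=} \dfrac{f'}{g} + f\cdot(-1)\dfrac{1}{g\cdot g}\,g' = \dfrac{gf' - fg'}{g\cdot g}$.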
Exponential. On the one hand, $\dfrac{d}{dx}\ln e^x = \dfrac{d}{dx}x = 1$. On the other, $\dfrac{d}{dx}\ln e^x = \dfrac{1}{e^x}\dfrac{d}{dx}e^x$. Hence, $\dfrac{1}{e^x}\dfrac{d}{dx}e^x = 1$. Rearranging, $\dfrac{d}{dx}e^x = e^x$.
Power Rule. Using the Chain Rule and also the derivatives of the natural logarithm and exponential functions, we have:
\[
\frac{dx^n}{dx} = \frac{d}{dx}e^{n\ln x} = e^{n\ln x}\cdot\frac{n}{x} = x^n\cdot\frac{n}{x} = nx^{n-1}.
\]
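None of these rules needs to be taken on faith. The Python sketch below approximates derivatives with a central difference and compares the result against what the rules predict. The helper `derivative`, the step size h, and the sample points are my own choices, and a central difference is only an approximation of the limit in the definition of the derivative.

```python
import math


def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 0.7
# Sine, exponential, and power rules.
print(derivative(math.sin, x), math.cos(x))
print(derivative(math.exp, x), math.exp(x))
print(derivative(lambda t: t ** 3, 2.0), 3 * 2.0 ** 2)

# Quotient rule with f = sin and g = exp: (g f' - f g') / (g g).
quotient = lambda t: math.sin(t) / math.exp(t)
predicted = (math.exp(x) * math.cos(x) - math.sin(x) * math.exp(x)) / math.exp(x) ** 2
print(derivative(quotient, x), predicted)
```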
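Proof.
\[
\begin{aligned}
\lim_{x\to a}\left[f(x) - f(a)\right] &= \lim_{x\to a}\left[(x - a)\frac{f(x) - f(a)}{x - a}\right] \\
&= \lim_{x\to a}(x - a)\cdot\lim_{x\to a}\frac{f(x) - f(a)}{x - a} \\
&= 0\cdot f'(a) = 0.
\end{aligned}
\]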
1. ... f (x) ≥ f (a), then x is a maximum point of f and f (x) a maximum value.
2. ... f (x) ≤ f (a), then x is a minimum point of f and f (x) a minimum value.
3. ... f (x) > f (a), then x is a strict maximum point of f and f (x) a strict maximum value.
4. ... f (x) < f (a), then x is a strict minimum point of f and f (x) a strict minimum value.
Proof. 1. Suppose f is increasing on (a, b). That is, by definition, ∀x1, x2 ∈ (a, b), x2 > x1 ⟹ f(x2) ≥ f(x1). Equivalently, x2 − x1 > 0 ⟹ f(x2) − f(x1) ≥ 0. Equivalently, for all distinct x1, x2 ∈ (a, b), $\dfrac{f(x_2) - f(x_1)}{x_2 - x_1} \ge 0$. This implies that ∀c ∈ (a, b), $\lim_{x\to c}\dfrac{f(x) - f(c)}{x - c} \ge 0$. That is, f′(c) ≥ 0 for all c ∈ (a, b).
Now suppose f′(c) ≥ 0, for all c ∈ (a, b). Then ∃δ > 0 such that ∀c ∈ (a, b), ∀x ∈ (c − δ, c + δ), $\dfrac{f(x) - f(c)}{x - c} \ge 0$. Equivalently, ∀x1, x2 ∈ (c − δ, c + δ), x2 > x1 ⟹ f(x2) ≥ f(x1). Since δ is fixed and the previous sentence is true if we replace c with any other d ∈ (a, b), we have that ∀x1, x2 ∈ (a, b), x2 > x1 ⟹ f(x2) ≥ f(x1).
This completes the proof of 1. The proofs of 2, 3, and 4 are similar and thus omitted.
Proof. Suppose for contradiction that f′(a) > 0. That is, $\lim_{x\to a}\dfrac{f(x) - f(a)}{x - a} > 0$. That is, there exists δ > 0 such that for all x ∈ (a − δ, a + δ)∖{a}, $\dfrac{f(x) - f(a)}{x - a} > 0$. That is, f(x) > f(a) if x > a and f(x) < f(a) if x < a. So by definition then, a can neither be a maximum nor a minimum point.
Similarly, suppose for contradiction that f′(a) < 0 ... (similar reasoning omitted).
We conclude that if a is a maximum or a minimum point, then f′(a) = 0.
Fact 94. Suppose f ∶ D → R is continuous at a. If there exists δ > 0 such that f is increasing
on (a − δ, a) and f is decreasing on (a, a + δ), then f attains a maximum at a. (Similarly,
if there exists δ > 0 such that f is decreasing on (a − δ, a) and f is increasing on (a, a + δ),
then f attains a minimum at a.)
Proof. By continuity, $f(a) = \sup_{x\in(a-\delta,a)} f(x)$ and $f(a) = \sup_{x\in(a,a+\delta)} f(x)$.88 So indeed f(a) ≥ f(x) for all x ∈ (a − δ, a + δ) and f attains a maximum at a.
(The proof of the “similarly” bit is similar and omitted.)
88
$\sup_{x\in A} f(x)$ is the smallest real number L such that L ≥ f(x) for all x ∈ A.
These are the general definitions of concavity and inflexion points, without assuming that
f is differentiable.
Definition 160. A function f is concave downwards (or concave) on an interval if for every
x1 , x2 in that interval and every α ∈ [0, 1],
Definition 161. A function f is concave upwards (or convex) on an interval (in the func-
tion’s domain) if for every x1 , x2 in that interval and every α ∈ [0, 1],
Definition 162. A function f is linear on an interval (in the function’s domain) if for
every x1 , x2 in that interval and every α ∈ [0, 1],
Of course, if a function is linear on an interval, then it is also both concave and convex on
that interval.
The “moreover” bit is to rule out the trivial case where f is simply linear (and thus both
concave and convex) on (a − δ, a + δ). In this case, we do not want to say that a is an
inflexion point.
Proof. Pick any distinct x1, x2, and x3 in the interval. Let $\alpha = \dfrac{x_3 - x_2}{x_3 - x_1} \in [0, 1]$, so that $\alpha x_1 + (1 - \alpha)x_3 = \dfrac{x_3 - x_2}{x_3 - x_1}x_1 + \dfrac{x_2 - x_1}{x_3 - x_1}x_3 = x_2$. And so by Definition 160,
\[
f(x_2) \ge \frac{x_3 - x_2}{x_3 - x_1}f(x_1) + \frac{x_2 - x_1}{x_3 - x_1}f(x_3)
\]
This completes the proof of (a). The proof of (b) is similar and thus omitted.
(b) ∀x ∈ (a−δ, a), f ′ (a) (x − a)+f (a) ≤ f (x) and ∀x ∈ (a, a+δ), f ′ (a) (x − a)+f (a) ≥ f (x),
with at least one of these inequalities being strict.
Here is the informal interpretation of the above fact. Note that the tangent line has equation
y = f ′ (a) (x − a) + f (a). So (a) is the condition that to the left of a, the tangent line at a is
above the graph of f ; but to the right of a, the tangent line is below the graph of f . And
(b) is the condition that to the left of a, the tangent line at a is below the graph of f ; but
to the right of a, the tangent line is above the graph of f .
The additional condition that “at least one of these inequalities is strict” is to avoid the
trivial situation where f is linear on the interval (a − δ, a + δ).
Proof. (a) ∀x ∈ (a − δ, a), f′(a)(x − a) + f(a) ≥ f(x) ⇐⇒ ∀x ∈ (a − δ, a), $f'(a) \ge \dfrac{f(x) - f(a)}{x - a}$ ⇐⇒ ∀x ∈ (a − δ, a), $\lim_{x\to a}\dfrac{f(x) - f(a)}{x - a} \ge \dfrac{f(x) - f(a)}{x - a}$ ⇐⇒ ∃δ > 0 such that ∀b ∈ (a − δ, a + δ), $\dfrac{f(b) - f(a)}{b - a} \ge \dfrac{f(x) - f(a)}{x - a}$ ∀x ∈ (a − δ, a) ⇐⇒ f is concave on (a − δ, a) (Lemma 6).
The proof that “∀x ∈ (a, a + δ), f′(a)(x − a) + f(a) ≤ f(x)” ⇐⇒ “f is convex on (a, a + δ)” is similar and thus omitted.
This completes the proof of (a). The proof of (b) is similar and thus omitted.
Proof. 1. If f ′ (a) = 0 and f ′′ (a) < 0, then ∃δ > 0 ∶ f ′ (x) > 0 for all x ∈ (a−δ, a) and f ′ (x) < 0
for all x ∈ (a, a + δ) . And so by Fact 94, a is a maximum point.
2. The proof is similar and thus omitted.
3. We will show that all four possibilities exist.
Define f ∶ R → R by x ↦ x4 . Then f ′ (0) = f ′′ (0) = 0 and 0 is a minimum point.
Define g ∶ R → R by x ↦ −x4 . Then g ′ (0) = g ′′ (0) = 0 and 0 is a maximum point.
Define h ∶ R → R by x ↦ x3 . Then h′ (0) = h′′ (0) = 0 and 0 is an inflexion point.
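Define i ∶ R → R by
\[
i(x) = \begin{cases} x^5\sin\dfrac{1}{x}, & \text{for } x \ne 0, \\[1ex] 0, & \text{for } x = 0. \end{cases}
\]
\[
i'(x) = \begin{cases} 5x^4\sin\dfrac{1}{x} - x^3\cos\dfrac{1}{x}, & \text{for } x \ne 0, \\[1ex] 0, & \text{for } x = 0, \end{cases}
\]
\[
i''(x) = \begin{cases} 20x^3\sin\dfrac{1}{x} - 8x^2\cos\dfrac{1}{x} - x\sin\dfrac{1}{x}, & \text{for } x \ne 0, \\[1ex] 0, & \text{for } x = 0. \end{cases}
\]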
We indeed have i′ (0) = i′′ (0) = 0. However, near 0, i(x) fluctuates infinitely often between
negative and positive values. So 0 is neither a maximum point nor a minimum point.
Moreover, near 0, i′′ (x) fluctuates infinitely often between negative and positive values. So
there is no interval to the left of 0 on which i is concave or convex. And there is no interval
to the right of 0 on which i is concave or convex. Thus, 0 is not an inflexion point.
Proof. (a) Since f is either strictly increasing or strictly decreasing, it is invertible (or
one-to-one) and so f −1 exists.
(b) Let f −1 (y) be denoted by xy .
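\[
\begin{aligned}
\left(f^{-1}\right)'(a) &= \lim_{y\to a}\frac{f^{-1}(y) - f^{-1}(a)}{y - a} = \lim_{y\to a}\frac{x_y - x_a}{f(x_y) - f(x_a)} = \lim_{y\to a}\frac{1}{\dfrac{f(x_y) - f(x_a)}{x_y - x_a}} \\
&\overset{1}{=} \lim_{x_y\to x_a}\frac{1}{\dfrac{f(x_y) - f(x_a)}{x_y - x_a}} \overset{2}{=} \frac{1}{\lim_{x_y\to x_a}\dfrac{f(x_y) - f(x_a)}{x_y - x_a}} = \frac{1}{f'(x_a)},
\end{aligned}
\]
where $\overset{1}{=}$ uses the continuity of f and $\overset{2}{=}$ uses a Limit Law and the fact that $\lim_{x_y\to x_a}\dfrac{f(x_y) - f(x_a)}{x_y - x_a}$ exists and is not equal to 0. (Recall that x_a denotes f⁻¹(a).)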
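Here is a quick numerical illustration of the inverse-function derivative, taking f to be the exponential function so that f⁻¹ is the natural logarithm. This choice of f is mine, made purely for illustration. The sketch compares a central-difference estimate of (f⁻¹)′(a) with 1/f′(x_a), where x_a = f⁻¹(a).

```python
import math


def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def inverse_derivative_check(f, f_inv, a):
    """Return ((f_inv)'(a), 1 / f'(f_inv(a))), both computed numerically."""
    x_a = f_inv(a)
    return derivative(f_inv, a), 1.0 / derivative(f, x_a)


lhs, rhs = inverse_derivative_check(math.exp, math.log, 3.0)
print(lhs, rhs)  # both should be close to 1/3
```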
Fact 96. Let f, g ∶ R → R be differentiable functions. Let y = f(t) and x = g(t). Then for any a such that g′(a) ≠ 0, we have
\[
\left.\frac{dy}{dx}\right|_{t=a} = \left.\frac{dy}{dt}\right|_{t=a} \div \left.\frac{dx}{dt}\right|_{t=a}.
\]
Proof.
\[
\begin{aligned}
\left.\frac{dy}{dx}\right|_{t=a} &= \lim_{t\to a}\frac{f(t) - f(a)}{g(t) - g(a)} = \lim_{t\to a}\left[\frac{f(t) - f(a)}{g(t) - g(a)}\cdot\frac{t - a}{t - a}\right] \\
&= \lim_{t\to a}\frac{f(t) - f(a)}{t - a} \div \lim_{t\to a}\frac{g(t) - g(a)}{t - a} = \left.\frac{dy}{dt}\right|_{t=a} \div \left.\frac{dx}{dt}\right|_{t=a}.
\end{aligned}
\]
Proof. Omitted, not because it’s difficult, but because the proof requires the Mean Value
Theorem, which in turn requires a few other ingredients. After some thought, I’ve decided
to just omit this, rather than add another 10 pages that no one will read.
Definition 164. We say that f satisfies the “nice” property at a if for all x ∈ (−a, a),
\[
\lim_{k\to\infty}\frac{f^{(k)}(x)}{k!}a^k = 0.
\]
Corollary 8. If f satisfies the “nice” property at a, then f (a) = M (a), where M (a) is the
Maclaurin series at a.
Now we can verify that our five standard series satisfy the “nice” property (for some specified
range of values).
Now, ∣n(n − 1)(n − 2) . . . (n − k + 1)∣ is bounded above, if not by n!, then by some other
expression involving n.
So indeed, for all x ∈ (−a, a), $\lim_{k\to\infty}\dfrac{g^{(k)}(x)}{k!}a^k = 0$. And so by Corollary 8, for all x ∈ (−1, 1), we have $(1 + x)^n = M(x) = 1 + nx + \dfrac{n(n-1)}{2!}x^2 + \dots$
Note that in contrast, if a ∉ (−1, 1), then ∣a^k∣ is not bounded from above and thus there is no guarantee that $\lim_{k\to\infty}\dfrac{g^{(k)}(x)}{k!}a^k = 0$.
Example 625. We verify that the function h ∶ R → R defined by x ↦ e^x satisfies the “nice” property at a for all a ∈ R: For all x ∈ (−a, a) and for all k ∈ Z+, we have
\[
\frac{h^{(k)}(x)}{k!}a^k = \frac{e^x}{k!}a^k.
\]
Since a is fixed, eventually the ending terms in the product $\dfrac{a^k}{k!} = \dfrac{a}{1}\cdot\dfrac{a}{2}\cdot\dfrac{a}{3}\cdot\ldots\cdot\dfrac{a}{k-1}\cdot\dfrac{a}{k}$ will all be less than 1. And so $\lim_{k\to\infty}\dfrac{a^k}{k!} = 0$. And so as desired, we have
\[
\lim_{k\to\infty}\frac{e^x}{k!}a^k \le \lim_{k\to\infty}\frac{e^a}{k!}a^k = e^a\lim_{k\to\infty}\frac{1}{k!}a^k = 0.
\]
And so indeed, R is the range of values for which the Maclaurin series converges to h. That is, h(x) is equal to its Maclaurin series for all x ∈ R.
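To see this convergence numerically, here is a minimal Python sketch that sums the first n + 1 terms of the Maclaurin series 1 + x + x²/2! + ... and compares the result with `math.exp`. The function name `maclaurin_exp` and the sample inputs are my own choices.

```python
import math


def maclaurin_exp(x, n):
    """Partial sum 1 + x + x^2/2! + ... + x^n/n! of the series for e^x."""
    total = 0.0
    term = 1.0  # the k = 0 term, x^0 / 0!
    for k in range(n + 1):
        total += term
        term *= x / (k + 1)  # turn x^k / k! into x^(k+1) / (k+1)!
    return total


for n in (2, 5, 10, 20):
    print(n, maclaurin_exp(2.0, n), math.exp(2.0))
```

Building each term from the previous one avoids computing large factorials and large powers separately.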
\[
\frac{i^{(k)}(x)}{k!}a^k = \begin{cases}
\dfrac{\cos x}{k!}a^k, & \text{if } k \equiv 1 \pmod 4, \\[2ex]
\dfrac{-\sin x}{k!}a^k, & \text{if } k \equiv 2 \pmod 4, \\[2ex]
\dfrac{-\cos x}{k!}a^k, & \text{if } k \equiv 3 \pmod 4, \\[2ex]
\dfrac{\sin x}{k!}a^k, & \text{if } k \equiv 4 \pmod 4.
\end{cases}
\]
Again, $\lim_{k\to\infty}\dfrac{a^k}{k!} = 0$. Moreover, ± sin x, ± cos x have maximum absolute value of 1. Thus, $\lim_{k\to\infty}\dfrac{i^{(k)}(x)}{k!}a^k = 0$, as desired.
And so indeed, R is the range of values for which the Maclaurin series converges
to i. That is, i(x) is equal to its Maclaurin series for all x ∈ R.
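\[
\frac{a}{1 + x} \in \left(\frac{a}{1 + a}, \frac{a}{1 - a}\right) = \left(1 - \frac{1}{1 + a}, \frac{1}{1 - a} - 1\right) \subseteq (0, 1)
\implies \lim_{k\to\infty}\left(\frac{a}{1 + x}\right)^k = 0
\]
\[
\frac{a}{1 + x} \in \left(\frac{a}{1 - a}, \frac{a}{1 + a}\right) = \left(\frac{1}{1 - a} - 1, 1 - \frac{1}{1 + a}\right) \subseteq (-1, 0)
\implies \lim_{k\to\infty}\left(\frac{a}{1 + x}\right)^k = 0
\]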
And so indeed, (−1, 1] is the range of values for which the Maclaurin series converges to f. That is, f(x) is equal to its Maclaurin series for all x ∈ (−1, 1].
Note that in contrast, if a > 1, then $\dfrac{a}{1 + x}$ could be greater than 1, so that $\lim_{k\to\infty}\dfrac{f^{(k)}(x)}{k!}a^k = \lim_{k\to\infty}\dfrac{(-1)^{k-1}}{k}\left(\dfrac{a}{1 + x}\right)^k \ne 0$. So there is no guarantee that $\lim_{k\to\infty}\dfrac{f^{(k)}(x)}{k!}a^k = 0$.
Proof. Let Fn(x) and Gn(x) be the nth-order polynomials for f and g. Let c ∈ F ∩ G. Observe that
\[
F_n(c)G_n(c) = a_0 b_0 + (a_0 b_1 + a_1 b_0)c + (a_0 b_2 + a_1 b_1 + a_2 b_0)c^2 + \dots + \left(\sum_{i=0}^{n} a_i b_{n-i}\right)c^n + \dots + a_n b_n c^{2n}.
\]
Let ε > 0. Our goal is to show that ∃N ∶ ∀n ≥ N, Fn(c)Gn(c) ∈ (f(c)g(c) − ε, f(c)g(c) + ε).
Case #1: f(c), g(c) ≥ 0.
Pick εf, εg > 0 such that $f(c)\epsilon_g + g(c)\epsilon_f + \epsilon_f\epsilon_g \overset{1}{<} \epsilon$, εf < ∣f(c)∣, and εg < ∣g(c)∣. Note that $\overset{1}{<}$ ⇐⇒ −f(c)εg − g(c)εf − εfεg > −ε ⟹ $-f(c)\epsilon_g - g(c)\epsilon_f + \epsilon_f\epsilon_g \overset{2}{>} -\epsilon$.
By definition, ∀εf > 0, ∃Nf > 0: ∀n ≥ Nf, we have Fn(c) ∈ (f(c) − εf, f(c) + εf) and ∀εg > 0, ∃Ng > 0: ∀n ≥ Ng, we have Gn(c) ∈ (g(c) − εg, g(c) + εg).
But by construction, $-f(c)\epsilon_g - g(c)\epsilon_f + \epsilon_f\epsilon_g \overset{2}{>} -\epsilon$ and $f(c)\epsilon_g + g(c)\epsilon_f + \epsilon_f\epsilon_g \overset{1}{<} \epsilon$. Hence, the lattermost set is a subset of (f(c)g(c) − ε, f(c)g(c) + ε). So we have as desired:
But by construction, $f(c)\epsilon_g + g(c)\epsilon_f + \epsilon_f\epsilon_g \overset{2}{>} -\epsilon$ and $-f(c)\epsilon_g - g(c)\epsilon_f + \epsilon_f\epsilon_g \overset{1}{<} \epsilon$. Hence, the lattermost set is a subset of (f(c)g(c) − ε, f(c)g(c) + ε). So we have as desired:
But by construction, $-f(c)\epsilon_g + g(c)\epsilon_f - \epsilon_f\epsilon_g \overset{1}{>} -\epsilon$ and $f(c)\epsilon_g - g(c)\epsilon_f - \epsilon_f\epsilon_g \overset{2}{<} \epsilon$. Hence, the lattermost set is a subset of (f(c)g(c) − ε, f(c)g(c) + ε). So we have as desired:
Case #4: f (c) < 0, g(c) ≥ 0 is similar to Case #3 and the proof is thus omitted.
\[
f(g(c)) = a_0 + a_1 g(c) + a_2[g(c)]^2 + a_3[g(c)]^3 + \dots
\]
Proof. Let xg = g(c). Since xg ∈ F, by assumption, $f(x_g) = a_0 + a_1 x_g + a_2 (x_g)^2 + a_3 (x_g)^3 + \dots$. That is, $f(g(c)) = a_0 + a_1 g(c) + a_2[g(c)]^2 + a_3[g(c)]^3 + \dots$, as desired.
Fact 99. Let f ∶ F → R and g ∶ G → R be functions. Suppose $f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots$ for all x ∈ F and $g(x) = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots$ Then ∀c ∈ G ∶ g(c) ∈ F, we have
\[
f(g(x)) = a_0 + a_1\left(b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots\right) + a_2\left(b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots\right)^2 + \dots
\]
Proof. Let $T_n(x) = a_0 + a_1 g(c) + a_2[g(c)]^2 + \dots + a_n[g(c)]^n$ and
\[
S_{n,k}(x) = a_0 + a_1\left(b_0 + b_1 x + b_2 x^2 + \dots + b_k x^k\right) + a_2\left(b_0 + b_1 x + b_2 x^2 + \dots + b_k x^k\right)^2 + \dots + a_n\left(b_0 + b_1 x + b_2 x^2 + \dots + b_k x^k\right)^n.
\]
(b) If f (x) > d for all x ∈ (a, b), then f (a), f (b) ≥ d.
The stronger claim that “if f(x) < c for all x ∈ (a, b), then $\lim_{x\nearrow b} f(x) < c$” is false. Consider for example a = 0.5, b = 1, $f(x) = 1 - \dfrac{1}{x}$, and c = 0. Then indeed f(x) < c for all x ∈ (a, b), BUT $\lim_{x\nearrow b} f(x) = c$.
Proof. (a) If f(a) > c, then by the continuity of f, ∃δ > 0 ∶ ∀x ∈ (a, a + δ), f(x) > c, contradicting our assumption that f(x) < c for all x ∈ (a, b).
The proof that f(b) ≤ c is similar and thus omitted.
\[ P_i = \left[ a + (i-1)\frac{b-a}{n},\; a + i\,\frac{b-a}{n} \right], \]
Then the Lower n-Sum of f on [a, b] and the Upper n-Sum of f on [a, b] are, respectively,
\[ \frac{b-a}{n} \sum_{i=1}^{n} f(x_i) \quad \text{and} \quad \frac{b-a}{n} \sum_{i=1}^{n} f(x_i). \]
Definition 166. Let f be a real function of a real variable that is continuous on [a, b].
The Lower Integral of f on [a, b] is
\[ \int_a^b f \, dx = \lim_{n \to \infty} \frac{b-a}{n} \sum_{i=1}^{n} f(x_i), \]
Proof. Omitted. But here is an informal “proof”: “Clearly”, $\int_a^p f(x)\,dx$ is simply the area under the graph of f between a and p, while $\int_a^q f(x)\,dx$ is simply the area under the graph of f between a and q. And so the former minus the latter is simply the area under the graph of f between q and p, i.e. $\int_q^p f(x)\,dx$.
Lemma 9. Suppose $a, b, c \in \mathbb{R}$ are constants. Then $c = \int_a^b \frac{c}{b-a}\,dx$.
Proof. Omitted. But here is an informal “proof”: “Clearly”, $\int_a^b \frac{c}{b-a}\,dx$ is simply the area of a rectangle with base $b - a$ and height $\frac{c}{b-a}$. So $\int_a^b \frac{c}{b-a}\,dx = (b-a) \times \frac{c}{b-a} = c$.
Lemma 10. Suppose that for all $x \in [a, b]$, $c < f(x) < d$. Suppose moreover that $\int_a^b f\,dx$ is well-defined. Then $(b-a)c < \int_a^b f\,dx < (b-a)d$.
Proof. Omitted. But here is an informal “proof”: If for all $x \in [a, b]$, $f(x) = c$, then by Lemma 9, $\int_a^b f\,dx = (b-a)c$. And if for all $x \in [a, b]$, $f(x) = d$, then by Lemma 9, $\int_a^b f\,dx = (b-a)d$. “Clearly” then, if for all $x \in [a, b]$, $c < f(x) < d$, then $(b-a)c < \int_a^b f\,dx < (b-a)d$.
\[
\begin{aligned}
\frac{F(p) - F(q)}{p - q} - f(q) &= \frac{1}{p-q}\left[F(p) - F(q) - (p-q)f(q)\right] \\
&= \frac{1}{p-q}\left[\int_a^p f(x)\,dx - \int_a^q f(x)\,dx - (p-q)f(q)\right] \\
&\overset{1}{=} \frac{1}{p-q}\left[\int_q^p f(x)\,dx - (p-q)f(q)\right] \overset{2}{=} \frac{1}{p-q}\int_q^p \left[f(x) - f(q)\right] dx,
\end{aligned}
\]
where $\overset{1}{=}$ uses Lemma 8 and $\overset{2}{=}$ uses Lemma 9 (note that f(q) is simply a constant).
By the continuity of f, $\forall x \in (q - \delta, q + \delta)$, $f(q) - \varepsilon < f(x) < f(q) + \varepsilon$ and hence $-\varepsilon < f(x) - f(q) < \varepsilon$. So if $p \in (q - \delta, q + \delta)$, then by Lemma 10, $-(p-q)\varepsilon < \int_q^p [f(x) - f(q)]\,dx < (p-q)\varepsilon$ and thus
\[ -\varepsilon < \frac{1}{p-q}\int_q^p \left[f(x) - f(q)\right] dx < \varepsilon. \]
This proves that $\lim_{p \to q} \frac{1}{p-q}\int_q^p [f(x) - f(q)]\,dx = 0$. And so
\[ \lim_{p \to q}\left[\frac{F(p) - F(q)}{p - q} - f(q)\right] = 0. \]
Since f(q) is simply a constant, it follows that
\[ \lim_{p \to q}\frac{F(p) - F(q)}{p - q} = f(q). \]
By definition of the derivative, $F'(q) = \lim_{p \to q}\frac{F(p) - F(q)}{p - q}$. And so indeed $F'(q) = f(q)$.
It took a long while, but we can now finally define the natural logarithm function (which
we’ve been happily using all along)!
Definition 168. The natural logarithm function, denoted ln, has domain $\mathbb{R}^+$, codomain $\mathbb{R}$, and mapping rule $\ln c = \int_1^c \frac{1}{x}\,dx$.
Graphically, ln a is the area between the curve y = 1/x and the x-axis, bounded by the
vertical lines x = 1 and x = a.
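The area definition above can be tried out numerically. The following is only a rough sketch: it approximates the area under y = 1/x with the midpoint rule (a numerical technique not used in this textbook) and compares the result with Python's built-in logarithm. The function name `ln_by_area` and the number of strips are illustrative assumptions.

```python
import math


def ln_by_area(c, steps=100_000):
    """Approximate ln c as the signed area under y = 1/x from 1 to c."""
    if c <= 0:
        raise ValueError("ln is defined only for positive c")
    width = (c - 1) / steps  # negative when c < 1, giving a negative area
    total = 0.0
    for i in range(steps):
        midpoint = 1 + (i + 0.5) * width  # middle of the i-th strip
        total += width / midpoint         # strip area = width * height
    return total


if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0, 6.0):
        print(c, ln_by_area(c), math.log(c))
```

For c < 1 the strips run leftwards from 1, so the computed area is negative, matching the fact that ln c < 0 there.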
[Figure: the curve y = 1/x, with the x-axis running from 0 to 8.]
You probably learnt that ln x is the number such that $e^{\ln x} = x$. Our definition is a little strange, but has the advantage that we can almost immediately prove that $\frac{d}{dx}\ln x = \frac{1}{x}$.
Fact 100. For all $c \in \mathbb{R}^+$, we have $\frac{d}{dc}\ln c = \frac{1}{c}$.
Proof. Define $f: \mathbb{R}^+ \to \mathbb{R}$ by $f(x) = \frac{1}{x}$. Let A be its corresponding area function. Then
\[ \frac{d}{dc}\ln c = \frac{d}{dc}\left(\int_1^c \frac{1}{x}\,dx\right) = \frac{d}{dc}\left[A(c) - A(1)\right] = \overbrace{\frac{dA(c)}{dc}}^{=f(c),\ \text{by FTC1}} - \overbrace{\frac{dA(1)}{dc}}^{=0} = f(c) = \frac{1}{c}. \]
(c) ln(x/y) = ln x − ln y.
(d) $\ln x^n = n \ln x$.
Proof. (a) $\ln 1 = \int_1^1 \frac{1}{x}\,dx = 0$.
(b) Differentiate both sides with respect to x to get $\frac{d}{dx}\ln(xy) = \frac{1}{xy}\left(y + x\frac{dy}{dx}\right) = \frac{1}{x} + \frac{1}{y}\frac{dy}{dx}$ and $\frac{d}{dx}(\ln x + \ln y) = \frac{1}{x} + \frac{1}{y}\frac{dy}{dx}$. Thus, $\ln(xy)$ and $\ln x + \ln y$ are both indefinite integrals for the same function $\frac{1}{x} + \frac{1}{y}\frac{dy}{dx}$. By Fact 60 then, $\ln(xy) = \ln x + \ln y + C$. But for x = 1, $\ln y = \ln 1 + \ln y + C = \ln y + C$, so C = 0. Thus, $\ln(xy) = \ln x + \ln y$.
(d) Differentiate both sides with respect to x to get $\frac{d}{dx}\ln x^n = \frac{1}{x^n} n x^{n-1} = \frac{n}{x}$ and $\frac{d}{dx}(n \ln x) = \frac{n}{x}$. Thus, $\ln x^n$ and $n \ln x$ are both indefinite integrals for the same function $\frac{n}{x}$. By Fact 60 then, $\ln x^n = n \ln x + C$. But for x = 1, $\ln 1^n = \ln 1 = 0$ and $n \ln 1 + C = C$, so C = 0. Thus, $\ln x^n = n \ln x$.
How do we know that e is indeed uniquely defined? It’s because $\frac{1}{x}$ is strictly positive for all $x \in [1, \infty)$, so that ln x is strictly increasing. So there can be only one number e such that ln e = 1.
The only functions whose derivative is 0 are constant functions.90 Hence, the first derivative of $e^{\ln x}$ is a constant. That is, $\frac{e^{\ln x}}{x} = C$ or $e^{\ln x} = Cx$. But we also know that for x = 1, $e^{\ln 1} = e^0 = 1$, so that C = 1. Hence, $e^{\ln x} = x$, as desired.
89 Note though that it was simply Euler himself who happened to start using the letter e to denote this number. And presumably he was not doing it to honour himself. Calling it Euler’s number is simply an honour conferred by posterity.
90 As noted in n. 53, this textbook shall simply take this assertion for granted.
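Theorem 21. $e = \sum_{i=0}^{\infty} \frac{1}{i!}$. (Equivalently, $e = \frac{1}{0!} + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \dots$)
Proof. From our study of Maclaurin series, we know that $e^x = \sum_{i=0}^{\infty} \frac{x^i}{i!}$ for all $x \in \mathbb{R}$. Hence, $e = e^1 = \sum_{i=0}^{\infty} \frac{1}{i!}$, as desired.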
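Theorem 22. $\lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n = e$.
Proof. Let $n \in (1, \infty)$ and $x \in \left[1, 1 + \frac{1}{n}\right]$. Then $\frac{1}{x} \le 1$, so that
\[ \ln\left(1 + \frac{1}{n}\right) = \int_1^{1 + \frac{1}{n}} \frac{1}{x}\,dx \le \int_1^{1 + \frac{1}{n}} 1\,dx = \frac{1}{n}. \]
Similarly, $\frac{1}{x} \ge \frac{n}{n+1}$, so that
\[ \ln\left(1 + \frac{1}{n}\right) = \int_1^{1 + \frac{1}{n}} \frac{1}{x}\,dx \ge \int_1^{1 + \frac{1}{n}} \frac{n}{n+1}\,dx = \frac{1}{n+1}. \]
Altogether,
\[
\begin{aligned}
\frac{1}{n+1} \le \ln\left(1 + \frac{1}{n}\right) \le \frac{1}{n}
&\implies e^{\frac{1}{n+1}} \le e^{\ln\left(1 + \frac{1}{n}\right)} \le e^{1/n} \\
&\implies e^{\frac{1}{n+1}} \le 1 + \frac{1}{n} \le e^{1/n} \\
&\implies e^{\frac{n}{n+1}} \le \left(1 + \frac{1}{n}\right)^n \le e^{n/n} = e.
\end{aligned}
\]
Taking limits, $\lim_{n \to \infty} e^{\frac{n}{n+1}} \le \lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n \le \lim_{n \to \infty} e$.
Since $\lim_{n \to \infty} e^{\frac{n}{n+1}} = e$ and $\lim_{n \to \infty} e = e$, by the Squeeze Theorem (Theorem 17), we must have $\lim_{n \to \infty}\left(1 + \frac{1}{n}\right)^n = e$.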
Theorem 23. (AP.) If A and B are disjoint, finite sets, then ∣A ∪ B∣ = ∣A∣ + ∣B∣.
A ∪ B = {a1 , a2 , . . . , ap , b1 , b2 , . . . , bq } .
Corollary 9. If $A_1, A_2, \dots, A_n$ are disjoint, finite sets, then $\left|\cup_{i=1}^{n} A_i\right| = \sum_{i=1}^{n} |A_i|$.
Theorem 24. (MP.) If A and B are finite sets, then ∣A × B∣ = ∣A∣ × ∣B∣.
\[ A \times B = \left\{ (a_1, b_1), (a_1, b_2), \dots, (a_1, b_q), (a_2, b_1), (a_2, b_2), \dots, (a_2, b_q), \dots, (a_p, b_1), (a_p, b_2), \dots, (a_p, b_q) \right\}. \]
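Proof. $A \cup B = (A \setminus (A \cap B)) \cup B$. So by the AP, $|A \cup B| \overset{1}{=} |A \setminus (A \cap B)| + |B|$.
Now, $(A \setminus (A \cap B)) \cup (A \cap B) = A$. So also by the AP, $|A \setminus (A \cap B)| + |A \cap B| = |A|$ or $|A \setminus (A \cap B)| \overset{2}{=} |A| - |A \cap B|$. Plug $\overset{2}{=}$ into $\overset{1}{=}$ to get the desired result.
\[ \left|\cup_{i=1}^{n} A_i\right| = \sum_{i=1}^{n} |A_i| - \sum_{i,j \text{ distinct}} |A_i \cap A_j| + \sum_{i,j,k \text{ distinct}} |A_i \cap A_j \cap A_k| - \dots + (-1)^{n-1} \left|\cap_{i=1}^{n} A_i\right|. \]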
Theorem 26. (CP.) If A and B are finite sets and $B \subseteq A$, then $|A \setminus B| = |A| - |B|$.
Proof. B and $A \setminus B$ are disjoint, finite sets. Moreover, $B \cup (A \setminus B) = A$. So by the AP, $|B| + |A \setminus B| = |A|$. Rearranging yields the desired result.
\[ \left|A \setminus \cup_{i=1}^{n} B_i\right| = |A| - \sum_{i=1}^{n} |B_i|. \]
Consider n objects, only k of which are distinct. Let r1 , r2 , . . . , and rk be the numbers of
times the 1st, 2nd, . . . , and kth distinct objects appear. We already know from Fact 63
that the number of (linear) permutations of these n objects is
n!
.
r1 !r2 ! . . . rk !
We also know that m distinct objects have m! (linear) permutations and (m − 1)! circular
permutations.
A reasonable conjecture might thus be that the number of circular permutations of the
above n objects is
(n − 1)!
.
r1 !r2 ! . . . rk !
The above conjecture sometimes “works” — e.g. SEE has 3!/2! = 3 (linear) permutations
and SEE indeed also has (3 − 1)!/2! = 1 circular permutation. However and unfortunately,
this conjecture is, in general, incorrect. Here are two counter-examples.
Example 629. There are 3!/3! = 1 (linear) permutations of the three letters AAA.
If the above conjecture were true, then there ought to be (3 − 1)!/3! = 2!/3! = 1/3 circular
permutations of AAA. But this is not even an integer, so obviously it cannot be the number
of circular permutations of AAA. In fact, there is also exactly 1 circular permutation of
AAA.
Example 630. There are 6!/ (3!3!) = 20 (linear) permutations of the six letters AAABBB.
If the above conjecture were true, then there ought to be (6 − 1)!/ (3!3!) = 10/3 circular
permutations of AAABBB. But this is not even an integer, so obviously it cannot be
the number of circular permutations of AAABBB. In fact, there are exactly 4 circular
permutations of AAABBB.
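The counts in the examples above are small enough to check by brute force. Here is a minimal sketch, assuming we treat two arrangements as the same circular permutation exactly when one is a rotation of the other. The function names `count_linear` and `count_circular` are illustrative, and this enumeration is only practical for short words.

```python
from itertools import permutations


def count_linear(word):
    """Number of distinct (linear) permutations of the letters of word."""
    return len(set(permutations(word)))


def count_circular(word):
    """Number of distinct circular permutations (rotations identified)."""
    representatives = set()
    for arrangement in set(permutations(word)):
        rotations = [arrangement[i:] + arrangement[:i]
                     for i in range(len(arrangement))]
        # Use the smallest rotation as the canonical form of the circle.
        representatives.add(min(rotations))
    return len(representatives)


if __name__ == "__main__":
    for word in ("SEE", "AAA", "AAABBB"):
        print(word, count_linear(word), count_circular(word))
```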
A general solution (i.e. formula) is possible but is a bit too advanced for A-levels.91
91
See e.g. this Handbook on Combinatorics.
Proof. This proposition applies even for non-discrete random variables. But we’ll prove
this proposition only for the case where the random variable is discrete.
We’ll use the linearity of the expectation operator. We prove (b) first.
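(a)
\[
\begin{aligned}
E[X + Y] &= \sum_{k \in \mathrm{Range}(X)} \sum_{l \in \mathrm{Range}(Y)} P(X = k, Y = l) \cdot (k + l) \\
&= \sum_{k \in \mathrm{Range}(X)} k \sum_{l \in \mathrm{Range}(Y)} P(X = k, Y = l) + \sum_{l \in \mathrm{Range}(Y)} l \sum_{k \in \mathrm{Range}(X)} P(X = k, Y = l) \\
&= \sum_{k \in \mathrm{Range}(X)} k P(X = k) + \sum_{l \in \mathrm{Range}(Y)} l P(Y = l) \\
&= E[X] + E[Y].
\end{aligned}
\]
(b) $V[cX] = E[(cX)^2] - (c\mu_X)^2 = c^2 E[X^2] - c^2\mu_X^2 = c^2\left(E[X^2] - \mu_X^2\right) = c^2 V[X]$.
\[
\begin{aligned}
V[X + Y] &= E[(X + Y)^2] - (E[X + Y])^2 \\
&= E[X^2 + Y^2 + 2XY] - (E[X] + E[Y])^2
\end{aligned}
\]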
Lemma 11. If X and Y are independent random variables, then E [XY ] = E [X] E [Y ].
Proof. We prove this Lemma only for the case where X and Y are both discrete.
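\[
\begin{aligned}
E[XY] &= \sum_k \sum_l P(X = k, Y = l) \cdot kl \\
&= \sum_k \sum_l P(X = k) P(Y = l) \cdot kl && \text{(independence)} \\
&= \sum_k \left(P(X = k)\, k \sum_l P(Y = l) \cdot l\right) = \sum_k \left(P(X = k)\, k\, E[Y]\right) \\
&= E[Y] \sum_k P(X = k)\, k = E[Y]\, E[X].
\end{aligned}
\]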
(b) Flip a fair coin n + 1 times. This gives us n pairs of consecutive coin-flips. Let A be the
proportion of these n pairs of consecutive coin-flips that are HH. Let B be the proportion
that are HT . Then E[A] = µA = 1/4 and E[B] = µB = 1/4.
(Explanation: If the next flip is H, then we’ve completed HH and this took us only 1 more
flip. If instead the next flip is T , then we start all over again; we’ve already taken 1 flip
and are expected to take another p flips.) Similarly, observe that
(Explanation: If the next flip is H, then we expect to take, in addition, another q flips. If
instead the next flip is T , then we start all over again; we’ve already taken 1 flip and are
expected to take another p flips.)
Hence, p = 6 = µX . The reasoning used above is illustrated by the probability tree below.
Let’s now find µY . Again, let
(Explanation: If the next flip is T , then we’ve completed HT and this took us only 1 more
flip. If instead the next flip is H, then we’ve already taken 1 flip and are expected to take
another s flips.)
(Explanation: If the next flip is H, then we’ve already taken 1 flip and are expected to
take another s flips. If the next flip is T , then we’ve already taken 1 flip and are expected
to take another r flips.)
So r = 4 = µY .
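The values 6 and 4 can be double-checked numerically. The sketch below does not solve the recursive equations used above. Instead, as a swapped-in technique, it tracks the probability of each "partial progress" state flip by flip and sums n × P(pattern first completed on flip n), truncated at a large number of flips. The names `next_state` and `expected_flips` and the cut-off `max_flips` are illustrative assumptions.

```python
def next_state(pattern, matched, flip):
    """Length of the longest prefix of pattern matched after one more flip."""
    candidate = pattern[:matched] + flip
    while candidate and not pattern.startswith(candidate):
        candidate = candidate[1:]  # drop the oldest flip and retry
    return len(candidate)


def expected_flips(pattern, max_flips=2000):
    """Approximate the expected number of fair-coin flips until pattern appears."""
    size = len(pattern)
    dist = [0.0] * size  # dist[s] = P(not finished yet, s letters matched)
    dist[0] = 1.0
    expected = 0.0
    for n in range(1, max_flips + 1):
        new_dist = [0.0] * size
        for state, prob in enumerate(dist):
            for flip in "HT":
                nxt = next_state(pattern, state, flip)
                if nxt == size:
                    expected += n * prob * 0.5  # finished on flip n
                else:
                    new_dist[nxt] += prob * 0.5
        dist = new_dist
    return expected


if __name__ == "__main__":
    print(expected_flips("HH"), expected_flips("HT"))
```

The truncation error is negligible here because the probability of still being unfinished shrinks geometrically with the number of flips.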
(b) Let Si be the random variable that indicates whether the ith pair of consecutive coin-
flips is HH. That is, Si = 1 if so and Si = 0 if not. Then
\[ A = \frac{S_1 + S_2 + \dots + S_n}{n}. \]
And so, $E[A] = E\left[\frac{S_1 + S_2 + \dots + S_n}{n}\right] = \frac{1}{n}\sum_{i=1}^{n} E[S_i]$.
Fact 77 (reproduced from p. 615). Let X ∼ Po(λ). Then E[X] = λ and V[X] = λ.
Proof.
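\[
\begin{aligned}
E[X] &= \sum_{k=0}^{\infty} P(X = k) \cdot k = \sum_{k=1}^{\infty} P(X = k) \cdot k && (\because P(X = 0) \cdot 0 = 0) \\
&= \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \cdot k = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} && \text{(Pull out constant)} \\
&= \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} && \text{(Pull out constant)} \\
&= \lambda e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} && \text{(Change starting value of summation)} \\
&= \lambda e^{-\lambda} e^{\lambda} = \lambda. && \text{(Maclaurin series for } e^x\text{)}
\end{aligned}
\]
Similarly compute
\[
\begin{aligned}
E[X^2] &= \sum_{k=0}^{\infty} P(X = k) \cdot k^2 = \sum_{k=1}^{\infty} P(X = k) \cdot k^2 && (\because P(X = 0) \cdot 0^2 = 0) \\
&= \sum_{k=1}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} \cdot k^2 = e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!}\, k && \text{(Pull out constant)} \\
&= e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} \left[(k-1) + 1\right] = e^{-\lambda} \left\{ \sum_{k=1}^{\infty} \left[\frac{\lambda^k}{(k-1)!}(k-1)\right] + \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} \right\} \\
&= e^{-\lambda} \left\{ \sum_{k=2}^{\infty} \left[\frac{\lambda^k}{(k-1)!}(k-1)\right] + \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} \right\} && \left(\because \frac{\lambda^1}{(1-1)!}(1-1) = 0\right) \\
&= e^{-\lambda} \left[ \lambda^2 \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} + \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \right] \\
&= e^{-\lambda} \left( \lambda^2 \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} + \lambda \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} \right) && \text{(Change starting value of summation)} \\
&= e^{-\lambda} \left( \lambda^2 e^{\lambda} + \lambda e^{\lambda} \right) = \lambda^2 + \lambda. && \text{(Maclaurin series for } e^x\text{)}
\end{aligned}
\]
Hence, $V[X] = E[X^2] - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$.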
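As a sanity check on the algebra, the two sums can be evaluated numerically for a particular λ. This is only a sketch: it truncates the infinite sums after a fixed number of terms (an assumption that is harmless for moderate λ, since the terms shrink very quickly), and the name `poisson_moments` is illustrative.

```python
import math


def poisson_moments(lam, terms=200):
    """Approximate (E[X], V[X]) for X ~ Po(lam) by truncating the sums."""
    pmf = math.exp(-lam)  # P(X = 0)
    mean = 0.0
    second_moment = 0.0
    for k in range(terms):
        mean += k * pmf
        second_moment += k * k * pmf
        pmf *= lam / (k + 1)  # P(X = k+1) = P(X = k) * lam / (k+1)
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    print(poisson_moments(3.5))
```

Building each probability from the previous one avoids computing very large factorials directly.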
Proof. Since the range of $X_n$ is $\{0, 1, 2, \dots, n\}$, it follows that the range of $Y = \lim_{n \to \infty} X_n$ is $\mathbb{Z}_0^+$.
\[ P(X_n = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}. \]
Now,
Fact 104. $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.
Fact 78 (reproduced from p. 629). Let Z ∼ N(0, 1) and φ and Φ be its PDF and CDF.
1. Φ(∞) = 1. (As with any random variable, the area under the entire PDF is 1.)
2. φ(a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has a surprising impli-
cation: however large a is, there is always some non-zero probability that Z ≥ a.)
3. E [Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and compute $\varphi(0) = 1/\sqrt{2\pi} \approx 0.399$.)
5. V [Z] = 1. (The variance of Z is 1.)
6. P (Z ≤ a) = P (Z < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
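Proof. 1. Let $u = x/\sqrt{2}$. We have $u^2 = 0.5x^2$ and $du/dx = 1/\sqrt{2}$. And using Fact 989:
\[ \Phi(\infty) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-0.5x^2}\,dx = \frac{1}{\sqrt{\pi}} \int_{x=-\infty}^{x=\infty} e^{-0.5x^2} \frac{1}{\sqrt{2}}\,dx = \frac{1}{\sqrt{\pi}} \int_{x=-\infty}^{x=\infty} e^{-u^2} \frac{du}{dx}\,dx = \frac{1}{\sqrt{\pi}} \int_{u=-\infty}^{u=\infty} e^{-u^2}\,du = \frac{1}{\sqrt{\pi}} \sqrt{\pi} = 1. \]
3. $E[Z] = \int_{-\infty}^{\infty} x\varphi(x)\,dx = \frac{-1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left(-x e^{-0.5x^2}\right) dx = \frac{-1}{\sqrt{2\pi}} \left[e^{-0.5x^2}\right]_{-\infty}^{\infty} = \frac{-1}{\sqrt{2\pi}} [0 - 0] = 0$.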
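4.
\[ \frac{d}{da}\varphi(a) = \frac{d}{da} \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} = \frac{-a}{\sqrt{2\pi}} e^{-0.5a^2} \begin{cases} > 0, & \text{if } a < 0, \\ = 0, & \text{if } a = 0, \\ < 0, & \text{if } a > 0. \end{cases} \]
φ is continuous, increasing for a < 0 and decreasing for a > 0. Thus, φ reaches a global maximum at 0. By plugging in a = 0, we can compute this global maximum value to be $\varphi(0) = 1/\sqrt{2\pi} \approx 0.399$.
5. Integrating by parts with $u = x$ and $v' = x e^{-0.5x^2}$:
\[
\begin{aligned}
V[Z] &= \int_{-\infty}^{\infty} (x - 0)^2 \varphi(x)\,dx = \int_{-\infty}^{\infty} x^2 \frac{1}{\sqrt{2\pi}} e^{-0.5x^2}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \overbrace{x}^{u}\, \overbrace{x e^{-0.5x^2}}^{v'}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \left( \left[-x e^{-0.5x^2}\right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \left(-e^{-0.5x^2}\right) dx \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-0.5x^2}\,dx = 1.
\end{aligned}
\]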
6. By the Additivity Axiom, $P(Z \le a) = P(Z < a \text{ or } Z = a) = P(Z < a) + P(Z = a) = P(Z < a) + 0 = P(Z < a)$, as desired.
7. Clearly, $\varphi(a) = e^{-0.5a^2}/\sqrt{2\pi} = e^{-0.5(-a)^2}/\sqrt{2\pi} = \varphi(-a)$ for all $a \in \mathbb{R}$. Thus, φ is symmetric about the vertical axis x = 0, which is also the mean.
7(a). Using the substitution u = −x, we have du/dx = −1 and
11.
\[ \frac{d^2}{da^2}\varphi(a) = \frac{d}{da} \frac{-a}{\sqrt{2\pi}} e^{-0.5a^2} = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} (a^2 - 1) \begin{cases} > 0, & \text{if } a < -1, \\ = 0, & \text{if } a = -1, \\ < 0, & \text{if } -1 < a < 1, \\ = 0, & \text{if } a = 1, \\ > 0, & \text{if } a > 1. \end{cases} \]
Hence, ±1 are the only two points of inflexion since φ changes concavity only here.
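Several of the properties proved above (total area 1, mean 0, variance 1, and the peak value φ(0)) can be checked with a crude numerical integration. The sketch below uses the midpoint rule on the interval [−10, 10], which is an assumption made purely for illustration: the tails beyond ±10 are far too small to matter. The names `phi` and `integrate` are illustrative.

```python
import math


def phi(x):
    """PDF of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def integrate(func, lo=-10.0, hi=10.0, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width


total_area = integrate(phi)                      # property 1
mean = integrate(lambda x: x * phi(x))           # property 3
variance = integrate(lambda x: x * x * phi(x))   # property 5

if __name__ == "__main__":
    print(total_area, mean, variance, phi(0.0))
```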
\[ f_Y(c) = \frac{1}{|a|} f_X\left(\frac{c-b}{a}\right). \]
Case #2. If a < 0, then $F_Y(c) = \dots = P(aX \le c - b) = P\left(X \ge \frac{c-b}{a}\right) = 1 - F_X\left(\frac{c-b}{a}\right)$.
Now differentiate:
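\[ f_{aX+b}(c) = \frac{1}{|a|} f_X\left(\frac{c-b}{a}\right) = \frac{1}{|a|} \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\left(\frac{\frac{c-b}{a} - \mu}{\sigma}\right)^2} = \frac{1}{|a|\sigma\sqrt{2\pi}} e^{-0.5\left[\frac{c - (a\mu + b)}{a\sigma}\right]^2}. \]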
But this lattermost expression is indeed the PDF of the random variable with distribution
N (aµ + b, a2 σ 2 ).
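Proof. This proof may look intimidating but it’s really just a bunch of tedious algebra. (I’ve also tried to go slow with the algebra, so more steps are explicitly listed than is typical in a proof.)
(a) Start from the definition of the sample variance and do the algebra:
\[
\begin{aligned}
S^2 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1} = \frac{\sum_{i=1}^{n} \left(X_i^2 + \bar{X}^2 - 2\bar{X}X_i\right)}{n-1} = \frac{\sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \bar{X}^2 - \sum_{i=1}^{n} \left(2\bar{X}X_i\right)}{n-1} \\
&= \frac{\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2\bar{X}\sum_{i=1}^{n} X_i}{n-1} = \frac{\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2\bar{X}(n\bar{X})}{n-1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1} \\
&= \frac{\sum_{i=1}^{n} X_i^2 - n\left[\frac{\sum_{i=1}^{n} X_i}{n}\right]^2}{n-1} = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[\sum_{i=1}^{n} X_i\right]^2}{n}}{n-1}.
\end{aligned}
\]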
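(b) Start from the formula found in (a) and do the algebra:
\[
\begin{aligned}
S^2 &= \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[\sum_{i=1}^{n} X_i\right]^2}{n}}{n-1} = \frac{\sum_{i=1}^{n} (X_i - a + a)^2 - \frac{\left[\sum_{i=1}^{n} (X_i - a + a)\right]^2}{n}}{n-1} \\
&= \frac{\sum_{i=1}^{n} \left[(X_i - a)^2 + a^2 + 2(X_i - a)a\right] - \frac{\left[\sum_{i=1}^{n} (X_i - a) + \sum_{i=1}^{n} a\right]^2}{n}}{n-1} \\
&= \frac{\sum_{i=1}^{n} (X_i - a)^2 + \sum_{i=1}^{n} a^2 + 2a\sum_{i=1}^{n} (X_i - a) - \frac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2 + \left(\sum_{i=1}^{n} a\right)^2 + 2\sum_{i=1}^{n} (X_i - a)\sum_{i=1}^{n} a}{n}}{n-1} \\
&= \frac{\sum_{i=1}^{n} (X_i - a)^2 + na^2 + 2a\sum_{i=1}^{n} (X_i - a) - \frac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2 + (na)^2 + 2na\sum_{i=1}^{n} (X_i - a)}{n}}{n-1} \\
&= \frac{\sum_{i=1}^{n} (X_i - a)^2 - \frac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2}{n}}{n-1}.
\end{aligned}
\]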
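The two shortcut formulas can be compared against the definition on a small data set. The following sketch is illustrative only: the data and the shift a are made up, the function names are assumptions, and Python's `statistics.variance` (which computes the sample variance with divisor n − 1) is used as an independent reference.

```python
from statistics import variance


def sample_variance_definition(xs):
    """S^2 straight from the definition: sum of squared deviations / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs, a=0.0):
    """S^2 via formula (b); with a = 0 this reduces to formula (a)."""
    n = len(xs)
    shifted = [x - a for x in xs]
    return (sum(d * d for d in shifted) - sum(shifted) ** 2 / n) / (n - 1)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
    print(sample_variance_definition(data))
    print(sample_variance_shortcut(data))         # formula (a)
    print(sample_variance_shortcut(data, a=5.0))  # formula (b) with a = 5
    print(variance(data))
```

All four printed values should agree up to floating-point rounding, whatever constant a is chosen.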
Rearranging:
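We’ve just shown that $(X_i - \bar{X})^2$ is a biased estimator for $\sigma^2$. And in turn, $S^2$ is not:
As promised, here is the proof of equation $\overset{1}{=}$:
\[
\begin{aligned}
E\left[(X_i - \bar{X})^2\right] + E\left[(\bar{X} - \mu)^2\right] &= E\left[(X_i - \bar{X})^2 + (\bar{X} - \mu)^2\right] \\
&= E\left[\left((X_i - \bar{X}) + (\bar{X} - \mu)\right)^2 - 2(X_i - \bar{X})(\bar{X} - \mu)\right] \\
&= E\left[(X_i - \mu)^2 - 2(X_i - \bar{X})(\bar{X} - \mu)\right] \\
&= E\left[(X_i - \mu)^2 - 2\left(X_i\bar{X} - \mu X_i - \bar{X}^2 + \mu\bar{X}\right)\right] \\
&= E\left[(X_i - \mu)^2 - 2\left(X_i\bar{X} - \bar{X}^2\right)\right] \\
&= E\left[(X_i - \mu)^2\right] + 2\left\{E\left[\bar{X}^2\right] - E\left[\bar{X}X_i\right]\right\} \\
&= E\left[(X_i - \mu)^2\right].
\end{aligned}
\]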
Definition 170. The random variable Tν with Student’s t-distribution with ν degrees of
freedom has PDF f ∶ R → R given by mapping rule
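\[ f(t) = \frac{\int_0^{\infty} x^{\frac{\nu+1}{2} - 1} e^{-x}\,dx}{\sqrt{\nu\pi} \int_0^{\infty} x^{\frac{\nu}{2} - 1} e^{-x}\,dx} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}. \]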
Let µ be the true population proportion (of votes for Dr. Chee). Say we take a random
sample of size 900.92 Let X be the sample number of votes for Dr. Chee. We know that
X ∼ B (900, µ).
Our confidence level is 95%. So we want to find the smallest k such that
where 900 × 9142/23570 ≈ 349. Using the “Binomial” sheet at the usual link, we have
Thus, k = 29. Now, 29/900 ≈ 3.2%. Thus, at a 95% confidence level, the margin of error
is ±3.2%. This is the “true” margin of error, assuming we know µ. But this assumption
defeats the point of sampling — we don’t know µ, which is why we’re doing sampling in
the first place!
What we want instead is the margin of error in the case where µ is unknown.
Case #2: Without perfect hindsight: µ unknown.
With µ unknown, a conservative interpretation would be to find the smallest k such that
for all µ, P (900µ − k ≤ X ≤ 900µ + k) ≥ 0.95.
92
This is slightly different from what actually happened: (1) The actual random sampling was most likely without replacement
(which would change the maths slightly). (2) 100 votes were taken from each of 9 different polling stations (which would
also change the maths slightly).
Our problem thus boils down to finding the smallest k such that X ∼ B (900, 0.5) implies P (450 − k ≤ X ≤ 450 + k) ≥ 0.95.
We conclude that the smallest such k is 29. Now, 29/900 ≈ 3.2%. So the margin of error
may be given as ±3.2%. This is the same as what was calculated above, which is not
surprising, since 9142/23570 ≈ 0.388 is close to 0.5.
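For readers without the spreadsheet, the search for the smallest k can be sketched in a few lines of code. This is only an illustration under stated assumptions: the binomial probabilities are computed through logarithms (using `math.lgamma`) to avoid overflow, the interval is centred on the integers 349 and 450 used in the two cases above, and the function names are hypothetical.

```python
import math


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p), computed via logarithms to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log(1 - p))
    return math.exp(log_pmf)


def coverage(n, p, centre, k):
    """P(centre - k <= X <= centre + k) for X ~ B(n, p)."""
    lo = max(0, centre - k)
    hi = min(n, centre + k)
    return sum(binomial_pmf(n, p, x) for x in range(lo, hi + 1))


def smallest_k(n, p, centre, level=0.95):
    """Smallest k whose interval around centre has probability >= level."""
    k = 0
    while coverage(n, p, centre, k) < level:
        k += 1
    return k


if __name__ == "__main__":
    k_known = smallest_k(900, 9142 / 23570, 349)  # Case #1: mu known
    k_worst = smallest_k(900, 0.5, 450)           # Case #2: mu = 0.5
    print(k_known, k_known / 900)
    print(k_worst, k_worst / 900)
```

Dividing the resulting k by 900 gives the margin of error as a proportion, which can then be compared with the figures quoted above.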
The reader will, of course, wonder why the Elections Department stated that the margin of
error was ±4%, rather than ±3.2% as I calculated here. I am not sure myself. My guess is
that they probably don’t bother going through all the above calculations afresh each time.
Instead, each time they report a sample count, they simply read off the margin of error
from a table that looks something like this:
(By the way, note that it is common to use the CLT approximation when calculating the
margin of error. I have not done so here. Instead, I’ve stuck with using the original, exact
binomial distribution.)
93
Proving this would need a little work though.
Proof. Let u = (x1 − x̄, x2 − x̄, . . . , xn − x̄) and v = (y1 − ȳ, y2 − ȳ, . . . , yn − ȳ) be n-dimensional
vectors. Then
But from what we learnt about vectors,94 if θ is the angle between two vectors,
\[ \cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}. \]
94
Of course, in this textbook, we’ve only shown that this is true for two- and three-dimensional vectors. But let’s just wave
our hands and say that this is also true for higher-dimensional vectors.
(ii) $\hat{b} = \dfrac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$.
Moreover, the regression line can also be written in the form $y = \hat{a} + \hat{b}x$, where $\hat{b}$ is as given above and $\hat{a} = \bar{y} - \hat{b}\bar{x}$.
Proof. (Continued from the proof begun on p. 731.) Remember that the data (x1 , x2 , . . . , xn )
and (y1 , y2 , . . . , yn ) are given. Thus, we can treat all the xi s and yi s as constants. We have:
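\[ \frac{\partial}{\partial\hat{a}} \sum \hat{u}_i^2 = \sum \frac{\partial}{\partial\hat{a}} \hat{u}_i^2 = \sum \left(2\hat{u}_i \frac{\partial\hat{u}_i}{\partial\hat{a}}\right) = \sum -2\left[y_i - (\hat{a} + \hat{b}x_i)\right]. \]
Thus, $\frac{\partial}{\partial\hat{a}} \sum \hat{u}_i^2 = 0 \iff \sum \left[y_i - (\hat{a} + \hat{b}x_i)\right] = 0 \iff \hat{a} \overset{1}{=} \bar{y} - \hat{b}\bar{x}$.
We also have:
\[ \frac{\partial}{\partial\hat{b}} \sum \hat{u}_i^2 = \sum \frac{\partial}{\partial\hat{b}} \hat{u}_i^2 = \sum \left(2\hat{u}_i \frac{\partial\hat{u}_i}{\partial\hat{b}}\right) = \sum -2x_i\left[y_i - (\hat{a} + \hat{b}x_i)\right]. \]
Thus, $\frac{\partial}{\partial\hat{b}} \sum \hat{u}_i^2 = 0 \iff \sum \left[y_i - (\hat{a} + \hat{b}x_i)\right] x_i = 0$. Plugging $\overset{1}{=}$ into this last equation, we have $\sum \left[y_i - (\bar{y} - \hat{b}\bar{x} + \hat{b}x_i)\right] x_i = 0$. Tedious algebra yields Formula (ii):
\[ \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}. \]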
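Formula (ii) and the expression for $\hat{a}$ translate directly into code. The sketch below is a minimal illustration with made-up data. The function name `least_squares` is an assumption. The last two printed values are the sums appearing in the two first-order conditions, which should both be (numerically) zero.

```python
def least_squares(xs, ys):
    """Return (a_hat, b_hat) for the regression line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b_hat = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)  # Formula (ii)
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]            # made-up data
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    a_hat, b_hat = least_squares(xs, ys)
    residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]
    print(a_hat, b_hat)
    print(sum(residuals), sum(x * u for x, u in zip(xs, residuals)))
```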
According to NASA (1976), “U.S. Standard Atmosphere”, p. 12, eq. (33a) (PDF), the barometric formula (relating pressure P to height h above sea level), in the case where $L_{M,b} \neq 0$, is given by:
\[ P = P_b \left[\frac{T_{M,b}}{T_{M,b} + L_{M,b}(h - h_b)}\right]^{\frac{g_0' M}{R^* L_{M,b}}}, \]
where $P_b$, $T_{M,b}$, $L_{M,b}$, $h_b$, $g_0'$, $M$, $R^*$ are simply constants. Now, do the algebra:
\[
\begin{aligned}
P &= P_b \left[\frac{T_{M,b}}{T_{M,b} + L_{M,b}(h - h_b)}\right]^{\frac{g_0' M}{R^* L_{M,b}}} \\
&= P_b \left[\frac{T_{M,b} + L_{M,b}(h - h_b)}{T_{M,b}}\right]^{-\frac{g_0' M}{R^* L_{M,b}}} \\
&= P_b \left[1 + \frac{L_{M,b}}{T_{M,b}}(h - h_b)\right]^{-\frac{g_0' M}{R^* L_{M,b}}}, \\
\ln P &= \ln P_b - \frac{g_0' M}{R^* L_{M,b}} \ln\left[1 + \frac{L_{M,b}}{T_{M,b}}(h - h_b)\right].
\end{aligned}
\]
Now, for heights up to 11,000 m above sea level, $h_b$ is simply the height at sea level. That is, $h_b = 0$ m. If we also let $a = \ln P_b$ and $b = -\frac{g_0' M}{R^* L_{M,b}}$ and get rid of the subscripts in $L_{M,b}$ and $T_{M,b}$ (just to make it neater), then we have:
\[ \ln P = a + b \ln\left(1 + \frac{L}{T}h\right). \]
For heights up to 11,000 m above sea level, L = −0.0065 kelvin per metre (i.e. −6.5 kelvin per kilometre) is the temperature lapse rate (the rate at which the temperature changes as we go up in altitude; see p. 3, Table 4) and T = 288.15 kelvin is the standard sea-level temperature (also precisely equal to 15 °C; see p. 4).
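To get a feel for the numbers, here is a rough sketch that evaluates both the original formula and the log-linear form. The constants below are not given in this textbook and are assumptions taken from commonly quoted standard-atmosphere values: sea-level pressure 101325 Pa, g0' = 9.80665 m/s², M = 0.0289644 kg/mol, R* = 8.31432 J/(mol·K), and a lapse rate of −0.0065 kelvin per metre (−6.5 kelvin per kilometre). The function names are illustrative.

```python
import math

# Assumed constants (standard-atmosphere values, not stated in the text).
P_B = 101325.0     # sea-level pressure, in pascals
T = 288.15         # sea-level temperature, in kelvin
L = -0.0065        # temperature lapse rate, in kelvin per metre
G0 = 9.80665       # gravitational acceleration, in m/s^2
M = 0.0289644      # molar mass of air, in kg/mol
R_STAR = 8.31432   # gas constant, in J/(mol K)


def pressure(h):
    """Pressure (Pa) at height h metres, for 0 <= h <= 11000 (h_b = 0)."""
    exponent = G0 * M / (R_STAR * L)
    return P_B * (T / (T + L * h)) ** exponent


def log_pressure(h):
    """ln P via the rearranged form ln P = a + b ln(1 + (L/T) h)."""
    a = math.log(P_B)
    b = -G0 * M / (R_STAR * L)
    return a + b * math.log(1 + L / T * h)


if __name__ == "__main__":
    for h in (0, 1000, 5000, 11000):
        print(h, pressure(h), math.exp(log_pressure(h)))
```

The two columns of pressures should agree, which is a quick check that the rearrangement into the form ln P = a + b ln(1 + (L/T)h) lost nothing.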
Answers to Exercises
My answers here are often more verbose than what would be necessary for you to get the
full credit on an exam. The reason is to help you understand my answers better.
Answer to Exercise 3. The set W = {Apple, Apple, Apple, Banana, Banana, Apple}
has only two distinct elements. Hence, n(W ) = 2. We can rewrite the set more simply as
W = {Apple, Banana}.
Answer to Exercise 4. There is only one even prime number, namely 2. Hence,
n(C) = 1.
Answer to Exercise 5. D is the set containing the first 50 odd positive integers; hence,
n(D) = 50. And T is the set containing the first 99 negative integers; hence, n(T ) = 99.
Answer to Exercise 6. The set of all primes is H = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, . . . }.
Answer to Exercise 7. If U = {−1, 0, 2}, then U + = {2}, U − = {−1}, U0 = {−1, 0, 2}, U0+ =
{0, 2}, and U0− = {−1, 0}.
Answer to Exercise 8. The set Z = [1, 1] contains only one element, namely the number
1. So actually, we can also write Z = {1}.
Answer to Exercise 9. The set Y = (1, 1) contains no elements. So actually, we can also
write Y = ∅.
Answer to Exercise 10. The set X = (1, 1.01) contains infinitely many elements, namely
all the real numbers that are greater than 1 but smaller than 1.01.
Answer to Exercise 11. R = (−∞, ∞), R+ = (0, ∞), R+0 = [0, ∞), R− = (−∞, 0), and
R−0 = (−∞, 0].
Answer to Exercise 12. (a) Every integer is also a rational number and a real number;
hence, Z ⊆ Q, R. (b) A rational number is also a real number; hence, Q ⊆ R. However,
some rational numbers are not integers (e.g. 1.5 is rational but is not an integer); hence,
Q ⊆/ Z. (c) Some real numbers are neither rational nor integers (e.g. π); hence, R ⊆/ Z, Q.
Answer to Exercise 13. True. The set of currently-serving Singapore Prime Ministers is
{Lee Hsien Loong}. The set of currently-serving Singapore Ministers is {Lee Hsien Loong,
Tharman, Teo Chee Hean, Khaw Boon Wan, . . . }. The latter set contains every element
that is in the former set. Hence, the former is a subset of the latter.
Answer to Exercise 14. Yes, the set of squares is a proper subset of the set of rectangles.
All squares are rectangles and so S ⊆ R. Moreover, some rectangles are not squares and so
S ≠ R. Altogether then, by Definition 6, S ⊂ R.
Answer to Exercise 18. (a) [1, 2] ∪ [2, 3] = [1, 3]. (b) (−∞, −3) ∪ [−16, 7) = (−∞, 7). (c)
{0} ∪ Z+ = Z+0 .
Answer to Exercise 19. S ∪ R = R. In words, “the set of all squares and all rectangles”
is itself simply “the set of all rectangles”.
Answer to Exercise 20. All real numbers are either rational or irrational. Hence, the
set of all rationals and irrationals is itself simply “the set of all reals” or R.
Answer to Exercise 21. (a) (4, 7] ∩ (6, 9) = (6, 7]. (b) [1, 2] ∩ [5, 6] = ∅. (c) (−∞, −3) ∩
[−16, 7) = [−16, −3).
Answer to Exercise 22. S ∩ R = S. In words, the intersection of these two sets is simply
itself the set of all squares. This is because the only objects that are BOTH squares AND
rectangles are squares.
Answer to Exercise 23. It is the empty set (∅). This is because there is no object that
is BOTH rational AND irrational.
Answer to Exercise 28. The error is in Step #5. Since x = y, we have x − y = 0. Hence,
we cannot divide both sides by x − y.
Answer to Exercise 29. f (1) = 1 + 1 = 2, g(1) = 17(1) = 17, and $h(1) = 3^1 = 3$. i(1) is
simply undefined because 1 is not in the domain $\mathbb{Z}^-$ = {−1, −2, −3, ...}.
Answer to Exercise 30. (i) Yes. (ii) Every element in the domain is assigned to exactly
one element in the codomain. Specifically, 5 ↦ 10, 6 ↦ 12, and 7 ↦ 14. (iii) The function
f ∶ {5, 6, 7} → {. . . , −6, −4, −2, 0, 2, 4, 6, . . . } is defined by x ↦ 2x (or alternatively, f (x) = 2x).
Answer to Exercise 31. (i) No. (ii) By the rule, the function would map 0 to 3 and/or
4. Thus, this violates the requirement that every element in the domain is assigned to
exactly one element in the codomain, because the element 0 in the domain is assigned to
more than one element in the codomain. (iii) NA.
Answer to Exercise 32. (i) No. (ii) By the rule, the function would map 2 to no element
in the codomain; and 4 to 3. Thus, this violates the requirement that every element in the
domain is assigned to exactly one element in the codomain, because the element 2 in the
domain is not assigned to any element in the codomain. (iii) NA.
Answer to Exercise 33. (i) Yes. (ii) By the rule, the function would simply map 1 in
the domain to 1 in the codomain. And so every element in the domain is assigned to one
(and exactly one) element in the codomain, as required. (iii) The function f ∶ {1} → {1} is
defined by x ↦ x (or alternatively, f (x) = x).
Answer to Exercise 34. (i) Yes. (ii) By the rule, the function would simply map 1 in
the domain to 1 in the codomain. And so every element in the domain is assigned to one
(and exactly one) element in the codomain, as required. (iii) The function f ∶ {1} → {1, 2}
is defined by x ↦ x (or alternatively, f (x) = x).
Answer to Exercise 35. (i) No. (ii) By the rule, the function would map 1 in the
domain to 1 in the codomain, but it would fail to map 2 in the domain to any element in
the codomain. This fails the requirement that every element in the domain is assigned to
exactly one element in the codomain. (iii) NA.
Answer to Exercise 36. (i) No. (ii) By the rule, the function would map −1 to no
element in the codomain, because $\sqrt{-1} \notin \mathbb{R}$. Thus, this violates the requirement that every
element in the domain is assigned to exactly one element in the codomain, because the
element −1 in the domain is not assigned to any element in the codomain. (iii) NA.
Answer to Exercise 37. (i) No. (ii) By the rule, the function would map 0 to no element
in the codomain, because $\frac{1}{0}$ is undefined and is thus not in the codomain. This violates
the requirement that every element in the domain is assigned to exactly one element in
the codomain, because the element 0 in the domain is not assigned to any element in the
codomain. (iii) NA.
Answer to Exercise 42. Only (b) is true: “The range of any function is a subset of its
codomain.” The range of a function need not be a subset of its domain, so (a) is false. The
range of a function is often but not always a proper subset of its codomain, so (c) is false.
\[ y = \sqrt{x} \iff y^2 = x. \]
Thus, indeed, this function is one-to-one — every element y in the range corresponds to
exactly one element in the domain, namely y 2 .
\[ y = x^2 \iff \pm\sqrt{y} = x. \]
The domain consists of only non-negative reals. And so it is impossible that $x = -\sqrt{y}$. So this function is indeed one-to-one: every element y in the range corresponds to exactly one element in the domain, namely $\sqrt{y}$.
(c) The function h ∶ R → R defined by x ↦ ∣x∣ is not one-to-one: for example, 23 in the
range is “hit” once by 23 and again by −23.
(d) To check whether the function $i: \mathbb{R}_0^+ \to \mathbb{R}$ defined by x ↦ ∣x∣ is one-to-one, we need to show that every element y in the range corresponds to exactly one element x in the domain. To this end, pick any element y in the range and write:
\[ y = |x| \iff y = \begin{cases} x, & \text{if } x \ge 0, \\ -x, & \text{if } x < 0. \end{cases} \]
The domain consists of only non-negative reals. And so it is impossible that x < 0. So this
function is indeed one-to-one — every element y in the range corresponds to exactly one
element in the domain, namely y.
(e) The function j ∶ R → R defined by x ↦ sin x is not one-to-one: for example, 0 is
“hit” by infinitely many elements in the domain, namely . . . , −2π, −π, 0, π, 2π, . . . . This is
because sin(−2π) = 0, sin(−π) = 0, etc.
(b).
1. The function g ∶ [−0.5π, 0.5π] → R defined by x ↦ sin x has range [−1, 1]. So the inverse
function has domain [−1, 1].
2. The domain of g is [−0.5π, 0.5π]. So the inverse function has codomain [−0.5π, 0.5π].
3. Pick any element y in the range and write:
So g −1 has mapping rule y ↦ sin−1 y. (For a brief review of the arcsine function, see
Section 26.6 and the sections that follow.)
(c).
\[ y = h(x) \iff y = x^3 \iff \underbrace{\sqrt[3]{y}}_{h^{-1}(y)} = x. \]
So $h^{-1}$ has mapping rule $y \mapsto \sqrt[3]{y}$.
1. The function g has range (0, ∞). So the inverse function has domain (0, ∞).
2. The domain of g is (1, ∞). So the inverse function has codomain (1, ∞).
3. Pick any element y in the range and write:
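\[
\begin{aligned}
y = g(x) &\iff y = \frac{1}{(x-1)^2} \iff (x-1)^2 = \frac{1}{y} && (\because y \neq 0) \\
&\iff x - 1 = \pm\sqrt{\frac{1}{y}} \iff x = 1 \pm \sqrt{\frac{1}{y}}.
\end{aligned}
\]
We know that the domain of g (and hence the codomain of $g^{-1}$) is (1, ∞). So $g^{-1}$ has mapping rule $y \mapsto 1 + \sqrt{\frac{1}{y}}$.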
y
1. The function g has range [400, 900]. So the inverse function has domain [400, 900].
2. The domain of g is [20, 30]. So the inverse function has codomain [20, 30].
3. Pick any element y in the range and write:
\[ y = g(x) \iff y = x^2 \iff \pm\sqrt{y} = x. \]
(b) The range of g is $\mathbb{R}^+$ and this is indeed a subset of the domain of f (which is $\mathbb{R}$). So the composite function $fg: \mathbb{R} \to \mathbb{R}$ exists and is defined by $x \mapsto f(g(x)) = f(e^x) = (e^x)^2 + 1 = e^{2x} + 1$. Thus, $fg(1) = e^{2(1)} + 1 = e^2 + 1$ and $fg(2) = e^{2(2)} + 1 = e^4 + 1$.
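(c) The range of g is $\mathbb{R}^- \cup \mathbb{R}^+$ and this is indeed a subset of the domain of f (which is $\mathbb{R}^- \cup \mathbb{R}^+$). So the composite function $fg: \mathbb{R}^- \cup \mathbb{R}^+ \to \mathbb{R}^- \cup \mathbb{R}^+$ exists and is defined by $x \mapsto f(g(x)) = f\left(\frac{1}{2x}\right) = 1/\left(\frac{1}{2x}\right) = 2x$. Thus, $fg(1) = 2(1) = 2$ and $fg(2) = 2(2) = 4$.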
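(d) The range of g is $\mathbb{R}^- \cup \mathbb{R}^+$ and this is indeed a subset of the domain of f (which is $\mathbb{R}^- \cup \mathbb{R}^+$). So the composite function $fg: \mathbb{R}^- \cup \mathbb{R}^+ \to \mathbb{R}^- \cup \mathbb{R}^+$ exists and is defined by $x \mapsto f(g(x)) = f\left(\frac{1}{x}\right) = \frac{1}{2 \times 1/x} = \frac{x}{2}$. Thus, $fg(1) = \frac{1}{2}$ and $fg(2) = \frac{2}{2} = 1$.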
Answer to Exercise 48. (a) The range of f is $\mathbb{R}^+$ and this is indeed a subset of the domain of f (which is $\mathbb{R}$). So the composite function $f^2: \mathbb{R} \to \mathbb{R}$ exists and is defined by $x \mapsto f(f(x)) = e^{f(x)} = e^{e^x}$. Hence, $f^2(1) = e^{e^1}$ and $f^2(2) = e^{e^2}$.
(b) The range of f is R and this is indeed a subset of the domain of f (which is R). So
the composite function f 2 ∶ R → R exists and is defined by x ↦ f (f (x)) = 3f (x) + 2 =
3(3x + 2) + 2 = 9x + 8. Hence, f 2 (1) = 17 and f 2 (2) = 26.
(c) The range of f is [1, ∞) and this is indeed a subset of the domain of f (which is R).
So the composite function f 2 ∶ R → R exists and is defined by
\[ x \mapsto f(f(x)) = 2[f(x)]^2 + 1 = 2(2x^2 + 1)^2 + 1 \]
Answer to Exercise 49. (a) No, it is impossible to rewrite the equation x2 + y 2 = 1 into
the form of a single function. For every value of x, there can be two corresponding values of
y. For example, if x = 0, then either y = −1 or y = 1 will satisfy the equation. There is thus
no way to write y as a function of x. Conversely, for every value of y, there can likewise be
two corresponding values of x. There is thus no way to write x as a function of y.
(b) Although it is impossible to rewrite the equation $x^2 + y^2 = 1$ into the form of a single function, it is nonetheless possible to rewrite it into the form of two functions. Namely, $f: [-1, 1] \to \mathbb{R}$ defined by $x \mapsto \sqrt{1 - x^2}$ and $g: [-1, 1] \to \mathbb{R}$ defined by $x \mapsto -\sqrt{1 - x^2}$. These
[Figures: graphs of $y = e^x$, $y = 3x + 2$, and $y = 2x^2 + 1$, each drawn for x from −2 to 2.]
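\[
\begin{aligned}
&= \frac{5^{2+x}}{5^{2x}\left(5^1 + 3 + 17\right)} && \text{(Factorise out } 5^{2x}\text{)} \\
&= \frac{5^{2+x}}{5^{2x}(25)} = \frac{5^x}{5^{2x}} = \frac{1}{5^x} = 5^{-x}.
\end{aligned}
\]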
a = 3, b = 4. Then $x^{(a^b)} = 2^{(3^4)} = 2^{81}$, but $x^{ab} = 2^{3 \times 4} = 2^{12}$; the two are clearly not equal.
(ii) $(x^a)^b = x^{ab}$ is true, as we now prove:
\[
\begin{aligned}
(x^a)^b &= \overbrace{(x^a) \cdot (x^a) \cdots (x^a)}^{b \text{ times}} \\
&= \overbrace{\Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big) \cdot \Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big) \cdots \Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big)}^{b \text{ times}} \\
&= x^{ab}.
\end{aligned}
\]
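$$\frac{1}{\frac{x}{y} + \sqrt{\frac{x^2}{y^2} + 1}} = \frac{\frac{x}{y} - \sqrt{\frac{x^2}{y^2} + 1}}{\left(\frac{x}{y} + \sqrt{\frac{x^2}{y^2} + 1}\right)\left(\frac{x}{y} - \sqrt{\frac{x^2}{y^2} + 1}\right)} = \frac{\frac{x}{y} - \sqrt{\frac{x^2}{y^2} + 1}}{\left(\frac{x}{y}\right)^2 - \left(\sqrt{\frac{x^2}{y^2} + 1}\right)^2}$$
$$= \frac{\frac{x}{y} - \sqrt{\frac{x^2}{y^2} + 1}}{\frac{x^2}{y^2} - \left(\frac{x^2}{y^2} + 1\right)} = \frac{\frac{x}{y} - \sqrt{\frac{x^2}{y^2} + 1}}{-1} = \sqrt{\frac{x^2}{y^2} + 1} - \frac{x}{y}.$$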
Answer to Exercise 54. (a) The graph of the equation x2 +y 2 = 1 intersects the horizontal
axis at the points (−1, 0) and (1, 0); and intersects the vertical axis at the points (0, −1)
and (0, 1).
(b) The graph of the equation y = x2 − 4 intersects the horizontal axis at the points (−2, 0)
and (2, 0); and intersects the vertical axis at the point (0, −4).
(c) The graph of the equation $y = x^2 + 2x + 1 = (x + 1)^2$ intersects the horizontal axis at the point (−1, 0) and the vertical axis at the point (0, 1).
(d) The graph of the equation y = x2 + 2x + 2 does not intersect the horizontal axis, but
does intersect the vertical axis at the point (0, 2).
Answer to Exercise 55. (a) Given the point (3, 17), its reflection in the line y = x is
(17, 3) and its reflection in the line y = −x is (−17, −3).
(b) Given the point (−1, 5), its reflection in the line y = x is (5, −1) and its reflection in the
line y = −x is (−5, 1).
(c) Given the point (0, 0), its reflection in the line y = x is (0, 0) and its reflection in the
line y = −x is (0, 0).
Answer to Exercise 56. Given $f: \mathbb{R} \to \mathbb{R}$ defined by $f(x) = \begin{cases} 1, & \text{if } x \leq 0, \\ 2, & \text{if } x > 0, \end{cases}$ we have
Answer to Exercise 58. Given $g: \mathbb{R} \to \mathbb{R}$ defined by $x \mapsto x^4 - x^3 + x^2 - x + 1$, the derivative of g is the function with domain and codomain both $\mathbb{R}$ and mapping rule $x \mapsto 4x^3 - 3x^2 + 2x - 1$. It may be denoted $g'$ or $\frac{dg}{dx}$ or $\dot{g}$. Evaluated at 1, we have $g'(1) = \left.\frac{dg}{dx}\right|_{x=1} = \dot{g}(1) = 2$.
The 2nd derivative of g is the function with domain and codomain both $\mathbb{R}$ and mapping rule $x \mapsto 12x^2 - 6x + 2$. It may be denoted $g''$ or $\frac{d^2 g}{dx^2}$ or $\ddot{g}$. Evaluated at 1, we have $g''(1) = \left.\frac{d^2 g}{dx^2}\right|_{x=1} = \ddot{g}(1) = 8$.
The 3rd derivative of g is the function with domain and codomain both $\mathbb{R}$ and mapping rule $x \mapsto 24x - 6$. It may be denoted $g^{(3)}$ or $\frac{d^3 g}{dx^3}$ or $\overset{3}{\dot{g}}$. Evaluated at 1, we have $g^{(3)}(1) = \left.\frac{d^3 g}{dx^3}\right|_{x=1} = \overset{3}{\dot{g}}(1) = 18$.
The 4th derivative of g is the function with domain and codomain both $\mathbb{R}$ and mapping rule $x \mapsto 24$. It may be denoted $g^{(4)}$ or $\frac{d^4 g}{dx^4}$ or $\overset{4}{\dot{g}}$. Evaluated at 1, we have $g^{(4)}(1) = \left.\frac{d^4 g}{dx^4}\right|_{x=1} = \overset{4}{\dot{g}}(1) = 24$.
For n ≥ 5, the nth derivative of g is the function with domain and codomain both $\mathbb{R}$ and mapping rule $x \mapsto 0$. It may be denoted $g^{(n)}$ or $\frac{d^n g}{dx^n}$ or $\overset{n}{\dot{g}}$. Evaluated at 1, we have $g^{(n)}(1) = \left.\frac{d^n g}{dx^n}\right|_{x=1} = \overset{n}{\dot{g}}(1) = 0$.
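The repeated differentiation above can be mirrored with a short routine that stores a polynomial as a list of coefficients. This is only an illustrative sketch: the function names `differentiate` and `evaluate` and the lowest-power-first coefficient ordering are assumptions made here, not notation from the text.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] (c_k multiplies x^k)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's method."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result

g = [1, -1, 1, -1, 1]  # g(x) = 1 - x + x^2 - x^3 + x^4

derivative = g
values_at_1 = []
for n in range(1, 6):
    derivative = differentiate(derivative)  # the nth derivative of g
    values_at_1.append(evaluate(derivative, 1))

print(values_at_1)
```

The list holds the 1st to 5th derivatives evaluated at 1, and can be compared with the values 2, 8, 18, 24, and 0 obtained above.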
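$$h'(x) = \left[\cos\frac{x}{g(x)}\right] \cdot \frac{d}{dx}\left[\frac{x}{g(x)}\right] = \left[\cos\frac{x}{g(x)}\right] \cdot \frac{g(x) - x g'(x)}{[g(x)]^2}.$$
$$\frac{d}{dx}\csc x = \frac{d}{dx}\frac{1}{\sin x} = \frac{0 - \cos x}{\sin^2 x} = -\frac{1}{\sin x}\frac{\cos x}{\sin x} = -\csc x \cot x.$$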
Answer to Exercise 61. (a) Newton’s Second Law of Motion is $F = \frac{d}{dt}(mv)$. (In words, force is equal to the rate of change of momentum.)
(b) Using the Product Rule, $F = \frac{d}{dt}(mv) = \frac{dm}{dt}v + m\frac{dv}{dt}$.
Now, under the assumption that mass is constant, we have $\frac{dm}{dt} = 0$, so that $F = m\frac{dv}{dt}$.
Acceleration (a) is defined as the rate of change of velocity. Hence, F = ma.
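Answer to Exercise 62. $\frac{d}{dx}\sec x = \frac{d}{dx}\frac{1}{\cos x} = \frac{0 - (-\sin x)}{\cos^2 x} = \frac{1}{\cos x}\frac{\sin x}{\cos x} = \sec x \tan x$.
Rewrite $y = \cos^{-1} x$ as $x = \cos y$ and then apply $\frac{d}{dx}$ (implicit differentiation) to get $1 = -\sin y \frac{dy}{dx}$. But $\sin^2 y + \cos^2 y = 1$ and $\sin y \geq 0$ for $y \in [0, \pi]$, so $\sin y = \sqrt{1 - x^2}$. Thus, $\frac{dy}{dx} = \frac{d}{dx}\cos^{-1} x = \frac{-1}{\sin y} = \frac{-1}{\sqrt{1 - x^2}}$.
Rewrite $y = \tan^{-1} x$ as $x = \tan y$ and then apply $\frac{d}{dx}$ (implicit differentiation) to get $1 = \sec^2 y \frac{dy}{dx}$. But $1 + \tan^2 y = \sec^2 y$, so $\frac{dy}{dx} = \frac{d}{dx}\tan^{-1} x = \frac{1}{\sec^2 y} = \frac{1}{1 + x^2}$.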
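Answer to Exercise 63. For every $k \in \mathbb{Z}$, g is increasing on $\left[-\frac{\pi}{2} + 2k\pi, \frac{\pi}{2} + 2k\pi\right]$, decreasing on $\left[\frac{\pi}{2} + 2k\pi, \frac{3\pi}{2} + 2k\pi\right]$, strictly increasing on $\left(-\frac{\pi}{2} + 2k\pi, \frac{\pi}{2} + 2k\pi\right)$, and strictly decreasing on $\left(\frac{\pi}{2} + 2k\pi, \frac{3\pi}{2} + 2k\pi\right)$.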
Answer to Exercise 64. (a) Given the function f ∶ R → R defined by x ↦ 100 ...
(i) Every point a ∈ R is a maximum point with corresponding maximum value f (a) = 100;
(ii) Every point a ∈ R is a minimum point with corresponding minimum value f (a) = 100;
(iii) No point is a strict maximum;
(iv) No point is a strict minimum;
(v) Every point a ∈ R is a global maximum point with corresponding global maximum
value f (a) = 100;
(vi) Every point a ∈ R is a global minimum point with corresponding global minimum
value f (a) = 100;
(vii) No point is a strict global maximum;
(viii) No point is a strict global minimum.
(iv) Only x = 0 is a strict minimum point with corresponding strict minimum value g(0) = 0;
(v) No point is a global maximum;
(vi) Only x = 0 is a global minimum point with corresponding global minimum value
g(0) = 0;
(vii) No point is a strict global maximum;
(viii) Only x = 0 is a strict global minimum point with corresponding strict global minimum
value g(0) = 0.
Answer to Exercise 65. (a) It is false that every maximum point or minimum point is
a stationary point — see Points A and E in Example 141.
(b) It is also false that every maximum point or minimum point is a turning point — again,
see Points A and E in Example 141.
(c) It is false that every stationary point is a maximum point or minimum point — see
Point D in Example 141.
(d) By Definition 140, it is true that every turning point is a maximum point or minimum
point.
(e) By Definition 140, it is true that every turning point is a stationary point.
(f) It is false that every stationary point is a turning point — again, see Point D in Example
141.
Answer to Exercise 67. “If c is a maximum or minimum point AND in the interior of
D, then c is a turning point”: true! By the IET, c is a stationary point. Since c is also
either a maximum or a minimum point, by Definition 44, c is also a turning point.
There are neither stationary nor non-interior points. Hence, there are no maximum or
minimum points.
$h'(x) = 4x^3 - 4x = 4x(x^2 - 1) = 4x(x - 1)(x + 1)$. So the stationary points of h are 0, 1, and −1.
From a sketch of the graph, we see that 0 is a maximum point. And ±1 are minimum points (and also global minimum points).
[Figure: sketch of the graph of h for x from −2 to 2.]
1. Identify all the stationary points. $g'(x) = 8x^7 + 7x^6 - 6x^5 = x^5(8x^2 + 7x - 6) = 0 \iff x = 0$, or
$$x = \frac{-7 \pm \sqrt{7^2 - 4(8)(-6)}}{2(8)} = \frac{-7 \pm \sqrt{241}}{16}.$$
(a) $g''(x) = 56x^6 + 42x^5 - 30x^4$. So
$$g''(0) = 0, \quad g''\left(\frac{-7 - \sqrt{241}}{16}\right) > 0, \quad \text{and} \quad g''\left(\frac{-7 + \sqrt{241}}{16}\right) > 0.$$
(b) So $\frac{-7 \pm \sqrt{241}}{16}$ are both minimum points. The 2DT is inconclusive about 0. However, near 0 the factor $8x^2 + 7x - 6$ is negative, so $g'(x) = x^5(8x^2 + 7x - 6)$ is positive just to the left of 0 and negative just to the right of 0. Hence, 0 is a maximum point.
Altogether, we conclude that the only maximum point is 0 and the only two minimum points are $\frac{-7 \pm \sqrt{241}}{16}$.
(b) $h: \left(-\frac{\pi}{2}, \frac{\pi}{2}\right) \to \mathbb{R}$ defined by $x \mapsto \tan x$.
1. Identify all the stationary points. $h'(x) = \sec^2 x$ is never equal to 0, so there are no stationary points.
2. There are no non-interior points.
Altogether, we conclude that there are no maximum points and no minimum points.
(c) $i: [0, 2\pi] \to \mathbb{R}$ defined by $x \mapsto \sin x + \cos x$.
1. Identify all the stationary points. $i'(x) = \cos x - \sin x = 0 \iff x = \frac{\pi}{4}, \frac{5\pi}{4}$.
(a) $i''(x) = -\sin x - \cos x$. So $i''\left(\frac{\pi}{4}\right) < 0$ and $i''\left(\frac{5\pi}{4}\right) > 0$.
(b) So $\frac{\pi}{4}$ is a maximum point and $\frac{5\pi}{4}$ is a minimum point.
2. The only two non-interior points are 0 and 2π. The former is a minimum point and the latter is a maximum point.
Altogether, we conclude that the two maximum points are $\frac{\pi}{4}$ and 2π, and the two minimum points are $\frac{5\pi}{4}$ and 0.
[Figures: graph of y = 2e^x + x with asymptote y = x; graph of y = 3x + 2 with y = 2 − x/3 marked as one of its infinitely many lines of symmetry; graph of y = 2x^2 + 1.]
Answer to Exercise 71. The graphs of all three equations are below: (a) $y = 2x^2 + x + 1$ (red). (b) $y = -2x^2 + x + 1$ (blue). (c) $y = x^2 + 6x + 9$ (green).
(a) Since $b^2 - 4ac = 1^2 - 4(2)(1) = 1 - 8 = -7 < 0$, there are no horizontal intercepts. The vertical intercept is c = 1. The turning point is at $x = -\frac{b}{2a} = -\frac{1}{4} = -0.25$.
(b) Since $b^2 - 4ac = 1^2 - 4(-2)(1) = 1 + 8 = 9 > 0$, there are two horizontal intercepts, namely $\frac{-b \pm \sqrt{9}}{2a} = \frac{-1 \pm \sqrt{9}}{-4} = 1, -0.5$. The vertical intercept is c = 1. The turning point is at $x = -\frac{b}{2a} = \frac{-1}{-4} = 0.25$.
(c) Since $b^2 - 4ac = 6^2 - 4(1)(9) = 36 - 36 = 0$, there is one horizontal intercept, namely $-\frac{b}{2a} = -\frac{6}{2} = -3$. The vertical intercept is c = 9. The turning point is at $x = -\frac{b}{2a} = -3$.
Answer to Exercise 72. Below, alongside the graph of f (in red), are the graphs of y = ∣2f (3x)∣ (in green), y = f (∣x − 1∣) (in blue), and $y^2 = f(x) + 4$ (in purple).
(a) To get the graph of y = ∣2f (3x)∣ (in green), stretch the red graph horizontally (outwards from the vertical axis) by a factor of 1/3, then stretch the new graph vertically (upwards from the horizontal axis) by a factor of 2, and finally reflect all points for which y < 0 on the horizontal axis.
(b) To get the graph of y = f (∣x − 1∣) (in blue), shift the red graph rightwards by 1 unit, then reflect all points for which x < 1 on the vertical line x = 1.
(c) To get the graph of $y^2 = f(x) + 4$ (in purple), shift the red graph upwards by 4 units, then for all points for which f (x) + 4 ≥ 0, take both the positive and negative square roots.
1. Stretch the graph horizontally, outwards from the vertical axis, by a scale factor of $\frac{1}{5}$ to get the graph of $y = \frac{1}{5x}$.
2. Move the graph rightward by $\frac{2}{5}$ units to get the graph of $y = \frac{1}{5\left(x - \frac{2}{5}\right)} = \frac{1}{5x - 2}$.
3. Reflect the graph on the horizontal axis to get the graph of $y = -\frac{1}{5x - 2}$.
4. Move the graph upward by 3 units to get the graph of $y = 3 - \frac{1}{5x - 2}$.
Answer to Exercise 74. The equation $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ is the special case of the equation $\overset{1}{=}$ where $A = \frac{1}{a^2}$, B = 0, $C = \frac{1}{b^2}$, D = 0, E = 0, and F = −1. We have $B^2 - 4AC = 0^2 - 4\frac{1}{a^2}\frac{1}{b^2} < 0$, so that this is indeed an ellipse.
The equation $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$ is the special case of the equation $\overset{1}{=}$ where $A = \frac{1}{a^2}$, B = 0, $C = -\frac{1}{b^2}$, D = 0, E = 0, and F = −1. We have $B^2 - 4AC = 0^2 - 4\frac{1}{a^2}\frac{-1}{b^2} > 0$, so that this is indeed a hyperbola.
The equation $\frac{y^2}{b^2} - \frac{x^2}{a^2} = 1$ is the special case of the equation $\overset{1}{=}$ where $A = -\frac{1}{a^2}$, B = 0, $C = \frac{1}{b^2}$, D = 0, E = 0, and F = −1. We have $B^2 - 4AC = 0^2 - 4\frac{-1}{a^2}\frac{1}{b^2} > 0$, so that this is indeed a hyperbola.
The equation $y = \frac{ax + b}{cx + d}$ can be rewritten as cxy + dy = ax + b or cxy − ax + dy − b = 0, with the additional condition that $x \neq -\frac{d}{c}$ (otherwise the denominator is 0). The equation $y = \frac{ax + b}{cx + d}$ is thus the special case of the equation $\overset{1}{=}$ where A = 0, B = c, C = 0, D = −a, E = d, and F = −b, with the additional condition that $x \neq -\frac{d}{c}$. We have $B^2 - 4AC = c^2 - 4(0)(0) > 0$, so that this is indeed a hyperbola.
The equation $y = \frac{ax^2 + bx + c}{dx + e}$ can be rewritten as $dxy + ey = ax^2 + bx + c$ or $-ax^2 + dxy - bx + ey - c = 0$, with the additional condition that $x \neq -\frac{e}{d}$ (otherwise the denominator is 0). The equation $y = \frac{ax^2 + bx + c}{dx + e}$ is thus the special case of the equation $\overset{1}{=}$ where A = −a, B = d, C = 0, D = −b, E = e, and F = −c, with the additional condition that $x \neq -\frac{e}{d}$. We have $B^2 - 4AC = d^2 - 4(-a)(0) = d^2 > 0$, so that this is indeed a hyperbola.
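The classification used throughout this answer depends only on the sign of B² − 4AC. Here is a minimal sketch, assuming the general equation has the form Ax² + Bxy + Cy² + Dx + Ey + F = 0 (consistent with the coefficient assignments above); the function name `classify_conic` and the sample values of a, b, and c are hypothetical.

```python
from fractions import Fraction

def classify_conic(A, B, C):
    """Classify Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 by the sign of B^2 - 4AC.
    Degenerate cases (e.g. a pair of lines) are not detected here."""
    disc = B * B - 4 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

a, b = Fraction(3), Fraction(2)  # hypothetical sample values with a, b != 0
print(classify_conic(1 / a**2, 0, 1 / b**2))    # x^2/a^2 + y^2/b^2 = 1
print(classify_conic(1 / a**2, 0, -1 / b**2))   # x^2/a^2 - y^2/b^2 = 1
print(classify_conic(0, 5, 0))                  # y = (ax + b)/(cx + d) with c = 5
```

Exact fractions are used so that the sign of the discriminant is not affected by rounding.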
(ii) It intersects the vertical axis at (−c, −d + b) and (−c, −d − b) and the horizontal axis at
(−c + a, −d) and (−c − a, −d).
(iii) There is one maximum turning point (−c, −d + b) and one minimum turning point
(−c, −d − b).
[Figure: the ellipse (x + c)^2/a^2 + (y + d)^2/b^2 = 1, with centre (−c, −d), lines of symmetry x = −c and y = −d, and the points (−c, −d + b), (−c, −d − b), (−c − a, −d), and (−c + a, −d) marked.]
                 3.2
    5x − 2 ) 16x + 3
             16x − 6.4
                   9.4
$$\frac{16x + 3}{5x - 2} = 3.2 + \frac{9.4}{5x - 2}.$$
(b) Given $\frac{4x^2 - 3x + 1}{x + 5}$, we have
                4x − 23
    x + 5 ) 4x^2 −  3x +   1
            4x^2 + 20x
                 − 23x +   1
                 − 23x − 115
                         116
$$\frac{4x^2 - 3x + 1}{x + 5} = 4x - 23 + \frac{116}{x + 5}.$$
(c) Given $\frac{x^2 + x + 3}{-x^2 - 2x + 1}$, we have
                         −1
    −x^2 − 2x + 1 ) x^2 +  x + 3
                    x^2 + 2x − 1
                         − x + 4
$$\frac{x^2 + x + 3}{-x^2 - 2x + 1} = -1 + \frac{-x + 4}{-x^2 - 2x + 1}.$$
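The long division layout above follows a fixed loop: divide the leading terms, subtract the resulting multiple of the divisor, and repeat. The sketch below illustrates that loop with exact fractions; the function name `poly_divide` and the highest-power-first coefficient lists are assumptions made for this example.

```python
from fractions import Fraction

def poly_divide(numerator, denominator):
    """Divide polynomials given as coefficient lists, highest power first.
    Returns (quotient, remainder) as lists of Fractions."""
    remainder = [Fraction(c) for c in numerator]
    divisor = [Fraction(c) for c in denominator]
    quotient = []
    while len(remainder) >= len(divisor):
        factor = remainder[0] / divisor[0]
        quotient.append(factor)
        # Subtract factor * divisor from the leading terms of the remainder.
        for i, d in enumerate(divisor):
            remainder[i] -= factor * d
        remainder.pop(0)  # the leading term is now zero
    return quotient, remainder

# Part (c): (x^2 + x + 3) / (-x^2 - 2x + 1)
quotient, remainder = poly_divide([1, 1, 3], [-1, -2, 1])
print(quotient, remainder)
```

For part (c), the quotient list is the constant −1 and the remainder list holds the coefficients of −x + 4.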
1. Intercepts. The graph intersects the vertical axis at the point (0, 1) and the horizontal
axis at the point (−2/3, 0).
2. There are no turning points.
3. Asymptotes. As x → −2, y → ±∞. And so x = −2 is a vertical asymptote. As x → ±∞,
y → 3. And so y = 3 is a horizontal asymptote. The two asymptotes are perpendicular
and so this is a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (−2, 3).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes.
So they must have slope 1 and −1. Moreover, both pass through the centre (−2, 3).
Altogether, we can work out that the lines of symmetry are y = −x + 1 and y = x + 5.
1. Intercepts. The graph intersects the vertical axis at the point (0, −2) and the horizontal
axis at the point (2, 0).
2. There are no turning points.
3. Asymptotes. As x → 0.5, y → ±∞. And so x = 0.5 is a vertical asymptote. As
x → ±∞, y → −0.5. And so y = −0.5 is a horizontal asymptote. The two asymptotes are
perpendicular and so this is a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (0.5, −0.5).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes.
So they must have slope 1 and −1. Moreover, both pass through the centre (0.5, −0.5).
Altogether, we can work out that the lines of symmetry are y = −x and y = x − 1.
1. Intercepts. The graph intersects the vertical axis at the point (0, 1/3) and the horizontal
axis at the point (1/3, 0).
2. There are no turning points.
3. Asymptotes. As x → −1.5, y → ±∞. And so x = −1.5 is a vertical asymptote. As
x → ±∞, y → −1.5. And so y = −1.5 is a horizontal asymptote. The two asymptotes are
perpendicular and so this is a rectangular hyperbola.
4. The centre (point at which the two asymptotes intersect) is (−1.5, −1.5).
5. We know that the two lines of symmetry bisect the angles formed by the asymptotes.
So they must have slope 1 and −1. Moreover, both pass through the centre (−1.5, −1.5).
Altogether, we can work out that the lines of symmetry are y = x and y = −x − 3.
[Figure: graph of y = (x^2 + 2x + 1) / (x − 4), showing the vertical asymptote x = 4, the oblique asymptote y = x + 6, the centre (4, 10), the maximum and minimum turning points, and the two lines of symmetry y = (1 − √2)x + 6 + 4√2 and y = (1 + √2)x + 6 − 4√2.]
Let’s summarise the graph’s characteristics. This is a hyperbola and so there are two distinct branches.
1. Intercepts. The graph intersects the vertical axis at the point (0, −0.25) and the horizontal axis at the point (−1, 0).
2. There are two turning points: (−1, 0) is a maximum turning point and (9, 20) is a minimum turning point.
These were found by computing $\frac{dy}{dx} = 1 - \frac{25}{(x - 4)^2}$. Setting this equal to zero, we see that there are two stationary points: x = −1, 9.
[Figure: graph of y = (−x^2 + x − 1) / (x + 1), showing the vertical asymptote x = −1, the oblique asymptote y = −x + 2, the centre (−1, 3), the two lines of symmetry, and the minimum and maximum turning points.]
1. Intercepts. The graph intersects the vertical axis at the point (0, −1), but not the horizontal axis because $-x^2 + x - 1$ has no real zeros.
2. There are two turning points: $(-1 - \sqrt{3}, 6.464)$ is a minimum turning point and $(-1 + \sqrt{3}, -0.464)$ is a maximum turning point.
These were found by computing $\frac{dy}{dx} = -1 + \frac{3}{(x + 1)^2}$. Setting this equal to zero, we see that there are two stationary points: $x = -1 \pm \sqrt{3}$.
By observation, y can take on any value except those between these two turning points.
The range of y is thus (−∞, −0.464] ∪ [6.464, ∞).
[Figure: graph of y = (2x^2 − 2x − 1) / (x + 4), showing the line of symmetry y = (2 + √5)x − 10 + 4√5 and the maximum turning point.]
1. Intercepts. The graph intersects the vertical axis at the point (0, −0.25) and the horizontal axis at the points $(0.5(1 - \sqrt{3}), 0)$ and $(0.5(1 + \sqrt{3}), 0)$, because $0.5(1 \pm \sqrt{3})$ are the zeros of $2x^2 - 2x - 1$.
2. There are two turning points: $(-4 - \sqrt{39/2}, -35.664)$ is a maximum turning point and $(-4 + \sqrt{39/2}, -0.336)$ is a minimum turning point.
These were found by computing $\frac{dy}{dx} = 2 - \frac{39}{(x + 4)^2}$. Setting this equal to zero, we see that there are two stationary points: $x = -4 \pm \sqrt{39/2}$.
By observation, y can take on any value except those between these two turning points.
The range of y is thus (−∞, −35.664] ∪ [−0.336, ∞).
Answer to Exercise 79. (a) Note that sin 0 = 0 and cos 0 = 1. And so at time t = 0, the
particle P is at position (1, 0) and in contrast, the particle Q is at position (0, 1). (b) The
particle P travels anti-clockwise and in contrast, the particle Q travels clockwise.
Answer to Exercise 80. $x = a\cos t \implies \frac{dx}{dt} = -a\sin t$, $\frac{d^2x}{dt^2} = -a\cos t$. $y = b\sin t \implies \frac{dy}{dt} = b\cos t$, $\frac{d^2y}{dt^2} = -b\sin t$.
(a) At time $t = \frac{\pi}{4}$, the particle’s position is $\left(a\cos\frac{\pi}{4}, b\sin\frac{\pi}{4}\right) = \left(a\frac{\sqrt{2}}{2}, b\frac{\sqrt{2}}{2}\right)$.
In the x-direction, its velocity is $-a\sin\frac{\pi}{4} = -a\frac{\sqrt{2}}{2}$ and its acceleration is $-a\cos\frac{\pi}{4} = -a\frac{\sqrt{2}}{2}$. This means that it is moving leftwards at a velocity of $a\frac{\sqrt{2}}{2}$ m s$^{-1}$. Moreover, its acceleration leftwards is $a\frac{\sqrt{2}}{2}$ m s$^{-2}$.
In the y-direction, its velocity is $b\cos\frac{\pi}{4} = b\frac{\sqrt{2}}{2}$ and its acceleration is $-b\sin\frac{\pi}{4} = -b\frac{\sqrt{2}}{2}$. This means that it is moving upwards at a velocity of $b\frac{\sqrt{2}}{2}$ m s$^{-1}$. Moreover, it is decelerating at a rate of $b\frac{\sqrt{2}}{2}$ m s$^{-2}$.
(b) At time $t = \frac{\pi}{2}$, the particle’s position is $\left(a\cos\frac{\pi}{2}, b\sin\frac{\pi}{2}\right) = (0, b)$.
In the x-direction, its velocity is $-a\sin\frac{\pi}{2} = -a$ and its acceleration is $-a\cos\frac{\pi}{2} = 0$. This means that it is moving leftwards at a velocity of a m s$^{-1}$. Moreover, its acceleration is 0 m s$^{-2}$.
In the y-direction, its velocity is $b\cos\frac{\pi}{2} = 0$ and its acceleration is $-b\sin\frac{\pi}{2} = -b$. This means that it is moving upwards at a velocity of 0 m s$^{-1}$. (Or equivalently, it is moving downwards at a velocity of 0 m s$^{-1}$.) Moreover, it is accelerating in the downwards direction at a rate of b m s$^{-2}$.
(c) At time t = 2π, the particle’s position is (a cos(2π), b sin(2π)) = (a, 0).
In the x-direction, its velocity is −a sin(2π) = 0 and its acceleration is −a cos(2π) = −a. This
means that it is moving leftwards at a velocity of 0 ms−1 . Moreover, it is accelerating in
the leftwards direction at a rate of a ms−2 .
In the y-direction, its velocity is b cos(2π) = b and its acceleration is −b sin(2π) = 0. This
means that it is moving upwards at a velocity of b m s$^{-1}$. Moreover, it is not accelerating
in the y-direction.
(b) $\frac{dx}{dt} = \sec^2 t = y^2$ is always positive and so the particle is always moving rightwards.
(c) We know that at t = 0, we have x = tan t = 0 and y = sec t = 1, and so the particle must be at position B.
At t = 1, $t \in \left[0, \frac{\pi}{2}\right)$, we have x = tan t ≥ 0 and y = sec t > 0, and so the particle is in the top-right quadrant. The particle must thus be at position C.
At both t = 2 and 3, $t \in \left(\frac{\pi}{2}, \pi\right]$, we have x = tan t ≤ 0 and y = sec t < 0, and so the particle is in the bottom-left quadrant. We know moreover that tan 3 > tan 2, so that the particle is further to the right at time t = 3 than at time t = 2. So at time t = 2, the particle is at position D; and at time t = 3, the particle is at position E.
At t = 4, $t \in \left(\pi, \frac{3\pi}{2}\right]$, we have x = tan t ≥ 0 and y = sec t < 0, and so the particle is in the bottom-right quadrant. The particle must thus be at position F.
Finally, at t = 5, the particle must be at position A.
[Figure: graph of y = 3 − 0.75(x + 1)^2, showing the particle at t = 0 (x = −1, y = 3), its velocity components v_x = 2 cos t and v_y = −6 sin t cos t = 0 m s^−1, and its instantaneous direction of travel.]
[Figure: graph of y = ln(2x + 3), showing the particle at t = 0 (x = −1, y = 0), its velocity components v_x = 1 m s^−1 and v_y = 2 m s^−1, and its instantaneous direction of travel.]
Answer to Exercise 83(a) $\frac{2x + 1}{3x + 2} > 0$ ⇐⇒ one of the following is true:
1. “2x + 1 > 0 AND 3x + 2 > 0” ⇐⇒ “x > −1/2 AND x > −2/3” ⇐⇒ “x > −1/2”; OR
2. “2x + 1 < 0 AND 3x + 2 < 0” ⇐⇒ “x < −1/2 AND x < −2/3” ⇐⇒ “x < −2/3”.
Altogether then, $\frac{2x + 1}{3x + 2} > 0$ ⇐⇒ “x < −2/3 OR x > −1/2”.
Or in set notation, $\frac{2x + 1}{3x + 2} > 0$ ⇐⇒ “x ∈ (−∞, −2/3) ∪ (−1/2, ∞)”.
3x + 2
x−1
(b) > 0 ⇐⇒ one of the following is true:
−4
x−1
Altogether then, > 0 ⇐⇒ x < 1.
−4
−1
(c) > 0 ⇐⇒ one of the following is true:
−4
−1
Altogether then, > 0 is always true and, in particular, true for any value of x.
−4
1
(d) > 0 ⇐⇒ one of the following is true:
−4
1
Altogether then, > 0 is always false and, in particular, false for every value of x.
−4
1. “−3x − 18 > 0 AND 9x + 14 > 0” ⇐⇒ “x < −6 and x > −14/9”, but these two inequalities
are mutually contradictory, so together they are impossible; OR
2. “−3x − 18 < 0 AND 9x + 14 < 0” ⇐⇒ “x > −6 and x < −14/9” ⇐⇒ “x ∈ (−6, −14/9)”.
Altogether then, $\frac{-3x - 18}{9x + 14} > 0$ ⇐⇒ “x ∈ (−6, −14/9)”.
(f) $\frac{2x + 3}{-x + 7} < 9$ ⇐⇒
$$9 - \frac{2x + 3}{-x + 7} > 0 \iff 9 + \frac{2x + 3}{x - 7} > 0 \iff \frac{9x - 63 + 2x + 3}{x - 7} > 0 \iff \frac{11x - 60}{x - 7} > 0.$$
1. “11x − 60 > 0 AND x − 7 > 0” ⇐⇒ “x > 60/11 and x > 7” ⇐⇒ “x > 7”; OR
2. “11x − 60 < 0 AND x − 7 < 0” ⇐⇒ “x < 60/11 and x < 7” ⇐⇒ “x < 60/11”.
Altogether then, $\frac{2x + 3}{-x + 7} < 9$ ⇐⇒ “x > 7 OR x < 60/11”.
Or in set notation, $\frac{2x + 3}{-x + 7} < 9$ ⇐⇒ “x ∈ (−∞, 60/11) ∪ (7, ∞)”.
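A solution set found by case analysis can be sanity-checked by testing sample points. The following sketch does this for part (f); it is a spot check rather than a proof, and the function names and the choice of sample points are assumptions made for this example.

```python
from fractions import Fraction

def satisfies_inequality(x):
    """Does x satisfy (2x + 3)/(-x + 7) < 9? (x = 7 is excluded from the domain.)"""
    return (2 * x + 3) / (-x + 7) < 9

def in_claimed_solution_set(x):
    """The answer derived above: x < 60/11 or x > 7."""
    return x < Fraction(60, 11) or x > 7

# Test points spaced 1/4 apart from -20 to 20, skipping x = 7.
samples = [Fraction(n, 4) for n in range(-80, 81) if Fraction(n, 4) != 7]
mismatches = [x for x in samples if satisfies_inequality(x) != in_claimed_solution_set(x)]
print(mismatches)
```

An empty list means that the inequality and the claimed solution set agree at every sample point.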
The denominator $x^2 - 4$ is a ∪-shaped quadratic with two real zeros given by −2, 2. Hence, $x^2 - 4 > 0$ ⇐⇒ “x < −2 or x > 2”. Also, $x^2 - 4 < 0$ ⇐⇒ “x ∈ (−2, 2)”.
So $\frac{x^2 - 1}{x^2 - 4} > 0$ ⇐⇒
Altogether then, $\frac{x^2 - 1}{x^2 - 4} > 0$ ⇐⇒ “x < −2 or x > 2” OR “x ∈ (−1, 1)” ⇐⇒ x ∈ (−∞, −2) ∪ (−1, 1) ∪ (2, ∞).
The denominator $-x^2 + 9x - 14$ has a ∩-shaped graph and has two real zeros given by 2, 7. Hence, $-x^2 + 9x - 14 > 0$ ⇐⇒ “x ∈ (2, 7)”. Also, $-x^2 + 9x - 14 < 0$ ⇐⇒ “x < 2 OR x > 7”.
So $\frac{x^2 - 3x - 18}{-x^2 + 9x - 14} > 0$ ⇐⇒
Altogether then, $\frac{x^2 - 3x - 18}{-x^2 + 9x - 14} > 0$ ⇐⇒ x ∈ (−3, 2) ∪ (6, 7).
$$\frac{x + 1}{-x + 4} > \frac{-x + 2}{2x - 1} \implies \frac{x + 1}{-x + 4} - \frac{-x + 2}{2x - 1} > 0 \implies \frac{(x + 1)(2x - 1) + (x - 2)(-x + 4)}{(-x + 4)(2x - 1)} > 0.$$
Hence, $\frac{x + 1}{-x + 4} > \frac{-x + 2}{2x - 1} \iff \frac{x^2 + 7x - 9}{(-x + 4)(2x - 1)} > 0$. When is this latter inequality true?
$$a = \frac{-7 - \sqrt{85}}{2} \quad \text{and} \quad b = \frac{-7 + \sqrt{85}}{2}.$$
Hence, the numerator is negative for x ∈ (a, b) and positive for x ∈ (−∞, a) ∪ (b, ∞).
The denominator (−x + 4)(2x − 1) is a ∩-shaped quadratic with zeros 0.5 and 4.
Hence, the denominator is positive for x ∈ (0.5, 4) and negative for x ∈ (−∞, 0.5) ∪ (4, ∞).
So, $\frac{x^2 + 7x - 9}{(-x + 4)(2x - 1)} > 0$ ⇐⇒
1. “x ∈ (a, b)” AND “x ∈ (−∞, 0.5) ∪ (4, ∞)” ⇐⇒ $x \in (0.5(-7 - \sqrt{85}), 0.5)$; OR
2. “x ∈ (−∞, a) ∪ (b, ∞)” AND “x ∈ (0.5, 4)” ⇐⇒ $x \in (0.5(-7 + \sqrt{85}), 4)$.
Altogether then,
$$\frac{x^2 + 7x - 9}{(-x + 4)(2x - 1)} > 0 \iff x \in \left(0.5(-7 - \sqrt{85}), 0.5\right) \cup \left(0.5(-7 + \sqrt{85}), 4\right).$$
(b) Rewrite the inequality as $\sqrt{x} - \cos x > 0$. Graph $y = \sqrt{x} - \cos x$ on your TI84.
So $\sqrt{x} - \cos x = 0 \iff x \approx 0.6$. Thus, $\sqrt{x} > \cos x \iff x > 0.6$.
Examining the graph from right to left, we observe that $\frac{1}{1 - x^2} - x^3 - \sin x$ is negative for x > 1 and positive for x ∈ (−1, 1). For x < −1, the expression is positive to the left of what appears to be the only horizontal intercept.
We find $\frac{1}{1 - x^2} - x^3 - \sin x = 0 \iff x \approx -1.2$. Thus,
$$\frac{1}{1 - x^2} - x^3 - \sin x > 0 \iff x \in (-\infty, -1.2) \cup (-1, 1).$$
Beng is 32 years old today. And from $\overset{3}{=}$, Apu is 64 years old today.
Answer to Exercise 87. At 3pm, Plane A is 300 km northeast of the starting point and Plane B is 600 km south of it. The angle formed by their flight paths is 3π/4. The distance between the two planes is the third side of the triangle, two of whose sides are 300 km and 600 km, and whose angle between those two sides is 3π/4.
By the Law of Cosines (Proposition 259) from O-Level, the third side of a triangle is given by: $c^2 = a^2 + b^2 - 2ab\cos C = 90000 + 360000 - 2(300)(600) \times \left(-\frac{\sqrt{2}}{2}\right) \approx 704558$. Hence, c ≈ 839.
Thus, at 3pm, the two planes are 839 km apart. From 3pm, the distance between the two planes is shrinking by 300 km/h. Hence, it will be another 839/300 hours, or about 2h 48m before they collide. Hence, they will collide at around 5:48pm.
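$$a(1)^2 + b(1) + c \overset{1}{=} 2,$$
$$a(3)^2 + b(3) + c \overset{2}{=} 5,$$
$$a(6)^2 + b(6) + c \overset{3}{=} 9.$$
You can solve this system of equations either by calculator or by hand, as I do now:
Take $\overset{2}{=}$ minus $\overset{1}{=}$ to get 8a + 2b = 3 or $b \overset{4}{=} 0.5(3 - 8a) = 1.5 - 4a$.
Plug $\overset{4}{=}$ into $\overset{1}{=}$ to get a + 1.5 − 4a + c = 2 or $c \overset{5}{=} 0.5 + 3a$.
Plug $\overset{4}{=}$ and $\overset{5}{=}$ into $\overset{3}{=}$ to get 36a + 6(1.5 − 4a) + (0.5 + 3a) = 9, or 15a = −0.5, so that a = −1/30.
Now from $\overset{4}{=}$, b = 49/30 and from $\overset{5}{=}$, c = 0.4.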
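The hand calculation above can be cross-checked with exact rational arithmetic. The sketch below fits y = ax² + bx + c through three points by elimination; the function name `quadratic_through` is hypothetical, and the points (1, 2), (3, 5), and (6, 9) are read off from the three equations above.

```python
from fractions import Fraction

def quadratic_through(points):
    """Return (a, b, c) with y = a x^2 + b x + c passing through three points
    that have distinct x-coordinates, solved by elimination."""
    (x1, y1), (x2, y2), (x3, y3) = [(Fraction(x), Fraction(y)) for x, y in points]
    # Subtracting equation 1 from equations 2 and 3 eliminates c.
    p1, q1, r1 = x2**2 - x1**2, x2 - x1, y2 - y1
    p2, q2, r2 = x3**2 - x1**2, x3 - x1, y3 - y1
    # Eliminate b from the two remaining equations to solve for a.
    a = (r1 * q2 - r2 * q1) / (p1 * q2 - p2 * q1)
    b = (r1 - p1 * a) / q1
    c = y1 - a * x1**2 - b * x1
    return a, b, c

print(quadratic_through([(1, 2), (3, 5), (6, 9)]))
```

The printed fractions can be compared with the values of b and c stated above (note that 0.4 = 2/5).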
Answer to Exercise 89. The turning point (which is a minimum turning point if a is positive) of the equation is at $x = -\frac{b}{2a}$ and
$$y = a\left(-\frac{b}{2a}\right)^2 + b\left(-\frac{b}{2a}\right) + c = \frac{b^2}{4a} - \frac{b^2}{2a} + c = c - \frac{b^2}{4a}.$$
$$a(-1)^2 + b(1) + c \overset{1}{=} 2,$$
$$a = 2.$$
Our goal is to find the horizontal intercepts of each of these equations. These horizontal
intercepts will give us the solutions to the above system of equations.
1. Graph the equation $y = \sin x - \sqrt{1 - x^2}$.
The horizontal intercept is 0.7391. Now repeat the above, but for the second equation:
3. Graph the equation $y = \sin x + \sqrt{1 - x^2}$.
1. Graph the equation $y = x^5 - x^3 + 2 - \frac{1}{1 + \sqrt{x}}$.
It looks like there are no horizontal intercepts. Conclusion: This system of equations has
no solutions.
After Step 1.
Rewrite the two equations into a new equation $y = \frac{1}{1 - x^2} - x^3 - \sin x$.
Our goal is to find the horizontal intercepts of this equation. These horizontal intercepts will give us the solutions to the above system of equations.
1. Graph the equation $y = \frac{1}{1 - x^2} - x^3 - \sin x$.
It is −1.1790. Conclusion: This system of equations has one solution and its x-coordinate is −1.1790. To find the corresponding y-coordinate, we need merely plug in this value of x into either of the equations in the original system of equations: $y = \frac{1}{1 - x^2} = \frac{1}{1 - (-1.1790)^2} \approx -2.5633$. Altogether, this system of equations has one solution: (−1.1790, −2.5633).
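Where a graphing calculator is not to hand, the same horizontal intercept can be approximated by bisection. This is a minimal sketch under stated assumptions: the bracket [−1.5, −1.05] was chosen by eye to the left of x = −1, and the function names are hypothetical.

```python
import math

def f(x):
    """Left-hand side of 1/(1 - x^2) - x^3 - sin x = 0."""
    return 1 / (1 - x**2) - x**3 - math.sin(x)

def bisect(func, low, high, tolerance=1e-10):
    """Find a zero of func in [low, high], assuming func changes sign there."""
    f_low = func(low)
    while high - low > tolerance:
        mid = (low + high) / 2
        f_mid = func(mid)
        if (f_mid > 0) == (f_low > 0):
            low, f_low = mid, f_mid   # the sign change lies in the upper half
        else:
            high = mid                # the sign change lies in the lower half
    return (low + high) / 2

x = bisect(f, -1.5, -1.05)   # bracket chosen by eye, to the left of x = -1
y = 1 / (1 - x**2)
print(round(x, 4), round(y, 4))
```

The rounded values should agree with the calculator result above to about three decimal places; small differences in the last digit of y come from rounding x before substituting.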
Answer to Exercise 91. (a) A corresponding function for the finite sequence (1, 4, 9, 16, 25, 36, 49, 64, 81, 100) is a function f with
(b) A corresponding function for the finite sequence (2, 5, 8, 11, 14, 17, 20) is a function f
with
(c) A corresponding function for the finite sequence (0.5, 4, 13.5, 32, 62.5, 108, 171.5) is a
function f with
(d) A corresponding function for the finite sequence (2, 6, 6, 12, 10, 18, 14, 24, 18, 30, 22, 36, 26, 42) is a function f with
(e) There is no obvious pattern here. So a corresponding function for the finite sequence
(18, 14.5) is a (trivial) function f with
(b) A corresponding function for the finite sequence (1, 2, 10, 290, 252010) is the function
f with
Answer to Exercise 93. (a) A corresponding function for the infinite sequence (1, 4, 9, 16,
25, 36, 49, 64, 81, 100, . . . ) is a function f with
• Domain Z+ ;
• Codomain R; and
• Mapping rule $f(n) = n^2$ for all n.
(b) A corresponding function for the infinite sequence (2, 5, 8, 11, 14, 17, 20, . . . ) is a function
f with
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (n) = 3n − 1 for all n.
(c) A corresponding function for the infinite sequence (0.5, 4, 13.5, 32, 62.5, 108, 171.5, . . . )
is a function f with
• Domain Z+ ;
• Codomain R; and
• Mapping rule $f(n) = \frac{n^3}{2}$ for all n.
(d) A corresponding function for the infinite sequence (2, 6, 6, 12, 10, 18, 14, 24, 18, 30, 22, 36,
26, 42, . . . ) is a function f with
• Domain Z+ ;
• Codomain R; and
• Mapping rule f (n) = 2n for all odd n and f (n) = 3n for all even n.
(b) $2 + 5 + 8 + 11 + 14 + 17 + 20 + 23 = \sum_{n=1}^{8} (3n - 1)$.
(c) $0.5 + 4 + 13.5 + 32 + 62.5 + 108 + 171.5 = \sum_{n=1}^{7} \frac{n^3}{2}$.
$$\sum_{n=-2}^{5} (2 - n)^n = [2 - (-2)]^{-2} + [2 - (-1)]^{-1} + [2 - 0]^0 + [2 - 1]^1 + [2 - 2]^2 + [2 - 3]^3 + [2 - 4]^4 + [2 - 5]^5$$
$$= \frac{1}{16} + \frac{1}{3} + 1 + 1 + 0 - 1 + 16 - 243 = -\frac{10829}{48}.$$
(b) $\sum_{n=16}^{17} (4n + 5) = (4 \times 16 + 5) + (4 \times 17 + 5) = 142$.
(c) Remember: we can choose any name (letter) we like for the index or dummy variable. We’ve usually been using n. But here I chose to use the letter x instead. This makes no difference.
$$\sum_{x=31}^{33} (x - 3) = (31 - 3) + (32 - 3) + (33 - 3) = 28 + 29 + 30 = 87.$$
x=31
Answer to Exercise 96. (a) The common difference in the arithmetic series $2 + 7 + 12 + 17 + 22 + 27 + 32 + \dots + 997 = \sum_{n=0}^{199} (2 + 5n)$ is 5. There are in total 200 terms. By Fact 13, its sum of series is $(2 + 997) \times \frac{200}{2} = 99900$.
(b) The common difference in the arithmetic series $3 + 20 + 37 + 54 + 71 + \dots + 1703 = \sum_{n=0}^{100} (3 + 17n)$ is 17. There are in total 101 terms. By Fact 13, its sum of series is $(3 + 1703) \times \frac{101}{2} = 86153$.
(c) The common difference in the arithmetic series $81 + 89 + 97 + 105 + 113 + \dots + 8081 = \sum_{n=0}^{1000} (81 + 8n)$ is 8. There are in total 1001 terms. By Fact 13, its sum of series is $(81 + 8081) \times \frac{1001}{2} = 4085081$.
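The formula used in all three parts, (first term + last term) × (number of terms)/2, can be compared against adding the terms one by one. The sketch below does so; the function name `arithmetic_sum` is hypothetical.

```python
def arithmetic_sum(first, last, difference):
    """Sum first + (first + d) + ... + last as (first + last) * (number of terms) / 2."""
    number_of_terms = (last - first) // difference + 1
    return (first + last) * number_of_terms // 2

for first, last, d in [(2, 997, 5), (3, 1703, 17), (81, 8081, 8)]:
    brute_force = sum(range(first, last + 1, d))  # add the terms one by one
    print(arithmetic_sum(first, last, d), brute_force)
```

Each printed pair should show the same number twice, matching the sums found in parts (a), (b), and (c).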
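Answer to Exercise 97. (a) The geometric series $7 + 14 + 28 + 56 + \dots + 448 = \sum_{n=0}^{6} (7 \times 2^n)$ has common ratio 2. There are in total 7 terms. Thus, the sum of the geometric series is $7 \times \frac{1 - 2^7}{1 - 2} = 7 \times \frac{-127}{-1} = 889$.
(b) The geometric series $20 + 10 + 5 + \dots + \frac{5}{8} = \sum_{n=0}^{5} \left[20 \times \left(\frac{1}{2}\right)^n\right]$ has common ratio $\frac{1}{2}$. There are in total 6 terms. Thus, the sum of the geometric series is $20 \times \left[1 - \left(\frac{1}{2}\right)^6\right] / \left(1 - \frac{1}{2}\right) = 20 \times \frac{63}{64} / \frac{1}{2} = 40 \times \frac{63}{64} = \frac{315}{8}$.
(c) The geometric series $1 + \frac{1}{3} + \frac{1}{9} + \dots + \frac{1}{243} = \sum_{n=0}^{5} \left(\frac{1}{3}\right)^n$ has common ratio $\frac{1}{3}$. There are in total 6 terms. Thus, the sum of the geometric series is $1 \times \left[1 - \left(\frac{1}{3}\right)^6\right] / \left(1 - \frac{1}{3}\right) = \frac{728}{729} / \frac{2}{3} = \frac{3}{2} \times \frac{728}{729} = \frac{364}{243}$.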
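The finite geometric sum formula, first term × (1 − rⁿ)/(1 − r), can be evaluated exactly with rational arithmetic. The sketch below applies it to parts (b) and (c); the function name `geometric_sum` is hypothetical.

```python
from fractions import Fraction

def geometric_sum(first, ratio, number_of_terms):
    """Sum first + first*r + ... (number_of_terms terms) via first * (1 - r^n) / (1 - r)."""
    return first * (1 - ratio**number_of_terms) / (1 - ratio)

half, third = Fraction(1, 2), Fraction(1, 3)
print(geometric_sum(20, half, 6))   # part (b): 20 + 10 + ... + 5/8
print(geometric_sum(1, third, 6))   # part (c): 1 + 1/3 + ... + 1/243
```

The number of terms is one more than the upper index of summation, because the index starts at n = 0.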
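Answer to Exercise 98. (a) The geometric series $6 + \frac{9}{2} + \frac{27}{8} + \dots = \sum_{n=0}^{\infty} \left[6 \times \left(\frac{3}{4}\right)^n\right]$ has common ratio 3/4. Thus, its sum is $\frac{6}{1 - 3/4} = 24$.
(b) The geometric series $20 + 10 + 5 + \dots = \sum_{n=0}^{\infty} \left[20 \times \left(\frac{1}{2}\right)^n\right]$ has common ratio $\frac{1}{2}$. Thus, its sum is $\frac{20}{1 - 1/2} = 40$.
(c) The geometric series $1 + \frac{1}{3} + \frac{1}{9} + \dots = \sum_{n=0}^{\infty} \left(\frac{1}{3}\right)^n$ has common ratio $\frac{1}{3}$. Thus, its sum is $1 / \left(1 - \frac{1}{3}\right) = \frac{3}{2}$.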
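Answer to Exercise 99. Step #1. Let P(k) stand for the proposition that
$$\sum_{r=1}^{k} r^3 = \left[\frac{k(k + 1)}{2}\right]^2.$$
Assume that P(j) is true. That is, $\sum_{r=1}^{j} r^3 = \left[\frac{j(j + 1)}{2}\right]^2$.
Our goal is to show that P(j + 1) is true. That is, $\sum_{r=1}^{j+1} r^3 = \left[\frac{(j + 1)[(j + 1) + 1]}{2}\right]^2$.
$$\sum_{r=1}^{j+1} r^3 = \sum_{r=1}^{j} r^3 + (j + 1)^3 = \left[\frac{j(j + 1)}{2}\right]^2 + (j + 1)^3 = \frac{(j + 1)^2\left[j^2 + 4(j + 1)\right]}{4}$$
$$= \left[\frac{(j + 1)(j + 2)}{2}\right]^2 = \left[\frac{(j + 1)[(j + 1) + 1]}{2}\right]^2, \text{ as desired.}$$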
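$$\sum_{r=1}^{k} r a^r = a\,\frac{1 - (k + 1)a^k + k a^{k+1}}{(1 - a)^2}.$$
$$\sum_{r=1}^{1} r a^r = a = a\,\frac{(1 - a)^2}{(1 - a)^2} = a\,\frac{1 - 2a + a^2}{(1 - a)^2} = a\,\frac{1 - (1 + 1)a^1 + 1 \times a^{1+1}}{(1 - a)^2} = a\,\frac{1 - (k + 1)a^k + k a^{k+1}}{(1 - a)^2} \text{ with } k = 1.$$ ✓
$$\sum_{r=1}^{j} r a^r = a\,\frac{1 - (j + 1)a^j + j a^{j+1}}{(1 - a)^2}.$$
$$\sum_{r=1}^{j+1} r a^r = a\,\frac{1 - (j + 2)a^{j+1} + (j + 1)a^{j+2}}{(1 - a)^2}.$$
$$\sum_{r=1}^{j+1} r a^r = \sum_{r=1}^{j} r a^r + (j + 1)a^{j+1} = a\,\frac{1 - (j + 1)a^j + j a^{j+1}}{(1 - a)^2} + (j + 1)a^{j+1}$$
$$= a\,\frac{1 - (j + 1)a^j + j a^{j+1} + (j + 1)a^j(1 - 2a + a^2)}{(1 - a)^2} = a\,\frac{1 - (j + 2)a^{j+1} + (j + 1)a^{j+2}}{(1 - a)^2},$$
as desired.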
$$\sum_{r=1}^{k} r^4 = \frac{k(k + 1)(2k + 1)(3k^2 + 3k - 1)}{30}.$$
$$\sum_{r=1}^{1} r^4 = 1^4 = \frac{1(1 + 1)(2 \times 1 + 1)(3 \times 1^2 + 3 \times 1 - 1)}{30}.$$ ✓
$$\sum_{r=1}^{j} r^4 = \frac{j(j + 1)(2j + 1)(3j^2 + 3j - 1)}{30}.$$
$$\sum_{r=1}^{j+1} r^4 = \frac{(j + 1)[(j + 1) + 1][2(j + 1) + 1][3(j + 1)^2 + 3(j + 1) - 1]}{30}.$$
$$\sum_{r=1}^{j+1} r^4 = \sum_{r=1}^{j} r^4 + (j + 1)^4 = \frac{j(j + 1)(2j + 1)(3j^2 + 3j - 1)}{30} + (j + 1)^4$$
$$= \frac{j + 1}{30}\left[j(2j + 1)(3j^2 + 3j - 1) + 30(j + 1)^3\right]$$
$$= \frac{j + 1}{30}\left[(2j^2 + j)(3j^2 + 3j - 1) + 30j^3 + 90j^2 + 90j + 30\right]$$
$$= \frac{j + 1}{30}\left(6j^4 + 6j^3 - 2j^2 + 3j^3 + 3j^2 - j + 30j^3 + 90j^2 + 90j + 30\right)$$
$$= \frac{j + 1}{30}\left(6j^4 + 39j^3 + 91j^2 + 89j + 30\right)$$
$$= \frac{j + 1}{30}\left(6j^4 + 18j^3 + 10j^2 + 21j^3 + 63j^2 + 35j + 18j^2 + 54j + 30\right)$$
$$= \frac{(j + 1)(2j^2 + 7j + 6)(3j^2 + 9j + 5)}{30} = \frac{(j + 1)(j + 2)(2j + 3)(3j^2 + 9j + 5)}{30}$$
$$= \frac{(j + 1)(j + 2)(2j + 3)(3j^2 + 6j + 3 + 3j + 3 - 1)}{30}$$
$$= \frac{(j + 1)[(j + 1) + 1][2(j + 1) + 1][3(j + 1)^2 + 3(j + 1) - 1]}{30}, \text{ as desired.}$$
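A numerical spot check is no substitute for the induction proof, but it is a quick way to catch algebra slips in a closed form. The sketch below compares the formula for the sum of fourth powers with a direct sum; the function name `sum_of_fourth_powers` and the range of k tested are arbitrary choices made here.

```python
def sum_of_fourth_powers(k):
    """Closed form k(k+1)(2k+1)(3k^2+3k-1)/30 from the proposition P(k)."""
    return k * (k + 1) * (2 * k + 1) * (3 * k**2 + 3 * k - 1) // 30

# Compare the closed form with a direct sum for k = 1, ..., 200.
all_match = all(
    sum_of_fourth_powers(k) == sum(r**4 for r in range(1, k + 1))
    for k in range(1, 201)
)
print(all_match)
```

Integer division is safe here because the closed form equals a sum of integers, so the numerator is always a multiple of 30.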
Answer to Exercise 103. (a) If the vector (4, −3) has tail (0, 0), then its head is
(0, 0) + (4, −3) = (4, −3). (b) If it has head (0, 0), then its tail is (0, 0) − (4, −3) = (−4, 3).
(c) If it has tail (5, 2), then its head is (5, 2) + (4, −3) = (9, −1). (d) If it has head (5, 2),
then its tail is (5, 2) − (4, −3) = (1, 5).
Answer to Exercise 106. These are all lengths (or magnitudes). Specifically, $|\overrightarrow{ac} - \overrightarrow{cb}| = |(-4, 1)| = \sqrt{(-4)^2 + 1^2} = \sqrt{17}$, $|\overrightarrow{dc} - \overrightarrow{ca}| = |(-4, 2)| = \sqrt{(-4)^2 + 2^2} = \sqrt{20}$, $|\overrightarrow{bd} - \overrightarrow{da}| = |(4, -5)| = \sqrt{4^2 + (-5)^2} = \sqrt{41}$, $|\overrightarrow{ad} + \overrightarrow{cd}| = |(8, -7)| = \sqrt{8^2 + (-7)^2} = \sqrt{113}$, $|\overrightarrow{dc} + \overrightarrow{bd}| = |(-4, 2)| = \sqrt{(-4)^2 + 2^2} = \sqrt{20}$, and $|\overrightarrow{bd} - \overrightarrow{db}| = |(0, -2)| = \sqrt{0^2 + (-2)^2} = \sqrt{4} = 2$.
The distance between (18, 4) and (−1, −2) is $\sqrt{[18 - (-1)]^2 + [4 - (-2)]^2} = \sqrt{19^2 + 6^2} = \sqrt{397}$.
Answer to Exercise 107. No, in general ∣u + v∣ ≠ ∣u∣ + ∣v∣. For example, ∣(1, 0)∣ = 1
and ∣(0, 1)∣ = 1, so that ∣(1, 0)∣ + ∣(0, 1)∣ = 2. But ∣(1, 0) + (0, 1)∣ = ∣(1, 1)∣ = √2. Hence,
∣(1, 0)∣ + ∣(0, 1)∣ ≠ ∣(1, 1)∣.
Answer to Exercise 109. The unit vector in the direction $\vec{ab}$ is $\frac{1}{5}(4, -3)$. That in the
direction $\vec{ac}$ is $(0, -1)$. That in the direction $\vec{ad}$ is $\frac{1}{\sqrt{32}}(4, -4)$. The unit vectors in the
directions $2\vec{ab}$, $3\vec{ac}$, and $4\vec{ad}$ are the same.
Answer to Exercise 110. (i) Write α + 7β = 0 and 3α + 5β = 1. Solving, we have $\alpha = \frac{7}{16}$
and $\beta = -\frac{1}{16}$. Thus, $(0, 1) = \frac{7}{16}(1, 3) - \frac{1}{16}(7, 5)$.
(ii) Write α + 7β = 1 and 3α + 5β = 0. Solving, we have α = −5/16 and β = 3/16. Thus,
$(1, 0) = -\frac{5}{16}(1, 3) + \frac{3}{16}(7, 5)$.
(iii) Write α + 7β = 1 and 3α + 5β = 1. Solving, we have $\alpha = \frac{1}{8}$ and $\beta = \frac{1}{8}$. Thus,
$(1, 1) = \frac{1}{8}(1, 3) + \frac{1}{8}(7, 5)$.
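The three pairs of simultaneous equations above share the same coefficients, so they can be checked together. The sketch below uses Cramer's rule, which is one possible method and not necessarily the one intended by the exercise; the function name `solve_2x2` is hypothetical.

```python
from fractions import Fraction

def solve_2x2(u, v, target):
    """Return (alpha, beta) with alpha*u + beta*v == target, by Cramer's rule."""
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("u and v are parallel, so no unique solution exists")
    alpha = Fraction(target[0] * v[1] - target[1] * v[0], det)
    beta = Fraction(u[0] * target[1] - u[1] * target[0], det)
    return alpha, beta

u, v = (1, 3), (7, 5)
for target in [(0, 1), (1, 0), (1, 1)]:
    print(target, solve_2x2(u, v, target))
```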
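Answer to Exercise 111. (a) We have $p = \frac{6}{11}a + \frac{5}{11}b = \frac{6}{11}(1, 2) + \frac{5}{11}(3, 4) = \left(\frac{21}{11}, \frac{32}{11}\right)$.
Hence, the point is $p = \left(\frac{21}{11}, \frac{32}{11}\right)$.
(b) We have $p = \frac{1}{6}a + \frac{5}{6}b = \frac{1}{6}(1, 4) + \frac{5}{6}(2, 3) = \left(\frac{11}{6}, \frac{19}{6}\right)$. Hence, the point is $p = \left(\frac{11}{6}, \frac{19}{6}\right)$.
(c) We have $p = \frac{3}{5}a + \frac{2}{5}b = \frac{3}{5}(-1, 2) + \frac{2}{5}(3, -4) = \left(\frac{3}{5}, -\frac{2}{5}\right)$. Hence, the point is $p = \left(\frac{3}{5}, -\frac{2}{5}\right)$.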
(b) (5, 0) is a vector that points purely to the right and (−3, 0) is a vector that points
purely to the left. So the angle between them must be π. Now verify:
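\begin{align*}
\theta &= \cos^{-1}\left(\frac{(5, 0)\cdot(-3, 0)}{|(5, 0)|\,|(-3, 0)|}\right) = \cos^{-1}\left(\frac{5\times(-3) + 0\times 0}{\sqrt{5^2+0^2}\,\sqrt{(-3)^2+0^2}}\right) \\
&= \cos^{-1}\left(\frac{-15}{5\times 3}\right) = \cos^{-1}(-1) = \pi.
\end{align*} ✓
\[ = \cos^{-1}\left(\frac{1}{1\times\sqrt{4/3}}\right) = \cos^{-1}\left(\frac{\sqrt{3}}{2}\right) = \frac{\pi}{6}. \] ✓
(d) Recall that the right-angled triangle whose base is 1 and side is $\sqrt{3}$ has angle $\frac{\pi}{3}$ between
the base and the hypotenuse. Hence, $\frac{\pi}{3}$ is the angle between i and $(1, \sqrt{3})$. Now verify:
\begin{align*}
\theta &= \cos^{-1}\left(\frac{(1, 0)\cdot(1, \sqrt{3})}{|(1, 0)|\,|(1, \sqrt{3})|}\right) = \cos^{-1}\left(\frac{1\times 1 + 0\times\sqrt{3}}{\sqrt{1^2+0^2}\,\sqrt{1^2+(\sqrt{3})^2}}\right) \\
&= \cos^{-1}\left(\frac{1}{1\times\sqrt{4}}\right) = \cos^{-1}\left(\frac{1}{2}\right) = \frac{\pi}{3}.
\end{align*} ✓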
Answer to Exercise 114. (a) The length of the projection of (1, 0) on (33, 33) is the
same as the length of the projection of (1, 0) on (1, 1), which is:
\begin{align*}
(1, 0)\cdot\widehat{(1, 1)} &= (1, 0)\cdot\left[\frac{1}{|(1, 1)|}(1, 1)\right] = \frac{1}{|(1, 1)|}(1, 0)\cdot(1, 1) \\
&= \frac{1}{\sqrt{2}}(1\times 1 + 0\times 1) = \frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{2}.
\end{align*}
\begin{align*}
(33, 33)\cdot\widehat{(1, 0)} &= (33, 33)\cdot\left[\frac{1}{|(1, 0)|}(1, 0)\right] = \frac{1}{|(1, 0)|}(33, 33)\cdot(1, 0) \\
&= \frac{1}{1}(33\times 1 + 33\times 0) = 33.
\end{align*}
Answer to Exercise 115. (a) Given the vector (1, 3), its x-direction cosine is $\frac{(1, 3)\cdot(1, 0)}{|(1, 3)|\,|(1, 0)|} = \frac{1}{\sqrt{10}}$
and its y-direction cosine is $\frac{(1, 3)\cdot(0, 1)}{|(1, 3)|\,|(0, 1)|} = \frac{3}{\sqrt{10}}$. Hence, its unit vector is $\left(\frac{1}{\sqrt{10}}, \frac{3}{\sqrt{10}}\right)$.
(b) Given the vector (4, 2), its x-direction cosine is $\frac{(4, 2)\cdot(1, 0)}{|(4, 2)|\,|(1, 0)|} = \frac{4}{\sqrt{20}}$ and its y-direction
cosine is $\frac{(4, 2)\cdot(0, 1)}{|(4, 2)|\,|(0, 1)|} = \frac{2}{\sqrt{20}}$. Hence, its unit vector is $\left(\frac{4}{\sqrt{20}}, \frac{2}{\sqrt{20}}\right)$.
(c) Given the vector (−1, 2), its x-direction cosine is $\frac{(-1, 2)\cdot(1, 0)}{|(-1, 2)|\,|(1, 0)|} = -\frac{1}{\sqrt{5}}$ and its y-direction
cosine is $\frac{(-1, 2)\cdot(0, 1)}{|(-1, 2)|\,|(0, 1)|} = \frac{2}{\sqrt{5}}$. Hence, its unit vector is $\left(-\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)$.
Answer to Exercise 116. (a) A three-dimensional (3D) vector is an “arrow” that has
two characteristics: direction and length. Just like a point, it can be described by an
ordered triple of real numbers. The vector a = (a1 , a2 , a3 ) carries us from the origin to
the point (a1 , a2 , a3 ).
(b) $a = (a_1, a_2, a_3) = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = a_1 i + a_2 j + a_3 k = \vec{a}$. If we let a refer to the point $(a_1, a_2, a_3)$,
then we can also write a as the position vector $\vec{oa}$ (i.e. the vector that carries us from the
origin to the point a).
(c) Given two points $a = (a_1, a_2, a_3)$ and $b = (b_1, b_2, b_3)$, (i) there is no such thing as a + b; (ii)
$a + \vec{ob}$ is the point $(a_1 + b_1, a_2 + b_2, a_3 + b_3)$; (iii) $\vec{oa} + \vec{ob}$ is the vector $(a_1 + b_1, a_2 + b_2, a_3 + b_3)$;
and (iv) $\vec{oa} - \vec{ba}$ is the vector $\vec{ob}$.
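(b) The lengths of the vectors 2a = (2, 4, 6), 3b = (12, 15, 18), and 4(a − b) are respectively
$2\sqrt{14}$, $3\sqrt{77}$, and $4\sqrt{27}$.
(c) The unit vectors are $\hat{a} = \frac{1}{\sqrt{14}}(1, 2, 3)$, $\hat{b} = \frac{1}{\sqrt{77}}(4, 5, 6)$, and $\widehat{a - b} = \frac{1}{\sqrt{27}}(-3, -3, -3)$.
(d) (1, 2, 3) ⋅ (4, 5, 6) = 1 × 4 + 2 × 5 + 3 × 6 = 32 and (−2, 4, −6) ⋅ (1, −2, 3) = (−2) × 1 + 4 × (−2) +
(−6) × 3 = −28.
(e) (i) The angle between the vectors (1, 2, 3) and (4, 5, 6) is
\[ \cos^{-1}\left[\frac{a\cdot b}{|a|\,|b|}\right] = \cos^{-1}\left[\frac{32}{\sqrt{14}\times\sqrt{77}}\right] = \cos^{-1}\frac{32}{\sqrt{1078}} \approx 0.226. \]
(e) (ii) The angle between the vectors (−2, 4, −6) and (1, −2, 3) is
\begin{align*}
\cos^{-1}\left[\frac{u\cdot v}{|u|\,|v|}\right] &= \cos^{-1}\left[\frac{-28}{\sqrt{(-2)^2+4^2+(-6)^2}\times\sqrt{1^2+(-2)^2+3^2}}\right] \\
&= \cos^{-1}\frac{-28}{\sqrt{56\times 14}} = \cos^{-1}\frac{-28}{28} = \pi.
\end{align*}
No, these two vectors are not orthogonal; instead, they point in exactly opposite
directions.
\[ a\cdot\hat{b} = (1, 2, 3)\cdot\frac{1}{\sqrt{77}}(4, 5, 6) = \frac{32}{\sqrt{77}}. \]
\begin{align*}
p &= \frac{\mu}{\lambda+\mu}a + \frac{\lambda}{\lambda+\mu}b \\
&= \frac{3}{2+3}(1, 2, 3) + \frac{2}{2+3}(4, 5, 6) \\
&= \frac{1}{5}(11, 16, 21).
\end{align*}
Hence, the point is $p = \frac{1}{5}(11, 16, 21)$.
(h) (i) Given the vector (1, 3, −2), its x-, y-, and z-direction cosines are, respectively,
$\frac{1}{\sqrt{14}}$, $\frac{3}{\sqrt{14}}$, and $\frac{-2}{\sqrt{14}}$. Hence, its unit vector is $\left(\frac{1}{\sqrt{14}}, \frac{3}{\sqrt{14}}, \frac{-2}{\sqrt{14}}\right)$.
(ii) Given the vector (4, 2, −3), its x-, y-, and z-direction cosines are, respectively,
$\frac{4}{\sqrt{29}}$, $\frac{2}{\sqrt{29}}$, and $\frac{-3}{\sqrt{29}}$. Hence, its unit vector is $\left(\frac{4}{\sqrt{29}}, \frac{2}{\sqrt{29}}, \frac{-3}{\sqrt{29}}\right)$.
(iii) Given the vector (−1, 2, −4), its x-, y-, and z-direction cosines are, respectively,
$\frac{-1}{\sqrt{21}}$, $\frac{2}{\sqrt{21}}$, and $\frac{-4}{\sqrt{21}}$. Hence, its unit vector is $\left(\frac{-1}{\sqrt{21}}, \frac{2}{\sqrt{21}}, \frac{-4}{\sqrt{21}}\right)$.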
Answer to Exercise 118. (a) If u = (0, 1, 2) and v = (3, 4, 5), then u × v = (−3, 6, −3).
Let’s verify that u × v is orthogonal to u, by computing (u × v) ⋅ u = (−3, 6, −3) ⋅ (0, 1, 2) =
0+6−6 = 0 ✓. Similarly, let’s verify that u×v is orthogonal to v, by computing (u × v)⋅v =
(−3, 6, −3) ⋅ (3, 4, 5) = −9 + 24 − 15 = 0 ✓.
(b) If u = (−1, −2, −3) and v = (1, 0, 5), then u × v = (−10, 2, 2). Let’s verify that u × v
is orthogonal to u, by computing (u × v) ⋅ u = (−10, 2, 2) ⋅ (−1, −2, −3) = 10 − 4 − 6 = 0 ✓.
Similarly, let’s verify that u × v is orthogonal to v, by computing (u × v) ⋅ v = (−10, 2, 2) ⋅
(1, 0, 5) = −10 + 0 + 10 = 0 ✓.
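The same cross products and orthogonality checks can be reproduced mechanically. This is a minimal sketch with hypothetical helper names `cross` and `dot`, written out from the component formula for u × v.

```python
def cross(u, v):
    """Cross product of two 3D vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Scalar (dot) product."""
    return sum(a * b for a, b in zip(u, v))

for u, v in [((0, 1, 2), (3, 4, 5)), ((-1, -2, -3), (1, 0, 5))]:
    w = cross(u, v)
    # Both dot products should be 0 if w is orthogonal to u and to v.
    print(w, dot(w, u), dot(w, v))
```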
\begin{align*}
(u\times v)\cdot u &= \begin{pmatrix} u_y v_z - u_z v_y \\ u_z v_x - u_x v_z \\ u_x v_y - u_y v_x \end{pmatrix}\cdot\begin{pmatrix} u_x \\ u_y \\ u_z \end{pmatrix} \\
&= (u_y v_z - u_z v_y)u_x + (u_z v_x - u_x v_z)u_y + (u_x v_y - u_y v_x)u_z \\
&= u_x u_y v_z - u_x v_y u_z + v_x u_y u_z - u_x u_y v_z + u_x v_y u_z - v_x u_y u_z \\
&= 0.
\end{align*}
\begin{align*}
(u\times v)\cdot v &= \begin{pmatrix} u_y v_z - u_z v_y \\ u_z v_x - u_x v_z \\ u_x v_y - u_y v_x \end{pmatrix}\cdot\begin{pmatrix} v_x \\ v_y \\ v_z \end{pmatrix} \\
&= (u_y v_z - u_z v_y)v_x + (u_z v_x - u_x v_z)v_y + (u_x v_y - u_y v_x)v_z \\
&= v_x u_y v_z - v_x v_y u_z + v_x v_y u_z - u_x v_y v_z + u_x v_y v_z - v_x u_y v_z \\
&= 0.
\end{align*}
Answer to Exercise 120. (a) Given u = (1, 2, 3) and v = (4, 5, 6), u × v = (−3, 6, −3) and
v × u = (3, −6, 3) and hence u × v = −v × u.
(b) By definition,
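\[ u\times v = \begin{pmatrix} u_y v_z - u_z v_y \\ u_z v_x - u_x v_z \\ u_x v_y - u_y v_x \end{pmatrix} \quad\text{and}\quad v\times u = \begin{pmatrix} v_y u_z - v_z u_y \\ v_z u_x - v_x u_z \\ v_x u_y - v_y u_x \end{pmatrix}. \]
Each component of v × u is the negative of the corresponding component of u × v, so u × v = −v × u.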
Answer to Exercise 121. (a) The line 5x − y − 1 = 0 can also be written as r = (0, −1) +
λ(1, 5) (λ ∈ R).
(b) The line x − 2y − 1 = 0 can also be written as r = (1, 0) + λ(2, 1) (λ ∈ R).
(c) The line y − 4 = 0 can also be written as r = (0, 4) + λ(1, 0) (λ ∈ R).
(d) The line x − 4 = 0 can also be written as r = (4, 0) + λ(0, 1) (λ ∈ R).
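Answer to Exercise 122. (a) The line r = (−1, 3) + λ(1, −2) (λ ∈ R) has cartesian
equations
\[ x = -1 + \lambda, \qquad y = 3 - 2\lambda. \]
\[ \frac{y - 1}{-2} = \frac{x}{1}. \]
\[ x = 5 + 7\lambda, \qquad y = 6 + 8\lambda. \]
\[ \frac{y - 2/7}{1} = \frac{x}{7/8}. \]
\[ x = -3\lambda, \qquad y = 3. \]
Eliminating λ, we have y = 3.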
That is, this is the line that contains the points (x, y, z) which satisfy the above two
equations.
That is, this is the line that contains the points (x, y, z) which satisfy the above two
equations.
\[ y = -3, \qquad \frac{x}{3} = \frac{z - 1}{1}. \]
That is, this is the line that contains the points (x, y, z) which satisfy the above two
equations.
y = 9, z = 9.
That is, this is the line that contains the points (x, y, z) which satisfy the above two
equations. These are the points (λ, 9, 9), where λ can be any real.
(b) We can transform the equations 2x = 3y = 5z into $\frac{x}{1/2} = \frac{y}{1/3} = \frac{z}{1/5}$. And so this is also the
line r = (0, 0, 0) + λ(1/2, 1/3, 1/5) (λ ∈ R).
(c) We can transform the equations $17x - 4 = \frac{3y - 1}{2} = 3z$ into $\frac{x - 4/17}{1/17} = \frac{y - 1/3}{2/3} = \frac{z}{1/3}$. And
so this is also the line r = (4/17, 1/3, 0) + λ(1/17, 2/3, 1/3) (λ ∈ R).
(d) We can transform the equations $\frac{x - 3}{2} = \frac{5z - 2}{7}$, 3y = 11 into $\frac{x - 3}{2} = \frac{z - 2/5}{7/5}$, y = 11/3.
And so this is also the line r = (3, 11/3, 2/5) + λ(2, 0, 7/5) (λ ∈ R).
Answer to Exercise 125. (a) Given the points a = (3, 1, 2), b = (1, 6, 5), and c = (0, −1, 0),
first take the line through a and b. The vector from a to b is (−2, 5, 3) and the line passes
through a. Hence, the line can be written as r = (3, 1, 2) + λ(−2, 5, 3) (λ ∈ R).
Then check whether c is on the line: Is there λ such that c = (0, −1, 0) = (3, 1, 2)+λ(−2, 5, 3)?
Rearranging, we have (−3, −2, −2) = λ(−2, 5, 3), which we can write out as: −3 = −2λ, −2 = 5λ, and −2 = 3λ.
Clearly, there is no λ such that the above three equations can be true. And so the point c
is not on the line through a and b. Hence, the three points are not collinear.
(b) Given the points a = (1, 2, 4), b = (0, 0, 1), and c = (3, 6, 10), first take the line through
a and b. The vector from a to b is (−1, −2, −3) and the line passes through a. Hence, the
line can be written as r = (1, 2, 4) + λ(−1, −2, −3) (λ ∈ R).
Then check whether c is on the line: Is there λ such that c = (3, 6, 10) = (1, 2, 4) +
λ(−1, −2, −3)? Rearranging, we have (2, 4, 6) = λ(−1, −2, −3), which we can write out as: 2 = −λ, 4 = −2λ, and 6 = −3λ.
Clearly, all three of the above equations are true if λ = −2. And so c is also on the line.
Hence, the three points are collinear.
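An alternative check for collinearity, different from the "solve for λ" method used above, is that the cross product of the vectors from a to b and from a to c is the zero vector exactly when the three points are collinear. The sketch below uses that test; the function names are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3D vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def are_collinear(a, b, c):
    """True if the 3D points a, b, c lie on one line (cross-product test)."""
    ab = tuple(q - p for p, q in zip(a, b))
    ac = tuple(q - p for p, q in zip(a, c))
    return cross(ab, ac) == (0, 0, 0)

print(are_collinear((3, 1, 2), (1, 6, 5), (0, -1, 0)))   # part (a)
print(are_collinear((1, 2, 4), (0, 0, 1), (3, 6, 10)))   # part (b)
```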
Answer to Exercise 126. (a) The plane containing a = (7, 3, 4), b = (8, 3, 4), and
c = (9, 3, 7) contains also the vectors $\vec{ab} = (1, 0, 0)$ and $\vec{ac} = (2, 0, 3)$. A normal vector
is thus (1, 0, 0) × (2, 0, 3) = (0, −3, 0). Any scalar multiple of a normal vector is itself a
normal vector, so we may as well pick (0, 1, 0) as our normal vector. (Whenever we have the
choice, we prefer vectors that involve as many 1's and as few negative signs as possible,
because this will simplify our calculations.)
The plane can thus be described by the vector equation r ⋅ (0, 1, 0) = 3 or the cartesian
equation y = 3.
(b) The plane containing a = (8, 0, 2), b = (4, 4, 3), and c = (2, 7, 2) contains also the vectors
$\vec{ab} = (-4, 4, 1)$ and $\vec{ac} = (-6, 7, 0)$. A normal vector is thus (−4, 4, 1) × (−6, 7, 0) = (−7, −6, −4).
Another normal vector is (7, 6, 4).
The plane can thus be described by the vector equation r ⋅ (7, 6, 4) = 64 or the cartesian
equation 7x + 6y + 4z = 64.
(c) The plane containing a = (8, 5, 9), b = (8, 4, 5), and c = (5, 6, 0) contains also the
vectors $\vec{ab} = (0, -1, -4)$ and $\vec{ac} = (-3, 1, -9)$. A normal vector is thus (0, −1, −4) × (−3, 1, −9) =
(13, 12, −3).
The plane can thus be described by the vector equation r ⋅ (13, 12, −3) = 137 or the cartesian
equation 13x + 12y − 3z = 137.
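The recipe used in this answer (form two vectors in the plane, take their cross product as a normal n, then set d = a ⋅ n) can be sketched as follows for parts (a) and (b). The function name `plane_through` is hypothetical, and the normal it returns may be a scalar multiple of the one chosen in the text.

```python
def cross(u, v):
    """Cross product of two 3D vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def plane_through(a, b, c):
    """Return (n, d) such that the plane through a, b, c is r . n = d."""
    ab = tuple(q - p for p, q in zip(a, b))
    ac = tuple(q - p for p, q in zip(a, c))
    n = cross(ab, ac)
    d = sum(x * y for x, y in zip(a, n))
    return n, d

print(plane_through((7, 3, 4), (8, 3, 4), (9, 3, 7)))  # part (a)
print(plane_through((8, 0, 2), (4, 4, 3), (2, 7, 2)))  # part (b)
```

Dividing n and d by a common factor gives an equivalent equation; for part (b), multiplying both by −1 gives r ⋅ (7, 6, 4) = 64.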
Answer to Exercise 128. (a) r ⋅ (3, 6, 2) = 4 can be rewritten as r ⋅ (3/7, 6/7, 2/7) = 4/7.
(b) r ⋅ (1, 2, 2) = −1 can be rewritten as r ⋅ (−1/3, −2/3, −2/3) = 1/3.
Answer to Exercise 129 (a). Given the point a = (7, 3, 4) and the line l described
by r = (8, 3, 4) + λ(9, 3, 7), we have: $\vec{pa} = (-1, 0, 0)$ and so $|\vec{pa}|^2 = 1^2 = 1$. Also,
$\vec{pa}\cdot\hat{v} = \frac{(-1, 0, 0)\cdot(9, 3, 7)}{\sqrt{9^2 + 3^2 + 7^2}} = -\frac{9}{\sqrt{139}}$ and so $(\vec{pa}\cdot\hat{v})^2 = \frac{81}{139}$. Hence, the length of the side is
$\sqrt{1 - 81/139} = \sqrt{58/139} \approx 0.646$. This is the distance between point a and the line l. And
\begin{align*}
b &= (8, 3, 4) - \frac{9}{\sqrt{139}}\,\frac{(9, 3, 7)}{\sqrt{139}} \\
&= (8, 3, 4) - \frac{(81, 27, 63)}{139} \\
&= \frac{(1031, 390, 493)}{139}.
\end{align*}
[Figure: the line l through p = (8, 3, 4) and the foot of the perpendicular b; the distance between a and b is 0.646.]
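\begin{align*}
b &= (4, 4, 3) - \frac{22}{\sqrt{57}}\,\frac{(2, 7, 2)}{\sqrt{57}} \\
&= (4, 4, 3) - \frac{22\,(2, 7, 2)}{57} \\
&= \frac{(184, 74, 127)}{57}.
\end{align*}
[Figure: the line l through p = (4, 4, 3) and the foot of the perpendicular b.]
\begin{align*}
b &= (8, 4, 5) + \frac{6}{\sqrt{61}}\,\frac{(5, 6, 0)}{\sqrt{61}} \\
&= (8, 4, 5) + \frac{6\,(5, 6, 0)}{61} \\
&= \frac{(518, 280, 305)}{61}.
\end{align*}
[Figure: the line l through p = (8, 4, 5) and the foot of the perpendicular b; the distance between a and b is 4.051.]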
Answer to Exercise 130 (b). Given the point a = (8, 0, 2) and the line l described by
r = (4, 4, 3) + λ(2, 7, 2), we have: $\vec{ra} = (4 - 2\lambda, -4 - 7\lambda, -1 - 2\lambda)$ and so $|\vec{ra}| = \sqrt{57\lambda^2 + 44\lambda + 33}$,
which is minimised at 114λ + 44 = 0 or λ = −44/114 = −22/57.
So $b = (4, 4, 3) - \frac{22}{57}(2, 7, 2) = \frac{1}{57}(184, 74, 127)$. Moreover, the length of the line segment ab
is $\sqrt{57\lambda^2 + 44\lambda + 33} = \sqrt{57\left(-\frac{22}{57}\right)^2 + 44\left(-\frac{22}{57}\right) + 33} = \sqrt{\frac{1397}{57}}$.
Answer to Exercise 130 (c). Given the point a = (8, 5, 9) and the line l described by
r = (8, 4, 5) + λ(5, 6, 0), we have: $\vec{ra} = (-5\lambda, 1 - 6\lambda, 4)$ and so $|\vec{ra}| = \sqrt{61\lambda^2 - 12\lambda + 17}$, which
is minimised at 122λ − 12 = 0 or λ = 12/122 = 6/61.
So $b = (8, 4, 5) + \frac{6}{61}(5, 6, 0) = \frac{1}{61}(518, 280, 305)$. Moreover, the length of the line segment ab
is $\sqrt{61\lambda^2 - 12\lambda + 17} = \sqrt{61\left(\frac{6}{61}\right)^2 - 12\left(\frac{6}{61}\right) + 17} = \sqrt{\frac{1001}{61}}$.
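The minimising value of λ found by calculus above coincides with the projection formula λ = (a − p) ⋅ v / (v ⋅ v). The sketch below uses that projection formula, in exact fractions, to recompute part (b); the function name and return convention are assumptions made for illustration.

```python
from fractions import Fraction

def foot_of_perpendicular(a, p, v):
    """Foot b of the perpendicular from point a to the line r = p + lam*v.

    Returns (lam, b, squared distance from a to b)."""
    pa = [ai - pj for ai, pj in zip(a, p)]
    lam = Fraction(sum(x * y for x, y in zip(pa, v)), sum(x * x for x in v))
    b = tuple(pj + lam * vj for pj, vj in zip(p, v))
    dist_sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return lam, b, dist_sq

lam, b, dist_sq = foot_of_perpendicular((8, 0, 2), (4, 4, 3), (2, 7, 2))
print(lam, b, dist_sq)
```

The distance itself is the square root of `dist_sq`; keeping the squared value avoids introducing floating-point error.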
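\[ r\cdot\left(\frac{9}{\sqrt{139}}, \frac{3}{\sqrt{139}}, \frac{7}{\sqrt{139}}\right) = \frac{109}{\sqrt{139}}. \]
So $\hat{n} = \frac{1}{\sqrt{139}}(9, 3, 7)$, $\hat{d} = \frac{109}{\sqrt{139}}$, and $a\cdot\hat{n} = \frac{100}{\sqrt{139}}$.
Altogether then, the distance between the point and the plane is $|\hat{d} - a\cdot\hat{n}| = \frac{9}{\sqrt{139}}$ and the
foot of the perpendicular is
\[ a + (\hat{d} - a\cdot\hat{n})\,\hat{n} = (7, 3, 4) + \frac{9}{\sqrt{139}}\,\frac{1}{\sqrt{139}}(9, 3, 7) = \frac{(1054, 444, 619)}{139}. \]
By the way, notice that in this example, $\hat{n}$ points in the same direction as $\vec{ab}$. And so
$\widehat{ab} = \hat{n}$. And moreover, $\hat{d} - a\cdot\hat{n} > 0$.
[Figure (not to scale): the plane through p = (8, 3, 4), the point a = (7, 3, 4), and the foot of the perpendicular b.]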
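\[ r\cdot\left(\frac{2}{\sqrt{57}}, \frac{7}{\sqrt{57}}, \frac{2}{\sqrt{57}}\right) = \frac{42}{\sqrt{57}}. \]
So $\hat{n} = \frac{1}{\sqrt{57}}(2, 7, 2)$, $\hat{d} = \frac{42}{\sqrt{57}}$, and $a\cdot\hat{n} = \frac{20}{\sqrt{57}}$.
Altogether then, the distance between the point and the plane is $|\hat{d} - a\cdot\hat{n}| = \frac{22}{\sqrt{57}}$ and the
foot of the perpendicular is
\[ a + (\hat{d} - a\cdot\hat{n})\,\hat{n} = (8, 0, 2) + \frac{22}{\sqrt{57}}\,\frac{(2, 7, 2)}{\sqrt{57}} = \frac{(500, 154, 158)}{57}. \]
By the way, notice that in this example, $\hat{n}$ points in the same direction as $\vec{ab}$. And so
$\widehat{ab} = \hat{n}$. And moreover, $\hat{d} - a\cdot\hat{n} > 0$.
[Figure (not to scale): the plane through p = (4, 4, 3), the point a = (8, 0, 2), and the foot of the perpendicular b.]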
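\[ r\cdot\left(\frac{5}{\sqrt{61}}, \frac{6}{\sqrt{61}}, 0\right) = \frac{64}{\sqrt{61}}. \]
So $\hat{n} = \frac{(5, 6, 0)}{\sqrt{61}}$, $\hat{d} = \frac{64}{\sqrt{61}}$, and $a\cdot\hat{n} = \frac{70}{\sqrt{61}}$.
Altogether then, the distance between the point and the plane is $|\hat{d} - a\cdot\hat{n}| = \frac{6}{\sqrt{61}}$ and the
foot of the perpendicular is
\[ a + (\hat{d} - a\cdot\hat{n})\,\hat{n} = (8, 5, 9) - \frac{6}{\sqrt{61}}\,\frac{(5, 6, 0)}{\sqrt{61}} = \frac{(458, 269, 549)}{61}. \]
By the way, notice that in this example, $\hat{n}$ points in the opposite direction from $\vec{ab}$. And
so $\widehat{ab} = -\hat{n}$. And moreover, $\hat{d} - a\cdot\hat{n} < 0$.
[Figure (not to scale): the plane through p = (8, 4, 5), the point a = (8, 5, 9), and the foot of the perpendicular b.]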
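(b) $\frac{(1, 5)\cdot(8, 1)}{|(1, 5)|\,|(8, 1)|} = \frac{13}{\sqrt{26}\sqrt{65}} = \frac{13}{\sqrt{13\times 2}\sqrt{13\times 5}} = \frac{1}{\sqrt{10}}$. So $\theta = \cos^{-1}\frac{1}{\sqrt{10}} \approx 1.249$. This is
the acute angle between the two lines.
(c) $\frac{(2, 6)\cdot(3, 2)}{|(2, 6)|\,|(3, 2)|} = \frac{18}{\sqrt{40}\sqrt{13}} = \frac{9}{\sqrt{10}\sqrt{13}} = \frac{9}{\sqrt{130}}$. So $\theta = \cos^{-1}\frac{9}{\sqrt{130}} \approx 0.661$. This is the
acute angle between the two lines.
(b) $\frac{(1, 5, 6)\cdot(8, 1, 1)}{|(1, 5, 6)|\,|(8, 1, 1)|} = \frac{19}{\sqrt{62}\sqrt{66}}$. So $\theta = \cos^{-1}\frac{19}{\sqrt{62}\sqrt{66}} \approx 1.269$. This is the acute angle
between the two lines.
(c) $\frac{(2, 6, 7)\cdot(3, 2, 1)}{|(2, 6, 7)|\,|(3, 2, 1)|} = \frac{25}{\sqrt{89}\sqrt{14}}$. So $\theta = \cos^{-1}\frac{25}{\sqrt{89}\sqrt{14}} \approx 0.784$. This is the acute angle
between the two lines.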
(b) The angle between the line r = (−1, 2, 3) + λ(0, 2, 6) (λ ∈ R) and the plane
r ⋅ (1, 3, 5) = 2 is:
(c) The angle between the line r = (−1, 2, 3) + λ(1, 9, 8) (λ ∈ R) and the plane r ⋅ (2, 8, 2) = 3
is:
This is the obtuse angle. So the acute angle between the two planes is π − 2.955 = 0.186
radian.
(b) Consider the planes described, respectively, by the vector equations r ⋅ (1, 2, 3) = 3 and
r ⋅ (5, 1, 1) = 4. The angle between them is
\[ \theta = \cos^{-1}\left(\frac{(1, 2, 3)\cdot(5, 1, 1)}{|(1, 2, 3)|\,|(5, 1, 1)|}\right) = \cos^{-1}\left(\frac{10}{\sqrt{14}\sqrt{27}}\right) \approx 1.031. \]
(c) Consider the planes described, respectively, by the vector equations r ⋅ (1, 1, −8) = 5 and
r ⋅ (−3, 0, 10) = 6. The angle between them is
This is the obtuse angle. So the acute angle between the two planes is π − 2.934 = 0.207
radian.
Answer to Exercise 136. (a) The lines r = (8, 1, 5) + λ(3, 2, 1) and r = (1, 2, 3) + λ(5, 6, 7)
(λ ∈ R) are not parallel because their direction vectors cannot be written as scalar multiples
of each other. Let’s see if they have an intersection point. If they intersect, then there are
reals α and β such that (8, 1, 5) + α(3, 2, 1) = (1, 2, 3) + β(5, 6, 7), or
\[ 8 + 3\alpha \overset{1}{=} 1 + 5\beta, \quad 1 + 2\alpha \overset{2}{=} 2 + 6\beta, \quad\text{and}\quad 5 + \alpha \overset{3}{=} 3 + 7\beta. \]
Take $2\times\overset{3}{=}$ minus $\overset{2}{=}$ to get (10 + 2α) − (1 + 2α) = (6 + 14β) − (2 + 6β) or 9 = 4 + 8β or β = 5/8.
Now from $\overset{3}{=}$, this means that α = 19/8. These do not work if we try plugging them into $\overset{1}{=}$.
Hence, the two lines do not intersect. And so they are not coplanar; equivalently, they
are skew.
(b) The lines r = (0, 0, 6) + λ(3, 9, 0) and r = (1, 1, 1) + λ(1, 3, 0) (λ ∈ R) have direction
vectors that can be written as scalar multiples of each other, so they are parallel. Thus,
they are also coplanar — or equivalently, they are not skew.
Remember: We need two distinct vectors and a point to determine a plane. Here, the
direction vectors of the two lines are not distinct. So we need another vector. This is
not difficult. Simply consider a vector from a point on the first line to the second, e.g.
(1, 1, 1) − (0, 0, 6) = (1, 1, −5). Now, the plane that contains both lines has normal vector
(1, 1, −5) × (1, 3, 0) = (15, −5, 2). And so the plane that contains both lines is r ⋅ (15, −5, 2) =
12.
(c) The lines r = (6, 5, 5) + λ(1, 0, 1) and r = (9, 3, 6) + λ(0, 1, 1) (λ ∈ R) are not parallel
because their direction vectors cannot be written as scalar multiples of each
other. Let's see if they have an intersection point. If they intersect, then there are reals α
and β such that (6, 5, 5) + α(1, 0, 1) = (9, 3, 6) + β(0, 1, 1), or
\[ 6 + \alpha \overset{1}{=} 9, \quad 5 \overset{2}{=} 3 + \beta, \quad\text{and}\quad 5 + \alpha \overset{3}{=} 6 + \beta. \]
From $\overset{3}{=} - \overset{2}{=}$, we have α = 3, β = 2. This is consistent with $\overset{1}{=}$. And so α = 3, β = 2 is a possible
solution for the above set of equations. Hence, the two lines do intersect, in particular at
(6, 5, 5) + α(1, 0, 1) = (9, 5, 8) or (9, 3, 6) + β(0, 1, 1) = (9, 5, 8). Thus, they are also coplanar;
equivalently, they are not skew.
Remember: We need two distinct vectors and a point to determine a plane. Here, the
direction vectors of the two lines are distinct. So a plane that contains both lines has
normal vector (1, 0, 1) × (0, 1, 1) = (−1, −1, 1). And so the plane that contains both lines is
r ⋅ (−1, −1, 1) = −6.
(b) Given the line r = (5, 5, 6) + λ(2, 3, 5) (λ ∈ R) and the plane r ⋅ (−10, 0, 4) = −26, we have
(2, 3, 5) ⋅ (−10, 0, 4) = 0 and so they are parallel.
The point (5, 5, 6) on the line is on the plane because (5, 5, 6) ⋅ (−10, 0, 4) = −26. Since the
line and plane are parallel and share at least one intersection point, it must be that the line
lies completely on the plane.
(c) Given the line r = (4, 5, 6) + λ(2, 3, 5) (λ ∈ R) and the plane r ⋅ (−10, 0, 3) = −26, we have
(2, 3, 5) ⋅ (−10, 0, 3) ≠ 0 and so they are not parallel.
They must therefore intersect at exactly one point. Let’s find it.
Plug in a generic point of the line into the equation for the plane: (4 + 2λ, 5 + 3λ, 6 + 5λ) ⋅
(−10, 0, 3) = −26 ⇐⇒ −40 − 20λ + 18 + 15λ = −26 ⇐⇒ 4 = 5λ ⇐⇒ λ = 4/5. So the
intersection point is (4, 5, 6) + 4/5(2, 3, 5) = (5.6, 7.4, 10).
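The "plug a generic point of the line into the plane equation" step can be sketched in code. This is a minimal illustration with a hypothetical function name; it returns None when the line is parallel to the plane and so does not distinguish part (b)'s "line lies on the plane" case from the "no intersection" case.

```python
from fractions import Fraction

def line_plane_intersection(p, v, n, d):
    """Intersection of the line r = p + lam*v with the plane r . n = d.

    Returns the point as a tuple of Fractions, or None if v . n == 0."""
    def dot(x, y):
        return sum(s * t for s, t in zip(x, y))
    if dot(v, n) == 0:
        return None  # line is parallel to the plane
    lam = Fraction(d - dot(p, n), dot(v, n))
    return tuple(pj + lam * vj for pj, vj in zip(p, v))

point_c = line_plane_intersection((4, 5, 6), (2, 3, 5), (-10, 0, 3), -26)
point_b = line_plane_intersection((5, 5, 6), (2, 3, 5), (-10, 0, 4), -26)
print(point_c, point_b)
```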
\[ 4x + 9y + 3z \overset{1}{=} 61, \qquad x + y + 2z \overset{2}{=} 19. \]
Use the "plug in x = 0" trick. Then $\overset{1}{=}$ minus $9\times\overset{2}{=}$ yields −15z = −110 or z = 22/3. And so
y = 13/3. Hence, the intersection line is r = (0, 13/3, 22/3) + λ(−3, 1, 1) (λ ∈ R).
(b) The planes are r ⋅ (1, 1, 0) = 4 and r ⋅ (1, 6, 8) = 60. Clearly, (1, 1, 0) cannot be written
as a scalar multiple of (1, 6, 8). So the two planes are not parallel and share an intersection
line that has direction vector (1, 1, 0) × (1, 6, 8) = (8, −8, 5).
Find a p = (x, y, z) where the two planes intersect:
\[ x + y \overset{1}{=} 4, \qquad x + 6y + 8z \overset{2}{=} 60. \]
Use the "plug in x = 0" trick. Then $\overset{1}{=}$ implies that y = 4. And so from $\overset{2}{=}$, z = 4.5. And
so one intersection point of the two planes is (0, 4, 4.5). Hence, the intersection line is
r = (0, 4, 4.5) + λ(8, −8, 5) (λ ∈ R).
(c) The planes are r ⋅ (4, 4, 8) = 56 and r ⋅ (1, 1, 2) = 12. Clearly, (4, 4, 8) can be written as
a scalar multiple of (1, 1, 2). So the two planes are parallel. Do they not intersect at all or
are they identical?
The point (1, 3, 5) is on the first plane, but is not on the second because (1, 3, 5) ⋅ (1, 1, 2) =
14 ≠ 12. And so they are parallel planes that do not intersect at all.
(d) The planes are r ⋅ (4, 4, 8) = 48 and r ⋅ (1, 1, 2) = 12. Clearly, (4, 4, 8) can be written as
a scalar multiple of (1, 1, 2). So the two planes are parallel. Do they not intersect at all or
are they identical?
The point (1, 3, 4) is on the first plane and is also on the second because (1, 3, 4)⋅(1, 1, 2) = 12.
And so they are exactly identical planes.
\[ x + z \overset{1}{=} 1, \qquad y - z \overset{2}{=} -1. \]
Use the "plug in x = 0" trick to see that one intersection point is (0, 0, 1). Hence the
intersection line of P1 and P2 is r = (0, 0, 1) + λ(−1, 1, 1) (λ ∈ R). Call this line l1.
The planes P1 and P3 share an intersection line with direction vector (1, 0, 1) × (1, 1, 0) =
(−1, 1, 1). Let's look for an intersection point (x, y, z):
\[ x + z \overset{1}{=} 1, \qquad x + y \overset{2}{=} 2. \]
Use the "plug in x = 0" trick to see that one intersection point is (0, 2, 1). Hence the
intersection line of P1 and P3 is r = (0, 2, 1) + λ(−1, 1, 1) (λ ∈ R). Call this line l2.
The planes P2 and P3 share an intersection line with direction vector (0, 1, −1) × (1, 1, 0) =
(1, −1, −1) or (−1, 1, 1). Let's look for an intersection point (x, y, z):
\[ y - z \overset{1}{=} -1, \qquad x + y \overset{2}{=} 2. \]
Use the "plug in x = 0" trick to see that one intersection point is (0, 2, 3). Hence the
intersection line of P2 and P3 is r = (0, 2, 3) + λ(−1, 1, 1) (λ ∈ R). Call this line l3.
Step #3. Determine where, if at all, the 3 intersection lines intersect.
Clearly, all 3 lines are parallel (because they all have the same direction vector). But l1
and l2 are distinct, because (0, 0, 1) is on l1 but not on l2 .
Thus, we must be in Case 3a, where all 3 lines are distinct and do not intersect.
Conclusion.
Altogether, we conclude that the 3 intersection lines do not intersect. The 3 planes form 3
distinct intersection lines. (So we are in Case 3a.)
Answer to Exercise 141. We rewrite each, by rationalising any denominators with surds
and also writing out the sine or cosine values.
√ √ √ √ √ √ √ √
2 2 3 2 2 3 2 3 2
a=( )−( ) i, b = ( )−( ) i, c = ( )−( ) i, and d = ( )−( ) i.
2 2 2 2 2 2 2 2
Comparing the real and imaginary parts, we see that only c = d.
zw = (a + bi)(c + di) = ac + adi + bci + bdi2 = ac − bd + (ad + bc)i = (ac − bd, ad + bc) ,
as desired.
Answer to Exercise 144. (a) Given z = −5 + 2i and w = 7 + 3i, we have $zw = (-5 + 2i)(7 + 3i) = -35 - 15i + 14i + 6i^2 = -41 - i$.
(b) Given z = 3 − i and w = 11 + 2i, we have $zw = (3 - i)(11 + 2i) = 33 + 6i - 11i - 2i^2 = 35 - 5i$.
(c) Given z = 1 + 2i and $w = 3 - \sqrt{2}i$, we have $zw = (1 + 2i)(3 - \sqrt{2}i) = 3 - \sqrt{2}i + 6i - 2\sqrt{2}i^2 = 3 + 2\sqrt{2} + (6 - \sqrt{2})i$.
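(a) $\frac{1 + 3i}{-i} = \frac{1 + 3i}{-i}\times\frac{i}{i} = \frac{i + 3i^2}{-i^2} = \frac{i - 3}{1} = -3 + i$.
(b) $\frac{2 - 3i}{1 + i} = \frac{2 - 3i}{1 + i}\times\frac{1 - i}{1 - i} = \frac{2 - 3i - 2i + 3i^2}{1 - i^2} = \frac{-1 - 5i}{2} = -0.5 - 2.5i$.
(c) $\frac{\sqrt{2} - \pi i}{3 - \sqrt{2}i} = \frac{\sqrt{2} - \pi i}{3 - \sqrt{2}i}\times\frac{3 + \sqrt{2}i}{3 + \sqrt{2}i} = \frac{3\sqrt{2} + 2i - 3\pi i - \pi\sqrt{2}i^2}{9 - 2i^2} = \frac{\sqrt{2}(3 + \pi) + (2 - 3\pi)i}{11}$.
(e) $\frac{-3}{2 + i} = \frac{-3}{2 + i}\times\frac{2 - i}{2 - i} = \frac{-6 + 3i}{4 - i^2} = \frac{-6 + 3i}{5} = -1.2 + 0.6i$.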
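Answer to Exercise 148. (a) The roots to the equation $x^2 + x + 1 = 0$ are given by
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-1 \pm \sqrt{-3}}{2} = -\frac{1}{2} \pm \frac{\sqrt{3}\times\sqrt{-1}}{2} = -\frac{1}{2} \pm \frac{\sqrt{3}}{2}i. \]
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{-4}}{2} = -1 \pm \frac{\sqrt{4}\times\sqrt{-1}}{2} = -1 \pm i. \]
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-3 \pm \sqrt{-3}}{6} = -\frac{1}{2} \pm \frac{\sqrt{3}\times\sqrt{-1}}{6} = -\frac{1}{2} \pm \frac{\sqrt{3}}{6}i. \]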
Hence, b = −2 and c = 2.
                 x² + 2x + 2
    x − 1 ) x³ +  x² + 0x − 2
            x³ −  x²
                 2x² + 0x
                 2x² − 2x
                       2x − 2
                       2x − 2
                            0.
So $x^3 + x^2 - 2 = (x - 1)(x^2 + 2x + 2)$. Use the quadratic formula to further factorise $x^2 + 2x + 2$:
\[ x = \frac{-2 \pm \sqrt{2^2 - 4(1)(2)}}{2} = -1 \pm \sqrt{1 - 2} = -1 \pm i. \]
(b)
                 x³ +  x² + 0x − 2
    x − 1 ) x⁴ + 0x³ −  x² − 2x + 2
            x⁴ −  x³
                  x³ −  x²
                  x³ −  x²
                        0  − 2x + 2
                           − 2x + 2
                                  0.
                        x² −  2x  −  3
    x² − 4x + 13 ) x⁴ − 6x³ + 18x² − 14x − 39
                   x⁴ − 4x³ + 13x²
                       −2x³ +  5x² − 14x
                       −2x³ +  8x² − 26x
                              −3x² + 12x − 39
                              −3x² + 12x − 39
                                            0.
So the polynomial also has quadratic factor x2 − 2x − 3, which we observe can in turn be
factorised as (x − 3)(x + 1). So the four zeros of the polynomial are 2 ± 3i, 3, and −1.
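The four zeros can be double-checked by substituting them back into the quartic. The sketch below evaluates the polynomial by Horner's rule with Python's built-in complex numbers; the coefficient list is read off the long division above, and the helper name `horner` is hypothetical.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are listed from highest degree down."""
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc

# x^4 - 6x^3 + 18x^2 - 14x - 39, the dividend in the long division above.
quartic = [1, -6, 18, -14, -39]
zeros = [2 + 3j, 2 - 3j, 3, -1]
values = [horner(quartic, z) for z in zeros]
print(values)
```

All intermediate values here are small integers (or complex numbers with small integer parts), so the floating-point arithmetic is exact for these inputs.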
And if we want to write down the four factors of the polynomial, we can easily do so:
(b) Again, we know that a quadratic factor for the polynomial is x2 − 4x + 13, so go ahead
and do the long division:
\[ x = \frac{-13 \pm \sqrt{13^2 - 4(-2)(-15)}}{2(-2)} = \frac{13 \mp \sqrt{169 - 120}}{4} = \frac{13 \mp 7}{4} = 5, 1.5. \]
[Figure: Argand diagram showing the points 1 = (1, 0), −3 = (−3, 0), 2i = (0, 2), 1 + 2i = (1, 2), and −1 − 3i = (−1, −3).]
[Figure: the same Argand diagram, with the arguments 1.107 rad (for 1 + 2i) and −1.893 rad (for −1 − 3i) marked.]
Answer to Exercise 155. We already calculated the modulus and arguments of these
complex numbers in Exercise 153. So
$1 = \cos 0 + i\sin 0$. $-3 = 3(\cos\pi + i\sin\pi)$. $2i = 2\left[\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right]$. $1 + 2i \approx \sqrt{5}(\cos 1.107 + i\sin 1.107)$.
$-1 - 3i \approx \sqrt{10}\,[\cos(-1.893) + i\sin(-1.893)]$.
Answer to Exercise 156. We already calculated the modulus and arguments of these
complex numbers in Exercise 153. So
$1 = e^{i0}$. $-3 = 3e^{i\pi}$. $2i = 2e^{i\pi/2}$. $1 + 2i \approx \sqrt{5}e^{i(1.107)}$. $-1 - 3i \approx \sqrt{10}e^{i(-1.893)}$.
Answer to Exercise 157. (a) z = 1 has modulus 1 and argument 0. w = −3 has modulus
3 and argument π. Hence,
\[ |zw| = 1\times 3 = 3, \]
(You can easily verify that $3e^{i\pi} = -3$, which is indeed equal to 1 × (−3).)
(b) z = 2i has modulus 2 and argument π/2. w = 1 + 2i has modulus $\sqrt{5}$ and argument
$\tan^{-1}(2/1)$. Hence,
\[ |zw| = 2\times\sqrt{5}, \]
So, $zw = 2\sqrt{5}(\cos 2.678 + i\sin 2.678) = 2\sqrt{5}e^{i(2.678)}$.
(c) z = −1 − 3i has modulus $\sqrt{10}$ and argument $\tan^{-1}(-3/-1) - \pi$. w = 3 + 4i has modulus
5 and argument $\tan^{-1}(4/3)$. Hence,
\[ |zw| = \sqrt{10}\times 5, \]
\[ \arg(zw) = \tan^{-1}3 - \pi + \tan^{-1}\frac{4}{3} + 2k\pi \approx -0.965 + 2k\pi = -0.965, \quad k = 0. \]
So, $zw = 5\sqrt{10}\,[\cos(-0.965) + i\sin(-0.965)] = 5\sqrt{10}e^{i(-0.965)}$.
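\[ \left|\frac{z}{w}\right| = \frac{1}{3}, \]
\[ \arg\left(\frac{z}{w}\right) = 0 - \pi + 2k\pi = -\pi + 2k\pi = \pi, \quad k = 1. \]
(You can easily verify that $e^{i\pi}/3 = -1/3$, which is indeed equal to 1/(−3).)
(b) z = 2i has modulus 2 and argument π/2. w = 1 + 2i has modulus $\sqrt{5}$ and argument
$\tan^{-1}(2/1)$. Hence,
\[ \left|\frac{z}{w}\right| = \frac{2}{\sqrt{5}} = \frac{2\sqrt{5}}{5} = 0.4\sqrt{5}, \]
\[ \arg\left(\frac{z}{w}\right) = \frac{\pi}{2} - \tan^{-1}2 + 2k\pi \approx 0.464 + 2k\pi = 0.464, \quad k = 0. \]
So, $\frac{z}{w} = 0.4\sqrt{5}(\cos 0.464 + i\sin 0.464) = 0.4\sqrt{5}e^{i(0.464)}$.
(c) z = −1 − 3i has modulus $\sqrt{10}$ and argument $\tan^{-1}(-3/-1) - \pi$. w = 3 + 4i has modulus
5 and argument $\tan^{-1}\frac{4}{3}$. Hence,
\[ \left|\frac{z}{w}\right| = \frac{\sqrt{10}}{5} = 0.2\sqrt{10}, \]
\[ \arg\left(\frac{z}{w}\right) = \tan^{-1}3 - \pi - \tan^{-1}\frac{4}{3} + 2k\pi \approx -2.820 + 2k\pi = -2.820, \quad k = 0. \]
So, $\frac{z}{w} = 0.2\sqrt{10}\,[\cos(-2.820) + i\sin(-2.820)] = 0.2\sqrt{10}e^{i(-2.820)}$.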
= 3 × 1 × 2 × cos(0.55π) = 6 cos(0.55π).
We can also verify that a blind application of the formulae given in Fact 54 will give the
same answers:
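\begin{align*}
\Longrightarrow \arg\left(e^{i\theta} + e^{i\varphi}\right) &= \arg\left[e^{i\frac{\theta+\varphi}{2}}\left(2\cos\frac{\theta-\varphi}{2}\right)\right] \\
&= \arg\left(e^{i\frac{\theta+\varphi}{2}}\right) + \arg 2 + \arg\left(\cos\frac{\theta-\varphi}{2}\right) + 2k\pi \\
&= \frac{\theta+\varphi}{2} + 2k\pi,
\end{align*}
where k is the unique integer such that 0.5(θ + φ) + 2kπ ∈ (−π, π].
That is, $e^{i\theta} + e^{i\varphi}$ has modulus $2\cos\frac{\theta-\varphi}{2}$ and argument $\frac{\theta+\varphi}{2}$. In other words,
$e^{i\theta} + e^{i\varphi} = 2\cos\frac{\theta-\varphi}{2}\,e^{i\left(\frac{\theta+\varphi}{2}\right)}$, as desired. Similarly,
\[ e^{i\theta} - e^{i\varphi} = e^{i\frac{\theta+\varphi}{2}}\left(e^{i\frac{\theta-\varphi}{2}} - e^{i\left(-\frac{\theta-\varphi}{2}\right)}\right) = e^{i\frac{\theta+\varphi}{2}}\left(2i\sin\frac{\theta-\varphi}{2}\right). \]
\begin{align*}
\Longrightarrow \arg\left(e^{i\theta} - e^{i\varphi}\right) &= \arg\left[e^{i\frac{\theta+\varphi}{2}}\left(2i\sin\frac{\theta-\varphi}{2}\right)\right] \\
&= \arg\left(e^{i\frac{\theta+\varphi}{2}}\right) + \arg(2i) + \arg\left(\sin\frac{\theta-\varphi}{2}\right) + 2m\pi \\
&= \frac{\theta+\varphi}{2} + \frac{\pi}{2} + 0 + 2m\pi = \frac{\theta+\varphi+\pi}{2} + 2m\pi,
\end{align*}
where m is the unique integer such that 0.5(θ + φ + π) + 2mπ ∈ (−π, π].
That is, $e^{i\theta} - e^{i\varphi}$ has modulus $2\sin\frac{\theta-\varphi}{2}$ and argument $\frac{\theta+\varphi+\pi}{2}$. In other words,
$e^{i\theta} - e^{i\varphi} = 2\sin\frac{\theta-\varphi}{2}\,e^{i\left(\frac{\theta+\varphi+\pi}{2}\right)}$, as desired.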
[Figures: circles of radius r centred on (a, b), including the region {(x, y): (x − a)² + (y − b)² ≥ r²}; (e) a circle of radius r.]
Method #2. (This second method assumes we already know that the given locus is a line.)
The desired line is perpendicular to the line connecting the points (1, 4) and (−5, 0). The
latter line has slope $\frac{0 - 4}{-5 - 1} = 2/3$ and midpoint (−2, 2).
The desired line thus has equation y = −1.5x + c. The desired line passes through the
midpoint (−2, 2) — hence, 2 = −1.5(−2) + c Ô⇒ c = 2 − 3 = −1.
Altogether then, the desired line has equation y = −1.5x − 1.
(b) {(x, y) ∶ ∣(x − 17, y − 3)∣ = ∣(x + 2, y + 11)∣} is the line that is equidistant from the points
(17, 3) and (−2, −11).
The equation ∣(x − 17, y − 3)∣ = ∣(x + 2, y + 11)∣ can be rewritten as 38x + 28y − 173 = 0 or
$y = -\frac{19}{14}x + \frac{173}{28}$.
Method #2. (This second method assumes we already know that the given locus is a line.)
The desired line is perpendicular to the line connecting the points (17, 3) and (−2, −11).
The latter line has slope $\frac{-11 - 3}{-2 - 17} = \frac{-14}{-19}$ and midpoint (7.5, −4).
The desired line thus has equation $y = -\frac{19}{14}x + c$. The desired line passes through the
midpoint (7.5, −4); hence, $-4 = -\frac{19}{14}(7.5) + c \Longrightarrow c = -4 + \frac{19}{14}(7.5) = -4 + \frac{285}{28} = \frac{173}{28}$.
Altogether then, the desired line has equation $y = -\frac{19}{14}x + \frac{173}{28}$.
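Method #2 (negative reciprocal slope through the midpoint) can be sketched as a small function that returns the slope m and intercept c of the perpendicular bisector. The function name is hypothetical, and the sketch deliberately rejects the case where the bisector is vertical, since that line has no slope-intercept form.

```python
from fractions import Fraction

def perpendicular_bisector(p, q):
    """Return (m, c) such that y = m*x + c is the perpendicular bisector of pq."""
    (x1, y1), (x2, y2) = p, q
    if y1 == y2:
        raise ValueError("bisector is vertical; it has no slope-intercept form")
    m = Fraction(-(x2 - x1), y2 - y1)          # negative reciprocal of the slope of pq
    mid_x, mid_y = Fraction(x1 + x2, 2), Fraction(y1 + y2, 2)
    c = mid_y - m * mid_x                       # the line passes through the midpoint
    return m, c

print(perpendicular_bisector((1, 4), (-5, 0)))
print(perpendicular_bisector((17, 3), (-2, -11)))
```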
[Figure: the loci {(x, y): x = 0} and {(x, y): x² + y² = 1}.]
So the locus {z ∈ C ∶ ∣z∣ = r} describes the circumference of the circle of radius r centred on
the origin.
(b) Let z = (x, y) and c = (a, b). Then $|z - c| = |(x - a, y - b)| = \sqrt{(x - a)^2 + (y - b)^2}$ and so
the equation ∣z − c∣ = r is equivalent to $\sqrt{(x - a)^2 + (y - b)^2} = r$ or $(x - a)^2 + (y - b)^2 = r^2$.
So the locus {z ∈ C ∶ ∣z − c∣ = r} describes the circumference of the circle of radius r centred
on c.
(c) The locus {z ∈ C ∶ ∣z − c∣ ≤ r} describes the entire interior of the circle centred on c, with
radius r, including the circumference of the circle.
(d) The locus {z ∈ C ∶ ∣z − c∣ < r} describes the entire interior of the circle centred on c, with
radius r, excluding the circumference of the circle.
Answer to Exercise 165. (a) ∣z − c∣ ≤ ∣z − b∣ is the closed half-plane of points that are at
least as close to point c as to point b.
(b) ∣z − c∣ < ∣z − b∣ is the open half-plane of points that are strictly closer to point c than to
point b.
(c) ∣z − c∣ ≥ ∣z − b∣ is the closed half-plane of points that are at least as close to point b as
to point c.
(d) ∣z − c∣ > ∣z − b∣ is the open half-plane of points that are strictly closer to point b than to
point c.
Answer to Exercise 166. The locus {z ∈ C ∶ ∣z∣ = 1, −π < arg z < 0} describes the lower
half of the circumference of the unit circle centred on the origin, excluding the endpoints
on the horizontal axis.
[Figure: the circle ∣z − 2 − 2i∣ = 1 with centre C and the labelled points F, U, and A.]
By the properties of the circle, ∣z∣ is maximised at F and minimised at N , where F and N
lie on the line through the origin and the circle’s centre.
(a) The maximum value of ∣z∣ is the length $OF = OC + CF = \sqrt{2^2 + 2^2} + 1 = \sqrt{8} + 1$.
The minimum value of ∣z∣ is the length $ON = OC - CN = \sqrt{2^2 + 2^2} - 1 = \sqrt{8} - 1$.
(b) Consider △CAN. The line through F, C, N, and the origin is y = x. So AN = CA.
Moreover, $CA^2 + AN^2 = CN^2 = 1^2 = 1$.
Altogether then, $CA^2 + CA^2 = 1$ or $CA^2 = \frac{1}{2}$ or $CA = \frac{1}{\sqrt{2}}$. And $AN = \frac{1}{\sqrt{2}}$. Hence,
$N = \left(2 - \frac{1}{\sqrt{2}}, 2 - \frac{1}{\sqrt{2}}\right)$.
Symmetrically, $F = \left(2 + \frac{1}{\sqrt{2}}, 2 + \frac{1}{\sqrt{2}}\right)$.
(c) The points U and D at which arg z is maximised and minimised are also where the
tangents OU and OD from the origin touch the circle. By the properties of the circle, OU
is perpendicular to CU . Similarly, OD is perpendicular to CD.
The angle the upper half of the line y = x makes with the positive x-axis is $\theta = \frac{\pi}{4}$. The
angle ∠COU is $\sin^{-1}\frac{1}{\sqrt{8}}$. Hence, $\arg U = \theta + \angle COU = \frac{\pi}{4} + \sin^{-1}\frac{1}{\sqrt{8}}$.
Symmetrically, $\arg D = \theta - \angle COD = \theta - \angle COU = \frac{\pi}{4} - \sin^{-1}\frac{1}{\sqrt{8}}$.
(d) △ODC is right-angled. So $OD^2 + CD^2 = OC^2$. $OC = \sqrt{8}$ and CD = 1. Hence, $OD = \sqrt{8 - 1} = \sqrt{7}$.
Altogether then, $|D| = \sqrt{7}$ and $\arg D = \frac{\pi}{4} - \sin^{-1}\frac{1}{\sqrt{8}}$.
Symmetrically, $|U| = \sqrt{7}$ and $\arg U = \frac{\pi}{4} + \sin^{-1}\frac{1}{\sqrt{8}}$.
Answer to Exercise 168. Step #1. Let P(k) stand for the proposition that
\[ (\cos\theta + i\sin\theta)^k = \cos(k\theta) + i\sin(k\theta). \]
\[ (\cos\theta + i\sin\theta)^1 = \cos(1\times\theta) + i\sin(1\times\theta). \] ✓
\[ (\cos\theta + i\sin\theta)^j = \cos(j\theta) + i\sin(j\theta). \]
\[ (\cos\theta + i\sin\theta)^{j+1} = \cos[(j+1)\theta] + i\sin[(j+1)\theta]. \]
\begin{align*}
(\cos\theta + i\sin\theta)^{j+1} &= (\cos\theta + i\sin\theta)^j(\cos\theta + i\sin\theta) \\
&= [\cos(j\theta) + i\sin(j\theta)](\cos\theta + i\sin\theta) \\
&= \cos(j\theta)\cos\theta + i\cos(j\theta)\sin\theta + i\sin(j\theta)\cos\theta - \sin(j\theta)\sin\theta \\
&= \cos[(j+1)\theta] + i\sin[(j+1)\theta],
\end{align*}
Answer to Exercise 169. (a) $|3 - 4i| = 5$ and $\arg(3 - 4i) = \tan^{-1}\frac{-4}{3} \approx -0.927$.
So $|(3 - 4i)^7| = 5^7$ and $\arg(3 - 4i)^7 = 7\tan^{-1}(-4/3) + 2k\pi \approx -0.2079$ (k = 1). So $(3 - 4i)^7 = 5^7 e^{i(-0.2079)}$.
(b) $|-5 + 12i| = 13$ and $\arg(-5 + 12i) = \tan^{-1}[12/(-5)] + \pi \approx 1.966$.
So $|(-5 + 12i)^8| = 13^8$ and $\arg(-5 + 12i)^8 = 8\left(\tan^{-1}\frac{12}{-5} + \pi\right) + 2k\pi \approx -3.125$ (k = −3). So $(-5 + 12i)^8 = 13^8 e^{i(-3.125)}$.
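As a cross-check of the moduli and principal arguments above, here is a small sketch using Python's built-in complex numbers. The helper name modulus_and_argument is made up for this illustration.

```python
import cmath

def modulus_and_argument(z, n):
    """Return (|z^n|, arg(z^n)), with the argument in (-pi, pi]."""
    w = z ** n
    return abs(w), cmath.phase(w)

# (a) (3 - 4i)^7: modulus 5**7, argument near -0.2079.
print(modulus_and_argument(3 - 4j, 7))
# (b) (-5 + 12i)^8: modulus 13**8, argument near -3.125.
print(modulus_and_argument(-5 + 12j, 8))
```

Because cmath.phase already returns the principal argument, the adjustment by 2kπ done by hand above happens implicitly here.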
Even without a calculator, we know that cos(0.5π) = 0 and sin(0.5π) = 1, and so in standard form, $z^{10} = 2^5 i = 32i$. (Alternatively, you may recognise that since $e^{i\pi} = -1$, we must have $e^{i\pi(0.5)} = i$. Thus, $2^5 e^{i\pi(0.5)} = 2^5 i = 32i$.)
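(b) $|2 + i| = \sqrt{5} = 5^{0.5}$ and $\arg(2 + i) = \tan^{-1}\frac{1}{2}$.
So $|z^{10}| = 5^5$ and $\arg z^{10} = 10 \times \tan^{-1}\frac{1}{2} + 2k\pi \approx 4.636 + 2k\pi \approx -1.647$ (k = −1). Altogether then, in polar and exponential forms, $z^{10} = 5^5[\cos(-1.647) + i\sin(-1.647)] = 5^5 e^{i(-1.647)}$.
For the standard form, just punch into your calculator $5^5 \times \cos\left(10\tan^{-1}\frac{1}{2}\right) = -237$ and $5^5 \times \sin\left(10\tan^{-1}\frac{1}{2}\right) = -3116$ to get $z^{10} = -237 - 3116i$.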
(c) $|1 - 3i| = \sqrt{10} = 10^{0.5}$ and $\arg(1 - 3i) = \tan^{-1}\frac{-3}{1}$.
So $|z^{10}| = 10^5$ and $\arg z^{10} = 10\tan^{-1}(-3) + 2k\pi \approx -12.490 + 2k\pi \approx 0.0759$ (k = 2). Altogether then, in polar and exponential forms, $z^{10} = 10^5[\cos(0.0759) + i\sin(0.0759)] = 10^5 e^{i(0.0759)}$.
For the standard form, just punch into your calculator $10^5 \times \cos(10\tan^{-1}(-3)) = 99712$ and $10^5 \times \sin(10\tan^{-1}(-3)) = 7584$ to get $z^{10} = 99712 + 7584i$.
(b) $z^{11} = 2 + i$ has modulus $\sqrt{5}$ and argument $\tan^{-1}\frac{1}{2}$. So the roots of the equation $z^{11} = 2 + i$ have modulus $5^{1/22}$ and arguments $\frac{\tan^{-1}(1/2) + 2k\pi}{11}$, for k = 0, ±1, ±2, ±3, ±4, ±5. Altogether then,
$z = 5^{1/22} e^{i[\tan^{-1}(1/2) + 2k\pi]/11}$, for k = 0, ±1, ±2, ±3, ±4, ±5.
(c) $z^{12} = 1 - 3i$ has modulus $\sqrt{10}$ and argument $\tan^{-1}(-3/1)$. So the roots of the equation $z^{12} = 1 - 3i$ have modulus $10^{1/24}$ and arguments $\frac{\tan^{-1}(-3) + 2k\pi}{12}$, for k = 0, ±1, ±2, ±3, ±4, ±5, 6.
$z = 10^{1/24} e^{i[\tan^{-1}(-3) + 2k\pi]/12}$, for k = 0, ±1, ±2, ±3, ±4, ±5, 6.
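The recipe used in these answers (take the n-th root of the modulus, then divide the argument plus 2kπ by n) can be sketched in code as follows. The function name nth_roots is hypothetical, and the range of k is chosen to match the ranges quoted above.

```python
import cmath
import math

def nth_roots(w, n):
    """Return the n distinct n-th roots of the complex number w."""
    modulus = abs(w) ** (1.0 / n)
    argument = cmath.phase(w)
    # k runs over -5..5 when n = 11 and over -5..6 when n = 12.
    k_values = range(-((n - 1) // 2), n // 2 + 1)
    return [cmath.rect(modulus, (argument + 2 * math.pi * k) / n)
            for k in k_values]

roots_b = nth_roots(2 + 1j, 11)   # roots of z^11 = 2 + i
roots_c = nth_roots(1 - 3j, 12)   # roots of z^12 = 1 - 3i
print(len(roots_b), abs(roots_b[0]), 5 ** (1 / 22))
print(len(roots_c), abs(roots_c[0]), 10 ** (1 / 24))
```

Raising each returned root to the n-th power should recover w up to floating-point error.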
Answer to Exercise 173. Given $x = \cos t + t^2$ and $y = e^t - t^3$, we may compute $\frac{dx}{dt} = -\sin t + 2t$ and $\frac{dy}{dt} = e^t - 3t^2$. So $\frac{dy}{dx} = \frac{e^t - 3t^2}{-\sin t + 2t}$ (for $-\sin t + 2t \neq 0$).
Answer to Exercise 174. Given $x = t^5 + t$ and $y = t^4 - t$, we have t = 0 ⟹ (x, y) = (0, 0). And t = 1 ⟹ (x, y) = (2, 0).
Compute $\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{4t^3 - 1}{5t^4 + 1}$ (for $5t^4 + 1 \neq 0$).
And so at t = 0, $\frac{dy}{dx} = \frac{4t^3 - 1}{5t^4 + 1} = -1$ and so the tangent line at t = 0 has equation y = −x.
While at t = 1, $\frac{dy}{dx} = \frac{4t^3 - 1}{5t^4 + 1} = \frac{3}{6} = \frac{1}{2}$ and so the tangent line at t = 1 has equation $y = \frac{1}{2}(x - 2) = \frac{1}{2}x - 1$.
And so at their intersection, we have $-x = \frac{1}{2}x - 1$ or $x = \frac{2}{3}$. So their intersection point is $\left(\frac{2}{3}, -\frac{2}{3}\right)$.
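The tangent-line computation can be replayed with exact rational arithmetic. This is a minimal sketch; the helper names point, slope, and tangent are invented for the illustration.

```python
from fractions import Fraction

def point(t):
    """Point (x, y) on the curve x = t^5 + t, y = t^4 - t."""
    return t**5 + t, t**4 - t

def slope(t):
    """dy/dx = (4t^3 - 1) / (5t^4 + 1) at parameter value t."""
    return Fraction(4 * t**3 - 1, 5 * t**4 + 1)

def tangent(t):
    """Return (gradient, y-intercept) of the tangent line at t."""
    m = slope(t)
    x0, y0 = point(t)
    return m, y0 - m * x0

m0, c0 = tangent(0)   # tangent at t = 0
m1, c1 = tangent(1)   # tangent at t = 1

# Solve m0*x + c0 = m1*x + c1 for the intersection point.
x = (c1 - c0) / (m0 - m1)
y = m0 * x + c0
print(x, y)
```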
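(b) By the Pythagorean Theorem, $l = \sqrt{r^2 + h^2} = \sqrt{\frac{3}{\pi h} + h^2}$.
(c) The total external surface area of the cone (including the base) is
$A = \pi r l = \pi\sqrt{\frac{3}{\pi h}}\sqrt{\frac{3}{\pi h} + h^2} = \pi\sqrt{\frac{9}{\pi^2 h^2} + \frac{3h}{\pi}} = \sqrt{\frac{9}{h^2} + 3\pi h}$.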
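(d) Compute $\frac{dA}{dh} = \frac{\frac{-18}{h^3} + 3\pi}{2\sqrt{\frac{9}{h^2} + 3\pi h}} = \frac{3}{2}\,\frac{\pi - \frac{6}{h^3}}{\sqrt{\frac{9}{h^2} + 3\pi h}} = \frac{3}{2}\,\frac{\pi - \frac{6}{h^3}}{A}$. So $\frac{dA}{dh} = 0 \iff h = \left(\frac{6}{\pi}\right)^{1/3}$.
(e) $\frac{d^2A}{dh^2} = \frac{3}{2}\,\frac{\frac{18}{h^4}A - \left(\pi - \frac{6}{h^3}\right)\frac{dA}{dh}}{A^2} = \frac{3}{2}\,\frac{\frac{18}{h^4}A - \left(\pi - \frac{6}{h^3}\right)\frac{3}{2}\frac{\pi - \frac{6}{h^3}}{A}}{A^2} = \frac{9}{4}\,\frac{\frac{12}{h^4}A^2 - \left(\pi - \frac{6}{h^3}\right)^2}{A^3}$.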
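(f) $\frac{12}{h^4}A^2 - \left(\pi - \frac{6}{h^3}\right)^2 = \frac{12}{h^4}\left(\frac{9}{h^2} + 3\pi h\right) - \left(\pi - \frac{6}{h^3}\right)^2 = \frac{72}{h^6} + \frac{48\pi}{h^3} - \pi^2$.
This is a ∪-shaped quadratic in $\frac{1}{h^3}$, whose discriminant is $(48\pi)^2 - 4(72)(-\pi^2) = 2592\pi^2 > 0$. So this expression is not always positive: it is positive when h is small, but negative when h is large.
(g) The numerator of our expression for $\frac{d^2A}{dh^2}$ therefore changes sign, and so we cannot argue that $\frac{dA}{dh}$ is always strictly increasing. Instead, look at the sign of $\frac{dA}{dh}$ directly. Since A > 0, $\frac{dA}{dh} = \frac{3}{2}\,\frac{\pi - \frac{6}{h^3}}{A}$ has the same sign as $\pi - \frac{6}{h^3}$, which is negative for $h < \left(\frac{6}{\pi}\right)^{1/3}$ and positive for $h > \left(\frac{6}{\pi}\right)^{1/3}$.
That is, A is strictly decreasing before the stationary point and strictly increasing after it. So the stationary point we found in (d) must also be the global minimum point.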
(d) Given i ∶ R → R defined by x ↦ ln(1 + x), we have i(0) = 0, $i'(x) = \frac{1}{1+x}$, $i'(0) = 1$, $i''(x) = -\frac{1}{(1+x)^2}$, $i''(0) = -1$, $i^{(3)}(x) = \frac{2}{(1+x)^3}$, and $i^{(3)}(0) = 2$. Thus, $M_3(x) = x - \frac{x^2}{2} + \frac{x^3}{3}$.
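$M_0\left(\frac{\pi}{2}\right) = 0$, $M_1\left(\frac{\pi}{2}\right) = M_2\left(\frac{\pi}{2}\right) = \frac{\pi}{2} \approx 1.571$,
$M_3\left(\frac{\pi}{2}\right) = M_4\left(\frac{\pi}{2}\right) = \frac{\pi}{2} - \frac{(\pi/2)^3}{3!} \approx 0.925$,
$M_5\left(\frac{\pi}{2}\right) = M_6\left(\frac{\pi}{2}\right) = \frac{\pi}{2} - \frac{(\pi/2)^3}{3!} + \frac{(\pi/2)^5}{5!} \approx 1.005$,
$M_n\left(\frac{\pi}{2}\right) \approx 1.000$, for all n ≥ 7.
It does appear “plausible” that sin (π) = M (π), because sin π = 0 and ...
$M_3(\pi) = M_4(\pi) = \pi - \frac{\pi^3}{3!} \approx -2.026$,
$M_5(\pi) = M_6(\pi) = \pi - \frac{\pi^3}{3!} + \frac{\pi^5}{5!} \approx 0.524$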
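The partial sums quoted above are easy to reproduce. The sketch below assumes that M_n denotes the degree-n Maclaurin polynomial of sin; the function name maclaurin_sin is made up for the illustration.

```python
import math

def maclaurin_sin(x, n):
    """Degree-n Maclaurin polynomial of sin, evaluated at x."""
    total = 0.0
    for k in range(1, n + 1, 2):              # only odd powers appear
        sign = (-1) ** ((k - 1) // 2)         # +, -, +, - ...
        total += sign * x**k / math.factorial(k)
    return total

for n in range(0, 10):
    print(n, maclaurin_sin(math.pi / 2, n), maclaurin_sin(math.pi, n))
```

The printed columns settle towards sin(π/2) = 1 and sin(π) = 0 as n grows, with the second column converging more slowly because π is further from 0.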
Answer to Exercise 180. For ln (x + 1) ∈ R, we have $\sin[\ln(x+1)] = \ln(x+1) - \frac{[\ln(x+1)]^3}{3!} + \dots$. For x ∈ (−1, 1], we have $\ln(x+1) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots$. Hence, for x ∈ (−1, 1] (this is the range of values for which the Maclaurin series for sin [ln(1 + x)] converges), we have
$\sin[\ln(x+1)] = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \dots\right) - \frac{\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \dots\right)^3}{3!} + \dots$
$= x - \frac{x^2}{2} + x^3\left(\frac{1}{3} - \frac{1}{3!}\right) + \dots = x - \frac{x^2}{2} + \frac{x^3}{6} + \dots$
Answer to Exercise 181. (a) F ′ (x) = 4 sin 4x, so that indeed F is an indefinite integral
for f . And:
(b) Although F and G seem to be very different functions, they actually differ only by a
constant (namely 1), as we now show:
So this example does not contradict the assertion that “the indefinite integral is unique up
to a constant”.
Answer to Exercise 182. $\frac{d}{dx}(kx + C) = k$ for all x, so by definition, $\int k\,dx = kx + C$.
Similarly,
$\frac{d}{dx}\left(\frac{x^{n+1}}{n+1} + C\right) = x^n \implies \int x^n\,dx = \frac{x^{n+1}}{n+1} + C$, ✓
$\frac{d}{dx}(e^x + C) = e^x \implies \int e^x\,dx = e^x + C$, ✓
$\frac{d}{dx}(-\cos x + C) = \sin x \implies \int \sin x\,dx = -\cos x + C$, ✓
$\frac{d}{dx}(\sin x + C) = \cos x \implies \int \cos x\,dx = \sin x + C$, ✓
$\frac{d}{dx}[F(x) \pm G(x) + C] = f(x) \pm g(x) \implies \int f(x) \pm g(x)\,dx = F(x) \pm G(x) + C$, ✓
$\frac{d}{dx}[kF(x) + C] = kf(x) \implies \int kf(x)\,dx = kF(x) + C$. ✓
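So indeed $\int \frac{1}{\sqrt{a^2 - x^2}}\,dx = \sin^{-1}\left(\frac{x}{a}\right) + C$ (for ∣x∣ < a).
(d) Let x ≠ ±a. Case #1: $\frac{a+x}{a-x} > 0$.
$\frac{d}{dx}\left(\frac{1}{2a}\ln\left|\frac{a+x}{a-x}\right| + C\right) = \frac{d}{dx}\left(\frac{1}{2a}\ln\frac{a+x}{a-x} + C\right) = \frac{1}{2a}\frac{d}{dx}[\ln(a+x) - \ln(a-x)]$
$= \frac{1}{2a}\left(\frac{1}{a+x} + \frac{1}{a-x}\right) = \frac{1}{2a}\,\frac{a - x + (a + x)}{(a+x)(a-x)} = \frac{1}{2a}\,\frac{2a}{a^2 - x^2} = \frac{1}{a^2 - x^2}$,
so that indeed $\int \frac{1}{a^2 - x^2}\,dx = \frac{1}{2a}\ln\left|\frac{a+x}{a-x}\right| + C$.
Case #2: $\frac{a+x}{a-x} < 0$.
$\frac{d}{dx}\left(\frac{1}{2a}\ln\left|\frac{a+x}{a-x}\right| + C\right) = \frac{d}{dx}\left(\frac{1}{2a}\ln\frac{a+x}{x-a} + C\right) = \frac{1}{2a}\frac{d}{dx}[\ln(a+x) - \ln(x-a)]$
$= \frac{1}{2a}\left(\frac{1}{a+x} - \frac{1}{x-a}\right) = \frac{1}{2a}\left(\frac{1}{a+x} + \frac{1}{a-x}\right) = \frac{1}{a^2 - x^2}$,
so that indeed $\int \frac{1}{a^2 - x^2}\,dx = \frac{1}{2a}\ln\left|\frac{a+x}{a-x}\right| + C$.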
Answer to Exercise 183. (f) Let x not be an integer multiple of π, so that sin x ≠ 0.
Case #1: sin x > 0.
$\frac{d}{dx}(\ln|\sin x| + C) = \frac{d}{dx}(\ln\sin x + C) = \frac{\cos x}{\sin x} = \cot x$.
Case #2: sin x < 0.
$\frac{d}{dx}(\ln|\sin x| + C) = \frac{d}{dx}[\ln(-\sin x) + C] = \frac{-\cos x}{-\sin x} = \cot x$.
Answer to Exercise 183. (g) Let x not be an integer multiple of π, so that csc x + cot x
is well-defined.
Case #1: csc x + cot x ≥ 0.
Answer to Exercise 183. (h) Let x not be an odd-integer multiple of π/2, so that
sec x + tan x is well-defined.
Case #1: sec x + tan x ≥ 0.
$\int \cos^2 x\,dx = \int \frac{\cos 2x + 1}{2}\,dx = \frac{1}{2}x + \frac{\sin 2x}{4} + C$.
$\int \tan^2 x\,dx = \int \sec^2 x - 1\,dx = \tan x - x + C$.
(d) Recall that $\sin P + \sin Q = 2\sin\frac{P+Q}{2}\cos\frac{P-Q}{2}$. So let $mx = \frac{P+Q}{2}$ and $nx = \frac{P-Q}{2}$.
So P = (m + n)x and Q = (m − n)x. Thus,
$\sin(mx)\cos(nx) = \frac{1}{2}\{\sin[(m+n)x] + \sin[(m-n)x]\}$
(e) Recall that $\cos P - \cos Q = -2\sin\frac{P+Q}{2}\sin\frac{P-Q}{2}$. So let $mx = \frac{P+Q}{2}$ and $nx = \frac{P-Q}{2}$.
So P = (m + n)x and Q = (m − n)x. Thus,
$\sin(mx)\sin(nx) = -\frac{1}{2}\{\cos[(m+n)x] - \cos[(m-n)x]\}$
(f) Recall that $\cos P + \cos Q = 2\cos\frac{P+Q}{2}\cos\frac{P-Q}{2}$. So let $mx = \frac{P+Q}{2}$ and $nx = \frac{P-Q}{2}$.
So P = (m + n)x and Q = (m − n)x. Thus,
$\cos(mx)\cos(nx) = \frac{1}{2}\{\cos[(m+n)x] + \cos[(m-n)x]\}$
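$= \int \frac{1}{3\sec^2 u\sqrt{\tan^2 u}}\,dx = \int \frac{1}{3\sec^2 u\tan u}\,dx$
$\overset{1}{=} \int \frac{1}{3\sec^2 u\tan u}\frac{dx}{du}\,du + C_1 = \int \frac{1}{3\sec^2 u\tan u}\,3\sec u\tan u\,du + C_1$
$= \int \frac{1}{\sec u}\,du + C_1 = \int \cos u\,du + C_1 = \sin u + C = \sin\left(\sec^{-1}\left(\frac{x}{3}\right)\right) + C$,
where $\overset{1}{=}$ uses Theorem 9 (“multiply by $\frac{du}{du} = 1$”).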
du
3 dx 3 9
(a) (ii) Note that the given substitution x = √ implies = and u = 1− .
1−u du 2(1 − u)3/2 x2
9 1 9 3 9 3
∫ √ dx = ∫ √ du + C1 = ∫ √ du + C1
x2 x2 − 9 x2 x2 − 9 2(1 − u)3/2 9 9
−9 2(1 − u)3/2
1−u 1−u
√
(1 − u)3/2 3 1 √ 9
=∫ √ du + C 1 = ∫ √ du + C 1 = u+C = 1− + C.
9 − 9(1 − u) 2(1 − u)3/2 2 u x2
1 du
where = uses Theorem 9 (“multiply by = 1”).
du
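(a) (iii) Let y = Hypotenuse / Adjacent. Fix “Adjacent” = 1, so that “Hypotenuse” = y and “Opposite” = $\sqrt{\text{Hypotenuse}^2 - \text{Adjacent}^2} = \sqrt{y^2 - 1^2} = \sqrt{y^2 - 1}$. So
$\sin(\sec^{-1} y) = \frac{\text{Opposite}}{\text{Hypotenuse}} = \frac{\sqrt{y^2 - 1}}{y} = \sqrt{1 - \frac{1}{y^2}}$.
$\sin\left(\sec^{-1}\left(\frac{x}{3}\right)\right) = \sqrt{1 - \frac{1}{\left(\frac{x}{3}\right)^2}} = \sqrt{1 - \frac{9}{x^2}}$.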
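$= \int \frac{\left(\frac{3}{2}\tan u\right)^3}{(9\sec^2 u)^{3/2}}\,dx = \int \frac{1}{8}\,\frac{\tan^3 u}{\sec^3 u}\,dx = \int \frac{1}{8}\sin^3 u\,dx$
$\overset{1}{=} \frac{1}{8}\int \sin^3 u\,\frac{dx}{du}\,du + C_1 = \frac{1}{8}\int \sin^3 u\,\frac{3}{2}\sec^2 u\,du + C_1 = \frac{3}{16}\int \frac{\sin^3 u}{\cos^2 u}\,du + C_1$
$= \frac{3}{16}\int \frac{1 - \cos^2 u}{\cos^2 u}\sin u\,du + C_1 = \frac{3}{16}\int \frac{\sin u}{\cos^2 u} - \sin u\,du + C_1$
$= \frac{3}{16}\left(\frac{1}{\cos u} + \cos u\right) + C = \frac{3}{16}\left[\frac{1}{\cos\left(\tan^{-1}\left(\frac{2x}{3}\right)\right)} + \cos\left(\tan^{-1}\left(\frac{2x}{3}\right)\right)\right] + C$,
where $\overset{1}{=}$ uses Theorem 9 (“multiply by $\frac{du}{du} = 1$”).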
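(b) (ii) $u = 4x^2 + 9$ implies $x = \left(\frac{u - 9}{4}\right)^{1/2} = \frac{1}{2}(u - 9)^{1/2}$ and $\frac{du}{dx} = 8x$. Now,
$\int \frac{x^3}{(4x^2 + 9)^{3/2}}\,dx = \int \frac{x^2}{(4x^2 + 9)^{3/2}}\,\frac{1}{8}\,8x\,dx = \int \frac{u - 9}{4u^{3/2}}\,\frac{1}{8}\,\frac{du}{dx}\,dx \overset{1}{=} \int \frac{u - 9}{32u^{3/2}}\,du + C_1$
where $\overset{1}{=}$ uses Theorem 9 (“cancel out dx’s”).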
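$\cos(\tan^{-1} y) = \frac{\text{Adjacent}}{\text{Hypotenuse}} = \frac{1}{\sqrt{1 + y^2}}$.
$\cos\left(\tan^{-1}\left(\frac{2x}{3}\right)\right) = \frac{1}{\sqrt{1 + \left(\frac{2x}{3}\right)^2}}$.
We now show that the answers in (i) and (ii) are exactly identical!
$\frac{3}{16}\left[\frac{1}{\cos\left(\tan^{-1}\left(\frac{2x}{3}\right)\right)} + \cos\left(\tan^{-1}\left(\frac{2x}{3}\right)\right)\right] = \frac{3}{16}\left[\sqrt{1 + \left(\frac{2x}{3}\right)^2} + \frac{1}{\sqrt{1 + \left(\frac{2x}{3}\right)^2}}\right]$
$= \frac{3}{16}\,\frac{1 + \left(\frac{2x}{3}\right)^2 + 1}{\sqrt{1 + \left(\frac{2x}{3}\right)^2}} = \frac{3}{16}\,\frac{2 + \left(\frac{2x}{3}\right)^2}{\sqrt{1 + \left(\frac{2x}{3}\right)^2}} = \frac{18 + 4x^2}{16\sqrt{9 + 4x^2}}$.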
Answer to Exercise 186. By the DETAIL rule of thumb, we should in both cases choose v′ = sin x. So
$\int \underbrace{x}_{u}\,\underbrace{\sin x}_{v'}\,dx = uv - \int u'v\,dx = -x\cos x + \int \cos x\,dx = -x\cos x + \sin x + C$.
$\int \underbrace{x^2}_{u}\,\underbrace{\sin x}_{v'}\,dx = uv - \int u'v\,dx = -x^2\cos x + \int 2x\cos x\,dx$
Answer to Exercise 187. For the Lower Sum $S_{L12}$, each rectangle has width (or base) 0.5. The first rectangle has height f(0), the second f(0.5), the third f(1), ..., the twelfth f(5.5). And so
$S_{L12} = \frac{1}{2}\left[f(0) + f\left(\frac{1}{2}\right) + \dots + f\left(\frac{11}{2}\right)\right] = \frac{1}{2}\left[\left(\sqrt{0} + 1\right) + \left(\sqrt{\frac{1}{2}} + 1\right) + \dots + \left(\sqrt{\frac{11}{2}} + 1\right)\right] \approx 15.116$.
For the Upper Sum $S_{U12}$, each rectangle again has width (or base) 0.5. The first rectangle has height f(0.5), the second f(1), the third f(1.5), ..., the twelfth f(6). And so
$S_{U12} = \frac{1}{2}\left[f\left(\frac{1}{2}\right) + f(1) + \dots + f(6)\right] = \frac{1}{2}\left[\left(\sqrt{\frac{1}{2}} + 1\right) + \left(\sqrt{1} + 1\right) + \dots + \left(\sqrt{6} + 1\right)\right] \approx 16.341$.
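Both sums are quick to reproduce. The sketch below assumes, as the working above suggests, that f(x) = √x + 1 on [0, 6] with 12 equal-width rectangles; because f is increasing, left endpoints give the lower sum and right endpoints the upper sum. The function name lower_and_upper_sums is invented for the illustration.

```python
import math

def f(x):
    return math.sqrt(x) + 1

def lower_and_upper_sums(func, a, b, n):
    """Lower and upper sums of an increasing function on [a, b] with n rectangles."""
    width = (b - a) / n
    lower = width * sum(func(a + i * width) for i in range(n))        # left endpoints
    upper = width * sum(func(a + (i + 1) * width) for i in range(n))  # right endpoints
    return lower, upper

lower, upper = lower_and_upper_sums(f, 0, 6, 12)
print(round(lower, 3), round(upper, 3))
```

Increasing n squeezes the two sums towards the exact area, which lies between them.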
Answer to Exercise 188. We do not know which the area function is, amongst the
infinitely-many indefinite integrals of f . We merely know that the area function is one of
them. Hence, we use the indefinite article an, rather than the definite article the.
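Method #1. The entire rectangle A + B + C + D has area $2^{1/3} \times 2 = 2^{4/3}$. The rectangle B + C has area 1 × 1 = 1. The region D has area $\int_1^{2^{1/3}} x^3\,dx = \left[\frac{x^4}{4}\right]_1^{2^{1/3}} = \frac{2^{4/3} - 1}{4}$. Hence,
$A = A + B + C + D - (B + C + D) = 2^{4/3} - \left(1 + \frac{2^{4/3} - 1}{4}\right) = \frac{3}{4}\left(2^{4/3} - 1\right)$.
[Figure: the regions A, B, C, and D, with the horizontal lines y = 1 and y = 2.]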
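Answer to Exercise 191. By the quadratic formula, the two curves intersect at $\pm\sqrt{2}/2$. So
$A = \int_{-\sqrt{2}/2}^{\sqrt{2}/2} 2 - x^2 - (x^2 + 1)\,dx = \left[x - \frac{2x^3}{3}\right]_{-\sqrt{2}/2}^{\sqrt{2}/2} = \left[\frac{\sqrt{2}}{2} - \frac{2\sqrt{2}}{12}\right] - \left[-\frac{\sqrt{2}}{2} + \frac{2\sqrt{2}}{12}\right] = \frac{2\sqrt{2}}{3}$.
$\int_{-2}^{2} x^4 - 16\,dx = \left[\frac{x^5}{5} - 16x\right]_{-2}^{2} = \left(\frac{32}{5} - 32\right) - \left(\frac{-32}{5} + 32\right) = -\frac{256}{5}$.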
Answer to Exercise 193. (Again, it helps to graph this on your calculator.) Note that $y = 1 \iff t = \sqrt[3]{2}$; $y = 2 \iff t = \sqrt[3]{3}$; and $dy/dt = 3t^2$. So the area can be computed as:
$\int_{y=1}^{y=2} x\,dy = \int_{t=\sqrt[3]{2}}^{t=\sqrt[3]{3}} (t^2 + 2t)\,3t^2\,dt = \left[\frac{3t^5}{5} + \frac{6t^4}{4}\right]_{\sqrt[3]{2}}^{\sqrt[3]{3}}$
$\int_0^{\pi} \pi y^2\,dx = \int_0^{\pi} \pi\sin^2 x\,dx = \pi\left[\frac{1}{2}x - \frac{\sin 2x}{4}\right]_0^{\pi} = \frac{\pi^2}{2}$.
$y = \int \underbrace{e^x}_{v'}\,\underbrace{\sin x}_{u}\,dx = e^x\sin x - \int \underbrace{\cos x}_{u}\,\underbrace{e^x}_{v'}\,dx = e^x\sin x - \left(e^x\cos x + \int \sin x\,e^x\,dx\right)$
$\iff y = \frac{e^x}{2}(\sin x - \cos x) + C$ is the general solution.
Given also the initial condition x = 0 ⟹ y = 1, we find that $1 = \frac{e^0}{2}(\sin 0 - \cos 0) + C = C - 0.5 \iff C = 1.5$. Thus, the particular solution is $y = \frac{e^x}{2}(\sin x - \cos x) + 1.5$.
2
Answer to Exercise 196. Rearranging, $\frac{dx}{dy} = \frac{1}{y^2 + 1}$. So the general solution is $x = \int \frac{1}{y^2 + 1}\,dy = \tan^{-1} y + C$ (Proposition 10). Rearranging, the general solution is y = tan (x + D) (where D = −C).
Given also the initial condition x = 0 ⟹ y = 1, we have C = −π/4. So the particular solution is $x = \tan^{-1} y - \pi/4$.
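$\frac{dy}{dx} = \int \underbrace{e^x}_{v'}\,\underbrace{\sin x}_{u}\,dx \overset{1}{=} e^x\sin x - \int \underbrace{\cos x}_{u}\,\underbrace{e^x}_{v'}\,dx = e^x\sin x - \left(e^x\cos x + \int \sin x\,e^x\,dx\right)$
$\iff \frac{dy}{dx} = \frac{e^x}{2}(\sin x - \cos x) + C_1$
$\implies y = \int \frac{dy}{dx}\,dx = \int \frac{e^x}{2}(\sin x - \cos x) + C_1\,dx$
$= \frac{1}{2}\left[\frac{e^x}{2}(\sin x - \cos x) + C_2 - \int e^x\cos x\,dx + C_1 x\right]$
$\overset{2}{=} \frac{1}{2}\left[\frac{e^x}{2}(\sin x - \cos x) + C_2 - \frac{e^x}{2}(\sin x + \cos x) + C_3 + C_1 x\right]$
$= \frac{1}{2}\left(-e^x\cos x + C_1 x + C_4\right)$,
where $\overset{2}{=}$ used $\overset{1}{=}$. The general solution is $y = \frac{1}{2}\left(-e^x\cos x + C_1 x + C_4\right)$.
$1 = \frac{1}{2}\left(-e^0\cos 0 + C_1 \cdot 0 + C_4\right) \implies C_4 = 3$,
$2 = \frac{1}{2}\left(-e^{\pi/2}\cos\frac{\pi}{2} + C_1 \cdot \frac{\pi}{2} + 3\right) \implies C_1 = \frac{2}{\pi}$.
So the particular solution is $y = \frac{1}{2}\left(-e^x\cos x + \frac{2}{\pi}x + 3\right)$.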
dx 2
(b) (i) Newton’s Second Law of Motion is that $F = \frac{d}{dt}(mv)$.
(ii) By the Product Rule, $F = \frac{d}{dt}(mv) = v\frac{dm}{dt} + m\frac{dv}{dt}$. Assuming that m is constant, we have $\frac{dm}{dt} = 0$ and hence $F = m\frac{dv}{dt}$.
(c) Taking the Earth to be immobile, the force of gravitation is the rate of change of momentum of the small ball. That is,
$F = \frac{GMm}{r^2} = -m\frac{dv}{dt}$.
The ball drops towards the surface of the Earth at an increasing speed. By assumption, downwards is the negative direction. Hence the negative sign.
Cancelling out the m’s yields $\frac{GM}{r^2} = -\frac{dv}{dt}$.
(d) (i) $\int_{R+x}^{R} \frac{Gm_1}{r^2}\,dr = Gm_1\left[-\frac{1}{r}\right]_{R+x}^{R} = Gm_1\left(-\frac{1}{R} + \frac{1}{R+x}\right)$.
v
r=R dv r=R dr v=vs v2 s vs2
(ii) ∫r=R+x dt dr = ∫r=R+x dt dv =∫
v=0
v dv = [ ] = − .
2 0 2
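(iii) $Gm_1\left(-\frac{1}{R} + \frac{1}{R+x}\right) = -\frac{v_s^2}{2} \iff v_s = \pm\sqrt{2Gm_1\left(\frac{1}{R} - \frac{1}{R+x}\right)}$.
By assumption, downwards is the negative direction. So for (d) (iii), we must have $v_s = -\sqrt{2Gm_1\left(\frac{1}{R} - \frac{1}{R+x}\right)}$.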
(e) This is simply the same process as before, but in reverse. The ball will keep moving upwards, but the force of gravitation will keep pulling it down, reducing its velocity at a rate given by the equation $\frac{Gm_1}{r^2} = -\frac{dv}{dt}$. Eventually, the velocity of the ball will hit 0 and then start going negative (i.e. the ball will start falling down towards the Earth).
Hence, if x is the maximum height attained by the ball, we have $V = \sqrt{2GM\left(\frac{1}{R} - \frac{1}{R+x}\right)}$.
(f) In order for the ball to never fall back down to earth, it must be that the ball keeps going upwards and never reaches any maximum height. That is, x → ∞. Thus, $v_e = \lim_{x\to\infty} V = \lim_{x\to\infty}\sqrt{2GM\left(\frac{1}{R} - \frac{1}{R+x}\right)} = \sqrt{\frac{2GM}{R}}$.
$v_e = \sqrt{\frac{2GM}{R}} = \sqrt{\frac{2 \cdot 6.674 \times 10^{-11} \cdot 5.972 \times 10^{24}}{6371000}} \approx 11,190$.
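The arithmetic in (f) can be reproduced directly. This sketch uses the same constants as the calculation above, in SI units; the function name launch_speed is made up for the illustration, and it shows the launch speed for a given maximum height x approaching the escape velocity as x grows.

```python
import math

G = 6.674e-11      # gravitational constant
M = 5.972e24       # mass of the Earth in kg
R = 6_371_000      # radius of the Earth in m

def launch_speed(x):
    """Launch speed needed to reach maximum height x above the surface."""
    return math.sqrt(2 * G * M * (1 / R - 1 / (R + x)))

escape_velocity = math.sqrt(2 * G * M / R)

for height in (1e6, 1e8, 1e10, 1e12):
    print(height, launch_speed(height))
print(escape_velocity)
```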
Answer to Exercise 200. Taking the green path, there are 3 ways. Taking the red
path, there are 2 ways. Hence, there are 3 + 2 = 5 ways to get from the Starting Point to
the River.
Answer to Exercise 203. We must choose three 4D numbers. Choosing the first 4D
number involves four decisions — what to put as the first, second, third and fourth digits,
with the condition that no digit is repeated.
____
1 2 3 4
Thus, by the MP, there are 10 × 9 × 8 × 7 = 5040 ways to choose the first 4D number.
If we ignored the fact that we already chose the first 4D number, then there’d similarly be
5040 ways to choose the second 4D number (given the condition that this second 4D number
does not have any repeated digits). However, there is an additional condition — namely,
the second 4D number cannot be the same as the first. Thus, there are 5040 − 1 = 5039
ways to choose the second 4D number.
By similar reasoning, we see that there are 5040 − 2 = 5038 ways to choose the third 4D
number.
Altogether then, by the MP, there are 5040 × 5039 × 5038 = 127, 947, 869, 280 ways to choose
the three 4D numbers.
1. The food court and hawker centre share 2 types of cuisine (Chinese and Western) in
common. And so together, the food court and the hawker centre have 4 + 3 − 2 = 5
different types of cuisine.
2. Combine together the food court and the hawker centre (call this the “Low-Class Place”).
The Low-Class Place has 5 types of cuisine and shares 2 types of cuisine (Chinese and
Malay) with the restaurant. And so together, the Low-Class Place and restaurant have
5 + 3 − 2 = 6 different types of cuisine (namely Chinese, Indonesian, Japanese, Korean,
Malay, and Western).
Answer to Exercise 209. The problem of choosing a president and vice-president from
a committee of 11 members is equivalent to the problem of filling 2 spaces with 11 distinct
objects. The answer is thus P (11, 2) = 11!/9! = 11 × 10 = 110.
Answer to Exercise 210. Let B and S stand for brother and sister, respectively.
(a) First consider the problem of permuting the seven letters in BBBBSSS, without any
two B’s next to each other. There is only 1 possible arrangement, namely BSBSBSB.
There are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 1 × 4!3! = 144 possible ways to arrange the siblings in a line, so
that no two brothers are next to each other.
(b) First consider the problem of permuting the seven letters in BBBBSSS, without any
two S’s next to each other. We’ll use the AP.
1. B in position #1.
(a) B in position #2. Then the only way to fill the remaining five positions is SBSBS.
Total: 1 possible arrangement.
(b) S in position #2. Then we must have B in position #3.
i. B in position #4. Then the only way to fill the remaining three positions is
SBS. Total: 1 possible arrangement.
ii. S in position #4. Then we must have B in position #5. And there are two
ways to fill the remaining two positions: either BS or SB. Total: 2 possible
arrangements.
2. S in position #1. Then we must have B in position #2.
(a) B in position #3. Then, like in 1(b), we are left with two B’s and two S’s to fill
the remaining four positions. Hence, Total: 3 possible arrangements.
(b) S in position #3. Then we must have B in position #4. There are three ways
to fill the remaining three positions: SBB, BSB, and BBS. Total: 3 possible
arrangements.
Again, there are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 10 × 4!3! = 1440 possible ways to arrange the siblings in a line, so
that no two sisters are next to each other.
(c) We saw that there was only 1 possible (linear) permutation of BBBBSSS that satisfied
the restriction, namely BSBSBSB.
If we now arrange the siblings in a circle, there will necessarily be two brothers next to each
other.
We thus conclude: There are 0 possible ways to arrange the siblings in a circle so that no
two brothers are next to each other.
(d) In part (b), we found 10 possible (linear) permutations of BBBBSSS that satisfied
the restriction.
Of these, 3 have sisters at the two ends: SBSBBBS, SBBSBBS, and SBBBSBS. If
arranged in a circle, these 3 arrangements would involve two sisters next to each other. So
we must deduct these 3 arrangements.
And now again, we must now take into account the fact that the brothers are distinct and
the sisters are distinct. We conclude that there are in total 1 × 4!3! = 144 possible ways to
arrange the siblings in a circle, so that no two sisters are next to each other.
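All four counts can be confirmed by brute force, since there are only 7! = 5040 ways to line up the siblings. The sketch below labels the siblings B1 to B4 and S1 to S3 (labels invented for the illustration). For the circular cases it counts line-ups whose two ends are also treated as neighbours, then divides by 7 because each circular arrangement appears as 7 rotated line-ups.

```python
from itertools import permutations

people = ["B1", "B2", "B3", "B4", "S1", "S2", "S3"]

def neighbours(line, circular):
    """Pairs of people sitting next to each other."""
    pairs = list(zip(line, line[1:]))
    if circular:
        pairs.append((line[-1], line[0]))   # the two ends meet in a circle
    return pairs

def count(kind, circular):
    """Arrangements in which no two people of the given kind ('B' or 'S') are adjacent."""
    total = 0
    for line in permutations(people):
        if all(not (a[0] == kind and b[0] == kind)
               for a, b in neighbours(line, circular)):
            total += 1
    return total // len(people) if circular else total

print(count("B", False), count("S", False), count("B", True), count("S", True))
```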
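$\binom{n}{k} = \frac{n!}{k!(n-k)!}$
$= \frac{n \times (n-1) \times \dots \times (n-k+1) \times (n-k) \times (n-k-1) \times \dots \times 1}{k!\,(n-k) \times (n-k-1) \times \dots \times 1}$
$= \frac{n \times (n-1) \times (n-2) \times \dots \times (n-k+1)}{k!}$ (mass cancellation).
$C(4, 2) = \frac{4!}{2!(4-2)!} = \frac{4!}{2!2!} = \frac{4 \times 3}{2 \times 1} = 6$,
$C(6, 4) = \frac{6!}{4!(6-4)!} = \frac{6!}{4!2!} = \frac{6 \times 5}{2 \times 1} = 15$,
$C(7, 3) = \frac{7!}{3!(7-3)!} = \frac{7!}{3!4!} = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 35$.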
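Answer to Exercise 213. $\binom{3}{1}\binom{7}{2}\binom{5}{2} = 630$.
(c) $C(17, 2) + C(17, 3) = \frac{17!}{2!15!} + \frac{17!}{3!14!} = \frac{17 \times 16}{2 \times 1} + \frac{17 \times 16 \times 15}{3 \times 2 \times 1}$
$= 17 \times 8 + 17 \times 8 \times 5 = 17 \times 8 \times 6 = \frac{18 \times 17 \times 16}{3 \times 2 \times 1}$.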
Consider the 8 terms on the right. There is C(3, 0) = 1 way to choose 0 of the x’s. Hence, the coefficient on $x^0$ is C(3, 0); this corresponds to the term 1 ⋅ 1 ⋅ 1 above.
There are C(3, 1) = 3 ways to choose 1 of the x’s. Hence, the coefficient on $x^1$ is C(3, 1); this corresponds to the terms 1 ⋅ 1 ⋅ x, 1 ⋅ x ⋅ 1, and x ⋅ 1 ⋅ 1 above.
There are C(3, 2) = 3 ways to choose 2 of the x’s. Hence, the coefficient on $x^2$ is C(3, 2); this corresponds to the terms 1 ⋅ x ⋅ x, x ⋅ 1 ⋅ x, and x ⋅ x ⋅ 1 above.
There is C(3, 3) = 1 way to choose 3 of the x’s. Hence, the coefficient on $x^3$ is C(3, 3); this corresponds to the term x ⋅ x ⋅ x above.
Altogether then,
Answer to Exercise 219. (a) There are $\binom{4}{2} = 6$ ways of choosing the two Tan sons and $\binom{3}{2} = 3$ ways of choosing the two Wong daughters.
Having chosen these sons and daughters, there are only 2! = 2 × 1 possible ways of matching them up. This is because for the first chosen Tan son, we have 2 possible choices of brides for him. And then for the second chosen Tan son, there is only 1 possible choice of bride left for him.
Altogether then, there are $\binom{4}{2}\binom{3}{2} \cdot 2 = 36$ ways of forming the two couples.
(b) There are $\binom{6}{5} = 6$ ways of choosing the five Lee sons and $\binom{9}{5} = 126$ ways of choosing the five Ho daughters.
Having chosen these sons and daughters, there are 5! = 5 × 4 × 3 × 2 × 1 possible ways of matching them up. This is because for the first chosen Lee son, we have 5 possible choices of brides for him. And then for the second chosen Lee son, there are 4 possible choices of brides left for him. Etc.
Altogether then, there are $\binom{6}{5}\binom{9}{5} \cdot 5! = 6 \cdot 126 \cdot 5! = 90,720$ ways of forming the five couples.
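The counting pattern in both parts is the same: choose k sons, choose k daughters, then match them up in k! ways. A minimal sketch follows, assuming (from the binomial coefficients above) 4 Tan sons and 3 Wong daughters in part (a), and 6 Lee sons and 9 Ho daughters in part (b). The function name ways_to_form_couples is invented for the illustration.

```python
from math import comb, factorial

def ways_to_form_couples(sons, daughters, k):
    """Choose k sons and k daughters, then pair them up."""
    return comb(sons, k) * comb(daughters, k) * factorial(k)

print(comb(4, 2), comb(3, 2), ways_to_form_couples(4, 3, 2))   # part (a)
print(comb(6, 5), comb(9, 5), ways_to_form_couples(6, 9, 5))   # part (b)
```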
S = {A«, K«, Q«, . . . , 2«, Aª, Kª, Qª, . . . , 2ª, A©, K©, Q©, . . . , 2©, A¨, K¨, Q¨, . . . , 2¨} .
(a) (ii) Since there are 52 possible outcomes, there are $2^{52}$ possible events. Hence, the event space contains $2^{52}$ elements. It is too tedious to write this out explicitly.
(a) (iii) As always, P has domain Σ and codomain R. We have P({3©}) = P({5♣}) = 1/52 and P({3©, 5♣}) = 2/52. In general, given any event A ∈ Σ, we have
$P(A) = \frac{|A|}{|S|} = \frac{|A|}{52}$.
In words, given any event A, its probability P(A) is simply the number of elements it
contains, divided by 52. So for example, P ({3©, 5♣, A«}) = 3/52, as we would expect.
(a) (iv) John might argue that since packs of poker cards usually come with Jokers, there
is the possibility that we mistakenly included one or more Jokers in our deck of cards. He
might thus argue that to cover this possibility, we should set our sample space to be
S = {A«, K«, . . . , 2«, Aª, Kª, . . . , 2ª, A©, K©, . . . , 2©, A¨, K¨, . . . , 2¨, Joker} .
(b) (ii) Since there are 4 possible outcomes, there are $2^4 = 16$ possible events. Hence, the event space contains 16 elements. It is not too tedious to write these out explicitly:
Σ = {∅, {HH}, {HT}, {TH}, {TT}, {HH, HT}, {HH, TH}, {HH, TT},
{HT, TH}, {HT, TT}, {TH, TT}, {HH, HT, TH},
{HH, HT, TT}, {HH, TH, TT}, {HT, TH, TT}, S}.
(b) (iii) As always, P has domain Σ and codomain R. We have P({HH}) = P({HT}) = 1/4 and P({HH, HT, TH}) = 3/4. In general, given any event A ∈ Σ, we have
$P(A) = \frac{|A|}{|S|} = \frac{|A|}{4}$.
In words, given any event A, its probability P(A) is simply the number of elements it
contains, divided by 4. So for example, P ({T H, T T }) = 2/4, as we would expect.
(b) (iv) John might, as before, argue that there is the possibility that a coin lands on its
edge. He might thus argue that the sample space should be
⎧
⎪ ⎫
⎪
⎪ ⎪
S=⎨ , ,..., , ,..., , ,..., ⎬.
⎪
⎪ ⎪
⎪
⎩ ⎭
(c) (ii) Since there are 36 possible outcomes, there are $2^{36}$ possible events. Hence, the event space contains $2^{36}$ elements.
(c) (iii) As always, P has domain Σ and codomain R. Any event containing exactly one outcome has probability 1/36, and any event containing exactly two outcomes has probability 2/36. In general, given any event A ∈ Σ, we have
$P(A) = \frac{|A|}{|S|} = \frac{|A|}{36}$.
In words, given any event A, its probability P(A) is simply the number of elements it contains, divided by 36. So for example, an event containing four outcomes has probability 4/36, as we would expect.
(c) (iv) John might argue that there is the possibility that a die lands on a vertex. He might thus argue that the sample space contains $7^2 = 49$ outcomes and should be
⎧
⎪ ⎫
⎪
⎪ V V V ⎪
S=⎨ , ,..., , , ,..., , , ,..., ⎬.
⎪
⎪ V V V ⎪
⎪
⎩ ⎭
The mapping rule of the probability function would be appropriately adjusted. For example,
if John believes that any given die roll has probability 1/1000000 of landing on a vertex,
⎧ ⎫ ⎧ ⎫
⎛⎪
⎪V ⎪ ⎪⎞ 1 ⎪ ⎪
⎛⎪ ⎪⎞ 999999 2
then we might assign P ⎨ ⎬ = , P ⎨ ⎬ = ( ) , etc.
⎪V ⎪
⎝⎪ ⎪⎠ 10000002 ⎪ ⎪
⎝⎪ ⎪⎠ 1000000
⎩ ⎭ ⎩ ⎭
(b) The events A, Ac ∩ B, and Ac ∩ B c ∩ C are mutually exclusive. Moreover, their union
is A ∪ B ∪ C. Hence, by the Additivity Axiom (applied twice),
Answer to Exercise 222. Let A be the event that we rolled at least one even number
and B be the event that the sum of the two dice was 8. We have P(B) = 5/36 (see Exercise
228).
And A ∩ B can occur if and only if the two dice were (2, 6), (4, 4), or (6, 2). Hence, P(A ∩ B) = 3/36.
Altogether then,
$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{3/36}{5/36} = \frac{3}{5}$.
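The conditional probability can be confirmed by listing all 36 equally likely outcomes of two dice. This is a small sketch; the helper names prob, at_least_one_even, and sum_is_8 are invented for the illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 rolls of two dice

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def at_least_one_even(o):
    return o[0] % 2 == 0 or o[1] % 2 == 0

def sum_is_8(o):
    return sum(o) == 8

p_b = prob(sum_is_8)
p_a_and_b = prob(lambda o: at_least_one_even(o) and sum_is_8(o))
p_a_given_b = p_a_and_b / p_b
print(p_b, p_a_and_b, p_a_given_b)
```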
$P(\text{DNA match} \mid \text{Blood stain is not John Brown’s}) = \frac{1}{10,000,000}$.
$P(\text{Blood stain is not John Brown’s} \mid \text{DNA match}) = \frac{1}{10,000,000}$.
There is reason to believe that P (Blood stain is not John Brown’s) is much greater than
P (DNA match) and thus that P (Blood stain is not John Brown’s∣DNA match) is much
greater than P (DNA match∣Blood stain is not John Brown’s).
One important factor is that if the DNA database is large, then invariably we’d expect to find, purely by coincidence, a DNA match to the blood stain at the murder scene. As of May 2016, the US National DNA Index contains the DNA profiles of over 12.3 million individuals. And so, even if it were true that there is only probability $\frac{1}{10,000,000}$ that two random individuals have a DNA match, we’d expect to find a match, simply by combing through the entire US National DNA Index!
The error here is similar to the lottery example, where we conclude (erroneously) that a
lottery winner must have cheated, simply because it was so unlikely that she won.
Answer to Exercise 225. First, note that P (H1 ) = P (T1 ) = P (H2 ) = 0.5.
(a) P (H1 ∩ H2 ) = 0.25 = 0.5 × 0.5 = P (H1 ) P (H2 ), so that indeed H1 and H2 are independent.
(b) P (H2 ∩ T1 ) = 0.25 = 0.5×0.5 = P (H2 ) P (T1 ), so that indeed H2 and T1 are independent.
(c) Observe that H1 ∩ T1 = ∅ (it is impossible that “the first coin flip is heads” AND also
“the first coin flip is tails”).
Hence, P (H1 ∩ T1 ) = P (∅) = 0 ≠ 0.25 = 0.5 × 0.5 = P (H1 ) P (T1 ), so that indeed H1 and T1
are not independent.
Answer to Exercise 226. No, the journalist is incorrectly assuming that the probability
of one family member making the NBA is independent of another family member making
the NBA. But such an assumption is almost certainly false.
The same excellent genes that made Rick Barry a great basketball player, probably also
helped his three sons. Not to mention that having an NBA player as your father probably
helps a lot too.
The two events “family member #1 in NBA” and “family member #2 in NBA” are probably
not independent. So we cannot simply multiply probabilities together.
Answer to Exercise 227. First, note that P (H1 ) = P (T2 ) = P(X) = 0.5.
(a) P (H1 ∩ T2 ) = 0.25 = 0.5 × 0.5 = P (H1 ) P (T2 ), so that indeed H1 , T2 are independent.
P (H1 ∩ X) = 0.25 = 0.5 × 0.5 = P (H1 ) P (X), so that indeed H1 , X are independent.
P (T2 ∩ X) = 0.25 = 0.5 × 0.5 = P (T2 ) P (X), so that indeed T2 , X are independent.
Altogether then, H1 , T2 , and X are indeed pairwise independent.
(b) The event H1 ∩ T2 ∩ X is the same as the event H1 ∩ T2 . Thus, P (H1 ∩ T2 ∩ X) =
P (H1 ∩ T2 ) = 0.25 ≠ 0.5 × 0.5 × 0.5 = P (H1 ) P (T2 ) P(X), so that indeed the three events are
not independent.
(c) $P(E) = P(X \geq 10) = P(X = 10) + P(X = 11) + P(X = 12) = \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{6}{36} = \frac{1}{6}$.
⎛ ⎞ ⎛ ⎞
Q = 3 and Q = 3.
⎝ ⎠ ⎝ ⎠
⎛ ⎞ ⎛ ⎞
(P Q) = 15 and (P Q) = 12.
⎝ ⎠ ⎝ ⎠
HT T H, T HT H, T T HH, HT T T, T HT T, T T HT, T T T H, T T T T }.
The event space Σ is the set of all possible subsets of S and contains $2^{16}$ elements.
The probability function P ∶ Σ → R is defined by P(A) = ∣A∣/16, for any event A ∈ Σ.
HT T T, T HT T, T T HT, T T T H ↦ 1,
HHT T, HT HT, T HHT, HT T H, T HT H, T T HH ↦ 2,
HHHT, HHT H, HT HH, T HHH ↦ 3,
T T T T ↦ 0, HHHH ↦ 4.
⎧
⎪ ⎫
⎪
⎪
⎪
⎪ ⎪
⎪
⎪
S=⎨ , ,..., , ,..., , ,..., ⎬
⎪
⎪
⎪ ⎪
⎪
⎪
⎪
⎩ ⎪
⎭
The event space Σ is the set of all possible subsets of S and contains $2^{216}$ elements.
The probability function P ∶ Σ → R is defined by P(A) = ∣A∣/216, for any event A ∈ Σ.
(b) (ii) The range of X is {3, 4, 5, . . . , 18}. We now count the number of ways there are for
the three dice to reach a sum of 3, to reach a sum of 4, etc. This will enable us to write
down the mapping rule of the function X ∶ S → R.
To get a sum of 3, the three dice must be (1, 1, 1) or permutations thereof. There is thus $\frac{3!}{3!} = 1$ possibility.
To get a sum of 4, the three dice must be (1, 1, 2), or permutations thereof. There are thus $\frac{3!}{2!} = 3$ possibilities.
To get a sum of 5, the three dice must be (1, 1, 3), (1, 2, 2), or permutations thereof. There are thus $\frac{3!}{2!} + \frac{3!}{2!} = 6$ possibilities.
To get a sum of 6, the three dice must be (1, 1, 4), (1, 2, 3), (2, 2, 2), or permutations thereof. There are $\frac{3!}{2!} + 3! + \frac{3!}{3!} = 10$ such possibilities.
To get a sum of 7, the three dice must be (1, 1, 5), (1, 2, 4), (1, 3, 3), (2, 2, 3), or permutations thereof. There are $\frac{3!}{2!} + 3! + \frac{3!}{2!} + \frac{3!}{2!} = 15$ such possibilities.
To get a sum of 8, the three dice must be (1, 1, 6), (1, 2, 5), (1, 3, 4), (2, 2, 4), (2, 3, 3), or permutations thereof. There are $\frac{3!}{2!} + 3! + 3! + \frac{3!}{2!} + \frac{3!}{2!} = 21$ such possibilities.
To get a sum of 9, the three dice must be (1, 2, 6), (1, 3, 5), (1, 4, 4), (2, 2, 5), (2, 3, 4), (3, 3, 3), or permutations thereof. There are $3! + 3! + \frac{3!}{2!} + \frac{3!}{2!} + 3! + \frac{3!}{3!} = 25$ such possibilities.
To get a sum of 10, the three dice must be (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4), or permutations thereof. There are $3! + 3! + \frac{3!}{2!} + 3! + \frac{3!}{2!} + \frac{3!}{2!} = 27$ such possibilities.
By symmetry, there are also 27 ways to get a sum of 11; also 25 ways to get a sum of 12,
etc.
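These counts can be cross-checked by enumerating all 6³ = 216 ordered rolls of three dice and tallying the sums. A minimal sketch:

```python
from collections import Counter
from itertools import product

# Tally the sum of every ordered roll of three dice.
counts = Counter(sum(roll) for roll in product(range(1, 7), repeat=3))

for total in range(3, 19):
    print(total, counts[total])
```

The tallies are symmetric about 10.5, which is the symmetry used in the last step above.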
So X ∶ S → R is defined by
X(1, 1, 1) = 3,
X(1, 1, 2) = X(1, 2, 1) = X(2, 1, 1) = 4,
X(1, 1, 3) = X(1, 3, 1) = X(3, 1, 1) = X(1, 2, 2) = X(2, 1, 2) = X(2, 2, 1) = 5,
(b)(iii) $P(X = 3) = \frac{1}{216}$, $P(X = 4) = \frac{3}{216}$, $P(X = 5) = \frac{6}{216}$,
$P(X = 6) = \frac{10}{216}$, $P(X = 7) = \frac{15}{216}$, $P(X = 8) = \frac{21}{216}$,
$P(X = 9) = \frac{25}{216}$, $P(X = 10) = \frac{27}{216}$, $P(X = 11) = \frac{27}{216}$,
$P(X = 12) = \frac{25}{216}$, $P(X = 13) = \frac{21}{216}$, $P(X = 14) = \frac{15}{216}$,
$P(X = 15) = \frac{10}{216}$, $P(X = 16) = \frac{6}{216}$, $P(X = 17) = \frac{3}{216}$,
$P(X = 18) = \frac{1}{216}$, P(X = k) = 0,
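Answer to Exercise 233. (a) P(X + Y = 2) is simply the probability of 2 heads and 0 sixes OR 1 head and 1 six OR 0 heads and 2 sixes. So
$P(X + Y = 2) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} + \binom{2}{1}\frac{1}{2} \cdot \frac{1}{2} \cdot \binom{2}{1}\frac{5}{6} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{25}{144} + \frac{20}{144} + \frac{1}{144} = \frac{46}{144}$.
(b) P (X + Y = 3) is simply the probability of 2 heads and 1 six OR 1 head and 2 sixes. So
$P(X + Y = 3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \binom{2}{1}\frac{5}{6} \cdot \frac{1}{6} + \binom{2}{1}\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{10}{144} + \frac{2}{144} = \frac{12}{144}$.
(c) $P(X + Y = 4) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{144}$.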
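(d) $E[X + Y] = \sum_{k \in \text{Range}(X+Y)} P(X + Y = k) \cdot k$
$= P(X + Y = 0) \cdot 0 + P(X + Y = 1) \cdot 1 + P(X + Y = 2) \cdot 2 + P(X + Y = 3) \cdot 3 + P(X + Y = 4) \cdot 4$
$= \frac{25}{144} \cdot 0 + \frac{60}{144} \cdot 1 + \frac{46}{144} \cdot 2 + \frac{12}{144} \cdot 3 + \frac{1}{144} \cdot 4 = \frac{60 + 92 + 36 + 4}{144} = \frac{192}{144} = \frac{4}{3}$.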
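The whole distribution of X + Y can be built by enumeration. The sketch below assumes, as the working above indicates, that X counts heads in two fair coin flips and Y counts sixes in two fair die rolls; the dictionary names coin, die, and distribution are invented for the illustration.

```python
from fractions import Fraction
from itertools import product

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # heads contributed by one flip
die = {0: Fraction(5, 6), 1: Fraction(1, 6)}    # sixes contributed by one roll

distribution = {}
for c1, c2, d1, d2 in product(coin, coin, die, die):
    k = c1 + c2 + d1 + d2
    p = coin[c1] * coin[c2] * die[d1] * die[d2]
    distribution[k] = distribution.get(k, 0) + p

expected_value = sum(k * p for k, p in distribution.items())
print(distribution, expected_value)
```

The fractions print in lowest terms, so 46/144 appears as 23/72.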
(b) $P(X = 2000) = P(X = 1000) = P(X = 490) = \frac{1}{10000}$,
$P(X = 250) = P(X = 60) = \frac{10}{10000}$,
$P(X = 0) = \frac{9977}{10000}$,
$P(Y = 3000) = P(Y = 2000) = P(Y = 800) = \frac{1}{10000}$,
$P(Y = 0) = \frac{9997}{10000}$.
$E[Y] = \sum_{k \in \text{Range}(Y)} P(Y = k) \cdot k$
$= \frac{1}{10000} \cdot 3000 + \frac{1}{10000} \cdot 2000 + \frac{1}{10000} \cdot 800 + \frac{9997}{10000} \cdot 0 = 0.3 + 0.2 + 0.08 + 0 = 0.58$.
(d) For every $1 staked, the “big” game is expected to lose you $0.341 and the “small”
game is expected to lose you $0.42. Thus, the “big” game is expected to lose you less
money.
$E[Z^2] = \frac{1}{216} \cdot 3^2 + \frac{3}{216} \cdot 4^2 + \dots + \frac{1}{216} \cdot 18^2 = \frac{25704}{216} = 119$.
Hence, $V[Z] = E[Z^2] - \mu^2 = 119 - 10.5^2 = \frac{105}{12}$.
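Answer to Exercise 236. $E[Y] = \frac{35}{100} \times 20\text{ cm} + \frac{65}{100} \times 30\text{ cm} = 26.5\text{ cm}$.
$V[Y] = \frac{35}{100} \times (20\text{ cm} - 26.5\text{ cm})^2 + \frac{65}{100} \times (30\text{ cm} - 26.5\text{ cm})^2 = 22.75\text{ cm}^2$.
$SD[Y] = \sqrt{V[Y]} \approx 4.77\text{ cm}$.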
Answer to Exercise 238. Let X ∼ B (20, 0.01) be the number of components in engine
#1 that fail. Let Y ∼ B (35, 0.005) be the number of components in engine #2 that fail.
$$P(X\ge 2) = 1 - P(X\le 1) = 1 - P(X=0) - P(X=1) = 1 - \binom{20}{0}0.01^{0}\,0.99^{20} - \binom{20}{1}0.01^{1}\,0.99^{19} \approx 0.0169.$$
$$P(Y\ge 2) = 1 - P(Y\le 1) = 1 - P(Y=0) - P(Y=1) = 1 - \binom{35}{0}0.005^{0}\,0.995^{35} - \binom{35}{1}0.005^{1}\,0.995^{34} \approx 0.0133.$$
P (X ≥ 2) P (Y ≥ 2) ≈ 0.00022.
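The binomial calculations above can be reproduced with a short Python sketch. The helper names `binom_pmf` and `prob_at_least_two` are hypothetical, chosen only for this illustration, and the final product assumes (as the answer above does) that the two engines fail independently.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(W = k) for W ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least_two(n, p):
    """P(W >= 2) = 1 - P(W = 0) - P(W = 1) for W ~ B(n, p)."""
    return 1 - binom_pmf(n, p, 0) - binom_pmf(n, p, 1)

p_engine1 = prob_at_least_two(20, 0.01)    # X ~ B(20, 0.01)
p_engine2 = prob_at_least_two(35, 0.005)   # Y ~ B(35, 0.005)
p_both = p_engine1 * p_engine2             # assumes independence of the engines
```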
Answer to Exercise 239. (a) The rate at which cats are killed is probably not constant.
There are probably periods of months or years when cat-killers are particularly active, and
other periods when the cat-killers are either in jail or inactive. (Indeed, between 2011 and
2014, relatively few cats were killed in northern Singapore. However, during 2015-2016,
there were unusually many cats killed in northern Singapore.)
Thus, the Poisson random variable is not a suitable model for the number of cats killed in
northern Singapore.
(b) It is reasonable to suppose that errors occur at a constant rate and that the author is
no more or less likely to make an error, regardless of when his last error was committed.
Thus, the Poisson random variable is arguably a suitable model for the number of errors
in this textbook.
On the other hand, one could argue that errors do not occur at a constant rate. It is
conceivable that the author is sometimes tired while working, and is thus more error-prone
during such occasions. Other times he is high on caffeine (and possibly other stimulants)
and is thus less error-prone.
(c) The rate at which you receive emails is probably not constant. For example, most of
your emails received are probably during the day, because that is when most people are
awake. Thus, the Poisson random variable is not a suitable model for the number of emails
you receive in a 24-hour timespan.
P(X > 5)
1
P(X ≥ 5) ≈ P(Y ≥ 5) = 1 − P(Y ≤ 4) ≈ 1 − 0.3575 = 0.6425,
1
where ≈ was obtained, either by reading off a Poisson table, using a graphing calculator, or
manually doing the calculations on a calculator.
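$$\text{(a) } F_Y(k) = \begin{cases} 0, & \text{if } k < 3, \\ 0.5(k-3), & \text{if } k \in [3,5], \\ 1, & \text{if } k > 5. \end{cases} \qquad \text{(b) } f_Y(k) = \begin{cases} 0.5, & \text{if } k \in [3,5], \\ 0, & \text{otherwise.} \end{cases}$$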
(c) P (3.1 ≤ Y ≤ 4.6) = 0.75 is in blue and P (4.8 ≤ Y ≤ 4.9) = 0.05 is in red.
$$f_X(a) = \frac{1}{\sigma\sqrt{2\pi}}e^{-0.5\left(\frac{a-\mu}{\sigma}\right)^2} = \frac{1}{1\cdot\sqrt{2\pi}}e^{-0.5\left(\frac{a-0}{1}\right)^2} = \frac{1}{\sqrt{2\pi}}e^{-0.5a^2} = \varphi(a).$$
We’ve just shown that the PDF of X ∼ N(µ, σ²), when µ = 0 and σ² = 1, is the same as the PDF
of the SNRV Z ∼ N(0, 1). Hence, the SNRV is indeed simply a normal random variable
with mean µ = 0 and variance σ² = 1.
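Answer to Exercise 247. First observe that $\frac{X-\mu}{\sigma} = \frac{X}{\sigma} + \frac{-\mu}{\sigma}$. Now simply use Fact 79, with $a = \frac{1}{\sigma}$ and $b = \frac{-\mu}{\sigma}$:
$$\frac{X-\mu}{\sigma} = \frac{X}{\sigma} + \frac{-\mu}{\sigma} \sim N\left(\frac{\mu}{\sigma} + \frac{-\mu}{\sigma},\ \frac{1}{\sigma^2}\sigma^2\right) = N(0,1).$$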
(a) $P(X\ge 1) = P\left(Z \ge \frac{1-2.14}{\sqrt{5}}\right) \approx P(Z \ge -0.5098) = P(Z \le 0.5098) = \Phi(0.5098) \approx 0.6949.$
$P(Y\ge 1) = P\left(Z \ge \frac{1-(-0.33)}{\sqrt{2}}\right) \approx P(Z \ge 0.9405) = 1 - P(Z \le 0.9405) = 1 - \Phi(0.9405) \approx 0.1735.$
(b) Let B1 ∼ N (110, 1156), B2 ∼ N (110, 1156), . . . , B12 ∼ N (110, 1156) be the bills in each
of the 12 months.
Then the total bill in a year is T = B1 +B2 +⋅ ⋅ ⋅+B12 ∼ N (12 × 110, 12 × 1156) = N (1320, 13872).
Thus, P (T > 1000) ≈ 0.9967 (calculator).
Our goal is to find the value of x for which P (B > 100) = 0.1. We have
$$\Phi\left(\frac{50-200x}{\sqrt{256+10000x^2}}\right) = 0.9 \iff \frac{50-200x}{\sqrt{256+10000x^2}} \approx 1.2815.$$
One can rearrange, do the algebra (square both sides), and use the quadratic formula.
Alternatively, one can simply use one’s graphing calculator to find that x ≈ 0.084. We
conclude that the maximum value of x is approximately 0.084, for the probability
that the total utility bill in a given month exceeds $100 to be 0.1 or less.
Answer to Exercise 252. Let X be the random variable that is the sum of the weights
of the 5, 000 Coco-Pops.
The CLT says that since n = 5000 ≥ 20 is large enough and the distribution is “nice
enough” (we are assuming this), X can be approximated by the normal random variable
Y ∼ N (5000 × 0.1, 5000 × 0.004) = N (500, 20). Thus, P (X ≤ 499) ≈ P (Y ≤ 499) ≈ 0.4115
(calculator).
Answer to Exercise 253. Assume that the probability that a student passes is inde-
pendent of whether or not other students pass.
Assume also that the distribution is “nice enough”, so that since n = 1000 ≥ 20 is “big
enough”, we can use the CLT.
Let X be the number of passes. Then X can be approximated by the normal random
variable Y ∼ N (1000 × 0.9, 1000 × 0.9 × 0.1) = N (900, 90). Thus, using also the continuity
correction, we have P (X ≥ 920) ≈ P (Y ≥ 919.5) ≈ 0.0199 (calculator).
(This turns out to be a decent approximation because the exact probability, computed
using the binomial distribution, is P (X ≥ 920) ≈ 0.0176.)
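To see how the normal approximation with continuity correction compares with the exact binomial tail, here is a minimal Python sketch. It uses only the standard library; `normal_cdf` is a hypothetical helper built on `math.erf`, and the variable names are mine.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mean, var):
    """CDF of N(mean, var) evaluated at x, via the error function."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * var)))

n, p = 1000, 0.9
mean, var = n * p, n * p * (1 - p)            # 900 and 90

# CLT approximation with continuity correction: P(X >= 920) ~ P(Y >= 919.5)
approx = 1 - normal_cdf(919.5, mean, var)

# Exact binomial tail: sum of P(X = k) for k = 920, ..., 1000
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(920, n + 1))
```

Comparing `approx` with `exact` gives a feel for how much the continuity-corrected normal approximation over- or under-states the tail in this particular case.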
The CLT says that since n = 365 ≥ 20 is large enough and the distribution is “nice
enough” (we are assuming this), Y can be approximated by the normal random variable
T ∼ N (365 × 0.5, 365 × 0.5) = N (182.5, 182.5). Thus, using also the continuity correction,
we have
This is a decent approximation, because the exact probability, computed using the Poisson
distribution, is P(Y > 200) ≈ 0.0928.
Answer to Exercise 256. (a) The sample mean and sample variance are
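$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1885}{10} = 188.5,$$
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{378265 - \frac{1885^2}{10}}{9} \approx 2550.$$
$$s^2 = \frac{\sum_{i=1}^{n} (x_i-50)^2 - \frac{\left[\sum_{i=1}^{n} (x_i-50)\right]^2}{n}}{n-1} = \frac{378265 - \frac{1885^2}{10}}{9} \approx 2550.$$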
Answer to Exercise 257. (a) Assume that the weights of the five Singaporeans sampled
are independently- and identically-distributed. Then unbiased estimates for the population
mean µ and variance σ 2 of the weights of Singaporeans are, respectively, the observed
sample mean x̄ and observed sample variance s2 :
$$\bar{x} = \frac{\sum x_i}{n} = \frac{32+88+67+75+56}{5} = 63.6,$$
$$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} = \frac{32^2+88^2+67^2+75^2+56^2 - 5\times 63.6^2}{4} = 448.3.$$
(b) We don’t know! And unless we literally gather and weigh every single Singaporean, we
will never know what exactly the average weight of a Singaporean is.
All we’ve found in part (a) is an estimate (63.6 kg) for the average weight of a Singaporean.
We know that on average, the estimator we use “gets it right”.
However, it could well be that we’re unlucky (and got 5 unusually heavy or unusually light
persons) and the estimate of 63.6 kg is thus way off.
$$E[\bar{X}] = E\left[\frac{X_1+X_2+\dots+X_n}{n}\right] = \frac{E[X_1+X_2+\dots+X_n]}{n} = \frac{E[X_1]+E[X_2]+\dots+E[X_n]}{n} = \frac{\mu+\mu+\dots+\mu}{n} = \frac{n\mu}{n} = \mu.$$
We have just shown that E [X̄] = µ. In other words, we’ve just shown that X̄ is an unbiased
estimator for µ.
Answer to Exercise 259. (a) The observed random sample is (x1 , x2 , . . . , x10 ) =
(1, 1, 1, 1, 1, 1, 1, 0, 0, 0). The observed sample mean and observed sample variance are
$$\bar{x} = \frac{x_1+x_2+\dots+x_{10}}{n} = 0.7,$$
$$s^2 = \frac{(x_1-\bar{x})^2+(x_2-\bar{x})^2+\dots+(x_{10}-\bar{x})^2}{n-1} = \frac{7\cdot 0.3^2+3\cdot 0.7^2}{9} = 0.2\dot{3}.$$
(b) Yes, the observed sample mean x̄ = 0.7 is an unbiased estimate for the true population
mean µ (i.e. the true proportion of coin flips that are heads).
And yes, the observed sample variance $s^2 = 0.2\dot{3}$ is an unbiased estimate for the true
population variance σ².
(c) No, this is merely one observed random sample, from which we generated a single
estimate (“guess”) — namely x̄ = 0.7 — of the true population mean µ.
All we know is that the sample mean X̄ is an unbiased estimator for the true population
mean µ. That is, the average estimate generated by X̄ will equal µ.
However, any particular estimate x̄ may or may not be equal to µ. Indeed, if we’re unlucky,
our particular estimate may be very far from the true µ.
Answer to Exercise 260.
$$V[\bar{X}] = V\left[\frac{1}{n}(X_1+X_2+\dots+X_n)\right] = \frac{1}{n^2}V[X_1+X_2+\dots+X_n] = \frac{1}{n^2}\left(V[X_1]+V[X_2]+\dots+V[X_n]\right) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.$$
(b) The population variance σ² is the number defined by $\sigma^2 = \sum_{i=1}^{k}(x_i-\mu)^2/k$. It measures
the dispersion across the population values.
(c) The sample mean X̄ is a random variable defined by $\bar{X} = \sum_{i=1}^{n} X_i/n$. It is the average
of all values in a random sample.
(d) The sample variance S² is a random variable defined by $S^2 = \sum_{i=1}^{n}(X_i-\bar{X})^2/(n-1)$.
It measures the dispersion across the values in a random sample.
(e) The mean of the sample mean, also called the expected value of the sample mean, is the
number E [X̄]. The interpretation is that if we have infinitely-many observed samples
of size n, calculate the observed sample mean for each, then E [X̄] is equal to the average
across the observed sample means. It can be shown that E [X̄] = µ and hence that the
sample mean X̄ is an unbiased estimator for the population mean µ.
(f) The variance of the sample mean is the number V [X̄]. The interpretation is that if
we have infinitely-many observed random samples of size n, calculate the observed sample
mean for each, then V [X̄] measures the dispersion across the observed sample means.
(g) The mean of the sample variance, also called the expected value of the sample variance,
is the number E [S 2 ]. The interpretation is that if we have infinitely-many observed
random samples of size n, calculate the observed sample variance for each, then E [S 2 ] is
equal to the average across the observed sample variances. It can be shown that E [S 2 ] = σ 2
and hence that the sample variance S 2 is an unbiased estimator for the population variance
σ2 .
(h) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample mean as
$$\bar{x} = \frac{x_1+x_2+x_3}{3} = \frac{1+1+0}{3} = \frac{2}{3}.$$
The observed sample mean is the average of all values in an observed random sample.
(i) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample variance as
$$s^2 = \frac{(x_1-\bar{x})^2+(x_2-\bar{x})^2+(x_3-\bar{x})^2}{3-1} = \frac{1/9+1/9+4/9}{2} = \frac{1}{3}.$$
The observed sample variance measures the dispersion across the values in an observed random sample.
Answer to Exercise 262. Let µ be the probability that a coin-flip is heads. The null
and alternative hypotheses are
Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is
Answer to Exercise 263. Let µ be the true long-run proportion of coin-flips that are
heads. The null and alternative hypotheses are
Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is
Thus, the critical value is 15 (this is the value of t at which we are just able to reject H0 at
the α = 0.05 significance level).
And the critical region is {15, 16, . . . , 20} (this is the set of values of t at which we’d be able
to reject H0 at the α = 0.05 significance level).
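The critical value can also be found by brute force. The sketch below is only an illustration under the assumption of a one-tailed (upper-tail) test with T ∼ B(20, 0.5) under H0; the function names `upper_tail` and `critical_value` are hypothetical.

```python
from math import comb

def upper_tail(n, p, t):
    """P(T >= t) for T ~ B(n, p): the one-tailed p-value of an observed t."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def critical_value(n, p, alpha):
    """Smallest t whose upper-tail p-value is at most alpha (None if no such t)."""
    for t in range(n + 1):
        if upper_tail(n, p, t) <= alpha:
            return t
    return None

crit = critical_value(20, 0.5, 0.05)        # critical value at alpha = 0.05
region = list(range(crit, 21))              # critical region {crit, ..., 20}
p_value_17 = upper_tail(20, 0.5, 17)        # p-value for the observed t = 17
```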
(b) The competing hypotheses are H0 ∶ µ = 0.5, HA ∶ µ ≠ 0.5.
The test statistic T is the number of heads (out of the 20 coin-flips).
For t = 14, the corresponding p-value is
Thus, the critical value is 15 and the critical region is {15, 16, . . . , 20}.
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
$$\bar{x} = \frac{35+35+31+32+33+34+31+34+35+34}{10} = 33.4.$$
$$= P\left(Z \le \frac{33.4-34}{\sqrt{9/10}}\right) + P\left(Z \ge \frac{34.6-34}{\sqrt{9/10}}\right) \approx 0.5271.$$
The large p-value does not cast doubt on or provide evidence against H0 . We fail to reject
H0 at the α = 0.05 significance level.
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The small p-value casts doubt on or provides evidence against H0 . We reject H0 at the
α = 0.05 significance level.
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The observed sample mean is x̄ = 33.4. And the observed sample variance is s2 = 11.2.
The fairly small p-value casts some doubt on or provides some evidence against H0 . But
we fail to reject H0 at the α = 0.05 significance level.
$$\bar{x} = \frac{35+35+31+32+33+34+31+34+35+34}{10} = 33.4,$$
$$s^2 = \frac{\sum(x_i-\bar{x})^2}{n-1} = \frac{(35-33.4)^2+(35-33.4)^2+\dots+(34-33.4)^2}{10-1} \approx 2.489.$$
$$= P\left(T_9 \le \frac{33.4-34}{\sqrt{2.489/10}}\right) + P\left(T_9 \ge \frac{34.6-34}{\sqrt{2.489/10}}\right) \approx 0.2598.$$
The large p-value does not cast doubt on or provide evidence against H0 . We fail to reject
H0 at the α = 0.05 significance level.
Answer to Exercise 269. The observed sample mean is x̄ = 68 and the observed sample
variance (use Fact 81(a)) is
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left[\sum_{i=1}^{n} x_i\right]^2}{n}}{n-1} = \frac{50\times 5000 - \frac{(68\times 50)^2}{50}}{49} \approx 383.7.$$
Let µ be the true average weight of a Singaporean. The competing hypotheses are H0 ∶ µ =
75 and HA ∶ µ < 75.
(This is a one-tailed test, because your friend’s claim is that the average American is heavier
than the average Singaporean. If the claim were instead that the average American’s weight
is different from the average Singaporean’s, then we’d have a two-tailed test.)
Since the sample size n = 50 is “large enough”, we can appeal to the CLT. The p-value is
$$p = P(\bar{X} \le 68 \mid H_0) \overset{\text{CLT}}{\approx} P\left(Z \le \frac{68-75}{\sqrt{383.7/50}}\right) \approx 0.0058.$$
The small p-value casts doubt on or provides evidence against H0 . We can reject H0 at any
conventional significance level (α = 0.1, α = 0.05, or α = 0.01).
[Figure: scatter plot of q against p ($).]
$$\sum_{i=1}^{n}(p_i-\bar{p})(q_i-\bar{q}) = (8-\bar{p})(300-\bar{q}) + (9-\bar{p})(250-\bar{q}) + \dots + (8-\bar{p})(400-\bar{q})$$
$$= (8-7.8)(300-470) + (9-7.8)(250-470) + \dots + (8-7.8)(400-470) = -2480,$$
$$\sqrt{\sum_{i=1}^{n}(p_i-\bar{p})^2} = \sqrt{(8-\bar{p})^2+(9-\bar{p})^2+(4-\bar{p})^2+(10-\bar{p})^2+(8-\bar{p})^2}$$
$$= \sqrt{(8-7.8)^2+(9-7.8)^2+(4-7.8)^2+(10-7.8)^2+(8-7.8)^2} = \sqrt{20.8} \approx 4.56070170,$$
$$\sqrt{\sum_{i=1}^{n}(q_i-\bar{q})^2} = \sqrt{(300-\bar{q})^2+(250-\bar{q})^2+\dots+(400-\bar{q})^2}$$
$$= \sqrt{(300-470)^2+(250-470)^2+\dots+(400-470)^2} = \sqrt{368000} \approx 606.63003552.$$
(b)
i                  1      2      3      4      5
pi ($)             8      9      4      10     8
qi                 300    250    1000   400    400
q̂i                 446    327    923    208    446
ûi = qi − q̂i       −146   −77    77     192    −46
(c) [Figure: plot of q against p ($).]
(d) The SSR is $\sum_{i=1}^{5}\hat{u}_i^2 \approx (-146)^2 + (-77)^2 + 77^2 + 192^2 + (-46)^2 \approx 72308$ (the final figure uses the unrounded residuals).
After Step 9. After Step 10. After Step 11. After Step 12.
The TI84 tells us that r = −.8963881445 and the regression line is y = ax + b = −119.2307692x +
1400. This is indeed consistent with the answers from the previous exercises.
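For readers without a TI84, the same numbers can be reproduced with a few lines of Python. This is only a sketch using the five (p, q) observations from the table in this exercise; the function name `ols` and the variable names are my own.

```python
from math import sqrt

prices = [8, 9, 4, 10, 8]               # p_i, in dollars
quantities = [300, 250, 1000, 400, 400] # q_i

def ols(xs, ys):
    """Slope, intercept, correlation coefficient r, and sum of squared
    residuals (SSR) for the OLS line of best fit of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    r = sxy / sqrt(sxx * syy)
    ssr = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, r, ssr

slope, intercept, r, ssr = ols(prices, quantities)
```

Note that `ssr` is computed from unrounded fitted values, so it differs slightly from what one gets by squaring the rounded residuals in the table.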
Answer to Exercise 274. In the previous exercises, we already calculated that the OLS
line of best fit is q = 1400 − 119.2p. Thus,
(a) By interpolation, a barber who charged $7 per haircut sold 1400 − 119.2 × 7 ≈ 566
haircuts.
(b) By extrapolation, a barber who charged $200 per haircut sold 1400−119.2×200 = −22440
haircuts. This is plainly absurd.
The second prediction is obviously absurd and thus obviously less reliable than the first.
(b) r ≈ 0.984.
Answer to Exercise 276 (9740 N2015/I/1). (i) Compute $\frac{dy}{dx} = -\frac{2a}{x^3} + b$. From the
information given, we have this system of equations:
$$\frac{a}{1.6^2} + 1.6b + c = -2.4, \qquad (1)$$
$$\frac{a}{(-0.7)^2} - 0.7b + c = 3.6, \qquad (2)$$
$$\left.\frac{dy}{dx}\right|_{x=1} = -\frac{2a}{1^3} + b = 2. \qquad (3)$$
(ii) $\frac{a}{x^2} + bx + c = 0 \implies x \approx -0.589$ (calculator).
• Intercepts. The graph of $y = \frac{x+1}{1-x}$ crosses the vertical axis at (0, 1) and the horizontal
axis at (−1, 0).
• Asymptotes. Observe that as x → 1, $\frac{x+1}{1-x} \to \pm\infty$, so that the graph of $y = \frac{x+1}{1-x}$ has
vertical asymptote x = 1. And as x → ±∞, (x + 1)/(1 − x) → −1, so that the graph of
$y = \frac{x+1}{1-x}$ has horizontal asymptote y = −1.
• The centre is thus (1, −1).
• The two lines of symmetry run through the centre and bisect the angles formed by
the asymptotes.
Armed with the graph of $y = \frac{x+1}{1-x}$, it is easy to draw the graph of $y = \left|\frac{x+1}{1-x}\right|$: simply
reflect the parts of the graph of $y = \frac{x+1}{1-x}$ where y < 0 in the horizontal axis. We can also
draw y = x + 2.
[Figure: two graphs. Left: y = (x + 1)/(1 − x), with vertical intercept (0, 1), horizontal intercept (−1, 0), vertical asymptote x = 1, and horizontal asymptote y = −1. Right: y = |(x + 1)/(1 − x)| together with the line y = x + 2, which intersect at the points P, Q, and R.]
Answer to Exercise 277 (9740 N2015/I/2) (ii) Using your graphing calculator, the
intersection points have x-coordinates Px ≈ −1.732, Qx ≈ 0.414, and Rx ≈ 1.732. Hence, the
inequality holds if and only if x ∈ (−1.732, 0.414) ∪ (1.732, ∞).
As an exercise, let me also do this without a calculator. The equation ∣(x + 1)/(1 − x)∣ = x + 2
holds ⇐⇒
(a) “$\frac{x+1}{1-x} = x+2$ AND x ∈ [−1, 1)” OR
(b) “$-\frac{x+1}{1-x} = x+2$ AND x ∉ [−1, 1)”.
Now,
$$\frac{x+1}{1-x} = x+2 \iff x+1 = (1-x)(x+2) \iff x^2+2x-1 = 0 \iff x = -1\pm\sqrt{2},$$
$$-\frac{x+1}{1-x} = x+2 \iff -x-1 = (1-x)(x+2) \iff x^2-3 = 0 \iff x = \pm\sqrt{3}.$$
So condition (a) is equivalent to $x = -1+\sqrt{2}$. This is the x-coordinate of Q.
And condition (b) is equivalent to $x = \pm\sqrt{3}$. These are the x-coordinates of P and R.
Altogether then,
$$\left|\frac{x+1}{1-x}\right| < x+2 \iff x \in \left(-\sqrt{3}, -1+\sqrt{2}\right) \cup \left(\sqrt{3}, \infty\right).$$
[Figure: graph of y = 1 + f (0.5x).]
(iii) Again, the easy way is by graphing calculator, but again as an exercise, let’s also do
it without a calculator. First, stretch the graph of y = f (x) horizontally, outwards from
the vertical axis by a factor of 2, to get the graph of y = f (0.5x). Then move it upwards
vertically by 1 unit to get the graph of the equation y = 1 + f (0.5x).
Note though that the transformations must be done piece-wise. The piece or region
0 ≤ x ≤ 1 in the old graph corresponds to the region 0 ≤ x ≤ 2 in the new graph. And the
region 1 < x ≤ 3 in the old graph corresponds to the region 2 < x ≤ 6 in the new graph. (To
save space, I’ve plotted this on the same diagram.)
(ii) From our work above, the inverse function is f⁻¹ ∶ (−∞, 0) → (1, ∞) defined by $y \mapsto \sqrt{1 - 1/y}$.
(b) Let $y = \frac{2+x}{1-x^2}$. Rearranging, yx² + x + 2 − y = 0. This is a quadratic in x. Since x ∈ R, the
discriminant of this quadratic must be non-negative: 1² − 4y(2 − y) ≥ 0.
Rearranging, 4y² − 8y + 1 ≥ 0. This is a ∪-shaped quadratic with zeros
$$\frac{8\pm\sqrt{(-8)^2-4(4)(1)}}{2(4)} = 1 \pm 0.5\sqrt{3}.$$
Hence, $1^2 - 4y(2-y) \ge 0 \iff y \in \left(-\infty, 1-0.5\sqrt{3}\right] \cup \left[1+0.5\sqrt{3}, \infty\right)$.
$$f^2(x) = f\left(\frac{1}{1-x}\right) = \frac{1}{1-1/(1-x)} = \frac{1-x}{1-x-1} = \frac{1-x}{-x} = 1 - \frac{1}{x}.$$
To show that f²(x) = f⁻¹(x), we need merely show that f²(y) = x ⇐⇒ f(x) = y. To this
end, write
$$f^2(y) = x \iff 1 - \frac{1}{y} = x \iff f(x) = f\left(1-\frac{1}{y}\right) = \frac{1}{1-(1-1/y)} = \frac{1}{1/y} = y.$$
Finally, the curve y 2 = f (x) crosses the x-axis at the same points as the curve y = f (x),
namely A, B, and C.
[Figure: graphs of y = f(x) and y² = f(x), with vertical intercept D = (0, d) and the horizontal intercepts shared by both graphs.]
(ii) The tangents to the curve y² = f(x) at the points where it crosses the x-axis are vertical.
$$\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{6}{6t} = \frac{1}{t} = 0.4.$$
So t = 2.5.
(ii) The writers of this question seem to assume that p ≠ 0, so that is what I’ll assume
too.95
The tangent line at (3p², 6p) has equation $y - 6p = \frac{1}{p}(x - 3p^2)$. Where this line meets the
y-axis, we have $y - 6p = \frac{1}{p}(0 - 3p^2) = -3p$ or y = 3p. So D = (0, 3p).
The mid-point of PD is $\left(\frac{3p^2+0}{2}, \frac{6p+3p}{2}\right) = (1.5p^2, 4.5p)$. So the cartesian equation for the
locus of the mid-point of PD is $x = 1.5(y/4.5)^2 = y^2/13.5$.
$$y = \frac{6\pm\sqrt{(-6)^2-4(1)(-3)}}{2} = 3\pm\sqrt{12} = 3\pm 2\sqrt{3}.$$
√ √
2 3 2 3
So the last inequality is true if and only if y ∈ (−∞, 3 − ] ∪ [3 + , ∞).
3 3
95
If p = 0, then P = (0, 0) and the tangent at P is vertical, so that D could be any point on the y-axis.
• Intercepts. The graph crosses the vertical axis at (0, −1) and the horizontal axis at
(−1, 0).
• Asymptotes. As x → 0.5, y → ±∞. Hence, x = 0.5 is a vertical asymptote. As x → ±∞,
y → 0.5. Hence, y = 0.5 is a horizontal asymptote.
The intersection of the two asymptotes is (0.5, 0.5) — this is also the centre of the hyperbola.
There are two lines of symmetry, each running through the centre, and each bisecting an
angle formed by the two asymptotes.
[Figure: graph of y = (x + 1)/(2x − 1), with centre (0.5, 0.5), lines of symmetry y = x and y = 1 − x, horizontal asymptote y = 0.5, vertical asymptote x = 0.5, horizontal intercept (−1, 0), and vertical intercept (0, −1).]
(ii) $\frac{x+1}{2x-1} < 1$ ⇐⇒ “x + 1 < 2x − 1 AND 2x − 1 > 0” OR “x + 1 > 2x − 1 AND 2x − 1 < 0”
⇐⇒ “2 < x AND x > 0.5” OR “2 > x AND x < 0.5” ⇐⇒ “x > 2 OR x < 0.5”.
Answer to Exercise 286 (9740 N2012/I/1). Let x, y, and z be the costs of, respectively,
the under-16, 16-65, and over-65 tickets. The system of equations is
$$9x + 6y + 4z = \$162.03, \qquad (1)$$
$$7x + 5y + 3z = \$128.36, \qquad (2)$$
$$10x + 4y + 5z = \$158.50. \qquad (3)$$
So x = $7.65, y = $9.85, z = $8.52 (calculator).
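If you’d like to check the calculator’s output, here is one possible sketch that solves the 3 × 3 system exactly using Cramer’s rule and Python’s `fractions` module. Cramer’s rule is my choice for illustration (the answer above simply uses a calculator), and the helper names `det3` and `solve3` are hypothetical.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(coeffs, rhs):
    """Solve a 3x3 linear system by Cramer's rule (requires det != 0)."""
    d = det3(coeffs)
    solution = []
    for col in range(3):
        m = [row[:] for row in coeffs]      # copy, then swap in the RHS column
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(det3(m) / d)
    return solution

coeffs = [[Fraction(9), Fraction(6), Fraction(4)],
          [Fraction(7), Fraction(5), Fraction(3)],
          [Fraction(10), Fraction(4), Fraction(5)]]
rhs = [Fraction("162.03"), Fraction("128.36"), Fraction("158.50")]
x, y, z = solve3(coeffs, rhs)
```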
$$g^{-1}(g(x)) = \frac{g(x)+k}{g(x)-1} = \frac{\frac{x+k}{x-1}+k}{\frac{x+k}{x-1}-1} = \frac{x+k+k(x-1)}{x+k-(x-1)} = \frac{x(1+k)}{k+1} = x.$$
(ii) Intercepts. The graph crosses the vertical axis at (0, −k) and the horizontal axis at
(−k, 0).
Asymptotes. As x → 1, y → ±∞. Hence, the graph has a vertical asymptote x = 1.
Moreover, as x → ±∞, y → 1. Hence, the graph has a horizontal asymptote y = 1.
[Figure: graph of y = (x + k)/(x − 1), with centre (1, 1), horizontal asymptote y = 1, vertical asymptote x = 1, horizontal intercept (−k, 0), vertical intercept (0, −k), and two lines of symmetry through the centre.]
1. Move the graph rightwards by 1 unit to get the graph of $y = \frac{1}{x-1}$.
2. Stretch it vertically by a factor of k + 1, outwards from the horizontal axis to get the
graph of $y = \frac{k+1}{x-1}$.
3. Move the graph upwards by 1 unit to get the graph of $y = 1 + \frac{k+1}{x-1} = \frac{x+k}{x-1}$.
[Figure: graphs of y = f(x) and y = |f(x)|.]
Answer to Exercise 290 (9740 N2011/I/2). (i) The given information forms this
system of equations:
$$a(-1.5)^2 + b(-1.5) + c = 4.5, \qquad (1)$$
$$a(2.1)^2 + b(2.1) + c = 3.2, \qquad (2)$$
$$a(3.4)^2 + b(3.4) + c = 4.1. \qquad (3)$$
So a ≈ 0.215, b ≈ −0.490, and c ≈ 3.281 (calculator).
(ii) f′(x) = 2ax + b ≈ 0.430x − 0.490 > 0 if and only if $x > \frac{49}{43} \approx 1.140$.
[Figure: graphs of y = f(x) and y = f⁻¹(x). The graph of f has vertical asymptote x = −0.5, vertical intercept (0, 3), and horizontal intercept (0.5[e⁻³ − 1], 0); the graph of f⁻¹ has horizontal intercept (3, 0).]
(iii) If the graph of f intersects the line y = x, then it also intersects the graph of f −1 at
the points where x = f (x). In this case, x = f (x) ⇐⇒ x = ln(2x + 1) + 3 Ô⇒ x ≈ −0.485,
5.482 (calculator).
In sketching our graph, it helps to keep in mind that the cubic equation y = x3 has a single
stationary inflexion point and has no turning points. And thus, the same must be true for
y = (2x − 2)3 − 6.
(ii) The graph of f −1 is simply the reflection of the graph of f in the line y = x.
[Figure: graphs of y = f(x) = (2x − 2)³ − 6 and y = f⁻¹(x), with their horizontal and vertical intercepts marked, including the vertical intercept (0, −14) of y = f(x) and the horizontal intercept (−14, 0) of y = f⁻¹(x).]
96
If instead this is interpreted to mean a vertical stretch outwards from the horizontal axis, then after the stretch, we have
the graph of y = 2(x − 2)3 .
[Figure: graph of y = f(x), with vertical asymptotes x = −1 and x = 1, horizontal asymptote y = 0, and vertical intercept (0, −1).]
(ii) By observation, the graph of f is symmetric in the vertical axis and if the domain of f
is restricted to R+0 , the new function thus formed would be invertible. Hence, the smallest
k for which f −1 exists is k = 0.
(iii)
$$fg(x) = f(g(x)) = f\left(\frac{1}{x-3}\right) = \frac{1}{\left(\frac{1}{x-3}\right)^2-1} = \frac{(x-3)^2}{1-(x-3)^2} = \frac{(x-3)^2}{[1-(x-3)][1+(x-3)]} = \frac{(x-3)^2}{(4-x)(x-2)}.$$
[Figure: graph of y = fg(x). The point (3, 0) is not part of the graph of y = fg(x).]
$$\frac{(x-3)^2}{(4-x)(x-2)} = \frac{x^2-6x+9}{-x^2+6x-8} = -1 + \frac{1}{-x^2+6x-8} = -1 + \frac{1}{(4-x)(x-2)}.$$
Observe also that (4 − x)(x − 2) is a ∩-shaped quadratic with maximum value 1 (at x = 3).
But given the restrictions that x ≠ 2, x ≠ 3, x ≠ 4, we have (4 − x)(x − 2) ∈ (−∞, 0) ∪ (0, 1).
So $\frac{1}{(4-x)(x-2)} \in (-\infty, 0) \cup (1, \infty)$.
And $-1 + \frac{1}{(4-x)(x-2)} \in (-\infty, -1) \cup (0, \infty)$. The range of fg is (−∞, −1) ∪ (0, ∞).
$$n = \frac{17\pm\sqrt{(-17)^2-4(3)(-166)}}{6} = \frac{17\pm\sqrt{289+1992}}{6} = \frac{17\pm\sqrt{2281}}{6}.$$
We can discard the negative root. The positive root that remains is approximately 10.8.
Bearing in mind that n must be an integer, we conclude that the set of values for which un
is greater than 100 is {11, 12, 13, . . . }.
[Figure: sketch of the curves, with intercepts, asymptotes, centre, and lines of symmetry marked.]
(ii) Square both sides of the equation for C1: $y^2 = \frac{(x-2)^2}{(x+2)^2}$. Plug this into the equation for C2:
$$\frac{x^2}{6} + \frac{\frac{(x-2)^2}{(x+2)^2}}{3} = 1 \iff x^2 + \frac{2(x-2)^2}{(x+2)^2} = 6 \iff x^2(x+2)^2 + 2(x-2)^2 = 6(x+2)^2$$
$$\iff 2(x-2)^2 = 6(x+2)^2 - x^2(x+2)^2 = (x+2)^2(6-x^2), \text{ as desired.}$$
(iii) They are −0.5149 and 2.445 (correct to 4 s.f.).
(ii) The range of g is (−∞, 0) ∪ (0, ∞), while the domain of f is $\left(-\infty, \frac{a}{b}\right) \cup \left(\frac{a}{b}, \infty\right)$. Since
a/b ≠ 0, the range of g is not a subset of the domain of f and so fg does not exist.
(iii) We have $f^{-1}(x) = x \iff \frac{ax}{bx-a} = x \iff ax = x(bx-a) \iff 0 = x(bx-2a)$.
Thus, x = 0 or $x = \frac{2a}{b}$ solves f⁻¹(x) = x.
(ii) If ad − bc = 0, then f ′ (x) = 0 for all x. Hence, the graph is simply a horizontal line.
(iii) In this case, $\frac{dy}{dx} = \frac{3\times 1 - (-7)\times 2}{(2x+1)^2} = \frac{17}{(2x+1)^2}$, which is always positive. Hence, the
graph has a positive gradient at all points.
[Figure: graphs of y = (3x − 7)/(2x + 1) and y² = (3x − 7)/(2x + 1), with vertical asymptote x = −0.5, horizontal asymptote y = 1.5 for the first graph, horizontal intercept (7/3, 0) for both graphs, and vertical intercept (0, −7).]
(iv) (a) This is a rectangular hyperbola that crosses the vertical axis at (0, −7) and the
horizontal axis at $\left(\frac{7}{3}, 0\right)$. Since $\frac{3x-7}{2x+1} = 1.5 - \frac{8.5}{2x+1}$, there is a vertical asymptote x = −0.5
and a horizontal asymptote y = 1.5.
(b) The graph of y² = f(x) is symmetric in the horizontal axis. It crosses the horizontal
axis at $\left(\frac{7}{3}, 0\right)$. It has vertical asymptote x = −0.5 and horizontal asymptotes $y = \pm\sqrt{1.5}$.
(ii) y 2 = f (x) is symmetric in the horizontal axis and empty where f (x) < 0. At the origin,
the tangent to the curve is vertical.
[Figure: graphs of y = x/(x² − 1) and y² = x/(x² − 1), with vertical asymptotes x = ±1, horizontal asymptote y = 0 for both graphs, and both graphs passing through the origin (0, 0).]
The intersection points of $y = \frac{x}{x^2-1}$ and $y = e^x$ are given by $\frac{x}{x^2-1} = e^x \iff x = e^x(x^2-1)$
$\iff xe^{-x} = x^2-1 \iff 1 + xe^{-x} = x^2$, as desired.
Try the starting value x₀ = 2. Then $x_1 = \sqrt{1 + x_0e^{-x_0}} = \sqrt{1 + 2e^{-2}} \approx 1.12724$.
$x_2 = \sqrt{1 + x_1e^{-x_1}} = \sqrt{1 + 1.12724e^{-1.12724}} \approx 1.16839$.
$x_3 = \sqrt{1 + x_2e^{-x_2}} = \sqrt{1 + 1.16839e^{-1.16839}} \approx 1.16757$.
$x_4 = \sqrt{1 + x_3e^{-x_3}} = \sqrt{1 + 1.16757e^{-1.16757}} \approx 1.16759$.
So the positive root is 1.17 (correct to 2 decimal places).
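The iteration above is easy to automate. Here is a minimal Python sketch of the same fixed-point scheme x → √(1 + xe⁻ˣ), starting from x₀ = 2; the function name `iterate` is a hypothetical choice. Because it carries full floating-point precision from one step to the next, the later digits may differ very slightly from the hand calculation, which feeds in rounded values.

```python
from math import exp, sqrt

def iterate(x0, steps):
    """Return [x0, x1, ..., x_steps] for the recurrence
    x_{n+1} = sqrt(1 + x_n * exp(-x_n))."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(sqrt(1 + x * exp(-x)))
    return xs

xs = iterate(2.0, 4)
root_2dp = round(xs[-1], 2)   # the estimate of the positive root, to 2 d.p.
```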
[Figure: graphs of y = f(x), y = f⁻¹(x), and the line y = x. The point (4, 1) is not part of the graph of y = f(x), and the point (1, 4) is not part of the graph of y = f⁻¹(x).]
(ii) The inverse function f⁻¹ has domain (1, ∞) (this is simply the range of f) and codomain
(4, ∞) (this is simply the domain of f). For the mapping rule, write y = f(x) = (x − 4)² + 1
$\iff y - 1 = (x-4)^2 \iff \pm\sqrt{y-1} = x - 4 \iff x = 4 \pm \sqrt{y-1}$.
We know that x > 4. Hence, f⁻¹ has mapping rule $y \mapsto 4 + \sqrt{y-1}$.
(iii) See above.
(iv) Reflect the graph of f in the line y = x to get the graph of f −1 .
The solution to f(x) = f⁻¹(x) is given by f(x) = x or (x − 4)² + 1 = x ⇐⇒ x² − 9x + 17 = 0
$\iff x = \frac{9\pm\sqrt{9^2-4(17)}}{2} = \frac{9\pm\sqrt{13}}{2}$. We can reject the smaller root because it is less than
4. Hence, the solution is $\frac{9+\sqrt{13}}{2}$.
Altogether then, the given inequality holds if and only if x ∈ (−∞, −3) ∪ (−2, −1) ∪ (7, ∞).
Answer to Exercise 301 (9740 N2007/I/2). (i) f has domain (−∞, 3) ∪ (3, ∞) and
range (−∞, 0) ∪ (0, ∞). g has domain R and range R.
Hence, the range of f is a subset of the domain of g. And so the composite function gf
exists. It has the same domain as f, namely (−∞, 3) ∪ (3, ∞) and the same codomain as g,
namely R. Its mapping rule is $gf \colon x \mapsto g(f(x)) = g\left(\frac{1}{x-3}\right) = \frac{1}{(x-3)^2}$.
In contrast, the range of g is not a subset of the domain of f . And so the composite function
f g does not exist.
(ii) To find the mapping rule for the inverse function f⁻¹, write: $y = f(x) = \frac{1}{x-3} \iff \frac{1}{y} = x - 3$ (division is permitted ∵ y ≠ 0) $\iff \frac{1}{y} + 3 = x$.
Hence, the inverse function f⁻¹ has domain (−∞, 0) ∪ (0, ∞) (this is simply the range of f),
codomain (−∞, 3) ∪ (3, ∞) (this is simply the domain of f), and mapping rule $y \mapsto \frac{1}{y} + 3$.
1. Move the graph of y = 1/x to the left by 2 units to get the graph of $y = \frac{1}{x+2}$.
2. Stretch it vertically by a factor of 3, outwards from the horizontal axis to get the graph
of $y = \frac{3}{x+2}$.
3. Move it up by 2 units to get the graph of $y = 2 + \frac{3}{x+2}$.
• Intercepts. The graph intersects the vertical axis at (0, 3.5) and the horizontal axis at
(−3.5, 0).
• Asymptotes. As x → −2, y → ±∞ and so x = −2 is a vertical asymptote. Also, as
x → ±∞, y → 2 and so y = 2 is a horizontal asymptote.
[Figure: graph of y = (2x + 7)/(x + 2), with centre (−2, 2), lines of symmetry y = x + 4 and y = −x, horizontal asymptote y = 2, vertical asymptote x = −2, vertical intercept (0, 3.5), and horizontal intercept (−3.5, 0).]
So x = 3.50, y = 2.6, z = 4.9, and the total amount paid by Lee Lian was 7.65 dollars
(calculator).
Answer to Exercise 304. (9233 N2007/II/4) (i) Write $y = \frac{4x+1}{x-3} = 4 + \frac{13}{x-3}$. So x = 3
is a vertical asymptote and y = 4 is a horizontal asymptote.
(ii) The graph intersects the vertical axis at $\left(0, -\frac{1}{3}\right)$ and the horizontal axis at $\left(-\frac{1}{4}, 0\right)$.
[Figure: graph of y = (4x + 1)/(x − 3), with centre (3, 4), lines of symmetry y = x + 1 and y = −x + 7, horizontal asymptote y = 4, vertical asymptote x = 3, horizontal intercept (−1/4, 0), and vertical intercept (0, −1/3).]
(iii) The range of f is (−∞, 4) ∪ (4, ∞), so this is also the domain of f⁻¹. Write $y = f(x) = 4 + \frac{13}{x-3} \iff y - 4 = \frac{13}{x-3} \iff \frac{13}{y-4} = x - 3$ (the division is permitted ∵ y ≠ 4) $\iff 3 + \frac{13}{y-4} = x$. So $f^{-1}(y) = 3 + \frac{13}{y-4}$.
The composite function g 6 has domain (0, ∞), codomain (0, ∞), and mapping rule g 6 ∶ x ↦
g 2 (g 4 (x)) = g 2 (x) = x.
⋮
The composite function g 34 has domain (0, ∞), codomain (0, ∞), and mapping rule g 34 ∶
x ↦ g 2 (g 32 (x)) = g 2 (x) = x.
The composite function g 35 has domain (0, ∞), codomain (0, ∞), and mapping rule g 35 ∶
x ↦ g (g 34 (x)) = g(x) = 3/x.
h(x) = 5f (x) + 3.
Altogether then, the inequality holds when x ∈ (−∞, −3) ∪ [0, 1] ∪ (3, ∞).
Answer to Exercise 307 (9740 N2015/I/8). (i) Athlete A will take $(2T + 49\times 2)\times\frac{50}{2} = 50T + 49\times 50 = 50T + 2450$ seconds to complete. The required time interval (in seconds) is
[5400, 6300]. So we need 50T + 2450 ∈ [5400, 6300]. Or equivalently, we need T ∈ [59, 77].
(ii) Athlete B will take $t\,\frac{1-1.02^{50}}{1-1.02} = 50t(1.02^{50}-1)$ seconds to complete. The required
time interval (in seconds) is [5400, 6300]. So we need $50t(1.02^{50}-1) \in [5400, 6300]$. Or
equivalently, we need t ∈ [63.845, 74.486].
(iii) T = 59 and t = 63.845. So Athlete A completes the last lap in T + 49 × 2 = 157 s, while
athlete B completes it in $t\times 1.02^{49} \approx 168$ s. And so the difference is 11 s.
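The arithmetic and geometric series in parts (i) to (iii) can be checked numerically. The sketch below assumes, as the answer above does, 50 laps, a 2-second increase per lap for Athlete A, and a 2% increase per lap for Athlete B; all function and variable names are hypothetical.

```python
LAPS = 50
LOW, HIGH = 5400, 6300                   # required total time, in seconds

def total_time_a(first_lap):
    """Arithmetic series: each lap takes 2 s longer than the previous one."""
    return LAPS * (2 * first_lap + (LAPS - 1) * 2) / 2

def total_time_b(first_lap):
    """Geometric series: each lap takes 2% longer than the previous one."""
    return first_lap * (1.02**LAPS - 1) / (1.02 - 1)

# (i) total_time_a(T) = 50T + 2450, so invert the linear relation.
offset_a = total_time_a(0)               # 2450
T_low, T_high = (LOW - offset_a) / LAPS, (HIGH - offset_a) / LAPS

# (ii) total_time_b(t) is proportional to t, so divide by the factor.
factor_b = total_time_b(1)
t_low, t_high = LOW / factor_b, HIGH / factor_b

# (iii) last-lap times when both athletes finish in exactly 5400 s.
last_a = T_low + 49 * 2
last_b = t_low * 1.02**49
```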
$$\sum_{r=1}^{k} r(r+2)(r+5) = \frac{1}{12}k(k+1)(3k^2+31k+74).$$
$$1\times 3\times 6 = 18 = \frac{1}{12}\cdot 1(1+1)(3\times 1^2+31\times 1+74). \checkmark$$
$$\sum_{r=1}^{j} r(r+2)(r+5) = \frac{1}{12}j(j+1)(3j^2+31j+74).$$
$$\sum_{r=1}^{j+1} r(r+2)(r+5) = \frac{1}{12}(j+1)[(j+1)+1][3(j+1)^2+31(j+1)+74].$$
$$\begin{aligned}
\sum_{r=1}^{j+1} r(r+2)(r+5) &= \sum_{r=1}^{j} r(r+2)(r+5) + (j+1)[(j+1)+2][(j+1)+5] \\
&= \frac{1}{12}j(j+1)(3j^2+31j+74) + (j+1)[(j+1)+2][(j+1)+5] \\
&= \frac{1}{12}j(j+1)(3j^2+31j+74) + (j+1)(j+3)(j+6) \\
&= \frac{j+1}{12}\left[j(3j^2+31j+74) + 12(j+3)(j+6)\right] \\
&= \frac{j+1}{12}\left(3j^3+31j^2+74j+12j^2+108j+216\right) \\
&= \frac{j+1}{12}\left(3j^3+43j^2+182j+216\right) \\
&= \frac{1}{12}(j+1)(j+2)(3j^2+37j+108) \\
&= \frac{1}{12}(j+1)(j+2)(3j^2+6j+3+31j+31+74) \\
&= \frac{1}{12}(j+1)[(j+1)+1][3(j+1)^2+31(j+1)+74], \text{ as desired.}
\end{aligned}$$
$$= \frac{(2A+2B)r+3A+B}{4r^2+8r+3}.$$
$$\frac{2}{4r^2+8r+3} = \frac{1}{2r+1} - \frac{1}{2r+3}, \text{ as desired.}$$
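(ii)
$$\begin{aligned}
\sum_{r=1}^{n}\frac{2}{4r^2+8r+3} &= \sum_{r=1}^{n}\left(\frac{1}{2r+1}-\frac{1}{2r+3}\right) \\
&= \left(\frac{1}{3}-\frac{1}{5}\right) + \left(\frac{1}{5}-\frac{1}{7}\right) + \left(\frac{1}{7}-\frac{1}{9}\right) + \dots + \left(\frac{1}{2n+1}-\frac{1}{2n+3}\right) \\
&= \frac{1}{3} - \frac{1}{2n+3}.
\end{aligned}$$
(iii) $\frac{1}{2n+3} \le 0.001 \iff 1000 \le 2n+3 \iff n \ge 498.5$. So the smallest n is 499.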
2n + 3
1
pj+1 = 4pj − 7 = 4 [ (7 − 4j )] − 7
3
28 1
= − 7 − 4 × 4j = (7 − 4j+1 ),
3 3
as desired.
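(ii)
$$\begin{aligned}
\sum_{r=1}^{n} p_r &= \sum_{r=1}^{n}\frac{1}{3}(7-4^r) = \frac{1}{3}\sum_{r=1}^{n}(7-4^r) \\
&= \frac{1}{3}\left(\sum_{r=1}^{n}7 - \sum_{r=1}^{n}4^r\right) = \frac{1}{3}\left[7n - \frac{4(1-4^n)}{1-4}\right] \\
&= \frac{1}{3}\left[7n + \frac{4(1-4^n)}{3}\right] = \frac{4}{9} + \frac{7n}{3} - \frac{4^{n+1}}{9}.
\end{aligned}$$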
(b) (i) As n → ∞, $\frac{1}{(n+1)!} \to 0$ and so Sₙ → 1.
(ii)
$$u_n = S_n - S_{n-1} = 1 - \frac{1}{(n+1)!} - \left(1 - \frac{1}{n!}\right) = \frac{1}{n!} - \frac{1}{(n+1)!} = \frac{1}{(n+1)!}[n+1-1] = \frac{n}{(n+1)!}.$$
(b) $(8 + 8n) \times \dfrac{n}{2}$ metres. Set $(8 + 8n) \times \dfrac{n}{2} = 5000$ and solve: $4n^2 + 4n - 5000 = 0 \iff n^2 + n - 1250 = 0 \iff$
\[ n = \frac{-1 \pm \sqrt{1^2 - 4(1)(-1250)}}{2} = -0.5 \pm 0.5\sqrt{5001}. \]
The negative root can be ignored. The positive root is $-0.5 + 0.5\sqrt{5001} \approx 34.859$. Hence, the athlete needs to complete at least 35 stages.
\[
\begin{aligned}
8 + 2 \times 8 + 2 \times 16 + 2 \times 32 + \dots &= 8(1 + 2 + 4 + 8 + \dots) \\
&= 8\sum_{k=1}^{n} 2^{k-1} = 8 \times \frac{1 - 2^n}{1 - 2} \\
&= 2^{n+3} - 8 \text{ metres.}
\end{aligned}
\]
Set $2^{n+3} - 8 = 10000$ and solve: $n = \dfrac{\ln 10008}{\ln 2} - 3 \approx 10.288$. So at the instant when he has run exactly 10 km, he is in the midst of running his 11th stage.
We know that after completing 10 stages, he has run $2^{13} - 8 = 8184$ m. So at the current instant, he has completed 1816 m of the 11th stage.
The 11th stage is $8 \times 2^{10} = 8192$ m long, which means he has not yet completed even half of the 11th stage.
So he is now 1816 m from O, running away from O.
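The arithmetic in this part is easy to reproduce. The sketch below assumes, as the solution does, that the first half of each stage is run away from O; the names are illustrative.

```python
import math

def distance_after(n):
    """Total distance (metres) after n complete stages: 2^(n+3) - 8."""
    return 2**(n + 3) - 8

target = 10_000                               # 10 km in metres
n_exact = math.log2(target + 8) - 3           # solves 2^(n+3) - 8 = target
completed = math.floor(n_exact)               # whole stages already finished
into_stage = target - distance_after(completed)
stage_length = 8 * 2**completed               # length of the stage in progress
outbound = into_stage < stage_length / 2      # still in the outward half?
print(completed, into_stage, stage_length, outbound)
```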
\[
\begin{aligned}
\ln p &= \ln\left[(2/3)^{n-1} \times 128\right] \\
&= \ln (2/3)^{n-1} + \ln 128 \\
&= (n-1)\ln(2/3) + \ln 2^7 \\
&= (n-1)[\ln 2 - \ln 3] + 7\ln 2 \\
&= (n+6)\ln 2 + (-n+1)\ln 3.
\end{aligned}
\]
(ii) The total length of string cut off approaches (i.e. keeps getting closer to, but never reaches) 384 cm:
\[ 128 + \left(\frac23\right)^1 \times 128 + \left(\frac23\right)^2 \times 128 + \left(\frac23\right)^3 \times 128 + \dots \to \frac{128}{1 - 2/3} = 384. \]
(iii) If n pieces are cut off, the total length cut off is
\[ 128 + \left(\frac23\right)^1 \times 128 + \left(\frac23\right)^2 \times 128 + \dots + \left(\frac23\right)^{n-1} \times 128 = \frac{128\left[1 - (2/3)^n\right]}{1 - 2/3}. \]
\[
\begin{aligned}
\frac{128\left[1 - (2/3)^n\right]}{1 - 2/3} = 380 &\iff 384\left[1 - (2/3)^n\right] = 380 \\
&\iff 1 - (2/3)^n = \frac{95}{96} \iff 1 - \frac{95}{96} = (2/3)^n \\
&\iff \frac{1}{96} = (2/3)^n \iff \ln\frac{1}{96} = n\ln(2/3) \\
&\iff \frac{\ln(1/96)}{\ln(2/3)} = n \implies n \approx 11.257.
\end{aligned}
\]
So 12 pieces must be cut off before the total length cut off is greater than 380 cm.
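A short numerical check of part (iii), treating the first piece as 128 cm and each later piece as 2/3 of the previous one (names are illustrative):

```python
import math

def total_cut(n):
    """Total length (cm) after n pieces: geometric sum with ratio 2/3."""
    return 128 * (1 - (2 / 3)**n) / (1 - 2 / 3)

# Real-valued solution of total_cut(n) = 380, then the first whole n beyond it.
n_exact = math.log(1 / 96) / math.log(2 / 3)
n_min = next(n for n in range(1, 100) if total_cut(n) > 380)
print(round(n_exact, 3), n_min)
```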
\[ \sum_{r=1}^{k} r(2r^2+1) = \frac12 k(k+1)(k^2+k+1). \]

\[
\begin{aligned}
\sum_{r=1}^{1} r(2r^2+1) &= 1(2 \times 1^2 + 1) \\
&= 3 \\
&= \frac12 \times 1 \times (1+1)(1^2+1+1). \quad \checkmark
\end{aligned}
\]

\[ \sum_{r=1}^{j} r(2r^2+1) = \frac12 j(j+1)(j^2+j+1). \]

\[ \sum_{r=1}^{j+1} r(2r^2+1) = \frac12 (j+1)[(j+1)+1][(j+1)^2+(j+1)+1]. \]

\[
\begin{aligned}
\sum_{r=1}^{j+1} r(2r^2+1) &\overset{1}{=} \sum_{r=1}^{j} r(2r^2+1) + (j+1)\left[2(j+1)^2+1\right] \\
&\overset{2}{=} \frac12 j(j+1)(j^2+j+1) + (j+1)\left[2(j+1)^2+1\right] \\
&\overset{5}{=} \frac12 (j+1)\left[j(j^2+j+1) + 4(j+1)^2 + 2\right] \\
&\overset{6}{=} \frac12 (j+1)(j^3+5j^2+9j+6) \\
&\overset{4}{=} \frac12 (j+1)(j+2)(j^2+3j+3) \\
&\overset{3}{=} \frac12 (j+1)[(j+1)+1][(j+1)^2+(j+1)+1], \text{ as desired.}
\end{aligned}
\]
\[
\begin{aligned}
\text{Next, } \sum_{r=1}^{n} r^2 &= 1^2 + 2^2 + 3^2 + \dots + n^2 \\
&= \frac{f(1)-f(0)}{6} + \frac{f(2)-f(1)}{6} + \frac{f(3)-f(2)}{6} + \dots + \frac{f(n)-f(n-1)}{6} \\
&= \frac{f(n)-f(0)}{6} = \frac{2n^3+3n^2+n+24-24}{6} \\
&= \frac{2n^3+3n^2+n}{6} = \frac{n(2n+1)(n+1)}{6}.
\end{aligned}
\]
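\[
\begin{aligned}
\sum_{r=1}^{n} f(r) &= \sum_{r=1}^{n}\left[r(2r^2+1) + 3r^2 + 24\right] \\
&= \sum_{r=1}^{n} r(2r^2+1) + \sum_{r=1}^{n} 3r^2 + \sum_{r=1}^{n} 24 \\
&= \sum_{r=1}^{n} r(2r^2+1) + 3\sum_{r=1}^{n} r^2 + 24n \\
&= \frac12 n(n+1)(n^2+n+1) + 3\,\frac{n(2n+1)(n+1)}{6} + 24n,
\end{aligned}
\]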
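\[ \frac{3u_n - 1}{6} - u_n = \frac{-3u_n - 1}{6} \to 0 \iff u_n \to -\frac13. \]
(iii) Step #1. Let P(k) stand for the proposition that
\[ u_k = \frac{14}{3}\left(\frac12\right)^k - \frac13. \]

\[ \frac{14}{3}\left(\frac12\right)^1 - \frac13 = \frac73 - \frac13 = 2. \quad \checkmark \]

\[ u_j = \frac{14}{3}\left(\frac12\right)^j - \frac13. \]

\[ u_{j+1} = \frac{14}{3}\left(\frac12\right)^{j+1} - \frac13. \]

\[
\begin{aligned}
u_{j+1} = \frac{3u_j - 1}{6} &= \frac{3\left[\frac{14}{3}\left(\frac12\right)^j - \frac13\right] - 1}{6} \\
&= \frac{14\left(\frac12\right)^j - 1 - 1}{6} = \frac{14\left(\frac12\right)^j - 2}{6} \\
&= \frac{14}{3}\left(\frac12\right)\left(\frac12\right)^j - \frac13 = \frac{14}{3}\left(\frac12\right)^{j+1} - \frac13, \text{ as desired.}
\end{aligned}
\]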
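The closed form can also be compared with direct iteration of the recurrence $u_{n+1} = (3u_n - 1)/6$ from $u_1 = 2$ (the value verified in the base case). A minimal sketch with illustrative names:

```python
from fractions import Fraction

def u_closed(k):
    """Claimed closed form (14/3)(1/2)^k - 1/3."""
    return Fraction(14, 3) * Fraction(1, 2)**k - Fraction(1, 3)

def u_iter(k):
    """kth term obtained by iterating the recurrence from u_1 = 2."""
    u = Fraction(2)
    for _ in range(k - 1):
        u = (3 * u - 1) / 6
    return u

agree = all(u_closed(k) == u_iter(k) for k in range(1, 30))
print(agree, float(u_iter(60)))   # the terms approach -1/3
```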
Set $5n^2 + 95n = 5000$ and solve: $n = \dfrac{-95 \pm \sqrt{95^2 - 4(5)(-5000)}}{10} = -9.5 \pm 33.019$.
We can ignore the negative root. We have $-9.5 + 33.019 = 23.519$. So Mrs A's account first became greater than \$5000 on the 24th month, that is, on December 1 2002.
(ii) On the last day of the 1st month, Mr B's account has $100 \times 1.005$ dollars. On the last day of the 2nd month, it has $[100 \times 1.005 + 100] \times 1.005 = 100 \times 1.005^2 + 100 \times 1.005$ dollars. Etc. So on the last day of the nth month, Mr B's account has
\[
\begin{aligned}
100 \times 1.005^n + 100 \times 1.005^{n-1} + \dots + 100 \times 1.005 &= 100 \times 1.005 \times \frac{1 - 1.005^n}{1 - 1.005} \\
&= 20100 \times (1.005^n - 1).
\end{aligned}
\]
\[
\begin{aligned}
20100(1.005^n - 1) = 5000 &\iff 1.005^n - 1 = \frac{50}{201} \iff 1.005^n = \frac{251}{201} \\
&\iff n\ln 1.005 = \ln\frac{251}{201} \iff n = \ln\frac{251}{201} \div \ln 1.005 \implies n \approx 44.541.
\end{aligned}
\]
So Mr B's account first became greater than \$5000 on the 45th month, that is, in September 2004.
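The month count can be double-checked by evaluating the end-of-month balance formula $20100(1.005^n - 1)$ directly. A small sketch (function and variable names are illustrative):

```python
import math

def balance(n):
    """Balance at the end of month n: 20100 (1.005^n - 1)."""
    return 20100 * (1.005**n - 1)

# Real-valued solution of balance(n) = 5000, then the first whole month beyond it.
n_exact = math.log(251 / 201) / math.log(1.005)
first_month = next(n for n in range(1, 1000) if balance(n) > 5000)
print(round(n_exact, 3), first_month)
```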
(iii) Let 100r be the monthly percentage interest rate. Then on the second day of the 36th month, Mr B's account has (note that the initial \$100 has only earned interest 35 times, while the last \$100 deposited hasn't earned any interest)
\[ 100(1+r)^{35} + 100(1+r)^{34} + \dots + 100(1+r)^1 + 100 = 100\,\frac{1 - (1+r)^{36}}{1 - (1+r)} = 100\,\frac{(1+r)^{36} - 1}{r}. \]
\[ 100 \times \frac{(1+r)^{36} - 1}{r} = 5000 \iff \frac{(1+r)^{36} - 1}{r} = 50. \]
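\[ = \frac{0.5}{\sin(0.5\theta)}\left[\sin(1.5\theta) - \sin(0.5\theta) + \dots + \sin(n+0.5)\theta - \sin(n-0.5)\theta\right] \]
(iii) Step #1. Let P(k) stand for the proposition that
\[ \sum_{r=1}^{k} \sin(r\theta) = \frac{\cos(0.5\theta) - \cos(k+0.5)\theta}{2\sin(0.5\theta)}. \]

\[
\begin{aligned}
\sum_{r=1}^{1} \sin(r\theta) \overset{1}{=} \sin\theta &\overset{5}{=} \frac{2\sin\theta\sin(0.5\theta)}{2\sin(0.5\theta)} \\
&\overset{4}{=} \frac{-2\sin\theta\sin(-0.5\theta)}{2\sin(0.5\theta)} \overset{3}{=} \frac{-2\sin[0.5(2\theta)]\sin[0.5(-\theta)]}{2\sin(0.5\theta)} \\
&\overset{2}{=} \frac{\cos(0.5\theta) - \cos(1+0.5)\theta}{2\sin(0.5\theta)}. \quad \checkmark
\end{aligned}
\]
To get from $\overset{3}{=}$ to $\overset{2}{=}$, I used the fact that $\cos P - \cos Q = -2\sin[0.5(P+Q)]\sin[0.5(P-Q)]$, which is actually printed on your List of Formulae! (So here's another exam tip: Whenever you see trigonometric functions and are stuck, go look up the List of Formulae.)
\[ \sum_{r=1}^{j} \sin(r\theta) = \frac{\cos(0.5\theta) - \cos(j+0.5)\theta}{2\sin(0.5\theta)}. \]

\[ \sum_{r=1}^{j+1} \sin(r\theta) = \frac{\cos(0.5\theta) - \cos(j+1+0.5)\theta}{2\sin(0.5\theta)}. \]

\[ \sum_{r=1}^{j+1} \sin(r\theta) \overset{1}{=} \sum_{r=1}^{j} \sin(r\theta) + \sin[(j+1)\theta] \]
as desired. (Again, to get from $\overset{3}{=}$ to $\overset{4}{=}$, I used the same trigonometric identity as before.)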
\[ 256 + (256 - 1 \times 7) + (256 - 2 \times 7) + \dots + \underbrace{(256 - 36 \times 7)}_{=4} = [256 + (256 - 36 \times 7)] \times \frac{37}{2} = 4810 \text{ metres.} \]
(ii) The theoretical maximum is $\dfrac{256}{1 - 8/9} = 2304$ metres. The depth drilled at the end of the nth day is $256\,\dfrac{1 - (8/9)^n}{1 - 8/9}$. We want the latter quantity to be more than 99% of the former. That is, we want $1 - (8/9)^n > 0.99$ or $0.01 > (8/9)^n$ or $\ln 0.01 > n\ln(8/9)$ or $n > \ln 0.01 \div \ln(8/9) \approx 39.099$. So it takes 40 days.
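Both parts reduce to short computations, sketched below. The daily depths (256 m falling by 7 m per day in the arithmetic model, and 256 m shrinking by a factor of 8/9 per day in the geometric model) are read off the working above; the names are illustrative.

```python
import math

# Arithmetic model: 37 daily depths 256, 249, ..., 4.
arithmetic_total = sum(256 - 7 * k for k in range(37))

# Geometric model: first term 256, common ratio 8/9.
limit = 256 / (1 - 8 / 9)

def depth(n):
    """Depth drilled by the end of day n in the geometric model."""
    return 256 * (1 - (8 / 9)**n) / (1 - 8 / 9)

n_exact = math.log(0.01) / math.log(8 / 9)
days = next(n for n in range(1, 1000) if depth(n) > 0.99 * limit)
print(arithmetic_total, round(limit), round(n_exact, 3), days)
```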
\[
\begin{aligned}
u_n &= S_n - S_{n-1} \\
&= n(2n + c) - (n-1)[2(n-1) + c] \\
&= 2n^2 + cn - 2n^2 + 4n - 2 - cn + c \\
&= 4n - 2 + c.
\end{aligned}
\]
\[ u_n = 4n - 2 + c, \qquad u_{n+1} = 4(n+1) - 2 + c. \]
So $u_{n+1} = u_n + 4$.
\[ \sum_{r=1}^{k} r(r+2) = \frac16 k(k+1)(2k+7). \]

\[ \sum_{r=1}^{1} r(r+2) = 1(1+2) = 3 = \frac16 (1)(1+1)(2 \times 1 + 7). \quad \checkmark \]

\[ \sum_{r=1}^{j} r(r+2) = \frac16 j(j+1)(2j+7). \]

\[ \sum_{r=1}^{j+1} r(r+2) = \frac16 (j+1)[(j+1)+1][2(j+1)+7]. \]

\[
\begin{aligned}
\sum_{r=1}^{j+1} r(r+2) &\overset{1}{=} \sum_{r=1}^{j} r(r+2) + (j+1)[(j+1)+2] \\
&\overset{4}{=} \frac16 j(j+1)(2j+7) + (j+1)[(j+1)+2] \\
&\overset{5}{=} \frac16 (j+1)\{j(2j+7) + 6[(j+1)+2]\} \\
&\overset{6}{=} \frac16 (j+1)(2j^2+7j+6j+18) \\
&\overset{7}{=} \frac16 (j+1)(2j^2+13j+18) \\
&\overset{3}{=} \frac16 (j+1)(j+2)(2j+9) \\
&\overset{2}{=} \frac16 (j+1)[(j+1)+1][2(j+1)+7],
\end{aligned}
\]
as desired.
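\[ \frac{1}{r(r+2)} = \frac{0.5}{r} - \frac{0.5}{r+2}. \]
Hence,
\[
\begin{aligned}
\sum_{r=1}^{n} \frac{1}{r(r+2)} &= \sum_{r=1}^{n}\left(\frac{0.5}{r} - \frac{0.5}{r+2}\right) = 0.5\sum_{r=1}^{n}\left(\frac1r - \frac{1}{r+2}\right) \\
&= 0.5\left[\left(\frac11 - \frac13\right) + \left(\frac12 - \frac14\right) + \left(\frac13 - \frac15\right) + \dots + \left(\frac{1}{n-1} - \frac{1}{n+1}\right) + \left(\frac1n - \frac{1}{n+2}\right)\right] \\
&= 0.5\left[\frac11 + \frac12 - \frac{1}{n+1} - \frac{1}{n+2}\right] = \frac34 - \frac{1}{2(n+1)} - \frac{1}{2(n+2)}, \text{ as desired.}
\end{aligned}
\]
\[ \sum_{r=1}^{n} \frac{1}{r(r+2)} = \frac34 - \frac{1}{2(n+1)} - \frac{1}{2(n+2)} \to \frac34. \]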
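(ii)
\[
\begin{aligned}
\sum_{r=2}^{n} \frac{1}{r^3 - r} &= 0.5\sum_{r=2}^{n}\left(\frac{1}{r-1} - \frac2r + \frac{1}{r+1}\right) \\
&= 0.5\Big[\left(\frac11 - \frac22 + \frac13\right) + \left(\frac12 - \frac23 + \frac14\right) + \left(\frac13 - \frac24 + \frac15\right) \\
&\qquad + \left(\frac14 - \frac25 + \frac16\right) + \left(\frac15 - \frac26 + \frac17\right) + \dots + \left(\frac{1}{n-1} - \frac2n + \frac{1}{n+1}\right)\Big] \\
&= 0.5\left(\frac12 - \frac1n + \frac{1}{n+1}\right).
\end{aligned}
\]
(iii) As $n \to \infty$, $\dfrac1n \to 0$ and $\dfrac{1}{n+1} \to 0$. So as $n \to \infty$,
\[ \sum_{r=2}^{n} \frac{1}{r^3 - r} = 0.5\left(\frac12 - \frac1n + \frac{1}{n+1}\right) \to 0.5\left(\frac12\right) = \frac14. \]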
Step #2. Verify that P(1) is true: $\displaystyle\sum_{r=1}^{1} r^2 = 1 = \frac16 (1)(1+1)(2 \times 1 + 1)$. ✓
\[ \sum_{r=1}^{j} r^2 = \frac16 j(j+1)(2j+1). \]

\[ \sum_{r=1}^{j+1} r^2 = \frac16 (j+1)[(j+1)+1][2(j+1)+1]. \]

\[
\begin{aligned}
\sum_{r=1}^{j+1} r^2 &\overset{1}{=} \sum_{r=1}^{j} r^2 + (j+1)^2 \overset{4}{=} \frac16 j(j+1)(2j+1) + (j+1)^2 \\
&\overset{5}{=} \frac16 (j+1)[j(2j+1) + 6(j+1)] \overset{6}{=} \frac16 (j+1)(2j^2+j+6j+6) \\
&\overset{7}{=} \frac16 (j+1)(2j^2+7j+6) \overset{3}{=} \frac16 (j+1)(j+2)(2j+3) \\
&\overset{2}{=} \frac16 (j+1)[(j+1)+1][2(j+1)+1], \text{ as desired.}
\end{aligned}
\]
(ii)
\[
\begin{aligned}
\sum_{r=n+1}^{2n} r^2 = \sum_{r=1}^{2n} r^2 - \sum_{r=1}^{n} r^2 &= \frac16\,2n(2n+1)[2(2n)+1] - \frac16 n(n+1)(2n+1) \\
&= \frac{n(2n+1)}{6}[2(4n+1) - (n+1)] = \frac{n(2n+1)}{6}(7n+1).
\end{aligned}
\]
\[ S_k = \frac16 k(k+1)(4k+5). \]

\[ S_1 = u_1 = 1(2 \times 1 + 1) = 3 = \frac16\,1(1+1)(4 \times 1 + 5). \quad \checkmark \]

\[ S_j = \frac16 j(j+1)(4j+5). \]

\[ S_{j+1} = \frac16 (j+1)[(j+1)+1][4(j+1)+5]. \]

\[
\begin{aligned}
S_{j+1} &\overset{1}{=} S_j + u_{j+1} \\
&\overset{4}{=} \frac16 j(j+1)(4j+5) + (j+1)[2(j+1)+1] \\
&\overset{5}{=} \frac16 (j+1)[j(4j+5) + 6(2j+3)] \\
&\overset{6}{=} \frac16 (j+1)(4j^2+5j+12j+18) \\
&\overset{7}{=} \frac16 (j+1)(4j^2+17j+18) \\
&\overset{3}{=} \frac16 (j+1)(j+2)(4j+9) \\
&\overset{2}{=} \frac16 (j+1)[(j+1)+1][4(j+1)+5], \text{ as desired.}
\end{aligned}
\]
We can ignore the negative root. So $n = \dfrac{-8.5 + 109.874}{3} \approx 33.791$. So it is only in the 34th month that she has saved over \$2000. That's 1 October 2011.
(ii) (a) At the end of 2 years, her original \$10 has earned $10 \times 1.02^{24} - 10 \approx 6.084$ dollars in compound interest.
\[ 10 \times 1.02^{24} + 10 \times 1.02^{23} + 10 \times 1.02^{22} + \dots + 10 \times 1.02^{1} = 10 \times 1.02 \times \frac{1 - 1.02^{24}}{1 - 1.02} \]
\[
\begin{aligned}
10 \times 1.02 \times \frac{1 - 1.02^n}{1 - 1.02} = 2000 &\iff 1 - 1.02^n = 2000 \times \frac{-0.02}{10.2} \\
&\iff 1 + 2000 \times \frac{0.02}{10.2} = 1.02^n \\
&\iff n = \ln\left(1 + 2000 \times \frac{0.02}{10.2}\right) \div \ln(1.02) \\
&\approx 80.476.
\end{aligned}
\]
So it is only at the end of 81 complete months that her total savings first exceed \$2000.
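A quick numerical check of the compound-interest working, assuming (as in the solution) a \$10 deposit at the start of each month and 2% interest at the end of each month. The names are illustrative.

```python
import math

def savings(n):
    """Total at the end of n complete months: 10 * 1.02 * (1.02^n - 1) / 0.02."""
    return 10 * 1.02 * (1.02**n - 1) / 0.02

# Interest earned in two years by the very first $10 alone.
interest_first_deposit = 10 * 1.02**24 - 10

# Real-valued solution of savings(n) = 2000, then the first whole month beyond it.
n_exact = math.log(1 + 2000 * 0.02 / 10.2) / math.log(1.02)
months = next(n for n in range(1, 1000) if savings(n) > 2000)
print(round(interest_first_deposit, 3), round(n_exact, 3), months)
```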
\[ 1 + 2x + 3x^2 + \dots + kx^{k-1} = \frac{1 - (k+1)x^k + kx^{k+1}}{(1-x)^2}. \]

\[ \frac{1 - (1+1)x^1 + 1 \times x^{1+1}}{(1-x)^2} = \frac{1 - 2x + x^2}{(1-x)^2} = 1. \quad \checkmark \]

\[ 1 + 2x + 3x^2 + \dots + jx^{j-1} = \frac{1 - (j+1)x^j + jx^{j+1}}{(1-x)^2}. \]

\[ 1 + 2x + 3x^2 + \dots + (j+1)x^j = \frac{1 - (j+2)x^{j+1} + (j+1)x^{j+2}}{(1-x)^2}. \]

\[ 1 + 2x + 3x^2 + \dots + (j+1)x^j \overset{1}{=} \frac{1 - (j+1)x^j + jx^{j+1}}{(1-x)^2} + (j+1)x^j \]

\[ \frac{d}{dx}\left[\int (1 + 2x + 3x^2 + \dots + nx^{n-1})\,dx\right] = 1 + 2x + 3x^2 + \dots + nx^{n-1}. \]
But
\[
\begin{aligned}
\frac{d}{dx}&\left[\int (1 + 2x + 3x^2 + \dots + nx^{n-1})\,dx\right] \\
&= \frac{d}{dx}\left[x + x^2 + x^3 + \dots + x^n + c\right] \quad (c \in \mathbb{R}) \\
&= \frac{d}{dx}\left[\frac{x(1-x^n)}{1-x} + c\right] \quad \text{(geometric series)} \\
&= \frac{(1-x)\left[(1-x^n) + x(-nx^{n-1})\right] - x(1-x^n)(-1)}{(1-x)^2} \quad \text{(quotient rule)} \\
&= \frac{(1-x)\left[1 - x^n - nx^n\right] + x(1-x^n)}{(1-x)^2} \\
&= \frac{(1-x)\left[1 - (n+1)x^n\right] + x(1-x^n)}{(1-x)^2} \\
&= \frac{1 - (n+1)x^n - x + (n+1)x^{n+1} + x - x^{n+1}}{(1-x)^2} \\
&= \frac{1 - (n+1)x^n + nx^{n+1}}{(1-x)^2}, \text{ as desired.}
\end{aligned}
\]
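Since the identity holds for every $x \ne 1$, it can be spot-checked exactly at a few rational values of x. This is a sketch with illustrative helper names, not a substitute for either proof.

```python
from fractions import Fraction

def series(x, n):
    """1 + 2x + 3x^2 + ... + n x^(n-1), summed term by term."""
    return sum(k * x**(k - 1) for k in range(1, n + 1))

def closed(x, n):
    """Claimed closed form (1 - (n+1)x^n + n x^(n+1)) / (1 - x)^2, for x != 1."""
    return (1 - (n + 1) * x**n + n * x**(n + 1)) / (1 - x)**2

test_points = [Fraction(1, 3), Fraction(-2), Fraction(5, 2)]
agree = all(series(x, n) == closed(x, n)
            for x in test_points for n in range(1, 15))
print(agree)
```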
\[ (0.5 + d) + (0.5r) = 0.5 \iff d + 0.5r \overset{1}{=} 0, \]
\[ (0.5 + 2d) + (0.5r^2) = \frac18 \iff 2d + 0.5r^2 \overset{2}{=} -\frac38. \]
Take $\overset{2}{=}$ minus $2 \times \overset{1}{=}$ to get $0.5r^2 - r = -3/8$ or $4r^2 - 8r + 3 = 0$. And so
\[ r = \frac{8 \pm \sqrt{8^2 - 4(4)(3)}}{2(4)} = 1 \pm \sqrt{1 - 3/4} = 0.5,\ 1.5. \]
Since the geometric progression is convergent, it must be that $|r| < 1$ and so $r = 0.5$. Hence, its sum to infinity is simply $\dfrac{0.5}{1 - 0.5} = 1$.
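\[
\begin{aligned}
&\frac13 e^{x_n} - x_{n+1} \to 0 \\
\implies\ &\frac13 e^{L} - L = 0 \\
\iff\ &e^{L} - 3L = 0.
\end{aligned}
\]
So L is either α or β.
(iii) If $x_1 = 0$, then $x_2 = 1/3$, $x_3 \approx 0.465$, $x_4 \approx 0.531$, $x_5 \approx 0.567$, $x_6 \approx 0.588$, . . . , $x_{15} \approx 0.619$.
So the sequence converges to α ≈ 0.619.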
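The iteration in (iii) is easy to reproduce. The sketch below assumes the recurrence is $x_{n+1} = \frac13 e^{x_n}$, which is what the limit argument above uses, and runs it long enough for the terms to settle.

```python
import math

x = 0.0                    # x_1 = 0
history = [x]              # history[k] holds x_{k+1}
for _ in range(200):
    x = math.exp(x) / 3    # assumed recurrence x_{n+1} = e^{x_n} / 3
    history.append(x)

alpha = x                  # numerical estimate of the limit
print([round(v, 3) for v in history[:6]], round(alpha, 3))
```

The limit should satisfy $e^{L} - 3L = 0$, which is what the test below checks.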
Hence,
\[
x_{n+1} - x_n = \frac13 e^{x_n} - x_n
\begin{cases}
< 0, & \text{if } \alpha < x_n < \beta, \\
> 0, & \text{if } x_n < \alpha \text{ or } x_n > \beta.
\end{cases}
\]
Equivalently,
\[
\begin{aligned}
ar = a + 3d &\iff d \overset{1}{=} \frac{ar - a}{3}, \\
ar^2 = a + 5d &\implies ar^2 = a + 5\,\frac{ar - a}{3} \\
&\iff 3r^2 = 3 + 5(r - 1) \\
&\iff 3r^2 - 5r + 2 = 0.
\end{aligned}
\]
(ii) $3r^2 - 5r + 2 = (3r - 2)(r - 1) = 0$ and so $r = 2/3$ or $r = 1$. But if $r = 1$, then from $\overset{1}{=}$, we have $d = 0$, contradicting our assumption that $d \ne 0$. Hence, $r = 2/3$. Since $|r| < 1$, the geometric series is convergent and has sum to infinity $\dfrac{a}{1 - r} = 3a$.
(iii) From $\overset{1}{=}$ and the fact that $r = 2/3$, $d = -a/9$. So
\[
\begin{aligned}
S &= a + (a + d) + (a + 2d) + \dots + [a + (n-1)d] \\
&= [2a + (n-1)d] \times \frac n2 \\
&= \left[2a - \frac{n-1}{9}a\right] \times \frac n2 \\
&= \left(\frac{19}{9}a - \frac n9 a\right) \times \frac n2.
\end{aligned}
\]
\[
\begin{aligned}
\left(\frac{19}{9}a - \frac n9 a\right) \times \frac n2 > 4a &\iff \left(\frac{19}{9} - \frac n9\right)n > 8 \\
&\iff (19 - n)n > 72 \\
&\iff 0 > n^2 - 19n + 72.
\end{aligned}
\]
$n^2 - 19n + 72$ is a $\cup$-shaped quadratic with zeros $0.5\left[19 \pm \sqrt{19^2 - 4(1)(72)}\right] = \dfrac{19 \pm \sqrt{73}}{2} = 5.228,\ 13.772$.
So $n^2 - 19n + 72 < 0 \iff n \in \{6, 7, 8, 9, 10, 11, 12, 13\}$.
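The set of admissible n can be confirmed by brute force over the inequality $(19 - n)n > 72$ (a sketch; the search range is an arbitrary choice):

```python
import math

# Whole numbers n satisfying (19 - n) n > 72.
valid_n = [n for n in range(1, 50) if (19 - n) * n > 72]

# Zeros of n^2 - 19n + 72, for comparison with the values quoted above.
zeros = [(19 - math.sqrt(73)) / 2, (19 + math.sqrt(73)) / 2]
print(valid_n, [round(z, 3) for z in zeros])
```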
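\[
\begin{aligned}
u_{j+1} = u_j - \frac{2j+1}{j^2(j+1)^2} &= \frac{1}{j^2} - \frac{2j+1}{j^2(j+1)^2} \\
&= \frac{1}{j^2}\left[1 - \frac{2j+1}{(j+1)^2}\right] = \frac{1}{j^2}\left[\frac{(j+1)^2 - (2j+1)}{(j+1)^2}\right] \\
&= \frac{1}{j^2}\left[\frac{j^2 + 2j + 1 - (2j+1)}{(j+1)^2}\right] = \frac{1}{j^2}\left[\frac{j^2}{(j+1)^2}\right] \\
&= \frac{1}{(j+1)^2}, \text{ as desired.}
\end{aligned}
\]
(ii)
\[
\begin{aligned}
\sum_{n=1}^{N} \frac{2n+1}{n^2(n+1)^2} &= \sum_{n=1}^{N}(u_n - u_{n+1}) \\
&= (u_1 - u_2) + (u_2 - u_3) + (u_3 - u_4) + \dots + (u_N - u_{N+1}) \\
&= u_1 - u_{N+1} = 1 - \frac{1}{(N+1)^2}.
\end{aligned}
\]
(iii) As $N \to \infty$, $(N+1)^{-2} \to 0$ and so $\displaystyle\sum_{n=1}^{N} \frac{2n+1}{n^2(n+1)^2} = 1 - \frac{1}{(N+1)^2} \to 1$.
(iv) Observe that $\dfrac{2n+1}{n^2(n+1)^2} = \dfrac{2(n+1)-1}{(n+1)^2(n+1-1)^2}$.
Hence,
\[ \sum_{n=2}^{N} \frac{2n-1}{n^2(n-1)^2} = \sum_{n=1}^{N-1} \frac{2(n+1)-1}{(n+1)^2(n+1-1)^2} = \sum_{n=1}^{N-1} \frac{2n+1}{n^2(n+1)^2} = 1 - \frac{1}{N^2}. \]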
\[ \sum_{r=1}^{k} \sin rx = \frac{\cos(0.5x) - \cos[(k+0.5)x]}{2\sin(0.5x)}. \]

\[
\begin{aligned}
\sum_{r=1}^{1} \sin rx = \sin x &= \frac{2\sin x\sin(0.5x)}{2\sin(0.5x)} \\
&= \frac{-2\sin x\sin(-0.5x)}{2\sin(0.5x)} \overset{1}{=} \frac{\cos(0.5x) - \cos[(1+0.5)x]}{2\sin(0.5x)}. \quad \checkmark
\end{aligned}
\]
$\overset{1}{=}$ uses the identity $\cos P - \cos Q = -2\sin[0.5(P+Q)]\sin[0.5(P-Q)]$.
Step #3. Show that P(j) implies P(j + 1) (for all j = 1, 2, 3, . . . ).
Assume that P(j) is true. That is,
\[ \sum_{r=1}^{j} \sin rx = \frac{\cos(0.5x) - \cos[(j+0.5)x]}{2\sin(0.5x)}. \]

\[ \sum_{r=1}^{j+1} \sin rx = \frac{\cos(0.5x) - \cos[(j+1.5)x]}{2\sin(0.5x)}. \]

\[
\begin{aligned}
\sum_{r=1}^{j+1} \sin rx &= \sum_{r=1}^{j} \sin rx + \sin[(j+1)x] \\
&= \frac{\cos(0.5x) - \cos[(j+0.5)x]}{2\sin(0.5x)} + \sin[(j+1)x] \\
&= \frac{\cos(0.5x) - \cos[(j+0.5)x] + 2\sin(0.5x)\sin[(j+1)x]}{2\sin(0.5x)} \\
&= \frac{\cos(0.5x) - \cos[(j+0.5)x] + \cos[(j+0.5)x] - \cos[(j+1.5)x]}{2\sin(0.5x)} \\
&= \frac{\cos(0.5x) - \cos[(j+1.5)x]}{2\sin(0.5x)}, \text{ as desired.}
\end{aligned}
\]
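\[
\begin{aligned}
\sum_{r=1}^{2n} 3^{r+2} = 3^2\sum_{r=1}^{2n} 3^r &= 9\left(3 + 3^2 + 3^3 + \dots + 3^{2n}\right) \\
&= 9\,\frac{3(1 - 3^{2n})}{1 - 3} \\
&= \frac{27}{-2}(1 - 3^{2n}) \\
&= \frac{27}{2}(3^{2n} - 1).
\end{aligned}
\]
\[
\begin{aligned}
S_1 &= 6 - 2/3^{1-1} = 4 = a, \\
S_2 &= 6 - \frac{2}{3^{2-1}} = \frac{16}{3} = a + ar = 4(1 + r) \iff r = \frac13.
\end{aligned}
\]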
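To this end, write:
\[
\begin{aligned}
\sum_{r=1}^{j+1} r^3 &\overset{1}{=} \sum_{r=1}^{j} r^3 + (j+1)^3 \overset{2}{=} \frac14 j^2(j+1)^2 + (j+1)^3 \\
&\overset{5}{=} \frac14 (j+1)^2\left[j^2 + 4(j+1)\right] \overset{4}{=} \frac14 (j+1)^2(j+2)^2 \\
&\overset{3}{=} \frac14 (j+1)^2[(j+1)+1]^2, \text{ as desired.}
\end{aligned}
\]
(iii)
\[
\begin{aligned}
\sum_{r=1}^{n} (2r-1)^3 &= 1^3 + 3^3 + 5^3 + 7^3 + \dots + (2n-1)^3 \\
&= \left[1^3 + 2^3 + 3^3 + 4^3 + \dots + (2n)^3\right] - \left[2^3 + 4^3 + 6^3 + 8^3 + \dots + (2n)^3\right] \\
&= \frac14 (2n)^2(2n+1)^2 - 2n^2(n+1)^2 \\
&= n^2(2n+1)^2 - 2n^2(n+1)^2 \\
&= n^2\left[(2n+1)^2 - 2(n+1)^2\right] \\
&= n^2\left[4n^2 + 4n + 1 - 2(n^2 + 2n + 1)\right] \\
&= n^2(2n^2 - 1).
\end{aligned}
\]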
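(ii) $\overrightarrow{BC} = \overrightarrow{OC} - \overrightarrow{OB} = 0.6\mathbf{a} - \mathbf{b}$ and so the line BC can be written as $\mathbf{r} = \mathbf{b} + \lambda(0.6\mathbf{a} - \mathbf{b}) = 0.6\lambda\mathbf{a} + (1 - \lambda)\mathbf{b}$, for $\lambda \in \mathbb{R}$, as desired.
$\overrightarrow{AD} = \overrightarrow{OD} - \overrightarrow{OA} = \frac{5}{11}\mathbf{b} - \mathbf{a}$ and so the line AD can be written as $\mathbf{r} = \mathbf{a} + \mu\left(\frac{5}{11}\mathbf{b} - \mathbf{a}\right) = (1 - \mu)\mathbf{a} + \frac{5}{11}\mu\mathbf{b}$, for $\mu \in \mathbb{R}$, as desired.
(iii) Where the lines meet, we have $0.6\lambda\mathbf{a} + (1 - \lambda)\mathbf{b} = (1 - \mu)\mathbf{a} + \frac{5}{11}\mu\mathbf{b}$. Equating the coefficients, we have $0.6\lambda \overset{1}{=} 1 - \mu$ and $\frac{5}{11}\mu \overset{2}{=} 1 - \lambda$. From $\overset{1}{=}$, we have $\mu = 1 - 0.6\lambda$. Plugging this into $\overset{2}{=}$, we have $\frac{5}{11}(1 - 0.6\lambda) = 1 - \lambda \iff 1 - 0.6\lambda = \frac{11}{5} - \frac{11}{5}\lambda \iff \frac85\lambda = \frac65 \iff \lambda = 3/4$. And $\mu = 0.55$. Altogether then, the position vector of E is $0.45\mathbf{a} + 0.25\mathbf{b}$.
$\overrightarrow{AE} = -0.55\mathbf{a} + 0.25\mathbf{b}$ and $\overrightarrow{ED} = -0.45\mathbf{a} + \frac{9}{44}\mathbf{b}$. We observe that $\overrightarrow{AE} = \frac{11}{9}\overrightarrow{ED}$ and so $AE : ED = 11 : 9$.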
(ii) The vector from P to a generic point on L is $(2, 5, -6) - \mathbf{r} = (2, 5, -6) - (1, -2, -4) - (2\lambda, 3\lambda, -6\lambda) = (1 - 2\lambda, 7 - 3\lambda, -2 + 6\lambda)$. The length of this vector is
\[ \sqrt{(1 - 2\lambda)^2 + (7 - 3\lambda)^2 + (-2 + 6\lambda)^2} = \sqrt{49\lambda^2 - 70\lambda + 54}. \]
$49\lambda^2 - 70\lambda + 54$ is a $\cup$-shaped quadratic with minimum point given by $98\lambda - 70 = 0$ or $\lambda = 5/7$.
Hence, the closest point is $(1, -2, -4) + \frac57(2, 3, -6) = \frac17(17, 1, -58)$.
(iii) The plane is parallel to the vectors (2, 3, −6) and (2, 5, −6) − (1, −2, −4) = (1, 7, −2).
It thus has normal vector (2, 3, −6) × (1, 7, −2) = (36, −2, 11). Moreover, we know that
(1, −2, −4) is on the plane. Hence, a cartesian equation is 36x − 2y + 11z = 36 × 1 − 2 × (−2) +
11 × (−4) = −4.
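Parts (ii) and (iii) can be verified with a few lines of exact vector arithmetic. Note one swapped-in technique: the sketch finds λ with the projection formula $\lambda = \frac{(P - A)\cdot \mathbf{d}}{\mathbf{d}\cdot\mathbf{d}}$ rather than by minimising the quadratic; the two approaches should agree. The helper names `dot` and `cross` are illustrative.

```python
from fractions import Fraction

def dot(u, v):
    """Scalar product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Vector product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

P = (2, 5, -6)         # the given point
A = (1, -2, -4)        # a point on the line L
d = (2, 3, -6)         # direction vector of L

AP = tuple(p - a for p, a in zip(P, A))
lam = Fraction(dot(AP, d), dot(d, d))             # parameter of the closest point
foot = tuple(a + lam * di for a, di in zip(A, d))

normal = cross(d, AP)                             # normal to the plane in (iii)
rhs = dot(normal, A)                              # right-hand side of its equation
print(lam, foot, normal, rhs)
```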
Answer to Exercise 336 (9740 N2014/I/9). (i) The plane q is parallel to the vectors
(1, 2, −3) and (2, −1, 4). It thus has normal vector (1, 2, −3) × (2, −1, 4) = (5, −10, −5) and
hence also normal vector (−1, 2, 1). It contains the point (1, −1, 3). Altogether then, it has
cartesian equation −x + 2y + z = 0.
(ii) Line m has direction vector (−1, 2, 1) × (1, 2, −3) = (−8, −2, −4) and hence also direction
vector (4, 1, 2).
To find a point that is on both planes, try plugging in x = 0. Then from the equation of
q, we have z = −2y. Now plug this also into the equation of p to get 2y − 3(−2y) = 12 or
y = 1.5. Hence, an intersection point is (0, 1.5, −3).
Altogether then, the line m has vector equation r = (0, 1.5, −3) + λ(4, 1, 2), for λ ∈ R.
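(iii) $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = (4\lambda, 1.5 + \lambda, -3 + 2\lambda) - (1, -1, 3) = (4\lambda - 1, 2.5 + \lambda, -6 + 2\lambda)$. So $|\overrightarrow{AB}|^2 = (4\lambda - 1)^2 + (2.5 + \lambda)^2 + (-6 + 2\lambda)^2 = 21\lambda^2 - 27\lambda + 43.25$. This lattermost expression is a $\cup$-shaped quadratic, with minimum point given by $42\lambda - 27 = 0$ or $\lambda = 9/14$. So
\[ B = (4\lambda, 1.5 + \lambda, -3 + 2\lambda) = \left(\frac{18}{7}, \frac{15}{7}, -\frac{12}{7}\right). \]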
Answer to Exercise 337 (9740 N2013/I/1). (i) From the equation for p, we have
\[ z \overset{1}{=} 0.5x - 2. \]
Plug $\overset{1}{=}$ into the equation for q to get $2x - 2y + 0.5x - 2 = 6 \iff y \overset{2}{=} 1.25x - 4$.
Now plug $\overset{1}{=}$ and $\overset{2}{=}$ into the equation for r to get $5x - 4(1.25x - 4) + \mu(0.5x - 2) = -9 \iff 0.5\mu x + 25 - 2\mu \overset{3}{=} 0 \iff x \overset{4}{=} \dfrac{4\mu - 50}{\mu}$.
So if $\mu = 3$, from $\overset{1}{=}$, $\overset{2}{=}$, and $\overset{4}{=}$, we have
\[ x = -\frac{38}{3},\quad y = -\frac{119}{6},\quad \text{and}\quad z = -\frac{25}{3}. \]
(ii) From $\overset{3}{=}$, if $\mu = 0$, then we have $25 = 0$, a contradiction. So the three planes do not intersect.
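\[
\begin{aligned}
0.5\left|\overrightarrow{ON} \times \overrightarrow{OC}\right| &= 0.5\left|\frac{4\mathbf{a} + 3\mathbf{c}}{7} \times \mathbf{c}\right| \\
&= \tfrac{1}{14}\left|(4\mathbf{a} + 3\mathbf{c}) \times \mathbf{c}\right| \\
&= \tfrac{1}{14}\left|4\mathbf{a} \times \mathbf{c} + 3\mathbf{c} \times \mathbf{c}\right| && \text{(distributivity of vector product)} \\
&= \tfrac{1}{14}\left|4\mathbf{a} \times \mathbf{c}\right| && (\mathbf{v} \times \mathbf{v} = \mathbf{0}) \\
&= \tfrac{1}{14}\left|4\mathbf{a} \times (\lambda\mathbf{a} + \mu\mathbf{b})\right| \\
&= \tfrac{1}{14}\left|4\mathbf{a} \times \lambda\mathbf{a} + 4\mathbf{a} \times \mu\mathbf{b}\right| && \text{(distributivity of vector product)} \\
&= \tfrac{1}{14}\left|4\mathbf{a} \times \mu\mathbf{b}\right| = \frac{2\mu}{7}.
\end{aligned}
\]
\[
\begin{aligned}
0.5\left|\overrightarrow{OM} \times \overrightarrow{OC}\right| &= 0.5\left|0.5\mathbf{b} \times \mathbf{c}\right| \\
&= \tfrac14\left|\mathbf{b} \times \mathbf{c}\right| \\
&= \tfrac14\left|\mathbf{b} \times (\lambda\mathbf{a} + \mu\mathbf{b})\right| \\
&= \tfrac14\left|\mathbf{b} \times \lambda\mathbf{a} + \mathbf{b} \times \mu\mathbf{b}\right| && \text{(distributivity of vector product)} \\
&= \tfrac14\left|\mathbf{b} \times \lambda\mathbf{a}\right| && (\mathbf{v} \times \mathbf{v} = \mathbf{0}) \\
&= \tfrac14\lambda\left|\mathbf{b} \times \mathbf{a}\right|.
\end{aligned}
\]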
So θ = 0.705.
(ii) The intersection line of two planes has direction vector given by the vector product of
their normal vectors: (2, −2, 1) × (−6, 3, 2) = (−7, −10, −6).
A point that is on both planes satisfies both equations 2x − 2y + z = 1 and −6x + 3y + 2z = −1.
Plugging in x = 0, the first equation yields z = 1 + 2y, which when plugged into the second
equation yields y = −3/7. So a point that is on both planes is (0, −3/7, 1/7).
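(iii) The distance between a point $\mathbf{a}$ and a plane is given by $|d - \mathbf{a} \cdot \hat{\mathbf{n}}|$, where $\mathbf{n}$ is its normal vector and $d = \mathbf{r} \cdot \hat{\mathbf{n}}$.
For $p_1$, $d = 1/3$ and for $p_2$, $d = -1/7$. Hence, the distance between $A(4, 3, c)$ and the plane $p_2$ is
\[ \left|-\frac17 - \frac{(4, 3, c) \cdot (-6, 3, 2)}{7}\right| = \left|-\frac17 + \frac{15 - 2c}{7}\right| = \left|\frac{14 - 2c}{7}\right|. \]
\[ -\frac{1 + c}{3} = \frac{14 - 2c}{7} \iff -7 - 7c = 42 - 6c \iff c = -49, \]
\[ \text{OR}\quad \frac{1 + c}{3} = \frac{14 - 2c}{7} \iff 7 + 7c = 42 - 6c \iff c = 35/13. \]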
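\[
\begin{aligned}
0.5\left|\overrightarrow{OA} \times \overrightarrow{OC}\right| &= 0.5\left|\mathbf{a} \times (\lambda\mathbf{a} + \mu\mathbf{b})\right| \\
&= 0.5\left|\mathbf{a} \times \lambda\mathbf{a} + \mathbf{a} \times \mu\mathbf{b}\right| && \text{(distributivity of vector product)} \\
&= 0.5\left|\mathbf{a} \times \mu\mathbf{b}\right| && (\mathbf{v} \times \mathbf{v} = \mathbf{0}) \\
&= 0.5\mu\left|(1, -1, 1) \times (1, 2, 0)\right| \\
&= 0.5\mu\left|(-2, 1, 3)\right| = 0.5\sqrt{14}\,\mu.
\end{aligned}
\]
So $0.5\sqrt{14}\,\mu = \sqrt{126}$ or $\mu = 2 \times \sqrt{126/14} = 2 \times \sqrt{9} = 2 \times 3 = 6$.
(ii) $\mathbf{c} = \lambda\mathbf{a} + \mu\mathbf{b} = \lambda\mathbf{a} + 4\mathbf{b} = \lambda(1, -1, 1) + 4(1, 2, 0) = (4 + \lambda, 8 - \lambda, \lambda)$. We are given that $|\mathbf{c}| = 5\sqrt3$.
So $(4 + \lambda)^2 + (8 - \lambda)^2 + \lambda^2 = 3\lambda^2 - 8\lambda + 80 = (5\sqrt3)^2 = 75 \iff 3\lambda^2 - 8\lambda + 5 = 0 = (3\lambda - 5)(\lambda - 1)$, so $\lambda = 5/3$ or $1$. And $\mathbf{c} = \left(\frac{17}{3}, \frac{19}{3}, \frac53\right)$ or $(5, 7, 1)$.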
Answer to Exercise 341 (9740 N2012/I/9). (i) $\mathbf{r} = (7, 8, 9) + \lambda(8, 16, 8)$.
(ii) The position vector of N is given by $\mathbf{p} + (\overrightarrow{pa} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}$, where $\mathbf{v}$ is the direction vector of the line, $\mathbf{p}$ is a point on the line, and $\mathbf{a}$ is the given point $(1, 8, 3)$. Compute $\widehat{(8, 16, 8)} = \dfrac{(1, 2, 1)}{\sqrt6}$ and now:
Solving $5 = \dfrac{7 - \alpha}{\alpha + 1}$, we have $5\alpha + 5 = 7 - \alpha$ or $\alpha = 1/3$. So the ratio $AN : NB = \alpha : 1 = 1/3 : 1 = 1 : 3$.
(ii) (a) Since a is a unit vector, (2p)2 + (6p)2 + (3p)2 = 1 or 49p2 = 1 or p = 1/7.
(b) ∣a ⋅ b∣ is the length of the projection vector of b on a.
Another normal vector to the plane is a scalar multiple of the above, namely (1, 1, 2). We
have (4, −1, −3) ⋅ (1, 1, 2) = −3. Hence, a cartesian equation of p is x + y + 2z = −3.
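(ii) From the equations for $l_1$, we have $\dfrac{x - 1}{2} = z + 3 \iff x \overset{1}{=} 2(z + 3) + 1 = 2z + 7$ and $\dfrac{y - 2}{-4} = z + 3 \iff y \overset{2}{=} -4z - 10$.
Plug in $\overset{1}{=}$ and $\overset{2}{=}$ into the equations for $l_2$ to get
\[ \frac{2z + 7 + 2}{1} \overset{3}{=} \frac{-4z - 10 - 1}{5} \overset{4}{=} \frac{z - 3}{k}. \]
From $\overset{3}{=}$, we have $10z + 45 = -4z - 11 \iff z = -56/14 = -4$. Now from $\overset{4}{=}$, we have
\[ k = 5\,\frac{z - 3}{-4z - 11} = 5 \times \frac{-7}{5} = -7. \]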
(iii) The direction vector of l1 is perpendicular to the normal vector of the plane p, as we
can verify — (2, −4, 1) ⋅ (1, 1, 2) = 0. Moreover, a point on l1 is on p, as we can verify —
(1, 2, −3) ⋅ (1, 1, 2) = −3. Altogether then, l1 is on p.
From the equations for l2 , we have y = 5x+11 and z = −7x−11. Plug these into the equation
for the plane p to get: x + (5x + 11) + 2(−7x − 11) = −3 ⇐⇒ −8x − 11 = −3 ⇐⇒ x = −1. So
y = 6 and z = −4. The intersection point is (−1, 6, −4).
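(ii)
\[
\begin{aligned}
(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) &= \left(\frac67 + 1, \frac97 + 2, \frac{18}{7} + 2\right) \cdot \left(\frac67 - 1, \frac97 - 2, \frac{18}{7} - 2\right) \\
&= -\frac{13}{49} - \frac{115}{49} + \frac{128}{49} = 0.
\end{aligned}
\]
(Optional. Actually, more generally, since $(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = |\mathbf{a}|^2 - |\mathbf{b}|^2$, if $|\mathbf{a}| = |\mathbf{b}|$, then $(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = 0$.)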
Answer to Exercise 345 (9740 N2010/I/10). (i) The line has direction vector
(−3, 6, 9), which is a scalar multiple of the plane’s normal vector (1, −2, −3). So the line is
perpendicular to the plane.
(ii) From the equations of the line, we have y = −2x + 19 and z = −3x + 27. Plug these in to
the equation of the plane to get x − 2(−2x + 19) − 3(−3x + 27) = 0 ⇐⇒ 14x − 119 = 0 ⇐⇒
x = 119/14 = 8.5. And so y = 2 and z = 1.5. So the point of intersection is (8.5, 2, 1.5).
(iii) We can easily verify that the given point satisfies the equations for the line: $\dfrac{-2 - 10}{-3} = 4 = \dfrac{23 + 1}{6} = \dfrac{33 + 3}{9}$. The point is therefore on the line.
The point of intersection we found in (ii) (call it X) is equidistant to both A and B.
Moreover, these three points are collinear. Thus, B = (19, −19, −30).
\[ \theta = \cos^{-1}\left(\frac{(2, 1, 3) \cdot (-1, 2, 1)}{|(2, 1, 3)|\,|(-1, 2, 1)|}\right) = \cos^{-1}\left(\frac{3}{\sqrt{14}\sqrt{6}}\right) \approx 1.237. \]
(ii) The line l has direction vector (2, 1, 3) × (−1, 2, 1) = (−5, −5, 5) and thus also direction
vector (1, 1, −1).
A point $(x, y, z)$ that lies on both planes satisfies $2x + y + 3z \overset{1}{=} 1$ and $-x + 2y + z \overset{2}{=} 2$. Plugging in $x = 0$, $\overset{1}{=}$ yields $y = 1 - 3z$ and now $\overset{2}{=}$ yields $z = 0$. So $(x, y, z) = (0, 1, 0)$.
Altogether then, the line l has vector equation r = (0, 1, 0) + λ(1, 1, −1), for λ ∈ R.
(iii) The line l is parallel to the plane p3 , as we now verify: (1, 1, −1) ⋅ (2 − k, 1 + 2k, 3 + k) =
2 − k + 1 + 2k − 3 − k = 0. Moreover, the point (0, 1, 0), which is on the line l, is also on the
plane p3 , as we now verify: 2 × 0 + 1 + 3 × 0 − 1 + k(−0 + 2 × 0 + 0 − 2) = 0. Altogether then,
the line l lies in p3 for any constant k.
We want to find k such that (2, 3, 4) satisfies 2x + y + 3z − 1 + k(−x + 2y + z − 2) = 0.
That is, 2 × 2 + 3 + 3 × 4 − 1 + k(−2 + 2 × 3 + 4 − 2) = 18 + 6k = 0. So k = −3. So the plane is
2x + y + 3z − 1 − 3(−x + 2y + z − 2) = 0 or 5x − 5y + 5 = 0 or x − y + 1 = 0.
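Answer to Exercise 347 (9740 N2009/II/2). (i) Let $\mathbf{p} = \overrightarrow{OP}$. By the Ratio Theorem, $\mathbf{p} = \dfrac{2(11, -13, 2) + (14, 14, 14)}{3} = (12, -4, 6)$. So the point P is $(12, -4, 6)$.
(ii) $\overrightarrow{AB} \cdot \mathbf{p} = (-3, -27, -12) \cdot (12, -4, 6) = 0$.
(iv) $\mathbf{a} \times \mathbf{p} = (140, 84, -224)$. $|\mathbf{a} \times \mathbf{p}|$ is the area of the parallelogram formed with $\mathbf{a}$ and $\mathbf{p}$ as its sides, where the tails of $\mathbf{a}$ and $\mathbf{p}$ are the same point. The area of the triangle OAP is $0.5|\mathbf{a} \times \mathbf{p}| = 0.5\sqrt{140^2 + 84^2 + (-224)^2} \approx 139$.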
(ii) The angle AOB is equal to the angle between the vectors $\overrightarrow{OA}$ and $\overrightarrow{OB}$:
(iii) It is $\left|\overrightarrow{OA} \times \overrightarrow{OB}\right| \approx 25.981$.
Answer to Exercise 349 (9740 N2008/I/11). You can either find the intersection
point using a graphing calculator or painfully by hand, as I do now:
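From $p_1$, $z \overset{1}{=} 1 - \frac23 x + \frac53 y$. Plug $\overset{1}{=}$ into the equation for $p_2$ to get $3x + 2y - 5z = 3x + 2y - 5\left(1 - \frac23 x + \frac53 y\right) = -5$ or $\frac{19}{3}x - \frac{19}{3}y = 0$ or $x \overset{2}{=} y$. And so from $\overset{1}{=}$, we have $z \overset{3}{=} 1 + x$. Now plug in $\overset{3}{=}$ and $\overset{2}{=}$ into the equation for $p_3$ to get $5x + \lambda x + 17(1 + x) = \mu$ or $(22 + \lambda)x = \mu - 17$ or $x = \dfrac{\mu - 17}{22 + \lambda} = -\dfrac{0.4}{1.1} = -\dfrac{4}{11}$. So the point of intersection is $\left(-\frac{4}{11}, -\frac{4}{11}, \frac{7}{11}\right)$.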
(i) The line has direction vector (2, −5, 3) × (3, 2, −5) = (19, 19, 19) and thus also direction
vector (1, 1, 1). From our work above, x = y at the intersection of the two planes. Plug in
x = 0 to find that the two planes intersect at (0, 0, 1). Altogether then, the line has vector
equation r = (0, 0, 1) + α(1, 1, 1), for α ∈ R.
(ii) Two points on the line are (0, 0, 1) and (−1, −1, 0). Plug these into the equation for
plane p3 to get 17 = µ and −5 − λ = µ, so that µ = 17 and λ = −22.
(iii) The line l must be parallel to the plane p3 , so that (1, 1, 1) ⋅ (5, λ, 17) = 0 or λ = −22.
Moreover, the point (0, 0, 1) on the line is not on the plane, so that µ ≠ 17.
(iv) Another vector that is parallel to the plane to be found is (1, −1, 3)−(0, 0, 1) = (1, −1, 2).
The plane thus has normal vector (1, 1, 1) × (1, −1, 2) = (3, −1, −2). Compute also d =
(0, 0, 1) ⋅ (3, −1, −2) = −2. Altogether then, the plane has cartesian equation 3x − y − 2z = −2.
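\[ \frac{x - 1}{-1} \overset{1}{=} \frac{2x - 2 + 3}{-3} \overset{2}{=} \frac{5 - x - 4}{1}. \]
Both $\overset{1}{=}$ and $\overset{2}{=}$ imply that $x = 4$ and so indeed the two lines intersect. (If they didn't intersect, then $\overset{1}{=}$ would contradict $\overset{2}{=}$.) So the point of intersection is $(4, 6, 1)$.
(ii) By the Ratio Theorem, $\overrightarrow{OM} = \frac13[2(1, -1, 2) + (2, 4, 1)] = \frac13(4, 2, 5)$.
\[
\begin{aligned}
0.5\left|\overrightarrow{OA} \times \overrightarrow{OC}\right| &= 0.5\left|(1, -1, 2) \times (-4, 2, 2)\right| \\
&= 0.5\left|(-6, -10, -2)\right| \\
&= 0.5\sqrt{140} \approx 5.916.
\end{aligned}
\]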
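So the angle between the line and the plane is $2.946 - \pi/2 \approx 1.376$.
(iii) $|d - \mathbf{a} \cdot \hat{\mathbf{n}}| = \dfrac{|17 - (1, 2, 4) \cdot (3, -1, 2)|}{\sqrt{14}} = \dfrac{|17 - 9|}{\sqrt{14}} = \dfrac{8}{\sqrt{14}} = \dfrac{4\sqrt{14}}{7} \approx 2.138$.
Answer to Exercise 353 (9233 N2007/I/7). The foot of the perpendicular from a point A to a line is $Q + (\overrightarrow{QA} \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}$, where Q is any point on the line and $\mathbf{v}$ is the line's direction vector. Hence,
\[ \left|\overrightarrow{AP}\right| = |(-6, 3, 2)| = \sqrt{49} = 7. \]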
(ii) $\overrightarrow{OC} = 0.25(4, 1, 3)$. So the line BC has direction vector $(0, 3.25, -3.25)$ and hence also direction vector $(0, 1, -1)$. So the line BC has equation $\mathbf{r} = (1, -3, 4) + \mu(0, 1, -1)$, for $\mu \in \mathbb{R}$.
Setting the equations of the two lines equal to each other, we have $4 + \lambda = 1$, $1 + \lambda = -3 + \mu$, and $3 = 4 - \mu$, so that $\lambda = -3$ and $\mu = 1$. And the point of intersection is $(1, -2, 3)$.
Answer to Exercise 355 (9233 N2006/I/14). By the Ratio Theorem, $\overrightarrow{OP} = (1 - \lambda)\overrightarrow{OA} + \lambda\overrightarrow{OB} = (1 - \lambda)(1, -2, 5) + \lambda(1, 3, 0) = (1, -2 + 5\lambda, 5 - 5\lambda)$. And $\overrightarrow{OQ} = (1 - \mu)\overrightarrow{OC} + \mu\overrightarrow{OD} = (1 - \mu)(10, 1, 2) + \mu(-2, 4, 5) = (10 - 12\mu, 1 + 3\mu, 2 + 3\mu)$.
(i) $\overrightarrow{PQ}$ has direction vector $\overrightarrow{AB} \times \overrightarrow{CD} = (0, 5, -5) \times (-12, 3, 3) = (30, 60, 60)$ and hence also direction vector $(1, 2, 2)$.
Moreover, $\overrightarrow{PQ} = \overrightarrow{OQ} - \overrightarrow{OP} = (9 - 12\mu, 3 + 3\mu - 5\lambda, -3 + 3\mu + 5\lambda)$, which must be a scalar multiple of $(1, 2, 2)$. And so $3 + 3\mu - 5\lambda \overset{1}{=} 2(9 - 12\mu)$ and $-3 + 3\mu + 5\lambda \overset{2}{=} 2(9 - 12\mu)$. Taking $\overset{2}{=}$ minus $\overset{1}{=}$, we have $-6 + 10\lambda = 0$ or $\lambda = 0.6$. Taking $\overset{2}{=}$ plus $\overset{1}{=}$, we have $6\mu = 4(9 - 12\mu)$ or $\mu = 2/3$. Altogether then, $\overrightarrow{PQ} = (1, 2, 2)$, as desired.
(ii) First observe that $\overrightarrow{AQ} = \overrightarrow{OQ} - \overrightarrow{OA} = (10 - 12\mu, 1 + 3\mu, 2 + 3\mu) - (1, -2, 5) = (2, 3, 4) - (1, -2, 5) = (1, 5, -1)$.
Now compute that the area of triangle ABQ is
\[
\begin{aligned}
0.5\left|\overrightarrow{AB} \times \overrightarrow{AQ}\right| &= 0.5\left|(0, 5, -5) \times (1, 5, -1)\right| \\
&= 0.5\left|(20, -5, -5)\right| \approx 10.607.
\end{aligned}
\]
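The area of triangle ABQ is half the length of a cross product, which is simple to recompute. A minimal sketch (the `cross` helper is illustrative):

```python
import math

def cross(u, v):
    """Vector product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

AB = (0, 5, -5)
AQ = (1, 5, -1)

n = cross(AB, AQ)
area = 0.5 * math.sqrt(sum(c * c for c in n))   # half the parallelogram area
print(n, round(area, 3))
```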
\[ = \frac{1}{a^2 + b^2}\left[(a^3 - 3ab^2) + i(3a^2b - b^3)\right] \]
is purely imaginary if and only if $a^3 - 3ab^2 = 0$. But $a^3 - 3ab^2 = a(a^2 - 3b^2) = a(a - \sqrt3 b)(a + \sqrt3 b)$.
So either $b = \pm a/\sqrt3$ or $a = 0$ (but the latter is explicitly ruled out in the question).
Altogether, the possible values of $w = a + ib$ are given by $b = \pm a/\sqrt3$ and a is any non-zero real number.
where the last line uses the fact that sin(π − x) = sin x.
(ii) Since $pz^2 + \dfrac{q}{z^3} = p(-3 + 4i) + q\,\dfrac{-11 + 2i}{125} = \left(-3p - \dfrac{11}{125}q\right) + i\left(4p + \dfrac{2}{125}q\right)$ is real, we have $4p + \dfrac{2}{125}q = 0$ or $q = -250p$. And $pz^2 + \dfrac{q}{z^3} = 19p$.
[Figure: the circle {z : |z + 5 − i| = 4}, with centre (−5, 1) and radius 4.]
(ii) The complex equation $|z - 6i| = |z + 10 + 4i|$ is equivalent to the cartesian equation $(x - 0)^2 + (y - 6)^2 = (x + 10)^2 + (y + 4)^2$ or $-12y + 36 = 20x + 100 + 8y + 16$ or $y \overset{2}{=} -x - 4$.
So to find the intersection points of the line and the circle, plug $\overset{2}{=}$ into $\overset{1}{=}$ to get $(x + 5)^2 + (-x - 4 - 1)^2 = 4^2$ or $2(x + 5)^2 = 4^2$ or $(x + 5)^2 = 8$ or $x + 5 = \pm\sqrt8$ or $x = -5 \pm \sqrt8$. So the possible values of z are $-5 \pm \sqrt8 + (5 \mp \sqrt8 - 4)i = -5 \pm \sqrt8 + (1 \mp \sqrt8)i$.
(b) (i) $w = \sqrt3 - i$, so $|w| = \sqrt{(\sqrt3)^2 + (-1)^2} = 2$ and $\arg w = \tan^{-1}\dfrac{-1}{\sqrt3} = -\dfrac{\pi}{6}$. So $w = 2e^{i(-\pi/6)}$.
And so $w^6 = 2^6 e^{i(-\pi + 2k\pi)} = 2^6 e^{i\pi}$.
(ii) $\arg\left(\dfrac{w^n}{w^*}\right) = \arg w^n - \arg w^* + 2k\pi = n\arg w + \arg w + 2k\pi = (n + 1)\arg w + 2k\pi = (n + 1) \times (-\pi/6) + 2k\pi$. A complex number z is real if and only if $\arg z = 0$ or $\arg z = \pi$. So by observation, the three smallest positive whole number values of n for which $\dfrac{w^n}{w^*}$ is real are 5, 11, and 17.
\[
\begin{aligned}
0 &= a(1 + 2i)^3 + 5(1 + 2i)^2 + 17(1 + 2i) + b \\
&= a(-11 - 2i) + 5(-3 + 4i) + 17 + 34i + b \\
&= (-11a - 15 + 17 + b) + i(-2a + 20 + 34) \\
&= \underbrace{(2 - 11a + b)}_{0} + i\underbrace{(54 - 2a)}_{0}.
\end{aligned}
\]
(iii) By the complex conjugate roots theorem, 1 − 2i is also a root for the equation. Write
(ii) z is the top-right quarter of the circumference of the circle of radius r, centred on the
origin.
Take the position vector of z, rotate it clockwise by π/3 radians about the origin, double
its length — this is the position vector of w.
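[Figure: the locus $\{z = re^{i\theta} : \theta \in [0, \pi/2]\}$, running from $re^{i0}$ to $re^{i\pi/2}$, and the corresponding locus of w, running from $2re^{i(-\pi/3)}$ to $2re^{i\pi/6}$.]
(iii) $\arg\left(\dfrac{z^{10}}{w^2}\right) = \arg z^{10} - \arg w^2 + 2k\pi = 10\arg z - 2\arg w + 2k\pi = 10\theta - 2\left(-\dfrac{\pi}{3} + \theta\right) + 2k\pi = 8\theta + \dfrac{2\pi}{3} + 2k\pi = \pi$, so $\theta = \dfrac{\pi}{24}$ (with $k = 0$).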
(ii) $z^3$ is real if and only if $3c - c^3 = 0$ or $c = 0, \pm\sqrt3$. The question already ruled out $c = 0$.
So $c = \pm\sqrt3$ and $z = 1 \pm i\sqrt3$.
(iii) $z = 1 - i\sqrt3 = |z|e^{i\arg z} = 2e^{i(-\pi/3)}$. $|z^n| = 2^n > 1000$ if and only if $n > \ln 1000/\ln 2 \approx 9.966$.
So the smallest positive integer n is 10.
$|z^{10}| = 2^{10}$ and $\arg z^{10} = 10(-\pi/3) + 2k\pi = 2\pi/3$ ($k = 2$).
[Figure: the circle {z : |z − (7 − 3i)| = 4}, with centre c = (7, −3) and radius 4, showing the origin o and the points a, b and d.]
(ii) (a) a is the point on the circle's circumference that is closest to the origin o. The line l through the origin and the centre of the circle passes through a (see Fact 56).
But the distance of the centre of the circle from the origin is $\sqrt{7^2 + 3^2} = \sqrt{58}$. The distance of the centre of the circle to the point a is 4 (this is simply the length of the radius). Hence, the distance of the origin to the point a is $\sqrt{58} - 4$.
(b) △abc is right. So $ab^2 + bc^2 = ca^2 = 4^2 = 16$.
But the line l has slope $-\dfrac37$ (because it runs through the origin and the point 7 − 3i) and so $ab = \dfrac37 bc$. Hence, $\left(\dfrac37\right)^2 \times bc^2 + bc^2 = 16$. Or $bc^2 = 16 \times \dfrac{49}{58}$. Or $bc = 4 \times \dfrac{7}{\sqrt{58}} = \dfrac{28}{\sqrt{58}}$. And $ab = \dfrac{12}{\sqrt{58}}$. Hence, $a = \left(7 - \dfrac{28}{\sqrt{58}}, -3 + \dfrac{12}{\sqrt{58}}\right)$.
(iii) By observation, d is the point where $|\arg z|$ is as large as possible. There, $\arg z = \arg(7 - 3i) - \angle cod$.
But △cod is right. So $\angle cod = \sin^{-1}\dfrac{4}{\sqrt{58}}$. Moreover, $\arg(7 - 3i) = \tan^{-1}\dfrac{-3}{7}$.
Altogether then, $\arg z = \tan^{-1}\dfrac{-3}{7} - \sin^{-1}\dfrac{4}{\sqrt{58}} = -0.9579$.
7 58
√
−4 ± 42 − 4(1)(4 + 2i) √ √
w= = −2 ± 4 − (4 + 2i) = −2 ± −2i = −2 ± (1 + i) = −3 − i, −1 + i.
2
(iii) (a) This is simply the line that is equidistant to z1 = (−2, 2) and z2 = (2, −2). By
observation, it has cartesian equation y = x.
|z - z1 | = |z - z2 |
|w - w1 | = |w - w2 |
x
(b) This simply the line that is equidistant to w1 = (−3, −1) and w2 = (−1, 1). By observa-
tion, it has cartesian equation y = x + 2.
(iv) The two lines are parallel and do not intersect.
[Figure: the closed disc {z : |z − (2 + 5i)| ≤ 3}, with centre c = (2, 5) and radius 3, showing the points a, b, P, P1, P2 and (6, 1).]
(ii) The points on the circle's circumference that are closest to and furthest from the origin o are a and b. The line l through the origin and the centre of the circle passes through both a and b (see Fact 56).
$oc = \sqrt{2^2 + 5^2} = \sqrt{29}$ and $ac = 3$. Hence, $oa = \sqrt{29} - 3$. Symmetrically, $ob = \sqrt{29} + 3$. The maximum and minimum possible values of $|z|$ are thus $\sqrt{29} \pm 3$.
(iii) The locus of points that satisfy both $|z - 2 - 5i| \le 3$ and $0 \le \arg z \le \pi/4$ is the blue closed segment.
By observation, $|z - 6 - i|$ is maximised either at P1 or P2. These points are given by
So $P_1 = (2, 2)$ and $P_2 = (5, 5)$. The distances of these points to the point $(6, 1)$ are $\sqrt{4^2 + 1^2} = \sqrt{17}$ and $\sqrt{1^2 + 4^2} = \sqrt{17}$. So both are, equally, the furthest point from $(6, 1)$.
[Figure: the circle $|z - z_1| = 2$ and the ray $\arg(z - z_2) = \pi/4$, with $z_1 = (1, \sqrt3)$ and $z_2 = (-1, -1)$.]
(b) This is simply the ray from the point $z_2$ (but excluding the point $z_2$) that makes an angle π/4 with the horizontal.
(iv) We want to find $x > 0$ such that $|(x, 0) - (1, \sqrt3)| = 2$ or $(x - 1)^2 + 3 = 4$ or $(x - 1)^2 = 1$ or $x = 0, 2$. So $(2, 0)$ is where the locus $|z - z_1| = 2$ meets the positive real axis.
\[ x = \frac{6 \pm \sqrt{(-6)^2 - 4(1)(34)}}{2} = 3 \pm \frac{\sqrt{-100}}{2} = 3 \pm 5i. \]
(ii)
(iii) $|z - z_1| = |z - z_2|$ is the line (blue) that is equidistant to the points $z_1 = 2^{1/14}e^{i\pi(1/28)}$ and $z_2 = 2^{1/14}e^{i\pi(1/28 + 2/7)}$.
Explanation #2: The perpendicular bisector of a chord runs through the centre of the
circle. So in this case, the perpendicular bisector of the chord z1 z2 runs through the origin
(which is the centre of the circle).
(1 + √3i)²(1 + √3i) = (−2 + 2√3i)(1 + √3i) = −2 − 6 + (2√3 − 2√3)i = −8.
(ii) 0 = 2z³ + az² + bz + 4
= 2(−8) + a(−2 + 2√3i) + b(1 + √3i) + 4
= (−12 − 2a + b) + i√3(2a + b),
where both the real part and the imaginary part must equal zero: (1) −12 − 2a + b = 0 and (2) 2a + b = 0.
Adding (1) and (2) together, we have −12 + 2b = 0 or b = 6. And now from (2), a = −3.
(iii) By the complex conjugate roots theorem, another root is 1 − √3i. So
2z³ − 3z² + 6z + 4 = 2[z − (1 + √3i)][z − (1 − √3i)](z − c)
= 2(z − 1 − √3i)(z − 1 + √3i)(z − c)
= 2[(z − 1)² − (√3i)²](z − c)
= 2(z² − 2z + 4)(z − c)
= 2[z³ + (−c − 2)z² + (4 + 2c)z − 4c].
Comparing coefficients, we have c = −0.5, which is also the third root for the equation.
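As a quick sanity check (not part of the original answer), the three roots found above can be plugged back into the cubic numerically. This is a minimal sketch using only Python's built-in complex numbers; the names p, roots and residuals are mine.

```python
# Cubic from part (ii) with a = -3 and b = 6: 2z^3 - 3z^2 + 6z + 4.
def p(z):
    return 2 * z**3 - 3 * z**2 + 6 * z + 4

# The three claimed roots: 1 + sqrt(3)i, 1 - sqrt(3)i and -0.5.
roots = [complex(1, 3 ** 0.5), complex(1, -(3 ** 0.5)), complex(-0.5, 0)]

# Each residual |p(z)| should be zero up to floating-point rounding.
residuals = [abs(p(z)) for z in roots]
for z, r in zip(roots, residuals):
    print(z, r)
```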
y
|z | ≤ 6 8 + 6i
A
O B
x
|z | = |z - 8 - 6i |
(i)
(ii) Observe that arg z is maximised and minimised at A and B. arg A = ∠COX + ∠AOC,
arg B = ∠COX − ∠BOC. Moreover, ∠COX = arg(8 + 6i) = tan⁻¹(6/8) = tan⁻¹(3/4).
Note that △AOC is right and the length of OC is half of ∣8 + 6i∣ = √(8² + 6²) = 10. So
OC = 5. Thus, ∠AOC = ∠BOC = cos⁻¹(OC/OA) = cos⁻¹(OC/OB) = cos⁻¹(5/6).
Altogether then, arg A = ∠COX + ∠AOC = tan⁻¹(3/4) + cos⁻¹(5/6) ≈ 1.229 and arg B = ∠COX −
∠BOC = tan⁻¹(3/4) − cos⁻¹(5/6) ≈ 0.058.
- 2i
Radius 2
|z + 2i | = 2
π/3
1 + 3i π/6
π / 6 ≤ arg (z + 1 – 3i) ≤ π / 3
x
Remark 10. Do not make the mistake of concluding that by the complex conjugate roots
theorem, 1 + i is the other root of the equation w2 = −2i. The theorem applies only for
polynomials whose coefficients are all real. It does not apply here because there is an
imaginary coefficient.
(ii) z = [3 + 5i ± √((3 + 5i)² − 4(1)(−4)(1 − 2i))]/2 = [3 + 5i ± √(9 − 25 + 30i + 16(1 − 2i))]/2
= [3 + 5i ± √(−2i)]/2 = [3 + 5i ± (1 − i)]/2 = 2 + 2i, 1 + 3i.
|z + 2 - 3i | =
-2 + 3i
Radius
(b) (a + ib)(a − ib) + 2(a + ib) = 3 + 4i or a² + b² + 2a + 2bi = 3 + 4i. Two complex numbers are
equal if and only if their real and imaginary parts are equal. So (1) a² + b² + 2a = 3 and (2) 2b = 4.
From (2), b = 2. Plug this into (1) to find that a² + 2a + 1 = 0 or a = −1. So w = −1 + 2i.
(ii) z⁶ = −64 = 64e^{iπ} = 2⁶e^{iπ(1+2k)} for k ∈ Z. So z = 2e^{iπ(1+2k)/6} for k = 0, ±1, ±2, −3.
(iii) First use what we found in part (ii). Then use what we found in part (i):
z⁶ + 64 = (z² − 2√3z + 4)(z² + 4)(z² + 2√3z + 4).
az⁴ + bz³ + cz² + dz + e = a(z − ki)(z + ki)(z² + fz + g)
= a(z² + k²)(z² + fz + g)
= a[z⁴ + fz³ + (k² + g)z² + k²fz + gk²].
By comparing coefficients, we have (1) b = af, c = a(k² + g), (2) d = ak²f, and e = agk². Now verify
that indeed:
ad² + b²e = a³k⁴f² + a³f²gk²
= (af) × [a(k² + g)] × (ak²f)
= b × c × d. ✓
(ii) a = 1, b = 3, c = 13, d = 27, e = 36. So indeed ad² + b²e = 1 × 27² + 3² × 36 = 1053 = 3 × 13 × 27 =
bcd. ✓
From (1) above, f = b/a = 3. So from (2), k = ±√(d/(af)) = ±√(27/(1 × 3)) = ±√9 = ±3. So the two desired
roots are ±3i.
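If you'd like to double-check the two imaginary roots, here is a small sketch (my own illustration, not part of the original answer) that evaluates the quartic with the coefficients from part (ii) at ±3i and confirms the identity ad² + b²e = bcd.

```python
# Coefficients from part (ii).
a, b, c, d, e = 1, 3, 13, 27, 36

def q(z):
    # z^4 + 3z^3 + 13z^2 + 27z + 36
    return a * z**4 + b * z**3 + c * z**2 + d * z + e

# The identity verified in part (i), checked with exact integer arithmetic.
identity_holds = (a * d**2 + b**2 * e == b * c * d)

# The quartic evaluated at the two claimed roots 3i and -3i.
values = [q(3j), q(-3j)]
print(identity_holds, values)
```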
y
P : arg (z – 2i) = π / 3
(1, )
π/3
(1, 2)
Q : |z + 2| = |z – 4|
(-2, 0) (4, 0) x
The intersection of the two lines is (1, √3 + 2) or 1 + i(√3 + 2).
[1 + i(√3 + 2)][1 − i(√3 + 2)] = 1 + (√3 + 2)² = 1 + 3 + 4 + 4√3 = 8 + 4√3. ✓
|z + 4 - 4i| = 3
A = (-4, 4)
C = (0, 1) x
(ii) Given a point (C here), the line connecting it to the centre of a circle (A here) also
passes through the point on the circumference (B here) that is closest to the given point
(see Fact 56).
The distance between A and C is √((−4 − 0)² + (4 − 1)²) = √(4² + 3²) = 5. So the distance
between B and C is 5 − 3 = 2.
z = [2 ± √((−2)² − 4(1)(2))]/2 = 1 ± √(1 − 2) = 1 ± i.
Answer to Exercise 378 (9740 N2015/I/3). (i) This question is simply asking you to
explain the intuition behind the definite and Riemann integral (see Chapter 479).
See figure below. There are 10 rectangles. Each has width 1/10. The leftmost blue rectangle
has height f(1/10). The second-leftmost rectangle has height f(2/10). Etc. The total area
of the 10 rectangles is (1/10)[f(1/10) + f(2/10) + ⋅⋅⋅ + f(10/10)]. It approximates ∫₀¹ f(x) dx,
which is the area under the graph of f, between 0 and 1.
By increasing the number of rectangles, we improve the approximation. In the limit, it is
plausible that we have
lim_{n→∞} {(1/n)[f(1/n) + f(2/n) + ⋅⋅⋅ + f(n/n)]} = ∫₀¹ f(x) dx.
x
1
(ii) Let f ∶ R → R be defined by x ↦ ∛x. Then by part (i), the given expression is equal
to ∫₀¹ f(x) dx = ∫₀¹ ∛x dx = (3/4)[x^{4/3}]₀¹ = 3/4.
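To see the limit in part (i) at work, here is a minimal sketch (my own, with the hypothetical helper names riemann_sum and cube_root) that computes the right-endpoint sums for the cube-root function and watches them approach 3/4.

```python
def riemann_sum(f, n):
    """Right-endpoint Riemann sum of f on [0, 1] using n rectangles of width 1/n."""
    return sum(f(k / n) for k in range(1, n + 1)) / n

def cube_root(x):
    # Only used for x >= 0 here, so a fractional power is safe.
    return x ** (1 / 3)

# The sums should decrease towards 3/4 as n grows, because the cube root is
# increasing and right endpoints therefore overestimate the area.
approximations = {n: riemann_sum(cube_root, n) for n in (10, 100, 10_000)}
for n, value in approximations.items():
    print(n, value)
```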
dA/dx = y + x(dy/dx) + πx = 0.5d − x(2 + 0.5π) − x(2 + 0.5π) + πx = 0.5d − 4x = 0.
So x = d/8 and
A = (d/8)[0.5d − ((2 + 0.5π)/8)d] + 0.5π(d/8)²
= (1/16 − 1/32 − π/128 + π/128)d² = d²/32.
Answer to Exercise 380 (9740 N2015/I/6). (i) ln(1 + 2x) = (2x) − (2x)²/2 + (2x)³/3 + ⋅⋅⋅ =
2x − 2x² + (8/3)x³ + . . . .
(ii) ax(1 + bx)^c
= ax[1 + cbx + c(c − 1)(bx)²/2! + c(c − 1)(c − 2)(bx)³/3! + . . . ]
= ax + abcx² + 0.5ab²c(c − 1)x³ + [ab³c(c − 1)(c − 2)/6]x⁴ + . . .
Comparing coefficients, (1) a = 2, (2) abc = −2, and (3) 0.5ab²c(c − 1) = 8/3.
Solve this system of equations using your calculator or manually, as I do now: From (1) and
(2), we have b = −1/c. Now from (3), we have (c − 1)/c = 8/3 or 1 − 1/c = 8/3 or c = −3/5. And b = 5/3.
Altogether then, the coefficient of x⁴ is
ab³c(c − 1)(c − 2)/6 = 2(5/3)³(−3/5)(−8/5)(−13/5)/6 = −104/27.
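The arithmetic above is easy to get wrong by hand, so here is a short check using exact fractions. It is only a sketch: the variable names coeff_x2, coeff_x3 and coeff_x4 are mine, and the values of a, b, c are the ones just derived.

```python
from fractions import Fraction

a = Fraction(2)
c = Fraction(-3, 5)
b = -1 / c                      # from abc = -2 with a = 2, so b = 5/3

# Coefficients of x^2, x^3 and x^4 in the expansion of ax(1 + bx)^c.
coeff_x2 = a * b * c
coeff_x3 = Fraction(1, 2) * a * b**2 * c * (c - 1)
coeff_x4 = a * b**3 * c * (c - 1) * (c - 2) / 6

print(b, coeff_x2, coeff_x3, coeff_x4)
```

The first two coefficients should match those of ln(1 + 2x) after the leading 2x term, namely −2 and 8/3.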
The area between y = cos x and y = sin x between the origin and P is A2 = ∫₀^{π/4} cos x dx −
∫₀^{π/4} sin x dx = [sin x]₀^{π/4} − [− cos x]₀^{π/4} = √2 − 1.
So A1 = 2 − √2. And
A1/A2 = (2 − √2)/(√2 − 1) = [(2 − √2)/(√2 − 1)] × [(√2 + 1)/(√2 + 1)] = (2√2 + 2 − 2 − √2)/1 = √2.
π∫₀^{0.5√2} x² dy = π∫₀^{0.5√2} (sin⁻¹ y)² dy.
(iii) π∫₀^{0.5√2} (sin⁻¹ y)² dy
= π∫₀^{π/4} u² d(sin u) = π∫₀^{π/4} u² cos u du = π[u² sin u − ∫ 2u sin u du]₀^{π/4}
= (π³/16)(√2/2) − 2π[u(− cos u) − ∫ (− cos u) du]₀^{π/4} = (π³/16)(√2/2) + 2π[u cos u − sin u]₀^{π/4}
= (π³/16)(√2/2) + 2π[(π/4)(√2/2) − √2/2] = (π√2/2)(π²/16 + π/2 − 2).
(ii) dy/dx = 0 ⇐⇒ 2 cot θ − tan θ = 0 ⇐⇒ 2 = sin²θ/cos²θ = tan²θ ⇐⇒ tan θ = ±√2. So
indeed there is a stationary point when tan θ = √2, which corresponds to where sin θ = √(2/3)
and cos θ = √(1/3) and where (x, y) = (2√(2/3)/3, 2/√3).
d²y/dx² = d/dx (2 cot θ − tan θ) = (−2 csc²θ − sec²θ)(dθ/dx) = (−2 csc²θ − sec²θ)/(3 sin²θ cos θ).
The numerator is always negative. At tan θ = √2, we have cos θ > 0 and so the denominator
is positive. Altogether then, at tan θ = √2, the second derivative is negative, so that this
is indeed a maximum turning point.
(iii) By observation, y ≥ 0 (for 0 ≤ θ ≤ 0.5π) and thus C is entirely above the x-axis. So the
desired area is simply
(iv) The intersection points of the line and the curve C are given by 3 sin²θ cos θ = a sin³θ
or 3/a = tan θ, as desired.
√2 = 3/a ⇐⇒ a = 3/√2 = 1.5√2.
Answer to Exercise 383 (9740 N2015/II/1). (i) The maximum height is attained
when dh/dt = 0 ⇐⇒ h = 32 m.
(ii) ∫ 1/√(16 − 0.5h) dh = ∫ (1/10) dt ⇐⇒ √(16 − 0.5h)/[0.5 ⋅ (−0.5)] + C = t/10 ⟹ t = A − 40√(16 − 0.5h).
2xy + x²(dy/dx) + y² + x(2y)(dy/dx) = 0.
Plugging in dy/dx = −1 implies that −x² + y² = 0 or y = ±x. What we’ve just shown is that
dy/dx = −1 only if y = ±x.
Plug y = −x into the original equation: −x³ + x³ + 54 = 54 = 0; clearly, this equation is
never true. Next plug in y = x: 2x³ + 54 = 0 ⇐⇒ x = −3. So the one and only point at
which the gradient is −1 is (−3, −3).
∣∫₀^{√3} f(x) − (−7) dx∣ = −[x⁷/7 − 3x⁵/5 − 7x + 7x]₀^{√3} = −(3^{3.5}/7 − 3^{3.5}/5) = 54√3/35.
(iv) f(−x) = (−x)⁶ − 3(−x)⁴ − 7 = x⁶ − 3x⁴ − 7 = f(x).
This last question (“What can be said about the six roots of the equation f (x) = 0?”) is
a strangely open-ended question. I don’t know what they wanted, so here I’ll just give a
complete answer, though probably you didn’t need to do so much work to get the full mark.
Two of the roots are ±α ≈ ±1.885.
So x⁶ − 3x⁴ − 7 = (x − α)(x + α)(x⁴ + ax² + b) = x⁶ + (a − α²)x⁴ + (b − α²a)x² − α²b. Comparing
coefficients, (1) a − α² = −3, (2) b − α²a = 0, (3) −α²b = −7. From (1), a = α² − 3 and from (3), b = 7/α².
(You can verify for yourself that these values of a and b also satisfy (2).)
And now solving the quadratic equation x⁴ + ax² + b = 0, we have x² = (−a ± √(a² − 4b))/2.
Observe that a² − 4b < 0, so that both values of x² are imaginary. The square roots of the
above values of x² yield the other four roots, all of which are also imaginary:
x = ±√[(−a ± √(a² − 4b))/2] = ±√[(3 − α² ± √((α² − 3)² − 28/α²))/2].
(ii) f(x)
= (9 − x²)^{−0.5} = 9^{−0.5}[1 − (x/3)²]^{−0.5} = (1/3)[1 − (x/3)²]^{−0.5}
= (1/3){1 + (−1/2)[−(x/3)²] + (−1/2)(−3/2)[−(x/3)²]²/2! + (−1/2)(−3/2)(−5/2)[−(x/3)²]³/3! + . . . }
= 1/3 + x²/(3³ × 2) + x⁴/(3⁴ × 2³) + 5x⁶/(3⁷ × 2⁴) + . . .
(iii) ∫ f(x) dx = ∫ 1/3 + x²/(3³ × 2) + x⁴/(3⁴ × 2³) + 5x⁶/(3⁷ × 2⁴) + . . . dx
= C + x/3 + x³/(3⁴ × 2) + x⁵/(5 × 3⁴ × 2³) + 5x⁷/(7 × 3⁷ × 2⁴) + . . .
⟹ sin⁻¹(x/3) = x/3 + x³/(3⁴ × 2) + x⁵/(5 × 3⁴ × 2³) + 5x⁷/(7 × 3⁷ × 2⁴) + . . .
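As a rough numerical check of the series for sin⁻¹(x/3), we can compare its first four terms against Python's math.asin. This is my own sketch; arcsin_series is a hypothetical helper name, and x = 1 is an arbitrary test point well inside the interval of validity ∣x∣ < 3.

```python
import math

def arcsin_series(x):
    """First four terms of the series for arcsin(x/3) found in part (iii)."""
    return (x / 3
            + x**3 / (3**4 * 2)
            + x**5 / (5 * 3**4 * 2**3)
            + 5 * x**7 / (7 * 3**7 * 2**4))

x = 1.0
error = abs(arcsin_series(x) - math.asin(x / 3))
print(arcsin_series(x), math.asin(x / 3), error)
```

The leftover error should be about the size of the first omitted term (an x⁹ term), which is tiny at x = 1.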
∫ 1/[1.25 − (x − 0.5)²] dx = ∫ −0.2 dt ⟹ [1/(2√1.25)] ln ∣[√1.25 + (x − 0.5)]/[√1.25 − (x − 0.5)]∣ + C = −0.2t
⟹ √5 ln {[√5 − (2x − 1)]/[√5 + (2x − 1)]} + B = t.
t = √5 ln {[√5 − (2x − 1)]/[√5 + (2x − 1)]}.
(iii)(a) t(x = 0.25) = √5 ln [(√5 + 0.5)/(√5 − 0.5)].
(iii)(b) t(x = 0) = √5 ln [(√5 + 1)/(√5 − 1)].
(iv) e^{t/√5} = [√5 − (2x − 1)]/[√5 + (2x − 1)] = −1 + 2√5/[√5 + (2x − 1)]
⟹ 2√5/(e^{t/√5} + 1) = √5 + 2x − 1
⟹ x = √5/(e^{t/√5} + 1) + 0.5(1 − √5).
x = f (t)
O t
dV/dr = 2πr² + (π/3)[2r√(16 − r²) + r² ⋅ 0.5(−2r)/√(16 − r²)] = 2πr² + (πr/3)[2√(16 − r²) − r²/√(16 − r²)]
= 2πr² + [πr/(3√(16 − r²))][2(16 − r²) − r²] = πr[2r + (32 − 3r²)/(3√(16 − r²))].
dV/dr = 0 ⇐⇒ r = 0 (clearly not the maximum) or 2r + (32 − 3r²)/(3√(16 − r²)) = 0
⟹ 2r(3√(16 − r²)) = 3r² − 32 ⟹(1) 36r²(16 − r²) = 9r⁴ + 32² − 6 × 32r²
⟹ 0 = 45r⁴ − 768r² + 1024.
(iii) dV/dr∣_{r≈1.207} ≠ 0,⁹⁷ dV/dr∣_{r≈3.951} = 0.
Presuming that V is maximised at a positive root to the equation 0 = 45r⁴ − 768r² + 1024,
we have r1 ≈ 3.951 and h ≈ 0.625460188.
O
(iv) 4 r
97 Remark: We found r ≈ 1.207 by setting dV/dr = 0. So how can it be that dV/dr∣_{r≈1.207} ≠ 0? The reason is that at step ⟹(1), we
applied a squaring operation. This resulted in additional solutions for our eventual equation 0 = 45r⁴ − 768r² + 1024 that
were not shared by the equation dV/dr = 0.
Here’s a simple example to illustrate: Say we have the equation x = 2. We square both sides to get x² = 4. We now
conclude that x = ±2. But in fact the additional solution x = −2 should be rejected.
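The extraneous root is easy to see numerically. The sketch below (my own illustration; dV_dr, r_small and r_large are hypothetical names) solves 45r⁴ − 768r² + 1024 = 0 as a quadratic in r² and then evaluates the unsquared expression for dV/dr at both positive roots.

```python
import math

def dV_dr(r):
    """dV/dr = pi*r*[2r + (32 - 3r^2) / (3*sqrt(16 - r^2))], valid for 0 < r < 4."""
    return math.pi * r * (2 * r + (32 - 3 * r**2) / (3 * math.sqrt(16 - r**2)))

# Positive roots of 45r^4 - 768r^2 + 1024 = 0, treated as a quadratic in r^2.
disc = math.sqrt(768**2 - 4 * 45 * 1024)
r_small = math.sqrt((768 - disc) / 90)   # about 1.207: introduced by squaring
r_large = math.sqrt((768 + disc) / 90)   # about 3.951: a genuine stationary point

print(r_small, dV_dr(r_small))
print(r_large, dV_dr(r_large))
```

Only the larger root makes dV/dr vanish (up to rounding); at the smaller root dV/dr is clearly positive.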
Comparing coefficients, (1) A + 2B = 9, (2) 2C − 5B = 1, and (3) 9A − 5C = −13. Take 2.5 × (1) plus (2)
plus 0.4 × (3) to get 2.5A + 3.6A = 2.5 × 9 + 1 + 0.4 × (−13) ⟹ 6.1A = 18.3 ⟹ A = 3. So we
also have C = 8 and B = 3. Hence,
∫₀² (9x² + x − 13)/[(2x − 5)(x² + 9)] dx = ∫₀² 3/(2x − 5) + (3x + 8)/(x² + 9) dx
= [(3/2) ln ∣2x − 5∣ + (3/2) ln(x² + 9) + (8/3) tan⁻¹(x/3)]₀²   (note that 2x − 5 < 0 on [0, 2])
= (3/2) ln [(5 − 2 × 2)(2² + 9)/(5 × 9)] + (8/3)[tan⁻¹(2/3) − tan⁻¹(0/3)] = (3/2) ln (13/45) + (8/3) tan⁻¹(2/3).
y y
y = f(x)
O x
√ √ √ √
3 3a/2 3a/2 x2
(ii) Note that a < a. And so ∫ f (x) dx = ∫ 1− dx.
2 a/2 a/2 a2
Now use the given substitution:
√ √ √
3a/2 x2 π/3 a2 sin2 θ π/3 √
∫a/2 1 − 2 dx = ∫ 1− 2
a cos θ dθ = ∫ 1 − sin2 θa cos θ dθ
a π/6 a π/6
π/3 π/3 cos 2θ + 1 a sin 2θ π/3
2
=∫ a cos θ dθ = ∫ a dθ = [ + θ]
π/6 π/6 2 2 2 π/6
√ √
a sin 2θ π/3
a 0.5 3 − 0.5 3 π π πa
= [ + θ] = [ + − ]= .
2 2 π/6 2 2 3 6 12
(ii) dy/dx = 1.5 − 0.5Ae^{−2x}. So y = 1.5x + 0.25Ae^{−2x} + B, where B is the constant of integration.
(iii) d²y/dx² = Ae^{−2x}. So Ae^{−2x} = a(1.5 − 0.5Ae^{−2x}) + b. And thus, a = −2 and b = 3.
(iv) Two of these lines have equations y = 1.5x (where A = 0 = B) and y = 1.5x + 1 (where
A = 0, B = 1).
A non-linear member of the family of curves is y = 1.5x + e^{−2x} (where A = 4). For this
member, as x → ∞, y → 1.5x, so it has y = 1.5x as an asymptote.
y
y = 1.5x + e-2x
O x
y = 1.5x
(iii) The point M is on the curve C and thus satisfies (1) x = 3t² and (2) y = 2t³. It is also on the
curve L and thus satisfies (3) x = y² + 1. Plug (1) and (2) into (3) to get 3t² = (2t³)² + 1 = 4t⁶ + 1 or
4t⁶ − 3t² + 1 = 0, as desired.
By observation, t² = −1 solves this last equation. So t = ±i are a pair of possible solutions.
But these two imaginary solutions cannot possibly correspond to M.
Let’s look for the other two solutions. We have that 4t⁶ − 3t² + 1 = (t² + 1)(4t⁴ + at² + b) =
4t⁶ + (a + 4)t⁴ + (a + b)t² + b. So a = −4 and b = 1. Altogether then, 4t⁶ − 3t² + 1 =
(t² + 1)(4t⁴ − 4t² + 1) = (t² + 1)(2t² − 1)².
So the other two solutions are t = ±√0.5. At the point M, y ≥ 0, so it must be that t = √0.5
and M = (3(√0.5)², 2(√0.5)³) = (1.5, √0.5).
(iv) At least for the region illustrated, the curve C can be written as y = (2x/3)√(x/3) = (2/3^{1.5})x^{1.5}.
And so the area below the curve C and above the x-axis between x = 0 and x = 1.5 is
Altogether then, the desired area is A − B = (2/15)√2.
0.5(a − 2√3x)² sin(π/3) = 0.25√3(a − 2√3x)².
The prism has volume equal to the area we just calculated multiplied by its height x:
V = 0.25√3(a − 2√3x)²x, as desired.
D
x x
a
E x x
x Fig. 2
x
B Fig. 1 C Fig. 3
A
(ii) dV/dx = (1/4)√3[2(a − 2√3x)(−2√3)x + (a − 2√3x)²]
= (1/4)√3(a − 2√3x)(−4√3x + a − 2√3x) = (1/4)√3(a − 2√3x)(a − 6√3x).
So dV/dx = 0 ⇐⇒ x = a/(2√3), a/(6√3).
These are the two stationary points of V as a function of x. To determine their nature, we
use the second derivative test:
d²V/dx² = (1/4)√3[−2√3(a − 6√3x) − 6√3(a − 2√3x)] = (3/2)(12√3x − 4a).
The second derivative is positive when evaluated at x = a/(2√3) and negative when evaluated
at x = a/(6√3). Thus, the maximum value of V occurs when x = a/(6√3):
V = (1/4)√3(a − 2√3x)²x = (1/4)√3(a − a/3)² ⋅ a/(6√3) = (1/4)(4/9)a² × (a/6) = a³/54.
f′(x) = 2 cos x/(1 + 2 sin x) ⟹ f′(0) = 2 cos 0/(1 + 2 sin 0) = 2,
So f(x) = 0 + 2x − (4/2!)x² + (14/3!)x³ + ⋅⋅⋅ = 0 + 2x − 2x² + (7/3)x³ + . . .
We are told that 2 = n and −2 = an, so a = −1. Hence, the third non-zero term in the
Maclaurin series for e^{ax} sin nx is (1 − 8/6)x³ = −x³/3.
∫ x/(1 + x⁴) dx = ∫ 0.5/(1 + u²) du = 0.5(tan⁻¹ u) + C = 0.5(tan⁻¹ x²) + C.
AC/sin(3π/4) = AB/sin[π − (3π/4 + θ)] = 1/sin(π/4 − θ)
⇐⇒ AC = sin(3π/4)/sin(π/4 − θ) = sin(3π/4)/[sin(π/4) cos θ − sin θ cos(π/4)] = 1/(cos θ − sin θ),
where the last equality uses sin(3π/4) = sin(π/4) = cos(π/4).
1/(cos θ − sin θ) = 1/(1 − θ − 0.5θ² + . . . ) = [1 − (θ + 0.5θ² + . . . )]⁻¹
= 1 + (θ + 0.5θ² + . . . ) + (θ + 0.5θ² + . . . )² + . . .
= 1 + (θ + 0.5θ²) + θ² + ⋅⋅⋅ = 1 + θ + 1.5θ² + . . .
98 It’s actually possible to show, albeit with a lot of work, that this definite integral is equal to
(1/32)[√2 ln[(x² + 1 − √2x)/(x² + 1 + √2x)] + 2√2 tan⁻¹(1 + √2x) − 2√2 tan⁻¹(1 − √2x) + 8x³/(x⁴ + 1)]₀¹
= ... more work ... = (1/32)[√2 ln(3 − 2√2) + √2π + 4] ≈ 0.186.
1 − dy/dx = 2(x + y)(1 + dy/dx) = 2(x + y) + 2(x + y)(dy/dx)
⇐⇒ 1 − 2x − 2y = (2x + 2y + 1)(dy/dx)
⇐⇒ (1 − 2x − 2y)/(2x + 2y + 1) = dy/dx (provided 2x + 2y + 1 ≠ 0)
⇐⇒ −1 + 2/(2x + 2y + 1) = dy/dx ⇐⇒ 2/(2x + 2y + 1) = 1 + dy/dx,
as desired. (Note that where 2x + 2y + 1 = 0, dy/dx is undefined.)
(ii) Apply d/dx to the equation found in (i):
d²y/dx² = [−2/(2x + 2y + 1)²](2 + 2 dy/dx) = −[4/(2x + 2y + 1)²](1 + dy/dx)
= −[2/(2x + 2y + 1)]²(1 + dy/dx) = −(1 + dy/dx)³, as desired.
(iii) Any turning point occurs only if dy/dx = 0. So at any such point, the second derivative
is equal to −(1 + 0)³ = −1 < 0. So by the second derivative test, the turning point is a
maximum.
dA/dr = 6πr + 2π(h + r dh/dr) = 6πr + 2π[k/(πr²) − (2/3)r + r(−2k/(πr³) − 2/3)]
= 6πr + 2π[k/(πr²) − (4/3)r − 2k/(πr²)] = 6πr − (8/3)πr − 2k/r² = (10/3)πr − 2k/r².
dA/dr = 0 ⇐⇒ (10/3)πr − 2k/r² = 0 ⇐⇒ (10/3)πr³ = 2k ⇐⇒ r = (3k/(5π))^{1/3}.
dA 1/3
The minimum point is where = 0 ⇐⇒ 4πr3 + 2k − 4 = 0 or r = [(2 − k)/2] . And
dr
(ii) We are now instead given that (1) V = πr²h + 2πr³/3 = 200 and (2) A = 3πr² + 2πrh = 180. So
r ≈ −6.75874, 3.03721, 3.72153 (calculator).
We can reject the negative solution. So the two possible values of r are 3.03721 and 3.72153.
The corresponding values of h are h = 90/(πr) − 1.5r ≈ 4.88, 2.12.
Since r < h, we have r ≈ 3.04, h ≈ 4.88.
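If you don't have a graphing calculator handy, the two positive values of r can also be found by bisection. This is only a sketch under my own setup: I substitute h = 90/(πr) − 1.5r (from the area equation) into the volume equation, which gives 90r − (5/6)πr³ − 200 = 0, and the brackets [3, 3.4] and [3.4, 4] are chosen by inspecting signs. The names g and bisect are hypothetical.

```python
import math

def g(r):
    """V - 200 after substituting h = 90/(pi*r) - 1.5r into V = pi*r^2*h + 2*pi*r^3/3."""
    return 90 * r - (5 / 6) * math.pi * r**3 - 200

def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

roots = [bisect(g, 3.0, 3.4), bisect(g, 3.4, 4.0)]
heights = [90 / (math.pi * r) - 1.5 * r for r in roots]
print(roots, heights)
```

Only the first pair satisfies r < h, in agreement with the answer above.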
Note that dy/dx is undefined where θ = 0 or θ = 2π.
dy/dx∣_{θ=π} = 0, lim_{θ→0} dy/dx = ∞, lim_{θ→2π} dy/dx = −∞.
So at C, the gradient is 0. As θ → 0, 2π, the tangents to C become vertical.
At C, dy/dx = cot (π/2) = 0.
(ii) You can certainly use your graphing calculator as an aid, but in this case we can easily
figure out the graph even without your calculator and so that’s what I’ll do as an exercise.
First figure out the endpoints: At the endpoints, θ = 0 Ô⇒ (x, y) = (0, 0) and θ = 2π Ô⇒
(x, y) = (2π, 0).
In between, we know that dy/dx = cot (θ/2) with dy/dx → ∞ as θ → 0, 2π. Moreover,
dy/dx = 0 when θ = π. So the curve C starts with vertical slope at θ = 0, keeps increasing
but at a decreasing rate until θ = π, then keeps decreasing at an increasing rate.
y
Tangents vertical
2
x
0 π 2π
(iii) ∫_{x=0}^{x=2π} y dx = ∫_{θ=0}^{θ=2π} (1 − cos θ)(1 − cos θ) dθ = ∫₀^{2π} 1 − 2 cos θ + cos²θ dθ
= [θ − 2 sin θ + ∫ (cos 2θ + 1)/2 dθ]₀^{2π} = [1.5θ − 2 sin θ + sin 2θ/4]₀^{2π} = 3π.
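The value 3π can be confirmed with a crude numerical integration. Below is a minimal sketch (mine, not the book's method) that applies the midpoint rule to (1 − cos θ)² on [0, 2π]; area_under_arch is a hypothetical name.

```python
import math

def area_under_arch(n=100_000):
    """Midpoint-rule estimate of the integral of (1 - cos t)^2 over [0, 2*pi]."""
    step = 2 * math.pi / n
    return sum((1 - math.cos((k + 0.5) * step)) ** 2 for k in range(n)) * step

estimate = area_under_arch()
print(estimate, 3 * math.pi)
```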
(iv) The normal to C at P has equation y − (1 − cos p) = −[x − (p − sin p)]/cot(p/2). Where
it crosses the x-axis, we have
0 − (1 − cos p) = −[x − (p − sin p)]/cot(p/2) ⇐⇒ x = cot(p/2)(1 − cos p) + (p − sin p) = p,
where cot(p/2)(1 − cos p) = sin p (see above).
(b) ∫ 1/(16 − 9u²) du = ∫ dt ⇐⇒ [1/(2 × 4 × 3)] ln ∣(4 + 3u)/(4 − 3u)∣ + C = t, where C = −(1/24) ln ∣7/1∣. Altogether,
t = (1/24)(ln ∣(4 + 3u)/(4 − 3u)∣ − ln 7).
Answer to Exercise 401 (9740 N2011/I/3). (i) dy/dx = dy/dt ÷ dx/dt = −2/t² ÷ (2t) = −1/t³. So
the equation of the tangent at the given point is y − 2/p = −(1/p³)(x − p²) or y = −x/p³ + 3/p.
(ii) The horizontal intercept is given by 0 = −x/p³ + 3/p or x = 3p². So Q = (3p², 0).
The vertical intercept is given by y = 0 + 3/p. So R = (0, 3/p).
(iii) The mid-point of QR is (1.5p², 1.5/p). The locus of the mid-point as p varies is
{(x, y) ∶ x = 1.5p², y = 1.5/p, p ∈ R⁻ ∪ R⁺}.
The cartesian equation of this locus is thus x = 1.5³/y².
(ii)(a) ∫₀ᵃ cos⁶x dx = ∫₀ᵃ (1 − 3x² + 4x⁴ + . . . ) dx = [x − x³ + 0.8x⁵ + . . . ]₀ᵃ
= a − a³ + 0.8a⁵ + . . .
So where a = π/4, ∫₀^{π/4} cos⁶x dx ≈ π/4 − (π/4)³ + 0.8(π/4)⁵ ≈ 0.5400.
(b) ∫₀^{π/4} cos⁶x dx ≈ 0.4746. Using the first few terms of the Maclaurin series as an
approximation works well only if the upper limit a is close to 0. In this case, π/4 is not close to 0 and so there is no
good reason why this approximation would have worked well.
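To put numbers on the comparison in (a) and (b), here is a small sketch. The exact value uses the standard power-reduction identity cos⁶x = (10 + 15 cos 2x + 6 cos 4x + cos 6x)/32, which is my own choice of method and not taken from the answer above; the variable names are also mine.

```python
import math

a = math.pi / 4

# Three-term Maclaurin estimate from part (ii)(a).
series_estimate = a - a**3 + 0.8 * a**5

# Exact value, integrating (10 + 15 cos 2x + 6 cos 4x + cos 6x) / 32 term by term.
exact = (10 * a + 7.5 * math.sin(2 * a) + 1.5 * math.sin(4 * a)
         + math.sin(6 * a) / 6) / 32

print(series_estimate, exact, series_estimate - exact)
```

The gap between the two values is large compared with the four decimal places quoted, which illustrates why truncating the series at π/4 is unreliable.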
f(∣x∣) = { 2 − x, for x ≥ 0; 2 + x, for x < 0 }.    ∣f(x)∣ = { 2 − x, for x ≤ 2; x − 2, for x > 2 }.
The graph of y = f (∣x∣) is simply that of y = f (x) but with the region where x < 0 replaced
by the reflection in the vertical axis of the graph y = f (x) in the region where x > 0.
The graph of y = ∣f (x)∣ is simply that of y = f (x) but with the region where y < 0 replaced
by its reflection in the horizontal axis.
(0, 2) y
y = f (|x |)
(0, 2)
y = |f (x) |
x
(-2, 0) (2, 0) x
(2, 0)
(ii) You can simply write down what you observe from the graphs: f (∣x∣) = ∣f (x)∣ ⇐⇒
x ∈ [0, 2].
Or if we want to solve this algebraically and rigorously ... For x > 2, f (∣x∣) = ∣f (x)∣ ⇐⇒
2 − x = x − 2 ⇐⇒ x = 2 (NA). For x ∈ [0, 2], f (∣x∣) = ∣f (x)∣ ⇐⇒ 2 − x = 2 − x, which is
always true. For x < 0, f (∣x∣) = ∣f (x)∣ ⇐⇒ 2 + x = 2 − x ⇐⇒ x = 0 (NA). Altogether then,
f (∣x∣) = ∣f (x)∣ ⇐⇒ x ∈ [0, 2].
(iii) ∫_{−1}^{1} f(∣x∣) dx = ∫_{−1}^{0} (2 + x) dx + ∫₀¹ (2 − x) dx = [2x + x²/2]_{−1}^{0} + [2x − x²/2]₀¹ = 3.
∫₁ᵃ ∣f(x)∣ dx = ∫₁² (2 − x) dx + ∫₂ᵃ (x − 2) dx = [2x − x²/2]₁² + [x²/2 − 2x]₂ᵃ = 0.5a² − 2a + 2.5.
Set the two to be equal: 3 = 0.5a² − 2a + 2.5 or a² − 4a − 1 = 0 or a = 2 ± √5.
We can reject a = 2 − √5 < 2. Thus a = 2 + √5.
(ii) (a) ∫ 1/(10 − 0.1v²) dv = ∫ dt ⇐⇒ ∫ 10/(100 − v²) dv = t.
So t = (1/2) ln ∣(10 + v)/(10 − v)∣ + D. At t = 0, v = 0, so D = −(1/2) ln ∣10/10∣ = 0. Moreover, v ≤ 10. Hence,
t = (1/2) ln [(10 + v)/(10 − v)].
t(v = 5) = (1/2) ln (15/5) = (1/2) ln 3.
(b) t = (1/2) ln [(10 + v)/(10 − v)] ⇐⇒ e^{2t} = (10 + v)/(10 − v) = −1 + 20/(10 − v) ⇐⇒ v = 10 − 20/(e^{2t} + 1).
At t = 1, v = 10 − 20/(e² + 1).
Answer to Exercise 405 (9740 N2011/II/2). (i) The box has length 2(n−x), breadth
n − 2x, and height x. It thus has volume V = 2(n − x)(n − 2x)x = 2n2 x − 6nx2 + 4x3 .
(ii) dV/dx = 2n² − 12nx + 12x². dV/dx = 0 if and only if
x = [12n ± √((−12n)² − 4(12)(2n²))]/24 = [12n ± √(144n² − 96n²)]/24 = 0.5n ± √3n/6.
The larger value of x may be rejected because in that case, the breadth of the box would be
n − 2x = n − 2(0.5n + √3n/6) < 0. So the only stationary value of V occurs at x = (0.5 − √3/6)n.
∫₀ⁿ x²e^{−2x} dx = [x²e^{−2x}/(−2)]₀ⁿ − ∫₀ⁿ 2xe^{−2x}/(−2) dx = n²e^{−2n}/(−2) + ∫₀ⁿ xe^{−2x} dx   (by parts, first with u = x², v′ = e^{−2x}, then with u = x, v′ = e^{−2x})
= −0.5n²e^{−2n} + [xe^{−2x}/(−2)]₀ⁿ − ∫₀ⁿ e^{−2x}/(−2) dx
= −0.5n²e^{−2n} − 0.5ne^{−2n} + 0.5[e^{−2x}/(−2)]₀ⁿ
= −0.5n²e^{−2n} − 0.5ne^{−2n} − 0.25(e^{−2n} − 1) = −(1/4)e^{−2n}(2n² + 2n + 1) + 1/4.
(ii) ∫₀^∞ x²e^{−2x} dx = lim_{n→∞} ∫₀ⁿ x²e^{−2x} dx = lim_{n→∞} [−(1/4)e^{−2n}(2n² + 2n + 1) + 1/4]
= −(1/4) lim_{n→∞} [e^{−2n}(2n² + 2n + 1)] + 1/4
= −(1/4)[lim_{n→∞} (2e^{−2n}n²) + lim_{n→∞} (2e^{−2n}n) + lim_{n→∞} (e^{−2n})] + 1/4 = 1/4.
(b) ∫₀¹ πy² dx = ∫₀¹ π[4x/(x² + 1)]² dx = ∫₀^{π/4} π[4 tan θ/(tan²θ + 1)]² sec²θ dθ
= 16π∫₀^{π/4} [tan θ sec θ/(tan²θ + 1)]² dθ = 16π∫₀^{π/4} (tan θ sec θ/sec²θ)² dθ
= 16π∫₀^{π/4} (tan θ/sec θ)² dθ = 16π∫₀^{π/4} sin²θ dθ = 16π∫₀^{π/4} (1 − cos 2θ)/2 dθ
= 16π[θ/2 − sin 2θ/4]₀^{π/4} = 16π(π/8 − 1/4) = 2π² − 4π.
(ii) The tangent is parallel to the x-axis where dy/dx = 0 or y + x = 0 or y = −x. Plugging this
into the given equation, we have x² − x² − 2x² + 4 = 0 or x² = 2 or x = ±√2. The points are
(∓√2, ±√2).
Answer to Exercise 409 (9740 N2010/I/6). (i) β ≈ 0.347 and γ ≈ 1.532 (calculator).
(ii) ∣∫_β^γ x³ − 3x + 1 dx∣ = ∣[x⁴/4 − 3x²/2 + x]_β^γ∣ ≈ 0.781417.
(iii) The red line has equation y = 1. It intersects the curve at x = 0 and again at x = −√3.
So the desired area is
∫_{−√3}^{0} x³ − 3x dx = [x⁴/4 − 3x²/2]_{−√3}^{0} = −9/4 + 3 × 3/2 = 9/4.
(iv) The graph of y = x³ − 3x + 1 − k always has maximum and minimum turning
points at x = −1 and x = 1. To ensure that there are three real distinct roots, the maximum
turning point must be above the horizontal axis AND the minimum turning point must be
below the horizontal axis.
Hence, we must have (−1)3 − 3(−1) + 1 − k > 0 or k < 3 AND 13 − 3(1) + 1 − k < 0 or k > −1.
Altogether, k ∈ (−1, 3).
20
Ʌ = 20 - 10e-0.1t
O t
dA/dx = −100(8 + 8k)/x² + 12x
dA/dx = 0 ⇐⇒ −100(8 + 8k) + 12x³ = 0
⇐⇒ x = [100(8 + 8k)/12]^{1/3} = [(200/3)(1 + k)]^{1/3}.
y − p + 1/p = [(p² + 1)/(p² − 1)](x − p − 1/p)
⇐⇒ (p² − 1)y + (p² − 1)(−p + 1/p) = (p² + 1)(x − p − 1/p)
⇐⇒ (p² + 1)x − (p² − 1)y = (p² + 1)(p + 1/p) + (p² − 1)(−p + 1/p) = 4p.
(iii) Observe that x + y = 2t and x − y = 2/t. Hence, (x + y)(x − y) = 4 or x² − y² = 4.
This of course is simply an east-west hyperbola, with horizontal intercepts ±2, asymptotes
y = ±x, centre (0, 0), and lines of symmetry y = 0 and x = 0.
[Figure: the hyperbola x² − y² = 4, with centre (0, 0), horizontal intercepts −2 and 2, lines of symmetry y = 0 and x = 0, and linear asymptotes y = x and y = −x.]
This is the only stationary point. However, as we know, stationary points are either inflexion
points or turning points. We must check that this is indeed a turning point. One way to
do this is through the second derivative test:
d²y/dx² = [√(x + 2)(1.5) − (1.5x + 2)(0.5)(x + 2)^{−0.5}]/(x + 2) = 1.5/√(x + 2) − (dy/dx) ⋅ 0.5/(x + 2).
Evaluated at x = −4/3, the second derivative equals 1.5/√(−4/3 + 2) > 0, so this is a minimum
turning point.
(ii) (a) The equation y² = x²(x + 2) is equivalent to the equation y = ±x√(x + 2).
So the curve may be decomposed into the graphs of two functions f ∶ x ↦ x√(x + 2) and
g ∶ x ↦ −x√(x + 2), both with domain [−2, ∞) and codomain R.
We have f′(x) = (1.5x + 2)/√(x + 2) and g′(x) = −(1.5x + 2)/√(x + 2). Thus, the two possible
values of the gradient at the point where x = 0 are f′(0) = 2/√2 = √2 and g′(0) = −√2.
(b) The easy way is to just use your graphing calculator and copy. But as an exercise,
let’s try to do this without a graphing calculator. We first graph y = x2 (x + 2) =
x3 + 2x2 . This is the cubic with horizontal intercepts −2 and 0 and vertical intercept 0.
dy/dx = 3x2 + 4x = x(3x + 4) and d2 y/dx2 = 6x + 4. So the two stationary points are 0 and
−4/3, the former being a minimum turning point and the latter a maximum turning point.
We now graph y 2 = x2 (x+2). Recall that this is symmetric in the horizontal axis. Moreover,
where x2 (x + 2) < 0, the graph is empty.
Answer to Exercise 413 (9740 N2010/II/3) (iii) The easy way is to just use your
graphing calculator and copy.
But as an exercise, let’s try to do this without a graphing calculator. Observe
that (1.5x + 2)/√(x + 2) = √[(2.25x² + 6x + 4)/(x + 2)]. Hence, we want to graph y = √[2.25x + 1.5 + 1/(x + 2)].
We first graph y = (2.25x² + 6x + 4)/(x + 2) = 2.25x + 1.5 + 1/(x + 2). This is the hyperbola with asymptotes
y = 2.25x + 1.5 and x = −2 and centre (−2, −3).
Now note that the graph of y = √[2.25x + 1.5 + 1/(x + 2)] is simply the region of the graph of
y² = 2.25x + 1.5 + 1/(x + 2) where y > 0.
Note that since the graph of y = (2.25x² + 6x + 4)/(x + 2) has asymptotes y = 2.25x + 1.5 and x = −2,
the graph of y = √[(2.25x² + 6x + 4)/(x + 2)] must also have asymptotes y = √(2.25x + 1.5) and x = −2.
Of course, y = √(2.25x + 1.5) is not an oblique asymptote, but it is an asymptote nonetheless.
[Figure: graph on x- and y-axes, with the vertical asymptote marked.]
∫₀¹ 1/(4 − x²) dx = ∫₀¹ (1/4)[1/(2 − x) + 1/(2 + x)] dx = (1/4)[− ln(2 − x) + ln(2 + x)]₀¹
= (1/4)[0 + ln 3 + ln 2 − ln 2] = (ln 3)/4.
∫₀^{1/(2p)} 1/√(1 − p²x²) dx = [sin⁻¹(px)/p]₀^{1/(2p)} = sin⁻¹(0.5)/p = π/(6p).
So p = 4π/(6 ln 3) = 2π/(3 ln 3).
Answer to Exercise 415 (9740 N2009/I/4). (i) f (27) = f (3) = 5 and f (45) = f (1) =
6. So f (27) + f (45) = 11.
(ii)
y = f (x)
x
O
(iii) ∫_{−4}^{3} f(x) dx = ∫_{−4}^{−2} f(x) dx + ∫_{−2}^{0} f(x) dx + ∫₀² f(x) dx + ∫₂³ f(x) dx
= ∫₀² 7 − x² dx + ∫₂⁴ 2x − 1 dx + ∫₀² 7 − x² dx + ∫₂³ 2x − 1 dx
= 2 × [7x − x³/3]₀² + [x² − x]₂⁴ + [x² − x]₂³
= 2 × (14 − 8/3) + 16 − 4 + 9 − 3 − 2 × (4 − 2) = 36 2/3.
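Here is a rough numerical check of parts (i) and (iii). It is a sketch only: the definition of f used below (7 − x² on (0, 2], 2x − 1 on (2, 4], repeating with period 4) is inferred from the working above rather than quoted from the question, and midpoint_integral is a hypothetical helper.

```python
def f(x):
    """Period-4 function; the two pieces are inferred from the working above."""
    r = x % 4            # reduce x to [0, 4); Python's % is non-negative here
    if r == 0:
        r = 4            # the base interval is taken to be (0, 4]
    return 7 - r**2 if r <= 2 else 2 * r - 1

def midpoint_integral(func, lo, hi, n=70_000):
    """Midpoint rule; n is chosen so the jumps of f fall on subinterval edges."""
    step = (hi - lo) / n
    return sum(func(lo + (k + 0.5) * step) for k in range(n)) * step

total = midpoint_integral(f, -4, 3)
print(f(27) + f(45), total)
```

The printed values should be 11 and approximately 36.667, matching 36 2/3.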
y
y = f (x)
O
x
(ii) f′(x) = e^{−x²} + xe^{−x²}(−2x) = e^{−x²}(1 − 2x²) = 0 ⇐⇒ x = ±√0.5. These are the two
stationary points. Remember that stationary points are either turning points or points of
inflexion. Here we are asked for the turning points. So we need to check whether these
two points are turning points or points of inflexion.
f′′(x) = e^{−x²}(−2x)(1 − 2x²) + e^{−x²}(−4x) = −2xf′(x) − 4e^{−x²}x. f′′(−√0.5) > 0 and f′′(√0.5) <
0, so the former is a minimum turning point and the latter is a maximum turning point.
So the two turning points are (±√0.5, ±√0.5e^{−0.5}).
(iii) ∫₀ⁿ xe^{−x²} dx = 0.5∫₀^{n²} e^{−u} du = −0.5[e^{−u}]₀^{n²} = 0.5(1 − e^{−n²}) ⟹ lim_{n→∞} ∫₀ⁿ xe^{−x²} dx = 0.5.
(ii) dy/dx = (dy/dt) ÷ (dx/dt) = (3t² + 2t)/(2t + 4), which when evaluated at t = 2 is equal to (3 × 2² + 2 × 2)/(2 × 2 + 4) = 2.
So the equation of l is y = 2x + c, where c = 2³ + 2² − 2(2² + 4(2)) = −12. So the equation of
l is y = 2x − 12.
(iii) The intersection points of l and C are given by t³ + t² = 2(t² + 4t) − 12 or t³ − t² − 8t + 12 = 0.
We want to find the solutions to this last equation. We know that t = 2 is one, because P
is an intersection point.
Now write t³ − t² − 8t + 12 = (t − 2)(t² + at + b) = t³ + (a − 2)t² + (b − 2a)t − 2b. So a = 1 and
b = −6. So t³ − t² − 8t + 12 = (t − 2)(t² + t − 6) = (t − 2)(t + 3)(t − 2).
So there is one other intersection point, given by t = −3. This corresponds to Q =
((−3)² + 4(−3), (−3)³ + (−3)²) = (−3, −18).
100
n = 5t 2 - t 3 + 2t + 100
n = 5t 2 - t 3 + t + 100
n = 5t 2 - t 3 + 100
O t
(ii) Under the given model, dn/dt ≥ 0 and n ≤ 150.
∫ 1/(3 − 0.02n) dn = ∫ dt
⟹ 50∫ 1/(150 − n) dn = t + A = −50 ln(150 − n)
⟹ e^{(t+A)/(−50)} = 150 − n
⟹ n = 150 − e^{(t+A)/(−50)} = 150 − Be^{−0.02t}.
7/3 = (8 − a^{1.5})/1.5 ⇐⇒ 7 = 16 − 2a^{1.5} ⇐⇒ 2a^{1.5} = 9 ⇐⇒ a = 4.5^{2/3} ≈ 2.726.
y
y = 1.5 ln (x2 + 1) + 2
y = 1.5 ln (x2 + 1) + 1
y = 1.5 ln (x2 + 1)
O x
∫₀^{1/√3} 1/(1 + 9x²) dx = [tan⁻¹(3x)/3]₀^{1/√3} = (1/3) tan⁻¹√3 = π/9.
(ii) ∫₁ᵉ xⁿ ln x dx = [x^{n+1}/(n + 1) ⋅ ln x]₁ᵉ − ∫₁ᵉ [x^{n+1}/(n + 1)](1/x) dx   (by parts, with u = ln x, v′ = xⁿ)
= e^{n+1}/(n + 1) − ∫₁ᵉ xⁿ/(n + 1) dx = e^{n+1}/(n + 1) − [x^{n+1}/(n + 1)²]₁ᵉ
= e^{n+1}/(n + 1) − (e^{n+1} − 1)/(n + 1)² = e^{n+1}/(n + 1) + (1 − e^{n+1})/(n + 1)².
Moreover, (4 + 3θ² + . . . )^{0.5} = 4^{0.5}(1 + 0.75θ² + . . . )^{0.5} = 2[1 + 0.5 × 0.75θ² + . . . ] = 2 + 0.75θ² + . . . .
Hence, AC ≈ 2 + 0.75θ², as desired.
(b) f(0) = tan(2 × 0 + π/4) = tan(π/4) = 1.
f′(x) = 2 sec²(2x + π/4) and so f′(0) = 2 sec²(π/4) = 4.
f′′(x) = 4 sec²(2x + π/4) tan(2x + π/4) × 2 and so f′′(0) = 8 sec²(π/4) tan(π/4) = 16.
Hence, f(x) = 1 + 4x + 16x²/2! + ⋅⋅⋅ = 1 + 4x + 8x² + . . .
x = −b/(2a) = −30/(2[0.5π(1/2)² − (0.75π + 0.5)]) = 15/[(0.75π + 0.5) − 0.5π(1/2)²]
= 15/(0.75π + 0.5 − π/8) = 120/(5π + 4).
y = 30 − (0.75π + 0.5)x = 30 − (0.75π + 0.5) ⋅ 120/(5π + 4)
= 30 − (90π + 60)/(5π + 4) = (60π + 60)/(5π + 4) = 60(π + 1)/(5π + 4).
y
y = x + x2 + x3 / 6
y = ex sin x
O x
(ii) eˣ sin x = (1 + x + x²/2 + x³/6 + . . . )(x − x³/6 + . . . ) = x + x² + x³/2 − x³/6 + ⋅⋅⋅ = x + x² + x³/3 + . . . .
(iii) See above.
(iv) ∣g(x) − f(x)∣ = 0.5 ⇐⇒ x ≈ −1.96, 1.56 (calculator). Hence, also from our observation
of the graphs of f and g, we have ∣g(x) − f(x)∣ < 0.5 ⇐⇒ −1.96 < x < 1.56.
(ii) π∫₀¹ y² dx = π∫₀¹ x√(1 − x) dx = −π∫₁⁰ (1 − u)√u du = −π[u^{1.5}/1.5 − u^{2.5}/2.5]₁⁰ = 4π/15.
(iii) Apply d/dx to the equation: 2y(dy/dx) = √(1 − x) + x(0.5)(1 − x)^{−0.5}(−1).
At the maximum point, dy/dx = 0. So √(1 − x) = 0.5x(1 − x)^{−0.5} or 1 − x = 0.5x or x = 2/3.
∫₀¹ xe^{−2x} dx = [xe^{−2x}/(−2)]₀¹ − ∫₀¹ e^{−2x}/(−2) dx = −(1/2)e^{−2} + (1/2)[e^{−2x}/(−2)]₀¹   (by parts, with u = x, v′ = e^{−2x})
= −(1/2)e^{−2} − (1/4)(e^{−2} − 1) = 1/4 − (3/4)e^{−2}.
∫_e^{e³} 1/[x(ln x)²] dx = ∫₁³ [1/(eᵗt²)]eᵗ dt = ∫₁³ 1/t² dt = −[1/t]₁³ = 2/3.
y = |x – a |
x
-b (a, 0) b
(ii) ∫_{−b}^{b} ∣x − a∣ dx = ∫_{−b}^{a} a − x dx + ∫_a^b x − a dx = [ax − x²/2]_{−b}^{a} + [x²/2 − ax]_a^b
= a² − a²/2 + ab + b²/2 + b²/2 − ab − a²/2 + a² = a² + b².
∫_a^∞ 1/(4 + x²) dx = ∫_a^∞ (1/4) ⋅ 1/(1 + x²/4) dx = (1/4)[tan⁻¹(x/2)/(1/2)]_a^∞
= (1/2)[tan⁻¹∞ − tan⁻¹(a/2)] = π/4 − 0.5 tan⁻¹(a/2).
∫_{1/2}^{√3/2} 1/√(1 − x²) dx = [sin⁻¹x]_{1/2}^{√3/2} = π/3 − π/6 = π/6.
π/4 − 0.5 tan⁻¹(a/2) = π/6 ⇐⇒ a/2 = tan[2(π/4 − π/6)] = tan(π/6) = 1/√3 ⇐⇒ a = 2/√3.
(ii) ∫ z dz = ∫ (1/x) dx ⟹ z²/2 = ln ∣x∣ + C ⟹ (y²/x²)/2 = ln ∣x∣ + C, provided x ≠ 0.
C = 0.5 (62 /22 ) − ln ∣2∣ = 4.5 − ln 2. So the solution is
Answer to Exercise 433 (9233 N2008/I/13). (i) dy/dx = dy/dt ÷ dx/dt = 3 sin²t cos t/[3 cos²t(− sin t)] = − sin t/cos t.
So the normal to the curve has gradient cos t/sin t. Its equation is cos t(x − cos³t) =
sin t(y − sin³t) or x cos t − y sin t = cos⁴t − sin⁴t, as desired.
(ii) cos⁴t − sin⁴t = (cos²t + sin²t)(cos²t − sin²t) = 1 × cos 2t = cos 2t.
(iii) The horizontal intercept of the normal at P is given by y = 0 and thus x_A cos t =
cos⁴t − sin⁴t = cos 2t or x_A = cos 2t/cos t.
The vertical intercept of the normal at P is given by x = 0 and thus −y_B sin t = cos⁴t − sin⁴t =
cos 2t or y_B = − cos 2t/sin t.
So by the Pythagorean Theorem, the length of AB is √(x_A² + y_B²) or
√[(cos 2t/cos t)² + (− cos 2t/sin t)²] = cos 2t√(1/cos²t + 1/sin²t)
= cos 2t√[(sin²t + cos²t)/(sin²t cos²t)] = cos 2t√[1/(sin²t cos²t)] = cos 2t/(sin t cos t) = cos 2t/(0.5 sin 2t) = 2 cot 2t.
∫₀^{π/3} sin 5x sin x dx = (1/2)∫₀^{π/3} cos 4x − cos 6x dx = (1/2)[sin 4x/4 − sin 6x/6]₀^{π/3} = −√3/16.
2 0 2 4 6 0 16
1 2 2x(−1) 1 2x − 2(x + 2) 1 4
− − = + = −
1 + x x + 2 (x + 2)2 1 + x (x + 2)2 1 + x (x + 2)2
(x + 2)2 − 4(1 + x) x2
= = .
(1 + x)(x + 2)2 (1 + x)(x + 2)2
We are supposed to say that ln(1 + x) ∈ R Ô⇒ 1 + x > 0, and so the above expression is
never negative. But see remark in footnote.99
Answer to Exercise 436 (9740 N2007/I/4). (i) 4∫ 1/(2 − 3I) dI = ∫ dt ⟹
−(4/3) ln ∣2 − 3I∣ = t + C ⟹ I = (2 ± e^{−0.75(t+C)})/3 = 2/3 + Ae^{−0.75t}.
Since t = 0 ⟹ I = 2, we have A = 4/3. Altogether then, I = 2(1 + 2e^{−0.75t})/3.
(ii) As t → ∞, I → 2/3.
99
Author’s remark: The writers of this question made the elementary mistake of failing to state the domain and codomain of
the function. They simply presumed that the codomain MUST somehow be R. But now that we’ve learnt about complex
numbers, there’s no reason why we cannot have, for example, ln(1 + x) ∈ C. In which case we could certainly have 1 + x < 0,
because it turns out that, for example, ln(−1) = πi.
$\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{3\sin^2 t\cos t}{2\cos t\,(-\sin t)} = -1.5\sin t,$
$\frac{d^2y}{dx^2} = \frac{d}{dx}(-1.5\sin t) = \frac{d}{dt}(-1.5\sin t)\frac{dt}{dx} = \frac{-1.5\cos t}{2\cos t\,(-\sin t)} = \frac{3}{4\sin t}$, for $t \neq 0$.
At the endpoints, t = 0 Ô⇒ (x, y) = (1, 0) and t = 0.5π Ô⇒ (x, y) = (0, 1). Altogether
then, we have
(ii) The equation of the tangent at the given point is $y - \sin^3\theta = -1.5\sin\theta\,(x - \cos^2\theta)$. At Q, $y = 0$ and so $x = \frac{2}{3}\sin^2\theta + \cos^2\theta$. At R, $x = 0$ and so $y = 1.5\sin\theta\cos^2\theta + \sin^3\theta$. Altogether then, △OQR has area 0.5 × Base × Height, or:
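(ii) $(4-x)^{1.5}(1+2x^2)^{1.5} = 4^{1.5}(1-0.25x)^{1.5}(1+2x^2)^{1.5}$
$= 8\left[1 + \frac{3}{2}\cdot\frac{-x}{4} + \frac{\frac{3}{2}\cdot\frac{1}{2}\cdot\left(\frac{-x}{4}\right)^2}{2!} + \frac{\frac{3}{2}\cdot\frac{1}{2}\cdot\left(-\frac{1}{2}\right)\cdot\left(\frac{-x}{4}\right)^3}{3!} + \dots\right]\left[1 + \frac{3}{2}(2x^2) + \dots\right]$
$= 8\left(1 - \frac{3}{8}x + \frac{3}{2^7}x^2 + \frac{1}{2^{10}}x^3 + \dots\right)(1 + 3x^2 + \dots) = \left(8 - 3x + \frac{3}{2^4}x^2 + \frac{1}{2^7}x^3 + \dots\right)(1 + 3x^2 + \dots)$
$= 8 - 3x + \frac{3}{2^4}x^2 + \frac{1}{2^7}x^3 + 24x^2 - 9x^3 + \dots = 8 - 3x + 24\tfrac{3}{16}x^2 - 8\tfrac{127}{128}x^3 + \dots$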
(iii) The binomial series expansions in (ii) are valid provided “$|-0.25x| < 1$ AND $|2x^2| < 1$” ⇐⇒ “$x \in (-4, 4)$ AND $x \in (-\sqrt{0.5}, \sqrt{0.5})$” ⇐⇒ “$x \in (-\sqrt{0.5}, \sqrt{0.5})$”.
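$\int_0^{5\pi/3} \sin^2 x\,dx = \int_0^{5\pi/3} \frac{1 - \cos 2x}{2}\,dx = 0.5\left[x - \frac{\sin 2x}{2}\right]_0^{5\pi/3} = 0.5\left(\frac{5\pi}{3} + \frac{\sqrt{3}}{4}\right) = \frac{5\pi}{6} + \frac{\sqrt{3}}{8}.$
$\int_0^{5\pi/3} \cos^2 x\,dx = \int_0^{5\pi/3} (1 - \sin^2 x)\,dx = [x]_0^{5\pi/3} - \left(\frac{5\pi}{6} + \frac{\sqrt{3}}{8}\right) = \frac{5\pi}{6} - \frac{\sqrt{3}}{8}.$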
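(ii)(a) $\int_0^{0.5\pi} \overbrace{x^2}^{u}\,\underbrace{\sin x}_{v'}\,dx = \left[x^2(-\cos x)\right]_0^{0.5\pi} - \int_0^{0.5\pi} 2x(-\cos x)\,dx = 2\int_0^{0.5\pi} \overbrace{x}^{u}\,\underbrace{\cos x}_{v'}\,dx$
$= 2\left(\left[x\sin x\right]_0^{0.5\pi} - \int_0^{0.5\pi} \sin x\,dx\right) = \pi + 2\left[\cos x\right]_0^{0.5\pi} = \pi - 2.$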
(ii)(b) $\pi\int_0^{0.5\pi} \left(x^2\sin x\right)^2 dx \approx 5.391307769139469$ (calculator).
By the way, with a lot of work, it is actually possible to show that the exact area is
π 2 (π 4 + 20π 2 − 120) /320.
4
( 5 ) ( 32 ) ( 12 ) (− 12 ) ( 43 )
2.5 2 405
4 = ⋅⋅⋅ = − .
4! 1024
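$\pi\int_{0.5}^{0.5\sqrt{3}} y^2\,dx = \pi\int_{0.5}^{0.5\sqrt{3}} \frac{1}{1+4x^2}\,dx = \pi\left[\frac{\tan^{-1}(2x)}{2}\right]_{0.5}^{0.5\sqrt{3}} = \frac{\pi}{2}\left(\frac{\pi}{3} - \frac{\pi}{4}\right) = \frac{\pi^2}{24}.$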
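$\int \frac{(\sin^{-1}t)\cos\left[(\sin^{-1}t)^2\right]}{\sqrt{1-t^2}}\,dt = \int \frac{u\cos u^2}{\sqrt{1-\sin^2 u}}\cos u\,du = \int u\cos u^2\,du.$
$\int_0^1 \frac{(\sin^{-1}t)\cos\left[(\sin^{-1}t)^2\right]}{\sqrt{1-t^2}}\,dt = \int_0^{\pi/2} u\cos u^2\,du = \left[\frac{\sin u^2}{2}\right]_0^{\pi/2} = 0.5\sin\frac{\pi^2}{4} \approx 0.312.$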
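(ii) $\int_0^{2\pi} |\cos x - \sin x|\,dx$
$= \int_0^{\pi/4} (\cos x - \sin x)\,dx + \int_{\pi/4}^{5\pi/4} (\sin x - \cos x)\,dx + \int_{5\pi/4}^{2\pi} (\cos x - \sin x)\,dx$
$= [\sin x + \cos x]_0^{\pi/4} + [-\cos x - \sin x]_{\pi/4}^{5\pi/4} + [\sin x + \cos x]_{5\pi/4}^{2\pi}$
$= \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} - 1 - 2\left(\sin\frac{5\pi}{4} + \cos\frac{5\pi}{4}\right) + \left(\sin\frac{\pi}{4} + \cos\frac{\pi}{4}\right) + 1 = 4\sqrt{2}.$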
$\int_1^4 \frac{5x+4}{(x-5)(x^2+4)}\,dx = \int_1^4 \left(\frac{1}{x-5} - \frac{x}{x^2+4}\right)dx = \left[\ln|x-5| - 0.5\ln(x^2+4)\right]_1^4$
$= -\ln 4 - 0.5\ln\frac{20}{5} = -1.5\ln 4 = -\ln 8.$
$2x - 2y\frac{dy}{dx} = \frac{x^2 - y^2}{x} \implies \frac{dy}{dx} = \frac{x^2 + y^2}{2xy}.$
Strictly speaking, we need to also check the case where x = 0, but I doubt this was expected
on the exams, so I’ll just put this bit in a footnote.100
(ii) $\frac{dy}{dx} = v + x\frac{dv}{dx}$. So $v + x\frac{dv}{dx} = -\frac{2vx^2}{x^2 + v^2x^2} = -\frac{2v}{1+v^2} \implies x\frac{dv}{dx} = -\frac{3v + v^3}{1+v^2}.$
(iii) $-\int \frac{1+v^2}{3v+v^3}\,dv = \int \frac{1}{x}\,dx \implies -\frac{1}{3}\ln|3v+v^3| + A = \ln|x|$
$\implies B\,|3v+v^3|^{-1/3} = |x|$
$\iff B^3 = |x|^3\,|3v+v^3| = |3x^3v + x^3v^3| = |3x^2y + y^3|.$
100
If $x = 0$, then $y = 0$ and so from $2x - 2y\frac{dy}{dx} = A$, we see that $\frac{dy}{dx}$ is undefined, except in the special case where $A = 0$. If $A = 0$, then the curve is simply $x^2 - y^2 = 0$ or $y = \pm x$, which is a big cross on the cartesian plane and $\frac{dy}{dx} = \pm 1$. Which can also be written $\frac{dy}{dx} = (x^2 + y^2)/(2xy)$.
Answer to Exercise 448 (9233 N2006/I/8). The tangent is parallel to the x-axis where $\frac{dy}{dx} = 0$. Apply $\frac{d}{dx}$ to the equation of the curve to get: $6x + y + x\frac{dy}{dx} + 2y\frac{dy}{dx} = 0$ or $(x + 2y)\frac{dy}{dx} = -6x - y$.
So $\frac{dy}{dx} = 0 \iff -6x - y = 0$ or $y = -6x$. Now plug $y = -6x$ into the equation of the curve to get $3x^2 + x(-6x) + (-6x)^2 = 33$ or $33x^2 = 33$ or $x = \pm 1$. So the two desired points are $(\pm 1, \mp 6)$.
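$\frac{d\sec\theta}{d\theta} = -(\cos\theta)^{-2}(-\sin\theta) = \sin\theta\,(\cos\theta)^{-2} = \tan\theta\,(\cos\theta)^{-1} = \sec\theta\tan\theta.$
$\int_{\sqrt{2}-1}^{1} \frac{1}{(x+1)\sqrt{x^2+2x}}\,dx = \int_{\pi/4}^{\pi/3} \frac{1}{\sec\theta\sqrt{\sec^2\theta - 1}}\sec\theta\tan\theta\,d\theta = \int_{\pi/4}^{\pi/3} \frac{1}{\tan\theta}\tan\theta\,d\theta = \frac{\pi}{3} - \frac{\pi}{4} = \frac{\pi}{12}.$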
Comparing coefficients, $A + 2C = 1$ (1), $2B - C = 1$ (2), and $A - B = -2$ (3). Take (1) plus 2×(2) plus 4×(3) to get $5A = -5$ or $A = -1$. $C = 1$ and $B = 1$. So $\frac{1 + x - 2x^2}{(2-x)(1+x^2)} = -\frac{1}{2-x} + \frac{x+1}{1+x^2}.$
(ii) $-\frac{1}{2-x} + \frac{x+1}{1+x^2}$
$= -\frac{1}{2}(1 - 0.5x)^{-1} + (x+1)(1+x^2)^{-1}$
$= -\frac{1}{2}\left[1 + (-1)(-0.5x) + (-1)(-2)(-0.5x)^2/2! + \dots\right] + (x+1)\left[1 + (-1)x^2 + \dots\right]$
$= -\frac{1}{2}\left(1 + 0.5x + 0.25x^2 + \dots\right) + (x+1)\left(1 - x^2 + \dots\right)$
$= \left(-0.5 - 0.25x - x^2/8 + \dots\right) + \left(x + 1 - x^2 + \dots\right) = 0.5 + 0.75x - 9x^2/8 + \dots$
(iii) The binomial series expansions in (ii) are valid if “∣−0.5x∣ < 1 and ∣x2 ∣ < 1” ⇐⇒ “x ∈
(−1, 1)”.
(ii) The line through P perpendicular to QR has gradient $qr$. So it has equation $y - \frac{c}{p} = qr(x - cp)$.
Since it passes through V, we have $\frac{c}{v} - \frac{c}{p} = qr(cv - cp) \iff \frac{1}{v} - \frac{1}{p} = qr(v - p) \iff \frac{p - v}{vp} = qr(v - p) \iff \frac{1}{vp} = -qr \iff v = -\frac{1}{pqr}.$
(iii) Observe that $xy = c^2$. Applying the $\frac{d}{dx}$ operator, we have $y + x\frac{dy}{dx} = 0$. So at P, $\frac{c}{p} + cp\frac{dy}{dx} = 0$ or $\frac{dy}{dx} = \frac{-c}{p} \div (cp) = -\frac{1}{p^2}$. So the gradient of the normal at P is $p^2$.
1 c c 2 2 1 2 c c
(iv) Let cs − cp = k. Then − = kp . Plug = into = to get − = kp2 = (cp − cs)p2 or
s p s p
1 1 p − s 1 1
− = (p − s)p2 = or = p2 or s = 3 , as desired.
s p sp sp p
(v) QP has gradient $-\frac{1}{qp}$ and PR has gradient $-\frac{1}{pr}$. Since these two lines are perpendicular, we must have $-\frac{1}{qp} = \frac{-1}{-\frac{1}{pr}} = pr \iff -\frac{1}{qr} = p^2$. But $-\frac{1}{qr}$ is the gradient of QR and $p^2$ is the gradient of the normal at P. So QR is parallel to the normal at P.
$\frac{dz}{dx} = (x^2+32)^{-0.5} + x(-0.5)(x^2+32)^{-1.5}(2x) = (x^2+32)^{-1.5}(x^2 + 32 - x^2) = 32(x^2+32)^{-1.5}.$
(ii) $\int_2^7 (x^2+32)^{-1.5}\,dx = \frac{1}{32}\left[x(x^2+32)^{-0.5}\right]_2^7 = \frac{1}{32}\left[7(81)^{-0.5} - 2(36)^{-0.5}\right] = \frac{1}{32}\left(\frac{7}{9} - \frac{1}{3}\right) = \frac{1}{72}.$
Answer to Exercise 453 (9740 N2015/II/5). (i) The manager may not have all the
required information to properly implement stratified sampling. For example, he may not
know what proportion of the sampling population each age group composes.
(ii) Decide what the age groups are and how many he wishes to survey from each group.
(That is, for each age group, set a quota of respondents to be surveyed.) Then simply go
around surveying customers he sees in the supermarket, until he meets the quota for each
age group.
(iii) The manager may unconsciously gravitate towards customers that look more friendly.
He may thus not get a representative sample of his customers (many of whom look un-
friendly).
Answer to Exercise 454 (9740 N2015/II/6). (i) Let X be the number of red sweets
in the packet.
(ii) X ∼ B(100, 0.25). Since np = 25 > 5 and n(1 − p) > 5, the normal approximation
Y ∼ N (25, 18.75) is suitable. Hence, using also the continuity correction,
$P(X \geq 30) = 1 - P(X < 30) \approx 1 - P(Y < 29.5) = 1 - \Phi\left(\frac{29.5 - 25}{\sqrt{18.75}}\right) \approx 1 - \Phi(1.039) \approx 1 - 0.8506 = 0.1494.$
(iii) Let p = P(X ≥ 30) ≈ 0.1494 and q = 1 − P(X ≥ 30) ≈ 0.8506. Then the desired
probability is
$\binom{15}{0}q^{15} + \binom{15}{1}pq^{14} + \binom{15}{2}p^2q^{13} + \binom{15}{3}p^3q^{12} \approx 0.8245.$
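Both numbers above can be checked without tables. The following is a minimal sketch using only Python's standard library; the function names are illustrative, and reading part (iii) as "at most 3 successes out of 15" is inferred from the four-term sum rather than stated in the text.

```python
from math import comb, sqrt
from statistics import NormalDist

def normal_approx_upper_tail(n, p, k):
    """Approximate P(X >= k) for X ~ B(n, p), using a normal
    approximation with the continuity correction (k - 0.5)."""
    mean = n * p
    var = n * p * (1 - p)
    return 1 - NormalDist(mean, sqrt(var)).cdf(k - 0.5)

def binom_cdf(n, p, k):
    """Exact P(W <= k) for W ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

p_tail = normal_approx_upper_tail(100, 0.25, 30)  # part (ii)
p_at_most_3 = binom_cdf(15, p_tail, 3)            # part (iii), assumed reading
print(round(p_tail, 4), round(p_at_most_3, 4))
```

The small differences from the tabulated values come from rounding Φ to four decimal places.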
(iii) Let F ∼ Po(1.3n). We are given that P(F < 2) < 0.05. That is,
$e^{-1.3n}\left(\frac{(1.3n)^0}{0!} + \frac{(1.3n)^1}{1!}\right) < 0.05 \iff e^{-1.3n}(1 + 1.3n) < 0.05.$
Let $f(n) = e^{-1.3n}(1 + 1.3n)$. From calculator, f(1), f(2), f(3) > 0.05 and f(4) < 0.05. Hence, the smallest possible integer value of n is 4.
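The calculator search can be reproduced with a short loop. This is only a sketch; the names `f` and `smallest_n` are chosen here for illustration.

```python
from math import exp

def f(n, rate=1.3):
    """P(F < 2) = P(F = 0) + P(F = 1) for F ~ Po(rate * n)."""
    lam = rate * n
    return exp(-lam) * (1 + lam)

def smallest_n(threshold=0.05):
    """Smallest positive integer n with f(n) < threshold.
    f is decreasing in n, so the first hit is the answer."""
    n = 1
    while f(n) >= threshold:
        n += 1
    return n

print(smallest_n())
```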
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{(0.80 - 0.8825)^2 + (1.000 - 0.8825)^2 + \dots + (1.000 - 0.8825)^2}{7} \approx 0.005592857.$
(ii) The null hypothesis is H0 ∶ µ0 = 0.9 and the alternative hypothesis is HA ∶ µ0 < 0.9.
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{0.8825 - 0.9}{\sqrt{0.005592857/8}} \approx -0.661860.$
Since $|t| < t_{7,0.1} = 1.415$, we are unable to reject the null hypothesis at the 10% significance level.
= 0.45 + 0.4 + 0.3 − 0.45 ⋅ 0.4 − 0.45 ⋅ 0.3 − P(B ∩ C) + 0.1 = 0.935 − P(B ∩ C)
And if B and C are independent, P(B ∩ C) = 0.4 ⋅ 0.3 = 0.12 and P(A′ ∩ B ′ ∩ C ′ ) = 0.185.
(iii) We know that P(A ∩ B ′ ∩ C) = P(A ∩ C) − P(A ∩ B ∩ C) = 0.135 − 0.1 = 0.035.
We want to find lower and upper bounds for P(B ∩ C). Refer to diagram below.
[Figure: scatter diagram of y against x]
(iii) We are apparently supposed to presume that the greater the PMCC, the “better” or
the “more appropriate”. So we are supposed to use (c) from part (ii).
The estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$. So in this case, the estimated regression equation is
$P - 14.083 = -0.147\left(\sqrt{h} - 140.986\right).$
(iv) Let x be the height given in metres. Then 3x = h. Thus, the above equation may be rewritten as
$P - 14.083 = -0.147\left(\sqrt{3x} - 140.986\right).$
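$P(F > 1600) = 1 - P(F \leq 1600) = 1 - \Phi\left(\frac{1600 - 1500}{\sqrt{5\cdot 20^2}}\right) = 1 - \Phi\left(\sqrt{5}\right) \approx 1 - \Phi(2.236) \approx 1 - 0.9873 = 0.0127.$
$P(F > E) = P(F - E > 0) = 1 - P(F - E \leq 0) = 1 - \Phi\left(\frac{0 - (-100)}{\sqrt{3800}}\right) = 1 - \Phi\left(\frac{10}{\sqrt{38}}\right) \approx 1 - \Phi(1.622) \approx 1 - 0.9476 = 0.0524.$
(iii) $0.85F + 0.9E \sim N\left(0.85\cdot 5\cdot 300 + 0.9\cdot 8\cdot 200,\ 0.85^2\cdot 5\cdot 20^2 + 0.9^2\cdot 8\cdot 15^2\right) = N(2715, 2903).$
$P(0.85F + 0.9E < 2750) = \Phi\left(\frac{2750 - 2715}{\sqrt{2903}}\right) \approx \Phi(0.650) \approx 0.7422.$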
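The mean and variance of such a linear combination are easy to get wrong by hand, so here is a small check. It is a sketch only: `linear_combination` is a hypothetical helper, and it assumes (as the working above does) that F is a sum of 5 independent N(300, 20²) variables and E a sum of 8 independent N(200, 15²) variables.

```python
from math import sqrt, isclose
from statistics import NormalDist

def linear_combination(terms):
    """Each term is (coefficient, count, mean, sd) for a scaled sum of
    `count` independent normal variables. Returns (mean, variance)."""
    mean = sum(c * k * m for c, k, m, s in terms)
    var = sum(c**2 * k * s**2 for c, k, m, s in terms)
    return mean, var

mean, var = linear_combination([(0.85, 5, 300, 20), (0.9, 8, 200, 15)])
prob = NormalDist(mean, sqrt(var)).cdf(2750)
print(mean, var, round(prob, 4))
```

Note that the coefficients are squared in the variance while the counts are not, which is the usual source of slips.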
Answer to Exercise 462 (9740 N2014/II/6). (i) $\binom{3}{1}\binom{8}{4}\binom{5}{2}\binom{6}{4} = 31500.$
(ii) Ways to include only the midfielder brother $= \binom{3}{1}\binom{8}{4}\binom{4}{1}\binom{5}{4}.$
Ways to include only the attacker brother $= \binom{3}{1}\binom{8}{4}\binom{4}{2}\binom{5}{3}.$
(iii) The club now has 3 goalkeepers, 8 defenders, 3 midfielders, 5 attackers, and one player (call him Apu) who can either be a midfielder or a defender.
Ways to form a team without Apu $= \binom{3}{1}\binom{8}{4}\binom{3}{2}\binom{5}{4} = 3150.$
Ways to form a team with Apu as a midfielder $= \binom{3}{1}\binom{8}{4}\binom{3}{1}\binom{5}{4} = 3150.$
Ways to form a team with Apu as a defender $= \binom{3}{1}\binom{8}{3}\binom{3}{2}\binom{5}{4} = 2520.$
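The three cases in (iii) can be recomputed with `math.comb`. This is a sketch with illustrative variable names; the final sum assumes the three cases are mutually exclusive and exhaustive, which follows from how they are defined.

```python
from math import comb

goalkeepers = comb(3, 1)  # 1 goalkeeper out of 3 in every case

# Apu left out: 4 of 8 defenders, 2 of 3 midfielders, 4 of 5 attackers
without_apu = goalkeepers * comb(8, 4) * comb(3, 2) * comb(5, 4)
# Apu fills one midfield slot: only 1 more midfielder is chosen
apu_midfield = goalkeepers * comb(8, 4) * comb(3, 1) * comb(5, 4)
# Apu fills one defence slot: only 3 more defenders are chosen
apu_defence = goalkeepers * comb(8, 3) * comb(3, 2) * comb(5, 4)

total = without_apu + apu_midfield + apu_defence
print(without_apu, apu_midfield, apu_defence, total)
```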
(ii) Let Y be the number of sixes rolled. Then $Y \sim B\left(60, \frac{1}{6}\right)$. We have np > 5 and n(1 − p) > 5. So $Z \sim N\left(10, \frac{50}{6}\right)$ is a suitable approximate distribution for Y. Using also the continuity correction, we have
$P(5 \leq Y \leq 8) \approx P(4.5 < Z < 8.5) = \Phi\left(\frac{8.5 - 10}{\sqrt{50/6}}\right) - \Phi\left(\frac{4.5 - 10}{\sqrt{50/6}}\right) \approx \Phi(-0.520) - \Phi(-1.905)$
(iii) Let A be the number of sixes rolled. Then $A \sim B\left(60, \frac{1}{15}\right)$. We have n > 20 and np < 5. So $B \sim \mathrm{Po}(4)$ is a suitable approximate distribution for A.
$P(5 \leq A \leq 8) \approx P(5 \leq B \leq 8) = e^{-4}\left(\frac{4^5}{5!} + \frac{4^6}{6!} + \frac{4^7}{7!} + \frac{4^8}{8!}\right) \approx 0.349800.$
5! 6! 7! 8!
(ii) It’s not at all clear which is the better model. But apparently we are supposed to say that the second model is better because the magnitude of its PMCC is greater.
In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
(ii) The null hypothesis is not rejected if $\bar{t} > \mu_0 - t_{9,0.1}\cdot k/\sqrt{n} = 4.3 - 1.383\cdot\sqrt{3.2}/\sqrt{10} \approx 3.518$.
(iii) The null hypothesis is rejected if $\bar{t} < \mu_0 - t_{9,0.1}\cdot k/\sqrt{n}$ or $4.0 < 4.3 - 1.383\cdot k/\sqrt{10}$ or $k < 0.3\sqrt{10}/1.383$ or $k^2 < 0.3^2\cdot 10/1.383^2 \approx 0.471$.
Answer to Exercise 466 (9740 N2014/II/10). (i) (a) 0.1 ⋅ 0.2 ⋅ 0.1 = 0.002.
(i) (b) The probability that no ⋆ is displayed is 0.9 ⋅ 0.8 ⋅ 0.9 = 0.648. And so the probability that at least one ⋆ symbol is displayed is 1 − 0.648 = 0.352.
(i) (c) P(× × +) = 0.3 ⋅ 0.1 ⋅ 0.2, P(× + ×) = 0.3 ⋅ 0.3 ⋅ 0.4, P(+ × ×) = 0.4 ⋅ 0.1 ⋅ 0.4.
Thus, the desired probability is 0.006 + 0.036 + 0.016 = 0.058.
(ii) The probability that there is exactly one ⋆ is P(⋆, not ⋆, not ⋆) + P(not ⋆, ⋆, not ⋆) + P(not ⋆, not ⋆, ⋆) = 0.1 ⋅ 0.8 ⋅ 0.9 + 0.9 ⋅ 0.2 ⋅ 0.9 + 0.9 ⋅ 0.8 ⋅ 0.1 = 0.306.
The probability that the symbols are ⋆, +, ◯ (in any order) is
P (⋆ + ◯) + P (⋆◯+) + P (+ ⋆ ◯) + P (◯ ⋆ +) + P (+◯⋆) + P (◯ + ⋆)
= 0.1(0.3 ⋅ 0.3 + 0.4 ⋅ 0.2) + 0.2(0.4 ⋅ 0.3 + 0.2 ⋅ 0.2) + 0.1(0.4 ⋅ 0.4 + 0.2 ⋅ 0.3)
= 0.017 + 0.032 + 0.022 = 0.071
(ii) Let Q ∼ Po(2n). We are given that P(Q < 3) < 0.01. That is, $e^{-2n}\left(1 + 2n + \frac{(2n)^2}{2!}\right) < 0.01$.
Let $f(n) = e^{-2n}(1 + 2n + 2n^2)$. From calculator, f(1), f(2), f(3), f(4) > 0.01 and f(5) < 0.01. Hence, the smallest possible integer value of n is 5.
(iii) Let R ∼ Po(52 ⋅ 11) = Po(572). Since the mean 572 is large, we can use the normal distribution S ∼ N(572, 572) as an approximation. Hence, using also the continuity correction,
$P(R > 550) \approx P(S > 550.5) = 1 - P(S < 550.5) = 1 - \Phi\left(\frac{550.5 - 572}{\sqrt{572}}\right)$
(iv) Sales may be seasonal — e.g. it may be that art collectors make most of their purchases
in the northern hemisphere’s summer months.
The sales of originals and prints may not be independent of each other. E.g., an art collector
who buys an original Picasso might wish to also buy a few copies thereof.
(ii) Stratified sampling is more appropriate. If say 10% of employees are from India, 30%
from China, 20% from Thailand, and 40% from Singapore, then we could instead pick
from the list the first 9 Indian employees, the first 27 Chinese employees, the first 18 Thai
employees, and the first 36 Singaporean employees.
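Answer to Exercise 469 (9740 N2013/II/6). $P(Y < 2a) = P\left(Z < \frac{2a - \mu}{\sigma}\right) = 0.95 \implies \frac{2a - \mu}{\sigma} \approx 1.645 \iff \frac{2a - \mu}{1.645} \approx \sigma.$
$P(Y < a) = P\left(Z < \frac{a - \mu}{\sigma}\right) = 0.25 \implies \frac{a - \mu}{\sigma} \approx -0.674 \iff \mu - a \approx 0.674\sigma \approx 0.674\cdot\frac{2a - \mu}{1.645}$
$\iff \mu\left(1 + \frac{0.674}{1.645}\right) \approx \left(1 + \frac{2\cdot 0.674}{1.645}\right)a \iff \mu \approx 1.29a$. That is, k ≈ 1.29.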
Answer to Exercise 470 (9740 N2013/II/7). (i) The probability that one packet contains a free gift is independent of whether another packet contains a free gift.
There is no possibility that any one packet contains two or more free gifts.
(ii) Let $F \sim B\left(20, \frac{1}{20}\right)$. Then $P(F = 1) = \binom{20}{1}\left(\frac{1}{20}\right)^1\left(\frac{19}{20}\right)^{19} \approx 0.377354.$
(iii) Let $F \sim B\left(60, \frac{1}{20}\right)$. Since n = 60 is large and np = 3 is small, a suitable approximation for F is G ∼ Po(3).
$P(F \geq 5) \approx P(G \geq 5) = 1 - P(G \leq 4) = 1 - e^{-3}\left(\frac{3^0}{0!} + \frac{3^1}{1!} + \frac{3^2}{2!} + \frac{3^3}{3!} + \frac{3^4}{4!}\right) \approx 0.184737.$
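As a numerical check, the sketch below evaluates the binomial probability in (ii) and the Poisson tail in (iii), and also the exact binomial tail for comparison. The helper names are illustrative. Note that P(G ≥ 5) is the complement of P(G ≤ 4), so the sum runs over 0 to 4.

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    """P(F = k) for F ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_cdf(lam, k):
    """P(G <= k) for G ~ Po(lam)."""
    return exp(-lam) * sum(lam**i / factorial(i) for i in range(k + 1))

exact_one_gift = binom_pmf(20, 1 / 20, 1)           # part (ii)
approx_at_least_5 = 1 - poisson_cdf(3, 4)           # part (iii), Poisson
exact_at_least_5 = 1 - sum(binom_pmf(60, 1 / 20, k) for k in range(5))
print(round(exact_one_gift, 6), round(approx_at_least_5, 6))
print(round(exact_at_least_5, 6))  # exact binomial value, for comparison
```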
$P(B') = \frac{P(A' \cap B')}{P(A' \mid B')} = \frac{0.06}{0.12} = 0.5$
The null hypothesis is H0 ∶ µ0 = 13.8 and the alternative hypothesis is HA ∶ µ0 < 13.8.
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{12.8 - 13.8}{\sqrt{2.305714/8}} \approx -1.862697.$
Since ∣t∣ < t7,0.05 = 1.895, we are unable to reject the null hypothesis at the 5% significance
level.
(ii) [Figure: scatter diagram of Distance, y, against Speed, x]
(iii) As a function of speed, the distance travelled decreases at an increasing rate. So (A)
is the most appropriate.
PMCC ≈ −0.939203.
(iv) In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
(ii) The number of ways to choose the two digits so that the second digit is larger than the first is 1 + 2 + ⋅ ⋅ ⋅ + 8 = 36. Hence, the desired probability is $(1 + 2 + \dots + 8)/9^2 = \frac{4}{9} = 0.\dot{4}$.
(iii) The number of ways to choose a code with exactly two letters the same, but not two digits the same is 26 ⋅ 25 ⋅ 3 ⋅ 9 ⋅ 8.
The number of ways to choose a code with exactly two digits the same, but not two letters the same is 9 ⋅ 26 ⋅ 25 ⋅ 24.
Hence the desired probability is $\frac{26\cdot 25\cdot 3\cdot 9\cdot 8 + 9\cdot 26\cdot 25\cdot 24}{26^3\cdot 9^2} = \frac{25\cdot 4}{13^2\cdot 3} = \frac{100}{507} \approx 0.19724.$
(iv) There are 4 ways to choose the even digit, 5 to choose the odd digit then 2 ways to arrange these two digits. Hence, there are 4 ⋅ 5 ⋅ 2 = 40 ways to choose the two digits.
There are 5 ways to choose the vowel. There are $21^2$ ways to choose the two consonants. We can now slot in the vowel amidst the consonants in 3 different ways. Hence, there are $5\cdot 21^2\cdot 3$ ways to choose the three letters.
Altogether then, there are $5\cdot 21^2\cdot 3\cdot 4\cdot 5\cdot 2$ ways to choose a code with exactly one vowel and exactly one even digit.
Hence the desired probability is $\frac{5\cdot 21^2\cdot 3\cdot 4\cdot 5\cdot 2}{26^3\cdot 9^2} = \frac{5\cdot 7^2\cdot 5}{13^3\cdot 3} = \frac{1225}{6591} \approx 0.18586.$
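The two fractions in (iii) and (iv) can be reduced exactly with `fractions.Fraction`. This sketch assumes, as the denominators in the working suggest, that a code consists of three letters (26 choices each) and two digits from 1 to 9; the variable names are illustrative.

```python
from fractions import Fraction

total_codes = 26**3 * 9**2

# (iii) exactly two letters the same with distinct digits,
#       or exactly two digits the same with distinct letters
two_letters_same = 26 * 25 * 3 * 9 * 8
two_digits_same = 9 * 26 * 25 * 24
p_iii = Fraction(two_letters_same + two_digits_same, total_codes)

# (iv) exactly one vowel (5 vowels, 21 consonants, 3 positions)
#      and exactly one even digit (4 even, 5 odd, 2 orders)
one_vowel_one_even = (5 * 21**2 * 3) * (4 * 5 * 2)
p_iv = Fraction(one_vowel_one_even, total_codes)

print(p_iii, float(p_iii))
print(p_iv, float(p_iv))
```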
Condition #1 may not be met if the illness is contagious. If so, we’d expect the number of
people sick on a particular day to depend (positively) on how many were sick the previous
day.
Condition #2 may not be met if the illnesses are seasonal. For example, due to influenza,
illnesses may be more common during the winter than during the summer.
(iii) Let C be the total number of days of absence across both departments, over a 5-day
period. Then C ∼ Po(19.5) and
$P(C > 20) = 1 - P(C \leq 20) = 1 - e^{-19.5}\sum_{i=0}^{20}\frac{19.5^i}{i!} \approx 0.396583.$
i=0 i!
(iv) Let D be the total number of days of absence across both departments, over a 60-day
period. Then D ∼ Po(234). Since λD = 234 is large, the normal distribution is a suitable
approximation. Let E ∼ N (234, 234). Then
(i) (b) P(D∣+) = P(D ∩ +) ÷ P(+) = P(D)P(+∣D) ÷ P(+) = 0.001p ÷ 0.00599 ≈ 0.166110.
0.001p
P(D∣+) = .
0.999 − 0.998p
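(ii) $H_0: \bar{x} \sim N(14.0, 3.8^2)$. Since $Z_{0.025} = 1.96$, the values of $\bar{x}$ for which the null hypothesis would not be rejected are
$\bar{x} \in \left(\mu - Z_{0.025}\frac{\sigma}{\sqrt{n}},\ \mu + Z_{0.025}\frac{\sigma}{\sqrt{n}}\right) = \left(14.0 - 1.96\frac{3.8}{\sqrt{20}},\ 14.0 + 1.96\frac{3.8}{\sqrt{20}}\right) \approx (12.335, 15.665).$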
(ii) There are 3! ways to arrange the 3 brothers as a single unit. Counting the 3 brothers
as a single unit, we have 13 units total, and there are 13! ways to arrange these 13 units.
So, there are in total 3! ⋅ 13! ways to arrange the 15 individuals so that the three brothers
are together. We do not want the three brothers to be together.
Hence, the desired probability is 1−3!⋅13!/15! = 1−6/ (14 ⋅ 15) = 1−1/35 = 34/35 ≈ 0.97142857.
(iii) There are 2 ways to arrange the 2 sisters as a single unit and 3! ways to arrange the 3 brothers as a single unit. Counting the 2 sisters as a single unit and also the 3 brothers as a single unit, we have 12 units in total, and there are 12! ways to arrange these 12 units.
So, there are in total 2 ⋅ 3! ⋅ 12! ways to arrange the 15 individuals so that the 2 sisters are together and the 3 brothers are together.
Hence, the desired probability is 2 ⋅ 3! ⋅ 12!/15! = 12/(13 ⋅ 14 ⋅ 15) = 2/(13 ⋅ 7 ⋅ 5) = 2/455 ≈ 0.0043956.
(iv) Let A and B denote the events that “the sisters are next to each other” and “the brothers are next to each other”. Our desired probability is P(A ∪ B).
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{2}{15} + \frac{1}{35} - \frac{2}{455} = \frac{91\cdot 2}{3\cdot 455} + \frac{13}{455} - \frac{2}{455}$
$= \frac{91\cdot 2}{3\cdot 455} + \frac{33}{3\cdot 455} = \frac{43}{3\cdot 91} = \frac{43}{273} \approx 0.15751.$
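The "treat the group as a single unit" counts can be verified with exact arithmetic. A minimal sketch, with illustrative variable names, for 15 people in a row containing 2 sisters and 3 brothers:

```python
from fractions import Fraction
from math import factorial

total = factorial(15)

# Sisters together: 2 internal orders, 14 units to arrange
p_sisters = Fraction(2 * factorial(14), total)
# Brothers together: 3! internal orders, 13 units to arrange
p_brothers = Fraction(factorial(3) * factorial(13), total)
# Both groups together: 2 * 3! internal orders, 12 units to arrange
p_both = Fraction(2 * factorial(3) * factorial(12), total)

# Inclusion-exclusion for "sisters together or brothers together"
p_either = p_sisters + p_brothers - p_both
print(p_sisters, p_brothers, p_both, p_either, float(p_either))
```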
(ii) The trend is one of steady improvement. After a terrible performance in Week 1, Amy
resolves to work hard. Her work pays off, with her mark improving week after week.
The only deviation from trend occurs on Week 5, because Amy happened to be experi-
menting with drugs that week.
(iii) A linear model would suggest that she eventually breaks the 100% barrier, which is
quite impossible.
A quadratic model would suggest that her mark eventually starts falling and moreover at
an increasing rate, which is quite improbable, unless of course she gets hooked on drugs.
(v) We are supposed to say that the most appropriate choice is wherever the magnitude of
the PMCC is the largest. Hence, L = 92 is the most appropriate.
(vi) In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
(vii) As x → ∞, y → L. An interpretation is thus that L is the best mark she can ever
hope to get, no matter how long she spends studying.
(ii) $P(A = 3) + P(A = 4) = \binom{30}{3}p^3(1-p)^{27} + \binom{30}{4}p^4(1-p)^{26} \approx 0.373068.$
(iii) (a) np = 16.5 > 5 and n(1 − p) = 13.5 > 5 are both large and so yes, the normal distribution N(16.5, 16.5 ⋅ 0.45) would be a suitable approximation for A.
(iii) (b) p is large. And so, while it is certainly possible to use the Poisson distribution as an approximation, it would fare poorly.
(iv) $P(A = 15) = \binom{30}{15}p^{15}(1-p)^{15} \approx 0.06864.$
Thus, $p(1-p) = p - p^2 \approx \left[0.06864 \Big/ \binom{30}{15}\right]^{1/15} \approx 0.237900.$
Rearranging, $p^2 - p + 0.237900 = 0$. By the quadratic formula, p ≈ 0.39, 0.61. Given that p < 0.5, we have p ≈ 0.39.
(iv) Let I ∼ Po(80). Since λ is large, the normal distribution J ∼ N(80, 80) is a suitable approximation. Using also the continuity correction,
$P(I \geq 90) \approx P(J \geq 89.5) = 1 - \Phi\left(\frac{89.5 - 80}{\sqrt{80}}\right) \approx 1 - \Phi(1.062) \approx 1 - 0.8559 = 0.1441.$
(v) Let P ∼ Po(3). Let Z be the number of gold coins and pottery shards found in 50 m2 .
Then Z ∼ Po(190). Since λ is large, the normal distribution Q ∼ N(190, 190) is a suitable
approximation for Z. Using also the continuity correction,
$P(Z \geq 200) \approx P(Q \geq 199.5) = 1 - \Phi\left(\frac{199.5 - 190}{\sqrt{190}}\right) \approx 1 - \Phi(0.6892) \approx 1 - 0.7546 = 0.2454.$
(vi) Let X and Y be, respectively, the numbers of gold coins and pottery shards found in
50 m2 . Then X ∼ Po(40) and Y ∼ Po(150). Our goal is to find P(Y ≥ 3X) = P(Y − 3X ≥ 0).
Since λX = 40 and λY = 150 are both large, the normal distributions A ∼ N(40, 40) and
B ∼ N(150, 150) are suitable approximations for X and Y , respectively. And in turn,
B − 3A ∼ N(150 − 3 ⋅ 40, 150 + 32 ⋅ 40) = N(30, 510) is a good approximation for Y − 3X.
Hence, using also the continuity correction,
$P(Y - 3X \geq 0) \approx P(B - 3A \geq -0.5) = 1 - \Phi\left(\frac{-0.5 - 30}{\sqrt{510}}\right) = \Phi\left(\frac{30.5}{\sqrt{510}}\right) \approx \Phi(1.3506) \approx 0.9116.$
Answer to Exercise 483 (9740 N2011/II/6). (i) Decide what the age groups will
be. Decide how many from each age group are to be interviewed (these are our quotas).
Then pick, at random, residents on the street to be interviewed, until the quota for every
age group is fulfilled.
(ii) Residents who are on the street may not be a representative sample of the population.
(iii) Random sampling. Acquire a complete list of the city suburb’s population. Use a
computer program to randomly pick a sample. Interview this sample.
No, it is not realistic. First, one may not be able to acquire a complete list of the city suburb’s population. Second, one may not be able to contact every member of one’s sample.
(ii) Assumption #1 may not hold because if say n = 100, I may run out of time before I
attempt to contact all 100 different friends.
Assumption #2 may not hold because my friends probably know each other and so they
might be watching a movie together and their handphones are switched off. This would
mean that the probability that one friend is contactable is dependent on whether another
friend is contactable.
(iii) $P(R \geq 6) = 1 - \sum_{i=0}^{5} P(R = i) = 1 - \sum_{i=0}^{5}\binom{8}{i}0.7^i\,0.3^{8-i} \approx 0.551774.$
(iv) Since np = 28 > 5 and n(1 − p) = 12 > 5 are both large, a suitable approximation to R
is the normal distribution S ∼ N (28, 8.4). Using also the continuity correction, we have
$P(R < 25) \approx P(S < 24.5) = \Phi\left(\frac{24.5 - 28}{\sqrt{8.4}}\right) \approx \Phi(-1.2076) = 1 - \Phi(1.2076) \approx 1 - 0.8863 = 0.1137.$
(ii) The PMCC is ≈ −0.992317 which is very large in magnitude. But this merely means
that the correlation between x and y is very strong. It does not also imply that their true
relationship is definitely linear. Indeed in this case, it appears that the relationship is not
linear.
(iii) We are supposed to say that the larger the magnitude of the PMCC, the better the
model. In this case, the PMCC of y and x2 is −0.999984. And so we’re supposed to conclude
that y = a + bx2 is the better model.
(iv) In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
(ii) (a) P(Exactly one faulty) = P(First faulty, second not) + P(Second faulty, first not) =
0.058 (1 − 0.058) + (1 − 0.058) 0.058 = 2 ⋅ 0.058 ⋅ 0.942 = 0.109272.
(ii) (b) Let E be the event “Both made by A” and F be the event “Exactly one faulty”. Then $P(\text{Both made by A} \mid \text{Exactly one faulty}) = P(E \mid F) = \frac{P(E \cap F)}{P(F)}.$
But $P(E \cap F) = P(E)P(F \mid E) = 0.6^2(0.05\cdot 0.95 + 0.95\cdot 0.05) = 0.0342$. Hence $P(E \mid F) = 0.0342/0.109272 \approx 0.312980$.
Answer to Exercise 487 (9740 N2011/II/10). (i) We are given that T ∼ N(38.0, 5.0²).
Let X be the time taken to install the component after background music is introduced.
Assume that X remains normally distributed with standard deviation 5.0 (these are ques-
tionable assumptions, but without these we cannot proceed). That is, X ∼ N (µ0 , 5.02 ).
The null hypothesis is H0 ∶ µ0 = 38.0 and the alternative hypothesis is HA ∶ µ0 < 38.0.
(ii) $Z_{0.05} \approx 1.645$. So to reject the null hypothesis, we must have $\bar{t} < \mu_0 - Z_{0.05}\,\sigma/\sqrt{n} = 38.0 - 1.645\cdot 5.0/\sqrt{50} \approx 36.8$.
(iii) Since the null is not rejected with $\bar{t} = 37.1$, we must have $\bar{t} = 37.1 > \mu_0 - Z_{0.05}\,\sigma/\sqrt{n} = 38.0 - 1.645\cdot 5.0/\sqrt{n}$. Rearranging, $n < (1.645\cdot 5.0/0.9)^2 \approx 83.5$. Thus, n ∈ {1, 2, . . . , 83}.
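The critical value in (ii) and the bound on n in (iii) can be checked directly. A minimal sketch; `critical_mean` is a name introduced here for the one-tailed rejection threshold, and the search range of 1 to 999 is an arbitrary choice that comfortably covers the answer.

```python
from math import sqrt

def critical_mean(mu0, sigma, n, z):
    """Sample means below this value lead to rejecting H0: mu = mu0
    in favour of HA: mu < mu0 (one-tailed z-test)."""
    return mu0 - z * sigma / sqrt(n)

# Part (ii): rejection threshold for n = 50
threshold = critical_mean(38.0, 5.0, 50, 1.645)

# Part (iii): H0 is NOT rejected at a sample mean of 37.1 exactly
# when 37.1 exceeds the threshold. The threshold rises with n.
largest_n = max(n for n in range(1, 1000)
                if critical_mean(38.0, 5.0, n, 1.645) < 37.1)
print(round(threshold, 3), largest_n)
```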
(ii) There are $\binom{18}{r}\binom{12}{10-r}$ ways to choose the committee so that there are exactly r women. Similarly, there are $\binom{18}{r+1}\binom{12}{9-r}$ ways to choose the committee so that there are exactly r + 1 women. We are given that the former number is greater than the latter, i.e.
$\binom{18}{r}\binom{12}{10-r} > \binom{18}{r+1}\binom{12}{9-r}$
$\iff (17-r)!\,(r+1)!\,(3+r)!\,(9-r)! > (18-r)!\,r!\,(2+r)!\,(10-r)!$ (as desired).
Continuing with the algebra, we have $(r+1)(3+r) > (18-r)(10-r) \iff r^2 + 4r + 3 > r^2 - 28r + 180 \iff 32r > 177 \iff r > 5\tfrac{17}{32}$.
We have just proven that P(R = r) > P(R = r + 1) if and only if r = 6, 7, 8, 9. That is,
we have just shown that P(R = 6) > P(R = 7) > P(R = 8) > P(R = 9) > P(R = 10), but
P(R = 0) ≤ P(R = 1) ≤ P(R = 2) ≤ P(R = 3) ≤ P(R = 4) ≤ P(R = 5) ≤ P(R = 6).
We have thus shown that 6 is a most-probable-number-of-women and that 7, 8, 9, 10 are
not. We must rule out that 5 (or any smaller number) is a most-probable-number-of-women.
But clearly, 6!4! ≠ 5!5!, so that
$\binom{18}{6}\binom{12}{4} \neq \binom{18}{5}\binom{12}{5}.$
Hence, it is indeed the case that P(R = 5) < P(R = 6). Thus, 6 is indeed the unique
most-probable-number-of-women.
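The conclusion can also be confirmed by brute force: tabulate the number of committees for each possible number of women and take the largest. A sketch, assuming (as in the working above) 18 women, 12 men and a committee of 10; the variable names are illustrative.

```python
from math import comb

# Number of 10-person committees with exactly r women,
# drawn from 18 women and 12 men
ways = {r: comb(18, r) * comb(12, 10 - r) for r in range(11)}

most_probable = max(ways, key=ways.get)
print(most_probable, ways[most_probable])
```

Since every committee is equally likely, comparing counts is the same as comparing probabilities.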
$P(X \geq 8) = 1 - P(X \leq 7) = 1 - e^{-4.8}\sum_{i=0}^{7}\frac{4.8^i}{i!} \approx 0.113334.$
(ii) Let Y be the number of people who join the queue in a period of t minutes. Then Y ∼ Po(1.2t/60) = Po(0.02t). We are told that P(Y ≤ 1) = 0.7. That is, $e^{-0.02t}(1 + 0.02t) = 0.7$.
By calculator, t ≈ 54.8675.
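Where the text says "by calculator", any root-finder will do. The sketch below uses plain bisection on g(t) = e^(−0.02t)(1 + 0.02t) − 0.7; the bracket [0, 200] is an assumption chosen so that g changes sign across it, and the function names are illustrative.

```python
from math import exp

def g(t):
    """P(Y <= 1) - 0.7 for Y ~ Po(0.02 t)."""
    lam = 0.02 * t
    return exp(-lam) * (1 + lam) - 0.7

def bisect(func, lo, hi, tol=1e-9):
    """Bisection root-finder; func(lo) and func(hi) must differ in sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

t = bisect(g, 0.0, 200.0)
print(round(t, 4))
```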
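(iii) Let Z be the number of people who leave the queue over 15 minutes. Then Z ∼ Po(27).
Let B be the number of people who join the queue over 15 minutes. Then B ∼ Po(18).
Since both means are large, Z and B are suitably approximated by A ∼ N(27, 27) and C ∼ N(18, 18), so that A − C ∼ N(9, 45). Using also the continuity correction,
$P(Z - B \leq 11) \approx P(A - C \leq 11.5) = \Phi\left(\frac{11.5 - 9}{\sqrt{45}}\right) \approx \Phi(0.3727) \approx 0.6453.$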
(iv) There might be certain periods of time when more planes arrive and other periods when
fewer arrive. So the rate at which people join the queue will probably not be constant.
Answer to Exercise 490 (9740 N2010/II/5). (i) Say we wish to stratify the specta-
tors by age group. One problem is that we may not know what proportion of the spectators
belongs to each age group. As such, it may be difficult to get a representative sample.
(ii) Order the spectators by their names, alphabetically. Choose every 100th spectator on
the list to survey.
$\bar{t} = \frac{\sum t}{n} = \frac{454.3}{11} = 41.3,$
$s^2 = \frac{\sum t^2 - \left(\sum t\right)^2/11}{n-1} = \frac{18779.43 - 454.3^2/11}{10} = 1.684.$
(ii) The null hypothesis is H0 ∶ µ0 = 42.0 and the alternative hypothesis is HA ∶ µ0 ≠ 42.0.
$T = \frac{\bar{t} - \mu_0}{s/\sqrt{n}} = \frac{41.3 - 42.0}{\sqrt{1.684/11}} \approx -1.789.$
Since $|T| < t_{10,0.05} = 1.812$, we are unable to reject the null hypothesis.
(ii) The first three digits are odd and there are 3! ways to arrange them. The last two are
even and there are 2! ways to arrange them. The total number of ways to arrange the five
digits is 5!. Answer: 3!2!/5! = 1/10 = 0.1.
(iii) If the first digit is 3, the last digit must be 1 or 5, and in each case, there are 3! ways
to arrange the middle 3 digits.
Similarly, if the first digit is 5, the last digit must be 1 or 3, and in each case, there are 3!
ways to arrange the middle 3 digits.
If the first digit is 4, the last digit can be 1, 3, or 5, and in each case, there are 3! ways to
arrange the middle 3 digits.
Altogether then, there are 7 ⋅ 3! ways to get such a number and the desired probability is
7 ⋅ 3!/5! = 7/20 = 0.35.
Answer to Exercise 494 (9740 N2010/II/9). (i) Our desired probability is P(Y > 2X) = P(Y − 2X > 0). Now, $Y - 2X \sim N\left(400 - 2\cdot 180,\ 60^2 + 2^2\cdot 30^2\right) = N(40, 7200)$. So
$P(Y - 2X > 0) = 1 - \Phi\left(\frac{0 - 40}{\sqrt{7200}}\right) \approx 1 - \Phi(-0.4714) = \Phi(0.4714) \approx 0.6813.$
$0.12X + 0.05Y \sim N\left(0.12\cdot 180 + 0.05\cdot 400,\ 0.12^2\cdot 30^2 + 0.05^2\cdot 60^2\right) = N(41.6, 21.96)$
$\implies P(0.12X + 0.05Y > 45) = 1 - \Phi\left(\frac{45 - 41.6}{\sqrt{21.96}}\right) \approx 1 - \Phi(0.7255) \approx 1 - 0.7658 = 0.2342.$
$0.12X_1 + 0.12X_2 \sim N\left(0.12\cdot 180 + 0.12\cdot 180,\ 0.12^2\cdot 30^2 + 0.12^2\cdot 30^2\right) = N(43.2, 25.92)$
$P(0.12X_1 + 0.12X_2 > 45) = 1 - \Phi\left(\frac{45 - 43.2}{\sqrt{25.92}}\right) \approx 1 - \Phi(0.3536) \approx 1 - 0.6381 = 0.3619.$
25.92
(iii) We are, as usual, supposed to say that the larger the magnitude of the PMCC, the
better the model. So F = c + dv 2 is the better model.
(iv) In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
And $F = 26.0 \iff x \approx \sqrt{(26.0 - 3.195652)/0.0242420} \approx 30.7.$
$P(X = 8) = e^{-12}\frac{12^8}{8!} \approx 0.0655233.$
(ii) Let Y be the number of calls received in a randomly chosen period of t seconds. Then
Y ∼ Po(3t/60) = Po(0.05t) and P(Y = 0) = e−0.05t = 0.2. So t = (ln 0.2) /(−0.05) ≈ 32.
(iii) Let Z be the number of calls received in a randomly chosen period of 12 hours.
Then Z ∼ Po(2160) and a suitable approximation therefor is the normal distribution A ∼
N (2160, 2160). Hence, using also the continuity correction,
$P(Z > 2200) \approx P(A \geq 2200.5) = 1 - \Phi\left(\frac{2200.5 - 2160}{\sqrt{2160}}\right) \approx 1 - \Phi(0.8714) \approx 1 - 0.8082 = 0.1918.$
(iv) $\binom{6}{2}0.1918^2\,0.8082^4 \approx 0.2354.$
(v) Let B be the number of busy days out of 30. Since np ≈ 5.754 > 5 and n(1 − p) > 5, a
suitable approximation to B is the normal distribution C ∼ N (5.754, 4.650). So using also
the continuity correction,
$P(B \leq 10) \approx P(C \leq 10.5) = \Phi\left(\frac{10.5 - 5.754}{\sqrt{4.650}}\right) \approx \Phi(2.201) \approx 0.9861.$
Answer to Exercise 497 (9740 N2009/II/5). Simply survey people standing outside
the theatre waiting for the movie to start. Stop once the quota of 100 persons is met.
A disadvantage is that this may not be a representative sample. For example, there will be
no late-comers in our sample of 100.
(ii) No. A linear model would imply that several centuries hence, the time taken to run a
mile would be negative, which is clearly impossible.
The scatter diagram similarly suggests that the rate of improvement is tapering off, rather
than linear.
(iii) A quadratic model would imply that the world record time taken to run a mile eventu-
ally bottoms out, then starts increasing. But by definition, it is impossible that the world
record time increases.
(iv) In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i\hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
t(2010) ≈ e−0.0161280(2010)+34.853071 ≈ 11.4. So the predicted world record time on 1st January
2010 is 3 m 41.4 s.
Our range of data is 1930-2000. We are extrapolating our data, which might not always
work out reliably.
f ′ (p) = 7.5(3 + 0.02p)−2 (0.02) > 0. This shows that the probability that a randomly chosen
component that is faulty was supplied by A is increasing in the percentage of electronic
components bought from A. Which is not very surprising.
Answer to Exercise 500 (9740 N2009/II/8). (i) We have 8 letters total, 3 of which
are repeated. Hence, there are 8!/3! = 6720 possible permutations.
(ii) Let TD or DT be a single letter. Then we have 7 “letters” total, 3 of which are
repeated, so there are 2! × 7!/3! possible permutations that we do not want. So there are
6720 − 2! × 7!/3! = 5040 possible permutations that we do want.
$X = M_1 + \dots + M_{21} + S_1 + \dots + S_{24} \sim N\left(21\cdot 2.5 + 24\cdot 2.0,\ 21\cdot 0.1^2 + 24\cdot 0.08^2\right) = N(100.5, 0.3636).$
Now, $P(X \leq 100) = \Phi\left(\frac{100 - 100.5}{\sqrt{0.3636}}\right) \approx 1 - \Phi(0.8292) \approx 1 - 0.7964 = 0.2036.$
(iii) Again assuming the thicknesses of the textbooks are independently distributed, our desired probability is $P(S_1 + S_2 + S_3 + S_4 < 3M) = P(S_1 + S_2 + S_3 + S_4 - 3M < 0)$. Now, $S_1 + S_2 + S_3 + S_4 - 3M \sim N\left(4\cdot 2.0 - 3\cdot 2.5,\ 4\cdot 0.08^2 + 3^2\cdot 0.1^2\right) = N(0.5, 0.1156)$. Hence,
$P(S_1 + S_2 + S_3 + S_4 - 3M < 0) = \Phi\left(\frac{0 - 0.5}{\sqrt{0.1156}}\right) \approx 1 - \Phi(1.4706) \approx 1 - 0.9293 = 0.0707.$
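$\bar{x} = \frac{\sum x}{n} = \frac{86.4}{9} = 9.6,$
$s^2 = \frac{\sum x^2 - \left(\sum x\right)^2/n}{n-1} = \frac{835.92 - 86.4^2/9}{8} \approx 0.81.$
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{9.6 - 10}{\sqrt{0.81/9}} = -\frac{4}{3}.$
Since $|t| < t_{8,0.025} = 2.306$, we are unable to reject the null hypothesis.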
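The same three steps (sample mean, unbiased variance estimate, test statistic) recur in several of these answers, so a small helper may be useful. This is a sketch; `t_statistic` is a name made up here, and it works from the summary totals Σx and Σx² rather than the raw data.

```python
from math import sqrt

def t_statistic(n, sum_x, sum_x2, mu0):
    """Return (t, sample mean, unbiased variance estimate) from
    the summary statistics n, sum of x and sum of x squared."""
    mean = sum_x / n
    s2 = (sum_x2 - sum_x**2 / n) / (n - 1)
    t = (mean - mu0) / sqrt(s2 / n)
    return t, mean, s2

t, mean, s2 = t_statistic(9, 86.4, 835.92, 10)
print(round(mean, 4), round(s2, 4), round(t, 4))
```

The critical value (here 2.306 for 8 degrees of freedom at the 2.5% level in each tail) still has to come from tables or a calculator, since the standard library has no t-distribution.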
The sample size is small. And so we are unable to appeal to the CLT and claim that a
normal distribution is a suitable approximate distribution for x̄.
(Author’s remark: It actually makes no sense to say that “the CLT does not apply in this
context”. The CLT certainly applies. It is merely that the normal distribution is a poor
approximation for the sample mean.)
$$= \binom{20}{4} 0.15^4\, 0.85^{16} + \binom{20}{5} 0.15^5\, 0.85^{15} + \binom{20}{6} 0.15^6\, 0.85^{14} + \binom{20}{7} 0.15^7\, 0.85^{13} \approx 0.346354.$$
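A sum of binomial terms like this one can be evaluated directly rather than with a graphing calculator. A minimal sketch, where `binom_pmf` is an illustrative helper name:

```python
from math import comb


def binom_pmf(k, n, p):
    """P(R = k) for R ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


n, p = 20, 0.15
prob_4_to_7 = sum(binom_pmf(k, n, p) for k in range(4, 8))   # k = 4, 5, 6, 7
print(round(prob_4_to_7, 6))
```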
(iii) Since $np$ and $n(1 - p)$ are large, a suitable approximation to $R$ is the normal distribution
$X \sim N(72, 50.4)$. Hence, using also the continuity correction,
$$P(R < 60) \approx P(X < 59.5) = \Phi\left(\frac{59.5 - 72}{\sqrt{50.4}}\right) \approx 1 - \Phi(1.761) \approx 1 - 0.9609 = 0.0391.$$
(iv) Since $n$ is large and $p$ is small, a suitable approximation to $R$ is the Poisson distribution
$Y \sim \mathrm{Po}(4.8)$. Hence,
$$P(R = 3) \approx e^{-4.8}\,\frac{4.8^3}{3!} \approx 0.152.$$
(v) $$P(R = 0) + P(R = 1) = \binom{20}{0} p^0 (1 - p)^{20} + \binom{20}{1} p^1 (1 - p)^{19} = (1 - p)^{19}(1 - p + 20p) = 0.2.$$
By calculator, $p \approx 0.142432$.
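The "by calculator" step amounts to solving one equation in $p$ numerically. One possible way to do this is bisection, sketched below; the function names and the tolerance are illustrative assumptions, and any root-finder would do.

```python
def prob_at_most_one(p, n=20):
    """P(R = 0) + P(R = 1) for R ~ B(n, p)."""
    return (1 - p)**n + n * p * (1 - p)**(n - 1)


def solve_p(target=0.2, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection: prob_at_most_one falls from 1 (at p = 0) to 0 (at p = 1)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_most_one(mid) > target:
            lo = mid     # probability still too high, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2


p_hat = solve_p()
print(round(p_hat, 6))
```

Bisection is safe here because the left-hand side is strictly decreasing in $p$ on $(0, 1)$, so there is exactly one root.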
(ii) We might want each level to be equally well-represented. For example, we might like
approximately one-sixth of the sample to be from Primary 1, another sixth from Primary
2, etc.
In which case we’d probably prefer to do a stratified sample. The method might be
something like this: Pick from the aforementioned ordered list the first 108 Primary 1 students,
the first 108 Primary 2 students, etc.
Answer to Exercise 505 (9740 N2008/II/6). Let the mass of calcium in a bottle
(after the extreme weather) be $X \sim N(\mu_0, \sigma^2)$. (We have made the necessary assumption
that X is normally distributed.)
The null hypothesis is $H_0: \mu_0 = 78$ and the alternative hypothesis is $H_A: \mu_0 \neq 78$. Now,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{\sum x/n - 78}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]/(n - 1)}\,/\sqrt{n}} \approx -1.207.$$
Since $|t| < t_{14,0.025} \approx 2.145$, we are unable to reject the null hypothesis.
Answer to Exercise 506 (9740 N2008/II/7). (i) Let $A_1$ denote the event that A wins
the first set. Similarly define $A_2$, $A_3$, $B_1$, $B_2$, and $B_3$. $P(A_2) = P(A_1 \cap A_2) + P(B_1 \cap A_2) =
0.6 \cdot 0.7 + 0.4 \cdot 0.2 = 0.5$.
(ii)
(iii) Without P , it appears that t is increasing, but at a decreasing rate. So a log model
might be appropriate.
So for the model $t = a + b \ln x$, the least-squares estimates are $a \approx 1.4$ and $b \approx 4.4$.
(vi) This would be an extrapolation of the data, which may or may not be wise.
(ii) Let $Y$ be the total number of pianos sold in a given week. Then $Y \sim \mathrm{Po}(4.4)$.
$P(Y = 4) = e^{-4.4}\, 4.4^4/4! \approx 0.191736$.
(iii) Let $Z$ be the number of grand pianos sold in 50 weeks. Then $Z \sim \mathrm{Po}(90)$. Since $\lambda_Z$
is large, a suitable approximation is the normal distribution $A \sim N(90, 90)$. Hence, using
also the continuity correction,
$$P(Z < 80) \approx P(A < 79.5) = \Phi\left(\frac{79.5 - 90}{\sqrt{90}}\right) \approx 1 - \Phi(1.1068) \approx 1 - 0.8657 = 0.1343.$$
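To see how good the normal approximation with continuity correction is here, the exact Poisson probability can be summed directly and compared. This is an illustrative sketch; `poisson_cdf` is a hand-rolled helper, not a library function.

```python
from math import exp, sqrt
from statistics import NormalDist


def poisson_cdf(k, lam):
    """P(Z <= k) for Z ~ Po(lam), summing the pmf term by term."""
    term = exp(-lam)          # P(Z = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i       # P(Z = i) obtained from P(Z = i - 1)
        total += term
    return total


lam = 90
exact = poisson_cdf(79, lam)                     # P(Z < 80) = P(Z <= 79)
approx = NormalDist(lam, sqrt(lam)).cdf(79.5)    # with continuity correction

print(round(exact, 4), round(approx, 4))
```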
(iv) An organisation might buy a relatively large number of grand pianos on any given
day. So it is not likely that the rate at which grand pianos are sold is constant throughout
the year.
(ii) $\binom{9}{8} = 9$.
(iii) $\binom{5}{4}\binom{7}{4} + \binom{5}{5}\binom{7}{3} = 5 \cdot 35 + 1 \cdot 35 = 210$.
• No diplomats from K (i.e. only diplomats from L and M) is $\binom{9}{8}$;
• No diplomats from L is $\binom{8}{8}$;
• No diplomats from M is 0.
The total number of ways to choose the diplomats is $\binom{12}{8}$. Hence the number of ways to
have at least 1 diplomat from each island is
$$\binom{12}{8} - \left[\binom{9}{8} + \binom{8}{8}\right] = 495 - (9 + 1) = 485.$$
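The complement count can be cross-checked by listing every group of 8 and keeping those with all three islands represented. The island sizes below (K: 3, L: 4, M: 5) are an assumption inferred from the binomial coefficients in this answer, not quoted from the question.

```python
from itertools import combinations

# Assumed island sizes, inferred from the binomial coefficients above.
ISLAND_SIZES = {"K": 3, "L": 4, "M": 5}

# Label each diplomat by island and an index, e.g. ("K", 0).
diplomats = [(island, i) for island, size in ISLAND_SIZES.items() for i in range(size)]


def all_islands_represented(group):
    """True if the group contains at least one diplomat from every island."""
    return {island for island, _ in group} == set(ISLAND_SIZES)


groups = list(combinations(diplomats, 8))
valid = sum(1 for g in groups if all_islands_represented(g))

print(len(groups), valid)
```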
(ii) $X_1 - X_2 \sim N(0, 2 \cdot 8^2)$. So
$$P(X_1 > X_2 + 15) = P(X_1 - X_2 > 15) = 1 - \Phi\left(\frac{15 - 0}{\sqrt{2 \cdot 8^2}}\right) \approx 1 - \Phi(1.3258) \approx 1 - 0.9075 = 0.0925.$$
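(iii) $$P(Y < 74) = \Phi\left(\frac{74 - \mu}{\sigma}\right) = 0.0668 \iff \frac{74 - \mu}{\sigma} = -1.5.$$
$$P(Y > 146) = 1 - \Phi\left(\frac{146 - \mu}{\sigma}\right) = 0.0668 \iff \Phi\left(\frac{146 - \mu}{\sigma}\right) = 0.9332 \iff \frac{146 - \mu}{\sigma} = 1.5.$$
$$\frac{146 - \mu}{\sigma} - \frac{74 - \mu}{\sigma} = \frac{72}{\sigma} = 1.5 - (-1.5) = 3 \iff \sigma = 24 \text{ and } \mu = 110.$$
Since $\sigma = 8a$ and $\mu = 50a + b$, $a = 3$ and $b = -40$.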
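The table look-ups and the pair of simultaneous equations can be reproduced with the inverse normal CDF. A minimal sketch, assuming (as in the answer above) that $\sigma = 8a$ and $\mu = 50a + b$; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()                   # N(0, 1)
z_low = std_normal.inv_cdf(0.0668)          # (74 - mu) / sigma, about -1.5
z_high = std_normal.inv_cdf(1 - 0.0668)     # (146 - mu) / sigma, about 1.5

# Subtracting the two standardised equations eliminates mu.
sigma = (146 - 74) / (z_high - z_low)
mu = 74 - z_low * sigma

a = sigma / 8
b = mu - 50 * a

print(round(sigma, 2), round(mu, 2), round(a, 2), round(b, 2))
```

The results differ from 24, 110, 3 and -40 only in the later decimal places, because 0.0668 is itself a rounded table value.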
$p_B p_C = 0.25 \cdot 0.4 = 0.1 = p_{B \cap C}$, so that by definition, B and C are indeed independent.
(ii) Let Y be the number of times the machine will break down in a period of four weeks.
Then Y ∼ Po(12).
(iii) Let $Z$ be the number of times the machine will break down in a period of 16 weeks.
Then $Z \sim \mathrm{Po}(48)$. Since $\lambda_Z$ is large, a suitable approximation for $Z$ is the normal
distribution $A \sim N(48, 48)$. Hence, using also the continuity correction,
$$P(Z > 50) \approx P(A > 50.5) = 1 - \Phi\left(\frac{50.5 - 48}{\sqrt{48}}\right) \approx 1 - \Phi(0.3608) \approx 1 - 0.6409 = 0.3591.$$
Answer to Exercise 514 (9233 N2008/II/27). (i) Let the mass after the adjustment
be $X \sim N(\mu_0, \sigma^2)$. It is necessary to assume that these masses remain normally distributed.
The null hypothesis is $H_0: \mu_0 = 32.40$ and the alternative hypothesis is $H_A: \mu_0 \neq 32.40$.
Now,
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{32.00 - 32.40}{\sqrt{2.892/80}} \approx -2.104.$$
Since $|t| > t_{79,0.025} \approx 1.99$, we can reject the null hypothesis.
(ii) This means that if H0 were true and we tested infinitely many size-80 samples (as done
above), we’d reject H0 in 5% of the samples.
(iii) The one-tailed p-value is ≈ 0.0193. So the least level of significance is 1.93%.
$$P(X > 55) = 1 - \Phi\left(\frac{55 - 50}{4}\right) = 1 - \Phi(1.25) \approx 1 - 0.8944 = 0.1056.$$
Assuming that whether he is late on any given day is independent of whether he is
late on any other day, the probability that he will be late no more than once in 5 days is
$$\binom{5}{0} 0.1056^0\, 0.8944^5 + \binom{5}{1} 0.1056^1\, 0.8944^4 \approx 0.910.$$
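This answer chains a normal probability into a binomial one, which is straightforward to reproduce. The sketch below assumes, as the working above implies, a journey time $X \sim N(50, 4^2)$ and lateness meaning $X > 55$; the names are illustrative.

```python
from math import comb
from statistics import NormalDist

journey = NormalDist(50, 4)          # journey time, mean 50 and s.d. 4
p_late = 1 - journey.cdf(55)         # probability of being late on one day

days = 5
# Late on 0 or 1 of the 5 days, treating days as independent.
p_at_most_once = sum(
    comb(days, k) * p_late**k * (1 - p_late)**(days - k) for k in range(2)
)

print(round(p_late, 4), round(p_at_most_once, 3))
```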
(ii) Let $Y \sim N(40, 5^2)$. Our desired probability is $P(X - Y - 5 < 0)$. Assuming the journey
times of Messrs Sim and Lee are independent, $X - Y - 5 \sim N(5, 4^2 + 5^2)$. Thus,
$$P(X - Y - 5 < 0) = \Phi\left(\frac{0 - 5}{\sqrt{4^2 + 5^2}}\right) \approx 1 - \Phi(0.7809) \approx 1 - 0.7826 = 0.2174.$$
(iii) Assume that the journey times of Messrs Sim and Lee each day are independent. Then
the desired probability is
(ii) Let $X \sim N(\mu, \sigma^2)$. Then
$$P(\mu - 2 \le X \le \mu + 2) = 0.8 \implies P(X \le \mu + 2) = 0.9 \iff \Phi\left(\frac{2}{\sigma}\right) = 0.9 \iff \frac{2}{\sigma} \approx 1.281 \iff \sigma \approx 1.56.$$
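(iii) Let $\bar{X} \sim N(\mu, \sigma^2/n)$. Then
$$P(\bar{X} \ge \mu + 0.50) \le 0.1 \iff 1 - \Phi\left(\frac{0.50}{\sigma/\sqrt{n}}\right) \le 0.1 \iff 0.9 \le \Phi\left(\frac{0.50\sqrt{n}}{\sigma}\right) \iff \frac{0.50\sqrt{n}}{\sigma} \gtrsim 1.281 \iff \frac{0.50\sqrt{n}}{\sigma} \ge \frac{2}{\sigma} \iff 0.50\sqrt{n} \ge 2 \iff n \ge 16.$$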
(ii) Yes. If say the teacher teaches 10 different classes, we could stratify our sample by
class and pick 1 student from each class.
$$\binom{10}{0} 0.24^0\, 0.76^{10} + \binom{10}{1} 0.24^1\, 0.76^9 + \dots + \binom{10}{4} 0.24^4\, 0.76^6 \approx 0.933.$$
(ii) Let X ∼ B(1000, 0.24) be the number of people in a sample of 1000 that have gene A.
Since np = 240 > 5 and n(1 − p) = 760 > 5 are both large, a suitable approximation for X is
the normal distribution Y ∼ N (240, 182.4). Hence, using also the continuity correction,
(iii) Let $Z \sim B(1000, 0.003)$ be the number of people in a sample of 1000 that have gene B.
Since $n$ is large and $p$ is small, a suitable approximation for $Z$ is the Poisson distribution
$A \sim \mathrm{Po}(3)$. Hence,
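$$\bar{x} = \frac{\sum x}{n} = \frac{4626}{150} = 30.84 \quad \text{and} \quad s^2 = \frac{\sum x^2 - (\sum x)^2/n}{n - 1} \approx 33.7259.$$
(ii) Let $H_0: \mu_0 = 30$ and $H_A: \mu_0 > 30$ be the null and alternative hypotheses. Now,
$$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \approx \frac{30.84 - 30}{\sqrt{33.7259/150}} \approx 1.772.$$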
(iii) We used the Z-test. The sample size is large, so the normal distribution is a good
approximation provided the underlying distribution is “nice enough”.
$$P(3C > 7) = 1 - \Phi\left(\frac{7 - 6.6}{1.5}\right) = 1 - \Phi\left(\frac{4}{15}\right) \approx 0.3949.$$
(ii) Let $T$ be the weight of a randomly chosen turkey. Then $T \sim N(10.5, 2.1^2)$. Then
$5T \sim N(5 \cdot 10.5, 5^2 \cdot 2.1^2) = N(52.5, 10.5^2)$ and
$$P(5T > 55) = 1 - \Phi\left(\frac{55 - 52.5}{10.5}\right) = 1 - \Phi\left(\frac{5}{21}\right) \approx 0.405904.$$
$$P(3C + 5T > 62) = 1 - \Phi\left(\frac{62 - 59.1}{\sqrt{112.5}}\right) \approx 1 - \Phi(0.2734) \approx 0.392.$$
(iv) The event “chicken costs more than $7 and turkey costs more than $55” is a
proper subset of the event “chicken and turkey together cost more than $62”. By the
monotonicity of probability, the probability of the latter is greater than that of the former.
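The subset argument can be illustrated numerically. The sketch below takes the distributions of the two costs from the working above, $3C \sim N(6.6, 1.5^2)$ and $5T \sim N(52.5, 10.5^2)$, and assumes the two costs are independent (which the variance $112.5 = 1.5^2 + 10.5^2$ already presupposes); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

cost_chicken = NormalDist(6.6, 1.5)      # 3C
cost_turkey = NormalDist(52.5, 10.5)     # 5T

# Sum of independent normals: means add and variances add.
cost_total = NormalDist(
    cost_chicken.mean + cost_turkey.mean,
    sqrt(cost_chicken.variance + cost_turkey.variance),
)

p_chicken = 1 - cost_chicken.cdf(7)
p_turkey = 1 - cost_turkey.cdf(55)
p_total = 1 - cost_total.cdf(62)

# Under independence, P(chicken > 7 and turkey > 55) is the product.
p_both = p_chicken * p_turkey

print(round(p_chicken, 4), round(p_turkey, 4), round(p_total, 3), round(p_both, 3))
```

The joint event is much less likely than the total exceeding 62, as the subset argument requires.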
• To his right: “Wife A, some other man, that some other man’s wife, etc.”; OR
• To his left: “Wife A, some other man, that some other man’s wife, etc.”.
In the first scenario, we have 5! possible arrangements. Likewise in the second. Altogether
2 ⋅ 5! possible arrangements.
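(i) $P(1, 1, 1) = \frac{1}{8} \cdot \frac{1}{4} \cdot \frac{1}{2} = \frac{1}{64}$.
(ii) $P(1, 1) + P(1, 0, 1) + P(0, 1, 1) = \frac{1}{8} \cdot \frac{1}{4} + \frac{1}{8} \cdot \frac{3}{4} \cdot \frac{1}{4} + \frac{7}{8} \cdot \frac{1}{8} \cdot \frac{1}{4} = \frac{8 + 2 \cdot 3 + 7}{256} = \frac{21}{256}$.
(iii) Let $E$ and $F$ be the events that “the third throw is successful” and “exactly two of
the three throws are successful”.
$$P(E \cap F) = P(1, 0, 1) + P(0, 1, 1) = \frac{1}{8} \cdot \frac{3}{4} \cdot \frac{1}{4} + \frac{7}{8} \cdot \frac{1}{8} \cdot \frac{1}{4} = \frac{13}{256}.$$
$$P(F) = P(E \cap F) + P(E' \cap F) = \frac{13}{256} + P(1, 1, 0) = \frac{17}{256}.$$
Thus, $P(E \mid F) = P(E \cap F) \div P(F) = 13/17$.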
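These fractions can be verified exactly by enumerating all eight outcomes. The rule coded below is an assumption inferred from the products in this answer rather than quoted from the question: the first throw succeeds with probability 1/8, the success probability doubles after each success, and it is unchanged after a failure.

```python
from fractions import Fraction
from itertools import product


def outcome_probability(outcome, first_p=Fraction(1, 8)):
    """Probability of a sequence of throws (1 = success, 0 = failure).

    Assumed rule: success probability doubles after a success and stays
    the same after a failure.
    """
    p = first_p
    prob = Fraction(1)
    for success in outcome:
        if success:
            prob *= p
            p *= 2
        else:
            prob *= 1 - p
    return prob


probs = {o: outcome_probability(o) for o in product((0, 1), repeat=3)}

p_at_least_two = sum(pr for o, pr in probs.items() if sum(o) >= 2)
p_F = sum(pr for o, pr in probs.items() if sum(o) == 2)                    # exactly two
p_E_and_F = sum(pr for o, pr in probs.items() if sum(o) == 2 and o[2] == 1)
p_E_given_F = p_E_and_F / p_F

print(probs[(1, 1, 1)], p_at_least_two, p_E_and_F, p_F, p_E_given_F)
```

Using `Fraction` keeps every value exact, so the results can be compared with the answers above without rounding.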
(iii) PMCC ≈ −0.993839. Its magnitude is larger than that of −0.912 and it is very close to −1. It
would appear that the regression of ln x on t is a more appropriate model.
(iv) In general, the estimated regression equation is $y - \bar{y} = b(x - \bar{x})$, where $b = \sum \hat{x}_i \hat{y}_i / \sum \hat{x}_i^2$.
So in this case, the estimated regression equation is
Answer to Exercise 524 (9233 N2007/I/4). (i) It cannot be that all three vertices
are collinear. Thus, one vertex must be chosen from the upper line segment and the other
must be chosen from the lower line segment. Hence, there are 3 × 6 = 18 possible triangles.
(ii) Consider triangles that do not have A as a vertex. Two vertices must be chosen from one
line segment and the third must be chosen from the other. So there are
$\binom{3}{2} \cdot 7 + 4 \cdot \binom{6}{2} = 21 + 60 = 81$ possible triangles. Now, including also
triangles with A as a vertex, we have 99 possible triangles.
(ii) The distribution is “sufficiently nice” that with a sample size of 100, it is appropriate
to use the CLT.
Answer to Exercise 526 (9233 N2007/II/25). (i) P(W ∣B) = 20/52 = 5/13 ≈ 0.384615.
(iv) $P(W)P(B) = 0.4 \cdot 0.52 = 0.208 \neq P(B \cap W)$ and so W and B are not independent.
There are men who take chemistry (equivalently, P(M ∩ C) ≠ 0), so M and C are not
mutually exclusive.
Answer to Exercise 527 (9233 N2007/II/26). (i) Let X be the number of genuine
call-outs in a randomly chosen two-week period. Then X ∼ Po(4) and
$$P(X < 6) = e^{-4}\left(1 + 4 + \frac{4^2}{2!} + \frac{4^3}{3!} + \frac{4^4}{4!} + \frac{4^5}{5!}\right) \approx 0.785130.$$
(ii) Let Y be the total number of call-outs in a randomly chosen six-week period. Then
Y ∼ Po(15) and since λY is large, a suitable approximation for Y is the normal distribution
Z ∼ N (15, 15). Hence, using also the continuity correction,
$$P(Y > 19) \approx P(Z > 19.5) = 1 - \Phi\left(\frac{19.5 - 15}{\sqrt{15}}\right) \approx 0.123.$$
(ii) $0.74L + 0.86H \sim N(0.74 \cdot 5 + 0.86 \cdot 3,\ 0.74^2 \cdot 0.1^2 + 0.86^2 \cdot 0.05^2) = N(6.28, 0.007325)$.
Answer to Exercise 532 (9233 N2006/II/26). (i) Let X be the number of severe
floods in a randomly-chosen 100-year period. Then X ∼ Po(2). So
$$[P(X = 1)]^2 = (e^{-2} \cdot 2)^2 = 4e^{-4} \approx 0.0733.$$
(ii) Let Y be the number of severe floods in a randomly-chosen 1000-year period. Then
Y ∼ Po(20). Since λY is large, a suitable approximation for Y is the normal distribution
Z ∼ N (20, 20). Hence, using also the continuity correction,
$$P(Y > 25) \approx P(Z > 25.5) = 1 - \Phi\left(\frac{25.5 - 20}{\sqrt{20}}\right) \approx 0.109.$$
So there are 26 possibilities in total.
Since $|Z| > Z_{0.05} = 1.645$, we can reject the null hypothesis.
(ii) If H0 is true and we conduct the above test on infinitely-many size-80 samples, we’d
(falsely) reject H0 for 5% of the samples.
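$$P(X < 40) = \frac{1}{10} \iff \Phi\left(\frac{40 - \mu}{\sigma}\right) = \frac{1}{10} \iff \frac{40 - \mu}{\sigma} \overset{2}{\approx} -1.282.$$
$\overset{1}{\approx}$ minus $\overset{2}{\approx}$ yields $\frac{85}{\sigma} \approx 3.522 \iff \sigma \approx 24.1$ and $\mu \approx 70.9$.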
(ii) $$\binom{10}{0} 0.1^0\, 0.9^{10} + \binom{10}{1} 0.1^1\, 0.9^9 + \binom{10}{2} 0.1^2\, 0.9^8 + \binom{10}{3} 0.1^3\, 0.9^7 \approx 0.987.$$
(iii) Let $Y$ be the number of cars out of a random sample of 100 that are travelling at
speed less than 40 km h$^{-1}$. Then $Y \sim B(100, 0.1)$. Since $np = 10 > 5$ and $n(1 - p) = 90 > 5$ are
both large, a suitable approximation to $Y$ is the normal distribution $Z \sim N(10, 9)$. Hence,
using also the continuity correction,
$$P(Y \le 8) \approx P(Z \le 8.5) = \Phi\left(\frac{8.5 - 10}{\sqrt{9}}\right) = 1 - \Phi(0.5) \approx 1 - 0.6915 = 0.3085.$$
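With $np = 10$ only modestly large, it is worth seeing how far the approximation is from the exact binomial value. A short illustrative sketch using only the standard library; the variable names are not from the question.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.1

# Exact P(Y <= 8) for Y ~ B(100, 0.1).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(9))

# Normal approximation N(np, np(1 - p)) with continuity correction.
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(8.5)

print(round(exact, 4), round(approx, 4))
```

The two values agree to roughly one decimal place, which is a reminder that the "greater than 5" rule of thumb guarantees only a rough approximation.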
www.EconsPhDTutor.com
Or simply email:
DrChooYanMin@gmail.com