
DIFFERENTIATION

Definition of the derivative of f : R → R

Let f be a real-valued function defined at all points of an interval (a, b). For each x0 ∈ (a, b), we define the derivative of f at x0, denoted by f'(x0), to be

\[
f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0},
\]

if this limit exists (and, if it does not exist, then the function does not have a derivative at x0).
As you will know from calculus courses, we can think of f'(x0) as being the slope of the tangent to the graph of f at x0.
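As a quick numerical illustration of this definition (an optional sketch, not part of the notes), one can tabulate the difference quotient for values of x approaching x0. The function f(x) = x² and the point x0 = 1.5 below are arbitrary choices, for which the limit should be 2·x0 = 3.

```python
# Sketch: the difference quotient (f(x) - f(x0)) / (x - x0) approaching f'(x0).
# Assumed example: f(x) = x**2 at x0 = 1.5, so the limit should be 2 * x0 = 3.

def f(x):
    return x * x

x0 = 1.5
for k in range(1, 7):
    x = x0 + 10 ** (-k)          # approach x0 from the right
    q = (f(x) - f(x0)) / (x - x0)
    print(f"x - x0 = {x - x0:.0e}   difference quotient = {q:.8f}")
# The printed quotients tend to 3.0 as x -> x0.
```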

If f'(x0) is defined, we say that f is differentiable at x0. If f is differentiable at each x0 ∈ (a, b), then f is said to be differentiable on (a, b).
The right derivative, denoted by f_r'(x0), is the limit

\[
f_r'(x_0) = \lim_{x \to x_0^+} \frac{f(x) - f(x_0)}{x - x_0},
\]

if it exists. We define the left derivative f_l'(x0) similarly:

\[
f_l'(x_0) = \lim_{x \to x_0^-} \frac{f(x) - f(x_0)}{x - x_0}.
\]

Note that f'(x0) exists if and only if f_l'(x0) and f_r'(x0) exist and are equal.
Example Consider f (x) = |x|.

First suppose that a > 0. Then for x sufficiently close to a we have that x > 0 and so |x| = x. Hence

\[
\lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{|x| - |a|}{x - a} = \lim_{x \to a} \frac{x - a}{x - a} = 1.
\]

Similarly, if a < 0, then f'(a) = −1. But at a = 0 we have:

\[
f_l'(0) = \lim_{x \to 0^-} \frac{|x| - |0|}{x - 0} = \lim_{x \to 0^-} \frac{-x}{x} = -1.
\]

But f_r'(0) = 1, since it is

\[
\lim_{x \to 0^+} \frac{|x| - |0|}{x - 0} = \lim_{x \to 0^+} \frac{x}{x} = 1.
\]

Thus the derivative of f does not exist at 0.
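The failure at 0 can also be seen numerically. The following optional sketch (with arbitrary step sizes, not from the notes) evaluates the two one-sided difference quotients of |x| at 0.

```python
# Sketch: one-sided difference quotients of f(x) = |x| at x0 = 0.
# The right-hand quotients are all 1, the left-hand ones all -1,
# so the two one-sided derivatives disagree and f'(0) does not exist.

def f(x):
    return abs(x)

for k in range(1, 6):
    h = 10 ** (-k)
    right = (f(0 + h) - f(0)) / h      # h > 0
    left = (f(0 - h) - f(0)) / (-h)    # approach from the left
    print(f"h = {h:.0e}   right quotient = {right:+.1f}   left quotient = {left:+.1f}")
```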
Differentiability and continuity

One may also see that differentiability is a stronger


property than continuity.

Theorem If f is differentiable at a point c, then f is


continuous at c.
Proof We are given that

\[
\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = f'(c)
\]

exists. But

\[
\lim_{x \to c} (f(x) - f(c)) = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \cdot (x - c)
\]

and, by the algebra of limits, this is equal to f'(c) × 0 = 0. Thus lim_{x→c} f(x) = f(c) and so f is continuous at c.
You will know from calculus that the sign of the derivative provides useful information about how a function behaves. In particular, if f'(a) > 0, then in some small neighbourhood of a, the f-values to the left of a are smaller than f(a), and to the right, these values are larger:
Theorem If f'(a) > 0, then there exists δ > 0 such that
that

f (a + h) > f (a) > f (a − h)

for all h ∈ (0, δ).


Proof Since lim_{x→a} (f(x) − f(a))/(x − a) = f'(a) > 0, we can choose a number δ > 0 such that

0 < |x − a| < δ

implies

\[
\left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| < f'(a).
\]

In particular, if h = x − a ∈ (0, δ),

\[
-f'(a) < \frac{f(a + h) - f(a)}{h} - f'(a).
\]

Adding f'(a) to both sides, we see that, if h = x − a ∈ (0, δ),

\[
0 < \frac{f(a + h) - f(a)}{h}.
\]

Since h > 0 we may multiply through by h without changing the signs to obtain

0 < f(a + h) − f(a),

from which it now follows that f(a + h) > f(a).


Let us now do the same argument with x − a = −h, where h ∈ (0, δ).

Since

0 < |x − a| < δ

implies

\[
\left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| < f'(a),
\]

we have, in particular, for x − a = −h ∈ (−δ, 0),

\[
-f'(a) < \frac{f(a - h) - f(a)}{-h} - f'(a).
\]

As before, we add f'(a) to both sides. We then multiply through by −h < 0, which changes the sign, giving

0 > f(a − h) − f(a).

The other half of the result follows.


Theorem If f'(a) < 0, then there exists δ > 0 such that
that

f (a + h) < f (a) < f (a − h)

for all h ∈ (0, δ).

The proof of this statement is similar to that of the


previous one: exercise!
Another example. Find from first principles f'(a) when f(x) = 1/x, a ≠ 0.

Then, for a ≠ 0 and h ≠ 0 but sufficiently small in modulus so that a + h ≠ 0,

\begin{align*}
\frac{f(a + h) - f(a)}{h} &= \left( \frac{1}{a + h} - \frac{1}{a} \right) \frac{1}{h} \\
&= \frac{a - a - h}{(a + h)a} \cdot \frac{1}{h} \\
&= \frac{-h}{(a + h)a} \cdot \frac{1}{h} \\
&= -\frac{1}{(a + h)a} \to -\frac{1}{a^2},
\end{align*}

as h → 0.
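An optional numerical check of this first-principles computation, for the arbitrary choice a = 2 (so that −1/a² = −0.25); the step sizes are likewise arbitrary.

```python
# Sketch: difference quotients of f(x) = 1/x at a = 2.
# The first-principles computation above gives f'(a) = -1/a**2 = -0.25.

def f(x):
    return 1.0 / x

a = 2.0
for k in range(1, 7):
    h = 10 ** (-k)
    q = (f(a + h) - f(a)) / h
    print(f"h = {h:.0e}   quotient = {q:.8f}   target = {-1 / a**2}")
```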
Theorem Let f, g be defined on (a, b) and differentiable at c ∈ (a, b). Then

• (f + g)'(c) = f'(c) + g'(c)

• (fg)'(c) = f'(c)g(c) + f(c)g'(c)

• if g(c) ≠ 0, then

\[
\left( \frac{f}{g} \right)'(c) = \frac{g(c)f'(c) - g'(c)f(c)}{g(c)^2}.
\]
Let us prove that (f + g)'(c) = f'(c) + g'(c).

This is straightforward, since, by algebra of limits,

\begin{align*}
\lim_{h \to 0} \frac{(f + g)(c + h) - (f + g)(c)}{h}
&= \lim_{h \to 0} \frac{f(c + h) - f(c)}{h} + \lim_{h \to 0} \frac{g(c + h) - g(c)}{h} \\
&= f'(c) + g'(c).
\end{align*}
Let us prove that (fg)'(c) = f'(c)g(c) + f(c)g'(c).

We have, by algebra of limits,

\begin{align*}
\lim_{h \to 0} \frac{(fg)(c + h) - (fg)(c)}{h}
&= \lim_{h \to 0} \frac{f(c + h)g(c + h) - f(c)g(c)}{h} \\
&= \lim_{h \to 0} \frac{f(c + h)g(c + h) - f(c)g(c + h) + f(c)g(c + h) - f(c)g(c)}{h} \\
&= \lim_{h \to 0} \frac{(f(c + h) - f(c))g(c + h)}{h} + \lim_{h \to 0} \frac{f(c)(g(c + h) - g(c))}{h} \\
&= \lim_{h \to 0} \frac{f(c + h) - f(c)}{h} \lim_{h \to 0} g(c + h) + f(c) \lim_{h \to 0} \frac{g(c + h) - g(c)}{h}.
\end{align*}

This is equal to f'(c)g(c) + f(c)g'(c), as required.
Let us now prove that, if g(c) ≠ 0, then

\[
\left( \frac{f}{g} \right)'(c) = \frac{g(c)f'(c) - g'(c)f(c)}{g(c)^2}.
\]

In fact, we can instead prove that

\[
\left( \frac{1}{g} \right)'(c) = -\frac{g'(c)}{g(c)^2}.
\]

Then, by the previous result about derivatives of products,

\[
\left( \frac{f}{g} \right)'(c) = \left( f \cdot \frac{1}{g} \right)'(c)
= f'(c)\,\frac{1}{g(c)} + f(c)\left( -\frac{g'(c)}{g(c)^2} \right),
\]

which is the required formula.
Since g is differentiable at c, g is also continuous at c.

Then, as g(c) ≠ 0, there exists δ > 0 such that, as long as |h| < δ, g(c + h) ≠ 0.

Then, by algebra of limits,

\begin{align*}
\lim_{h \to 0} \frac{1/g(c + h) - 1/g(c)}{h}
&= \lim_{h \to 0} \frac{g(c) - g(c + h)}{g(c)g(c + h)h} \\
&= \lim_{h \to 0} \frac{-(g(c + h) - g(c))}{h} \cdot \lim_{h \to 0} \frac{1}{g(c)g(c + h)} \\
&= -g'(c)\,\frac{1}{g(c)^2}.
\end{align*}
Theorem [Chain Rule] Let f be defined on (a, b) and suppose that f'(c) exists. Let g be defined on the range of f and be differentiable at f(c). Define the new function

K(x) = (g ◦ f)(x) = g(f(x))

for all x ∈ (a, b). Then K is differentiable at c and

K'(c) = g'(f(c))f'(c).


Digression. Note that, if f is differentiable at c, then there is some δ > 0 such that, for all h with 0 < |h| < δ,

f(c + h) = f(c) + hf'(c) + hR(h),

where R(h) → 0 as h → 0.

In other words, f has a local linear approximation with slope f'(c) in some neighbourhood of c.
Why is that?

Well, we know that, if f'(c) exists, then

\[
\frac{f(c + h) - f(c)}{h} \to f'(c), \quad \text{as } h \to 0,
\]

so

\[
\frac{f(c + h) - f(c)}{h} - f'(c) \to 0, \quad \text{as } h \to 0.
\]

So we can write, for h small enough,

\[
\frac{f(c + h) - f(c)}{h} - f'(c) = R(h),
\]

where R(h) → 0 as h → 0.
And conversely, assume that, for some a and some
δ > 0, for all h with 0 < |h| < δ,

f (c + h) = f (c) + ha + hR(h),

where R(h) → 0 as h → 0.

Then, for h with 0 < |h| < δ,

\[
\frac{f(c + h) - f(c)}{h} = a + R(h) \to a, \quad \text{as } h \to 0,
\]

so the derivative of f at c exists and f'(c) = a.
Proof of Chain Rule. We want to show that, if K(x) = g(f(x)), then K'(c) = g'(f(c))f'(c), provided f'(c) and g'(f(c)) exist.

We know that f'(c) exists, so there exists δ > 0 such that, for all h with 0 < |h| < δ, we can write

f(c + h) = f(c) + hf'(c) + hR1(h),

where R1(h) → 0 as h → 0.
Put k = f(c + h) − f(c) = hf'(c) + hR1(h), so that

g(f(c + h)) = g(f(c) + k).

But g is differentiable at f(c), so has a local linear approximation.

By the same argument, provided that k = hf'(c) + hR1(h) is small enough, we can write

g(f(c + h)) = g(f(c) + k) = g(f(c)) + kg'(f(c)) + kR2(k),

where R2(k) → 0 as k → 0.
Substituting k = hf'(c) + hR1(h), we get

\begin{align*}
g(f(c + h)) = g(f(c) + k)
&= g(f(c)) + (hf'(c) + hR_1(h))g'(f(c)) + (hf'(c) + hR_1(h))R_2(k) \\
&= g(f(c)) + hf'(c)g'(f(c)) + hR(h),
\end{align*}

where

\[
R(h) = R_1(h)g'(f(c)) + (f'(c) + R_1(h))R_2(hf'(c) + hR_1(h)).
\]

Suppose we can show that R(h) → 0 as h → 0. Then

\[
\lim_{h \to 0} \frac{g(f(c + h)) - g(f(c))}{h} = \lim_{h \to 0} \bigl(f'(c)g'(f(c)) + R(h)\bigr) = f'(c)g'(f(c)),
\]

as required.

But, by algebra of limits, and since k → 0 as h → 0,

\begin{align*}
\lim_{h \to 0} R(h) &= g'(f(c)) \lim_{h \to 0} R_1(h)
+ \lim_{h \to 0} \bigl(f'(c) + R_1(h)\bigr) \lim_{h \to 0} R_2\bigl(hf'(c) + hR_1(h)\bigr) \\
&= g'(f(c)) \cdot 0 + f'(c) \cdot \lim_{k \to 0} R_2(k) = 0.
\end{align*}
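An optional numerical check of the chain rule at one point; the functions f(x) = x² + 1 and g(y) = sin y and the point c = 0.3 are arbitrary illustrative choices, not from the notes.

```python
# Sketch: numerical check of the chain rule K'(c) = g'(f(c)) f'(c) at one point.

import math

def f(x): return x ** 2 + 1
def g(y): return math.sin(y)

def fp(x): return 2 * x
def gp(y): return math.cos(y)

def num_deriv(F, c, h=1e-6):
    return (F(c + h) - F(c - h)) / (2 * h)

c = 0.3
lhs = num_deriv(lambda x: g(f(x)), c)   # difference quotient of K = g o f
rhs = gp(f(c)) * fp(c)                  # chain rule formula
print(lhs, rhs)                         # the two numbers agree closely
```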
Maxima, Minima, Rolle’s Theorem, and Mean
Value Theorem

Definition Let f : R → R. We say that f has a local


maximum at a point c ∈ R if there exists δ > 0 such
that f (c) ≥ f (x) for all x ∈ (c − δ, c + δ).

We define a local minimum similarly.


Theorem Let f : R → R. If f has a local maximum (or minimum) at c, and if f'(c) exists, then f'(c) = 0.
Proof. Suppose f has a local maximum at c. Thus there exists δ > 0 such that, for all h with |h| < δ, f(c + h) ≤ f(c).

Consider 0 < h < δ; then, for all such h,

\[
\frac{f(c + h) - f(c)}{h} \le 0.
\]

Since f'(c) exists, the right derivative f_r'(c) exists, and by the above

\[
f_r'(c) = \lim_{h \to 0^+} \frac{f(c + h) - f(c)}{h} \le 0.
\]

Now consider −δ < h < 0; then, for all such h,

\[
\frac{f(c + h) - f(c)}{h} \ge 0.
\]

Since f'(c) exists, the left derivative f_l'(c) exists, and by the above

\[
f_l'(c) = \lim_{h \to 0^-} \frac{f(c + h) - f(c)}{h} \ge 0.
\]

But f'(c) exists, so f'(c) = f_r'(c) ≤ 0 and f'(c) = f_l'(c) ≥ 0, and so f'(c) = 0.
The proof in the case when f has a local minimum at c is similar. (CHECK!)

The theorem tells us that if f is differentiable on


(a, b), then in order to examine all local maxima or
minima, we may restrict attention to the points
where the derivative is zero.
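As an optional illustration of this (not from the notes), the following sketch scans for sign changes of f' on a grid, for the arbitrary example f(x) = x³ − 3x, whose derivative 3x² − 3 vanishes at x = ±1.

```python
# Sketch: locating candidate local extrema of f(x) = x**3 - 3*x by finding
# where the derivative f'(x) = 3*x**2 - 3 changes sign on a grid.

def fprime(x):
    return 3 * x ** 2 - 3

step = 0.001
xs = [-3 + i * step for i in range(int(6 / step) + 1)]
for x0, x1 in zip(xs, xs[1:]):
    if fprime(x0) == 0 or fprime(x0) * fprime(x1) < 0:
        print(f"f' changes sign near x = {x0:.3f}")   # expect x = -1 and x = 1
```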
An important result about functions continuous on
closed bounded intervals [a, b] is the following.

Theorem Suppose the real function f is continuous


on [a, b]. Then f is bounded on [a, b] and attains its
bounds; that is, there are x1, x2 ∈ [a, b] such that

f (x1) = min{f (x) : x ∈ [a, b]},

f (x2) = max{f (x) : x ∈ [a, b]}.


This is again MA103 revision.

Let us prove that f is bounded above. The proof


that it is bounded below is analogous.

Suppose otherwise. Then for every n ∈ N, there


exists zn ∈ [a, b] such that f(zn) > n. Now a ≤ zn ≤ b for each n, so (zn) is a bounded sequence, and so has a convergent subsequence, say z_{n_k} → z ∈ [a, b].

By continuity of f, f(z_{n_k}) → f(z) as k → ∞, contradicting f(z_{n_k}) > n_k for each k.
Let us prove that there is x2 ∈ [a, b] such that
f (x2) = max{f (x) : x ∈ [a, b]}.

To say that f is bounded on [a, b] means that the set


{f (x) : x ∈ [a, b]} is bounded. Then it has a
supremum,

M = sup{f (x) : x ∈ [a, b]}.


By definition of supremum, for each n, there exists
yn such that

M − 1/n < f (yn) ≤ M.

Now for each n, a ≤ yn ≤ b, so the sequence (yn) is


bounded.

Therefore, (yn) must have a convergent


subsequence, say y_{n_k} → y as k → ∞, where y ∈ [a, b].

By continuity of f on [a, b], f(y_{n_k}) → f(y) as k → ∞.

But also, for each k,

M − 1/n_k < f(y_{n_k}) ≤ M,

so, by sandwiching, f(y_{n_k}) → M as k → ∞.

By uniqueness of limit,

f (y) = M = sup{f (x) : x ∈ [a, b]} = max{f (x) : x ∈ [a, b]}.


The following theorem is very useful:

Theorem [Rolle's Theorem] Let f be continuous on [a, b], differentiable on (a, b), and such that f(a) = f(b). Then there exists c ∈ (a, b) such that f'(c) = 0.
Proof. By the previous result, f must have a
maximum value at some point c and a minimum
value at some point d in [a, b].

Then one of the following must happen:

1. the maximum value satisfies f (c) > f (a) = f (b);

2. the minimum value satisfies f (d) < f (a) = f (b);

3. f (x) = f (a) for all x ∈ [a, b].


If the first of the above possibilities occurs, then c ∈ (a, b), so f'(c) = 0.

If the second of the above possibilities occurs, then d ∈ (a, b), so f'(d) = 0.

In the third case, f'(x) = 0 for all x ∈ (a, b).


(CHECK!)

Rolle’s Theorem follows.


An immediate corollary of Rolle’s Theorem is the
(very important) Mean Value Theorem.

Theorem[The Mean Value Theorem] Let f be


continuous on [a, b] and differentiable on (a, b). Then
there exists c ∈ (a, b) such that

f(b) − f(a) = f'(c)(b − a).


Proof Let the constant α be given by
α = (f (b) − f (a))/(b − a). Then the function g
defined by g(x) = f (x) − αx is continuous on [a, b]
and differentiable on (a, b) (because f is).

Also, it satisfies g(a) = g(b). To see this,


\begin{align*}
g(a) = f(a) - \alpha a &= f(a) - \frac{a(f(b) - f(a))}{b - a} \\
&= \frac{f(a)b - f(a)a - af(b) + af(a)}{b - a} \\
&= \frac{f(a)b - af(b)}{b - a}.
\end{align*}

Similarly,

\begin{align*}
g(b) = f(b) - \alpha b &= f(b) - \frac{b(f(b) - f(a))}{b - a} \\
&= \frac{f(b)b - f(b)a - bf(b) + bf(a)}{b - a} \\
&= \frac{f(a)b - af(b)}{b - a}.
\end{align*}

This is why we chose α as we did, to have


g(a) = g(b). This property enables us to apply
Rolle’s theorem.
By Rolle's theorem, there is c ∈ (a, b) with g'(c) = 0.

But g'(x) = f'(x) − α, so there is c ∈ (a, b) with f'(c) = α, i.e.

\[
f'(c) = \frac{f(b) - f(a)}{b - a},
\]
as required.
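An optional numerical illustration of the Mean Value Theorem (not from the notes): for the arbitrary choice f(x) = x³ on [0, 2] we have α = 4, and bisection on f'(x) − α locates a point c with f'(c) = α.

```python
# Sketch: numerically locating a point c promised by the Mean Value Theorem.
# Assumed example: f(x) = x**3 on [a, b] = [0, 2]; alpha = (f(b) - f(a)) / (b - a) = 4,
# and f'(c) = 3*c**2 = 4 gives c = 2 / sqrt(3) ≈ 1.1547.

def f(x): return x ** 3
def fp(x): return 3 * x ** 2

a, b = 0.0, 2.0
alpha = (f(b) - f(a)) / (b - a)

# Bisection on f'(x) - alpha, which changes sign on (a, b).
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if (fp(lo) - alpha) * (fp(mid) - alpha) <= 0:
        hi = mid
    else:
        lo = mid
print((lo + hi) / 2, 2 / 3 ** 0.5)   # both ≈ 1.1547
```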
Definition Let f : I → R, where I is some interval. f
is increasing (decreasing) on I if for each x, y ∈ I
with x < y, we have f (x) ≤ f (y) (resp. f (x) ≥ f (y)).

Similarly we define f to be strictly increasing if ≤


can be replaced by <.
Theorem Let f : R → R be differentiable on (a, b).

1. f'(x) ≥ 0 for all x ∈ (a, b) =⇒ f is increasing on (a, b)

2. f'(x) = 0 for all x ∈ (a, b) =⇒ f is constant on (a, b)

3. f'(x) ≤ 0 for all x ∈ (a, b) =⇒ f is decreasing on (a, b).
This can be proved by using the Mean Value Theorem: the results follow from the fact that, for each pair x1 < x2 in (a, b), there is some c ∈ (x1, x2) with

f(x2) − f(x1) = (x2 − x1)f'(c).

If f'(x) ≥ 0 for all x ∈ (a, b), then f'(c) ≥ 0 in the above, and so f(x2) ≥ f(x1) whenever x2 ≥ x1.

If f'(x) ≤ 0 for all x ∈ (a, b), then f'(c) ≤ 0 in the above, and so f(x2) ≤ f(x1) whenever x2 ≥ x1.

If f'(x) = 0 for all x ∈ (a, b), then f'(c) = 0 in the above, and so f(x2) = f(x1) for all x1, x2 ∈ (a, b).
Recall Rolle’s Theorem:

Theorem [Rolle's Theorem] Let f be continuous on [a, b], differentiable on (a, b), and such that f(a) = f(b). Then there exists c ∈ (a, b) such that f'(c) = 0.
Worked example. Let f, g be differentiable on R,
and let a, b ∈ R with a < b. Show that there is a point
c such that

f'(c)[g(b) − g(a)] = g'(c)[f(b) − f(a)].

Let us consider

h(x) = f (x)[g(b) − g(a)] − g(x)[f (b) − f (a)].

Note that h is differentiable on R and hence


differentiable on (a, b), and continuous on [a, b].
Also,

h(a) = f (a)g(b) − f (a)g(a) − g(a)f (b) + g(a)f (a)


= f (a)g(b) − g(a)f (b),

and

h(b) = f (b)g(b) − f (b)g(a) − g(b)f (b) + g(b)f (a)


= −f (b)g(a) + g(b)f (a).

Thus h(a) = h(b), and by Rolle’s theorem, there


exists c ∈ (a, b) such that h'(c) = 0.

But

h'(x) = f'(x)[g(b) − g(a)] − g'(x)[f(b) − f(a)],

so h'(c) = 0 is equivalent to

f'(c)[g(b) − g(a)] = g'(c)[f(b) − f(a)],

as required.
Note that the Mean Value Theorem is a special case of this result, with g(x) = x. Then g(b) = b, g(a) = a, g'(x) = 1 for all x, and so g'(c) = 1.

Then

f'(c)[g(b) − g(a)] = g'(c)[f(b) − f(a)]

becomes

f'(c)[b − a] = f(b) − f(a).
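An optional numerical sketch of the worked example above, for the arbitrary (assumed) choice f(x) = x², g(x) = exp(x) on [0, 1]: we build the auxiliary function h as above, confirm h(a) = h(b), and locate a zero of h' by a sign-change scan.

```python
# Sketch: the auxiliary function h(x) = f(x)[g(b) - g(a)] - g(x)[f(b) - f(a)]
# from the worked example, for the assumed choice f(x) = x**2, g(x) = exp(x)
# on [a, b] = [0, 1]. We confirm h(a) = h(b) and look for a zero of h'.

import math

def f(x): return x ** 2
def g(x): return math.exp(x)
def fp(x): return 2 * x
def gp(x): return math.exp(x)

a, b = 0.0, 1.0

def h(x):  return f(x) * (g(b) - g(a)) - g(x) * (f(b) - f(a))
def hp(x): return fp(x) * (g(b) - g(a)) - gp(x) * (f(b) - f(a))

print(h(a), h(b))                      # equal, as shown in the worked example

# Scan for a sign change of h' on (a, b); Rolle's theorem guarantees a zero.
step = 1e-4
x = a + step
while x < b - step:
    if hp(x) * hp(x + step) <= 0:
        print("h' vanishes near c =", round(x, 4))
    x += step
```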


Differentiation of functions f : R^n → R^m

Partial and directional derivatives

Suppose that f : R^n → R. The partial derivative ∂f/∂xi at a point a ∈ R^n is the instantaneous rate of change of the function with respect to xi, at a.

Analogously, if n = m = 1, the derivative df/dx at a point a ∈ R is the instantaneous rate of change of f with respect to x at a.
Formally,

\[
\frac{\partial f}{\partial x_i}(a) = \lim_{h \to 0} \frac{f(a_1, \ldots, a_{i-1}, a_i + h, a_{i+1}, \ldots, a_n) - f(a)}{h},
\]

if this limit exists.
We may think of the partial derivative ∂f/∂xi as the rate of change in f as we move in the direction of the vector ei, because

\[
\frac{f(a_1, \ldots, a_{i-1}, a_i + h, a_{i+1}, \ldots, a_n) - f(a)}{h} = \frac{f(a + he_i) - f(a)}{h}.
\]

But we can move in many other directions, and such considerations lead to the notion of directional derivative.
We define a direction (or direction vector) in R^n to be an n-vector of length 1.

Definition [Directional derivative] The directional derivative of f in direction v, at the point a, is defined to be the limit

\[
D_v f(a) = \lim_{h \to 0} \frac{f(a + hv) - f(a)}{h},
\]

if this limit exists.

Note that, when n = 1, we only have two choices of direction, namely v = ±1.
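An optional numerical illustration (not from the notes): the defining limit can be estimated by evaluating the quotient for small h. The function f(x, y) = x²y + y, the point a = (1, 2) and the direction v = (3/5, 4/5) below are arbitrary choices; a direct hand computation of the limit gives 4.

```python
# Sketch: estimating a directional derivative D_v f(a) from its defining limit.
# Assumed example: f(x, y) = x**2 * y + y, a = (1, 2), v = (0.6, 0.8) (unit length).

def f(x, y):
    return x ** 2 * y + y

a = (1.0, 2.0)
v = (0.6, 0.8)                         # a direction: length 1

for k in range(1, 6):
    h = 10 ** (-k)
    quotient = (f(a[0] + h * v[0], a[1] + h * v[1]) - f(a[0], a[1])) / h
    print(f"h = {h:.0e}   (f(a + hv) - f(a)) / h = {quotient:.6f}")
# The quotients approach 4.0 as h -> 0.
```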
The derivative of f : R^n → R^m

Suppose f(x) = (f1(x), f2(x), . . . , fm(x))^T. Suppose h = (h1, h2, . . . , hn)^T. If the hi are small enough, then

\begin{align*}
f_1(a + h) - f_1(a) &\simeq \frac{\partial f_1}{\partial x_1}(a)h_1 + \cdots + \frac{\partial f_1}{\partial x_n}(a)h_n \\
&\;\;\vdots \\
f_m(a + h) - f_m(a) &\simeq \frac{\partial f_m}{\partial x_1}(a)h_1 + \cdots + \frac{\partial f_m}{\partial x_n}(a)h_n.
\end{align*}

When n = m = 1, this just says that, for h small enough,

f(a + h) − f(a) ≃ f'(a)h.

This is just the local linear approximation we discussed last time:

f(a + h) − f(a) = f'(a)h + hR(h),

where R(h) → 0 as h → 0, making the second term, hR(h), of smaller order than the first term, f'(a)h.
So

\[
f(a + h) - f(a) =
\begin{pmatrix}
f_1(a + h) - f_1(a) \\
\vdots \\
f_m(a + h) - f_m(a)
\end{pmatrix}
\simeq
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_n}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(a) & \cdots & \dfrac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}
\begin{pmatrix}
h_1 \\ h_2 \\ \vdots \\ h_n
\end{pmatrix}.
\]

This describes the linear approximation of f at a, and the matrix (or, equivalently, the linear mapping it describes)

\[
Df(a) =
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_n}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(a) & \cdots & \dfrac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}
\]

is known as the derivative (or the Jacobian derivative) of f at a.

When n = m = 1, Df(a) = (f'(a)).
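The informal approximation above can also be checked numerically. The following optional sketch approximates Df(a) by forward differences and compares it with the matrix of partial derivatives computed by hand; the map f(x, y) = (xy, x + y²) and the point a = (1, 2) are arbitrary illustrative choices.

```python
# Sketch: approximating the Jacobian matrix Df(a) by finite differences.
# Assumed example: f : R^2 -> R^2, f(x, y) = (x*y, x + y**2), at a = (1.0, 2.0).
# The exact Jacobian there is [[y, x], [1, 2*y]] = [[2, 1], [1, 4]].

def f(x, y):
    return (x * y, x + y ** 2)

def numerical_jacobian(func, a, h=1e-6):
    """Forward-difference approximation of the m x n Jacobian at a."""
    fa = func(*a)
    m, n = len(fa), len(a)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        shifted = list(a)
        shifted[j] += h
        fj = func(*shifted)
        for i in range(m):
            J[i][j] = (fj[i] - fa[i]) / h
    return J

for row in numerical_jacobian(f, (1.0, 2.0)):
    print([round(entry, 4) for entry in row])   # ≈ [[2, 1], [1, 4]]
```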


The ‘argument’ just given is not precise. We now
take a more formal approach, in which we shall see
that some conditions other than simply existence are
required on the partial derivatives to make the
argument watertight. First, we start with the formal
definition of the derivative.
Definition A function f : R^n → R^m is differentiable at a ∈ R^n if there exists a linear function x ↦ Ax (where A is an m × n matrix) from R^n to R^m such that

\[
\frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0
\]

as h → 0. We call the linear function (or, equivalently, the matrix A representing it) the derivative of f at a and denote it by Df(a).
The 1-dimensional analogue is the following
equivalent definition of differentiability.

Definition A function f : R → R is differentiable at a ∈ R if there exists a linear function x ↦ Ax (where A is a scalar constant) from R to R such that

\[
\frac{f(a + h) - f(a) - Ah}{|h|} \to 0
\]

as h → 0. We call the linear function (or, equivalently, the constant A representing it) the derivative of f at a and denote it by f'(a).
Suppose that f : R^n → R. Then the gradient of f at a point a is defined to be the column vector

\[
\nabla f(a) = \left( \frac{\partial f}{\partial x_1}(a) \;\; \frac{\partial f}{\partial x_2}(a) \;\; \cdots \;\; \frac{\partial f}{\partial x_n}(a) \right)^T.
\]

By a neighbourhood of a ∈ R^n we mean a set of the form {x ∈ R^n : ‖x − a‖ < ε}, for some ε > 0. (There are other, more general interpretations of 'neighbourhood', but this will do for now.)
Theorem Suppose f : R^n → R^m. Then:

• f is differentiable at a =⇒ f is continuous at a

• f is differentiable at a ⇐⇒ fi is differentiable at a (for i = 1, 2, . . . , m)

• If f is differentiable at a, then

\[
Df(a) =
\begin{pmatrix}
(\nabla f_1(a))^T \\
\vdots \\
(\nabla f_m(a))^T
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_n}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(a) & \cdots & \dfrac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}
\]

• If the partial derivatives ∂fi/∂xj (for i = 1, . . . , m and j = 1, . . . , n) all exist in a neighbourhood of a and are continuous at a, then f is differentiable at a.
Let us consider the first statement. We assume that f is differentiable at a, that is, there exists a linear function x ↦ Ax (where A = Df(a) is an m × n matrix) from R^n to R^m such that

\[
\frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0
\]

as h → 0.

We want to show that f is continuous at a, that is,

f(a + h) − f(a) → 0, as h → 0.

But

\begin{align*}
\|f(a + h) - f(a)\|
&= \frac{\|f(a + h) - f(a)\|}{\|h\|} \cdot \|h\| \\
&= \frac{\|f(a + h) - f(a) - Ah + Ah\|}{\|h\|} \cdot \|h\| \\
&\le \|h\| \left( \frac{\|f(a + h) - f(a) - Ah\|}{\|h\|} + \frac{\|Ah\|}{\|h\|} \right) \\
&= \|h\| \cdot \frac{\|f(a + h) - f(a) - Ah\|}{\|h\|} + \|Ah\|.
\end{align*}

So

\begin{align*}
\lim_{h \to 0} \|f(a + h) - f(a)\|
&\le \lim_{h \to 0} \|h\| \cdot \lim_{h \to 0} \frac{\|f(a + h) - f(a) - Ah\|}{\|h\|} + \lim_{h \to 0} \|Ah\| \\
&= 0 \cdot 0 + 0 = 0.
\end{align*}
Let us consider the second statement. We assume that f is differentiable at a, that is, there exists a linear function x ↦ Ax (where A = Df(a) is an m × n matrix) from R^n to R^m such that

\[
\frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0
\]

as h → 0.

We want to show that fi is differentiable at a, for i = 1, 2, . . . , m.

Write

\[
g(h) = \frac{f(a + h) - f(a) - Ah}{\|h\|};
\]

so g(h) → 0 as h → 0.

But this implies that gi(h) → 0 for i = 1, 2, . . . , m.

In other words, for i = 1, 2, . . . , m,

\[
g_i(h) = \frac{f_i(a + h) - f_i(a) - A_i h}{\|h\|} \to 0,
\]

where Ai is the 1 × n matrix representing the i-th row of A.

By definition, for i = 1, 2, . . . , m, fi is differentiable at a, with derivative Ai = Dfi(a).

Conversely, suppose that, for i = 1, 2, . . . , m, fi is differentiable at a, with derivative Ai = Dfi(a), where Ai is a 1 × n matrix.

Then, by definition, for i = 1, 2, . . . , m,

\[
g_i(h) = \frac{f_i(a + h) - f_i(a) - A_i h}{\|h\|} \to 0, \quad \text{as } h \to 0.
\]

Define g(h) = (g1(h), . . . , gm(h))^T, and note that this implies that g(h) → 0 as h → 0.

In other words,

\[
g(h) = \frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0,
\]

where A is the m × n matrix whose i-th row is the 1 × n matrix Ai for i = 1, 2, . . . , m.

By definition, f is differentiable at a, with derivative Df(a) = A.
Let us now take the third statement. We assume that f is differentiable at a. We want to show that

\[
Df(a) =
\begin{pmatrix}
(\nabla f_1(a))^T \\
\vdots \\
(\nabla f_m(a))^T
\end{pmatrix}
=
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_n}(a) \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1}(a) & \cdots & \dfrac{\partial f_m}{\partial x_n}(a)
\end{pmatrix}.
\]

We have already shown that

\[
Df(a) =
\begin{pmatrix}
Df_1(a) \\
\vdots \\
Df_m(a)
\end{pmatrix};
\]

so we need to show that (Dfi(a))^T = ∇fi(a).

By definition of differentiability,

\[
\frac{\|f_i(a + h) - f_i(a) - Df_i(a)h\|}{\|h\|} \to 0, \quad \text{as } h \to 0.
\]

Then this limit must be the same along all possible paths, so take h = he_j, for j = 1, . . . , n:

\[
\frac{f_i(a + he_j) - f_i(a) - hDf_i(a)e_j}{h} \to 0, \quad \text{as } h \to 0,
\]

that is,

\[
\frac{f_i(a + he_j) - f_i(a)}{h} - Df_i(a)e_j \to 0, \quad \text{as } h \to 0.
\]

This is equivalent to saying that, for i = 1, . . . , m and j = 1, . . . , n,

\[
\frac{\partial f_i}{\partial x_j}(a) = \lim_{h \to 0} \frac{f_i(a + he_j) - f_i(a)}{h} = Df_i(a)e_j,
\]

and so ∂fi/∂xj(a) is the j-th component of Dfi(a).

It follows that

\[
Df_i(a) = \left( \frac{\partial f_i}{\partial x_1}(a), \ldots, \frac{\partial f_i}{\partial x_n}(a) \right) = (\nabla f_i(a))^T,
\]

as required.
Finally, consider the fourth statement. Assume that the partial derivatives ∂fi/∂xj (for i = 1, . . . , m and j = 1, . . . , n) all exist in a neighbourhood of a and are continuous at a. We show that f is differentiable at a.

We can write, for i = 1, . . . , m,

\begin{align*}
f_i(a + h) &= f_i\bigl(a + (0, h_2, \ldots, h_n)^T + h_1 e_1\bigr) \\
&= f_i\bigl(a + (0, h_2, \ldots, h_n)^T\bigr)
+ \frac{\partial f_i}{\partial x_1}\bigl(a + (0, h_2, \ldots, h_n)^T\bigr)h_1 + h_1 R_1^{(i)}(h_1),
\end{align*}

where R_1^{(i)}(h1) → 0 as h1 → 0.
In doing so, we have treated fi as a function of one
variable only – namely the first component of its
argument – while keeping all the remaining
components fixed.

Since the partial derivative ∂fi/∂x1 exists in a neighbourhood of a, this function is differentiable in a neighbourhood of a1, as we vary the first component a1 + h1 around a1.
Then we have used the fact that a real-valued
function of a real variable is differentiable if and only
if it has a local linear approximation.
Then we can repeat this argument to write

\begin{align*}
f_i\bigl(a + (0, h_2, \ldots, h_n)^T\bigr) &= f_i\bigl(a + (0, 0, h_3, \ldots, h_n)^T\bigr) \\
&\quad + \frac{\partial f_i}{\partial x_2}\bigl(a + (0, 0, h_3, \ldots, h_n)^T\bigr)h_2 + h_2 R_2^{(i)}(h_2),
\end{align*}

where R_2^{(i)}(h2) → 0 as h2 → 0.
Thus, substituting for fi(a + (0, h2, . . . , hn)^T),

\begin{align*}
f_i(a + h) &= f_i\bigl(a + (0, 0, h_3, \ldots, h_n)^T\bigr) \\
&\quad + \frac{\partial f_i}{\partial x_2}\bigl(a + (0, 0, h_3, \ldots, h_n)^T\bigr)h_2 \\
&\quad + \frac{\partial f_i}{\partial x_1}\bigl(a + (0, h_2, \ldots, h_n)^T\bigr)h_1 \\
&\quad + h_1 R_1^{(i)}(h_1) + h_2 R_2^{(i)}(h_2),
\end{align*}

where R_j^{(i)}(hj) → 0 as hj → 0 for j = 1, 2.
Repeating this argument again, we arrive at

\begin{align*}
f_i(a + h) &= f_i(a) + \frac{\partial f_i}{\partial x_n}(a)h_n + \cdots \\
&\quad + \frac{\partial f_i}{\partial x_2}\bigl(a + (0, 0, h_3, \ldots, h_n)^T\bigr)h_2 \\
&\quad + \frac{\partial f_i}{\partial x_1}\bigl(a + (0, h_2, \ldots, h_n)^T\bigr)h_1 \\
&\quad + h_1 R_1^{(i)}(h_1) + h_2 R_2^{(i)}(h_2) + \cdots + h_n R_n^{(i)}(h_n),
\end{align*}

where R_j^{(i)}(hj) → 0 as hj → 0 for j = 1, 2, . . . , n.
As h → 0, hj → 0 for j = 1, . . . , n. Then, as h → 0, using the continuity of ∂fi/∂xj at a for each i = 1, . . . , m and each j = 1, . . . , n, we can write

\[
\frac{\partial f_i}{\partial x_j}\bigl(a + (0, \ldots, 0, h_{j+1}, \ldots, h_n)^T\bigr) = \frac{\partial f_i}{\partial x_j}(a) + M_j^{(i)}(h),
\]

where M_j^{(i)}(h) → 0 as h → 0.
Substituting this into the above, we may write

\begin{align*}
f_i(a + h) &= f_i(a) + \frac{\partial f_i}{\partial x_n}(a)h_n + \cdots + \frac{\partial f_i}{\partial x_2}(a)h_2 + \frac{\partial f_i}{\partial x_1}(a)h_1 \\
&\quad + h_1 M_1^{(i)}(h) + h_2 M_2^{(i)}(h) + \cdots + h_n M_n^{(i)}(h) \\
&\quad + h_1 R_1^{(i)}(h_1) + h_2 R_2^{(i)}(h_2) + \cdots + h_n R_n^{(i)}(h_n),
\end{align*}

where R_j^{(i)}(hj) → 0 as hj → 0 for j = 1, 2, . . . , n, and M_j^{(i)}(h) → 0 as h → 0 for j = 1, 2, . . . , n.
In other words,

\[
f_i(a + h) - f_i(a) = A_i h + R^{(i)}(h)h + M^{(i)}(h)h.
\]

Here Ai is the 1 × n matrix with j-th entry ∂fi/∂xj(a), M^{(i)}(h) is the 1 × n matrix with j-th entry M_j^{(i)}(h), and R^{(i)}(h) is the 1 × n matrix with j-th entry R_j^{(i)}(hj).

Hence

\[
\frac{f_i(a + h) - f_i(a) - A_i h}{\|h\|} = \bigl(R^{(i)}(h) + M^{(i)}(h)\bigr)\frac{h}{\|h\|} \to 0 \quad \text{as } h \to 0,
\]

since the entries of R^{(i)}(h) + M^{(i)}(h) tend to 0 while h/‖h‖ has norm 1. This implies that each fi is differentiable at a, and hence that f(x) = (f1(x), . . . , fm(x))^T is differentiable at a.
Theorem Suppose f : R^n → R is differentiable at a. Then

• all directional derivatives of f at a exist, and D_v f(a) = (∇f(a))^T v

• Df(a) = (∇f(a))^T.

We have already proved the second statement, that Df(a) = (∇f(a))^T, if f is differentiable at a. See the proof of the third statement of the previous theorem.

Let us consider the first statement, namely that all directional derivatives of f at a exist, and D_v f(a) = (∇f(a))^T v.

Recall that the derivative Df(a) is a linear map h ↦ Df(a)h from R^n to R^m.

We have also shown that

\[
Df(a) = (\nabla f(a))^T = \left( \frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_n}(a) \right).
\]

Then, by definition of the derivative, for each unit vector v, as h → 0,

\[
\frac{f(a + hv) - f(a) - (\nabla f(a))^T (hv)}{\|hv\|}
= \frac{f(a + hv) - f(a) - (\nabla f(a))^T (hv)}{|h|} \to 0.
\]

So

\[
\left( \frac{f(a + hv) - f(a)}{h} - (\nabla f(a))^T v \right) \cdot \frac{h}{|h|} \to 0.
\]

Hence

\[
\frac{f(a + hv) - f(a)}{h} - (\nabla f(a))^T v \to 0,
\]

so

\[
D_v f(a) = \lim_{h \to 0} \frac{f(a + hv) - f(a)}{h} = (\nabla f(a))^T v,
\]

as required.
The gradient has a useful interpretation. We have seen that the rate of change of f at a in the direction v is the directional derivative

D_v f(a) = (∇f(a))^T v.

This may be expressed as the inner product ⟨∇f(a), v⟩, which equals

‖∇f(a)‖ ‖v‖ cos θ = ‖∇f(a)‖ cos θ,

where θ denotes the angle between the vectors ∇f(a) and v, and where we have used the fact that ‖v‖ = 1. This quantity is maximised when cos θ = 1, so it is maximised when the direction v is in the same direction as ∇f(a).
Suppose ∇f(a) ≠ 0.

Since directions have length 1, this means that the maximising v is

\[
v = \frac{\nabla f(a)}{\|\nabla f(a)\|}.
\]

So the function f increases most rapidly in the direction of the gradient.
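An optional numerical illustration of this steepest-ascent property (not from the notes): for an assumed gradient ∇f(a) = (2, 2) (for example f(x, y) = x² + 2y at a = (1, 0)), sampling unit directions and evaluating (∇f(a))^T v picks out the direction ∇f(a)/‖∇f(a)‖.

```python
# Sketch: the direction of steepest ascent is grad f(a) / ||grad f(a)||.
# Assumed example: grad f(a) = (2, 2), e.g. f(x, y) = x**2 + 2*y at a = (1, 0),
# so the maximising unit direction is (1/sqrt(2), 1/sqrt(2)).

import math

grad = (2.0, 2.0)                                   # gradient of the assumed f at a
norm = math.hypot(*grad)

# Sample unit directions v and evaluate the directional derivative (grad f(a))^T v.
directions = [(math.cos(t), math.sin(t)) for t in
              [k * 2 * math.pi / 360 for k in range(360)]]
best = max(directions, key=lambda v: grad[0] * v[0] + grad[1] * v[1])

print("best sampled direction:   ", [round(c, 4) for c in best])
print("grad f(a) / ||grad f(a)||:", [round(c / norm, 4) for c in grad])
```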
Note that directional derivatives in all directions can
exist, even if f is not differentiable.

This is in contrast to the fact that, if f is


differentiable, then directional derivatives in all
directions must exist.
Example Suppose that f : R^2 → R is given by

\[
f\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{cases}
\dfrac{x^2 y}{x^4 + y^2} & \text{if } (x, y)^T \ne 0, \\[4pt]
0 & \text{if } (x, y)^T = 0.
\end{cases}
\]

Let v = (u, v)^T be a direction vector, and let us take a = 0. We have

\[
\frac{f(0 + tv) - f(0)}{t} = \frac{1}{t} \cdot \frac{(tu)^2 (tv)}{(tu)^4 + (tv)^2}.
\]

This is equal to

\[
\frac{t^2 u^2 v}{t^4 u^4 + t^2 v^2} = \frac{u^2 v}{t^2 u^4 + v^2}.
\]

If v ≠ 0, this tends to u^2/v as t → 0, while if v = 0, then it tends to 0. The limit therefore exists in all cases.
So f has directional derivatives in every possible
direction v at 0.

But f is not differentiable.

In fact, it is not even continuous at 0, since, for


example,
\[
f(t, t^2) = \frac{t^2 \cdot t^2}{t^4 + t^4} = \frac{1}{2} \not\to 0 = f(0)
\]

as t → 0.
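This behaviour is easy to see numerically. The optional sketch below evaluates f along the line y = x and along the parabola y = x²; the specific parameter values t are arbitrary.

```python
# Sketch: the function f(x, y) = x**2 * y / (x**4 + y**2) (with f(0, 0) = 0)
# from the example above. Along the line y = x it tends to 0, but along the
# parabola y = x**2 it is constantly 1/2, so f is not continuous at 0.

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x ** 2 * y / (x ** 4 + y ** 2)

for t in [0.1, 0.01, 0.001]:
    print(f"t = {t:<6} f(t, t) = {f(t, t):.6f}   f(t, t**2) = {f(t, t*t):.6f}")
# f(t, t) -> 0, while f(t, t**2) = 0.5 for every t != 0.
```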
Example Suppose f : R^2 → R is given by

\[
f\begin{pmatrix} x \\ y \end{pmatrix} =
\begin{cases}
\dfrac{x^3}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\[4pt]
0 & \text{if } (x, y) = (0, 0).
\end{cases}
\]

Here, f is continuous at (0, 0)^T, since

\[
|f((x, y)^T)| = \frac{x^2}{x^2 + y^2} \cdot |x| \le |x| \to 0
\]

as (x, y) → (0, 0).
Also, f has directional derivatives in all directions at (0, 0)^T. To see this, note that, for each unit vector v,

\[
\frac{f(hv)}{h} = \frac{h^3 u^3}{h^2 (u^2 + v^2)} \cdot \frac{1}{h} = u^3 \to u^3, \quad \text{as } h \to 0,
\]

so D_v f(0) = u^3, where v = (u, v)^T.

However f is not differentiable at (0, 0).

From the above, taking v = e1, we see that

\[
\frac{\partial f}{\partial x}(0) = 1,
\]

and, taking v = e2, we see that

\[
\frac{\partial f}{\partial y}(0) = 0.
\]

Now recall that, if a function g : R^2 → R is differentiable at a point a, then its directional derivatives are given by

\[
D_v g(a) = \left( \frac{\partial g}{\partial x}(a), \frac{\partial g}{\partial y}(a) \right) v.
\]

So here, we would need to have, for each unit vector v = (u, v)^T,

\[
u^3 = (1, 0)(u, v)^T = u,
\]

which is not true for u ≠ 0, ±1.
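A short numerical sketch of this failure (the sampled directions below are arbitrary choices, not from the notes): the directional derivative computed from the limit is u³, whereas differentiability would force it to equal the linear prediction (∇f(0))^T v = u.

```python
# Sketch: for f(x, y) = x**3 / (x**2 + y**2) (with f(0, 0) = 0), the directional
# derivative at 0 in direction v = (u, v) is u**3, while differentiability would
# force it to equal (grad f(0))^T v = u. Comparing the two shows they disagree.

import math

def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x ** 3 / (x ** 2 + y ** 2)

for angle_deg in [0, 30, 45, 60, 90]:
    t = math.radians(angle_deg)
    u, v = math.cos(t), math.sin(t)                      # a unit direction
    h = 1e-6
    directional = (f(h * u, h * v) - f(0.0, 0.0)) / h    # ≈ u**3
    linear_prediction = 1.0 * u + 0.0 * v                # (grad f(0))^T v = u
    print(f"{angle_deg:>3} deg  D_v f(0) ≈ {directional:.4f}   u = {linear_prediction:.4f}")
# The columns differ (e.g. at 45 deg: 0.3536 vs 0.7071), so f is not differentiable at 0.
```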
