$$f'(a) = \lim_{x\to a}\frac{|x| - |a|}{x - a} = \lim_{x\to a}\frac{x - a}{x - a} = 1.$$
Similarly, if $a < 0$, then $f'(a) = -1$. But at $a = 0$ we have:
$$f'_l(0) = \lim_{x\to 0^-}\frac{|x| - |0|}{x - 0} = \lim_{x\to 0^-}\frac{(-x) - 0}{x} = -1.$$
But $f'_r(0) = 1$, since
$$f'_r(0) = \lim_{x\to 0^+}\frac{|x| - |0|}{x - 0} = \lim_{x\to 0^+}\frac{x - 0}{x} = 1.$$
Since the left and right derivatives at $0$ differ, the derivative of $f$ does not exist at $0$.
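A quick numerical check of the two one-sided limits: the sketch below evaluates the difference quotient $(|x| - |0|)/(x - 0)$ at points approaching $0$ from each side. The step sizes are an arbitrary illustrative choice.

```python
def abs_quotient(x: float) -> float:
    """Difference quotient (|x| - |0|) / (x - 0) for f(x) = |x| at a = 0."""
    return (abs(x) - abs(0.0)) / (x - 0.0)

# Shrinking step sizes 1e-1, ..., 1e-6.
steps = [10.0 ** (-k) for k in range(1, 7)]

left = [abs_quotient(-h) for h in steps]   # approach 0 from the left (x < 0)
right = [abs_quotient(h) for h in steps]   # approach 0 from the right (x > 0)

print("left: ", left)   # every entry is -1.0
print("right:", right)  # every entry is 1.0
```

The quotients are constant on each side, so the one-sided limits are $-1$ and $1$, and no single limit exists.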
Differentiability and continuity
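$$0 < |x - a| < \delta \implies \left|\frac{f(x) - f(a)}{x - a} - f'(a)\right| < f'(a).$$
Taking $x = a + h$ with $0 < h < \delta$, the difference quotient $\frac{f(a+h) - f(a)}{h}$ is positive, so
$$0 < f(a + h) - f(a).$$
Similarly, since
$$0 < |x - a| < \delta \implies \left|\frac{f(x) - f(a)}{x - a} - f'(a)\right| < f'(a),$$
taking $x = a - h$ with $0 < h < \delta$ gives $\frac{f(a-h) - f(a)}{-h} > 0$, so
$$0 > f(a - h) - f(a).$$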
• if $g(c) \neq 0$, then $\displaystyle\left(\frac{f}{g}\right)'(c) = \frac{g(c)f'(c) - g'(c)f(c)}{g(c)^2}$.
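As a sanity check of the quotient rule (not a proof), the following sketch compares the formula with a symmetric difference quotient. The functions $f = \sin$ and $g = \cos + 2$, the point $c = 0.7$ and the helper `central_diff` are illustrative assumptions, chosen so that $g(c) \neq 0$.

```python
import math

def central_diff(func, c, h=1e-6):
    """Symmetric difference quotient, an O(h^2) approximation of func'(c)."""
    return (func(c + h) - func(c - h)) / (2 * h)

# Illustrative choice (not from the text): g(x) = cos(x) + 2 never vanishes.
def f(x):
    return math.sin(x)

def df(x):
    return math.cos(x)

def g(x):
    return math.cos(x) + 2.0

def dg(x):
    return -math.sin(x)

c = 0.7
quotient_rule = (g(c) * df(c) - dg(c) * f(c)) / g(c) ** 2
numeric = central_diff(lambda x: f(x) / g(x), c)
print(quotient_rule, numeric)  # agree to many decimal places
```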
Let us prove that $(f + g)'(c) = f'(c) + g'(c)$.
where $R(h) \to 0$ as $h \to 0$.
$$f(c + h) = f(c) + ha + hR(h),$$
where $R(h) \to 0$ as $h \to 0$.
where $R_1(h) \to 0$ as $h \to 0$.
Put $k = f(c + h) - f(c) = hf'(c) + hR_1(h)$, so that
where $R_2(k) \to 0$ as $k \to 0$.
Substituting $k = hf'(c) + hR_1(h)$, we get
where
as required.
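The surviving fragments appear to prove the chain rule $(g \circ f)'(c) = g'(f(c))f'(c)$ via the substitution $k = f(c + h) - f(c)$. As a numerical sanity check (not a proof), the sketch below compares that formula with a symmetric difference quotient for the hypothetical choice of inner function $f(x) = x^2$, outer function $g = \sin$ and point $c = 1.3$.

```python
import math

def central_diff(func, c, h=1e-6):
    """Symmetric difference quotient approximating func'(c)."""
    return (func(c + h) - func(c - h)) / (2 * h)

# Hypothetical example: inner function f(x) = x^2, outer function g = sin.
def f(x):
    return x * x

def df(x):
    return 2 * x

g, dg = math.sin, math.cos

c = 1.3
chain_rule = dg(f(c)) * df(c)                 # g'(f(c)) f'(c)
numeric = central_diff(lambda x: g(f(x)), c)  # approximates (g o f)'(c)
print(chain_rule, numeric)
```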
By the uniqueness of limits,
Let us consider
and
so $h'(c) = 0$ is equivalent to
as required.
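The argument above appears to establish the Cauchy Mean Value Theorem. Assuming the statement is that, for $f$ and $g$ continuous on $[a, b]$ and differentiable on $(a, b)$, some $c \in (a, b)$ satisfies $(f(b) - f(a))g'(c) = (g(b) - g(a))f'(c)$, the sketch below locates such a $c$ by bisection for the hypothetical choice $f(x) = x^3$, $g(x) = x^2$ on $[1, 2]$. Here the condition reduces to $14c - 9c^2 = 0$, so the point inside $(1, 2)$ is $c = 14/9$.

```python
def bisect(phi, lo, hi, tol=1e-12):
    """Root of phi in [lo, hi], assuming phi(lo) and phi(hi) have opposite signs."""
    f_lo = phi(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = phi(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
        else:
            hi = mid               # root lies in [lo, mid]
    return (lo + hi) / 2

# Hypothetical example: f(x) = x^3, g(x) = x^2 on [a, b] = [1, 2].
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
g = lambda x: x ** 2
dg = lambda x: 2 * x
a, b = 1.0, 2.0

# phi(c) = (f(b) - f(a)) g'(c) - (g(b) - g(a)) f'(c) = 14c - 9c^2 here.
phi = lambda c: (f(b) - f(a)) * dg(c) - (g(b) - g(a)) * df(c)

c = bisect(phi, a, b)
print(c)  # close to 14/9
```

With $g(x) = x$ the same equation reads $f(b) - f(a) = (b - a)f'(c)$, the ordinary Mean Value Theorem.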
Note that the Mean Value Theorem is a special case of this result, with $g(x) = x$. Then $g(b) = b$, $g(a) = a$ and $g'(x) = 1$ for all $x$, so $g'(c) = 1$.
Then
becomes
Analogously, if $n = m = 1$, the derivative $\frac{df}{dx}$ at a point $a \in \mathbb{R}$ is the instantaneous rate of change of $f$ with respect to $x$ at $a$.
Formally, $\frac{\partial f}{\partial x_i}(a)$ is
$$\lim_{h\to 0}\frac{f(a_1, \dots, a_{i-1}, a_i + h, a_{i+1}, \dots, a_n) - f(a)}{h},$$
if this limit exists.
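The definition translates directly into a numerical approximation: move only the $i$-th coordinate by a small $h$ and form the quotient. The sketch below uses a fixed $h = 10^{-6}$ rather than a true limit, and the function $f(x_1, x_2, x_3) = x_1^2 x_2 + \sin x_3$, the point $a$ and the helper `partial` are hypothetical examples.

```python
import math

def partial(f, a, i, h=1e-6):
    """Approximate the i-th partial derivative of f at a (0-based index i).

    Uses the quotient from the definition with one small fixed h:
    only coordinate i is moved, from a[i] to a[i] + h.
    """
    shifted = list(a)
    shifted[i] += h
    return (f(shifted) - f(a)) / h

def f(x):
    # Hypothetical example: f(x1, x2, x3) = x1^2 x2 + sin(x3)
    return x[0] ** 2 * x[1] + math.sin(x[2])

a = [1.0, 2.0, 0.5]
approx = [partial(f, a, i) for i in range(3)]
exact = [2 * a[0] * a[1], a[0] ** 2, math.cos(a[2])]  # hand-computed partials
print(approx, exact)
```

A forward quotient has error of order $h$; a symmetric quotient would be more accurate, but the forward form mirrors the definition.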
We may think of the partial derivative $\partial f/\partial x_i$ as the rate of change in $f$ as we move in the direction of the vector $e_i$, because
$$\frac{f(a_1, \dots, a_{i-1}, a_i + h, a_{i+1}, \dots, a_n) - f(a)}{h} = \frac{f(a + he_i) - f(a)}{h}.$$
But we can move in many other directions, and such
considerations lead to the notion of directional
derivative.
We define a direction (or direction vector) in $\mathbb{R}^n$ to be an $n$-vector of length 1.
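$$\begin{aligned}
f_1(a + h) - f_1(a) &\approx \frac{\partial f_1}{\partial x_1}(a)h_1 + \cdots + \frac{\partial f_1}{\partial x_n}(a)h_n\\
&\;\;\vdots\\
f_m(a + h) - f_m(a) &\approx \frac{\partial f_m}{\partial x_1}(a)h_1 + \cdots + \frac{\partial f_m}{\partial x_n}(a)h_n.
\end{aligned}$$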
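When $n = m = 1$, this just says that, for $h$ small enough, $f(a + h) - f(a) \approx f'(a)h$. In general, in matrix form,
$$f(a + h) - f(a) \approx \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a)\\ \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}\begin{pmatrix} h_1\\ h_2\\ \vdots\\ h_n \end{pmatrix}.$$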
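This describes the linear approximation of $f$ at $a$, and the matrix (or, equivalently, the linear mapping it describes)
$$Df(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a)\\ \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}$$
is known as the derivative (or the Jacobian derivative) of $f$ at $a$.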
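Numerically, column $j$ of $Df(a)$ can be approximated by perturbing only $x_j$. The sketch below uses symmetric finite differences (a swapped-in numerical technique, not something the text describes) for the hypothetical map $f(x_1, x_2) = (x_1^2 x_2,\ 5x_1 + \sin x_2,\ x_1 x_2)^T$ from $\mathbb{R}^2$ to $\mathbb{R}^3$, and compares the result with the hand-computed matrix of partial derivatives.

```python
import math

def jacobian(f, a, h=1e-6):
    """Approximate Df(a) as an m x n list of rows; entry [i][j] ~ df_i/dx_j(a)."""
    n = len(a)
    columns = []
    for j in range(n):
        plus, minus = list(a), list(a)
        plus[j] += h
        minus[j] -= h
        fp, fm = f(plus), f(minus)
        # Column j: symmetric difference quotient of every component f_i.
        columns.append([(p - q) / (2 * h) for p, q in zip(fp, fm)])
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]  # transpose

def f(x):
    # Hypothetical example map from R^2 to R^3.
    x1, x2 = x
    return [x1 ** 2 * x2, 5 * x1 + math.sin(x2), x1 * x2]

def exact_Df(x):
    # Hand-computed partial derivatives, one row per component f_i.
    x1, x2 = x
    return [[2 * x1 * x2, x1 ** 2], [5.0, math.cos(x2)], [x2, x1]]

a = [1.5, -0.5]
approx = jacobian(f, a)
exact = exact_Df(a)
print(approx)
print(exact)
```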
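• $f$ is differentiable at $a \implies f$ is continuous at $a$
• $f$ is differentiable at $a \iff f_i$ is differentiable at $a$ (for $i = 1, 2, \dots, m$)
• If $f$ is differentiable at $a$, then
$$Df(a) = \begin{pmatrix} (\nabla f_1(a))^T\\ \vdots\\ (\nabla f_m(a))^T \end{pmatrix} = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a)\\ \vdots & \ddots & \vdots\\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}$$
• If the partial derivatives $\frac{\partial f_i}{\partial x_j}$ (for $i = 1, \dots, m$ and $j = 1, \dots, n$) all exist in a neighbourhood of $a$ and are continuous at $a$, then $f$ is differentiable at $a$.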
Let us consider the first statement. We assume that $f$ is differentiable at $a$, that is, there exists a linear function $x \mapsto Ax$ (where $A = Df(a)$ is an $m \times n$ matrix) from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that
$$\frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0$$
as $h \to 0$.
We must show that $f(a + h) - f(a) \to 0$ as $h \to 0$.
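But
$$\begin{aligned}
\|f(a + h) - f(a)\| &= \frac{\|f(a + h) - f(a)\|}{\|h\|}\cdot\|h\|\\
&= \frac{\|f(a + h) - f(a) - Ah + Ah\|}{\|h\|}\,\|h\|\\
&\le \|h\|\left(\frac{\|f(a + h) - f(a) - Ah\|}{\|h\|} + \frac{\|Ah\|}{\|h\|}\right)\\
&= \|h\|\cdot\frac{\|f(a + h) - f(a) - Ah\|}{\|h\|} + \|Ah\|.
\end{aligned}$$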
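So
$$\lim_{h\to 0}\|f(a + h) - f(a)\| \le \lim_{h\to 0}\|h\|\cdot\lim_{h\to 0}\frac{\|f(a + h) - f(a) - Ah\|}{\|h\|} + \lim_{h\to 0}\|Ah\| = 0\cdot 0 + 0 = 0,$$
where $\lim_{h\to 0}\|Ah\| = 0$ because $\|Ah\| \le \|A\|\,\|h\|$. Hence $f$ is continuous at $a$.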
Let us consider the second statement. We assume that $f$ is differentiable at $a$, that is, there exists a linear function $x \mapsto Ax$ (where $A = Df(a)$ is an $m \times n$ matrix) from $\mathbb{R}^n$ to $\mathbb{R}^m$ such that
$$\frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0$$
as $h \to 0$.
In other words,
$$g(h) = \frac{f(a + h) - f(a) - Ah}{\|h\|} \to 0,$$
where $A$ is the $m \times n$ matrix whose $i$-th row is the $1 \times n$ matrix $A_i$, for $i = 1, 2, \dots, m$.
$$\frac{f_i(a + he_j) - f_i(a)}{h} - Df_i(a)e_j \to 0 \quad\text{as } h \to 0.$$
This is equivalent to saying that, for $i = 1, \dots, m$ and $j = 1, \dots, n$,
$$\frac{\partial f_i}{\partial x_j}(a) = \lim_{h\to 0}\frac{f_i(a + he_j) - f_i(a)}{h} = Df_i(a)e_j,$$
and so $\frac{\partial f_i}{\partial x_j}(a)$ is the $j$-th component of $Df_i(a)$.
It follows that
$$Df_i(a) = \left(\frac{\partial f_i}{\partial x_1}(a), \dots, \frac{\partial f_i}{\partial x_n}(a)\right) = (\nabla f_i(a))^T,$$
as required.
Finally, consider the fourth statement. Assume that the partial derivatives $\frac{\partial f_i}{\partial x_j}$ (for $i = 1, \dots, m$ and $j = 1, \dots, n$) all exist in a neighbourhood of $a$ and are continuous at $a$. We show that $f$ is differentiable at $a$.
Since the partial derivative $\frac{\partial f_i}{\partial x_1}$ exists in a neighbourhood of $a$, the function $f_i$, regarded as a function of its first component alone, is differentiable in a neighbourhood of $a_1$ as we vary the first component $a_1 + h_1$ around $a_1$.
Here we have used the fact that a real-valued function of a real variable is differentiable if and only if it has a local linear approximation.
Then we can repeat this argument to write
where $R_2^{(i)}(h_2) \to 0$ as $h_2 \to 0$.
Thus, substituting for $f_i(a + (0, h_2, \dots, h_n)^T)$,
where $R_j^{(i)}(h_j) \to 0$ as $h_j \to 0$ for $j = 1, 2$.
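Repeating this argument again, we arrive at
$$\begin{aligned}
f_i(a + h) = f_i(a) &+ \frac{\partial f_i}{\partial x_n}(a)h_n + \cdots\\
&+ \frac{\partial f_i}{\partial x_2}(a + (0, 0, h_3, \dots, h_n)^T)h_2 + \frac{\partial f_i}{\partial x_1}(a + (0, h_2, \dots, h_n)^T)h_1\\
&+ h_1R_1^{(i)}(h_1) + h_2R_2^{(i)}(h_2) + \cdots + h_nR_n^{(i)}(h_n),
\end{aligned}$$
where $R_j^{(i)}(h_j) \to 0$ as $h_j \to 0$ for $j = 1, 2, \dots, n$.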
As $h \to 0$, $h_j \to 0$ for $j = 1, \dots, n$. Then, as $h \to 0$, using the continuity of $\frac{\partial f_i}{\partial x_j}$ at $a$ for each $i = 1, \dots, m$ and each $j = 1, \dots, n$, we can write
$$\frac{\partial f_i}{\partial x_j}(a + (0, \dots, 0, h_{j+1}, \dots, h_n)^T) = \frac{\partial f_i}{\partial x_j}(a) + M_j^{(i)}(h),$$
where $M_j^{(i)}(h) \to 0$ as $h \to 0$.
Substituting this into the above, we may write
$$\begin{aligned}
f_i(a + h) = f_i(a) &+ \frac{\partial f_i}{\partial x_n}(a)h_n + \cdots + \frac{\partial f_i}{\partial x_2}(a)h_2 + \frac{\partial f_i}{\partial x_1}(a)h_1\\
&+ h_1M_1^{(i)}(h) + h_2M_2^{(i)}(h) + \cdots + h_nM_n^{(i)}(h)\\
&+ h_1R_1^{(i)}(h_1) + h_2R_2^{(i)}(h_2) + \cdots + h_nR_n^{(i)}(h_n),
\end{aligned}$$
where $R_j^{(i)}(h_j) \to 0$ as $h_j \to 0$ for $j = 1, 2, \dots, n$, and $M_j^{(i)}(h) \to 0$ as $h \to 0$ for $j = 1, 2, \dots, n$.
In other words,
Here $A_i$ is the $1 \times n$ matrix with $j$-th entry $\frac{\partial f_i}{\partial x_j}(a)$.
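The fourth statement can be illustrated numerically (an illustration, not a replacement for the proof). For the hypothetical map $f(x_1, x_2) = (x_1^2 x_2,\ 5x_1 + \sin x_2)^T$, whose partial derivatives are continuous everywhere, the sketch below builds $A = Df(a)$ from those partials and evaluates $\|f(a + h) - f(a) - Ah\|/\|h\|$ for increments $h = t\,d$ along a fixed unit vector $d$; differentiability at $a$ says this ratio tends to $0$.

```python
import math

def f(x):
    # Hypothetical example map from R^2 to R^2 with continuous partials.
    x1, x2 = x
    return [x1 ** 2 * x2, 5 * x1 + math.sin(x2)]

def Df(x):
    # Matrix of hand-computed partial derivatives; row i is (grad f_i)^T.
    x1, x2 = x
    return [[2 * x1 * x2, x1 ** 2], [5.0, math.cos(x2)]]

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def remainder_ratio(a, h):
    """||f(a + h) - f(a) - Df(a) h|| / ||h|| for a nonzero increment h."""
    A = Df(a)
    fa = f(a)
    fah = f([ai + hi for ai, hi in zip(a, h)])
    Ah = [sum(A[i][j] * h[j] for j in range(len(h))) for i in range(len(A))]
    r = [fah[i] - fa[i] - Ah[i] for i in range(len(fa))]
    return norm(r) / norm(h)

a = [1.5, -0.5]
direction = [0.6, -0.8]  # a unit vector
ratios = [remainder_ratio(a, [t * d for d in direction])
          for t in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)  # shrinks roughly in proportion to t
```

Checking one direction cannot prove differentiability, since the definition requires $h \to 0$ from every direction at once.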
So
$$\left(\frac{f(a + hv) - f(a)}{h} - (\nabla f(a))^T v\right)\cdot\frac{h}{|h|} \to 0.$$
Since $h/|h| = \pm 1$, it follows that
$$\frac{f(a + hv) - f(a)}{h} - (\nabla f(a))^T v \to 0,$$
so
$$D_v f(a) = \lim_{h\to 0}\frac{f(a + hv) - f(a)}{h} = (\nabla f(a))^T v,$$
as required.
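To see the formula $D_v f(a) = (\nabla f(a))^T v$ at work, the sketch below compares difference quotients along $v$ with the gradient prediction for the hypothetical function $f(x, y) = x^2 + 3xy$ at $a = (1, 2)^T$ with unit vector $v = (0.6, 0.8)^T$. In exact arithmetic the quotient here equals $7.2 + 1.8h$, so it approaches $(\nabla f(a))^T v = 8(0.6) + 3(0.8) = 7.2$.

```python
def f(x, y):
    # Hypothetical example: f(x, y) = x^2 + 3xy
    return x ** 2 + 3 * x * y

def grad_f(x, y):
    # Hand-computed gradient: (2x + 3y, 3x)
    return (2 * x + 3 * y, 3 * x)

a = (1.0, 2.0)
v = (0.6, 0.8)  # a unit vector: 0.36 + 0.64 = 1

# Difference quotients (f(a + h v) - f(a)) / h for shrinking h.
quotients = [
    (f(a[0] + h * v[0], a[1] + h * v[1]) - f(*a)) / h
    for h in (1e-2, 1e-4, 1e-6)
]

g = grad_f(*a)
prediction = g[0] * v[0] + g[1] * v[1]  # (grad f(a))^T v, about 7.2
print(quotients, prediction)
```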
The gradient has a useful interpretation. We have seen that the rate of change of $f$ at $a$ in the direction $v$ is the directional derivative $D_v f(a) = (\nabla f(a))^T v$.
$$|f((x, y)^T)| = \frac{x^2}{x^2 + y^2}\cdot|x| \le |x| \to 0$$
as $(x, y)^T \to (0, 0)^T$.
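The bound above can be checked numerically. The fragment suggests, though this chunk does not state it outright, that $f(x, y) = x^3/(x^2 + y^2)$ away from the origin with $f(0, 0) = 0$; the sketch below assumes that definition and samples points on shrinking circles around the origin, recording the largest $|f|$ on each circle.

```python
import math

def f(x, y):
    """Assumed example (inferred, not quoted): x^3 / (x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 / (x ** 2 + y ** 2)

angles = [0.0, 0.5, 1.0, 2.0, 3.0, 4.5]
radii = [10.0 ** (-k) for k in range(1, 8)]

bound_holds = []  # does |f(x, y)| <= |x| hold (up to rounding) on each circle?
max_abs_f = []    # largest sampled |f| on the circle of radius r
for r in radii:
    pts = [(r * math.cos(t), r * math.sin(t)) for t in angles]
    bound_holds.append(all(abs(f(x, y)) <= abs(x) * (1 + 1e-12) for x, y in pts))
    max_abs_f.append(max(abs(f(x, y)) for x, y in pts))

print(max_abs_f)  # each entry is at most the corresponding radius r
```

Sampling only finitely many points illustrates the estimate; the inequality $|f| \le |x|$ is what actually proves continuity at the origin.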
Also, $f$ has directional derivatives in all directions at $(0, 0)^T$. To see this, note that, for each unit vector $v = (v_1, v_2)^T$,
$$\frac{f(hv)}{h} = \frac{h^3 v_1^3}{h^2(v_1^2 + v_2^2)}\cdot\frac{1}{h} = v_1^3 \to v_1^3 \quad\text{as } h \to 0,$$