
Mathematics for Machine Learning

Multivariate Calculus
Formula sheet
Dr Samuel J. Cooper
Prof. David Dye
Dr A. Freddie Page

Definition of a derivative

  f′(x) = df(x)/dx = lim_{∆x→0} [f(x + ∆x) − f(x)] / ∆x

Derivatives of named functions

  d/dx (1/x) = −1/x²
  d/dx (sin(x)) = cos(x)
  d/dx (cos(x)) = −sin(x)
  d/dx (exp(x)) = exp(x)
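As a quick illustration (not on the original sheet), the limit definition can be checked numerically: the sketch below approximates f′(x) with a forward difference and compares it with the named derivatives above. The step size 1e-6 and the test point x = 0.7 are arbitrary choices.

```python
import math

def numerical_derivative(f, x, dx=1e-6):
    """Forward-difference approximation of f'(x) from the limit definition."""
    return (f(x + dx) - f(x)) / dx

x = 0.7
checks = [
    (lambda t: 1.0 / t, lambda t: -1.0 / t**2),   # d/dx (1/x) = -1/x^2
    (math.sin,          math.cos),                # d/dx sin(x) = cos(x)
    (math.cos,          lambda t: -math.sin(t)),  # d/dx cos(x) = -sin(x)
    (math.exp,          math.exp),                # d/dx exp(x) = exp(x)
]
for f, true_df in checks:
    approx = numerical_derivative(f, x)
    print(f"{approx:.6f}  vs  {true_df(x):.6f}")
```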
Time saving rules

- Sum Rule:
  d/dx (f(x) + g(x)) = d/dx (f(x)) + d/dx (g(x))

- Power Rule:
  Given f(x) = ax^b, then f′(x) = abx^(b−1)

- Product Rule:
  Given A(x) = f(x)g(x), then A′(x) = f′(x)g(x) + f(x)g′(x)

- Chain Rule:
  Given h = h(p) and p = p(m), then dh/dm = dh/dp × dp/dm

- Total derivative:
  For the function f(x, y, z, ...), where each variable is a function of
  parameter t, the total derivative is
  df/dt = ∂f/∂x dx/dt + ∂f/∂y dy/dt + ∂f/∂z dz/dt + ...
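These rules can also be verified symbolically. The following is a minimal sketch using SymPy; the particular functions chosen for f, g, h, p and the two-variable example used for the total derivative are arbitrary illustrations, not part of the sheet.

```python
import sympy as sp

x, m, t = sp.symbols('x m t')

# Product rule: d/dx [f g] = f' g + f g'
f, g = sp.sin(x), sp.exp(x)
assert sp.simplify(sp.diff(f * g, x) - (sp.diff(f, x) * g + f * sp.diff(g, x))) == 0

# Chain rule: dh/dm = dh/dp * dp/dm, with h(p) = p**3 and p(m) = cos(m)
p = sp.cos(m)
h = p**3
assert sp.simplify(sp.diff(h, m) - 3 * p**2 * sp.diff(p, m)) == 0

# Total derivative: f(x, y) = x^2 y with x(t) = sin(t), y(t) = t^2
X, Y = sp.symbols('X Y')
F = X**2 * Y
xt, yt = sp.sin(t), t**2
total = sp.diff(F, X).subs({X: xt, Y: yt}) * sp.diff(xt, t) \
      + sp.diff(F, Y).subs({X: xt, Y: yt}) * sp.diff(yt, t)
assert sp.simplify(total - sp.diff(F.subs({X: xt, Y: yt}), t)) == 0
print("product, chain, and total-derivative rules verified on these examples")
```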

 
Derivative structures

Given f = f(x, y, z)

- Jacobian:
  Jf = [ ∂f/∂x , ∂f/∂y , ∂f/∂z ]

- Hessian:
       [ ∂²f/∂x²   ∂²f/∂x∂y  ∂²f/∂x∂z ]
  Hf = [ ∂²f/∂y∂x  ∂²f/∂y²   ∂²f/∂y∂z ]
       [ ∂²f/∂z∂x  ∂²f/∂z∂y  ∂²f/∂z²  ]
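For a concrete illustration (not part of the sheet), the Jacobian row vector and Hessian matrix of a scalar function can be generated with SymPy; the example function f = x²y + sin(z) is an arbitrary choice.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + sp.sin(z)   # arbitrary example scalar function f(x, y, z)

variables = [x, y, z]
# Jacobian: row vector of first partial derivatives
Jf = sp.Matrix([[sp.diff(f, v) for v in variables]])
# Hessian: matrix of second partial derivatives
Hf = sp.hessian(f, variables)

print(Jf)   # Matrix([[2*x*y, x**2, cos(z)]])
print(Hf)   # Matrix([[2*y, 2*x, 0], [2*x, 0, 0], [0, 0, -sin(z)]])
```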

Neural networks

- Activation function:
  σ(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)
  d/dx (σ(x)) = 1/cosh²(x) = 4 / (e^x + e^−x)²
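A minimal NumPy sketch of this activation and its derivative, using the 1/cosh²(x) form above; the test points are arbitrary.

```python
import numpy as np

def sigma(x):
    """tanh activation: (e^x - e^-x) / (e^x + e^-x)."""
    return np.tanh(x)

def sigma_prime(x):
    """Derivative of tanh: 1 / cosh^2(x) = 4 / (e^x + e^-x)^2."""
    return 1.0 / np.cosh(x)**2

x = np.array([-2.0, 0.0, 2.0])
print(sigma(x))        # activation values in (-1, 1)
print(sigma_prime(x))  # slope, maximal (= 1) at x = 0
```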

Taylor Series

- Univariate:
  f(x) = f(c) + f′(c)(x − c) + ½ f′′(c)(x − c)² + ...
       = Σ_{n=0}^{∞} f^(n)(c)/n! (x − c)^n

- Multivariate:
  f(x) = f(c) + Jf(c)(x − c) + ½ (x − c)^t Hf(c)(x − c) + ...
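As a short illustration (not on the sheet), the univariate expansion can be built term by term and compared with SymPy's own series; the choice of exp(x), the expansion point c = 0, and the truncation at four terms are arbitrary.

```python
import sympy as sp

x = sp.symbols('x')
c = 0                      # expansion point
f = sp.exp(x)              # arbitrary example function

# First four terms of f(x) = sum_n f^(n)(c)/n! (x - c)^n
taylor = sum(sp.diff(f, x, n).subs(x, c) / sp.factorial(n) * (x - c)**n
             for n in range(4))
print(sp.expand(taylor))               # 1 + x + x**2/2 + x**3/6
print(sp.series(f, x, c, 4).removeO()) # same polynomial from sympy.series
```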
Optimization and Vector Calculus

- Newton-Raphson:
  x_{i+1} = x_i − f(x_i)/f′(x_i)

- Grad:
       [ ∂f/∂x ]
  ∇f = [ ∂f/∂y ]
       [ ∂f/∂z ]

- Directional Gradient:
  ∇f · r̂

- Gradient Descent:
  s_{n+1} = s_n − γ∇f

- Lagrange Multipliers λ:
  ∇f = λ∇g

- Least Squares - χ² minimization:
  χ² = Σ_i^n (y_i − y(x_i; a_k))² / σ_i
  criterion: ∇χ² = 0
  a_next = a_cur − γ∇χ² = a_cur + γ Σ_i^n (y_i − y(x_i; a_k))/σ_i · ∂y/∂a_k
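To tie the gradient, gradient-descent, and least-squares items together, here is a small NumPy sketch (not from the original sheet) that fits a straight-line model y(x; a) = a_0 + a_1·x by descending ∇χ² as in the update rule above. It is only a sketch: the synthetic data, the uncertainties σ_i, the step size γ, and the iteration count are all arbitrary choices.

```python
import numpy as np

# Synthetic data for a straight-line model y(x; a) = a[0] + a[1]*x
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 5.0, 20)
sigma = 0.3 * np.ones_like(xs)                  # per-point uncertainties
ys = 1.5 + 2.0 * xs + rng.normal(0.0, 0.3, xs.size)

def model(x, a):
    return a[0] + a[1] * x

def chi2(a):
    return np.sum((ys - model(xs, a))**2 / sigma)

def grad_chi2(a):
    # dy/da_0 = 1, dy/da_1 = x, so grad chi2 = -2 * sum((y_i - y)/sigma_i * dy/da_k)
    r = (ys - model(xs, a)) / sigma
    return -2.0 * np.array([np.sum(r), np.sum(r * xs)])

a = np.zeros(2)            # initial parameter guess
gamma = 1e-3               # step size
for _ in range(2000):
    a = a - gamma * grad_chi2(a)   # s_{n+1} = s_n - gamma * grad f

print(a, chi2(a))          # fitted parameters should approach (1.5, 2.0)
```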
