Contents

1 Basics
  1.1 Trace and Determinants
  1.2 The Special Case 2x2

2 Derivatives
  2.1 Derivatives of a Determinant
  2.2 Derivatives of an Inverse
  2.3 Derivatives of Matrices, Vectors and Scalar Forms
  2.4 Derivatives of Traces
  2.5 Derivatives of Structured Matrices

3 Inverses
  3.1 Basic
  3.2 Exact Relations
  3.3 Implication on Inverses
  3.4 Approximations
  3.5 Generalized Inverse
  3.6 Pseudo Inverse

4 Complex Matrices
  4.1 Complex Derivatives

5 Decompositions
  5.1 Eigenvalues and Eigenvectors
  5.2 Singular Value Decomposition
  5.3 Triangular Decomposition

6 Statistics and Probability
  6.1 Definition of Moments
  6.2 Expectation of Linear Combinations
  6.3 Weighted Scalar Variable

7 Gaussians
  7.1 Basics
  7.2 Moments
  7.3 Miscellaneous
  7.4 Mixture of Gaussians

8 Special Matrices
  8.1 Units, Permutation and Shift
  8.2 The Single-Entry Matrix
  8.3 Symmetric and Antisymmetric
  8.4 Vandermonde Matrices
  8.5 Toeplitz Matrices
  8.6 The DFT Matrix
  8.7 Positive Definite and Semi-definite Matrices
  8.8 Block matrices

9 Functions and Operators
  9.1 Functions and Series
  9.2 Kronecker and Vec Operator
  9.3 Solutions to Systems of Equations
  9.4 Matrix Norms
  9.5 Rank
  9.6 Integral Involving Dirac Delta Functions
  9.7 Miscellaneous

A One-dimensional Results
  A.1 Gaussian
  A.2 One Dimensional Mixture of Gaussians

B Proofs and Details
  B.1 Misc Proofs
Notation and Nomenclature

A          Matrix
A_{ij}     Matrix indexed for some purpose
A_i        Matrix indexed for some purpose
A^{ij}     Matrix indexed for some purpose
A^n        Matrix indexed for some purpose or the n.th power of a square matrix
A^{-1}     The inverse matrix of the matrix A
A^+        The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}    The square root of a matrix (if unique), not elementwise
(A)_{ij}   The (i,j).th entry of the matrix A
A_{ij}     The (i,j).th entry of the matrix A
[A]_{ij}   The ij-submatrix, i.e. A with the i.th row and j.th column deleted
a          Vector
a_i        Vector indexed for some purpose
a_i        The i.th element of the vector a
a          Scalar
Re(z)      Real part of a scalar
Re(z)      Real part of a vector
Re(Z)      Real part of a matrix
Im(z)      Imaginary part of a scalar
Im(z)      Imaginary part of a vector
Im(Z)      Imaginary part of a matrix
det(A)     Determinant of A
Tr(A)      Trace of the matrix A
diag(A)    Diagonal matrix of the matrix A, i.e. (diag(A))_{ij} = δ_{ij} A_{ij}
vec(A)     The vector-version of the matrix A (see Sec. 9.2.2)
||A||      Matrix norm (subscript if any denotes what norm)
A^T        Transposed matrix
A^*        Complex conjugated matrix
A^H        Transposed and complex conjugated matrix (Hermitian)
A ∘ B      Hadamard (elementwise) product
A ⊗ B      Kronecker product
0          The null matrix, zero in all entries
I          The identity matrix
J^{ij}     The single-entry matrix, 1 at (i,j) and zero elsewhere
1 Basics

(AB)^{-1} = B^{-1} A^{-1}
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}
(A^T)^{-1} = (A^{-1})^T
(A + B)^T = A^T + B^T
(AB)^T = B^T A^T
(ABC...)^T = ...C^T B^T A^T
(A^H)^{-1} = (A^{-1})^H
(A + B)^H = A^H + B^H
(AB)^H = B^H A^H
(ABC...)^H = ...C^H B^H A^H

1.1 Trace and Determinants

Tr(A) = Σ_i A_{ii}
Tr(A) = Σ_i λ_i,  λ_i = eig(A)
Tr(A) = Tr(A^T)
Tr(AB) = Tr(BA)
Tr(A + B) = Tr(A) + Tr(B)
Tr(ABC) = Tr(BCA) = Tr(CAB)

det(A) = Π_i λ_i,  λ_i = eig(A)
det(AB) = det(A) det(B)
det(A^{-1}) = 1/det(A)
det(I + uv^T) = 1 + u^T v
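These identities are easy to verify numerically. A minimal NumPy check (an illustrative sketch; sizes and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
u, v = rng.standard_normal(4), rng.standard_normal(4)

assert np.isclose(np.trace(A @ B), np.trace(B @ A))
assert np.isclose(np.trace(A), np.linalg.eigvals(A).sum().real)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))
assert np.isclose(np.linalg.det(np.eye(4) + np.outer(u, v)), 1 + u @ v)
```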
1.2 The Special Case 2x2

Consider the matrix A,

A = [A_{11} A_{12}; A_{21} A_{22}].

Determinant and trace:

det(A) = A_{11} A_{22} − A_{12} A_{21}
Tr(A) = A_{11} + A_{22}

Eigenvalues:

λ² − λ Tr(A) + det(A) = 0

λ_1 = (Tr(A) + sqrt(Tr(A)² − 4 det(A))) / 2
λ_2 = (Tr(A) − sqrt(Tr(A)² − 4 det(A))) / 2

λ_1 + λ_2 = Tr(A)
λ_1 λ_2 = det(A)

Eigenvectors:

v_1 ∝ [A_{12}; λ_1 − A_{11}]
v_2 ∝ [A_{12}; λ_2 − A_{11}]

Inverse:

A^{-1} = (1/det(A)) [A_{22} −A_{12}; −A_{21} A_{11}]
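A minimal NumPy check of the closed-form 2x2 results (illustrative values, not from the original text):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
tr, det = np.trace(A), np.linalg.det(A)

# eigenvalues from the closed-form 2x2 expressions
disc = np.sqrt(tr**2 - 4 * det)
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2
assert np.allclose(sorted([lam1, lam2]), sorted(np.linalg.eigvals(A).real))
assert np.isclose(lam1 + lam2, tr) and np.isclose(lam1 * lam2, det)

# closed-form inverse
Ainv = np.array([[A[1, 1], -A[0, 1]],
                 [-A[1, 0], A[0, 0]]]) / det
assert np.allclose(Ainv, np.linalg.inv(A))
```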
2 Derivatives

In this section we consider derivatives with respect to scalars, vectors and matrices. The derivative of a vector with respect to a scalar and with respect to a vector is defined elementwise by

[∂x/∂y]_i = ∂x_i/∂y,    [∂x/∂y]_{ij} = ∂x_i/∂y_j.

Note that it is assumed that the matrix being differentiated with respect to has no special structure; for structured matrices, see Sec. 2.5. The following rules are general and very useful when deriving the differential of an expression ([13]):

∂A = 0    (A is a constant)    (1)
∂(αX) = α ∂X    (2)
∂(X + Y) = ∂X + ∂Y    (3)
∂(Tr(X)) = Tr(∂X)    (4)
∂(XY) = (∂X)Y + X(∂Y)    (5)
∂(X ∘ Y) = (∂X) ∘ Y + X ∘ (∂Y)    (6)
∂(X ⊗ Y) = (∂X) ⊗ Y + X ⊗ (∂Y)    (7)
∂(X^{-1}) = −X^{-1}(∂X)X^{-1}    (8)
∂(det(X)) = det(X) Tr(X^{-1} ∂X)    (9)
∂(ln(det(X))) = Tr(X^{-1} ∂X)    (10)
∂X^T = (∂X)^T    (11)
∂X^H = (∂X)^H    (12)
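Rule (9) can be checked by finite differences. A minimal NumPy sketch, assuming a small random perturbation of a well-conditioned X:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4))
dX = 1e-6 * rng.standard_normal((4, 4))   # small perturbation

# d(det(X)) = det(X) * Tr(X^{-1} dX), to first order in dX
lhs = np.linalg.det(X + dX) - np.linalg.det(X)
rhs = np.linalg.det(X) * np.trace(np.linalg.inv(X) @ dX)
assert np.isclose(lhs, rhs, rtol=1e-4)
```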
2.1 Derivatives of a Determinant

2.1.1 General form

∂det(Y)/∂x = det(Y) Tr(Y^{-1} ∂Y/∂x)

2.1.2 Linear forms

∂det(X)/∂X = det(X) (X^{-1})^T
∂det(AXB)/∂X = det(AXB) (X^{-1})^T = det(AXB) (X^T)^{-1}

2.1.3 Square forms

If X is square and invertible, then

∂det(X^T A X)/∂X = 2 det(X^T A X) X^{-T}.

If X is not square but A is symmetric, then

∂det(X^T A X)/∂X = 2 det(X^T A X) A X (X^T A X)^{-1}.

If X is not square and A is not symmetric, then

∂det(X^T A X)/∂X = det(X^T A X) ( A X (X^T A X)^{-1} + A^T X (X^T A^T X)^{-1} ).    (13)

2.1.4 Other nonlinear forms

∂ln det(X^T X)/∂X = 2 (X^+)^T
∂ln det(X^T X)/∂X^+ = −2 X^T
∂ln |det(X)|/∂X = (X^{-1})^T = (X^T)^{-1}
∂det(X^k)/∂X = k det(X^k) X^{-T}

2.2 Derivatives of an Inverse

From the differential rule (8) it follows that

∂Y^{-1}/∂x = −Y^{-1} (∂Y/∂x) Y^{-1},

and thereby

∂a^T X^{-1} b/∂X = −X^{-T} a b^T X^{-T}
∂det(X^{-1})/∂X = −det(X^{-1}) (X^{-1})^T
∂Tr(A X^{-1} B)/∂X = −(X^{-1} B A X^{-1})^T
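A corresponding numerical check of the inverse rule, again using a small random perturbation (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # well-conditioned
dY = 1e-6 * rng.standard_normal((4, 4))

# d(Y^{-1}) = -Y^{-1} dY Y^{-1}, to first order in dY
lhs = np.linalg.inv(Y + dY) - np.linalg.inv(Y)
rhs = -np.linalg.inv(Y) @ dY @ np.linalg.inv(Y)
assert np.allclose(lhs, rhs, atol=1e-10)
```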
2.3 Derivatives of Matrices, Vectors and Scalar Forms

2.3.1 First Order

∂x^T a/∂x = ∂a^T x/∂x = a
∂a^T X b/∂X = a b^T
∂a^T X^T b/∂X = b a^T
∂a^T X a/∂X = ∂a^T X^T a/∂X = a a^T
∂X/∂X_{ij} = J^{ij}
∂(XA)_{ij}/∂X_{mn} = δ_{im} (A)_{nj} = (J^{mn} A)_{ij}
∂(X^T A)_{ij}/∂X_{mn} = δ_{in} (A)_{mj} = (J^{nm} A)_{ij}

2.3.2 Second Order

∂/∂X_{ij} Σ_{klmn} X_{kl} X_{mn} = 2 Σ_{kl} X_{kl}
∂b^T X^T X c/∂X = X (b c^T + c b^T)
∂(Bx + b)^T C (Dx + d)/∂x = B^T C (Dx + d) + D^T C^T (Bx + b)
∂(X^T B X)_{kl}/∂X_{ij} = δ_{lj} (X^T B)_{ki} + δ_{kj} (B X)_{il}
∂(X^T B X)/∂X_{ij} = X^T B J^{ij} + J^{ji} B X,    (J^{ij})_{kl} = δ_{ik} δ_{jl}

See Sec. 8.2 for useful properties of the single-entry matrix J^{ij}.

∂x^T B x/∂x = (B + B^T) x
∂b^T X^T D X c/∂X = D^T X b c^T + D X c b^T
∂/∂X (Xb + c)^T D (Xb + c) = (D + D^T)(Xb + c) b^T

Assume W is symmetric, then

∂/∂s (x − s)^T W (x − s) = −2 W (x − s)
∂/∂x (x − As)^T W (x − As) = 2 W (x − As)
∂/∂s (x − As)^T W (x − As) = −2 A^T W (x − As)
∂/∂A (x − As)^T W (x − As) = −2 W (x − As) s^T

2.3.3 Higher order and non-linear

∂/∂X a^T X^n b = Σ_{r=0}^{n−1} (X^r)^T a b^T (X^{n−1−r})^T    (14)

∂/∂X a^T (X^n)^T X^n b = Σ_{r=0}^{n−1} [ X^{n−1−r} a b^T (X^n)^T X^r + (X^r)^T X^n a b^T (X^{n−1−r})^T ]    (15)

2.3.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian of f = x^T A x + b^T x:

∇_x f = ∂f/∂x = (A + A^T) x + b
∂²f/∂x∂x^T = A + A^T
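The gradient expression can be verified against central finite differences; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A, b = rng.standard_normal((n, n)), rng.standard_normal(n)
x = rng.standard_normal(n)

f = lambda x: x @ A @ x + b @ x
grad = (A + A.T) @ x + b        # closed-form gradient

eps, num = 1e-6, np.zeros(n)
for i in range(n):
    e = np.zeros(n); e[i] = eps
    num[i] = (f(x + e) - f(x - e)) / (2 * eps)
assert np.allclose(grad, num, atol=1e-5)
```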
2.4 Derivatives of Traces

2.4.1 First Order

∂Tr(X)/∂X = I    (16)
∂Tr(XA)/∂X = A^T
∂Tr(AXB)/∂X = A^T B^T
∂Tr(AX^T B)/∂X = B A
∂Tr(X^T A)/∂X = A
∂Tr(AX^T)/∂X = A

2.4.2 Second Order

∂Tr(X²)/∂X = 2 X^T
∂Tr(X^T B X)/∂X = B X + B^T X
∂Tr(X B X^T)/∂X = X B^T + X B
∂Tr(A X B X)/∂X = A^T X^T B^T + B^T X^T A^T
∂Tr(X^T X)/∂X = 2 X
∂Tr(B X X^T)/∂X = (B + B^T) X
∂Tr(B^T X^T C X B)/∂X = C^T X B B^T + C X B B^T
∂Tr(X^T B X C)/∂X = B X C + B^T X C^T
∂Tr(A X B X^T C)/∂X = A^T C^T X B^T + C A X B

See [7].

2.4.3 Higher Order

∂Tr(X^k)/∂X = k (X^{k−1})^T
∂Tr(A X^k)/∂X = Σ_{r=0}^{k−1} (X^r A X^{k−r−1})^T
∂Tr(B^T X^T C X X^T C X B)/∂X = C X X^T C X B B^T + C^T X B B^T X^T C^T X + C X B B^T X^T C X + C^T X X^T C^T X B B^T

2.4.4 Other

∂Tr(A X^{-1} B)/∂X = −(X^{-1} B A X^{-1})^T
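A representative check of one of the first-order rules, ∂Tr(XA)/∂X = A^T, by entrywise finite differences (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A, X = rng.standard_normal((n, n)), rng.standard_normal((n, n))

eps, num = 1e-6, np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        num[i, j] = (np.trace((X + E) @ A) - np.trace((X - E) @ A)) / (2 * eps)
assert np.allclose(num, A.T, atol=1e-6)
```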
2.5 Derivatives of Structured Matrices

Assume that the matrix A has some structure, i.e. it is symmetric, Toeplitz, etc. In that case the derivatives of the previous sections do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A):

df/dA_{ij} = Σ_{kl} (∂f/∂A_{kl}) (∂A_{kl}/∂A_{ij}) = Tr[ (∂f/∂A)^T (∂A/∂A_{ij}) ]    (17)

The chain rule can then be written the following way:

∂f/∂x = Σ_{k=1}^{M} Σ_{l=1}^{N} (∂f/∂A_{kl}) (∂A_{kl}/∂x)    (18)

2.5.1 Symmetric

If A is symmetric, then A = A^T, and the rule above yields

∂f/∂A = [∂f/∂A] + [∂f/∂A]^T − [∂f/∂A] ∘ I,    (19)

i.e. the sum of the unconstrained derivative and its transpose, with the diagonal counted only once. For example, for symmetric X,

∂Tr(AX)/∂X = A + A^T − (A ∘ I)    (20)
∂det(X)/∂X = det(X) ( 2 X^{-1} − (X^{-1} ∘ I) )    (21)
∂ln det(X)/∂X = 2 X^{-1} − (X^{-1} ∘ I)    (22)

2.5.2 Diagonal

If X is diagonal, then

∂Tr(AX)/∂X = A ∘ I    (23)

2.5.3 Toeplitz

Like symmetric and diagonal matrices, Toeplitz matrices have a special structure which should be taken into account when the derivative with respect to a matrix with Toeplitz structure is formed. The unconstrained result is

∂Tr(AT)/∂T = ∂Tr(TA)/∂T =
[ Tr(A)                      Tr([A^T]_{n1})    ⋯                            A_{n1}
  Tr([A^T]_{1n})             Tr(A)             ⋱                            ⋮
  Tr([[A^T]_{1n}]_{2,n−1})   Tr([A^T]_{1n})    ⋱                            Tr([[A^T]_{n1}]_{n−1,2})
  ⋮                          ⋱                 ⋱                            Tr([A^T]_{n1})
  A_{1n}                     ⋯                 Tr([[A^T]_{1n}]_{2,n−1})     Tr([A^T]_{1n})   Tr(A) ]
≡ α(A)    (24)

As can be seen, the derivative α(A) also has Toeplitz structure. Each entry on the main diagonal is the sum of the entries on the main diagonal of A; the entries on the diagonals next to the main diagonal are the sums along the corresponding diagonals of A^T, and so on. This result is only valid for the unconstrained Toeplitz matrix. If the Toeplitz matrix is also symmetric, the same derivative yields

∂Tr(AT)/∂T = ∂Tr(TA)/∂T = α(A) + α(A)^T − α(A) ∘ I    (25)
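The symmetric-structure rule (20) can be checked numerically by treating X_{ij} and X_{ji} as the same parameter, which is the convention behind (19); a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n)); X = X + X.T    # symmetric

G = A + A.T - np.diag(np.diag(A))               # claimed structured gradient

eps, num = 1e-6, np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        E[j, i] = eps   # off-diagonal pair moves together; diagonal moves once
        num[i, j] = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * eps)
assert np.allclose(num, G, atol=1e-6)
```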
3 Inverses

3.1 Basic

3.1.1 Definition

The inverse A^{-1} of a matrix A ∈ R^{n×n} is defined such that

A A^{-1} = A^{-1} A = I,    (26)

where I is the n×n identity matrix.

3.1.2 Cofactors and Adjoint

The submatrix [A]_{ij} of A is the matrix A with the i.th row and j.th column deleted. The (i,j).th cofactor is defined as

cof(A, i, j) = (−1)^{i+j} det([A]_{ij}).    (27)

The matrix of cofactors is

cof(A) = [ cof(A,1,1) ⋯ cof(A,1,n) ; ⋮ cof(A,i,j) ⋮ ; cof(A,n,1) ⋯ cof(A,n,n) ],    (28)

and the adjoint matrix is its transpose,

adj(A) = (cof(A))^T.    (29)

3.1.3 Determinant

The determinant can be expanded in cofactors, e.g. along the first row:

det(A) = Σ_{j=1}^{n} (−1)^{j+1} A_{1j} det([A]_{1j}) = Σ_{j=1}^{n} A_{1j} cof(A, 1, j).    (30)

3.1.4 Construction

The inverse matrix can be constructed from the adjoint matrix by

A^{-1} = (1/det(A)) adj(A).    (31)
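Equations (27)-(31) translate directly into code; a minimal NumPy sketch of the adjoint construction:

```python
import numpy as np

def cof(A, i, j):
    """Cofactor: (-1)^(i+j) times det of A with row i and column j deleted."""
    M = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(M)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

adj = np.array([[cof(A, i, j) for j in range(4)] for i in range(4)]).T
assert np.allclose(np.linalg.inv(A), adj / np.linalg.det(A))
```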
3.1.5 Condition number

The condition number c(A) of a matrix is the ratio between the largest and the smallest singular value of the matrix (see Section 5.2 on singular values),

c(A) = d_+ / d_−.

The condition number measures how close a matrix is to being singular: if the condition number is large, the matrix is nearly singular. The condition number can also be estimated from matrix norms. Here

c(A) = ||A|| · ||A^{-1}||,    (32)

where || · || is a norm such as e.g. the 1-norm, the 2-norm, the ∞-norm or the Frobenius norm (see Sec. 9.4 for more on matrix norms).

3.2 Exact Relations

3.2.1 The Woodbury identity

(A + C B C^T)^{-1} = A^{-1} − A^{-1} C (B^{-1} + C^T A^{-1} C)^{-1} C^T A^{-1}
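A numerical check of the Woodbury identity; diagonally dominant A and B are used only so that all the inverses exist (a NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((k, k)) + k * np.eye(k)
C = rng.standard_normal((n, k))

Ai, Bi = np.linalg.inv(A), np.linalg.inv(B)
lhs = np.linalg.inv(A + C @ B @ C.T)
rhs = Ai - Ai @ C @ np.linalg.inv(Bi + C.T @ Ai @ C) @ C.T @ Ai
assert np.allclose(lhs, rhs)
```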
3.3 Implication on Inverses

If

(A + B)^{-1} = A^{-1} + B^{-1},

then

A B^{-1} A = B A^{-1} B.

See [17].

3.3.1 A PosDef identity

Assume P and R to be positive definite and invertible, then

(P^{-1} + B^T R^{-1} B)^{-1} B^T R^{-1} = P B^T (B P B^T + R)^{-1}.
3.4 Approximations

(I + A)^{-1} = I − A + A² − A³ + ...

A − A(I + A)^{-1}A ≅ I − A^{-1}    (if A is large and symmetric)

If σ² is small, then

(Q + σ² M)^{-1} ≅ Q^{-1} − σ² Q^{-1} M Q^{-1}.
3.5 Generalized Inverse

3.5.1 Definition

A generalized inverse matrix of the matrix A is any matrix A^− such that (see [18])

A A^− A = A.

The matrix A^− is not unique.

3.6 Pseudo Inverse

3.6.1 Definition

The pseudo inverse (or Moore-Penrose inverse) A^+ of a matrix A is the matrix that fulfils

I.   A A^+ A = A
II.  A^+ A A^+ = A^+
III. A A^+ is symmetric
IV.  A^+ A is symmetric

The matrix A^+ is unique and does always exist.

3.6.2 Properties

(A^+)^+ = A
(A^T)^+ = (A^+)^T
(cA)^+ = (1/c) A^+
(A^T A)^+ = A^+ (A^T)^+
(A A^T)^+ = (A^T)^+ A^+

Assume A to have full rank, then

(A A^+)(A A^+) = A A^+
(A^+ A)(A^+ A) = A^+ A
Tr(A A^+) = rank(A A^+)    (See [18])
Tr(A^+ A) = rank(A^+ A)    (See [18])

3.6.3 Construction

Assume that A has full rank, then

A n×n, square:  rank(A) = n  ⇒  A^+ = A^{-1}
A n×m, broad:   rank(A) = n  ⇒  A^+ = A^T (A A^T)^{-1}
A n×m, tall:    rank(A) = m  ⇒  A^+ = (A^T A)^{-1} A^T

Assume A does not have full rank, i.e. A is n×m and rank(A) = r < min(n, m). The pseudo inverse A^+ can be constructed from the singular value decomposition A = U D V^T by

A^+ = V D^+ U^T.

A different way is this: there always exist two matrices C (n×r) and D (r×m) of rank r such that A = C D. Using these matrices it holds that

A^+ = D^T (D D^T)^{-1} (C^T C)^{-1} C^T.

See [3].
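The construction rules can be exercised numerically; a NumPy sketch of the tall case and of the SVD-based construction for a rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))          # tall, full column rank

Aplus = np.linalg.inv(A.T @ A) @ A.T     # tall case: A+ = (A^T A)^{-1} A^T
assert np.allclose(Aplus, np.linalg.pinv(A))

B = A @ A.T                              # 6x6, rank 3 (rank deficient)
U, d, Vt = np.linalg.svd(B)
dplus = np.zeros_like(d)
nz = d > 1e-10
dplus[nz] = 1.0 / d[nz]
Bplus = Vt.T @ np.diag(dplus) @ U.T      # A+ = V D+ U^T
assert np.allclose(Bplus, np.linalg.pinv(B))
```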
4 Complex Matrices

4.1 Complex Derivatives

In order to differentiate an expression f(z) with respect to a complex z, the Cauchy-Riemann equations have to be satisfied ([7]):

df(z)/dz = ∂Re(f(z))/∂Re(z) + i ∂Im(f(z))/∂Re(z)    (33)

and

df(z)/dz = −i ∂Re(f(z))/∂Im(z) + ∂Im(f(z))/∂Im(z),    (34)

or in a more compact form:

∂f(z)/∂Im(z) = i ∂f(z)/∂Re(z).    (35)

A complex function that satisfies the Cauchy-Riemann equations for points in a region R is said to be analytic in this region. In general, expressions involving a complex conjugate or a conjugate transpose do not satisfy the Cauchy-Riemann equations. To handle this, a more generalized definition of complex derivative is used ([16], [6]):

Generalized Complex Derivative:

df(z)/dz = (1/2) ( ∂f(z)/∂Re(z) − i ∂f(z)/∂Im(z) ).    (36)

Conjugate Complex Derivative:

df(z)/dz* = (1/2) ( ∂f(z)/∂Re(z) + i ∂f(z)/∂Im(z) ).    (37)

The Generalized Complex Derivative equals the normal derivative when f is analytic; for a non-analytic function such as f(z) = z*, it equals zero. The Conjugate Complex Derivative equals zero when f is analytic; it has e.g. been used by [6] when deriving a complex gradient.

Complex Gradient Vector: if f is a real function of a complex vector z, then the complex gradient vector is given by ([14, p. 798])

∇f(z) = 2 df(z)/dz* = ∂f(z)/∂Re(z) + i ∂f(z)/∂Im(z).    (39)

Complex Gradient Matrix: if f is a real function of a complex matrix Z, then the complex gradient matrix is given by ([2])

∇f(Z) = 2 df(Z)/dZ* = ∂f(Z)/∂Re(Z) + i ∂f(Z)/∂Im(Z).    (40)

4.1.1 The Chain Rule

The chain rule is a little more complicated when the function of a complex variable u = f(x) is non-analytic. For a non-analytic function, the following chain rule can be applied ([7]):

∂g(u)/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂u*)(∂u*/∂x).    (41)

Notice, if the function is analytic, the second term reduces to zero, and the expression reduces to the normal well-known chain rule. For the matrix derivative of a scalar function g(U), the chain rule can be written the following way:

∂g(U)/∂X = ∂Tr((∂g(U)/∂U)^T U)/∂X + ∂Tr((∂g(U)/∂U*)^T U*)/∂X.    (42)
4.1.2 Complex Derivatives of Traces

If the derivatives involve complex numbers, the conjugate transpose is often involved. The most useful way to show a complex derivative is to show the derivative with respect to the real and the imaginary part separately. An easy example is:

∂Tr(X*)/∂Re(X) = ∂Tr(X^H)/∂Re(X) = I    (43)
i ∂Tr(X*)/∂Im(X) = i ∂Tr(X^H)/∂Im(X) = I    (44)

Since the two results have the same sign, the conjugate complex derivative (37) should be used.

∂Tr(X)/∂Re(X) = ∂Tr(X^T)/∂Re(X) = I    (45)
i ∂Tr(X)/∂Im(X) = i ∂Tr(X^T)/∂Im(X) = −I    (46)

Here, the two results have different signs, and the generalized complex derivative (36) should be used. Hereby, it can be seen that (16) holds even if X is a complex matrix.

∂Tr(A X^H)/∂Re(X) = A    (47)
i ∂Tr(A X^H)/∂Im(X) = A    (48)
∂Tr(A X*)/∂Re(X) = A^T    (49)
i ∂Tr(A X*)/∂Im(X) = A^T    (50)

∂Tr(X X^H)/∂Re(X) = ∂Tr(X^H X)/∂Re(X) = 2 Re(X)    (51)
i ∂Tr(X X^H)/∂Im(X) = i ∂Tr(X^H X)/∂Im(X) = i 2 Im(X)    (52)

By inserting (51) and (52) in (36) and (37), it can be seen that

∂Tr(X X^H)/∂X = X*    (53)
∂Tr(X X^H)/∂X* = X    (54)

Since the function Tr(X X^H) is a real function of the complex matrix X, the complex gradient matrix (40) is given by

∇Tr(X X^H) = 2 ∂Tr(X X^H)/∂X* = 2X.    (55)
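Results (53)-(54) can be reproduced by differentiating numerically with respect to the real and imaginary parts; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

f = lambda X: np.trace(X @ X.conj().T).real   # real-valued function of X
eps = 1e-6
dRe, dIm = np.zeros((n, n)), np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n)); E[i, j] = eps
        dRe[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
        dIm[i, j] = (f(X + 1j * E) - f(X - 1j * E)) / (2 * eps)

# conjugate complex derivative (37): 0.5 * (d/dRe + i d/dIm) = X, as in (54)
assert np.allclose(0.5 * (dRe + 1j * dIm), X, atol=1e-6)
```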
4.1.3 Complex Derivative Involving Determinants

Here, a calculation example is provided. The objective is to find the derivative of det(X^H A X) with respect to X ∈ C^{m×n}. The derivative is found with respect to the real part and the imaginary part of X by use of (9) and (5) (see App. B.1.2 for details):

∂det(X^H A X)/∂X = (1/2) [ ∂det(X^H A X)/∂Re(X) − i ∂det(X^H A X)/∂Im(X) ]
                 = det(X^H A X) ( (X^H A X)^{-1} X^H A )^T    (56)

and the complex conjugate derivative yields

∂det(X^H A X)/∂X* = (1/2) [ ∂det(X^H A X)/∂Re(X) + i ∂det(X^H A X)/∂Im(X) ]
                  = det(X^H A X) A X (X^H A X)^{-1}.    (57)
5 Decompositions

5.1 Eigenvalues and Eigenvectors

5.1.1 Definition

The eigenvectors v and eigenvalues λ are defined by

A V = V D,    (D)_{ij} = δ_{ij} λ_i,

where the columns of V are the eigenvectors and D is a diagonal matrix holding the eigenvalues.

5.1.2 General Properties

eig(AB) = eig(BA)
A is n×m  ⇒  at most min(n, m) distinct eigenvalues λ_i
rank(A) = r  ⇒  at most r non-zero eigenvalues λ_i

5.1.3 Symmetric

Assume A is symmetric, then

V V^T = I    (i.e. V is orthogonal)
λ_i ∈ R     (i.e. λ_i is real)
Tr(A^p) = Σ_i λ_i^p
eig(I + cA) = 1 + c λ_i
eig(A − cI) = λ_i − c
eig(A^{-1}) = λ_i^{-1}

5.2 Singular Value Decomposition

Any n×m matrix A can be written as

A = U D V^T,    (58)

where

U = eigenvectors of A A^T    (n×n)
D = sqrt(diag(eig(A A^T)))   (n×m)
V = eigenvectors of A^T A    (m×m)

5.2.1 Symmetric Square decomposed into squares

Assume A to be n×n and symmetric. Then A = V D V^T, where D is diagonal with the eigenvalues of A, and V is orthogonal and holds the eigenvectors of A.

5.2.2 Square decomposed into squares

Assume A ∈ R^{n×n}. Then A = V D U^T, where D is diagonal with the square roots of the eigenvalues of A A^T, V holds the eigenvectors of A A^T, and U^T holds the eigenvectors of A^T A.

5.2.3 Rectangular decomposition

Assume A is n×m. Then A = V D U^T, where V (n×n) holds the eigenvectors of A A^T, D (n×m) is diagonal with the square roots of the eigenvalues of A A^T, and U^T (m×m) holds the eigenvectors of A^T A.

5.3 Triangular Decomposition

5.3.1 Cholesky-decomposition

Assume A is a symmetric positive definite matrix. Then

A = B^T B,

where B is a unique upper triangular matrix.
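A minimal NumPy illustration of (58):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 3))

U, d, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(d) @ Vt)

# singular values are the square roots of the eigenvalues of A^T A
assert np.allclose(np.sort(d**2), np.sort(np.linalg.eigvalsh(A.T @ A)))
```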
6 Statistics and Probability

6.1 Definition of Moments

Assume x ∈ R^{n×1} to be a stochastic vector and let ⟨·⟩ denote expectation.

6.1.1 Mean

The vector of means m is defined by

m_i = ⟨x_i⟩.

6.1.2 Covariance

The matrix of covariance M is defined by

M_{ij} = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)⟩,

or alternatively as M = ⟨(x − m)(x − m)^T⟩.

6.1.3 Third moments

The third centralized moments are collected in the matrix M_3, defined via

m^{(3)}_{ijk} = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)(x_k − ⟨x_k⟩)⟩

as

M_3 = [ m^{(3)}_{::1}  m^{(3)}_{::2}  ...  m^{(3)}_{::n} ],

where ':' denotes all elements within the given index. M_3 can alternatively be expressed as

M_3 = ⟨(x − m)(x − m)^T ⊗ (x − m)^T⟩.

6.1.4 Fourth moments

The fourth centralized moments are collected in the matrix M_4, defined via

m^{(4)}_{ijkl} = ⟨(x_i − ⟨x_i⟩)(x_j − ⟨x_j⟩)(x_k − ⟨x_k⟩)(x_l − ⟨x_l⟩)⟩

as

M_4 = [ m^{(4)}_{::11} m^{(4)}_{::21} ... m^{(4)}_{::n1} | m^{(4)}_{::12} m^{(4)}_{::22} ... m^{(4)}_{::n2} | ... | m^{(4)}_{::1n} m^{(4)}_{::2n} ... m^{(4)}_{::nn} ],

or alternatively as

M_4 = ⟨(x − m)(x − m)^T ⊗ (x − m)^T ⊗ (x − m)^T⟩.

6.2 Expectation of Linear Combinations

6.2.1 Linear Forms

Assume x to be a stochastic vector with mean m, then

E[Ax + b] = A m + b
E[Ax] = A m
E[x + b] = m + b

6.2.2 Quadratic Forms

Assume x to be a stochastic vector with mean m and covariance M, then

E[(Ax + a)(Bx + b)^T] = A M B^T + (A m + a)(B m + b)^T
E[x x^T] = M + m m^T
E[(x + a)(x + a)^T] = M + (m + a)(m + a)^T
E[(Ax + a)^T (Bx + b)] = Tr(A M B^T) + (A m + a)^T (B m + b)
E[x^T A x] = Tr(A M) + m^T A m
E[(x + a)^T (x + a)] = Tr(M) + (m + a)^T (m + a)

See [7].
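The identity E[x^T A x] = Tr(AM) + m^T A m can be checked by Monte Carlo sampling; a NumPy sketch (A is made positive definite only so the target value is comfortably non-zero):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 3
m = rng.standard_normal(n)
L = rng.standard_normal((n, n)); M = L @ L.T        # a valid covariance
A0 = rng.standard_normal((n, n)); A = A0 @ A0.T

x = rng.multivariate_normal(m, M, size=500_000)
emp = np.einsum('bi,ij,bj->b', x, A, x).mean()      # mean of x^T A x over samples
assert np.isclose(emp, np.trace(A @ M) + m @ A @ m, rtol=2e-2)
```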
6.2.3 Cubic Forms

Assume x to be a stochastic vector with independent coordinates, mean m, covariance M and central moments v_3 = E[(x − m)³] (elementwise). Then (see [7])

E[(Ax + a)(Bx + b)^T (Cx + c)] = A diag(B^T C) v_3 + Tr(B M C^T)(A m + a) + A M C^T (B m + b) + (A M B^T + (A m + a)(B m + b)^T)(C m + c)

E[x x^T x] = v_3 + 2 M m + (Tr(M) + m^T m) m

E[(Ax + a)(Ax + a)^T (Ax + a)] = A diag(A^T A) v_3 + [ 2 A M A^T + (A m + a)(A m + a)^T ](A m + a) + Tr(A M A^T)(A m + a)

6.3 Weighted Scalar Variable

Assume x ∈ R^{n×1} to be a stochastic vector with mean m, covariance M_2 = M and higher-order moment matrices M_3, M_4 as defined in Sec. 6.1, and let y = w^T x be a weighted scalar combination. Then

⟨y⟩ = w^T m
⟨(y − ⟨y⟩)²⟩ = w^T M_2 w
⟨(y − ⟨y⟩)³⟩ = w^T M_3 (w ⊗ w)
⟨(y − ⟨y⟩)⁴⟩ = w^T M_4 (w ⊗ w ⊗ w)
7 Gaussians

7.1 Basics

7.1.1 Density and normalization

The density of x ~ N(m, Σ) is

p(x) = (1/sqrt(det(2πΣ))) exp[ −(1/2)(x − m)^T Σ^{-1} (x − m) ].

The derivatives of the density are

∂p(x)/∂x = −p(x) Σ^{-1} (x − m)
∂²p(x)/∂x∂x^T = p(x) ( Σ^{-1}(x − m)(x − m)^T Σ^{-1} − Σ^{-1} )

7.1.2 Marginal Distribution

Assume x ~ N_x(μ, Σ) where

x = [x_a; x_b],    μ = [μ_a; μ_b],    Σ = [Σ_a Σ_c; Σ_c^T Σ_b],

then

p(x_a) = N_{x_a}(μ_a, Σ_a)
p(x_b) = N_{x_b}(μ_b, Σ_b)

7.1.3 Conditional Distribution

Assume x ~ N_x(μ, Σ) partitioned as above, then

p(x_a | x_b) = N_{x_a}(μ̂_a, Σ̂_a),    μ̂_a = μ_a + Σ_c Σ_b^{-1} (x_b − μ_b),    Σ̂_a = Σ_a − Σ_c Σ_b^{-1} Σ_c^T

p(x_b | x_a) = N_{x_b}(μ̂_b, Σ̂_b),    μ̂_b = μ_b + Σ_c^T Σ_a^{-1} (x_a − μ_a),    Σ̂_b = Σ_b − Σ_c^T Σ_a^{-1} Σ_c
7.1.4 Linear combination

Assume x ~ N(m_x, Σ_x) and y ~ N(m_y, Σ_y) are independent, then

z = A x + B y + c  ~  N( A m_x + B m_y + c,  A Σ_x A^T + B Σ_y B^T ).

7.1.5 Rearranging Means

N_{Ax}[m, Σ] = ( sqrt(det(2π (A^T Σ^{-1} A)^{-1})) / sqrt(det(2πΣ)) ) · N_x[ A^{-1} m, (A^T Σ^{-1} A)^{-1} ]

7.1.6 Rearranging into squared form

If A is symmetric, then

−(1/2) x^T A x + b^T x = −(1/2)(x − A^{-1} b)^T A (x − A^{-1} b) + (1/2) b^T A^{-1} b

−(1/2) Tr(X^T A X) + Tr(B^T X) = −(1/2) Tr[(X − A^{-1} B)^T A (X − A^{-1} B)] + (1/2) Tr(B^T A^{-1} B)

7.1.7 Sum of two squared forms

In vector formulation (assuming Σ_1, Σ_2 are symmetric):

−(1/2)(x − m_1)^T Σ_1^{-1} (x − m_1) − (1/2)(x − m_2)^T Σ_2^{-1} (x − m_2) = −(1/2)(x − m_c)^T Σ_c^{-1} (x − m_c) + C

Σ_c^{-1} = Σ_1^{-1} + Σ_2^{-1}
m_c = (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} m_1 + Σ_2^{-1} m_2)
C = (1/2)(m_1^T Σ_1^{-1} + m_2^T Σ_2^{-1})(Σ_1^{-1} + Σ_2^{-1})^{-1}(Σ_1^{-1} m_1 + Σ_2^{-1} m_2) − (1/2)(m_1^T Σ_1^{-1} m_1 + m_2^T Σ_2^{-1} m_2)
In a trace formulation (assuming Σ_1, Σ_2 are symmetric):

−(1/2) Tr((X − M_1)^T Σ_1^{-1} (X − M_1)) − (1/2) Tr((X − M_2)^T Σ_2^{-1} (X − M_2)) = −(1/2) Tr[(X − M_c)^T Σ_c^{-1} (X − M_c)] + C

Σ_c^{-1} = Σ_1^{-1} + Σ_2^{-1}
M_c = (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} M_1 + Σ_2^{-1} M_2)
C = (1/2) Tr[(Σ_1^{-1} M_1 + Σ_2^{-1} M_2)^T (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} M_1 + Σ_2^{-1} M_2)] − (1/2) Tr(M_1^T Σ_1^{-1} M_1 + M_2^T Σ_2^{-1} M_2)

7.1.8 Product of gaussian densities

Let N_x(m, Σ) denote a density of x, then

N_x(m_1, Σ_1) · N_x(m_2, Σ_2) = c_c N_x(m_c, Σ_c),

where

c_c = N_{m_1}(m_2, (Σ_1 + Σ_2)) = (1/sqrt(det(2π(Σ_1 + Σ_2)))) exp[ −(1/2)(m_1 − m_2)^T (Σ_1 + Σ_2)^{-1} (m_1 − m_2) ]
m_c = (Σ_1^{-1} + Σ_2^{-1})^{-1} (Σ_1^{-1} m_1 + Σ_2^{-1} m_2)
Σ_c = (Σ_1^{-1} + Σ_2^{-1})^{-1}
7.2 Moments

7.2.1 Mean and covariance of linear forms

Assume x ~ N(m, Σ), then

E[x] = m,    Cov(x) = Σ.

7.2.2 Mean and variance of square forms

Assume x ~ N(m, Σ), then (see [7])

E[x x^T] = Σ + m m^T
E[x^T A x] = Tr(AΣ) + m^T A m
E[(x − m')^T A (x − m')] = (m − m')^T A (m − m') + Tr(AΣ)

If Σ = σ² I and A is symmetric, then

Var[x^T A x] = 2 σ⁴ Tr(A²) + 4 σ² m^T A² m.

7.2.3 Cubic forms

E[x b^T x x^T] = m b^T (Σ + m m^T) + (Σ + m m^T) b m^T + b^T m (Σ − m m^T)

7.2.4 Mean of Quartic Forms

E[x x^T x x^T] = 2(Σ + m m^T)² + m^T m (Σ − m m^T) + Tr(Σ)(Σ + m m^T)

E[x x^T A x x^T] = (Σ + m m^T)(A + A^T)(Σ + m m^T) + m^T A m (Σ − m m^T) + Tr(AΣ)(Σ + m m^T)

E[x^T x x^T x] = 2 Tr(Σ²) + 4 m^T Σ m + (Tr(Σ) + m^T m)²

E[x^T A x x^T B x] = Tr[AΣ(B + B^T)Σ] + m^T (A + A^T) Σ (B + B^T) m + (Tr(AΣ) + m^T A m)(Tr(BΣ) + m^T B m)

E[a^T x b^T x c^T x d^T x] = (a^T (Σ + m m^T) b)(c^T (Σ + m m^T) d) + (a^T (Σ + m m^T) c)(b^T (Σ + m m^T) d) + (a^T (Σ + m m^T) d)(b^T (Σ + m m^T) c) − 2 a^T m b^T m c^T m d^T m
7.2.5 Moments of a mixture

For the mixture density p(x) = Σ_k ρ_k N(m_k, Σ_k) (cf. Sec. 7.4),

E[x] = Σ_k ρ_k m_k
Cov(x) = Σ_k Σ_{k'} ρ_k ρ_{k'} ( Σ_k + m_k m_k^T − m_k m_{k'}^T )

7.3 Miscellaneous

7.3.1 Whitening

Assume x ~ N(m, Σ), then

z = Σ^{-1/2} (x − m) ~ N(0, I).

Conversely, if z ~ N(0, I), then x = Σ^{1/2} z + m ~ N(m, Σ). Here Σ^{1/2} denotes the matrix square root.

7.3.2 Entropy

The differential entropy of a multivariate Gaussian in n dimensions is

H(x) = −∫ N(m, Σ) ln N(m, Σ) dx = ln sqrt(det(2πΣ)) + n/2.
7.4 Mixture of Gaussians

7.4.1 Density

The variable x is distributed as a mixture of Gaussians if it has the density

p(x) = Σ_{k=1}^{K} ρ_k (1/sqrt(det(2πΣ_k))) exp[ −(1/2)(x − m_k)^T Σ_k^{-1} (x − m_k) ],

where the ρ_k sum to 1 and the Σ_k are all positive definite.

7.4.2 Derivatives

Defining p(s) = Σ_k ρ_k N_s(μ_k, Σ_k), one gets

∂ln p(s)/∂ρ_j = [ ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ] · ∂ln[ρ_j N_s(μ_j, Σ_j)]/∂ρ_j
             = [ ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ] · (1/ρ_j)

∂ln p(s)/∂μ_j = [ ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ] · ∂ln[ρ_j N_s(μ_j, Σ_j)]/∂μ_j
             = [ ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ] · Σ_j^{-1} (s − μ_j)

∂ln p(s)/∂Σ_j = [ ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ] · ∂ln[ρ_j N_s(μ_j, Σ_j)]/∂Σ_j
             = [ ρ_j N_s(μ_j, Σ_j) / Σ_k ρ_k N_s(μ_k, Σ_k) ] · (1/2) [ −Σ_j^{-1} + Σ_j^{-1} (s − μ_j)(s − μ_j)^T Σ_j^{-1} ]
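The derivative with respect to μ_j can be verified by finite differences on a small two-component example; a NumPy sketch (component values are arbitrary):

```python
import numpy as np

def gauss(s, mu, S):
    """Multivariate normal density N_s(mu, S)."""
    d = s - mu
    return np.exp(-0.5 * d @ np.linalg.solve(S, d)) / np.sqrt(np.linalg.det(2 * np.pi * S))

rho = [0.3, 0.7]
mus = [np.array([0.0, 0.0]), np.array([1.0, -1.0])]
Ss = [np.eye(2), np.array([[2.0, 0.4], [0.4, 1.0]])]
s = np.array([0.5, 0.2])

p = lambda mu0: rho[0] * gauss(s, mu0, Ss[0]) + rho[1] * gauss(s, mus[1], Ss[1])

# closed form: d ln p / d mu_j = (rho_j N_j / sum_k rho_k N_k) * Sigma_j^{-1} (s - mu_j)
resp0 = rho[0] * gauss(s, mus[0], Ss[0]) / p(mus[0])
grad = resp0 * np.linalg.solve(Ss[0], s - mus[0])

eps, num = 1e-6, np.zeros(2)
for i in range(2):
    e = np.zeros(2); e[i] = eps
    num[i] = (np.log(p(mus[0] + e)) - np.log(p(mus[0] - e))) / (2 * eps)
assert np.allclose(grad, num, atol=1e-7)
```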
8 Special Matrices

8.1 Units, Permutation and Shift

8.1.1 Unit vector

Let e_i ∈ R^{n×1} be the i.th unit vector, i.e. the vector which is zero in all entries except the i.th, at which it is 1.

8.1.2 Rows and Columns

The i.th row of A is e_i^T A, and the j.th column of A is A e_j.

8.1.3 Permutations

Let P be some permutation matrix, e.g.

P = [0 1 0; 1 0 0; 0 0 1] = [e_2 e_1 e_3] = [e_2^T; e_1^T; e_3^T].

For permutation matrices it holds that

P P^T = I,

and that

A P = [A e_2  A e_1  A e_3],    P A = [e_2^T A; e_1^T A; e_3^T A].

That is, the first is a matrix which has the columns of A but in permuted sequence, and the second is a matrix which has the rows of A in permuted sequence.

8.1.4 Translation, Shift or Lag Operators

Let L denote the lag (or shift) operator, here defined on a 4x4 example by

L = [0 0 0 0; 1 0 0 0; 0 1 0 0; 0 0 1 0],

i.e. a matrix of zeros with ones on the sub-diagonal, (L)_{ij} = δ_{i,j+1}. With some signal x_t for t = 1, ..., N, the n.th power of the lag operator shifts the indices, i.e.

(L^n x)_t = 0 for t = 1, ..., n,    (L^n x)_t = x_{t−n} for t = n+1, ..., N.
A related but slightly different matrix is the 'recurrent shifted' operator, defined on a 4x4 example by

L̂ = [0 0 0 1; 1 0 0 0; 0 1 0 0; 0 0 1 0],

i.e. a matrix defined by (L̂)_{ij} = δ_{i,j+1} + δ_{i,1} δ_{j,dim(L)}. On a signal x it has the effect

(L̂^n x)_t = x_{t'},    t' = [(t − n) mod N] + 1.

That is, L̂ is like the shift operator L except that it 'wraps' the signal as if it was periodic and shifted (substituting the zeros with the rear end of the signal). Note that L̂ is invertible and orthogonal, i.e.

L̂^{-1} = L̂^T.
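A small NumPy illustration of permutation and lag matrices (a 4x4 example, swapping the first two indices):

```python
import numpy as np

n = 4
P = np.eye(n)[[1, 0, 2, 3]]       # permutation matrix: rows e2, e1, e3, e4
A = np.arange(16.0).reshape(n, n)

assert np.allclose(P @ P.T, np.eye(n))           # permutation matrices are orthogonal
assert np.allclose(P @ A, A[[1, 0, 2, 3], :])    # PA permutes the rows
assert np.allclose(A @ P, A[:, [1, 0, 2, 3]])    # AP permutes the columns (a swap is its own inverse)

# lag matrix: ones on the sub-diagonal
L = np.diag(np.ones(n - 1), k=-1)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(L @ x, [0.0, 1.0, 2.0, 3.0])  # (Lx)_t = x_{t-1}, zero-padded
```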
8.2 The Single-Entry Matrix

8.2.1 Definition

The single-entry matrix J^{ij} ∈ R^{n×n} is defined as the matrix which is zero everywhere except in the entry (i, j), in which it is 1. In a 4x4 example one might have

J^{23} = [0 0 0 0; 0 0 1 0; 0 0 0 0; 0 0 0 0].

The single-entry matrix is very useful when working with derivatives of expressions involving matrices.

8.2.2 Swap and Zeros

Assume A to be n×m and J^{ij} to be m×p, then

A J^{ij} = [0  ...  0  A_i  0  ...  0],

i.e. an n×p matrix of zeros with the i.th column of A in place of the j.th column. Assume A to be n×m and J^{ij} to be p×n, then

J^{ij} A = [0; ...; 0; A_j; 0; ...; 0],

i.e. a p×m matrix of zeros with the j.th row of A in place of the i.th row.

8.2.4 Properties of the Single-Entry Matrix

If i = j:

J^{ij} J^{ij} = J^{ij}        (J^{ij})^T (J^{ij})^T = J^{ij}
J^{ij} (J^{ij})^T = J^{ij}    (J^{ij})^T J^{ij} = J^{ij}

If i ≠ j:

J^{ij} J^{ij} = 0             (J^{ij})^T (J^{ij})^T = 0
J^{ij} (J^{ij})^T = J^{ii}    (J^{ij})^T J^{ij} = J^{jj}
8.2.6 Structure Matrices

The structure matrix is defined by

∂A/∂A_{ij} = S^{ij}.

If A has no special structure, then S^{ij} = J^{ij}. If A is symmetric, then S^{ij} = J^{ij} + J^{ji} − J^{ij} J^{ij}.

8.3 Symmetric and Antisymmetric

8.3.1 Symmetric

The matrix A is said to be symmetric if

A = A^T.

Symmetric matrices have many important properties, e.g. that their eigenvalues are real and their eigenvectors orthogonal.

8.3.2 Antisymmetric

The antisymmetric matrix is also known as the skew-symmetric matrix. It has the following property, from which it is defined:

A = −A^T.

Hereby, it can be seen that antisymmetric matrices always have a zero diagonal. The n×n antisymmetric matrices also have the following properties:

det(A^T) = det(−A) = (−1)^n det(A)
−det(A) = det(−A) = 0, if n is odd

8.4 Vandermonde Matrices

A Vandermonde matrix has the form [15]

V = [1 v_1 v_1² ... v_1^{n−1}; 1 v_2 v_2² ... v_2^{n−1}; ...; 1 v_n v_n² ... v_n^{n−1}].    (59)

The determinant is given by

det(V) = Π_{i>j} (v_i − v_j).    (60)
8.5 Toeplitz Matrices

A Toeplitz matrix T is a matrix where the elements of each diagonal are the same. In the n×n square case it has the following structure:

T = [t_0 t_1 ... t_{n−1}; t_{−1} t_0 ... ⋮; ⋮ ⋱ ⋱ t_1; t_{−(n−1)} ... t_{−1} t_0]    (61)

A Toeplitz matrix is persymmetric. If a matrix is persymmetric (or orthosymmetric), it means that the matrix is symmetric about its northeast-southwest diagonal (anti-diagonal) [9]. Persymmetric matrices form a larger class of matrices, since a persymmetric matrix does not necessarily have Toeplitz structure. There are some special cases of Toeplitz matrices. The symmetric Toeplitz matrix is given by

T = [t_0 t_1 ... t_{n−1}; t_1 t_0 ... ⋮; ⋮ ⋱ ⋱ t_1; t_{n−1} ... t_1 t_0].    (62)

The circular Toeplitz matrix:

T_C = [t_0 t_1 ... t_{n−1}; t_n t_0 ... ⋮; ⋮ ⋱ ⋱ t_1; t_1 ... t_n t_0].    (63)

The upper triangular Toeplitz matrix:

T_U = [t_0 t_1 ... t_{n−1}; 0 t_0 ... ⋮; ⋮ ⋱ ⋱ t_1; 0 ... 0 t_0],    (64)

and the lower triangular Toeplitz matrix:

T_L = [t_0 0 ... 0; t_{−1} t_0 ... ⋮; ⋮ ⋱ ⋱ 0; t_{−(n−1)} ... t_{−1} t_0].    (65)

8.5.1 Properties of Toeplitz Matrices

The Toeplitz matrix has some computational advantages. The addition of two Toeplitz matrices can be done with O(n) flops, and multiplication of two Toeplitz matrices can be done in O(n ln n) flops. Toeplitz equation systems can be solved in O(n²) flops. The inverse of a positive definite Toeplitz matrix can be found in O(n²) flops too. The inverse of a Toeplitz matrix is persymmetric. The product of two lower triangular Toeplitz matrices is a Toeplitz matrix. More information on Toeplitz matrices and circulant matrices can be found in [10, 7].
8.6 The DFT Matrix

The discrete Fourier transform (DFT) of a length-N signal x(n) is

X(k) = Σ_{n=0}^{N−1} x(n) W_N^{kn},    (67)

and the inverse DFT is

x(n) = (1/N) Σ_{k=0}^{N−1} X(k) W_N^{−kn}.    (68)

The DFT of the vector x = [x(0), x(1), ..., x(N−1)]^T can be written in matrix form as

X = W_N x,    (69)

where X = [X(0), X(1), ..., X(N−1)]^T and (W_N)_{kn} = W_N^{kn}. The IDFT is similarly given as

x = W_N^{-1} X.    (70)

Some properties of W_N exist:

W_N^{-1} = (1/N) W_N^*    (71)
W_N W_N^* = N I    (72)
W_N^* = W_N^H    (73)

If W_N = e^{−j2π/N}, then [15]

W_N^{m+N/2} = −W_N^m.    (74)

The circular Toeplitz matrix T_C of Sec. 8.5 is diagonalized by the DFT matrix,

T_C = W_N^{-1} (I ∘ (W_N t)) W_N,    (75)

where t = [t_0, t_1, ..., t_{n−1}]^T is the first row of T_C.
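The matrix form (69) and the properties (71)-(72) line up with NumPy's FFT convention; a minimal check:

```python
import numpy as np

N = 8
n = np.arange(N)
W = np.exp(-2j * np.pi / N) ** np.outer(n, n)   # (W_N)_{kn} = W_N^{kn}

x = np.random.default_rng(14).standard_normal(N)
assert np.allclose(W @ x, np.fft.fft(x))        # X = W_N x matches the FFT

# W_N^{-1} = (1/N) W_N^*  and  W_N W_N^* = N I
assert np.allclose(np.linalg.inv(W), W.conj() / N)
assert np.allclose(W @ W.conj(), N * np.eye(N))
```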
8.7 Positive Definite and Semi-definite Matrices

8.7.1 Definitions

A matrix A is positive definite if and only if

x^T A x > 0,    for all x ≠ 0,

and positive semi-definite if and only if

x^T A x ≥ 0,    for all x.

8.7.2 Eigenvalues

The following holds with respect to the eigenvalues:

A positive definite       ⇔  eig(A) > 0
A positive semi-definite  ⇔  eig(A) ≥ 0

8.7.3 Trace

The following holds with respect to the trace:

A positive definite       ⇒  Tr(A) > 0
A positive semi-definite  ⇒  Tr(A) ≥ 0

8.7.4 Inverse

If A is positive definite, then A is invertible and A^{-1} is also positive definite.

8.7.5 Diagonal

If A is positive definite, then the diagonal elements are positive, A_{ii} > 0.

8.7.6 Decomposition I

The matrix A is positive semi-definite if and only if there exists a matrix B such that

A = B B^T.

8.7.7 Decomposition II

Assume A is an n×n positive semi-definite matrix of rank r. Then there exists an n×r matrix B of rank r such that A = B B^T.

8.7.8 Equation with zeros

Assume A is positive semi-definite, then

X^T A X = 0    ⇒    A X = 0.

8.7.9 Rank of product

Assume A is positive definite, then

rank(B A B^T) = rank(B).

8.7.10 Outer Product

If X is n×r with n ≤ r and rank(X) = n, then X X^T is positive definite.

8.7.11 Small pertubations

If A is positive definite and B is symmetric, then A − tB is positive definite for sufficiently small t.

8.8 Block matrices

Let A_{ij} denote the ij.th block of A.

8.8.1 Multiplication

Assuming the dimensions of the blocks match, we have

[A_{11} A_{12}; A_{21} A_{22}] [B_{11} B_{12}; B_{21} B_{22}] = [A_{11}B_{11} + A_{12}B_{21}   A_{11}B_{12} + A_{12}B_{22}; A_{21}B_{11} + A_{22}B_{21}   A_{21}B_{12} + A_{22}B_{22}].

8.8.2 The Determinant

The determinant can be expressed by use of the Schur complements

C_1 = A_{11} − A_{12} A_{22}^{-1} A_{21}
C_2 = A_{22} − A_{21} A_{11}^{-1} A_{12}

as

det([A_{11} A_{12}; A_{21} A_{22}]) = det(A_{22}) det(C_1) = det(A_{11}) det(C_2).

8.8.3 The Inverse

The inverse can be expressed by use of C_1 and C_2 as

[A_{11} A_{12}; A_{21} A_{22}]^{-1}
= [ C_1^{-1}    −A_{11}^{-1} A_{12} C_2^{-1} ;
    −C_2^{-1} A_{21} A_{11}^{-1}    C_2^{-1} ]
= [ A_{11}^{-1} + A_{11}^{-1} A_{12} C_2^{-1} A_{21} A_{11}^{-1}    −C_1^{-1} A_{12} A_{22}^{-1} ;
    −A_{22}^{-1} A_{21} C_1^{-1}    A_{22}^{-1} + A_{22}^{-1} A_{21} C_1^{-1} A_{12} A_{22}^{-1} ]

8.8.4 Block diagonal

For block diagonal matrices we have

[A_{11} 0; 0 A_{22}]^{-1} = [(A_{11})^{-1} 0; 0 (A_{22})^{-1}]

det([A_{11} 0; 0 A_{22}]) = det(A_{11}) det(A_{22}).
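A NumPy check of the determinant and inverse formulas via the Schur complements (a diagonally dominant A keeps all blocks invertible):

```python
import numpy as np

rng = np.random.default_rng(15)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)
A11, A12 = A[:3, :3], A[:3, 3:]
A21, A22 = A[3:, :3], A[3:, 3:]

C1 = A11 - A12 @ np.linalg.inv(A22) @ A21   # Schur complement of A22
C2 = A22 - A21 @ np.linalg.inv(A11) @ A12   # Schur complement of A11

assert np.isclose(np.linalg.det(A), np.linalg.det(A22) * np.linalg.det(C1))

Ainv = np.linalg.inv(A)
assert np.allclose(Ainv[:3, :3], np.linalg.inv(C1))   # top-left block is C1^{-1}
assert np.allclose(Ainv[3:, 3:], np.linalg.inv(C2))   # bottom-right block is C2^{-1}
```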
9 Functions and Operators

9.1 Functions and Series

9.1.1 Finite Series

(X^n − I)(X − I)^{-1} = I + X + X² + ... + X^{n−1}

9.1.2 Taylor Expansion of Scalar Function

Consider some scalar function f(x) which takes the vector x as an argument. This we can Taylor expand around x_0:

f(x) ≅ f(x_0) + g(x_0)^T (x − x_0) + (1/2)(x − x_0)^T H(x_0)(x − x_0),

where

g(x_0) = ∂f(x)/∂x |_{x_0},    H(x_0) = ∂²f(x)/∂x∂x^T |_{x_0}.

9.1.3 Matrix Functions by Infinite Series

As for analytic functions in one dimension, one can define a matrix function for square matrices X by an infinite series

f(X) = Σ_{n=0}^∞ c_n X^n,

assuming the limit exists and is finite. If the coefficients c_n fulfil Σ_n c_n x^n < ∞, then one can prove that the above series exists and is finite, see [1]. Thus for any analytic function f(x) there exists a corresponding matrix function f(X) constructed by the Taylor expansion. Using this, one can prove the following results:

1) A matrix A is a zero of its own characteristic polynomial [1]:

p(λ) = det(Iλ − A) = Σ_n c_n λ^n    ⇒    p(A) = 0.

2) If A is square, it holds that [1]

A = U B U^{-1}    ⇒    f(A) = U f(B) U^{-1}.

3) A useful fact when using power series is that

A^n → 0 for n → ∞,    if |A| < 1.
9.1.4 Exponential Matrix Function

In analogy to the ordinary scalar exponential function, one can define exponential and logarithmic matrix functions:

e^A = Σ_{n=0}^∞ (1/n!) A^n = I + A + (1/2) A² + ...
e^{−A} = Σ_{n=0}^∞ (1/n!) (−1)^n A^n = I − A + (1/2) A² − ...
e^{tA} = Σ_{n=0}^∞ (1/n!) (tA)^n = I + tA + (1/2) t² A² + ...
ln(I + A) = Σ_{n=1}^∞ ((−1)^{n−1}/n) A^n = A − (1/2) A² + (1/3) A³ − ...

Some of the properties of the exponential function are [1]:

e^A e^B = e^{A+B}    if A B = B A
(e^A)^{-1} = e^{−A}
d/dt e^{tA} = A e^{tA} = e^{tA} A,    t ∈ R
d/dt Tr(e^{tA}) = Tr(A e^{tA})
det(e^A) = e^{Tr(A)}
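A minimal NumPy sketch that evaluates e^A by its defining series and checks two of the properties; a small ||A|| keeps the truncated series accurate:

```python
import numpy as np

def expm_series(A, terms=40):
    """Matrix exponential via its defining power series (fine for small ||A||)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        term = term @ A / n
        out = out + term
    return out

rng = np.random.default_rng(20)
A = 0.5 * rng.standard_normal((4, 4))

E = expm_series(A)
assert np.isclose(np.linalg.det(E), np.exp(np.trace(A)))   # det(e^A) = e^{Tr(A)}
assert np.allclose(np.linalg.inv(E), expm_series(-A))      # (e^A)^{-1} = e^{-A}
```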
9.1.5 Trigonometric Functions

sin(A) = Σ_{n=0}^∞ ((−1)^n A^{2n+1})/(2n+1)! = A − (1/3!) A³ + (1/5!) A⁵ − ...
cos(A) = Σ_{n=0}^∞ ((−1)^n A^{2n})/(2n)! = I − (1/2!) A² + (1/4!) A⁴ − ...

9.2 Kronecker and Vec Operator

9.2.1 The Kronecker Product

The Kronecker product of an m×n matrix A and an r×q matrix B is the mr×nq matrix

A ⊗ B = [A_{11} B  A_{12} B  ...  A_{1n} B ; ⋮ ; A_{m1} B  A_{m2} B  ...  A_{mn} B].

Among its properties are ([13])

(A ⊗ B)(C ⊗ D) = A C ⊗ B D
(A ⊗ B)^T = A^T ⊗ B^T
(A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
Tr(A ⊗ B) = Tr(A) Tr(B)
9.3
The vec-operator applied on a matrix A stacks the columns into a vector, i.e.
for a 2 2 matrix
A11
A21
A11 A12
A=
vec(A) =
A12
A21 A22
A22
Properties of the vec-operator include (see [13])
vec(AXB) = (BT A)vec(X)
Tr(AT B) = vec(A)T vec(B)
vec(A + B) = vec(A) + vec(B)
vec(A) = vec(A)
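A quick check of the first two vec properties in NumPy, where vec is column stacking, i.e. Fortran-order reshape:

```python
import numpy as np

rng = np.random.default_rng(16)
A, X, B = (rng.standard_normal((3, 3)) for _ in range(3))

vec = lambda M: M.reshape(-1, order='F')   # stack the columns
assert np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))
assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))
```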
9.3 Solutions to Systems of Equations

9.3.1 Existence in Linear Systems

Assume A is n×m and consider the linear system Ax = b. Construct the augmented matrix B = [A b]; then

rank(A) = rank(B) = m   ⇒  unique solution x
rank(A) = rank(B) < m   ⇒  many solutions x
rank(A) < rank(B)       ⇒  no solutions x

9.3.2 Standard Square

Assume A is square and invertible, then

Ax = b    ⇒    x = A^{-1} b.

9.3.3 Degenerated Square

Assume A is n×n of rank r < n. Then Ax = b need not have an exact solution; the minimum-norm least squares solution is given by the pseudo-inverse, x = A^+ b (cf. Sec. 3.6).

9.3.4 Over-determined Rectangular

Assume A to be n×m with n > m (tall) and rank(A) = m, then

Ax = b    ⇒    x = (A^T A)^{-1} A^T b = A^+ b

— that is, if a solution exists at all! If there is no exact solution, the following is useful:

Ax = b    ⇒    x_min = A^+ b.

Now x_min is the vector x which minimizes ||Ax − b||², i.e. the vector which is "least wrong". The matrix A^+ is the pseudo-inverse of A. See [3].

9.3.5 Under-determined Rectangular

Assume A is n×m with n < m ("broad"), then

Ax = b    ⇒    x_min = A^T (A A^T)^{-1} b.

The equation has many solutions x; x_min is the solution which minimizes ||Ax − b||² and is also the solution with the smallest norm ||x||². The same holds for the matrix version: assume A is n×m, X is m×n and B is n×n, then

A X = B    ⇒    X_min = A^+ B.

The equation has many solutions X; X_min is the solution which minimizes ||AX − B||² and is also the solution with the smallest norm ||X||². See [3].

Similar but different: assume A is square n×n and the matrices B_0, B_1 are n×N, where N > n. Then, if B_0 has maximal rank,

A B_0 = B_1    ⇒    A_min = B_1 B_0^T (B_0 B_0^T)^{-1},

where A_min denotes the matrix which is optimal in a least squares sense. An interpretation is that A is the linear approximation which maps the column vectors of B_0 into the column vectors of B_1.

9.3.6 Linear form and zeros

Ax = 0 for all x    ⇒    A = 0
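A NumPy sketch of the over- and under-determined recipes, using random full-rank matrices:

```python
import numpy as np

rng = np.random.default_rng(18)
A = rng.standard_normal((8, 3))    # over-determined (tall)
b = rng.standard_normal(8)

x1 = np.linalg.inv(A.T @ A) @ A.T @ b          # normal equations
x2 = np.linalg.lstsq(A, b, rcond=None)[0]      # library least squares
assert np.allclose(x1, x2)

A = rng.standard_normal((3, 8))    # under-determined (broad)
b = rng.standard_normal(3)
xmin = A.T @ np.linalg.inv(A @ A.T) @ b        # minimum-norm exact solution
assert np.allclose(A @ xmin, b)
assert np.allclose(xmin, np.linalg.pinv(A) @ b)
```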
9.3.7 Square form and zeros

If A is symmetric, then

x^T A x = 0 for all x    ⇒    A = 0.

9.3.8 The Lyapunov Equation

A X + X B = C    ⇒    vec(X) = (I ⊗ A + B^T ⊗ I)^{-1} vec(C)

See Sec. 9.2.1 and 9.2.2 for details on the Kronecker product and the vec operator.

9.3.9 Encapsulating Sum

Σ_n A_n X B_n = C    ⇒    vec(X) = (Σ_n B_n^T ⊗ A_n)^{-1} vec(C)

See Sec. 9.2.1 and 9.2.2 for details on the Kronecker product and the vec operator.
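The Lyapunov recipe translates directly into code; a NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(17)
n = 3
A, B, C = (rng.standard_normal((n, n)) for _ in range(3))

vec = lambda M: M.reshape(-1, order='F')
I = np.eye(n)

# AX + XB = C  =>  vec(X) = (I (x) A + B^T (x) I)^{-1} vec(C)
x = np.linalg.solve(np.kron(I, A) + np.kron(B.T, I), vec(C))
X = x.reshape(n, n, order='F')
assert np.allclose(A @ X + X @ B, C)
```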
9.4 Matrix Norms

9.4.1 Definitions

A matrix norm is a mapping which fulfils

||A|| ≥ 0
||A|| = 0  ⇔  A = 0
||cA|| = |c| · ||A||,    c ∈ R
||A + B|| ≤ ||A|| + ||B||

9.4.2 Examples

||A||_1 = max_j Σ_i |A_{ij}|
||A||_2 = sqrt(max eig(A^T A))
||A||_p = max_{||x||_p = 1} ||A x||_p
||A||_∞ = max_i Σ_j |A_{ij}|
||A||_F = sqrt(Σ_{ij} |A_{ij}|²) = sqrt(Tr(A A^H))    (Frobenius)
||A||_max = max_{ij} |A_{ij}|
||A||_KF = ||sing(A)||_1    (Ky Fan)

where sing(A) is the vector of singular values of the matrix A.
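The example norms map directly onto numpy.linalg.norm; a quick illustrative check:

```python
import numpy as np

rng = np.random.default_rng(19)
A = rng.standard_normal((4, 3))

one  = np.linalg.norm(A, 1)        # max column sum of |A_ij|
inf_ = np.linalg.norm(A, np.inf)   # max row sum of |A_ij|
two  = np.linalg.norm(A, 2)        # largest singular value
fro  = np.linalg.norm(A, 'fro')

assert np.isclose(one, np.abs(A).sum(axis=0).max())
assert np.isclose(inf_, np.abs(A).sum(axis=1).max())
assert np.isclose(two, np.sqrt(np.linalg.eigvalsh(A.T @ A).max()))
assert np.isclose(fro, np.sqrt(np.trace(A @ A.T)))
assert two <= fro                  # ||A||_2 <= ||A||_F
```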
9.4.3 Inequalities

E. H. Rasmussen has in yet unpublished material derived and collected inequalities relating the norms above to one another, assuming A is m×n and d = min(m, n). Representative examples are

||A||_max ≤ ||A||_2 ≤ sqrt(mn) ||A||_max
||A||_2 ≤ ||A||_F ≤ sqrt(d) ||A||_2
||A||_F ≤ ||A||_KF ≤ sqrt(d) ||A||_F

9.4.4 Condition Number

The 2-norm of A equals sqrt(max(eig(A^T A))) [9, p. 57]. For a symmetric, positive definite matrix, this reduces to max(eig(A)). The condition number based on the 2-norm thus reduces to

||A||_2 ||A^{-1}||_2 = max(eig(A)) max(eig(A^{-1})) = max(eig(A)) / min(eig(A)).    (76)

9.5 Rank

9.5.1 Sylvester's Inequality

If A is m×n and B is n×r, then

rank(A) + rank(B) − n ≤ rank(AB) ≤ min{rank(A), rank(B)}.

9.6 Integral Involving Dirac Delta Functions

Assuming A to be square, then

∫ p(s) δ(x − As) ds = (1/det(A)) p(A^{-1} x).
9.7 Miscellaneous

For any A it holds that

rank(A) = rank(A^T) = rank(A A^T) = rank(A^T A).

Orthogonal matrix: a matrix A is orthogonal if and only if A^{-1} = A^T. For an orthogonal matrix, det(A) = ±1.
A One-dimensional Results

A.1 Gaussian

A.1.1 Density

p(x) = (1/sqrt(2πσ²)) exp( −(x − μ)²/(2σ²) )

A.1.2 Normalization

∫ e^{−(s−μ)²/(2σ²)} ds = sqrt(2πσ²)

∫ e^{−(ax² + bx + c)} dx = sqrt(π/a) exp( (b² − 4ac)/(4a) )

∫ e^{c_2 x² + c_1 x + c_0} dx = sqrt(π/(−c_2)) exp( (c_1² − 4 c_2 c_0)/(−4 c_2) )

A.1.3 Derivatives

∂p(x)/∂μ = p(x) (x − μ)/σ²
∂ln p(x)/∂μ = (x − μ)/σ²
∂p(x)/∂σ = p(x) (1/σ) [ (x − μ)²/σ² − 1 ]
∂ln p(x)/∂σ = (1/σ) [ (x − μ)²/σ² − 1 ]

A.1.4 Completing the Squares

c_2 x² + c_1 x + c_0 = −a(x − b)² + w

with

a = −c_2,    b = −c_1/(2 c_2),    w = c_0 − c_1²/(4 c_2),

or

c_2 x² + c_1 x + c_0 = −(1/(2σ²))(x − μ)² + d

with

μ = −c_1/(2 c_2),    σ² = −1/(2 c_2),    d = c_0 − c_1²/(4 c_2).
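The second integral can be checked by simple numerical quadrature; a NumPy sketch with arbitrary coefficients:

```python
import numpy as np

a, b, c = 1.3, -0.7, 0.2
xs = np.linspace(-30, 30, 400_001)
numeric = np.trapz(np.exp(-(a * xs**2 + b * xs + c)), xs)
closed = np.sqrt(np.pi / a) * np.exp((b**2 - 4 * a * c) / (4 * a))
assert np.isclose(numeric, closed, rtol=1e-6)
```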
A.1.5 Moments

If the density is expressed by

p(x) = (1/sqrt(2πσ²)) exp( −(x − μ)²/(2σ²) )    or    p(x) = C exp(c_2 x² + c_1 x),

then the first few basic moments are

⟨x⟩ = μ = −c_1/(2 c_2)
⟨x²⟩ = σ² + μ² = −1/(2 c_2) + (c_1/(2 c_2))²
⟨x³⟩ = 3σ²μ + μ³ = (c_1/(2 c_2)²) (3 − c_1²/(2 c_2))
⟨x⁴⟩ = μ⁴ + 6μ²σ² + 3σ⁴ = (c_1/(2 c_2))⁴ + 6 (c_1/(2 c_2))² (−1/(2 c_2)) + 3 (1/(2 c_2))²

and the central moments are

⟨(x − μ)²⟩ = σ² = −1/(2 c_2)
⟨(x − μ)³⟩ = 0
⟨(x − μ)⁴⟩ = 3σ⁴ = 3 (1/(2 c_2))²

A kind of pseudo-moments (un-normalized integrals) can easily be derived as

∫ exp(c_2 x² + c_1 x) x^n dx = Z ⟨x^n⟩ = sqrt(π/(−c_2)) exp( c_1²/(−4 c_2) ) ⟨x^n⟩.

From the un-centralized moments one can derive other entities like

⟨x²⟩ − ⟨x⟩² = σ² = −1/(2 c_2)
⟨x³⟩ − ⟨x²⟩⟨x⟩ = 2σ²μ = 2 c_1/(2 c_2)²
⟨x⁴⟩ − ⟨x²⟩² = 2σ⁴ + 4μ²σ² = (2/(2 c_2)²) (1 − c_1²/c_2)

A.2 One Dimensional Mixture of Gaussians

A.2.1 Density and Normalization

p(s) = Σ_{k=1}^{K} (ρ_k / sqrt(2π σ_k²)) exp( −(s − μ_k)²/(2 σ_k²) )

A.2.2 Moments

A useful fact of a mixture of Gaussians is that

⟨x^n⟩ = Σ_k ρ_k ⟨x^n⟩_k,
where ⟨·⟩_k denotes average with respect to the k.th component. We can calculate the first four moments from the densities

p(x) = Σ_k ρ_k (1/sqrt(2π σ_k²)) exp( −(x − μ_k)²/(2 σ_k²) )

p(x) = Σ_k ρ_k C_k exp( c_{k2} x² + c_{k1} x )

as

⟨x⟩ = Σ_k ρ_k μ_k = Σ_k ρ_k ( −c_{k1}/(2 c_{k2}) )
⟨x²⟩ = Σ_k ρ_k (σ_k² + μ_k²) = Σ_k ρ_k ( −1/(2 c_{k2}) + (c_{k1}/(2 c_{k2}))² )
⟨x³⟩ = Σ_k ρ_k (3 σ_k² μ_k + μ_k³) = Σ_k ρ_k ( (c_{k1}/(2 c_{k2})²) (3 − c_{k1}²/(2 c_{k2})) )
⟨x⁴⟩ = Σ_k ρ_k (μ_k⁴ + 6 μ_k² σ_k² + 3 σ_k⁴) = Σ_k ρ_k ( (c_{k1}/(2 c_{k2}))⁴ + 6 (c_{k1}/(2 c_{k2}))² (−1/(2 c_{k2})) + 3 (1/(2 c_{k2}))² )

If all the Gaussians are centered, i.e. μ_k = 0 for all k, then

⟨x⟩ = 0
⟨x²⟩ = Σ_k ρ_k σ_k² = Σ_k ρ_k ( −1/(2 c_{k2}) )
⟨x³⟩ = 0
⟨x⁴⟩ = Σ_k ρ_k 3 σ_k⁴ = Σ_k ρ_k 3 ( 1/(2 c_{k2}) )²

From the un-centralized moments one can derive other entities like

⟨x²⟩ − ⟨x⟩² = Σ_{k,k'} ρ_k ρ_{k'} ( μ_k² + σ_k² − μ_k μ_{k'} )
⟨x³⟩ − ⟨x²⟩⟨x⟩ = Σ_{k,k'} ρ_k ρ_{k'} ( 3 σ_k² μ_k + μ_k³ − (σ_k² + μ_k²) μ_{k'} )
⟨x⁴⟩ − ⟨x²⟩² = Σ_{k,k'} ρ_k ρ_{k'} ( μ_k⁴ + 6 μ_k² σ_k² + 3 σ_k⁴ − (σ_k² + μ_k²)(σ_{k'}² + μ_{k'}²) )

A.2.3 Derivatives

Defining p(s) = Σ_k ρ_k N_s(μ_k, σ_k²), we get for a parameter θ_j of the j.th component

∂ln p(s)/∂θ_j = [ ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ] · ∂ln(ρ_j N_s(μ_j, σ_j²))/∂θ_j,

that is,

∂ln p(s)/∂ρ_j = [ ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ] · (1/ρ_j)
∂ln p(s)/∂μ_j = [ ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ] · (s − μ_j)/σ_j²
∂ln p(s)/∂σ_j = [ ρ_j N_s(μ_j, σ_j²) / Σ_k ρ_k N_s(μ_k, σ_k²) ] · (1/σ_j) [ (s − μ_j)²/σ_j² − 1 ]
Note that ρ_k must be constrained to be proper ratios. Defining the ratios by ρ_j = e^{r_j} / Σ_k e^{r_k}, we obtain

∂ln p(s)/∂r_j = Σ_l (∂ln p(s)/∂ρ_l) (∂ρ_l/∂r_j),    where    ∂ρ_l/∂r_j = ρ_l (δ_{lj} − ρ_j).

B Proofs and Details

B.1 Misc Proofs

B.1.1 Proof of Equation (14)

Essentially we need to calculate

∂(X^n)_{kl}/∂X_{ij} = ∂/∂X_{ij} Σ_{u_1,...,u_{n−1}} X_{k,u_1} X_{u_1,u_2} ... X_{u_{n−1},l}
                   = Σ_{r=0}^{n−1} (X^r J^{ij} X^{n−1−r})_{kl}.

Using the properties of the single-entry matrix found in Sec. 8.2.4, the result follows easily.

B.1.2 Details on Eq. (78)
The derivative of det(X^H A X) is first found with respect to the real part of X, by means of (9) and (5):

∂det(X^H A X)/∂Re(X) = det(X^H A X) [ ((X^H A X)^{-1} X^H A)^T + A X (X^H A X)^{-1} ].

Through the calculations, (16) and (47) were used. In addition, by use of (48), the derivative is found with respect to the imaginary part of X:

i ∂det(X^H A X)/∂Im(X) = det(X^H A X) [ A X (X^H A X)^{-1} − ((X^H A X)^{-1} X^H A)^T ].

Hence, by the definitions (36) and (37),

∂det(X^H A X)/∂X = (1/2) [ ∂det(X^H A X)/∂Re(X) − i ∂det(X^H A X)/∂Im(X) ]
                 = det(X^H A X) ( (X^H A X)^{-1} X^H A )^T

∂det(X^H A X)/∂X* = (1/2) [ ∂det(X^H A X)/∂Re(X) + i ∂det(X^H A X)/∂Im(X) ]
                  = det(X^H A X) A X (X^H A X)^{-1}

Notice, for real X, A, the sum of (56) and (57) is reduced to (13). Similar calculations yield

∂det(X A X^H)/∂X = (1/2) [ ∂det(X A X^H)/∂Re(X) − i ∂det(X A X^H)/∂Im(X) ]
                 = det(X A X^H) ( A X^H (X A X^H)^{-1} )^T    (77)

and

∂det(X A X^H)/∂X* = (1/2) [ ∂det(X A X^H)/∂Re(X) + i ∂det(X A X^H)/∂Im(X) ]
                  = det(X A X^H) (X A X^H)^{-1} X A.    (78)
References

[1] Karl Gustav Andersson and Lars-Christer Boiers. Ordinära differentialekvationer. Studentlitteratur, 1992.
[2] Jörn Anemüller, Terrence J. Sejnowski, and Scott Makeig. Complex independent component analysis of frequency-domain electroencephalographic data. Neural Networks, 16(9):1311-1323, November 2003.
[3] S. Barnet. Matrices. Methods and Applications. Oxford Applied Mathematics and Computing Science Series. Clarendon Press, 1990.
[4] Christopher Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
[5] Robert J. Boik. Lecture notes: Statistics 550. Online, April 22 2002. Notes.
[6] D. H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proceedings, 130(1):11-16, February 1983. Parts F and H.
[7] M. Brookes. Matrix Reference Manual, 2004. Website May 20, 2004.
[8] Mads Dyrholm. Some matrix results, 2004. Website August 23, 2004.
[9] Gene H. Golub and Charles F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, 3rd edition, 1996.
[10] Robert M. Gray. Toeplitz and circulant matrices: A review. Technical report, Information Systems Laboratory, Department of Electrical Engineering, Stanford University, Stanford, California 94305, August 2002.
[11] Simon Haykin. Adaptive Filter Theory. Prentice Hall, Upper Saddle River, NJ, 4th edition, 2002.
[12] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[13] Thomas P. Minka. Old and new matrix algebra useful for statistics, December 2000. Notes.
[14] L. Parra and C. Spence. Convolutive blind separation of non-stationary sources. In IEEE Transactions Speech and Audio Processing, pages 320-327, May 2000.
[15] John G. Proakis and Dimitris G. Manolakis. Digital Signal Processing. Prentice-Hall, 1996.
[16] Laurent Schwartz. Cours d'Analyse, volume II. Hermann, Paris, 1967. As referenced in [11].
[17] Shayle R. Searle. Matrix Algebra Useful for Statistics. John Wiley and Sons, 1982.
[18] G. Seber and A. Lee. Linear Regression Analysis. John Wiley and Sons, 2002.
[19] S. M. Selby. Standard Mathematical Tables. CRC Press, 1974.
[20] Inna Stainvas. Matrix algebra in differential calculus. Neural Computing Research Group, Information Engineering, Aston University, UK, August 2002. Notes.
[21] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, 1993.
[22] Max Welling. The Kalman Filter. Lecture Note.