
# Matrix Tutorial 2: Basic Matrix Operations

This tutorial covers matrix operations such as addition,
subtraction, and multiplication of matrices. An
introduction to multiplication and division of matrices by
a scalar is provided. Includes determinants.

Dr. E. Garcia
Mi Islita.com
Email | Last Update: 07/09/06

Topics

Learning by Doing
Matrix Operations at Once
Multiplication of Matrices
Multiplication and Division of Matrices by a Scalar
Orthogonal Matrices
Transpose and Inverse Properties
Determinants
Tutorial Review
References
## Learning by Doing

In Part 1 of this tutorial we introduced the reader to different types of matrices, digraphs, and
Markov chains. We used lots of graphics to help users visualize the concepts. Now it is time to
discuss matrix operations. As mentioned before, only the most common and basic operations
will be covered. Here we will use a learning-by-doing approach. Thus, rather than staring at
some equations, you must do your part.

We recommend you grab a stack of paper and a pencil and do the following.

• Do a quick first reading of this tutorial. Don't skip sections. Don't worry if you
don't completely understand a key concept the first time. This first scan is analogous
to the visual scanning of key concepts you did in Part 1. The idea is to place some
global weights of knowledge in your "database" (mind) and later on associate with each
concept a local weight of specific knowledge. Incidentally, this teaching approach
resembles the way term weights are computed; i.e., by considering global and local
information.
• Once you have finished, go back and read each section again carefully. It is now that
you are going to concept-map text to images, form associations, and execute.
• By "execute" I mean that each time you encounter an equation or figure describing some
calculations, you should try to replicate the calculations from scratch. Don't skip sections.
• Once you have finished, we suggest you invent your own exercises and solve them.
If you prefer, use tabular data, such as from the sports or business section of a newspaper.
• Try to solve the exercises presented in the review section.

## Matrix Operations at Once

The rules for addition, subtraction, multiplication, and division between matrices are as
follows. Let us first assume that matrices A and B are used to construct matrix Z. It must follow
that for

• Addition: Z = A + B; z_ij = a_ij + b_ij
• Subtraction: Z = A - B; z_ij = a_ij - b_ij
• Multiplication: Z = A*B, if # columns in A = # rows in B; z_ij = a_i1*b_1j + a_i2*b_2j + a_i3*b_3j + ... + a_in*b_nj

The rules for multiplication and division of a matrix by a scalar (a real number) are
simpler. If matrix Z is constructed by multiplying all elements of matrix A by a scalar c, then
its elements are z_ij = c*a_ij. In an analogous manner, dividing matrix A by c gives z_ij = (1/c)*a_ij.

All these operations are illustrated in Figure 1. Let's revisit these one by one.
Figure 1. Some matrix operations.
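As a quick check of these rules, here is a minimal sketch in Python with NumPy; the matrices A and B below are made-up examples, not the ones from Figure 1.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

Z_add = A + B      # addition: z_ij = a_ij + b_ij
Z_sub = A - B      # subtraction: z_ij = a_ij - b_ij
Z_mul = A @ B      # multiplication: z_ij = a_i1*b_1j + ... + a_in*b_nj
c = 2.0
Z_scaled = c * A   # multiplication by a scalar: z_ij = c*a_ij
Z_div = A / c      # division by a scalar: z_ij = (1/c)*a_ij
```
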

## Addition and Subtraction of Matrices

To add or subtract matrices, they must be of identical order. This just means that the matrices
involved must have the same number of rows and columns. If they don't have the same
number of rows and columns, we cannot add or subtract them.

The expression

z_ij = a_ij + b_ij

means "to the element in row i, column j of matrix A, add the element in row i, column j of matrix B".
If we do this with each element of A and B we end up with matrix Z. An example is given in
Figure 2.

Similarly, the expression

z_ij = a_ij - b_ij

means "from the element in row i, column j of matrix A, subtract the element in row i, column j of matrix
B". If we do this with each element of A and B we end up with matrix Z. See Figure 3.

Figure 3. Subtraction operation.

## Multiplication of Matrices

Consider two matrices A and B with the following characteristic: the number of columns in
A equals the number of rows in B. Such matrices are conformable with respect to one another, and
they can be multiplied together to form a new matrix Z.

The expression

z_ij = a_i1*b_1j + a_i2*b_2j + a_i3*b_3j + ... + a_in*b_nj

means "add the products obtained by multiplying the elements in row i of matrix A by the
corresponding elements in column j of matrix B". Figure 4 illustrates what we mean by this statement.

Figure 4. Multiplication operation.

Matrix multiplication has a catch, as we mentioned before. The order in which we multiply
terms does matter. The reason for this is that we need to multiply row elements by column
elements, one by one. Therefore A*B and B*A can produce different results. We say "can
produce" because there are special cases in which the operation is commutative (order does
not matter). An example of this is when we deal with diagonal matrices. Diagonal matrices
were described in Part 1.
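To make the row-by-column rule concrete, here is a sketch that implements it with explicit loops and then shows that A*B and B*A can differ; the matrices are made-up examples.

```python
import numpy as np

def matmul(A, B):
    """Multiply A (m x n) by B (n x p) element by element:
    z_ij = a_i1*b_1j + a_i2*b_2j + ... + a_in*b_nj."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "# columns in A must equal # rows in B"
    Z = np.zeros((m, p))
    for i in range(m):          # each row of A
        for j in range(p):      # each column of B
            for k in range(n):  # accumulate the products
                Z[i, j] += A[i, k] * B[k, j]
    return Z

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
AB = matmul(A, B)
BA = matmul(B, A)
# AB and BA differ: multiplication is not commutative in general
```
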

## Multiplication and Division of Matrices by a Scalar

The rules for multiplication and division of a matrix by a scalar are similar. Since multiplying
a number x by 1/c is the same as dividing x by c, let's consider these operations at once.

If all elements of matrix A are multiplied by a scalar c to construct matrix Z, then z_ij = c*a_ij.
Similarly, dividing matrix A by c gives z_ij = (1/c)*a_ij. The expression

z_ij = c*a_ij

means "multiply each element in row i, column j by c", and the expression

z_ij = (1/c)*a_ij = a_ij/c

means "divide each element in row i, column j by c". These two operations are shown in
Figure 5, where c = 2.

Figure 5. Multiplication and division of a matrix by a scalar.

Figure 6 shows that a scalar matrix is obtained when an identity matrix is multiplied by a
scalar. As we will see in Part 3 of this tutorial, subtracting a scalar matrix from a regular matrix
is an important operation.

Figure 6. Scalar matrix obtained by multiplying an identity matrix by a scalar.

## Orthogonal Matrices

A regular matrix (one whose determinant is not equal to zero) M is said to be orthogonal if
multiplying it by its transpose yields the identity matrix I; i.e., M*M^T = I. Orthogonal
matrices have interesting properties. If M is orthogonal:

1. its transpose and inverse are identical: M^T = M^-1.
2. when multiplied by its transpose, the product is commutative: M*M^T = M^T*M.
3. its transpose is also an orthogonal matrix.
4. when multiplied by an orthogonal matrix, the product is an orthogonal matrix.
5. its determinant is +/- 1. The reverse is not necessarily true; i.e., not all matrices whose
determinant is +/- 1 are orthogonal.
6. the sum of the squares of the elements in a given row or column is equal to 1.
7. the sum of the products of corresponding elements in any two rows or columns (i.e., their
dot product) is equal to zero.

Conversely, a square matrix (one with the same number of rows and columns) is orthogonal if
both of the following conditions hold:

1. the sum of the squares of the elements in every row or column is equal to 1.
2. the sum of the products of corresponding elements in every pair of rows or columns
(i.e., their dot products) is equal to zero.

As we can see, it is quite easy to determine if a regular or square matrix is orthogonal. Just
look for any of these properties.
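These checks are easy to run numerically. The sketch below uses a 2-D rotation matrix, a standard example of an orthogonal matrix (the angle is an arbitrary choice):

```python
import numpy as np

theta = np.pi / 6  # arbitrary rotation angle
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

assert np.allclose(M @ M.T, np.eye(2))          # M*M^T = I
assert np.allclose(M.T, np.linalg.inv(M))       # M^T = M^-1
assert np.isclose(abs(np.linalg.det(M)), 1.0)   # det = +/- 1
assert np.allclose((M**2).sum(axis=0), 1.0)     # squares in each column sum to 1
assert np.isclose(M[:, 0] @ M[:, 1], 0.0)       # columns are mutually orthogonal
```
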

## Transpose and Inverse Properties

The following transpose property is observed in matrices:

(ABC)^T = C^T*B^T*A^T

The following inverse properties are observed in matrices:

(ABC)^-1 = C^-1*B^-1*A^-1

(ABC^-1)^-1 = (C^-1)^-1*B^-1*A^-1 = C*B^-1*A^-1

A^-1*A = A*A^-1 = I

Since matrix division is not defined, it is impossible to divide a matrix expression by a given
matrix. However, the desired effect is achieved by multiplying the expression by the inverse
of the given matrix (2).
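The properties above can be verified numerically; the matrices below are random but shifted along the diagonal so they are safely invertible (an assumption of this sketch).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)) + 3 * np.eye(3)  # diagonally dominant => invertible
B = rng.random((3, 3)) + 3 * np.eye(3)
C = rng.random((3, 3)) + 3 * np.eye(3)
inv = np.linalg.inv

assert np.allclose((A @ B @ C).T, C.T @ B.T @ A.T)            # (ABC)^T = C^T B^T A^T
assert np.allclose(inv(A @ B @ C), inv(C) @ inv(B) @ inv(A))  # (ABC)^-1 = C^-1 B^-1 A^-1
assert np.allclose(inv(A) @ A, np.eye(3))                     # A^-1 A = I

X = A @ inv(B)   # "dividing" A by B: multiply by the inverse of B
assert np.allclose(X @ B, A)
```
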

## Determinants

Although the following is an incomplete definition, a determinant (det) can be described as a
function that associates a scalar with a square matrix. This scalar can assume any real value, including
zero. A matrix with a nonzero determinant is an invertible matrix (we can calculate its inverse
matrix). If the determinant is zero (det = 0), the matrix is called a noninvertible matrix. Don't worry about
matrix inversions yet.

To indicate that we are referring to the determinant of A and not to matrix A, we surround the
symbol A by pipes ("|"). The symbolic definition of a determinant for a matrix A is shown in
Figure 7 for m = n = 2 and m = n = 3.

Figure 7. Some determinants.

In the figure, the second subscripts are all distinct, the number of terms is n!, and v is the
number of inversions of the second subscripts. Thus, the determinant of a matrix of order n = 2
has two terms and one negative sign, and the determinant of a matrix of order n = 3 has six terms
and three negative signs. Sample calculations are given in Figure 8.

Figure 8. Sample calculations of determinants.

There are other methods for evaluating determinants (triangularization, reduction methods, etc.).
For large matrices there are plenty of software solutions to choose from.
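As a sanity check, the n = 2 and n = 3 expansion rules can be coded directly and compared against a library routine (the example matrices are made up):

```python
import numpy as np

def det2(a):
    # order n = 2: two terms, one negative sign (n! = 2)
    return a[0, 0]*a[1, 1] - a[0, 1]*a[1, 0]

def det3(a):
    # order n = 3: six terms, three negative signs (n! = 6)
    return (a[0, 0]*a[1, 1]*a[2, 2] + a[0, 1]*a[1, 2]*a[2, 0] + a[0, 2]*a[1, 0]*a[2, 1]
          - a[0, 2]*a[1, 1]*a[2, 0] - a[0, 0]*a[1, 2]*a[2, 1] - a[0, 1]*a[1, 0]*a[2, 2])

A2 = np.array([[13.0, 5.0], [2.0, 4.0]])
A3 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
assert np.isclose(det2(A2), np.linalg.det(A2))
assert np.isclose(det3(A3), np.linalg.det(A3))
```
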

If the determinant of a square matrix is not zero, its matrix is described as a regular matrix. If
the determinant is zero, its matrix is described as a singular matrix. The problem of
transforming a regular matrix into a singular matrix is referred to as the eigenvalue problem.
The eigenvalue problem and two important concepts, eigenvalues and eigenvectors will be
explained in Part 3 of this tutorial.


## Tutorial Review

1. Create two different matrices A and B, both of order n = 2. Show that A*B and B*A
produce different results.
2. Consider the m = n = 2 matrix with the following elements: a11 = -18; a12 = 29; a21 = 30;
a22 = 4. Calculate its trace and its determinant. Is this a regular or a singular matrix? Is
it an invertible or noninvertible matrix?
3. Calculate the transpose matrices for the matrices shown in Figure 7. Calculate the
determinants of the transposed matrices. Are these regular or singular matrices? Are
they invertible or noninvertible matrices?

## References

1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).


# Matrix Tutorial 3: Eigenvalues and Eigenvectors

A tutorial on eigenvalues, eigenvectors and their properties.
Includes step-by-step how-to calculations. An introduction
to vector iteration, the Power Method and the Deflation
Method is provided.

Dr. E. Garcia
Mi Islita.com
Email | Last Update: 07/17/06

Topics

Putting Everything Together

The Eigenvalue Problem
Calculating Eigenvalues
Eigenvectors
Properties of Eigenvalues and Eigenvectors
Computing Eigenvectors from Eigenvalues
Computing Eigenvalues from Eigenvectors
The Power Method (Vector Iteration)
The Deflation Method
Why should we care about all this?
Tutorial Review
References
Putting Everything Together

In Part 1 of this three-part tutorial we defined different types of matrices. We covered digraphs,
stochastic matrices, and Markov chains. We also mentioned how some search engine
marketers have derived blogonomies out of these and similar concepts.

In Part 2 we covered matrix operations like addition, subtraction and multiplication of
matrices. We also discussed multiplication and division of matrices by a scalar and
calculation of determinants from square matrices. We mentioned that if a determinant has a
nonzero value, its matrix is described as regular, and that if a determinant has zero value, its
matrix is described as singular.

It is now time to put everything together, to demystify eigenvalues, eigenvectors, and present
some practical applications.

Consider a scalar matrix Z, obtained by multiplying an identity matrix by a scalar; i.e., Z =
c*I. Subtracting this from a regular matrix A gives a new matrix A - c*I.

Equation 1: A - Z = A - c*I

If its determinant is zero,

Equation 2: |A - c*I| = 0

then A has been transformed into a singular matrix. The problem of transforming a regular
matrix into a singular matrix is referred to as the eigenvalue problem.

Note that subtracting c*I from A is equivalent to subtracting a scalar c from the main
diagonal of A. For the determinant of the new matrix to vanish, the trace of A must be equal to
the sum of specific values of c. For which values of c?

## Calculating Eigenvalues

Figure 1 shows that the computation of eigenvalues is a straightforward process.

Figure 1. The eigenvalue problem.

In the figure we started with a matrix A of order n = 2 and subtracted from it the Z = c*I
matrix. Applying the method of determinants for m = n = 2 matrices discussed in Part 2 gives

|A - c*I| = c^2 - 17*c + 42 = 0

c1 = 3 and c2 = 14.

Note that c1 + c2 = 17, confirming that these characteristic values must add up to the trace of
the original matrix A (13 + 4 = 17).

The polynomial expression we just obtained is called the characteristic equation and the c
values are termed the latent roots or eigenvalues of matrix A.

Thus, subtracting either c1 = 3 or c2 = 14 from the principal of A results in a matrix whose
determinant vanishes (|A - c*I| = 0). Dividing each eigenvalue by the trace shows its relative weight:

c1/trace = 3/17 = 0.176 or 17.6%
c2/trace = 14/17 = 0.824 or 82.4%

Thus, c2 = 14 is the largest eigenvalue, accounting for more than 82% of the trace. The largest
eigenvalue of a matrix is also called the principal eigenvalue.
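Since the full matrix of Figure 1 is not reproduced here, we can still verify the arithmetic from the characteristic equation itself; `np.roots` solves c^2 - 17c + 42 = 0:

```python
import numpy as np

# coefficients of the characteristic equation c^2 - 17*c + 42 = 0
roots = np.roots([1.0, -17.0, 42.0]).real  # real roots; drop any tiny imaginary part
c1, c2 = np.sort(roots)                    # c1 = 3, c2 = 14
trace = c1 + c2                            # eigenvalues add up to the trace (17)
share = c2 / trace                         # principal eigenvalue's share of the trace
```
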

There are many scenarios, as in Principal Component Analysis (PCA) and Singular Value
Decomposition (SVD), in which some eigenvalues are so small that they are ignored. The
remaining eigenvalues are then added together to compute an estimated fraction. This estimate is
then used as a correlation criterion for the so-called Rank Two approximation.

SVD and PCA are techniques used in cluster analysis. In information retrieval, SVD is used
in Latent Semantic Indexing (LSI) while PCA is used in Information Space (IS). These will
be discussed in upcoming tutorials.

Now that the eigenvalues are known, these are used to compute the latent vectors of matrix
A. These are the so-called eigenvectors.

## Eigenvectors

Equation 1 can be rewritten for any eigenvalue i as

Equation 3: A - ci*I

Multiplying by a column vector Xi with the same number of rows as A and setting the result to zero gives

Equation 4: (A - ci*I)*Xi = 0

Thus, for every eigenvalue ci this equation constitutes a system of n simultaneous
homogeneous equations, and every such system of equations has an infinite number of solutions.
Corresponding to every eigenvalue ci is a set of eigenvectors Xi, the number of eigenvectors
in the set being infinite. Furthermore, eigenvectors that correspond to different eigenvalues
are linearly independent of one another.

## Properties of Eigenvalues and Eigenvectors

At this point it might be a good idea to highlight several properties of eigenvalues and
eigenvectors. The following pertain only to the matrices we are discussing here.

• the absolute value of a determinant (|detA|) is the product of the absolute values of the
eigenvalues of matrix A.
• c = 0 is an eigenvalue of A if A is a singular (noninvertible) matrix.
• if A is an n x n triangular matrix (upper triangular, lower triangular) or diagonal
matrix, the eigenvalues of A are the diagonal entries of A.
• A and its transpose matrix have the same eigenvalues.
• the eigenvalues of a symmetric matrix are all real.
• the eigenvectors of a symmetric matrix are orthogonal, but only for distinct eigenvalues.
• the dominant or principal eigenvector of a matrix is an eigenvector corresponding
to the eigenvalue of largest magnitude (for real numbers, largest absolute value) of
that matrix.
• for a transition matrix, the dominant eigenvalue is always 1.
• the smallest eigenvalue of matrix A is the reciprocal of the
largest eigenvalue of A^-1; i.e., of the inverse of A.
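A few of these properties can be spot-checked with NumPy (the matrices are small made-up examples):

```python
import numpy as np

S = np.array([[2.0, 1.0], [1.0, 2.0]])               # symmetric matrix
vals, vecs = np.linalg.eig(S)
assert np.allclose(np.sort(vals.real), [1.0, 3.0])   # symmetric => real eigenvalues
assert np.isclose(vecs[:, 0] @ vecs[:, 1], 0.0)      # distinct eigenvalues => orthogonal

T = np.array([[5.0, 7.0], [0.0, 2.0]])               # upper triangular
assert np.allclose(np.sort(np.linalg.eigvals(T).real), [2.0, 5.0])  # diagonal entries

A = np.array([[13.0, 5.0], [2.0, 4.0]])
assert np.allclose(np.sort(np.linalg.eigvals(A).real),
                   np.sort(np.linalg.eigvals(A.T).real))  # A and A^T share eigenvalues
```
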

If we know an eigenvalue its eigenvector can be computed. The reverse process is also
possible; i.e., given an eigenvector, its corresponding eigenvalue can be calculated.

## Computing Eigenvectors from Eigenvalues

Let's use the example of Figure 1 to compute an eigenvector for c1 = 3. From Equation 2 we
write

Figure 2. Eigenvectors for eigenvalue c1 = 3.

Note that c1 = 3 gives a set with an infinite number of eigenvectors. For the other eigenvalue, c2
= 14, we obtain

Figure 3. Eigenvectors for eigenvalue c2 = 14.

In addition, it is confirmed that |c1|*|c2| = |3|*|14| = 42 = |detA|.

As shown in Figure 4, plotting these vectors confirms that eigenvectors that correspond to
different eigenvalues are linearly independent of one another. Note that each eigenvalue
produces an infinite set of eigenvectors, all being multiples of a normalized vector. So, instead
of plotting candidate eigenvectors for a given eigenvalue, one could simply represent an entire
set by its normalized eigenvector. This is done by rescaling coordinates; in this case, by taking
coordinate ratios. In our example, the coordinates of these normalized eigenvectors are:

1. (0.5, -1) for c1 = 3.
2. (1, 0.2) for c2 = 14.

Figure 4. Eigenvectors for different eigenvalues are linearly independent.

Mathematicians love to normalize eigenvectors in terms of their Euclidean length (L), so that all
vectors are of unit length. To illustrate, in the preceding example the coordinates of the two
eigenvectors are (0.5, -1) and (1, 0.2). Their lengths are

for c1 = 3: L = [0.5^2 + (-1)^2]^(1/2) = 1.12
for c2 = 14: L = [1^2 + 0.2^2]^(1/2) = 1.02

and the unit-length eigenvectors are

for c1 = 3: (0.5/1.12, -1/1.12) = (0.4, -0.9)
for c2 = 14: (1/1.02, 0.2/1.02) = (1, 0.2)

You can do the same and normalize eigenvectors to your heart's content, but it is time-consuming
(and boring). Fortunately, software packages will return unit eigenvectors for
you by default.
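Normalization to unit length is one line with NumPy; the sketch below rescales the two eigenvectors from the example:

```python
import numpy as np

def unit(v):
    """Rescale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)

x1 = np.array([0.5, -1.0])   # eigenvector for c1 = 3
x2 = np.array([1.0, 0.2])    # eigenvector for c2 = 14
u1, u2 = unit(x1), unit(x2)  # u1 ~ (0.45, -0.89), u2 ~ (0.98, 0.20)
```
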

## Computing Eigenvalues from Eigenvectors

This is a lot easier to do. First we rearrange Equation 4. Since I*X = X we can write the general
expression

Equation 5: A*X = c*X

Now, to illustrate the calculations, let's use the example given by Professor C.J. (Keith) van
Rijsbergen in chapter 4, page 58, of his great book The Geometry of Information Retrieval (3).

Figure 5. Eigenvalue obtained from an eigenvector.

This result can be confirmed by simply computing the determinant of A and calculating the
latent roots. This should give two latent roots or eigenvalues, c = ±4^(1/2) = ±2. That is, one
eigenvalue must be c1 = +2 and the other must be c2 = -2. This also confirms that c1 + c2 =
trace of A, which in this case is zero.

## An Alternate Method: Rayleigh Quotients

An alternate method for computing eigenvalues from eigenvectors consists in calculating the
so-called Rayleigh Quotient

c = (X^T*A*X) / (X^T*X)

where X^T is the transpose of X.

For the example given in Figure 5, X^T*A*X = 36 and X^T*X = 18; hence, c = 36/18 = 2.

Rayleigh Quotients give you eigenvalues in a straightforward manner. You might want to use
this method instead of inspection or as a double-checking method. You can also use it in
combination with other iterative methods like the Power Method.
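Here is a minimal sketch of the Rayleigh Quotient; since van Rijsbergen's matrix is not reproduced here, the example uses a made-up matrix whose eigenvector is known:

```python
import numpy as np

def rayleigh_quotient(A, x):
    """c = (x^T * A * x) / (x^T * x)."""
    return (x @ A @ x) / (x @ x)

A = np.array([[13.0, 5.0], [2.0, 4.0]])
x = np.array([5.0, 1.0])          # an eigenvector of A: A @ x = 14 * x
c = rayleigh_quotient(A, x)       # recovers the eigenvalue 14
```
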

## The Power Method (Vector Iteration)

Eigenvalues can be ordered in terms of their absolute values to find the dominant or largest
eigenvalue of a matrix. Thus, if two distinct hypothetical matrices have the following sets of
eigenvalues

• 5, 8, -7; then |8| > |-7| > |5| and 8 is the dominant eigenvalue.
• 0.2, -1, 1; then |1| = |-1| > |0.2| and since |1| = |-1| there is no dominant eigenvalue.

One of the simplest methods for finding the largest eigenvalue and eigenvector of a matrix is
the Power Method, also called the Vector Iteration Method. The method fails if there is no
dominant eigenvalue.

In its basic form the Power Method is applied as follows:

1. Assign to the candidate matrix an arbitrary eigenvector with at least one element being
nonzero.
2. Compute a new eigenvector.
3. Normalize the eigenvector, where the normalization scalar is taken as an initial
eigenvalue estimate.
4. Multiply the original matrix by the normalized eigenvector to calculate a new
eigenvector.
5. Normalize this eigenvector, where the normalization scalar is taken as a new
eigenvalue estimate.
6. Repeat the entire process until the absolute relative error between successive
eigenvalues satisfies an arbitrary tolerance (threshold) value.

It cannot get any easier than this. Let's take a look at a simple example.
Figure 6. Power Method for finding an eigenvector with the largest eigenvalue.
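The steps above can be sketched as follows. This variant normalizes to unit length and uses the Rayleigh Quotient as the eigenvalue estimate at each pass (a common formulation); since the figure's matrix is not reproduced here, the example matrix is made up, with eigenvalues 3 and 14:

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Return the dominant eigenvalue and a unit eigenvector of A."""
    x = A[:, 0].astype(float).copy()   # seed: the first column of A
    c_old = 0.0
    for _ in range(max_iter):
        x = x / np.linalg.norm(x)      # normalize the eigenvector estimate
        y = A @ x                      # multiply the matrix by the estimate
        c = x @ y                      # Rayleigh-Quotient eigenvalue estimate
        if abs(c - c_old) < tol:       # stop when successive estimates agree
            return c, x
        c_old, x = c, y
    return c, x / np.linalg.norm(x)

A = np.array([[13.0, 5.0], [2.0, 4.0]])   # eigenvalues 3 and 14
c, x = power_method(A)                    # converges to the dominant pair
```
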

What we have done here is repeatedly apply a matrix to an arbitrarily chosen eigenvector. The
result converges nicely to the largest eigenvalue of the matrix; i.e.

Equation 6: A^k*Xi = ci^k*Xi

Figure 7 provides a visual representation of the iteration process obtained through the Power
Method for the matrix given in Figure 3. As expected, for its largest eigenvalue the iterated
vector converges to an eigenvector of relative coordinates (1, 0.20).
Figure 7. Visual representation of vector iteration.

It can be demonstrated that guessing an initial eigenvector in which its first element is 1 and
all others are zero produces in the next iteration step an eigenvector with elements being the
first column of the matrix. Thus, one could simply choose the first column of a matrix as an
initial seed.

Whether or not you try a matrix column as an initial seed, keep in mind that the rate of
convergence of the Power Method depends on the nature of the eigenvalues. For
closely spaced eigenvalues, the rate of convergence can be slow. Several methods for
improving the rate of convergence have been proposed (Shifted Iteration, Shifted Inverse
Iteration, or transformation methods). I will not discuss these at this time.

## The Deflation Method

There are different methods for finding subsequent eigenvalues of a matrix. I will discuss only
one of these: the Deflation Method. Deflation is a straightforward approach. Essentially, this
is what we do:

1. First, use the Power Method to find the largest eigenvalue and eigenvector of
matrix A.
2. Multiply the largest eigenvector by its transpose and then by the largest eigenvalue.
This produces the matrix Z* = c*X*X^T.
3. Compute a new matrix A* = A - Z* = A - c*X*X^T.
4. Apply the Power Method to A* to compute its largest eigenvalue. This in turn should
be the second largest eigenvalue of the initial matrix A.

Figure 8 shows deflation in action for the example given in Figures 1 and 2. After a few
iterations the method converges smoothly to the second largest eigenvalue of the matrix.
Neat!
Figure 8. Finding the second largest eigenvalue with the Deflation Method.
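The four steps can be sketched as follows. One caveat not stated above: this simple rank-one deflation (A - c*X*X^T with X of unit length) is only guaranteed to work for symmetric matrices, so the example below uses a made-up symmetric matrix with eigenvalues 3 and 1:

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Dominant eigenvalue and unit eigenvector via vector iteration."""
    x = A[:, 0].astype(float).copy()
    c_old = 0.0
    for _ in range(max_iter):
        x = x / np.linalg.norm(x)
        y = A @ x
        c = x @ y
        if abs(c - c_old) < tol:
            return c, x
        c_old, x = c, y
    return c, x / np.linalg.norm(x)

S = np.array([[2.0, 1.0], [1.0, 2.0]])  # symmetric; eigenvalues 3 and 1
c1, x1 = power_method(S)                # step 1: largest eigenvalue (3)
S_star = S - c1 * np.outer(x1, x1)      # steps 2-3: A* = A - c*X*X^T
c2, x2 = power_method(S_star)           # step 4: next eigenvalue (1)
```
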

Note. We want to thank Mr. William Cotton for pointing out an error in the original
version of this figure, which was then compounded in the calculations. These have since been
corrected. After the corrections, deflation was still able to reach the right second
eigenvalue of c = 3. Results can be double-checked using Rayleigh Quotients.

We can use deflation to find subsequent eigenvector-eigenvalue pairs, but there is a point
wherein rounding error reduces the accuracy below acceptable limits. For this reason other
methods, like Jacobi's Method, are preferred when one needs to compute many or all
eigenvalues of a matrix.

## Why should we care about all this?

Armed with this knowledge, you should be able to better understand articles that discuss link
models like PageRank, their advantages and limitations, and when these succeed or fail and why.
The assumption of these models is that surfing the web by jumping from link to link is
like a random walk describing a Markov chain process over a set of linked web pages.

The matrix is considered the transition probability matrix of the Markov chain, with
elements strictly between zero and one. For such matrices the Perron-Frobenius Theorem tells
us that the largest eigenvalue of the matrix is equal to one (c = 1) and that the corresponding
eigenvector, which satisfies the equation

Equation 7: A*X = X

exists and is the principal eigenvector (state vector) of the Markov chain, with the elements
of X being the PageRanks. Thus, according to theory, iteration should enable one to compute
the largest eigenvalue and this principal eigenvector, whose elements are the PageRank values of the
individual pages.

## Beware of Link Model Speculators

If you are interested in reading how PageRank is computed, stay away from speculators,
especially from search engine marketers. It is hard to find accurate explanations in SEO or
SEM forums or from those that sell link-based services. I suggest instead that you read university
research articles from those that have conducted serious research work on link graphs and
PageRank-based models. Great explanations are all over the place. However, some of these
are derivative work and might not reflect how Google actually implements PageRank these
days (only those at Google know or should know this, or whether PageRank has been phased out for
something better). Still, these research papers are based on experimentation and their
results are verifiable.

There is a scientific paper I would like readers to at least consider: Link Analysis,
Eigenvectors and Stability, by Ng, Zheng and Jordan from the University of California,
Berkeley (5). In this paper the authors use many of the topics described herein to explain the
HITS and PageRank models. Regarding the latter they write:

Figure 9. PageRank explanation, according to Ng, Zheng and Jordan from the University of
California, Berkeley.

Note that the last equation in Figure 9 is of the form A*X = X as in Equation 7; that is, p is
the principal eigenvector (p = X) and can be obtained through iterations.

After completing this 3-part tutorial you should be able to grasp the gist of this paper. The
group even made an interesting connection between HITS and LSI (latent semantic indexing).

If you are a student and are looking for a good term paper on Perron-Frobenius Theory and
PageRank computations, I recommend the term paper by Jacob Miles Prystowsky and
Levi Gill, Calculating Web Page Authority Using the PageRank Algorithm (6). This paper
discusses PageRank and some how-to calculations involving the Power Method we have
described.

How many iterations are required to compute PageRank values? Only Google knows.
According to this Perron-Frobenius review from Professor Stephen Boyd of Stanford (7),
the original paper on Google claims that for 24 million pages 50 iterations were required. A
lot of things have changed since then, including methods for improving PageRank and new
flaws discovered in this and similar link models. These flaws have been the result of the
commercial nature of the Web. Not surprisingly, models that work well under controlled
conditions and free from noise often fail miserably when transferred to a noisy environment.
These topics will be discussed in detail in upcoming articles.

Meanwhile, if you are still thinking that this entire numerical apparatus validates the notion
that on the Web links can be equated to votes of citation importance, or that the treatment
validates the link citation-literature citation analogy a la Eugene Garfield's Impact Factors,
think again. This has been one of the biggest fallacies around, promoted by many link
spammers, a few IRs and several search engine marketers with vested interests.

Literature citation and Impact Factors are driven by editorial policies and peer reviews. On the
Web anyone can add/remove/exchange links at any time for any reason whatsoever. In such an
environment, far from the controlled conditions observed in a computer lab, peer review and
citation policies are almost absent or at best contaminated by commercialization. Evidently,
under such circumstances the link citation-literature citation analogy, or the notion that a link
is a vote of citation importance for the content of a document, cannot be sustained.

## Tutorial Review

1. Prove that a scalar matrix Z can be obtained by multiplying an identity matrix I by a
scalar c; i.e., Z = c*I.
2. Prove that subtracting c*I from a regular matrix A is equivalent to subtracting a scalar c
from the diagonal of A.
3. Given the following matrix,

Prove that these are indeed the three eigenvalues of the matrix. Calculate the
corresponding eigenvectors.
4. Use the Power Method to calculate the largest eigenvalue of the matrix given in
Exercise 3.
5. Use the Deflation Method to calculate the second largest eigenvalue of the matrix
given in Exercise 3.

## References

1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).
3. The Geometry of Information Retrieval; C.J. (Keith) van Rijsbergen, Cambridge
(2004).
4. Lecture 8: Eigenvalue Equations; S. Xiao, University of Iowa.
5. Link Analysis, Eigenvectors and Stability; Ng, Zheng and Jordan from the University
of California, Berkeley.
6. Calculating Web Page Authority Using the PageRank Algorithm; Jacob Miles
Prystowsky and Levi Gill; College of the Redwoods, Eureka, CA (2005).
7. Perron-Frobenius review; Stephen Boyd; EE363: Linear Dynamical Systems, Stanford
University, Winter Quarter (2005-2006).

# Matrix Tutorial 1: Stochastic Matrices

A matrix tutorial. Includes, square, triangular, scalar,
transpose, and stochastic matrices. Also covers rank of a
matrix, digraphs and Markov chains.
Dr. E. Garcia
Mi Islita.com
Email | Last Update: 07/11/06

Topics

Principal and Trace of a Square Matrix
Row Vectors, Column Vectors, Scalar and Transpose Matrices
The Rank of a Matrix
Demystifying Stochastic Matrices
Digraphs, Indegrees and Outdegrees
SEO Blogonomies: The Search Engine Markov Chain
What's Next?
Tutorial Review
References

This tutorial introduces matrices, eigenvalues, and eigenvectors to IR students and search
engine marketers. In Part 1 we go through some definitions and familiarize readers with
different types of matrices. Emphasis is given to stochastic matrices. In Part 2 we stop
momentarily to explain some basic matrix operations. Part 3 demystifies eigenvalues and
eigenvectors, showing how to calculate these.

We hope that presenting the material in this order, i.e., visualization of matrices first, followed
by matrix operations, might help students to associate math operations with what they have
already visualized. Currently, many matrix tutorials intermingle execution with visualization,
forcing students to stop and do a one-by-one mapping between text and graphics before
processing new material. In our opinion that approach injects an unnecessary level of
difficulty into the discourse.

By separating visualization from execution, by the end of this tutorial the reader will be able
to discriminate between different types of matrices. Students will be able to identify key
concepts such as the rank of a matrix, digraphs, Markov chains, and other key concepts
without resorting to math operations.

We do not pretend to make a comprehensive review out of this tutorial. Rather, the material is
limited to what we think might be relevant to link models and cluster structures. Applications
and examples are provided.

Most of the material and examples are taken from two great books (1, 2) I read way back
while in grad school and before the inception of commercial search engines (Google, Yahoo,
MSN, etc) in the Web scene:

1. Graphical Exploratory Data Analysis; S.H.C du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw
Hill (1991).

Why did we write this tutorial for an audience consisting of IR students and search marketers?
Well, there are plenty of reasons. Consider this:

• matrices simplify the handling of routine business model calculations.
• eigenvectors can be used to understand link models and networks.
• eigenvalues, eigenvectors, stochastic matrices, Markov chains, etc. are used to
understand dissimilar random processes.
• often students and search engine marketers find these concepts too abstract or
complex to understand.
• research articles about these topics are often misquoted in SEM discussion
forums/SEO blogs and key concepts become "blogonomies".

Thus, one of the goals of this tutorial is to help our audience grasp these concepts, while we
dispel some myths. In this way, next time a reader has an encounter with these topics he/she
can grasp the gist of the discourse -- or at least a good portion of it.

## Principal and Trace of a Square Matrix

Let us first define what a matrix is and go through some basic definitions.

A matrix is just a rectangular array of rows (m) and columns (n); that is, a table. Thus, tabular
data entered into an Excel spreadsheet can be viewed as a matrix. If you run a mom-n-pop
business and for some reason have arranged numbers or letters in rows and columns, you
have been working with matrices.

If a matrix has the same number of rows (m) and columns (n), it is termed a square matrix;
i.e., m = n. The matrix is said to be of the nth order, or of order n. Thus, an array consisting of
two rows and two columns is a square matrix of order m = n = 2, and an array consisting of
three rows and three columns is a square matrix of order m = n = 3.

Elements of a matrix are identified by assigning subscripts to rows and columns. Thus, for
matrix A its elements are aij. For instance, a32 means the element in row 3, column 2.

The diagonal extending from the upper-left corner to the lower-right corner of a square matrix
is termed the principal. The elements of the principal are termed the principal elements or
diagonal elements. The sum of the principal elements is the trace of the matrix. The trace is an
important concept, as we will see in Part 1 and Part 2 of this tutorial. These concepts are
illustrated in Figure 1.
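These definitions translate into a few lines of code. The matrix below is a made-up example, not the one shown in Figure 1:

```python
# A 3x3 square matrix as a list of rows (hypothetical values,
# not those of Figure 1).
A = [
    [4, 1, 7],
    [2, 5, 0],
    [3, 6, 9],
]

# Element a_ij lives at A[i-1][j-1], since Python indexes from 0.
a32 = A[2][1]                                  # row 3, column 2 -> 6

# The principal (main diagonal) runs from upper-left to lower-right.
principal = [A[i][i] for i in range(len(A))]   # [4, 5, 9]

# The trace is the sum of the principal elements.
trace = sum(principal)                         # 4 + 5 + 9 = 18
```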

## Row Vectors, Column Vectors, Scalar and Transpose Matrices

A one-row matrix is called a row vector. Similarly, a one-column matrix is termed a column
vector. A null matrix is one with all elements equal to zero.

A matrix in which all nondiagonal elements have zero value is a diagonal matrix. If all
elements of a diagonal matrix are equal, we call this a scalar matrix. If all elements of a
scalar matrix are 1 this is termed a unit matrix or an identity matrix, I.
A transpose matrix AT is obtained by converting rows into columns and columns into rows.
Some of these definitions are illustrated in Figure 2.
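As a quick sketch of these special matrices (all values below are made-up examples, not those of Figure 2):

```python
# A diagonal matrix: all nondiagonal elements are zero.
diagonal = [[2, 0, 0],
            [0, 5, 0],
            [0, 0, 7]]

# A scalar matrix: a diagonal matrix with equal diagonal elements.
scalar = [[3, 0, 0],
          [0, 3, 0],
          [0, 0, 3]]

# The identity matrix I: a scalar matrix whose diagonal elements are 1.
identity = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]

# The transpose: rows become columns and columns become rows.
A = [[1, 2, 3],
     [4, 5, 6]]                          # a 2x3 matrix
A_T = [list(col) for col in zip(*A)]     # its 3x2 transpose
```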

## The Rank of a Matrix

A matrix in which all elements above or below the principal have zero value is a triangular
matrix. Moreover, a triangular matrix is classified as lower-triangular or upper-triangular,
respectively, according to whether the zero elements lie above or below the principal.

The rank of a matrix is equal to the number of linearly independent rows or linearly
independent columns it contains, whichever of these two numbers is smaller. Accordingly, the
rank of a square matrix is equal to the number of nonzero rows in its upper-triangular matrix
or the number of nonzero columns in its equivalent lower-triangular matrix, whichever of
these two numbers is smaller.

Figure 3 shows a square matrix and its equivalent triangular matrix. The latter was obtained by
subjecting the matrix to elementary column operations. Don't worry for now about
transforming a square matrix into a triangular matrix. What is important is the following: since
B contains 3 nonzero columns, A is of rank 3.
Figure 3. Rank of a square matrix.

Another way of computing the rank of a matrix involves the use of singular values. This will
be discussed in an upcoming tutorial on Singular Value Decomposition (SVD).
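For readers who want to check a rank numerically right away, NumPy's `matrix_rank` function computes it from singular values, the same idea just mentioned. The matrix here is a made-up example (its third row is the sum of the first two), not the matrix of Figure 3:

```python
import numpy as np

# A made-up 3x3 matrix whose third row equals the sum of the first
# two rows, so only two rows are linearly independent.
A = np.array([[1, 2, 3],
              [0, 1, 4],
              [1, 3, 7]])

# matrix_rank counts the nonzero singular values of A.
rank = np.linalg.matrix_rank(A)   # -> 2
```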

Some search marketers quote research articles about link analysis without knowing that the
term "rank" of a link graph is used in those articles in reference to the
rank of a matrix, not in reference to any web page ranks (i.e., the positioning of search
results). The next thing one reads from these marketers is what we call a bunch of blogonomies.
We call a "blogonomy" the dissemination of false knowledge through blogs or public forums,
and "blogorrhea" when a false concept is promoted for a profit.

## Demystifying Stochastic Matrices

If all elements of a matrix are nonnegative, we can normalize its rows by adding the row
elements together and dividing each element by the corresponding row total. Obviously, the
normalized elements of a row then add up to 1. In general, a matrix in which every row (or
every column) sums to 1 is called a stochastic matrix. Individual elements of a
stochastic matrix can be zero, as long as the row totals (or column totals) equal 1. See Figure 4.

Figure 4. Stochastic matrices.

Since a stochastic matrix can also be obtained by normalizing columns, authors often use the
expressions row-stochastic matrix and column-stochastic matrix in order to distinguish
between the two cases. The expression doubly stochastic matrix is reserved for square
matrices in which both the rows and the columns sum to 1. This is the case when both matrix
A and its transpose AT are stochastic.
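The row-normalization procedure described above can be sketched as follows (the starting matrix is a made-up nonnegative example):

```python
# A nonnegative matrix with made-up values.
A = [
    [2, 1, 1],
    [0, 3, 1],
    [5, 0, 5],
]

# Divide each element by its row total to get a row-stochastic matrix.
row_stochastic = []
for row in A:
    total = sum(row)
    row_stochastic.append([x / total for x in row])

# Every row of the result now sums to 1.
row_sums = [sum(row) for row in row_stochastic]
```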

## Digraphs, Indegrees and Outdegrees

A directed graph, or digraph, consists of a number of points (nodes) linked together by arrows
or lines, also called edges. Arrows indicate the direction of the relationship between two
nodes. The number of arrows ending at a specific node is called the indegree of the node, and
the number of arrows leading from it is called the outdegree.

To illustrate these concepts, let me use the example presented by the authors of Graphical
Exploratory Data Analysis (1) from 1986.

## Figure 5. Digraph for the friendship between six persons.

Here they represented the friendship between six individuals as a digraph (any similarity with
link graphs flying around?). The direction of the arrows says it all: 1, 3 and 6 consider 2 a
friend, but 2 is friendly only with 3.

The following array describes how the nodes are related. Note that row totals give outdegrees
and column totals give indegrees.
Figure 6. Indegrees and outdegrees for the friendship between six persons.

When these types of relationships are represented in matrix notation, the resultant array is
called an adjacency matrix. Dividing each row element of the adjacency matrix by the
corresponding outdegree yields a row-stochastic matrix.
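As a sketch, here is an adjacency matrix built only from the friendships explicitly mentioned in the text (1→2, 3→2, 6→2 and 2→3; the full digraph of Figure 5 contains additional edges), together with its outdegrees, indegrees, and the derived row-stochastic matrix:

```python
# adj[i][j] = 1 means person i+1 considers person j+1 a friend.
# Only the edges explicitly mentioned in the text are included here.
n = 6
adj = [[0] * n for _ in range(n)]
for i, j in [(1, 2), (3, 2), (6, 2), (2, 3)]:
    adj[i - 1][j - 1] = 1

outdegrees = [sum(row) for row in adj]        # row totals
indegrees = [sum(col) for col in zip(*adj)]   # column totals

# Divide each row by its outdegree to get a row-stochastic matrix;
# rows with outdegree 0 are left as all zeros in this sketch.
row_stochastic = [
    [x / d if d else 0.0 for x in row]
    for row, d in zip(adj, outdegrees)
]
```

Person 2 ends up with indegree 3 (arrows from 1, 3 and 6) and outdegree 1 (the single arrow to 3), matching the description of Figure 6.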

## Markov Chains and Link Models

Now that we have the basic ideas clarified, let's move forward and talk about random
processes.

A random process is a process or series of events that occur by chance. If the process evolves
in time, it is called a Markov chain. Looking at some of the stochastic matrices we have
derived: if instead of mere numbers the elements represent probabilities pij, these are called
transition probabilities. The corresponding matrix is termed a transition matrix.

Therefore, it can be said that a Markov chain is just a random process evolving in time
according to its transition probabilities.

## SEO Blogonomies: The Search Engine Markov Chain

The spreading of incorrect knowledge, or at best inaccurate representations of concepts, is
prevalent in circles associated with search engine optimization (SEO). This social
phenomenon is most notorious in the blogosphere and through public forums (sites and
discussion forums). Because of this, we call the phenomenon "blogonomies". We are
currently compiling a list of the most notorious blogonomies spread over the search engine
marketing world.

Many blogonomies are promoted by well-known SEO and SEM specialists. These folks are
called "experts" by their followers and pose as such at their SEM conferences. They often
quote each other or call each other "experts". Many of these folks like to walk the fine line of
fallacies, producing material where false concepts are decorated with scientific terms and
"fat" words. They are also experts in damage control and in saving face.

We are not interested in investigating what actually motivates the phenomenon of
blogonomies, since that is self-evident. What we want is to make the reader aware of the
phenomenon. As a sample of what you could expect to see listed in our SEO Blogonomies,
here is one: The Search Engine Markov Chain Blogonomy.

Some SEOs have written --giving readers the impression-- that search engines use a mythical
Markov Chain to find patterns in search results or sites, as if such a chain were a
special kind of detection instrument, tool or technique applied to find keyword patterns
in a web page or to detect how a document was optimized. This is pure nonsense.

There is no such thing as a Search Engine Markov Chain; it exists only in the minds of these
folks and their followers, who often misquote research articles. A Markov chain is
simply a random process that occurs over time according to some transition probabilities.

Suppose we run an experiment that has N possible results (states). Suppose that we keep
repeating the experiment and that the probability of each result or state occurring on
the (n+1)th repetition depends only on the result of the nth repetition of the experiment. This
is called a Markov chain.
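This definition translates almost directly into a simulation. The two-state chain below uses made-up transition probabilities; the point is simply that each new state is drawn using only the current state:

```python
import random

random.seed(0)  # reproducible runs

# Transition matrix with made-up probabilities:
# P[i][j] = probability of moving from state i to state j.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]

def step(state):
    """Pick the next state using only the current state's row of P."""
    return 0 if random.random() < P[state][0] else 1

# The (n+1)th state depends only on the nth state -- nothing earlier.
chain = [0]
for _ in range(1000):
    chain.append(step(chain[-1]))
```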

Thus a Markov chain is not an instrument, technique, tool or the like that is allegedly used by
search engines to rank web pages or to find word patterns in documents. It is true that there is
a lot of research in which things have been modeled as Markov processes in an attempt at
better understanding behaviors and link graphs, but the analogy stops there.

It is true that there is something called an absorbing Markov chain, but this is a specific case
involving random walks with absorbing states. Perhaps it might be a good idea to write a
tutorial on regular and absorbing Markov chains or, better, to recommend that readers
take a look at James T. Sandefur's book, Discrete Dynamical Systems, Theory and
Applications (Oxford University Press; Chapter 6, Absorbing Markov Chains) (3). If you like
fractals, chaos and iterations, this book is for you.

Meanwhile, if while drunk you have ever walked randomly from one point to another, chances
are that you have already "markov-chained" yourself.

## What's Next?

What does all this discourse have to do with web links (linked web pages)? Well, consider a
random walk over a set of linked pages. Such a walk can be defined by a transition matrix,
which in this case is the link matrix. The largest eigenvector of the transition matrix tells us
the probabilities of the walk ending on the candidate pages. To understand the significance of
this statement, we first need to define what we mean by the largest eigenvector and how
eigenvectors are computed. This, and the calculations involved, will be explained step by step
in Parts 2 and 3 of this tutorial.
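As a small preview of what Parts 2 and 3 will develop, here is a hedged sketch of the idea: repeatedly multiplying a probability vector by a row-stochastic transition matrix (power iteration) drives it toward the eigenvector associated with the largest eigenvalue, which for a stochastic matrix is 1. The three-page link structure below is invented purely for illustration:

```python
import numpy as np

# Row-stochastic transition matrix for a random walk over three
# hypothetical linked pages (made-up link structure).
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

# Power iteration: start from a uniform probability vector and keep
# applying the transition matrix until the vector stops changing.
# The fixed point is the eigenvector for eigenvalue 1 and gives the
# long-run probabilities of the walk visiting each page.
v = np.ones(3) / 3
for _ in range(200):
    v = v @ P

stationary = v / v.sum()
```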

## Tutorial Review

1. What is the difference between a diagonal, a scalar and an identity matrix?
2. Which of the following is a square matrix: an array of 3 rows and 2 columns, or an
array of 10 rows and 10 columns?
3. Look at the business or sport section of a newspaper and try to find tabular data that
could be represented as a square matrix. Calculate its trace and transpose.
4. Derive a column-stochastic matrix from Figure 6.
5. Look at your site map. Try to derive a digraph from your site map or link structure.
For this exercise, consider only your pages, ignoring third-party links (this of course
will represent an ideal scenario). Compute a row-stochastic matrix or a column-
stochastic matrix. Have fun.
6. What is a Markov chain?

## References

1. Graphical Exploratory Data Analysis; S.H.C. du Toit, A.G.W. Steyn and R.H. Stumpf,
Springer-Verlag (1986).
2. Handbook of Applied Mathematics for Engineers and Scientists; Max Kurtz, McGraw-
Hill (1991).
3. Discrete Dynamical Systems, Theory and Applications; James T. Sandefur, Oxford
University Press; Chapter 6, Absorbing Markov Chains (1990).