Correspondence analysis has highly flexible data requirements: the only strict requirement is a rectangular data matrix with non-negative entries. Beyond this, the method is most effective when certain further conditions on the table are satisfied.
With the facilities available it is easy to obtain one of the main outputs researchers are interested in, namely the scatterplot representing the row and/or column points projected onto a subspace chosen by the user. It must be noted, however, that in order to interpret the CA scatterplot and to have a sound comprehension of the data structure, examining that graph alone is not enough. The user must also consult a number of statistics reported on screen in tabular form, and in some cases perform additional calculations from those raw statistics.
Primitive matrix

Profiles

Row profiles
Each row of the contingency table is divided by its row total, so that each row profile sums to 1. The mass-weighted average of the row profiles (the row centroid) is the vector of column masses.

Column profiles
Each column of the contingency table is divided by its column total, so that each column profile sums to 1; their mass-weighted average is the vector of row masses.

Masses
The row mass is r_i = n_i+ / n and the column mass is c_j = n_+j / n, where n_i+ and n_+j are the row and column totals and n is the grand total.

Correspondence matrix
The matrix P of relative frequencies, with elements p_ij = n_ij / n.
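These quantities are straightforward to compute; a minimal sketch in Python with NumPy, using a small hypothetical 3×4 table:

```python
import numpy as np

# Hypothetical 3x4 contingency table of counts (any non-negative matrix works).
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)

n = N.sum()           # grand total
P = N / n             # correspondence matrix, p_ij = n_ij / n
r = P.sum(axis=1)     # row masses,    r_i = n_i+ / n
c = P.sum(axis=0)     # column masses, c_j = n_+j / n

R = P / r[:, None]        # row profiles: rows of P divided by the row masses
C = (P / c[None, :]).T    # column profiles, one profile per row of C

# Every profile sums to 1, and the mass-weighted average of the row
# profiles (the row centroid) equals the vector of column masses.
assert np.allclose(R.sum(axis=1), 1.0)
assert np.allclose(C.sum(axis=1), 1.0)
assert np.allclose(r @ R, c)
```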
Distances
The distance between two row (or column) profiles is the Chi-square distance. It differs from the usual Euclidean distance in that each squared difference between profile elements is weighted by the inverse of the corresponding mass; for two row profiles i and i':

d²(i, i') = Σ_j (1 / c_j) (p_ij / r_i − p_i'j / r_i')²

Essentially, the reason for choosing the Chi-square distance is that it satisfies the principle of distributional equivalence: if two rows (or columns) with identical profiles are merged into one, the Chi-square distances between the column (or row) profiles remain unchanged.
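The principle can be checked numerically. In the hypothetical table below, one column is exactly twice another (so the two columns have identical profiles); merging them leaves the Chi-square distances between the row profiles unchanged:

```python
import numpy as np

def chi2_dists(N):
    """Pairwise Chi-square distances between the row profiles of N."""
    P = N / N.sum()
    r = P.sum(axis=1)
    c = P.sum(axis=0)
    R = P / r[:, None]
    I = N.shape[0]
    D = np.zeros((I, I))
    for i in range(I):
        for k in range(I):
            D[i, k] = np.sqrt(np.sum((R[i] - R[k]) ** 2 / c))
    return D

# Hypothetical table in which the third column is exactly twice the
# second, i.e. the two columns have identical profiles.
N = np.array([[2., 4.,  8., 6.],
              [3., 1.,  2., 5.],
              [5., 5., 10., 4.]])

# Merge the two proportional columns into one.
N_merged = np.column_stack([N[:, 0], N[:, 1] + N[:, 2], N[:, 3]])

# Distributional equivalence: row-to-row distances are unchanged.
assert np.allclose(chi2_dists(N), chi2_dists(N_merged))
```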
Inertia
The moment of inertia of, say, the row cloud is the mass-weighted sum of the squared Chi-square distances of the profiles to their centroid:

Inertia = Σ_i r_i d²(i, centroid)

Total inertia
Expanding the Chi-square distances gives

Total inertia = Σ_i Σ_j (p_ij − r_i c_j)² / (r_i c_j) = χ² / n

where χ² is the Pearson Chi-square statistic of the table.
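Using the notation above (P for the correspondence matrix, r and c for the masses), the identity between total inertia and the Pearson Chi-square statistic can be verified on a small hypothetical table:

```python
import numpy as np

# Hypothetical contingency table.
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)
n = N.sum()
P = N / n
r = P.sum(axis=1)
c = P.sum(axis=0)

# Total inertia: mass-weighted sum of squared Chi-square distances of
# the row profiles to their centroid, expanded into the double sum.
inertia = np.sum((P - np.outer(r, c)) ** 2 / np.outer(r, c))

# Classical Pearson chi-square statistic of the same table.
E = np.outer(N.sum(axis=1), N.sum(axis=0)) / n   # expected counts
chi2 = np.sum((N - E) ** 2 / E)

assert np.isclose(inertia, chi2 / n)             # total inertia = chi^2 / n
```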
Reduction of Dimensionality
The criterion used for dimensionality reduction is that the inertia of the cloud projected onto the optimal subspace is maximal, although it is still less than the inertia in the full space. What is lost in this process is the knowledge of how far, and in which direction, the profiles lie off this subspace. What is gained is a view of the profiles that would otherwise not be possible. The ratio of the inertia within the subspace to the total inertia measures the accuracy of representation of the cloud in the subspace.
Each row and each column makes a contribution to the total inertia, called the row inertia and the column inertia respectively. The principal inertia of an axis is the inertia of the row (or column) points projected onto that axis; the component of a row or column inertia along a principal axis is thus that point's contribution to the principal inertia.
Since the frequencies in each row must sum to the row total, and those in each column to the column total, there are in a sense only J − 1 independent entries in each row and I − 1 independent entries in each column of the contingency table. Thus, the maximum number of eigenvalues that can be extracted from a two-way table equals min[I − 1, J − 1]. If we choose to extract (i.e. interpret) this maximum number of dimensions, we can reproduce exactly all the information contained in the table.
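This bound is easy to confirm numerically: the singular values of the matrix of standardized residuals (introduced in the Notation section) include at most min[I − 1, J − 1] nonzero values. A sketch on a hypothetical 3×4 table:

```python
import numpy as np

# Hypothetical 3x4 contingency table.
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)
P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

# Standardized residuals; their nonzero singular values give the CA axes.
A = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
sv = np.linalg.svd(A, compute_uv=False)

# At most min(I-1, J-1) nonzero singular values: here min(2, 3) = 2,
# so the smallest of the three returned values is numerically zero.
k = int(np.sum(sv > 1e-12))
assert k <= min(N.shape[0] - 1, N.shape[1] - 1)
assert sv[-1] < 1e-8
```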
How many axes are significant and should be retained for further analysis or interpretation? Here "significant" means worth studying in detail, not significant in the sense of statistical tests. Two types of factor axes are considered: first-order factor axes and second-order factor axes. First-order factor axes are selected on the basis of their contributions to the total variance (or inertia), whereas second-order factor axes are selected on the basis of contributions to the eccentricity, that is, the squared cosines cos²α.
Correspondence analysis yields eigenvalues for the min[I − 1, J − 1] factor axes; the eigenvalues are ranked in decreasing order of magnitude.
After having selected the first-order factor axes, the second-order factor axes are selected as follows: let α be the rank of a factor axis for which a point i of N(I) and/or a point j of N(J) exists such that

cos²α(i) ≥ k   or   cos²α(j) ≥ k

where k is a threshold fixed by the user.
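A sketch of how the squared cosines can be computed (the table is hypothetical and the threshold k is whatever the user fixes):

```python
import numpy as np

# Hypothetical contingency table.
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)
P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

# SVD of the standardized residuals gives the factor axes.
A = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, gamma, Vt = np.linalg.svd(A, full_matrices=False)
F = (U * gamma) / np.sqrt(r)[:, None]     # principal coordinates of the rows

# cos^2 of row i on axis alpha: the share of that row's squared
# Chi-square distance from the centroid reproduced on the axis.
cos2 = F ** 2 / np.sum(F ** 2, axis=1, keepdims=True)

# The shares over all axes sum to 1 for every row; axis alpha is a
# second-order axis for row i when cos2[i, alpha] >= k for a chosen k.
assert np.allclose(cos2.sum(axis=1), 1.0)
```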
Explicative points
Outlier points
Graphics
The joint display of row and column points shows the relation between a point from one set and all the points of the other set, not the relation between individual points of the two different sets. Except in special cases, it is extremely dangerous to interpret the proximity of two points belonging to different sets.
Points of a cloud (or set) situated away from the origin but close to each other have similar profiles.
Notation
Contingency table N (I × J), with elements n_ij, row totals n_i+, column totals n_+j, and grand total n.
The matrix of row profiles can also be defined as the rows of the correspondence matrix P divided by their respective row sums (i.e. row masses), which can be written as:

Matrix of row profiles: R = D_r^{-1} P    (1)

Similarly, the matrix of column profiles is

C = D_c^{-1} P^T
The singular value decomposition (SVD) of a matrix A is

A = U Γ V^T,  Γ = diag(γ_1, γ_2, ..., γ_n),  γ_1 ≥ γ_2 ≥ ... ≥ γ_n ≥ 0    (2)

with

U^T U = I,  V^T V = I    (3)

where U^T is the transpose of U, and V^T is the transpose of V.
The weighted Euclidean distance between two points x and y in the metric D_q is

d²(x, y) = (x − y)^T D_q (x − y)    (4)
Let m be the vector of point masses (we have already assumed that the masses sum to one):

1^T m = 1    (5)
The principal coordinates of the points are

F = D_m^{-1/2} U Γ    (6)

The coordinates of the points in an optimal a-dimensional subspace are contained in the first a columns of F. The principal axes of this subspace are contained in the matrix

A = D_q^{-1/2} V
Here, we have two special cases of the above general result, viz. Row
problem and Column problem. These problems involve the reduction of
dimensionality of the row profiles and the column profiles, where each set of
points has its associated masses and Chi-square distances. Both these
problems reduce to singular value decomposition of the same matrix of
standardized residuals.
Row problem
The centroid of the row profiles, weighted by the row masses, is the vector of column masses:

r^T D_r^{-1} P = 1^T P = c^T

where c^T is the row vector of the column masses. The matrix to be decomposed for the row problem is

A = D_r^{1/2} (D_r^{-1} P − 1 c^T) D_c^{-1/2}    (7)

which simplifies to

A = D_r^{-1/2} (P − r c^T) D_c^{-1/2}    (8)
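The equivalence of the two expressions for A can be verified directly on a hypothetical table (the simplification works because D_r^{1/2} 1 = D_r^{-1/2} r):

```python
import numpy as np

# Hypothetical contingency table.
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)
P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
Dr_sqrt  = np.diag(np.sqrt(r))
Dr_isqrt = np.diag(1 / np.sqrt(r))
Dc_isqrt = np.diag(1 / np.sqrt(c))

# Equation (7): centred row profiles, weighted by the masses and the metric.
A7 = Dr_sqrt @ (np.diag(1 / r) @ P - np.outer(np.ones(len(r)), c)) @ Dc_isqrt

# Equation (8): the same matrix written directly in terms of P.
A8 = Dr_isqrt @ (P - np.outer(r, c)) @ Dc_isqrt

assert np.allclose(A7, A8)
# sqrt(r) lies in the left null space of A: the trivial dimension is removed.
assert np.allclose(np.sqrt(r) @ A8, 0)
```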
Column problem
By symmetry, the centroid of the column profiles, weighted by the column masses, is the vector of row masses, and the matrix to be decomposed can be written as

D_c^{1/2} (D_c^{-1} P^T − 1 r^T) D_r^{-1/2}    (10)

= D_c^{-1/2} (P^T − c r^T) D_r^{-1/2}    (11)

This is the transpose of the matrix A derived for the row problem. It follows that both the row problem and the column problem are solved by computing the singular value decomposition of the same matrix of standardized residuals:

A = U Γ V^T    (12)
The matrix of standardized residuals is

A = D_r^{-1/2} (P − r c^T) D_c^{-1/2}    (13)

whose elements are:

a_ij = (p_ij − r_i c_j) / √(r_i c_j)    (14)

Total inertia = Σ_i Σ_j a_ij² = Σ_i Σ_j (p_ij − r_i c_j)² / (r_i c_j)

Thus, there are k = min[I − 1, J − 1] dimensions in the solution. The squares of the singular values of A, i.e. the eigenvalues of A^T A or A A^T, also decompose the total inertia. These are denoted by λ_α = γ_α² and are called the principal inertias.
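That the principal inertias λ_α = γ_α² decompose the total inertia follows from the fact that the sum of the squared singular values equals the sum of the squared elements of A; a numerical check on a hypothetical table:

```python
import numpy as np

# Hypothetical contingency table.
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)
P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

# Standardized residuals a_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
A = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

gamma = np.linalg.svd(A, compute_uv=False)   # singular values
lam = gamma ** 2                             # principal inertias

total_inertia = np.sum(A ** 2)
assert np.isclose(lam.sum(), total_inertia)  # the lambdas decompose the inertia
```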
The principal coordinates of the rows and of the columns are

F = D_r^{-1/2} U Γ    (15)

G = D_c^{-1/2} V Γ    (16)

The standard coordinates of the rows are obtained by dividing out the singular values:

X = F Γ^{-1} = D_r^{-1/2} U    (17)

Each principal inertia is the mass-weighted sum of the squared principal coordinates on that axis:

λ_α = γ_α² = Σ_i r_i f_iα²    (18)

For the ith row, the inertia components over all k axes sum up to the row inertia of the ith row:

Σ_α r_i f_iα² = Σ_j (p_ij − r_i c_j)² / (r_i c_j)    (19)

The left-hand side of the above equation is identical to the sum of squared elements in the ith row of A, or

Σ_α r_i f_iα² = Σ_j a_ij²    (20)
The principal coordinates of the rows are obtained by applying Equation (6) to the row problem:

F = D_r^{-1/2} U Γ    (21)

or in scalar notation:

f_iα = γ_α u_iα / √r_i    (22)

The standard coordinates of the rows are the principal coordinates divided by their respective singular values:

x_iα = f_iα / γ_α    (23)

Similarly, the standard coordinates of the columns are the column principal coordinates divided by their respective singular values:

y_jα = g_jα / γ_α
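Putting the pieces together, a compact sketch (on a hypothetical table) that computes principal and standard coordinates and checks two defining properties: the row principal coordinates reproduce the Chi-square distances between the row profiles, and the standard coordinates have unit inertia along each axis:

```python
import numpy as np

# Hypothetical contingency table.
N = np.array([[20, 10,  5, 15],
              [10, 25, 10,  5],
              [ 5,  5, 30, 10]], dtype=float)
P = N / N.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)

A = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
U, gamma, Vt = np.linalg.svd(A, full_matrices=False)

F = (U * gamma) / np.sqrt(r)[:, None]     # row principal coordinates
G = (Vt.T * gamma) / np.sqrt(c)[:, None]  # column principal coordinates
X = U / np.sqrt(r)[:, None]               # row standard coordinates
Y = Vt.T / np.sqrt(c)[:, None]            # column standard coordinates

# Euclidean distances between rows of F equal the Chi-square
# distances between the corresponding row profiles.
Rp = P / r[:, None]
d_chi = np.sqrt(np.sum((Rp[0] - Rp[1]) ** 2 / c))
assert np.isclose(np.linalg.norm(F[0] - F[1]), d_chi)

# Standard coordinates have unit inertia along each axis.
assert np.allclose(X.T @ np.diag(r) @ X, np.eye(len(gamma)))
assert np.allclose(Y.T @ np.diag(c) @ Y, np.eye(len(gamma)))
```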