An assignment on conducting a principal component analysis (PCA) of the stocks of six companies from
two industries: TCS, Wipro and Infosys from the IT industry, and Raj TV, TV18 and NDTV from the
media industry.
Data: The data were taken from www.capitalline.com, and Stata was used to conduct the analysis.
NOTE: All commands are highlighted in red and are followed by their results.
Infosys 1.0000
The pca command uses the above correlation matrix, which is the default, to generate the principal
components.
Number of comp. = 6
Trace = 6
The results can be read from the eigenvalue and proportion columns. A component's eigenvalue can be
interpreted in two ways: it is the variance of that component, and the higher the eigenvalue, the
higher the proportion of total variance accounted for by that component, as the proportion column
shows. In our case the first component (comp1) has the highest eigenvalue, 2.51376, and a
corresponding proportion of 0.4190, so about 41.9% of the total variance is accounted for by comp1.
The second component (comp2) has an eigenvalue of 1.18472 and a proportion of 0.1975. Together, comp1
and comp2 account for a proportion of 0.6164, that is, about 61.6% of the total variance, as shown in
the cumulative column. This is a reasonable share, suggesting that the first two components alone are
enough to replace the original six variables. The pca command also reports the eigenvector table,
which is discussed next.
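The proportion and cumulative figures follow directly from the eigenvalues: with a correlation
matrix the trace equals the number of variables (6 here), and each proportion is the eigenvalue
divided by the trace. A minimal Python sketch of that arithmetic, using only the two eigenvalues
quoted in the text (the variable names are illustrative and not part of the Stata output):

```python
# Eigenvalues of the first two components, as quoted from the pca output.
eigenvalues = [2.51376, 1.18472]

# For a correlation matrix the trace equals the number of variables.
trace = 6

# Proportion of total variance explained by each component.
proportions = [value / trace for value in eigenvalues]

# Running total of the proportions (the cumulative column).
cumulative = []
running_total = 0.0
for share in proportions:
    running_total += share
    cumulative.append(running_total)

for index, (share, total) in enumerate(zip(proportions, cumulative), start=1):
    print(f"comp{index}: proportion = {share:.4f}, cumulative = {total:.4f}")
```

Rounded to four decimals, this reproduces the 0.4190, 0.1975 and 0.6164 reported by Stata.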
In the eigenvector table, the comp1 column gives the weights that the first component assigns to the
respective stock prices. TCS has a weight of 0.4880, the greatest weight of any variable in the first
component. Infosys, however, has a weight of 0.4808, which differs from that of TCS by only 0.0072, so
we can conclude that TCS and Infosys are almost equally important to the first principal component.
The next most important variable is Rajtv, with a weight of 0.4449, followed closely by NDTV at
0.4325. Wipro and TV18 have the lowest weights, 0.2842 and 0.2547 respectively.
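As a quick sanity check on the comp1 weights, the sketch below ranks them and sums their squares.
It assumes the reported eigenvector is scaled to unit length, which the quoted figures are
consistent with. The dictionary keys are just labels chosen for this illustration.

```python
# First-component (comp1) weights as quoted in the text.
comp1_weights = {
    "TCS": 0.4880,
    "Infosys": 0.4808,
    "Rajtv": 0.4449,
    "NDTV": 0.4325,
    "Wipro": 0.2842,
    "TV18": 0.2547,
}

# A unit-length eigenvector has squared weights that sum to 1.
sum_of_squares = sum(weight ** 2 for weight in comp1_weights.values())

# Rank the stocks by the absolute size of their weight in comp1.
ranking = sorted(comp1_weights, key=lambda name: abs(comp1_weights[name]), reverse=True)

# Gap between the two largest weights.
gap = comp1_weights["TCS"] - comp1_weights["Infosys"]

print(f"sum of squared weights = {sum_of_squares:.4f}")
print("ranking:", ranking)
print(f"TCS - Infosys = {gap:.4f}")
```

The sum of squares comes out within rounding error of 1, and the gap between TCS and Infosys is the
0.0072 mentioned above.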
The second component essentially captures the difference between the industries: it represents a
contrast between the software stocks (Infosys, Wipro and TCS) and the media stocks (NDTV, Rajtv and
TV18). It might therefore be called an industry component, and it suggests that a notable part of the
variation in these stocks is industry specific. Looking at the values associated with each stock in
component 2, we can say that the two industries move in different directions, while the companies
within each industry move in a similar way.
screeplot
We can also assess the number of principal components with the screeplot command, which gives a scree
plot (a graphical presentation) of the eigenvalues, as below.
[Scree plot: eigenvalues plotted against component number (1 to 6).]
Each point in the above diagram indicates the eigenvalue of the respective component: the 1st point
represents the 1st component, the 2nd point the 2nd component, and so on. After the third point the
curve becomes flatter, so we can conclude that the first two components are enough to replace the
original six variables.
predict pc1 pc2
Having assessed the principal components, let's use the predict command to score (that is, to
predict) the first two principal components, pc1 and pc2.
(score assumed)
(4 components skipped)
Scoring coefficients
Note that the two principal components have zero correlation: the variation captured by one principal
component is not repeated in the other. This can be checked with the correlate command, as done
below.
(obs=64)
pc1 pc2
pc1 1.0000
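The zero correlation between component scores can also be illustrated outside Stata. The sketch
below is a toy two-variable case and does not use the assignment's data. The two return series are
hypothetical, and it relies on the fact that a 2x2 correlation matrix with non-zero correlation has
eigenvectors proportional to (1, 1) and (1, -1). The helper names are made up for this example.

```python
import math

def standardize(values):
    """Return the values rescaled to mean 0 and sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation(a, b):
    """Sample Pearson correlation between two equally long series."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)

# Hypothetical return series for two stocks (not taken from the assignment).
x = [0.010, -0.020, 0.015, 0.030, -0.005, 0.000, 0.020, -0.010]
y = [0.008, -0.015, 0.020, 0.018, -0.012, 0.004, 0.011, -0.002]

zx, zy = standardize(x), standardize(y)

# Unit-length eigenvectors of a 2x2 correlation matrix: (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
w = 1 / math.sqrt(2)
pc1 = [w * (a + b) for a, b in zip(zx, zy)]
pc2 = [w * (a - b) for a, b in zip(zx, zy)]

r_xy = correlation(x, y)
r_pc = correlation(pc1, pc2)
print(f"corr(x, y)     = {r_xy:.4f}")
print(f"corr(pc1, pc2) = {r_pc:.6f}")
```

The original series are correlated, while the correlation between the two scores should be zero up
to floating-point rounding, which is the same property the correlate output shows for pc1 and pc2.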
The pca command is now rerun using the covariance matrix rather than the correlation matrix, which is
the default.
Number of comp. = 6
Trace = .0047896
From the output we can see that the first component (comp1) has the highest eigenvalue, 0.00283293,
and a corresponding proportion of 0.5915, so about 59.2% of the total variance is accounted for by
comp1. The second component (comp2) has an eigenvalue of 0.000959223 and a proportion of 0.2003.
Together, comp1 and comp2 account for a proportion of 0.7917, that is, about 79.2% of the total
variance, as shown in the cumulative column. This is a reasonable share, suggesting that the first two
components alone are enough to replace the original six variables. The pca command also gives us the
eigenvector table, as below.
Principal components (eigenvectors)
Notice that Wipro has been assigned a weight of 0.9790, the greatest weight of any variable in the
first component. TCS has the next highest weight, 0.1457. We can therefore conclude that Wipro has a
far greater impact on the first component than any other variable. The other variables, Infosys,
NDTV, Rajtv and TV18, are assigned weights of 0.0847, 0.0648, 0.0800 and 0.0499 respectively by the
first principal component.
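One common reason for a single variable to dominate a covariance-based first component is that its
series has a much larger variance than the others. Whether this holds for Wipro has not been checked
against the data here. The sketch below illustrates the general effect with two hypothetical
variables whose variances and covariance are made-up numbers. The helper leading_eigenvector is
written for this example and uses the closed form for a symmetric 2x2 matrix.

```python
import math

def leading_eigenvector(a, b, d):
    """Unit-length eigenvector for the largest eigenvalue of [[a, b], [b, d]], with b != 0."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    largest = mean + radius
    # (b, largest - a) satisfies the first row of (A - largest * I) v = 0.
    vx, vy = b, largest - a
    length = math.hypot(vx, vy)
    return vx / length, vy / length

# Hypothetical variances and covariance of two return series (not from the assignment).
var_volatile = 0.0025
var_calm = 0.0004
covariance = 0.0003

# Correlation implied by the figures above.
corr = covariance / math.sqrt(var_volatile * var_calm)

# First component from the covariance matrix and from the correlation matrix.
cov_vector = leading_eigenvector(var_volatile, covariance, var_calm)
corr_vector = leading_eigenvector(1.0, corr, 1.0)

print("covariance-based weights: ", [round(w, 4) for w in cov_vector])
print("correlation-based weights:", [round(w, 4) for w in corr_vector])
```

In this toy case the covariance-based component loads almost entirely on the high-variance variable,
while the correlation-based component weights the two variables equally. This is the same pattern
as the shift from near-equal TCS and Infosys weights to a dominant Wipro weight.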
An interesting point concerns component two. As already mentioned, it is the industry component,
which represents a contrast between the two industries. Note that the signs of its values have
changed under the covariance matrix compared with the correlation matrix. In addition, Wipro does not
correlate well with the other stocks in its industry.
Note that the weights assigned in comp1 have also changed: initially TCS and Infosys had almost equal
weights, but now Wipro dominates the other variables in the first component. Now let's look at the
screeplot to assess the number of principal components to be considered.
screeplot
[Scree plot: eigenvalues plotted against component number (1 to 6).]
In the above diagram, too, the curve becomes flatter after the third point, so we can conclude that
the first two components are enough to replace the original six variables.
Having assessed the principal components, we can go on with the predict command as usual.
Note: Although the eigenvalues and the weights assigned to each variable change, the number of
components needed to replace the original data does not change in this analysis. With the covariance
matrix, as with the correlation matrix, only two principal components replace the original six
variables.
Assignment by: