
MULTIVARIATE ASSIGNMENT 1:

This assignment conducts a principal component analysis (PCA) on the stocks of six companies from two
industries: TCS, Wipro and Infosys from the IT industry, and Raj TV, TV18 and NDTV from the media
industry.

Data: The data were taken from www.capitalline.com, and Stata was used to conduct the analysis.

NOTE: All commands are highlighted in red, and each is followed by its results.

mean Infosys tcs wipro raj_tv tv_18 ndtv

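Mean estimation                          Number of obs   =     64

-------------+-----------------------------------------------
             |       Mean  Std. Err.     [95% Conf. Interval]
-------------+-----------------------------------------------
     Infosys |   .0003643   .0017447    -.0031221    .0038507
         Tcs |   .0012121   .0023266    -.0034371    .0058613
       Wipro |  -.0064159   .0065461    -.0194971    .0066654
      raj_tv |   .0005371    .002129    -.0037174    .0047916
       tv_18 |   .0002465   .0035455    -.0068387    .0073317
        Ndtv |    -.00174   .0025351    -.0068059    .0033259
-------------+-----------------------------------------------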

pwcorr Infosys tcs wipro raj_tv tv_18 ndtv

The above command generates the correlation matrix of the six variables.

             |  Infosys      tcs    wipro   raj_tv    tv_18     ndtv
-------------+------------------------------------------------------
     Infosys |   1.0000
         Tcs |   0.6609   1.0000
       Wipro |   0.2555   0.3443   1.0000
      raj_tv |   0.3425   0.3779   0.1864   1.0000
       tv_18 |   0.1686   0.0774   0.0465   0.2068   1.0000
        Ndtv |   0.3176   0.3128   0.1032   0.5056   0.3735   1.0000

pca infosys tcs wipro ndtv rajtv tv18

By default, the pca command uses the above correlation matrix to generate the principal components.
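To make concrete what pca does with the correlation matrix, here is a minimal sketch in Python (not part of the original Stata session) that recovers the largest eigenvalue and its eigenvector by power iteration. The matrix entries are the rounded pwcorr values shown above; the function name `power_iteration` and the step count of 200 are assumptions chosen for illustration, and Stata itself may use a different numerical method.

```python
import math

# Correlation matrix from the pwcorr output above (rounded to 4 decimals).
# Row/column order: Infosys, tcs, wipro, raj_tv, tv_18, ndtv.
LOWER = [
    [1.0000],
    [0.6609, 1.0000],
    [0.2555, 0.3443, 1.0000],
    [0.3425, 0.3779, 0.1864, 1.0000],
    [0.1686, 0.0774, 0.0465, 0.2068, 1.0000],
    [0.3176, 0.3128, 0.1032, 0.5056, 0.3735, 1.0000],
]
N = len(LOWER)
# Mirror the lower triangle to obtain the full symmetric matrix.
R = [[LOWER[max(i, j)][min(i, j)] for j in range(N)] for i in range(N)]


def power_iteration(matrix, steps=200):
    """Return (dominant eigenvalue, unit eigenvector) of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n  # start from equal weights
    for _ in range(steps):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue for the unit vector v.
    rv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * rv[i] for i in range(n))
    return eigenvalue, v


eigenvalue, weights = power_iteration(R)
trace = sum(R[i][i] for i in range(N))

print(f"largest eigenvalue: {eigenvalue:.4f}")
print(f"share of trace:     {eigenvalue / trace:.4f}")
print("weights:", [round(x, 4) for x in weights])
```

Up to rounding in the inputs, the eigenvalue and weights printed here should come out close to the Comp1 row and column that pca reports in the output that follows.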

Principal components/correlation          Number of obs    =        64
                                          Number of comp.  =         6
                                          Trace            =         6
    Rotation: (unrotated = principal)     Rho              =    1.0000

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference         Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |      2.51376      1.32904             0.4190       0.4190
       Comp2 |      1.18472      .384656             0.1975       0.6164
       Comp3 |      .800066     .0749677             0.1333       0.7498
       Comp4 |      .725098      .271143             0.1208       0.8706
       Comp5 |      .453955      .131558             0.0757       0.9463
       Comp6 |      .322397            .             0.0537       1.0000
--------------------------------------------------------------------------


The above table displays the eigenvalue of each component and the proportion of total variance
accounted for by each principal component, in the Eigenvalue and Proportion columns respectively.

An eigenvalue can be read in two ways. First, it is the variance of the corresponding component.
Second, the higher the eigenvalue, the higher the proportion of total variance accounted for by that
component, as the Proportion column shows: because the analysis uses the correlation matrix, the total
variance equals the trace (6), and each proportion is the eigenvalue divided by 6. In our case the first
component (Comp1) has the highest eigenvalue, 2.51376, and a corresponding proportion of 0.4190, so
about 42% of the total variance is accounted for by the first component. The second component (Comp2)
has an eigenvalue of 1.18472 and a proportion of 0.1975. Together, the first two components account for
a proportion of 0.6164, that is, about 62% of the total variance, as seen in the Cumulative column. This
is a reasonable share, suggesting that the first two components are enough to replace the original six
variables. The pca command also gives the eigenvector table below.
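As a cross-check on the Proportion and Cumulative columns discussed above, the following Python sketch (an illustration, not part of the original Stata session) recomputes them from the reported eigenvalues. The variable names are assumptions; the eigenvalues are copied from the pca output.

```python
# Eigenvalues from the pca output above (correlation matrix, six variables).
eigenvalues = [2.51376, 1.18472, 0.800066, 0.725098, 0.453955, 0.322397]

# The total variance equals the trace of the correlation matrix (6),
# up to rounding in the reported eigenvalues.
total = sum(eigenvalues)

# Proportion of total variance per component, and its running sum.
proportions = [ev / total for ev in eigenvalues]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for k, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"Comp{k}: eigenvalue={ev:.6f}  proportion={p:.4f}  cumulative={c:.4f}")
```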

Principal components (eigenvectors)

-------------+-------------------------------------------------------------+-------------
    Variable |     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 | Unexplained
-------------+-------------------------------------------------------------+-------------
     Infosys |    0.4808   -0.2601   -0.2381    0.4603    0.0445    0.6562 |           0
         Tcs |    0.4880   -0.3667   -0.2001    0.2256   -0.0290   -0.7318 |           0
       Wipro |    0.2842   -0.4478    0.7802   -0.2978   -0.0925    0.1125 |           0
        Ndtv |    0.4325    0.4398   -0.0940   -0.2472   -0.7398    0.0469 |           0
       Rajtv |    0.4449    0.1959   -0.2113   -0.5926    0.6045    0.0497 |           0
        tv18 |    0.2547    0.6046    0.4909    0.4861    0.2754   -0.1284 |           0
-------------+-------------------------------------------------------------+-------------

In the above table, the Comp1 column gives the weights that the first component assigns to the
respective stocks. TCS has a weight of 0.4880, the greatest weight of any variable in the first
component. Infosys, however, has a weight of 0.4808, which differs from that of TCS by only 0.0072, so
we can conclude that TCS and Infosys are about equally important to the first principal component. The
next most important variable is Rajtv with a weight of 0.4449, followed closely by Ndtv with 0.4325. The
remaining variables, Wipro and tv18, have weights of 0.2842 and 0.2547 respectively.

The second component captures the difference between the industries: it represents a contrast between
the software stocks (Infosys, Wipro and TCS), which all have negative weights, and the media stocks
(Ndtv, Rajtv and tv18), which all have positive weights. Thus, the variation captured by this component
is industry specific, and it might be called an industry component. Looking at the weights for each
stock in component 2, we can say that the two industries move in opposite directions on this component,
while the companies within each industry move in the same direction.
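The industry contrast described above can be read off mechanically from the signs of the Comp2 weights. The Python sketch below is only an illustration; the dictionary names and the industry labels "IT" and "media" are assumptions based on the introduction, and the weights are copied from the eigenvector table.

```python
# Comp2 weights copied from the eigenvector table above.
comp2 = {
    "Infosys": -0.2601,
    "Tcs": -0.3667,
    "Wipro": -0.4478,
    "Ndtv": 0.4398,
    "Rajtv": 0.1959,
    "tv18": 0.6046,
}

# Industry membership as stated in the introduction (labels are illustrative).
industry = {
    "Infosys": "IT", "Tcs": "IT", "Wipro": "IT",
    "Ndtv": "media", "Rajtv": "media", "tv18": "media",
}

# Collect the set of signs (-1 or +1) that appears within each industry.
signs = {}
for stock, weight in comp2.items():
    sign = 1 if weight > 0 else -1
    signs.setdefault(industry[stock], set()).add(sign)

for name, found in signs.items():
    label = "same sign" if len(found) == 1 else "mixed signs"
    print(f"{name}: signs {sorted(found)} ({label})")
```

A single sign within each industry, and opposite signs across the two, is what makes the component a contrast between industries.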

SCREEPLOT

We can also assess the number of principal components with the screeplot command, which gives a scree
plot (a graphical presentation) of the eigenvalues, as below.

[Figure: Scree plot of eigenvalues after pca. Vertical axis: Eigenvalues (.5 to 2.5); horizontal axis: Number (1 to 6).]

Each point in the above diagram indicates the eigenvalue of the respective component: the 1st point
represents the 1st component, the 2nd point the 2nd component, and so on. The curve becomes a little
flatter after the third point, so we conclude that the first two components are enough to replace the
original six variables.
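The flattening of the scree curve can also be checked numerically. The Python sketch below (an illustration, not part of the original Stata session) recomputes the drop between successive eigenvalues, which corresponds to the Difference column of the pca output. It also lists the components whose eigenvalue exceeds 1, a common rule of thumb for correlation-based PCA (the Kaiser criterion) that the original analysis does not use and that is added here only as an assumption for comparison.

```python
# Eigenvalues from the correlation-based pca output.
eigenvalues = [2.51376, 1.18472, 0.800066, 0.725098, 0.453955, 0.322397]

# Drop from each eigenvalue to the next (the Difference column).
drops = [a - b for a, b in zip(eigenvalues, eigenvalues[1:])]
for k, d in enumerate(drops, start=1):
    print(f"Comp{k} -> Comp{k + 1}: drop of {d:.6f}")

# Rule of thumb (not in the original analysis): keep components whose
# eigenvalue is above 1, the average eigenvalue of a correlation matrix.
above_one = [k for k, ev in enumerate(eigenvalues, start=1) if ev > 1.0]
print("components with eigenvalue > 1:", above_one)
```

Under this rule of thumb, the same two components are retained as in the scree plot reading above.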

predict pc1 pc2

Having assessed the principal components, let us use the predict command to score (predict) the first
two principal components, pc1 and pc2.

(score assumed)
(4 components skipped)

Scoring coefficients
    Sum of squares (column-loading) = 1

-------------+------------------------------------------------------------
    Variable |     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6
-------------+------------------------------------------------------------
     Infosys |    0.4808   -0.2601   -0.2381    0.4603    0.0445    0.6562
         Tcs |    0.4880   -0.3667   -0.2001    0.2256   -0.0290   -0.7318
       Wipro |    0.2842   -0.4478    0.7802   -0.2978   -0.0925    0.1125
      raj_tv |    0.4449    0.1959   -0.2113   -0.5926    0.6045    0.0497
       tv_18 |    0.2547    0.6046    0.4909    0.4861    0.2754   -0.1284
        Ndtv |    0.4325    0.4398   -0.0940   -0.2472   -0.7398    0.0469
-------------+------------------------------------------------------------
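The scoring coefficients above are the weights used to turn each observation into component scores. The Python sketch below illustrates the idea under two assumptions: the inputs are already standardized (as is usual when the components come from the correlation matrix), and the observation used is hypothetical, not taken from the data. The names `COEF`, `column_sum_of_squares` and `score` are illustrative.

```python
# Scoring coefficients (Comp1, Comp2) copied from the table above.
COEF = {
    "Infosys": (0.4808, -0.2601),
    "Tcs": (0.4880, -0.3667),
    "Wipro": (0.2842, -0.4478),
    "raj_tv": (0.4449, 0.1959),
    "tv_18": (0.2547, 0.6046),
    "Ndtv": (0.4325, 0.4398),
}


def column_sum_of_squares(index):
    """Sum of squared coefficients for one component (0 = Comp1, 1 = Comp2)."""
    return sum(c[index] ** 2 for c in COEF.values())


def score(z):
    """Return (pc1, pc2) for one observation of standardized values z."""
    pc1 = sum(COEF[name][0] * z[name] for name in COEF)
    pc2 = sum(COEF[name][1] * z[name] for name in COEF)
    return pc1, pc2


# The output header states that each column's sum of squares is 1.
ss1 = column_sum_of_squares(0)
ss2 = column_sum_of_squares(1)
print(f"sum of squares: Comp1={ss1:.4f}  Comp2={ss2:.4f}")

# Hypothetical observation: every stock one standard deviation above its mean.
z_example = {name: 1.0 for name in COEF}
pc1, pc2 = score(z_example)
print(f"pc1={pc1:.4f}  pc2={pc2:.4f}")
```

Because all Comp1 weights are positive, a day on which every stock is above its mean gives a large pc1, while the opposite signs in Comp2 largely cancel and leave pc2 small.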

Note that the two principal components have zero correlation: the information carried by one principal
component is not present in the other. This can be checked with the correlate command, as done below.

correlate pc1 pc2

(obs=64)

             |      pc1      pc2
-------------+------------------
         pc1 |   1.0000
         pc2 |  -0.0000   1.0000
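The zero correlation follows from the weight vectors being orthogonal. If v1 and v2 are eigenvectors of the correlation matrix R with eigenvalues l1 and l2, the covariance of the two scores is v1'Rv2 = l2 * (v1'v2), so the implied correlation is sqrt(l2 / l1) * (v1'v2). The Python sketch below (an illustration with assumed variable names) evaluates this from the reported, rounded weights and eigenvalues.

```python
import math

# Comp1 and Comp2 weights in the order Infosys, Tcs, Wipro, raj_tv, tv_18, Ndtv.
comp1 = [0.4808, 0.4880, 0.2842, 0.4449, 0.2547, 0.4325]
comp2 = [-0.2601, -0.3667, -0.4478, 0.1959, 0.6046, 0.4398]

# Eigenvalues of Comp1 and Comp2 from the pca output.
l1, l2 = 2.51376, 1.18472

# Inner product of the two weight vectors; zero means they are orthogonal.
dot = sum(a * b for a, b in zip(comp1, comp2))

# Correlation between the scores implied by the weights and eigenvalues.
implied_corr = math.sqrt(l2 / l1) * dot

print(f"v1'v2 = {dot:.5f}")
print(f"implied correlation of pc1 and pc2 = {implied_corr:.5f}")
```

The small nonzero remainder comes from the weights being rounded to four decimals in the output.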


Note that these principal components were obtained from the correlation matrix, which is the default.
However, the above analysis can be done using the covariance matrix as well.

pca infosys tcs wipro ndtv rajtv tv18, covariance

The above command uses the covariance matrix rather than the correlation matrix, which is the default.

Principal components/covariance           Number of obs    =        64
                                          Number of comp.  =         6
                                          Trace            =  .0047896
    Rotation: (unrotated = principal)     Rho              =    1.0000

--------------------------------------------------------------------------
   Component |   Eigenvalue   Difference         Proportion   Cumulative
-------------+------------------------------------------------------------
       Comp1 |    .00283293    .00187371             0.5915       0.5915
       Comp2 |   .000959223   .000459268             0.2003       0.7917
       Comp3 |   .000499955   .000241179             0.1044       0.8961
       Comp4 |   .000258776   .000101065             0.0540       0.9502
       Comp5 |   .000157711  .0000766927             0.0329       0.9831
       Comp6 |  .0000810182            .             0.0169       1.0000
--------------------------------------------------------------------------

From the above table we can see that the first component (Comp1) has the highest eigenvalue,
.00283293, and a corresponding proportion of 0.5915, so about 59% of the total variance is accounted
for by the first component. The second component (Comp2) has an eigenvalue of .000959223 and a
proportion of 0.2003. Together, the first two components account for a proportion of 0.7917, that is,
about 79% of the total variance, as seen in the Cumulative column. This is a reasonable share,
suggesting that the first two components are enough to replace the original six variables. The pca
command also gives the eigenvector table below.

Principal components (eigenvectors)

-------------+-------------------------------------------------------------+-------------
    Variable |     Comp1     Comp2     Comp3     Comp4     Comp5     Comp6 | Unexplained
-------------+-------------------------------------------------------------+-------------
     Infosys |    0.0847    0.1551    0.3246    0.3988   -0.0802    0.8354 |           0
         Tcs |    0.1457    0.1615    0.5350    0.5972   -0.1010   -0.5474 |           0
       Wipro |    0.9790   -0.1290   -0.1401   -0.0703   -0.0129    0.0114 |           0
        Ndtv |    0.0648    0.4378    0.3952   -0.5911   -0.5463   -0.0116 |           0
       Rajtv |    0.0800    0.2529    0.4010   -0.2981    0.8245    0.0105 |           0
        tv18 |    0.0499    0.8232   -0.5214    0.2027    0.0701   -0.0453 |           0
-------------+-------------------------------------------------------------+-------------

Notice that Wipro has been assigned a weight of 0.9790, the greatest weight of any variable in the
first component. TCS has the next highest weight, 0.1457. Thus, we can conclude that Wipro has a far
greater impact on the first component than any other variable. The other variables, Infosys, NDTV,
Rajtv and TV18, are assigned weights of 0.0847, 0.0648, 0.0800 and 0.0499 respectively by the first
principal component.
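One way to see why Wipro dominates the covariance-based first component is to compare the variances of the six series, since a covariance-based PCA is not scale free. The Python sketch below (an illustration, not part of the original Stata session) recovers each variance from the Std. Err. column of the mean output, using the fact that the standard error of a mean is the standard deviation divided by the square root of the number of observations. The variable names are assumptions.

```python
# Number of observations and standard errors from the mean output.
N_OBS = 64
std_err = {
    "Infosys": 0.0017447,
    "Tcs": 0.0023266,
    "Wipro": 0.0065461,
    "raj_tv": 0.002129,
    "tv_18": 0.0035455,
    "Ndtv": 0.0025351,
}

# Std. Err. = SD / sqrt(n), so variance = SD**2 = Std. Err.**2 * n.
variance = {name: se * se * N_OBS for name, se in std_err.items()}

# The sum of the variances is the trace of the covariance matrix.
total = sum(variance.values())
share = {name: v / total for name, v in variance.items()}

for name in sorted(variance, key=variance.get, reverse=True):
    print(f"{name:8s} variance={variance[name]:.6f}  share={share[name]:.3f}")
print(f"total variance (trace) = {total:.7f}")
```

The total should agree with the Trace reported by pca with the covariance option, and Wipro alone accounts for more than half of it, which is consistent with its weight of 0.9790 on the first component.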

An interesting thing to note is component two, which, as already mentioned, was the industry component
representing a contrast between the two industries under the correlation matrix. Under the covariance
matrix the signs of some weights have changed: Infosys and TCS now have positive weights (0.1551 and
0.1615), like the media stocks, while Wipro alone keeps a negative weight (-0.1290). In this sense,
Wipro does not correlate well within its industry.

Note that the weights assigned to the stocks in the first component have also changed: initially TCS
and Infosys had almost equal weights, but now Wipro dominates the other variables in the first
component. Now let us look at the scree plot to assess the number of principal components to be
considered.

screeplot

[Figure: Scree plot of eigenvalues after pca. Vertical axis: Eigenvalues (0 to .003); horizontal axis: Number (1 to 6).]

In the above diagram, too, the curve becomes flatter after the third point, so we conclude that the
first two components are enough to replace the original six variables.

Having assessed the principal components, we can go on with the predict command as usual.

Note: Although the eigenvalues and the weights assigned to each variable change, the number of
components that replace the original data does not change in this analysis. With the covariance
matrix, as with the correlation matrix, two principal components replace the original six variables.

Assignment by:

Vamse goutam.v (2nd Msc General Economics)

Pavithra Narayanan (2nd Msc General Economics)

Shruti Shekhar (2nd Msc General Economics)

Dhruv mehrotra (2nd Msc Financial Economics)
