
Assignment Name – Statistics Advance

Question 1. Calculate the covariance and correlation between the two columns A and B below.

A    B
25   52
35   10
21   5
67   98
98   52
27   36
64   69

Answer:
Covariance -
Calculating the Covariance for A and B:
We start by finding the means of A and B (n = 7 pairs):
Mean of A = 337 / 7 ≈ 48.14
Mean of B = 322 / 7 = 46
To get the covariance, subtract each column's mean from its values, multiply the two deviations in each row, add up these products, and divide by n:

$$\operatorname{Cov}(A,B) = \frac{1}{n}\sum_{i=1}^{n}(A_i-\bar{A})(B_i-\bar{B})$$

$$\operatorname{Cov}(A,B) = \frac{(25-48.14)(52-46) + (35-48.14)(10-46) + (21-48.14)(5-46) + (67-48.14)(98-46) + (98-48.14)(52-46) + (27-48.14)(36-46) + (64-48.14)(69-46)}{7} = \frac{3303}{7}$$

Covariance = 471.8571

This is the population covariance (divisor n = 7). Dividing by n − 1 = 6 instead gives the sample covariance, 3303 / 6 = 550.5.
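The hand calculation can be checked with a short script. This is a minimal sketch in plain Python, assuming columns A and B hold the seven values used in the worked formula; the function name `population_covariance` is illustrative, not from any library. It divides by n, matching the 471.8571 result, and also shows the n − 1 (sample) version.

```python
def population_covariance(a, b):
    """Covariance dividing by n (population form), as in the hand calculation."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Multiply paired deviations from the means, sum them, then divide by n.
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


# Values of columns A and B as used in the worked formula.
A = [25, 35, 21, 67, 98, 27, 64]
B = [52, 10, 5, 98, 52, 36, 69]

cov_pop = population_covariance(A, B)
cov_sample = cov_pop * len(A) / (len(A) - 1)  # rescale to the n - 1 divisor
print(round(cov_pop, 4))     # 471.8571
print(round(cov_sample, 4))  # 550.5
```

Which divisor to report depends on whether the seven rows are the whole population or a sample from a larger one.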

Correlation -

Finding the correlation (Pearson's r) between A and B:

$$r = \frac{\sum_{i=1}^{n}(A_i-\bar{A})(B_i-\bar{B})}{\sqrt{\sum_{i=1}^{n}(A_i-\bar{A})^2}\;\sqrt{\sum_{i=1}^{n}(B_i-\bar{B})^2}}$$

The numerator is the same sum of cross-products as in the covariance, 3303 (= 471.8571 × 7). For the denominator, the sums of squared deviations are 4984.857 for A and 6382 for B, so

$$r = \frac{3303}{\sqrt{4984.857 \times 6382}} \approx \frac{3303}{5640.33} \approx 0.585604$$

Correlation = 0.585604
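The correlation formula can be checked the same way. This is a minimal sketch in plain Python, again assuming the seven (A, B) pairs from the worked formula; `pearson_r` is an illustrative name. It computes the sum of cross-products and the two sums of squares exactly as in the formula.

```python
import math


def pearson_r(a, b):
    """Pearson correlation: cross-product sum over the root of the two sums of squares."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))  # 3303 for this data
    sxx = sum((x - mean_a) ** 2 for x in a)                        # about 4984.857
    syy = sum((y - mean_b) ** 2 for y in b)                        # 6382
    return sxy / math.sqrt(sxx * syy)


A = [25, 35, 21, 67, 98, 27, 64]
B = [52, 10, 5, 98, 52, 36, 69]
print(round(pearson_r(A, B), 6))  # 0.585604
```

Because the 1/n factors cancel in the ratio, r comes out the same whether it is built from the population or the sample covariance.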

Question 2. What are the different ways to deal with multicollinearity?
Answer: In regression, "multicollinearity" refers to predictors that are correlated with other predictors. Multicollinearity occurs when your model includes multiple factors that are correlated not just with your response variable, but also with each other. In other words, it arises when some of your factors are partly redundant.
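How redundant each predictor is can be measured with the variance inflation factor (VIF): VIF_j = 1 / (1 − R_j²), where R_j² is the R-squared from regressing predictor j on the other predictors. A common rule of thumb treats values above about 5 or 10 as high. The sketch below assumes NumPy is available and uses hypothetical synthetic predictors `x1`, `x2`, `x3`; it relies on the identity that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (rows = observations).

    Uses the identity VIF_j = [R^-1]_jj, where R is the correlation
    matrix of the predictors; this equals 1 / (1 - R_j^2).
    """
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


# Hypothetical synthetic predictors, for illustration only.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly a copy of x1, so redundant
x3 = rng.normal(size=n)             # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

print(np.round(vif(X), 1))             # x1 and x2 get large VIFs
print(np.round(vif(X[:, [0, 2]]), 2))  # after dropping x2, both are near 1
```

Dropping one of the two redundant columns brings the remaining VIFs back to about 1, which is the effect the first remedy below relies on.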
Ways of dealing with multicollinearity:
- Remove highly correlated predictors from the model. If two or more factors have a high variance inflation factor (VIF), remove one of them. Because they supply redundant information, removing one of the correlated factors usually doesn't drastically reduce the R-squared. Consider using stepwise regression, best subsets regression, or specialized knowledge of the data set to decide which variables to remove. Select the model with the highest adjusted R-squared; plain R-squared never decreases when a predictor is added, so on its own it favors keeping the redundant factors.

- Use Partial Least Squares (PLS) regression or Principal Components Regression (regression on the components from a Principal Components Analysis). Both methods reduce the predictors to a smaller set of uncorrelated components.
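The component-based remedy can be sketched as principal components regression (PCR): standardize the predictors, project them onto the first k principal directions (computed here with an SVD), and fit ordinary least squares on those scores. This is a minimal NumPy sketch under stated assumptions: the data are synthetic, k = 2 is picked by hand, and `pcr_fit` is a hypothetical name. PLS is not shown; it differs in choosing components that also covary with the response.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Principal components regression with k components (a sketch).

    Returns the component scores and the fitted values.
    """
    # Standardize so every predictor contributes on the same scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    T = Z @ Vt[:k].T  # n x k scores; the columns are mutually uncorrelated
    # Ordinary least squares of y on an intercept plus the k scores.
    design = np.column_stack([np.ones(len(y)), T])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return T, design @ coef


# Hypothetical synthetic data with two nearly collinear predictors.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + x3 + 0.1 * rng.normal(size=n)

T, y_hat = pcr_fit(X, y, k=2)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(np.round(np.corrcoef(T, rowvar=False), 6))  # off-diagonal entries are ~0
print(round(r2, 3))
```

In practice k is usually chosen by cross-validation; scikit-learn provides `PCA` and `PLSRegression` classes for this kind of work.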
