
MULTIVARIATE ANALYSIS

SPSS OPERATION AND APPLICATION

STUDENT NAME: DENIZ YILMAZ


STUDENT NUMBER: M0987107

CONTENTS
1 INTRODUCTION

2 RELIABILITY ANALYSIS

3 CORRELATIONS

4 COMPARE MEANS

5 GENERAL LINEAR MODEL

6 FACTOR ANALYSIS
7 REGRESSION ANALYSIS
1.INTRODUCTION
 In this study the data set consists of 70 soccer players described by 12 variables: name, nationality, income, marital status, weight, height, performance, goals, age, red cards, yellow cards, and disabilities.

 SPSS allows for a great deal of flexibility in the data format.
 It provides the user with a comprehensive set of procedures for data transformation and file manipulation.
 It offers the researcher a large number of statistical analysis procedures commonly used in the social sciences.

2.RELIABILITY ANALYSIS

Reliability is the correlation of an item, scale, or instrument with a hypothetical one which truly measures what it is supposed to. Since the true instrument is not available, reliability is estimated in one of four ways:

 Internal consistency: Cronbach's alpha (see the sketch below)
 Split-half reliability: the Spearman-Brown coefficient
 Test-retest reliability: the correlation between scores from two administrations of the same instrument
 Inter-rater reliability: intraclass correlation, of which there are six types
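As an illustration of the internal-consistency idea, here is a minimal Python sketch of Cronbach's alpha computed from a hypothetical item-score matrix; the item scores are simulated assumptions, not data from this study.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-item scale answered by 70 players
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(70, 5)).astype(float)
print(round(cronbach_alpha(scores), 3))
```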
3.CORRELATIONS
 Correlations, Nonparametric Correlations
There are two types of correlations: bivariate and partial. A bivariate correlation is a correlation between two variables. A partial correlation looks at the relationship between two variables while controlling for the effect of one or more additional variables. Pearson's product-moment correlation coefficient and Spearman's rho are examples of bivariate correlation coefficients.
 Pearson’s Correlation Coefficient
Pearson correlation requires only that data are interval for it to be an accurate
measure of the linear relationship between two variables.
 Partial Correlation
Partial correlation is used to examine the relationship between two variables while controlling for the effects of one or more additional variables. The sign of the coefficient tells us the direction of the relationship, and the size of the coefficient describes the strength of the relationship (see the sketch below).
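A minimal Python sketch of the three coefficients just described, using scipy on simulated player data; the variable names echo the study's variables, but the numbers are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
age = rng.uniform(18, 40, 70)
incomes = 1000 + 50 * age + rng.normal(0, 200, 70)
goals = 120 - 1.5 * age + rng.normal(0, 10, 70)

r, p = stats.pearsonr(incomes, goals)         # bivariate Pearson r
rho, p_rho = stats.spearmanr(incomes, goals)  # nonparametric Spearman rho

# Partial correlation of incomes and goals, controlling for age:
# correlate the residuals after removing the linear effect of age.
res_inc = incomes - np.polyval(np.polyfit(age, incomes, 1), age)
res_goal = goals - np.polyval(np.polyfit(age, goals, 1), age)
r_partial, p_partial = stats.pearsonr(res_inc, res_goal)
print(round(r, 3), round(rho, 3), round(r_partial, 3))
```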
CORRELATIONS
The figure provides a matrix of the correlation coefficients for the three variables. Incomes is negatively related to goals. The output also shows that age is positively related to incomes. Finally, goals appear to be negatively related to age.
4. COMPARE MEANS
 One Sample T-Test
The one sample t-test is a statistical procedure used to test the difference between a sample mean and a known value of the population mean. In a one sample t-test, we know the population mean; we draw a random sample from the population, compare the sample mean with the population mean, and make a statistical decision as to whether or not the sample mean is different from the population mean.
In our table the population mean of AGE is 1.87. If we look at the significance value and see that it is less than the predetermined significance level, we can reject the null hypothesis and conclude that the population mean and the sample mean are statistically different. If it is greater than the predetermined significance level, we fail to reject the null hypothesis and conclude that the population mean and the sample mean are not statistically different.
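To make the decision rule concrete, here is a minimal Python sketch using scipy's one-sample t-test; the simulated AGE scores and the test value of 1.87 are illustrative assumptions, not this study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
age = rng.normal(2.0, 0.5, 70)   # hypothetical coded AGE values for 70 players

t_stat, p_value = stats.ttest_1samp(age, popmean=1.87)
alpha = 0.05
if p_value < alpha:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: reject H0, the sample mean differs from 1.87")
else:
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}: fail to reject H0")
```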
COMPARE MEANS
 Independent T-Test
The Independent Samples T Test compares the mean scores of two groups on a given variable.
Hypotheses:
Null: The means of the two groups are not significantly different.
Alternate: The means of the two groups are significantly different.
We have a closer look at GOALS: here I want to compare the mean goals of soccer players aged between 30 and 40 with those under 30 years old. In the Independent Samples Test output we see Levene's Test for Equality of Variances. This tells us whether we have met our second assumption (the two groups have approximately equal variance on the dependent variable). If Levene's Test is significant (the value under "Sig." is less than .05), the two variances are significantly different. If it is not significant (Sig. is greater than .05), the two variances are not significantly different; that is, the two variances are approximately equal. Here we see that the significance is .985, which is greater than .05, so we can assume that the variances are approximately equal. As the t value of -0.595 with 47 degrees of freedom is not significant (its p-value of 0.555 is greater than our 0.05 significance level), we fail to reject the null hypothesis: t(47) = -0.595, p = 0.555, NS.
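A minimal Python sketch of the same kind of comparison with scipy: Levene's test first, then the independent samples t-test. The two simulated groups stand in for the under-30 and 30-40 age groups; the numbers are made up.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
goals_under30 = rng.normal(95, 20, 25)   # hypothetical goal totals, players under 30
goals_30to40 = rng.normal(100, 20, 24)   # hypothetical goal totals, players aged 30-40

# Levene's test checks the equal-variance assumption first
lev_stat, lev_p = stats.levene(goals_under30, goals_30to40)

# Independent samples t-test; equal_var mirrors the "equal variances assumed" row
t_stat, p_value = stats.ttest_ind(goals_under30, goals_30to40,
                                  equal_var=(lev_p > 0.05))
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```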
COMPARE MEANS
 Paired T-Test
 The paired sample t-test is a statistical technique used to compare two population means in the case of two samples that are correlated.
 Hypotheses:
Null: the mean difference between the paired observations is zero.
Alternative: the mean difference between the paired observations is not zero.
 The level of significance:
In the paired sample t-test, after stating the hypotheses, we choose the level of significance. In most cases the significance level is 5% (see the sketch below).
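A minimal Python sketch of a paired comparison with scipy's ttest_rel; the two simulated measurements per player are assumptions used only to illustrate the mechanics.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
goals_first_half = rng.normal(10, 3, 70)                     # hypothetical per-player goals, first half of season
goals_second_half = goals_first_half + rng.normal(1, 2, 70)  # same players, second half (correlated samples)

t_stat, p_value = stats.ttest_rel(goals_first_half, goals_second_half)
alpha = 0.05   # the 5% significance level used in most cases
print(f"t = {t_stat:.3f}, p = {p_value:.3f}, reject H0: {p_value < alpha}")
```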
COMPARE MEANS
 One Way ANOVA
 The One-Way ANOVA compares the means of two or more groups based on one independent variable (or factor).
 Assumptions: the groups have approximately equal variance on the dependent variable. We can check this by looking at Levene's Test.
 Hypotheses:
Null: There are no significant differences between the groups' mean scores.
Alternate: There is a significant difference between the groups' mean scores
 In a one-way ANOVA:
First, we look at the Descriptive Statistics.
Next, we see the results of Levene's Test of Homogeneity of Variance.
Lastly, we see the results of our One-Way ANOVA (a worked sketch follows below).
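A minimal Python sketch of the same sequence outside SPSS: Levene's homogeneity test followed by a one-way ANOVA on three simulated groups; the group labels and values are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Hypothetical goal totals for three performance groups
low = rng.normal(90, 15, 23)
medium = rng.normal(100, 15, 24)
high = rng.normal(110, 15, 23)

lev_stat, lev_p = stats.levene(low, medium, high)    # homogeneity of variance
f_stat, p_value = stats.f_oneway(low, medium, high)  # one-way ANOVA
print(f"Levene p = {lev_p:.3f}, F = {f_stat:.3f}, p = {p_value:.3f}")
```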
COMPARE MEANS
 One Way ANOVA
 Post-Hoc Comparisons: We can look at the results of the Post-Hoc Comparisons to see exactly which pairs of groups are significantly different. There are three parts in the post-hoc tests: Tukey's test, Scheffé and LSD results (see the sketch below).
 Homogeneous subsets: the Tukey range test gives information similar to the post-hoc tests, but in a different format. The important point is whether Sig. is greater than 0.05 or less than 0.05.
 Mean plots: used to see whether the mean varies between different groups of the data.
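A minimal Python sketch of a Tukey post-hoc comparison using statsmodels; the three simulated performance groups are assumptions, and the Scheffé and LSD parts of the SPSS output are not reproduced here.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(6)
goals = np.concatenate([rng.normal(90, 15, 23),
                        rng.normal(100, 15, 24),
                        rng.normal(110, 15, 23)])
group = np.repeat(["low", "medium", "high"], [23, 24, 23])

# Tukey's HSD: pairwise comparisons of group means at alpha = 0.05
result = pairwise_tukeyhsd(endog=goals, groups=group, alpha=0.05)
print(result.summary())   # each row flags whether that pair differs significantly
```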
5. GENERAL LINEAR MODEL(GLM)
 General Linear Model
 The general linear model can be seen as an extension of linear multiple
regression for a single dependent variable, and understanding the multiple
regression model is fundamental to understanding the general linear model.
The general purpose of multiple regression (the term was first used by
Pearson, 1908) is to quantify the relationship between several independent
or predictor variables and a dependent or criterion variable.
 The General Linear Model menu includes:
Univariate GLM
Multivariate GLM
Repeated Measures
Variance Components
GENERAL LINEAR MODEL(GLM)
Univariate GLM
Univariate GLM is the general linear model now often used to implement such long-established statistical procedures as regression and members of the ANOVA family.
 The Between-Subjects Factors information table in the figure is an example of GLM output. This table displays any value labels defined for levels of the between-subjects factors. In this table we see that GOALS = 1, 2 and 3 correspond to under 100, between 100 and 150, and over 150 goals, respectively.
GENERAL LINEAR MODEL(GLM)
 Univariate GLM
Tests of Between-Subjects Effects: the Type III sums of squares in the figure show how the sums of squares and other statistics differ for most effects. The ANOVA table in the figure demonstrates that the PERFORMANCE by GOALS interaction effect is not significant at p = 0.815.
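For comparison, a minimal Python sketch of a two-factor univariate GLM fitted with statsmodels; the factor names (performance, goals_group), the simulated data and the Type III sums-of-squares request are illustrative assumptions rather than this study's exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(7)
df = pd.DataFrame({
    "performance": rng.choice(["low", "high"], 70),
    "goals_group": rng.choice(["under100", "100to150", "over150"], 70),
    "incomes": rng.normal(5000, 1000, 70),
})

# Two-factor univariate GLM with interaction; Sum contrasts so that the
# Type III sums of squares (the SPSS default) are meaningful.
model = ols("incomes ~ C(performance, Sum) * C(goals_group, Sum)", data=df).fit()
print(sm.stats.anova_lm(model, typ=3))   # tests of between-subjects effects
```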
GENERAL LINEAR MODEL(GLM)
Multivariate GLM
Multivariate GLM is often used to implement two long-established
statistical procedures - MANOVA and MANCOVA.
Tests of Between-Subjects Effects (test of overall model significance): the overall F test appears in the "Corrected Model" row and answers the question, "Is the model significant for each dependent variable?" There is an F significance level for each dependent variable. That is, the F test tests the null hypothesis that there is no difference in the means of each dependent variable for the different groups formed by categories of the independent variables. For the example below, the multivariate GLM is found to be not significant for all three dependent variables.
GENERAL LINEAR MODEL(GLM)
Multivariate GLM
 Between-Subjects SSCP Matrix: contains the sums of squares
attributable to model effects. These values are used in estimates of effect
size.
 Multivariate Tests (tests of individual effects overall): in contrast to the overall F test, these answer the question, "Is each effect significant for at least one of the dependent variables?" That is, where the F test focuses on the dependents, the multivariate tests focus on the independents and their interactions.
The multivariate test statistics are (see the sketch below):
• Hotelling's T-Square
• Wilks' lambda
• Pillai-Bartlett trace
• Roy's greatest characteristic root (GCR)
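A minimal Python sketch of a one-factor multivariate GLM using statsmodels' MANOVA; its mv_test() output reports Wilks' lambda, Pillai's trace, the Hotelling-Lawley trace and Roy's greatest root. The dependent variables and the data are simulated assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "performance": rng.choice(["low", "medium", "high"], 70),
    "goals": rng.normal(100, 20, 70),
    "incomes": rng.normal(5000, 1000, 70),
    "weight": rng.normal(75, 8, 70),
})

# Multivariate GLM: three dependent variables modelled by one factor
mv = MANOVA.from_formula("goals + incomes + weight ~ performance", data=df)
print(mv.mv_test())   # the four multivariate test statistics per effect
```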
GENERAL LINEAR MODEL(GLM)
Multivariate GLM
 Box's M tests MANOVA's assumption of homogeneity of covariance matrices (homoscedasticity) using the F distribution. If p(M) < .05, the covariance matrices are significantly different and the assumption is violated; thus we want Box's M not to be significant, so that we fail to reject the null hypothesis that the covariances are homogeneous. In the figure of SPSS output below, Box's M shows that the assumption of equality of covariances among the set of dependent variables is violated with respect to the groups formed by the categorical independent factor "performance".
 Levene's test: SPSS also outputs Levene's test as part of MANOVA. If Levene's test is significant, the data fail the assumption of equal group error variances. In the figure, Levene's test shows that the assumption of homogeneity of error variances among the groups of "performance" is violated for two of the three dependent variables listed.
6. FACTOR ANALYSIS
 Factor Analysis
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. It is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables. Factor analysis requires that you have data in the form of correlations, so all of the assumptions that apply to correlations are relevant.
 Types of factor analysis:
Principal component analysis
Common factor analysis
FACTOR ANALYSIS
Correlation Matrix: We can use the correlation matrix to check the pattern of
relationships. First scan the significance values and look for any variable for which
the majority of values are greater than 0.05.

All we want to see in this figure is that the determinant is not 0. If the determinant is 0, there will be computational problems with the factor analysis. If we look at the figure: Determinant = .835, which is greater than the necessary value of 0.00001, so we can say there is no problem with this data (a small sketch of this check follows below).
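A minimal Python sketch of the determinant check on a simulated correlation matrix; the item names q1-q4 and the data are assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
# Hypothetical scores on four observed variables for 70 players
df = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])

R = df.corr().to_numpy()    # correlation matrix
det_R = np.linalg.det(R)    # should stay well above 0.00001
print(f"Determinant of R = {det_R:.3f}")
```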
FACTOR ANALYSIS
 KMO and Bartlett's Test: two important parts of the output.
a. Kaiser-Meyer-Olkin Measure of Sampling Adequacy: this measure varies between 0 and 1, and values closer to 1 are better; Kaiser recommends accepting values greater than 0.5. For this data (figure) the KMO value is 0.472, below the minimum suggested value of 0.6, which means factor analysis is not appropriate for this data.
b. Bartlett's Test of Sphericity: tests the null hypothesis that the original correlation matrix is an identity matrix. For factor analysis to work we need some relationships between the variables, and if the R-matrix were an identity matrix then all correlation coefficients would be zero. Therefore we want this test to be significant. Here Bartlett's Test is not significant (p = 0.061 is not below the required significance threshold), so factor analysis is not appropriate (see the sketch of the test below).
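A minimal Python sketch of Bartlett's test of sphericity computed by hand from a simulated correlation matrix, using the standard chi-square approximation; the data are made up, so the result will not match the study's figure.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(10)
df = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])
n, p = df.shape

R = df.corr().to_numpy()
# Bartlett's test of sphericity: H0 says R is an identity matrix
chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
dof = p * (p - 1) / 2
p_value = stats.chi2.sf(chi_square, dof)
print(f"chi2 = {chi_square:.3f}, df = {dof:.0f}, p = {p_value:.3f}")
```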
FACTOR ANALYSIS
Total Variance Explained: the figure lists the eigenvalues associated with each linear component (factor) before extraction, after extraction and after rotation. Before extraction, SPSS has identified 4 linear components within the data set. The eigenvalue associated with each factor represents the variance explained by that particular linear component, and SPSS also displays each eigenvalue in terms of the percentage of variance explained (so factor 1 explains 35.360% of the total variance). It should be clear that the first two factors explain a relatively large amount of variance (especially factor 1), whereas subsequent factors explain only small amounts of variance (see the sketch below).
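A minimal Python sketch of how the eigenvalues and percentages of variance in this table arise from the correlation matrix; the four simulated variables are assumptions, not the study's items.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
df = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])

R = df.corr().to_numpy()
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]    # one eigenvalue per component
pct_variance = 100 * eigenvalues / eigenvalues.sum()  # percentage of variance explained
for i, (ev, pct) in enumerate(zip(eigenvalues, pct_variance), start=1):
    print(f"Component {i}: eigenvalue = {ev:.3f}, % of variance = {pct:.2f}")
```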
FACTOR ANALYSIS
Communalities & Component Matrix: the figure shows the table of communalities before and after extraction. Principal component analysis works on the initial assumption that all variance is common; therefore, before extraction the communalities are all 1. The communalities in the column labeled Extraction reflect the common variance in the data structure, so we can say that 69.7% of the variance associated with question 1 is common.
This output also shows the component matrix before rotation; this matrix contains the loadings of each variable onto each factor. At this stage SPSS has extracted two factors (see the sketch below).
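A minimal Python sketch of unrotated component loadings and extraction communalities derived from the correlation matrix, keeping two components as on the slide; the data are simulated assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(12)
df = pd.DataFrame(rng.normal(size=(70, 4)), columns=["q1", "q2", "q3", "q4"])

R = df.corr().to_numpy()
vals, vecs = np.linalg.eigh(R)            # eigenvalues come back in ascending order
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

n_factors = 2
loadings = vecs[:, :n_factors] * np.sqrt(vals[:n_factors])  # unrotated component matrix
communalities = (loadings ** 2).sum(axis=1)                 # extraction communalities
print(pd.DataFrame(loadings, index=df.columns, columns=["Comp1", "Comp2"]))
print(pd.Series(communalities, index=df.columns, name="Extraction"))
```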
7. REGRESSION ANALYSIS
 Regression analysis
Regression analysis includes any techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables.
 In its simplest form, regression analysis involves finding the best straight-line relationship to explain how the variation in an outcome (or dependent) variable, Y, depends on the variation in a predictor (or independent or explanatory) variable, X. Once the relationship has been estimated we will be able to use the equation:
Y = β0 + β1X
 The basic technique for determining the coefficients β0 and β1 is Ordinary Least Squares (OLS): values for β0 and β1 are chosen so as to minimize the sum of the squared residuals (SSR); see the sketch below. The SSR may be written as:
SSR = Σ (Yi − β0 − β1Xi)²
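A minimal Python sketch of OLS estimation with statsmodels on simulated data; it also reports the R squared, adjusted R squared and ANOVA F statistic discussed on the next slides. The variable names and numbers are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
age = rng.uniform(18, 40, 70)                    # hypothetical predictor X
goals = 120 - 1.5 * age + rng.normal(0, 10, 70)  # hypothetical outcome Y

X = sm.add_constant(age)        # adds the intercept column for beta0
model = sm.OLS(goals, X).fit()  # ordinary least squares: minimizes the SSR

print(model.params)                          # beta0 (const) and beta1 (slope)
print(model.rsquared, model.rsquared_adj)    # R squared and adjusted R squared
print(model.fvalue, model.f_pvalue)          # ANOVA F statistic and its significance
print(model.ssr)                             # the sum of squared residuals OLS minimizes
```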
REGRESSION ANALYSIS
Model summary: from the model summary table we can see how well the model fits the data. This table displays R, R squared, adjusted R squared, and the standard error. R is the correlation between the observed and predicted values of the dependent variable; the values of R range from -1 to 1.
 R squared is the proportion of variation in the dependent variable explained by the regression model; its values range from 0 to 1. Adjusted R squared is an adjustment for the fact that, when one has a large number of independent variables, R squared can become artificially high simply because more predictors have been added.
REGRESSION ANALYSIS
ANOVA: besides R squared, we can use the ANOVA (analysis of variance) table to check how well the model fits the data.
The F statistic is the regression mean square (MSR) divided by the residual mean square (MSE). If the significance value of the F statistic is small (smaller than 0.05), the independent variables do a good job of explaining the variation in the dependent variable. If the significance value of F is larger than 0.05, the independent variables do not explain the variation in the dependent variable, and the null hypothesis that all the population values for the regression coefficients are 0 cannot be rejected.
REGRESSION ANALYSIS
 The Collinearity Diagnostics table in SPSS is an alternative method of assessing whether there is too much multicollinearity in the model. High eigenvalues indicate dimensions (factors) which account for a lot of the variance in the crossproduct matrix. Eigenvalues close to 0 indicate dimensions which explain little variance. Multiple eigenvalues close to 0 indicate an ill-conditioned crossproduct matrix, meaning there may be a problem with multicollinearity (see the sketch below).
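A minimal Python sketch of eigenvalue-based collinearity diagnostics and variance inflation factors on simulated, deliberately correlated predictors; it illustrates the idea behind the SPSS table rather than reproducing its exact output.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(14)
age = rng.uniform(18, 40, 70)
weight = 60 + 0.5 * age + rng.normal(0, 5, 70)   # deliberately correlated with age
X = sm.add_constant(np.column_stack([age, weight]))

# Eigenvalue-based diagnostics: scale each column to unit length, then take the
# eigenvalues of X'X; several near-zero eigenvalues suggest an ill-conditioned
# crossproduct matrix (multicollinearity).
X_scaled = X / np.linalg.norm(X, axis=0)
eigvals = np.sort(np.linalg.eigvalsh(X_scaled.T @ X_scaled))[::-1]
condition_index = np.sqrt(eigvals[0] / eigvals)
print("eigenvalues:", np.round(eigvals, 4))
print("condition indices:", np.round(condition_index, 2))

# Variance inflation factors for the two predictors (columns 1 and 2)
print([variance_inflation_factor(X, i) for i in range(1, X.shape[1])])
```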
THE END