
Shahraiz Shaukat 153002

Missing Data
If you are missing much of your data, this can cause several problems. The most apparent problem
is that there simply won't be enough data points to run your analyses. The EFA, CFA, and path
models require a certain number of data points in order to compute estimates. This number
increases with the complexity of your model. If you are missing several values in your data, the
analysis just won't run.
Additionally, missing data might represent bias issues. Some people may not have answered
particular questions in your survey because of some common issue. For example, if you asked
about gender, and females are less likely to report their gender than males, then you will have
male-biased data. Perhaps only 50% of the females reported their gender, but 95% of the males
reported gender. If you use gender in your causal models, then you will be heavily biased toward
males, because you will not end up using the unreported responses.
To find out how many missing values each variable has, in SPSS go to Analyze, then Descriptive
Statistics, then Frequencies. Enter the variables in the variables list. Then click OK. The table in
the output will show the number of missing values for each variable.
The threshold for missing data is flexible, but generally, if you are missing more than 10% of the
responses on a particular variable, or from a particular respondent, that variable or respondent may
be problematic. There are several ways to deal with problematic variables.
• Just don't use that variable.
• If it makes sense, impute the missing values. This should only be done for continuous or
interval data (like age or Likert-scale responses), not for categorical data (like gender).
• If your dataset is large enough, just don't use the responses that had missing values for that
variable. This may create a bias, however, if the number of missing responses is greater
than 10%.
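
For those who prefer to screen outside SPSS, here is a minimal sketch in Python (pandas) that applies the same counts and the 10% rule; data.csv and the Age column are hypothetical stand-ins for your own export:

```python
import pandas as pd

# Hypothetical CSV export of the survey data; adjust the path and
# column names to your own file.
df = pd.read_csv("data.csv")

# Missing count and percentage per variable, mirroring the SPSS
# Frequencies output described above.
pct = df.isna().mean() * 100
print(pd.DataFrame({"missing": df.isna().sum(), "percent": pct.round(1)}))
print("Variables over the 10% threshold:", list(pct[pct > 10].index))

# The same 10% rule applied to respondents (rows).
row_pct = df.isna().mean(axis=1) * 100
print("Problematic respondents:", list(df.index[row_pct > 10]))

# Median imputation for a continuous/interval variable (never for
# categorical data); "Age" is an assumed column used only to illustrate.
df["Age"] = df["Age"].fillna(df["Age"].median())
```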
Outliers
Outliers can influence your results, pulling the mean away from the median.
To detect outliers on each variable, just produce a boxplot in SPSS (as demonstrated in the video).
Outliers will appear at the extremes, and will be labeled. If you have a really high sample size,
then you may want to remove the outliers. If you are working with a smaller dataset, you may want
to be less liberal about deleting records. However, this is a trade-off, because outliers will influence
small datasets more than large ones. Lastly, outliers do not really exist in Likert scales: answering
at the extreme (1 or 5) does not really represent outlier behavior.
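
The boxplot rule can also be applied programmatically. Below is a small sketch (pandas) that flags the same cases a boxplot labels, using the conventional 1.5 * IQR whiskers; the Income column is an assumed example, not a variable from this dataset:

```python
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical export, as before

def boxplot_outliers(series: pd.Series) -> pd.Index:
    """Flag the cases a boxplot would label: values beyond
    1.5 * IQR from the quartiles."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    return series.index[mask]

# "Income" is an assumed continuous column; extreme Likert answers
# (1 or 5) would not count as outliers under the guidance above.
print(boxplot_outliers(df["Income"]))
```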

Normality
Normality refers to the distribution of the data for a particular variable. We usually assume that
the data is normally distributed, even though it usually is not! Normality is assessed in many
different ways: shape, skewness, and kurtosis (flat/peaked).
• Shape: To discover the shape of the distribution in SPSS, build a histogram (as shown in
the video tutorial) and plot the normal curve. If the histogram does not match the normal
curve, then you likely have normality issues. You can also look at the boxplot to determine
normality.
• Skewness: Skewness means that the responses did not fall into a normal distribution, but
were heavily weighted toward one end of the scale. Income is an example of a commonly
right-skewed variable: most people in the USA make between 20 and 70 thousand dollars,
but a smaller group makes between 70 and 100, an even smaller group between 100 and 150,
and a much smaller group between 150 and 250, and so on, all the way up to Bill Gates and
Mark Zuckerberg. Addressing skewness may require transformations of your data (if
continuous), or removing influential outliers. There are two rules on skewness.
1. If your skewness value is greater than 1, you are positive (right) skewed; if it is less
than -1, you are negative (left) skewed; if it is in between, then you are fine. Some published
thresholds are a bit more liberal and allow for up to +/-2.2, instead of +/-1.

2. If the absolute value of the skewness is less than three times the standard error, then you
are fine; otherwise you are skewed.
Using these rules, we can see from the table below that all three variables are fine using the first
rule, but using the second rule, they are all negative (left) skewed.
• Kurtosis: Kurtosis refers to the outliers of the distribution of data. Data that have outliers
have large kurtosis; data without outliers have low kurtosis. The kurtosis (excess kurtosis)
of the normal distribution is 0. The rule for evaluating whether or not your kurtosis is
problematic is the same as rule two above: if the absolute value of the kurtosis is less than
three times its standard error, then the kurtosis is not significantly different from that of the
normal distribution; otherwise you have kurtosis issues. A looser rule is an overall kurtosis
score of 2.200 or less (rather than 1.00).
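
Both rules are easy to check outside SPSS as well. Below is a minimal sketch in Python (SciPy), using the common large-sample approximations sqrt(6/N) and sqrt(24/N) for the standard errors of skewness and kurtosis; SPSS computes exact small-sample versions, so the numbers can differ slightly.

```python
import numpy as np
from scipy import stats

def normality_screen(x) -> None:
    """Apply the two skewness rules and the kurtosis rule from above."""
    x = np.asarray(x)
    n = len(x)
    skew = stats.skew(x)      # SPSS applies a small-sample correction,
    kurt = stats.kurtosis(x)  # so its values can differ slightly
    se_skew = np.sqrt(6 / n)   # approximate standard error of skewness
    se_kurt = np.sqrt(24 / n)  # approximate standard error of kurtosis
    print(f"skewness {skew:.3f}: rule 1 flags |skew| > 1 "
          f"(liberal: 2.2); rule 2 flags |skew| > {3 * se_skew:.3f}")
    print(f"kurtosis {kurt:.3f} (excess): flagged if > {3 * se_kurt:.3f}")
```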
Linearity
Linearity refers to the consistent slope of change that represents the relationship between an IV
and a DV. If the relationship between the IV and the DV is radically inconsistent, then it will throw
off your SEM analyses. There are dozens of ways to test for linearity. Perhaps the most elegant
(easy and clear-cut, yet rigorous) is the deviation from linearity test available in the ANOVA test
in SPSS. In SPSS go to Analyze, Compare Means, Means. Put the composite IVs and DVs in the
lists, then click on options, and select "Test for Linearity". Then in the ANOVA table in the output
window, if the Sig value for Deviation from Linearity is less than 0.05, the relationship between
IV and DV is not linear, and thus is problematic (see the ANOVA tables below). Issues of linearity
can sometimes be fixed by removing outliers (if the significance is borderline), or through
transforming the data. In the example output, the first relationship was linear (Sig = .268), but the
second relationship was nonlinear (Sig = .003).
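
For readers working outside SPSS, the same test can be reproduced by hand. The sketch below, assuming the composite IV and DV are pandas Series, partitions the between-groups sum of squares into a linear part and a deviation part, and returns the Sig. value for the deviation, matching the layout of the ANOVA tables at the end of this document.

```python
import pandas as pd
from scipy import stats

def deviation_from_linearity(iv: pd.Series, dv: pd.Series) -> float:
    """Sig. value of the 'Deviation from Linearity' F test."""
    groups = dv.groupby(iv)            # one group per distinct IV value
    n, k = len(dv), groups.ngroups
    ss_total = ((dv - dv.mean()) ** 2).sum()
    ss_within = ((dv - groups.transform("mean")) ** 2).sum()
    ss_between = ss_total - ss_within
    # The linear part of the between-groups SS is r-squared * total SS;
    # whatever is left over is the deviation from linearity.
    r = stats.linregress(iv, dv).rvalue
    ss_deviation = ss_between - r ** 2 * ss_total
    f = (ss_deviation / (k - 2)) / (ss_within / (n - k))
    return stats.f.sf(f, k - 2, n - k)  # Sig. < .05 means nonlinear

# e.g. deviation_from_linearity(df["MeanIRO"], df["MeanGP"]) should
# reproduce the first ANOVA table below (assumed column names).
```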
Homoscedasticity
Homoscedasticity is a nasty word that means that the variable's residual (error) exhibits consistent
variance across different levels of the variable. There are good reasons for desiring this. For more
information, see Hair et al. (2010), Chapter 2. :) A simple way to determine if a relationship is
homoscedastic is to do a simple scatter plot with the variable on the y-axis and the variable's
residual on the x-axis. To see a step-by-step guide on how to do this, watch the video tutorial. If
the plot comes up with a consistent pattern, as in the figure below, then we are good: we have
homoscedasticity! If there is not a consistent pattern, then the relationship is considered
heteroscedastic. This can be fixed by transforming the data or by splitting the data by subgroups
(such as two groups for gender).
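
The same plot can be produced outside SPSS with a few lines of Python. The sketch below assumes the composites live in a hypothetical data.csv; the Breusch-Pagan test at the end is a common numeric supplement to the visual inspection, not part of this tutorial's own procedure.

```python
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

df = pd.read_csv("data.csv")               # hypothetical export
X = sm.add_constant(df["MeanIRO"])         # assumed composite names
fit = sm.OLS(df["GPMean"], X).fit()

# The plot described above: variable on the y-axis, residual on the x-axis.
plt.scatter(fit.resid, df["GPMean"], s=10)
plt.xlabel("residual")
plt.ylabel("GPMean")
plt.show()

# Breusch-Pagan supplements the visual check; p < .05 suggests
# heteroscedasticity.
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p = {lm_p:.3f}")
```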
Multicollinearity
Multicollinearity is not desirable. It means that the variance our independent variables explain in
our dependent variable overlaps, so the variables do not each explain unique variance in the
dependent variable. The way to check this is to calculate a Variance Inflation Factor (VIF) for each
independent variable after running a multivariate regression. The rules of thumb for the VIF are as
follows:
• VIF < 3: not a problem
• VIF > 3: potential problem
• VIF > 5: very likely a problem
• VIF > 10: definitely a problem
The tolerance value in SPSS is directly related to the VIF, and values less than 0.10 are strong
indications of multicollinearity issues. For particulars on how to calculate the VIF in SPSS, watch
the step-by-step video tutorial. The easiest method for fixing multicollinearity issues is to drop one
of the problematic variables. This won't hurt your R-square much, because that variable doesn't add
much unique explanation of variance anyway.
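
For reference, the same VIF table can be reproduced outside SPSS. This is a sketch using statsmodels and the composite names from the coefficients table below; data.csv is again a hypothetical export:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("data.csv")                     # hypothetical export
ivs = ["MeanERO", "EAMean", "CMean", "EBMean"]   # IVs from the table below
X = sm.add_constant(df[ivs])

# One VIF per independent variable (index 0 is the constant, so skip it).
for i, name in enumerate(ivs, start=1):
    print(name, round(variance_inflation_factor(X.values, i), 3))
```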

Data Sheet
Variable | Mode | Missing value | Aberrant value | Outliers | Skewness | Kurtosis
IRO1 | 4 | - | 5 | 4,9,55,223,101,102 | -0.803 | 0.563
IRO2 | 4 | - | 5 | 101,103,29,233 | -0.400 | -0.854
IRO3r | 3 | - | 5 | 179,182,261,267 | 0.297 | -0.732
IRO4 | 5 | - | 5 | 96,138,232,268 | -1.589 | 2.841
IRO5 | 5 | - | 5 | - | -0.939 | 0.453
IRO6r | 2 | - | 5 | - | 0.036 | -1.077
IRO7 | 4 | 1 | 5 | - | -0.412 | -0.434
IRO8r | 2 | - | 5 | - | 0.277 | -0.998
ERO1 | 3 | - | 5 | - | 0.001 | -0.964
ERO2 | 4 | - | 5 | - | -0.715 | 0.057
ERO3 | 4 | - | 5 | 183,33,240,266 | -0.338 | -0.748
ERO4 | 5 | 1 | 6 | 150,162,186,187,188 | -1.225 | 0.362
ERO5 | 2 | 1 | 6 | 230,233,240,269 | 0.207 | -0.942
ERO6 | 3 | - | 5 | 200,232,241,245 | -0.077 | -0.943
EA1 | 4 | - | 5 | 182,185,265,267 | -0.307 | -0.614
EA2 | 4 | - | 5 | 85,129,173,174 | -0.442 | -0.354
EA3 | 4 | 1 | 5 | 133,174,129,162 | -0.361 | -0.563
GP1 | 4 | - | 5 | 103,266 | -0.225 | -0.759
GP2 | 4 | 1 | 5 | 21,26 | -0.397 | -0.673
GP3 | 3 | 1 | 5 | - | 0.133 | -0.625
GP4 | 4 | - | 5 | 101,106,179,240 | -0.217 | -0.918
GP5 | 4 | - | 5 | 172,179,232,249 | -0.203 | -0.713
GP6 | 4 | - | 5 | 179,226,232,246 | -0.260 | -0.807
GP7 | 4 | 1 | 5 | - | -0.652 | -0.032
GP8 | 4 | - | 5 | - | -0.247 | -0.707
C1 | 6 | 1 | 7 | - | -0.769 | -0.131
C2 | 6 | - | 7 | 84,140,174,197,173,193 | -0.320 | -0.757
C3 | 6 | - | 7 | 174 | -1.181 | 0.947
AEB1 | 1 | - | 11 | 136,134,169,106,184,187,248 | 1.477 | 5.299
AEB2 | 1 | - | 5 | 107,170,131,106,169,180,221 | 0.726 | -0.990
AEB3 | 1 | - | 5 | - | 0.594 | -0.592
AEB4 | 2 | - | 5 | 233,236,240,137,139,140,242 | 0.250 | -0.793
AEB5 | 2 | - | 5 | - | 0.431 | -0.473
BEB1 | 2 | - | 5 | - | 0.815 | -0.029
BEB2 | 2 | - | 5 | 139,236,248,266 | 0.299 | -0.885
BEB3 | 2 | - | 5 | 242,266,267,269 | 0.387 | -0.543
BEB4 | 2 | - | 5 | - | 0.680 | -0.453
CEB1 | 2 | - | 5 | - | 0.895 | 0.175
CEB2 | 2 | - | 5 | 248,249,262,266 | 0.190 | -0.753
CEB3 | 2 | - | 5 | - | 0.494 | -0.592
CEB4 | 2 | - | 5 | 228,250,249,266 | 0.289 | -0.900
DEB1 | 3 | 1 | 5 | 248,249,250,264 | -0.415 | -0.737
DEB2 | 4 | - | 5 | - | -0.230 | -0.902
DEB3 | 3 | - | 5 | - | -0.007 | -0.835
DEB4 | 3 | - | 5 | - | -0.058 | -0.833
DEB5 | 4 | - | 5 | 207,234,247,266 | -0.319 | -0.605
DEB6 | 4 | 1 | 5 | - | -0.357 | -0.672
EEB1 | 3 | 1 | 5 | - | 0.001 | -0.837
EEB2 | 3 | - | 5 | - | -0.229 | -0.655
EEB3 | 4 | - | 5 | 178,197,246,247 | -0.225 | -0.795
EEB4 | 4 | - | 5 | - | -0.653 | -0.058
EEB5 | 5 | - | 5 | - | -0.784 | -0.370
EEB6 | 5 | - | 5 | - | -0.777 | -0.332
EEB7 | 4 | - | 5 | 140,242,246,226 | -0.297 | -0.644
EEB8 | 3 | 1 | 5 | 227,230,236,265 | 0.236 | -0.855

Multicollinearity

Coefficients(a)
Model 1 | B | Std. Error | Beta | t | Sig. | Tolerance | VIF
(Constant) | 2.760 | .286 | | 9.660 | .000 | |
MeanERO | .148 | .050 | .178 | 2.976 | .003 | .972 | 1.029
EAMean | -.004 | .039 | -.006 | -.097 | .923 | .909 | 1.100
CMean | .086 | .025 | .209 | 3.427 | .001 | .928 | 1.077
EBMean | -.004 | .057 | -.004 | -.071 | .943 | .950 | 1.053
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: MeanIRO

Inter-construct Correlations

Correlations
EBMean CMean EAMean MeanIRO GPMean
EBMean Pearson Correlation 1 -.102 -.173** -.005 .010
Sig. (2-tailed) .096 .004 .935 .871
N 270 270 270 270 270
CMean Pearson Correlation -.102 1 .251** .224** .320**
Sig. (2-tailed) .096 .000 .000 .000
N 270 270 270 270 270
EAMean Pearson Correlation -.173** .251** 1 .062 .309**
Sig. (2-tailed) .004 .000 .311 .000
N 270 270 270 270 270
MeanIRO Pearson Correlation -.005 .224** .062 1 .214**
Sig. (2-tailed) .935 .000 .311 .000
N 270 270 270 270 270
GPMean Pearson Correlation .010 .320** .309** .214** 1
Sig. (2-tailed) .871 .000 .000 .000
N 270 270 270 270 270
**. Correlation is significant at the 0.01 level (2-tailed).

Linearity

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanIRO Between Groups (Combined) 13.135 17 .773
Linearity 5.057 1 5.057
Deviation from Linearity 8.078 16 .505
Within Groups 105.013 252 .417
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanERO Between Groups (Combined) 12.950 18 .719
Linearity 1.641 1 1.641
Deviation from Linearity 11.309 17 .665
Within Groups 105.198 251 .419
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanEA Between Groups (Combined) 18.157 11 1.651
Linearity 9.673 1 9.673
Deviation from Linearity 8.484 10 .848
Within Groups 99.991 258 .388
Total 118.148 269
ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanC Between Groups (Combined) 27.046 17 1.591
Linearity 11.959 1 11.959
Deviation from Linearity 15.087 16 .943
Within Groups 91.102 252 .362
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanAEB Between Groups (Combined) 10.473 17 .616
Linearity 1.568 1 1.568
Deviation from Linearity 8.906 16 .557
Within Groups 107.675 252 .427
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanBEB Between Groups (Combined) 8.882 15 .592
Linearity .197 1 .197
Deviation from Linearity 8.685 14 .620
Within Groups 109.266 254 .430
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanCEB Between Groups (Combined) 8.295 15 .553
Linearity 1.324 1 1.324
Deviation from Linearity 6.971 14 .498
Within Groups 109.853 254 .432
Total 118.148 269
ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanDEB Between Groups (Combined) 6.010 22 .273
Linearity .438 1 .438
Deviation from Linearity 5.572 21 .265
Within Groups 112.138 247 .454
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanEEB Between Groups (Combined) 20.542 25 .822
Linearity 5.209 1 5.209
Deviation from Linearity 15.333 24 .639
Within Groups 97.606 244 .400
Total 118.148 269

ANOVA Table
Sum of Squares df Mean Square
MeanGP * MeanEB Between Groups (Combined) 23.739 58 .409
Linearity .022 1 .022
Deviation from Linearity 23.717 57 .416
Within Groups 94.409 211 .447
Total 118.148 269
Homoscedasticity
[Residual scatterplots: GP & IRO, GP & ERO, GP & EA, GP & C, GP & EB]