

Exploratory Data Analysis

A normality test checks whether the data meet the assumption that the population is normally distributed. SPSS provides two statistics: (i) Kolmogorov-Smirnov and (ii) Shapiro-Wilk.

Case Processing Summary (totfinwell)
strata    Valid N (%)     Missing N (%)    Total N (%)
urban     480 (100.0%)    0 (.0%)          480 (100.0%)
rural     320 (100.0%)    0 (.0%)          320 (100.0%)

There are no missing data among the 800 cases, which are assumed to have been randomly selected.

1.1 Descriptives

From the Descriptives table below, several observations can be made:
(i) Mean: the mean is 75.53 for the urban group and 72.86 for the rural group.
(ii) Trimmed mean: to obtain this value, SPSS removes the top and bottom 5 per cent of the cases and calculates a new mean. Comparing the original mean with this trimmed mean shows whether extreme scores are having a strong influence on the mean. The 5% trimmed means are 75.50 (urban) and 73.11 (rural). These are very close to the original means of 75.53 and 72.86, so extreme scores are not distorting the means.
(iii) Skewness: skewness describes the asymmetry of the distribution. As a rule of thumb, skewness should lie within the range -2 to +2 for the data to be treated as normally distributed. Here the skewness values are -.079 (urban) and -.314 (rural), and both are within the accepted range. (The values .111 and .136 beside them are their standard errors, not skewness statistics.)

(iv) Kurtosis: kurtosis provides information about the peakedness of the distribution. If the distribution is perfectly normal, the skewness and kurtosis values are both 0 (rather an uncommon occurrence in the social sciences). Positive skewness values indicate positive skew (scores clustered to the left, at the low values). Negative skewness values indicate a clustering of scores at the high end (the right-hand side of a graph). Positive kurtosis values indicate that the distribution is rather peaked (clustered in the centre), with long thin tails. Kurtosis values below 0 indicate a distribution that is relatively flat (too many cases in the extremes). With reasonably large samples, skewness will not make a substantive difference in the analysis (Tabachnick & Fidell 2007, p. 80). Kurtosis can result in an underestimate of the variance, but this risk is also reduced with a large sample (200+ cases: see Tabachnick & Fidell 2007, p. 80). In this case, the kurtosis values are .109 (urban) and -.070 (rural), and the sample is large (800 cases).
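The trimmed-mean comparison in (ii) can be sketched in Python. This is a simplified version that assumes the trim drops a whole number of cases (the floor of 5% of n) from each end. SPSS's own procedure may give a slightly different value when 5% of n is not a whole number. The scores are hypothetical, not the study's data.

```python
import math
from statistics import mean


def trimmed_mean(scores, proportion=0.05):
    """Mean after dropping floor(proportion * n) cases from each end (simplified)."""
    data = sorted(scores)
    k = math.floor(proportion * len(data))  # number of cases trimmed from each tail
    kept = data[k:len(data) - k]
    return mean(kept)


# Hypothetical scores: 1..20 plus one extreme value of 1000
scores = list(range(1, 21)) + [1000]
print(mean(scores))          # ordinary mean, pulled upward by the extreme score
print(trimmed_mean(scores))  # the lowest and the extreme highest score are dropped
```

A large gap between the two printed values signals that extreme scores drive the mean. A small gap, as with 75.53 against 75.50 in this study, suggests they do not.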

Descriptives: totfinwell by strata

Statistic                          urban            rural
Mean (Std. Error)                  75.53 (.778)     72.86 (.997)
95% CI for Mean, Lower Bound       74.00            70.90
95% CI for Mean, Upper Bound       77.06            74.82
5% Trimmed Mean                    75.50            73.11
Median                             77.50            75.00
Variance                           290.425          318.000
Std. Deviation                     17.042           17.833
Minimum                            19               16
Maximum                            120              120
Range                              101              104
Interquartile Range                24               26
Skewness (Std. Error)              -.079 (.111)     -.314 (.136)
Kurtosis (Std. Error)              .109 (.222)      -.070 (.272)
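The figures in brackets beside skewness and kurtosis are standard errors, and they depend only on the number of cases. The short Python check below assumes the standard textbook formulas for these standard errors. With those formulas it reproduces the values in the table for n = 480 and n = 320.

```python
import math


def se_skewness(n):
    """Standard error of the sample skewness for n cases."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n):
    """Standard error of the sample (excess) kurtosis for n cases."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


for group, n in [("urban", 480), ("rural", 320)]:
    print(f"{group}: SE skewness = {se_skewness(n):.3f}, SE kurtosis = {se_kurtosis(n):.3f}")
# urban: SE skewness = 0.111, SE kurtosis = 0.222
# rural: SE skewness = 0.136, SE kurtosis = 0.272
```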

1.2 Kolmogorov-Smirnov and Shapiro-Wilk statistics

Tests of Normality: totfinwell by strata
           Kolmogorov-Smirnov(a)         Shapiro-Wilk
strata     Statistic   df    Sig.       Statistic   df    Sig.
urban      .058        480   .001       .994        480   .048
rural      .064        320   .003       .988        320   .011

a. Lilliefors Significance Correction

The Kolmogorov-Smirnov and Shapiro-Wilk statistics assess the normality of the distribution of scores. A non-significant result (a Sig. value greater than .05) indicates normality. In this exercise, the Sig. values are .001 and .003 for Kolmogorov-Smirnov and .048 and .011 for Shapiro-Wilk. All of these are below .05, so the null hypothesis of normality is rejected, and by these tests the data are not normally distributed. This is quite common in larger samples, where these tests detect even small departures from normality.
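To make the Kolmogorov-Smirnov statistic concrete, the minimal Python sketch below computes D. D is the largest gap between the empirical cumulative distribution of the scores and a normal curve fitted with the sample's own mean and standard deviation, which is the setting the Lilliefors correction is designed for. The sketch does not compute the Lilliefors-corrected Sig. value, which requires special tables or simulation. The two samples are synthetic, not the study's data.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normal_d(scores):
    """Largest gap between the empirical CDF and a normal CDF fitted by mean and SD."""
    data = sorted(scores)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data):
        f = fitted.cdf(x)
        # the empirical CDF steps from i/n up to (i+1)/n at x: check both sides of the step
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic samples, not the study's data
n = 400
normal_like = [NormalDist(75, 17).inv_cdf((i + 0.5) / n) for i in range(n)]
right_skewed = [-math.log(1 - (i + 0.5) / n) for i in range(n)]  # exponential quantiles
print(round(ks_normal_d(normal_like), 3))   # small D: close to normal
print(round(ks_normal_d(right_skewed), 3))  # much larger D: clearly non-normal
```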

1.3 Histograms

Histograms are used to display the distribution of a single continuous variable. Inspecting the shape of the histogram provides information about the distribution of scores on that variable. Many parametric statistics assume that the scores on each variable are normally distributed (i.e. follow the shape of the normal curve). In this exercise, the scores are reasonably normally distributed, with most scores occurring in the centre and tapering out towards the extremes. It is quite common in the social sciences, however, to find that variables are not normally distributed. Scores may be skewed to the left or right or, alternatively, arranged in a rectangular shape. The actual shape of the distribution for each group can be seen in the histograms.

1.4 Normal Q-Q Plot

In this exercise, scores appear to be reasonably normally distributed. This is also supported by an inspection of the normal probability plots (labelled Normal Q-Q Plot). In this plot, the observed value for each score is plotted against the expected value from the normal distribution. A reasonably straight line suggests a normal distribution.

1.5 Detrended Normal Q-Q Plots

The Detrended Normal Q-Q Plots are obtained by plotting the actual deviation of the scores from the straight line. There should be no real clustering of points, with most collecting around the zero line.
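The coordinates behind the Normal Q-Q and Detrended Normal Q-Q plots can be sketched in Python. The sketch makes three assumptions. Expected normal values use Blom's plotting position, (r - 3/8) / (n + 1/4). Ties are ignored. The detrended deviation is taken as the standardised observed score minus the expected normal score. SPSS's exact conventions may differ, and the sample is synthetic.

```python
from statistics import NormalDist, mean, stdev


def qq_points(scores):
    """(observed value, expected normal z, deviation) for each score, in ascending order."""
    data = sorted(scores)
    n = len(data)
    mu, sigma = mean(data), stdev(data)
    std_normal = NormalDist()
    points = []
    for rank, x in enumerate(data, start=1):
        p = (rank - 3 / 8) / (n + 1 / 4)       # Blom's plotting position (assumed)
        expected_z = std_normal.inv_cdf(p)      # expected value under normality
        observed_z = (x - mu) / sigma           # observed score on the same standard scale
        # the deviation is what the detrended plot shows against the zero line
        points.append((x, expected_z, observed_z - expected_z))
    return points


# Synthetic, almost perfectly normal sample (mean 75, SD 17), not the study's data
n = 200
sample = [NormalDist(75, 17).inv_cdf((r - 3 / 8) / (n + 1 / 4)) for r in range(1, n + 1)]
points = qq_points(sample)
print(max(abs(dev) for _, _, dev in points))  # close to 0: points hug the zero line
```

With real data, a reasonably straight Q-Q line corresponds to deviations scattered closely around zero, with no systematic clustering.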

1.6 Boxplots

The final plot in the output is a boxplot of the distribution of scores for the two groups. The rectangle represents the middle 50 per cent of the cases. The whiskers (the lines protruding from the box) extend to the smallest and largest values that are not outliers, and the line inside the rectangle is the median value. SPSS treats any points outside this range as outliers and shows them as little circles with a number attached, which is the ID number of the case. SPSS defines points as outliers if they extend more than 1.5 box-lengths from the edge of the box. Extreme points (marked with an asterisk, *) extend more than three box-lengths from the edge of the box. In this exercise there are three outliers (two for urban and one for rural), with ID numbers 550, 686 and 711. Boxplots are useful for comparing distributions of scores. They can show the distribution of one continuous variable (e.g. positive affect), break the scores down by group (e.g. age groups), or add an extra categorical variable to compare (e.g. males and females). In this exercise, the distribution of total financial wellbeing scores is very similar for the urban and rural populations.
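The SPSS outlier rule described above can be sketched in Python. One assumption here is that the quartiles come from `statistics.quantiles(..., method="inclusive")`. SPSS computes the box edges with its own definition, so borderline cases could be classified differently. The case IDs and scores are hypothetical.

```python
from statistics import quantiles


def flag_boxplot_points(cases):
    """Split (case_id, score) pairs into outliers (>1.5 box-lengths) and extremes (>3)."""
    scores = [score for _, score in cases]
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    box = q3 - q1  # box length = interquartile range
    outliers, extremes = [], []
    for case_id, score in cases:
        if score < q1 - 3 * box or score > q3 + 3 * box:
            extremes.append(case_id)   # SPSS marks these with an asterisk
        elif score < q1 - 1.5 * box or score > q3 + 1.5 * box:
            outliers.append(case_id)   # SPSS marks these with a circle
    return outliers, extremes


# Hypothetical case IDs and scores, not the study's data
cases = [(101, 10), (102, 11), (103, 12), (104, 13), (105, 14),
         (106, 15), (107, 16), (108, 25), (109, 60)]
print(flag_boxplot_points(cases))  # ([108], [109])
```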

1.7 Interpretation and Conclusion

This exercise tests the normality of total financial wellbeing scores for the urban and rural populations. Because the sample is large (800 cases), we should not rely only on the Descriptives table and the K-S and S-W tests. We also need to inspect the histograms, Normal Q-Q plots, Detrended Normal Q-Q plots and boxplots. In the histograms, we look at the tails of the distribution. Hardly any data points sit on their own out at the extremes, which suggests there are few potential outliers. Furthermore, the scores drop away in a reasonably even slope. The Normal Q-Q plots show the observed value for each score against the expected value from the normal distribution. The points follow a reasonably straight line, which suggests that the data are normally distributed. The Detrended Normal Q-Q plots, which show the actual deviation of the scores from that straight line, tell the same story. There is no real clustering of points, and most collect around the zero line, which also indicates that the data are normally distributed.

In the boxplots, the median line lies in the middle of the box for the rural population and close to the middle for the urban population. This also indicates that the assumption of normality is not violated. To conclude, given the large sample size and the observations from the analyses above, the data fulfil the assumption of normality.

2. Assumptions for t-test

T-tests are used when we have two groups (e.g. males and females) or two sets of data (before and after), and we wish to compare the mean score on some continuous variable. There are two main types of t-tests:
(i) Paired-samples t-tests (also called repeated measures) are used when we are interested in changes in scores for participants tested at Time 1, and then again at Time 2 (often after some intervention or event). The samples are related because the same people are tested each time.
(ii) Independent-samples t-tests are used when we have two different (independent) groups of people (e.g. males and females) and we are interested in comparing their scores. In this case, we collect information on only one occasion but from two different sets of people.

2.1 Independent-samples t-test

The independent-samples t-test compares the means of two groups on a single interval or ratio variable. An example research question: Is the urban population more financially satisfied than the rural population?

This exercise uses a categorical independent variable with only two groups (strata: urban/rural) and one continuous dependent variable (totfinwell). Each respondent can belong to only one group.

2.1.1 Assumptions
(i) Level of measurement: the DV is measured on a continuous (interval or ratio) scale.
(ii) Random sampling: the data are assumed to be obtained from a random sample of the population.
(iii) Independence of observations: the observations must be independent of one another, i.e. no observation is influenced by any other observation.
(iv) Normal distribution: the populations from which the samples are taken are assumed to be normally distributed.
(v) Homogeneity of variance: the samples are assumed to be taken from populations of equal variance.
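In the SPSS output, assumption (v) is checked with Levene's test. The minimal Python sketch below implements the mean-centred Levene statistic, which is a one-way ANOVA F computed on each score's absolute deviation from its group mean. We assume this is the variant SPSS reports for the t-test. The sketch returns F only, without the Sig. value, and the example groups are hypothetical.

```python
from statistics import mean


def levene_f(*groups):
    """Levene's F: one-way ANOVA F computed on |score - group mean|."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # absolute deviation of each score from its own group mean
    z = []
    for g in groups:
        m = mean(g)
        z.append([abs(x - m) for x in g])
    z_means = [mean(zg) for zg in z]
    z_grand = mean([v for zg in z for v in zg])
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: equal spread, then very different spread
print(levene_f([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))  # 0.0: identical spread
print(levene_f([1, 2, 3, 4, 5], [0, 5, 10, 15, 20]))    # clearly larger F
```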

2.2 Hypotheses
Null hypothesis (H0): there is no difference between the mean financial wellbeing of urban and rural residents.
Alternative hypothesis (HA): there is a difference between the mean financial wellbeing of urban and rural residents.
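In symbols, with the population mean of totfinwell in each stratum written as μ, the two-tailed hypotheses are:

$$H_0: \mu_{\text{urban}} = \mu_{\text{rural}} \qquad H_A: \mu_{\text{urban}} \neq \mu_{\text{rural}}$$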

2.3 IV and DV
The IV is a categorical variable (strata: urban and rural area). The DV is a continuous variable (financial wellbeing, totfinwell).

2.4 The result

Group Statistics: totfinwell by strata
strata    N      Mean     Std. Deviation    Std. Error Mean
urban     480    75.53    17.042            .778
rural     320    72.86    17.833            .997

Independent Samples Test: totfinwell
Levene's Test for Equality of Variances: F = 1.474, Sig. = .225
t-test for Equality of Means:
  Equal variances assumed: t = 2.132, df = 798, Sig. (2-tailed) = .033, Mean Difference = 2.671, Std. Error Difference = 1.253, 95% Confidence Interval of the Difference = .211 to 5.130
  Equal variances not assumed: t = 2.112, df = 662.218, Sig. (2-tailed) = .035
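Both t values in the table can be reproduced from the Group Statistics alone. The Python sketch below computes the pooled t (equal variances assumed) and the Welch t with Welch-Satterthwaite degrees of freedom (equal variances not assumed). It uses the rounded means and SDs, so its results agree with SPSS to about two decimal places. It omits p-values, which need the t distribution.

```python
import math


def t_tests_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled and Welch independent-samples t statistics from group summaries."""
    diff = m1 - m2
    # Equal variances assumed: pool the variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: separate variances and Welch-Satterthwaite df
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se_welch = math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {
        "equal variances assumed": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "equal variances not assumed": (diff / se_welch, df_welch, se_welch),
    }


# Group Statistics for totfinwell: urban (group 1) and rural (group 2)
results = t_tests_from_summary(75.53, 17.042, 480, 72.86, 17.833, 320)
for row, (t, df, se) in results.items():
    print(f"{row}: t = {t:.3f}, df = {df:.2f}, SE = {se:.3f}")
```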





2.5 Interpretation of the result

2.5.1 Check assumptions using the Independent Samples Test table (Levene's test for equality of variances)
Levene's test is not significant (F = 1.474, p = .225 > .05). Equal variances can therefore be assumed, and the analysis uses the t-test result in the first row (equal variances assumed): t(798) = 2.132, p = .033.

2.5.2 Since p = .033 < .05, the null hypothesis is rejected. It can be concluded that there is a statistically significant difference in mean financial wellbeing between urban residents (M = 75.53, SD = 17.042) and rural residents (M = 72.86, SD = 17.833).

2.5.3 The effect size (eta squared) can be calculated as:

$$\eta^2 = \frac{t^2}{t^2 + (N_1 + N_2 - 2)} = \frac{(2.132)^2}{(2.132)^2 + (480 + 320 - 2)} = \frac{4.545}{802.545} \approx .0057$$

The effect size, or the magnitude of the difference, is very small. Only about 0.57 per cent of the variance in financial wellbeing (eta squared multiplied by 100) is explained by residential area, so the strength of the difference is very low.
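The eta squared arithmetic can be checked in a few lines of Python (a sketch; the function name is ours):

```python
def eta_squared(t, n1, n2):
    """Eta squared for an independent-samples t-test: t^2 / (t^2 + (n1 + n2 - 2))."""
    t_sq = t * t
    return t_sq / (t_sq + (n1 + n2 - 2))


eta = eta_squared(2.132, 480, 320)
print(f"eta squared = {eta:.4f} ({eta * 100:.2f} per cent of variance)")
# eta squared = 0.0057 (0.57 per cent of variance)
```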