Introduction
This note is a guide to the STATA course. It works through a sample dataset in STATA format
(student_analysis.dta), continuing from the previous workshop (Day 1). Having explored and
cleaned the data, we now focus on basic statistical analysis.
Analysis
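1. One-sample t-test
2. Independent t-test
3. Paired t-test
4. One-way ANOVA
5. Chi-square test
6. Odds ratio
7. Correlation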
Step 5: Interpretation
The 95% confidence interval of the mean difference in systolic pressure does not include zero.
p-value < 0.001; reject H0.
Step 6: Conclusion
At the 5% level of significance, the mean systolic pressure differs from 120. The mean systolic
pressure is 115.65 (95% CI: 114.69, 116.60), lower than the population mean systolic pressure.
Step 7: Presentation of results
Table 1: Comparison of mean systolic pressure to the population mean of 120 (n = 438)
Parameter           Mean (SD)        95% Confidence Interval   t-statistic (df)   p-value
Systolic pressure   115.65 (10.21)   114.69, 116.60            -8.93 (437)        < 0.001
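The figures in Table 1 can be cross-checked by hand. The sketch below is written in Python rather than STATA and is not part of the course workflow. It recomputes the t-statistic and an approximate 95% confidence interval from the rounded summary values in the table. It assumes a normal critical value in place of the t critical value (reasonable at df = 437), so the last digit may differ slightly from the STATA output.

```python
import math
from statistics import NormalDist

# Rounded summary values taken from Table 1.
n = 438
mean = 115.65
sd = 10.21
mu0 = 120  # hypothesised population mean

se = sd / math.sqrt(n)        # standard error of the mean
t_stat = (mean - mu0) / se    # one-sample t-statistic
df = n - 1

# Assumption: normal critical value used instead of the t critical value.
z = NormalDist().inv_cdf(0.975)
ci_lower = mean - z * se
ci_upper = mean + z * se

print(f"t = {t_stat:.2f}, df = {df}")
print(f"95% CI: {ci_lower:.2f}, {ci_upper:.2f}")
```

Because the inputs are rounded to two decimals, the recomputed t-statistic agrees with the reported -8.93 only to about the first decimal place.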
Step 1: Hypothesis
H0: The mean BMI of males and females is the same.
HA: The mean BMI of males and females is different.
Step 2: Level of significance
α = 0.05
Step 3: Assumptions
1. Random sample
2. Two samples are independent
3. Two populations are normally distributed
There are two ways of checking normal distribution:
(i) Histogram with an overlaid normal curve
a) Select: Graphics > Histogram
c) Output:
Notes:
Hypotheses of Levene's test
H0: The variances between groups are the same
HA: The variances between groups are different (one-tailed)
d) Interpretation:
p-value > 0.05 (do not reject H0)
The variances are equal. Thus, the equal-variance assumption checked by Levene's test is met.
Step 4: Statistical test
1. Select: Statistics > Summaries, tables, and tests > Classical tests of hypotheses > t-test (mean-comparison test)
3. Output:
Step 5: Interpretation
The 95% confidence interval of the mean difference in BMI does not include zero.
p-value < 0.001; reject H0.
Step 6: Conclusion
At the 5% level of significance, the mean BMI differs between males and females. The mean
BMI of males (23.97 ± 0.26) is higher than that of females (20.89 ± 0.22).
Step 7: Presentation of results
Table 2: Comparison of mean BMI between genders (n = 438)
Group (n)      Mean (SD)      t-statistic (df)   p-value
Male (196)     23.97 (3.59)   9.07 (436)         < 0.001
Female (242)   20.89 (3.49)
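As a cross-check on Table 2, the sketch below (Python, not part of the STATA workflow) recomputes the independent-samples t-statistic from the rounded group summaries. It uses the pooled-variance form, which relies on the equal-variance assumption checked in Step 3. It also computes the standard error of each group mean, which appears to be the 0.26 and 0.22 quoted in the conclusion.

```python
import math

# Rounded group summaries taken from Table 2.
n_male, mean_male, sd_male = 196, 23.97, 3.59
n_female, mean_female, sd_female = 242, 20.89, 3.49

df = n_male + n_female - 2

# Pooled variance, valid under the equal-variance assumption.
pooled_var = ((n_male - 1) * sd_male**2 + (n_female - 1) * sd_female**2) / df
se_diff = math.sqrt(pooled_var * (1 / n_male + 1 / n_female))
t_stat = (mean_male - mean_female) / se_diff

# Standard error of each group mean (SD / sqrt(n)).
se_male = sd_male / math.sqrt(n_male)
se_female = sd_female / math.sqrt(n_female)

print(f"t = {t_stat:.2f}, df = {df}")
print(f"SE male = {se_male:.2f}, SE female = {se_female:.2f}")
```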
Step 5: Interpretation
The 95% confidence interval of the mean difference in BMI does not include zero.
p-value < 0.001; reject H0.
Step 6: Conclusion
At the 5% level of significance, the mean BMI before and after the aerobic activity differs. The
mean BMI before the activity is higher than the mean BMI after it, which suggests that the aerobic
activity is effective in lowering BMI.
Step 7: Presentation of results
Table 3: Comparison of mean BMI before and after an extensive aerobic activity (n = 438)
Group    Mean (SD)      t-statistic^a (df)   p-value
Before   22.27 (3.85)   22.01 (437)          < 0.001
After    21.41 (3.97)
a Paired t-test was applied.
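The paired t-statistic in Table 3 cannot be recomputed from the table alone, because it depends on the standard deviation of the within-person differences, which the table does not report. The sketch below (Python, not part of the STATA workflow) shows the mechanics on five hypothetical before/after BMI pairs. These values are invented for illustration and do not come from student_analysis.dta.

```python
import math
from statistics import mean, stdev

# Hypothetical BMI values for five participants (not from student_analysis.dta).
before = [22.0, 24.5, 21.0, 23.0, 25.0]
after = [21.5, 23.5, 20.5, 22.0, 24.5]

# The paired t-test works on the within-person differences.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_diff = mean(diffs)
se_diff = stdev(diffs) / math.sqrt(n)   # sample SD of the differences / sqrt(n)
t_stat = mean_diff / se_diff
df = n - 1

print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {df}")
```

The degrees of freedom are the number of pairs minus one, which is why Table 3 reports df = 437 for 438 participants.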
c) Output:
Notes:
Hypotheses of Bartlett's test
H0: The variances between groups are the same
HA: The variances between groups are different (one-tailed)
d) Interpretation:
p-value > 0.05 (do not reject H0)
The variances are equal. Thus, the equal-variance assumption checked by Bartlett's test is met.
Step 4: Statistical test
1. Select: Statistics > Linear models and related > ANOVA/MANOVA > One-way ANOVA
3. Output:
Step 5: Interpretation
p-value < 0.001, reject H0
Step 6: Conclusion
At the 5% level of significance, the mean height differs among races.
Additional analysis: Post-hoc analysis
Post-hoc analysis identifies which pairs of groups differ significantly in mean height.
Possible pairwise comparisons of height among groups:
- Malay vs Chinese
- Malay vs Indian
- Malay vs Others
- Indian vs Chinese
- Indian vs Others
- Chinese vs Others
a) Select: Statistics > Linear models and related > ANOVA/MANOVA > One-way ANOVA
d) Interpretation:
Malay vs Chinese: p-value = 0.001, reject H0
Malay vs Indian: p-value = 0.001, reject H0
Malay vs Others: p-value > 0.008, do not reject H0
Chinese vs Indian: p-value > 0.008, do not reject H0
Chinese vs Others: p-value > 0.008, do not reject H0
Indian vs Others: p-value > 0.008, do not reject H0
Bonferroni correction:
Adjusted alpha = α / number of pairs = 0.05 / 6 = 0.008
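The Bonferroni threshold of 0.008 follows from the number of pairwise comparisons. The sketch below (Python, not part of the STATA workflow) enumerates the pairs for the four race groups named above and divides the overall 5% level by the number of pairs.

```python
from itertools import combinations

races = ["Malay", "Chinese", "Indian", "Others"]
alpha = 0.05

# Every unordered pair of groups is one post-hoc comparison.
pairs = list(combinations(races, 2))
n_pairs = len(pairs)

# Bonferroni: divide the overall alpha by the number of comparisons.
alpha_adjusted = alpha / n_pairs

for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{n_pairs} pairs, adjusted alpha = {alpha_adjusted:.3f}")
```

With four groups there are six pairs, so each comparison is judged against 0.05 / 6, which rounds to 0.008.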
Step 6: Conclusion
At the 5% level of significance, the mean height differs significantly among races. Significant
differences are found between Malay and Chinese, and between Malay and Indian.
F-statistic (df)   p-value
9.59 (3)           < 0.001
3. Output:
Step 5: Interpretation
p-value < 0.05; reject H0.
χ² statistic = 4.29
Step 6: Conclusion
There is an association between hypertension and gender. The proportion with hypertension is higher
among males than among females.
Step 7: Presentation of results
Table 5: Association between hypertension and gender (n = 438)
Gender   Hypertensive, n (%)   Normal, n (%)   χ² statistic^a (df)   p-value
Male     14 (7.1)              182 (92.9)      4.29 (1)              0.038
Female   7 (2.9)               235 (97.1)
a Pearson Chi-square test was applied; level of significance was set at 5%.
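The chi-square statistic and p-value in Table 5 can be recomputed from the four cell counts. The sketch below (Python, not part of the STATA workflow) uses the shortcut formula for a 2 x 2 table without continuity correction. The p-value uses the fact that, for 1 degree of freedom, the upper-tail chi-square probability can be written with the complementary error function.

```python
import math

# Cell counts taken from Table 5.
male_hyper, male_normal = 14, 182
female_hyper, female_normal = 7, 235

n = male_hyper + male_normal + female_hyper + female_normal
row_male = male_hyper + male_normal
row_female = female_hyper + female_normal
col_hyper = male_hyper + female_hyper
col_normal = male_normal + female_normal

# Pearson chi-square for a 2x2 table (no continuity correction).
cross_diff = male_hyper * female_normal - male_normal * female_hyper
chi2 = n * cross_diff**2 / (row_male * row_female * col_hyper * col_normal)

# For 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

pct_male = 100 * male_hyper / row_male
pct_female = 100 * female_hyper / row_female

print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
print(f"hypertensive: males {pct_male:.1f}%, females {pct_female:.1f}%")
```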
Command: csi a b c d, or
3. Output:
Step 5: Interpretation
OR = 2.58 (95% CI: 1.05, 6.35). The 95% CI does not include 1, reject H0.
Step 6: Conclusion
The odds of hypertension among males are 2.58 times the odds among females.
Step 7: Presentation of results
Table 6: Association between hypertension and gender (n = 438)
Gender   Hypertensive, n (%)   Normal, n (%)   OR (95% CI)         χ² statistic^a (df)   p-value
Male     14 (7.1)              182 (92.9)      2.58 (1.05, 6.35)   4.29 (1)              0.038
Female   7 (2.9)               235 (97.1)
a Pearson Chi-square test was applied; level of significance was set at 5%.
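The odds ratio in Table 6 is the cross-product ratio of the four cell counts. The sketch below (Python, not part of the STATA workflow) recomputes it and adds a confidence interval using Woolf's log-based method. That method is an assumption chosen here for simplicity. The interval reported in the table may have been computed with a different approximation, so the limits from this sketch need not match it exactly.

```python
import math
from statistics import NormalDist

# Cell counts taken from Table 6.
male_hyper, male_normal = 14, 182
female_hyper, female_normal = 7, 235

# Odds ratio as the cross-product ratio.
odds_ratio = (male_hyper * female_normal) / (male_normal * female_hyper)

# Assumption: Woolf's log-based interval, chosen for simplicity.
se_log_or = math.sqrt(
    1 / male_hyper + 1 / male_normal + 1 / female_hyper + 1 / female_normal
)
z = NormalDist().inv_cdf(0.975)
ci_lower = math.exp(math.log(odds_ratio) - z * se_log_or)
ci_upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"OR = {odds_ratio:.2f}")
print(f"Woolf 95% CI: {ci_lower:.2f}, {ci_upper:.2f}")
```

Under either method the lower limit stays above 1, which is the condition used in Step 5 to reject H0.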
ANALYSIS 7: Correlation
Research question: The researchers wish to identify the relationship between weight and systolic pressure.
Step 1: Hypothesis
H0: There is no relationship between weight (kg) and systolic pressure (mmHg).
HA: There is a relationship between weight (kg) and systolic pressure (mmHg).
Step 2: Statistical test
1. Distribution of weight by histogram
a) Select: Graphics > Histogram
c) Output:
d) Interpretation:
Distribution of weight is approximately normal.
2. Distribution of weight by box and whisker plot
a) Select: Graphics > Boxplot
c) Output:
d) Interpretation:
Distribution of weight is approximately normal.
3. Distribution of systolic pressure by histogram and box and whisker plot.
Command:
. twoway (scatter vary varx) (lfit vary varx)
c) Output:
d) Interpretation: There is a positive correlation between weight and systolic blood pressure.
6. Strength of relationship between weight and systolic blood pressure
a) Select: Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Pairwise correlations
c) Output:
Notes:
Correlation coefficient (r)
r < 0.25: poor
0.26 to 0.50: fair
0.51 to 0.75: good
0.76 to 1.00: excellent
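The cut-offs in the notes can be written as a small lookup. The sketch below (Python, not part of the STATA workflow) uses a hypothetical helper named `correlation_strength`. It makes two assumptions that the notes do not state: the bands apply to the absolute value of r, and the small gaps between bands are closed by treating each upper cut-off as inclusive.

```python
def correlation_strength(r: float) -> str:
    """Label the size of r using the cut-offs listed in the notes."""
    size = abs(r)  # assumption: negative correlations use the same bands
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    # Assumption: the notes leave small gaps between bands (for example
    # 0.25 to 0.26); each upper cut-off is inclusive and anything above it
    # moves to the next band.
    if size <= 0.25:
        return "poor"
    if size <= 0.50:
        return "fair"
    if size <= 0.75:
        return "good"
    return "excellent"


print(correlation_strength(0.47))   # fair
```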
Step 3: Conclusion
There is a significant, positive and fair correlation between weight and systolic blood pressure
(r=0.47, p<0.001).
Step 4: Presentation of results
Table 7: Correlation between weight and systolic blood pressure (n = 438)
Variable                                   r^a    p-value
Weight (kg) and systolic pressure (mmHg)   0.47   < 0.001
a Pearson correlation test was applied; level of significance was set at 5%.
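The p-value in Table 7 comes from a t-statistic built from r and the sample size. The sketch below (Python, not part of the STATA workflow) recomputes it from the rounded values in the table. It assumes a normal approximation to the two-sided p-value, which is adequate with this many degrees of freedom.

```python
import math

# Values taken from Table 7.
r = 0.47
n = 438

df = n - 2
# Test statistic for H0: no linear correlation.
t_stat = r * math.sqrt(df / (1 - r**2))

# Assumption: with df this large, a normal approximation to the
# two-sided p-value is adequate.
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}, df = {df}, p < 0.001: {p_approx < 0.001}")
```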
Organising a filename.do
1. Open analysis_date.do > Create comments and notes as below
Notes:
Green text: comments and notes
/* : opening symbol for commenting out selected commands
*/ : closing symbol for commenting out selected commands