Data Analysis Project: Determining the Relationship between Blood Pressure and Smoking
Mariah Delaire
COH 602 Biostatistics
National University
June 5, 2015
The question addressed in this data analysis is whether there is a relationship between smoking and blood pressure. The extraneous variables considered in order to better address this question were weight status and gender. It is hypothesized that there is a higher proportion of overweight male smokers than overweight female smokers, causing males to have higher blood pressure. This project used the HEART data set, a subset of variables from the Framingham Heart Study. PROC CONTENTS was run in SAS to identify the variables used in the project.
For descriptive statistics, frequency distributions were computed with the PROC FREQ procedure in SAS for all of the categorical variables: blood pressure status, smoking status, weight status, and gender. The frequencies showed no obvious outliers skewing the data, so all variables were retained. PROC GCHART was then used to chart blood pressure status, smoking status, weight status, and gender to better visualize the distributions of the data. Next, PROC SORT was used to sort the data by gender, after which PROC FREQ was run again to compute the frequencies separately by gender and better assess the distributions between the variables. PROC GCHART was used once more to visualize these sorted distributions.
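PROC FREQ reports four numbers for each level of a variable: the frequency, the percent of non-missing observations, the cumulative frequency, and the cumulative percent. Below is a minimal Python sketch of that arithmetic. It is not SAS's implementation, and the function name `freq_table` is hypothetical. It assumes missing values have already been excluded, as the "Frequency Missing" lines in Table 2 suggest. It is checked against the female smoking-status counts reported in Table 2.

```python
from typing import Dict, List, Tuple

# One output row: (level, frequency, percent, cumulative frequency, cumulative percent).
Row = Tuple[str, int, float, int, float]

def freq_table(counts: Dict[str, int]) -> List[Row]:
    """One-way frequency table laid out like a PROC FREQ listing.

    Assumes missing values were removed from `counts`, so percentages
    are computed over non-missing observations only.
    """
    total = sum(counts.values())
    rows: List[Row] = []
    cumulative = 0
    for level, freq in counts.items():  # dicts keep insertion order (Python 3.7+)
        cumulative += freq
        rows.append((
            level,
            freq,
            round(100 * freq / total, 2),
            cumulative,
            round(100 * cumulative / total, 2),
        ))
    return rows

# Female smoking-status counts from Table 2 (the 17 missing values are excluded).
female_smoking = {
    "Heavy (16-25)": 339,
    "Light (1-5)": 422,
    "Moderate (6-15)": 340,
    "Non-smoker": 1682,
    "Very Heavy (>25)": 73,
}

for level, freq, pct, cum_freq, cum_pct in freq_table(female_smoking):
    print(f"{level:<18}{freq:>6}{pct:>8.2f}{cum_freq:>8}{cum_pct:>8.2f}")
```

Running the same function on each gender's counts separately mirrors the sorted PROC FREQ step described above.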
Inferential statistics were computed with the chi-squared procedure to assess the relationships between blood pressure status, smoking status, and weight status. The first step was to assess the relationships between these variables without stratifying by the extraneous variable, gender. The chi-squared procedure was the most appropriate test because all variables used in this study were categorical. T tests or ANOVA would not suffice, because they require continuous data, which this study did not have.
The chi-squared procedure was first used to assess the relationship between blood pressure status and smoking status. After that procedure was run, the same was done for blood pressure status and weight status. These tests were done separately to assess the relationship of each variable to blood pressure status, the dependent variable. The last procedure run without the extraneous variable was a chi-squared test using all three categorical variables: blood pressure status, smoking status, and weight status.
After these relationships were established, the extraneous variable, gender, was added to stratify the data and test the research question. PROC SORT was used to sort the data by sex so that the chi-squared procedure could be run separately for males and females. The chi-squared procedure was then used to assess the relationship between blood pressure status and smoking status within each gender. The same was done for blood pressure status and weight status. The last procedure was a chi-squared test of the relationship between blood pressure status, smoking status, and weight status within each gender.
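Stratifying by gender means running the same test of independence separately within each stratum, roughly what a BY-group analysis after PROC SORT does in SAS. Below is a minimal Python sketch of that idea. It builds a contingency table within each sex and computes the Pearson statistic. Expected counts are E = (row total x column total) / N, and the test has (r - 1)(c - 1) degrees of freedom. The record layout, the function names, and the tiny data set are hypothetical illustrations, not the HEART data.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (sex, bp_status, smoking_status).
Record = Tuple[str, str, str]

def crosstab(pairs: Iterable[Tuple[str, str]]) -> List[List[int]]:
    """Count (row_level, col_level) pairs into an r x c table with sorted levels."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    return [[counts[(r, c)] for c in cols] for r in rows]  # Counter gives 0 for absent cells

def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-squared statistic sum((O - E)^2 / E) and its degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def stratified_chi_square(records: Iterable[Record]) -> Dict[str, Tuple[float, int]]:
    """Run a separate BP-status-by-smoking-status test within each sex."""
    strata: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for sex, bp, smoking in records:
        strata[sex].append((bp, smoking))
    return {sex: chi_square(crosstab(pairs)) for sex, pairs in strata.items()}

# Made-up cell counts: (sex, BP status, smoking status, number of records).
cells = [
    ("Female", "High", "Non-smoker", 10), ("Female", "High", "Smoker", 20),
    ("Female", "Normal", "Non-smoker", 30), ("Female", "Normal", "Smoker", 40),
    ("Male", "High", "Non-smoker", 10), ("Male", "High", "Smoker", 10),
    ("Male", "Normal", "Non-smoker", 20), ("Male", "Normal", "Smoker", 20),
]
records = [(sex, bp, smk) for sex, bp, smk, count in cells for _ in range(count)]

for sex, (stat, df) in stratified_chi_square(records).items():
    print(f"{sex}: chi-squared = {stat:.4f}, df = {df}")
```

In the made-up male stratum the counts are exactly proportional, so the statistic is 0. The female stratum gives 50/63 (about 0.794) on 1 degree of freedom.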
Hypothesis testing was done to determine whether the observed relationships were statistically significant. The null hypothesis is that blood pressure status, smoking status, and weight status are independent, regardless of whether extraneous variables are included. The alternative hypothesis is that there is a relationship between the variables and that smoking status and weight status affect blood pressure status. The appropriate test statistic was the chi-squared statistic, $\chi^2 = \sum \frac{(O - E)^2}{E}$, where $O$ and $E$ are the observed and expected counts in each cell of the table. To decide whether to reject the null hypothesis, the critical value for each test was read from a table of the chi-squared distribution, using that test's degrees of freedom and a significance level of $\alpha = 0.05$. For blood pressure status * smoking status, the critical value was 18.31 (10 degrees of freedom), so the result is considered significant if $\chi^2 \geq 18.31$. For blood pressure status * weight status, the critical value was 12.59 (6 degrees of freedom), so the result is significant if $\chi^2 \geq 12.59$. The critical values for the remaining tests are listed in Table 3.
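The critical values above come from a printed chi-squared table. As a cross-check, the Python sketch below computes them with the standard library alone. It calculates the upper-tail probability of the chi-squared distribution from the series expansion of the regularized lower incomplete gamma function, then inverts it by bisection to recover a critical value. The function names are hypothetical. This is an illustration, not how SAS computes its p-values; in practice a library routine such as `scipy.stats.chi2` would normally be used.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for X ~ chi-squared with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the lower regularized incomplete gamma function P(a, z):
    # P(a, z) = z**a * exp(-z) / Gamma(a + 1) * sum_{n>=0} z**n / ((a+1)(a+2)...(a+n))
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)

def critical_value(df: int, alpha: float = 0.05) -> float:
    """x such that P(X >= x) = alpha, found by bisection (the tail probability decreases in x)."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too heavy: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

for df in (6, 10, 15):
    print(df, round(critical_value(df), 2))
```

Rounded to two decimals, the computed values match the table values 12.59, 18.31 and 25.00. `chi2_sf(21.774, 15)` comes out close to the 0.1139 p-value that SAS reported for the male three-way test.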
Table 1: Variables used in the data analysis
VARIABLE NAME | VARIABLE LABEL | RESPONSE VALUES
BP_STATUS | | 1 = High; 2 = Normal; 3 = Optimal
SEX | | 1 = Female; 2 = Male
SMOKING_STATUS | Smoking Status | 1 = Heavy (16-25); 2 = Light (1-5); 3 = Moderate (6-15); 4 = Non-smoker; 5 = Very Heavy (>25)
WEIGHT_STATUS | Weight Status | 1 = Normal; 2 = Overweight; 3 = Underweight
As seen in Table 1, the variables used in the data analysis were blood pressure status, gender, smoking status, and weight status. Table 2 shows the frequencies of all the categorical variables for each gender. As expected, males had a higher percentage of heavy smokers than females, along with a higher percentage of overweight individuals. 11.87 percent of females are heavy smokers, compared with 30.51 percent of males. 66.47 percent of females are overweight, compared with 70.39 percent of males. The percentage of females with high blood pressure is 41.28, while the percentage of males is 46.28. These results are consistent with the idea that males have higher blood pressure due to their weight and smoking status.
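The percentages compared above follow directly from the Table 2 counts: each category count is divided by that gender's non-missing total. The sketch below redoes that arithmetic in Python. The dictionary layout and the `percent` helper are hypothetical; the counts are copied from Table 2.

```python
from typing import Dict, Tuple

# (count in category, non-missing total) for each sex, copied from Table 2.
TABLE2: Dict[str, Dict[str, Tuple[int, int]]] = {
    "heavy smokers": {"Female": (339, 2856), "Male": (707, 2317)},
    "overweight": {"Female": (1907, 2869), "Male": (1643, 2334)},
    "high blood pressure": {"Female": (1186, 2873), "Male": (1081, 2336)},
}

def percent(count: int, total: int) -> float:
    """Percentage of non-missing observations, rounded like the PROC FREQ output."""
    return round(100 * count / total, 2)

for category, by_sex in TABLE2.items():
    female = percent(*by_sex["Female"])
    male = percent(*by_sex["Male"])
    print(f"{category}: female {female:.2f}%, male {male:.2f}%, "
          f"difference {male - female:.2f} points")
```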
Table 2: Frequencies for categorical variables between genders

Smoking Status (females)
Smoking_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Heavy (16-25) | 339 | 11.87 | 339 | 11.87
Light (1-5) | 422 | 14.78 | 761 | 26.65
Moderate (6-15) | 340 | 11.90 | 1101 | 38.55
Non-smoker | 1682 | 58.89 | 2783 | 97.44
Very Heavy (>25) | 73 | 2.56 | 2856 | 100.00
Frequency Missing = 17

Weight Status (females)
Weight_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Normal | 846 | 29.49 | 846 | 29.49
Overweight | 1907 | 66.47 | 2753 | 95.96
Underweight | 116 | 4.04 | 2869 | 100.00
Frequency Missing = 4

Blood Pressure Status (females)
BP_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
High | 1186 | 41.28 | 1186 | 41.28
Normal | 1166 | 40.58 | 2352 | 81.87
Optimal | 521 | 18.13 | 2873 | 100.00

Smoking Status (males)
Smoking_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Heavy (16-25) | 707 | 30.51 | 707 | 30.51
Light (1-5) | 157 | 6.78 | 864 | 37.29
Moderate (6-15) | 236 | 10.19 | 1100 | 47.48
Non-smoker | 819 | 35.35 | 1919 | 82.82
Very Heavy (>25) | 398 | 17.18 | 2317 | 100.00
Frequency Missing = 19

Weight Status (males)
Weight_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Normal | 626 | 26.82 | 626 | 26.82
Overweight | 1643 | 70.39 | 2269 | 97.22
Underweight | 65 | 2.78 | 2334 | 100.00
Frequency Missing = 2

Blood Pressure Status (males)
BP_Status | Frequency | Percent | Cumulative Frequency | Cumulative Percent
High | 1081 | 46.28 | 1081 | 46.28
Normal | 977 | 41.82 | 2058 | 88.10
Optimal | 278 | 11.90 | 2336 | 100.00
Inferential statistics were used to determine the relationship between blood pressure and smoking, with weight and gender as extraneous variables. Table 3 below shows the results of the chi-squared procedure in SAS. The first tests were run without stratifying the data by sex. The relationship between blood pressure status and smoking status was significant because 89.94 > 18.31, with a p-value < 0.0001. The same held for blood pressure status and weight status, with a p-value < 0.0001 and a test statistic of 386.667 > 12.59. When all three variables were tested together, the result was again significant. The p-value for blood pressure status, smoking status, and weight status was < 0.0001, with a test statistic of 58.21 > 25.00.
Table 3: Chi-Squared Procedure Results
Test | DF | Chi-Squared Value | P-Value | Critical Value
BPStat*SmokingStat | 10 | 89.935 | <0.0001 | 18.31
BPStat*WeightStat | 6 | 386.667 | <0.0001 | 12.59
BPStat*WeightStat*SmokingStat | 15 | 58.206 | <0.0001 | 25.00
BPStat*SmokingStat (female) | 10 | 102.387 | <0.0001 | 18.31
BPStat*SmokingStat (male) | 10 | 26.774 | 0.0028 | 18.31
BPStat*WeightStat (female) | 6 | 247.180 | <0.0001 | 12.59
BPStat*WeightStat (male) | 6 | 138.147 | <0.0001 | 12.59
BPStat*WeightStat*SmokingStat (female) | 15 | 65.452 | <0.0001 | 25.00
BPStat*WeightStat*SmokingStat (male) | 15 | 21.774 | 0.1139 | 25.00
After stratifying the data by gender, most results did not change substantially. For blood pressure status * smoking status in females, the test statistic was 102.39 > 18.31 with a p-value < 0.0001, indicating a significant relationship between the variables. Testing the same variables for males gave a test statistic of 26.77 > 18.31 with a p-value of 0.003. These results showed a significant relationship between blood pressure and smoking in males as well. The next test, blood pressure status and weight status in females, yielded a test statistic of 247.18 > 12.59 with a p-value < 0.0001, which was also significant. The same test for males gave a test statistic of 138.15 > 12.59 with a p-value < 0.0001, showing a significant relationship between blood pressure and weight in males.

The last chi-squared procedure tested blood pressure status, smoking status, and weight status in males and females separately. For females, the test statistic was 65.45 > 25.00 with a p-value < 0.0001, indicating a significant relationship between these variables. For males, the test statistic was 21.77 < 25.00 with a p-value of 0.114, so the relationship was not significant.
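The decision rule applied throughout these results can be checked mechanically. The Python sketch below hard-codes the degrees of freedom and test statistics reported in Table 3, along with the α = 0.05 critical values used in the paper. It rejects the null hypothesis when a statistic reaches its critical value. The names `CRITICAL_05`, `RESULTS`, and `decision` are hypothetical.

```python
from typing import Dict, List, Tuple

# alpha = 0.05 critical values used in the paper, keyed by degrees of freedom.
CRITICAL_05: Dict[int, float] = {6: 12.59, 10: 18.31, 15: 25.00}

# (test, degrees of freedom, chi-squared value) as reported in Table 3.
RESULTS: List[Tuple[str, int, float]] = [
    ("BPStat*SmokingStat", 10, 89.935),
    ("BPStat*WeightStat", 6, 386.667),
    ("BPStat*WeightStat*SmokingStat", 15, 58.206),
    ("BPStat*SmokingStat (female)", 10, 102.387),
    ("BPStat*SmokingStat (male)", 10, 26.774),
    ("BPStat*WeightStat (female)", 6, 247.180),
    ("BPStat*WeightStat (male)", 6, 138.147),
    ("BPStat*WeightStat*SmokingStat (female)", 15, 65.452),
    ("BPStat*WeightStat*SmokingStat (male)", 15, 21.774),
]

def decision(stat: float, df: int) -> str:
    """Reject H0 when the statistic reaches the critical value for its df."""
    return "reject H0" if stat >= CRITICAL_05[df] else "fail to reject H0"

for name, df, stat in RESULTS:
    print(f"{name:<40} {stat:>8.3f} vs {CRITICAL_05[df]:>5.2f}: {decision(stat, df)}")
```

Only the male three-way test falls short of its critical value, which matches the discussion above.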
Overall, the unstratified data produced more significant results, which aligned more closely with the alternative hypothesis. As seen in Table 3, all of the unstratified tests had p-values below 0.0001 and test statistics well above their critical values. This indicates a significant relationship between blood pressure status, smoking status, and weight status. After stratifying the data, the associations were stronger among females than among males. The test statistics for females were much larger, whereas those for males were smaller, and one combination of variables was not significant in males. As a result, the data do not support the hypothesis that males have higher blood pressure than females due to weight and smoking. The results actually suggest the reverse, with females showing stronger associations between all the variables. In conclusion, for the three-way test in males we fail to reject the null hypothesis, because 21.77 < 25.00 with a p-value of 0.114. We do not have statistically significant evidence that blood pressure is higher in males than in females due to smoking and weight. However, the results are significant when the data are not stratified and each variable is assessed individually against blood pressure status.