Data Analysis Project: Determining the Relationship between Blood Pressure and
Smoking

Mariah Delaire
COH 602 Biostatistics
National University
June 5, 2015


The question addressed in this data analysis is whether there is a relationship between smoking
and blood pressure. The extraneous variables considered to better address this question were
weight status and gender. It is hypothesized that there is a higher proportion of overweight male
smokers than overweight female smokers, causing males to have higher blood pressure. This
project used the HEART data set, which is a subset of variables from the Framingham Heart
Study. PROC CONTENTS was run in SAS to determine which variables would be used for the
project.
For descriptive statistics, frequency distributions were computed with PROC FREQ in SAS for
all of the categorical variables: blood pressure status, smoking status, weight status, and gender.
The frequencies showed no obvious outliers skewing the data, so all variables were kept in the
analysis. PROC GCHART was then used to graph blood pressure status, smoking status, weight
status, and gender so that their distributions could be visualized. The data were then sorted by
gender using PROC SORT, after which PROC FREQ was run again to produce the frequencies by
gender and better compare the distributions between males and females. PROC GCHART was
run once more on the sorted data to visualize the distributions by gender.
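The frequency, percent, and cumulative columns that PROC FREQ reports can be reproduced by hand. The following is a minimal Python sketch, not the SAS code used in this project, that rebuilds those columns from the female smoking-status counts reported in Table 2. The function name `frequency_table` is hypothetical, and the sketch assumes percentages are taken over non-missing observations.

```python
def frequency_table(counts):
    """Return (level, frequency, percent, cumulative frequency, cumulative percent) rows.

    Assumption: percentages are computed over non-missing observations only.
    """
    total = sum(counts.values())
    rows = []
    running = 0
    for level, freq in counts.items():
        running += freq
        rows.append((level, freq,
                     round(100 * freq / total, 2),
                     running,
                     round(100 * running / total, 2)))
    return rows


# Female smoking-status counts as reported in Table 2 (17 missing values excluded).
female_smoking = {
    "Heavy (16-25)": 339,
    "Light (1-5)": 422,
    "Moderate (6-15)": 340,
    "Non-smoker": 1682,
    "Very Heavy (>25)": 73,
}

rows = frequency_table(female_smoking)
for row in rows:
    print("{:<18}{:>6}{:>8.2f}{:>6}{:>8.2f}".format(*row))
```

Running the same function on the male counts gives a quick check that the percentages quoted in the text match the tabulated frequencies.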
Inferential statistics were conducted using the chi-squared procedure to assess the
relationships between blood pressure status, smoking status, and weight status. The first step was
to set aside the extraneous variable, gender, and assess the relationships among these variables in
the unstratified data. The chi-squared test was the most appropriate test because all variables used
in this study were categorical; t tests or ANOVA would not suffice because they require
continuous data.

The chi-squared procedure was first used to assess the relationship between blood pressure status
and smoking status. After that procedure was run, the same was done for blood pressure status
and weight status. These were done separately to assess each variable's individual relationship to
the dependent variable, blood pressure status. The last procedure run without the extraneous
variable was a chi-squared test using all three categorical variables: blood pressure status,
smoking status, and weight status.
After these relationships were seen, the extraneous variable, gender, was added to stratify
the data and test the research question. PROC SORT was used to stratify the data into males and
females, after which the chi-squared procedure was run to assess the relationship between blood
pressure status and smoking status within each gender. The same was done for blood pressure
status and weight status. The last procedure was a chi-squared test of the relationship between
blood pressure status, smoking status, and weight status within each gender.
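The stratification step can be illustrated outside SAS. The following Python sketch groups records by gender and counts each combination of blood pressure status and smoking status within a stratum, which is the cross-tabulation a chi-squared test would then be run on. The records are hypothetical and are not Framingham data; the function name `stratified_crosstabs` and the choice to skip records with a missing value are assumptions made for the illustration.

```python
from collections import Counter, defaultdict


def stratified_crosstabs(records, stratum_key, row_key, col_key):
    """Count (row, column) pairs separately within each stratum."""
    tables = defaultdict(Counter)
    for rec in records:
        values = (rec.get(stratum_key), rec.get(row_key), rec.get(col_key))
        if None in values:  # assumption: skip records with a missing value
            continue
        stratum, row, col = values
        tables[stratum][(row, col)] += 1
    return dict(tables)


# Hypothetical records for illustration only; these are not Framingham data.
records = [
    {"Sex": "Female", "BP_Status": "High", "Smoking_Status": "Non-smoker"},
    {"Sex": "Female", "BP_Status": "Normal", "Smoking_Status": "Light (1-5)"},
    {"Sex": "Female", "BP_Status": "High", "Smoking_Status": "Non-smoker"},
    {"Sex": "Male", "BP_Status": "High", "Smoking_Status": "Heavy (16-25)"},
    {"Sex": "Male", "BP_Status": "Optimal", "Smoking_Status": None},
]

tables = stratified_crosstabs(records, "Sex", "BP_Status", "Smoking_Status")
for sex, table in tables.items():
    print(sex, dict(table))
```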
Hypothesis testing was done to determine whether the results were statistically significant.
The null hypothesis is that blood pressure status, smoking status, and weight status are
independent, whether or not the extraneous variable is taken into account. The alternative
hypothesis is that there is a relationship between these variables and that smoking status and
weight status affect blood pressure status. The appropriate test statistic was the chi-squared
statistic, $\chi^2 = \sum (O - E)^2 / E$, where O is the observed count and E is the expected count
in each cell. To decide whether to reject the null hypothesis, the critical value was determined
from the degrees of freedom for each test and the table of the chi-square distribution; the values
used correspond to a significance level of 0.05. For blood pressure status * smoking status, the
critical value was 18.31, so if $\chi^2 \geq 18.31$ the result is considered significant. For blood
pressure status * weight status, the critical value was 12.59, so if $\chi^2 \geq 12.59$ the result is
significant. The critical values for the remaining tests are listed below in Table 3.
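To make the test statistic concrete, the following Python sketch computes the sum of (O - E)^2 / E over the cells of a two-way table, taking each expected count as row total times column total divided by the grand total. The counts are hypothetical and do not come from the HEART data set, and the function name `chi_square_statistic` is an assumption made for the illustration.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            statistic += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical 2x2 counts for illustration only (not from the HEART data set).
observed = [[10, 20],
            [20, 10]]
statistic, df = chi_square_statistic(observed)
print(round(statistic, 3), df)
```

The statistic is then compared with the critical value for the returned degrees of freedom, as described above.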

VARIABLE NAME     VARIABLE LABEL           RESPONSE VALUES
----------------  -----------------------  --------------------
BP_STATUS         Blood Pressure Status    1 = High
                                           2 = Normal
                                           3 = Optimal
SEX                                        1 = Female
                                           2 = Male
SMOKING_STATUS    Smoking Status           1 = Heavy (16-25)
                                           2 = Light (1-5)
                                           3 = Moderate (6-15)
                                           4 = Non-smoker
                                           5 = Very Heavy (>25)
WEIGHT_STATUS     Weight Status            1 = Normal
                                           2 = Overweight
                                           3 = Underweight

Table 1: Variables used in Data Analysis

As seen in Table 1, the variables used in the data analysis were blood pressure status,
gender, smoking status, and weight status. Table 2 shows the frequencies of the categorical
variables by gender. As expected, males had a higher percentage of heavy smokers than females,
along with a higher percentage of overweight individuals: 11.87 percent of females are heavy
smokers compared with 30.51 percent of males, and 66.47 percent of females are overweight
compared with 70.39 percent of males. The percentage of females who have high blood pressure
is 41.28, while the percentage of males is 46.28. These results are consistent with the idea that
males have higher blood pressure due to their weight and smoking status.

Table 2: Frequencies for categorical variables between genders

Smoking Status (females)

Smoking_Status     Frequency   Percent   Cumulative Frequency   Cumulative Percent
Heavy (16-25)            339     11.87                    339                11.87
Light (1-5)              422     14.78                    761                26.65
Moderate (6-15)          340     11.90                   1101                38.55
Non-smoker              1682     58.89                   2783                97.44
Very Heavy (>25)          73      2.56                   2856               100.00

Frequency Missing = 17

Weight Status (females)

Weight_Status      Frequency   Percent   Cumulative Frequency   Cumulative Percent
Normal                   846     29.49                    846                29.49
Overweight              1907     66.47                   2753                95.96
Underweight              116      4.04                   2869               100.00

Frequency Missing = 4

Blood Pressure Status (females)

BP_Status          Frequency   Percent   Cumulative Frequency   Cumulative Percent
High                    1186     41.28                   1186                41.28
Normal                  1166     40.58                   2352                81.87
Optimal                  521     18.13                   2873               100.00

Smoking Status (males)

Smoking_Status     Frequency   Percent   Cumulative Frequency   Cumulative Percent
Heavy (16-25)            707     30.51                    707                30.51
Light (1-5)              157      6.78                    864                37.29
Moderate (6-15)          236     10.19                   1100                47.48
Non-smoker               819     35.35                   1919                82.82
Very Heavy (>25)         398     17.18                   2317               100.00

Frequency Missing = 19

Weight Status (males)

Weight_Status      Frequency   Percent   Cumulative Frequency   Cumulative Percent
Normal                   626     26.82                    626                26.82
Overweight              1643     70.39                   2269                97.22
Underweight               65      2.78                   2334               100.00

Frequency Missing = 2

Blood Pressure Status (males)

BP_Status          Frequency   Percent   Cumulative Frequency   Cumulative Percent
High                    1081     46.28                   1081                46.28
Normal                   977     41.82                   2058                88.10
Optimal                  278     11.90                   2336               100.00

Inferential statistics were used to determine the relationship between blood pressure and
smoking, with weight and gender as extraneous variables. Table 3 below shows the results of
the chi-squared procedure in SAS. The first tests were run without stratifying the data between
males and females. The relationship between blood pressure status and smoking status was
significant because 89.94 > 18.31, with a p-value < 0.0001. The same occurred for blood pressure
status and weight status, with a test statistic of 386.667 > 12.59 and a p-value < 0.0001. With all
three variables tested together using chi-squared, the result was again significant: the test
statistic for blood pressure status, smoking status, and weight status was 58.21 > 25.00, with a
p-value < 0.0001.
Table 3: Chi-Squared Procedure Results

Test                                    DF   Chi-Square Value   P-Value   Critical Value
BPStat*SmokingStat                      10             89.935   <0.0001            18.31
BPStat*WeightStat                        6            386.667   <0.0001            12.59
BPStat*WeightStat*SmokingStat           15             58.206   <0.0001            25.00
BPStat*SmokingStat (female)             10            102.387   <0.0001            18.31
BPStat*SmokingStat (male)               10             26.774    0.0028            18.31
BPStat*WeightStat (female)               6            247.180   <0.0001            12.59
BPStat*WeightStat (male)                 6            138.147    <0.001            12.59
BPStat*WeightStat*SmokingStat (female)  15             65.452   <0.0001            25.00
BPStat*WeightStat*SmokingStat (male)    15             21.774    0.1139            25.00

After sorting the data between genders, the results did not change substantially. For
blood pressure status * smoking status in females, the test statistic was 102.39 > 18.31 with a
p-value of < 0.0001, which showed a significant relationship between the variables. Testing the
same variables for males resulted in a test statistic of 26.77 > 18.31 with a p-value of 0.003, so
there was also a significant relationship between blood pressure and smoking in males. The next
variables tested were blood pressure status and weight status in females, which yielded a test
statistic of 247.18 > 12.59 with a p-value of < 0.0001, making it significant as well. The same
test for males resulted in a test statistic of 138.15 > 12.59 with a p-value of < 0.0001, which
showed that there was a significant relationship between blood pressure and weight in males.
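The reported p-values can be cross-checked from the test statistic and degrees of freedom. The Python sketch below uses the closed-form upper-tail probability of the chi-square distribution that holds when the degrees of freedom are even, and applies it to the values reported for blood pressure status * smoking status in males (26.774 with 10 degrees of freedom). The function name `chi_square_p_value_even_df` is hypothetical, and the sketch deliberately does not cover odd degrees of freedom such as the 15 used for the three-variable tests.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail chi-square probability using the closed form for even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{i=0}^{m-1} (x/2)**i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    half = statistic / 2
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


# Values reported for blood pressure status * smoking status in males.
p_male = chi_square_p_value_even_df(26.774, 10)
print(round(p_male, 4))
```

The result agrees with the reported p-value of 0.0028 to four decimal places.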
The last chi-squared procedure tested blood pressure status, smoking status, and weight
status together in males and in females. For females, the test statistic was 65.45 > 25.00 with a
p-value of < 0.0001, indicating a significant relationship between these variables. For males, the
test statistic was 21.77 < 25.00 with a p-value of 0.114, so the relationship was not significant.
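The decision rule used throughout the results can be summarized in a few lines. The Python sketch below applies it to the nine test statistics and critical values quoted in the text, with statistics rounded to two decimals; the short test labels and the function name `reject_null` are hypothetical.

```python
# (test, chi-square statistic, critical value) as quoted in the text,
# with statistics rounded to two decimals.
results = [
    ("BP*Smoking (all)", 89.94, 18.31),
    ("BP*Weight (all)", 386.67, 12.59),
    ("BP*Weight*Smoking (all)", 58.21, 25.00),
    ("BP*Smoking (female)", 102.39, 18.31),
    ("BP*Smoking (male)", 26.77, 18.31),
    ("BP*Weight (female)", 247.18, 12.59),
    ("BP*Weight (male)", 138.15, 12.59),
    ("BP*Weight*Smoking (female)", 65.45, 25.00),
    ("BP*Weight*Smoking (male)", 21.77, 25.00),
]


def reject_null(statistic, critical_value):
    """Reject the null hypothesis when the statistic reaches the critical value."""
    return statistic >= critical_value


not_significant = [name for name, stat, crit in results if not reject_null(stat, crit)]
for name, stat, crit in results:
    verdict = "reject H0" if reject_null(stat, crit) else "fail to reject H0"
    print(f"{name:<28}{stat:>8.2f}{crit:>7.2f}  {verdict}")
```

Only the three-variable test in males falls below its critical value, which matches the one non-significant result described above.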
In conclusion, the unsorted data produced more significant results, which were more
consistent with the alternative hypothesis. As seen in Table 3, every test on the unsorted data had
a test statistic above its critical value and a p-value below 0.0001. This showed that there was a
significant relationship between blood pressure status, smoking status, and weight status. After
sorting the data, the relationships were stronger for females than for males: the test statistics for
females were much larger, whereas those for males were much smaller and included one set of
variables whose relationship was not significant. As a result, males do not appear to have higher
blood pressure due to weight and smoking in comparison to females. The results actually show
the reverse, with females having a higher level of significance across all the variables. Overall,
we fail to reject the null hypothesis because the three-variable test for males gave 21.77 < 25.00
with a p-value of 0.114. We do not have statistically significant evidence to show that blood
pressure is higher in males than in females due to smoking and weight. However, the
relationships were significant when the data were not stratified and when smoking status and
weight status were each assessed individually against blood pressure status.

