- tests whether $\beta_i \neq 0$; $i = 1, 2, 3, \ldots, k$
SIMPLE LINEAR REGRESSION
- used to estimate the dependent variable Y for a given value of the independent variable X.
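$$Y = a + bX + \varepsilon$$
or
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
where
$$\beta_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
$$\beta_0 = \frac{\sum Y}{n} - \beta_1 \frac{\sum X}{n}$$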
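and inference on $\beta_1$ may be performed to determine whether it is significantly different from zero ($\beta_1 \neq 0$), using
$$t = \frac{\beta_1 - 0}{s_{\beta_1}}, \qquad s_{\beta_1} = \frac{s_{Y|X}}{\sqrt{\sum X^2 - \left(\sum X\right)^2 / n}}; \quad \text{with } df = n - 2$$
where $s_{Y|X}$ is the standard error of estimate.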
- A linear relationship (linearity) exists between Y and X if the p-value of $\beta_1$ (using the t-test) is less than $\alpha$.
- $R^2$ is the proportion of the total variance ($s^2$) of Y that can be explained by the linear regression of Y on X.
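As a minimal sketch of the slope t-test and $R^2$ described above, the Python function below computes both from raw data. The function name `slope_inference` and the five-point data set are illustrative assumptions and are not part of the handout. The p-value is not computed here; it would be read from a t table or statistical software using df = n - 2.

```python
import math

def slope_inference(x, y):
    """Return slope, intercept, t statistic for Ho: slope = 0, df, and R^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)

    # Corrected sums of squares and cross-products
    ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
    ss_yy = sum(v * v for v in y) - sum_y ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n

    b1 = ss_xy / ss_xx                  # slope
    b0 = sum_y / n - b1 * sum_x / n     # intercept

    sse = ss_yy - b1 * ss_xy            # residual sum of squares
    s_yx = math.sqrt(sse / (n - 2))     # standard error of estimate
    t = (b1 - 0) / (s_yx / math.sqrt(ss_xx))
    r2 = 1 - sse / ss_yy                # proportion of variance explained
    return {"b1": b1, "b0": b0, "t": t, "df": n - 2, "r2": r2}

# Made-up data, for illustration only
result = slope_inference([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(result)
```

The returned t statistic is compared with the critical t value at the chosen $\alpha$, and `r2` is read as the share of the variance of Y explained by X.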
Example:
Using the data in the file HCTRBC.sav, find the linear
regression model that estimates the RBC (Y, in $\times 10^{12}$/L), given the
hematocrit (X, in % vol) of a patient.
Find $Y = \beta_0 + \beta_1 X + \varepsilon$
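$$\beta_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$
= ______________
$$\beta_0 = \frac{\sum Y}{n} - \beta_1 \frac{\sum X}{n}$$
= ______________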
ID    | HCT (% vol), X | RBC (x10^12/L), Y | X^2      | Y^2    | XY
------|----------------|-------------------|----------|--------|---------
1     | 40.7           | 4.4               | 1656.49  | 19.36  | 179.08
2     | 40.3           | 4.3               | 1624.09  | 18.49  | 173.29
3     | 40.9           | 4.4               | 1672.81  | 19.36  | 179.96
4     | 38.7           | 4.1               | 1497.69  | 16.81  | 158.67
5     | 38.2           | 4.1               | 1459.24  | 16.81  | 156.62
6     | 39.4           | 4.2               | 1552.36  | 17.64  | 165.48
7     | 38             | 4.1               | 1444     | 16.81  | 155.8
8     | 38.2           | 4                 | 1459.24  | 16     | 152.8
9     | 43.4           | 4.6               | 1883.56  | 21.16  | 199.64
10    | 38.3           | 4.1               | 1466.89  | 16.81  | 157.03
SUMS: | 396.1          | 42.3              | 15716.37 | 179.25 | 1678.37
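The column sums are enough to evaluate the least-squares formulas for this example. The sketch below does that arithmetic in Python; the variable names and the helper `predict_rbc` are illustrative assumptions, and the result should be checked against the software output for HCTRBC.sav.

```python
# Sums taken from the HCT/RBC table (n = 10 patients)
n = 10
sum_x, sum_y = 396.1, 42.3          # sum of X (HCT) and sum of Y (RBC)
sum_x2, sum_xy = 15716.37, 1678.37  # sum of X^2 and sum of XY

# Least-squares slope and intercept from the sums
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

print(f"slope b1     = {b1:.4f}")
print(f"intercept b0 = {b0:.4f}")

def predict_rbc(hct):
    """Estimated RBC (x10^12/L) for a hematocrit value given in % vol."""
    return b0 + b1 * hct

print(f"estimated RBC at HCT = 40: {predict_rbc(40.0):.2f}")
```

With these sums the slope comes out at about 0.107 and the intercept is close to zero, so the fitted line is approximately RBC = 0.107 x HCT.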
Written by: Asst. Prof. Xandro Alexi A. Nieto of UST Faculty of Pharmacy
- A linear relationship (linearity) exists between Y and $X_k$ if the p-value of $\beta_k$ is less than $\alpha$, using the individual t-tests of
the ANOVA result.
- Hypotheses are as follows:
Ho: $\beta_k = 0$.
Ha: $\beta_k \neq 0$.
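The individual t-tests can be sketched in Python as follows, assuming ordinary least squares solved through the normal equations. The helper names `invert_matrix` and `ols_t_tests` and the six-row data set are made up for illustration; in practice these numbers come from the regression output of the statistical software. Each t statistic has df = n - k - 1, and its p-value is then compared with $\alpha$.

```python
import math

def invert_matrix(a):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(a)
    # Augment the matrix with the identity matrix
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_t_tests(rows, y):
    """Return (coefficients, t statistics, df); the intercept comes first."""
    X = [[1.0] + list(r) for r in rows]   # prepend the intercept column
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert_matrix(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]

    residuals = [y[i] - sum(X[i][a] * beta[a] for a in range(p))
                 for i in range(n)]
    df = n - p                            # n - k - 1 with k predictors
    mse = sum(e * e for e in residuals) / df
    std_errors = [math.sqrt(mse * inv[a][a]) for a in range(p)]
    t_stats = [b / s for b, s in zip(beta, std_errors)]
    return beta, t_stats, df

# Made-up data with two predictors, for illustration only
predictors = [[1, 1], [2, 0], [3, 1], [4, 0], [5, 1], [6, 0]]
response = [6.1, 5.1, 9.8, 8.8, 14.1, 13.1]
coefs, t_stats, df = ols_t_tests(predictors, response)
print(coefs, t_stats, df)
```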
Diagnostic checking of the linear regression model may be applied by checking if:
the residuals are normally distributed (Kolmogorov-Smirnov Test of Normality)
Ho: The residuals are normally distributed.
Ha: The residuals are not normally distributed.
the residuals have constant variance (by using Levene's test or Bartlett's test)
Ho: The variances are equal.
Ha: The variances are not equal.
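For the normality check, a minimal sketch of the Kolmogorov-Smirnov statistic is shown below, using only the Python standard library. It returns the D statistic (the largest gap between the empirical distribution of the residuals and a normal distribution fitted to them) and does not compute a p-value; because the mean and standard deviation are estimated from the residuals, the p-value is usually taken from software that applies the Lilliefors correction. The function name and the residual values are assumptions for illustration. The equal-variance tests are not covered here.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic_normal(residuals):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(residuals)
    fitted = NormalDist(mean(residuals), stdev(residuals))
    d = 0.0
    for i, r in enumerate(sorted(residuals), start=1):
        f = fitted.cdf(r)
        # Compare the fitted CDF with the empirical CDF just after
        # and just before the i-th ordered residual
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Made-up residuals, for illustration only
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(ks_statistic_normal(residuals), 3))
```

A small D supports Ho (the residuals are normally distributed); a large D, with a p-value below $\alpha$, leads to rejecting it.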
Examples:
1. A researcher wants to determine which among the variables (mother's and father's heights; taller grandfather's height)
determine a son's height (expressed in inches). The data are in heights.sav. Test all hypotheses at $\alpha$ = 0.05.
2. (bloodlead.sav) A group of researchers wanted to determine the factors that contribute to the blood lead level
(in µg/dL) of radiator repair workers. Data such as the number of radiators repaired per day, years of employment, and renal
function tests [FBS (in mmol/L), creatinine (in µmol/L), crea (in mg/dL), BUN (in mmol/L), presence of protein in urine,
and eGFR (in mL/min/1.73 m²)] were gathered. Fit a multiple regression model to determine the factors that
contribute to the blood lead level in radiator repair workers. Use a 5% level of significance.
Linear Regression Results:
R² = _________________
Regression equation: ________________________________________________________________________________
Do the linear regression results show that at least one of the coefficients significantly differs from zero?
Ho: _______________________________________________________________________________________________
Ha: _______________________________________________________________________________________________
Test statistic: _______ p-value: ________
Conclusion: ________________________________________________________________________________________
Which of the variables' coefficients significantly differ from zero?
Number of radiators repaired per day
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Years of employment (yrs)
Ho: _________________________________________ Ha: _________________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________________
Renal function tests
FBS (in mmol/L)
Ho: ______________________________________ Ha: ___________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________
Creatinine (in µmol/L)
Ho: ______________________________________ Ha: ___________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________
Crea (in mg/dL)
Ho: ______________________________________ Ha: ___________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________
BUN (in mmol/L)
Ho: ______________________________________ Ha: ___________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________
Presence or Absence of Protein
Ho: ______________________________________ Ha: ___________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: ________________________________________________________________________________
eGFR (in mL/min/1.73 m²)
Ho: ______________________________________ Ha: ___________________________________________
Regression coefficient: ____________ Test statistic: _______ p-value: _______
Conclusion: _____________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
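Consider the model
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k,$$
or
$$p = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k}},$$
where p = P(Y = 1).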
- used when the dependent variable Y is a dichotomous variable and at least one of the independent variables $X_i$,
$i = 1, 2, \ldots, k$, is interval/ratio.
- The validity of the model may be tested using the Hosmer and Lemeshow test, in which:
Ho: The data fit the model.
Ha: The data do not fit the model.
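To show how fitted logistic coefficients are read, the sketch below turns a set of coefficients into a predicted probability p = P(Y = 1) and into odds ratio estimates, where the odds ratio of a predictor is e raised to its coefficient. The function names and all coefficient values are hypothetical; actual estimates come from the logistic regression output.

```python
import math

def predicted_probability(intercept, coefficients, values):
    """P(Y = 1) for one subject under a fitted logistic model."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    # 1 / (1 + e^(-z)) is algebraically the same as e^z / (1 + e^z)
    return 1.0 / (1.0 + math.exp(-linear))

def odds_ratio(coefficient):
    """Odds ratio estimate for a one-unit increase in the predictor."""
    return math.exp(coefficient)

# Hypothetical coefficients, for illustration only
b0 = -2.0
betas = [0.8, 0.05]                  # e.g. one binary and one numeric predictor
p = predicted_probability(b0, betas, [1, 10])
print(f"predicted probability = {p:.3f}")
print("odds ratios =", [round(odds_ratio(b), 3) for b in betas])
```

An odds ratio above 1 means the odds of Y = 1 increase as the predictor increases; below 1 means they decrease.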
Example 1: An oncologist is interested in determining the variables that lead to the growth of papillary tumors, which are cancerous cells
found in the throat. Data from 40 patients who may have lived with exposure to radioactive iodine in the last 5 years and
who have had thyroiditis in the last six months are in thyroiditis.sav.
Model Fit Test:
Ho: ________________________
Ha: ________________________
Test Statistic: __________
p-value: ______________
Conclusion: __________________
Coefficient | stat | p-value | Odds Ratio estimate
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
Example 2: (renalcast.sav) A group of researchers wanted to determine the variables that lead to renal cast formation in
construction workers. Years in the occupation, whether painting is included in the occupation, and urinary findings, such as BUN,
uric acid, pH, and presence of bacteria, were recorded. Fit a multiple logistic regression model to determine the
variables that lead to renal cast formation in construction workers. Use a 5% level of significance.
Model Fit Test:
Ho: ________________________ Ha: ________________________
Test Statistic: __________ p-value: ______________
Conclusion: __________________
Summarize your findings using the table below:
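Variables           | Coefficient | stat | p-value | Odds Ratio estimate
--------------------|-------------|------|---------|--------------------
Years in Occupation |             |      |         |
Painting            |             |      |         |
BUN                 |             |      |         |
Uric Acid           |             |      |         |
pH                  |             |      |         |
Bacteria            |             |      |         |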
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________
__________________________________________________________________________________________