
FUNDAMENTAL CONCEPTS OF MODEL BUILDING

USING REGRESSION ANALYSIS

DETERMINISTIC AND PROBABILISTIC MODELS

 Deterministic
Y = f(X₁, X₂, …, Xₚ₋₁)

 If the independent variables, the Xᵢ's, are known, the dependent variable, Y, is known.

 Probabilistic
Y = f(X₁, X₂, …, Xₚ₋₁) + ε

ε ≡ a random and unknown component known as the error component.

LINEAR MODEL
 f(X) = β₀ + β₁X

(a) Linear relationship


ERROR COMPONENT

 If Mean(ε) = 0 for any given value of X:

E(Y) = β₀ + β₁X

 Estimated regression model:

Ŷ = β̂₀ + β̂₁X

where β̂₀ and β̂₁ are estimates of the parameters β₀ and β₁.

LOGARITHMIC RELATIONSHIP
 E(Y) = β₀ log(β₁X)

(b) Logarithmic relationship
INVERSE RELATIONSHIP
 E(Y) = β₀ + β₁(1/X)

(c) Inverse relationship


 Exponentially decreasing function
E(Y) = β₀ exp(−β₁X)

QUADRATIC RELATIONSHIP
 E(Y) = β₀ + β₁X + β₂X²

(d) Quadratic relationship

MODEL ASSUMPTIONS
 Assumption 1 The mean of the probability distribution of the error component is 0, i.e., E(ε) = 0, for each and every level of the independent variable.

 Assumption 2 The variance of the probability distribution of the error component is constant for all levels of the independent variable. The constant variance will be denoted by σ²; this is often referred to as the homoscedasticity assumption.

 Assumption 3 The probability distribution of the error component is normal for each level of the independent
variable.

 Assumption 4 The error components for any two different observations are independent of each other.
MODEL ASSUMPTIONS

FIGURE 13-2 Assumptions for a regression model (x-axis: X, quantity ordered)
MULTIPLE REGRESSION MODEL

 Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₚ₋₁Xₚ₋₁ + ε

 The X's are considered fixed; Y is a random variable


 X’s could be from observational data
 X’s could be from set levels in designed experiments

 Fitted regression model:

Ŷ = β̂₀ + β̂₁X₁ + ⋯ + β̂ₚ₋₁Xₚ₋₁

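As an illustration (a minimal sketch, not from the original slides), such a model can be fitted with NumPy's least squares solver; the data arrays below are hypothetical:

```python
import numpy as np

# Hypothetical data: n = 6 observations on two predictors
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 4.2, 7.9, 8.1, 12.2, 12.0])

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least squares fit: beta_hat minimizes SSE = sum((Y - X @ beta)^2)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat
print("beta_hat =", beta_hat)
```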
LEAST SQUARES METHOD

 Residuals:
eᵢ = Yᵢ − Ŷᵢ, i = 1, 2, …, n

 Sum of squares for error (SSE):


SSE = ∑(Yᵢ − Ŷᵢ)²

 The least squares method finds the estimated model coefficients β̂₀, β̂₁, …, β̂ₚ₋₁, such that SSE is minimized.

LEAST SQUARES METHOD

FIGURE 13-3 Method of least squares
SIMPLE LINEAR REGRESSION

 Estimates of slope and y-intercept coefficients:


β̂₁ = [∑XᵢYᵢ − (∑Xᵢ)(∑Yᵢ)/n] / [∑Xᵢ² − (∑Xᵢ)²/n]

β̂₀ = Ȳ − β̂₁X̄

 Estimate of σ², the variance of the error component:


s² = SSE/(n − p) = MSE

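A minimal Python sketch of these computational formulas, using hypothetical (X, Y) data:

```python
import numpy as np

# Hypothetical (X, Y) data for a simple linear regression
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
n, p = len(X), 2  # p = 2 parameters: intercept and slope

# Slope and intercept from the computational formulas above
b1 = (np.sum(X * Y) - np.sum(X) * np.sum(Y) / n) / (np.sum(X**2) - np.sum(X)**2 / n)
b0 = Y.mean() - b1 * X.mean()

# Estimate of sigma^2: s^2 = SSE / (n - p) = MSE
SSE = np.sum((Y - (b0 + b1 * X))**2)
MSE = SSE / (n - p)
print(b0, b1, MSE)
```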
INTERACTION BETWEEN INDEPENDENT VARIABLES

 The functional relationship of Y with X₁ is influenced by the level of another independent variable, X₂.

[Figure: Y plotted against X₁, with separate lines for X₂ = 0 and X₂ = 1 illustrating the interaction]
PERFORMANCE MEASURES OF A REGRESSION MODEL

 Total sum of squares = Error sum of squares + Regression sum of squares

SST = SSE + SSR


∑(Yᵢ − Ȳ)² = ∑(Yᵢ − Ŷᵢ)² + ∑(Ŷᵢ − Ȳ)²

 As model performance improves, SSR increases and SSE decreases for a given data set.

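A quick numerical check of the partition, assuming a least squares fit with an intercept (the identity holds exactly only in that case):

```python
import numpy as np

# Hypothetical data, fitted by least squares so the partition holds exactly
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
A = np.column_stack([np.ones_like(X), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_hat = A @ b

SST = np.sum((Y - Y.mean())**2)
SSE = np.sum((Y - Y_hat)**2)
SSR = np.sum((Y_hat - Y.mean())**2)
print(np.isclose(SST, SSE + SSR))  # True: SST = SSE + SSR
```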
PARTITIONING OF TOTAL SUM OF SQUARES

FIGURE 13-4 Partitioning of total sum of squares


PERFORMANCE MEASURES

 Coefficient of determination:

R² = SSR/SST = 1 − SSE/SST

0 ≤ R² ≤ 1

 R² never decreases as independent variables are added to the model.


 It may be of interest to know if the additional contribution of the new independent variable is significant.

ADJUSTED R²

 Adjusted R² incorporates the number of independent variables used in the model.


 As an independent variable is added to an existing model, SSE will decrease or stay the same. Also, the error degrees of freedom (n − p) will decrease. Hence, the net impact on SSE/(n − p) is not known ahead of time.
 Rₐ² = 1 − [(n − 1)/(n − p)] (SSE/SST)

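A minimal sketch of both measures in Python; the function name and arguments are illustrative, not from the slides:

```python
import numpy as np

def r_squared(Y, Y_hat, p):
    """Return (R^2, adjusted R^2) for a fitted model with p parameters."""
    n = len(Y)
    SSE = np.sum((Y - Y_hat) ** 2)     # error sum of squares
    SST = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
    R2 = 1.0 - SSE / SST
    R2_adj = 1.0 - (n - 1) / (n - p) * (SSE / SST)
    return R2, R2_adj
```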
MODEL UTILITY

 H₀: β₁ = β₂ = ⋯ = βₚ₋₁ = 0
H₁: At least one of the βᵢ parameters ≠ 0

 Test statistic:
F₀ = MSR/MSE = [SSR/(p − 1)] / [SSE/(n − p)]

 When H₀ is true, the test statistic has an F-distribution with (p − 1) degrees of freedom in the numerator and (n − p) degrees of freedom in the denominator.
 Reject H₀ if F₀ > F(α; p−1, n−p)
or if p-value < α (chosen level of significance)
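A sketch of the model utility test in Python, assuming SciPy is available; names are illustrative:

```python
import numpy as np
from scipy import stats

def model_utility_f_test(Y, Y_hat, p, alpha=0.05):
    """F test of H0: beta_1 = ... = beta_{p-1} = 0."""
    n = len(Y)
    SSE = np.sum((Y - Y_hat) ** 2)
    SSR = np.sum((Y_hat - Y.mean()) ** 2)
    F0 = (SSR / (p - 1)) / (SSE / (n - p))
    p_value = stats.f.sf(F0, p - 1, n - p)  # P[F(p-1, n-p) > F0]
    return F0, p_value, p_value < alpha     # third value True => reject H0
```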
MODEL UTILITY
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square         F-statistic    p-value
Model                 p − 1                SSR              MSR = SSR/(p − 1)   F₀ = MSR/MSE   P[F(p−1, n−p) > F₀]
Error                 n − p                SSE              MSE = SSE/(n − p)

TABLE ANOVA Table for Testing Model Utility

SIGNIFICANCE OF INDIVIDUAL PREDICTORS

 Test if an independent variable, in the presence of other independent variables, is significant:


H₀: βᵢ = 0; i = 1, 2, …, p − 1
Hₐ: βᵢ ≠ 0

 Test statistic:
t₀ = β̂ᵢ / SE(β̂ᵢ)

 Critical value: two-tailed, found from the t-distribution with (n − p) degrees of freedom and α/2 area in each tail.
 Reject H₀ if |t₀| > t(α/2, n−p)
or if p-value < α.

CONFIDENCE INTERVAL FOR MODEL PARAMETER

 CI for βᵢ:

β̂ᵢ ± t(α/2, n−p) SE(β̂ᵢ)

 If the CI does not include 0, the null hypothesis H₀: βᵢ = 0 is rejected.

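A sketch covering both the t test of the previous slide and this confidence interval, assuming SciPy; the function is illustrative, not from the slides:

```python
from scipy import stats

def coef_inference(beta_hat_i, se_i, n, p, alpha=0.05):
    """t test of H0: beta_i = 0 and a (1 - alpha) CI for beta_i."""
    t0 = beta_hat_i / se_i
    p_value = 2 * stats.t.sf(abs(t0), n - p)    # two-tailed p-value
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)  # t(alpha/2, n-p)
    ci = (beta_hat_i - t_crit * se_i, beta_hat_i + t_crit * se_i)
    return t0, p_value, ci
```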
MODEL VALIDATION AND REMEDIAL MEASURES

 Linearity of Regression Function


 While the model may be linear in the parameters (βᵢ), it may not be so in terms of the independent variables.
 Functional forms for X could be:

X², 1/X, √X, ln(X), exp(X), etc.

RESIDUAL PLOTS FOR FUNCTIONAL FORMS
FIGURE 13-5 Residual plots for functional forms: (a) adequate functional form, (b) adequate functional form, (c) quadratic functional form, (d) exponential functional form


CONSTANCY OF ERROR VARIANCE

(a) Assumption satisfied (b) Increasing variability with Ŷ

FIGURE 13-6 Residual plots for validating the homoscedasticity assumption
CONSTANCY OF ERROR VARIANCE

 If the response variable is count data, Y has a Poisson distribution.


 Var(Y) ∝ Mean(Y)
The shape of the residuals vs. Ŷ plot is similar to Figure 13-6(b).

 Variance stabilizing transformation:


Y* = √Y
 If the response variable represents the proportion of successes in a Binomial experiment, the residual plot could be football-shaped, similar to Figure 13-7(a).

CONSTANCY OF ERROR VARIANCE

(a) Response is a Binomial proportion

FIGURE 13-7 Residual plots for nonconstant error variability


CONSTANCY OF ERROR VARIANCE

 For a response variable that is a Binomial proportion, the variance stabilizing transformation is:

Y* = arcsin(√Y) = arcsin(√p)
or
Y* = ln[p/(1 − p)]

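A minimal sketch of the two transformations in Python, with a hypothetical array of proportions:

```python
import numpy as np

# p_hat: hypothetical Binomial proportions in (0, 1)
p_hat = np.array([0.10, 0.25, 0.40, 0.55, 0.70])

Y_arcsin = np.arcsin(np.sqrt(p_hat))    # arcsine square-root transform
Y_logit  = np.log(p_hat / (1 - p_hat))  # logit transform

# For Poisson count responses Y, the analogous transform is np.sqrt(Y)
```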
EXPONENTIAL GROWTH OR DECAY OF DEPENDENT VARIABLE

 Model:
Y = E(Y) · ε

 Multiplicative Model
 Variance of residuals increases with Ŷ.

EXPONENTIAL GROWTH OR DECAY
FIGURE 13-7 Residual plots for nonconstant error variability

(b) Multiplicative model

 Variance stabilizing transformation:

Y* = log Y
NORMALITY OF ERROR COMPONENT

 Histogram of residuals
 Box plot of residuals
 Normal probability plot of residuals
 Anderson-Darling test:
If p-value < α, reject the null hypothesis of normality of residuals.

 Remedial measures, if the normality assumption is not satisfied:


 Box-Cox power transformation: Y* = Y^λ

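A sketch of these diagnostics using SciPy; note that scipy.stats.anderson reports critical values rather than a p-value, so the statistic is compared against the critical value at the chosen significance level. Data arrays are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical residuals from a fitted regression model
e = np.array([0.3, -0.5, 1.1, -0.2, 0.4, -0.9, 0.1, -0.3])

# Anderson-Darling test: reject normality if the statistic exceeds
# the critical value at the desired significance level
ad = stats.anderson(e, dist='norm')
print(ad.statistic, ad.critical_values, ad.significance_level)

# Box-Cox power transformation of a positive response Y; SciPy picks
# the power lambda by maximum likelihood
Y = np.array([1.2, 2.5, 3.1, 4.8, 6.0, 9.3])
Y_star, lam = stats.boxcox(Y)
```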
ESTIMATION AND INFERENCES FROM REGRESSION MODEL
 Inferences on individual parameters:
H₀: βᵢ = 0
Hₐ: βᵢ ≠ 0

 Test statistic: t₀ = bᵢ / SE(bᵢ)
 Inferences on all parameters:
H₀: β₁ = β₂ = ⋯ = βₚ₋₁ = 0
Hₐ: At least one βᵢ ≠ 0

 Test statistic: F₀ = MSR/MSE

INFERENCES ON MODEL PARAMETERS

 Simultaneous inferences on some βᵢ:


 Joint confidence intervals: Bonferroni method
Family level of confidence = 1 − α

Number of simultaneous confidence intervals = g


Confidence level for individual parameter = 1 − α/g
t-value found for a tail area of α/(2g)

 CI for βᵢ:
β̂ᵢ ± t(α/(2g), n−p) SE(β̂ᵢ), i = 1, 2, …, g

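A minimal sketch of Bonferroni joint intervals in Python, assuming SciPy; inputs are the g estimates and their standard errors:

```python
from scipy import stats

def bonferroni_cis(beta_hats, ses, n, p, alpha=0.05):
    """Joint CIs for g = len(beta_hats) parameters at family confidence 1 - alpha."""
    g = len(beta_hats)
    t_crit = stats.t.ppf(1 - alpha / (2 * g), n - p)  # tail area alpha/(2g)
    return [(b - t_crit * se, b + t_crit * se)
            for b, se in zip(beta_hats, ses)]
```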
HYPOTHESIS TEST ON SUBSET OF PARAMETERS

 Create a full (F) model and a reduced (R) model.


 The reduced model is based on the H₀ that we wish to test.
 H₀: β_(g+1) = β_(g+2) = ⋯ = βₚ₋₁ = 0
Hₐ: At least one of the above βᵢ parameters ≠ 0
 Full Model: E(Y) = β₀ + β₁X₁ + ⋯ + β_g X_g + β_(g+1) X_(g+1) + ⋯ + βₚ₋₁Xₚ₋₁

 Reduced Model: E(Y) = β₀ + β₁X₁ + ⋯ + β_g X_g

 Test Statistic: F₀ = [(SSE_R − SSE_F)/(p − g − 1)] / [SSE_F/(n − p)]

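A sketch of this partial F test, assuming SciPy and that SSE_R and SSE_F have already been computed from the two fits:

```python
from scipy import stats

def partial_f_test(SSE_R, SSE_F, n, p, g, alpha=0.05):
    """F test comparing the reduced model (g predictors) to the full model."""
    F0 = ((SSE_R - SSE_F) / (p - g - 1)) / (SSE_F / (n - p))
    p_value = stats.f.sf(F0, p - g - 1, n - p)
    return F0, p_value, p_value < alpha  # third value True => reject H0
```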
HYPOTHESIS TESTING ON SUBSET OF PARAMETERS

 Numerator degrees of freedom of F₀ = (p − g − 1)

Denominator degrees of freedom of F₀ = (n − p)

 If F₀ > F(α; p−g−1, n−p), reject H₀
 If p-value < α, reject H₀

CONFIDENCE INTERVAL FOR MEAN RESPONSE

 Point estimate: Ŷᵣ = β̂₀ + β̂₁X₁ᵣ + β̂₂X₂ᵣ + ⋯ + β̂ₚ₋₁Xₚ₋₁,ᵣ

 If the standard error of Ŷᵣ is given by SE(Ŷᵣ):

CI: Ŷᵣ ± t(α/2, n−p) SE(Ŷᵣ)

PREDICTION INTERVAL FOR INDIVIDUAL OBSERVATIONS

 s²(Yᵣ,new) = SE²(Ŷᵣ) + s²
 Prediction Interval:

Ŷᵣ ± t(α/2, n−p) s(Yᵣ,new)

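A minimal sketch computing both intervals, assuming SciPy and that the fitted value, its standard error, and MSE are available from the regression fit:

```python
import numpy as np
from scipy import stats

def mean_ci_and_pi(y_hat_r, se_fit, mse, n, p, alpha=0.05):
    """CI for the mean response and PI for a new observation at x_r."""
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    ci = (y_hat_r - t_crit * se_fit, y_hat_r + t_crit * se_fit)
    s_new = np.sqrt(se_fit ** 2 + mse)  # s^2(Y_new) = SE^2(Y_hat_r) + s^2
    pi = (y_hat_r - t_crit * s_new, y_hat_r + t_crit * s_new)
    return ci, pi
```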
