
FUNDAMENTAL CONCEPTS OF MODEL BUILDING

USING REGRESSION ANALYSIS

DETERMINISTIC AND PROBABILISTIC MODELS

 Deterministic
Y = f(X₁, X₂, …, Xₚ₋₁)

 If the independent variables, the Xᵢ's, are known, the dependent variable, Y, is known.

 Probabilistic
Y = f(X₁, X₂, …, Xₚ₋₁) + ε

ε ≡ a random and unknown component known as the error component.

LINEAR MODEL
 f(X) = β₀ + β₁X

(a) Linear relationship


ERROR COMPONENT

 If Mean(ε) = 0 for any given value of X:

E(Y) = β₀ + β₁X

 Estimated regression model:

Ŷ = β̂₀ + β̂₁X

where β̂₀ and β̂₁ are estimates of the parameters β₀ and β₁.

LOGARITHMIC RELATIONSHIP
 E(Y) = β₀ log(β₁X)

(b) Logarithmic relationship
INVERSE RELATIONSHIP
 E(Y) = β₀ + β₁(1/X)

(c) Inverse relationship


 Exponentially decreasing function
E(Y) = β₀ exp(−β₁X)

QUADRATIC RELATIONSHIP
 E(Y) = β₀ + β₁X + β₂X²

(d) Quadratic relationship

MODEL ASSUMPTIONS
 Assumption 1 The mean of the probability distribution of the error component is 0, i.e., E(ε) = 0, for each and every level of the independent variable.

 Assumption 2 The variance of the probability distribution of the error component is constant for all levels of the independent variable. The constant variance will be denoted by σ²; this is often referred to as the homoscedasticity assumption.

 Assumption 3 The probability distribution of the error component is normal for each level of the independent
variable.

 Assumption 4 The error components for any two different observations are independent of each other.
MODEL ASSUMPTIONS

FIGURE 13-2 Assumptions for a regression model (x-axis: X, quantity ordered)
MULTIPLE REGRESSION MODEL

 Y = β₀ + β₁X₁ + β₂X₂ + ⋯ + βₚ₋₁Xₚ₋₁ + ε

 The X's are considered fixed; Y is a random variable


 X’s could be from observational data
 X’s could be from set levels in designed experiments

 Fitted regression model:

Ŷ = β̂₀ + β̂₁X₁ + ⋯ + β̂ₚ₋₁Xₚ₋₁

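As an illustration (a minimal sketch, not from the original slides), such a model can be fitted with NumPy's least squares solver; the data arrays below are hypothetical:

```python
import numpy as np

# Hypothetical data: n = 6 observations on two predictors
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 4.2, 7.9, 8.1, 12.2, 12.0])

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least squares fit: beta_hat minimizes SSE = sum((Y - X @ beta)^2)
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_hat = X @ beta_hat
print("beta_hat =", beta_hat)
```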
LEAST SQUARES METHOD

 Residuals:
eᵢ = Yᵢ − Ŷᵢ, i = 1, 2, …, n

 Sum of squares for error (SSE):


SSE = ∑(Yᵢ − Ŷᵢ)²

 The least squares method finds the estimated model coefficients β̂₀, β̂₁, …, β̂ₚ₋₁, such that SSE is minimized.

LEAST SQUARES METHOD

FIGURE 13-3 Method of least squares
SIMPLE LINEAR REGRESSION

 Estimates of slope and y-intercept coefficients:


β̂₁ = [∑XᵢYᵢ − (∑Xᵢ)(∑Yᵢ)/n] / [∑Xᵢ² − (∑Xᵢ)²/n]

β̂₀ = Ȳ − β̂₁X̄

 Estimate of σ², the variance of the error component:


s² = SSE/(n − p) = MSE

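A minimal Python sketch of these computational formulas, using hypothetical (X, Y) data:

```python
import numpy as np

# Hypothetical (X, Y) data for a simple linear regression
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
n, p = len(X), 2  # p = 2 parameters: intercept and slope

# Slope and intercept from the computational formulas above
b1 = (np.sum(X * Y) - np.sum(X) * np.sum(Y) / n) / (np.sum(X**2) - np.sum(X)**2 / n)
b0 = Y.mean() - b1 * X.mean()

# Estimate of sigma^2: s^2 = SSE / (n - p) = MSE
SSE = np.sum((Y - (b0 + b1 * X))**2)
MSE = SSE / (n - p)
print(b0, b1, MSE)
```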
INTERACTION BETWEEN INDEPENDENT VARIABLES

 The functional relationship of Y with X₁ is influenced by the level of another independent variable, X₂.

[Figure: Y plotted against X₁, with separate lines for X₂ = 0 and X₂ = 1 illustrating the interaction]
PERFORMANCE MEASURES OF A REGRESSION MODEL

 Total sum of squares = Error sum of squares + Regression sum of squares

SST = SSE + SSR


∑(Yᵢ − Ȳ)² = ∑(Yᵢ − Ŷᵢ)² + ∑(Ŷᵢ − Ȳ)²

 As model performance improves, SSR increases and SSE decreases for a given data set.

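A quick numerical check of the partition, assuming a least squares fit with an intercept (the identity holds exactly only in that case):

```python
import numpy as np

# Hypothetical data, fitted by least squares so the partition holds exactly
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
A = np.column_stack([np.ones_like(X), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_hat = A @ b

SST = np.sum((Y - Y.mean())**2)
SSE = np.sum((Y - Y_hat)**2)
SSR = np.sum((Y_hat - Y.mean())**2)
print(np.isclose(SST, SSE + SSR))  # True: SST = SSE + SSR
```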
PARTITIONING OF TOTAL SUM OF SQUARES

FIGURE 13-4 Partitioning of total sum of squares


PERFORMANCE MEASURES

 Coefficient of determination:

R² = SSR/SST = 1 − SSE/SST

0 ≤ R² ≤ 1

 R² never decreases as independent variables are added to the model.


 It may be of interest to know if the additional contribution of the new independent variable is significant.

ADJUSTED R²

 Adjusted R² incorporates the number of independent variables used in the model.


 As an independent variable is added to an existing model, SSE will decrease or stay the same. Also, the error degrees of freedom (n − p) will decrease. Hence, the net impact on SSE/(n − p) is not known ahead of time.
 Rₐ² = 1 − [(n − 1)/(n − p)] (SSE/SST)

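A minimal sketch of both measures in Python; the function name and arguments are illustrative, not from the slides:

```python
import numpy as np

def r_squared(Y, Y_hat, p):
    """Return (R^2, adjusted R^2) for a fitted model with p parameters."""
    n = len(Y)
    SSE = np.sum((Y - Y_hat) ** 2)     # error sum of squares
    SST = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
    R2 = 1.0 - SSE / SST
    R2_adj = 1.0 - (n - 1) / (n - p) * (SSE / SST)
    return R2, R2_adj
```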
MODEL UTILITY

 H₀: β₁ = β₂ = ⋯ = βₚ₋₁ = 0
H₁: At least one of the βᵢ parameters ≠ 0

 Test statistic:
F₀ = MSR/MSE = [SSR/(p − 1)] / [SSE/(n − p)]

 When H₀ is true, the test statistic has an F-distribution with (p − 1) degrees of freedom in the numerator and (n − p) degrees of freedom in the denominator.
 Reject H₀ if F₀ > F(α; p−1, n−p)
or if p-value < α (chosen level of significance)
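A sketch of the model utility test in Python, assuming SciPy is available; names are illustrative:

```python
import numpy as np
from scipy import stats

def model_utility_f_test(Y, Y_hat, p, alpha=0.05):
    """F test of H0: beta_1 = ... = beta_{p-1} = 0."""
    n = len(Y)
    SSE = np.sum((Y - Y_hat) ** 2)
    SSR = np.sum((Y_hat - Y.mean()) ** 2)
    F0 = (SSR / (p - 1)) / (SSE / (n - p))
    p_value = stats.f.sf(F0, p - 1, n - p)  # P[F(p-1, n-p) > F0]
    return F0, p_value, p_value < alpha     # third value True => reject H0
```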
MODEL UTILITY
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square         F-statistic    p-value
Model                 p − 1                SSR              MSR = SSR/(p − 1)   F₀ = MSR/MSE   P[F(p−1, n−p) > F₀]
Error                 n − p                SSE              MSE = SSE/(n − p)

TABLE ANOVA Table for Testing Model Utility

SIGNIFICANCE OF INDIVIDUAL PREDICTORS

 Test if an independent variable, in the presence of other independent variables, is significant:


H₀: βᵢ = 0; i = 1, 2, …, p − 1
Hₐ: βᵢ ≠ 0

 Test statistic:
t₀ = β̂ᵢ / SE(β̂ᵢ)

 Critical value: two-tailed, found from the t-distribution with (n − p) degrees of freedom and α/2 area in each tail.
 Reject H₀ if |t₀| > t(α/2, n−p)
or if p-value < α.

CONFIDENCE INTERVAL FOR MODEL PARAMETER

 CI for βᵢ:

β̂ᵢ ± t(α/2, n−p) SE(β̂ᵢ)

 If the CI does not include 0, the null hypothesis H₀: βᵢ = 0 is rejected.

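A sketch covering both the t test of the previous slide and this confidence interval, assuming SciPy; the function is illustrative, not from the slides:

```python
from scipy import stats

def coef_inference(beta_hat_i, se_i, n, p, alpha=0.05):
    """t test of H0: beta_i = 0 and a (1 - alpha) CI for beta_i."""
    t0 = beta_hat_i / se_i
    p_value = 2 * stats.t.sf(abs(t0), n - p)    # two-tailed p-value
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)  # t(alpha/2, n-p)
    ci = (beta_hat_i - t_crit * se_i, beta_hat_i + t_crit * se_i)
    return t0, p_value, ci
```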
MODEL VALIDATION AND REMEDIAL MEASURES

 Linearity of Regression Function


 While the model may be linear in the parameters (βᵢ), it may not be so in terms of the independent variables.
 Functional forms for X could be:

X², 1/X, √X, ln(X), exp(X), etc.

RESIDUAL PLOTS FOR FUNCTIONAL FORMS
FIGURE 13-5 Residual plots for functional forms: (a) adequate functional form, (b) adequate functional form, (c) quadratic functional form, (d) exponential functional form


CONSTANCY OF ERROR VARIANCE

(a) Assumption satisfied (b) Increasing variability with Ŷ

FIGURE 13-6 Residual plots for validating the homoscedasticity assumption
CONSTANCY OF ERROR VARIANCE

 If the response variable is count data, Y has a Poisson distribution.


 Var(Y) ∝ Mean(Y)
The shape of the residuals vs. Ŷ plot is similar to Figure 13-6(b).

 Variance stabilizing transformation:


Y* = √Y
 If the response variable represents the proportion of successes in a Binomial experiment, the residual plot could be football-shaped, similar to Figure 13-7(a).

CONSTANCY OF ERROR VARIANCE

(a) Response is a Binomial proportion

FIGURE 13-7 Residual plots for nonconstant error variability


CONSTANCY OF ERROR VARIANCE

 For a response variable that is a Binomial proportion, the variance stabilizing transformation is:

Y* = arcsin(√Y) = arcsin(√p)
or
Y* = ln[p/(1 − p)]

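A minimal sketch of the two transformations in Python, with a hypothetical array of proportions:

```python
import numpy as np

# p_hat: hypothetical Binomial proportions in (0, 1)
p_hat = np.array([0.10, 0.25, 0.40, 0.55, 0.70])

Y_arcsin = np.arcsin(np.sqrt(p_hat))    # arcsine square-root transform
Y_logit  = np.log(p_hat / (1 - p_hat))  # logit transform

# For Poisson count responses Y, the analogous transform is np.sqrt(Y)
```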
EXPONENTIAL GROWTH OR DECAY OF DEPENDENT VARIABLE

 Model:
Y = E(Y) · ε

 Multiplicative Model
 Variance of residuals increases with Ŷ.

EXPONENTIAL GROWTH OR DECAY
FIGURE 13-7 Residual plots for nonconstant error variability

(b) Multiplicative model

 Variance stabilizing transformation:

Y* = log Y
NORMALITY OF ERROR COMPONENT

 Histogram of residuals
 Box plot of residuals
 Normal probability plot of residuals
 Anderson-Darling test:
If p-value < α, reject the null hypothesis of normality of residuals.

 Remedial measures, if the normality assumption is not satisfied:


 Box-Cox power transformation: Y* = Y^λ

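A sketch of these diagnostics using SciPy; note that scipy.stats.anderson reports critical values rather than a p-value, so the statistic is compared against the critical value at the chosen significance level. Data arrays are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical residuals from a fitted regression model
e = np.array([0.3, -0.5, 1.1, -0.2, 0.4, -0.9, 0.1, -0.3])

# Anderson-Darling test: reject normality if the statistic exceeds
# the critical value at the desired significance level
ad = stats.anderson(e, dist='norm')
print(ad.statistic, ad.critical_values, ad.significance_level)

# Box-Cox power transformation of a positive response Y; SciPy picks
# the power lambda by maximum likelihood
Y = np.array([1.2, 2.5, 3.1, 4.8, 6.0, 9.3])
Y_star, lam = stats.boxcox(Y)
```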
ESTIMATION AND INFERENCES FROM REGRESSION MODEL
 Inferences on individual parameters:
H₀: βᵢ = 0
Hₐ: βᵢ ≠ 0

 Test statistic: t₀ = bᵢ / SE(bᵢ)
 Inferences on all parameters:
H₀: β₁ = β₂ = ⋯ = βₚ₋₁ = 0
Hₐ: At least one βᵢ ≠ 0

 Test statistic: F₀ = MSR/MSE

INFERENCES ON MODEL PARAMETERS

 Simultaneous inferences on some βᵢ:


 Joint confidence intervals: Bonferroni method
Family level of confidence = 1 − α

Number of simultaneous confidence intervals = g


Confidence level for individual parameter = 1 − α/g
t-value found for a tail area of α/(2g)

 CI for βᵢ:
β̂ᵢ ± t(α/(2g), n−p) SE(β̂ᵢ), i = 1, 2, …, g

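A minimal sketch of Bonferroni joint intervals in Python, assuming SciPy; inputs are the g estimates and their standard errors:

```python
from scipy import stats

def bonferroni_cis(beta_hats, ses, n, p, alpha=0.05):
    """Joint CIs for g = len(beta_hats) parameters at family confidence 1 - alpha."""
    g = len(beta_hats)
    t_crit = stats.t.ppf(1 - alpha / (2 * g), n - p)  # tail area alpha/(2g)
    return [(b - t_crit * se, b + t_crit * se)
            for b, se in zip(beta_hats, ses)]
```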
HYPOTHESIS TEST ON SUBSET OF PARAMETERS

 Create a full (F) model and a reduced (R) model.


 The reduced model is based on the H₀ that we wish to test.
 H₀: β_(g+1) = β_(g+2) = ⋯ = βₚ₋₁ = 0
Hₐ: At least one of the above βᵢ parameters ≠ 0
 Full Model: E(Y) = β₀ + β₁X₁ + ⋯ + β_g X_g + β_(g+1) X_(g+1) + ⋯ + βₚ₋₁Xₚ₋₁

 Reduced Model: E(Y) = β₀ + β₁X₁ + ⋯ + β_g X_g

 Test Statistic: F₀ = [(SSE_R − SSE_F)/(p − g − 1)] / [SSE_F/(n − p)]

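A sketch of this partial F test, assuming SciPy and that SSE_R and SSE_F have already been computed from the two fits:

```python
from scipy import stats

def partial_f_test(SSE_R, SSE_F, n, p, g, alpha=0.05):
    """F test comparing the reduced model (g predictors) to the full model."""
    F0 = ((SSE_R - SSE_F) / (p - g - 1)) / (SSE_F / (n - p))
    p_value = stats.f.sf(F0, p - g - 1, n - p)
    return F0, p_value, p_value < alpha  # third value True => reject H0
```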
HYPOTHESIS TESTING ON SUBSET OF PARAMETERS

 Numerator degrees of freedom of F₀ = (p − g − 1)

Denominator degrees of freedom of F₀ = (n − p)

 If F₀ > F(α; p−g−1, n−p), reject H₀
 If p-value < α, reject H₀

CONFIDENCE INTERVAL FOR MEAN RESPONSE

 Point estimate: Ŷᵣ = β̂₀ + β̂₁X₁ᵣ + β̂₂X₂ᵣ + ⋯ + β̂ₚ₋₁Xₚ₋₁,ᵣ

 If the standard error of Ŷᵣ is given by SE(Ŷᵣ):

CI: Ŷᵣ ± t(α/2, n−p) SE(Ŷᵣ)

PREDICTION INTERVAL FOR INDIVIDUAL OBSERVATIONS

 s²(Yᵣ,new) = SE²(Ŷᵣ) + s²
 Prediction Interval:

Ŷᵣ ± t(α/2, n−p) s(Yᵣ,new)

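A minimal sketch computing both intervals, assuming SciPy and that the fitted value, its standard error, and MSE are available from the regression fit:

```python
import numpy as np
from scipy import stats

def mean_ci_and_pi(y_hat_r, se_fit, mse, n, p, alpha=0.05):
    """CI for the mean response and PI for a new observation at x_r."""
    t_crit = stats.t.ppf(1 - alpha / 2, n - p)
    ci = (y_hat_r - t_crit * se_fit, y_hat_r + t_crit * se_fit)
    s_new = np.sqrt(se_fit ** 2 + mse)  # s^2(Y_new) = SE^2(Y_hat_r) + s^2
    pi = (y_hat_r - t_crit * s_new, y_hat_r + t_crit * s_new)
    return ci, pi
```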
