
Regression and

Multiple Regression Analysis

Group members:
1. Md. Abdul Hakim (GS 20474)
2. Eka Tarwaca Susila Putra (GS 21024)
3. Md. Kamal Uddin (GS 19021)
4. Muhd. Azlan bin Abd Ghani (GS 19339)
5. Md. Kausar Hossain (GS 20976)
Regression
- a technique used for the modeling and analysis of
numerical data consisting of values of a dependent
variable (response variable) and of one or more
independent variables (explanatory variables).

It can be used for prediction (including forecasting of
time-series data), inference, hypothesis testing, and
modeling of causal relationships.

Regression concepts were first published in the early 1800s,
by Legendre (1805) and Gauss (1809).


Applications
Applications of regression are numerous and occur in
almost every field, including:
- engineering,
- physical sciences,
- economics,
- management,
- life and biological sciences
- social sciences.

In fact, regression analysis may be the most widely used
statistical technique.
Types of Regression Models

Regression models are classified by the number of independent
variables and by the form of the relationship:

- 1 independent variable: Simple regression
  - Linear
  - Non-linear
- 2+ independent variables: Multiple regression
  - Linear
  - Non-linear
Simple linear regression model:
A regression model that involves only one independent
variable. The form can be expressed as

Yi = β0 + β1Xi + ei,   i = 1, 2, 3, ..., n

Here, Yi = the response (dependent variable),
Xi = the independent variable,
ei = error or disturbance.
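The parameters of the simple model can be estimated in closed form: β1 = Sxy / Sxx and β0 = Ȳ − β1X̄. A minimal Python sketch, using small hypothetical data (not from these slides):

```python
import numpy as np

# Hypothetical sample data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form OLS estimates for Yi = b0 + b1*Xi + ei:
#   b1 = Sxy / Sxx,  b0 = ybar - b1 * xbar
xbar, ybar = x.mean(), y.mean()
sxx = np.sum((x - xbar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))
b1 = sxy / sxx          # estimated slope
b0 = ybar - b1 * xbar   # estimated intercept
```

For this toy data the fit comes out to b1 ≈ 1.99 and b0 ≈ 0.05.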
Multiple linear regression model:
A regression model that involves more than one
regressor (independent) variable. The general form can be
expressed as

Yi = β0 + β1Xi1 + β2Xi2 + ... + βkXik + ei,   i = 1, 2, 3, ..., n

Here, Yi = the response (dependent variable),
Xi1, ..., Xik = the independent variables,
ei = error or disturbance.
Objectives

1. The general purpose of regression (multiple
regression) is to learn about the relationship between
several independent or predictor variables and a
dependent variable.

2. The specific objectives of regression are:

• Estimate the unknown parameters in the
regression model (fitting the model to the data).
• Predict or forecast the response variable; these
predictions are helpful in planning the project.
Underlying Principles
The Gaussian, standard, or classical linear regression
model (CLRM), which is the foundation/cornerstone of
most econometric theory, rests on several assumptions:

Assumption 1: The regression model is linear in the
parameters.
Assumption 2: X values are fixed in repeated sampling.
Assumption 3: The disturbance (error) has zero mean.
Underlying Principles cont'd ...

Assumption 4: Constant error variance (homoscedasticity),
i.e. Var(ei | Xi) = σ2 (a constant).
Assumption 5: No autocorrelation between the
disturbances (errors).
Assumption 6: Zero covariance between ei and Xi, i.e.
Cov(ei, Xi) = 0.
Assumption 7: There are no perfect linear relationships
among the independent variables.
Methods of Estimation
Here we name some well-known methods for
estimating the regression model:
• The method of moments
• The method of least squares
• The method of maximum likelihood

The Ordinary Least Squares (OLS) method of
estimation is the most popular one and is widely used
because of its flexibility.

The main aim of the least squares method is to estimate the
parameters of the linear regression model by minimizing
the error sum of squares.
The Ordinary Least Squares (OLS)
A multiple linear model has the form

Y = β0 + β1X1 + β2X2 + ... + βkXk + e

We may write the sample regression model as follows:

yi = β0 + β1xi1 + β2xi2 + ... + βkxik + εi,   i = 1, 2, ..., n

The least-squares function is

S = ∑(i=1 to n) εi²  =  ∑(i=1 to n) ( yi − β0 − ∑(j=1 to k) βj xij )²

The least-squares estimators of β0, β1, ..., βk are obtained by
minimizing this function S with respect to β0, β1, ..., βk.
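Minimizing S leads to the normal equations (XᵀX)β = Xᵀy. A minimal sketch of the computation, assuming simulated data (purely illustrative) and NumPy's least-squares solver:

```python
import numpy as np

# Simulated data (illustrative only): n observations, k = 2 regressors
rng = np.random.default_rng(0)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept column first
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve min over beta of ||y - X beta||^2; lstsq is numerically
# safer than explicitly inverting X'X in the normal equations
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
S = residuals @ residuals   # the minimized error sum of squares
```

With such a small noise level, the estimates land close to the true coefficients used to simulate the data.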
The techniques to determine the model accuracy are:

(i) Standard error of the coefficients
(ii) t-test of the coefficients
(iii) Coefficient of determination, R2
(iv) Residual standard deviation
(v) ANOVA for overall measures
(i) The standard error of a coefficient is

se(βi) = √( MSres / Sxx )

MSres : residual mean square
Sxx : sum of squares of the independent variable
(ii) t-test of the coefficients

Suppose that we wish to test the
hypothesis that the slope equals a
constant, say βi0. The appropriate
hypotheses are:

H0 : βi = βi0
H1 : βi ≠ βi0

where we have specified a two-sided alternative.

(ii) t-test of the coefficients cont'd ...

The t statistic is defined as follows:

t0 = ( βi − βi0 ) / √( MSres / Sxx )
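This statistic can be checked against the SPSS coefficient table reported later in these slides; for example, the table gives X1 a coefficient of .613 with standard error .161:

```python
# t statistic for H0: beta_i = beta_i0 (here beta_i0 = 0),
# using the coefficient and standard error for X1 from the slides' table
b_hat = 0.613    # estimated coefficient of X1
se = 0.161       # its standard error
beta_i0 = 0.0    # hypothesized value under H0

t0 = (b_hat - beta_i0) / se   # ~3.81; the table reports 3.809
                              # (SPSS divides the unrounded values)
```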

(iii) Coefficient of determination:

• R2 can be interpreted as a PRE (proportional-reduction-in-error)
measure of association.
(iv) Residual standard deviation:

the standard deviation of the residuals (residuals = differences
between observed and predicted values). It is calculated as

s = √( SSres / (n − k − 1) ) = √MSres
(v) ANOVA for overall measures

The analysis of variance table divides the total variation
in the dependent variable into two components:

1st component - the part that can be attributed to the regression
model (labeled Regression);
2nd component - the part that cannot (labeled Residual).

If the significance level for the F-test is small (less than
0.05), then the hypothesis that there is no (linear)
relationship can be rejected, and the multiple correlation
coefficient can be called statistically significant. The F
statistic can be written as

F0 = MSr / MSres

MSr = regression mean square
MSres = residual mean square
Literature on Applications of the OLS Method:

Here we consider a seven-variable multiple linear regression model
(one response variable and six regressors). The model can be written
in linear form as

Y = β0 + β1X1 + β2X2 + ... + β6X6 + e

Y = overall rating of the job being done by the supervisor
X1 = handles employee complaints
X2 = does not allow special privileges
X3 = opportunity to learn new things
X4 = raises based on performance
X5 = too critical of poor performance
X6 = rate of advancing to better jobs
e = error term
β0, β1, β2, ..., β6 are the unknown parameters.

Our ultimate goal is to estimate the unknown parameters of the model.

Data source: http://www.ilr.cornell.edu/hadi/rabe4

For estimating the model we used SPSS version 11.5. The outputs obtained
from SPSS 11.5 are given below:

Summary of coefficients

Model        Coefficient   Std. Error       t     Sig.
(Constant)      10.787       11.589       .931    .362
X1                .613         .161      3.809    .001
X2               -.073         .136      -.538    .596
X3                .320         .169      1.901    .040
X4                .082         .221       .369    .715
X5                .038         .147       .261    .796
X6               -.217         .178     -1.218    .236

From the summary of the coefficients table we see that the variables
X1 and X3 are significant compared with the other variables.
The R2 value = 0.73 and the standard error of the estimate = 7.06.
The value of R2 is high, which implies that the fitted
model is appropriate for this data set.

ANOVA

Model        Sum of Squares    df    Mean Square        F     Sig.
Regression       3147.966       6      524.661      10.502    .000
Residual         1149.000      23       49.957
Total            4296.967      29

From the ANOVA table we can also conclude that
the overall fit of the model is appropriate (F = 10.502, significant at α = 0.01).
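The reported summary statistics can be reproduced from the ANOVA table's sums of squares; a quick check in Python (values taken from the table above):

```python
import math

# Values from the slides' ANOVA table
ss_reg, df_reg = 3147.966, 6
ss_res, df_res = 1149.000, 23
ss_tot = 4296.967

ms_reg = ss_reg / df_reg      # mean square for regression, ~524.661
ms_res = ss_res / df_res      # mean square for residual, ~49.957
f0 = ms_reg / ms_res          # ~10.50, matching the reported F
r2 = ss_reg / ss_tot          # ~0.73, matching the reported R2
se_est = math.sqrt(ms_res)    # ~7.07, close to the reported 7.06
```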
Conclusion
1. Regression can describe the relationship between several
independent variables and a dependent variable.

2. Regression can estimate the unknown parameters of the
regression model.

3. It can also be used for forecasting the response variable,
and these predictions are helpful in planning the project.