
Regression and Correlation

Continued
Least Squares Principle
• Determine the regression equation by minimizing the sum of the squares of the vertical distances between the actual Y values and the predicted values of Y
• General form: $Y' = a + bX$
• Slope: $b = r \frac{s_y}{s_x}$
• Intercept: $a = \bar{Y} - b\bar{X}$
• $r$ : correlation coefficient
• $s_y$ : standard deviation of Y
• $s_x$ : standard deviation of X
• $\bar{Y}$ : mean of Y
• $\bar{X}$ : mean of X
The Concept of Residuals
Book         x (pages)  y (price)  y^      residual y-y^  (y-y^)^2
History      500        84         73.71    10.29         105.80
Mathematics  700        75         84.00    -9.00          81.00
Psychology   800        99         89.14     9.86          97.16
Sociology    600        72         78.86    -6.86          47.02
Management   400        69         68.57     0.43           0.18
Biology      500        81         73.71     7.29          53.08
Music        600        63         78.86   -15.86         251.45
Nursing      800        93         89.14     3.86          14.88
Total                                                     650.57
Least-squares values: a = 48, b = 0.05 (b is shown rounded; the fitted values y^ use the unrounded slope)
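The fit in the table can be reproduced with a short Python sketch. It applies the slope and intercept formulas from the least squares slide (b = r·s_y/s_x, a = mean of Y minus b times mean of X) to the page and price columns. The function and variable names are illustrative choices, not part of the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Book data from the residual table: pages (x) and price (y)
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]


def pearson_r(xs, ys):
    """Sample correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares line via b = r * s_y / s_x and a = mean(Y) - b * mean(X)."""
    b = pearson_r(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


a, b = fit_line(pages, prices)
fitted = [a + b * x for x in pages]
residuals = [y - f for y, f in zip(prices, fitted)]
sse = sum(e ** 2 for e in residuals)

print(f"a = {a:.2f}, b = {b:.4f}")
for x, y, f, e in zip(pages, prices, fitted, residuals):
    print(f"{x:4d} {y:3d} {f:6.2f} {e:7.2f} {e ** 2:7.2f}")
print(f"sum of squared residuals = {sse:.2f}")
```

The unrounded slope is what produces fitted values such as 73.71 at 500 pages; a slope of exactly 0.05 would give 73.00.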
Standard Error of Estimate
$$ s_{y.x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} \qquad t_{\text{calc}} = \frac{b - 0}{s_e} $$
For the book data above, the residuals y-y^ sum to 0.00: the positive deviations are offset by the negative deviations. With a = 48 and b = 0.05 (rounded), the squared residuals sum to 650.57, so $s_{y.x} = \sqrt{650.57 / (8 - 2)} = 10.4129$.
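As a check on the standard error of estimate for the book data, here is a minimal Python sketch. It fits the line by ordinary least squares and then applies the formula with n − 2 in the denominator. The function name `standard_error_of_estimate` is a hypothetical helper, not something defined in the slides.

```python
from math import sqrt

# Book data: pages (x) and price (y)
pages = [500, 700, 800, 600, 400, 500, 600, 800]
prices = [84, 75, 99, 72, 69, 81, 63, 93]


def standard_error_of_estimate(xs, ys):
    """Return (s_yx, residuals) for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)
    # Two parameters (a and b) are estimated, hence n - 2 degrees of freedom
    return sqrt(sse / (n - 2)), residuals


s_yx, residuals = standard_error_of_estimate(pages, prices)
print(f"sum of residuals = {sum(residuals):.2f}")
print(f"s_y.x = {s_yx:.4f}")
```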
Basic Assumptions of Linear Regression
• For each value of X, the Y values follow a normal distribution
• The means of these normal distributions lie on the regression line
• The distributions all have the same standard error of estimate ($s_{y.x}$); and
• The Y values are statistically independent of one another
• If the values follow a normal distribution:
$Y' \pm s_{y.x}$ includes the middle 68% of the observations
$Y' \pm 2s_{y.x}$ includes the middle 95% of the observations
$Y' \pm 3s_{y.x}$ includes virtually all the observations
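The 68% / 95% rule can be checked informally against the book residuals. The sketch below counts how many of the eight residuals from the table fall within one, two, and three standard errors, taking s_y.x = 10.4129 from the slides. With only eight observations the proportions are not expected to match the normal percentages closely, so this is an illustration rather than a test of normality. The helper name `count_within` is hypothetical.

```python
# Residuals (y - y^) as listed in the book table, and s_y.x from the slides
residuals = [10.29, -9.00, 9.86, -6.86, 0.43, 7.29, -15.86, 3.86]
s_yx = 10.4129


def count_within(values, s, multiple):
    """Number of residuals whose absolute value is at most multiple * s."""
    return sum(1 for e in values if abs(e) <= multiple * s)


counts = {m: count_within(residuals, s_yx, m) for m in (1, 2, 3)}
for m, c in counts.items():
    share = c / len(residuals)
    print(f"within +/-{m} s_y.x: {c} of {len(residuals)} ({share:.0%})")
```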
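Confidence Interval & Prediction Interval

Confidence interval for the mean of Y at a given value of X:
$$ \hat{Y} \pm t \, s_{y.x} \sqrt{\frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}} $$

Prediction interval for an individual Y at a given value of X:
$$ \hat{Y} \pm t \, s_{y.x} \sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum (X - \bar{X})^2}} $$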
Example
• Determine a 95% confidence interval for the mean copier sales of all sales representatives who make 25 calls
Sales Representative  Sales Calls (X)  Copier Sales (Y)  (X-mean)  (X-mean)^2
Tom Keller            20               30                 -2          4
Jeft Hall             40               60                 18        324
Brian Virost          20               40                 -2          4
Greg Fish             30               60                  8         64
Susan Welch           10               30                -12        144
Carlos R              10               40                -12        144
Rich N                20               40                 -2          4
Mike Kiel             20               50                 -2          4
Mark Reynolds         20               30                 -2          4
Soni Jones            30               70                  8         64
Mean of X = 22        Total                                0        760
Cont’
• If a representative makes 25 calls, the expected number of copiers sold is 48.5526, found by Y’ = 18.9476 + 1.1842 X = 18.9476 + 1.1842 (25)
• df = n – 2 = 10 – 2 = 8
• Confidence level 95%, so t value = 2.306
$$ CI = 48.5526 \pm 2.306 (9.901) \sqrt{\frac{1}{10} + \frac{(25 - 22)^2}{760}} = 48.5526 \pm 7.6356 $$
Coefficient of Determination
$$ r^2 = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}} = \frac{\sum (Y - \bar{Y})^2 - \sum (Y - Y')^2}{\sum (Y - \bar{Y})^2} $$
• E.g., for Y’ = a + bX with $r^2$ = 0.8, we say that 80% of the variation in weekly production, Y, is explained by its linear relationship with X
The Relationships among the coefficient of correlation, coefficient of determination and the standard error of estimate

$$ \text{Regression variation} = SSR = \sum (Y' - \bar{Y})^2 $$
$$ \text{Error variation} = SSE = \sum (Y - Y')^2 $$
$$ \text{Total variation} = SS\ total = \sum (Y - \bar{Y})^2 $$

The format for the ANOVA table is

Source      df     SS         MS
Regression  1      SSR        SSR/1
Error       n-2    SSE        SSE/(n - 2)
Total       n-1    SS total*
*SS total = SSR + SSE
Cont’
Coefficient of determination:
$$ r^2 = \frac{SSR}{SS\ total} = 1 - \frac{SSE}{SS\ total} $$

Standard error of estimate:
$$ s_{y.x} = \sqrt{\frac{SSE}{n - 2}} $$
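To see how SSR, SSE and SS total fit together, the following Python sketch builds the simple-regression ANOVA quantities for the sales-call data from the earlier example (10 representatives, calls X and copier sales Y). It is a minimal illustration; the variable names are assumptions.

```python
from math import sqrt

# Sales-call example: calls (X) and copiers sold (Y)
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
sales = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

n = len(calls)
x_bar = sum(calls) / n
y_bar = sum(sales) / n

# Least-squares slope and intercept
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(calls, sales))
sxx = sum((x - x_bar) ** 2 for x in calls)
b = sxy / sxx
a = y_bar - b * x_bar
fitted = [a + b * x for x in calls]

# Variation decomposition
ssr = sum((f - y_bar) ** 2 for f in fitted)               # regression
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))    # error
ss_total = sum((y - y_bar) ** 2 for y in sales)           # total

r2 = ssr / ss_total
s_yx = sqrt(sse / (n - 2))

print(f"Y' = {a:.4f} + {b:.4f} X")
print(f"Regression  df=1      SS={ssr:10.3f}  MS={ssr / 1:10.3f}")
print(f"Error       df={n - 2}      SS={sse:10.3f}  MS={sse / (n - 2):10.3f}")
print(f"Total       df={n - 1}      SS={ss_total:10.3f}")
print(f"r^2 = {r2:.4f}, s_y.x = {s_yx:.3f}")
```

The computed intercept differs from the 18.9476 quoted in the example in the fourth decimal place, which is consistent with rounding in the slides.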
Multiple Regression and Correlation Analysis
Multiple Regression Analysis
• General multiple regression equation, with k independent variables
$$ Y' = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k $$
Multiple Standard Error of Estimate
• Multiple standard error of estimate
$$ s_{y.12 \dots k} = \sqrt{\frac{\sum (Y - Y')^2}{n - (k + 1)}} $$
Assumptions about Multiple Regression and Correlation
1. The independent variables and the dependent variable have a linear relationship
2. The dependent variable is continuous and at least interval scale
3. The variation in the difference between the actual and the predicted values is the same for all fitted values of Y (homoscedasticity): (Y-Y’) must be approximately the same for all values of Y’
4. The residuals (Y-Y’) are normally distributed with a mean of 0
5. Successive observations of the dependent variable are uncorrelated; a violation of this assumption is called autocorrelation
The Relationships among the coefficient of correlation, coefficient of determination and the standard error of estimate

$$ \text{Regression variation} = SSR = \sum (Y' - \bar{Y})^2 $$
$$ \text{Error variation} = SSE = \sum (Y - Y')^2 $$
$$ \text{Total variation} = SS\ total = \sum (Y - \bar{Y})^2 $$

The format for the ANOVA table is

Source      df           SS        MS                       F
Regression  k            SSR       MSR = SSR/k              MSR/MSE
Error       n - (k + 1)  SSE       MSE = SSE/[n - (k+1)]
Total       n-1          SS total
Cont’
• Coefficient of Multiple Determination
$$ R^2 = \frac{SSR}{SS\ total} $$

• Adjusted Coefficient of Determination
$$ R^2_{adj} = 1 - \frac{SSE / [n - (k + 1)]}{SS\ total / (n - 1)} $$
Example
Analysis of Variance
Source          DF  SS   MS
Regression      5   100  20
Residual Error  20  40   2
Total           25  140
• How large was the sample?
• How many independent and dependent variables are there?
• Compute the standard error of estimate. About 95 percent of
the residuals will be between what two values?
• Determine the coefficient of multiple determination.
Interpret this value
• Find the coefficient of multiple determination, adjusted for
the degrees of freedom
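One way to work through these questions is to read everything off the ANOVA table and apply the formulas from the preceding slides. The Python sketch below does that under the usual conventions (total df = n − 1, regression df = k). The variable names are illustrative, and the ±2 standard errors band is the rule-of-thumb reading of "about 95 percent".

```python
from math import sqrt

# Entries from the ANOVA table in the example
df_regression, ss_regression = 5, 100
df_error, ss_error = 20, 40
df_total, ss_total = 25, 140

n = df_total + 1        # total df is n - 1
k = df_regression       # regression df equals the number of independent variables

# Multiple standard error of estimate: sqrt(SSE / [n - (k + 1)])
s = sqrt(ss_error / (n - (k + 1)))
band = 2 * s            # about 95% of residuals fall within +/- 2 standard errors

# Coefficient of multiple determination and its adjusted version
r2 = ss_regression / ss_total
r2_adj = 1 - (ss_error / (n - (k + 1))) / (ss_total / (n - 1))

print(f"sample size n = {n}")
print(f"independent variables k = {k}, dependent variables = 1")
print(f"standard error of estimate = {s:.3f}; 95% band = +/-{band:.2f}")
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```

An R^2 of about 0.71 would be read as roughly 71% of the variation in the dependent variable being explained by the five independent variables.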
Global Test: Testing the Multiple Regression Model
• Can the dependent variable be estimated without relying on the independent variables?
$$ H_0: \beta_1 = \beta_2 = \beta_3 = 0 $$
$$ H_1: \text{Not all the } \beta_i \text{ are } 0 $$
• Characteristics of the F distribution:
– It cannot be negative
– It is a continuous distribution
– It is positively skewed
– It is asymptotic
Analysis of Variance
Source          DF  SS      MS
Regression      3   171220  57073.49
Residual Error  16  41695   2605.955
Total           19  212916

            Coefficients  Standard Error  t-stat
Intercept   427.194       59.601           7.168
Temp         -4.583        0.772          -5.934
Insul       -14.831        4.754          -3.119
Age           6.101        4.012           1.521

$$ F = \frac{SSR / k}{SSE / [n - (k + 1)]} = \frac{171{,}220 / 3}{41{,}695 / [20 - (3 + 1)]} = 21.90 $$

df = (3, 16), so the critical value of F at the 0.05 significance level is 3.24. Because 21.90 > 3.24, reject H0: at least some of the independent variables do have the ability to explain the variation in the dependent variable.
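The global F test reduces to a few lines of arithmetic. The Python sketch below recomputes F from the sums of squares in the ANOVA table and compares it with the critical value 3.24 quoted in the slides; the critical value is taken as given rather than looked up in code. The variable names are illustrative.

```python
# Sums of squares and sizes from the ANOVA output
ssr, sse = 171220, 41695
n, k = 20, 3

df_regression = k
df_error = n - (k + 1)

msr = ssr / df_regression   # mean square regression
mse = sse / df_error        # mean square error
f_stat = msr / mse

f_critical = 3.24           # critical value quoted in the slides for df = (3, 16)
reject_h0 = f_stat > f_critical

print(f"df = ({df_regression}, {df_error})")
print(f"MSR = {msr:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The mean squares computed here differ slightly from the table's 57073.49 and 2605.955 because the table's SS entries are rounded to whole numbers.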
Evaluating Individual Regression Coefficients
For temperature: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
For insulation: $H_0: \beta_2 = 0$, $H_1: \beta_2 \neq 0$
For furnace age: $H_0: \beta_3 = 0$, $H_1: \beta_3 \neq 0$
$\alpha = 0.05$
$n = 20$
$df = n - (k + 1) = 20 - (3 + 1) = 16$
$t_{table} = 2.120$

$$ \hat{Y} = 427.194 - 4.583 X_1 - 14.831 X_2 + 6.101 X_3 $$

For X1:
$$ t = \frac{b_1 - 0}{s_{b_1}} = \frac{-4.583 - 0}{0.772} = -5.934 $$
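The same t computation can be applied to all three coefficients. The Python sketch below divides each coefficient by its standard error and compares the absolute value with the table value 2.120 (two-sided test, alpha = 0.05, df = 16). The coefficients and standard errors are the rounded figures from the regression output, so the results can differ from the printed t-stats in the third decimal place. The dictionary layout is an illustrative choice.

```python
# (coefficient, standard error) pairs from the regression output
coefficients = {
    "Temp": (-4.583, 0.772),
    "Insul": (-14.831, 4.754),
    "Age": (6.101, 4.012),
}
t_table = 2.120  # two-sided critical value quoted in the slides for df = 16

results = {}
for name, (b, s_b) in coefficients.items():
    t = (b - 0) / s_b                  # test of H0: beta = 0
    results[name] = (t, abs(t) > t_table)

for name, (t, significant) in results.items():
    verdict = "reject H0" if significant else "do not reject H0"
    print(f"{name:6s} t = {t:7.3f}  -> {verdict}")
```

On these figures temperature and insulation are significant, while furnace age is not, since its t of about 1.52 is inside ±2.120.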
Evaluating the Assumptions of
Multiple Regression
1. There is a linear relationship
2. The variation in the residuals is the same for
both large and small values of Y
3. The residuals follow the normal probability
distribution
4. The independent variables should not be
correlated
5. The residuals are independent
ASSIGNMENT
• To be done individually and written up in PowerPoint (ppt); next week, presenters will be called at random
• If you are called and are not ready, your midterm exam (UTS) score is reduced by 5; if you attempt to present, it is increased by 5
• Work the following problems from Lind:
– Problem no. 20 in the multiple regression exercises: Mike Wilde
– Problem no. 49 in the linear regression exercises: a sample of 12 homes
Successful / Not successful
26/9/2015, no. 20 Vebri Kristianto
