REGRESSION AND CORRELATION ANALYSIS
TM 14
Intan N. Awwaliyah
Goals
After this, you should be able to:
• Calculate and interpret the simple correlation between two variables
• Determine whether the correlation is significant
• Calculate and interpret the simple linear regression equation for a set of data
• Understand the assumptions behind regression analysis
• Determine whether a regression model is significant
• Calculate and interpret confidence intervals for the regression coefficients
• Recognize regression analysis applications for purposes of prediction and description
• Recognize some potential problems if regression analysis is used incorrectly
• Recognize nonlinear relationships between two variables
CORRELATION ANALYSIS
• Correlation analysis starts from the observation that one event or occurrence is often related to another.
For example:
• Price variable (X): rises and falls in price are expressed as changes in the value of X
• Sales variable (Y): rises and falls in sales are shown by changes in the value of Y
Initial step: identify the variables
CORRELATION
Definition and forms
Correlation is the term used for measuring the strength of the relationship between variables.
Scatter Plot Examples
[Figure: scatter plots of y against x showing strong relationships, weak relationships, and no relationship]
Correlation Coefficient
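• The population correlation coefficient ρ (rho) measures the strength of the linear association between the variables
• The sample correlation coefficient r estimates ρ from sample data; it ranges from −1 to +1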
[Figure: scatter plots of y against x for r = −1, r = −0.6, r = 0, r = +0.3, and r = +1]
Product Moment Correlation (Pearson's Correlation)
• The product moment correlation developed by Karl Pearson is also widely known as the Pearson correlation
• Uses:
1. To determine whether two variables are related
2. To determine the direction or form of the relationship
3. To determine the strength of the relationship
4. As a basis for prediction
Types of Bivariate Correlation
Pearson, Spearman, and Kendall
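The three coefficients named above can be compared on the house price data that appears in the House Prices example of this section. The sketch below is a plain-Python illustration, assuming the usual textbook definitions: Spearman's rho as Pearson's r computed on average ranks, and Kendall's tau-b with its standard tie correction. The helper names (`pearson`, `average_ranks`, `spearman`, `kendall_tau_b`) are hypothetical, written for this example rather than taken from any library.

```python
import math


def pearson(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b from concordant/discordant pair counts with tie correction."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # pair tied on both variables is left out
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom


# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

print(round(pearson(sqft, price), 5))  # 0.76211, the Multiple R in the Excel output
print(round(spearman(sqft, price), 3))
print(round(kendall_tau_b(sqft, price), 3))
```

Pearson measures linear association on the raw values, while the two rank-based coefficients only ask whether y tends to rise as x rises, so they are less sensitive to outliers.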
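Pearson correlation coefficient (sample):
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}
or, equivalently, in computational form:
r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}
Example: tree height (y) and trunk diameter (x), n = 8:
r = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886
[Figure: scatter plot of Tree Height, y, against Trunk Diameter, x]
r = 0.886 → relatively strong positive linear association between x and y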
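The computational formula only needs five sums, so the tree example can be checked directly from the figures on the slide. A minimal sketch; the function name `pearson_r_from_sums` is a hypothetical helper.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Computational form of Pearson's r, using only summary sums."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Tree example: n = 8, Σx = 73, Σy = 321, Σxy = 3142, Σx² = 713, Σy² = 14111
r = pearson_r_from_sums(8, 73, 321, 3142, 713, 14111)
print(round(r, 3))  # 0.886
```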
Excel Output
Excel correlation output (Tools / Data Analysis / Correlation…): correlation between tree height and trunk diameter
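Significance Test for Correlation
• Hypotheses
H0: ρ = 0 (no linear correlation)
HA: ρ ≠ 0 (linear correlation exists)
• Test statistic
t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \quad \text{(with } n - 2 \text{ degrees of freedom)}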
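Example: Tree Height and Trunk Diameter
Is there evidence of a linear relationship between tree height and trunk diameter at the .05 level of significance?
t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68
d.f. = 8 − 2 = 6, so the two-tailed critical values at α = .05 are ±2.447.
Decision: since 4.68 > 2.447, reject H0. There is evidence of a linear relationship between tree height and trunk diameter at the .05 level of significance.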
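The test statistic and decision above can be reproduced in a few lines. This sketch assumes the critical value 2.447 (α = .05, two-tailed, d.f. = 6) is read from a t table; it is hard-coded rather than computed, and `correlation_t_stat` is a hypothetical helper name.

```python
import math


def correlation_t_stat(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.886, 8
t = correlation_t_stat(r, n)
t_crit = 2.447  # two-tailed critical value, alpha = .05, d.f. = 6 (from a t table)

print(round(t, 2))      # 4.68
print(abs(t) > t_crit)  # True, so H0: rho = 0 is rejected
```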
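Population linear regression model:
y = β0 + β1x + ε
[Figure: population regression line of the dependent variable y against the independent variable x, with intercept β0 and slope β1; for a given xi, the random error εi is the vertical gap between the observed value of y and the predicted value of y on the line]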
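Estimated Regression Model
The sample regression line provides an estimate of the population regression line:
ŷ = b0 + b1x
where ŷ is the estimated (predicted) value of y, b0 and b1 are the sample estimates of the intercept and slope, and x is the independent variable.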
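Formula
The least squares criterion chooses b0 and b1 to minimize the sum of squared residuals:
\sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2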
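The Least Squares Equation
• The formulas for computing b1 and b0 by hand are:
b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}
or:
b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} \quad \text{and} \quad b_0 = \bar{y} - b_1 \bar{x}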
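Both slope formulas above can be applied to the house price data from the House Prices example, which makes it easy to confirm they agree and reproduce the Excel coefficients. A plain-Python sketch; `least_squares` and `slope_shortcut` are hypothetical helper names.

```python
# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def least_squares(x, y):
    """Return (b0, b1) using the deviation form of the least squares formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def slope_shortcut(x, y):
    """Slope from the computational (sums-only) form of the formula."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    return (sum_xy - sum(x) * sum(y) / n) / (sum_x2 - sum(x) ** 2 / n)


b0, b1 = least_squares(sqft, price)
print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```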
Interpreting the Slope and Intercept
ŷ = b0 + b1x
• b0 is the estimated average value of y when x equals 0
• b1 is the estimated change in the average value of y for a one-unit increase in x
Excel output for the house price model (y = house price in $1000s, x = square feet):

ANOVA
              df   SS           MS           F         Significance F
Regression     1   18934.9348   18934.9348   11.0848   0.01039
Residual       8   13665.5652    1708.1957
Total          9   32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet    0.10977        0.03297         3.32938   0.01039     0.03374     0.18580
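The ANOVA part of this table can be rebuilt from the raw data, which shows where each entry comes from: SSR, SSE and SST, their degrees of freedom, the mean squares, and F = MSR/MSE. A minimal sketch; the Significance F (p-value) column needs the F distribution from a table or statistics library and is not computed here.

```python
# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n, k = len(sqft), 1  # k = number of independent variables
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in sqft]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)              # explained by regression
sse = sum((y - yh) ** 2 for y, yh in zip(price, y_hat))  # residual (unexplained)
sst = sum((y - y_bar) ** 2 for y in price)                # total = SSR + SSE
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

print(f"Regression  df={k}  SS={ssr:.4f}  MS={msr:.4f}  F={f_stat:.4f}")
print(f"Residual    df={n - k - 1}  SS={sse:.4f}  MS={mse:.4f}")
print(f"Total       df={n - 1}  SS={sst:.4f}")
```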
Graphical Presentation
• House price model: scatter plot and regression line
[Figure: House Price ($1000s) plotted against Square Feet, with the fitted regression line; intercept = 98.248, slope = 0.10977]
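Coefficient of Determination, R²
• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
R^2 = \frac{SSR}{SST}, \quad \text{where } 0 \le R^2 \le 1
where SSR is the regression (explained) sum of squares and SST is the total sum of squares.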
• With a single independent variable, the coefficient of determination equals the square of the simple correlation coefficient:
R^2 = r^2
where:
R² = Coefficient of determination
r = Simple correlation coefficient
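Both routes to R² can be checked on the house price data: SSR/SST from the fitted line, and the square of the simple correlation coefficient r. A plain-Python sketch using the data from the House Prices example.

```python
import math

# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
sxx = sum((x - x_bar) ** 2 for x in sqft)
sst = sum((y - y_bar) ** 2 for y in price)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in sqft)

r_squared = ssr / sst              # R^2 = SSR / SST
r = sxy / math.sqrt(sxx * sst)     # simple correlation coefficient
print(round(r_squared, 5), round(r ** 2, 5))  # 0.58082 0.58082
```

The agreement holds only for simple regression; with several independent variables R² is the square of the multiple correlation instead.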
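Examples of R² Values
[Figure: scatter plots of y against x illustrating R² = 1, 0 < R² < 1, and R² = 0]
• R² = 1: perfect linear relationship between x and y; all of the variation in y is explained by variation in x
• 0 < R² < 1: some, but not all, of the variation in y is explained by variation in x
• R² = 0: no linear relationship between x and y; none of the variation in y is explained by variation in x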
Standard Error of Estimate
• The standard deviation of the observations around the regression line is estimated by:
s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}
where
SSE = Sum of squares error
n = Sample size
k = Number of independent variables in the model
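Applied to the house price model (n = 10, k = 1), this formula should reproduce the Standard Error of 41.33032 reported by Excel. A short sketch with the data from the House Prices example.

```python
import math

# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n, k = len(sqft), 1
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_e = math.sqrt(sse / (n - k - 1))  # standard error of the estimate
print(round(s_e, 5))  # 41.33032
```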
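Standard Deviation of the Regression Slope
s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}}}
where:
sb1 = Estimate of the standard error of the least squares slope
s_\varepsilon = \sqrt{\frac{SSE}{n - 2}} = Sample standard error of the estimate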
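The computational form of sb1 can be checked against the 0.03297 standard error that Excel reports for the Square Feet coefficient. A plain-Python sketch using the House Prices data.

```python
import math

# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price)) / (n - 2))

# Computational form: sum(x^2) - (sum(x))^2 / n equals sum((x - x_bar)^2)
sum_x2 = sum(x ** 2 for x in sqft)
s_b1 = s_e / math.sqrt(sum_x2 - sum(sqft) ** 2 / n)
print(round(s_b1, 5))  # 0.03297
```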
Excel Output
Regression Statistics
Multiple R            0.76211
R Square              0.58082
Adjusted R Square     0.52842
Standard Error       41.33032
Observations         10
sε = 41.33032 (Standard Error above); sb1 = 0.03297 (standard error of the Square Feet coefficient)
Comparing Standard Errors
[Figure: variation of observed y values from the regression line (panel: small sε) and variation in the slope of regression lines from different possible samples (panel: small sb1)]
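Inferences about the Slope: t Test
• Test statistic for the population slope β1:
t = \frac{b_1 - \beta_1}{s_{b_1}}, \quad d.f. = n - 2
where:
b1 = Sample regression slope coefficient
β1 = Hypothesized slope
sb1 = Estimator of the standard error of the slope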
Inferences about the Slope: t Test Example
H0: β1 = 0
HA: β1 ≠ 0
From the Excel output (Square Feet row): b1 = 0.10977, sb1 = 0.03297, P-value = 0.01039
Test statistic: t = \frac{b_1}{s_{b_1}} = \frac{0.10977}{0.03297} = 3.329
d.f. = 10 − 2 = 8; with α/2 = .025 in each tail, the critical values are ±2.3060
[Figure: t distribution with rejection regions below −2.3060 and above 2.3060; t = 3.329 falls in the upper rejection region]
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price.
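The slope test can be reproduced from the raw data. The critical value 2.3060 (α/2 = .025, d.f. = 8) is taken from the slide and hard-coded; the script does not compute it.

```python
import math

# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sxx
b0 = y_bar - b1 * x_bar
s_e = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price)) / (n - 2))
s_b1 = s_e / math.sqrt(sxx)

beta1_h0 = 0.0                # hypothesized slope under H0
t = (b1 - beta1_h0) / s_b1    # d.f. = n - 2 = 8
t_crit = 2.3060               # two-tailed critical value from the slide

print(round(t, 3))          # 3.329
print(abs(t) > t_crit)      # True, so H0: beta1 = 0 is rejected
```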
Regression Analysis for Description
Confidence interval estimate of the slope:
b_1 \pm t_{\alpha/2}\, s_{b_1}, \quad d.f. = n - 2
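For the house price model, this interval should match the Lower 95% and Upper 95% columns of the Excel output for Square Feet. A short sketch using b1 and sb1 from that output and the critical value 2.3060 (d.f. = 8) from the slope test.

```python
# Values from the Excel output (Square Feet row) and the slope t test
b1 = 0.10977
s_b1 = 0.03297
t_crit = 2.3060  # t_{alpha/2} for a 95% interval with d.f. = 10 - 2 = 8

margin = t_crit * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
```

Since house price is measured in $1000s, the interval says that each extra square foot adds roughly $33.74 to $185.80 to the average price.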
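Confidence interval estimate for the mean of y, given a particular xp:
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}

Confidence interval estimate for an individual value of y, given a particular xp:
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}}

[Figure: confidence interval for the mean of y, given xp, drawn around the regression line, with x̄ and xp marked on the x axis]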
Example: House Prices
Estimated regression equation:
house price = 98.25 + 0.1098 (sq.ft.)
Predict the price of a house with 2000 square feet.

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700
house price = 98.25 + 0.1098 (sq.ft.)
            = 98.25 + 0.1098(2000)
            = 317.85
The predicted price for a house with 2000 square feet is 317.85 ($1,000s) = $317,850.
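The point prediction is a single evaluation of the estimated equation. This sketch uses the rounded coefficients from the slide (98.25 and 0.1098); `predict_price` is a hypothetical helper name. With the unrounded Excel coefficients the prediction is about 317.78, so the small difference comes only from rounding.

```python
def predict_price(square_feet, b0=98.25, b1=0.1098):
    """Point prediction (in $1000s) from the rounded regression coefficients."""
    return b0 + b1 * square_feet


print(round(predict_price(2000), 2))  # 317.85
```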
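Estimating the Mean Value
Confidence interval for E(y)|xp
Example: find the 95% confidence interval for the mean price of houses with 2,000 square feet.
Predicted price ŷ = 317.85 ($1,000s)
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 37.12
Prediction interval for the price of an individual house with 2,000 square feet:
\hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 102.28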
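The two half-widths (37.12 and 102.28) can be reproduced from the raw data. This sketch assumes the critical value t = 2.3060 (d.f. = 8) used earlier in the slope test. The centre it prints is about 317.78 because it uses unrounded coefficients, while the slide uses 317.85 from the rounded equation.

```python
import math

# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate

t_crit = 2.3060  # t_{alpha/2}, alpha = .05, d.f. = 8
x_p = 2000
y_hat = b0 + b1 * x_p

# Shared term: 1/n + (x_p - x_bar)^2 / sum((x - x_bar)^2)
core = 1 / n + (x_p - x_bar) ** 2 / sxx
ci_half = t_crit * s_e * math.sqrt(core)      # interval for the mean of y at x_p
pi_half = t_crit * s_e * math.sqrt(1 + core)  # interval for an individual y at x_p

print(f"mean of y:     {y_hat:.2f} ± {ci_half:.2f}")
print(f"individual y:  {y_hat:.2f} ± {pi_half:.2f}")
```

The extra 1 under the square root is what makes the individual prediction interval wider: it adds the scatter of a single house around the mean line.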
Residual Analysis for Linearity
[Figure: two scatter plots of y against x with their residual plots, labeled "Not Linear" and "✓ Linear"]
Residual Analysis for Constant Variance
[Figure: two scatter plots of y against x with their residual plots]
RESIDUAL OUTPUT
Observation   Predicted House Price   Residuals
1             251.92316                -6.923162
2             273.87671                38.12329
3             284.85348                -5.853484
4             304.06284                 3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251                48.79749
8             367.17929               -43.17929
9             254.6674                 64.33264
10            284.85348               -29.85348
[Figure: House Price Model Residual Plot, residuals against Square Feet]
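The residual table can be regenerated from the fitted line; each residual is the observed price minus the predicted price. A plain-Python sketch using the House Prices data. With an intercept in the model, least squares residuals always sum to zero (up to rounding).

```python
# House price data (y in $1000s, x in square feet) from the House Prices example
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
x_bar, y_bar = sum(sqft) / n, sum(price) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * x for x in sqft]
residuals = [y - yh for y, yh in zip(price, predicted)]  # observed - predicted

for obs, (yh, e) in enumerate(zip(predicted, residuals), start=1):
    print(obs, round(yh, 5), round(e, 5))
print("sum of residuals:", round(sum(residuals), 9))
```

Plotting `residuals` against `sqft` gives the residual plot used to check linearity and constant variance.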
SPSS Procedure
• Open the data file regresi.sav
• Menu Analyze → Regression → Linear
• Dependent → dependent variable: Sales
• Independent → Promosi
• Case Labels → Daerah
• Method → ENTER
• Options
  • Stepping Method Criteria → F test
  • Leave the other settings at their defaults
  • Continue
• Statistics
  • Regression Coefficients: Estimates, plus Model fit and Descriptives
  • Residuals → Casewise Diagnostics → All cases: to see the regression results for every region (Daerah)
  • Continue
• Plots (can be used to detect outliers)
  • SDRESID → Y; ZPRED → X
  • Next
  • ZPRED → Y; DEPENDNT → X
  • Next
  • Note: the two plots above check linearity and equality of variance (homoscedasticity)
  • Standardized Residual Plots and Normal Probability Plot (to check normality)
  • Continue
• OK
Summary
• Introduced correlation analysis
• Discussed correlation to measure the strength of a linear association
• Introduced simple linear regression analysis
• Calculated the coefficients for the simple linear regression equation
• Described measures of variation (R² and sε)
• Addressed assumptions of regression and correlation
Summary
(continued)