
CORRELATION AND REGRESSION
1. Scatter Plot
A scatter plot is a chart commonly used to see the pattern of the relationship between two variables. To use a scatter plot, both variables must be measured on an interval or ratio scale.
A scatter plot only gives a visual impression of the relationship; it does not quantify the direction or strength of the relationship.
Examples of scatter plots
[Figure: example scatter plots. Panels: income vs. car ownership, with points clustered along an upward trend (positive relationship); low or no relationship, with points widely scattered; a relationship with an outlier.]
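2. Covariance
Covariance is a statistic used to describe the linear relationship between two variables.
It shows the direction of the relationship: a positive value indicates a positive relationship, a negative value a negative relationship, and a value near 0 no linear relationship. Unlike the correlation coefficient, it is not bounded between -1 and 1; its magnitude depends on the units of the two variables.
It therefore cannot indicate the strength of the relationship.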
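Covariance formula (population of N observations; for a sample, use the sample means and divide by n - 1):
$$\operatorname{Cov}(X,Y) = \frac{\sum_{i=1}^{N}(X_i - \mu_X)(Y_i - \mu_Y)}{N}$$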
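The covariance is the average product of the deviations of X and Y from their means. A minimal Python sketch, using made-up data (`xs` and `ys` are hypothetical, not from the source), computes the population version (divide by N) and optionally the sample version (divide by n - 1). The last call shows that multiplying X by 100 multiplies the covariance by 100, which is why covariance alone cannot express the strength of a relationship.

```python
def covariance(x, y, sample=False):
    """Covariance of paired data.

    sample=False divides by N (population formula);
    sample=True divides by n - 1 (sample estimate).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross / (n - 1 if sample else n)

# Hypothetical paired observations (illustration only)
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 6.0]

print(covariance(xs, ys))                        # 3.5
print(covariance(xs, ys, sample=True))           # 14 / 3
print(covariance([v * 100 for v in xs], ys))     # 350.0: same data, X in different units
```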
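3. Correlation
• An analysis tool used to determine the direction and strength of the relationship between two variables
• Values range from -1 to +1
Simple (Pearson) correlation formula:
$$r_{xy} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2 \sum_{i=1}^{n}(Y_i - \bar{Y})^2}}$$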

Mendekati 1 atau -1 berarti hubungan antara dua variabel semakin kuat,



sebaliknya nilai mendekati 0 berarti hubungan antara dua variabel semakin
lemah.
Nilai positif menunjukkan hubungan searah (X naik maka Y naik) dan nilai

negatif menunjukkan hubungan terbalik (X naik maka Y turun).
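A sketch of the Pearson correlation in plain Python, assuming the usual deviation form: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The data are hypothetical; the second and third calls illustrate that r does not change when X is rescaled and that it reaches -1 for a perfect inverse relationship.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of paired data."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the source)
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
print(round(pearson_r(x, y), 3))                     # 0.837
print(round(pearson_r([v * 100 for v in x], y), 3))  # 0.837, unchanged by rescaling X
print(pearson_r(x, [-v for v in x]))                 # -1.0, perfect inverse relationship
```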
According to Sugiyono (2007), the correlation coefficient can be interpreted using the following guideline:
0.00 - 0.199 = very low
0.20 - 0.399 = low
0.40 - 0.599 = moderate
0.60 - 0.799 = strong
0.80 - 1.000 = very strong
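The Sugiyono bands can be applied mechanically, as in the hypothetical helper `interpret_r` below. It assumes the bands are read against the absolute value of r (the sign is reported separately as the direction), and it treats each upper bound as exclusive so that values such as 0.1995, which fall between the printed bands, are still classified.

```python
def interpret_r(r):
    """Return (strength, direction) for a correlation coefficient r.

    Strength follows the Sugiyono (2007) bands, applied here to |r|
    (an assumption); each upper bound is treated as exclusive.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)
    bands = [(0.20, "very low"), (0.40, "low"), (0.60, "moderate"), (0.80, "strong")]
    strength = "very strong"  # 0.80 - 1.000
    for upper, label in bands:
        if magnitude < upper:
            strength = label
            break
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"
    return strength, direction

print(interpret_r(0.969))  # ('very strong', 'positive')
print(interpret_r(-0.45))  # ('moderate', 'negative')
```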
CORRELATION MATRIX
Correlations
                                     Citizenship   Participation   Democracy
Citizenship     Pearson Correlation  1             .969**          .968**
                Sig. (2-tailed)                    .007            .007
                N                    5             5               5
Participation   Pearson Correlation  .969**        1               .977**
                Sig. (2-tailed)      .007                          .004
                N                    5             5               5
Democracy       Pearson Correlation  .968**        .977**          1
                Sig. (2-tailed)      .007          .004
                N                    5             5               5
**. Correlation is significant at the 0.01 level (2-tailed).

How is a person's civic knowledge related to their democratic behavior, with political participation as the control variable? Because all three variables are quantitative, Pearson correlation is used.
Multiple Correlation
$$R_{y \cdot x_1 x_2} = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\,r_{yx_1} r_{yx_2} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$

The multiple correlation coefficient measures the strength of the relationship between two or more independent variables, taken together, and a dependent variable.
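As an illustration, the multiple correlation formula can be evaluated with the pairwise coefficients from the correlation matrix above. The assignment y = Democracy, x1 = Citizenship, x2 = Participation is an assumption based on the research question. Because the inputs are rounded to three decimals and the numerator is a small difference of nearly equal terms, treat the result as approximate.

```python
import math

def multiple_r(r_yx1, r_yx2, r_x1x2):
    """Multiple correlation R_y.x1x2 from the three pairwise Pearson correlations."""
    denom = 1.0 - r_x1x2 ** 2
    if denom <= 0.0:
        raise ValueError("x1 and x2 are perfectly correlated; R is undefined")
    r_squared = (r_yx1 ** 2 + r_yx2 ** 2 - 2.0 * r_yx1 * r_yx2 * r_x1x2) / denom
    return math.sqrt(r_squared)

# Pairwise r from the correlation matrix; assumed roles:
# y = Democracy, x1 = Citizenship, x2 = Participation
R = multiple_r(r_yx1=0.968, r_yx2=0.977, r_x1x2=0.969)
print(f"R = {R:.3f}, R^2 = {R ** 2:.3f}")  # R = 0.981, R^2 = 0.962
```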
Significance Test
$$F_{\text{calc}} = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$$

Where:
• R = multiple correlation coefficient
• k = number of independent variables
• n = sample size
• F_calc = computed F value, which is then compared with the F table value
Decision rule:
• If F_calc > F table: significant
• If F_calc < F table: not significant
• Look up the F table value with numerator df = k and denominator df = n - k - 1, at significance level α = 0.01 or α = 0.05
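A small sketch of the F formula above as a function (the name `f_from_r2` is hypothetical). As a cross-check it is applied to the pie-sales regression reported later in this section (R² = 0.52148, n = 15, k = 2), where it reproduces the F value printed in that ANOVA table to within rounding.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for H0: all k slopes are zero, computed from R^2."""
    df_den = n - k - 1  # denominator degrees of freedom
    if k < 1 or df_den < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return (r2 / k) / ((1.0 - r2) / df_den)

# Pie-sales regression: R^2 = 0.52148, n = 15, k = 2 (df = 2 and 12)
F = f_from_r2(0.52148, n=15, k=2)
print(round(F, 3))  # about 6.539; the ANOVA table reports F = 6.53861
```

The computed F is then compared with the F table value for (k, n - k - 1) degrees of freedom, as in the decision rule.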
4. REGRESSION
Simple Regression
Ŷ = a + bX

Where:
Ŷ = response (dependent variable)
a = constant (intercept)
b = regression coefficient of the independent variable
X = predictor (independent variable)
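A minimal least-squares sketch for Ŷ = a + bX. It assumes the ordinary least-squares estimates b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄, which this section does not spell out. The data are hypothetical.

```python
def fit_simple(x, y):
    """Ordinary least-squares estimates (a, b) for Y-hat = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must take at least two different values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope: regression coefficient of X
    a = mean_y - b * mean_x    # constant: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data (not from the source)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_simple(x, y)
print(f"Y-hat = {a:.2f} + {b:.2f}X")  # Y-hat = 1.09 + 1.97X
```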
The Multiple Regression Model
Idea: examine the linear relationship between one dependent variable (y) and two or more independent variables (xi).

Population model (β0 = y-intercept, β1 … βk = population slopes, ε = random error):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Estimated multiple regression model (ŷ = estimated, or predicted, value of y; b0 = estimated intercept; b1 … bk = estimated slope coefficients):
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$
Multiple Regression Model
Two-variable model: ŷ = b0 + b1x1 + b2x2
[Figure: fitted regression plane over the x1 and x2 axes, with y on the vertical axis; labels mark the slope for variable x1 and the slope for variable x2.]
Multiple Regression Model
Two-variable model: ŷ = b0 + b1x1 + b2x2
[Figure: a sample observation yi at (x1i, x2i) and its fitted value ŷi on the regression plane; the vertical gap is the residual e = (y - ŷ).]
The best-fit equation, ŷ, is found by minimizing the sum of squared errors, Σe².
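The idea that the best-fit ŷ minimizes Σe² can be checked numerically. The sketch below uses NumPy's `np.linalg.lstsq` on synthetic data; the sample size, coefficients and noise level are made up and are not the pie-sales data. After fitting, the residuals should be orthogonal to every column of the design matrix, and nudging a coefficient away from its fitted value should only increase SSE.

```python
import numpy as np

# Synthetic data (hypothetical, NOT the pie-sales data)
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(4.0, 8.0, size=n)
x2 = rng.uniform(2.0, 5.0, size=n)
beta_true = np.array([300.0, -25.0, 75.0])
noise = rng.normal(0.0, 5.0, size=n)
y = beta_true[0] + beta_true[1] * x1 + beta_true[2] * x2 + noise

# Design matrix: a column of ones (for b0), then x1 and x2
X = np.column_stack([np.ones(n), x1, x2])

# lstsq returns the b that minimizes ||y - X b||^2, the sum of squared errors
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b  # residuals e = y - y_hat
sse = float(e @ e)

def sse_at(coef):
    """Sum of squared errors for an arbitrary coefficient vector."""
    r = y - X @ coef
    return float(r @ r)

print("b0, b1, b2 =", np.round(b, 2))
print("SSE at fitted b:", round(sse, 2))
print("SSE with b1 nudged by 0.5:", round(sse_at(b + np.array([0.0, 0.5, 0.0])), 2))
print("max |X^T e| (should be ~0):", float(np.max(np.abs(X.T @ e))))
```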
Interpretation of Estimated Coefficients
• Slope (bi)
Estimates that the average value of y changes by bi units for each 1-unit increase in xi, holding all other variables constant.
Example: if b1 = -20, then sales (y) is expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x1), net of the effects of changes due to advertising (x2).
• y-intercept (b0)
The estimated average value of y when all xi = 0 (assuming all xi = 0 is within the range of observed values).
Multiple Regression Output

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error       47.46341
Observations         15

ANOVA        df    SS          MS          F         Significance F
Regression   2     29460.027   14730.013   6.53861   0.01201
Residual     12    27033.306   2252.776
Total        14    56493.333

             Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept    306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price        -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising  74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100's.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
Using The Model to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62
Note that Advertising is in $100's, so $350 means that x2 = 3.5.
Predicted sales is 428.62 pies.
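The prediction above can be reproduced with a short function. It hard-codes the rounded coefficients from the pie-sales output and converts advertising from dollars to the model's $100 units, which is the step the note about x2 = 3.5 describes. The function name `predict_sales` is hypothetical.

```python
# Rounded coefficients from the pie-sales regression output
B0, B_PRICE, B_ADV = 306.526, -24.975, 74.131

def predict_sales(price_dollars, advertising_dollars):
    """Predicted weekly pie sales.

    Advertising is entered in dollars and converted to the model's
    unit of $100 before applying the coefficient.
    """
    x2 = advertising_dollars / 100.0
    return B0 + B_PRICE * price_dollars + B_ADV * x2

print(round(predict_sales(5.50, 350), 2))  # 428.62
```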
Multiple Coefficient of Determination
Reports the proportion of total variation in y explained by all x variables taken together:
$$R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares regression}}{\text{total sum of squares}}$$
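The R² arithmetic can be checked directly from the ANOVA sums of squares of the pie-sales output. The sketch also confirms that SSR + SSE reproduces SST, the decomposition of total variation on which R² = SSR/SST rests.

```python
# Sums of squares from the ANOVA table of the pie-sales output
SSR = 29460.027  # regression
SSE = 27033.306  # residual (error)
SST = 56493.333  # total

r_squared = SSR / SST
print(f"R^2 = {r_squared:.5f}")        # R^2 = 0.52148
print(f"SSR + SSE = {SSR + SSE:.3f}")  # 56493.333, equal to SST
```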
Multiple Coefficient of Determination (continued)
Using SSR = 29460.0 and SST = 56493.3 from the ANOVA table in the regression output:
$$R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$$
52.1% of the variation in pie sales is explained by the variation in price and advertising.

Adjusted R²
R² never decreases when a new x variable is added to the model
This can be a disadvantage when comparing models
What is the net effect of adding a new variable?
We lose a degree of freedom when a new x variable is
added
Did the new x variable add enough explanatory power to
offset the loss of one degree of freedom?

Adjusted R² (continued)
Shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables used:
$$R_A^2 = 1 - (1 - R^2)\left(\frac{n - 1}{n - k - 1}\right)$$
(where n = sample size, k = number of independent variables)
Penalizes excessive use of unimportant independent variables
Smaller than R²
Useful in comparing among models
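A sketch of the adjusted R² formula applied to the pie-sales output. Computing R² from SSR/SST rather than from the printed 0.52148 avoids a rounding difference in the fifth decimal. The second call uses a hypothetical third predictor that raises R² only slightly, to show the penalty for an extra variable.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    if n - k - 1 < 1:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

r2 = 29460.027 / 56493.333  # R^2 = SSR / SST from the ANOVA table
print(f"{adjusted_r2(r2, n=15, k=2):.5f}")     # 0.44172

# Hypothetical: a third predictor that lifts R^2 only to 0.525
print(f"{adjusted_r2(0.525, n=15, k=3):.5f}")  # 0.39545, lower despite the higher R^2
```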
Multiple Coefficient of Determination (continued)
From the regression output (Adjusted R Square): R²_A = .44172
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
Is the Model Significant?
F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all of
the x variables considered together and y
Use F test statistic
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
HA: at least one βi ≠ 0 (at least one independent variable affects y)
F-Test for Overall Significance (continued)
Test statistic:
$$F = \frac{SSR/k}{SSE/(n - k - 1)} = \frac{MSR}{MSE}$$
where F has (numerator) D1 = k and (denominator) D2 = (n - k - 1) degrees of freedom
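For the pie-sales model the numerator has k = 2 degrees of freedom, and in that special case the F tail probability has a closed form, P(F > x) = (1 + 2x/D2)^(-D2/2), so the p-value can be computed with the standard library alone. The sketch assumes that closed form, which does not hold for other numerator degrees of freedom; in general an F table, or a library routine such as `scipy.stats.f.sf`, is needed. The inputs are the ANOVA sums of squares from the regression output.

```python
# ANOVA sums of squares and degrees of freedom (pie-sales output)
SSR, SSE = 29460.027, 27033.306
k, n = 2, 15
d1, d2 = k, n - k - 1  # 2 and 12

MSR = SSR / d1
MSE = SSE / d2
F = MSR / MSE

def f_sf_df1_is_2(x, denom_df):
    """P(F > x) for an F distribution whose numerator df is exactly 2.

    Closed form valid only for numerator df = 2: (1 + 2x/denom_df) ** (-denom_df/2).
    """
    return (1.0 + 2.0 * x / denom_df) ** (-denom_df / 2.0)

p = f_sf_df1_is_2(F, d2)
print(f"F = {F:.4f} with ({d1}, {d2}) df, p = {p:.5f}")  # p about 0.01201 (Significance F)
```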
F-Test for Overall Significance (continued)
From the ANOVA table of the regression output:
$$F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$$
with 2 and 12 degrees of freedom. The p-value for the F-test is the Significance F entry, 0.01201. Since 0.01201 < 0.05, H0 is rejected at α = 0.05: at least one independent variable (price or advertising) affects sales.
Are Individual Variables Significant? (continued)
H0: βi = 0 (no linear relationship)
HA: βi ≠ 0 (a linear relationship does exist between xi and y)
Test statistic:
$$t = \frac{b_i - 0}{s_{b_i}} \qquad (df = n - k - 1)$$
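The t statistics and the 95% confidence limits in the coefficients table can be reproduced from each estimate and its standard error. The critical value 2.178813 is the two-sided 95% point of the t distribution with n - k - 1 = 12 degrees of freedom, taken from a standard t table; it is an input to this sketch, not something the section states.

```python
# Two-sided 95% critical value of t with n - k - 1 = 12 df (standard t table)
T_CRIT = 2.178813

coefficients = {
    # name: (estimate b_i, standard error s_bi) from the pie-sales output
    "Intercept": (306.52619, 114.25389),
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}

results = {}
for name, (b, se) in coefficients.items():
    t = (b - 0.0) / se  # test statistic for H0: beta_i = 0
    lower, upper = b - T_CRIT * se, b + T_CRIT * se
    results[name] = (t, lower, upper)
    print(f"{name:<12} t = {t:8.4f}   95% CI = ({lower:.3f}, {upper:.3f})")
```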
Are Individual Variables Significant? (continued)
From the coefficients table of the regression output:
t-value for Price is t = -2.306, with p-value .0398
t-value for Advertising is t = 2.855, with p-value .0145
Both p-values are below 0.05, so at α = 0.05 each variable has a significant linear relationship with sales, given the other variable in the model.

Standard Deviation of the Regression Model
The estimate of the standard deviation of the regression model is:
$$s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{MSE}$$
Is this value large or small? To judge, compare it with the mean size of y.
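A one-line check of s = √MSE, using the residual sum of squares from the pie-sales ANOVA table.

```python
import math

SSE = 27033.306  # residual sum of squares (ANOVA table)
n, k = 15, 2     # observations and independent variables

MSE = SSE / (n - k - 1)
s = math.sqrt(MSE)
print(f"s = {s:.5f}")  # compare with Standard Error = 47.46341 in the output
```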
Standard Deviation of the Regression Model
The standard deviation of the regression model is 47.46 (the Standard Error entry in the Regression Statistics).

Multicollinearity
Multicollinearity: a high correlation exists between two independent variables.
This means the two variables contribute redundant information to the multiple regression model.
Multicollinearity
(continued)
Including two highly correlated independent variables can adversely affect the regression results:
• No new information provided
• Can lead to unstable coefficients (large standard error and low t-values)
• Coefficient signs may not match prior expectations
Detect Collinearity
(Variance Inflationary Factor)
VIFj is used to measure collinearity:
$$VIF_j = \frac{1}{1 - R_j^2}$$
R²j is the coefficient of determination when the jth independent variable is regressed against the remaining k - 1 independent variables.
If VIFj > 5, xj is highly correlated with the other explanatory variables.
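A NumPy sketch of VIF as defined above: each column is regressed, with an intercept, on the remaining columns, and R_j² is taken from that auxiliary regression. The helper name `vif` and the synthetic data are hypothetical. With only two predictors, R_j² is simply the squared correlation between them; for instance, using Citizenship and Participation (r = .969 in the correlation matrix) together as predictors would give VIF ≈ 16.4, well above the threshold of 5.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    (with an intercept) on the other k - 1 columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        centered = target - target.mean()
        r2_j = 1.0 - (resid @ resid) / (centered @ centered)
        # r2_j == 1 (perfect collinearity) would make the VIF infinite
        factors[j] = 1.0 / (1.0 - r2_j)
    return factors

# Synthetic predictors (hypothetical): x2 is nearly a copy of x1, x3 is unrelated
rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = x1 + 0.2 * rng.normal(size=100)
x3 = rng.normal(size=100)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # x1, x2 far above 5; x3 near 1

# Two predictors only: R_j^2 = r^2, e.g. r = .969 from the correlation matrix
print(round(1 / (1 - 0.969 ** 2), 1))  # 16.4
```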
