Analysis of Relationships Between Variables

CORRELATION AND REGRESSION
1. Scatter Plot
A scatter plot is a graph commonly used to examine the pattern of the relationship between two variables. Both variables must be measured on an interval or ratio scale.
A scatter plot gives only a visual impression of the relationship; it does not quantify its direction or strength.
Examples of scatter plots
[Figure: example scatter plots. Income vs. car ownership: the points cluster together and the pattern tends toward a positive relationship. Widely scattered points: weak or no relationship. A plot containing an outlier.]
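2. Covariance
Covariance is a statistic that describes the linear relationship between two variables.
Its sign shows the direction of the relationship: negative, positive, or zero (no linear relationship). Unlike the correlation coefficient, it is not confined to the range -1 to 1, because its magnitude depends on the units of the two variables.
For that reason it cannot show the strength of the relationship.
Covariance formula (population form, with N observations):
\[ \operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N} \]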
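The covariance definition can be sketched in a few lines of Python. This is a minimal illustration, assuming the population form (division by N); the function name and the income and car-ownership figures are hypothetical, chosen only to show that the sign carries the direction while the magnitude changes with the units.

```python
def population_covariance(xs, ys):
    """Population covariance: mean of the products of deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and equally long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data (not from the slides): income and number of cars owned.
income = [2.0, 3.5, 5.0, 6.5, 8.0]
cars = [0, 1, 1, 2, 3]

cov_original = population_covariance(income, cars)
# Same incomes expressed in units 1000 times smaller.
cov_rescaled = population_covariance([x * 1000 for x in income], cars)

print(cov_original)   # positive: the two variables move in the same direction
print(cov_rescaled)   # 1000 times larger, although the relationship is unchanged
```

The rescaled result is why covariance alone says nothing about strength: the same data yield a different magnitude under a different unit.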
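3. Correlation
• An analytical tool used to determine the direction and strength of the relationship between two variables
• Takes values from -1 to +1
Simple (Pearson) correlation formula:
\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} \]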
A value close to 1 or -1 means the relationship between the two variables is strong; a value close to 0 means the relationship is weak.
A positive value indicates a relationship in the same direction (as X increases, Y increases); a negative value indicates an inverse relationship (as X increases, Y decreases).
Sugiyono (2007) gives the following guideline for interpreting the size of the correlation coefficient:
0.00 - 0.199 = very low
0.20 - 0.399 = low
0.40 - 0.599 = moderate
0.60 - 0.799 = strong
0.80 - 1.000 = very strong
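As a rough sketch of how the coefficient and the Sugiyono (2007) guideline fit together, the code below computes Pearson's r and maps its absolute value to the labels listed above. The function names and the sample data are hypothetical, and treating each band's upper limit as "less than the next band's lower limit" is an assumption, since the table leaves small gaps between bands.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation from deviations around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def interpret_strength(r):
    """Map |r| to the guideline labels (band edges are an assumption)."""
    size = abs(r)
    if size < 0.20:
        return "very low"
    if size < 0.40:
        return "low"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
print(round(r, 4), direction, interpret_strength(r))
```

The label depends only on the absolute value, so r = -0.95 and r = 0.95 are both "very strong"; the sign is reported separately as the direction.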
CORRELATION MATRIX
Correlations
                                      Citizenship   Participation   Democracy
Citizenship     Pearson Correlation   1             .969**          .968**
                Sig. (2-tailed)                     .007            .007
                N                     5             5               5
Participation   Pearson Correlation   .969**        1               .977**
                Sig. (2-tailed)       .007                          .004
                N                     5             5               5
Democracy       Pearson Correlation   .968**        .977**          1
                Sig. (2-tailed)       .007          .004
                N                     5             5               5
**. Correlation is significant at the 0.01 level (2-tailed).
What is the relationship between a person's knowledge of citizenship and their democratic behavior, with political participation as the control variable? Because all three variables are quantitative, the type of correlation analysis used is Pearson.
Multiple Correlation
\[ R_{y.x_1x_2} = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\, r_{yx_1} r_{yx_2} r_{x_1x_2}}{1 - r_{x_1x_2}^2}} \]
A value that expresses the strength of the relationship between two or more variables, taken together, and another variable.
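The multiple correlation formula can be evaluated directly from three pairwise correlations. The sketch below is illustrative only: it borrows the coefficients from the correlation matrix above and assumes, purely for the sake of the example, that Democracy is y, Citizenship is x1 and Participation is x2. The function name is hypothetical.

```python
import math


def multiple_correlation(r_yx1, r_yx2, r_x1x2):
    """Multiple correlation of y with x1 and x2 from pairwise correlations."""
    if abs(r_x1x2) >= 1:
        raise ValueError("x1 and x2 are perfectly correlated; R is undefined")
    r_squared = (
        r_yx1 ** 2 + r_yx2 ** 2 - 2 * r_yx1 * r_yx2 * r_x1x2
    ) / (1 - r_x1x2 ** 2)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(0.0, r_squared))


# Assumed roles: y = Democracy, x1 = Citizenship, x2 = Participation.
R = multiple_correlation(r_yx1=0.968, r_yx2=0.977, r_x1x2=0.969)
print(round(R, 3))
```

Because the two predictors are themselves highly correlated (.969), the joint coefficient is only slightly larger than the bigger of the two individual correlations.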
Significance Test
\[ F_{calc} = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \]
Where:
R = multiple correlation coefficient
k = number of independent variables
n = sample size
F = the calculated F, which is then compared with the F-table value
Decision rule:
If F calc > F table: significant
If F calc < F table: not significant
Look up the F-table value in an F table at significance level α = 0.01 or α = 0.05, with k numerator and n - k - 1 denominator degrees of freedom.
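The decision rule can be traced with a short calculation. This sketch assumes R = 0.72213, k = 2 and n = 15, the values reported in the pie-sales regression output in this material. The two critical values are hard-coded from a standard F table for (2, 12) degrees of freedom rather than computed, and the function name is hypothetical.

```python
def f_from_multiple_r(r, k, n):
    """F statistic for testing a multiple correlation coefficient."""
    df_denominator = n - k - 1
    r2 = r * r
    if k < 1 or df_denominator < 1 or r2 >= 1:
        raise ValueError("need k >= 1, n > k + 1 and |R| < 1")
    return (r2 / k) / ((1 - r2) / df_denominator)


# Assumption: table values for numerator df = 2, denominator df = 12.
F_TABLE = {0.05: 3.89, 0.01: 6.93}

f_calc = f_from_multiple_r(0.72213, k=2, n=15)
for alpha, f_table in F_TABLE.items():
    verdict = "significant" if f_calc > f_table else "not significant"
    print(f"alpha={alpha}: F calc={f_calc:.4f}, F table={f_table} -> {verdict}")
```

With these inputs the result is significant at α = 0.05 but not at α = 0.01, which shows why the significance level has to be fixed before the comparison.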
4. REGRESSION
Simple Regression
Ŷ = a + bX
Where:
Ŷ = response (dependent variable)
a = constant (intercept)
b = regression coefficient of the independent variable
X = predictor (independent variable)
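The slides do not show how a and b are obtained, so the following is a minimal sketch assuming the usual least-squares estimates: b is the sum of cross-deviations divided by the sum of squared deviations of X, and a is the mean of Y minus b times the mean of X. The function name and the data are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares constant a and coefficient b for Y-hat = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X has no variation; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_simple_regression(x, y)
y_hat_at_6 = a + b * 6   # predicted response for X = 6
print(a, b, y_hat_at_6)
```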
The Multiple Regression Model
Idea: examine the linear relationship between 1 dependent variable (y) and 2 or more independent variables (x_i).
Population model:
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon \]
where β0 is the y-intercept, β1, ..., βk are the population slopes, and ε is the random error.
Estimated multiple regression model:
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k \]
where ŷ is the estimated (or predicted) value of y, b0 is the estimated intercept, and b1, ..., bk are the estimated slope coefficients.
Multiple Regression Model
Two variable model:
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \]
[Figure: the fitted regression plane over the x1 and x2 axes, with y on the vertical axis. Labels: slope for variable x1, slope for variable x2.]
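Multiple Regression Model (continued)
Two variable model:
\[ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \]
[Figure: a sample observation y_i at (x_1i, x_2i), its fitted value ŷ_i on the regression plane, and the residual e = (y - ŷ) between them.]
The best fit equation, ŷ, is found by minimizing the sum of squared errors, Σe².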
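To make "minimizing the sum of squared errors" concrete, here is a small sketch for the two variable model. It assumes the standard closed-form solution of the normal equations written in terms of deviations from the means; the function names and the data are hypothetical and not taken from the slides.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y-hat = b0 + b1*x1 + b2*x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def sum_squared_errors(coefs, x1, x2, y):
    """Sum of squared residuals e = y - y-hat for the given coefficients."""
    b0, b1, b2 = coefs
    return sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))


# Hypothetical data, for illustration only.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [14.5, 8.5, 16.2, 10.8, 18.3, 12.7]

fitted = fit_two_predictors(x1, x2, y)
best = sum_squared_errors(fitted, x1, x2, y)
# Moving any coefficient away from the fitted value increases the error sum.
nudged = sum_squared_errors((fitted[0], fitted[1] + 0.1, fitted[2]), x1, x2, y)
print(fitted, best, nudged)
```

The closed form only covers two predictors; with more independent variables, statistical software solves the same minimization with matrix methods.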
Interpretation of Estimated Coefficients
• Slope (b_i)
Estimates that the average value of y changes by b_i units for each 1 unit increase in x_i, holding all other variables constant.
Example: if b1 = -20, then sales (y) is expected to decrease by an estimated 20 pies per week for each $1 increase in selling price (x1), net of the effects of changes due to advertising (x2).
• y-intercept (b0)
The estimated average value of y when all x_i = 0 (assuming all x_i = 0 is within the range of observed values).
Multiple Regression Output
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA        df   SS          MS          F         Significance F
Regression   2    29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888
The Multiple Regression Equation
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
where
Sales is in number of pies per week
Price is in $
Advertising is in $100's.
b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
Using The Model to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Sales = 306.526 - 24.975(Price) + 74.131(Advertising)
      = 306.526 - 24.975(5.50) + 74.131(3.5)
      = 428.62
Predicted sales is 428.62 pies.
Note that Advertising is in $100's, so $350 means that x2 = 3.5.
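The same prediction can be reproduced in code. This is a small sketch using the rounded coefficients from the estimated equation; the function name and the choice to accept advertising in dollars (and convert it to hundreds inside the function) are illustrative assumptions that make the unit step explicit.

```python
def predict_sales(price, advertising_dollars):
    """Predicted pies per week from the estimated pie-sales equation."""
    b0, b1, b2 = 306.526, -24.975, 74.131
    # The model expects advertising in hundreds of dollars.
    advertising_hundreds = advertising_dollars / 100
    return b0 + b1 * price + b2 * advertising_hundreds


predicted = predict_sales(price=5.50, advertising_dollars=350)
print(round(predicted, 2))  # 428.62
```

Passing 350 straight into the equation as x2 would inflate the prediction enormously, which is the unit mistake the note on the slide warns against.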
Multiple Coefficient of Determination
Reports the proportion of total variation in y explained by all x variables taken together:
\[ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares regression}}{\text{total sum of squares}} \]
Multiple Coefficient of Determination (continued)
\[ R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = 0.52148 \]
52.1% of the variation in pie sales is explained by the variation in price and advertising.
Adjusted R²
R² never decreases when a new x variable is added to the model.
This can be a disadvantage when comparing models.
What is the net effect of adding a new variable?
We lose a degree of freedom when a new x variable is added.
Did the new x variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted R² (continued)
Shows the proportion of variation in y explained by all x variables, adjusted for the number of x variables used:
\[ R_A^2 = 1 - (1 - R^2) \left( \frac{n - 1}{n - k - 1} \right) \]
(where n = sample size, k = number of independent variables)
Penalizes excessive use of unimportant independent variables
Smaller than R²
Useful in comparing among models
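A quick calculation shows the penalty at work. The first call uses R² = 0.52148, n = 15 and k = 2 from the pie-sales output. The second call is a hypothetical scenario, not from the slides, in which a third variable lifts R² only to 0.525. The function name is illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for sample size n and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Pie-sales model: two predictors.
adj_two = adjusted_r_squared(0.52148, n=15, k=2)

# Hypothetical: a third, nearly useless predictor raises R-squared slightly.
adj_three = adjusted_r_squared(0.525, n=15, k=3)

print(round(adj_two, 5), round(adj_three, 5))
```

Although R² rises from 0.52148 to 0.525 in the hypothetical case, the adjusted value falls, so the extra variable did not add enough explanatory power to offset the lost degree of freedom.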
Multiple Coefficient of Determination (continued)
R²_A = 0.44172
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
Is the Model Significant?
F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the x variables considered together and y
Use F test statistic
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
HA: at least one βi ≠ 0 (at least one independent variable affects y)
F-Test for Overall Significance (continued)
Test statistic:
\[ F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{MSR}{MSE} \]
where F has (numerator) D1 = k and (denominator) D2 = (n - k - 1) degrees of freedom
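The test statistic can be rebuilt from the sums of squares in the ANOVA table of the pie-sales output (SSR = 29460.027, SSE = 27033.306, n = 15, k = 2). The sketch below is only an arithmetic check; the function name is hypothetical, and the p-value is not computed because that would need an F distribution routine from a statistics library.

```python
def overall_f_test(ssr, sse, n, k):
    """Return (MSR, MSE, F, df1, df2) for the overall F test."""
    df1 = k
    df2 = n - k - 1
    msr = ssr / df1
    mse = sse / df2
    return msr, mse, msr / mse, df1, df2


msr, mse, f_stat, df1, df2 = overall_f_test(
    ssr=29460.027, sse=27033.306, n=15, k=2
)
print(f"MSR={msr:.3f}  MSE={mse:.3f}  F={f_stat:.5f}  df=({df1}, {df2})")
```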
F-Test for Overall Significance (continued)
\[ F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386 \]
With 2 and 12 degrees of freedom, the p-value for the F-Test is 0.01201 (Significance F in the regression output).
Are Individual Variables Significant? (continued)
H0: βi = 0 (no linear relationship)
HA: βi ≠ 0 (linear relationship does exist between xi and y)
Test statistic:
\[ t = \frac{b_i - 0}{s_{b_i}} \qquad (df = n - k - 1) \]
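As an illustration, the t statistics can be recomputed from the coefficients and standard errors in the pie-sales output. The critical value 2.1788 is an assumption taken from a standard t table (two-tailed, α = 0.05, df = 15 - 2 - 1 = 12) rather than computed, and the names are hypothetical.

```python
def t_statistic(b, standard_error):
    """t statistic for H0: beta = 0."""
    return (b - 0) / standard_error


# (coefficient, standard error) pairs from the regression output.
coefficients = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}

# Assumption: two-tailed 5% critical value for 12 degrees of freedom.
T_CRITICAL = 2.1788

results = {}
for name, (b, se) in coefficients.items():
    t = t_statistic(b, se)
    results[name] = (t, abs(t) > T_CRITICAL)
    print(f"{name}: t = {t:.5f}, reject H0 at 5%: {abs(t) > T_CRITICAL}")
```

The comparison uses the absolute value of t because the alternative hypothesis is two-sided.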
Are Individual Variables Significant? (continued)
t-value for Price is t = -2.306, with p-value .0398
t-value for Advertising is t = 2.855, with p-value .0145
Standard Deviation of the Regression Model
The estimate of the standard deviation of the regression model is:
\[ s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} = \sqrt{MSE} \]
Is this value large or small? It must be compared to the mean size of y.
Standard Deviation of the Regression Model (continued)
The standard deviation of the regression model is 47.46 (Standard Error in the regression output).
Multicollinearity
Multicollinearity: high correlation exists between two independent variables.
This means the two variables contribute redundant information to the multiple regression model.
Multicollinearity (continued)
Including two highly correlated independent variables can adversely affect the regression results:
• No new information provided
• Can lead to unstable coefficients (large standard error and low t-values)
• Coefficient signs may not match prior expectations
Detect Collinearity (Variance Inflationary Factor)
VIF_j is used to measure collinearity:
\[ VIF_j = \frac{1}{1 - R_j^2} \]
R_j² is the coefficient of determination when the jth independent variable is regressed against the remaining k - 1 independent variables.
If VIF_j > 5, x_j is highly correlated with the other explanatory variables.
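A minimal sketch of the VIF rule follows, restricted to the two-predictor case. With only two independent variables, regressing one on the other gives an R_j² equal to their squared pairwise correlation, so the VIF can be obtained from that single correlation. The function names and the example correlations are hypothetical.

```python
def vif_from_r_squared(r2_j):
    """Variance inflationary factor from the auxiliary R-squared."""
    if not 0 <= r2_j < 1:
        raise ValueError("R_j squared must be in [0, 1)")
    return 1 / (1 - r2_j)


def vif_two_predictors(r_x1x2):
    """VIF when the model has exactly two independent variables."""
    return vif_from_r_squared(r_x1x2 ** 2)


# Hypothetical correlations between the two predictors.
for r in (0.0, 0.5, 0.8, 0.9):
    vif = vif_two_predictors(r)
    flag = "highly correlated" if vif > 5 else "acceptable"
    print(f"r = {r}: VIF = {vif:.3f} ({flag})")
```

Under the VIF > 5 rule, a pairwise correlation of 0.9 is flagged while 0.8 is not. With more than two predictors, R_j² has to come from an actual auxiliary regression of x_j on the others.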
