Learning Objectives
Explain the purpose of regression and correlation analysis.
Calculate and interpret the regression equation and the standard error of the estimate for simple linear regression.
Use the results of the analysis to estimate intervals for the dependent variable.
Calculate and explain the meaning of the coefficients of correlation and determination.
Dependent variable: denoted Y
Independent variables: denoted X1, X2, ..., Xk
If we have only ONE independent variable, the model is
y = a + bx
Variables: X = Independent Variable (we provide this); Y = Dependent Variable (we observe this)
Parameters: a = Y-intercept; b = Slope
[Figures: House Price, Lower vs. Higher Variability; scatter plots of Height against Weight]
Deterministic Model: an equation or set of equations that allows us to fully determine the value of the dependent variable from the values of the independent variables.
y = $25,000 + ($75/ft²)(x)
Area of a circle: A = πr²
Probabilistic Model: a method used to capture the randomness that is part of a real-life process.
y = 25,000 + 75x + ε
E.g., do all houses of the same size (measured in square feet) sell for exactly the same price?
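The difference between the two models can be sketched in a few lines of Python. This is only an illustration: the normal distribution for the error term and its standard deviation (`noise_sd = 5_000`) are assumptions chosen for the example, not values from the slides.

```python
import random


def deterministic_price(size_sqft):
    """Deterministic model: price is fully determined by size."""
    return 25_000 + 75 * size_sqft


def probabilistic_price(size_sqft, rng, noise_sd=5_000):
    """Probabilistic model: same line plus a random error term.

    noise_sd is a hypothetical value used only for illustration.
    """
    return deterministic_price(size_sqft) + rng.gauss(0, noise_sd)


rng = random.Random(0)  # fixed seed so the run is repeatable
size = 2_000            # square feet

det = deterministic_price(size)
sims = [probabilistic_price(size, rng) for _ in range(5)]

print("Deterministic price:", det)
print("Simulated prices for five houses of the same size:",
      [round(p) for p in sims])
```

Every house of 2,000 square feet gets the same price under the deterministic model, while the simulated prices scatter around that value.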
ŷ = a + bx, where a is the y-intercept and b is the slope
(This is an application of the least squares method: it produces the straight line that minimizes the sum of the squared differences, or errors, between the points and the line.)
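For reference, the quantity being minimized and the resulting estimates can be written out as follows. These are the standard least squares formulas for one independent variable; the symbols SSE, x̄ and ȳ (sample means) are introduced here for illustration.

```latex
\begin{aligned}
\text{SSE} &= \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2,
  \qquad \hat{y}_i = a + b\,x_i \\[4pt]
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
          {\sum_{i=1}^{n} (x_i - \bar{x})^2} \\[4pt]
a &= \bar{y} - b\,\bar{x}
\end{aligned}
```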
Least Squares Line
See if you can estimate the Y-intercept and slope from this data.
Data Points:
x:  1   2   3   4   5   6
y:  6   1   9   5   17  12
Least squares line: ŷ = 0.933 + 2.114x
X      Y      Y - Ybar    (X - Xbar)*(Y - Ybar)    (X - Xbar)^2
1      6      -2.333       5.833                    6.250
2      1      -7.333      11.000                    2.250
3      9       0.667      -0.333                    0.250
4      5      -3.333      -1.667                    0.250
5      17      8.667      13.000                    2.250
6      12      3.667       9.167                    6.250
Total  50      0.000      37.000                   17.500
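The column totals above are all that is needed for the slope and intercept. A minimal sketch using only the Python standard library (the variable names are chosen here for illustration):

```python
# Six data points from the example
xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Column totals from the worked table
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)

# Least squares estimates
b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # y-intercept

print(f"Sum (X-Xbar)(Y-Ybar) = {s_xy:.3f}")
print(f"Sum (X-Xbar)^2       = {s_xx:.3f}")
print(f"slope b = {b:.3f}, intercept a = {a:.3f}")
```

The slope is the ratio of the two totals (37.000 / 17.500), and the intercept follows from the two sample means.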
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.7007
R Square             0.4910
Adjusted R Square    0.3637
Standard Error       4.5029
Observations         6

R Square is the proportion of the variation in the variable Y that can be explained by your regression model. The remaining regression statistics will be used later.

ANOVA
             df    SS             MS             F              Significance F
Regression   1     78.22857143    78.22857143    3.858149366    0.120968388
Residual     4     81.1047619     20.27619048
Total        5     159.3333333

Significance F is the same as the p-value for the test of H0: Regression Model is "NO Good".

               Coefficients    Standard Error    t Stat         P-value
Intercept      0.933333333     4.19198025        0.222647359    0.834716871
X Variable 1   2.114285714     1.076401159       1.96421724     0.120968388

The P-value for X Variable 1 tests H0: β1 = 0.
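Most of the numbers in this output can be reproduced by hand. The sketch below recomputes them for the six-point data set with the standard library only; the p-values are left out because they need a t-distribution routine that the standard library does not provide. All variable names are chosen here for illustration.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = s_xy / s_xx          # slope ("X Variable 1" coefficient)
a = y_bar - b * x_bar    # intercept

fitted = [a + b * x for x in xs]

# ANOVA sums of squares
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # Residual SS
sst = sum((y - y_bar) ** 2 for y in ys)               # Total SS
ssr = sst - sse                                       # Regression SS

r_square = ssr / sst
mse = sse / (n - 2)              # Residual MS
std_error = math.sqrt(mse)       # "Standard Error" in Regression Statistics
f_stat = (ssr / 1) / mse         # Regression MS / Residual MS

# Standard errors and t statistics of the coefficients
se_b = std_error / math.sqrt(s_xx)
se_a = std_error * math.sqrt(1 / n + x_bar ** 2 / s_xx)
t_b = b / se_b
t_a = a / se_a

print(f"R Square = {r_square:.4f}, Standard Error = {std_error:.4f}")
print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, SST = {sst:.4f}, F = {f_stat:.4f}")
print(f"Intercept: {a:.4f} (se {se_a:.4f}, t {t_a:.4f})")
print(f"Slope:     {b:.4f} (se {se_b:.4f}, t {t_b:.4f})")
```

With one independent variable, the F statistic equals the square of the slope's t statistic, which is why Significance F and the slope's P-value coincide in the output.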
Excel: Plotted Regression Model
You will need to adjust the chart settings to get the plot to look good.
[Figure: plot of Y and Predicted Y]
Standard Error
If the standard error of estimate (s_ε) is small, the fit is excellent and the linear model should be used for forecasting. If s_ε is large, the model is poor. But what is small and what is large?
Judge the value of s_ε by comparing it to the sample mean of the dependent variable (ȳ). In this example, s_ε = .3265 and ȳ = 14.841, so (relatively speaking) s_ε appears to be small; hence our linear regression model of car price as a function of odometer reading is good.
If no linear relationship exists between the two variables, we would expect the regression line to be horizontal, that is, to have a slope of zero. We want to see if there is a linear relationship, i.e. we want to see if the slope (β1) is something other than zero. Our research hypothesis becomes: H1: β1 ≠ 0. Thus the null hypothesis becomes: H0: β1 = 0.
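The usual test statistic for this hypothesis is the t ratio shown below. The symbol s_b for the standard error of the slope is introduced here for illustration. The numbers plugged in are taken from the Excel output for the six-point example (not the car data), purely to show the arithmetic.

```latex
t = \frac{b - \beta_1}{s_b},
\qquad
s_b = \frac{s_\varepsilon}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
\text{degrees of freedom} = n - 2

% Six-point example, under H_0 : \beta_1 = 0
t = \frac{2.1143 - 0}{1.0764} \approx 1.964,
\qquad n - 2 = 4
```

In that example the reported P-value is 0.121, so at a 5% significance level H0 would not be rejected.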
Coefficient of Determination
Tests thus far have shown whether a linear relationship exists; it is also useful to measure the strength of the relationship. This is done by calculating the coefficient of determination, R². The coefficient of determination is the square of the coefficient of correlation (r), hence R² = (r)². (r will be computed shortly; this equality holds for models with only one independent variable.)
R² has a value of .6483. This means 64.83% of the variation in the auction selling prices (y) is explained by your regression model. The remaining 35.17% is unexplained, i.e. due to error. Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions. In general, the higher the value of R², the better the model fits the data.
R² = 1: Perfect match between the line and the data points.
R² = 0: There is no linear relationship between x and y.
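The car auction data is not reproduced in these notes, so the sketch below uses the six-point data set from the least squares example to show that squaring r gives the same R² as the "explained variation" definition. Variable names are chosen here for illustration.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)   # total variation in y
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Coefficient of correlation
r = s_xy / math.sqrt(s_xx * s_yy)

# Coefficient of determination from the fitted line: 1 - SSE / SST
b = s_xy / s_xx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / s_yy

print(f"r = {r:.4f}")
print(f"r squared = {r ** 2:.4f}")
print(f"1 - SSE/SST = {r_squared:.4f}")
```

The value of r matches "Multiple R" and its square matches "R Square" in the Excel output for this data set.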
MSE = SSE/(n − 2)
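As a worked instance, the Excel output for the six-point example gives SSE = 81.1048 with n = 6. The standard error of estimate is the square root of MSE, which reproduces the "Standard Error" line of that output.

```latex
\text{MSE} = \frac{\text{SSE}}{n - 2} = \frac{81.1048}{6 - 2} \approx 20.2762,
\qquad
s_\varepsilon = \sqrt{\text{MSE}} \approx 4.5029
```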
We call this value ($14,574) a point prediction (estimate). Chances are, though, that the actual selling price will be different, so we can instead estimate the selling price in terms of an interval.
Prediction Interval
The prediction interval is used when we want to predict one particular value of the dependent variable, given a specific value of the independent variable:
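The standard textbook form of the prediction interval for simple linear regression is shown below. The symbol x_g for the given value of the independent variable is an assumption made here for illustration; s_ε is the standard error of estimate and t has n − 2 degrees of freedom.

```latex
\hat{y} \;\pm\; t_{\alpha/2,\,n-2}\; s_\varepsilon
\sqrt{\,1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,}
```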
(Technically this formula is used for infinitely large populations. However, we can interpret our problem as attempting to determine the average selling price of all Ford Tauruses, all with 40,000 miles on the odometer)
The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value.
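A small numerical comparison makes the difference in width concrete. The car data is not reproduced in these notes, so this sketch uses the six-point data set from the least squares example and the standard textbook interval formulas. The critical value 2.776 (t with 4 degrees of freedom, 95% confidence) is hard-coded from a t table, and the function name `half_widths` is hypothetical.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = s_xy / s_xx
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
s_eps = math.sqrt(sse / (n - 2))   # standard error of estimate

T_CRIT = 2.776  # assumed: t table value for alpha/2 = 0.025, df = 4


def half_widths(x_g):
    """Return (confidence, prediction) interval half-widths at x_g."""
    core = 1 / n + (x_g - x_bar) ** 2 / s_xx
    ci = T_CRIT * s_eps * math.sqrt(core)        # expected value of y
    pi = T_CRIT * s_eps * math.sqrt(1 + core)    # one particular y
    return ci, pi


for x_g in (1, 3.5, 6):
    ci, pi = half_widths(x_g)
    print(f"x = {x_g}: point estimate {a + b * x_g:.2f}, "
          f"CI half-width {ci:.2f}, PI half-width {pi:.2f}")
```

The only difference between the two formulas is the extra 1 under the square root, which accounts for the variability of an individual observation around its mean. Both intervals are narrowest at x̄ and widen as x_g moves away from it.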
Regression Diagnostics
Three conditions are required in order to perform a regression analysis: the error variable must be normally distributed, the error variable must have a constant variance, and the errors must be independent of each other.
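All three conditions are usually examined through the residuals (observed y minus predicted y). The sketch below only computes the residuals and a simple standardized version for the six-point example. Dividing by s_ε is one simple way to standardize and is an assumption made here; plotting and formal tests are left out.

```python
import math

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 1, 9, 5, 17, 12]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed y minus predicted y
predicted = [a + b * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

# Simple standardization: divide each residual by s_eps
s_eps = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
standardized = [e / s_eps for e in residuals]

for x, p, e, z in zip(xs, predicted, residuals, standardized):
    print(f"x = {x}: predicted {p:7.3f}, residual {e:7.3f}, "
          f"standardized {z:6.3f}")
```

A histogram of the residuals speaks to normality, a plot of residuals against predicted values speaks to constant variance, and a plot of residuals in collection order speaks to independence.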