
578 Assignment-5 (Chs. 13 and 14) - Solutions
Due by midnight of Sunday, December 2nd, 2012 (drop box 4): 70 points

True/False (One point each)

Chapter 13

1. The standard error of the estimate (standard error) is the estimated standard deviation of the distribution of the independent variable (X).
FALSE: it is the estimate of the standard deviation of the error term.

2. In a simple linear regression model, the coefficient of determination only indicates the strength of the relationship between the independent and dependent variables, but does not show whether the relationship is positive or negative.
TRUE: R² is always greater than or equal to 0, so it carries no sign.

3. When using simple regression analysis, if there is a strong correlation between the independent and dependent variable, then we can conclude that an increase in the value of the independent variable causes an increase in the value of the dependent variable.
FALSE: the strong correlation could be negative, and correlation alone does not establish causation.

4. The error term is the difference between an individual value of the dependent variable and the corresponding mean value of the dependent variable.
FALSE: it is the difference between an individual value of the dependent variable and the corresponding predicted value (not the mean value). Residual and error term are treated as the same thing here.

5. In bivariate regression the Coefficient of Determination is always equal to the square of the correlation coefficient.
TRUE

6. In Regression Analysis if the variance of the error term is constant, we call it the Heteroscedasticity property.
FALSE: a constant error variance is the Homoscedasticity property (Instruction page 10-11).

Chapter 14

7. When the F test is used to test the overall significance of a multiple regression model, if the null hypothesis is rejected, it can be concluded that all of the independent variables X1, X2, ..., Xk are significantly related to the dependent variable Y.
FALSE: we can conclude only that at least one (not all) of them is.

8. An application of the multiple regression model generated the following results involving the F test of the overall regression model: p-value = .0012, R² = .67 and s = .076. Thus, the null hypothesis, which states that none of the independent variables are significantly related to the dependent variable, should be rejected even at the .01 level of significance.
TRUE: the p-value is less than 0.01.

9. High Multicollinearity problem occurs when the Independent variables are highly correlated with the Dependent variable.
FALSE: it occurs when there is a high linear relation among the Independent variables themselves.

10. The assumption of independent error terms in regression analysis is often violated when using time series data and is called the problem of Autocorrelation.
TRUE (see Instructions)

11. Homoscedasticity problem occurs when the assumption of constant error variance is violated.
FALSE: this problem is called Heteroscedasticity and frequently occurs in cross-sectional data.

Multiple Choices (Two points each)

Chapter 13

1. All of the following are assumptions of the error terms in the simple linear regression model except:
A. Errors are normally distributed
B. Error terms have a mean of zero
C. Error terms have a constant variance
D. Error terms depend on the explanatory variable
Answer: D (Instruction page 10-11, Book page 530)

2. The point estimate of the variance in a regression model is
A. SSE
B. MSE
C. se
D. b1
Answer: B

3. The least squares regression line minimizes the
A. Sum of Differences between actual and predicted Y values
B. Sum of Squared differences between actual and predicted X values
C. Sum of Absolute deviations between actual and predicted X values
D. Sum of Absolute deviations between actual and predicted Y values
E. Sum of Squared differences between actual and predicted Y values
Answer: E

4. The ___________ the R² and the __________ the s (standard error), the stronger the relationship between the dependent variable and the independent variable.
A. Higher, lower
B. Lower, higher
C. Lower, lower
D. Higher, higher
Answer: A

5. In simple bivariate regression analysis, if the correlation coefficient is a positive value, then
A. The Y intercept must also be a positive value.
B. The coefficient of determination can be either positive or negative, depending on the value of the slope.
C. The least squares regression equation could either have a positive or a negative slope.
D. The standard error of estimate can either have a positive or a negative value.
E. The slope of the regression line must also be positive.
Answer: E (the slope coefficient and the correlation coefficient have the same sign in bivariate regression; this is also obvious from the interpretation of the slope in the Instruction. Note that the relation could be weak or strong: a positive sign only shows the direction, not the magnitude.)

6. A researcher wants to explore the relationship between the grades students receive on their Midterm test and their Final test score. The following data present the Midterm and Final scores for ten students. What is the correlation coefficient?

Mid   180  195  210  225  240  255  255  264  265  290
Fin   280  280  300  316  320  350  370  320  400  350

A. 0.556
B. 0.645
C. 0.738
D. 0.802
E. 0.905
Answer: D. The MegaStat result is given below:

Correlation Matrix
         Mid     Fin
Mid    1.000
Fin     .802   1.000

10 sample size
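The MegaStat figure can be cross-checked by hand. The following is a minimal sketch, not the course's prescribed method; the function name `pearson_r` is introduced here purely for illustration. It applies r = SSxy / √(SSxx × SSyy) to the ten Midterm and Final scores listed in the question.

```python
from math import sqrt

# Midterm and Final scores for the ten students, as listed in question 6.
mid = [180, 195, 210, 225, 240, 255, 255, 264, 265, 290]
fin = [280, 280, 300, 316, 320, 350, 370, 320, 400, 350]


def pearson_r(x, y):
    """Correlation coefficient r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(mid, fin)
print(f"r = {r:.3f}")
```

Rounded to three decimals, the result agrees with option D.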

Chapter 14

7. Which is not an assumption of a multiple regression model?
A. Positive autocorrelation of error terms
B. Normality of error terms
C. Independence of error terms
D. Constant variation of error terms
E. Independence of error terms with X variables
Answer: A (see Instructions)

8. A multiple regression analysis with 22 observations on each of four independent variables and the dependent variable would yield ______ and _______ degrees of freedom respectively for regression (explained) and error.
A. 3, 17
B. 4, 20
C. 4, 18
D. 3, 20
E. 4, 17
Answer: E. df for regression = k = 4 and df for error = n-k-1 = 22-4-1 = 17

9. Consider the following partial computer output for a multiple regression model.

What is R²?
A. 31.308%
B. 76.95%
C. 77.72%
D. 72.63%
E. 23.1%
Answer: B. R² = SSR/SST = 1 - SSE/SST, where the denominator is SST.
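Because the computer output itself is not reproduced, the sketch below rests on an assumption: that questions 9 and 10 refer to the same output, so the SSE = 9.378 and SST = 40.686 quoted in the adjusted R² solution for question 10 apply here too. The values n = 20 and k = 3 are inferred from the denominators n-1 = 19 and n-k-1 = 16 used there. The function names are illustrative only.

```python
def r_squared(sse, sst):
    """R^2 = 1 - SSE/SST (equivalently SSR/SST)."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Assumed inputs: taken from the question 10 solution, not from the
# (missing) computer output itself.
sse, sst = 9.378, 40.686
n, k = 20, 3  # inferred from n - 1 = 19 and n - k - 1 = 16

r2 = r_squared(sse, sst)
adj_r2 = adjusted_r_squared(sse, sst, n, k)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

Under these assumptions, R² comes out at about 76.95% and adjusted R² at about 72.63%, matching options B and D of the two questions. Option E (23.1%) is SSE/SST, the unexplained share.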



10. Consider the following partial computer output for a multiple regression model.

What is adjusted R²?
A. 31.308%
B. 76.95%
C. 87.72%
D. 72.63%
E. 23.1%
Answer: D. Adjusted R² = 1 - [SSE/(n-k-1)]/[SST/(n-1)] = 1 - (9.378/16)/(40.686/19) = .7263

11. In multiple regression analysis, the mean square regression divided by mean square error yields the:
A. Standard error
B. F statistic
C. R²
D. Adjusted R²
E. T statistic
Answer: B (see Instructions)

12. A particular multiple regression model has 3 independent variables, the sum of the squared error is 7680 and the total number of observations is 34. What is the value of the standard error of estimate?
A. 256
B. 232.72
C. 225.89
D. 16
E. 15.03
Answer: D. The df for error = 34-3-1 = 30, so MSE = 7680/30 = 256 and the standard error of estimate is √MSE = √256 = 16.

Essay Type (Five points each)

Chapter 13

1. Use the following results obtained from a simple linear regression analysis with 12 observations.
Y-hat = 37.2895 - (1.2024)X
r² = 0.6744
sb1 = 0.2934
Test to determine if there is a significant negative relationship between the independent and dependent variable at α = .05 and .01.

Reject H0: there is a significant negative relationship between the dependent and independent variable.
H0: β1 ≥ 0 and Ha: β1 < 0 (one-tailed test)
b1 = -1.2024, therefore t(hat) = b1/sb1 = -1.2024/0.2934 = -4.09816; df = 12-2 = 10
The table t-values (one-tailed) are 1.812 and 2.764 for the 5% and 1% significance levels.
Comparing the absolute value of t(hat) with the critical values in the table, we conclude that b1 is highly significant, that is, significantly different from zero (and therefore negative here), even at the 1% significance level (or with 99% confidence).

2. A local tire dealer wants to predict the number of tires sold each month. He believes that the number of tires sold is a linear function of the amount of money invested in advertising. He randomly selects 6 months of data consisting of tire sales (in thousands of tires) and advertising expenditures (in thousands of dollars). Based on the data set with 6 observations, the simple linear regression model yielded the following results (X is advertising expenditure in thousand dollars and Y is tires sold in thousands):
ΣX = 24; ΣY = 42; ΣX² = 124; ΣY² = 338; ΣXY = 196

Find the Intercept and slope and write the Regression Equation. Also predict the number of tires (in thousand tires) sold when the money invested in advertising is 5 thousand dollars. Calculate the correlation coefficient and the coefficient of determination. Check whether there is a relation between the correlation coefficient and the coefficient of determination. Calculate SSE, MSE and the standard error of the slope coefficient.

SSxy = 196 - (24*42)/6 = 28
SSxx = 124 - (24²)/6 = 28
SSyy = SST = 338 - (42²)/6 = 44
b1 = SSxy/SSxx = 28/28 = 1 and b0 = Y-bar - b1*X-bar = 7 - 4 = 3.
The estimated Regression Equation is: Y-hat = 3 + 1*X or simply 3 + X.
Since advertising is measured in thousand dollars, we enter 5 for 5 thousand dollars in the equation for prediction. At X = 5, Y-hat = 3 + 5 = 8, or 8 thousand tires sold.
SSR = SSxy²/SSxx = 28²/28 = 28
R² = SSR/SST = 28/44 = 0.6364: the model explains 63.64% of the variation in tire sales.
Correlation coefficient r = SSxy/√(SSxx*SSyy) = 28/√(28*44) = 0.7977
We see that √.6364 = .7977. Thus it is verified that the Coefficient of Determination is the square of the Correlation coefficient.
SSE = SST - SSR = 44 - 28 = 16
MSE = SSE/(6-2) = 16/4 = 4
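The hand calculation for the tire example can be cross-checked with a short sketch that starts from the five sums given in the question. This is only an illustration; the variable names are not from the source. It also computes the standard error of estimate and the standard error of the slope.

```python
from math import sqrt

# Summary statistics given in the question (n = 6 months).
n = 6
sum_x, sum_y = 24, 42
sum_x2, sum_y2, sum_xy = 124, 338, 196

# Corrected sums of squares and cross-products.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n  # this is SST

# Least squares slope and intercept, then the prediction at X = 5.
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * sum_x / n
y_hat_at_5 = b0 + b1 * 5

# Variation decomposition and fit measures.
ssr = ss_xy ** 2 / ss_xx
sse = ss_yy - ssr
mse = sse / (n - 2)
r2 = ssr / ss_yy
r = ss_xy / sqrt(ss_xx * ss_yy)

# Standard error of estimate and of the slope coefficient.
se = sqrt(mse)
sb1 = se / sqrt(ss_xx)

print(f"Y-hat = {b0:g} + {b1:g}X, prediction at X=5: {y_hat_at_5:g}")
print(f"R2 = {r2:.4f}, r = {r:.4f}, SSE = {sse:g}, MSE = {mse:g}")
print(f"se = {se:g}, sb1 = {sb1:.3f}")
```

Squaring the printed r reproduces the printed R², which is the relation the question asks to verify.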

Standard Error of Estimate = se = √MSE = √4 = 2
sb1 = se/√SSxx = 2/√28 = 0.378

3. Consumer Reports provided extensive testing and ratings for more than 100 HDTVs. An overall score, based primarily on picture quality, was developed for each model. In general, a higher overall score indicates better performance. The following data show the price and overall score for the ten 42-inch plasma televisions (Consumer Reports data slightly changed here):

Brand       Price   Score
Dell        2800    60
Hisense     2800    55
Hitachi     2700    45
JVC         3500    50
LG          3300    55
Maxent      2000    38
Panasonic   4000    67
Phillips    3000    56
Proview     2500    32
Samsung     3000    40

Use the above data to develop an estimated regression equation. Compute the Coefficient of Determination and the correlation coefficient and show their relation. Interpret the explanatory power of the model. Estimate the overall score for a 42-inch plasma television with a price of $3400. The MegaStat Regression result is shown below:
Regression Analysis

R²           0.491     n           10
r            0.701     k           1
Std. Error   8.243     Dep. Var.   Score

ANOVA table
Source        SS           df   MS         F      p-value
Regression    524.0257     1    524.0257   7.71   .0240
Residual      543.5743     8    67.9468
Total         1,067.6000   9

Regression output
variables   coefficients   std. error   t (df=8)   p-value   95% lower   95% upper   std. coeff.
Intercept   8.8950         14.9582      0.595      .5685     -25.5987    43.3888     0.000
Price       0.0138         0.0050       2.777      .0240     0.0023      0.0253      0.701

Observation   Score   Predicted   Residual
1             60.0    47.6        12.4
2             55.0    47.6        7.4
3             45.0    46.2        -1.2
4             50.0    57.3        -7.3
5             55.0    54.5        0.5
6             38.0    36.5        1.5
7             67.0    64.2        2.8
8             56.0    50.4        5.6
9             32.0    43.4        -11.4
10            40.0    50.4        -10.4

Estimated regression equation: Y-hat = 8.895 + 0.0138X
Coefficient of determination, R² = 0.491
Correlation coefficient, r = 0.701
Relation: r² = 0.701² = 0.491 = R²
The Regression model can explain 49.1% of the variation in the Dependent variable (Overall score).
Estimate of Overall score when X = 3400: Y-hat = 8.895 + 0.0138*3400 = 55.82

Calculator based:
Brand       Score=Y   Price=X   xi = Xi - X-bar   yi = Yi - Y-bar   xi*yi    xi²        yi²
Dell        60        2800      -160              10.2              -1632    25600      104.04
Hisense     55        2800      -160              5.2               -832     25600      27.04
Hitachi     45        2700      -260              -4.8              1248     67600      23.04
JVC         50        3500      540               0.2               108      291600     0.04
LG          55        3300      340               5.2               1768     115600     27.04
Maxent      38        2000      -960              -11.8             11328    921600     139.24
Panasonic   67        4000      1040              17.2              17888    1081600    295.84
Phillips    56        3000      40                6.2               248      1600       38.44
Proview     32        2500      -460              -17.8             8188     211600     316.84
Samsung     40        3000      40                -9.8              -392     1600       96.04
Sum         498       29600                                         37920    2744000    1067.6
                                                                    SSXY     SSXX       SSYY

X-bar = 29600/10 = 2960
Y-bar = 498/10 = 49.8

b1 = SSxy/SSxx = 37920/2744000 = 0.0138 (0.0138192 before rounding)
b0 = Y-bar - b1*X-bar = 49.8 - (0.0138192*2960) = 8.895
Y-hat = 8.895 + 0.0138X

If X = 3400, Y-hat = 8.8950 + (.0138*3400) = 55.82

SST = 1067.6
SSR = b1*SSxy = 0.0138192*37920 = 524
SSE = 1067.6 - 524 = 543.6
R² = SSR/SST = 524/1067.6 = 0.491
r = SSxy/√(SSxx*SSyy) = 37920/√(2744000*1067.6) = 0.701
√R² = √.491 = 0.701
Relationship between r and R² verified.
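The calculator-based steps can also be reproduced from the raw price and score data. The sketch below is an illustration under the same least squares formulas; the helper name `fit_simple_regression` is hypothetical and not part of MegaStat or the source. It keeps the slope unrounded, so its prediction at X = 3400 differs in the second decimal from the 55.82 obtained above with the rounded slope 0.0138.

```python
from math import sqrt

# Price (X) and overall score (Y) for the ten televisions in question 3.
price = [2800, 2800, 2700, 3500, 3300, 2000, 4000, 3000, 2500, 3000]
score = [60, 55, 45, 50, 55, 38, 67, 56, 32, 40]


def fit_simple_regression(x, y):
    """Least squares fit of y on x; returns slope, intercept and fit measures."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    ss_xx = sum((a - x_bar) ** 2 for a in x)
    ss_yy = sum((b - y_bar) ** 2 for b in y)  # SST
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    ssr = b1 * ss_xy
    sse = ss_yy - ssr
    return {
        "b0": b0,
        "b1": b1,
        "r2": ssr / ss_yy,
        "r": ss_xy / sqrt(ss_xx * ss_yy),
        "se": sqrt(sse / (n - 2)),  # standard error of estimate
    }


fit = fit_simple_regression(price, score)
prediction = fit["b0"] + fit["b1"] * 3400  # uses the unrounded slope

print(f"Y-hat = {fit['b0']:.4f} + {fit['b1']:.4f}X")
print(f"R2 = {fit['r2']:.3f}, r = {fit['r']:.3f}, se = {fit['se']:.3f}")
print(f"Predicted score at $3400: {prediction:.2f}")
```

The coefficients, R², r and standard error agree with the MegaStat output to the precision reported there.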

Chapter 14

4. A member of the state legislature has expressed concern about the differences in the mathematics test scores of high school freshmen across the state. She asks her research assistant to conduct a study to investigate what factors could account for the differences. The research assistant looked at a random sample of school districts across the state and used the factors of percentage of mathematics teachers in each district with a degree in mathematics, the average age of mathematics teachers and the average salary of mathematics teachers.

Analysis of Variance

Write the least squares prediction equation. What is the number of observations in the sample? Based on the multiple regression model given above, estimate the mathematics test score and calculate the value of the residual, if the percentage of teachers with a mathematics degree is 50.0, the average age is 43 and the average salary is 48,300 (48.3). If the actual mathematics test score for these factors is 68.50, what is the error for this observation?


Y-hat = 35.178 + 0.22073X1 + 0.3353X2 + 0.0930X3
n = 3 + 32 + 1 = 36 (regression df + error df + 1)
Estimated test score = 65.12 and residual = 3.38:
Y-hat = 35.178 + 0.22073(50) + 0.3353(43) + 0.0930(48.3) = 65.12
e = 68.50 - 65.12 = 3.38

5. For the above equation (question # 4) answer the following: What is the total sum of squares? What is the explained variation? What is the mean square error?
SS Total = SST = 1053.09 + 1858.50 = 2911.59
SSR = explained variation = 1053.09
SSE = 2911.59 - 1053.09 = 1858.50 and MSE = 1858.50/32 = 58.08

6. For the above equation (question # 4), calculate the Coefficient of Determination and the Adjusted Coefficient of Determination, and test for the overall usefulness of the model using the F-Statistic at 5% and 1% significance levels.
R² = 1053.09/2911.59 = 0.3617
R² adjusted = (0.3617 - (3/35))(35/32) = (0.3617 - 0.0857)(1.0938) = 0.3019
MSR = 1053.09/3 = 351.03
MSE = 1858.50/32 = 58.08
F = MSR/MSE = 351.03/58.08 = 6.04
F.01,3,32 = 4.51
Since 6.04 > 4.51, reject H0. The coefficient of determination is highly significant and the model is useful. (Since F exceeds the critical value at 1%, we don't need to test at 5%.) However, the model can explain only 36.17% of the variation (R²) and 30.19% of the variance (adjusted R²) in the dependent variable.

7. For the above Regression (question # 4), test the usefulness (or significance) of the three independent variables using the t-test at 5% and 1% significance levels.
t1 = 0.22073/0.07131 = 3.095
t2 = 0.3353/0.1901 = 1.764
t3 = 0.0930/0.1675 = 0.555


The two-tailed t-values for 32 df are 2.037 and 2.738 for 5% and 1%, respectively. Thus, only the first slope is statistically significant at the 1% level, and the other two are insignificant even at 5%. In other words, statistically speaking, only the percentage of teachers with a Math degree is significant (that is, useful) in explaining the changes in Math test scores given the current sample results. This is not surprising considering the low value of the Coefficient of Determination.
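The arithmetic in questions 4 to 7 can be gathered into one sketch. It uses only the sums of squares, coefficients, standard errors and critical values quoted in the solutions above, since the original computer output is not reproduced; the critical values are copied from the solutions rather than computed, and the variable names are illustrative.

```python
# Quantities quoted in the solutions to questions 4-7.
ssr, sse = 1053.09, 1858.50
k, df_error = 3, 32
n = k + df_error + 1  # 36 observations

# Question 4: prediction and residual for the given district profile.
b0 = 35.178
slopes = [0.22073, 0.3353, 0.0930]  # X1 % math degree, X2 age, X3 salary
x_values = [50.0, 43.0, 48.3]
y_hat = b0 + sum(b * x for b, x in zip(slopes, x_values))
residual = 68.50 - y_hat

# Questions 5 and 6: variation decomposition, R2, adjusted R2, F.
sst = ssr + sse
mse = sse / df_error
msr = ssr / k
r2 = ssr / sst
adj_r2 = 1 - mse / (sst / (n - 1))
f_stat = msr / mse
F_CRIT_01 = 4.51  # critical value as quoted in the solution

# Question 7: t statistics against the quoted two-tailed critical values.
std_errors = [0.07131, 0.1901, 0.1675]
t_stats = [b / s for b, s in zip(slopes, std_errors)]
T_CRIT_05, T_CRIT_01 = 2.037, 2.738
significant_05 = [abs(t) > T_CRIT_05 for t in t_stats]
significant_01 = [abs(t) > T_CRIT_01 for t in t_stats]

print(f"Y-hat = {y_hat:.2f}, residual = {residual:.2f}")
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}, F = {f_stat:.2f}")
print("t statistics:", [round(t, 3) for t in t_stats])
print("significant at 5%:", significant_05, "at 1%:", significant_01)
```

The adjusted R² line uses the form 1 - MSE/[SST/(n-1)], which is algebraically the same as the (R² - k/(n-1))((n-1)/(n-k-1)) form used in the solution to question 6.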
