
STATS 330 / STATS 762

Midterm Test Model answers


SC 2015

1. Plot interpretation for 6 marks.


As coffee strength increases the reaction time decreases (the stronger the coffee
the faster the reaction). [2 marks]
As amount increases the reaction time increases (the more to drink the slower to
react). [2 marks]
Slopes indicate planarity of data. [1 mark]
Slopes indicate parallel lines model. [1 mark]
2. What is meant by the term collinearity in linear regression? If present, what effect
does collinearity have on the estimated regression coefficients? [5 marks]
A regression is said to be collinear when one or more variables are almost linear
combinations of the others. [2 marks]
Collinearity can lead to imprecise estimation of regression coefficients (increased
standard errors). [1 mark]
Increased standard errors can lead to non-significance of predictors. [1 mark]
Collinearity is a property of the data, not of the model; its impact can be
alleviated by removing affected predictors from the model. [1 mark]
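The effect on standard errors can be illustrated with a small simulation. The sketch below is not part of the model answer; it assumes a model with exactly two predictors, for which the variance inflation factor is 1 / (1 - r^2) with r the correlation between the predictors, and the standard error of each coefficient is inflated by the square root of that factor. The simulated data, the seed, and all function names are made up for illustration.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)

random.seed(330)  # arbitrary seed, only for reproducibility
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
# Assumption: one unrelated predictor and one that is almost a copy of x1.
x2_unrelated = [random.gauss(0, 1) for _ in range(n)]
x2_collinear = [a + 0.05 * random.gauss(0, 1) for a in x1]

vif_unrelated = vif_two_predictors(x1, x2_unrelated)
vif_collinear = vif_two_predictors(x1, x2_collinear)

# Standard errors grow by sqrt(VIF) relative to uncorrelated predictors.
se_inflation = math.sqrt(vif_collinear)
print(f"VIF (unrelated): {vif_unrelated:.2f}")
print(f"VIF (collinear): {vif_collinear:.1f}, SE inflated by {se_inflation:.1f}x")
```

Dropping `x2_collinear` from the model removes the inflation, which is the remedy named in the answer above.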
3. What is the optimism of a prediction? Describe two ways of estimating the
optimism. [5 marks]
A given data set is usually a sample from a population. The optimism is the
difference between the goodness of fit of a model to the sample and the goodness
of fit of the same model to the population. [2 marks]
The goodness of fit is usually described by the associated $R^2$ or the prediction
error. [1 mark]
The goodness of fit of a model to the population can be estimated by Cross
Validation, [1 mark]
or bootstrapping. [1 mark]
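As a rough illustration of the cross-validation route, the sketch below estimates the optimism of a straight-line fit as the difference between the leave-one-out cross-validated prediction error and the apparent (in-sample) error. It is a minimal sketch under assumptions, not the course's code: the data are simulated, mean squared error is used as the goodness-of-fit measure, and all names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def mse(xs, ys, intercept, slope):
    """Mean squared prediction error of a fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loocv_mse(xs, ys):
    """Leave-one-out CV: refit without point i, then predict point i."""
    total = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total / len(xs)

random.seed(762)  # arbitrary seed, only for reproducibility
# Simulated sample (assumption): true line 1 + 2x with noise sd 3.
xs = [random.uniform(0, 10) for _ in range(25)]
ys = [1.0 + 2.0 * x + random.gauss(0, 3) for x in xs]

a, b = fit_line(xs, ys)
apparent_error = mse(xs, ys, a, b)   # fit to the sample itself
cv_error = loocv_mse(xs, ys)         # estimate of fit to the population
optimism_estimate = cv_error - apparent_error
print(f"apparent {apparent_error:.2f}, CV {cv_error:.2f}, optimism {optimism_estimate:.2f}")
```

A bootstrap estimate follows the same pattern: refit on resampled data and compare the error on the resample with the error on the original sample.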
4. Description of three data points in a scatter plot with respect to their effect on
regression. Note: Just naming the points by their leverage and potential for outlier is
sufficient. [3 marks]
Point 10 is a high leverage point but not an outlier. It is an unusual observation for
the housing price, but any line fitted through the majority of points will be close
enough to point 10 that it does not show up as a large residual. [1 mark]

Point 35 is a low leverage outlier. It is not an unusual observation for housing
prices but it will have a large residual in a fitted model. [1 mark]
Point 24 is a high leverage outlier. It is an unusual observation for housing price
and will have a large residual. Its location may actually draw the line towards
itself. [1 mark]
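The distinction between leverage and outlyingness can be checked numerically. The sketch below uses made-up data (not the housing data from the test): one point lies far out in x but on the line, another lies in the middle of the x range but far off the line. For a simple linear regression the leverage is h_i = 1/n + (x_i - mean(x))^2 / Sxx; the 3p/n cut-off is a rule of thumb assumed here, with p = 2 coefficients.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for a simple linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def leverages(xs):
    """Hat values h_i = 1/n + (x_i - mean)^2 / Sxx for a straight-line fit."""
    n = len(xs)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    return [1.0 / n + (x - mx) ** 2 / sxx for x in xs]

# Hypothetical data: y = 2x, except index 4 is pushed 15 units off the line.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]   # index 10 is far out in x
y = [2.0 * v for v in x]
y[4] = 25.0                                # low-leverage outlier

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
h = leverages(x)

threshold = 3 * 2 / len(x)                 # rule-of-thumb cut-off 3p/n, p = 2
high_leverage = [i for i, v in enumerate(h) if v > threshold]
largest_residual = max(range(len(x)), key=lambda i: abs(residuals[i]))
print("high leverage:", high_leverage, "largest residual at index:", largest_residual)
```

In this made-up example the point at x = 30 has high leverage but a small residual, while the point at index 4 has a large residual but low leverage.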

5. We fitted a linear model to the data in Q4. Using the output below, compute the
prediction interval and the confidence interval for the rental value of a property priced
at $700,000. [5 marks]

$\text{confidence.interval} = \text{fitted.value} \pm t_{.975,48} \times \text{standard.error}$ [1 mark]

$= 699.6994 \pm 2.010635 \times 22.31385$ [.5 mark]
$= [654.8344, 744.5644]$, [1 mark]

$\text{prediction.interval} = \text{fitted.value} \pm t_{.975,48} \times \sqrt{\text{se}^2 + \sigma^2}$ [1 mark]

$= 699.6994 \pm 2.010635 \times \sqrt{22.31385^2 + 143.4475^2}$ [.5 mark]
$= [407.8103, 991.5885]$. [1 mark]
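The arithmetic can be reproduced with a few lines of Python. This is only a check of the numbers quoted above; the variable names are hypothetical, and the t quantile is taken from the model answer rather than recomputed.

```python
import math

# Values quoted in the model answer.
fitted_value = 699.6994
t_quantile = 2.010635        # t_{.975,48}
se_fit = 22.31385            # standard error of the fitted value
sigma = 143.4475             # residual standard error

def interval(centre, half_width):
    """Symmetric interval centre +/- half_width as a (lower, upper) tuple."""
    return (centre - half_width, centre + half_width)

# Confidence interval for the mean response.
confidence_interval = interval(fitted_value, t_quantile * se_fit)

# Prediction interval adds the residual variance to the variance of the fit.
prediction_interval = interval(
    fitted_value, t_quantile * math.sqrt(se_fit ** 2 + sigma ** 2)
)

print("confidence interval:", [round(v, 4) for v in confidence_interval])
print("prediction interval:", [round(v, 4) for v in prediction_interval])
```

The prediction interval is much wider because it accounts for the scatter of an individual observation around the fitted line, not just the uncertainty in the line itself.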

6. The following little dataset contains information about four imaginary STATS330
students. Name rules for the four variables in the table, and identify inconsistent entries.
[6 marks]

Sex should be in {M, F, O}. [1 mark]


Height is measured in metres and a reasonable range could be [0.70, 2.20]. [1 mark]
Weight is measured in kg, and a reasonable range could be [40, 250]. [1 mark]
Grade is a letter grade from A+ to D following UoA rules (it is
STATS330 after all). [1 mark]
Hemmingway's height is in feet and inches, and his weight in pounds, clearly
inconsistent. [1 mark]
PJ Harvey's weight is not assigned, and David Hasselhoff's sex is given as "m",
both clearly inconsistent. [1 mark]
Some students pointed to Polly Jean as inconsistent since the variable was First.name
which would ask for only one name. It was counted as a right answer if the student
missed at least one of the four inconsistencies named above.
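
Rules like these can be written down as simple checks. The sketch below is only an illustration: the records are invented placeholders that mimic the kinds of inconsistency named above (they are not the values from the test's table), the field names are hypothetical, and the list of letter grades between A+ and D is an assumption about the UoA grading scale.

```python
VALID_SEX = {"M", "F", "O"}
# Assumed set of letter grades from A+ down to D.
VALID_GRADES = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D"]

def in_range(value, low, high):
    """True only for a real number within [low, high]; rejects None and strings."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and low <= value <= high

def validate(record):
    """Return the names of the variables that break a rule, in a fixed order."""
    problems = []
    if record.get("Sex") not in VALID_SEX:
        problems.append("Sex")
    if not in_range(record.get("Height"), 0.70, 2.20):   # metres
        problems.append("Height")
    if not in_range(record.get("Weight"), 40, 250):      # kilograms
        problems.append("Weight")
    if record.get("Grade") not in VALID_GRADES:
        problems.append("Grade")
    return problems

# Invented placeholder records, one per type of inconsistency.
consistent = {"Sex": "F", "Height": 1.68, "Weight": 60.0, "Grade": "A-"}
lowercase_sex = {"Sex": "m", "Height": 1.93, "Weight": 95.0, "Grade": "B"}
missing_weight = {"Sex": "F", "Height": 1.64, "Weight": None, "Grade": "A"}
imperial_units = {"Sex": "M", "Height": "6'0\"", "Weight": "200 lb", "Grade": "C+"}

for rec in (consistent, lowercase_sex, missing_weight, imperial_units):
    print(rec, "->", validate(rec))
```

A range check alone would not catch a weight recorded in pounds as a bare number inside [40, 250], so unit rules still need to be stated alongside the ranges.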
