
ORF 405 Monday November 21, 2016

SECOND MIDTERM

• Closed book, and anything with a switch ON/OFF should be turned OFF!
• No unjustified answer will be given credit, even if the answer is correct.
• The problems are independent of each other, so they can be tackled in whichever order you choose.
• Use the number of points allotted to each problem to decide how much time you should spend on each of them.
• Give your answers on this document and write OVER at the bottom of a page if you use the back of that page.
• You should try the extra credit problem V only if you have time left.
• Remember to sign the pledge and write your name on this page and any loose sheet of paper you turn in!

20pts Problem I
We start with a set of points $(x_i, y_i)_{i=1,\dots,2n}$ in the plane and we perform three least squares regressions of the response variable $y$ against the explanatory variable $x$: first with the data $(x_i, y_i)_{i=1,\dots,n}$, next with the data $(x_i, y_i)_{i=n+1,\dots,2n}$, and finally with the full data $(x_i, y_i)_{i=1,\dots,2n}$.
We assume that in the first two regressions the estimates of the slopes are both positive. In this case, can the estimate of the slope for the third regression be negative?
Remember that if your answer is NO, you need to argue why it is so for ALL point configurations, while if your answer is YES, it is enough to exhibit one configuration of points for which it happens.
Solution
The answer is YES. Indeed, it is easy to construct examples. For instance, take $n = 2$ with $(x_1, y_1) = (1, 10)$ and $(x_2, y_2) = (2, 11)$. The least squares regression line is the line through these two points and its slope is
$$\frac{y_2 - y_1}{x_2 - x_1} = 1.$$
Similarly, choosing $(x_3, y_3) = (10, 1)$ and $(x_4, y_4) = (11, 2)$, the least squares regression line is the line through these two points and its slope is
$$\frac{y_4 - y_3}{x_4 - x_3} = 1.$$
We now compute the sign of the slope of the LS regression for the full data set of 4 points. Since the slope is the covariance of the $x_i$'s and the $y_i$'s divided by the (positive) variance of the $x_i$'s, we only need the sign of the covariance. We have
$$\bar{x} = \frac{1}{4}(1 + 2 + 10 + 11) = 6 \qquad \text{and} \qquad \bar{y} = \frac{1}{4}(10 + 11 + 1 + 2) = 6$$
so that
$$\mathrm{cov}(x, y) = \frac{1}{4}\bigl[(1-6)(10-6) + (2-6)(11-6) + (10-6)(1-6) + (11-6)(2-6)\bigr] = \frac{1}{4}(-20 - 20 - 20 - 20) = -20,$$
which is negative. This gives us the desired example.
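As a quick numerical check (in R, the language this exam refers to elsewhere), the three regressions of the example can be fit with `lm`; the data are exactly those of the solution:

```r
x <- c(1, 2, 10, 11)
y <- c(10, 11, 1, 2)

coef(lm(y[1:2] ~ x[1:2]))[2]  # slope of the first regression:  1
coef(lm(y[3:4] ~ x[3:4]))[2]  # slope of the second regression: 1
coef(lm(y ~ x))[2]            # pooled slope: -80/82, about -0.98
```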
20pts Problem II
Given a set of couples $(x_i, y_i)_{i=1,\dots,n}$ of real numbers, we perform the least squares regression of the $y_i$'s against the $x_i$'s and we compute the $R^2$. How would the $R^2$ change if, instead, we performed the least squares regression of the responses $y_i$ against the explanatory variables $2x_i$?
Hint: Remember that the $R^2$ is defined as 1 minus the ratio of the sum of squared errors of the regression to the empirical variance of the response variable.
Solution
For the first regression, if we denote by $\beta_0$ and $\beta_1$ the intercept and the slope, the estimates $\hat\beta_0$ and $\hat\beta_1$ are the numbers minimizing the quantity
$$\sum_{i=1}^n |y_i - \beta_0 - \beta_1 x_i|^2.$$
By analogy, for the second regression, if we denote by $\alpha_0$ and $\alpha_1$ the intercept and the slope, the estimates $\hat\alpha_0$ and $\hat\alpha_1$ are the numbers minimizing the quantity
$$\sum_{i=1}^n |y_i - \alpha_0 - \alpha_1 (2x_i)|^2.$$
Since the substitution $\beta_0 = \alpha_0$, $\beta_1 = 2\alpha_1$ turns one minimization problem into the other, we must have $\hat\alpha_0 = \hat\beta_0$ and $\hat\beta_1 = 2\hat\alpha_1$. Now for the first regression, the $R^2$ is given by
$$R^2 = 1 - \frac{\sum_{i=1}^n |y_i - \hat\beta_0 - \hat\beta_1 x_i|^2}{\sum_{i=1}^n |y_i - \bar{y}|^2}$$
while for the second regression the $R^2$, which we denote by $r^2$ to avoid any confusion, is given by
$$r^2 = 1 - \frac{\sum_{i=1}^n |y_i - \hat\alpha_0 - \hat\alpha_1 (2x_i)|^2}{\sum_{i=1}^n |y_i - \bar{y}|^2} = 1 - \frac{\sum_{i=1}^n |y_i - \hat\beta_0 - \hat\beta_1 x_i|^2}{\sum_{i=1}^n |y_i - \bar{y}|^2} = R^2,$$
so the $R^2$ does not change.
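A numerical illustration in R; the simulated data below are our own choice, since the result holds for any data set:

```r
set.seed(42)
x <- runif(50)
y <- 3 + 2 * x + rnorm(50, sd = 0.3)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ I(2 * x))
summary(fit1)$r.squared              # identical ...
summary(fit2)$r.squared              # ... to this
c(coef(fit1)[2], 2 * coef(fit2)[2])  # beta1_hat = 2 * alpha1_hat
```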


35pts Problem III
Let $x_1, x_2, \dots, x_n$ be $n$ independent samples, say $n = 1024$, from the uniform distribution on the interval $[0, \delta]$ where $\delta$ is a positive number, and let $y_1, y_2, \dots, y_n$ be $n$ real numbers. We perform a kernel regression to explain the values $y_i$ as responses to the $x_i$'s, and compute an estimate of the response for the value $x_0 = \delta/2$.
5pts 1. Write down the formula giving the triangle kernel.
15pts 2. For each possible value of the bandwidth $b > 0$, compute the probability $p$ that the evaluation of the kernel regression at $x_0$, given this value of the bandwidth, would not be well defined.
15pts 3. What value of the bandwidth would you need to use in order to guarantee that the expected number of samples $(x_i, y_i)$ actually entering the computation of the kernel regression at $x_0$ is at least $n/4$?

Solution
1. $K_{\mathrm{triangle}}(x) = 1 - |x|$ if $|x| \le 1$, and $0$ otherwise.
2. Given a choice $b$ for the bandwidth, the evaluation of the kernel regression requires a division by $0$ whenever none of the points $x_i$ falls in the interval $(x_0 - b, x_0 + b)$. Clearly, the probability that this occurs is $0$ when $b > \delta/2$. Otherwise,
$$p = \mathbb{P}[\text{no } x_i \in (\delta/2 - b, \delta/2 + b)] = \mathbb{P}[\text{all } x_i \in (0, \delta/2 - b) \cup (\delta/2 + b, \delta)] = \mathbb{P}[x_1 \in (0, \delta/2 - b) \cup (\delta/2 + b, \delta)]^n = \left[\frac{\delta - 2b}{\delta}\right]^n = \left(1 - \frac{2b}{\delta}\right)^{1024}.$$
So the result is
$$p = \begin{cases} 0 & \text{if } b > \delta/2 \\ \left(1 - \frac{2b}{\delta}\right)^{1024} & \text{otherwise.} \end{cases}$$
3. If we use the bandwidth $b$, the couple $(x_i, y_i)$ enters the computation if $|x_i - x_0| < b$, or in other words, if $x_i \in (\delta/2 - b, \delta/2 + b)$. As in the previous question, since we assume that the $x_i$'s are realizations of independent random variables uniformly distributed over the interval $[0, \delta]$, the number of $x_i$'s falling in the interval $(\delta/2 - b, \delta/2 + b)$ is a binomial random variable with parameters $n = 1024$ and probability $p'$ given by
$$p' = \begin{cases} 1 & \text{if } b > \delta/2 \\ 2b/\delta & \text{otherwise.} \end{cases}$$
Since the mean of the binomial distribution $B(n, p')$ is $np'$, the expected number of samples $(x_i, y_i)$ entering the computation is at least $n/4$ if $np' \ge n/4$, or equivalently $p' \ge 1/4$, which is equivalent to $b \ge \delta/8$.
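All three parts can be checked by simulation in R; in the sketch below, $\delta$, the bandwidths, the responses, and the number of Monte Carlo runs are our own illustrative choices:

```r
triangle <- function(u) pmax(1 - abs(u), 0)      # triangle kernel of part 1

n <- 1024; delta <- 1; x0 <- delta / 2

# Part 2: with n this large, p is only visibly non-zero for very small b
b <- delta / 1000
mean(replicate(5000, sum(abs(runif(n, 0, delta) - x0) < b) == 0))
(1 - 2 * b / delta)^n                            # formula: about 0.129

# Part 3: with b = delta/8 the expected count in the window is n/4 = 256
mean(replicate(5000, sum(abs(runif(n, 0, delta) - x0) < delta / 8)))

# Nadaraya-Watson estimate at x0, well defined whenever some weight is positive
xs <- runif(n, 0, delta)
ys <- sin(2 * pi * xs / delta)                   # arbitrary responses
w <- triangle((xs - x0) / (delta / 8))
sum(w * ys) / sum(w)
```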
25pts Problem IV
Let us assume that $x_1 = (x_{1,1}, x_{1,2}, x_{1,3}), \dots, x_n = (x_{n,1}, x_{n,2}, x_{n,3})$ is a set of $n = 1024$ points in $\mathbb{R}^3_+$ and that $y_1, \dots, y_n$ are $n$ real numbers satisfying
$$y_i = x_{i,1} x_{i,2} + \log\sqrt{x_{i,1} + x_{i,2} + x_{i,3}} + \epsilon_i, \qquad i = 1, \dots, n$$
where the $\epsilon_i$ are $n = 1024$ samples from independent identically distributed Gaussian random variables with mean $0$ and variance $0.04$.
15pts 1. Determine vectors $a_1 = (a_{1,1}, a_{1,2}, a_{1,3})$, $a_2 = (a_{2,1}, a_{2,2}, a_{2,3})$, and $a_3 = (a_{3,1}, a_{3,2}, a_{3,3})$, and functions $x \mapsto \phi_1(x)$, $x \mapsto \phi_2(x)$, and $x \mapsto \phi_3(x)$ such that
$$x_1 x_2 + \log\sqrt{x_1 + x_2 + x_3} = \phi_1(a_1 \cdot x) + \phi_2(a_2 \cdot x) + \phi_3(a_3 \cdot x)$$
for all $x = (x_1, x_2, x_3) \in \mathbb{R}^3_+$.
10pts 2. Assuming that you are attempting a projection pursuit regression of the responses $y_i$ against the explanatory variables $x_{i,1}$, $x_{i,2}$ and $x_{i,3}$, give the number nterms of ridge functions (varying from nterms=1 to max.terms=10) you would choose, and explain which vectors $a_k$ and what kind of ridge functions $\phi_k$ the program should find. Sketch the plot of the goodness of fit (gofn in R) you would get.

Solution
1. We have
$$x_1 x_2 + \log\sqrt{x_1 + x_2 + x_3} = \left(\frac{x_1 + x_2}{2}\right)^2 - \left(\frac{x_1 - x_2}{2}\right)^2 + \frac{1}{2}\log(x_1 + x_2 + x_3),$$
since $\left(\frac{x_1 + x_2}{2}\right)^2 - \left(\frac{x_1 - x_2}{2}\right)^2 = x_1 x_2$. So the solution is
$$a_1 = \left(\frac{1}{2}, \frac{1}{2}, 0\right), \qquad a_2 = \left(\frac{1}{2}, -\frac{1}{2}, 0\right) \qquad \text{and} \qquad a_3 = (1, 1, 1)$$
and the functions $x \mapsto \phi_1(x)$, $x \mapsto \phi_2(x)$, and $x \mapsto \phi_3(x)$ are given by
$$\phi_1(x) = x^2, \qquad \phi_2(x) = -x^2, \qquad \text{and} \qquad \phi_3(x) = \frac{1}{2}\log x.$$


2. Even though this is presumably not the only way to write the function $x \mapsto x_1 x_2 + \log\sqrt{x_1 + x_2 + x_3}$ as a sum of ridge functions, it is reasonable to expect that nterms = 3 is the right number of ridge functions to use, and the graphs of these functions should be the graphs of the functions $\phi_k$ we just identified, even though we do not know in which order they will be picked by the program. Also, these functions should be evaluated on the projections of $x$ onto the vectors $a_k$ identified above. The plot of the goodness of fit should improve sharply up to 3 terms and essentially flatten out afterwards.
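A sketch of this experiment in R, simulating from the model above; the design distribution and the seed are our own choices, since the exam only specifies that the points lie in $\mathbb{R}^3_+$ (the fields `$alpha` and `$gofn` of the fitted `ppr` object hold the estimated projection directions and the goodness-of-fit values):

```r
set.seed(1)
n <- 1024
x <- matrix(runif(3 * n, 0.5, 2), n, 3)    # design in R_+^3 (our choice)
y <- x[, 1] * x[, 2] + log(sqrt(rowSums(x))) + rnorm(n, sd = 0.2)  # var 0.04

fit3 <- ppr(x, y, nterms = 3, max.terms = 10)
fit3$alpha   # directions found: compare with a1, a2, a3 up to sign and scale
plot(fit3)   # estimated ridge functions: quadratic, minus quadratic, log-like

# goodness of fit against the number of terms (refit with nterms = 1 so that
# gofn is recorded for every model size from 1 to max.terms)
fit1 <- ppr(x, y, nterms = 1, max.terms = 10)
plot(fit1$gofn, type = "b", xlab = "nterms", ylab = "gofn")  # elbow near 3
```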
15pts Problem V. Extra credit
We start with $n$ real numbers $x_1, x_2, \dots, x_n$ and we suppose that the numbers $y_1, y_2, \dots, y_n$ are generated from the linear model:
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \dots, n$$
where the $\epsilon_i$'s are outcomes of independent draws from the uniform distribution $U(-1, +1)$ over the interval $[-1, +1]$.
5pts 1. Write the likelihood function of the $y_i$'s under the above assumptions. Recall that the likelihood function of observations $y_1, \dots, y_n$ is the joint density of the random variables $Y_1, \dots, Y_n$ computed at the observations $(y_1, \dots, y_n)$.
5pts 2. Do you think one can always find solutions $\hat\beta_0$ and $\hat\beta_1$ that maximize the likelihood? Explain your answer.
5pts 3. Suppose that $n = 3$, $(x_1, y_1) = (0, 1)$, $(x_2, y_2) = (1, 0)$, and $(x_3, y_3) = (2, 3)$. Do $\hat\beta_0$ and $\hat\beta_1$ exist, and if yes, determine their values.
Solution
1. $L(\beta_0, \beta_1) = \frac{1}{2^n} \prod_{i=1}^n \mathbf{1}\{y_i - \beta_0 - \beta_1 x_i \in [-1, 1]\}$, where $\mathbf{1}\{A\}$ is the indicator function of the event $A$.
2. There will not always be a unique solution. Consider the data $(0, 0), (1, 0)$: any line passing through the origin with $\beta_1 \in [-1, 1]$ will have the same likelihood. There will not always be a solution with non-zero likelihood. Consider the data $(0, 1), (1, 0), (2, 3), (3, 1000)$: no line can be drawn for which the data have non-zero likelihood, since the first two points force $\beta_0 \ge 0$ and $\beta_0 + \beta_1 \le 1$, hence $\beta_1 \le 1$, while the last point requires $\beta_0 + 3\beta_1 \ge 999$.
3. The non-zero-likelihood constraints $|1 - \beta_0| \le 1$, $|0 - \beta_0 - \beta_1| \le 1$, and $|3 - \beta_0 - 2\beta_1| \le 1$ read $\beta_0 \in [0, 2]$, $\beta_0 + \beta_1 \in [-1, 1]$, and $\beta_0 + 2\beta_1 \in [2, 4]$. Subtracting the second from the third gives $\beta_1 \ge 2 - 1 = 1$, while $\beta_0 \ge 0$ and $\beta_0 + \beta_1 \le 1$ give $\beta_1 \le 1$. Hence $\beta_1 = 1$, and then $\beta_0 = 0$. So these data points admit a unique maximizer: $\hat\beta_0 = 0$, $\hat\beta_1 = 1$.
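A brute-force check in R that $(0, 1)$ is the only pair of coefficients with non-zero likelihood; the grid range and step are our choices, and the small tolerance guards against floating-point rounding on the boundary constraints:

```r
x <- c(0, 1, 2); y <- c(1, 0, 3)
grid <- expand.grid(b0 = seq(-2, 2, by = 0.01), b1 = seq(-2, 2, by = 0.01))
feasible <- apply(grid, 1, function(b) all(abs(y - b[1] - b[2] * x) <= 1 + 1e-9))
grid[feasible, ]   # a single row: b0 = 0, b1 = 1
```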
