[Figure: scatter plot "Rental vs Value" (vertical axis: Rental; horizontal axis: Value).]
Next we find the regression equation relating house value (x) to rental income (y).
Using Excel we find:
\[ \bar{x} = 174375, \qquad \bar{y} = 9611.33 \]
\[ SS_x = 4727828125 \]
\[ SS_y = 4796820.89 \]
\[ SS_{xy} = 115161583.3 \]
Slope:
\[ b = \frac{SS_{xy}}{SS_x} = \frac{115161583.3}{4727828125} = 0.02435824 \]
Y-intercept:
\[ a = \bar{y} - b\bar{x} = 9611.33 - 0.02435824(174375) = 5363.865 \]
Regression equation:
\[ \hat{y} = 5363.865 + 0.024358x \]
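The slope and intercept above can be checked with a few lines of Python. This is only a minimal sketch: the function name `fit_line` and its parameter names are illustrative choices, not part of the original solution, and the inputs are the rounded summary statistics reported above, so the last digits may differ slightly from the Excel values.

```python
def fit_line(ss_xy, ss_x, x_bar, y_bar):
    """Return (slope, intercept) of the least-squares line from summary statistics."""
    slope = ss_xy / ss_x                 # b = SSxy / SSx
    intercept = y_bar - slope * x_bar    # a = y_bar - b * x_bar
    return slope, intercept


# Rounded summary statistics for the house data, as reported above.
b, a = fit_line(ss_xy=115161583.3, ss_x=4727828125, x_bar=174375, y_bar=9611.33)
print(f"slope b = {b:.8f}")
print(f"intercept a = {a:.3f}")
```

Working from the summary statistics rather than the raw table keeps the check independent of the spreadsheet.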
With the regression equation we find the rental income for a house worth $230,000:
\[ \hat{y} = 5363.865 + 0.024358(230.000) \]
\[ \hat{y} = 5369.46734 \]
so the predicted rental income is $5369.46734.
Although the two predicted rental incomes are very close to each other, the better estimate is:
\[ \hat{y} = 5373.6082 \]
that is, a rental income of $5373.6082. The mean rental income is $9611.33, so the rental values lie around that figure. This prediction was chosen because it is the closer of the two to the mean.
[Figure: scatter plot with axes labeled Health Expenditure and Prenatal Care.]
Next we find the regression equation relating health expenditure (x) to the percentage of women receiving prenatal care (y).
Using Excel we find:
\[ \bar{x} = 6.1267\%, \qquad \bar{y} = 79.9133\% \]
\[ SS_x = 3.7820 \]
\[ SS_y = 354.5612 \]
\[ SS_{xy} = 6.2803 \]
Slope:
\[ b = \frac{SS_{xy}}{SS_x} = \frac{6.2803}{3.7820} = 1.6606 \]
Y-intercept:
\[ a = \bar{y} - b\bar{x} = 79.9133 - 1.6606(6.1267) = 69.7393 \]
Regression equation:
\[ \hat{y} = 69.7393 + 1.6606x \]
From the regression equation we find the percentage of women who receive prenatal care in a country that spends 5.0% of GDP on health:
\[ \hat{y} = 69.7393 + 1.6606(5.0) \]
\[ \hat{y} = 78.0418\% \]
The predicted value of 78.0418% is a reasonable approximation of the true percentage. It lies close to the mean prenatal care percentage of 79.9133%, so we take it as the best approximation for this exercise.
Section 10.2: Homework 2
Table #10.1.6 contains the value of the house and the amount of rental income in
a year that the house brings in ("Capital and rental," 2013). Find the correlation
coefficient and coefficient of determination and then interpret both.
We find the correlation coefficient as follows:
\[ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{115161583.3}{\sqrt{(4727828125)(4796820.89)}} = 0.765 \]
The correlation coefficient shows a strong, positive linear association between the two variables, so we conclude that there is a linear trend between house value and rental income.
Coefficient of determination:
\[ r^2 = (0.765)^2 \approx 0.585 \]
This means that the model's estimates fit the observed values reasonably well: about 58.5% of the variation in rental income is explained by its linear relationship with house value.
\[ r = \frac{SS_{xy}}{\sqrt{SS_x \, SS_y}} = \frac{6.2803}{\sqrt{(3.7820)(354.5612)}} = 0.1715 \]
The correlation coefficient of 0.1715 indicates a weak positive correlation. That is, there is little evidence of a linear trend between the two variables.
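Coefficient of determination:
\[ r^2 = (0.1715)^2 \approx 0.0294 \]
Only about 2.9% of the variation in the percentage of women receiving prenatal care is explained by its linear relationship with health expenditure.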
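The two correlation coefficients and their squares can be reproduced from the sums of squares. The sketch below is illustrative only: the helper name `correlation` is an assumption, and the inputs are the rounded values reported in this solution.

```python
import math


def correlation(ss_xy, ss_x, ss_y):
    """Pearson correlation coefficient r = SSxy / sqrt(SSx * SSy)."""
    return ss_xy / math.sqrt(ss_x * ss_y)


# Rounded sums of squares reported for each data set.
r_house = correlation(115161583.3, 4727828125, 4796820.89)
r_health = correlation(6.2803, 3.7820, 354.5612)

for label, r in (("house value vs rental income", r_house),
                 ("health expenditure vs prenatal care", r_health)):
    # r^2 is the coefficient of determination.
    print(f"{label}: r = {r:.4f}, r^2 = {r * r:.4f}")
```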
a.) Test at the 5% level for a positive correlation between house value and rental
amount.
We state the null and alternative hypotheses and the level of significance:
\[ H_0: \rho = 0 \]
\[ H_A: \rho > 0 \]
\[ \alpha = 0.05 \]
Now we find the value of the test statistic and the p-value:
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]
Substituting the values of \(r\) and \(r^2\) found previously, we have:
\[ t = \frac{0.765}{\sqrt{\dfrac{1 - 0.5852}{48 - 2}}} = 8.0560 \]
Now we enter the values of t and df into the TI-89 calculator to obtain the p-value:
tcdf(8.0560, 1E99, 46)
\[ p\text{-value} = 1.222 \times 10^{-10} \]
Since the p-value is less than 0.05, we reject \(H_0\). There is enough evidence to show a positive correlation between house value and rental income.
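The test statistic and the upper-tail p-value can also be approximated without a calculator. The sketch below uses only the Python standard library: it integrates the Student t density with Simpson's rule, which stands in for the TI-89 `tcdf` command and is not the method used in the solution. The function names are illustrative, and for a p-value this small the result is limited by floating-point precision, so read it as an order-of-magnitude check.

```python
import math


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r * r) / (n - 2))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=20000):
    """Approximate P(T > t) for t >= 0 as 0.5 minus the Simpson integral over [0, t].

    steps must be even for Simpson's rule.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t_house = t_statistic(0.765, 48)
p_house = upper_tail_p(t_house, 48 - 2)
print(f"t = {t_house:.4f}, df = 46, p-value = {p_house:.3e}")
```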
\[ E = 124630.6384 \]
\[ 5369.46734 - 124630.6384 < y < 5369.46734 + 124630.6384 \]
\[ -119261.1711 < y < 130000.1057 \]
Statistical interpretation:
There is a 95% chance that the interval \(-119261.1711 < y < 130000.1057\) contains the true value of the rental income for a house worth $230,000.
a.) Test at the 5% level for a correlation between percentage spent on health
expenditure and the percentage of women receiving prenatal care.
Now we find the value of the test statistic and the p-value:
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]
Substituting the values of \(r\) and \(r^2\) found previously, we have:
\[ t = \frac{0.1715}{\sqrt{\dfrac{1 - 0.0294}{15 - 2}}} = 0.6276 \]
Now we enter the values of t and df into the TI-89 calculator to obtain the p-value:
tcdf(0.6276, 1E99, 13)
\[ p\text{-value} = 0.2705 \]
Since the p-value is greater than 0.05, we fail to reject the null hypothesis. There is not enough evidence to show a correlation between health expenditure and the percentage of women receiving prenatal care.
c.) Compute a 95% prediction interval for the percentage of women receiving
prenatal care for a country that spends 5.0% of GDP on health expenditure.
Given the fixed value \(x_0\), the prediction interval for an individual y is
\[ \hat{y} - E < y < \hat{y} + E \]
\[ SS_x = 3.7820 \]
\[ x_0 = 5.0\% \]
\[ s_e = 5.1451 \]
\[ \bar{x} = 6.1267 \]
\[ \hat{y} = 78.0418\% \]
\[ n = 15 \]
For a 95% prediction interval with \(df = n - 2 = 13\), the critical value is \(t_c = 2.160\).
Now:
\[ E = t_c \, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}} \]
\[ E = (2.160)(5.1451)\sqrt{1 + \frac{1}{15} + \frac{(5.0 - 6.1267)^2}{3.7820}} \]
\[ E = 13.1605 \]
\[ 78.0418 - 13.1605 < y < 78.0418 + 13.1605 \]
\[ 64.8813 < y < 91.2023 \]
Statistically, this means we are 95% confident that the percentage of women who receive prenatal care is between 64.8813% and 91.2023% for a country that spends 5.0% of GDP on health.
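The margin of error for the prediction interval can be checked numerically. In the sketch below the critical value is passed in as a parameter. The example call assumes `t_c = 2.160`, the t-table value for a 95% interval with df = n - 2 = 13, which is an assumption stated here rather than something the code computes. The function name `prediction_margin` is also illustrative.

```python
import math


def prediction_margin(t_c, s_e, n, x0, x_bar, ss_x):
    """Margin of error E for a prediction interval at x0 in simple linear regression."""
    return t_c * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)


y_hat = 78.0418  # predicted percentage at x0 = 5.0, taken from the solution
# t_c = 2.160 is an assumed t-table value for df = 13 at the 95% level.
E = prediction_margin(t_c=2.160, s_e=5.1451, n=15, x0=5.0, x_bar=6.1267, ss_x=3.7820)
lower, upper = y_hat - E, y_hat + E
print(f"E = {E:.4f}")
print(f"{lower:.4f} < y < {upper:.4f}")
```

Because `t_c` is an argument, the same function shows how sensitive the interval width is to the choice of critical value.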
State the null and alternative hypotheses and the level of significance:
\[ H_0: \text{period and activity are independent} \]
\[ H_A: \text{period and activity are dependent} \]
\[ \alpha = 0.01 \]
Now we find the value of the test statistic and the p-value.
Test statistic:
First find the expected frequencies for each cell.
Using Excel we find the value of the chi-square statistic:
\[ \chi^2 = 68.4646 \]
Next we find the degrees of freedom for this exercise:
\[ df = (3 - 1)(4 - 1) = 6 \]
With these values we calculate the p-value:
χ²cdf(68.4646, 1E99, 6)
\[ p\text{-value} = 8.4392 \times 10^{-13} \approx 0 \]
Conclusion
Reject \(H_0\), since the p-value is less than 0.01. There is enough evidence to show that activity and time period are dependent for the dolphins.
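The p-value for the chi-square statistic can be reproduced with the standard library alone. When the degrees of freedom are even, the upper-tail probability has a closed form, P(X > x) = e^(-x/2) times the sum of (x/2)^i / i! for i = 0 to df/2 - 1. The sketch below relies on that identity instead of the calculator's χ²cdf command, and the function name is an illustrative choice.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


df = (3 - 1) * (4 - 1)        # (rows - 1)(columns - 1), as in the solution
p_value = chi2_upper_tail_even_df(68.4646, df)
print(f"df = {df}, p-value = {p_value:.4e}")
```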
Do the data show that the frequencies observed substantiate the claim that the
reasons for choosing a car are equally likely? Test at the 5% level.
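We state the null and alternative hypotheses and the level of significance:
\[ H_0: \text{the reasons for choosing a car are equally likely} \]
\[ H_A: \text{the reasons for choosing a car are not equally likely} \]
\[ \alpha = 0.05 \]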
Then:
\[ P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = \frac{1}{6} \]
Now we can find the expected frequency for each reason. Since all the probabilities are the same, each expected frequency is the same:
\[ \text{Expected frequency} = E = n \cdot P = 300 \cdot \frac{1}{6} = 50 \]
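The expected frequencies under the equally-likely hypothesis follow directly from E = n * P. The sketch below is a minimal illustration: the function name `expected_frequencies` is hypothetical, and the sample size of 300 and the six categories are taken from the calculation above.

```python
def expected_frequencies(n, probabilities):
    """Expected count for each category: E_i = n * p_i."""
    return [n * p for p in probabilities]


# Six categories, each with probability 1/6 under the null hypothesis.
probabilities = [1 / 6] * 6
expected = expected_frequencies(300, probabilities)
print([round(e, 4) for e in expected])
```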