AP Statistics
Final Review
Data Analysis
Use the following information for questions 1-7
Quiz Scores
1 4 5 5 6 7 7 7
3 4 5 6 6 7 7 8
3 4 5 6 6 7 7 8
8 8 8 9 9 10
[Figure: histogram of Scores, Collection 1]
[Figure: box plot of Scores, Collection 1]
Min = 1, Q1 = 5, Med = 6.5, Q3 = 8, Max = 10
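The five-number summary can be checked directly from the 30 quiz scores. The sketch below assumes the median-of-halves rule for quartiles (the rule most graphing calculators use); the function name `five_number_summary` is only illustrative.

```python
from statistics import median

# The 30 quiz scores listed for questions 1-7.
scores = [1, 4, 5, 5, 6, 7, 7, 7,
          3, 4, 5, 6, 6, 7, 7, 8,
          3, 4, 5, 6, 6, 7, 7, 8,
          8, 8, 8, 9, 9, 10]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half]           # values below the median position
    upper = s[half + n % 2:]   # values above the median position
    return s[0], median(lower), median(s), median(upper), s[-1]

summary = five_number_summary(scores)
print(summary)  # (1, 5, 6.5, 8, 10)
```

Other quartile conventions (for example, interpolated percentiles) can give slightly different Q1 and Q3 values for the same data.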
Normal Distribution
Use the following info for questions 11-16, be sure to sketch the curve for each problem.
IQ is distributed normally with a mean of 100 and a standard deviation of 15.5.
P(z < X) = .10
invnorm(.10, 100, 15.5) = 80
15. What IQ scores are in the middle 50% of the population?
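P(z < X) = .25 and P(z < X) = .75
invnorm(.25, 100, 15.5) = 89.5
invnorm(.75, 100, 15.5) = 110.5
The middle 50% of IQ scores fall between about 89.5 and 110.5.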
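The calculator's invnorm results can be reproduced with Python's standard library. This is a sketch, assuming `statistics.NormalDist.inv_cdf` plays the role of invnorm; the variable names are illustrative.

```python
from statistics import NormalDist

# IQ model from the problem: mean 100, standard deviation 15.5.
iq = NormalDist(mu=100, sigma=15.5)

bottom_10 = iq.inv_cdf(0.10)  # score with 10% of the population below it
q1 = iq.inv_cdf(0.25)         # lower end of the middle 50%
q3 = iq.inv_cdf(0.75)         # upper end of the middle 50%

print(round(bottom_10, 1), round(q1, 1), round(q3, 1))
```

Because the normal curve is symmetric, the two quartiles sit the same distance on either side of the mean of 100.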
$\widehat{\text{all}} = 406 + .791(\text{football})$
19. What is the correlation coefficient for this data?
r = .70
20. What two things does this correlation tell us about the scatterplot of the data?
It's strong and positive.
The correlation suggests that schools with higher-achieving athletes tend to have a higher-achieving overall
student body.
21. What is the correlation without the influential point?
The correlation without Northwestern is .35, which substantially weakens the linear relationship between student-athlete
score and overall school score.
22. Correlation only applies to what type(s) of relationship(s)?
Linear relationships
23. Give an example of two things that are highly correlated but are not necessarily a cause-and-effect
relationship.
Reading levels and shoe sizes, smoking and cancer
24. Iowa's football players have an average SAT score of 814. What score would you predict for the entire
student body? Is this a good prediction? Why or why not?
1049.87. This is not a good prediction, though it's not necessarily bad either. Since we do not have data that
exists beyond 820, we face the danger of extrapolation in arriving at our answer.
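The prediction in question 24 is a plug-in to the regression line. The sketch below assumes the fitted line is predicted all = 406 + .791(football), which is the equation consistent with the 1049.87 answer; the function name is illustrative.

```python
def predict_all_sat(football_sat, intercept=406, slope=0.791):
    """Predicted student-body SAT from the football-team SAT (assumed fitted line)."""
    return intercept + slope * football_sat

iowa_prediction = predict_all_sat(814)
print(round(iowa_prediction, 2))  # 1049.87
```

The arithmetic is reliable only when 814 lies inside the range of football scores used to fit the line, which is the extrapolation concern raised in the answer.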
25. Find the residual for Penn State.
residual = actual - predicted
1083 - 1115.83 = -32.83
26. What is the coefficient of determination for this data, and what does it tell you about the data?
$r^2 = .49$. This means that approximately 49% of the variation in overall student SAT can be explained by the linear
relationship between football SAT and overall SAT.
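The residual and the coefficient of determination are both one-line calculations. This sketch uses only the numbers quoted in the answers above (actual 1083, predicted 1115.83, r = .70); the variable names are illustrative.

```python
# Question 25: residual = actual - predicted.
actual_penn_state = 1083
predicted_penn_state = 1115.83
residual = actual_penn_state - predicted_penn_state

# Question 26: coefficient of determination is the square of r.
r = 0.70
r_squared = r ** 2

print(round(residual, 2), round(r_squared, 2))  # -32.83 0.49
```

A negative residual means the actual value falls below the regression line, so the line overpredicts for that school.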
Use the following information for questions 27-29. Be sure to include all formulas used.
Shipping Box Length (inches)   Shipping Cost ($)
10                             4.99
12                             8.59
15                             16.79
18                             28.99
24                             68.99

                               Coefficients   Standard Error   t Stat   P-value
Intercept                      -46.8530       10.3134          4.542    0.019
Shipping Box Length (inches)   4.5900         0.62328          7.364    0.005
R-Squared = .948
27. Perform a logarithmic transformation for an exponential model for this data. Show your work.
shipping cost = 163.081*log(box length) - 166.4634
28. Perform a logarithmic transformation for a power model for this data. Show your work.
log(shipping cost) = 3.001*log(box length) - 2.303
29. Which model, exponential or power, is a better fit for this data? Justify.
Power model, because there is no pattern in the residual plot and approximately 99.9% of the variation in log(shipping
cost) can be explained by the linear relationship between log(box length) and log(shipping cost).
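The two transformations can be compared numerically. The sketch below assumes the usual textbook convention: an exponential model is linearized by regressing log(cost) on length, and a power model by regressing log(cost) on log(length), with base-10 logs. The helper `least_squares` is an illustrative hand-rolled fit, not a library call.

```python
from math import log10

length = [10, 12, 15, 18, 24]
cost = [4.99, 8.59, 16.79, 28.99, 68.99]

def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)

log_length = [log10(v) for v in length]
log_cost = [log10(v) for v in cost]

exp_fit = least_squares(length, log_cost)        # exponential: log(y) vs x
power_fit = least_squares(log_length, log_cost)  # power: log(y) vs log(x)

print("exponential:", [round(v, 4) for v in exp_fit])
print("power:      ", [round(v, 4) for v in power_fit])
```

Under these assumptions the power fit reproduces the slope of about 3.001 and intercept of about -2.303 quoted for question 28, and its r-squared is higher than the exponential fit's. A residual plot for each transformed fit would still be needed to justify the choice fully.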
Use the following table to answer questions 30-32
A researcher suspected a relationship between people's preferences in movies and their preferences in pizza. A
random sample of 100 people produced the following two-way table:
Favorite Movie   Pepperoni   Veggie   Cheese   Total
The Matrix       20          5        10       35
Ever After       8           15       12       35
American Pie     15          2        13       30
Total            43          22       35       100
30. What percent of these people prefer pepperoni pizza?
43%
31. What percent of people who prefer veggie pizza like The Matrix?
5/22 = .23
32. What percent of those who like Ever After prefer cheese pizza?
12/35 = .34
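The marginal and conditional percentages in questions 30-32 come from dividing a cell or column total by the appropriate total. A minimal sketch, with the two-way table stored as a nested dictionary (the layout and variable names are illustrative):

```python
# Rows are favorite movies, columns are pizza preferences.
table = {
    "The Matrix":   {"Pepperoni": 20, "Veggie": 5,  "Cheese": 10},
    "Ever After":   {"Pepperoni": 8,  "Veggie": 15, "Cheese": 12},
    "American Pie": {"Pepperoni": 15, "Veggie": 2,  "Cheese": 13},
}

grand_total = sum(sum(row.values()) for row in table.values())
pepperoni_total = sum(row["Pepperoni"] for row in table.values())
veggie_total = sum(row["Veggie"] for row in table.values())
ever_after_total = sum(table["Ever After"].values())

q30 = pepperoni_total / grand_total                  # marginal: pepperoni
q31 = table["The Matrix"]["Veggie"] / veggie_total   # conditional on veggie
q32 = table["Ever After"]["Cheese"] / ever_after_total  # conditional on Ever After

print(round(q30, 2), round(q31, 2), round(q32, 2))  # 0.43 0.23 0.34
```

The key step in each conditional percentage is choosing the denominator: the column total when conditioning on a pizza, the row total when conditioning on a movie.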
Compare to see which is juicier/tastier.
***Randomly assign by flipping a coin, where heads = full dose and tails = half dose.
42. Instead of testing 24 trees, you decide to test only 18. However, when you try to purchase 18 at nursery A, you
find out they have only 12. Explain how you would use a completely randomized experiment that utilizes blocking.
(You are blocking because you need to account for the differences between the two stores.)
43. Do cars get better gas mileage with premium instead of regular unleaded gasoline? While it might be possible
to test some engines in a laboratory setting, we'd rather use real cars and real drivers in real day-to-day driving,
so we get 20 volunteers. Design the experiment.
a. I want to test the effects of aerobic exercise on resting heart rate. I want to test two different levels
of exercise, 30 minutes 3 times per week and 30 minutes 5 times per week. I have a group of 20 people to
test, 10 men and 10 women. I will take heart rates before and after the experiment. Draw a diagram for
this experimental design. Explain how you
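One possible way to carry out the random assignment for the exercise study is to block on sex and split each block evenly between the two exercise levels. The sketch below is an assumption about the intended design, not the worksheet's answer; the subject labels, the function `blocked_assignment`, and the seed are all hypothetical.

```python
import random

def blocked_assignment(blocks, treatments, seed=None):
    """Randomly split each block evenly across the treatments."""
    rng = random.Random(seed)
    assignment = {t: [] for t in treatments}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Deal the shuffled subjects round-robin so each treatment
        # receives an equal share of every block.
        for i, subject in enumerate(shuffled):
            assignment[treatments[i % len(treatments)]].append(subject)
    return assignment

blocks = {
    "men": [f"M{i}" for i in range(1, 11)],    # hypothetical labels M1..M10
    "women": [f"W{i}" for i in range(1, 11)],  # hypothetical labels W1..W10
}
treatments = ["30 min, 3 times per week", "30 min, 5 times per week"]

groups = blocked_assignment(blocks, treatments, seed=1)
for treatment, members in groups.items():
    print(treatment, sorted(members))
```

Each treatment group ends up with 5 men and 5 women, so any difference in the change in resting heart rate cannot be attributed to an imbalance between the sexes.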
Simulations
44. Design and perform a simulation of how many children a couple must have to get at least one girl and at least
one boy. Include a description and perform 10 trials.
17868 95034 27754 90056 19233 01927 82226
24943 05756 42648 52711 95034 27754 90056
61790 28713 82425 38889 05756 42648 52711
90656 96409 36290 93074 28713 82425 38889
87964 12531 45467 60227 96409 36290 93074
18883 42544 71709 40011 12531 45467 60227
41979 82853 77558 85848 42544 71709 40011
83485 73676 00095 48767 82853 77558 85848
46816 47150 32863 52573 73676 00095 48767
85435 99400 29485 95592 47150 32863 52573
19233 01927 82226 94007 99400 29485 95592
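The simulation in question 44 can also be run in software. This sketch assumes each child is equally likely to be a girl or a boy, independently of the others, and uses Python's pseudorandom generator rather than the random digit table; the seed and function name are arbitrary choices.

```python
import random

def children_until_both(rng):
    """Count births until the family has at least one girl and one boy."""
    seen = set()
    count = 0
    while len(seen) < 2:
        seen.add(rng.choice("GB"))  # G = girl, B = boy, equally likely
        count += 1
    return count

rng = random.Random(44)  # fixed seed (arbitrary) so the run is repeatable
trials = [children_until_both(rng) for _ in range(10)]
average = sum(trials) / len(trials)

print(trials, average)
```

With only 10 trials the average will vary from run to run; under these assumptions the long-run average is 3 children (one child, then an expected 2 more births to get the other sex).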
Probability
Use the following for questions 45-48
Probability of winning certain prizes in my fake raffle (tickets ARE replaced after each draw):
Car          0.03
Boat         0.07
TV           0.12
Can Opener   0.33

4            0.39
5            0.26
6            0.15
7 or more    0.03
Random Variables
Use the following for 54-57
Liz can run the 400 meter dash in an average of 60 seconds with a standard deviation of 4 seconds. Paul can run it
in 70 seconds with a standard deviation of 8 seconds.
$\mu_L = 60$, $\sigma_L^2 = 16$, $\mu_P = 70$, $\sigma_P^2 = 64$
54. If Liz and Paul are the first two legs of a 1600 m relay team, what is the mean and standard deviation of their
times together?
$\mu_{L+P} = 60 + 70 = 130$
$\sigma_{L+P} = \sqrt{16 + 64} = \sqrt{80} = 8.94$ (assuming their times are independent)
55. Liz and Paul race each other. What is the mean and standard deviation of the difference in their times?
$\mu_{L-P} = 60 - 70 = -10$
$\sigma_{L-P} = \sqrt{\sigma_L^2 + \sigma_P^2} = \sqrt{80} = 8.94$
56. Paul drinks a 2-liter of Mountain Dew, so he now runs twice as fast. What are his new mean and standard
deviation?
$\mu_{\frac{1}{2}P} = \frac{1}{2}(70) = 35$
$\sigma_{\frac{1}{2}P} = \sqrt{.5^2 \, Var(P)} = \sqrt{(.25)(64)} = \sqrt{16} = 4$
57. Liz is penalized 10 seconds for jumping the gun. What are her new mean and standard deviation?
$\mu_{L+10} = 60 + 10 = 70$
$\sigma_{L+10} = \sqrt{Var(L)} = \sqrt{16} = 4$
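The rules used in questions 54-57 (means add or subtract, variances of independent variables add, multiplying by a constant scales the standard deviation, and adding a constant leaves it unchanged) can be checked numerically. A minimal sketch, assuming Liz's and Paul's times are independent; the variable names are illustrative.

```python
from math import sqrt

mu_L, var_L = 60, 4 ** 2   # Liz: mean 60 s, SD 4 s
mu_P, var_P = 70, 8 ** 2   # Paul: mean 70 s, SD 8 s

# 54. Combined time L + P (variances add for independent variables).
mean_sum, sd_sum = mu_L + mu_P, sqrt(var_L + var_P)

# 55. Difference L - P (variances still add).
mean_diff, sd_diff = mu_L - mu_P, sqrt(var_L + var_P)

# 56. Paul runs twice as fast, so his time is 0.5 * P.
mean_half, sd_half = 0.5 * mu_P, sqrt(0.5 ** 2 * var_P)

# 57. Liz's time plus a 10-second penalty (a shift does not change spread).
mean_shift, sd_shift = mu_L + 10, sqrt(var_L)

print(mean_sum, round(sd_sum, 2))    # 130 8.94
print(mean_diff, round(sd_diff, 2))  # -10 8.94
print(mean_half, sd_half)            # 35.0 4.0
print(mean_shift, sd_shift)          # 70 4.0
```

Note that the standard deviations themselves never add: 4 + 8 = 12 is not the spread of the combined time.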
For problems 58-66, use the following situation: For Test 1, the class average was 80 with a standard
deviation of 10. For Test 2, the class average was 70 with a standard deviation of 12.
$\mu_1 = 80$, $\sigma_1^2 = 100$, $\mu_2 = 70$, $\sigma_2^2 = 144$
58. What is the average for the two tests added together?
59. What is the standard deviation for the two tests added together?
$E(T_1) - E(T_2) = 80 - 70 = 10$
61. What is the standard deviation for the difference in the test averages?
62. If I cut the test scores on Test 2 in half and add 50, what is the new average?
$\frac{1}{2}E(T_2) + 50 = \frac{1}{2}(70) + 50 = 85$
63. What is the new standard deviation for Test 2 in problem 62?
$\sqrt{\left(\frac{1}{2}\right)^2 VAR(T_2)} = \sqrt{\left(\frac{1}{2}\right)^2 (144)} = 6$
64. If I add 7 points to every Test 1, what is the new standard deviation?
SD (T1 ) = 10
65. If I multiply every Test 1 by 2 and subtract 80, what is the new mean?
$2E(T_1) - 80 = 2(80) - 80 = 80$
66. If I multiply every Test 1 by 2 and subtract 80, what is the new standard deviation?
$\sqrt{(2)^2 \, VAR(T_1)} = \sqrt{(2)^2 (100)} = 20$
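The same linear-transformation rules cover problems 58-66. The sketch below assumes the two test scores are independent wherever variances are combined (problems 59 and 61), which the worksheet does not state; the `q58` to `q66` names simply mirror the problem numbers.

```python
from math import sqrt

mean_1, var_1 = 80, 10 ** 2   # Test 1: mean 80, SD 10
mean_2, var_2 = 70, 12 ** 2   # Test 2: mean 70, SD 12

q58 = mean_1 + mean_2            # mean of the sum
q59 = sqrt(var_1 + var_2)        # SD of the sum (independence assumed)
q60 = mean_1 - mean_2            # mean of the difference
q61 = sqrt(var_1 + var_2)        # SD of the difference (independence assumed)
q62 = 0.5 * mean_2 + 50          # halve Test 2, then add 50
q63 = sqrt(0.5 ** 2 * var_2)     # adding 50 does not affect the SD
q64 = sqrt(var_1)                # adding 7 does not affect the SD
q65 = 2 * mean_1 - 80            # double Test 1, then subtract 80
q66 = sqrt(2 ** 2 * var_1)       # doubling doubles the SD

print(q58, round(q59, 2), q60, round(q61, 2))
print(q62, q63, q64, q65, q66)
```

Scores on two tests taken by the same students are often correlated in practice, so the values for 59 and 61 hold only under the independence assumption.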
Multiple Choice
83) You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of
variables you have measured is
A) 1463
B) Four; two categorical and two quantitative
C) Four; one categorical and three quantitative
84) If your score on a test is at the 60th percentile, you know that your score lies
A) Below the lower quartile
B) Between the lower quartile and the median
C) Between the median and the upper quartile
85) When dealing with financial data (such as salaries or lawsuit settlements), we often find that
the shape of the distribution is _________. When the distribution has this shape, the _________
is pulled toward the long tail of the distribution, but the _________ is less affected. The sequence
of words to correctly complete this passage is
A) Right skewed, median, mean.
B) Left skewed, mean, median.
C) Right skewed, mean, standard deviation.
86) Items produced by a manufacturing process are supposed to weigh 90 grams. The
manufacturing process is such, however, that there is variability in the items produced and they do
not all weigh exactly 90 grams. The distribution of weights can be approximated by a normal
distribution with mean 90 grams and a standard deviation of 1 gram. What percentage of the items
will either weigh less than 87 grams or more than 93 grams?
A) 6%
B) 94%
C) 99.7%
D) 0.3%
E) 0.15%
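For question 86, the two cutoffs are each 3 standard deviations from the mean, so the answer is the area in both tails of the normal curve. A sketch using `statistics.NormalDist` for the exact tail area (the variable names are illustrative):

```python
from statistics import NormalDist

# Item weights: mean 90 g, standard deviation 1 g.
weights = NormalDist(mu=90, sigma=1)

# P(weight < 87) + P(weight > 93)
p_outside = weights.cdf(87) + (1 - weights.cdf(93))

print(f"{p_outside:.2%}")
```

This agrees with the 68-95-99.7 rule: about 99.7% of items fall within 3 standard deviations, leaving roughly 0.3% outside.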
B) The line that makes the sum of the squares of the vertical distances of the data points from the
line as small as possible.
C) The line that best splits the data in half, with half of the points above the line and half below
the line.
D) The line that makes the sum of the squares of the residuals 0.
E) All of the above.
90) The fraction of the variation in the values of a response y that is explained by the least-squares regression of y on x is the
A) Correlation coefficient
B) Slope of the least-squares regression line
C) Square of the correlation coefficient
91) Suppose we fit the least-squares regression line to a set of data. If a plot of the residuals
shows a curved pattern,
A) A straight line is not a good summary for the data.
B) The correlation must be 0.
96) An experiment compares the taste of a new spaghetti sauce with the taste of a successful
sauce. Each of a number of tasters tastes both sauces (in random order) and says which tastes
better. This is called a
A) Simple Random Sample
B) Stratified Random Sample
C) Completely Randomized Design
97) In a certain town, 50% of the households own a cellular phone, 40% own a pager, and 20% own
both a cellular phone and a pager. The proportion of households that own neither a cellular phone
nor a pager is
A) 0%
B) 10%
C) 30%
D) 70%
E) 90%
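Question 97 is an application of the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), followed by the complement rule. A minimal sketch with illustrative variable names:

```python
p_cell = 0.50    # own a cellular phone
p_pager = 0.40   # own a pager
p_both = 0.20    # own both

# General addition rule, then the complement.
p_either = p_cell + p_pager - p_both
p_neither = 1 - p_either

print(round(p_either, 2), round(p_neither, 2))
```

Subtracting P(both) matters here because households that own both devices would otherwise be counted twice.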
98) If the knowledge that an event A has occurred implies that a second event B cannot occur, the
events A and B are said to be
A) Independent
B) Disjoint
C) Mutually Exhaustive
D) The Sample Space
E) Complementary
99) A deck of cards contains 52 cards, of which 4 are aces. You are offered the following wager:
Draw one card at random from the deck. You win $10 if the card drawn is an ace. Otherwise you
lose $1. If you make this wager very many times, what will be the mean outcome?
A) About -$1, because you will lose most of the time.
B) About $9, because you win $10 but lose only $1.
C) About -$0.15, that is, on the average you lose about 15 cents.
D) About $0.77, that is, on the average you win about 77 cents.
E) About $0, because the random draw gives you a fair bet.
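The mean outcome in question 99 is the expected value of the wager: each payoff weighted by its probability. The sketch below uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from fractions import Fraction

p_ace = Fraction(4, 52)    # probability of drawing an ace
p_other = 1 - p_ace        # probability of any other card

# Expected value = sum of (payoff * probability).
expected_value = p_ace * 10 + p_other * (-1)

print(expected_value, round(float(expected_value), 2))
```

The sign of the expected value shows whether the wager favors the player in the long run.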
100) All bags entering a research facility are screened. Ninety-seven percent of the bags that
contain forbidden material trigger an alarm. Fifteen percent of the bags that do not contain
forbidden material also trigger the alarm. If 1 out of every 1,000 bags entering the building
contains forbidden material, what is the probability that a bag that triggers the alarm will actually
contain forbidden material?
A) 0.00097
B) 0.00640
C) 0.03000
D) 0.14550
E) 0.9700
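Question 100 is a conditional-probability (Bayes' rule) problem: of all bags that trigger the alarm, what fraction actually contain forbidden material? A sketch with illustrative variable names:

```python
p_forbidden = 1 / 1000          # bag contains forbidden material
p_alarm_given_forbidden = 0.97  # alarm triggers when material is present
p_alarm_given_clean = 0.15      # false alarm rate

# Total probability of an alarm (both branches of the tree diagram).
p_alarm = (p_forbidden * p_alarm_given_forbidden
           + (1 - p_forbidden) * p_alarm_given_clean)

# Bayes' rule: P(forbidden | alarm).
p_forbidden_given_alarm = p_forbidden * p_alarm_given_forbidden / p_alarm

print(round(p_forbidden_given_alarm, 5))
```

The result is small because forbidden material is so rare that false alarms from clean bags far outnumber true alarms.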