
INTERPRETING SCATTER PLOTS AND TWO-VARIABLE STATISTICS

Activity 1: Match the scatter plot to the correct correlation coefficient:

1. 0.14
2. 0.99
3. 0.43
4. 0.77

Activity 2: A zoologist was interested in predicting the weight of alligators simply by measuring their length. Some brave researchers went out to an alligator preserve in the Everglades and measured the lengths and weights of 21 alligators.

Best-fit linear equation: y = 5.9x - 393, where x is the length (inches) and y is the weight (pounds)

r = 0.93
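As a quick check on the arithmetic in the questions that follow, the model can be evaluated in a few lines of Python. This is a minimal sketch: the helper name `predict_weight` is hypothetical, and the only inputs are the slope (5.9) and intercept (-393) of the best-fit equation above.

```python
# Minimal sketch: evaluate the alligator model y = 5.9x - 393,
# where x is length (inches) and y is predicted weight (pounds).
# predict_weight is a hypothetical helper name, not from the source.

SLOPE = 5.9          # predicted pounds gained per extra inch of length
INTERCEPT = -393.0   # predicted weight at length 0 (not physically meaningful)


def predict_weight(length_in: float) -> float:
    """Return the model's predicted weight in pounds for a length in inches."""
    return SLOPE * length_in + INTERCEPT


if __name__ == "__main__":
    # Fluffy's measured length from part c)
    print(f"Fluffy (108 in): {predict_weight(108):.1f} lb")
    # A length of 0 just returns the intercept
    print(f"Length 0 in: {predict_weight(0):.1f} lb")
```

Evaluating the model at a length of 0 returns -393, the nonsensical intercept discussed in part b).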

a) What is the slope of the linear model? Interpret this value in the context of the data.
The slope of the line is 5.9. This means that for every additional inch of length, the predicted weight of an alligator increases by 5.9 pounds.
b) What is the y-intercept of the linear model? Interpret this value in the context of the data. Does this interpretation make sense in context? Why (not)?
The y-intercept of the model is -393. This means that an alligator with a length of zero inches would have a predicted weight of -393 pounds. This does not make sense: an alligator cannot have zero length, and weight cannot be negative.
c) One alligator, which the researchers named Fluffy, was too aggressive to be weighed. They did get Fluffy's length, though: 108 inches. Predict her weight using the model.
y = 5.9(108) - 393 = 637.2 - 393 = 244.2 pounds
The predicted value for Fluffy's weight is 244.2 pounds.
d) Even though the correlation between weight and length is high (0.93), there may be a better equation to model this relationship. Why?
The residuals show a clear pattern rather than random scatter, which means a straight line does not capture the shape of the data. Visually, a curve fits the data set better, so a model based on a curve would be more appropriate.

Activity 3: Mr. Theil collected arm span and height data for a larger group of students. He also wanted to see whether the relationship between height and arm span was different for males and females.

Correlation coefficients: for females, r = 0.917; for males, r = 0.616.
a) For which group, males or females, is the relationship between height and arm span stronger?
The relationship between height and arm span is stronger for females than for males.
b) Give one piece of visual evidence for your conclusion in part a).
The blue squares that represent females lie closer to their linear model (the residuals appear smaller). The female data points more closely form a straight line than the male data points.
c) Give one piece of numerical evidence for your conclusion in part a).
The relationship is stronger for the females, since their correlation coefficient is larger (0.917 vs. 0.616), so their R² value is larger as well.
d) Tracy's arm span is 170 cm. Predict her height, using the appropriate best-fit linear model.
Height = 0.666(170) + 57.4 = 170.6 cm
The predicted height for Tracy is 170.6 cm.
e) Chuckie's arm span is 180 cm. Predict his height.
Height = 0.38(180) + 112.5 = 180.9 cm

The predicted height for Chuckie is 180.9 cm.
f) Which prediction, Tracy's or Chuckie's, is probably more accurate? Provide evidence and/or specific reasoning for your decision.
My prediction for Tracy's height is probably more accurate, since the R² value for the female model is larger (a stronger relationship). For females, R² = 0.917² ≈ 0.84, so about 84% of the variation in height is explained by the linear relationship with arm span; for males, R² = 0.616² ≈ 0.38 (about 38%).
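For a simple linear regression with one predictor, R² is the square of the correlation coefficient r, so the percentages in f) follow directly from the r values given for each group. A small sketch (the helper name `r_squared` is illustrative only):

```python
# Sketch: for a simple linear regression, R^2 equals r squared.
# The r values are the ones reported for Mr. Theil's arm span/height data.


def r_squared(r: float) -> float:
    """Coefficient of determination for a one-predictor linear model."""
    return r * r


CORRELATIONS = {"female": 0.917, "male": 0.616}

if __name__ == "__main__":
    for group, r in CORRELATIONS.items():
        share = r_squared(r)
        print(f"{group}: r = {r}, R^2 = {share:.3f} "
              f"(about {share:.0%} of the variation in height explained)")
```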

g) One person, Kelly, has an arm span of 168 cm and a height of 170 cm, and was left off the plot. You don't know whether Kelly is male or female. What's your best guess? Provide evidence for your conclusion.
Male height = 0.38(168) + 112.5 = 176.3 cm
Female height = 0.666(168) + 57.4 = 169.3 cm
Based on these calculations, I think Kelly is female, since her actual height (170 cm) is much closer to the predicted height for a female with an arm span of 168 cm (a residual of about 0.7 cm, versus about -6.3 cm for the male model).
h) How confident are you in your decision in g)? Absolutely sure, pretty sure, or not very sure at all? Explain.
I am pretty sure, since the female model is the stronger of the two models and the height it predicts almost perfectly matches Kelly's actual height.
i) There's a point plotted at (212, 181). Write a sentence that describes the gender and appearance of this person. How are they considerably different from the rest of the people in this study? Be specific.
Male height = 0.38(212) + 112.5 = 193.1 cm; residual = 181 - 193.1 = -12.1 cm
Female height = 0.666(212) + 57.4 = 198.6 cm; residual = 181 - 198.6 = -17.6 cm
The male residual is smaller in magnitude, so this person is most likely a male with long arms. He is much shorter (about 12 cm) than the model predicts for a male with a similar arm span.
j) There's a point plotted at (175, 188). Write a sentence that describes the gender and appearance of this person. How are they considerably different from the rest of the people in this study? Be specific.
Male height = 0.38(175) + 112.5 = 179.0 cm; residual = 188 - 179.0 = 9.0 cm
Female height = 0.666(175) + 57.4 = 174.0 cm; residual = 188 - 174.0 = 14.0 cm
The male residual is smaller, so this person is most likely a male with short arms for his height. He is much taller (about 9 cm) than the model predicts for a male with a similar arm span.
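The reasoning in g), i), and j) (predict with both models, compare residuals, pick the group whose line passes closer to the point) can be sketched as follows. The two lines are the ones used above; the helper names are hypothetical, and choosing the smaller absolute residual is a simple heuristic that ignores the different amount of scatter around each line.

```python
# Sketch: guess a person's group from (arm span, height) by comparing
# residuals from the two best-fit lines used in this activity (cm).
# Helper names are hypothetical; "smaller absolute residual wins" is a
# simple heuristic, not a formal classification method.

MODELS = {
    "female": (0.666, 57.4),  # height = 0.666 * span + 57.4
    "male": (0.38, 112.5),    # height = 0.38 * span + 112.5
}


def predict_height(span_cm, group):
    """Predicted height (cm) for an arm span (cm) using one group's line."""
    slope, intercept = MODELS[group]
    return slope * span_cm + intercept


def residuals(span_cm, height_cm):
    """Residual (actual - predicted) for each group's model."""
    return {g: height_cm - predict_height(span_cm, g) for g in MODELS}


def likelier_group(span_cm, height_cm):
    """Group whose line passes closest to the point."""
    res = residuals(span_cm, height_cm)
    return min(res, key=lambda g: abs(res[g]))


if __name__ == "__main__":
    cases = [("Kelly", 168, 170), ("(212, 181)", 212, 181), ("(175, 188)", 175, 188)]
    for name, span, height in cases:
        rounded = {g: round(r, 1) for g, r in residuals(span, height).items()}
        print(name, rounded, "->", likelier_group(span, height))
```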

PROBLEM 1: Do higher-grossing movies in the US tend to be higher-grossing internationally as well?
The following table contains the box office receipts for the ten highest-grossing movies in history (as of 2007). The numbers are in millions of dollars and adjusted for inflation.

Movie                                            Domestic Receipts   International Receipts
Titanic                                          601                 1235
Star Wars                                        461                 337
Shrek 2                                          437                 444
E.T.                                             435                 322
Star Wars: Phantom Menace                        431                 491
Pirates of the Caribbean: Dead Man's Chest       417                 592
Spider-Man                                       404                 418
Star Wars: Revenge of the Sith                   380                 468
The Lord of the Rings: The Return of the King    377                 752
Spider-Man 2                                     373                 410

a. Plot the relationship on a scatter plot.

Domestic vs International Box Office Receipts
(Scatter plot: International Box Office Receipts (millions $) vs. Domestic Box Office Receipts (millions $), with trendline y = 2.8471x - 681.91, R² = 0.4815.)

b. Find the equation of the line of best fit. Use this model to predict the international box office gross for a movie that brings in $500 million in the US.
The line of best fit is International Receipts = 2.8471(Domestic Receipts) - 681.91.
International Receipts = 2.8471(500) - 681.91 = 741.64 million dollars
The predicted international box office receipts would be $741.64 million.
c. Interpret the value of the slope in the context of this situation.
For each additional $1 million in domestic receipts, the predicted international receipts increase by about $2.85 million.
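The best-fit line and the prediction in b. can be reproduced with ordinary least squares. The sketch below computes the slope, intercept, and r from the table's ten (domestic, international) pairs using the textbook sums-of-squares formulas; the function name `least_squares` is hypothetical, not part of any library.

```python
# Sketch: ordinary least-squares line of international vs. domestic
# receipts (millions of dollars) for the ten movies in the table.
from math import sqrt

# Receipts in table order; Titanic is the first entry
DOMESTIC = [601, 461, 437, 435, 431, 417, 404, 380, 377, 373]
INTERNATIONAL = [1235, 337, 444, 322, 491, 592, 418, 468, 752, 410]


def least_squares(xs, ys):
    """Return (slope, intercept, r) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


if __name__ == "__main__":
    m, b, r = least_squares(DOMESTIC, INTERNATIONAL)
    print(f"slope = {m:.4f}, intercept = {b:.2f}, r = {r:.2f}, R^2 = {r * r:.4f}")
    print(f"prediction at $500M domestic: {m * 500 + b:.2f} million")
```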

d. Interpret the y-intercept of the model in context. Do you feel this interpretation has any real-world value? Explain.
If the domestic receipts were $0, the predicted international receipts would be -$681.91 million. This interpretation has no real-world value, since a movie cannot have negative ticket sales; a domestic gross of $0 is also far below the range of the data ($373 million to $601 million).
e. Suppose that Titanic were removed from this data set. How would this removal change the value of the slope and the intercept of the linear model?
The relationship would change from a positive one to a negative one. The slope would become -2.1385, and the intercept would become a large positive value (1353.2 million dollars).
f. Suppose that Titanic were removed from this data set. How would this removal change the value of the correlation coefficient? Explain why.
Before: r = 0.69. After: r = -0.50.
The correlation would become a weak negative one. Titanic is extreme in both domestic ($601 million) and international ($1235 million) receipts, so it is an influential point that pulls the line of best fit upward at the high end; once it is removed, the direction of the trend reverses.
g. Create a new scatter plot, determine the new equation of the line of best fit, and determine the new correlation coefficient after removing Titanic from the data set.

Domestic vs International Box Office Receipts (Titanic removed)
(Scatter plot: International Box Office Receipts (millions $) vs. Domestic Box Office Receipts (millions $), with trendline y = -2.1385x + 1353.2, R² = 0.2512.)

h. Even if you removed Titanic from the data set and computed a new linear model and correlation coefficient, why might it be inappropriate to use them to make predictions about the international box office income of other movies that premiere in the US?
A movie may be more popular in the US than internationally, or vice versa. A movie could also feature an actor who is popular in the US but less well known in other countries. Therefore, a movie's US and international receipts can be very different. The model is also built from only nine of the highest-grossing movies ever, which are not representative of typical releases, and its R² (0.2512) is low.

PROBLEM 2: The data provided in the table below are the gold-medal-winning long jump distances for the men's and women's divisions at the Olympics from 1948 to present.

Year   Men's Distance (m)   Women's Distance (m)
1948   7.82                 5.69
1952   7.57                 6.24
1956   7.83                 6.35
1960   8.12                 6.37
1964   8.07                 6.76
1968   8.90                 6.82
1972   8.24                 6.87
1976   8.34                 6.72
1980   8.54                 7.06
1984   8.54                 6.96
1988   8.72                 7.40
1992   8.67                 7.14
1996   8.50                 7.12
2000   8.55                 6.99
2004   8.59                 7.07

Make three scatter plots: men's distance vs. year, women's distance vs. year, and women's distance vs. men's distance (or vice versa). (You could also try making a double scatter plot with year on the x-axis and both the men's and women's distances on the y-axis.)
a. Determine the equations of the lines of best fit and the correlation coefficients for each scatter plot.
Olympic Long Jump Distances (Male)
(Scatter plot: Distance Jumped (m) vs. Year, with trendline y_male = 0.0164x - 24.041, R² = 0.5915.)

Olympic Long Jump Distances (Female)
(Scatter plot: Distance Jumped (m) vs. Year, with trendline y_female = 0.021x - 34.655, R² = 0.7256.)

Women's Distance (m) vs Men's Distance (m)
(Scatter plot: Women's Distance (m) vs. Men's Distance (m), with trendline y = 0.948x - 1.1292, R² = 0.6733.)

Olympic Long Jump Distances (Male and Female)
(Double scatter plot: Distance Jumped (m) vs. Year for both divisions, with trendlines y_male = 0.0164x - 24.041, R² = 0.5915 and y_female = 0.021x - 34.655, R² = 0.7256.)

b. What do each of these equations and coefficients tell you about distances over time, and about men's distances compared to women's distances?
Both slopes are positive, so winning distances have tended to increase over time: about 0.0164 m (1.6 cm) per year for men and 0.021 m (2.1 cm) per year for women. The slope of the women's line of best fit is greater than that of the men's, so if the trends continued, the women's line could eventually cross the men's line. The R² value is also higher for the women than for the men, which means the women's data more closely follow a straight line (women: about 73% of the variation in winning distance is explained by the year; men: about 59%).
c. According to the scatter plots and trends, will women ever catch up to men in terms of distance jumped?
y_male = 0.0164x - 24.041
y_female = 0.021x - 34.655
If women catch the men, then y_male = y_female:
0.0164x - 24.041 = 0.021x - 34.655
10.614 = 0.0046x
x ≈ 2307
This means that, according to the lines of best fit, the women would catch up to the men around the year 2307, when the predicted winning distance would be about 13.8 m. This does not seem plausible: 13.8 m is about 60% farther than the 2004 men's winning jump (8.59 m) and nearly double the 2004 women's jump (7.07 m), and it relies on extrapolating the linear trends about 300 years beyond the data.
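Solving y_male = y_female is a two-line intersection problem. A minimal sketch using the rounded trendline coefficients (the helper name `intersection` is hypothetical):

```python
# Sketch: where do the two long-jump trendlines cross?
# Uses the rounded coefficients y_male = 0.0164x - 24.041 and
# y_female = 0.021x - 34.655; the helper name is hypothetical.


def intersection(m1, b1, m2, b2):
    """Return (x, y) where y = m1*x + b1 meets y = m2*x + b2."""
    if m1 == m2:
        raise ValueError("parallel lines never meet")
    x = (b2 - b1) / (m1 - m2)
    return x, m1 * x + b1


if __name__ == "__main__":
    year, distance = intersection(0.0164, -24.041, 0.021, -34.655)
    print(f"lines cross near year {year:.0f} at about {distance:.2f} m")
```

Because the two slopes differ by only about 0.0046 m per year, the crossing year is very sensitive to rounding; with the unrounded least-squares coefficients it moves to roughly 2317.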