
Chapter 3 DESCRIPTIVE STUDY OF BIVARIATE DATA

3.1 (a) The table, with completed marginal totals, is:

                      Degree of Nausea
            None   Slight   Moderate   Severe   Total
Pill          43       36         18        3     100
Placebo       19       33         36       12     100
Total         62       69         54       15     200

(b) The relative frequencies, by row, are:

                      Degree of Nausea
            None   Slight   Moderate   Severe   Total
Pill        0.43     0.36       0.18     0.03    1.00
Placebo     0.19     0.33       0.36     0.12    1.00

(c) A much higher proportion, 0.43, of pill takers avoided nausea as compared to the proportion 0.19 among those who took the placebo. Also, the proportion of persons suffering moderate or severe nausea was much lower among those receiving the pill.

3.2 (a) The table, with completed marginal totals, is:

                          Sugar Content
Manufacturer     Below average   Above average   Total
General Mills           3               7          10
Kellogg                 4               6          10
Quaker                  6               4          10
Total                  13              17          30

(b) Calculating relative frequencies by row:

                          Sugar Content
Manufacturer     Below average   Above average   Total
General Mills          0.3             0.7         1.0
Kellogg                0.4             0.6         1.0
Quaker                 0.6             0.4         1.0

(c) Quaker has a higher proportion of cereals with below average sugar content, whereas General Mills has a higher proportion of cereals with above average sugar content.

3.3 The relative frequencies, by row, are:

            10 or less   More than 10   Total
Biology        0.40          0.60       1.000
Physical       0.30          0.70       1.000
Social         0.52          0.48       1.000

A larger percentage of physical science and biology majors study longer for their final exams than do social science majors.

3.4 The relative frequencies, by row, are:

                  Favor   Indifferent   Opposed   Total
Faculty           0.180      0.210       0.610    1.000
Academic Staff    0.176      0.308       0.516    1.000
Student           0.265      0.445       0.290    1.000

The students are somewhat more in favor and much less opposed than either the faculty or the academic staff.

3.5 (a) The two-way frequency table is:

              Alkalinity
Iron       Low    High   Total
Low          8       4      12
High         2       5       7
Total       10       9      19

(b) The relative frequencies are:

              Alkalinity
Iron       Low     High    Total
Low       0.421   0.211    0.632
High      0.105   0.263    0.368
Total     0.526   0.474    1.000

(c) The relative frequencies, computed within each column, are:

              Alkalinity
Iron       Low     High
Low       0.800   0.444
High      0.200   0.556
Total     1.000   1.000

3.6 (a) The two-way frequency table is:

                 Alcoholic   Not Alcoholic   Total
Depressed            42            22           64
Not depressed        15            71           86
Total                57            93          150

(b) The relative frequencies are:

                 Alcoholic   Not Alcoholic   Total
Depressed          0.280         0.147       0.427
Not depressed      0.100         0.473       0.573
Total              0.380         0.620       1.000

3.7

(a) The two-way frequency table is:

                    Major
            B     H     P     S    Total
Male       12     4     5    14      35
Female      6     0     4     4      14
Total      18     4     9    18      49

(b) The relative frequencies are:

                        Major
            B       H       P       S      Total
Male      0.245   0.082   0.102   0.286    0.715
Female    0.122   0       0.082   0.082    0.286
Total     0.367   0.082   0.184   0.368    1.001

Alternate solution using Minitab: When a data set is large, it is useful to enter it once on a computer and instruct it to do the counting. To do so, we encode gender and intended major as numbers. We choose 0 if male, 1 if female and

1= B , 2 = H , 3 = P , 4 = S

With the coded data in a file called 2.94.dat, Minitab tabulates the counts directly.

We can also calculate relative frequencies by row. More precisely, the corresponding MINITAB output reports 100 × (relative frequency).
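The same counting can be sketched outside Minitab. The following Python fragment is only an illustration, not the Minitab session: the raw data file is not reproduced in this solution, so the coded records are rebuilt from the counts in the two-way table of part (a), and the names `table_counts`, `relative_frequency`, and `row_percent` are hypothetical.

```python
from collections import Counter

# Coding from the text: gender 0 = male, 1 = female;
# major 1 = B, 2 = H, 3 = P, 4 = S.
# Assumption: the coded records are rebuilt from the table in part (a),
# because the raw data file is not available here.
table_counts = {
    (0, 1): 12, (0, 2): 4, (0, 3): 5, (0, 4): 14,  # males
    (1, 1): 6, (1, 2): 0, (1, 3): 4, (1, 4): 4,    # females
}
records = [pair for pair, k in table_counts.items() for _ in range(k)]

# Let the computer do the counting.
counts = Counter(records)                    # cell counts
row_totals = Counter(g for g, _ in records)  # totals by gender
n = len(records)                             # grand total


def relative_frequency(gender, major):
    """Cell count divided by the grand total."""
    return counts[(gender, major)] / n


def row_percent(gender, major):
    """100 x (relative frequency within the gender row)."""
    return 100 * counts[(gender, major)] / row_totals[gender]


for gender, label in ((0, "Male"), (1, "Female")):
    cells = [f"{relative_frequency(gender, m):.3f}" for m in (1, 2, 3, 4)]
    print(label, *cells)
```

Calling `row_percent(0, 1)` gives the percentage of males who intend major B, which is the kind of row percentage described above.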

3.8

(a) The two-way frequency table is:

                                Child
Parent                  Obese   Not obese   Total
At least one obese        12        24         36
Neither obese              8        36         44
Total                     20        60         80

(b) The relative frequencies are:

                                Child
Parent                  Obese   Not obese   Total
At least one obese      0.150     0.300     0.450
Neither obese           0.100     0.450     0.550
Total                   0.250     0.750     1.000

(c) The relative frequencies, by row, are:

                                Child
Parent                  Obese   Not obese   Total
At least one obese      0.333     0.667     1.000
Neither obese           0.182     0.818     1.000

3.9

(a) The proportions are calculated by row: 0.548 = 23/42, and so on.

                     Male    Female   Total
English             0.548    0.452    1.000
Computer science    0.844    0.156    1.000

(b) There appears to be gender bias, favoring males, in the Computer Science department. The relative frequencies in the English department do not indicate the obvious presence of bias.

3.10 (a) The proportions are calculated by row: 0.959 = 2110/2200, and so on.

                       Died    Survived   Total
Research hospital     0.041     0.959     1.000
Community hospital    0.033     0.967     1.000

(b) The proportions 0.967 versus 0.959 suggest that you should prefer the community hospital.

3.11 (a) The proportions, by row, for each condition are:

Good Condition
                       Died    Survived   Total
Research hospital     0.021     0.979     1.000
Community hospital    0.027     0.973     1.000

Bad Condition
                       Died    Survived   Total
Research hospital     0.050     0.950     1.000
Community hospital    0.070     0.930     1.000

(b) The research hospital has a higher proportion of patients in good condition that survive, 0.979 vs. 0.973, and a higher proportion of patients in poor condition that survive, 0.950 vs. 0.930. Whether you are in bad or in good condition, you should prefer the research hospital.

(c) We have reached just the opposite conclusion of that reached in Exercise 3.10. In this example of Simpson's paradox, the condition of the patient acted as the lurking variable. The proportion of patients in poor condition is much higher at the research hospital, and that kept down its overall survival rate calculated in Exercise 3.10.

3.12 Put each name on a slip of paper and select one blindfolded. That person will be assigned to the treatment group and the other to the placebo group.

3.13 (a) Of course, the fact that 21 out of 57, a proportion of 0.368, quit smoking would, by itself, seem to be stronger evidence. Intuitively, we tend to think, incorrectly, that no one would have quit without the medicated patch.

(b) Most people respond positively when they are given attention. The placebo trials make it possible to treat all subjects alike except for the presence or absence of medication. Twenty percent, 11 out of 55, responded positively to the procedure, even without the medication. This makes the success of the medicated patch less spectacular but provides a proper frame of reference.

3.14 (a) A dog is physically fit from regular walks, runs and playtimes. Presumably the dog's owner would take part in these activities with the dog and so would benefit physically as well, so there would be a positive correlation.

(b) It is safe to assume that the more music one downloads, the more time will be spent listening to that music along with other music, so there would be a positive correlation.

(c) It is reasonable to expect these to be positively correlated.

3.15 (a) Positive; more salespersons should be able to see more people and sell more real estate.
(b) Positive; in general, better players get paid higher salaries.
(c) Positive; one would expect sales to increase with the amount of TV advertising of the cola.
(d) Negative; strength diminishes with age after middle age.

3.16 Not likely. The two are not naturally related and there are numerous lurking variables that could contribute to the size of the federal debt that have nothing to do with attendance at NFL games (e.g., military spending).

3.17 No. The value of r can be small even if there is a strong relationship along a curve, as illustrated in Figure 2 of the text.

3.18 (a) The scatter diagram is:

[Scatter diagram of y versus x]

(b) The pattern is northwest to southeast so r should be negative. A guess is $r = -0.9$ since the pattern is tight.

(c) We calculate

          x     y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2   (y - ȳ)^2
          1     6     -3       2          -6               9           4
          2     5     -2       1          -2               4           1
          7     2      3      -2          -6               9           4
          4     4      0       0           0               0           0
          6     3      2      -1          -2               4           1
Total    20    20      0       0         -16              26          10
        x̄ = 4  ȳ = 4                     S_xy             S_xx        S_yy

so

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{-16}{\sqrt{26 \times 10}} = -0.992 \]
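The deviation-based calculation in the table can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `correlation` is an illustrative choice, not something from the text.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r = S_xy / sqrt(S_xx * S_yy), computed from deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Data of Exercise 3.18
x = [1, 2, 7, 4, 6]
y = [6, 5, 2, 4, 3]
print(round(correlation(x, y), 3))  # -0.992
```

The same function applies unchanged to any of the small data sets in this chapter where the raw pairs are listed.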


3.19 (a) A computer calculation gives r = 0.460 for males. The scatter plot for males and the multiple scatter plot are shown below.

(b) A computer calculation gives r = 0.415 for females.

(c) Both have about the same testosterone levels, but females have higher and more variable levels of estradiol.

3.20 (a) The scatter diagrams are

[Two scatter diagrams of y versus x, one for each data set]

(b) For the first data set, we calculate

          x     y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2   (y - ȳ)^2
          0     4     -3      -1           3               9           1
          4     6      1       1           1               1           1
          2     2     -1      -3           3               1           9
          6     8      3       3           9               9           9
          3     5      0       0           0               0           0
Total    15    25      0       0          16              20          20
        x̄ = 3  ȳ = 5                     S_xy             S_xx        S_yy

so

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{16}{\sqrt{20 \times 20}} = 0.80 \]

(c) The pattern is northwest to southeast so r should be negative. Guess $r = -0.4$ since the pattern is not very tight. We calculate

          x     y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2   (y - ȳ)^2
          0     8     -3       3          -9               9           9
          4     2      1      -3          -3               1           9
          2     5     -1       0           0               1           0
          6     4      3      -1          -3               9           1
          3     6      0       1           0               0           1
Total    15    25      0       0         -15              20          20
        x̄ = 3  ȳ = 5                     S_xy             S_xx        S_yy

so

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{-15}{\sqrt{20 \times 20}} = -0.75 \]

3.21 Only Figure 7(c) has a northwest-southeast pattern indicating a negative value for r. Since the tightest pattern, indicating the largest r, is in Figure 7(a), the matches are
(a) $r = -0.3$ and Figure 7(c).
(b) $r = 0.1$ and Figure 7(b).
(c) $r = 0.9$ and Figure 7(a).

3.22 There is a southwest-northeast pattern so r is positive. Take (b) $r = 0.5$ since the pattern is not tight enough about a line for $r = 0.9$.


3.23 Identifying the sums of squares about the means as $S_{xx}$, $S_{yy}$, and $S_{xy}$ respectively, we find

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{204.3}{\sqrt{530.7 \times 235.4}} = 0.578 \]

3.24 Young persons tend to both drive fast and hang fuzzy dice in their cars. They tend to get speeding tickets. People in their thirties and forties tend to drive in such a manner that they receive fewer tickets and, perhaps with maturity, don't hang dice in their cars.

3.25 Let x = the amount of hydrogen and y = the amount of carbon. Then, with n = 11, we calculate

\[ \sum x = 533.80, \quad \sum y = 621.00, \quad \sum x^2 = 43{,}124.84, \quad \sum y^2 = 48{,}624.58, \quad \sum xy = 43{,}760.84 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 43{,}124.84 - \frac{(533.80)^2}{11} = 17{,}220.98 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 48{,}624.58 - \frac{(621.00)^2}{11} = 13{,}566.31 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 43{,}760.84 - \frac{(533.80)(621.00)}{11} = 13{,}625.40 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{13{,}625.40}{\sqrt{17{,}220.98 \times 13{,}566.31}} = 0.891 \]
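When only the five summary totals are available, as in Exercise 3.25, the shortcut formulas can be evaluated directly. The sketch below is one way to do that in Python; the helper name `sums_of_squares` and its argument names are hypothetical.

```python
from math import sqrt


def sums_of_squares(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (S_xx, S_yy, S_xy) from the raw totals via the shortcut formulas."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


# Totals quoted in Exercise 3.25 (n = 11)
s_xx, s_yy, s_xy = sums_of_squares(
    n=11, sum_x=533.80, sum_y=621.00,
    sum_x2=43124.84, sum_y2=48624.58, sum_xy=43760.84,
)
r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx, 2), round(s_yy, 2), round(s_xy, 2), round(r, 3))
```

The later exercises that start from totals (3.26 through 3.28, for example) can be checked by substituting their own totals and n.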

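3.26 (a) The scatter diagram below suggests (0.76, 90) may be unusual.

(b) Let x = speed and y = body length. Then, with n = 20, we calculate

\[ \sum x = 37.18, \quad \sum y = 2752, \quad \sum x^2 = 75.8684, \quad \sum y^2 = 386{,}384, \quad \sum xy = 5151.84 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 75.8684 - \frac{(37.18)^2}{20} = 6.7508 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 386{,}384 - \frac{(2752)^2}{20} = 7708.8 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 5151.84 - \frac{(37.18)(2752)}{20} = 35.872 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{35.872}{\sqrt{6.7508 \times 7708.8}} = 0.157 \]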

3.27 (a) The scatter diagram is shown below. The pattern runs from lower left to upper right and is not very tight. We estimate $r = 0.2$.

(b) Let x = length and y = weight. We calculate

\[ \sum x = 1511, \quad \sum y = 1011, \quad \sum x^2 = 208{,}153, \quad \sum y^2 = 94{,}453, \quad \sum xy = 139{,}141 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 208{,}153 - \frac{(1511)^2}{11} = 596.545 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 94{,}453 - \frac{(1011)^2}{11} = 1532.909 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 139{,}141 - \frac{(1511)(1011)}{11} = 266.364 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{266.364}{\sqrt{596.545 \times 1532.909}} = 0.279 \]

(c) The multiple scatter diagram reveals different patterns and one possible F outlier.

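3.28 (a) The scatter diagram is shown below. The pattern runs from lower left to upper right and is moderately tight. We estimate $r = 0.7$.

(b) Let x = length and y = weight. We calculate

\[ \sum x = 1013, \quad \sum y = 591, \quad \sum x^2 = 128{,}613, \quad \sum y^2 = 44{,}369, \quad \sum xy = 75{,}196 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 128{,}613 - \frac{(1013)^2}{8} = 341.875 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 44{,}369 - \frac{(591)^2}{8} = 708.875 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 75{,}196 - \frac{(1013)(591)}{8} = 360.625 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{360.625}{\sqrt{341.875 \times 708.875}} = 0.733 \]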

3.29 (a) The scatter diagram is

[Scatter diagram: Motorcycles (mil) versus Cell Phones (mil)]

(b) The scatter diagram exhibits a strong correlation but it is hard to imagine any causal relationship between an increase in motorcycle registration and an increase in cell phone usage, so we suspect the presence of lurking variables. A steadily increasing population seems a likely culprit.


3.30 (a) For $x' = 2x - 3$ and $y' = -y + 10$, we calculate

          x     y     x'    y'   x' - x̄'   y' - ȳ'   (x' - x̄')(y' - ȳ')   (x' - x̄')^2   (y' - ȳ')^2
          1     6     -1     4     -6        -2              12                36             4
          2     5      1     5     -4        -1               4                16             1
          7     2     11     8      6         2              12                36             4
          4     4      5     6      0         0               0                 0             0
          6     3      9     7      4         1               4                16             1
Total                 25    30      0         0              32               104            10
                   x̄' = 5  ȳ' = 6                          S_x'y'            S_x'x'        S_y'y'

so

\[ r' = \frac{S_{x'y'}}{\sqrt{S_{x'x'}\,S_{y'y'}}} = \frac{32}{\sqrt{104 \times 10}} = 0.9923 \]

and r just changes sign, from $-0.992$ for the original data to $0.992$, since $a = 2$ and $c = -1$ are of opposite signs.

(b) Here

\[ x' = 2.54\,x \ \text{ where } x \text{ is the height in inches}, \qquad y' = 0.454\,y \ \text{ where } y \text{ is the weight in pounds} \]

Since $a = 2.54$ and $c = 0.454$ are of the same sign, $r = 0.86$ is unchanged. This answer may be obtained without knowing the specific values for a and c.

3.31 (a) The scatter diagram is shown below.
[Scatter diagram: Garbage (in millions of tons) versus Year, 1960 to 2010]

(b) There is a tight southwest to northeast pattern indicating a strong positive correlation. The later years have the largest garbage values.

(c) Population size is also increasing and, most likely, even with more recycling, more people mean more garbage.

3.32 (a) The scatter diagram is shown below.

[Scatter diagram: Population (in millions) versus Garbage (in millions of tons)]

(b) There is a tight southwest to northeast pattern indicating a strong positive correlation. The high populations have the largest garbage values and low populations the smallest.

(c) Here the association, or a substantial part of it, is causal. The more people, the more garbage.

3.33 (a) Let x = (year − 1960) and y = amount of garbage (mil. tons). Then, with n = 6, we calculate

\[ \sum x = 147, \quad \sum y = 1052, \quad \sum x^2 = 5209, \quad \sum y^2 = 205{,}854, \quad \sum xy = 31{,}618 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 5209 - \frac{(147)^2}{6} = 1607.5 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 205{,}854 - \frac{(1052)^2}{6} = 21403.3 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 31{,}618 - \frac{(147)(1052)}{6} = 5844 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{5844}{\sqrt{1607.5 \times 21403.3}} = 0.9963 \]


(b) The correlation is still 0.9963. Since year is a linear transformation of (year − 1960),

\[ \text{year} = 1 \times (\text{year} - 1960) + 1960 \]

the deviation of each year from the mean year equals the corresponding deviation for (year − 1960), as you may verify. Consequently the sum of squares for years and the sum of cross-products remain the same and the correlation is unchanged (see Exercise 3.30).

3.34 (a) Let x = population size (millions) and y = amount of garbage (mil. tons). Then, with n = 6, we calculate

\[ \sum x = 1442, \quad \sum y = 1052, \quad \sum x^2 = 357{,}508, \quad \sum y^2 = 205{,}854, \quad \sum xy = 267{,}996 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 357{,}508 - \frac{(1442)^2}{6} = 10947.333 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 205{,}854 - \frac{(1052)^2}{6} = 21403.333 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 267{,}996 - \frac{(1442)(1052)}{6} = 15165.333 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{15165.333}{\sqrt{10947.333 \times 21403.333}} = 0.991 \]

(b) $r = 0.991$ is unchanged. Since number of pounds = 2000 × number of tons, the means satisfy

\[ \overline{\text{number pounds}} = 2000 \times \overline{\text{number tons}} \]

so the deviation from the mean in pounds satisfies

\[ (\text{number pounds} - \overline{\text{number pounds}}) = 2000 \times (\text{number tons} - \overline{\text{number tons}}) \]

With garbage recorded as y, $S_{yy}$ and $S_{xy}$ are replaced by $(2000)^2 S_{yy}$ and $2000\,S_{xy}$ respectively, the factor 2000 cancels in the ratio, and the correlation is unchanged (see Exercise 3.30).
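The cancellation of the factor 2000 can be confirmed numerically from the three sums of squares quoted in part (a). The following sketch rescales each variable in turn; the function name `correlation_from_s` is hypothetical, and the check is an illustration rather than part of the original solution.

```python
from math import isclose, sqrt


def correlation_from_s(s_xx, s_yy, s_xy):
    """r = S_xy / sqrt(S_xx * S_yy)."""
    return s_xy / sqrt(s_xx * s_yy)


# Sums of squares quoted in Exercise 3.34(a)
s_xx, s_yy, s_xy = 10947.333, 21403.333, 15165.333
c = 2000  # pounds per ton

r_original = correlation_from_s(s_xx, s_yy, s_xy)

# Multiplying one variable by c multiplies its own sum of squares by c**2
# and the sum of cross-products by c, whichever variable is rescaled.
r_first_rescaled = correlation_from_s(c ** 2 * s_xx, s_yy, c * s_xy)
r_second_rescaled = correlation_from_s(s_xx, c ** 2 * s_yy, c * s_xy)

print(round(r_original, 3))
print(isclose(r_original, r_first_rescaled), isclose(r_original, r_second_rescaled))
```

Because the same factor appears in the numerator and under the square root, the result does not depend on which of the two variables is converted to pounds.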

3.35 The value of y at x = 1 is 2 + 3(1) = 5 and the value at x = 4 is 2 + 3(4) = 14. The line is shown below. The intercept is 2, the value of y at x = 0, and the slope is 3, the coefficient of x.

3.36 The value of y at x = 0 is 6 − 2(0) = 6 and the value at x = 3 is 6 − 2(3) = 0. The line is shown below. The intercept is 6, the value of y at x = 0, and the slope is −2, the coefficient of x.

3.37 (a) y = 10(41) − 155 = 255.
(b) Note that y = 0 if 10x = 155, that is, if x = 15.5. A profit will be made if 16 or more units are sold.

3.38 (a) x = duration of training, y = measure of performance.
(b) x = average number of cigarettes smoked, y = carbon monoxide level.
(c) x = level of humidity, y = growth rate of fungus.
(d) x = expenditure, y = sales.


3.39 (a) (b) The scatter diagram and the visually drawn dotted line are shown below.

(c) We use the alternative form of calculation

          x      y      xy     x^2
          1     1.0     1.0      1
          2     2.2     4.4      4
          3     2.6     7.8      9
          4     3.4    13.6     16
          5     3.9    19.5     25
Total    15    13.1    46.3     55
       x̄ = 3  ȳ = 2.62

so

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 46.3 - \frac{(15)(13.1)}{5} = 7.00 \]

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 55 - \frac{(15)^2}{5} = 10.00 \]

and

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{7}{10} = 0.70, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2.62 - (0.70)(3) = 0.52 \]

and the least squares line is $\hat{y} = 0.52 + 0.70x$. This is the solid line in the figure.
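The slope and intercept formulas used here are short enough to code directly. The sketch below is an illustration in plain Python; `least_squares_line` is a hypothetical helper name, and the data are the five pairs of Exercise 3.39.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Alternative (shortcut) form of the sums of squares
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Data of Exercise 3.39
x = [1, 2, 3, 4, 5]
y = [1.0, 2.2, 2.6, 3.4, 3.9]
b0, b1 = least_squares_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

The helper takes raw pairs, so it also applies to the other exercises in this chapter that list their data, such as 3.40, 3.42, and 3.55.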

3.40 (a) (b) The scatter diagram and the visually drawn dotted line are shown below.

[Two scatter diagrams of y versus x, one with the visually drawn line]

(c) We calculate

          x     y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2
          1     5     -3       2          -6               9
          2     4     -2       1          -2               4
          7     1      3      -2          -6               9
          4     3      0       0           0               0
          6     2      2      -1          -2               4
Total    20    15      0       0         -16              26
        x̄ = 4  ȳ = 3                     S_xy             S_xx

so

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-16}{26} = -0.615, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3 - (-0.615)(4) = 5.46 \]

and the least squares line is $\hat{y} = 5.46 - 0.615x$. This is the solid line in the figure above.

3.41 (a)

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{10.2}{9.4} = 1.085, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{39.9}{9} - (1.085)\frac{19}{9} = 2.143 \]

(b) The least squares fitted line is $\hat{y} = 2.143 + 1.085x$. So $\hat{y} = 2.143 + 1.085(3) = 5.398$, or about 5.4 minutes.


3.42 (a) The scatter diagram is shown below.

[Scatter diagram: Total Number of Wolves in Wisconsin and Michigan; Number of Wolves versus Year (coded)]

(b) Observe that

          x      y     x - x̄    y - ȳ    (x - x̄)(y - ȳ)   (x - x̄)^2
          1     178    -4.5    -158.1       711.45          20.25
          2     204    -3.5    -132.1       462.35          12.25
          3     248    -2.5     -88.1       220.25           6.25
          4     257    -1.5     -79.1       118.65           2.25
          5     327    -0.5      -9.1         4.55           0.25
          6     325     0.5     -11.1        -5.55           0.25
          7     373     1.5      36.9        55.35           2.25
          8     436     2.5      99.9       249.75           6.25
          9     467     3.5     130.9       458.15          12.25
         10     546     4.5     209.9       944.55          20.25
Total    55    3361     0         0        3219.5           82.5
      x̄ = 5.5  ȳ = 336.1                    S_xy            S_xx

We have $n = 10$, $\sum x = 55$, $\sum y = 3361$, $S_{xx} = 82.5$ and $S_{xy} = 3219.5$, so

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{3219.5}{82.5} = 39.024, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 336.1 - (39.024)(5.5) = 121.468 \]

so the least squares line is $\hat{y} = 121.468 + 39.024x$.

(c) According to the scatter diagram, the straight line provides an excellent summary of the population growth over this period. The slope, 39.024, summarizes a typical yearly increase of about 39 wolves.

3.43 (a) The scatter diagram is shown below.

[Scatter diagram: Amount of Garbage for Ten-Year Periods; Garbage (millions of tons) versus Population (mil)]

(b) We calculate

\[ n = 6, \quad \sum x = 1442, \quad \sum y = 1052, \quad \bar{x} = 240.333, \quad \bar{y} = 175.333, \quad S_{xx} = 10947.33, \quad S_{xy} = 15165.33 \]

Using this, we have

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{15165.33}{10947.33} = 1.385, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 175.33 - (1.385)(240.33) = -157.53 \]

and so the least squares line is $\hat{y} = -157.53 + 1.385x$.

(c) The slope of the least squares line says that the amount of garbage increases by about 1.385 million tons for each additional 1 million people. Hence, each additional person is associated with about 1.385 tons of garbage per year.

3.44 The relative frequencies, by row, are:

Type of                     Amount of Aid
Representation   Increased   Unchanged   Decreased   Total
Self               0.321       0.587       0.092     1.000
Attorney           0.515       0.463       0.022     1.000

Having an attorney improves the chances of getting an increase.


3.45 (a) $\bar{x} = 542/30 = 18.067$

(b) The frequencies, by manufacturer, are:

                        Carbohydrates
Manufacturer     Above mean   Below mean   Total
General Mills         4            6         10
Kellogg               4            6         10
Quaker                4            6         10
Total                12           18         30

(c) The relative frequencies, by row, are:

                        Carbohydrates
Manufacturer     Above mean   Below mean   Total
General Mills        0.4          0.6        1.0
Kellogg              0.4          0.6        1.0
Quaker               0.4          0.6        1.0

The row proportions are exactly the same for each row.

3.46 (a) The scatter diagram is shown below.

(b) With n = 10 and x = sugar content, we obtain $S_{xx} = 257.6$, $S_{yy} = 352.5$, and $S_{xy} = 74$, so

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{74}{\sqrt{257.6 \times 352.5}} = 0.246 \]

Sugar and carbohydrate content are unrelated or have a very weak association.

3.47 (a) The frequency table is:

                 Drive
Size      2-Wheel   4-Wheel   Total
Small        12        23       35
Full         20        25       45
Total        32        48       80

(b) The relative frequencies are:

                 Drive
Size      2-Wheel   4-Wheel   Total
Small      0.15     0.2875    0.4375
Full       0.25     0.3125    0.5625
Total      0.40     0.60      1.00

(c) The relative frequencies, by row, are:

                 Drive
Size      2-Wheel   4-Wheel   Total
Small      0.343     0.657    1.000
Full       0.444     0.556    1.000

(d) A larger proportion of small truck purchasers prefer 4-wheel drive (0.657 versus 0.556 for full-size trucks).

3.48 (a) The frequency table is:

                  Hepatitis   No Hepatitis   Total
Vaccinated            11           538         549
Not Vaccinated        70           464         534
Total                 81          1002        1083

(b) The relative frequencies, by vaccination group, are:

                  Hepatitis   No Hepatitis   Total
Vaccinated          0.020        0.980       1.000
Not Vaccinated      0.131        0.869       1.000

Vaccinated persons in the study had a substantially lower incidence of hepatitis than those not vaccinated.

3.49 (a) Negative; typically, the more time spent on the computer, the fewer hours available for friends and other activities.
(b) Somewhat negative; most students cram for finals, and the more exams, the more late night studying during finals and the fewer hours of sleep.
(c) No relation.
(d) Positive; higher temperature tends to make people more thirsty.


3.50 (a) High correlation; a lurking variable is the population size in each city. Large cities have more people that do everything.
(b) High correlation; lurking variables are the new technology that introduced these devices and the price drops that make them more affordable.
(c) High correlation; both variables increase year after year but the increase is due to different causes. Cell phones have become more affordable and cover larger service areas, while banks have tried to expand services by setting up automated teller machines.
(d) High correlation; larger cities have more pollution and more persons who ride public transportation.
(e) The correlation may be high; both variables typically increase year after year. Inflation causes x to increase and technological advances increase y.

3.51 (a) The scatter diagram is shown below.

(b) With n = 6, we calculate

\[ \sum x = 82.3, \quad \sum y = 56.9, \quad \sum x^2 = 1{,}136.83, \quad \sum y^2 = 542.49, \quad \sum xy = 785.21 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1{,}136.83 - \frac{(82.3)^2}{6} = 7.948 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 542.49 - \frac{(56.9)^2}{6} = 2.888 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 785.21 - \frac{(82.3)(56.9)}{6} = 4.732 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{4.732}{\sqrt{7.948 \times 2.888}} = 0.988 \]

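3.52 (a) The scatter diagram is shown below.

(b) With n = 8, we calculate

\[ \sum x = 84, \quad \sum y = 188, \quad \sum x^2 = 1110, \quad \sum y^2 = 4544, \quad \sum xy = 1921 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 1110 - \frac{(84)^2}{8} = 228.00 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 4544 - \frac{(188)^2}{8} = 126.00 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1921 - \frac{(84)(188)}{8} = -53.00 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{-53}{\sqrt{228.0 \times 126.0}} = -0.313 \]

(c)

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{-53}{228.0} = -0.232, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{188}{8} - (-0.232)\frac{84}{8} = 25.9 \]

and the least squares line is $\hat{y} = 25.9 - 0.232x$.

(d) The equation gives $\hat{y} = 25.9 - 0.232(8) = 24.0$. In later chapters, we will see that this least squares fitted line does not have much predictive power because of the small value of r.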
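Parts (b) through (d) chain together: the totals give the sums of squares, which give r, the fitted line, and the prediction at x = 8. The sketch below follows that chain in Python with full precision; the variable and function names are illustrative. Because it does not round the coefficients first, the prediction comes out near 24.08 rather than the 24.0 obtained above from the rounded values 25.9 and 0.232.

```python
from math import sqrt

# Totals quoted in Exercise 3.52(b)
n = 8
sum_x, sum_y = 84, 188
sum_x2, sum_y2, sum_xy = 1110, 4544, 1921

# Shortcut formulas for the sums of squares
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)                # correlation
slope = s_xy / s_xx                          # beta-hat 1
intercept = sum_y / n - slope * sum_x / n    # beta-hat 0


def predict(x):
    """Value of the fitted least squares line at x."""
    return intercept + slope * x


print(round(r, 3), round(slope, 3), round(intercept, 1), round(predict(8), 2))
```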


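3.53 (a)

[Scatter diagram: Kellogg's Cereals; Sugar versus Carbs]

(b) With n = 10, we calculate

\[ \sum x = 176, \quad \sum y = 114, \quad \sum x^2 = 3446, \quad \sum y^2 = 1580, \quad \sum xy = 1957 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 3446 - \frac{(176)^2}{10} = 348.4 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1580 - \frac{(114)^2}{10} = 280.4 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 1957 - \frac{(114)(176)}{10} = -49.4 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{-49.4}{\sqrt{348.4 \times 280.4}} = -0.158 \]

The scatter plot and the small value of r indicate that sugar content and carbohydrate content are essentially unrelated in Kellogg's cereals.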

3.54 (a) The scatter diagram is shown below.

[Scatter diagram of y versus x]

(b) We calculate

          x     y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2   (y - ȳ)^2
          0     5     -3       1          -3               9           1
          2     4     -1       0           0               1           0
          5     4      2       0           0               4           0
          4     2      1      -2          -2               1           4
          1     7     -2       3          -6               4           9
          6     2      3      -2          -6               9           4
Total    18    24      0       0         -17              28          18
        x̄ = 3  ȳ = 4                     S_xy             S_xx        S_yy

so

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{-17}{\sqrt{28 \times 18}} = -0.757 \]


3.55 (a) (b) The scatter plot and visually drawn line are shown below.

[Two scatter diagrams of y versus x, one with the visually drawn line]

(c) We calculate

          x     y    x - x̄   y - ȳ   (x - x̄)(y - ȳ)   (x - x̄)^2
          0     1     -5      -2          10              25
          3     2     -2      -1           2               4
          5     4      0       1           0               0
          8     3      3       0           0               9
          9     5      4       2           8              16
Total    25    15      0       0          20              54
        x̄ = 5  ȳ = 3                     S_xy             S_xx

so

\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{20}{54} = 0.370, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 3 - (0.370)(5) = 1.15 \]

and the least squares line is $\hat{y} = 1.15 + 0.370x$. This is the solid line in the figure.

3.56 (a) A scatter plot would be more appropriate to use here since the correlation between the two variables is what is sought.
(b) A 2 × 2 contingency table would be more appropriate here since both variables of interest are further categorized into two options and all four combinations are of interest.
(c) A scatter plot would be more appropriate to use here since the correlation between the two variables is what is sought.

3.57 (a) x = road roughness and y = gas consumption.
(b) x = number of wins and y = total sales.
(c) x = trip distance and y = number of weekends at home.

3.58 (a) Read C3T7.DAT in C1 and C2. (See next page for output.)


(b) Read 3.25.DAT in C1 and C2.

3.59 (a)

(b)


3.60 (a)


(b)

The unusual observation has the initial row time of 1043 and is case 29.

The correlation is slightly smaller with the one observation removed but the slope and intercept of the fitted line change quite a lot. That one point has a lot of influence in the determination of the fitted line.


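3.61 (a) The scatter diagram is shown below.

(b) With n = 36, we calculate

\[ \sum x = 2006, \quad \sum y = 961, \quad \sum x^2 = 114{,}950, \quad \sum y^2 = 26{,}281, \quad \sum xy = 54{,}166 \]

so

\[ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 114{,}950 - \frac{(2006)^2}{36} = 3{,}171.22 \]

\[ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 26{,}281 - \frac{(961)^2}{36} = 627.64 \]

\[ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 54{,}166 - \frac{(2006)(961)}{36} = 616.94 \]

Consequently,

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}} = \frac{616.94}{\sqrt{3{,}171.22 \times 627.64}} = 0.437 \]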
