Ch26 Exercises

9/27/2006
26 Exercises
Mix and Match

For these matching exercises, refer to the multiple regression equation $y = \beta_0 + \beta_1 x + \beta_2 d + \beta_3 x d + \varepsilon$, where y and x are numerical variables and d is a dummy variable. Here y is the annual salary of an employee (in thousands of dollars) and x denotes the years of experience. The dummy variable d is coded as 1 for college grads and as 0 for those lacking a college degree.
1. Intercept for high school grad
2. Intercept for college grad
3. $M/year, high school grad
4. $M/year, college grad
5. Difference in slopes
6. Difference in intercepts
7. Interaction
8. Equal variances
9. Average salary for high school grad with 10 years of experience
10. Average salary for college grad with 10 years of experience
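a. $\beta_1$
b. $\beta_3$
c. $\beta_0 + \beta_2$
d. $\beta_0 + 10\,\beta_1$
e. $d \times x$
f. $\varepsilon$
g. $\beta_1 + \beta_3$
h. $\beta_0 + \beta_2 + 10\,(\beta_1 + \beta_3)$
i. $\beta_0$
j. $\beta_2$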
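As a hedged illustration of the regression equation used in this matching exercise (not part of the original exercises), the sketch below builds the dummy and interaction columns for simulated data and fits the model by least squares with NumPy. The sample size, the coefficients, the noise level, and the function names are all invented assumptions. It also fits a separate simple regression within each group so that the two sets of estimates can be compared.

```python
import numpy as np

def fit_interaction_model(x, d, y):
    """Least-squares fit of y = b0 + b1*x + b2*d + b3*(x*d)."""
    # The interaction column is just the product of x and the dummy d.
    X = np.column_stack([np.ones_like(x), x, d, x * d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def fit_simple(x, y):
    """Least-squares fit of y = a0 + a1*x within one group."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data: every number here is made up for illustration.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 30, n)                 # years of experience
d = rng.integers(0, 2, n).astype(float)   # 1 = college grad, 0 = otherwise
y = 40 + 2.0 * x + 10 * d + 0.5 * x * d + rng.normal(0, 5, n)

b = fit_interaction_model(x, d, y)        # b0, b1, b2, b3
line0 = fit_simple(x[d == 0], y[d == 0])  # intercept, slope for d = 0
line1 = fit_simple(x[d == 1], y[d == 1])  # intercept, slope for d = 1

print("multiple regression:", b)
print("group d = 0 line:   ", line0)
print("group d = 1 line:   ", line1)
```

Comparing the printed estimates shows how the four coefficients of the single multiple regression relate to the two separately fitted lines.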
True/False
11. The two-sample t-test is possibly confounded if the groups differ in ways other than the labeling that distinguishes the groups.
12. An analysis of covariance is another name for the use of randomization to avoid confounding.
13. A dummy variable is a numerical encoding using 0s and 1s that distinguishes the members of two groups.
14. To build the interaction between x and a dummy variable d, we multiply x times d.
15. If the multiple regression implies parallel fits, the slope of the dummy variable is the difference between the two fitted lines.
16. A multiple regression with a numerical predictor and a dummy variable as two predictors implies parallel fits to the two groups.
17. The purpose of an interaction variable is to force fits in the two groups to be parallel.
18. Interaction variables typically introduce collinearity into a multiple regression and should be removed from the fit if not statistically significant.
19. If neither the interaction nor the dummy variable is statistically significant in an analysis of covariance, then there's no lurking factor that confounds the results of the related two-sample t-test.
20. To be a confounding variable, the variable must be related to y and to the dummy variable indicating group membership.
21. A major assumption of the use of regression with dummy variables is that the sizes of the two groups be approximately the same in order to increase the variation of the dummy variable.
22. To check the similar variances condition in models with a dummy variable, use comparison boxplots of y versus the categorical variable.
Think About It
23. These comparison boxplots show the revenue generated by individual sales representatives who operate in divisions supervised by two different managers. What's the problem with using a two-sample t-test to judge the statistical significance of the apparent difference?
[Figure: comparison boxplots of Revenue ($M) by Manager (A, B)]

Level   Number   Mean      Std Dev
A       24       25.0295   5.33875
B       37       35.9265   6.89008
24. An auditor collected a random sample of about 100 invoices paid in the current fiscal year and compared the amounts of these invoices to those of a second random sample of invoices paid in the prior fiscal year. These boxplots summarize the amounts (in dollars) of the two sets of invoices.
[Figure: comparison boxplots of Invoice Amount ($) by Fiscal Year (2005, 2006)]

Year   Number   Mean      Std Dev
2005   111      22199.3   16185.4
2006   109      25116.3   17702.3
Would you suggest that the auditor perform a two-sample t-test to compare the mean values of these invoices, or can you suggest one (or more) lurking factors that should be taken into account prior to the comparison?
25. When fitting the regression of y on x for two groups, we can estimate the slope and intercept within each group either by fitting two simple regressions or by fitting one multiple regression. If simple regressions are so much easier to interpret, why bother to glue them together into one multiple regression?
26. What assumption is required when we combine two simple regressions into one multiple regression using a dummy variable and an interaction? The MRM requires an assumption that the combination of the two SRMs does not require. What is it, and what condition of the MRM does it affect?
27. An industry analyst constructed a model describing the cost of building cars at plants operated by different manufacturers in North America. As a first step, the analyst regressed total production cost (in dollars) on the number of labor hours for a sample of vehicles. The data used came from two plants, one operated by a domestic manufacturer under contract with the UAW, the United Automobile Workers, and the other a non-unionized plant. (UAW members cost more than nonunion labor; The Wall Street Journal in May 2006 estimated that total costs run $74 per hour if benefits are included.) The analyst included a dummy variable in the regression indicating the plant. Do you think the analyst should also include an interaction (between plant and labor hours)?
28. Matsushita is well known for the efficiency of its automated factories. Facing pressure from developing Asian producers with lower labor costs, the company reconfigured robots in its factory in Saga, Japan. After the modification, it takes 40 minutes to configure the assembly line and start production. Formerly, it took about 20 hours.[1] Once production begins, the plant runs as previously; the robots are the same, only reconfigured to simplify changing tasks. In order to analyze the association between the time to complete a production run (the response) and the number of units produced, how will this modification change the nature of the fitted equation? Do you expect the slope, intercept, and error variance all to change? Note the interpretation of these parameters in the context of these data.
[1] Reported in Business Week (7/10/2006), "No one does lean like the Japanese," 40-41.
29. A two-sample t-test has a lot in common with regression. This output summarizes the results of fitting a simple regression with only a dummy variable as the explanatory variable. The data is the same salary data used in the text, with salary regressed on Group.
[Figure: scatterplot of Salary ($M) versus Group (0 or 1)]

R2   0.019116
se   12.42868
n    220

Term        Estimate    Std Error   t Stat   p-value
Intercept   140.46667   1.43514     97.88    <.0001
Group       3.64367     1.76775     2.06     0.0405
a) Interpret the estimated intercept and slope in the fitted simple regression.
b) What's the relationship between the t-statistic for the slope in this simple regression and the t-statistic for the two-sample t-test? (See Table 26.1)
c) What assumption is needed in this simple regression approach to a two-sample t-test that we did not require previously?
30. This output summarizes a simple regression fit to the data on marketing Courier Paks by what was then Federal Express.
[Figure: scatterplot of Mailings versus Aware (0 or 1)]

R2   0.070348
se   22.09439
n    125

Term        Estimate    Std Error   t Ratio   Prob>|t|
Intercept   29.693333   2.551241    11.64     <.0001
Aware       12.306667   4.033866    3.05      0.0028
a) Summarize the estimated equation of the simple regression model.
b) The t-statistic for the slope in this model is statistically significant. Assuming the conditions of the SRM hold, what does this tell us?
c) The variability of the two groups seems somewhat different. Why might that be the case, considering the role of the hours of promotion in this example?
31. The Analysis of Covariance emphasizes the use of regression to fix a problem with a two-sample t-test that has a confounding variable. You can also think of the use of a dummy variable as a way to fix a problem in the regression of y on x. Take a look at this scatterplot:
[Figure: scatterplot of y versus x, with the points in two groups colored red and green]
a) If we fit parallel slopes to these data, with one line for the red and another for the green points, what do you think the slope will be?
b) What happens if we estimate the slope while ignoring the presence of two clear groups? That is, what if we fit a simple regression of y on x using all of the data?
32. After a manufacturer closed an old assembly plant, it re-trained its production employees to use new machines in a more highly automated robotic facility. The automated facility allows the plant to fill small orders of customized parts rather than churn out identical copies. After a week-long training period, a group of these long-time employees was put to work. Another group of workers were new hires who did not undergo this extensive training. In a study of the value of this training program, an analyst regressed the time required (in minutes) for completion of an order on the number of items produced. The data are shown in this plot; trained employees are colored green.
[Figure: scatterplot of Minutes versus Units; trained employees are colored green]
a) If we fit a separate equation to each group, then what is the interpretation of the intercept in either fit? Include the units as part of your description.
b) What is the interpretation of the slope in either fit? Include the units as part of your description.
c) Will an analysis of covariance require an interaction term, or can you skip this step and only fit a dummy variable to distinguish the two groups?
33. The following output summarizes the fit of an analysis of covariance to the data in Question 31. The variable D denotes a dummy variable, with D = 1 for values colored green and 0 otherwise.

Term        Estimate     Std Error   t Stat   p-value
Intercept   5.7180783    0.13508     42.33    <.0001
x           2.2326506    0.136367    16.37    <.0001
D           -5.875727    0.391419    -15.01   <.0001
D*x         -0.189738    0.205609    -0.92    0.3584
a) Does the fit of the model suggest parallel equations for the two groups?
b) What should be the next thing to do in the analysis of this data, specifically thinking of the form of the fitted model?
34. The following output summarizes the fit of an analysis of covariance to the data in Question 32. The variable D denotes a dummy variable, with D = 1 for values colored green and 0 otherwise.

Term        Estimate     Std Error   t Stat   p-value
Intercept   26.782895    2.022751    13.24    <.0001
Units       2.0620401    0.036202    56.96    <.0001
D           52.816094    2.952812    17.89    <.0001
D*Units     -1.277012    0.051934    -24.59   <.0001
a) What is the interpretation of the coefficient of D in the fit of this multiple regression? Use the context of the analysis in your answer.
b) For what size production run do the trained employees (shown in green, with D = 1) appear more productive than the employees who did not receive training (red, D = 0)? If one group is always better than the other, say so.
You Do It
35. Emerald diamonds. These data are a subset of the diamonds used in Chapter 19. This data table of 144 diamonds includes the price (in dollars), the weight (in carats), and the clarity grade of the diamonds. The diamonds have clarity grade either VS1 or VVS1.
(a) Would it be appropriate to use a two-sample t-test to compare the average prices of VS1 and VVS1 diamonds, or is this relationship confounded by the weights of the diamonds?
(b) Perform the two-sample t-test to compare the prices of the two clarity grades. Summarize this analysis, assuming that there are no lurking variables.
(c) Compare the prices of the two types of diamonds using an analysis of covariance. Summarize the comparison of prices based on this analysis. Use a dummy variable coded as 1 for VVS1 diamonds and 0 otherwise. (Assume for the moment that the model meets the conditions for the MRM.)
(d) Compare the results from b and c. Do they agree? Explain why they agree or differ. You should take into account the precision of the estimates and your answer to a.
(e) What problem bedevils the multiple regression used for the analysis of covariance that is not present in the two-sample t-test?
36. Convenience shopping. These data expand the data table introduced in Chapter 19 by introducing data from a second location. For each of two service stations operated by a national petroleum refiner, we have the daily sales in the convenience store located at the service station. The data for each day give the sales at the store (in dollars) and the number of gallons of gasoline sold. For Site 1, the data cover 283 days; for Site 2, the data cover 285 days.
(a) Would it be appropriate for management of this chain of service stations to rate the operators of the convenience stores based on a two-sample comparison of the sales of the convenience stores during these two periods, or would such a comparison be confounded by different levels of traffic (as measured by the volume of gasoline sold)?
(b) Perform the two-sample t-test to compare the sales of the two service stations. Summarize this analysis, assuming that there are no lurking variables.
(c) Compare the sales at the two sites using an analysis of covariance. Summarize the comparison of sales based on this analysis. Use a dummy variable coded as 1 for Site 1 and 0 otherwise. (Assume for the moment that the model meets the conditions for the MRM.)
(d) Compare the results from b and c. Do they agree? Explain why they agree or differ. You should take into account the precision of the estimates and your answer to a.
(e) Does the estimated multiple regression used in the analysis of covariance meet the similar variances condition?
(f) Suppose an analyst fit the simple regression of sales in the convenience store on gasoline sales, ignoring the distinction between the two sites. Does this pooling of all the data together affect the relationship between sales in the store and gasoline sales?
37. Download (Introduced in Chapter 19). Before taking the plunge into videoconferencing, a company ran tests of its current internal computer network. The goal of the tests was to measure how rapidly data moved through the network given the current demand on the network. Eighty files ranging in size from 20 to 100 megabytes (MB) were transmitted over the network at various times of day, and the time to send the files was recorded. Two types of software were used to transfer the files, identified by the column labeled Vendor in the data table. The two possible values are MS and NP; use a dummy variable coded as 1 when Vendor = MS.
(a) Would it be appropriate for management to compare the two vendors based on a two-sample comparison of the times needed to transfer the files, or would such a comparison be confounded by different sizes of the files that were sent?
(b) Perform the two-sample t-test to compare the performance of the software provided by the two vendors. Summarize this analysis, assuming that there are no lurking variables.
(c) Compare the transfer times for the two vendors using an analysis of covariance. Summarize the comparison of transfer times based on this analysis. Use a dummy variable coded as 1 for MS and 0 otherwise. (Assume for the moment that the model meets the conditions for the MRM.)
(d) Compare the results from b and c. Do they agree? Explain why they agree or differ. You should take into account the precision of the estimates and your answer to a.
(e) Does the estimated multiple regression used in the analysis of covariance meet the similar variances condition?
38. Production costs (Introduced in Chapter 19). A manufacturer produces custom metal blanks that are used by its customers for computer-aided machining. The customer sends a design via computer (a 3-D blueprint), and the manufacturer sends the customer an estimated price per unit. This analysis considers the factors that affect the cost to manufacture these blanks. This cost estimate is then used to determine a price for the customer. The data for the analysis were sampled from the accounting records of 195 previous orders filled during the last 3 months. The data measure performance at two plants, identified as OLD and NEW in the column Plant.
(a) Would it be appropriate for management to compare the two plants using a two-sample comparison of the costs per unit, or would such a comparison be confounded by different requirements for machine use per unit in the two plants?
(b) Perform the two-sample t-test to compare the average cost per unit at the two plants. Summarize this analysis, assuming that there are no lurking variables.
(c) Compare the average cost per unit at the two plants using an analysis of covariance. Summarize the comparison based on this analysis. Represent these categories using a dummy variable coded as 1 if the plant is new. (Assume for the moment that the model meets the conditions for the MRM.)
(d) Compare the results from b and c. Do they agree? Explain why they agree or differ. You should take into account the precision of the estimates and your answer to a.
(e) Does the estimated multiple regression used in the analysis of covariance meet the similar variances condition?
39. Home prices. This data table expands the data introduced in Chapter 19 on the prices of homes in the Seattle area. One realtor operating in Seattle listed these 28 homes. This table includes prices and sizes of 8 more homes listed by a different realtor in Seattle. As previously, we'll look at the price per square foot, using the reciprocal of the number of square feet as the explanatory variable. In this model, the intercept estimates the variable cost per square foot and the slope of 1/SqFt estimates the fixed costs present regardless of the size of the home.
(a) Scatterplot the cost per square foot of the homes on the reciprocal of the size of the homes. Do you see a difference in the relationship between cost per square foot and 1/SqFt for the two realtors? Use color-coding or different symbols to distinguish the data of the two realtors.
(b) Based on your visual impression formed in a, fit an appropriate regression model that describes the fixed and variable costs for these realtors. Use a dummy variable coded as 1 for Realtor B to represent the different realtors in the regression.
(c) Does the estimated multiple regression fit in b meet the conditions for the MRM?
(d) Interpret the estimated coefficients from the equation fit in b, if it is OK to do so. If not, indicate why not.
(e) Would it be appropriate to use the estimated standard errors shown in the output of your regression estimated in b to set confidence intervals for the estimated intercept and slopes? Explain.
40. Leases (Introduced in Chapter 19). This data table includes the annual prices of 223 commercial leases. All of these leases provide office space in a Midwestern city in the US. In previous exercises, we estimated the variable costs (costs that increase with the size of the lease) and fixed costs (those present regardless of the size of the property) using a regression of the cost per square foot on the reciprocal of the number of square feet. The intercept estimates the variable costs and the slope estimates the fixed costs. Some of these leases cover space in the downtown area, whereas others are located in the suburbs. The variable Location identifies these two categories.
(a) Scatterplot the cost per square foot of the leases on the reciprocal of the square feet of the lease. Do you see a difference in the relationship between cost per square foot and 1/SqFt for the two locations? Use color-coding or different symbols to distinguish the data of the two locations.
(b) Based on your visual impression formed in a, fit an appropriate regression model that describes the fixed and variable costs for these leases. Use a dummy variable coded as 1 for leases in the city and 0 for the suburban leases.
(c) Does the estimated multiple regression fit in b meet the conditions for the MRM?
(d) Interpret the estimated coefficients from the equation fit in b, if it is OK to do so. If not, indicate why not.
(e) Would it be appropriate to use the estimated standard errors shown in the output of your regression estimated in b to set confidence intervals for the estimated intercept and slopes? Explain.
41. R&D expenses. This data file contains a variety of accounting and financial values that describe companies operating in technology industries: software, systems design, and semiconductor manufacturing. One column gives the expenses on research and development (R&D), and another gives the total assets of the companies. Both of these columns are reported in millions of dollars. This data table expands previous versions (introduced in Chapter 19) by adding data for 2003 to the data for 2004. To estimate regression models, we need to transform both expenses and assets to a log scale.
(a) Plot the log of R&D expenses on the log of assets for 2003 and 2004 together in one scatterplot. Use color-coding or distinct symbols to distinguish the groups. Does it appear that the relationship is different in these two years, or can you capture the association with a single simple regression?
A common question asked when fitting models to subsets is "Do the equations for the two groups differ from each other?" For example, does the equation for 2003 differ from the equation for 2004? We've been answering this question informally, using the t-statistics for the slopes of the dummy variable and interaction. There's just one small problem: we're using two tests to answer one question. What's the chance for a false positive error? If you've got one question, better to use one test. To see if there's any difference, we can use a variation on the F test for $R^2$. The idea is to test both slopes at once rather than separately. The method uses the change in the size of $R^2$. If the $R^2$ of the model increases by a statistically significant amount when we add both the dummy variable and interaction to the model, then something changed and the model is different. The form of this incremental, or partial, F test is
\[
F = \frac{(\text{Change in } R^2) / (\text{number of added slopes})}{(1 - R^2_{\text{full}}) / (n - 1 - q_{\text{full}})}
\]
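The arithmetic of the incremental F test can be sketched in a few lines of Python. This is a minimal sketch, not part of the original exercises: the function name `incremental_f` and the numbers in the usage lines are invented for illustration and do not come from any of the data sets here. The inputs are the $R^2$ of the model without the added terms, the $R^2$ of the full model, the sample size n, the number of explanatory variables in the full model, and the number of added slopes.

```python
def incremental_f(r2_reduced, r2_full, n, q_full, added):
    """Incremental (partial) F statistic for adding `added` slopes.

    r2_reduced: R^2 of the model before the terms are added
    r2_full:    R^2 of the model with all terms
    n:          number of cases
    q_full:     number of explanatory variables in the full model
    added:      number of slopes added (q_full minus the reduced count)
    """
    change = r2_full - r2_reduced
    numerator = change / added
    denominator = (1.0 - r2_full) / (n - 1 - q_full)
    return numerator / denominator

# Hypothetical values: a simple regression with R^2 = 0.60 and a model
# with dummy and interaction added (q_full = 3, 2 slopes added) with
# R^2 = 0.64, both fit to n = 100 cases.
f_stat = incremental_f(r2_reduced=0.60, r2_full=0.64, n=100, q_full=3, added=2)
print(round(f_stat, 2))
```

With these made-up inputs the statistic is (0.04/2)/(0.36/96), which can be checked by hand against the formula.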

In this formula, $q_{\text{full}}$ denotes the number of explanatory variables in the model with all the bells and whistles, including dummy variables and interactions, and $R^2_{\text{full}}$ is the $R^2$ for that model. As usual, a value of this F-statistic larger than about 4 counts as big.
(b) Add a dummy variable (coded as 0 for 2004 and 1 for 2003) and its interaction with log assets to the model. Does the fit of this model meet the conditions for the MRM? Comment on the consequences of any problem that you identify.
(c) Assuming that the model meets the conditions for the MRM, use the incremental F-test to assess the size of the change in $R^2$. Does the test agree with your visual impression? (The value of $q_{\text{full}}$ for the model with dummy and interaction is 3, with 2 slopes added. You will need to fit the simple regression of log R&D expenses on log assets to get the $R^2$ from this model.)
(d) Summarize the fit of the model that best captures what is happening in these two years.
42. Cars. The cases that make up this data set are cars. For each of 223 types of cars sold in the US during the 2003 and 2004 model years, we have the base price and the horsepower of the engine (HP). In previous exercises, we found that a model for the association of price and horsepower required taking logs of both variables. (We used base 10 logs.) The column Location denotes the continent of the home country of the manufacturer. (This is a bit loose, since Ford owns Jaguar and GM owns Saab. We coded these as European anyhow. Similarly, we labeled Chrysler as US even though it was absorbed by Daimler, a.k.a. Mercedes.) Alas, we have three groups. To simplify the analysis, we'll compare domestic cars to imports from Europe. The data set for this exercise hence excludes cars from Asian manufacturers.
(a) Plot the log10 of price on the log10 of horsepower for cars from both groups of manufacturers in one scatterplot. Use color-coding or distinct symbols to distinguish the groups. Does it appear that the relationship is different in these two groups, or can you capture the association with a single simple regression?
(b) Add a dummy variable (coded as 0 for US and 1 for European designs) and its interaction with log10 HP to the model. Does the fit of this model meet the conditions for the MRM? Comment on the consequences of any problem that you identify.
(c) Assuming that the model meets the conditions for the MRM, use the incremental F-test to assess the size of the change in $R^2$. (See the discussion of this test in Question 41.) Does the test agree with your visual impression? (The value of $q_{\text{full}}$ for the model with dummy and interaction is 3, with 2 slopes added. You will need to fit the simple regression of log10 price on log10 HP to get the $R^2$ from this model.)
(d) Compare the conclusion of the incremental F-test to the tests of the coefficients of the dummy variable and interaction separately. Do these agree? Explain the similarity or difference.
43. Movies. These data (used also in Chapter 20) describe the box-office success of 224 movies released during the years 1998 through 2001. For this analysis, we're interested in the relationship between initial success at the movie theatre and subsequent sales for pay-per-view services, such as those offered by cable television. All of these movies are rated either G or PG, with Audience set to Family, or rated R, with Audience set to Adult. We dropped movies rated PG-13.
(a) Plot the log10 of subsequent sales on the log10 of the box-office gross for movies from both groups in one scatterplot. Use color-coding or distinct symbols to distinguish the groups. Does it appear that the relationship between box-office success and subsequent video sales differs for the two categories, or can you capture the association with a single simple regression?
(b) Add a dummy variable (coded as 1 for adult audiences and 0 for family audiences) and its interaction with log10 Gross to the model. Does the fit of this model meet the conditions for the MRM? Comment on the consequences of any problem that you identify.
(c) Assuming that the model meets the conditions for the MRM, use the incremental F-test to assess the size of the change in $R^2$. (See the discussion of this test in Question 41.) Does the test agree with your visual impression? (The value of $q_{\text{full}}$ for the model with dummy and interaction is 3, with 2 slopes added. You will need to fit the simple regression to get its $R^2$ for comparison to the multiple regression.)
(d) Compare the conclusion of the incremental F-test to the tests of the coefficients of the dummy variable and interaction separately. Do these agree? Explain the similarity or difference.
(e) What's your take on the subsequent success of movies? Does the box-office gross tell you something different about movies intended for adults versus those for the family?
44. Hiring (Introduced in Chapter 19). A firm that operates a large, direct-to-consumer sales force would like to be able to put in place a system to monitor the progress of new agents. A key task for agents is to open new accounts; an account is a new customer to the business. The goal is to identify superstar agents as rapidly as possible, offer them incentives, and keep them with the company. To build such a system, the firm has been monitoring sales of new agents over the past two years. The response of interest is the profit to the firm (in dollars) of contracts sold by agents over their first year. Among the possible predictors of this performance is the number of new accounts developed by the agent during the first 3 months of work. Some of these agents were located in new offices, whereas others joined an existing office (see the column labeled Office).
(a) Plot the log of profit on the log of the number of accounts opened for both groups in one scatterplot. Use color-coding or distinct symbols to distinguish the groups. Does the coloring explain an unusual aspect of the black and white scatterplot? Does a simple regression that ignores the groups provide a reasonable summary?
(b) Add a dummy variable (coded as 1 for new offices and 0 for existing offices) and its interaction with log Accounts to the model. Does the fit of this model meet the conditions for the MRM? Comment on the consequences of any problem that you identify.
(c) Assuming that the model meets the conditions for the MRM, use the incremental F-test to assess the size of the change in $R^2$. (See the discussion of this test in Question 41.) Does the test agree with your visual impression? (The value of $q_{\text{full}}$ for the model with dummy and interaction is 3, with 2 slopes added. You will need to fit the simple regression to get its $R^2$ for comparison to the multiple regression.)
(d) Compare the conclusion of the incremental F-test to the tests of the coefficients of the dummy variable and interaction separately. Do these agree? Explain the similarity or difference.
(e) What's your take on locating new hires in new or existing offices? Would you recommend locating them in one or the other (assuming it could be done without disrupting the current placement procedures)?
45. Promotion. These data describe spending by a pharmaceutical company to promote a cholesterol-lowering drug. The data cover 39 consecutive weeks and isolate the metropolitan areas around Boston, Massachusetts, and Portland, Oregon. A subset of this data was introduced in Chapter 19. The variables in this collection are shares. Marketing research often describes the level of promotion in terms of voice. In place of the level of spending, voice is the share of advertising devoted to a specific product. Voice puts spending in context; $10 million might seem like a lot for advertising unless everyone else is spending $200 million. The column Market Share is the ratio of sales of this product divided by total sales for such drugs in the Boston area. The column Detail Voice is the ratio of detailing for this drug to the amount of detailing for all cholesterol-lowering drugs in Boston. Detailing counts the number of promotional visits made by representatives of a pharmaceutical company to doctors' offices.
(a) A hasty analyst fit the regression of Market Share on Detail Voice with the data from both locations combined. The analyst found a very statistically significant slope for Detail Voice, estimated larger than 1. (The implication is that 1% more share of detailing would get, on average, 1% more of the market.) What mistake has the analyst made?
(b) Propose an alternative model and evaluate whether your alternative model meets the conditions of the MRM so that you can do confidence intervals.
(c) What's your interpretation of the relationship between detailing and market share? If you can, offer your impression as a range.
46. iTunes. The music that you keep on an Apple iPod can be stored digitally in several formats. A popular format for Apple is known as AIFF, short for Audio Interchange File Format. Another format is known as AAC, short for Advanced Audio Coding. Files on an iPod can be in either of these formats, or both. The 596 songs in this data set use a mixture of these two formats.
(a) Based on the scatterplot of the amount of space needed on the length of the songs, propose a model for how much space (in megabytes, MB) is needed to store a song of a given number of seconds.
(b) Evaluate whether your model meets the conditions of the MRM so that you can do confidence intervals.
(c) Interpret the estimated slopes in your model.
(d) Construct, if appropriate, a prediction interval for the amount of disk space required to store a song that is 240 seconds long using AAC and then AIFF format. How can you get intervals? (Be imaginative: the obvious approach has some problems.)
