
Homework Problems, 550.311 Spring 2010


Collected for 550.311; many are taken from textbooks by Jay Devore, Neil Weiss, Bernard Rosner, John Rice, Sheldon Ross, Cincich, Levine & Stephan, Walpole, Myers, Myers, & Ye.

Probability basics

Problem 1.1: Later in the semester's homework you will show that if a 1 inch needle is randomly dropped on a floor lined with parallel lines spaced 2 inches apart, then the probability of the needle touching a line is exactly $1/\pi$. Assuming this is true, assuming the only arithmetic operation you may do is to count, and assuming you have access to such a needle and such a floor, how might you compute an integer numerator and an integer denominator of a fraction approximating $\pi$?

Problem 1.2: (Devore ed5, p58) An engineering construction firm is currently working on power plants at three different sites. Let $A_i$ denote the event that the plant at site $i$ is completed by the contract date. Use the operations of union, intersection, and complementation to describe each of the following events in terms of $A_1$, $A_2$, and $A_3$, draw a Venn diagram, and shade the region corresponding to each one: a) At least one plant is completed by the contract date. b) All plants are completed by the contract date. c) Only the plant at site 1 is completed by the contract date. d) Exactly one plant is completed by the contract date. e) Either the plant at site 1 or both of the other two plants are completed by the contract date.

Problem 1.3: (Rice p25) Show from the probability axioms: For any events $A$, $B$ it holds that $P(A \cap B) \ge P(A) + P(B) - 1$. (You may use the results derived in class without reproving them.)

Problem 1.4: Show from the probability axioms: For any events $E_1, E_2, E_3, \ldots$ (even if they are not disjoint) it holds that $P\left(\bigcup_i E_i\right) \le \sum_i P(E_i)$. [Hint: Consider events of the form $A_i := E_i \setminus \left(\bigcup_{j=1}^{i-1} E_j\right)$.]

Combinatorial probability

Problem 2.1: (Devore ed5, p74) A class has 20 nonsmokers, 15 light smokers, and 10 heavy smokers. Six of these will be randomly selected to participate in a study (each set of 6 is equiprobable). a) What is the probability that all 6 are heavy smokers? b) What is the probability that all 6 have the same status (non/light/heavy smoking)?
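Because every set of 6 students is equally likely, each part of Problem 2.1 is a ratio of binomial coefficients. The short Python sketch below is one way to check a hand computation; the helper name `p_all_from` is ours, for illustration only, and the block uses just `math.comb` from the standard library.

```python
from math import comb

# Class composition from Problem 2.1.
NON, LIGHT, HEAVY = 20, 15, 10
TOTAL = NON + LIGHT + HEAVY  # 45 students
K = 6                        # each 6-subset is equally likely


def p_all_from(group_size: int, k: int = K) -> float:
    """P(all k selected students come from one group of the given size)."""
    return comb(group_size, k) / comb(TOTAL, k)


p_all_heavy = p_all_from(HEAVY)                                  # part a)
p_same_status = sum(p_all_from(g) for g in (NON, LIGHT, HEAVY))  # part b): disjoint cases add

print(f"a) {p_all_heavy:.3e}")
print(f"b) {p_same_status:.6f}")
```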

Problem 2.2: (Ross p54) Suppose 8 rooks (castles) are randomly laid on a chessboard. What is the probability that they are non-attacking, that is, that no two are in the same row or column?

Problem 2.3: (Rice p25) The first 3 digits of a university telephone exchange are 452. If all sequences of the remaining four digits are equally likely, what is the probability that a randomly selected university number contains 7 distinct digits?

Problem 2.4: (Rice p26) A deck of 52 cards is shuffled thoroughly. What is the probability that all 4 aces are next to each other?

Problem 2.5: Suppose 6 criminals commit a violent crime, and are apprehended by the police. The police take 42 additional innocent people, and put all 48 in a line-up for a witness to select the 6 criminals. Unfortunately for the police, the witness is a pathological liar, and decides to randomly identify 6 from the line-up, none of the line-ees being shown any preference. What is the probability that the witness picks out exactly 2 of the criminals?

Problem 2.6: A monkey is given the 11 Scrabble tiles whose letters make up the word Shakespeare. What is the probability that the monkey will properly spell Shakespeare in a single try?

Problem 2.7: (Rice p27) A group of 60 second graders is to be randomly assigned to two classes of 30 each. Five of the second graders, Bill, Sarah, Michelle, Katy, and Cameron, are close friends. a) What is the probability that they will all be in the same class? b) What is the probability that exactly 4 of them will be in the same class?

Problem 2.8: (Related to Simpson's Paradox) Suppose there are 1800 smokers and 2300 nonsmokers who are taken ill with a particular disease. There are two experimental medications, A and B, and each of the 4100 people gets one or the other. The outcome is: 500 of the 1100 smokers on A got better, 300 of the 700 smokers on B got better, 600 of the 900 nonsmokers on A got better, and 900 of the 1400 nonsmokers on B got better.
a) If we randomly select one of these 1100 smokers on A, and randomly select one of these 700 smokers on B, what are their respective probabilities of recovery, and which has the higher probability? b) If we randomly select one of these 900 nonsmokers on A, and randomly select one of these 1400 nonsmokers on B, what are their respective probabilities of recovery, and which has the higher probability? c) If we randomly select one of these 2000 people treated with A and one of these 2100 people treated with B (without paying attention to their smoking/nonsmoking status), what are their respective probabilities of recovery, and which has the higher probability? (Why is this very strange and vexing?)

Problem 2.9: (Devore ed5, p75) Three molecules of type A, three of type B, three of type C, and three of type D are to be linked together in a line to form a chain molecule. One possible chain molecule is ABCDABCDABCD and another is BCDDAAABDBCC. a) How many such chain molecules are there? (Hint: If the three As were distinguishable from one another, $A_1, A_2, A_3$, and the Bs, Cs, and Ds were also, how many possible molecules would there be? Now think about the effects of removing the distinguishability from the As, then the Bs, etc.) b) Suppose a chain molecule is randomly selected. What is the probability that all three molecules of each type end up next to one another (such as BBBAAADDDCCC)?

Conditional probability and independence

Problem 3.1: (Rice p28) A fair coin is tossed three times. a) What is the probability of two or more heads given that there was at least one head? b) What is the probability of two or more heads given that there was at least one tail?

Problem 3.2: Two dice are rolled and the sum of their faces is less than 6. What is the probability that at least one of the dice came up 3?

Problem 3.3: (Ross p104) Suppose there is a .5 probability that the Queen carries the gene for hemophilia and, if she carries the gene, there is a .5 probability that any individual prince/ess will have hemophilia. If the first three prince/esses don't have hemophilia, what is the probability that the next prince/ess will have hemophilia?

Problem 3.4: (Rice p60) Suppose the probability of living to be older than 70 is .6, and the probability of living to be at least 80 is .2. If a person reaches her 70th birthday, what is the probability that she will celebrate her 80th birthday?
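Problems 3.1 and 3.2 have small, equally likely sample spaces, so a conditional probability can be checked by listing outcomes and counting: $P(E \mid G) = |E \cap G| / |G|$. A minimal Python sketch, with a hypothetical helper `conditional`, that counts with exact fractions:

```python
from fractions import Fraction
from itertools import product


def conditional(outcomes, event, given):
    """P(event | given) over equally likely outcomes, as an exact fraction."""
    restricted = [w for w in outcomes if given(w)]
    return Fraction(sum(1 for w in restricted if event(w)), len(restricted))


def two_plus_heads(w):
    return w.count("H") >= 2


# Problem 3.1: three tosses of a fair coin (8 equally likely sequences).
coins = list(product("HT", repeat=3))
p31a = conditional(coins, two_plus_heads, lambda w: "H" in w)  # at least one head
p31b = conditional(coins, two_plus_heads, lambda w: "T" in w)  # at least one tail

# Problem 3.2: two dice (36 equally likely ordered pairs), given the sum is below 6.
dice = list(product(range(1, 7), repeat=2))
p32 = conditional(dice, lambda w: 3 in w, lambda w: sum(w) < 6)

print(p31a, p31b, p32)
```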
Problem 3.5: (Rosner p69) In the journal article by Colley, Holland, and Corkhill, "Influence of passive smoking and parental phlegm on pneumonia and bronchitis in early childhood," Lancet II (1974) 1031, the authors claim that 7.8% of children with nonsmoking parents have episodes of pneumonia and/or bronchitis in the first year of life, whereas 11.4% of children with one smoking parent and 17.6% of children with two smoking parents, respectively, had such an episode. Assume that in 94% of the population neither parent smokes, in 4% of the population one parent smokes, and in 2% of the population both parents smoke. a) What is the probability that a randomly selected child (from the whole population) has two smoking parents and also has such an episode? b) What is the probability that a randomly selected child (from the whole population) has such an episode? c) If a randomly selected child (from the whole population) has such an episode, what is the conditional probability that both parents smoke?

Problem 3.6: (Rice p28) A life insurance company has high-risk, medium-risk, and low-risk clients in proportions .10, .20, and .70, respectively. The probability of a claim over a 10 year period is .02, .01, and .0025 for these respective populations. If a claim is randomly selected from a 10 year period, what is the probability that it is filed by a high-risk client?

Problem 3.7: (Rice p30) Show that if events $A$ and $B$ are independent then $A$ and $B^c$ are independent, and that $A^c$ and $B^c$ are independent.

Problem 3.8: (Devore ed5, p90) Closing an incision after a particular operation requires 12 stitches, and if any stitch is defective then we call the closing defective. Assume each stitch is defective with the same probability, independently of the others. a) If the probability of a closing being defective is .02, what is the probability of a single stitch being defective? b) Just what should the probability of a stitch being defective be so that the probability of the closing being defective is under .01?

Problem 3.9: (Devore ed5, p92) Suppose I fly to Hawaii on airline X and return on airline Y. Consider events $A = \{X \text{ loses my luggage}\}$ and $B = \{Y \text{ loses my luggage}\}$. If $A$ and $B$ are independent with $P(A) > P(B)$, $P(A \cap B) = .0002$, and $P(A \cup B) = .03$, determine $P(A)$ and $P(B)$.
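Problem 3.5 is the law of total probability followed by Bayes' rule: $P(E) = \sum_i P(G_i)P(E \mid G_i)$ and $P(G_i \mid E) = P(G_i)P(E \mid G_i)/P(E)$, where the $G_i$ are the three parental-smoking groups. The sketch below (the helper name `total_and_posterior` is hypothetical) does this arithmetic with exact fractions so that rounding does not obscure the comparison.

```python
from fractions import Fraction as F


def total_and_posterior(priors, likelihoods):
    """Return P(E) (law of total probability) and the list P(G_i | E) (Bayes' rule)."""
    joints = [p * q for p, q in zip(priors, likelihoods)]   # P(G_i and E)
    p_event = sum(joints)
    return p_event, [j / p_event for j in joints]


# Problem 3.5: groups are (neither, one, both) parents smoke.
priors = [F("0.94"), F("0.04"), F("0.02")]
episode = [F("0.078"), F("0.114"), F("0.176")]

p_both_and_episode = priors[2] * episode[2]                    # part a)
p_episode, posterior = total_and_posterior(priors, episode)    # part b)
p_both_given_episode = posterior[2]                            # part c)

print(float(p_both_and_episode), float(p_episode), round(float(p_both_given_episode), 4))
```

The same helper applies to Problem 3.6 with group proportions (.10, .20, .70) and claim probabilities (.02, .01, .0025).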
Problem 3.10: Below are 4 networks; each link is functional with probability .9, independent of the other links. a) In each of the first 3 networks, what is the probability that node t is reachable from node s (along functional links)? b) In the fourth network, what is the probability that every node is reachable from every other node?

Problem 3.11: (Weiss ed6, p188) In a letter to the editor that appeared in the February 23, 1987 issue of U.S. News and World Report, a reader discussed the issue of space shuttle safety. Each criticality 1 item must have 99.99% reliability, according to NASA standards, meaning that the probability of failure for such an item is .0001. Mission 25, the mission in which the Challenger exploded, had 748 criticality 1 items. (You may assume that failures of criticality 1 items are independent of each other.) a) Determine the probability that none of the criticality 1 items would fail. b) Determine the probability that at least one criticality 1 item would fail.

Bernoulli and binomial random variables

Problem 4.1: (Rosner p68) Suppose that a disease is inherited via an autosomal recessive mode of inheritance. The implication is that each child in the family has an independent $1/4$ probability of inheriting the disease. In a family with 5 children, what is the probability that between 2 and 4 (inclusive) inherit the disease?

Problem 4.2: Suppose the probability that a random white blood cell is a neutrophil is .4. In 8 randomly and independently selected white blood cells, what is the probability that exactly 3 are neutrophils?
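Problems 3.11, 4.1, and 4.2 all reduce to the binomial probability $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$. A small Python sketch for checking the arithmetic (the function name `binom_pmf` is ours, not from the course):

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Problem 3.11: 748 independent items, each failing with probability .0001.
p311a = binom_pmf(0, 748, 0.0001)   # a) no failures
p311b = 1 - p311a                   # b) complement: at least one failure

# Problem 4.1: 5 children, each affected with probability 1/4; k = 2, 3, 4.
p41 = sum(binom_pmf(k, 5, 0.25) for k in range(2, 5))

# Problem 4.2: exactly 3 neutrophils among 8 cells, p = .4.
p42 = binom_pmf(3, 8, 0.4)

print(round(p311a, 4), round(p311b, 4), round(p41, 4), round(p42, 4))
```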

Problem 4.3: (Rice p31) A player throws darts at a target. On each trial, independently of the other trials, she hits the bull's-eye with probability .05. Just how many times should she throw so that her probability of hitting the bull's-eye (at least once) is at least .5?

Problem 4.4: (Rice p62) Appending three extra (binary) bits to a 4-bit word in a particular way (a Hamming code) allows detection and correction of up to 1 error in any of the bits. If each bit has a probability of .05 of being changed during the communication, and if the bits are changed independently of each other, what is the probability that the word is correctly received (that is, 0 or 1 of the 7 bits is in error)? How does this probability compare to the probability that the word will be received correctly if we didn't use check bits, in which case all 4 bits would have to be received correctly for the word to be correct?

Problem 4.5: (Rice p64) In poker, a royal straight flush is an ace, king, queen, jack, and ten, all in the same suit. Suppose an avid poker player plays 100 hands a week, 52 weeks a year, for 20 years. a) What is the probability that the player never sees a royal straight flush dealt to them? b) What is the probability that the player sees 2 or more royal straight flushes?

Problem 4.6: (Rice p63) Consider the random variable $X \sim \text{Binomial}(n, p)$. For what value $k$ is $P(X = k)$ maximized? Show this.

Problem 4.7: (Cincich) In Parade Magazine's "Ask Marilyn" column (Nov 26, 2000) the following question was posed: "I have just tossed a balanced coin 10 times, and I ask you to guess which of the following sequences was the result. One (and only one) of the sequences is genuine: HHHHHHHHHH, HHTTHTTHHH, TTTTTTTTTT." a) Before any coins are flipped, comment on the respective probabilities of each of these three sequences occurring. b) What is the answer to the posed question, and why?

Problem 4.8: In the game Plinko, a ball falls through successive levels $l_0, l_1, l_2, \ldots$.
At level $l_n$, the ball is in one of $n + 1$ positions labelled $r_{n,0}, r_{n,1}, r_{n,2}, \ldots, r_{n,n}$. For all $n, k$, given that the ball passes through position $r_{n,k}$, there is a probability of $\frac12$ that the ball will then pass through $r_{n+1,k}$ and a probability of $\frac12$ that the ball will (instead) then pass through $r_{n+1,k+1}$. Show that in the game Plinko, for all $n, k$, the probability of the ball passing through the position $r_{n,k}$ is the binomial probability $P(X = k)$ where $X \sim \text{Binomial}(n, \frac12)$. (Hint: Verify that the statement is true for all $k$ with $n = 0, 1, 2$. Then assume it is true for all $k$ with some particular value of $n$ and prove that it would be true for all $k$ with $n + 1$. One tool is the fact that $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$, which can be easily verified with elementary algebra.)

Geometric and negative binomial random variables

Problem 5.1: Given the 11 Scrabble letters making up the word Shakespeare, what is the probability that a monkey will misspell Shakespeare on each of the first four tries and then spell it correctly on the fifth try?

Problem 5.2: (Rice p63) If $X$ is a geometric random variable and $n$ and $k$ are positive integers, show that $P(X > n + k - 1 \mid X > n - 1) = P(X > k)$. In light of the construction of a geometric distribution from a sequence of independent Bernoulli trials, how can this be interpreted so that this result is obvious?

Problem 5.3: (Rice p63) Three identical, fair coins are thrown simultaneously again and again until all of them show the same face. What is the probability that the number of simultaneous throws is between 3 and 5, inclusive?

Poisson process and random variable, approximation of binomial

Problem 6.1: (Rosner p114) Suppose the number of patient admissions to a particular hospital division on any given day has a Poisson distribution with parameter 1.6. Just how many daily admissions must the hospital be prepared to handle if they want to be 85% sure not to turn away patients on a given day?

Problem 6.2: (Rosner p114) In the journal article by Sjolie and Green, "Blindness in insulin treated diabetic patients with age at onset less than 30 years," Journal of Chronic Disease 40 (1987) 215-220, it was reported that among insulin-dependent diabetics in their 30s the annual probability of blindness is .0067. In a group of 1000 such patients, what is the probability that there are 4 or more cases of blindness next year?

Problem 6.3: (Rosner p108) From the journal article by Hoff, Berardi, Weiblen, Mahoney-Trout, Mitchell, and Grady, "Seroprevalence of human immunodeficiency virus among childbearing women," New England Journal of Medicine 318 (1988) 525-530: in a particular hospital, 3741 newborns were tested for HIV, and 30 tested positive. In a random sample of 500 from this population, what is the probability that exactly 10 of them have HIV?

Problem 6.4: Suppose the probability of a single person contracting a certain variety of colon cancer over a 20 year period is $12/1000$. Over a particular 20 year period, a particular factory maintained 100 employees, and 4 of them were diagnosed with this cancer. Assuming the factory environment did not affect the cancer risk, how unusual is this occurrence? Specifically, what was the probability of 4 or more cases occurring over this period?

Problem 6.5: Suppose a positive real number $s$ is given. If $X \sim \text{Poisson}(s)$, for what value of $k$ is $P(X = k)$ maximized? Show this.

Problem 6.6: (Ross p144) Suppose a nonnegative integer $k$ is given. What positive real number $s$ maximizes $P(X = k)$, where $X \sim \text{Poisson}(s)$? Show this.

Problem 6.7: (Ross p144) Suppose a positive real number $s$ is given. If $X \sim \text{Poisson}(s)$, show that $P\{X \text{ is even}\} = \frac12\left(1 + e^{-2s}\right)$. (Hint: Add the expansions of $e^{s}$ and $e^{-s}$.)

Problem 6.8: Suppose $n$ people own identical hats, and these hats are thrown into a box and are thoroughly mixed up. Then everyone independently and randomly points to one of the hats and claims it (there may be one person, several people, or no people pointing to any single hat). a) What is the probability that no-one is pointing to their own hat? b) Compute the limit of your answer to part a as $n$ goes to infinity.

The cumulative distribution function, uniform and exponential rv

Problem 7.1: (Rice p65) Suppose the random variable $X$ has density function $f(x) = cx^2$ for $0 \le x \le 1$, and $f(x) = 0$ otherwise. a) Find $c$. b) Find the cdf. c) Compute $P(.1 \le X < .5)$.

Problem 7.2: (Devore ed5, p177) Let $X$ denote the distance in meters that a banner-tailed kangaroo rat moves from its birth site to the first territorial vacancy. In the article "Competition and dispersal from multiple nests," Ecology (1997) 873-883, it is suggested that $X \sim \text{Exponential}(.01386)$, with distance measured in meters. a) What is the probability that the distance is at most 100 meters? b) What is the probability that the distance is at most 200 meters? c) What is the probability that the distance is between 100 and 200 meters?
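For Problem 7.2 the exponential cdf does all the work: $P(X \le x) = 1 - e^{-\lambda x}$ for $x \ge 0$. The sketch below assumes that the parameter .01386 is the rate $\lambda$ (per meter) rather than the mean, so that the mean distance is $1/.01386 \approx 72$ meters; the function name `expon_cdf` is ours.

```python
from math import exp

RATE = 0.01386  # assumed rate lambda (per meter) for Problem 7.2


def expon_cdf(x: float, lam: float = RATE) -> float:
    """F(x) = P(X <= x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


p_100 = expon_cdf(100)         # a)
p_200 = expon_cdf(200)         # b)
p_between = p_200 - p_100      # c) single points carry no probability for a continuous rv
print(round(p_100, 4), round(p_200, 4), round(p_between, 4))
```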

Problem 7.3: (Ross p187) The median of a continuous random variable $X$ is the value $x$ such that $F_X(x) = \frac12$; in other words, it is a value that $X$ is equally likely to be greater than and less than. Compute the median of $X \sim \text{Exponential}(\lambda)$.

Problem 7.4: (Ross p193) Suppose $Y \sim \text{Uniform}(-2, 2)$. What is the probability that the polynomial $\frac14 t^2 + Yt + 1$ has real roots? Hint: Look at the discriminant; for what values of $Y$ is it negative?

Problem 7.5: (Rice p65) Suppose $X \sim \text{Exponential}(\lambda)$. Let $Y$ be the discrete random variable defined by $Y = k$ if $k - 1 \le X < k$ for $k = 1, 2, 3, \ldots$. Find the probability mass function of $Y$. Now, identify the distribution as one that we have already seen.

Problem 7.6: (Ross p188) Suppose the random variable $Y$ has cdf $F_Y$. Show that the random variable $X := F_Y(Y)$ has distribution $\text{Uniform}(0, 1)$. (For simplicity, assume $F_Y$ is strictly increasing. Find the cdf of $X$, $F_X(t)$, by simplifying with the identity $F_Y(F_Y^{-1}(t)) = t$ for all $t$ in the interval $[0, 1]$.)

Exponential's memoryless feature, relation to Poisson process

Problem 8.0: Consider a Poisson process with rate $\lambda$ starting at time $t = 0$. Let the random variable $X$ be the time when the third occurrence happens. Find and simplify the density of $X$. (Hint: Mimic the way we found the density where $X$ was the time of the first occurrence, but include the possibilities of 0, 1, and 2 occurrences in the interval $[0, t]$.)

Gamma and chi-square random variables, normal random variable

Problem 9.0: In the previous problem, identify the distribution and parameters of $X$ by recognizing its density.

Problem 9.1: (Walpole p174) Show that $\Gamma(\frac12) = \sqrt{\pi}$. (Hint: Use the substitution $y = \sqrt{2x}$ in the gamma function's integral.)

Problem 9.2: For $k = 1, 2, 3$ compute $\Gamma(k + \frac12)$, and generalize the pattern to express $\Gamma(k + \frac12)$ for all positive integers $k$. (The generalization does not need a formal proof.)
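A numerical check for Problems 9.1 and 9.2, using only Python's standard `math` module: after the substitution $y = \sqrt{2x}$ (so $x = y^2/2$ and $dx = y\,dy$), $\Gamma(\frac12) = \int_0^\infty x^{-1/2}e^{-x}\,dx$ becomes $\sqrt{2}\int_0^\infty e^{-y^2/2}\,dy$. The sketch approximates that integral with the trapezoidal rule on a truncated interval (the cutoff 10 and the step count are arbitrary choices), and prints `math.gamma(k + 0.5)` so you can compare your hand-derived values for $k = 1, 2, 3$.

```python
import math


def half_gamma_by_substitution(upper: float = 10.0, steps: int = 100_000) -> float:
    """Approximate Gamma(1/2) = sqrt(2) * integral_0^inf exp(-y^2/2) dy.

    Uses the trapezoidal rule on [0, upper]; the tail beyond upper = 10 is below 1e-21.
    """
    def integrand(y: float) -> float:
        return math.exp(-y * y / 2)

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return math.sqrt(2) * h * total


print(half_gamma_by_substitution(), math.sqrt(math.pi))

# Reference values for your hand computations in Problem 9.2.
for k in (1, 2, 3):
    print(k, math.gamma(k + 0.5))
```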

Problem 9.3: (Devore p169) The article "Reliability of domestic waste biofilm reactors," J. of Envir. Eng. (1995) 785-790, suggests that the substrate concentration (mg/cm³) of influent to a reactor is normally distributed with $\mu = .30$ and $\sigma = .06$. a) What is the probability that the concentration exceeds .25? b) What is the probability that the concentration is at most .10? c) Give a number $x$ such that the probability is exactly .05 that the concentration is higher than $x$.

Problem 9.4: Suppose $X \sim \text{Normal}(\mu, \sigma^2)$. What is the distribution of $\left(\frac{X - \mu}{\sigma}\right)^2$?
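The normal probabilities in Problem 9.3 can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), which provides `cdf` and `inv_cdf`. This stands in for a table lookup of $\Phi$; it is a sketch for checking answers, not a substitute for the standardization step.

```python
from statistics import NormalDist

# Problem 9.3: substrate concentration (mg/cm^3) ~ Normal(mu = .30, sigma = .06).
conc = NormalDist(mu=0.30, sigma=0.06)

p_exceeds = 1 - conc.cdf(0.25)    # a) P(X > .25)
p_at_most = conc.cdf(0.10)        # b) P(X <= .10)
x_c = conc.inv_cdf(1 - 0.05)      # c) the x with P(X > x) = .05

print(round(p_exceeds, 4), round(p_at_most, 6), round(x_c, 4))
```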

Problem 9.5: (Ross p187) Suppose $Z \sim \text{Normal}(0, 1^2)$. Show that for every real number $c > 0$ it holds that $\lim_{x \to \infty} P\left(Z \ge x + \frac{c}{x} \,\middle|\, Z \ge x\right) = e^{-c}$. (Hint: Write as a ratio involving the cdf, then use l'Hôpital's rule.)

Expected value of a random variable and functions of a rv


Problem 10.1: Suppose $X \sim \chi^2_n$. What are $E(X)$ and $\sigma_X^2$?

Problem 10.2: (Rice p156) Suppose $n$ people have throat cultures, and the cultures are then completely mixed up. If we randomly pair off the $n$ people's names with the $n$ cultures, what is the expected number of correct labels? (Hint: Express as a sum of (perhaps dependent) random variables.)

Problem 10.3: (Rice p155) Suppose $X$ is a nonnegative continuous random variable with cdf $F$. a) Show that $E(X) = \int_0^\infty [1 - F(x)]\,dx$. (Hint: One approach is to create a double integral and change the order of integration.) b) Use part a) to compute the expected value of an exponential random variable.

Problem 10.4: (Rice p154) Suppose the random variable $X$ has cdf $F(x) = 1 - x^{-\alpha}$ for $x \ge 1$. Find $E(X)$ for those values of $\alpha$ such that it exists. Find $\sigma_X^2$ for those values of $\alpha$ such that it exists.
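Problem 10.2's hint suggests writing the number of correct labels as a sum of indicator variables. To test whatever value that argument gives, and to see how (or whether) it depends on $n$, a Monte Carlo sketch can shuffle the cultures many times; the trial count and seed below are arbitrary choices, and `mean_correct_labels` is a hypothetical helper name.

```python
import random


def mean_correct_labels(n: int, trials: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected number of correct labels when n cultures
    are matched to n names by a uniformly random permutation."""
    rng = random.Random(seed)
    cultures = list(range(n))
    total = 0
    for _ in range(trials):
        rng.shuffle(cultures)  # shuffling any arrangement yields a uniform permutation
        total += sum(1 for person, culture in enumerate(cultures) if person == culture)
    return total / trials


for n in (2, 5, 20):
    print(n, round(mean_correct_labels(n), 3))
```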

Problem 10.5: (Rice p154) For a fixed $\alpha$ such that $-1 \le \alpha \le 1$, let the random variable $X$ have density $f(x) = \frac{1 + \alpha x}{2}$ for all $x$ such that $-1 \le x \le 1$. Find $E(X)$ and $\sigma_X^2$.

Problem 10.6: (St. Petersburg Paradox) Suppose we flip a fair coin repeatedly until it comes up heads. If the first heads shows up on the $i$th flip then you will be paid $2^i$ dollars. a) What are your expected winnings? b) Is this counterintuitive?

Problem 10.7: (Rice p155) Two sticks are each 5 inches long. The random variable $X$ has a $\text{Uniform}(0, 5)$ distribution, and we plan to break each stick into two pieces of length $X$ and $5 - X$ inches. From these four pieces of the two sticks we will make a rectangular picture frame. What is the expected area of this frame?

Problem 10.8: Suppose a disease outbreak will spread to a disk (centered at a particular location) with radius $X \sim \text{Normal}(20, 5^2)$. What is the expected area of the outbreak?
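Problems 10.7 and 10.8 ask for the expected value of a function of a random variable, which in general is not the function evaluated at the expected value. A Monte Carlo sketch for checking your exact answers, assuming (as the wording suggests) that the same $X$ breaks both sticks in Problem 10.7, so the frame is $X$ by $5 - X$ inches; helper names, seed, and trial count are our own choices.

```python
import math
import random


def mc_mean(sample, trials: int = 200_000, seed: int = 1) -> float:
    """Average sample(rng) over many independent draws from a seeded generator."""
    rng = random.Random(seed)
    return sum(sample(rng) for _ in range(trials)) / trials


def frame_area(rng) -> float:
    """Problem 10.7: one X ~ Uniform(0, 5) breaks both sticks, giving an X by (5 - X) frame."""
    x = rng.uniform(0, 5)
    return x * (5 - x)


def outbreak_area(rng) -> float:
    """Problem 10.8: disk of radius R ~ Normal(20, 5^2)."""
    return math.pi * rng.gauss(20, 5) ** 2


print(round(mc_mean(frame_area), 3), round(mc_mean(outbreak_area), 1))
```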
Problem 10.9: (Rice p156) Suppose $X \sim \text{Poisson}(\lambda)$. Find $E\left(\frac{1}{X+1}\right)$.

Problem 10.10: (Rice p156) Suppose $X \sim \text{Uniform}(1, 2)$. Compute $E\left(\frac{1}{X}\right)$ and $\frac{1}{E(X)}$. Are they equal?

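Problem 10.11: (Rice p156) Suppose $X \sim \text{Gamma}(\alpha, \lambda)$. Compute $E\left(\frac{1}{X}\right)$ for all values $\alpha$ and $\lambda$ for which it is defined.

Problem 10.12: (Walpole p193) Suppose $X \sim \text{Gamma}(\alpha, 1)$. Show that for all nonnegative integers $k$ it holds that $E(X^k) = \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)}$.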

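Problem 10.13: (Ross p329) Suppose $Z \sim \text{Normal}(0, 1^2)$ and let the real number $y$ be given and fixed. Consider the new random variable defined by
$$X := \begin{cases} Z & \text{if } Z > y, \\ 0 & \text{otherwise.} \end{cases}$$
Show that $E(X) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}$.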

Problem 10.14: (Ross p334) Consider a gambler who at each gamble either wins or loses the bet with probability $p$ and $1 - p$, respectively. When $p > \frac12$, a popular gambling system, known as the Kelly strategy, is to always bet the fraction $2p - 1$ of your current fortune. Compute the expected value after $n$ gambles of a gambler who starts with $x$ units and employs the Kelly strategy.

Joint distributions

Problem 11.1: (Ross p240) Two fair dice are rolled. a) Find the joint probability mass function when $X$ is the larger of the two values and $Y$ is the smaller of the two values. b) Compute the marginal mass functions of $X$ and of $Y$. c) Are $X$ and $Y$ independent? Show this.

Problem 11.2: (Ross p241) Suppose the joint probability density function of random variables $X$ and $Y$ is given by $f(x, y) = c(y^2 - x^2)e^{-y}$ for $-y \le x \le y$, $0 < y < \infty$. a) Find $c$. (Hint: The joint density must integrate to 1.) b) Find the marginal densities of $X$ and $Y$.

Problem 11.3: Suppose the joint density function of the random variables $X$ and $Y$ is given by $f(x, y) = \frac14$ for $0 \le x \le 2$, $0 \le y \le 2$. Compute $P(Y > X)$. (Hint: Just identify the correct region to integrate the density over.)

Problem 11.4: (Ross p241) The joint probability density function of $X$ and $Y$ is given by $f(x, y) = \frac67\left(x^2 + \frac{xy}{2}\right)$ for $0 < x < 1$, $0 < y < 2$. a) Find the marginal densities of $X$ and $Y$. b) Are $X$ and $Y$ independent? c) Compute $P(X \le \frac12 \text{ and } Y < \frac12)$. d) Compute $P(Y > X)$.

Problem 11.5: (Ross p241) Suppose the joint density of $X$ and $Y$ is given by $f(x, y) = e^{-(x+y)}$ for $0 \le x < \infty$, $0 \le y < \infty$. a) Compute the marginal densities of $X$ and $Y$. b) Are $X$ and $Y$ independent? Show this. c) Compute $P(Y > X)$.

Problem 11.6: (Buffon's Needle Problem) Endless, parallel East-to-West lines are spaced 2 inches apart on the ground, and a needle of length 1 inch is randomly tossed on the ground. Compute the probability that the needle touches a line. (Hint: The answer is $1/\pi$. Consider the random variable $X$ = the North-South distance of the southernmost tip of the needle from the line immediately to its North. Consider the random variable $Y$ = the angle of the needle relative to the East-West lines. It is reasonable to think of these random variables as being independent, and the answer is obtained by integrating over the region of outcomes in which the needle touches the line.)

Variance of a random variable, linear combinations of random variables

Problem 12.1: Suppose $X$ is any random variable with expected value $\mu$ and standard deviation $\sigma$. Compute the expected value and standard deviation of the random variable $\frac{X - \mu}{\sigma}$.

Problem 12.2: Suppose $X \sim \text{Uniform}(0, 1)$. a) Find the variance of $X$ using the definition of variance. b) Without integration, use the variance of $X$ to compute the variance of $Y \sim \text{Uniform}(0, 3)$ and $Z \sim \text{Uniform}(-1.5, 1.5)$.

Problem 12.3: (Rice p156) Suppose $X$ and $Y$ are independent random variables, each with the same variance $\sigma^2$. Compute $\operatorname{Cov}(X + Y, X - Y)$.

The moment generating function, and uses

Problem 13.1: (Walpole p193) Let $n$ be a positive integer and suppose the random variable $X$ satisfies $P(X = x) = \frac1n$ for $x = 1, 2, 3, \ldots, n$. Show that the mgf for $X$ is $M(t) = \frac{e^t(1 - e^{nt})}{n(1 - e^t)}$.

Problem 13.2: Directly compute the mgf for $X \sim \text{Binomial}(n, p)$ without considering $X$ as a sum of independent Bernoulli random variables. (Hint: You will use the binomial expansion of $(a + b)^n$.)

Problem 13.3: (Rice p160) Suppose $X$ is a continuous random variable with density $f(x) = 2x$ for $0 \le x \le 1$. a) Compute the mgf $M(t)$ for $X$. b) Verify that $E(X) = M'(0)$ and $E(X^2) = M''(0)$.

Problem 13.4: (Rice p174) Show that the moment generating function of a binomial random variable with parameters $(n, p)$ converges to the mgf of a Poisson random variable with parameter $s$ when $s := np$ is a fixed number and $n \to \infty$ (i.e., $p \to 0$). (Hint: Recall that $\lim_{i \to \infty}\left(1 + \frac{\lambda}{i}\right)^i = e^{\lambda}$ for every real number $\lambda$.)
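A numerical illustration of Problem 13.4 (not a proof): with $p = s/n$, compare the binomial mgf $(1 - p + pe^t)^n$ with the Poisson mgf $e^{s(e^t - 1)}$ at a fixed $t$ as $n$ grows. The values $s = 2$ and $t = 0.5$ are arbitrary choices for the sketch.

```python
import math


def binomial_mgf(t: float, n: int, p: float) -> float:
    """M(t) = (1 - p + p e^t)^n for X ~ Binomial(n, p)."""
    return (1 - p + p * math.exp(t)) ** n


def poisson_mgf(t: float, s: float) -> float:
    """M(t) = exp(s (e^t - 1)) for X ~ Poisson(s)."""
    return math.exp(s * (math.exp(t) - 1))


s, t = 2.0, 0.5
gaps = []
for n in (10, 100, 1_000, 10_000, 100_000):
    gaps.append(abs(binomial_mgf(t, n, s / n) - poisson_mgf(t, s)))
    print(f"n = {n:>7,}: |binomial mgf - Poisson mgf| = {gaps[-1]:.2e}")
```

Since $1 - p + pe^t = 1 + \frac{s(e^t - 1)}{n}$, this is exactly the limit in the hint with $\lambda = s(e^t - 1)$.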

Problem 13.5: (Rice p160) Find the mgf of a geometric random variable and use it to compute the mean and variance.

Problem 13.6: (Rice p160) Find the mgf of a negative binomial random variable and use it to compute the mean and variance.

Problem 13.7: (Walpole p193) Suppose $X$ is a continuous random variable. By expanding $e^{tx}$ in a Maclaurin series and integrating term-by-term, show that
$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx = 1 + E(X)\,t + \frac{E(X^2)}{2!}\,t^2 + \frac{E(X^3)}{3!}\,t^3 + \cdots.$$

Problem 13.8: What distribution does the random variable $X$ have if for all positive integers $n$ the $n$th moment of $X$ is $E(X^n) = n!$? (Hint: Use the previous problem.)

Problem 13.9: Suppose $X_1, X_2, X_3 \sim \text{Exponential}(\lambda)$ are independent and $X = X_1 + X_2 + X_3$. Using the mgf, what distribution does $X$ have, and with what parameters?

Chebyshev's Theorem, the Law of Large Numbers

Problem 14.1: Suppose $X \sim \text{Exponential}(\lambda)$ for some $\lambda > 0$. For each of $k = 1, 2, 3$ bound $P(\mu_X - k\sigma_X \le X \le \mu_X + k\sigma_X)$ with Chebyshev's Theorem, and then compute these probabilities exactly and compare the bound to the exact answer. (Hint: Recall what $\mu_X$ and $\sigma_X$ are for an exponential random variable, and integrate an appropriate density.)

Problem 14.2: Suppose $X \sim \chi^2_n$. Use Chebyshev's Theorem to give an interval in which $X$ will fall with probability at least $\frac{15}{16}$. (The endpoints of this interval should be functions of $n$.) Hint: Recall the mean and variance of a chi-square random variable.

Problem 14.3: Suppose there is a sequence of rvs $X_1, X_2, X_3, \ldots$ such that, for all positive integers $n$, $X_n \sim \chi^2_n$. Show that for all $\epsilon > 0$ it holds that $\lim_{n \to \infty} P\left(\left|\frac{X_n}{n} - 1\right| > \epsilon\right) = 0$.

The Central Limit Theorem, normal approximation to binomial

Problem 15.1: Suppose $X \sim \chi^2_n$ for some very large integer $n$. What other distribution will $X$ approximately have, and why? (Specify the distribution and the parameters.)

Problem 15.2: Suppose the probability that a randomly selected person is left-handed is .10. In a class of 250 students, just how many left-handed seats should we have to be 95% sure that no left-handed person goes without a seat?

Problem 15.3: When I write checks I record the amount rounded off to the nearest dollar. Assume for simplicity that $X$ = the roundoff error from a single transaction in cents has the continuous distribution $X \sim \text{Uniform}(-50, 50)$. After 100 independent transactions, what is the probability that my financial records differ from my account balance by more than 7 dollars (i.e. 700 cents)?

Estimation, estimating the mean when variance is known (deferred until later)


Estimating population proportion

Problem 16.1: Suppose we wish to estimate the proportion of a particular patient's white blood cells that are neutrophils. To do this, we randomly select 1000 white blood cells from the patient and suppose we find that 381 of them are neutrophils. a) Give a 95% confidence interval for the actual proportion of the patient's white blood cells that are neutrophils. b) In designing this study, just how big a sample size do we need to take so that the 95% confidence interval has margin of error .15? (For this last computation, you may use the intelligent guess that the true proportion is approximately .4, since that is the usual proportion in healthy people.)

Problem 16.2: Suppose you wish to approximate $\frac{1}{\pi}$ by performing the Buffon Needle experiment $n$ times; if $x$ will be the number of times that the needle will hit a line, then you will approximate $\frac{1}{\pi}$ by $\frac{x}{n}$. How big does $n$ have to be in order to be 95% sure that $\frac{x}{n}$ will be within .005 of $\frac{1}{\pi}$?
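Before doing the sample-size calculation in Problem 16.2, it may help to see how noisy $x/n$ actually is. The sketch below simulates the experiment of Problem 11.6 using the variables in its hint. As an implementation choice (not something the problems require), the needle's direction is drawn by rejection sampling from the unit disk, so the simulation itself never uses the value of $\pi$, in the spirit of Problem 1.1. The function name, seed, and $n$ are our own choices.

```python
import random


def buffon_hits(n: int, seed: int = 2010) -> int:
    """Count how many of n simulated tosses touch a line (1-inch needle, lines 2 inches apart).

    Uses the variables from the hint to Problem 11.6: X, the north-south distance from the
    southernmost tip to the next line north, is uniform on [0, 2], and the needle's
    north-south extent is |sin(angle)|. The direction is drawn by rejection sampling from
    the unit disk, so the value of pi is never used.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        while True:  # a uniform point in the unit disk gives a uniform direction
            u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
            r2 = u * u + v * v
            if 0 < r2 <= 1:
                break
        extent = abs(v) / r2 ** 0.5   # |sin(angle)| for a needle of length 1
        if rng.uniform(0, 2) <= extent:
            hits += 1
    return hits


n = 100_000
x = buffon_hits(n)
print(f"x/n = {x / n:.4f}; counting alone gives the fraction n/x = {n}/{x} for pi")
```

Rerunning with different seeds shows the run-to-run spread of $x/n$ that the sample-size calculation has to control.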

Problem 16.3: (Devore ed5, p308) It is important that face masks used by firefighters be able to withstand high temperatures because firefighters commonly work in temperatures of 200-500 degrees F. In a test of one type of mask, 22 out of 110 masks had lenses pop out at 250 degrees F. Construct a 90% confidence interval for the population proportion of these masks whose lenses would pop out at 250 degrees.

Problem 16.4: (Devore ed5, p287) Consider the next one thousand 95% confidence intervals that a statistical consultant will obtain for various clients. Suppose the data sets on which the intervals are based are selected independently of one another. What is the probability that between 940 and 960 of these intervals contain the parameter of interest?

Problem 16.5: (Rice p226) Suppose the random variable $\hat p$ is the sample proportion from a sample of size $n$ from a particular dichotomous population. Show that $\frac{\hat p(1 - \hat p)}{n - 1}$ is an unbiased estimator of $\sigma^2_{\hat p}$.
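For Problems 16.1 and 16.3, here is a sketch of the usual large-sample interval $\hat p \pm z_{\alpha/2}\sqrt{\hat p(1 - \hat p)/n}$, assuming that is the interval used in class (some texts present a slightly different score interval, which this does not implement). `NormalDist().inv_cdf` from the Python standard library supplies $z_{\alpha/2}$; the function name `wald_interval` is ours.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # upper alpha/2 critical value
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


print(wald_interval(381, 1000, 0.95))   # Problem 16.1 a)
print(wald_interval(22, 110, 0.90))     # Problem 16.3
```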

Problem 16.6: Suppose there is some population such that an unknown proportion $p$ has some particular characteristic. To test $H_0: p = p_0$ against the two-sided alternative $H_a: p \ne p_0$, we compute the sample proportion $\hat p$ for a large, random sample. Instead of the usual test statistic
$$Z = \frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}},$$
suppose we decide to use the test statistic
$$X := \left(\frac{\hat p - p_0}{\sqrt{p_0(1 - p_0)/n}}\right)^2 = Z^2.$$
a) Under $H_0$, what distribution does $X$ have? (Hint: What distribution does $Z$ have?) b) What would be a reasonable rejection region if we want the test of hypothesis to have a level of significance $\alpha$? c) With elementary algebra, show that this test statistic satisfies
$$X = \frac{(o_1 - e_1)^2}{e_1} + \frac{(o_2 - e_2)^2}{e_2},$$
where $o_1$ and $o_2$ are the number of members of the sample with the characteristic and without the characteristic, respectively, and $e_1$ and $e_2$ are, respectively, $np_0$ and $n(1 - p_0)$.

Estimating variance, the t-dist, estimating mean when variance unknown

Problem 17.1: (Cincich p348) Organizational Science (Mar, Apr 2000) published a study of the use of communications media by mid-level managers. One question asked of 426 managers was how many email messages they sent in a typical week. The sample mean was 9.81 with a sample standard deviation of 14.41 messages. a) Compute a 95% confidence interval for the population mean. b) If the population standard deviation was indeed around 14.41, then how big of a sample would be needed to get a margin of error .5?

Problem 17.2: (Rosner p203) In the journal article by Arora and Rochester, "Effect of chronic airflow limitation on sternocleidomastoid muscle thickness," Chest 85 (1984) 58s-59s, the authors sampled 32 men with chronic airflow limitation, and the sample mean of their triceps skin-fold thickness (TSFT) was .92 and the sample standard deviation of TSFT was .4. a) Give a 95% confidence interval for the (chronic airflow limitation) population mean TSFT. b) The authors also sampled 40 healthy men, and obtained a sample TSFT mean of 1.35 and sample TSFT standard deviation of .5; give a 95% confidence interval for the healthy population mean TSFT. c) For the chronic airflow limitation population, give a 95% confidence interval for the population TSFT standard deviation, assuming population normality. d) Repeat Part c for the healthy population.

Problem 17.3: (Rosner p205) Suppose we wish to estimate the concentration (μg/mL) of a specific dose of ampicillin in the urine after a certain period of time. We recruit 25 volunteers who have received ampicillin, and find that they have a mean concentration 7.0 with sample standard deviation 2.0. Assume normality of population. a) Give a 95% confidence interval for the population mean concentration.
b) Give a 90% confidence interval for the population variance of the concentration.
c) If, instead, we knew that the population standard deviation of the concentration was exactly 2.0, how large a sample would be needed to get a 95% confidence interval for the population mean with a margin of error of .25?

Problem 17.4: (Rosner p204) A study of psychological changes in a cohort of dialysis patients with end-stage renal disease was conducted by Oldeburg, Macdonald, and Perkins, "Prediction of quality of life in a cohort of end-stage renal disease patients," Journal of Clinical Epidemiology 41 (1988) 555-564. A sample of 102 patients had a sample mean Psychological Adjustment to Illness Scale (PAIS) score of 36.50 with sample PAIS standard deviation 16.08. Also, their serum phosphate concentration (mmol/L) had sample mean 1.68 with sample standard deviation .47. For each of serum phosphate and PAIS, give a 90% confidence interval for the population mean.

Problem 17.5: (Devore p303) A study of the ability of individuals to walk in a straight line ("Can we really walk straight?" Amer. J. of Physical Anthro. (1992) 19-27) reported data on cadence (strides per second) for a sample of n = 20 randomly selected healthy men. The sample mean and sample standard deviation were .9255 and .0809, respectively. Compute a 90% confidence interval for the population mean and another for the population standard deviation. (Assume normality.)

Problem 17.6: (Devore ed5,p307) The reaction time (RT) to a stimulus is the interval of time commencing with stimulus presentation and ending with the first discernible movement of a certain type. The article "Relationship of reaction time and movement time in a gross motor skill," Perceptual and Motor Skills (1973) 453-454, reports that the sample mean RT for 16 experienced swimmers to a pistol start was .214 sec and the sample standard deviation was .036 sec. Give 95% confidence intervals for the population mean and the population standard deviation, assuming normality.

Problem 17.7: (Devore ed5,p307) A triathlon consisting of swimming, cycling, and running is one of the more strenuous amateur sporting events. The article "Cardiovascular and thermal response of triathlon performance," Medicine and Science in Sports and Exercise (1988) 385-389, reports on a research study involving nine male triathletes. Maximum heart rate (beats/min) was recorded during performance of each of the three events. For swimming, the sample mean and sample standard deviation were 188.0 and 7.2, respectively.
Assuming normality, construct a 99% confidence interval for the population (triathletes' maximum heart rate while swimming) mean and another for the population standard deviation.

Problem 17.8: (Devore ed5,p308) In the article "The essential amino acid requirements of infants," Amer. J. Nutrition (1964) 322-330, a study was conducted wherein for each of six normal infants the amount of the amino acid alanine (mg/100mL) was determined while the infants were on an isoleucine-free diet, resulting in the following data:

2.84 3.54 2.80 1.44 2.94 2.70

Compute a 95% confidence interval for the population mean, assuming normality.
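The interval estimates in Problems 17.1 through 17.8 rest on two standard formulas: x̄ ± t_{α/2,n−1}·s/√n for the mean when σ is unknown, and √((n−1)s²/χ²_{α/2,n−1}) to √((n−1)s²/χ²_{1−α/2,n−1}) for σ when the population is normal. The sketch below assumes the critical values are read from printed t and chi-square tables, since the Python standard library has no t or chi-square quantile function; the helper names t_interval and sd_interval are hypothetical, not from any course library.

```python
import math
import statistics


def t_interval(xbar, s, n, t_crit):
    """Two-sided CI for a population mean when sigma is unknown.

    t_crit is the upper-tail critical value t_{alpha/2, n-1}, supplied by the
    caller (e.g. read from a t table) because the stdlib has no t quantiles.
    """
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def sd_interval(s, n, chi2_upper, chi2_lower):
    """Two-sided CI for a normal population's standard deviation.

    chi2_upper is chi^2_{alpha/2, n-1} (upper-tail area alpha/2) and
    chi2_lower is chi^2_{1-alpha/2, n-1}; both come from a chi-square table.
    """
    sum_sq = (n - 1) * s ** 2
    return math.sqrt(sum_sq / chi2_upper), math.sqrt(sum_sq / chi2_lower)


# Problem 17.8: raw data, n = 6, so 5 degrees of freedom; t_{.025,5} = 2.571.
alanine = [2.84, 3.54, 2.80, 1.44, 2.94, 2.70]
xbar = statistics.mean(alanine)
s = statistics.stdev(alanine)  # sample standard deviation, divisor n - 1
print(t_interval(xbar, s, len(alanine), t_crit=2.571))

# Problem 17.5: 90% CI for sigma with n = 20, using chi^2_{.05,19} = 30.144
# and chi^2_{.95,19} = 10.117.
print(sd_interval(0.0809, 20, chi2_upper=30.144, chi2_lower=10.117))
```

Note that the larger chi-square critical value produces the lower endpoint for σ, which is why the two table values are passed by name.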

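Problems 17.1(b) and 17.3(c) invert the known-σ margin of error E = z_{α/2}·σ/√n, giving n ≥ (z_{α/2}·σ/E)², rounded up. A minimal sketch using only the standard library's NormalDist for the normal quantile; sample_size_for_margin is a hypothetical helper name, and 95% confidence is assumed for 17.1(b) since that part does not restate a level.

```python
import math
from statistics import NormalDist


def sample_size_for_margin(sigma, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (sigma treated as known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}, about 1.96 at 95%
    return math.ceil((z * sigma / margin) ** 2)


print(sample_size_for_margin(sigma=2.0, margin=0.25))   # Problem 17.3(c): 246
print(sample_size_for_margin(sigma=14.41, margin=0.5))  # Problem 17.1(b), assuming 95%
```

Rounding up rather than to the nearest integer is what guarantees the half-width does not exceed the target margin.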
Test of hypothesis

Problem 18.1: (Rosner p263) Suppose the incidence of MI (myocardial infarction) per year was 5 per 1000 among people 45 to 54 years old in 1970. To look at changes in incidence over time, 5000 people in this age bracket were followed for one year, starting in 1980; fifteen new cases of MI were found. At a level of significance .05, does this indicate that the rate had changed since 1970?

Problem 18.2: (Rosner p263) Suppose the mean and standard deviation of serum creatinine level in the general population are 1.0 and 0.4 mg/dL, respectively, and assume normality. We test a new antibiotic on 12 patients, and after 24 hours their sample mean and sample standard deviation creatinine levels are 1.2 and .6, respectively.
a) At a level of significance .05, does this indicate that the population (of people on the antibiotic for 24 hours) mean creatinine level is higher than for the general population?
b) At a level of significance .05, does this indicate that the population (of people on the antibiotic for 24 hours) standard deviation of creatinine level is higher than for the general population?

Problem 18.3: (Rosner p265) Iron deficiency anemia is an important nutritional health problem in the United States. Suppose the mean daily iron intake among a large population of 9 to 11 year olds is 14.44 mg. A dietary assessment was performed on a random sample of 51 children ages 9 to 11 whose families live below the poverty level. The sample mean and sample standard deviation of daily iron intake were found to be 12.50 and 4.75, respectively. Does this indicate, at a level of significance .05, that the mean iron intake of the population of children below the poverty level is lower than the national mean?
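The tests on a mean and a standard deviation in this section use the statistics t = (x̄ − μ0)/(s/√n) with n − 1 degrees of freedom and (n − 1)s²/σ0², which is chi-square with n − 1 degrees of freedom under a normal population. A sketch under the assumption that the rejection decision is made by comparing against table values; the function names are hypothetical.

```python
import math


def t_statistic(xbar, s, n, mu0):
    """One-sample t statistic for H0: mu = mu0; compare with t_{alpha, n-1}."""
    return (xbar - mu0) / (s / math.sqrt(n))


def chi2_variance_statistic(s, n, sigma0):
    """(n - 1) s^2 / sigma0^2 for H0: sigma = sigma0; chi-square with n - 1 df
    when the population is normal."""
    return (n - 1) * s ** 2 / sigma0 ** 2


# Problem 18.2 data: n = 12, sample mean 1.2, sample sd .6; H0 values 1.0 and 0.4.
t = t_statistic(xbar=1.2, s=0.6, n=12, mu0=1.0)
chi2 = chi2_variance_statistic(s=0.6, n=12, sigma0=0.4)
# Upper-tailed tests at .05: table values are t_{.05,11} = 1.796 and chi^2_{.05,11} = 19.675.
print(t, chi2)
```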
Problem 18.4: (Devore ed5,p333) Minor surgery on horses under field conditions requires a reliable short-term anesthetic producing good muscle relaxation, minimal cardiovascular and respiratory changes, and a quick, smooth recovery with minimal aftereffects so that horses can be left unattended. The article "A field trial of ketamine anesthesia in a horse," Equine Vet. J. (1984) 176-179, reports that for a sample of n = 73 horses to which ketamine was administered under certain conditions, the sample mean recumbency (lying-down) time was 18.86 min, and the sample standard deviation was 8.6 min. Assuming normality:
a) Does this data indicate, at a level of significance .10, that the population mean recumbency time under these conditions is less than 20 minutes?
b) At a level of significance .10, is the population standard deviation less than 15 minutes?

Problem 18.5: (Devore ed5,p333) The recommended daily dietary allowance for zinc among males older than age 50 years is 15 mg/day. The article "Nutrient intakes and dietary patterns of older Americans: A national study," J. Gerontology (1992) M145-150, reports the following summary data on intake for a sample of males age 65-74 years: n = 115, x̄ = 11.3, s = 6.43. Does this data indicate, at a level of significance .10, that the mean daily intake of zinc for 65-74 year old males is less than the recommended allowance?

Problem 18.6: (Devore ed5,p347) A spectrophotometer used for measuring CO concentration (ppm, by volume) is checked for accuracy by taking readings on a manufactured gas (called span gas) in which the CO concentration is very precisely controlled at 70 ppm. If the readings suggest the spectrophotometer is not working properly, then it will have to be recalibrated. Assume that, if properly calibrated, the measured concentration for span gas is normally distributed. On the basis of six readings, 85, 77, 82, 68, 72, and 69, does the machine need to be recalibrated? Use a level of significance .05.

Problem 18.7: Suppose Machine E and Machine F produce widgets; the probability that any single widget is nondefective is .9 for Machine E and .8 for Machine F, independent of the other widgets. The machine of origin for a warehouse of widgets is unknown; consider H0 that they are from Machine E vs Ha that they are from Machine F. It is decided that a random sample of 160 widgets will be selected from this warehouse, and H0 will be rejected if X = the number of nondefective widgets is less than 134.
a) Compute the level of significance for this test.
b) Compute β for this test.
c) Compute the power of this test.

P-values

Problem 19.1: Compute P-values for all of the tests of hypothesis in the previous section.
Problem 19.2: In a test of hypothesis, suppose that we compute P = the p-value of the observed outcome, and we then ignore the original test statistic and use P as the test statistic instead (after all, P is a random variable). What rejection region should we use for this new test statistic P if we want a level of significance α?
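Returning to Problem 18.7: its level of significance, β, and power are all binomial tail probabilities, since X is Binomial(160, .9) under H0 and Binomial(160, .8) under Ha. A minimal stdlib sketch that sums the probability mass function exactly with math.comb; binom_cdf is a hypothetical helper name.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by exact summation of the pmf."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n = 160
reject_max = 133  # H0 is rejected when X < 134, i.e. X <= 133

alpha = binom_cdf(reject_max, n, 0.9)      # P(reject H0 | widgets from Machine E)
beta = 1 - binom_cdf(reject_max, n, 0.8)   # P(do not reject H0 | widgets from Machine F)
power = 1 - beta                           # P(reject H0 | widgets from Machine F)
print(alpha, beta, power)
```

Exact summation avoids the continuity-correction question that a normal approximation would raise here.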

Problem 19.3: Suppose the time until failure of a motor has an exponential distribution. A manufacturer claims that the expected lifetime of a motor is 20,000 hours. You think the expected lifetime is actually less than this, so you buy such a motor to see how long it lasts, say X hours.
a) Find a number x such that rejecting the manufacturer's claim if X is less than x has level of significance .20.
b) In doing this test, suppose X is realized as 2,000 hours. Provide a p-value.

Problem 19.4: (Devore 6ed, p46 and p338) The five observations

2781 2900 3013 2856 2888

are of stabilized viscosity (cP) for specimens of a certain grade of asphalt with 18% rubber added, as recorded in the article "Viscosity Characteristics of Rubber-Modified Asphalts," J. of Materials in Civil Engr. (1996), pages 153-156. Suppose for a particular application it is required that the true average viscosity be 3000. (Assume normality of the population.)
a) Is this sample evidence that the requirement is not satisfied? Test at a level of significance .05.
b) Provide a p-value.

Problem 19.5: (trivial modification of Devore 6ed, p338) To obtain information on the corrosion-resistance properties of a certain type of steel conduit, 41 specimens are buried in soil for a 2-year period. The maximum penetration (in mils) for each specimen is then measured, yielding a sample average penetration of x̄ = 52.7 and a sample standard deviation of s = 4.8. The conduits were manufactured with the specification that the true average penetration be 50 mils.
a) Does this sample provide evidence, at a level of significance .02, that the specification is not correct?
b) Provide a p-value.

Problem 19.6: A politician claims an approval rating of 70%. A random sample of 100 constituents has a sample proportion p̂ = .60 who approve of the politician.
a) At a level of significance .05, does this imply that the politician is less popular than claimed?
b) Provide a p-value.
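Problem 19.6 is a large-sample test on a proportion: z = (p̂ − p0)/√(p0(1 − p0)/n), with the standard error computed from the null value p0, and a lower-tail P-value Φ(z). A minimal sketch assuming the normal approximation is adequate (here np0 = 70 and n(1 − p0) = 30); one_proportion_z_test is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n):
    """Large-sample z test for H0: p = p0 vs Ha: p < p0.

    Returns (z, lower-tail P-value); the standard error uses p0, the value under H0.
    """
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return z, NormalDist().cdf(z)


z, p_value = one_proportion_z_test(p_hat=0.60, p0=0.70, n=100)
print(z, p_value)
```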
Goodness-of-fit

Problem 20.1: (Rosner p422) A study was performed in Cramer, Schiff, Schoenbaum, Gibson, Belisle, Albrecht, Stillman, Berger, Wilson, Stadel, and Siebel, "Tubal infertility and the intrauterine device," New England Journal of Medicine 312 (1985) 941-947, relating the duration of IUD use to infertility. A group of 89 infertile IUD users and a group of 640 fertile IUD users were identified. The women were subdivided by the duration of IUD use as follows:

Duration of IUD use     Infertile   Fertile
months < 3                     10        53
3 ≤ months < 18                23       200
18 ≤ months ≤ 36               20       168
36 < months                    36       219

Does this indicate a relationship between infertility and length of IUD use, at a level of significance .05?

Problem 20.2: (Devore ed5,p624) Criminologists have long debated whether there is a relationship between weather conditions and the incidence of violent crime. The author of "Is there a season for homicide?" Criminology (1988) 287-296 classified homicides according to season, with the numbers during winter, spring, summer, and fall being, respectively, 328, 334, 372, and 327. At a level of significance .01, does this data indicate that the likelihood of homicide is not the same for all seasons?

Problem 20.3: (Devore ed5,p624) The article "Psychiatric and alcoholic admissions do not occur disproportionately close to patients' birthdays," Psychological Reports (1992) 944-946, focuses on the existence of any relationship between the date of patient admission for treatment of alcoholism and the patient's birthday. Assuming a 365 day year, in the absence of any relation, a patient's admission date is equally likely to be any of the 365 days of the year. The investigators established four different admission categories: (1) the 15 days nearest to the birthday, (2) the next 46 closest days to the birthday, (3) the next 120 days closest to the birthday, and (4) the other 184 days of the year. A sample of 200 patients gave observed frequencies of 11, 24, 69, and 96 for these categories, respectively. Does this indicate a relationship, at level of significance .01?
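Problem 20.1 is a chi-square test of independence on a 2 × 4 table: each expected count is (row total × column total)/(grand total), the statistic sums (observed − expected)²/expected over all cells, and the degrees of freedom are (r − 1)(c − 1). A minimal sketch on a list-of-rows table; chi2_contingency_statistic is a hypothetical helper name, and the critical value is assumed to come from a chi-square table.

```python
def chi2_contingency_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table
    of observed counts given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Problem 20.1: rows = infertile, fertile; columns = the four duration classes.
iud = [[10, 23, 20, 36],
       [53, 200, 168, 219]]
stat, df = chi2_contingency_statistic(iud)
print(stat, df)  # compare with the table value chi^2_{.05,3} = 7.815
```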

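Problems 20.2 and 20.3 are goodness-of-fit tests against fully specified category probabilities: with n observations, the expected count in category i is n·p_i, and with no parameters estimated from the data the statistic has k − 1 degrees of freedom. A minimal sketch; chi2_gof_statistic is a hypothetical helper name, and for 20.2 it assumes the four seasons are equally likely under H0, ignoring small differences in season length.

```python
def chi2_gof_statistic(observed, probs):
    """Pearson goodness-of-fit statistic sum((o - e)^2 / e) with e = n * p_i,
    returned together with k - 1 degrees of freedom."""
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return stat, len(observed) - 1


# Problem 20.2: four seasons, assumed equally likely under H0.
homicide_stat, homicide_df = chi2_gof_statistic([328, 334, 372, 327], [0.25] * 4)

# Problem 20.3: category probabilities proportional to 15, 46, 120, 184 days.
days = [15, 46, 120, 184]
alcohol_stat, alcohol_df = chi2_gof_statistic([11, 24, 69, 96], [d / 365 for d in days])
print(homicide_stat, homicide_df, alcohol_stat, alcohol_df)
```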
updated 9/2/07
