
CHAPTER 1

PARAMETER ESTIMATION

Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.

INTRODUCTION
Parameter estimation is the first step in inferential statistics: it is the process of estimating the value of a population parameter using information obtained from a sample. The process of acquiring information from samples and using it to draw conclusions about populations is called statistical inference. Statistical inference requires the skills and knowledge of descriptive statistics, probability distributions, and sampling distributions. The process is summarized in Figure 1.1.

INTRODUCTION (cont..)
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic. There are two approaches to parameter estimation:
i) Point estimation. A point estimate is a single value, so it is either exactly right or wrong; it does not show how close it is to the true value. Note that true value = parameter value.
ii) Interval estimation.

INTRODUCTION (cont..)
An estimator is the statistic used to obtain the point estimate. An estimate is a specific value or range of values used to approximate some population parameter.

Why do we estimate? Can we get the exact value from the population?

INTRODUCTION (cont..)
[Figure 1.1: Relationship between parameter and statistic. A census collects data from the whole population (e.g. all UUM students); the resulting population measurement is a PARAMETER. Through a random sampling process, a survey collects data from part of the population (e.g. a number of UUM students); the resulting sample measurement is a STATISTIC. The statistic is used to estimate the parameter.]

POINT ESTIMATION
A point estimate is a specific numerical value of a parameter or a single value (or point) used to approximate a population parameter. A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.

POINT ESTIMATION (CONT)


Table 1.1: Symbols for parameters and statistics

Quantity              Parameter      Statistic / estimator
Mean                  $\mu$          $\bar{x}$
Variance              $\sigma^2$     $s^2$
Standard deviation    $\sigma$       $s$
Proportion            $p$            $\hat{p}$

POINT ESTIMATION (CONT)


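Table 1.2: Formulas for statistics

Statistic / estimator        Formula
Sample mean                  $\bar{x} = \frac{\sum x_i}{n}$
Sample variance              $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Sample standard deviation    $s = \sqrt{s^2}$
Sample proportion            $\hat{p} = \frac{x}{n}$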

POINT ESTIMATION (CONT)


Characteristics of a Good Estimator
The objective of each characteristic is to obtain an estimator whose sampling distribution is centered on the parameter being estimated. The characteristics are:
i. unbiasedness
ii. consistency
iii. relative efficiency

POINT ESTIMATION (CONT)


An unbiased estimator of a population parameter is an estimator whose expected value is equal to that parameter. An estimator $\hat{\theta}$ is an unbiased estimator of the parameter $\theta$ if $E(\hat{\theta}) = \theta$. E.g. the sample mean $\bar{x}$ is an unbiased estimator of the population mean $\mu$, since:

$$E(\bar{x}) = \mu$$

POINT ESTIMATION (CONT)


An unbiased estimator is said to be consistent if the difference between the estimator and the parameter grows smaller as the sample size grows larger. E.g. $\bar{x}$ is a consistent estimator of $\mu$ because:

$$V(\bar{x}) = \frac{\sigma^2}{n}$$

That is, as $n$ grows larger, the variance of $\bar{x}$ grows smaller.
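The shrinking variance of the sample mean can be checked with a small simulation. This is only a sketch: the population values (`mu = 50`, `sigma = 10`), the number of repetitions and the function name `variance_of_sample_mean` are illustrative assumptions, not part of the chapter.

```python
import random
import statistics


def variance_of_sample_mean(n, reps=1000, mu=50.0, sigma=10.0, seed=0):
    """Empirical variance of the sample mean over `reps` samples of size n.

    mu, sigma, reps and seed are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pvariance(means)


if __name__ == "__main__":
    for n in (10, 100, 400):
        # Theory says V(x_bar) = sigma^2 / n = 100 / n
        print(n, round(variance_of_sample_mean(n), 3), 100 / n)
```

The empirical variances should fall roughly in line with $\sigma^2/n$, getting smaller as $n$ increases.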

POINT ESTIMATION (CONT)


If there are two unbiased estimators of a parameter, the one whose variance is smaller is said to be relatively efficient. E.g. for a normal population, both the sample median and the sample mean are unbiased estimators of the population mean. However, comparing their variances,

$$V(\bar{x}) < V(\tilde{x})$$

so we choose the mean, $\bar{x}$, since it is relatively efficient compared to the sample median, $\tilde{x}$.

POINT ESTIMATION (CONT)


Example 1: A sample of 10 frogs was taken at random and the weight (in grams) of each frog was recorded, as given below:
250 230 190 200 210 195 225 200 230 240

1. Compute the point estimate for the mean weight of the frogs.
2. Estimate the standard deviation of the weight of the frogs.
3. Estimate the proportion of frogs that weigh not more than 200 grams.

Solution:
i. Let $\bar{x}$ represent the mean weight of the frogs:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{2170}{10} = 217 \text{ grams}$$

ii. The estimate of the standard deviation of the weight of the frogs is given by

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{3860}{9}} \approx 20.71 \text{ grams}$$

iii. Let $x$ be the number of frogs with weight not more than 200 grams and $\hat{p}$ be the point estimate of the proportion of frogs with weight not more than 200 grams. The frogs with weight not more than 200 grams are 200, 195, 200 and 190. Then $x = 4$. Thus

$$\hat{p} = \frac{x}{n} = \frac{4}{10} = 0.4$$

Note: the answers to questions i) and ii) can be found directly from your calculator using the SD mode. Those who use a Casio calculator can refer to Appendix 1 for the complete procedure.
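As an alternative to the calculator, the three point estimates in Example 1 can be sketched with Python's standard `statistics` module. The variable names are illustrative; the data are the frog weights given in the example.

```python
import statistics

# Frog weights in grams, from Example 1
weights = [250, 230, 190, 200, 210, 195, 225, 200, 230, 240]

mean_weight = statistics.mean(weights)        # point estimate of the mean
sd_weight = statistics.stdev(weights)         # sample std dev (divides by n - 1)
light = [w for w in weights if w <= 200]      # "not more than 200 grams"
p_hat = len(light) / len(weights)             # point estimate of the proportion

if __name__ == "__main__":
    print(mean_weight, round(sd_weight, 2), p_hat)
```

Note that `statistics.stdev` uses the $n - 1$ divisor, matching the sample standard deviation formula, whereas `statistics.pstdev` would divide by $n$.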

Example 2:
The ages of 15 students who came to the recreational club last weekend are given below:
8 17 15 15 13 15 10 12 16 16 12 16 17 15 18

Calculate the point estimate of the:
i. average age of the students
ii. variance of the age of the students
iii. proportion of students aged more than 15 years.

Example 3:

A study was conducted to determine the percentage of UUM's staff living in Jitra. From a sample of 200 randomly chosen people, 88 live in Jitra.
i. Obtain the point estimate for the percentage of UUM's staff living in Jitra.
ii. Estimate the mean and the standard deviation of the proportion.

Interval Estimation
No matter how good the point estimator is, we have to admit that a point estimate can sometimes give a value that is entirely different from the true value. Besides that, point estimators do not reflect the effects of larger sample sizes. Thus, it is recommended to use an interval estimator to estimate population parameters, which is less precise but safer. An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.

Interval Estimation (cont)


The value of an interval estimator lies between lower and upper boundaries. Generally, we write it as
Lower bound < population parameter < Upper bound
If $\hat{\theta}$ is the point estimate of the parameter $\theta$, then the interval estimate is

$$\hat{\theta} \pm k \, S(\hat{\theta})$$

where $S(\hat{\theta})$ is the standard deviation of the estimator and $k$ is the critical value from the sampling distribution of the estimator (the distribution can be determined using the Central Limit Theorem).

Interval Estimation (cont)


Once the interval estimate is obtained, we can conclude (with some ___% of certainty) that the population parameter of interest is between some lower and upper bounds. In this section we will discuss the interval estimates for the mean and the proportion, which are summarized in Figure 1.2.

Interval Estimation (cont)

Figure 1.2: Interval estimation



Interval Estimation (cont)


Interval estimation for mean
Generally, the interval estimator for one population mean is given by

$$\bar{x} - Z_{\alpha/2} \, S(\bar{x}) < \mu < \bar{x} + Z_{\alpha/2} \, S(\bar{x})$$

or

$$\bar{x} \pm Z_{\alpha/2} \, S(\bar{x})$$

Note: the Z distribution can be replaced by the t distribution if the conditions for using the Z distribution are not satisfied.

Interval Estimation (cont)


To determine whether to use Z or t distribution, we have to follow the Central Limit Theorem

Figure 1.3: Central Limit Theorem


Interval Estimation (cont)


Characteristics of the Z Distribution
When the population standard deviation is known, or the sample size is greater than or equal to 30, the normal Z distribution can be used.

Figure 1.4: Condition to use normal Z distribution



Interval Estimation (cont)


Characteristics of the t Distribution
1. When the population standard deviation is unknown and the sample size is less than 30, the t distribution with $n - 1$ degrees of freedom must be used instead of the Z distribution.
2. The degrees of freedom are the number of values that are free to vary after a sample statistic has been computed.

Interval Estimation (cont)


The t distribution differs from the standard normal distribution in the following ways:
i. The variance is greater than 1.
ii. The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size. As the sample size increases, the t distribution approaches the standard normal distribution.

Interval Estimation (cont)

Figure 1.5: The Z Normal and t distribution



Interval Estimation (cont)

Figure 1.6: t distribution with different degrees of freedom.



Interval Estimation (cont)


When to use the Z or t distribution?

Is the population standard deviation $\sigma$ known?
  Yes: use the Z distribution no matter what the sample size is. *
  No: is the sample size $n \ge 30$?
    Yes: use the Z distribution, with $s$ in place of $\sigma$.
    No: use the t distribution, with $s$ in the formula. **

* The variable must be normally distributed when $n < 30$.
** The variable must be approximately normally distributed.

Figure 1.7: Criteria for choosing Z or t distribution
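The decision rule of Figure 1.7 can be written as a short function. This is a minimal sketch: the name `choose_distribution` and its arguments are illustrative, and the threshold $n \ge 30$ follows the wording used elsewhere in this chapter.

```python
def choose_distribution(sigma_known: bool, n: int) -> str:
    """Return 'Z' or 't' following the decision rule of Figure 1.7.

    The function name and signature are hypothetical, for illustration only.
    """
    if sigma_known:
        # Sigma known: Z for any n (the variable must be normal when n < 30)
        return "Z"
    if n >= 30:
        # Sigma unknown, large sample: Z with s in place of sigma
        return "Z"
    # Sigma unknown, small sample: t with n - 1 degrees of freedom
    return "t"


if __name__ == "__main__":
    print(choose_distribution(True, 50))    # Example 5: sigma known, n = 50
    print(choose_distribution(False, 10))   # Example 8: sigma unknown, n = 10
    print(choose_distribution(False, 60))   # Example 9: sigma unknown, n = 60
```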

Interval Estimation (cont)


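Therefore, the confidence interval for a mean has 3 formulas:
1. Confidence interval for a mean with known population standard deviation

$$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

or

$$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$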

Interval Estimation (cont)


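2. Confidence interval for a mean with unknown population standard deviation, sample size greater than or equal to 30 ($n \ge 30$)

$$\bar{x} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

or

$$\bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$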

Interval Estimation (cont)


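3. Confidence interval for a mean with unknown population standard deviation, sample size less than 30 ($n < 30$)

$$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$

or

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$$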
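All three formulas share the form $\bar{x} \pm (\text{critical value}) \times (\text{standard deviation})/\sqrt{n}$, so one helper can cover them. The sketch below uses Python's standard library; the names `z_critical` and `mean_ci` are illustrative. The standard library has no t distribution, so the t critical value is assumed to be read from a t table and passed in by hand.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Z_{alpha/2} for a given confidence level (hypothetical helper name)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_ci(x_bar: float, sd: float, n: int, critical: float):
    """Interval x_bar +/- critical * sd / sqrt(n).

    `sd` is sigma when it is known, otherwise the sample value s.
    `critical` is Z_{alpha/2} or a t_{alpha/2, n-1} value taken from a table.
    """
    margin = critical * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Example 5: sigma = 2 known, n = 50, x_bar = 23.2, 95% confidence (formula 1)
low, high = mean_ci(23.2, 2, 50, z_critical(0.95))

# Example 8: s = 0.08, n = 10, x_bar = 0.32, 95% confidence (formula 3);
# t_{0.025, 9} = 2.2622 is taken from a standard t table
t_low, t_high = mean_ci(0.32, 0.08, 10, 2.2622)

if __name__ == "__main__":
    print(round(low, 2), round(high, 2))
    print(round(t_low, 3), round(t_high, 3))
```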

Interval Estimation (cont)


The graphical view of the interval estimate:

[Figure 1.8: Graphical view of confidence interval, showing the lower confidence limit (LCL), the upper confidence limit (UCL) and the width of the interval between them.]

Interval Estimation (cont)


The probability $(1 - \alpha)$ is called the confidence level (or degree of confidence). $\alpha$ is called the significance level, or the probability that a Type I error will occur. The confidence level is the relative frequency of times the confidence interval actually contains the population parameter, assuming that the estimation process is repeated a large number of times. There are four commonly used confidence levels:
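The relative-frequency reading of the confidence level can be illustrated by simulation: build many intervals from repeated samples and count how often they contain the true mean. This is a sketch under assumed values; the population (`mu = 100`, `sigma = 15`), the sample size and the function name `coverage` are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(confidence=0.95, n=25, mu=100.0, sigma=15.0, reps=2000, seed=1):
    """Fraction of simulated Z intervals (sigma known) that contain mu.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print(coverage())
```

The returned fraction should be close to the chosen confidence level, although it will vary slightly from run to run with a different seed.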

Interval Estimation (cont)


Confidence level $(1 - \alpha)$    $\alpha$    $\alpha/2$    $Z_{\alpha/2}$
0.90                            0.10      0.05        1.6449
0.95                            0.05      0.025       1.9600
0.98                            0.02      0.01        2.3263
0.99                            0.01      0.005       2.5758
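The Z critical values in the table can be reproduced from the inverse of the standard normal CDF. A minimal sketch using the standard library follows; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Z_{alpha/2}: the point with area alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


if __name__ == "__main__":
    for confidence in (0.90, 0.95, 0.98, 0.99):
        alpha = 1 - confidence
        print(f"{confidence:.2f}  {alpha:.2f}  {alpha / 2:.3f}  "
              f"{z_critical(confidence):.4f}")
```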

These are example critical values for the t distribution.

Confidence level $(1 - \alpha)$    df    $\alpha$    $\alpha/2$    $t_{\alpha/2,\,df}$
0.90                            3     0.10      0.05        2.3534
0.95                            5     0.05      0.025       2.5706
0.98                            7     0.02      0.01        2.9980
0.99                            9     0.01      0.005       3.2498

Example 4:
A computer company samples demand during lead time over 25 time periods:
235 421 394 261 386 374 361 439 374 316 309 514 348 302 296 499 462 344 466 332 253 369 330 535 334

It is known that the standard deviation of demand over lead time is 75 computers. Estimate the mean demand over lead time with 95% confidence level in order to set inventory levels.

Example 5:
The president of a large university wishes to estimate the average age of the students presently enrolled. From past studies, the standard deviation is known to be 2 years. A sample of 50 students is selected, and the mean is found to be 23.2 years. Find the 95% confidence interval of the population mean.

Example 6:
A survey of 30 adults found that the mean age of a person's primary vehicle is 5.6 years. Assuming the standard deviation of the population is 0.8 years, find the 99% confidence interval of the population mean.

Example 7:
A cereal company selects twenty-five 12-ounce boxes of corn flakes every 10 minutes and weighs the boxes. Suppose the weights have a normal distribution with a variance of 0.04 square ounces. One such sample yields a mean of ___ ounces. Calculate the 90% confidence interval of the population mean.

Example 8:
Ten randomly selected automobiles were stopped, and tread depth of the right front tire was measured. The mean was 0.32 inch and the standard deviation was 0.08 inch. Find the 95% confidence interval of the mean depth. Assume that the variable is approximately normally distributed.

Example 9:
The average production of peanuts in the state of Virginia is 3000 pounds per acre. A new plant food has been developed and is tested on 60 individual plots of land. The mean yield with the new plant food is 3120 pounds of peanuts per acre with a standard deviation of 578 pounds. Find the 95% confidence interval for the mean yield of peanuts per acre with the new plant food. Interpret the interval.

Example 10:
The following daily highs were recorded in the city of Chicago on 20 randomly selected December days.
32 49 21 32 25 34 25 36 31 38 27 40 22 30 44 28 39 36 18 38

To find a confidence interval for the mean daily high temperature, should we use the t or Z distribution? Explain.

Interval estimation for proportion


Whenever the information is given as a percentage, a proportion or the number of successes for a specific event, the problem being investigated concerns a proportion. The procedures for drawing inferences about a proportion involve nominal and sometimes ordinal scale data (i.e. categorical data). Examples of categorical data: gender (male and female), job satisfaction (satisfied and unsatisfied), opinion (poor and good), attendance (absent, present), examination result (pass and fail), etc.

Interval estimation for proportion


The point estimate for the proportion is given by

$$\hat{p} = \frac{x}{n}$$

where
$\hat{p}$ = symbol for the sample proportion
$x$ = number of sample units that possess the characteristic of interest
$n$ = sample size.

Interval estimation for proportion


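Provided that:
i. the sample size $n$ is large
ii. both $n\hat{p}$ and $n\hat{q}$ are greater than or equal to 5
the formula to estimate the confidence interval for a proportion is given by

$$\hat{p} - Z_{\alpha/2}\,S(\hat{p}) < p < \hat{p} + Z_{\alpha/2}\,S(\hat{p})$$

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where $\hat{p} + \hat{q} = 1$.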
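The proportion interval can be sketched in a few lines, including the check that both $n\hat{p}$ and $n\hat{q}$ are at least 5. The function name `proportion_ci` is illustrative; the numbers come from Example 11 (27 obese out of 100 people, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x: int, n: int, confidence: float):
    """Interval p_hat +/- Z_{alpha/2} * sqrt(p_hat * q_hat / n).

    Hypothetical helper; raises if the normal approximation conditions fail.
    """
    p_hat = x / n
    q_hat = 1 - p_hat
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("need n*p_hat >= 5 and n*q_hat >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Example 11: x = 27, n = 100, 95% confidence
low, high = proportion_ci(27, 100, 0.95)

if __name__ == "__main__":
    print(round(low, 3), round(high, 3))
```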

Example 11:
A recent study of 100 people in Miami found that 27 were obese. What is the proportion of individuals living in Miami who are obese? Obtain the 95% confidence interval for the proportion of individuals living in Miami who are obese and interpret it.

Example 12:
A survey found that out of 200 workers, 168 said they were interrupted three or more times an hour by phone calls, messages, faxes, etc. Estimate, with a 90% confidence level, the percentage of workers who are not interrupted three or more times an hour.

Example 13:
In a random sample of 500 observations, we found the proportion of successes to be 48%. Estimate with 95% confidence the population proportion of successes.

Example 14:
A random sample of 1500 pine trees was tested for traces of Bark Beetle infestation. The results showed that 153 of the trees had such traces. Assuming the data are approximately normally distributed, calculate the point estimate of the proportion of pine trees that have been infested, and find a 95% confidence interval for that proportion.

Example 15:
The quality control manager at Ameen Company considers the production of model A telephones to be out of control when the overall rate of defects exceeds 4%. A test of a random sample of 150 telephones revealed that 9 of them are defective. Construct a 98% confidence interval for the proportion of defective telephones.

Example 16:
A statistics practitioner working for major league baseball wants to supply radio and television commentators with interesting statistics. He observed several hundred games and counted the number of times a runner on first base attempted to steal second base. He found there were 373 such events, of which 259 were successful. Estimate with 95% confidence the proportion of all attempted steals of second base that are successful.

Sample size
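Sample size for Mean
Recall: the interval formula for estimating the population mean is

$$\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + \underbrace{Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}}_{\text{Error } (E)}$$

Note that the maximum error is

$$E = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$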

Sample size (cont)


Sample size for Mean
Using the maximum error formula, we can then calculate the sample size $n$, which is given by

$$n = \frac{(Z_{\alpha/2}\,\sigma)^2}{E^2}$$
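A short sketch of the sample size calculation for a mean follows. The function name `sample_size_mean` and the inputs in the usage line ($\sigma = 2$, $E = 0.5$, 95% confidence) are hypothetical; the result is rounded up because a sample size must be a whole number that keeps the error at or below $E$.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma: float, error: float, confidence: float) -> int:
    """n = (Z_{alpha/2} * sigma)^2 / E^2, rounded up to the next integer."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma) ** 2 / error ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: sigma = 2, maximum error E = 0.5, 95% confidence
    print(sample_size_mean(2, 0.5, 0.95))
```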

Sample size (cont)


Sample size for proportion
Recall: the interval formula for estimating the population proportion is

$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < p < \hat{p} + \underbrace{Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}_{\text{Error } (E)}$$

Sample size (cont)


Sample size for proportion
Note that the maximum error is

$$E = Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Using the maximum error formula, we can then calculate the sample size $n$, which is given by

$$n = \frac{Z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2} = \frac{Z_{\alpha/2}^2\,\hat{p}\hat{q}}{E^2}$$
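The same calculation for a proportion can be sketched as below. The function name `sample_size_proportion` and the usage inputs ($\hat{p} = 0.5$, $E = 0.03$, 95% confidence) are hypothetical values chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p_hat: float, error: float, confidence: float) -> int:
    """n = Z_{alpha/2}^2 * p_hat * (1 - p_hat) / E^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / error ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: p_hat = 0.5, maximum error E = 0.03, 95% confidence
    print(sample_size_proportion(0.5, 0.03, 0.95))
```

Since $\hat{p}(1-\hat{p})$ is largest at $\hat{p} = 0.5$, using 0.5 gives the most conservative (largest) sample size for a given error and confidence level.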

Conclusion
In conclusion, the width of the confidence interval estimate is affected by:
i. the population standard deviation,
ii. the confidence level,
iii. the sample size.

Conclusion
The width of the confidence interval is a function of the confidence level, the population standard deviation, and the sample size

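$$\bar{x} - Z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

that is,

$$\bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$$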

A larger confidence level produces a wider confidence interval:

Figure 1.9: relationship between width and confidence level



Conclusion
Larger values of the standard deviation produce wider confidence intervals.

Figure 1.10: relationship between width and standard deviations

Increasing the sample size decreases the width of the confidence interval while the confidence level can remain unchanged.
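The three effects on the width can be seen by computing the width $2 Z_{\alpha/2}\,\sigma/\sqrt{n}$ directly. This is a sketch; the function name `ci_width` and the inputs used in the comparison are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sd: float, n: int, confidence: float) -> float:
    """Width of the interval x_bar +/- Z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / sqrt(n)


if __name__ == "__main__":
    base = ci_width(10, 25, 0.95)                             # hypothetical baseline
    print("baseline          ", round(base, 3))
    print("higher confidence ", round(ci_width(10, 25, 0.99), 3))   # wider
    print("larger std. dev.  ", round(ci_width(20, 25, 0.95), 3))   # wider
    print("larger sample     ", round(ci_width(10, 100, 0.95), 3))  # narrower
```

Doubling the standard deviation doubles the width, while quadrupling the sample size halves it.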

INTERVAL ESTIMATION FOR TWO MEANS

Previously we have discussed the techniques to estimate parameters for one population mean

Now, consider this parameter but with two populations. With two populations, our interest will now be on the difference between two population means.


INTERVAL ESTIMATION FOR TWO MEANS


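[Figure 1.11: Independent Population and Samples. From Population 1 (parameters $\mu_1$ and $\sigma_1^2$) a sample of size $n_1$ is drawn, giving the statistics $\bar{x}_1$ and $s_1^2$. From Population 2 (parameters $\mu_2$ and $\sigma_2^2$) a sample of size $n_2$ is drawn, giving the statistics $\bar{x}_2$ and $s_2^2$.]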

INTERVAL ESTIMATION FOR TWO MEANS


There are two different types of sample:
i. Dependent samples, also called related (or paired) samples, occur when the response of the nth person in the second sample is partly a function of the response of the nth person in the first sample. There are two (2) common forms of sample dependency:
   - before-after and other studies in which the same people are surveyed at different points in time, including panel studies;
   - matched-pairs studies in which similar people are surveyed at different points in time.
ii. Independent samples are samples that are completely unrelated to one another.

INTERVAL ESTIMATION FOR TWO MEANS


Interval estimation for difference of two independent means
In order to test and estimate the difference between two means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another. The statistic used is the difference between the sample means,

$$\bar{x}_1 - \bar{x}_2 \quad \text{or} \quad \bar{x}_2 - \bar{x}_1$$

INTERVAL ESTIMATION FOR TWO MEANS


Interval estimation for difference of two independent means
Two assumptions need to be fulfilled in order to determine the difference between two independent means:
i. The samples must be independent of each other; that is, there can be no relationship between the subjects in each sample.
ii. The populations from which the samples were obtained must be normally distributed.

Interval estimation for difference of two independent means (cont..)
There are four (4) different formulas to estimate the confidence interval for the difference between two independent means:
i. Confidence interval when both population variances (or standard deviations) are known

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

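Interval estimation for difference of two independent means (cont..)
ii. Confidence interval when both population variances (or standard deviations) are unknown but both sample sizes are greater than or equal to 30

$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

iii. Confidence interval when both population variances (or standard deviations) are unknown, one or both sample sizes are less than 30, and both population variances are assumed equal

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}, \qquad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$$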

Interval estimation for difference of two independent means (cont..)

iv. Confidence interval when both population variances (or standard deviations) are unknown, one or both sample sizes are less than 30, and both population variances are assumed unequal

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

where $\nu$ is the degrees of freedom for the unequal-variances case.

As with the interval estimator for one mean, the same considerations apply when deciding which formula to use for the difference between two means.
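Two of the four cases can be sketched with the standard library: the known-variances Z interval (case i) and the pooled-variance t interval (case iii). The function names are illustrative. The t critical value is assumed to come from a t table, since the standard library has no t distribution. The numbers are taken from Example 20 (Bandar B minus Bandar A) and Example 18 (male minus female).

```python
from math import sqrt
from statistics import NormalDist


def two_mean_ci_z(x1, x2, sd1, sd2, n1, n2, confidence):
    """(x1 - x2) +/- Z_{alpha/2} * sqrt(sd1^2/n1 + sd2^2/n2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


def two_mean_ci_pooled(x1, x2, s1, s2, n1, n2, t_critical):
    """(x1 - x2) +/- t * sqrt(sp^2 * (1/n1 + 1/n2)), df = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    margin = t_critical * sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = x1 - x2
    return diff - margin, diff + margin


# Example 20: population standard deviations known, 95% confidence
z_low, z_high = two_mean_ci_z(3.01, 2.88, 1.09, 1.01, 321, 94, 0.95)

# Example 18: equal but unknown standard deviations, 99% confidence;
# t_{0.005, 40} = 2.7045 is taken from a standard t table
p_low, p_high = two_mean_ci_pooled(80, 96, 17.50, 14.40, 22, 20, 2.7045)

if __name__ == "__main__":
    print(round(z_low, 4), round(z_high, 4))
    print(round(p_low, 2), round(p_high, 2))
```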

Figure 1.12: Flow diagram for choosing the correct distribution

Are both $\sigma_1$ and $\sigma_2$ known?
  Yes: use $z_{\alpha/2}$ values no matter what the sample size is. *
  No: are both $n_1$ and $n_2 \ge 30$?
    Yes: use $z_{\alpha/2}$ values, with $s$ in place of $\sigma$.
    No: use $t_{\alpha/2}$ values, with $s$ in the formula. ** Is $\sigma_1^2 = \sigma_2^2$?
      Yes: use $t_{\alpha/2}$ values with the pooled variance estimator, $s_p^2$.
      No: use $t_{\alpha/2}$ values with the degrees of freedom for unequal variances.

* Variable must be normally distributed when $n < 30$.
** Variable must be approximately normally distributed.

Figure 1.13: Flow diagram for choosing the correct confidence interval formula
[The diagram follows the same decision flow as Figure 1.12 (are both $\sigma_1$ and $\sigma_2$ known? are both $n_1$ and $n_2 \ge 30$? are the variances equal?), with the corresponding confidence interval formula at each outcome.]

Example 17:
Two random samples of 40 students were drawn independently from two normal populations. The following statistics regarding their scores in a final exam were obtained;

Construct a 95% confidence interval for the difference between the means.


Solution
The population standard deviations are unknown. However, since both sample sizes are large enough (both $n_1 = n_2 = 40 \ge 30$), according to the Central Limit Theorem the sample means are approximately normally distributed.

The 95% confidence interval for the difference between the means is

Example 18:
A random sample of 22 male customers who shopped at a supermarket showed that they spent an average of RM80 with a standard deviation of RM17.50, while a random sample of 20 female customers who shopped at the same supermarket showed that they spent an average of RM96 with a standard deviation of RM14.40. Assume that the amounts spent at this supermarket by all male and female customers are normally distributed with equal but unknown standard deviations. Construct a 99% confidence interval for the difference between the mean amounts spent by all male and all female customers at this supermarket and interpret the interval.

Example 19:
Because of the rising costs of industrial accidents, many chemical, mining, and manufacturing firms have instituted safety courses. Employees are encouraged to take these courses, which are designed to heighten safety awareness. A company is trying to decide which one of two courses to institute. To help make a decision, eight employees take Course 1 and another eight take Course 2. Each employee takes a test, which is graded out of a possible 25. The safety test results are shown below. Assume that the scores are normally distributed. Construct a 90% confidence interval for the difference between the means.
Course 1 Course 2 14 20 21 18 17 22 14 15 17 23 19 21 20 19 16 15

Example 20:

Random samples of kindergarten children aged 4 to 6 years in Bandar A and Bandar B were taken to find the number of hours spent daily on outdoor activities in the kindergarten. A sample of 321 children in Bandar B and 94 children in Bandar A gave means of 3.01 hours and 2.88 hours, respectively. From past studies, the population standard deviation for the children in Bandar B is assumed to be 1.09, while the population standard deviation for the children in Bandar A is 1.01. Find a 95% confidence interval for the difference between the two population means.

Interval estimation for the difference between two proportions

We will now look at procedures for drawing inferences about the difference between populations whose data are nominal (i.e. categorical).

With nominal data, we can calculate the proportions of occurrences of each type of outcome. Thus, the parameter to be estimated in this section is the difference between two population proportions: $p_1 - p_2$.

Interval estimation for the difference between two proportions (cont)
Assumptions for making inferences about two proportions:
i. We have proportions from two independent simple random samples.
ii. In order to use the normal Z distribution, for both samples the conditions

$$n_1\hat{p}_1 \ge 5, \quad n_2\hat{p}_2 \ge 5, \quad n_1(1-\hat{p}_1) \ge 5 \quad \text{and} \quad n_2(1-\hat{p}_2) \ge 5$$

must be satisfied.

Interval estimation for the difference between two proportions (cont)
To draw inferences about the parameter $p_1 - p_2$, we take samples from each population, calculate the sample proportions and look at their difference.

$$\hat{p}_1 = \frac{x_1}{n_1} \quad \text{and} \quad \hat{p}_2 = \frac{x_2}{n_2}$$

$(\hat{p}_1 - \hat{p}_2)$ is an unbiased estimator for $(p_1 - p_2)$.

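Interval estimation for the difference between two proportions (cont)
The confidence interval estimator for $(p_1 - p_2)$ is given by:

$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

or

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$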
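The two-proportion interval can be sketched as follows, with the four "at least 5" conditions checked first. The function name `two_proportion_ci` is illustrative; the numbers are from Example 22 (136 of 240 males, 224 of 260 females, 95% confidence, males minus females).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """(p1_hat - p2_hat) +/- z * sqrt(p1_hat*q1_hat/n1 + p2_hat*q2_hat/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    # Conditions for using the normal Z distribution
    if min(n1 * p1, n1 * q1, n2 * p2, n2 * q2) < 5:
        raise ValueError("each of n*p_hat and n*(1 - p_hat) must be >= 5")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    diff = p1 - p2
    return diff - margin, diff + margin


# Example 22: males (136 of 240) minus females (224 of 260), 95% confidence
low, high = two_proportion_ci(136, 240, 224, 260, 0.95)

if __name__ == "__main__":
    print(round(low, 4), round(high, 4))
```

Both limits are negative here, so the interval does not include 0.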

Example 21:
A Consumer Packaged Goods (CPG) company is test marketing two new versions of soap packaging. Version one (bright colors) was distributed in one supermarket, while version two (simple colors) was distributed in another. Construct a 95% confidence interval for the difference between the two proportions of successes of packaged soap sales.

Example 22:
A random sample of 500 respondents was selected in a large city to determine information concerning consumer behavior. Among the questions asked was, "Do you enjoy shopping?" Of 240 male respondents, 136 answered yes. Of 260 female respondents, 224 answered yes. Construct a 95% confidence interval estimate of the difference between the proportions of males and females who enjoy shopping.

SPSS NOTES FOR OBTAINING THE CONFIDENCE INTERVAL OF MEAN


Step 1 : Select the Analyze menu, then select Descriptive Statistics.
Step 2 : Click on Explore.
Step 3 : Select the appropriate variable from the list of variables and click on the arrow button to move it into the Dependent List box.
Step 4 : Click on Statistics and select the appropriate statistics, e.g. Descriptives. You can change the degree of confidence (usually 90% and above).
Step 5 : Click on Continue, then click on OK.

Example

A random sample of 10 university students was surveyed to determine the amount of time spent weekly using a personal computer. The times are: 13, 14, 5, 6, 8, 10, 7, 12, 15, and 3. If the times are normally distributed with a standard deviation of 5.2 hours, estimate with 90% confidence the mean weekly time spent using a personal computer by all university students.


Descriptives: times

                                             Statistic    Std. Error
Mean                                         9.30         1.300
90% Confidence Interval     Lower Bound      6.92
for Mean                    Upper Bound      11.68
5% Trimmed Mean                              9.33
Median                                       9.00
Variance                                     16.900
Std. Deviation                               4.111
Minimum                                      3
Maximum                                      15
Range                                        12
Interquartile Range                          8
Skewness                                     -.040        .687
Kurtosis                                     -1.396       1.334

At the 90% confidence level, the mean weekly time spent using a personal computer by all university students is between 6.92 and 11.68 hours.
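The bounds in the output above can be cross-checked by hand. The sketch below assumes that the reported bounds come from a t interval built on the sample standard deviation, which is what the reported mean, standard deviation and standard error are consistent with; $t_{0.05,\,9} = 1.8331$ is taken from a standard t table. For comparison, it also computes the Z interval that uses the stated population standard deviation of 5.2 hours.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Weekly computer times (hours) from the example
times = [13, 14, 5, 6, 8, 10, 7, 12, 15, 3]
n = len(times)
x_bar = mean(times)
s = stdev(times)                  # sample standard deviation
std_error = s / sqrt(n)           # matches "Std. Error" of the mean

# t interval with s; t_{0.05, 9} = 1.8331 is a table value (assumption)
t_crit = 1.8331
t_low, t_high = x_bar - t_crit * std_error, x_bar + t_crit * std_error

# Z interval with the stated sigma = 5.2, for comparison
z_crit = NormalDist().inv_cdf(0.95)
z_margin = z_crit * 5.2 / sqrt(n)
z_low, z_high = x_bar - z_margin, x_bar + z_margin

if __name__ == "__main__":
    print(round(t_low, 2), round(t_high, 2))
    print(round(z_low, 2), round(z_high, 2))
```

The t-based bounds agree with the output table, while the Z-based interval with $\sigma = 5.2$ is somewhat wider.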

SUMMARY
A point estimator is a good estimator if it has the qualities of a good estimator, which are unbiasedness, consistency and relative efficiency. Unlike point estimation, interval estimation involves an interval constructed around the point estimate with a probability of $1 - \alpha$. To construct the interval, information regarding the sampling distribution of the statistic is important.

SUMMARY (CONT)
The Central Limit Theorem enables us to determine the sampling distribution of the sample statistic based on the sample size and knowledge of the population variance. If we want to know whether the population mean/proportion equals a certain value $k$, and the confidence interval for the mean/proportion includes $k$, we fail to reject the claim that the mean/proportion equals $k$, at the given level of confidence.

SUMMARY (CONT)
If the confidence interval for the difference between two means/proportions includes 0 we can say that there is no significant difference (failed to reject) between the means of the two populations, at a given level of confidence.

END OF CHAPTER 1