
Week 6.

Chapter 7: Introduction to inferential statistics

Inferential statistics
Statistical inference is concerned with making inferences from one or more samples about the population from which they were drawn. One approach is to estimate the characteristics, or parameters, of a population from the corresponding characteristics of the sample(s). It is usually too ambitious to estimate every possible parameter of a population; the simplest parameter to estimate is the population mean, using the mean of a sample. Not only can we use such estimates for predictive purposes, but we can also use them for monitoring and controlling processes.

Estimating the population mean


We denote the mean of a random sample by x̄ and the population mean by μ.

We cannot simply estimate μ by x̄, because any two random samples taken from the same population are likely to have different means.
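As a minimal sketch of this point, using invented data: two random samples drawn from the same population give two different sample means, each only an estimate of the population mean.

```python
# Two random samples from the same population usually have different
# means, so one sample mean is only an estimate of the population mean.
import random

random.seed(42)
# Invented population with mean approximately 50
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_a = random.sample(population, 25)
sample_b = random.sample(population, 25)

mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

print(round(mean_a, 2), round(mean_b, 2))  # two different estimates
```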

Sampling distributions
If we take many random samples then we could calculate the means of all the samples and construct a distribution of the sample means to show how these means are distributed. Such a distribution is called a sampling distribution of the means
[Figure: a bell-shaped histogram of sample means, with frequency on the vertical axis and the sample mean on the horizontal axis.]

A sampling distribution of means

The distribution of sample means


A histogram of the sampling distribution of means exhibits a bell shape and can be modelled by a Normal distribution. Provided that we have taken many random samples (typically more than 30), the sampling distribution of means is Normally distributed even when the population from which the samples are drawn is not. If we take many random samples and then take the mean of the sample means (i.e. the mean of the sampling distribution of the means), we would expect this to be very close to the population mean (μ), because many random samples will between them cover most of the population.
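A quick simulation illustrates this (a sketch with an invented population): draw many random samples from a clearly non-Normal uniform population, collect their means, and check that the mean of the sample means lands close to the population mean of 0.5.

```python
# Build a sampling distribution of means from a non-Normal population.
# Uniform(0, 1) has population mean 0.5; the mean of many sample means
# should come out very close to that value.
import random

random.seed(1)

sample_means = []
for _ in range(1000):                                # many random samples
    sample = [random.random() for _ in range(30)]    # sample size 30
    sample_means.append(sum(sample) / len(sample))

grand_mean = sum(sample_means) / len(sample_means)
print(round(grand_mean, 3))  # close to the population mean of 0.5
```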

Estimating the population mean


The sample mean x̄ can therefore be used to estimate the population mean μ.

Standard error of the mean


Although we would expect the population mean to be well estimated by the sample mean, we would not expect the standard deviation of the sampling distribution of the means to be similar to the standard deviation of the population (σ), because there is not a great deal of variation among the sample means. The standard deviation of the sampling distribution of the means is related to σ and the sample size (n) by:

σ_x̄ = σ / √n

This is often called the standard error of the mean.
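A small simulation (a sketch, with an invented Normal(100, 15) population and n = 25) confirms that the spread of the sample means matches σ/√n = 15/5 = 3.

```python
# Compare the empirical spread of sample means with the standard
# error formula sigma / sqrt(n).
import math
import random

random.seed(7)
sigma, n = 15.0, 25

sample_means = []
for _ in range(2000):
    sample = [random.gauss(100, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / len(sample_means)
# Sample standard deviation of the collected sample means
empirical_se = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means)
    / (len(sample_means) - 1)
)
theoretical_se = sigma / math.sqrt(n)   # 15 / 5 = 3.0

print(round(empirical_se, 2), theoretical_se)
```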

Modelling the sampling distribution of the means using the Standard Normal distribution

Z = (x̄ − μ) / (σ / √n)
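The standardisation above can be computed directly; this sketch uses invented figures (x̄ = 103, μ = 100, σ = 15, n = 25).

```python
# Standardise a sample mean: Z = (x_bar - mu) / (sigma / sqrt(n)).
import math

x_bar, mu, sigma, n = 103.0, 100.0, 15.0, 25
z = (x_bar - mu) / (sigma / math.sqrt(n))  # 3 / (15/5) = 1.0
print(z)
```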

Central Limit Theorem


The sampling distribution of many sample means is almost Normally distributed even when the population from which the samples were drawn is not. In general, the sampling distribution of the means is Normally distributed if the population itself is Normally distributed, if at least 30 samples are used to construct the sampling distribution, or if the samples are large enough to represent the population.

Confidence intervals
So far we can only estimate the population mean if we have taken a large number of random samples, which is potentially very time-consuming and expensive. Rather than trying to find a single estimate for the population mean, we can estimate an interval in which it lies by taking advantage of the Central Limit Theorem.

95% confidence

Two tail confidence

A two tail confidence interval for the population mean


x̄ − Z* · s/√n < μ < x̄ + Z* · s/√n
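As a sketch with invented sample figures (x̄ = 50, s = 8, n = 64), a two-tail 95% interval uses the critical value Z* = 1.96:

```python
# Two-tail 95% confidence interval for the population mean.
import math

x_bar, s, n = 50.0, 8.0, 64
z_star = 1.96                              # 95% two-tail critical value
half_width = z_star * s / math.sqrt(n)     # 1.96 * 8 / 8 = 1.96

lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 2), round(upper, 2))    # 48.04 51.96
```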

One tail confidence

Student's t-distribution


When we have small samples (n < 30), the Central Limit Theorem does not support the use of the Normal distribution, and we instead use Student's t-distribution, whose shape depends on the number of degrees of freedom.

x̄ − t* · s/√n < μ < x̄ + t* · s/√n
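A sketch of the small-sample case, using invented measurements with n = 10 (so 9 degrees of freedom); the critical value t* = 2.262 for 95% two-tail confidence is taken from standard t-tables:

```python
# t-based 95% confidence interval for a small sample (n = 10).
import math
import statistics

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)      # sample standard deviation
t_star = 2.262                  # from t-tables: 95% two-tail, 9 d.o.f.

half_width = t_star * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 3), round(upper, 3))
```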

Hypothesis testing
Suppose that we have some preconceived idea about a population, for example that average life expectancy has increased over the last decade, or that average household disposable income has decreased over the last five years. Such preconceptions are called hypotheses. We can test hypotheses by taking random samples to see whether there is any evidence to support or refute them. In practice, statisticians are very cautious and do not say that they accept a hypothesis; instead they say that a hypothesis cannot be rejected. In general, the hypothesis that we set out to test is called the Null Hypothesis, which we label H0. If the Null Hypothesis is rejected then, implicitly, we cannot reject the alternative to the Null Hypothesis, which we label H1.

Procedure for hypothesis testing


1. Accurately define a precise statement of the Null Hypothesis (H0).
2. Take a random sample from the population.
3. Test to see if the sample supports the Null Hypothesis (H0).
4. If the sample suggests that the Null Hypothesis (H0) is highly improbable, reject it and do not reject the alternative (H1).
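The procedure can be sketched as a one-sample Z-test with invented figures: H0 states that the population mean is 100, σ = 15 is taken as known, and the test is two-tail at the 5% significance level.

```python
# One-sample Z-test of H0: mu = 100 at the 5% significance level.
import math

mu_0, sigma = 100.0, 15.0       # Null Hypothesis and known sigma
x_bar, n = 106.0, 36            # invented sample result

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # (106-100)/(15/6) = 2.4
critical = 1.96                 # two-tail critical value at 5%

if abs(z) > critical:
    decision = "reject H0"      # sample makes H0 highly improbable
else:
    decision = "do not reject H0"

print(round(z, 2), decision)
```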

Two-tailed significance: acceptance and rejection regions

Level of confidence = 100% - level of significance

One tailed regions

Errors in hypothesis testing


The decision is:                      H0 is TRUE          H0 is FALSE
Do not reject H0 (i.e. reject H1):    Correct decision    Type II error
Reject H0 (i.e. do not reject H1):    Type I error        Correct decision

Chi-squared hypothesis tests


A number of theoretical probability distributions are closely related to the Normal distribution, and one of the most widely used is the Chi-squared distribution. The Chi-squared distribution is often used for making comparisons, particularly between the contents of tables, and gets its name from the fact that it considers squared differences. In fact, the Chi-squared distribution is derived from summing several independent squared Normal distributions. We often encounter the Chi-squared distribution when dealing with variance, because variance also comes from considering squared differences.

Chi-squared distribution
The formula for the Chi-squared distribution is far too complex for most of us to use, so we often rely on tables of frequently used Chi-squared probabilities. Like Student's t-distribution, the shape of the Chi-squared distribution is determined by the degrees of freedom. It becomes similar to the Normal distribution as the number of degrees of freedom increases.

Calculating chi-squared

χ² = Σ (O − E)² / E

where O is an observed frequency and E is the corresponding expected frequency.

Degrees of freedom = number of classes − 1
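A sketch of the calculation with invented data: a die is rolled 60 times, so under a fair-die hypothesis each of the six classes has an expected frequency of 10.

```python
# Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E.
observed = [8, 12, 9, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]   # fair die, 60 rolls

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1   # number of classes - 1

print(round(chi_squared, 2), degrees_of_freedom)  # 1.0 5
```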

Hypothesis tests of association


The χ² hypothesis test is also widely used for testing whether there is any relationship between the responses to two or more questions in a questionnaire. The responses from the questions are entered into a table known as a contingency table, with r rows and k columns. Expected frequencies are calculated from the relative frequencies, and we calculate the degrees of freedom by:

Degrees of freedom for an (r × k) contingency table = (r − 1) × (k − 1)
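A sketch of the test of association on an invented 2 × 2 contingency table of questionnaire responses; each expected frequency is row total × column total / grand total.

```python
# Chi-squared test of association on a 2 x 2 contingency table.
table = [[30, 20],    # e.g. "Yes" to question 1, split by question 2
         [10, 40]]    # "No" to question 1

row_totals = [sum(row) for row in table]         # [50, 50]
col_totals = [sum(col) for col in zip(*table)]   # [40, 60]
grand_total = sum(row_totals)                    # 100

chi_squared = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected frequency from the relative (marginal) frequencies
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)      # (r-1)(k-1) = 1
print(round(chi_squared, 2), df)
```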
