
Week 6.

Chapter 7: Introduction to inferential statistics

Inferential statistics
Statistical inference is concerned with making inferences from one or more samples about the population from which they were drawn. One approach is to estimate the characteristics, or parameters, of a population from the corresponding characteristics of the sample(s). It is usually too ambitious to estimate every possible parameter of a population; the simplest parameter to estimate is the population mean, using the mean of a sample. Not only can we use such estimates for predictive purposes, but we can also use them for monitoring and controlling processes.

Estimating the population mean


We denote the mean of a random sample by x̄ and the population mean by μ.

We cannot simply estimate μ by x̄, because any two random samples taken from the same population are likely to have different means.
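As a minimal sketch of this point, using invented data: two random samples drawn from the same population give two different sample means, each only an estimate of the population mean.

```python
# Two random samples from the same population usually have different
# means, so one sample mean is only an estimate of the population mean.
import random

random.seed(42)
# Invented population with mean approximately 50
population = [random.gauss(50, 10) for _ in range(10_000)]

sample_a = random.sample(population, 25)
sample_b = random.sample(population, 25)

mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

print(round(mean_a, 2), round(mean_b, 2))  # two different estimates
```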

Sampling distributions
If we take many random samples then we could calculate the means of all the samples and construct a distribution of the sample means to show how these means are distributed. Such a distribution is called a sampling distribution of the means
[Figure: a bell-shaped histogram of sample means, with frequency on the vertical axis and the sample mean on the horizontal axis.]

A sampling distribution of means

The distribution of sample means


A histogram of the sampling distribution of means exhibits a bell shape and can be modelled by a Normal distribution. Provided that we have taken many random samples (typically more than 30), the sampling distribution of means is Normally distributed even when the population from which the samples are drawn is not. If we take many random samples and then take the mean of the sample means (i.e. the mean of the sampling distribution of the means), we would expect this to be very close to the population mean (μ), because many random samples will between them cover most of the population.
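A quick simulation illustrates this (a sketch with an invented population): draw many random samples from a clearly non-Normal uniform population, collect their means, and check that the mean of the sample means lands close to the population mean of 0.5.

```python
# Build a sampling distribution of means from a non-Normal population.
# Uniform(0, 1) has population mean 0.5; the mean of many sample means
# should come out very close to that value.
import random

random.seed(1)

sample_means = []
for _ in range(1000):                                # many random samples
    sample = [random.random() for _ in range(30)]    # sample size 30
    sample_means.append(sum(sample) / len(sample))

grand_mean = sum(sample_means) / len(sample_means)
print(round(grand_mean, 3))  # close to the population mean of 0.5
```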

Estimating the population mean


The sample mean x̄ can therefore be used to estimate the population mean μ.

Standard error of the mean


Although we would expect the population mean to be well estimated by the sample mean, we would not expect the standard deviation of the sampling distribution of the means to be similar to the standard deviation of the population (σ), because there is not a great deal of variation among the sample means. The standard deviation of the sampling distribution of the means is related to σ and the sample size (n) by:

σ_x̄ = σ / √n

This is often called the standard error of the mean.
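A small simulation (a sketch, with an invented Normal(100, 15) population and n = 25) confirms that the spread of the sample means matches σ/√n = 15/5 = 3.

```python
# Compare the empirical spread of sample means with the standard
# error formula sigma / sqrt(n).
import math
import random

random.seed(7)
sigma, n = 15.0, 25

sample_means = []
for _ in range(2000):
    sample = [random.gauss(100, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

mean_of_means = sum(sample_means) / len(sample_means)
# Sample standard deviation of the collected sample means
empirical_se = math.sqrt(
    sum((m - mean_of_means) ** 2 for m in sample_means)
    / (len(sample_means) - 1)
)
theoretical_se = sigma / math.sqrt(n)   # 15 / 5 = 3.0

print(round(empirical_se, 2), theoretical_se)
```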

Modelling the sampling distribution of the means using the Standard Normal distribution

Z = (x̄ − μ) / (σ / √n)
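The standardisation above can be computed directly; this sketch uses invented figures (x̄ = 103, μ = 100, σ = 15, n = 25).

```python
# Standardise a sample mean: Z = (x_bar - mu) / (sigma / sqrt(n)).
import math

x_bar, mu, sigma, n = 103.0, 100.0, 15.0, 25
z = (x_bar - mu) / (sigma / math.sqrt(n))  # 3 / (15/5) = 1.0
print(z)
```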

Central Limit Theorem


The sampling distribution of many sample means is almost Normally distributed even when the population from which the samples were drawn is not. In general, the sampling distribution of the means is Normally distributed if the population itself is Normally distributed, if at least 30 samples are used to construct the sampling distribution, or if the samples are large enough to represent the population.

Confidence intervals
So far we can only estimate the population mean if we have taken a large number of random samples, which is potentially very time-consuming and expensive. Rather than trying to find a single estimate for the population mean, we can estimate an interval in which it lies by taking advantage of the Central Limit Theorem.

95% confidence

Two tail confidence

A two tail confidence interval for the population mean


x̄ − Z* · s/√n < μ < x̄ + Z* · s/√n
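As a sketch with invented sample figures (x̄ = 50, s = 8, n = 64), a two-tail 95% interval uses the critical value Z* = 1.96:

```python
# Two-tail 95% confidence interval for the population mean.
import math

x_bar, s, n = 50.0, 8.0, 64
z_star = 1.96                              # 95% two-tail critical value
half_width = z_star * s / math.sqrt(n)     # 1.96 * 8 / 8 = 1.96

lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 2), round(upper, 2))    # 48.04 51.96
```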

One tail confidence

Student's t-distribution


When we have small samples (n < 30), the Central Limit Theorem does not support the use of the Normal distribution, and we instead use Student's t-distribution, whose shape depends on the number of degrees of freedom.

x̄ − t* · s/√n < μ < x̄ + t* · s/√n
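A sketch of the small-sample case, using invented measurements with n = 10 (so 9 degrees of freedom); the critical value t* = 2.262 for 95% two-tail confidence is taken from standard t-tables:

```python
# t-based 95% confidence interval for a small sample (n = 10).
import math
import statistics

data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)      # sample standard deviation
t_star = 2.262                  # from t-tables: 95% two-tail, 9 d.o.f.

half_width = t_star * s / math.sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 3), round(upper, 3))
```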

Hypothesis testing
Suppose that we have some preconceived idea about a population, for example that average life expectancy has increased over the last decade, or that average household disposable income has decreased over the last five years. Such preconceptions are called hypotheses. We can test hypotheses by taking random samples to see whether there is any evidence to support or refute them. In practice, statisticians are very cautious and do not say that they accept a hypothesis; instead they say that a hypothesis cannot be rejected. In general, the hypothesis that we set out to test is called the Null Hypothesis, which we label H0. If the Null Hypothesis is rejected then, implicitly, we cannot reject the alternative to the Null Hypothesis, which we label H1.

Procedure for hypothesis testing


1. Accurately define a precise statement of the Null Hypothesis (H0).
2. Take a random sample from the population.
3. Test to see if the sample supports the Null Hypothesis (H0).
4. If the sample suggests that the Null Hypothesis (H0) is highly improbable, reject it and do not reject the alternative (H1).
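The procedure can be sketched as a one-sample Z-test with invented figures: H0 states that the population mean is 100, σ = 15 is taken as known, and the test is two-tail at the 5% significance level.

```python
# One-sample Z-test of H0: mu = 100 at the 5% significance level.
import math

mu_0, sigma = 100.0, 15.0       # Null Hypothesis and known sigma
x_bar, n = 106.0, 36            # invented sample result

z = (x_bar - mu_0) / (sigma / math.sqrt(n))   # (106-100)/(15/6) = 2.4
critical = 1.96                 # two-tail critical value at 5%

if abs(z) > critical:
    decision = "reject H0"      # sample makes H0 highly improbable
else:
    decision = "do not reject H0"

print(round(z, 2), decision)
```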

Two-tailed significance: acceptance and rejection regions

Level of confidence = 100% - level of significance

One tailed regions

Errors in hypothesis testing


The decision is:                      H0 is TRUE          H0 is FALSE
Do not reject H0 (i.e. reject H1):    Correct decision    Type II error
Reject H0 (i.e. do not reject H1):    Type I error        Correct decision

Chi-squared hypothesis tests


A number of theoretical probability distributions are closely related to the Normal distribution, and one of the most widely used is the Chi-squared distribution. The Chi-squared distribution is often used for making comparisons, particularly between the contents of tables, and gets its name from the fact that it considers squared differences. In fact, the Chi-squared distribution is derived from summing several independent squared Normal distributions. We often encounter the Chi-squared distribution when dealing with variance, because variance also comes from considering squared differences.

Chi-squared distribution
The formula for the Chi-squared distribution is far too complex for most of us to use, so we often rely on tables of frequently used Chi-squared probabilities. Like Student's t-distribution, the shape of the Chi-squared distribution is determined by the degrees of freedom. It becomes similar to the Normal distribution as the number of degrees of freedom increases.

Calculating chi-squared

χ² = Σ (O − E)² / E

where O is an observed frequency and E is the corresponding expected frequency.

Degrees of freedom = number of classes − 1
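A sketch of the calculation with invented data: a die is rolled 60 times, so under a fair-die hypothesis each of the six classes has an expected frequency of 10.

```python
# Chi-squared goodness-of-fit statistic: sum of (O - E)^2 / E.
observed = [8, 12, 9, 11, 10, 10]
expected = [10, 10, 10, 10, 10, 10]   # fair die, 60 rolls

chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1   # number of classes - 1

print(round(chi_squared, 2), degrees_of_freedom)  # 1.0 5
```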

Hypothesis tests of association


The χ² hypothesis test is also widely used for testing whether there is any relationship between the responses to two or more questions in a questionnaire. The responses from the questions are entered into a table known as a contingency table, with r rows and k columns. Expected frequencies are calculated from the relative frequencies, and we calculate the degrees of freedom by:

Degrees of freedom for an (r × k) contingency table = (r − 1) × (k − 1)
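A sketch of the test of association on an invented 2 × 2 contingency table of questionnaire responses; each expected frequency is row total × column total / grand total.

```python
# Chi-squared test of association on a 2 x 2 contingency table.
table = [[30, 20],    # e.g. "Yes" to question 1, split by question 2
         [10, 40]]    # "No" to question 1

row_totals = [sum(row) for row in table]         # [50, 50]
col_totals = [sum(col) for col in zip(*table)]   # [40, 60]
grand_total = sum(row_totals)                    # 100

chi_squared = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected frequency from the relative (marginal) frequencies
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)      # (r-1)(k-1) = 1
print(round(chi_squared, 2), df)
```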
