
Chapter 1: Nature of Statistics

Section 1.1 Statistics Basics

Two major types of Statistics (studies):

1. Descriptive Statistics consists of methods for organizing and
summarizing information.

2. Inferential Statistics consists of methods for drawing and
measuring the reliability of conclusions about a population based on
information obtained from a sample of the population.

[Diagram: a representative sample is selected from the population;
analyses of the sample support statistical inference about the population.]

Population is the complete collection of all elements on which we
measure one or more variables in a statistical study.
Population Size is the number of elements or observations in the
population, denoted by N.

Sample is a subcollection of elements drawn from a population, on
which we measure one or more variables in a statistical study.
Sample Size is the number of elements or observations in the sample,
denoted by n.


Example (1.12) The Music People Buy

Results of monthly telephone surveys yielded the percentage estimates of
all music expenditures shown in the following table. These statistics were
published in the 2004 Consumer Profile. Identify whether this is a
descriptive or inferential statistical study. [Inferential]

Music        Rock       Country    Rap        R&B        Pop        Religious  Kids
Expense (%)  23.9       13.0       12.1       11.3       10.0       6.0        2.8

Music        Jazz       Classical  Oldies     Movies     New        Other      Unknown
Expense (%)  2.7        2.0        1.4        1.1        1.0        8.9        3.8

Example (1.14) Pricey Gasoline

An Associated Press/AOL poll of 1000 US adults, taken April 18-20,
2005 and appearing in the Eau Claire Leader Telegram on April 22, asked
about some of the consequences of the rising cost of gasoline. Of those
sampled, 58% had reduced their driving, 57% had cut back on other
expenses, 41% had planned vacations closer to home, and 41% said that
they may buy a more fuel-efficient vehicle.
a. Identify the population and sample for the study.
b. Are the percentages provided descriptive or inferential statistics?

Ideally, a sample should be representative of the population. This
means that characteristics prevalent in the population are also prevalent
in the sample.

To help ensure a representative sample, the sample should be chosen
appropriately using sampling techniques (sections 1.2 and 1.3), and the
sample should be as large as possible. Note that sample size is
restricted by time, money, effort, etc.

A sample characteristic is called a statistic, whereas a population
characteristic is called a parameter. Sample statistics are estimates of
the population parameters. Example: average tree girth in a forest.
The sample mean is ȳ (y-bar) and the population mean is μ (mu).
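
The distinction between a statistic and a parameter can be illustrated with a small simulation. This is only a sketch: the tree girths below are simulated (made-up numbers, not data from these notes), and the names population, mu, and y_bar are introduced here for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: girths (cm) of N = 1000 trees in a forest.
N = 1000
population = [rng.gauss(100, 15) for _ in range(N)]

# Parameter: the population mean (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: the mean of a simple random sample of n = 50 trees.
n = 50
sample = rng.sample(population, n)
y_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.1f}")
print(f"sample mean (statistic):     {y_bar:.1f}")
```

In a real study only the sample mean would be available; the population mean is computed here only because the whole population is simulated.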


Other ways of Classifying Statistical Studies:

1. Observational Study: researchers simply observe characteristics
and take measurements, as in a sample survey. These studies can only
reveal association.

Example: in a study of high blood pressure and coronary heart disease,
investigators determine the blood pressure and the presence of heart
disease at the same time. If they find an association, they cannot
determine which comes first. Does heart disease result in high blood
pressure, does high blood pressure cause heart disease, or are both high
blood pressure and heart disease the result of some other common cause?

2. Designed Experiment: researchers impose treatments and controls
and then observe characteristics and take measurements. These studies
can establish causation. More on this in section 1.4.

Example: recall the Salk Polio Vaccine Trials. They used a control group
and a treatment group to determine that the vaccine indeed helped reduce
the risk of contracting polio.

Homework: 1.7 to 1.21 (all odd-numbered problems)


Section 1.2 Simple Random Sampling

If an experiment/study requires a sample, then the goal is to have a
representative sample of the population of interest.

First check whether other databases or organizations already have the
information you are looking for. Example: Statistics Canada runs a
Census every 5 years and publishes its results on the internet and in
various databases.

Definitions: Types of Samples

Random Sample: a method of sampling such that each member of the
population under observation has an equal chance of being selected.

Example: choosing a sample of 20 MRC students. Listing the names of all
MRC students and picking twenty names out of a hat gives a random
sample. Picking the first twenty students who walk through East Gate
at 10am does not give a random sample.

Simple Random Sample is a method of sampling such that:

o n subjects are selected in such a way that every possible sample
of size n has the same chance of being chosen.

Two types of simple random sampling:
1. With replacement: a member of the population can be
selected more than once.
2. Without replacement: a member of the population is
selected at most once (this is typically what is done).
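
The difference between the two types can be sketched with Python's standard random module. The population of ten numbered members is hypothetical, chosen only to keep the output short.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population whose members are labelled 1 to 10.
population = list(range(1, 11))

# With replacement: the same member may be selected more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: each member is selected at most once.
without_replacement = rng.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```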

Example: choosing a sample of 20 MRC students by listing all possible
groups of twenty students at MRC and picking one group from a hat is a
simple random sample.

Example: from a group of 4 top students from a high school (April,
May, Jon, Tim), a group of two is to be selected to represent the
school. List all possible groups of 2. If one of these groups is to be
randomly selected to represent the school, what are the chances of
selecting Jon and Tim?
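
One way to work through the four-student example is to enumerate every possible group of two and count how many of them consist of Jon and Tim. The sketch below does this with itertools.combinations; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

students = ["April", "May", "Jon", "Tim"]

# Every possible sample of size 2 (order within a group does not matter).
groups = list(combinations(students, 2))
for group in groups:
    print(group)

# In a simple random sample each group is equally likely, so the chance
# of a particular group is (number of matching groups) / (number of groups).
favourable = [g for g in groups if set(g) == {"Jon", "Tim"}]
p = Fraction(len(favourable), len(groups))
print("P(Jon and Tim) =", p)
```

There are 6 equally likely groups, so the chance of selecting Jon and Tim is 1/6.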


How to choose things randomly?

Have a computer generate a (pseudo) random number (tutorial)
Use a random device - pick number from a hat or roll some dice
Use a random number table (pg 15 or appendix A-5)

Example: (1.44) Keno

In the game of Keno, 20 balls are selected at random from 80 balls,
numbered 1-80.
a. Use a random number table to simulate one game of Keno by
obtaining 20 random numbers between 1 and 80.
b. If you have access to a random-number generator, use it to solve part (a).
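
For the random-number-generator approach, a minimal sketch using Python's standard random module might look like the following. It assumes the 20 balls are drawn without replacement, since a ball cannot be drawn twice in one game.

```python
import random

# The 80 Keno balls, numbered 1 to 80.
keno_balls = range(1, 81)

# Draw 20 distinct balls (sampling without replacement).
draw = sorted(random.sample(keno_balls, 20))
print(draw)
```

Each run prints a different simulated game, because no seed is fixed.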

Homework: 1.31, 1.37, 1.39, 1.41, 1.43


Section 1.3 Other Sampling Techniques

The sampling technique chosen is usually determined by the type of study
being conducted. There are advantages and disadvantages to each sampling
technique.

1. Systematic Random Sampling: simple to conduct, but runs into
problems when there is a pattern in the data.
Step 1: Divide the population size (N) by the sample size (n) and
round the result down to the nearest whole number, m.
Step 2: Use a random number table (or similar device) to obtain a
number, k, between 1 and m.
Step 3: Select for the sample those members of the population that are
numbered k, k+m, k+2m, ...
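
The three steps can be sketched as a short Python function. The function name systematic_sample and the choice to stop after exactly n members are assumptions made for this illustration; the population members are assumed to be numbered 1 to N.

```python
import random

def systematic_sample(N, n, rng=random):
    """Return the member numbers chosen by systematic random sampling."""
    m = N // n             # Step 1: divide N by n and round down
    k = rng.randint(1, m)  # Step 2: random start between 1 and m (inclusive)
    # Step 3: take members k, k+m, k+2m, ... (stopping after n members)
    return [k + i * m for i in range(n)]

# Usage: a sample of 10 from a hypothetical population of 100 members.
chosen = systematic_sample(100, 10, random.Random(3))
print(chosen)
```

Because the largest selected number is k + (n-1)m, which is at most nm, the selection never runs past member N.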

Example (1.50) Keno. In the game of Keno, 20 balls are selected at
random from 80 balls, numbered 1-80. Use systematic random sampling
to obtain a sample of 20 of the 80 balls. Is it reasonable to use systematic
sampling for simulating a game of Keno?
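
One way to approach the last question is to count how many different outcomes each method can produce. The sketch below is an illustration, not the textbook's solution: it lists every systematic sample of 20 from 80 and compares the count with the number of possible simple random samples.

```python
import math

N, n = 80, 20
m = N // n  # interval between selected balls (here 4)

# Each possible starting point k = 1, ..., m gives one systematic sample.
systematic_samples = [[k + i * m for i in range(n)] for k in range(1, m + 1)]
print("possible systematic samples:   ", len(systematic_samples))

# A real Keno draw can be any set of 20 of the 80 balls.
print("possible simple random samples:", math.comb(N, n))
```

Systematic sampling can produce only m = 4 different draws, each with evenly spaced numbers, so most genuine Keno outcomes could never occur.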

2. Cluster Random Sampling: we divide the population into sections (or
clusters), then obtain a simple random sample of the clusters, and then
interview all the members (or a subset of them) of the selected clusters.

Example: treating each city block of Calgary as a cluster. Select several
blocks and interview everyone in the selected clusters. If looking at the
average height of Calgarians, is this reasonable? If looking at the income
of people in Calgary, is this reasonable?
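
Cluster sampling as described above can be sketched as follows. The block names and residents are invented for illustration, and the function cluster_sample is a hypothetical helper; it interviews every member of each selected cluster.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Pick a simple random sample of clusters and return all their members.

    clusters maps a cluster name to the list of its members.
    """
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical city blocks (clusters) and their residents.
blocks = {
    "Block A": ["A1", "A2", "A3"],
    "Block B": ["B1", "B2", "B3"],
    "Block C": ["C1", "C2", "C3"],
    "Block D": ["D1", "D2", "D3"],
}

sample_members = cluster_sample(blocks, 2, random.Random(4))
print(sample_members)
```

The randomness applies to whole blocks, so residents of the same block are always selected together.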

3. Stratified Random Sampling: we divide the population into at least two
different subpopulations (or strata) whose members share some
characteristic, then draw a simple random sample from each stratum.

Example: looking at the prevalence of heart disease in different ethnic
groups.
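
A minimal sketch of stratified sampling is shown below. The strata and their members are invented, and taking the same number of members from every stratum is an assumption made to keep the example short; other allocations are possible.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=random):
    """Draw a simple random sample of n_per_stratum members from each stratum.

    strata maps a stratum name to the list of its members.
    """
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata; each person is identified by a number.
strata = {
    "Group 1": list(range(100, 140)),
    "Group 2": list(range(200, 225)),
    "Group 3": list(range(300, 360)),
}

chosen_by_stratum = stratified_sample(strata, 5, random.Random(5))
for name, members in chosen_by_stratum.items():
    print(name, sorted(members))
```

Unlike cluster sampling, every stratum is represented in the final sample.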


Homework: 1.49, 1.53


How good is the sample?

Sampling Bias is the systematic tendency for some values to be
selected more often than others.

o The Banff Park lake study examines the distribution of body length
in a certain population of fish in a lake. The sample will be
collected using a fishing net. Smaller fish can more easily slip
through the holes in the net. Thus, smaller fish are less likely to
be caught than larger ones, so the sampling procedure is biased.

o When bias occurs, the sample mean fish length ȳ will not be a
good estimate of the population mean fish length μ.
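
The effect of this kind of bias can be sketched with a simulation. Everything below is hypothetical: the fish lengths are simulated, and the rule that a fish's chance of being caught is proportional to its length is an assumption chosen only to mimic small fish slipping through the net.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: lengths (cm) of 2000 fish in the lake.
lengths = [rng.uniform(5, 40) for _ in range(2000)]
mu = statistics.mean(lengths)

# Unbiased procedure: a simple random sample of 400 fish.
srs = rng.sample(lengths, 400)
y_bar_srs = statistics.mean(srs)

# Biased procedure: each fish is caught with probability proportional
# to its length, so small fish are under-represented in the catch.
longest = max(lengths)
netted = [x for x in lengths if rng.random() < x / longest]
y_bar_net = statistics.mean(netted)

print(f"population mean:         {mu:.1f}")
print(f"simple random sample:    {y_bar_srs:.1f}")
print(f"net (biased) sample:     {y_bar_net:.1f}")
```

The net sample is larger than the simple random sample, yet its mean overestimates the population mean; a larger sample does not remove bias.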

Other errors can also occur, such as measurement errors, missing data,
nonresponse, and transcription errors. There are methods for dealing with
these.


Section 1.4 Experimental Design

Definitions: In a designed experiment,

Experimental Units: the individuals or items on which the
experiment is performed
Treatment: each experimental condition
Response Variable: the characteristic of the experimental outcome
that is to be measured or observed
Factor: a variable whose effect on the response variable is of interest
in the experiment
Levels: the possible values of a factor

Example (1.62) Storage of Perishable Items

Storage of perishable items is an important concern for many companies.
One study examined the effects of storage time and storage temperature on
the deterioration of a particular item. Three different storage temperatures
and five different storage times were used. Identify the experimental units,
response variable, factor(s), levels of each factor, and treatments.
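
When there are two factors, each treatment is a combination of one level from each factor. The sketch below enumerates the combinations for this example; the level labels are placeholders, because the notes do not give the actual temperatures or times.

```python
from itertools import product

# Placeholder level labels (the actual values are not given in the notes).
temperatures = ["temperature 1", "temperature 2", "temperature 3"]
times = ["time 1", "time 2", "time 3", "time 4", "time 5"]

# Each treatment pairs one storage temperature with one storage time.
treatments = list(product(temperatures, times))

for treatment in treatments:
    print(treatment)
print("number of treatments:", len(treatments))
```

With 3 levels of one factor and 5 of the other, there are 3 x 5 = 15 treatments.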

The following principles of experimental design enable a researcher to
conclude that differences in the results of an experiment, not reasonably
attributable to chance, are likely caused by the treatments.

Control: two or more treatments should be compared. The group
receiving the treatment is called the treatment group, and the group
receiving a control (placebo) is called the control group.

Randomization: the experimental units should be randomly divided
into groups to avoid unintentional selection bias in constituting the
groups.

Replication: a sufficient number of experimental units should be
used to ensure that randomization creates groups that resemble each
other closely and to increase the chance of detecting differences among
the treatments.

Example: recall the Salk polio vaccine trials.


Statistical Designs
We look at two statistical designs:

1. Completely Randomized Design: all experimental units are assigned
randomly among all the treatments.

2. Randomized Block Design: experimental units that are similar in
ways that are expected to affect the response variable are grouped in
blocks, and the units are then assigned randomly among all the
treatments separately within each block.

Example (1.66) Lifetimes of Flashlight Batteries.

Two different options are under consideration for comparing the lifetimes of
four brands of flashlight battery, using 20 different flashlights. Determine
the design used in each option.
a. One option is to randomly divide 20 flashlights into four groups of
five flashlights each and then randomly assign each group to use a
different brand of battery.
b. Another option is to use 20 flashlights (five different brands of 4
flashlights each) and randomly assign the 4 flashlights of each brand
to use a different brand of battery.
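
The two options can be sketched in Python to show how the random assignment differs. Option a randomizes over all 20 flashlights at once, as in a completely randomized design; option b randomizes separately within each flashlight brand, treating the brands as blocks. The battery and flashlight labels are hypothetical.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable
battery_brands = ["B1", "B2", "B3", "B4"]  # hypothetical battery labels

# Option a: shuffle all 20 flashlights, split them into four groups of
# five, and give each group a randomly chosen battery brand.
flashlights = list(range(1, 21))
rng.shuffle(flashlights)
groups = [flashlights[i:i + 5] for i in range(0, 20, 5)]
brand_order = rng.sample(battery_brands, k=4)
option_a = {brand: sorted(group) for brand, group in zip(brand_order, groups)}

# Option b: five flashlight brands (blocks) with 4 flashlights each.
# Within each block, the four battery brands are assigned in random order.
blocks = {f"flashlight brand {b}": [f"{b}-{i}" for i in range(1, 5)]
          for b in "VWXYZ"}
option_b = {}
for block, units in blocks.items():
    order = rng.sample(battery_brands, k=4)
    option_b[block] = dict(zip(units, order))

print(option_a)
print(option_b)
```

In option b every battery brand is tested once on every flashlight brand, so differences between flashlight brands cannot be mistaken for differences between batteries.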

Homework: 1.61, 1.63

Homework (review problems): 3, 5, 9, 19, 23, 25


We have now completed Chapter 1, in which we discussed methods for
collecting data and designing experiments.
In Chapters 2 and 3 we are going to look at methods for classifying,
tabulating and summarizing data using Descriptive Statistics and
Graphing techniques.