
Chapter 1: Nature of Statistics

Section 1.1 Statistics Basics

Two major types of Statistics (studies):

1. Descriptive Statistics consists of methods for organizing and
summarizing information.

2. Inferential Statistics consists of methods for drawing and
measuring the reliability of conclusions about a population based on
information obtained from a sample of the population.

[Figure: a representative Sample is selected from the POPULATION;
analyses of the sample are used for statistical inference about the
population.]

Definitions:
Population is the complete collection of all elements on which we
measure one or more variables in a statistical study.
Population Size is the number of elements or observations in the
population, denoted by N.

Sample is a subcollection of elements drawn from a population on
which we measure one or more variables in a statistical study.
Sample Size is the number of elements or observations in the sample,
denoted by n.


Example (1.12) The Music People Buy

Results of monthly telephone surveys yielded the percentage estimates of
all music expenditures shown in the following table. These statistics were
published in 2004 Consumer Profile. Identify if this is a descriptive or
inferential statistical study. [Inferential]

Music       Rock   Country   Rap    R&B    Pop    Religious   Kids
Expense %   23.9   13.0      12.1   11.3   10.0   6.0         2.8

Music       Jazz   Classical   Oldies   Movies   New Age   Other   Unknown
Expense %   2.7    2.0         1.4      1.1      1.0       8.9     3.8

Example (1.14) Pricey Gasoline

An Associated Press/AOL poll of 1000 US adults, taken April 18-20,
2005 and appearing in the Eau Claire Leader Telegram on April 22, asked
about some of the consequences of the rising cost of gasoline. Of those
sampled, 58% had reduced their driving, 57% had cut back on other
expenses, 41% had planned vacations closer to home, and 41% said that
they may buy a more fuel-efficient vehicle.
a. Identify the population and sample for the study.
b. Are the percentages provided descriptive or inferential statistics?
[Descriptive]
NOTE:
Ideally a sample should be representative of the population. This
means that characteristics prevalent in the population are also
prevalent in the sample.

To help ensure a representative sample, the sample should be chosen
using an appropriate sampling technique (Sections 1.2 and 1.3) and
should be as large as possible. Note that sample size is restricted by
time, money, effort, etc.

A sample characteristic is called a statistic, whereas a population
characteristic is called a parameter. Sample statistics are estimates of
the population parameters. Example: average tree girth in a forest.
The sample mean is ȳ and the population mean is μ.


Other ways of Classifying Statistical Studies:

1. Observational Study: researchers simply observe characteristics
and take measurements, as in a sample survey. These studies can only
reveal association.

Example: in a study of high blood pressure and coronary heart disease,
investigators determine the blood pressure and the presence of heart
disease at the same time. If they find an association, they cannot
determine which comes first. Does heart disease result in high blood
pressure, does high blood pressure cause heart disease, or are both high
blood pressure and heart disease the result of some other common cause?

2. Designed Experiment: researchers impose treatments and controls
and then observe characteristics and take measurements. These studies
can establish causation. More on this in Section 1.4.

Example: recall the Salk Polio Vaccine Trials, which used a control group
and a treatment group to determine that the vaccine indeed helped reduce
the risk of contracting polio.

Homework: 1.7-1.21 (all odd-numbered problems)


Section 1.2 Simple Random Sampling

If an experiment or study requires a sample, then the goal is to have a
representative sample of the population of interest.

Check to see if other databases or other organizations already have the
information you are looking for. Example: Statistics Canada runs a
census every 5 years and publishes the results on the internet and in
various databases.

Definitions: Types of Samples

Random Sample: a method of sampling such that each member of the
population under observation has an equal chance of being selected.

Example: choosing a sample of 20 MRC students. Listing the names of all
MRC students and picking twenty names out of a hat gives a random
sample. Picking the first twenty students who walk through East Gate
at 10am does not give a random sample.

Simple Random Sample is a method of sampling such that n subjects are
selected in such a way that every possible sample of size n has the
same chance of being chosen.
Two types of simple random sampling:
1. with replacement: a member of the population can be
selected more than once
2. without replacement: a member of the population is
selected at most once (this is typically what is done)
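
The two types can be sketched in Python. This is only an illustration: the population of ten numbered members is hypothetical, `random.choices` is used for sampling with replacement, and `random.sample` for sampling without replacement.

```python
import random

# Hypothetical population: ten members labelled 1..10 (illustrative only).
population = list(range(1, 11))

# With replacement: the same member may appear more than once.
with_replacement = random.choices(population, k=5)

# Without replacement: each member appears at most once.
without_replacement = random.sample(population, k=5)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```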

Example: choosing a sample of 20 MRC students by listing all possible
groups of twenty students at MRC and picking a group from a hat is a
simple random sample.

Example: from a group of 4 top students from a high school (April,
May, Jon, Tim), a group of two is to be selected to represent the
school. List all possible groups of 2. If one of these groups is to be
randomly selected to represent the school, what are the chances of
selecting Jon and Tim?
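
One way to check the listing is to enumerate the groups in Python. The sketch below assumes that order within a group does not matter and that every group is equally likely, as in a simple random sample.

```python
from fractions import Fraction
from itertools import combinations

students = ["April", "May", "Jon", "Tim"]

# Every possible group of 2 (order within a group does not matter).
groups = list(combinations(students, 2))
for group in groups:
    print(group)

# Under simple random sampling each group is equally likely.
chance = Fraction(1, len(groups))
print("Number of groups:", len(groups))
print("Chance of selecting Jon and Tim:", chance)
```

There are 6 possible groups, so the chance of selecting Jon and Tim is 1/6.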


How to choose things randomly?

o Have a computer generate a (pseudo) random number (tutorial)
o Use a random device: pick a number from a hat or roll some dice
o Use a random number table (pg 15 or appendix A-5)

Example: (1.44) Keno


In the game of Keno, 20 balls are selected at random from 80 balls,
numbered 1-80.
a. Use a random number table to simulate one game of keno by
obtaining 20 random numbers between 1 and 80.
b. If you have access to a random-number generator, use it to solve part
(a)
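
For part (b), a minimal sketch using Python's standard `random` module as the random-number generator. The function name `simulate_keno` is made up for this illustration.

```python
import random

def simulate_keno(rng=random):
    """Draw 20 distinct balls from balls numbered 1-80."""
    # random.sample selects without replacement, so no ball repeats.
    return sorted(rng.sample(range(1, 81), 20))

print(simulate_keno())
```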

Homework: 1.31, 1.37, 1.39, 1.41, 1.43


Section 1.3 Other Sampling Techniques

The sampling technique chosen is usually determined by the type of study
being conducted. There are advantages and disadvantages to each sampling
technique.

1. Systematic Random Sampling: simple to conduct but runs into
problems when there is a pattern in the data.
Step 1: Divide the population size (N) by the sample size (n) and
round the result down to the nearest whole number, m.
Step 2: Use a random number table (or similar device) to obtain a
number, k, between 1 and m.
Step 3: Select for the sample those members of the population that are
numbered k, k+m, k+2m, ...

Example (1.50) Keno. In the game of Keno, 20 balls are selected at
random from 80 balls, numbered 1-80. Use systematic random sampling
to obtain a sample of 20 of the 80 balls. Is it reasonable to use
systematic sampling for simulating a game of Keno?
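
The three steps of systematic random sampling can be sketched in Python as follows. The function name `systematic_sample` is hypothetical, and `random.randint` stands in for the random number table.

```python
import random

def systematic_sample(N, n, rng=random):
    """Return the member numbers chosen by systematic random sampling."""
    m = N // n                # Step 1: divide N by n and round down
    k = rng.randint(1, m)     # Step 2: random start between 1 and m
    # Step 3: take members k, k+m, k+2m, ... until n are chosen
    return [k + i * m for i in range(n)]

# Keno setting from the example: N = 80 balls, n = 20 selections.
print(systematic_sample(80, 20))
```

With N = 80 and n = 20 we get m = 4, so the starting number k can take only four values and only four different samples can ever be produced. This is worth keeping in mind when judging whether the method suits Keno.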

2. Cluster Random Sampling: we divide the population into sections (or
clusters), then obtain a simple random sample of the clusters, and then
interview all the members (or a subset of them) of the selected
clusters.

Example - treating city blocks of Calgary as clusters. Select several
blocks and interview everyone in the selected blocks. If looking at the
average height of Calgarians, is this reasonable? If looking at the
income of people in Calgary, is this reasonable?
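
A minimal Python sketch of cluster sampling, assuming every member of a selected cluster is interviewed. The block names and resident labels are made up for illustration and are not real data; `cluster_sample` is a hypothetical helper.

```python
import random

def cluster_sample(clusters, num_clusters, rng=random):
    """Pick whole clusters by simple random sampling, then keep every member."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical city blocks and residents (made-up labels).
blocks = {
    "Block A": ["a1", "a2", "a3"],
    "Block B": ["b1", "b2"],
    "Block C": ["c1", "c2", "c3", "c4"],
    "Block D": ["d1", "d2"],
}

sample = cluster_sample(blocks, 2)
for block, residents in sample.items():
    print(block, residents)
```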

3. Stratified Random Sampling: we divide the population into at least two
different subpopulations (or strata) whose members share some
characteristic. Then we draw a simple random sample from each stratum.

Example - looking at the prevalence of heart disease in different ethnic
groups.
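
A minimal Python sketch of stratified sampling. The notes do not say how many members to draw from each stratum, so taking the same number from every stratum is an assumption made here; the strata names and member IDs are hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, rng=random):
    """Draw a simple random sample of fixed size from each stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical strata with made-up member IDs.
strata = {
    "Group 1": list(range(100, 120)),
    "Group 2": list(range(200, 230)),
    "Group 3": list(range(300, 310)),
}

picked = stratified_sample(strata, 3)
for name, members in picked.items():
    print(name, members)
```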

Homework: 1.49, 1.53


How good is the sample?

Sampling Bias is the systematic tendency for some values to be
selected more often than others.

o The Banff Park lake study looks at the distribution of body length
in a certain population of fish in a lake. The sample will be
collected using a fishing net. Smaller fish can more easily slip
through the holes in the net. Thus, smaller fish are less likely to
be caught than larger ones, so the sampling procedure is biased.

o When bias occurs, the sample mean fish length ȳ will not be a
good estimate of the population mean fish length μ.
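
The effect of the net can be illustrated with a small simulation. Everything in it is assumed for illustration: the lake of 10,000 fish, the lengths between 10 and 50 cm, and the rule that a fish of length x is caught with probability x/50. None of these numbers come from the Banff study.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical lake: 10,000 fish with lengths between 10 and 50 cm.
lengths = [rng.uniform(10, 50) for _ in range(10_000)]

# Assumed net: the chance of being caught grows with length (length / 50).
caught = [x for x in lengths if rng.random() < x / 50]

population_mean = sum(lengths) / len(lengths)
net_sample_mean = sum(caught) / len(caught)

print(f"population mean length: {population_mean:.1f} cm")
print(f"net sample mean length: {net_sample_mean:.1f} cm")
```

Because longer fish are over-represented among those caught, the net sample mean comes out larger than the population mean, even though the sample is large.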

Other errors can occur, such as measuring errors, missing data,
nonresponse, and transcription errors. There are methods for dealing
with these.


Section 1.4 Experimental Design

Definitions: In a designed experiment,

Experimental Units: the individuals or items on which the
experiment is performed
Treatment: each experimental condition
Response Variable: the characteristic of the experimental outcome
that is to be measured or observed
Factor: a variable whose effect on the response variable is of interest
in the experiment
Levels: the possible values of a factor

Example (1.62) Storage of Perishable Items


Storage of perishable items is an important concern for many companies.
One study examined the effects of storage time and storage temperature on
the deterioration of a particular item. Three different storage temperatures
and five different storage times were used. Identify the experimental units,
response variable, factor(s), level of each factor, and treatments.
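
To see how treatments arise from factor levels, the sketch below pairs every temperature level with every time level. The level labels (T1, t1, and so on) are hypothetical; the notes give only the counts, three temperatures and five times.

```python
from itertools import product

# Hypothetical level labels; only the counts (3 and 5) come from the example.
temperatures = ["T1", "T2", "T3"]
times = ["t1", "t2", "t3", "t4", "t5"]

# Each treatment is one combination of a temperature level and a time level.
treatments = list(product(temperatures, times))

for treatment in treatments:
    print(treatment)
print(len(treatments), "treatments")
```

Two factors with 3 and 5 levels give 3 x 5 = 15 treatments.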

The following principles of experimental design enable a researcher to
conclude that differences in the results of an experiment, not reasonably
attributable to chance, are likely caused by the treatments.

Control: two or more treatments should be compared. The group
receiving the treatment is called the treatment group, and the group
receiving a control (placebo) is called the control group.

Randomization: the experimental units should be randomly divided
into groups to avoid unintentional selection bias in constituting the
groups.

Replication: a sufficient number of experimental units should be
used to ensure that randomization creates groups that resemble each
other closely and to increase the chance of detecting differences
between treatments.

Example: recall the Salk Polio vaccine trials.


Statistical Designs
We look at two statistical designs:

1. Completely Randomized Design: all experimental units are assigned
randomly among all the treatments.

2. Randomized Block Design: experimental units that are similar in
ways expected to affect the response variable are grouped into blocks.
The experimental units are then assigned randomly among all the
treatments separately within each block.

Example (1.66) Lifetimes of Flashlight Batteries.

Two different options are under consideration for comparing the lifetimes of
four brands of flashlight battery, using 20 different flashlights. Determine
the design used in each option.
a. One option is to randomly divide 20 flashlights into four groups of
five flashlights each and then randomly assign each group to use a
different brand of battery.
b. Another option is to use 20 flashlights (five different brands of 4
flashlights each) and randomly assign the 4 flashlights of each brand
to use a different brand of battery.
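
The two designs can be contrasted with a short Python sketch. All names are hypothetical: the battery and flashlight labels, the brand letters, and the helper functions `completely_randomized` and `randomized_block`. Option (a) is treated as a completely randomized design and option (b) as a randomized block design with flashlight brand as the block.

```python
import random

def completely_randomized(units, treatments, rng=random):
    """Shuffle all units, then deal them out evenly across treatments."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size]
            for i, t in enumerate(treatments)}

def randomized_block(blocks, treatments, rng=random):
    """Within each block, randomly assign one unit to each treatment."""
    assignment = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)  # randomize separately inside every block
        assignment[block] = dict(zip(units, order))
    return assignment

# Hypothetical battery brand names.
batteries = ["Battery A", "Battery B", "Battery C", "Battery D"]

# Option (a): 20 flashlights, no blocking.
flashlights = [f"F{i}" for i in range(1, 21)]
crd = completely_randomized(flashlights, batteries)

# Option (b): 5 flashlight brands (blocks) with 4 flashlights each.
by_brand = {f"Brand {b}": [f"{b}{i}" for i in range(1, 5)] for b in "VWXYZ"}
rbd = randomized_block(by_brand, batteries)

print(crd)
print(rbd)
```

In the first design each battery brand gets five flashlights drawn from the whole pool; in the second, every flashlight brand sees all four battery brands exactly once.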

Homework 1.61, 1.63


Homework review problems
3, 5, 9, 19, 23, 25

SUMMARY

We have now completed Chapter 1, in which we discussed methods for
collecting data and designing experiments.
In Chapters 2 and 3 we are going to look at methods for classifying,
tabulating and summarizing data using Descriptive Statistics and
Graphing techniques.