Descriptive Statistics: methods for summarizing information.

Inferential Statistics: methods for measuring the reliability of conclusions about a population based on information obtained from a sample of the population.

[Diagram: a sample is drawn from the population, and the sample is used to make an inference about the population.]

Definitions:

Population: the complete collection of all elements on which we measure one or more variables in a statistical study.

Population Size: the number of elements or observations in the population, denoted by N.

Sample: the part of the population on which we measure one or more variables in a statistical study.

Sample Size: the number of elements or observations in the sample, denoted by n.


Chapter 1: Nature of Statistics

Results of monthly telephone surveys yielded the percentage estimates of all music expenditures shown in the following table. These statistics were published in 2004 Consumer Profile. Identify if this is a descriptive or inferential statistical study. [Inferential]

Expense (%) for the first seven music types (type labels missing): 23.9, 13.0, 12.1, 11.3, 10.0, 6.0, 2.8

Music type    Expense (%)
Jazz          2.7
Classical     2.0
Oldies        1.4
Movies        1.1
New Age       1.0
Other         8.9
Unknown       3.8

An Associated Press/AOL poll of 1000 US adults, taken April 18-20, 2005 and appearing in the Eau Claire Leader Telegram on April 22, asked about some of the consequences of the rising cost of gasoline. Of those sampled, 58% had reduced their driving, 57% had cut back on other expenses, 41% had planned vacations closer to home, and 41% said that they may buy a more fuel-efficient vehicle.

a. Identify the population and sample for the study.

b. Are the percentages provided descriptive or inferential statistics? [Descriptive]

NOTE:

Ideally a sample should be representative of the population. This means that characteristics prevalent in the population are also prevalent within the sample.

To obtain a representative sample, choose it appropriately using sampling techniques (sections 1.2 and 1.3), and choose as large a sample as possible. Note that sample size is restricted by time, money, effort, etc.

A sample characteristic is called a statistic and a population characteristic is called a parameter. Sample statistics are estimates of the population parameters. Example: average tree girth in a forest. The sample mean is ȳ and the population mean is μ.


Observational Study: researchers simply observe characteristics and take measurements, as in a sample survey. These studies can only reveal association.

Example: In a study of blood pressure and heart disease, investigators determine the blood pressure and the presence of heart disease at the same time. If they find an association, they cannot determine which comes first. Does heart disease result in high blood pressure, does high blood pressure cause heart disease, or are both high blood pressure and heart disease the result of some other common cause?

Designed Experiment: researchers impose treatments and controls and then observe characteristics and take measurements. These studies can establish causation. More on this in section 1.4.

Example: Recall the Salk Polio Vaccine Trials. They used a control group and a treatment group to determine that the vaccine indeed helped reduce the risk of contracting polio.


If an experiment/study requires a sample, then the goal is to have a representative sample of a population of interest.

Before sampling, check whether someone else has already collected the information you are looking for. Example: Statistics Canada runs a Census every 5 years and publishes its results on the internet and in various databases.

Random Sample: a method of sampling such that each member of a population under observation has an equal chance of being selected.

Example: Taking the names of all MRC students and picking twenty names out of a hat is a random sample. Picking the first twenty students who walk through East Gate at 10am is not a random sample.

Simple Random Sample: n subjects are selected in such a way that every possible sample of size n has the same chance of being chosen.

Two types of simple random sampling:

1. With replacement: a member of the population can be selected more than once.

2. Without replacement: a member of the population is selected at most once (this is typically what is done).

Example: Listing all possible groups of twenty students at MRC and picking a group from a hat is a simple random sample.

Example: From a group of students (..., May, Jon, Tim), a group of two is to be selected to represent the school. List all possible groups of 2. If one of these groups is to be randomly selected to represent the school, what are the chances of selecting Jon and Tim?
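
The counting in this kind of example can be sketched in Python. The roster below is hypothetical: only May, Jon and Tim appear in the example, and the other two names are placeholders, so the resulting probability applies only to this assumed roster of five.

```python
from itertools import combinations
from fractions import Fraction

# Hypothetical roster: only May, Jon and Tim come from the example;
# Ann and Bob are placeholders added for illustration.
students = ["Ann", "Bob", "May", "Jon", "Tim"]

# Every possible group of 2 (order within a group does not matter).
groups = list(combinations(students, 2))

# In a simple random sample every group is equally likely.
chance = Fraction(1, len(groups))

for g in groups:
    print(g)
print("Chance of selecting Jon and Tim:", chance)
```

With a different number of students the list of groups, and therefore the chance, changes accordingly.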


Have a computer generate a (pseudo) random number (tutorial)

Use a random device - pick a number from a hat or roll some dice

Use a random number table (pg 15 or appendix A-5)

In the game of Keno, 20 balls are selected at random from 80 balls, numbered 1-80.

a. Use a random number table to simulate one game of Keno by obtaining 20 random numbers between 1 and 80.

b. If you have access to a random-number generator, use it to solve part (a).
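
As a rough sketch of part (b), Python's standard `random` module can play the role of the random-number generator. The variable names are illustrative only.

```python
import random

balls = range(1, 81)  # balls numbered 1-80

# Sampling without replacement: 20 distinct balls, as in a real Keno game.
keno_draw = sorted(random.sample(balls, 20))
print(keno_draw)

# Sampling with replacement, for contrast: repeats are possible,
# so this would not be a valid Keno game.
with_repeats = random.choices(balls, k=20)
print(sorted(with_repeats))
```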


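The sampling technique used depends on the study being conducted. There are advantages and disadvantages to each sampling technique.

Systematic Random Sampling: this technique can cause problems when there is a pattern in the data.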

Step 1: Divide the population size (N) by the sample size (n) and round the result down to the nearest whole number, m.

Step 2: Use a random number table (or similar device) to obtain a number, k, between 1 and m.

Step 3: Select for the sample those members of the population that are numbered k, k+m, k+2m, ...

Example: In the game of Keno, 20 balls are selected at random from 80 balls, numbered 1-80. Use systematic random sampling to obtain a sample of 20 of the 80 balls. Is it reasonable to use systematic sampling for simulating a game of Keno?
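
The three steps can be sketched as a short Python function. The function name `systematic_sample` is made up for this illustration.

```python
import random

def systematic_sample(N, n):
    """Return n member numbers chosen by Steps 1-3."""
    m = N // n                            # Step 1: divide N by n, round down
    k = random.randint(1, m)              # Step 2: random start between 1 and m
    return [k + i * m for i in range(n)]  # Step 3: k, k+m, k+2m, ...

sample = systematic_sample(80, 20)
print(sample)
```

For N = 80 and n = 20 the step is m = 4, so only four different samples can ever come out (one for each starting value k), and the selected balls are always equally spaced.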

Cluster Sampling: divide the population into groups (or clusters), then obtain a simple random sample of the clusters, and then interview all the members (or a subset of them) of the selected clusters.

Example: Randomly select city blocks and interview all those in the selected clusters. If looking at the average height of Calgarians, is this reasonable? If looking at the income of people in Calgary, is this reasonable?

Stratified Sampling: divide the population into different subpopulations (or strata) that share some characteristic. Then draw a simple random sample from each stratum.

groups.
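
A minimal sketch of drawing a simple random sample from each stratum is given below. The strata, their sizes and the member numbers are all hypothetical, and taking the same number from every stratum is just one possible choice.

```python
import random

def stratified_sample(strata, per_stratum):
    """Draw a simple random sample of size per_stratum from each stratum."""
    return {name: random.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: age groups with made-up member numbers.
strata = {
    "under 30": list(range(1, 41)),
    "30 to 59": list(range(41, 101)),
    "60 and over": list(range(101, 131)),
}

chosen = stratified_sample(strata, 5)
for name, members in chosen.items():
    print(name, sorted(members))
```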


Sampling Bias: occurs when some members of the population are likely to be selected more often than others.

Example: Suppose we want to estimate the mean length of a certain population of fish in a lake. The sample will be collected using a fishing net. Smaller fish can more easily slip through the holes in the net. Thus, smaller fish are less likely to be caught than larger ones, so the sampling procedure is biased.

o When bias occurs the sample mean fish length ȳ will not be a good estimate of the population mean fish length μ.
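
The effect of the net can be illustrated with a small simulation. Everything numerical here is an assumption made for the sketch: the fish lengths are made up, and the rule that a fish of length x is caught with probability x / 50 is only one simple way to model smaller fish slipping through.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 fish lengths in cm (made up).
population = [random.uniform(5, 50) for _ in range(10_000)]
mu = mean(population)

# Assumed net: a fish of length x is caught with probability x / 50,
# so smaller fish slip through more often.
caught = [x for x in population if random.random() < x / 50]
biased_sample = random.sample(caught, 200)

# For comparison, a simple random sample of the same size.
srs = random.sample(population, 200)

print("population mean:", round(mu, 1))
print("net sample mean:", round(mean(biased_sample), 1))
print("simple random sample mean:", round(mean(srs), 1))
```

Under these assumptions the net sample mean comes out noticeably larger than the population mean, while the simple random sample mean stays close to it.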

Other errors can occur, such as measurement errors, missing data, non-response, and transcription errors. There are methods for dealing with these.


Experimental Units: the individuals or items on which the experiment is performed

Treatment: each experimental condition

Response Variable: the characteristic of the experimental outcome that is to be measured or observed

Factor: a variable whose effect on the response variable is of interest in the experiment

Levels: the possible values of a factor

Storage of perishable items is an important concern for many companies. One study examined the effects of storage time and storage temperature on the deterioration of a particular item. Three different storage temperatures and five different storage times were used. Identify the experimental units, response variable, factor(s), levels of each factor, and treatments.
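
When there are two factors, each treatment is one combination of a level of the first factor with a level of the second. A short sketch of enumerating such combinations follows. The level names are placeholders, since the example gives only the number of levels and not their values.

```python
from itertools import product

# Placeholder level names: the example states only how many levels there are.
temperatures = ["T1", "T2", "T3"]
times = ["t1", "t2", "t3", "t4", "t5"]

# Each treatment pairs one storage temperature with one storage time.
treatments = list(product(temperatures, times))

for treatment in treatments:
    print(treatment)
print("number of treatments:", len(treatments))
```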

--

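The following principles of experimental design allow a researcher to conclude that differences in the results of an experiment, not reasonably attributable to chance, are likely caused by the treatments.

Control: two or more treatments should be compared. The group receiving the treatment is called the treatment group, and the group receiving a control (placebo) is called the control group.

Randomization: the experimental units should be randomly divided into groups to avoid unintentional selection bias in constituting the groups.

Replication: a sufficient number of experimental units should be used to ensure that randomization creates groups that resemble each other closely and to increase the chance of detecting differences among the treatments.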


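Statistical Designs

We look at two statistical designs:

Completely Randomized Design: all the experimental units are assigned randomly among all the treatments.

Randomized Block Design: the experimental units are assigned randomly among all the treatments separately within each block. Experimental units that are similar in ways that are expected to affect the response variable are grouped in blocks.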

Two different options are under consideration for comparing the lifetimes of four brands of flashlight battery, using 20 different flashlights. Determine the design used in each option.

a. One option is to randomly divide the 20 flashlights into four groups of five flashlights each and then randomly assign each group to use a different brand of battery.

b. Another option is to use 20 flashlights, five different brands with 4 flashlights each, and randomly assign the 4 flashlights of each brand to use a different brand of battery.
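
The two assignment schemes in options (a) and (b) can be sketched in Python. The battery brand names are placeholders, and the sketch assumes that flashlights 1-4 are of one flashlight brand, 5-8 of the next, and so on.

```python
import random

batteries = ["A", "B", "C", "D"]     # placeholder battery brand names
flashlights = list(range(1, 21))     # flashlights numbered 1-20

# Option (a): shuffle all 20 flashlights, then cut the shuffled list
# into four groups of five, one group per battery brand.
shuffled = random.sample(flashlights, len(flashlights))
option_a = {b: shuffled[5 * i: 5 * i + 5] for i, b in enumerate(batteries)}

# Option (b): within each flashlight brand (assumed to be flashlights
# 1-4, 5-8, ...), hand out the four battery brands in random order.
option_b = {}
for block in range(5):
    units = flashlights[4 * block: 4 * block + 4]
    order = random.sample(batteries, len(batteries))
    option_b[block + 1] = dict(zip(units, order))

print(option_a)
print(option_b)
```

In option (a) the randomization runs over all 20 flashlights at once, while in option (b) it is repeated separately inside each group of 4 similar flashlights.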

Homework review problems

3, 5, 9, 19, 23, 25

SUMMARY

In Chapter 1 we looked at methods for collecting data and creating experiments.

In Chapters 2 and 3 we are going to look at methods for classifying, tabulating and summarizing data using Descriptive Statistics and Graphing techniques.

