
About me

Not a statistician! I work in the area of chemometrics, so the emphasis will be on practical knowledge and application of statistics rather than theorems and proofs.

Why you should be here


- you have an interest in statistical reasoning
- you have a desire to learn to use statistics properly in experimental design and data analysis
- you want to develop your ability to critically assess scientific (or pseudoscientific) arguments

What is expected of you


- attendance at most lectures
- feedback to me on what you like and dislike about the course, and especially how it might be improved

Objectives
- Understand the fundamental principles of statistical inference.
- Understand the general principles underlying the most common tests.
- Know the assumptions of common tests and understand the impact of violations.
- Be able to perform standard statistical analyses.

Some opinions of statistics


If your experiment needs statistics, you should have done a better experiment.

Ernest Rutherford



There are three types of lies: lies, damn lies, and statistics!

Benjamin Disraeli


To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.

Sir Ronald Fisher



The purpose of models is not to fit the data, but to sharpen the questions.

Samuel Karlin

What statistics can and can't do


Can
- provide objective criteria for evaluating hypotheses
- synthesize information (not without information loss: keep your raw data!)
- help detect patterns in messy data
- help optimize effort
- help you critically evaluate arguments

Can't
- tell the truth (probabilistic conclusions only!)
- compensate for poor design
- indicate biological significance: statistical significance does not mean biological significance, nor vice versa!

The four ages of a biostatistician


Age: Stone
Defining characteristics: Total ignorance
Comment: Ignorance is not bliss!

Age: Bronze
Defining characteristics: Nodding familiarity, but understanding purely superficial
Comment: Statistics a (small) sidebar to scientific investigation (see Rutherford, Ernest)

Age: Silver
Defining characteristics: Moderate familiarity coupled with a strong desire to demonstrate same; statistical reach exceeds grasp
Comment: Overwhelming concern with statistical minutiae; scientific forest often obscured by statistical trees.

Age: Gold
Defining characteristics: Knows when statistical issues are (and are not) important; recognizes limitations (of self and statistical science)
Comment: That to which we can/should all aspire.


Getting Started

Statistics: the collection, organization, analysis and interpretation of DATA to answer questions and make decisions.


Populations and samples: Study a part in order to gain information about the whole

[Figure: a sample is drawn from the population and then analysed.]


An Ideal Sample

Ideally, a sample should be representative of the population as a whole.

representative: what is true of the sample is true of the population (apart from the number of observations).

This is more likely to happen if the sampling process is free of bias.

bias: systematic favouring of certain members of the population over others.


Sources of bias: Voluntary response

[Figure: members of the population volunteering themselves: "I'm in!", "Me too!", "And me!"]

Self-selection over-represents people with strong opinions.


Some examples of self-selected samples


Mail out 100 questionnaires and await responses. Get a 62% response rate, with 3/4 of the responders in favour of the question.
- What do the non-responders think?

A study of 1236 men aged over 60 found that those who had three alcoholic drinks a day had a 60% lower risk of death than teetotallers.
- Many drinkers never made it to age 60

Survey of mothers with young children on welfare. Those who voluntarily went to job training courses were found to have an increased chance of leaving welfare.
- volunteers are more motivated anyway
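For the questionnaire example above, the uncertainty caused by the non-responders can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `favour_bounds` is hypothetical, and it assumes nothing about the non-responders beyond the two extreme cases (all against, or all in favour).

```python
from fractions import Fraction


def favour_bounds(response_rate, favour_among_responders):
    """Bounds on the share of the whole mail-out that is in favour,
    when the views of non-responders are unknown."""
    known_in_favour = response_rate * favour_among_responders
    non_response = 1 - response_rate
    lowest = known_in_favour                   # every non-responder is against
    highest = known_in_favour + non_response   # every non-responder is in favour
    return lowest, highest


# Figures from the slide: 62% response rate, 3/4 of responders in favour.
low, high = favour_bounds(Fraction(62, 100), Fraction(3, 4))
print(f"in favour: between {float(low):.1%} and {float(high):.1%}")
```

Under these assumptions, the share in favour among all 100 recipients could lie anywhere between 46.5% and 84.5%, so the 3/4 observed among responders says little on its own.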


Sources of bias: Investigator intervention


[Figure: an investigator picking individuals from the population: "This one's far too small."]

Humans are notoriously bad at picking an average group of people.


Sources of bias: Systematic sampling


e.g. sample every 50th item from a production line (quality control).

If there are periodic, seasonal and/or systematic effects, our sampling process will be biased.
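A small simulation may help show how a periodic effect can bias a systematic sample. This is a sketch under invented assumptions: the production line, its length of 10,000 items, and the rule that every 10th item is defective are all hypothetical; only the "every 50th item" sampling step comes from the example above.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical line: items at positions 0, 10, 20, ... are defective,
# e.g. because one of ten moulds is faulty.
line = [1 if i % 10 == 0 else 0 for i in range(10_000)]
true_rate = sum(line) / len(line)

# Systematic samples: every 50th item, starting at item 0 or at item 1.
from_item_0 = line[0::50]
from_item_1 = line[1::50]

# Simple random sample of the same size, for comparison.
srs = random.sample(line, len(from_item_0))

print("true defect rate:      ", true_rate)
print("systematic, start at 0:", sum(from_item_0) / len(from_item_0))
print("systematic, start at 1:", sum(from_item_1) / len(from_item_1))
print("simple random sample:  ", sum(srs) / len(srs))
```

Because 50 is a multiple of the assumed period of 10, the systematic sample sees either only defective items or none at all, depending on where it starts, while the random sample is not tied to the period.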


A solution: Simple random samples


[Figure: a population of individuals numbered 1 to 34, from which a sample is drawn at random.]

Each individual in the population has the same chance of being selected in the sample.

The simple random sample (SRS) is the basis of most analyses
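As a minimal sketch, a simple random sample can be drawn with Python's standard `random` module. The population of 34 numbered individuals follows the figure; the sample size of 8, the seed, and the number of repetitions are arbitrary choices made for illustration.

```python
import random

population = list(range(1, 35))   # individuals labelled 1..34
rng = random.Random(42)           # seeded generator; the seed is arbitrary

# Draw 8 distinct individuals without replacement.
sample = rng.sample(population, k=8)
print(sorted(sample))

# Rough check of the "same chance" property: over many repeated samples,
# each individual should be selected in about 8/34 of them.
counts = {person: 0 for person in population}
n_repeats = 20_000
for _ in range(n_repeats):
    for person in rng.sample(population, k=8):
        counts[person] += 1

frequencies = {person: c / n_repeats for person, c in counts.items()}
print(min(frequencies.values()), max(frequencies.values()), 8 / 34)
```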


Alternatives to SRS: Cluster samples


Select a street/locality at random, then several buildings from the locality

Collection is confined to localised areas, saving cost.
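The two-stage idea can be sketched as follows. The sampling frame here is made up: 20 streets of 30 buildings each, with 3 streets and 5 buildings per street selected; none of these numbers come from the slides.

```python
import random

rng = random.Random(1)  # arbitrary seed for repeatability

# Hypothetical frame: 20 streets, each with 30 buildings.
streets = {
    f"street_{s}": [f"street_{s}/building_{b}" for b in range(30)]
    for s in range(20)
}

# Stage 1: choose a few streets at random.
chosen_streets = rng.sample(sorted(streets), k=3)

# Stage 2: choose several buildings at random within each chosen street.
cluster_sample = [
    building
    for street in chosen_streets
    for building in rng.sample(streets[street], k=5)
]

print(chosen_streets)
print(cluster_sample)
```

All 15 sampled buildings sit on just 3 streets, which is what keeps the collection localised.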


Alternatives to SRS: Stratified samples


Population income strata:
Over $50,000/year: 20%
$25,000-$49,999/year: 50%
Under $25,000/year: 30%

Select 20%, 50% and 30% of your sample from the respective income categories. This can be more efficient than a simple random sample, but is more difficult to analyse.
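Proportional allocation across the three income strata might look like the sketch below. The 20%/50%/30% shares come from the table above; the population of 1,000 people, the total sample size of 50, and the helper name `stratified_sample` are assumptions made for illustration.

```python
import random

rng = random.Random(7)  # arbitrary seed for repeatability

# Population shares of each income stratum.
shares = {"over_50k": 0.20, "25k_to_50k": 0.50, "under_25k": 0.30}

# Hypothetical population of 1,000 people split according to those shares.
sizes = {"over_50k": 200, "25k_to_50k": 500, "under_25k": 300}
population = {
    stratum: [f"{stratum}_{i}" for i in range(size)]
    for stratum, size in sizes.items()
}


def stratified_sample(population, shares, n):
    """Draw a simple random sample within each stratum,
    sized in proportion to the stratum's population share."""
    result = {}
    for stratum, share in shares.items():
        k = round(n * share)
        result[stratum] = rng.sample(population[stratum], k)
    return result


sample = stratified_sample(population, shares, n=50)
for stratum, members in sample.items():
    print(stratum, len(members))
```

With a total sample of 50, this yields 10, 25 and 15 people from the three strata.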


Types of Data
Categorical
- Nominal: M/F; Group 1, 2 or 3; Red, blue, green
- Ordinal: Poor, moderate, good; Left, middle, right; Small, average, large

Quantitative
- Numerical: Number of worms; Salary of individual; Amount of toxic effluent