
About me

Not a statistician! I work in the area of chemometrics, so the emphasis will be on practical knowledge and application of statistics rather than on theorems and proofs.

Why you should be here

- You have an interest in statistical reasoning.
- You have a desire to learn to use statistics properly in experimental design and data analysis.
- You want to develop your ability to critically assess scientific (or pseudoscientific) arguments.

What is expected of you

- Attendance at most lectures.
- Feedback to me on what you like and dislike about the course, and especially on how it might be improved.

Course objectives

- Understand the fundamental principles of statistical inference.
- Understand the general principles underlying the most common tests.
- Know the assumptions of common tests and understand the impact of violating them.
- Be able to perform standard statistical analyses.

Some opinions of statistics

If your experiment needs statistics, you should have done a better experiment.

Ernest Rutherford


There are three kinds of lies: lies, damned lies, and statistics!

Benjamin Disraeli (attributed)


To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.

Sir Ronald Fisher


The purpose of models is not to fit the data, but to sharpen the questions.

Samuel Karlin

What statistics can and can't do

Can:
- provide objective criteria for evaluating hypotheses
- synthesize information (not without information loss: keep your raw data!)
- help detect patterns in messy data
- help optimize effort
- help you critically evaluate arguments

Can't:
- tell the truth (probabilistic conclusions only!)
- compensate for poor design
- indicate biological significance: statistical significance does not mean biological significance, nor vice versa!

The four ages of a biostatistician

Stone age
- Defining characteristics: total ignorance
- Comment: Ignorance is not bliss!

Bronze age
- Defining characteristics: nodding familiarity, but understanding purely superficial
- Comment: Statistics is a (small) sidebar to scientific investigation (see Rutherford, Ernest)

Third age
- Defining characteristics: moderate familiarity coupled with a strong desire to demonstrate same; statistical reach exceeds grasp
- Comment: Overwhelming concern with statistical minutiae; the scientific forest is often obscured by statistical trees

Fourth age
- Defining characteristics: knows when statistical issues are (and are not) important; recognizes limitations (of self and of statistical science)
- Comment: That to which we can/should all aspire



Getting Started

Statistics: the collection, organization, analysis and interpretation of data to answer questions and make decisions.


Populations and samples: Study a part in order to gain information about the whole
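To make "study a part to learn about the whole" concrete, here is a minimal Python sketch under made-up assumptions: a hypothetical population of 10,000 normally distributed values (mean 50, SD 10), from which we draw a random sample of 100 and use its mean to estimate the population mean. The numbers are synthetic, not course data, and the helper names are illustrative only.

```python
import random
import statistics

def make_population(size: int, seed: int = 1) -> list[float]:
    """Hypothetical population: normally distributed values (mean 50, SD 10)."""
    rng = random.Random(seed)
    return [rng.gauss(50, 10) for _ in range(size)]

def estimate_mean(population: list[float], n: int, seed: int = 2) -> float:
    """Study a part (a random sample of n) to estimate the whole (the population mean)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # draws without replacement
    return statistics.mean(sample)

population = make_population(10_000)
true_mean = statistics.mean(population)
estimate = estimate_mean(population, 100)
print(f"population mean {true_mean:.2f}, sample estimate {estimate:.2f}")
```

The estimate will not match the population mean exactly; the point is that a sample of 100 gets close without measuring all 10,000 members.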




An Ideal Sample

Ideally, a sample should be representative of the population as a whole. This is more likely to happen if the sampling process is free of bias.

representative: what is true of the sample is true of the population (apart from the number of observations)
bias: systematic favouring of certain members of the population over others


Sources of bias: Voluntary response

"I'm in!" "Me too!" "And me!"

Self-selection over-represents people with strong opinions.


Some examples of self-selected samples

Mail out 100 questionnaires and await responses. You get a 62% response rate, and 3/4 of the respondents are in favour.
- What do the non-responders think?

A study of 1,236 men aged over 60 found that those who had three alcoholic drinks a day had a 60% lower risk of death than teetotallers.
- Many heavy drinkers never made it to age 60, so those who did are a self-selected group.

A survey of mothers with young children on welfare found that those who voluntarily attended job training courses had an increased chance of leaving welfare.
- Volunteers are more motivated anyway.
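A short worked version of the questionnaire example, as a minimal Python sketch: we only know the views of the 62 responders, so the share of all 100 recipients who are in favour can lie anywhere between two extremes, one where no non-responder is in favour and one where every non-responder is. The function name is illustrative, not from any library.

```python
def support_bounds(n_sent: int, response_rate: float,
                   favour_among_responders: float) -> tuple[float, float]:
    """Range of the overall proportion in favour when non-responders' views are unknown."""
    responders = n_sent * response_rate
    in_favour = responders * favour_among_responders
    non_responders = n_sent - responders
    lowest = in_favour / n_sent                      # no non-responder is in favour
    highest = (in_favour + non_responders) / n_sent  # every non-responder is in favour
    return lowest, highest

low, high = support_bounds(100, 0.62, 0.75)
print(f"overall support lies between {low:.1%} and {high:.1%}")
```

With 46.5 of the 100 recipients in favour among responders and 38 unheard from, the data are consistent with anything from a minority (46.5%) to a strong majority (84.5%); the 75% figure describes only the self-selected responders.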


Sources of bias: Investigator intervention

This one's far too small


Humans are notoriously bad at picking an average group of people.


Sources of bias: Systematic sampling

For example, sample every 50th item from a production line for quality control.

If there are periodic, seasonal and/or systematic effects (for example, a fault that recurs at the same interval as the sampling), our sampling process will be biased.
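A minimal Python simulation of how a periodic effect can bias systematic sampling, assuming a hypothetical production line on which a machine fault makes every 50th item defective, a period that happens to match the sampling interval. The true defect rate is 2%, but sampling every 50th item sees either all defects or none depending on where you start; a simple random sample has no such alignment. All names and numbers here are invented for illustration.

```python
import random

def production_line(n_items: int, period: int = 50) -> list[bool]:
    """Hypothetical line: item i is defective whenever i % period == 0."""
    return [i % period == 0 for i in range(n_items)]

def systematic_sample(items: list[bool], step: int, start: int) -> list[bool]:
    """Take every `step`-th item, beginning at index `start`."""
    return items[start::step]

def defect_rate(items: list[bool]) -> float:
    """Proportion of defective items (True counts as 1)."""
    return sum(items) / len(items)

line = production_line(5_000)
print("true defect rate:", defect_rate(line))                                   # 0.02
print("every 50th from item 0:", defect_rate(systematic_sample(line, 50, 0)))   # 1.0
print("every 50th from item 7:", defect_rate(systematic_sample(line, 50, 7)))   # 0.0
srs = random.Random(3).sample(line, 100)
print("simple random sample of 100:", defect_rate(srs))
```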


A solution: Simple random samples



Each individual in the population has the same chance of being selected in the sample.

The simple random sample (SRS) is the basis of most analyses
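A minimal Python sketch of drawing an SRS with the standard library's `random.Random.sample`, which selects without replacement so every member has the same chance of being chosen. The population of 34 numbered individuals is hypothetical. The second half checks the "same chance" property empirically: over many repeated samples of 5, each individual should be included in roughly 5/34 (about 14.7%) of them.

```python
import random
from collections import Counter

def simple_random_sample(population: list[int], n: int, rng: random.Random) -> list[int]:
    """Draw n members without replacement; every member is equally likely to be chosen."""
    return rng.sample(population, n)

population = list(range(1, 35))  # hypothetical individuals numbered 1..34
rng = random.Random(42)
first = simple_random_sample(population, 5, rng)
print(sorted(first))

# Repeat the draw many times and record how often each individual is included.
counts = Counter()
trials = 20_000
for _ in range(trials):
    counts.update(simple_random_sample(population, 5, rng))
rates = [counts[i] / trials for i in population]
print(f"inclusion rates range from {min(rates):.3f} to {max(rates):.3f}")
```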


Alternatives to SRS: Cluster samples

Select a street/locality at random, then several buildings from the locality

Collection is confined to localised areas, saving cost.
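A minimal Python sketch of a two-stage cluster sample, assuming a hypothetical town of 20 streets with 30 buildings each: choose a few streets at random, then a few buildings at random within each chosen street. Only the chosen streets need to be visited, which is where the cost saving comes from. The names `cluster_sample`, `town` and the street/building labels are illustrative.

```python
import random

def cluster_sample(clusters: dict[str, list[str]], n_clusters: int,
                   n_per_cluster: int, rng: random.Random) -> list[str]:
    """Two-stage sample: choose clusters at random, then units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: streets
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))  # stage 2: buildings
    return sample

# Hypothetical town: 20 streets, 30 buildings per street.
town = {f"street{s}": [f"street{s}-bldg{b}" for b in range(30)] for s in range(20)}
sample = cluster_sample(town, n_clusters=3, n_per_cluster=5, rng=random.Random(7))
streets_visited = {unit.split("-")[0] for unit in sample}
print(len(sample), "buildings from", len(streets_visited), "streets")  # 15 buildings from 3 streets
```

The trade-off is that buildings on the same street tend to resemble each other, so a cluster sample usually carries less information than an SRS of the same size.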


Alternatives to SRS: Stratified samples

Population income strata:
- Over $50,000/year: 20%
- $25,000 to $50,000/year: 50%
- Under $25,000/year: 30%

Select 20%, 50% and 30% of your sample from the respective income categories. This can be more efficient than simple random sampling, but is more difficult to analyse.
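A minimal Python sketch of stratified sampling with proportional allocation, assuming a hypothetical population of 1,000 people (identified by number) split across the three income strata in the proportions 20%, 50% and 30%. Each stratum is sampled separately by simple random sampling, with its sample size fixed in advance. The function and variable names are illustrative.

```python
import random

def stratified_sample(strata: dict[str, list[int]], shares: dict[str, float],
                      n: int, rng: random.Random) -> dict[str, list[int]]:
    """Proportional allocation: draw round(n * share) members from each stratum by SRS."""
    return {name: rng.sample(members, round(n * shares[name]))
            for name, members in strata.items()}

# Hypothetical population of 1,000 people in three income strata.
strata = {
    "over $50,000": list(range(0, 200)),
    "$25,000 to $50,000": list(range(200, 700)),
    "under $25,000": list(range(700, 1000)),
}
shares = {"over $50,000": 0.20, "$25,000 to $50,000": 0.50, "under $25,000": 0.30}
sample = stratified_sample(strata, shares, n=100, rng=random.Random(11))
for name, members in sample.items():
    print(f"{name}: {len(members)}")
```

Because each stratum's sample size is fixed rather than left to chance, no income group can be accidentally under- or over-represented, which is one reason stratification can beat an SRS of the same size.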


Types of Data
Categorical
- Nominal: M/F; Group 1, 2 or 3; red, blue, green
- Ordinal: poor, moderate, good; left, middle, right; small, average, large

Numerical
- Number of worms; salary of an individual; amount of toxic effluent