
Research design and methodology, statistics and data handling (MedIs7)

Biostatistics II (KVT3)

Lecturers:

Alina Zalounina (AZ)


Carsten Dahl (CD)
Dan Karbing (DK)
Department of Health Science and Technology
Aalborg University

Aim of the course


Learn to use and understand the statistical methods commonly
applied in medical research (advanced level)
- Software tool: SPSS

Introduce important aspects of research work (only for MedIs7):
- Funding and intellectual property rights
- Medical/research writing
- Planning studies

Programme
Date      Topic                                            Lecturer
Sept 2    Introductory statistics                          AZ
Sept 5    Funding academic research                        DK
Sept 9    Developing and testing a hypothesis              AZ
Sept 12   Medical writing                                  DK
Sept 16   Contingency tables                               CD
Sept 21   Parametric analysis                              DK
Sept 23   Non-parametric analysis                          CD
Sept 26   Intellectual Property Rights (IPR) and patents   DK
Sept 30   Regression analysis                              CD
Oct 5     Study design                                     AZ
Oct 7     Survival analysis                                CD
Oct 19    Meta-analysis. Evidence-based medicine           AZ

Some sessions are only for MedIs7 (see Aim of the course).

Learning material

Software: SPSS
The students are expected to bring a laptop.

SPSS should be installed before the course starts using guidelines at:
http://spss.software.aau.dk/

Examination
Written exam: 4 hours, in January
Syllabus: slides + learning material specified for each lecture
Exam questions will reflect lectures and course assignments
Some questions will require the use of software (SPSS)
Permitted aids: everything except the Internet and communication with others
More detailed exam info can be found at
http://person.hst.aau.dk/az/MedIs7

Today: Introductory statistics

Alina Zalounina
Center for Model-based
Medical Decision Support

Learning material:
Chapter 1: Data
Chapters 2-5: Descriptive Statistics
Chapters 7-8: Statistical Inference

Learning Objectives

- Identify the type of data
- Define and understand the main terms of Descriptive Statistics
- Understand the purpose of Inferential Statistics
- Outline the major measures of risk
- Outline the basic operations in SPSS

Type of data
Categorical data
- Nominal
- Ordinal
Examples: ethnicity, score, gender, marital status, type of operation, smoking status

Metric data
- Discrete: number of children
- Continuous: weight, height, temperature, age, blood pressure, time, cholesterol, body mass index

Samples and populations

Sample = collected data


Population = all possible data

Type of Statistics
Descriptive: used to organize and describe a sample
Inferential: used to extrapolate from a sample to a larger population


Descriptive Statistics. Issues for today


Frequency
Measures of Central Tendency
- Mean
- Median

Measures of Variability
- Variance
- Standard deviation
- Standard error

Descriptive Plots
- Boxplot
- Histogram
- Q-Q plot

Data distributions
- Normal
- Binomial


Frequency table
Relative frequency
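A minimal Python sketch of a frequency table, offered as an illustration alongside SPSS rather than as part of the course material. It uses the smoking variable from the Apgar example in these slides (10 of 30 mothers smoked); the function name `frequency_table` is an assumption made for this example.

```python
from collections import Counter

# Smoking status of the 30 mothers in the Apgar example from these slides:
# 10 smoked during pregnancy, 20 did not.
smoking = ["Yes"] * 10 + ["No"] * 20


def frequency_table(values):
    """Return {category: (frequency, relative frequency in percent)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(smoking)
print("Value  Freq  Percent")
for category, (count, percent) in table.items():
    print(f"{category:<7}{count:>4}{percent:>9.1f}")
```

The relative frequency is simply the frequency divided by the total number of observations, here expressed in percent.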


Measures of Central Tendency


Mean (average)

Population mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Median (middle)


Measures of Variability
Variance

Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard deviation, Standard error

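Sample SD: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Population SD: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Standard error: $se = \frac{s}{\sqrt{n}}$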
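The measures of central tendency and variability above can be sketched in Python with the standard `statistics` module. The blood pressure values are hypothetical and chosen only for illustration; the course itself uses SPSS for these calculations.

```python
import math
import statistics

# Hypothetical sample of systolic blood pressures (mmHg), illustration only.
sample = [118, 122, 125, 130, 135, 141, 160]

n = len(sample)
mean = statistics.mean(sample)               # sample mean
median = statistics.median(sample)           # middle value of the sorted data
variance = statistics.variance(sample)       # sample variance, divides by n - 1
sd = statistics.stdev(sample)                # sample SD = sqrt(sample variance)
se = sd / math.sqrt(n)                       # standard error of the mean
pop_variance = statistics.pvariance(sample)  # population variance, divides by N

print(f"mean={mean}, median={median}")
print(f"variance={variance:.2f}, sd={sd:.2f}, se={se:.2f}")
print(f"population variance={pop_variance:.2f}")
```

Note that the sample variance is always larger than the population variance computed from the same numbers, because it divides by n - 1 rather than N.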

Descriptive Plots
Boxplot


Histogram
The overall shape of the curve shows the distribution.

The histogram shows the frequency distribution across a set of
measurements as a set of physical bars.

Normal distribution
Bell-shaped

In a perfect normal frequency distribution, the mean and median are
equal. The data is continuous and symmetrically distributed around
the central point. Variability is represented by the width of the
distribution.

Normal distribution: formulae


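$X \sim N(\mu, \sigma^2) \Rightarrow f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

$P(a \le X \le b) = \int_a^b f(x)\,dx$

Note: about 95% of observations lie within 1.96 standard deviations
of the mean, i.e. between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$.

X = a continuous variable
f(x) = probability density function of X
$\mu$ = mean
$\sigma$ = standard deviation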
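The 95% rule can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist`; the values of `mu` and `sigma` are hypothetical, and `normal_pdf` is a helper written for this example to mirror the density formula above.

```python
import math
from statistics import NormalDist

mu, sigma = 120.0, 15.0  # hypothetical mean and standard deviation


def normal_pdf(x, mu, sigma):
    """Density f(x) of N(mu, sigma^2), written out as in the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


dist = NormalDist(mu, sigma)


def prob_between(a, b):
    """P(a <= X <= b), the integral of f(x) from a to b."""
    return dist.cdf(b) - dist.cdf(a)


p95 = prob_between(mu - 1.96 * sigma, mu + 1.96 * sigma)
print(f"P(mu - 1.96*sigma <= X <= mu + 1.96*sigma) = {p95:.4f}")
```

The result does not depend on the particular `mu` and `sigma` chosen, since the interval is defined in units of standard deviations.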

Check normality
Without inspecting the data it is risky to assume a normal
distribution.
A number of graphs can be used to check for deviations of the data
from the normal distribution:
- Histogram: should reveal a bell-shaped curve.
- Q-Q plot: curvature of the points indicates departures from
  normality.
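To make the Q-Q plot idea concrete, the sketch below computes the points such a plot would show: each sorted observation paired with the normal quantile expected at its rank. This is one possible construction, not necessarily the one SPSS uses; the plotting positions (i - 0.5)/n are one common convention among several, and the sample values and the function name `qq_points` are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    xs = sorted(sample)
    n = len(xs)
    # Normal distribution with the same mean and SD as the sample.
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, xs))


# Hypothetical measurements, illustration only.
points = qq_points([118, 122, 125, 130, 135, 141, 160])
for expected, observed in points:
    print(f"expected {expected:7.1f}   observed {observed}")
```

If the data were normally distributed, the pairs would fall close to the line expected = observed; systematic curvature suggests a departure from normality.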

Skew distribution

This population is skewed to the right
(i.e. it has a long right-hand tail)

Binomial distribution

o There are n identical independent trials
o Each trial can have only 2 outcomes: success or failure
o Probability p of success in each trial is constant
o Variable of interest is X = the number of successes in n trials
  (X is a binomial variable)

Binomial distribution: formula


$X \sim \mathrm{Bin}(n, p) \Rightarrow P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x}$

Note: $n! = n(n-1)(n-2)\cdots 1$

Binomial distribution: Example


The probability that a student is accepted to the Department of
Medicine is 0.3 (p = 0.3).
If 5 students from the same school apply (n = 5), what is the
probability that 2 are accepted, P(X = 2)?

$P(X = 2) = \frac{5!}{2!\,(5-2)!}\, 0.3^2 (1 - 0.3)^{5-2} = 0.31$

(Figure: probability distribution of X)
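The worked example can be reproduced with a few lines of Python. This is a sketch using only the standard library; the function name `binomial_pmf` is chosen for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): n!/(x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Values from the example: 5 applicants, acceptance probability 0.3.
p_two = binomial_pmf(2, 5, 0.3)
print(f"P(X = 2) = {p_two:.4f}")  # 0.3087, i.e. 0.31 after rounding

# Full probability distribution of X for x = 0..5; the values sum to 1.
distribution = [binomial_pmf(x, 5, 0.3) for x in range(6)]
for x, prob in enumerate(distribution):
    print(f"x = {x}: {prob:.4f}")
```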


Inferential Statistics
Can your experiment make a statement about the general population?
Two types of tests:
1. Parametric: assume that the variable in question has a known
   underlying mathematical distribution that can be described
   (normal, binomial, etc.)
2. Non-parametric: are considered distribution-free methods because
   they do not rely on any underlying mathematical distribution.


Risks and Odds. Issues for today


Risk (probability) = a measure of the chance of getting some outcome
of interest (e.g., disease) from some event (e.g., exposure to a
risk factor)

Absolute risk
Relative risk
Odds
Odds ratio

Absolute risk = the risk for a single group


                            Apgar score <7 (outcome)
Mother smoked during
pregnancy (risk factor)     Yes    No    Totals
Yes                           8     2      10
No                            3    17      20
Totals                       11    19      30

Risk (low score | smoking) = 8/10 = 80 %
Risk (low score | no smoking) = 3/20 = 15 %

Relative risk (Risk Ratio, RR) = the risk for the exposed group
compared to the risk for the non-exposed group.

Risk (low score | smoking) = p1 = 80 %
Risk (low score | no smoking) = p2 = 15 %

The risk of a low score among those who smoked compared to those
who did not smoke is
RR = p1/p2 = 80%/15% = 5.3

Interpretation of RR:
Mothers who smoked during pregnancy had more than 5 times the risk
of a low Apgar score compared with those who did not smoke.

Odds
                            Apgar score <7 (outcome)
Mother smoked during
pregnancy (risk factor)     Yes    No    Totals
Yes                           8     2      10
No                            3    17      20
Totals                       11    19      30

Odds (mothers with low score smoked) = 8/3 = 2.7
Odds (mothers with high score smoked) = 2/17 = 0.12

Odds ratio (OR)


Odds (mothers with low score smoked) = odds1 = 8/3 = 2.7
Odds (mothers with high score smoked) = odds2 = 2/17 = 0.12

The ratio between the odds is the odds ratio for smoking among
mothers with low score compared to mothers with high score:
OR = odds1/odds2 = (8/3)/(2/17) = 22.67

Interpretation of OR:
The odds of having smoked during pregnancy were more than 22 times
higher among mothers with a low Apgar score than among those with a
high Apgar score.

RR versus OR
              Exposed    Non-Exposed
Outcome          A            B
No Outcome       C            D

$RR = \frac{A(B + D)}{B(A + C)}$

$OR = \frac{AD}{BC}$

RR = 1 or OR = 1 => there is no association between the outcome and
exposure to the risk factor
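The risk measures can be computed directly from the four cells of a 2x2 table. The sketch below assumes the layout a, b = outcome in the exposed / non-exposed group and c, d = no outcome in the exposed / non-exposed group; the function name `risk_measures` is chosen for this illustration, and the numbers are those of the Apgar example.

```python
def risk_measures(a, b, c, d):
    """Risks, relative risk and odds ratio from a 2x2 table.

    a, b: outcome present in the exposed / non-exposed group
    c, d: outcome absent in the exposed / non-exposed group
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,  # = a(b+d) / (b(a+c))
        "odds_ratio": (a * d) / (b * c),
    }


# Apgar example: 8 of 10 smokers and 3 of 20 non-smokers had a low score.
measures = risk_measures(a=8, b=3, c=2, d=17)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these counts the relative risk is about 5.3 and the odds ratio about 22.67, matching the values on the slides.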


Introduction to SPSS
Example


Data view

Variable view

Smoking

LowApgarScore

Frequencies

Cross-tabulations

Risk estimate

Odds ratio: 22.667 = (17/2)/(3/8)
  odds ratio for non-smoking among mothers with high score compared
  to mothers with low score

Relative risks:
4.25 = (17/20)/(2/10)
  risk of high score among those who did not smoke compared to
  those having smoked
0.188 = (3/20)/(8/10)
  risk of low score among those who did not smoke compared to those
  having smoked

Descriptives

Box-plot


Histogram


Q-Q Plot



Exercises:
http://person.hst.aau.dk/az/MedIs7
