
Introduction to Biostatistics

Dr Asim Waris
Course Information


Quizzes and Assignments 25%

OHT 1 and 2 25%
Final 50%

 Understand the role of statistics in the scientific
process and how it is a core component of
evidence-based medicine
 Understand features, strengths and limitations of
descriptive, observational and experimental studies
 Distinguish between association and causation
 Understand roles of chance, bias and
confounding in the evaluation of research
Text Books

 Probability and Statistics by Murray R. Spiegel
 Introduction to Statistics for Biomedical Engineers
 Applied Statistics and Probability for Engineers
Statistical and medical ethics

 Misuse of patients
 1. Are the proposed procedures or diagnostic techniques safe?
 2. Is it ethical to withhold the treatment under evaluation from some patients
(namely, the controls)?
 3. Is it ethical to bring certain persons into the trial?
 4. Has informed consent been obtained from all patients?
 5. Is it ethical to offer inducements to people to participate in a trial?
 6. Is it ethical to use double-blind techniques?
 7. Is it ethical for patients to be randomly allocated to the different treatment and
control groups?
 8. How far can one go with placebos and dummy treatments? Can placebo or
sham surgery be justified?
 9. Who should make the decision about the answers to these questions? The
persons in charge of the investigation?
All members of the investigation team? Clinical colleagues? A formal ethics
committee of clinical colleagues? A formal ethics committee of non-medical
people? A formal ethics committee of medical and non-medical people?
Statistical and medical ethics

 Misuse of statistics
 1. Why is it unethical to publish results that for
statistical reasons are incorrect?
 2. Why is it unethical to present results in a
misleading way?
 3. Why should professional statistical advice be
sought at the beginning of an investigation?
Statistical and medical ethics

 World Medical Association Declaration of Helsinki.

 The World Medical Association (WMA) has
developed the Declaration of Helsinki as a
statement of ethical principles for medical research
involving human subjects, including research on
identifiable human material and data. The first
version was adopted in 1964 and has been
amended six times since, most recently at the
General Assembly in October 2008. The current
(2008) version is the only official one.
The Scientific Method




[Figure: the scientific method as a cycle around a hypothesis H; surviving labels: "Revise H", "supports H", "with H"]
Variable and Data
 The raw data of an investigation consist of observations made on individuals.
 The number of individuals is called the sample size. Any aspect of an individual that is
measured, like blood pressure, or recorded, like age or sex, is called a variable.
Variable and Data

 A first step in choosing how best to display and

analyse data is to classify the variables into their
different types.
 There are two major types of variable –
categorical variables and metric variables

 A variable is any characteristic that can be
measured on each individual unit in a statistical study
 1. Categorical variable
 2. Numerical variable
 Discrete variable
 Continuous variable

 Quantitative variables (Metric Variables) are those

variables that are measured in terms of numbers.
Some examples of quantitative variables are
height, weight, and shoe size etc.

 1. A discrete variable is a variable that has either
a finite number of possible values or
a countable number of possible values.
 2. A continuous variable is a variable that has an
infinite number of possible values that is not
countable.

 Qualitative variables (categorical variables) are those
that express a qualitative attribute such as hair color,
eye color, religion, favorite movie, gender, etc.
 1. Nominal categorical variables: Consider the variable
blood type. Let's assume for simplicity that there are
only four different blood types: O, A, B, and AB. We
can first determine the blood type of subjects and then
allocate the result to one of the four blood type
categories. The order is not important.
 2. Ordinal categorical variables: The ordering of the
categories is not arbitrary as it was with nominal
variables. It is now possible to order the categories
in a meaningful way, e.g. a stress score or the Glasgow
Coma Scale.
 It is very important to note that such scores and scales are
not real numbers, so it is not appropriate to apply
arithmetic operations to ordinal data.
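
The variable types above suggest different summaries. The following is a minimal Python sketch, not part of the original slides: all observations are made-up illustrative values, and the names (`blood_type`, `pain_level`, `children`, `height_cm`) are hypothetical. It reports the most frequent category for a nominal variable, a median category for an ordinal variable, and a mean for numerical variables.

```python
from statistics import mean, median_low, mode

# Hypothetical observations on eight patients (illustrative values only).
blood_type = ["O", "A", "O", "B", "AB", "O", "A", "O"]            # nominal
pain_level = ["mild", "severe", "moderate", "mild",
              "moderate", "mild", "severe", "moderate"]           # ordinal
children = [0, 2, 1, 3, 0, 1, 2, 1]                               # discrete
height_cm = [162.5, 170.1, 158.0, 181.3, 175.2, 168.4, 159.9, 172.6]  # continuous

# Nominal: categories have no order, so only counting makes sense.
most_common_blood_type = mode(blood_type)

# Ordinal: categories can be ranked, so a median category is meaningful,
# but averaging the ranks would treat them as real numbers.
order = ["mild", "moderate", "severe"]
ranks = [order.index(level) for level in pain_level]
median_pain = order[median_low(ranks)]

# Numerical (discrete or continuous): arithmetic is meaningful.
mean_children = mean(children)
mean_height = mean(height_cm)

print(most_common_blood_type, median_pain, mean_children, round(mean_height, 1))
```

`median_low` is used so that the result is always one of the observed ranks, which keeps the ordinal summary a valid category.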

 Determine whether the following random variables

are discrete or continuous.
 (a) The number of light bulbs that burn out in a
room of 10 light bulbs in the next year.
 (b) The number of branches on a randomly
selected Oak tree.
 (c) The length of time between calls to 911.
Dependent and Independent
 Variables are properties or characteristics of some
event, object, or person that can take on different
values or amounts (as opposed to constants such
as π that do not vary). When conducting research,
experimenters often manipulate variables.
 For example, an experimenter might compare the
effectiveness of four types of antidepressants. In
this case, the variable is “type of antidepressant.”
 In general, the independent variable is
manipulated by the experimenter and its effects
on the dependent variable are measured.
 Can blueberries slow down aging? A study indicates that antioxidants found in
blueberries may slow down the process of aging. In this study, 19-month-old
rats (equivalent to 60-year-old humans) were fed either their standard diet or a
diet supplemented by either blueberry, strawberry, or spinach powder. After
eight weeks, the rats were given memory and motor skills tests. Although all
supplemented rats showed improvement, those supplemented with blueberry
powder showed the most notable improvement.

 What is the independent variable?
dietary supplement: none, blueberry, strawberry, and spinach
 What are the dependent variables?
memory test and motor skills test
Objectives of statistics

It is helpful to distinguish between two major
categories of statistics
Descriptive statistics
Are concerned with the presentation, organization,
summarization and description of data, e.g. graphical
representation and tables
Inferential statistics
Allow us to generalize from our sample of data to a
larger group of subjects
They consist of estimation and hypothesis testing
What is the description of
 Data consist of observations made on
individuals. Normally, we collect observations on
a sample from a much larger group called the population
 Different samples from the same population will
give different results, a phenomenon called
sampling variation

 Data is plural; a single piece of data is a datum.

 The set of values collected for the variable of each
of the elements belonging to the sample
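
Sampling variation, as described above, can be illustrated with a short simulation. This is a sketch under stated assumptions rather than anything from the slides: the population (10,000 simulated systolic blood pressure values with mean 120 and standard deviation 15) and the sample size of 50 are hypothetical choices.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: systolic blood pressure (mmHg) of 10,000 people.
population = [rng.gauss(120, 15) for _ in range(10_000)]

# Three samples of the same size drawn from the same population.
sample_means = [mean(rng.sample(population, 50)) for _ in range(3)]

print("population mean:", round(mean(population), 1))
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:", round(m, 1))
```

Each sample mean lands near the population mean, but no two samples agree exactly; that spread is the sampling variation.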
Data Collection
Types of observations
Examples of observations about people are gender,
age, height, eye color, responsiveness to treatment,
life expectancy, etc.
These are called variables; they can be dependent or
independent (already discussed)
Types of variables

 It is very important to distinguish between outcome

and exposure variables, in addition to identifying
the types of each of the variables in the data set.
The outcome variable is the variable that is the
focus of our attention, whose variation or
occurrence we are seeking to understand. In
particular we are interested in identifying factors, or
exposures, that may influence the size or the
occurrence of the outcome variable.
Population vs. Sample

 Population: The set of all measurements of interest
to the investigator, e.g. average FSc marks of
NUST students
 Sample: Any subset of all measurements selected
from the population.
Example: Is the current government achieving
international standards in health care?
To estimate this, we take a sample that is
representative of the entire population

 A teacher wants to know how students in the class

did on their last test. The teacher asks the 10
students sitting in the front row to state their latest
test score. He concludes from their report that the
class did extremely well.
 What is the sample? What is the population? Can
you identify any problems with choosing the
sample in the way that the teacher did?
Principles of Sampling

 1. Determine the objectives of the study
 2. Carefully identify the population for sampling
 3. Choose the variables you will measure in the study
 4. Decide on an appropriate design for producing the data
 5. Collect the data
Principles of Sampling

 Carefully identify the population

 The population should be explicitly described in
order to obtain a sample that provides accurate
 For example: poverty, or the average money spent on
food.
Principles of Sampling

 Choose the variables you will measure in the study

 We must determine what will be measured and
how it will be measured. In order not to overlook an
important issue we should attempt to identify all
relevant variables prior to data collection.
 For example, suppose we want to study obesity in youth.
Possible variables are genes, demography, food
habits, exercise, etc.
Principles of Sampling

Decide appropriate design for producing the data

 Statistical designs fall under two categories: surveys
and experiments
 All election polls are surveys (Gallup, Harris, Roper)
 An experiment is an attempt to determine a cause-and-effect
relationship between variables.
Methods of sampling
Non-probability sampling is a sampling technique where
the samples are gathered in a process that does not give all the
individuals in the population equal chances of being selected.

 Friends, family, neighbours, acquaintances.

 Students in a class or co-workers in a workplace.
 Volunteers.
 Judgment sample.
 Quota sample - obtain a cross-section of a population, e.g. by age and
sex for individuals or by region, firm size, and industry for businesses.
This may be reasonably representative.
 The sampling distribution of a statistic cannot be obtained using any of the
above methods, so statistical inference is not possible.
 Statistical inference: “the theory, methods, and practice of forming judgements about the
parameters of a population and the reliability of statistical relationships,
typically on the basis of random sampling”
Random Sample

 A simple random sample of size n consists of n

elements chosen from the population in such a
way that all samples of that size have the same
chance of being selected.

 There are several ways to obtain a random sample,
e.g. physical mixing, or randomly selecting elements
from the population.
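
A simple random sample can be drawn in software instead of by physical mixing. The sketch below uses Python's standard `random.sample`, which samples without replacement; the sampling frame of 200 patient IDs and the sample size of 10 are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: one ID per member of the population.
population = [f"patient_{i:03d}" for i in range(1, 201)]

n = 10
# Sampling without replacement: no member can be chosen twice.
srs = rng.sample(population, n)

print(srs)
```

The frame must list every member of the population exactly once, otherwise some members have no chance (or a double chance) of selection.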
Methods of sampling
Random sampling methods – each member has an
equal probability of being selected.
 Systematic – every kth case. Equivalent to
random if patterns in list are unrelated to issues of
interest. E.g. telephone book.
 Stratified samples – sample from each stratum or
subgroup of a population. E.g. region, size of firm.
 Cluster samples – sample only certain clusters of
members of a population. E.g. city blocks, firms.
 Multistage samples – combinations of random,
systematic, stratified, and cluster sampling.
 Stratified sampling. With stratified sampling, the population is divided into groups, based on
some characteristic. Then, within each group, a probability sample (often a simple random
sample) is selected. In stratified sampling, the groups are called strata. As an example, suppose
we conduct a national survey. We might divide the population into groups or strata, based on
geography - north, east, south, and west. Then, within each stratum, we might randomly select
survey respondents.
 Cluster sampling. With cluster sampling, every member of the population is assigned to one,
and only one, group. Each group is called a cluster. A sample of clusters is chosen, using a
probability method (often simple random sampling). Only individuals within sampled clusters
are surveyed. Note the difference between cluster sampling and stratified sampling. With
stratified sampling, the sample includes elements from each stratum. With cluster sampling, in
contrast, the sample includes elements only from sampled clusters.
 Multistage sampling. With multistage sampling, we select a sample by using combinations of
different sampling methods. For example, in Stage 1, we might use cluster sampling to choose
clusters from a population. Then, in Stage 2, we might use simple random sampling to select a
subset of elements from each chosen cluster for the final sample.
 Systematic random sampling. With systematic random sampling, we create a list of every
member of the population. From the list, we randomly select the first sample element from the
first k elements on the population list. Thereafter, we select every kth element on the list. This
method is different from simple random sampling since not every possible sample of n elements
is equally likely.
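
The four designs described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecturer's code: the function names, the frame of 400 people, and the `region` and `block` fields are all hypothetical, and the two-stage function is only one possible multistage design.

```python
import random
from collections import defaultdict


def group_by(frame, key):
    """Group the units of a sampling frame by key(unit)."""
    groups = defaultdict(list)
    for unit in frame:
        groups[key(unit)].append(unit)
    return groups


def systematic_sample(frame, k, rng):
    """Random start among the first k units, then every k-th unit."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(frame, stratum_of, n_per_stratum, rng):
    """Simple random sample of n_per_stratum units from every stratum."""
    strata = group_by(frame, stratum_of)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    clusters = group_by(frame, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


def two_stage_sample(frame, cluster_of, n_clusters, n_per_cluster, rng):
    """Stage 1: random clusters. Stage 2: simple random sample within each."""
    clusters = group_by(frame, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen
            for unit in rng.sample(clusters[name], n_per_cluster)]


# Hypothetical frame: 400 people, each tagged with a region and a city block.
rng = random.Random(2024)
regions = ["north", "east", "south", "west"]
frame = [{"id": i, "region": regions[i % 4], "block": i // 20}
         for i in range(400)]

sys_sample = systematic_sample(frame, k=20, rng=rng)
strat_sample = stratified_sample(frame, lambda u: u["region"], 5, rng)
clus_sample = cluster_sample(frame, lambda u: u["block"], 3, rng)
multi_sample = two_stage_sample(frame, lambda u: u["block"], 3, 4, rng)

print(len(sys_sample), {r: len(s) for r, s in strat_sample.items()},
      len(clus_sample), len(multi_sample))
```

Note how the stratified sample contains units from every region, while the cluster sample contains units from only the three chosen blocks.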
 A research scientist is interested in studying the
experiences of twins raised together versus those
raised apart. She obtains a list of twins from the
National Twin Registry, and selects two subsets of
individuals for her study. First, she chooses all those in
the registry whose last name begins with Z. Then she
turns to all those whose last name begins with B.
Because there are so many names that start with B,
however, our researcher decides to incorporate only
every other name into her sample for B.
 What is the population? What is the sample? Was the
sample picked by simple random sampling? Is it
Sampling Terminology
 Parameter
fixed, unknown number that describes the population
 Statistic
known value calculated from a sample
a statistic is often used to estimate a parameter
 Variability
different samples from the same population may
yield different values of the sample statistic
 Sampling Distribution
tells what values a statistic takes and how often it takes
those values in repeated sampling
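
The terms above can be tied together in a small simulation. In this hedged sketch the population is hypothetical (5,000 people, 30% of whom are smokers), so the parameter is known here only because the population was constructed; in practice it would be unknown. The statistic is the sample proportion from samples of 20, and tabulating it over 2,000 repeated samples approximates its sampling distribution.

```python
import random
from collections import Counter
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = smoker, 0 = non-smoker.
population = [1] * 1500 + [0] * 3500

# Parameter: fixed number describing the population.
parameter = sum(population) / len(population)


def sample_proportion(n):
    """Statistic: proportion of smokers in one simple random sample of size n."""
    return sum(rng.sample(population, n)) / n


# Repeated sampling: the same statistic computed on 2,000 different samples.
proportions = [sample_proportion(20) for _ in range(2000)]

# Approximate sampling distribution: which values occur and how often.
distribution = Counter(proportions)
for value in sorted(distribution):
    print(f"{value:.2f}: {distribution[value]}")

print("parameter:", parameter, "mean of statistics:", round(mean(proportions), 3))
```

Individual sample proportions vary, but their distribution is centred close to the parameter, which is why a statistic can be used to estimate it.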