
BIOSTATISTICS IN DENTISTRY
By: Naghman Zuberi


INTRODUCTION
The word "statistics" comes from the Italian word statista, meaning statesman, or the German word Statistik, which means a political state.

Statistics originated from 2 main sources:

Government records

Mathematics

Early examples are the registration of heads of families in ancient Egypt and the Roman census of military strength, births, deaths, etc.

John Graunt (1620-1674) is regarded as the father of health statistics.
STATISTICS: the science of compiling, classifying and tabulating numerical data and expressing the results in a mathematical or graphical form.

BIOSTATISTICS: the branch of statistics concerned with mathematical facts and data related to biological events.
USES OF STATISTICS
To assess the state of oral health in the community and to determine the availability and utilization of dental care facilities.

To identify the basic factors underlying the state of oral health by diagnosing the community and suggesting solutions.

To determine the success or failure of specific oral health care programs, or to evaluate program activities.

To promote health legislation and to help create administrative standards.
MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a single estimate that summarizes a series of data.

Objectives:

To condense the entire mass of data.

To facilitate comparison.
PROPERTIES
Should be easy to understand and compute.

Should be based on each and every item in the series.

Should not be affected by extreme observations.

Should be capable of further statistical computations.

Should have sampling stability.

The most common measures of central tendency used in the dental sciences are:

Arithmetic mean - a mathematical estimate.

Median - a positional estimate.

Mode - based on frequency.
ARITHMETIC MEAN
Simplest measure of central tendency.

Ungrouped data:

Mean = (Sum of all the observations in the data) / (Number of observations in the data)

Grouped data:

Mean = (Sum of each value multiplied by its corresponding frequency) / (Total frequency)

A short computational sketch of both formulas follows below.
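The two formulas above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the scores and frequencies are invented purely for illustration.

```python
# Minimal sketch of the two mean formulas above; data are invented.

def mean_ungrouped(values):
    """Mean = sum of observations / number of observations."""
    return sum(values) / len(values)

def mean_grouped(values, frequencies):
    """Mean = sum of (value x frequency) / total frequency."""
    total_freq = sum(frequencies)
    return sum(v * f for v, f in zip(values, frequencies)) / total_freq

scores = [2, 3, 5, 4, 1, 3]                 # ungrouped observations
print(mean_ungrouped(scores))               # 3.0

dmft_values = [0, 1, 2, 3, 4]               # distinct values
dmft_freq = [5, 8, 12, 6, 4]                # how often each value occurs
print(mean_grouped(dmft_values, dmft_freq)) # about 1.89
```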
MEDIAN

The middle value in a distribution, such that one half of the units have a value smaller than or equal to the median and one half have a value greater than or equal to it.

All the observations are arranged in order of magnitude.

The middle value is selected as the median.

Odd number of observations: the median is the value at position (n+1)/2.

Even number of observations: the mean of the middle two values is taken as the median.
MODE

The mode, or modal value, is the value in a series of observations that occurs with the greatest frequency.

When the mode is ill defined, it can be estimated using the relation:

Mode = 3 x Median - 2 x Mean

(A short sketch computing the median and mode follows below.)
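As a check on the definitions above, here is a minimal Python sketch using the standard library's statistics module; the scores are invented for illustration.

```python
# Minimal sketch: median and mode for a small sample, plus the empirical
# relation Mode = 3*Median - 2*Mean. Data are invented.
from statistics import mean, median, mode

scores = [2, 3, 3, 4, 5, 3, 1, 4, 3]

m = mean(scores)          # arithmetic mean
md = median(scores)       # positional estimate
mo = mode(scores)         # most frequent value

print(md, mo)             # 3, 3
print(3 * md - 2 * m)     # empirical estimate of the mode
```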
Most commonly used: the arithmetic mean.

When there are extreme values in the series: the median.

To know which value occurs most often, and therefore carries the greatest weight in the series: the mode.
MEASURES OF DISPERSION

Dispersion is the degree of spread or variation of a variable about a central value.

Measures of dispersion are used:

To determine the reliability of an average.

To serve as a basis for control of variability.

To compare two or more series in relation to their variability.

To facilitate further statistical analysis.
RANGE

The simplest measure of dispersion, defined as the difference between the value of the largest item and the value of the smallest item.

This measure gives no information about the values that lie between the extreme values.

It is subject to fluctuations from sample to sample.
MEAN DEVIATION

The average of the absolute deviations of the observations from the arithmetic mean.

M.D. = Σ |Xi - X̄| / n

where Σ (sigma) means "the sum of", X̄ is the arithmetic mean, Xi is the value of each observation in the data, and n is the number of observations in the data. (A short sketch follows below.)
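Here is a minimal Python sketch of the formula above, not part of the original slides; the observations are invented for illustration.

```python
# Minimal sketch of the mean deviation formula; data are invented.

def mean_deviation(xs):
    """M.D. = sum of |Xi - mean| / n."""
    x_bar = sum(xs) / len(xs)
    return sum(abs(x - x_bar) for x in xs) / len(xs)

obs = [4, 7, 5, 9, 5]
print(mean_deviation(obs))   # mean is 6.0, M.D. = 1.6
```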
STANDARD DEVIATION (SD)

The most important and widely used measure of dispersion.

Also known as the root mean square deviation, because it is the square root of the mean of the squared deviations from the arithmetic mean.

The greater the standard deviation, the greater the magnitude of dispersion from the mean.

A small SD means a higher degree of uniformity of the observations.
STANDARD DEVIATION
The smaller the standard deviation, the higher the quality of the measuring instrument and your technique.

A small SD also indicates that the data points are fairly close together, with a small value for the range.

It indicates that the measurements were made with good precision.
A high or large standard deviation:

Indicates that the values or measurements are not similar.

There is a high value for the range.

Indicates a low level of precision (the measurements were not close to one another).

The standard deviation will be 0 if all the values or measurements are the same.
FORMULA FOR STANDARD DEVIATION

A quick approximation to the standard deviation is:

SD ≈ range / √N = (highest value - lowest value) / √N

N = number of measured values

As N gets larger (the more samples, measurements, scores, etc.), the reliability of this approximation increases.
Example: 22.5 mL, 18.3 mL, 20.0 mL, 10.6 mL

The standard deviation by this approximation would be:

SD ≈ range / √N = (highest value - lowest value) / √N

Range = 22.5 mL - 10.6 mL = 11.9 mL

N = 4, so SD ≈ 11.9 mL / √4 = 5.95 mL ≈ 6.0 mL

Result: 17.9 ± 6.0 mL (the SD is expressed to the same level of precision as the mean). A sketch comparing this estimate with the exact sample SD follows below.
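As a cross-check, here is a minimal Python sketch, not part of the original slides, comparing the range approximation with the exact sample standard deviation for the four volumes above.

```python
# Minimal sketch: range approximation versus exact sample SD for the
# four volumes from the worked example.
from statistics import mean, stdev

volumes = [22.5, 18.3, 20.0, 10.6]           # mL

approx_sd = (max(volumes) - min(volumes)) / len(volumes) ** 0.5
print(round(mean(volumes), 2), round(approx_sd, 2))   # 17.85, 5.95

print(round(stdev(volumes), 2))   # exact sample SD (n-1 denominator), about 5.13
```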
COEFFICIENT OF VARIATION (C.V.)

A relative measure of dispersion.

Used to compare two or more series of data that have either different units of measurement or a marked difference in their means.

C.V. = (S x 100) / X̄

where C.V. is the coefficient of variation, S is the standard deviation and X̄ is the mean.

The higher the C.V., the greater the variation in the series of data. (A short sketch follows below.)
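A minimal Python sketch of the formula above, not from the original slides; the two series and their values are invented to show a comparison across different units.

```python
# Minimal sketch: comparing variability of two series with different units
# via the coefficient of variation. Data are invented.
from statistics import mean, stdev

def cv(xs):
    """C.V. = (S * 100) / mean."""
    return stdev(xs) * 100 / mean(xs)

plaque_scores = [1.2, 1.5, 1.1, 1.8, 1.4]   # index units
probing_depths = [3.0, 4.5, 2.5, 5.0, 3.5]  # mm

print(round(cv(plaque_scores), 1))   # percent variation in plaque scores
print(round(cv(probing_depths), 1))  # percent variation in probing depths
```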
NORMAL DISTRIBUTION CURVE

Also called the Gaussian curve.

Half of the observations lie above and half below the mean.

This pattern is known as the normal or Gaussian distribution.
PROPERTIES
Bell shaped.

Symmetrical about the midpoint.

The total area under the curve is 1; for the standard normal curve the mean is 0 and the standard deviation is 1.

The height of the curve is maximum at the mean, and all three measures of central tendency coincide there.

The maximum number of observations is at the value of the variable corresponding to the mean; the number of observations gradually decreases on either side, with few observations at the extreme points.
The area under the curve between any two points can be found in terms of the relationship between the mean and the standard deviation, as follows (see the sketch below):

Mean ± 1 SD covers 68.3% of the observations

Mean ± 2 SD covers 95.4% of the observations

Mean ± 3 SD covers 99.7% of the observations

These limits on either side of the mean are called confidence limits.

This forms the basis for various tests of significance.
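The 68.3 / 95.4 / 99.7 rule above can be checked empirically. This is a minimal sketch, not from the original slides; it assumes numpy is installed, and the mean and SD used for the simulation are invented.

```python
# Minimal sketch: checking the 68.3 / 95.4 / 99.7 rule on simulated normal data.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=100_000)   # invented mean and SD

mean, sd = x.mean(), x.std()
for k in (1, 2, 3):
    within = np.mean(np.abs(x - mean) <= k * sd) * 100
    print(f"mean ± {k} SD covers about {within:.1f}% of observations")
```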
TESTS OF SIGNIFICANCE

Different samples drawn from the same population give estimates that differ - this is sampling variability.

To know whether the differences between the estimates from different samples are due to sampling variation or not, tests of significance are used.

Two hypotheses are involved:

Null hypothesis

Alternative hypothesis
NULL HYPOTHESIS

There is no real difference between the sample(s) and the population in the particular matter under consideration, and any difference found is accidental, arising out of sampling variation.
ALTERNATIVE HYPOTHESIS

The hypothesis adopted when the null hypothesis is rejected.

It states that there is a real difference between the two groups being compared.
LEVEL OF SIGNIFICANCE

After setting up a hypothesis, the null hypothesis must be either rejected or accepted.

This decision is fixed in terms of a probability level (p), called the level of significance.

A small p value means the observed differences in the estimates cannot be attributed to sampling variation, and the null hypothesis is rejected.
STANDARD ERROR

The standard deviation of a statistic such as the mean, a proportion, etc.

For a proportion it is calculated by the relation (see the sketch below):

Standard error of a proportion = √(p x q / n)

where p is the proportion of occurrence of an event in the sample, q is (1 - p), and n is the sample size.
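A minimal Python sketch of the relation above, not from the original slides; the survey proportion and sample size are invented for illustration.

```python
# Minimal sketch of the standard error of a proportion, SE = sqrt(p*q/n).
import math

def se_proportion(p, n):
    """Standard error of a sample proportion."""
    q = 1 - p
    return math.sqrt(p * q / n)

p = 0.40   # e.g., proportion of children with caries in a sample (invented)
n = 200    # sample size (invented)
print(round(se_proportion(p, n), 4))   # about 0.0346
```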
TESTING A HYPOTHESIS

The decision is based on the evidence gathered from the sample.

Two types of error are possible while accepting or rejecting a null hypothesis:

Null hypothesis | Accepted       | Rejected
True            | Right decision | Type I error
False           | Type II error  | Right decision
STEPS IN TESTING A HYPOTHESIS

State an appropriate null hypothesis for the problem.

Calculate the suitable test statistic.

Determine the degrees of freedom for the statistic.

Find the p value.

The null hypothesis is rejected if the p value is less than 0.05; otherwise it is accepted. (A worked sketch of these steps follows below.)
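The steps above can be illustrated with a one-sample t test. This is a minimal sketch, not from the original slides; it assumes scipy is installed, and the DMFT scores and the hypothesised mean are invented.

```python
# Minimal sketch of the hypothesis-testing steps, using a one-sample t test.
from scipy import stats

# Null hypothesis: the mean DMFT score in this sample equals 3.0 (invented).
dmft = [2.5, 3.8, 4.1, 2.9, 3.5, 4.4, 3.2, 2.7]

t_stat, p_value = stats.ttest_1samp(dmft, popmean=3.0)
df = len(dmft) - 1   # degrees of freedom for a one-sample t test

print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```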
TYPES OF TESTS

PARAMETRIC:
i. Student's t test.
ii. One-way ANOVA.
iii. Two-way ANOVA.
iv. Correlation coefficient.
v. Regression analysis.

NON-PARAMETRIC:
i. Wilcoxon signed rank test.
ii. Wilcoxon rank sum test.
iii. Kruskal-Wallis one-way ANOVA.
iv. Friedman two-way ANOVA.
v. Spearman's rank correlation.
vi. Chi-square test.
CHI-SQUARE (χ²) TEST

Developed by Karl Pearson.

An alternative method of testing the significance of the difference between two proportions.

The data are measured in terms of attributes or qualities.

Advantage: it can also be used when more than two groups are to be compared.
CALCULATION OF THE χ² STATISTIC

χ² = Σ (O - E)² / E

where O = observed frequency and E = expected frequency. (A worked sketch follows below.)

Finding the degrees of freedom (d.f.): this depends on the number of columns and rows in the original table.

d.f. = (columns - 1) x (rows - 1)

If the degrees of freedom is 1, the χ² value for a probability of 0.05 is 3.84.
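A minimal Python sketch of the calculation above, not from the original slides; the 2x2 table and its labels are invented for illustration.

```python
# Minimal sketch: chi-square for a hypothetical 2x2 table
# (e.g., caries present/absent vs fluoridated/non-fluoridated area).
observed = [[30, 70],    # rows: area; columns: caries yes / no (invented counts)
            [50, 50]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
grand = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand   # expected frequency
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df)   # compare with 3.84 for df = 1 at p = 0.05
```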
CHI-SQUARE WITH YATES CORRECTION

It is required to compensate for the use of discrete data in the continuous chi-square distribution, for tables with only 1 d.f.

It reduces the absolute magnitude of each difference (O - E) by half before squaring:

χ² = Σ (|O - E| - 0.5)² / E

This reduces chi-square and thus corrects the p value (i.e., the significance of the result).

It is required when the chi-square value is on the borderline of significance.
LIMITATIONS

The test will not give a reliable result if the expected frequency in any one cell is less than 5.

In such cases, Yates' correction is necessary, i.e., reduction of (O - E) by half:

χ² = Σ (|O - E| - 0.5)² / E

The test tells the presence or absence of an association between the two variables, but it does not measure the strength of the association.

It does not indicate cause and effect; it only tells the probability that the association occurred by chance. (A Yates-corrected sketch follows below.)
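A minimal Python sketch of the Yates-corrected formula above, not from the original slides; the 2x2 counts are invented for illustration.

```python
# Minimal sketch: Yates-corrected chi-square for a 2x2 table.

def chi_square_yates(observed):
    """Chi-square with Yates continuity correction for a 2x2 table."""
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    grand = sum(row_totals)
    chi_sq = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand
            chi_sq += (abs(o - e) - 0.5) ** 2 / e
    return chi_sq

table = [[12, 8],
         [5, 15]]
print(round(chi_square_yates(table), 2))   # compare with 3.84 (df = 1, p = 0.05)
```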
STUDENT'S T TEST

Used to test a hypothesis when the sample size is small.

The test was designed by W.S. Gosset, whose pen name was "Student".

It is applied to find the significance of the difference between two means, as:

Unpaired t test.

Paired t test.

Criteria:

The sample must be randomly selected.

The data must be quantitative.

The variable is assumed to follow a normal distribution in the population.

The sample size should be less than 30. (An unpaired example follows below.)
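Here is a minimal Python sketch of an unpaired (independent samples) t test, not from the original slides; it assumes scipy is installed, and the two groups and their plaque scores are invented.

```python
# Minimal sketch: unpaired t test comparing mean plaque scores in two groups.
from scipy import stats

group_a = [2.1, 2.5, 1.9, 2.8, 2.3, 2.0]   # e.g., after mouthwash A (invented)
group_b = [2.9, 3.1, 2.7, 3.4, 2.8, 3.0]   # e.g., after mouthwash B (invented)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")   # p < 0.05 suggests a real difference in means
```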
PAIRED T TEST

Used when each individual gives a pair of observations.

Tests for a difference in the paired values.

The test procedure is as follows:

State the null hypothesis.

Calculate the difference for each pair of observations: d = X1 - X2.

Calculate the mean of the differences, D̄ = Σd / n, where n is the number of pairs.

Calculate the standard deviation of the differences and the standard error of the mean difference.
Calculate the test statistic t from: t = D̄ / (SD / √n)

Find the degrees of freedom (d.f.) = n - 1.

Compare the calculated t value with the table value for (n - 1) d.f. to find the p value.

If the calculated t value is higher than the table value at the 5% level, the mean difference is significant, and vice versa. (A sketch of this procedure follows below.)
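A minimal Python sketch of the paired t test steps above, computed by hand rather than with a library routine; not from the original slides, and the before/after pocket depths are invented.

```python
# Minimal sketch of the paired t test procedure, step by step.
from statistics import mean, stdev
import math

before = [5.0, 6.0, 4.5, 5.5, 6.5, 5.0]   # e.g., pocket depth before treatment, mm (invented)
after  = [4.0, 5.5, 4.0, 4.5, 5.0, 4.5]   # after treatment (invented)

d = [x1 - x2 for x1, x2 in zip(before, after)]   # d = X1 - X2 for each pair
n = len(d)
d_bar = mean(d)                                   # mean of the differences
sd = stdev(d)                                     # SD of the differences

t = d_bar / (sd / math.sqrt(n))                   # t = D-bar / (SD / sqrt(n))
df = n - 1
print(f"t = {t:.2f}, df = {df}")   # compare with the table value for df at the 5% level
```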
ANALYSIS OF VARIANCE (ANOVA) TEST

Used when data from three or more groups are being investigated.

It is a method of partitioning variance into parts (between groups and within groups) so as to yield independent estimates of the population variance.

This is tested with the F distribution: the distribution followed by the ratio of two independent sample estimates of a population variance.

F = S1² / S2². The shape of the distribution depends on the degrees of freedom associated with S1² and S2².
One-way ANOVA: used if the subgroups to be compared are defined by just one factor.

Two-way ANOVA: used if the subgroups are based on two factors. (A one-way example follows below.)
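A minimal Python sketch of a one-way ANOVA, not from the original slides; it assumes scipy is installed, and the three groups of DMFT scores are invented.

```python
# Minimal sketch: one-way ANOVA comparing mean scores across three groups.
from scipy import stats

group1 = [2.0, 2.5, 3.0, 2.8, 2.2]
group2 = [3.5, 3.8, 4.0, 3.2, 3.6]
group3 = [2.9, 3.1, 2.7, 3.3, 3.0]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")   # p < 0.05: at least one group mean differs
```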
MISCELLANEOUS
Fisher's exact test:

A test for the presence of an association between categorical variables.

Used when the numbers involved are too small to permit the use of a chi-square test.

Friedman's test:

A non-parametric equivalent of the analysis of variance.

Permits the analysis of an un-replicated randomized block design.
Kruskal-Wallis test:
A non-parametric test.
Used to compare the medians of several independent samples.
It is the non-parametric equivalent of the one-way ANOVA.

Mann-Whitney U test:
A non-parametric test.
Used to compare the medians of two independent samples.

McNemar's test:
A variant of the chi-square test, used when the data are paired. (Sketches of the two rank tests follow below.)
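A minimal Python sketch of the two rank-based tests above, not from the original slides; it assumes scipy is installed, and the gingival index scores are invented.

```python
# Minimal sketch: Mann-Whitney U and Kruskal-Wallis tests on invented data.
from scipy import stats

a = [1.2, 1.5, 1.1, 1.8, 1.4]
b = [2.0, 2.3, 1.9, 2.5, 2.1]
c = [1.6, 1.7, 1.5, 1.9, 1.8]

# Mann-Whitney U test: two independent samples
u_stat, p_mw = stats.mannwhitneyu(a, b)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_mw:.4f}")

# Kruskal-Wallis test: three or more independent samples
h_stat, p_kw = stats.kruskal(a, b, c)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.4f}")
```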
Tukey's multiple comparison test:

A test used as a sequel to a significant analysis of variance, to determine which of several groups are actually significantly different from one another (see the sketch below).

It has built-in protection against an increased risk of a type I error.

Type I error: being misled by the sample evidence into rejecting the null hypothesis when it is in fact true.

Type II error: being misled by the sample evidence into failing to reject the null hypothesis when it is in fact false.
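A minimal Python sketch of Tukey's test as a follow-up to a significant ANOVA, not from the original slides; it assumes a recent SciPy (1.8 or later, which provides tukey_hsd), and the three groups are invented.

```python
# Minimal sketch: Tukey's HSD after a significant one-way ANOVA.
from scipy.stats import f_oneway, tukey_hsd

g1 = [2.0, 2.5, 3.0, 2.8, 2.2]
g2 = [3.5, 3.8, 4.0, 3.2, 3.6]
g3 = [2.9, 3.1, 2.7, 3.3, 3.0]

f_stat, p_value = f_oneway(g1, g2, g3)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    # Pairwise comparisons with protection against an inflated type I error
    print(tukey_hsd(g1, g2, g3))
```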
