
# DATA COLLECTION AND UNIVARIATE DISTRIBUTIONS

ENGSTAT
Statistics
A branch of mathematics dealing with the collection, presentation, analysis, and interpretation of masses of numerical data to make decisions, solve problems, and design products and processes.
Types
- Descriptive: describes the main features of a collection of data quantitatively
- Inferential:
  - infers from the sample data what the population as a whole is like
  - judges the probability that an observed difference between groups is a dependable one rather than one that might have happened by chance
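Statistics
[Diagram: a population is described by parameters and a sample by statistics; statistical inference uses the sample statistic to draw conclusions about the population parameter.]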
Statistics
Types of Data

QUANTITATIVE (Numeric)

| Continuous | Discrete |
|---|---|
| Blood pressure | Number of children |
| Height | Number of asthma attacks per week |
| Weight | |
| Age | |

QUALITATIVE (Categorical)

| Ordinal (ordered categories) | Nominal (unordered categories) |
|---|---|
| Stage of cancer | Gender (Male/Female) |
| Better, Same, Worse | Alive or Dead |
| Disagree, Neutral, Agree | Blood type (O, A, B, AB) |
Data Presentation

| To Show | Use | Data Needed |
|---|---|---|
| Frequency of occurrence | Bar chart, pie chart | Tallies by category |
| Trends over time | Line graph, run chart, control chart | Measurements taken in chronological order |
| Distribution: variation not related to time | Histograms | Forty or more measurements (not necessarily in chronological order, variable data) |
| Association: looking for a correlation between two things | Scatter diagram | Forty or more paired measurements (measures of both things of interest, variable data) |
Tools for Describing Data: Graphical Methods
Graphing Categorical Data
- Bar chart: useful for summarizing and displaying patterns
- Pie chart
- Pareto chart: bar graph with the height of the bars proportional to the contribution of each source; bars are ordered from tallest to shortest

Graphing Numerical Data
- Dot plot
- Histogram
- Cumulative frequency chart
- Box plot
Tools for Describing Data: Bar Chart

Bar charts are used for qualitative variables.

The variable is plotted along the x-axis, and the height of each bar equals the frequency (or percentage) of that category.
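
## Distribution of Patients with IBD

| Disease | Number of patients |
|---|---|
| Crohn's Disease | 88 |
| Ulcerative Colitis | 48 |
| Non-specific Chronic Inflammatory Disease | 384 |

[Figure: bar chart of the number of patients by disease.]

## Distribution of Patients by Disease Subtype

[Figure: pie chart. Crohn's Disease 17%, Ulcerative Colitis 9%, Non-specific Chronic Inflammatory Disease 74%.]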
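The percentages in the pie chart follow directly from the counts in the bar chart. A minimal sketch of that arithmetic, assuming the count of 384 belongs to the non-specific chronic inflammatory disease category (the label is inferred from the pie chart legend):

```python
# Patient counts read from the bar chart; the label attached to 384 is an
# assumption based on the pie chart legend.
counts = {
    "Crohn's Disease": 88,
    "Ulcerative Colitis": 48,
    "Non-specific Chronic Inflammatory Disease": 384,
}

total = sum(counts.values())

# Share of each category, rounded to the nearest whole percent.
percentages = {name: round(100 * n / total) for name, n in counts.items()}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Rounding to whole percents reproduces the 17%, 9%, and 74% labels.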

## Distribution of Patients with IBD

| Disease | A1 (< 16) Male | A1 (< 16) Female | A2 (17-40) Male | A2 (17-40) Female | A3 (> 40) Male | A3 (> 40) Female |
|---|---|---|---|---|---|---|
| Crohn's Disease | 8 | 2 | 37 | 14 | 22 | 5 |
| Ulcerative Colitis | 0 | 2 | 15 | 8 | 13 | 10 |
| Non-specific Chronic Inflammatory Disease | 6 | 3 | 136 | 92 | 85 | 65 |

[Figure: clustered bar chart of the number of patients by disease, age group, and sex.]

[Figure: stacked bar chart of the same distribution of patients with IBD, by age group and sex.]

## Characteristics of CD Patients According to Montreal Classification

| Location | A1 (< 16) Male | A1 (< 16) Female | A2 (17-40) Male | A2 (17-40) Female | A3 (> 40) Male | A3 (> 40) Female | Total |
|---|---|---|---|---|---|---|---|
| L1 (Ileal) | 0 | 0 | 4 | 0 | 2 | 1 | 7 |
| L2 (Colon) | 0 | 0 | 2 | 0 | 4 | | 6 |
| L3 (Ileocolonic) | 6 | 2 | 31 | 14 | 16 | 4 | 73 |
| L4 (UGT) | 2 | | | | | | 2 |

[Figure: clustered bar chart of the number of CD patients by disease location (L1 to L4), age group, and sex.]

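## Characteristics of CD Patients According to Montreal Classification

| Behavior | A1 (< 16) Male | A1 (< 16) Female | A2 (17-40) Male | A2 (17-40) Female | A3 (> 40) Male | A3 (> 40) Female | Total |
|---|---|---|---|---|---|---|---|
| B1 (NSNP) | 8 | 2 | 32 | 13 | 21 | 5 | 81 |
| B2 (Stricturing) | 0 | 0 | 2 | 0 | 0 | 0 | 2 |
| B3 (Penetrating) | 0 | 0 | 3 | 1 | 1 | 0 | 5 |

[Figure: clustered bar chart of the number of CD patients by disease behavior (B1 to B3), age group, and sex.]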
Tools for Describing Data: Pie Chart
Tools for Describing Data: Pareto Chart
Tools for Describing Data: Dot Plot
A dot plot represents a distribution as a group of data points plotted on a simple scale.

The dot plot is suitable for small to moderate-sized data sets.
Tools for Describing Data: Histogram
A histogram is used for continuous quantitative data. The class intervals (of the exclusive type) are plotted on the x-axis and the frequencies on the y-axis.

Unlike a bar chart, a histogram has no gaps between its bars.
Tools for Describing Data: Scatter Plot
Tools for Describing Data: Time Series Plot

A time series plot is used to evaluate patterns and behavior in data over time.

A time series plot displays observations on the y-axis against equally spaced time intervals on the x-axis.
Frequency of Measurement (f)
- Frequency is the number of times a measurement occurs in the dataset.
- A frequency distribution table lists the measurements and their frequencies.
- The relative frequency of an observation in a set of n measurements is the ratio of its frequency to the total number of measurements, f/n.
- Cumulative frequency gives the number of measurements less than or equal to a specified value; dividing it by n gives the cumulative relative frequency, that is, the proportion of such measurements.
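The definitions above can be sketched in a few lines of Python. This is only an illustration, using the nine-value sample that appears in the ungrouped-data slides of this document; the variable names are hypothetical.

```python
from collections import Counter
from itertools import accumulate

# Sample taken from the ungrouped-data example in this document.
data = [8, 6, 4, 10, 3, 8, 4, 8, 5]
n = len(data)

freq = Counter(data)                         # value -> number of occurrences
values = sorted(freq)                        # distinct measurements, ascending
frequencies = [freq[v] for v in values]      # frequency f of each measurement
relative = [f / n for f in frequencies]      # relative frequency f / n
cumulative = list(accumulate(frequencies))   # count of measurements <= value

for v, f, r, c in zip(values, frequencies, relative, cumulative):
    print(f"{v:>3} f={f} rel={r:.3f} cum={c}")
```

The last cumulative frequency always equals n, and the relative frequencies sum to 1.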
Tools for Describing Data: Numerical Measures
Measure of Central Tendency
- Mean: average
- Median: value found at the exact middle of the ordered dataset
- Mode: most frequently occurring value

## Measure of Spread or Dispersion

- Standard deviation: measure of how spread out a distribution is
- Variance
- Percentile: the value of a variable below which a certain percent of observations fall
- Quartile
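Measure of Central Tendency: Ungrouped Data

Data: 8 6 4 10 3 8 4 8 5
The sample mean is given by:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{8+6+4+10+3+8+4+8+5}{9} = \frac{56}{9} \approx 6.22$$

The sample median is the middle value of the ordered data:

3 4 4 5 6 8 8 8 10

With nine values, the median is the fifth ordered value, 6.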
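The mean and median of this sample can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original slides.

```python
import statistics

data = [8, 6, 4, 10, 3, 8, 4, 8, 5]

mean = statistics.mean(data)      # sum of the values divided by n = 56 / 9
median = statistics.median(data)  # middle value of the sorted data

print(f"mean = {mean:.2f}, median = {median}")
```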
Measure of Central Tendency: Ungrouped Data

Data: 8 6 4 10 3 8 4 8 5
The mode of the sample is the value that occurs most frequently.

3 4 4 5 6 8 8 8 10 (mode = 8)

3 4 4 4 6 8 8 8 10 (bimodal: modes 4 and 8)
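A short sketch of how both cases could be checked in Python: `statistics.multimode` (available from Python 3.8) returns every value tied for the highest frequency, so it handles the bimodal sample as well.

```python
import statistics

unimodal = [3, 4, 4, 5, 6, 8, 8, 8, 10]
bimodal = [3, 4, 4, 4, 6, 8, 8, 8, 10]

# multimode returns a list of all most-frequent values,
# in the order they first appear in the data.
modes_unimodal = statistics.multimode(unimodal)
modes_bimodal = statistics.multimode(bimodal)

print(modes_unimodal, modes_bimodal)
```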
Measure of Dispersion: Ungrouped Data

The sample variance is given by:

$$s^2 = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2}{n-1}$$

The sample standard deviation is the square root of the sample variance:

$$s = \sqrt{s^2}$$
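As an illustration, the sketch below applies the variance formula by hand to the nine-value sample from the earlier slides (8 6 4 10 3 8 4 8 5) and compares it with `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

data = [8, 6, 4, 10, 3, 8, 4, 8, 5]
n = len(data)
mean = sum(data) / n

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
std_dev = math.sqrt(variance)

print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")

# The standard library uses the same n - 1 definition.
library_variance = statistics.variance(data)
```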
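Measure of Moments: Ungrouped Data
1. Skewness: measure of asymmetry

$$\text{skewness} = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^3}{(n-1)\,s^3}$$

If skewness is +, the distribution is skewed to the right; the mean typically lies to the right of the median.
If skewness is -, the distribution is skewed to the left; the mean typically lies to the left of the median.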
Measure of Moments: Ungrouped Data
2. Kurtosis: measure of whether the data are peaked or flat relative to a normal distribution.

$$k = \frac{\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^4}{(n-1)\,s^4}$$
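The skewness and kurtosis formulas on these slides can be written as two small functions. This is a minimal sketch that follows the (n - 1) denominators used here; other texts and libraries use slightly different normalizations, so results may not match them exactly. The function names are hypothetical.

```python
import statistics


def sample_skewness(xs):
    """Sum of cubed deviations divided by (n - 1) * s^3."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return sum((x - mean) ** 3 for x in xs) / ((n - 1) * s ** 3)


def sample_kurtosis(xs):
    """Sum of fourth-power deviations divided by (n - 1) * s^4."""
    n = len(xs)
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum((x - mean) ** 4 for x in xs) / ((n - 1) * s ** 4)


data = [8, 6, 4, 10, 3, 8, 4, 8, 5]
skew = sample_skewness(data)
kurt = sample_kurtosis(data)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

For this sample the skewness comes out slightly positive, which agrees with the mean (6.22) lying to the right of the median (6).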
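Measure of Central Tendency: Grouped Data

$$\text{Mean} = \frac{\sum f_i X_i}{N}$$

where
- $f_i$ = frequency of class $i$
- $X_i$ = class mark of class $i$
- $N$ = total number of observations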
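Measure of Central Tendency: Grouped Data

$$\text{Median} = L_X + \left(\frac{\frac{N}{2} - \sum f_i}{f_{med}}\right) c$$

where
- $\sum f_i$ = summation of the frequencies preceding the median class
- $f_{med}$ = frequency of the median class
- $L_X$ = lower limit of the median class
- $c$ = class interval (class width)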
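Measure of Central Tendency: Grouped Data

$$\text{Mode} = L_{mo} + \left(\frac{f_{mo} - f_1}{2 f_{mo} - f_1 - f_2}\right) c$$

where
- $L_{mo}$ = lower limit of the modal class
- $f_{mo}$ = frequency of the modal class
- $f_1$ = frequency of the class preceding the modal class
- $f_2$ = frequency of the class succeeding the modal class
- $c$ = class interval (class width)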
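Quartile, Decile, Percentile

$$Q_n = L_{Q_n} + \left(\frac{\frac{nN}{4} - \sum f_i}{f_{Q_n}}\right) c$$

$$D_n = L_{D_n} + \left(\frac{\frac{nN}{10} - \sum f_i}{f_{D_n}}\right) c$$

$$P_n = L_{P_n} + \left(\frac{\frac{nN}{100} - \sum f_i}{f_{P_n}}\right) c$$

In each formula, $L$ is the lower limit of the class containing the quantile, $\sum f_i$ is the summation of the frequencies preceding that class, $f$ is the frequency of that class, and $c$ is the class interval.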
Interquartile Range
- covers the middle fifty percent of the data
- equal to the difference between the upper and lower quartiles

$$\text{IQR} = Q_3 - Q_1$$
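The quartile, decile, and percentile formulas share one interpolation step, so a single helper can cover all three. The sketch below is illustrative: `grouped_quantile` is a hypothetical name, and the class data are the true lower limits and frequencies from the worked frequency distribution in this document.

```python
def grouped_quantile(lower_limits, freqs, width, fraction):
    """Return L + ((fraction * N - cum_before) / f_class) * width.

    fraction is n/4 for quartiles, n/10 for deciles, n/100 for percentiles.
    """
    total = sum(freqs)
    target = fraction * total
    cum_before = 0  # summation of frequencies preceding the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")


# True lower limits and frequencies from the worked example (class width 4).
lower_limits = [9.5, 13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5]
freqs = [0, 3, 15, 10, 5, 2, 3, 2, 0]

q1 = grouped_quantile(lower_limits, freqs, 4, 1 / 4)
q3 = grouped_quantile(lower_limits, freqs, 4, 3 / 4)
iqr = q3 - q1
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
```

Calling the helper with a fraction of 1/2 gives the grouped median, since the second quartile, fifth decile, and fiftieth percentile coincide.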
Measure of Dispersion: Grouped Data
Standard Deviation
1. For a normal distribution, the 68-95-99.7 rule applies.
2. The square of the standard deviation is the variance.
3. The standard deviation of a sample is denoted by s; the standard deviation of the population is denoted by σ.
Measure of Moments: Grouped Data
1. Skewness: measure of asymmetry

$$\text{skewness} = \frac{\sum_{i=1}^{N} f_i \left(X_i - \bar{X}\right)^3}{(n-1)\,s^3}$$
Measure of Moments: Grouped Data
2. Kurtosis: measure of whether the data are peaked or flat relative to a normal distribution

$$k = \frac{\sum_{i=1}^{N} f_i \left(X_i - \bar{X}\right)^4}{(n-1)\,s^4}$$

For a normal distribution, $k = 3$.
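A hedged sketch of the grouped moments, using the class marks and frequencies from the worked frequency distribution in this document. The grouped variance line is an assumption: it weights the ungrouped formula by class frequency and keeps the n - 1 denominator, since the slides do not state it explicitly.

```python
import math

# Class marks and frequencies from the worked example in this document.
marks = [11.5, 15.5, 19.5, 23.5, 27.5, 31.5, 35.5, 39.5, 43.5]
freqs = [0, 3, 15, 10, 5, 2, 3, 2, 0]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Assumed grouped variance: frequency-weighted squared deviations over n - 1.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1)
s = math.sqrt(variance)

# Grouped skewness and kurtosis as written on the slides.
skewness = sum(f * (x - mean) ** 3 for f, x in zip(freqs, marks)) / ((n - 1) * s ** 3)
kurtosis = sum(f * (x - mean) ** 4 for f, x in zip(freqs, marks)) / ((n - 1) * s ** 4)

print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}")
```

The positive skewness reflects the long right tail of the example data, where a few observations fall in the 34 to 41 range.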
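Grouped Data: Frequency Distribution
1. Determine the range: Range = highest value - lowest value
2. Number of classes: k = 1 + 3.322 log N, where N = total number of observations and log is the base-10 logarithm
3. Class interval: divide the range by k and round to a convenient value
4. Set the class limits, true limits, and class marks
5. Distribute the data into their respective classes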
Grouped Data: Frequency Distribution
Raw Data

24 27 18 21 21 18 24 27
21 15 24 21 21 21 18 24

37 31 24 15 30 21 18 21

25 24 15 39 27 27 24 39

24 27 24 21 37 18 34 21
Frequency Distribution
Range = 39 - 15 = 24
K = 1 + 3.322 log(40) = 6.3
Class interval = 24/6.3 = 3.8 ≈ 4
f = frequency

| Classes (apparent limits) | f | True limits | Class mark |
|---|---|---|---|
| 10 - 13 | 0 | 9.5 - 13.5 | 11.5 |
| 14 - 17 | 3 | 13.5 - 17.5 | 15.5 |
| 18 - 21 | 15 | 17.5 - 21.5 | 19.5 |
| 22 - 25 | 10 | 21.5 - 25.5 | 23.5 |
| 26 - 29 | 5 | 25.5 - 29.5 | 27.5 |
| 30 - 33 | 2 | 29.5 - 33.5 | 31.5 |
| 34 - 37 | 3 | 33.5 - 37.5 | 35.5 |
| 38 - 41 | 2 | 37.5 - 41.5 | 39.5 |
| 42 - 45 | 0 | 41.5 - 45.5 | 43.5 |
| Total | 40 | | |
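The construction of this table can be sketched in Python. The starting limit of 10 and the count of nine classes are taken from the table rather than derived, so treat them as assumptions of the example.

```python
import math

raw = [
    24, 27, 18, 21, 21, 18, 24, 27,
    21, 15, 24, 21, 21, 21, 18, 24,
    37, 31, 24, 15, 30, 21, 18, 21,
    25, 24, 15, 39, 27, 27, 24, 39,
    24, 27, 24, 21, 37, 18, 34, 21,
]

n = len(raw)
data_range = max(raw) - min(raw)   # 39 - 15 = 24
k = 1 + 3.322 * math.log10(n)      # number of classes, about 6.3
width = round(data_range / k)      # 3.8 rounded to a class interval of 4

start = 10        # first apparent lower limit (assumed, as in the table)
num_classes = 9   # classes 10-13 through 42-45 (assumed, as in the table)

# Tally each observation into its class by integer division.
freqs = [0] * num_classes
for x in raw:
    freqs[(x - start) // width] += 1

for i, f in enumerate(freqs):
    lo = start + i * width
    print(f"{lo} - {lo + width - 1}: {f}")
```

The integer-division trick works here because the data are whole numbers and every class has the same width.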
Frequency Histogram
[Figure: histogram of frequency (y-axis) against class interval (x-axis) for the classes 10-13 through 42-45.]
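Ogive Cumulative Frequency Distribution

| Classes | Frequency | True limit | Less than (cumulative frequency) | More than (cumulative frequency) |
|---|---|---|---|---|
| 10 - 13 | 0 | | | |
| 14 - 17 | 3 | 13.5 | 0 | 40 |
| 18 - 21 | 15 | 17.5 | 3 | 37 |
| 22 - 25 | 10 | 21.5 | 18 | 22 |
| 26 - 29 | 5 | 25.5 | 28 | 12 |
| 30 - 33 | 2 | 29.5 | 33 | 7 |
| 34 - 37 | 3 | 33.5 | 35 | 5 |
| 38 - 41 | 2 | 37.5 | 38 | 2 |
| 42 - 45 | 0 | 41.5 | 40 | 0 |

The true limit in each row is the lower true limit of that class; the "less than" and "more than" columns count the observations below and above it.

Class mark = (lower limit + upper limit)/2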
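Both ogive columns are running totals of the class frequencies. A minimal sketch, assuming the true limits 13.5 through 41.5 are read as the upper true limits of the classes 10-13 through 38-41:

```python
from itertools import accumulate

# Upper true limits of classes 10-13 through 38-41 and their frequencies.
boundaries = [13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5]
freqs = [0, 3, 15, 10, 5, 2, 3, 2]

total = sum(freqs)

# Less-than ogive: number of observations below each boundary.
less_than = list(accumulate(freqs))

# Greater-than ogive: number of observations above each boundary.
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b}: less than = {lt}, more than = {mt}")
```

At every boundary the two counts add up to the total of 40, which is a quick consistency check on the table.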
Ogive Cumulative Frequency Distribution
[Figure: less-than ogive and greater-than ogive, plotting cumulative frequency (0 to 40) against the true limits 13.5 through 41.5.]
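Measures of Central Tendencies

| Classes | Frequency | True limits | Xi | Cumulative frequency | |
|---|---|---|---|---|---|
| 10 - 13 | 0 | 9.5 - 13.5 | 11.5 | 0 | |
| 14 - 17 | 3 | 13.5 - 17.5 | 15.5 | 3 | |
| 18 - 21 | 15 | 17.5 - 21.5 | 19.5 | 18 | Modal class |
| 22 - 25 | 10 | 21.5 - 25.5 | 23.5 | 28 | Median class |
| 26 - 29 | 5 | 25.5 - 29.5 | 27.5 | 33 | |
| 30 - 33 | 2 | 29.5 - 33.5 | 31.5 | 35 | |
| 34 - 37 | 3 | 33.5 - 37.5 | 35.5 | 38 | |
| 38 - 41 | 2 | 37.5 - 41.5 | 39.5 | 40 | |
| 42 - 45 | 0 | 41.5 - 45.5 | 43.5 | 40 | |
| Total | 40 | | | | |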
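Applying the grouped mean, median, and mode formulas to this table can be sketched as follows. The code uses the true lower limits as L and a class interval of 4; it assumes the modal class is neither the first nor the last class, which holds for this example.

```python
# True lower limits and frequencies from the table above; class interval c = 4.
lower_limits = [9.5, 13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5]
freqs = [0, 3, 15, 10, 5, 2, 3, 2, 0]
c = 4

n = sum(freqs)
marks = [lo + c / 2 for lo in lower_limits]  # class marks Xi

# Mean: sum of f * X divided by N.
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Median: interpolate inside the first class whose cumulative
# frequency reaches N / 2.
cum_before = 0
for lo, f in zip(lower_limits, freqs):
    if f > 0 and cum_before + f >= n / 2:
        median = lo + (n / 2 - cum_before) / f * c
        break
    cum_before += f

# Mode: interpolate inside the class with the highest frequency
# (assumes it has a neighbor on both sides).
i = freqs.index(max(freqs))
f_mo, f1, f2 = freqs[i], freqs[i - 1], freqs[i + 1]
mode = lower_limits[i] + (f_mo - f1) / (2 * f_mo - f1 - f2) * c

print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
```

The loop picks the 22-25 class for the median and the index lookup picks the 18-21 class for the mode, matching the classes marked in the table.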
Exercises
Raw Data
66, 58, 56, 46, 44, 46, 46, 60, 70, 54, 80, 62, 64, 44, 60, 66,
82, 86, 94, 70, 44, 64, 52, 46, 40, 56, 52, 60, 44, 48, 64, 55
a. Group the data into a frequency distribution.
b. Calculate the mean, median, and mode.
c. Calculate the standard deviation.
d. Select one percentile, one decile, and one quartile, and calculate each.
e. Graph the frequency distribution as a histogram and as a frequency polygon.