Intermediate Biostatistics 1

Intermediate Biostatistics
Indang Trihandini
Teaching Methods
Student-centered learning
Modules (chapters)
Discussion
Output (statistical results from SPSS)
Case-based learning
Assignments
Prerequisites
Basic Biostatistics
Research Methodology
Course description

Students gain the ability to apply intermediate-level quantitative approaches to understanding public health problems and their solutions. Topics run from descriptive and inferential biostatistics through several advanced statistical techniques: linear regression, analysis of variance, logistic regression, and survival analysis. The course is taught in Indonesian; the reference books are in English.
Subcompetencies
Students are able to explain the scope and role of biostatistics in public health (C1; A2; P1)
Students are able to distinguish data types and variable scales in public health research (C2; A3; P2)
Students are able to apply descriptive statistical techniques (measures of center, variation, and position) appropriate to the variable scale (C3; A2; P2)
Students are able to explain basic probability theory and distinguish the theoretical probability distributions most often applied to public health research data (C2; A3; P2)
Students are able to explore data, assess data quality, and explain the imputation techniques that can be used on public health research data (C3; A2; P1)
Students are able to apply inferential statistical methods, using estimation and statistical hypothesis testing appropriate to the variable scale (C3; A3; P3)
Students are able to apply the Z test, t test, chi-square test, association, and simple correlation according to data type, variable scale, and study design (C3; A3; P3)
Students are able to apply techniques for sampling from a population in public health research (C3; A3; P3)
Students are able to apply advanced biostatistical analyses: multiple linear regression, multiple logistic regression, and multi-way analysis of variance (C3; A3; P2)
Students are able to examine errors in biostatistical analysis and its interpretation in public health research (C4; A3; P3)
Students are able to organize the presentation of biostatistical analysis results based on public health research data or vital registration data, both in writing and orally, for public health professionals and for lay audiences (C4; A4; P4).
ACTIVITY MATRIX
Columns: Subcompetency | Learning stages O/L/U (%) | Topic | Media and technology | Assessment criteria (indicators)
2.1 | 90/10 | Scope and role of biostatistics | Laptop, LCD | -
2.2 | 50/40/10 | Variables and data | Laptop, LCD, journal articles | Paper review (individual)
2.3 | 40/40/20 | Descriptive statistics: measures of center, variation, and position | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
2.3 | 40/40/20 | Data presentation: master tables, frequency tables, cross-tabulations, and graphs | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
Columns: Subcompetency | Learning stages O/L/U (%) | Topic | Media and technology | Assessment criteria (indicators)
2.4 | 50/40/10 | Probability and theoretical probability distributions | Laptop, LCD, statistical software | Group discussion, presentation
2.5 | 50/40/10 | Data exploration | Laptop, LCD, statistical software | Group discussion, presentation
2.6 | 50/40/10 | Data imputation | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
2.8 | 50/40/10 | Sampling | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
Columns: Subcompetency | Learning stages O/L/U (%) | Topic
2.6 | 50/40/10 | Inferential statistics: hypothesis testing
2.7 | 50/40/10 | Simple statistical tests
2.7 | 50/40/10 | Simple statistical tests
UTS (midterm exam)
2.9 | 50/40/10 | Multivariable statistical analysis: linear regression
Media and technology for these meetings: laptop, LCD, journal articles, statistical software
Assessment criteria (indicators) for these meetings: group discussion and presentation; individual paper presentation
Columns: Meeting | Subcompetency | Learning stages O/L/U (%) | Topic | Media and technology | Assessment criteria (indicators)
- | 2.9 | 50/40/10 | Multivariable statistical analysis: logistic regression | Laptop, LCD, journal articles, statistical software | Individual paper presentation
- | 2.9 | 50/40/10 | Risk measures | Laptop, LCD, journal articles, statistical software | Individual paper presentation
10 | 2.9 | 50/40/10 | Multivariable statistical analysis: analysis of variance | Laptop, LCD, journal articles, statistical software | Individual paper presentation
11 | 2.9 | 50/40/10 | Survival analysis 1 | Laptop, LCD, journal articles, statistical software | Individual paper presentation
Columns: Meeting | Subcompetency | Learning stages O/L/U (%) | Topic | Media and technology | Assessment criteria (indicators)
12 | 2.9 | 50/40/10 | Survival analysis 2 | Laptop, LCD, journal articles, statistical software | Individual paper presentation
13 | 2.10 | 40/50/10 | Errors in statistics | Laptop, LCD, journal articles | Individual paper presentation
14 | 2.11 | 20/70/10 | Presenting the results of biostatistical analysis | Laptop, LCD, journal articles, data presentation software | Individual paper presentation
- | - | - | Review | - | -
16 | - | - | UAS (final exam) | - | MCQ, essay
Assessment
Form | Instrument | Frequency | Weight (%)
Group assignment | Assessment sheet | - | 15
Individual assignment | Assessment sheet | 12 | 20
UTS (midterm exam) | Exam questions | - | 25
UAS (final exam) | Exam questions | - | 40
Total | | | 100
Grading Criteria

Numeric score | Letter grade | Grade points
85 to 100 | A | 4.00
80 to <85 | A- | 3.70
75 to <80 | B+ | 3.30
70 to <75 | B | 3.00
65 to <70 | B- | 2.70
60 to <65 | C+ | 2.30
55 to <60 | C | 2.00
40 to <55 | D | 1.00
<40 | E | 0.00
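As a rough illustration of how the assessment weights and the grading bands combine, here is a minimal sketch. It assumes the weights read as 15/20/25/40 and the letter grades read as A, A-, B+, B, B-, C+, C, D, E (the slide text is partly garbled); the function names and the example scores are hypothetical.

```python
# Component weights (%) as read from the assessment table (assumed reading).
WEIGHTS = {"group": 15, "individual": 20, "uts": 25, "uas": 40}

# (lower bound, letter, grade points); a score earns the first band it reaches.
BANDS = [
    (85, "A", 4.00), (80, "A-", 3.70), (75, "B+", 3.30),
    (70, "B", 3.00), (65, "B-", 2.70), (60, "C+", 2.30),
    (55, "C", 2.00), (40, "D", 1.00), (0, "E", 0.00),
]


def final_score(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(score):
    """Map a numeric score to (letter, grade points)."""
    for lower, letter, points in BANDS:
        if score >= lower:
            return letter, points
    raise ValueError("score must be non-negative")


# Hypothetical student.
example = {"group": 80, "individual": 70, "uts": 90, "uas": 85}
score = final_score(example)
print(score, letter_grade(score))
```

Listing the bands from highest to lowest keeps the lookup to a single comparison per band.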
Individual Practicum Dates

2 October 2015
9 October 2015
30 October 2015
Session 1
Questions
Surveys, registration
Induction vs. deduction
Variable scales
Types of variables: dependent vs. independent
Types of data: primary, secondary, tertiary
Grouped vs. individual data
Routine vs. ad hoc data
Session 1
Measures of center, variation, and position
Master tables, frequency tables, cross-tabulations
Graphs: histogram, ogive, stem-and-leaf, box plot, bar, line, scatter
Session 2
Basic probability theory; addition and multiplication rules; conditional and independent probability
Theoretical probability distributions: Z, t, binomial
Sampling distributions and the Central Limit Theorem
Session 2
Exploring data with statistics and graphs
Conclusions about data quality
Bias
Types of data imputation
Session 3
Sample size for estimation and hypothesis testing
Random sampling techniques: simple, systematic, stratified, cluster
Session 4
Estimation: point and interval
Concepts of confidence and study power
Statistical hypothesis testing; one-sided or two-sided tests
Session 5
Z test, t test
Chi-square test
Correlation (r and rs)
Association in cross-tabulations (OR, RR, and C coefficients)
Session 6
Simple and multiple linear regression
Session 8
Simple and multiple logistic regression
Session 9
One-way and multi-way analysis of variance
Session 10
Survival analysis
Session 11
Cox regression
Session 12
Interpretation
Errors in the practice/application of statistical analysis
Errors in drawing conclusions from statistical analysis
Session 13
Presenting biostatistical analysis results from research or vital registration data in narrative/written form
Oral presentation of biostatistical analysis results with the aid of media
Session 14
Critical review
Reference List
Rumsey, Deborah. Intermediate Statistics for Dummies. Indianapolis, Indiana, USA: Wiley Publishing Inc., 2007.
Prasetyo, Sabarinah; Iwan Ariawan. Biostatistik Dasar untuk Rumah Sakit, Bahan Ajar. Depok: FKMUI, 2008.
Stommel, Manfred; Katherine J. Dontje. Statistics for Advanced Practice Nurses and Health Professionals. Springer Publishing Company, 2014.
Levine, David M.; David F. Stephan. Even You Can Learn Statistics, Second Edition. Pearson Education, Inc., 2010.
Field, Andy. Discovering Statistics Using SPSS. Sage, 2009.
Suchmacher, Mendel; Mauro Geller. Practical Biostatistics: A User-Friendly Approach for Evidence-Based Medicine. Elsevier Inc., 2012.
Kallen, Anders. Understanding Biostatistics. John Wiley & Sons, 2011.
Session 1
BASIC STATISTICAL CONCEPTS: inductive vs. deductive, parameter vs. statistic, population vs. sample, descriptive vs. inferential statistics, data and variables (variable classification: NOIR, that is nominal, ordinal, interval, ratio; and categorical vs. numeric)
STATISTICAL METHODS: data collection, data processing, data presentation, and data analysis
DATA SUMMARIES: numeric (mean, median, mode, variance, standard deviation, coefficient of variation, IQR) and categorical (proportion/percentage)
Deduction and Induction
The scientific approach works through logic.
There are two types of reasoning: deductive reasoning and inductive reasoning.
Deductive reasoning is a procedure that starts from a general statement (a theory) and ends with a conclusion or new knowledge that is more specific.
It begins with theory formation and proceeds through concepts, hypotheses, operational definitions, instruments, and operationalization.
Inductive reasoning is a procedure that starts from empirical observations and ends with a conclusion or new knowledge that is general.
The two types of reasoning can be used together and complement each other.
Population and Sample

Attributes and subjects
Research usually targets attributes of particular subjects (the subjects that hold the attributes).
Population and sample
There are populations and samples of attributes (data), and there are also populations and samples of subjects (respondents).
STATISTICS
Types of data
ANOVA
Normal distribution
Repeated measures ANOVA
Describing data
Non-parametric tests
Mann-Whitney U test
Boxplots
Summary of common tests
Standard deviations
Summaries of proportions
Skewed distributions
Odds and Odds Ratio
Parametric vs Non-parametric
Sample size
Absolute and Relative Risks
Statistical errors
Number Needed to Treat (NNT)
Power calculations
Confidence intervals (CIs)
Clinical vs statistical significance
CI (diff between two proportions)
Two-sample t test
Correlation
Problem of multiple tests
Regression
Subgroup analyses
Paired t test
Chi-square test
Logistic regression
Survival analysis
DATA
"Data" is the plural form; the singular is "datum". Data are obtained by recording various things in a health care institution: the number and types of drugs given to patients, the number and types of laboratory materials used, the amount of money spent on purchasing goods, and the number of nosocomial infections among patients.
Sources of data
Data in a hospital, community health center (puskesmas), or clinic can be collected routinely and are then called routine data. This is familiar from the hospital recording and reporting system. Sometimes, however, the institution collects data on a temporary or one-off basis; these are called ad hoc data. A survey carried out only occasionally produces ad hoc data.
Primary vs. Secondary Data
Primary data are data collected by the researchers themselves, directly from the data source, namely the subjects under study.
Secondary data are data obtained from an institution that has already collected them, that is, not collected directly from the subjects under study.
Individual vs. Aggregate Data
Data obtained by measuring a single subject or individual are called individual data. Examples are hemoglobin level, length of stay, and cost of care for each patient. One can also obtain the percentage of patients with nosocomial infection in each ward. Because the unit is the ward, which consists of patients, these are called aggregate data: an aggregation (collection) of patients within each ward.
Paired vs. Grouped Data
Data obtained from repeated measures on the same subjects are called paired or dependent data. Examples are patients' blood pressure before and after receiving a drug, or patients' knowledge of asthma management before and after a health education session in the waiting room.
Data obtained from different groups of subjects are called grouped or independent data. Examples are length of stay for patients in first-class versus second-class wards, or patients living in urban versus rural areas.
Data Transformation
Information always results from a process that starts with data; it is the product of transforming data. Information must therefore be something ready to use for decision-making.
In the quantitative approach, everything is measured explicitly (in numbers), and the process of turning those data into information is known as the statistical procedure.
The statistical procedure runs from data collection, through data processing and presentation, to analysis and the drawing of conclusions, which constitute the information.
DATA TYPES
Nominal: gender, type of customer (loyalty), flavor/color liked, etc.
Ordinal/Ranking: type of user, preferred brand, brand awareness, etc.
Interval: attitudinal or satisfaction scales, e.g. "Are you satisfied with your education at U of L?" rated from 1 (Dissatisfied) to 5 (Satisfied).
Ratio: income, price willing to pay, age, etc.
Type of measurement | Type of descriptive analysis
Nominal, two categories | Frequency table; proportion (percentage)
Nominal, more than two categories | Frequency table; category proportions (percentages); mode
Ordinal | Rank order; median
Interval | Arithmetic mean
Ratio | Means
Data on a Categorical Scale (Descriptive Analysis)
Measures of concentration: proportion/ratio/rate
Measure of dispersion: standard deviation
Data on a Numeric Scale (Descriptive Analysis)
These data are summarized as measures of concentration (central values) and measures of dispersion (spread of values).
Measures of concentration: mean/median/mode/coefficient/index
Measure of dispersion: standard deviation
Levels of Measurement
From nominal to ratio: increasing ability to use higher-level statistical analyses
Non-parametric testing is generally performed with nominal- and ordinal-level data
Parametric testing is generally performed with interval- and ratio-level data
Summarizing Qualitative Data
Frequency distribution (shows how many)
Relative frequency distribution (shows what fraction)
Percent frequency distribution (shows what percentage)
Bar graph
Pie chart
The bar graph and pie chart are graphical means of displaying any of the distributions above.
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
DESCRIBING DATA
MEAN: average or arithmetic mean of the data
MEDIAN: the value that comes halfway when the data are ranked in order
MODE: most common value observed
In a normal distribution, the mean and median are the same.
If the median and mean differ markedly, this indicates that the data are not normally distributed.
The mode is of little if any practical use.
Look at the data...
Why explore data?
To detect missing data
To detect erroneous data introduced during:
Data entry
Data recording
Data editing and cleaning
The data collection process
Data Analysis
Exploratory
Confirmatory
What are the consequences of erroneous or missing data?
They change the shape of the data distribution, which affects:
Descriptive statistics: the values reported
Inferential statistics: the match to a theoretical probability distribution
Data Exploration
Purpose: to learn the properties of the data
Some ways to explore:
1. Descriptive statistics: check whether mean = median
2. Histogram, stem-and-leaf
3. Box plot
4. Normal probability plot
5. Goodness-of-fit test: K-S test
Descriptive Statistics
Measures of center (central tendency)
Mean: $\bar{X} = \sum X_i / n$
Median = observed value at the middle of the ordered data
Mode = the most frequently observed value
Proportion: $p = \sum X_A / \sum X_{A+B}$

Variation
Standard deviation: $s = \sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$
Range = max - min

Position
Quartiles: $Q_1, Q_2, Q_3$
Interquartile range = $Q_3 - Q_1$
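The measures of center, variation, and position listed above can be checked on a small data set. The following is a minimal sketch using only the Python standard library; the hemoglobin values are hypothetical, and the quartile method is the library default, so a package such as SPSS may report slightly different quartiles.

```python
import math
import statistics

# Hypothetical sample: hemoglobin levels (g/dL) for ten patients.
values = [11.2, 12.5, 12.5, 13.1, 13.4, 13.8, 14.0, 14.6, 15.2, 16.7]

n = len(values)
mean = sum(values) / n                  # mean = sum(Xi) / n
median = statistics.median(values)      # middle value of the ordered data
mode = statistics.mode(values)          # most frequently observed value

# Sample standard deviation: sqrt( sum((Xi - mean)^2) / (n - 1) )
sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
data_range = max(values) - min(values)

# Quartiles; statistics.quantiles defaults to the "exclusive" method.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(f"mean={mean:.2f} median={median:.2f} mode={mode}")
print(f"sd={sd:.2f} range={data_range:.1f}")
print(f"Q1={q1:.2f} Q2={q2:.2f} Q3={q3:.2f} IQR={iqr:.2f}")
```

Writing the standard deviation out by hand mirrors the formula; `statistics.stdev(values)` gives the same result.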
Frequency Distribution Table
Table 1. Fracture patients treated at UPD RS X, Jan 1998
Age (yr) | Frequency | %
15-19 | 7 | 4.7
20-24 | 18 | 12.3
25-29 | 21 | 14.3
30-34 | 30 | 20.4
35-39 | 28 | 19.0
40-44 | 27 | 18.4
45-49 | 16 | 10.9
Total | 147 | 100.0
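The percentage column of a frequency table is each class frequency divided by the total. Here is a minimal sketch for Table 1, assuming the frequencies pair with the age classes in the order 7, 18, 21, 30, 28, 27, 16 (the slide layout is flattened, so this pairing is a reading, not a certainty). Computed this way, the first two rows round to 4.8 and 12.2, slightly different from the printed 4.7 and 12.3.

```python
# Age classes and frequencies as read from Table 1 (pairing is assumed).
age_classes = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
frequencies = [7, 18, 21, 30, 28, 27, 16]

total = sum(frequencies)
percents = [100 * f / total for f in frequencies]

print(f"{'Age (yr)':<10}{'Freq':>6}{'%':>8}")
for label, freq, pct in zip(age_classes, frequencies, percents):
    print(f"{label:<10}{freq:>6}{pct:>8.1f}")
print(f"{'Total':<10}{total:>6}{sum(percents):>8.1f}")
```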
Pareto Diagram
[Figures: four distribution plots. For each, identify the shape; for the last, also identify the mode.]
Stem & Leaf

Stem | Leaf
1 | 2,4,5
2 | 0,1,1,3,5,6,7
3 | 2,4,5,5,6,6,6,8,9
4 | 3,4,4,6,6,7,7,9,9,9
5 | 1,3,4,5,6,7,8,9
6 | 1,3,4,4,6,8
7 | 0,1,4,5
Stem and Leaf

A frequency distribution table that provides a visual picture of the distribution
Each raw score has two parts: a stem, consisting of all but the last digit, and the leaf, the last digit in the number.
Stem and Leaf

Current Salary Stem-and-Leaf Plot
 Frequency    Stem &  Leaf
     2.00        1 .  55
     2.00        2 .  47
     6.00        3 .  001234
     3.00        4 .  016
     1.00        5 .  0
     2.00 Extremes    (>=81250)
 Stem width:  10000
 Each leaf:   1 case(s)
Each stem represents 10 thousand, so the 1 (stem) = 10,000.
There are two cases (frequency=2) with 15,000, two cases with 20,000 (actually, 24,000 and 27,000), and 6 cases with 30,000 (30, 30, 31, 32, 33, and 34 thousand) in this data set.
Stem and Leaf in SPSS

To create a stem-and-leaf plot in SPSS, select the following:
Analyze > Descriptive Statistics > Explore
Select Stem-and-leaf under Plots
Click Continue
Click OK
Stem & Leaf Plot

A histogram-like picture
The 1st digit of each data value is placed in the stem, and the 2nd digit in the leaf
E.g. Data: 42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77, 75, 47, 73, 82, 91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68, 74, 66, 81, 90, 75, 82, 37, 94
Stem | Leaf
2 | 19
3 | 47
4 | 2679
5 | 975
6 | 940957986
7 | 7534045
8 | 717126712
9 | 7104
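The stem-and-leaf display for the 40 example values can be rebuilt in a few lines. This is a minimal sketch for two-digit values only; the function name `stem_and_leaf` and its `sort_leaves` option are illustrative choices, not part of any statistics package.

```python
from collections import defaultdict

data = [42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77,
        75, 47, 73, 82, 91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68,
        74, 66, 81, 90, 75, 82, 37, 94]


def stem_and_leaf(values, sort_leaves=False):
    """Group values by tens digit (stem) and collect units digits (leaves)."""
    plot = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    if sort_leaves:
        for leaves in plot.values():
            leaves.sort()
    return dict(sorted(plot.items()))


plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {''.join(str(d) for d in leaves)}")
```

With `sort_leaves=False` the leaves keep the order in which the values appear in the data, which matches the display on the slide; passing `sort_leaves=True` gives the ordered version that is easier to read quartiles from.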
Box-plot
[Figure: box plots of treadmill time in seconds by GROUP (healthy, disease)]
Box Plot
Uses Q1, Q2, Q3
Imaginary boundaries:
Inner fences: 1.5 × IQR below Q1 and above Q3
Outer fences: a further 1.5 × IQR beyond the inner fences
Data beyond the outer fences = extreme values / outliers
Data between the inner and outer fences = potential outliers
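The fence rules can be written out directly. The following is a minimal sketch with hypothetical data; the function name `classify_by_fences` is illustrative, and the quartiles come from the standard library's default method, so other software may place the fences slightly differently.

```python
import statistics


def classify_by_fences(values):
    """Return inner fences, outer fences, potential outliers, and extremes."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)              # 1.5 * IQR from Q1 / Q3
    outer = (inner[0] - 1.5 * iqr, inner[1] + 1.5 * iqr)  # 1.5 * IQR further out
    potential = [v for v in values
                 if outer[0] <= v < inner[0] or inner[1] < v <= outer[1]]
    extreme = [v for v in values if v < outer[0] or v > outer[1]]
    return inner, outer, potential, extreme


# Hypothetical measurements with two unusually large values.
sample = [10, 12, 13, 14, 15, 16, 17, 18, 19, 20, 35, 60]
inner, outer, potential, extreme = classify_by_fences(sample)
print("inner fences:", inner)
print("outer fences:", outer)
print("potential outliers:", potential)
print("extreme values:", extreme)
```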
Boxplot Components
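Box Plot
[Figure: box plot marking, from left to right, an extreme value (*), the outer fence (OF), the minimum, the inner fence (IF), Q1, Q2, Q3, the inner fence (IF), the outer fence (OF), and the maximum]
A box plot shows:
High or low variation (~IQR)
Outliers / extreme values
Whether the shape is symmetric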
Measures of dispersion
Range
Distance between the highest and lowest scores in a distribution
Sensitive to extreme scores
Compensate by calculating the interquartile range (distance between the 25th and 75th percentile points), which represents the range of scores for the middle half of a distribution
Usually used in combination with other measures of dispersion
Range
Source: www.animatedsoftware.com/statglos/sgrange.htm
Source: http://pse.cs.vt.edu/SoSci/converted/Dispersion_I/box
Diagnostics with Boxplots
Visualization Techniques: Box Plots

Invented by J. Tukey
Another way of displaying the distribution of data
The following figure shows the basic parts of a box plot
[Figure: box plot with labels for an outlier and for the 75th, 50th, 25th, and 10th percentiles]
Pie Chart
[Figure: pie chart with two categories, tidak (no) and ya (yes)]
Bar Chart
[Figure: bar chart of Count by GROUP (healthy, disease)]
Bar Diagram
[Figure: two bar charts of Count by Education (No College, College) and Status (0, 1)]
Scatter Plot
[Figure: scatter plot of final trial result (hasil coba akhir) versus first trial result (hasil coba pertama)]
Scatter plots
A scatter plot illustrates the relationship between two continuous variables: the values of Y (vertical axis) versus the corresponding values of X (horizontal axis).
Scatter plots can provide answers to the following questions:
Are variables X and Y correlated? (As one variable goes up, the other variable goes up/down.)
Is there a linear relationship between X and Y?
Is there a curvilinear relationship between variables X and Y? (As X goes up, Y goes up to a peak; then, as X continues to go up, Y goes down.)
Are there outliers? (Do one or more points stray from the trend?)
[Figure: scatter plot of Beginning Salary versus Current Salary]
Line Graph
[Figure: line graph for wards R.Melati, R.Mawar, and R.Anggrek over periods bl-1 to bl-4; vertical axis from 0 to 50]
Q-Q Plot
Descriptives: Nilai Asupan Lemak Responden (respondents' fat intake)
Statistic | Value | Std. Error
Mean | 20.0374 | 1.06249
95% Confidence Interval for Mean, Lower Bound | 17.9234 |
95% Confidence Interval for Mean, Upper Bound | 22.1515 |
5% Trimmed Mean | 19.8010 |
Median | 18.7150 |
Variance | 92.568 |
Std. Deviation | 9.62123 |
Minimum | 6.01 |
Maximum | 40.01 |
Range | 34.00 |
Interquartile Range | 15.16 |
Skewness | .291 | .266
Kurtosis | -.982 | .526
Dot Plot
One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Each data value is then represented by a dot placed above the axis.
Dot plots are not used much anymore; they were common when graphical drawing tools were primitive.
[Figure: dot plot of Tune-up Parts Cost; horizontal axis Cost ($), from 50 to 110]
Histogram
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
In informal discussions bar graphs and histograms are often equated. In this class you should be careful to keep them straight.
Histogram
[Figure: histogram of Tune-up Parts Cost; vertical axis Frequency, 2 to 18; horizontal axis Parts Cost ($) in classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-110]
Histogram (Common categories)

Symmetric
Left tail is the mirror image of the right tail
Examples: heights and weights of people
[Figure: symmetric histogram; vertical axis Relative Frequency, 0 to .35]
Skewness of distributions
Measures of skewness look at how lopsided distributions are, that is, how far they are from the ideal of the normal curve.
When the median and the mean are different, the distribution is skewed. The greater the difference, the greater the skew.
Distributions that trail away to the left are negatively skewed, and those that trail away to the right are positively skewed.
If the skewness is extreme, the researcher should either transform the data to make them better resemble a normal curve or else use a different set of statistics (nonparametric statistics) to carry out the analysis.
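The link between the mean-median gap and the sign of the skew can be seen numerically. This is a minimal sketch using the simple moment coefficient of skewness (third central moment divided by the second central moment to the power 1.5); the data are hypothetical, and statistical packages such as SPSS report a sample-adjusted version that differs slightly.

```python
import statistics


def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]  # hypothetical, long right tail
symmetric = [1, 2, 3, 4, 5]                     # hypothetical, no skew

for name, data in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    print(name,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "skewness =", round(moment_skewness(data), 3))
```

In the right-skewed set the single large value pulls the mean above the median and the coefficient is positive; in the symmetric set mean and median coincide and the coefficient is zero.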
Different Shapes of
Distributions
Source: http://faculty.vassar.edu/lowry/f0204.gif
Skewness of distributions
Source: http://www.polity.org.za/html/govdocs/reports/aids/images/image022.gif
Distribution of posting frequency on Usenet
Histogram
Moderately Skewed Left
A longer tail to the left
Example: exam scores
[Figure: left-skewed histogram; vertical axis Relative Frequency, 0 to .35]
Histogram
Moderately Skewed Right
A longer tail to the right
Example: housing values
[Figure: right-skewed histogram; vertical axis Relative Frequency, 0 to .35]
Histogram
Highly Skewed Right
A very long tail to the right
Example: executive salaries
[Figure: highly right-skewed histogram; vertical axis Relative Frequency, 0 to .35]
Kurtosis
Measures of kurtosis look at how sharply
the distribution rises to a peak and then
drops away
Cumulative Distributions
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.
Cumulative Distributions
Hudson Auto Repair
Cost ($) | Cumulative frequency | Cumulative relative frequency | Cumulative percent frequency
≤ 59 | 2 | .04 | 4
≤ 69 | 15 | .30 | 30
≤ 79 | 31 | .62 | 62
≤ 89 | 38 | .76 | 76
≤ 99 | 45 | .90 | 90
≤ 109 | 50 | 1.00 | 100
Worked example for the second row: 15 = 2 + 13; .30 = 15/50; 30 = .30(100).
Ogive
An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the cumulative frequencies, cumulative relative frequencies, or cumulative percent frequencies.
The frequency (one of the above) of each class is plotted as a point.
The plotted points are connected by straight lines.
Ogive
Hudson Auto Repair
Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
These gaps are eliminated by plotting points halfway between the class limits.
Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.
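The cumulative columns and the ogive points for the Hudson Auto Repair data can be computed as follows. This is a minimal sketch; the class frequencies 2, 13, 16, 7, 7, 5 are not printed on the slide but are obtained here by differencing the cumulative frequencies 2, 15, 31, 38, 45, 50.

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]
class_freq = [2, 13, 16, 7, 7, 5]   # derived from the cumulative column

cum_freq = list(accumulate(class_freq))
n = cum_freq[-1]
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * r for r in cum_rel]

# Ogive points sit halfway between adjacent class limits (59.5, 69.5, ...).
ogive_points = [(u + 0.5, round(p)) for u, p in zip(upper_limits, cum_pct)]

for u, c, r, p in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {u:>3}  {c:>3}  {r:.2f}  {p:.0f}")
print(ogive_points)
```

Joining the ogive points with straight lines, for example in a plotting library, gives the cumulative percent frequency curve.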
Ogive with Cumulative Percent Frequencies
[Figure: ogive of Tune-up Parts Cost; horizontal axis Parts Cost ($), 50 to 110; vertical axis Cumulative Percent Frequency, 20 to 100; the point (89.5, 76) is marked]
Measures of dispersion
Variance ($s^2$)
Average of squared distances of individual points from the mean
High variance means that most scores are far away from the mean. Low variance indicates that most scores cluster tightly about the mean.
Standard Deviation (SD)
A summary statistic of how much scores vary from the mean
Square root of the variance, expressed in the original units of measurement
Used in a number of inferential statistics
Variance vs. Standard Deviation
Population variance: $\sigma^2 = \sum (X_i - \mu)^2 / N$; population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample variance: $s^2 = \sum (X_i - \bar{X})^2 / (n - 1)$; sample standard deviation: $s = \sqrt{s^2}$
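The only difference between the population and sample versions is the divisor: N for a population, n - 1 for a sample. Here is a minimal sketch with hypothetical values, checked against the standard library functions `pvariance`, `variance`, `pstdev`, and `stdev`.

```python
import math
import statistics

# Hypothetical scores.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(scores)
mean = sum(scores) / n
sum_sq = sum((x - mean) ** 2 for x in scores)   # sum of squared deviations

pop_var = sum_sq / n            # treats the scores as the whole population
samp_var = sum_sq / (n - 1)     # treats the scores as a sample
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print(f"population: variance={pop_var:.3f} SD={pop_sd:.3f}")
print(f"sample:     variance={samp_var:.3f} SD={samp_sd:.3f}")
```

The sample values are always a little larger than the population values for the same data, and the gap shrinks as n grows.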
NORMAL DISTRIBUTION
[Figure: normal curve annotated with the mean, cases distributed symmetrically about the mean, the extent of the spread of data around the mean (measured by the standard deviation), and the area beyond two standard deviations above the mean]
STANDARD DEVIATION: MEASURE OF THE SPREAD OF VALUES OF A SAMPLE AROUND THE MEAN
THE SQUARE OF THE SD IS KNOWN AS THE VARIANCE
$SD = \sqrt{\sum (\text{value} - \text{mean})^2 / \text{number of values}}$ (with the number of values minus 1 in the denominator for a sample)
The SD is smaller when the values are less spread about the mean.
A larger number of values does not by itself shrink the SD; it shrinks the standard error of the mean, SD/√n.
IN A NORMAL DISTRIBUTION, ABOUT 95% OF THE VALUES WILL LIE WITHIN 2 SDs OF THE MEAN
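The "95% within 2 SDs" rule can be checked from the normal cumulative distribution function. This is a minimal sketch using only the standard library; the helper names `normal_cdf` and `prob_within` are illustrative.

```python
import math


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def prob_within(k):
    """Probability that a normal value lies within k SDs of the mean."""
    return normal_cdf(k) - normal_cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

Exactly 95% corresponds to 1.96 SDs; 2 SDs covers slightly more, which is why the rule is usually stated as "about 95%".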
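STANDARD DEVIATION AND SAMPLE SIZE
As sample size increases, the standard error of the mean (SD/√n) decreases, so the distribution of sample means becomes narrower.
[Figure: curves for n=150, n=50, and n=10]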
SKEWED DISTRIBUTION
[Figure: skewed curve with the MEAN and MEDIAN marked]
MEDIAN: 50% OF VALUES WILL LIE ON EITHER SIDE OF THE MEDIAN
DOES A VARIABLE FOLLOW A NORMAL DISTRIBUTION?
Important because parametric statistics assume normal distributions
Statistics packages can test normality
Distribution unlikely to be normal if:
Mean is very different from the median
Two SDs below the mean give an impossible answer (e.g. height < 0 cm)
