
Intermediate Biostatistics

Indang Trihandini

Learning Methods
Student-centered learning
Modules (chapters)
Discussion
Output (statistical results from SPSS)
Case-based learning
Assignments

Prerequisites
Basic Biostatistics
Research Methodology

Course Description

Students gain the ability to apply intermediate-level
quantitative approaches to understanding and solving
public health problems.
The course covers descriptive and inferential
biostatistical techniques and goes on to apply several
advanced statistical techniques: linear regression
analysis, analysis of variance, logistic regression
analysis, and survival analysis. The course is taught in
Indonesian; the reference books are in English.

Subcompetencies
Students are able to explain the scope and role of
biostatistics in public health (C1; A2; P1)
Students are able to distinguish types of data and
variable scales in public health research (C2; A3; P2)
Students are able to apply descriptive statistical
techniques (measures of central tendency, variation,
and position) appropriate to the variable scale
(C3; A2; P2)
Students are able to explain basic probability theory
and distinguish the theoretical probability
distributions most often applied to public health
research data (C2; A3; P2)

Subcompetencies
Students are able to explore data, assess data
quality, and explain the imputation techniques that
can be used on public health research data
(C3; A2; P1)
Students are able to apply inferential statistical
methods, with estimation and statistical hypothesis
testing, according to the variable scale (C3; A3; P3)
Students are able to apply the Z test, t test,
chi-square test, and simple association and
correlation analyses appropriate to the type of data,
variable scale, and study design (C3; A3; P3)

Subcompetencies
Students are able to apply techniques for sampling
from a population in public health research
(C3; A3; P3)
Students are able to apply advanced biostatistical
analyses: multiple linear regression, multiple
logistic regression, and multi-way analysis of
variance (C3; A3; P2)
Students are able to examine errors in biostatistical
analysis and its interpretation in public health
research (C4; A3; P3)

Subcompetencies
Students are able to organize the presentation of
biostatistical results based on public health research
data or vital registration, both in writing and
orally, for public health professionals as well as lay
audiences (C4; A4; P4).

ACTIVITY MATRIX
Meeting | Subcompetency | Learning stage O/L/U (%) | Topic | Media and technology | Assessment criteria (indicator)
(not shown) | 2.1 | 90 / 10 (only two values shown) | Scope and role of biostatistics | Laptop, LCD | (not shown)
(not shown) | 2.2 | 50 / 40 / 10 | Variables and data | Laptop, LCD, journal articles | Paper review (individual)
(not shown) | 2.3 | 40 / 40 / 20 | Descriptive statistics: measures of central tendency, variation, and position | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
(not shown) | 2.3 | 40 / 40 / 20 | Data presentation: master table, frequency table, cross-tabulation, and graphs | Laptop, LCD, journal articles, statistical software | Group discussion, presentation

Meeting | Subcompetency | Learning stage O/L/U (%) | Topic | Media and technology | Assessment criteria (indicator)
(not shown) | 2.4 | 50 / 40 / 10 | Probability & theoretical probability distributions | Laptop, LCD, statistical software | Group discussion, presentation
(not shown) | 2.5 | 50 / 40 / 10 | Data exploration | Laptop, LCD, statistical software | Group discussion, presentation
(not shown) | 2.6 | 50 / 40 / 10 | Data imputation | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
(not shown) | 2.8 | 50 / 40 / 10 | Sampling | Laptop, LCD, journal articles, statistical software | Group discussion, presentation

Meeting | Subcompetency | Learning stage O/L/U (%) | Topic | Media and technology | Assessment criteria (indicator)
(not shown) | 2.6 | 50 / 40 / 10 | Inferential statistics: hypothesis testing | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
(not shown) | 2.7 | 50 / 40 / 10 | Simple statistical tests | (not shown) | (not shown)
(not shown) | 2.7 | 50 / 40 / 10 | Simple statistical tests | Laptop, LCD, journal articles, statistical software | Group discussion, presentation
UTS (midterm exam)
(not shown) | 2.9 | 50 / 40 / 10 | Multivariable statistical analysis: linear regression analysis | Laptop, LCD, journal articles, statistical software | Individual paper presentation

Meeting | Subcompetency | Learning stage O/L/U (%) | Topic | Media and technology | Assessment criteria (indicator)
(not shown) | 2.9 | 50 / 40 / 10 | Multivariable statistical analysis: logistic regression analysis | Laptop, LCD, journal articles, statistical software | Individual paper presentation
(not shown) | 2.9 | 50 / 40 / 10 | Measures of risk | Laptop, LCD, journal articles, statistical software | Individual paper presentation
10 | 2.9 | 50 / 40 / 10 | Multivariable statistical analysis: analysis of variance | Laptop, LCD, journal articles, statistical software | Individual paper presentation
11 | 2.9 | 50 / 40 / 10 | Survival analysis 1 | Laptop, LCD, journal articles, statistical software | Individual paper presentation

Meeting | Subcompetency | Learning stage O/L/U (%) | Topic | Media and technology | Assessment criteria (indicator)
12 | 2.9 | 50 / 40 / 10 | Survival analysis 2 | Laptop, LCD, journal articles, statistical software | Individual paper presentation
13 | 2.10 | 40 / 50 / 10 | Errors in statistics | Laptop, LCD, journal articles | Individual paper presentation
14 | 2.11 | 20 / 70 / 10 | Presenting the results of biostatistical analysis | Laptop, LCD, journal articles, data presentation software | Individual paper presentation
(not shown) | Review
16 | UAS (final exam) | MCQ, essay

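Assessment
Form | Instrument | Frequency | Weight (%)
Group assignments | Assessment sheet | (not shown) | 15
Individual assignments | Assessment sheet | 12 | 20
UTS (midterm exam) | Exam questions | (not shown) | 25
UAS (final exam) | Exam questions | (not shown) | 40
Total | | | 100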

Grading Guidelines

Numeric score | Letter grade | Grade points
85 to 100 | A | 4.00
80 to <85 | A- | 3.70
75 to <80 | B+ | 3.30
70 to <75 | B | 3.00
65 to <70 | B- | 2.70
60 to <65 | C+ | 2.30
55 to <60 | C | 2.00
40 to <55 | D | 1.00
<40 | E | 0.00

Individual Practicum Dates

2 October 2015
9 October 2015
30 October 2015

Session 1

Questions
Surveys, registration
Induction vs deduction
Variable scales
Types of variables: dependent vs independent
Types of data: primary, secondary, tertiary
Grouped vs individual data
Routine vs ad hoc data

Session 1
Measures of central tendency, variation, and position
Master table, frequency table, cross-tabulation
Graphs: histogram, ogive, stem-and-leaf, box plot, bar, line, scatter

Session 2
Basic probability theory; addition & multiplication
rules; conditional & independent probability
Theoretical probability distributions: Z, t, binomial
Sampling distributions & the Central Limit Theorem

Session 2
Ways to explore data with statistics & graphs
Conclusions about data quality
Bias
Types of data imputation

Session 3
Sample size for estimation & hypothesis testing
Random sampling techniques: simple, systematic,
stratified, cluster

Session 4
Estimation: point & interval
The concepts of confidence & study power
Statistical hypothesis testing; one-sided or two-sided
tests

Session 5

Z test, t test
Chi-square test
Correlation (r and r_s)
Association in cross-tabulations (coefficients OR,
RR, C)

Session 6
Simple & multiple linear regression analysis

Session 8
Simple & multiple logistic regression analysis

Session 9
One-way & multi-way analysis of variance

Session 10
Survival analysis

Session 11
Cox regression

Session 12
Interpretation
Errors in the practice/application of statistical
analysis
Errors in drawing conclusions from the results of
statistical analysis

Session 13
Presenting the results of biostatistical analysis of
research or vital registration data in narrative or
written form
Presenting the results of biostatistical analysis
orally with the aid of media

Session 14
Critical review

References
Rumsey, Deborah. Intermediate Statistics for Dummies. Indianapolis, Indiana, USA: Wiley Publishing Inc., 2007.
Prasetyo, Sabarinah; Ariawan, Iwan. Biostatistik Dasar untuk Rumah Sakit, Bahan Ajar. Depok, UI: FKMUI, 2008.
Stommel, Manfred; Dontje, Katherine J. Statistics for Advanced Practice Nurses and Health Professionals. Springer Publishing Company, 2014.
Levine, David M.; Stephan, David F. Even You Can Learn Statistics, Second Edition. Pearson Education, Inc., 2010.
Field, Andy. Discovering Statistics Using SPSS. Sage, 2009.
Suchmacher, Mendel; Geller, Mauro. Practical Biostatistics: A User-Friendly Approach for Evidence-Based Medicine. Elsevier Inc., 2012.
Kallen, Anders. Understanding Biostatistics. John Wiley & Sons, 2011.

Session 1
BASIC CONCEPTS OF STATISTICS: induction vs deduction,
parameter vs statistic, population vs sample,
descriptive vs inferential statistics, data and
variables (classification of variables: NOIR, that is
nominal, ordinal, interval, ratio; and categorical vs
numeric)
STATISTICAL METHODS: data collection, data processing,
data presentation, and data analysis
DATA SUMMARIES: numeric (mean, median, mode, variance,
standard deviation, coefficient of variation, IQR) and
categorical (proportion/percentage)

Deduction and Induction
The scientific approach proceeds through logic
There are two kinds of reasoning: deductive reasoning
and inductive reasoning.
Deductive reasoning is a procedure that starts from a
general premise (a theory or proposition accepted as
true) and ends in a conclusion or new knowledge that is
more specific.
This method begins with forming the theory and concepts,
followed by hypotheses, operational definitions,
instruments, and operationalization.
Inductive reasoning is a procedure that starts from
empirical observations and ends in a conclusion or new
knowledge that is general.
The two kinds of reasoning can be used together and
complement each other.

Population and Sample

Attributes and subjects
Research usually targets attributes of particular
subjects (the subjects that possess the attributes).
Population and sample
There are populations and samples of attributes (data),
and there are also populations and samples of subjects
(respondents).

STATISTICS
Types of data
ANOVA
Normal distribution
Repeated measures ANOVA
Describing data
Non-parametric tests
Mann-Whitney U test
Boxplots
Summary of common tests
Standard deviations
Summaries of proportions
Skewed distributions
Odds and Odds Ratio
Parametric vs Non-parametric
Sample size
Absolute and Relative Risks
Statistical errors
Number Needed to Treat (NNT)
Power calculations
Confidence intervals (CIs)
Clinical vs statistical significance
CI (diff between two proportions)
Two-sample t test
Correlation
Problem of multiple tests
Regression
Subgroup analyses
Paired t test
Chi-square test

Logistic regression
Survival analysis

DATA
"Data" is a plural form; the singular is "datum". Data
are obtained by recording various things in health
service institutions, for example the number and types
of drugs given to patients, the number and types of
laboratory materials used, the amount of money spent on
purchasing goods, and the number of nosocomial
infections among patients.

Data sources
Data in a hospital, community health center
(puskesmas), or clinic can be collected routinely and
are then called routine data. This is what hospital
recording and reporting systems produce. Sometimes,
however, the institution collects data on a temporary
or one-off basis; these are called ad hoc data. A
survey carried out only occasionally produces ad hoc
data.

Primary vs Secondary Data
Primary data are data obtained through collection
carried out by the researchers themselves, directly
from the data source, that is, the subjects under
study.
Secondary data are data obtained from an institution
that has already collected them, that is, not collected
directly from the data source (the subjects under
study).

Individual vs Aggregate Data
Data obtained by measuring a single subject or
individual are called individual data. Examples are
hemoglobin level, a patient's length of stay, and the
cost of care for each patient. One can also obtain the
percentage of patients with nosocomial infections in
each ward. Because the unit is the ward, which consists
of patients, these are called aggregate data, that is,
an aggregation (collection) of the patients in each
ward.

Paired vs Grouped Data
When data come from repeated measures on the same
subjects, they are called paired or dependent data.
Examples are patients' blood pressure before and after
a drug is given, or patients' knowledge of asthma
management before and after health education is given
in the waiting room.
Data obtained from different groups of subjects, for
example length of stay among patients in first-class
and second-class wards, or among patients living in
urban and rural areas, are called grouped or
independent data.

Data Transformation
Obtaining information always starts with a process that
begins from data, that is, with the result of
transforming data. Information can therefore be
understood as something ready to use for making
decisions.
With data (the quantitative approach), everything is
measured explicitly (in numbers), and the process is
known as the statistical procedure.
The statistical procedure starts with data collection,
continues with data processing and presentation, and
ends with analysis and conclusions, which constitute
the information.

TYPES OF DATA
Nominal: gender, type of customer (loyalty),
flavor/color liked, etc.
Ordinal/Ranking: type of user, preferred brand, brand
awareness, etc.
Interval: attitudinal or satisfaction scales.
Are you satisfied with your education at U of L?
Dissatisfied 1 2 3 4 5 Satisfied
Ratio: income, price willing to pay, age, etc.

Type of measurement | Type of descriptive analysis
Nominal, two categories | Frequency table; proportion (percentage)
Nominal, more than two categories | Frequency table; category proportions (percentages); mode
Ordinal | Rank order; median
Interval | Arithmetic mean
Ratio | Means

Categorical-scale data (descriptive analysis)
Measures of concentration: proportion/ratio/rate
Measure of dispersion: standard deviation

Numeric-scale data (descriptive analysis)
These data are summarized as measures of concentration
(central values) and measures of dispersion (spread of
values).
Measures of concentration: mean/median/mode/
coefficient/index
Measure of dispersion: standard deviation

Level of Measurement
From nominal to ratio: increasing ability to use higher
level statistical analyses

Non-parametric testing is generally performed with
nominal and ordinal level data
Parametric testing with interval and ratio level data

Summarizing Qualitative Data


Frequency Distribution (shows how many)
Relative Frequency Distribution (shows what
fraction)
Percent Frequency Distribution (shows what
percentage)
Bar Graph
Pie Chart
Both these are graphical means for displaying
any of above.

Frequency Distribution
A frequency distribution is a tabular summary of data
showing the frequency (or number) of items in each of
several nonoverlapping classes.

The objective is to provide insights about the data
that cannot be quickly obtained by looking only at the
original data.

DESCRIBING DATA
MEAN: average or arithmetic mean of the data
MEDIAN: the value which comes half way when the data
are ranked in order
MODE: most common value observed

In a normal distribution, mean and median are the same
If median and mean are different, this indicates that
the data are not normally distributed
The mode is of little if any practical use

Look at the data...
Why explore the data?
To recognize missing data
To recognize data that went wrong during:

Data entry
Data recording
Data editing & cleaning
The data collection process

Data Analysis

Exploratory

Confirmatory

What are the consequences of wrong or missing data?
The shape of the data distribution is affected, with
consequences for:
Descriptive statistics: the values reported
Inferential statistics: the match with a theoretical
probability distribution

Data Exploration

Purpose: to understand the properties of the data

Some ways to explore:
1. Descriptive statistics: check whether mean = median
2. Histogram, stem-and-leaf plot
3. Box plot
4. Normal probability plot
5. Goodness-of-fit test: K-S test

Descriptive Statistics
Central tendency

Mean: $\bar{X} = \sum X_i / n$

Median: the observed value at the middle of the ordered data

Mode: the most frequently observed value

Proportion: $p = \sum X_A / \sum X_{A+B}$

Variation

Standard deviation: $s = \sqrt{\sum (X_i - \bar{X})^2 / (n - 1)}$

Range = max - min

Position

Quartiles: $Q_1, Q_2, Q_3$

Interquartile range: $IQR = Q_3 - Q_1$
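The measures listed on this slide can be checked with a few lines of Python. This is only a sketch: the data set is hypothetical (it is not from the course material), and the quartile convention (`method="inclusive"` in the standard `statistics` module) is one of several in common use, so SPSS may report slightly different quartiles.

```python
import statistics

# Hypothetical sample (not from the slides): length of stay in days.
data = [2, 3, 3, 4, 5, 5, 5, 7, 9, 12]

mean = statistics.mean(data)          # sum of X_i divided by n
median = statistics.median(data)      # middle value of the ordered data
mode = statistics.mode(data)          # most frequently observed value
sd = statistics.stdev(data)           # sample SD, divisor n - 1
value_range = max(data) - min(data)   # max - min

# Quartiles under the "inclusive" convention (an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(mean, median, mode, round(sd, 3), value_range, q1, q2, q3, iqr)
```

Because the sample SD divides by n - 1, `statistics.stdev` is the right call here; `statistics.pstdev` would divide by n instead.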

Frequency Distribution Table

Table 1. Fracture patients treated at the UPD, Hospital X, January 1998

Age (years) | Frequency | %
15-19 | 7 | 4.7
20-24 | 18 | 12.3
25-29 | 21 | 14.3
30-34 | 30 | 20.4
35-39 | 28 | 19.0
40-44 | 27 | 18.4
45-49 | 16 | 10.9
Total | 147 | 100.0

Pareto Diagram

[Figures: four example distributions, each with the question "Shape?"; the last one also asks "Mode?"]

Stem & Leaf

Stem | Leaf
1 | 2,4,5
2 | 0,1,1,3,5,6,7
3 | 2,4,5,5,6,6,6,8,9
4 | 3,4,4,6,6,7,7,9,9,9
5 | 1,3,4,5,6,7,8,9
6 | 1,3,4,4,6,8
7 | 0,1,4,5

Stem and Leaf


A frequency distribution table that
provides a visual picture of the
distribution

Stem and Leaf


Each raw score has two parts: a stem,
consisting of all but the last digit, and the
leaf, the last digit in the number.

Stem and Leaf

Current Salary Stem-and-Leaf Plot
Frequency    Stem &  Leaf
     2.00        1 .  55
     2.00        2 .  47
     6.00        3 .  001234
     3.00        4 .  016
     1.00        5 .  0
     2.00 Extremes    (>=81250)
Stem width:  10000
Each leaf:   1 case(s)

Each stem represents 10 thousand, so the 1 (stem) =
10,000. There are two cases (frequency = 2) with
15,000, two cases with 20,000 (actually, 24,000 and
27,000), and 6 cases with 30,000 (30, 30, 31, 32, 33,
and 34 thousand) in this data set.

Stem and Leaf in SPSS

To create a stem-and-leaf plot in SPSS, select the
following:
Analyze
Descriptive Statistics
Explore
Click Plots and select Stem-and-leaf
Click Continue
Click OK

Stem & Leaf Plot

Histogram-like picture
The 1st digit of each data value is placed in the
stem, and the 2nd digit in the leaf
E.g. Data: 42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77,
75, 47, 73, 82, 91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68, 74, 66, 81,
90, 75, 82, 37, 94

Stem | Leaf
2 | 19
3 | 47
4 | 2679
5 | 975
6 | 940957986
7 | 7534045
8 | 717126712
9 | 7104
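The stem-and-leaf display for the 40 values above can be reproduced programmatically. The sketch below is illustrative only; the function name `stem_and_leaf` is made up for this example, and leaves are kept in data order (unsorted), as on the slide.

```python
from collections import defaultdict

# The 40 data values listed on the slide.
data = [42, 21, 46, 69, 87, 29, 34, 59, 81, 97, 64, 60, 87, 81, 69, 77,
        75, 47, 73, 82, 91, 74, 70, 65, 86, 87, 67, 69, 49, 57, 55, 68,
        74, 66, 81, 90, 75, 82, 37, 94]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits), in data order."""
    plot = defaultdict(list)
    for value in values:
        plot[value // 10].append(value % 10)
    return dict(sorted(plot.items()))

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```

Sorting the leaves within each stem (for example with `sorted(leaves)`) gives the ordered version of the display, which makes the median and quartiles easier to read off.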

Box plot
[Figure: box plots of treadmill time in seconds (vertical axis, 400 to 1400) by GROUP (healthy, disease)]

Box Plot
Uses Q1, Q2, Q3
Imaginary limits:
Inner fence: 1.5*IQR from Q1 or Q3
Outer fence: 1.5*IQR from the inner fence

Data outside the outer fence = extreme value / outlier

Data between the outer and inner fences = potential outlier
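The fence rules above can be sketched in code as follows. The data are hypothetical (not the treadmill data in the figure), the helper names `tukey_fences` and `classify` are invented for this illustration, and the quartiles use the "inclusive" convention of Python's `statistics.quantiles`, which may differ slightly from the quartiles SPSS reports.

```python
import statistics

def tukey_fences(values):
    """Return ((inner_low, inner_high), (outer_low, outer_high))."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Inner fence: 1.5 * IQR beyond Q1 and Q3.
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    # Outer fence: a further 1.5 * IQR beyond the inner fence.
    outer = (inner[0] - 1.5 * iqr, inner[1] + 1.5 * iqr)
    return inner, outer

def classify(values):
    """Split values into potential outliers and extreme values."""
    inner, outer = tukey_fences(values)
    potential, extreme = [], []
    for value in values:
        if value < outer[0] or value > outer[1]:
            extreme.append(value)      # outside the outer fence
        elif value < inner[0] or value > inner[1]:
            potential.append(value)    # between inner and outer fences
    return potential, extreme

# Hypothetical measurements, chosen so that both categories occur.
sample = [10, 12, 13, 14, 15, 16, 17, 18, 30, 60]
inner, outer = tukey_fences(sample)
potential, extreme = classify(sample)
print(inner, outer, potential, extreme)
```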

Boxplot Components

Box Plot
[Figure: box plot schematic with labels *, OF (outer fence), Min, IF (inner fence), Q1, Q2, Q3, IF, OF, Max]

High or low variation (~IQR)
Identification of outliers / extreme values
Symmetric shape or not

Measures of dispersion
Range
Distance between the highest and lowest
scores in a distribution;
sensitive to extreme scores;
compensate by calculating interquartile range
(distance between the 25th and 75th percentile
points) which represents the range of scores for
the middle half of a distribution

Usually used in combination with other measures of
dispersion.

Range

Source: www.animatedsoftware.com/statglos/sgrange.htm

Source:
http://pse.cs.vt.edu/SoSci/converted/Dispersion_I/box

Diagnostics with Boxplots

Visualization Techniques: Box Plots

Invented by J. Tukey
Another way of displaying the distribution of data
Following figure shows the basic part of a box plot
[Figure: box plot labelled, top to bottom: outlier, 90th percentile, 75th percentile, 50th percentile, 25th percentile, 10th percentile]

Pie Chart
[Figure: pie chart with two categories, tidak (no) and ya (yes)]

Bar Chart
[Figure: bar chart of Count (7.5 to 10.5) by GROUP (healthy, disease)]

Bar Chart
[Figures: two bar charts of Count by Education (No College, College), with bars split by Status (0, 1)]

Scatter Plot
[Figure: scatter plot of final trial result (vertical axis) against first trial result (horizontal axis)]

Scatter plots

A scatter plot illustrates the relationship between two
continuous variables.

Scatter plots

A scatter plot illustrates the values of Y (vertical
axis) versus the corresponding values of X (horizontal
axis)

Scatter plots

Scatter plots can provide answers to the following
questions:
Are variables X and Y correlated? (As one variable goes
up, does the other variable go up or down?)
Is there a linear relationship between X and Y?
Is there a curvilinear relationship between variables X
and Y? (As X goes up, Y goes up; then, past a peak, as
X continues to go up, Y goes down.)
Are there outliers? (Do one or more points stray from
the trend?)
[Figure: scatter plot of Beginning Salary (vertical axis) against Current Salary (horizontal axis)]

Line Chart
[Figure: line chart, vertical axis 0 to 50, with series R.Melati, R.Mawar, and R.Anggrek over bl-1 to bl-4]

Q-Q Plot

Descriptives
Variable: Nilai Asupan Lemak Responden (respondents' fat intake)

Measure | Statistic | Std. Error
Mean | 20.0374 | 1.06249
95% Confidence Interval for Mean, Lower Bound | 17.9234 |
95% Confidence Interval for Mean, Upper Bound | 22.1515 |
5% Trimmed Mean | 19.8010 |
Median | 18.7150 |
Variance | 92.568 |
Std. Deviation | 9.62123 |
Minimum | 6.01 |
Maximum | 40.01 |
Range | 34.00 |
Interquartile Range | 15.16 |
Skewness | .291 | .266
Kurtosis | -.982 | .526

Dot Plot
One of the simplest graphical
summaries of data is a dot plot.
A horizontal axis shows the range of
data values.
Then each data value is represented by
a dot placed above the axis.

Dot Plot
[Figure: dot plot of Tune-up Parts Cost; horizontal axis Cost ($), 50 to 110]

Not used much anymore. Common when graphical drawing
tools were primitive.

Histogram

Another common graphical presentation of quantitative
data is a histogram.
The variable of interest is placed on the horizontal
axis.
A rectangle is drawn above each class interval with
its height corresponding to the interval's frequency,
relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural
separation between rectangles of adjacent classes.
In informal discussions bar graphs and
histograms are often equated. In this class
you should be careful to keep them straight.

Histogram
[Figure: histogram of Tune-up Parts Cost; vertical axis Frequency (2 to 18); horizontal axis Parts Cost ($) with classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-110]

Histogram (Common categories)

Symmetric
Left tail is the mirror image of the right tail
Examples: heights and weights of people
[Figure: histogram, vertical axis Relative Frequency (0 to .35)]

Skewness of distributions
Measures of skewness look at how lopsided
distributions are, that is, how far from the
ideal of the normal curve they are
When the median and the mean are
different, the distribution is skewed.
The greater the difference, the
greater the skew.

Distributions that trail away to the left are
negatively skewed and those that trail
away to the right are positively skewed
If the skewness is extreme, the
researcher should either transform the
data to make them better resemble a
normal curve or else use a different set
of statistics (nonparametric statistics) to
carry out the analysis
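As a rough illustration of the direction of skew, the sketch below compares mean and median and computes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). The data sets are hypothetical, and this coefficient is not necessarily the exact estimator a given package reports (SPSS, for instance, applies a small-sample adjustment).

```python
import statistics

def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using divisor n."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: a long tail to the right, and its mirror image.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 20]
left_skewed = [-x for x in right_skewed]
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right", right_skewed),
                     ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(name,
          "mean:", round(statistics.mean(values), 2),
          "median:", statistics.median(values),
          "skewness:", round(moment_skewness(values), 2))
```

In the right-skewed sample the mean is pulled above the median and the coefficient is positive; the mirrored sample shows the opposite pattern.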

Different Shapes of
Distributions

Source: http://faculty.vassar.edu/lowry/f0204.gif

Skewness of distributions

Source: http://www.polity.org.za/html/govdocs/reports/aids/images/image022.gif

Distribution of posting frequency on Usenet

Histogram
Moderately Skewed Left
A longer tail to the left
Example: exam scores
[Figure: histogram, vertical axis Relative Frequency (0 to .35)]

Histogram
Moderately Skewed Right
A longer tail to the right
Example: housing values
[Figure: histogram, vertical axis Relative Frequency (0 to .35)]

Histogram
Highly Skewed Right
A very long tail to the right
Example: executive salaries
[Figure: histogram, vertical axis Relative Frequency (0 to .35)]

Kurtosis
Measures of kurtosis look at how sharply
the distribution rises to a peak and then
drops away

Cumulative Distributions
Cumulative frequency distribution: shows the number of
items with values less than or equal to the upper
limit of each class.
Cumulative relative frequency distribution: shows the
proportion of items with values less than or equal to
the upper limit of each class.
Cumulative percent frequency distribution: shows the
percentage of items with values less than or equal to
the upper limit of each class.

Cumulative Distributions
Hudson Auto Repair

Cost ($) | Cumulative Frequency | Cumulative Relative Frequency | Cumulative Percent Frequency
≤ 59 | 2 | .04 | 4
≤ 69 | 15 | .30 | 30
≤ 79 | 31 | .62 | 62
≤ 89 | 38 | .76 | 76
≤ 99 | 45 | .90 | 90
≤ 109 | 50 | 1.00 | 100

For the second row: 15 = 2 + 13, .30 = 15/50, and 30 = .30(100).
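The three cumulative columns can be derived from the class frequencies with a running sum. In the sketch below the class frequencies (2, 13, 16, 7, 7, 5) are not listed on the slide; they are inferred here as successive differences of the cumulative frequencies shown above (the slide's own "2 + 13" annotation confirms the first two).

```python
from itertools import accumulate

# Class upper limits from the slide; class frequencies are inferred as
# successive differences of the cumulative column (an assumption).
upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]

cum_freq = list(accumulate(frequencies))   # running total of frequencies
n = cum_freq[-1]                           # total number of items
cum_rel = [c / n for c in cum_freq]        # cumulative relative frequency
cum_pct = [100 * c / n for c in cum_freq]  # cumulative percent frequency

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit}: {cf:2d}  {cr:.2f}  {cp:.0f}")
```

Plotting `cum_pct` against the class boundaries (59.5, 69.5, and so on) and joining the points with straight lines gives the ogive.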

Ogive

An ogive is a graph of a cumulative distribution.
The data values are shown on the horizontal axis.
Shown on the vertical axis are the:
cumulative frequencies, or
cumulative relative frequencies, or
cumulative percent frequencies
The frequency (one of the above) of each class
is plotted as a point.
The plotted points are connected by straight lines.

Ogive

Hudson Auto Repair

Because the class limits for the parts-cost
data are 50-59, 60-69, and so on, there
appear to be one-unit gaps from 59 to 60,
69 to 70, and so on.
These gaps are eliminated by plotting points
halfway between the class limits.
Thus, 59.5 is used for the 50-59 class, 69.5
is used for the 60-69 class, and so on.

Ogive with
Cumulative Percent Frequencies
[Figure: ogive of Tune-up Parts Cost; vertical axis Cumulative Percent Frequency (20 to 100); horizontal axis Parts Cost ($), 50 to 110; the point (89.5, 76) is marked]

Measures of dispersion
Variance ($S^2$)
Average of squared distances of individual
points from the mean
High variance means that most scores are
far away from the mean. Low variance
indicates that most scores cluster tightly
about the mean.

Standard Deviation (SD)


A summary statistic of how much scores
vary from the mean
Square root of the Variance
expressed in the original units of
measurement
Used in a number of inferential statistics

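Variance vs. Standard Deviation

Variance
Population: $\sigma^2 = \sum (X_i - \mu)^2 / N$
Sample: $S^2 = \sum (X_i - \bar{X})^2 / (n - 1)$

Standard Deviation
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $S = \sqrt{S^2}$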

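NORMAL DISTRIBUTION
[Figure: normal curve centred on the MEAN, with the annotations below]
THE EXTENT OF THE SPREAD OF DATA AROUND THE MEAN IS
MEASURED BY THE STANDARD DEVIATION
CASES DISTRIBUTED SYMMETRICALLY ABOUT THE MEAN
AREA BEYOND TWO STANDARD DEVIATIONS ABOVE THE MEAN

STANDARD DEVIATION: MEASURE OF THE SPREAD OF VALUES OF
A SAMPLE AROUND THE MEAN

$SD = \sqrt{\mathrm{Sum}(\mathrm{Value} - \mathrm{Mean})^2 / \mathrm{Number\ of\ values}}$
(for a sample SD, the divisor is the number of values minus 1)

THE SQUARE OF THE SD IS KNOWN AS THE VARIANCE

SD is smaller when the values are less spread out about
the mean. A larger number of values does not by itself
shrink the SD; it shrinks the standard error of the
mean, SD / sqrt(n).
IN A NORMAL DISTRIBUTION, 95% OF THE VALUES WILL LIE
WITHIN 2 SDs OF THE MEAN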

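STANDARD DEVIATION AND SAMPLE SIZE
As sample size increases, the standard error of the
mean (SD / sqrt(n)) decreases, so the sampling
distribution of the mean becomes narrower; the SD of
the individual values does not systematically decrease.
[Figure: three distributions labelled n=150, n=50, n=10]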
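A caveat worth making explicit for this slide: the quantity that reliably shrinks as n grows is the standard error of the mean, SD / sqrt(n), not the SD of the individual values. The sketch below uses the three sample sizes from the figure with a hypothetical SD of 10 (the slide gives no numeric SD).

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n with the given SD."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical SD, held fixed across sample sizes
for n in (10, 50, 150):
    print(f"n={n:3d}  SD={sd}  SE={standard_error(sd, n):.2f}")
```

With the SD held at 10, the standard error falls from about 3.16 at n=10 to about 0.82 at n=150, which is why the distribution of sample means narrows as the sample grows.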

SKEWED DISTRIBUTION
[Figure: skewed curve with the MEAN and MEDIAN marked]
50% OF VALUES WILL LIE ON EITHER SIDE OF THE MEDIAN

DOES A VARIABLE FOLLOW A NORMAL DISTRIBUTION?
Important because parametric statistics
assume normal distributions
Statistics packages can test normality
Distribution unlikely to be normal if:
Mean is very different from the median
Two SDs below the mean give an impossible
answer (e.g. height < 0 cm)
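The two rules of thumb above can be turned into a quick screening function. This is only a sketch: the function name, the hypothetical data, and especially the threshold for "very different" (here, a mean-median gap larger than 10% of the SD) are assumptions of this example, not values given in the slides, and the check does not replace a formal normality test.

```python
import statistics

def quick_normality_flags(values, lower_bound=0.0, rel_tol=0.10):
    """Flag the two warning signs listed on the slide.

    rel_tol is an arbitrary illustrative threshold: the mean counts as
    "very different" from the median when the gap exceeds rel_tol * SD.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return {
        "mean_far_from_median": abs(mean - median) > rel_tol * sd,
        "two_sd_below_impossible": mean - 2 * sd < lower_bound,
    }

# Hypothetical data: right-skewed lengths of stay (days) and heights (cm).
stay_days = [1, 1, 2, 2, 3, 3, 4, 5, 8, 30]
heights_cm = [160, 165, 170, 175, 180]

print(quick_normality_flags(stay_days))
print(quick_normality_flags(heights_cm))
```

For the length-of-stay data both flags are raised (the mean minus two SDs is negative, which is impossible for a duration), while the height data raise neither.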
