
STATISTICS AND PROBABILITY

(MS184303)

HASAN IQBAL NUR, ST, MT.


DIKA VIRGINIA DEVINTASARI, S.Si, M.Sc.

DEPARTEMEN TEKNIK TRANSPORTASI LAUT


FAKULTAS TEKNOLOGI KELAUTAN
INSTITUT TEKNOLOGI SEPULUH NOPEMBER

Odd Semester 2018/2019


Course Topics
1. Introduction: Overview of Statistics and Probability
2. Descriptive Statistics
3. Probability
4. Probability Distributions: Random Variables (Discrete and Continuous)
5. Distributions of Discrete Random Variables
6. Distributions of Continuous Random Variables
7. Relationships Between Distributions
8. Parameter Estimation
9. Hypothesis Testing
10. Chi-Square Test

Statistika dan Probabilitas_Departemen Teknik Transportasi Laut_Ganjil 2018/2019 2


WEEK 2
Introduction: Descriptive Statistics



Types of Statistics
• Descriptive statistics: methods concerned with
collecting and presenting data, i.e., conveying
observed data graphically and numerically
for the purpose of description.
• Inferential statistics: methods concerned with
analyzing a sample to draw conclusions
(inferences) about the characteristics of a
population.

What is Descriptive Statistics? [Ronald E. Walpole]
There are times when a scientific practitioner wishes only to
gain some sort of summary of a set of data represented in
the sample. In other words, inferential statistics is not
required. Rather, a set of single-number statistics or
descriptive statistics is helpful.
These numbers give a sense of:
1. the center of the location of the data,
2. the variability in the data, and
3. the general nature of the distribution of observations in the
sample.



Continue ...
Types of descriptive statistics:
• Organize Data
  ◦ Tables
  ◦ Graphs
• Summarize Data
  ◦ Central Tendency (frequency distribution)
    ▪ Mean
    ▪ Median
    ▪ Mode
  ◦ Variation (Measuring Variability)
    ▪ Range
    ▪ Variance
    ▪ Standard Deviation
    ▪ Quartile



Key measures
Describing data

Key distinction
Population vs. Sample Notation

Central Tendency > Mean (Rata-Rata)
• Suppose that the observations in a sample are $x_1, x_2, \ldots, x_n$. The sample
mean is denoted by $\bar{x}$ and is $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

• The sampling distribution of the sample mean is the probability
distribution of all the sample means. Let's say you had 1,000 people,
and you sampled 5 people at a time and calculated
their average height. If you kept on taking samples (i.e., you repeated
the sampling a thousand times), eventually:
1. The mean of all your sample means will approach the population mean, μ.
2. The distribution of your sample means will look approximately like a
normal distribution curve.
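The thought experiment above can be tried numerically. The sketch below is illustrative only: the population of 1,000 heights is synthetic (drawn from a normal distribution with an assumed mean of 170 cm and standard deviation of 10 cm), and the names `population`, `sample_means`, and the seed value are assumptions, not part of the course material.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: 1,000 heights in cm (synthetic data, assumed parameters).
population = [random.gauss(170, 10) for _ in range(1000)]
population_mean = statistics.mean(population)

# Draw 1,000 samples of 5 people each and record every sample mean.
sample_means = [
    statistics.mean(random.sample(population, 5))
    for _ in range(1000)
]

mean_of_sample_means = statistics.mean(sample_means)

print(f"population mean         : {population_mean:.2f}")
print(f"mean of sample means    : {mean_of_sample_means:.2f}")
# The sample means are spread less widely than the individual heights.
print(f"std dev of heights      : {statistics.pstdev(population):.2f}")
print(f"std dev of sample means : {statistics.stdev(sample_means):.2f}")
```

The two means should land close together, and a histogram of `sample_means` would show the roughly bell-shaped curve described in point 2.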



Continue ...
• Mean can be badly affected by outliers (data points with extreme
values unlike the rest)
• Outliers can make the mean a bad measure of central tendency or
common experience



Central Tendency > Median (Nilai Tengah)
• The purpose: to reflect the central tendency of the sample in a way that is
uninfluenced by extreme values or outliers.

• Given that the observations in a sample are $x_1, x_2, \ldots, x_n$, arranged in
increasing order of magnitude, the sample median is:

$\tilde{x} = x_{(n+1)/2}$ if $n$ is odd, and $\tilde{x} = \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right)$ if $n$ is even.



Continue ...
2. If the recorded values for a variable form a symmetric distribution,
the median and mean are identical.
3. In skewed data, the mean lies further toward the skew than the
median.
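To see how an outlier pulls the mean toward the skew while the median barely moves, here is a small sketch. The income figures are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical incomes (illustrative values only, not course data).
incomes = [30, 32, 35, 36, 38]
incomes_with_outlier = incomes + [400]  # add one extreme value

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)

print(f"without outlier: mean = {mean_before:.1f}, median = {median_before}")
print(f"with outlier   : mean = {mean_after:.1f}, median = {median_after}")
```

A single extreme value shifts the mean far to the right, whereas the median moves only from one middle value to the average of the two middle values.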



Central Tendency > Mode
The most common data point is called the mode.
BTW, it is possible to have more than one mode!

The combined IQ scores for Classes A & B:
80 87 89 93 93 96 97
98 102 103 105 106
109 109 109 110 111
115 119 120
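As a quick check on the combined IQ scores, the mode can be found by counting how often each value occurs. This is a minimal sketch using Python's standard `statistics.multimode`, which returns every value tied for most frequent. The extra score added at the end is a hypothetical variation, included only to show a data set with two modes.

```python
import statistics

# Combined IQ scores for Classes A & B, as listed on the slide.
iq_scores = [80, 87, 89, 93, 93, 96, 97,
             98, 102, 103, 105, 106,
             109, 109, 109, 110, 111,
             115, 119, 120]

modes = statistics.multimode(iq_scores)
print("mode(s):", modes)

# Hypothetical variation: one more score of 93 creates a tie with 109.
modes_with_tie = statistics.multimode(iq_scores + [93])
print("mode(s) after adding another 93:", modes_with_tie)
```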

Continue ...
1. It may give you the most likely experience
rather than the “typical” or “central”
experience.
2. In symmetric, unimodal distributions, the mean, median,
and mode are the same.
3. In skewed data, the mean and median lie
further toward the skew than the mode.

Types of descriptive statistics:
• Organize Data
  ◦ Tables
  ◦ Graphs
• Summarize Data
  ◦ Central Tendency (frequency distribution)
    ▪ Mean
    ▪ Median
    ▪ Mode
  ◦ Variation (Measure of Dispersion)
    ▪ Range
    ▪ Variance
    ▪ Standard Deviation
    ▪ Quartile



Dispersion > Range
• The spread, or the distance, between the lowest and highest values of a
variable.
• To get the range for a variable, you subtract its lowest value from its
highest value: Range = $X_{max} - X_{min}$
• The range can be useful and is discussed at length in Statistical Quality
Control.



Dispersion > Variance
1. A measure of the spread of the recorded values on a variable.
2. A measure of dispersion.
3. Large variability in a data set produces relatively large values of
$(x - \bar{x})^2$ and thus a large sample variance.

The larger the variance, the further the individual cases are from the mean.

The smaller the variance, the closer the individual scores are to the mean.



Continue ...
Variance is extensively used in probability theory, where more general
conclusions need to be drawn from a smaller sample. This is because variance
gives us an idea of how the data are distributed around the mean, and from
this distribution we can work out where to expect an unknown data point.
[smaller data set → data distribution → analysis]
1. Calculating variance starts with a “deviation.”
A deviation is the distance of a case’s score from the mean:
$(x - \bar{x})$
Example:
If the average person’s car costs
$20,000 and mine costs $6,000, my
deviation from the mean is -$14,000:
6,000 - 20,000 = -14,000



Question (?)

1. The deviation of 102 from 110.54 is?


2. Deviation of 115?

Class A: IQs of 13 Students

102  115
128  109
131   89
 98  106
140  119
 93   97
110

$\bar{x}_A = 110.54$
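The two questions can be checked with a few lines of code. This sketch recomputes the Class A mean from the 13 scores listed on the slide and then subtracts it from each score of interest. The helper name `deviation` is an assumption introduced for illustration.

```python
import statistics

# Class A IQ scores as listed on the slide (13 students).
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

mean_a = statistics.mean(class_a)  # 1437 / 13, about 110.54


def deviation(x, mean):
    """Signed distance of a score from the mean (hypothetical helper)."""
    return x - mean


dev_102 = deviation(102, mean_a)
dev_115 = deviation(115, mean_a)

print(f"mean of Class A  : {mean_a:.2f}")
print(f"deviation of 102 : {dev_102:.2f}")
print(f"deviation of 115 : {dev_115:.2f}")
```

A score below the mean gives a negative deviation and a score above it gives a positive one.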



Continue ...
• We want to add these to get the total deviation, but if we were to do that,
we would get zero every time. Why?
Because the positive deviations (scores above the mean) and the negative
deviations (scores below the mean) always cancel exactly.
• We need a way to eliminate negative signs. Why?
Since we are only interested in the size of the deviations and not
whether the scores are above or below the mean, we can ignore the
minus sign and take only the absolute value, giving us the absolute
deviation.

2. Squaring the deviations will also eliminate negative signs...

A deviation squared: $(x - \bar{x})^2$

Deviation → Total Deviation → ...
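The fact that the deviations always add up to zero follows directly from the definition of the sample mean. A short derivation, using the same symbols as the slides ($x_i$ for the observations, $\bar{x}$ for the sample mean, $n$ for the sample size):

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{since } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

This is why the signs have to be removed, by taking absolute values or by squaring, before the deviations can be summed into a useful measure of spread.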



Continue ...
3. If you were to add all the squared deviations together, you’d get what
we call the “Sum of Squares.”

Sum of Squares = $\sum_{i=1}^{n}(x_i - \bar{x})^2 = (x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2$

Deviation → Total Deviation → Sum of Squares



4. The last step: take the approximate average of the sum of squares by
dividing it by $n - 1$:

$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

Thus,
• All variances that are non-zero will be positive numbers.
• A large variance indicates that numbers in the set are far from the mean
and each other, while a small variance indicates the opposite.

Deviation → Total Deviation → Sum of Squares → Variance



Sample Standard Deviation
The standard deviation is a measure of the spread of scores within a set
of data. The sample standard deviation, denoted by $s$, is the positive
square root of $s^2$, that is:

$s = \sqrt{s^2}$

REVIEW:
Deviation → Deviation Squared → Sum of Squares → Variance →
Standard Deviation
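The review chain can be followed step by step on the Class A IQ scores from the earlier question slide. The sketch below assumes the sample (n - 1) form of the variance and cross-checks the hand-built result against Python's `statistics` module. The variable names are illustrative.

```python
import math
import statistics

# Class A IQ scores (13 students), as listed on the question slide.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

n = len(class_a)
mean_a = sum(class_a) / n

# Step 1: deviations from the mean (they sum to zero, up to rounding).
deviations = [x - mean_a for x in class_a]

# Step 2: squared deviations remove the negative signs.
squared_deviations = [d ** 2 for d in deviations]

# Step 3: sum of squares.
sum_of_squares = sum(squared_deviations)

# Step 4: sample variance divides by n - 1 (assumed sample form).
variance = sum_of_squares / (n - 1)

# Step 5: standard deviation is the positive square root of the variance.
std_dev = math.sqrt(variance)

print(f"sum of deviations : {sum(deviations):.10f}")
print(f"sum of squares    : {sum_of_squares:.2f}")
print(f"sample variance   : {variance:.2f}")
print(f"sample std. dev.  : {std_dev:.2f}")

# Cross-check against the standard library.
print(math.isclose(variance, statistics.variance(class_a)))
print(math.isclose(std_dev, statistics.stdev(class_a)))
```

Because the standard deviation is in the same units as the data (IQ points here), it is usually easier to interpret than the variance, which is in squared units.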



Variance vs. Standard Deviation

Which measure of variability is more important?



The Second Quartile (Median)

What does it mean?



The First Quartile

What does it mean?



The Third Quartile

What does it mean?



Graphical Diagnostics
• Scatter Plot
• Stem-and-Leaf Plot
• Histogram
• Box-and-Whisker Plot or Box Plot

Scatter Plot
• Explanatory and Response Variables
Most statistical studies examine data on more than one variable. In
many of these settings, the two variables play different roles.

Definition:
A response variable measures an outcome of a study.
An explanatory variable may help explain or influence
changes in a response variable.

Note: In many studies, the goal is to show that changes in
one or more explanatory variables actually cause
changes in a response variable. However, other
explanatory-response relationships don’t involve direct
causation.



Displaying Relationships: Scatterplots

The most useful graph for displaying the relationship between two
quantitative variables is a scatterplot.
Definition:
A scatterplot shows the relationship between two quantitative
variables measured on the same individuals. The values of one
variable appear on the horizontal axis, and the values of the
other variable appear on the vertical axis. Each individual in
the data appears as a point on the graph.

1. Decide which variable should go on each axis.
   • Remember, the eXplanatory variable goes on the X-axis!
2. Label and scale your axes.
3. Plot individual data values.



Displaying Relationships: Scatterplots

• Make a scatterplot of the relationship between body
weight and pack weight.
• Since body weight is our eXplanatory variable, be sure to
place it on the X-axis!



Interpreting Scatterplots

How to Examine a Scatterplot

As in any graph of data, look for the overall pattern and for striking
departures from that pattern.
• You can describe the overall pattern of a scatterplot by the direction,
form, and strength of the relationship.
• An important kind of departure is an outlier, an individual value that
falls outside the overall pattern of the relationship.



Interpreting Scatterplots

Outlier
There is one possible outlier: the hiker with
a body weight of 187 pounds seems to be
carrying relatively less weight than the
other group members.

There is a moderately strong (strength), positive (direction), linear (form)
relationship between body weight and pack weight.
It appears that lighter students are carrying lighter backpacks.
Stem-and-Leaf Plot
• Combined tabular and graphical display
How can we create a Stem-and-Leaf Plot?

Use the data in the table to make a stem-and-leaf plot.

Step 1: Group the data by tens digits.
Step 2: Order the data from least to greatest.

75 79
83 84 86 86 88
91 94 99



Helpful Hint!

To write 42 in a stem-and-leaf plot, write
each digit in a separate column:

Stem | Leaf
4    | 2

Test Scores
Stems | Leaves
7     | 5 9
8     | 3 4 6 6 8
9     | 1 4 9
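The two steps (group by tens digit, then order) can be sketched in code. The function name `stem_and_leaf` is an assumption for illustration. The slide's original unordered table is not reproduced here, so the input order of `test_scores` below is arbitrary, and only the values themselves come from the slide.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem), with ordered ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):          # Step 2: order from least to greatest
        stem, leaf = divmod(value, 10)    # Step 1: the tens digit is the stem
        plot[stem].append(leaf)
    return dict(plot)


# Test scores from the slide, listed in an arbitrary (assumed) order.
test_scores = [75, 86, 83, 91, 94, 88, 84, 99, 79, 86]

plot = stem_and_leaf(test_scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row corresponds to one row of the Test Scores table: the stem on the left and its leaves in increasing order on the right.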



Find the least value, greatest value, mean, median,
mode, and range of the data.

Stems | Leaves
4     | 0 0 1 5 7
5     | 1 1 2 4
6     | 3 3 3 5 9 9
7     | 0 4 4
8     | 3 6 7
9     | 1 4

Key: 4 | 0 means 40

The least stem and least leaf give the least value, 40.

The greatest stem and greatest leaf give the greatest value, 94.

Use the data values to find the mean: (40 + … + 94) ÷ 23 = 64.



The median is the middle value in the table, 63.

To find the mode, look for the number that occurs most
often in a row of leaves. Then identify its stem. The mode is
63.

The range is the difference between
the greatest and the least value.
94 – 40 = 54.
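The summary values read off the stem-and-leaf plot can be double-checked by expanding the plot back into its 23 data values. This is a small sketch using Python's standard `statistics` module. The dictionary `stems` simply transcribes the plot, and the variable names are illustrative.

```python
import statistics

# Stem-and-leaf plot from the example (key: 4 | 0 means 40).
stems = {
    4: [0, 0, 1, 5, 7],
    5: [1, 1, 2, 4],
    6: [3, 3, 3, 5, 9, 9],
    7: [0, 4, 4],
    8: [3, 6, 7],
    9: [1, 4],
}

# Rebuild the raw data: value = 10 * stem + leaf.
values = [10 * stem + leaf for stem, leaves in stems.items() for leaf in leaves]

least = min(values)
greatest = max(values)
mean_value = statistics.mean(values)
median_value = statistics.median(values)
mode_value = statistics.mode(values)
data_range = greatest - least

print("n      :", len(values))
print("least  :", least)
print("great. :", greatest)
print("mean   :", mean_value)
print("median :", median_value)
print("mode   :", mode_value)
print("range  :", data_range)
```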



There is a case...
• The stem-and-leaf plot contains only four stems and
consequently does not provide an adequate picture
of the distribution.
• The fewer data available, the
smaller our choice for the number of stems.
• Usually we choose 5 to 20 stems.

Histogram

Line Plot → Relative Frequency Dist. → Histogram



Continue...
Another way is through the use of a frequency distribution, where the data,
grouped into different classes or intervals, can be constructed by counting
the leaves belonging to each stem and noting that each stem defines a class
interval.
$\frac{(b - a)}{2}$
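Counting the leaves on each stem gives a frequency distribution directly. The sketch below does this for the 23 values of the earlier stem-and-leaf example (stems 4 to 9), assuming a class width of 10 so that each stem is one class interval. The names `counts` and `relative_frequencies` are illustrative.

```python
from collections import Counter

# The 23 values from the stem-and-leaf example (key: 4 | 0 means 40).
values = [40, 40, 41, 45, 47, 51, 51, 52, 54,
          63, 63, 63, 65, 69, 69, 70, 74, 74,
          83, 86, 87, 91, 94]

class_width = 10  # assumption: one class per stem (40-49, 50-59, ...)

# Map every value to the lower limit of its class and count the members.
counts = Counter((v // class_width) * class_width for v in values)

n = len(values)
relative_frequencies = {lower: counts[lower] / n for lower in sorted(counts)}

print("class   freq  rel.freq")
for lower in sorted(counts):
    upper = lower + class_width - 1
    freq = counts[lower]
    rel = relative_frequencies[lower]
    # The row of asterisks is a crude text histogram of the frequencies.
    print(f"{lower}-{upper}   {freq:3d}   {rel:.3f}  {'*' * freq}")
```

The relative frequencies add up to 1, and plotting them as bars over the class intervals gives the relative frequency histogram.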



The Histogram

Box-and-Whisker Plot or Box Plot
• A box plot summarizes data using the median, upper and lower
quartiles, and the extreme (least and greatest) values. It allows you to
see important characteristics of the data at a glance.
• Interquartile range: the distance between the upper quartile (the 75th
percentile) and the lower quartile (the 25th percentile).
• The five-number summary consists of:
1. The median ( 2nd quartile)
2. The 1st quartile
3. The 3rd quartile
4. The maximum value in a data set
5. The minimum value in a data set
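A five-number summary can be computed in a few lines. The sketch below reuses the 23 values from the earlier stem-and-leaf example and relies on `statistics.quantiles` with its default "exclusive" method. Other textbooks and software use slightly different quartile conventions, so the quartiles may differ a little from a hand calculation.

```python
import statistics

# The 23 values from the stem-and-leaf example (key: 4 | 0 means 40).
values = [40, 40, 41, 45, 47, 51, 51, 52, 54,
          63, 63, 63, 65, 69, 69, 70, 74, 74,
          83, 86, 87, 91, 94]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4)

five_number_summary = (min(values), q1, q2, q3, max(values))
iqr = q3 - q1  # interquartile range: width of the box in a box plot

print("min, Q1, median, Q3, max:", five_number_summary)
print("interquartile range     :", iqr)
```

In a box plot, the box runs from Q1 to Q3 with a line at the median, and the whiskers extend out to the minimum and maximum.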



Importance

Why do we need to know how to display and analyze data
in box-and-whisker plots?

• It helps you to interpret and represent data.
• It gives a visual representation of data.



Box and Whisker Diagrams.

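Anatomy of a Box and Whisker Diagram.

[Figure: a box plot drawn above a number line from 4 to 12. Labels, left to right: Lowest Value, Lower Quartile, Median, Upper Quartile, Highest Value. The box spans the lower to upper quartile; a whisker extends from each end of the box to the lowest and highest values.]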



Descriptive Statistics
Done!!!

Now you are qualified for this study. Any questions?



Exercise 1.21
The lengths of power failures, in minutes, are recorded in the following
table.

(a) Find the sample mean and sample median of the power-failure times.
(b) Find the sample standard deviation of the power failure times.



Exercise 1.18
The following scores represent the final examination grades for an elementary
statistics course:
23 60 79 32 57 74 52 70 82
36 80 77 81 95 41 65 92 85
55 76 52 10 64 75 78 25 80
98 81 67 41 71 83 54 64 72
88 62 74 43 60 78 89 76 84
48 84 90 15 79 34 67 17 82
69 74 63 80 85 61
(a) Construct a stem-and-leaf plot for the examination grades in which the
stems are 1, 2, 3, . . . , 9.
(b) Construct a relative frequency histogram, draw an estimate of the graph
of the distribution, and discuss the skewness of the distribution.
(c) Compute the sample mean and sample standard deviation.



Exercise 1.27
A study is done to determine the influence of the wear, y, of a bearing as a function of the load, x,
on the bearing. A designed experiment is used for this study. Three levels of load were used, 700 lb,
1000 lb, and 1300 lb. Four specimens were used at each level, and the sample means were,
respectively, 210, 325, and 375.

(a) Plot average wear against load.

(b) From the plot in (a), does it appear as if a relationship exists between wear and load?

(c) Suppose we look at the individual wear values for each of the four specimens at each load level
(see the data that follow). Plot the wear results for all specimens against the three load values.

(d) From your plot in (c), does it appear as if a clear relationship exists? If your answer is different
from that in (b), explain why.
