
1. Given the following data: 3, 5, 4, 4, 6, 7, 8, 9, 2, 4
Compute: mean, median, mode, range, and the frequency of 4.
2. What is the difference between descriptive and inferential statistics? Give one simple example.

Dr. Titis Wijayanto


Industrial Engineering
DTMI-UGM
twijaya@ugm.ac.id

3. What is the difference between an independent variable and a dependent variable?


4. On a single graph, sketch two normal distribution curves, with the x-axis and y-axis notation labelled.
Curve A has a larger standard deviation but a smaller mean than curve B.
5. To compare the exam scores of the student groups in class C and class D, which statistical test could be
used? Briefly explain the steps.
6. An ANOVA is used to test for differences in performance (K) among three worker-shift groups.
a. Write down its H0.
b. If F-calculated > F-table, what is the interpretation?

7. Briefly explain what univariate, bivariate, and multivariate mean.
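As a self-check for question 1, the requested statistics can be computed with Python's standard `statistics` and `collections` modules. This is a sketch for checking hand calculations, not part of the original quiz.

```python
import statistics
from collections import Counter

data = [3, 5, 4, 4, 6, 7, 8, 9, 2, 4]

mean = statistics.mean(data)          # sum of values / number of values
median = statistics.median(data)      # n is even, so the average of the two middle sorted values
mode = statistics.mode(data)          # most frequent value
value_range = max(data) - min(data)   # largest value minus smallest value
freq_of_4 = Counter(data)[4]          # number of times the value 4 occurs

print(mean, median, mode, value_range, freq_of_4)
```

For this data the mean is 5.2, the median is 4.5 (the average of the 5th and 6th sorted values, 4 and 5), the mode is 4, the range is 9 - 2 = 7, and the value 4 occurs three times.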

01 Course contract, basic concepts of statistical analysis
MVA, applications, the big picture of MVA

02 Review of matrix operations

03 Data preparation

04 Multiple regression

05 Multivariate ANOVA (MANOVA)

06 Principal Component Analysis

07 Factor Analysis

Objectives of this course:

(a) understand the basic and advanced concepts of multivariate statistics
(b) be skilled in using various multivariate statistical methods
(c) be skilled in choosing the appropriate multivariate statistical method for a variety of case contexts
(d) be able to interpret the results of multivariate statistical analysis and draw the common thread (appropriate inferences) from them

Grade components before the midterm (50% of the total grade):

Quiz: 5%
Working Group: 15%
Presentation: 5%
Peer review: 0..%
UTS (midterm exam): 25%

Rencher, A.C., Methods of Multivariate Analysis, 2nd edition, Wiley-Interscience.
Hair, J.F., Jr., W.C. Black, B.J. Babin, R.E. Anderson, and R.L. Tatham, 2006, Multivariate Data Analysis, Pearson International Edition, New York.
Hines, W.W., D.C. Montgomery, D.M. Goldsman, and C.M. Borror, 2003, Probability and Statistics in Engineering, John Wiley & Sons, Hoboken, New Jersey.
Box, G.E.P., W.G. Hunter, and J.S. Hunter, 1978, Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building, John Wiley & Sons, New York.

Independent variable (IV)
Controlled by the experimenter
and/or hypothesized to influence the outcome
and/or represents different groups

Dependent variable (DV)
The response or outcome variable

IV and DV: input/output, stimulus/response, etc.

They usually represent the two sides of an equation:

x → y
x → y, z
x + … = y
x = y + …
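To make the "two sides of an equation" idea concrete, here is a minimal sketch with hypothetical data: two IVs (x1, x2) on the right-hand side predict one DV (y) on the left-hand side, fitted by ordinary least squares with NumPy. The variable names, data, and coefficients are invented for illustration.

```python
import numpy as np

# Hypothetical IVs (inputs): two predictors measured on six samples
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([0.5, 1.5, 1.0, 3.0, 2.5, 4.0])

# Hypothetical DV (output), generated without noise from y = 2 + 3*x1 - 1.5*x2
y = 2.0 + 3.0 * x1 - 1.5 * x2

# Design matrix: a column of ones for the intercept, then one column per IV
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose b to minimise ||y - X b||^2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # ≈ [2.0, 3.0, -1.5]
```

Because the data were generated without noise, the fit recovers the generating coefficients up to floating-point error; with real data the estimates would carry sampling error.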

Experimental
High level of researcher control, direct manipulation of the IV, true causal flow from IV to DV

Non-experimental
Low or no researcher control, pre-existing groups (gender, etc.), the distinction between IV and DV is ambiguous

Experiments: stronger internal validity
Non-experiments: stronger external validity

Drowning in data!
Many organisations today face the same challenge: TOO MUCH DATA. Examples include:

Business - customer transactions
Communications - website use
Government - intelligence
Science - astronomical data
Pharmaceuticals - molecular configurations
Industry - process data

A typical industrial plant has hundreds of control loops and thousands of measured variables, many of which are updated every few seconds.
This generates tens of millions of new data points each day, and billions of data points each year. Obviously, this is far too much for a human brain to absorb. Because of the way we visualise things, we are basically limited to looking at only one or two variables at a time.
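The data-volume claim can be checked with simple arithmetic. The plant size below (3,000 variables, each updated every 3 seconds) is a hypothetical figure chosen to match "thousands of measured variables ... updated every few seconds", not a number from the slides.

```python
# Hypothetical plant: 3,000 measured variables, each updated every 3 seconds
n_variables = 3_000
update_period_s = 3

seconds_per_day = 24 * 60 * 60  # 86,400 s
points_per_day = n_variables * seconds_per_day // update_period_s
points_per_year = points_per_day * 365

print(f"{points_per_day:,} points/day")    # 86,400,000 points/day
print(f"{points_per_year:,} points/year")  # 31,536,000,000 points/year
```

Even this modest configuration lands in the "tens of millions per day, billions per year" range described above.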

As a result, we have become data-rich but knowledge-poor.


The biggest problem is that interesting, useful patterns and relationships that are not intuitively obvious lie hidden inside enormous, unwieldy databases. In addition, many of the variables are correlated.
This has led to the creation of data-mining techniques aimed at extracting this useful knowledge. Some examples are:
Neural Networks
Multiple Regression
Decision Trees
Genetic Algorithms
Clustering
MVA (the subject of this topic)

The aim of data mining can be illustrated graphically as follows:

Data: unrelated facts
Information: facts plus relations
Knowledge: information plus patterns

[Figure: DATA (raw numbers) + relations → INFORMATION (observed associations) + patterns → KNOWLEDGE (scientific principles); axis labels: Connectedness, Understanding.]

Suppose that you're the production supervisor at a company. Your process has been running smoothly until, all of a sudden, alarms are sounding and you have to decide what to do.

You have two control charts at your disposal for the measurements performed on the system.

What if you plot the points of both control charts against each other to form a simple multivariate control chart?

On this simple multivariate graph, you can see that variable 1 and variable 2 are linearly related to each other (most of the points lie close to a straight line). The suspect point (the big X) is well separated from the other points because it breaks the linear pattern they follow.

MVA

Multivariate analysis (MVA) is defined as the simultaneous analysis of more than two variables.
MVA uses ALL available data to capture as much information as possible. The basic principle is to boil hundreds of variables down to a mere handful.
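The suspect point in the two-chart example can be made quantitative with the Mahalanobis distance, the quantity behind Hotelling's T² control chart. The slides do not name this technique, so treat it as one possible formalisation. The sketch below uses invented data and a hypothetical helper `squared_mahalanobis`: the suspect point looks ordinary on each variable separately but is far from the in-control cloud once the correlation is taken into account.

```python
import numpy as np

def squared_mahalanobis(points, reference):
    """Squared Mahalanobis distance of each row of `points` from the centre of `reference`."""
    centre = reference.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = points - centre
    # Row-wise quadratic form diff_i^T * inv_cov * diff_i
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Hypothetical in-control data: variable 2 closely follows variable 1 (linear relation)
rng = np.random.default_rng(0)
var1 = rng.normal(0.0, 1.0, 200)
var2 = var1 + rng.normal(0.0, 0.1, 200)
in_control = np.column_stack([var1, var2])

# Hypothetical suspect point: unremarkable on each chart alone, but off the line
suspect = np.array([[1.0, -1.0]])

# Univariate view: z-scores against each chart's own mean and standard deviation
z_scores = (suspect - in_control.mean(axis=0)) / in_control.std(axis=0, ddof=1)

# Multivariate view: a distance that accounts for the correlation between the variables
d2 = squared_mahalanobis(suspect, in_control)

print("z-scores:", z_scores.round(2))
print("squared Mahalanobis distance:", d2.round(1))
```

A common rule of thumb, which assumes multivariate normality, compares d² with a chi-square quantile; for two variables at the 99% level that threshold is about 9.21.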

The World is a complex system

If you could solve all problems by taking a single measurement on a system, the world would be a much simpler place to live in. However, complex systems require multiple measurements to be understood well.

Univariate statistics, even when applicable, only go so far.
Real data usually contain more than one DV.
Multivariate analyses are much more realistic, and feasible.

A good example of these ideas is apples versus oranges.

Clever scientists could easily come up with hundreds of different things to measure on apples and oranges to tell them apart:
Colour, shape, firmness, reflectivity
Skin: smoothness, thickness, morphology
Juice: water content, pH, composition
Seeds: colour, weight, size distribution
etc.

However, there will never be more than one difference: is it an apple or an orange? In MVA parlance, we would say that there is only one latent attribute.
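The "one latent attribute" idea can be illustrated with principal component analysis (PCA), one of the techniques covered later in the course. The sketch below assumes invented data: ten hypothetical measurements per fruit, each shifted by its own amount depending on whether the fruit is an apple or an orange, plus noise. PCA (computed here via the SVD of the standardised data) then finds that a single component carries most of the variation, and the scores on that component separate the two fruits.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_fruit, n_measurements = 50, 10

# Hypothetical latent attribute: 0 = apple, 1 = orange (50 of each)
fruit = np.repeat([0.0, 1.0], n_per_fruit)

# Each hypothetical measurement (colour, firmness, pH, ...) shifts with the fruit type
# by its own amount and sign, plus independent measurement noise
shifts = rng.uniform(1.0, 3.0, n_measurements) * rng.choice([-1.0, 1.0], n_measurements)
X = np.outer(fruit, shifts) + rng.normal(0.0, 0.3, (2 * n_per_fruit, n_measurements))

# PCA via SVD of the autoscaled data (each column: zero mean, unit variance)
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, singular_values, components = np.linalg.svd(Xs, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Scores on the first principal component: one number per fruit
pc1_scores = Xs @ components[0]

print("variance explained by PC1:", round(float(explained[0]), 2))
```

Ten measured columns collapse onto one dominant direction, which is the "boil many variables down to a handful" principle in its simplest form. The sign of the PC1 scores is arbitrary, so apples may come out positive or negative.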

Imagine that you were given a spreadsheet with hundreds of columns of data (the variables) measured on thousands of rows (the samples, or objects). How would you go about analyzing such data?
[Figure: excerpt of a raw data spreadsheet with columns Tmt, X1, X4, X5, Rep, Y avec, Y sans, standing for hundreds of columns and thousands of rows. Raw data: impossible to interpret.]

Your classical training probably tells you to:
Plot the columns against each other, two at a time
Plot each variable for all samples and look for trends
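A quick count shows why the classical approach breaks down. Assuming a hypothetical spreadsheet with 100 columns, plotting every pair of columns requires C(100, 2) scatter plots:

```python
from math import comb

# Hypothetical spreadsheet size, in line with "hundreds of columns"
n_columns = 100

pairwise_scatter_plots = comb(n_columns, 2)  # every distinct pair of columns
single_variable_plots = n_columns            # one trend plot per column

print(pairwise_scatter_plots)  # 4950
print(single_variable_plots)   # 100
```

The number of pairwise plots grows roughly with the square of the number of columns, and even then each plot shows only two variables at a time.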

[Figure: Raw data (impossible to interpret) → Statistical Model (internal to software) → 2-D visual outputs showing trends.]

Several of the more common multivariate techniques:

Multivariate ANOVA (MANOVA)

Principal Component Analysis

Factor Analysis

Multiple Regression Analysis

Multiple Discriminant Analysis

Cluster Analysis

Multidimensional Scaling

Conjoint Analysis