
Lecture 10: Introduction to

Machine Learning

Course: Biomedical Informatics

Parisa Rashidi
Fall 2014

Reminder
Your project progress reports are due on Tuesday, 10/28
~2 pages in length (excluding references)
Formatted using IEEE style (see link)

Agenda
Machine learning
Today
Introduction to machine learning
Different types of machine learning methods

Walkthrough: a machine learning process

Later
More machine learning methods

NLP

Software
RapidMiner (link)

Artificial Intelligence
Artificial Intelligence (AI) has many subfields

Machine Learning (ML)


Natural Language Processing (NLP)
Vision

What is Learning?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.

You were not made to live like beasts, but to follow virtue and knowledge.
(Dante Alighieri)
*ROBERTO BATTITI AND MAURO BRUNATO.
The LION way. Machine Learning plus Intelligent Optimization.

What We Talk About When We Talk About Learning
Learning general models from data of particular examples
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example 1: adverse drug-drug interactions
Example 2: customer behavior:
"People who bought Blink also bought David and Goliath" (www.amazon.com)
Build a model that is a good and useful approximation to the data.

Relation with Other Fields
ML draws on ideas from many fields:
Statistics
Statistical physics
Economics
Neuroscience
Control theory
Computer science
Optimization

To Understand ML
You need:
Basic knowledge of computer science
Linear algebra
Calculus
Probability and statistics
Optimization

Example ML Algorithms
Linear regression
Decision trees, neural networks, support vector machines, ...
[Figure: a simple decision tree over total energy and main frequency, with leaves Stand, Run, Sit, and Walk; Figure: support vector machines]
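A minimal scikit-learn sketch of a small decision tree like the one in the figure; the two features and the tiny dataset are assumptions for illustration only:

    # Minimal sketch: a decision tree over two assumed features (total energy, main frequency).
    # The tiny dataset below is invented for illustration.
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[0.1, 1.0],   # low energy, low frequency
         [0.2, 1.2],
         [5.0, 2.5],   # high energy, moderate frequency
         [9.0, 4.0]]   # very high energy, high frequency
    y = ["sit", "sit", "walk", "run"]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["total_energy", "main_frequency"]))
    print(tree.predict([[0.15, 1.1]]))  # expected: 'sit'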

Generic Applications
Almost everywhere
Speech recognition, face recognition, search engines,
bioinformatics, fraud detection
And it will be everywhere
Smart homes, smart vehicles, smart cities

Biomedical Application
Mobile health monitoring solutions
Electronic Health Record (EHR) mining
Genome-wide associations (GWAS)
Smart homes for the elderly
Biomarker discovery


Challenges & Competitions
Many competitions at Kaggle: http://www.kaggle.com/competitions
Example: predict the likelihood that an HIV patient's infection will become less severe
A great way to improve your skills (and maybe make some money!)

Supervised vs. Unsupervised Learning

Supervised Machine Learning
Goal is prediction
Example:
Input: examples of benign and malignant tumors, defined in terms of tumor shape, radius, ...
Output: predict whether a previously unseen example is benign or malignant
[Figure: tumor examples → machine learning algorithm → model; new instance → model → benign or malignant?]

Supervised Learning Toy Example: Classification
Example: surgery risk
Differentiating between low-risk and high-risk patients
[Figure: scatter plot of cell size uniformity (x) vs. cell shape uniformity (y) with low-risk and high-risk regions]
Rule: IF x > a AND y > b THEN low-risk

Supervised Learning Toy Example: Regression
Example: child mortality
x: maternal education
y: child mortality
y = g(x | θ), where g() is the model and θ are its parameters
Linear model: y = wx + w0
[Figure: scatter plot of maternal education (x) vs. child mortality (y) with a fitted line]
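A minimal numpy sketch of fitting the linear model y = wx + w0 by least squares; the education/mortality numbers are invented for illustration:

    # Minimal sketch of the linear model y = w*x + w0 fit by least squares.
    # The (maternal education, child mortality) numbers are invented for illustration.
    import numpy as np

    x = np.array([2, 4, 6, 8, 10, 12], dtype=float)     # years of maternal education
    y = np.array([90, 75, 62, 50, 41, 30], dtype=float) # child mortality (per 1000)

    w, w0 = np.polyfit(x, y, deg=1)   # slope and intercept of the best-fit line
    print(f"y = {w:.2f}*x + {w0:.2f}")
    print("prediction at x = 7:", w * 7 + w0)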

Supervised Learning: Uses
Prediction of future cases: use the rule to predict the output for future inputs
Knowledge extraction: the rule is easy to understand
Compression: the rule is simpler than the data it explains
Outlier detection: exceptions that are not covered by the rule, e.g., fraud

Unsupervised Machine Learning
Also known as data mining
Goal is knowledge discovery
Example:
Input: a DNA sequence as a long string of {A, C, G, T}
Output: frequent subsequences (gene patterns)
[Figure: DNA sequence (AACGTAACGGGACTCCAC) → data mining algorithm → model → gene pattern]

Unsupervised Learning
Example: Learning Associations
It started with market basket analysis
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.

Unsupervised Learning
Learning what normally happens
No labels
Example method:
Clustering: Grouping similar instances
Example applications
Image compression: Color quantization
Bioinformatics: Learning motifs
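A minimal clustering sketch with k-means in scikit-learn; the 2-D points are invented for illustration:

    # Minimal sketch of clustering (grouping similar instances) with k-means.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one group
                  [8.0, 8.2], [7.8, 8.1], [8.3, 7.9]])  # another group

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print("cluster labels:", kmeans.labels_)
    print("cluster centers:", kmeans.cluster_centers_)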

You Don't Always Need Machine Learning!
Machine learning definition (supervised):
The ability to learn and to improve with experience instead of using pre-determined rules.
Consider the following two tasks:
Testing for prime numbers
Problem: Is m a prime number?
Solution: test divisors up to √m to see if m can be factored into two values.
Recognizing handwritten digits
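The prime-number task needs no learning; a minimal Python sketch of the trial-division solution:

    # Trial division up to sqrt(m) settles the prime-number question without any learning.
    import math

    def is_prime(m: int) -> bool:
        if m < 2:
            return False
        for d in range(2, math.isqrt(m) + 1):
            if m % d == 0:          # m factors into d * (m // d)
                return False
        return True

    print([n for n in range(2, 30) if is_prime(n)])  # 2, 3, 5, 7, 11, ...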

You Don't Always Need Machine Learning!
Unsupervised learning definition (rather unofficial):
Automatic analysis of data to extract previously unknown interesting patterns
Consider the following two tasks:
Regular expression matching
Problem: find all patterns matching the regular expression A*C.
Solution: simple string matching (finite state machine)
DNA sequence mining

When is Learning Needed?
There is no need to learn to calculate payroll
Learning is used when:
Human expertise does not exist (navigating on Mars)
Humans are unable to explain their expertise (speech recognition)
The solution changes over time (routing on a computer network)
The solution needs to be adapted to particular cases (user biometrics)

Supervised vs. Unsupervised Learning
Supervised learning (learn from my example)
Goal: a program that performs a task as well as humans.
TASK: well defined (the target function)
EXPERIENCE: training data provided by a human
PERFORMANCE: metric error/accuracy on the task
Unsupervised learning (see what you can find)
Goal: to find some kind of structure in the data.
TASK: vaguely defined
No EXPERIENCE: no labeled data
No PERFORMANCE metric (but there are some evaluation metrics)
*TAKIS METAXAS, CS 315 Web Search and Data Mining

Terminology
A Simple Example: Tumor Classification
Benign: -1
Malignant: +1
[Table: tumor examples described by features (uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses) and a class label (benign = 2, malignant = 4), mapped here to -1/+1]

Terminology: Feature
Features = the set of attributes associated with an example
(aka independent variable in statistics)
[Table: each feature column of the tumor table, e.g., uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses]

Terminology: Instance
Example = an instance of data = data point = x_i
Each row of the table is a data instance.
[Table: each row of the tumor table is one instance]

Terminology: Label
Label = class = the feature to be predicted = category associated with an object
Denoted by y_i
(aka dependent variable in statistics)
Label is usually provided by an expert
[Table: the class label column of the tumor table (benign = 2, malignant = 4), mapped here to -1/+1]

Data Representation
We usually represent data in a matrix
Rows are instances, columns are features, and a final column holds the label (-1, +1, or ? if unknown)
Co-variance matrix (feature × feature)
Gram matrix (instance × instance)
Note: We can also assign a probability to each label (we'll discuss it later)
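A minimal numpy sketch of this matrix view; the numbers are invented, and np.cov plus a plain dot product stand in for the co-variance and Gram matrices:

    # Minimal sketch of the data matrix view: rows = instances, columns = features.
    import numpy as np

    X = np.array([[2., 5., 1., 1.],    # instance 1
                  [5., 4., 4., 5.],    # instance 2
                  [3., 2., 1., 1.]])   # instance 3
    y = np.array([-1, +1, -1])         # labels

    cov = np.cov(X, rowvar=False)      # co-variance matrix: feature x feature
    gram = X @ X.T                     # Gram matrix: instance x instance
    print(cov.shape, gram.shape)       # (4, 4) (3, 3)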

Summary of Key Terms


Instance = example = data point
Feature = independent variable
Class label = dependent variable
Decision boundary = separates examples in different
classes

Algorithms

Availability of Labeled Data


Supervised learning => when all data is labeled
Semi-supervised learning => when a small amount of data is labeled
Unsupervised learning => when data is not labeled
Transfer Learning => when labeled data is available in another domain
Active Learning => when the algorithm has access to a human oracle to
ask for labels of a few data points
[Flowchart: Do you have labeled data? Yes → supervised; a little → semi-supervised; no → unsupervised; in another domain → transfer learning; by asking an oracle → active learning]

Task Type
Categorical output: classification task (the learned model is a classifier)
Continuous output: regression task
Ordered output: ranking task
[Flowchart: What is your output type? Categorical → classification; continuous → regression; ordered → ranking]

Input Representation
The most common type: simple records in tables
Can be analyzed using regular machine learning techniques.
Most other data types are converted to this type.
(Not always: there are methods that directly process other data types.)
A simple record with columns ID, WGT, HGT, Cholesterol, and Risk (class), e.g.:
WGT: high, HGT: short, Cholesterol: 260, Risk: high
WGT: high, HGT: med, Cholesterol: 254, Risk: high
WGT: high, HGT: tall, Cholesterol: 142, Risk: med

Input Representation (cont.)
Images and video are preprocessed using vision techniques.
Text is preprocessed using NLP techniques.
Continuous measures along time (time series) are preprocessed using time series analysis.
Graphs are preprocessed using graph theory tools.
[Figure: examples of image, time series, text, and graph data]

More Details

Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data (and label data)
3. Split labeled data into training and test datasets
4. Use training data to train the machine learning algorithm.
5. Predict labels of examples in the test data.
6. Evaluate the algorithm.

Features Are Important!
Should be rich enough to capture the problem
Should be simple enough to allow learning the model
Too many features
Makes learning more difficult
Not enough features
Impacts generalization power

Feature Extraction
Typically results in a significant reduction in dimensionality
Domain-specific
* Image taken from Jeff Howbert's slides


Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data
3. Split labeled data into training and test datasets
4. Use training data to train the machine learning algorithm.
5. Predict labels of examples in the test data.
6. Evaluate the algorithm.

How to Split Data?
Holdout: training set, (validation set), test set
K-fold cross-validation
E.g., 10-fold cross-validation

Methods of Sampling
Holdout
E.g., reserve 2/3 for training and 1/3 for testing
Random subsampling
Cross-validation
Partition data into k disjoint subsets
k-fold: train on k-1 partitions, test on the remaining one
Leave-one-out: k = n
Stratified sampling
Bootstrap
Sampling with replacement
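A minimal scikit-learn sketch of holdout and 10-fold cross-validation; the synthetic dataset and the choice of classifier are assumptions for illustration:

    # Minimal sketch of holdout and k-fold cross-validation with scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=5, random_state=0)

    # Holdout: reserve 2/3 for training, 1/3 for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("holdout accuracy:", clf.score(X_test, y_test))

    # 10-fold cross-validation on the full labeled set
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
    print("10-fold mean accuracy:", scores.mean())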

Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data
3. Split labeled data into training and test datasets
4. Use training data to train the machine learning algorithm.
5. Predict labels of examples in the test data.
6. Evaluate the algorithm.

Decision Boundary
We seek to find this boundary
[Figure: labeled examples plotted by x1 (radius) and x2 (uniformity), with benign and malignant regions, an outlier, the true decision boundary, and the learned decision boundary]

Why Noise?
Noise might be due to different reasons
Imprecision in recording the input data
Errors in labeling data
We might not have considered additional features
(latent, or hidden features)
When there is noise, the decision boundary becomes

more complex

Overfitting
Data are well described by our model, but the predictions do not generalize to new data.
A very rich hypothesis space
Training set too small

Overfitting and Underfitting
Underfitting
If your hypothesis is less complex than the actual function
E.g., using a straight line to model data generated by a third-order polynomial
Overfitting
If your hypothesis is more complex than the actual function
E.g., using a fifth-order polynomial to model data generated by a second-order polynomial
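A minimal numpy sketch of this effect, fitting polynomials of degrees 1, 2, and 9 to noisy data generated from a second-order polynomial; all numbers are invented:

    # Minimal sketch of underfitting vs. overfitting on data from a degree-2 polynomial.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 20)
    y = 1.0 + 0.5 * x + 2.0 * x**2 + rng.normal(scale=2.0, size=x.size)  # true: degree 2

    for degree in (1, 2, 9):                       # underfit, good fit, overfit
        coeffs = np.polyfit(x, y, deg=degree)
        y_hat = np.polyval(coeffs, x)
        train_err = np.mean((y - y_hat) ** 2)
        print(f"degree {degree}: training MSE = {train_err:.2f}")
    # The degree-9 fit has the lowest training error but generalizes worst to new data.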

Bias-Variance
Bias = assumptions, restrictions on the model
Variance = variation of the prediction of the model
Simple linear model => high bias
Complex model => high variance
[Figure: under-fitting vs. over-fitting on the same data]

Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data
3. Split labeled data into training and test datasets
4. Use training data to train the machine learning algorithm.
5. Predict labels of examples in the test data.
6. Evaluate the algorithm.

Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Model Comparison
How to compare the relative performance among
competing models?

Metrics for Performance Evaluation
Focus on the predictive capability of a model
Rather than how long it takes to classify or build models, scalability, etc.
Confusion matrix:
                    PREDICTED Class=Yes      PREDICTED Class=No
ACTUAL Class=Yes    a: TP (true positive)    b: FN (false negative)
ACTUAL Class=No     c: FP (false positive)   d: TN (true negative)

Metrics for Performance Evaluation
                    PREDICTED Class=Yes   PREDICTED Class=No
ACTUAL Class=Yes    a (TP)                b (FN)
ACTUAL Class=No     c (FP)                d (TN)
Most widely-used metric:
Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
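A minimal scikit-learn sketch of the confusion matrix and accuracy; the label vectors are invented for illustration:

    # Minimal sketch: confusion matrix and accuracy for assumed predictions.
    from sklearn.metrics import confusion_matrix, accuracy_score

    y_true = [1, 1, 1, 0, 0, 0, 0, 1]   # actual classes (1 = Yes, 0 = No), invented
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # predicted classes, invented

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TP, TN, FP, FN =", tp, tn, fp, fn)
    print("accuracy =", (tp + tn) / (tp + tn + fp + fn))
    print("accuracy (sklearn) =", accuracy_score(y_true, y_pred))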

Cost Matrix
                    PREDICTED Class=Yes   PREDICTED Class=No
ACTUAL Class=Yes    C(Yes|Yes)            C(No|Yes)
ACTUAL Class=No     C(Yes|No)             C(No|No)
C(i|j): cost of misclassifying a class j example as class i

Computing Cost of Classification
Cost matrix C(i|j):
                    PREDICTED Class=Yes   PREDICTED Class=No
ACTUAL Class=Yes    -1                    100
ACTUAL Class=No     1                     0
Model M1:
                    PREDICTED Class=Yes   PREDICTED Class=No
ACTUAL Class=Yes    150                   40
ACTUAL Class=No     60                    250
Accuracy = 80%, Cost = 3910
Model M2:
                    PREDICTED Class=Yes   PREDICTED Class=No
ACTUAL Class=Yes    250                   45
ACTUAL Class=No     5                     200
Accuracy = 90%, Cost = 4255
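A minimal numpy sketch of the cost computation for M1; note the C(Yes|No) = 1 and C(No|No) = 0 entries are an assumption chosen so that the reported cost of 3910 is reproduced:

    # Minimal sketch: combining a confusion matrix with a cost matrix C(i|j),
    # using the M1 counts from the slide (TP=150, FN=40, FP=60, TN=250).
    import numpy as np

    # Rows = actual class (Yes, No); columns = predicted class (Yes, No)
    confusion = np.array([[150, 40],
                          [60, 250]])
    cost = np.array([[-1, 100],     # C(Yes|Yes), C(No|Yes)
                     [1,   0]])     # C(Yes|No),  C(No|No)  (assumed values)

    total_cost = np.sum(confusion * cost)   # element-wise product, then sum
    accuracy = np.trace(confusion) / confusion.sum()
    print("accuracy =", accuracy)           # 0.80
    print("total cost =", total_cost)       # 3910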

Limitation of Accuracy
Consider a 2-class problem
Number of class 0 examples = 9990
Number of class 1 examples = 10
If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
Accuracy is misleading because the model does not detect any class 1 example

Other Measures
Precision (p) = true positives / all items predicted as positive = a / (a + c)
Recall (r) = true positives / all actual positive items = a / (a + b)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
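A minimal Python sketch of these measures from the counts a (TP), b (FN), c (FP); the numbers reuse the M1 counts above:

    # Precision, recall, and F-measure from confusion-matrix counts.
    a, b, c = 150, 40, 60            # TP, FN, FP

    precision = a / (a + c)          # true positives / all predicted positive
    recall = a / (a + b)             # true positives / all actual positive
    f_measure = 2 * recall * precision / (recall + precision)   # = 2a / (2a + b + c)
    print(precision, recall, f_measure)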

Triple Tradeoff
There is a tradeoff between three factors:
Complexity of the hypothesis space: C
Amount of training data: N
Generalization error on new data: E
As N increases, E decreases
As C increases, first E decreases, then E increases

Learning Curve
A learning curve shows how accuracy (or error) changes with varying sample size

More on Bias vs. Variance
Typical learning curve for high variance:
Test error still decreasing as m (training set size) increases; suggests a larger training set will help.
Large gap between training and test error.

*Andrew Y. Ng, Advice for applying Machine Learning, Stanford

More on Bias vs. Variance
Typical learning curve for high bias:
Even the training error is unacceptably high.
Small gap between training and test error.

*Andrew Y. Ng, Advice for applying Machine Learning, Stanford

Diagnosis
Fixes to try:
Try getting more training examples  =>  fixes high variance
Try a smaller set of features       =>  fixes high variance
Try a larger set of features        =>  fixes high bias
Try different features              =>  fixes high bias
*Andrew Y. Ng, Advice for applying Machine Learning, Stanford

Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Model Comparison
How to compare the relative performance among
competing models?
We will look at this next time!

Putting It All Together
Differentiate between walking and jogging using an accelerometer
Kwapisz et al., SIGKDD Explorations, 2010
Pipeline:
Data: sample d = (x, y, z) at 60 Hz
Preprocess: segment and label
Feature extraction: f_1, f_2, f_3, ...
Feature selection: select some features
Train
Evaluate
[Figure: a simple decision tree model over total energy and main frequency, with leaves Stand, Run, Sit, and Walk]
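A minimal end-to-end sketch of such a pipeline in Python; the simulated signal, window length, features, and classifier are all assumptions for illustration, not the setup of Kwapisz et al.:

    # Minimal sketch: segment a simulated accelerometer signal, extract two simple
    # features (total energy, dominant frequency), train a decision tree, evaluate.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def make_segment(freq_hz, amplitude, fs=60, seconds=2):
        """Simulate one labeled accelerometer magnitude segment sampled at fs Hz."""
        t = np.arange(0, seconds, 1 / fs)
        return amplitude * np.sin(2 * np.pi * freq_hz * t) + rng.normal(scale=0.1, size=t.size)

    def extract_features(segment, fs=60):
        """Feature extraction: total energy and dominant frequency of a segment."""
        energy = np.sum(segment ** 2)
        spectrum = np.abs(np.fft.rfft(segment))
        freqs = np.fft.rfftfreq(segment.size, d=1 / fs)
        return [energy, freqs[np.argmax(spectrum[1:]) + 1]]  # skip the DC component

    # Simulate walking (slower, smaller) vs. jogging (faster, larger) segments
    X, y = [], []
    for _ in range(50):
        X.append(extract_features(make_segment(freq_hz=1.5, amplitude=1.0))); y.append("walk")
        X.append(extract_features(make_segment(freq_hz=3.0, amplitude=2.5))); y.append("jog")

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))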

References
Slides partially based on:
Lecture notes for E. Alpaydın, Introduction to Machine Learning, 2nd ed., The MIT Press, 2010 (v1.0)

Resources for You

Tools
RapidMiner
Weka
R
scikit-learn
Matlab
More here

https://sites.google.com/site/parisar/links
(You can also find some publicly available free e-books

on machine learning)

Resources: Datasets
UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html

UCI KDD Archive:


http://kdd.ics.uci.edu/summary.data.application.html

Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/


Resources: Journals
IEEE Transactions on Knowledge and Data Engineering
Journal of Machine Learning Research www.jmlr.org
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine
Intelligence
Annals of Statistics
Journal of the American Statistical Association
...


Resources: Conferences
International Conference on Knowledge Discovery and

Data Mining (KDD)


International Conference on Machine Learning (ICML)
European Conference on Machine Learning (ECML)
Neural Information Processing Systems (NIPS)
Uncertainty in Artificial Intelligence (UAI)
Computational Learning Theory (COLT)
International Conference on Artificial Neural Networks
(ICANN)
International Conference on AI & Statistics (AISTATS)
International Conference on Pattern Recognition (ICPR)
...
