Machine Learning
Parisa Rashidi
Fall 2014
Reminder
Your project progress reports are due on Tuesday,
10/28
~2 pages in length (excluding references)
Formatted using the IEEE style
Agenda
Machine learning
Today
Introduction to machine learning
Different types of machine learning methods
Later
More machine learning methods
NLP
Software
RapidMiner
Artificial Intelligence
Artificial Intelligence (AI) has many subfields
What is Learning?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
"You were not made to live like beasts, but to follow virtue and knowledge." (Dante Alighieri)
* Roberto Battiti and Mauro Brunato. The LION Way: Machine Learning plus Intelligent Optimization.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example 1: adverse drug-drug interactions
Example 2: customer behavior
Goal: build a model that is a good approximation to the data.
Machine learning draws on many fields: economics, neuroscience, control theory, computer science, and optimization.
To Understand ML
You need:
Basic knowledge of computer science
Linear Algebra
Calculus
Probability and statistics
Optimization
Example ML Algorithms
Linear Regression
Decision trees, neural networks, support vector machines, etc.
[Figure: a decision tree for activity recognition, using total energy and main frequency features to distinguish stand, run, sit, and walk.]
Generic Applications
Almost everywhere
Speech recognition, face recognition, search engines,
bioinformatics, fraud detection
And it will be everywhere
Smart homes, smart vehicles, smart cities
Biomedical Application
Mobile health monitoring solutions
Electronic Health Record (EHR) mining
Genome-wide associations (GWAS)
Smart homes for elderly
Biomarker discovery
http://www.kaggle.com/competitions
Example: Kaggle competitions on predicting outcomes for HIV patients (and you can win some money!)
[Figure: supervised learning pipeline — labeled tumor examples feed a machine learning algorithm that produces a model; the model then classifies a new instance as benign or malignant.]
Example: surgery risk — differentiating between low-risk and high-risk patients (classification).
Example: predicting child mortality (y) from maternal education (x) (regression):
y = g(x | θ), where g(·) is the model and θ its parameters
e.g., a linear model: y = wx + w0
[Figure: child mortality plotted against maternal education with a fitted line.]
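The linear model y = wx + w0 can be fit by ordinary least squares; a minimal pure-Python sketch (the data points below are made up for illustration, not real mortality data):

```python
# Least-squares fit of y = w*x + w0 (simple linear regression).
def fit_line(xs, ys):
    """Return (w, w0) minimizing the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var                 # slope
    w0 = mean_y - w * mean_x      # intercept
    return w, w0

# Example: points lying exactly on y = 2x + 1
w, w0 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, w0)  # 2.0 1.0
```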
Unsupervised Machine Learning
Also known as data mining
Goal is knowledge discovery
Example:
Input: DNA Sequence as a long string of {A,C,G,T}
Output: frequent subsequences (gene patterns)
[Figure: a DNA sequence (e.g., AACGTAACGGGACTCCAC) feeds a data mining algorithm that produces a model of frequent gene patterns.]
Unsupervised Learning
Example: Learning Associations
It started with market basket analysis
P(Y | X): the probability that somebody who buys X also buys Y, where X and Y are products/services.
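P(Y | X) can be estimated from transaction data as the fraction of baskets containing X that also contain Y; a small sketch with made-up baskets:

```python
# Estimate P(Y|X): fraction of baskets containing x that also contain y.
def confidence(baskets, x, y):
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    return sum(1 for b in with_x if y in b) / len(with_x)

baskets = [
    {"chips", "beer"},
    {"chips", "beer", "diapers"},
    {"chips", "soda"},
    {"bread", "milk"},
]
print(confidence(baskets, "chips", "beer"))  # 0.666... (2 of the 3 chips baskets contain beer)
```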
Unsupervised Learning
Learning what normally happens
No labels
Example method:
Clustering: Grouping similar instances
Example applications
Image compression: Color quantization
Bioinformatics: Learning motifs
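Clustering can be sketched with a minimal 1-D k-means; the values and initial centers below are made up for illustration:

```python
# Minimal 1-D k-means: group similar values around k centroids.
def kmeans_1d(values, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            i = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[i].append(v)
        # Update step: each center becomes the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 5.0])
print(sorted(centers))  # centers converge near [1.0, 9.0]
```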
Contrast: testing for prime numbers can be programmed directly, while recognizing handwritten digits must be learned from examples.
Terminology
A Simple Example: Tumor Classification
Benign: -1
Malignant: +1
Features: uniformity of cell size, uniformity of cell shape, marginal adhesion, single epithelial cell size, bare nuclei, bland chromatin, normal nucleoli, mitoses.
Class label: benign = 2, malignant = 4 in the original data, recoded here as -1 / +1.
Terminology: Feature
Features = the set of attributes associated with an
example
(aka Independent variable in statistics)
Terminology: Instance
Example = an instance of data = data point = xi
Each row of the table is a data instance.
Terminology: Label
Label = Class = the feature to be predicted = category
Data Representation
We usually represent data in a matrix: each row is an instance, each column a feature, plus a label column (e.g., -1, +1, or ? when unknown).
Note: we can also assign a probability to each label (we'll discuss this later).
Algorithms
Do you have labels?
Yes: supervised learning
A little: semi-supervised learning
No: unsupervised learning
In another domain: transfer learning
By asking an oracle: active learning
Task Type
The type of output determines the task:
Categorical output: classification
Continuous output: regression
Ordered output: ranking
Input Representation
The most common type: simple records in tables.
A simple record:

…     HGT    Cholesterol   Risk (Class)
high  short  260           high
high  med    254           high
high  tall   142           med
Input Representation (cont.)
Image and video: preprocessed using computer vision techniques.
Text: preprocessed using NLP techniques.
Continuous measures along time (time series): preprocessed using time series analysis.
Graphs: preprocessed using graph theory tools.
More Details
Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data (and label data)
3. Split labeled data into training and test datasets
4. Use training data to train machine learning
algorithm.
5. Predict labels of examples in test data.
6. Evaluate algorithm.
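Steps 3-6 can be sketched end to end. This toy example uses made-up, well-separated data and a simple 1-nearest-neighbour classifier standing in for the learning algorithm:

```python
import random

# Labeled data: class 0 near x=0..9, class 1 near x=100..109 (made up).
data = [(float(x), 0) for x in range(10)] + [(float(x + 100), 1) for x in range(10)]
random.seed(0)
random.shuffle(data)

split = int(len(data) * 2 / 3)          # step 3: 2/3 train, 1/3 test
train, test = data[:split], data[split:]

def predict(x):                          # steps 4-5: 1-NN "model"
    return min(train, key=lambda d: abs(d[0] - x))[1]

correct = sum(1 for x, y in test if predict(x) == y)
accuracy = correct / len(test)           # step 6: evaluate
print(accuracy)  # 1.0 on this well-separated toy data
```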
Feature Extraction
Typically results in significant reduction in
dimensionality
Domain-specific
Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data
3. Split labeled data into training and test datasets
4. Use training data to train machine learning
algorithm.
5. Predict labels of examples in test data.
6. Evaluate algorithm.
Methods of Sampling
Holdout
E.g. Reserve 2/3 for training and 1/3 for testing
Random subsampling
Cross validation
Partition data into k disjoint subsets
k-fold: train on k-1 partitions, test on the remaining one
Leave-one-out: k=n
Stratified sampling
Bootstrap
Sampling with replacement
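A k-fold split can be sketched as follows (round-robin fold assignment; a minimal illustration, not a library API):

```python
# k-fold cross-validation: partition indices 0..n-1 into k disjoint folds;
# each fold serves once as the test set, the rest as training data.
def kfold(n, k):
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin folds
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(kfold(6, 3))
print(len(splits))  # 3
# Every index appears in exactly one test fold:
print(sorted(i for _, test in splits for i in test))  # [0, 1, 2, 3, 4, 5]
```

Leave-one-out is the special case k = n, so each test fold holds a single example.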
Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data
3. Split labeled data into training and test datasets
4. Use training data to train machine learning
algorithm.
5. Predict labels of examples in test data
6. Evaluate algorithm.
Decision Boundary
We seek to find this boundary
[Figure: benign and malignant labeled points plotted by x1 (radius) and x2 (uniformity), with the true decision boundary, the learned decision boundary, and an outlier.]
Why Noise?
Noise might be due to different reasons
Imprecision in recording the input data
Errors in labeling data
We might not have considered additional (latent or hidden) features
When there is noise, the decision boundary becomes
more complex
Overfitting
Data are well described by our model, but the model is more complex than needed, e.g., a high-order polynomial.
Overfitting
If your hypothesis is more complex than the actual function, it also fits the noise, e.g., using a fifth-order polynomial to model data generated by a simpler, lower-order function.
Bias-Variance
Bias = assumptions, restrictions on model
Variance = variation of the prediction of the model
Simple linear model => high bias
Complex model => high variance
[Figure: the same data fit by a too-simple model (under-fitting) and a too-complex model (over-fitting).]
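The variance side of the tradeoff can be illustrated with a 1-nearest-neighbour regressor, a high-variance model that memorizes its training data (the noisy points below are made up):

```python
# 1-NN regression memorizes training data: zero training error,
# but it can generalize poorly to unseen points.
train = [(0.0, 0.1), (1.0, 1.2), (2.0, 1.9), (3.0, 3.1)]  # noisy y ≈ x
test = [(0.5, 0.5), (1.5, 1.5), (2.5, 2.5)]               # true y = x

def nn_predict(x):
    return min(train, key=lambda d: abs(d[0] - x))[1]

def mse(pairs):
    return sum((nn_predict(x) - y) ** 2 for x, y in pairs) / len(pairs)

print(mse(train))  # 0.0 -- memorized
print(mse(test) > 0)  # True -- errors on unseen points
```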
Important Steps
1. Determine relevant features (expert knowledge)
2. Collect data
3. Split labeled data into training and test datasets
4. Use training data to train machine learning
algorithm.
5. Predict labels of examples in test data
6. Evaluate algorithm.
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Model Comparison
How to compare the relative performance among
competing models?
Confusion matrix:
a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)

                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   a (TP)      b (FN)
CLASS    Class=No    c (FP)      d (TN)

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
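The counts a, b, c, d and accuracy can be computed directly from label lists; a small sketch with made-up labels:

```python
# Confusion-matrix counts and accuracy from actual vs. predicted labels.
def confusion(actual, predicted, positive="Yes"):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fn, fp, tn

actual    = ["Yes", "Yes", "No", "No", "No"]
predicted = ["Yes", "No",  "No", "Yes", "No"]
tp, fn, fp, tn = confusion(actual, predicted)
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(tp, fn, fp, tn, accuracy)  # 1 1 1 2 0.6
```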
Cost Matrix
C(i|j): the cost of misclassifying a class j example as class i

                     PREDICTED CLASS
                     Class=Yes    Class=No
ACTUAL   Class=Yes   C(Yes|Yes)   C(No|Yes)
CLASS    Class=No    C(Yes|No)    C(No|No)
Example cost matrix: C(Yes|Yes) = -1, C(No|Yes) = 100, C(Yes|No) = 1, C(No|No) = 0

Model M1:
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   150         40
CLASS    Class=No    60          250
Accuracy = 80%, Cost = 3910

Model M2:
                     PREDICTED CLASS
                     Class=Yes   Class=No
ACTUAL   Class=Yes   250         45
CLASS    Class=No    5           200
Accuracy = 90%, Cost = 4255
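The cost computation can be checked in a few lines, assuming the standard cost matrix C(Yes|Yes) = -1, C(No|Yes) = 100, C(Yes|No) = 1, C(No|No) = 0 (M2's full counts are inferred from its stated accuracy and cost):

```python
# Cost-sensitive evaluation: weight each confusion-matrix cell by its cost.
def total_cost(tp, fn, fp, tn, c_tp=-1, c_fn=100, c_fp=1, c_tn=0):
    return tp * c_tp + fn * c_fn + fp * c_fp + tn * c_tn

print(total_cost(150, 40, 60, 250))  # 3910  (model M1)
print(total_cost(250, 45, 5, 200))   # 4255  (model M2)
```

Note how M2 is more accurate yet costlier: false negatives dominate the cost here.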
Limitation of Accuracy
Consider a 2-class problem
Number of Class 0 examples = 9990
Number of Class 1 examples = 10
A model that always predicts Class 0 has accuracy 9990/10000 = 99.9%.
Accuracy is misleading because the model does not detect any Class 1 example.
Other Measures
Precision (p) = a / (a + c)   (true positives / all items predicted positive)
Recall (r) = a / (a + b)   (true positives / all actual positive items)
F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
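These formulas in code, using the counts a = TP, b = FN, c = FP (the example numbers are made up):

```python
# Precision, recall, and F-measure from the counts a=TP, b=FN, c=FP.
def prf(a, b, c):
    p = a / (a + c)          # precision: TP / all predicted positive
    r = a / (a + b)          # recall:    TP / all actual positive
    f = 2 * r * p / (r + p)  # harmonic mean; equals 2a / (2a + b + c)
    return p, r, f

p, r, f = prf(a=9, b=1, c=3)
print(p, r, f)  # 0.75 0.9 0.818...
```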
Triple Tradeoff
There is a tradeoff between:
Complexity of the hypothesis space: C
Amount of training data: N
Generalization error on new data: E
As N increases, E decreases.
As C increases, E first decreases, then increases.
Learning Curve
A learning curve shows how error changes as the number of training examples grows.
Diagnosis suggests which fixes to try:
Try getting more training examples.
Try a smaller set of features.
Try a larger set of features.
Try different features.
Model Evaluation
Metrics for Performance Evaluation
How to evaluate the performance of a model?
Methods for Model Comparison
How to compare the relative performance among
competing models?
We will look at this next time!
Data
[Figure: end-to-end pipeline — sample accelerometer data d = (x, y, z) at 60 Hz; preprocess (segment, label); feature extraction and selection (e.g., total energy, main frequency); train and evaluate a decision tree recognizing stand, run, sit, and walk.]
References
Slides partially based on:
Lecture notes for E. Alpaydın, Introduction to Machine Learning, 2nd ed., The MIT Press, 2010 (V1.0).
Tools
RapidMiner
Weka
R
scikit-learn
Matlab
More here
https://sites.google.com/site/parisar/links
(You can also find some publicly available free e-books
on machine learning)
Resources: Datasets
UCI Repository:
http://www.ics.uci.edu/~mlearn/MLRepository.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
Resources: Journals
IEEE Transactions on Knowledge and Data Engineering
Journal of Machine Learning Research www.jmlr.org
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine
Intelligence
Annals of Statistics
Journal of the American Statistical Association
...
Resources: Conferences
International Conference on Knowledge Discovery and Data Mining (KDD)