C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
AGENDA MACHINE LEARNING
• Background
• Use cases in healthcare, insurance, retail and banking
• Examples:
• Unsupervised Learning –Principle Component Analysis
• Supervised Learning – Support Vector Machines
• Semi-supervised Learning – Deep Learning
• Supervised Learning – Ensemble Models
• Resources
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING BACKGROUND
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
WHAT IS MACHINE LEARNING?
BACKGROUND
Wikipedia:
“Machine learning is a scientific discipline that deals with the construction
and study of algorithms that can learn from data. Such algorithms operate
by building a model based on inputs and using that to make predictions or
decisions, rather than following only explicitly programmed
instructions.”
SAS:
Machine learning is a branch of artificial intelligence that automates the
building of systems that learn from data, identify patterns, and make
decisions – with minimal human intervention.
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING GARTNER HYPE CYCLE (ADVANCED ANALYTICS)
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING WHY NOW?
https://www.google.com/trends/
Machine Learning
Data Science
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING WHY WOULD YOU USE MACHINE LEARNING?
BACKGROUND
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
MULTIDISCIPLINARY NATURE OF BIG DATA ANALYSIS
BACKGROUND
Statistics
Pattern Computational
Neuroscience
Recognition
Data
Science
AI
Data Mining
Machine
Learning
Databases
KDD
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING EVERYDAY USE CASES
• Internet search
• Digital adds
• Recommenders
• Image recognition – tagging photos, etc.
• Fraud/risk
• Upselling/cross-selling
• Churn
• Human resource management
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
EXAMPLES OF MACHINE LEARNING IN
HEALTHCARE, BANKING, INSURANCE AND
OTHER INDUSTRIES
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
TARGETED TREATMENT
IN HEALTHCARE
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
EARLY DETECTION AND MONITORING
IN HEALTHCARE
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
Source: Michael Wang Source: Andrea Mannini
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
MEDICATION ADHERENCE
IN HEALTHCARE
AiCure uses mobile technology and facial recognition to determine if the right person
is taking a given drug at the right time.
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
BIG DATA IN DRUG DEVELOPMENT
IN HEALTHCARE
It’s using concepts of machine learning and big data to scope out potential
drug compounds – ones that could have more broad-ranging benefits,
pivoting away from today’s trend toward targeted therapies.
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
AUTOMOBILE INSURANCE
IN INSURANCE
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING FRAUD RISK PROTECTION MODEL USING
IN BANKING LOGISTIC REGRESSION
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
FOR
RECOMMENDATION
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
TARGETED MARKETING
IN RETAIL
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING: THE ALGORITHMS
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING TERMINOLOGY
Train Fit
Score Predict
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING
MULTIDISCIPLINARY NATURE OF BIG DATA ANALYSIS
BACKGROUND
Statistics
Pattern Computational
Neuroscience
Recognition
Data
Science
AI
Data Mining
Machine
Learning
Databases
KDD
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
Machine Learning
SUPERVISED UNSUPERVISED SEMI-SUPERVISED
LEARNING LEARNING LEARNING TRANSDUCTION
Regression A priori rules Prediction and
Data Mining
*In semi-supervised learning, supervised prediction and classification algorithms are often combined with
clustering.
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT PCA IS USED IN COUNTLESS MACHINE
ANALYSIS LEARNING APPLICATIONS
FOR MACHINE LEARNING
• Fraud detection
• Word and character recognition
• Speech recognition
• Email spam detection
• Texture classification
• Face Recognition
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT SEA BASS OR SALMON?
ANALYSIS
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT SEA BASS OR SALMON?
ANALYSIS
FOR MACHINE LEARNING
Lightness provides a
better separation
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT SEA BASS OR SALMON?
ANALYSIS
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT TYPICAL DATA MINING PROBLEM:
ANALYSIS 100s OR 1000s OF FEATURES
FOR MACHINE LEARNING
Key questions:
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT WHAT IS PRINCIPAL COMPONENT ANALYSIS?
ANALYSIS
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT WHAT IS PRINCIPAL COMPONENT ANALYSIS?
ANALYSIS
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT TWO-DIMENSIONAL EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
Original Data
The first principle component is
found according to two criteria:
pc1
• The variation of values along
the principal component
direction should be maximal
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT TWO-DIMENSIONAL EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
PCA.mp4
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT TWO-DIMENSIONAL EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
pc2 pc1
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT NO DIMENSION REDUCTION: 100 100
ANALYSIS
FOR MACHINE LEARNING
P
Y = X
No dimension reduction!
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT DIMENSION REDUCTION: 100 2
ANALYSIS
FOR MACHINE LEARNING
Choose the first two
principal components that
have the highest variance!
. P
Y = X
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT HOW TO CHOOSE THE NUMBER OF PRINCIPAL
ANALYSIS COMPONENTS?
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT FACE RECOGNITION EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
50x50
2500x1
Calculate average
face vector
...
Subtract average
face vector from
each face vector
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT FACE RECOGNITION EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
2500
2500
... X= 𝐶𝑥 =
...
...
100
2500
100 face vectors
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT FACE RECOGNITION EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
...
Select the 18
principle
components
that have the
...
largest variation ...
18 selected principle
2500 principle components, components (eigenfaces)
each 2500 x 1 dimensional
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT FACE RECOGNITION EXAMPLE
ANALYSIS
FOR MACHINE LEARNING
Represent each image in the training
set as a linear combination of
eigenfaces
𝑤1 𝑤1
𝑤1 𝑤2 𝑤3 𝑤18 𝑤2 𝑤2
. ... .
...
. .
. .
𝑤18 𝑤18
Weight vector Weight vector
for image 1 for image 100
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
PRINCIPAL
COMPONENT RECOGNIZE UNKNOWN FACE
ANALYSIS
FOR MACHINE LEARNING
?
?
1. Convert the unknown image as a face vector
2. Standardize the face vector
3. Represent this face vector as a linear combination of 18 eigenfaces
4. Calculate the distance between this image’s weight vector and the weight vector
for each image in the training set
5. If the smallest distance is larger than the threshold distance unknown person
6. If the smallest distance is less than the threshold distance,
then the face is recognized as
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
EXAMPLE 2: SUPERVISED LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
SUPPORT VECTOR
MACHINES BINARY CLASSIFICATION MODELS
xx xx
Support vectors: points that lie on (define) the margin
xxxx
xx xx
x xx
xx xx
xx xx
xx xx
marginmargin
Maximize
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
SUPPORT VECTOR
MACHINES INTRODUCING A PENALTY
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
SUPPORT VECTOR
MACHINES SEPARATION DIFFICULTIES
x
x Use a “kernel function” to
map to a higher-
x 2 classes: In, Out
dimensional space
x
x x xx xx x xx x http://techlab.bu.edu/classer/data_sets
Non-separable
Separable 1-D
in 2-D problem
space
Circle-in-the-Square
data set
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
SUPPORT VECTOR
MACHINES KEY TUNING PARAMETERS
Linear
Polynomial (2 or 3
degrees)
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
NEURAL NETWORKS
w1
X1 f Y
X2 w2
Neural network compute node
w3
X3 f is the so-called activation function
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
NEURAL NETWORKS
P Y X) = 𝑔 𝑇𝑌 X1 α1
𝑇𝑌 = 𝛽0𝑌 + 𝛽𝑌𝑇 𝑍 Z1 β1
𝑍𝑚 = 𝜎 𝛼0𝑚 + 𝛼𝑚 𝑇𝑋 X2
Z2
Z3
𝑒 𝑇𝑌 1
𝑔 𝑇𝑌 = , 𝜎(𝑥) = X4
𝑒 𝑇𝑁 +𝑒 𝑇𝑌 1+𝑒 −𝑥
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
NEURAL NETWORKS ESTIMATING THE WEIGHTS
𝑤𝑛𝑒𝑤𝑖 = 𝑤𝑖 + ∆𝑤𝑖
𝜕𝐸
∆𝑤𝑖 = −𝛼
𝜕𝑤𝑖
4. Stop if error E is small enough.
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
Supervised Unsupervised
Single-layer
network
Autoencoder
Multilayer perceptron (MLP)
(with noise injection: denoising autoencoder)
Deep Network
Stacked autoencoder
Deep belief network or Deep MLP
(with noise injection: stacked denoising
autoencoder)
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING
WHAT IS DEEP LEARNING?
BACKGROUND
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
NEURAL NETS AUTOENCODERS
OUTPUT = INPUT
x1 x2 x3
DECODE
(Often more hidden layers with many nodes)
h11 h12
ENCODE
x1 x2 x3
Note: Linear activation function
INPUT corresponds with 2 dimensional
principle components analysis
http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING
y h2 hy2 h2
1 2 3
h1 h1 h1 h1
h31 h32 1
h231
h 31 h332
h 32 4
The weights from layerwise
pre-training can be used
h2 hx23
asx1an initialization hx24
h21 h22 h23 hx21
2 h 22
for
h 23
x5
training 1the entire
2 deep
3
network!
h h h h
h11 h12 h13 h14 h111 h121 h131 h141
1 2 3 4
x1 x2 x3 x4 x5 x1 x2 x3 x4 x5
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING DIGIT RECOGNITION - CLASSIC MNIST TRAINING DATA
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING FEATURE EXTRACTION
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING MEDICARE PROVIDERS DATA
Medicare Data:
• Billing information
• Number of severity of procedures billed
• Whether the provider is a university hospital
• …
Goals:
• Which Medicare providers are similar to each other?
• Which Medicare providers are outliers?
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING MEDICARE DATA
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
DEEP LEARNING VISUALIZATION AFTER DIMENSION REDUCTION
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
EXAMPLE 4: SUPERVISED LEARNING
ENSEMBLE MODELS
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
CONCEPT
http://www.englishchamberchoir.com/
Blended
Scotch
Investment Portfolio
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
CONCEPT
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
CONCEPT
Ensemble (average)
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
CONCEPT
• Combine strengths
• Compensate for weaknesses
• Generalize for future data
Ensemble (average)
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
COMMON USAGE
• Predictive modeling competitions - Teams very often join forces at later stages
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING COMMON USAGE
• Forecasting
• Election forecasting (polls from various markets and demographics)
• Business forecasting (market factors)
• Weather forecasting
http://www.nhc.noaa.gov/
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
COMMON USAGE
http://www.wral.com/fishel-weekend-snow-prediction-is-an-
outlier/14396482/
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING ASSIGNING OUTPUT
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
APPROACHES
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING APPROACHES
Average/Vote
for final
prediction
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
RANDOM FORESTS
a ≥10 a<10
• Ensemble of decision trees
• Score new observations by majority vote or average b ≥ 66 b<66
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING MODIFIED ALGORITHM OPTIONS
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING MODIFIED ALGORITHM OPTIONS
FOR MACHINE LEARNING
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING MULTIPLE ALGORITHMS
FOR MACHINE LEARNING
Different algorithms
Super Learners
Combine cross-validated results of models
from different algorithms
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
ENSEMBLE MODELING IMPROVED FIT
FOR MACHINE LEARNING STATISTICS
Misclassification
Model
Rate
Ensemble 0.151
Bagging 0.185
Boosting 0.161
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING RESOURCES
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING SAS RESOURCES
Products Page
http://www.sas.com/en_us/insights/analytics/machine-learning.html
GitHub
https://github.com/sassoftware/enlighten-apply
https://github.com/sassoftware/enlighten-deep
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
MACHINE LEARNING MISCELLANEOUS RESOURCES
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .
THANK YOU
C op yr i g h t © 2 0 1 4 , S A S I n s t i t u t e I n c . A l l r i g h t s r es er v e d .