Deep Learning vs. Machine Learning: Choosing the Right Approach
Contents
Introduction
Terminology
Your Project
Your Data
Your Hardware
Conclusion
AI: Artificial intelligence (AI) is a computer system trained to perceive its environment, make decisions, and take action.

Machine learning: The techniques involved with building models that automatically learn from data. In this ebook, we use machine learning as shorthand for “traditional machine learning”—the workflow in which you manually select the relevant features to use and then train the model. When we refer to machine learning, we exclude deep learning. Common techniques include decision trees, regression, support vector machines, and ensemble methods.

Diagram: Artificial intelligence is any technique that enables machines to mimic human intelligence. Machine learning comprises statistical methods that enable machines to “learn” tasks from data without being explicitly programmed. Deep learning uses neural networks with many layers that learn representations and tasks “directly” from data.
Algorithm: The set of rules or instructions that will train the model to
do what you want it to do.
Model: The trained program that predicts outputs given a set of inputs.
It’s helpful to start off with a clear picture of what you’re trying to
accomplish. There are few hard and fast rules when it comes to
selecting deep learning rather than machine learning algorithms, so
consider choices as if they were on a spectrum.
While one task alone might be more suited to machine learning, your
full application might involve multiple steps that are better suited to
deep learning when taken together.
You'll need more data to teach the network to distinguish between similar images,
like the African and European swallow.
To avoid overfitting from the start, make sure you have plenty of
training, validation, and test data. Use the training and validation data
first to train the model; the data needs to be representative of your
real-world data and you need to have enough of it. Once your model
is trained, use completely new test data to check that your model is
performing well.
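One way to set up such a split is with cvpartition, holding out test data the model never sees during training. This is a minimal sketch; the number of observations n is a hypothetical placeholder for your own dataset size.

```matlab
% Hold out 20% of the data as a final test set; the remaining 80% is
% available for training and validation. n is a hypothetical count.
c = cvpartition(n, 'HoldOut', 0.2);
idxTrainVal = training(c);   % logical index of training/validation rows
idxTest     = test(c);       % logical index of held-out test rows
```

Evaluating only on the held-out rows gives an honest estimate of real-world performance.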
If you think your model is starting to overfit the data, take a closer look at your training choices.
Like a lot in life, there are elements of common sense and trial and
error. Most practitioners develop an intuition of how various parameters
affect accuracy through experimentation.
Just Remember…
At some point you may get close to the point of diminishing returns, at which tweaks to the model result in insignificant improvements in accuracy. It’s good to keep in mind the final goal and, if possible, the business impact of incremental improvements in accuracy and the danger of overfitting.

Figure: Trace plot of coefficients fit by lasso.
Data scientists often refer to the ability to share and explain results as model interpretability. A model that is easily interpretable has:

• A small number of features that typically are created from some physical understanding of the system
• A transparent decision-making process

Interpretability is important for many applications that need to:

• Prove that your model complies with government or industry standards
• Explain factors that contributed to a diagnosis
• Show the absence of bias in decision-making

If you must have the ability to demonstrate the steps the algorithm took to reach a conclusion, focus your attention on machine learning techniques. Decision trees are famously easy to follow down their Boolean paths of “if x, then y.” Traditional statistics techniques such as linear and logistic regression are well accepted. Even random forests are relatively simple to explain if taken one tree at a time.

Guess the Algorithm
A researcher designed a way to take ultra-low-dose CT scans (which reduce the amount of radiation exposure, but also reduce image resolution) and apply image processing techniques to regain image resolution. What technique did he use? SVM or CNN

Local interpretable model-agnostic explanations (LIME) take a series of individual inputs and outputs to approximate the decision making. Read the paper. Another area of research is the use of decision trees as a method to illustrate a more complex model. Read the paper.
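As a small sketch of this interpretability, a decision tree’s if/then splits can be inspected directly. The variables X and y below are hypothetical placeholders for a predictor table and a label vector.

```matlab
% Fit a shallow decision tree and display its Boolean splits.
% X (predictors) and y (labels) are hypothetical placeholders.
tree = fitctree(X, y, 'MaxNumSplits', 8);   % limit splits for readability
view(tree, 'Mode', 'graph');                % graphical "if x, then y" diagram
view(tree)                                  % text listing of the same rules
```

Limiting the number of splits keeps the resulting rules short enough to walk through with a reviewer.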
How much do you know about the system where your project sits?
Schematic showing how the components of the elevator system are connected to one another.
If you have a solid understanding of the data, select the features you
think will be the most influential and start with a machine learning
algorithm. If you have high-dimensional data, try dimensionality
reduction techniques such as principal component analysis (PCA) to
create a smaller number of features to try to improve results.
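As a sketch of that dimensionality-reduction step, PCA can keep only the components that explain most of the variance. The matrix X is a hypothetical observations-by-features array.

```matlab
% Reduce a high-dimensional dataset with PCA.
% X is a hypothetical n-by-p matrix of observations.
[coeff, score, ~, ~, explained] = pca(X);
k = find(cumsum(explained) >= 95, 1);   % components covering 95% of variance
XReduced = score(:, 1:k);               % smaller feature set for training
```

The reduced matrix XReduced can then be fed to a machine learning algorithm in place of the original features.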
Feature Selection
Ensure your model is focused on the data with the most predictive
power and is not distracted by data that will not impact decision
making. Precise feature selection will result in a faster, more efficient,
more interpretable model.
One benefit of many deep learning algorithms is that they perform feature extraction and selection tasks automatically. You will still want to preprocess the data, but then the model will decide for itself which features of the data are the most important.

A common example of using deep learning and machine learning together is to use a CNN to select features that are then fed into a machine learning algorithm. We walk through an example of how this can be done in Section 4.
Tabular Data
What do we mean by tabular? Think of a database or employee information where the columns are independent of each other. Tabular data can be numeric or categorical (though eventually the categorical data would be converted to numeric).

Traditional machine learning techniques were designed with tabular data in mind, so you may want to start with machine learning if your data is tabular. There are ways to transform tabular data to work with deep learning models, but this may not be the best option when starting out.
Wavelets provide yet another way to extract features from signals, with
techniques like wavelet scattering showing promising results when
combined with machine learning algorithms.
Text
Traditional approaches involve converting text to a numerical
representation via bag-of-words models and normalization techniques
such as TF-IDF.
This numerical data can then be used with traditional machine learning
techniques such as support vector machines or naïve Bayes. Newer
techniques use text with recurrent or convolutional neural network
architectures.
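A minimal sketch of the traditional pipeline, using Text Analytics Toolbox; the two example sentences are invented for illustration.

```matlab
% Convert raw text to a numeric TF-IDF matrix suitable for an SVM
% or naive Bayes classifier. The documents here are made-up examples.
docs = tokenizedDocument([
    "the pump failed after overheating"
    "routine maintenance completed on the pump"]);
bag = bagOfWords(docs);   % bag-of-words counts
M   = tfidf(bag);         % documents-by-vocabulary TF-IDF weights
```

Each row of M is a numeric representation of one document and can be passed to classifiers such as fitcecoc or fitcnb.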
If you want to use deep learning but do not have a lot of labeled
data, consider transfer learning using a pretrained network such as
GoogLeNet, ResNet-101, or VGG-16.
It’s useful to think about hardware in two groups:

• Hardware you have available to train the model.
• Hardware the model will run on in production.

Guess the Algorithm
An oil company created a more efficient way to keep track of geo-tagged machine inventory for maintenance scheduling. They set up a machine vision system to identify tags with serial numbers, use optical character recognition to extract numbers, and associate images with inventory. Clustering or Regional Convolutional Neural Network (R-CNN)?
Because deep learning models take a long time to train (often on the
order of hours or days), it is common to have several models training in
parallel, with the hope that one (or some) of them will provide improved
results.
Run applications on a multicore desktop with local workers, take advantage of GPUs,
and scale up to a cluster with Parallel Computing Toolbox.
Desktop CPUs are still sufficient for training most machine learning
models.
For training models on large amounts of data, you can use big data
frameworks such as Apache Spark™ to spread the computation across
a cluster of CPUs.
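For example, tall arrays let the same MATLAB code run against out-of-memory data on a local pool or a Spark cluster. The file pattern and column name below are hypothetical.

```matlab
% Lazily evaluate statistics over many CSV files; computation is
% deferred until gather. 'data/*.csv' and SensorReading are hypothetical.
ds = datastore('data/*.csv');
t  = tall(ds);
avgReading = gather(mean(t.SensorReading, 'omitnan'));
```

Because evaluation is deferred, the same script scales from a desktop to a cluster without code changes.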
Figure: An edge-to-cloud analytics architecture. (1) Exploratory analysis: historical analytics and algorithm development. (2) Edge nodes: local embedded algorithms and data reduction. (3) Data aggregator: online analytics, visualization, and reporting; analytics are deployed to the aggregator over the communication connectivity layer.
The trend toward smarter and more connected sensors is adding pressure to move more processing and analytics as close to the sensors as possible. This approach has the benefit of shrinking the amount of data that is transferred over the network, which reduces the cost of transmission and can reduce the power consumption of wireless devices.

Models that run on hardware at the edge will provide quick results without a network connection. However, enough hardware will need to be available at the edge to run the machine learning model, and it will be more difficult to push out updates to the model than if the model resided on a centralized server.

Guess the Algorithm
Orbiting satellites and spacecraft in low Earth orbit are subject to the collision dangers of more than 500,000 pieces of space debris; tracking this debris allows the spacecraft to maneuver away from collision zones. Current tracking tactics are vulnerable to orbital variations of space debris clouds due to constantly changing astrodynamics subject to nonlinear celestial disturbances. What kind of algorithm did they use to improve safety? Deep Q Network or Artificial Neural Network
Tools are available that can convert machine learning models, which
are typically developed in high-level interpreted languages, into
standalone C/C++ code, which can be run on low-power embedded
devices.
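In MATLAB, for instance, a trained classification model can be saved and later compiled to standalone C with MATLAB Coder. This is a sketch; the variable and function names (classifier, myModel, predictFromModel) are hypothetical.

```matlab
% Save a trained model (e.g., an SVM held in the variable 'classifier')
% in a form compatible with C code generation.
saveLearnerForCoder(classifier, 'myModel');

% predictFromModel.m -- entry-point function for code generation:
%   function label = predictFromModel(x)
%       mdl = loadLearnerForCoder('myModel');
%       label = predict(mdl, x);
%   end
% Then generate a standalone C library:
%   codegen predictFromModel -args {zeros(1,4)} -config:lib
```

The generated code has no MATLAB runtime dependency, which is what makes low-power embedded targets feasible.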
With GPU Coder Support Package for NVIDIA GPUs, you can cross-
compile and deploy the generated CUDA code as a standalone
application on an embedded GPU such as the NVIDIA Drive platform
or the NVIDIA Jetson® board.
Just Remember…
Depending on your application, the level of validation required
before using the model in production will vary greatly. For example,
a model used in marketing to recommend ads will require
significantly less validation than a model used in a safety-critical
application. In such applications, models can be integrated with
existing validation processes such as hardware-in-the-loop (HIL) to
ensure the model runs as expected in the production environment.
Use the Classification Learner app to try different classifiers on your dataset quickly.
The Deep Network Designer app lets you build, visualize, and edit deep learning networks.
The images used in this example are from the CIFAR-10 dataset.
convnet = alexnet;
Workflow: bring in AlexNet → access layers → train the network → test the network.

Accessing the layers shows the network architecture; for example, layer 2:
2 'conv1' Convolution 96 11x11x3 convolutions with stride [4 4] and padding [0 0 0 0]
…
Step 3. Train
Set Up Training Data
If you are using CIFAR-10, you have the choice of 10 categories of objects. The categories in this example were randomly chosen; you can choose whichever categories work best for you.
Things to try:
Change the number 50 to as many training images as you would like to use. See how increasing the number of images changes the accuracy of the classifier.
Extract Features from the Training Set Images
Features are extracted through activations, which will pull the features learned from the CNN up to that point in the architecture. If you use a network trained on millions of images, such as AlexNet, you can expect that the features pulled from the network will be very rich, complex features that describe the objects.

featureLayer = 'fc7';
trainingFeatures = activations(convnet, trainingSet, featureLayer);
Train the SVM Classifier
fitcecoc is just one of the many classifiers available. It fits a multiclass SVM using error-correcting output codes. What about other fitting functions like fitcknn or fitcnb?

classifier = fitcecoc(trainingFeatures, trainingSet.Labels);
Step 4. Test
Set Up Test Data

rootFolder = 'cifar10Test';
testSet = imageDatastore(fullfile(rootFolder, categories), 'LabelSource', 'foldernames');
testSet.ReadFcn = @readFunctionTrain;

Extract Features from the Test Set Images, and Test the SVM Classifier

testFeatures = activations(convnet, testSet, featureLayer);
predictedLabels = predict(classifier, testFeatures);

Determine Overall Accuracy

confMat = confusionmat(testSet.Labels, predictedLabels);
confMat = confMat./sum(confMat,2);
mean(diag(confMat))
As we’ve seen, there are very few hard and fast rules when it comes to choosing the best algorithm for your project. Most algorithms are chosen through a process of trial and error to see what works best in any given situation.

Whether you end up with a traditional machine learning algorithm or a deep learning algorithm, MATLAB provides tools and support to get started with these techniques quickly.

Guess the Algorithm
A research and development organization restored arm and hand control to a quadriplegic man by processing signals from an electrode array implanted in his brain. What algorithm did they use? SVM or RNN
Machine Learning
Deep Learning
With just a few lines of MATLAB code, you can build deep learning
models without having to be an expert. Explore how MATLAB can help
you perform these deep learning tasks:
What did a researcher use to reduce radiation in CT scans? Convolutional Neural Network » Read the article
How did a researcher predict the path of space junk? Artificial Neural Network » Watch the presentation
What did the R&D organization use to restore limb control? Support Vector Machine » Read the story
© 2019 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See mathworks.com/trademarks for
a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.