
VERSION 1.0
28 MAY 2018
Common ML Algos
ALGORITHM CATEGORIZATION

1) Crunchers. These algorithms use small repetitive steps guided by simple rules to number-crunch a
complex problem. We give these algorithms the data, and they come back with an answer. If we don’t
like the answer, we give the algorithms more data to fine-tune their answer. Crunchers are good at
classifying customers, estimating project durations, and analyzing survey data to understand our
business culture.

2) Guides. These algorithms guide us on how to best navigate a policy, process, or workflow based on
historic actions that were successful. Guides are good at coordinating a lot of moving parts needed
to understand and execute things like risk management, strategic change, and complex project
management.

3) Advisors. These algorithms advise us on our best options by providing us with predictions, rankings,
and likelihood-of-success based on historic patterns. Advisors are good at advising us on decision-
making, planning, and risk mitigation.

4) Predictors. These algorithms predict future human behaviors and events by using small repeatable
decisions and judgments that interpret historic behaviors and events. Predictors are good at business
planning, market forecasting, brand management, health diagnosis, and predicting consumer
behaviors, brand attractiveness, fraud, marketing opportunities, weather events, and disease
outbreaks.

5) Tacticians. These algorithms tactically anticipate short-term behaviors and react accordingly. They
do this by applying a combination of short-term tactical rules along with information they learned
about the people involved. Tacticians are good at balancing supply chains, systems performance,
human capital workloads, and assembly lines.

6) Strategists. These algorithms strategically anticipate behaviors and plan accordingly. Strategists look
past the data uncovering insights and innovative opportunities. They do this by applying a
combination of short-term and long-term strategic rules along with information they learned about
the people involved and how these people react in various environments. Strategists are good at
forecasting market demand, customer attrition, human productivity, and employee attrition.

7) Lifters. These algorithms help us by automating our mundane and repetitive work freeing us to do
what we’ve been hired to do. These algorithms have some subject matter expertise allowing them to
do our analytical heavy lifting. Lifters are good at analyzing and recognizing repeatable patterns and
gaps in regulations, fraud, risks, improvements, transformations, opportunities, and innovations.

8) Partners. These algorithms bring out the best in us. They have a large amount of subject matter
expertise in our area allowing us to be more productive and more focused. Partners are good at
advising us, training us, keeping us up to date with market changes, and coordinating us and our
efforts daily, quarterly, and annually. Partners understand how we tick, from our behaviors to when
we should eat lunch to the temperature we like the air conditioning set at.

9) Okays. These algorithms have subject matter expertise in multiple areas allowing groups of us to do
all our foundational analytical work. Once the algorithms complete their analyses we each review the
work based on our own expertise and then okay the work. Okays are good at building the big picture
through deep analysis and looking at things from all angles. They are useful for business planning,
strategic change, and culture change.

10) Supervisors. These algorithms have key subject matter expertise for how our business works. They
manage us and our efforts so that we and the business stay healthy, productive, and financially
strong. These algorithms orchestrate us and all the other algorithms to help us meet our strategic
long-term objectives.
NAÏVE BAYES CLASSIFIER ALGORITHM
A classifier is a function that assigns each element of a population to one of the
available categories. Spam filtering, for instance, is a popular application of the Naïve
Bayes algorithm: the spam filter is a classifier that assigns the label “Spam” or “Not
Spam” to each email.

WHEN TO USE:
a. If you have a moderate or large training data set.
b. If the instances have several attributes.
c. If the attributes that describe the instances are conditionally independent,
given the class.

APPLICATIONS:
a) Sentiment Analysis- It is used at Facebook to analyse status updates expressing
positive or negative emotions.
b) Document Categorization- Google uses document classification to index documents
and compute relevancy scores such as PageRank. The PageRank mechanism considers
pages marked as important in databases that were parsed and classified using a
document classification technique.
c) Classifying News Articles- The algorithm is also used to categorize articles about
Technology, Entertainment, Sports, Politics, etc.
d) Email Spam Filtering- Google Mail uses the Naïve Bayes algorithm to classify your
emails as Spam or Not Spam.
ADVANTAGES:
a) Performs well when the input variables are categorical.
b) Converges faster and requires relatively little training data compared with
discriminative models such as logistic regression, when the conditional
independence assumption holds.
c) It makes it easy to predict the class of a test data set and is a good bet for
multi-class predictions as well.
d) Although it relies on the conditional independence assumption, the Naïve Bayes
classifier has shown good performance in various application domains.
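
As a rough illustration of the spam-filtering use case above, the sketch below trains a multinomial Naïve Bayes classifier with scikit-learn. The example emails and labels are invented for illustration; a real filter would be trained on a large labelled corpus.

# Minimal Naïve Bayes spam-filtering sketch (scikit-learn assumed available).
# The example emails and labels are made up for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win a free prize now",
    "limited offer claim your prize today",
    "meeting agenda for monday",
    "please review the attached project plan",
]
labels = ["Spam", "Spam", "Not Spam", "Not Spam"]

# Turn each email into word-count attributes; these are the attributes the
# conditional-independence assumption is made over.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

model = MultinomialNB()
model.fit(X, labels)

new_email = ["claim your free prize"]
print(model.predict(vectorizer.transform(new_email)))  # expected: ['Spam']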
K MEANS CLUSTERING ALGORITHM
K-Means is a non-deterministic and iterative method. The algorithm operates on a given
data set with a pre-defined number of clusters, k. The output of the K-Means algorithm is k
clusters, with the input data partitioned among them. For example, K-Means clustering can
be applied to group web pages that talk about similar concepts. Given the search query
“Jaguar”, the algorithm will group all web pages that talk about Jaguar as an animal into one
cluster, Jaguar as a car into another cluster, and so on.

APPLICATIONS:
The K-Means clustering algorithm is used by most search engines, such as Yahoo and
Google, to cluster web pages by similarity and to identify the ‘relevance rate’ of search
results. This helps search engines reduce computation time for users.

ADVANTAGES
a) In case of globular clusters, K-Means produces tighter clusters than hierarchical
clustering.
b) Given a smaller value of k, K-Means clustering computes faster than hierarchical
clustering for a large number of variables.
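
As a rough sketch of the “Jaguar” example above, the snippet below clusters a few toy page snippets into two groups with scikit-learn's KMeans; the page texts are invented for illustration.

# Minimal K-Means sketch: cluster short "web page" snippets by topic.
# The snippets are invented; a real system would use full page text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

pages = [
    "jaguar big cat habitat rainforest animal",
    "jaguar predator animal hunting in the jungle",
    "jaguar car new model engine speed",
    "jaguar luxury car price and performance review",
]

X = TfidfVectorizer().fit_transform(pages)

# k must be chosen up front; here we assume two topics (animal vs. car).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # e.g. [0 0 1 1]: animal pages in one cluster, car pages in the other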

SUPPORT VECTOR MACHINE LEARNING ALGORITHM


A Support Vector Machine (SVM) works by classifying data into different classes, finding a
line (hyperplane) that separates the training data set into those classes.

APPLICATIONS:
SVM is commonly used for stock market forecasting by various financial institutions.
For instance, it can be used to compare the relative performance of a stock against
other stocks in the same sector. This relative comparison of stocks helps in making
investment decisions based on the classifications made by the SVM learning
algorithm.

ADVANTAGES
a) SVM offers strong classification performance (accuracy) on the training data.
b) SVM tends to generalize well, classifying future data correctly.
c) SVM does not make strong assumptions about the data.
d) With an appropriate margin and kernel choice, it is less prone to over-fitting the data.
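
The sketch below shows the basic mechanics with scikit-learn's SVC; the two stock features (e.g. relative return and volatility) and the labels are invented for illustration, not a real trading signal.

# Minimal SVM classification sketch (scikit-learn assumed available).
# Each row is an invented "stock" described by two features; the labels say
# whether it out- or under-performed its sector.
from sklearn.svm import SVC

X = [[0.8, 0.2], [0.9, 0.1], [0.2, 0.9], [0.1, 0.8]]
y = ["outperform", "outperform", "underperform", "underperform"]

# A linear kernel searches for the separating hyperplane with the widest margin.
clf = SVC(kernel="linear")
clf.fit(X, y)

print(clf.predict([[0.7, 0.3]]))  # expected: ['outperform']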
APRIORI MACHINE LEARNING ALGORITHM
The Apriori algorithm is an unsupervised machine learning algorithm that generates
association rules from a given data set. An association rule implies that if an item A occurs,
then item B also occurs with a certain probability. Most of the association rules generated
are in the IF-THEN format. The basic principles on which the Apriori algorithm works are:

a) If an item set occurs frequently, then all subsets of that item set also occur
frequently.
b) If an item set occurs infrequently, then all supersets of that item set also occur
infrequently.
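
The two principles above can be sketched in a few lines of plain Python; the market-basket transactions and the minimum-support threshold below are invented for illustration, and production implementations also generate the IF-THEN rules from the frequent item sets.

# Minimal Apriori-style frequent-item-set sketch (plain Python).
# Transactions and min_support are invented illustration values.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]
min_support = 0.5  # fraction of transactions an item set must appear in

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]

# Grow item sets level by level; supersets of infrequent sets are never built,
# which is exactly pruning principle (b) described above.
level = frequent
while level:
    candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
    level = [c for c in candidates if support(c) >= min_support]
    frequent.extend(level)

for itemset in frequent:
    print(set(itemset), support(itemset))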

APPLICATIONS
a) Detecting Adverse Drug Reactions - Apriori is used for association analysis on
healthcare data such as the drugs taken by patients, the characteristics of each
patient, the adverse effects patients experience, initial diagnoses, etc.

b) Market Basket Analysis - Many e-commerce giants like Amazon use Apriori to draw
data insights on which products are likely to be purchased together and which are
most responsive to promotion.

c) Auto-Complete Applications - when the user types a word, Google autocomplete
looks for other associated words that people usually type after it.

ADVANTAGES
a) It is easy to implement and can be parallelized easily.
b) Apriori implementation makes use of large item set properties.

ARTIFICIAL NEURAL NETWORKS


The human brain is a highly complex, non-linear, parallel computer that organizes its
structural constituents, the neurons, into intricate interconnections. Take the simple
example of face recognition: whenever we meet a person we know, we can immediately
recall their name, where they work, or how they are related to us. We may know thousands
of people, yet the brain recognizes a face almost instantly. Now suppose a computer is
asked to perform this task instead. It is not an easy computation for the machine, because
it does not know the person. You have to teach the computer using images of different
people: if you know 10,000 people, you feed all 10,000 photographs into the computer.
Then, whenever you meet someone, you capture an image of that person and feed it to the
computer, which matches the photograph against the 10,000 photographs already in its
database. At the end of the computation, it returns the photograph that best resembles
the person.
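
A full face-recognition system is beyond a short example, but the sketch below shows the basic building block: one forward pass through a tiny feed-forward network in numpy. The weights are random for illustration; in practice they would be learned from labelled photographs.

# Minimal forward-pass sketch of a one-hidden-layer neural network (numpy).
# Weights are random placeholders, not trained values.
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

n_inputs, n_hidden, n_classes = 64, 32, 10   # e.g. a tiny 8x8 image, 10 known people

W1 = rng.normal(size=(n_inputs, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_classes))  # hidden -> output weights
b2 = np.zeros(n_classes)

x = rng.random(n_inputs)                     # one flattened "photograph"
hidden = relu(x @ W1 + b1)
probs = softmax(hidden @ W2 + b2)

print(probs.argmax())                        # index of the best-matching person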
LINEAR REGRESSION MACHINE LEARNING ALGORITHM
The Linear Regression algorithm shows the relationship between two variables and how a
change in one variable impacts the other. The algorithm shows the impact on the
dependent variable of changing the independent variable. The independent variables are
referred to as explanatory variables (or predictors), as they explain the factors that impact
the dependent variable. The dependent variable is often referred to as the factor of
interest (or response).

APPLICATIONS
a) Estimating Sales- Linear Regression finds great use in business for sales forecasting
based on trends. If a company observes a steady increase in sales every month, a
linear regression analysis of the monthly sales data helps the company forecast
sales in upcoming months (a sketch of this is shown after this list).
b) Risk Assessment- Linear Regression helps assess risk in the insurance or financial
domain. A health insurance company can do a linear regression analysis of the
number of claims per customer against age. This analysis can help the insurance
company find that older customers tend to make more insurance claims. Such
analysis results play a vital role in important business decisions made to account
for risk.
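
The sketch below works through the sales-forecasting example with scikit-learn; the monthly sales figures are invented for illustration.

# Minimal sales-forecasting sketch with linear regression (scikit-learn).
# The monthly sales figures are invented illustration values.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.array([[1], [2], [3], [4], [5], [6]])   # independent (explanatory) variable
sales = np.array([100, 112, 119, 131, 142, 150])    # dependent variable (factor of interest)

model = LinearRegression()
model.fit(months, sales)

print(model.coef_[0], model.intercept_)  # estimated monthly growth and baseline
print(model.predict([[7], [8]]))         # forecast sales for the next two months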
ADVANTAGES
a) It is one of the most interpretable machine learning algorithms, making it easy to
explain to others.
b) It is easy to use, as it requires minimal tuning.
c) It is one of the most widely used machine learning techniques, and it runs fast.

DECISION TREE MACHINE LEARNING ALGORITHM


A decision tree is a graphical representation that uses a branching methodology to
exemplify all possible outcomes of a decision, based on certain conditions.
APPLICATIONS
a) Decision trees are among the popular machine learning algorithms that find great
use in finance for option pricing.
b) Remote sensing is an application area for pattern recognition based on decision
trees.
c) Decision tree algorithms are used by banks to classify loan applicants by their
probability of defaulting on payments.

ADVANTAGES
a) Decision trees are very intuitive and can be explained to anyone with ease. People
from a non-technical background can also decipher the hypothesis drawn from a
decision tree, as it is self-explanatory.
b) When using decision tree machine learning algorithms, data type is not a constraint
as they can handle both categorical and numerical variables.
c) Decision tree machine learning algorithms do not require making any assumption
on the linearity in the data and hence can be used in circumstances where the
parameters are non-linearly related. These machine learning algorithms do not
make any assumptions on the classifier structure and space distribution.
d) These algorithms are useful in data exploration. Decision trees implicitly perform
feature selection, which is very important in predictive analytics. When a decision
tree is fit to a training dataset, the nodes at the top, on which the tree is split, are
considered the most important variables in the dataset, and feature selection is
completed by default.
e) Decision trees help save data preparation time, as they are not sensitive to missing
values and outliers. Missing values will not stop you from splitting the data when
building a decision tree. Outliers will also not affect decision trees, as data splitting
happens based on some samples within the split range and not on exact absolute
values.
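
The sketch below illustrates the loan-applicant use case with scikit-learn's decision tree; the applicant features (income, debt, years employed) and the default labels are invented for illustration.

# Minimal decision-tree sketch for classifying loan applicants (scikit-learn).
# Features are [income, debt, years_employed] in invented units; labels are invented too.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [
    [30, 20, 1],
    [80, 10, 6],
    [45, 35, 2],
    [95, 5, 10],
    [25, 30, 1],
    [70, 15, 8],
]
y = ["default", "repaid", "default", "repaid", "default", "repaid"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The printed tree is self-explanatory: the top split is the most informative feature.
print(export_text(tree, feature_names=["income", "debt", "years_employed"]))
print(tree.predict([[60, 12, 4]]))  # expected: ['repaid']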
