Practical Machine Learning Cookbook
()
About this ebook
- Implement a wide range of algorithms and techniques for tackling complex data
- Improve predictions and recommendations to have better levels of accuracy
- Optimize performance of your machine-learning systems
This book is for analysts, statisticians, and data scientists with knowledge of fundamentals of machine learning and statistics, who need help in dealing with challenging scenarios faced every day of working in the field of machine learning and improving system performance and accuracy. It is assumed that as a reader you have a good understanding of mathematics. Working knowledge of R is expected.
Related to Practical Machine Learning Cookbook
Related ebooks
Python: Deeper Insights into Machine Learning Rating: 0 out of 5 stars0 ratingsPython Data Analysis Cookbook Rating: 5 out of 5 stars5/5Python: Real World Machine Learning Rating: 0 out of 5 stars0 ratingsPython Machine Learning Cookbook Rating: 0 out of 5 stars0 ratingsBuilding Machine Learning Systems with Python Rating: 4 out of 5 stars4/5Machine Learning with R - Third Edition: Expert techniques for predictive modeling, 3rd Edition Rating: 0 out of 5 stars0 ratingsPython Data Visualization Cookbook - Second Edition Rating: 0 out of 5 stars0 ratingsLearning Predictive Analytics with Python Rating: 0 out of 5 stars0 ratingsmatplotlib Plotting Cookbook Rating: 5 out of 5 stars5/5Python Data Visualization Cookbook Rating: 4 out of 5 stars4/5Python Business Intelligence Cookbook Rating: 0 out of 5 stars0 ratingsMachine Learning with R Rating: 4 out of 5 stars4/5Machine Learning for Business: Using Amazon SageMaker and Jupyter Rating: 5 out of 5 stars5/5Python for Finance - Second Edition Rating: 2 out of 5 stars2/5Machine Learning in Action Rating: 0 out of 5 stars0 ratingsMachine Learning with R - Second Edition Rating: 5 out of 5 stars5/5Apache Spark for Data Science Cookbook Rating: 0 out of 5 stars0 ratingsNumPy: Beginner's Guide - Third Edition Rating: 4 out of 5 stars4/5Practical Data Analysis Cookbook Rating: 0 out of 5 stars0 ratingsOpenCV: Computer Vision Projects with Python Rating: 0 out of 5 stars0 ratingsMicrosoft Azure Machine Learning Rating: 4 out of 5 stars4/5TensorFlow Machine Learning Cookbook Rating: 4 out of 5 stars4/5Learning Data Mining with Python - Second Edition Rating: 0 out of 5 stars0 ratingsPractical Machine Learning for Data Analysis Using Python Rating: 0 out of 5 stars0 ratingsMastering Social Media Mining with Python Rating: 5 out of 5 stars5/5Deep Learning with Keras Rating: 5 out of 5 stars5/5Designing Machine Learning Systems with Python Rating: 0 out of 5 stars0 ratingsIntroducing Data Science: Big data, machine learning, and more, using Python tools Rating: 5 out of 5 stars5/5Mastering Python Data Analysis Rating: 0 out of 5 stars0 ratings
Computers For You
Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics Rating: 4 out of 5 stars4/5The Hacker Crackdown: Law and Disorder on the Electronic Frontier Rating: 4 out of 5 stars4/5101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters Rating: 4 out of 5 stars4/5Slenderman: Online Obsession, Mental Illness, and the Violent Crime of Two Midwestern Girls Rating: 4 out of 5 stars4/5Master Builder Roblox: The Essential Guide Rating: 4 out of 5 stars4/5The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology Rating: 0 out of 5 stars0 ratingsElon Musk Rating: 4 out of 5 stars4/5CompTIA IT Fundamentals (ITF+) Study Guide: Exam FC0-U61 Rating: 0 out of 5 stars0 ratingsSQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL Rating: 4 out of 5 stars4/5Ultimate Guide to Mastering Command Blocks!: Minecraft Keys to Unlocking Secret Commands Rating: 5 out of 5 stars5/5The Invisible Rainbow: A History of Electricity and Life Rating: 4 out of 5 stars4/5Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad Rating: 0 out of 5 stars0 ratingsMastering ChatGPT: 21 Prompts Templates for Effortless Writing Rating: 5 out of 5 stars5/5Learning the Chess Openings Rating: 5 out of 5 stars5/5Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are Rating: 4 out of 5 stars4/5Alan Turing: The Enigma: The Book That Inspired the Film The Imitation Game - Updated Edition Rating: 4 out of 5 stars4/5User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work, and Play Rating: 4 out of 5 stars4/5Storytelling with Data: Let's Practice! Rating: 4 out of 5 stars4/5CompTIA Security+ Practice Questions Rating: 2 out of 5 stars2/5Deep Search: How to Explore the Internet More Effectively Rating: 5 out of 5 stars5/5AP® Computer Science Principles Crash Course Rating: 0 out of 5 stars0 ratingsGarageBand Basics: The Complete Guide to GarageBand: Music Rating: 0 out of 5 stars0 ratingsDark Aeon: Transhumanism and the War Against Humanity Rating: 5 out of 5 stars5/5The Professional Voiceover Handbook: Voiceover training, #1 Rating: 5 out of 5 stars5/5
Reviews for Practical Machine Learning Cookbook
0 ratings0 reviews
Book preview
Practical Machine Learning Cookbook - Atul Tripathi
Table of Contents
Practical Machine Learning Cookbook
Credits
About the Author
About the Reviewer
www.PacktPub.com
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introduction to Machine Learning
What is machine learning?
An overview of classification
An overview of clustering
An overview of supervised learning
An overview of unsupervised learning
An overview of reinforcement learning
An overview of structured prediction
An overview of neural networks
An overview of deep learning
2. Classification
Introduction
Discriminant function analysis - geological measurements on brines from wells
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - transforming data
Step 4 - training the model
Step 5 - classifying the data
Step 6 - evaluating the model
Multinomial logistic regression - understanding program choices made by students
Getting ready
Step 1 - collecting data
How to do it...
Step 2 - exploring data
Step 3 - training the model
Step 4 - testing the results of the model
Step 5 - model improvement performance
Tobit regression - measuring the students' academic aptitude
Getting ready
Step 1 - collecting data
How to do it...
Step 2 - exploring data
Step 3 - plotting data
Step 4 - exploring relationships
Step 5 - training the model
Step 6 - testing the model
Poisson regression - understanding species present in Galapagos Islands
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - plotting data and testing empirical data
Step 4 - rectifying discretization of the Poisson model
Step 5 - training and evaluating the model using the link function
Step 6 - revaluating using the Poisson model
Step 7 - revaluating using the linear model
3. Clustering
Introduction
Hierarchical clustering - World Bank sample dataset
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - transforming data
Step 4 - training and evaluating the model performance
Step 5 - plotting the model
Hierarchical clustering - Amazon rainforest burned between 1999-2010
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - transforming data
Step 4 - training and evaluating model performance
Step 5 - plotting the model
Step 6 - improving model performance
Hierarchical clustering - gene clustering
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - transforming data
Step 4 - training the model
Step 5 - plotting the model
Binary clustering - math test
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - training and evaluating model performance
Step 4 - plotting the model
Step 5 - K-medoids clustering
K-means clustering - European countries protein consumption
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - clustering
Step 4 - improving the model
K-means clustering - foodstuff
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - transforming data
Step 4 - clustering
Step 5 - visualizing the clusters
4. Model Selection and Regularization
Introduction
Shrinkage methods - calories burned per day
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - building the model
Step 4 - improving the model
Step 5 - comparing the model
Dimension reduction methods - Delta's Aircraft Fleet
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - applying principal components analysis
Step 4 - scaling the data
Step 5 - visualizing in 3D plot
Principal component analysis - understanding world cuisine
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - preparing data
Step 4 - applying principal components analysis
5. Nonlinearity
Generalized additive models - measuring the household income of New Zealand
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - setting up the data for the model
Step 4 - building the model
Smoothing splines - understanding cars and speed
How to do it...
Step 1 - exploring the data
Step 2 - creating the model
Step 3 - fitting the smooth curve model
Step 4 - plotting the results
Local regression - understanding drought warnings and impact
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - collecting and exploring data
Step 3 - calculating the moving average
Step 4 - calculating percentiles
Step 5 - plotting results
6. Supervised Learning
Introduction
Decision tree learning - Advance Health Directive for patients with chest pain
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - preparing the data
Step 4 - training the model
Step 5- improving the model
Decision tree learning - income-based distribution of real estate values
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - training the model
Step 4 - comparing the predictions
Step 5 - improving the model
Decision tree learning - predicting the direction of stock movement
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - calculating the indicators
Step 4 - preparing variables to build datasets
Step 5 - building the model
Step 6 - improving the model
Naive Bayes - predicting the direction of stock movement
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - preparing variables to build datasets
Step 4 - building the model
Step 5 - creating data for a new, improved model
Step 6 - improving the model
Random forest - currency trading strategy
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - preparing variables to build datasets
Step 4 - building the model
Support vector machine - currency trading strategy
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - calculating the indicators
Step 4 - preparing variables to build datasets
Step 5 - building the model
Stochastic gradient descent - adult income
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - preparing the data
Step 4 - building the model
Step 5 - plotting the model
7. Unsupervised Learning
Introduction
Self-organizing map - visualizing of heatmaps
How to do it...
Step 1 - exploring data
Step 2 - training the model
Step 3 - plotting the model
Vector quantization - image clustering
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - data cleaning
Step 4 - visualizing cleaned data
Step 5 - building the model and visualizing it
8. Reinforcement Learning
Introduction
Markov chains - the stocks regime switching model
Getting ready
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - preparing the regression model
Step 4 - preparing the Markov-switching model
Step 5 - plotting the regime probabilities
Step 6 - testing the Markov switching model
Markov chains - the multi-channel attribution model
Getting ready
How to do it...
Step 1 - preparing the dataset
Step 2 - preparing the model
Step 3 - plotting the Markov graph
Step 4 - simulating the dataset of customer journeys
Step 5 - preparing a transition matrix heat map for real data
Markov chains - the car rental agency service
How to do it...
Step 1 - preparing the dataset
Step 2 - preparing the model
Step 3 - improving the model
Continuous Markov chains - vehicle service at a gas station
Getting ready
How to do it...
Step 1 - preparing the dataset
Step 2 - computing the theoretical resolution
Step 3 - verifying the convergence of a theoretical solution
Step 4 - plotting the results
Monte Carlo simulations - calibrated Hull and White short-rates
Getting ready
Step 1 - installing the packages and libraries
How to do it...
Step 2 - initializing the data and variables
Step 3 - pricing the Bermudan swaptions
Step 4 - constructing the spot term structure of interest rates
Step 5 - simulating Hull-White short-rates
9. Structured Prediction
Introduction
Hidden Markov models - EUR and USD
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - turning data into a time series
Step 4 - building the model
Step 5 - displaying the results
Hidden Markov models - regime detection
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - preparing the model
10. Neural Networks
Introduction
Modelling SP 500
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - calculating the indicators
Step 4 - preparing data for model building
Step 5 - building the model
Measuring the unemployment rate
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - preparing and verifying the models
Step 4 - forecasting and testing the accuracy of the models built
11. Deep Learning
Introduction
Recurrent neural networks - predicting periodic signals
Getting ready...
How to do it...
12. Case Study - Exploring World Bank Data
Introduction
Exploring World Bank data
Getting ready...
Step 1 - collecting and describing data
How to do it...
Step 2 - downloading the data
Step 3 - exploring data
Step 4 - building the models
Step 5 - plotting the models
13. Case Study - Pricing Reinsurance Contracts
Introduction
Pricing reinsurance contracts
Getting ready...
Step 1 - collecting and describing the data
How to do it...
Step 2 - exploring the data
Step 3 - calculating the individual loss claims
Step 4 - calculating the number of hurricanes
Step 5 - building predictive models
Step 6 - calculating the pure premium of the reinsurance contract
14. Case Study - Forecast of Electricity Consumption
Introduction
Getting ready
Step 1 - collecting and describing data
How to do it...
Step 2 - exploring data
Step 3 - time series - regression analysis
Step 4 - time series - improving regression analysis
Step 5 - building a forecasting model
Step 6 - plotting the forecast for a year
Practical Machine Learning Cookbook
Practical Machine Learning Cookbook
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2017
Production reference: 1070417
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78528-051-1
www.packtpub.com
Credits
About the Author
Atul Tripathi has spent more than 11 years in the fields of machine learning and quantitative finance. He has a total of 14 years of experience in software development and research. He has worked on advanced machine learning techniques, such as neural networks and Markov models. While working on these techniques, he has solved problems related to image processing, telecommunications, human speech recognition, and natural language processing. He has also developed tools for text mining using neural networks. In the field of quantitative finance, he has developed models for Value at Risk, Extreme Value Theorem, Option Pricing, and Energy Derivatives using Monte Carlo simulation techniques.
About the Reviewer
Ryota Kamoshida is the developer of the Python library MALSS (MAchine Learning Support System), (https://github.com/canard0328/malss) and now works as a senior researcher in the field of computer science at Hitachi, Ltd.
www.PacktPub.com
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785280511.
If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
Data in today’s world is the new black gold which is growing exponentially. This growth can be attributed to the growth of existing data, and new data in a structured and unstructured format from multiple sources such as social media, Internet, documents and the Internet of Things. The flow of data must be collected, processed, analyzed, and finally presented in real time to ensure that the consumers of the data are able to take informed decisions in today’s fast-changing environment. Machine learning techniques are applied to the data using the context of the problem to be solved to ensure that fast arriving and complex data can be analyzed in a scientific manner using statistical techniques. Using machine learning algorithms that iteratively learn from data, hidden patterns can be discovered. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt and learn to produce reliable decisions from new data sets.
We will start by introducing the various topics of machine learning, that will be covered in the book. Based on real-world challenges, we explore each of the topics under various chapters, such as Classification, Clustering, Model Selection and Regularization, Nonlinearity, Supervised Learning, Unsupervised Learning, Reinforcement Learning, Structured Prediction, Neural Networks, Deep Learning, and finally the case studies. The algorithms have been developed using R as the programming language. This book is friendly for beginners in R, but familiarity with R programming would certainly be helpful for playing around with the code.
You will learn how to make informed decisions about the type of algorithms you need to use and how to implement these algorithms to get the best possible results. If you want to build versatile applications that can make sense of images, text, speech, or some other form of data, this book on machine learning will definitely come to your rescue!
What this book covers
Chapter 1, Introduction to Machine Learning, covers various concepts about machine learning. This chapter makes the reader aware of the various topics we shall be covering in the book.
Chapter 2, Classification, covers the following topics and algorithms: discriminant function analysis, multinomial logistic regression, Tobit regression, and Poisson regression.
Chapter 3, Clustering, covers the following topics and algorithms: hierarchical clustering, binary clustering, and k-means clustering.
Chapter 4, Model Selection and Regularization, covers the following topics and algorithms: shrinkage methods, dimension reduction methods, and principal component analysis.
Chapter 5, Nonlinearity, covers the following topics and algorithms: generalized additive models, smoothing splines, local regression.
Chapter 6, Supervised Learning, covers the following topics and algorithms: decision tree learning, Naive Bayes, random forest, support vector machine, and stochastic gradient descent.
Chapter 7, Unsupervised Learning, covers the following topics and algorithms: self-organizing map, and vector quantization.
Chapter 8, Reinforcement Learning, covers the following topics and algorithms: Markov chains, and Monte Carlo simulations.
Chapter 9, Structured Prediction, covers the following topic and algorithms: hidden Markov models.
Chapter 10, Neural Networks, covers the following topic and algorithms: neural networks.
Chapter 11, Deep Learning, covers the following topic and algorithms: recurrent neural networks.
Chapter 12, Case Study - Exploring World Bank Data, covers World Bank data analysis.
Chapter 13, Case Study - Pricing Reinsurance Contracts, covers pricing reinsurance contracts.
Chapter 14, Case Study - Forecast of Electricity Consumption, covers forecasting electricity consumption.
What you need for this book
This book is focused on building machine learning-based applications in R. We have used R to build various solutions. We focused on how to utilize various R libraries and functions in the best possible way to overcome real-world challenges. We have tried to keep all the code as friendly and readable as possible. We feel that this will enable our readers to easily understand the code and readily use it in different scenarios.
Who this book is for
This book is for students and professionals working in the fields of statistics, data analytics, machine learning, and computer science, or other professionals who want to build real-world machine learning-based applications. This book is friendly to R beginners, but being familiar with R would be useful for playing around with the code. The will also be useful for experienced R programmers who are looking to explore machine learning techniques in their existing technology stacks.
Sections
In this book, you will find headings that appear frequently (Getting ready and How to do it).
To give clear instructions on how to complete a recipe, we use these sections as follows:
Getting ready
This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.
How to do it…
This section contains the steps required to follow the recipe.
Conventions
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We will be saving the data to the fitbit_details frame:
Any command-line input or output is written as follows:
install.packages(ggplot2
)
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Monte Carlo v/s Market n Zero Rates
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors .
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
You can also