Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $11.99/month after trial. Cancel anytime.

Practical Machine Learning Cookbook
Practical Machine Learning Cookbook
Practical Machine Learning Cookbook
Ebook907 pages5 hours

Practical Machine Learning Cookbook

Rating: 0 out of 5 stars

()

Read preview

About this ebook

About This Book
  • Implement a wide range of algorithms and techniques for tackling complex data
  • Improve predictions and recommendations to have better levels of accuracy
  • Optimize performance of your machine-learning systems
Who This Book Is For

This book is for analysts, statisticians, and data scientists with knowledge of fundamentals of machine learning and statistics, who need help in dealing with challenging scenarios faced every day of working in the field of machine learning and improving system performance and accuracy. It is assumed that as a reader you have a good understanding of mathematics. Working knowledge of R is expected.

LanguageEnglish
Release dateApr 14, 2017
ISBN9781785286537
Practical Machine Learning Cookbook

Related to Practical Machine Learning Cookbook

Related ebooks

Computers For You

View More

Related articles

Reviews for Practical Machine Learning Cookbook

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Practical Machine Learning Cookbook - Atul Tripathi

    Table of Contents

    Practical Machine Learning Cookbook

    Credits

    About the Author

    About the Reviewer

    www.PacktPub.com

    Why subscribe?

    Customer Feedback

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Sections

    Getting ready

    How to do it…

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Downloading the color images of this book 

    Errata

    Piracy

    Questions

    1. Introduction to Machine Learning

    What is machine learning?

    An overview of classification

    An overview of clustering

    An overview of supervised learning

    An overview of unsupervised learning

    An overview of reinforcement learning

    An overview of structured prediction

    An overview of neural networks

    An overview of deep learning

    2. Classification

    Introduction

    Discriminant function analysis - geological measurements on brines from wells

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - transforming data

    Step 4 - training the model

    Step 5 - classifying the data

    Step 6 - evaluating the model

    Multinomial logistic regression - understanding program choices made by students

    Getting ready

    Step 1 - collecting data

    How to do it...

    Step 2 - exploring data

    Step 3 - training the model

    Step 4 - testing the results of the model

    Step 5 - model improvement performance

    Tobit regression - measuring the students' academic aptitude

    Getting ready

    Step 1 - collecting data

    How to do it...

    Step 2 - exploring data

    Step 3 - plotting data

    Step 4 - exploring relationships

    Step 5 - training the model

    Step 6 - testing the model

    Poisson regression - understanding species present in Galapagos Islands

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - plotting data and testing empirical data

    Step 4 - rectifying discretization of the Poisson model

    Step 5 - training and evaluating the model using the link function

    Step 6 - revaluating using the Poisson model

    Step 7 - revaluating using the linear model

    3. Clustering

    Introduction

    Hierarchical clustering - World Bank sample dataset

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - transforming data

    Step 4 - training and evaluating the model performance

    Step 5 - plotting the model

    Hierarchical clustering - Amazon rainforest burned between 1999-2010

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - transforming data

    Step 4 - training and evaluating model performance

    Step 5 - plotting the model

    Step 6 - improving model performance

    Hierarchical clustering - gene clustering

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - transforming data

    Step 4 - training the model

    Step 5 - plotting the model

    Binary clustering - math test

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - training and evaluating model performance

    Step 4 - plotting the model

    Step 5 - K-medoids clustering

    K-means clustering - European countries protein consumption

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - clustering

    Step 4 - improving the model

    K-means clustering - foodstuff

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - transforming data

    Step 4 - clustering

    Step 5 - visualizing the clusters

    4. Model Selection and Regularization

    Introduction

    Shrinkage methods - calories burned per day

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - building the model

    Step 4 - improving the model

    Step 5 - comparing the model

    Dimension reduction methods - Delta's Aircraft Fleet

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - applying principal components analysis

    Step 4 - scaling the data

    Step 5 - visualizing in 3D plot

    Principal component analysis - understanding world cuisine

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - preparing data

    Step 4 - applying principal components analysis

    5. Nonlinearity

    Generalized additive models - measuring the household income of New Zealand

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - setting up the data for the model

    Step 4 - building the model

    Smoothing splines - understanding cars and speed

    How to do it...

    Step 1 - exploring the data

    Step 2 - creating the model

    Step 3 - fitting the smooth curve model

    Step 4 - plotting the results

    Local regression - understanding drought warnings and impact

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - collecting and exploring data

    Step 3 - calculating the moving average

    Step 4 - calculating percentiles

    Step 5 - plotting results

    6. Supervised Learning

    Introduction

    Decision tree learning - Advance Health Directive for patients with chest pain

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - preparing the data

    Step 4 - training the model

    Step 5- improving the model

    Decision tree learning - income-based distribution of real estate values

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - training the model

    Step 4 - comparing the predictions

    Step 5 - improving the model

    Decision tree learning - predicting the direction of stock movement

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - calculating the indicators

    Step 4 - preparing variables to build datasets

    Step 5 - building the model

    Step 6 - improving the model

    Naive Bayes - predicting the direction of stock movement

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - preparing variables to build datasets

    Step 4 - building the model

    Step 5 - creating data for a new, improved model

    Step 6 - improving the model

    Random forest - currency trading strategy

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - preparing variables to build datasets

    Step 4 - building the model

    Support vector machine - currency trading strategy

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - calculating the indicators

    Step 4 - preparing variables to build datasets

    Step 5 - building the model

    Stochastic gradient descent - adult income

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - preparing the data

    Step 4 - building the model

    Step 5 - plotting the model

    7. Unsupervised Learning

    Introduction

    Self-organizing map - visualizing of heatmaps

    How to do it...

    Step 1 - exploring data

    Step 2 - training the model

    Step 3 - plotting the model

    Vector quantization - image clustering

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - data cleaning

    Step 4 - visualizing cleaned data

    Step 5 - building the model and visualizing it

    8. Reinforcement Learning

    Introduction

    Markov chains - the stocks regime switching model

    Getting ready

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - preparing the regression model

    Step 4 - preparing the Markov-switching model

    Step 5 - plotting the regime probabilities

    Step 6 - testing the Markov switching model

    Markov chains - the multi-channel attribution model

    Getting ready

    How to do it...

    Step 1 - preparing the dataset

    Step 2 - preparing the model

    Step 3 - plotting the Markov graph

    Step 4 - simulating the dataset of customer journeys

    Step 5 - preparing a transition matrix heat map for real data

    Markov chains - the car rental agency service

    How to do it...

    Step 1 - preparing the dataset

    Step 2 - preparing the model

    Step 3 - improving the model

    Continuous Markov chains - vehicle service at a gas station

    Getting ready

    How to do it...

    Step 1 - preparing the dataset

    Step 2 - computing the theoretical resolution

    Step 3 - verifying the convergence of a theoretical solution

    Step 4 - plotting the results

    Monte Carlo simulations - calibrated Hull and White short-rates

    Getting ready

    Step 1 - installing the packages and libraries

    How to do it...

    Step 2 - initializing the data and variables

    Step 3 - pricing the Bermudan swaptions

    Step 4 - constructing the spot term structure of interest rates

    Step 5 - simulating Hull-White short-rates

    9. Structured Prediction

    Introduction

    Hidden Markov models - EUR and USD

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - turning data into a time series

    Step 4 - building the model

    Step 5 - displaying the results

    Hidden Markov models - regime detection

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - preparing the model

    10. Neural Networks

    Introduction

    Modelling SP 500

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - calculating the indicators

    Step 4 - preparing data for model building

    Step 5 - building the model

    Measuring the unemployment rate

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - preparing and verifying the models

    Step 4 - forecasting and testing the accuracy of the models built

    11. Deep Learning

    Introduction

    Recurrent neural networks - predicting periodic signals

    Getting ready...

    How to do it...

    12. Case Study - Exploring World Bank Data

    Introduction

    Exploring World Bank data

    Getting ready...

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - downloading the data

    Step 3 - exploring data

    Step 4 - building the models

    Step 5 - plotting the models

    13. Case Study - Pricing Reinsurance Contracts

    Introduction

    Pricing reinsurance contracts

    Getting ready...

    Step 1 - collecting and describing the data

    How to do it...

    Step 2 - exploring the data

    Step 3 - calculating the individual loss claims

    Step 4 - calculating the number of hurricanes

    Step 5 - building predictive models

    Step 6 - calculating the pure premium of the reinsurance contract

    14. Case Study - Forecast of Electricity Consumption

    Introduction

    Getting ready

    Step 1 - collecting and describing data

    How to do it...

    Step 2 - exploring data

    Step 3 - time series - regression analysis

    Step 4 - time series - improving regression analysis

    Step 5 - building a forecasting model

    Step 6 - plotting the forecast for a year

    Practical Machine Learning Cookbook


    Practical Machine Learning Cookbook

    Copyright © 2017 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: April 2017

    Production reference: 1070417

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham 

    B3 2PB, UK.

    ISBN 978-1-78528-051-1

    www.packtpub.com

    Credits

    About the Author

    Atul Tripathi has spent more than 11 years in the fields of machine learning and quantitative finance. He has a total of 14 years of experience in software development and research. He has worked on advanced machine learning techniques, such as neural networks and Markov models. While working on these techniques, he has solved problems related to image processing, telecommunications, human speech recognition, and natural language processing. He has also developed tools for text mining using neural networks. In the field of quantitative finance, he has developed models for Value at Risk, Extreme Value Theorem, Option Pricing, and Energy Derivatives using Monte Carlo simulation techniques.

    About the Reviewer

    Ryota Kamoshida is the developer of the Python library MALSS (MAchine Learning Support System), (https://github.com/canard0328/malss) and now works as a senior researcher in the field of computer science at Hitachi, Ltd.

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Customer Feedback

    Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785280511.

    If you'd like to join our team of regular reviewers, you can e-mail us at customerreviews@packtpub.com. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

    Preface

    Data in today’s world is the new black gold which is growing exponentially. This growth can be attributed to the growth of existing data, and new data in a structured and unstructured format from multiple sources such as social media, Internet, documents and the Internet of Things. The flow of data must be collected, processed, analyzed, and finally presented in real time to ensure that the consumers of the data are able to take informed decisions in today’s fast-changing environment. Machine learning techniques are applied to the data using the context of the problem to be solved to ensure that fast arriving and complex data can be analyzed in a scientific manner using statistical techniques. Using machine learning algorithms that iteratively learn from data, hidden patterns can be discovered. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt and learn to produce reliable decisions from new data sets.

    We will start by introducing the various topics of machine learning, that will be covered in the book. Based on real-world challenges, we explore each of the topics under various chapters, such as Classification, Clustering, Model Selection and Regularization, Nonlinearity, Supervised Learning, Unsupervised Learning, Reinforcement Learning, Structured Prediction, Neural Networks, Deep Learning, and finally the case studies. The algorithms have been developed using R as the programming language. This book is friendly for beginners in R, but familiarity with R programming would certainly be helpful for playing around with the code.

    You will learn how to make informed decisions about the type of algorithms you need to use and how to implement these algorithms to get the best possible results. If you want to build versatile applications that can make sense of images, text, speech, or some other form of data, this book on machine learning will definitely come to your rescue!

    What this book covers

    Chapter 1, Introduction to Machine Learning, covers various concepts about machine learning. This chapter makes the reader aware of the various topics we shall be covering in the book.

    Chapter 2, Classification, covers the following topics and algorithms: discriminant function analysis, multinomial logistic regression, Tobit regression, and Poisson regression.

    Chapter 3, Clustering, covers the following topics and algorithms: hierarchical clustering, binary clustering, and k-means clustering.

    Chapter 4, Model Selection and Regularization, covers the following topics and algorithms: shrinkage methods, dimension reduction methods, and principal component analysis.

    Chapter 5, Nonlinearity, covers the following topics and algorithms: generalized additive models, smoothing splines, local regression.

    Chapter 6, Supervised Learning, covers the following topics and algorithms: decision tree learning, Naive Bayes, random forest, support vector machine, and stochastic gradient descent.

    Chapter 7, Unsupervised Learning, covers the following topics and algorithms: self-organizing map, and vector quantization.

    Chapter 8, Reinforcement Learning, covers the following topics and algorithms: Markov chains, and Monte Carlo simulations.

    Chapter 9, Structured Prediction, covers the following topic and algorithms: hidden Markov models.

    Chapter 10, Neural Networks, covers the following topic and algorithms: neural networks.

    Chapter 11, Deep Learning, covers the following topic and algorithms:  recurrent neural networks.

    Chapter 12, Case Study - Exploring World Bank Data, covers World Bank data analysis.

    Chapter 13, Case Study - Pricing Reinsurance Contracts, covers pricing reinsurance contracts.

    Chapter 14, Case Study - Forecast of Electricity Consumption, covers forecasting electricity consumption.

    What you need for this book

    This book is focused on building machine learning-based applications in R. We have used R to build various solutions. We focused on how to utilize various R libraries and functions in the best possible way to overcome real-world challenges. We have tried to keep all the code as friendly and readable as possible. We feel that this will enable our readers to easily understand the code and readily use it in different scenarios.

    Who this book is for

    This book is for students and professionals working in the fields of statistics, data analytics, machine learning, and computer science, or other professionals who want to build real-world machine learning-based applications. This book is friendly to R beginners, but being familiar with R would be useful for playing around with the code. The will also be useful for experienced R programmers who are looking to explore machine learning techniques in their existing technology stacks.

    Sections

    In this book, you will find headings that appear frequently (Getting ready and How to do it).

    To give clear instructions on how to complete a recipe, we use these sections as follows:

    Getting ready

    This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

    How to do it…

    This section contains the steps required to follow the recipe.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: We will be saving the data to the fitbit_details frame:

    Any command-line input or output is written as follows:

    install.packages(ggplot2)

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: Monte Carlo v/s Market n Zero Rates

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail feedback@packtpub.com, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors .

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support  and register to have the files e-mailed directly to you.

    You can download the code files by following these steps:

    Log in or register to our website using your e-mail address and password.

    Hover the mouse pointer on the SUPPORT tab at the top.

    Click on Code Downloads & Errata.

    Enter the name of the book in the Search box.

    Select the book for which you're looking to download the code files.

    Choose from the drop-down menu where you purchased this book from.

    Click on Code Download.

    You can also

    Enjoying the preview?
    Page 1 of 1