Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects

Ebook707 pages7 hours

Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects

Name: Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects
Author: David Nettleton
ISBN: 9780124166585

By David Nettleton

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Whether you are brand new to data mining or working on your tenth predictive analytics project, Commercial Data Mining will be there for you as an accessible reference outlining the entire process and related themes. In this book, you'll learn that your organization does not need a huge volume of data or a Fortune 500 budget to generate business using existing information assets. Expert author David Nettleton guides you through the process from beginning to end and covers everything from business objectives to data sources, and selection to analysis and predictive modeling.

Commercial Data Mining includes case studies and practical examples from Nettleton's more than 20 years of commercial experience. Real-world cases covering customer loyalty, cross-selling, and audience prediction in industries including insurance, banking, and media illustrate the concepts and techniques explained throughout the book.

Illustrates cost-benefit evaluation of potential projects
Includes vendor-agnostic advice on what to look for in off-the-shelf solutions as well as tips on building your own data mining tools
Approachable reference can be read from cover to cover by readers of all experience levels
Includes practical examples and case studies as well as actionable business insights from author's own experience

Skip carousel

LanguageEnglish

PublisherElsevier Science

Release dateJan 29, 2014

ISBN9780124166585

Author

David Nettleton

David F. Nettleton has more than 25 years of experience in IT system development, specializing in databases and data analysis. He has a Bachelor of Science degree in Computer Science, Master of Science degree in Computer Software and Systems Design and a Ph.D. in Artificial Intelligence. He has worked for IBM as a Business Intelligence Consultant, among other companies. In 1995 he founded his own consultancy dedicated to commercial data analysis projects, working in the Banking, Insurance, Media, Industry and Health Sectors. He has published over 40 articles and papers in journals, national and international congresses and magazines, and has given many presentations in conferences and workshops. He is currently a contract researcher at the Universitat Pompeu Fabra, Barcelona, Spain and at the IIIA-CSIC, Spain, specializing in data mining applied to online social networks and data privacy. Dr. Nettleton was born in England and lives in Barcelona, Spain since 1988.

Related authors

Skip carousel

Related to Commercial Data Mining

Related ebooks

Skip carousel

Making Big Data Work for Your Business: A guide to effective Big Data analytics
Ebook
Making Big Data Work for Your Business: A guide to effective Big Data analytics
bySudhi Sinha
Rating: 0 out of 5 stars
0 ratings
Fighting Churn with Data: The science and strategy of customer retention
Ebook
Fighting Churn with Data: The science and strategy of customer retention
byCarl Gold
Rating: 0 out of 5 stars
0 ratings
Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph
Ebook
Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph
byDavid Loshin
Rating: 5 out of 5 stars
5/5
Freemium Economics: Leveraging Analytics and User Segmentation to Drive Revenue
Ebook
Freemium Economics: Leveraging Analytics and User Segmentation to Drive Revenue
byEric Benjamin Seufert
Rating: 5 out of 5 stars
5/5
Big Data: Opportunities and challenges
Ebook
Big Data: Opportunities and challenges
byBCS, The Chartered Institute for IT
Rating: 0 out of 5 stars
0 ratings
It's Not the Size of the Data -- It's How You Use It: Smarter Marketing with Analytics and Dashboards
Ebook
It's Not the Size of the Data -- It's How You Use It: Smarter Marketing with Analytics and Dashboards
byKoen Pauwels
Rating: 3 out of 5 stars
3/5
Business Modeling and Data Mining
Ebook
Business Modeling and Data Mining
byDorian Pyle
Rating: 3 out of 5 stars
3/5
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
Ebook
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
byJared Dean
Rating: 3 out of 5 stars
3/5
Monetizing Your Data: A Guide to Turning Data into Profit-Driving Strategies and Solutions
Ebook
Monetizing Your Data: A Guide to Turning Data into Profit-Driving Strategies and Solutions
byKathy Williams Chiang
Rating: 0 out of 5 stars
0 ratings
Profit Driven Business Analytics: A Practitioner's Guide to Transforming Big Data into Added Value
Ebook
Profit Driven Business Analytics: A Practitioner's Guide to Transforming Big Data into Added Value
byWouter Verbeke
Rating: 0 out of 5 stars
0 ratings
Big Data for Executives and Market Professionals - Third Edition: Big Data
Ebook
Big Data for Executives and Market Professionals - Third Edition: Big Data
byJose Antonio Ribeiro Neto
Rating: 0 out of 5 stars
0 ratings
The Average is Always Wrong: A real-world guide to putting data at the heart of your business
Ebook
The Average is Always Wrong: A real-world guide to putting data at the heart of your business
byIan Shepherd
Rating: 0 out of 5 stars
0 ratings
Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses
Ebook
Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses
byMichele Chambers
Rating: 0 out of 5 stars
0 ratings
Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results
Ebook
Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results
byBernard Marr
Rating: 4 out of 5 stars
4/5
Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration
Ebook
Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration
byEarl Cox
Rating: 5 out of 5 stars
5/5
Guerrilla Analytics: A Practical Approach to Working with Data
Ebook
Guerrilla Analytics: A Practical Approach to Working with Data
byEnda Ridge
Rating: 5 out of 5 stars
5/5
The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits
Ebook
The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits
byRussell Glass
Rating: 0 out of 5 stars
0 ratings
Perspectives on Data Science for Software Engineering
Ebook
Perspectives on Data Science for Software Engineering
byTim Menzies
Rating: 5 out of 5 stars
5/5
Big Data Analytics for Cyber-Physical Systems: Machine Learning for the Internet of Things
Ebook
Big Data Analytics for Cyber-Physical Systems: Machine Learning for the Internet of Things
byGuido Dartmann
Rating: 0 out of 5 stars
0 ratings
Microsoft SQL Server 2005: A Beginner''s Guide
Ebook
Microsoft SQL Server 2005: A Beginner''s Guide
byDusan Petkovic
Rating: 0 out of 5 stars
0 ratings
DataOps A Complete Guide - 2020 Edition
Ebook
DataOps A Complete Guide - 2020 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Mastering Social Media Mining with Python
Ebook
Mastering Social Media Mining with Python
byMarco Bonzanini
Rating: 5 out of 5 stars
5/5
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Ebook
Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
byEric Siegel
Rating: 4 out of 5 stars
4/5
Big Data Analytics: Disruptive Technologies for Changing the Game
Ebook
Big Data Analytics: Disruptive Technologies for Changing the Game
byArvind Sathi
Rating: 4 out of 5 stars
4/5
Stratégie: Business Intelligence & Analytics
Ebook
Stratégie: Business Intelligence & Analytics
byDr. Anupama Rajesh
Rating: 0 out of 5 stars
0 ratings
Data Monetization A Complete Guide - 2021 Edition
Ebook
Data Monetization A Complete Guide - 2021 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Marketing Mix Modelling A Complete Guide - 2021 Edition
Ebook
Marketing Mix Modelling A Complete Guide - 2021 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Business Value in an Ocean of Data: Data Mining from a User Perspective
Ebook
Business Value in an Ocean of Data: Data Mining from a User Perspective
byBulcsú Fajszi
Rating: 0 out of 5 stars
0 ratings
Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions
Ebook
Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions
byMatt Taddy
Rating: 5 out of 5 stars
5/5
Learning Data Mining with Python - Second Edition
Ebook
Learning Data Mining with Python - Second Edition
byRobert Layton
Rating: 0 out of 5 stars
0 ratings

Enterprise Applications For You

Skip carousel

Mastering ChatGPT: Create Highly Effective Prompts, Strategies, and Best Practices to Go From Novice to Expert
Ebook
Mastering ChatGPT: Create Highly Effective Prompts, Strategies, and Best Practices to Go From Novice to Expert
byTJ Books
Rating: 3 out of 5 stars
3/5
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1
Ebook
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1
byKevin Clark
Rating: 5 out of 5 stars
5/5
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
byCea West
Rating: 4 out of 5 stars
4/5
Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
Ebook
Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
byJamshid Gharajedaghi
Rating: 4 out of 5 stars
4/5
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
Ebook
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
byDavid Mayer
Rating: 0 out of 5 stars
0 ratings
The Ridiculously Simple Guide to Google Docs: A Practical Guide to Cloud-Based Word Processing
Ebook
The Ridiculously Simple Guide to Google Docs: A Practical Guide to Cloud-Based Word Processing
byScott La Counte
Rating: 0 out of 5 stars
0 ratings
Scrivener For Dummies
Ebook
Scrivener For Dummies
byGwen Hernandez
Rating: 4 out of 5 stars
4/5
Enterprise AI For Dummies
Ebook
Enterprise AI For Dummies
byZachary Jarvinen
Rating: 3 out of 5 stars
3/5
Bitcoin For Dummies
Ebook
Bitcoin For Dummies
byPrypto
Rating: 4 out of 5 stars
4/5
The New Email Revolution: Save Time, Make Money, and Write Emails People Actually Want to Read!
Ebook
The New Email Revolution: Save Time, Make Money, and Write Emails People Actually Want to Read!
byRobert W Bly
Rating: 5 out of 5 stars
5/5
Create Income through Self-Publishing: An Author's Approach on Generating Wealth by Self-Publishing
Ebook
Create Income through Self-Publishing: An Author's Approach on Generating Wealth by Self-Publishing
bySimon Marlow
Rating: 5 out of 5 stars
5/5
Excel Formulas That Automate Tasks You No Longer Have Time For
Ebook
Excel Formulas That Automate Tasks You No Longer Have Time For
byErik Kopp
Rating: 5 out of 5 stars
5/5
QuickBooks 2024 All-in-One For Dummies
Ebook
QuickBooks 2024 All-in-One For Dummies
byStephen L. Nelson
Rating: 0 out of 5 stars
0 ratings
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
Ebook
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
byMaximus Wilson
Rating: 0 out of 5 stars
0 ratings
Excel 2016 For Dummies
Ebook
Excel 2016 For Dummies
byGreg Harvey
Rating: 4 out of 5 stars
4/5
Excel 2019 For Dummies
Ebook
Excel 2019 For Dummies
byGreg Harvey
Rating: 3 out of 5 stars
3/5
Microsoft Office 365 Bible: 10:1 Mastery | Excel in Your Profession, Enhance Time Management, and Foster Exceptional Collaboration [III EDITION]: Career Elevator
Ebook
Microsoft Office 365 Bible: 10:1 Mastery | Excel in Your Profession, Enhance Time Management, and Foster Exceptional Collaboration [III EDITION]: Career Elevator
byKevin Pitch
Rating: 5 out of 5 stars
5/5
QuickBooks 2023 All-in-One For Dummies
Ebook
QuickBooks 2023 All-in-One For Dummies
byStephen L. Nelson
Rating: 0 out of 5 stars
0 ratings
Essential Office 365 Third Edition: The Illustrated Guide to Using Microsoft Office
Ebook
Essential Office 365 Third Edition: The Illustrated Guide to Using Microsoft Office
byKevin Wilson
Rating: 3 out of 5 stars
3/5
50 Useful Excel Functions: Excel Essentials, #3
Ebook
50 Useful Excel Functions: Excel Essentials, #3
byM.L. Humphrey
Rating: 5 out of 5 stars
5/5
Change Management for Beginners: Understanding Change Processes and Actively Shaping Them
Ebook
Change Management for Beginners: Understanding Change Processes and Actively Shaping Them
bySteffen Lobinger
Rating: 5 out of 5 stars
5/5
Excel 2023 for Beginners: A Complete Quick Reference Guide from Beginner to Advanced with Simple Tips and Tricks to Master All Essential Fundamentals, Formulas, Functions, Charts, Tools, & Shortcuts
Ebook
Excel 2023 for Beginners: A Complete Quick Reference Guide from Beginner to Advanced with Simple Tips and Tricks to Master All Essential Fundamentals, Formulas, Functions, Charts, Tools, & Shortcuts
byTerry R. Hoffmann
Rating: 0 out of 5 stars
0 ratings
ConfigMgr - An Administrator's Guide to Deploying Applications using PowerShell
Ebook
ConfigMgr - An Administrator's Guide to Deploying Applications using PowerShell
byOwen Smith
Rating: 5 out of 5 stars
5/5
QuickBooks 2021 For Dummies
Ebook
QuickBooks 2021 For Dummies
byStephen L. Nelson
Rating: 0 out of 5 stars
0 ratings
Learn SQLite in 24 Hours
Ebook
Learn SQLite in 24 Hours
byAlex Nordeen
Rating: 0 out of 5 stars
0 ratings
Notion for Beginners: Notion for Work, Play, and Productivity
Ebook
Notion for Beginners: Notion for Work, Play, and Productivity
byJill Hamilton
Rating: 4 out of 5 stars
4/5
Excel Formulas and Functions 2020: Excel Academy, #1
Ebook
Excel Formulas and Functions 2020: Excel Academy, #1
byAdam Ramirez
Rating: 4 out of 5 stars
4/5
Microsoft Outlook 2016/2019/365 User Guide
Ebook
Microsoft Outlook 2016/2019/365 User Guide
byDave Tosh
Rating: 5 out of 5 stars
5/5
ChatGPT Side Hustles 2024 - Unlock the Digital Goldmine and Get AI Working for You Fast with More Than 85 Side Hustle Ideas to Boost Passive Income, Create New Cash Flow, and Get Ahead of the Curve
Ebook
ChatGPT Side Hustles 2024 - Unlock the Digital Goldmine and Get AI Working for You Fast with More Than 85 Side Hustle Ideas to Boost Passive Income, Create New Cash Flow, and Get Ahead of the Curve
byAlec Rowe
Rating: 0 out of 5 stars
0 ratings
Excel : The Complete Ultimate Comprehensive Step-By-Step Guide To Learn Excel Programming
Ebook
Excel : The Complete Ultimate Comprehensive Step-By-Step Guide To Learn Excel Programming
byKevin Clark
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

Skip carousel

Getting Technical about the Data Center Revolution with Jonathan Friedmann, CEO of Speedata
Podcast episode
Getting Technical about the Data Center Revolution with Jonathan Friedmann, CEO of Speedata
byMaking Data Simple
0 ratings
0% found this document useful
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
Podcast episode
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
byData Engineering Podcast
100%
100% found this document useful
[REPLAY] Alex Moazed – Building Modern Monopolies - [Invest Like the Best, EP.25]: My guest this week is Alex Moazed, the co-author of Modern Monopolies: What It Takes to Dominate the 21st Century Economy, which explores the platform business model (Uber, Airbnb, Github). Alex is also the founder and CEO of Applico, a company that he s
Podcast episode
[REPLAY] Alex Moazed – Building Modern Monopolies - [Invest Like the Best, EP.25]: My guest this week is Alex Moazed, the co-author of Modern Monopolies: What It Takes to Dominate the 21st Century Economy, which explores the platform business model (Uber, Airbnb, Github). Alex is also the founder and CEO of Applico, a company that he s
byInvest Like the Best with Patrick O'Shaughnessy
0 ratings
0% found this document useful
Massively Parallel Data Processing In Python Without The Effort Using Bodo: An interview about how Bodo converts standard Python code to native MPI automatically for massive speed ups in data processing workloads
Podcast episode
Massively Parallel Data Processing In Python Without The Effort Using Bodo: An interview about how Bodo converts standard Python code to native MPI automatically for massive speed ups in data processing workloads
byData Engineering Podcast
0 ratings
0% found this document useful
62: Cracking the Data Code w/ Mike Bugembe: Mike Bugembe is a speaker, consultant, and Amazon best selling author of the book Cracking the Data Code. He joins today’s podcast to talk about the things that you can do that will help create successful analytics projects. After being the...
Podcast episode
62: Cracking the Data Code w/ Mike Bugembe: Mike Bugembe is a speaker, consultant, and Amazon best selling author of the book Cracking the Data Code. He joins today’s podcast to talk about the things that you can do that will help create successful analytics projects. After being the...
byAnalytics on Fire
0 ratings
0% found this document useful
Unlocking The Power of Data Lineage In Your Platform with OpenLineage: An interview with Julien Le Dem about the OpenLineage specification and the opportunity that it offers for simplifying the tracking and analysis of data lineage across your data platform.
Podcast episode
Unlocking The Power of Data Lineage In Your Platform with OpenLineage: An interview with Julien Le Dem about the OpenLineage specification and the opportunity that it offers for simplifying the tracking and analysis of data lineage across your data platform.
byData Engineering Podcast
0 ratings
0% found this document useful
The Startup Inside IBM with Sachin Agarwal: Sachin Agarwal is the worldwide product management lead at IBM Aspera. He’s also an organizer at Empower Platform and the founder and CEO of Braid, a project management solution built inside Gmail, Google Calendar, and Slack. Previously, Sachin worked as
Podcast episode
The Startup Inside IBM with Sachin Agarwal: Sachin Agarwal is the worldwide product management lead at IBM Aspera. He’s also an organizer at Empower Platform and the founder and CEO of Braid, a project management solution built inside Gmail, Google Calendar, and Slack. Previously, Sachin worked as
byScreaming in the Cloud
0 ratings
0% found this document useful
Episode 19 (Python for Data Science - Python Files - Scripts and Modules)
Podcast episode
Episode 19 (Python for Data Science - Python Files - Scripts and Modules)
byHow to Data (Joshiverse- Journey of a Budding Data Scientist)
0 ratings
0% found this document useful
Revisiting The Technical And Social Benefits Of The Data Mesh: An interview with Zhamak Dehghani about her experience working with the community that has grown up around her idea of the data mesh and the lessons that she has learned.
Podcast episode
Revisiting The Technical And Social Benefits Of The Data Mesh: An interview with Zhamak Dehghani about her experience working with the community that has grown up around her idea of the data mesh and the lessons that she has learned.
byData Engineering Podcast
0 ratings
0% found this document useful
BDTP. B2B Marketing Attribution with Steffen Hedebrandt: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Steffen Hedebrandt, co-founder and CMO of Dreamdata. You'll learn about the typical customer journeys in B2B SaaS, which channels are the easiest to measure, tips for running marketing experiments, and more.
Podcast episode
BDTP. B2B Marketing Attribution with Steffen Hedebrandt: Today we have another episode of Better Done Than Perfect. Listen in as we talk with Steffen Hedebrandt, co-founder and CMO of Dreamdata. You'll learn about the typical customer journeys in B2B SaaS, which channels are the easiest to measure, tips for running marketing experiments, and more.
byUI Breakfast: UI/UX Design and Product Strategy
0 ratings
0% found this document useful
#122 How Organizations Can Bridge the Data Literacy Gap
Podcast episode
#122 How Organizations Can Bridge the Data Literacy Gap
byDataFramed
0 ratings
0% found this document useful
What does the D2C Model means for Shopping Centres and Retailers?: While the D2C selling model is becoming popular among brands and product companies, there are several supply chain challenges and complexities involved for businesses in this space. Logistics and supply chain management can make or break a...
Podcast episode
What does the D2C Model means for Shopping Centres and Retailers?: While the D2C selling model is becoming popular among brands and product companies, there are several supply chain challenges and complexities involved for businesses in this space. Logistics and supply chain management can make or break a...
byVoice on Demand - Retail Podcast by MECS+R
0 ratings
0% found this document useful
#77 Acing the Data Science Interview
Podcast episode
#77 Acing the Data Science Interview
byDataFramed
0 ratings
0% found this document useful
Four Must Read Books for Data and Analytics Leaders with Randy Bean, John Thompson, Cole Nussbaumer Knaflic, and Doug Laney: The data and analytics field is evolving rapidly. As is the role of the modern data and analytics leader. To keep pace with new trends, organizational best practices, and technologies, leaders need to make learning a priority, whether that’s reading or podcasting — or both! In this episode, Cindi is joined by four authors and thought leaders who are writing the next chapter of our industry. Randy Bean, Founder and CEO of New Vantage Partners, shares the value of experimentation and failure. John Thompson, Global Head of Advanced Analytics and AI at CSL Behring, shares insights into building successful analytics teams; Cole Nussbaumer Knaflic, Founder and CEO of storytelling with data, shares how leaders can better connect with audiences through story; and Doug Laney, Data and Analytics Innovation Fellow at West Monroe, shares how to monetize, manage and measure
Podcast episode
Four Must Read Books for Data and Analytics Leaders with Randy Bean, John Thompson, Cole Nussbaumer Knaflic, and Doug Laney: The data and analytics field is evolving rapidly. As is the role of the modern data and analytics leader. To keep pace with new trends, organizational best practices, and technologies, leaders need to make learning a priority, whether that’s reading or podcasting — or both! In this episode, Cindi is joined by four authors and thought leaders who are writing the next chapter of our industry. Randy Bean, Founder and CEO of New Vantage Partners, shares the value of experimentation and failure. John Thompson, Global Head of Advanced Analytics and AI at CSL Behring, shares insights into building successful analytics teams; Cole Nussbaumer Knaflic, Founder and CEO of storytelling with data, shares how leaders can better connect with audiences through story; and Doug Laney, Data and Analytics Innovation Fellow at West Monroe, shares how to monetize, manage and measure
byThe Data Chief
0 ratings
0% found this document useful
040: Graph Databases: Traditional relational databases like MySQL or Postgres are really good at providing many solutions to the problem of persisting state. But these types of database are really horrible at querying highly connected models in an efficient way. Graph datab...
Podcast episode
040: Graph Databases: Traditional relational databases like MySQL or Postgres are really good at providing many solutions to the problem of persisting state. But these types of database are really horrible at querying highly connected models in an efficient way. Graph datab...
byPHPRoundtable Podcast
0 ratings
0% found this document useful
This Week In Machine Learning & AI - 5/27/16: The White House on AI & Aggressive Self-Driving Cars: This Week in Machine Learning & AI brings you the…
Podcast episode
This Week In Machine Learning & AI - 5/27/16: The White House on AI & Aggressive Self-Driving Cars: This Week in Machine Learning & AI brings you the…
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
Raja Rajamannar (Mastercard) | Quantam Marketing
Podcast episode
Raja Rajamannar (Mastercard) | Quantam Marketing
byThe CMO Podcast
0 ratings
0% found this document useful
Marketing Automation Explained with Casey Cheshire, Founder & CMO of Cheshire Impact: Marketing automation has been around for a little while now, but few marketers fully optimize how they use this crucial technology. On this episode, we sat down with Casey Cheshire, Founder & CMO of Cheshire Impact to learn how to optimize a...
Podcast episode
Marketing Automation Explained with Casey Cheshire, Founder & CMO of Cheshire Impact: Marketing automation has been around for a little while now, but few marketers fully optimize how they use this crucial technology. On this episode, we sat down with Casey Cheshire, Founder & CMO of Cheshire Impact to learn how to optimize a...
byMarketing Trends
0 ratings
0% found this document useful
It’s Not a Data Science Problem, It’s a Data Engineering Problem with Laurie Voss: Laurie Voss is a senior data analyst at Netlify, makers of a serverless platform designed to help teams build, deploy, and collaborate on web apps more effectively. Previously, Laurie worked as Chief Data Officer at npm, Inc., co-founded Snowball Factory,
Podcast episode
It’s Not a Data Science Problem, It’s a Data Engineering Problem with Laurie Voss: Laurie Voss is a senior data analyst at Netlify, makers of a serverless platform designed to help teams build, deploy, and collaborate on web apps more effectively. Previously, Laurie worked as Chief Data Officer at npm, Inc., co-founded Snowball Factory,
byScreaming in the Cloud
0 ratings
0% found this document useful
SaaStr 181: How To Gain Enterprise Clients As A Startup, How To Approach Multi-Year and Prepaid Deals with Those Mega Companies & How To Balance Fast Growth Expectations with Profitability with Jerry Jao, Founder & CEO @ Retention Science: Jerry Jao is the Founder & CEO @ Retention Science, the startup that brings intelligence to your marketing automation through artificial intelligence that delivers a personalized customer experience, at scale. To date, Jerry has raised over $10m...
Podcast episode
SaaStr 181: How To Gain Enterprise Clients As A Startup, How To Approach Multi-Year and Prepaid Deals with Those Mega Companies & How To Balance Fast Growth Expectations with Profitability with Jerry Jao, Founder & CEO @ Retention Science: Jerry Jao is the Founder & CEO @ Retention Science, the startup that brings intelligence to your marketing automation through artificial intelligence that delivers a personalized customer experience, at scale. To date, Jerry has raised over $10m...
byThe Official SaaStr Podcast: SaaS | Founders | Investors
0 ratings
0% found this document useful
Tracking the (Money) Balls: How Data Science is Becoming a Game Changer: Data science is huge in sports, and it's not just game stats anymore. Player and ball tracking data are changing the way major sports leagues play games. We dive into how these data are analyzed and what the results mean to coaches and teams with Kirk G...
Podcast episode
Tracking the (Money) Balls: How Data Science is Becoming a Game Changer: Data science is huge in sports, and it's not just game stats anymore. Player and ball tracking data are changing the way major sports leagues play games. We dive into how these data are analyzed and what the results mean to coaches and teams with Kirk G...
byHarvard Data Science Review Podcast
0 ratings
0% found this document useful
Overcoming Supply Chain Disruptions with Shanna Greathouse and Tony Nichols: Overcoming Supply Chain Disruptions with Shanna Greathouse and Tony Nichols , , and Joe Lynch discuss overcoming supply chain disruptions. Tony and Shanna work at , a company that specializes in helping companies minimize supply chain risk. About Tony...
Podcast episode
Overcoming Supply Chain Disruptions with Shanna Greathouse and Tony Nichols: Overcoming Supply Chain Disruptions with Shanna Greathouse and Tony Nichols , , and Joe Lynch discuss overcoming supply chain disruptions. Tony and Shanna work at , a company that specializes in helping companies minimize supply chain risk. About Tony...
byThe Logistics of Logistics
0 ratings
0% found this document useful
Media Monitor – Samantha Monk, Director of AI, Meltwater – Monitoring Media and Mining Data to Spot Trends and Understand Your Competition in Business: Samantha Monk, director of AI at Meltwater (meltwater.com), an online media monitoring company, leads an informative discussion on the power of media monitoring. From news to social media, Meltwater monitors nearly everything that is relevant for...
Podcast episode
Media Monitor – Samantha Monk, Director of AI, Meltwater – Monitoring Media and Mining Data to Spot Trends and Understand Your Competition in Business: Samantha Monk, director of AI at Meltwater (meltwater.com), an online media monitoring company, leads an informative discussion on the power of media monitoring. From news to social media, Meltwater monitors nearly everything that is relevant for...
byFinding Genius Podcast
0 ratings
0% found this document useful
Source-to-Pay for the Public Sector – Grant Smith from Elcom
Podcast episode
Source-to-Pay for the Public Sector – Grant Smith from Elcom
byThe Procurement Software Podcast
0 ratings
0% found this document useful
Unlocking Logistics Profits with Matt McKinney: and discuss unlocking logistics profits. Matt is the Co-founder and CEO of , a modern transportation audit & pay company. About Matt McKinney is the Co-founder and CEO of Loop, a modern transportation audit & pay company. Loop's AI-driven...
Podcast episode
Unlocking Logistics Profits with Matt McKinney: and discuss unlocking logistics profits. Matt is the Co-founder and CEO of , a modern transportation audit & pay company. About Matt McKinney is the Co-founder and CEO of Loop, a modern transportation audit & pay company. Loop's AI-driven...
byThe Logistics of Logistics
0 ratings
0% found this document useful
Introducing Data Downtime: From Firefighting to Winning // Barr Moses // MLOps Coffee Sessions #19
Podcast episode
Introducing Data Downtime: From Firefighting to Winning // Barr Moses // MLOps Coffee Sessions #19
byMLOps.community
0 ratings
0% found this document useful
An AI and ML Look Ahead for 2019
Podcast episode
An AI and ML Look Ahead for 2019
byThe Cloudcast
0 ratings
0% found this document useful
Edan Krolewicz - Data for customer acquisition - The light and the dark side of the global data industry - S3 Ep12
Podcast episode
Edan Krolewicz - Data for customer acquisition - The light and the dark side of the global data industry - S3 Ep12
byChampagne Strategy
0 ratings
0% found this document useful
Bringing in the Content Moderation Auditors
Podcast episode
Bringing in the Content Moderation Auditors
byThe Lawfare Podcast
0 ratings
0% found this document useful
438. Insights: PPP - How digital providers helped rescue SMBs: Sam Maule is joined by some great guests to talk about PPP lending during the COVID-19 pandemic. On today's show we are taking a look at the process from the lenders perspective!
Podcast episode
438. Insights: PPP - How digital providers helped rescue SMBs: Sam Maule is joined by some great guests to talk about PPP lending during the COVID-19 pandemic. On today's show we are taking a look at the process from the lenders perspective!
byFintech Insider Podcast by 11:FS
0 ratings
0% found this document useful

Skip carousel

After Years of Challenges, Foursquare Has Found its Purpose -- and Profits
Entrepreneur
Article
After Years of Challenges, Foursquare Has Found its Purpose -- and Profits
Apr 1, 2017
8 min read
Why Your Business Should Get in on the Gamification Trend
Entrepreneur
Article
Why Your Business Should Get in on the Gamification Trend
Sep 1, 2013
5 min read
Machine Learning – With Zero Programming
APC
Article
Machine Learning – With Zero Programming
Aug 12, 2019
6 min read
5 Myths About Loyalty Marketing Busted!
NZ Marketing
Article
5 Myths About Loyalty Marketing Busted!
Nov 15, 2020
4 min read
Where’s The Money?
AdNews
Article
Where’s The Money?
Nov 20, 2019
14 min read
NOTHING IS FREE Data-driven Optimisation Unlocks Freemium Business Models’ Real Potential
The European Business Review
Article
NOTHING IS FREE Data-driven Optimisation Unlocks Freemium Business Models’ Real Potential
Sep 20, 2018
6 min read
Putting Artificial Intelligence to Work
Rotman Management
Article
Putting Artificial Intelligence to Work
May 1, 2018
11 min read
Pivoting To First-party Data
NZ Marketing
Article
Pivoting To First-party Data
Jun 9, 2021
5 min read
Quantum Computing’s DISRUPTION IN Finance Industry
Techfastly
Article
Quantum Computing’s DISRUPTION IN Finance Industry
Oct 1, 2021
5 min read
The Art Of Data: Present
NZ Marketing
Article
The Art Of Data: Present
Jun 9, 2021
2 min read
GDPR - Death Knell or a New Life for Customer Analytics?
The European Business Review
Article
GDPR - Death Knell or a New Life for Customer Analytics?
Sep 20, 2018
4 min read
Reimagining Manufacturing Operations After COVID-19
Techfastly
Article
Reimagining Manufacturing Operations After COVID-19
Nov 1, 2022
4 min read
Open Banking Is A Great British Success Story
MoneyWeek
Article
Open Banking Is A Great British Success Story
Feb 26, 2021
3 min read
“How Do You Launch A Product Without Alienating Or Damaging Your Customers?”
PC Pro Magazine
Article
“How Do You Launch A Product Without Alienating Or Damaging Your Customers?”
Feb 10, 2022
6 min read
Signals Of Change: how To Evolve For The New Global Reality
Rotman Management
Article
Signals Of Change: how To Evolve For The New Global Reality
May 1, 2022
11 min read
Inform And Enhance Your Business With Open Data
PC Pro Magazine
Article
Inform And Enhance Your Business With Open Data
Jun 10, 2021
7 min read
Harnessing Innovative Technology To Maximise Efficiency
Farmer's Weekly
Article
Harnessing Innovative Technology To Maximise Efficiency
Jan 20, 2023
The cannabis industry around the world has experienced tremendous growth in recent years, and new technology is playing a key role in this transformation. Here are the some of the innovative tools and systems adopted. • Artificial intelligence (Al) A
2 min read
The Three Types Of Big Data That Matter For Cmos
The European Business Review
Article
The Three Types Of Big Data That Matter For Cmos
May 22, 2018
4 min read
Digital Strategies For Proactive Supplier Risk Management
The European Business Review
Article
Digital Strategies For Proactive Supplier Risk Management
Oct 2, 2023
4 min read
Privacy Vs Personalisation: A Marketer’s Dilemma Or A Brand-positioning Choice?
NZ Marketing
Article
Privacy Vs Personalisation: A Marketer’s Dilemma Or A Brand-positioning Choice?
Sep 22, 2021
4 min read
Cognitive Enterprise
Techfastly
Article
Cognitive Enterprise
Dec 1, 2021
6 min read
Leadership Forum: Making Digital Transformation A Reality
Rotman Management
Article
Leadership Forum: Making Digital Transformation A Reality
Jan 1, 2018
Glenda Crisp Senior Vice President and Chief Data Officer, TD Bank Group + Connie Bonello Associate Partner, Financial Services, IBM Canada IN MOST OF TODAY’S ORGANIZATIONS, data underpins every transaction, operation and interaction. And yet, the ab
8 min read
Arnab PANDEY
Techfastly
Article
Arnab PANDEY
Apr 1, 2021
11 min read
A Practical Guide To Kick-starting Your Cyber Supply Chain Risk Programme
The European Business Review
Article
A Practical Guide To Kick-starting Your Cyber Supply Chain Risk Programme
Mar 27, 2024
5 min read
Harnessing Data And Research
NZ Marketing
Article
Harnessing Data And Research
Dec 8, 2023
4 min read
Corporate Listening Provides Insights And Business Value
The European Business Review
Article
Corporate Listening Provides Insights And Business Value
May 31, 2020
The voice of customers (VOC) is recognized as vitally important in marketing and corporate communication. VOC is a key source of market insights and intelligence as well as feedback in relation to products and services. Customer insights and feedback
6 min read
Enter the Industry 4.0 Era Today by Using “Dark Data” You Already Have
The European Business Review
Article
Enter the Industry 4.0 Era Today by Using “Dark Data” You Already Have
Aug 2, 2019
7 min read
Understanding 'Big Data' and What It Means to Your Business
Entrepreneur
Article
Understanding 'Big Data' and What It Means to Your Business
May 1, 2013
2 min read
BUILDING THE SMARTER FUTURE OF BANKING & FINANCIAL SERVICES
The European Business Review
Article
BUILDING THE SMARTER FUTURE OF BANKING & FINANCIAL SERVICES
Nov 25, 2021
4 min read
Sneak Peek
AdNews
Article
Sneak Peek
May 23, 2021
2 min read

Related categories

Skip carousel

Reviews for Commercial Data Mining

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Commercial Data Mining - David Nettleton

Commercial Data Mining

Processing, Analysis and Modeling for Predictive Analytics Projects

David Nettleton

Cover image

Title page

Copyright page

Acknowledgments

Chapter 1: Introduction

Abstract

Chapter 2: Business Objectives

Abstract

Introduction

Criteria for Choosing a Viable Project

Factors That Influence Project Benefits

Factors That Influence Project Costs

Example 1: Customer Call Center – Objective: IT Support for Customer Reclamations

Example 2: Online Music App – Objective: Determine Effectiveness of Advertising for Mobile Device Apps

Summary

Chapter 3: Incorporating Various Sources of Data and Information

Abstract

Introduction

Data about a Business’s Products and Services

Surveys and Questionnaires

Loyalty Card/Customer Card

Demographic Data

Macro-Economic Data

Data about Competitors

Financial Markets Data: Stocks, Shares, Commodities, and Investments

Chapter 4: Data Representation

Abstract

Introduction

Basic Data Representation

Advanced Data Representation

Chapter 5: Data Quality

Abstract

Introduction

Examples of Typical Data Problems

Relevance and Reliability

Quantitative Evaluation of the Data Quality

Data Extraction and Data Quality – Common Mistakes and How to Avoid Them

How Data Entry and Data Creation May Affect Data Quality

Chapter 6: Selection of Variables and Factor Derivation

Abstract

Introduction

Selection from the Available Data

Reverse Engineering: Selection by Considering the Desired Result

Data Mining Approaches to Selecting Variables

Packaged Solutions: Preselecting Specific Variables for a Given Business Sector

Summary

Chapter 7: Data Sampling and Partitioning

Abstract

Introduction

Sampling for Data Reduction

Partitioning the Data Based on Business Criteria

Issues Related to Sampling

Chapter 8: Data Analysis

Abstract

Introduction

Visualization

Associations

Clustering and Segmentation

Segmentation and Visualization

Analysis of Transactional Sequences

Analysis of Time Series

Typical Mistakes when Performing Data Analysis and Interpreting Results

Chapter 9: Data Modeling

Abstract

Introduction

Modeling Concepts and Issues

Neural Networks

Classification: Rule/Tree Induction

Traditional Statistical Models

Other Methods and Techniques for Creating Predictive Models

Applying the Models to the Data

Simulation Models – What If?

Summary of Modeling

Chapter 10: Deployment Systems: From Query Reporting to EIS and Expert Systems

Abstract

Introduction

Query and Report Generation

Executive Information Systems

Expert Systems

Case-Based Systems

Summary

Chapter 11: Text Analysis

Abstract

Basic Analysis of Textual Information

Advanced Analysis of Textual Information

Commercial Text Mining Products

Chapter 12: Data Mining from Relationally Structured Data, Marts, and Warehouses

Abstract

Introduction

Data Warehouse and Data Marts

Creating a File or Table for Data Mining

Chapter 13: CRM – Customer Relationship Management and Analysis

Abstract

Introduction

CRM Metrics and Data Collection

Customer Life Cycle

Example: Retail Bank

Integrated CRM Systems

Customer Satisfaction

Example CRM Application

Chapter 14: Analysis of Data on the Internet I – Website Analysis and Internet Search

Abstract

Chapter e14: Analysis of Data on the Internet I – Website Analysis and Internet Search

Abstract

Introduction

Analysis of Trails left by Visitors to a Website

Search and Synthesis of Market Sentiment Information on the Internet

Summary

Chapter 15: Analysis of Data on the Internet II – Search Experience Analysis

Abstract

Chapter e15: Analysis of Data on the Internet II – Search Experience Analysis

Abstract

Introduction

The Internet and Internet Search

Data Mining of a User Search Log

Summary

Chapter 16: Analysis of Data on the Internet III – Online Social Network Analysis

Abstract

Chapter e16: Analysis of Data on the Internet III – Online Social Network Analysis

Abstract

Introduction

Analysis of Online Social Network Graphs

Applications and Tools for Social Network Analysis

Summary

Chapter 17: Analysis of Data on the Internet IV – Search Trend Analysis over Time

Abstract

Chapter e17: Analysis of Data on the Internet IV – Search Trend Analysis over Time

Abstract

Introduction

Analysis of Search Term Trends Over Time

Data Mining Applied to Trend Data

Summary

Chapter 18: Data Privacy and Privacy-Preserving Data Publishing

Abstract

Introduction

Popular Applications and Data Privacy

Legal Aspects – Responsibility and Limits

Privacy-Preserving Data Publishing

Chapter 19: Creating an Environment for Commercial Data Analysis

Abstract

Introduction

Integrated Commercial Data Analysis Tools

Creating an Ad Hoc/Low-Cost Environment for Commercial Data Analysis

Chapter 20: Summary

Abstract

Appendix: Case Studies

Case Study 1: Customer Loyalty at an Insurance Company

Case Study 2: Cross-Selling a Pension Plan at a Retail Bank

Case Study 3: Audience Prediction for a Television Channel

Glossary

Bibliography

Index

Copyright

Acquiring Editor: Andrea Dierna

Editorial Project Manager: Kaitlin Herbert

Project Manager: Punithavathy Govindaradjane

Designer: Matthew Limbert

Morgan Kaufmann is an imprint of Elsevier

225 Wyman Street, Waltham, MA 02451, USA

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data

Nettleton, David, 1963-

Commercial data mining : processing, analysis and modeling for predictive analytics projects / David Nettleton.

pages cm

Includes bibliographical references and index.

ISBN 978-0-12-416602-8 (paperback)

1. Data mining. 2. Management–Mathematical models. 3. Management–Data processing. I. Title.

HD30.25.N48 2014

658′.056312–dc23

2013045341

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

ISBN: 978-0-12-416602-8

Printed and bound in the United States of America

14 15 16 17 18 10 9 8 7 6 5 4 3 2 1

For information on all Morgan Kaufmann publications Visit our Website at www.mkp.com

Acknowledgments

Dr. David Nettleton is a contract researcher at the Pompeu Fabra University, Barcelona, Catalonia, Spain, and at the IIIA-CSIC, Bellaterra, Catalonia, Spain.

I would first like to thank the reviewers of the book proposal: Xavier Navarro Arnal, Dr. Anton Dries (KU Leuven, Belgium), Tim Holden (Imagination Technologies, United Kingdom) and Colin Shearer (Advanced Analytic Solutions, IBM). Second, I would like to thank the reviewers of the book manuscript: Dr. Vicenç Torra (IIIA-CSIC, Catalonia, Spain), Tim Holden, Dr. Anton Dries, Joan Gómez Escofet (Autonomous University of Barcelona, Catalonia, Spain), and the anonymous reviewers for their suggestions, corrections, and constructive criticisms, which have helped enhance the content.

Third, I would like to thank the following people and companies for their permission to adapt and include material for the book: IBM, Newprosoft, Professor Ian Witten, Dr. Ricardo Baeza-Yates (Yahoo! Research and Pompeu Fabra University, Catalonia, Spain), Dr. Joan Codina-Filba (Pompeu Fabra University, Catalonia, Spain), Dra. Liliana Calderón Benavides (Autonomous University of Bucaramanga, Columbia), and Néstor Martinez.

I would also like to thank Kaitlin Herbert and Andrea Dierna of Morgan Kaufmann Publishers for their belief in this project and their support throughout the preparation of the book.

Finally, I would like to thank my supporting institutions: Pompeu Fabra University, Catalonia, Spain, and the IIIA-CSIC, Catalonia, Spain.

Chapter 1

Introduction

Abstract

The introduction commences with an overview of the readership, scope, and reason for the book, with reference to the complete cycle of a data mining project. Then a brief summary of each chapter is given, and finally some reading recommendations are provided.

Keywords

overview

chapter summaries

data mining

analysis

data

project cycle

This book is intended to benefit a wide audience, from those who have limited experience in commercial data analysis to those who already analyze commercial data, offering a vision of the whole process and its related topics. The author includes material from over 20 years of professional business experience as well as a diversity of research projects he was involved in, in order to enrich the content and give an original approach to commercial data analysis. In the appendix, practical case studies derived from real-world projects are used to illustrate the concepts and techniques that are explained throughout the book. Numerous references are included for those readers who wish to go into greater depth about a given topic.

Many of the methods, techniques, and ideas presented, such as data quality, data mart, customer relationship management, data sources, and Internet searches, can be applied by small business owners, freelance professionals, or medium to large-sized companies. The reader will see that it is not a prerequisite to have large volumes of data, and many tools used for data analysis are available for a nominal cost.

Although the steps in Chapter 2 through 10 can be carried out sequentially, note that, in practice, aspects such as data sources, data representation, and data quality are often carried out in parallel and reiteratively. This also applies to the variable/factor selection, analysis, and modeling steps. However, note that the better each step is performed, the fewer iterations will be necessary.

In order to obtain meaningful results, data analysis requires an attention to detail, an adequate project definition, meticulous preparation of the data, investigative capacity, patience, rigor, and objectives that are well defined from the beginning. If these requirements are taken together as a starting point, then a basis can be built from which a data warehouse is converted into a high-value asset. One of the motivators for data analysis is to realize a return on investment for the database infrastructures that many businesses have installed. Another is to gain competitive leverage and insight for products and services by better understanding the marketplace, including customer and competitor behavior.

The analysis and comprehension of business data are fundamental parts of all organizations. Monitoring national economies and retail sales tendencies depend on data analysis, as does measuring the profitability, costs, and competitiveness of commercial organizations and businesses. Analyzing customer data has become easier due to data management infrastructures that separate the operational data from the analytical data, and from Internet applications and cloud computing, which facilitate the gathering of large-volume historical data logs.

On the other hand, computer systems have swamped us with large volumes of data and information, much of which is irrelevant for a specific analysis objective. Also, customer behavior has become more complex due to the diversity of applications that compete in the marketplace, especially for mobile devices. Thus, the objective of data analysis should be that of discovering useful and meaningful knowledge and separating the relevant from the irrelevant.

Chapter 2 through 10 follow the sequential steps for a typical data mining project. A scheme of the organization of these chapters can be seen in Figure 1.1. Chapter 2, Business Objectives, discusses the definition of a data mining project, including its initial concept, motivation, business objectives, viability, estimated costs, and expected benefit (returns). Key considerations are defined and a way of quantifying the cost and benefit is presented in terms of the factors that most influence the project. Finally, two case studies illustrate how the cost/benefit evaluation can be applied to real-world projects.

Figure 1.1Relationship between chapters and the phases of a commercial data analysis project

Chapter 3, Incorporating Various Sources of Data and Information, discusses possible sources of data and information that can be used for a commercial data mining project and how to establish which data sources are available and can be accessed for a commercial data analysis project. Data sources include a business’s own internal data about its customers and its business activities, as well as external data that affects a business and its customers in different domains and in given sectors: competitive, demographic, and macro-economic.

Chapter 4 Data Representation, looks at the different ways data can be conceptualized in order to facilitate its interpretation and visualization. Visualization methods include pie charts, histograms, graph plots, and radar diagrams. The topics covered in this chapter include representation, comparison, and processing of different types of variables; principal types of variables (numerical, categorical ordinal, categorical nominal, binary); normalization of the values of a variable; distribution of the values of a variable; and identification of atypical values or outliers. The chapter also discusses some of the more advanced types of data representation, such as semantic networks and graphs.

Chapter 5, Data Quality, discusses data quality, which is a primary consideration for any commercial data analysis project. In this book the definition of quality includes the availability or accessibility of data. The chapter discusses typical problems that can occur with data, errors in the content of the data (especially textual data), and relevance and reliability of the data and addresses how to quantitatively evaluate data quality.

Chapter 6, Selection of Variables and Factor Derivation, considers the topics of variable selection and factor derivation, which are used in a later chapter for analysis and modeling. Often, key factors must be selected from a large number of variables, and to do this two starting points are considered: (i) data mining projects that are defined by looking at the available data, and (ii) data mining projects that are driven by considering what the final desired result is. The chapter also discusses techniques such as correlation and factor analysis.

Chapter 7, Data Sampling and Partitioning, discusses sampling and partitioning methods, which is often done when the volume of data is too great to process as a whole or when the analyst is interested in selecting data by specific criteria. The chapter considers different types of sampling, such as random sampling and sampling based on business criteria (age of client, length of time as client, etc.).

With Chapter 2 through 7 having laid the foundation for obtaining and defining a dataset for analysis, Chapter 8, Data Analysis, describes a selection of the most common types of data analysis for data mining. Data visualization is discussed, followed by clustering and how it can be combined with visualization techniques. The reader is also introduced to transactional analysis and time series analysis. Finally, the chapter considers some common mistakes made when analyzing and interpreting data.

Chapter 9, Modeling, begins with the definition of a data model and what its inputs and outputs are, then goes on to discuss concepts such as supervised and unsupervised learning, cross-validation, and how to evaluate the precision of modeling results. The chapter then considers various techniques for modeling data, from AI (artificial intelligence) approaches, such as neural networks and rule induction, to statistical techniques, such as regression. The chapter explains which techniques should be used for various modeling scenarios. It goes on to discuss how to apply models to real-world production data and how to evaluate and use the results. Finally, guidelines are given for how to perform and reiterate the modeling phase, especially when the initial results are not the desired or optimal ones.

Chapter 10, Deployment Systems: From Query Reporting to EIS and Expert Systems, discusses ways that the results of data mining can be fed into the decision-making and operative processes of the business.

Chapter 11 through 19 address various background topics and specific data mining domains. A scheme of the organization of these chapters can be seen in Figure 1.2.

Figure 1.2Chapter topics related to commercial data analysis and projects based on real-world cases

Chapter 11, Text Analysis, discusses both simple and more advanced text processing and text analysis: basic processing takes into account format checking based on pattern identification, and more advanced techniques consider named entity recognition, concept identification based on synonyms and hyponyms, and information retrieval concepts.

Chapter 12, Data Mining from Relationally Structured Data, Marts, and Warehouses, deals with extracting a data mining file from relational data. The chapter reviews the concepts of data mart and data warehouse and discusses how the informational data is separated from the operational data, then describes the path of extracting data from an operational environment into a data mart and finally into a unique file that can then be used as the starting point for data mining.

Chapter 13, CRM – Customer Relationship Management and Analysis, introduces the reader to the CRM approach in terms of recency, frequency, and latency of customer activity, and in terms of the client life cycle: capturing new clients, potentiating and retaining existing clients, and winning back ex-clients. The chapter goes on to discuss the characteristics of commercial CRM software products and provides examples and functionality from a simple CRM application.

Chapter 14, Analysis of Data on the Internet I – Website Analysis and Internet Search, first discusses how to analyze transactional data from customer visits to a website and then discusses how Internet search can be used as a market research tool.

Chapter 15, Analysis of Data on the Internet II – Search Experience Analysis, Chapter 16, Analysis of Data on the Internet III – Online Social Network Analysis, and Chapter 17, Analysis of Data on the Internet IV – Search Trend Analysis over Time, continue the discussion of data analysis on the Internet, going more in-depth on topics such as search experience analysis, online social network analysis, and search trend analysis over time.

Chapter 18, Data Privacy and Privacy-Preserving Data Publishing, addresses data privacy issues, which are important when collecting and analyzing data about individuals and organizations. The chapter discusses how well-known Internet applications deal with data privacy, how they inform users about using customer data on websites, and how cookies are used. The chapter goes on to discuss techniques used for anonymizing data so the data can be used in the public domain.

Chapter 19, Creating an Environment for Commercial Data Analysis, discusses how to create an environment for commercial data analysis in a company. The chapter begins with a discussion of powerful tools with high price tags, such as the IBM Intelligent Miner, the SAS Enterprise Miner, and the IBM SPSS Modeler, which are used by multinational companies, banks, insurance companies, large chain stores, and so on. It then addresses a low-cost and more artisanal approach, which consists of using ad hoc, or open source, software tools such as Weka and Spreadsheets.

Chapter 20, Summary, provides a synopsis of the chapters.

The appendix details three case studies that illustrate how the techniques and methods discussed throughout the book are applied in real-world situations. The studies include: (i) a customer loyalty project in the insurance industry, (ii) cross-selling a pension plan in the retail banking sector, and (iii) an audience prediction for a television channel.

For readers who wish to focus on business aspects, the following reading plan is recommended: Chapters 2, 3, 6, 8 (specifically the sections titled Visualization, Associations, and Segmentation and Visualization), 9, 10, 11 (specifically the sections titled Basic Analysis of Textual Information and Advanced Analysis of Textual Information), 13, 14, 18, 19, and the case studies in the appendix. For readers who want more technical details, the following chapters are recommended: Chapter 4 through 9, 11, 12, through 17, 19, and the case studies in the appendix.

The demographic data referred to in this book has been randomly anonymized and aggregated; that is, individual people cannot be identified by first and last names or by any unique or derivable identifier. Any resemblance to a real person or entity is purely coincidental.

Figure 1.3 illustrates the phases of a project. The cyclic nature of the phases of a detailed analysis is evident, including the modeling phase. The cyclic nature is evident when, for example, after obtaining the first results of a data analysis and modeling, different input factors are then chosen, or the business objective changes. (However, if the business objectives have been well chosen and the first phases have been performed correctly, there should be no need to go back and redefine the high-level analysis.)

Figure 1.3Life cycle of a typical commercial data analysis project (numbers correspond to chapters)

Chapter 2

Business Objectives

Abstract

This chapter discusses the definition of a data mining project, including its initial concept, motivation, objective, viability, estimated costs, and expected benefit (returns). Key considerations are defined, and a way of quantifying the cost and benefit is presented in terms of the factors that most influence the project. Two case studies illustrate how the cost/benefit evaluation can be applied to real-world projects.

Keywords

business objectives

cost

benefit

evaluation

influential factors

customer call center

mobile applications advertising

Introduction

A commercial data analysis project that lives up to its expectations will probably do so because sufficient time was dedicated at the outset to defining the project’s business objectives. What is meant by business objectives? The following are some examples:

• Reduce the loss of existing customers by 3 percent.

• Augment the contract signings of new customers by 2 percent.

• Augment the sales from cross-selling products to existing customers by 5 percent.

• Predict the television audience share with a probability of 70 percent.

• Predict, with a precision of 75 percent, which clients are most likely to contract a new product.

• Identify new categories of clients and products.

• Create a new customer segmentation model.

The first three examples define a specific percentage of precision and improvement as part of the objective.

Business Objective

Assigning a Value for Percent Improvement

The percentage improvement should always be considered with regard to the current precision of an existing index as a baseline. Also, the new precision objective should not get lost in the error bars of the current precision. That is, if the current precision has an error margin of ± 3% in its measurement or calculation, this should be taken into account.

In the fourth and fifth examples, an absolute value is specified for the desired precision for the data model. In the final two examples the desired improvement is not quantified; instead, the objective is expressed in qualitative terms.

Criteria for Choosing a Viable Project

This section enumerates some main issues and poses some key questions relevant to evaluating the viability of a potential data mining project. The checklists of general and specific considerations provided here are the bases for the rest of the chapter, which enters into a more detailed specification of benefit and cost criteria and applies these definitions to two case studies.

Evaluation of Potential Commercial Data Analysis Projects – General Considerations

The following is a list of questions to ask when considering a data analysis project:

• Is data available that is consistent and correlated with the business objectives?

• What is the capacity for improvement with respect to the current methods? (The greater the capacity for improvement, the greater the economic benefit.)

• Is there an operational business need for the project results?

• Can the problem be solved by other techniques or methods? (If the answer is no, the profitability return on the project will be greater.)

• Does the project have a well-defined scope? (If this is the first instance of a project of this type, reducing the scale of the project is recommended.)

Evaluation of Viability in Terms of Available Data – Specific Considerations

The following list provides specific considerations for evaluating the viability of a data mining project in terms of the available data:

• Does the necessary data for the business objectives exist, and does the business have access to it?

• If part or all of the data does not exist, can processes be defined to capture or obtain it?

• What is the coverage of the data with respect to the business objectives?

• What is the availability of a sufficient volume of data over a required period of time, for all clients, product types, sales channels, and so on? (The data should cover all the business factors to be analyzed and modeled. The historical data should cover the current business cycle.)

• Is it necessary to evaluate the quality of the available data in terms of reliability? (The reliability depends on the percentage of erroneous data and incomplete or missing data. The ranges of values must be sufficiently wide to cover all cases of interest.)

• Are people available who are familiar with the relevant data and the operational processes that generate the data?

Factors That Influence Project Benefits

There are several factors that influence the benefits of a project. A qualitative assessment of current functionality is first required: what is the current grade of satisfaction of how the task is being done? A value between 1 and 0 is assigned, where 1 is the highest grade of satisfaction and 0 is the lowest, where the lower the current grade of satisfaction, the greater the improvement and, consequently, the benefit, will be.

The potential quality of the result (the evaluation of future functionality) can be estimated by three aspects of the data: coverage, reliability, and correlation:

• The coverage or completeness of the data, assigned a value between 0 and 1, where 1 indicates total coverage.

• The quality or reliability of the data, assigned a value between 0 and 1, where 1 indicates the highest quality. (Both the coverage and the reliability are normally measured variable by variable, giving a total for the whole dataset. Good coverage and reliability for the data help to make the analysis a success, thus giving a greater benefit.)

• The correlation between the data and its grade of dependence with the business objective can be statistically measured. A correlation is typically measured as a value from –1 (total negative correlation) through 0 (no correlation) to 1 (total positive correlation). For example, if the business objective is that clients buy more products, the correlation would be calculated for each customer variable (age, time as a customer, zip code of postal address, etc.) with the customer’s sales volume.

Once individual values for coverage, reliability, and correlation are acquired, an estimation of the future functionality can be obtained using the formula:

An estimation of the possible improvement is then determined by calculating the difference between the current and the future functionality, thus:

A fourth aspect, volatility, concerns the amount of time the results of the analysis or data modeling will remain valid.

Volatility of the environment of the business objective can be defined as a value of between 0 and 1, where 0 = minimum volatility and 1 = maximum volatility. A high volatility can cause models and conclusions to become quickly out of date with respect to the data; even the business objective can lose relevance. Volatility depends on whether the results are applicable over the long, medium, or short terms with respect to the business cycle.

Note that this a priori evaluation gives an idea for the viability of a data mining project. However, it is clear that the quality and precision of the end result will also depend on how well the project is executed: analysis, modeling, implementation, deployment, and so on. The next section, which deals with the estimation of the cost of the project, includes a factor (expertise) that evaluates the availability of the people and skills necessary to guarantee the a posteriori success of the project.

Factors That Influence Project Costs

There are numerous factors that influence how much a project costs. These include:

• Accessibility: The more data sources, the higher the cost. Typically, there are at least two different data sources.

• Complexity: The greater the number of variables in the data, the greater the cost. Categorical-type variables (zones, product types, etc.) must especially be taken into account, given that each variable may have many possible values (for example, 50). On the other hand, there could be just 10 other variables, each of which has only two possible values.

• Data volumes: The more records there are in the data, the higher the cost. A data sample extracted from the complete dataset can have a volume of about 25,000 records, whereas the complete database could contain between 250,000 and 10 million records.

• Expertise: The more expertise available with respect to the data, the lower the cost. Expertise includes knowledge about the business environment, customers, and so on that facilitates the interpretation of the data. It also includes technical know-how about the data sources and the company databases from which the data is extracted.

Example 1: Customer Call Center – Objective: IT Support for Customer Reclamations

Mr. Strong is the operations manager of a customer call center that provides outsourced customer support for a diverse group of client companies. In the last quarter, he has detected an increase of reclamations by customers for erroneous billing by a specific company. By revising the bills and speaking with the client company, the telephone operators identified a defective batch software program in the batch billing process and reported the incident to Mr. Strong, who, together with the IT manager of the client company, located the defective process. He determined the origin of the problem, and the IT manager gave instructions to the IT department to make the necessary corrections to the billing software. The complete process, from identifying the incident to the corrective actions, was documented in the call center’s audit trail and the client company. Given the concern for the increase in incidents, Mr. Strong and the IT manager decided to initiate a data mining project to efficiently investigate reclamations due to IT processing errors and other causes.

Hypothetical values can be assigned to the factors that influence the benefit of this project, as follows: The available data has a high grade of correlation (0.9) with the business objective. Sixty-two percent of the incidents (which are subsequently identified as IT processing issues) are solved by the primary corrective actions; thus, the current grade of precision is 0.62. The data captured represents 85 percent of the modifications made to the IT processes, together with the relevant information at the time of the incident. The incidents, the corrections, and the effect of applying the corrections are entered into a spreadsheet, with a margin of error or omission of 8 percent. Therefore, the degree of coverage is 0.85 and the grade of reliability is (1 – 0.08) = 0.92.

The client company’s products and services that the call center supports have to be continually updated due to changes in their characteristics. This means that 10 percent of the products and services change completely over a one year period. Thus a degree of volatility of 0.10 is assigned. The project quality model, in terms of the factors related to benefit, is summarized as follows:

• Qualitative measure of the current functionality: 0.62 (medium)

• Evaluation of future functionality:

• Coverage: 0.85 (high)

• Reliability: 0.92 (high)

• Correlation of available data with business objective: 0.9 (high)

• Volatility of the environment of the business objective: 0.10 (low)

Values can now be assigned for factors that influence the cost of the project.

Mr. Strong’s operations department has an Oracle database that stores the statistical summaries of customer calls. Other historical records are kept in an Excel spreadsheet for the daily operations, diagnostics arising from reclamations, and corrective actions. Some of the records are used for operations monitoring. The IT manager of the client company has a DB2 database of software maintenance that the IT department has performed. Thus there are three data sources: the Oracle database, the data in the call center’s Excel spreadsheets, and the DB2 database from the client IT department.

There are about 100 variables represented in the three data sources, 25 of which the operations manager and the IT manager consider relevant for the data model. Twenty of the variables are numerical and five are categorical (service type, customer type, reclamation type, software correction type, and priority level). Note that the correlation value used to estimate the benefit and the future functionality is calculated as an average for the subset of the 25 variables evaluated as being the most relevant, and not the 100 original variables.

The operations manager and the IT manager agree that, with three years’ worth of historical data, the call center reclamations and IT processes can be modeled. It is clear that the business cycle does not have seasonal cycles; however, there is a temporal aspect due to peaks and troughs in the volume of customer calls at certain times of the year. Three years’ worth of data implies about 25,000 records from all three data sources. Thus the data volume is 25,000.

The operations manager and the IT manager can make time for specific questions related to the data, the operations, and the IT processes. The IT manager may also dedicate time to technical interpretation of the data in order to extract the required data from the data sources. Thus there is a high level of available expertise in relation to the data.

Factors that influence the project costs include:

• Accessibility: three data sources, with easy accessibility

• Complexity: 25 variables

• Data volume: 25,000 records

• Expertise: high

Overall Evaluation of the Cost and Benefit of Mr. Strong’s Project

In terms of benefit, the evaluation gives a quite favorable result, given that the current functionality

Enjoying the preview?

Page 1 of 1

Commercial Data Mining: Processing, Analysis and Modeling for Predictive Analytics Projects

About this ebook

David Nettleton

Related authors

Related to Commercial Data Mining

Related ebooks

Enterprise Applications For You

Related podcast episodes

Related articles

Related categories

Reviews for Commercial Data Mining

What did you think?

Book preview

Commercial Data Mining - David Nettleton

Table of Contents

Copyright

Acknowledgments

Abstract

Abstract

Introduction

Criteria for Choosing a Viable Project

Evaluation of Potential Commercial Data Analysis Projects – General Considerations

Evaluation of Viability in Terms of Available Data – Specific Considerations

Factors That Influence Project Benefits

Factors That Influence Project Costs

Example 1: Customer Call Center – Objective: IT Support for Customer Reclamations

Overall Evaluation of the Cost and Benefit of Mr. Strong’s Project