
Big Data: How to Convert the Big Hype into Big Value with Analytics

Colin White, BI Research, August 29, 2012



Harriet Fryman, Director, IBM

Colin White, President, BI Research

The Evolution of Electronic Data

First OLTP systems

Early decision support products

First commercial RDBMSs

Early data warehousing

Big data and Big analytics

Optimized systems

Universal systems

Optimized systems

Copyright BI Research, 2012

What are Big Data and Big Analytics?

Represent next generation data management and analytic solutions that could not previously be supported because of:
Limited or incomplete information

Technology limitations


The Next Generation of Innovation

Three important big data management advances:

Analytic RDBMSs
Non-relational systems, e.g., Hadoop HDFS and MapReduce
Stream processing systems

Three important big analytics advances:

New and improved analytical techniques
Enhanced data visualization and exploration
Analytics-driven automated decision management

"I hope we're not going to have the same old argument"

Thanks to Harriet Fryman of IBM!


Big Analytics

New & Improved Analytical Techniques

Uncover new patterns
New analytic algorithms for new types of data
Analysis of text, images, and video streams
Enhanced predictive modeling and analysis
In-database analytic functions

Enhanced Visualization & Exploration

Find new answers
Interactive visual data discovery
New forms of visualization for large data volumes and new types of data
Deliver data and results to users ranging from business managers to data scientists
Consumer-like user interaction & experience

Automated decision management

Improve outcomes
Analytics-driven business processes
Place new insight at the point of impact
Automate transactional decisions with greater accuracy

Big data management systems

Source: Based on a visual from IBM


Big Analytics Example: Customer Marketing

Blend data
Internal & External Data
Retail measurement POS data
Consumer panel household data
Customer demographics
Customer purchase behavior
Customer billing data
Customer satisfaction data
Customer market research data
Third-party data (ACXIOM, D&B, etc.)
Merchandising sales data (SAP, JD Edwards, etc.)

Build models
Statistical Techniques
Multiple linear regression
Non-linear regression
Factor analysis
Structural equations model
Cluster analysis
Forecasting
Logistic regression

Non-Statistical Techniques
Blog mining
Neural networks
Market basket analysis

Operations Research
Mixed integer programming
Linear programming

Analysis services
Market Analytics
Market volume forecasting
Market share models
Promotion effectiveness
Market basket analysis
Price elasticity modeling
Product portfolio analysis
Lifestyle segmentation
Demand forecasting

Customer Analytics
Customer behavior analysis
Profiling & segmentation
Response modeling
Cross-sell/up-sell modeling
Loyalty & attrition modeling
Profitability & lifetime modeling
Purchase/usage behavior analysis
Propensity scoring
Campaign management
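
Market basket analysis appears on this slide both as a modeling technique and as an analysis service. As a rough illustration only, the sketch below counts item pairs across a few invented baskets and reports support, confidence, and lift for each pair. The function name `pair_rules`, the `min_support` cutoff, and the sample baskets are all assumptions made for the example, not material from the presentation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))           # ignore duplicate items in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                   # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):         # evaluate the rule in both directions
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    return rules


# Invented sample data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
rules = pair_rules(baskets)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict. Production systems usually rely on a dedicated algorithm such as Apriori or FP-Growth rather than exhaustive pair counting.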



Next Generation Analytic Workflow

Data scientists

What's available?
Capture and query, monitor and stream, big data sources (Assemble V3+)

What's useful?
Distill information, apply algorithms, identify patterns

What does it tell me?
Uncover insights and find answers to business questions

What shall I do now?
Share with others, make a decision, initiate a process

Web and Social Interaction Data

Source: IBM


Text, Content & Documents

Sensor and Network Data

Transactional Data History

Summary: Big Data + Big Analytics

Traditional Enterprise BI/DW
(determine and analyze current business situation)
Integrated data sources
Structured data
Aggregated and detailed data (with limits)
Relational EDW with at-rest data
Dimensional cubes/marts with at-rest data
One-size-fits-all data management
Rigid data governance
Reporting and OLAP
Reports and dashboards
Fixed navigation paths (drill down, slice/dice)
Humans interpret results, patterns and trends
Manual analyses, decisions and actions

Extended DW (Big Data + Big Analytics)
(provide more complete answers, identify new business opportunities and extend analytics to new business areas)
Virtualized and blended data sources
Multi-structured data
Large volumes of detailed data (no limits)
Non-relational stores with at-rest data
Streaming systems with in-motion data
Flexible & optimized data management
Flexible data governance
Advanced analytic functions & predictive models
Sophisticated visualization of large result sets
Flexible exploration of large result sets
Sophisticated trend and pattern analysis
Rules-driven recommendations and decisions


Choosing the Right Solution

Key requirements are:
The ability for organizations to easily analyze large volumes of structured and multi-structured data with good price/performance
The need to make technologies for developing and running these analyses more usable by information workers such as data scientists

Organizations will likely use multiple data management systems and analytic tools; the challenge is deciding which to use when, and interconnecting the systems



Use Cases - 1
Use Case
Real-Time Filtering, Monitoring & Analytics
Near-Real-Time Analytics
Data Integration Hub
BI Accelerator
New LOB BI Application
Investigative Computing

Built-for-purpose (optimized) systems

Streaming System

Embedded Analytic Services

Enterprise Data Warehouse

Analytic RDBMS

NonRelational System



Use Cases - 2
Use Case: Application Examples

Real-Time Filtering, Monitoring & Analytics: in-line fraud detection; dynamic network/smart grid optimization; real-time equipment tracking, failure prediction & action

Near-Real-Time Analytics: customer next best offer/option; customer service center optimization; shipping/freight service-level tracking & re-optimization

Data Integration Hub: detailed data integration & archiving; detailed data filtering & transformation; detailed data aggregation

BI Accelerator: reduce latency of existing LOB analyses; reduce costs of existing LOB analyses

New LOB BI Application: display advertising spot buying & effectiveness; web site traffic analysis & optimization; dynamic product/service pricing optimization

Investigative Computing: enhanced customer modeling & analytics; improved fraud detection; new sensor-based analytic applications



Real-Time Stream Processing: River Health

Near-time notifications

Business analysts & managers

Monitor fish movements and pollution levels via geo-spatial views, public transparency reporting, cross-institutional collaboration

Data scientist
Data mining, what-if analysis for impact analysis, correlating pollution levels with events & seasonal activity
Source: IBM. Inspired by the River and Estuary Observatory Network (REON) project, which monitors pollution levels in rivers and estuaries in New York State using sensors.

Data flow: solid = required, dotted = optional
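
The near-time notification idea on this slide can be pictured as a filter running over a stream of sensor readings. The following is a minimal sketch under stated assumptions: it is plain Python rather than a stream-processing product, and the function name `monitor`, the window size, the threshold, and the readings are invented for illustration. It raises an alert whenever the moving average of the last few readings exceeds a limit.

```python
from collections import deque


def monitor(readings, window=3, threshold=50.0):
    """Yield (timestamp, moving_average) whenever the windowed mean exceeds threshold."""
    recent = deque(maxlen=window)             # keeps only the latest `window` values
    for timestamp, value in readings:
        recent.append(value)
        if len(recent) == window:             # wait until the window is full
            average = sum(recent) / window
            if average > threshold:
                yield timestamp, average


# Invented pollution readings as (timestamp, level) pairs.
readings = [(0, 40.0), (1, 45.0), (2, 50.0), (3, 60.0), (4, 70.0), (5, 30.0)]
alerts = list(monitor(readings))
for timestamp, average in alerts:
    print(f"t={timestamp}: moving average {average:.1f} exceeds threshold")
```

Because `monitor` is a generator, it handles each reading as it arrives and never holds more than one window of data in memory, which is the property that distinguishes stream processing from querying data at rest.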


Real-Time Embedded Analytics: Telco Provider

Channels: IVR, Chat Session, Web, Email, Mobile Apps, Call Center, Voice


1 Request for Next Best Action (NBA) from channel
2 Decision Management (DM) receives list of candidate marketing offers from Enterprise Marketing Management (EMM)
3 Optionally, EMM calls out to SPSS to help determine candidate offers
4 Decision Management determines NBA from: Marketing offers (EMM) Service Problems Billing Information Location Service Issue Issue Resolution Dispute Satisfaction Account Management Advice Self Service Channel Match Agent Match etc.
5 Next Best Action delivered to the customer through the appropriate channel

Enterprise Marketing Management

Cross-channel Campaign Management

Decision Services
Business Optimization Rules Predictive Analytics Text Analytics Entity Analytics
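
The next-best-action flow on this slide combines business rules with predictive scores. The sketch below shows one way such a decision step could look. It does not represent any IBM API: the `Offer` fields, the example rule, and the scoring by propensity times value are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    kind: str          # hypothetical category, e.g. "marketing" or "service"
    propensity: float  # hypothetical model score: chance the customer accepts
    value: float       # hypothetical business value if accepted


def no_upsell_during_open_issue(offer, customer):
    """Example business rule: hold marketing offers while a service issue is open."""
    return not (customer["open_service_issue"] and offer.kind == "marketing")


def next_best_action(candidates, customer, rules):
    """Apply every rule, then pick the eligible offer with the highest expected value."""
    eligible = [o for o in candidates if all(rule(o, customer) for rule in rules)]
    if not eligible:
        return None
    return max(eligible, key=lambda o: o.propensity * o.value)


# Invented candidate offers, standing in for the list received from EMM.
candidates = [
    Offer("plan upgrade", "marketing", 0.30, 120.0),
    Offer("service credit", "service", 0.80, 20.0),
    Offer("self-service tutorial", "service", 0.50, 10.0),
]
rules = [no_upsell_during_open_issue]

best = next_best_action(candidates, {"open_service_issue": True}, rules)
print(best.name)
```

With an open service issue the rule filters out the marketing offer, so the service credit is chosen. Without one, the plan upgrade has the highest expected value and is chosen instead.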

Real Time Marketing

Source: IBM

Big Data Platform


Stream Computing Data Warehouse

Information Integration & Governance

Core Database
Demographic (DB, surveys) Interactions (Call center, Web)

Enterprise Content Management

Behavioral (Orders, Payments) Attitudinal (Surveys, Social / CCI)


Data Integration Hub & BI Accelerator: Sears

$43 billion retail organization with over 4,000 stores (Sears and Kmart)

Cost-effectively manage increasing data volume
Reduce the number of data warehouses and ETL jobs
Reduce analytical processing times and provide intra-day analytics

Capture and store all detailed transaction data (POS data, web clicks, supply chain events, etc.) for analysis

Hadoop data hub and BI accelerator
Manages all detailed structured (and multi-structured) data
Data hub is used to distribute required data to other analytics systems
Hadoop system is also used to accelerate performance-critical analyses

Primary source: Presentation by Dr. Phillip Shelley (CTO Sears Holdings and CEO MetaScale) at the Hadoop Summit, June 2012



Data Integration Hub & BI Accelerator: Sears

Enhanced pricing application
Issue: only about 10% of the sales data is in the EDW; pricing models were taking 8 weeks to set up and run
Hadoop MapReduce solution analyzes price elasticity based on 100% of the sales data
Pricing models can now be run weekly (or daily if required)
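
For readers unfamiliar with price elasticity, one common estimate is the slope of a straight line fitted to log quantity against log price. The sketch below is an assumption-laden illustration: it is single-machine Python rather than the Hadoop MapReduce job described above, the function name `price_elasticity` is hypothetical, and the price and quantity figures are invented.

```python
import math


def price_elasticity(prices, quantities):
    """Estimate elasticity as the least-squares slope of ln(quantity) on ln(price)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented observations that follow quantity = 1000 * price ** -2.
prices = [1.0, 2.0, 4.0, 8.0]
quantities = [1000.0, 250.0, 62.5, 15.625]
elasticity = price_elasticity(prices, quantities)
print(f"estimated elasticity: {elasticity:.2f}")
```

An elasticity of about -2 means a 1% price increase goes with roughly a 2% drop in units sold. The slide's point is that fitting such models on all of the sales data, rather than a 10% sample, became practical once the data sat in Hadoop.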

Improved customized offers to loyal customers

Issue: existing system was not scalable; only a small subset of the data could be analyzed
Replaced a 6,000-line COBOL application with 400 lines of Pig and Java UDFs; implemented in 6 weeks
Application can now be run multiple times per day per store per line item per customer, which reduces the impact of competitors such as Amazon

Primary source: Presentation by Dr. Phillip Shelley (CTO Sears Holdings and CEO MetaScale) at the Hadoop Summit, June 2012



New LOB Application

MediaMath is a leader in the billion-dollar display advertising business
Employs an analytic RDBMS to deliver an analytic platform called TerminalOne
Enables ad agencies and large-scale advertisers to identify, bid on, buy, and optimize ad impressions
Automatically matches each impression in real time with ads that are meaningful and relevant to users
Analyzes upwards of 15 billion ad impressions a day and calculates the fair market value of more than 50,000 ads/sec
Source: LUMA Partners


Investigative Computing: Consumer Products

Wanted to capitalize on consumer themes to better engage the target segment
Social insight of competitors led to internal assumptions regarding the product's price
Identified key social channels to gain feedback on the product line and serve the target segment more effectively




Identify discussion volume across multiple social channels
Measure consumer reaction
Compare results with traditional research data





Drove market actions that prevented costly pricing discounts and package re-design
Source: IBM


The IBM Platform

Analytic Applications
BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics

IBM Big Data Platform
Visualization & Discovery, Application Development, Systems Management
Accelerators
Hadoop System, Stream Computing, Data Warehouse

1 Optimize data warehouse deployment: IBM Netezza, Smart Analytics System
2 Reduce costs with Hadoop: IBM InfoSphere BigInsights
3 Capitalize on streaming data: InfoSphere Streams
4 Find relevant information in raw data: IBM InfoSphere BigInsights
5 Unlock Big Data wherever it resides: IBM Vivisimo Velocity
6 Gain insights through big analytics: IBM Cognos family, IBM SPSS software
7 Automate and optimize transactional decisions: IBM SPSS Decision Management

Information Integration & Governance
Source: IBM


Where Are We Today? Conclusions

Big data represents a new and evolving business and technology ecosystem; it is not one market, but many markets
Big analytics is the next wave of analytic value for organizations
Big data initiatives at present are primarily LOB use-case driven
True value is gained from a hybrid of existing and new data systems
Integration with existing enterprise systems will be a key vendor differentiator
The vendors that win will be those that can best tackle cost and/or advanced analytics requirements
Big data is causing significant innovation in:
Data management
Analytics and visualization
Data-driven automation

Where Next: Cognitive Computing





Contacting Speakers
If you have further questions or comments:
Colin White, BI Research
Harriet Fryman, IBM