
Unlocking New Big Data Insights with MySQL

A MySQL Whitepaper

Copyright 2015, Oracle and/or its affiliates. All rights reserved.

Table of Contents
Introduction
1. Defining Big Data
2. The Internet of Things
3. The Lifecycle of Big Data
   Step 1: Acquire Data
   Step 2: Organize Data
   Step 3: Analyze Data
   Step 4: Decide
4. MySQL Big Data Best Practices
Conclusion
Additional Resources


Introduction
Today the terms Big Data and Internet of Things draw a lot of attention, but behind the hype
there's a simple story. For decades, companies have been making business decisions based on
traditional enterprise data. Beyond that critical data, however, is a potential treasure trove of
additional data: weblogs, social media, email, sensors, photographs and much more that can be
mined for useful information. Decreases in the cost of both storage and compute power have made
it feasible to collect this data - which would have been thrown away only a few years ago. As a
result, more and more organizations are looking to include non-traditional yet potentially very
valuable data with their traditional enterprise data in their business intelligence analysis.
As the world's most popular open source database, and the leading open source database for Web-based and Cloud-based applications, MySQL is a key component of numerous big data platforms. This whitepaper explores how you can unlock extremely valuable insights using MySQL with the Hadoop platform.

1. Defining Big Data


Big data typically refers to the following types of data:

- Traditional enterprise data includes customer information from CRM systems, transactional ERP data, web store transactions, and general ledger data.

- Machine-generated/sensor data includes Call Detail Records (CDRs), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust) and trading systems data.

- Social data includes customer feedback streams, micro-blogging sites like Twitter, and social media platforms like Facebook.

The McKinsey Global Institute estimates that data volume is growing 40% per year¹. But while it's often the most visible parameter, volume of data is not the only characteristic that matters. We often refer to the "Vs" defining big data:

- Volume. Machine-generated data is produced in much larger quantities than non-traditional data. For instance, a single jet engine can generate 10TB of data in 30 minutes. With more than 25,000 airline flights per day, the daily volume of just this single data source runs into the Petabytes. Smart meters and heavy industrial equipment like oil refineries and drilling rigs generate similar data volumes, compounding the problem.

- Velocity. Social media data streams, while not as massive as machine-generated data, produce a large influx of opinions and relationships valuable to customer relationship management. Even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes.

- Variety. Traditional data formats tend to be relatively well defined by a data schema and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change. As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.

¹ Big data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011.

The Importance of Big Data


When big data is distilled and analyzed in combination with traditional enterprise data, organizations can develop a more thorough and insightful understanding of their business, which can lead to enhanced productivity, a stronger competitive position and greater innovation, all of which can have a significant impact on the bottom line.
For example, retailers usually know who buys their products. Use of social media and web log files from their ecommerce sites can help them understand who didn't buy and why they chose not to, information not formerly available to them. This can enable much more effective micro customer segmentation and targeted marketing campaigns, as well as improve supply chain efficiencies through more accurate demand planning.
Other common use cases include:

- Sentiment analysis
- Marketing campaign analysis
- Customer churn modeling
- Fraud detection
- Research and Development
- Risk Modeling
- And more

2. The Internet of Things


The Big Data imperative is compounded by the Internet of Things, which generates an enormous amount of additional data.
The devices we use are getting smaller and smarter. They're connecting more easily, and they're showing up in every aspect of our lives. This new reality in technology, called the Internet of Things, is about collecting and managing the massive amounts of data from a rapidly growing network of devices and sensors, processing that data, and then sharing it with other connected things. It's the technology of the future, but you probably have it now: in the smart meter from your utility company, in the environmental controls and security systems in your home, in your activity wristband or in your car's self-monitoring capabilities.


Gartner estimates the total economic value-add from the Internet of Things across industries will reach US$1.9 trillion worldwide in 2020².
For example, just a few years from now, your morning routine might be a little different thanks to
Internet of Things technology. Your alarm goes off earlier than usual because your home smart
hub has detected traffic conditions suggesting an unusually slow commute. The weather sensor
warns of a continued high pollen count, so because of your allergies, you decide to wear your suit
with the sensors that track air quality and alert you to allergens that could trigger an attack.
You have time to check your messages at the kitchen e-screen. The test results from your recent medical checkup are in, and there's a message from your doctor that reiterates his recommendations for a healthier diet. You send this information on to your home smart hub. It automatically displays a chart comparing your results with those of the general population in your age range, and asks you to confirm the change to healthier options on your online grocery order. The e-screen on the refrigerator door suggests yogurt and fresh fruit for breakfast.
Major Advances in Machine-to-Machine Interactions Mean Incredible Changes
The general understanding of how things work on the internet follows a familiar pattern: humans connect through a browser to find the information or perform the action they want on the internet.

The Internet of Things changes that model. In the Internet of Things, things talk to things, and
processes have two-way interconnectivity so they can interoperate both locally and globally.
Decisions can be made according to predetermined rules, and the resulting actions happen
automatically without the need for human intervention. These new interactions are driving
tremendous opportunities for new services.

² Peter Middleton, Peter Kjeldsen, and Jim Tully, Forecast: The Internet of Things, Worldwide, 2013 (G00259115), Gartner, Inc., November 18, 2013.


The Value of Data


Transforming data into valuable information is no small task. The variables and the risks are real and often uncharted; flexibility and time to market can mean the difference between failure and success. But, given the considerable potential of this developing market, some businesses are aggressively undertaking the challenges. These businesses, the ones planning now for this new technology, will be the ones to succeed and thrive.

Oracle delivers an integrated, secure, comprehensive platform for the entire IoT architecture
across all vertical markets. For more information on Oracle's Internet of Things platform, visit:
http://www.oracle.com/us/solutions/internetofthings/overview/index.html

We shall now consider the lifecycle of Big Data, and how to leverage the Hadoop platform to derive
added value from data acquired in MySQL solutions.

3. The Lifecycle of Big Data


With the exponential growth in data volumes and data types, it is important to consider the
complete lifecycle of data, enabling the right technology to be aligned with the right stage of the
lifecycle.


Figure 1: The Data Lifecycle

As Figure 1 illustrates, the lifecycle can be distilled into four stages:


Acquire: Data is captured at source, typically as part of ongoing operational processes. Examples
include log files from a web server or user profiles and orders stored within a relational database
supporting a web service.
Organize: Data is transferred from the various operational systems and consolidated into a big
data platform, i.e. Hadoop / HDFS (Hadoop Distributed File System).
Analyze: Data stored in Hadoop is processed, either in batches by Map/Reduce jobs or
interactively with technologies such as the Apache Drill or Cloudera Impala initiatives. Data can
also be processed in Apache Spark. Hadoop may also perform pre-processing of data before
being loaded into data warehouse systems, such as the Oracle Exadata Database Machine.
Decide: The results of the Analyze stage above are presented to users, enabling actions to be taken. For example, the data may be loaded back into the operational MySQL database supporting a web site, enabling recommendations to be made to buyers; into reporting MySQL databases used to populate the dashboards of BI (Business Intelligence) tools; or into the Oracle Exalytics In-Memory Machine.

MySQL in the Big Data Lifecycle


In the following sections, we will consider MySQL in the Big Data Lifecycle as well as the
technologies and tools at your disposal at each stage of the lifecycle.


Figure 2: MySQL in the Big Data Lifecycle

Acquire: Through NoSQL APIs, MySQL is able to ingest high volume, high velocity data, without
sacrificing ACID guarantees, thereby ensuring data quality. Real-time analytics can also be run
against newly acquired data, enabling immediate business insight, before data is loaded into
Hadoop. In addition, sensitive data can be pre-processed, for example healthcare or financial
services records can be anonymized, before transfer to Hadoop.
Organize: Data can be transferred in batches from MySQL tables to Hadoop using Apache Sqoop or the MySQL Hadoop Applier. With the Applier, users can also invoke real-time change data capture processes to stream new data from MySQL to HDFS as it is committed by the client.
Analyze: Multi-structured data ingested from multiple sources is consolidated and processed within
the Hadoop platform.
Decide: The results of the analysis are loaded back to MySQL via Apache Sqoop where they
power real-time operational processes or provide analytics for BI tools.
Each of these stages and their associated technology are discussed below.

Step 1: Acquire Data


With data volume and velocity exploding, it is vital to be able to ingest data at high speed. For this
reason, Oracle has implemented a NoSQL interface directly to the InnoDB storage engine, and
additional NoSQL interfaces to MySQL Cluster, which bypass the SQL layer completely. Without
SQL parsing and optimization, Key-Value data can be written directly to MySQL tables up to 9x
faster, while maintaining ACID guarantees.
In addition, users can continue to run complex queries with SQL across the same data set,
providing real-time analytics to the organization.


Native Memcached API access is available for MySQL 5.6 and MySQL Cluster. By using this ubiquitous API for writing and reading data, developers can preserve their investments in Memcached infrastructure by re-using existing Memcached clients, while also eliminating the need for application changes.
As discussed later, MySQL Cluster also offers additional NoSQL APIs including Node.js, Java,
JPA, HTTP/REST and C++.
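To make the key-value mapping concrete, here is a minimal Python sketch of the translation the innodb_memcached plug-in performs, with a dict standing in for the backing InnoDB table. The container table name demo_test and its key/value columns c1/c2 follow the plug-in's documented defaults, but this is a conceptual model only (the real plug-in also tracks flags, CAS and expiry columns), not the plug-in itself.

```python
# Conceptual model of the innodb_memcached plug-in: Memcached commands
# are translated into row operations on a backing InnoDB table.
# A dict stands in for the table (default container: test.demo_test,
# key column c1, value column c2). Illustrative only.

demo_test = {}  # stand-in for the InnoDB table: {c1: c2}

def memc_set(key, value):
    """'set' maps to an INSERT ... ON DUPLICATE KEY UPDATE on the table."""
    demo_test[key] = value
    return "STORED"

def memc_get(key):
    """'get' maps to SELECT c2 FROM demo_test WHERE c1 = <key>."""
    return demo_test.get(key)

def memc_delete(key):
    """'delete' maps to DELETE FROM demo_test WHERE c1 = <key>."""
    return "DELETED" if demo_test.pop(key, None) is not None else "NOT_FOUND"

# The same rows remain visible to SQL (SELECT c1, c2 FROM demo_test),
# modeled here by reading the dict directly.
memc_set("user:42", "alice")
print(memc_get("user:42"))      # alice
print(sorted(demo_test.items()))
```

This duality, one store reachable through both the Memcached protocol and SQL, is what the sections above describe: fast key-value writes without giving up relational access to the same data.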

On-Line Schema Changes


Speed, when combined with flexibility, is essential in the world of big data. Complementing NoSQL
access, support for on-line DDL (Data Definition Language) operations in both MySQL 5.6 and
MySQL Cluster enables DevOps teams to dynamically evolve and update their database schema
to accommodate rapidly changing requirements, such as the need to capture additional data
generated by their applications. These changes can be made without database downtime.
Using the Memcached interface, developers do not need to define a schema at all when using
MySQL Cluster.

NoSQL for the MySQL Database


As illustrated in the following figure, NoSQL for the MySQL database is implemented via a
Memcached daemon plug-in to the mysqld process, with the Memcached protocol mapped to the
native InnoDB API.

[Diagram: clients and applications reach the mysqld process either via SQL through the MySQL Server, or via the Memcached protocol through the Memcached plug-in (innodb_memcached, with an optional local cache). The plug-in calls the native InnoDB API, the server goes through the Handler API, and both paths access the InnoDB storage engine.]

Figure 3: Memcached API Implementation for InnoDB


With the Memcached code running in the same process space, users can insert and query data at
high speed. With simultaneous SQL access, users can maintain all the advanced functionality
offered by InnoDB including support for crash-safe transactional storage, Foreign Keys, complex
JOIN operations, etc.
Benchmarks demonstrate that the NoSQL Memcached API for InnoDB delivers up to 9x higher performance than the SQL interface when inserting new key/value pairs, with a single low-end commodity server³ supporting nearly 70,000 transactions per second.

[Chart: "MySQL 5.6: NoSQL Benchmarking" - transactions per second (TPS) for the Memcached API versus SQL at 8, 32, 128 and 512 client connections.]

Figure 4: Over 9x Faster INSERT Operations

The delivered performance demonstrates that MySQL with the native Memcached NoSQL interface is well suited for high-speed inserts, with the added assurance of transactional guarantees.

MySQL as Embedded Database


MySQL is embedded by over 3,000 ISVs and OEMs. It is, for instance, a popular choice in Point of Sale (POS) applications, security appliances, network monitoring equipment, etc. In the age of the Internet of Things, those systems are increasingly connected with each other, and generate vast amounts of potentially valuable data. More information about MySQL as an embedded database is available at: http://www.mysql.com/oem/

MySQL Cluster
MySQL Cluster has many attributes that make it ideal for new generations of high volume, high
velocity applications that acquire data at high speed, including:

³ The benchmark was run on an 8-core Intel server configured with 16GB of memory and the Oracle Linux operating system.


- In-memory, real-time performance
- Auto-sharding across distributed clusters of commodity nodes
- Cross-data center geographic replication
- Online scaling and schema upgrades
- Shared-nothing, fault-tolerant architecture for 99.999% uptime
- SQL and NoSQL interfaces

As MySQL Cluster stores tables in network-distributed data nodes, rather than in the MySQL
Server, there are multiple interfaces available to access the database.
The chart below shows all of the access methods available to the developer. The native API for
MySQL Cluster is the C++ based NDB API. All other interfaces access the data through the NDB
API.
At the extreme left-hand side of the chart, an application has embedded the NDB API library, enabling it to make native C++ calls to the database and therefore delivering the lowest possible latency.
At the extreme right-hand side of the chart, MySQL presents a standard SQL interface to the data nodes, providing connectivity to all of the standard MySQL drivers.

[Diagram: clients and applications access the MySQL Cluster data nodes through NoSQL interfaces (native NDB API, memcached, JavaScript) and through SQL (JDBC/ODBC, PHP/Perl, Python/Ruby); all access methods are layered on the NDB API.]

Figure 5: Ultimate Developer Flexibility - MySQL Cluster APIs

Whichever API is used to insert or query data, it is important to emphasize that all of these SQL
and NoSQL access methods can be used simultaneously, across the same data set, to provide the
ultimate in developer flexibility.
Benchmarks executed by Intel and Oracle demonstrate the performance advantages that can be realized by combining NoSQL APIs with the distributed, multi-master design of MySQL Cluster⁴.

⁴ http://mysql.com/why-mysql/benchmarks/mysql-cluster/


1.2 Billion write operations per minute (19.5 million per second) were scaled linearly across a
cluster of 30 commodity dual socket (2.6GHz), 8-core Intel servers, each equipped with 64GB of
RAM, running Linux and connected via Infiniband.
Synchronous replication within node groups was configured, enabling both high performance and high availability without compromise. In this configuration, each node delivered 650,000 ACID-compliant write operations per second.
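These figures are internally consistent, as the quick arithmetic check below shows (using only the numbers quoted above; the "1.2 Billion" headline rounds up from 1.17 billion):

```python
# Sanity-check the benchmark arithmetic: 30 data nodes, each sustaining
# 650,000 write operations per second, scaled linearly.
nodes = 30
writes_per_node_per_sec = 650_000

total_per_sec = nodes * writes_per_node_per_sec  # 19.5 million per second
total_per_min = total_per_sec * 60               # ~1.17 billion per minute

print(total_per_sec)   # 19500000
print(total_per_min)   # 1170000000
```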

[Chart: "1.2 Billion UPDATEs per Minute" - millions of UPDATEs per second scaling linearly as the number of MySQL Cluster data nodes grows from 2 to 30.]

Figure 6: MySQL Cluster performance scaling-out on commodity nodes.

These results demonstrate how users can acquire transactional data at high volume and high
velocity on commodity hardware using MySQL Cluster.
To learn more about the NoSQL APIs for MySQL, and the architecture powering MySQL Cluster,
download the Guide to MySQL and NoSQL:
http://www.mysql.com/why-mysql/white-papers/mysql-wp-guide-to-nosql.php

MySQL Fabric
MySQL is powering some of the most demanding web applications, thereby collecting an enormous amount of data with the potential to add tremendous value to the businesses capable of harnessing it. MySQL Fabric makes it easier and safer to scale out MySQL databases in order to acquire large amounts of information.
While MySQL Replication provides the mechanism to scale out reads (having one master MySQL server handle all writes and then load balancing reads across as many slave MySQL servers as you need), a single server must still handle all of the writes. As modern applications become more and more interactive, the proportion of writes will continue to increase. The ubiquity of social media means that the age of the "publish once and read a billion times" web site is over. Add to this the promise offered by Cloud platforms - massive, elastic scaling out of the underlying infrastructure - and you get a huge demand for scaling out to dozens, hundreds or even thousands of servers.
The most common way to scale out is by sharding the data between multiple MySQL Servers; this can be done vertically (each server holding a discrete subset of the tables - say those for a specific set of features) or horizontally (each server holding a subset of the rows for a given table). While effective, sharding has required developers and DBAs to invest a lot of effort in building and maintaining complex logic at the application and management layers - detracting from higher value activities.
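To illustrate the kind of hand-written logic this has traditionally required, the sketch below routes rows to horizontal shards by hashing a sharding key across a fixed set of servers. The server names and choice of key are hypothetical; this is exactly the application-layer code that MySQL Fabric's own hash- and range-based mappings are designed to replace.

```python
import hashlib

# Hypothetical pool of MySQL servers, each holding one horizontal shard.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]

def shard_for(sharding_key):
    """Map a sharding key (e.g. a customer id) to one server.

    A stable hash keeps a given key on the same shard across calls;
    md5 is used here only for its stable, well-distributed output.
    """
    digest = hashlib.md5(str(sharding_key).encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every row for a given customer lands on the same server.
print(shard_for(1001) == shard_for(1001))  # True
print(shard_for(1001) in SHARDS)           # True
```

The pain point the text describes is that this routing, plus re-balancing when shards are added, has to be written, tested and maintained by the application team.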
The introduction of MySQL Fabric makes all of this far simpler. MySQL Fabric is designed to manage pools of MySQL Servers - whether just a pair for High Availability or many thousands to cope with scaling out huge web applications.
MySQL Fabric provides a simple and effective option for High Availability as well as the option of massive, incremental scale-out. It does this without sacrificing the robustness of MySQL and InnoDB, requiring major application changes, or needing your DevOps teams to move to unfamiliar technologies or abandon their favorite tools.
For more information about MySQL Fabric, get MySQL Fabric - A Guide to Managing MySQL High Availability & Scaling Out⁵.

Figure 7: MySQL Fabric - High Availability and Sharding-Based Scale-Out

⁵ http://www.mysql.com/why-mysql/white-papers/mysql-fabric-product-guide/


Step 2: Organize Data


Once data has been acquired into MySQL, many users will run real-time analytics across it to yield
immediate insight into their operations.
They will then want to load or stream data into their Hadoop platform, where it can be consolidated with data from other sources for processing. In what is referred to as the Organize stage, there are two approaches to exporting data from MySQL to Hadoop:

- Apache Sqoop (Batch, Bi-Directional)
- MySQL Hadoop Applier (Real-Time, Uni-Directional)

Apache Sqoop
Originally developed by Cloudera, Sqoop is now an Apache Top-Level Project⁶. Apache Sqoop is a tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases. Sqoop can be used to:
1. Import data from MySQL into the Hadoop Distributed File System (HDFS), or related systems such as Hive and HBase.
2. Extract data from Hadoop - typically the results from processing jobs - and export it back to MySQL tables. This will be discussed more in the Decide stage of the big data lifecycle.
3. Integrate with Oozie⁷ to allow users to schedule and automate import/export tasks.
Sqoop uses a connector-based architecture that supports plug-ins providing connectivity between HDFS and external databases. By default Sqoop includes connectors for most leading databases, including MySQL and Oracle Database, in addition to a generic JDBC connector that can be used to connect to any database that is accessible via JDBC. Sqoop also includes a specialized fast-path connector for MySQL that uses MySQL-specific batch tools to transfer data with high throughput.
When using Sqoop, the dataset being transferred is sliced up into different partitions, and a map-only job is launched with individual mappers responsible for transferring a slice of this dataset. Each record of the data is handled in a type-safe manner, since Sqoop uses the database metadata to infer the data types.

⁶ http://sqoop.apache.org/
⁷ http://oozie.apache.org/


Figure 8: Importing Data from MySQL to Hadoop using Sqoop

When initiating the Sqoop import, the user provides a connect string for the database and the
name of the table to be imported.
As shown in the figure above, the import process is executed in two steps:
1. Sqoop analyzes the database to gather the necessary metadata for the data being imported.
2. Sqoop submits a map-only Hadoop job to the cluster. It is this job that performs the actual data
transfer using the metadata captured in the previous step.
The imported data is saved in a directory on HDFS based on the table being imported, though the
user can specify an alternative directory if they wish.
By default the data is formatted as CSV (Comma Separated Values), with new lines separating
different records. Users can override the format by explicitly specifying the field separator and
record terminator characters.
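As a rough illustration, the Python sketch below reproduces this default formatting, with the field separator and record terminator overridable in the same spirit as Sqoop's --fields-terminated-by and --lines-terminated-by options (the sample rows are invented):

```python
def format_records(rows, field_sep=",", record_term="\n"):
    """Render rows the way Sqoop does by default: fields joined by a
    comma, records separated by a new line. Both characters can be
    overridden, mirroring Sqoop's --fields-terminated-by and
    --lines-terminated-by options."""
    return record_term.join(
        field_sep.join(str(field) for field in row) for row in rows
    ) + record_term

# Invented sample rows standing in for an imported MySQL table.
rows = [(1, "alice", "2015-01-02"), (2, "bob", "2015-01-03")]
print(format_records(rows), end="")
# 1,alice,2015-01-02
# 2,bob,2015-01-03

# Tab-separated variant, as --fields-terminated-by '\t' would produce:
print(format_records(rows, field_sep="\t"), end="")
```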
You can see practical examples of importing and exporting data with Sqoop on the Apache blog.
Credit goes to the ASF for content and diagrams:
https://blogs.apache.org/sqoop/entry/apache_sqoop_overview

MySQL Hadoop Applier


Apache Sqoop is a well-proven approach for bulk data loading. However, there are a growing
number of use-cases for streaming real-time updates from MySQL into Hadoop for immediate
analysis. In addition, the process of bulk loading can place additional demands on production
database infrastructure, impacting performance.
The MySQL Hadoop Applier is designed to address these issues by performing real-time
replication of events between MySQL and Hadoop.


Replication via the Hadoop Applier is implemented by connecting to the MySQL master and reading binary log⁸ events as soon as they are committed on the master, then writing them into a file in HDFS. Events describe database changes such as table creation operations or changes to table data.
The Hadoop Applier uses an API provided by libhdfs, a C library to manipulate files in HDFS. The
library comes precompiled with Hadoop distributions.
It connects to the MySQL master to read the binary log and then:

- Fetches the row insert events occurring on the master
- Decodes these events, extracts the data inserted into each field of the row, and uses content handlers to get it into the format required
- Appends it to a text file in HDFS

This is demonstrated in the figure below:

Figure 9: MySQL to Hadoop Real-Time Replication

Databases are mapped as separate directories, with their tables mapped as sub-directories within a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive/HDFS. Data can be in comma-separated or other formats, configurable via command line arguments.
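A simplified model of this mapping, not the Applier's actual implementation, might look as follows; the warehouse root and the sample database, table and rows are assumptions for illustration:

```python
# Simplified model of the Hadoop Applier's schema mapping: databases
# become directories, tables become sub-directories, and decoded row
# insert events are appended as delimited text lines. Illustrative only.

HIVE_WAREHOUSE = "/user/hive/warehouse"  # assumed Hive warehouse root

def hdfs_path(database, table, datafile="datafile1.txt"):
    """Build the HDFS path where rows for database.table are appended."""
    return "/".join([HIVE_WAREHOUSE, database, table, datafile])

def rows_to_lines(rows, field_sep=","):
    """Decode insert events into delimited text lines, one per row,
    mirroring the Applier's configurable field separator."""
    return [field_sep.join(str(field) for field in row) for row in rows]

# Invented example: inserts into test.orders become appended CSV lines.
print(hdfs_path("test", "orders"))
# /user/hive/warehouse/test/orders/datafile1.txt
print(rows_to_lines([(1, "widget", 9.99)]))
# ['1,widget,9.99']
```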

⁸ http://dev.mysql.com/doc/refman/5.6/en/binary-log.html


Figure 10: Mapping between MySQL and HDFS Schema

The installation, configuration and implementation are discussed in detail in the Hadoop Applier blog⁹. Integration with Hive is documented as well.
You can download and evaluate the Hadoop Applier code from MySQL Labs¹⁰ (select the Hadoop Applier build from the drop-down menu).
Note that this code is currently a technology preview and is not certified or supported for production deployment.

Step 3: Analyze Data


Following data acquisition and organization, the Analyze phase is where the raw data is processed in order to extract insight. With our MySQL data in HDFS, it is accessible to the whole ecosystem of Hadoop-related projects, including tools such as Hive, Pig and Mahout.
This data could be processed by Map/Reduce jobs in Hadoop to provide a result set that is then loaded directly into other tools to enable the Decide stage, or the Map/Reduce outputs can serve as pre-processing before dedicated appliances further analyze the data.
The data could also be processed via Apache Spark¹¹, an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications.

⁹ http://innovating-technology.blogspot.fi/2013/04/mysql-hadoop-applier-part-2.html
¹⁰ http://labs.mysql.com
¹¹ http://spark.apache.org/


As we have already seen, Sqoop and the Hadoop Applier are key technologies connecting MySQL with Hadoop, available for use with multiple Hadoop distributions, e.g. Cloudera, Hortonworks and MapR.

Step 4: Decide
Results sets from Hadoop processing jobs are loaded back into MySQL tables using Apache
Sqoop, where they become actionable for the organization.
As with the Import process, Export is performed in two steps as shown in the figure below:
1. Sqoop analyzes MySQL to gather the necessary metadata for the data being exported.
2. Sqoop divides the dataset into splits and then uses individual map tasks to push the splits to
MySQL. Each map task performs this transfer over many transactions in order to ensure
optimal throughput and minimal resource utilization.

Figure 11: Exporting Data from Hadoop to MySQL using Sqoop
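The division of work above can be sketched as follows: the dataset is cut into one split per map task, and each task pushes its split in bounded-size batches standing in for transactions (the split count and batch size are illustrative, not Sqoop defaults):

```python
def make_splits(records, num_mappers):
    """Divide the dataset into one contiguous split per map task."""
    size, rem = divmod(len(records), num_mappers)
    splits, start = [], 0
    for i in range(num_mappers):
        end = start + size + (1 if i < rem else 0)
        splits.append(records[start:end])
        start = end
    return splits

def export_split(split, rows_per_txn=100):
    """Push one split to MySQL in several bounded transactions,
    committing every rows_per_txn rows (modeled here as batches)."""
    return [split[i:i + rows_per_txn] for i in range(0, len(split), rows_per_txn)]

records = list(range(1000))         # stand-in for an HDFS result set
splits = make_splits(records, 4)    # e.g. 4 parallel map tasks
print([len(s) for s in splits])     # [250, 250, 250, 250]
print(len(export_split(splits[0]))) # 3 batches per split (100+100+50)
```

Bounding each transaction keeps throughput high while limiting the resources any single commit holds on the MySQL side, which is the trade-off the text describes.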

The user would provide connection parameters for the database when executing the Sqoop export
process, along with the HDFS directory from which data will be exported and the name of the
MySQL table to be populated.
Once the data is in MySQL, it can be consumed by BI tools such as Oracle Business Intelligence
solutions, Pentaho, JasperSoft, Talend, etc. to populate dashboards and reporting software.
In many cases, the results can be used to control a real-time operational process that uses MySQL
as its database. Continuing with the on-line retail example cited earlier, a Hadoop analysis would
have been able to identify specific user preferences. Sqoop can be used to load this data back into
MySQL, and so when the user accesses the site in the future, they will receive offers and
recommendations based on their preferences and behavior during previous visits.
The following diagram shows the total workflow within a web architecture.

Figure 12: MySQL & Hadoop Integration Driving a Personalized Web Experience

Integrated Oracle Solution


Oracle also offers an integrated portfolio of big data products. For instance, for Web data acquired
in MySQL, the picture could be the following:

[Diagram: web data acquired in MySQL (Acquire) is organized with the Oracle Big Data Appliance (Organize), analyzed with Oracle Exadata (Analyze), and acted upon using Oracle Exalytics (Decide).]

You can learn more about Oracle Big Data solutions here:
http://www.oracle.com/us/technologies/big-data/index.html

MySQL Enterprise Edition is integrated and certified with the following products:

- Oracle Enterprise Manager
- Oracle GoldenGate
- Oracle Secure Backup
- Oracle Audit Vault & Database Firewall
- Oracle Fusion Middleware
- My Oracle Support
- Oracle Linux
- Oracle VM
- Oracle Clusterware

4. MySQL Big Data Best Practices


MySQL 5.6: Enhanced Data Analytics
MySQL 5.6 includes a host of new capabilities that enhance MySQL when deployed as part of a
Big Data pipeline:

- MySQL Optimizer: significant performance improvements for complex analytical queries. A combination of Batched Key Access, Multi-Range Read, Index Condition Pushdown, subquery and file sort optimizations has been proven to increase performance by over 250 times.
- Improved diagnostics, including Optimizer Traces, Performance Schema instrumentation and enhanced EXPLAIN functions, enable developers to further tune their queries for the highest throughput and lowest latency.
- Full-text search support for the InnoDB storage engine increases the range of queries and workloads that MySQL can serve.
- Improved security, with major enhancements to how passwords are implemented, managed and encrypted, further protects access to your most sensitive data.

For more details on those capabilities and MySQL 5.6, get the following Guide:
http://www.mysql.com/why-mysql/white-papers/whats-new-mysql-5-6/

MySQL Enterprise Edition

For MySQL applications that are part of a Big Data infrastructure, the technical support, advanced
features and management tools included in MySQL Enterprise Edition will help you achieve the
highest levels of MySQL performance, scalability, security and uptime. In addition to the MySQL
Database, MySQL Enterprise Edition includes:

The MySQL Enterprise Monitor
The MySQL Enterprise Monitor provides at-a-glance views of the health of your MySQL databases.
It continuously monitors your MySQL servers and alerts you to potential problems before they
impact your system. It's like having a virtual DBA assistant at your side to recommend best
practices, eliminate security vulnerabilities, improve replication and optimize performance. As a
result, DBAs and system administrators can manage more servers in less time.


The MySQL Enterprise Monitor is a web-based application that can manage MySQL within the
safety of a corporate firewall or remotely in a public cloud. MySQL Enterprise Monitor provides:

- Performance & Availability Monitoring - continuously monitor MySQL queries and
  performance-related server metrics
- Visual Query Analysis - monitor query performance and pinpoint SQL code that is causing
  a slowdown
- InnoDB Monitoring - monitor key InnoDB metrics that impact MySQL performance
- MySQL Cluster Monitoring - monitor key MySQL Cluster metrics that impact performance
  and availability
- Replication Monitoring - gain visibility into the performance and health of all MySQL
  masters and slaves
- Backup Monitoring - ensure your online, hot backups are running as expected
- Disk Monitoring - forecast future capacity requirements using trend analysis and
  projections
- Security Monitoring - identify and resolve security vulnerabilities across all MySQL servers
- Operating System Monitoring - monitor operating-system-level performance metrics such
  as load average, CPU usage, RAM usage and swap usage

As noted earlier, it is also possible to monitor MySQL via Oracle Enterprise Manager.
The MySQL Query Analyzer
The MySQL Query Analyzer helps developers and DBAs improve application performance by
monitoring queries and accurately pinpointing SQL code that is causing a slowdown. Using the
Performance Schema with MySQL Server 5.6, data is gathered directly from the MySQL server
without the need for any additional software or configuration.
Queries are presented in an aggregated view across all MySQL servers so DBAs and developers
can filter for specific query problems and identify the code that consumes the most resources. With
the MySQL Query Analyzer, DBAs can improve the SQL code during active development and
continuously monitor and tune the queries in production.
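The same statement-digest data that the Query Analyzer consumes can also be inspected directly
in SQL; a minimal sketch against the MySQL 5.6 Performance Schema:

```sql
-- Top 5 statement digests by total execution time
-- (SUM_TIMER_WAIT is reported in picoseconds)
SELECT digest_text,
       count_star            AS executions,
       sum_timer_wait / 1e12 AS total_seconds
FROM performance_schema.events_statements_summary_by_digest
ORDER BY sum_timer_wait DESC
LIMIT 5;
```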

MySQL Workbench Enterprise Edition

MySQL Workbench is a unified visual tool that enables developers, DBAs, and data architects to
design, develop and administer MySQL databases. MySQL Workbench provides advanced data
modeling, a flexible SQL editor, and comprehensive administrative tools.

MySQL Workbench allows you to:

- Design: MySQL Workbench includes everything a data modeler needs for creating complex
  ER models and for forward and reverse engineering, and also delivers key features for
  performing difficult change-management and documentation tasks that normally require
  much time and effort.

- Develop: MySQL Workbench delivers visual tools for creating, executing, and optimizing
  SQL queries. The SQL Editor provides color syntax highlighting, reuse of SQL snippets, and
  execution history of SQL. The Database Connections Panel enables developers to easily
  manage database connections. The Object Browser provides instant access to database
  schemas and objects.

- Administer: MySQL Workbench provides a visual console to easily administer MySQL
  environments and gain better visibility into databases. Developers and DBAs can use the
  visual tools for configuring servers, administering users, and viewing database health.

- Migrate: MySQL Workbench provides a complete, easy-to-use solution for migrating
  Microsoft SQL Server, Microsoft Access, Sybase ASE, PostgreSQL, and other RDBMS
  tables, objects and data to MySQL. Developers and DBAs can quickly and easily convert
  existing applications to run on MySQL. Migration also supports moving from earlier
  versions of MySQL to the latest releases.

MySQL Enterprise Backup

MySQL Enterprise Backup performs online, non-blocking hot backups of your MySQL databases.
You get a consistent backup copy of your database to recover your data to a precise point in time.
In addition, MySQL Enterprise Backup supports creating compressed backup files and performing
backups of subsets of InnoDB tables. Compression typically reduces backup size by up to 90%
compared with the size of the actual database files, helping to reduce storage costs. In conjunction
with the MySQL binlog, users can perform point-in-time recovery.
MySQL Enterprise Scalability
MySQL Enterprise Scalability enables you to meet the sustained performance and scalability
requirements of ever increasing user, query and data loads. The MySQL Thread Pool provides an
efficient, thread-handling model designed to reduce overhead in managing client connections, and
statement execution threads.
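On supported builds, the Thread Pool ships as a server plugin in MySQL Enterprise Edition; a
sketch of loading and verifying it (the shared-library suffix varies by platform, e.g. `.dll` on
Windows):

```sql
-- Load the Thread Pool plugin (MySQL Enterprise Edition)
INSTALL PLUGIN thread_pool SONAME 'thread_pool.so';

-- Verify the plugin is active and inspect its configuration
SHOW PLUGINS;
SHOW GLOBAL VARIABLES LIKE 'thread_pool%';
```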
MySQL Enterprise Authentication
MySQL Enterprise Authentication provides ready-to-use external authentication modules to easily
integrate MySQL with existing security infrastructures, including PAM and Windows Active
Directory. MySQL users can be authenticated using Pluggable Authentication Modules ("PAM") or
native Windows OS services.
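As a sketch (the account name and PAM service name are illustrative), a user mapped to PAM
authentication might be created as follows:

```sql
-- Load the PAM authentication plugin (MySQL Enterprise Edition)
INSTALL PLUGIN authentication_pam SONAME 'authentication_pam.so';

-- Create a user authenticated through the OS-level 'mysql' PAM service
CREATE USER 'jdoe'@'localhost'
  IDENTIFIED WITH authentication_pam AS 'mysql';
```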
MySQL Enterprise Encryption
To protect sensitive data throughout its lifecycle, MySQL Enterprise Encryption provides industry
standard functionality for asymmetric encryption (Public Key Cryptography). MySQL Enterprise
Encryption provides encryption, key generation, digital signatures and other cryptographic features
to help organizations protect confidential data and comply with regulatory requirements including
HIPAA, Sarbanes-Oxley, and the PCI Data Security Standard.
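These encryption capabilities are exposed as SQL functions; a minimal sketch using the
asymmetric key functions provided by the MySQL Enterprise Encryption UDFs:

```sql
-- Generate an RSA key pair, then encrypt and decrypt a short value
SET @priv = CREATE_ASYMMETRIC_PRIV_KEY('RSA', 2048);
SET @pub  = CREATE_ASYMMETRIC_PUB_KEY('RSA', @priv);

SET @ciphertext = ASYMMETRIC_ENCRYPT('RSA', 'sensitive data', @pub);
SELECT ASYMMETRIC_DECRYPT('RSA', @ciphertext, @priv) AS plaintext;
```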

MySQL Enterprise Firewall

MySQL Enterprise Firewall guards against cyber-security threats by providing real-time protection
against database-specific attacks, such as SQL injection. MySQL Enterprise Firewall monitors
for database threats, automatically creates a whitelist of approved SQL statements and blocks
unauthorized database activity.
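Operationally, the firewall is trained per account and then switched to enforcement; a sketch
using the firewall's stored procedures (the account name is illustrative):

```sql
-- Record the application's normal SQL into the account's whitelist
CALL mysql.sp_set_firewall_mode('app_user@localhost', 'RECORDING');

-- ... run the application's typical workload ...

-- Switch to enforcement: statements outside the whitelist are blocked
CALL mysql.sp_set_firewall_mode('app_user@localhost', 'PROTECTING');
```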
MySQL Enterprise Audit
MySQL Enterprise Audit enables you to quickly and seamlessly add policy-based auditing and
compliance to new and existing applications. You can dynamically enable user-level activity
logging, implement activity-based policies, manage audit log files and integrate MySQL auditing
with Oracle and third-party solutions.
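As a sketch, auditing is enabled by loading the audit log plugin (the shared-library suffix varies
by platform):

```sql
-- Load the audit log plugin (MySQL Enterprise Edition)
INSTALL PLUGIN audit_log SONAME 'audit_log.so';

-- Confirm logging is active and see how the audit log is configured
SHOW GLOBAL VARIABLES LIKE 'audit_log%';
```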
MySQL Enterprise High Availability
MySQL Enterprise High Availability enables you to make your database infrastructure highly
available. MySQL provides you with certified and supported solutions.
Oracle Premier Support for MySQL
MySQL Enterprise Edition provides 24x7x365 access to Oracle's MySQL Support team, staffed by
database experts ready to help with the most complex technical issues, and backed by the MySQL
developers. Oracle's Premier Support for MySQL provides you with:

- 24x7x365 phone and online support
- Rapid diagnosis of and solutions to complex issues
- Unlimited incidents
- Emergency hot-fix builds forward compatible with future MySQL releases
- Access to Oracle's MySQL Knowledge Base
- Consultative support services
- MySQL support in 29 languages

In addition to MySQL Enterprise Edition, the following services may also be of interest to Big Data
professionals:
Oracle University
Oracle University offers an extensive range of MySQL training, from introductory courses (e.g.
MySQL Essentials, MySQL DBA) through to advanced certifications such as MySQL
Performance Tuning and MySQL Cluster Administration. It is also possible to define custom
training plans for delivery on-site. You can learn more about MySQL training from Oracle
University here: http://www.mysql.com/training/
MySQL Consulting
To ensure best practices are leveraged from the initial design phase of a project through to
implementation and ongoing operations, users can engage Professional Services consultants.
Delivered remotely or onsite, these engagements help in optimizing the architecture for scalability,
high availability and performance. You can learn more at http://www.mysql.com/consulting/


Conclusion
Big Data and the Internet of Things are generating significant transformations in the way
organizations capture and analyze new and diverse data streams. As this paper has discussed,
MySQL can be seamlessly integrated within a Big Data lifecycle. Using MySQL solutions with the
Hadoop platform and following the best practices outlined in this document can help you unlock
insights that would otherwise remain out of reach.

Additional Resources
MySQL Whitepapers
http://www.mysql.com/why-mysql/white-papers/
MySQL Webinars:
Live: http://www.mysql.com/news-and-events/web-seminars/index.html
On Demand: http://www.mysql.com/news-and-events/on-demand-webinars/
MySQL Enterprise Edition Demo:
http://www.youtube.com/watch?v=guFOVCOaaF0
MySQL Cluster Demo:
https://www.youtube.com/watch?v=A7dBB8_yNJI
MySQL Enterprise Edition Trial:
http://www.mysql.com/trials/
MySQL Case Studies:
http://www.mysql.com/why-mysql/case-studies/
MySQL TCO Savings Calculator:
http://mysql.com/tco
To contact an Oracle MySQL Representative:
http://www.mysql.com/about/contact/
