Eufris 2012
Annual growth of data is remarcable Data is the most valuable thing most companies have Data is massively underutilized
Eufris 2012
Forecast
There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.
Eufris 2012
IDC
"Big Data is a technlogy that helps extract value from the digital universe.
IDC
"Techniques and technologies that make handling data at extreme scale economical."
Forrester
Eufris 2012
Bandwidth
inges5ng,
prosessing
and
delivering
large
amounts
of
data
Content
storing,
managing
and
retaining
large
amounts
of
data
www.netapp.com
Eufris 2012
3 Vs of Big Data
Variety
Big
Data
extends
beyond
structured
data,
including
unstructured
data
of
all
varie5es:
text,
audio,
video,
click
streams,
log
les
and
more
Velocity
o@en
5me
sensi5ve,
Big
Data
must
be
used
as
it
is
streaming
in
to
the
enterprise
in
order
to
maximize
its
value
to
the
business
Volume
Big
Data
comes
in
one
size:
large.
Enterprises
are
awash
with
data,
easily
amassing
terabytes
and
even
petabytes
of
informa5on
Eufris 2012
Eufris 2012
Hadoop
The
Apache
Hadoop
so.ware
library
is
a
framework
that
allows
for
the
distributed
processing
of
large
data
sets
across
clusters
of
computers
using
a
simple
programming
model. Three
subprojects Hadoop
Common Hadoop
Distributed
Filesystem
(HDFS) Hadoop
MapReduce
Eufris 2012
MapReduce
Introduced
by
Google
in
2004
2 2 Map 2 1 2 3
Eufris 2012
Reduce
3 4 5
Eufris 2012
NoSQL
DeniNon
1
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent, a huge data amount, and more. nosql-database.org
Eufris 2012
NoSQL
DeniNon
2
In computing, NoSQL (sometimes expanded to "not only SQL") is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally. Wikipedia
Eufris 2012
BASE:
Basically
available,
So?
state,
Eventually
consistent
Eufris 2012
Eufris 2012
Eufris 2012
MapReduce on AWS
Not
yet
Hadoop
1.0.0
Eufris 2012
MapReduce on AWS
EC2 S3 + DynamoDB
Eufris 2012
Google BigQuery
Features Speed - Analyze billions of rows(!) in seconds Scale - Terabytes of data, trillions of records Simplicity - SQL-like query language, hosted on Google infrastructure Sharing - Powerful group- and user-based permissions using Google accounts Security - Secure SSL access Multiple access methods - Can be used by REST API, a command-line tool, a browser-based graphical interface, and Google Apps Script
Eufris 2012
BigQuery example
Eufris 2012
Eufris 2012
Eufris 2012
Autonomy IDOL 10
"For far too long, organizations have confined structured data to relational databases and unstructured data to simplistic keyword matching technologies..." IDOL 10 brings these worlds together, allowing organizations to automatically process, understand, and act on 100 percent of their data, in real-time. The results will be dramatic, as businesses can develop entirely new applications that explore the richness and color of Human Information that live in unstructured, semi-structured, and structured forms. Price?
Eufris 2012
Thank you!
Eufris 2012