
Big Data

Eufris 2012

Why should I care?


McKinsey:
$250 billion in annual savings in the EU public sector alone
$600 billion in annual consumer surplus from using personal location data globally

Annual growth of data is remarkable
Data is the most valuable asset most companies have
Data is massively underutilized


Forecast
There will be a shortage of talent necessary for organizations to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.


What is Big Data?


"Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis"

IDC

"Big Data is a technology that helps extract value from the digital universe."

IDC

"Techniques and technologies that make handling data at extreme scale economical."

Forrester


ABC of Big Data


Analytics
making sense of your data, in real time, in an easy way

Bandwidth
ingesting, processing and delivering large amounts of data

Content
storing, managing and retaining large amounts of data

www.netapp.com


3 Vs of Big Data
Variety
Big Data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files and more

Velocity
Often time-sensitive: Big Data must be used as it streams into the enterprise in order to maximize its value to the business

Volume
Big Data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information


A few core concepts


Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It has three subprojects:
Hadoop Common
Hadoop Distributed File System (HDFS)
Hadoop MapReduce


MapReduce
Introduced by Google in 2004

[Diagram: MapReduce data flow through the Map and Reduce phases]
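The Map and Reduce phases can be sketched with the classic word-count example. This is a minimal, single-process Python sketch under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and this is not the Hadoop or Google API. A real MapReduce framework runs many map and reduce tasks in parallel across a cluster and performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical helper names for illustration; not the Hadoop API.

def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit an intermediate (word, 1) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by their key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(documents: List[str]) -> Dict[str, int]:
    # Run map over every record, shuffle, then reduce each key group.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    docs = ["big data is big", "data is valuable"]
    print(word_count(docs))  # {'big': 2, 'data': 2, 'is': 2, 'valuable': 1}
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed across machines independently; only the shuffle needs to move data between them.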

MapReduce on App Engine


MapReduce is an experimental, innovative, and rapidly changing new feature for App Engine


NoSQL
Definition 1
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent, a huge data amount, and more.
nosql-database.org


NoSQL
Definition 2
In computing, NoSQL (sometimes expanded to "not only SQL") is a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.
Wikipedia
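Horizontal scaling is usually achieved by partitioning (sharding) keys across nodes, so adding nodes adds capacity. A minimal sketch, assuming simple hash-modulo placement; `node_for_key`, `ShardedStore` and the node labels are hypothetical names, and this is not the API of any particular NoSQL product. Stored values are plain dicts with no fixed schema, so two records can carry different fields.

```python
import hashlib
from typing import Any, Dict, List, Optional

def node_for_key(key: str, nodes: List[str]) -> str:
    # Hash the key deterministically (Python's built-in hash() is salted per process).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Map the hash onto one of the nodes with a simple modulo.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]

class ShardedStore:
    """Hypothetical in-memory store that spreads keys over several 'nodes'."""

    def __init__(self, nodes: List[str]) -> None:
        self.nodes = nodes
        # One dict per node stands in for a separate server.
        self.shards: Dict[str, Dict[str, Dict[str, Any]]] = {n: {} for n in nodes}

    def put(self, key: str, value: Dict[str, Any]) -> None:
        # Values are schema-free: any dict is accepted.
        self.shards[node_for_key(key, self.nodes)][key] = value

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Only the owning node is consulted; no joins across nodes.
        return self.shards[node_for_key(key, self.nodes)].get(key)

store = ShardedStore(["node-a", "node-b", "node-c"])
store.put("user:1", {"name": "Ann"})
store.put("user:2", {"name": "Bob", "city": "Oslo"})
print(store.get("user:2"))  # {'name': 'Bob', 'city': 'Oslo'}
```

Hash-modulo placement relocates most keys whenever the number of nodes changes; that is one reason many distributed stores use consistent hashing instead.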


From ACID to BASE


ACID:
Atomicity, Consistency, Isolation, Durability

BASE:
Basically available, Soft state, Eventually consistent
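The "eventually consistent" part of BASE can be illustrated with two replicas that accept writes independently and reconcile later. A minimal sketch, assuming last-writer-wins conflict resolution with caller-supplied integer timestamps; `Replica` and its methods are hypothetical names, and real systems use a range of strategies (for example vector clocks or application-level merges).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class Replica:
    # Hypothetical in-memory replica: key -> (timestamp, value).
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        # Accept the write locally without contacting other replicas (basically available).
        self._apply(key, ts, value)

    def read(self, key: str) -> Optional[str]:
        # May return stale data until the replicas have synced (soft state).
        entry = self.data.get(key)
        return entry[1] if entry is not None else None

    def merge_from(self, other: "Replica") -> None:
        # Anti-entropy sync: pull the other replica's entries, keeping the newest.
        for key, (ts, value) in other.data.items():
            self._apply(key, ts, value)

    def _apply(self, key: str, ts: int, value: str) -> None:
        # Last-writer-wins; comparing (ts, value) tuples breaks timestamp ties deterministically.
        current = self.data.get(key)
        if current is None or (ts, value) > current:
            self.data[key] = (ts, value)

a, b = Replica(), Replica()
a.write("cart", "book", ts=1)
b.write("cart", "book+pen", ts=2)
print(a.read("cart"), b.read("cart"))  # book book+pen (replicas disagree)

a.merge_from(b)
b.merge_from(a)
print(a.read("cart"), b.read("cart"))  # book+pen book+pen (converged)
```

An ACID database would instead refuse or serialize the conflicting write so that every reader sees one value immediately; BASE trades that guarantee for availability and converges only after the sync.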


Big Data and cloud


Big Data on AWS


MapReduce on AWS
Amazon Elastic MapReduce: not yet on Hadoop 1.0.0


MapReduce on AWS
EC2, S3 + DynamoDB


Google BigQuery
Features
Speed - Analyze billions of rows(!) in seconds
Scale - Terabytes of data, trillions of records
Simplicity - SQL-like query language, hosted on Google infrastructure
Sharing - Powerful group- and user-based permissions using Google accounts
Security - Secure SSL access
Multiple access methods - Usable through a REST API, a command-line tool, a browser-based graphical interface, and Google Apps Script

BigQuery example
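To illustrate the kind of aggregate query BigQuery's SQL-like language is built for, the sketch below runs a similar query locally with Python's built-in `sqlite3` as a stand-in. It does not call BigQuery: the `words` table and its rows are hypothetical, and BigQuery's own dialect, client libraries and scale differ.

```python
import sqlite3

# Hypothetical table: one row per (word, corpus) with an occurrence count.
rows = [
    ("data", "doc_a", 5),
    ("data", "doc_b", 3),
    ("cloud", "doc_a", 2),
    ("query", "doc_b", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, corpus TEXT, word_count INTEGER)")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", rows)

# Aggregate across all rows, then rank: the typical shape of an analytical query.
query = """
    SELECT word, SUM(word_count) AS total
    FROM words
    GROUP BY word
    ORDER BY total DESC
    LIMIT 2
"""
top_words = conn.execute(query).fetchall()
print(top_words)  # [('data', 8), ('query', 4)]
conn.close()
```

The query itself is plain SQL; what BigQuery adds is running the same scan-and-aggregate pattern over billions of rows on Google's infrastructure.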


Big Data outside of cloud


Oracle Big Data Appliance


About $500,000
18 Oracle Sun servers
864 GB main memory
216 CPU cores
648 TB of raw disk storage
40 Gb/s InfiniBand connectivity between nodes and engineered systems
10 Gb/s Ethernet connectivity
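Dividing the listed totals by the 18 servers gives a rough per-server configuration, assuming resources are spread evenly across servers (an assumption, not stated on the slide); dividing the approximate price by the raw capacity gives a rough cost per terabyte.

```latex
\begin{aligned}
\frac{864\ \text{GB}}{18} &= 48\ \text{GB of RAM per server} \\
\frac{216\ \text{cores}}{18} &= 12\ \text{CPU cores per server} \\
\frac{648\ \text{TB}}{18} &= 36\ \text{TB of raw disk per server} \\
\frac{\$500\,000}{648\ \text{TB}} &\approx \$772\ \text{per TB of raw storage}
\end{aligned}
```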


Autonomy IDOL 10

"For far too long, organizations have confined structured data to relational databases and unstructured data to simplistic keyword matching technologies..." IDOL 10 brings these worlds together, allowing organizations to automatically process, understand, and act on 100 percent of their data, in real-time. The results will be dramatic, as businesses can develop entirely new applications that explore the richness and color of Human Information that live in unstructured, semi-structured, and structured forms.
Price?

Thank you!

