Future Nosql

The Past, Present and Future of NoSQL
Alvin Richards Technical Director, EMEA alvin@10gen.com @jonnyeight alvinonmongodb.com
Your data needs started here...
http://bit.ly/OT71M4
...but soon you had to be here
http://bit.ly/Oxcsis
...probably using one of these
http://bit.ly/QDUIUF
Before NoSQL there was...
Since the dawn of the RDBMS

Main memory
  1970: Intel 1103, 1k bits
  2012: 4GB of RAM costs $25.99
Mass storage
  1970: IBM 3330 Model 1, 100 MB
  2012: 3TB Superspeed USB for $129
Microprocessor
  1970: Nearly (4004 being developed); 4 bits and 92,000 instructions per second
  2012: Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz
More recent changes

Faster
  A decade ago: Buy a bigger server
  Now: Buy more servers
Faster storage
  A decade ago: A SAN with more spindles
  Now: SSD
More reliable storage
  A decade ago: More expensive SAN
  Now: More copies of local storage
Deployed in
  A decade ago: Your data center
  Now: The cloud, private or public
Large data set
  A decade ago: Millions of rows
  Now: Billions to trillions of rows
Development
  A decade ago: Waterfall
  Now: Iterative
http://bit.ly/UmUnsU
http://bit.ly/cnP77L
http://bit.ly/ODoMhh
http://bit.ly/uW2nk
http://bit.ly/Qmg8YD
Challenges for Databases

Build a database for scaleout
Run on clusters of 100s of commodity machines
that enables agile development
and is usable for a broad variety of applications
Is Scaleout Mission Impossible?

What about the CAP Theorem?
Brewer's theorem: Consistency, Availability, Partition Tolerance

It says that if a distributed system is partitioned, you can't update everywhere and still have consistency
So, either allow inconsistency or limit where updates can be applied
Two choices for consistency

Eventual consistency
  Allow updates when a system has been partitioned
  Resolve conflicts later
  Example: CouchDB, Cassandra
Immediate consistency
  Limit the application of updates to a single master node for a given slice of data
  Another node can take over after a failure is detected
  Avoids the possibility of conflicts
  Example: MongoDB
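
The two choices can be illustrated with a toy model. This is a minimal sketch, not how any of the named databases is implemented: the node names "a" and "b", the `simulatePartition` helper and the write values are all hypothetical. It assumes a partition in which each side receives one write for the same slice of data.

```javascript
// Toy model of a partition: each attempt is a write arriving at one node.
// acceptsWrite(node) decides whether that node may take the write.
function simulatePartition(acceptsWrite, attempts) {
  const accepted = [];
  for (const { node, value } of attempts) {
    if (acceptsWrite(node)) accepted.push({ node, value });
  }
  // If more than one node accepted a write, the copies have diverged
  // and must be reconciled once the partition heals.
  const nodes = new Set(accepted.map((w) => w.node));
  return { accepted, conflict: nodes.size > 1 };
}

const attempts = [
  { node: "a", value: "v1" },
  { node: "b", value: "v2" },
];

// Eventual consistency: every node accepts writes during the partition.
const eventual = simulatePartition(() => true, attempts);

// Immediate consistency: only the master ("a", by assumption) accepts writes.
const singleMaster = simulatePartition((node) => node === "a", attempts);

console.log(eventual.conflict);      // conflict to resolve later
console.log(singleMaster.conflict);  // no conflict, but the write to "b" was refused
```

The price of the second choice is visible in `singleMaster.accepted`: the write sent to the non-master side is not applied.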
Scaleout architecture
How do you distribute data among many servers?
Choices
  Hashes (Dynamo style) vs ranges (BigTable style)
    Tradeoff: set-and-forget vs optimizability
  Physical vs logical segments
    Very important with secondary indexes
    Tradeoff: cluster rebalancing ease vs performance optimization
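
A rough sketch of the first choice, hash placement versus range placement. The shard names, the `toyHash` function and the range boundaries are illustrative assumptions, not taken from Dynamo or BigTable.

```javascript
const SHARDS = ["shard01", "shard02", "shard03"];

// Toy string hash for illustration only (not what any real system uses).
function toyHash(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

// Hash placement: the key's hash picks the shard. Nothing to tune,
// but neighbouring keys usually land on different shards.
function hashPlacement(key) {
  return SHARDS[toyHash(key) % SHARDS.length];
}

// Range placement: each shard owns a contiguous key range [min, max).
// "{" is a sentinel that sorts just after "z".
const RANGES = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" },
];

function rangePlacement(key) {
  const range = RANGES.find((r) => key >= r.min && key < r.max);
  return range ? range.shard : null;
}

console.log(hashPlacement("alvin"), rangePlacement("alvin"));
```

With ranges, a query for all keys starting with "al" touches one shard, and the boundaries can be tuned; with hashes there is nothing to tune, which is the set-and-forget side of the tradeoff.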
Why mess with the data model?

Relational minus joins and multi-statement transactions is much less useful
What about partial solutions to joins and multi-statement transactions?
  Hard to implement
  Complex for developers to understand
  Performance implications
Therefore alternatives are worth considering for distributed systems
NoSQL Alternatives
Key-Value: Redis, Voldemort, DynamoDB
Column: Cassandra
Document: MongoDB, CouchDB
Graph: Neo4j, InfiniteGraph
NoSQL and MongoDB
Tradeoff: Scale vs Functionality

[Chart: scalability & performance plotted against depth of functionality, with memcached, key/value stores and RDBMS as reference points]
What MongoDB solves

Agility
  Applications store complex data that is easier to model as documents
  Schemaless DB enables faster development cycles
Flexibility
  Relaxed transactional semantics enable easy scale out
  Auto Sharding for scale down and scale up
Cost
  Cost-effectively operationalize abundant data (clickstreams, logs, tweets, ...)
Sharding: Data Distribution across nodes

Data location transparent to your code
Data distribution is automatic
Data re-distribution is automatic
Aggregate system resources horizontally
No code changes
Sharding - Range distribution

sh.shardCollection("test.tweets", {_id: 1}, false)

shard01: a-i
shard02: j-r
shard03: s-z
Sharding - Splits
shard01: a-i
shard02: ja-jz, k-r
shard03: s-z
Sharding - Splits
shard01: a-i
shard02: ja-ji, ji-js, js-jw, jz-r
shard03: s-z
Sharding - Auto Balancing
shard01: a-i, js-jw (arriving from shard02)
shard02: ja-ji, ji-js, js-jw, jz-r
shard03: s-z, jz-r (arriving from shard02)
Sharding - Auto Balancing
shard01: a-i, js-jw
shard02: ja-ji, ji-js
shard03: s-z, jz-r
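
The split and balancing slides can be sketched as follows. This is a simplified model under stated assumptions, not MongoDB's balancer: split keys are chosen by hand, balance is measured by chunk count only, and `splitChunk`, `balance` and `chunkCounts` are hypothetical helper names. It does not try to reproduce which particular chunks move in the slides.

```javascript
// Count chunks per shard.
function chunkCounts(chunks, shards) {
  const counts = Object.fromEntries(shards.map((s) => [s, 0]));
  for (const c of chunks) counts[c.shard] += 1;
  return counts;
}

// Split one chunk at splitKey; both halves stay on the original shard.
function splitChunk(chunks, index, splitKey) {
  const c = chunks[index];
  const left = { min: c.min, max: splitKey, shard: c.shard };
  const right = { min: splitKey, max: c.max, shard: c.shard };
  return [...chunks.slice(0, index), left, right, ...chunks.slice(index + 1)];
}

// Move chunks from the fullest shard to the emptiest one until the
// chunk counts differ by at most one.
function balance(chunks, shards) {
  const result = chunks.map((c) => ({ ...c }));
  for (;;) {
    const counts = chunkCounts(result, shards);
    const sorted = [...shards].sort((x, y) => counts[x] - counts[y]);
    const least = sorted[0];
    const most = sorted[sorted.length - 1];
    if (counts[most] - counts[least] <= 1) return result;
    result.find((c) => c.shard === most).shard = least;
  }
}

const shards = ["shard01", "shard02", "shard03"];
// "{" is a sentinel that sorts just after "z".
let chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" },
];

// A burst of keys starting with "j" makes shard02's chunk split repeatedly.
chunks = splitChunk(chunks, 1, "ji");
chunks = splitChunk(chunks, 2, "js");
chunks = splitChunk(chunks, 3, "jw");

const balanced = balance(chunks, shards);
console.log(chunkCounts(chunks, shards), chunkCounts(balanced, shards));
```

Splitting only changes metadata; it is the balancing step that moves data between shards.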
Sharding - Routed Query

find({_id: "alvin"})

shard01: a-i, js-jw
shard02: ja-ji, ji-js
shard03: s-z, jz-r

The query includes the shard key (_id), so it is routed only to the shard that owns the matching range (shard01, a-i).
Sharding - Scatter Gather

find({email: "alvin@10gen.com"})

shard01: a-i, js-jw
shard02: ja-ji, ji-js
shard03: s-z, jz-r

The query does not include the shard key, so it is sent to every shard and the results are gathered.
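
The difference between a routed query and a scatter-gather query comes down to whether the router can find the shard key in the query. A minimal sketch, assuming equality matches only and a hypothetical chunk map loosely based on the slides (`CHUNKS` and `targetShards` are illustrative names, not part of MongoDB):

```javascript
// Hypothetical chunk map keyed on _id; "{" is a sentinel that sorts after "z".
const CHUNKS = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "js", shard: "shard02" },
  { min: "js", max: "jw", shard: "shard01" },
  { min: "jw", max: "s", shard: "shard03" },
  { min: "s", max: "{", shard: "shard03" },
];

// Decide which shards must receive a query.
function targetShards(query, shardKey, chunks) {
  const value = query[shardKey];
  if (typeof value === "string") {
    // Routed query: the shard key pins the query to a single chunk.
    const chunk = chunks.find((c) => value >= c.min && value < c.max);
    return chunk ? [chunk.shard] : [];
  }
  // Scatter gather: without the shard key, every shard must be asked.
  return [...new Set(chunks.map((c) => c.shard))].sort();
}

const routed = targetShards({ _id: "alvin" }, "_id", CHUNKS);
const scattered = targetShards({ email: "alvin@10gen.com" }, "_id", CHUNKS);
console.log(routed, scattered);
```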
Sharding - Caching
shard01: 96 GB Mem, 300 GB Data (a-i, j-r, s-z), 3:1 Data/Mem
Aggregate Horizontal Resources

300 GB Data in total
shard01: 96 GB Mem, 100 GB Data (a-i), 1:1 Data/Mem
shard02: 96 GB Mem, 100 GB Data (j-r), 1:1 Data/Mem
shard03: 96 GB Mem, 100 GB Data (s-z), 1:1 Data/Mem
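
The ratios on these two slides are simple arithmetic, sketched here with the slide's own figures. The function name is hypothetical, and the model assumes data is spread evenly across shards.

```javascript
// Data-to-memory ratio per shard, assuming data is spread evenly.
function dataToMemoryRatio(totalDataGB, memPerShardGB, shardCount) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

const oneShard = dataToMemoryRatio(300, 96, 1);    // 300 / 96, roughly 3:1
const threeShards = dataToMemoryRatio(300, 96, 3); // 100 / 96, roughly 1:1
console.log(oneShard.toFixed(2), threeShards.toFixed(2));
```

Adding shards adds memory as well as disk, so the same 300 GB goes from about three times the available memory to roughly fitting in it.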
Replica Sets: Data Availability across nodes

Data Protection
  Multiple copies of the data
  Spread across Data Centers, AZs
High Availability
  Automated Failover
  Automated Recovery
Replica Sets
App: Write and Read on the Primary, Read on the Secondaries
Members: Primary, Secondary, Secondary
Asynchronous Replication from the Primary to the Secondaries
Replica Sets
Automatic Election of new Primary
Members: Primary (old), Primary (new), Secondary
Replica Sets
New primary serves data
Members: Recovering, Primary, Secondary
Replica Sets
App: Write and Read on the Primary, Read on the Secondaries
Members: Secondary, Primary, Secondary
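
The failover sequence can be sketched with a toy election. This is an assumption-laden illustration, not MongoDB's election protocol: the member names, the numeric `optime` field and the `electPrimary` helper are hypothetical, and the only rules modelled are that a majority of members must be reachable and that the most up-to-date reachable member wins.

```javascript
// Pick a new primary from the reachable members, or null if no majority.
function electPrimary(members) {
  const reachable = members.filter((m) => m.up);
  // A strict majority of the whole set must be reachable to elect anyone.
  if (reachable.length * 2 <= members.length) return null;
  // Prefer the member that has replicated the most recent operation.
  return reachable.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

// The old primary ("a") has failed; two secondaries remain.
const afterFailure = [
  { name: "a", up: false, optime: 12 },
  { name: "b", up: true, optime: 10 },
  { name: "c", up: true, optime: 12 },
];

// Only one member left: no majority, so no primary can be elected.
const minority = [
  { name: "a", up: false, optime: 12 },
  { name: "b", up: false, optime: 10 },
  { name: "c", up: true, optime: 12 },
];

console.log(electPrimary(afterFailure), electPrimary(minority));
```

Because replication is asynchronous, a secondary may lag behind the failed primary, which is why the sketch prefers the member with the highest `optime`.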

Data Model
Why JSON?
  Provides a simple, well understood encapsulation of data
  Maps simply to the object in your OO language
  Linking & Embedding to describe relationships
Schema Design Relational Database
Schema Design MongoDB
embedding
linking
Schemas in MongoDB
Design documents that simply map to your application

> post = {author: "Hergé",
          date: new Date(),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.save(post)
Embedding
> db.posts.find({author: "Hergé"})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ]
}
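
To contrast embedding with linking, here is a small sketch using plain in-memory arrays in place of collections. The `posts` and `comments` arrays, the `postId` field and the `loadLinked` helper are hypothetical and only illustrate the shape of the two designs.

```javascript
// Embedded design: the comments live inside the post document.
const embeddedPost = {
  _id: 1,
  author: "Hergé",
  text: "Destination Moon",
  comments: [{ author: "Kyle", text: "great book" }],
};

// Linked design: comments are separate documents that reference the post.
const posts = [{ _id: 1, author: "Hergé", text: "Destination Moon" }];
const comments = [{ postId: 1, author: "Kyle", text: "great book" }];

// With linking, the application performs the "join" itself:
// one lookup for the post, a second one for its comments.
function loadLinked(postId) {
  const post = posts.find((p) => p._id === postId);
  const postComments = comments
    .filter((c) => c.postId === postId)
    .map(({ author, text }) => ({ author, text }));
  return { ...post, comments: postComments };
}

console.log(JSON.stringify(loadLinked(1)) === JSON.stringify(embeddedPost));
```

The embedded form returns the same result in a single read of a single document, so the lookup never has to span servers.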
JSON & Scaleout

Embedding removes need for
  Distributed Joins
  Two Phase commit
Enables data to be distributed across many nodes without penalty

Big Data = MongoDB = Solved

Content Management
Operational Intelligence
E-Commerce
User Data Management
High Volume Data Feeds
Mobile
Location Based Service

Problem:
  Location based social networking service needs to scale to high number of users and check-ins
Solution:
  Used MongoDB deployed on EC2
  8 clusters, 40 machines, 15k QPS, 2.3 billion records
  Auto-sharding and geo-spatial indexing are key
Results:
  To date have scaled to 9m users, 3m check-ins per day, 750m total check-ins, 20m places, 400k merchants
How Telefónica uses MongoDB

London:
  O2 UK: Priority Moments location based offers service
  O2 UK: eCommerce Product Catalog
Madrid:
  M2M (machine to machine) event acquisition platform
  Personalization Server (Oracle migration)

M2M Event Acquisition
[Architecture diagram. Labels: Apps; Event notification; Event Notifier; Portal API; Core; Event Storage; Mng Storage; Mng Platform; Mng; Event Gateway; Event acquisition; BOSS; Operator Network (MNO1, MNO2, MNOn)]

Product Catalog

10gen is the company behind MongoDB

Founded in 2007
  Dwight Merriman, Eliot Horowitz
$73M+ in funding
  Flybridge, Sequoia, Union Square, NEA
Worldwide Expanding Team
  170+ employees
  NY, CA, UK and Australia
Set the direction & contribute code to MongoDB
Foster community & ecosystem
Provide MongoDB cloud services
Provide MongoDB support services
MongoDB is the leading NoSQL solution.

#2 on Indeed's Fastest Growing Jobs
Jaspersoft BigData Index
Demand for MongoDB, the document-oriented NoSQL database, saw the biggest spike with over 200% growth in 2011.
Google Searches
451 Group: MongoDB increasing its dominance
The Evolution of MongoDB

1.8 March 11
  Journaling
  Sharding and Replica set enhancements
  Spherical geo search
2.0 Sept 11
  Index enhancements to improve size and performance
  Authentication with sharded clusters
  Replica Set Enhancements
  Concurrency improvements
2.2 Aug 12
Aggregation Framework
2.4 winter 12
Multi-Data Center Deployments Improved Performance and Concurrency
Future of NoSQL?
Future of the Data Center

Hardware
  More Cores
  More Memory
  More IOPs (SSD)
  More Capacity
  More bandwidth (100GbE)
"Auto Pilot"
  Zero human intervention
Future of NoSQL?
Real Time Analytics
  Can't wait for a batch process / ETL / DW
Ad-Hoc / Analytics
  Map/Reduce = Hammer
Greater Scale
  100s -> 1,000s of nodes
  Petabytes -> Exabytes
Deeper history
Heterogeneous deployment
  Seamless integration with what you have
