http://bit.ly/OT71M4
http://bit.ly/Oxcsis
http://bit.ly/QDUIUF
Nearly
4004 being developed; 4 bits and 92,000 instructions per second
Westmere EX has 10 cores, 30MB L3 cache, runs at 2.4GHz
http://bit.ly/UmUnsU
http://bit.ly/cnP77L
http://bit.ly/ODoMhh
http://bit.ly/uW2nk
http://bit.ly/Qmg8YD
machines
that enables agile development and is usable for a broad variety of applications
can be applied
Immediate consistency
Limit the application of updates to a single master
Scaleout architecture
How do you distribute data among many servers?
Choices:
Hashes (Dynamo style) vs ranges (BigTable style)
Tradeoff: set-and-forget vs optimizability
Physical vs logical segments
Very important with secondary indexes
Tradeoff: cluster rebalancing ease vs performance optimization
performance implications
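The hash versus range choice can be illustrated with a minimal sketch. The shard names and the a-i / j-r / s-z ranges are borrowed from the sharding slides; the function names, the FNV-1a hash, and the simple modulo placement are assumptions made for illustration (Dynamo-style systems typically use consistent hashing rather than a plain modulo).

```javascript
// Hypothetical shard list, named after the shards used later in the deck.
const SHARDS = ["shard01", "shard02", "shard03"];

// Range partitioning (BigTable style): each shard owns a contiguous key range.
// Neighbouring keys land on the same shard, which helps range scans but can
// create hot spots that need tuning ("optimizability").
const RANGES = [
  { min: "a", max: "i", shard: "shard01" },
  { min: "j", max: "r", shard: "shard02" },
  { min: "s", max: "z", shard: "shard03" },
];

function rangeShard(key) {
  const first = key[0].toLowerCase();
  const hit = RANGES.find((r) => first >= r.min && first <= r.max);
  if (!hit) throw new Error(`no range covers key: ${key}`);
  return hit.shard;
}

// 32-bit FNV-1a, standing in for whatever hash a real system would use.
function fnv1a(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hash partitioning (Dynamo style): the hash of the key picks the shard.
// Load spreads evenly without tuning ("set-and-forget"), but neighbouring
// keys scatter across shards.
function hashShard(key) {
  return SHARDS[fnv1a(key) % SHARDS.length];
}

console.log(rangeShard("jones"), hashShard("jones"));
```

With ranges, "jones" and "johnson" are guaranteed to share a shard; with hashes they may or may not, which is the tradeoff the slide names.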
NoSQL Alternatives
Key-Value
Column: Cassandra
Document: MongoDB, CouchDB
Graph: Neo4j, InfiniteGraph
Figure: memcached, key/value stores, and RDBMS plotted against depth of functionality
Flexibility
Relaxed transactional semantics enable easy scale-out
Auto-sharding for scale down and scale up
Cost
shard01: a-i
shard02: j-r
shard03: s-z
Sharding - Splits
shard01: a-i
shard02: ja-jz, k-r
shard03: s-z
Sharding - Splits
shard01: a-i, js-jw
shard02: ja-ji, ji-js
shard03: s-z, jz-r
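The split-then-migrate sequence in these frames can be sketched as follows. This is a toy model, not MongoDB's balancer: the `splitChunk` and `balanceOnce` names, the median-key split point, and the "differ by two chunks" rule are all assumptions chosen to keep the example short.

```javascript
// A chunk is a contiguous key range [min, max) assigned to one shard.
// Split a chunk at its median key so both halves hold about as many keys.
function splitChunk(chunk, keys) {
  const sorted = keys.filter((k) => k >= chunk.min && k < chunk.max).sort();
  if (sorted.length < 2) return [chunk]; // too few keys to split
  const mid = sorted[Math.floor(sorted.length / 2)];
  if (mid === chunk.min) return [chunk]; // a split here would create an empty chunk
  return [
    { min: chunk.min, max: mid, shard: chunk.shard },
    { min: mid, max: chunk.max, shard: chunk.shard },
  ];
}

// Move one chunk from the shard holding the most chunks to the shard holding
// the fewest. Returns false when the chunk counts already differ by less than two.
function balanceOnce(chunks, shards) {
  const count = (s) => chunks.filter((c) => c.shard === s).length;
  const byLoad = [...shards].sort((a, b) => count(a) - count(b));
  const least = byLoad[0];
  const most = byLoad[byLoad.length - 1];
  if (count(most) - count(least) < 2) return false;
  const victim = chunks.find((c) => c.shard === most);
  victim.shard = least; // "migrate" the chunk by reassigning its owner
  return true;
}
```

Splitting only changes metadata on one shard; it is the later migration that moves data, which is why the frames show shard02's range breaking into smaller pieces before any of them appear on shard01 or shard03.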
Sharding - Caching
96 GB Mem, 3:1 Data/Mem
300 GB Data on one shard:
shard01: a-i, j-r, s-z (300 GB)
300 GB Data across three shards:
shard01: a-i (100 GB)
shard02: j-r (100 GB)
shard03: s-z (100 GB)
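The arithmetic behind the caching slide can be checked with a few lines. The 96 GB and 300 GB figures come from the slide; the function name and the assumption that each shard has 96 GB of memory and that data spreads evenly across shards are illustrative.

```javascript
// Figures from the slide: 96 GB of memory per server, 300 GB of data in total.
const MEM_PER_SHARD_GB = 96;
const TOTAL_DATA_GB = 300;

// Data-to-memory ratio on each shard, assuming an even spread of data.
function dataToMemRatio(totalDataGb, shardCount, memPerShardGb) {
  return totalDataGb / shardCount / memPerShardGb;
}

// One shard holds all 300 GB: about three times more data than memory.
console.log(dataToMemRatio(TOTAL_DATA_GB, 1, MEM_PER_SHARD_GB).toFixed(2)); // 3.13

// Three shards hold 100 GB each: the data nearly fits in memory.
console.log(dataToMemRatio(TOTAL_DATA_GB, 3, MEM_PER_SHARD_GB).toFixed(2)); // 1.04
```

Sharding here adds memory as well as disk: the same data set goes from roughly 3:1 to roughly 1:1 data to memory.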
Replica Sets
App writes to and reads from the Primary
Asynchronous Replication from the Primary to two Secondaries; Secondaries can serve reads
Primary fails: one Secondary becomes Primary
New primary serves data; failed node is Recovering
Recovered node rejoins as a Secondary
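The failover sequence in these frames can be walked through with a toy model. Everything here is an assumption for illustration: the `ReplicaSet` class, its method names, and the "promote the most up-to-date secondary" rule. Real replica sets elect a primary by majority vote among members, which this sketch does not model.

```javascript
// Toy replica set: one primary takes writes, secondaries copy its log later.
class ReplicaSet {
  constructor(names) {
    this.members = names.map((name, i) => ({
      name,
      state: i === 0 ? "PRIMARY" : "SECONDARY",
      log: [],
    }));
  }

  primary() {
    return this.members.find((m) => m.state === "PRIMARY");
  }

  // Writes go to the primary only.
  write(doc) {
    const p = this.primary();
    if (!p) throw new Error("no primary available");
    p.log.push(doc);
  }

  // Asynchronous replication: secondaries catch up some time after the write.
  replicate() {
    const p = this.primary();
    if (!p) return;
    for (const m of this.members) {
      if (m.state === "SECONDARY") m.log = [...p.log];
    }
  }

  // The primary fails and the most up-to-date secondary is promoted.
  failPrimary() {
    const old = this.primary();
    if (old) old.state = "DOWN";
    const candidates = this.members.filter((m) => m.state === "SECONDARY");
    if (candidates.length === 0) return null;
    candidates.sort((a, b) => b.log.length - a.log.length);
    candidates[0].state = "PRIMARY";
    return candidates[0];
  }

  // The failed node resyncs from the new primary and rejoins as a secondary.
  recover(name) {
    const m = this.members.find((x) => x.name === name);
    const p = this.primary();
    m.state = "RECOVERING";
    if (p) m.log = [...p.log];
    m.state = "SECONDARY";
  }
}
```

One consequence of asynchronous replication shows up in this model: a write accepted by the old primary but not yet replicated when it fails is absent from the new primary's log.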
Data Model
Why JSON?
encapsulation of data
Maps simply to the object in your OO language
Linking & Embedding to describe relationships
embedding
linking
Schemas in MongoDB
Design documents that simply map to your application
> post = {author: "Hergé",
          date: new Date(),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.save(post)
Embedding
> db.posts.find( { author: "Hergé" } )
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ] }
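The difference between embedding and linking can be shown with plain JavaScript objects, without a database. The `_id` strings, the `comment_ids` field, and the `resolveComments` helper are hypothetical names chosen for this sketch; only the post text and the comment come from the slide.

```javascript
// Embedding: the comments live inside the post document, as in the slide.
const embeddedPost = {
  _id: "post1", // hypothetical id; a real document would carry an ObjectId
  text: "Destination Moon",
  tags: ["comic", "adventure"],
  comments: [{ author: "Kyle", text: "great book" }],
};

// Linking: the post stores only comment ids; comments are separate documents.
const linkedPost = {
  _id: "post1",
  text: "Destination Moon",
  tags: ["comic", "adventure"],
  comment_ids: ["c1"],
};
const commentStore = new Map([
  ["c1", { _id: "c1", author: "Kyle", text: "great book" }],
]);

// A linked post needs one extra lookup per comment to assemble the same view
// that the embedded post returns in a single read.
function resolveComments(post, store) {
  return post.comment_ids.map((id) => store.get(id));
}

console.log(embeddedPost.comments[0].text);
console.log(resolveComments(linkedPost, commentStore)[0].text);
```

Embedding suits data that is read together with its parent; linking suits data that is shared between documents or that grows without bound.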
Mobile
Used MongoDB deployed on EC2
8 clusters, 40 machines, 15k QPS, 2.3 billion records
Auto-sharding and geo-spatial indexing are key
To date have scaled to 9m users, 3m check-ins per day, 750m total check-ins, 20m places, 400k merchants
Results:
Madrid:
Event notification
Event Notifier
Portal API
Core
Event Storage
Mng Storage
Mng Platform
Mng
Event Gateway
Event acquisition
BOSS
Operator Network
Google Searches
56
2.0 Sept 11
Index enhancements to improve size and performance
Authentication with sharded clusters
Replica Set Enhancements
Concurrency improvements
2.2 Aug 12
Aggregation Framework
2.4 winter 12
Future of NoSQL?
"Auto Pilot"
More Cores
More Memory
More IOPs (SSD)
More Capacity
More bandwidth (100GbE)
Zero human intervention
Future of NoSQL?
Real Time Analytics: Can't wait for a batch process / ETL / DW
Ad-Hoc / Analytics: Map/Reduce = Hammer
Greater Scale: 100s -> 1,000s of nodes, Petabytes -> Exabytes
Deeper history
Heterogeneous deployment