@ebenhewitt
10. 14. 10
strange loop
st louis
• i wrote this
agenda
• context
• features
• data model
• api
“If I had asked the people what they wanted, they would have said ‘faster horses’.”
--Henry Ford
so it turns out, there’s a lot of
data in the world…
• Google processes 8 EB of data every year
– 24 PB every day
– 1PB is a quadrillion bytes
– 1 EB is 1024 PB
• eBay
– 50TB of new data every day
• World of Warcraft
– uses 1.3 PB to store the game
• Chevron
– 2TB of data every day
• WalMart’s Customer Database
– 2004, .5 petabyte = 500 TB
The movie Avatar required 1 PB of storage
distributed
decentralized
fault tolerant
elastic
durable
database
cassandra.apache.org
innovation at scale
google bigtable (2006)
• consistency model: strong
• data model: sparse map
• clones: hbase, hypertable
• column family, sequential writes, bloom filters, linear insert performance
• CP
amazon dynamo (2007)
• consistency model: client tune-able
• data model: key-value
• O(1) dht
• clones: riak, voldemort
• symmetric p2p, gossip
• AP
proven
• SimpleGeo >50 Large EC2 instances
• consistency
– all clients have same view of data
• availability
– writeable in the face of node failure
• partition tolerance
– processing can continue in the face of
network failure (crashed router, broken
daniel abadi: pacelc
• partition: trade-off between A & C
• normal condition: trade-off between latency & consistency
write consistency
Level    Description
ZERO     Good luck with that
ANY      1 replica (hints count)
ONE      1 replica. read repair in background
QUORUM   (N / 2) + 1
ALL      N = replication factor
read consistency
Level    Description
ZERO     Ummm…
ANY      Try ONE instead
ONE      1 replica
QUORUM   Return most recent TS after (N / 2) + 1 report
ALL      N = replication factor
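The QUORUM rows in the two tables come down to a little arithmetic. The sketch below works it out in Java. `QuorumMath` and its method names are hypothetical helpers, not part of any Cassandra client, and the R + W > N overlap rule is the usual quorum argument rather than something stated on these slides.

```java
// Hypothetical helper for illustration; not part of the Cassandra API.
final class QuorumMath {
    /** Quorum size for replication factor n: (n / 2) + 1, using integer division. */
    static int quorum(int n) {
        return (n / 2) + 1;
    }

    /** True when any set of read replicas must share a node with any set of written replicas. */
    static boolean overlaps(int readReplicas, int writeReplicas, int n) {
        return readReplicas + writeReplicas > n;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor
        int q = quorum(n);
        System.out.println("N=" + n + " quorum=" + q);
        System.out.println("QUORUM read + QUORUM write overlap: " + overlaps(q, q, n));
        System.out.println("ONE read + ONE write overlap: " + overlaps(1, 1, n));
    }
}
```

With N = 3, QUORUM is 2, so a QUORUM write followed by a QUORUM read always touches at least one common replica, while ONE/ONE does not guarantee that.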
durability
fast writes: staged eda
• A general-purpose framework for high
concurrency & load conditioning
• Decomposes applications into stages
separated by queues
• Adopt a structured approach to event-
driven concurrency
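A minimal sketch of the "stages separated by queues" idea, in plain Java. It is not Cassandra's own stage implementation: `StagedPipeline`, the two stages, and the upper-casing step are hypothetical stand-ins chosen only to show events flowing from one queue to the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical two-stage pipeline; each stage has its own thread and its own inbound queue.
final class StagedPipeline {
    // Unique marker object; compared by identity so no real event can collide with it.
    private static final String DONE = new String("done");

    static List<String> run(List<String> events) {
        BlockingQueue<String> parseQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> writeQueue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>();

        // Stage 1: normalize each event, then hand it to the next stage's queue.
        Thread parseStage = new Thread(() -> {
            try {
                for (String e = parseQueue.take(); e != DONE; e = parseQueue.take()) {
                    writeQueue.put(e.trim().toUpperCase());
                }
                writeQueue.put(DONE); // propagate shutdown downstream
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: "write" each event (here, just collect it in arrival order).
        Thread writeStage = new Thread(() -> {
            try {
                for (String e = writeQueue.take(); e != DONE; e = writeQueue.take()) {
                    written.add(e);
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });

        parseStage.start();
        writeStage.start();
        try {
            for (String e : events) {
                parseQueue.put(e);
            }
            parseQueue.put(DONE);
            parseStage.join();
            writeStage.join(); // join makes the collected list safely visible to the caller
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", ex);
        }
        return written;
    }
}
```

Because the stages only meet at the queues, each one can be sized and throttled on its own, which is the load-conditioning point made in the bullets.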
highly
structure
keyspace
• settings (eg, partitioner)
column family…
• settings (eg, comparator, type [Std])
column…
• name
• value
• timestamp
keyspace
• ~= database
• typically one per application
• some settings are configurable only per keyspace
– partitioner
• configured in the API (previously in XML, then in YAML)
create a keyspace
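import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.*;
import org.apache.thrift.transport.*;
import java.util.ArrayList;

//Create Keyspace
KsDef k = new KsDef();
k.setName(keyspaceName);
k.setReplication_factor(1);
k.setStrategy_class("org.apache.cassandra.locator.RackUnawareStrategy");
k.setCf_defs(new ArrayList<CfDef>()); //required field; no column families yet
//Connect to Server
TTransport tr = new TSocket(HOST, PORT);
TFramedTransport tf = new TFramedTransport(tr); //framed transport is the new default
TProtocol proto = new TBinaryProtocol(tf);
Cassandra.Client client = new Cassandra.Client(proto);
tr.open();
//Submit the keyspace definition to the cluster
client.system_add_keyspace(k);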
partitioner smack-down
Random
• system will use MD5(key) to distribute data across nodes
• even distribution of keys from one CF across ranges/nodes
Order Preserving
• key distribution determined by token
• lexicographical ordering
• can specify the token for this node to use
• ‘scrabble’ distribution
• required for range queries
– scan over rows like cursor in index
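The difference between the two partitioners can be sketched with the JDK alone. `PartitionerSketch` is a hypothetical class: the MD5 token below reads the digest as an unsigned integer, which illustrates the idea but is not guaranteed to match the exact token value Cassandra computes.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of how each partitioner style derives a token from a row key.
final class PartitionerSketch {
    /** Random style: MD5 of the key bytes, read as a non-negative integer. */
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: treat the 16 bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Order-preserving style: the key itself is the token, compared lexicographically. */
    static String orderPreservingToken(String key) {
        return key;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alison", "eben", "ebenhewitt"}) {
            System.out.println(key + " -> md5 token " + md5Token(key).toString(16)
                    + ", order-preserving token " + orderPreservingToken(key));
        }
    }
}
```

Neighbouring keys such as "eben" and "ebenhewitt" land far apart under MD5 but stay adjacent when the key is the token, which is why range scans need the order-preserving style.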
column family
• group records of similar kind
• CFs are sparse tables
• ex:
– Tweet
– Address
– Customer
– PointOfInterest
column family
[diagram: two rows, each a key with its own set of columns]
• key 123: user=eben, nickname=The Situation
• key 456: user=alison, icon=, n=42
json-like notation
User {
123 : { user:eben,
nickname: The Situation },
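One way to picture the notation is as a map of sorted maps: row key to (column name to value). The Java sketch below is only a mental model, with a hypothetical `SparseColumnFamily` class and strings standing in for byte arrays. It is not how Cassandra stores data on disk.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a sparse column family.
final class SparseColumnFamily {
    // row key -> (column name -> value); the inner map keeps columns sorted by name
    private final Map<String, SortedMap<String, String>> rows = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    /** Columns of one row, or an empty map when the row does not exist. */
    SortedMap<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        SparseColumnFamily user = new SparseColumnFamily();
        user.put("123", "user", "eben");
        user.put("123", "nickname", "The Situation");
        user.put("456", "user", "alison"); // no nickname column: rows need not share columns
        System.out.println(user.row("123"));
        System.out.println(user.row("456"));
    }
}
```

Row 456 simply has no nickname entry. Nothing is stored for a column that a row does not use, which is what "sparse" means here.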
row-oriented
• each row is uniquely identifiable by
key
• rows group columns and super columns
a column has 3 parts
1. name
– byte[]
– determines sort order
– used in queries
– indexed
2. value
– byte[]
– you don’t query on column values
3. timestamp
– long (clock)
– last-write-wins conflict resolution
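The three parts and the last-write-wins rule can be sketched as a small Java class. `ColumnSketch` is hypothetical and is not the Thrift `Column` type. How it breaks a tie between equal timestamps (the first argument wins) is an assumption made only to keep the example short.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a column: name, value, timestamp.
final class ColumnSketch {
    final byte[] name;    // sorted and queried on
    final byte[] value;   // opaque to queries
    final long timestamp; // supplied by the writing client's clock

    ColumnSketch(byte[] name, byte[] value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    static ColumnSketch of(String name, String value, long timestamp) {
        return new ColumnSketch(name.getBytes(StandardCharsets.UTF_8),
                value.getBytes(StandardCharsets.UTF_8), timestamp);
    }

    /** Last write wins: keep whichever version carries the higher timestamp. */
    static ColumnSketch reconcile(ColumnSketch a, ColumnSketch b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        ColumnSketch older = of("nickname", "The Situation", 1L);
        ColumnSketch newer = of("nickname", "eben", 2L);
        ColumnSketch winner = reconcile(newer, older);
        System.out.println(new String(winner.value, StandardCharsets.UTF_8));
    }
}
```

The order in which replicas see the two versions does not matter. Only the timestamps decide, so client clocks need to be reasonably in sync.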
get started
$cassandra -f
$bin/cassandra-cli
cassandra> connect localhost/9160
super column family
[diagram: each row key holds super columns (<<SC>>), each with its own columns]
• key 10017
– <<SC>> Central Park: desc=Fun to walk in.
– <<SC>> Empire State Bldg: phone=212.555.11212, desc=Great view from 102nd floor!
• key 63112
– <<SC>> The Loop: phone=314.555.11212, desc=Home of Strange Loop!
PointOfInterest { //super column family
  key: 85255 {
    Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. },
    Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
  }, //end phx
  key: 10019 {
    Central Park { desc: Walk around. It's pretty. }, //flexible schema
    Empire State Building { phone: 212-777-7777,
      desc: Great view from 102nd floor. }
  } //end nyc
}
about super column families
• sub-column names in a SCF are not
indexed
– top level columns (SCF Name) are always
indexed
• often used for denormalizing data
from standard CFs
rdbms: domain-based
model
what answers do I have?
big query language
cassandra: query-based
model
what questions do I have?
replication
• configurable replication factor
• replica placement strategy
– rack unaware → Simple Strategy
– rack aware → Old Network Topology Strategy
– data center shard → Network Topology Strategy
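A rough Java sketch of rack-unaware placement follows. It assumes the common description of that strategy: the first replica goes to the node that owns the key's token, and the rest go to the next nodes around the ring. `SimpleRingPlacement`, the `long` tokens, and the node names are all hypothetical. This is not Cassandra's own code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical ring walk: node token -> node name, replicas chosen clockwise.
final class SimpleRingPlacement {
    static List<String> replicasFor(long keyToken, TreeMap<Long, String> ring, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        // Cannot place more replicas than there are nodes.
        int wanted = Math.min(replicationFactor, ring.size());
        // Owner: first node whose token is >= the key's token, wrapping to the start of the ring.
        Long token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token); // next node clockwise
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(0L, "nodeA");
        ring.put(100L, "nodeB");
        ring.put(200L, "nodeC");
        System.out.println(replicasFor(150L, ring, 2));
    }
}
```

The topology-aware strategies differ in which nodes they skip while walking the ring. That part is not modelled here.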
slice predicate
• data structure describing columns to
return
– SliceRange
• start column name (byte[])
• finish column name (can be empty to stop on
count)
• reverse
• count (like LIMIT)
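What a slice range selects can be modelled on an ordinary sorted map. The Java sketch below is a hypothetical `SliceRangeSketch` that uses strings instead of byte[] column names. It imitates the start / finish / reverse / count fields and does not call the Thrift API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of a slice over one row's sorted columns.
final class SliceRangeSketch {
    /**
     * Returns column names between start and finish (inclusive), in slice order.
     * An empty start or finish means "unbounded"; count caps the result like LIMIT.
     * When reversed is true, start is expected to be the higher name.
     */
    static List<String> slice(NavigableMap<String, String> columns,
                              String start, String finish, boolean reversed, int count) {
        NavigableMap<String, String> ordered = reversed ? columns.descendingMap() : columns;
        int dir = reversed ? -1 : 1;
        List<String> names = new ArrayList<>();
        for (String name : ordered.keySet()) {
            if (!start.isEmpty() && dir * name.compareTo(start) < 0) {
                continue; // before the start of the slice
            }
            if (!finish.isEmpty() && dir * name.compareTo(finish) > 0) {
                break; // past the end of the slice
            }
            if (names.size() == count) {
                break; // count reached
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String c : new String[] {"city", "email", "state", "username"}) {
            row.put(c, "...");
        }
        System.out.println(slice(row, "email", "", false, 2));
    }
}
```

Leaving finish empty and relying on count is the "stop on count" case from the bullets.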
read api
• get() : Column
– get the Col or SC at given ColPath
ColumnOrSuperColumn cosc = client.get(key, path, CL);
• get_slice() : List<ColumnOrSuperColumn>
– get Cols in one row, specified by SlicePredicate:
List<ColumnOrSuperColumn> results =
client.get_slice(key, parent, predicate, CL);
• get_range_slices() : List<KeySlice>
– returns multiple Cols according to a range
– range is startkey, endkey, starttoken, endtoken:
List<KeySlice> slices = client.get_range_slices(
insert
client.insert(userIDKey, cp,
    new Column("name".getBytes(UTF8),
        "George Clinton".getBytes(UTF8), clock),
    CL);
delete
String columnFamily = "Standard1";
byte[] key = "k2".getBytes(); //row key
• pycassa (python)
• Telephus (twisted python)
• fauna/cassandra gem (ruby)
• hector (java)
• pelops (java)
• kundera (JPA)
• hectorSharp (C#)
?
what about…
SELECT WHERE
ORDER BY
JOIN ON
GROUP
SELECT WHERE
cassandra is an index factory
<<cf>>USER
Key: UserID
Cols: username, email, birth date, city, state
How to support this query?
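One common answer is to maintain a second column family that acts as the index. The Java sketch below assumes the question is "which users live in a given city?". That query, the `CityIndexSketch` class, and the sample values are hypothetical, and the maps only model column families in memory.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: a USER column family plus a hand-built index column family.
final class CityIndexSketch {
    // USER: key = userID, columns = username, city, ...
    private final Map<String, Map<String, String>> users = new TreeMap<>();
    // Index CF: key = city, column names = userIDs (sorted, like column names are)
    private final Map<String, SortedSet<String>> usersByCity = new TreeMap<>();

    /** Every write goes to both column families; the application keeps them in step. */
    void insertUser(String userId, String username, String city) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("username", username);
        columns.put("city", city);
        users.put(userId, columns);
        usersByCity.computeIfAbsent(city, c -> new TreeSet<>()).add(userId);
    }

    /** The "WHERE city = ?" lookup becomes a single-row read on the index CF. */
    SortedSet<String> userIdsInCity(String city) {
        return usersByCity.getOrDefault(city, Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        CityIndexSketch db = new CityIndexSketch();
        db.insertUser("123", "eben", "St. Louis");
        db.insertUser("456", "alison", "St. Louis");
        System.out.println(db.userIdsInCity("St. Louis"));
    }
}
```

The sketch does not remove the old index entry when a user changes city. A real application has to handle that update itself.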
Columns
are sorted according to CompareWith or CompareSubcolumnsWith
Rows
are placed according to their Partitioner:
• Random: MD5 of key
• Order-Preserving: actual key
--Ray Kurzweil
@ebenhewitt
cassandra.apache.org