
Scaling to 200K Transactions per Second
with Open Source - MySQL, Java, curl, PHP
by Dathan Vance Pattishall
Contents
Who am I?
Introduction
Now I work at RockYou

When I started, Facebook shards alone
do 100K TPS.
MySpace, Hi5, Orkut, Ads, the Main
Site, and various other DB servers
sum to 100K TPS
on less than 120 database servers.

• 32–48 GB of RAM
• 8-disk RAID 10 with a 256 MB PERC 6
controller
• We can support any logical SQL query
TEAM
The Requirements
• Scale linearly
• Store some data forever
• Allow for change
• Keep it cheap
• Oh, and downtime is not an option
Design to Meet the
Requirements
Federation

[Diagram: User 1's Data, User 2's Data,
User 3's Data, …, User N's Data
partitioned across multiple shards]


Federation
Does NOT increase write throughput
Federation

This Increases Write Throughput
How does one Federate?
Enter Global Lookup Cluster
• Hash lookups are fast: 45K QPS on a
single server
• Ownerid -> Shard_id
• Groupid -> Shard_id
• Tagid -> Shard_id
• Url_id -> Shard_id
• Fronted by memcache
• Use consistent hashing to add capacity
horizontally and for HA
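The slide's consistent-hashing idea can be sketched as below. This is a minimal, hypothetical illustration (the class name `ShardRing` and the virtual-node count are mine, not from the talk): shards are hashed onto a ring, a key maps to the next shard clockwise, and adding a shard only remaps a fraction of the keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.TreeMap;

// Illustrative consistent-hashing ring: ownerid -> shard_id lookups stay
// mostly stable when capacity is added horizontally.
public class ShardRing {
    private static final int VNODES = 100; // virtual nodes per shard, for even spread
    private final TreeMap<Long, Integer> ring = new TreeMap<>();

    public void addShard(int shardId) {
        for (int v = 0; v < VNODES; v++) {
            ring.put(hash("shard-" + shardId + "-" + v), shardId);
        }
    }

    public int shardFor(String ownerId) {
        if (ring.isEmpty()) throw new IllegalStateException("no shards");
        Long h = ring.ceilingKey(hash(ownerId));
        if (h == null) h = ring.firstKey(); // wrap around the ring
        return ring.get(h);
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        ShardRing ring = new ShardRing();
        for (int s = 1; s <= 4; s++) ring.addShard(s);
        int[] before = new int[1000];
        for (int i = 0; i < 1000; i++) before[i] = ring.shardFor("owner:" + i);
        ring.addShard(5); // add capacity
        int moved = 0;
        for (int i = 0; i < 1000; i++)
            if (ring.shardFor("owner:" + i) != before[i]) moved++;
        System.out.println(moved < 500); // only a fraction (~1/5) of keys remap
    }
}
```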
Write Multiple Views of the
Data
Keep Data Consistent
What if I need an ID to
represent a row
REPLACE INTO Tickets VALUES ('a');
Get an ID back

CREATE TABLE `TicketsGeneric` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM
AUTO_INCREMENT=7445309740
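A usage sketch of the ticket table above: REPLACE deletes any existing row with the same stub and inserts a fresh one, which bumps the auto_increment counter, and the new global ID is then read back on the same connection.

```
REPLACE INTO TicketsGeneric (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
```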
But what if I need a global view
of the table
• Cron jobs
• Front by memcache
• Offline tasks to atomically write the
job and return the page quickly,
i.e. defer writes to Many RECPT
– Pure PHP
– Like GEARMAND, uses IPC distributed
across servers
– Does 100 million actions per day and
scales linearly
• @see Friend Query Section
What about maintenance
What about Shard
Misbalance?
Migrate them
• object_id -> shard_id; lock shard_id
for object_id
• Migrate the user
• If error, die and send alert
• Takes less than 30 seconds per
primary object
• Currently shards are self-balancing;
can migrate 4 million users in 8
days at the slowest setting
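The migration steps above can be sketched roughly as follows. This is a hypothetical in-memory illustration (all names are mine): in production the mapping lives in the Global Lookup Cluster and `copyRows` is a real data move between shards.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative migration protocol: lock the object's mapping, copy its rows
// to the target shard, flip the lookup entry, unlock.
public class ShardMigrator {
    private final Map<Long, Integer> lookup = new ConcurrentHashMap<>();
    private final Set<Long> locked = ConcurrentHashMap.newKeySet();

    public ShardMigrator(Map<Long, Integer> initial) {
        lookup.putAll(initial);
    }

    public boolean migrate(long objectId, int targetShard) {
        if (!locked.add(objectId)) return false; // another migration in flight
        try {
            Integer source = lookup.get(objectId);
            if (source == null || source == targetShard) return false;
            copyRows(objectId, source, targetShard); // real code moves the rows
            lookup.put(objectId, targetShard);       // flip the global lookup
            return true;
        } finally {
            // per the slide, a failure inside the try would die and send an alert
            locked.remove(objectId);
        }
    }

    private void copyRows(long objectId, int from, int to) {
        // placeholder for the actual cross-shard data copy
    }

    public int shardOf(long objectId) {
        return lookup.get(objectId);
    }

    public static void main(String[] args) {
        Map<Long, Integer> init = new ConcurrentHashMap<>();
        init.put(42L, 1);
        ShardMigrator m = new ShardMigrator(init);
        System.out.println(m.migrate(42L, 2)); // true
        System.out.println(m.shardOf(42L));    // 2
    }
}
```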
What about managing
datasize
• Enter Shard Types
– Archive Shard
– Sub Shards
• One way a DBA can scale is to
partition and allocate a server per
table. Why not partition by shard
type?
• Allows for bleeding-edge tech: have
10 shards running XTRA-DB

What about Split Brain?
Friend Queries
MULTI-GET from Shards
Jetty + J/Connect
(AsyncShard Server)
• Can query 8 shards at a time in
parallel
• Data is merged on the fly
• JSON is the communication protocol
• private ExecutorService exec =
    Executors.newFixedThreadPool(8);
  // 4 CPU * 0.8 Ut * (1 + W/C) =~ 8
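A sketch of the multi-get fan-out around that thread pool. The shard query here is simulated (the real AsyncShard server talks JDBC to each shard and speaks JSON to callers); class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative parallel MULTI-GET: fan one query out to N shards on a
// fixed pool, then merge the per-shard results.
public class MultiGet {
    // pool sized per the slide: 4 CPU * 0.8 utilization * (1 + wait/compute) =~ 8
    private static final ExecutorService exec =
            Executors.newFixedThreadPool(8, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // don't keep the JVM alive after main exits
                return t;
            });

    static List<String> queryShard(int shardId, String query) {
        // stand-in for a JDBC call against one shard
        return List.of("shard" + shardId + ":" + query);
    }

    public static List<String> multiGet(String query, int shards)
            throws InterruptedException, ExecutionException {
        List<Future<List<String>>> futures = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            final int shardId = s;
            futures.add(exec.submit(() -> queryShard(shardId, query)));
        }
        List<String> merged = new ArrayList<>(); // merge on the fly
        for (Future<List<String>> f : futures) merged.addAll(f.get());
        return merged;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(multiGet("friends:42", 8).size()); // 8
    }
}
```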


J/Connect
/* mysql-connector-java-5.1.7 ( Revision: ${svn.Revision} ) */
SHOW VARIABLES WHERE Variable_name = 'language'
  OR Variable_name = 'net_write_timeout'
  OR Variable_name = 'interactive_timeout'
  OR Variable_name = 'wait_timeout'
  OR Variable_name = 'character_set_client'
  OR Variable_name = 'character_set_connection'
  OR Variable_name = 'character_set'
  OR Variable_name = 'character_set_server'
  OR Variable_name = 'tx_isolation'
  OR Variable_name = 'transaction_isolation'
  OR Variable_name = 'character_set_results'
  OR Variable_name = 'timezone'
  OR Variable_name = 'time_zone'
  OR Variable_name = 'system_time_zone'
  OR Variable_name = 'lower_case_table_names'
  OR Variable_name = 'max_allowed_packet'
  OR Variable_name = 'net_buffer_length'
  OR Variable_name = 'sql_mode'
  OR Variable_name = 'query_cache_type'
  OR Variable_name = 'query_cache_size'
  OR Variable_name = 'init_connect';
Fix
 Add:
 &cacheServerConfiguration=true
 To your JDBC url directive

 @see
http://assets.en.oreilly.com/1/event/21/Connecto
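With that flag added, a JDBC URL would look something like the line below (host, port, and database name here are illustrative, not from the talk):

```
jdbc:mysql://dbhost:3306/rockyou?cacheServerConfiguration=true
```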


Writing Large Strings
REALTIME
• Incrementing impressions is easy,
but storing referrer URLs is not as
easy in real time
• Why? You must know the limits of the
storage engine you are using
INNODB & Strings
• Indexing a string takes a lot of space
• Indexing a large string takes even
more space
• Each index has its own 16KB page.
• Fragmentation across pages was
hurting the app – chewing up I/O
• Lots of disk space chewed up per day
• Due to a bunch of overhead with
Strings & Deadlock detection

INNODB & High Concurrency of
Writes
• Requirement: 300 ms for total DB
access FOR ALL apps
• Writes slow down at high concurrency
when the datafile(s) size is greater
than the buffer pool size
• 10 ms to 20 seconds sometimes for
the full transaction
• Fixed by offloading the query to an
OfflineTask that writes it as a single
thread

Deadlock / Transaction
Overhead Solved
• Put a Java daemon that buffers up to
4000 messages (transactions) and
apply it serially with one thread
• It does not go down & if it does we
can fail over
• Log data to local disk for outstanding
trans
• It does not use much memory or cpu
• Even during peak messages do not
exceed 200 outstanding
transactions
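The buffering daemon above can be sketched as a bounded queue drained by one writer thread, so the database never sees concurrent writers. This is a minimal illustration (names are mine; `applyToDb` stands in for the real transaction apply); the 4000-message cap matches the slide.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative serial-apply daemon: producers enqueue transactions,
// a single thread applies them one at a time.
public class SerialWriter {
    private final BlockingQueue<String> buffer = new ArrayBlockingQueue<>(4000);
    private final AtomicInteger applied = new AtomicInteger();
    private final Thread writer;

    public SerialWriter() {
        writer = new Thread(() -> {
            try {
                while (true) applyToDb(buffer.take());
            } catch (InterruptedException e) {
                // shutdown: drain outstanding transactions, then exit
                String m;
                while ((m = buffer.poll()) != null) applyToDb(m);
            }
        });
        writer.start();
    }

    public void submit(String message) throws InterruptedException {
        buffer.put(message); // blocks if 4000 messages are outstanding
    }

    private void applyToDb(String message) {
        applied.incrementAndGet(); // real code runs the transaction here
    }

    public int applied() {
        return applied.get();
    }

    public void shutdown() throws InterruptedException {
        writer.interrupt();
        writer.join();
    }

    public static void main(String[] args) throws InterruptedException {
        SerialWriter w = new SerialWriter();
        for (int i = 0; i < 100; i++) w.submit("msg" + i);
        w.shutdown();
        System.out.println(w.applied()); // 100
    }
}
```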
Disk Consumption solved
• Archive data
• Compress using INNODB 1.0.4
• innodb_file_format = Barracuda
• 8K key block size – best bang for the
buck for our data; smaller key block
sizes cause major slowdowns in
transactions
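A sketch of the compression setup described above (the table and column names are illustrative, not from the talk): compressed row format requires the Barracuda file format and per-table tablespaces, and the key block size is set per table.

```
-- settings required for compressed InnoDB tables
SET GLOBAL innodb_file_format = Barracuda;
SET GLOBAL innodb_file_per_table = 1;

-- hypothetical archive table with an 8K key block size
CREATE TABLE impressions_archive (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  referrer_url VARCHAR(2048) NOT NULL,
  hits INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (id)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
```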
Stats Across All Services
Questions / Want to Work
here?

dathan@rockyou.com
