Web Servers
• Linux.
• Can usually scale by adding
machines.
• Ruby, Python, PHP, Groovy.
– Web code is not the bottleneck.
– Most time is spent waiting on RPCs.
– Development speed is critical.
The CPU is not the bottleneck.
Any language should be fast
enough; it just needs to be dynamic.
Serving Media
• Each piece of Media should be
hosted by a “mini-cluster”.
– Scalability.
– More than 1 HDD to serve the media.
– Online Backup.
• Apache, lighttpd (issues: high load,
context switching).
• Switch from single process to multi-process.
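The mini-cluster idea above can be sketched as follows. This is only an illustration under assumptions not in the slides: the server names, the MD5-based mapping, and the round-robin choice are all hypothetical, and a real deployment would also need health checks and rebalancing.

```python
import hashlib
from itertools import count

# Hypothetical mini-clusters: every machine in a cluster holds a full copy
# of the media that cluster owns (more than one disk, plus an online backup).
MINI_CLUSTERS = [
    ["media-a1", "media-a2"],
    ["media-b1", "media-b2"],
    ["media-c1", "media-c2"],
]

_round_robin = count()


def cluster_for(media_id: str) -> list:
    """Map a media ID to one mini-cluster with a stable hash."""
    digest = hashlib.md5(media_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(MINI_CLUSTERS)
    return MINI_CLUSTERS[index]


def pick_server(media_id: str) -> str:
    """Spread requests for one media item over the machines in its cluster."""
    servers = cluster_for(media_id)
    return servers[next(_round_robin) % len(servers)]
```

Because each item always maps to the same cluster, capacity for a popular item can be raised by adding machines to that cluster only.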
Serving Media
[Figure: the Internet, CDNs, and four SuperCourse servers (Serv1 to Serv4); one label reads "Moderately played".]
Serving Thumbnails
• Thumbnails are scary: they can
become a bottleneck.
– Disk Seeks.
– Many small objects.
– High number of requests/sec
Serving Thumbnails
• Limit on the number of files in a
directory (ext3).
• Squid (reverse proxy); Varnish is a
better choice.
• Apache may not be sufficient under load
(too many disk reads).
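One common way to stay under a per-directory file limit is to spread files over hashed subdirectories. The sketch below assumes this approach; the slides do not say which layout was used, and the function name, the SHA-1 hash, and the two-level depth are illustrative choices.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(root: str, name: str, levels: int = 2) -> PurePosixPath:
    """Return root/ab/cd/name, where ab and cd come from a hash of name."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Each level uses two hex characters, so it has at most 256 subdirectories.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return PurePosixPath(root, *parts, name)
```

With two levels there are up to 65,536 leaf directories, so even tens of millions of thumbnails leave each directory with a few hundred files on average.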
Thumbnails: lighttpd/aio
• lighttpd is single-threaded…
[Figure: Main Thread, Worker Thread 2, Disk Read]
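The pattern in the figure, a single main thread that hands blocking disk reads to worker threads, can be sketched in Python. This is not lighttpd's code: it uses asyncio with a thread pool as a stand-in, and the function names, file names, and pool size are assumptions made for the example.

```python
import asyncio
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def read_file(path: str) -> bytes:
    """Blocking disk read; runs on a worker thread, not the main loop."""
    with open(path, "rb") as handle:
        return handle.read()


async def serve_many(paths: list, workers: int = 4) -> list:
    """The main loop stays free while worker threads wait on the disk."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tasks = [loop.run_in_executor(pool, read_file, p) for p in paths]
        return list(await asyncio.gather(*tasks))


def demo() -> list:
    """Write three tiny files, then read them back through the pool."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, f"thumb{i}.jpg")
            with open(path, "wb") as handle:
                handle.write(f"thumb-{i}".encode())
            paths.append(path)
        return asyncio.run(serve_many(paths))
```

Results come back in request order, and a slow read on one file does not stop the main thread from accepting other work.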
Thumbnails
• Accessing many small files may
become a bottleneck (disk reads).
Thumbnails: BT
• Google uses a system called BTFE in
YouTube, Google Video, Image Search, etc.
– Based on Google Bigtable.
– Avoids the small-files problem.
– Various forms of caching (multiple
cache layers based on location, etc.).
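The small-files problem can be illustrated by packing many blobs into one large stream with an offset index. The sketch below is not how BTFE or Bigtable works internally; it is a hypothetical in-memory stand-in (the class name and methods are invented for the example) that only shows why one big file avoids one file-system entry per thumbnail.

```python
import io


class PackedStore:
    """Append small blobs to one stream and keep an in-memory offset index."""

    def __init__(self) -> None:
        self._data = io.BytesIO()  # stands in for one large file on disk
        self._index = {}           # key -> (offset, length)

    def put(self, key: str, blob: bytes) -> None:
        """Append the blob at the end and remember where it lives."""
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(blob)
        self._index[key] = (offset, len(blob))

    def get(self, key: str) -> bytes:
        """Fetch a blob with a single seek plus one read."""
        offset, length = self._index[key]
        self._data.seek(offset)
        return self._data.read(length)
```

A lookup costs one index probe and one read, regardless of how many thumbnails are stored.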
Databases
• Stores metadata (users, bookmarks,
comments, etc…)
• Database performance degrades
with disk reads.
• Pay some attention to “swap” in the
Linux kernel, as the OS may swap the
database engine in and out.
DB Optimizations
• Query Optimizations
• Batch Jobs
• memcached
• App server caching.
• Pre-calculation of common queries.
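The memcached bullet usually means a cache-aside pattern: check the cache first and query the database only on a miss. The sketch below assumes that pattern; a plain dictionary stands in for memcached, and the class and the `query_db` callable are hypothetical names, not an API from the slides.

```python
from typing import Callable, Dict


class CachedLookup:
    """Cache-aside wrapper around a slow database query."""

    def __init__(self, query_db: Callable[[str], str]) -> None:
        self._query_db = query_db
        self._cache: Dict[str, str] = {}  # stand-in for a memcached client
        self.db_hits = 0

    def get(self, key: str) -> str:
        """Return the cached value, querying the database only on a miss."""
        if key in self._cache:
            return self._cache[key]
        self.db_hits += 1
        value = self._query_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes."""
        self._cache.pop(key, None)
```

Repeated reads of the same key reach the database once, until the key is invalidated.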
DB Replication
[Figure: SQL replication from a master database (mostly writes) to a replica database.]
Cache misses require slow disk I/Os, causing a reduction in replication speed.
DB (Abstract view)
• DB updates involve two steps:
– Reading the affected DB pages.
– Applying the changes.
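The two steps above, and the cost of cache misses during replication, can be simulated with a small page cache. This is a toy model under stated assumptions: the LRU policy, the class and function names, and the update format are all hypothetical, and evicted pages are simply dropped rather than written back as a real engine would do.

```python
from collections import OrderedDict


class PageCache:
    """Tiny LRU cache of DB pages; a miss stands for a slow disk read."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.pages = OrderedDict()  # page_id -> dict of row values
        self.misses = 0

    def load(self, page_id: int) -> dict:
        """Return the page, counting a miss if it was not cached."""
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        self.misses += 1  # would be a disk I/O on a real replica
        page = {}
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used
        return page


def apply_updates(cache: PageCache, updates: list) -> int:
    """Apply (page_id, key, value) updates; return the total miss count."""
    for page_id, key, value in updates:
        page = cache.load(page_id)  # step 1: read the affected page
        page[key] = value           # step 2: apply the change
    return cache.misses
```

For example, five updates that all touch one page cost a single miss, while updates spread over more pages than the cache holds miss almost every time, which is the slowdown the replication slide describes.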