

Copyright, Inktank Storage Inc. 2013.

Ceph, the complete choice for cloud storage, is a massively scalable, open source, distributed storage system. It is built to scale to the exabyte level and beyond while running on readily available commodity hardware. Ceph client support is included in the Linux kernel, and Ceph has been integrated with the leading open source cloud management platforms.

Key to Ceph's design is the autonomous, self-healing, and intelligent Object Storage Daemon (OSD). Storage clients and OSDs both use the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to compute the location of data containers on demand, instead of depending on a central lookup table. Compared to older approaches, CRUSH removes that central bottleneck and enables massive scale by distributing the work cleanly across all the clients and OSDs in the cluster.

CRUSH uses intelligent data replication to ensure resiliency, which is better suited to hyper-scale storage. Imagine an entire cluster built from commodity hardware, with no redundant array of inexpensive disks (RAID) cards, little human intervention, and faster recovery times: this is a reality with Ceph replication. Unlike traditional RAID, Ceph stripes data across the entire cluster rather than within a single RAID set, and keeps a mix of old and new data on each disk so that a replaced disk does not become a traffic hot spot. Let's take a deeper look at how Ceph replication, compared with RAID, will save you money, save you time, increase flexibility, and lower the risk of losing data.
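To make the idea of computing placement on demand more concrete, here is a minimal Python sketch. It is not CRUSH: it uses rendezvous (highest-random-weight) hashing as a much simpler stand-in, and it ignores device weights and failure domains. The function names, object name, and OSD names are hypothetical and chosen only for illustration.

```python
import hashlib


def _score(object_name: str, osd_id: str) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, osds: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct OSDs for an object using only a hash.

    Any client or OSD that knows the same OSD list computes the same
    answer, so no central lookup table is consulted.
    """
    ranked = sorted(
        osds,
        key=lambda osd: (_score(object_name, osd), osd),  # osd id breaks ties
        reverse=True,
    )
    return ranked[:replicas]


# Hypothetical 12-OSD cluster.
osds = [f"osd.{i}" for i in range(12)]
placement = place("volume-42/chunk-0007", osds)

# A second party holding the same list (in any order) agrees on placement.
same_placement = place("volume-42/chunk-0007", list(reversed(osds)))
```

Because the result depends only on the object name and the list of OSDs, removing an OSD that does not hold a copy leaves that object's placement unchanged, which is the property that keeps data movement small when the cluster changes.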


The challenges of RAID fall into four areas: capacity and speed, scalability, reliability and availability, and expense.

As technology has advanced, disks have grown, which has kept the cost of storing data reasonable because you do not need to buy more spindles or many more controllers or heads. Instead, drives position more tracks on each platter, spin faster, and encode each bit in a smaller stretch of the magnetic surface. Speed and economic gains have come from greater density on disks. The non-recoverable read error (NRE) rate, however, is specified per bit read rather than per drive, so as drives grow larger, NREs become common. Furthermore, many RAID controllers fail the recovery after an NRE, losing the whole set. Access speed has also not kept up with the increased density of the drives. With technologies like RAID-6, rebuilding a 4TB drive may take many days. During that time, users are exposed to additional disk failures, undetected bit errors, and extended periods of degraded performance.

What happens if you want to expand using RAID? When building a larger system to add capacity, you want to use the latest disks, because they are larger and cost less per GB. This is where we run into problems. Most RAID schemes require disks with the same geometry, so failed disks must be replaced with identical units. Proprietary appliances may also require you to order replacements from the manufacturer, often at a much higher price than commodity drives. Your storage system may also reach a limit beyond which it cannot be expanded, requiring a major overhaul of the entire system. Once the overhaul is complete, you need to redistribute the data in a way that balances both capacity and traffic. If you do not balance the traffic, you are not making good use of your spindles.

Reliability and availability are also concerns with RAID. Whether you are using RAID-5 or RAID-6, the odds of an NRE during recovery are significant, and client data access will be starved during recovery. Even the most advanced RAID system cannot protect you against server failures, NIC failures, switch failures, operating system crashes, or facility and regional disasters.

The cost of RAID includes both capital and operating expenses. The capital costs come from the mark-up on enterprise hardware and on the high performance RAID controllers you need to keep the storage system efficient. Operating costs exist because RAID doesn't manage itself. Administrators must create storage, tune it for appliances, and redistribute applications, which is time consuming and makes it difficult to expand and migrate your data. RAID recovery usually does a great job, but when it fails it fails badly, and fixing the problem can cost a great deal of money and time. Finally, once a drive fails, do not put off replacing it; delay only creates more problems down the road.
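The rebuild-time and NRE claims above can be sanity-checked with some back-of-the-envelope arithmetic. The Python sketch below is illustrative only: the error rate of one unrecoverable error per 1e14 bits read, the 20 MB/s rebuild rate under client load, and the assumption that seven surviving 4TB drives are read in full are all assumed inputs, not figures from this paper. It estimates the chance of hitting at least one NRE during a rebuild, not the chance of losing the array (RAID-6 can tolerate some such errors).

```python
import math

TB = 1e12  # bytes, decimal terabyte


def p_at_least_one_nre(bytes_read: float, errors_per_bit: float) -> float:
    """Probability of one or more unrecoverable read errors in `bytes_read`.

    Computes 1 - (1 - p)**bits using log1p/expm1 to stay numerically stable
    for very small p and very large bit counts.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-errors_per_bit))


def rebuild_hours(drive_bytes: float, bytes_per_second: float) -> float:
    """Time to rewrite one whole drive at a sustained rebuild rate."""
    return drive_bytes / bytes_per_second / 3600


# Assumed scenario: 4 TB drives, seven survivors read end to end.
drive = 4 * TB
surviving_drives_read = 7            # assumption for illustration
assumed_error_rate = 1e-14           # assumption: errors per bit read

nre_probability = p_at_least_one_nre(surviving_drives_read * drive,
                                     assumed_error_rate)
hours = rebuild_hours(drive, 20e6)   # assumption: 20 MB/s while serving clients

print(f"P(at least one NRE during rebuild) ~ {nre_probability:.2f}")
print(f"Rebuild time ~ {hours:.0f} hours ({hours / 24:.1f} days)")
```

Under these assumed inputs the probability of at least one NRE is high and the rebuild takes more than two days; different drives and workloads will shift the numbers, but the trend with growing drive size is the point.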

Commodity hardware is what makes today's cloud infrastructures possible. The model is built around using many inexpensive building blocks and assuming that every one of those blocks will eventually fail. When it comes to disk drives, a certain percentage of failures is a given, even for the highest quality disks.

Ceph protects against disk failures in two ways. The first is replicating data in multiple locations and fault domains, known as de-clustered placement. This allows less expensive disk controllers and avoids the problems common with RAID and today's large disks. Key benefits of de-clustered placement are:

- Recovery is parallel and 200x faster
- Service can continue during the recovery process
- Exposure to second failures is reduced by 200x
- Zone-aware placement protects against higher-level failures
- Recovery is automatic and does not await new drives
- No idle hot spares are required
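One way to read the 200x figures above is as a consequence of parallelism: a RAID rebuild is limited by the single replacement drive, while de-clustered recovery spreads the same data over many disks. The Python sketch below is an idealized model under stated assumptions (a 4TB drive, 50 MB/s sustained per disk, 200 participating disks, no network or throttling limits); none of these inputs come from this paper, and the function name is hypothetical.

```python
def recovery_hours(data_bytes: float,
                   per_disk_bytes_per_second: float,
                   participating_disks: int) -> float:
    """Idealized recovery time when the copy work is split evenly
    across `participating_disks` disks working in parallel."""
    aggregate_rate = per_disk_bytes_per_second * participating_disks
    return data_bytes / aggregate_rate / 3600


failed_data = 4e12       # assumption: 4 TB of data to re-protect
per_disk_rate = 50e6     # assumption: 50 MB/s sustained per disk

single_target = recovery_hours(failed_data, per_disk_rate, 1)    # RAID-style
declustered = recovery_hours(failed_data, per_disk_rate, 200)    # many-to-many

speedup = single_target / declustered
print(f"single target: {single_target:.1f} h, "
      f"de-clustered: {declustered * 60:.1f} min, speedup ~{speedup:.0f}x")
```

In this model the window during which a second failure could hit degraded data shrinks by the same factor as the recovery time, which is why faster recovery and reduced exposure appear together in the list.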

Second, the data on any failed disk is replicated across many OSDs. Those OSDs learn about their peer's failure via a new map of the cluster, then use CRUSH to find new locations for copies of the data. This automatically maintains an optimal level of resiliency in the cluster. Unlike RAID, potentially hundreds of source OSDs will be involved in copying data to hundreds of destination OSDs. The parallel copies complete quickly, reducing exposure to multiple failures and degraded performance. There is no need to synchronize stripes of data across many disks or to calculate parity, so the individual disk operations are fast and simple. Raw disks are cheap, and using dense, inexpensive disks and drive controllers partially offsets the cost of the additional capacity that replication requires.

RAID systems can be opaque about their internal workings, as vendors consider this their secret sauce. Locked into proprietary solutions, users have to trust that in the event of failure everything works as advertised. With Ceph, all the details are out in the open. The CRUSH algorithm has been extensively documented in numerous scientific papers, and the implementation is available as open source software.

Last but not least, cost is a key advantage that Ceph has over RAID. Some of the key cost benefits include:

- Can leverage commodity hardware for lowest costs
- Not locked in to a single vendor; get the best deal over time
- RAID not required, leading to lower component costs

The table below puts into perspective how cost effective Ceph replication is compared to enterprise RAID, measured in dollars per GB.

                    Raw $/GB   Protected $/GB       Usable (90%)   Replicated              Relative Expense
ENTERPRISE RAID     $3         $4 (RAID-6 6+2)      $4.44          $8.88 (Main + Backup)   533% storage cost
CEPH REPLICATION    $0.50      $1.50 (3 copies)     $1.67          $1.67 (3 copies)        Baseline (100%)
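The figures in the comparison above can be reproduced with a few lines of arithmetic. The Python sketch below is one reading of how the columns chain together (raw price, times protection overhead, divided by the 90% usable fraction, times the number of full site copies); the function and parameter names are hypothetical, and only the input prices and ratios come from the comparison itself.

```python
def cost_per_usable_gb(raw_per_gb: float,
                       protection_overhead: float,
                       usable_fraction: float,
                       site_copies: int) -> float:
    """Chain the columns: raw -> protected -> usable -> replicated."""
    protected = raw_per_gb * protection_overhead   # parity or replica overhead
    usable = protected / usable_fraction           # only 90% of space is filled
    return usable * site_copies                    # e.g. main plus backup system


# Enterprise RAID: $3/GB raw, RAID-6 6+2 stores 8 disks per 6 of data,
# and a separate backup system doubles the cost.
raid = cost_per_usable_gb(3.00, 8 / 6, 0.90, 2)

# Ceph replication: $0.50/GB raw, 3 copies, which already include redundancy.
ceph = cost_per_usable_gb(0.50, 3, 0.90, 1)

relative_percent = raid / ceph * 100

print(f"RAID ${raid:.2f}/GB, Ceph ${ceph:.2f}/GB, "
      f"RAID is {relative_percent:.0f}% of the Ceph cost")
```

Carrying full precision gives about $8.89 for RAID rather than the $8.88 shown, which appears to come from doubling the already-rounded $4.44; the 533% ratio matches either way.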


Download Ceph today, and say good-bye to expensive proprietary storage solutions. As free open source software, it's easy to get started with Ceph. Take advantage of our learning resources to start using Ceph and see for yourself how much a modern approach to reliable, autonomous, distributed storage can do for you: