
CLOUD COMPUTING: BIG DATA IS THE FUTURE OF IT

Winter 2009 | Ping Li | ping@accel.com

Cloud computing has been generating considerable hype these days. Every participant in the datacenter and IT ecosystem has been rolling out cloud initiatives and strategies: hardware vendors, ISVs, SaaS providers, and Web 2.0 companies; startups and incumbents are equally active. Cloud computing promises to transform IT infrastructure and deliver scalability, flexibility, and efficiency, as well as new services and applications that were previously unthinkable. Despite all of this activity, cloud computing remains as amorphous today as its name suggests. However, one critical trend shines through the cloud: Big Data. Indeed, it is the core driver of cloud computing and will define the future of IT.
1 Source: Approaching the Zettabyte Era. Cisco, 16 June 2008. <http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11481374_ns827_Networking_Solutions_White_Paper.html>

BIG DATA: THE PERFECT STORM


Cloud computing has been driven fundamentally by the need to process an exploding quantity of data. Data is no longer measured in gigabytes but in exabytes as we are "Approaching the Zettabyte Era."1 Moreover, data types (structured, semi-structured, and unstructured) continue to proliferate at an alarming rate as more information is digitized, from family pictures to historical documents to genome mapping to financial transactions to utility metering. The list is truly unbounded. But today, data is not only being generated by users and applications. It is increasingly being machine-generated, and such machine-generated data, growing exponentially, is leading the charge in the Big Data world. In a recent article, The Economist called this phenomenon the "Data Deluge" (http://www.economist.com/opinion/displaystory.cfm?story_id=15579717). One can argue that Web 2.0 companies have been pushing the upper bounds of large-scale data processing more than anyone. That being said, this data explosion is not sparing any vertical industry: financial services, health care, biotech, advertising, energy, telecom, etc. All are grappling with this perfect storm. Below are just a few stats:

- Two years ago, Google was already processing more than 400PB of data per month in just one application
- The New York Times is processing an 11-million-story archive dating back to 1851
- eBay processes more than 50TB/day in its data warehouse
- CERN is processing 2GB/second for its most recent particle accelerator
- Facebook crunches 15TB/day into a 2.5PB data warehouse

Without question, data represents the competitive advantage of any enterprise, and every organization is now encumbered with the task of storing, managing, analyzing, and extracting value from this exponential data growth as inexpensively as possible.

Previous computing platform transitions had technology dislocations similar to cloud computing, but along different dimensions. The shift from mainframe to client-server was fueled by disruptive innovation in computing horsepower that enabled distributed microprocessing environments. The following shift to web applications/web services during the last decade was enabled by the open networking of applications and services through the internet buildout. While cloud computing will leverage these prior waves of technology (computing and networking), it will also embrace deep innovations in storage and data management to tackle big data.

Along these lines, many of the early uses of cloud computing have been focused less on computing and more on storage. For example, a significant portion of the initial applications on AWS were primarily leveraging just S3, with the applications themselves executing behind the firewall. Popular storage applications, like Jungle Disk and SmugMug, were early AWS customers. This explosion of data has driven enterprises (and consumers, for that matter) to seek cheap, on-demand storage in unlimited quantities, which cloud storage promises to deliver. Until now, massive tape archives in the middle of nowhere (like Iron Mountain) have been the only means to achieve that cheap storage. However, enterprises today need more; they need quick-access data retrieval for multiple reasons, from compliance to business analytics. It is simply no longer sufficient to have cold data; rather, it needs to be online and resilient (and cheap, of course); hence the accelerating shift towards storing every piece of data in memory or on disk (Data Domain smartly rode this trend). The need to balance data availability/usability and cost effectiveness has prompted significant innovation in both on-premise and hosted cloud storage: cloud storage systems (Caringo, EMC Atmos, and ParaScale, to name just a few) and flash-based storage systems (Fusion IO, Nimble Storage, Pliant, etc.) are just some current examples. Furthermore, hierarchical storage management (HSM, which has always sounded great but has been implemented only rarely) will become an important element in storage workflows. Enterprises will require a seamless capability to move data across different tiers of storage (both on-premise and into the cloud) based on policy and data type to minimize retrieval costs. As cloud computing matures, true cloud applications will be (re)written to leverage hierarchical and cloud-like storage tiers to retrieve data dynamically from different storage layers.
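To make the tiering idea concrete, here is a minimal sketch in Python of the kind of policy an HSM-style workflow might apply when deciding whether an object belongs on flash, on disk, or in a cloud archive. The tier names, access-age thresholds, and compliance flag are assumptions chosen for illustration, not any vendor's actual policy engine.

```python
from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    days_since_last_access: int
    compliance_hold: bool  # e.g., data under an active retention/audit policy

def choose_tier(obj: DataObject) -> str:
    """Pick a storage tier using simple, illustrative policy rules.

    Real HSM products weigh many more signals (access frequency, data type,
    SLAs, retrieval cost); the thresholds below are assumptions for the sketch.
    """
    if obj.compliance_hold or obj.days_since_last_access <= 7:
        return "flash"          # hot or compliance-sensitive data stays close
    if obj.days_since_last_access <= 90:
        return "disk"           # warm data on cheaper online disk
    return "cloud_archive"      # cold data pushed to cheap cloud storage

if __name__ == "__main__":
    for obj in [DataObject("q3_report.pdf", 2, False),
                DataObject("2008_logs.tar", 400, False),
                DataObject("audit_trail.db", 200, True)]:
        print(obj.name, "->", choose_tier(obj))
```

The design point is simply that placement becomes a policy decision evaluated per object, rather than a one-time choice of storage system.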

A NEW CLOUD STACK


In order for cloud computing to become a mainstream approach, a new cloud stack (like the mainframe and OSI stacks before it) will likely emerge. Just as with prior computing platform transitions (client/server, web services, etc.), core platform capabilities such as security, access control, application management, virtualization, systems management, provisioning, and availability will be prerequisites before IT organizations are able to adopt the cloud completely. Clearly, this stack will take a different form than prior platform layers in order to embrace a cloud environment. Simply replicating the current computing stack but allowing it to reside off-premise will not achieve the scale, capabilities, and economies of cloud computing. In particular, this new cloud framework needs the ability to process data at ever greater orders of magnitude and do it at a fraction of the cost by leveraging commodity, multi-threaded servers for storage and computing.

In many ways, this cloud stack has already been implemented, albeit in primitive form, at large-scale internet datacenters. The challenge of processing terabytes of data daily at Google, Facebook, and Amazon drove them to adopt a new data architecture, one that is essentially Martian to traditional enterprise datacenter architects. No longer are ACID-compliant relational databases back-ending transactional applications. Internet datacenters quickly encountered the scaling limitations of SQL databases as the volume of data exploded. Instead, high-performance, scalable, distributed non-SQL data stores are being developed internally and implemented at scale. BigTable and Cassandra are among the many variants, and this non-database database trend has proliferated to the point of having its own conference: NoSQL. Database caching layers (e.g., Northscale's memcached) are also being implemented to further drive application performance, and caching is now accepted as a standard tier in datacenters. Managing non-transactional data has become even more daunting. From log files to clickstream data to web indexing, internet datacenters are collecting massive volumes of data that need to be processed cheaply in order to drive monetization value. Hadoop is an open source data management framework that has become widely deployed for massively parallel computation and distributed file systems in a cloud environment. Hadoop has allowed the largest web properties (Yahoo!, LinkedIn, Facebook, etc.) to store and analyze any data in near real-time at a fraction of the cost that traditional data management and data warehouse approaches could even contemplate. Although the framework has roots in internet datacenters, Hadoop is quickly penetrating broader enterprise use cases. The diverse set of participants at Hadoop World NYC, hosted by Cloudera, clearly points to this trend.
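For readers unfamiliar with the divide-and-aggregate model Hadoop popularized, the following is a tiny, in-process sketch of the map/shuffle/reduce pattern over clickstream-like log lines. It is plain Python rather than the actual Hadoop API, and the log format and field positions are invented for the example; the point is only to show how such workloads decompose into independent map and reduce steps that a framework can spread across commodity servers.

```python
from collections import defaultdict
from itertools import chain

# Toy "log lines" standing in for the clickstream data the article mentions.
LOG_LINES = [
    "2009-11-02 GET /home user=17",
    "2009-11-02 GET /search user=42",
    "2009-11-03 GET /home user=17",
]

def map_phase(line):
    """Map: emit a (url, 1) pair for each request in a log line."""
    url = line.split()[2]
    yield (url, 1)

def reduce_phase(key, values):
    """Reduce: sum the counts emitted for one url."""
    return (key, sum(values))

def run_job(lines):
    # Shuffle: group mapped values by key, as the framework would do across
    # nodes between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_phase(l) for l in lines):
        groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in groups.items())

if __name__ == "__main__":
    print(run_job(LOG_LINES))   # {'/home': 2, '/search': 1}
```

Because each map call touches only one line and each reduce call only one key, the same logic can run on one laptop or thousands of commodity nodes without change, which is exactly the economic argument the section makes.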


WHAT'S BREWING IN A CLOUD?


Despite constant comparisons to grid and utility computing, cloud computing has the potential to address a much broader set of applications and use cases beyond the limited HPC environments traditionally served by grid computing. This breadth is engendered by a new set of underlying technology forces. Virtualization technologies, high-powered commodity servers, low-cost/high-bandwidth connectivity, concurrent/multi-threaded programming models, and open source software stacks are all technology building blocks that can deliver the high performance and scalability of grid/utility computing, but, importantly, do so on underlying commodity resources. These technology drivers enable applications and users to be abstracted cleanly from particular IT infrastructure resources (computing, storage, networking, etc.) in new and powerful ways; location agnosticism and multi-tenancy are two critical elements among others. Unlike traditional HPC grid environments, which were designed for a specific application in a single company, cloud computing enables disparate applications and entities to harness a shared pool of resources. In addition, applications can be broken up in the cloud, where computing resources may reside on the client while the data is accessed portably from multiple cloud locations (as an example).

Many different definitions of cloud computing have surfaced. Rather than posit yet another, several characteristics are resident in any cloud instance: (i) self-provisioning (either by user, developer, or IT); (ii) elasticity (on-demand allocation of any computing, storage, and networking resources); (iii) multi-anything (multi-user, multi-application, multi-session, etc.); and (iv) portability (applications are abstracted from physical infrastructure and can be migrated easily). These capabilities allow enterprises to shift IT resources from capex to opex, a usage-based model that is particularly appealing under recent economic constraints. These cloud prerequisites will yield a powerful set of use cases beyond grid computing that are unique to cloud platforms. Cloud computing will reach its full potential when a whole new set of applications, never possible before, is created that is purpose-built for the cloud. For example, one can envision powerful collaboration applications emerging that enable internal enterprise and external users to cooperate seamlessly in ways that would have been previously impossible with users and data isolated on disparate enterprise islands. It is likely these innovative applications will require new programming models, and potentially new languages, yet to be hardened.
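As a rough illustration of the self-provisioning and elasticity characteristics above, the sketch below scales a pool of instances up and down from observed load. The CloudClient class, its methods, and the capacity-per-instance figure are hypothetical stand-ins, not any real provider's API.

```python
# Minimal sketch of "elasticity": size a pool of instances to current load.
# CloudClient is an invented stand-in; a real client would make authenticated
# API calls to a provider rather than mutate an in-memory list.

class CloudClient:
    def __init__(self):
        self.instances = []

    def launch_instance(self):
        self.instances.append(f"vm-{len(self.instances) + 1}")

    def terminate_instance(self):
        if self.instances:
            self.instances.pop()

def scale_to_load(cloud, requests_per_sec, capacity_per_instance=100):
    """Self-provision just enough instances for the current load."""
    needed = max(1, -(-requests_per_sec // capacity_per_instance))  # ceiling division
    while len(cloud.instances) < needed:
        cloud.launch_instance()
    while len(cloud.instances) > needed:
        cloud.terminate_instance()
    return len(cloud.instances)

if __name__ == "__main__":
    cloud = CloudClient()
    for load in [250, 900, 120]:   # bursty traffic
        print(load, "req/s ->", scale_to_load(cloud, load), "instances")
```

The capex-to-opex point follows directly: resources are acquired and released per unit of load rather than purchased up front for the peak.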

SECURING THE CLOUD


Given this data-intensive nature, any widely adopted cloud computing platform will inevitably have to account for richer security requirements. The security challenges will be focused less on point network- and data-level security, although high-bandwidth encryption solutions and sophisticated key management will be needed to match the massively parallel computational cloud environments. Instead, the primary security challenges will stem from control. User authentication will become increasingly challenging as applications are federated outside the firewall because of SaaS adoption. In addition, managing and reconciling user identities across individual user directories for each SaaS/cloud application will present further security issues. Much as web applications in the '90s created an SSO layer, cloud computing is essentially abstracting a web services interface for infrastructure IT, and it will demand a similar unified authentication/entitlement layer.

In addition to federated user authentication, cloud computing will also require data authentication and security. Imperva's database firewall is an example of an increasingly important cloud security product. As applications reside in different public and private clouds, it will be critical for those cloud applications to be able to talk to each other. This will drive the need to ensure data authentication and policy control for the volumes of data flowing between cloud applications. Moreover, given the multi-tenancy paradigm of cloud environments, policy granularity will be paramount to ensure security and compliance. Data integration across cloud platforms will be more of an obstacle than application integration, as applications have become more open and standard. Standard data APIs will emerge as part of the new cloud stack to allow disparate environments to talk to each other and avoid vendor lock-in; data migration challenges are perhaps the greatest factor today locking users to a particular cloud platform. Over time, these APIs and layers will harden and become tailored to the use case and workload of particular applications. The adoption of these new frameworks will ultimately make cloud computing safe and broaden its penetration into enterprises of all sizes.
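To illustrate the kind of policy granularity argued for here, the sketch below enforces a default-deny, per-tenant, per-dataset access check of the sort a multi-tenant cloud might apply to data flowing between applications. The policy table, tenant names, and application names are hypothetical; a production system would back this with a directory service and signed tokens rather than an in-memory table.

```python
# Per-tenant, per-dataset authorization sketch with a default-deny posture.
# All names below are invented for illustration.

POLICIES = {
    # (tenant, dataset): set of applications allowed to read it
    ("acme",   "customer_records"): {"crm_app"},
    ("acme",   "click_logs"):       {"analytics_app", "crm_app"},
    ("globex", "click_logs"):       {"analytics_app"},
}

def authorize(tenant: str, dataset: str, app: str) -> bool:
    """Allow access only if an explicit grant exists (default deny)."""
    return app in POLICIES.get((tenant, dataset), set())

if __name__ == "__main__":
    print(authorize("acme", "customer_records", "crm_app"))        # True
    print(authorize("acme", "customer_records", "analytics_app"))  # False
    print(authorize("globex", "customer_records", "crm_app"))      # False: no grant exists
```

The granularity lives in the key: scoping grants to (tenant, dataset, application) rather than to a whole platform is what keeps co-tenants' data separated.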



STILL IN THE EARLY DAYS


Despite the high energy surrounding cloud computing and early cloud offering successes, such as Amazon Web Services, cloud computing for enterprise services is definitely still in its formative stages. In contrast, consumers have already adopted cloud computing technologies. One could argue that web companies like Google, Yahoo!, Facebook, and Salesforce are examples of consumers leveraging cloud computing. These Web 2.0/SaaS offerings clearly exhibit the core cloud characteristics outlined above and, in turn, are delivering new, value-added services previously considered unthinkable. Interestingly, this time the consumers, via their use of Web 2.0 services, have been teaching the typically early-adopting enterprises the effectiveness of cloud computing. Today, enterprise use of cloud computing occupies opposite ends of the spectrum: (i) Web 2.0 start-ups seeking to launch applications quickly and cheaply, and (ii) compute-intensive enterprises that need batch processing for bursty, large-scale applications. Although these users are driving the early adoption of cloud technology, it is unlikely these limited use cases will establish cloud computing as a pervasive platform. Cloud computing will instead need to penetrate mainstream IT infrastructure slowly and offer a broader set of enterprise applications.

It is important to note here that these Web 2.0 start-ups represent a powerful trend in the role of developers in driving cloud computing adoption. Many early users of cloud computing are examples of developers launching applications without requiring the involvement of IT (in the case of a Web 2.0 startup, there is no IT department to involve). Increasingly, empowering developers and line-of-business owners to innovate and deploy new applications without the shackles of IT will be a motivating driver for cloud adoption. No longer do users need IT's blessing, and IT's timetable, to get their job done. This developer-centric nature was a primary motivator of VMware's strategic acquisition of SpringSource. In addition to inheriting significant Java technology, VMware now has a distinct opportunity to transition SpringSource's dominant Java developer mindshare onto VMware's private cloud platform. Amazon Web Services has experienced tremendous success from its developer-centric platform APIs. Unlike traditional hosting providers that cater to IT/operations, Amazon went after developers first and has only recently begun to add the functionality that will appeal to broader enterprise IT. Within enterprises, there are early signs of developers (QA environments, batch processing, and developer prototyping) and line-of-business/departmental users leveraging cloud computing. It is not uncommon for new platform technologies to start at the fringes of IT before mainstream adoption takes place.

Unlike typical three-tier traditional enterprise datacenters, the internet datacenters of Facebook, Google, etc. were not encumbered by legacy enterprise stacks, applications, and IT rules, which in turn enabled them to be built from the ground up with cloud stacks to elastically handle large-scale consumer transactions for multiple applications. Therefore, and unsurprisingly, Amazon's internet datacenters were easily adapted to make it the first and leading public cloud provider. It will certainly take significant time and effort for enterprise IT infrastructure gatekeepers to evolve their current architectures to embrace a new cloud platform. Luckily, enterprises can reap the technology innovation coming from internet datacenters (much of which is open source) to accelerate this transition.

MORE THAN ONE FLAVOR


There have been analogies drawn between cloud computing and public utilities (electric, gas, etc.), where the value is all about economies of scale. According to this hypothesis, the world will end up with only a few cloud providers that reach maximum efficient scale. It is quite unlikely that this will happen. Multiple cloud models will emerge depending on the user, the workload, and the application. For example, certain developers will prefer to interface with a cloud provider at a higher level of abstraction, such as Google App Engine, as opposed to a more bare-metal API, such as Rackspace. Alternatively, an application may choose to run on MSFT Azure to leverage SQL/MSFT services, or on Salesforce's Force.com for CRM integration and distribution advantages. Today, one can break cloud platforms into roughly two camps: developer-centric (Amazon, MSFT) and IT-centric (EMC, VMware).

Cloud platforms will remain distinct and diverse as long as they continue to deliver unique value-add for their particular use cases and users. To drive this cloud diversity point further, the concept of a cloud within a cloud is also emerging, where distinct services, such as data warehousing, can be built atop a more generic cloud platform to provide a higher-layer cloud service. In addition, private clouds behind the firewall present yet another flavor of cloud computing, as enterprises leverage the benefits of cloud frameworks while maintaining the security, control, and compliance of their internal datacenters. Lastly, hybrid clouds that bridge private and public clouds on a permanent or temporary basis (the latter also known as cloud bursting) will come to fruition for certain applications or as a migration path for enterprises. Several start-ups (Cirtas, CloudSwitch, and Zetta among them) are building products that make the cloud safe for enterprises. Innovation will abound to solve the specific issues in all of these various cloud environments.
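As a rough sketch of the cloud-bursting idea, the example below places jobs on a private cloud until its capacity is exhausted and only then overflows to a public cloud. The capacity figure, job names, and placement logic are assumptions for illustration, not any vendor's scheduler.

```python
# Illustrative cloud-bursting placement: prefer the private cloud, overflow
# ("burst") to a public cloud only when private capacity is exhausted.
# Capacities and job sizes are made up for the example.

def place_jobs(jobs, private_capacity):
    """Return a list of (job, location) placements."""
    placements = []
    used = 0
    for name, cores in jobs:
        if used + cores <= private_capacity:
            used += cores
            placements.append((name, "private_cloud"))
        else:
            placements.append((name, "public_cloud"))  # burst for the overflow
    return placements

if __name__ == "__main__":
    jobs = [("nightly_etl", 16), ("risk_model", 32), ("quarter_close", 64)]
    for job, where in place_jobs(jobs, private_capacity=48):
        print(f"{job:14s} -> {where}")
```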


LOOKING AHEAD
To further parse all this, I hosted a cloud computing panel with an esteemed group of technology thought leaders at Accel's 15th Stanford Technology Symposium. Needless to say, these panelists had plenty of deep insights, opinions, and predictions about cloud computing. The panel brought together technologists who view cloud computing through distinctly different lenses: private cloud innovators, public cloud providers, cloud-enabling technology solutions, and cloud infrastructure applications. In wrapping up the panel session, I asked each speaker to conjure up a single prediction for cloud computing in the next few years. Here's what the experts said:

Jonathan Bryce, CTO/Founder, Mosso (Rackspace): "I think cloud computing is going to be a mind shift; it's going to take a while. But I think an economy like this is actually a huge opportunity for entrepreneurs... I think this is a time when resources are scarce; that's when great businesses end up getting built. And I think part of what's going to enable some of those businesses is cloud computing, and being able to get started with a lower barrier to entry, lower price point, all of those kinds of things."

Mike Olson, CEO/Co-founder, Cloudera: "I think that a lot of what's been said around here about data is really right on. I predict that in the next 10 years, computer science as computer science isn't really going to be the place that smart young guys are going to find tremendously rewarding careers. I think that the application of these new compute systems to large data in the sciences will advance humankind substantially. I think that science will be done maybe not even in the lab on the wet bench anymore, but with data, with computer systems looking at vast amounts of data."

Raghu Ramakrishnan, Chief Scientist for Audience and Research Fellow, Yahoo! Research: "So a lot of the companies that are out there today (Yahoo!, Facebook, Google) are all exposing data APIs. Imagine what's going to happen once large clouds are routinely available to build their own applications and you start aggregating your own data, and you have the opportunity to fuse that with all the data that's out there. Someone's going to figure out the next big thing, by taking 2 + 2 and coming up with 20."

Mike Schroepfer, VP Engineering, Facebook: "One of the things that is going to happen is that people are going to figure out that we need a more blended workload between the cloud and the client. We've been operating kind of in the cycle of reincarnation in computer science, moving toward most of the computing happening in the cloud, and my browser effectively being its own terminal. You know, in the last two or three years, the speed and capability of browsers has been outpacing that of most chips. You're seeing 2x to 4x improvements in core performance on the engines and VMs in those browsers year on year, which is way outpacing the speed of chip design... So I believe that there will be a couple of people who will figure out ways to blend computation and storage on the client more gracefully with that on the server, but still provide you with all of the benefits of basically access to my data anywhere I need it, and the kind of reliability of the cloud."

Jayshree Ullal, President and CEO, Arista Networks: "Well, there's a technology impact, but I actually think it's going to really make CIOs rethink their jobs. Today, you can have a server administrator, an application administrator, a network administrator, and they're all silos, but you need your general practitioner. And that's really missing right now in the cloud. So if I had to make a prediction, less on the technology, more on the operational side, I would say for the deployment of this, it's got to be a generalized IT person, whether that's the CIO or somebody he or she appoints..."

Rich Wolski, Professor of Computer Science, University of California, Santa Barbara, and CTO/Founder, Eucalyptus Systems: "There's another revolution coming that's going to intersect the cloud revolution, and that has to do with data simulation... pretty much everything you own is going to be trying to send you data. And you're going to need, personally, a great deal of storage and compute capacity to be able to deal with that. I think the cloud is going to make that revolution that much quicker to come to us."

These predictions depict cloud computing as still being in its formative phases, but they suggest it will drive fundamental breakthroughs in datacenter and IT infrastructure in the years to come. Despite the current macro headwinds, deep innovation and market opportunities in cloud computing will persist. Once this economic storm passes, I'm convinced the sun will shine through, and cloud computing is sure to have many silver linings.

Ping Li is a partner at Accel Partners in Palo Alto and focuses primarily on information technology infrastructure and digital media platforms.
