Virtual DR Solutions
Agenda
Importance of Business Continuity and Current Challenges
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
Preventing downtime
Protecting data and systems
Rapid Disaster Recovery
Agenda
Importance of Business Continuity
Business Continuity Definition and Current Challenges
BC Importance and Focus
Virtualization as a BC enabler
Better Business Continuity with VMware Infrastructure
Traditional Challenges
1 out of 500 data centers will have a major outage each year (Disaster Recovery Journal)
43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen)
Circuit breakers wipe out the Web
PG&E's faulty equipment reveals the Internet's vulnerability to a disruption of its power source
Verne Kopytoff, Chronicle Staff Writer
Thursday, July 26, 2007
All it took to wipe out some of the Internet's biggest sites Tuesday was some faulty PG&E electrical breakers that caused a blackout in downtown San Francisco. Some of the Web's hottest destinations - Craigslist, Yelp, Second Life - were suddenly inaccessible from San Mateo to Singapore after back-up generators failed at the facility housing their computer equipment.
Although mostly fixed within 12 hours, the incident shows how easy it is to send major swaths of the online world to the dark ages. Sites that millions of people rely on can be knocked offline by freak accidents, not to mention major catastrophes, and this event served as a wake-up call to the executives that operate them.
"If the data center was that vulnerable to a power outage, what if something really catastrophic happened like an earthquake?" asked Derek Gordon, marketing vice president for Technorati, a search engine of blogs that was brought down for a couple of hours Tuesday after the blackout. "What does that say about the vulnerability of the Internet in the Bay Area?"
The troubles started when 365 Main, a key data center in downtown San Francisco that touts its "state-of-the-art electrical system," failed to get its backup generators started immediately after the power outage hit around 1:45 p.m. A number of companies that house their computer servers in the facility were suddenly offline, setting off a mad scramble to get the Web sites up and running.
Shoppers at RedEnvelope, an online retailer, couldn't buy monogrammed pillow soaps. Hipsters on Yelp, the review site, had to take a break from sharing their reports of fabulous and not-so-fabulous restaurants. Users of online classified service Craigslist were out of luck in finding a second-hand futon.
The backup generators were turned on 45 minutes after the blackout started, a delay that 365 Main said it was still investigating yesterday. But it took some of the facility's customers anywhere from another hour to 11 hours to get their servers safely rebooted and their Web sites operational.
What the episode exposed is that some companies operate entirely from one data center, a decision described by some security experts as risky. In emergencies, such companies can't shift traffic to an alternative facility where they keep additional servers.
"There's all kinds of things that can happen from a power outage to a tornado to a backhoe," said Jason Needham, director of product management at F5 Networks, a Seattle company that sells software and equipment for data centers. "All these things seem far-fetched until they happen."
However, Needham said the trend is for companies to put all their eggs in one basket, so to speak, in an effort to save money. In fact, just hours before Tuesday's power outage, 365 Main put out a press release trumpeting the fact that RedEnvelope had moved all its operations to its facility and closed an unneeded center in the Midwest.
Data centers are usually designed with redundant equipment to ensure power during outages, earthquakes and floods. Backup electricity is supposed to kick in within seconds after an outage through a complex system that keeps servers humming without interruption.
Gordon, from Technorati, called opening several data centers ruinously expensive for thinly funded Internet startups, of which there are hundreds in the Bay Area. Only profitable companies can afford such an extravagance, he said, though he acknowledged that Technorati, which isn't profitable, is in the process of moving into a second facility.
Tuesday's outage "added to the sense of urgency," Gordon said. "The lesson here is despite all of your planning and all of your promises, you are vulnerable."
This article appeared on page C-1 of the San Francisco Chronicle
Complexity
Management and provisioning
Configure hardware
Install OS
Configure OS
Install backup/restore agent
Start "single-step" automatic recovery
Reliability
Complex solutions are hard to test
Requires specialized training for personnel
Business Continuity: The Killer App for Virtualization!
Press: Best Disaster Recovery Product of 2006 (TechTarget)
Customers: 55% of customers using virtualization for BC/DR (N=2265)
Before Virtualization:
Software tied to hardware
Single OS image per machine
One application workload per OS
After Virtualization:
Hardware-independence of OS and applications
Virtual machines can be provisioned to any system
Manage OS and application as a single unit
Eliminate need for 1:1 hardware duplication for BC
Eliminate risk of hardware configuration drift
Encapsulation
Apps System Data
System portability
Simplify provisioning for recovery
Simplify backup and replication
Simplify copying and cloning of systems
Physical Server
= File
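Because an encapsulated virtual machine is stored as files, copying or cloning a system reduces to copying a folder. The following is a minimal sketch of that idea, assuming a powered-off VM kept in its own directory; the folder layout, the placeholder file contents, and the clone_vm helper are hypothetical illustrations and not VMware tooling.

```python
import shutil
import tempfile
from pathlib import Path


def clone_vm(vm_dir: Path, target_root: Path) -> Path:
    """Copy every file of a powered-off VM folder under target_root (hypothetical helper)."""
    target = target_root / vm_dir.name
    shutil.copytree(vm_dir, target)  # creates target_root if it does not exist yet
    return target


def demo() -> list:
    """Build a fake VM folder, clone it to a 'recovery' location, and list the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        vm = root / "primary" / "web01"
        vm.mkdir(parents=True)
        # Placeholder files standing in for a VM's configuration and virtual disk.
        (vm / "web01.vmx").write_text("config placeholder")
        (vm / "web01.vmdk").write_text("disk placeholder")
        clone = clone_vm(vm, root / "recovery")
        return sorted(p.name for p in clone.iterdir())


if __name__ == "__main__":
    print(demo())
```

The same copy step works regardless of which hardware the files land on, which is the portability point made above.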
DR Test
App OS
Easier testing of a BC-DR plan
Stability and security
Utilize DR hardware for other tasks
App OS
VMware Infrastructure
Elements of preventing downtime
Eliminate planned downtime
Reduce unplanned downtime with better fault tolerance
Per studies from IBM and Sun, planned downtime is responsible for 80-90% of total system downtime
[Chart: planned downtime as a share of total downtime, 80% and 90%, per the IBM and Sun estimates]
Eliminating planned downtime can increase system availability by a full order of magnitude
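The arithmetic behind the "order of magnitude" claim can be checked with a short calculation. This is a sketch under the assumption that all planned downtime is eliminated and unplanned downtime stays unchanged; the function name is illustrative.

```python
def downtime_reduction_factor(planned_share: float) -> float:
    """Factor by which total downtime shrinks if all planned downtime is removed.

    planned_share is the fraction of total downtime that is planned (0 <= share < 1).
    """
    if not 0 <= planned_share < 1:
        raise ValueError("planned_share must be in [0, 1)")
    return 1 / (1 - planned_share)


if __name__ == "__main__":
    for share in (0.80, 0.90):
        factor = downtime_reduction_factor(share)
        print(f"planned share {share:.0%}: downtime shrinks about {factor:.1f}x")
```

Under these assumptions the 90% estimate gives roughly a tenfold reduction in downtime, while the 80% estimate gives roughly fivefold, so the full order of magnitude corresponds to the upper end of the quoted range.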
Resource Pool
Copyright 2005 VMware, Inc. All rights reserved.
No need for dedicated stand-by hardware
None of the cost and complexity of clustering
Cluster physical machines with virtual machines
Cluster virtual machines with virtual machines
Cluster applications using fewer physical servers
Test cluster configurations on a single physical server
Lower cost:
Provides, at a lower cost, fault-tolerance equivalent to that possible with physical systems
With VMware Infrastructure
Prevent resource bottlenecks with DRS
Automated load balancing across a pool of servers
Ability to dynamically add resources to server pool
1. Overloaded host: automatic workload balancing
2. Dynamically add resources: DRS rebalances load
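The two scenarios above can be illustrated with a toy greedy rebalancer. This is only a sketch of the general idea of threshold-based load balancing and is not the algorithm DRS actually uses; the host names, capacities, loads, and the 80% threshold are all assumptions, and VM loads are assumed to be positive.

```python
def rebalance(hosts: dict, threshold: float = 0.8) -> list:
    """Greedy toy rebalancer (illustrative only).

    hosts maps host name -> {"capacity": number, "vms": {vm_name: load}}.
    While the busiest host is above the threshold, move its smallest VM
    to the least-utilized host. Returns the list of (vm, source, target) moves.
    """
    def util(name):
        host = hosts[name]
        return sum(host["vms"].values()) / host["capacity"]

    moves = []
    while True:
        busiest = max(hosts, key=util)
        idlest = min(hosts, key=util)
        if util(busiest) <= threshold or not hosts[busiest]["vms"]:
            break
        vm = min(hosts[busiest]["vms"], key=hosts[busiest]["vms"].get)
        load = hosts[busiest]["vms"][vm]
        target_after = (sum(hosts[idlest]["vms"].values()) + load) / hosts[idlest]["capacity"]
        # Stop if the move would only shift the overload to the target host.
        if target_after >= util(busiest):
            break
        del hosts[busiest]["vms"][vm]
        hosts[idlest]["vms"][vm] = load
        moves.append((vm, busiest, idlest))
    return moves


if __name__ == "__main__":
    pool = {
        "host1": {"capacity": 10, "vms": {"a": 4, "b": 3, "c": 2}},
        "host2": {"capacity": 10, "vms": {"d": 2}},
    }
    # Scenario 1: an overloaded host is relieved automatically.
    print(rebalance(pool))
    # Scenario 2: a new host joins the pool and load is spread onto it.
    pool["host3"] = {"capacity": 10, "vms": {}}
    print(rebalance(pool, threshold=0.5))
```

Lowering the threshold in the second call is simply a way to make the toy model use the newly added host; a real scheduler would weigh many more factors.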
Keys to protecting data and systems
Minimize complexity
Minimize impact on services
Ensure comprehensive protection
Impact
Systems are data
Protect systems using the same tools and processes used to protect data
Virtual machines are the simplest, most portable way to store a system
[Diagram labels: Virtual Machine, Physical Server, App, OS, Service Console, Backup Agent, Backup Server, tape]
[Diagram labels: DR Site (bound to HW, 5-10% utilized); WAN; Application, OS, OS files; x86 servers with local storage; Storage]
Expensive infrastructure
Slow and unreliable process
Separate processes for system and application data
Complex to physically recover OS, applications & data
[Diagram labels: Primary site (Web, App, DNS); P2V and V-V imaging conversion; WAN replication; Secondary Site; Restore VM; Power on VM]
How:
VMware Converter creates virtual machines matching physical machines
Copy virtual machines to recovery site
If Rapid Recovery is required:
Boot virtual machines on any hardware
Start data recovery of application data if necessary
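The protect-then-recover sequence described above can be modeled in a few lines. This is a sketch of the workflow only: the Site class and the protect and recover functions are hypothetical stand-ins, and no VMware Converter or replication API is being called.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A site holding VM copies and tracking which of them are powered on."""
    name: str
    vms: set = field(default_factory=set)
    running: set = field(default_factory=set)


def protect(physical_machines, primary: Site, recovery: Site) -> None:
    """Stand-in for P2V conversion followed by copying each VM to the recovery site."""
    for machine in physical_machines:
        vm = f"{machine}-vm"
        primary.vms.add(vm)   # VM matching the physical machine
        recovery.vms.add(vm)  # copy kept at the recovery site


def recover(recovery: Site, needs_data_restore=()) -> list:
    """Boot every protected VM at the recovery site.

    Returns the VMs whose application data still has to be recovered.
    """
    recovery.running |= recovery.vms
    pending = set(needs_data_restore)
    return sorted(vm for vm in recovery.running if vm in pending)


if __name__ == "__main__":
    primary, secondary = Site("primary"), Site("secondary")
    protect(["web", "app"], primary, secondary)
    print(recover(secondary, needs_data_restore=["app-vm"]))
```

Separating protect from recover mirrors the slide: conversion and copying happen ahead of time, so recovery itself is reduced to powering on VMs and, where needed, restoring application data.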
Replication
Array-Based Replication
[Diagram labels: Write data; Source VMFS; Target VMFS; Target storage; DR SITE]
Customer Example
Budget
Applications Protected
Customer Results
Our virtual IT infrastructure will help us provide greater availability than ever before for our most critical applications.
-- Paul Poppleton, IT Manager, QUALCOMM
Using VMware virtual infrastructure, we can offer the same levels of service and more flexibility for up to 40 percent lower server and operating system costs.
We can move a virtual machine to another physical server, apply a patch, and move it back without any service interruption.
-- Jamey Vester, Member of Professional Staff, Subaru of Indiana
The VMware Difference
Rapid, Reliable, Affordable Business Continuity
Products
VMware Infrastructure 3
VMware Virtual SMP
Enables a single VM to use up to 4 physical processors simultaneously
VMware Converter
Automates conversion of physical machines to virtual machines (P2V)
VMware VMotion
Moves live, running VMs from one host to another while maintaining continuous service availability.
Affordable
Realize early savings from consolidation
Increase HA and DR coverage for more applications
Fund your BC plan with hardware and operational savings
Reliable
Zero-downtime planned maintenance
Automatic restarts for unplanned server failure
Frequent non-disruptive DR testing with dual use of DR site
Plan a PoC
AccessFlow can be engaged to assess your business continuity needs and design an appropriate roadmap for implementing a robust DR solution: http://www.accessflow.com
Try