Continuity Management

- Premanand Lotlikar

9th September, 2007


Agenda
• Introduction
• Objective of Continuity Mgmt
• Benefits
• Relationship with other processes
• Phases in Business Continuity Planning (BCP)
• Disaster Recovery Planning (DRP)
• Process Control
• Key Performance Indicators
• Cost
• Possible Problems
Introduction
• Disasters can strike anytime!
• Organizations must plan for these disasters:
– Natural- Earthquakes, storms, fires, floods,
hurricanes, tornados, and tidal waves
– System/technical- Outages, malicious code, worms,
and hackers
– Supply systems- Electrical power problems,
equipment outages, utility problems, and water
shortages
– Human-made/political- Disgruntled employees, riots,
vandalism, theft, crime, protesters, and political unrest
Introduction
• Disasters are classified by how long services are
interrupted:
– Minor- Operations are disrupted for several hours to
less than a day.
– Intermediate- Operations are disrupted for a day or
longer. The organization might need a secondary site
to continue operations.
– Major- This type of event is a true catastrophe. In this
type of disaster, the entire facility would be destroyed.
A long-term solution would require building a new
facility.
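The three categories above can be sketched as a small classifier. This is only an illustration: the function name, the `facility_destroyed` flag, and reading "less than a day" as under 24 hours are assumptions, not part of the original slides.

```python
from enum import Enum


class Severity(Enum):
    MINOR = "minor"
    INTERMEDIATE = "intermediate"
    MAJOR = "major"


def classify_disruption(outage_hours: float, facility_destroyed: bool = False) -> Severity:
    """Map an outage to the minor/intermediate/major categories.

    Assumption: "less than a day" is taken to mean under 24 hours.
    """
    if outage_hours < 0:
        raise ValueError("outage_hours must be non-negative")
    if facility_destroyed:
        # A destroyed facility is a major event regardless of duration so far.
        return Severity.MAJOR
    if outage_hours < 24:
        return Severity.MINOR
    return Severity.INTERMEDIATE
```

For example, `classify_disruption(6)` falls in the minor category, while `classify_disruption(48)` is intermediate.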
Objective
• Support overall Business Continuity
Management by ensuring that all required
IT infrastructure and services can be
restored within the specified time limits
after a disaster
Benefits
• Recovery of the systems
• Reduced loss of availability
• Minimize interruptions to business
activities
Relationship with other processes
• SLM
• Availability Mgmt
• Configuration Mgmt
• Capacity Mgmt
• Change Mgmt
Phases in the BCP process
• Project Management and Initiation
• Business Impact Analysis
• Recovery Strategy
• Plan Design and Development
• Testing, Maintenance, Awareness, and
Training
Project Management and Initiation
• Establish the need for the BCP
• Perform a risk analysis to identify and
document potential outages to critical
systems
• Results should be presented to
management so they understand the
potential risk
Project Management and Initiation
• With management on board, you can start to develop a plan of
action
• This management plan should include:
– Scope of the project
– Appointment of a project planner
– Determination of who will be on the team
• representatives from senior management, the legal staff, recovery team
leaders, the information security department, various business units,
networking, and physical security
– Finalize the project plan
• finalize issues such as needed resources (personnel, financial), time
schedules, budget estimates, and critical success factors
– Determine the data-collection method
• Strohl Systems BIA Professional and SunGard’s Paragon software can
automate much of the BCP process
• A learning curve is involved whenever individuals are introduced to
unfamiliar software
Business Impact Analysis (BIA)
• Its role is to describe what impact a disaster
would have on critical business functions
• Example
– DoS attacks that result in 2 hours of downtime of the
company’s VoIP phone system will result in $28,000
in lost revenue
– 8-hour outage to the web server might cost the
company only $1,000 in lost revenue
• These types of numbers will help the
organization determine what needs to be done
to ensure the survival of the company
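To compare the two example outages above on the same footing, the losses can be converted to an hourly rate. This is a minimal sketch that assumes the loss accrues evenly over the outage, which the slides do not state; the function name is hypothetical.

```python
def hourly_loss(total_loss: float, downtime_hours: float) -> float:
    """Average revenue lost per hour of downtime (assumes a linear loss)."""
    if downtime_hours <= 0:
        raise ValueError("downtime_hours must be positive")
    return total_loss / downtime_hours


voip_rate = hourly_loss(28_000, 2)  # VoIP phone system: 14000.0 per hour
web_rate = hourly_loss(1_000, 8)    # web server: 125.0 per hour
```

On these figures an hour of VoIP downtime costs far more than an hour of web server downtime, so the phone system would be the higher recovery priority.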
Business Impact Analysis
• The impact or loss that an organization faces
because of lost service or data can be felt in
many ways
• These are generally measured by one of the
following:
– Allowable business interruption
• Maximum Tolerable Downtime (MTD) is the longest time
that an organization can survive without a specific
business function
– Financial and operational considerations
– Regulatory requirements
– Organizational reputation
Business Impact Analysis
• The eight steps in the BIA process are as follows:
– Select individuals to interview
– Determine the methods to be used for gathering information
– Develop a customized questionnaire to gather specific monetary
and operational impact information
– Analyze the compiled data
– Determine the time-critical business processes and functions
– Determine MTD for each process and function
– Prioritize the critical business process or function based on its
MTD
– Document the findings and report your recommendations to
management
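The prioritization step can be sketched as a sort on MTD, shortest first. The business functions and MTD values below are invented for illustration only and do not come from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime, in hours


def prioritize(functions: List[BusinessFunction]) -> List[BusinessFunction]:
    """Order functions so the one with the shortest MTD is restored first."""
    return sorted(functions, key=lambda f: f.mtd_hours)


# Hypothetical BIA results
FUNCTIONS = [
    BusinessFunction("payroll", 72),
    BusinessFunction("order processing", 4),
    BusinessFunction("email", 24),
]

for rank, function in enumerate(prioritize(FUNCTIONS), start=1):
    print(rank, function.name, function.mtd_hours)
```

With this sample data, order processing comes first, then email, then payroll.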
Recovery Strategy
• Predefined actions that management has
approved to be followed in case normal
operations are interrupted
• Following are recovery strategies for:
– Data interruptions
• Focus here is on recovering the data
• Solutions to data interruptions include backups,
offsite storage, and remote journaling
Recovery Strategy
• Recovery strategies for:
– Operational interruptions
• Interruption is caused by the loss of some type of equipment
• Solutions to this type of interruption include hot sites,
redundant equipment, Redundant Array of Inexpensive Disks
(RAID), and Backup Power Supplies (BPS)
– Facility and supply interruptions
• Causes of these interruptions can include fire, loss of
inventory, transportation problems, Heating, Ventilation, and
Air Conditioning (HVAC) problems, and telecommunications failures
– Business interruptions
• These interruptions can be caused by strikes or by the loss of
personnel, critical equipment, supplies, or office space
Recovery Strategy
• To evaluate the losses and determine the best
recovery strategy, follow these steps:
– Document all costs for each possible alternative.
– Obtain cost estimates for any outside services that
might be needed.
– Develop written agreements with the chosen vendor
for such services.
– Evaluate what resumption strategies are possible in
case there is a complete loss of the facility.
– Document your findings and report your chosen
recovery strategies to management for feedback and
approval.
Plan Design and Development
• The team prepares and documents a detailed
plan for recovery of critical business systems
• The plan should be a guide for implementation
• The plan should also detail how the organization
will interface with external groups, such as
customers, shareholders, the media, the
community, and region and state emergency
services groups
• The final step of the phase is to combine this
information into the BCP and interface it
with the organization’s other emergency plans
Plan Design and Development
• Plan should include information on both long-term and
short-term goals and objectives:
– Identify critical functions and priorities for restoration.
– Identify support systems that are needed by critical functions.
– Estimate potential disasters and calculate the minimum
resources needed to recover from the catastrophe.
– Select recovery strategies and determine what vital personnel,
systems, and equipment will be needed to accomplish the
recovery.
– Determine who will manage the restoration and testing process.
– Calculate what type of funding and fiscal management is needed
to accomplish these goals.
Testing & Maintenance
• Five different types of BCP testing:
– Checklist
• Performed by sending copies of the plan to different department managers and
business unit managers for review
– Tabletop
• Performed by having the members of the emergency management team and business
unit managers meet in a conference to discuss the plan
• Primary advantage of the tabletop testing method is to discover dependencies between
different departments
– Walkthrough
• This is an actual simulation of the real thing
• Primary purpose of this test is to verify that members of the response team can
perform the required duties
– Functional
• Functional test is similar to a walkthrough but actually starts operations at the
alternative site
– Full interruption
• This test is the most detailed, time-consuming, and thorough
• Mimics a real disaster, and all steps are performed to start up backup operations
• Involves all the individuals who would be involved in a real emergency, including
internal and external organizations
Awareness and Training
• Goal of awareness and training is to make sure
all employees know what to do in case of an
emergency
• Employees assigned to specific tasks should be
trained to carry out needed procedures.
• Plan for cross-training of teams, if possible, so
those team members are familiar with a variety
of recovery roles and responsibilities
• The number one priority of any BCP or DRP is
to protect the safety of employees
Disaster Recovery Planning
• BCP deals with what is needed to keep the
organization running and what functions are
most critical
• DRP’s purpose is to get a damaged organization
restarted where critical business functions can
resume
• DR activities center on assessing the damage,
restoring operations, and determining whether
an alternate location will be needed until repairs
can be made
• These items can be broadly grouped into
salvage and recovery
Disaster Recovery Planning
• Salvage
– Restoring functionality to damaged systems, units, or the facility
• A damage assessment to determine the extent of the damage
• A salvage operation to recover any repairable equipment
• Repair and cleaning to eliminate any damage to the facility and restore
equipment to a fully functional state
• Restoration of the facility so that it is fully restored, stocked, and ready for
business
• Recovery
– Focused on the responsibilities needed to get an alternate site up and
running
– This site will be used to stand in for the original site until operations can
be restored there

• NOTE: Physical security is always of great importance after a disaster.
Measures such as guards, temporary fencing, and barriers should be deployed
to prevent looting and vandalism
Alternative Sites and H/W Backup
• Reciprocal Agreement
– Requires two organizations to pledge assistance to one another
in case of disaster
– Carried out by sharing space, computer facilities, and technology
resources (cost effective)
– Parties to this agreement must trust the other
organization to come to their aid in case of disaster
– There is also the issue of confidentiality because the damaged
organization is placed in a vulnerable position and must trust the
other party with confidential information
– If the parties to the agreement are near each other, there is
always the danger that a disaster could strike both parties,
thereby rendering the agreement useless
Cold, Warm, and Hot Sites
• Cold site
– This option can be used by businesses that can
manage without IT services for some time
– Basically an empty room with only rudimentary
electrical power and computing capability
• Warm site
– Somewhat of an improvement over a cold site
– This facility has data equipment and cables, and is
partially configured
– It could be made operational in anywhere from a few
hours to a few days
Cold, Warm, and Hot Sites
• Hot site
– This facility is ready to go
– Fully configured and is equipped with the same system as the
production network
– Although it is capable of taking over operations at a moment’s
notice, it is the most expensive option discussed
• Mobile site
– Non-mainstream alternative to traditional recovery options
– Typically consists of fully contained tractor-trailer rigs that come
with all the needed facilities of a data center
– They can be quickly moved to any needed site
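The trade-off between site types can be sketched as picking the cheapest option that becomes operational within the MTD. The activation times and annual costs below are hypothetical placeholders; the slides only say that a hot site is ready at a moment's notice and is the most expensive, and that a warm site takes a few hours to a few days.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SiteOption:
    name: str
    activation_hours: float  # hypothetical time until the site is operational
    annual_cost: float       # hypothetical yearly cost


def cheapest_site_within_mtd(options: List[SiteOption], mtd_hours: float) -> Optional[SiteOption]:
    """Return the lowest-cost site that can be active within the MTD, if any."""
    viable = [o for o in options if o.activation_hours <= mtd_hours]
    if not viable:
        return None
    return min(viable, key=lambda o: o.annual_cost)


# Illustrative figures only
OPTIONS = [
    SiteOption("cold", 336, 20_000),
    SiteOption("warm", 48, 80_000),
    SiteOption("hot", 1, 300_000),
]
```

With these sample figures, an MTD of 72 hours selects the warm site, while an MTD of 4 hours leaves the hot site as the only viable choice.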
Multiple Data Centers
• Each of these sites is capable of handling
all operations if another fails
• Gives the company fault tolerance by
maintaining multiple redundant sites
• If the redundant sites are geographically
dispersed, the possibility of more than one
being damaged is low
• However, cost is a consideration
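The point about geographic dispersion can be made concrete with a small probability sketch. It assumes each site is damaged independently with the same probability, which is an idealization; sites that are close together would violate it. The function name and parameters are hypothetical.

```python
from math import comb


def prob_at_least_k_damaged(n_sites: int, p: float, k: int = 2) -> float:
    """Probability that at least k of n_sites are damaged.

    Assumes each site is damaged independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_sites < 0 or k < 0:
        raise ValueError("n_sites and k must be non-negative")
    return sum(
        comb(n_sites, i) * p**i * (1 - p) ** (n_sites - i)
        for i in range(k, n_sites + 1)
    )
```

For two independent sites that each have a 1% chance of damage, the chance that both are damaged is about 0.0001, that is, 0.01 squared.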
Other Alternatives
• Database shadowing
– A database shadowing system writes the data to two physical
disks
– Creates good redundancy by duplicating the database sets to
mirrored servers
• Electronic vaulting
– Makes a copy of backup data to a backup location
– This is a batch-process operation that functions to keep a copy
of all current records, transactions, or files at an offsite location
• Remote journaling
– Similar to electronic vaulting, except that information is
processed in parallel
– By performing live data transfers, it allows the alternate site to be
fully synchronized and ready to go at all times
Backup Types
• Full Backup
– A full backup backs up all files, regardless of whether they have
been modified
• Incremental Backup
– An incremental backup backs up only those files that have been
modified since the previous backup of any sort
– Restoration will require all incremental backup tapes plus the
last full backup
• Differential Backup
– A differential backup backs up all files that have been modified
since the last full backup
– Restoration will require the full backup and the last differential
backup
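The restoration rules above can be sketched as a function that picks which backups are needed to restore the latest state. The `Backup` type and the labels are hypothetical. For a schedule that mixes both kinds, the sketch applies the definitions above: the last differential covers everything since the full backup, and any incrementals taken after it are added on top.

```python
from dataclasses import dataclass
from typing import List

KINDS = {"full", "incremental", "differential"}


@dataclass(frozen=True)
class Backup:
    label: str
    kind: str  # "full", "incremental", or "differential"


def restore_set(history: List[Backup]) -> List[Backup]:
    """Backups needed to restore the latest state; history is oldest first."""
    if any(b.kind not in KINDS for b in history):
        raise ValueError("unknown backup kind")
    full_idx = max((i for i, b in enumerate(history) if b.kind == "full"), default=None)
    if full_idx is None:
        raise ValueError("history contains no full backup")
    since_full = history[full_idx + 1:]
    diff_idx = max(
        (i for i, b in enumerate(since_full) if b.kind == "differential"),
        default=None,
    )
    if diff_idx is None:
        # Incremental scheme: the last full backup plus every incremental after it.
        return [history[full_idx]] + since_full
    # Differential scheme: the last full backup plus the last differential
    # (and any incrementals taken after that differential).
    return [history[full_idx]] + since_full[diff_idx:]
```

With a full backup on Sunday and incrementals on Monday and Tuesday, all three are needed; with differentials instead, only Sunday's full backup and Tuesday's differential are needed.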
Process Control
• Effective Configuration Mgmt process
• Regular test of the recovery plan
• Up-to-date and effective tools
• Dedicated training for everyone involved in
the process
• Support and commitment throughout the
organization
Key Performance Indicators
• Number of identified shortcomings of the
recovery plan
• Revenue lost as a result of a disaster
• Cost of the process
Cost
Possible Problems
• Resources
• Commitment
• Estimating the damages
• Access to recovery facilities
• Lack of awareness
• Misconception that BCP is the IT department's responsibility!
Thank you!