
IBM System p5 and eServer p5

Introduction to High Availability Cluster Multi-Processing (HACMP)
and HACMP Extended Distance (HACMP-XD)

Shawn Bodily
ATS HACMP Specialist

© 2006 IBM Corporation



Although hardware is now very reliable, hardware failures account for only a small minority of system outages
- Several studies place the proportion between 20% and 45%
- Human error, software error, and planned maintenance cause the majority of service outages

2 © 2005 IBM Corporation



Downtime and poor performance are expensive, both financially and in terms of customer perception
- "Overall downtime costs average 3.6% of annual revenue." – Infonetics
- Many studies estimate the average cost of downtime at over $5,000/hour
- Popular Web sites estimate the cost of downtime at millions of dollars
  - A 22-hour crash in June 2003 cost eBay an estimated $5M
- Losses go beyond immediate sales revenue
  - To clients, availability equates to reliability and trustworthiness
  - Internal application failures prevent employees from working

HACMP – Proven Technology for Business
- Mature product, now in its 17th major release
- Averaging 40,000 licenses sold worldwide annually
- Built on a decade of IBM cluster leadership
- HACMP allows you to create highly available environments with minimal hardware
- HACMP scales up to 32 nodes, allowing your cluster to adapt to the growing demands of your business
- The optional XD feature allows your clusters to span unlimited geographic distances


HACMP is NOT the right solution if:
- Your environment is not secure
- Network security is not in place
- Change management procedures are not respected
- You do not have trained administrators
- The environment is prone to "user fiddle-faddle"
- The application requires manual intervention

HACMP will never be an out-of-the-box solution to availability; a certain degree of skill will always be required.


Reducing both Planned and Unplanned Downtime

Unplanned outages
- System failure
  - Hardware
  - Operating system crash
  - Power loss
  - User error
- Component failure
  - NIC
  - SCSI/SAN adapter
  - Network hub/switch
  - SAN switch
  - Disk failure (both OS and application data)

Planned outages
- Maintenance
  - System hardware change/upgrade
  - OS and application upgrades and fixes
- Testing
  - Applied fixes
  - Failure scenarios for HA and DR


HACMP™ protects against service outages by detecting problems and quickly "failing over" to backup hardware
- Two nodes (A and B)
- Two networks
  - Private (internal) network
  - Public (shared) company network
- Shared disk
  - All data in shared storage is available to both nodes
- Critical applications
  - Database server
  - Web server (dependent on the database)

(Diagram: pSeries node A runs the database server and node B the Web server; the nodes are linked by a private network and the shared company network, and both attach to the shared disk.)

Example Failure #1: Node failure
- Node A fails completely
- Node B detects the loss of Node A
- Node B starts up its own instance of the database
- The database is temporarily taken over by Node B until Node A is brought back online

(Diagram: Node A marked failed; Node B now runs both the database and the Web server against the shared disk.)


Example Failure #2: Loss of network connection
- Node A loses a NIC
- Because of NIC redundancy, the service IP swaps locally
- Operations continue normally while the problem is resolved
- If total public network connectivity were lost, a fallover could occur

(Diagram: Node A's failed NIC is bypassed by its second interface; both nodes stay connected to the company network and shared disk.)


Failover possibilities
- One to one
- One to any
- Any to one
- Any to any


Custom Resource Groups

Startup preferences
- Online On Home Node Only (cascading) – OHNO
- Online On First Available Node (rotating, or cascading with inactive takeover) – OFAN
- Online On All Available Nodes (concurrent) – OAAN
- Startup distribution

Fallover preferences
- Fallover To Next Priority Node In The List – FOHP
- Fallover Using Dynamic Node Priority – FDNP
- Bring Offline (On Error Node Only) – BOEN

Fallback preferences
- Fallback To Higher Priority Node – FBHP
- Never Fallback – NFB


Common resources to make highly available

Service IP address(es)
- The IP addresses that users and client applications will use for production
- Can be one or multiple addresses
- Not limited to the number of interfaces when using IP aliasing

Application (server)
- The application(s) you want HACMP to control and protect
- In many cases a user-provided start/stop script
- May take advantage of pre-packaged application Smart Assists

Shared storage
- Volume groups
- Logical volumes
- JFS
- NFS
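A user-provided start/stop script of the kind mentioned above can be very simple. The sketch below is illustrative only: the PID file path is a hypothetical choice, and a `sleep` stands in for the real daemon (your database or web server) so the script is self-contained.

```shell
#!/bin/sh
# Sketch of a user-provided HACMP application server start/stop script.
# HACMP runs the start method on the node acquiring the resource group
# and the stop method on the node releasing it. "sleep 300" is a
# stand-in for the real daemon; the PID file path is hypothetical.
PIDFILE=/tmp/myappd.pid

app_start() {
    sleep 300 &                  # stand-in for launching the real daemon
    echo $! > "$PIDFILE"         # record the PID for the stop method
}

app_stop() {
    if [ -f "$PIDFILE" ]; then
        kill "$(cat "$PIDFILE")" 2>/dev/null
        rm -f "$PIDFILE"
    fi
}

# HACMP invokes the script as "<script> start" or "<script> stop";
# default to "start" so the sketch also runs standalone.
case "${1:-start}" in
    start) app_start ;;
    stop)  app_stop ;;
esac
```

The key design point is that both methods must return promptly and be safe to re-run: the start method should not fail if the application is already up, and the stop method should not fail if it is already down.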


Additional granular options

Resource group dependencies
- Parent/child relationships
  - Great for multi-tier environments
- Location dependencies
  - Online On Same Node: all resource groups must be online on the same node
  - Online On Different Nodes: all resource groups must be online on different nodes
  - Online On Same Site: all resource groups must be online on the same site

Resource group priorities (for the Different Nodes dependency)
- Low
- Intermediate
- High


Application Monitoring

HACMP can monitor applications in one of two ways:
- Process monitor – detects the death of a process
- Custom monitor – checks the health of the application using a monitor method you provide

Decisions upon failure
- Restart – you can set a number of local restarts; if the application continues to fail after the restart count is reached, you can escalate to a fallover
- Notify – send an email notification
- Fallover – move the application and its associated resource group to the next candidate node

Application monitoring can be suspended and resumed at any time.
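A custom monitor method like the one described above is just a script whose exit status reports application health. The sketch below checks the PID recorded by a start script; the PID file path is a hypothetical choice, and the demo records the shell's own PID so the sketch can run anywhere.

```shell
#!/bin/sh
# Sketch of a custom application monitor method for HACMP.
# HACMP runs the monitor on a configured interval: exit status 0 means
# the application is healthy; non-zero triggers the configured
# restart/notify/fallover policy. The PID file path is hypothetical.
PIDFILE=/tmp/appmon.pid

monitor() {
    # healthy only if the PID file exists and that process is alive
    [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# Demo: record this shell's own PID so the sketch is self-contained;
# a real monitor would check the PID written by the application.
echo $$ > "$PIDFILE"
monitor "$PIDFILE"    # the exit status becomes the health verdict
```

A monitor can be as thorough as you like (for example, running a trivial query against a database) as long as it is fast relative to the monitoring interval and returns a clean zero/non-zero status.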


DLPAR/CUoD configuration
- HACMP on the primary machine detects the failure
- Running in a partition on another server, HACMP grows the backup partition, activates the required inactive processors, and restarts the application

(Diagram: a production database server fails over to a partition on a DLPAR/CUoD server already running Order Entry and Web Server partitions on its active processors; inactive processors are activated for the database, and all nodes share disk.)

Recent HACMP releases greatly improve ease of use

Enhancements include:
- Configuration wizard for a typical two-node cluster
- Automatic detection and configuration of IP networks
- "Online Planning Worksheets" guide you through configuration
- Simplified Web-based interface for management and monitoring

(Screenshot: Online Planning Worksheets for resource groups.)


With HACMP V5.x, you can configure a cluster by answering just five questions:

1. What is the address of the backup node?
2. What is the name of the application?
3. What script should HACMP use to start it?
4. What script should HACMP use to stop it?
5. What is the service IP label that clients will use to access the application?


WebSMIT Overview Demo


HACMP Cluster Test Tool

- The Cluster Test Tool reduces implementation costs by simplifying validation of cluster functionality.
- It reduces support costs by automating testing of an HACMP cluster to ensure correct behavior in the event of a real cluster failure.
- The Cluster Test Tool executes a test plan, which consists of a series of individual tests.
- Tests are carried out in sequence and the results are analyzed by the test tool.
- Administrators may define a custom test plan or use the automated test procedure.
- Test results and other important data are collected in the test tool's log file.
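The test plan mentioned above is a plain-text file of test directives. The fragment below is illustrative only: the event names and the comma-separated syntax are an assumption based on the tool's documented automated tests, so consult the HACMP Administration Guide for the exact format supported by your release.

```
# Hypothetical cluster test plan fragment (syntax is an assumption;
# verify event names and format against the HACMP Administration Guide)
NODE_UP,node1,Start cluster services on node1
NODE_UP,node2,Start cluster services on node2
NODE_DOWN_GRACEFUL,node1,Stop node1 and verify resource group takeover
NETWORK_DOWN_LOCAL,node2,net_ether_01,Fail the public network on node2
```

Each line names a test event, the node(s) or network it applies to, and a free-form comment; the tool runs the entries in order and records pass/fail results in its log file.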


New features make HACMP V5.x easier to use and more flexible
- Automatic detection and correction of common cluster configuration problems
- Enhanced support for complex multi-tier applications, relationships, and dependencies
- Clusters can be configured with simple ASCII files
- Parallel resource processing recovers applications faster
- Simpler, more flexible configuration and management
- New "Smart Assists" simplify HACMP implementation in DB2®, Oracle, and WebSphere® environments
  - An inexpensive option includes all three Smart Assists


HACMP with Oracle 10g fallover demo

Environment:
- (1) p52A, (1) p505, (1) HMC
- HACMP 5.4 on AIX 5.3 TL5
- Oracle 10g
- DS4300 storage
- LPARMon (http://www.alphaworks.ibm.com/tech/lparmon)
- Swingbench (http://www.dominicgiles.com/swingbench.html)
- Web-based System Manager

The cluster shown was created using the two-node configuration assistant within HACMP.


HACMP Extended Distance
(HACMP-XD)



Do you really need HA or DR?

What is the target recovery time? Minutes? Hours? Days?

Costs associated with implementing and maintaining an HA or DR solution:
- Redundant hardware
- Inter-site networking
- Operations staff

HA/DR is a balance of recovery time requirements and cost.



Tiers of Disaster Recovery: Level Setting

Best DR practice is to blend tiers of solutions in order to maximize application coverage at the lowest possible cost. One size, one technology, or one methodology doesn't fit all applications.

- Tier 7 – Highly automated, business-wide, integrated solution (e.g. GDPS***/PPRC/VTS P2P, AIX HACMP/XD, OS/400 HABP); zero or near-zero data recreation; for applications with low tolerance to outage. HACMP/XD fits here.
- Tier 6 – Storage mirroring (e.g. XRC, PPRC, VTS Peer-to-Peer); zero or near-zero data recreation
- Tier 5 – Software two-site, two-phase commit (transaction integrity); minutes to hours of data recreation
- Tier 4 – Batch/online database shadowing and journaling, point-in-time disk copy (FlashCopy), TSM-DRM; up to 24 hours of data recreation; for applications somewhat tolerant to outage
- Tier 3 – Electronic vaulting, TSM**; 24–48 hours of data recreation
- Tier 2 – PTAM*, hot site, TSM**
- Tier 1 – PTAM*; for applications very tolerant to outage

Recovery time ranges from 15 minutes at the highest tiers to days at the lowest.

*PTAM = Pickup Truck Access Method with tape
**TSM = Tivoli Storage Manager
***GDPS = Geographically Dispersed Parallel Sysplex
Tiers based on SHARE definitions

HACMP Extended Distance (XD) is an optional component for cross-site geographic disaster recovery
- Backup systems may be physically separate from primary operations, for protection in the event of power failure, flood, earthquake, etc.
- The XD option provides a basket of disaster recovery capabilities and integration points
- XD provides multiple options:
  - IP-based data mirroring (GLVM, HAGEO)
  - Support for hardware-based data mirroring (Metro Mirror/PPRC)


HACMP XD – Extended Distance for Disaster Recovery
- Data replication between sites ensures a copy of the data is available after a site-wide disaster
- The choice of technology depends on distance and performance requirements

Campus-wide: use LVM split-site mirroring

(Diagram: two campus sites connected over a LAN/MAN, with SAN-attached storage mirrored between them.)
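At the campus tier, split-site LVM mirroring simply keeps a second physical copy of every logical volume on the remote site's SAN-attached disk. A minimal AIX sketch, assuming a volume group `datavg` with its local copy on `hdisk1` and a remote-site disk `hdisk2` (all names hypothetical):

```
# Campus-wide LVM split-site mirroring sketch (AIX only; names are
# hypothetical — hdisk2 is the SAN-attached disk at the remote site)
extendvg datavg hdisk2      # bring the remote disk into the volume group
mirrorvg -S datavg hdisk2   # mirror every LV onto it (-S: sync in background)
syncvg -v datavg            # complete/verify mirror synchronization
```

Because both copies are ordinary LVM mirrors, the surviving site keeps a full, immediately usable copy of the data if the other site is lost.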


HACMP XD – Extended Distance for Disaster Recovery

Metro-wide: use SVC or ESS/PPRC mirroring

(Diagram: production-site servers A and B and recovery-site servers C and D, connected through routers; SVC-to-SVC mirroring replicates from the primary to the secondary ESS/DS storage via PPRC/Metro Mirror or eRCMF.)

HACMP XD – Extended Distance for Disaster Recovery

Unlimited distance: use GLVM mirroring
- A subset of disks is defined as "Remote Physical Volumes" (RPVs)
- The RPV driver replicates data over the WAN
- An LVM mirrored volume group holds copy 1 and copy 2 of each mirror
- Both sites always have a complete copy of all mirrors



The new HACMP "Geographic Logical Volume Manager" (GLVM) is a reliable, easy-to-use data mirror and failover capability
- GLVM provides unlimited-distance IP-based data mirroring
  - Fully integrated with AIX 5L™ logical volume management
- Easier to use than the existing HAGEO solution
  - No need to define and manage separate state maps
  - Long-term replacement for HAGEO
- Automatically reverses the direction of data replication on failover
- Supports all IBM TotalStorage® products certified with base HACMP


HACMP XD – HACMP automates the solution

HACMP integrates support for all the replication options:
- Manages data replication direction, switching, and resync after recovery
- Recovers locally or moves the entire application to the backup site
- A common infrastructure supports all solutions – choose the one that meets your performance and distance requirements


Thank You

Questions?


Backup Slides on Networking


Typical Local HACMP Clustering Configuration

(Diagram: two nodes, each with en0 and en1 interfaces, connected through a pair of switches on the common 10.70.10.x subnet.)

A single-network view on a common subnet. Multiple networks can be used.


HACMP Clustering Across Sites

(Diagram: one site on 10.70.10.x and the other on 10.50.10.x; each node's en0 and en1 connect through local switches to routers linking the two sites.)

Different subnets, with routers connected to allow cross-subnet communication.
