Overview
Lessons shown in the course overview diagram: Troubleshooting, Using Volume Manager, Event Notification, Service Group Basics, Introduction, Cluster Communication, Faults and Failovers, Preparing Resources, Terms and Concepts, Installing Applications, Resources and Agents, Installing VCS, Managing Cluster Services, NFS Resources, Using Cluster Manager
VCS_3.5_Solaris_R3.5_20020915
Objectives
After completing this lesson, you will be able to:
Monitor system and cluster status.
Apply troubleshooting techniques in a VCS environment.
Detect and solve VCS communication problems.
Identify and solve VCS engine problems.
Correct service group problems.
Resolve problems with resources.
Solve problems with agents.
Correct resource type problems.
Plan for disaster recovery.
Monitoring VCS
VCS log files
System log files
The hastatus utility
SNMP traps
Event notification triggers
Cluster Manager
Example entries:
TAG_D 2001/04/03 12:17:44
started
TAG_D 2001/04/03 12:17:44
TAG_C 2001/04/03 12:17:45
exited errno 10054
TAG_E 2001/04/03 12:17:52
membership
TAG_E 2001/04/03 12:17:52
Jeopardy: 0x0
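Each example entry above begins with a tag, a date, and a time. A minimal Python sketch for splitting that prefix from the rest of a log line follows. The layout `TAG_<letter> YYYY/MM/DD HH:MM:SS <message>` is an assumption read off the examples, and `parse_log_line` is a hypothetical helper, not a VCS utility.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the example entries:
#   TAG_<letter> YYYY/MM/DD HH:MM:SS <message>
LOG_PREFIX = re.compile(
    r"^TAG_([A-Z])\s+(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s*(.*)$"
)

def parse_log_line(line):
    """Return tag letter, timestamp, and message, or None if the line does not match."""
    match = LOG_PREFIX.match(line.strip())
    if match is None:
        return None
    tag, date_text, time_text, message = match.groups()
    stamp = datetime.strptime(date_text + " " + time_text, "%Y/%m/%d %H:%M:%S")
    return {"tag": tag, "time": stamp, "message": message}
```

Filtering the parsed entries by tag letter or by a time window is then a matter of ordinary list handling.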
Troubleshooting Guide
Start by running hastatus -summary:
Cluster communication problems are indicated by the message:
    Cannot connect to server -- Retry Later
VCS engine startup problems are indicated by systems in one of the WAIT states.
Service group, resource, or agent problems are indicated within the hastatus display.
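The triage above can be sketched as a small Python function that takes captured hastatus -summary output and names the problem area to look at first. The function name and the category labels are hypothetical, and the sketch assumes that WAIT state names (for example STALE_ADMIN_WAIT) appear literally in the output.

```python
import re

CONNECT_ERROR = "Cannot connect to server -- Retry Later"
# Assumption: WAIT states appear as upper-case tokens ending in WAIT.
WAIT_STATE = re.compile(r"\b[A-Z_]*WAIT\b")

def classify_hastatus(output):
    """Map hastatus -summary text to the first troubleshooting area to check."""
    if CONNECT_ERROR in output:
        return "communication"
    if WAIT_STATE.search(output):
        return "engine-startup"
    # Anything else: read the display for group, resource, or agent problems.
    return "inspect-display"
```

The order of the checks mirrors the guide: communication first, then engine startup, then the display itself.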
train11# lltstat -n
LLT node information:
    Node             State       Links
  * 0 train11        OPEN        2
    1 train12        CONNWAIT    2

train12# lltconfig
LLT is running
train12# lltstat -n
LLT node information:
    Node             State       Links
    0 train11        CONNWAIT    2
  * 1 train12        OPEN        2
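In the output above, each system sees itself as OPEN and its peer as CONNWAIT. A short Python sketch for picking out such nodes from captured lltstat -n output follows. It assumes the four-column row layout shown above (node ID, name, state, link count) and treats the asterisk as the marker for the node where the command was run. Both function names are hypothetical.

```python
def parse_lltstat_nodes(output):
    """Parse node rows from lltstat -n text into a list of dictionaries."""
    nodes = []
    for line in output.splitlines():
        fields = line.replace("*", " ").split()
        # Node rows have four fields: id, name, state, link count.
        if len(fields) == 4 and fields[0].isdigit() and fields[3].isdigit():
            nodes.append({
                "id": int(fields[0]),
                "name": fields[1],
                "state": fields[2],
                "links": int(fields[3]),
                "local": line.lstrip().startswith("*"),
            })
    return nodes

def nodes_not_open(nodes):
    """Return the names of nodes whose LLT state is anything other than OPEN."""
    return [node["name"] for node in nodes if node["state"] != "OPEN"]
```

Applied to the train11 output, nodes_not_open reports train12. Applied to the train12 output, it reports train11.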
STALE_ADMIN_WAIT
To recover from the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine whether it is valid.
2. Edit the main.cf file, if necessary.
3. Verify the syntax of main.cf, if modified:
   hacf -verify config_dir
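The verification step can be wrapped in a script. The Python sketch below runs hacf -verify on a configuration directory and reports whether it succeeded. It assumes that an exit status of 0 means the configuration passed. The function name and the injectable run parameter (useful for dry runs on a machine without VCS) are hypothetical.

```python
import subprocess

def verify_config(config_dir, run=subprocess.run):
    """Run `hacf -verify config_dir`; return (passed, combined output).

    Assumption: hacf exits with status 0 when main.cf is syntactically valid.
    """
    result = run(["hacf", "-verify", config_dir], capture_output=True, text=True)
    output = (result.stdout or "") + (result.stderr or "")
    return result.returncode == 0, output
```

On a cluster node this would be called with the directory that holds main.cf. Any messages printed by hacf are returned unchanged so they can be reviewed.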
ADMIN_WAIT
A system can be in the ADMIN_WAIT state under these circumstances:
A .stale flag exists and the main.cf file has a syntax problem.
A disk error affecting main.cf occurs during a local build.
The system is performing a remote build and the last running system fails.
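The first circumstance can be checked before starting VCS. The Python sketch below reports whether a .stale flag and a main.cf file are present in a configuration directory. It assumes the flag is a file named .stale stored alongside main.cf, and the function name is hypothetical.

```python
import os

def inspect_config_dir(config_dir):
    """Report whether main.cf and the .stale flag exist in config_dir.

    Assumption: the stale flag is a file named ".stale" next to main.cf.
    """
    return {
        "main_cf": os.path.isfile(os.path.join(config_dir, "main.cf")),
        "stale_flag": os.path.exists(os.path.join(config_dir, ".stale")),
    }
```

A result with stale_flag set is a cue to verify main.cf before the next engine start.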
Concurrency Violations
A concurrency violation occurs when a failover service group is online or partially online on more than one system.
Notification is provided by the Violation trigger:
    Invoked on the system that caused the concurrency violation
    Notifies the administrator and takes the service group offline on the system causing the violation
    Configured by default with the violation script in /opt/VRTSvcs/bin/triggers
    Can be customized:
        Send a message to the system log.
        Display a warning on all cluster systems.
        Send e-mail messages.
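As an illustration of the first customization, the Python sketch below builds a warning message and writes it to the system log. It shows the logic only and is not the shipped violation script. It assumes the trigger is invoked with the system name followed by the service group name, and both function names are hypothetical.

```python
import sys

def violation_message(system, group):
    """Build the warning text for a concurrency violation."""
    return ("VCS concurrency violation: service group %s is online or "
            "partially online on system %s" % (group, system))

def main(argv):
    # Assumption: argv[1] is the system name and argv[2] is the group name.
    if len(argv) != 3:
        print("usage: violation <system> <group>", file=sys.stderr)
        return 1
    import syslog  # Unix-only standard library module
    syslog.syslog(syslog.LOG_ERR, violation_message(argv[1], argv[2]))
    return 0
```

A real trigger script would call main(sys.argv) at the end. Sending e-mail or warning other cluster systems would replace or extend the syslog call.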
Check logs.
Manually bring the resource offline, if necessary.
Configure the ResNotOff trigger for notification or action.
Clearing Faults
main.cf
main.cmd
sysname
LLT and GAB configuration files
Customized trigger scripts
Customized agents
Summary
You should now be able to:
Monitor system and cluster status.
Apply troubleshooting techniques in a VCS environment.
Resolve communication problems.
Identify and solve VCS engine problems.
Correct service group problems.
Resolve problems with resources.
Solve problems with agents.
Correct resource type problems.
Plan for disaster recovery.
Lab Exercise 14
Your instructor will run scripts that cause
problems within your cluster environment.
Apply the troubleshooting techniques provided
in the lesson to identify and fix the problems.
Notify your instructor when you have restored
your cluster to a functional state.