NetBackup Training

KEEPING PEOPLE AND INFORMATION CONNECTED.

Module 1: Brief Overview, Client/Policy Configuration, Troubleshooting


For Internal SunGard Use Only

Agenda
Introduction
Purpose & Assumptions
History
Terminology and Concepts
Architecture
Standards
How Backups Work
Managing NetBackup
Client Implementation and Configuration
Policies
Troubleshooting
Reporting
Monitoring Overall Environment
Shutdown/Restart NetBackup
Tips and Tricks
Education/Further Reading
Q&A

Purpose and Assumptions


Purpose
  Increase knowledge of the NetBackup product

Assumptions
  Presentation assumes NetBackup 6.5.3
  Some familiarity with NetBackup
  Know how to access environments
  Windows and/or Unix admin experience
Please write down your questions for the Q&A session at the end


History
Corporate
  1987: proprietary software solution written by engineers at Control Data for Chrysler Corp.
  1993: renamed to BackupPlus (bp prefix)
  Late 1993: OpenVision acquisition (/usr/openv/ install path); product rebranded as NetBackup
  1997: Veritas acquired OpenVision
  2005: Symantec acquired Veritas
Version
  1993  BackupPlus 1.0 (Control Data)
  1994  NetBackup 1.6 (OpenVision)
  1996  NetBackup 2.0
  1997  NetBackup 3.0 (Veritas)
  2000  NetBackup 3.4
  2002  NetBackup 4.5
  2003  NetBackup 5.0
  2005  NetBackup 6.0 (Symantec)
  2007  NetBackup 6.5

Terminology and Concepts


Master Server: brains of the operation; houses the catalog
Media Server: where storage units exist; pushes data
Client: device providing data to be backed up
Enterprise Media Manager (EMM): manages device and media information; typically installed on the Master
Catalog: database of backup images and other information
Metadata: info about files backed up (name, path, size, date, image location, etc.)
Duration: time it takes to perform the backup
Exit Code: final status of job
  0 = Successful with NO files missed
  1 = Successful with files missed
  2+ = Backup failed
Start Window: time when a backup can START
Frequency: how often the backup should execute
Retention: length of time backups are valid
Policy: grouping of like clients sharing similar attributes
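
The exit code rule above can be written down as a small helper. This is only a sketch: the function name and the labels it returns are made up for illustration and are not part of NetBackup.

```python
def classify_exit_code(code: int) -> str:
    """Map a job exit code to the three outcomes listed on this slide.

    The labels are illustrative, not NetBackup terminology.
    """
    if code < 0:
        raise ValueError(f"exit code must be non-negative, got {code}")
    if code == 0:
        return "successful"  # no files missed
    if code == 1:
        return "partial"     # successful, but some files were missed
    return "failed"          # 2 and above


if __name__ == "__main__":
    for code in (0, 1, 41, 196):
        print(code, classify_exit_code(code))
```

A report script could use a helper like this to separate jobs that need a rerun (failed) from jobs that only need a look at the skipped files (partial).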

Terminology and Concepts continued


Schedule: subset of Policy; defines Start Window, retention, storage unit, etc.
Storage Unit: location defined to store backups; can be disk or tape; exists only on a Media Server
Backup Image: one backup job, comprised of all files backed up; job must complete
Disk Storage: primary landing zone for jobs; destaged to tape later; removes older images as needed; can be configured many ways, current standard is Basic Disk; optional
Multiplexing: interleaving of multiple jobs on tape to prevent shoeshining
Long Term Data Retention: utilizes media and marginally increases catalog size; non-issue
Dependent on proper forward and reverse lookups
Scaling: horizontally, by adding more media servers

Terminology and Concepts continued


Files to Backup: part of policy config
Exclude List: files to skip
Include List: files to include after processing excludes
  Additional config on client; granular to policy or schedule level; no stacking
Backup Type
  Full: all files captured
  Differential Incremental: all changes since last backup
  Cumulative Incremental: all changes since last full
  User: allows user to run backups from client side; most used for child jobs of DB agents (Default-Application-Backup)
Database Agents: Exchange, Notes, Oracle, SQL, SAP, etc.
Options: NDMP, Off-site Management (Vault), Tape/Disk Sharing, Bare Metal Restore, Snapshot, VMware, etc.
Licensing: gold key with many options; SunGard pays for Protected Data
Recovery: restore catalog or import all images manually
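
The difference between the incremental types is easiest to see with a cutoff time. The sketch below assumes that a file's modification time is the only change signal, which is a simplification (real clients may use other indicators), and all names in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    path: str
    mtime: float  # last-modified time, seconds since the epoch


def select_files(files, backup_type, last_full, last_backup):
    """Return the paths a backup of the given type would capture.

    last_full   -- time of the most recent full backup
    last_backup -- time of the most recent backup of any type
    """
    if backup_type == "full":
        return [f.path for f in files]
    if backup_type == "differential":
        cutoff = last_backup   # changes since the last backup of any type
    elif backup_type == "cumulative":
        cutoff = last_full     # changes since the last full backup
    else:
        raise ValueError(f"unknown backup type: {backup_type!r}")
    return [f.path for f in files if f.mtime > cutoff]


if __name__ == "__main__":
    files = [
        FileEntry("/etc/hosts", 50),
        FileEntry("/var/log/app.log", 150),
        FileEntry("/home/user/report.txt", 250),
    ]
    # Full backup ran at t=100, a differential ran at t=200.
    for kind in ("full", "differential", "cumulative"):
        print(kind, select_files(files, kind, last_full=100, last_backup=200))
```

With these sample times the cumulative run captures both files changed since the full, while the differential captures only the file changed since the previous incremental.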

NetBackup Tiered Architecture


Master Server (Top Tier)
  Scheduler
  Stores Catalog (Metadata, Images), Volume Information
  Vaulting Management

Media Server(s) (Mid Tier)
  Data Mover
  Sends Metadata to Master
  Can be located on Master

Clients (Lower Tier)
  Configured via GUI/Registry (Win) or config files (*nix)


Example NetBackup Architecture Diagram

[Diagram components: Master Server; Media Servers 1 to N; Client Hosts; Metadata Network; Backup Network; FC Switch Fabrics A and B; Disk Storage; Enterprise Class Tape Library]

Standards
Infrastructure
  Server/OS Types
    Unix: Solaris 10, T2000
  Naming Standards Defined
  Network Configuration Standards (metadata, backup, mgmt)
  Robot Types
    Quantum Scalar i2000
    STK SL8500
    Small robots for legacy restores
  LTO3/4 Tape Drives / Media Types
  Volume Serial Numbers (VolSers/bar codes)
  SAN connectivity
  Disk Array Standards
  DSSU Configuration
Application/Configuration
  Documented on LiveLink

How Backups Work (simplified)


Scheduler on Master tells Media server to back up its client
Media server is granted storage unit resource (disk or tape)
Media server connects to client software and tells it to start backing up
Client creates list of files to back up
  Full: everything
  Differential: changes since last backup
  Cumulative: changes since last full
Copies of files are sent to buffer
Buffer contents sent to Media server
Media server writes buffer contents to storage unit
Media server sends metadata to Master server to update catalog
Backup completes
Storage unit resource released
Backup image is completed and closed

Managing NetBackup (Demonstration)


NBU Administration Console: 99.9% of daily administration occurs here
Activity Monitor: overall job status
  Jobs tab
    Job details
      State: Queued, Active, Partial, Failed
      Type: Backup, Restore, Catalog, Duplicate, Vault
      Status: exit code of job
        0 = All files backed up, no problems
        1 = Some files skipped (open/locked)
        >1 = Failure
      Additional info
    Suspend/kill jobs
    Sorting/Filtering: be aware of any filters you have set
    Exporting
  Daemons tab
  Processes tab
Help



Managing NetBackup (cont'd)


Storage
  Storage Units: defined target for backups (similar to a storage pool in TSM)
    Disk or Tape
  Storage Unit Groups
Media
  Volume Pools: logical grouping of tapes
    Various defined pools
      Scratch
      SG_SHARED_xxx
    Policy defines Volume Pool
  Volume Groups: locational grouping of tapes
    Robot groups
    Onsite group
    Offsite groups
    Vault moves media between volume pools
  Robots: media currently in robot
  Standalone: tapes no longer associated with robot/volume group
  Inventory Robot
  Ejecting media
  States: Active, Full, Frozen, Suspended, Imported

Managing NetBackup (cont'd)


Device Monitor
  Up/Down/Reset drive

Devices
  Drives
  Robots
    SCSI robots have a single Control Host
    ACS: any server can control
  Media Servers
  Topology


Managing NetBackup (cont'd)

Backup, Archive, and Restore
  Used for restoring files

Host Properties
  Master Server
  Media Servers
  Clients
    Include/Exclude Lists
    Server authorization

Catalog
  Offline backup (legacy method)
  Import images
  Verify images
  Duplicate images

Reports
Vault: option that processes and tracks volumes sent offsite

Client Implementation and Configuration


All systems
  Install client binaries
    Agents included for Windows, not for Unix
  Verify network communication
Client configuration
  Unix
    Configuration files
      bp.conf
        SERVER = backup01-dal      (Master must be listed first!)
        SERVER = backup02-dal
        SERVER = backup03-dal
        SERVER = backup0N-dal
        CLIENT_NAME = jumpstart01-dal
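
The "Master must be listed first" rule can be checked mechanically. The following is a minimal sketch that assumes bp.conf consists of simple KEY = value lines as in the example above; the helper names are hypothetical and the hostnames are the ones from this slide.

```python
def parse_bp_conf(text):
    """Parse 'KEY = value' lines into (key, value) pairs, preserving order."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and anything that is not KEY = value
        key, value = line.split("=", 1)
        entries.append((key.strip(), value.strip()))
    return entries


def master_is_first(entries, expected_master):
    """True if the first SERVER entry names the expected Master server."""
    servers = [value for key, value in entries if key == "SERVER"]
    return bool(servers) and servers[0] == expected_master


if __name__ == "__main__":
    sample = """\
SERVER = backup01-dal
SERVER = backup02-dal
SERVER = backup03-dal
CLIENT_NAME = jumpstart01-dal
"""
    entries = parse_bp_conf(sample)
    print(master_is_first(entries, "backup01-dal"))
```

Order matters here, which is why the parser returns a list of pairs rather than a dictionary.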

      exclude_list and include_list
        exclude_list.policyName.scheduleName
        include_list.policyName.scheduleName
        Exclude/Include lists do not stack
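
"Do not stack" means only one list applies to a given job. The sketch below assumes the most specific file present wins (policy and schedule, then policy only, then the plain list) and that the others are ignored; that lookup order is an assumption to confirm against the Administrator's Guide, and the function, policy, and schedule names are hypothetical.

```python
def pick_list_file(available, policy, schedule, kind="exclude_list"):
    """Return the single list file that would apply, or None.

    available -- set of list file names present on the client
    kind      -- "exclude_list" or "include_list"
    Assumed order: most specific name first; lists are never merged.
    """
    candidates = [
        f"{kind}.{policy}.{schedule}",
        f"{kind}.{policy}",
        kind,
    ]
    for name in candidates:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    present = {"exclude_list", "exclude_list.prod_unix"}
    print(pick_list_file(present, "prod_unix", "daily_incr"))
    print(pick_list_file(present, "dev_unix", "daily_incr"))
```

The practical consequence is that a policy-specific list must repeat any entries from the general list that should still apply.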


  Windows
    Backup, Archive, Restore GUI or Registry
    Some configuration available from Admin Console > Host Properties > Clients
    Changing open file backup for Windows
    Demonstration of Windows client configuration



Policies (Demonstration)
Policies - A backup policy lets the admin configure how and when backups are performed for a group of clients. The clients in the group share similar backup requirements (type, backup window, retention, etc.).

Attributes
  Policy Type
  Destination
    Classification
    Storage Unit
    Volume Pool
  Check Points
  Limit Jobs per Policy
  Job Priority
  Media Owner
  Active/Inactive
  Follow NFS
  Cross mount points
  Compression
  Encryption
  Collect DR Info
  Allow Multiple Data Streams
  Keyword Phrase
  Snapshot Client


Policies (cont'd)
Schedules
  Attributes Tab
    Name
    Type of Backup: Full, Incremental, Differential, Cumulative, User, Synthetic
    Schedule Type
      Calendar Based
      Frequency Based
    Destination
      Multiple Copies
      Override Policy Storage
      Override Policy Vol Pool
      Override Media Owner
    Retention
    Media Multiplexing
  Start Window Tab
    Defines when backup can START
  Exclude Dates Tab
    Defines when backup cannot run
  Calendar Schedule
    Only available when calendar sched type chosen
    Retries allowed after runday
    Specific Days or Recurring Days

Summary of All Policies



Policies (cont'd)
Clients
  Know hardware/OS type
Backup Selections: what to back up
  ALL_LOCAL_DRIVES
  System_State:\ or Shadow Copy Components:\
  NEW_STREAM for multistreaming

Manual backups
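
To picture what NEW_STREAM does to a Backup Selections list, the sketch below groups entries into streams, starting a new group at each directive. It is an illustration only: treating entries that appear before the first NEW_STREAM as one stream is an assumption, and the function name and sample paths are hypothetical.

```python
def split_streams(selections):
    """Group backup selection entries into streams at each NEW_STREAM line."""
    streams = []
    current = None
    for item in selections:
        entry = item.strip()
        if not entry:
            continue
        if entry == "NEW_STREAM":
            current = []              # the directive opens a new stream
            streams.append(current)
        else:
            if current is None:       # entries before any directive (assumed: one stream)
                current = []
                streams.append(current)
            current.append(entry)
    return [stream for stream in streams if stream]  # drop empty groups


if __name__ == "__main__":
    selections = ["NEW_STREAM", "C:\\", "D:\\", "NEW_STREAM", "E:\\"]
    for number, stream in enumerate(split_streams(selections), start=1):
        print(f"stream {number}: {stream}")
```

Each resulting group corresponds to one job that can run in parallel with the others, subject to the policy's Allow Multiple Data Streams and job limit settings.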


Troubleshooting
MSS Document
When in doubt, ASK!
Windows client troubleshooting


Windows Clients
  Over 3000 servers across all environments
  77% of all servers
  85% of all failures


Error Codes
  Media related (8x)
  Network Communication related (4x)
  Configuration/Hardware related (5x)
  Most Common Codes:
    41, 196, 5x, 219, 13, 14, 2x
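
The rule of thumb above groups two-digit status codes by their tens digit. A small helper can apply it as a first triage step; the sketch covers only the three families named on this slide, labels everything else "other", and uses a hypothetical function name.

```python
def error_family(status: int) -> str:
    """Return the rough family for a status code, using the tens-digit rule."""
    families = {
        8: "media",
        4: "network communication",
        5: "configuration/hardware",
    }
    if 10 <= status <= 99:
        return families.get(status // 10, "other")
    return "other"  # single-digit and three-digit codes are not covered by the rule


if __name__ == "__main__":
    for status in (41, 58, 84, 196, 219):
        print(status, error_family(status))
```

The grouping is a starting point for narrowing the search, not a diagnosis; codes such as 196 and 219 fall outside the rule and need to be looked up individually.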


Check the Simple Stuff


Is Server On and Cabled
  Decommissioned
  Maintenance
Hosts Files or DNS correct
  Host
  All backup servers
  All backup interfaces on backup servers
Network Functional
  Routing
Library/Media Problem
Server Hardware
  Windows Event Log
  Correlation
Telnet
  To Master/Media from Client
  To Client from Master/Media
  telnet <hostname> bpcd (or 13782)
  telnet <hostname> vnetd (or 13724)
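
The telnet checks can be scripted when many hosts need testing. The sketch below simply attempts a TCP connection to the two ports listed above and reports whether it succeeded; it does not speak the bpcd or vnetd protocol, and the function names are hypothetical.

```python
import socket

# Port numbers as given on this slide.
NBU_PORTS = {"bpcd": 13782, "vnetd": 13724}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_netbackup_ports(host, timeout=3.0):
    """Check both NetBackup client ports on one host."""
    return {name: port_open(host, port, timeout) for name, port in NBU_PORTS.items()}
```

For example, check_netbackup_ports("backup01-dal") run from a client tests the path to the Master, and the same call run from the Master or a Media server with a client name tests the reverse direction.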

Check the Simple Stuff (cont'd)


BPCLNTCMD Command Options
  -sv: returns version of Master
    5.1
  -pn: communicates back to Master
    expecting response from server backup01-dal
    backup03-dal backup03-dal 10.229.133.233 56618
  -self: returns info about local system
    gethostname() returned: backup03-dal
    host backup03-dal: backup03-dal at 10.229.133.233 (0xae585e9)
    checkhname: aliases:
  -hn <hostname>: returns info resolved from hostname
    host backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)
    checkhname: aliases:
  -ip <IP address>: returns info resolved from IP
    checkhaddr: host : backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)
    checkhaddr: aliases:
  -server <Master>: see -hn option
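
When comparing -hn and -ip output across many servers, it helps to parse the "host ... at ..." lines and compare them. The sketch below is written against the sample output shown above only; the exact output format may differ between versions, and the function names are hypothetical.

```python
import ipaddress
import re

# Matches both "host NAME: NAME at IP (0xHEX)" and "checkhaddr: host : NAME: NAME at IP (0xHEX)".
HOST_LINE = re.compile(
    r"host\s*:?\s*(?P<query>[\w.-]+):\s+(?P<name>[\w.-]+)\s+at\s+"
    r"(?P<ip>\d+\.\d+\.\d+\.\d+)\s+\((?P<hex>0x[0-9a-fA-F]+)\)"
)


def parse_host_line(line):
    """Extract name and address from one bpclntcmd host line, or return None."""
    match = HOST_LINE.search(line)
    if match is None:
        return None
    ip = ipaddress.IPv4Address(match["ip"])
    return {
        "query": match["query"],
        "name": match["name"],
        "ip": str(ip),
        # The hex value in parentheses is the same address written as one number.
        "hex_matches": int(ip) == int(match["hex"], 16),
    }


def lookups_agree(forward_line, reverse_line):
    """True if the -hn (forward) and -ip (reverse) results name the same host and IP."""
    fwd = parse_host_line(forward_line)
    rev = parse_host_line(reverse_line)
    return (
        fwd is not None
        and rev is not None
        and fwd["name"] == rev["name"]
        and fwd["ip"] == rev["ip"]
    )


if __name__ == "__main__":
    forward = "host backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)"
    reverse = "checkhaddr: host : backup01-dal: backup01-dal at 10.229.133.229 (0xae585e5)"
    print(parse_host_line(forward))
    print(lookups_agree(forward, reverse))
```

A mismatch between the forward and reverse results is the kind of name resolution problem that NetBackup is sensitive to.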


In Depth Client Troubleshooting

Turn up logging on client
  Host properties or client BAR GUI
  Must have <install>\netbackup\logs\* dirs created

Client Logs and Directories:
  bpbkar\<date>.log   Backup/Archive process (BPBKAR32)
  bpcd\<date>.log     Client Daemon (BPCDW32)
  tar\<date>.log      Restores (TAR32)


In Depth Client Troubleshooting (cont'd)


Run test backup/restore
Examine logs after failure
Logs structured as such:
  00:00:03.125 [3652] <2> bpcd exit_bpcd: exit status 0 ------>exiting
  09:55:33.941 [6092] <16> bpfsmap: ERR - open_snapdisk: NBU snapshot failed
Search for <#> entries:
  <2>, <4>, <8>, <16>, <32>: <2> = informational and <32> = critical failure
Search error message on Google and Symantec
Test recommended solution
Lather, rinse, repeat
Last resort/time sensitive: open case with Symantec
  (800) 342-0652
  Customer Number 3680-5196-9875
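
Searching a large log for the higher severity entries can be automated. The sketch below assumes the line layout shown on these slides (time, bracketed process ID, severity in angle brackets, message), with an optional AM/PM marker as seen in some Windows logs; the names are hypothetical and other log formats would need a different pattern.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3})(?:\s+[AP]M:)?\s+"
    r"\[(?P<pid>[\d.]+)\]\s+<(?P<severity>\d+)>\s+(?P<message>.*)$"
)


def parse_log_line(line):
    """Split one client log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "time": match["time"],
        "pid": match["pid"],
        "severity": int(match["severity"]),
        "message": match["message"],
    }


def entries_at_or_above(lines, min_severity=4):
    """Return parsed entries whose severity is at least min_severity."""
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry is not None and entry["severity"] >= min_severity]


if __name__ == "__main__":
    sample = [
        "00:00:03.125 [3652] <2> bpcd exit_bpcd: exit status 0 ------>exiting",
        "09:55:33.941 [6092] <16> bpfsmap: ERR - open_snapdisk: NBU snapshot failed",
    ]
    for entry in entries_at_or_above(sample):
        print(entry["time"], entry["severity"], entry["message"])
```

The message text of the entries this returns is what to paste into a search when looking for a known fix.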


Example Log Error 41


5:20:55.454 PM: [1656.2600] <16> dtcp_write: TCP - failure: send socket (904) (TCP 10053: Software caused connection abort)
5:20:55.454 PM: [1656.2600] <16> dtcp_write: TCP - failure: attempted to send 6 bytes
5:20:55.486 PM: [1656.2600] <16> dtcp_write: TCP - failure: send socket (904) (TCP 10053: Software caused connection abort)

TCP 10053 means the connection is being reset internally on the host; the recommendation is to reload the NIC driver or replace the NIC. Error 41 can also produce TCP 10054 errors in the logs, which indicate that the connection was closed externally, for example by loss of network connectivity, a crash, or a reboot. Error 41 has also resulted from corrupted VSS. Check the Event Log for related error messages and consult with Systems Engineers if necessary.


Windows Client Troubleshooting Checklist


Narrow your effort based on error code
Check the simple stuff:
  Is server cabled, decommed, under maint.
  Verify hosts file(s) or DNS on all involved servers
  Network functional? Verify routing
  Library or Media problem?
  Server hardware problem?
  Check Windows event log
  Correlate any issues
Run BPCLNTCMD on all involved servers using each option:
  -sv
  -pn
  -self
  -hn <hostname>
  -ip <ip address>
  -server <name of Master>

Maximize logging values for client
Verify log dirs created in <install>\netbackup\logs\*
  bpbkar
  bpcd
  tar
Start backup/restore
Review logs searching for errors (look for <4> <8> <16> <32>)
Search error message on Google and Symantec sites
Test solution
Repeat until resolved
Open case with Symantec
  (800) 342-0652
  Cust. #: 3680-5196-9875

Reporting
NetBackup Reports
Aptare
  In depth historical reporting and trending
  Supports several backup products, incl. TSM
  Command Center Dashboard
  Job Reports
    The Dot Report: Don't agitate the Dots
  Billing: yes, we can be a profit center IF we are successful
  Media Reports


Keeping Tabs on the Infrastructure


Use Aptare
Check for down drives/stuck tapes regularly
Verify Drive Configuration
Scratch
Destaging
Balance Jobs
Tape Injects/Ejects


How To Shutdown/Restart NetBackup


Shutdown
  Suspend/Cancel jobs
  Stop Aptare
  netbackup stop
  bpps -a to see what's running
  kill -9 <pid> to kill hung processes
  Optionally rename startup script
  Use init 6 to restart server if processes will not die
  Ensure drives are empty
    robtest
    ACSLS server

Startup
  netbackup start
  Resume/Restart all jobs
  Start Aptare
  Verify environment functions

Management Tips and Tricks


Use Activity Monitor, Restore, Policies, Device Monitor, Clients Properties most often
Policies
  Use Summary of All Policies
Sorting/Filtering
  Sort by State: long running jobs?
Export to Excel
  Selected rows or all rows
Column Fields
  Move, Hide, Show
Built-in NetBackup Reports
Help
Use multiple windows
Break up long running jobs
  Multiple streams per policy
  Multiple policies
  Watch jobs per policy and client settings
Don't forget about Aptare!
It isn't always clear: look at it, correlate it, think about it

Education and Further Reading


Google
Symantec
  Detailed PDFs on EC troubleshooting
  Manuals/Troubleshooting Guide
  Technotes
NetBackup Mailing List/Forums
  List: http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
  Forums
    Backup Central (mirrors the mail lists): http://www.backupcentral.com/phpBB2/
    Symantec: https://forums.symantec.com/syment/board?board.id=21
    Tek-Tips: http://www.tek-tips.com/threadminder.cfm?pid=776


Questions and Answers


Altered lyrics to the tune of the Beatles' "Yesterday"

Yesterday,
All those backups seemed a waste of pay.
Now my database has gone away.
Oh I believe in yesterday.

Suddenly,
There's not half the files there used to be.
And there's a milestone hanging over me.
The system crashed, so suddenly.

I pushed something wrong,
What it was, I could not say.
Now all my data's gone,
And I long for yesterday-ay-ay-ay.

Yesterday,
The need for back-ups seemed so far away.
I knew my data was all here to stay,
Now I believe in yesterday.


Thanks for attending!

