
 

Contents

Data Protection and Disaster Recovery Tips

Chapter 1: Disaster Preparedness and You
by Paul Robichaux
  6 Common Backup and Restore Mistakes
    Using the Wrong Backup Method
    Not Verifying Backups
    Mismanaging the Transaction Logs
    Not Allowing Enough Time
    Forgetting the Small Stuff
    Not Practicing
  Spend Time, Not Money
  sidebar: Setting Up a Secure Offsite Backup

Chapter 2: Recovering from an Exchange Server Crash
by Alan Sugano
  sidebar: Take the First Step to a Complete DR Solution

Chapter 3: Bridge the File-Restore Gap
Build a solution with a script, a secondary environment, and a file-restore strategy
by Ethan Wilansky and Jeff Sandler
  Restore from a Backup Environment
  Tool Options
  Portal Server Backup Procedures
  Restore Strategies
  Additional Approaches
  Not Perfect, But an Improvement
  sidebar: Solutions Snapshot

Chapter 4: Exchange Disaster Recovery Tips
by Menko den Ouden
  Tip 1: Assess Required Service Levels
  Tip 2: Create a Disaster Recovery Information Kit
  Tip 3: Back Up the Cluster Quorum Disk
  Prepare Now; Minimize Stress Later
  Tip 4: Include AD in Your Recovery Plan
  Tip 5: “Back Up” Your Exchange Expert
  Tip 6: Use the Exchange Disaster Recovery Analyzer

Chapter 5: Putting Together Your High-Availability Puzzle
Improve system, database, and data availability with SQL Server 2005
by Kalen Delaney and Ron Talmage
  Failover Clustering
  Server vs. Data Redundancy
  Database Mirroring
  Mirroring Restrictions
  Log Shipping
  Replication
    Merge Replication
    Transactional Replication
    Peer-to-Peer Transactional Replication
  Availability in a Highly Concurrent Environment
    Snapshot Isolation
    Online Index Creation
  Faster Restoring
  Database Snapshots
  Final Words


Chapter 1:

Disaster Preparedness and You


By Paul Robichaux

I’m writing this column from Nice, France, which is a beautiful city. And, so far, no one has laughed
at my attempts to dust off my rusty college French. However, what should have been a perfect trip has
been haunted by the ghost of disasters, both past and future.
First, the past. Not too long ago, the 100th anniversary of the great San Francisco earthquake of
1906 rolled around. San Francisco and its surrounding area were uniquely vulnerable to this earthquake
because of a variety of factors, including prevailing construction methods, soil composition, and the lack
of effective firefighting capability. As you probably know, the fault systems that underlie the Bay Area
(and their companion faults in the Puget Sound area) are overdue for a major earthquake, and that’s
worrisome.
Second, I’ve been reading a scary book, “Fifty Degrees Below,” in which science fiction
author Kim Stanley Robinson describes some of the possible outcomes of abrupt climate change.
Those outcomes include destructive weather events that are practically Biblical in scale, along with
desperate efforts to mitigate the climate change and retool the economy. Whether or not you agree that
global warming is real, the historical record of abrupt climate change—and the lasting aftereffects—is
abundantly clear.
These two things have little in common except this: Both point out the need for effective disaster
recovery for your Exchange organization, and “effective” in this context implies thorough and accurate
preparation. As hurricane season approaches, there are lots of nervous folks along the Gulf Coast, in
Florida, and along the eastern seaboard of the United States, but they’re already preparing. What about
your own organization?
I don’t have space here to list every step you might conceivably take to protect your Exchange
organization, but I can point out a few high-value things that you should be sure to include in your
planning:
1. Have a bug-out plan. If a disaster hit your business, how would you get away from the area?
How would you decide when it was time to go? How would you tell your employees not to come
to work? In fact, how would you make the decision to shut down or relocate operations?
2. Keep communicating. How would management and employees communicate until your
email service could be reestablished? Who’s in charge of establishing and maintaining disaster
communications?
3. Grab your gear and go. One of my customers implemented its disaster recovery plan for
Hurricane Katrina by shutting down the Exchange server, pulling all the disks from the storage
enclosure, and taking them by car to Houston. This was an ingenious and effective solution,
given the circumstances. What would you do under similar circumstances?
4. Now is always better than later. It’s better to have a fair solution now than a perfect solution
later. Of course, this doesn’t mean that you should rush out and slap together a disaster-
preparedness strategy out of whatever random products and technologies you can find. It
does, however, mean that you should push disaster recovery and preparedness planning to the
forefront of your list of operational concerns.

It’s not possible to anticipate every conceivable disaster, but you don’t have to. The responses to many
disasters will be the same; you can make plans based on the expected duration of recovery, the impact
of the disaster on your facilities and the surrounding area, and other factors. Even if you don’t live in a
disaster-prone area (I don’t; the biggest threat in northwest Ohio is apparently highway construction),
you should still be prepared for things such as structure fires, major traffic accidents (what if a gasoline
tanker blew up nearby? That happened at my wedding!), and so on.
The Boy Scouts say “Be prepared,” but I like the US Coast Guard’s motto better: “Semper Paratus,”
which is Latin for “always ready.”

6 Common Backup and Restore Mistakes


By Paul Robichaux

Nothing compares with the sinking feeling you experience when you need to restore data from a
backup but can’t for some reason. Most computer users have this experience eventually; the pain is
even more acute and frequent for administrators, who are responsible for large amounts of important
business data. Although backup and restore technologies have advanced in the past few years, you
probably still use them only as last-ditch safety mechanisms. When all else fails, you try to restore from
backup. For this alternative to be viable, you must have a degree of confidence that your data will be
available and readable when you need it. However, Exchange administrators make several common
mistakes that prevent their backup and recovery operations from running smoothly.

#1: Using the Wrong Backup Method


The two basic methods for backing up Exchange data are online and offline. Online backups use a
Microsoft interface, such as the Extensible Storage Engine (ESE) backup APIs or the Microsoft Volume
Shadow Copy Service (VSS), to copy the selected Exchange data while the Exchange services are
running and while the target database is mounted and active. The Exchange-provided APIs back up
transaction logs and truncate the logs when necessary.
Offline backups copy the Exchange database and log files while the database isn’t mounted. Some
solutions purport to copy Exchange data without using Microsoft’s APIs but also without dismounting
the databases. The Microsoft article “XADM: Hot Split Snapshot Backups of Exchange”
(http://support.microsoft.com/?kbid=311898) explains that Microsoft considers these backups to be offline.
Performing online backups is preferable for typical production operations because online backups
capture a consistent copy of the Exchange databases without interrupting user access. However, offline
backups are useful in some situations. For example, performing a complete offline backup of your
Exchange database and logs is a good idea before installing a Windows or an Exchange service pack or
performing a forklift upgrade of the database to another server. Although creating offline backups is
more time consuming than generating online backups, many administrators prefer the extra safety of
having a periodic offline backup in addition to routine production backups.

#2: Not Verifying Backups


If your backup fails and no one notices, does it make a sound? Maybe not, but your users will surely
sound off if you can’t recover their mail data. I recently worked with a company whose administrator
accidentally corrupted a mailbox database. When the Exchange administrator tried to restore the
database, he discovered that backups of the database had been failing for more than four months
because the administrator hadn’t installed the Exchange version of the company’s third-party backup
agent software. The installed version of the agent tried to back up the files but couldn’t because the
Exchange Information Store (IS) had the files open. Even a cursory review of the backup software’s
reports or the Application event log would have shown that the software wasn’t backing up the
Exchange data. Unfortunately, no one monitored the backups for success.
To prevent this problem, regularly check your backup software’s logs. You need to verify:
• that the backup software is backing up what you want it to. Make sure the backup type, time,
and contents are correct.
• that the backup finishes. Verify that the requested data is backed up, and check for errors that
might have occurred.
• that you can restore the data written during the backup. If you’re using tape, verify that you can
read the tape from another tape drive. Check to see whether you can restore the data to a server
and extract Exchange information.

If one of these three checks fails, you should be able to determine the cause of the backup failure
and therefore fix the problem. For example, during an online backup, Exchange computes a checksum
for each page and compares it with that page’s checksum on disk. If the checksums don’t match, you
receive a 1018 error and the backup stops. Checking your backups regularly would alert you to the
error and give you a chance to fix the problem before you actually need that backup.
Even if your backups are working now, don’t get complacent. Changing your environment, backup
software, Windows configuration, or Exchange configuration might make your backups fail in the
future. Check your backups regularly for the best protection. The fastest and simplest way to check
your backups to be sure they work is to check the Application event log and the report that your
backup program generates. Check the Application event log to ensure that Exchange didn’t generate any
errors during the backup period. Check the backup program report to verify that the backup program
didn’t skip any files and that no errors occurred.
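You can automate part of this check. The following VBScript sketch uses WMI to list recent Application event log errors; the event sources it watches (ESE and NTBackup) are examples, so substitute the sources that your backup software actually writes:

' List recent backup-related errors from the Application event log
Set wmi = GetObject("winmgmts:\\.\root\cimv2")
Set colEvents = wmi.ExecQuery("Select * From Win32_NTLogEvent " & _
    "Where Logfile='Application' And Type='Error' " & _
    "And (SourceName='ESE' Or SourceName='NTBackup')")
For Each evt In colEvents
    ' Print when the error occurred, which source raised it, and its text
    WScript.Echo evt.TimeGenerated & "  " & evt.SourceName & ": " & evt.Message
Next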

#3: Mismanaging the Transaction Logs


Your ability to restore an Exchange database depends on the state of the transaction logs. If you have
the correct set of log files for a database, you have a good chance of restoring the database to the point
of failure. Conversely, if the logs are lost or damaged, the odds of a complete recovery drop. When you
perform a restore, Exchange attempts to play back the log files, in sequence, from the first log required
for the database (also known as the low anchor log) to the last log available (the high anchor log). If a log
file between the low and high anchor logs is missing, log playback stops. The restore can’t continue
until you recover the missing log file.
Online backups automatically include the log files as part of the backup data set. During normal
operation, Exchange continues to create new log files as transactions occur. These log files remain on
disk until you perform a full or an incremental online backup, at which point the Exchange IS process
truncates or removes the files. Don’t remove log files yourself. In some circumstances, you might
need to copy the log files to a separate directory for safekeeping. In “Offline Backup and Restoration
Procedures for Exchange” (http://support.microsoft.com/?kbid=296788), Microsoft recommends saving
copies of the transaction logs in a separate location before attempting to recover data from an offline
backup.
When you use NTBackup to perform a restore, the logs don’t play back unless you select the Last
restore set check box (or the equivalent check box in another backup program). The database you restore
isn’t mountable unless you select this option, or unless you use the Eseutil /r command to manually start
a log playback.
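For reference, a manual soft recovery with Eseutil takes the three-character log base name plus the log and database paths; the paths below are examples for a storage group whose logs use the E00 prefix:

eseutil /r E00 /l "D:\Exchsrvr\Logs" /d "D:\Exchsrvr\Mdbdata"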
If your transaction logs are missing or any of your log files are damaged, Microsoft’s free Exchange
Server Disaster Recovery Analyzer (ExDRA) might be helpful. This tool can analyze a dismounted
database, tell you which log files are present and which are missing, and give you options for fixing any
problems it finds. ExDRA can be valuable if you experience an unexpected restore failure, although
it’s no substitute for understanding the disaster-recovery process and consulting Microsoft Customer
Service and Support (CSS) or other experts when necessary.

#4: Not Allowing Enough Time


Backups take time. Each backup configuration has a throughput number that reflects how much data
you can back up and restore in a given time period. A common mistake is to underestimate the amount
of time a restore will take. When a restore takes longer than anticipated, you sometimes must break
service level agreements (SLAs), and users are often disgruntled.
Microsoft’s recommendation is to measure the length of time necessary to back up a volume of
data, then allocate twice that time for a restore. You might wonder why a restore takes twice as long as
a backup. Suppose you need to back up a 60GB database, using a backup system that can write 12GB
per hour. Five hours seems reasonable for a backup. However, when you get ready to restore the data,
remember that merely reading the data takes five hours. The restore process requires that you also do
the following:
• Locate the appropriate backup media (if you’re using removable media such as tape) or find the
appropriate disk volume (if you’re using VSS or SAN-based backups).
• Transfer the backup data to the server from which you’ll perform the restore.
• Create a recovery server or Recovery Storage Group (RSG), if necessary.
• Read the data back from the backup media and correct any errors or problems.
• Replay the transaction logs.
• Move data from the recovery server or RSG to production mailboxes.
• Mount the database successfully.
• Deal with any ancillary problems that arise.

This list isn’t trivial; if a problem occurs at any stage in the process, your recovery operation won’t
proceed through the successive steps. The more restores you perform, the more smoothly they’ll go.
You’ll be able to accurately estimate how long a restore will take, and you’ll become familiar with and
learn how to solve the types of problems that are common in your environment.

#5: Forgetting the Small Stuff


Exchange backup discussions often focus only on backing up and restoring Exchange data, ignoring the
numerous other objects and data items that you must also back up and restore. For example, if your
Exchange server has a catastrophic hardware failure that requires you to replace it, you need to install
Windows and Exchange on the new server before you can use your Exchange database backups and
transaction logs. Maintaining a system-state backup of your Exchange server lets you easily restore the
server and Exchange data, putting you back in business much more quickly than if you need to hunt
for product installation CD-ROMs, product keys, and so on. If your Exchange environment includes
antivirus software, spam filters, X.509 Certificate Authorities (CAs), fax connectors, or other auxiliary
services, you need to back up and restore their configuration data as well as the necessary data (e.g.,
private keys, filter lists) to restore these services to their original operating quality.
When you use NTBackup to perform a system-state backup, NTBackup captures all the system
data on the local machine, including the registry, Active Directory (AD) Directory Information Tree
(DIT) files on a domain controller (DC), Windows Certificate Services data, DHCP and DNS server
databases, and other data that’s crucial for recovery. Most third-party backup utilities also have this
capability, but you don’t need to use third-party tools; you can use NTBackup to schedule a system-state
backup to an on-disk file, then include the file with every Exchange backup. This method guarantees
that you always have an up-to-date system state to restore. Don’t forget to periodically update the
Automated System Recovery (ASR) disk. You can often use the ASR disk to repair damaged Windows
installations without completely reinstalling the OS. Many third-party backup programs have a similar
capability.
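As a sketch, a scheduled system-state backup to an on-disk file might use a command like the following (the job name and target path are placeholders for your environment):

ntbackup backup systemstate /j "Nightly system state" /f "D:\Backups\SystemState.bkf"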

#6: Not Practicing


The best time to learn how to recover data in your environment is before you have a problem.
Remember that practice makes perfect. Even if you have only one database on one server, you can still
practice recovery. Buy a copy of Microsoft Virtual PC 2004 or VMware Workstation, build a test server,
and practice restoring data to it. If you’re using Exchange Server 2003, you must be thoroughly familiar
with RSGs and how to use them. You need to know how to use your preferred backup program to
restore data to the original server and to a different server. Keep your product installation CD-ROMs
and product keys in a safe location (not in a text file on a server that you might need to recover).
Regularly practice recovering items that you might need to recover during an actual outage; depending
on your environment, these items might include individual mailboxes, individual messages, databases,
storage groups (SGs), or entire servers. Practicing beforehand will be time well spent when a failure
occurs.

Spend Time, Not Money


Many companies spend a lot of money on disaster-recovery and high-availability solutions but discover
too late that just buying the best hardware and software isn’t sufficient. You can use the free NTBackup
utility and an inexpensive tape- or disk-based backup system to build a completely adequate disaster-
recovery solution. Learn as much as you can about backup and recovery, avoid the common mistakes
I’ve discussed, practice backup and recovery in your environment, and continually monitor your
processes. Then, when you experience a failure, you’ll be ready to put your skills to work.

Setting Up a Secure Offsite Backup


By Randy Franklin Smith

What’s the simplest way to set up a secure, automatic, offsite backup process for files on a server?

The simplest way would be to use an Internet-based backup service such as NetMass. Internet-based backup services
use a local agent to compress and encrypt your files, then transmit them to a data center. I’ve used NetMass, and it
was a lifesaver. However, such services can be costly for companies with many gigabytes of data, and some compa-
nies are unwilling to put their data into someone else’s hands.

The next-simplest option would be to implement Microsoft System Center Data Protection Manager (DPM) 2006,
which automatically maintains multiple versions of files and lets users restore files themselves without involving the
administrator. But DPM can also be costly, and it requires a SQL Server license.

I had a client who wanted secure offsite backups for about 300GB of data but couldn’t afford DPM and SQL Server. I
fulfilled that client’s needs with one additional PC and a Windows Server 2003 Release 2 (R2) license. I set up the new
Windows server to serve as the backup server. After connecting the backup server to the company’s domain, I set up
DFS to replicate data from the company’s main servers to the backup server.

After the backup server completed the initial replication, we moved it to an offsite location. Next, I configured the
backup server to automatically establish an L2TP VPN connection to a server at the company’s main office by using
RRAS on both servers. Over the persistent VPN connection, DFS keeps the files on the backup server up-to-date with
changes on the main servers, usually within seconds.

To preserve the ability to restore a version of a file from several days earlier, I advised the client to run a full backup of
the files on the backup server to an archive disk drive once a week. Each of the other nights of the week, the backup
server performs an incremental backup to the backup drive. This arrangement lets users restore any version of a file
that’s up to seven days old. Periodically, at the client’s request, I copy the files from the archive disk drive to a USB
drive for long-term archiving.
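If you want to script a similar rotation, NTBackup’s /m switch selects the backup type; a minimal sketch with example paths might pair a weekly full backup with nightly incrementals:

ntbackup backup E:\Data /j "Weekly full" /f "F:\Archive\WeeklyFull.bkf" /m normal
ntbackup backup E:\Data /j "Nightly incremental" /f "F:\Archive\Incremental.bkf" /m incremental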

If you’re going to use DFS for remote backups, you’ll find the DFS enhancements in Windows 2003 R2 to be worth
the investment. DFS on Windows 2003 R2 is more stable and efficient than on Windows 2003 and is easy to manage
and troubleshoot.


Chapter 2:

Recovering from an Exchange Server Crash

By Alan Sugano

I recently received a call from a client saying that a remote server in San Francisco had Microsoft
Exchange Server databases that wouldn’t mount. Lately, this particular server has become unstable and
freezes every few weeks. The server had frozen again and the administrator rebooted it. Although the
server came up, the Exchange private and public databases refused to mount. Usually when both stores
refuse to mount on a server, the problem is server related and not necessarily related to the Exchange
databases themselves.
But, because this was a remote server and I wanted to get it up and running as quickly as possible,
I tried to run Eseutil against both the private and public databases. Unfortunately, the databases didn’t
mount after running Eseutil. This problem occurred when I was speaking at the WinConnections
conference in San Diego. I discussed the situation with the client and the company decided to order a
new server to replace the server that was unstable. The server hardware was going to take a few days to
arrive, so I scheduled a trip to San Francisco immediately following the conference.
Fortunately, this client has several Exchange servers on its WAN. All of the mailboxes were deleted
on the San Francisco server by using the Microsoft Management Console (MMC) Active Directory Users
and Computers snap-in, and the users’ mailboxes were recreated on a local server in Los Angeles. This
setup allowed the remote users to at least send and receive new mail until the new server could be
installed. Users took a performance hit because their mail resided on a remote server, but slow mail is
better than no mail.
When I arrived, the San Francisco server was installed with Windows Server 2003 and Exchange
Server 2003. After running a few tests to verify the server was functioning properly, I used Active
Directory Users and Computers to move the San Francisco users’ mailboxes from the Los Angeles server
to the new server. Fortunately these mailboxes were relatively small because the users had been using
the new mailboxes on the Los Angeles server for only a week, so even over the WAN the mailbox move
took only about 1 hour. Now that the mail was located on the correct local server, I needed to recover
all the messages prior to the Exchange server crash. Originally I had planned to use the Recovery
Storage Group feature in Exchange 2003 (the original server was running Exchange 2000, which the
feature supports), but because the mailboxes had been deleted and recreated on the Los Angeles server,
the new mailboxes had new globally unique identifiers (GUIDs), and the original GUIDs assigned to the
San Francisco mailboxes prior to the Exchange crash were gone.
As you might know, the mailbox GUIDs must be consistent when using a Recovery Storage Group
or you’ll receive an error when you try to run ExMerge to export the mailbox information to a .pst file.


At this point, I had a few options: I could try to get the original Exchange 2000 server up and
running again, restore the original database, then merge the information; I could purchase an Exchange
recovery tool from a third-party vendor; or I could try to somehow get the information out of the old
store to merge it with the new mailbox information. After discussing the situation with the client, we
decided on the last option.
Knowing that I needed to merge the new and old mailbox information, I used ExMerge to export
all the new mailbox information to .pst files. I then created a Recovery Storage Group on the new server
and copied the information store databases from the old server, then mounted and dismounted the store
(this process proved that the old database files were OK). This ensured that I had a database that was in
a consistent state. Using Exchange System Manager (ESM), I opened the private information store
database’s properties under the First Storage Group, selected the “This database can be overwritten
by a restore” check box, and dismounted the store. I renamed the original store databases and copied
over the old private information store database files (priv1.edb and priv1.stm) from the Recovery
Storage Group directory to the live mdbdata directory on the new server. I made sure that the old store
database file names were the same as the new database file names (priv1.edb and priv1.stm). Then I
mounted the store databases from the old server on the new server and had all the San Francisco users
access their mailboxes.
At this point, the mailboxes looked empty because they still had the new GUIDs assigned to their
mailboxes in Active Directory (AD). I took this approach to create “dummy” mailboxes on the server
so I could delete the dummy mailboxes and reconnect the old mailboxes to the corresponding user
in AD via ESM. After all the San Francisco users accessed their mailboxes, I went into the ESM and
deleted the empty mailboxes, reconnected the users to the old mailboxes, and modified the rights on
the mailbox to ensure that the user and mail administrator had full rights to their old mailboxes. Now
users had restored mailboxes in a state just before the server crashed. I ran ExMerge and merged all
the new mailbox information from the .pst files I had previously exported into the original mailboxes.
Now each user had a complete set of information in their mailboxes, both pre- and post-crash. After
AD replication completed, I had to refresh the mail profile on all the workstations by deleting the
profile and recreating it. After these steps, the users were able to access their mailboxes with a complete
information set. I restored the Public folder database from the old server so users could access the
Public folders that previously resided on the old server.
Fortunately, no additional Public folders were created after the crash, so I didn’t need to worry
about merging new Public folder information on the new server. I did run into some Public folder
rights concerns, and I had to reassign rights to certain Public folders. Consider using the PFDavAdmin
utility to reassign Public folder rights if you have problems assigning rights via the ESM.
The above process allowed me to restore all the users’ mailbox information. Fortunately this
remote office had a relatively small number of users (20), so it wasn’t too much work to recover
the information. The users were happy to get back all their old mail and restore the original mail
performance now that they were accessing their mail from a local server.
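If you ever need to script similar ExMerge runs instead of stepping through its wizard, ExMerge also supports a batch mode driven by a settings file; the paths below are examples, and the .ini file must already define the export or import options:

exmerge -b -f C:\ExMergeData\exmerge.ini -d C:\ExMergeData\exmerge.log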

TIP: With Exchange Server 2003 Service Pack 2 (SP2) and Exchange 2003 Standard Edition, you can
now have a mail store of 75GB, up from 16GB in earlier versions.


Take the First Step to a Complete DR Solution


By David Chernicoff

As I write this, most of the local waterways are well over flood stage (as much as 11 feet above crest). In addition,
the Federal Emergency Management Agency (FEMA) has yet to show up with debit cards and trailers, and many
local businesses now find themselves completely flooded out. Some locations along the river are seeing water hit the
second floor of some low-lying buildings. As my office sits between two major flood areas, the current
weather patterns give me reason to think about my disaster recovery plans despite no immediate danger.

Although I’m completely comfortable with my data backups, I realize that I lack any sort of business continuity/disaster
recovery plan. What happens if my office gets flooded or simply loses power for a long period of time due to flooding
elsewhere? Fortunately, floods are rarely a surprise, and if necessary, I know that I could pack up my office and move it
elsewhere—which would be a major aggravation given the amount of hardware involved—but it could be done.

Because an ISP hosts my Web and email services, I don’t have to concern myself with customers that have issues
with those services while my office location is being moved or temporarily out of service. However, like many small
businesses, I don’t maintain an offsite backup of my critical data (though, in the past, I’ve covered services that offer
real-time backup of local servers to remote sites). The ongoing weather situation here has forced me to rethink this
practice. Even without a major disaster that destroys my upper-floor office, I could easily be in a situation in which
flooding prevents me from accessing my office and the data contained therein.

One of my habits, however, would help me alleviate some of the potential problems. I keep current copies of my
in-progress projects online, stored in password-protected archives in password-protected directories on one of my Web
servers. I update these archives as the projects progress and could continue with any of the projects without access
to my office as long as I have access to a computer that has Microsoft Office and Internet access. But I wouldn’t have
access to the relatively huge amount of historical data I retain, nor any of the more specialized applications that I run.

For this reason, I’ve given serious thought to one of two solutions: tape backup or a rack-mountable hard disk-based
backup appliance. Tape backup lets me do the traditional tape rotation with offsite storage, and the costs are fairly low.
However, tape backup also means a change in workflow, and my work style doesn’t lend itself to the workflow that
would allow me to use tape as my crisis solution.

With a hard disk-based appliance, I can automate the backup process so that a mirror of my office data is always
available in a single device that, although not exactly portable, is small enough to pick up and move if necessary. This
would provide minimal interference with my existing workflow and let me get up and running at an alternate location
with minimal trouble.

I still need to develop a business continuity plan in case a disaster destroys my office and its contents, but the added
security of maintaining a movable image of my office is a good place to start.  


Chapter 3:

Bridge the File-Restore Gap


Build a solution with a script, a secondary environment,
and a file-restore strategy

By Ethan Wilansky and Jeff Sandler

Microsoft SharePoint technologies support information sharing among groups of people within or across
companies. (In this article, we refer to Microsoft SharePoint Portal Server 2003 as Portal Server and
Windows SharePoint Services 2.0 as SharePoint Services. When referring to both, we use SharePoint.)
SharePoint provides document-management support through two types of file repositories: document
libraries and lists. Files are stored in a SharePoint content database, not in the Windows file system.
One glaring omission in SharePoint’s document-management capabilities is that you can’t easily
restore accidentally deleted files. In addition, although versioning is available, Portal Server and
SharePoint Services remove all version history after you delete an item. Besides these file-restore
problems, SharePoint has a relatively weak backup engine, which doesn’t give you the option of
performing single-file restores from a backup.
Despite SharePoint’s backup and restore gaps, workable solutions do exist. We’ll show you how
to automate the SharePoint backup process by using a script we’ve provided, then we’ll discuss some
approaches for supporting file restores.

Restore from a Backup Environment


Instead of struggling with SharePoint’s inadequate file-restore capability, you can provide single-file
restores and undeletes from backed-up versions of a Portal Server portal instance (single-server or
farm), a SharePoint Services instance, or SharePoint site collections or sites. However, managed backup
and restore procedures require dedicating IT employees to oversee the process, a mechanism to request
the restoration of files, and possibly additional hardware for the secondary environment. One way to
minimize the cost of a secondary environment is to virtualize your portal instance by using server-
virtualization products such as VMware Workstation or Microsoft Virtual PC.
Regardless of whether your secondary SharePoint environment is physical or virtual, backups
might also require a significant amount of media capacity. To meet storage requirements for your
backups, you could use SAN technology, which is already available in many larger organizations, or
terabyte-sized drives. Many vendors provide NAS terabyte drives where you can inexpensively store
SharePoint backups. It’s also prudent to store your backups in offsite storage and implement fault
tolerance by using a RAID solution.


Tool Options
If maintaining a second SharePoint environment containing a previous version of the production
environment is viable for your organization, you can perform full backup and restore operations. For a
Portal Server implementation, you can use the SharePoint Portal Server Data Backup and Restore utility
(spsbackup.exe). For a pure SharePoint Services implementation that doesn’t contain Portal Server,
you can use the SharePoint administration utility (stsadm.exe) or Microsoft SQL Server tools such as
OSQL (osql.exe) to perform backup and restore. For more information about backing up and restoring
SharePoint Services, see the Microsoft article “How to back up and restore installations of Windows
SharePoint Services that use Microsoft SQL Server 2000 Desktop Engine (Windows)”
(http://support.microsoft.com/?kbid=833797).
Although you can use Stsadm to back up and restore Portal Server, using the tool in this way has
limitations. For information about those limitations, see the Microsoft article “Supported scenarios for
using the Stsadm.exe command-line tool to back up and to restore Windows SharePoint Services Web
sites and personal sites in SharePoint Portal Server 2003” (http://support.microsoft.com/?kbid=889236).

Portal Server Backup Procedures


You can start Spsbackup from the SharePoint Portal Server program group or at the command line by
navigating to the SharePoint bin folder and typing spsbackup.exe. You might also want to add the bin
folder to your PATH environment variable on the server where you plan to run the backup procedure.
Do backups periodically so that you can find restorable data according to the backup date.
To automate the backup procedure, you can create a batch job that calls the Spsbackup utility, or you
can create a script to complete this operation. Then you can use the built-in AT command scheduler
(at.exe) or Task Scheduler to schedule a regular backup.
The code in Listing 1 (http://www.windowsitpro.com/Files/93239/Listing_01.txt) shows a simple
VBScript routine that creates a time-stamped folder for the backup by using the FileSystemObject
and Date functions. After creating the folder, the script uses the WshShell object’s Exec method to
call the Spsbackup utility. To use the backup script, you need to modify the strDestinationFolder
pathname in callout A for your own environment, so that it’s either a Universal Naming Convention
(UNC) pathname or a local drive pathname. Spsbackup requires a local drive as a backup target for
a single-server implementation of Portal Server and a UNC pathname for a backup target in a farm
implementation of Portal Server. Note that if you decide to use external storage for your backups, the
pathname will impact how you connect your external-storage device. Simply mapping a drive letter to a
UNC pathname for a single-server implementation of Portal Server won’t work. In the case of a single-
server implementation, you can extend the script by calling FileSystemObject’s MoveFolder method so
that after the backup procedure occurs locally, you can move it to a target that’s accessible via a UNC
pathname.
Unfortunately, Microsoft didn’t bother to create a SharePoint scripting API, so you must either
call command-line tools, as the script does, or write your own Microsoft COM objects that leverage the
SharePoint object model by using the Microsoft .NET Framework. You can register and instantiate the
custom COM components and call methods of these custom objects from a script. You can also use
the new Windows PowerShell to run Spsbackup, but, as far as we know, no Microsoft .NET cmdlets are
available for the current release of Portal Server. Cmdlets encapsulate tasks by calling .NET methods
available in various object models. Developers (Microsoft or otherwise) can write cmdlets to automate
SharePoint administrative tasks by leveraging either the Portal Server or SharePoint Services .NET
object models.
After you’ve customized your script, you can add it as a scheduled task to your SharePoint server,
where you’ll execute the script by using the AT command-line utility or the Scheduled Task Wizard,
which you can launch from the Scheduled Tasks icon below the System Tools program group. The
wizard will walk you through the scheduling process. Alternatively, you can type AT /? at the command
line for help in using the AT command scheduler.
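For example, assuming you saved the Listing 1 script as C:\Scripts\SPSBackup.vbs, a nightly AT job might look like this:

at 02:00 /every:M,T,W,Th,F,S,Su "cscript.exe C:\Scripts\SPSBackup.vbs"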
After you create the backups, you can do a full restore of a Portal Server backup by using the
Spsbackup utility either from the utility’s graphical interface or at the command line. You can see the
graphical interface for Spsbackup if you open Spsbackup from the Portal Server program group or if
you run Spsbackup from the command line without specifying any switches.

Restore Strategies
Now that you have a backup procedure, you might want to look at three choices for building your
restore strategy: the snapshot, ad hoc, and hybrid methods. In the snapshot method, you create a restore
environment that’s an earlier image of the production portal. You run periodic backup and restore
operations to get a snapshot of the portal at an earlier time. An obvious downside to this method is that
you can’t restore files prior to the date of the restored environment. This deficiency is solved in the ad
hoc method, where you can regularly create and archive backups and conduct restore operations as the
need arises. If you require a file that dates back three months, all you need is the backup file from that
time period to prepare an ad hoc restore and retrieve the file. However, depending upon the frequency
of restore requests, the ad hoc method might be a poor solution because restores can take a long time to
finish.
In the hybrid method, you combine the first two methods by maintaining a snapshot with the
option of restoring an ad hoc environment upon request. You can overwrite the mirrored environment
with the ad hoc restore or maintain a third server just for on-demand restores. The size of your
organization, administration team, and infrastructure and the number of restore requests can influence
whether any of these approaches will work for you.

Additional Approaches
Other approaches you can take within a second portal environment don’t require doing a complete
restore of the portal databases to find deleted data. If you can narrow critical document libraries and
lists to those found within a few site collections, sites, or subsites, you can back up and restore smaller
sections of the portal more frequently, on a periodic or an ad hoc basis. In this partial file-restore
approach, you isolate backups and restores at the SharePoint Services level by using the smigrate.exe
command-line migration tool or the stsadm.exe command-line administrative tool to mirror smaller site
structures.
Running stsadm.exe with the -o backup and -o restore options lets you back up and restore site
collections from one environment to another; smigrate.exe provides similar functionality for individual
sites. You can find more information about these tools in the Microsoft Office SharePoint Portal Server

Brought to you by CA and Windows IT Pro eBooks


14    Data Protection and Disaster Recovery Tips

2003 Administrator’s Guide (http://www.microsoft.com/technet/prodtechnol/office/sps2003/ downloads/


admdwnld.mspx) and the Windows SharePoint Services Administrator’s Guide (http:// www.microsoft.
com/technet/prodtechnol/sppt/wss/wssagabs.mspx).
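As a sketch, with example URLs and share paths, a site-collection backup and restore with stsadm.exe, plus a single-site backup with smigrate.exe, might look like this:

stsadm -o backup -url http://portal/sites/teamsite -filename \\backupsrv\SPBackups\teamsite.dat
stsadm -o restore -url http://recovery/sites/teamsite -filename \\backupsrv\SPBackups\teamsite.dat -overwrite
smigrate -w http://portal/sites/teamsite -f \\backupsrv\SPBackups\teamsite.fwp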
This strategy of backing up and restoring smaller sections lets you perform more frequent and
rapid backups and file restores than a full backup-and-restore operation. However, it requires that you
correctly identify and maintain a list of critical site collections or sites to back up.
No matter how you deal with the SharePoint backup and restore gap, you need to take storage
needs into account and, depending on which approach you take, consider securing the duplicated
data so that most users can’t access it. Otherwise, you might overlook duplicate data when securing
SharePoint.

Not Perfect, But an Improvement


Microsoft has improved restore functionality in Microsoft Office SharePoint Server 2007. In the
current beta version, you’ll see a two-stage undelete process. After deletion, a file remains in a recycle
bin for a configurable amount of time, during which a user could restore it. An administrator can also
restore user files and set file-retention policies based on configurable storage quotas. In addition, the
beta’s backup utilities provide differential, incremental, and full backup options.
As any IT professional can tell you, no perfect software product exists. We hope the solutions we’ve
offered will help you bridge the gap in backup and file restore in SharePoint.

Solutions Snapshot
PROBLEM:
SharePoint provides no easy way to recover deleted files.  
SOLUTION:
Recover files by running an automated backup procedure and using any of several file-restore strategies.  
WHAT YOU NEED:
SharePoint Portal Server and Windows SharePoint Services; SharePoint secondary environment (physical or virtual);
automated-backup script  
DIFFICULTY:
3 out of 5 
SOLUTION STEPS:
Create a secondary portal environment (physical or virtual).
Create a script to automate backups.
Pick a restore strategy.
Restore files.  


Listing 1: VBScript Procedure for Automating SharePoint Backup


' Use WshShell to run the Exec method
Set WshShell = CreateObject("WScript.Shell")

' Get today's date
strDate = Replace(Date(),"/","_")

' Provide a destination folder for the backup that points to a network
' share and append the name with today's date.
' BEGIN CALLOUT A
strDestinationFolder = CreateFolder("\\usrds005\SPBackups\" & strDate)
' END CALLOUT A

' Build the spsbackup parameter string
strParam = "/all /file " & strDestinationFolder

' Build the spsbackup command string.
' Note: spsbackup.exe is located on the C drive in a default installation.
strExec = "C:\Progra~1\ShareP~1\Bin\spsbackup.exe"

' Use the Exec method to run spsbackup
Set objExec = WshShell.Exec(strExec & " " & strParam)

' Function that creates a folder and returns its path
Function CreateFolder(folderName)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.CreateFolder(folderName)
    CreateFolder = f.Path
End Function


Chapter 4:

Exchange Disaster Recovery Tips


By Menko den Ouden

You know that someday disaster could strike at your Exchange environment—probably at the worst
possible time. Regardless of whether your Exchange organization is large or small, losing mail services
has a big impact on your business. These six tips will help you in designing, planning, testing, and
implementing an Exchange-specific disaster recovery plan.

Tip 1: Assess Required Service Levels


Email is a vital function, perhaps never more so than when disaster strikes and mail services aren’t
available. You need to make sure all email users at all levels of the business agree about the response
times and service levels needed. Clearly explain to users how IT will restore email services in different
disaster scenarios.
Recovery time will depend largely on how long it will take to recover Active Directory (AD), the
Exchange system, and Exchange databases from backup media. Therefore, to gauge response time,
first calculate the total amount of time needed to recover a complete database and a complete server.
Doing so lets you estimate the amount of time needed to recover an Information Store (IS) or a complete
server in optimum circumstances. You’ll then have to build in additional recovery time for more severe
disasters to accommodate dependencies such as faulty or inoperative network infrastructure and other
failing services (e.g., SANs, NICs).
To shorten recovery time, you might also opt to decrease database sizes, which will almost
automatically require additional databases and storage groups (SGs). Each SG, with a maximum of four
per server, can have as many as five databases. Because each SG creates its own log files, you’ll then
want to separate the transaction-log sets on dedicated disks. Spreading the storage load in this way can
help you recover the databases more quickly.

Tip 2: Create a Disaster Recovery Information Kit


Create a disaster recovery kit, which should be stored securely offsite on a remote server, backup tape,
or even in a box containing hard-copy files. The kit includes detailed information about server names,
passwords, installations, patch and driver history, configuration history, and licensing information. Also
include in the kit disk and partition configurations, your Exchange organization name, administrative
group and routing group names, system state information, and Microsoft IIS metabase backups. Store
recent backups or printed information about where to find other backup media, installation
media, and system state backups, plus contact information about who or what type of IT pro can and will
restore what data. If you have a SAN, include contact information for your SAN specialist.


You should also regularly extract AD user information, such as email addresses, by using a utility
such as LDIFDE or CSVDE and add this information to the kit. For example, you’d use the following
command to export directory objects, including mail addresses:
ldifde -f C:\export.ldf -v
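To keep the export focused on mail-related data, you can add LDIFDE’s filter and attribute-list switches; adjust the filter and attribute names for your directory:

ldifde -f C:\export.ldf -r "(objectClass=user)" -l "displayName,mail,proxyAddresses" -v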

Tip 3: Back Up the Cluster Quorum Disk


If you’re using an Exchange cluster, you’ll need to include in your disaster recovery plan backing
up and restoring the cluster quorum disk as well as the shared disks. Without the quorum disk, you
won’t have vital cluster-configuration data and more important, your cluster will no longer start when
disk signatures have changed—for example, when you replace disks, use storage-management tools to
change the disk configuration, or reconfigure the array on a shared bus.
To back up the quorum disk, you’ll need to perform a full computer backup or a Windows system
state backup. You can use NTBackup’s Automated System Recovery (ASR) tool to create an ASR floppy
disk that stores the disk signatures. On Windows NT 4.0 and Windows 2000 pre-Service Pack 3 (SP3),
you could use the Windows 2000 Resource Kit Cluster Tool (clustool.exe) to back up the configuration of
the complete cluster, including disk signatures. If the quorum is lost or the quorum disk’s signature has
changed, you can use the Win2K resource kit’s Dumpcfg utility (dumpcfg.exe) to manually
write the signature back to the quorum disk. (The Microsoft article “Recovering from an Event ID
1034 on a server cluster” at http://support.microsoft.com/?kbid=280425 provides detailed instructions
for using Dumpcfg.) In Windows Server 2003, you can use the cluster service and the Windows 2003
Resource Kit Cluster Server Recovery Utility (clusterrecovery.exe) to fix a lost quorum disk.
Additionally, make sure you read, understand, and test the procedures explained in the
clusterrecovery.chm Help file.

Prepare Now; Minimize Stress Later


Schedule recovery tests to give you and your colleagues practice in recovering your Exchange server. Use
test labs and the Recovery Storage Group (RSG) to check whether database backups were successful.
You could, for instance, extract random mailboxes from the RSG by using the Exchange Mailbox Merge
(ExMerge) utility to check the data and the Exchange Disaster Recovery Analyzer (ExDRA) tool to check
data integrity. By testing your Exchange recovery procedure now, you’ll be better prepared to handle a
far more stressful, real-world Exchange crash.


Tip 4: Include AD in Your Recovery Plan
In many cases, recovering Exchange also means recovering Active Directory (AD). Small companies
often have only one server for both Exchange and AD, and even in very large environments, a minor
mistake in AD can have consequences for the complete Exchange and AD configuration. Since
Exchange Server 2003 and Exchange 2000 Server rely heavily on AD, make sure you frequently back
up your domain controller’s (DC’s) system state, which includes AD, the registry, boot files, certificate
services, Microsoft IIS, COM+, and Sysvol information. Perform system-state backups at least as often as
you back up Exchange.
Thoroughly check and test your system-state backup and restore capabilities and make sure that
the NTDS and Sysvol volumes have enough space to perform a complete system-state restore. I’ve
seen restores of Global Catalogs (GCs) larger than 2GB fail on disks with more than 2GB of free space.
Make sure that your recovery plan includes procedures to restore AD both authoritatively and non-
authoritatively. For instance, deleting or changing important directory objects in AD in a multiple-DC
environment will require you to perform an authoritative AD restore, whereas you’d want to use the
non-authoritative restore to recover a DC that failed completely because of hardware errors. For more
information about AD backup and restore procedures, see the Microsoft TechNet “Active Directory
Operations Guide” (http://technet2.microsoft.com/windowsserver/en/library/9c6e4dd4-3877-4100-a8e2-5c60c5e19bb01033.mspx).
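For example, after restoring the system state in Directory Services Restore Mode, you’d mark a deleted object or subtree as authoritative with Ntdsutil before rebooting (the distinguished name here is only an example):

ntdsutil
authoritative restore
restore subtree "OU=Sales,DC=yourcompany,DC=com"
quit
quit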

Tip 5: “Back Up” Your Exchange Expert


Many organizations have a resident Exchange expert—the one person who fully knows the Exchange
infrastructure. Your disaster recovery plan should specify who will back up and, if necessary, replace
your Exchange guru should he or she be unavailable in a disaster. Select an employee who will back up
the Exchange expert, and make sure that employee and the Exchange guru meet regularly—to bring the
backup employee up to speed on your organization’s Exchange procedures.

Tip 6: Use the Exchange Disaster Recovery Analyzer


The Exchange Server Disaster Recovery Analyzer Tool (ExDRA) can help administrators troubleshoot
Exchange-database–related problems. ExDRA collects configuration data and header information from
databases and transaction-log files and creates a detailed list of database problems and instructions
for resolving them, as Web Figure 1 (http://www.windowsitpro.com/Files/04/49606/WebFigure_01.gif) shows. Familiarize yourself with ExDRA before a disaster strikes, so that you'll be adept at using the tool and interpreting its information when you're under pressure during a recovery. You can download the free ExDRA tool at http://www.microsoft.com/downloads/details.aspx?familyid=c86fa454-416c-4751-bd0e-5d945b8c107b&displaylang=en.
When databases won’t mount or you suspect Information Store (IS) problems, run ExDRA to find
inconsistencies and errors. ExDRA can check dismounted ISs to see whether the IS shutdown was clean
or dirty. Additionally, ExDRA will tell you which eseutil.exe and isinteg.exe commands you need to run
to check and repair the database(s) and transaction-log files. ExDRA will perform for you the checks
you’d typically do by using these commands:

isinteg -s ServerName -test allfoldertests

which checks the higher-level IS database-table–structure integrity (replace ServerName with the name
of your Exchange server), and

eseutil /g

which checks the physical database pages. ExDRA will run similar commands for you to check IS integrity
and database consistency, then give suggestions and examples for fixing the problems.
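You can also make the clean-versus-dirty check yourself by dumping the database header with eseutil (the database path below is an example; substitute your store's actual .edb path):

eseutil /mh "C:\Program Files\Exchsrvr\mdbdata\priv1.edb"

The State line in the header output reports either Clean Shutdown or Dirty Shutdown.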

When database-mounting problems first occur, if you don't suspect Windows system problems, full disks, or viruses as the cause, restart the Information Store service or the complete Exchange server. During Information Store service startup, soft recovery—which checks database consistency and replays uncommitted transaction logs into the database—could fix the problems automatically. The Microsoft article “How to identify logical corruption problems in Exchange
Server,” http://support.microsoft.com/?kbid=828068, provides more information about troubleshooting
Exchange database-corruption problems and is a useful addition to your disaster recovery kit.

Chapter 5: Putting Together Your High Availability Puzzle
Improve system, database, and data availability with SQL Server 2005
by Kalen Delaney and Ron Talmage


With every release of SQL Server, Microsoft has emphasized one area of technology. For SQL Server
7.0, that area was scalability; for SQL Server 2000, it was security. For SQL Server 2005, the emphasis is
system and database availability. Microsoft has not only added one completely new technology, database
mirroring, to achieve higher availability, but also substantially improved existing availability features.
SQL Server 2005 provides four high-availability technologies: failover clustering and database
mirroring, both with supported automatic failover; and log shipping and replication, with either manual
or custom-coded failover. Because Microsoft supports automatic failover for both failover clustering
and database mirroring, they’re clearly the technologies of choice to maximize uptime. If you don’t need
automatic failover or you’re willing to custom-code your automatic failover processes, log shipping and
replication might provide the availability you need.
These four availability solutions address system and database failures. However, Microsoft has
also addressed another aspect of availability in SQL Server 2005: the availability of data in a highly
concurrent system. If you can’t access the data you need because another process has it locked, you have
an availability problem. Microsoft has added several new features to support data availability in highly
concurrent environments, including snapshot isolation and online index building.
In addition, some enhancements to the database restore process can make your data available
more quickly. Although you probably think first about restoring a database as part of recovery from a
failure, keep in mind that you might perform a database restore for other reasons, such as when you
move to new hardware or create a test system with data from an earlier backup. Two new features
that make your data available more quickly during a restore are fast restore and online restore (see
“Faster Restoring” in this article). Let’s look at what you can expect from these new and improved high-
availability features.

Failover Clustering
Of SQL Server’s high-availability solutions, failover clustering remains the technological leader. A
failover cluster consists of a set of redundant servers (called nodes) that share an external disk system.
Clustering requires special Windows software. In addition, to be eligible for Microsoft support, your entire cluster configuration must be certified by Microsoft and listed in the Windows Catalog in the cluster solution category. During a cluster failover, a virtual SQL Server instance moves from one node to another.
As a result, a cluster failover appears to external applications as if the virtual SQL Server instance
is briefly unavailable (usually for less than a minute), then available again. The instance seemingly just
stops and restarts. Behind the scenes, an orderly process takes place quickly. One SQL Server instance
located on one physical server becomes unavailable. Windows closes the database data files that the
instance had open on a commonly shared disk space. Then, another SQL Server instance starts on
another physical server, opens the same data files, and takes over the virtual server name and virtual IP
address of the failed instance.
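As a quick sanity check during failover testing, you can ask SQL Server whether the instance is clustered and which physical node it's currently running on:

SELECT SERVERPROPERTY('IsClustered') AS IsClustered,
  SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentNode;

After a failover, CurrentNode reports the name of the node now hosting the virtual instance.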

Server vs. Data Redundancy


The fact that SQL Server’s cluster failover works at the SQL Server instance level is its essential
advantage. Because an entire instance can fail over from one node of a cluster to another, all server
settings remain the same. All data files are the same, including system databases; therefore, all logins,
permissions, SQL Server Agent jobs, server configurations, and more are preserved. Failover clustering
is the only SQL Server high-availability technology that provides such server redundancy.
Unfortunately for failover clustering, server redundancy doesn’t imply data-file redundancy.
Because failover clustering makes use of shared disks among the nodes of the cluster, even though those
disks might be located in redundant arrays and on a SAN, that common drive system is a potential point
of failure. Some SAN vendors provide methods for replicating SAN data over relatively long distances,
but the technology can be costly and complex to administer.
SQL Server 2005 extends the range of clustering and uses the full capabilities of Windows
clustering. The number of nodes that SQL Server 2005 Enterprise Edition supports is now limited only
by the version of Windows you use. Perhaps the biggest news in SQL Server 2005 clustering is that the
Standard Edition now supports a two-node cluster, whereas in earlier versions of SQL Server, only the
Enterprise Edition supported clustering.

Database Mirroring
The most exciting new SQL Server 2005 high-availability feature is database mirroring. As discussed,
failover clustering, which provides server redundancy, doesn’t provide data-file redundancy. Although
database mirroring doesn’t provide server redundancy, it provides both database redundancy and data-
file redundancy.
When you set up database mirroring, you use two servers with a database that will be mirrored
from one to the other. The source server is called the principal server, and the database that you want
to protect is called the principal database. The other server, which receives mirrored data from the
source, is called the mirror server, and the copy of the principal database on it is called the mirror database. When mirroring is up and running, the principal SQL Server 2005 instance transmits copies of the principal database's transaction log activity to the mirror SQL Server 2005 instance. The copy of the transaction log activity is written to the mirror database's log, then those transactions are executed on
the mirror database. The result is that the mirror database executes the same transaction log activity as
the principal, but slightly behind in time. It mirrors the principal’s activity.
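In outline, setting up the partners looks like the following sketch; the server names, port, and database are placeholders, and the mirror database must first be seeded from backups restored WITH NORECOVERY (see "Mirroring Restrictions" below):

-- On each partner instance, create a mirroring endpoint
CREATE ENDPOINT Mirroring
  STATE = STARTED
  AS TCP (LISTENER_PORT = 5022)
  FOR DATABASE_MIRRORING (ROLE = PARTNER);

-- First, on the mirror server, point the mirror at the principal
ALTER DATABASE AdventureWorks
  SET PARTNER = 'TCP://principal.example.com:5022';

-- Then, on the principal server, point the principal at the mirror
ALTER DATABASE AdventureWorks
  SET PARTNER = 'TCP://mirror.example.com:5022';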

To enable automatic failover, you must specify that the transmission will be synchronous (with
SAFETY set to ON) and also specify a third observer SQL Server instance, called a witness. In
synchronous mode, the principal will wait for acknowledgment from the mirror that it has written the
mirrored log activity to disk before the principal moves ahead with the transaction. In the meantime,
the principal, mirror, and witness all communicate periodically, indicating their online status to each
other.
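Continuing the sketch above (again with placeholder names), you'd run these statements on the principal; note that the SAFETY ON setting is expressed in T-SQL as SAFETY FULL:

ALTER DATABASE AdventureWorks
  SET WITNESS = 'TCP://witness.example.com:5022';

ALTER DATABASE AdventureWorks
  SET PARTNER SAFETY FULL;  -- synchronous transfer; FULL is the default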
If the principal server suddenly fails, leaving both the mirror and witness servers still functional,
an automatic failover will occur. After the mirror server detects that the principal is no longer available,
the mirror server queries the witness to discover whether it detects the principal. If the witness also
can’t detect the principal, the mirror promotes itself to the principal role and brings its database online
as the new principal. The witness then records the presence of a new principal in the configuration.
If the old principal is then brought back online, the former principal finds that the old mirror is
now the new principal, and that it has been “outvoted.” The new principal and the witness agree that
the old principal is no longer the principal server. The old principal then takes on the mirror role and
starts receiving the new principal’s transaction log data. A database mirroring database failover can
occur in just a few seconds.
You can also enable the client to automatically redirect its connections if a failover occurs. If your
application connects to a principal database using ADO.NET or the Microsoft SQL Server Native Client
(SQL Native Client), the driver will automatically redirect connections when a database mirroring
failover occurs. You just specify the initial principal server and database in the connection string (and
optionally the failover partner server). If a mirroring failover occurs and your application attempts to
connect, the driver will detect the failover and redirect the connection to the former mirror server,
which is now the principal.
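For example, a SqlClient connection string naming both partners might look like this sketch (server and database names are placeholders):

Server=PrincipalSrv;Failover Partner=MirrorSrv;Database=AdventureWorks;Integrated Security=True;

If PrincipalSrv can't be reached, the driver tries the failover partner; after a successful first connection, the driver also caches the current partner names.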

Mirroring Restrictions
When you set up database mirroring, the principal database must be in the Full recovery model
and the mirror database must be restored with NORECOVERY. Therefore, you can’t read from the
mirror database, although you can make a database snapshot of it on the mirror server. The principal,
mirror, and witness must all be distinct SQL Server instances; you can't mirror a database on a single
SQL Server instance. Related to that restriction, the principal and mirror databases must have the
same name, and you can mirror only from one principal database to one mirror database. (However, a
server that’s a principal for one database can be a mirror in a different mirroring session for a different
database.)
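Seeding the mirror from the principal's backups follows the usual restore pattern, leaving the database unrecovered so mirroring can take over (paths are examples):

-- On the principal
BACKUP DATABASE AdventureWorks TO DISK = 'E:\Backups\AW_full.bak';
BACKUP LOG AdventureWorks TO DISK = 'E:\Backups\AW_log.trn';

-- On the mirror, after copying the files
RESTORE DATABASE AdventureWorks FROM DISK = 'E:\Backups\AW_full.bak' WITH NORECOVERY;
RESTORE LOG AdventureWorks FROM DISK = 'E:\Backups\AW_log.trn' WITH NORECOVERY;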
Database mirroring requires either Enterprise Edition or Standard Edition for the principal and
mirror servers. The witness server, which is only an observer in a mirroring session, can be any edition
of SQL Server—including SQL Server 2005 Express Edition. The Standard Edition supports mirroring
only in synchronous mode (with SAFETY set to ON), whereas the Enterprise Edition also supports
mirroring in asynchronous mode.
What’s exciting about database mirroring is that it can provide very high availability, in most
scenarios failing over from one server to another in just a few seconds. This failover is automatic, just
like clustering, but much faster. And, unlike failover clustering, database mirroring doesn’t require
additional expensive and proprietary hardware for support. Database mirroring is supported on commodity hardware and is easy to manage and monitor. As a result, in some cases, it can provide
higher availability than clustering at a significantly lower cost.
Of course, database mirroring provides redundancy only at the database level. Therefore, unlike
failover clustering, when you have a database mirroring failover, you must ensure that the mirror server
has all the proper logins, SQL Agent jobs, SQL Server Integration Services (SSIS) packages, and other
supporting components and configurations.
In addition, if you have a SQL Server instance with many interdependent databases, enabling
mirroring with automatic failover might not be appropriate. If only one database fails over, you could
end up with one database online on one server and all the other databases online on another server.
Then, the dependencies among the databases would break. As of this release, you don’t have a way to
bind a set of mirrored databases so that they all fail over together (although that’s a natural next step in
the evolution of database mirroring).

Log Shipping
You can think about log shipping as the opposite of failover clustering, at least from a technology
standpoint. It’s the low-tech, low-cost way to provide database redundancy, but without any automatic
failover. You might be tempted to view log shipping as simply a slow method of database mirroring,
but the underlying technologies are completely different. In log shipping, you automate the SQL Server
process of backing up transaction logs from a primary server and restoring them to a secondary server.
(Database mirroring uses a special endpoint transmission technology, and no intermediate files are
involved.)
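Stripped to its essence, and leaving out the SQL Server Agent jobs, file copying, and monitoring that the built-in feature automates, each log shipping cycle amounts to the following sketch (database name and paths are placeholders):

-- On the primary
BACKUP LOG SalesDB TO DISK = '\\backupshare\logs\SalesDB_1230.trn';

-- On the secondary, after the file is copied over
RESTORE LOG SalesDB FROM DISK = '\\backupshare\logs\SalesDB_1230.trn'
  WITH NORECOVERY;  -- or WITH STANDBY = '<undo file>' for read-only access between restores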
In SQL Server 2005, you’ll find several important changes in log shipping. First, the supported
version of log shipping is now available in all editions of SQL Server that support SQL Server Agent,
which means in all editions except SQL Server Express. Additionally, SQL Server 2005 log shipping is based exclusively on stored procedures and SQL Server Agent and doesn't use database maintenance plans.
Finally, although a monitor server was required for SQL Server 2000 log shipping, that server is optional
in SQL Server 2005.
All of these changes are clearly improvements, but they come at a cost. SQL Server 2000 log
shipping can’t be directly upgraded to SQL Server 2005, because maintenance plans are no longer used.
Instead, you must manually reestablish log shipping on an upgraded set of servers.
SQL Server 2005 log shipping doesn’t support automatic failover. If the primary log shipping server
fails, you must recover the secondary server yourself, either manually or based on your own custom-
coded failure detection. You can set up a system to make role reversals easy, so that controlled failover
and failback, although still manual, involve only a few steps.
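The last step of such a manual failover is simply to recover the secondary after applying the final available log backup, for example:

RESTORE DATABASE SalesDB WITH RECOVERY;

At that point the secondary database comes online and can accept connections as the new primary.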
Like database mirroring, log shipping provides database redundancy only, not server redundancy.
So just as with database mirroring, you must ensure that the secondary server is kept in sync with the
primary for such matters as logins, permissions, and SQL Server Agent jobs. On the other hand, unlike
database mirroring, you can ship logs to multiple secondary servers.

Replication
Replication, which has been available since SQL Server 6.0, is one of the oldest high-availability features
in SQL Server. Although providing high availability isn’t replication’s primary purpose, in many cases, it
does so successfully.

Merge Replication
Microsoft designed merge replication for use by occasionally connected computers (e.g., laptops), but
you can use it between database servers to support high availability. On systems with low to moderate
activity, merge replication can provide redundant databases—although not with automatic failover.
Merge replication offers two key benefits: It lets you update the same data on both the publisher and
a subscriber, and it lets you manage any conflicts automatically. Also, merge replication offers the
unique capability of automatic synchronization: When either a publisher or subscriber goes offline or
is disconnected, each can work autonomously. When they’re reconnected or brought back online, they
automatically synchronize with each other. Merge replication can’t, however, guarantee transactional
consistency when multisite updates of the same data are involved.

Transactional Replication
You often see transactional replication used for high availability because its performance can be much
better than that of merge replication and because it can guarantee transactional consistency between
the publisher and subscribers. Perhaps the most common high availability scenario for transactional
replication occurs when you copy data from one database, the publisher, to one or more subscribers
through a distribution server. The subscribers are treated as read-only, and updates occur only on the
publisher. If the publisher fails, one of the subscribers can become a read/write server and accept data
updates—and even become a publisher to the other subscribers.
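As a compressed sketch only (omitting the distributor configuration, articles, and subscriptions that a real deployment requires, and using placeholder names), enabling a database for transactional publication starts like this:

-- On the publisher
EXEC sp_replicationdboption
  @dbname = N'SalesDB',
  @optname = N'publish',
  @value = N'true';

EXEC sp_addpublication
  @publication = N'SalesPub',
  @status = N'active';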

Peer-to-Peer Transactional Replication


SQL Server 2005 provides a new form of transactional replication, peer-to-peer, in which each server is
both a publisher and a subscriber to the same data set. The replication is essentially two-way, similar
to merge replication. Unlike merge replication, however, peer-to-peer transactional replication doesn’t
provide automatic conflict management. Instead, you must ensure either that updates occur to just one
database or that the updates are partitioned so that the same data isn’t updated at the same (or nearly
the same) time.
Like log shipping, replication is supported in all editions of SQL Server 2005 that support the
SQL Agent service, so only SQL Server Express is excluded. If you want to ensure that failover to a
subscriber will occur, you need to manually intervene or write custom code to detect a failure and
perform the failover procedures. Also, just as in log shipping, you must ensure that the servers are
configured appropriately to support failover.

Availability in a Highly Concurrent Environment


If another user has locked the data you need, it doesn’t matter how sophisticated your failover
solutions are; your data is still unavailable. SQL Server 2005 provides a new technology called row-level
versioning (RLV) to reduce the effect of locking on data availability. The most far-reaching feature that
uses RLV is SQL Server 2005’s new snapshot isolation.

Snapshot Isolation
You can enable snapshot isolation as a database setting in all editions of SQL Server 2005. Snapshot
isolation lets SQL Server keep track of previous versions of all modified data. Therefore, even though
the data is still locked while it’s being modified, other transactions can access a previous committed
version of the locked data. Data is more available. However, as always, you pay a price.
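Before looking at that price, note that enabling and using snapshot isolation is straightforward; the database and query here are examples:

ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In a session that wants versioned reads
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT * FROM Sales.SalesOrderDetail WHERE SalesOrderID = 43659;
COMMIT;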
The older versions of changed rows are stored in the tempdb database, and for systems that have a
large amount of modified data, tempdb space requirements can grow dramatically. On any system that
employs snapshot isolation, a DBA must carefully monitor the amount of row versioning that occurs
and watch the size limits for the tempdb database. You see another cost of using row versioning when
many changes are made to the same rows. SQL Server will maintain all changes to any row in a linked
list as long as any open transaction or running statement might need the older versions.
Additional changes to the same row will cause a new row version to be linked to the front of the
list. A query that needs to select older versions of data might need to traverse an increasingly longer
version chain, which means that a SELECT statement can take a long time to execute, even though
the data is technically available. The data modification operations will also be slower because previous
versions of the rows must be added to the linked list.
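One way to watch that cost is to track the version store's footprint in tempdb, for example with this rough estimate from the space-usage management view:

-- Approximate version store size in MB (pages are 8KB)
SELECT SUM(version_store_reserved_page_count) * 8 / 1024.0 AS version_store_mb
FROM sys.dm_db_file_space_usage;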

Online Index Creation


SQL Server 2005’s RLV technology also supports another high-availability feature, online index creation,
which is available only in the Enterprise and Developer editions. Typically, building or rebuilding an
index makes the index unavailable. If you build or rebuild a nonclustered index, no modifications
are permitted on the base table because the nonclustered index must be maintained with every data
modification. If you rebuild the clustered index, which contains the data itself, the entire table is usually
unavailable during the process.
With the new online index creation feature, the table and its indexes are fully available while
indexes are being built or rebuilt. You must specifically request online index creation by using
either the CREATE INDEX or the ALTER INDEX statement. For example, executing the following
statement performs an online rebuild of the clustered index on the Sales.SalesOrderDetail table in the
AdventureWorks database:

ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID
ON Sales.SalesOrderDetail
REBUILD WITH (ONLINE = ON);

Online index creation uses row versioning to keep the original index rows available even while
changes are being made to the base table. Anyone selecting from the table sees the values as they were
before the rebuild began. As with snapshot isolation, with online index building, you pay a price for
the greater data availability. And again, part of that price is the space required in the tempdb database,
which can be considerable if you’re rebuilding the clustered index on a huge table. (Every row must be
versioned as you build the new index, but you also need space to version any rows modified during the
index-building process.) In addition, the actual building of the index might take more time than if the
building were occurring offline.

Faster Restoring
You might need to restore a database as part of disaster recovery, but you might also perform this
operation when you move a database to a new drive or copy it to a new machine. Restoring from a
backup is also a way to revert a test database to an earlier point in time so you can resume testing from
a known earlier state. To restore a database, SQL Server first copies the data and the log records from
the backup media, then goes through a process called recovery.
Usually, recovery applies to all files and filegroups and involves two phases. In the first phase,
called redo, all transactions marked in the transaction log as committed are verified in the data files
and redone, or rolled forward, if necessary. In the second phase, called undo, SQL Server checks to
see whether any uncommitted transactions have made changes to data files; those transactions will be
undone, or rolled back. In SQL Server versions before SQL Server 2005, the database wasn’t available
for any use until both the redo and the undo phases were finished.

Fast Restore
A new restore feature available only in SQL Server 2005 Enterprise Edition is fast restore. Fast restore
makes the database available as soon as the redo phase is finished. The data involved in any transactions
that were uncommitted when the backup was made is locked and unavailable in case an undo must be
performed, but the rest of the data in the database is fully available. You needn’t do anything to enable
this feature other than use SQL Server 2005 Enterprise Edition.

Online Restore
Another new restore feature available in SQL Server 2005’s Enterprise and Developer editions is online
restore. Online restore lets you restore damaged files or pages while the rest of the database remains
fully available. For a database to be online, its primary filegroup must be online. Therefore, if any files in
the primary filegroup are damaged, online restore isn’t available. However, some or all of the secondary
filegroups can be offline. You can restore the damaged files from backup while the rest of the database
is online. Only the file and filegroup being restored are offline. In addition, if your SQL Server 2005
database is running under the Full recovery model, you can also restore one or more individual pages
from a file. Only the filegroup containing those pages is offline; the rest of the database is online.
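In sketch form, an online file or page restore looks like this; the logical file name, page ID, and paths are examples, and a real sequence also involves backing up the tail of the log:

-- Restore a damaged secondary file while the rest of the database stays online
RESTORE DATABASE AdventureWorks
  FILE = 'AdventureWorks_Data2'
  FROM DISK = 'E:\Backups\AW_full.bak'
  WITH NORECOVERY;

-- Or restore a single damaged page (file ID 1, page 57)
RESTORE DATABASE AdventureWorks
  PAGE = '1:57'
  FROM DISK = 'E:\Backups\AW_full.bak'
  WITH NORECOVERY;

-- Then apply later log backups and finish the sequence
RESTORE LOG AdventureWorks FROM DISK = 'E:\Backups\AW_log.trn' WITH RECOVERY;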

Piecemeal Restore
A final restore enhancement is piecemeal restore, which is new in all editions of SQL Server 2005 and
enhances the SQL Server 2000 partial restore. A partial restore in either SQL Server 2005 or SQL Server
2000 lets you restore only selected filegroups within a database. After the initial partial restore of the
primary filegroup and perhaps some of the secondary filegroups, piecemeal restore lets you restore
additional filegroups. Filegroups that aren’t restored are marked as offline and aren’t accessible until
they’re restored. In SQL Server 2000, you can perform a partial restore from a full database backup only,
but that’s no longer a requirement for SQL Server 2005.
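Here's a minimal sketch for a database in the Simple recovery model (names and paths are placeholders; under the Full model each stage also applies log backups). The WITH PARTIAL option begins the piecemeal sequence, and later stages bring the remaining filegroups back:

-- Stage 1: bring the database online with the primary filegroup
RESTORE DATABASE SalesDB
  FILEGROUP = 'PRIMARY'
  FROM DISK = 'E:\Backups\SalesDB_full.bak'
  WITH PARTIAL, RECOVERY;

-- Stage 2 (later): restore a remaining filegroup; until then it stays offline
RESTORE DATABASE SalesDB
  FILEGROUP = 'Archive'
  FROM DISK = 'E:\Backups\SalesDB_full.bak'
  WITH RECOVERY;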

Database Snapshots
One more new SQL Server 2005 feature that many people mention when they discuss high availability
is database snapshots. However, by themselves, database snapshots aren’t strictly an availability feature.

Although it’s beyond the scope of this article to go into any detail about database snapshots, be aware
that making a snapshot of a database has some availability benefits. First, if you’re running tests and
want to revert to an earlier point in time, the database is unavailable while restoring from a backup.
If you revert to a snapshot instead, the period of unavailability is drastically reduced. Second, you can
use snapshots in conjunction with database mirroring to provide a copy of the database for reporting
purposes. If you don’t use snapshot isolation, locking in the source database can make data unavailable
for short periods of time, but a read-only reporting database ameliorates some of that unavailability.
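For reference, creating and reverting to a snapshot takes just a couple of statements; this sketch uses placeholder names and paths, and the NAME value must match the source database's logical data-file name:

-- Create a point-in-time snapshot of the source database
CREATE DATABASE AdventureWorks_Snap
ON (NAME = 'AdventureWorks_Data',
  FILENAME = 'E:\Snapshots\AdventureWorks_Snap.ss')
AS SNAPSHOT OF AdventureWorks;

-- Revert the source database to the snapshot's point in time
RESTORE DATABASE AdventureWorks
  FROM DATABASE_SNAPSHOT = 'AdventureWorks_Snap';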

Final Words
The availability of your system, your databases, and your data is crucial to good performance in your
environment. SQL Server 2005 has added new features at every level to improve availability and has
enhanced many existing features to provide increased availability with more ease than ever before. This
discussion of new high-availability features and enhancements to existing features should help you see
which features will best support the availability of your systems, databases, and data.
