
Rescuing a failed domain controller: Disaster recovery in action

By Louis Nel

Version 1.0 June 12, 2006

I recently had to sort out an issue with a failed mirror set (i.e., RAID 1) on a Windows Server 2003 domain controller. No problem, I thought. Well, not quite. The mirror had to be deleted, taking everything on both drives with it. Restoring Active Directory from backup failed. To make a bad situation worse, the DC held all the Flexible Single Master Operations (FSMO) roles in this (single) domain. Transferring the roles failed; seizing them was problematic. Disaster recovery? Indeed! This article shows you how to bring such a DC, and the whole domain, back from the brink. As you'll see, a disaster recovery plan has to be about more than generalities.

Disaster scenario
You have your disaster recovery plan all neatly set out. Then disaster strikes: a Windows Server 2003 domain controller goes down. Okay, not a train smash; you've got up-to-date backups. But restoring Active Directory from backup fails. Now what? Well, you can still reinstall Server 2003 and restore user data from backup. (That part works; you've checked.) There's only one problem: this server held all the FSMO roles. So you start to sweat a little, but not too profusely. You know about transferring FSMO roles to another domain controller. But what if that fails? Yes, you can try seizing them. At this stage, you're looking at the stuff disasters are made of, because now your whole domain teeters on the brink. (I'll explain why in a moment.)

Admittedly, this is a very particular (and very unfortunate) scenario. But then, the nature of a disaster is its unpredictability, and there are a couple of general lessons to be learned from this specific incident. Here's what I did and what I learned along the way.

In this situation, the failed mirror could not be rebuilt nondestructively (I won't go into the whys and wherefores here), so losing all data on both drives was inevitable. I tried restoring AD from backup. It failed, presumably because the backup software in use (an old version written for NT) didn't back up the system state data. Restoring with Server 2003's own backup utility (ntbackup.exe) didn't work either, because it didn't recognize the legacy software's backup format.

Lose the roles and you're lost


Next, I tried transferring the FSMO roles held by this DC to another DC in the domain. It failed. Then I attempted seizing the roles, but the error messages I got (Figure A) did not look promising. I nevertheless attempted to seize every role, and strangely enough, after completing the whole procedure described below, I saw that the roles had been seized successfully. (Don't ask me, ask Microsoft.) But what happens if you do lose the FSMO roles? Let's just say that losing some of them can have bone-chilling implications. For example, without the RID Master, if you have more than one domain, you immediately lose the ability to move security principals from one domain to another. You also won't be able to add new users, groups, and computers to the domain. You won't hit that second problem right away, because each DC holds a pool of RIDs (500 by default) that it can keep handing out. But once that pool runs dry, you're dead in the water, and you're faced with the prospect of rebuilding the whole domain.

Figure A: The attempt at seizing the roles resulted in these errors. (Note: the domain name, DC/server name, and CN name have been edited out for security reasons.)

Copyright 2006 CNET Networks, Inc. All rights reserved. For more downloads and a free TechRepublic membership, please visit http://techrepublic.com.com/2001-6240-0.html
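The delay between losing the RID Master and being unable to create accounts is simple arithmetic. Here is a minimal Python sketch of it. It assumes, hypothetically, that the DC holds only its current pool with no second pool already fetched, and that every new user, group, or computer consumes exactly one RID. The function name and the daily creation rate are illustrative only. The pool size is a parameter, and Microsoft documents 500 as the default block size.

```python
def days_until_rid_pool_exhausted(remaining_rids: int, new_principals_per_day: int) -> int:
    """Return how many full days a DC can keep creating security principals
    after the RID Master is lost.

    Hypothetical assumptions: the DC holds only its current RID pool (no
    second pool pre-fetched) and each new user, group, or computer uses one RID.
    """
    if new_principals_per_day <= 0:
        raise ValueError("new_principals_per_day must be positive")
    # Integer division: a partial day's worth of RIDs is not a full day of headroom.
    return remaining_rids // new_principals_per_day


if __name__ == "__main__":
    # Hypothetical site creating 20 accounts a day from a full pool of 500 RIDs.
    print(days_until_rid_pool_exhausted(500, 20))
```

A busy site burns through the pool in weeks, not months. That is why a missing RID Master can go unnoticed at first and then suddenly become urgent.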

Replication to the rescue


So what are your options (apart from re-creating the domain)? Reinstalling and replicating. If you have a big AD (and perhaps slow WAN links), replication is not an attractive option, but it might be your only choice. If you have another DC in the same site as the failed DC, you're in luck, because replication will be much faster.

Tip: If it will speed things up, take the DC you're reinstalling to the same location as the DC you intend to replicate from. In my case, the two DCs in the same site were separated by a wireless link that would have slowed replication down, so I physically moved the server I was rebuilding across to the other DC.
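Deciding whether to carry the server across is easier with a rough estimate of how long the initial replication will take over each link. The following Python sketch is a back-of-the-envelope calculation, not a model of AD replication. It ignores SYSVOL, compression, and replication scheduling. The `efficiency` factor is a hypothetical allowance for protocol overhead and contention, not a measured value, and the database size and link speeds in the example are placeholders.

```python
def estimated_replication_hours(ntds_size_mb: float, link_mbps: float,
                                efficiency: float = 0.5) -> float:
    """Rough hours needed to copy ntds_size_mb megabytes of directory data
    over a link of link_mbps megabits per second.

    efficiency (0 < efficiency <= 1) is a hypothetical fudge factor for
    overhead and link contention.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and 0 < efficiency <= 1")
    megabits = ntds_size_mb * 8                      # megabytes -> megabits
    seconds = megabits / (link_mbps * efficiency)    # usable throughput only
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical 2 GB directory: local LAN versus a slow wireless hop.
    for label, mbps in [("LAN, 100 Mbps", 100), ("wireless, 11 Mbps", 11)]:
        print(f"{label}: {estimated_replication_hours(2048, mbps):.1f} h")
```

The absolute numbers matter less than the ratio between the two links. If that ratio is large, a drive across town may well beat the wire.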

Reinstall Windows Server 2003 on the failed machine, make it a DC (run DCPromo), and install and restore whatever other services there were on the machine, like DHCP, WINS, DNS, and IIS. When you're finished, start replicating. Now you're ready to restore your data.

First, clean up
Before you reinstall Windows Server 2003 on the failed machine and make it a DC, there's an important job to do: a metadata cleanup. This entails removing the dead DC from AD (more technically, removing its ntdsDSA object). You have to be a member of the Enterprise Admins group to perform this task. A word of caution: be absolutely sure this is the route you want to take before you do the metadata cleanup. There's no turning back (at least none that I'm aware of).

How you perform the cleanup depends on whether you want to give your new DC the same name as the old (failed) one. I suggest retaining the old name, as it simplifies matters a lot (for example, with shares). However, if you always wanted to rename that DC, now is the time.

Let's start with the steps to follow if you want to give the new DC the same name. In this case, you'll have to remove the old DC's ntdsDSA object. The commands differ slightly depending on whether the DC you run Ntdsutil on has Service Pack 1 (SP1) installed. If it does, metadata cleanup also removes File Replication Service (FRS) connections and, as part of the process, tries to transfer or seize any operations master roles that the retired DC holds.
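Before you commit to the cleanup, it can help to know where the ntdsDSA object sits in the directory, for example to look it up in ADSI Edit. This Python sketch builds the object's distinguished name from the standard layout: an NTDS Settings object under the server object, inside the Sites container of the configuration partition. The server, site, and domain names are hypothetical placeholders, and the sketch does not escape special characters (such as commas) in names.

```python
def dns_domain_to_dn(dns_name: str) -> str:
    """Convert 'corp.example.com' to 'DC=corp,DC=example,DC=com'."""
    return ",".join(f"DC={label}" for label in dns_name.split(".") if label)


def ntds_settings_dn(server: str, site: str, forest_root_dns: str) -> str:
    """Distinguished name of a DC's NTDS Settings (nTDSDSA) object.

    The configuration partition is rooted at the forest root domain, which in
    a single-domain forest is simply the domain itself.
    """
    config = f"CN=Configuration,{dns_domain_to_dn(forest_root_dns)}"
    return f"CN=NTDS Settings,CN={server},CN=Servers,CN={site},CN=Sites,{config}"


if __name__ == "__main__":
    # Hypothetical: failed DC01 in the default site of corp.example.com.
    print(ntds_settings_dn("DC01", "Default-First-Site-Name", "corp.example.com"))
```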

1. Type ntdsutil at the command prompt.

2. At the ntdsutil: prompt, type metadata cleanup and press [Enter].

3. If SP1 is installed, type remove selected server ServerName and press [Enter] (see Figure B), then skip to step 15.

   If SP1 is not installed and you're using the version of Ntdsutil.exe that's included with Windows Server 2003 with no service pack, connect to the existing domain controller (in our case, the one in the same site as the failed DC) on which you want to remove the failed DC's ntdsDSA object. To do this, type connections at the metadata cleanup prompt and press [Enter].

4. Type connect to server <servername>, where <servername> is the DC that will be used to clean the metadata, and press [Enter]. It can be any working DC in the same domain, but we'll use the one in the same site. Figure C shows this step on a DC that does not have SP1 installed.


5. Type quit and press [Enter].

6. Type select operation target and press [Enter].

7. Type list domains and press [Enter]. All domains in the forest will be listed.

8. Type select domain <number> and press [Enter].

9. Type list sites and press [Enter].

10. Type select site <number> (the number of the site the failed DC belonged to) and press [Enter].

11. Type list servers in site and press [Enter].

12. Type select server <number>, where <number> is that of the DC to be removed, and press [Enter].

13. Type quit and press [Enter].

14. Type remove selected server and press [Enter].

15. Type quit and press [Enter] until you're back at the command prompt.

Figure B: Starting the metadata cleanup process using ntdsutil on a DC with SP1 installed

Figure C: Starting the metadata cleanup process using ntdsutil on a DC without SP1 installed
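Ntdsutil also accepts a sequence of commands as quoted arguments on its own command line. That means the steps above can be written into your disaster recovery plan ahead of time. The Python sketch below assembles either variant from the commands in the numbered steps. The function names and parameters are hypothetical. The domain, site, and server numbers must still come from an earlier interactive run of the list commands, because they differ from forest to forest. The sketch only builds text; review the result before pasting it at a prompt, and note that ntdsutil may still ask you to confirm the removal.

```python
def metadata_cleanup_commands(sp1: bool, failed_dc: str = "", cleanup_dc: str = "",
                              domain_no: int = 0, site_no: int = 0,
                              server_no: int = 0) -> list[str]:
    """Return the ntdsutil commands from steps 2-15 above, in order.

    sp1=True needs only failed_dc (the SP1 form of step 3, as written above).
    sp1=False needs cleanup_dc plus the domain, site, and server numbers
    shown by the list commands on a previous interactive run.
    """
    commands = ["metadata cleanup"]                              # step 2
    if sp1:
        if not failed_dc:
            raise ValueError("failed_dc is required with SP1")
        commands.append(f"remove selected server {failed_dc}")  # step 3 (SP1)
    else:
        if not cleanup_dc:
            raise ValueError("cleanup_dc is required without SP1")
        commands += [
            "connections",                       # step 3
            f"connect to server {cleanup_dc}",   # step 4
            "quit",                              # step 5
            "select operation target",           # step 6
            "list domains",                      # step 7
            f"select domain {domain_no}",        # step 8
            "list sites",                        # step 9
            f"select site {site_no}",            # step 10
            "list servers in site",              # step 11
            f"select server {server_no}",        # step 12
            "quit",                              # step 13
            "remove selected server",            # step 14
        ]
    # Step 15: one quit leaves metadata cleanup, the next leaves ntdsutil.
    commands += ["quit", "quit"]
    return commands


def as_command_line(commands: list[str]) -> str:
    """Join the commands into one ntdsutil invocation (step 1), quoting each."""
    return "ntdsutil " + " ".join(f'"{c}"' for c in commands)


if __name__ == "__main__":
    # Hypothetical: clean up from DC02 using numbers taken from earlier list output.
    rtm = metadata_cleanup_commands(sp1=False, cleanup_dc="DC02",
                                    domain_no=0, site_no=0, server_no=1)
    print(as_command_line(rtm))
    # Hypothetical failed DC name for the SP1 shortcut.
    print(as_command_line(metadata_cleanup_commands(sp1=True, failed_dc="DC01")))
```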


If you're going to take the plunge and give the DC a new name, you'll have to remove the failed server from the Sites & Services and Users & Computers snap-ins. NB: Don't do this if the new DC will have the same name as the failed one.

1. Open the Sites & Services snap-in.

2. Select the relevant site.

3. Delete the server object representing the failed DC.

1. Open the Users & Computers snap-in.

2. Select the Domain Controllers container.

3. Delete the computer object associated with the failed DC.
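If you take the renaming route, it's worth being certain which two objects those snap-in steps delete before you click. This Python sketch builds their distinguished names. It assumes, hypothetically, that the DC's computer account is still in the default Domain Controllers OU. All names in the example are placeholders, and special characters in names are not escaped.

```python
def dns_domain_to_dn(dns_name: str) -> str:
    """Convert 'corp.example.com' to 'DC=corp,DC=example,DC=com'."""
    return ",".join(f"DC={label}" for label in dns_name.split(".") if label)


def objects_to_delete(server: str, site: str, domain_dns: str,
                      forest_root_dns: str) -> dict[str, str]:
    """Distinguished names of the objects removed in the two snap-in procedures."""
    return {
        # Sites & Services: the server object under the site's Servers container.
        "server object": (f"CN={server},CN=Servers,CN={site},CN=Sites,"
                          f"CN=Configuration,{dns_domain_to_dn(forest_root_dns)}"),
        # Users & Computers: the DC's computer account (default OU assumed).
        "computer object": f"CN={server},OU=Domain Controllers,{dns_domain_to_dn(domain_dns)}",
    }


if __name__ == "__main__":
    # Hypothetical single-domain forest: the domain is also the forest root.
    for kind, dn in objects_to_delete("DC01", "Default-First-Site-Name",
                                      "corp.example.com", "corp.example.com").items():
        print(f"{kind}: {dn}")
```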

Lessons
Here are some things you should know, check, and do before disaster strikes:

- Plan for what-if (worst-case) scenarios. This might seem obvious, but how many of us actually do it? That's what "disaster" means, right? Don't count on anything, working backups included.
- Outline procedures to recover from disasters like these, and put a fair amount of detail in your disaster recovery documentation. You need more than generalities.
- Set out the procedures for tasks like seizing FSMO roles clearly as part of your disaster recovery plan. It will speed up recovery considerably in a crisis. Even better, test your procedures in the calm environment of a test lab.
- Regularly check that you have what it takes to recover from a disaster. For instance, how up to date is the backup of your system state data? When it comes to system state data, age matters: if your system state backup is older than the tombstone lifetime, you're in for trouble. The default tombstone lifetime is 60 days (180 days in forests created with Windows Server 2003 SP1 or later). A tombstone keeps tabs on objects that have been deleted but not yet completely removed from AD. To prevent inconsistencies in AD, you can't restore data older than the tombstone lifetime.
- Prepare to speed up recovery (and take pressure off yourself) by making separate backups of DNS, DHCP, and all server drivers.
- Ensure that your disaster recovery procedure is set out clearly and systematically, listing the steps to follow and the order in which they should be done.
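The tombstone check in the list above is easy to automate as part of a routine backup review. Here is a minimal Python sketch. The function name is hypothetical. The 60-day default mirrors the figure quoted above. Your forest's actual value is held in its tombstoneLifetime attribute and may differ, so pass that value in.

```python
from datetime import date, timedelta


def system_state_backup_is_restorable(backup_date: date, today: date,
                                      tombstone_lifetime_days: int = 60) -> bool:
    """True if the system state backup is not older than the tombstone lifetime.

    tombstone_lifetime_days should match your forest's tombstoneLifetime
    setting; 60 is only the default quoted in the text.
    """
    if backup_date > today:
        raise ValueError("backup_date is in the future")
    age = today - backup_date
    return age <= timedelta(days=tombstone_lifetime_days)


if __name__ == "__main__":
    # A backup taken 72 days before this article's publication date.
    print(system_state_backup_is_restorable(date(2006, 4, 1), date(2006, 6, 12)))  # False
```

Treat a backup that is close to the limit as already suspect, because the clock keeps running while you work through a recovery.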

Potential pitfalls
Install the relevant service pack(s) and critical updates immediately after reinstallation. Remember to check shares and permissions; I also had to restore mapped drives. Also, remember to set up the Windows Time service again if you had to follow the recovery route described above. And just to add to the fun: if you apply Server 2003 SP1, you might run into a problem with the Windows Time service not starting. You'll find the solution here.



Version history
Version: 1.0 Published: June 12, 2006
