Troubleshooting Active Directory Replication

Keep your domain controller data current to avoid a host of user account problems
Feb 14, 2011 | Sean Deuby | Windows IT Pro
One of Active Directory's (AD's) advantages is that it's a distributed application. Its functionality is spread
across multiple domain controllers (DCs) so that the failure of any one DC won't affect the overall
availability of AD. To accomplish this, AD must move its information around freely and efficiently between
its DCs in a process known as replication. The AD replication model is a powerful, fault-tolerant, and
complex system. It's also the area that seems to cause the most issues for AD administrators. But that's
usually not AD's fault.
Why should you monitor replication and keep it working well? If replication isn't working to one or more
of your DCs, a segment of your user population won't be kept current with the latest directory data. This
could result in a host of problems: Password changes aren't seen; accounts unlocked by administrators
aren't accessible by the account owner; users don't have access to applications (even though they've been
added to the correct groups); new users can't log on (even though their accounts have been created); and,
very importantly, terminated employees might be able to access the network after their accounts have
been disabled.
Replication issues can also affect Group Policy functioning and site or subnet changes. A DC that hasn't
successfully replicated with its partner DCs within the tombstone lifetime will be tombstoned out of the
forest and must be rebuilt. Replication problems can also affect schema updates and have been known to
cause forest-wide failures.
The Layered Approach

AD administrators should invest a little time to make sure that AD replication is working correctly for the
health of their directory, and of their jobs. As a distributed application, AD depends on all the layers of
infrastructure on which it's built. Most of the issues that cause AD service interruptions (including
replication) can be traced back to infrastructure or to administrative error (such as accidentally deleting
objects). So, the first step in any AD replication troubleshooting must be to make sure that your
infrastructure is working correctly. I call this technique "troubleshooting from the wire up."
I use the seven-layer OSI network model (physical, data link, network, transport, session, presentation,
and application) as a basis for my own AD troubleshooting model. My model is as follows:
Physical (i.e., the wire)
Network
Name resolution
OS
Authentication
The AD application itself
The physical layer refers to the physical network infrastructure: the wires that make networking function.
If someone disconnects a network patch cord or runs a backhoe through a fiber circuit, replication isn't
going to work.
The network layer refers to network connectivity above the physical layer: router, switch, and, especially,
firewall functionality. With regard to firewalls, DCs communicate over so many ports (some
dynamically) that it's important to carefully follow the guidance laid out in the Microsoft article "How to
configure a firewall for domains and trusts."
Another network-related issue is remote procedure call (RPC) errors, such as "RPC server is
unavailable." The Ask the Directory Services Team blog includes a very informative post about how to
troubleshoot these errors by using the PORTQRY utility. (For more information, see the Microsoft
Directory Services Team blog article "Using PORTQRY for troubleshooting.")
Name Resolution: Suspect #1

Name resolution is where you should focus most of your AD troubleshooting efforts because the majority
of AD-related problems are caused by name resolution configuration issues. Several years ago, Microsoft
Product Support Services traced 80 percent of AD cases to name resolution issues. (For more information,
see "Troubleshooting networks without NetMon.")
AD is dependent on DNS to register and resolve all the myriad services and nodes it needs, and there are
many ways to configure DNS incorrectly. Microsoft has long recognized this, and the DCPROMO wizard
has grown increasingly sophisticated in the way that it configures DNS. Windows IT Pro has
published a variety of articles about DNS, including several by Boyd Gerber, a Microsoft network
escalation engineer who specializes in DNS. See the Learning Path for a list of Windows IT Pro DNS
articles. (For more information about how to troubleshoot DNS, see the Microsoft TechNet article
"Troubleshooting Active Directory-Related DNS Problems.")
Probably the best command to debug DNS problems is DCDIAG /TEST:DNS. This diagnostic command
comprehensively tests the DNS service of a DC or of the server that you direct it to by using the /S switch.
Using the /V (verbose) switch provides detailed test results. Adding the /E (enterprise) switch runs the
command on all DNS servers in your forest. Finally, you can better analyze the volumes of information
that this command provides by writing the output to a log file with the /F switch.
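The switch combinations described above can be assembled programmatically before you hand them to a scheduler or a remote shell. The following is a minimal Python sketch, not part of DCDIAG itself: the function name build_dcdiag_dns_command and the server and log file names are hypothetical, and the sketch only builds the argument list (it doesn't run anything).

```python
def build_dcdiag_dns_command(server=None, verbose=False, enterprise=False, log_file=None):
    """Return a DCDIAG DNS test command as an argument list.

    Only the switches named in the article are covered:
    /S (target server), /V (verbose), /E (enterprise), /F (log file).
    """
    args = ["dcdiag", "/test:dns"]
    if server:
        args.append(f"/s:{server}")      # direct the test at a specific server
    if verbose:
        args.append("/v")                # detailed test results
    if enterprise:
        args.append("/e")                # forest-wide run
    if log_file:
        args.append(f"/f:{log_file}")    # write results to a log file
    return args


if __name__ == "__main__":
    # DC01 and dns.log are placeholder names for illustration only.
    print(" ".join(build_dcdiag_dns_command("DC01", verbose=True, log_file="dns.log")))
```

On a machine that has DCDIAG installed, the returned list could be passed to subprocess.run; that step is left out here so the sketch stays runnable anywhere.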
Many of these techniques are covered in the DNS page of my Active Directory Troubleshooting flowchart.
You can find additional AD troubleshooting tips on my Active Directory Troubleshooting Tips and Tricks
blog.
One aspect of AD that's not well known is how name resolution is tied to replication. One of the most
common errors we see when replication isn't working is some kind of name resolution error, such as "RPC
server is unavailable" or "DNS lookup failure." Because we humans and most computer services locate other
computers on the network by using the DNS A record (e.g., mycomputer.deuby.net), it's natural to assume
that this is also how DCs find each other for replication. They do, eventually. But only indirectly. For
replication purposes, a DC's directory service registers a GUID in DNS as a CNAME (alias) record. This
GUID is unique in the forest. The CNAME is known as the DSA object GUID, and it resolves to the DC's A
record. When a directory service on a DC tries to locate its replication partners, it uses the Fully Qualified
Domain Name (FQDN) of the CNAME (e.g., 802e2778-27d1-49ca-9d12-5c439f4c4d3b._msdcs.deuby.net).
If you want to find a DC in the same way that another DC really locates one, you have to find its GUID.
There are several ways that you can find the DSA object GUID of a DC. One way is to look it up in the
Microsoft Management Console (MMC) DNS Management snap-in under the _msdcs container of the
domain's zone. However, this method works only if the GUID is registered correctly in DNS. If you aren't
sure whether it is, a simple way to verify the registration is to run the command
REPADMIN /SHOWREPL <dcname>
In this command, dcname represents the name of the DC that's experiencing replication problems. The
DSA object GUID is one of the first items listed in the response. Append _msdcs.domain.com to the
GUID, separated by a dot, and the result is the name that you have to ping.
After you obtain the DSA GUID, ping it from a DC that's receiving the errors. (You could also do this from
your own client, but that would probably introduce another variable because you might be using a
different DNS server than the one the DC is using.) If you get no response from the ping, or if you receive a
"could not find host" error, the replication problem most likely occurs because the CNAME or A record
isn't registered correctly. Reregister the DC's GUID and its SRV records either by running the NLTEST
/DSREGDNS command or by restarting the NETLOGON service.
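Building the name to ping by hand is error-prone, because a single dropped hyphen in the GUID produces a name that will never resolve. The following Python sketch shows one way to validate the GUID and assemble the CNAME. The function name dsa_cname is hypothetical, and the sketch assumes that the GUID is in the usual 8-4-4-4-12 hexadecimal form that REPADMIN /SHOWREPL prints.

```python
import re

# Standard textual GUID layout: 8-4-4-4-12 hexadecimal digits.
_GUID_RE = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$")


def dsa_cname(dsa_guid, dns_domain):
    """Return the CNAME a replication partner would look up for a DC.

    dsa_guid   -- the DSA object GUID (braces and upper case are tolerated)
    dns_domain -- the DNS domain that holds the _msdcs container
    """
    guid = dsa_guid.strip().strip("{}").lower()
    if not _GUID_RE.match(guid):
        raise ValueError(f"not a well-formed GUID: {dsa_guid!r}")
    domain = dns_domain.strip().strip(".").lower()
    return f"{guid}._msdcs.{domain}"


if __name__ == "__main__":
    print(dsa_cname("802E2778-27D1-49CA-9D12-5C439F4C4D3B", "deuby.net"))
```

The resulting string is what you would pass to PING or NSLOOKUP from the DC that's receiving the errors.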
Critical Layers: Health and Authentication

The importance of checking the OS health of the DC should be self-evident. AD is an application that runs
on top of (or, in the case of Windows Server 2008 R2 or Windows Server 2008, is a role of) the Windows
Server OS. There's nothing unique about OS troubleshooting on a DC compared with troubleshooting any
other application role. However, a dedicated DC does have an advantage over other application roles if you
do encounter OS problems. Instead of spending hours trying to fix an ailing OS, you can simply demote
the DC, or forcibly remove the role by using DCPROMO /FORCEREMOVAL. Then, you can quickly
rebuild the OS and reinstall AD. This is often the quickest way to get a DC working again.
Similar to name resolution, the authentication layer of the AD troubleshooting model isn't exactly a
software layer. It's a vital component within AD that, among other functions, determines the valid
identities of the DCs themselves to allow them to safely communicate with one another. Kerberos is the
security protocol that's used, and the Kerberos Key Distribution Center (KDC) is part of every DC. If you
aren't familiar with this protocol (and every AD admin should be), the Microsoft Directory Services Team
blog has a helpful article. (For more information, see "Kerberos for the Busy Admin.")
Kerberos itself is an extremely reliable AD component. With respect to replication between DCs, many
authentication-related failures are actually caused by external problems, such as time skew between
computers. The W32TM utility is the main tool for correcting time skew, which it does by managing the
Windows Time service. For example, you can perform the following actions by running the corresponding
W32TM commands:
Check the last time that your target DC successfully synchronized its time, and with what server: w32tm /query /status
Force the service to use another DC in the domain: w32tm /config /syncfromflags:DOMHIER
Force the service to rediscover its network resources, then resynchronize with its time source: w32tm /resync /rediscover
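To decide whether time skew is a plausible cause before you start changing the time configuration, it helps to compare the two clocks against the Kerberos tolerance. The Python sketch below does only that comparison. It assumes the default Kerberos tolerance of 5 minutes (your domain policy might differ), and the function name and sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the default Kerberos maximum clock skew of 5 minutes.
# Check your own domain policy before relying on this value.
DEFAULT_TOLERANCE = timedelta(minutes=5)


def skew_exceeds_tolerance(local_time, partner_time, tolerance=DEFAULT_TOLERANCE):
    """Return True if two clocks differ by more than the allowed skew."""
    return abs(local_time - partner_time) > tolerance


if __name__ == "__main__":
    # Hypothetical readings taken from two DCs at the same moment.
    dc_a = datetime(2011, 2, 14, 9, 0, 0)
    dc_b = datetime(2011, 2, 14, 9, 6, 30)
    print("Skew too large:", skew_exceeds_tolerance(dc_a, dc_b))
```

The comparison is symmetric, so it doesn't matter which DC is ahead.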
If you've virtualized some of your DCs, make sure that they're not synchronizing time with their host but
are synchronizing instead with their partner DCs. (For more information about how to troubleshoot
Kerberos, see the Microsoft article "Troubleshooting Kerberos Problems." For more information about
Kerberos troubleshooting by using network traces when the underlying cause of the problem is name
resolution, see the Microsoft Directory Services Team blog article "Troubleshooting Kerberos
Authentication problems - Name resolution issues." The Windows Server 2003 R2 Kerberos Technology
Center also provides a range of Kerberos-related articles.)
How Replication Works

Before you can effectively troubleshoot replication, you must understand how it works. Replication is the
process of forwarding updates for a directory partition to all DCs that have a copy of that partition. For
example, if you make a change to a user account in the domain child1.mycompany.com, replication
forwards that change to the other child1 DCs because those controllers have a copy of (that is, they host)
that domain partition. If you make a change to the site configuration for mycompany.com, replication
forwards that change to all other DCs in the mycompany.com forest because site information is stored in
the configuration partition that's hosted on every DC in the forest. Replication works on a per-partition
basis, which makes the replication topology more complicated to understand. The good news is that when
replication fails, it usually fails for all partitions on a DC because of issues that affect the supporting
infrastructure.
To fine-tune the way that DCs replicate with one another, you create an AD site topology that contains
your forest's DCs. The site topology is a network of its own that has sites as its nodes and site links as the
connections between the nodes. The topology is usually based on your company's LAN and WAN
configuration. You can further tune the way that replication connections are generated between sites by
changing the relative cost of the site link (i.e., how expensive the WAN circuit is).
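The effect of site link cost is easier to see with a small model. The Python sketch below treats sites as nodes and site links as weighted edges, then finds the lowest-cost route between two sites with a standard shortest-path search. It illustrates the idea of relative cost only: it isn't the KCC's actual algorithm, it assumes that site links are transitive (bridged), and the site names and costs are hypothetical.

```python
import heapq


def cheapest_route(site_links, start, goal):
    """Return (total_cost, [sites...]) for the lowest-cost route, or None.

    site_links -- iterable of (site_a, site_b, cost); links work in both directions
    """
    graph = {}
    for a, b, cost in site_links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, start, [start])]   # (cost so far, current site, route taken)
    visited = set()
    while queue:
        total, site, route = heapq.heappop(queue)
        if site == goal:
            return total, route
        if site in visited:
            continue
        visited.add(site)
        for neighbor, cost in graph.get(site, []):
            if neighbor not in visited:
                heapq.heappush(queue, (total + cost, neighbor, route + [neighbor]))
    return None


if __name__ == "__main__":
    # Hypothetical topology: a slow direct circuit and a cheaper path via a hub.
    links = [("HQ", "Branch", 500), ("HQ", "Hub", 100), ("Hub", "Branch", 100)]
    print(cheapest_route(links, "HQ", "Branch"))
```

In this made-up topology, raising the cost of the direct HQ-to-Branch link above the combined cost of the two hub links is what steers the route through the hub.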
Within a site, each DC uses its Knowledge Consistency Checker (KCC) and its knowledge of the site
configuration that's stored in the configuration partition to create connection objects between DCs.
Connection objects are the pathways that transmit AD objects and attributes to other DCs (replication
partners) via the replication process. These connection objects are one-way pathways. This means that
every DC must have at least one inbound connection object to receive updates from each upstream
replication partner, and at least one outbound connection object to transmit updates to each downstream
partner. Replication from one DC to another is triggered by the upstream DC when it advertises to its
replication partners that it has an update to share. The DC advertises this almost immediately (within 15
seconds).
In the same way that DCs are connected within a site, sites are linked to each other for replication by
connection objects. But the way that the connection objects are created is controlled by how you set up the
site links. Most administrators turn down the site link replication interval to 15 minutes from its default of
180 minutes. If you allowed every DC in every site to replicate with every other DC, the situation would
quickly become unmanageable. Therefore, one DC is configured as the bridgehead server for each
directory partition in each site. In most cases, one bridgehead server handles intersite replication for all
directory partitions.
Both within a site and between sites, replication is a pull operation. In other words, a DC always requests
updates from its upstream partners instead of pushing them out to its downstream partners. Therefore,
when you troubleshoot, you should always think of objects and attribute updates as incoming requests to
the DC that you're working on. (For comprehensive documentation about replication, see the Microsoft
TechNet article "How Active Directory Replication Topology Works.")
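Because connection objects are one-way and replication is a pull operation, it can help to model the connections as directed pairs and ask "who does this DC pull from?" The Python sketch below is a simplified illustration under that assumption. The DC names are hypothetical, and real connection objects carry more detail (partitions, schedules, transports) than the source/destination pair used here.

```python
def inbound_partners(connections, dc):
    """Return the DCs that `dc` pulls updates from (its upstream partners).

    connections -- iterable of (source_dc, destination_dc) pairs,
                   one pair per one-way connection object
    """
    return sorted({src for src, dst in connections if dst == dc})


def outbound_partners(connections, dc):
    """Return the DCs that pull updates from `dc` (its downstream partners)."""
    return sorted({dst for src, dst in connections if src == dc})


if __name__ == "__main__":
    # Hypothetical connection objects: DC1 <-> DC2, and DC2 -> DC3 only.
    conns = [("DC1", "DC2"), ("DC2", "DC1"), ("DC2", "DC3")]
    print("DC3 pulls from:", inbound_partners(conns, "DC3"))
    print("DC3 is pulled by:", outbound_partners(conns, "DC3"))
```

In this example DC3 receives updates but nothing pulls from it, so a change made on DC3 would never leave it; that is the kind of gap the inbound view makes obvious.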
The Right Tools for the Job

Now that you have the basic concept under your belt, and you've presumably verified that all the
underlying AD components are working correctly, what tools will you use to fix replication? The first thing
to do is to run DCDIAG on the target DC to check its general health. DCDIAG is the main diagnostic utility
for DCs. It runs a suite of 27 tests by default. For example, Figure 1 shows the Replications test failing for a
DC named GODAN.
Figure 1: Results of running the DCDIAG tool Replications test for a DC named GODAN
If a DCDIAG test results in warnings or failures, and if the reason for it isn't immediately obvious, you
should rerun DCDIAG. In the follow-up run, focus on the specific test that failed, and specify verbose
operation. In this case, DCDIAG /TEST:Replications /V provides little extra useful information; however,
a follow-up run of the DCDIAG test on the source DC (Kyoshi) reveals that the directory service isn't
running.
The next utility to concentrate on is REPADMIN. REPADMIN is the Swiss Army knife of replication
utilities. It has 69 different commands in three tiers of increasing complexity, from simple checks to
destroy-your-own-directory commands. As if that weren't enough, the syntax of commands often varies
slightly between versions. Knowledge of some of the more arcane REPADMIN commands is a requirement
for directory service nerd-dom. You can use REPADMIN /?:command to get detailed help about
individual commands in Server 2008 R2 or Server 2008. Table 1 shows a list of REPADMIN commands.
(For more information about how to use the Windows 2003 version of REPADMIN, see the Microsoft
article "Troubleshooting replication with repadmin".)
Table 1: Common REPADMIN Commands
Generally, the first REPADMIN command to run is /SHOWREPL, which is targeted to the DC that's not
receiving updates. Figure 2 shows the result. This is an intimidating result if you haven't looked at it
before. The data is easier to understand if you break it into sections. The first section, preceding the
dashed line, shows general information about the DC. In particular, the data shows that the DC is a Global
Catalog server, and it shows the DSA GUID. The next section shows every partition, in distinguished name
(DN) format, that this DC hosts. It also shows the DC's replication partner (and the partner's DSA GUID)
and the time that the DC last replicated successfully.
Figure 2: Results of running the REPADMIN command /SHOWREPL
Knowing Where to Look

Replication usually fails on a per-DC basis. So if you see replication from one partition failing and from
another partition succeeding, this probably means that the partitions are replicated from different DCs. In
this simpler case, restarting the KYOSHI NETLOGON service clears up the problem. After you obtain and
study this detailed replication information, troubleshoot from the wire up to eliminate the most likely
suspects. (For more help, you can refer to the replication page of my Active Directory Troubleshooting
flowchart on my Active Directory Troubleshooting Tips and Tricks blog.)
If the replication problem that you're troubleshooting is between sites, first check that the sites of the
upstream and downstream DCs are connected to one another by site links. To learn which DCs are the
bridgehead servers between these sites, run
REPADMIN /BRIDGEHEADS *
(The asterisk returns the bridgehead servers for all your sites.) Then, run
REPADMIN /FAILCACHE FSMO_ISTG:<site>
This command targets the intersite topology generator for the site that's represented by the site parameter.
It also displays a list of failed replication links that are detected by its KCC. If the
problem is caused by an incorrect site topology (e.g., someone moved a DC to a new site without creating a
site link object to connect it to the other sites), or if you're simply moving DCs around, REPADMIN /KCC
will force the KCC to recalculate and create connection objects between DCs so that you don't have to wait
for its scheduled run.
When you think you've fixed the problem that's preventing replication, you can trigger general replication
for all your target DC's partners by running
REPADMIN /SYNCALL
or for a specific partner and directory partition by running

REPADMIN /REPLICATE <targetDC> <sourceDC> <directory partition>
in which the directory partition is, for example, DC=Deuby,DC=net.

It's important to monitor replication on a regular basis so that you can correct any issues before they get
out of hand. The easiest way to do this is to run
REPADMIN /REPLSUMMARY
regularly. Doing this provides you a replication summary of all the DCs in your forest. For deeper analysis,
you can run
REPADMIN </command> *
(instead of using a DC name). This runs a REPADMIN command, such as /SHOWREPL, against every DC
in your forest. Tim Springston, an escalation engineer in Microsoft's Premier Customer Support Group,
has blogged about how to use REPADMIN's /CSV option to create an organized output of /SHOWREPL *
that you can use to look at the replication status of all your DCs in Microsoft Excel. (See "Get the Lowdown
on your Replication".)
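If you'd rather filter the CSV output with a script than open it in Excel, a few lines of Python will do. The sketch below is illustrative only: the column names ("Destination DSA", "Source DSA", "Naming Context", "Number of Failures") are an assumption about the CSV header and should be checked against the output of your own REPADMIN version, and the sample rows are invented for the example.

```python
import csv
import io

# Invented sample data. The header names are an ASSUMPTION about the
# /SHOWREPL * /CSV layout; verify them against your own output.
SAMPLE_CSV = '''showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Site-A,GODAN,"DC=deuby,DC=net",Site-A,KYOSHI,RPC,3,2011-02-01 10:00:00,2011-01-31 09:00:00,1722
showrepl_INFO,Site-A,KYOSHI,"DC=deuby,DC=net",Site-A,GODAN,RPC,0,0,2011-02-01 10:05:00,0
'''


def failing_links(csv_text):
    """Return (destination, source, naming_context, failures) for rows with failures."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        count = (row.get("Number of Failures") or "").strip()
        if count.isdigit() and int(count) > 0:
            results.append((
                row.get("Destination DSA"),
                row.get("Source DSA"),
                row.get("Naming Context"),
                int(count),
            ))
    return results


if __name__ == "__main__":
    for dest, src, nc, failures in failing_links(SAMPLE_CSV):
        print(f"{dest} <- {src} [{nc}]: {failures} consecutive failures")
```

Reading the rows as "destination pulls from source" keeps the output consistent with the pull model of replication.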
Here's another tip that's no more technical than a dry erase marker: Use a large whiteboard when you
troubleshoot replication issues between multiple DCs or sites. Otherwise, the complexity of the
relationships between DCs, directory partitions, and sites will quickly make your head spin.
Finally, I want to put in a good word for an old replication tool that doesn't seem to get much respect:
REPLMON. This utility is part of the Windows 2003 and Windows 2000 Support Tools, and it provides a
graphical view of your replication topology. It can't do nearly as many things as REPADMIN, and some
features don't work with Server 2008 R2 or Server 2008. But it's the best way to learn how DCs establish
connections with one another. (I created a short screencast about REPLMON that will walk you through the
basic steps. To watch it, visit YouTube. To obtain Windows Support Tools, visit the Windows Server 2003
Service Pack 2 32-bit Support Tools download page.)
Bottom Line: Eternal Vigilance

AD replication is a process that's prone to failure. But most of the time, a supporting component is the
cause of the problem. If you experience replication problems, check those AD foundations (physical,
network, name resolution, the OS, and authentication) before you spend much time on AD itself or on the
replication process.
If you correct the underlying problems and give AD a little time to reestablish its connections, many
problems will simply disappear. Become familiar with REPADMIN and keep a good image of the
underlying structure, and you'll keep your AD environment healthy.
Troubleshooting networks without NetMon

NedPyle [MSFT]
18 Dec 2007 4:09 PM
Hi, Ned here. You may already be asking yourself why I'm writing about network troubleshooting. Isn't this the Directory
Services blog? Don't we just care about Kerberos and group policies and the like? Shouldn't the Networking team do all
this heavy TCP/IP lifting?
Well, without the network, Active Directory and all its little pieces don't really amount to much. We are a customer of
networking ourselves and that means to be effective DS engineers we have to understand the infrastructure that moves all
our data around. Otherwise when this important component fails we can't really determine if DS is having issues or the
underlying structure it relies on is in trouble. To be frank, we work a lot of cases here in 3rd tier support that came in as
Directory Services symptoms and left resolved as network issues. At one point, 80% of all our DS cases could be tracked
back to DNS configuration problems!
We can't all be network trace gurus, though; it takes a lot of time and experience to get to the point where you can look
at a capture in NetMon 3.1 (or Wireshark, Ethereal, Packetyzer, etc.) and make meaningful sense of all the details. So what
are your options if you suspect a networking problem and you don't feel that NetMon is in your league? You can call us in
Microsoft support, or you can use other tools that are simpler and often just as effective to figure out your issue. That's
what we'll do today.
One quick note: I'm sticking with IPv4 here since that's 99.999% of what you'll see.
Network troubleshooting from 30,000 feet
Here's an extremely unattractive flowchart I put together that covers the basic process. We are going into a great deal
more detail below.
At its core, we will always troubleshoot the same way:

1. What's our symptom and failing component?
2. Do we have basic network connectivity?
3. Do we have good name resolution?
4. Can we test our failing component using reliable tools?
You may be saying "What the heck? Does this guy think I was born yesterday?" but trust me: plenty of engineers that
should know better often rush into step 4 when they really didn't have a good understanding of step 1 or without trying
the basics in steps 2 and 3. Especially when servers are down, the boss is screaming, and the company is losing money.
Note: Unless specified, everything we do here will be from the computer that is reporting the problem or having the
symptom. In all examples the network settings are:
IP address - 10.10.0.128 (SRC-CLIENT-01.contoso.com)
Subnet Mask - 255.255.0.0
Default Gateway - 10.10.0.1
DNS Server - 10.20.0.20 (DNS-01.contoso.com)
WINS Server - 10.20.0.30
Our Destination DC - 10.30.0.166 (DEST-DC-01.contoso.com)
1. What's our symptom and failing component?
We're troubleshooting something not working, but what exactly? Since this is a Directory Services blog I'm going to be
greedy and focus on DS components. Are domain controllers not replicating SYSVOL? Are users unable to log on? Is group
policy not applying? You need to understand the component in question in order to test it at the Application layer of OSI/TCP/IP.
2. Do we have basic network connectivity?

Next we will determine if the lower layers are working OK. It's very possible that our component is just one of many
victims, but no one else is complaining as loudly. Let's break out a snippet from the flowchart and follow it with some
utilities.
Connectivity test with PING - built-in tool in all supported Windows versions
Can we verify our own local networking with:
PING 127.0.0.1
PING 10.10.0.128
PING 10.10.0.1
All should return:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss)
This tests if our NIC responds at all, if our own IP address works, and if we can reach our gateway. If we can't even reach
our gateway but the NIC responds, we probably have a local software firewall issue. Also keep in mind that most hardware
firewalls (often the default gateways in customer environments these days) do not allow you to ping their interfaces. If you know
for sure that the firewall's private network interface is working, it is OK if it fails to respond to a ping.
Can we ping between our problem computer and the destination that our component is trying to reach with:
PING 10.30.0.166
PING DEST-DC-01.contoso.com
PING DEST-DC-01
This proves that we can get to the machine at all on the wire, both with and without name resolution. We can only use this
test if your network allows ICMP; some customers decide to turn it off internally on routers and private firewalls (and no, I
really haven't ever heard a good reason why; the days of malware/hackers using ICMP to find machines on a LAN are ten
years behind us. I welcome comments on this). If pinging by address fails, it's important to read the error: DESTINATION
UNREACHABLE or REQUEST TIMED OUT means routing is having issues and we should move to the routing tests. COULD
NOT FIND HOST means name resolution is broken and we should move to the name resolution tests. You may also want to
ping with the -f -l 1472 options to verify that we can ping without fragmenting a 1500 byte packet.
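The decision rule above is mechanical enough to write down. The Python sketch below maps captured PING output to the next troubleshooting step and shows where the 1472 figure comes from (a 1500 byte packet minus a 20 byte IPv4 header and an 8 byte ICMP header). The function names are hypothetical, and the matching is done on the English error strings quoted in this post, so treat it as an illustration rather than a robust parser.

```python
IPV4_HEADER_BYTES = 20   # IPv4 header without options
ICMP_HEADER_BYTES = 8    # ICMP echo header


def max_unfragmented_payload(mtu=1500):
    """Largest ping payload that fits in one packet of `mtu` bytes."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def next_step_for_ping(output):
    """Suggest the next test based on PING output text."""
    text = output.upper()
    if "COULD NOT FIND HOST" in text:
        return "name resolution tests"
    if "UNREACHABLE" in text or "REQUEST TIMED OUT" in text:
        return "routing tests"
    if "(0% LOSS)" in text:
        return "ok"
    return "inconclusive"


if __name__ == "__main__":
    print(max_unfragmented_payload())
    print(next_step_for_ping("Request timed out."))
```

Name resolution is checked first because a lookup failure means no packet was ever sent, so the routing tests would tell you nothing yet.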
Routing tests with TRACERT/PATHPING/ARP/ROUTE - built-in tools in all supported Windows versions
Can we check which routes were taking and where the traffic dies with:
PATHPING 10.30.0.166
or
TRACERT 10.30.0.166
Both tools accomplish basically the same thing: letting you know where you travel on the network to reach your
destination, and where the journey fails. TRACERT shows fairly quick, basic info:
Tracing route to DEST-DC-01.contoso.com [10.30.0.166] over a maximum of 30 hops:
1 1 ms 1 ms <1 ms router1.network.contoso.com [10.10.0.1]
2 <1 ms 1 ms <1 ms router2.network.contoso.com [10.30.0.1]
3 <1 ms <1 ms <1 ms DEST-DC-01.contoso.com [10.30.0.166]
Whereas PATHPING trades speed for more details:
Tracing route to DEST-DC-01.contoso.com [10.30.0.166] over a maximum of 30 hops:
0 SRC-CLIENT-01.contoso.com [10.10.0.128]
1 router1.network.contoso.com [10.10.0.1]
2 router2.network.contoso.com [10.30.0.1]
3 DEST-DC-01.contoso.com [10.30.0.166]
Computing statistics for 75 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           SRC-CLIENT-01.contoso.com [10.10.0.128]
                                0/ 100 =  0%   |
  1    0ms     0/ 100 =  0%     0/ 100 =  0%  router1.network.contoso.com [10.10.0.1]
                                0/ 100 =  0%   |
  2    0ms     0/ 100 =  0%     0/ 100 =  0%  router2.network.contoso.com [10.30.0.1]
                                0/ 100 =  0%   |
  3    0ms     0/ 100 =  0%     0/ 100 =  0%  DEST-DC-01.contoso.com [10.30.0.166]
It's not usually necessary, but you can also see further routing details with:
ARP -a
and
ROUTE PRINT
3. Do we have good name resolution?
We're only in this step if we failed some of our earlier checking, or if we simply feel that we only have partial name
resolution (for example, a DC might have its A record but be missing CNAME and SRV records needed for functionality).
So now we'll run through some tests to see why our name resolution isn't working or to verify that we have all the records
we need for our component.
Note: It's important that before you do any name resolution testing you always start with the following commands
to ensure that you are not using cached information:
IPCONFIG /flushdns
NBTSTAT -R
Name resolution tests with NSLOOKUP - built-in tool in all supported Windows versions
Can we get the DNS server to give us back the A record with:
NSLOOKUP DEST-DC-01.contoso.com 10.20.0.20

This will return:
Server: DNS-01.contoso.com
Address: 10.20.0.20
Name: DEST-DC-01.contoso.com
Address: 10.30.0.166
Using the fully qualified domain name lets us know A record lookups are working. The important part about using
NSLOOKUP is that it actually uses UDP DNS lookups, whereas the DNSCMD command below makes an RPC connection to
the DNS server to return data, and isn't a valid test of the DNS protocol itself.
Name resolution tests with DNSCMD and NSLOOKUP (if appropriate) - support tools download for Windows 2000/XP/2003
Can we get the DNS server to give us back the CNAME and SRV records of our DCs with:
DNSCMD /EnumRecords _msdcs.contoso.com @ /Type CNAME

and
NSLOOKUP
>set type=all
_ldap._tcp.dc._msdcs.contoso.com
_kerberos._tcp.dc._msdcs.contoso.com
This is usually important for Directory Services engineers because the A record is only part of the puzzle. We also care
about SRV records and CNAME records. That's how AD works when it comes to LDAP, Kerberos, replication, and so on. So
if you suspect one of those technologies has a name resolution issue this is appropriate to test.
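When you have more than one domain to check, it's convenient to generate the record names instead of typing them. The Python sketch below builds the two SRV names used in the NSLOOKUP example above and the matching non-interactive NSLOOKUP argument lists. The function names are hypothetical, and only the two record names shown in this post are covered; a DC registers more than these.

```python
def dc_srv_names(dns_domain):
    """Return the LDAP and Kerberos SRV names for DCs in a domain."""
    domain = dns_domain.strip().strip(".").lower()
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
    ]


def nslookup_srv_commands(dns_domain, dns_server):
    """Return one NSLOOKUP argument list per SRV name, aimed at a given DNS server."""
    return [
        ["nslookup", "-type=SRV", name, dns_server]
        for name in dc_srv_names(dns_domain)
    ]


if __name__ == "__main__":
    # Domain and DNS server address follow the example settings in this post.
    for cmd in nslookup_srv_commands("contoso.com", "10.20.0.20"):
        print(" ".join(cmd))
```

Naming the DNS server explicitly keeps the test pointed at the same server that the problem computer is configured to use.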
Name resolution tests with NBTSTAT (if appropriate) - built-in tool in all supported Windows versions
Can we get WINS to give us back the records with:
NBTSTAT -c
NBTSTAT -n
This is important since despite all efforts to the contrary, WINS and NetBIOS name resolution are still part of many
products, including DFS Namespaces, Netlogon, Terminal Services licensing, and much more.
If all these name resolution steps check out, it's time to move to the Application layer testing phase.
4. Can we test our failing component using reliable tools?
The one you've been waiting for. At this stage we've eliminated the overall possible general network connectivity issues,
and we suspect that just our component is a victim. If the network is fine, the most likely problems are filtered firewall
rules and the application layer itself. Let's go down some common paths to figure it out.
LDAP tests with LDP and PORTQRY - support tools download for Windows 2000/XP/2003; download Portqry.
Can we verify that LDAP is listening on DC/GCs with:
PORTQRY -n DEST-DC-01.contoso.com -p tcp -e 389

PORTQRY -n DEST-DC-01.contoso.com -p both -e 3268
Here's a sample of working output from the first command:
TCP port 389 (ldap service): LISTENING
Using ephemeral source port
Sending LDAP query to TCP port 389...
LISTENING is good. :-) TCP-based LDAP ports should always be listening on DC/GCs and never return NOT LISTENING or
FILTERED. UDP-based ports should return LISTENING or FILTERED (as they are connectionless). Seeing TCP as FILTERED or
anything as NOT LISTENING should be a red flag to find out why someone has configured a firewall to block or
manipulate LDAP traffic.
NOTE: You should see more data than what is listed in the blog example.
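The LISTENING/FILTERED rule above differs by protocol, which is easy to get backwards when you're scanning a long results file. The Python sketch below encodes just that rule. The function name portqry_verdict is hypothetical, and the status strings are the ones quoted in this post; it doesn't parse PORTQRY output for you.

```python
def portqry_verdict(protocol, status):
    """Classify a PORTQRY status as 'ok' or 'red flag' for a DC/GC port.

    protocol -- 'tcp' or 'udp'
    status   -- 'LISTENING', 'FILTERED', 'NOT LISTENING',
                or 'LISTENING or FILTERED' (UDP only)
    """
    proto = protocol.strip().lower()
    state = " ".join(status.strip().upper().split())
    if proto not in ("tcp", "udp"):
        raise ValueError(f"unknown protocol: {protocol!r}")
    if state == "LISTENING":
        return "ok"
    if proto == "udp" and state in ("FILTERED", "LISTENING OR FILTERED"):
        # UDP is connectionless, so silence can't be told apart from a filter.
        return "ok"
    if state in ("FILTERED", "NOT LISTENING"):
        return "red flag"
    raise ValueError(f"unexpected status for {proto}: {status!r}")


if __name__ == "__main__":
    print(portqry_verdict("tcp", "FILTERED"))
    print(portqry_verdict("udp", "FILTERED"))
```

Raising an error on an unrecognized status is deliberate: a silent "ok" on unexpected input would hide exactly the kind of oddity you're looking for.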
Can we connect to the domain controllers with LDP:
LDP
Connection --> Connect --> DEST-DC-01.contoso.com
Connection --> Bind
View --> Tree --> Select the domain naming context
Browse a few levels deep.
By doing the above with a reliable tool (i.e. not an application that does many things unspecific to LDAP and often uses
ADSI rather than pure LDAP) we can see if unadulterated LDAP binds and queries are working. We also know that
authentication is working.
SMB tests with NET USE and PORTQRY - download Portqry.
Can we verify that SMB is listening on ports 138 and 445 with:
PORTQRY -n DEST-DC-01.contoso.com -p udp -e 138

PORTQRY -n DEST-DC-01.contoso.com -p both -e 445
The same diatribe above applies here for LISTENING versus FILTERED. If we cannot get to 138 and 445 over the network,
endless zillions of components will fail (follow that link to see what I mean; it's a good one). If SMB is blocked via firewall
rules, file sharing, group policy, named pipes, and many other applications will fail.
Can we connect over SMB (as an administrator) with:
NET USE \\DEST-DC-01.contoso.com\C$ /p:n

This simple and reliable test tells us that we can map a drive through SMB to the server. It also validates that at least NTLM
authentication is working (to only use NTLM, use an IP address). You could use KLIST or KERBTRAY from the Resource Kit
to confirm if there's a Kerberos TGS ticket for that connection as well.
RPC tests with COMPMGMT and PORTQRY - download Portqry.
Can we verify the endpoint mapper is available and returning data with:
PORTQRY -n DEST-DC-01.contoso.com -p tcp -e 135
The endpoint mapper should always be LISTENING on TCP 135 (never FILTERED or NOT LISTENING) and should return all
of its registered endpoint ports and named pipes. If the endpoint mapper is blocked due to firewall rules, a great many
applications will fail.
Can we connect to the destination server with:
COMPMGMT.MSC
Computer Management --> Connect to another computer
Expand System Tools
COMPMGMT is an included app with simple RPC connectivity needs at startup. This will generate several MSRPC binds,
query and respond to several RPC endpoints, and generally is a good test of basic RPC functionality. The list of RPC-based
applications (from Microsoft and elsewhere) is a mile long and includes such things as AD replication, FRS replication, DFS
Replication, and more.
PORTQRY scripting
Finally, here's a little batch file you can use to run PORTQRY with a set of standard DS-related queries and output to a file.
This is a useful way to see if any ports are looking troublesome even if you're not sure which ones to be looking for. For
the sharp-eyed, yes, HTTP/HTTPS is included. Why? Certificate Authority Web Enrollment issues; we do a lot more in MS
DS support than deal with account lockouts. :-)
@echo off
REM Sample batch wrapper script for portqry.exe
REM Designed to verify responsiveness of remote server specified on commandline
REM Requires PORTQRY.EXE in same directory as script
REM Example: checkports.cmd DEST-DC-01.contoso.com
REM Please note that this script is provided "AS IS" with no warranties, and confers no rights.
REM Use of included script sample is subject to the terms specified at
REM http://www.microsoft.com/info/cpyright.htm
REM Assumption: where a query line was missing, the standard port for the named
REM service is used, and the -p value (tcp, udp, both) follows common usage.
ECHO Querying DNS
portqry -n %1 -p both -e 53 > %1_checkports.txt
ECHO Querying DHCP
portqry -n %1 -p udp -e 67 >> %1_checkports.txt
ECHO Querying HTTP
portqry -n %1 -p tcp -e 80 >> %1_checkports.txt
ECHO Querying Kerberos KDC Service
portqry -n %1 -p both -e 88 >> %1_checkports.txt
ECHO Querying NTP Time Service
portqry -n %1 -p udp -e 123 >> %1_checkports.txt
ECHO Querying RPC EndPoint Mapper Service
portqry -n %1 -p tcp -e 135 >> %1_checkports.txt
ECHO Querying NetBIOS Name Service (WINS)
portqry -n %1 -p udp -e 137 >> %1_checkports.txt
ECHO Querying NetBIOS Datagram Service
portqry -n %1 -p udp -e 138 >> %1_checkports.txt
ECHO Querying NetBIOS Session Service
portqry -n %1 -p tcp -e 139 >> %1_checkports.txt
ECHO Querying LDAP
portqry -n %1 -p both -e 389 >> %1_checkports.txt
ECHO Querying HTTP over SSL
portqry -n %1 -p tcp -e 443 >> %1_checkports.txt
ECHO Querying SMB
portqry -n %1 -p both -e 445 >> %1_checkports.txt
ECHO Querying Kerberos Logon
portqry -n %1 -p both -e 464 >> %1_checkports.txt
ECHO Querying LDAP over SSL
portqry -n %1 -p tcp -e 636 >> %1_checkports.txt
ECHO Querying Win2000/2003 AD Logon and Directory Replication
portqry -n %1 -p tcp -o 1025,1026 >> %1_checkports.txt
ECHO Querying Global Catalog
portqry -n %1 -p tcp -e 3268 >> %1_checkports.txt
ECHO Querying Global Catalog over SSL
portqry -n %1 -p tcp -e 3269 >> %1_checkports.txt
ECHO Querying Terminal Server / Remote Desktop
portqry -n %1 -p tcp -e 3389 >> %1_checkports.txt
start notepad %1_checkports.txt
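
The output file gets long, so it can help to boil it down before reading it in Notepad. Below is a small Python sketch that groups ports by the status PORTQRY reported. It assumes every result line has the same shape as the sample shown earlier ("TCP port 389 (ldap service): LISTENING"). The function name is made up, and the second line of the sample text is invented for illustration.

```python
import re
from collections import defaultdict

# Matches result lines such as: TCP port 389 (ldap service): LISTENING
RESULT_LINE = re.compile(r"^(TCP|UDP) port (\d+) \((.+?)\): (.+)$")


def summarize(report_text):
    """Group 'PROTO/port (service)' labels by the reported status."""
    by_status = defaultdict(list)
    for raw in report_text.splitlines():
        match = RESULT_LINE.match(raw.strip())
        if match:
            proto, port, service, status = match.groups()
            by_status[status.strip()].append(f"{proto}/{port} ({service})")
    return dict(by_status)


# First line is the sample from this post; the second is invented.
sample = """\
TCP port 389 (ldap service): LISTENING
Sending LDAP query to TCP port 389...
TCP port 445 (microsoft-ds service): FILTERED
"""

for status, ports in summarize(sample).items():
    print(status, "->", ", ".join(ports))
```

To run it against a real report, pass in the file contents, for example `summarize(open("DEST-DC-01.contoso.com_checkports.txt").read())`. Anything that lands outside the LISTENING group is where to start looking.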
