
Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute Cluster


White Paper
Published: June 2007
For the latest information, please see http://www.microsoft.com/windowsserver2003/ccs

Contents
Introduction
Before You Begin
    Plan Your Cluster
    Install Your Cluster Hardware
    Configure Your Cluster Hardware
    Obtain Required Software
Installation, Configuration, and Tuning Steps
    Step 1: Install and Configure the Service Node
    Step 2: Install and Configure ADS on the Service Node
    Step 3: Install and Configure the Head Node
    Step 4: Install the Compute Cluster Pack
    Step 5: Define the Cluster Topology
    Step 6: Create the Compute Node Image
    Step 7: Capture and Deploy Image to Compute Nodes
    Step 8: Configure and Manage the Cluster
    Step 9: Deploy the Client Utilities to Cluster Users
Appendix A: Tuning Your Cluster
Appendix B: Troubleshooting Your Cluster
Appendix C: Cluster Configuration and Deployment Scripts
Related Links

Introduction
Clustering industry-standard servers has put high-performance computing within reach for many businesses. These clusters can range from a few nodes to hundreds of nodes. In the
past, wiring, provisioning, configuring, monitoring, and managing these nodes and providing
appropriate, secure user access was a complex undertaking, often requiring dedicated
support and administration resources. However, Microsoft Windows Compute Cluster
Server 2003 simplifies installation, configuration, and management, reducing the cost of
compute clusters and making them accessible to a broader audience.
Windows Compute Cluster Server 2003 is a high-performance computing solution that uses
clustered commodity x64 servers that are built with a combination of the Microsoft Windows
Server 2003 Compute Cluster Edition operating system and the Microsoft Compute Cluster
Pack. The base operating system incorporates traditional Windows system management
features for remote deployment and cluster management. The Compute Cluster Pack
contains the services, interfaces, and supporting software needed to create and configure the
cluster nodes, as well as the utilities and management infrastructure. Individuals tasked with
Windows Compute Cluster Server 2003 administration and management have the advantage
of working within a familiar Windows environment, which helps enable users to quickly and
easily adapt to the management interface.
Windows Compute Cluster Server 2003 is a significant step forward in reducing the barriers to
deployment for organizations and individuals who want to take advantage of the power of a
compute clustering solution.
- Integrated software stack. Windows Compute Cluster Server 2003 provides an integrated software stack that includes the operating system, job scheduler, message passing interface (MPI) layer, and the leading applications for each target vertical.

- Better integration with IT infrastructure. Windows Compute Cluster Server 2003 integrates seamlessly with your current network infrastructure (for example, Active Directory), enabling you to leverage existing organizational skills and technology.

- Familiar development environment. Developers can leverage existing Windows-based skills and experience to develop applications for Windows Compute Cluster Server 2003. Microsoft Visual Studio is the most widely used integrated development environment (IDE) in the industry, and Visual Studio 2005 includes support for developing HPC applications, such as parallel compiling and debugging. Third-party hardware and software vendors provide additional compiler and math library options for developers seeking an optimized solution for existing hardware. Windows Compute Cluster Server 2003 supports the use of MPI with Microsoft's MPI stack, or the use of stacks from other vendors.

This step-by-step guide is based on the highly successful cluster deployment at National
Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign. The cluster was built as a joint effort between NCSA and Microsoft, using commonly
available hardware and Microsoft software. The cluster was composed of 450 x64 servers,
achieving 4.1 teraflops (TFLOPs) on 896 processors using the widely accepted LINPACK
benchmark. Figure 1 shows the cluster topology used for the NCSA deployment, including the
public, private, and MPI networks.

Figure 1: Supported cluster topology, similar to the NCSA deployment topology

Although every IT environment is different, this guide can serve as a basis for setting up your
large-scale compute cluster. If you need additional guidance, see the Related Links section at
the end of this guide for more resources.
Note
The intended audience for this document is network administrators who have at least two
years' experience with network infrastructure, management, and configuration. The example
deployment outlined in this document is targeted at clusters in excess of 100 nodes. Although
the steps discussed here will work for smaller clusters, they represent steps modeled on large
deployments for enterprise-scale and research-scale clusters.


Note
Completing the steps in this document requires knowledge of how to install, configure, and manage Microsoft Windows Server 2003 in an Active Directory environment, and experience in adding and managing computers and users within a domain.
Note
This is Version 1 of this document. To download the latest updated version, visit the Microsoft
Web site (http://www.microsoft.com/hpc/). The update may contain critical information that
was not available when this document was published.


Before You Begin


Setting up a compute cluster with Windows Server 2003 Compute Cluster Edition begins with
the following tasks:
1. Plan your cluster.
2. Install your cluster hardware.
3. Configure your cluster hardware.
4. Obtain required software.
When you have completed these tasks, use the steps in the Installation, Configuration, and
Tuning Steps section to help you install, configure, and tune your cluster.

Plan Your Cluster


This step-by-step guide provides basic instructions on how to deploy a Windows compute
cluster. Your cluster planning should cover the types of nodes that are required for a cluster,
and the networks that you will use to connect the nodes. Although the instructions in this guide
are based on one specific deployment, you should also consider your environment and the
number and types of hardware you have available.
Your cluster requires three types of nodes:
- Head node. A head node mediates all access to the cluster resources and acts as a single point for cluster deployment, management, and job scheduling. There is only one head node per cluster.

- Service node. A service node provides standard network services, such as directory, DNS, and DHCP services, and also maintains and deploys compute node images to new hardware in the cluster. Only one service node is needed for the cluster, although you can have more than one service node for different roles in the cluster (for example, moving the image deployment service to a separate node).

- Compute node. A compute node provides computational resources for the cluster. Compute nodes receive jobs from, and are managed by, the head node.

Additional node types that can be used but are not required are remote administration nodes
and application development nodes. For an overview of device roles in the cluster, see the
Windows Compute Cluster Server 2003 Reviewers Guide
(http://www.microsoft.com/windowsserver2003/ccs/reviewersguide.mspx).
Your cluster also depends on the number and types of networks used to connect the nodes.
The Reviewers Guide discusses the topologies that you can use to connect your nodes, by
using combinations of private and public adapters for message passing between the nodes
and system traffic among all of the nodes. For the cluster detailed in this guide, the head node
and service node have public and private adapters for system traffic, and the compute nodes
have private and message passing interface (MPI) adapters. (Note: This is not a supported
topology but is very similar to one that is.) Consult the Reviewers Guide for the advantages of
each network topology.
Lastly, consider the level of cluster expertise and networking knowledge on your staff, and the amount of management time they can dedicate to your cluster. Although deployment
and management is simplified with Windows Compute Cluster Server 2003, keep in mind that
no matter what the circumstances, a large-scale compute cluster deployment should not be
taken lightly. It is important to understand how management and deployment work when planning for the appropriate resources. Compute Cluster Server uses robust, enterprise-grade
technologies for all aspects of network and device management. Its management tools and
programs allow granular, role-based management of security for cluster administration and
cluster users, and its network and system management tools can easily and quickly deploy
applications and jobs using familiar, wizard-based interfaces. Additional compute nodes can
be added automatically to the compute cluster by simply plugging the nodes in and
connecting them to the cluster. Extensive (and expensive) daily hands-on tweaking,
configuration, and management are not needed when using commodity hardware and a
standards-based infrastructure.

Install Your Cluster Hardware


For ease of management and configuration, all nodes in the deployment in this guide will use
the same basic hardware platform. Hardware requirements for computers running Windows
Compute Cluster Server 2003 are similar to those for Windows Server 2003, Standard x64
Edition. You can find the system requirements for your cluster at
http://www.microsoft.com/windowsserver2003/ccs/sysreqs.mspx. Table 1 shows a list of
hardware for all nodes. This list is based on the hardware used in the NCSA deployment.

Table 1: Hardware for All Nodes

CPU: Blade servers. Each blade has two single-core 3.2 GHz processors with 2 MB cache and an 800 MHz front-side bus. The motherboard includes 4x PCI Express slots.

RAM: 4 x 1 GB 400 MHz DIMMs. For compute nodes, you should plan on having 2 GB of RAM per core.

Storage: SCSI adapter with a 73 GB 10K RPM Ultra320 SCSI disk. RAID may be used on any node, but was not used in this deployment. For the head node, you should plan on having three disks: one for the OS, one for the database, and one for the transaction logs. This will provide improved performance and throughput.

Network interface cards: One 1000 Mb Gigabit Ethernet adapter and one InfiniBand 4x PCI Express adapter.

Gigabit network hardware: One 48-port Gigabit switch per rack (40 ports for blades, 4 for uplink to the ring), and 48-port Layer 2 Gigabit switches in a ring configuration.

InfiniBand network hardware: Five 24-port InfiniBand switches per rack, and two 96-port InfiniBand switches for cross-rack connectivity.

Note
The head node and the network services node each use two Gigabit Ethernet network
adapters; both the compute nodes and the head nodes use the private MPI network, though
the head node's MPI interface was disabled for this specific deployment. Also, the service node requires a 32-bit operating system, because ADS works only with 32-bit Windows, but you can run
the operating system on 32-bit or 64-bit hardware. (This is a custom configuration used on the cluster deployment at NCSA and is not supported for general use. However, it is very similar
to a supported cluster topology. For more information on supported cluster topologies, please
refer to the Windows Compute Cluster Server 2003 Reviewers Guide.)

Configure Your Cluster Hardware


When you have added your switches and blades to the rack, you must configure the network
connections and network hardware prior to installing the network software. To configure your
hardware, follow the checklist in Table 2.

Table 2: Hardware Configuration Checklist (check off each item as you complete it)

- Connect all high-speed interconnect connections from the pass-through module on the chassis to the rack's high-speed interconnect switches.
- Connect all Gigabit Ethernet connections from the pass-through module on the chassis to the rack's 48-port Gigabit Ethernet switch.
- Connect all InfiniBand switches to the Layer 2 switches.
- Connect all Gigabit Ethernet switches to the Gigabit Ethernet Layer 2 switches.
- Disable the built-in subnet manager on all switches. The built-in subnet manager doesn't support OpenIB clients, and it conflicts with the subnet manager that does support such clients.
- Change the BIOS boot sequence on all nodes to network Pre-boot Execution Environment (PXE) first, CD-ROM second, and hard drive third. For platforms that dynamically remove missing devices at power-up, an efficient way to set the hard drives last in the boot order is to pull the hard drives, power up the devices once, power off the devices, put the drives back in, and then power up again. The boot order will be set correctly thereafter.
- Disable hyperthreading on all nodes and set each node's system clock to the correct time zone, if required.
- Obtain a list of all private Gigabit Ethernet adapter MAC addresses for the compute nodes. These addresses are used as input to a configuration script that identifies your nodes and configures them with the proper image. In some cases you can use the blade chassis telnet interface to collect the MAC addresses. See Appendix C for a description of the input file and the file format.

Obtain Required Software


In addition to Windows Compute Cluster Server 2003, you will need to obtain operating
systems, administration utilities, drivers, and Quick Fix files to bring your systems up-to-date.
Table 3 lists the software required for each node type, and the notes following the chart show
you where to obtain the necessary software. The following list is based on the software used
in the NCSA deployment.


Table 3: Software Required by Node Type

- Windows Server 2003 R2 Standard Edition x64
- Windows Server 2003 R2 Enterprise Edition x86
- Windows Server 2003 Compute Cluster Edition x64
- Microsoft Compute Cluster Pack
- SQL Server 2005 Standard Edition x64
- Automated Deployment Services (ADS) version 1.1
- Microsoft Management Console (MMC) 3.0
- .NET Framework 2.0
- Windows Preinstallation Environment (WinPE)
- QFE KB910481
- QFE KB914784
- Microsoft System Preparation tool (Sysprep.exe)
- Cluster configuration and deployment scripts
- Latest network adapter drivers

Notes on the software required for the deployment described in this paper:


Microsoft SQL Server 2005 Standard Edition x64: By default, the Compute Cluster Pack
will install MSDE on the head node for data and node tracking purposes. Because MSDE is
limited to eight concurrent connections, SQL Server 2005 Standard Edition is recommended
for clusters with more than 64 compute nodes.
ADS version 1.1: ADS requires 32-bit versions of Windows Server 2003 Enterprise Edition
for image management and deployment. Future Microsoft imaging technology (Windows
Deployment Services, available in the next release of Windows Server, code name
Longhorn) will support 64-bit software. You can download the latest version of ADS from the
Microsoft Web site
(http://www.microsoft.com/windowsserver2003/technologies/management/ads/default.mspx).
Because this paper is based on a previous large-scale compute cluster deployment at NCSA,
it details using ADS to deploy compute node images as opposed to using Microsoft Windows
Deployment Services (WDS). However, future updates to this paper will explain how to use
WDS to deploy compute node images to your cluster.
MMC 3.0: MMC 3.0 is required for the administration node, which may or may not be the
head node. It is automatically installed by the Compute Cluster Pack on the computer that is
used to administer the cluster. You can also download and install the latest version for Windows Server 2003 and Windows XP (x86 and x64) at the Microsoft Web site
(http://support.microsoft.com/?kbid=907265).
.NET Framework 2.0: The .NET Framework is automatically installed by the Compute Cluster
Pack. You can also download the latest version at the Microsoft Web site
(http://msdn2.microsoft.com/en-us/netframework/aa731542.aspx).


WinPE: You will need a copy of Windows Preinstallation Environment for Windows Server
2003 SP1. If you need to add your Gigabit Ethernet drivers to the WinPE image, you will need
to obtain a copy of the Windows Server 2003 SP1 OEM Preinstallation Kit (OPK), which
contains the programs needed to update the WinPE image for your hardware. WinPE and the
OPK are available only to customers with enterprise or volume license agreements; contact
your Microsoft representative for more information.
QFE KB910481: This Quick Fix is for potential problems when deploying Winsock Direct in a
fast System Area Network (SAN) environment. You can download the quick fix at the
Microsoft Web site (http://support.microsoft.com/?kbid=910481).
QFE KB914784: This Quick Fix is in response to a Security Advisory and provides additional
kernel protection in some environments. You can download the quick fix at the Microsoft Web
site (http://support.microsoft.com/?kbid=914784).
Sysprep.exe: Sysprep.exe is used to help prepare the compute node image prior to
deployment. Sysprep is included as part of Windows Server 2003 Compute Cluster Edition.
Note: You must use the x64 version of Sysprep in order to capture and deploy your
images.
Cluster configuration and deployment scripts: These scripts are available to download at
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/default.mspx. They include
hard-coded paths and require you to follow the installation and usage instructions exactly as
described in this guide. If you must modify the scripts for your deployment, verify that the scripts work in your environment before using them to deploy your cluster.
For the scripts to run properly, you will also need specific information about your cluster and
its hardware. Appendix C contains a sample input file (AddComputeNodes.csv) that is used to
automatically configure the compute cluster nodes and populate Active Directory with node
information. Table 4 lists the specific items needed, with room for you to write down the values
for your deployment. You can then use this information when building your cluster and when
creating your compute node images. Follow the instructions in Appendix C for creating your
own sample input file.
Note
Every item in Table 4 must have an entry or the input file will not work properly. If you do not
have a value for a field, use a hyphen (-) for the field instead.
Latest network adapter drivers: Contact the manufacturer of your network adapters for the
most recent drivers. You will need to install these drivers on your cluster nodes.

Table 4: Cluster Information Needed for Script Input File (record your value for each item)

FullName: Populates the cluster node registry with the Registered Owner name.
Organisation name: Populates the cluster node registry with the Registered Organization name.
ProductKey: 25-digit alphanumeric product key used for all compute cluster nodes. Contact your Microsoft representative for your volume license key.
Server Name: Populates Active Directory with a compute cluster node name.
Srv Description: Populates the ADS Management console with a text description of the node. Can be used to list rack placement or other helpful information.
Server MAC: Gigabit Ethernet MAC address for each compute cluster node.
Machine Name: Used to configure the cluster node with a machine name. Must match the value in the Server Name field.
Admin Password: Local administrator password.
Domain: The cluster domain name (for example, HPCCluster.local).
Domain Username: Account name with permission to add computers to a domain.
Domain Password: Password for the account with permission to add computers to a domain.
ImageName: The image name to be installed on the cluster node (for example, CCSImage).
HPC Cluster Name: The head node name must be used for the cluster name.
NetworkTopology: Must be Single.
PartitionSize: Not used.
PublicIP: Not used.
PublicSubnet: Not used.
PublicGateway: Not used.
PublicDNS: Not used.
PublicNICName: Not used.
PublicMAC: Not used.
PrivateIP: Not used.
PrivateSubnet: Not used.
PrivateGateway: Not used.
PrivateDNS: Not used.
PrivateNICName: Not used.
PrivateMAC: Not used.
MPIIP: Assigns a static address to the MPI adapter (for example, 11.0.0.1).
MPISubnet: Assigns a subnet mask to the MPI adapter (for example, 255.255.0.0).
MPIGateway: Not used.
MPIDNS: Not used.
MPINICName: Not used.
MPIMAC: Not used.
MachineOU: Populates Active Directory with machine OU information (for example, OU=Cluster Servers,DC=HPCCluster,DC=local).

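The script input file is a comma-separated text file with one line per compute node, built from the fields in Table 4, with a hyphen in every unused field. The single line below is purely illustrative; the values are hypothetical, and the authoritative column order and formatting are defined by the sample AddComputeNodes.csv described in Appendix C, so always start from that sample file:

NCSA Admin,NCSA,XXXXX-XXXXX-XXXXX-XXXXX-XXXXX,NODE001,Rack 1 Blade 1,00-0D-60-AA-BB-01,NODE001,P@ssw0rd,HPCCluster.local,HPCCLUSTER\administrator,P@ssw0rd,CCSImage,HEADNODE,Single,-,-,-,-,-,-,-,-,-,-,-,-,-,-,11.0.0.1,255.255.0.0,-,-,-,-,"OU=Cluster Servers,DC=HPCCluster,DC=local"

A value that itself contains commas (such as MachineOU) may need to be quoted or otherwise escaped; check the Appendix C sample for how that field is represented.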

Installation, Configuration, and Tuning Steps


To install, configure, and tune a high-performance compute cluster, complete the following
steps:
1. Install and configure the service node.
2. Install and configure ADS on the service node.
3. Install and configure the head node.
4. Install the Compute Cluster Pack.
5. Define the cluster topology.
6. Create the compute node image.
7. Capture and deploy image to compute nodes.
8. Configure and manage the cluster.
9. Deploy the client utilities to cluster users.

Step 1: Install and Configure the Service Node


The service node provides all the back-end network services for the cluster, including
authentication, name services, and image deployment. It uses standard Windows technology
and services to manage your network infrastructure. The service node has two Gigabit
Ethernet network adapters and no MPI adapters. One adapter connects to the public network;
the other connects to the private network dedicated to the cluster.
There are five tasks that are required for installation and configuration:
1. Install and configure the base operating system.
2. Install Active Directory, Domain Name Services (DNS), and DHCP.
3. Configure DNS.
4. Configure DHCP.
5. Enable Remote Desktop for the cluster.
Install and configure the base operating system. Follow the normal setup procedure for
Windows Server 2003 R2 Enterprise Edition, with the exceptions as noted in the following
procedure.
To install and configure the base operating system
1. Boot the computer to the Windows Server 2003 R2 Enterprise Edition CD and accept the license agreement.
2. On the Partition List screen, create two partitions: one partition of 30 GB, and a
second using the remainder of the space on the hard drive. Select the 30 GB partition
as the install partition, and then press ENTER.
3. On the Format Partition screen, accept the default of NTFS, and then press ENTER.
Proceed with the remainder of the text-mode setup. The computer then reboots into
graphical setup mode.

Step-by-Step Guide to Installing, Configuring, and Tuning a High-Performance Compute cluster

10

4. On the Licensing Modes page, select the option for which you are licensed, and then
configure the number of concurrent connections if needed. Click Next.
5. On the Computer Name and Administrator Password page, type a name for the
service node (for example, SERVICENODE). Type your local administrator password
twice, and then press ENTER.
6. On the Networking Settings page, select Custom settings, and then click Next.
7. On the Networking Components page for your private adapter, select Internet
Protocol (TCP/IP), and then click Properties. On the Internet Protocol (TCP/IP)
Properties page, select Use the following IP address. Configure the adapter with a
static nonroutable address, such as 10.0.0.1, and a subnet mask of 255.0.0.0.
Select Use the following DNS server addresses, and then configure the adapter to
use 127.0.0.1. Click OK, and then click Next.
Note: If this computer has a 1394 Net Adapter, it will ask you to set the IP for that
adapter first (before setting the TCP/IP properties). Click Next to skip this page
(unnecessary to the cluster deployment) and move on to setting the TCP/IP
properties.
8. Repeat the previous step for the public adapter. Configure the adapter to acquire its
address by using DHCP from the public network. If you prefer, you can assign it a
static address if you have one already reserved. Configure the public adapter to use
127.0.0.1 for DNS queries. Click OK, and then click Next.
9. On the Workgroup or Computer Domain page, accept the default of No and the
default of WORKGROUP, and then click Next. The computer will copy files, and then
reboot.
10. Log in to the server as administrator. Click Start, click Run, type diskmgmt.msc, and
then click OK. The Disk Management console starts.
11. Right-click the second partition on your drive, and then click Format. In the Format
dialog box, select Quick Format, and then click OK. When the format process is
finished, close the Disk Management console.
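If you prefer to script the adapter settings rather than set them in the GUI, the same configuration can be applied from a command prompt. This is a minimal sketch, not part of the original procedure; it assumes the private connection has been renamed to Private in Network Connections (substitute your actual connection name) and uses the example addresses from step 7:

    rem Assign the static, nonroutable address and mask to the private adapter
    netsh interface ip set address name="Private" static 10.0.0.1 255.0.0.0
    rem Point the private adapter's DNS at the DNS service that will run locally on this node
    netsh interface ip set dns name="Private" static 127.0.0.1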
Install Active Directory, DNS, and DHCP. Windows Server 2003 provides a wizard to
configure your server as a typical first server in a domain. The wizard configures your server
as a root domain controller, installs and configures DNS, and then installs and configures
DHCP.
To install Active Directory, DNS, and DHCP
1. Log in to your service node as Administrator. If the Manage Your Server page is not
visible, click Start, and then click Manage Your Server.
2. Click Add or remove a role. The Configure Your Server Wizard starts. Click Next.
3. On the Configuration Options page, select Typical configuration for a first server,
and then click Next.
4. On the Active Directory Domain Name page, type the domain name that will be
used for your cluster and append the .local suffix (for example, HPCCluster.local).
Click Next.


5. On the NetBIOS Domain Name page, accept the default NetBIOS name (for
example, HPCCLUSTER) and click Next. At the Summary of Selections page, click
Next. If the Configure Your Server Wizard prompts you to close any open
programs, click OK.
6. On the NAT Internet Connection page, make sure the public adapter is selected.
Deselect Enable security on the selected interface, and then click Next. If you
have more than two network adapters in your computer, the Network Selection page
appears. Select the private LAN adapter and then click Next. Click Finish. After the
files are copied, the server reboots.
7. After the server reboots, log on as Administrator. Review the actions listed in the
Configure Your Server Wizard, and then click Next. Click Finish.
Configure DNS. DNS is required for the cluster and is used by anyone who accesses the cluster. It is linked to Active Directory and manages the node names that are in use. DNS
must be configured so that name resolution will function properly on your cluster. The
following task helps to configure your DNS settings for your private and public networks.
To configure DNS
1. Click Start, and then click Manage Your Server. In the DNS Server section, click
Manage this DNS server. You can also start the DNS Management console by
clicking Start, Administrative Tools, and then DNS.
2. Right-click your server, and then click Properties.
3. Click the Interfaces tab. Select Only the following IP addresses. Select the public
interface, and then click Remove. Only the private interface should be listed. If it is
not, type the IP address of the private interface, and then click Add. This ensures that
your services node will provide DNS services only to the private network and not to
addresses on the rest of your network. Click Apply.
4. Click the Forwarders tab. If the public interface is using DHCP, confirm that the
forwarder IP list has the IP address for a DNS server in your domain. If not, or if you
are using a static IP address, type the IP address for a DNS server on your public
network, and then click Add. This ensures that if the service node cannot resolve
name queries, the request will be forwarded to another name server on your network.
Click OK.
5. In the DNS Management console, select Reverse Lookup Zones. Right-click
Reverse Lookup Zones, and then click New Zone. The New Zone Wizard starts.
Click Next.
6. On the Zone Type page, select Primary zone, and then select Store the zone in
Active Directory. Click Next.
7. On the Active Directory Zone Replication Scope page, select To all domain
controllers in the Active Directory domain. Click Next.
8. On the Reverse Lookup Zone Name page, select Network ID, and then type the
first three octets of your private network's IP address (for example, 10.0.0). A reverse
name lookup is automatically created for you. Click Next.


9. On the Dynamic Update page, select Allow only secure dynamic updates. Click
Next.
10. On the Completing the New Zone Wizard page, click Finish. The new reverse
lookup zone is added to the DNS Management console. Close the DNS Management
console.
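The same DNS changes can also be scripted with dnscmd, which is installed with the Windows Server 2003 Support Tools. The following is a minimal sketch that assumes the example names and addresses used above (SERVICENODE, 10.0.0.1, and a 10.0.0.x private network); the forwarder address is a placeholder that you must replace with a real DNS server on your public network:

    rem Listen for DNS queries on the private interface only
    dnscmd SERVICENODE /ResetListenAddresses 10.0.0.1
    rem Forward unresolved queries to a DNS server on the public network (placeholder address)
    dnscmd SERVICENODE /ResetForwarders 192.168.1.10
    rem Create the Active Directory-integrated reverse lookup zone for the private network
    dnscmd SERVICENODE /ZoneAdd 0.0.10.in-addr.arpa /DsPrimary
    rem Allow only secure dynamic updates in the new zone
    dnscmd SERVICENODE /Config 0.0.10.in-addr.arpa /AllowUpdate 2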
Configure DHCP. Your cluster requires automated IP addressing services to keep node traffic
to a minimum. Active Directory and DHCP work together so that network addressing and
resource allocation will function smoothly on your cluster. DHCP has already been configured
for your cluster network. However, if you want finer control over the number of IP addresses
available and the information provided to DHCP clients, you must delete the current DHCP
scope and create a new one, using settings that reflect your cluster deployment.
To configure DHCP
1. Click Start, and then click Manage Your Server. In the DHCP Server section, click
Manage this DHCP server. You can also start the DHCP Management console by
clicking Start, clicking Administrative Tools, and then clicking DHCP.
2. Right-click the scope name (for example, Scope [10.0.0.0] Scope1), and then click
Deactivate. When prompted, click Yes. Right-click the scope again, and then click
Delete. When prompted, click Yes. The old scope is deleted.
3. Right-click your server name and then click New Scope. The New Scope Wizard
starts. Click Next.
4. On the Scope Name page, type a name for your scope (for example, HPC Cluster)
and a description for your scope. Click Next.
5. On the IP Address Range page, type the start and end ranges for your cluster. For
example, the start address would be the same address used for the private adapter:
10.0.0.1. The end address depends on how many nodes you plan to have in your
cluster. For up to 250 nodes, the end address would be 10.0.0.254. For 250 to 500
nodes, the end address would be 10.0.1.254. For the subnet mask, you can either
increase the length to 16, or type in a subnet mask of 255.255.0.0. Click Next.
6. On the Add Exclusions page, you define a range of addresses that will not be
handed to computers at boot time. The exclusion range should be large enough to
include all devices that use static IP addresses. For this example, type the start
address of 10.0.0.1 and an end address of 10.0.0.9. Click Add, and then click Next.
7. On the Lease Duration page, accept the defaults, and then click Next.
8. On the Configure DHCP Options page, select Yes, I want to configure these
options now, and then click Next.
9. On the Router (Default Gateway) page, type the private network adapter address
(for example, 10.0.0.1), and then click Add. Click Next.
10. On the Domain Name and DNS Servers page, in the Parent domain text box, type
your domain name (for example, HPCCluster.local). In the Server name text box,
type the server name (for example, SERVICENODE). In the IP address fields, type the private network adapter address (for example, 10.0.0.1). Click Add, and then click
Next.
11. On the WINS Servers page, click Next.
12. On the Activate Scope page, select Yes, I want to activate this scope now, and
then click Next.
13. On the Completing the New Scope Wizard page, click Finish. Close the DHCP
Management console.
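If you ever need to re-create this scope (for example, on a rebuilt service node), the equivalent configuration can be applied with netsh. This is a minimal sketch that assumes the example addresses used above; adjust the ranges and options for your own deployment:

    rem Create the scope for the private cluster network
    netsh dhcp server add scope 10.0.0.0 255.255.0.0 "HPC Cluster" "Compute node addresses"
    rem Add the address range and exclude the statically assigned addresses
    netsh dhcp server scope 10.0.0.0 add iprange 10.0.0.1 10.0.1.254
    netsh dhcp server scope 10.0.0.0 add excluderange 10.0.0.1 10.0.0.9
    rem Set the router (003), DNS server (006), and DNS domain name (015) options
    netsh dhcp server scope 10.0.0.0 set optionvalue 003 IPADDRESS 10.0.0.1
    netsh dhcp server scope 10.0.0.0 set optionvalue 006 IPADDRESS 10.0.0.1
    netsh dhcp server scope 10.0.0.0 set optionvalue 015 STRING HPCCluster.local
    rem Activate the scope
    netsh dhcp server scope 10.0.0.0 set state 1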
Enable Remote Desktop for the cluster. You can enable Remote Desktop for nodes on your
cluster so that you can log on remotely and manage services by using the node's desktop.
To enable Remote Desktop for the domain
1. Click Start, click Administrative Tools, and then click Active Directory Users and
Computers.
2. Right-click your domain (for example, hpccluster.local), click New, and then click
Organizational Unit.
3. Type the name of your new OU (for example, Cluster Servers) and then click OK. A new
OU is created in your domain.
4. Right-click your OU and then click Properties. The OU Properties dialog appears. Click
the Group Policy tab. Click New. Type the name for your new Group Policy (for example,
Enable Remote Desktop) and then press ENTER.
5. Click Edit. The Group Policy Object Editor opens. Browse to Computer Configuration \
Administrative Templates \ Windows Components \ Terminal Services.
6. Double-click Allow users to connect remotely using Terminal Services. Click Enabled
and then click OK. Close the Group Policy Object Editor.
7. On the OU Properties page, on the Group Policy tab, select your new Group Policy and
then click Options. Click No Override, click OK. You have created a new Group Policy
for your OU that enables Remote Desktop. Click OK.
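For reference, the policy enabled above corresponds to the fDenyTSConnections registry value on each node. If you need to enable Remote Desktop on a single machine for testing before the Group Policy has applied, a minimal local equivalent is:

    rem Allow Remote Desktop connections on the local machine (0 = do not deny connections)
    reg add "HKLM\SYSTEM\CurrentControlSet\Control\Terminal Server" /v fDenyTSConnections /t REG_DWORD /d 0 /f

The domain-wide Group Policy remains the preferred approach, because it applies automatically to every node placed in the Cluster Servers OU.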

Step 2: Install and Configure ADS on the Service Node


ADS is used to install compute node images on new hardware with little or no input from the
cluster administrator. This automated procedure makes it easy to set up and install new nodes
on the cluster, or to replace failed nodes with new ones. To install and configure ADS, perform
the following procedures:
1. Copy and update the WinPE binaries.
2. Copy and edit the script files.
3. Install and configure ADS.
4. Share the ADS certificate.
5. Import ADS templates.
6. Add devices to ADS.
Copy and update the WinPE binaries. The WinPE binaries provide a simple operating system for the ADS Deployment Agent and the scripting engine to run scripts against the
node. Because the WinPE binaries are based on the installation files that are found on the
Windows Server 2003 CD, the driver cabinet files may not include the drivers for your Gigabit
Ethernet adapters. If your adapter is not recognized during installation and configuration of
your compute node image, you will need to update the WinPE binaries with the necessary
adapter drivers and information files.
Note: You can also wait to create the WinPE binaries until after you have installed and
configured ADS on the service node.
To copy and update the WinPE binaries
1. Create a C:\WinPE folder on your service node. Copy the WinPE binaries to C:\WinPE.
2. To update your WinPE binaries with the drivers and information files for your adapter,
create a C:\Drivers folder on your service node. Copy the .sys, .inf, and .cat files for your
driver to C:\Drivers.
3. Click Start, click Run, type cmd, and then click OK. A command prompt window opens.
4. Change directories to C:\WinPE\.
5. Type drvinst.exe /inf:c:\drivers\<filename>.inf c:\WinPE, where <filename> is the file name
for your driver's .inf file, and then press ENTER. Your WinPE binaries are now updated with the drivers for your Gigabit Ethernet adapter.
Copy and edit the script files. Copy the cluster configuration and deployment scripts, ADS task sequences, and Sysprep files to the service node, and then edit the input file with the values for your deployment, as described in the following procedure.
To copy and edit the script files
1. Create the folder C:\HPC-CCS. Create three new folders within the HPC-CCS folder: C:\HPC-CCS\Scripts, C:\HPC-CCS\Sequences, and C:\HPC-CCS\Sysprep. Then create the folder C:\HPC-CCS\Sysprep\I386.
2. Copy the files AddADSDevices.vbs, ChangeIPforIB.vbs, and AddComputeNodes.csv (or
the name of your input file) into C:\HPC-CCS\Scripts. Copy Capture-CCS-image-with-winpe.xml and Deploy-CCS-image-with-winpe.xml into C:\HPC-CCS\Sequences. Copy
sysprep.inf into C:\HPC-CCS\Sysprep.
3. Insert the Windows Server 2003 Compute Cluster Edition CD into the CD drive. Browse to
the CD folder \Support\Tools. Double-click Deploy.cab. Copy the files sysprep.exe and
setupcl.exe to the C:\HPC-CCS\Sysprep\I386 folder. You must use the 64-bit versions of
these files or the image capture script will not work.
4. Use the chart in Table 4 to edit the file AddComputeNodes.csv (or the name of your input
file) and use the values for your company, your administrator password information, your
product key, MAC addresses, and MachineOU values. The easiest way to work with this
file, especially for entering the MAC addresses, is to import it into Excel as a comma-delimited file, add the necessary values, and then export the data as a comma-separated
value file.
Install and configure ADS. You can download the ADS binaries from Microsoft, and then
either copy them to your service node or burn them onto a CD.


To install and configure ADS


1. Browse to the CD or the folder containing the ADS binaries and then run ADSSetup.exe.
2. A Welcome page appears. Click Install Microsoft SQL Server Desktop Engine SP4
(Windows). The setup program automatically installs the MSDE software.
3. On the Welcome page, click Install Automated Deployment Services. The Automated
Deployment Services Setup Wizard starts. Click Next.
4. On the License Agreement page, select I accept the terms of the license agreement,
and then click Next.
5. On the Setup Type page, select Full installation, and then click Next.
6. The Installing PXE warning dialog appears. Click OK, and then click Next.
7. On the Configure the ADS Controller page, make sure that Use Microsoft SQL Server
Desktop Engine (Windows) is selected, and that Create a new ADS database is
selected. Click Next.
8. On the Network Boot Service Settings page, make sure that Use this path is selected.
Insert the Windows Server 2003 R2 Enterprise Edition x86 CD into the drive. Browse to
the CD drive, or type the drive containing the CD, and then click Next.
9. On the Windows PE Repository page, select Location of Windows PE. Browse to the
folder containing the WinPE binaries (for example, C:\WinPE). In the Repository name
text box, type a name for your repository (for example, NodeImages). Click Next.
10. On the Image Location page, type the path to the folder where the images will be stored.
These must be on the second partition that you created on your server (for example,
E:\Images). The folder will be created and shared automatically. Click Next.
11. If ADS Setup Wizard detects more than one network adapter in your computer, the
Network Settings for ADS Services page is displayed. In the Bind to this IP address
drop-down list, select the IP address that the ADS services will use to distribute images
on the private network, and then click Next.
12. On the Installation Confirmation page, click Install.
13. On the Completing the Automated Deployment Services Setup Wizard page, click
Finish. Close the Automated Deployment Services Welcome dialog box.
14. To open the ADS Management console, click Start, click All Programs, click Microsoft
ADS, and then click ADS Management.
15. Expand the Automated Deployment Services node, and then select Services. In the
center pane, right-click Controller Services, and then click Properties. On the Controller
Service Properties page, select the Service tab, and then change Global job template to
boot-to-winpe. For the Device Identifier, select MAC Address. For the WinPE Repository
Name, type NodeImages or the repository name that you created earlier. Click Apply,
and then click OK.
16. In the ADS Management console, right-click Image Distribution Service, and then click
Properties. Select the Service tab, and ensure that Multicast image deployment is
selected. Click OK.
Share the ADS certificate. ADS creates a computer certificate when it is installed. This
certificate is used to identify all computers in the cluster. The certificate must be shared so that the compute node image can import the certificate and then use it during the
configuration process.
To share the ADS certificate
1. Click Start, click Administrative Tools, and then click Server Management. The Server
Management console opens.
2. Click Shared Folders, and then click New File Share. The Share a Folder Wizard starts.
Click Next.
3. On the Folder Path page, click Browse, and then browse to C:\Program Files\Microsoft ADS\Certificate. Click Next.
4. On the Name, Description, and Settings page, accept the defaults, and then click Next.
5. On the Permissions page, accept the defaults, and then click Finish. Click Close, and
then close the Server Management console. The ADS certificate is shared on your
network.
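The same share can also be created from a command prompt with net share. This is a minimal sketch that assumes the default ADS installation path and the share name Certificate:

    rem Share the ADS certificate folder so that the compute node image can import the certificate
    net share Certificate="C:\Program Files\Microsoft ADS\Certificate"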
Import ADS templates. ADS includes several templates that are useful when managing your
nodes, including reboot-to-winpe and reboot-to-hd. The templates are not installed by default;
you must add them to ADS using a batch file. You also need to add the compute cluster
templates to ADS so that you can capture and deploy the compute node image on your
network.
To import ADS templates
1. Open Windows Explorer and browse to C:\Program Files\Microsoft ADS\Samples\Sequences.
2. Double-click create-templates.bat. The script file automatically installs the templates in
ADS. Close Windows Explorer.
3. Click Start, click All Programs, click Microsoft ADS, and then click ADS Management.
The ADS Management console opens.
4. Browse to Job Templates. Right-click Job Templates, and then click New Job
Template. The New Job Template Wizard starts. Click Next.
5. On the Template Type page, select An entirely new template, and then click Next.
6. On the Name and Description page, type a name for the compute node capture
template (for example, Capture Compute Node). Type a description (for example, Run
within Windows Server CCE), and then click Next.
7. On the Command Type page, select Task sequence, and then click Next.
8. On the Script or Executable Program page, browse to C:\hpc-ccs\sequences. Select All
files from the Files of type drop-down list. Select Capture-CCS-image-with-winpe.xml,
and then click Open. Click Next.
9. On the Device Destination page, select None, and then click Next. Click Finish. Your
capture template is added to ADS.
10. Repeat steps 4 through 9. In step 6, use Deploy Compute Node and Run from WinPE as
the name and description. In step 8, select the file Deploy-CCS-image-with-winpe.xml.
When finished, you have added the deployment template to ADS.


Add devices to ADS. Use the AddADSDevices.vbs script with your input file to populate ADS with the compute node devices, as described in the following procedure.
To add devices to ADS
1. Populate the ADS server with ADS devices. Click Start, click Run, type cmd.exe, and
then click OK. Change the directory to C:\HPC-CCS\Scripts.
2. Type AddADSDevices.vbs AddComputeNodes-Sample.csv (use the name of your input
file instead of the sample file name). The script will echo the nodes as they are added to
the ADS server. When the script is finished, close the command window.
If your company uses a proxy server to connect to the Internet, you should configure your
server so that it can receive system and application updates from Microsoft.
1. To configure your proxy server settings, open Internet Explorer. Click Tools, and
then click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for
your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the
URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.
When you have finished configuring your server, click Start, click All Programs, and then
click Windows Update. This will ensure that your server is up-to-date with service packs and
software updates that may be needed to improve performance and security.

Step 3: Install and Configure the Head Node


The head node is responsible for managing the compute cluster nodes, performing job
control, and acting as the gateway for submitted and completed jobs. It requires SQL Server
2005 Standard Edition as part of the underlying service and support structure. You should
consider using three hard drives for your head node: one for the operating system, one for the
SQL Server database, and one for the SQL Server transaction logs. This will provide reduced
drive contention, better overall throughput, and some transactional redundancy should the
database drive fail.
In some cases, enabling hyperthreading on the head node will also result in improved
performance for heavily loaded SQL Server applications.
There are two tasks that are required for installing and configuring your head node:
1. Install and configure the base operating system.
2. Install and configure SQL Server 2005 Standard Edition.
To install and configure the base operating system
1. On the head node computer, boot to the Windows Server 2003 R2 Standard Edition
x64 CD.
2. Accept the license agreement.


3. On the Partition List screen, create two partitions: one partition of 30 GB, and a
second that uses the remainder of the space on the hard drive. Select the 30 GB
partition as the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER.
Proceed with the remainder of the text-mode setup. The computer then reboots into
graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then
configure the number of concurrent connections, if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the
head node (for example, HEADNODE). Type your local administrator password twice, and then press ENTER.
7. On the Networking Settings page, select Typical settings, and then click Next. This
will automatically assign addresses to your public and private adapters. If you want to
use static IP addresses for either interface, select Custom Settings, and then click
Next. Follow the steps that you used to configure your service node adapter settings.
8. On the Workgroup or Computer Domain page, select Yes, make this computer a
member of a domain. Type the name of your cluster domain (for example,
HPCCluster.local), and then click Next. When prompted, type the name and the
password for an account that has permission to add computers to the domain
(typically, the Administrator account), and then click OK. Note: If your network adapter
drivers are not included on the Windows Server 2003 CD, then you will not be able to
join a domain at this time. Instead, make the computer a member of a workgroup,
complete the rest of setup, install your network adapters, and then join your head
node to the domain.
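If you had to defer the domain join because your network adapter drivers were not on the CD (see the note in step 8), you can join the head node to the domain later from a command prompt by using netdom, which is installed with the Windows Server 2003 Support Tools. This is a minimal sketch that assumes the example names used in this guide and prompts for the password:

    rem Join the local machine to the cluster domain and reboot to complete the join
    netdom join HEADNODE /Domain:HPCCluster.local /UserD:HPCCLUSTER\administrator /PasswordD:* /REBoot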
When you have configured the base operating system, you can install SQL Server 2005
Standard Edition on your head node.
To install and configure SQL Server 2005 Standard Edition
1. Log on to your server as Administrator. Insert the SQL Server 2005 Standard Edition x64
CD into the head node. If setup does not start automatically, browse to the CD drive and
then run setup.exe.
2. On the End User License Agreement page, select I accept the licensing terms and
conditions, and then click Next.
3. On the Installing Prerequisites page, click Install. When the installations are complete,
click Next. The Welcome to the Microsoft SQL Server Installation Wizard starts. Click
Next.
4. On the System Configuration Check page, the installation program displays a report
with potential installation problems. You do not need to install IIS or address any IIS-related warnings because IIS is not used in this deployment. Click Next.
5. On the Registration Information page, complete the Name and Company fields with the
appropriate information, and then click Next.
6. On the Components to Install page, select all check boxes, and then click Next.


7. On the Instance Name page, select Named instance, and then type
COMPUTECLUSTER in the text box. The SQL Server instance must have this name, or Windows Compute Cluster Server will not work. Click Next.
8. On the Service Account page, select Use the built-in System account, and then select
Local system in the drop-down list. In the Start services at the end of setup section,
select all options except SQL Server Agent, and then click Next.
9. On the Authentication Mode page, select Windows Authentication Mode. Click Next.
10. On the Collation Settings page, select SQL collations, and then select Dictionary
order case-insensitive for use with 1252 Character Set from the drop-down list. Click
Next.
11. On the Error and Usage Report Settings page, click Next.
12. On the Ready to Install page, click Install. When the Setup Progress page appears,
click Next.
13. On the Completing Microsoft SQL Server 2005 Setup page, click Finish.
14. Open the Disk Management console. Click Start, click Run, type diskmgmt.msc, and then
click OK.
15. Right-click the second partition on your drive, and then click Format. In the Format dialog
box, select Quick Format, and then click OK. When the format process finishes, close
the Disk Management console.
If your company uses a proxy server to connect to the Internet, you should configure your
head node so that it can receive system and application updates from Microsoft.
1. To configure your proxy server settings, open Internet Explorer. Click Tools, and then
click Internet Options.
2. Click the Connections tab, and then click LAN Settings.
3. On the Local Area Network (LAN) Settings page, select Use a proxy server for
your LAN. Enter the URL or IP address for your proxy server.
4. If you need to configure secure HTTP settings, click Advanced, and then enter the
URL and port information as needed.
5. Click OK three times, and then close Internet Explorer.
When you have finished configuring your server, click Start, click All Programs, and then
click Windows Update. This will ensure that your server is up-to-date with service packs and
software updates that may be needed to improve performance and security. You should elect
to install Microsoft Update from the Windows Update page. This service provides service
packs and updates for all Microsoft applications, including SQL Server. Follow the instructions
on the Windows Update page to install the Microsoft Update service.

Step 4: Install the Compute Cluster Pack


When the head node has been configured, you can install the Compute Cluster Pack that
contains services, interfaces, and supporting software that is needed to create and configure
cluster nodes. It also includes utilities and management infrastructure for your cluster.


To install the Compute Cluster Pack


1. Insert the Compute Cluster Pack CD into the head node. The Microsoft Compute
Cluster Pack Installation Wizard appears. Click Next.
2. On the Microsoft Software License Terms page, select I accept the terms in the
license agreement, and then click Next.
3. On the Select Installation Type page, select Create a new compute cluster with this
server as the head node. Do not use the head node as a compute node. Click Next.
4. On the Select Installation Location page, accept the default. Click Next.
5. On the Install Required Components page, a list of required components for the
installation appears. Each component that has been installed will appear with a check
next to it. Select a component without a check, and then click Install.
6. Repeat the previous step for all uninstalled components. When all of the required
components have been installed, click Next. The Microsoft Compute Cluster Pack
Installation Wizard completes. Click Finish.

Step 5: Define the Cluster Topology


After the Compute Cluster Pack installation for the head node is complete, a Cluster
Deployment Tasks window appears with a To Do List. In this procedure, you will configure the
cluster to use a network topology that consists of a single private network for the compute
nodes and a public interface from the head node to the rest of the network.
To define the cluster topology
1. On the To Do List page, in the Networking section, click Configure Cluster Network
Topology. The Configure Cluster Network Topology Wizard starts. Click Next.
2. On the Select Setup Type page, select Compute nodes isolated on private network
from the drop-down list. A graphic appears that shows you a representation of your
network. You can learn more about the different network topologies by clicking the Learn
more about this setup link. When you have reviewed the information, click Next.
3. On the Configure Public Network page, select the correct public (external) network
adapter from the drop-down list. This network will be used for communicating between the
cluster and the rest of your network. Click Next.
4. On the Configure Private Network page, select the correct private (internal) adapter
from the drop-down list. This network will be used for cluster management and node
deployment. Click Next.
5. On the Enable NAT Using ICS page, select Disable Internet Connection Sharing for
this cluster. Click Next.
6. Review the summary page to ensure that you have chosen an appropriate network
configuration, and then click Finish. Click Close.

Step 6: Create the Compute Node Image


You can now create a compute node image. This is the compute node image that will be
captured and deployed to each of the compute nodes. There are three tasks that are required
to create the compute node image:


1. Install and configure the base operating system.


2. Install and configure the ADS agent and Compute Cluster Pack.
3. Update the image and prepare it for deployment.
To install and configure the base operating system
1. Start the node that you want to use to create your compute node image. Insert the Microsoft Windows Server 2003 Compute Cluster Edition CD into the CD drive. Text-mode setup launches automatically.
2. Accept the license agreement.
3. On the Partition List screen, create one partition of 16 GB. Select the 16 GB partition as the install partition, and then press ENTER.
4. On the Format Partition screen, accept the default of NTFS, and then press ENTER. Proceed with the remainder of the text-mode setup. The computer then reboots into graphical setup mode.
5. On the Licensing Modes page, select the option for which you are licensed, and then configure the number of concurrent connections, if needed. Click Next.
6. On the Computer Name and Administrator Password page, type a name for the compute node that has not been added to ADS (for example, NODE000). Type your local administrator password twice, and then press ENTER.
7. On the Networking Settings page, select Typical settings, and then click Next. This will automatically assign addresses to your public and private adapters. The adapter information for the deployed nodes will be automatically created when the image is deployed to a node.
8. On the Workgroup or Computer Domain page, select Yes, make this computer a member of a domain. Type the name of your cluster domain (for example, HPCCluster), and then click Next. When prompted, type the name and the password for an account that has permission to add computers to the domain (for example, hpccluster\administrator), and then click OK. The computer will copy files, and then reboot. Note: If your network adapter drivers are not included on the Windows Server 2003 Compute Cluster Edition CD, you will not be able to join a domain at this time. Instead, make the computer a member of a workgroup, complete the rest of setup, install your network adapters, and then join your compute node to the domain.
9. Log on to the node as administrator.
10. Copy the QFE files to your compute node. Run each executable and follow the instructions for installing the quick fix files on your server.
11. Open Regedit. Click Start, click Run, type regedit, and then click OK.
12. Browse to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. Right-click in the right pane. Click New, and then click DWORD Value. Type SynAttackProtect (case sensitive), and then press ENTER.
13. Double-click the new value that you just created. Confirm that the value data is zero, and then click OK.
14. Right-click in the right pane. Click New, and then click DWORD Value. Type TcpMaxDataRetransmissions (case sensitive), and then press ENTER.
15. Double-click the new value that you just created. In the Value data text box, type 20. Ensure that Base is set to Hexadecimal, and then click OK.
16. Close Regedit.
17. Disable any network interfaces that will not be used by the cluster, or that do not have physical network connectivity.
When you have configured the base operating system, you can then install and configure the
ADS Agent and the Compute Cluster Pack on your image.
To install and configure the ADS Agent and Compute Cluster Pack
1. Copy the ADS binaries to a folder on the compute node. Browse to the folder, and then
run ADSSetup.exe.
2. A Welcome page appears. Click Install ADS Administration Agent. The Administration
Agent Setup Wizard starts. Click Next.
3. On the License Agreement page, select I accept the terms of the license agreement,
and then click Next.
4. On the Configure Certificates page, select Now. Type the fully qualified path to the certificate share on the service node (for example, \\servicenode\Certificate\adsroot.cer). Click Next.
5. On the Configure the Agent Logon Settings page, select None, and then click Next.
6. On the Installation Confirmation page, click Install.
7. On the Completing the Administration Agent Setup Wizard page, click Finish. Close
the Automated Deployment Services Welcome page.
8. Insert the Compute Cluster Pack CD into the compute node. The Microsoft Compute Cluster Pack Installation Wizard appears. Click Next.
9. On the Microsoft Software License Terms page, select I accept the terms in the
license agreement, and then click Next.
10. On the Select Installation Type page, select Join this server to an existing compute
cluster as a compute node. Type the name of the head node in the text box (for
example, HEADNODE). Click Next.
11. On the Select Installation Location page, accept the default. Click Next.
12. On the Install Required Components page, a list of required components for the
installation appears. Each component that has been installed will appear with a check
next to it. Select a component without a check, and then click Install.
13. Repeat the previous step for all uninstalled components. When all of the required components have been installed, click Next. When the Microsoft Compute Cluster Pack Installation Wizard completes, click Finish.
When you have installed and configured the ADS Agent and the Compute Cluster Pack, you can update your image with the latest service packs, and then prepare your image for deployment.


To update the image and prepare it for deployment


1. Run the Windows Update service on your compute node. If your cluster lies behind a
proxy server, configure Internet Explorer with your proxy server settings. For information
on how to do this, see Step 1: Install and Configure the Service Node, earlier in this
guide.
2. Run the Disk Cleanup utility. Click Start, click All Programs, click Accessories, click
System Tools, and then click Disk Cleanup. Select the C: drive, and then click OK.
Select all of the check boxes, and then click OK. When the cleanup utility is finished,
close the utility.
3. Run the Disk Defragmenter utility. Click Start, click All Programs, click Accessories,
click System Tools, and then click Disk Defragmenter. Select the C: drive, and then
click Defragment. When the defragmentation utility is finished, close the utility.
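If you prefer to run these maintenance steps from a command prompt (for example, when preparing the image unattended), both utilities have command-line forms. This is a sketch under the assumption that cleanmgr.exe and defrag.exe are present on your Windows Server 2003 Compute Cluster Edition build; verify the switches before relying on them:

rem Choose the cleanup options once (interactive), then run the saved profile
cleanmgr /sageset:1
cleanmgr /sagerun:1
rem Defragment the system volume
defrag c: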

Step 7: Capture and Deploy Image to Compute Nodes


You can now capture the compute node image that you just created. You can then deploy the
image to compute nodes on your cluster.
To capture the compute node image
1. If the compute node is not running, turn on the computer and wait for the node to boot into
Windows Server 2003 Compute Cluster Edition.
2. Log on to the service node as administrator. Click Start, and then click ADS
Management. Right-click Devices, and then click Add Device.
3. In the Add Device dialog box, type a name in the Name text box (for example, Node000), a description for your node (for example, Compute Node Image), and then type the MAC address for the node that is running the compute node image (a command-line way to read the MAC address is shown after this procedure). Click OK. The status pane will indicate that the node was created successfully. Click Cancel to close the dialog box.
4. Right-click your compute node name. Click Properties, and then click the User Variables
tab.
5. Click Add. In the Variables dialog box, in the Name text box, type Imagename. In the
Value text box, type a name for your image (for example, CCSImage). Click OK twice.
6. Right-click the compute node device again, and then click Properties. In the WinPE
repository name text box, type the name for your repository that you defined when you
installed ADS (for example, NodeImages). Click Apply, and then click OK.
7. Right-click the compute node that you just added, and then click Take Control.
8. Right-click the compute node device again, and then click Run job. The Run Job Wizard
starts. Click Next.
9. On the Job Type page, select Use an existing job template, and then click Next.
10. On the Template Selection page, select Capture Compute Node. Click Next.
11. On the Completing the Run Job Wizard page, click Finish. A Created Jobs dialog box
appears. Click OK. The ADS Agent on your compute node runs the job, using Sysprep to
prepare and configure the node image, and then using the ADS image capture functions
to create and copy the image to ADS. When the image capture is complete, the node
boots into WinPE.
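As referenced in step 3, if you do not have the MAC address of the image node handy, you can read it from the running node before adding the device. This is a sketch using standard Windows Server 2003 tools; the adapter names shown in the output will vary with your hardware:

rem List each adapter and its MAC (physical) address
getmac /v /fo list
rem Or look for the Physical Address entries in the full IP configuration
ipconfig /all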


Deploy the image to nodes on the cluster. When you have captured the compute node
image to the service node, you can deploy the image to compute nodes on the cluster.
To deploy the image to nodes on the cluster
1. Log on to the service node as administrator. Click Start, click All Programs, click
Microsoft ADS, and then click ADS Management.
2. Expand the Automated Deployment Services node, and then select Devices.
3. Select all devices that appear in the right pane, right-click on the selected devices,
and then select Take Control. The Control Status changes to Yes.
4. Right-click on the devices, and then click Run job.
5. The Run Job Wizard appears. Click Next.
6. On the Job Type page, select Use an existing job template. Click Next.
7. On the Template Selection page, select boot-to-winpe. Click Next.
8. On the Completing the Run Job Wizard page, click Finish.
9. Boot the compute nodes. The network adapters should already be configured to use
PXE and obtain the WinPE image from the service node. To avoid overwhelming the
ADS server during unicast deployment of the WinPE image, it is recommended that you
boot only four nodes at a time. Boot each subsequent set of four nodes only after all of
the previous sets show Connected to WinPE status in the ADS Management window on
the service node.
10. After all the nodes are connected to WinPE, you can deploy the compute node image
to those nodes. Right-click the devices, and then click Run job.
11. The Run Job Wizard appears. Click Next.
12. On the Job Type page, select Use an existing job template. Click Next.
13. On the Template Selection page, select Deploy CCS Image. Click Next.
14. On the Completing the Run Job Wizard page, click Finish. The nodes automatically
download and run the image. This task will take a significant amount of time,
especially when you are installing hundreds of nodes. Depending on your available
staff, you may want to run this as an overnight task. When finished, your nodes are
joined to the domain and ready to be managed by the head node.

Step 8: Configure and Manage the Cluster


The head node is used to manage and maintain your cluster once the node images have
been deployed. The Compute Cluster Pack includes a Compute Cluster Administrator console
that simplifies management tasks, including approving nodes on the cluster and adding users
and administrators to the cluster. The console includes a To Do List that shows you which
tasks have been completed. Follow these steps to configure and manage your cluster:
1. Disable Windows Firewall on all nodes in the cluster.
2. Approve nodes that have joined the cluster.
3. Add users and administrators to the cluster.


Disable Windows Firewall on all nodes on the cluster. The Compute Cluster Administrator
console enables you to define how the firewall is configured on all cluster node network
adapters. For best performance on large-scale deployments, it is recommended that you
disable Windows Firewall on all interfaces.
To disable Windows Firewall on all nodes on the cluster
1. Click Start, click All Programs, click Microsoft Compute Cluster Pack, and then click
Compute Cluster Administrator.
2. Click the To Do List. In the Networking section in the results pane, click Manage
Windows Firewall Settings. The Manage Windows Firewall Wizard starts. Click Next.
3. On the Configure Firewall page, select Disable Windows Firewall, and then click Next.
4. On the View Summary page, click Finish. On the Result page, click Close. When
compute nodes are approved to join the cluster, the firewall will be disabled.
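After compute nodes join and are approved, you can spot-check the firewall state across the cluster from the head node. This is a sketch that assumes clusrun (installed with the Compute Cluster Pack) and the netsh firewall context available in Windows Server 2003 SP1:

rem Show the Windows Firewall operational mode on every node
clusrun /all netsh firewall show opmode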
Approve nodes that have joined the cluster. When you deploy Compute Cluster Edition
nodes, they have joined the cluster but have not been approved to participate or process any
jobs. You must approve them before they can receive and process jobs from your users.
To approve nodes that have joined the cluster
1. Open the Compute Cluster Administrator console. Click Node Management.
2. In the results pane, select one or more nodes that display a status of Pending Approval.
3. In the task pane, click Approve. You can also right-click the selected nodes and then click
Approve.
4. When the nodes are approved, the status changes to Paused. You can leave the nodes in
Paused status, or in the task pane you can click Resume to enable the node to receive
jobs from your users.
Add users and administrators to your cluster. In order to use and maintain the cluster, you
must add cluster users and administrators to your cluster domain. This will make it possible
for others to submit jobs to the cluster, and to perform routine administration and maintenance
on the cluster. If your organization uses Active Directory, you will need to create a trust
relationship between your cluster domain and other domains in your organization. You will
also need to create organizational units (OUs) in your cluster domain that will act as
containers for other OUs or users from your organization. You may need to work with other
groups in your company to create the necessary security groups so that you can add users
from other domains to your compute cluster domain. Because each organization is unique, it
is not possible to provide step-by-step instructions on how to add users and administrators to
the cluster domain. For help and information on how best to add users and administrators to
your cluster, see Windows Server Help.
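As one illustration of the trust relationship mentioned above, a trust between the cluster domain and a corporate domain can be created with the netdom tool from the Windows Support Tools. This is a hedged sketch only; HPCCLUSTER and corp.example.com are placeholder domain names, and you should confirm the exact netdom syntax and the credentials required in your environment, and coordinate with the administrators of the other domain first:

rem Create a two-way trust between the cluster domain and the corporate domain
netdom trust HPCCLUSTER /d:corp.example.com /add /twoway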
To add users and administrators to your cluster
1. In the Compute Cluster Administrator, click the To Do List. In the results pane, click
Manage Cluster Users and Administrators. The Manage Cluster Users Wizard starts.
Click Next.
2. On the Cluster Users page, the default group of HPCCLUSTER\Domain Users has been
added for you. Type a user or group by using the format domain\user or domain\group,
and then click Add. You can add or remove users and groups using the Add and


Remove buttons. When you have finished adding or removing users and groups, click
Next.
3. On the Cluster Administrators page, the default group of HPCCLUSTER\Domain
Admins has been added for you. Type a user or group using the format domain\user or
domain\group, and then click Add. You can add or remove users and groups by using the
Add and Remove buttons. When you have finished adding or removing users and
groups, click Next.
4. On the View Summary page, click Next.
5. On the Result page, click Close.

Step 9: Deploy the Client Utilities to Cluster Users


The Compute Cluster Administrator and the Compute Cluster Job Manager are installed on
the head node by default. If you install the client utilities on a remote workstation, an
administrator can manage clusters from that workstation. If you install the Compute Cluster
Administrator or Job Manager on a remote computer, the computer must have one of the
following operating systems installed:

Windows Server 2003, Compute Cluster Edition

Windows Server 2003, Standard x64 Edition

Windows Server 2003, Enterprise x64 Edition

Windows XP Professional x64 Edition

Windows Server 2003 R2 Standard x64 Edition

Windows Server 2003 R2 Enterprise x64 Edition

In addition, Windows Compute Cluster Server 2003 requires the following:

Microsoft .NET Framework 2.0

Microsoft Management Console (MMC) 3.0 to run the Compute Cluster Administrator
snap-in

SQL Server 2000 Desktop Engine (MSDE) to store all job information

The last step in the Windows Compute Cluster Server 2003 deployment process is to create
an administrator or operator console.
To deploy the client utilities
1. On the workstation that is running the appropriate operating system, insert the
Compute Cluster Pack CD. The Microsoft Compute Cluster Pack Installation
Wizard is automatically launched. Click Next.
2. On the Microsoft Software License Terms page, select I accept the terms in the
license agreement, and click Next.
3. On the Select Installation Type page, select Install only the Microsoft Compute
Cluster Pack Client Utilities for the cluster users and administrators, and then
click Next.
4. On the Select Installation Location page, accept the default location, and then click
Next.
5. On the Install Required Components page, highlight any components that are not
installed, and then click Install.


6. When the installation is finished, a window appears that says Microsoft Compute
Cluster Pack has been successfully installed. Click Finish.
Please note that for an administration console, you should install only the client utilities. For a
development workstation, you should install both the software development kit (SDK) utilities
and the client utilities.
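Once the client utilities are installed, a quick way to confirm that a workstation can reach the cluster is to submit a trivial job with the job command that the Compute Cluster Pack installs. This is a sketch; the output share path is a placeholder, and you may need to point the client tools at your head node (for example, HEADNODE) as described in the Compute Cluster Pack documentation:

job submit /numprocessors:1 /stdout:\\HEADNODE\Public\smoketest.txt hostname

If the job completes and the output file contains a compute node name, the client installation is working.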


Appendix A: Tuning your Cluster


Each cluster is created with a different goal in mind; therefore, there is a different way to tune
each cluster for optimal performance. However, some basic guidelines can be established. To
achieve performance improvements, you can do some planning, but testing will also be
crucial. For testing, it is important to use applications and data that are as close as possible to
the ones that the cluster will ultimately use. In addition to the specific use of the cluster, its
projected size will be another basis for making decisions. After you deploy the applications,
you can work on tuning the cluster appropriately.
The best networking solution will depend on the nature of your application. Although there are many different types of applications, they can be broadly categorized as message-intensive and embarrassingly parallel. In message-intensive applications, each node's job is dependent on other nodes. In some situations, data is passed between nodes in many small messages, meaning that latency is the limiting factor. With latency-sensitive applications, high-performance networking interfaces, such as Winsock Direct, are crucial. In addition, the use of high-quality routers and switches can improve performance with these applications.
In some messaging situations, large messages are passed infrequently, meaning that bandwidth is the limiting factor. A specialty network, such as InfiniBand or Myrinet, will meet these high-bandwidth requirements. If network latency is not an issue, a gigabit Ethernet network adapter might be the best choice.
In embarrassingly parallel applications, each node processes data independently, with little message passing. In this case, the total number of nodes and the efficiency of each node are the limiting factors. It is important to be able to fit the entire dataset into RAM; this results in much faster performance because the data does not have to be paged in and out from the disk during processing. The speed of the processors and the type and number of nodes are a prime concern. Dual-core or quad-core processors may not be as efficient as separate processors that each have their own memory bus. In addition, if hyper-threading is available, it may be advantageous to turn this feature off. Hyper-threading helps when applications are not using all CPU cycles, because idle cycles on a physical processor can be shared with a second logical processor. In high-performance computing, however, all CPU resources are typically in use, so placing two processes on the same physical processor has the opposite effect: the processes have to wait for resources. Hyper-threading is therefore generally bad for high-performance computing applications, but not necessarily for all of them. As long as the operating system kernel is hyperthread-aware, floating-point-intensive processes will be balanced across physical cores, and for multi-threaded applications that mix I/O-intensive threads with floating-point-intensive threads, hyper-threading could be a benefit. Hyper-threading was disabled at NCSA because none of the applications were floating-point intensive, and no specific thread balancing or kernel tuning was performed. If an application were perfectly parallel, each extra node would increase performance linearly.
For each application, there is a maximum number of processors beyond which adding processors no longer increases performance, and may even decrease it. This is referred to as application scaling. Depending on the system architecture, all cores may share the available memory bandwidth, and they always share the network bandwidth. One of these three resources (CPU, network, or memory bandwidth) is the performance bottleneck for any application. If the nature of the application(s) is known, you can determine in advance the optimal cluster specifications to match the application. Work with your application vendor to ensure that you have the optimal number of processors.
In some applications, many jobs are run, each of short duration. With this scenario, the
performance of the job scheduler is crucial. The CCS job scheduler was designed to handle
this situation.
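For example, a batch of short jobs can be submitted in quick succession from the head node with a simple loop, which exercises the scheduler in the way described above. This is a sketch; myapp.exe is a placeholder for your own short-running executable:

for /L %i in (1,1,100) do job submit /numprocessors:1 myapp.exe

(In a batch file, use %%i instead of %i.)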


When evaluating cluster performance, it is important to be aware that benchmarks don't always tell the whole story. You must evaluate the performance based on your own needs and expectations. Evaluation should take place by using the application along with the data that will be running on the cluster. This will help to ensure a more accurate evaluation that will result in a system that better meets your needs.
For more information on cluster tuning, you can download the Performance Tuning a Compute Cluster white paper from the Microsoft Web site at http://go.microsoft.com/fwlink/?LinkId=87828.
You can also find additional tips and new information on performance tuning at the HPC Team blog: http://windowshpc.net/blogs.
Table 5 deals with scalability and will help you make decisions based on the intended size of your cluster. The first part focuses on management scenarios, while the second part focuses on networking scenarios. For each scenario, there is an estimated number of nodes above which the scenario will manifest itself. If your cluster exceeds the specified number of nodes, you may need to use the Note column to plan accordingly, or to troubleshoot.

Table 5: Scalability Considerations

Scenario: MSDE on the head node supports 8 or fewer concurrent connections.
Nodes: 64+
Note: Use SQL Server 2005 on the head node (hard coded): 5-7 tables for the scheduler, 8 tables for SDM.

Scenario: RIS on the service node supports only 80 machines simultaneously.
Nodes: 64+
Note: Use ADS for CCS 2003 (ADS requires 32-bit).

Scenario: ICS/NAT has an address range limit of 192.168.0.*.
Nodes: 250+
Note: Use DHCP Server instead.

Scenario: The file server on the head node only supports a limited number of simultaneous SMB/NTFS connections.
Nodes: 24
Note: Place executables on the compute nodes, or increase the number of connections that the file server on the head node can support (see KB 317249).

Scenario: The DC/DNS server on the head node is not optimal; it does not handle several NICs well.
Nodes: 64+
Note: It is best to leverage a corporate IT domain controller, or put DC/DNS on a separate management node.

Scenario: ADS loses contact with compute nodes after Winsock Direct has been enabled.
Nodes: N/A
Note: Use clusrun or jobs to control the machine. If IPMI is available, use IPMI to reboot the machine into WinPE. WDS, for the next version of CCS, works with Winsock Direct.

Scenario: The Cisco IB switch subnet manager is incompatible with openIB drivers.
Nodes: N/A
Note: Use openSM: disable the Cisco IB switch subnet manager, and then enable openSM.

Scenario: An SDM update bottleneck exists.
Nodes: 64+
Note: CCS V1 SP1.

Scenario: Job scheduling bottlenecks exist.
Nodes: 64+
Note: CCS V1 SP1.

Scenario: Winsock Direct (large scale only).
Nodes: 64+
Note: CCS V1 SP1; Winsock Direct hotfixes 910481, 927620, and 924286.

Scenario: InfiniBand drivers (large scale only).
Nodes: 64+
Note: This is fixed in OpenFabrics build 459, found at http://windows.openib.org/downloads/binaries/.

Scenario: There is a bottleneck in the number of possible simultaneous connections with the code path used when SYN attack protection is on.
Nodes: 64+
Note: Disable the SYN attack protection registry value: HKLM\System\CurrentControlSet\Services\Tcpip\Parameters, SynAttackProtect = 0.

Scenario: There are TCP timeouts on calling nodes when the network is jammed (delay at the switch), for example during an MPI all-reduce.
Nodes: 64+
Note: Set the TCP retransmission count to 0x20. Note that this is hard to diagnose, as one-to-all makes different nodes fail.

Scenario: Latency is too high.
Nodes: N/A
Note: Use mpiexec -env IBWSD_POLL 500 linpack.

Scenario: Bandwidth is too low.
Nodes: N/A
Note: Use mpiexec -env MPICH_SOCKET_SBUFFER_SIZE 0 to avoid a copy on send and improve bandwidth. Only use this when Winsock Direct is enabled; it can cause a lockup with GigE and IPoIB.

Scenario: A Winsock Direct connection timeout exists.
Nodes: N/A
Note: Use mpiexec -env IBWSD_SA_TIMEOUT 1000 to set the subnet manager timeout to a higher value during Winsock Direct connection establishment.


Appendix B: Troubleshooting Your Cluster


Table 6 can help you troubleshoot problems with your cluster.

Table 6: Troubleshooting

Application Hangs

Issue: N/A
Mitigation: For ease of debugging, switch off shared memory communication by using the environment variable MPICH_DISABLE_SHM=1. This is done from the command line with the command mpiexec -env VARIABLE SETTING -env OTHERVARIABLE OTHERSETTING. You can also set up WinDbg for just-in-time debugging with the command windbg -I.
Details: MPI environment variable MPICH_DISABLE_SHM. If you disable shared memory, MSMPI stops using shared memory for communication between processes on the same node and uses only the network (node-to-node) path.

Application Fails

Issue: Network connectivity issue.
Mitigation: The last line of the MPI output file gives you information on network errors. Note: The stdout output is located where you route it in your job; that is, it is specified by the /stdout: switch to job submit.
Details: Output file.

Issue: SYN protection interferes with connectivity under heavy load.
Mitigation: Turn SYN protection off completely on all compute nodes (leave SYN protection active on the head node to avoid denial-of-service attacks).
Details: Registry setting for SYN attack protection: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters\SynAttackProtect=0. To deploy this setting to all nodes:
clusrun /all reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v SynAttackProtect /t REG_DWORD /d 0 /f
clusrun /all shutdown -t 10 -r -f

Issue: Network connectivity failure.
Mitigation: Identify the node with the defect. Use Pallas ping-pong, one-to-all, and all-to-all tests.
Details: A good set of tools for this are the Linux-based Intel MPI benchmarks (based on the Pallas test suite). These are available for download from http://windowshpc.net/files/4/porting_unix_code/entry373.aspx. Note: Because these tests are Linux-based, you will have to port them to CCS using the Subsystem for UNIX-based Applications (SUA). Instructions on how to do this are included with the download.

Issue: Winsock Direct issues.
Mitigation: Disable Winsock Direct (WSD) and use the IPoIB path instead of RDMA:
clusrun /all \\HEADNODE\IBDriverInstallPath\net\amd64\installsp -r
If the application works when WSD is disabled, try to repair the IB connections (clusrun netsh interface set interface name=MPI admin=DISABLE/ENABLE), and validate that your cluster has the latest Winsock Direct patches.
Details: IB driver and Winsock Direct installation utility.

Application Performance Not Optimal

Issue: Application not optimized for memory or CPU utilization.
Mitigation: Check whether nodes are paging instead of using RAM. Check CPU utilization.
Details: Use perfmon counters: http://go.microsoft.com/fwlink/?LinkId=86619

Issue: Application doesn't scale to a large number of nodes.
Mitigation: Decrease the number of nodes used by the application until application performance comes back to the expected level.
Details: N/A

Issue: MSMPI does not balance well between same-node (processor-to-processor) communication and node-to-node communication.
Mitigation: Experiment with disabling the shared memory setting and see whether application performance improves. This is especially relevant for message-intensive applications.
Details: MPICH_DISABLE_SHM.

Issue: Messages are not coming in fast enough.
Mitigation: GigE: Experiment with turning off interrupt modulation to free up CPU. IB: Experiment with increasing the polling of messages. Polling causes high CPU usage, so if usage is too high, it will be detrimental to the application's computing CPU needs.
Details: openIB driver, IBWSD_POLL.

Issue: Connectivity to one or more nodes on the cluster is lost.
Mitigation: Divide the cluster into subsets of nodes. Run Pallas ping-pong, one-to-all, or all-to-all on those subsets.
Details: Intel MPI benchmarks. This strategy breaks the cluster into subclusters to try to find where the issue is. In each subcluster, run sanity tests such as the Pallas series in order to discover which subcluster contains the bad node.

Issue: Switch oversubscription is not optimal.
Mitigation: Try a higher number of uplinks.
Details: This strategy involves checking the number of uplinks/downlinks per switch to see whether this is the cause of poor application performance.

Issue: Send operation.
Mitigation: Experiment with having no extra copy on the Send operation.
Details: MSMPI setting: set MPICH_SOCKET_SBUFFER_SIZE to 0. This is done on the command line with the command mpiexec -env VARIABLE SETTING -env OTHERVARIABLE OTHERSETTING. Note: This will lead to higher bandwidth but also to higher CPU utilization. Note: Use this only when compute nodes are fitted with a WSD-enabled driver. Using a setting of 0 will cause the compute nodes on non-WSD networks to stop responding.

Issue: Memory bus bottleneck.
Mitigation: Experiment with setting processor affinity (assign an MPI process to a specific CPU or CPU core).
Details: An example of doing this from the command line:
job submit /numprocessors:12 mpiexec /cmd /c setAffinity.bat myapp.exe
where setAffinity.bat consists of:
@echo off
set /a AFFINITY="1 << (%PMI_RANK% %% %NUMBER_OF_PROCESSORS%)"
echo affinity is %AFFINITY%
start /wait /b /affinity %AFFINITY% %*


Appendix C: Cluster Configuration and Deployment Scripts
These scripts are used to automatically add nodes to the cluster and to deploy images to
nodes automatically without administrator intervention.
AddADSDevices.vbs. Parses an input file and uses the data to automatically
populate ADS with the correct compute node information, including node names and
MAC address values. These values are later used by Sysprep.exe to configure the
node images during deployment.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb01.mspx
AddComputeNodes.csv. Sample input file that shows configuration information
needed for adding nodes to the cluster. The easiest way to work with this file is to
import it into Excel as a comma-delimited file, add the necessary values, including
compute node MAC addresses, and then export the data as a comma-separated
value file. Every item must have an entry or the input file will not work properly. If you
do not have a value for a field, use a hyphen - for the field instead.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/node/ccnovb11.mspx
Capture-CCS-image-with-winpe.xml. ADS job template that captures a compute
node image for later deployment to nodes on the cluster.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb04.mspx
Deploy-CCS-image-with-winpe.xml. ADS job template that deploys a compute node
image to compute nodes on the cluster.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb02.mspx
Sysprep.inf. Generic configuration file for use with Sysprep.exe. Variable values are
retrieved from ADS during the image deployment process.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb05.mspx
The original high-performance compute cluster used additional scripts specific to its
environment, including configuring InfiniBand networking. If you have similar needs, you can
use these examples as a foundation for creating your own scripts and job templates.
ChangeIPforIB.vbs. Original script to configure IP over InfiniBand networking.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb03.mspx
Capture-image-with-winpe.xml. Original job template to capture compute node
image.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb06.mspx
Deploy-image-on-16GB-with-winpe.xml. Original job template to deploy a compute
node image to the compute nodes.
http://www.microsoft.com/technet/scriptcenter/scripts/ccs/deploy/ccdevb07.mspx


Related Links
For more information about Windows Compute Cluster Server 2003 and high-performance
computing, visit the Windows high-performance computing Web site at
http://www.microsoft.com/hpc
For more information on scripts, visit Scripting for Compute Cluster Server at
http://www.microsoft.com/technet/scriptcenter/hubs/ccs.mspx.
For more information on the Windows Server 2003 family, visit the Windows Server 2003 Web
site at http://www.microsoft.com/windowsserver2003
For information on obtaining professional assistance with planning and deploying a cluster,
visit Microsoft Partner Solutions Center at
http://www.microsoft.com/serviceproviders/busresources/mpsc.mspx

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the
date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment
on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This white paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS
DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this
document may be reproduced, stored in, or introduced into a retrieval system, or transmitted in any form or by any means (electronic,
mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft
Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in
this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not
give you any license to these patents, trademarks, copyrights, or other intellectual property.
© 2007 Microsoft Corporation. All rights reserved.
Microsoft, Active Directory, Internet Explorer, Visual Studio, Windows, the Windows logo, and Windows Server are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
All other trademarks are property of their respective owners.



Step-by-Step Guide to Installing Cluster Service


This step-by-step guide provides instructions for installing Cluster service on servers running the
Windows 2000 Advanced Server and Windows 2000 Datacenter Server operating systems. The guide
describes the process of installing the Cluster service on cluster nodes. It is not intended to explain how to install cluster applications. Rather, it guides you through the process of installing a typical, two-node cluster itself.
On This Page
Introduction
Checklists for Cluster Server Installation
Cluster Installation
Install Cluster Service software
Verify Installation
For Additional Information
Appendix A

Introduction
A server cluster is a group of independent servers running Cluster service and working collectively as a
single system. Server clusters provide high-availability, scalability, and manageability for resources
and applications by grouping multiple servers running Windows 2000 Advanced Server or Windows
2000 Datacenter Server.
The purpose of server clusters is to preserve client access to applications and resources during failures
and planned outages. If one of the servers in the cluster is unavailable due to failure or maintenance,
resources and applications move to another available cluster node.
For clustered systems, the term high availability is used rather than fault-tolerant, because fault-tolerant technology offers a higher level of resilience and recovery. Fault-tolerant servers typically use a high degree of hardware redundancy plus specialized software to provide near-instantaneous recovery from any single hardware or software fault. These solutions cost significantly more than a clustering solution because organizations must pay for redundant hardware that waits idly for a fault. Fault-tolerant servers are used for applications that support high-value, high-rate transactions such as check clearinghouses, Automated Teller Machines (ATMs), or stock exchanges.
While Cluster service does not guarantee non-stop operation, it provides availability sufficient for most
mission-critical applications. Cluster service can monitor applications and resources, automatically
recognizing and recovering from many failure conditions. This provides greater flexibility in managing
the workload within a cluster, and improves overall availability of the system.
Cluster service benefits include:

High Availability. With Cluster service, ownership of resources such as disk drives and IP
addresses is automatically transferred from a failed server to a surviving server. When a
system or application in the cluster fails, the cluster software restarts the failed application on
a surviving server, or disperses the work from the failed node to the remaining nodes. As a
result, users experience only a momentary pause in service.

Failback. Cluster service automatically re-balances the workload in a cluster when a failed
server comes back online.


Manageability. You can use the Cluster Administrator to manage a cluster as a single
system and to manage applications as if they were running on a single server. You can move
applications to different servers within the cluster by dragging and dropping cluster objects.
You can move data to different servers in the same way. This can be used to manually
balance server workloads and to unload servers for planned maintenance. You can also
monitor the status of the cluster, all nodes and resources from anywhere on the network.

Scalability. Cluster services can grow to meet rising demands. When the overall load for a
cluster-aware application exceeds the capabilities of the cluster, additional nodes can be
added.

This paper provides instructions for installing Cluster service on servers running Windows 2000
Advanced Server and Windows 2000 Datacenter Server. It describes the process of installing the
Cluster service on cluster nodes. It is not intended to explain how to install cluster applications, but
rather to guide you through the process of installing a typical, two-node cluster itself.

Checklists for Cluster Server Installation


This checklist assists you in preparing for installation. Step-by-step instructions begin after the
checklist.
Software Requirements

Microsoft Windows 2000 Advanced Server or Windows 2000 Datacenter Server installed on all
computers in the cluster.

A name resolution method such as Domain Naming System (DNS), Windows Internet Naming
System (WINS), HOSTS, etc.

Terminal Server to allow remote cluster administration is recommended.

Hardware Requirements

The hardware for a Cluster service node must meet the hardware requirements for Windows
2000 Advanced Server or Windows 2000 Datacenter Server. These requirements can be found
at The Product Compatibility Search page

Cluster hardware must be on the Cluster Service Hardware Compatibility List (HCL). The
latest version of the Cluster Service HCL can be found by going to the Windows Hardware
Compatibility List and then searching on Cluster.
Two HCL-approved computers, each with the following:

A boot disk with Windows 2000 Advanced Server or Windows 2000 Datacenter
Server installed. The boot disk cannot be on the shared storage bus described below.

A separate PCI storage host adapter (SCSI or Fibre Channel) for the shared disks.
This is in addition to the boot disk adapter.


Two PCI network adapters on each machine in the cluster.

An HCL-approved external disk storage unit that connects to all computers. This will
be used as the clustered disk. A redundant array of independent disks (RAID) is
recommended.

Storage cables to attach the shared storage device to all computers. Refer to the
manufacturers' instructions for configuring storage devices. If an SCSI bus is used, see
Appendix A for additional information.

All hardware should be identical, slot for slot, card for card, for all nodes. This will
make configuration easier and eliminate potential compatibility problems.

Network Requirements

A unique NetBIOS cluster name.

Five unique, static IP addresses: two for the network adapters on the private network, two
for the network adapters on the public network, and one for the cluster itself.

A domain user account for Cluster service (all nodes must be members of the same domain).

Each node should have two network adapters: one for connection to the public network and the other for the node-to-node private cluster network. If you use only one network adapter for both connections, your configuration is unsupported. A separate private network adapter is required for HCL certification.

Shared Disk Requirements:

All shared disks, including the quorum disk, must be physically attached to a shared bus.

Verify that disks attached to the shared bus can be seen from all nodes. This can be checked
at the host adapter setup level. Please refer to the manufacturer's documentation for
adapter-specific instructions.

SCSI devices must be assigned unique SCSI identification numbers and properly terminated,
as per manufacturer's instructions.

All shared disks must be configured as basic (not dynamic).

All partitions on the disks must be formatted as NTFS.

While not required, the use of fault-tolerant RAID configurations is strongly recommended for all disks. The key concept here is fault-tolerant RAID configurations, not stripe sets without parity.


Cluster Installation
Installation Overview
During the installation process, some nodes will be shut down and some nodes will be rebooted. These
steps are necessary to guarantee that the data on disks that are attached to the shared storage bus is
not lost or corrupted. This can happen when multiple nodes try to simultaneously write to the same
disk that is not yet protected by the cluster software.
Use Table 1 below to determine which nodes and storage devices should be powered on during each
step.
The steps in this guide are for a two-node cluster. However, if you are installing a cluster with more
than two nodes, you can use the Node 2 column to determine the required state of other nodes.
Table 1: Power Sequencing Table for Cluster Installation

Setting Up Networks (Node 1: On, Node 2: On, Storage: Off). Verify that all storage devices on the shared bus are powered off. Power on all nodes.

Setting Up Shared Disks (Node 1: On, Node 2: Off, Storage: On). Shut down all nodes. Power on the shared storage, and then power on the first node.

Verifying Disk Configuration (Node 1: Off, Node 2: On, Storage: On). Shut down the first node, and power on the second node. Repeat for nodes 3 and 4 if necessary.

Configuring the First Node (Node 1: On, Node 2: Off, Storage: On). Shut down all nodes; power on the first node.

Configuring the Second Node (Node 1: On, Node 2: On, Storage: On). Power on the second node after the first node has been successfully configured. Repeat for nodes 3 and 4 if necessary.

Post-installation (Node 1: On, Node 2: On, Storage: On). At this point all nodes should be on.

Several steps must be taken prior to the installation of the Cluster service software. These steps are:

Installing Windows 2000 Advanced Server or Windows 2000 Datacenter Server on each node.

Setting up networks.

Setting up disks.

Perform these steps on every cluster node before proceeding with the installation of Cluster service on
the first node.
To configure the Cluster service on a Windows 2000-based server, your account must have
administrative permissions on each node. All nodes must be member servers, or all nodes must be
domain controllers within the same domain. It is not acceptable to have a mix of domain controllers
and member servers in a cluster.
Installing the Windows 2000 Operating System
Please refer to the documentation you received with the Windows 2000 operating system packages to
install the system on each node in the cluster.


This step-by-step guide uses the naming structure from the "Step-by-Step Guide to a Common
Infrastructure for Windows 2000 Server Deployment"
http://www.microsoft.com/windows2000/techinfo/planning/server/serversteps.asp. However, you can
use any names.
You must be logged on as an administrator prior to installation of Cluster service.
Setting up Networks
Note: For this section, power down all shared storage devices and then power up all nodes. Do not let
both nodes access the shared storage devices at the same time until the Cluster service is installed on
at least one node and that node is online.
Each cluster node requires at least two network adapters: one to connect to a public network, and one to connect to a private network consisting of cluster nodes only.
The private network adapter establishes node-to-node communication, cluster status signals, and
cluster management. Each node's public network adapter connects the cluster to the public network
where clients reside.
Verify that all network connections are correct, with private network adapters connected to other
private network adapters only, and public network adapters connected to the public network. The
connections are illustrated in Figure 1 below. Run these steps on each cluster node before proceeding
with shared disk setup.

Figure 1: Example of two-node cluster (clusterpic.vsd)


Configuring the Private Network Adapter
Perform these steps on the first node in your cluster.
1. Right-click My Network Places and then click Properties.
2. Right-click the Local Area Connection 2 icon.
Note: Which network adapter is private and which is public depends upon your wiring. For the purposes of this document, the first network adapter (Local Area Connection) is connected to the public network, and the second network adapter (Local Area Connection 2) is connected to the private cluster network. This may not be the case in your network.
3. Click Status. The Local Area Connection 2 Status window shows the connection status, as well as the speed of connection. If the window shows that the network is disconnected, examine cables and connections to resolve the problem before proceeding. Click Close.
4. Right-click Local Area Connection 2 again, click Properties, and click Configure.
5. Click Advanced. The window in Figure 2 should appear.
6. Network adapters on the private network should be set to the actual speed of the network, rather than the default automated speed selection. Select your network speed from the drop-down list. Do not use an Auto-select setting for speed, because some adapters may drop packets while determining the speed. To set the network adapter speed, click the appropriate option, such as Media Type or Speed.
Figure 2: Advanced Adapter Configuration (advanced.bmp)
All network adapters in the cluster that are attached to the same network must be identically configured to use the same Duplex Mode, Flow Control, Media Type, and so on. These settings should remain the same even if the hardware is different.
Note: We highly recommend that you use identical network adapters throughout the cluster network.
7. Click Transmission Control Protocol/Internet Protocol (TCP/IP).
8. Click Properties.
9. Click the radio button for Use the following IP address and type in the following address: 10.1.1.1. (Use 10.1.1.2 for the second node.)
10. Type in a subnet mask of 255.0.0.0.
11. Click the Advanced button and select the WINS tab. Select Disable NetBIOS over TCP/IP. Click OK to return to the previous menu. Do this step for the private network adapter only. The window should now look like Figure 3 below.
Figure 3: Private Connector IP Address (ip10111.bmp)

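The same static addressing can also be applied from a command prompt with netsh, which is convenient when configuring several nodes. This is a sketch only, assuming the connection is still named Local Area Connection 2 and that the netsh interface ip context is available on your Windows 2000 build; use 10.1.1.2 on the second node:

rem Assign the private cluster address and subnet mask
netsh interface ip set address "Local Area Connection 2" static 10.1.1.1 255.0.0.0

Disabling NetBIOS over TCP/IP (step 11) still needs to be done in the adapter's WINS properties.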

Configuring the Public Network Adapter
Note: While the public network adapter's IP address can be automatically obtained if a DHCP server is
available, this is not recommended for cluster nodes. We strongly recommend setting static IP
addresses for all network adapters in the cluster, both private and public. If IP addresses are obtained
via DHCP, access to cluster nodes could become unavailable if the DHCP server goes down. If you
must use DHCP for your public network adapter, use long lease periods to assure that the dynamically
assigned lease address remains valid even if the DHCP service is temporarily lost. In all cases, set
static IP addresses for the private network connector. Keep in mind that Cluster service will recognize


only one network interface per subnet. If you need assistance with TCP/IP addressing in Windows
2000, please see Windows 2000 Online Help.
Rename the Local Area Network Icons
We recommend changing the names of the network connections for clarity. For example, you might
want to change the name of Local Area Connection (2) to something like Private Cluster Connection.
The naming will help you identify a network and correctly assign its role.
1. Right-click the Local Area Connection 2 icon.
2. Click Rename.
3. Type Private Cluster Connection into the textbox and press Enter.
4. Repeat steps 1-3 and rename the public network adapter as Public Cluster Connection.
Figure 4: Renamed connections (connames.bmp)
5. The renamed icons should look like those in Figure 4 above. Close the Networking and Dial-up Connections window. The new connection names automatically replicate to other cluster servers as they are brought online.

Verifying Connectivity and Name Resolution


To verify that the private and public networks are communicating properly, perform the following steps
for each network adapter in each node. You need to know the IP address for each network adapter in
the cluster. If you do not already have this information, you can retrieve it using the ipconfig command
on each node:
1. Click Start, click Run and type cmd in the text box. Click OK.
2. Type ipconfig /all and press Enter. IP information should display for all network adapters in the machine.
3. If you do not already have the command prompt on your screen, click Start, click Run, type cmd in the text box, and then click OK.
4. Type ping ipaddress where ipaddress is the IP address for the corresponding network adapter in the other node. For example, assume that the IP addresses are set as follows:

Node 1, Public Cluster Connection: 172.16.12.12
Node 1, Private Cluster Connection: 10.1.1.1
Node 2, Public Cluster Connection: 172.16.12.14
Node 2, Private Cluster Connection: 10.1.1.2

In this example, you would type ping 172.16.12.14 and ping 10.1.1.2 from Node 1, and you would type ping 172.16.12.12 and ping 10.1.1.1 from Node 2.
To verify name resolution, ping each node from a client using the node's machine name instead of its
IP number. For example, to verify name resolution for the first cluster node, type ping hq-res-dc01
from any client.
Verifying Domain Membership
All nodes in the cluster must be members of the same domain and able to access a domain controller
and a DNS Server. They can be configured as member servers or domain controllers. If you decide to
configure one node as a domain controller, you should configure all other nodes as domain controllers
in the same domain as well. In this document, all nodes are configured as domain controllers.
Note: See More Information at the end of this document for links to additional Windows 2000
documentation that will help you understand and configure domain controllers, DNS, and DHCP.
1. Right-click My Computer, and click Properties.
2. Click Network Identification. The System Properties dialog box displays the full computer name and domain. In our example, the domain name is reskit.com.
3. If you are using member servers and need to join a domain, you can do so at this time. Click Properties and follow the on-screen instructions for joining a domain.
4. Close the System Properties and My Computer windows.

Setting Up a Cluster User Account


The Cluster service requires a domain user account under which the Cluster service can run. This user
account must be created before installing Cluster service, because setup requires a user name and
password. This user account should not belong to a user on the domain.
1. Click Start, point to Programs, point to Administrative Tools, and click Active Directory Users and Computers.
2. Click the + to expand Reskit.com (if it is not already expanded).
3. Click Users.
4. Right-click Users, point to New, and click User.
5. Type in the cluster name as shown in Figure 5 below and click Next.
Figure 5: Add Cluster User (clusteruser.bmp)
6. Set the password settings to User Cannot Change Password and Password Never Expires. Click Next and then click Finish to create this user.
Note: If your administrative security policy does not allow the use of passwords that never expire, you must renew the password and update the cluster service configuration on each node before password expiration.
7. Right-click Cluster in the left pane of the Active Directory Users and Computers snap-in. Select Properties from the context menu.
8. Click Add Members to a Group.
9. Click Administrators and click OK. This gives the new user account administrative privileges on this computer.
10. Close the Active Directory Users and Computers snap-in.

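If you prefer to create the account from a command prompt, the built-in net commands can do most of the work. This is a sketch that assumes the Reskit.com domain and an account named Cluster from the procedure above; the Password Never Expires and User Cannot Change Password options still need to be set in Active Directory Users and Computers:

rem Create the domain account (you are prompted for the password)
net user Cluster * /add /domain
rem Give the account administrative privileges on this computer
net localgroup Administrators RESKIT\Cluster /add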

Setting Up Shared Disks
Warning: Make sure that Windows 2000 Advanced Server or Windows 2000 Datacenter Server and
the Cluster service are installed and running on one node before starting an operating system on
another node. If the operating system is started on other nodes before the Cluster service is installed,
configured and running on at least one node, the cluster disks will probably be corrupted.
To proceed, power off all nodes. Power up the shared storage devices and then power up node one.
About the Quorum Disk
The quorum disk is used to store cluster configuration database checkpoints and log files that help
manage the cluster. We make the following quorum disk recommendations:

- Create a small partition (minimum 50 MB) to be used as a quorum disk. We generally recommend a quorum disk of 500 MB.

- Dedicate a separate disk for the quorum resource. Because a failure of the quorum disk would cause the entire cluster to fail, we strongly recommend that you use a volume on a RAID disk array.

During the Cluster service installation, you must provide the drive letter for the quorum disk. In our
example, we use the letter Q.
Configuring Shared Disks
1. Right-click My Computer, click Manage, and click Storage.

2. Double-click Disk Management.

3. Verify that all shared disks are formatted as NTFS and are designated as Basic. If you connect a new drive, the Write Signature and Upgrade Disk Wizard starts automatically. If this happens, click Next to go through the wizard. The wizard sets the disk to dynamic. To reset the disk to Basic, right-click Disk # (where # specifies the disk you are working with) and click Revert to Basic Disk.

4. Right-click unallocated disk space:
   a. Click Create Partition.
   b. The Create Partition Wizard begins. Click Next twice.
   c. Enter the desired partition size in MB and click Next.
   d. Accept the default drive letter assignment by clicking Next.
   e. Click Next to format and create the partition.

Assigning Drive Letters


After the bus, disks, and partitions have been configured, drive letters must be assigned to each
partition on each clustered disk.
Note: Mount points are a feature of the file system that allows you to mount a file system in an
existing directory without assigning a drive letter. Mount points are not supported on clusters. Any
external disk used as a cluster resource must be partitioned using NTFS partitions and must have a
drive letter assigned to it.
1. Right-click the desired partition and select Change Drive Letter and Path.

2. Select a new drive letter.

3. Repeat steps 1 and 2 for each shared disk.


Figure 6: Disks with Drive Letters Assigned (drives.bmp)

4. When finished, the Computer Management window should look like Figure 6 above. Now close the Computer Management window.
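Once drive letters are assigned, a quick way to confirm that a shared partition is formatted as NTFS, as required above, is to run chkdsk in read-only mode from a command prompt (drive Q: is this guide's example quorum drive letter):

    rem Read-only check; the first line of output reports the file system type (should be NTFS)
    chkdsk Q: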

Verifying Disk Access and Functionality


1. Click Start, click Programs, click Accessories, and click Notepad.

2. Type some words into Notepad and use the File/Save As command to save it as a test file called test.txt. Close Notepad.

3. Double-click the My Documents icon.

4. Right-click test.txt and click Copy.

5. Close the window.

6. Double-click My Computer.

7. Double-click a shared drive partition.

8. Click Edit and click Paste.

9. A copy of the file should now reside on the shared disk.

10. Double-click test.txt to open it on the shared disk. Close the file.

11. Highlight the file and press the Del key to delete it from the clustered disk.
Repeat the process for all clustered disks to verify they can be accessed from the first node.
At this time, shut down the first node, power on the second node and repeat the Verifying Disk Access
and Functionality steps above. Repeat again for any additional nodes. When you have verified that all
nodes can read and write from the disks, turn off all nodes except the first, and continue with this
guide.
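If you would rather script this read/write test than use Notepad, the following minimal sketch is equivalent (replace Z: with the drive letter of the shared partition you are testing):

    rem Write, read back, and delete a small test file on the shared disk
    echo cluster disk test > Z:\test.txt
    type Z:\test.txt
    del Z:\test.txt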


Install Cluster Service software


Configuring the First Node
Note: During installation of Cluster service on the first node, all other nodes must either be turned off
or stopped before Windows 2000 boots. All shared storage devices should be powered up.
In the first phase of installation, all initial cluster configuration information must be supplied so that
the cluster can be created. This is accomplished using the Cluster Service Configuration Wizard.
1. Click Start, click Settings, and click Control Panel.

2. Double-click Add/Remove Programs.

3. Double-click Add/Remove Windows Components.

4. Select Cluster Service. Click Next.

5. Cluster service files are located on the Windows 2000 Advanced Server or Windows 2000 Datacenter Server CD-ROM. Enter x:\i386 (where x is the drive letter of your CD-ROM). If Windows 2000 was installed from a network, enter the appropriate network path instead. (If the Windows 2000 Setup splash screen displays, close it.) Click OK.

6. Click Next.

7. The window shown in Figure 7 below appears. Click I Understand to accept the condition that Cluster service is supported on hardware from the Hardware Compatibility List only.

Figure 7: Hardware Configuration Certification Screen (hcl.bmp)

8. Because this is the first node in the cluster, you must create the cluster itself. Select The first node in the cluster, as shown in Figure 8 below, and then click Next.


Figure 8: Create New Cluster (clustcreate.bmp)

9. Enter a name for the cluster (up to 15 characters), and click Next. (In our example, we name the cluster MyCluster.)

10. Type the user name of the cluster service account that was created during the pre-installation. (In our example, this user name is cluster.) Leave the password blank. Type the domain name, and click Next.
Note: You would normally provide a secure password for this user account.
At this point the Cluster Service Configuration Wizard validates the user account and password.

11. Click Next.
Configuring Cluster Disks
Note: By default, all SCSI disks not residing on the same bus as the system disk will appear in the
Managed Disks list. Therefore, if the node has multiple SCSI buses, some disks may be listed that are
not to be used as shared storage (for example, an internal SCSI drive). Such disks should be removed
from the Managed Disks list.

1. The Add or Remove Managed Disks dialog box shown in Figure 9 specifies which disks on the shared SCSI bus will be used by Cluster service. Add or remove disks as necessary and then click Next.


Figure 9: Add or Remove Managed Disks (manageddisks.bmp)


Note that because logical drives F: and G: exist on a single hard disk, they are seen by
Cluster service as a single resource. The first partition of the first disk is selected as the
quorum resource by default. Change this to denote the small partition that was created as
the quorum disk (in our example, drive Q). Click Next.
Note: In production clustering scenarios you must use more than one private network for
cluster communication to avoid having a single point of failure. Cluster service can use
private networks for cluster status signals and cluster management. This provides more
security than using a public network for these roles. You can also use a public network for
cluster management, or you can use a mixed network for both private and public
communications. In any case, make sure at least two networks are used for cluster
communication, as using a single network for node-to-node communication represents a
potential single point of failure. We recommend that multiple networks be used, with at least
one network configured as a private link between nodes and other connections through a
public network. If you have more than one private network, make sure that each uses a
different subnet, as Cluster service recognizes only one network interface per subnet.
This document is built on the assumption that only two networks are in use. It shows you
how to configure these networks as one mixed and one private network.
The order in which the Cluster Service Configuration Wizard presents these networks may
vary. In this example, the public network is presented first.
2. Click Next in the Configuring Cluster Networks dialog box.

3. Make sure that the network name and IP address correspond to the network interface for the public network.

4. Check the box Enable this network for cluster use.

5. Select the option All communications (mixed network) as shown in Figure 10 below.


6. Click Next.

Figure 10: Public Network Connection (pubclustnet.bmp)

7. The next dialog box, shown in Figure 11, configures the private network. Make sure that the network name and IP address correspond to the network interface used for the private network.

8. Check the box Enable this network for cluster use.

9. Select the option Internal cluster communications only.

Figure 11: Private Network Connection (privclustnet.bmp)


10. Click Next.

11. In this example, both networks are configured in such a way that both can be used for internal cluster communication. The next dialog window offers an option to modify the order in which the networks are used. Because Private Cluster Connection represents a direct connection between nodes, it is left at the top of the list. In normal operation this connection will be used for cluster communication. If the Private Cluster Connection fails, the Cluster service will automatically switch to the next network on the list, in this case Public Cluster Connection. Make sure the first connection in the list is the Private Cluster Connection and click Next.
Important: Always set the order of the connections so that the Private Cluster Connection is first in the list.

12. Enter the unique cluster IP address (172.16.12.20) and Subnet mask (255.255.252.0), and click Next.

Figure 12: Cluster IP Address (clusterip.bmp)


The Cluster Service Configuration Wizard shown in Figure 12 automatically associates the
cluster IP address with one of the public or mixed networks. It uses the subnet mask to select
the correct network.
13. Click Finish to complete the cluster configuration on the first node.
The Cluster Service Setup Wizard completes the setup process for the first node by copying
the files needed to complete the installation of Cluster service. After the files are copied, the
Cluster service registry entries are created, the log files on the quorum resource are created,
and the Cluster service is started on the first node.
A dialog box appears telling you that Cluster service has started successfully.
14. Click OK.
15. Close the Add/Remove Programs window.
Validating the Cluster Installation
Use the Cluster Administrator snap-in to validate the Cluster service installation on the first node.


1. Click Start, click Programs, click Administrative Tools, and click Cluster Administrator.

Figure 13: Cluster Administrator (1nodeadmin.bmp)

If your snap-in window is similar to that shown above in Figure 13, your Cluster service was successfully installed on the first node. You are now ready to install Cluster service on the second node.
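You can also validate the installation with the cluster.exe command-line tool that is installed with Cluster service. The following is a minimal sketch using this guide's example cluster name and cluster IP address; the exact output varies:

    rem List the state of the cluster node(s) and the resource groups with their current owners
    cluster node /status
    cluster group /status

    rem Confirm that the cluster IP address (and, if it can be resolved, the cluster name) answers on the network
    ping 172.16.12.20
    ping MyCluster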
Configuring the Second Node
Note: For this section, leave node one and all shared disks powered on. Power up the second node.
Installing Cluster service on the second node requires less time than on the first node. Setup
configures the Cluster service network settings on the second node based on the configuration of the
first node.
Installation of Cluster service on the second node begins exactly as for the first node. During
installation of the second node, the first node must be running.
Follow the same procedures used for installing Cluster service on the first node, with the following
differences:
1. In the Create or Join a Cluster dialog box, select The second or next node in the cluster, and click Next.

2. Enter the cluster name that was previously created (in this example, MyCluster), and click Next.

3. Leave Connect to cluster as unchecked. The Cluster Service Configuration Wizard will automatically supply the name of the user account selected during the installation of the first node. Always use the same account used when setting up the first cluster node.

4. Enter the password for the account (if there is one) and click Next.

5. At the next dialog box, click Finish to complete configuration.

6. The Cluster service will start. Click OK.


7. Close Add/Remove Programs.

If you are installing additional nodes, repeat these steps to install Cluster service on all other nodes.

Verify Installation
There are several ways to verify a successful installation of Cluster service. Here is a simple one:
1. Click Start, click Programs, click Administrative Tools, and click Cluster Administrator.

Figure 14: Cluster Resources (clustadmin.bmp)

The presence of two nodes (HQ-RES-DC01 and HQ-RES-DC02 in Figure 14 above) shows that a cluster exists and is in operation.

2. Right-click the group Disk Group 1 and select the option Move. The group and all its resources will be moved to another node. After a short period of time, the disk group containing drives F: and G: will be brought online on the second node. If you watch the screen, you will see this shift. The same failover test can also be run from the command line, as shown in the sketch after these steps. Close the Cluster Administrator snap-in.
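As referenced in step 2 above, the same failover test can be performed from a command prompt with cluster.exe. This is a sketch using the example names from this guide (MyCluster, Disk Group 1, and the second node HQ-RES-DC02):

    rem Move Disk Group 1 to the second node, then confirm its new owner
    cluster MyCluster group "Disk Group 1" /moveto:HQ-RES-DC02
    cluster MyCluster group "Disk Group 1" /status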

Congratulations. You have completed the installation of Cluster service on all nodes. The server cluster
is fully operational. You are now ready to install cluster resources such as file shares and print spoolers,
cluster-aware services such as IIS, Message Queuing, Distributed Transaction Coordinator, DHCP, and WINS,
or cluster-aware applications such as Exchange or SQL Server.

For Additional Information


This guide covers a simple installation of Cluster service. For more articles and papers on Windows 2000 Server,
Windows 2000 Advanced Server, and the Windows 2000 Cluster service, see the Windows 2000 Web site.
For information on installing DHCP, Active Directory, and other services, see Windows 2000 Online Help,
the Windows 2000 Planning and Deployment Guide, and the Windows 2000 Resource Kit.


Appendix A
This appendix is provided as a generic instruction set for SCSI drive installations. If the SCSI hard disk
vendor's instructions conflict with the instructions here, always use the instructions supplied by the vendor.
The SCSI bus listed in the hardware requirements must be configured prior to installation of Cluster
service. This configuration includes:

- Configuring the SCSI devices.

- Configuring the SCSI controllers and hard disks to work properly on a shared SCSI bus.

- Properly terminating the bus. The shared SCSI bus must have a terminator at each end of the bus. It is possible to have multiple shared SCSI buses between the nodes of a cluster.

In addition to the information on the following pages, refer to the documentation from the
manufacturer of the SCSI device or the SCSI specifications, which can be ordered from the American
National Standards Institute (ANSI). The ANSI web site contains a catalog that can be searched for
the SCSI specifications.
Configuring the SCSI Devices
Each device on the shared SCSI bus must have a unique SCSI ID. Since most SCSI controllers default
to SCSI ID 7, part of configuring the shared SCSI bus will be to change the SCSI ID on one controller
to a different SCSI ID, such as SCSI ID 6. If there is more than one disk that will be on the shared
SCSI bus, each disk must also have a unique SCSI ID.
Some SCSI controllers reset the SCSI bus when they initialize at boot time. If this occurs, the bus
reset can interrupt any data transfers between the other node and disks on the shared SCSI bus.
Therefore, SCSI bus resets should be disabled if possible.
Terminating the Shared SCSI Bus
Y cables can be connected to devices if the device is at the end of the SCSI bus. A terminator can then
be attached to one branch of the Y cable to terminate the SCSI bus. This method of termination
requires either disabling or removing any internal terminators the device may have.
Trilink connectors can be connected to certain devices. If the device is at the end of the bus, a trilink
connector can be used to terminate the bus. This method of termination requires either disabling or
removing any internal terminators the device may have.
Y cables and trilink connectors are the recommended termination methods, because they provide
termination even when one node is not online.
Note: Any devices that are not at the end of the shared bus must have their internal termination
disabled.
1. See Appendix A for information about installing and terminating SCSI devices.

