Yu Zhang
Computer Architecture Group
Chemnitz University of Technology
January 25, 2011
Abstract
Computing clusters usually run on physical computers. With a virtualization approach, clusters can also be virtualized. This article describes the building of a virtual cluster based on VirtualBox.
Contents

1 VirtualBox Installation
2 Creation of Virtual Machines
  2.1 Creation of New Machines
  2.2 Normal Installation on Master Node
  2.3 Minimal Installation on Slave Nodes
  2.4 Network Configuration
3 Application of SLURM
  3.1 Possible Problems During Installation
  3.2 Installation and Configuration
  3.3 Automatic Startup when Booting
4 Cluster Network Configuration
  4.1 Hostnames
  4.2 IP Addresses
  4.3 Host List
  4.4 Password-less SSH
  4.5 Network File System
5 Test with Applications
  5.1 Simple MPI Program Test
  5.2 Tachyon Ray Tracer Test
6 Further Work
1 VirtualBox Installation
There are two basic editions of VirtualBox: VirtualBox and VirtualBox OSE (Open Source Edition). Both offer almost the same functionality, apart from some different features targeting different customers. In Ubuntu, the command

    sudo apt-get install virtualbox-ose

installs the VirtualBox OSE package, while

    wget http://download.virtualbox.org/virtualbox/4.0.0/ \
      virtualbox-4.0_4.0.0-69151~Ubuntu~lucid_amd64.deb
    sudo dpkg -i virtualbox-4.0_4.0.0-69151~Ubuntu~lucid_amd64.deb

installs the VirtualBox package, where virtualbox-4.0_4.0.0-69151~Ubuntu~lucid_amd64.deb can also be replaced by

    * virtualbox-4.0_4.0.0-69151~Ubuntu~maverick_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Ubuntu~karmic_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Ubuntu~jaunty_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Ubuntu~hardy_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Debian~squeeze_amd64.deb
    * virtualbox-4.0_4.0.0-69151~Debian~lenny_amd64.deb
    ...
according to the distribution and version of the host OS. The package architecture has to match the Linux kernel architecture; that is, install the appropriate AMD64 package for a 64-bit CPU. It does not matter whether it is an Intel or an AMD CPU.
[Figure: VirtualBox machine creation dialogs — (a) before, (b) later, (c) Memory]
on each node. If not, say, if with the command /sbin/ifconfig the wrong ethernet cards appear rather than eth1, put the right ethernet cards in the file /etc/udev/rules.d/70-persistent-net.rules, then reload the driver and restart the network like this:

    sudo modprobe -r e1000
    sudo modprobe e1000
    sudo /etc/init.d/udev restart
    sudo /etc/init.d/networking restart
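For orientation, an entry in 70-persistent-net.rules might look as follows. This is a sketch only; the MAC addresses are placeholders for the ones VirtualBox assigned to each virtual adapter, which you can read off with /sbin/ifconfig -a.

```
# /etc/udev/rules.d/70-persistent-net.rules (sketch; MAC addresses are
# placeholders for the VirtualBox-assigned ones)
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:00:27:aa:bb:01", KERNEL=="eth*", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="08:00:27:aa:bb:02", KERNEL=="eth*", NAME="eth1"
```

Binding each MAC address to a fixed name keeps eth0/eth1 stable across reboots of the virtual machines.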
3 Application of SLURM
Resource management is a non-trivial effort with an ever-growing number of nodes in a cluster. SLURM is designed for this purpose on Linux clusters of all sizes. It provides exclusive or non-exclusive resource access and monitors the present state of all nodes in a cluster. Beginners tend to get into trouble with the SLURM installation, as I did previously.
2. Warnings during configuration:

    configure: WARNING: Unable to locate NUMA memory affinity functions
    ...
    configure: WARNING: Unable to locate PAM libraries
    ...
    configure: WARNING: Can not build smap without curses or ncurses library
    ...
    checking for GTK+ - version >= 2.7.1... no
    *** Could not run GTK+ test program, checking why...
    *** The test program failed to compile or link. See the file config.log for the
    *** exact error that occured. This usually means GTK+ is incorrectly installed.
    checking for mysql_config... no
    configure: WARNING: *** mysql_config not found. Evidently no MySQL \
    install on system.
    checking for pg_config... no
    configure: WARNING: *** pg_config not found. Evidently no PostgreSQL \
    install on system.
    ...
    checking for munge installation... configure: WARNING: unable to locate munge installation
    ...
    configure: WARNING: unable to locate blcr installation

Solution:

    sudo apt-get install libnuma1 libnuma-dev
    sudo apt-get install libpam0g libpam0g-dev
    sudo apt-get install libncurses5-dev
    sudo apt-get install libgtk2.0-dev
    sudo apt-get install libmysql++-dev
    sudo apt-get install libpq-dev
    sudo apt-get install libmunge2 libmunge-dev
    configure: WARNING: Unable to locate PLPA processor affinity functions
    ...

Solution:

    wget http://www.open-mpi.org/software/plpa/v1.1/downloads/plpa-1.1.1.tar.gz
    tar xzvf plpa-1.1.1.tar.gz
    cd plpa-1.1.1/
    ./configure
    make
    sudo make install
    configure: WARNING: Could not find working OpenSSL library

Solution:

    cd src/plugins
    make
    sudo make install

For the following warnings,

    configure: Cannot support QsNet without librmscall
    configure: Cannot support QsNet without libelan3 or libelanctrl!
    configure: Cannot support Federation without libntbl
    configure: WARNING: unable to locate blcr installation

the solution remains unknown to me, but that does not matter too much for our test purpose.
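Once configure succeeds, SLURM still needs a cluster description. A minimal slurm.conf might look like the sketch below; the node names node1 (master) and node2 to node9 (slaves) are assumptions based on the hostnames appearing later in this text, and the paths are typical defaults rather than values taken from the original setup.

```
# Minimal slurm.conf sketch (node names and paths assumed, not from the source)
ControlMachine=node1
AuthType=auth/munge
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurmd
NodeName=node[2-9] CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=node[2-9] Default=YES MaxTime=INFINITE State=UP
```

The same file has to be present on every node, and the munge daemon must be running before slurmctld and slurmd can authenticate to each other.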
Figure 7: Graphical user interface to view and modify the SLURM state after a successful start
4.2 IP Addresses
Every node has a unique IP address within the cluster. We set 192.168.56.100 for the master node, and 192.168.56.101 to 192.168.56.108 for the 8 slave nodes respectively. The IP address can be changed in the file /etc/network/interfaces. After that, the network needs to be restarted to apply the new address.
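On the master node, the corresponding stanza in /etc/network/interfaces could look like the following sketch. The interface name eth1 is an assumption (the VirtualBox host-only adapter); adjust it to whatever name the adapter received on your nodes.

```
# /etc/network/interfaces on the master node (sketch; eth1 assumed to be
# the host-only adapter)
auto eth1
iface eth1 inet static
    address 192.168.56.100
    netmask 255.255.255.0
```

After editing the file, `sudo /etc/init.d/networking restart` applies the new address. The slave nodes get the same stanza with 192.168.56.101 through 192.168.56.108.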
    zhayu@node1:~$ ssh zhayu@node2 ... >> .ssh/authorized_keys
    zhayu@node2's password:

If properly done, SSH login works as follows:

    zhayu@node1:~$ ssh node2
    Linux node2 2.6.26-2-686 #1 SMP Thu Sep 16 19:35:51 UTC 2010 i686

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.
    Last login: Mon Jan 17 19:23:14 2011
    zhayu@node2:~$

If, on the other hand, a password is still demanded after this, or a message comes like the following,

    Agent admitted failure to sign using the key.
    Permission denied (publickey).

then simply run ssh-add on the client node. Repeat this until SSH logins from the master node to all the slave nodes require no password any more. If the following error message comes, it means the SSH server daemon has not yet been started on the SSH server:

    ssh: connect to host node2 port 22: Connection refused

Issue a command like

    sudo /etc/init.d/ssh status

to see whether the SSH daemon is running on the SSH server. If not, type

    sudo /etc/init.d/ssh start

to start the SSH daemon.
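For completeness, the key generation that precedes the login transcript above can be sketched as follows. This is a local illustration only: the temporary directory stands in for ~/.ssh, and on the real cluster the append step runs on node2 over ssh (e.g. `cat ~/.ssh/id_rsa.pub | ssh zhayu@node2 'cat >> ~/.ssh/authorized_keys'` — that exact command is an assumption, not taken from the text).

```shell
# Local illustration of password-less SSH setup; the temp dir stands in
# for ~/.ssh on node1 and node2.
tmp=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$tmp/id_rsa"       # passphrase-less key pair
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"   # on node2: ~/.ssh/authorized_keys
chmod 600 "$tmp/authorized_keys"                  # sshd rejects loose permissions
```

The empty `-N ""` passphrase is what makes the login non-interactive; with a passphrase set, ssh-agent plus ssh-add would be needed instead.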
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &length);
    printf("Hello World MPI: processor %d of %d on %s\n", rank, size, name);
    MPI_Finalize();
    }

We compile and execute the program as Figure 10 shows.
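Outside of a batch system, such a program is typically compiled with mpicc and launched with something like `mpirun -np 9 -machinefile machines ./hello`. A machinefile for our cluster could look like the sketch below; the names node1 through node9 are an assumption (only node1 and node2 appear verbatim in this text).

```
# machinefile sketch: one line per node; names assumed, master first
node1
node2
node3
node4
node5
node6
node7
node8
node9
```

With password-less SSH in place as described above, mpirun can start the processes on all listed nodes without prompting.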
in the file /tachyon/unix/Make-arch, or else the following error message comes and the make process is aborted:

    make[2]: *** [../compile/linux-beowulf-mpi/libtachyon/parallel.o] Error 1
    make[1]: *** [all] Error 2
    make: *** [linux-beowulf-mpi] Error 2

When running make, a lot of supported architectures are listed. As linux-beowulf-mpi is the actual architecture in our case, build it like this:

    wget http://jedi.ks.uiuc.edu/~johns/raytracer/ \
      files/0.98.9/tachyon-0.98.9.tar.gz
    tar xvzf tachyon-0.98.9.tar.gz
    cd tachyon
    make
    make linux-beowulf-mpi
With Tachyon, we would like to illustrate not only parallel rendering on a 9-node virtual cluster, but also a brief application of the above-mentioned batch system SLURM. The more nodes a cluster embraces, the more obvious its power becomes. Here is a simple example of a SLURM script, containing all the necessary task specifications. Then submit the task and wait for its completion.
    #!/bin/bash
    #SBATCH -n 9
    mpirun tachyon/compile/linux-beowulf-mpi/tachyon \
      tachyon/scenes/dna.dat -fullshade -res 4096 4096 -o dna2.tga

Submit the SLURM job with the command

    sbatch ./task1.sh

All results will be saved in a specified file. When the SLURM job is submitted to the batch system, the available computing nodes are allocated for this task. The performance speedup achieved by 9 nodes is presented in Figure 14 below. Tests were made with different task allocation methods, namely 1PN9, 3PN9, 1P1N and 9P1N, that is, an arbitrary number of processes on an arbitrary number of nodes. We can see clearly from the graphic the speedup a 9-node virtual cluster achieved.
6 Further Work
This is only the first half of our task. The aim is to control the virtual machines in a cluster with a batch system, where PXE boot of the slave nodes from a master node is necessary. However, to make SLURM work for this, every PXE-booted node should have its own file system rather than the one shared from the master node.
B A Bug in NFS
The following error message always comes when NFS shares are to be mounted by the slave nodes during boot:

    if-up.d/mountnfs[eth0]: lock /var/run/network/mountnfs exist, not mounting

Just replace the latter part of /etc/network/if-up.d/mountnfs with the following code: [7]

    ...
    # Using no != instead of yes = to make sure async nfs mounting is
    # the default even without a value in /etc/default/rcS
    if [ no != "$ASYNCMOUNTNFS" ]; then
        # Not for loopback!
        [ "$IFACE" != "lo" ] || exit 0

        # Lock around this otherwise insanity may occur
        mkdir /var/run/network 2>/dev/null || true
        if [ -f /var/run/network/mountnfs ]; then
            msg="if-up.d/mountnfs[$IFACE]: lock /var/run/network/mountnfs exist, not mounting"
            log_failure_msg "$msg"
            # Log if /usr/ is mounted
            [ -x /usr/bin/logger ] && /usr/bin/logger -t "if-up.d/mountnfs[$IFACE]" "$msg"
            exit 0
        fi
        touch /var/run/network/mountnfs
        on_exit() {
            # Clean up lock when script exits, even if it is interrupted
            rm -f /var/run/network/mountnfs 2>/dev/null || exit 0
        }
        trap on_exit EXIT # Enable emergency handler
        do_start
    elif [ yes = "$FROMINITD" ] ; then
        do_start
    fi

This uses a file instead of a directory to lock the action, and the file is cleaned up on boot.
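For reference, the NFS share behind this mount might be set up as sketched below. The shared directory /home and the 192.168.56.0/24 network are assumptions consistent with the addresses used earlier, not values quoted from the original configuration.

```
# On the master node, in /etc/exports (sketch):
/home 192.168.56.0/24(rw,sync,no_subtree_check)

# On each slave node, in /etc/fstab (192.168.56.100 is the master):
192.168.56.100:/home  /home  nfs  defaults  0  0
```

After editing /etc/exports, `sudo exportfs -ra` on the master re-reads the export table; the slaves then mount the share at boot, which is where the lock bug above shows up.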