Exercise 6: Set Up a Mini MPI Cluster

EXERCISE DESCRIPTION:

In this exercise you will team up with one to three neighbors to build a true parallel machine composed of 2 to 4 desktop PCs. You can then run simple parallel programs on top of it and compare MPI performance with that of a standalone machine.

EXERCISE GUIDE

Task: Setting up a mini-“cluster”

1. Form teams of 2 to 4 people. Find out the IP address of each PC that will form the cluster. You can find out the IP address by issuing the command:

$ /sbin/ifconfig

You will need these IP addresses in later steps.
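If /sbin/ifconfig is not available on your distribution, the ip tool reports the same information:

$ ip addr show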

2. You will need super-user privileges, so become the root user:

$ su -

3. All the PCs in the parallel cluster will need an identical user account (that is: same user name, user ID, home directory, etc.).

You can add a new user (named "mpi") with the following command:

# useradd -m -d /home/mpi -u 10000 -s /bin/bash mpi

With the following command, set the password of the "mpi" user to 12345678:

# passwd mpi
 <enter twice: 12345678>
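Repeat the useradd and passwd commands on every PC of the cluster. To verify that the account really is identical everywhere, check the numeric user ID on each machine; it should report uid 10000 on all of them:

$ id mpi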

4. Enable the "mpi" user account to log in on all the PCs without a password.

This can be accomplished with a feature of the ssh program named "public-key" authentication.

Steps a to f below are performed on one PC only (this PC will act as the NFS server); which PC does not matter, as long as it is part of your cluster.

a. Install the NFS server:

# apt-get install nfs-common nfs-kernel-server

b. Edit the file /etc/exports and add the following line:

/home IP1(rw,no_root_squash) IP2(rw,no_root_squash) ...

where IPx are the IP addresses of the client machines.
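For example, assuming two client machines with the (hypothetical) addresses 192.168.1.102 and 192.168.1.103, the line would read:

/home 192.168.1.102(rw,no_root_squash) 192.168.1.103(rw,no_root_squash)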

Then execute the following command to export the directory:

# exportfs -ra
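If you want to double-check that /home is now exported with the expected options, exportfs can print the active exports:

# exportfs -v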

c. Become user "mpi":

# su - mpi

d. Change current directory to .ssh:

$ cd ~/.ssh

If the directory does not exist yet, just create it with:

$ mkdir ~/.ssh 

$ chmod go-rwx ~/.ssh

(If the directory already exists, make sure it does not have group write permissions, or this won't work.)

e. Generate a private/public identity key pair with:

$ ssh-keygen -t dsa

Hit return (2 times) when asked for a passphrase, i.e. leave it empty.

This creates 2 new files in ~/.ssh: id_dsa and id_dsa.pub.
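Note: recent OpenSSH releases disable DSA keys by default. If key generation or the later login fails because of this, an RSA key works in exactly the same way (then use id_rsa.pub instead of id_dsa.pub in the next step):

$ ssh-keygen -t rsa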

f. Copy the id_dsa.pub file under the name authorized_keys:

$ cd ~/.ssh

$ cp id_dsa.pub authorized_keys
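sshd refuses public-key logins if authorized_keys is writable by the group or by others, so it is safest to tighten its permissions as well:

$ chmod 600 ~/.ssh/authorized_keys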

You found the IP addresses of the other PCs in step 1; use them as the host names in the next step.

g. From each PC, try to log into all other machines:

$ ssh mpi@other-pc

You should be able to login without a password prompt.
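A quick non-interactive check is to run a single command on the remote PC; if public-key authentication is working, this prints the remote host name without ever asking for a password:

$ ssh mpi@other-pc hostname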

The following step is performed on all the client (slave) nodes in the cluster, that is, on every PC except the NFS server.

5. Install the NFS client:

# apt-get install nfs-common

Edit /etc/fstab and add the following line:

IP:/home /home nfs rw,rsize=4096,wsize=4096,hard,intr,async,nodev,nosuid 0 0

where IP is the IP address of the server. Now reboot your client machine.
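If you prefer not to reboot, you can (as root) mount everything listed in /etc/fstab right away and then check that /home is now served over NFS:

# mount -a

# df -h /home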

6. Place the executable you want to run in a directory on one of the machines (it does not matter which one, as /home is shared).

For instance, you can copy the IMB-MPI1 executable you compiled before to the directory /home/mpi/test. To do this, open a root console and type:

# mkdir -p /home/mpi/test

# cp /home/myname/IMB-3.0/src/IMB-MPI1 /home/mpi/test

# chown -R mpi:mpi /home/mpi/test

Return to the user console.
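As a quick check that the file is visible everywhere, log in as the mpi user on one of the other PCs and list it; since /home is shared, it should appear on every node:

$ ls -l /home/mpi/test/IMB-MPI1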

7. Edit a host file in the /home/mpi directory:

$ gedit /home/mpi/myhosts

This file should list all the PCs in your cluster, for example:

IP1 slots=2
IP2 slots=2
.
.
.
IPn slots=2

where IPx are the IP addresses of the machines you wish to use. You identified the IP addresses of the other PCs in step 1.
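As a concrete sketch, for a cluster of two dual-core PCs with the hypothetical addresses 192.168.1.101 and 192.168.1.102, myhosts would contain:

192.168.1.101 slots=2
192.168.1.102 slots=2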

The following steps are performed on one PC only; which PC does not matter, as long as it is part of your cluster.

If you have performed the above steps correctly, you will be able to run MPI programs from any PC in the cluster and exploit the combined multi-processor power of all the machines.

8. To run an MPI code (e.g. the IMB) across the different nodes, you should type (example for 2 PCs with 2 slots each):

$ mpirun --hostfile myhosts -np 4 ./IMB-MPI1
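The -np value should match the total number of slots you want to use. For example, assuming four PCs with slots=2 each, the same benchmark runs on all eight cores with:

$ mpirun --hostfile myhosts -np 8 ./IMB-MPI1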

9. Issue the top command on all PCs to make sure the code is running everywhere in the cluster.

10. Now rerun the benchmark and save the program output to a file, as in the previous exercise. Observe the difference in performance compared to the run on a single PC. Why is there such a difference?
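One way to save the output (the file name imb_cluster.txt is just an example) is to pipe it through tee, which both displays it on screen and writes it to the file:

$ mpirun --hostfile myhosts -np 4 ./IMB-MPI1 | tee imb_cluster.txt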