CET 479

Linux Cluster Configuration using MPICH-1.2.1

May 12, 2006
      The Message Passing Interface (MPI) is a standard for vendors who
create parallel computing software. The MPICH implementation of the MPI
library is designed to offer both high performance and portability. MPICH is
free, and the current version is MPICH2. A copy of the MPICH software can be
downloaded at http://www-unix.mcs.anl.gov/mpi/mpich/. It is possible to mix
and match different systems within a single cluster.



Requirements:

    Fedora Core 4

    MPICH-1.2.1 (Message Passing Interface)

    Two nodes ("master" and "slave")

    Network switch (a crossover cable can be used if you only have two nodes)

    Ethernet cables

Next, install a Linux distribution on each computer in your cluster
(henceforth, I will call them nodes). I used Fedora Core 4 for my
operating system; it gave me fewer problems than the Red Hat
distribution.


During the installation process, assign a sensible hostname and a unique
IP address to each node in your cluster. Usually, one node is
designated as the master node (where you'll control the cluster, write
and run programs, etc.), with all the other nodes used as
computational slaves. I named my nodes Master and Slave01 (Slave02,
Slave03, etc., for additional nodes) to keep things simple. I used IP
address 172.16.1.100 for the master node and added one for each slave
node (172.16.1.101, etc.).


Fedora Core 4 Installation:

Insert the Fedora Core 4 CD into the CD-ROM drive and follow the
default installation. I used the workstation option for my cluster. DO
NOT INSTALL THE FIREWALL; if you do, you will have problems with
security issues later. Trust me! After the installation process is
completed, do the following:


   a) On the computer task bar click Desktop, System Settings,
      Add/Remove Applications.
   b) Scroll down to Legacy Network Server, select its checkbox,
      and then click Details.
   c) Under Extra Packages, select rsh-server, telnet-server, and
      rusers-server; next click Close, then Update, and follow the
      on-screen instructions.
   d) Ping the nodes on your cluster to test for connectivity, e.g.,
      ping 172.16.1.101 or 172.16.1.102, etc.
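A quick way to check connectivity to every slave from the master is a short
shell loop (a sketch; the IP list is an assumption based on the addressing
scheme above and should match your own cluster):

      # ping each slave twice and report the results
      for ip in 172.16.1.101 172.16.1.102; do
          ping -c 2 $ip
      done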



   1. Create a .rhosts file in the root user's home directory on each
      node. This file should have the names of all the nodes in your
      cluster. I only have two nodes on my cluster, namely Master and
      Slave01. My .rhosts file for the root user is as follows:
                 Master    root
                 Slave01   root
   2. Next, create (or edit) the hosts file in the /etc directory.
      Below is my hosts file for the master node.


            172.16.1.100   Master.home.net   Master
            127.0.0.1      localhost
            172.16.1.101   Slave01
            172.16.1.102   Slave02   (and so on for any additional slaves)
Each node in the cluster has a similar hosts file, with the appropriate
change to the first line to reflect the hostname of that node. For
example, the Slave01 node would have the first line:


      172.16.1.101   Slave01.home.net   Slave01
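Putting those rules together, the complete /etc/hosts on Slave01 might look
like this (a sketch; substitute your own hostnames and addresses):

            172.16.1.101   Slave01.home.net   Slave01
            127.0.0.1      localhost
            172.16.1.100   Master.home.net    Master
            172.16.1.102   Slave02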


The third line then contains the IP address and hostname of Master; all
other nodes are configured in the same manner. Do not remove the
127.0.0.1 localhost line. To allow root users to use rsh, add the
following lines to the /etc/securetty file:


      rsh
      rlogin
      pts/0
      pts/1


Also, modify the /etc/pam.d/rsh file to look like the following:


#%PAM-1.0
# For root login to succeed here with pam_securetty, "rsh" must be
# listed in /etc/securetty.
auth       sufficient   /lib/security/pam_nologin.so
auth       optional     /lib/security/pam_securetty.so
auth       sufficient   /lib/security/pam_env.so
auth       sufficient   /lib/security/pam_rhosts_auth.so
account    sufficient   /lib/security/pam_stack.so service=system-auth
session    sufficient   /lib/security/pam_stack.so service=system-auth

  3 Navigate to the /etc/xinetd.d directory and modify each of
      the service files (rsh, rlogin, telnet and rexec), changing
      the disable = yes line to disable = no.

  4 Close the editor and issue the following command to restart
      xinetd and enable rsh, rlogin, etc.:


      service xinetd restart
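If you prefer not to edit each file by hand, the same change and restart can
be scripted (a minimal sketch, assuming the stock /etc/xinetd.d file names on
Fedora Core 4):

      # flip disable = yes to disable = no for each r-service, then restart xinetd
      for svc in rsh rlogin telnet rexec; do
          sed -i 's/disable[[:space:]]*= yes/disable = no/' /etc/xinetd.d/$svc
      done
      service xinetd restart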


5 Download the MPICH software onto the master node (Master).
  Untar the file in the root directory (if you want to run the cluster as
  root). Issue the command tar zxfv mpich.tar.gz (or whatever the
  name of the tar file is for your version of MPICH), and the mpich-
  1.2.1 directory will be created with all subdirectories in place. If you
  are using a later version of MPICH, the last number in the directory
  name will be different.


  Change to the newly created mpich-1.2.1 directory. Make certain to
  read the README file (if it exists). At the command prompt, type
  ./configure, and when the configuration is complete and you are
  back at the command prompt, type make.


  The make may take a few minutes, depending on the speed of your
  master computer. Once make has finished, add the mpich-
  1.2.1/bin and mpich-1.2.1/util directories to your PATH in
  .bash_profile, as shown in the sketch below.
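Here is roughly what the whole build and PATH setup looks like (a sketch that
assumes the tarball was unpacked in /root; adjust the paths to wherever you
untarred MPICH):

      # build MPICH from the untarred source tree
      cd /root/mpich-1.2.1
      ./configure
      make

      # lines to append to ~/.bash_profile so the MPICH tools are on your PATH
      PATH=$PATH:/root/mpich-1.2.1/bin:/root/mpich-1.2.1/util
      export PATH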


  6   Log out and then log in to enable the modified PATH containing
      your MPICH directories.
7   From     within   the   mpich-1.2.1    directory,   go   to   the
    util/machines/ directory and find the machines.LINUX file.
    This file will contain the hostnames of all the nodes in your
    cluster. When you first view the file, you'll notice that
    five copies of the hostname of the computer you are using will
    be in the file. For the Master node on our cluster, there will be
    five copies of Master in the machines.LINUX file. If you have
    only one computer available, leave this file unchanged, and you
    will be able to run MPI/MPICH programs on a single machine.
    Otherwise, delete the five lines and add a line for each node
    hostname in your cluster, with the exception of the node you
    are using. For my cluster, my machines.LINUX file as viewed
    from master looks like this:


                Slave01
                Slave02 (and so on, one hostname per line)

8   Then make all the example files and the MPE graphic files. First,
    navigate to the mpich-1.2.1/examples/basic directory and
    type make to build all the basic example files. When this
    process has finished, go to the mpich-1.2.1/mpe/contrib directory
    and make some additional MPE example files, especially if you
    want to view graphics. Within the mpe/contrib directory, you
    should see several subdirectories. The one we are interested
    in is the mandel directory. Change to the mandel directory and
    type make to create the pmandel executable. You are now ready
    to test your cluster.
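The two make steps in this item amount to the following (again a sketch
assuming MPICH lives under /root):

      # build the basic example programs, including cpilog
      cd /root/mpich-1.2.1/examples/basic
      make

      # build the MPE graphics demo, pmandel
      cd /root/mpich-1.2.1/mpe/contrib/mandel
      make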
Running the Test Programs

  1. From within the mpich-1.2.1/examples/basic directory, copy
     the cpilog executable (if this file is not present, try typing make
     again) to your top-level directory (/root).


  2. Then, from your top-level directory, rcp the cpilog file to each
     node in your cluster, placing the file in the corresponding
     directory on each node. For example, if I am logged in as root on
     the master node, I'll type rcp cpilog Slave01:/root to copy
     cpilog to the /root directory on Slave01 (see the loop sketch
     after Figure 1 for copying to every node). Once the files have
     been copied, I'll type the following from the top directory of my
     master node to test my cluster:


  3. mpirun -np 1 cpilog - you should get output similar to the following:


pi is approximately 3.1415926535899406,
Error is 0.0000000000001474
Process 0 is running on node00.home.net
Wall clock time = 0.360909

  Figure 1
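As mentioned in step 2, copying cpilog to every node can be scripted with a
short loop (a sketch; the node names are placeholders that should match your
hosts and .rhosts files):

      # push cpilog from the master's /root to /root on each slave
      for node in Slave01 Slave02; do
          rcp cpilog $node:/root
      done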


  Try to run this program using more than one node.


  mpirun -np 2 cpilog. Depending on the number of nodes in your
  cluster, try using all the available nodes and note the difference in
  execution time. For example, with an eight-node cluster, I would
  execute the command mpirun -np 8 cpilog, and the result should
  look like the following:
pi is approximately 3.1415926535899406,
Error is 0.0000000000001474
Process 0 is running on node00.home.net
Process 1 is running on node01.home.net
Process 2 is running on node02.home.net
Process 3 is running on node03.home.net
Process 4 is running on node04.home.net
Process 5 is running on node05.home.net
Process 6 is running on node06.home.net
Process 7 is running on node07.home.net
wall clock time = 0.0611228

Figure 2


Future Work:

      The following tests do not yet work on my cluster:

      To see some graphics, we must run the pmandel program. Copy the
pmandel executable (from the mpich-1.2.1/mpe/contrib/mandel directory) to your
top-level directory and then to each node (as you did for cpilog). Then, if X isn't
already running, issue a startx command. From a command console, type xhost
+ to allow any node to use your X display, and then set your DISPLAY variable
as follows: DISPLAY=node00:0 (be sure to replace node00 with the hostname of
your master node). Setting the DISPLAY variable directs all graphics output to
your master node. Run pmandel by typing mpirun -np 2 pmandel.
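Put together, the graphics test looks roughly like this when run as root on the
master node (a sketch; replace Master with your master node's hostname):

      startx                     # only if X is not already running
      xhost +                    # allow any node to draw on this display
      export DISPLAY=Master:0    # direct all graphics output to the master
      mpirun -np 2 pmandel       # pmandel needs at least two processes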

The pmandel program requires at least two processes to run correctly. You
should see the Mandelbrot set rendered on your master node.
Figure 3: The Mandelbrot Set Rendered on the Master Node

You can use the mouse to draw a box and zoom into the set if you want. Adding
more processors (mpirun -np 8 pmandel) should increase the rendering speed
dramatically. The Mandelbrot set graphic has been partitioned into small
rectangles for rendering by the individual nodes. You actually can see the nodes
working as the rectangles are filled in. If one node is a bit slow, the
rectangles from that node will be the last to fill in. It's fascinating to watch.
We've found no graceful way to exit this program other than pressing Ctrl-C or
clicking the close box in the window. You may have to do this several times to kill
all the nodes. Another option is to copy the cool.points file from the original
mandel directory to the same top-level directory (on the master node) as
pmandel and run mpirun -np 8 pmandel -i cool.points

The -i option runs cool.points as a script and puts on a nice Mandelbrot slide
show. You could use the cool.points file as a model to create your own display
sequence if you like.
References:

Step-by-Step Clustering Using MPICH 1.2.2.3: http://www.linuxjournal.com/article/5690

MPICH software downloads: http://www-unix.mcs.anl.gov/mpi/mpich/

				