          High Performance Computing Basics

                  April 17, 2007

                Dr. David J. Haglin


  What is the HPC?
  Where did it come from?
  How can you get an account on hpc.mnsu.edu?
  How can you use it for your research?
  Where do you go from here?

                What is the HPC?
                         Many AMD Opteron computers (nodes) in a
                         Connected by a high-speed network
                         In the IT Services Secure area (third floor of the
                         All nodes run Linux
                         http://www.mnsu.edu/hpc

                     What is the HPC?
  [Diagram: Head Node; Worker 1 … Worker 34]
  Head node has 8 GB RAM; 7.4 TB of disk
  Head node is for doing administrative work and starting long jobs
  The 34 worker nodes are for doing long
  Each worker has 8 GB RAM; 80 GB hard disk; 2 dual-core AMD Opteron

                   What is the HPC?
  Software Installed:
       < GNU languages: C/C++ (gcc/g++), Fortran (gfortran)
       < Message Passing Interface (MPI) library: Open MPI
  Software soon to be installed:
       < MATLAB
       < Fluent
       < Portland Group Fortran and C/C++
       < IMSL
  Email is “local delivery only”

                Where did it come from?
                             National Science Foundation
                               < MRI Program (Major
                                  Research Instrumentation)
                               < $140,000
                               < Institutional Equipment
                                  funds upgraded machine by
                                  adding five nodes
                             PIs: Patrick Tebbe, Rebecca
                              Bates, David Haglin
                             Proposal focused on a
                              college-wide need for HPC
                             Vendor: PSSC Labs, Inc.

        How can you get an account?
  We must submit a final report to NSF after July 31, 2009
  Part of the final report must include how much it was used
   within CSET (and within MSU).
  We need to track usage (research projects).
  To get an account, send an email to haglin@mnsu.edu with
   information as described:
    < http://www.mnsu.edu/hpc/accounts.html
    < Your students can get accounts too!
  We are very interested in knowing about publications you
   obtain as a result of using hpc.mnsu.edu.

                Your Research

  Okay, so you got an account.

  Now What?

                Your Research

  Learning to use HPC.
  Learning to use the OpenPBS/Torque job
   queuing software.
  Learning to “design” your usage.
  Tutorials will be maintained at

                    Your Research

  Connect to hpc.mnsu.edu (head node) using ssh
       < ssh on Unix
       < PuTTY or SSH Windows Client (IT Services)
       < The firewall is tight; you may need to request a new
         opening in the firewall for your location
  Line-mode (command-line) interface
  Basic Unix commands:
       < http://www.mnsu.edu/hpc/tutorials/linux_basics.doc

                Your Research

  Disks on hpc:

                      Your Research
  Using OpenPBS/Torque job queuing software:
       <   qstat             -- Inspect current job queue
       <   qsub              -- Add a new job to the queue
       <   qdel              -- Delete one of your jobs from the queue
       <   pbsmon.py         -- See the state of the entire machine
       <   xpbsmon           -- Uses X11 to display machine state
       <   firefox localhost/ganglia -- View the Ganglia monitoring pages in a browser
  Detailed information available at:
       < http://www.clusterresources.com/torquedocs21/users

                 Your Research

  Designing your usage.
       < Assume you have a program you want to run for
         each parameter value from 1 through 1000
       < Ex:       $ myProgram -p1
                   $ myProgram -p2
                   ...
                   $ myProgram -p1000

                Your Research

  Create 1000 “start scripts” to queue 1000 jobs to
   the master queue.
  Start your jobs and monitor their progress
  Combine results when they are all done.
  Organize experiments/runs in folders
  Use scripting languages such as Python to
   generate start scripts.
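
  Generating the start scripts could be sketched in Python as shown here.
  This is a minimal sketch under assumptions, not a prescribed method:
  myProgram and its -p flag come from the earlier example and are assumed
  to be on the PATH; the folder name, the file names start_N.sh and
  result_N.txt, the job names, and the #PBS resource request are all
  hypothetical and should be adapted to your own runs.

```python
from pathlib import Path

# Text of one Torque start script. -N (job name) and -l (resource list) are
# standard qsub directives; the values used here are assumptions.
# PBS_O_WORKDIR is set by Torque to the directory qsub was run from.
TEMPLATE = """#!/bin/sh
#PBS -N run_{p}
#PBS -l nodes=1:ppn=1
cd "$PBS_O_WORKDIR"
myProgram -p{p} > result_{p}.txt
"""


def write_start_scripts(run_dir, first, last):
    """Write one start script per parameter value and return their paths."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for p in range(first, last + 1):
        path = run_dir / f"start_{p}.sh"
        path.write_text(TEMPLATE.format(p=p))
        paths.append(path)
    return paths
```

  Calling write_start_scripts("experiment1", 1, 1000) would place 1000
  scripts in one folder per experiment; each can then be queued with
  qsub from inside that folder.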

                    Your Research

  Input and Output for your jobs:
       < Your script will start on a worker node
       < You can log in to a worker node to see its filesystem:
           ssh n04
           df
       < Standard Output and Standard Error are separate
       < Files are written alongside your script when jobs
       < No way to monitor progress of your computation
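
  Because standard output and standard error arrive as separate files,
  a small helper can pair them up once jobs have finished. The sketch
  below assumes Torque's default file names, <job name>.o<job number>
  and <job name>.e<job number>, with no -o, -e, or -j options
  overriding them; the function names are hypothetical.

```python
import re
from pathlib import Path

# Torque's default names: <job name>.o<job number> for standard output,
# <job name>.e<job number> for standard error.
OUTPUT_NAME = re.compile(r"^(?P<name>.+)\.(?P<kind>[oe])(?P<number>\d+)$")


def collect_job_files(run_dir):
    """Map (job name, job number) to {'o': stdout path, 'e': stderr path}."""
    jobs = {}
    for path in sorted(Path(run_dir).iterdir()):
        match = OUTPUT_NAME.match(path.name)
        if match:
            key = (match["name"], int(match["number"]))
            jobs.setdefault(key, {})[match["kind"]] = path
    return jobs


def jobs_with_errors(jobs):
    """Return the keys of jobs whose standard error file is not empty."""
    return [
        key
        for key, files in jobs.items()
        if "e" in files and files["e"].stat().st_size > 0
    ]
```

  Running collect_job_files on an experiment folder and then
  jobs_with_errors on the result gives a quick list of runs to inspect
  before combining results.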

                Your Research

  Sample script to run from 501 to 505:
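
  One way such a script could look is sketched below in Python. It is
  an illustration under assumptions, not the script shown in the talk:
  myProgram and -p come from the earlier example, the job names are
  hypothetical, and the script relies on qsub reading the job script
  from standard input when no file name is given.

```python
import shutil
import subprocess


def job_script(p):
    """Return the text of a start script for one parameter value."""
    return (
        "#!/bin/sh\n"
        f"#PBS -N run_{p}\n"          # hypothetical job name
        'cd "$PBS_O_WORKDIR"\n'       # directory qsub was run from
        f"myProgram -p{p}\n"
    )


def submit_range(first, last):
    """Queue one job per parameter value; return what qsub printed for each."""
    replies = []
    for p in range(first, last + 1):
        # With no script file argument, qsub reads the script from stdin.
        done = subprocess.run(
            ["qsub"],
            input=job_script(p),
            capture_output=True,
            text=True,
            check=True,
        )
        replies.append(done.stdout.strip())
    return replies


if __name__ == "__main__":
    if shutil.which("qsub") is None:
        # Not on a Torque machine: only show what would be submitted.
        for p in range(501, 506):
            print(job_script(p))
    else:
        for reply in submit_range(501, 505):
            print(reply)
```

  On the head node this queues five jobs, one each for parameters 501
  through 505; qstat then shows them in the queue.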

         Where do you go from here?

  www.mnsu.edu/hpc is a communication portal
  Find colleagues who can help
  Learn more about the capabilities:
       < New software
       < Parallel programming (MPI)
       < Parallel libraries: e.g., ScaLAPACK.
  Keep this machine computing fast
  Other ideas?

