Cluster Building and Design


         Vikas Singhal
      VECC, Kolkata, India




                                    Vikas Singhal, VECC
                                      February 9, 2006

                Cluster Building and Design

    General view of HPC
    Clustering concept
    Requirements for clustering
    Quattor description
    Working of Condor
    Glimpse of Ganglia
    Current status of our cluster


          High Performance Computing

A branch of computing that deals with extremely powerful computers and
the applications that use them.

High computing power is required for data-intensive or compute-intensive
applications, depending on the requirement.

A supercomputer is one answer to the HPC problem.
    A supercomputer is characterized by very high speed and very large
    memory.
    Speed is measured in flops (floating-point operations per second).
    The fastest computer in the world at the time of this talk is
    BlueGene/L (built by IBM), at about 280 Tflops.



                             Technologies for HPC


Traditional: build faster CPUs
    Special electronic technology for increasing clock speed
        E.g. CRAY
        Very high clock speed
        Very high heat dissipation
        Advanced cooling techniques required
        (liquid Freon / liquid nitrogen)
    Advanced CPU architecture
        Pipelining, vector processing, multiple functional units, etc.
        Expensive
    But easy for the user: no special programming required

Parallel processing: harness a large number of ordinary CPUs and divide
the job between them
    Large number of conventional CPUs interconnected through a network
    Cost effective
    Program writing is difficult: the job has to be split into
    independently executable units
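
The idea of splitting a job into independently executable units can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names are hypothetical, and a thread pool stands in for the cluster nodes (on a real cluster each chunk would be sent to a different machine as its own job).

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous, non-overlapping (start, stop) chunks."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(chunk):
    """One independently executable unit: it needs only its own (start, stop)."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Divide the job, run the units side by side, then combine the results."""
    chunks = split_range(n, workers)
    # Assumption: threads are a stand-in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    print(split_range(10, 3))
    print(parallel_sum_of_squares(1000))
```

The difficult part mentioned on the slide is visible even here: the programmer, not the system, has to decide how to cut the work so that no unit depends on another.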



                Why Clustering

       Lower-cost technology than a supercomputer.

       Faster than a supercomputer of the same hardware cost.

       No technical or technological limitations.

       Scalable and simple.


For high-performance and high-availability computing,
building a cluster of computers is one of the best solutions.



   High Computing Power: Clustering of Computers

Application: Compute-Intensive Tasks

          The main aim is High Performance Computing (HPC).
          (Most of the TOP500 computers are built by clustering;
          BlueGene/L has approximately 131,000 processors.)

          Single user and a single number-crunching problem.

          Communication between nodes must be very fast
          (a high-speed, costly network card is required).

          Programs must be written in a parallel language or in a
          parallel environment.
                   Parallel languages: Linda, occam, etc.
                   Parallel extensions to serial languages:
                   High Performance Fortran (HPF)
                   Parallel APIs: OpenMP, MPI


   High Computing Power: Clustering of Computers

Application: Data-Intensive Tasks

          The main aim is not High Performance Computing (HPC) but high
          availability.
          Multi-user and multi-job system
                    7 collaborating institutes
                    More than 100 users (see Mr. S. K. Pal's talk)
          It is part of a global grid such as EDG.

          Security is the main concern.

          Internet connectivity (high bandwidth) is required.
          (We have installed a 4 Mbps leased line (1:4).)
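
To get a feel for what such a link means for data-intensive work, the short calculation below estimates transfer times. It is a rough sketch under stated assumptions: "1:4" is read here as a contention ratio (the line may be shared four ways in the worst case), protocol overhead is ignored, and the function name is hypothetical.

```python
def transfer_time_seconds(size_gb, link_mbps, contention=1):
    """Seconds needed to move `size_gb` gigabytes over a `link_mbps` link.

    Assumption: `contention` is the number of subscribers sharing the line,
    so the worst-case share is link_mbps / contention.
    """
    bits = size_gb * 8 * 1000**3                 # decimal gigabytes to bits
    effective_bps = link_mbps * 1_000_000 / contention
    return bits / effective_bps


if __name__ == "__main__":
    best = transfer_time_seconds(1, 4)                  # full 4 Mbps available
    worst = transfer_time_seconds(1, 4, contention=4)   # only 1 Mbps available
    print(f"1 GB, best case : {best / 60:.1f} minutes")
    print(f"1 GB, worst case: {worst / 60:.1f} minutes")
```

Under these assumptions one gigabyte takes roughly half an hour in the best case and over two hours in the worst case, which is why high bandwidth is listed as a requirement.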



     How to Build a Cluster for Our Requirements


Hardware
    Purchase according to requirement and budget.
    Processors
    Memory (RAM)
    Storage
    No need to purchase a high-speed network card

Software
    According to requirement.
    Open-source availability.
    The software area is very big.
        Cluster building S/W
        Cluster monitoring S/W
        Job scheduling S/W
        User management S/W






Our Specific Requirement

    Procurement of hardware
    Procurement of software

The full cluster is not procured at once; it is a step-by-step process.
Different hardware supports different software.
                  Present Status of Tier2-Kol Cluster

[Network diagram; recoverable labels only]
    DMZ: 125.20.3.11, 4 Mbps (1:4)
    Management nodes
    Two Gigabit switches
    192.168.x.x (standby)
    HP ProLiant DL360 G3, dual-CPU Xeon 2.4 GHz
    Computing nodes
    Based on high availability


                        High Availability

Data-intensive and real-time, task-critical systems require high
availability.

High availability is achieved through redundancy (eliminating single
points of failure).

    2 Gigabit switches
    Each server has 2 NICs (eth0 and eth1)
    Based on the bonding concept
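
When two NICs are bonded on Linux, the bonding driver reports its state in a text file under /proc/net/bonding/. The sketch below parses such a report to find which NIC is currently active. It is illustrative only: the bond name bond0, the active-backup mode, and the abbreviated sample text are assumptions, not details taken from the VECC setup.

```python
def parse_bonding_status(text):
    """Return (active_slave, {slave_name: mii_status}) from bonding status text."""
    active = None
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # blank or free-form line
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value               # following lines describe this NIC
        elif key == "MII Status" and current is not None:
            slaves[current] = value
    return active, slaves


# Hand-written, abbreviated sample in the style of /proc/net/bonding/bond0.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

if __name__ == "__main__":
    active, slaves = parse_bonding_status(SAMPLE)
    print("active NIC:", active)
    print("link state:", slaves)
```

On a live node the same function could be fed the contents of the real status file, for example from a monitoring script that raises an alert when either link goes down.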




                  Redundancy (cont.)

2 hard disks
    Each is a mirror of the other.
    Both are hot-swappable.
    Implemented with hardware RAID-1 (mirroring).
    Both are kept synchronized at all times.

rsync
    We are trying to make a mirror of the management node.
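
A management-node mirror based on rsync could be driven by a small script like the one below. This is a minimal sketch, not the procedure used at VECC: the host name mgmt002 and the directory paths are hypothetical, and only the standard rsync options -a (archive mode) and --delete (remove files that no longer exist on the source) are used.

```python
import subprocess


def build_rsync_command(src_dir, dest_host, dest_dir, delete=True):
    """Build an rsync argument list that mirrors src_dir to dest_host:dest_dir."""
    cmd = ["rsync", "-a"]                 # -a: preserve permissions, times, links
    if delete:
        cmd.append("--delete")            # make the copy an exact mirror
    # A trailing slash on the source copies the directory's contents.
    cmd += [src_dir.rstrip("/") + "/", f"{dest_host}:{dest_dir}"]
    return cmd


def mirror(src_dir, dest_host, dest_dir):
    """Run the mirror; raises CalledProcessError if rsync fails."""
    subprocess.run(build_rsync_command(src_dir, dest_host, dest_dir), check=True)


if __name__ == "__main__":
    # Hypothetical standby host and path, printed rather than executed.
    print(" ".join(build_rsync_command("/etc", "mgmt002", "/backup/etc")))
```

Run periodically (for example from cron), this keeps the standby copy close to the primary, though unlike disk mirroring it is not instantaneous.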


     Software Requirement for Making a Cluster

Software considered for cluster building:

        OSCAR                     : Free, but harnessing of client nodes is
                                    limited
        SCALI                     : Not free S/W. Sold with network cards
                                    (as at IMSc)

        Red Hat Cluster Suite     : Not very suitable

        CPM (Central Processor Manager) : IBM proprietary

        Rocks                     : Not free software

        Quattor                   : Free and best suited

To decide which one is "best" for our requirements, one has to gain
experience with all of them.

           Installing a Quattor Server and Client
Quattor is an administration toolkit for optimizing resources.
Quattor is a large-scale management system for managing medium to
very large (>1000 node) clusters.
3 sets of Quattor RPMs are available:

1. i386: for all Pentium or Xeon processors, or any processor with the
    32-bit IA-32 instruction set
2. ia64: for 64-bit Intel Itanium machines
3. x86_64: for 64-bit machines that also support the x86 instruction set,
    such as AMD Opteron
Site address:
          http://quattor.org
Package RPMs:
          http://quattorsw.web.cern.ch/quattorsw/software/quatttor

Requirements:
        Supports SLC or RH Linux 7.3
        Disk: 6.5 GB for the server, 2.5 GB per client OS
No specific hardware or software is required for building a Quattor
cluster.




CDB: Configuration Database
    Hierarchical, template-based structure
    Provides one common structure for different databases
    Contains cluster descriptions, networking parameters, etc.

SPMA: Software Package Manager Agent, for software deployment
    Manages the installation of different software packages
    Handles multiple package formats
    Manages the Software Repository (SWRep)

NCM: Node Configuration Manager, for system configuration
    A framework in which service-specific plug-ins (components) make
    the necessary system changes.

AII: Automated Installation Infrastructure
    Works on top of the native RH/SL installer using PXE.
        Anaconda / KickStart.
        DHCP server (IP address + kernel location).
        TFTP server (boot kernel).
        HTTP server (OS images + packages).
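
The DHCP step of a PXE installation ("IP address + kernel location") comes down to one entry per node that gives the node its address and points it at the TFTP server. The sketch below generates such an entry in ISC dhcpd syntax. It is an illustration only: the MAC address, IP addresses, and boot file name are made up, and in a Quattor site AII would normally produce this configuration itself.

```python
def dhcp_host_entry(hostname, mac, ip, tftp_server, boot_file="pxelinux.0"):
    """Return an ISC dhcpd 'host' stanza for one PXE-booted node."""
    return (
        f"host {hostname} {{\n"
        f"  hardware ethernet {mac};\n"     # identifies the node's NIC
        f"  fixed-address {ip};\n"          # IP address handed to the node
        f"  next-server {tftp_server};\n"   # TFTP server holding the boot kernel
        f'  filename "{boot_file}";\n'      # boot loader fetched over TFTP
        f"}}\n"
    )


if __name__ == "__main__":
    # Hypothetical values for a first compute node.
    print(dhcp_host_entry("node001", "00:11:22:33:44:55",
                          "192.168.1.1", "192.168.1.254"))
```

After the boot loader and kernel arrive over TFTP, the installer (Anaconda with a KickStart file) fetches the OS image and packages over HTTP, which completes the chain listed above.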

  Basic Requirements for Installing a Cluster Site

Cluster building: Quattor

    Some basic steps after Quattor installation:
    C3 commands
    Bonding package, for high availability (if dual NIC)
    LDAP (Lightweight Directory Access Protocol)
    S/W firewall (make firewall rules)


Job scheduling: Condor
    Specialized workload management system.
    Provides a job queuing mechanism, scheduling policy, resource
    monitoring, and resource management.
    Can checkpoint a job and migrate it to a different machine.




Condor Daemons

[Figure: Condor daemons]


Job Submission Steps

[Figure: job submission steps]



                    Condor Commands


condor_compile
   Relinks source or object files with the Condor libraries.
   The Condor library provides checkpointing, migration, and remote
   system calls.

condor_submit
   Takes a submit description file as input and produces a job ClassAd
   for further processing by the central manager.

condor_status
   Shows information about the machines in the Condor pool.

condor_q
   Shows job status.


                 Submit Description Files


Direct the queuing of jobs.

Contain:

   Executable location
   Command-line arguments to the job
   stdin, stderr, stdout
   Initial working directory
   should_transfer_files = <YES | NO | IF_NEEDED>
       NO disables the Condor file transfer mechanism
   when_to_transfer_output = <ON_EXIT | ON_EXIT_OR_EVICT>
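
The fields listed above can be assembled into a complete submit description with a short helper. The following is a sketch under assumptions: the vanilla universe, the output file names job.out, job.err, and job.log, and the helper's name and parameters are all illustrative choices, not settings from the VECC cluster.

```python
VALID_TRANSFER = {"YES", "NO", "IF_NEEDED"}
VALID_WHEN = {"ON_EXIT", "ON_EXIT_OR_EVICT"}


def make_submit_description(executable, arguments="", initialdir=".",
                            stdin=None, should_transfer_files="YES",
                            when_to_transfer_output="ON_EXIT"):
    """Return the text of a Condor submit description file for one job."""
    if should_transfer_files not in VALID_TRANSFER:
        raise ValueError(f"bad should_transfer_files: {should_transfer_files}")
    if when_to_transfer_output not in VALID_WHEN:
        raise ValueError(f"bad when_to_transfer_output: {when_to_transfer_output}")
    lines = [
        "universe = vanilla",             # assumption: plain, non-relinked job
        f"executable = {executable}",
        f"arguments = {arguments}",
        f"initialdir = {initialdir}",
    ]
    if stdin is not None:
        lines.append(f"input = {stdin}")
    lines += [
        "output = job.out",               # hypothetical file names
        "error = job.err",
        "log = job.log",
        f"should_transfer_files = {should_transfer_files}",
    ]
    if should_transfer_files != "NO":     # NO disables file transfer entirely
        lines.append(f"when_to_transfer_output = {when_to_transfer_output}")
    lines.append("queue")                 # queue one instance of the job
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description("/bin/hostname"))
```

The resulting text would be saved to a file and passed to condor_submit. A job relinked with condor_compile would use the standard universe instead of vanilla to get checkpointing and migration.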




  Cluster Monitoring & Job Submission: Ganglia




Ganglia is a scalable distributed monitoring system for high-performance
computing systems.

Relies on a multicast-based listen/announce protocol to monitor state.
Very low per-node overheads and high concurrency.
It uses
         XML for data representation
         XDR for compact, portable data transport,
         RRDtool for data storage and visualization.
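
Because the monitoring data is represented as XML, any standard XML parser can read it. The sketch below pulls one metric per host out of a Ganglia-style report. The sample document is hand-written and heavily abbreviated; its element and attribute names (CLUSTER, HOST, METRIC, NAME, VAL) follow the usual gmond output as far as I know, so treat them as an assumption and check them against a real report (gmond normally serves one on TCP port 8649).

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a gmond XML report (assumed layout).
SAMPLE_XML = """\
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example" LOCALTIME="0">
    <HOST NAME="node001" IP="192.168.0.1" REPORTED="0">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node002" IP="192.168.0.2" REPORTED="0">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def metric_by_host(xml_text, metric_name):
    """Return {host name: metric value string} for one metric."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        for metric in host.findall("METRIC"):
            if metric.get("NAME") == metric_name:
                result[host.get("NAME")] = metric.get("VAL")
    return result


if __name__ == "__main__":
    print(metric_by_host(SAMPLE_XML, "load_one"))
```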

  Cluster Monitoring & Job Submission: Ganglia (cont.)

Ganglia Monitoring Daemon (gmond)

        gmond is a multi-threaded daemon.
        Runs on each cluster node that we want to monitor.


Ganglia Meta Daemon (gmetad)

        Started only on the management node.

Ganglia PHP Web Front-end

        Displays Ganglia data in a meaningful way.

A new era of Internet use has started

        Until now we used the Internet / Web as an information /
        knowledge base.
        Now we can use HTTP for computing as well.
        Open the page, select an executable file, and submit it.
        The file will execute on a cluster client node.


              Cluster to Grid

For EDG Grid connectivity:
AliEn, EGEE, gLite, LCG-2 ???


To become a part of global monitoring:
MonALISA, Lemon.





               VECC Cluster Machine status


One interactive node:
        At present we have only one interactive node; we will procure
        more in the near future.
                  # ssh interactive001


Other computing nodes:

        There are 6 computing nodes (node001 to node006).
        Users cannot log in to these nodes, but can run compute jobs on
        them.
        They can be used for computing in batch mode, not in interactive
        mode.





              Where We Have Landed Now

    PC – Post Card
    PC – Personal Computer
    PC – Packed Cluster


                        Future Work
C++ and MPI (Message Passing Interface) will be the future for clusters.

For optimum use of the cluster, users have to learn MPI.


                           Questions?