An Introduction to the Computational Grid

Jeff Linderoth
Dept. of Industrial and Systems Engineering
Univ. of Wisconsin-Madison
linderot@cs.wisc.edu

Steve Wright
Computer Sciences Dept.
Univ. of Wisconsin-Madison
swright@cs.wisc.edu

Second International Conference on Continuous Optimization (ICCOPT II)
McMaster University, Hamilton, Ontario, Canada

Outline
  What is “The Grid?”
  Grid Software: Condor
  Large-scale Grid resources: Teragrid, Open Science Grid
  Using Condor: Some Hands-on Demos

The Grid

Richard Dawson Rules!
  Come on, Let’s Play the Feud
  “100 People Surveyed. Top 5 answers are on the board. Here’s the question...”
  Name one common use of the Internet

The Big Board
  1. email
  2. Looking up answers to homework problems
  3. YouTube
  4. Updating personal information at myspace
  5. Looking at pictures of Anna Kournikova

Strike!
  Doing Computations

Building a Grid
  People envision a “Computational Grid” much like the national power grid
  Users can seamlessly draw computational power whenever they need it
  Many resources can be brought together to solve very large problems
  Gives application experts the ability to solve problems of unprecedented scope and complexity, or to study problems which they otherwise would not.
  Large funded initiative in the US.
    NSF Office of Cyberinfrastructure

Types of Grids
  Computational grids
    Focus on computationally intensive operations.
    This includes CPU Scavenging Grids, which are our focus today
  Data grids
    Help control, share, and manage large quantities of (distributed) data
  Equipment grids
    Associated with a piece of expensive equipment (telescope, earthquake shake table, advanced photon source)
    Grid software used to access and control equipment remotely
  Access grid
    Used to support group-to-group interactions
    Consists of multimedia large-format displays, presentation and interactive environments, interfaces to Grid middleware and visualization environments.

Grid Contrasts (Source: IBM Web Site)
  Grid vs. Web
    Like the Web, grid computing keeps complexity hidden: multiple users enjoy a single, unified experience.
    Unlike the Web, which mainly enables communication, grid computing enables full collaboration toward common business or scientific goals.
  Grid vs. P2P
    Like peer-to-peer, grid computing allows users to share files.
    Unlike peer-to-peer, grid computing allows many-to-many sharing of not only files but other resources as well.

Grid Contrasts
  Grid vs. Clusters
    Like clusters and distributed computing, grids bring computing resources together.
    Unlike clusters and distributed computing, which need physical proximity and operating homogeneity, grids can be geographically distributed and heterogeneous.
  Grid vs. Virtualization
    Like virtualization technologies, grid computing enables the virtualization of IT resources.
    Unlike virtualization technologies, which virtualize a single system, grid computing enables the virtualization of vast and disparate IT resources.

This ain’t easy!
  User access and security
    Who should be allowed to tap in?
  Interfaces
    How should they tap in?
  Heterogeneity
    Different hardware, operating systems, and software
  Dynamic
    Participating Grid resources may come and go
    Fault-Tolerance is very important!
  Communicationally challenged
    Machines may be very far apart ⇒ slow communication.

Grid Computing Tools: Globus
  Globus: Widely-used grid computing toolkit
  Globus Services/Libraries
    Security, Information infrastructure, Resource management, Data management, Communication, Fault detection, Portability.
  It is packaged as a set of components that can be used either independently or together to develop applications.

Building a Grid
  Even with wonderful tools like Globus providing these services, there is still a fundamental obstacle to creating computational grids available to all scientists
  GREED!
    Most people don’t want to contribute “their” machine!
  How to induce people to contribute their machine to the Grid?
    Screensaver: BOINC, seti@home
    Social Welfare: fightaids@home
    Offer frequent flyer miles: company went bankrupt
    Let the people keep control over their machine
    Give donors a chance to use the Grid

Condor
  Peter Couvares, Alan DeSmet, Peter Keller, Miron Livny, Erik Paulsen, Marvin Solomon, Todd Tannenbaum, Greg Thain, Derek Wright
  http://www.cs.wisc.edu/condor

Condor: www.cs.wisc.edu/condor
  Manages collections of “distributively owned” workstations
    User need not have an account or access to the machine
    Workstation owner specifies conditions under which jobs are allowed to run
    All jobs are scheduled and “fairly” allocated among the pool
  How does it do this?
    Scheduling/Matchmaking
    Jobs can be checkpointed and migrated
    Remote system calls provide the originating machine’s environment

Matchmaking
  Job ad
    MyType = Job
    TargetType = Machine
    Owner = ferris
    Cmd = cplex
    Args = seymour.d10.mps
    HasCplex = TRUE
    Memory ≥ 64
    Rank = KFlops
    Arch = x86_64
    OpSys = LINUX
  Machine ad
    MyType = Machine
    TargetType = Job
    Name = nova9
    HasCplex = TRUE
    Arch = x86_64
    OpSys = LINUX
    Memory = 256
    KFlops = 53997
    RebootedDaily = TRUE

Checkpointing/Migration
  [Figure: checkpoint/migration timeline. Labels: Professor’s Machine, Checkpoint Server, Grad Student’s Machine; Professor Arrives, Grad Student Arrives, Grad Student Leaves; times 5am, 8am, 8:10am, 12pm; two 5 min intervals]

Other Condor Features
  Pecking Order
    Users are assigned priorities based on the number of CPU cycles they
    have recently used. If someone with higher priority wants a machine, your job will be booted off.
  Flocking
    Condor jobs can negotiate to run in other Condor pools.
  Glide-in
    Globus provides a “front-end” to many traditional supercomputing sites.
    Submit a Globus job which creates a temporary Condor pool on the supercomputer, on which users’ jobs may run.

Condor + Operations Research
  GAMS (www.gams.com) has added Grid Computing Language Extensions
  This allows regular GAMS optimization models to be submitted to job schedulers like Condor!

    mymodel.solvelink=3;
    loop(scenario,
      demand=sdemand(scenario);
      cost=scost(scenario);
      solve mymodel min obj using minlp;
      h(scenario)=mymodel.handle);

  Ferris and Bussieck use this strategy, in combination with some “manual branching” and the CPLEX MIP solver, to solve three previously unsolved MIPLIB2003 instances “overnight”
  More this afternoon!

Condor Daemons
  condor_master: Controls all daemons
  condor_startd: Controls executing jobs
  condor_starter: Helper for starting jobs
  condor_schedd: Controls submitted jobs
  condor_shadow: Submit-side helper for running jobs
  condor_collector: Collects system information; only on Central Manager
  condor_negotiator: Assigns jobs to machines; only on Central Manager

[Figure: A Typical Condor Pool]

[Figure: How Condor Starts Up Your Job]

Flocking and Glide-in
  Flocking
    Collector on central manager shark.ie.lehigh.edu is allowed to negotiate with central manager from a
    different pool condor.cs.wisc.edu
    shark’s condor_config: FLOCK_TO = condor.cs.wisc.edu
    condor’s condor_config: FLOCK_FROM = shark.ie.lehigh.edu
    Beware firewalls! (schedd on submit machine must be able to make direct socket connection to submitting machine)
    There is a tool, GCB (Generic Connection Broker), that can get around this limitation
  Glide-in
    Resource request made to gatekeeper
      Often on high-performance computing resource
    Gatekeeper makes request to batch-scheduled resource.

[Figure: Personal Condor: A Computational Grid]

Grid-Enabling Algorithms
  Condor and a growing number of interconnection mechanisms give us the infrastructure from which to build a grid (the spare CPU cycles).
  We still need a mechanism for controlling algorithms on a computational grid
    No guarantee about how long a processor will be available.
    No guarantee about when new processors will become available.
  To make parallel algorithms dynamically adjustable and fault-tolerant, we could (should?) use the master-worker paradigm
  What is the master-worker paradigm, you ask?
    More in the next talk!

Distributed Resources

The Teragrid
  http://www.teragrid.org
  Consortium of traditional high-performance computing centers
  > $150M of NSF funding behind it!
  Over 100 TeraFLOPS!
    total CPU power
  Dozens of Petabytes of online and archival storage
  30Gbps backbone

  Site      #         Type
  IU        712       PowerPC, Itanium, Xeon
  NCAR      1024      Blue Gene
  SDSC      3612      Itanium, Power-4, Blue Gene
  NCSA      4381      Itanium, Altix, Xeon
  UC/ANL    316       Itanium, Xeon
  CACR      104       Itanium
  PSC       5248      Alpha
  Purdue    5012      Xeon
  TACC      5256      Xeon, Ultra-Sparc
            21,284

Open Science Grid
  A distributed computing infrastructure for large-scale scientific research, built and operated by a consortium of universities and national laboratories
  “Virtual Organizations”
    Compact Muon Solenoid
    CompBioGrid
    Genome Analysis and Database Update
    Grid Laboratory of Wisconsin
    nanoHUB Network for Computational Nanotechnology
  Computing Resources
    85 participating institutions
    ≈ 25,000 computers
    175 TB of storage

Putting it all together
  Distributed Resources
    The Teragrid: http://www.teragrid.org
    Open Science Grid: http://www.opensciencegrid.org
  The Upshot
    You can put all of these components together to solve BIG problems in operations research
    You can use byproducts (software tools) of this research
    We still need to use our OR expertise to engineer the algorithms for the computational platform

Installing and Setting up (Personal) Condor
  ./condor_configure --install=/home/jtl3/tmp/condor-6.8.5/release.tar --install-dir=/home/jtl3/condor --local-dir=/home/jtl3/condor/local --make-personal-condor
  Set environment variable CONDOR_CONFIG to point to $HOME/condor/etc/condor_config
  Edit CONDOR_CONFIG to have HOSTALLOW_WRITE = *
    Anyone can join your pool
  export PATH=$HOME/condor/bin:$HOME/condor/sbin:$PATH
  condor_master
Other CPU Grid Building Tools
  Condor is not the only way to build a CPU grid
    Free: Sun Grid Engine: http://gridengine.sunsource.net/
    Commercial: LSF Platform: www.platform.com

Let’s Run Condor!
  You have each been set up with a temporary account at UW-Madison, from which we can run some Condor jobs.
    These accounts will stay active for a week or so.
  To get started, ssh to chopin.cs.wisc.edu, and log in using the username and password distributed earlier.
  If you don’t have ssh and are running Windows, you can get it from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
  Now run the following commands:
    cd /scratch
    mkdir your_name    (choose some unique identifier)
    cd your_name
    cp -r /scratch/ICCOPT/* .
    source ./setit

Mmmmmmmmmmmmmm. Pie
  Our first computational task will be to estimate π by numerical integration.
  Everyone knows...
    \[ \int_0^1 \frac{1}{1+x^2}\,dx = \arctan(x)\Big|_{x=0}^{1} = \arctan(1) = \frac{\pi}{4}. \]

The Rectangle Rule
  [Figure: plot of 4/(1+x*x) for x from 0 to 1]

A Program to Estimate π
  We’ve written a π-calculator for you.
    cd src
    gcc pi1.c -lm -o pi1
    ./pi1 1000
  This is not a parallel program. Just a simple (one process) program.
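The source of pi1.c is not reproduced in these slides, so the following is only a minimal Python sketch of the kind of computation such a program might perform: the rectangle rule applied to 4/(1+x*x) on [0, 1]. The function name estimate_pi, the use of rectangle midpoints, and the reading of the command-line argument (1000 above) as the number of rectangles are all assumptions made for illustration.

```python
import math


def estimate_pi(n: int) -> float:
    """Approximate pi with the rectangle rule on 4/(1+x^2) over [0, 1].

    Hypothetical helper: n is assumed to be the number of rectangles,
    and each rectangle's height is taken at its midpoint.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of rectangles")
    h = 1.0 / n  # width of each rectangle
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h  # midpoint of the i-th rectangle
        total += 4.0 / (1.0 + x * x)
    return h * total


n = 1000
approx = estimate_pi(n)
print(f"pi is about {approx:.16f}")
print(f"Error is {abs(approx - math.pi):.3e}")
```

Because the integrand is scaled by 4, the sum approximates π directly rather than π/4. Increasing n shrinks the error at the cost of a longer run, which is what makes the one-billion-rectangle run a convenient test job for Condor.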
Condor Universes
  Condor jobs run in a specific Condor Universe
    Standard: Has cool features like checkpointing and migration of jobs
      Requires special linking of your program
    Vanilla: No cool Condor features (regular)
    MPI/PVM/Java/Grid
      Not mentioned here today, but they exist.

Compiling for Condor
  Standard Universe
    Put the command condor_compile in front of your normal link line.
      [jtl3@fire1 condor]$ condor_compile gcc pi1.c -o pi1-standard -lm
  Vanilla Universe
    Do nothing
  Condor submission is like other resource management software
    Describe your job in a job submission file
    Submit and monitor your job with command line programs

Sample Condor Submission Files
  Standard universe
    universe = standard
    executable = pi1-standard
    arguments = 1000000000
    output = pi1.out
    error = pi1.err
    notification = Never
    notify_user = swright@cs.wisc.edu
    getenv = True
    rank = kflops
    queue
  Vanilla universe
    universe = vanilla
    executable = pi1
    arguments = 666
    output = pi1.out
    requirements = (OpSys != "WINNT51")
    error = pi1.err
    getenv = True
    rank = Memory
    queue
  man condor_submit
  http://www.cs.wisc.edu/condor/manual/v6.8/condor_submit.html

The Big Four
  condor_submit: Submit a job to the Condor scheduler
  condor_q: Check the status of the queue of Condor jobs
  condor_status: Check the status of the Condor pool
  condor_rm: Delete a Condor job

Let’s Do It!
    [jtl3@fire1 condor]$ condor_submit run.condor
    Submitting job(s).
    1 job(s) submitted to cluster 16.
    [jtl3@fire1 condor]$ condor_q
    -- Submitter: fire1.cluster : <192.168.0.1:32777> : fire1.cluster
     ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
     16.0    jtl3    8/4 11:22   0+00:00:16 R  0   3.4  pi1-standard 1000000000
    [jtl3@fire1 condor]$ cat pi1.out
    pi is about 3.1415926555921398488635532
    Error is 2.0023467328655897e-09
  I could do condor_rm 16.0
  Any Condor questions?

Condor Parallel Example: Statistical Bootstrapping
  [Figure: bootstrapping workflow. A sample {z2, z5, z7, ...} is drawn from the distribution {z1, z2, z3, z4, z5, ...}; resamples such as {z2, z2, z5, ...}, {z5, z7, z9, ...}, and {z7, z7, z9, ...} are each analyzed, and the results are coalesced.]

Statistical Bootstrapping
  driver.m
    dist_size = 100000;
    d = rand(dist_size, 1) .* 500;
    % +1 keeps the indices in 1..dist_size (Octave indexing starts at 1)
    subset = d(floor(rand(1000,1) .* dist_size) + 1);
    save "subset" subset;
  Driver creates distribution.
  Driver creates submit file.
  submit
    universe = vanilla
    executable = worker.m
    transfer_files = true
    when_to_transfer_output = on_exit
    transfer_input_files = subset
    output = mean.$(PROCESS)
    log = log
    queue 5

Running the example
  Shell prompt
    $ ./driver.m
    Submitting job(s).....
    Logging submit event(s).....
    5 job(s) submitted to cluster 565262.
  5 minutes later...
    All jobs done.
    mean of mean is 161.014978

Let’s run it!
  Go to your directory /scratch/your_name on chopin.cs.wisc.edu, and look at the files submit, driver.m, and worker.m
  You’ll see that they contain the material from the previous slides. The two .m files have a line at the start indicating that they are to be run using Octave (a free Matlab-like language available from www.octave.org).
  Now run it by typing “driver.m” on the command line!
  driver.m creates the file “subset” and then invokes submit. Five instances of worker.m are submitted into the Condor pool. You can check on the status of these by typing condor_q (in another window).
  When the five jobs are finished, driver.m does the final computation and prints a message to the screen.
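For readers without access to the pool, the same sample / resample / analyze / coalesce workflow can be sketched in a single Python process. This is only an illustration under stated assumptions: worker.m is not shown in the slides, so the worker here is assumed to draw one resample with replacement and report its mean. The names make_subset, worker, and driver, and the per-worker seeds, are hypothetical. On Condor, each worker call would instead be one of the five queued jobs.

```python
import random
import statistics


def make_subset(dist_size=100_000, subset_size=1_000, scale=500.0, seed=0):
    """Create the distribution and draw a sample from it (the driver's first step)."""
    rng = random.Random(seed)
    distribution = [rng.random() * scale for _ in range(dist_size)]
    # randrange yields valid indices 0..dist_size-1
    return [distribution[rng.randrange(dist_size)] for _ in range(subset_size)]


def worker(subset, seed):
    """Assumed worker behaviour: resample with replacement, then analyze (mean)."""
    rng = random.Random(seed)
    resample = [rng.choice(subset) for _ in subset]
    return statistics.fmean(resample)


def driver(n_workers=5):
    """Run the workers one after another and coalesce their results."""
    subset = make_subset()
    means = [worker(subset, seed=process) for process in range(n_workers)]
    return statistics.fmean(means)


print(f"mean of mean is {driver():.6f}")
```

The workers share nothing except the input file, so they can run in any order and on any machine. That independence is what lets the submit file simply say queue 5.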
