The LA Grid Meta-Scheduling Project

Team: Liana Fong¹, S. Masoud Sadjadi², Yanbin Liu¹, Ivan Rodero³, David Villegas², Selim Kalayci², Norman Bobroff¹, and Julita Corbalan³

¹ IBM T. J. Watson Research Center, Yorktown Heights, NY 10598
² Florida International University, Miami, FL 33199
³ Barcelona Supercomputing Center, Barcelona, Spain

I: Objectives

Objective
• Support interoperation and cooperation of a network of distributed schedulers.

Strategic Importance
• Enhance usability: a common job control language across resource domains.
• Drive interoperability of schedulers, both proprietary and open-source.
• Provide integrated scheduling views for enterprise and grid customers.

Technology Benefits
• Meet various user service objectives through policy-driven scheduling (e.g., capability based, response-time based).
• Maximize resource availability to users, with transparency of resource locations.
• Optimize utilization of resources across domains.

II: P2P Meta-Scheduling

[Figure: peer-to-peer network of the FIU, IBM and BSC meta-schedulers; labels include FIU-GCB, LAGrid, SGE, Fork, IBM-USA, IBM-India, LL/Fork, TDWB, CEPBA and BSCgrid. Legend: C → P, where job flow goes from the consumer (C) to the provider (P) and resource information flows from P to C.]

Some key aspects of the meta-scheduler protocol:
• Heterogeneous sites: the inner structure of the different domains does not affect the functionality of the protocol.
• Site autonomy: each meta-scheduler is responsible for its own site and offers other sites as much information as it chooses.
• Peer-to-peer: there is no centralized body and no single point of failure.

III: Related Work

Centralized model:
• The meta-scheduler has direct information on all resources available at the various institutes of the virtual organization.
• It is responsible for scheduling job execution on all resources.
• Local schedulers at individual institutes act as job dispatchers.

Hierarchical model:
• The meta-scheduler has no direct access to resources in the virtual organization.
• It assigns jobs to the local schedulers of the various institutes.
• Local schedulers match jobs to resources.

Distributed model:
• Multiple local schedulers, each with a companion meta-scheduling functional entity.
• Local schedulers can submit jobs to each other through their respective meta-scheduling functional entities.

[Figures: the centralized, hierarchical and distributed models, showing meta-schedulers, local schedulers and dispatchers, with job flow and information flow.]

Ref: "Distributed job scheduling on computational grids using multiple simultaneous requests" by Vijay Subramani, Rajkumar Kettimuthu, Srividya Srinivasan, and P. Sadayappan.

IV: System Architecture

Connection API
• Establish and terminate connections between domain meta-schedulers.
• Negotiate roles and connection parameters through the interface.
• Provider role: provides resources for job execution and is responsible for sending out resource information.
• Consumer role: uses resources offered by providers and routes job requests to them.
• Heartbeats: exchanged periodically to check that the connection is healthy.

Resource exchange API
• Exchange the scheduling capability and capacity of the domain controlled by the meta-scheduler.
• The exchanged information can be a complete or an incremental set of data.

Job management API
• Submit, re-route and monitor job executions across schedulers.

FIU meta-scheduler
[Figure: FIU architecture; components include Web Console, Command-line, User Client, WS Client, JSDL, Global Scheduling Manager, Resource Management, Connection Management, Job Management, Site Scheduling Manager, GridWay, Globus, SGE, Fork, and the GCB and LAGrid clusters.]
1. The User Client takes the job request from the local user and forwards it to the Global Scheduling Manager (GSM).
2. The GSM queries the Resource Manager (RM) for resources. The RM stores information about local and remote resources.
3. If available resources are found on the local site, the job request is forwarded to the Site Scheduling Manager (SSM).
4. The SSM uses GridWay to submit the job to the grid middleware (Globus).
5. If no resources are available locally, the job request is sent to a remote site through the WS Client.
6. Alternatively, job requests from other peers can be received through the WS layer.

IBM meta-scheduler
[Figure (marked IBM Confidential): IBM architecture; components include Connection Management, Resource Management, Job Management, an Apache Axis2 server WS interface and LoadLeveler.]

BSC meta-scheduler (eNANOS)
[Figure: BSC architecture; components include the eNANOS Client, JSDL, Command-line, Java API, eNANOS Resource Broker, LAGrid eNANOS Plugin, LAGrid RP, WS Client, GT4 Container, Globus, Fork, BSCGrid and the BSC Cluster.]
1. The eNANOS Client forwards user requests to the eNANOS Broker.
2. Remote requests from the P2P infrastructure are handled by a regular web service (Axis2) that acts as a wrapper for a GT4 service implementing the LAGrid APIs and protocols. Connections, resources and other data are stored in Resource Properties.
3. Jobs and resources (aggregated data) obtained from local and remote sites are used in eNANOS Resource Broker scheduling. Jobs are either executed in the local domain through Globus services or forwarded to other meta-schedulers.
4. eNANOS provides its resource data, forwards jobs and performs other operations (such as sending heartbeats) through a WS Client.
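The Connection API above pairs two meta-schedulers with complementary provider and consumer roles and uses heartbeats to judge whether the link is still healthy. The following is a minimal sketch of that idea, not the LAGrid interface: the class and function names (Role, Connection, negotiate), the single role per end, and the 30-second timeout are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Role(Enum):
    PROVIDER = "provider"  # offers resources and sends out resource information
    CONSUMER = "consumer"  # uses provider resources and routes job requests to providers


@dataclass
class Connection:
    """One end of a hypothetical link between two domain meta-schedulers."""
    local_site: str
    remote_site: str
    local_role: Role
    heartbeat_timeout: float      # seconds allowed between heartbeats (assumed parameter)
    last_heartbeat: float = 0.0   # time the peer's latest heartbeat arrived
    is_open: bool = True

    def receive_heartbeat(self, now: float) -> None:
        # Record when the peer's most recent heartbeat arrived.
        self.last_heartbeat = now

    def is_healthy(self, now: float) -> bool:
        # Healthy while open and the last heartbeat is no older than the timeout.
        return self.is_open and (now - self.last_heartbeat) <= self.heartbeat_timeout

    def terminate(self) -> None:
        self.is_open = False


def negotiate(site_a: str, site_b: str, a_provides: bool,
              timeout: float = 30.0) -> Tuple[Connection, Connection]:
    """Build both ends of a connection with complementary provider/consumer roles."""
    role_a = Role.PROVIDER if a_provides else Role.CONSUMER
    role_b = Role.CONSUMER if a_provides else Role.PROVIDER
    return (Connection(site_a, site_b, role_a, timeout),
            Connection(site_b, site_a, role_b, timeout))


if __name__ == "__main__":
    fiu, bsc = negotiate("FIU", "BSC", a_provides=True)
    bsc.receive_heartbeat(now=100.0)
    print(bsc.is_healthy(now=120.0))  # True: last heartbeat 20 s ago
    print(bsc.is_healthy(now=140.0))  # False: 40 s exceeds the 30 s timeout
```

Treating a missed heartbeat as "unhealthy" rather than immediately closing the connection leaves the decision to terminate or re-route to the meta-scheduler's own policy, which fits the site-autonomy principle of the protocol.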
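The Resource exchange API says the exchanged information can be a complete or an incremental set of data. One common way to support both is for the consumer to keep a versioned copy of each provider's resources: a complete update replaces it, and an incremental update is applied only if it directly follows the last version seen. This is a sketch under assumptions, not the project's wire format: the version numbers, the "free CPUs" metric, the use of None to mean "resource withdrawn", and the class name RemoteResourceView are all hypothetical, and the CPU counts in the usage lines are invented.

```python
from typing import Dict, Optional


class RemoteResourceView:
    """Hypothetical consumer-side copy of the resources one provider advertises.

    Maps a resource name to its number of free CPUs (an assumed capacity metric).
    """

    def __init__(self) -> None:
        self.resources: Dict[str, int] = {}
        self.version = 0  # sequence number of the last update applied

    def apply_complete(self, version: int, snapshot: Dict[str, int]) -> None:
        # A complete update replaces the whole view, whatever state it was in.
        self.resources = dict(snapshot)
        self.version = version

    def apply_incremental(self, version: int,
                          changes: Dict[str, Optional[int]]) -> bool:
        # A delta is only valid on top of the immediately preceding version.
        if version != self.version + 1:
            return False  # gap detected: the consumer should ask for a complete update
        for name, free in changes.items():
            if free is None:
                self.resources.pop(name, None)  # resource withdrawn by the provider
            else:
                self.resources[name] = free     # new or changed resource
        self.version = version
        return True


if __name__ == "__main__":
    view = RemoteResourceView()
    view.apply_complete(1, {"GCB": 16, "LAGrid": 8})
    print(view.apply_incremental(2, {"GCB": 4, "LAGrid": None}))  # True
    print(view.resources)                                       # {'GCB': 4}
    print(view.apply_incremental(4, {"GCB": 0}))                # False: update 3 was missed
```

Incremental updates keep periodic traffic small, while the version check gives the consumer a simple signal for when it must fall back to requesting a complete set.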
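Steps 2, 3 and 5 of the FIU flow describe the GSM's core decision: ask the RM for resources, run the job locally through the SSM if it fits, and otherwise send it to a remote peer through the WS Client. The following sketch reduces that decision to plain Python under stated assumptions: a job needs only a CPU count, the RM is a hypothetical in-memory table of free CPUs, and "pick the peer with the most free CPUs" is an illustrative policy chosen here, not the one the project uses. The GridWay/Globus submission and the WS call are represented only by the returned destination.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    cpus: int  # CPUs requested: a deliberately simplified job requirement


class ResourceManager:
    """Hypothetical stand-in for the RM: free CPUs on the local site and on each remote peer."""

    def __init__(self, local_free: int, remote_free: Dict[str, int]) -> None:
        self.local_free = local_free
        self.remote_free = dict(remote_free)

    def remote_candidates(self, cpus: int) -> List[str]:
        # Peers whose advertised free capacity fits the job, largest capacity first.
        fits = [peer for peer, free in self.remote_free.items() if free >= cpus]
        return sorted(fits, key=lambda peer: self.remote_free[peer], reverse=True)


def route(job: Job, rm: ResourceManager) -> Optional[str]:
    """GSM decision for one job: 'local', a peer name, or None if nothing fits."""
    if rm.local_free >= job.cpus:
        rm.local_free -= job.cpus
        return "local"              # step 3: hand the job to the SSM
    candidates = rm.remote_candidates(job.cpus)
    if candidates:
        peer = candidates[0]
        rm.remote_free[peer] -= job.cpus
        return peer                 # step 5: send the job to this peer via the WS Client
    return None                     # no capacity anywhere: keep the job queued


if __name__ == "__main__":
    rm = ResourceManager(local_free=4, remote_free={"IBM": 8, "BSC": 32})
    print(route(Job("j1", 2), rm))   # local
    print(route(Job("j2", 4), rm))   # BSC (largest advertised capacity)
    print(route(Job("j3", 64), rm))  # None
```

Preferring the local site first mirrors the order of steps 3 and 5; any other policy named under Technology Benefits (capability based, response-time based) would replace the sort key in remote_candidates.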
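The Job management API lists three operations across schedulers: submit, re-route and monitor. A consumer meta-scheduler needs some per-job record to support all three, tracking which scheduler currently holds the job and what has happened to it so far. The sketch below assumes such a record; the class name TrackedJob, the status strings and the rule that finished jobs cannot be re-routed are hypothetical choices for illustration, not states defined by LAGrid.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

FINAL_STATES = ("DONE", "FAILED")


@dataclass
class TrackedJob:
    """Hypothetical record a meta-scheduler keeps to monitor one job across schedulers."""
    job_id: str
    scheduler: str = ""
    status: str = "NEW"
    history: List[Tuple[str, str]] = field(default_factory=list)  # (scheduler, status) events

    def _record(self, scheduler: str, status: str) -> None:
        self.scheduler, self.status = scheduler, status
        self.history.append((scheduler, status))

    def submit(self, scheduler: str) -> None:
        self._record(scheduler, "SUBMITTED")

    def reroute(self, new_scheduler: str) -> None:
        # A finished job cannot be moved; anything else may go to another scheduler.
        if self.status in FINAL_STATES:
            raise ValueError(f"job {self.job_id} has already finished")
        self._record(new_scheduler, "REROUTED")

    def update(self, status: str) -> None:
        # Status report (e.g. RUNNING, DONE, FAILED) from the scheduler holding the job.
        self._record(self.scheduler, status)


if __name__ == "__main__":
    job = TrackedJob("j42")
    job.submit("IBM")
    job.reroute("BSC")   # e.g. heartbeats from the IBM connection stopped arriving
    job.update("RUNNING")
    job.update("DONE")
    print(job.history)
```

Keeping the full history, rather than only the current scheduler, lets the submitting site report where a job actually ran even after it has been re-routed one or more times.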