What is SAM-Grid

Shared by: jackshepherd
posted: 10/31/2008
What is SAM-Grid?

- Job Handling
- Data Handling
- Monitoring and Information

Problems To Solve

- How can a large, geographically distributed, dynamic, physics collaboration work together?
- How can this collaboration make use of available distributed computing resources?
- How can it handle the huge amount of data (PBs) generated by the experiment?

Answers: The GRID & SAM-Grid

- GRID: A network of middleware services that tie together distributed resources (Fabric: processors, storage).
- SAM-Grid: Integrate the standard middleware to achieve a complete Job, Data, and Information management infrastructure, thereby enabling fully distributed computing.

SAM-Grid Architecture

[Figure: SAM-Grid architecture diagram]

Job Management

- Grid-level (global) job scheduling (selection of a cluster to run) is distinguished from local scheduling (distribution of the job within the cluster).
- We distinguish structured jobs from unstructured jobs.
  - Structured jobs have their details known to the Grid middleware.
  - Unstructured jobs are mapped as a whole onto a cluster.
- For data-intensive jobs, sites are ranked by the amount of data cached at the site.
  - The scheduler is interfaced with the data handling system.

Job Handling

[Figure: job handling diagram. Labels: User Interface, Submission Service, Information Collector, Match Making Service, Resource Selection (external algorithm), JOB, Execution Site #1 through Execution Site #n, Grid/Fabric Interface, Generic Service, Computing Element, Grid Sensors]

Data Handling - SAM

- SAM is a distributed data movement and management service.
- SAM stations are resources pooled together to enable data management.
- Data replication is achieved by the use of disk caches during file routing.
- SAM is a fully functional metadata catalog.
- A station can access a remote resource via the services offered by other connected stations.

[Figure: Control Flow and Data Flow among Local Station 1, Local Station 2, and Remote Stations, their caches (Cache1, Cache2), and MSS1 and MSS2. MSS: Mass Storage System]

Data Handling services

[Figure: data handling services diagram. Labels: Global Resource Manager(s), Database Server(s) (Central Database), Name Server, Log server, Station 1 Servers, Station 2 Servers, Station 3 Servers, Station n Servers, Mass Storage System(s). Groupings are labeled Shared Globally, Local To Site, and Shared Locally. Arrows indicate control and data flow.]

Monitoring and Information

This includes:

- configuration framework
- resource description for job brokering
- infrastructure for monitoring

Main features:

- Sites (resources), services and jobs monitoring
- Distributed knowledge about jobs etc.
- Incremental knowledge building
- Grid Monitoring Architecture for current state inquiries, Logging for recent history studies
- All Web based

Monitoring and Information

[Figure: monitoring diagram. Labels: Web Browsers, Web Server 1 through Web Server N, Site 1, Site 2, and Site N Information Systems, IP]

Challenges with Grid/Fabric Interface

The Globus toolkit Grid/Fabric interfaces are not sufficiently…

- …flexible: they expect a “standard” batch system configuration.
- …scalable: a process per grid job is started up at the gateway machine. We want/need aggregation.
- …comprehensive: they interface to the batch system only. How about data handling, local monitoring, databases, etc.?
- …robust: if the batch system forgets about the jobs, they cannot react.

Flexibility

- Addressing the peculiarity of the configuration of each batch system requires modification to the Globus toolkit job-manager.
- We address the problem by writing job-managers that use a level of abstraction on top of the batch systems.
- Each batch system adapter can be locally configured to conform to the local batch system interface.

Scalability

- The Globus gatekeeper starts up a process at the gateway node for every job entering the site.
- This limits the number of grid jobs at a site to around 300, for the typical commodity computer.
- We split single grid jobs into multiple batch processes in the SAM-Grid job-managers. Not only does this increase scalability, but it also increases the manageability of the job.

Comprehensiveness

- The standard job-managers interface only to the local batch system.
- We notify other fabric services when a job enters a site:
  - Data handling: for data pre-staging
  - Monitoring: to monitor a non-running job
  - Database: to aggregate queries

Robustness

- The standard job-managers cannot react to temporary failures of the local batch systems.
- In our experience, PBS, Condor and BQS have failed to report the status of a job.
- We write wrappers around the batch systems. These wrappers implement extra robustness. We call them “idealizers”.
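
The slides do not say how an idealizer is implemented, so the following is only a minimal sketch of one way such a wrapper could behave. It assumes the wrapper retries a failed status query a few times and, if the batch system still does not answer, falls back to the last state it saw for that job. The names `Idealizer`, `BatchQueryError`, `make_flaky_query`, and the state strings are hypothetical and do not come from SAM-Grid, Globus, PBS, Condor, or BQS.

```python
import time
from typing import Callable, Dict, List, Optional, Union


class BatchQueryError(Exception):
    """Hypothetical error raised when a batch status query fails."""


class Idealizer:
    """Wraps a batch status query and hides temporary failures (assumed design)."""

    def __init__(
        self,
        query_status: Callable[[str], Optional[str]],
        retries: int = 3,
        delay: float = 0.0,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._query_status = query_status
        self._retries = retries
        self._delay = delay
        self._sleep = sleep
        # Last state successfully reported for each job id.
        self._last_known: Dict[str, str] = {}

    def status(self, job_id: str) -> str:
        for attempt in range(self._retries):
            try:
                state = self._query_status(job_id)
            except BatchQueryError:
                # Treat a failed query like a missing answer.
                state = None
            if state is not None:
                self._last_known[job_id] = state
                return state
            if attempt < self._retries - 1:
                self._sleep(self._delay)
        # The batch system "forgot" the job: report what we last knew.
        return self._last_known.get(job_id, "UNKNOWN")


def make_flaky_query(
    responses: List[Union[str, None, Exception]]
) -> Callable[[str], Optional[str]]:
    """Builds a fake batch query that replays scripted answers (for the demo)."""
    scripted = iter(responses)

    def query(job_id: str) -> Optional[str]:
        answer = next(scripted)
        if isinstance(answer, Exception):
            raise answer
        return answer

    return query


if __name__ == "__main__":
    query = make_flaky_query([BatchQueryError("timeout"), None, "RUNNING"])
    idealizer = Idealizer(query, retries=3, delay=0.0)
    print(idealizer.status("job-1"))  # prints RUNNING
```

In this sketch the job-manager would call `Idealizer.status` instead of querying the batch system directly, so a temporary reporting failure does not immediately surface as a lost job. A real wrapper would also need a policy for how long a stale state may be trusted, which the slides do not specify.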
