A Simple Mass Storage System for the SRB Data Grid



Michael Wan, Arcot Rajasekar, Reagan Moore, Phil Andrews

San Diego Supercomputer Center,

University of California at San Diego

(mwan,sekar,moore,andrews)@sdsc.edu



Abstract:

The functionality that is provided by Mass Storage Systems can be implemented using data grid technology. Data grids already provide many of the required features, including a logical name space and a storage repository abstraction. We demonstrate how management of tape resources can be integrated into data grids. The resulting infrastructure has the ability to manage archival storage of digital entities on tape or other media, while maintaining copies on distributed, remote disk caches that can be accessed through advanced discovery mechanisms. Data grids provide additional levels of data management, including the ability to aggregate data into containers before storage on tape, and the ability to migrate collections across a hierarchy of storage devices.

1. Introduction

The SDSC Storage Resource Broker (SRB) [1, 2, 3, 4] is data grid middleware that provides a storage repository abstraction for transparent access to multiple types of storage resources. The SRB has been used to implement data grids (to integrate access to data distributed across multiple resources), digital libraries (to support collection-based management of distributed data), and persistent archives (to manage technology evolution). The Storage Resource Broker is in widespread use, supporting collections that have five million images replicated across multiple HPSS archives, and data grids that span the continent. Persistent archives are now being implemented using the Storage Resource Broker in support of the National Archives and Records Administration, and the National Science Foundation National Science Digital Library.

The capabilities provided by the SRB represent a unique integration of data grids, digital libraries, and persistent archives. The mechanisms that were required to integrate these three types of data handling systems turn out to be uniquely suited to the implementation of a mass storage system. The name space is managed by the digital library technology, distributed physical storage resources are managed by the data grid technology, and technology evolution is managed through the persistent archive storage repository abstractions.

The SRB is implemented as a federated client-server system, with each server managing/brokering a set of storage resources. Storage resources that are brokered by the SRB include Mass Storage Systems (MSS) such as HPSS [5], UniTree [6], DMF [7] and ADSM [8], as well as file systems. What is of great interest is that the SRB data grid can be used to implement all of the capabilities of a distributed Mass Storage System, while providing access to data stored in file systems, databases, object ring buffers, and other types of storage systems. A Mass Storage System based upon data grid technology can be implemented using virtually any type of storage device.

The motivations for implementing an MSS in a data grid are:

• Cost of licensing - Not all of our users can afford the licensing fees of commercial MSS systems. By implementing the Mass Storage System functionality within a data grid, a common software system can be used for resource federation and data sharing, as well as for data archiving.
• Efficiency and performance - An MSS that is tightly integrated with the infrastructure of the SRB data grid can take full advantage of SRB features such as file replication, server directed parallel I/O, latency management, data-based access controls, and collection based data management.
• Elimination of duplication of features - The SRB and MSS systems such as HPSS duplicate some features. For example, HPSS has its own disk cache that is used as a front end to a tape system. The SRB can be configured to provide a cache that serves as a front end to an HPSS resource. Since the SRB has no knowledge of the operational characteristics of the HPSS cache, it may not be able to effectively manage its own cache utilization in conjunction with the HPSS cache management. With the integration of Mass Storage System capabilities into the SRB, a single large cache pool can be used. Another example is that most MSS systems have their own authentication schemes in addition to the SRB authentication system. A single authentication system can be used if their capabilities are integrated. The resulting system is easier to administer.
• SRB already has many of the required capabilities - The features needed for a simple MSS include a name space, a storage repository abstraction, storage resource naming and data management tools. For example, the SRB Metadata Catalog (MCAT) [9] maintains a POSIX-like logical name space that eliminates the need to design a name server from scratch. Since the MCAT namespace is based on commercial database technology, the transaction performance is substantially better than that of current archives. Other reusable features will be discussed later. Leveraging these reusable features greatly reduces the effort required to implement a simple MSS in a data grid.
• Finally, an MSS based on data grids can provide a storage system that spans remote caches and distributed archival devices interconnected by a Wide Area Network. Such a logical linking of distributed devices will provide new ways for data sharing and fault tolerance not currently provided by site-located mass storage systems.

2. SRB Architecture and Features

The Storage Resource Broker (SRB) is middleware that uses distributed clients to provide uniform access to diverse storage resources. It consists of three components: the metadata catalog (MCAT) service, SRB servers for access to storage repositories, and SRB clients, connected to each other via a network.

The MCAT is implemented using a relational DBMS such as Oracle, DB2, SQL Server, PostgreSQL, or Sybase. It stores metadata associated with data sets, users and resources managed by the SRB. It maintains a POSIX-like logical name space (file names, directories and subdirectories) and provides a mapping of each logical file name to a set of physical attributes and a physical handle for data access. The physical attributes include the host name and the type of resource (UNIX file system, HPSS archive, object ring buffer, database). The physical handle for data access is the file path for UNIX file system type resources.
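As a rough illustration of the logical-to-physical mapping described in this section, the following Python sketch resolves a logical file name to a host, a resource type, and a physical handle. It is a toy in-memory stand-in and not the MCAT itself, which is implemented on a relational DBMS. The names ToyCatalog, PhysicalLocation and ResourceType, as well as the example host and paths, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    # Resource types named in the text; the enum itself is illustrative.
    UNIX_FILE_SYSTEM = "UNIX file system"
    HPSS_ARCHIVE = "HPSS archive"
    OBJECT_RING_BUFFER = "object ring buffer"
    DATABASE = "database"


@dataclass(frozen=True)
class PhysicalLocation:
    host: str                    # host name of the storage resource
    resource_type: ResourceType  # type of resource holding the data
    handle: str                  # file path for UNIX file system resources


class ToyCatalog:
    """In-memory stand-in for a catalog keyed by logical path (assumption)."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_path, location):
        self._entries[logical_path] = location

    def resolve(self, logical_path):
        # Translate a logical name into its physical description.
        return self._entries[logical_path]

    def list_collection(self, collection):
        # Return every logical path stored anywhere below the collection.
        prefix = collection.rstrip("/") + "/"
        return sorted(p for p in self._entries if p.startswith(prefix))


catalog = ToyCatalog()
catalog.register(
    "/home/demo/images/a.dat",
    PhysicalLocation("host1.example.org", ResourceType.UNIX_FILE_SYSTEM,
                     "/vault/0001/a.dat"),
)
location = catalog.resolve("/home/demo/images/a.dat")
print(location.host, location.resource_type.value, location.handle)
```

The point of the sketch is only that clients work with logical names, while the host, resource type and handle can change without the logical name changing.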

The MCAT server handles requests from the SRB servers. These requests include information queries as well as instructions for metadata creation and update.

The MCAT imposes additional mappings on the logical name space to support replication (one logical name mapped to multiple physical file names), soft links (a logical name mapped to another logical name), aggregation (structural mapping of a file to a location in a container), segmentation (structural mapping of a file across multiple tape media), and file-based access control (users mapped to permissions on roles for each digital entity). These mappings make it possible to organize digital entities independently of their actual storage location.

The SRB is implemented as a federated server system. Each server consists of three layers. The top level "communication and dispatcher" layer listens for incoming client requests and dispatches the requests to the proper request handlers.

The middle layer is the logical layer or the "high-level API handler" layer. This layer handles requests in which all input parameters are given in terms of their logical representations (e.g., logical path name in the logical name space, logical resource name, logical user, etc.). Upon receiving a request, the logical layer handler queries the MCAT and translates the logical input parameters into their physical representations. It then calls upon the appropriate handler in the physical layer to perform the actual data access and movement.

The physical layer or the "low-level API handler" layer handles data access and data movement requests from its own logical layer or directly from the physical layer of other SRB servers. This layer basically consists of driver functions for the 16 most commonly used POSIX I/O functions: create, open, close, unlink, read, write, seek, sync, stat, fstat, mkdir, rmdir, chmod, opendir, closedir, and readdir. If the handler cannot handle the request locally, it will forward the request to the server that can respond.

3. Simple MSS Design

The design goals for a simple Mass Storage System are:

• Provide a distributed farm of disk cache resources backed by a tape library system. The cache system should be configured to contain any number of distributed cache resources that may or may not be on the same host as the tape system. This makes it possible to treat the disk cache as an independent level of the storage hierarchy, with the disk cache created "near" the end user.
• Provide a tape library system to control the mounting and dismounting of tapes. At a minimum, a Storage Tech silo running ACSLS software should be supported.
• Provide a uniform access mechanism for data stored on the Mass Storage System or on disk caches. A file in the logical name space stored in the MSS resource should appear and behave the same as any other files stored on other resources. The physical location (on cache or tape) of the file should be totally transparent to users.
• Files should always be staged automatically to cache before any I/O operations are done. Tools for system administrators to manage the cache system are also needed, i.e., tools to synchronize files from cache to tape and purge files on cache.

• Large files should be stored in segments. The advantages of using segmented files are: the system can handle files of very large size; and parallel data transfer between tapes and the cache system can be implemented. In our first release, although large files are stored in segments on tapes, parallel transfer between tapes and cache has not been implemented. Data transfer between the cache system and clients can be done in parallel using existing SRB infrastructure.

Based on the above design goals, the following software components are needed to build the MSS:

1. A client-server architecture with an authentication scheme appropriate for access across administration domains and a framework for exchange of information between clients and servers that can function over Wide Area Networks.
2. A federated server system that allows cache and tape resources to be located on different hosts.
3. A metadata server that maintains a logical POSIX-like name space and provides a mapping of each logical file name to its physical location.
4. Additional metadata and server functions that allow files stored in the MSS resource to appear and behave the same as any other files stored on other resources.
5. Storage resource servers that have the following capabilities:
   a. Ability to translate user requests to physical actions using metadata information maintained in the MCAT catalog.
   b. A set of driver functions for basic tape I/O operations.
   c. A set of driver functions for basic cache I/O operations.
   d. Functions to stage files from tape to cache and dump files from cache to tapes. The tape and cache resources can be distributed.
   e. Support for data transfer between the cache system and clients.
6. A tape library server whose primary function is to schedule and perform the mounting and dismounting of tapes.
7. A tape database that tracks the usage of all tapes controlled by the MSS.

4. Implementation

The SRB framework version 1.1.8 initially provided the functionalities listed above, except for the metadata needed to manage files on the MSS, the drivers for basic tape I/O functions, functions to stage files from tape to cache, and the tape library server and tape database.

A major innovation that was needed to implement an MSS within the SRB data grid was the development of a new compound resource type as a fundamental resource type within the SRB/MCAT system. Files written to a compound resource are treated as residing on a single resource. In order to allow files stored in the MSS resource to appear and behave the same as files stored on other resources, support is needed for compound digital entities. The file that is written to a compound resource can be migrated between the cache and the tape back-end within the compound document, without requiring separate metadata attributes to describe the separate residency of the file on either component of the compound resource.

A compound resource contains multiple cache resources for a given tape resource (each of which is called an internal compound resource).
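The compound resource idea can be sketched as a small data structure: one tape resource, several cache resources, and a single record per file whose residency attribute names the internal resource that currently holds the data. This is only an illustrative model under stated assumptions and does not reflect the actual SRB/MCAT schema. The class names, field names and resource names below are hypothetical, and no data is actually moved.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CompoundResource:
    name: str
    tape_resource: str          # the tape back-end
    cache_resources: List[str]  # one or more cache front-ends

    def internal_resources(self):
        # All internal resources that make up the compound resource.
        return [self.tape_resource] + list(self.cache_resources)


@dataclass
class CompoundEntity:
    logical_path: str
    resource: CompoundResource
    residency: str  # internal resource currently holding the data


def migrate(entity, target):
    """Record a move within the compound resource (metadata only)."""
    if target not in entity.resource.internal_resources():
        raise ValueError(f"{target} is not part of {entity.resource.name}")
    entity.residency = target


mss = CompoundResource("mss-demo", "tape0", ["cache-a", "cache-b"])
entity = CompoundEntity("/home/demo/run1.dat", mss, residency="cache-a")
migrate(entity, "tape0")
print(entity.logical_path, "now resides on", entity.residency)
```

Because there is one record with one residency attribute, the logical path seen by users stays the same regardless of whether the file is on cache or tape.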

When a user creates a file using a compound resource, the object created is tagged as a compound digital entity. With the help of MCAT, the server then drills down through the compound resource and discovers all of the internal resources. It selects the cache resource where the digital entity will be stored initially. After the file operation is completed (with a "close" call), the metadata of the just created compound digital entity is updated with the physical description of the file pointing to the digital entity created in the cache resource.

A compound digital entity is treated as any other digital entity for most operations in SRB, except when the digital entity is opened for read or write. In this case, the server will check to see if the digital entity is already in a cache. If it is not, the digital entity will be staged on one of the cache resources configured in the compound resource. If the digital entity is changed, the dirty bit of the cached digital entity is set. The dirty copy is not automatically synchronized to tape. Synchronization is only done via requests by system administrators. A "dump tape" API and command have been created to allow system administrators to manage the cache system by synchronizing files in the cache system onto tape, and then purging the files from the cache system.

The ability to manipulate data that is stored on tape requires additional capabilities beyond those required for access to data on disk. A set of driver functions for basic tape I/O operations has been defined and incorporated into the SRB server. These functions include mount, dismount, open, close, read, write, seek, etc. Currently, the driver has only been tested for 3590 tape drives.

A tape library server for the STK silo running ACSLS software has been incorporated into the SRB system. Its primary function is to schedule and perform the mounting and dismounting of tapes. It uses the same authentication system and server framework as other SRB servers. Currently, tape mounts are handled on a first-come-first-served basis. Some amount of intelligence is built in such that when a client is done using a tape and issues a dismount request, the tape will not actually be dismounted if there is another request for the mounting of the same tape in the queue. In this case, the server will just pass the tape to the new request. Specialized queuing features can be implemented as needed.

A database schema that tracks the usage of all tapes controlled by the MSS has been incorporated in the MCAT. The schema is used to track the tape position, total bytes written, full flag, etc., for all tapes controlled by the MSS. A set of system utilities has been created for tape labeling and tape metadata ingestion, listing of tape metadata and modification of tape metadata. By managing these attributes in the MCAT catalog, it is possible to support sophisticated queries against the tape attributes and against the attributes of the digital entities within the MSS. One can readily determine all of the files resident on a given tape, identify all tapes that are filled beyond a given level, and identify all tapes that are needed to retrieve all digital entities within a logical sub-collection in the metadata catalog.

5. Comparisons with the IEEE Mass Storage System Reference Model

The SRB MSS provides similar functionality to the IEEE MSS Reference Model [10]. Compared with implementations of the Reference Model, there are both similarities and differences. A major difference stems from the fact that the SRB MSS uses the underlying File System of the operating system for managing data storage instead of the mapping of bitfiles to the logical and physical volume abstractions used in the Reference Model.

The use of a File System greatly simplifies the design of the storage manager. This approach is reasonable given that the performance of modern File Systems is quite good. Another source of difference is that the SRB MSS integrates the functionality of several servers of the Reference Model into a single server. This greatly simplifies the architecture of the SRB MSS. Improved robustness and performance of the system are achieved at the expense of modularity. The design provides modular interfaces to support addition of new storage repositories and new access APIs, while aggregating all metadata into a single database.

The major components of the Reference Model include:

1. Name Server - provides a POSIX-like name space, a mapping of logical names to bitfile IDs and access control (ACL) for the name space objects.
2. Bitfile Server - provides the abstraction of logical bitfiles to its clients and handles the logical aspects of the storage and retrieval of bitfiles.
3. Storage Server - handles the physical aspect of bitfile storage and retrieval. It translates references to storage segments into references to virtual volumes and into physical volume references.
4. Mover - transfers data from a source device to a sink device.
5. Migration-Purge server - provides storage management by migrating bitfiles on disks to tapes.

The SRB MCAT server, which maintains a POSIX-like logical name space, is equivalent to the Name Server of the Reference Model. The only difference is that each SRB digital entity is mapped directly to a set of physical attributes rather than to a logical bitfile as in the Reference Model. Because of the direct mapping, the functionality of translating from logical bitfiles into physical volume references is not needed. The rest of the functionalities of the Bitfile Server, Storage Server and Mover of the Reference Model are combined into a single SRB Resource server. For tape resources, the SRB uses a tape library server for mounting and dismounting of tapes. As for the Migration-Purge server of the Reference Model, SRB has an API and a system utility that migrate files on cache to tape resources.

6. Usage Examples

The use of data grid technology to implement a Mass Storage System makes it possible to incorporate latency management capabilities directly into the architecture. For access to data distributed across multiple resources, the finite speed of light can severely limit sustainable transaction rates if the transactions are issued one by one. The SRB data grid provides multiple mechanisms to minimize the number of messages that are sent over a wide area network, including the aggregation of data into containers, the aggregation of metadata into an XML file, and the aggregation of I/O commands through the use of remote proxies.

For a Mass Storage System, the ability to aggregate data into containers is essential for achieving high performance when managing small digital entities. When the size of a digital entity is less than the tape access bandwidth multiplied by the tape latency, it becomes cost effective to work with containers of files. The size of the container is adjusted such that the retrieval time of two digital entities from the same container is shorter than that of retrieving the two files directly from tape.
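The container sizing rule can be made concrete with a small calculation. The sketch below assumes a simple cost model that the text does not spell out: every tape access costs a fixed latency plus size divided by bandwidth, and a container is read from tape as a whole. Under that assumption, one container read beats two direct reads when latency + C/B < 2 * (latency + s/B), that is, when C < B * latency + 2 * s. The function names and the numbers are illustrative only.

```python
def direct_retrieval_time(entity_bytes, bandwidth, latency):
    # One tape access for a single entity: fixed latency plus transfer time.
    return latency + entity_bytes / bandwidth


def container_retrieval_time(container_bytes, bandwidth, latency):
    # One tape access that reads the whole container (assumed model).
    return latency + container_bytes / bandwidth


def latency_dominated(entity_bytes, bandwidth, latency):
    # The criterion from the text: entity size below bandwidth x latency.
    return entity_bytes < bandwidth * latency


def max_container_bytes(entity_bytes, bandwidth, latency):
    # Largest container for which one container read is still faster than
    # two direct reads: C < B * latency + 2 * s.
    return bandwidth * latency + 2 * entity_bytes


BANDWIDTH = 10_000_000  # bytes per second (assumed)
LATENCY = 30            # seconds per tape access (assumed)
ENTITY = 1_000_000      # bytes per digital entity (assumed)

two_direct = 2 * direct_retrieval_time(ENTITY, BANDWIDTH, LATENCY)
one_container = container_retrieval_time(200_000_000, BANDWIDTH, LATENCY)

print("latency dominated:", latency_dominated(ENTITY, BANDWIDTH, LATENCY))
print("container size limit (bytes):",
      max_container_bytes(ENTITY, BANDWIDTH, LATENCY))
print("two direct reads (s):", two_direct)
print("one 200 MB container read (s):", one_container)
```

With these assumed numbers a 1 MB entity is far below the bandwidth-latency product of 300 MB, so a container of a few hundred megabytes still retrieves two entities faster than two separate tape accesses.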

The usage scenario that illustrates the generality of the data grid based mass storage system is to consider the storage of a container on a mixture of caches, archives, and compound resources. This scenario requires the use of five different mappings on the logical name space:

• Mapping from the logical file name to a location within a container
• Mapping of the container to one of several replicas
• Mapping of a logical resource name to a physical resource name
• Mapping of access control lists between the user name and the requested file
• Mapping of a compound digital entity to its location in a compound resource

The SRB provides the ability to organize physical resources by a logical resource name. Writing to the logical resource name then results in the replication of the file across all of the physical resources. Separate metadata attributes are maintained for each replica of the file. The SRB also supports the aggregation of files into a container. Containers are manipulated on disk caches, and then written to an archive. A primary disk cache can be identified with multiple secondary disk caches. A primary archive can be identified with multiple secondary archives. When a primary resource is not available, the SRB will then complete the operation to the secondary resource. Note that containers can be replicated onto multiple storage repositories.

The SRB also supports compound resources composed of a cache and either a tape or archive. Writing a file to a compound resource results in the creation of a single set of metadata, with an attribute used to specify which component of the compound resource holds the data. The interesting management scenario is the creation of a logical resource name that includes a compound resource, a primary disk cache, secondary disk caches, a primary archive, and secondary archives. Writing a container to this logical resource then results in the creation of a replica of the container in the compound resource (disk cache and backend tape), and the creation of a replica on one of the disk caches. A synchronization command will cause the replica on the disk cache to be copied onto one of the archives.

The ability of data grids to support sophisticated resource management functions on top of distributed storage repositories makes it possible to greatly increase the number of options when archiving data. An example is the implementation of alternate completion scenarios, in which a file is assumed to be archived when it is written to "k" of the "n" physical resources specified by a logical resource name. Another example is the implementation of load balancing, by the writing of digital entities in turn to a list of resources specified by the logical resource name. The ability to replicate files across trees of storage resource options instead of the traditional simple storage hierarchy, greatly increases the ability to manage archival copies of data.

A second data grid capability that greatly enhances mass storage systems is the organization of the digital entities as a hierarchical collection. It is possible to use digital library discovery mechanisms to identify relevant files within the mass storage system. The discovery mechanisms can be exercised through interactive web interfaces, or directly from applications through C library calls.

A third data grid capability that simplifies management of mass storage systems is the association of access controls with the data, rather than the storage system. This makes it possible to include sites across administration domains within the mass storage system, while simplifying administration of the system.

Data grids support collection-owned files, in which access to the files is restricted to the collection. Users authenticate themselves to the collection. The collection uses access control lists that are specified separately for each registered digital entity to determine whether a person is authorized to access a file. The collection in turn authenticates itself to the remote storage system.

A fourth data grid capability that simplifies incorporation of new technology into the mass storage system is the use of a storage repository abstraction that defines the set of operations that will be performed when accessing and manipulating digital entities. The storage repository abstraction makes it possible to write drivers for each type of storage system, without having to modify any of the higher software levels of the storage environment. The storage repository abstraction is also used to support dynamic addition of resources to the system.

7. Conclusions

Because of the cost of licensing, access efficiency and transaction performance issues, we have implemented a simple MSS system in the Storage Resource Broker data grid. We were able to accomplish this task within a relatively short time (slightly over 6 man months) because we were able to leverage existing capabilities within the SRB infrastructure. We believe this approach will radically change how archives are constructed. The ability to manage replicas of data on low cost storage media is as simple as making a replica of a digital entity on a disk cache. The ability to discover, access, and manipulate digital entities stored on tape media can now be done through the same sophisticated interfaces that data grids provide for access to all types of storage systems.

8. Acknowledgements

This research has been sponsored by the Data Intensive Computing thrust area of the National Science Foundation project ASC 96-19020 "National Partnership for Advanced Computational Infrastructure," the NSF National Science Digital Library, the NARA supplement to the NSF NPACI program, the NSF National Virtual Observatory, and the DOE ASCI Data Visualization Corridor.



9. References



1. Baru, C., R. Moore, A. Rajasekar, M. Wan, (1998) "The SDSC Storage Resource Broker," Proc. CASCON 98 Conference, Nov. 30 - Dec. 3, 1998, Toronto, Canada.
2. SRB, (2001) "Storage Resource Broker, Version 1.1.8", SDSC (http://www.npaci.edu/dice/srb).
3. Rajasekar, A., and M. Wan, (2002), "SRB & SRBRack - Components of a Virtual Data Grid Architecture", Advanced Simulation Technologies Conference (ASTC02), San Diego, April 15-17, 2002.
4. Rajasekar, A., M. Wan, and R. Moore, (2002), "MySRB & SRB - Components of a Data Grid," The 11th International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 24-26, 2002.
5. HPSS, High Performance Storage System, http://www4.clearlake.ibm.com/hpss/index.jsp.
6. UniTree, http://www.unitree.com.
7. DMF, Data Migration Facility, http://136.162.32.160/products/software/dmf.html.
8. ADSM, ADSTAR Distributed Storage Management, http://searchstorage.techtarget.com/sDefinition/0,,sid5_gci214398,00.html.
9. MCAT, (2000) "MCAT: Metadata Catalog", SDSC (http://www.npaci.edu/dice/srb/mcat.html).
10. Sam Coleman and Steve Miller, "Mass Storage System Reference Model: Version 4", Goddard Conference on Mass Storage Systems and Technologies, Volume 1, 1992.


