LegionFS: A Secure and Scalable File System Supporting Cross-Domain High-Performance Applications

Brian S. White, Michael Walker, Marty Humphrey, Andrew S. Grimshaw
Department of Computer Science
University of Virginia
Charlottesville, VA 22903

ABSTRACT

Realizing that current file systems cannot cope with the diverse requirements of wide-area collaborations, researchers have developed data access facilities to meet their needs. Recent work has focused on comprehensive data access architectures. In order to fulfill the evolving requirements in this environment, we suggest a more fully-integrated architecture built upon the fundamental tenets of naming, security, scalability, extensibility, and adaptability. These form the underpinning of the Legion File System (LegionFS). This paper motivates the need for these requirements and presents benchmarks that highlight the scalability of LegionFS. LegionFS aggregate throughput follows the linear growth of the network, yielding an aggregate read bandwidth of 193.8 MB/s on a 100 Mbps Ethernet backplane with 50 simultaneous readers. The serverless architecture of LegionFS is shown to benefit important scientific applications, such as those accessing the Protein Data Bank, within both local- and wide-area environments.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SC2001 November 2001, Denver. © 2001 ACM 1-58113-293-X/01/0011 $5.00

1. INTRODUCTION

Emerging wide-area collaborations are rapidly causing the manner and mechanisms in which files are stored, retrieved, and accessed to be re-evaluated. New, inexpensive storage technology is making terabyte and petabyte weather data stores feasible. Such data should be accessible physically close to the place of origin and by clients around the world. Companies are seeking mechanisms to share data without compromising the proprietary information of any involved site. Increasingly, clients desire the file system to dynamically adapt to varying connectivity, security, and latency requirements.

Accommodating the varied and continually evolving requirements of applications existing in these domains precludes the use of file systems that impose static interfaces or fixed access semantics. Common access patterns include whole-file access to large, immutable files and strided access to numerical, scientific datasets. The latter benefits from a non-traditional interface, though neither requires typical file system precautions such as consistency guarantees. However, a file system catering only to these file access characteristics would be short-sighted, ignoring a possible requirement for additional policy, such as a stricter form of consistency. To service an environment which continues to evolve, a file system should be flexible and extensible.

Wide-area environments are fraught with insecurity and resource failure. Providing abstractions which mask such nuances is a requirement. The success of Grid and wide-area environments will be determined in no small part by their initial and primary users, domain scientists and engineers, who have little expertise in coping with the vagaries of misbehaved systems. Corporations may wish to publish large datasets via mechanisms such as TerraVision [27], while limiting access to the data. This is appropriate for collaborations that are mutually beneficial to organizations, which are, nevertheless, mutually distrusting. Such varied and dynamic security requirements are most easily captured by a security mechanism that transcends object interactions.

As resources are incorporated into wide-area environments, the likelihood of failure increases. A file system should relieve the user of coping with such failures. Approaches which require a user to explicitly name data resources in a location-dependent manner require that a user first locate the resource and later deal with any potential faults or migrations of that resource.

To address these concerns, we advocate a fully-integrated file system infrastructure. We have implemented the Legion [17] File System (LegionFS), an architecture supporting the following five tenets, which we consider fundamental to any system hoping to meet the goals delineated above:

  - Location-Independent Naming: LegionFS utilizes a three-level naming scheme that shields users from low-level resource discovery and is employed to seamlessly handle faults and object migrations.

  - Security: Each component of the file system is represented as an object. Each object is its own security domain, controlled by fine-grained Access Control Lists (ACLs). The security mechanisms can be easily configured on a per-client basis to meet the dynamic requirements of the request.

  - Scalability: Files can be distributed to any host participating in the system. This yields superior performance to monolithic solutions and addresses the goals of fault tolerance and availability.

  - Extensibility: Every object publishes an interface, which may be inherited, extended, and specialized to provide alternate policies or a novel implementation.

  - Adaptability: LegionFS maintains a rich set of system-wide metadata that is useful in tailoring an object's behavior to environmental changes.
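
The location-independent naming tenet can be made concrete with a small sketch. It follows the lookup chain that Section 2.2 describes (context name to LOID, then LOID to Object Address through a local binding cache, a Binding Agent, and finally the object's class), but it is only an illustration: the `LOID` fields, the `Binder` class, the `resolve` helper, the host names, and the path are all assumptions made for this example, not Legion's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class LOID:
    """Location-independent object identifier (fields are illustrative)."""
    class_name: str
    instance: int


# An Object Address (OA) is modelled here as a (host, port) pair.
OA = Tuple[str, int]


@dataclass
class Binder:
    """One level of the binding chain: a cache plus an optional fallback."""
    name: str
    cache: Dict[LOID, OA] = field(default_factory=dict)
    parent: Optional["Binder"] = None

    def bind(self, loid: LOID) -> OA:
        """Return the OA for loid, asking the next level on a cache miss."""
        if loid in self.cache:
            return self.cache[loid]
        if self.parent is None:
            raise KeyError(f"no binding for {loid}")
        oa = self.parent.bind(loid)
        self.cache[loid] = oa  # remember the answer at this level
        return oa


def resolve(context: Dict[str, LOID], name: str, binder: Binder) -> OA:
    """Level 1: context name -> LOID. Levels 2 and 3: LOID -> OA."""
    return binder.bind(context[name])


# Chain: local binding cache -> Binding Agent -> class object (authoritative).
class_object = Binder("class")
binding_agent = Binder("binding-agent", parent=class_object)
local_cache = Binder("local", parent=binding_agent)

loid = LOID("BasicFileObject", 7)
context = {"/home/alice/data.bin": loid}
class_object.cache[loid] = ("hostA.example.org", 4000)

first = resolve(context, "/home/alice/data.bin", local_cache)

# Migration: only the LOID -> OA binding changes; the context name does not.
class_object.cache[loid] = ("hostB.example.org", 4000)
for level in (local_cache, binding_agent):
    level.cache.pop(loid, None)  # drop stale bindings
second = resolve(context, "/home/alice/data.bin", local_cache)
```

The point of the sketch is that the user-visible name and the LOID stay fixed across the migration; only the lowest-level binding is refreshed. How real Legion objects detect and discard stale bindings is not covered here.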
Previous work targeted at wide-area, collaborative environments has successfully constructed infrastructures composed of existent, deployed internet resources. Such approaches are laudable in that they leverage valuable, legacy data repositories. However, they fail to seamlessly federate such distributed resources to achieve a unified and resilient environment. A fully-integrated architecture adopts basic mechanisms such as naming and security upon which new services are built or within which existent services are wrapped. This obviates the need for application writers and service providers to focus on tedious support structure and allows them to concentrate on realizing policies within the flexible framework provided by the mechanisms.

LegionFS provides only basic functionality and is intended to be extended to meet the performance requirements of specific environments. The core of LegionFS functionality is provided at the user-level by Legion's distributed object-based system. The file and directory abstractions of LegionFS may be accessed independently of any kernel file system implementation through libraries encapsulating Legion communication primitives. This approach provides flexibility as interfaces are not required to conform to standard UNIX system calls. A modified user-level NFS daemon, lnfsd, interposes between an NFS kernel client and the objects constituting LegionFS. This implementation provides legacy applications with seamless access to LegionFS.

This paper is organized as follows: Section 2 contains a description of the design of LegionFS, including a brief overview of the Legion wide-area operating system and an in-depth discussion of naming, security, scalability, extensibility, and adaptability. Section 3 contains a performance evaluation highlighting the advantages afforded by a scalable design. Section 4 presents an overview of related work and Section 5 concludes.

2. LEGIONFS DESIGN

Legion [17] is middleware that provides the illusion of a single virtual machine and the security to cope with its untrusted, distributed realization. From its inception, Legion was designed to deal with tens of thousands of hosts and millions of objects, a capability lacking in other object-based distributed systems. This section discusses the key areas of Legion as they apply to the design of LegionFS.

2.1 Object Model

Legion is an object-based system comprised of independent, logically address space-disjoint, active objects that communicate via remote procedure calls (RPCs). Objects represent coarse-grained resources and entities such as users, hosts, schedulers, files, and directories. Each Legion object belongs to a class, which is itself a Legion object. Much of what is usually considered system-level responsibility is delegated to user-level class objects. For instance, Legion classes are responsible for creating and locating their instances, and for selecting appropriate security and object placement policies.

Legion objects may be active or inactive, and store their internal state on disk. Objects may be migrated simply by transferring this internal state to another host. The object's class then spawns a process which is instantiated with the migrated internal state.

The complete set of method signatures exported by an object defines its interface. The Legion file abstraction is a BasicFileObject, whose methods closely resemble UNIX system calls such as read, write, and seek. ContextObjects manage the Legion name space. Due to the resource inefficiency of representing each as a stand-alone process, files and contexts residing on one host have been aggregated into container processes, called ProxyMultiObjects (Figure 1).

[Figure 1: ProxyMultiObject. Diagram labels: Address Space; Server Loop; BasicFileObject with LegionBuffer (Data), shown twice; Native File System.]

A ProxyMultiObject polls for requests and demultiplexes them to the corresponding contained file or context. Files store data in a LegionBuffer, which achieves persistence through the underlying UNIX file system. ProxyMultiObjects leverage existent file systems for data storage, providing direct access to UNIX files. Unlike traditional file servers, ProxyMultiObjects are lightweight and intended to be distributed throughout the system. They service only a portion of the name space, rather than comprising it in its entirety.

2.2 Naming

User-defined text strings called context names identify Legion objects. Context names are mapped by a directory service called context space to unique, location-independent binary names called Legion object identifiers (LOIDs). For direct object-to-object communication, LOIDs must be bound (via a binding process) to low-level Object Addresses. An Object Address (OA) represents an arbitrary communication endpoint, such as a TCP socket.

The LOID records the class of an object, its instance number, and a public key to enable encrypted communication. New LOID types can be constructed to contain additional security information (such as an X.509 certificate), location hints, and other information.

Context space is similar to a globally distributed, rooted directory. It is comprised of ContextObjects, which provide mappings from context names to LOIDs in the same fashion that directories map path names to inode numbers. Unlike directories, ContextObjects may contain references to arbitrary objects.

Having translated a context name to a LOID, an object consults a series of distributed caches to bind the LOID to an OA. Each object maintains a local binding cache. A binding cache miss results in a call to a Binding Agent object. Cache misses at the Binding Agent are serviced by the class of the LOID. This operation recurses, if necessary, but is guaranteed to terminate at LegionClass, the root of the Legion object hierarchy.

Legion's location-independent naming facilitates fault tolerance and replication. Because objects are not bound by name to individual hosts, they may be seamlessly migrated. If an object's host fails, but the internal state of an object is still accessible, the object's class may restart it elsewhere.

Classes may act as replication managers by mapping one LOID to multiple OAs, referring to objects on different hosts. A class object is a logical replication manager as its instances would likely employ the same replica consistency policies. By entrusting a class object with more responsibility, the system increases the load on that object. Means of ensuring that individual objects do not become bottlenecks are discussed in Section 2.4.

2.3 Security

Legion's distributed, extensible nature and user-level implementation prevent it from relying on a trusted code base or kernel. Furthermore, there is no concept of a superuser in Legion. Individual objects are responsible for legislating and enforcing their own security policies. The public key embedded in an object's name enables secure communication with other objects. Objects are free to negotiate the per-transaction security level on messages, such as full encryption, digital signatures, or cleartext.

When a user authenticates to Legion, currently via password, she obtains a short-lived, unforgeable credential [12] that uniquely identifies her. Authorization is determined by an Access Control List (ACL) associated with each object; an ACL enumerates the operations on an object and the associated access rights of specific principals (or groups of principals). If the signer of any credentials passed in an invocation is allowed to perform the operation, the operation is permitted.

Per-method access control facilitates finer-grained security than traditional UNIX file systems. No special privilege is necessary to create a group upon which to base access. A client can dynamically modify the level of security employed for communication, for example to use encryption when transacting with a geographically-distant peer, but to communicate in the clear within a cluster. Specialized file objects can be designed to keep audit trails on a per-object or per-user basis (i.e., auditing can be performed by someone other than a system administrator).

2.4 Scalability

LegionFS distributes files and contexts across the available resources in the system. This allows applications to access files without encountering centralized servers and ensures that they can enjoy a larger percentage of the available network bandwidth without contending with other application accesses.

Scheduler objects provide placement decisions upon object creation. Utilizing information on host load, network connectivity, or other system-wide metadata, a scheduler can make intelligent placement decisions. A user may employ existing schedulers, implement an application-tailored scheduler which places files, contexts, and objects according to domain-specific requirements, or may enforce directed placement decisions. Using the latter mechanism, a user might specify that all of his files be created on a local host or within a highly-connected, nearby cluster. This ensures that most file accesses are local, while allowing for wide-area access. It also isolates user file accesses to achieve maximum efficiency. A user may employ the replication techniques described elsewhere in this paper to tolerate failures of local resources. This provides highly-efficient access in the common case, with a measure of insurance in case of host or disk failures.

The fully-distributed design of LegionFS allows the user to remain ignorant of the constraints of physical disk enclosures, available disk space, and file system allocations. Administrators seamlessly incorporate additional storage resources into the file system. By simply adding a storage subsystem to a context of available storage elements, the additional space is advertised to the system and becomes a target for placement decisions.

LegionFS utilizes multiple levels of caching to facilitate efficient file and directory lookups and employs limited forms of replication. Aside from their role in the binding process, Binding Agents cache translations between context names and LOIDs. lnfsd similarly caches translations to avoid excessive RPCs.

Manager objects such as classes can become hot spots. Fortunately, there is no inherent reason to have one class manager for all instances of a particular class. To mitigate potential bottlenecks, management responsibilities are distributed across 'clones' of a particular class.

2.5 Extensibility

LegionFS differentiates between objects according to their exported interfaces, not their implementations. For example, LegionFS interacts with any object providing the standard BasicFileObject interface as if it were a file. By focusing on the interface without concern for the object's actual class or implementation, LegionFS provides an extensible set of services which can be specialized on an application- or domain-specific basis. An object may provide a value-added service by changing the semantics associated with a method. Thus the same interface can be used to wrap different implementations. Further, an interface may be augmented to provide functionality in the form of additional methods.

A newly-minted object exporting the standard interface may be accessed by existent libraries. If functionality warrants an additional method, it may be implemented, exported by the object, and incorporated into a newly generated library. This allows multiple policies governing a particular design issue to co-exist. A programmer builds
upon lower-level functionality, such as the Legion security and communication layers, to construct objects suited for particular domains, adding them to the pool of objects already populating the system.

Legion's event-based protocol stack provides an additional opportunity for extensibility. Remote messages and exceptions are intercepted and announced to higher-level handlers. These handlers are registered according to priority and may handle an event or provide limited processing and announce the event to subsequent handlers. The Legion security layer is implemented as a layer in the protocol stack. Operations that transcend method invocations, such as an auditing facility, may be implemented as additional layers in the stack.

Providing excessive and heavy-weight functionality (such as consistency and replication) in all file and context objects is inappropriate as some applications neither require nor want the overhead associated with these mechanisms. Instead LegionFS provides the basic set of functionality described above and the framework to extend semantics where desired. Such functionality need be implemented only in the objects that require it, without impeding objects and applications that do not.

Interface inheritance was useful in implementing ProxyMultiObjects, TwoDFileObjects, and Simple K-Copy Classes (SKCC). TwoDFileObjects are a domain-specific implementation serving the scientific community, but are applicable to a broader audience. A TwoDFileObject implements the BasicFileObject interface such that reads and writes are striped across constituent, underlying BasicFileObjects, arranged as a two-dimensional matrix. A parallel file interface provides convenient access to applications performing matrix operations. The two-dimensional design degenerates to striping for high-performance I/O.

SKCC wrap standard classes to provide fault tolerance, by replicating an object's internal state (but not the object itself) across a number of user-specified storage elements. The state of an active class object may be synchronized across the replicas at convenient stable points of execution, such as during object deactivation. This approach provides a good measure of fault tolerance with a minimum of performance degradation.

Some environments need more full-featured replication and consistency guarantees than those provided by LegionFS. It is possible to extend ContextObjects to perform replication management: instead of a one-to-one mapping of context names to LOIDs, ContextObjects could provide a one-to-many context name-to-LOID translation. The ContextObject could perform replica selection based on availability or network connectivity constraints.

File data consistency is not addressed by basic Legion mechanisms, because no current Legion object caches file data. The initial implementation of lnfsd, which serves as an access point to LegionFS, provides NFS-like consistency semantics; it caches data for a configurable amount of time before re-validating file metadata via a stat call. There are important classes of domains where consistency guarantees are not appropriate, for example large read-only scientific datasets. For environments where consistency is necessary, it can be handled on a per-file or per-context basis at the object itself, without forcing the semantics on users accessing other data. An object could grant leases [16], which are more scalable than simple callbacks [21], or implement any other consistency scheme.

2.6 Adaptability

A wide-area file system must be adaptable to a diverse set of network, load, and system-wide conditions. LegionFS facilitates adaptation by maintaining system-wide metadata. Each object has an associated, arbitrary set of (key, value) pairs. Typical attributes for a host object include load averages, architecture, and operating system. This list could easily be extended to include other factors which might affect file placement in a wide-area environment such as network interfaces and their associated nominal bandwidths, local file systems, and disk configurations.

Attributes are available directly from the object and are also stored in a metadata repository, called the Collection. The Collection is a hierarchically distributed set of objects which is queried by schedulers to determine object characteristics and state. Objects periodically push their state information to the Collection. More sophisticated monitoring facilities such as the Network Weather Service [44] could also be employed to populate the Collection.

The Collection allows applications to track the dynamics of the system as well as capitalize on its more stable, inherent diversity. A geographically-distributed system is likely to contain a range of heterogeneity in the form of underlying file systems, storage devices, and architectures. If the characteristics of an application are well-known, the application may benefit from placement that matches these needs against the properties of particular resources. As specific examples, XFS [4] provides benefits to streaming applications by allowing them to circumvent standard kernel buffer caches, and RAID enclosures may provide more efficient availability than can be provided at higher layers in the system.

Since files and contexts are logically self-contained objects, it is more convenient to specify fine-grained policies than would be possible in a more conventional distributed file system. Objects may act on these policies asynchronously with respect to the user. LegionFS allows a user to explicitly migrate or deactivate an object. More interesting behaviors include the ability to migrate due to network conditions or replicate to accommodate increased load. A file might consider re-negotiating transfer size, changing consistency policy, or varying write-back policy in accordance with network constraints. Many of these issues were explored in the Coda file system [24].

Golding et al. [15] discuss means of exploiting idle periods in computer systems. Assuming fair load distribution, a file object is more likely to experience idleness than a centralized file server. Therefore, a file object has an opportunity to analyze its access patterns in order to prefetch. The file system literature is replete with prefetching mechanisms [8, 10, 26, 29, 35]. Often efficiency is a concern as the mechanisms must be realized within the limited latency and memory constraints of a kernel-resident file system. Being less memory-constrained, a Legion file may retain more exact data concerning access patterns and prefetch schedules. Having characterized its own usage, a file object could provide access hints [35] to the client to facilitate prefetching across the network. As a further optimization, a file object could recognize long periods of inactivity and move data to a more space-efficient, but less readily-accessible representation, file system, or storage device, as done in the HP

[Figure 2: Scalability of read performance in NFS, lnfsd, and LegionFS. Y-axis: Bandwidth (MB/sec); x-axis: Number of readers (1 to 50).]
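
The headline numbers behind Figure 2 can be put in per-reader terms with a few lines of arithmetic. The inputs below are the values quoted in the paper (193.8 MB/s and 95.4 MB/s aggregate at 50 readers, an NFS peak of 2.1 MB/s, a single LegionFS reader at 4.5 MB/s, and 100 Mbps links). The derived ratios are not stated in the paper, and the link-share figure assumes 1 MB = 10^6 bytes and ignores protocol overhead.

```python
# Figures quoted in the text for the Figure 2 micro-benchmark.
READERS = 50
AGGREGATE_MB_S = {"LegionFS": 193.8, "lnfsd": 95.4}  # at 50 readers
NFS_PEAK_MB_S = 2.1        # NFS peak, reached at 2 readers
SINGLE_READER_MB_S = 4.5   # one LegionFS reader on its own
LINK_MB_S = 100 / 8        # 100 Mbps link; assumes 1 MB = 10**6 bytes

# Average bandwidth seen by each reader at full scale.
per_reader = {name: total / READERS for name, total in AGGREGATE_MB_S.items()}

# Derived ratios (not stated in the paper).
link_share = per_reader["LegionFS"] / LINK_MB_S
scaling_efficiency = per_reader["LegionFS"] / SINGLE_READER_MB_S
speedup_over_nfs = AGGREGATE_MB_S["LegionFS"] / NFS_PEAK_MB_S

for name, rate in per_reader.items():
    print(f"{name}: {rate:.2f} MB/s per reader")
print(f"LegionFS share of one 100 Mbps link: {link_share:.0%}")
print(f"LegionFS per-reader rate vs. single reader: {scaling_efficiency:.0%}")
print(f"LegionFS aggregate vs. NFS peak: {speedup_over_nfs:.1f}x")
```

Under these assumptions each LegionFS reader averages roughly 3.9 MB/s at 50 readers, about 86% of the single-reader rate, which is consistent with the near-linear scaling the figure is meant to show.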

AutoRAID system [43].

3. EVALUATION

This section compares the scalable design of LegionFS to a more traditional volume-oriented approach. The gross disparity in potential parallelism between the two experimental setups is intentional, and serves to validate the move from monolithic servers as employed by NFS to the peer-to-peer architecture advocated by Legion, xFS [4], and others. Previous work [42] examined Legion wide-area I/O performance alongside the Globus [14] I/O facility and FTP, the de facto means of transferring files in a wide-area environment.

Each benchmark utilizes the Centurion cluster [28] at the University of Virginia. These experiments employ 400 MHz dual-processor Pentium II machines running Linux 2.2.14 with 256 MB of main memory and IDE local disks. These commodity components are directly connected to 100 Mbps Ethernet switches, which are in turn connected via a 1 Gbps switch. A 100 Mbps link provides the cluster with access to the vBNS. During the second experiment, remote hosts at Binghamton University and the University of Minnesota communicate with the Centurion cluster using this connection. The Sparc hosts at Binghamton University run Solaris 5.7, while the dual-processor Intel machines at the University of Minnesota run Linux 2.2.12.

The first micro-benchmark (Figure 2) is designed to show that LegionFS clients accessing independent subtrees achieve a linear increase in aggregate throughput in accordance with the linear growth of the network. lnfsd performance also scales nearly linearly. On the other hand, NFS performance scales poorly with additional clients. Each reader accesses a private 10 MB file via a series of 1 MB transfers. The experiment varies the number of simultaneous readers per run. Each reader and its associated target file are placed on separate nodes, though they share the same switch whenever possible. The experiment employs up to 100 nodes, providing the opportunity to scale the benchmark to 50 readers accessing files on 50 separate nodes. The LegionFS case utilizes the Legion BasicFile library and distributes BasicFileObjects throughout the network. These same BasicFileObjects are accessed in the lnfsd experiment by clients that are co-located with the lnfs interposition agents. The NFS experiment uses a single NFS daemon to service file system requests from 50 readers. In all cases, caching occurs only on the server side.

Single readers attained 4.5 MB/s and 2.1 MB/s under LegionFS and NFS, respectively. NFS is limited to 4K transfers over the network, whereas LegionFS can use arbitrary transfer sizes. lnfsd performs similarly to NFS, with a bandwidth of 2.2 MB/s. lnfsd performance is degraded by frequent context switches and RPCs between the kernel client and lnfsd. This pure overhead is the expense of supporting legacy applications, and is avoided when using the Legion library interfaces. lnfsd attempts to mitigate the inefficiency of its user-level implementation by performing read-ahead on sequential file access, asynchronous write-behind, and file and metadata caching.

LegionFS and lnfsd each achieved peak performance at 50 readers, yielding aggregate bandwidths of 193.8 MB/s and 95.4 MB/s, respectively. NFS peak performance occurred at 2 readers, yielding an aggregate bandwidth of 2.1 MB/s. NFS does not scale well with more than two readers, whereas both lnfsd and LegionFS scale linearly with the number of readers, assuming the file partitioning described above.

To put the above results in the context of a popular domain, the next benchmark examines access to a subset of the Protein Data Bank (PDB). This experiment is intended to simulate the workings of parameter space studies such as
[Figure 3 plots not recoverable. Panels (a) and (b); x-axis of both panels: Number of readers.]

Figure 3: Centurion clients accessing PDB data stored in ProxyMultiObject within Centurion cluster.
(a) Average Client Bandwidth (b) Aggregate Bandwidth

[Figure 4 plots not recoverable. Panels (a) and (b); x-axis of both panels: Number of readers.]

Figure 4: Centurion clients accessing PDB data stored in BasicFileObjects within Centurion cluster.
(a) Average Client Bandwidth (b) Aggregate Bandwidth
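
The linear-scaling claim made for the first micro-benchmark can be checked with simple arithmetic on the reported figures. The sketch below is not part of the original study: the function name scaling_efficiency and the notion of "fraction of ideal linear scaling" are illustrative assumptions. The inputs are the single-reader and peak aggregate bandwidths reported in the text for LegionFS, lnfsd, and NFS.

```python
def scaling_efficiency(single_mb_s, aggregate_mb_s, readers):
    """Fraction of ideal linear scaling achieved at the given reader count.

    Ideal linear scaling (an assumption of this sketch) means every reader
    sustains the bandwidth that a lone reader attains.
    """
    ideal = single_mb_s * readers
    return aggregate_mb_s / ideal


# system: (single-reader MB/s, peak aggregate MB/s, readers at peak),
# all three values taken from the reported micro-benchmark results.
results = {
    "LegionFS": (4.5, 193.8, 50),
    "lnfsd": (2.2, 95.4, 50),
    "NFS": (2.1, 2.1, 2),
}

for name, (single, aggregate, readers) in results.items():
    eff = scaling_efficiency(single, aggregate, readers)
    per_reader = aggregate / readers
    print(f"{name}: {per_reader:.2f} MB/s per reader, {eff:.0%} of linear")
```

Under this definition LegionFS and lnfsd both retain roughly 86 to 87 percent of ideal linear scaling at 50 readers, while NFS at its two-reader peak delivers half of it.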

Feature [3], which has been used to scan the PDB searching for calcium binding sites. Feature, and similar parameter space studies, employ coarse-grained parallelism to execute large simultaneous runs against different datasets. The Protein Data Bank is typical of large datasets in that it services many applications from various sites worldwide desiring to access it at a high sustained data rate.
   Clients read a subset of files from the PDB stored in Legion context space. To avoid excessively long runs, only the first 100 files from the PDB were accessed. These files have an average size of approximately 171 KB, with a file size standard deviation of 272 KB. Such a distribution indicates there are many small files in the database along with a few very large files. A client's execution is termed a job and consists of 100 whole-file reads. Client execution is not synchronized. Each stage of the experiment defines the number of active clients. While the number of active clients is varied from 1 to 32 between stages of the experiment, the number of jobs remains constant at 100.
   The test harness iterates through the target hosts in round-robin fashion, assigning readers until the specified parallelism is reached. Upon a client's completion, an additional reader begins execution. Each client records the elapsed time to read the list of files in its entirety and calculates its bandwidth. The average of these bandwidths is reported on the left-hand side of Figures 3, 4, 5, and 6 as average client bandwidth. The test harness responsible for remotely executing the jobs on the hosts records the elapsed time from instantiation of the first job to completion of the last. The resulting aggregate bandwidth is reported on the right-hand pane of the same figures. The two metrics are intended to capture the performance of individual clients and the throughput of the system under a specified load. During the prelude and epilogue of an experiment, the test is not in a steady state and the number of active clients is below the specified value.
   Files hosted on the Centurion cluster store the PDB data. Though only the first 100 files are accessed, 12000 files are stored under a single context. This simulates accessing a relatively small subset of a large data collection. The experiments vary the placement of the clients and the file system distribution to cover local- and wide-area environments and volume-oriented and peer-to-peer designs. The local-area experiments (Figures 3 and 4) execute each of the clients on one of 32 nodes within the Centurion cluster, though
[Figure 5 plots not recoverable. Panels (a) and (b); x-axis of both panels: Number of readers.]

Figure 5: Remote clients accessing PDB data stored in ProxyMultiObject within Centurion cluster.
(a) Average Client Bandwidth (b) Aggregate Bandwidth
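
The two metrics plotted in panels (a) and (b) of Figures 3 through 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness used in the study: the JobRecord fields, the function names, and the exact formula for aggregate bandwidth (total bytes divided by the time from the first job's instantiation to the last job's completion) are hypothetical readings of the experiment description. The round-robin helper mirrors the described assignment of readers to hosts.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class JobRecord:
    start_s: float       # when the harness instantiated the job
    read_begin_s: float  # when the client began reading its file list
    end_s: float         # when the job completed
    bytes_read: int      # total bytes read by the client


def assign_round_robin(hosts, n_clients):
    """Assign n_clients readers to hosts in round-robin order."""
    host_iter = cycle(hosts)
    return [next(host_iter) for _ in range(n_clients)]


def average_client_bandwidth(jobs):
    """Mean of per-client bandwidths, each measured over the client's own read time."""
    rates = [j.bytes_read / (j.end_s - j.read_begin_s) for j in jobs]
    return sum(rates) / len(rates)


def aggregate_bandwidth(jobs):
    """Total bytes over wall-clock time from first job start to last job end."""
    first_start = min(j.start_s for j in jobs)
    last_end = max(j.end_s for j in jobs)
    return sum(j.bytes_read for j in jobs) / (last_end - first_start)


# Two jobs run back to back; each spends 1 s starting up and 1 s reading.
jobs = [
    JobRecord(start_s=0.0, read_begin_s=1.0, end_s=2.0, bytes_read=1000),
    JobRecord(start_s=2.0, read_begin_s=3.0, end_s=4.0, bytes_read=1000),
]
print(average_client_bandwidth(jobs))  # per-client view excludes startup time
print(aggregate_bandwidth(jobs))       # system view includes startup time
```

In this toy run each client sees 1000 bytes per second while the system as a whole delivers 500, which illustrates why an aggregate figure that includes job startup can look small next to the average client bandwidth.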

[Figure 6 plots not recoverable. Panels (a) and (b); x-axis of both panels: Number of readers.]

Figure 6: Remote clients accessing PDB data stored in BasicFileObjects within Centurion cluster.
(a) Average Client Bandwidth (b) Aggregate Bandwidth
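
The ProxyMultiObject curves of Figures 3 and 5 are characterized in the text by their endpoints: average client bandwidth peaks at 827 KB/s (local-area) and 312 KB/s (wide-area) with a single uncontended reader, and falls to 42 KB/s and 35 KB/s at 32 readers. The sketch below estimates the fraction of bandwidth retained per doubling of readers that these endpoints imply. It assumes a constant geometric decay between 1 and 32 readers, which is a simplification introduced here rather than a model from the paper; the function name is likewise illustrative.

```python
import math


def per_doubling_factor(bw_start, bw_end, readers_start, readers_end):
    """Fraction of bandwidth retained per doubling of readers.

    Assumes (for illustration only) that bandwidth decays by the same
    factor at every doubling between the two measured endpoints.
    """
    doublings = math.log2(readers_end / readers_start)
    return (bw_end / bw_start) ** (1.0 / doublings)


# Endpoints reported for the ProxyMultiObject experiments, in KB/s.
local = per_doubling_factor(827, 42, 1, 32)
wide = per_doubling_factor(312, 35, 1, 32)

print(f"local-area: {local:.2f} retained per doubling")
print(f"wide-area:  {wide:.2f} retained per doubling")
```

Under this assumption the local-area case retains a little more than half of its bandwidth at each doubling, in line with the near 50% reduction described for Figure 3, while the wide-area case retains roughly two thirds, consistent with the less steep curve of Figure 5.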

they are never co-located with PDB files. The wide-area experiments (Figures 5 and 6) place jobs on a pool of 4 machines at the University of Minnesota and 12 at Binghamton University. The relative dearth of remote machines requires that clients be scheduled on the same host when their active number exceeds 16. While this is an unfortunate incongruity between the environments, the jobs are I/O-bound and do not suffer unduly from being placed on the same host. The experiments designed to stress the volume-oriented design (Figures 3 and 5) host all files within a single ProxyMultiObject. The peer-to-peer setup (Figures 4 and 6) distributes the BasicFileObjects across 32 Centurion nodes.
   As expected, the ProxyMultiObject shows immediate and drastic performance degradation with increasing load. The effect is particularly acute when the clients execute within the cluster (Figure 3). In this case, there is a near 50% reduction in bandwidth with each doubling of the number of active readers. Figure 5 exhibits a similar dramatic trend, though the curve is not as steep. Given its relatively greater distance from the ProxyMultiObject, a client's requests are less densely concentrated than when running within the cluster. This ensures less immediate contention for the ProxyMultiObject and results in a slightly less severe performance impact. Clients achieve peak average bandwidth at 827 KB/s and 312 KB/s within local-area and wide-area environments, respectively. This occurs when a client need not contend with other readers. Average client bandwidth is minimized under each case at 32 readers, dropping to 42 KB/s and 35 KB/s for the local-area and wide-area cases, respectively.
   The ProxyMultiObject aggregate bandwidth curves bear close resemblance to one another. Aggregate bandwidth grows steadily until a maximum is reached at 8 clients, and then flattens. The ProxyMultiObject is best utilized by a small number of clients, but cannot continue to scale with increased load. The peak bandwidths of 944 KB/s within the cluster and 717 KB/s for the remote clients may seem surprisingly small in comparison to the achieved average client bandwidths. This occurs because the aggregate bandwidth measures the total elapsed execution time of all 100 jobs, including the time required to start the remote jobs, transfer an input file, and reap the results. While this additional overhead comprises a non-trivial percentage of the total job turnaround time, it is illustrative of actual execution. The
average client bandwidths report performance once the job has begun execution; the aggregate bandwidth is indicative of system throughput.
   Distributing the load amongst the BasicFileObjects leads to more graceful performance degradation in Figures 4 and 6. The system does not scale linearly, however. Unlike the raw throughput experiment above, clients in this setup access a shared portion of the PDB, rather than dedicated per-client files. While average client bandwidth remains fairly steady with a few additional clients in both graphs, large numbers of active clients increase the likelihood that one or more will access the same data, leading to contention at the BasicFileObject. At 1406 KB/s, peak client bandwidth accessing BasicFileObjects within the cluster is significantly higher than the corresponding ProxyMultiObject case. This suggests additional benefits of BasicFileObjects. ProxyMultiObjects must maintain state for each constituent object, leading to overhead when demultiplexing a request to the target. Further, BasicFileObjects may better exploit the local file system cache since they serve a much smaller portion of the name space and are less likely to suffer capacity cache misses.
   The increasingly large standard deviations of Figure 4 result from the contention described above. Since all clients iterate through files in the same order, contention is more likely at the onset of the experiment. During the ramp-up stage, clients perform more poorly than during steady-state execution. This may seem counterintuitive as the test has not reached its full complement of active clients. Nevertheless, clients caravan behind one another until adequate spacing is achieved. As a job completes, a new job begins execution and inherits the spacing won by the finished job. This effect is not present in the wide-area case of Figure 6, where temporal distance between jobs is achieved by the relatively longer time required to start remote execution.
   The caravan effect is pronounced in the aggregate bandwidths of Figure 4(b), where performance dips under the load of 32 clients. Unlike previous cases, the slowed progress of the 32 initial clients is significant amongst 100 jobs. Aggregate bandwidth reaches its height of 3044 KB/s at 16 clients. Unburdened by temporal proximity, the remote clients accessing the BasicFileObjects contribute to increased aggregate bandwidth up to 32 clients at 1938 KB/s.

4. RELATED WORK
   The continued and increasing interest in wide-scale distributed computing, driven by high-bandwidth, long-haul networks and the economies of scale of commodity hardware, has led to the design of file systems and data access facilities engineered specifically for such an environment [2, 6, 5, 9, 13]. Such file systems were motivated by concerns inherent in wide-area environments, unlike their predecessors, which were originally intended for campus- or local-area networks and were retrofitted to fill expanding roles [21, 38, 37].
   Recognizing the diverse and evolving nature of wide-area environments, researchers have followed the approach taken in LegionFS of developing layered architectures consisting of a potentially expansive set of services integrated via lower-level protocols [5, 9]. SRB [5] is middleware that provides access to data stored on heterogeneous resources residing within a distributed system. SRB Agents contact the MCAT metadata service in order to locate and transact with local storage facilities, such as file systems, databases, and hierarchical storage systems.
   In the context of the Globus Grid Toolkit [14], Chervenak et al. [9] posit a framework that stresses the importance of employing standard protocols to achieve interoperability. This work leverages previous work on Globus data access [6], deployed internet infrastructure and protocols such as HTTP and LDAP, and protocol extensions such as GridFTP [1]. File replication and selection via Condor ClassAds [36] have been successfully implemented using these mechanisms [41]. While Globus benefits from existent protocols and internet services, it is also constrained by their mandates. To ensure interoperability, entities must communicate using the standard protocol. A perceived need or feature in the service may require amending that standard. Not held captive to prescribed interfaces, Legion objects may simply export new methods. Because internet protocols evolved independently, they do not necessarily share commonalities along important dimensions such as naming, authentication, and authorization. Thus features such as authorization, which might be expected to pervade the system, must be implemented anew for each service, either as a mapping to each specific protocol or outside the service proper. By exposing uniform and integrated mechanisms to distributed objects, LegionFS ensures file abstractions are secured in a consistent manner without this burden.
   WebFS [40] and Ufo [2] also provide access to internet services. WebFS is a kernel-resident file system that provides access to the global HTTP name space. It supports three cache coherency policies deemed appropriate for HTTP access: last writer wins, append only, and multicast updates. Ufo employs the UNIX tracing facility to intercept open system calls and transfers whole files from FTP and HTTP servers.
   The PUNCH Virtual File System (PVFS) [13] interposes NFS-forwarding proxies between unmodified NFS clients and servers. PVFS allows a client executing on a compute server to access files stored within another security domain. During the course of a session, clients are allocated a temporary shadow account on the compute server. Requests are directed to the proxy, co-located with the target NFS server. The proxy maps the shadow account id of the request to the user's corresponding id on the target host and forwards the request to the NFS server.
   File system adaptability has been addressed in Coda [24] and Odyssey [34], which support application-transparent and application-aware adaptation, respectively. Both adaptation strategies are designed to provide resilience in the presence of varying network performance and collect simple information about certain resources to aid in system monitoring.
   The Hurricane File System (HFS) [25] employs building blocks to encapsulate file system policies, such as prefetching and distribution. These building blocks may be composed according to their interfaces to achieve per-file and per-open file instance specialization.
   While building blocks are relatively coarse-grained and focus on policies that span the entire file system, stacking allows individual file system calls to be interposed. Higher layers in a stacked file system may provide additional processing or modify arguments before invoking the same operation on the subsequent, symmetric layer. Stacking vnodes [39] create a chain of traditional vnodes to support
interposing. Ficus [19] is a replicated file system that allows kernel- or user-level file system modules exporting the vnode interface to be stacked. Later work [20] abandoned the rigid vnode interface in favor of the UCLA interface, which is formed at kernel initialization and is the union of interfaces exported by each layer. A directory subtree constitutes a layer and may be mounted atop another layer to form a stack.
   The Spring [32] object-oriented operating system is composed of cooperating servers running on a micro-kernel. File objects inherit from Spring interfaces charged with handling operations such as paging, authentication, consistency, and I/O [33]. A new file system is allocated by contacting its corresponding creator object. This file system may be stacked on an existing file system by means of a stackon method [23]. Subsequent work on the Solaris MC File System [30] replaces the vnode interface with a new interface defined in CORBA IDL.
   The FiST language [45] is a high-level language for describing stackable file systems. By providing a standard interface to mask operating system peculiarities, FiST allows for portable file system implementations. File systems may interpose specific operations or a set of operations and may choose to insert code before, after, or in lieu of the operation. The FiST description of the file system extensions is input to stgen, a parser and code generator, which outputs kernel C sources.
   Legion's goal of acceptance amongst diverse organizations requires both that it provide secure means of cross-domain access and that administrative overhead be minimized. Centralized key services, such as Kerberos, have been successfully employed by AFS [21, 38] and DFS [22], but do not meet these requirements. The centralized key management in Kerberos becomes increasingly difficult as the system scales. The Self-certifying File System (SFS) [31] embeds a public key in the name of a file, making "self-certifying" pathnames. LegionFS leverages a similar, distributed key management system.
   The notion of serverless or peer-to-peer file systems was popularized by xFS [4], and has spurred a rash of related projects [7, 11, 18]. xFS implements a serverless architecture to provide scalable file service, and provides data redundancy through networked disk striping to increase reliability. JetFile [18] relies on multicast to locate files distributed throughout the network. This location-independent naming scheme encourages data replication. Unfortunately, multicast is problematic in wide-area environments as it floods networks and relies on router support.

5. CONCLUSION
   This paper has examined a small sample of the usage scenarios and requirements of file access in Computational Grids as they exist today. With this knowledge, we advocate an architecture integrated by basic, but powerful, facilities such as location-independent naming and pervasive authentication, authorization, and confidentiality mechanisms. A scalable, peer-to-peer design ensures that the Grid can benefit fully from its constituent resources, rather than be bound by the performance limitations of centralized services. Finally, wide-area applications can exploit the dynamics of the system through adaptation and continue to evolve with our understanding of the Grid's potential through a framework promoting extensibility.
   Understanding that many classes of scientific applications can best utilize the Grid without the imposition of costly functionality, LegionFS follows a minimalist approach. However, the incorporation of application-specific policies is enabled by the set of mechanisms afforded by Legion. This ensures that emerging services and applications can effortlessly utilize existent infrastructure to form a cohesive system, without having to cobble together and reconcile mechanisms that were not intended to work in unison. Extensions to core services, such as ProxyMultiObjects and TwoDFileObjects, are a result of this philosophy. We have also described as-yet-unimplemented opportunities such as replication via class or context objects and consistency guarantees that capitalize on lower-level Legion facilities.
   The heterogeneity, wealth of storage, and abundance of CPU cycles in wide-area environments suggest interesting possibilities for file systems. The ability to schedule processes according to their I/O affinities and the ability to leverage idle periods are two avenues for continued research. We expect wide-area file systems to evolve into more than mere extensions of smaller-scale distributed file systems. Rather, they may efficiently bridge local or local-area file systems, gaining advantage from their unique strengths.
   While anticipating the future of wide-area file systems, this paper provided a quantitative study of the current state of LegionFS. The utility of LegionFS has been demonstrated with the Legion object-to-object protocol as well as lnfsd, a user-level daemon designed to exploit UNIX file system calls and provide an interface between a UNIX kernel and LegionFS. Benchmarks showed that the scalability of LegionFS compared favorably under load to volume-based file systems, such as NFS. Finally, LegionFS was shown to facilitate efficient data access in an important scientific domain, the Protein Data Bank.

6. ACKNOWLEDGMENTS
   This work was partially supported by Logicon for the DoD HPCMOD PET program DAHC 94-96-C-0008, NSF-NGS EIA-9974968, NSF-NPACI ASC-96-10920, and a grant from NASA-IPG. In addition, the authors would like to thank John Karpovich and Mark Morgan for answering questions on the Legion architecture, Norm Beekwilder for his aid in experimental design and administration, Katherine Holcomb for her patience as we taxed the Centurion network, Anand Natrajan for his feedback and guidance with Legion scheduling, and the entire Legion team. The wide-area results would not have been feasible without contributed academic resources. The authors thank Mike Lewis for the use of machines at Binghamton University and Jon Weissman of the University of Minnesota for his support. Finally, we thank our shepherd, Ann Chervenak, and the anonymous referees for adding clarity to this paper's presentation.

7. REFERENCES
[1] GridFTP: FTP extensions for the Grid. Grid Forum Remote Data Access group, October 2000.
[2] A. D. Alexandrov, M. Ibel, K. E. Schauser, and C. J. Scheiman. Extending the operating system at the user level: the Ufo global file system. In 1997 Annual Technical Conference on Unix and Advanced Computing Systems (USENIX '97), January 1997.
[3] R. Altman and R. Moore. Knowledge from biological data collections. enVision, 16(2), April 2000.
[4] T. E. Anderson, M. D. Dahlin, J. M. Neefe, D. A. Patterson, D. S. Roselli, and R. Y. Wang. Serverless network file systems. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, pages 109-126, Copper Mountain, CO, December 1995. ACM Press.
[5] C. Baru, R. Moore, A. Rajasekar, and M. Wan. The SDSC storage resource broker. In CASCON'98, Toronto, Canada, November-December 1998.
[6] J. Bester, I. Foster, C. Kesselman, J. Tedesco, and S. Tuecke. GASS: A data movement and access service for wide area computing systems. In Proceedings of the Sixth Workshop on Input/Output in Parallel and Distributed Systems, pages 78-88, Atlanta, GA, May 1999. ACM Press.
[7] W. J. Bolosky, J. R. Douceur, D. Ely, and M. Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. In Sigmetrics 2000, pages 34-43, 2000.
[8] P. Cao, E. W. Felten, A. R. Karlin, and K. Li. A study of integrated prefetching and caching strategies. In Proceedings of the 1995 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 188-196, Ottawa, Ontario, Canada, 1995.
[9] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke. The data grid: Towards an architecture for the distributed management and analysis of large scientific datasets. Journal of Network and Computer Applications, 1999.
[10] K. M. Curewitz, P. Krishnan, and J. S. Vitter. Practical prefetching via data compression. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 1993.
[11] P. Druschel and A. Rowstron. PAST: A large-scale, persistent peer-to-peer storage utility. In HOTOS VIII, Schloss Elmau, Germany, May 2001.
[12] A. Ferrari, F. Knabe, M. Humphrey, S. Chapin, and A. Grimshaw. A flexible security system for metacomputing environments. Technical report, University of Virginia, December 1998.
[13] R. J. Figueiredo, N. H. Kapadia, and J. A. B. Fortes. The PUNCH virtual file system: Seamless access to decentralized storage services in a computational grid. In Proceedings of the Tenth IEEE International Symposium on High Performance Distributed Computing. IEEE Computer Society Press, August 2001.
[14] I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115-128, 1997.
[15] R. Golding, P. Bosch, C. Staelin, T. Sullivan, and J. Wilkes. Idleness is not sloth. In USENIX Technical Conference, pages 201-212, January 1995.
[16] C. Gray and D. Cheriton. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pages 202-210. ACM Press, December 1989.
[17] A. S. Grimshaw, A. Ferrari, F. Knabe, and M. Humphrey. Wide-area computing: Resource sharing on a large scale. IEEE Computer, 32(5):29-37, May 1999.
[18] B. Gronvall, A. Westerlund, and S. Pink. The design of a multicast-based distributed file system. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, Louisiana, February 1999.
[19] R. G. Guy, J. S. Heidemann, W. Mak, J. Thomas W. Page, G. J. Popek, and D. Rothmeier. Implementation of the Ficus replicated file system. In USENIX Conference Proceedings, Berkeley, CA, June 1990. USENIX Association.
[20] J. S. Heidemann and G. J. Popek. File system development with stackable layers. ACM Transactions on Computer Systems, 12(1):58-89, February 1994.
[21] J. Howard, M. Kazar, S. Menees, D. Nichols, M. Satyanarayanan, R. Sidebotham, and M. West. Scale and performance in a distributed file system. ACM Transactions on Computer Systems, 6(1):51-81, February 1988.
[22] M. L. Kazar, B. W. Leverett, O. T. Anderson, V. Apostolides, B. A. Bottos, S. Chutani, C. F. Everhart, W. A. Mason, S.-T. Tu, and E. R. Zayas. Decorum file system architectural overview. In Proceedings of the 1990 Summer USENIX Conference, pages 151-163, Anaheim, CA, June 1990. USENIX Association.
[23] Y. A. Khalidi and M. N. Nelson. Extensible file systems in Spring. In Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, Asheville, NC, December 1993. ACM Press.
[24] J. J. Kistler and M. Satyanarayanan. Disconnected operation in the Coda file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles, pages 3-25. ACM Press, February 1992.
[25] O. Krieger and M. Stumm. HFS: A performance-oriented flexible file system based on building-block compositions. ACM Transactions on Computer Systems, 15(3):286-321, August 1997.
[26] T. M. Kroeger and D. D. E. Long. The case for efficient file access pattern modeling. In Proceedings of the 1996 USENIX Technical Conference, January 1996.
[27] Y. G. Leclerc, M. Reddy, L. Iverson, and N. Bletter. TerraVision II: An overview. Technical report, SRI International, 2000.
[28] G. Lindahl, S. J. Chapin, N. Beekwilder, and A. Grimshaw. Experiences with Legion on the Centurion cluster. Technical report, University of Virginia, August 1998.
[29] T. M. Madhyastha and D. A. Reed. Input/output access pattern classification using hidden Markov models. In Proceedings of the Fifth Workshop on Input/Output in Parallel and Distributed Systems, pages 57-67, San Jose, CA, November 1997.
[30] V. Matena, Y. A. Khalidi, and K. Shirriff. Solaris MC file system framework. Technical report, Sun Microsystems Research, 1996.
[31] D. Mazieres, M. Kaminsky, M. F. Kaashoek, and E. Witchel. Separating key management from file system security. In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles,
   Kiawah Island, SC, December 1999. ACM Press.               June 2000. USENIX Association.
32 J. G. Mitchell, J. J. Gibbons, G. Hamilton, P. B.
   Kessler, Y. A. Khalidi, P. Kougiouris, P. W. Madany,
   M. N. Nelson, M. L. Powell, and S. R. Radia. An
   overview of the spring system. In CompCon
   Conference Proceedings, 1994.
33 M. Nelson, Y. Khalidi, and P. Madany. The spring file
   system. Technical report, Sun Microsystems Research,
   February 1993.
34 B. Noble, M. Satyanarayanan, D. Narayanan, J. E.
   Tilton, J. Flinn, and K. R. Walker. Agile
   application-aware adaptation for mobility. In
   Proceedings of the Twelfth ACM Symposium on
   Operating Systems Principles, St. Malo, France,
   October 1997. ACM Press.
35 R. H. Patterson, G. A. Gibson, E. Ginting,
   D. Stodolsky, and J. Zelenka. Informed prefetching
   and caching. In Proceedings of the Fifteenth ACM
   Symposium on Operating Systems Principles, pages
   79-95. ACM Press, December 1995.
36 R. Raman, M. Livny, and M. Solomon. Matchmaking:
   Distributed resource management for high throughput
   computing. In Proceedings of the Seventh IEEE
   International Symposium on High Performance
   Distributed Computing. IEEE Computer Society
   Press, 1998.
37 R. Sandberg, D. Goldberg, S. Kleiman, D. Walsh, and
   B. Lyon. Design and implementation of the sun
   network filesystem. In USENIX Conference
   Proceedings, Berkeley, CA, Summer 1985. USENIX
38 M. Satyanarayanan. Scalable, secure and highly
   available file access in a distributed workstation
   environment. IEEE Computer, 23(5):9-22, May 1990.
39 G. C. Skinner and T. K. Wong. "stacking" vnodes: A
   progress report. In USENIX Conference Proceedings,
   pages 61-74. USENIX Association, Summer 1993.
40 A. M. Vahdat, P. C. Eastham, and T. E. Anderson.
   Webfs: A global cache coherent file system. Technical
   report, University of California, Berkeley, 1996.
41 S. Vazhkudai, S. Tuecke, and I. Foster. Replica
   selection in the globus data grid. In Proceedings of the
   First IEEE/ACM International Conference on Cluster
   Computing and the Grid, pages 106-113. IEEE
   Computer Society Press, May 2001.
42 B. S. White, A. S. Grimshaw, and A. Nguyen-Tuong.
   Grid-based file access: The legion I/O model. In
   Proceedings of the Ninth IEEE International
   Symposium on High Performance Distributed
   Computing, Pittsburgh, PA, August 2000. IEEE
   Computer Society Press.
43 J. Wilkes, R. Golding, C. Staelin, and T. Sullivan.
   The hp autoraid hierarchical storage system. ACM
   Transactions on Computer Systems, 14(1):108-136,
   February 1996.
44 R. Wolski, N. Spring, and J. Hayes. The network
   weather service: A distributed resource performance
   forecasting service for metacomputing. Journal of
   Future Generation Computing Systems, 1998.
45 E. Zadok and J. Nieh. Fist: A language for stackable
   file systems. In Proceedings of the 2000 USENIX
   Annual Technical Conference, San Diego, California,
   June 2000. USENIX Association.
