On the scalability of the SGAS bank
Author: Peter Gardfjäll
Date: 24 November 2011
Version: Draft 0.1
1 Aim
2 Introduction
3 A virtual bank approach
4 Naming schemes
4.1 eXtensible Resource Identifier (XRI)
4.2 Handle system
4.3 Resource Namespace Service (RNS)
4.4 WS-Naming
5 Virtual Bank design
5.1 Model overview
5.1.1 Branch interactions
5.1.2 JARM interactions
5.1.3 Admin interactions
5.2 Server-side components
5.3 Client-side components
6 Future extensions/improvements
7 References
1 Aim
The objective of this document is to propose scalability improvements for the SGAS bank component.
As the fundamental architecture has started to stabilize, now may be a good time to discuss ways of
improving SGAS scalability. Such improvements are essential for the long-term success of SGAS, in
particular considering the forthcoming Globus Toolkit integration, which hopefully will lead to more
SGAS deployments in large-scale Grid environments.
The goal should be to allow SGAS to be deployed and run smoothly in Grid environments of arbitrary
size, without placing an unreasonable burden on SGAS administrators, resource owners and Grid
users.
2 Introduction
The current bank solution could quickly become a performance bottleneck in large-scale Grid
environments comprising a large number of resources and a large user community. In such settings,
the job submission rate, and hence the account request rate, is likely to be high, threatening to overload
the bank at times of peak load. This could impose a significant performance penalty on job submissions
or, even worse, cause account reservation attempts to time out so that job submissions "slip
through" the accounting system.
Although a number of separate banks could be set up, handling different subsets of user accounts, this
would place an additional burden on the bank administrator, who besides needing to configure and set
up the banks would also need to administer each bank separately. Furthermore, each resource owner
would need to reconfigure its resource to establish trust with each additional bank as well as to
configure separate user(group)-to-bank mappings directing each user to its designated bank – a tedious
task, especially considering a large user base. To add to the problem, all these resource-side mappings
would need to be changed whenever a server is relocated. Bank-mapping could also become
problematic in situations where a user is a member of several accounts that are located in separate
banks. Moreover, the bank administrator would need to manually enforce a project name uniqueness
constraint across the banks, to prevent accounts in different banks from sharing the same project name.
This document proposes a virtual bank approach that allows additional bank servers to be provisioned
dynamically (and transparently) as a VO grows, while preserving the illusion of a single bank service.
The proposed solution is based on an abstract naming scheme, which not only simplifies account names
but also facilitates server relocations. Furthermore, trust is automatically established with new servers,
without requiring resource owner intervention.
3 A virtual bank approach
The proposed virtual bank approach distributes the bank by partitioning its set of accounts across
several physical hosts (or bank branches), with the goal of balancing the client request load between
several servers and thereby being capable of handling larger Grid environments. The set of distributed
branches is, however, presented to clients as a single logical bank service.
An abstract and location-independent naming scheme is the key enabler of the virtual bank. Naming,
which is essential to achieving virtualization, offers several desirable transparencies, such as:
• Scaling transparency: allowing servers to be added/removed dynamically without affecting
client software (apart from performance variations). This transparency is critical to enabling the
virtual bank, as it allows additional branch servers to be added at runtime to adapt to a growing
Grid environment.
• Location transparency: allowing clients to disregard the physical location of a resource.
This transparency facilitates server relocations. For example, network outages, server
maintenance or machine park upgrades may necessitate a (temporary) branch move to a
different network address.
• Migration transparency: allowing resources to be moved dynamically between hosts, e.g., in
response to varying server load. One could, e.g., envision an automatic rebalancing
mechanism where load conditions trigger account migrations from a heavily utilized
server to a server with spare capacity.
• Replication transparency: allowing redundant copies of a service/resource to be maintained on
several servers, e.g., for purposes of performance or fault tolerance. For the bank, this
may ease a future move towards a replicated bank implementation.
In addition to the above benefits, an abstract naming scheme with human-friendly account names
could make life easier for Grid users as well. Undoubtedly, an abstract (location-independent) name
such as sgas://account1 is easier to remember than a physical address such as
https://swegrid.bank.host:8443/wsrf/services/sgas/bank/account/AccountService?account1
Furthermore, a simple naming scheme may encourage users to specify the account in their job
specifications, thereby relieving the resource of having to make a (linear) search for a default account.
The next section describes some of the ongoing naming standardization efforts, which are potential
candidates for a naming scheme within the virtual bank.
4 Naming schemes
No de-facto naming system standard has yet emerged in the Grid community. However,
standardization efforts are currently underway in many different organizations.
4.1 eXtensible Resource Identifier (XRI)
The eXtensible Resource Identifier (XRI) standard [XRI] specifies a URI-compatible identifier
scheme and resolution protocol, and is being actively defined within the OASIS XRI TC. XRIs are
abstract and location-independent identifiers, essentially capable of identifying anything. As suggested
by the name, XRIs can be extended to contain different kinds of metadata describing the identified
resource. XRIs also provide an HTTP(S)-based resolution protocol offering several modes of
resolution, including trusted resolution (signed resolution results) and proxied resolution (a server
resolves on behalf of the client). The following example illustrates what an XRI may look like:
xri://swegrid.bank*account1
The XRI resolution process, which determines the network endpoint associated with a particular
identifier, is an iterative process where each authority sub-segment (*-delimited) is resolved left-to-
right into an associated XRI Descriptor (XRID) describing the authority and how it can be accessed
(URLs).
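To make the left-to-right resolution concrete, the following is a minimal in-memory sketch. The Descriptor and XriSketch classes are hypothetical: a Descriptor merely stands in for an XRID (an access URL plus the sub-authorities it can resolve), and no part of the actual XRI resolution protocol, the XRID format or the Java implementation mentioned below is modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an XRI Descriptor (XRID): the URL at which an
// authority can be reached, plus the sub-authorities it knows how to resolve.
final class Descriptor {
    final String url;
    private final Map<String, Descriptor> children = new HashMap<>();

    Descriptor(String url) {
        this.url = url;
    }

    // Registers a sub-authority; returns this descriptor to allow chaining.
    Descriptor add(String subSegment, Descriptor child) {
        children.put(subSegment, child);
        return this;
    }

    Descriptor lookup(String subSegment) {
        return children.get(subSegment);
    }
}

final class XriSketch {
    private static final String PREFIX = "xri://";

    // Resolves e.g. "xri://swegrid.bank*account1" by walking the *-delimited
    // sub-segments left to right, each step starting at the previous result.
    static Descriptor resolve(Descriptor root, String xri) {
        String authority = xri.startsWith(PREFIX) ? xri.substring(PREFIX.length()) : xri;
        Descriptor current = root;
        for (String subSegment : authority.split("\\*")) {
            current = current.lookup(subSegment);
            if (current == null) {
                throw new IllegalArgumentException("cannot resolve sub-segment " + subSegment);
            }
        }
        return current;
    }
}
```

For xri://swegrid.bank*account1, the sketch first looks up swegrid.bank at the root and then account1 at the authority returned for swegrid.bank.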
The second version of the XRI standard specifications is currently being finalized, based on feedback
from early XRI adopters. Furthermore, an open-source, Java-based implementation is available
(current implementation status is alpha).
4.2 Handle system
The handle system, developed by CNRI [CNRI] and described in RFCs 3650, 3651 and 3652, is a
general-purpose system for assigning, managing, and resolving persistent identifiers, referred to as
handles. Through the use of resource handles, the system allows various kinds of information to be
associated with a resource. These resource handles are resolved into information about the associated
resource, including how to locate and access the resource.
The handle system is a global infrastructure administered by CNRI, which runs a global handle
registry (cf. the DNS root servers). All locally managed handle servers are required to register with the
global registry, which is responsible for redirecting handle resolution requests to their corresponding
authoritative name server. Having to register with a global handle server, which exposes handles
globally, may not always be desirable. In particular, it would be problematic during system
development and for test deployments of a system. Furthermore, the mandatory global registration
complicates the installation procedure, since it requires the handle server administrator to apply to
CNRI for a handle (by mail) and wait for a reply before being able to finish the handle server
installation. Moreover, CNRI charges a registration fee for each handle server.
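The redirection step can be sketched as follows, assuming a handle has the form prefix/local-name (e.g. 12.34/bank), as in RFC 3650. Both classes are hypothetical simplifications: the global registry only maps prefixes to local handle services, and the wire protocol of RFC 3652 is not modelled. The sketch also shows why registration is mandatory: a handle whose prefix is unknown to the global registry cannot be resolved at all.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local handle service: maps complete handles to values
// (reduced here to a single URL), as administered by the local site.
final class LocalHandleService {
    private final Map<String, String> values = new HashMap<>();

    void register(String handle, String url) {
        values.put(handle, url);
    }

    String resolve(String handle) {
        return values.get(handle);
    }
}

// Hypothetical global registry: only knows which local service is
// authoritative for each naming-authority prefix (cf. DNS root servers).
final class GlobalHandleRegistry {
    private final Map<String, LocalHandleService> prefixes = new HashMap<>();

    void registerPrefix(String prefix, LocalHandleService service) {
        prefixes.put(prefix, service);
    }

    // Splits "prefix/localName", redirects to the authoritative local
    // service for the prefix, and lets that service resolve the full handle.
    String resolve(String handle) {
        int slash = handle.indexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("not a handle: " + handle);
        }
        LocalHandleService service = prefixes.get(handle.substring(0, slash));
        if (service == null) {
            throw new IllegalArgumentException("prefix not registered globally: " + handle);
        }
        return service.resolve(handle);
    }
}
```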
The handle system is targeted for integration with the Globus Toolkit [GT4-HDL] to provide a general
means of publishing and finding metadata about different types of resources. However, the integration
progress so far has been slow, primarily due to a lack of funding and development resources.
The conclusion is that the handle system, in its current state, does not look like a promising alternative.
4.3 Resource Namespace Service (RNS)
The Resource Namespace Service (RNS) has been defined within the GGF Grid File System Working
Group (GFS-WG) [GFS-WG]. The RNS is a WSRF-compliant web service that allows addressable
entities to be registered in a hierarchical (file-system like) namespace. It constitutes a general naming
service with operations for managing, navigating through, and resolving names into EPRs. In addition
to RNS, the specification also includes the description of an independent, non-hierarchical name-to-
address resolution service, defined in a separate port type (RNSResolverPortType). This adjunct
service, focusing on management and resolution of simple logical names, seems more in line
with the needs of SGAS. The RNS specification has reached final draft proposal status. However, no
reference implementation appears to be available at the time of writing.
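The difference between the hierarchical namespace and the flat resolver can be illustrated with the hypothetical sketch below. Neither class reflects the actual RNS port types or operation names; endpoint references are reduced to plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified RNS-style namespace: directories hold named
// entries that are either sub-directories or endpoint addresses.
final class RnsDirectory {
    private final Map<String, RnsDirectory> dirs = new HashMap<>();
    private final Map<String, String> endpoints = new HashMap<>();

    RnsDirectory mkdir(String name) {
        return dirs.computeIfAbsent(name, n -> new RnsDirectory());
    }

    void add(String name, String epr) {
        endpoints.put(name, epr);
    }

    // Walks a path such as "/swegrid/bank/account1" one component at a time,
    // like a file system, and returns the address stored under the last name.
    String lookup(String path) {
        String[] parts = path.replaceFirst("^/", "").split("/");
        RnsDirectory dir = this;
        for (int i = 0; i < parts.length - 1; i++) {
            dir = dir.dirs.get(parts[i]);
            if (dir == null) {
                return null;
            }
        }
        return dir.endpoints.get(parts[parts.length - 1]);
    }
}

// Hypothetical flat resolver in the spirit of the adjunct resolver port type:
// one logical name maps directly to one address, with no navigation.
final class FlatResolver {
    private final Map<String, String> names = new HashMap<>();

    void bind(String logicalName, String epr) {
        names.put(logicalName, epr);
    }

    String resolve(String logicalName) {
        return names.get(logicalName);
    }
}
```

The flat variant needs a single lookup per account name, which is all the virtual bank requires.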
4.4 WS-Naming
The OGSA Naming Working Group (OGSA-Naming-WG) [Naming], which appears to have been formed
recently, investigates naming and name resolution for Web services. The goal of the working group is
to produce a WS-Naming specification that builds on WS-Addressing, and to work on two further
specifications: RNS (Resource Namespace Service, described above) and WSNR (Web Service Name
Resolution).
The WS-Naming specification defines a WS-Name construct that utilizes the extension points in WS-
Addressing to add elements for abstract names and resolvers to the endpoint reference construct. The
WS-Naming specification also declares a simple port type for a WS-Naming resolution service that
resolves abstract names into WS-Name endpoint references.
The following sample WS-Name is an endpoint reference that has been extended with an abstract name
URI and a resolver address:
  Address:        http://tempuri.org/example
  Abstract name:  sgas://swegrid.account1
  Resolver:       http://tempuri.org/resolver1
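A client could model such a WS-Name roughly as follows. This is a sketch under assumptions: WSName and ResolverClient are hypothetical types, addresses are plain URIs rather than full endpoint references, and the call to a resolution service is hidden behind an interface. It shows how the embedded abstract name and resolver addresses let a client obtain a fresh WS-Name when the address it holds has gone stale.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;

// Hypothetical model of a WS-Name: an endpoint address extended with an
// abstract name and the addresses of resolvers that can refresh it.
final class WSName {
    final URI address;
    final URI abstractName;
    final List<URI> resolvers;

    WSName(URI address, URI abstractName, List<URI> resolvers) {
        this.address = address;
        this.abstractName = abstractName;
        this.resolvers = List.copyOf(resolvers);
    }

    // Asks each advertised resolver in turn for a fresh WS-Name for the
    // same abstract name; empty if none of them can resolve it.
    Optional<WSName> reresolve(ResolverClient client) {
        for (URI resolver : resolvers) {
            WSName fresh = client.resolve(resolver, abstractName);
            if (fresh != null) {
                return Optional.of(fresh);
            }
        }
        return Optional.empty();
    }
}

// Hypothetical abstraction of a call to a WS-Naming resolution service.
interface ResolverClient {
    // Returns a fresh WS-Name, or null if this resolver cannot resolve the name.
    WSName resolve(URI resolverAddress, URI abstractName);
}
```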
5 Virtual Bank design
The virtual bank includes one key component – the bank authority, which serves as a naming and
resolution service that manages the set of bank branches belonging to the virtual bank and the
collection of name-to-account mappings. The bank authority effectively acts as the bank front-end and
is the single-point-of-trust to clients.
The design is based on the notion of WS-Names (as defined by WS-Naming, covered above) and is
flexible with respect to the naming scheme employed. That is, the bank is capable of using different
resolution back-ends, with different naming schemes and potentially different resolution protocols
(e.g. an HTTP(S)-based resolution protocol, as with XRIs, or a SOAP-based resolution protocol, as
with RNS). This approach enables future adoption of alternate naming schemes, e.g. in order to
leverage feature-rich implementations. Each account is assigned an abstract account
name, which is an arbitrary URI (such as sgas://account, xri://swegrid.bank*account,
rns://swegrid.bank/account1, hdl://12.34/bank). To contact an account, its abstract name first needs
to be resolved into a concrete EPR/WS-Name.
5.1 Model overview
The virtual bank model is illustrated in the figure below, which shows the administrator creating a new
branch account (sgas://account) that results in a mapping (abstract account name to physical address)
being added to the bank authority service. The figure also shows a client contacting the account by
resolving its abstract name.
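[Figure: The virtual bank model. The bank authority exposes a management (MGMT) and a lookup
(LOOKUP) interface and fronts several bank branches. Steps shown: 1. Create account (admin to bank
branch); 2. Register mapping (branch to bank authority); 4. Account member gets account id
(sgas://account); 5. Resolve: account EPR (client to bank authority); 6. Invoke (client to bank branch).]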
As can be seen from the figure, the bank authority provides two separate interfaces:
• A management interface, exposed to branch servers and administrators, that permits
registration of branches and accounts and reveals details regarding the physical organization
of the virtual bank (e.g. the set of branch servers and account locations).
• A resolution interface through which clients resolve abstract account names into their physical
locations in order to invoke the accounts.
The following sections outline the interactions that take place between the bank authority and the other
SGAS components as well as a more detailed description of the server- and client-side components of
the system.
5.1.1 Branch interactions
The bank branches interact with the bank authority in order to register account names. Name
registration can be performed at different points in time:
• At the time of account creation, the bank authority must be contacted to register the mapping
between the abstract account name and its physical address. The bank authority enforces
account name uniqueness across the set of branch servers. Registration can be carried out by a
bind operation that fails if a mapping for the account name already exists. The following
figure illustrates what an account registration may look like:
• On restart of a branch server, its set of accounts needs to be re-registered with the bank
authority, as the branch may have been relocated to a different network address. This
operation, which we may call rebind, is similar to the bind operation but uses slightly
different semantics: to account for branch server relocations, any pre-existing account
mappings are overwritten.
The bank authority implements a soft-state registration protocol (more on this below). Hence,
the bank branches will need to report to the bank authority periodically to prevent their
mappings from being removed from the registry.
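The difference between bind and rebind can be sketched with a concurrent map in the bank authority. AccountNameRegistry is a hypothetical class, not an existing SGAS component; putIfAbsent makes the uniqueness check and the insertion a single atomic step, so two branches binding the same name concurrently cannot both succeed.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical name registry in the bank authority: bind() enforces account
// name uniqueness across all branches, rebind() overwrites.
final class AccountNameRegistry {
    private final ConcurrentMap<URI, URI> bindings = new ConcurrentHashMap<>();

    // Used at account creation: fails if the name is already taken,
    // whichever branch holds the existing mapping.
    void bind(URI accountName, URI physicalAddress) {
        URI existing = bindings.putIfAbsent(accountName, physicalAddress);
        if (existing != null) {
            throw new IllegalStateException(accountName + " is already bound to " + existing);
        }
    }

    // Used when a (possibly relocated) branch restarts: any pre-existing
    // mapping is replaced by the branch's current address.
    void rebind(URI accountName, URI physicalAddress) {
        bindings.put(accountName, physicalAddress);
    }

    URI lookup(URI accountName) {
        return bindings.get(accountName);
    }
}
```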
5.1.2 JARM interactions
The logical bank presented to users in the virtual bank approach allows a user to specify the abstract
name of an account in the job description. The JARM that intercepts the job request on the resource
resolves the abstract account name according to the resolution protocol of the naming scheme in use.
The following sequence diagram illustrates what a job submission using abstract account names would
look like:
Note, in particular, that the returned WS-Name may include the identity of the branch server, which
allows the client side (JARM) to dynamically establish trust with new branch servers, without
requiring resource owner intervention. This model assumes that clients trust the bank authority and,
consequently, all branches "recommended" by it.
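The trust rule can be stated compactly in code. The sketch below is hypothetical: Resolution and TrustedTargets are illustrative classes, identities are plain subject strings, and it assumes that the origin of each resolution result (issuedBy) has already been authenticated, e.g. through a signed result or a secure channel to the bank authority, which the sketch itself does not verify.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical result of a trusted name resolution.
final class Resolution {
    final String address;       // physical address of the account
    final String serverSubject; // branch server identity, or null if not supplied
    final String issuedBy;      // identity of the authority that produced the result

    Resolution(String address, String serverSubject, String issuedBy) {
        this.address = address;
        this.serverSubject = serverSubject;
        this.issuedBy = issuedBy;
    }
}

// Hypothetical trust list kept by the JARM for one message exchange.
final class TrustedTargets {
    private final Set<String> trustedAuthorities;
    private final Set<String> targets = new HashSet<>();

    TrustedTargets(Set<String> trustedAuthorities) {
        this.trustedAuthorities = Set.copyOf(trustedAuthorities);
    }

    // A branch identity is accepted only when a bank authority that the
    // resource owner already trusts has vouched for it.
    void accept(Resolution r) {
        if (r.serverSubject != null && trustedAuthorities.contains(r.issuedBy)) {
            targets.add(r.serverSubject);
        }
    }

    boolean isTrusted(String subject) {
        return targets.contains(subject);
    }
}
```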
In case no account is specified in the job request, the JARM must search the bank for a default account
to fall back on. The search needs to be carried out across all branch servers. Since the branch searches
can be carried out in parallel, no notable performance loss should be incurred. In fact, if accounts are
evenly distributed across the branches, this approach should be faster than the single-bank approach,
since each branch only searches its own share of the accounts. Note that the resolution process and the
account search procedure are essentially one-time operations for each account, as the system can make
quite aggressive use of caching.
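A parallel default-account search might be sketched with CompletableFuture as below. The searchBranch function is a hypothetical stand-in for the remote search call against one branch; the sketch returns the first hit in branch order and otherwise an empty result.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class DefaultAccountSearch {
    // 'searchBranch' is a hypothetical stand-in for a remote "find the user's
    // default account" call against a single branch server.
    static Optional<String> search(List<String> branches,
                                   Function<String, Optional<String>> searchBranch) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, branches.size()));
        try {
            // Start one search per branch so that all branches work in parallel.
            List<CompletableFuture<Optional<String>>> futures = branches.stream()
                    .map(b -> CompletableFuture.supplyAsync(() -> searchBranch.apply(b), pool))
                    .collect(Collectors.toList());
            // Inspect results in branch order; each join() waits only for that
            // branch, so the total wait is bounded by the slowest branch rather
            // than by the sum of all branch search times.
            for (CompletableFuture<Optional<String>> f : futures) {
                Optional<String> hit = f.join();
                if (hit.isPresent()) {
                    return hit;
                }
            }
            return Optional.empty();
        } finally {
            pool.shutdown();
        }
    }
}
```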
5.1.3 Admin interactions
In contrast to the logical view presented to regular clients, the physical bank distribution needs to be
made explicit to the administrator, e.g. to examine individual branches or spread accounts evenly
among available branches. The bank authority serves as the entry-point to administer the entire bank.
From the bank authority, the set of branch servers is acquired, and then individual requests are directed
to each individual branch. The admin tool requires additional commands to deal with the distributed
nature of the bank, e.g. a branch list|info command, to learn the physical account distribution. Other
commands need to be modified to (optionally) apply to all branches (e.g. account list, bank-admin
add|remove). Furthermore, commands for correcting naming inconsistencies may be necessary, although,
given the soft-state registration protocol (covered below), such commands may turn out to be
superfluous.
5.2 Server-side components
The bank authority acts as the front-end of the virtual bank, hiding the physical account distribution
from clients and being the “single-point-of-trust” in the system. The bank authority functionality can
be divided into two separate interfaces: 1) a WS-based management interface and 2) a customizable
name resolution interface, allowing different resolution engines (potentially using different resolution
protocols) to be plugged in. Although these two services are likely to be co-located, nothing prevents
them from being located on separate hosts. Furthermore, in a small-scale Grid environment that
requires only a single branch server, the bank authority may even be co-located with the branch.
The management interface must provide operations for the following functionality:
• Keeping track of the set of active branch servers.
• Allowing (re)binding of account names, while enforcing a name uniqueness constraint across the
branch servers.
For the resolution interface, it must be possible to perform trusted account name resolution. To allow
for automatic trust establishment with dynamically added branch servers, the bank authority should
supply the server identity with each successful name resolution. During resolution, the bank authority
may also supply additional (replica) resolution services, which can be used by clients to discover new
bindings for a relocated account in case a connection cannot be established with the original bank
authority.
For automatic recovery from naming inconsistencies, a soft-state/lease-based binding protocol should
be employed. In order to improve overall scalability, the bank authority should provide batch bind
operations. Furthermore, in addition to allowing lease renewal of individual bindings, a per-branch
lease renewal should be provided, effectively extending all bindings pertaining to a particular branch.
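The lease-based protocol with batch binding and per-branch renewal can be sketched as follows. LeasedBindings is hypothetical; the current time is passed in explicitly (rather than read from a clock) to keep the example deterministic, and a real bank authority would presumably call expire periodically.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical soft-state binding table for the bank authority: every
// binding holds a lease and disappears unless it is renewed in time.
final class LeasedBindings {
    private static final class Binding {
        final URI address;
        final String branch;
        long expiresAt;

        Binding(URI address, String branch, long expiresAt) {
            this.address = address;
            this.branch = branch;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<URI, Binding> bindings = new HashMap<>();
    private final long leaseMillis;

    LeasedBindings(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Batch (re)bind: registers all accounts of one branch in a single call.
    synchronized void bindAll(String branch, Map<URI, URI> accounts, long now) {
        accounts.forEach((name, address) ->
                bindings.put(name, new Binding(address, branch, now + leaseMillis)));
    }

    // Renews a single binding; false if no binding exists (e.g. already expired).
    synchronized boolean renew(URI name, long now) {
        Binding b = bindings.get(name);
        if (b == null) {
            return false;
        }
        b.expiresAt = now + leaseMillis;
        return true;
    }

    // Per-branch renewal: extends every binding owned by the branch at once.
    synchronized int renewBranch(String branch, long now) {
        int renewed = 0;
        for (Binding b : bindings.values()) {
            if (b.branch.equals(branch)) {
                b.expiresAt = now + leaseMillis;
                renewed++;
            }
        }
        return renewed;
    }

    // Drops bindings whose lease has run out; returns how many were removed.
    synchronized int expire(long now) {
        int before = bindings.size();
        bindings.values().removeIf(b -> b.expiresAt <= now);
        return before - bindings.size();
    }

    synchronized URI lookup(URI name) {
        Binding b = bindings.get(name);
        return b == null ? null : b.address;
    }
}
```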
It is of utmost importance to prevent the bank authority from becoming the system's performance
bottleneck and/or single point of failure. At least two approaches can be combined to address this
problem:
• Extensive use of resolution result caching. Aggressive caching of resolution results in bank
clients would significantly offload the bank authority, especially considering the low update
rate of the bank (it is reasonable to assume that neither account creations nor server relocations
will occur frequently). Besides offloading the authority, thereby improving overall system
scalability, caching may also improve fault tolerance by allowing cached resolution
results to be used during a bank authority outage.
• Replication strategies. A set of redundant resolution services could be set up to improve
availability. Clients are made aware of these replica resolvers through resolver elements
provided in the resolved WS-Names. Although maintaining replica consistency is challenging,
the infrequent updates at the bank authority simplify the problem. Performance
improvements could also be achieved if resolution requests are spread evenly among replicas.
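Caching combined with replica resolvers might look like the following client-side sketch. CachingResolver is hypothetical; each replica is modelled as a function that returns an address, returns null for an unknown name, or throws when it is unreachable. A fresh cache entry avoids any remote call, and a stale entry is still used if every replica is down, which is the fault-tolerance benefit described above.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side cache in front of one or more replica resolvers.
final class CachingResolver {
    private static final class Entry {
        final URI address;
        final long fetchedAt;

        Entry(URI address, long fetchedAt) {
            this.address = address;
            this.fetchedAt = fetchedAt;
        }
    }

    private final List<Function<URI, URI>> replicas;
    private final long ttlMillis;
    private final Map<URI, Entry> cache = new HashMap<>();

    CachingResolver(List<Function<URI, URI>> replicas, long ttlMillis) {
        this.replicas = List.copyOf(replicas);
        this.ttlMillis = ttlMillis;
    }

    synchronized URI resolve(URI name, long now) {
        Entry cached = cache.get(name);
        if (cached != null && now - cached.fetchedAt < ttlMillis) {
            return cached.address; // fresh hit: no remote call at all
        }
        // Try replicas in order; the first one that returns an address wins.
        for (Function<URI, URI> replica : replicas) {
            try {
                URI address = replica.apply(name);
                if (address != null) {
                    cache.put(name, new Entry(address, now));
                    return address;
                }
            } catch (RuntimeException unreachable) {
                // fall through to the next replica
            }
        }
        // Every replica failed: fall back on a stale entry if there is one.
        if (cached != null) {
            return cached.address;
        }
        throw new IllegalStateException("cannot resolve " + name);
    }
}
```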
5.3 Client-side components
The client side needs to install a request flow resolution handler, responsible for resolving all abstract
account URIs (e.g. sgas://account1) to their corresponding physical addresses. To allow for different
resolution schemes, the resolution handler should allow registration of different resolvers, each
handling resolution for a particular naming scheme (such as rns://, xri://, sgas://, etc.). The following
figure shows a schematic view of the resolution handler:
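[Figure: Schematic view of the resolution handler. The resolution handler passes the target URI to the
resolution EPR mapper, which selects a resolver by scheme: XRIResolver for xri://*, RNSResolver for
rns://*, and a default entry. The selected resolver resolves the abstract URI (if found), adds the server
identity (if available) to the trusted targets, and adds the resolver object to the message context. The
transport handler then invokes the target; on failure, it re-resolves the name (bypassing the cache) and
tries again.]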
The following Java interface describes what a client-side resolver looks like:
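    import java.net.URI;
    // EPR: the client stack's WS-Addressing endpoint reference type.
    public interface Resolver {
        // Resolves an abstract name, possibly from a cached result.
        EPR resolve(URI abstractName);
        // Re-resolves a name whose previous EPR (failedEPR) could not be reached.
        EPR resolve(URI abstractName, EPR failedEPR);
    }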
Whenever a client request message is handled, the resolution handler extracts the target address URI
and passes it to the resolution mapper. The resolution mapper examines the URI scheme and passes
the URI to its corresponding resolver, if one has been configured. The resolver takes the following
actions:
• It resolves the abstract name according to the resolution protocol, perhaps by using a cached
resolution result.
• It adds the server subject (if supplied in the resolution result) to the list of trusted targets
associated with this message exchange.
• It adds itself to the message context, so that a new resolution can be made if needed.
Finally, the abstract account name in the SOAP message is replaced by the resolved (physical) address.
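The handler steps above can be sketched as follows. SchemeResolver, Resolved, RequestContext and ResolutionMapper are hypothetical simplifications: they are not the Axis or Globus handler APIs, and SchemeResolver is deliberately reduced to a single method returning an address plus an optional server subject, unlike the Resolver interface above.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified resolver for one naming scheme.
interface SchemeResolver {
    Resolved resolve(URI abstractName);
}

// Hypothetical resolution result: physical address plus optional server subject.
final class Resolved {
    final URI address;
    final String serverSubject; // null if the naming scheme does not supply one

    Resolved(URI address, String serverSubject) {
        this.address = address;
        this.serverSubject = serverSubject;
    }
}

// Hypothetical per-request state built up by the resolution handler.
final class RequestContext {
    URI target;               // abstract name before handling, physical address after
    URI abstractName;         // original name, kept for a possible re-resolution
    SchemeResolver resolver;  // resolver that produced the current target
    final Set<String> trustedTargets = new HashSet<>();
}

// Hypothetical resolution mapper that dispatches on the URI scheme.
final class ResolutionMapper {
    private final Map<String, SchemeResolver> resolvers = new HashMap<>();

    void register(String scheme, SchemeResolver resolver) {
        resolvers.put(scheme, resolver);
    }

    void handle(RequestContext ctx) {
        SchemeResolver resolver = resolvers.get(ctx.target.getScheme());
        if (resolver == null) {
            return; // no resolver configured: treat the target as a physical address
        }
        Resolved r = resolver.resolve(ctx.target);
        if (r.serverSubject != null) {
            ctx.trustedTargets.add(r.serverSubject); // trust the branch for this exchange
        }
        ctx.abstractName = ctx.target;
        ctx.resolver = resolver; // lets the transport handler re-resolve later
        ctx.target = r.address;  // abstract name replaced by the physical address
    }
}
```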
Eventually, the message will reach the transport handler, which is responsible for transmitting the
message. If the invocation fails (e.g. because the branch server has been relocated), the transport
handler may try to resolve the account again. In this case, it may use the second resolve method above
to indicate that it needs an EPR different from the one it just tried. The EPR serves as a hint to the
Resolver that it may need to refresh its cache.
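The retry logic in the transport handler, together with a cache that honours the failed EPR as a hint, might be sketched as follows. EPRs are simplified to URIs, HintedResolver mirrors the two methods of the Resolver interface above, and the send function is a hypothetical stand-in for the actual SOAP invocation.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Mirrors the two resolve methods of the Resolver interface, with EPRs
// simplified to plain URIs for this sketch.
interface HintedResolver {
    URI resolve(URI abstractName);
    URI resolve(URI abstractName, URI failed);
}

// Hypothetical caching resolver that treats the failed address as a hint.
final class CachedHintedResolver implements HintedResolver {
    private final Map<URI, URI> cache = new HashMap<>();
    private final Function<URI, URI> remote; // stands in for the bank authority

    CachedHintedResolver(Function<URI, URI> remote) {
        this.remote = remote;
    }

    @Override
    public synchronized URI resolve(URI abstractName) {
        return cache.computeIfAbsent(abstractName, remote);
    }

    @Override
    public synchronized URI resolve(URI abstractName, URI failed) {
        // Drop the cached entry only if it is the address that just failed;
        // otherwise it has already been refreshed and can be reused.
        if (failed.equals(cache.get(abstractName))) {
            cache.remove(abstractName);
        }
        return resolve(abstractName);
    }
}

final class RetryingTransport {
    // 'send' stands in for the SOAP invocation and throws if the target
    // cannot be reached (e.g. because the branch has been relocated).
    static String invoke(URI abstractName, HintedResolver resolver, Function<URI, String> send) {
        URI target = resolver.resolve(abstractName);
        try {
            return send.apply(target);
        } catch (RuntimeException firstFailure) {
            URI fresh = resolver.resolve(abstractName, target);
            if (fresh == null || fresh.equals(target)) {
                throw firstFailure; // nothing new to try
            }
            return send.apply(fresh); // single retry against the new address
        }
    }
}
```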
6 Future extensions/improvements
This section contains some (more or less far-fetched) topics that are subject to further investigation.
• An account redistribution mechanism that migrates accounts from heavily loaded servers to
branches with spare capacity. Different mechanisms are conceivable, such as 1) a manual
mechanism, where the administrator chooses which accounts to migrate and where, 2) a
(semi-)automatic mechanism, where the administrator triggers an account redistribution that is then
handled automatically, and 3) a fully automatic mechanism, where the system itself triggers account
migrations. This would involve investigating triggering mechanisms as well as migration
strategies. Migration should not necessarily aim at balancing the number of accounts on each
server, since the load depends on the "activity" of individual accounts: a small set of
hot-spot accounts may account for much of the load on each branch, and hence the goal should
be to spread such hot-spots across the system. Metrics for measuring load (im)balance,
strategies for migration, and atomic account migrations are some of the areas that need to be
investigated. Such a mechanism could presumably be extended to apply in more general
settings.
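As one conceivable migration strategy (not a design decision of this document), accounts could be placed greedily by activity: the most active accounts first, each on the currently least-loaded branch, i.e. the classic longest-processing-time heuristic. The sketch below assumes a hypothetical per-account activity figure (e.g. requests per hour) and also computes a simple imbalance metric, the busiest branch's load divided by the mean load.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class HotSpotBalancer {
    // Accumulated activity of one branch during planning.
    private static final class Load {
        final String branch;
        long total;

        Load(String branch) {
            this.branch = branch;
        }
    }

    // Greedy longest-processing-time placement: accounts are taken in order of
    // decreasing activity and each goes to the currently least-loaded branch
    // (ties broken by name for determinism).
    static Map<String, String> plan(Map<String, Long> activity, List<String> branches) {
        if (branches.isEmpty()) {
            throw new IllegalArgumentException("no branches");
        }
        PriorityQueue<Load> queue = new PriorityQueue<>(
                Comparator.comparingLong((Load l) -> l.total).thenComparing(l -> l.branch));
        for (String branch : branches) {
            queue.add(new Load(branch));
        }

        List<Map.Entry<String, Long>> accounts = new ArrayList<>(activity.entrySet());
        accounts.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));

        Map<String, String> placement = new HashMap<>();
        for (Map.Entry<String, Long> account : accounts) {
            Load least = queue.poll(); // remove before updating its priority
            placement.put(account.getKey(), least.branch);
            least.total += account.getValue();
            queue.add(least);
        }
        return placement;
    }

    // Imbalance metric: busiest branch load divided by mean branch load
    // (1.0 means perfectly balanced).
    static double imbalance(Map<String, Long> activity, Map<String, String> placement,
                            List<String> branches) {
        Map<String, Long> totals = new HashMap<>();
        for (String branch : branches) {
            totals.put(branch, 0L);
        }
        placement.forEach((account, branch) -> totals.merge(branch, activity.get(account), Long::sum));
        long max = totals.values().stream().mapToLong(Long::longValue).max().orElse(0L);
        double mean = totals.values().stream().mapToLong(Long::longValue).average().orElse(0.0);
        return mean == 0.0 ? 1.0 : max / mean;
    }
}
```

For instance, with activities a=90, b=50, c=40, d=10 and e=10 on two branches, the sketch places a and d on branch1 and b, c and e on branch2, so both branches carry a load of 100 even though they hold different numbers of accounts.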
7 References
[XRI] OASIS Extensible Resource Identifier (XRI) TC
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xri
[CNRI] The Handle System
http://www.handle.net/
[GFS-WG] Grid File System Working Group
https://forge.gridforum.org/projects/gfs-wg
[Naming] OGSA Naming Working Group
https://forge.gridforum.org/projects/ogsa-naming-wg
[GT4-HDL] Handle System - Globus Toolkit Integration Project
http://www-unix.globus.org/toolkit/projects/handle_system.html