A Perspective on Scientiﬁc Cloud Computing
Craig A. Lee
Open Grid Forum, www.ogf.org
Computer Systems Research Department
The Aerospace Corporation
ABSTRACT
Cloud computing has the potential for tremendous benefits, but wide-scale adoption faces a range of challenges that must be met. We review these challenges and how they relate to scientific computing. To achieve the portability, interoperability, and economies of scale that clouds offer, it is clear that common design principles must be widely adopted in both the user community and the marketplace. To this end, we argue that a private-to-public cloud deployment trajectory will be very common, if not dominant. This trajectory can be used to define a progression of needed common practices and standards which, in turn, can be used to define deployment, development, and fundamental research agendas. We then survey the cloud standards landscape and how the standards process could be driven by major stakeholders, e.g., large user groups, vendors, and governments, to achieve scientific and national objectives. We conclude with a call to action for stakeholders to actively engage in driving this process to a successful conclusion.

Categories and Subject Descriptors
H.1 [Models and Principles]: General

General Terms
Standardization

Keywords
Cloud computing, deployment trajectory, standardization

1. INTRODUCTION
Cloud computing is enjoying a tremendous level of interest across virtually all areas of computational use, including the scientific computing community [13, 17, 18]. While there are certainly many issues to resolve and pitfalls to avoid, the argument can be made that cloud computing may have a similar impact as that of cluster computing. The use of commodity processors and networks to build cluster computers fundamentally changed not only the composition of the Top500 List, but also the economic availability of supercomputing across all fields. The on-demand nature and economies of scale of cloud computing may do the same thing for science.

When cluster computing was gaining popularity, many people argued that the relatively slower commodity networks would hamper the performance of parallel codes, relative to that of parallel supercomputers with dedicated (and expensive) interconnects. While this may have been true for tightly coupled codes that were very bandwidth- and latency-sensitive, for many codes the performance was quite adequate, with an undeniable cost-effectiveness. Whatever performance issues may exist for cloud computing, there will be many scientific endeavors where the on-demand nature and cost-effectiveness will far outweigh any performance degradation. Across the scientific computing landscape, there is clearly a distribution of requirements for the available compute power and data access necessary to facilitate progress. While there will always be a need for massive, parallel machines, there will also be a need for reasonable amounts of compute power, on-demand, at a reasonable cost.

While such performance-related arguments are very important to the scientific computing community, it is also important to understand that the science cloud concept is happening in the context of a much larger phase change in distributed computing environments. It is the mission of the Open Grid Forum (OGF) to understand, and to the best extent possible, manage this phase change on behalf of its members, stakeholders, and the wider distributed computing community. It is widely held that common best practices and standards will be needed to realize many of the benefits being touted for cloud computing. To that end, this paper argues for a perspective on how this phase change will occur and for strategic efforts that could be taken to maximize its benefits.

2. A DISCUSSION OF CLOUDS
We begin by briefly discussing the expected benefits of cloud computing, in general, followed by outstanding issues and challenges. We also give a key example of the larger motivations to address the challenges and realize the benefits.

2.1 Benefits
The anticipated benefits of cloud computing can be broadly categorized into infrastructure-oriented benefits and user-oriented benefits. User-oriented benefits are those that an individual user would be able to realize, while infrastructure-oriented benefits are those that an infrastructure provider or data center operator would be able to realize across a large, aggregate set of users. Infrastructure-oriented benefits include:

• Improved server utilization. The use of virtual machines and virtual machine images provides flexibility in mapping work to physical servers, thereby allowing higher utilization to be achieved.

• Improved reliability. Likewise, the use of virtual machines can facilitate fail-over between physical servers.

• Greener IT. Energy consumption and costs can be reduced through improved utilization and by moving work to where cheaper energy is available.

• Clear business models. By providing resources through a simplified API that abstracts away many infrastructure details, clear consumer-provider business models are possible.

User-oriented benefits include:

• Commodification of compute resources. The commodification of any product means that it is no longer a specialized resource that must be uniquely designed, installed, and maintained. It can be bought, sold, and replaced as needed, without costly re-engineering, etc.

• Managing surge requirements with on-demand resources. Since commodification allows resources to be acquired and released on-demand, users can more easily manage expected and unexpected surges in compute requirements.

• Ease of deployment. The use of virtual machine images can also ease the deployment of applications, since the machine image may contain the exact OS, libraries, patches, and application code necessary to execute.

• Virtual ownership of resources. Rather than having to deal with a shared resource, and the access contention that can go with it, users enjoy the ownership of a resource that is available at their beck and call, even if that ownership is virtual, as in a computing cloud.

We note that these benefits will become more significant when operating "at scale", especially for cloud providers. For cloud consumers, if only a small number of processes or servers are required for a given application, then existing in-house resources may be available and sufficient. However, as the number of required servers and storage increases, and surge requirements become more unpredictable, the on-demand nature of commodity resources becomes more attractive. For cloud providers, the benefits realized must leverage the economies of scale made possible by providing cloud resources out of massive data centers.

2.2 Issues
While these expected benefits are driving much of the interest in cloud computing, there are nonetheless a range of significant issues that both providers and consumers must address. These include:

• Security is widely stated as the primary issue facing cloud computing. A cloud-specific security issue is that of running arbitrary VM images. This is, however, only one aspect of the broader notion of Information Assurance, i.e., making sure the right data is available to the right user at the right time. In addition to the fundamental security operations of authentication, authorization, privacy, integrity, and non-repudiation, information assurance also involves data reliability and availability. The Cloud Security Alliance is directly pursuing many of these issues.

• Deployment models. The typical cloud deployment models, i.e., public, private, hybrid, and federated, strongly affect the security and information assurance issues of a given cloud. More on this later.

• Service level agreements. While the simplified API of current commercial cloud offerings is critical for providing a lower barrier of adoption and supporting clear business models, it complicates the notion of user control. User applications may have very specific performance or behavior requirements, in addition to regulatory policies that must be observed. To avoid exposing unnecessary infrastructure detail through the APIs, cloud providers must provide the right abstractions through service level agreements for effectively specifying the desired behavior or policy.

• Governance. Strongly related to the notion of service level agreements and policy is that of governance: how to manage sets of virtual resources. Especially at the infrastructure level, applications may consist of many virtual machines, virtualized storage, and virtual networks. Managing these virtual missions, or virtual data centers, will require policy and enforcement from both the provider and consumer.

• Cost is an outstanding issue, especially for public and hybrid cloud use. Depending on an application's compute, storage, and communication requirements, public cloud resources could be more or less expensive than hosting the application in-house. Even if it is cheaper to host an application in-house, if the application has variable surge requirements, there will be some break-even point where it makes sense to cloudburst into a public cloud. Quantitatively evaluating such break-even points will require costing models that adequately capture an application's dynamic resource requirements.

2.3 A Key Example
While industry is vigorously pursuing cloud computing, both from the provider and consumer sides, for the reasons cited above, the key example we wish to give here is the US federal IT budget. As illustrated in Figure 1, the FY 2010 US federal IT budget is $79B, of which ∼70% will be spent on maintenance. The US federal CIO has publicly declared that cloud computing will be adopted to reduce this budget by eliminating redundant IT capacity across federal agencies. To this end, the web sites data.gov and apps.gov have been stood up whereby government data can be made more accessible, and government agencies can shop for software and also acquire cloud resources. The Eucalyptus-based Nebula Cloud at NASA Ames is to be the first cloud back-end for apps.gov to service federal computing requirements on-demand.

Figure 1: The US Federal IT Budget.

3. A DISCUSSION OF SCIENCE CLOUDS
Commercially available public clouds have been designed to satisfy general computing requirements, e.g., e-commerce and transactional communications, that are typically less sensitive to bandwidth and latency. As clouds become more mature, however, it is anticipated that clouds of different "flavors" will be deployed to meet the requirements of different user communities, e.g., scientific computing. Hence, while all of the potential benefits and issues of general cloud computing are relevant to the scientific computing community, the notion of science clouds will force an emphasis on specific benefits and issues.

• Identity and Federation management. Currently available public clouds have relatively simple authentication mechanisms that are suitable for basic client-provider interaction. The established grid community, however, has extensive experience in identity management and federation management. That is to say, the scientific enterprise may involve the federation of distributed resources, the management of Virtual Organizations, and the management of user roles and authorizations within a given VO. Given the wide industrial interest in cloud computing, it is anticipated that business-to-business interactions will involve, and eventually adopt, similar identity and federation management concepts and implementations.

• Virtual ownership of resources. This represents potentially the largest beneficial change for science clouds. The scientific computing community is very familiar with batch schedulers, but nonetheless, the illusion of having "your own" resources or set of nodes is very attractive. In much the same way that private car ownership offers benefits (and trade-offs) with respect to public transportation, virtual ownership of cloud resources will reduce uncertainty concerning access to those resources when you need to use them.

• Ease of deployment. In traditional grid environments, where specific machine resources are directly exposed and available to users, the deployment of applications depends on explicitly managing the compatibility among binaries, operating systems, libraries, etc. The use of virtual machine images offers the ability to package the exact OS, libraries, patches, and application codes together for deployment. This doesn't make the configuration management problem completely trivial, but it does reduce the problem to guest OS-host OS compatibility. For scientific codes, however, this can have direct implications for numerical stability across different virtual platforms.

• Performance Management: Abstraction vs. Control. While scientific computing can be broadly categorized into high performance computing and high throughput computing, which have very different workload characteristics, it is clear that scientific computing will have significant issues concerning abstraction versus control. There is a fundamental trade-off between the simplicity that abstraction can provide and the ability to control application behavior by having visibility and control over the underlying resource infrastructure. Hence, the satisfaction of performance management issues for scientific applications will depend on what minimal abstractions can be exposed to users that enable the desired performance behaviors to be adequately controlled. Clearly it may be possible to expose such minimal abstractions through the use of service-level agreements whereby the necessary coupling of resources can be specified, e.g., tightly coupled computing clusters, compute-data locality, bandwidth and latency requirements, etc. This is a major outstanding issue for the deployment and use of effective science clouds.

• Data Access and Interoperability continues to be an outstanding issue for all inherently distributed applications and federated organizations. While this is not specific to cloud computing environments, the use of dynamically provisioned resources will underscore the need to easily integrate disparate data sources and repositories. In addition to supporting things like high-throughput computing, effective data access and interoperability will enable science that can't be done any other way. As an example, access to oceanic databases from different geographic regions of North America has enabled fundamental insights that were otherwise not possible.

• Execution Models, Frameworks and SaaS. While dynamically provisioned resources at the infrastructure level have many advantages, as previously mentioned, their effective management and utilization by the end-user will present challenges. Various execution models can be identified that provide useful abstractions and make scientific infrastructure resources easier to use. Map-Reduce is one popular tool for partitioning the processing that must be done across massive data stores. Data streaming applications, such as signal processing, could also be similarly supported. Parameter sweep applications have existing toolkits that could be adapted for cloud environments. All of these execution models could be supported by frameworks that make them easier to use. From the cloud perspective, this could be tantamount to providing Software as a Service. For scientific purposes, the SaaS concept can even be extended to the notion of Models as a Service (MaaS), whereby semantic annotations and ontologies are used to compose computational models and execute them as a conceptual whole. Clearly a distinguishing property of science clouds may be the availability of such high-level abstractions that can effectively use the infrastructure resources to provide top-to-bottom support for scientific computing goals.
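To make the execution-model discussion concrete, the following is a toy, single-machine sketch of the Map-Reduce pattern mentioned above; the station records and the counting task are invented for illustration, and a real framework would distribute both phases across many nodes and massive data stores.

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (station, 1) pair for every observation record."""
    for station, _reading in records:
        yield station, 1

def reduce_phase(pairs):
    """Reduce: group the emitted pairs by key and sum their values."""
    counts = defaultdict(int)
    for station, value in pairs:
        counts[station] += value
    return dict(counts)

# Hypothetical sensor observations: (station id, reading).
records = [("buoy-42", 17.1), ("buoy-7", 16.4), ("buoy-42", 17.3)]
print(reduce_phase(map_phase(records)))  # observation counts per station
```

The same map/reduce decomposition is what lets a framework partition the map phase across the nodes holding the data and merge the per-node reductions afterward.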
3.3 A Key Example
While the economics of on-demand resources will drive and support a large segment of scientific users, the key example we wish to relate here is that of operational hurricane forecasting. Figure 2 from Bogden et al. compares several hurricane track forecasting models applied to Hurricane Katrina in 2005, from one to five days in advance. The black line indicates the actual path. At five and four days out, there is essentially no agreement among these models. At three days out, the models are converging, but to an incorrect track. Finally, at two days and one day out, the models tend to agree with ground truth.

Figure 2: Predicting Hurricane Katrina.
The effective prediction of hurricane tracks, in addition to disaster response and mitigation, actually represents a scientific and operational grand challenge problem. This will require a fundamentally enhanced understanding of how the atmospheric and oceanic systems work, in addition to developing the computational models that accurately represent the physical systems. When a tropical depression becomes a hurricane, the results of these tracking models will have to be fed into precipitation models to determine where water will be deposited, which will have to be fed into flooding models to determine where lives and property will be at risk. To accomplish this will require an enormous amount of compute power that is just not economically possible to dedicate to this single purpose. Hence, shared resources will have to be used, but they must also be available on-demand, possibly from a national science cloud that can support coupled, HPC codes with strict processing deadlines.

4. THE CLOUD STANDARDS LANDSCAPE
It is clear that common best practices and standards will be needed to achieve the fundamental properties of portability and interoperability for cloud applications and environments. Portability will mean more than simply being able to run to completion without fatal errors; it will mean being able to preserve critical properties, such as performance, numerical stability, and monitoring. Interoperability will mean more than avoiding vendor lock-in; it will mean being able to avoid "cloud silos" that are non-interoperable because they are built on different APIs, protocols, and software stacks.

Beyond the issues of portability and interoperability from a user's perspective, we can also consider the notions of virtual organizations, virtual missions, or virtual data centers. As mentioned above, VOs address the issue of managing user roles and authorizations within a collaboration of organizations for a specific goal or purpose. VOs or enterprise-scale applications may require the deployment of many servers and functions that must be managed as a whole. That is to say, large applications may be deployed as sets of virtual machines, virtual storage, and virtual networks to support different functional components. Another perspective is that such large applications may be deployed as virtual missions that constitute their own virtual data centers allocated out of larger, physical data centers, i.e., clouds.

Such notions will be important to all large cloud providers and consumers. As a case in point, several national cloud initiatives have been announced. The US Cloud Storefront, the UK G-Cloud, and the Japanese Kasumigaseki cloud initiatives will be major stakeholders in cloud standards since they will ultimately involve large cloud deployments. While these national cloud initiatives will support many routine IT business functions to reduce redundancy and improve utilization across government agencies, it is quite possible that these national clouds will support a range of application types and requirements. That is to say, while science clouds may or may not be deployed as part of such national clouds, they could nonetheless benefit from the ability of national cloud initiatives to drive the development of relevant best practices and standards.

To summarize, science clouds and national clouds may share the following similar requirements:

• Applications be transferable out of and back into the same cloud, and between different clouds, and still retain desired properties, such as performance, numerical stability, monitoring for fault detection/resolution, etc.

• A registry or brokerage be available for the discovery of available resources and service level agreements for their use.

• Support for virtual organizations across different science clouds and national clouds.

• Support for virtual missions and virtual data centers allocated out of different science clouds and national clouds.

An outstanding issue that must be resolved is how to prioritize these requirements and structure their development for the benefit of national scientific goals.
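The registry/brokerage requirement listed above can be sketched as a simple matching service: providers advertise resource offers with SLA terms, and a consumer discovers the offers that satisfy its constraints. All provider names, fields, and numbers below are hypothetical, invented purely for illustration.

```python
# Hypothetical resource offers that providers might advertise to a registry.
registry = [
    {"provider": "science-cloud-a", "cores": 256,
     "uptime_pct": 99.9, "price_core_hour": 0.04},
    {"provider": "national-cloud-b", "cores": 4096,
     "uptime_pct": 99.5, "price_core_hour": 0.03},
]

def discover(registry, min_cores, min_uptime_pct):
    """Return offers meeting the consumer's minimum SLA terms, cheapest first."""
    matches = [offer for offer in registry
               if offer["cores"] >= min_cores
               and offer["uptime_pct"] >= min_uptime_pct]
    return sorted(matches, key=lambda offer: offer["price_core_hour"])

# A consumer with a large, availability-tolerant workload:
offers = discover(registry, min_cores=512, min_uptime_pct=99.0)
print([offer["provider"] for offer in offers])
```

A real brokerage would of course carry far richer SLA terms (coupling, locality, bandwidth, and latency, as discussed in Section 3), but the discover-then-rank pattern is the essential interaction.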
4.1 Deployment Trajectories
These requirements must also be considered in the context of cloud deployment models. The relationship of private, public, hybrid, and federated clouds is illustrated in Figure 3. Private clouds are typically deployed behind an organization's firewall, where access and the user population are known and managed according to organizational goals. Public clouds, by contrast, have a very open admission policy, typically based on the ability to pay. Managing surge requirements, one of the key user-oriented benefits, is commonly called cloudbursting, whereby an organization can acquire public cloud resources on-demand to form a hybrid cloud. We can also consider the case where two or more private clouds wish to interact for common goals and thereby form a federated cloud (also called a community cloud).

Figure 3: The Relationship of Cloud Deployment Models.

Organizations will adopt different deployment models based on their particular requirements. This casts the differences between private and public clouds into sharp relief. Public clouds offer wide availability and economies of scale that only very large data center operators can achieve, but are perceived to have a host of security and information assurance issues. When using a public cloud, any security or assurance requirements are essentially delegated to the cloud provider, where the user may or may not have sufficient insight into and trust of the provider to satisfy the necessary requirements. Hence, many organizations with significant IT and security requirements that wish to realize the benefits of cloud computing will deploy their own private cloud.

To illustrate this issue, Figure 3 is labeled in the context of different government agencies that may have deployed their own private clouds, but may also wish to federate with other government agencies, or use a national public cloud. It would be easy to relabel this diagram for any sub-organizations that interact with peers, or a parent organization. Indeed, the primary distinction between private and public clouds may be more of a relative distinction concerning the ownership of resources and the ability to enforce security policies, etc., versus delegating those responsibilities to a second party.

This dichotomy between public and private clouds raises the issue of deployment trajectory. Will public cloud adoption be predominant in the community, or will private cloud deployment be predominant? In the context of governmental or national clouds, which will be predominant?

Despite the fact that national clouds are being designed and deployed, various government agencies are already deploying their own private clouds. This dynamic can be characterized as a top-down vs. a bottom-up approach. The top-down, national public cloud approach has the challenge of recruiting enough users whose security and assurance requirements can be met. The bottom-up, private cloud approach has the challenge of mitigating the risk of creating non-interoperable cloud silos that cannot federate or hybridize.

What will be the dominant deployment trajectory in the context of science clouds? While commercially available public clouds could be used for scientific computing, they were not designed with such applications in mind. Hence, different science clouds may be deployed to meet various performance and control requirements for various scientific application domains. If this is the case, then science clouds may experience a similar deployment trajectory as national clouds.

If we assume that the predominant cloud deployment trajectory will start with private clouds, and then progress to federated, hybrid, and finally public clouds, then we can consider the progression of deployment issues and capabilities that must be addressed in sequence. Such a progression is summarized in Figure 4. Within a private cloud, the cloud operator has explicit knowledge of all users, can set policy, and can use traditional security mechanisms with a secure firewall perimeter. When two private clouds federate, there can be explicit, out-of-band knowledge about the joint users and agreement about policy and security mechanisms. When a private cloud acquires some public cloud resources to become a hybrid cloud, the private cloud operator must deal with resources that may have an unknown number of unknown tenants, but at least the private cloud operator has a secure perimeter whereby they can decide which data and workloads are stored and processed inside and outside of the secure perimeter. Finally, if all data and operations are hosted in a public cloud, the user has no explicit knowledge of the other tenants and has delegated all management and security requirements to the public cloud operator.

Figure 4: A Trajectory of Deployment Issues.

This progression of issues and concerns that accumulate from left to right can be further partitioned into necessary technical capabilities, legal and organizational issues, and, for lack of a better term, "landscape" issues concerning the forging of agreements across user groups, vendors, and major stakeholders. The technical issues progress from basic capabilities, such as managing different job types and mixes thereof, workload management, and governance, through capabilities that would be needed for managing federated clouds, such as virtual organizations, to highly theoretical topics in public clouds, such as practical ways to operate on encrypted data, which currently do not exist. With regard to legal and organizational issues, costing models to help organizations evaluate the potential cost benefits for their own computing requirements would be useful. This progresses through joint organizations to manage distributed infrastructures, such as the International Grid Trust Federation, and then arrives at the many issues concerning clients' management of their service and security requirements when dealing with a public cloud provider. The landscape issues also progress across this deployment trajectory, where users, vendors, and standards organizations will need to collaborate on the harmonization and shake-out of various efforts such that there is an emergent dominant best practice in the user community, ideally codified in a standard.

While this progression of issues and necessary capabilities could be driven down into much more detail, this structuring approach can nonetheless be used to derive deployment, development, and research agendas or roadmaps, as shown in Figure 5. Here we somewhat arbitrarily partition the roadmap into four phases. Each of these phases is also partitioned into deployment, development & risk mitigation, and research issues. Phase I deployment can essentially proceed immediately, while development and risk mitigation efforts must be done first before they can be deployed in subsequent phases. Fundamental research issues must also be identified whereby experimental work can be done across the design space for a particular capability, prior to subsequent development phases and eventual deployment. Hence, there is a general promotion of concepts and capabilities from lower left to upper right as they mature. In the bottom right, we can consider longer-term research issues, such as quantum computing, which might have an impact across all areas of computing, including that of clouds.

Figure 5: A Draft Deployment, Development, and Research Roadmap.

Clearly this roadmap could be driven into much more detail, based on the progression of requirements arising from a private-to-public cloud deployment trajectory. National cloud initiatives will certainly have their own roadmap activities, but it would benefit all involved to compare notes and coordinate efforts on common, necessary items on their development and research agendas. Science clouds should definitely be part of this wider national discussion such that scientific computing requirements are directly addressed.

4.2 Standardization Trajectories
In addition to considering the effect of deployment trajectories on necessary cloud standards, we must also consider where in the software stack standardization might be most useful and effective. In addition to coordinating major stakeholders, we must also try to understand the market dynamics that may also drive the trajectory of standards.

Cloud computing is often categorized as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS), ascending from the lower to higher levels of the system software stack. Some argue that standardization at the IaaS level will progress faster, since this entails the commodification of compute resources, e.g., servers, storage, and networks, and that providers will start to compete on price and the quality of their product, as determined by service level agreements. This same argument says that innovation and product differentiation will predominantly occur at the SaaS level, where providers will target specific market segments that are supported by commodity resources.

A contrary argument is that standardization will primarily occur at the SaaS level, since customers in specific market segments, such as customer relationship management, will demand portability and interoperability, resulting in a common look-and-feel. SaaS providers will then be free to implement their services in any way possible "on the back-end" in their data centers.

We argue that standardization will primarily occur at the infrastructure level, since the commodification of basic resources will have the farthest-reaching impact across all market segments: industry, commerce, government, and science. To this end, we describe a currently developing set of standards for cloud computing at the infrastructure level.

The Open Cloud Computing Interface (OCCI) from OGF is a simple, RESTful API for managing the lifecycle of compute, storage, and network resources, as illustrated in Figure 6. Here the basic create, read, update, and delete (CRUD) operations can be applied to URLs that identify the specific providers and resources in question. Attributes are maintained for each resource instance, along with resource links that identify sets of related resources that are being managed as a whole.

OCCI can be used with the Open Virtualization Format (OVF) from DMTF. As illustrated in Figure 7, OVF is essentially a representation format for virtual machines that can be used as the "coin of the realm" for defining virtual applications and moving them among IaaS providers. An OVF Package consists of exactly one Descriptor file, an XML document commonly called the OVF envelope that defines the
content and requirements of a virtual appliance. If not omitted, there is one Manifest file that contains SHA-1 digests to provide data integrity for the package. If not omitted, there is one Certificate file that is used to sign the package and provide package authenticity. Finally, there are zero or more Disk Image files that represent the virtual disks supporting the virtual appliance.

Figure 6: The OGF Open Cloud Computing Interface.

Figure 7: The DMTF Open Virtualization Format.

OCCI can also be used with the Cloud Data Management Interface (CDMI) from SNIA, as shown in Figure 8. CDMI can manage the provisioning of block-oriented, file-oriented, and object-oriented storage. Hard or soft containers are allocated out of physical storage, depending on requirements. A separate Storage Management Client can also be used for the direct management of the physical storage.

Figure 8: The SNIA Cloud Data Management Interface. (Used by permission.)

5. DRIVING CLOUD STANDARDS
Regardless of the trajectory that cloud standards may actually take, there exists a fundamental open loop between the consumers and producers of standards. Consumers typically "just want something that works" and consider the development of standards as being outside their charter and budget. On the other hand, vendors and other producers of standards typically focus on establishing market share and only consider standards when demanded by customers.

The Open Geospatial Consortium has developed a roughly annual process whereby major stakeholders can drive the development of geospatial standards. This general process could be applied in many other areas where standards are required, and is illustrated in Figure 9. In the first half of the process, the Standards Organization facilitates the collection and prioritization of requirements across all major stakeholders into specific, short-term project plans. This information is then used to issue a Request for Quotation or Call for Participation. Major stakeholders and participants respond to the RFQ/CFP and begin contractual execution on specific projects to achieve near-term goals, e.g., demonstrating implementation interoperability, end-to-end integration of capabilities, specific application scenarios of interest, etc. Ultimately, after development and test, the results are ideally deployed in persistent operations.

Figure 9: Driving Cloud Standards.

This process has worked quite well for government agencies that need geospatial standards. For larger organizations that have stable, long-term needs for standards, there is a clear business case for them to engage more directly in the standards process. Since they are collaborating with other
stakeholders, the need for common standards with wider ap-
How can we close this loop and make the standards pro- plicability and potential adoption are identiﬁed earlier. By
duction process more responsive and eﬀective? The Open working on common goals within a collaboration, the stake-
holders and participants realize a significant return on investment since the entire cost of a project is not borne by one organization.

Given the current thrust of national cloud initiatives, it is clearly possible that this general process could work for cloud standards as well. As the size and number of organizations involved increases, the more likely it is that vendors will also engage. In fact, if there is a "critical mass" of participants, the stakeholders may only need to define what type of standard is needed without specifying any of the technical details. The technical specifics could be left to the developers during the development and test phase.

To facilitate the coordination of producers and consumers of cloud standards in this manner, standards organizations working in the area of cloud computing recognized the need to coordinate themselves. This led to the creation of Cloud-Standards.org, an informal collaboration of standards organizations, including OGF, DMTF, SNIA, OCC, CSA, TMF, OMG and OASIS. Through regular telecons and common workspaces, these organizations keep each other apprised of work on cloud standards and of opportunities to collaborate on development, demonstrations, and outreach. Engagement with large user communities is actively encouraged, including national clouds and also science clouds.

6. SUMMARY
We have discussed the benefits and issues of cloud computing, in general, and then considered the specific benefits and issues surrounding the use of dynamically provisioned resources for scientific computing requirements. Like most users, scientific users of computation will appreciate the "ownership" of virtual resources, since this reduces the uncertainty concerning access when needed. This difference between batch job queues and allocation of virtual resources is a fundamental difference between the grid and cloud experiences for the user. Likewise, virtual machines can simplify application deployment by reducing the possible compatibility issues between the application and the hosting environment. This is another fundamental difference and is particularly important for scientific users where numerical accuracy and stability may be an issue.

We also argue that while public clouds will be deployed and used, the fact that enterprises can deploy and use private clouds while managing their security and information assurance requirements using existing tools and approaches means that, in the near term, private cloud deployment will take on greater importance for enterprise requirements (even if the private clouds are deployed virtually from a cloud provider's data center). Furthermore, user groups that have specific computing requirements may not want to acquire resources from a public cloud provider if those resources cannot meet requirements. As a case in point, scientific computing will demand closer coupling of servers, access to data, and the ability to manage performance through proper abstractions exposed through service level agreements. This argues for the separate deployment of science clouds that have these properties. Hence, even science clouds may follow the deployment trajectory from private, to federated, hybrid, and finally public clouds. Given this deployment trajectory, we derived a deployment, development and research roadmap to achieve the necessary capabilities.

At this point, we note that as soon as clouds federate, many of the necessary capabilities that have been fundamental to the grid concept, such as identity management and virtual organizations, will become directly relevant to cloud environments. This includes concepts and tooling such as the GLUE schema for describing computing resources, and data access and transfer standards such as OGSA-DAI, GridFTP, and SRM. As concepts concerning distributed infrastructure management and dynamic provisioning, grid and cloud are not in competition, but are rather quite complementary.

Another argument we make is that standardization will be most effective and useful at the infrastructure level, since the commodification of basic resources, such as servers, storage, and networks, will have the widest impact across application domains. With this in mind, we described the developing OCCI, OVF, and CDMI standards that could be used together to deploy and manage infrastructure clouds. While standardization at the PaaS and SaaS levels could also occur, it will most likely happen after standardization at the infrastructure level.

Finally, we considered how to drive cloud standards. Rather than just letting the marketplace do a "random walk", or be driven by corporate interest in maximizing market share, we describe a collaborative project process whereby major stakeholders and participants can define short-term goals to make clear progress concerning implementations, demonstrating interoperability, and the integration of end-to-end capabilities. By engaging in a collaborative process, stakeholders can realize a substantial return on investment. This process has worked well for government agencies that are direct consumers of geospatial standards. Hence, this same process could work well for the development of standards for national clouds, as well as science clouds.

While some may argue that it is too early for cloud standards, we argue that from a technical perspective, it is often quite easy to see where a common practice would be beneficial by reducing incompatibility and increasing reuse. From a market perspective, however, the wide adoption of a common practice is problematic since it can involve market "timing", development schedules, and other competing non-technical issues. Hence, in many cases, the best that can be done is to put a "stake in the sand". This allows a technical solution to be defined and tested in the "marketplace of ideas" wherein it might gain adoption if the time is right.

The final message of this paper is a call to action for all stakeholders to engage in the best practices and standards processes described above. This paper was motivated by the need to organize not only the technical development issues for scientific and national clouds, but also to organize the development roadmap goals of science clouds, national clouds and the wider distributed computing community. This can only be done if stakeholders engage and help drive the process to a successful conclusion in community-based organizations such as the Open Grid Forum.
7. ACKNOWLEDGMENTS
The author wishes to thank Geoffrey Fox, Steven Newhouse, and Martin Walker for valuable comments on earlier drafts of this paper.

8. REFERENCES
FY 2010 U.S. Federal Budget. http://www.gpoaccess.gov/usbudget.
Information Assurance. http://en.wikipedia.org/wiki/Information_assurance.
The Cloud Security Alliance. http://www.cloudsecurityalliance.org.
The DMTF Open Virtualization Format. www.dmtf.org/standards/published_documents/DSP0243_1.0.0.pdf.
The Envision Project.
The Kasumigaseki Cloud Concept. http://www.cloudbook.net/japancloud-gov.
The OGF Open Cloud Computing Interface. http://www.occi-wg.org/doku.php.
The Open Geospatial Consortium. http://www.opengeospatial.org.
The SNIA Cloud Data Management Interface. http://www.snia.org/cloud.
The UK G-Cloud. http://johnsuffolk.typepad.
The US Cloud Storefront.
U.S. Federal IT Dashboard. http://it.usaspending.gov.
M. Armbrust et al. Above the Clouds: A Berkeley View of Cloud Computing. Technical Report EECS-2009-28, UC Berkeley, 2009. www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf.
P. Bogden. Personal communication, June 2009. Former project director, SURA Coastal Ocean Observing and Prediction program (SCOOP).
P. Bogden et al. Architecture of a Community Infrastructure for Predicting and Analyzing Coastal Inundation. Marine Technical Society Journal, 41(1):53-61, June 2007.
C. Kemp. Standards, Nebula, and Interoperability. www.omg.org/news/meetings/tc/ca/
I. Foster et al. Cloud Computing and Grid Computing 360-Degree Compared. In IEEE Grid Computing Environments (GCE08), pages 1-10, 2008.
R. Buyya et al. Cloud Computing and Emerging IT Platforms: Vision, Hype, and Reality for Delivering Computing as the 5th Utility. Future Generation Computer Systems, 25(6):599-616, June 2009.
J. Weinman. Mathematical Proof of the Inevitability of Cloud Computing. cloudonomics.wordpress.com/ ... bility-of-cloud-computing, Nov. 30 2009.