Grid Performance Evaluation and Modelling
The Grid is a highly heterogeneous environment that can potentially provide seamless, fast and
efficient access to a range of resources that are distributed over a wide area. At the moment there
is no commonly accepted way to define and systematically measure the types of metric
that would make Grid performance evaluation and modelling an engineering discipline, rather than
the ad hoc exercise it currently is. After reviewing some ongoing efforts, we first break
down the Grid into manageable components. Then, for each constituent component, we describe
its characteristics and define metrics that can be used for understanding its performance.
Performance evaluation and modelling of computer-based systems has always been, to say the
least, a contentious and problematic exercise. A tension often arises due to the varying
stakeholder viewpoints. For example, application scientists are normally only interested in the
fast and reliable execution of their application, whereas the systems operators typically desire a
system that is easy to configure, manage, and maintain. The vendor, in turn, often wants to
highlight the good aspects of their machine and minimise any bad facets. Other stakeholders, such
as a funding body, may have other criteria, like costs or the reliability of the vendor.
Consequently, attempts to create “standard” performance measurements and methodologies to
date have only been partially successful. It should be noted here that these attempts have only
been made on sequential or homogeneous parallel systems. The evaluation and modelling of the
Grid will introduce a raft of new concerns and issues, as well as providing an effective way to
address the difficult issues around the functioning of grid applications and middleware.
1.1 The Necessity of Performance Evaluation and Modelling: Motivation
In the past, performance evaluation and modelling of computer systems was performed, in the
main, for one of three reasons:
1. To help purchase the best system to execute a suite of applications,
2. To understand architectural concerns and look at ways of enhancing future systems,
3. To help optimise an application's performance, based on knowledge of how the application
should execute on that architecture.
The evaluation and modelling of the Grid changes, in some ways, our fundamental reasons for
undertaking this task, as we are no longer looking at a single system, but rather considering
potentially large collections of resources, both hardware and software, working together to
provide services to an application. In addition, we are no longer considering a quiet and
controlled system in which to do our performance evaluation; now we are forced to cope with a
wide-area distributed system, where not only may we lack exclusive control over the components,
but they may also fail or become a bottleneck while the evaluation tests are being carried out.
2 Ongoing Efforts
2.1 The GGF Grid Benchmark Research Group
The GB-RG plans to advance the efficient use of grids by defining metrics to measure the
performance of grid applications and architectures and rate the functionality and efficiency of
grid architectures. These metrics should assist good engineering practices by allowing alternative
implementations to be compared quantitatively. The defined tasks will be specified as
paper-and-pencil benchmarks that can be implemented, in principle, using any of the existing and
future grid environments. The GB-RG also aims to provide some reference implementations of the tasks
that can be used by grid users and developers as starting points for assessing grid
implementations. It appears that this working group has made little real progress since its
2.2 IETF Benchmarking Methodology Working Group (BMWG)
The goal of the BMWG is to make a series of recommendations concerning the measurement
of the performance characteristics of various internetworking technologies; further, these
recommendations may focus on the systems or services that are built from these technologies.
The BMWG is focused primarily on Internet networking technologies and does not address Grid
technologies at the moment.
2.3 Grid Assessment Probes (GRASP)
To provide insight into the stability, robustness, and
performance of the Grid, the GRASP project has developed a set of probes that exercise
basic grid operations by simulating simple grid applications. The probes can be run on a grid
testbed, collecting performance data such as compute times, network transfer times, and
middleware overheads. The GRASP system is currently designed to use only Globus
infrastructure and tests the following activities:
Check for a valid grid proxy,
Perform a basic authentication to all nodes involved in the probe,
Validate the configuration file, if necessary,
Check directory sizes to ensure that target directories can accommodate files to be
Query the Globus MDS to find available information on all nodes involved in the probe.
The GRASP project is at an early stage of development; the developers aim to produce a suite of
representative Grid application benchmarks to test and evaluate Grid environments.
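The kind of data a probe collects (compute time, transfer time, and the remaining overhead) can be illustrated with a small timing harness. The following is only a sketch under stated assumptions: it is not GRASP code and calls no Globus API, and the stage functions `stage_in`, `compute` and `stage_out` are hypothetical stand-ins for a file transfer, a computation and a result retrieval.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_probe(stages: List[Tuple[str, Callable[[], None]]]) -> Dict[str, float]:
    """Run each named stage in order and record its wall-clock time in seconds."""
    timings: Dict[str, float] = {}
    for name, stage in stages:
        start = time.perf_counter()
        stage()
        timings[name] = time.perf_counter() - start
    return timings


def summarise(timings: Dict[str, float], compute_stage: str = "compute") -> Dict[str, float]:
    """Split the total probe time into compute time and everything else."""
    total = sum(timings.values())
    compute_s = timings[compute_stage]
    return {"total_s": total, "compute_s": compute_s, "non_compute_s": total - compute_s}


# Hypothetical stand-ins for real grid operations (assumptions, not GRASP code).
def stage_in() -> None:
    time.sleep(0.02)   # pretend to copy input files to the remote node


def compute() -> None:
    sum(i * i for i in range(100_000))   # pretend to run the application kernel


def stage_out() -> None:
    time.sleep(0.01)   # pretend to copy results back


if __name__ == "__main__":
    timings = run_probe([("stage_in", stage_in), ("compute", compute), ("stage_out", stage_out)])
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1000:.1f} ms")
    print(summarise(timings))
```

Timing each stage separately, rather than the probe as a whole, is what allows middleware and transfer costs to be distinguished from the computation itself.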
2.4 GridBench
GridBench is a work package of the CrossGrid project. The goal of GridBench is to propose
a set of performance metrics to describe the performance capacity of the Grid and its applications.
The work package aims to develop and implement GridBench, a suite of benchmarks that are
representative of typical Grid workloads. The benchmarks will be used to estimate the
performance of different Grid configurations, to identify factors that affect end-to-end application
performance, and provide application developers with initial estimates of expected application
2.5 Integrated Performance Analysis of Computer Systems (IPACS)
The IPACS Project is funded by a German 'High-Performance-Computing' programme.
Within this project, grid benchmarks and the technologies for grid-adaptive applications will be
developed. The grid benchmarks will be used to analyse and to parameterise the grid
environment in the first instance. These benchmarks will be the basis for the further development
and optimisation of grid products and applications. IPACS started in the summer of 2002; no
results or downloads are available yet.
Most of the efforts described above are adapting existing techniques for benchmarking from
various disciplines to the Grid, without taking into account the unique features of the Grid.
While this is understandable, since it makes sense to leverage existing knowledge, it is unclear
whether such benchmarks can provide answers to the questions that are posed by grid
applications and systems. In the following sections, we will briefly review the state of the art in
computer system benchmarking (Section 3), the architecture of the Grid and the particular features
that are unique to it (Section 4), and some recommendations for grid benchmarking and
performance modelling (Section 5).
3 A Historical View of Performance Evaluation and Modelling
A variety of benchmarks have been used as the means of assessing the performance of computer
architectures. Typically, the benchmarks can be classified into three categories:
a) Low-level: these determine the rates at which a machine can perform fundamental
operations, reported as, for example, MIPS, flop/s, or memory I/O rates.
b) Application kernels: these are typical core application algorithms, such as matrix operations
or an FFT.
c) Full applications: these have all the major components of full applications, such as the
NAS Parallel Benchmarks, which are based on computational fluid dynamics applications, or
the GAMESS-UK code.
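As an illustration of the first category, a low-level benchmark times a fixed number of fundamental operations and reports a rate. The sketch below is a minimal example in pure Python, under the assumption that only the method matters: interpreter overhead dominates the result, so the figure it reports says little about the hardware. The function name `measure_flops` and its parameters are illustrative, not taken from any existing benchmark suite.

```python
import time


def measure_flops(n: int = 1_000_000, repeats: int = 5) -> float:
    """Return the best observed rate, in flop/s, of a simple multiply-add loop.

    Each iteration performs one floating-point multiply and one add, so a
    run of n iterations counts as 2 * n floating-point operations.
    """
    best = float("inf")
    for _ in range(repeats):
        x = 0.0
        start = time.perf_counter()
        for _ in range(n):
            x = x * 0.5 + 1.0
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)   # keep the least-disturbed run
    return 2 * n / best


if __name__ == "__main__":
    print(f"{measure_flops() / 1e6:.1f} Mflop/s (pure Python, interpreter-bound)")
```

Taking the best of several repeats is a common way to reduce the influence of other activity on a dedicated machine; on shared resources a statistical summary of all runs is usually more appropriate.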
4 The Architecture of the Grid
4.1 Typical Grid Applications
Large scale applications: Typically grand challenge scale technical computing applications that
can execute on one or more remote supercomputers. This class of application would normally be
a parallel application using the likes of MPI or OpenMP.
Parameter studies: A single program is executed multiple times on the available remote hosts
with varying input data; this allows the user to explore the influence of the input on the
application. With this class of application users are interested in high throughput.
Meta-problems: This class of application is one where different parts of the application run on
distributed and remote platforms. An example of a meta-problem is a coupled simulation, where
each component model executes on a different platform.
Data-intensive: Applications in this class may access, move or use large quantities of data.
Examples include processing data for the LHC at CERN, interacting with remote databases, or
collaborative environments, such as video conferencing.
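For the parameter-study class above, the metric of interest is throughput: completed runs per unit time. The following sketch shows one way to measure it, with assumptions stated plainly: `simulate` is a hypothetical stand-in for a single run of the application, and a local thread pool stands in for the set of remote hosts.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def simulate(param: float) -> float:
    """Hypothetical stand-in for one run of the application with one input."""
    time.sleep(0.01)   # pretend the run takes some time on a remote host
    return param ** 2


def parameter_study(params: Sequence[float], workers: int = 4) -> Tuple[List[float], float]:
    """Run simulate() once per parameter and return (results, runs per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate, params))   # results keep input order
    elapsed = time.perf_counter() - start
    return results, len(params) / elapsed


if __name__ == "__main__":
    results, throughput = parameter_study([0.1 * i for i in range(40)], workers=8)
    print(f"{len(results)} runs, {throughput:.1f} runs/s")
```

Because the runs are independent, throughput depends mainly on how many hosts are available and how quickly jobs can be dispatched to them, rather than on the latency between hosts.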
4.2 Characteristics and Components
One simple way to look at the Grid is as a collection of computing resources connected by
network links. In this model, the computing-based resources are endpoints in a graph
representing the connections between the resources; the network connecting them makes these
separate resources into a grid.
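This graph view can be written down directly as a data structure. The sketch below is one possible representation, not a proposed standard: the class and attribute names (`Grid`, `Link`, `latency_ms`, `bandwidth_mbps`) are illustrative, and the transfer-time estimate assumes the simple model of latency plus size divided by bandwidth.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Link:
    latency_ms: float       # one-way latency of the network link
    bandwidth_mbps: float   # usable bandwidth in megabits per second


@dataclass
class Grid:
    resources: Dict[str, dict] = field(default_factory=dict)
    links: Dict[Tuple[str, ...], Link] = field(default_factory=dict)

    def add_resource(self, name: str, **attributes) -> None:
        """Add a computing resource (an endpoint of the graph)."""
        self.resources[name] = attributes

    def connect(self, a: str, b: str, link: Link) -> None:
        """Add an undirected network link between two known resources."""
        if a not in self.resources or b not in self.resources:
            raise KeyError("both endpoints must be added before connecting them")
        self.links[tuple(sorted((a, b)))] = link

    def neighbours(self, name: str) -> List[str]:
        """Return the resources directly linked to the named resource."""
        found = []
        for pair in self.links:
            if name in pair:
                found.append(pair[0] if pair[1] == name else pair[1])
        return sorted(found)

    def transfer_time_s(self, a: str, b: str, megabytes: float) -> float:
        """Estimate transfer time as latency + size / bandwidth (assumed model)."""
        link = self.links[tuple(sorted((a, b)))]
        return link.latency_ms / 1000 + megabytes * 8 / link.bandwidth_mbps


if __name__ == "__main__":
    grid = Grid()
    grid.add_resource("cluster_a", cpus=64)
    grid.add_resource("desktop_b", cpus=1)
    grid.connect("cluster_a", "desktop_b", Link(latency_ms=10, bandwidth_mbps=100))
    print(grid.neighbours("cluster_a"))
    print(grid.transfer_time_s("cluster_a", "desktop_b", megabytes=100))
```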
There are two major types of grid:
a) An intra-organizational grid; this is a grid within a single administrative domain. An
example is a tool to make use of the desktops within a single organization.
b) An inter-organizational grid; this is a grid that crosses administrative domains. In
addition to the issues faced by software for an intra-organizational grid, software for an
inter-organizational grid must deal with the complex issues of authentication, trust, and
The Grid differs from classical computer systems in several respects:
a) Grids are typically physically large, often spanning thousands of kilometres. This implies a
speed-of-light delay of milliseconds (10^-3 seconds), compared with the clock period of a typical
personal computer, which is now measured in fractions of a nanosecond (10^-9 seconds). This
leads to a different application mix than is appropriate for a single parallel computer or
b) Grid resources are shared. Even if the computational services are dedicated, the network
connecting them is typically shared with many users (if it is the Internet, with millions of
users). Because resources are shared, the sort of experimental reproducibility based on the
use of dedicated resources so common in computer system measurement is rarely feasible or
realistic in the Grid.
c) Grid resources are heterogeneous. While it is possible to build a grid out of homogeneous
components (e.g., the same computing platform at all endpoints), this is uncommon.
d) Faults are a fact of life on the Grid. Grid software and algorithms must deal with faults; this
differs greatly from the single-system case (at least in most areas of technical computing),
where software and algorithms are designed under the assumption that
faults are exceedingly rare. Benchmarks, particularly for grid usability and productivity,
must be sensitive to how the system responds to faults.
e) Grid applications need to operate securely on distributed resources using various security
measures (such as firewalls or technologies such as Kerberos). The extra security
needed in a grid can affect an application's
performance in ways that would not affect an application on a traditional single-site platform. It is
important that we understand the impact that different levels of security have on grid
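To put the speed-of-light delay from item (a) into numbers, the short calculation below compares the one-way propagation delay over a wide-area link with the number of processor clock cycles that elapse in the same time. The 3000 km distance, the 3 GHz clock rate and the velocity factor of 2/3 for optical fibre are illustrative assumptions, not measurements from this paper.

```python
C_VACUUM_KM_PER_S = 299_792.458   # speed of light in vacuum


def one_way_delay_s(distance_km: float, velocity_factor: float = 1.0) -> float:
    """Propagation delay over distance_km at velocity_factor times c."""
    return distance_km / (C_VACUUM_KM_PER_S * velocity_factor)


def cycles_during(delay_s: float, clock_hz: float) -> float:
    """Number of processor clock cycles that elapse during delay_s."""
    return delay_s * clock_hz


if __name__ == "__main__":
    distance_km = 3000.0   # assumed span of the grid
    clock_hz = 3.0e9       # assumed 3 GHz processor

    vacuum = one_way_delay_s(distance_km)
    fibre = one_way_delay_s(distance_km, velocity_factor=2 / 3)   # assumed fibre factor

    print(f"vacuum: {vacuum * 1e3:.2f} ms, {cycles_during(vacuum, clock_hz):.2e} cycles")
    print(f"fibre:  {fibre * 1e3:.2f} ms, {cycles_during(fibre, clock_hz):.2e} cycles")
```

Under these assumptions the one-way delay is roughly 10 ms in vacuum and about 15 ms in fibre, which corresponds to tens of millions of clock cycles per message. This gap of seven orders of magnitude is why tightly coupled applications are a poor fit for wide-area grids.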
A grid consists of heterogeneous hardware and software components linked together over,
potentially, a wide area via the Internet. We can categorise a grid as consisting of hardware and
software services… something generic here as to why we want this!
4.3 Hardware Services
4.3.1 Simple Abstract Hardware View
Figure 1: Abstract Hardware View
The client-side hardware will typically be a PC, but could be any of an array of devices
ranging from lightweight mobile ones, such as a PDA or cell phone, to more heavyweight ones
such as a high-end graphical workstation or specialised device. The software interface that a user
employs on a client side can vary in complexity, and so characterising a client endpoint will be a
function of the hardware capability against the software performance.
The server-side hardware can vary dramatically and range from a simple PC-based server to
a high-performance multiprocessor platform. The need for secure systems in the Grid
typically means that client-side interaction with the server side is via a single point of entry or a
gateway to the backend services.
Intermediate nodes will potentially consist of a number of application servers and the
4.4 Software Services
Software layers and their effect… e.g. GT + WS == GS, SOAP engines,…
Client – client side processing used to initiate some action!
Information Services – registration, lookup, update, and remove services and associated
Security – authorisation, authentication, assertion…
Communications – put/get information + data
Batch Systems -
Spawning – starting, stopping and removing jobs.
Servers – apache, tomcat….
Runtime libraries and language…
5 Measurements and Metrics
Thoughts: maybe advocate…
SI units - http://physics.nist.gov/cuu/Units/
Hockney’s ideas - http://www.ec-securehost.com/SIAM/SE02.html
Is there an IETF effort?
Maybe look at IETF Benchmarking Methodology (bmwg),
Test set up
Existing definitions - use IETF where possible.
The Meaning of
Common Definitions and discussion
6 Goals and Requirements for Grid Benchmarks
As described in Section 3, there are many well-understood techniques for understanding the
performance of a single computer system. Thus, a grid benchmark should measure the
features of the Grid that are (roughly) independent of the performance of a computational
endpoint. That is, the characteristics measured by a grid benchmark should be (nearly)
orthogonal to those measured by existing benchmarks for single systems.
Grid benchmarks should be reproducible. Reproducibility is a hallmark of good experimental
science. Since the grid is a shared resource and in most cases cannot be completely controlled by
the benchmarker, good grid benchmarks will need to use statistical techniques to provide valid
Usability benchmarks are much harder to quantify (and to reproduce). However, such
benchmarks are needed to help improve the robustness and reliability of grid software.
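For the reproducibility requirement above, one possible statistical treatment is to repeat a measurement many times and report a robust summary with a confidence interval, rather than a single number. The sketch below uses a percentile bootstrap around the median; this is one common technique chosen here for illustration, not a method prescribed by this paper, and the sample values are invented.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    samples: Sequence[float],
    stat: Callable[[Sequence[float]], float] = statistics.median,
    n_resamples: int = 2000,
    confidence: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for stat(samples)."""
    rng = random.Random(seed)   # fixed seed so the analysis itself is repeatable
    estimates = sorted(
        stat(rng.choices(samples, k=len(samples)))   # resample with replacement
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - confidence) / 2 * n_resamples)
    hi_idx = max(lo_idx, int((1 + confidence) / 2 * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    # Invented transfer times in seconds; one run hit a congested network.
    times = [1.02, 0.98, 1.10, 1.05, 4.80, 1.00, 0.97, 1.08, 1.03, 1.01]
    low, high = bootstrap_ci(times)
    print(f"median = {statistics.median(times):.2f} s, 95% CI = [{low:.2f}, {high:.2f}] s")
```

The median is used instead of the mean because a shared network occasionally produces extreme outliers, and a single congested run should not dominate the reported figure.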
7 Summary and Conclusions
References
LINPACK, http://www.netlib.org/linpack/
Top500, http://www.top500.org
ParkBench, http://www.netlib.org/parkbench/
NPB, http://www.nas.nasa.gov/Research/Tasks/pbn.html
The SPEC Benchmarks, http://www.specbench.org
Grid Benchmarking Research Group, http://www.nas.nasa.gov/GGF/Benchmarks/
IETF Benchmarking Methodology (BMWG), http://www.ietf.org/html.charters/bmwg-
Grid Assessment Probes (GRASP), http://grail.sdsc.edu/projects/grasp/
CrossGrid, http://www.cs.ucy.ac.cy/crossgrid/
GridBench, http://www.cs.ucy.ac.cy/crossgrid/wp2.3/benchmarks.html
Integrated Performance Analysis of Computer Systems (IPACS), http://www.ipacs-
M.A. Frumkin and L. Shabanov, Arithmetic Data Cube as a Data Intensive Benchmark, NAS
Tech Report NAS-03-005,
R.F. Van der Wijngaart, R. Biswas, M. Frumkin, and H. Feng, Beyond the NAS Parallel
Benchmarks: Measuring Performance of Dynamic and Grid-oriented Applications, Workshop
on the Performance Characterization of Algorithms, July 2001.
B. Plale, C. Jacobs, Y. Liu, C. Moad, R. Parab, and P. Vaidya, Benchmark Details of Synthetic
Database Benchmark/Workload for Grid Resource Information, Indiana University Technical
Report 583, August 2003, 27 pp. http://www.cs.indiana.edu/cgi-
C.A. Lee, C. De Matteis, J. Stepanek, and J. Wang, Cluster Performance and the Implications
for Distributed, Heterogeneous Grid Performance,
Heterogeneous Computing Workshop 2000, pp 253-261,
GAMESS-UK, http://www.csar.cfs.ac.uk/software/gamess.shtml