

Principles of Collaborative Virtual Environments

David Roberts

1. Introduction
Inhabited Information Systems (IIS) require advanced communication infrastructures
that address issues arising from the use of limited computational and network
resources to place people within an interactive information space. These
infrastructures are commonly called Collaborative Virtual Environments (CVE). This
chapter describes these issues along with the ways in which they are typically
addressed. IIS situate people in a social and information context where they can
interact with each other and with the information itself. These users, possibly in
remote geographical locations, access the environment through a variety of display
devices through which they gain distinct levels of presence and immersion (Slater,
Steed, & Chrysanthou, 2001). Some may be co-located, seeing each other in the real
world while immersed in the virtual environment, while others may be in some
geographically remote location and represented locally as an avatar, a 3D graphic
character capable of representing human-like communication, appearance, identity
and activity. Information presented to the user may be shared or private, objective or
subjective. It may be abstracted from live data in the real world or from simulation.
Users may interact with the information to adapt its presentation, content or
behaviour. Information objects often provide a focus for group activity (Greenhalgh &
Benford, 1999).

IIS merge the real and the virtual. Ideally the latter should possess the richness and
naturalness of the former. We would like to be able to interact with remote users as if
they were standing next to us. Verbal and non-verbal communication and the use of
objects in the environment are primary methods of social human communication in
the real world (Burgoon, Hunsaker, & Dawson, 1994). In IIS, shared interactive
information objects may be observed, used to inform, explain, teach, heal, experiment
or as a basis for discussion. Sometimes it is important to see not only what each
participant is doing, in relation to the shared information, but how he or she is feeling.
Expressive human communication should include speech, gesture, posture, and facial
expressions. Shaking hands and passing task-related artefacts, from business cards to
a model of a new product, are important activities in real world group activities. We
are, however, constrained by technology, physics and cost. The constraints of
computers, networks, display and acquisition devices introduce a gap between what
we would like to achieve, and what is currently realisable. In practice we need to
make trade-offs, reducing realism, naturalness and content where they are not
needed in order to maximise them where necessary. This is typically addressed in terms of
what each user can see and in what detail, as well as the objectivity and
responsiveness of interactions with shared information.

IIS applications are numerous and have diverse requirements. Specialised IIS
communication architectures attempt to strike the balance within various application
genres. It has been found that striking the balance within one or more genres requires
a complex architecture comprising many cooperating optimisation and balancing
mechanisms. Common mechanisms will be dealt with in detail later. This chapter is
concerned with the systems issues of communication in IIS. That is, how we make
best use of computers and networks to support co-located and geographically distinct
users in an IIS.

We set the scene by briefly introducing a number of application genres along with the
balance the communication infrastructure needs to strike for each. A detailed discussion
of application genres is beyond the scope of this chapter and we restrict our description
to Table 7.1. The remainder of the section discusses common requirements in detail.
Some architectures provide a level of configuration, and sometimes adaptation, to
cope with differential application requirements, various computational resources and
dynamic network characteristics.

 Application genre       Maximise                           Reduce
 Tele-conferencing       Expressive avatar                  Group size; complexity and
                         communication                      interactability of shared
                                                            information
 Scientific              Faithfulness of simulated          Group size, avatar
 visualisation           behaviour, consistency             communication
 Cohabited agent         Communication between agents       Avatar communication,
 spaces                  and between agents and users       responsiveness
 Social meeting places   Group size                         Avatar representation,
                                                            responsiveness, consistency
 Games                   Responsiveness                     Faithfulness, avatar
                                                            representation, group size
 Training and            Faithfulness, responsiveness,      Group size, avatar
 planning                consistency, repeatability         communication

Table 7.1. Typical compromises for various application genres

Before we discuss the complexities of IIS communication architectures, it is important
to understand what this technology can give us, what we can do with it and what kind
of information needs to be communicated. The remainder of this section introduces
some functional and non-functional requirements of a CVE. Our discussion of
functional requirements focuses on situating inhabitants in a social and information
context. Non-functional requirements are taken from the various communication
media used by IIS as well as the computers and networks that these media must run over.

Information objects act as foci for activity and often collaboration. Users can
collaboratively affect the presentation, content and behaviour of shared information.
Simple interaction with information is often achieved through selection and
manipulation tools, allowing the representation to be moved to a more suitable
viewing perspective. Application specific 3D toolbars give additional control and
adaptation. The representation of information may itself incorporate handles or tools
for natural interaction. The presented information can often only be understood in the
context of how a group are working with it. It is therefore important to demonstrate
how others are interacting with the data as well as supporting instructive and
expressive communication within the group.
Information may represent anything and be represented in many abstract forms. A
reasonable question is what can IIS offer in terms of information representation, over
and above what we had before? IIS is a combination of advanced technologies and is
not restricted to a particular set of these. Let us briefly look at the way in which some
component technologies are changing what we can do with information. Access to
unprecedented scales of data and processing is now available through technologies
such as the e-Science GRID (GRID). 3D graphics, simulation and display devices
give unprecedented naturalness in the viewing and steering of such information.
Mobile and social agents provide powerful ways of finding, assessing and combining
information. A Collaborative Virtual Environment (CVE) allows us to share
information and observe those that we share it with. IIS technology encompasses all
of this and thus gives us novel ways of presenting and interacting with shared
information in a distributed group setting. So what does this information look like?
An advantage of computer graphics and virtual reality is that we can tailor the
representation of information to any abstract form that best suits the user, application
and display device. What can we do with it? We can alter the representation, change
the detail, content and state, or steer the simulation. Most importantly we can share
information and share the way we work with it.

The spoken word is often the most important medium for communication between
users. It is, however, not sufficient to demonstrate how others are interacting with the
information. When combined with video streaming, or a 3D avatar capable of
reflecting gesture and posture, we have an effective tool for instructive and emotive
communication. Viewing a remote user through a video window is, however, not
effective for demonstrating how the remote user is interacting with information.
Representing both the information and remote user through 3D graphics gives a much
better impression of how each user is interacting within the team and with the data.
All that is left to situate the inhabitants in a social and information context is the
support of expressive communication. This brings us to the topic of user controlled,
computer generated characters (avatars).

Video avatars can provide high levels of detail, realism and expression. They
faithfully reflect the actions and emotions of their user. Although there is little
technical difficulty in placing a stereo video in a 3D world, it is much harder to
capture imagery of the user. Problems of camera placement within a display system
are exacerbated by freedom of movement of the local user and any number of
observers. Other problems include isolation of the user from his environment,
occlusion of the displayed image by the cameras, and high bandwidth requirements of
multiple streaming of video across a network. For these reasons most CVEs use
avatars generated from 3D graphics. These are typically humanoid with movable
joints that provide a basic reflection of body movement. Although such avatars are not
as realistic, they can communicate instructive and emotive communication sufficient
for many applications. In the real world we look at posture, gesture, subconscious
movement and facial expression to gauge emotion. All of these can be represented
through an avatar. The problem again relates to capture. A typical display device
takes sparse input from the user to control the avatar. For example, a desktop system
may use a mouse to control movement, mouse buttons to interact with objects and
the keyboard to chat. An immersive display system, such as a Head Mounted Display
(HMD) or CAVE, would typically track the head and a wand held in the dominant
hand. The wand provides additional input for moving long distances in the
environment and interaction with objects. Talking would normally be communicated
through streamed audio. Such input is sufficient for demonstrating how a user is
interacting with information. Showing any emotion through a present-day desktop interface is
almost impossible without additional input. The combination of audio and freedom of
single handed gesture does allow a base level of expressive communication from an
immersive device. It has been found that when desktop and immersive users work
together, the latter take dominant roles, presumably because of their greater ability to
express themselves (Slater, Sadagic, Usoh, & Schroeder, 2000). Further to this we
have found that where two immersive users share an environment with desktop
counterparts, the former team up mostly ignoring the latter. Greater levels of emotive
communication may be achieved for any device by allowing the avatars to improvise
(Slater, Howell et al., 2000). Here the avatar will attempt to fill in the gaps left by the
lack of inputs. Context and profiled personality may be used to interpret user input, or
lack of it, and drive suitable emotive behaviour. Other behavioural techniques may
further enhance the believability and realism of avatars. For example behavioural
reasoning may combine concurrent simple autonomous behaviours such as fidgeting,
shifting weight between the feet, breathing and eye movement. Reactive behaviour is
useful to define how objects, including avatars, react to given interactions. Diverse
behaviour can be achieved through polymorphism, allowing objects of a
given type to behave in new ways in response to given stimuli.
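Polymorphic reactive behaviour maps naturally onto subtyping in an object-oriented language. The sketch below is illustrative only; the `Door` and `LockedDoor` classes and the `push` stimulus are invented for this example. It shows how the code delivering a stimulus stays the same while each object type supplies its own reaction:

```python
class Door:
    """Reactive behaviour: defines how an object responds to a stimulus."""
    def on_push(self):
        return "swings open"

class LockedDoor(Door):
    # Polymorphism: the same stimulus, a new reaction, without changing
    # the code that delivers the stimulus.
    def on_push(self):
        return "rattles but stays shut"

def push(door):
    # The stimulus is delivered without knowing the object's concrete type.
    return door.on_push()

assert push(Door()) == "swings open"
assert push(LockedDoor()) == "rattles but stays shut"
```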

Some basic requirements for interaction within IIS are responsiveness, detail and
intuitiveness. As they are of prime importance to the usability of the environment,
they will now be discussed in more detail.

A key aspect of usability and believability is maintaining the responsiveness of
interactions close to the level of human perception. Changes in the presented
information must appear as soon as a user affects it. Low responsiveness will
make the system feel unnatural and cause frustration. Immersive displays render the
environment from a new perspective every time the user moves his or her head. A low
responsiveness in updating perspective causes disorientation and sometimes feelings
of nausea. IIS introduce the issue of responsive sharing. This is a particular concern
where users are in geographically distinct locations connected over a network. The
communication infrastructure must provide sufficient responsiveness to support, and
not confuse, the natural sequences of conversation and interaction.

Some interactions will require more detail than others. For example, to interact with
another person it is often important to communicate both complex language and
emotion. Even email users exchange icons to represent how they feel. The detail of
presented information will be a balance between the data from which it is derived and
what is useful and perceivable for each user. The communication infrastructure must
support a wide range of detail in interactions and should do so in an optimum manner.

Interaction must be both natural and intuitive. A user should be able to interact with
an object or peer, without having to worry about overcoming shortcomings in the
technology. Furthermore, an object should react in a believable way regardless of how
or where it is implemented. This places requirements on both the device and the
infrastructure. Display devices offer various input/output capabilities that may be
mapped to interaction scenarios. For example, in an immersive system, a user may use
a joystick on a wand to move up to an object, their own body movement to position
themself at the correct aspect, and the wand to select and manipulate the object. Both
the physical device and the way its inputs are interpreted must map to natural and
believable behaviour in the virtual environment.

Communication Requirements
So where does this leave us in terms of communication? Representing information
using computer generated graphics gives unprecedented powers to tailor its
presentation. Virtual Reality (VR) uses 3D graphics to allow the user to control his or
her position in the environment, giving natural access to spatially organised
information. CVEs socially situate a group of users around information within a
familiar spatial context. Although video avatars would offer a potentially higher level
of realism, computer generated avatars are easier to situate in an environment where
users have freedom to walk around. It is not surprising that the majority of CVEs rely
primarily on 3D computer generated graphics to present visual information. Unlike
video, 3D graphics scenes, comprising the geometry and appearance of many objects,
can be downloaded in advance. Where users are distributed, the scenegraph may be
replicated at each user’s computer with incremental changes sent across the network.
This massively reduces bandwidth usage and also increases the responsiveness.
Without such replication, any user movement would require a perspective
recalculation of the scene on a server before the resultant images could be streamed
back to the user’s machine. This approach is generally unusable as the network delays
result in feelings of disorientation and nausea as the user’s visual inputs lag behind
their internal senses of balance and proprioception.

3D graphics may be the primary medium for IIS but it is often combined with others.
Natural language has been shown to be of vital importance to collaborative tasks.
Audio streaming has been found much more effective than chat in IIS settings and
does not require the use of a keyboard. Streaming of video and 3D graphics across the
network is useful for rich and detailed images provided observer perspective is
constrained. An exception to this is tele-presence which allows a single user to see
through the eyes of a movable robot, but this is outside the scope of this chapter.
Table 7.2 shows how various media are typically used together, when they are used
and how they impact on available network bandwidth.
                3D graphics     Audio           Video             Streamed 3D      Text chat
                                                                  graphics
 Purpose        Primary         Primary         Supplementary     Supplementary    Alternative to
                visual          natural         visual and audio  visual, e.g. for audio streaming
                medium          language        for perspective-  high-end         for desktop and
                                medium          constrained high  graphics on a    public settings
                                                fidelity          desktop
 Usage          Continuous      When user is    Occasionally      Occasionally     When user is
                                speaking        as required       as required      chatting
 Bandwidth      High during     Medium          High              High             Low
 usage          initial
                download
Table 7.2. Usage of mediums in IIS and the effect on the network

The degree to which each medium is used is application dependent. We have assumed
so far that sharing the use of the information is the primary goal of a CVE. Other
applications, for example, may place more emphasis on emotive communication and
thus use video as the primary medium. The remainder of this chapter deals with the
typical and leaves specialisation to other works. We therefore restrict our discussion
to systems that primarily use 3D graphics for vision, audio streaming for speech and
video, and 3D graphics streaming for occasional supplementary, high detail, imagery.

Resources: Computers and Networks
Let us now take a brief look at the relevant characteristics of computers and networks.
It is, after all, these that must underpin a CVE. Computers have limitations on the
amount of information they can store and process. An IIS will often contain users
supported by computers of widely differing capabilities. These computers may be
connected to various network technologies such as Ethernet, ATM and wireless.
These networks are often part of the greater Internet and will communicate through
intermediate networks of various technologies. These technologies have widely
differing bandwidth, delay characteristics and reliability. CVEs use the Internet
Protocol (IP), which deals with the heterogeneous nature of the Internet by making few
assumptions about guaranteed service. That is, IP assumes that messages may be
fragmented and individual fragments may arrive late, out of order or be lost. The
Internet, and often super computers running display devices and processing
information, are shared resources offering highly dynamic levels of throughput
depending on localised load. A final important point is that the speed of light will
introduce perceivable network delays for many intercontinental links. A CVE must
be designed to run over a set of heterogeneous computers and networks, each with
possibly very different dynamic, throughput and reliability.

2. Principles
IIS situate inhabitants in a social and information context that extends interaction in
the real world in a natural manner. Technology, physics and cost create a gap between
this ideal and reality. This section is concerned with balancing throughput limitations
of computers and networks with the requirements of IIS applications. CVEs employ a
set of cooperating mechanisms and algorithms that effectively concentrate resources
by maximising the fidelity of sharing where it is needed and reducing it where it is not.
We have looked at what might reasonably be expected in terms of perception and
interaction and how this may be supported through a combination of existing
communication media. We have explained why 3D graphics with replicated
scenegraphs have become the primary medium of communication in IIS and how
these may be supplemented with other media. We restrict our discussion here to the
mechanisms for improving the fidelity of sharing through 3D graphics and replicated scenegraphs.

A key requirement of IIS and VR is the responsiveness of the local system. Delays in
representing a perspective change following a head movement are associated with
disorientation and feelings of nausea. A CVE supports a potentially unlimited reality
across a number of resource bounded computers interconnected by a network which
induces perceivable delays. Key goals of a CVE are to maximise responsiveness and
scalability while minimising latency. This is achieved through localisation and scaling.

2.1 Localisation
Localisation is achieved through replicating the environment, including shared
information objects and avatars, on each user’s machine. Sharing experience requires
that replicas be kept consistent. This is achieved by sending changes across the
network. Localisation may go further than simply replicating the state of the
environment and can also include the predictable behaviour of objects within it.

Object Model
The organisation and content of a scenegraph is optimised for the rendering of
images. Although some systems, for example Cavernsoft (Leigh et al., 2000) and
Avango (Tramberend, 2001) directly link scenegraph nodes across the network, most
systems introduce a second object graph to deal with issues of distribution. This is
known as the replicated object model; from here on we will refer to it as the replication
and to its nodes as objects. Objects contain state information and may link to corresponding
objects within the local scenegraph.

A virtual environment is composed of objects which may be brought to life through
their behaviour and interaction. Some objects will be static and have no behaviour.
Some will have behaviour driven from the real world, for example by a user.
Alternatively, object behaviour may be procedurally defined in a computer program.
In order to make an IIS application attractive and productive to use it must support
interaction that is sufficiently intuitive, reactive, responsive, detailed and consistent.
By replicating object behaviour we reduce dependency on the network and therefore
make better use of available bandwidth and increase responsiveness. Early systems
replicated object states but not their behaviour. Each state change to any object was
sent across the network to every replica of that object. This is acceptable for
occasional state changes but bandwidth intensive for continuous changes such as
movement. Unfortunately, movement is one of the most frequently communicated
behaviours in IIS. A more scalable approach is to replicate the behaviour model and
only send changes to that behaviour. Such changes are known as events.
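The difference between replicating states and replicating behaviour can be sketched in a few lines. In this illustrative Python fragment (the class and event format are invented for the example, not drawn from any particular CVE), an event names a behaviour and carries its arguments; each replica executes the behaviour locally, so only a small event crosses the network rather than a stream of state updates:

```python
class ReplicatedObject:
    """A replica holding state plus a table of named, deterministic
    behaviours. Peers exchange small events naming a behaviour and its
    arguments rather than streaming every resulting state change."""

    def __init__(self):
        self.state = {"pos": 0.0}
        self.behaviours = {"move": self.move}

    def move(self, speed, duration):
        # Deterministic: the same event produces the same state change
        # on every replica.
        self.state["pos"] += speed * duration

    def apply_event(self, event):
        name, args = event
        self.behaviours[name](*args)

# An event is just a behaviour name and its arguments -- a few bytes on
# the wire instead of a continuous stream of position updates.
event = ("move", (2.0, 3.0))

local, remote = ReplicatedObject(), ReplicatedObject()
for replica in (local, remote):
    replica.apply_event(event)

assert local.state == remote.state == {"pos": 6.0}
```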

Deterministic Behaviour:
Behaviour may be characterised as deterministic or non-deterministic. Deterministic
behaviour need not be sent across the network provided it can be calculated
independently at each replication. Most procedural behavioural descriptions such as
reactive, improvisational and emergent may be defined in a repeatable and
deterministic manner. Events can simply identify the name of, and possibly arguments to,
a procedure, the execution of which will be replicated at each machine. Even non-
deterministic behaviour can be approximated as deterministic provided the effects of
bad approximations are not catastrophic.

Dead reckoning:
Constrained movement, such as that of a vehicle, may be determined approximately
using a technique called dead reckoning (IEEE1278.1, 1995). One of the earliest
applications for large-scale IIS was battlefield simulation (SIMNET). Here
embodiment was originally confined to vehicles, such as tanks, where the vast
majority of communicated behaviour was movement around the battlefield. Dead
reckoning was introduced to reduce bandwidth consumption of movement
information. A dead reckoned path represents a predicted approximation of near
future parametric movement based on recent samples of position over time. Paths are
sent to other replicas in events. A remote replication then calculates the probable
position of the vehicle based on current time. Divergence is checked at the sender by
comparing actual and predicted position by running the same algorithm as the receiver
on the path it has sent. When divergence exceeds a threshold, a new path is calculated
and sent. The algorithms for calculating the path are based on Newton's 17th-century
laws of motion and Hamilton's 19th-century quaternion expressions. Variations on the approach
deal with first and second order integration, time constants and smoothing (Miller,
1989). The remote user is presented with an approximation of movement, the most
noticeable aspect of which is sudden jumps in position when a new event is received
(Figure 7.1). The magnitude of this discontinuous jump is the product of the difference
in velocity described in two adjacent events and the network delay.
[Figure: local movement compared with its remote representation]

Fig. 7.1. Effect of dead reckoning
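A minimal sketch of first-order dead reckoning, assuming 2D positions and a simple Euclidean divergence threshold (the `DeadReckoner` class and its interface are invented for illustration, not taken from the DIS standard). The sender runs the same prediction as every receiver and issues a new path event only when the true position has drifted too far from the prediction:

```python
class DeadReckoner:
    """First-order dead reckoning over 2D positions. The sender runs the
    same prediction as every receiver and issues a new path event only
    when the true position diverges beyond a threshold."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.path_pos = (0.0, 0.0)   # position when the current path was issued
        self.path_vel = (0.0, 0.0)   # velocity of the current path
        self.path_time = 0.0         # timestamp of the current path

    def predict(self, now):
        """Receiver side: extrapolate along the last received path."""
        dt = now - self.path_time
        return (self.path_pos[0] + self.path_vel[0] * dt,
                self.path_pos[1] + self.path_vel[1] * dt)

    def update(self, true_pos, true_vel, now):
        """Sender side: return a new path event when the prediction has
        drifted beyond the threshold, else None (send nothing)."""
        px, py = self.predict(now)
        err = ((true_pos[0] - px) ** 2 + (true_pos[1] - py) ** 2) ** 0.5
        if err > self.threshold:
            self.path_pos, self.path_vel, self.path_time = true_pos, true_vel, now
            return {"pos": true_pos, "vel": true_vel, "t": now}
        return None

# A vehicle reported at the origin moving 1 m/s along x at t = 0:
dr = DeadReckoner(threshold=0.5)
dr.path_pos, dr.path_vel, dr.path_time = (0.0, 0.0), (1.0, 0.0), 0.0

assert dr.predict(2.0) == (2.0, 0.0)                       # remote estimate at t = 2
assert dr.update((2.1, 0.0), (1.0, 0.0), 2.0) is None      # small drift: no event
assert dr.update((3.0, 0.0), (1.0, 0.0), 2.0) is not None  # large drift: new path sent
```

The discontinuous jump in Figure 7.1 corresponds to the instant a remote replica swaps from the old path's prediction to a newly received one.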

Public switched networks, such as the Internet, introduce into the distribution of event
messages both dynamically changing delays and the possibility of loss. This can
adversely affect the synchronisation, concurrency, causality and responsiveness of
events. Synchronisation ensures that events are replicated within real-time constraints.
Causal ordering ensures that causal relationships are maintained. Concurrency defines
the ability of the system to allow events to occur simultaneously. Lastly,
responsiveness is the delay in the user perceiving the effect of an action on the
system. Concurrence and therefore responsiveness are reduced as the level of
consistency is increased. This all leads to the need for consistency management, the
role of which is to provide sufficient synchronisation and ordering whilst maximising
concurrence and thus the responsiveness of the system. The optimal balance between
sufficient synchronisation, ordering and responsiveness is application and scenario
dependent. An ideal ordering mechanism provides a compromise between
synchronisation and ordering on one side and responsiveness and concurrence on the other.

Behaviour may be described parametrically. For example, dead reckoned paths
describe movement through time. Some early systems based time on frame rate. This
can be seen in some single user computer games, where the movement of objects
slows down as the complexity of the scene increases. This approach is unsuitable for
IIS as shared behaviour should be consistent and not represented differently to each
user dependent on the performance of the local machine. A common approach is to
use the system clock on each computer to provide a continuous flow of time.
Movement can then be described in terms of metres per second and will be
represented at the same rate to each user.
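The wall-clock approach can be sketched as follows (a hypothetical `ClockDrivenMover`, invented for illustration): position is a pure function of elapsed seconds, so a machine rendering at 10 frames per second and one rendering at 60 compute the same position for the same instant:

```python
import time

class ClockDrivenMover:
    """Movement parameterised by wall-clock time rather than frame count:
    position is a pure function of elapsed seconds, so every machine
    shows the object in the same place regardless of frame rate."""

    def __init__(self, origin, metres_per_second, start_time):
        self.origin = origin
        self.speed = metres_per_second
        self.start = start_time

    def position(self, now=None):
        if now is None:
            now = time.time()     # each machine reads its own clock
        return self.origin + self.speed * (now - self.start)

mover = ClockDrivenMover(origin=0.0, metres_per_second=2.0, start_time=100.0)
# A 60 fps renderer and a 10 fps renderer querying the same instant
# both place the object 4 metres along:
assert mover.position(now=102.0) == 4.0
```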

As well as progression it is important to synchronise the start of replicated events.
Some systems, for example NPSNet (Macedonia, 1994), set the start time of a
received event to the time at which it was received. This removes the need to
accurately synchronise local clocks, which is a non-trivial task. The disadvantage with
this approach is that any behaviour is offset by the network delay. Through
synchronising local clocks it is possible to synchronise the state of objects from the
time an event arrives until the time a subsequent overriding event is sent. The
PaRADE system, developed as part of the author’s PhD (Roberts, 1996), allows
locally predictable events to be sent in advance, thus overcoming the network delay
and synchronising from the start.
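With synchronised clocks, a predictable event can carry an explicit future start time; each replica queues it and begins the behaviour at that instant, so the network delay is absorbed before the behaviour starts. A minimal sketch of such scheduling (the `EventScheduler` class is invented for illustration and is not PaRADE's actual interface):

```python
import heapq

class EventScheduler:
    """Queue of (start_time, action) pairs. An event sent in advance
    carries a future start time; every replica fires it at that instant,
    so the behaviour starts simultaneously everywhere."""

    def __init__(self):
        self.queue = []

    def post(self, start_time, action):
        heapq.heappush(self.queue, (start_time, action))

    def run_until(self, now, fired):
        # Fire every queued event whose start time has been reached.
        while self.queue and self.queue[0][0] <= now:
            fired.append(heapq.heappop(self.queue))

scheduler = EventScheduler()
scheduler.post(10.0, "open_door")   # event arrives early, scheduled for t = 10

fired = []
scheduler.run_until(9.0, fired)     # nothing fires before its start time
assert fired == []
scheduler.run_until(10.0, fired)    # all replicas fire at the same instant
assert fired == [(10.0, "open_door")]
```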

Concurrency Control:
Concurrency control is an important subset of consistency management that deals
with the prevention of concurrent conflicting updates. This is most apparent where
two users try to move a given object in conflicting directions. Without concurrency
control it is difficult to determine the outcome, which will at best cause confusion
frustration and at worst an unrecoverable divergence between replicas. Many existing
infrastructures do not include concurrency control. Those that do, employ algorithms
that themselves are adversely affected by network latency. This in turn affects the
responsiveness of interaction between user and shared information objects. A
conservative concurrency algorithm, as used in some analytical simulations, would
lock the whole world and allow updates on a turn basis. This unnecessarily restricts
responsive interaction to a level that is unworkable for general IIS applications. An
optimisation is to increase the granularity of locking, to either sets of objects, object
or object attribute level. A common mechanism for concurrency control is
transferable object ownership, where a user can only affect an object once ownership
has been transferred across the network. The effect of such latency is normally
apparent in a delay in being able to interact with an object recently affected by
another user. Optimisations have been developed for predicting interactions and
transferring ownership in advance (Roberts, Lake, & Sharkey, 1998).
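Ownership-based concurrency control can be sketched as follows (illustrative classes, not a real CVE API). Only the replica holding the ownership token may update the object; the latency cost appears as the round trip needed to transfer ownership before a second user can interact:

```python
class OwnedObject:
    """Concurrency control by transferable ownership: only the replica
    currently holding the ownership token may update the object, so
    conflicting concurrent updates cannot occur."""

    def __init__(self, owner):
        self.owner = owner
        self.pos = 0.0

    def request_ownership(self, user):
        # In a real CVE this is a network round trip to the current
        # owner -- the source of the interaction delay described above.
        self.owner = user

    def move(self, user, delta):
        if user != self.owner:
            return False          # non-owners must acquire ownership first
        self.pos += delta
        return True

obj = OwnedObject(owner="alice")
assert obj.move("alice", 1.0)      # the owner may update freely
assert not obj.move("bob", -1.0)   # a conflicting concurrent update is refused
obj.request_ownership("bob")       # transfer (incurring latency), then update
assert obj.move("bob", -1.0)
assert obj.pos == 0.0
```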

Events sent over the Internet may be lost or arrive in a different order from that in which they
were sent. In many cases the current state is more important than history and can be
derived from an old state and a new event, even when some preceding events have
been missed. For example, a new dead reckoned path overrides the last and is not
dependent upon it. Ordering is, however, often vital. A lack of ordering can cause
complete confusion when collaborating with remote users and sharing objects. It is
therefore surprising that the majority of CVEs do not guarantee it. This is most likely
a throwback to the conventional applications of collaborative virtual environments
that did not properly support shared interaction.

Order must be balanced against responsiveness. The greater the level of ordering, the
lower becomes the concurrence and thus the responsiveness. A true objective state of
an environment cannot be guaranteed until all events have been received and
processed in the correct order. Generating a new event before the objective
environment state is known is dangerous and requires some strategy for dealing with
events generated on the basis of an untrue state. To guarantee objectivity all replicas
must be frozen whilst waiting for events to arrive thus lowering the concurrence.
Lamport developed an optimisation called causal ordering which removed the need to
order events that could not have been related (Lamport, 1978). The definition of
causal relationship was based on the subjective view of a replication. Total ordering
and Lamport causal ordering work well in distributed analytical simulation but are not
generally suited to IIS applications which require continuous and responsive
interaction with the environment. One solution is to allow the IIS infrastructure to
decide when to apply and where to apply ordering. This decision may be based on
application knowledge of causality and importance of ordering, awareness (see
below) and network conditions. Such approaches have been applied to various
degrees in PaRADE, MASSIVE III (Greenhalgh, Purbrick, & Snowdon, 2000) and
PING (Sharkey, Roberts, Tran, & Worthington, 2000).
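Lamport's logical clocks give the flavour of how causal ordering avoids synchronised wall clocks: a timestamp travels with each event, and a receiver advances its own clock past the sender's, so any event that could have been caused by another carries a larger timestamp, while unrelated events need not be ordered at all. A minimal sketch:

```python
class LamportClock:
    """Lamport's logical clock: any event that could have been caused
    by another carries a larger timestamp, so causally related events
    can be ordered without synchronised wall clocks."""

    def __init__(self):
        self.time = 0

    def local_event(self):
        self.time += 1
        return self.time

    def send(self):
        self.time += 1
        return self.time           # timestamp travels with the event

    def receive(self, msg_time):
        # Jump past the sender's clock, preserving potential causality.
        self.time = max(self.time, msg_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
t_send = a.send()                  # a's event is stamped 1
t_recv = b.receive(t_send)         # b's receipt is stamped later
assert t_recv > t_send
assert b.local_event() > t_recv    # everything b does afterwards stays later
```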

Application of Consistency:
Now that we have introduced synchronisation, concurrency control and ordering as
the basic components of consistency, we can look at how they are applied. Table 7.3
compares common alternative mechanisms for each, describing each mechanism,
giving an application level example of use, comparing typical delay in terms of level
of human perception and giving some example infrastructures in which they are used.
 Mechanism    Description                Example                   Induced  Example
                                                                   delay    infrastructures

 Synchronisation (behaviour of an object is synchronised over replicas):
 Wall clock   Remote object follows a    Dead reckoning            None     Most
              parametric path
 Tick         Replicas update in step    Crowd walking in step     Medium   RTI

 Concurrency (object replicas affected concurrently):
 Convergence  Diverging states are       Tug of war with elastic   Low      DIVE
              converged                  rope
 Ownership    Prevents divergence        Passing a business card   Low      DIVE-Spelunk,
              through a unique key                                          PaRADE

 Ordering (order of object events over replicas):
 Causal       Based on potential         Player's activity         Medium   MASSIVE III,
              causality                  ordered in ball game               PaRADE, PING
                                         but not with spectators
 Total        All events are ordered     Player action delayed     High     RTI
                                         until earlier spectator
                                         action is observed

Table 7.3. Comparison of consistency mechanisms

2.2 Scaling
Scaling allows the amount of information in the environment, including the number of
users, to increase, without reducing the fidelity of experience to any one user. This is
achieved by balancing each individual's need for information with what can be
achieved given available computational and network resources (S. Benford & L. E.
Fahlén, 1993). Awareness management is the mechanism used to balance an
individual’s ideal awareness with resources. The scale of information provided to any
one user or process may be controlled in terms of extent, granularity and detail. These
define awareness in terms of object subsets of the environment, aggregation of many
objects into few, and the attributes of a given object.

The majority of effort in attaining scalability has focused on subdivision of the
environment and population according to interest. This is often referred to as interest
management. Awareness of remote objects is determined by context dependent
interest. Distinct resources such as servers or communication groups, discussed later,
are used to support each area of interest. The interest of a user is dynamic and context
dependent. For example by walking into another virtual room a user becomes aware
of its contents and occupants. A number of technical issues must be addressed in
order to support this dynamic awareness. Subdivision should be natural and appear
transparent otherwise it can affect a user’s behaviour. To be effective it must balance
resource usage across the areas of interest. Changing awareness may require much
data to be transferred. This can result in delays in the presentation of a new area,
which may reach the order of seconds. Different application genres are suited to
distinct definitions of interest and methods of subdivision. The granularity of
subdivision may be tackled at world, object or intermediate level. We now survey
some classic approaches to subdivision used in IIS which are first summarised in
Table 7.4.

   Approach           Description                  Granularity   Example systems
   Multiple worlds    Separate worlds connected    World         Active Worlds,
                      through portals                            Ultima, DIVE
   Static spatial     Divide world surface into    Intermediate  NPSNET
   subdivision        tiles
   Dynamic spatial    Flexible mesh of AOI tiles   Intermediate  VIVA
   subdivision        that stretches to balance
                      tile membership
   Locales            Rooms                        Intermediate  SPLINE
   Aura               Aura, focus and nimbus       Object        DIVE,
                                                                 MASSIVE I & II
   Regions            Abstract spaces              Intermediate  MASSIVE II,
                                                                 DIVE (COVEN)

Table 7.4. Overview of classic subdivision approaches

Multiple Worlds:
A simple method for dividing the environment and population is to provide distinct
multiple worlds. Each world is typically supported by a distinct server and hosts a
distinct set of objects and users. As discussed in the deployment section, this is
straightforward to support over the Internet and thus is prevalent in current systems
used by the general public, for example, in gaming (Ultima Online, 2003) and social
environments. Users typically inhabit a single world at a given time and, in some
systems, may move between these worlds using portals (Snowdon, Churchill, &
Munro, 2001). The disadvantage of this approach is the difficulty in balancing the
number of users in each world. Figure 7.2, shows multiple worlds interconnected
through portals and demonstrates the potential problems with balancing population.



     Fig. 7.2. Multiple worlds, showing portals and possible population loading

Static Spatial Subdivision:
Increasing the granularity of subdivision allows worlds to be split into areas of
interest. An approach developed for battlefield simulation and training was to divide
the environment into areas of interest in the shape of equal hexagonal tiles and map
each to a communication group (Macedonia, Zyda, Pratt, Brutzman, & Barham,
1995). A process sends information to the group associated with the tile occupied by
its user and receives information from that group and those associated with adjacent
tiles. The supporting process dynamically joins and leaves groups as the user moves
between tiles. Receiving information from adjacent tiles removes the problem of not
seeing spatially close objects across a border. Group communication provides a
mechanism for limiting awareness at a message distribution level with the added
bonus of removing the need for a server.
It is not yet, however, generally supported on the Internet and so this method adds
complexity to deployment which is discussed later.
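The tile-to-group mapping can be sketched as follows. For brevity this uses square rather than hexagonal tiles, and the tile size and function names are illustrative assumptions rather than NPSNET's actual scheme.

```python
# Sketch of tile-based interest management in the spirit of NPSNET
# (Macedonia et al., 1995). Square tiles stand in for hexagons; each
# tile coordinate stands for one multicast group.

TILE = 100.0   # tile edge length in world units (illustrative)

def tile_of(x, y):
    return (int(x // TILE), int(y // TILE))

def groups_for(x, y):
    """Groups a host should join: its own tile plus the 8 neighbours,
    so spatially close objects across a border are never missed."""
    tx, ty = tile_of(x, y)
    return {(tx + dx, ty + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

def regroup(old_pos, new_pos):
    """Groups to leave and join when the user crosses a tile boundary."""
    old, new = groups_for(*old_pos), groups_for(*new_pos)
    return old - new, new - old    # (leave, join)

leave, join = regroup((95.0, 50.0), (105.0, 50.0))  # crossing one border
assert len(leave) == 3 and len(join) == 3           # one column swapped
```

Sending to only the occupied tile's group while receiving from the neighbourhood keeps outgoing traffic constant regardless of population.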

Again, the static nature of this method can produce an unbalanced population of
areas, Figure 7.3.

                         Fig. 7.3. Static spatial subdivision

Environment plays an important role in restricting and focusing human interaction.
Spatial subdivision approaches are suited to open spaces but do not take advantage of
the awareness limits imposed by buildings. Locales (Barrus, Waters, & Anderson,
1996) are areas of interest that map to physically divided spaces such as rooms in a
building, Figure 7.4. This approach relies on the adequate provision of resources to
support a crowded room and again suffers from its static nature. It is, however,
sufficient for many applications.

                                  Fig. 7.4. Locales

Dynamic Spatial Subdivision:
The above approaches rely on an even distribution of users across statically defined
areas of interest. This suits them to particular implementations and restricts their
general applicability. Dynamic space subdivision attempts to redefine divisions
between areas of interest in order to balance the number of users in each, Figure 7.5.
Robinson et al. (2001) divide the environment into a two dimensional mesh or three
dimensional lattice and moves the boundaries between the areas of interest to balance
membership. Boundary movement is considered when an area becomes over
populated and is determined through negotiation between servers dedicated to
adjacent interest areas. Robinson’s algorithm considers the cost of moving a boundary
to both servers and clients.
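A much simplified, one-dimensional version of such boundary negotiation might look like this; the step size and imbalance threshold are invented parameters, not Robinson's actual cost model.

```python
# Sketch of dynamic boundary adjustment between two adjacent areas of
# interest, loosely following Robinson et al. (2001). One boundary on
# the x axis; threshold and step are illustrative parameters.

def rebalance(boundary, xs, step=10.0, threshold=4):
    """Shift the boundary toward the less populated side when the
    population imbalance exceeds the threshold."""
    left = sum(1 for x in xs if x < boundary)
    right = len(xs) - left
    if left - right > threshold:
        return boundary - step     # shrink the crowded left area
    if right - left > threshold:
        return boundary + step     # shrink the crowded right area
    return boundary

xs = [5, 10, 20, 30, 40, 45, 48, 120]   # user positions; left side crowded
b = 100.0
while True:                              # negotiate until balanced
    nb = rebalance(b, xs)
    if nb == b:
        break
    b = nb
assert abs(sum(1 for x in xs if x < b) * 2 - len(xs)) <= 4
```

A real implementation would also weigh the cost of the client handovers each boundary move causes, as Robinson's algorithm does.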

                         Fig. 7.5: Dynamic spatial subdivision

Interest may be determined at the granularity of object pairs by determining their
potential for interaction based on spatial proximity. Spatial proximity may be
efficiently detected by placing auras around objects and checking for aura
intersection. In the case of avatars, this potential for interaction is increased when they
face each other. (S. D. Benford & L. E. Fahlén, 1993) encapsulate avatars in auras
and use aura collision as a prerequisite for interaction. Within the aura, focus and
nimbus spatially define attention and projection respectively, Figure 7.6. Both focus
and nimbus reach out in front of the avatar but have distinct shapes.
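A minimal sketch of aura-based awareness testing, assuming circular auras and a simple cone as a stand-in for the shaped focus and nimbus volumes described above (the radii, angles and function names are illustrative):

```python
# Sketch of aura-based awareness (Benford & Fahlén, 1993): cheap
# sphere tests for aura collision, with facing used to approximate
# the forward-reaching focus/nimbus shapes. All parameters illustrative.

import math

def auras_collide(p1, r1, p2, r2):
    """Aura collision is the prerequisite for any interaction."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return math.hypot(dx, dy) <= r1 + r2

def facing(pos, heading, other, half_angle=math.radians(60)):
    """True if `other` falls inside a cone in front of the avatar,
    a crude stand-in for the shaped focus and nimbus volumes."""
    angle = math.atan2(other[1] - pos[1], other[0] - pos[0])
    diff = (angle - heading + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_angle

a, b = (0.0, 0.0), (3.0, 0.0)
assert auras_collide(a, 2.0, b, 2.0)     # auras overlap: interaction possible
assert facing(a, 0.0, b)                 # a faces b (heading along +x)
assert not facing(b, 0.0, a)             # b faces away from a
```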

                          Fig. 7.6. Aura – Focus and Nimbus

Both tiles and Löcales are specific definitions of how to divide the environment and
are applicable to distinct forms of interaction and application genres. MASSIVE II
confines aura based awareness within abstract regions which may be mapped to
application specific definitions of interest. Figure 7.7 depicts one possible way of
dividing an environment into regions.

                                  Fig. 7.7. Regions

In the real world people are able to reason at different levels of granularity. For
example, a lecturer must be aware of the attention and understanding of each student
during a lecture whereas a university chancellor sees the institute in terms of
departments. This approach of aggregation may be adopted in CVEs to further
increase the scalability. Aggregation reduces not only the rendering but also the
amount of information needed by some observing processes. For example, in a
battlefield simulation, the driver of a tank is interested in other tanks whereas a
general is more concerned with tank divisions (Singhal & Cheriton, 1996).
Another example is that of a crowded stadium represented by a single avatar
(Greenhalgh, 1999). The size of the group, the team they support, and the sound they
produce, are represented through the avatar’s size, colour and aggregated audio
streams respectively. Emergent behaviour may be replicated and communicated in
aggregated form to reduce the load on the network. For example, the behaviour of a
flock of birds could theoretically be replicated by simply communicating the size of
the group and then continuing to communicate the movements of whichever bird is in
front. A reasonable flocking behaviour can then be replicated at each site through
application of local rules based on following and collision avoidance. This aggregated
emergent behaviour may be applied to many other group behaviours, for example the
behaviour of a human crowd. A similar principle can be applied to an avatar allowing
the majority of body movement to be calculated locally and driven by the
communication of movement of selected body points, such as the head and hands.
Here, a combination of kinematics and selections of previously recorded motion
tracking data can be used to improvise reasonable local behaviour based on head and
hand movement.
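The flock example can be sketched with a one-dimensional follow-the-leader rule: only the lead bird's position crosses the network, and each site reproduces the rest of the flock locally. All parameters below are illustrative.

```python
# Sketch of aggregated emergent behaviour: only the lead bird's
# position is communicated, and each site reproduces the flock with a
# local follow-the-leader rule. Spacing and gain are illustrative.

def step_flock(leader, followers, spacing=1.0, gain=0.5):
    """Move each follower part way toward its slot behind the bird
    in front (purely local rule: generates no network traffic)."""
    out, ahead = [], leader
    for f in followers:
        target = ahead - spacing          # desired slot behind the bird ahead
        f = f + gain * (target - f)       # ease toward it
        out.append(f)
        ahead = f
    return out

flock = [0.0, -1.0, -2.0]                 # local follower positions
for leader in range(1, 20):               # leader positions received over net
    flock = step_flock(float(leader), flock)

# The flock stays an ordered line behind the leader at every site.
assert all(f2 < f1 for f1, f2 in zip([19.0] + flock, flock))
```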

In order to reduce network traffic through aggregation it is necessary for the sender to
know the level of aggregation. Although aggregation can increase the scalability of a
receiving process it can decrease the scalability of the sender and the use of the
network when many receivers require distinct levels of aggregation for the same
objects (Roberts, 1996).

We have seen how scalability may be increased by reducing the number of
communicating objects held on each machine. Scalability can be further increased by
managing the detail at which individual objects are replicated. Heuristics of interest
such as distance or the relationship between the role of the observer and the use of the
observed may be applied. Many graphics languages, such as, Inventor, Performer and
VRML, support Level Of Detail (LOD) modelling where sufficient frame rate is
maintained by reducing the graphical complexity of distant objects. The scalability of
communication and computation can be greatly increased by applying this reasoning
to the communication of behaviour. Objects may be defined in terms of attributes in
which remote processes can dynamically express and decline interest, for example as
defined in IEEE 1516 and implemented in the DMSO RTI. Balancing the detail of
communicated behaviour with the interest of remote users is an important, if under
researched, topic. The amount of information being received may be reduced through
local filtering or sending control messages back to the sender. The latter approach
again suffers from the potential need to send distinct levels of information to different
receivers. A hybrid approach might send the highest detail required by any to all and
allow receivers to filter further.
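The idea of expressing and declining interest in individual attributes can be sketched as follows. The class and attribute names are invented for illustration and are not the IEEE 1516 API.

```python
# Sketch of attribute-level interest declaration in the spirit of
# IEEE 1516 / DMSO RTI declaration management. Names are illustrative.

class Publisher:
    def __init__(self):
        self.interests = {}    # receiver -> set of attribute names

    def subscribe(self, receiver, attrs):
        """A remote process expresses interest in selected attributes."""
        self.interests[receiver] = set(attrs)

    def update(self, attrs):
        """Send each receiver only the attributes it declared
        interest in, reducing traffic to low-detail observers."""
        return {rx: {k: v for k, v in attrs.items() if k in wanted}
                for rx, wanted in self.interests.items()}

tank = Publisher()
tank.subscribe("driver_view", {"position", "turret", "damage"})
tank.subscribe("general_view", {"position"})          # aggregate observer
out = tank.update({"position": (3, 4), "turret": 90, "damage": 0.1})
assert out["general_view"] == {"position": (3, 4)}    # detail filtered out
assert "turret" in out["driver_view"]
```

Note the drawback discussed above: the sender must track each receiver's interest set, which costs it scalability when many receivers want distinct levels of detail.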

2.3 Persistence
Users can join, leave and rejoin collaborative virtual environments at will. When in
the environment, they can affect its state through interacting with, and introducing,
objects. A real world analogy is a bank account. When someone deposits money into
a bank cash machine, the money should not be lost as soon as the card is withdrawn.
Persistent environments will maintain the effect of changes when the user leaves.
Supporting persistence is straightforward when the underlying CVE infrastructure
hosts all master objects on servers. Where a localized approach has been adopted to
increase scalability or responsiveness, master objects will be held in the memory of a
user’s machine. These must be moved to a participating machine when the user
leaves. Provided the behaviour of an object is known at the target site, it is only
necessary to move the current state and mastership of the object.

There are two basic forms of persistence: state and evolutionary. State persistence
maintains an object in a static state once its owner has left. Evolutionary persistence
will support the ongoing behaviour of an object once its creator has left. For example,
in a lecturer’s bank account which is always overdrawn, the money deposited will be
reduced over time by interest payments.
So far we have only considered what happens to objects when a user leaves. We must
also consider the effect of the environment going off line. Such an occurrence may be
planned or accidental. In either case we may wish to guarantee persistence. One
solution is to store object state information to disk on a persistency server both
periodically and, where possible, when an imminent failure is predicted.
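A persistency server's periodic snapshot might be sketched like this, assuming JSON-serialisable object state; the atomic write guards against a crash corrupting the last good snapshot. File layout and names are illustrative.

```python
# Sketch of a persistency-server snapshot: object state is serialised
# to disk so the world survives going off line. The write-then-rename
# pattern keeps the last good snapshot intact if a crash interrupts.

import json, os, tempfile

def snapshot(world, path):
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(world, f)
    os.replace(tmp, path)          # atomic rename on POSIX

def restore(path):
    with open(path) as f:
        return json.load(f)

world = {"door": {"open": True}, "account": {"balance": -50}}
path = os.path.join(tempfile.mkdtemp(), "world.json")
snapshot(world, path)
assert restore(path) == world      # state survives a restart
```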

2.4 Communication
Previous sections have introduced the kind of information that must be passed through
a CVE and we have described object level mechanisms for managing this
information in order to maximise responsiveness and scalability. We now move down
into the message level to examine how to actually communicate this managed
information.

The communication requirements of a CVE are complex. Those of responsiveness,
reliability and scale of information transfer differ greatly depending on application,
context and scenario. Before we describe the method of communication we must look
at the content. We now examine some typical forms of information and their
requirements on the underlying communication system. This is broken down into:
discovery of objects; events; audio and video.

Discovering Objects:
When a client alters its awareness by entering a new world or area of interest, it must
discover the objects within. Some mechanism is required for the client to obtain all
the information about every object it discovers. This information includes state,
behaviour and graphical appearance. Behaviours, and particularly appearance
descriptions, tend to be much larger than state, but in most systems remain unchanged
throughout the lifecycle of an object. Such information is typically in the order of
kilobytes per object. Usually such data only needs to be sent to one client at a time
and it must be sent reliably, in order and preferably efficiently. Users frequently move
between areas of interest, which results in traffic bursts as the local system downloads
object state and possibly appearance and behaviour. In turn this can result in delays
often reaching several seconds. It is therefore important to use an awareness
management scheme that minimises movement between areas as well as the number
of objects in each. Some systems, for example DIVE (Frécon & Stenius, 1998),
download from an existing peer process but this can cause that process to lock, which
is disorientating for its user. The responsiveness of remote peers may be maintained
by obtaining all object information from a persistence server.

The behaviour of objects is driven and communicated by events. Events need to be
propagated to any interested process as quickly as possible. They are typically very
small in terms of network bandwidth. Many events are frequent and quickly
superseded. Others are infrequent and their loss might cause applications or users to
act in an erroneous way.
The majority of events typically describe movement. Constant latency is important as
it improves the realism of remote movement. As discussed above, in the context of
event ordering, it is typically more important to reflect the current position than how
the object came to it. Since we typically send many movement events for a given
object each second, and the probability of message loss is low, lost movement events
will seldom be noticed. An important exception to this rule is
introduced by dead reckoning where the frequency of path generation is considerably
lower. We presented a scheme for addressing this problem by reliably resending dead
reckoned paths that had not been superseded within a time limit (Roberts, 1996).
Tracking systems allow natural non-verbal communication but generate volumes of
events that are difficult to support over the network. During trials between networked
reality centres in the UK and Austria, we found it difficult to realistically approximate
human movement with dead reckoning but have had greater success limiting the
frequency of outgoing events for given objects by simple filtering.
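The simple outgoing-event filtering mentioned above can be sketched as a time-and-distance threshold on tracker updates; the thresholds are illustrative, not those used in the trials.

```python
# Sketch of outgoing-event filtering for tracker data: frequent
# movement events are dropped unless enough time has passed AND the
# object has moved enough since the last event sent. Thresholds are
# illustrative parameters.

def make_filter(min_interval=0.05, min_move=0.01):
    last = {"t": None, "pos": None}

    def should_send(t, pos):
        if last["t"] is None or \
           (t - last["t"] >= min_interval and
                abs(pos - last["pos"]) >= min_move):
            last["t"], last["pos"] = t, pos
            return True
        return False
    return should_send

f = make_filter()
# 100 Hz tracker input: position sampled every 10 ms.
sent = [f(t / 1000.0, t / 1000.0) for t in range(0, 200, 10)]
assert sent[0] is True              # first event always goes out
assert 0 < sum(sent) < len(sent)    # most tracker events filtered away
```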

Bursts of events typically accompany interaction with other avatars, objects or both.
For example, avatar communication may well include gesticulation and talking. This
results in bursts of movement events and audio traffic. Such exchanges can
occasionally swamp bandwidth and overrun receive buffers, resulting in high message
loss. This is particularly the case for groups of interacting users. Remote events can
sometimes be delayed for seconds while the CVE attempts to catch up with the
receive buffer resulting in a temporary loss of responsiveness. In this case the loss of
movement events is preferable as it brings the system back to a synchronised state in
shorter time. Some systems, for example PING, limit this time through a bucket
mechanism.

Some events may be vital, particularly where they affect the result of, or ability to
process, subsequent events. This includes any event that changes the structure of the
scenegraph. Such events are commonplace where users interact with objects. Losing
such events can cause significant divergence between users’ views. For example, one
user sees that he has taken an object out of another’s hand, while the other sees herself
still holding it. At best this causes confusion and at worst, an unrecoverable
inconsistency.

Verbal communication considerably improves the performance of general
collaborative tasks as well as the feeling of co-presence. In order for audio
communications to support human conversations it must be continuous and have
constant rate and sufficient resolution. Network bandwidth and message loss can
reduce resolution. Network jitter, where heavy network traffic causes temporary high
delays, can alter the rate at which the data is delivered. The COVEN trials suggest
that audio traffic is in the order of kilobytes per second for each user (Greenhalgh,
1999).

Video has similar requirements for continuity and rate but high resolution images can
require much higher bandwidths. Typical CVEs use video sparingly, mapping low
resolution streams to polygons. For example, a low resolution video avatar might
require tens of kilobytes per second.

Now that we have described how information needs to be communicated, we will look
at ways in which this is achieved in typical CVEs. In particular we focus on how data
is prepared for sending over the network and how it may be disseminated to one or
many recipients with various Qualities of Service (QoS) of delivery.

Before being sent over the network, data is marshalled into a flattened message. This
message is split into packets and sent across the network. Transport level network
protocols convert between messages and packets. The size of a packet is determined
by the underpinning link level protocol. The Internet Protocol (IP) adopts a maximum
packet size from the underlying network technology. In a CVE a packet might
contain one or several events and a continuous flow of packets might support an audio
or video stream.
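Marshalling an event into a flat message can be sketched with a fixed binary layout; the field set (object id, sequence number, position) is an invented example, not any particular CVE's wire format.

```python
# Sketch of marshalling a movement event into a flattened message
# using the standard-library struct module. Field layout (object id,
# sequence number, x, y, z) is illustrative.

import struct

FMT = "!Ii3f"     # network byte order: uint32 id, int32 seq, 3 floats

def marshal(obj_id, seq, pos):
    return struct.pack(FMT, obj_id, seq, *pos)

def unmarshal(msg):
    obj_id, seq, x, y, z = struct.unpack(FMT, msg)
    return obj_id, seq, (x, y, z)

msg = marshal(42, 7, (1.0, 2.0, 0.5))
assert len(msg) == 20                        # well under any IP packet size
assert unmarshal(msg) == (42, 7, (1.0, 2.0, 0.5))
```

Several such small events can share one packet, while a continuous flow of packets carries an audio or video stream.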

Dissemination determines if a message is sent to a single or a group of recipients.
Distinct forms of dissemination are offered by various transport level protocols.
Group communication multicast is often used in CVEs to scale the number of users.
Multicast allows hosts to express interest in any set of communication groups. A
message sent to a group will be distributed to every member at no extra cost to the
sender. The scalability of the sender is maximised while the scalability of the receiver
can be increased by mapping awareness to groups. Group dissemination may
alternatively be implemented above point-to-point protocols by sending a message to
a set of connections, for example in Spline (Waters et al., 1997) and PaRADE
(Roberts, Strassner, Worthington, & Sharkey, 1999).
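Joining and leaving a multicast group on entering or leaving an area of interest can be sketched with the standard sockets API; the group address and port are arbitrary illustrative values.

```python
# Sketch of expressing interest in a multicast group with the standard
# sockets API, as a CVE process might on entering an area of interest.
# Group address and port are illustrative.

import socket, struct

def join_group(group, port):
    """Create a UDP socket subscribed to `group` on any interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                         socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the kernel to join the group: messages sent to it are now
    # delivered here at no extra cost to the sender.
    mreq = struct.pack("4sl", socket.inet_aton(group), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock, mreq

def leave_group(sock, mreq):
    """Leaving the area of interest: drop membership and close."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
    sock.close()

# Usage: sock, mreq = join_group("239.1.2.3", 5007)
#        data, addr = sock.recvfrom(1500)   # receive area events
#        leave_group(sock, mreq)
```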

Not all packets that are sent arrive, or arrive in the correct order. Their subsequent
assembly into a message, and delivery to the application, may be delayed while these
errors are overcome. QoS determines what criteria, in terms of reliability, order and
timeliness, will be met before delivery. Generally the higher the reliability and level
of ordering, the lower the responsiveness and scalability. This is particularly the case
for group communications. Different transport level protocols offer distinct QoS in
addition to dissemination. Some systems, for example, DIVE and PaRADE,
implement additional or improved qualities of services above the transport level.

Mapping Information to Dissemination and Delivery:
We have seen how various types of information in a CVE require distinct levels of
dissemination and QoS. Some CVEs simplify their design by using single
dissemination and delivery methods and accept the drawbacks. Others, for example
HLA RTI, PaRADE, PING and DIVE, combine various dissemination and QoS
delivery methods to optimise performance. Table 7.5 suggests how a CVE might
map information type to dissemination and QoS. This table is derived from combining
best practice of PaRADE, PING and DIVE.

 Type        Example           Reliability   Order    Responsiveness   Dissemination   Throughput
 Downloads   Object            HIGH          HIGH     LOW              ONE             HIGH
 Regular     Movement          LOW           LATEST   HIGH             MANY            MEDIUM
 Irregular   Object creation   HIGH          HIGH     HIGH             MANY            LOW
 Audio       Verbal            LOW           LATEST   CONSTANT         MANY            MEDIUM
 Video       Facial            LOW           LATEST   CONSTANT         MANY            MEDIUM
Table 7.5. How a CVE might map information type to dissemination and QoS

Managing the mappings between information, dissemination and QoS becomes
complex when an environment contains many users dynamically moving between
areas of interest. The channel abstraction may be used to map dissemination to QoS.
For example in PING, events are routed to channels according to their type and the
current area of interest. Some events may theoretically be sent down many channels,
for example unreliably to user machines and reliably to a persistence server.
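The channel abstraction can be sketched as a lookup from event type and area of interest to one or more dissemination/QoS pairings; the mappings below are invented for illustration and are not PING's actual configuration.

```python
# Sketch of the channel abstraction: events are routed by type and
# area of interest to channels, each fixing a dissemination and QoS
# pairing (cf. Table 7.5). All mappings are illustrative.

CHANNELS = {
    ("movement", "area1"):   {"reliable": False, "to": "group:area1"},
    ("creation", "area1"):   {"reliable": True,  "to": "group:area1"},
    ("movement", "persist"): {"reliable": True,  "to": "one:server"},
}

def route(event_type, area):
    """All channels an event should be sent down: its area-of-interest
    channel plus any persistence channel for that event type."""
    return [ch for (t, a), ch in CHANNELS.items()
            if t == event_type and a in (area, "persist")]

chans = route("movement", "area1")
assert any(not c["reliable"] for c in chans)   # unreliably to user hosts
assert any(c["reliable"] for c in chans)       # reliably to persistence
```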

3. Architecture
We have introduced the basic requirements and realities of communication within
CVE and outlined principles used to balance the two sufficiently to support fruitful
collaboration between users socially situated in an information context. This section
provides case studies of two example systems, DIVE and PING, describing each in
terms of modularised architecture and use of principles. The Distributed Interactive
Virtual Environment (DIVE) is a widely adopted CVE platform that implements most
of the principles we introduced. The Platform for Interactive Network Games (PING)
attempts to bring together best practice from CVE architecture. Although the latter is
still in beta prototype stage, its design provides a good tool for explanation.

3.1 DIVE
The Distributed Interactive Virtual Environment (DIVE) has a classic architecture
consisting of seven modules, Table 7.6. Each represents a conceptual level and is
implemented as a unique library. This provides flexibility when updating the platform.

 Modules      Description
 Video        Allows video to be texture mapped to polygons in the scene
 Audio        Audio supporting conversations between users as well as attaching
              sounds to objects
 Graphics     3D rendering of the graphical representation of objects and thus scene
 Aux          Tools for the application building including the scripting language
 Core         Object database and supporting functionality such as time and events
 Sid          Communication
 Threads      Thread library provides concurrence at each computer

Table 7.6. Modules of the DIVE architecture

DIVE introduces, adopts and adapts most of the principles described in the previous
section. Best practice solutions have been added and iteratively improved for more
than ten years. Widely used in research, this platform has proved the principles.

DIVE uses localisation to maximise local responsiveness and make best use of the
network. We will now look at the particular design decisions taken in implementing
this localisation within a framework of the principles outlined in the previous section.

Object Model:
The responsiveness of a user’s interactions with the environment is maximised
through object replication, negating the need for events to be passed across the
network before the local model is updated. The replicated object database resides on
participating machines according to awareness management. A replication is
organised into a hierarchy of objects, each of which contains state information and may
be attached to behaviour scripts and graphical appearance. A scenegraph is coupled to
a local replication and mirrors those qualities of objects necessary for rendering. An
application reads and writes to a replication regardless of whether other replicas exist.

A simple reactive behaviour associates triggers and responses to objects. An object’s
behaviour is defined in an attached script. The script language, DIVE/TCL, extends
the Tool Command Language (TCL) to include useful commands for monitoring and
updating objects. Typed events may be triggered through a user input device, collision
with another object, timer and world entry. An interest in events may be expressed
through event callbacks and responses mapped to event types. For example, an
application programmer can register an interest in collision events for an avatar and
define distinct responses to various types of collided object. Behaviour scripts are
replicated along with the object. This allows objects to react to local interactions
without network induced delay. Remote scripts are not called directly but through the
communication of the same event that triggered them locally. The concept of dead
reckoning is supported but the implementation of the algorithm left to the application
programmer. Each object is able to store a parametric path from which the current
position may be calculated. Use of the path to communicate and calculate current
position is, however, optional. This flexibility is useful, as not all objects move in a
predictable way.
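A parametric path of this kind can be sketched minimally as linear extrapolation from the last communicated update; DIVE leaves the dead reckoning algorithm to the application programmer, so the fields and names here are illustrative.

```python
# Sketch of a parametric path of the kind a DIVE object can store:
# current position is calculated locally from the last communicated
# path rather than from a stream of position events. Linear motion
# only, for brevity; field names are illustrative.

class ParametricPath:
    def __init__(self, t0, pos, vel):
        # One network update carries start time, position and velocity.
        self.t0, self.pos, self.vel = t0, pos, vel

    def position_at(self, t):
        # Dead reckon: extrapolate along the path, no network needed.
        return tuple(p + v * (t - self.t0)
                     for p, v in zip(self.pos, self.vel))

path = ParametricPath(0.0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
assert path.position_at(2.0) == (2.0, 0.0, 0.0)   # 2 units along x
```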

High responsiveness comes at the cost of low consistency. Replicas are loosely
coupled allowing divergence and attempting convergence over time. For example,
when a user moves an avatar the remote representation will follow, delayed by the
network, and catch up when the avatar stops moving. There is no strong concept of
object ownership.

There is no specific concurrency control within DIVE. Hence an object may be
affected in conflicting ways by multiple users, causing the replicas to diverge.
Mechanisms are provided to settle an object to a mean position after being pulled in
opposing directions. Users, however, observe the object jumping wildly between them
until the steady state is reached. A loose form of ownership allows an object to be
attached to an avatar. Other users can still affect the object, for example, by changing
its relative position to the carrying avatar. An immersive extension to DIVE, Spelunk
(Steed, Mortensen, & E., 2001), implements concurrency control through object
locking.

Partial causal ordering is implemented at the communication level and is therefore
described below.

Awareness is managed at the world level as well as within the world through division
of the object model hierarchy. All replicas must hold the root object but can be
selectively pruned by local interest. Branches may be assigned to interest groups in
which the application may express interest. This low level approach can support any
higher level awareness management scheme that maps to the organisation of the
hierarchy. Both subjective views (Snowdon, Greenhalgh, & Benford, 1995) and aura
based focus and attention (S. D. Benford & L. E. Fahlén, 1993) have been
implemented above DIVE.

Level Of Detail (LOD) is partially supported. Composite objects comprise a tree of
objects within the hierarchy, therefore, interest based tree pruning may be used to
reduce their complexity. However, this does require some scripting. The LOD of an
atomic object can only be switched within the graphics module. Thus, without custom
scripting, LOD affects appearance and rendering performance as opposed to
behaviour and network traffic. The default renderer supports distance based LOD
switching. Adaptive rendering was incorporated in the COVEN extension to DIVE
(Frécon, Smith, Steed, Stenius, & Stahl, 2001). Here, distance culling and iterative
rendering techniques can alter the detail of the rendered scene to meet specified frame
rates.

Aggregation is not directly supported but again could be implemented at the
application level by making use of interest management, this time to switch from a
sub tree to an alternative atomic object.

Objects are not owned and thus can be created by an application and left in the world
once the application has closed. Any application can remove the object from the
world. By default, clients are responsible for persistency. Early versions had no
persistency servers but an object will remain in the world as long as one replication of
it exists. Later versions of DIVE incorporated persistency servers. Object behaviour
can be defined by scripts that are replicated along with the object at each host.
Evolutionary persistence is maintained through the continued triggering and execution
of scripts. The triggering events can come from the object itself or from other objects
in the world.

DIVE uses a combination of point-to-point and group communication. The point-to-
point protocol (TCP) is reliable and ordered. Group communication is supported at
two qualities of service: unreliable, and partially reliable and ordered. IP Multicast
provides the former and is extended into Scalable Reliable Multicast (SRM) for the
latter.

Discovering Objects:
The first client to enter a world downloads the initial world from an internet location.
Subsequently entering clients obtain the current world from a peer. This approach
allows an up-to-date world to be downloaded immediately without the need for a
world server. A downside of this approach is that the peer from which the world is
obtained freezes while sending data. This typically takes tens of seconds depending
on the complexity of the world. Later versions of DIVE address this problem by
allowing downloads from persistency servers instead of from clients.
Clients can create objects at any time and must inform peers when doing so. When a
client discovers a new object, either through a creation or update message, it may
request the object. With the exception of the first client download, all requests and
downloads are done over SRM. An algorithm attempts to transfer objects from the
nearest client in terms of network delay.
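The nearest-client transfer can be sketched as a simple selection over measured delays. This is an illustrative reconstruction; the function, peer names and delay figures are hypothetical, and the real DIVE algorithm is more involved.

```python
# Sketch of "fetch the object from the nearest client": among the peers
# holding a replica, pick the one with the lowest measured network delay.
# Peer names and delay figures are made up for illustration.
def nearest_peer(replica_holders, measured_delay_ms):
    known = [p for p in replica_holders if p in measured_delay_ms]
    if not known:
        # No delay measurements yet: fall back to any holder.
        return replica_holders[0] if replica_holders else None
    return min(known, key=lambda p: measured_delay_ms[p])

holders = ["peerA", "peerB", "peerC"]
delays = {"peerA": 120.0, "peerB": 15.0, "peerC": 40.0}
print(nearest_peer(holders, delays))  # -> peerB
```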

All events are sent using SRM. Partial reliability and ordering are mapped to three
event categories: movement, geometry and general. By default, all three are set to
reliable. Each object has a causal counter which is stamped onto outgoing messages.
Partial ordering and reliability of events are implemented within SRM. Partial
ordering ensures that two events from the same object are delivered in the order they
were sent but the same is not guaranteed for events from distinct objects. Partial
reliability guarantees that if a lost event is detected through arrival of a later event, the
object state may be requested. The assumptions made are that event loss and disorder
are rare and that the partial reliability and ordering are thus sufficient to converge the
databases over time. Both reliability and ordering are achieved through object
sequencers, providing a high level of concurrency and thereby reducing the effect
on responsiveness. Message loss is detected by the receipt of an unexpectedly high
sequence number.
When this occurs, the state of the object is requested rather than the set of lost
updates. The downside is that loss is not detected until a subsequent event from the
same object arrives. The loss of events for infrequently updated objects can make
some applications unworkable. For example, a door might be unlocked by one user
but remain locked to another. Furthermore, a dead reckoned path can result in
considerable divergence if a subsequent path event is lost.
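The sequencer-based loss detection described above can be sketched as follows. The class and the state-request callback are hypothetical stand-ins for the SRM machinery; only the gap-detection logic follows the text.

```python
# Sketch of per-object sequencer loss detection: a receiver that sees a gap
# in an object's sequence numbers requests the object's full state rather
# than the individual lost updates. Names are illustrative, not from DIVE.
class ObjectReplica:
    def __init__(self, object_id):
        self.object_id = object_id
        self.last_seq = 0
        self.state = {}

    def on_event(self, seq, update, request_state):
        if seq <= self.last_seq:
            return  # duplicate or stale event: discard it
        if seq > self.last_seq + 1:
            # Gap: at least one event was lost; fetch the whole state.
            self.state = request_state(self.object_id)
        else:
            self.state.update(update)
        self.last_seq = seq

def request_state(object_id):
    # Stand-in for an SRM state request answered by a peer replica.
    return {"door": "unlocked"}

door = ObjectReplica("door-1")
door.on_event(1, {"door": "locked"}, request_state)
door.on_event(3, {"angle": 90}, request_state)  # seq 2 was lost
print(door.state)  # -> {'door': 'unlocked'}
```

Note the weakness the text points out: the loss of event 2 is only noticed when event 3 arrives; had the door object stayed silent, the replica would have remained locked indefinitely.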

Audio and Video:
Both audio and video are streamed across unreliable multicast. Responsiveness and
consistency are relaxed to allow the constant delivery rates suitable for human
communication. Each world has a unique multicast group for audio and another for
video. Sound is spatialised so that objects and avatars are heard from where they are
located in the world.

3.2 PING
The Platform for Interactive Network Games (PING) was developed by a European
consortium headed by France Telecom. It combined many best practice principles into
a scalable architecture implemented as a communications infrastructure for support of
massive multi-player games. Figure 7.8 and Table 7.7 summarise the PING
architecture in terms of modules.

 Modules        Description
 Entities       Interfaces replicated persistent objects to the application program
 Replication    Manages the replication of objects, including life cycle and update
 Persistence    Maintains static persistence above stable storage
 Consistency    Balances synchronisation with responsiveness
 Interest       Manages awareness in terms of world subdivision
 Communication  Supports message passing between processes
 Core           Provides core services used by, and linking, the other modules

Table 7.7 Modules of the PING architecture


                         Figure 7.8. The PING architecture


Object Model:
At each process, objects are replicated by the replication service according to
awareness determined by the interest service. The entities management service
interfaces replicated persistent objects to the application program. It provides
selective transparencies of distribution and replication. The replication service is
responsible for the life cycle management of replicas and makes use of the
consistency service to update replicas. The object model comprises both data objects
and reactive objects. Data objects hold state information. Reactive objects are data
objects which embed a reactive behaviour. Data objects may be shared and may also
be made persistent. Sharable objects contain a selection of sharable attributes.

Two forms of behaviour support are provided: reactive and reflective. Reactive
objects are associated with a reactive program. Within a given process, reactive
objects communicate through local broadcast of events. The reactive behaviour is
defined at the application level and then replicated within the object model by the
replication service. The reactive program defines triggers and responses in terms of
typed events.
Reflective behaviour allows the behaviour of objects to adapt to the availability and
condition of computational and network resources. This facility is not core to the
PING infrastructure but may be placed between it and the application as a filter.

The consistency service is highly configurable, supporting a range of time
management policies. The consistency module sits below the replication module and
above the event router, which in turn sits above communication. Its purpose is to
balance synchronisation with responsiveness, achieved by delaying the sending or
delivery of events according to some interchangeable time management policy. Each
iteration of the local simulation process is synchronised by a tick. This tick causes
events held in the consistency module to be delivered to the replication service
according to the time management policy. Supported policies fall into two categories:
non-causal and causal. Non-causal strategies are receive order, time stamp and
predictive.
Receive order simply delivers events to the replication in the order received. Time
stamp delivers them in the order that they were created. Predictive delivers predicted
events at the predicted time thus overcoming some effects of network latencies. The
sending of predicted events may be delayed to reduce the likelihood of erroneous
predictions. Causal order may be guaranteed with policies that define causality in
terms of awareness or interaction. Some causal policies are based on object
sequencers and so use the exchange of sequencers to provide concurrency control.
More general concurrency control is offered as a core service of PING from outside the
consistency module. These include read and write locking of objects using either a
pessimistic or optimistic approach as well as a choice of explicit or implicit locking.
Pessimistic concurrency control prevents inconsistencies whereas the optimistic
approach resolves them. The former is generally better for human in the loop real time
systems and is used in PING by default.
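The tick-driven policy delivery can be sketched as follows, with only the receive-order and time-stamp policies shown. This is an illustrative reconstruction; the class and function names are hypothetical, not PING's API.

```python
# Sketch of interchangeable time-management policies: the consistency
# module buffers events and, on each simulation tick, delivers them to the
# replication layer in an order chosen by the active policy.
def receive_order(buffered):
    return list(buffered)  # deliver in the order events arrived

def time_stamp_order(buffered):
    return sorted(buffered, key=lambda e: e["created"])  # creation order

class ConsistencyModule:
    def __init__(self, policy):
        self.policy = policy
        self.buffer = []

    def receive(self, event):
        self.buffer.append(event)

    def tick(self):
        """Deliver buffered events according to the policy, then clear."""
        delivered = self.policy(self.buffer)
        self.buffer = []
        return delivered

cm = ConsistencyModule(time_stamp_order)
cm.receive({"id": "b", "created": 2.0})
cm.receive({"id": "a", "created": 1.0})  # arrived late but created first
delivered = cm.tick()
print([e["id"] for e in delivered])  # -> ['a', 'b']
```

A predictive policy would fit the same interface, delivering predicted events at their predicted times.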
The interest management service provides support for world subdivision policies
which may be defined at the application level. The role of this module is to manage
dynamic grouping, determine the set of object replicas needed within the local process
and inform other processes of changes in interest through the generation of events.
Neither control of Level of Detail nor aggregation is supported within the interest
management service.

Persistency is provided at two levels relating to static and evolutionary persistence.
Static persistence is supported over stable storage and is guaranteed when all
processes have exited. Evolutionary persistence maintains and evolves objects as long
as one replication of them exists in any process.


Discovering objects:
The discovery of objects is directed by the interest management service. The
replication service is responsible for the life cycle of each replica and must thus fetch
an object to a process when it is originally discovered. A local caching service is
provided to overcome problems associated with frequent rediscovery, which may be
caused by re-crossing the same interest borders.

Events are used to synchronise replica updates as well as communicate system
messages. These events are synchronised by the consistency service described above.
An event router service takes outgoing events from the consistency service and uses
interest management to direct them to appropriate communication channels. Channels
provide Application Level Framing (ALF) to map events to particular dissemination
groups and qualities of service. The granularity of the ALF is that of an object. The
event router maps unique object identifiers to channels using tables that are updated
according to the interest management service.
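A minimal sketch of this routing table follows, assuming a simple mapping from object identifier to a named channel; the channel names and the API are hypothetical.

```python
# Sketch of the event router: outgoing events are mapped from unique object
# identifiers to channels (dissemination group plus quality of service)
# using a table kept up to date by interest management. Names are made up.
class EventRouter:
    def __init__(self, default_channel="default-channel"):
        self.table = {}  # object id -> channel
        self.default_channel = default_channel

    def update_interest(self, object_id, channel):
        """Called when interest management regroups an object."""
        self.table[object_id] = channel

    def route(self, event):
        # ALF granularity is the object: every event carries its object id.
        return self.table.get(event["object_id"], self.default_channel)

router = EventRouter()
router.update_interest("avatar-7", "region-east/reliable")
print(router.route({"object_id": "avatar-7", "type": "movement"}))
print(router.route({"object_id": "unknown-1", "type": "general"}))
```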

Various underpinning transport level protocols are used including UDP, TCP and IP
multicast. SRM offers object level reliability and ordering above the latter. Channels
hide the choice of protocol from the services above. The infrastructure
may be configured to implement reliability and ordering at either the consistency or
communication level. The basic requirement of reliability on the communication
service is that it will not deliver an incomplete or corrupted event to the consistency
service.

4. Deployment
CVEs bring together people, possibly from distinct geographical places, into a shared
information space. We have shown how the environment may be replicated across
many processes and synchronised through event communication. So far we have
assumed that all the machines are connected to some network of reasonable
bandwidth which allows them to communicate using a combination of peer-to-peer
and group communication and using varying qualities of service. Unfortunately the
use of a current wide area network, such as the Internet, introduces problems that
must be addressed when deploying an IIS over them. This section considers the
impact of real world problems of deployment on the existing Internet. These include
firewalls, modems and the lack of Multicast capability on the Internet. We consider
three idealised approaches to deployment: point-to-point, tunnelled group, and hybrid.

Firewalls have become essential to maintain the security of corporate and academic
networks connected to the Internet. A firewall restricts access to selected port
numbers, protocols and remote sites. It is unlikely that a CVE process can
communicate through a firewall without some adjustment or help.

CVEs may allow inclusion of users from home or mobile computers. Such computers
connect to the Internet using modems which typically offer low bandwidth compared
to corporate and academic networks. Furthermore, modems offer only a point-to-point
connection.

Multicast is supported on most local area networks (LANs) but is currently not
supported on much of the Internet. This is because of problems with scaling routing
strategies and global management of the address space as well as the large number of
legacy routers in use.

4.1 Point-to-point
The traditional approach to distributed processing on the Internet is based on the
simple client server model, Figure 7.9. This approach has been popular in supporting
public IIS applications such as social meeting places and games. This popularity
arises from the simplicity of access and reliability offered by restricting
communications to point-to-point connections as well as the simplicity of security,
maintenance and consistency offered by servers. Clients connect to servers that
maintain the current state of the environment. Scalability is increased by mapping
servers to worlds or awareness management subdivisions. Servers decide the true state
of the environment and thus simplify concurrency control. Many offer persistence.
Home or wearable computers connect to the Internet through modem links and
Internet Service Providers. Those on LANs connect through corporate routers.
Although this model is fundamentally less scalable than those using group
communication, some games applications have boasted tens of thousands of
simultaneous users by mapping awareness management to servers and relaxing
consistency (Ultima Online, 2003).

               Fig. 7.9. Point-to-point deployment across the Internet
4.2 Tunnelled Group
Group communications mapped to peer-to-peer distribution is generally more scalable
than point-to-point, Figure 7.10. It does, however, complicate the development and
deployment of a CVE. This approach is dominant in research and defence simulation
training that both aim for optimal rather than simple solutions and, furthermore, do
not make wide use of low cost modems. To use multicast across the Internet it is
currently necessary to join some Multicast backbone, such as MBone (Introduction to
the MBone, 2002), or to deploy a private equivalent. Multicast backbones use an
approach called tunnelling. Each connected LAN has a tunnel process that converts
between multicast and point-to-point network packets. Multicast packets are captured
by a tunnel process, encapsulated in IP packets, and sent through firewalls and across
the Internet to peer tunnel processes on remote LANs that strip off the IP headers and
redistribute as multicast. Private tunnels typically offer high security and low latency
compared to tunnelling across public backbones.
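The encapsulation step can be sketched as a pair of wrap/unwrap functions. This is purely illustrative; real tunnels (e.g. on the MBone) use IP-in-IP or UDP encapsulation with proper headers, which the dictionaries below merely stand in for.

```python
# Sketch of multicast tunnelling: a tunnel process captures a multicast
# packet on its LAN, wraps it in a point-to-point packet addressed to the
# peer tunnel, which strips the outer header and re-multicasts the payload.
def encapsulate(mcast_packet, peer_tunnel_addr):
    return {"outer_dst": peer_tunnel_addr, "payload": mcast_packet}

def decapsulate(unicast_packet):
    return unicast_packet["payload"]

packet = {"mcast_group": "224.1.2.3", "data": b"avatar update"}
wrapped = encapsulate(packet, "tunnel.remote-lan.example")
# The payload crosses firewalls and the Internet unchanged:
print(decapsulate(wrapped)["mcast_group"])  # -> 224.1.2.3
```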
The servers can be placed at any LAN or stand alone computer connected via a
tunnel. Servers provide either initial or persistent worlds but are not generally
responsible for maintaining the true state of the environment. Maintaining this true
state is the responsibility of the clients, with the help of distributed consistency
mechanisms.

              Fig. 7.10. Tunnelled group deployment across the Internet

4.3 Hybrid
A hybrid solution, pioneered in DIVE (Frécon, Greenhalgh, & Stenius, 1999) and
refined in PING, is to allow private computers to link to multicast-connected service
providers, Figure 7.11. Let us call these IIS providers or IISPs. Tunnels link the IISPs
and other servers. An IISP is responsible for converting point-to-point communication
from a client into group multicast. Awareness management mapped to group
addresses determines which clients, IISPs and other servers receive a given message.
IISPs are positioned to minimise latency across the point-to-point link.

                  Fig. 7.11. Hybrid deployment across the Internet

5. Conclusion
Inhabited Information Systems (IIS) situate users in a social information context. In
the real world, these users may be collocated or at different geographical locations.
The unique combination of technologies in IIS provides us with unprecedented access
to information, and new ways of processing, presenting, sharing and interacting with
it. The technology maps well to social human communication, supporting not only
verbal and non-verbal communication but also novel communication through
information objects in the environment.

Both information, and the way in which users interact with and around it, must be
supported in a natural and intuitive manner. This requires the issues of
responsiveness, fidelity, consistency and scalability to be addressed. A multi-level
architecture is required to focus these issues on representation, behaviour,
synchronisation and communication.

We have described the principles of supporting these issues at each level and
described how this is done in example systems. Deploying systems over the Internet
introduces additional problems of security, bandwidth and dissemination. We have
shown three idealised models of deployment to explain how these issues may be
addressed for different applications and networks.
For reasons of space, this chapter has focused on the support of IIS that allow people to
inhabit information spaces through primarily graphical interfaces. We have not
discussed important systems such as COMRIS, where the emphasis is placed on the
large scale co-habitation of agents and people within information space and the
primary use of audio interfaces.

The underpinning technology of IIS, and particularly the communication systems, is
reaching maturity. Simple networked IIS are already in daily public and commercial
use. More advanced systems in research offer considerably higher levels of realism
and richness. Core to this is the shared interaction with dynamic and steerable
information. Most of the core principles of IIS communication are well developed and
a deep understanding of the usability of such systems is being gained. A CVE that
addresses all the issues well is yet to emerge. The time has now come for the IIS
research community to consolidate and bring together best practice at each
architectural level to develop systems to a commercial standard.
Distinct applications have diverse requirements and it is unlikely that one system will
fit all applications for the foreseeable future. However, we are yet to achieve true
realism in social interaction with information in any system. We are some way from
being able to work together in an IIS without constantly thinking about the effects of
the system, but the light at the end of the tunnel is growing brighter.

The author would like to thank his PhD students, particularly Robin Wolff and Oliver
Otto, as well as Anthony Steed and his colleagues at UCL, Emmanuel Frécon and his
colleagues at SICS, and Frederic Dang Tran and his colleagues within the PING
consortium.
