Principles of Collaborative Virtual Environments

David Roberts

1. Introduction

Inhabited Information Systems (IIS) require advanced communication infrastructures that address issues arising from the use of limited computational and network resources to place people within an interactive information space. These infrastructures are commonly called Collaborative Virtual Environments (CVE). This chapter describes these issues along with the ways in which they are typically addressed.

IIS situate people in a social and information context where they can interact with each other and with the information itself. These users, possibly in remote geographical locations, access the environment through a variety of display devices through which they gain distinct levels of presence and immersion (Slater, Steed, & Chrysanthou, 2001). Some may be co-located, seeing each other in the real world while immersed in the virtual environment, while others may be in some geographically remote location and represented locally as an avatar, a 3D graphic character capable of representing human-like communication, appearance, identity and activity.

Information presented to the user may be shared or private, objective or subjective. It may be abstracted from live data in the real world or from simulation. Users may interact with the information to adapt its presentation, content or behaviour. Information objects often provide a focus for group activity (Greenhalgh & Benford, 1999).

IIS merge the real and the virtual. Ideally the latter should possess the richness and naturalness of the former. We would like to be able to interact with remote users as if they were standing next to us. Verbal and non-verbal communication and the use of objects in the environment are primary methods of social human communication in the real world (Burgoon, Hunsaker, & Dawson, 1994).
In IIS, shared interactive information objects may be observed, used to inform, explain, teach, heal, experiment or as a basis for discussion. Sometimes it is important to see not only what each participant is doing, in relation to the shared information, but how he or she is feeling. Expressive human communication should include speech, gesture, posture, and facial expressions. Shaking hands and passing task-related artefacts, from business cards to a model of a new product, are important parts of real world group activities.

We are, however, constrained by technology, physics and cost. The constraints of computers, networks, display and acquisition devices introduce a gap between what we would like to achieve, and what is currently realisable. In practice we need to make trade-offs, by reducing realism, naturalness and content where they are not needed in order to maximise them where necessary. This is typically addressed in terms of what each user can see and in what detail, as well as the objectivity and responsiveness of interactions with shared information.

IIS applications are numerous and have diverse requirements. Specialised IIS communication architectures attempt to strike the balance within various application genres. It has been found that striking the balance within one or more genres requires a complex architecture comprising many cooperating optimisation and balancing mechanisms. Common mechanisms will be dealt with in detail later. This chapter is concerned with the systems issues of communication in IIS. That is, how we make best use of computers and networks to support co-located and geographically distinct users in an IIS.

Requirements

We set the scene by briefly introducing a number of application genres along with the balance the communication infrastructure needs to set for each. A detailed discussion of application genres is beyond the scope of this chapter and we restrict our description to Table 7.1.
The remainder of the section discusses common requirements in detail. Some architectures provide a level of configuration, and sometimes adaptation, to cope with differential application requirements, various computational resources and dynamic network characteristics.

Application genre | Maximise | Reduce
Tele-conferencing | Expressive avatar communication | Group size, complexity and interactability of shared information
Scientific visualisation | Faithfulness of simulated behaviour, consistency | Group size, avatar communication
Cohabited agent spaces | Communication between agents and between agents and users | Avatar communication, responsiveness
Social meeting places | Group size | Avatar representation, responsiveness, consistency
Games | Responsiveness | Faithfulness, avatar representation, group size
Training and planning | Faithfulness, responsiveness, consistency, repeatability | Group size, avatar communication

Table 7.1. Typical compromises for various application genres

Before we discuss the complexities of IIS communication architectures, it is important to understand what this technology can give us, what we can do with it and what kind of information needs to be communicated. The remainder of this section introduces some functional and non-functional requirements of a CVE. Our discussion of functional requirements focuses on situating inhabitants in a social and information context. Non-functional requirements are taken from the various communication media used by IIS as well as the computers and networks that these media must run on.

Information objects act as foci for activity and often collaboration. Users can collaboratively affect the presentation, content and behaviour of shared information. Simple interaction with information is often achieved through selection and manipulation tools, allowing the representation to be moved to a more suitable viewing perspective. Application specific 3D toolbars give additional control and adaptation.
The representation of information may itself incorporate handles or tools for natural interaction. The presented information can often only be understood in the context of how a group are working with it. It is therefore important to demonstrate how others are interacting with the data as well as supporting instructive and expressive communication within the group.

Information

Information may represent anything and be represented in many abstract forms. A reasonable question is what can IIS offer in terms of information representation, over and above what we had before? IIS is a combination of advanced technologies and is not restricted to a particular set of these. Let us briefly look at the way in which some component technologies are changing what we can do with information. Access to unprecedented scales of data and processing is now available through technologies such as the e-Science GRID (GRID). 3D graphics, simulation and display devices give unprecedented naturalness in the viewing and steering of such information. Mobile and social agents provide powerful ways of finding, assessing and combining information. A Collaborative Virtual Environment (CVE) allows us to share information and observe those that we share it with. IIS technology encompasses all of this and thus gives us novel ways of presenting and interacting with shared information in a distributed group setting.

So what does this information look like? An advantage of computer graphics and virtual reality is that we can tailor the representation of information to any abstract form that best suits the user, application and display device. What can we do with it? We can alter the representation, change the detail, content and state, or steer the simulation. Most importantly we can share information and share the way we work with it.

Avatars

The spoken word is often the most important medium for communication between users.
It is, however, not sufficient to demonstrate how others are interacting with the information. When combined with video streaming, or a 3D avatar capable of reflecting gesture and posture, we have an effective tool for instructive and emotive communication. Viewing a remote user through a video window is, however, not effective for demonstrating how the remote user is interacting with information. Representing both the information and remote user through 3D graphics gives a much better impression of how each user is interacting within the team and with the data. All that is left to situate the inhabitants in a social and information context is the support of expressive communication. This brings us to the topic of user controlled, computer generated characters (avatars). Video avatars can provide high levels of detail, realism and expression. They faithfully reflect the actions and emotions of their user. Although there is little technical difficulty in placing a stereo video in a 3D world, it is much harder to capture imagery of the user. Problems of camera placement within a display system are exacerbated by freedom of movement of the local user and any number of observers. Other problems include isolation of the user from his environment, occlusion of the displayed image by the cameras, and high bandwidth requirements of multiple streaming of video across a network. For these reasons most CVEs use avatars generated from 3D graphics. These are typically humanoid with movable joints that provide a basic reflection of body movement. Although such avatars are not as realistic, they can communicate instructive and emotive communication sufficient for many applications. In the real world we look at posture, gesture, subconscious movement and facial expression to gauge emotion. All of these can be represented through an avatar. The problem again relates to capture. A typical display device takes sparse input from the user to control the avatar. 
For example, a desktop system may use a mouse to control movement, mouse buttons to interact with objects and a keyboard to chat. An immersive display system, such as a Head Mounted Display (HMD) or CAVE, would typically track the head and a wand held in the dominant hand. The wand provides additional input for moving long distances in the environment and interaction with objects. Talking would normally be communicated through streamed audio. Such input is sufficient for demonstrating how a user is interacting with information. Showing any emotion through a present desktop interface is almost impossible without additional input. The combination of audio and freedom of single handed gesture does allow a base level of expressive communication from an immersive device. It has been found that when desktop and immersive users work together, the latter take dominant roles, presumably from their greater ability to express themselves (Slater, Sadagic, Usoh, & Schroeder, 2000). Further to this we have found that where two immersive users share an environment with desktop counterparts, the former team up, mostly ignoring the latter.

Greater levels of emotive communication may be achieved for any device by allowing the avatars to improvise (Slater, Howell et al., 2000). Here the avatar will attempt to fill in the gaps left by the lack of inputs. Context and profiled personality may be used to interpret user input, or lack of it, and drive suitable emotive behaviour. Other behavioural techniques may further enhance the believability and realism of avatars. For example, behavioural reasoning may combine concurrent simple autonomous behaviours such as fidgeting, shifting weight between the feet, breathing and eye movement. Reactive behaviour is useful to define how objects, including avatars, react to given interactions. Diverse behaviour can be achieved through applying polymorphism, allowing something of a given type to behave in new ways to given stimuli.
Interaction

Some basic requirements for interaction within IIS are responsiveness, detail and intuitiveness. As they are of prime importance to the usability of the environment, they will now be discussed in more detail.

Responsiveness

A key aspect of usability and believability is maintaining responsiveness of interactions close to the level of human perception. Changes in the presented information must be presented as soon as a user affects it. Low responsiveness will make the system feel unnatural and cause frustration. Immersive displays render the environment from a new perspective every time the user moves his or her head. A low responsiveness in updating perspective causes disorientation and sometimes feelings of nausea. IIS introduce the issue of responsive sharing. This is a particular concern where users are in geographically distinct locations connected over a network. The communication infrastructure must provide sufficient responsiveness to support, and not confuse, the natural sequences of conversation and interaction.

Detail

Some interactions will require more detail than others. For example, to interact with another person it is often important to communicate both complex language and emotion. Even email users exchange icons to represent how they feel. The detail of presented information will be a balance between the data from which it is derived and what is useful and perceivable for each user. The communication infrastructure must support a wide range of detail in interactions and should do so in an optimum manner.

Intuitiveness

Interaction must be both natural and intuitive. A user should be able to interact with an object or peer, without having to worry about overcoming shortcomings in the technology. Furthermore, an object should react in a believable way regardless of how or where it is implemented. This places requirements on both the device and the infrastructure.
Display devices offer various input/output capabilities that may be mapped to interaction scenarios. For example, in an immersive system, a user may use a joystick on a wand to move up to an object, their own body movement to position themselves at the correct aspect, and the wand to select and manipulate the object. Both the physical device and the way its inputs are interpreted must map to natural and believable behaviour in the virtual environment.

Communication Requirements

So where does this leave us in terms of communication? Representing information using computer generated graphics gives unprecedented powers to tailor its presentation. Virtual Reality (VR) uses 3D graphics to allow the user to control his or her position in the environment, giving natural access to spatially organised information. CVEs socially situate a group of users around information within a familiar spatial context. Although video avatars would offer a potentially higher level of realism, computer generated avatars are easier to situate in an environment where users have freedom to walk around. It is not surprising that the majority of CVEs rely primarily on 3D computer generated graphics to present visual information.

Unlike video, 3D graphics scenes, comprising the geometry and appearance of many objects, can be downloaded in advance. Where users are distributed, the scenegraph may be replicated at each user’s computer with incremental changes sent across the network. This massively reduces bandwidth usage and also increases the responsiveness. Without such replication, any user movement would require a perspective recalculation of the scene on a server before the resultant images could be streamed back to the user’s machine. This approach is generally unusable as the network delays result in feelings of disorientation and nausea as the user’s visual inputs lag behind their internal senses of balance and proprioception.
3D graphics may be the primary medium for IIS but it is often combined with others. Natural language has been shown to be of vital importance to collaborative tasks. Audio streaming has been found much more effective than chat in IIS settings and does not require the use of a keyboard. Streaming of video and 3D graphics across the network is useful for rich and detailed images provided observer perspective is constrained. An exception to this is tele-presence which allows a single user to see through the eyes of a movable robot, but this is outside the scope of this chapter. Table 7.2 shows how various media are typically used together, when they are used and how they impact on available network bandwidth.

Medium | Purpose | Usage | Bandwidth usage
3D graphics (replicated scene graph) | Primary visual medium | Continuous | High during initial download then Medium
Audio | Primary natural language medium | When user is speaking | Medium
Video | Supplementary visual and audio for perspective constrained high fidelity | Occasionally as required | High
Streamed 3D graphics | Supplementary visual e.g. for high end graphics on desktop | Occasionally as required | High
Text chat | Alternative to audio streaming for desktop, public systems | When user is chatting | Low

Table 7.2. Usage of media in IIS and the effect on the network

The degree to which each medium is used is application dependent. We have assumed so far that sharing the use of the information is the primary goal of a CVE. Other applications, for example, may place more emphasis on emotive communication and thus use video as the primary medium. The remainder of this chapter deals with the typical and leaves specialisation to other works. We therefore restrict our discussion to systems that primarily use 3D graphics for vision, audio streaming for speech, and video and 3D graphics streaming for occasional supplementary, high detail imagery.
Resources: Computers and Networks

Let us now take a brief look at the relevant characteristics of computers and networks. It is, after all, these that must underpin a CVE. Computers have limitations on the amount of information they can store and process. An IIS will often contain users supported by computers of widely differing capabilities. These computers may be connected to various network technologies such as Ethernet, ATM and wireless. These networks are often part of the greater Internet and will communicate through intermediate networks of various technologies. These technologies have widely differing bandwidth, delay characteristics and reliability. CVEs use the Internet Protocol (IP) that deals with the heterogeneous nature of the Internet by making few assumptions about the guaranteed service. That is, IP assumes that messages may be fragmented and individual fragments may arrive late, out of order or be lost. The Internet, and often super computers running display devices and processing information, are shared resources offering highly dynamic levels of throughput depending on localised load. A final important point is that the speed of light will introduce perceivable network delays for many intercontinental links. A CVE must be designed to run over a set of heterogeneous computers and networks, each with possibly very different dynamics, throughput and reliability.

2. Principles

IIS situate inhabitants in a social and information context that extends interaction in the real world in a natural manner. Technology, physics and cost create a gap between this ideal and reality. This section is concerned with balancing throughput limitations of computers and networks with the requirements of IIS applications. CVEs employ a set of cooperating mechanisms and algorithms that effectively concentrate resources by maximising the fidelity of sharing where it is needed and reducing it where it is not.
We have looked at what might reasonably be expected in terms of perception and interaction and how this may be supported through a combination of existing communication media. We have explained why 3D graphics with replicated scenegraphs have become the primary medium of communication in IIS and how these may be supplemented with other media. We restrict our discussion here to the mechanisms for improving the fidelity of sharing through 3D graphics and replicated scenegraphs.

A key requirement of IIS and VR is the responsiveness of the local system. Delays in representing a perspective change following a head movement are associated with disorientation and feelings of nausea. A CVE supports a potentially unlimited reality across a number of resource bounded computers interconnected by a network which induces perceivable delays. Key goals of a CVE are to maximise responsiveness and scalability while minimising latency. This is achieved through localisation and scaling.

2.1 Localisation

Localisation is achieved through replicating the environment, including shared information objects and avatars, on each user’s machine. Sharing experience requires that replicas be kept consistent. This is achieved by sending changes across the network. Localisation may go further than simply replicating the state of the environment and can also include the predictable behaviour of objects within it.

Object Model

The organisation and content of a scenegraph is optimised for the rendering of images. Although some systems, for example CAVERNsoft (Leigh et al., 2000) and Avango (Tramberend, 2001), directly link scenegraph nodes across the network, most systems introduce a second object graph to deal with issues of distribution. This is known as the replicated object model; from here on we refer to it as the replication and to its nodes as objects. Objects contain state information and may link to corresponding objects within the local scenegraph.
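As a minimal sketch of the object model just described, the following Python fragment keeps a replication separate from a stand-in scenegraph and applies the same change at two replicas. All names here (`SceneNode`, `ReplicatedObject`, `Replication`, the dictionary-shaped change message) are assumptions made for illustration and are not taken from any of the systems cited.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SceneNode:
    """Hypothetical stand-in for a node in the local, rendering-oriented scenegraph."""
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ReplicatedObject:
    """A node of the replication: shared state plus an optional scenegraph link."""
    object_id: str
    state: Dict[str, Any] = field(default_factory=dict)
    scene_node: Optional[SceneNode] = None

    def apply_change(self, attribute: str, value: Any) -> None:
        # Update the shared state, then mirror it into the linked scene node.
        self.state[attribute] = value
        if self.scene_node is not None:
            self.scene_node.properties[attribute] = value


class Replication:
    """One user's local copy of the shared environment."""

    def __init__(self) -> None:
        self.objects: Dict[str, ReplicatedObject] = {}

    def add(self, obj: ReplicatedObject) -> None:
        self.objects[obj.object_id] = obj

    def apply_change(self, change: Dict[str, Any]) -> None:
        # A change names the object, the attribute and the new value.
        target = self.objects[change["object_id"]]
        target.apply_change(change["attribute"], change["value"])


def build_replica() -> Replication:
    """Each machine starts from the same downloaded content."""
    replica = Replication()
    replica.add(ReplicatedObject("card", {"position": (0.0, 0.0, 0.0)},
                                 SceneNode("card_mesh")))
    return replica


local, remote = build_replica(), build_replica()
# In a real system this message would cross the network; here it is applied directly.
change = {"object_id": "card", "attribute": "position", "value": (1.0, 0.5, 0.0)}
for replica in (local, remote):
    replica.apply_change(change)
```

Keeping the replication apart from the scenegraph means the distribution logic never needs to know how a node is rendered, which is one plausible reading of why most systems introduce the second graph.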
Behaviour

A virtual environment is composed of objects which may be brought to life through their behaviour and interaction. Some objects will be static and have no behaviour. Some will have behaviour driven from the real world, for example by a user. Alternatively, object behaviour may be procedurally defined in a computer program. In order to make an IIS application attractive and productive to use it must support interaction that is sufficiently intuitive, reactive, responsive, detailed and consistent. By replicating object behaviour we reduce dependency on the network and therefore make better use of available bandwidth and increase responsiveness.

Early systems replicated object states but not their behaviour. Each state change to any object was sent across the network to every replica of that object. This is acceptable for occasional state changes but bandwidth intensive for continuous changes such as movement. Unfortunately, movement is one of the most frequently communicated behaviours in IIS. A more scalable approach is to replicate the behaviour model and only send changes to that behaviour. Such changes are known as events.

Deterministic Behaviour: Behaviour may be characterised as deterministic or non-deterministic. Deterministic behaviour need not be sent across the network provided it can be calculated independently at each replication. Most procedural behavioural descriptions such as reactive, improvisational and emergent may be defined in a repeatable and deterministic manner. Events can simply identify the name and possibly arguments to a procedure, the execution of which will be replicated at each machine. Even non-deterministic behaviour can be approximated as deterministic provided the effects of bad approximations are not catastrophic.

Dead reckoning: Constrained movement, such as that of a vehicle, may be determined approximately using a technique called dead reckoning (IEEE1278.1, 1995).
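As a rough illustration of dead reckoning, the sketch below assumes first-order (constant velocity) extrapolation in two dimensions. The names `Path` and `DeadReckoningSender`, and the threshold value, are hypothetical and are not taken from IEEE 1278.1 or from any system cited here. The sender runs the same extrapolation that a receiver would and only emits a new path once the actual position has drifted beyond the threshold.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Path:
    """A predicted path: start time, start position and constant velocity."""
    t0: float
    x0: float
    y0: float
    vx: float
    vy: float

    def position_at(self, t: float) -> Tuple[float, float]:
        # First-order extrapolation; receivers evaluate this against their clock.
        dt = t - self.t0
        return (self.x0 + self.vx * dt, self.y0 + self.vy * dt)


class DeadReckoningSender:
    """Decides when the local machine needs to send a fresh path."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold          # allowed divergence in metres (assumed unit)
        self.path: Optional[Path] = None    # last path that was sent

    def update(self, t: float, pos: Tuple[float, float],
               vel: Tuple[float, float]) -> Optional[Path]:
        """Return a new Path to send, or None if the prediction is still good enough."""
        if self.path is not None:
            px, py = self.path.position_at(t)
            if math.hypot(pos[0] - px, pos[1] - py) <= self.threshold:
                return None
        self.path = Path(t, pos[0], pos[1], vel[0], vel[1])
        return self.path


sender = DeadReckoningSender(threshold=1.0)
first = sender.update(0.0, (0.0, 0.0), (1.0, 0.0))   # nothing sent yet, so a path is sent
quiet = sender.update(1.0, (1.0, 0.0), (1.0, 0.0))   # vehicle is exactly on the predicted path
turn = sender.update(2.0, (2.0, 3.0), (0.0, 1.0))    # 3 m off the prediction, so resend
```

While the vehicle keeps to its predicted course no traffic is generated at all, which is the source of the bandwidth saving.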
One of the earliest applications for large scale IIS was battlefield simulation (SIMNET). Here embodiment was originally confined to vehicles, such as tanks, where the vast majority of communicated behaviour was movement around the battlefield. Dead reckoning was introduced to reduce bandwidth consumption of movement information. A dead reckoned path represents a predicted approximation of near future parametric movement based on recent samples of position over time. Paths are sent to other replicas in events. A remote replication then calculates the probable position of the vehicle based on current time. Divergence is checked at the sender by comparing actual and predicted position by running the same algorithm as the receiver on the path it has sent. When divergence exceeds a threshold, a new path is calculated and sent. The algorithms for calculating the path are based on Newton’s laws of motion from the 17th century and Hamilton’s quaternions from the 19th century. Variations on the approach deal with first and second order integration, time constants and smoothing (Miller, 1989). The remote user is presented with an approximation of movement, the most noticeable aspect of which is sudden jumps in position when a new event is received (Figure 7.1). The magnitude of this discontinuous jump is the product of the difference in velocity described in two adjacent events and the network delay.

Fig. 7.1. Effect of dead reckoning (local movement compared with its remote representation)

Consistency

Public switched networks, such as the Internet, introduce into the distribution of event messages both dynamically changing delays and the possibility of loss. This can adversely affect the synchronisation, concurrency, causality and responsiveness of events. Synchronisation ensures that events are replicated within real-time constraints. Causal ordering ensures that causal relationships are maintained. Concurrency defines the ability of the system to allow events to occur simultaneously.
Lastly, responsiveness is the delay in the user perceiving the effect of an action on the system. Concurrence and therefore responsiveness are reduced as the level of consistency is increased. This all leads to the need for consistency management, the role of which is to provide sufficient synchronisation and ordering whilst maximising concurrence and thus the responsiveness of the system. The optimal balance between sufficient synchronisation, ordering and responsiveness is application and scenario dependent. An ideal ordering mechanism provides a compromise between synchronisation and ordering on one side and responsiveness and concurrence on the other. Synchronisation: Behaviour may be described parametrically. For example, dead reckoned paths describe movement through time. Some early systems based time on frame rate. This can be seen in some single user computer games, where the movement of objects slows down as the complexity of the scene increases. This approach is unsuitable for IIS as shared behaviour should be consistent and not represented differently to each user dependent on the performance of the local machine. A common approach is to use the system clock on each computer to provide a continuous flow of time. Movement can then be described in terms of metres per second and will be represented at the same rate to each user. As well as progression it is important to synchronise the start of replicated events. Some systems for example, NPSNet (Macedonia, 1994), set the start time of a received event to the time at which it was received. This removes the need to synchronise accurately local clocks which is a non-trivial task. The disadvantage with this approach is that any behaviour is offset by the network delay. Through synchronising local clocks it is possible to synchronise the state of objects from the time an event arrives until the time a subsequent overriding event is sent. 
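The difference between the two start-time policies above can be made concrete with a small sketch. It assumes one-dimensional movement at constant speed; the function name and the numbers (a 0.2 s network delay, 5 m/s) are purely illustrative. Position is derived from a clock rather than from a frame count, and stamping an event on receipt leaves the replica behind by the speed multiplied by the network delay.

```python
import math


def position(start_pos: float, speed: float, start_time: float, now: float) -> float:
    """Clock-based movement: metres per second, independent of frame rate."""
    return start_pos + speed * (now - start_time)


send_time = 10.0       # seconds, on a clock shared by sender and receiver (assumed)
network_delay = 0.2    # seconds taken by the event to arrive (assumed)
speed = 5.0            # metres per second
now = 11.0

# Policy 1: synchronised clocks, so the event carries its true start time.
synchronised = position(0.0, speed, send_time, now)

# Policy 2: start time is set to the moment the event was received.
stamped_on_receipt = position(0.0, speed, send_time + network_delay, now)

# The replica using policy 2 lags by speed * network_delay.
lag = synchronised - stamped_on_receipt
```

With these assumed values the lag is one metre, which grows with both the speed of the object and the length of the network delay.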
The PaRADE system, developed as part of the author’s PhD (Roberts, 1996), allows locally predictable events to be sent in advance, thus overcoming the network delay and synchronising from the start.

Concurrency Control: Concurrency control is an important subset of consistency management that deals with the prevention of concurrent conflicting updates. This is most apparent where two users try to move a given object in conflicting directions. Without concurrency control it is difficult to determine the outcome but it will at least cause confusion and frustration and at worst an unrecoverable divergence between replicas. Many existing infrastructures do not include concurrency control. Those that do employ algorithms that themselves are adversely affected by network latency. This in turn affects the responsiveness of interaction between user and shared information objects. A conservative concurrency algorithm, as used in some analytical simulations, would lock the whole world and allow updates on a turn basis. This unnecessarily restricts responsive interaction to a level that is unworkable for general IIS applications. An optimisation is to increase the granularity of locking, to either sets of objects, object or object attribute level. A common mechanism for concurrency control is transferable object ownership, where a user can only affect an object once ownership has been transferred across the network. The effect of such latency is normally apparent in a delay in being able to interact with an object recently affected by another user. Optimisations have been developed for predicting interactions and transferring ownership in advance (Roberts, Lake, & Sharkey, 1998).

Causality: Events sent over the Internet may be lost or arrive in a different order from that in which they were sent. In many cases the current state is more important than history and can be derived from an old state and a new event, even when some preceding events have been missed.
For example, a new dead reckoned path overrides the last and is not dependent upon it. Ordering is, however, often vital. A lack of ordering can cause complete confusion when collaborating with remote users and sharing objects. It is therefore surprising that the majority of CVEs do not guarantee it. This is most likely a throwback to the conventional applications of collaborative virtual environments that did not properly support shared interaction. Order must be balanced against responsiveness. The greater the level of ordering, the lower becomes the concurrence and thus the responsiveness. A true objective state of an environment cannot be guaranteed until all events have been received and processed in the correct order. Generating a new event before the objective environment state is known is dangerous and requires some strategy for dealing with events generated on the basis of an untrue state. To guarantee objectivity all replicas must be frozen whilst waiting for events to arrive thus lowering the concurrence. Lamport developed an optimisation called causal ordering which removed the need to order events that could not have been related (Lamport, 1978). The definition of causal relationship was based on the subjective view of a replication. Total ordering and Lamport causal ordering work well in distributed analytical simulation but are not generally suited to IIS applications which require continuous and responsive interaction with the environment. One solution is to allow the IIS infrastructure to decide when to apply and where to apply ordering. This decision may be based on application knowledge of causality and importance of ordering, awareness (see below) and network conditions. Such approaches have been applied to various degrees in PaRADE, MASSIVE III (Greenhalgh, Purbrick, & Snowdon, 2000) and PING (Sharkey, Roberts, Tran, & Worthington, 2000). 
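Lamport (1978) describes logical clocks as one way of capturing which events could have influenced which others. The fragment below is a minimal sketch of such a clock, not the ordering mechanism of PaRADE, MASSIVE III or PING; the class and method names are assumptions. The property it illustrates is that if one event could have caused another, the first always carries the smaller timestamp, so a replica only needs to order events whose timestamps are related in this way.

```python
class LamportClock:
    """Logical clock: a counter that respects potential causality."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event or for sending a message; return the timestamp."""
        self.time += 1
        return self.time

    def receive(self, sender_timestamp: int) -> int:
        """Merge the timestamp carried by an incoming event, then advance."""
        self.time = max(self.time, sender_timestamp) + 1
        return self.time


# Two replicas, each with its own logical clock.
alice, bob = LamportClock(), LamportClock()

throw = alice.tick()            # Alice throws the ball and sends the event
catch = bob.receive(throw)      # Bob receives it; his catch is causally later
wave = bob.tick()               # a later local action by Bob
```

Timestamps of events that never exchanged messages say nothing about real-time order, which is why an application may still need extra knowledge to decide where stronger ordering is worth its cost in responsiveness.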
Application of Consistency: Now that we have introduced synchronisation, concurrency control and ordering as the basic components of consistency, we can look at how they are applied. Table 7.3 compares common alternative mechanisms for each, describing each mechanism, giving an application level example of use, comparing typical delay in terms of level of human perception and giving some example infrastructures in which they are used.

Component | Description
Synchronisation | Behaviour of an object is synchronised over replicas
Concurrency | Object replicas affected concurrently
Ordering | Order of object events over replicas

Component | Mechanism | Description | Example | Observed induced delay | Example infrastructure
Synchronisation | Wall clock | Remote object follows parametric behaviour | Dead reckoning | None | Most
Synchronisation | Tick | Replicas update in step | Crowd walking in step | Medium | RTI
Concurrency | Convergence | Diverging states are converged | Tug of war with elastic rope | Low | DIVE
Concurrency | Ownership | Prevents divergence through unique key | Passing a business card | Low | DIVE-Spelunk, PaRADE
Ordering | Causal | Based on potential causality | Player’s activity ordered in ball game but not with spectators | Medium | MASSIVE III, PaRADE, PING
Ordering | Total | All events are ordered | Player action delayed until earlier spectator action | High | RTI

Table 7.3. Comparison of consistency mechanisms

2.2 Scaling

Scaling allows the amount of information in the environment, including the number of users, to increase, without reducing the fidelity of experience to any one user. This is achieved by balancing each individual’s need for information with what can be achieved given available computational and network resource (Benford & Fahlén, 1993). Awareness management is the mechanism used to balance an individual’s ideal awareness with resources. The scale of information provided to any one user or process may be controlled in terms of extent, granularity and detail.
These define awareness in terms of object subsets of the environment, aggregation of many objects into few, and the attributes of a given object.

Extent

The majority of effort in attaining scalability has focused on subdivision of the environment and population according to interest. This is often referred to as interest management. Awareness of remote objects is determined by context dependent interest. Distinct resources such as servers or communication groups, discussed later, are used to support each area of interest. The interest of a user is dynamic and context dependent. For example, by walking into another virtual room a user becomes aware of its contents and occupants. A number of technical issues must be addressed in order to support this dynamic awareness. Subdivision should be natural and appear transparent, otherwise it can affect a user's behaviour. To be effective it must balance resource usage across the areas of interest. Changing awareness may require much data to be transferred. This can result in delays in the presentation of a new area, which may reach the order of seconds. Different application genres are suited to distinct definitions of interest and methods of subdivision. The granularity of subdivision may be tackled at world, object or intermediate level. We now survey some classic approaches to subdivision used in IIS, which are first summarised in Table 7.4.

Approach | Description | Granularity | Example systems
Multiple worlds | Separate worlds connected through portals | World | Active Worlds, Ultima, DIVE
Static spatial subdivision | Divide world surface into tiles | Intermediate | NPSNet
Dynamic spatial subdivision | Flexible mesh of AOI tiles that stretches to balance tile membership | Intermediate | VIVA
Löcales | Rooms | Intermediate | SPLINE
Aura | Aura, focus and nimbus | Object | DIVE, MASSIVE I & II
Regions | Abstract spaces | Intermediate | MASSIVE II, DIVE (COVEN version)

Table 7.4. Overview of classic subdivision approaches

Multiple Worlds: A simple method for dividing the environment and population is to provide distinct multiple worlds. Each world is typically supported by a distinct server and hosts a distinct set of objects and users. As discussed in the deployment section, this is straightforward to support over the Internet and thus is prevalent in current systems used by the general public, for example, in gaming (Ultima Online, 2003) and social environments. Users typically inhabit a single world at a given time and, in some systems, may move between these worlds using portals (Snowdon, Churchill, & Munro, 2001). The disadvantage of this approach is the difficulty in balancing the number of users in each world. Figure 7.2 shows multiple worlds interconnected through portals and demonstrates the potential problems with balancing population.

Fig. 7.2. Multiple worlds, showing portals and possible population loading

Static Spatial Subdivision: Increasing the granularity of subdivision allows worlds to be split into areas of interest. An approach developed for battlefield simulation and training was to divide the environment into areas of interest in the shape of equal hexagonal tiles and map each to a communication group (Macedonia, Zyda, Pratt, Brutzman, & Barham, 1995). A process sends information to the group associated with the tile occupied by its user and receives information from that group and those associated with adjacent tiles. The supporting process dynamically joins and leaves groups as the user moves between tiles. Receiving information from adjacent tiles removes the problem of not seeing spatially close objects across a border. Group communication provides a mechanism for limiting awareness at a message distribution level with the added bonus of removing the need for a server.
Group communication is not yet, however, generally supported on the Internet and so this method adds complexity to deployment, which is discussed later. Again, the static nature of this method can produce an unbalanced population across areas (Figure 7.3).

Fig. 7.3. Static spatial subdivision

Löcales: Environment plays an important role in restricting and focusing human interaction. Spatial subdivision approaches are suited to open spaces but do not take advantage of the awareness limits imposed by buildings. Löcales (Barrus, Waters, & Anderson, 1996) are areas of interest that map to physically divided spaces such as rooms in a building (Figure 7.4). This approach relies on the adequate provision of resources to support a crowded room and again suffers from its static nature. It is, however, sufficient for many applications.

Fig. 7.4. Löcales

Dynamic Spatial Subdivision: The above approaches rely on an even distribution of users across statically defined areas of interest. This suits them to particular implementations and restricts their general applicability. Dynamic spatial subdivision attempts to redefine divisions between areas of interest in order to balance the number of users in each (Figure 7.5). Robinson et al. (2001) divide the environment into a two dimensional mesh or three dimensional lattice and move the boundaries between the areas of interest to balance membership. Boundary movement is considered when an area becomes over populated and is determined through negotiation between servers dedicated to adjacent interest areas. Robinson's algorithm considers the cost of moving a boundary to both servers and clients.

Fig. 7.5. Dynamic spatial subdivision

Aura: Interest may be determined at the granularity of object pairs by determining their potential for interaction based on spatial proximity. Spatial proximity may be efficiently detected by placing auras around objects and checking for aura intersection.
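The aura intersection test can be sketched as follows, assuming spherical auras. The `Aura` type and both function names are illustrative assumptions; the pairwise loop is a brute-force check that a real system would likely replace with a spatial index.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Aura:
    x: float
    y: float
    z: float
    radius: float


def auras_intersect(a: Aura, b: Aura) -> bool:
    # Two spheres overlap when the distance between their centres does
    # not exceed the sum of their radii.
    centre_distance = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    return centre_distance <= a.radius + b.radius


def interacting_pairs(auras: dict[str, Aura]) -> list[tuple[str, str]]:
    # Compare every pair of named objects once (quadratic in object count).
    names = sorted(auras)
    return [(p, q)
            for i, p in enumerate(names)
            for q in names[i + 1:]
            if auras_intersect(auras[p], auras[q])]
```

Only the pairs returned by `interacting_pairs` would then need to exchange events, which is what limits awareness at the granularity of object pairs.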
In the case of avatars, the potential for interaction is increased when they face each other. Benford and Fahlén (1993) encapsulate avatars in auras and use aura collision as a prerequisite for interaction. Within the aura, focus and nimbus spatially define attention and projection respectively (Figure 7.6). Both focus and nimbus reach out in front of the avatar but have distinct shapes.

Fig. 7.6. Aura – Focus and Nimbus

Regions: Both tiles and Löcales are specific definitions of how to divide the environment and are applicable to distinct forms of interaction and application genres. MASSIVE II confines aura based awareness within abstract regions which may be mapped to application specific definitions of interest. Figure 7.7 depicts one possible way of dividing an environment into regions.

Fig. 7.7. Regions

Granularity

In the real world people are able to reason at different levels of granularity. For example, a lecturer must be aware of the attention and understanding of each student during a lecture whereas a university chancellor sees the institute in terms of departments. This approach of aggregation may be adopted in CVEs to further increase scalability. Aggregation reduces not only the rendering but also the amount of information needed by some observing processes. For example, in a battlefield simulation, the driver of a tank is interested in other tanks whereas a general is more concerned with tank divisions (Singhal & Cheriton, 1996). Another example is that of a crowded stadium represented by a single avatar (Greenhalgh, 1999). The size of the group, the team they support, and the sound they produce are represented through the avatar's size, colour and aggregated audio streams respectively. Emergent behaviour may be replicated and communicated in aggregated form to reduce the load on the network.
For example, the behaviour of a flock of birds could theoretically be replicated by simply communicating the size of the group and then continuing to communicate the movements of whichever bird is in front. A reasonable flocking behaviour can then be replicated at each site through application of local rules based on following and collision avoidance. This aggregated emergent behaviour may be applied to many other group behaviours, for example the behaviour of a human crowd. A similar principle can be applied to an avatar, allowing the majority of body movement to be calculated locally and driven by the communicated movement of selected body points, such as the head and hands. Here, a combination of kinematics and selections of previously recorded motion tracking data can be used to improvise reasonable local behaviour based on head and hand movement. In order to reduce network traffic through aggregation it is necessary for the sender to know the level of aggregation. Although aggregation can increase the scalability of a receiving process, it can decrease the scalability of the sender and the use of the network when many receivers require distinct levels of aggregation for the same objects (Roberts, 1996).

Detail

We have seen how scalability may be increased by reducing the number of communicating objects held on each machine. Scalability can be further increased by managing the detail at which individual objects are replicated. Heuristics of interest such as distance or the relationship between the role of the observer and the use of the observed may be applied. Many graphics languages, such as Inventor, Performer and VRML, support Level Of Detail (LOD) modelling, where sufficient frame rate is maintained by reducing the graphical complexity of distant objects. The scalability of communication and computation can be greatly increased by applying this reasoning to the communication of behaviour.
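One way to apply level-of-detail reasoning to communicated behaviour is to lower the rate of movement updates as the distance between observer and object grows. The following is a sketch under stated assumptions: the distance bands, the rates and the function names are all hypothetical and are not taken from any cited system.

```python
import math

# Hypothetical bands: (maximum distance in metres, update rate in Hz).
LOD_BANDS = [(10.0, 30.0), (50.0, 10.0), (200.0, 1.0)]


def update_rate(observer, obj):
    """Return how often, in Hz, movement of obj is sent to observer."""
    distance = math.dist(observer, obj)
    for max_distance, rate in LOD_BANDS:
        if distance <= max_distance:
            return rate
    return 0.0  # beyond the last band nothing is sent


def should_send(now, last_sent, observer, obj):
    # Send only when the interval implied by the current rate has elapsed.
    rate = update_rate(observer, obj)
    return rate > 0.0 and (now - last_sent) >= 1.0 / rate
```

Note that this places the decision at the sender, so a sender with many observers at different distances has to track a rate per observer.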
Objects may be defined in terms of attributes in which remote processes can dynamically express and decline interest, for example as defined in IEEE 1516 and implemented in the DMSO RTI. Balancing the detail of communicated behaviour with the interest of remote users is an important, if under researched, topic. The amount of information being received may be reduced through local filtering or sending control messages back to the sender. The latter approach again suffers from the potential need to send distinct levels of information to different receivers. A hybrid approach might send the highest detail required by any to all and allow receivers to filter further. 2.3 Persistence Users can join, leave and rejoin collaborative virtual environments at will. When in the environment, they can affect its state through interacting with, and introducing, objects. A real world analogy is a bank account. When someone deposits money into a bank cash machine, the money should not be lost as soon as the card is withdrawn. Persistent environments will maintain the effect of changes when the user leaves. Supporting persistence is straightforward when the underlying CVE infrastructure hosts all master objects on servers. Where a localized approach has been adopted to increase scalability or responsiveness, master objects will be held in the memory of a user’s machine. These must be moved to a participating machine when the user leaves. Provided the behaviour of an object is known at the target site, it is only necessary to move the current state and mastership of the object. There are two basic forms of persistence: state; and evolutionary. State persistence maintains an object in a static state once its owner has left. Evolutionary persistence will support the ongoing behaviour of an object once its creator has left. For example, in a lecturer’s bank account which is always overdrawn, the money deposited will be reduced over time by interest payments. 
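The difference between the two forms of persistence can be sketched with the bank account analogy. This is an illustration only: the `Account` fields and the fixed daily charge are assumptions, and the evolutionary case is modelled lazily by applying elapsed time when the object is restored, whereas an infrastructure could equally keep executing the behaviour on a server.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: float         # persisted state
    charge_per_day: float  # behaviour parameter, known at every site
    saved_at_day: float    # when the state was last stored


def restore_state(saved, today):
    # State persistence: the object comes back exactly as it was left.
    return Account(saved.balance, saved.charge_per_day, today)


def restore_evolved(saved, today):
    # Evolutionary persistence: the behaviour continued while the owner
    # was away, so the elapsed time is applied to the stored state.
    elapsed = today - saved.saved_at_day
    evolved = saved.balance - saved.charge_per_day * elapsed
    return Account(evolved, saved.charge_per_day, today)
```

The lazy approach only matches continuous execution when the behaviour depends on elapsed time alone; behaviour triggered by other objects would need to keep running somewhere.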
So far we have only considered what happens to objects when a user leaves. We must also consider the effect of the environment going off line. Such an occurrence may be planned or accidental. In either case we may wish to guarantee persistence. One solution is to store object state information to disk on a persistency server both periodically and, where possible, when an imminent failure is predicted.

2.4 Communication

Previous sections have introduced the kind of information that must be passed through a CVE and we have described object level mechanisms for managing this information in order to maximise responsiveness and scalability. We now move down to the message level to examine how to actually communicate this managed information.

Requirements

The communication requirements of a CVE are complex. Those of responsiveness, reliability and scale of information transfer differ greatly depending on application, context and scenario. Before we describe the method of communication we must look at the content. We now examine some typical forms of information and their requirements on the underlying communication system. This is broken down into: discovery of objects; events; audio and video.

Discovering Objects: When a client alters its awareness by entering a new world or area of interest, it must discover the objects within. Some mechanism is required for the client to obtain all the information about every object it discovers. This information includes state, behaviour and graphical appearance. Behaviours, and particularly appearance descriptions, tend to be much larger than state, but in most systems remain unchanged throughout the lifecycle of an object. Such information is typically in the order of kilobytes per object. Usually such data only needs to be sent to one client at a time and it must be sent reliably, in order and preferably efficiently.
Users frequently move between areas of interest, which results in traffic bursts as the local system downloads object state and possibly appearance and behaviour. In turn this can result in delays often reaching several seconds. It is therefore important to use an awareness management scheme that minimises movement between areas as well as the number of objects in each. Some systems, for example DIVE (Frécon & Stenius, 1998), download from an existing peer process but this can cause that process to lock, which is disorientating for its user. The responsiveness of remote peers may be maintained by obtaining all object information from a persistence server.

Events: The behaviour of objects is driven and communicated by events. Events need to be propagated to any interested process as quickly as possible. They are typically very small in terms of network bandwidth. Many events are frequent and quickly superseded. Others are infrequent and their loss might cause applications or users to act in an erroneous way. The majority of events typically describe movement. Constant latency is important as it improves the realism of remote movement. As discussed above, in the context of event ordering, it is typically more important to reflect the current position than how the object reached it. Since many movement events are typically sent for a given object each second, and the probability of message loss is low, lost movement events will seldom be noticed. An important exception to this rule is introduced by dead reckoning, where the frequency of path generation is considerably lower. We presented a scheme for addressing this problem by reliably resending dead reckoned paths that had not been superseded within a time limit (Roberts, 1996). Tracking systems allow natural non-verbal communication but generate quantities of events that are difficult to support over the network.
During trials between networked reality centres in UK and Austria, we found it difficult to realistically approximate human movement with dead reckoning but have had greater success limiting the frequency of outgoing events for given objects by simple filtering. Bursts of events typically accompany interaction with other avatars, objects or both. For example, avatar communication may well include gesticulation and talking. This results in bursts of movement events and audio traffic. Such exchanges can occasionally swamp bandwidth and overrun receive buffers, resulting in high message loss. This is particularly the case for groups of interacting users. Remote events can sometimes be delayed for seconds while the CVE attempts to catch up with the receive buffer resulting in a temporary loss of responsiveness. In this case the loss of movement events is preferable as it brings the system back to a synchronised state in shorter time. Some systems, for example PING, limit this time through a Bucket algorithm. Some events may be vital particularly where they affect the result of, or ability to process, subsequent events. This includes any event that changes the structure of the scenegraph. Such events are commonplace where users interact with objects. Losing such events can cause significant divergence between users’ views. For example, one user sees that he has taken an object out of another’s hand, while the other sees herself still holding it. At best this causes confusion and at worst, an unrecoverable divergence. Audio: Verbal communication considerably improves the performance of general collaborative tasks as well as the feeling of co-presence. In order for audio communications to support human conversations it must be continuous and have constant rate and sufficient resolution. Network bandwidth and message loss can reduce resolution. Network jitter, where heavy network traffic causes temporary high delays, can alter the rate at which the data is delivered. 
The COVEN trials suggest that audio traffic is in the order of kilobytes per second for each user (Greenhalgh, 2001).

Video: Video has similar requirements for continuity and rate but high resolution images can require much higher bandwidths. Typical CVEs use video sparingly, mapping low resolution streams to polygons. For example, a low resolution video avatar might require tens of kilobytes per second.

Solutions

Now that we have described how information needs to be communicated we will look at ways in which this is achieved in typical CVEs. In particular we focus on how data is prepared for sending over the network and how it may be disseminated to one or many recipients with various Qualities of Service (QoS) of delivery.

Preparation: Before being sent over the network, data is marshalled into a flattened message. This message is split into packets and sent across the network. Transport level network protocols convert between messages and packets. The size of a packet is determined by the underpinning link level protocol. The Internet Protocol (IP) adopts a maximum packet size from the underlying network technology. In a CVE a packet might contain one or several events and a continuous flow of packets might support an audio or video stream.

Dissemination: Dissemination determines whether a message is sent to a single recipient or to a group of recipients. Distinct forms of dissemination are offered by various transport level protocols. Group communication, or multicast, is often used in CVEs to scale the number of users. Multicast allows hosts to express interest in any set of communication groups. A message sent to a group will be distributed to every member at no extra cost to the sender. The scalability of the sender is maximised while the scalability of the receiver can be increased by mapping awareness to groups.
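Mapping awareness to groups can be sketched using the hexagonal tile scheme described under Static Spatial Subdivision. The sketch assumes tiles addressed by axial coordinates `(q, r)` and a hypothetical address layout inside the administratively scoped multicast range `239.0.0.0/8`; the modulo wrap means a world larger than 256 tiles per axis would reuse addresses.

```python
# Axial-coordinate offsets of the six neighbours of a hexagonal tile.
HEX_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def tile_group(tile):
    """Map a tile (q, r) to a hypothetical multicast group address."""
    q, r = tile
    return f"239.1.{q % 256}.{r % 256}"


def groups_of_interest(tile):
    # Receive from the occupied tile and from its six adjacent tiles.
    q, r = tile
    tiles = [tile] + [(q + dq, r + dr) for dq, dr in HEX_NEIGHBOURS]
    return {tile_group(t) for t in tiles}


def membership_changes(old_tile, new_tile):
    """Return (groups to join, groups to leave) after crossing a border."""
    old = groups_of_interest(old_tile)
    new = groups_of_interest(new_tile)
    return sorted(new - old), sorted(old - new)
```

Moving to an adjacent tile changes only three of the seven memberships, which keeps the join and leave traffic small. Actually joining a group on a socket (for example with the `IP_ADD_MEMBERSHIP` option) is left out of the sketch.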
Group dissemination may alternatively be implemented above point-to-point protocols by sending a message to a set of connections, for example in Spline (Waters et al., 1997) and PaRADE (Roberts, Strassner, Worthington, & Sharkey, 1999).

Delivery: Not all packets that are sent arrive, or arrive in the correct order. Their subsequent assembly into a message, and delivery to the application, may be delayed while these errors are overcome. QoS determines what criteria, in terms of reliability, order and timeliness, will be met before delivery. Generally the higher the reliability and level of ordering, the lower the responsiveness and scalability. This is particularly the case for group communications. Different transport level protocols offer distinct QoS in addition to dissemination. Some systems, for example DIVE and PaRADE, implement additional or improved qualities of service above the transport level.

Mapping Information to Dissemination and Delivery: We have seen how various types of information in a CVE require distinct levels of dissemination and QoS. Some CVEs simplify their design by using single dissemination and delivery methods and accept the drawbacks. Others, for example HLA RTI, PaRADE, PING and DIVE, combine various dissemination and QoS delivery methods to optimise performance. Table 7.5 suggests how a CVE might map information type to dissemination and QoS. This table is derived from combining best practice of PaRADE, PING and DIVE.

Type | Example | Reliability | Order | Responsiveness | Dissemination | Throughput
Downloads | Object discovery | HIGH | HIGH | LOW | ONE | HIGH
Regular events | Movement | LOW | LATEST | HIGH | MANY | MEDIUM
Irregular events | Object creation | HIGH | HIGH | HIGH | MANY | LOW
Audio | Verbal communication | LOW | LATEST | CONSTANT | MANY | MEDIUM
Video | Facial expression | LOW | LATEST | CONSTANT | MANY | MEDIUM

Table 7.5. How a CVE might map information type to dissemination and QoS

Channels: Managing the mappings between information, dissemination and QoS becomes complex when an environment contains many users dynamically moving between areas of interest. The channel abstraction may be used to map dissemination to QoS. For example, in PING events are routed to channels according to their type and the current area of interest. Some events may theoretically be sent down many channels, for example unreliably to user machines and reliably to a persistence server.

3. Architecture

We have introduced the basic requirements and realities of communication within CVEs and outlined principles used to balance the two sufficiently to support fruitful collaboration between users socially situated in an information context. This section provides case studies of two example systems, DIVE and PING, describing each in terms of its modular architecture and use of principles. The Distributed Interactive Virtual Environment (DIVE) is a widely adopted CVE platform that implements most of the principles we introduced. The Platform for Interactive Network Games (PING) attempts to bring together best practice from CVE architecture. Although the latter is still at the beta prototype stage, its design provides a good tool for explanation.

3.1 DIVE

DIVE has a classic architecture consisting of seven modules (Table 7.6). Each represents a conceptual level and is implemented as a unique library. This provides flexibility when updating the platform.
Module | Description
Video | Allows video to be texture mapped to polygons in the scene
Audio | Audio supporting conversations between users as well as attaching sounds to objects
Graphics | 3D rendering of the graphical representation of objects and thus the scene
Aux | Tools for application building, including the scripting language
Core | Object database and supporting functionality such as time and events
Sid | Communication
Threads | Thread library providing concurrence at each computer

Table 7.6. Modules of the DIVE architecture

DIVE introduces, adopts and adapts most of the principles described in the previous section. Best practice solutions have been added and iteratively improved for more than ten years. Widely used in research, this platform has proved the principles.

Localisation

DIVE uses localisation to maximise local responsiveness and make best use of the network. We will now look at the particular design decisions taken in implementing this localisation within a framework of the principles outlined in the previous section.

Object Model: The responsiveness of a user's interactions with the environment is maximised through object replication, negating the need for events to be passed across the network before the local model is updated. The replicated object database resides on participating machines according to awareness management. A replication is organised into a hierarchy of objects, each of which contains state information and may be attached to behaviour scripts and graphical appearance. A scenegraph is coupled to a local replication and mirrors those qualities of objects necessary for rendering. An application reads and writes to a replication regardless of whether other replicas exist.

Behaviour: A simple reactive behaviour associates triggers and responses to objects. An object's behaviour is defined in an attached script. The script language, DIVE/TCL, extends the Tool Command Language (TCL) to include useful commands for monitoring and updating objects.
Typed events may be triggered through a user input device, collision with another object, timer and world entry. An interest in events may be expressed through event callbacks and responses mapped to event types. For example, an application programmer can register an interest in collision events for an avatar and define distinct responses to various types of collided object. Behaviour scripts are replicated along with the object. This allows objects to react to local interactions without network induced delay. Remote scripts are not called directly but through the communication of the same event that triggered them locally. The concept of dead reckoning is supported but the implementation of the algorithm left to the application programmer. Each object is able to store a parametric path from which the current position may be calculated. Use of the path to communicate and calculate current position is, however, optional. This feature is good, as not all objects move in a predictable way. Consistency: High responsiveness comes at the cost of low consistency. Replicas are loosely coupled allowing divergence and attempting convergence over time. For example, when a user moves an avatar the remote representation will follow, delayed by the network, and catch up when the avatar stops moving. There is no strong concept of object ownership. There is no specific concurrency control within DIVE. Hence an object may be affected in conflicting ways by multiple users, causing the replicas to diverge. Mechanisms are provided to settle an object to a mean position after being pulled in opposing directions. Users, however, observe the object jumping wildly between them until the steady state is reached. A loose form of ownership allows an object to be attached to an avatar. Other users can still affect the object, for example, by changing its relative position to the carrying avatar. 
An immersive extension to DIVE, Spelunk (Steed, Mortensen, & Frécon, 2001), implements concurrency control through object mastership. Partial causal ordering is implemented at the communication level and is therefore described below.

Scaling

Awareness is managed at the world level as well as within the world through division of the object model hierarchy. All replicas must hold the root object but can be selectively pruned by local interest. Branches may be assigned to interest groups to which the application may express interest. This low level approach can support any higher level awareness management scheme that maps to the organisation of the hierarchy. Both subjective views (Snowdon, Greenhalgh, & Benford, 1995) and aura based focus and attention (Benford & Fahlén, 1993) have been implemented above DIVE. Level Of Detail (LOD) is partially supported. Composite objects comprise a tree of objects within the hierarchy; therefore, interest based tree pruning may be used to reduce their complexity. However, this does require some scripting. The LOD of an atomic object can only be switched within the graphics module. Thus, without custom scripting, LOD affects appearance and rendering performance as opposed to behaviour and network traffic. The default renderer supports distance based LOD switching. Adaptive rendering was incorporated in the COVEN extension to DIVE (Frécon, Smith, Steed, Stenius, & Stahl, 2001). Here, distance culling and iterative rendering techniques can alter the detail of the rendered scene to meet specified frame rates. Aggregation is not directly supported but again could be implemented at the application level by making use of interest management, this time to switch from a sub tree to an alternative atomic object.

Persistence

Objects are not owned and thus can be created by an application and left in the world once the application has closed. Any application can remove the object from the world.
By default, clients are responsible for persistency. Early versions had no persistency servers but an object will remain in the world as long as one replication of it exists. Later versions of DIVE incorporated persistency servers. Object behaviour can be defined by scripts that are replicated along with the object at each host. Evolutionary persistence is maintained through the continued triggering and execution of scripts. The triggering events can come from the object itself or from other objects in the world.

Communication

DIVE uses a combination of point-to-point and group communication. The point-to-point protocol (TCP) is reliable and ordered. Group communication is supported at two qualities of service: unreliable, and partial reliability and order. IP Multicast provides the former and is extended into Scalable Reliable Multicast (SRM) for the latter.

Discovering Objects: The first client to enter a world downloads the initial world from an Internet location. Subsequently entering clients obtain the current world from a peer. This approach allows an up-to-date world to be downloaded immediately without the need for a world server. A downside of this approach is that the peer from which the world is obtained freezes while sending data. This typically takes tens of seconds depending on the complexity of the world. Later versions of DIVE address this problem by allowing downloads from persistency servers instead of from clients. Clients can create objects at any time and must inform peers on doing so. When a client discovers a new object, either through a creation or update message, it may request the object. With the exception of the first client download, all requests and downloads are done over SRM. An algorithm attempts to transfer objects from the nearest client in terms of network delay.

Events: All events are sent using SRM. Partial reliability and ordering are mapped to three event categories: movement, geometry and general.
By default, all three are set to reliable. Each object has a causal counter which is stamped to outgoing messages. Partial ordering and reliability of events are implemented within SRM. Partial ordering ensures that two events from the same object are delivered in the order they were sent but the same is not guaranteed for events from distinct objects. Partial reliability guarantees that if a lost event is detected through arrival of a later event, the object state may be requested. The assumptions made are that event loss and disorder are rare and that the partial reliability and ordering are thus sufficient to converge the databases over time. Both reliability and ordering are achieved through object sequencers thus providing a high level of concurrence and thereby reducing the effect on responsiveness. Receiving an unexpectedly high sequencer detects message loss. When this occurs, the state of the object is requested rather than the set of lost updates. The downside is that loss is not detected until a subsequent event from the same object arrives. The loss of events for infrequently updated objects can make some applications unworkable. For example, a door might be unlocked by one user but remain locked to another. Furthermore, a dead reckoned path can result in considerable divergence if a subsequent path event is lost. Audio and Video: Both audio and video are streamed across unreliable multicast. Responsiveness and consistency are slackened to allow constant delivery rates suitable for human communication. Each world has a unique multicast group for audio and another for video. Sound is spatialised so that objects and avatars are heard from where they are seen. 3.2 PING The Platform for Interactive Network Games (PING) was developed by a European consortium headed by France Telecom. It combined many best practice principles into a scalable architecture implemented as a communications infrastructure for support of massive multi-player games. 
Figure 7.8 and Table 7.7 summarise the PING architecture in terms of modules.

Module         Description
Entities       Interfaces replicated persistent objects to the application program
Replication    Manages the replication of objects including life cycle and synchronisation
Persistence    Maintains static persistence above stable storage
Consistency    Balances synchronisation with responsiveness
Interest       Manages awareness in terms of world subdivision
Communication  Supports message passing between processes
Core           Provides core services used by and linking the other modules

Table 7.7 Modules of the PING architecture

Figure 7.8. The PING architecture (diagram elements: Application, Entity, Behaviour, Replication, Consistency, Persistence, Interest, Network)

Localisation

Object Model: At each process, objects are replicated by the replication service according to awareness determined by the interest service. The entities management service interfaces replicated persistent objects to the application program. It provides selective transparencies of distribution and replication. The replication service is responsible for the life cycle management of replicas and makes use of the consistency service to update replicas. The object model comprises both data objects and reactive objects. Data objects hold state information. Reactive objects are data objects which embed a reactive behaviour. Data objects may be shared and may also be made persistent. Sharable objects contain a selection of sharable attributes.

Behaviour: Two forms of behaviour support are provided: reactive and reflective. Reactive objects are associated with a reactive program. Within a given process, reactive objects communicate through local broadcast of events. The reactive behaviour is defined at the application level and then replicated within the object model by the replication service. The reactive program defines triggers and responses in terms of typed events.
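The reactive model described above can be illustrated with a minimal sketch. It is not PING code: the names Event, LocalBus, ReactiveObject and the lamp example are hypothetical, chosen only to show triggers and responses keyed on typed events that are broadcast locally within one process.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """A typed event; `kind` is the type that triggers are matched against."""
    kind: str
    source: str
    payload: dict = field(default_factory=dict)


class LocalBus:
    """Hypothetical stand-in for local broadcast within a single process."""

    def __init__(self):
        self._objects = []

    def register(self, obj):
        self._objects.append(obj)

    def broadcast(self, event):
        # Every registered reactive object sees every event.
        for obj in list(self._objects):
            obj.react(event, self)


class ReactiveObject:
    """A data object (state) with an embedded reactive program (triggers)."""

    def __init__(self, name, state):
        self.name = name
        self.state = dict(state)
        self._triggers = defaultdict(list)

    def on(self, kind, response):
        """Attach a response to be run when an event of type `kind` arrives."""
        self._triggers[kind].append(response)

    def react(self, event, bus):
        for response in self._triggers.get(event.kind, []):
            response(self, event, bus)


def toggle_lamp(obj, event, bus):
    # Response: change local state, then announce the change as a new event.
    obj.state["lit"] = not obj.state["lit"]
    bus.broadcast(Event("lamp_changed", obj.name, {"lit": obj.state["lit"]}))


def record_change(obj, event, bus):
    obj.state["seen"].append((event.source, event.payload["lit"]))


bus = LocalBus()
lamp = ReactiveObject("lamp", {"lit": False})
lamp.on("switch_pressed", toggle_lamp)
monitor = ReactiveObject("monitor", {"seen": []})
monitor.on("lamp_changed", record_change)
bus.register(lamp)
bus.register(monitor)

bus.broadcast(Event("switch_pressed", "switch"))
```

In this sketch the reactive program is simply the table of triggers held by each object. Replicating that table along with the object state is what would let the same behaviour run at every process holding a replica.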
Reflective behaviour allows the behaviour of objects to adapt to the availability and condition of computational and network resources. This facility is not core to the PING infrastructure but may be placed between it and the application as a filter.

Consistency: The consistency service is highly configurable, supporting a range of time management services. The consistency module sits below the replication module and above the event router, which in turn sits above communication. Its purpose is to balance synchronisation with responsiveness, and this is achieved by delaying the sending or delivery of events according to some interchangeable time management policy. Each iteration of the local simulation process is synchronised by a tick. This tick causes events held in the consistency module to be delivered to the replication module according to the time management policy. Supported policies fall into two categories: non-causal and causal. The non-causal policies are receive order, time stamp and predictive. Receive order simply delivers events to the replication module in the order received. Time stamp delivers them in the order in which they were created. Predictive delivers predicted events at the predicted time, thus overcoming some effects of network latency. The sending of predicted events may be delayed to reduce the likelihood of erroneous predictions. Causal order may be guaranteed with policies that define causality in terms of awareness or interaction. Some causal policies are based on object sequencers and so use the exchange of sequencers to provide concurrency control. More general concurrency control is offered as a core service of PING from outside the consistency module. This includes read and write locking of objects using either a pessimistic or an optimistic approach, as well as a choice of explicit or implicit locking. Pessimistic concurrency control prevents inconsistencies whereas the optimistic approach resolves them.
The former is generally better for human-in-the-loop real-time systems and is used in PING by default.

Scaling

The interest management service provides support for world subdivision policies which may be defined at the application level. The role of this module is to manage dynamic grouping, determine the set of object replicas needed within the local process and inform other processes of changes in interest through the generation of events. Neither control of Level of Detail nor aggregation is supported within the infrastructure.

Persistency

Persistency is provided at two levels relating to static and evolutionary persistence. Static persistence is supported over stable storage and is guaranteed even after all processes have exited. Evolutionary persistence maintains and evolves objects as long as one replica of them exists in any process.

Communication

Discovering objects: The discovery of objects is directed by the interest management service. The replication service is responsible for the life cycle of each replica and must thus fetch an object to a process when it is originally discovered. A local caching service is provided to overcome problems associated with frequent rediscovery, which may be caused by re-crossing of the same interest borders.

Events: Events are used to synchronise replica updates as well as to communicate system messages. These events are synchronised by the consistency service described above. An event router service takes outgoing events from the consistency service and uses interest management to direct them to appropriate communication channels. Channels provide Application Level Framing (ALF) to map events to particular dissemination groups and qualities of service. The granularity of the ALF is that of an object. The event router maps unique object identifiers to channels using tables that are updated according to the interest management service. Various underpinning transport level protocols are used including UDP, TCP and IP multicast.
SRM offers object-level reliability and ordering above IP multicast. Channels hide the choice of protocol from the services above. The infrastructure may be configured to implement reliability and ordering at either the consistency or the communication level. The basic requirement of reliability on the communication service is that it will not deliver an incomplete or corrupted event to the consistency service.

4. Deployment

CVEs bring together people, possibly from distinct geographical places, into a shared information space. We have shown how the environment may be replicated across many processes and synchronised through event communication. So far we have assumed that all the machines are connected to some network of reasonable bandwidth which allows them to communicate using a combination of peer-to-peer and group communication and using varying qualities of service. Unfortunately the use of a current wide area network, such as the Internet, introduces problems that must be addressed when deploying an IIS over it. This section considers the impact of real world problems of deployment on the existing Internet. These include firewalls, modems and the lack of Multicast capability on the Internet. We consider three idealised approaches to deployment: point-to-point, tunnelled group, and hybrid.

Firewalls have become essential to maintain the security of corporate and academic networks connected to the Internet. A firewall restricts access to selected port numbers, protocols and remote sites. It is unlikely that a CVE process can communicate through a firewall without some adjustment or help.

CVEs may allow inclusion of users from home or mobile computers. Such computers connect to the Internet using modems, which typically offer low bandwidth compared to corporate and academic networks. Furthermore, modems offer only a point-to-point connection.
Multicast is supported on most local area networks (LANs) but is currently not supported on much of the Internet. This is because of problems with scaling routing strategies and global management of the address space, as well as the large number of legacy routers in use.

4.1 Point-to-point

The traditional approach to distributed processing on the Internet is based on the simple client-server model (Figure 7.9). This approach has been popular in supporting public IIS applications such as social meeting places and games. This popularity arises from the simplicity of access and reliability offered by restricting communications to point-to-point connections, as well as the simplicity of security, maintenance and consistency offered by servers. Clients connect to servers that maintain the current state of the environment. Scalability is increased by mapping servers to worlds or awareness management subdivisions. Servers decide the true state of the environment and thus simplify concurrency control. Many offer persistence. Home or wearable computers connect to the Internet through modem links and Internet Service Providers. Those on LANs connect through corporate routers. Although this model is fundamentally less scalable than those using group communication, some games applications have boasted tens of thousands of simultaneous users by mapping awareness management to servers and relaxing consistency (Ultima Online, 2003).

Fig. 7.9. Point-to-point deployment across the Internet (diagram elements: Client, Corporate LAN, ISP, Server)

4.2 Tunnelled Group

Group communication mapped to peer-to-peer distribution is generally more scalable than point-to-point (Figure 7.10). It does, however, complicate the development and deployment of a CVE. This approach is dominant in research and defence simulation training, both of which aim for optimal rather than simple solutions and, furthermore, do not make wide use of low-cost modems.
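The scalability difference between the two approaches can be illustrated with a rough count of transmissions. The sketch below is a simplification under stated assumptions, not a model of any particular system: every client issues one update per round, every other client must receive it, and awareness management, packet loss and repair traffic are ignored. The function names are hypothetical.

```python
def client_server_transmissions(n_clients: int) -> int:
    """Unicast packets sent per round when a server relays every update.

    Each client sends one packet to the server (n in total), and the server
    forwards each of these to the other n - 1 clients (n * (n - 1) in total).
    """
    uploads = n_clients
    relays = n_clients * (n_clients - 1)
    return uploads + relays


def multicast_transmissions(n_clients: int) -> int:
    """Packets sent per round when each client multicasts its update once.

    The network, rather than the sender, copies the packet to group members.
    """
    return n_clients


# Compare how the two counts grow with the number of clients.
for n in (10, 100, 1000):
    print(n, client_server_transmissions(n), multicast_transmissions(n))
```

Under these assumptions the relayed count grows with the square of the number of clients while the multicast count grows linearly. Mapping awareness management to servers, as described above, reduces the number of recipients per update and is one reason why client-server systems can still support large user populations.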
To use multicast across the Internet it is currently necessary to join some Multicast backbone, such as MBone (Introduction to the MBone, 2002), or to deploy a private equivalent. Multicast backbones use an approach called tunnelling. Each connected LAN has a tunnel process that converts between multicast and point-to-point network packets. Multicast packets are captured by a tunnel process, encapsulated in point-to-point IP packets and sent through firewalls and across the Internet to peer tunnel processes on remote LANs, which strip off the IP headers and redistribute the packets as multicast. Private tunnels typically offer high security and low latency compared to tunnelling across public backbones. The servers can be placed at any LAN or stand-alone computer connected via a tunnel. Servers provide either initial or persistent worlds but are not generally responsible for maintaining the true state of the environment. Maintaining this true state is the responsibility of the clients with the help of distributed consistency control.

Fig. 7.10. Tunnelled group deployment across the Internet (diagram elements: Client, LAN, Tunnel, Server)

4.3 Hybrid

A hybrid solution, pioneered in DIVE (Frécon, Greenhalgh, & Stenius, 1999) and refined in PING, is to allow private computers to link to multicast connected service providers (Figure 7.11). Let us call these IIS providers or IISPs. Tunnels link the IISPs and other servers. An IISP is responsible for converting point-to-point communication from a client into group multicast. Awareness management mapped to group addresses determines which clients, IISPs and other servers receive a given message. IISPs are positioned to minimise latency across the point-to-point link.

Fig. 7.11. Hybrid deployment across the Internet (diagram elements: Client, IISP, Server, Tunnelled multicast network)

5. Conclusion

Inhabited Information Systems (IIS) situate users in a social information context.
In the real world, these users may be co-located or at different geographical locations. The unique combination of IIS technology provides us with unprecedented access to information, and ways of processing, presenting, interacting with and sharing it. The technology maps well to social human communication, supporting not only verbal and non-verbal communication but also unprecedented communication through information objects in the environment. Both information, and the way in which users interact with and around it, must be supported in a natural and intuitive manner. This requires the issues of responsiveness, fidelity, consistency and scalability to be addressed. A multi-level architecture is required to focus these issues on representation, behaviour, synchronisation and communication. We have described the principles of supporting these issues at each level and described how this is done in example systems. Deploying systems over the Internet introduces additional problems of security, bandwidth and dissemination. We have shown three idealised models of deployment to explain how these issues may be addressed for different applications and networks.

For reasons of space, this chapter has focused on the support of IIS that allow people to inhabit information spaces through primarily graphical interfaces. We have not discussed important systems such as COMRIS, where the emphasis is placed on the large scale co-habitation of agents and people within information space and the primary use of audio interfaces.

The underpinning technology of IIS, and particularly the communication systems, is reaching maturity. Simple networked IIS are already in daily public and commercial use. More advanced systems in research offer considerably higher levels of realism and richness. Core to this is the shared interaction with dynamic and steerable information.
Most of the core principles of IIS communication are well developed and a deep understanding of the usability of such systems is being gained. A CVE that addresses all the issues well is yet to emerge. The time has now come for the IIS research community to consolidate and bring together best practice at each architectural level to develop systems to a commercial standard. Distinct applications have diverse requirements and it is unlikely that one system will fit all applications for the foreseeable future. However, we are yet to achieve true realism in social interaction with information in any system. We are some way from being able to work together in an IIS without constantly thinking about the effects of the system, but the light at the end of the tunnel is growing close.

Acknowledgements

The author would like to thank his PhD students, particularly Robin Wolff and Oliver Otto, as well as Anthony Steed and his colleagues at UCL, Emmanuel Frécon and his colleagues at SICS and Frederic Dang Tran and his colleagues within the PING consortium.