

									                                  Group Communication

            Joshua A. Boggis, Richard C. Gronback, Harry L. Sauers, Adam P. Uccello
                                   University of Connecticut
                               Computer Science and Engineering
                         CSE 298/300 – Distributed Object Computing
                                         May 04, 1999


The task of efficiently managing a group of inherently autonomous individuals working on a
common project is difficult at best; impossible at worst. Coordinating their efforts, allowing for the
absence and addition of group members, and ensuring all of them are reading from the same page
of the "script" is a management task that originates with civilization itself. The analogous problem
in achieving reliable, distributed interprocess communications among a number of computers, each
with its own operator, seems even more difficult. In trying to solve this "problem within a
problem," many forms of Group Communication Services (GCS) have been developed and in some
cases deployed in the recent past. They all deal, to one extent or another, with phrases like virtual
synchrony, partitionability, and fault-tolerance. In this paper, we will look at these phrases and
compile a short list of what these services do and do not provide. Next, as the computing world is
ablaze with Java, we will determine what role Java plays or will play in GCSs. Following that, we
will take an in-depth look at one Java-based collaborative development API, the Java Shared Data
Toolkit (JSDT), and determine its capabilities with respect to the fundamentals of GCSs. Finally,
we intend to provide a high level specification of a JSDT-based application, detailing how to
incorporate the basics of a truly collaborative development environment into an existing stand-
alone program.
1. Introduction

The definition of group communication is as dynamic as most newly emerging technologies within
the computing community. Depending on what one considers a group, and what is meant by
communication, it is possible for each person's expectation of what a Group Communication
Service (GCS) should provide to be unique. This paper will begin with an attempt to provide a
general definition of group communication and identify what services are necessary to fulfill its
most basic aspects. As the concept grows in scope and popularity, so does its definition; a set of
extended services that address this broader definition will therefore be identified as well. Section 2 of this
paper will accomplish these goals by describing the methodologies of actual Group Communication
Services that were developed during the earlier days of research and development on the subject.
This will not be an exhaustive analysis, however, as an attempt to formally categorize existing
GCSs is far beyond the scope of this paper. The purpose of this section will be to introduce the key
concepts of group communication, utilizing three real-world examples to demonstrate the
complexities of the topic and how various solutions have been realized. This introduction will be
necessary to better understand and compare against the development efforts currently underway in
the object-oriented paradigm. In particular, it is our opinion that Java will play an ever-increasing
role in the evolving discipline of group communications.
        Most of the early work with Group Communication Services employed the C programming
language to develop APIs that embodied multicast communication protocols. Many of the early
efforts eventually collaborated to provide a mixed bag of layers and interfaces, many of which make
direct UNIX system calls. Many of these efforts are now expanding to provide interoperability with
object-oriented languages and concepts that have been the focus of distributed computing as of late.
With this, Java has the potential to revolutionize the Group Communications field. Java's founding
principles of being object-oriented, platform independent, and secure make it quite compatible with
this area. Specifically, features such as RMI, where object methods can be invoked from across the
office or across the world, may help make Java the language to follow for the future of group
communications. Currently, much research effort is being put into the development of Java-based
GCS packages. We will consider two of these in detail in section 3 of this paper, comparing and
contrasting what they have to offer in relation to the "founding fathers" of GCSs.
        Sun Microsystems' Java Shared Data Toolkit (JSDT) provides several capabilities to enable
Group Communication Services. The JSDT is a Java API designed to support interactive, group-

863cf282-7897-4b0c-a76e-de57aae16e23.doc        Page 2 of 62                                  05/04/99
oriented applications. It supports reliable, multicast, group communications in a distributed
environment. This toolkit allows its user to operate above network protocols and to concentrate on
the group communication, or collaborative services of the application. The low-level network
implementations needed to provide such group communications are essentially left invisible to the
user. However, the JSDT does provide the means to choose the supporting network protocol and
even implement alternate protocols. The group communication tools and flexibility of the JSDT, in
addition to all of the benefits of Java, provide a promising environment for the development of
Group Communication Services.
        Given the toolkits available to us, we would like to see how their features fit into an actual
'real world' application. In order to do this, we will be proposing a design for a collaborative
CRCTool using the JSDT. The CRCTool is an application, currently written in Java, which allows
for the creation and editing of CRC cards for the designing of object-oriented software. The current
tool only allows a single user at a single terminal to work on a given project. Since most projects in
the real world require multiple people to be involved in the design, there is a desire to make the tool
collaborative. In this way, multiple users could access and modify the same project in real-time with
other users. Each user could work from their own terminal while their edits and additions were
immediately visible to other users. Using the Collaborative CRCTool as an application base, we
will see how the common GCS services, uncovered above, map to a real-world design. We will
present a 'top-down' design of the collaborative tool highlighting the abilities and inabilities of Java
and the JSDT to implement the desired functionality.

2. Overview of Group Communication Services

The design of an application that is intended to function in a distributed paradigm is made more
complex when considering the myriad of services it must, or at least should, include. The most
basic of these are multicast and group membership services. Other issues range from the yet to be
explored, to commonplace networking issues like Quality of Service (QoS). The only thing that
seems certain among early attempts to develop Group Communication Services (GCSs) is that no
two share services or semantics. The GCS “layers,” which lie between the underlying network and
the applications they serve, all intend to facilitate the process of building reliable distributed
software systems. In this section, we will discuss a number of “core” components that all GCSs
should share, in addition to some of the more expanded features. This section will end with a
discussion of three GCSs (Transis, Totem and Horus) and how they have achieved these core goals.

2.1 Impossible?
Considering the fact that GCSs are asynchronous and must operate on systems that are known to
fail from time to time, how does a system distinguish between a member that is lost and one that is
merely delayed? How does a system handle the partitioning of a group into subgroups, followed by the
later recombination of subgroups, in a coherent manner? Recent research has indicated that it is
impossible to achieve a truly reliable group membership service in an asynchronous system subject
to failures, so it is up to developers to incorporate a “best-effort” policy. The acceptance of imperfect
solutions to software problems is commonplace within the computing community, and such solutions
are required in group communication.
        After examining several GCSs currently available, it is clear that each was created with a
specific set of principles, goals, and services in mind. Each has its own “solution” to the
impossibilities facing the GCS developer. And, although a recent paper [10] has attempted to
formalize the specification of GCSs, it is clear that given the existing underlying technologies and
motivations, it may prove to be as difficult to generalize them as it is to solve the problems inherent
to the task of group communication. This section will later summarize many of the service
descriptions given in [10], but only to give the reader a more explicit breakdown of the problem.
Before that, however, we will discuss two central issues: virtual synchrony and the state transfer
problem.

2.2 Virtual Synchrony

A key phrase and concept, which is pervasive in literature dealing with GCSs, is virtual synchrony.
Before defining it, however, we need to explore the concept of a view. A view, in GCS
terminology, is a list of the processes with which the process acquiring membership in a group is
currently able to communicate. Additionally, as explained in [10], a view contains information that
distinguishes it from other views that may exist with the same process set. Views are the domain in
which group events and communications take place. The concept of view synchrony, as explained in
[7], “ensures that failures do not result in incomplete delivery of multicast messages…[, and that] if
two processes proceed together from one view of the group membership to the next, they deliver the
same messages in the first view.” Basically, virtual synchrony deals with how the asynchronous
nature of GCSs can be made to appear synchronous through the ordering and delivery of messages
and group information through views. As defined in
[3], “[I]ntuitively, the virtual synchrony principle guarantees that a local view reported to any
member is reported to all other members, unless they crash.” In doing so, each process will
perceive other process failures and configuration changes as occurring in the same order, or at the
same logical time. A recovering process is viewed as a new process. In this model, if the network
partitions, only a primary partition will be allowed to continue operation while the remaining
partitions (components) are blocked. To overcome this limitation, the concept of virtual synchrony
was later expanded to include partitionable systems in the concept of extended virtual synchrony.
        Depending upon the particulars of an application, it may be possible for partitions to
recombine and continue as before the network partition. By employing the extended virtual
synchrony model, it can be guaranteed “that processes in all components of a partitioned network
have a consistent, though perhaps incomplete, history of the system” ([8], 1). Extended virtual
synchrony is able to preserve causality by numbering messages in a way that provides total
ordering, thereby maintaining a consistent relationship between the delivery of messages and
configuration changes. The GCS message ordering service is key to enforcing this and will be
discussed in detail below. The remaining aspects of partitionable groups are covered in the next
section dealing with the state transfer problem.
        The virtual synchrony concept was first introduced in Isis literature, but alone it is not able
to provide enough information concerning the nature and state of a group and its members. The
problem of maintaining a consistent state among group members, even when partitioning occurs, is
the subject of the next subsection.

2.3 The State Transfer Problem

The efficient distribution of objects throughout a network is a currently researched area in
distributed object computing. By distributing many object replicas, a distributed application will
maximize the availability of service and increase the level of fault tolerance. The problem faced in
GCSs is similar, but more difficult in that there also needs to be a consistent state among these
nodes, even after one or more have been detached from the group and later rejoined. For example,
if a whiteboard application has many members and, due to one type of system failure or another, a
smaller group (partition) is detached from the larger (primary partition), there should be some way
to update the application so that it can continue as a whole upon subsequent recombination (Figure
2.1). A solution to this recombination will vary according to the application's requirements;
however, all such solutions must deal with the problem of state transfer.
        The state transfer problem is key to group communication services, since many of the other
issues with basic communications have already been realized in many existing network solutions.
The key is in the development of a protocol that will allow for the partitioning of a group and its
successful, coherent recombination. Ideally, this solution needs to be efficient, which excludes such
solutions as every member broadcasting its state whenever a change in the group is detected. Only
those members without the new state need to be updated. To allow for this coherent recombination,
the extended virtual synchrony concept of a transitional set can be used.
        A transitional set is an extension of the membership notification that enables a process to
know which other processes are proceeding with it from one state to the next. With this
information, it is possible to know in which order to place messages when reconstructing the state
of various processes upon recombination. Many complicated algorithms have been developed to
specify the exact nature of such a recombination, but it is ultimately up to the application writer to
consider the best way for her partitions to recover. The notion of a transitional set, introduced in the
extended virtual synchrony model, allows us to overcome the state transfer problem.
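
To make the idea of a transitional set more concrete, the following is a minimal Java sketch and not the algorithm of any GCS discussed in this paper. It assumes that each member of a newly installed view reports the identifier of the view it was in previously; the class name TransitionalSets, the method transitionalSet, and the use of strings as process names are all hypothetical choices made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of any GCS API named in this paper. */
final class TransitionalSets {
    private TransitionalSets() {}

    /**
     * Returns the members of newView that arrive from the same previous view as self.
     * previousViewOf maps each member of newView to the id of the view it was in before
     * (an assumed input; a real GCS would learn this during the membership protocol).
     */
    static Set<String> transitionalSet(String self, Set<String> newView,
                                       Map<String, Integer> previousViewOf) {
        Integer mine = previousViewOf.get(self);
        Set<String> result = new TreeSet<>();
        for (String member : newView) {
            // A member travels with self only if both came from the same previous view.
            if (mine != null && mine.equals(previousViewOf.get(member))) {
                result.add(member);
            }
        }
        return result;
    }
}
```

In a merge of two partitions, for example, processes a and b arriving from view 1 and process c arriving from view 2 would give a the transitional set {a, b}, which tells a that only c may need a state update.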

                                   Figure 2.1 Network Partition/Recovery
                (stages shown: Network Partitions, Partitions Continue, Partitions Merge)

        With the principles of virtual synchrony and the state transfer problem described, we now go
on to an overview of GCS requirements, as developed in [10] for the formal specification of
the properties of GCSs.

2.4 GCS Requirements

This section opened with a brief overview of what might be expected from a group communication
service. A GCS serves more as a protocol than anything else and, if designed well, incorporates many
of the modular properties found in most modern software development. Certain
assumptions concerning the environment are made in the development of a GCS. These
include the following: the network provides an unreliable datagram service; messages
may be delivered out of order, corrupted, or duplicated; and the application sends uniquely
identifiable messages. These assumptions simplify the requirements of the GCS.
        Although not exhaustive, the list of requirements below should prove sufficient in
considering the needs of a GCS (without straying from its place as a layer between the underlying
network and the overlying application). They are broken down into three main categories, as seen in
Figure 2.2: membership, multicast, and message ordering services.

                                  Figure 2.2 Major Components of a GCS
                       (labels shown: Membership Service, Message Ordering Service)

2.4.1 Membership Service

A membership service is of paramount importance in the successful operation of a GCS. It is the
membership view, mentioned earlier, that allows for a process to know what other processes it is
able to communicate with at a particular time. A membership service can either be partitionable or
only allow for an active primary partition. All three of the GCSs we will look at are partitionable.
Basic membership service requirements are outlined below, with the last four being more advanced
requirements:

Valid Execution Requirement
This is a basic requirement and is obvious: a process must be in a certain view, where it executes
events of that view.

Self-Inclusion
This will preclude a process from installing a view of which it is not a member. Again, this should
be obvious; a view is a list of processes with which the installing member should be able to
communicate, which includes itself.

Local Monotony
When keeping track of which views it is a member of, a process identifies each view with an
indicator that is monotonically increasing. This requirement will facilitate the recovery of
synchrony among processes, as their views will be installed in the same order.

Termination of Membership
In order to prevent a process from waiting forever for a message from another process that is not a
member of that view, a process needs to be able to install a next view. Since processes typically
exchange messages after installing a view, this allows a process to continue without waiting for
messages from a process that may not have installed that view.

Agreement on Membership
Processes should all be in agreement on the next common view. Therefore, a process should only
install a view after it is reasonably sure that the other processes in the current view will be installing
the new view.

Transitional Set
Expanding upon agreement on membership, in order to exploit the true benefits of extended virtual
synchrony, a process needs more information than just a list of what other processes are part of a
view. It is also important for a process to know what members of the previous view are continuing
on to the next common view. Thus, a transitional set represents this additional information that is
provided along with the view message. More of this will be discussed below under the virtual
synchrony requirement of the multicast service.

Causal Monotony of Views
Again expanding on the previous, this requirement states that the unique “view identifier should
reflect the 'causal' order of events in the system … When two processes reconnect, they can exploit
this property to find out which process is more updated, i.e. was a member of a later primary
partition in the global history” ([10], 32).

Preciseness
The extent of this requirement is limited by the nature of the underlying network. It is desirable for
the membership service to reflect the existing condition of the group as accurately as possible. To
be precise, the membership is required to achieve some level of completeness and accuracy, where
one is usually sacrificed due to practical necessity in achieving the other. As pointed out in [10], an
external failure detector may be utilized in order to convey to a process that another process may
have failed. Thus, the effectiveness of the preciseness requirement is dependent upon the abilities
of the failure detector.
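
Two of the basic requirements above, self-inclusion and local monotony, are simple enough to express as checks at the moment a view is installed. The sketch below is illustrative only: the classes View and MembershipState, the use of a long as the view identifier, and strings as process names are assumptions of ours and do not come from [10] or from any of the systems described later.

```java
import java.util.Set;

/** Hypothetical view record: an identifier plus the member set. */
final class View {
    final long id;
    final Set<String> members;

    View(long id, Set<String> members) {
        this.id = id;
        this.members = Set.copyOf(members); // defensive, immutable copy
    }
}

/** Hypothetical per-process membership state enforcing two of the basic requirements. */
final class MembershipState {
    private final String self;
    private View current; // null until the first view is installed

    MembershipState(String self) {
        this.self = self;
    }

    /** Installs the view only if both checks pass; returns whether it was installed. */
    boolean install(View next) {
        if (!next.members.contains(self)) {
            return false; // self-inclusion: never install a view that excludes this process
        }
        if (current != null && next.id <= current.id) {
            return false; // local monotony: view identifiers must strictly increase
        }
        current = next;
        return true;
    }

    View current() {
        return current;
    }
}
```

The remaining requirements (termination, agreement, preciseness) depend on message exchange and failure detection among processes and cannot be captured by a local check of this kind.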

2.4.2 Multicast Service

If the membership service represents the group in GCS, then the multicast service represents the
communication. The multicast service is responsible for delivering messages to the current view
membership. As many existing network technologies employ one form of multicast service or
another, it is up to a GCS to manage such ability to provide for the existence of coherent groups.
The need for high reliability, strict ordering and performance of a GCS may not coincide with the
network's ability to deliver its QoS, if defined. It is therefore necessary for many GCSs to provide
“reliable multicast paradigms,” rather than depend upon the QoS of the underlying network layers.
Of course, these requirements will vary depending on the needs of the GCS and the application
types it is being developed for. Below is a listing of some of the basic and advanced requirements
that a multicast service should provide (where it should be noted that some of these requirements
may prove to be mutually exclusive):

Delivery Integrity
A GCS should never spontaneously generate a message. This is a rather trivial requirement as it
was essentially covered by the assumption that the underlying network would not do so.

No Duplication
Messages are not duplicated by the GCS. This does not restrict the underlying network from
duplicating messages, however.

Termination of Delivery
Any blocking of messages by the membership protocol eventually terminates. This simply states that
messages will eventually be delivered, although due to the nature of asynchronous networks, there
is no strict limit on the time, or latency, of delivery.

Same View Delivery
A message must be delivered in the same view at every process that delivers it. This requirement is
strengthened below under Virtual Synchrony.

Virtual Synchrony
If two processes participate in the same two consecutive views, then they receive the same set of
messages in the former. In order for a process to know which of the members of a previous view
will be continuing to the next, the aforementioned transitional set is utilized. This concept is
generally included in the Extended Virtual Synchrony model, currently employed in many modern
GCSs.

Synchronous Delivery – At the Same View as Sending
Every message is to be delivered in the view in which it was sent. This may require the GCS to
“block the sending and delivery of messages while a membership change is taking place. An
alternative way to fulfill this requirement is by discarding 'left over' messages that arrive in the
course of a membership change in later views, including messages from members that continue
from the previous view to the current view” ([10], 36). GCSs that incorporate this requirement are
generally said to support Strong Virtual Synchrony.

Self-Delivery
Each message is delivered at least to its sender. This requirement “prevents processes from
arbitrarily discarding left-over messages upon view changes” ([10], 37).
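
The virtual synchrony requirement above can be read as a property that is checkable after the fact: every process that continues from one view to the next must have delivered the same set of messages in the former view. The following Java sketch expresses that check under our own assumptions (message and process identifiers are strings, and the class name VirtualSynchronyCheck is hypothetical). It verifies the property on recorded data; it is not a protocol that enforces it.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical after-the-fact check of the virtual synchrony property. */
final class VirtualSynchronyCheck {
    private VirtualSynchronyCheck() {}

    /**
     * continuing: processes that took part in both consecutive views.
     * deliveredInFormerView: for each process, the ids of messages it delivered in the former view.
     * Returns true when all continuing processes delivered identical message sets.
     */
    static boolean holds(Set<String> continuing,
                         Map<String, Set<String>> deliveredInFormerView) {
        Set<String> reference = null;
        for (String process : continuing) {
            Set<String> delivered = deliveredInFormerView.getOrDefault(process, Set.of());
            if (reference == null) {
                reference = delivered;      // first process sets the baseline
            } else if (!reference.equals(delivered)) {
                return false;               // two survivors disagree on what was delivered
            }
        }
        return true;
    }
}
```

Note that a process which did not continue to the next view is allowed to have delivered fewer messages; only the survivors are compared.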

2.4.3 Message Ordering Service

The following outlines some of the possible QoS levels that may be implemented in a GCS. As the
multicast requirements above illustrate, there are many options in the delivery of multicast
messages. The most important of these are reliability and ordering. In order to provide a consistent group
view, the order of the delivery of messages is important in the design of a GCS. Below is a list of
some basic requirements that a GCS should support in its message ordering service:

FIFO Delivery
This QoS type guarantees that messages from the same sender are not delivered out of order. This
would represent the lowest, most basic level in the QoS ordering hierarchy. If the underlying
network can guarantee FIFO delivery, a GCS that depends upon this level of ordering can be
trivially implemented.

Causal Delivery
This is an extension of FIFO, requiring that a response to a message is always delivered after the
delivery of the original message. This guarantee only applies to the current configuration and does
not extend back to previous ones.

Strong Agreed Delivery
This requirement guarantees that messages are delivered in the same order everywhere. With this
method, processes must agree upon the order of messages, even if they become disconnected.

Weak Agreed Delivery
This guarantees that processes that remain connected deliver messages in the same order. This
method corresponds to strong agreed delivery unless there is a link failure.

Reliable FIFO, Causal, and Agreed
These reliable versions guarantee that the order is continuous within each view, in addition to the
order among messages that are delivered.
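
As an illustration of the lowest level of this hierarchy, FIFO delivery over a network that may reorder or duplicate messages can be obtained with per-sender sequence numbers and a hold-back buffer. The Java sketch below is one common way to do this and is offered under our own assumptions: the class FifoReceiver, its method receive, and the convention that each sender numbers its messages from 0 are hypothetical and are not drawn from any GCS described in this paper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical receiver-side buffer that releases each sender's messages in sequence order. */
final class FifoReceiver {
    // Next sequence number expected from each sender (assumed to start at 0).
    private final Map<String, Long> nextExpected = new HashMap<>();
    // Messages that arrived early, held back per sender and sorted by sequence number.
    private final Map<String, TreeMap<Long, String>> pending = new HashMap<>();

    /** Accepts a message from the network and returns the payloads now deliverable, in order. */
    List<String> receive(String sender, long seq, String payload) {
        long expected = nextExpected.getOrDefault(sender, 0L);
        List<String> deliverable = new ArrayList<>();
        if (seq < expected) {
            return deliverable; // duplicate of a message that was already delivered
        }
        TreeMap<Long, String> held = pending.computeIfAbsent(sender, s -> new TreeMap<>());
        held.putIfAbsent(seq, payload); // ignore duplicates that are still waiting
        while (held.containsKey(expected)) {
            deliverable.add(held.remove(expected)); // release the contiguous run
            expected++;
        }
        nextExpected.put(sender, expected);
        return deliverable;
    }
}
```

Causal and agreed delivery need additional information (for example, knowledge of which messages a sender had seen), so they cannot be built from per-sender counters alone.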


                               Figure 2.3 Message Ordering QoS Hierarchy
                      (levels shown: FIFO, CAUSAL, RELIABLE CAUSAL, WEAK AGREED)

2.4.4 Additional Considerations

In addition to the three major categories of requirements for GCSs above, the issue of Message
Stability is raised in [10], which points out that an “all or nothing” approach to message
delivery is not possible in a system in which a process can fail. Therefore, message stability is the
next best thing. A message is considered “stable” at a process when that process is informed that all
other processes in the current view have also received the message. At that point, the message will
never need to be retransmitted, and the process may discard it. Some GCSs may choose to implement
safe messages as a level of QoS. Some may opt to provide a quorum-based alternative in which a
message is safe when a certain majority of processes have received it. Either way, it represents yet
another choice to be made by the GCS developer in the assessment of what he or she is trying to
provide to potential distributed application developers.
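
The bookkeeping behind message stability can be sketched in a few lines of Java. The class StabilityTracker below is hypothetical, as are its method names; it assumes that acknowledgements reach the tracking process by some means the sketch does not model, and it treats the quorum variant as a strict majority of the current view, which is only one possible reading of "a certain majority."

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical tracker: a message is stable once every member of the view has acknowledged it. */
final class StabilityTracker {
    private final Set<String> view;
    private final Map<String, Set<String>> acks = new HashMap<>();

    StabilityTracker(Set<String> view) {
        this.view = Set.copyOf(view);
    }

    /** Records an acknowledgement and reports whether the message is now stable. */
    boolean ack(String messageId, String member) {
        if (view.contains(member)) { // acknowledgements from non-members are ignored
            acks.computeIfAbsent(messageId, id -> new HashSet<>()).add(member);
        }
        return isStable(messageId);
    }

    /** Stable: all members of the current view have acknowledged the message. */
    boolean isStable(String messageId) {
        return acks.getOrDefault(messageId, Set.of()).containsAll(view);
    }

    /** Quorum variant (assumed here to mean a strict majority of the view). */
    boolean hasMajority(String messageId) {
        return acks.getOrDefault(messageId, Set.of()).size() * 2 > view.size();
    }
}
```

Once isStable returns true for a message, the sender's copy can be released from its retransmission buffer.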
        Another aspect of distributed applications that has received much attention in recent
research, but is not addressed in [10], is the matter of security. Perhaps the specification for
requirements of a GCS as outlined above and taken from [10] should be expanded to include a
security segment. It would seem important to the membership service to ensure that only a valid
process is allowed to join a view, yet it has been addressed by only a few of the GCSs thus far.
Horus was the first one described here to make mention of security as a design consideration, while
the others added security later, after focusing on performance issues and fault tolerance. We are
certain that eventually, all developers of GCSs will include a level of security protocols to help
maintain the authenticity of members and security of transactions.

2.5 Properties of Existing Group Communication Services
In this subsection we will explore three GCSs, all of which were featured in the April 1996 issue of
Communications of the ACM. They were all designed to fulfill the needs of a certain group of
applications, and therefore each will display certain strengths and weaknesses. We will provide a
brief overview of their respective functionality, using the terminology specified in section 2.4 as a
common frame of reference between these disparate packages. For an overview of a general GCS
architecture, see Figure 2.4 below:

                                      Figure 2.4 Typical GCS Architecture
      (labels shown: safe indication, app recv, app send, view change; Group Communication System;
                               net send, net recv, recv; Failure Detector)

2.5.1 Transis

Transis is a multicast, transport-level GCS that focuses on fault-tolerant, partitionable group
services in large-scale environments. Developed at the Hebrew University of Jerusalem, Transis
exploits the underlying network structure in its implementation and has produced a partitionable
operation methodology that has been adopted in both the Horus and Totem systems. Transis has
extended the virtual synchrony model to partitionable systems in such a way as to provide a
coherent system behavior to the application developer.

Transis Membership Service
Transis employs a Group Manager to coordinate the group messages and views among its members.
Each member maintains a local view, which contains a list of the currently connected
members. As each view has a lifetime, it may undergo changes, which are indicated to each member by
a view change event. Transis also uses the concept of hidden views to aid in the
detection of a condition where a “view has failed to form but may have succeeded to form at
another part of the system” [3]. Each member in the group holds a positive integer, which has been
agreed upon by all members of the view based on local counters, to indicate uniqueness of the view.
The developers of Transis have chosen to make it complete in that it will ultimately exclude a slow
process as having failed.

Transis Multicast Service
Transis provides filtering at gateways through a hierarchical communication structure, thereby
preventing the flooding of the WAN with local message traffic. Transis employs the sliding
window flow control method and allows for ACKs to piggyback on regular message traffic. Message
retransmission depends upon the receipt of a NACK, or negative acknowledgement, which is
explicitly sent. A periodic “I am alive” message is sent to preclude a process from being excluded.
        Fast cluster communication is achieved with Transis by exploiting the existing network
reliable multicast, “derived from the Trans protocol, that employs the Deering IP-multicast
mechanism for disseminating messages using selective hardware-multicast” [3].
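
The general idea of NACK-driven retransmission mentioned above can be illustrated with a receiver that notices gaps in a sender's sequence numbers and lists the numbers it would ask to have resent. This Java sketch is a generic illustration and makes no claim to reproduce the Transis protocol; the class GapDetector, the method onReceive, and the assumption that sequence numbers start at 0 are ours.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical gap detector for one sender's stream; not the Transis implementation. */
final class GapDetector {
    private final TreeSet<Long> received = new TreeSet<>();

    /**
     * Records a received sequence number and returns every lower sequence number
     * that is still missing, i.e. the candidates for a negative acknowledgement.
     */
    List<Long> onReceive(long seq) {
        received.add(seq);
        List<Long> missing = new ArrayList<>();
        long highest = received.last();
        for (long s = 0; s < highest; s++) {
            if (!received.contains(s)) {
                missing.add(s); // a gap below the highest number seen so far
            }
        }
        return missing;
    }
}
```

A practical implementation would bound the scan with a sliding window and suppress repeated NACKs for the same gap; both are omitted here for brevity.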


                                      Figure 2.5 Transis System Model
             (labels shown: send messages, message delivery, group status, Agreed, Group)

Transis Message Ordering Service
Depending upon the requirements of the application, Transis multicasts messages to members of a
group in one of many methods: FIFO, causal, agreed and safe. As described in [3]:

863cf282-7897-4b0c-a76e-de57aae16e23.doc          Page 15 of 62                              05/04/99
    FIFO multicast guarantees sender-based FIFO delivery order. FIFO and Causal are cheapest and fastest.
    Causal multicast preserves the potential causal order among messages, guaranteeing that a response to a
    certain message will not be delivered before the message.
    Agreed multicast enforces a unique order among every pair of messages in all of their destinations. This
    total ordering incurs a larger delay, but is useful when dealing with replicated information.
    A Safe multicast guarantees a unique order of message delivery, and in addition, delays message
    delivery until the message is acknowledged by the transport layers in all of the machines in the group,
    thus guaranteeing delivery atomicity in case of communication failures. This results in the longest delay
    while ensuring that delivery of a message occurs after all machines on the network have received a copy
    of the message.

Transis Conclusion
Clearly, with Transis' approach to GCS development, it is possible to develop large-scale WAN
applications that do not depend on a primary partition to remain in operation. By using underlying
hardware multicast technologies, Transis is able to exploit existing networking features to achieve
its goals. It is able to sustain high throughput due to an effective flow control mechanism. As
Transis is available for Unix systems as an API written in C, the need for a platform independent
equivalent that incorporates the use of higher-level building blocks is clear. The focus of Transis'
development has been on a partitionable membership service, whereas Totem focuses on fault
tolerance and real-time performance.

2.5.2 Totem

The Totem system was developed using the C programming language at the University of
California, Santa Barbara, and “provides reliable, totally ordered multicasting of messages over
local-area networks (LANs) and exploits the hardware broadcasts of such networks to achieve high
performance” [7]. It is the goal of its designers for Totem to provide a solid GCS, upon which fault
tolerant and real-time performance dependent applications can be built.

Totem Membership Service
Totem provides a token-based, partitionable membership service.                 At its lowest level, Totem
superimposes a ring topology on the underlying network. A token is circulated around this ring in a
point-to-point fashion. Only a processor holding the token can transmit a message, while a token
retransmission facility is provided to allow for a lost token. A token includes a ring (view)
identifier in addition to a monotonically increasing sequence (seq) identifier and timestamp.

            Figure 2.6 Totem Token (diagram not reproduced; the token carries the fields ring id,
            seq, aru, flow control, backlog, and retransmission request list, and is drawn
            alongside Processors p, t, and s)

        The single-ring protocol incorporates a membership or configuration service that allows for
addition of new or rejoining members, and additionally can detect faulty members via a timeout
mechanism. The protocol attempts to form as large a membership as possible through consensus
and termination. Consensus ensures every member in a configuration agrees on the membership of
that configuration. Termination ensures “every processor installs some configuration with an
agreed membership within a bounded time unless it fails within that time” [7]. This is possible
through Totem's use of an unreliable failure detector, which necessarily must exclude some slow
processes, as they are indistinguishable from failed ones.
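
The timeout mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Totem's implementation: the class name, the use of string processor identifiers, and the explicit clock argument are all hypothetical. It shows why such a detector is unreliable, since a processor that is merely slow is reported exactly like one that has failed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a timeout-based (unreliable) failure detector.
final class TimeoutFailureDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeard = new HashMap<>();

    TimeoutFailureDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records that a message or token from the processor arrived at nowMillis. */
    void heardFrom(String processor, long nowMillis) {
        lastHeard.put(processor, nowMillis);
    }

    /** Processors silent for longer than the timeout; slow ones are included too. */
    Set<String> suspects(long nowMillis) {
        Set<String> result = new HashSet<>();
        for (Map.Entry<String, Long> e : lastHeard.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) result.add(e.getKey());
        }
        return result;
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic; a real detector would read a clock and would feed its suspect set into the membership protocol.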

        With a change in membership detected, the membership protocol constructs a new ring and
reaches a new consensus. Two Configuration Change messages are then sent out to ensure an
accurate transition from old to new configuration is achieved.

Totem Multicast Service
Totem satisfies the requirements of the transitional view and extended virtual synchrony, in
addition to the less stringent requirements that all GCSs fulfill. As with all GCSs examined here,
Totem builds upon a best-effort multicast service, using the user datagram protocol (UDP) in order
to exploit the increased performance of hardware broadcasts on the LAN. The single-ring layer at
the bottom of the Totem hierarchy extends this best-effort multicast by providing a “service of
reliable totally ordered delivery of messages on a single LAN while providing fault detection,
recovery, and configuration-change services” [7]. Above the single-ring protocol lies the multiple-
ring protocol which is then able to ensure global totally ordered messages (Fig. 2.7).

Totem Message Ordering Service
Totem employs two message-ordering services: agreed and safe. These correspond to the Strong
Agreed Delivery and Reliable FIFO requirements described above. Safe messages also satisfy the
“stable” message concept described in 2.4.4. The following descriptions of agreed and safe
delivery, as implemented in Totem, are taken from [7]:
       Agreed delivery guarantees that, when a processor delivers a message, it has already delivered all
prior messages originated by processors in its current configuration and timestamped within the duration of
that configuration.
       Safe delivery further guarantees that before a processor delivers a message, it has determined that
every other processor in its current configuration has received the message. Safe delivery is useful, for
example, in transaction processing systems where a transaction must be committed by all of the processors or
none of them.
        These services fall within the normal extended virtual synchrony requirements, which
require born-ordered messages. Born-ordered messages are used in Totem and mean “that the
relative order of any two messages is determined directly from the messages, as broadcast by their
sources” [7]. Within the token, as mentioned above, is a seq field. This field provides for the
sequential numbering of all messages within a ring. When the process possessing the token
broadcasts a message, it increments seq in the token and assigns the message this sequence
number. Other processes are thereby able to detect gaps in the messages they receive and may
request the retransmission of certain messages. The retransmission request list for messages is also
within the token (Fig. 2.6). If a process has received all of the messages up to and including the
current one, it accepts that message as agreed. The all-received-upto (aru) field in the token
indicates the sequence number up to which all processes have received every message. If a process
has a message with a sequence number less than or equal to aru, it can deliver it as safe, and
reclaim its buffer space.
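
The bookkeeping described above can be sketched in Java. The field names seq and aru follow the text; the classes and methods are hypothetical and greatly simplified, and are not Totem's actual data structures. In particular, this sketch keeps a message buffered until it is both delivered locally and covered by aru.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch; seq and aru follow the text, everything else is illustrative.
final class Token {
    long seq = 0; // sequence number of the last message broadcast on the ring
    long aru = 0; // all-received-up-to: every processor has all messages <= aru

    /** Called by the token holder for each message it broadcasts. */
    long nextSeq() {
        return ++seq;
    }
}

final class RingReceiver {
    // Messages are kept after delivery so they can still be retransmitted to others.
    private final TreeMap<Long, String> buffer = new TreeMap<>();
    private long delivered = 0; // highest seq delivered in agreed order

    void receive(long seq, String payload) {
        if (seq > delivered) buffer.putIfAbsent(seq, payload);
    }

    /** Delivers, in sequence order, every message with no gap before it. */
    List<String> deliverAgreed() {
        List<String> out = new ArrayList<>();
        while (buffer.containsKey(delivered + 1)) {
            delivered++;
            out.add(buffer.get(delivered));
        }
        return out;
    }

    /** Sequence numbers to place on the token's retransmission request list. */
    List<Long> missing(long tokenSeq) {
        List<Long> gaps = new ArrayList<>();
        for (long s = delivered + 1; s <= tokenSeq; s++) {
            if (!buffer.containsKey(s)) gaps.add(s);
        }
        return gaps;
    }

    /** Messages at or below aru are safe everywhere; their buffer space is reclaimed. */
    int reclaimSafe(long aru) {
        long limit = Math.min(aru, delivered);
        int before = buffer.size();
        buffer.headMap(limit, true).clear();
        return before - buffer.size();
    }

    int buffered() {
        return buffer.size();
    }
}
```

A receiver that holds messages 1 and 3 delivers only message 1 and reports 2 as missing; once 2 arrives, messages 2 and 3 are delivered together, and an aru of 2 lets the first two buffers be reclaimed.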

        Figure 2.7 Totem System Hierarchy (diagram not reproduced). The layers, from top to
        bottom, with the labels shown between adjacent layers:

            Application Layer
              ordered multicast to process group; process group membership changes
            Process Group Interface
              globally ordered reliable multicast; network topology changes
            Multiple-Ring Protocol
              locally ordered reliable multicast; local configuration changes
            Single-Ring Protocol
              best effort multicast; timeouts and absence of messages
            Physical Medium

Totem Conclusion
Although Totem has been ported to other platforms, it was originally designed to be run on Sun
Microsystems' Unix-based platforms, taking advantage of “the Ethernet hardware broadcast
capabilities and standard Unix facilities, particularly Unix UDP sockets, to broadcast messages and
to transfer the token” [7]. It has been shown to be more scalable than other token-based protocols,
using a filtering mechanism on each gateway in order to exploit process-group locality, and thereby
scale logarithmically, rather than linearly, to larger networks. While Totem provides a high
performance GCS, the flexibility available in Horus may make it a more attractive GCS for many

2.5.3 Horus

As was described in the introduction to this section, group communication systems are constructed
based on a greatly varied set of requirements. Most GCSs focus on a particular subset of these in
delivering the required functionality. Developed at Cornell University, Horus provides such a
flexible and extensive set of microprotocols that it may be configured so as to meet most of these
requirements. “This flexibility extends to system interfaces, the properties provided by the protocol
stack, and even the configuration of Horus itself, which can run in user space, in an operating
system kernel or microkernel, or be split between them” ([9], 77). Horus is able to do this by
intercepting certain Unix system calls through use of an intercept proxy. “The proxy redirects the
system calls, invoking Horus functions that create Horus process groups and register appropriate
protocol stacks at run time. Subsequent I/O operations on these group I/O sockets are mapped to
Horus communication functions” ([9], 80).             Of course, Horus can also be used in the more
conventional “toolkit” method.
        Horus has been saved for last as it incorporates many of the modular features upon which
GCSs of the next section are fundamentally based. Horus is extremely flexible in its approach to
group communication and is easily abstracted to a Lego block resembling model. Each
microprotocol in Horus can be represented as a block in a stack of Lego-style interconnecting
pieces. These blocks have standardized interfaces and may be placed in any sensible order where
each block performs one of many possible communication features. These interfaces provide entry
points for the block's up-call and down-call interactions (Fig. 2.8). For example, if an application
which uses Horus were to be run on a platform that has reliable message transmission, a
retransmission protocol would not be required. Therefore, the “block” in Horus that would provide
such a feature would not be needed in the stack. With Horus, a “process group can be optimized by
dynamically including or excluding particular modules from its protocol stack” ([9], 77).

                        Figure 2.8 The Horus System (layered approach example)
        (diagram not reproduced; surviving labels: Tcl/Tk, Shared Debugger, Horus Socket Library,
        Horus X-Library, Horus Intercept Proxy, Unix Kernel, and the stacked blocks TOTAL, FC, MERG,
        PARCLD, MBRSHIP, FRAG, NAK, COM)
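
The Lego-style stacking described above can be sketched in Java. This is not the Horus HCPI, which is a C interface. Every class name here is hypothetical, and the bottom block simply loops messages back instead of touching a network. The sketch shows blocks with a uniform down-call and up-call interface, and a fragmentation block that can be included in or left out of the stack without changing its neighbours.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Lego-style protocol layering; not the Horus HCPI.
abstract class Layer {
    Layer above, below;
    void down(byte[] msg) { below.down(msg); } // default: pass toward the network
    void up(byte[] msg) { above.up(msg); }     // default: pass toward the application
}

/** Splits outgoing messages into fragments and reassembles incoming ones. */
final class FragLayer extends Layer {
    private final int maxPayload;
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();

    FragLayer(int maxPayload) { this.maxPayload = maxPayload; }

    @Override void down(byte[] msg) {
        int offset = 0;
        do {
            int len = Math.min(maxPayload, msg.length - offset);
            byte[] frag = new byte[len + 1];
            frag[0] = (byte) (offset + len == msg.length ? 1 : 0); // 1 marks the last fragment
            System.arraycopy(msg, offset, frag, 1, len);
            offset += len;
            below.down(frag);
        } while (offset < msg.length);
    }

    @Override void up(byte[] frag) {
        partial.write(frag, 1, frag.length - 1);
        if (frag[0] == 1) {
            byte[] whole = partial.toByteArray();
            partial.reset();
            above.up(whole);
        }
    }
}

/** Stand-in for the bottom COM block: loops every frame straight back up. */
final class LoopbackLayer extends Layer {
    int framesSent = 0;
    @Override void down(byte[] msg) { framesSent++; above.up(msg); }
}

/** Top of the stack: records what the application would receive. */
final class AppLayer extends Layer {
    final List<String> delivered = new ArrayList<>();
    void send(String text) { below.down(text.getBytes(StandardCharsets.UTF_8)); }
    @Override void up(byte[] msg) { delivered.add(new String(msg, StandardCharsets.UTF_8)); }
}

final class ProtocolStack {
    /** Links the given layers in order, first argument on top. */
    static void link(Layer... topToBottom) {
        for (int i = 0; i + 1 < topToBottom.length; i++) {
            topToBottom[i].below = topToBottom[i + 1];
            topToBottom[i + 1].above = topToBottom[i];
        }
    }
}
```

Calling ProtocolStack.link(app, new FragLayer(4), com) yields a stack that fragments an 11-byte message into three frames, while ProtocolStack.link(app, com) sends it as one. The application code is identical in both cases, which is the point of the block model.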

        In order to provide the extensive interface required between microprotocol layers, Horus
provides a Horus Common Protocol Interface (HCPI).              One interface of HCPI deals with
membership services, another deals with the sending, receiving and stability of messages. HCPI
was created to provide for maximum configuration flexibility, and is fully reentrant in order to
allow for multiprocessing. Below, we will attempt to categorize the functionality of these blocks
into the three main features of GCSs:

Horus Membership Service
Horus also implements a partitionable membership service, with views that incorporate process
identifiers and local counter values. Horus stacks are shielded from one another, each with its own
thread and memory space. As described in [9], within each stack are three objects that form an
integral part of its membership and messaging services: groups, endpoints and messages. Endpoint
objects model the communicating entity and may correspond to a machine, process, socket, etc.
Group objects maintain the local protocol state on an endpoint. They contain the group address and
a view that lists endpoint addresses that represent other group members. Several membership
protocols in Horus allow for the transition between views should a process crash, where view
membership can be agreed upon in the process. The message object is passed from layer to layer
and provides for local storage.
        The features associated with group membership in Horus are encompassed in its MBRSHIP
microprotocol. This microprotocol “runs a consensus protocol to provide its users with a virtually
synchronous execution model” ([9], 78), and maintains the list of accessible endpoints.        The
aforementioned membership HCPI interface provides for the following functionality: “In the down
direction, it lets an application or layer control the group membership used by layers below it. As
upcalls, these report membership changes, communication problems, and other related events to the
application” ([9], 79). The remaining HCPI interface category is utilized in the microprotocol
layers of Horus which deal with multicast and message ordering capabilities.

Horus Multicast Service
The COM microprotocol in Horus provides the group interface to low-level protocols such as IP,
UDP and some ATM interfaces. This typically represents the bottom-most block in the stack where
its thread sits and waits for messages arriving via the NIC. Horus is able to implement many of the
requirements of 2.4.2 through this COM microprotocol, and is able to provide flow control with its
FLOW microprotocol.
        To break up large messages for transmission, Horus utilizes its FRAG layer. And, to
provide a layer of security, the CRYPT microprotocol can be added to the stack to allow for the
encryption and decryption of messages. As Horus is a partitionable multicast service, it offers the
MERGE microprotocol to facilitate the location and merging of multiple group instances. It also
provides the NAK layer for negative-acknowledgement-based message retransmission.

Horus Message Ordering Service
The TOTAL microprotocol can be layered into the stack to provide totally ordered message delivery.
The MERGE microprotocol employs the virtual synchrony principle to rank the processes it finds,
employing the lowest-ranked process to maintain a master clock. Using this and the TOTAL
microprotocol, logical timestamping of objects can be performed consistently. Horus uses a token,
rotated among the current set of senders, to implement the TOTAL protocol. Messages are
numbered and use event count synchronization variables to reconstruct order if needed. By layering
the desired functionality provided by stackable microprotocols such as those described, a varying
degree of ordering and state-transferability can be acquired.

Horus Conclusion
Clearly, Horus is the most advanced of those described here, at least in terms of flexibility and
security. It allows the developer to only use those features of a GCS that it requires, based upon a
combination of application needs, underlying network capabilities and performance considerations,
to name a few. Horus was initially a project designed to extend the capabilities of Isis to include
security and real-time functionality. It has since progressed into a new system, Ensemble, that is
written in ML and provides interfaces to C, C++ and Java.
        By itself, application development using Horus requires intimate knowledge of the system;
more than most potential users would know. Therefore, a more transparent interface to Horus
functionality is offered in Electra, a CORBA-compliant complementary interface to Horus.
Together, Horus and Electra can be used to build more modern “object groups.” Thus, the Horus
system represents a transitional GCS in that it has been extended to include concepts that will be
covered in the following section on Object Groups. However, Horus, with its C language
implementation and strictly platform-dependent qualities, is far from those GCSs implemented in
Java, which represent the most recent trend in platform independent software architectures.

2.6 Overview of GCS Summary

We have looked at a system of specifying the requirements of GCSs that includes the main features
of a membership service, a multicast service and a message ordering service. We would like to see
this formal specification for GCS requirements address the need for security in a system. And, after
looking at three examples of existing “middle generation” GCSs, it is clear that each is focused on
providing a subset of the overall needs of the GCS developer. Each has its strengths and
weaknesses, with Horus being the most flexible and having already been extended to include a more
modern, object group approach.
        As the object-oriented programming environment has become increasingly popular, so has the
interest in distributed objects and the object “bus.” In the next section, we will explore the
adaptability of the object-oriented paradigm to the development of GCSs. As seen with Horus,
which can be viewed as an object-style design, the concept of objects in group communication is a
natural extension. And, with the platform independence of Java, it is not surprising to find many
newer GCSs being developed in the strictly object-oriented Java programming language.

3. Enter Java

Before Java emerged as a viable platform for Group Communication products, all of the existing
products were limited in their platform independence. Systems such as Horus and
Transis relied on direct system calls for handling the network streams. This limitation is one of the
main inhibitors to bringing GCS products into the mainstream. C++ added some object-oriented
functionality, but it was still as platform dependent as C. With the emergence of Java all of this has
the potential to change. In this section we will cover the advantages that Java brings into the GCS
paradigm, and look at two 100% Java GCS products that are currently being worked on.

3.1 Advantages of the Java Language

Using the C language for writing a GCS product has many disadvantages, ranging from platform
dependence to the inherent limitations of client-server communications. Java originally came about
due to the frustrations of programmers over such general limitations of C. The same advantages of
using Java over C for programming can be seen in the advantages of Java over C for GCS, and they
can be broken down into the following categories:

3.1.1 Object Persistence (Serialization)

Object persistence is the ability to take an object and save its current state so that it can be loaded
again in the future, perhaps in another environment. Java accomplishes this through a process
called serialization. Because Java can send output through a stream, serialization can save the
object to disk or send it across the network, a major advantage for GCS. The serialized form
records the object's class and field values, so the objects of a process can be stopped, serialized,
and started up again on another machine that has the same classes available.
        When an object is deserialized a validity check is run on it. If the check encounters any
errors with the object it will throw an exception. This built-in check not only allows easy detection
of errors due to network problems, but adds a powerful security check to prevent altered objects
from entering the system. This security only works within the Java Runtime Environment, however.
Since the object being serialized can be either written to disk or sent across a network, the data will
be located outside of the system and open to various security risks. Through the serialized object, it
is possible to gain access to the file system of the machine where the serialized object originated.
Such problems can be prevented, however, by declaring the object's sensitive fields to be transient [4].
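
The effect of the transient keyword can be seen in a small round trip through the standard java.io object streams. The SharedNote class and its fields are hypothetical; only the serialization mechanism itself is standard Java. The sketch also shows the stream check rejecting damaged data, here simulated by truncating the byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical shared-document state; only the mechanism is standard Java.
final class SharedNote implements Serializable {
    private static final long serialVersionUID = 1L;

    final String text;
    transient String localPath; // transient: never written to the stream

    SharedNote(String text, String localPath) {
        this.text = text;
        this.localPath = localPath;
    }
}

final class SerializationDemo {
    /** Serializes an object to a byte array, as it would travel over a network. */
    static byte[] toBytes(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the object; a damaged or truncated stream raises an exception. */
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("stream rejected", e);
        }
    }
}
```

After a round trip the text field survives while localPath comes back as null, so the local file location never leaves the originating machine.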

3.1.2 Platform Independence

One of the founding principles of Java is that it is platform independent. If you write a program for
a Windows based machine the same code can work on a Unix machine. The GUIs will look the
same (when using Swing instead of AWT), and program aspects that are usually heavily system
dependent, such as file access and network communication, are done via simple Java calls that
function the same in any environment.

3.1.3 Low Cost

Many may not believe that this is an important factor, but cost is a major consideration for most
corporations. The dramatic outpouring of support for the Linux operating system is one example of
how cost can decide what is best for an environment. Java is currently free, and the JRE is free to
distribute along with Java applications. Java also has a large support base for its product online.

3.1.4 Secure

Security is an important factor in the Java language as it was developed with the idea of network
use in mind. As Java is an Object Oriented language the design methodologies of OO make it quite
secure. Through the use of private data the user will be unable to access sections of the objects that
are protected.

        Method calls on other machines do open potential security holes. Java has taken this into
account however and designed a system where all method calls search the local machine first before
heading somewhere else. This makes it impossible for someone to attempt to override a common
function, such as print, with a malicious one. In addition, method calls that do get activated are run
through a bytecode verifier first [5]. This verifier makes certain that the code only does what you
allow it to do. It also makes checks so that things such as modified serialized objects are not allowed
back into the system.

3.1.5 Remote Method Invocation (RMI)

RMI is the most advantageous aspect of Java for group communication systems. All of the 100%
Java GCS products that will be discussed later are based mainly on RMI. Java RMI allows
programs running on one Java Virtual Machine to make method calls on another Java Virtual
Machine; this can be within the local area network or on a system across the world. Java supplies
the transport layer that handles the encoding and the transmission and call protocols, enabling users
to make calls on other machines without having to worry about managing the various streams. With
this, RMI can pass full objects as arguments and return values, instead of allowing only predefined
data types. RMI can also move object behavior across the network stream.
        With the development of JDK 1.2, Sun has shown that it is serious about its
commitment to RMI. It has added features such as callbacks, dynamic class loading, and
object activation. Sun and IBM are also working on modifying the RMI protocol to make direct
calls to CORBA technologies via IIOP [4].
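
A minimal RMI sketch may make the mechanics above concrete. The Whiteboard interface, its implementation, the binding name "whiteboard", and the host name in the comment are all hypothetical; the java.rmi calls used (UnicastRemoteObject.exportObject, LocateRegistry.createRegistry, Registry.rebind) are part of the standard library. Port 1099 is the conventional registry port and is assumed to be free.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.ArrayList;
import java.util.List;

/** Remote interface: every remotely callable method declares RemoteException. */
interface Whiteboard extends Remote {
    void draw(String stroke) throws RemoteException;
    List<String> strokes() throws RemoteException;
}

/** Server-side implementation; the names are hypothetical. */
final class WhiteboardImpl implements Whiteboard {
    private final List<String> strokes = new ArrayList<>();

    @Override public synchronized void draw(String stroke) {
        strokes.add(stroke);
    }

    @Override public synchronized List<String> strokes() {
        return new ArrayList<>(strokes); // returned by value: the list is serialized to the caller
    }
}

final class WhiteboardServer {
    public static void main(String[] args) throws Exception {
        WhiteboardImpl impl = new WhiteboardImpl();
        // Export on an anonymous port and obtain a stub that clients can call.
        Whiteboard stub = (Whiteboard) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("whiteboard", stub);
        // A client JVM would then run something like:
        //   Registry r = LocateRegistry.getRegistry("server-host", 1099);
        //   Whiteboard wb = (Whiteboard) r.lookup("whiteboard");
        //   wb.draw("line 0,0 10,10");
    }
}
```

The client never manages a socket or stream; the stub marshals the String argument and the returned List through object serialization, which is how full objects cross the network as arguments and return values.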

3.1.6 Java Native Interface (JNI)

While RMI is 100% pure Java and makes its own type of system calls, sometimes it is useful to
make calls on applications and libraries written in another language; the JNI allows a Java
programmer to do just that. JNI acts as the glue between Java and native applications. JNI is
beneficial for GCS as it can create client systems for legacy applications. If several users are to be
working on a database on a mainframe, JNI enables a Java-based system to make direct calls to the
application that accesses the database information. This makes the addition of a Java-based GCS
into legacy systems, which a large number of companies do have, all that much easier.

3.1.7 Java Naming and Directory Interface (JNDI)

The JNDI allows Java to keep track of various object types across a system, be they computer
addresses, file locations, printers, networks, or object groups. This common format for accessing
almost anything in the system enables a GCS to allow common use of the various objects. A group
using a whiteboard could save the work to disk or print it out on one of the participants' printers.
Another use would be keeping track of the various participants of a group. The system could
periodically query each of the individuals to make sure they are still active. JNDI makes the task of
keeping track of all the objects more reasonable.

3.1.8 Common Object Request Broker Architecture (CORBA)

CORBA is a distributed framework designed to support heterogeneous architectures. CORBA
allows the connection of systems that may differ not only in hardware, but also in their operating
system and programming languages. Companies that have large mainframe computers can use
CORBA to let their users' PCs get to the information on the mainframe system.

Object Request Broker (ORB)
The ORB makes sure that remote method invocations succeed in the CORBA system. When two
systems wish to communicate with each other, they make their calls through the ORB. It will connect
the objects between the various systems [12].

Interface Definition Language (IDL)
IDL is an intermediate language used to define the interfaces that a client will use and a server will
implement. It is a way of declaring exactly what objects the system makes use of. This can be used
with languages such as C, C++, Java, Smalltalk, and Ada.

IIOP (Internet Inter-ORB Protocol)
IIOP is a TCP/IP implementation of the General Inter-ORB Protocol (GIOP). The GIOP defines
exactly how the ORBs communicate with each other, from how the byte ordering of integers is set,
to how messages are sent. This has become a standard protocol for CORBA systems.
        Currently Java must use JNI in order to interact with legacy servers. It does not have any
way to directly handle IIOP. However, Sun and IBM have announced plans to enable RMI to use
the IIOP protocol to communicate with CORBA-compliant remote objects.

3.2 Filterfresh
Filterfresh is a work in progress at New York University. It is a 100% pure Java tool “for building
replicated fault-tolerant servers.”[1] It is still in its early stages of development and continues to
expand as Sun improves the Java language. The system is heavily based upon Java RMI, as most
Java based GCS tools are, and handles all network communication itself. In doing so Filterfresh is
able to hide network errors from clients and reliably redirect object calls to functioning servers.

3.2.1 Filterfresh Membership Service

Filterfresh uses a GroupManager class to maintain a consistent group view. Groups form by an
initial member creating the group and additional members joining in. When a new member joins
the group the state of one of the current members is transferred via object serialization to the joining
member; this ensures that all members are in the same view. The initial creator of the group
becomes the group leader. The leader is responsible for keeping track of the current members of the
group and their unique ids, the current view incarnation number, and all of the message ordering
information. If at any time the group leader should lose contact with the rest of the group, the
remaining members will elect a new leader. The first member to notice that the group leader is gone
and to get the majority of the old members to join its group becomes the new group leader. If a
group member believes that it has been separated from the rest of the group, it rejoins the
group and re-requests the current view. Unfortunately Filterfresh is a primary-partition based
system, meaning that if the group splits, different groups cannot form and recombine at a later time.
All members rejoining the group join as if they are there for the first time.
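
The majority rule for choosing a new leader can be sketched as follows. The class and method names are hypothetical and do not come from the Filterfresh GroupManager class. The sketch also assumes that the candidate counts itself among its supporters and that the old view, including the lost leader, is the set against which the majority is measured.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of majority-based leader takeover; the names below are
// illustrative and do not come from the Filterfresh GroupManager class.
final class GroupView {
    final long incarnation;    // view incarnation number
    final String leader;
    final Set<String> members;

    GroupView(long incarnation, String leader, Set<String> members) {
        this.incarnation = incarnation;
        this.leader = leader;
        this.members = new HashSet<>(members);
    }
}

final class LeaderElection {
    /**
     * The candidate takes over only if it and its supporters form a strict
     * majority of the old view; otherwise no new view is installed.
     */
    static Optional<GroupView> tryTakeOver(GroupView old, String candidate, Set<String> supporters) {
        Set<String> backing = new HashSet<>(supporters);
        backing.add(candidate);
        backing.retainAll(old.members); // only members of the old view count
        if (!backing.contains(candidate) || backing.size() * 2 <= old.members.size()) {
            return Optional.empty();
        }
        return Optional.of(new GroupView(old.incarnation + 1, candidate, backing));
    }
}
```

In a five-member view, a candidate backed by one other member (two of five) cannot form a view, while a candidate backed by two others (three of five) installs the next incarnation. Because only one side of a split can hold a strict majority, at most one partition continues, which is the primary-partition behavior described above.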

3.2.2 Filterfresh Multicast Service

All intra-group communication is sent through the group leader. The member who wishes to send a
message to the entire group sends it to the leader. The leader determines if the message is from the
current view and will broadcast the message to all group members if it is. Using a blocking service
the leader will only handle new messages after it has determined that the group members have
received the current broadcast. Once this verification has been determined, the leader will unblock
its message receiving service and update the view. This type of delivery system is considered to be
ACK based as all outgoing messages must be acknowledged by the leader.

3.2.3 Filterfresh Message Ordering Service

As discussed above, all intra-group communications are sent through the group leader. Each message
carries a positive integer and the sender's unique member id, which together determine the correct
order in which messages should be received. As such all messages are processed in First In First Out
order. When a leader receives a message it will block all further incoming messages until it has
processed the current message.             This system, along with sequence numbers for all outgoing
messages, guarantees synchronicity among the group members.
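
The leader's role as a blocking sequencer can be sketched in Java. This is an illustration under stated assumptions, not the Filterfresh source: the class names are hypothetical, members are identified by strings, and the "blocking" is modeled by refusing to hand out the next broadcast until every member has acknowledged the current one.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of a sequencing leader; not taken from the Filterfresh source.
final class Sequenced {
    final long seq;
    final String payload;

    Sequenced(long seq, String payload) {
        this.seq = seq;
        this.payload = payload;
    }
}

final class SequencingLeader {
    private final long viewIncarnation;
    private final Set<String> members;
    private final Set<String> awaitingAck = new HashSet<>();
    private final Deque<String> pending = new ArrayDeque<>(); // FIFO queue of accepted payloads
    private long nextSeq = 1;

    SequencingLeader(long viewIncarnation, Set<String> members) {
        this.viewIncarnation = viewIncarnation;
        this.members = new HashSet<>(members);
    }

    /** Accepts a message only if its sender is in the current view. */
    boolean submit(long senderView, String payload) {
        if (senderView != viewIncarnation) return false;
        pending.addLast(payload);
        return true;
    }

    /** Next message to broadcast, or empty while the previous one is unacknowledged. */
    Optional<Sequenced> nextBroadcast() {
        if (!awaitingAck.isEmpty() || pending.isEmpty()) return Optional.empty();
        awaitingAck.addAll(members);
        return Optional.of(new Sequenced(nextSeq++, pending.pollFirst()));
    }

    /** Records one member's acknowledgement of the current broadcast. */
    void ack(String member) {
        awaitingAck.remove(member);
    }
}
```

Because a single leader assigns every sequence number and releases one broadcast at a time, all members see the same messages in the same order, at the cost of throughput being bounded by the slowest acknowledgement.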
        Group members attempting to send messages to the leader will continue re-sending the
message until one of two states is reached. In the first, the leader responds that it has received the
message and the send can continue as normal. The other state occurs when the leader does not
respond within a set period of time and a timeout occurs. When timeouts occur the members
perceive themselves as having been partitioned off from the group, and they will re-request the
current group view.

3.2.4 Filterfresh Example – FT Registry

The creators of Filterfresh have created a fault tolerant RMI registry server in order to demonstrate
the abilities of the system. RMI registry servers are the main area of failure for GCSs in Java. The
servers are not replicated and as such network failures can prevent RMI calls. If there are several
servers running they contain differing information, making it difficult to choose which server has the
correct method. Also if a server does crash and is resurrected on another machine, the old system
calls cannot be transferred to the new server location. The Filterfresh developers took these
problems and used their product to create the FT Registry.
        The FT registry creates groups of replicated RMI servers that are capable of handling network
splits. All method invocations are made via the FT registry, which allows server splits to be kept
hidden from the client; the client will move to a new server without knowing it is on a different
server. The FT registry runs as follows:

In a normal system call the client requests the location of the object it is attempting to call by
asking the FT registry. The FT registry looks up the server location, attempts the call, and if
successful it will connect the client and server via the transport layer.

Upon a server crash, the client can still attempt the connection through the FT registry as usual,
but the FT registry will detect that the connection cannot be completed. It will stop this error
message from ever reaching the client. This saves the client the hassle of dealing with server
crashes; it always believes that it is correctly connecting to the same server.
Once the FT registry has determined that the server is down, it will do a reverse lookup of the
reference that is being called by looking at a stale copy of the downed server. The reverse lookup
will find another copy of the object reference and then connect the client to another server that
contains the object. All of this is done without the client becoming aware of the network problems.
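The failover idea can be illustrated with a small, self-contained Java sketch. It is a simplification made for this example and not the Filterfresh API: the class FailoverRegistry and its methods are hypothetical, servers are plain strings, and the "reverse lookup" is reduced to scanning the list of replicas recorded for an object name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of registry-side failover; names are illustrative only. */
final class FailoverRegistry {
    // object name -> servers that hold a replica of that object
    private final Map<String, List<String>> replicas = new HashMap<>();
    // servers the registry has found to be unreachable
    private final Set<String> downServers = new HashSet<>();

    void bind(String objectName, String server) {
        replicas.computeIfAbsent(objectName, k -> new ArrayList<>()).add(server);
    }

    void markDown(String server) {
        downServers.add(server);
    }

    /** Returns the first live replica, so the caller never sees the failed attempts. */
    String lookup(String objectName) {
        for (String server : replicas.getOrDefault(objectName, new ArrayList<>())) {
            if (!downServers.contains(server)) {
                return server;
            }
        }
        throw new IllegalStateException("no live replica for " + objectName);
    }
}
```

In this sketch the client only ever calls lookup; whether the answer is the original server or a replacement is invisible to it, which is the property the text attributes to the FT registry.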

3.2.5 Future Plans

Filterfresh is still in its early stages of development, and many improvements are planned.
Because objects can be removed from servers and replaced with newer versions, a
FTMulticastRemoveObject class to handle such situations is in the works. The developers also plan
to allow nested invocations.

3.2.6 Filterfresh Summary

Filterfresh is still in its early stages of development, so it cannot be judged too heavily at this
point, but some observations can be made. One of the main drawbacks of Filterfresh is that it is
based on a primary-partition model. Network failures occur frequently, so designing a GCS on the
assumption that the network remains relatively stable is not realistic at this point. Groups that
do split are unable to form their own views, which leaves the potential for work to be lost during
network partitions. Overall it is a good system for systems that are primary-

3.3 Jgroup

Jgroup is a GCS product currently being worked on at the University of Bologna. It is in its very
early stages of development, and as such is severely limited in its functionality at this point. It is
100% pure Java and, like Filterfresh, is heavily based upon Java RMI. The main feature of Jgroup is
that it is based upon a partitionable system.

3.3.1 Jgroup Membership Service

New groups are formed in Jgroup by two members getting together and collaborating on a new
view. Group members communicate with each other via reliable unicast invocation semantics.
This ensures that a method invocation from a client will get at least one response from one of the
servers. All network traffic is kept hidden from the client, so errors can be handled without
client knowledge [6].
        In Jgroup, groups are able to split apart and form smaller groups. When a network partition
splits a group, the separated members attempt to form new groups based upon the last view they
correctly received. The members of each new group continue to function with one another until they
are able to re-form the old group. Once the network allows the split groups to re-form, a state
reconciliation protocol is executed in order to properly recombine the differing views [6]. This
protocol is still in its early stages of development.

3.3.2 Jgroup Multicast Service

Unfortunately, at this point in Jgroup's development the multicast service is barely written.
Members currently broadcast messages and use a “best hope” method of delivery. If members believe
that their messages are not getting to others, they can flag those members as “unreliable” and
attempt different routes to contact them. This feature is quite interesting. If member A can
communicate with member B, and B can communicate with member C, but A cannot reach C directly, then
A will communicate with C through B. This system of routing messages through other members has great
advantages for networks such as the Internet, where such irregular connectivity is common.
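The A, B, C example above amounts to finding a path through the members that can currently reach one another. The following Java sketch illustrates that idea with a breadth-first search. It is not Jgroup code; the class RelayRouter, its methods, and the use of breadth-first search are assumptions made purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: route a message through intermediate members when no direct link exists. */
final class RelayRouter {
    // member -> members it can reach directly
    private final Map<String, Set<String>> links = new HashMap<>();

    /** Records that a and b can talk to each other directly. */
    void link(String a, String b) {
        links.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        links.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Breadth-first search; returns the hop sequence from..to, or an empty list if unreachable. */
    List<String> route(String from, String to) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, from);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the parent links back to the sender, then reverse.
                List<String> path = new ArrayList<>();
                for (String n = to; !n.equals(from); n = parent.get(n)) {
                    path.add(n);
                }
                path.add(from);
                Collections.reverse(path);
                return path;
            }
            for (String next : links.getOrDefault(current, Collections.<String>emptySet())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

With links A-B and B-C recorded, a request to route from A to C yields the path A, B, C, matching the relay behaviour described in the text.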

3.3.3 Jgroup Message Ordering Service

Messages from one member to another are delivered in FIFO order. All messages are sent with an
increasing sequence number to ensure this. The problem, however, is that FIFO order is guaranteed
only among messages from the same member, not across messages from different members. As the
multicast service is still under development, the message ordering service is also in its early
stages.
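Per-sender FIFO ordering with sequence numbers can be sketched as a receiver that holds back any message arriving ahead of its turn. The Java below is a generic illustration of that technique, not Jgroup's implementation; the class name PerSenderFifo and the convention that each sender numbers its messages from 0 are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: deliver each sender's messages in sequence-number order. */
final class PerSenderFifo {
    // sender -> next sequence number that may be delivered (assumed to start at 0)
    private final Map<String, Integer> nextExpected = new HashMap<>();
    // sender -> messages that arrived early, keyed by sequence number
    private final Map<String, Map<Integer, String>> heldBack = new HashMap<>();
    private final List<String> delivered = new ArrayList<>();

    void receive(String sender, int seq, String payload) {
        int expected = nextExpected.getOrDefault(sender, 0);
        if (seq < expected) {
            return; // duplicate of a message that was already delivered
        }
        Map<Integer, String> pending = heldBack.computeIfAbsent(sender, k -> new HashMap<>());
        pending.put(seq, payload);
        // Deliver as many consecutive messages from this sender as are now available.
        while (pending.containsKey(expected)) {
            delivered.add(sender + ":" + pending.remove(expected));
            expected++;
        }
        nextExpected.put(sender, expected);
    }

    /** Messages in the order they were handed to the application. */
    List<String> delivered() {
        return delivered;
    }
}
```

Note that the counters are kept per sender, so nothing constrains how messages from different senders interleave. That is exactly the limitation the text points out.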

3.3.4 Jgroup Future Plans

As Jgroup is still in its early stages, many additions are planned for the future. The first addition
will be the completion of the multicast system, a prime requirement for this to be any type of GCS.
In addition, plans are in the works to improve the low-level communication protocol through
IP multicast. Also some more work needs to be done on the recombination protocol for merging

3.3.5 Jgroup Summary

Unfortunately it is too early to form any real judgement on Jgroup. Being a partitionable
system is a step in the right direction, but at this stage it cannot be determined whether
recombination is occurring correctly.

3.4 Filterfresh vs Jgroup
Filterfresh and Jgroup are very similar systems, mainly due to their common basis on RMI.
Setting aside their commonality and Jgroup's infant stage, Jgroup has a few advantages over
Filterfresh. Chiefly, Jgroup is partitionable and Filterfresh is not; for a GCS, this is a big deal. If
Filterfresh is to be a contender in the GCS marketplace, it will need to add this ability.
Filterfresh also causes heavy system traffic whenever a call fails, as it must keep making
calls on other servers until a working server is found. The same load occurs on a server bind, since
every bind must be replicated to all of the servers. In a large system such replication would flood the

3.5 Summary
Java has the potential to revolutionize the GCS industry, since the benefits of the Java language
carry over readily to GCS. The main problem, however, is the immaturity of the language itself. Java
is still slow, though the recent release of its HotSpot performance engine could remedy this. Another
large factor undermining Java development is its lack of a standard. Recently Sun was denied by ISO
in its attempt to create a Java standard. Hopefully Sun can resolve the problems and a standard will
emerge. Java has great potential for the future, and that potential can only benefit GCS.

4. Java Shared Data Toolkit and Group Communication

The Java Shared Data Toolkit (JSDT), developed at Sun Microsystems, is described as, "a toolkit
defined to support highly interactive, collaborative applications written in Java" [2]. The JSDT,
formerly known as JSDA and "JavaShare," is a Java API that was designed for the development of
collaborative applications involving a shared data environment. It is this shared data environment
that is the common interest among nodes and therefore can define the group of nodes. It is the group
that is the focus of GCS. The primary purpose of communication among the group members is to
maintain and process this shared data. The set of tools within the JSDT provides services that enable
this. With these tools, the JSDT can enable processes on different nodes of a distributed
environment to operate collectively as a group, in a similar way to most of the group
communication tools described.
        In this section, the JSDT is described in terms of its ability to provide group
communication services, as defined in the previous sections. The role that each of the primary
components of the JSDT plays in GCS is depicted as well. Also, the JSDT is examined in relation to
the existing GCS tools. The following questions are examined: What does the JSDT provide that the
others do not and vice versa? What components of GCS are not supported in the JSDT? The JSDT's
ability to provide security in a distributed environment is also discussed. Finally, the formal
discussion of the JSDT concludes with an overview of its strengths and weaknesses and its future
with group communication.

4.1 JSDT and GCS Requirements
Even though Sun does not specifically describe the JSDT as providing group communication
services, the toolkit undoubtedly supports many of the requirements of GCS. Sun does, however,
describe the JSDT as having the ability to support "full-duplex multipoint communication among an
arbitrary number of connected application entities -- all over a variety of different types of
networks" [2]. Specifically, the JSDT provides several network implementation options, each of
which uses a different transport protocol. The socket option uses TCP/IP sockets, the lrmp option
uses the Lightweight Reliable Multicast Package, and the rmi option uses remote method invocation
calls through Java's RMI capabilities. Additionally, the latest version to date of the JSDT, version
1.5, provides an option for the Hypertext Transfer Protocol (HTTP) as another implementation [11].
        The JSDT claims 100% Pure Java compliance, so it uses the benefits of the Java language to
its advantage. Java's platform independence provides the user of the JSDT with the ability to
develop distributed, group-oriented applications without having to worry about the specific
platforms on which the application might eventually run. This allows group communication to exist
within what is potentially a heterogeneous environment. Java's object serialization abilities are also
exploited by the JSDT. Its primary means of data and message transfer is through object
serialization. As long as a Java object is completely serializable, it can be sent to and from any
communicating entity using the JSDT. The JSDT also relies upon Java's event model to keep track
of new messages and data as they arrive. The event model is also used to monitor communicating
entities joining or leaving a group. To facilitate the handling of errors within a distributed group
environment, the JSDT also makes use of Java's exception handling capabilities, which allow
fault-tolerant approaches to GCS to be implemented. Java, to an extent, seems to naturally satisfy
some of the requirements of GCS, while the JSDT captures these and implements them in
an easy-to-use set of tools.
        Although touted as a "collaborative" toolkit, the JSDT does provide the primary group
communication services to an extent: the membership, multicast, and message ordering services. As
described in the next sections, each of these services is provided by one or more of these tools
working together. In the section that follows, these and the other primary JSDT classes are

4.1.1 JSDT Methodology

The JSDT was originally designed as a set of tools for developing highly collaborative applications.
It is for this reason that the terminology used in the JSDT does not directly reflect the GCS terms
and concepts used throughout many of the other GCS tools described in this paper. The tools
described here are not the full complement of tools available in the JSDT; they are the significant
tools related to the JSDT's GCS abilities.

       Client: A Client represents a single node participating in a group of communicating nodes.
        The transfer of data in Client communication can be point-to-point or multipoint. A Client
        can be the sender or receiver of this communication. The JSDT provides a means for
        authentication at the Client level. Also, each Client can be a member of any number of
        groups (Sessions).

       Session: The Session component is the JSDT's abstraction of a group. It consists of a set of
        related nodes (Clients) that exchange data through defined communications paths. The
        Session object maintains the state of the group of Clients and their communication paths.

       Registry: The Registry is used to maintain Session data, keeping it available to Client
        applications running on a particular networked computer. The Registry maintains a transient
        database that maps names to JSDT Client and Session objects.

       Channel: The JSDT Channel component represents the potentially multi-party
        communication path between clients participating in a particular Session. The Channel
        determines the reliability of the communications path and orderedness of the data messages
        through it.

       Data: Data is the JSDT's representation of the information sent between Clients over a
        Channel. An instance of the Data class can contain any completely serializable Java object.

       ByteArray: A JSDT ByteArray is similar to the Data component except that it is
        permanently, globally available to all Clients within a particular Session. Unlike a Data
        object, a ByteArray is not explicitly sent over a Channel. Instead it is written to by a
        communicating node and updated globally as it is modified. Clients obtain a reference to a
        ByteArray and use an EventListener for notification of updates to the ByteArray.

       Token: The JSDT Token mechanism allows for the synchronization of data within the
        group environment. The use of Tokens provides a means for exclusive access to a particular
        shared object. However, it can be used to create any number of data synchronization

       Managers: JSDT Managers are used to implement a level of security for JSDT objects. A
        Manager associates a particular "management policy" with a particular secure resource. It is
        the Manager object that can authenticate a Client requesting access to a particular resource
        or object [2]. This process is described in more detail in section 4.3.

4.1.2 JSDT and GCS requirements

Together, these tools fulfill the primary services of group communication. Although they do not
encompass every group communication service available, they do reflect a reasonable subset. It is
important to keep in mind here that the JSDT was not designed with the same goals as traditional
GCS systems. A direct mapping between the fault-tolerant GCS systems and the JSDT does not
appear to be possible. Therefore, this section describes reasonable parallels between the JSDT tools
and standard GCS requirements. There is no implication here that the JSDT is a traditional GCS
system. It is only suggested that it could be used to provide some similar functionality.

JSDT Membership Service
Arguably, the most important part of GCS is the membership service. In the JSDT, the Session class
defines a group of communicating Client members. However, it is the Channels interconnecting
these Clients within a Session that enable each Client to know which other Clients it is able to
communicate with at a particular time. Hence, the JSDT's Channel provides part of the view-like
component of GCS.
        The JSDT is designed to support the primary partition model. Observed implementations
using the JSDT typically use some sort of central server to store data and maintain JSDT objects.
However, it is reasonable that additional partitions could be created by replicating the server. Then
using some of the traditional GCS approaches, the replicated data could be maintained. A partition
or membership group could be defined at varying levels within the JSDT. The shared data itself, the
ByteArray, provides an intuitive point at which to replicate data. Although this shared data is
available to all members of a Session, only Clients that have subscribed to the ByteArray will
receive updated data. Thus partitions can be defined by these Clients registered to receive updates
of the ByteArray. Since a Data object is sent to a particular Client or set of Clients and is not
available globally, it does not naturally define a partition. However, given that it is the Channel that
carries the Data and provides many of the GCS capabilities, partitions defined at the Channel level
would be reasonable. Although the JSDT does not provide any built-in means to solve the state
transfer problem nor any of the other fault-tolerant issues mentioned in section 2.3, given the tools
available in the JSDT, a solution should be feasible.
        Any Client can exist without a Session; however, it must join a Session to be a part of any
group communication and to share data. Additionally, a Client must send Data over a Channel to
communicate it to other Clients. If the GCS concept of view can be mapped to the JSDT Session or
Channel, then these conditions meet the GCS valid execution requirement. This states that a
process, or in this case a Client, is in a particular view where it executes events of that view.
Additionally, the Channel provides the self-inclusion requirement of the membership service; any
communication sent by a Client over a Channel can be received by all Clients consuming that
Channel, including the originating Client.
        Many of the membership service requirements are not explicitly defined in the JSDT. This is
mostly due to the fact that the rest are requirements based on fault-tolerant approaches to GCS,
whereas the JSDT is focused on the collaborative aspects of GCS. However, again using the JSDT
components, it is believed that GCS applications utilizing these fault-tolerant approaches can be
developed.

JSDT Multicast Service
The Channel component is responsible for the JSDT's multicast services. It is the Channel that
defines the multi-party communication paths within a particular group, such that any Data object
sent on a particular Channel is received by any Client that has registered an interest in receiving that
Data. Most of the multicast service requirements as outlined in section 2.4.2 of this paper and [7]
are also provided in the JSDT through the use of the Channel.
        Delivery integrity is maintained in that messages can only explicitly be sent by a Client;
spontaneous generation of messages is not possible in the JSDT. Additionally, the JSDT does not
duplicate data sent over a Channel, fulfilling the no duplication requirement. Since the underlying
implementation of the JSDT is not directly available, it is not known whether the JSDT explicitly
provides the termination of delivery requirement. In practice however, on a Channel that has been
initialized to send ordered messages, each message is eventually delivered. Same view delivery is
also available to a limited extent. Since the Channel represents the multi-party communications
path and "view" simultaneously, every message sent is delivered within the view from which it
originated. The self-delivery requirement is also made possible using the JSDT tools. Any Data sent
over a Channel can optionally be sent to the originator of the Data. The JSDT does not explicitly
provide strong virtual synchrony or synchronous delivery; although probably not trivial, these could
be implemented using the JSDT tools. The implementation of such a system using the JSDT is left
as a future extension to the work presented here.

JSDT Message Ordering Service
Again, the JSDT's Channel defines the message ordering services. A Channel is instantiated with an
option to order the data sent over it or not. The Channel, therefore, can guarantee that the data is
received in the same order that it was sent, thus fulfilling the FIFO delivery requirement of a
message ordering service. In addition, it is the Channel that can determine the reliability of the
communications path. This is also an option to the user of the tool. In this way a particular
communications path, or Channel, can be defined to be unreliable, avoiding the extra overhead
needed for a reliable path. The remaining message ordering service requirements again are based on
fault-tolerant approaches to GCS; although important aspects of GCS, these are not explicitly
provided by the JSDT.
        Together, these tools function to enable group communication and collaboration at the
application level. With this set of tools as building blocks, even more elaborate group
communication services can be created.
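The Channel behaviour discussed in this section (registered consumers, delivery in sending order, and optional delivery back to the originator) can be mimicked in a few lines of plain Java. The sketch below is an in-memory model written for illustration only. It does not use or reproduce the JSDT API, and the class SketchChannel and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a multicast channel; not the JSDT Channel class. */
final class SketchChannel {
    // consumer name -> messages delivered to that consumer, in delivery order
    private final Map<String, List<String>> inboxes = new LinkedHashMap<>();

    /** Registers a client's interest in data sent over this channel. */
    void addConsumer(String client) {
        inboxes.putIfAbsent(client, new ArrayList<>());
    }

    /**
     * Delivers data to every registered consumer. When includeSender is false the
     * originator is skipped; when true it also receives its own message (self-delivery).
     */
    void send(String sender, String data, boolean includeSender) {
        for (Map.Entry<String, List<String>> entry : inboxes.entrySet()) {
            if (!includeSender && entry.getKey().equals(sender)) {
                continue;
            }
            entry.getValue().add(sender + ":" + data);
        }
    }

    /** Messages received so far by the given client, or null if it never registered. */
    List<String> inbox(String client) {
        return inboxes.get(client);
    }
}
```

Because every send appends to each inbox in a single pass, all consumers in this model observe messages in the same order. A real distributed channel has to work much harder to give that guarantee.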

4.2 Comparison of JSDT to existing GCS

Most of the GCS tools described thus far employ a fault-tolerant approach to group communication.
To some degree, each of them addresses virtual synchrony, the state transfer problem, uniquely
identifiable messages, and other fault-tolerant issues. While it is quite conceivable that the JSDT
could be used to handle many of these issues, the JSDT does not address them directly. In this
section, the JSDT is evaluated with respect to some of these GCS tools already mentioned.

4.2.1 JSDT vs. Transis

Transis, being a transport-level GCS, operates at a lower level than the JSDT, which exists more at
the application level. The JSDT is intended for integration with a Java application or applet to
enable collaborative functionality in what may have been a single-user program. Transis on the
other hand is layered below the application to provide its GCS capabilities. While the JSDT can be
used on the several platforms that support the Java Virtual Machine, Transis is restricted to the
Unix environment, thus limiting the final destination of a GCS application employing Transis.
Written in the C programming language, Transis takes advantage of the underlying system and
network interfaces [3]. Although not explicitly tested, it would be reasonable to suggest that Transis
provides better communication performance than that of the JSDT. Not only does any program
employing the JSDT for group communication have to contend with the JVM layer for
communication, it is also restricted to the transport protocols with which the JSDT tools can
interface. As of version 1.4 of the JSDT, these are restricted to sockets (TCP/IP sockets), LRMP
(Lightweight Reliable Multicast Package), or RMI (remote method invocation). Version 1.5, which is
not in full production at the time of this writing, provides an additional HTTP (Hypertext Transfer
Protocol) option [11]. The JSDT is designed to keep the particular implementation interface
invisible to the user and, as mentioned previously, provides the means to interface with arbitrary
proprietary protocols as well as those based on standard networking interfaces [2]. Most of these
differences are due to the fact that the primary focus of Transis is to provide a partitionable
membership service, providing more fault-tolerance, while the main focus of the JSDT is to provide
a flexible, collaborative computing environment.
        The Transis approach to GCS provides a much more fault-tolerant and higher-performance
set of tools than does the JSDT. However, the JSDT provides much more flexibility in its use than
does Transis. If fault-tolerance and high performance are high priorities in designing a group
communication system, then Transis might be the better tool of the two. However, if collaboration,
ease of use, and flexibility are more important to the application, then the JSDT seems to be the
much better choice.

4.2.2 JSDT vs. Totem

The Totem system is also designed to provide fault-tolerant, real-time group communication
services [7]. However, Totem's multicasting abilities are similar to those of the JSDT. As
mentioned previously, the JSDT's Channel provides a flexible means to implement multicast
messages. Similarly, the token-based messaging system of Totem allows for a reliable means to
multicast information. Totem also uses a best-effort policy for its multicast service, using the UDP
protocol. The JSDT also provides such a mechanism, but only with the TCP/IP socket implementation
option. Again, the primary focus of the tools is different. Totem aims to provide fault-tolerant,
real-time performance to group communication while the JSDT's primary focus is group collaboration.
As Totem is somewhat similar to Transis, many of the differences previously mentioned between
Transis and the JSDT hold between Totem and the JSDT as well. Specifically, Totem is written in
the C programming language and is intended for use in the Unix environment, limiting its
extensibility. Additionally, Totem is restricted to group communication over a local area network
[7]. In contrast, the JSDT has the ability to provide communications between machines an arbitrary
distance apart. The JSDT provides this ability through its use of the Uniform Resource Locator (URL)
addressing mechanism. Generally, Totem provides performance and fault-tolerance beyond that of
the JSDT. Again however, the JSDT's flexibility and ability for use in modern applications
surpasses that of Totem.

4.2.3 JSDT vs. Horus

Horus is probably the most extensible of the early GCS systems that are mentioned. Like Transis
and Totem, Horus is also limited to the Unix environment and other environments that can support
the Horus APIs. It is also programmed using the C and ML programming languages, whereas the
JSDT is programmed completely in Java. Thus, Horus provides a lesser degree of platform
independence than the JSDT. This is not a very important factor when considering fault tolerant
approaches to GCS. However, when considering the practicality of use of a distributed GCS
system, it would be desirable to overlook the specifics of the potential platforms where each node
will reside.
         Although the JSDT provides a great deal of flexibility and modularity for the design and
implementation of group communication applications, it does not provide the modularity of Horus'
"Lego block" approach [9]. Since the Horus blocks are used at such a low level, a much greater
control of the underlying network interfaces can be realized. As previously mentioned, the JSDT
provides several network implementation options: RMI, sockets, LRMP, and HTTP [2][11].
However, these do not directly provide the same types of GCS services that the Horus blocks
provide. Additionally, within the JSDT, these implementations are not configurable to the extent of
Horus' modular approach.
         Like the JSDT, Horus also encapsulates some of its services in particular objects. This
allows for an indirect mapping, although not a perfect one, between some of the JSDT objects and
the Horus objects. Specifically, Horus endpoints are similar to the JSDT Client. The Horus endpoint
represents the communicating entity and is not restricted to human-to-human communications. The
JSDT Client also functions as the communicating entity. Additionally, any part of the
communication system can inherit from the Client class providing the Client communication
functionality and becoming a node on the communications network. The Horus group object is
conceptually similar to the JSDT Session and Channel in that it defines a particular group of
communicating nodes. However, functionally the Horus group object is defined in fault-tolerance
terms, maintaining the state of the endpoints and their views. Lastly, Horus messages are analogous
to the JSDT's Data object in that they both encapsulate data to be sent between communicating
         Horus appears to provide more flexibility and control than the JSDT in its abilities to
provide fault-tolerant group communication services, despite the minor similarities. Again, the
JSDT provides a more practical solution than does Horus in its ability to add collaborative
functionality over a heterogeneous network to existing applications. An implementation of a JSDT
application using Horus' modular approach might come closer to true GCS functionality.
However, because the JSDT operates far above the network protocol layer, the practicality of such
an approach might not be reasonable.

4.2.4 JSDT vs. JGroup and Filterfresh

As the JGroup and Filterfresh tools are also 100% pure Java, a better mapping to the JSDT might be
expected. However, both of these are also developed based on fault-tolerant approaches to GCS,
leaving a direct comparison difficult. The fact that all of these tools are developed using Java
suggests that more fault-tolerant approaches to GCS can be realized using the JSDT.
        Filterfresh uses what it calls a "Group Manager" to maintain the information about the group
[1]. This functionality is different from that of the JSDT Managers; however, it is analogous to the
functionality of the JSDT's Session, in that both maintain which members belong to the group.
However, the Filterfresh Group Manager always maintains a group leader, whereas this type of
functionality is not inherent in the JSDT. Both the JSDT and Filterfresh maintain a Java RMI-like
registry providing lookup and naming services. Additionally, Filterfresh is one of the few GCS tools
mentioned here that use the primary partition model, as does the JSDT. However, Filterfresh
implements a specifically fault-tolerant version of the registry; this is not the case with the JSDT
registry. Also, Filterfresh uses a view approach to overcome network failures; as previously
mentioned, the JSDT does not directly support the view requirement used in many GCS tools.
        The primary similarity between JGroup and the JSDT is their use of Java RMI as an
underlying interface to remote objects. Although RMI is an optional implementation in the JSDT, in
both tools it provides a useful means for transparent inter-node communication over a network.
JGroup also directly utilizes the view model of GCS's for fault-tolerance [4]. The JSDT claims that
messages sent over a Channel are ordered globally. JGroup on the other hand only claims message
orderedness with respect to the individual sender. Although JGroup is still in development, it does
provide the ability to merge group members after a network partition, unlike the JSDT.
        More knowledge of the underlying mechanisms of these tools is necessary for a more
in-depth comparison. As with the other GCS approaches, Filterfresh and JGroup seem to be more
closely tied to the traditional GCS. Their abilities in this respect obviously surpass those provided
by the JSDT. However, the ability of Filterfresh and JGroup to handle network partitions shows that a
Java-based approach to GCS is possible. Although the JSDT does not explicitly provide this
type of approach, one could be developed similarly to Filterfresh and JGroup.
        The JSDT indeed provides a reasonable set of GCS capabilities. In comparison with the
other GCS systems, the JSDT falls short in providing fault-tolerant group communication. It is
important to reiterate, however, that the flexibility and extensibility of the JSDT and Java could
provide these types of services to some degree. Additionally, it is conceivable that the JSDT could
be used in conjunction with some explicitly fault-tolerant underlying network service. The JSDT
provides hooks where custom network protocols can be used to support the JSDT, in much the same

863cf282-7897-4b0c-a76e-de57aae16e23.doc       Page 40 of 62                                  05/04/99
way that the JSDT now uses RMI, sockets, LRMP, and HTTP [2][11]. The JSDT's capacity for
extension and configuration makes it a more practical and flexible tool for group communication.

4.3 JSDT and Security
As previously mentioned, the JSDT Manager classes provide a level of security for the distributed
JSDT objects. Specifically, the JSDT provides managers for those classes that extend the
Manageable class. These include Sessions, Channels, ByteArrays, and Tokens, which are managed
by the SessionManager, ChannelManager, ByteArrayManager, and TokenManager respectively.
Object management in the JSDT is optional for each instance of these types of objects. However, it
can be useful for authenticating Clients' requests for the objects.

4.3.1 Authentication

        Client authentication occurs when a Client attempts to join a managed Session, ByteArray,
Channel, or Token, or when a Client attempts to create or destroy a ByteArray, Channel, or Token
that is in a managed Session [2]. The authentication interaction between the Client, the object
manager, and the managed object is shown in Figure 4.1. In this example, the authentication
process begins with the Client's attempt to join the managed object (1). Next, the manager of this
object sends an "authentication request" to the Client. Within this request is a challenge object (2),
to which the Client replies with a response object (3). The manager then validates the Client's
response (4) and either admits the Client or rejects its request (5). Both the Client and the manager
must agree a priori on a particular authentication policy. The Client must be able to process the
manager's challenge and reply with the appropriate response.
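        The challenge-response exchange described above can be illustrated with a minimal Java
sketch. It does not use the JSDT API: SketchManager, SketchClient, and AuthSketch are hypothetical
stand-ins, and the policy chosen here (a SHA-256 digest over a random challenge plus a shared
secret) is only one assumed example of a policy that both sides could agree on a priori.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical stand-ins for illustration only; these are NOT JSDT classes.
class SketchManager {
    private final String sharedSecret;              // policy agreed on a priori
    private final SecureRandom random = new SecureRandom();
    private byte[] pendingChallenge;                // challenge awaiting a response

    SketchManager(String sharedSecret) {
        this.sharedSecret = sharedSecret;
    }

    // Step (2): issue a fresh random challenge to the joining client.
    byte[] challenge() {
        pendingChallenge = new byte[16];
        random.nextBytes(pendingChallenge);
        return pendingChallenge.clone();
    }

    // Steps (4) and (5): validate the response, then admit or reject.
    boolean validate(byte[] response) {
        if (pendingChallenge == null) {
            return false;                           // no challenge outstanding
        }
        byte[] expected = AuthSketch.digest(pendingChallenge, sharedSecret);
        pendingChallenge = null;                    // each challenge is single use
        return MessageDigest.isEqual(expected, response);
    }
}

class SketchClient {
    private final String secret;

    SketchClient(String secret) {
        this.secret = secret;
    }

    // Step (3): compute the response object for the manager's challenge.
    byte[] respond(byte[] challenge) {
        return AuthSketch.digest(challenge, secret);
    }
}

class AuthSketch {
    // Assumed policy: SHA-256 over the challenge bytes followed by the secret.
    static byte[] digest(byte[] challenge, String secret) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(challenge);
            md.update(secret.getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A client holding the agreed secret is admitted, while a client with a different secret is
rejected, because its response does not match the digest the manager expects.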
        Security in distributed applications is necessary. In maintaining shared resources over a
network, it is critical that they be protected from processes that should not have access to them.
Additionally, it is important that each member of a group of communicating entities working
together is authenticated to be who they purport to be. Besides Horus, the JSDT is the only one of
the group communication systems mentioned that explicitly provides security at some level.
What is significant about the JSDT security mechanism is that it can be extended to provide
virtually any type of authentication service that can be implemented using Java. This type of service
is increasingly important in a faceless, online environment.

[Figure 4.1 diagram; participants: Client, Manager, Managed JSDT object; messages: join,
authentication request, validate response, authentication complete]

                 Figure 4.1 – Authentication example: Joining a managed JSDT object

4.4 JSDT and GCS Concluding Remarks
The Java Shared Data Toolkit provides a useful set of tools to support the development of
collaborative, group applications. In this paper, the JSDT is shown to have the tools necessary to
enable a useful subset of the GCS requirements. In comparison to the available GCS tools, the
JSDT is lacking in its ability to support fault-tolerance in group communication. It could be argued
that its Java-centric approach to group communication prevents the JSDT from handling network
partitions, progression of views, and other fault-tolerant GCS requirements. However, seeing that
GCS tools such as JGroup and Filterfresh are using a 100% Pure Java approach to true GCS, such
an implementation using the JSDT seems feasible. JSDT's built-in security and extensibility for a
distributed environment make it a good candidate for the development of group communication.
        "Collaboration" seems to be one of the many hot topics in computing recently. This notion
has fostered the development of the JSDT, hence its focus as a tool providing collaboration rather
than a tool to provide group communication services. As collaborative applications emerge, it will
be necessary to have fault-tolerant implementations if collaboration is to occur on a large scale. This
is especially true as network and system failures inevitably occur on a regular basis. If the fault-
tolerant GCS requirements are incorporated into the JSDT at some level, a practical, flexible GCS
tool could emerge.
        The JSDT is not a perfect technology. As with many of the other tools mentioned here, it is
still in its early stages and it is improving with newer versions. One of the original designers of the
JSDT and author of its Users' Guide, Rich Burridge, admits that there is room to grow with the
JSDT's fault-tolerance capabilities. He suggests Java technologies such as JNDI and Jini might
facilitate this in future versions of the toolkit.

5. An Application Proposal: Collaborative CRCTool Using JSDT

The final component of this paper is the CRCTool application proposal. The CRCTool will be designed as
a living example of a Group Communication Service (GCS). Many of the aspects of these GCSs, as
detailed earlier in the paper, will be put into practice in this design. The design will cover the formal
specifications of the Collaborative CRCTool and show how that application could be mapped into
implementation using the JSDT.
        We wish to have a collaborative tool that will allow for the real-time, multi-user editing of
CRC cards. CRC cards are a popular beginning design step in object-oriented design. The tool will
allow these cards to be 'maintained' by a certain group of people. The group will be able to
collaboratively add, remove, and modify various cards within a project.

5.1 System Overview
The overall structure of the tool will be as follows: At the highest level, we have a project, termed
CRCProject. Each CRCProject will contain a variable number of cards each termed CRCCard. Each
card contains local card information as well as references to other cards in the project as dictated
under the CRC design model.
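        As a rough illustration of this structure, the project and card could be modelled as two
small Java classes. This is only a sketch under stated assumptions: the field and method names are
hypothetical, and the choice of a class name, responsibilities, and collaborators as card contents
simply follows the usual Class-Responsibility-Collaborator layout rather than the CRCTool source.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: field and method names are assumptions,
// not taken from the original CRCTool source.
class CRCCard implements Serializable {
    private static final long serialVersionUID = 1L;

    final String className;                                   // local card information
    final List<String> responsibilities = new ArrayList<>();  // local card information
    final List<String> collaborators = new ArrayList<>();     // names of other cards in the project

    CRCCard(String className) {
        this.className = className;
    }
}

class CRCProject implements Serializable {
    private static final long serialVersionUID = 1L;

    final String projectName;                                 // project-level information
    private final List<CRCCard> cards = new ArrayList<>();    // variable number of cards

    CRCProject(String projectName) {
        this.projectName = projectName;
    }

    void addCard(CRCCard card) {
        cards.add(card);
    }

    // Resolve a collaborator reference (a card name) to the card itself, or null if absent.
    CRCCard findCard(String className) {
        for (CRCCard card : cards) {
            if (card.className.equals(className)) {
                return card;
            }
        }
        return null;
    }

    int cardCount() {
        return cards.size();
    }
}
```

Storing collaborators as card names rather than object references keeps each card independent
of the others, which is convenient if cards are later serialized and shared one at a time.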

[Figure 5.1 diagram; labels: Project Info, CRC Card Specific]

                            Figure 5.1 – The crcProject to crcCard relationship.

The system will be based on a client-server architecture and will consequently have two major
components: the CRCServer and the CRCClient.

5.2 CRCServer Overview
The CRC server will be responsible for maintaining the CRC projects stored on its local disk. When
a client connects, the user will be required to authenticate with an id and a password. The
server will maintain a list of valid users and their access permissions. The access permissions will
determine which projects an id has access to and whether or not it may create new projects.
        Once authenticated, a user will be provided with a list of projects that they may open as well
as an option to open a new project if allowed. At this point, they may choose to open a project in
read-only mode or in edit mode. Listed next to each project will be its status. This
status will indicate whether the project is currently open and, if so, by whom. In
this way, the entering user will be able to see who is currently working on what project.
        The server will provide a simple GUI configuration tool to allow the server administrator to
add users, change passwords, and administer permissions.
        The server will maintain the CRCProject data files on local disk and will also be able to send
a copy of the data files to the user at their request.
        The server will be able to support multiple multi-user sessions at the same time. In this way,
users A, B, and C can be working on Project-1 while users C, D, E, and F are working on Project-

Following are the requirements/specifications for the CRCServer:

 SR1.0        The server should maintain a list of all allowed users, their passwords and their access
              permissions – This file should be stored on local disk
 SR2.0        The server will be responsible for authenticating users attempting to connect against
              the list in SR1.0
 SR3.0        The server should allow for at least n people to work on a given CRCProject at a time
 SR4.0        The server should allow for at least n CRCProjects to be worked on at a time
 SR5.0        The server should only allow 1 person to edit a given set of CRCCard data or
              CRCProject data at a given time – all other users attempting to edit should be forced
              in read-only mode
 SR6.0        The server will be responsible for providing a filtered list of available projects to the
              user
 SR7.0        The server will be responsible for providing the user with the means to create a new
              project and set its user access permissions
 SR8.0        The server should have the capability to upload projects to a user
 SR9.0        The server should have the capability to download projects to local disk from a user
 SR10.0       The server should provide a simple GUI for administration
                                  Table 5.1 – The crcServer requirements.
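
        Requirements SR1.0, SR2.0, SR6.0, and SR7.0 can be pictured with a small in-memory
sketch. The class names CrcUserEntry and CrcUserList and their methods are hypothetical, the
passwords are kept in plain text only to keep the example short, and persistence to local disk
(SR1.0) is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the server-side user list; all names are illustrative.
class CrcUserEntry {
    final String id;
    final String password;            // plain text only to keep the sketch short
    final Set<String> projects;       // projects this id may open
    final boolean mayCreateProjects;  // permission to create new projects

    CrcUserEntry(String id, String password, Set<String> projects, boolean mayCreateProjects) {
        this.id = id;
        this.password = password;
        this.projects = projects;
        this.mayCreateProjects = mayCreateProjects;
    }
}

class CrcUserList {
    private final Map<String, CrcUserEntry> users = new HashMap<>();

    // SR1.0: maintain the list of allowed users and their permissions.
    void add(CrcUserEntry entry) {
        users.put(entry.id, entry);
    }

    // SR2.0: authenticate a connecting user against the list.
    boolean authenticate(String id, String password) {
        CrcUserEntry entry = users.get(id);
        return entry != null && entry.password.equals(password);
    }

    // SR6.0: return only those projects the given id is allowed to see.
    List<String> visibleProjects(String id, List<String> allProjects) {
        List<String> visible = new ArrayList<>();
        CrcUserEntry entry = users.get(id);
        if (entry == null) {
            return visible;           // unknown users see nothing
        }
        for (String project : allProjects) {
            if (entry.projects.contains(project)) {
                visible.add(project);
            }
        }
        return visible;
    }

    // SR7.0: only permitted users may create new projects.
    boolean mayCreateProjects(String id) {
        CrcUserEntry entry = users.get(id);
        return entry != null && entry.mayCreateProjects;
    }
}
```

A real server would store salted password hashes rather than plain text and would reload the
list from disk at startup.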

5.3 CRCClient Overview
The CRCClient will be responsible for providing a GUI for a user to connect to and use the
CRCServer. Upon loading, the CRCClient will ask the user to select a server from which to work.
They may choose to work locally on their own, or from a server. If local mode is chosen, they are
given the option to open a local project or to create a new one. If server mode is selected, the user
must authenticate themselves to the server with their id and password. The projects they have access
to will then be shown to them as well as the option for them to create a new project if allowed.
         If a new project is created, the creator will be able to choose the users that will have access
to the project from the server user-list. The creator will be able to choose the filename of the project
and the project name.

Note: The following assumes that the user is connected to a server and is therefore working in
collaborative mode. If working in local mode, the collaborative capabilities will simply be disabled.

         Once a user is connected and has selected a project, the main GUI will be displayed. This
will contain various sections and a main menu/toolbar (CRCMainMenu). The CRCProjectView
section will show the open projects in an 'explorer-tree' format. The main section, the
CRCDataView, will show the currently open CRCCard data or CRCProject data. The
CRCUserView box will show the currently connected users. Finally, the CRCChatView window
will provide for user-to-user interaction.
        Below is a screen shot from the original single-user CRCTool (Figure 5.2). The
Collaborative CRCTool would have a similar look with a chat view in a pane at the bottom, a user
list pane to the right, and slightly different menu-bar and toolbar options.

                                      Figure 5.2 – Single user CRCTool

5.3.1 CRCClient Component Descriptions

CRCProjectView
This view will display the currently open CRCProjects in an 'explorer-like' tree view. At the base
level will be the actual CRCProject names. When these projects are expanded, the CRCCard names
that they contain will be displayed. This provides for more than one CRCProject to be open and for
all of the contained, or a subset of the contained CRCCards to be visible. The user should be able to
cut and paste CRCCards between projects using this window. The user should also be able to
rename cards and projects in this window.

        Double clicking on a card or a project will bring that card or project into the CRCDataView
for editing. The user should also be able to single-click on the card or project name to bring up the
data in read-only mode. Right-click menus for editing and renaming functions should also be
provided.

CRCDataView
This view provides the main editing functionality of the CRCTool. When a card or project is
loaded, its information is loaded into this window. The information can be loaded in either edit or
read-only mode. Only one user can have a CRCCard or CRCProject open in edit mode at one time.
When any card or project is loaded in edit mode, the user id of the editing user is displayed along
with the card data. In this way, if two users try to edit a card at the same time, the first user will be
given edit access and the second user will be given read-only access with the name of the editing
party displayed along with the data.

CRCUserView
This view will be a simple window showing the names of the currently connected users working on
the selected project. This box will also show the mode {edit, read-only} in which each user is connected.
This is merely an informational window and has no direct functionality.

CRCChatView
This view will be a simple window allowing all connected users working on a selected project to
have a simple discussion. There will be a display window and a simple 'send' text box. Anything
typed in the text box will be sent to all of the connected users working on the currently selected
project.

CRCMainMenu
The functionality of the main menu will be mimicked in a graphical toolbar. The menu will allow
the user to do the following:
   File
            Close a project               - close a project
            Open a project                - open a project
            Create a new CRCCard          - create a new CRCCard
            Delete a CRCCard              - delete a CRCCard
            Edit a CRCCard                - open a CRCCard in edit mode
   Edit
            Copy                          - copy the currently highlighted data
            Paste                         - paste from the clipboard (if feasible)
   Tools
            Download CRCProject           - download a project copy from the server
            Upload CRCProject             - upload a local project to the server
            Disconnect from Server        - enter local mode
            Connect to Server             - enter server mode
            Export to HTML                - export the CRCProject to HTML
   Help
            Help with CRCTool             - general help
            About CRCTool                 - about CRCTool

Following are the requirements/specifications for the CRCClient:

 CR1.0        The client should provide the GUI for server selection and client authentication
 CR2.0        The client should be able to operate in both local and remote (server) modes
 CR3.0        The client should be able to switch between modes during a single run
 CR4.0        The client should be able to upload or download projects from the server
 CR5.0        The client should provide the GUI to set user permissions and names on a
              new or uploaded project
 CR6.0        The client should provide a tree-like view of the open projects (CRCProjectView)
 CR7.0        The client should provide for an edit/read-only view of CRCProject and CRCCard
              data (CRCDataView)
 CR8.0        The client should provide for a read-only view of the users working on a currently
              selected project (CRCUserView)
 CR9.0        The client should provide for a real-time text communications view for user
              interaction (CRCChatView)
 CR10.0       The client should detect and notify the user upon loss of connection to the server
              and/or other users.
 CR11.0       When opened in read-only mode, the client should display the user id of the currently
              editing party to the user
 CR12.0       The user should be able to use the CRCProjectView to rename and cut and paste
              CRCCards and CRCProjects
 CR13.0       The client should provide help to the user
 CR14.0       The client should be able to export a CRCProject to HTML
                                   Table 5.2 – The crcClient requirements

5.4 A Brief Architectural Overview of the JSDT

We wish to show a mapping of the above requirements to Java 1.2 and the Java Shared Data Toolkit
(JSDT). Since a basic overview of JSDT concepts has been discussed in the previous section, only
the aspects of the JSDT that are specific to design and implementation issues will be discussed in
detail here. Below, we have a simple diagram of the basic JSDT architecture (Figure 5.3).

        The basic JSDT architecture is fairly simple yet very powerful. The architecture consists of a
number of Clients that participate in what is called a Session. Any Client can create or join one of
these Sessions with a simple function call. This Session is where all collaboration takes place. A
Client may be a member of multiple Sessions if they choose and an application can have multiple
Clients. In order to access a Session, the Clients proxy their requests through a JSDT Registry. A
Registry must be running on every machine that participates in a JSDT session and serves as a map
between local and remote objects.
        Once a Session has been created and joined, a Client can then create and/or join various
communication objects. These include ByteArrays for global data sharing, Channels for single and
multiparty communication, and Tokens for access control. Using these objects the Clients are able
to effectively collaborate with each other.
        The JSDT can communicate using a number of different mechanisms: RMI, sockets, or
LRMP (a lightweight reliable multicast package). The choice of mechanism is transparent to the user and
the programmer and can therefore be swapped at any point in development.
        All of the objects mentioned above can be what is called 'Managed'. This managing
provides a mechanism to build authentication into a JSDT system. This is very useful in the
collaborative environment in which the JSDT is designed to be used.

[Figure 5.3 diagram; labels: Client computer (two), Server computer, Client, Registry, Session,
SessionManager, ByteArray; links: sockets, rmi, lrmp]

                                              Figure 5.3 Basic JSDT Architecture

        The reader should note that a 'Server' in a JSDT environment is in essence just another
Client. Normally, the Server Client object creates a Session and the appropriate communications
objects that the Client Clients can then join. In this way, a more traditional Client-Server
architecture can be mapped upon the JSDT 'Client-Client' formation.

        For a complete treatment of the JSDT architecture and environment, along with some code
samples, see ([2], 98)

5.5 Mapping the Collaborative CRCTool onto the JSDT
In this section, a method for mapping the Collaborative CRCTool onto the JSDT architecture
will be presented and discussed.

5.5.1 Description of Major JSDT Linked Components

The JSDT Clients
As mentioned above, the basic JSDT element for accomplishing collaborative tasks is that of the
Client. To implement the crcTool system we therefore define the following two classes:

        crcClient: Each crcTool client will have an instance of the crcClient class. This class will be
        used by the crcTool to join the crcSession (defined below) and any communications
        structures that are needed by the crcTool. The crcClient will be responsible for
        implementing the byteArrayListener interface provided by the JSDT model so that it can be
        notified of changes to the global byte arrays in which it has registered an interest.

        crcServerClient: The crcServer will have an instance of the crcServerClient class. This
        class will be used by the crcServer to create the crcSession and the appropriate
        communications objects.

        These extensions of the JSDT Client class will allow the crcTool(s) and the crcServer to
interact. There will be a single instance of the crcServerClient which will be created by the
crcServer application. There will be one instance of the crcClient class per connected crcTool
application.

The JSDT Session
Now that the Clients have been defined, there must exist a Session in which they can communicate.
Since we would like to have a list of valid users with certain permissions, we need to implement
some type of authentication. The JSDT provides what are termed Session Managers. These
managers can automatically cause authentication callbacks whenever a Client tries to join or call
any other function on a session (or any other manageable object). We therefore define the
crcSession and crcSessionManager classes. There will be a single instantiation of each of these
objects in the final system.

        crcSession: The crcSession will be created and joined by the crcServerClient. Upon crcTool
        startup, the crcClient will attempt to join the session. The session will be a managed session
        controlled by the crcSessionManager below. The crcSession is where all CRCTool
        interactions will take place.

        crcSessionManager: The crcSessionManager will provide authentication to the crcSession.
        When a crcClient attempts to join the crcSession, the crcSessionManager forces a challenge-
        response authentication request back to the crcClient. Using this mechanism (which is built
        into JSDT), the crcServer can effectively authenticate users who are attempting to join the
        crcSession. The crcClients should encapsulate their user information in a crcUser class
        which can be used by the crcServer to check against valid user entries in its crcUserList
        database.

The JSDT Channel
The JSDT uses the Channel class for client-to-client communication. This communication can be single-cast
or multi-cast to any or all Clients that have joined the channel. The Channel class is sub-classed from the
JSDT ManageableObject class and can therefore provide another layer of authentication if necessary. For the
crcTool system, we wish to have two of these channels.

        crcControlChannel: The crcControlChannel will be created by the crcServerClient in the
        crcSession. There will only be one instance of the control channel in the session. This
        channel will be joined by all crcClients as well as the crcServerClient. This channel will be
        used by the crcServer and crcTool(s) to exchange message objects. These message objects
        will provide the mechanism for passing available project lists, open requests, the uploading
        and downloading of projects, and other command and control related messages. All joined
        clients will implement the JSDT channelConsumer interface on this channel and will
        therefore process all incoming messages that are sent to them.

        crcProjectChatChannel: The crcProjectChatChannel will be created in the crcSession by
        the crcServerClient. There will be one channel per open project. crcClients that are working
        on a given crcProject will join the chat channel for that project in read/write mode. The
        crcClients will implement the JSDT channelConsumer interface in order to receive a
        callback when data is added to the chat.
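
        The consumer-callback pattern used by both channels can be sketched in plain Java. The
example below runs in a single process and does not use the JSDT API: SketchConsumer,
SketchChannel, and SketchChatChannels are hypothetical names, and messages are plain strings
rather than JSDT Data objects. It shows one chat channel per open project, with every joined
consumer receiving each message sent to that channel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback interface; stands in for a channel consumer.
interface SketchConsumer {
    void dataReceived(String sender, String text);
}

// Hypothetical in-process channel; NOT the JSDT Channel class.
class SketchChannel {
    private final List<SketchConsumer> consumers = new ArrayList<>();

    // Joining clients register a consumer to receive a callback per message.
    void addConsumer(SketchConsumer consumer) {
        consumers.add(consumer);
    }

    // Multi-cast: deliver the message to every registered consumer.
    void sendToAll(String sender, String text) {
        for (SketchConsumer consumer : consumers) {
            consumer.dataReceived(sender, text);
        }
    }
}

// One chat channel per open project, created on first use.
class SketchChatChannels {
    private final Map<String, SketchChannel> byProject = new HashMap<>();

    SketchChannel forProject(String projectName) {
        return byProject.computeIfAbsent(projectName, name -> new SketchChannel());
    }
}
```

Keying the channels by project name keeps chat traffic for one project away from users who are
working on another.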

        The JSDT channels use Data objects to encapsulate data. Therefore, any object that needs to
be sent over a channel must implement the java.io.Serializable interface. These channels will allow
all parties in the crcTool system to communicate effectively.

JSDT ByteArrays and Tokens
The JSDT has two classes which support global data sharing. These are the ByteArray class, and the
Token class.
        The ByteArray class allows a Serializable object to be globally accessible to any Client in a
given Session. The ByteArray class is sub-classed from the JSDT ManageableObject class and can
therefore provide another layer of authentication if necessary. It is joined in the same manner as a
Session or a Channel. A Client can implement the JSDT byteArrayListener interface to be notified
when a given ByteArray has been changed. In this way, all Clients can keep a copy of the most up-
to-date object in the session space.
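        The idea of serializing an object into a shared byte array and notifying listeners of each
change can be sketched with standard Java serialization. This is a single-process illustration
under stated assumptions: SketchSharedBytes and SketchByteArrayListener are hypothetical names
and are not the JSDT ByteArray class or its listener interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener; stands in for a byte array change callback.
interface SketchByteArrayListener {
    void valueChanged(byte[] newValue);
}

// Hypothetical in-process shared byte array; NOT the JSDT ByteArray class.
class SketchSharedBytes {
    private byte[] value = new byte[0];
    private final List<SketchByteArrayListener> listeners = new ArrayList<>();

    void addListener(SketchByteArrayListener listener) {
        listeners.add(listener);
    }

    // Serialize the object, store the bytes, and notify every listener with a copy.
    void setValue(Serializable object) {
        value = serialize(object);
        for (SketchByteArrayListener listener : listeners) {
            listener.valueChanged(value.clone());
        }
    }

    static byte[] serialize(Serializable object) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Each listener rebuilds its own copy of the object from the bytes it receives, which is how
every client can hold the most recent version without sharing object references.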
        If we are to have globally editable data, we need some method to control access to it. The
JSDT Token class provides a mechanism for this type of synchronization. Again, the Token class is
sub-classed from the JSDT ManageableObject class and can therefore provide another layer of
authentication if necessary. It is also joined in the same manner that Sessions are joined. A Client
can grab a Token (exclusively or non-exclusively), release a Token, give a Token, test a Token's
status, get a list of Clients that are holding the Token, or request a Token from another Client. A
Client can also implement the JSDT tokenListener interface to be notified of events such as when a
Token is released. Using these primitives, a Token can be used to effectively provide mutual
exclusion to a ByteArray in the session space.
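        A minimal sketch of token-guarded access is given below. It models only the exclusive
grab and release primitives within a single process. SketchToken and SketchGuardedData are
hypothetical names, not the JSDT Token class, and the rule that a write is refused unless the
writer holds the token is an assumption made for the illustration.

```java
// Hypothetical in-process stand-in for a token; NOT the JSDT Token class.
class SketchToken {
    private String holder;  // null while the token is free

    // Exclusive grab: succeeds if the token is free or already held by this client.
    synchronized boolean grab(String clientName) {
        if (holder == null) {
            holder = clientName;
            return true;
        }
        return holder.equals(clientName);
    }

    // Release succeeds only for the current holder.
    synchronized boolean release(String clientName) {
        if (clientName.equals(holder)) {
            holder = null;
            return true;
        }
        return false;
    }

    synchronized String holder() {
        return holder;
    }
}

// Shared data whose writes are allowed only while the writer holds the token.
class SketchGuardedData {
    private final SketchToken token = new SketchToken();
    private String value = "";

    SketchToken token() {
        return token;
    }

    synchronized boolean write(String clientName, String newValue) {
        if (!clientName.equals(token.holder())) {
            return false;   // caller does not hold the token
        }
        value = newValue;
        return true;
    }

    // Reads need no token: every client may see the current value.
    synchronized String read() {
        return value;
    }
}
```

A second client that fails to grab the token can still read the data, which corresponds to the
read-only mode described in the CRCDataView specification.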
        For the CRCUserView defined in the specifications, the crcServerClient can create the
following class and serialize it into a ByteArray in the crcSession:

        crcProjectUsers: This is a ByteArray created by the crcServerClient which holds the names
        of the users currently working on a project. There is one ByteArray per active crcProject.
        The crcServerClient joins the array in read/write mode and is the only Client that edits the
        list. All other clients simply implement the JSDT byteArrayListener interface to be notified
        of changes. Since the crcServerClient is the only entity changing the list, no access
        synchronization is needed.

        We now need to design a way in which to represent and access the crcProject data. We have
defined a nicely encapsulated crcProject which contains project information as well as a vector of
crcCard objects. We would like to put this single object out into the session and allow multiple
clients to edit it simultaneously. To do this, we must provide a Token for access synchronization.
        If we put the crcProject into the Session in a ByteArray and provide a Token to control
access to it, we can provide exclusive access of the project. However, since the entire project would
have to be locked and unlocked, this setup would actually prohibit two different clients from editing
different cards at the same time. This is not what we want at all.
        In light of the above, we need to define the granularity that we desire in editing a given
project's information. The above implementation would only support project level granularity. This
is obviously not very useful. On the other extreme, we could define the granularity to be a single
piece of data in a given crcCard. This level of granularity is too fine. Such a setup would be
confusing to users trying to accomplish real work. Considering these two alternatives along with the
range of possibilities in between, a reasonable level of granularity is that of card granularity. In this
way, we need to provide the ability to lock and unlock crcCard data.

[Figure 5.4 diagram; crcProject 1 through crcProject n, each paired with a crcProject Token
(1 through n); legend: ByteArray, Channel, Token]

                 Figure 5.4 – Each crcProject has a single Token. (project granularity)

        Considering the above, one method to achieve such granularity would be to 'flatten' out the
crcProject into a crcProjectData byte array and multiple crcCardData arrays. We then could provide
a Token for each of the Data objects and two or more Clients could edit different cards at the same
time. While this representation would work, we have lost the encapsulation that the crcProject class
had provided for us. We have created a huge number of objects in the Session space and have
therefore increased our overhead. This is undesirable. An alternative solution is defined below.
        In the new model, we put the crcProject instance out in the Session in a single ByteArray.
We also provide, in a second ByteArray, a serialized instance of another class termed
crcProjectAccessList. Finally, we define one Token, the crcProjectAccessToken, which will control
access to the crcProjectAccessList. The design works as follows:

[Figure 5.5 diagram; crcProjectData 1 through n and crcCardData 1 through n, each paired with a
crcProjectDataToken or crcCardDataToken; legend: ByteArray, Channel, Token]

               Figure 5.5 – Using a flattened crcProject with a ByteArray and Token for
                               each crcProject and all contained crcCards. (card granularity)

        When a Client requests that a project be opened (via the crcControlChannel), the
crcServerClient loads the appropriate crcProject from its database. It then creates the crcProject
ByteArray and serializes the project into it. Next it creates the crcProjectAccessList ByteArray and
serializes the access list (which can simply be a vector of strings) into it. Finally it creates
the crcProjectAccessToken. When the Clients are notified that the data structures are now set up (via
the crcControlChannel), they join the two ByteArrays and the Token. The Clients also
register as listeners for the crcProject ByteArray so that they will be notified of changes. They do
not register as listeners for the crcProjectAccessList.
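
The serialization step can be illustrated with a minimal sketch. It assumes the access list is a Vector of card names, as suggested above, and uses standard Java object serialization to produce the bytes that a shared ByteArray would carry. The class name AccessListCodec is hypothetical and is not part of the JSDT or of the CRCTool.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Vector;

// Hypothetical helper: converts the access list (a Vector of names)
// to and from the raw bytes that a shared ByteArray would hold.
final class AccessListCodec {
    private AccessListCodec() {}

    // Serialize the list into a fresh byte array.
    static byte[] toBytes(Vector<String> accessList) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(accessList);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the list from bytes previously produced by toBytes.
    @SuppressWarnings("unchecked")
    static Vector<String> fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Vector<String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("unexpected class in access list bytes", e);
        }
    }
}
```

The same approach would apply to the crcProject itself, provided that class and everything it references implement java.io.Serializable.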

[Figure 5.6 diagram: the crcSession holds crcProject 1 … n, each accompanied by its own
crcProjectAccessList and crcProjectAccessToken. Key: ByteArray, Channel, Token.]

                       Figure 5.6 – The access list design model. (card granularity)

The crcProjectAccessToken protects the crcProjectAccessList. Any client, via the listener callback,
has read access to the crcProject. When a Client wants to request edit access to a given card, it takes
the following steps:
        1. Grab the crcProjectAccessToken exclusively – If this fails, keep trying
        2. Read in the crcProjectAccessList
        3. Check the list to see if the name of the crcCard that we wish to edit is present
        4. If it is already there, someone else has it open for editing and we may not have write
           access. Give back the Token and return a failure (read-only)
        5. If it is not there, we write the name of the card into the vector, write the modified
           crcProjectAccessList back to the Session ByteArray, and release the
           crcProjectAccessToken
        6. The card is now ours for the editing - Return a success
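
The six steps above can be sketched as follows. This is an in-memory simulation under stated assumptions, not JSDT code: a java.util.concurrent.Semaphore with one permit stands in for the crcProjectAccessToken, a plain byte array field stands in for the shared crcProjectAccessList ByteArray, and the list is encoded as newline-separated UTF-8 text purely to keep the example short. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Semaphore;

// In-memory stand-in for the shared Session objects; not the JSDT API.
final class AccessListLock {
    // Plays the role of crcProjectAccessToken: one holder at a time.
    private final Semaphore accessToken = new Semaphore(1);
    // Plays the role of the crcProjectAccessList ByteArray.
    private volatile byte[] accessListBytes = new byte[0];

    private List<String> readList() {
        String text = new String(accessListBytes, StandardCharsets.UTF_8);
        List<String> names = new ArrayList<>();
        if (!text.isEmpty()) {
            names.addAll(Arrays.asList(text.split("\n")));
        }
        return names;
    }

    private void writeList(List<String> names) {
        accessListBytes = String.join("\n", names).getBytes(StandardCharsets.UTF_8);
    }

    // Returns true if the caller may edit the card, false for read-only.
    boolean requestEdit(String cardName) {
        while (!accessToken.tryAcquire()) {   // step 1: keep trying
            Thread.yield();
        }
        try {
            List<String> names = readList();  // step 2: read the list
            if (names.contains(cardName)) {   // steps 3-4: already open elsewhere
                return false;
            }
            names.add(cardName);              // step 5: record our claim
            writeList(names);
            return true;                      // step 6: the card is ours
        } finally {
            accessToken.release();            // token is given back on both paths
        }
    }
}
```

Note that the token is held only while the list is inspected and updated, not for the duration of the edit; the entry in the list is what represents the longer-lived claim on the card.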

When a Client has completed its editing and wishes to write the crcCard back to the crcProject, it
takes the following steps:
        1. Grab the crcProjectAccessToken exclusively – If this fails, keep trying
        2. Read in the crcProjectAccessList
        3. Remove the crcCard name from the list and write the modified crcProjectAccessList
           back to its ByteArray
        4. Write the crcProject (this will notify all listening Clients)
        5. Release the crcProjectAccessToken
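
The write-back steps above can be sketched in the same simulated style. Again this is not JSDT code: a Semaphore stands in for the crcProjectAccessToken, an in-memory set stands in for the crcProjectAccessList, a map from card name to card text stands in for the crcProject, and registered java.util.function.Consumer callbacks stand in for ByteArray listeners. All names are hypothetical, and the lockCard method is included only so that the example has a claim to release.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.Semaphore;
import java.util.function.Consumer;

// In-memory stand-in for the shared Session objects; not the JSDT API.
final class SharedProject {
    private final Semaphore accessToken = new Semaphore(1);   // crcProjectAccessToken
    private final Set<String> accessList = new HashSet<>();   // crcProjectAccessList
    private Map<String, String> project = new TreeMap<>();    // crcProject: card name -> text
    private final List<Consumer<Map<String, String>>> listeners = new ArrayList<>();

    void addListener(Consumer<Map<String, String>> listener) {
        listeners.add(listener);
    }

    private void grabToken() {
        while (!accessToken.tryAcquire()) {   // keep trying until the token is free
            Thread.yield();
        }
    }

    // Simplified claim step: true if the name was not already on the list.
    boolean lockCard(String cardName) {
        grabToken();
        try {
            return accessList.add(cardName);
        } finally {
            accessToken.release();
        }
    }

    // Write an edited card back to the project and give up the claim on it.
    void finishEdit(String cardName, String newText) {
        grabToken();                                          // step 1
        try {
            if (!accessList.remove(cardName)) {               // steps 2-3
                throw new IllegalStateException(cardName + " was not locked");
            }
            Map<String, String> updated = new TreeMap<>(project);
            updated.put(cardName, newText);
            project = updated;                                // step 4: write the project
            Map<String, String> snapshot = Collections.unmodifiableMap(updated);
            for (Consumer<Map<String, String>> listener : listeners) {
                listener.accept(snapshot);                    // notify listening Clients
            }
        } finally {
            accessToken.release();                            // step 5
        }
    }
}
```

In this sketch a write-back for a card that was never claimed is rejected outright, which is one simple way of detecting a Client that does not follow the protocol.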

        Finally, when a Client receives notification of a crcProject update (via the listener), it should
update its copy of the crcProject. The only exception to this is when a Client is editing a card. In
this case, it should keep its copy of the card being edited and only update the rest of the project.
This will keep interleaved card edits from interfering with each other.
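
A minimal sketch of this update rule follows. It assumes, for illustration only, that a Client's local copy of the project is a map from card name to card text and that at most one card is being edited at a time; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical local model held by one Client.
final class LocalProjectCopy {
    private final Map<String, String> cards = new HashMap<>();
    private String cardBeingEdited;   // null when no card is open for editing

    void beginEdit(String cardName) {
        cardBeingEdited = cardName;
    }

    void endEdit() {
        cardBeingEdited = null;
    }

    // Local, not yet published, change to the card under edit.
    void editLocally(String newText) {
        if (cardBeingEdited == null) {
            throw new IllegalStateException("no card is open for editing");
        }
        cards.put(cardBeingEdited, newText);
    }

    // Called from the listener when a new crcProject arrives.
    void onProjectUpdate(Map<String, String> incoming) {
        // Remember our in-progress version of the card, if any.
        String kept = (cardBeingEdited == null) ? null : cards.get(cardBeingEdited);
        cards.clear();
        cards.putAll(incoming);
        // Restore the card under edit so the remote copy does not overwrite it.
        if (kept != null) {
            cards.put(cardBeingEdited, kept);
        }
    }

    String textOf(String cardName) {
        return cards.get(cardName);
    }
}
```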
        In this scenario, the crcProject-specific data can be treated just like the cards. If a Client
wishes to change project-specific data (e.g., the description or author), it simply uses the name of
the project when it checks the crcProjectAccessList and locks that. If a Client wishes to add a card,
it must again lock the project-specific data using the project name. If a Client wishes to delete or
rename a card, it must lock both the project-specific data (the project name) and the card data (the
card name). (Here we assume that if a card is renamed or removed, the user is responsible for
cleaning up any hanging references in the remaining cards. If we wished to provide automatic
renaming, we would need to lock all of the cards and the project-specific data before doing a
rename.)
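
The locking rules in the paragraph above can be summarized in a short sketch. The enum, class, and method names are hypothetical. The sketch also assumes that a project and its cards never share a name, and that tryLockAll is called only while the crcProjectAccessToken is held, so that the check and the additions happen as one step.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical mapping from an operation to the names it must lock.
final class LockPlanner {
    enum Operation { EDIT_CARD, EDIT_PROJECT_DATA, ADD_CARD, DELETE_CARD, RENAME_CARD }

    private LockPlanner() {}

    static List<String> namesToLock(Operation op, String projectName, String cardName) {
        switch (op) {
            case EDIT_CARD:
                return Arrays.asList(cardName);
            case EDIT_PROJECT_DATA:
            case ADD_CARD:
                return Arrays.asList(projectName);
            case DELETE_CARD:
            case RENAME_CARD:
                return Arrays.asList(projectName, cardName);
            default:
                throw new IllegalArgumentException("unknown operation: " + op);
        }
    }

    // All-or-nothing: either every name is added to the access list or none is.
    // Assumed to run while the caller holds the access token.
    static boolean tryLockAll(Set<String> accessList, Collection<String> names) {
        for (String name : names) {
            if (accessList.contains(name)) {
                return false;
            }
        }
        accessList.addAll(names);
        return true;
    }
}
```

Taking both names in one step avoids the situation in which a Client holds the project name while waiting for a card name that another Client will not release until it can obtain the project name.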
        One should note that the Client must 'behave' and follow the above protocol in order to
ensure data integrity. If desired, byteArrayManagers could be used with the crcProject to ensure
that updates are only done via the above mechanisms.
        The reader should realize that there is a tradeoff between the two card-granularity methods
mentioned above. In the first, the 'flattened' model, we have a possibly very large number of
crcCards and therefore Tokens out in the Session space. Since each Token and ByteArray requires
an entry in the Registry, the overhead necessary to support such a setup could be substantial.
        The second method uses only three objects in the Session per active crcProject. On the
other hand, in the first model, if a crcCard is updated, the clients need only
update that card in their local model. In the second model, if a card is updated, the entire crcProject
must be updated. As the project grows in size, this could present a performance issue.

Following is a summary diagram of the Collaborative CRCTool mapped to the JSDT architecture:

[Figure 5.7 diagram: each crcClient computer runs the crcTool with its GUI, a crcClient, a Registry,
and Local Data, and communicates over Java RMI. The crcServer computer runs the crcServer with its
GUI, the crcSessionManager, a Registry, and its Data. The crcSession, which is created by the
crcServer and joined by the clients, holds crcProject 1 … n together with the corresponding
crcProjectAccessList, crcProjectAccessToken, crcProjectUsers, and per-project chat objects.
Key: ByteArray, Channel, Token.]

                   Figure 5.7 – The Collaborative CRCTool mapped to the JSDT architecture

         At the design level, the second model is preferred due to its encapsulation properties.
However, it would be prudent to implement both methods and test their respective performance
under the JSDT with large data objects. Only from this testing could the better design from a
performance standpoint be determined.

5.6 The JSDT and Partitioning

As discussed in the previous section, the JSDT is a primary partition GCS. Because of this, if any
clients break off from the crcServer partition, they will be forced to restart and reconnect upon
convergence. While the JSDT does not provide any base functionality to support this convergence,
some form of it could perhaps be constructed using the JSDT primitives. While this is an inherently
difficult problem, certain subsets of it could be solved fairly easily.
For example, in the CRCTool, if a group of users were working on a project and became separated
from the rest of the group, simple modifications like the addition of more CRC cards could be easily
merged when the two partitions converged. However, a large number of
very domain-specific rules would be needed for this to happen smoothly. Even with strict guidelines,
there would inevitably be some cases that would require some sort of 'governing body'
to resolve contending modifications to a single piece of information.
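
To illustrate the easy subset of the merge problem, the following sketch performs a simple three-way merge of two partitions against the last state they shared. It is only one possible set of rules, chosen for illustration, and is not something the JSDT provides. It assumes a project can be modeled as a map from card name to card text; the class and method names are hypothetical. Cards added or changed in only one partition are merged automatically, while deletions and contending edits are reported as conflicts and left at their common value for the 'governing body' to resolve.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical three-way merge of two partitions of one project.
final class PartitionMerge {
    static final class Result {
        final Map<String, String> merged = new TreeMap<>();
        final Set<String> conflicts = new TreeSet<>();
    }

    private PartitionMerge() {}

    static Result merge(Map<String, String> base,
                        Map<String, String> left,
                        Map<String, String> right) {
        Result result = new Result();
        Set<String> names = new TreeSet<>(base.keySet());
        names.addAll(left.keySet());
        names.addAll(right.keySet());
        for (String name : names) {
            String b = base.get(name);
            String l = left.get(name);
            String r = right.get(name);
            if (Objects.equals(l, r)) {
                // Both partitions agree (a shared deletion leaves nothing to keep).
                if (l != null) {
                    result.merged.put(name, l);
                }
            } else if (Objects.equals(b, l) && r != null) {
                result.merged.put(name, r);   // only the right side added or changed it
            } else if (Objects.equals(b, r) && l != null) {
                result.merged.put(name, l);   // only the left side added or changed it
            } else {
                // Contending edits or a one-sided deletion: needs a decision.
                result.conflicts.add(name);
                if (b != null) {
                    result.merged.put(name, b);
                }
            }
        }
        return result;
    }
}
```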

5.7 Collaborative CRCTool Conclusion

As we can see from the above, the majority of our collaborative application example maps well to
the JSDT architecture. However, there are some concerns with granularity, access synchronization,
and performance. The granularity concern is one that is present across many distributed applications
and one that will be refined as more work is done in the distributed arena. In terms of access
synchronization, the JSDT provides a means by which to implement a given design, but does not
provide many options for doing so. As for the performance issue, it remains to be seen how the
JSDT is able to perform under large loads.

6. Conclusions

This paper explored the evolution of group communication services. We have described some of
the key features required for any package to function as a GCS: membership, multicast and message
ordering services. We have described three Unix-based GCSs and how they have approached the
impossible task of providing a reliable membership service. These systems were designed with
specific goals in mind and have exploited existing communication technologies to provide
successful, high-performance group communication services.
        The prospect of using Java as a foundation for object groups, and group communication in
general, is an interesting one. Although they have not yet matured, and may never be able to
replace earlier “true” GCSs, the Java-based GCSs explored here do have the potential to allow for
highly distributable functionality. Performance issues aside, Java-based collaborative applications
may be attractive for many who wish to remain as platform-independent as possible. Only time will
tell if Java-based GCSs will be successful. For now, those wishing to build truly collaborative
applications in Java without a steep learning curve may choose to consider JSDT.
        Although it provides only limited capabilities when measured against the features of a group
communication service, JSDT can be used to provide satisfactory solutions where issues such as
partitionability are not a factor. Our experience with JSDT has been positive, especially
considering its ease of use, which was to be expected given the other Java APIs we have used. To
allow for continued exploration of JSDT as a tool in the development of collaborative applications,
we have proposed a high-level specification for its use in expanding the capabilities of the CRCTool
program. The specification provided here differs in its approach from a former attempt to
“collaboratize” CRCTool, and it will be interesting to compare the two if this one is implemented.
        As this paper was mainly of a research nature, the implementation of the proposed design is
the only applicable future work. Perhaps a similar application could be built using Transis and
Jgroup, for example, to give a feel for the differences between them and JSDT. Also, section two
could be expanded to give a more detailed description of the issues facing the developers of GCSs.
For instance, a more detailed explanation of virtual synchrony and some proposed algorithms to
handle it could make for an interesting paper. Also, as Java-based GCSs become more mature,
another look at them at a future date could reveal their progress toward acceptance as serious GCSs.
        Given the current state of computer networking technologies, the design of group
communication services seems a daunting task. Hopefully, as research continues on current
projects, tools will someday be available that combine the ease of use of toolkits such as JSDT with
the power and flexibility of systems such as Horus, all implemented with the platform independence
of applications written in Java.


[1] Baratloo, P. Emerald Chung, Y. Huang, S. Rangarajan, and S. Yajnik. Filterfresh: Hot
      Replication of Java RMI Server Objects. In Proceedings of the 4th Conference on Object-
      Oriented Technologies and Systems (COOTS), Santa Fe, New Mexico, April 1998.

863cf282-7897-4b0c-a76e-de57aae16e23.doc          Page 61 of 62                               05/04/99
[2] R. Burridge. Java Shared Data Toolkit User Guide, Version 1.4. Sun Microsystems, Inc.,
      Mountain View, California, June 1998.
[3] D. Dolev and D. Malki, "The Transis Approach To High Availability Cluster
      Communication", Communications of the ACM, Vol. 39, No. 4, April 1996.
[4] P. Heller and S. Roberts, “Java 1.2 Developer‟s Handbook”, SYBEX Inc., Alameda,
      California, 1999
[5] C. Horstmann and G. Cornell, “Core Java 1.1 Volume 2 – Advanced Features”, Sun
      Microsystems Press, Palo Alto, California, 1998
[6] Montresor. The Jgroup Reliable Distributed Object Model. Tobe published in Proceedings of
      the Second IFIP WG 6.1 International Working Conference on Distributed Applications and
      Interoperable Systems, Helsinki, June 1999. Technical Report UBLCS 98-12, December 1998.
[7] L. E. Moser, P. M. Melliar-Smith, D. A. Agarwal, R. K. Budhia and C. A. Lingley-
      Papadopoulos, "Totem: A Fault-tolerant Multicast Group Communication System",
      Communications of the ACM, Vol. 39, No. 4, April 1996.
[8] L. E. Moser, Y. Amir, P. M. Melliar-Smith, D. A. Agarwal, “Extended Virtual Synchrony,”
      The 14th IEEE International Conference on Distributed Computing Systems (IC-DCS), pages
      56-65, June 1994.
[9] R. van Renesse, K. Birman and S. Maffeis, "Horus: A Flexible Group Communication
      System", Communications of the ACM, Vol. 39, No. 4, April 1996.
[10] Vitenberg, Properties of Distributed Group Communication and Their Utilization; Institute of
      Computer Science, The Hebrew University of Jerusalem, Jerusalem, Israel, 28 January 1998.
[11] http://java.sun.com/products/java-media/mail-archive/Share/0768.html
[12] http://www.javasoft.com/pr/1997/june/statement970626-01.faq.html
