BizTalk Server 2004: “Hub Bus” - Leaving “Hub and Spoke” and “Message Bus”
behind.
BizTalk Server 2004 introduces a revolutionary Message Box architecture that
overcomes many of the issues of both the hub and spoke and the message bus
architectures previously utilized by integration vendors. First we will examine the hub
and spoke and message bus architectures, and then explain how the BizTalk Server 2004
Message Box architecture takes advantage of key design points of both architectures in a
unique manner.
A. Hub and spoke brokers got their name because architecturally they provided
centralized hub processing machinery that accepted requests from multiple applications
plugged into the centralized hub as spokes. This intermediary provided a focal
point for applications and reduced the connectivity between applications from on the
order of n squared lines to only n lines. The hub provides a layer of insulation and
functionality between the spokes, enabling spoke applications to be removed and replaced
without other spoke applications necessarily being aware of the change.
Functionality executed at the hub included transformation, message tracking, monitoring,
and n-to-m message routing between the spokes. The interactions between the hub
and the spokes did not require any proprietary protocol or application modification, as
the hub could accept messages over standard transports such as HTTP/S and SMTP. A
typical hub and spoke deployment is depicted in Figure 1, with a single hub
machine brokering connections across multiple spokes:
[Diagram: a central Integration Broker (Hub) connected to eight Spokes]
Figure 1: A Hub and Spoke Integration Broker Topology
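The reduction from roughly n squared lines of connectivity to n lines, described above,
can be made concrete with a little arithmetic. The sketch below is illustrative only: the
function names are hypothetical, and it assumes one bidirectional line per pair of
applications in the point-to-point case, which grows on the order of n squared.

```python
def point_to_point_links(n: int) -> int:
    """Lines needed when every application connects directly to every other.

    Assumption: one bidirectional line per pair, i.e. n * (n - 1) / 2,
    which grows on the order of n squared.
    """
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Lines needed when every application connects only to the hub."""
    return n


# Compare both models for a few application counts.
for n in (4, 8, 16):
    print(n, point_to_point_links(n), hub_and_spoke_links(n))
```

Under this assumption, the eight spokes of Figure 1 would need 28 direct lines without a
hub, against 8 lines with one.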
While the hub and spoke model lowered total cost of ownership by reducing the
cost of adding and removing connections and by centralizing management, it
was not without challenges. In particular, hub and spoke models usually
assumed a single central machine, so scalability was limited to the scalability
of that central machine and a single point of failure could be created. To avoid
these issues, multiple implementations of hub and spoke were often tied together;
however, this typically led to administration overhead and architectural complexity. These
limitations kept this architecture out of environments where extreme throughput
and scale were required. Examples of this architecture include IBM CrossWorlds, Vitria,
and webMethods.
B. Message bus brokers got their name because they consisted of a network of message
processing functionality interlinked through a common protocol. Message bus
architectures provided high-throughput multicast capabilities utilizing a network of nodes
as shown in Figure 2 below:
[Diagram: four Applications, each with its own Adapter and Transformation layer,
attached to a Message Bus of four nodes (Node 1 to Node 4)]
Figure 2: Message Bus Topology
However, message buses were not without issues of their own. For example, the network
was composed of multiple nodes, and managing these nodes was more complex.
Routing between the nodes in the network typically used a proprietary bus protocol
and format, so customers were restricted to a homogeneous integration broker technology
across the whole bus. Further, routing on the bus was typically implemented by sending
packets out to each of the nodes over Ethernet. This Ethernet routing mechanism used
considerable bandwidth, created a security risk on the bus because every node had access
to the other nodes' messages during routing, and eventually flooded the pipe between the
applications. Transmissions on Ethernet were not reliable without the addition of a
proprietary reliable protocol on top of the packet sends. For an application to talk to
the message bus, it needed to adapt data from the application's format to the format of
the bus and transmit a message on the bus. This led to considerable expense in adding
applications because transformation logic was applied locally at each application node,
rather than being centralized as in the hub and spoke architecture. Indeed, the services
offered by the bus were limited by its decentralization; even logging or monitoring
information across the bus created considerable challenges because it typically involved
either broadcasting logging packets to centralized logging machinery which further
clogged bandwidth or browsing individual log files across multiple nodes. The message bus
architecture did provide the simultaneous multicast capabilities and throughput required
by high-value edge-case scenarios such as stock-market ticker applications, but it was not
well suited to typical integration scenarios for the reasons detailed above, especially
the network flooding issues. An example of this architecture is TIBCO Rendezvous.
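The bandwidth cost of sending every packet out to every node, described above, can be
illustrated with a simple count. This is a rough model rather than a description of any
particular product: the function names are hypothetical, and it assumes that each message
costs one delivery per receiving node.

```python
def broadcast_deliveries(nodes: int, messages: int) -> int:
    """Deliveries when each message is sent to every node except the sender."""
    return messages * (nodes - 1)


def targeted_deliveries(messages: int, interested_nodes: int) -> int:
    """Deliveries when each message goes only to the nodes that need it."""
    return messages * interested_nodes


# Four nodes as in Figure 2, with one interested receiver per message.
print(broadcast_deliveries(nodes=4, messages=1000))
print(targeted_deliveries(messages=1000, interested_nodes=1))
```

In this model the broadcast traffic grows with the size of the bus even when the number
of interested receivers stays constant, which is the flooding effect the text refers to.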
C. BizTalk Server 2000/2 could be described as a hybrid of a hub and spoke message
broker and a message bus message broker, but its roots are firmly in the hub and spoke
model. From a hub and spoke perspective, BizTalk Server 2000/2 provided centralized
transformation and standard transport interfaces, and a single CPU install of BizTalk
Server fitted the topology described in Figure 1 above. On the other hand, Microsoft
recognized the need to overcome the single point of failure in the hub and spoke model
and so provided a shared processing database that enabled several integration broker
nodes to perform work in concert, providing a “virtual hub” called a BizTalk Server group.
This group acts somewhat like a message bus in that it is a set of nodes working in concert,
but rather than passing state between the nodes across the Ethernet as in a traditional
message bus model, a centralized database was accessed by multiple nodes.
The database had a number of advantages over Ethernet; in particular, it provided
reliable, “never lose a message” brokering. In BizTalk Server 2000/2 there were a
number of architectural restrictions that typically affinitized work to a particular node
(receiving or processing) rather than distributing it across nodes, and the extent of the bus
functionality was limited to only a few servers, as the bus supported only as many
processing and receiving servers as could be connected to a single interchange database.
It is for these reasons that the hybrid architecture was more closely associated with the
hub and spoke model. The BizTalk Server 2000/2 topology is shown in Figure 3.
[Diagram: two virtual hubs (Virtual Hub 1 and Virtual Hub 2), each made up of Receiving
and Processing servers sharing an Interchange Database, with Spokes attached]
Figure 3: BizTalk Server 2000/2 topology with two virtual hubs with no single point of
failure.
However, the virtual hub concept was restricted to a certain number of processing and
receiving servers. Once the interchange database resources were exhausted, customers
had to add an additional message broker that was independently resourced and managed,
albeit through the same tools at the user interface level.
D. The Hub Bus topology used in BizTalk Server 2004 goes beyond the traditional brokering
paradigms of message bus and hub and spoke, providing a hybrid that takes advantage of the
best of both mechanisms. Building on the key concepts introduced in BizTalk Server
2000/2, BizTalk Server 2004 maintains the hub and spoke like properties of
transformation, centralized logging and tracking, and management, but internally the hub
acts very much like a bus in terms of processing scalability and distribution. Innovations
in the Hub Bus architecture revolve around the Message Box and the Host concepts.
I). The Message Box model enables centralization of state inside the bus without a single
point of failure through a set of distributed, clustered database nodes. Three key aspects
of the Message Box model are:
• The MessageBox nodes centralize all state in the system, ensuring that all
  non-MessageBox nodes, such as the receiving/sending and processing nodes in Figure 4
  below, can be interchangeable if desired, as a particular node stores no local state.
• There is no explicit relationship between a particular MessageBox node and a
  particular non-MessageBox node at run-time. This means that MessageBox nodes
  can be added and any of the non-MessageBox nodes can immediately take
  advantage of the functionality they provide without any configuration change on
  the non-MessageBox node.
• Configuration information, such as application routing information, is shared
  across all of the MessageBox nodes.
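The first aspect above, interchangeable nodes that store no local state, can be sketched
as follows. This is a toy model, not BizTalk Server code: the class names, the dictionary
used as a store, and the job identifier are all assumptions made for illustration.

```python
class MessageBox:
    """Shared store holding all state; stands in for the MessageBox nodes."""

    def __init__(self):
        self.state = {}


class ProcessingNode:
    """Keeps no working state between calls; only a name for identification."""

    def __init__(self, name):
        self.name = name

    def run_step(self, box, job_id):
        # All progress is read from and written back to the shared store.
        record = box.state.setdefault(job_id, {"step": 0, "handled_by": []})
        record["step"] += 1
        record["handled_by"].append(self.name)


box = MessageBox()
node_a, node_b = ProcessingNode("A"), ProcessingNode("B")

# Either node can pick up where the other left off.
for node in (node_a, node_b, node_a):
    node.run_step(box, "order-1")

print(box.state["order-1"])
```

Because the progress of the job lives only in the shared store, the sketch lets nodes A
and B alternate on the same job without coordinating with each other.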
II). The Host and Host Instance Model is used for the “processing nodes”. A host is an
abstract container for processing resources. For example, an abstract host may contain an
MSMQ receive function and an HTTP receive function. Host instances are the physical
incarnation of hosts. Why the abstraction? Two or more nodes in a bus may run the same
set of processing resources for fault-tolerance or scale-out. With the host model, a single
host is defined and managed, yet instances of that host are seamlessly deployed to multiple
nodes in the network. Indeed, depending on the scenario, customers can create networks
of receive host instances, processing host instances and send host instances within the
larger “hub bus”. Key to the host and host instance model is that the host instances do not
persist state locally, instead using the MessageBox group (which is a network of
MessageBox nodes) for this task.
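The relationship between a host and its host instances can be sketched as a small data
model. The sketch is hypothetical: the class names, the `deploy` helper and the node names
are assumptions made for illustration and do not correspond to any BizTalk Server API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Host:
    """Abstract container: a name plus the processing resources it defines."""
    name: str
    resources: Tuple[str, ...]


@dataclass
class HostInstance:
    """Physical incarnation of a host on one node; keeps no state of its own."""
    host: Host
    node: str

    def resources(self) -> Tuple[str, ...]:
        # The resource list always comes from the single shared host definition.
        return self.host.resources


def deploy(host: Host, nodes: List[str]) -> List[HostInstance]:
    """Create one instance of the same host definition on each node."""
    return [HostInstance(host, node) for node in nodes]


receive_host = Host("ReceiveHost", ("MSMQ receive function", "HTTP receive function"))
instances = deploy(receive_host, ["node-1", "node-2"])

for instance in instances:
    print(instance.node, instance.resources())
```

The host is defined once and both instances refer to that single definition, which mirrors
the idea of managing one host while running it on several nodes.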
Figure 4 below shows an example “Hub Bus” topology across two applications.
[Diagram: a source Application feeds either of two Receive Host Instances, which deposit
into either of two Message Boxes; either of two Processing Host Instances works on the
messages, and either of two Send Host Instances delivers to the target Application. The
host instances and Message Boxes together form the “Hub Bus”.]
Figure 4: The “Hub Bus” topology of BizTalk Server 2004
As shown in Figure 4, any of the receive host instances set up to receive messages from
the source application may introduce the message to the “hub bus” network. In this case
either of the receive host instances may pick up the incoming message. The number of
receive host instances is not limited, so this may be a “network of nodes” in its own right,
rather than just the two shown in the figure. The receive host instances may receive from
one or more applications, for example receiving messages over both HTTP and MSMQ.
Alternatively, should there be different types of application requirements,
multiple receive host instance networks with different receiving capabilities may be
created, such as one that receives messages over HTTP and one that receives messages from
MSMQ. The receive host instances deposit the messages into the MessageBox. The
MessageBox network is independent of the non-MessageBox network, and as such
both of the receive host instances have access to both of the MessageBoxes; which
one is selected is determined round-robin at runtime. Should an additional MessageBox
be required for scalability, it may be added to the MessageBox network, and
subsequently three MessageBoxes will be available round-robin to each of the receive
host instances. Once the message is in the MessageBox network, it becomes available to
the remaining nodes in the “hub bus”. For example, a processing host instance may pull
the message from the MessageBox to perform some work such as an orchestration. If two
processing host instances are in the network, either one may do the work for any interval
of time, returning the message to the MessageBox network once it completes the task.
Avoiding the flaws of Ethernet-based bus topologies, intelligent self-adjusting load balancing
and throttling are inherently part of the infrastructure because the host instances can
adjust their interactions with the MessageBox effectively in times of peak volume.
Eventually a send host instance pulls the message from the MessageBox network and it
exits to the target application.
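The round-robin selection of MessageBoxes described above, including the effect of adding
a third MessageBox, can be sketched as follows. This is a simplified model and not the
product's actual algorithm: the class, its methods and the MessageBox names are
hypothetical, and restarting the rotation when a MessageBox is added is a choice made
only to keep the example short.

```python
from collections import deque
from itertools import cycle


class MessageBoxGroup:
    """A network of MessageBoxes, selected round-robin for each deposit."""

    def __init__(self, names):
        self.boxes = {name: deque() for name in names}
        self._rotation = cycle(list(self.boxes))

    def add_box(self, name):
        # The new MessageBox joins the rotation; callers need no changes.
        self.boxes[name] = deque()
        self._rotation = cycle(list(self.boxes))

    def deposit(self, message):
        # Pick the next MessageBox in the rotation and queue the message there.
        name = next(self._rotation)
        self.boxes[name].append(message)
        return name

    def pull(self):
        # Any host instance may pull from any MessageBox that holds work.
        for queue in self.boxes.values():
            if queue:
                return queue.popleft()
        return None


group = MessageBoxGroup(["MessageBox-1", "MessageBox-2"])
first = [group.deposit(f"msg-{i}") for i in range(4)]

group.add_box("MessageBox-3")
later = [group.deposit(f"msg-{i}") for i in range(4, 7)]

print(first)
print(later)
```

In this sketch the first four deposits alternate between the two original MessageBoxes,
and after the third is added the next three deposits land on one MessageBox each, without
any change to the code that calls `deposit`.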
With the BizTalk Server 2004 “Hub Bus” concept, the traditional issues with a “Hub and
Spoke” topology and with a “Message Bus” topology are overcome to provide a scalable,
manageable integration broker.