Algorithm for Managing Node Resources for
Buffered Network Communications
CSE 237A Spring 2004
University of California
Improve the connectivity of mesh wireless networks by creating a method to effectively
connect unplanned (disjoint) topologies using a redundant, timing-aware routing scheme.
Improvements in wireless software and hardware have made self-configuring mesh
network topologies realistic for sensor networks and other real-world distributed
applications. Mesh networks have difficulty with scalability, because their connectivity
graphs become very complex as nodes are added.
Most of the academic research in the area focuses on solving difficult graph problems in
order to maximize network efficiency. These improvements are useful in many situations;
however, most real-world distributed wireless networks do not suffer from overloaded
airwaves, but from a lack of connectivity. The network algorithms available today focus
on dynamically building a structured hierarchy through analysis of connectivity graphs.
Unfortunately, with this approach, researchers limit network connectivity to the
instantaneous static connectivity of a system, discarding any utility offered by the change
in the system over time. We seek to include that information in a network layer that can
act on top of existing network layers, so information can be transferred at times when
other network models would show that a transfer is impossible. Treating transfers as
instantaneous, all-or-nothing events makes little sense, since information transfer in a
multi-hop network is nowhere close to an atomic process.
We will develop an algorithm that allows a network node to use its resources to increase
the probability that messages sent by neighbors will reach their destination. Events on the
network will be treated as entities with a lifespan, rather than atomic exchanges.
In other words, network nodes will cache messages when they are disconnected from the
network and forward them when possible. This has the potential of providing a
virtual connection to node neighbors.
For this project we will ignore some of the issues dealt with traditionally in network
protocols. These include transmission errors and the unbounded execution time of
high-complexity algorithms (e.g., shortest path).
We will, however, add constraints that are not normally considered but that are
limiting factors in the product niche we are targeting: heavy memory
limitations and poor network link reliability.
It should also be noted that in a system with no dynamic elements, our algorithm would
have no effect other than wasting resources.
Other researchers are working on ways to use knowledge about the changing
network graph to help route messages when there is no apparent network connection.
One approach we found interesting works as follows: if a network develops into two
disjoint networks, it increases the broadcast power of its antennas until the connection is
made. Alternatively, it can have highly directional antennas change direction to make
connections to antennas that have become disjoint from the rest of the system.
Another approach is to supply each node with hardware that gives it long-range
capability, but to use software to disable the antenna until it is required by the network,
thus conserving power.
Both of these approaches, however, rely on having some sort of control over the behavior
of the nodes. We want to develop a system that will work in a more general case.
The Simulation Environment:
The code for the simulation environment is included in appendix B. The simulation
program draws a field, and cars moving across the field. The cars represent wireless
network nodes. When nodes are within range of each other, the simulator draws a green
line connecting the two nodes. Nodes collect tokens, which appear as colored boxes
below the cars. Nodes are able to copy tokens and transfer them across the green network
connections. It is assumed that somewhere on the network, there is a node that has a
connection to a wide-area network with perfect link quality. This is represented by an
icon of a computer in the center of the field. We call this node the “Server.” When a node
makes a connection to the server, it can copy any tokens it is carrying to the server, and
the simulator records a successful network transaction. The number of successful network
transactions is counted and displayed at the bottom left of the program window.
All of the tokens that have successfully been transferred to the server are listed across the
bottom of the program window.
Along the left of the program window are controls that allow the user to change the
simulation parameters:
Restart – Removes all of the nodes from the field and resets the success rate to 0.
Graph – displays a graph of the current success rate vs. past success rates for different
parameter settings.
Plot – records the current success rate for display later.
Node count – controls the number of nodes on the field at one time.
Capacity – controls the maximum number of tokens that a single node can carry.
Chattiness – controls the rate at which the nodes produce tokens. (Bandwidth
requirement.)
Velocity – controls the average speed at which the nodes move across the field.
Range – controls the maximum distance at which nodes can transfer tokens.
Auto – runs the simulation over and over plotting the success rate each time.
Cycles – controls the precision at which the Auto button performs its simulations.
Developing the Algorithm:
First, we determined what simulation parameters have an effect on the success rate of a
system. We did this by modifying the code and letting the simulation run. The following
is a list of parameters that had an impact on the success rate.
Memory is the most limiting constraint. Most network nodes are going to be small
sensors or other devices that aren’t going to have a lot of memory resources to dedicate to
other people’s messages. Here, we have defined memory capacity as “the number of
tokens that a node can carry.” Note that this includes not only the token value, but also
information about the token and its originating node, such as the token’s creation date, a
hash of its value, and other tags that we added to aid our algorithm. Also note that
our analysis does not take into account the size of the token value. In our analysis, all
messages take the same amount of memory, and the capacity of any node is fixed at an
integer multiple of the size of the tokens. See Appendix A1.
Range, the distance over which nodes can transfer tokens, is an important constraint. In
most networks, it is the only parameter affecting network connectivity. Range has a
positive effect on success rate, with the probability of success proportional to the square
of range. See Appendix A2.
Bandwidth requirement, that is, the number of tokens each node generates. The
system’s ability to store messages saturates very quickly, so success is very sensitive
to bandwidth. As bandwidth increases, it appears to have an exponential effect on
success. See Appendix A3.
Velocity, that is, the speed at which the cars move across the screen. This is important to
model because the success of our system relies on some sort of dynamic topology. If the
cars didn’t move, this whole project would be pointless. Velocity had a proportional effect
on success. See Appendix A4.
The number of nodes in the system. This parameter largely cancels out: as the number of
nodes increases, the stress on the system increases, but so do the chances that any given
node will be connected to a server. Also, the total buffering capacity of the system
increases with the number of nodes. See Appendix A5.
After discovering the relationship of success to each of our five parameters, we were able
to develop a unit of connectivity such that the success rate of a system depends only on
this single variable, regardless of the magnitudes of the five individual parameters:

R^2 • Byte • V / (Bw • Byte)

which cancels to Area • Nodes, and which for the rest of the report we call Gupta, or capital G.
Information Available to the Algorithm
The algorithm must be able to instruct a node what to do when it is faced with a
difficult decision: its memory is full, but it has access to a new token.
Should the node ignore the new token and allow the ones in its memory to remain?
Or should it discard one in its memory to make room for the new token?
In order to make this decision, the algorithm must have some information about the
tokens to compare them. We developed several variables the algorithm has access to:
How old the token is. This is important because we don’t want copies of year-old tokens
flooding the system when they have most likely already been delivered. Each token will
have to carry with it a tag that identifies its time of creation.
Pervasiveness, that is, how many copies of the token are floating around the system.
This is impossible to know exactly, but we can get a good estimate. Whenever a node
produces a token, it tags it with a number, say 30. Whenever the token gets copied to
another node, it will divide the number by two. So now two nodes each have a copy
tagged 15. Whenever they make a copy, they also divide by two, so the nodes that have a
copy will be tagged 7 or 8. This system limits the amount of memory a single token can
waste, because once it is down to 1, the algorithm can decide not to put much effort into
distributing it any more.
Time since the node last connected to the server. This is easy to keep track of, and allows
the algorithm to avoid taking on tokens with high Pervasiveness tags if it is unlikely that
the node will be connected any time soon.
A random number. If two tokens carry the exact same meta-data, they will be hard to
compare. A random number will allow different nodes on the system to behave
differently and not all work to service the same token.
A hash value. This will allow the node to determine if it has seen a token before.
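The variables above can be collected into a small per-token record. The sketch below uses Python and field names of our own choosing (the simulator code in Appendix B may structure this differently); the starting pervasiveness budget of 30 is the example value from the text.

```python
import hashlib
import random
import time
from dataclasses import dataclass, field, replace

@dataclass
class Token:
    """Per-token meta-data the algorithm can inspect (field names are ours)."""
    value: bytes
    created: float = field(default_factory=time.time)  # for the age variable
    pervasiveness: int = 30                            # copy-budget tag
    tiebreak: float = field(default_factory=random.random)

    @property
    def digest(self) -> str:
        # Hash of the value, so a node can tell if it has seen this token before.
        return hashlib.sha256(self.value).hexdigest()

    def copy_to_peer(self) -> "Token":
        # Split the pervasiveness budget between the original and the copy,
        # e.g. 30 -> 15 and 15, then 15 -> 7 and 8, as described above.
        keep = self.pervasiveness // 2
        give = self.pervasiveness - keep
        self.pervasiveness = keep
        return replace(self, pervasiveness=give, tiebreak=random.random())
```

Once a token's budget is down to 1, further copies split 1 into 0 and 1, so the token naturally stops spreading.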
Converging on the Algorithm:
We realized that we could reduce the algorithm to a method for comparing two tokens
and deciding which one is more important, and should be stored in memory, and which
should be discarded when the memory is full. We attempted to express this method as a
formula for determining the value of any one token.
We had no idea what form the formula would take, so we thought it would be safe to
assume that it is a polynomial,

At + Bp + Cr + Dm,

where A, B, C, and D are coefficients, and t, p, r, m are values available to the algorithm:
age of the token, pervasiveness of the token, a random number, and time since server contact. In
order to find the proper values of A, B, C, D, we simulated the system, while varying one
coefficient at a time. See Appendix A6-A9.
We converged on values for A, B, C, and D.
To improve our formula, we experimented with different forms of the expression.
Eventually, we found one that worked very well, a rational polynomial of the form

(At + Bp + Cs + D) / (Et + Fp + Gs + H)
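Written as code, the comparison rule follows directly from this form. Here t is the token's age, p its pervasiveness tag, and s the time since the node last contacted the server; the coefficient tuple is a placeholder, since the only value pinned down later in the report is A ≈ 0.6 (with E converging to zero).

```python
def token_score(t, p, s, coeffs):
    """Rational-polynomial token value: (At + Bp + Cs + D) / (Et + Fp + Gs + H).

    t = age of token, p = pervasiveness tag, s = time since server contact.
    Coefficient values passed in are illustrative, not the report's final ones.
    """
    A, B, C, D, E, F, G, H = coeffs
    return (A * t + B * p + C * s + D) / (E * t + F * p + G * s + H)

def more_important(tok_a, tok_b, coeffs):
    """Return whichever (t, p, s) token should stay in memory when only one fits."""
    return max(tok_a, tok_b, key=lambda tok: token_score(*tok, coeffs))
```

The comparison is then just "keep the token with the higher score," which is all the memory-full decision needs.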
Again, we plotted the success rate of the simulator against the coefficient values of A-H.
This is a plot of the value of A vs. the success rate of the simulator. The data for this plot
can be found in Appendix A10.
It should be noted that the sensitivity charts provided are for the finished algorithm.
When we performed this process for the first iteration, the numbers did not converge on
the same values. We iterated through the process of converging the values five or six
times for each coefficient before we reached numbers that looked correct.
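The convergence procedure can be sketched as a coordinate sweep: vary one coefficient at a time, keep the value with the lowest simulated error, and repeat the whole sweep several passes. The `error_of` callback below is a stand-in for a full simulation run (one minus the success rate); nothing here reproduces our actual simulator.

```python
def tune_coefficients(error_of, coeffs, candidates, rounds=6):
    """One-coefficient-at-a-time sweep, repeated until values stop moving.

    error_of(coeffs) -> error rate; candidates is the grid of values tried
    for each coefficient. The report needed five or six full passes.
    """
    coeffs = list(coeffs)
    for _ in range(rounds):
        for i in range(len(coeffs)):
            coeffs[i] = min(
                candidates,
                key=lambda v: error_of(coeffs[:i] + [v] + coeffs[i + 1:]),
            )
    return coeffs
```

Like the report's process, this only finds a local minimum determined by the starting coefficients.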
Also, it is important to note that this entire process was carried out with a single set of
simulation parameters. We made the assumption that the relationships between
simulation parameters and success rate did not vary with which algorithm we are using.
In order to verify this assumption, we changed the network parameters and again checked
the value of A at the minimum error rate. This again worked out to be very close to 0.6.
See Appendix A18.
The fact that the optimal value for A was not exactly the same suggests that an even
better solution would be to have each node try to estimate network conditions and adjust
the coefficients accordingly.
Another point to consider is the possibility that our experiment converged on a local
minimum error for our system. It is possible that another set of coefficients would result
in an even lower error rate. This possibility is extremely difficult to rule out, since we
are using an essentially chaotic simulation system. The initial coefficients determine
which minimum the process eventually reaches; ours were A=B=C=D=E=F=G=H=5.
Most of the sensitivities ended up showing that the best value for the coefficient is zero,
as in the chart for E (Appendix A14); in other words, the age of the token should not
appear as a term in the denominator.
In order to gauge our progress, we needed to develop benchmark solutions: an
upper and lower bound on the performance we could expect from an algorithm, as well as
algorithms that would approximate a sub-par algorithm design (something to beat).
Best-Case: A system where all of the nodes have no memory constraints. Using this
definition, any series of events that leads to a successful network transaction in an
omniscient algorithm would also be counted as a success in our best-case benchmark. It
should also be noted that there are some situations in which an omniscient algorithm
would fail, but our best-case definition would record a success. One case is when one
node that is not connected to the network produces three tokens at the same time. If the
node capacity is two tokens, an omniscient algorithm would be forced to drop one. In
order to minimize the effect this had on the simulation results, we were careful to only
compare the results of our best-case algorithm using simulation parameters where the
memory capacities of nodes were set high (3 or more). The performance of Best-Case is
very good. It breaks away towards 99% within about 20 Gupta.
First-come-first-served: When a node has access to a new token, it will always make a
copy of it. When its memory is full, it will stop accepting new tokens. When it makes a
connection to a server, it transfers its tokens and clears its memory. First-come-first-served
has acceptable performance in the low-Gupta range, but takes a very long time to
reach 99% (80 or more Gupta).
Last-come-first-served: A node will always opt to replace old tokens in memory with
new tokens. In the low Gupta range, LCFS has very poor performance, following worst-
case. It breaks upward toward 99% later than FCFS, but achieves 99% sooner.
Worst-Case: A node will never copy tokens from other nodes. When it connects to a
server, it will transfer its tokens and clear its memory. This algorithm is also called
“Mesh”. Worst-Case has almost linear performance vs. Gupta in the range that Best-Case
is below 99% success. Worst-Case achieves 99% around 100 Gupta.
These values are all from the chart in Appendix A19.
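As a sketch, the two naive buffering policies differ only in what they do when the buffer is full. The `offer` functions below are hypothetical hooks a simulator might call when a node sees a new token, not the actual interface of our code in Appendix B.

```python
def fcfs_offer(buffer, token, capacity):
    """First-come-first-served: accept new tokens only while space remains."""
    if len(buffer) < capacity:
        buffer.append(token)

def lcfs_offer(buffer, token, capacity):
    """Last-come-first-served: always accept, evicting the longest-resident token."""
    if len(buffer) >= capacity:
        buffer.pop(0)  # drop the token that has been in memory the longest
    buffer.append(token)
```

Worst-Case ("Mesh") is the degenerate policy that never buffers on behalf of other nodes at all, and Best-Case simply has no capacity limit.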
When a connection is made to a peer:
1) If the peer is a server, transfer all tokens to the server and clear memory.
2) Obtain a copy of all tokens that peer has
3) Eliminate all tokens that the node already has a copy of
4) Calculate (At+Bp+Cs+D)/(Et+Fp+Gs+H) for all tokens.
5) Choose the CAPACITY highest-scoring tokens.
6) Save those tokens in memory and discard the rest
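Steps 2 through 6 amount to a merge of the two token buffers. A minimal sketch, assuming each token carries a digest for duplicate detection and that `score` evaluates the rational polynomial of step 4 (both names are ours):

```python
def merge_buffers(own, offered, capacity, score):
    """Dedupe the peer's tokens against ours (step 3), score everything
    (step 4), and keep only the CAPACITY best (steps 5-6)."""
    seen = {tok["digest"] for tok in own}
    new = [tok for tok in offered if tok["digest"] not in seen]
    ranked = sorted(own + new, key=score, reverse=True)
    return ranked[:capacity]
```

Step 1 (the server case) short-circuits before any of this: the node uploads everything and clears its memory.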
Using the best version of the algorithm we have, we compiled simulation results in the
chart of Appendix A19.
The red line on top is the Best-Case algorithm. The blue and green lines are the two
naive implementations, FCFS and LCFS. The purple line on the bottom, which is almost
linear, is the Mesh network line, that is, the worst-case algorithm.
As expected, our algorithm falls between the Best-Case and all other algorithms.
The protocol must be able to exchange tokens and to exchange information about the
tokens, including all meta-data used by the algorithm: the age of the token, its
pervasiveness tag, and the hash of its value.
The protocol should also fit somewhere in the network stack, so that it can be used by
other programs. During later testing, we discovered that the protocol is very sensitive to
non-random token generation, such as that produced by our simulator; we discuss this in
the section “Using our protocol in the real world.” We determined that if this protocol
were implemented as part of the network stack, it would have to reside no lower than the
TCP level, to avoid unnecessarily chatty network protocols.
Of course, this algorithm could be implemented at any layer higher than what we drew
here. You could write a version of AIM, for instance, that simply buffered your outgoing
messages during a network interruption and then sent them when you signed on again.
There were some data transfer issues that we faced when defining the protocol.
The first one is how to determine the age of a token. This requires all of the nodes to have
a clock, but those clocks might not be synchronized.
We solved this by allowing each node to have its own clock. When information about the
age of the token is required by another node, the node will transmit the time that the
token was created, along with what time it is right now. From those two pieces of
information, the receiving node will be able to calculate what the time of creation was
relative to its own local clock.
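A sketch of that calculation (names are ours): because only the elapsed age effectively crosses the wire, any constant offset between the two clocks cancels, assuming the clocks tick at the same rate and transmission delay is negligible.

```python
def creation_time_local(sender_created, sender_now, local_now):
    """Translate a token's creation time into the receiver's clock.

    The sender transmits both the token's creation timestamp and its own
    current clock reading; subtracting gives the token's age, which is
    independent of any constant clock offset between the two nodes.
    """
    age = sender_now - sender_created
    return local_now - age
```

A token created 60 time units ago on the sender's clock is simply recorded as created 60 units before "now" on the receiver's clock.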
Another problem was identifying whether a node has a direct connection to a server.
It isn’t good enough to just assume that if one of your peers has a connection to a server,
then so do you. Imagine a circle of connected nodes, each thinking the one in front of it
has a connection to the server: tokens will continue to be passed along the ring in circles.
When network robustness is important, embedded engineers have a choice. They can use
our algorithm, or they can increase the broadcast range of their hardware. The total cost
of a system is going to be the total cost of all the units deployed, plus the cost of the
damage caused by network failures.
This graph shows how the error cost affects the optimal range that should be included
in the hardware design of a node network. The red line is the cost of the hardware units,
and it increases as range is increased.
The blue line is the error rate times the cost of errors. As the range increases, the
probability of error drops, and so does the total cost of the errors.
Added together, they give the total cost, the green line. At around 10 Gupta there is a
minimum. This minimum corresponds to the optimal range that the design should employ.
As the cost per error increases, the minimum on the green line will move up and to the
right, meaning the total cost of the system increases, and the range necessary on the
hardware increases as well.
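The trade-off can be sketched numerically. The cost curves below (linear hardware cost, error rate falling with range) are illustrative stand-ins, not fits to our simulation data:

```python
def total_cost(rng, n_units, hw_cost, error_rate, cost_per_error):
    """Green line = hardware cost (red line) plus expected error cost (blue line)."""
    return n_units * hw_cost(rng) + error_rate(rng) * cost_per_error

def optimal_range(ranges, n_units, hw_cost, error_rate, cost_per_error):
    """Range at the minimum of the total-cost curve."""
    return min(ranges, key=lambda r: total_cost(r, n_units, hw_cost,
                                                error_rate, cost_per_error))
```

Raising `cost_per_error` scales the falling blue curve up, which moves the minimum of the green curve up and to the right, exactly as described above.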
To assess whether this project was a success, let’s revisit the midcourse report.
Plan: Research latest methodology in mesh network topology and routing, so we don’t
duplicate existing work.
Expected Outcome: Complete
Current research in the field of dynamic network topology focuses on ways to manipulate
the topology of the network (or virtual network) to maximize efficiency. This turned out
not to be very relevant to our project. Our simulation assumes that the programs have no
control over the actual topology of the network, as the connections and disconnections
would likely be caused by the physical movement of the nodes. Research relating to
virtual networks (like ref. ) makes the assumption that the nodes can be connected if needed.
Research the product market for poorly connected wireless node arrays. Assess what
meaningful limitations to place on our system, and what factors are most important to the
design.
Expected Outcome: Complete
Cost is the most important factor that goes into network design. Any amount of network
robustness can be achieved simply by improving the quality of the wireless hardware on
the nodes.
Develop a testing environment and finalize benchmarks for algorithm analysis. Determine
the basic effects of testing variables and show how they affect best-case, modern, and
worst-case algorithms.
Results: in progress.
Compare algorithm to currently available solutions. Produce comparisons, analyses.
Expected Outcome: We will do better than those.
In our simulation we had a better success rate than currently available
technologies, but that is to be expected: our simulation only tests situations where
current technology fails, so the comparison is not very meaningful.
Verify implementability by simulating system using accepted network simulation tools.
Account for the limitations of our experiment by testing the effect of variables we did not
consider. (Transmission errors, malicious peers, corner cases)
We did not verify the system using accepted network simulation tools, and we did not
test the effect of transmission errors or malicious peers.
Prepare report/ presentation
This is the report.
At the time of the midcourse report, we reported that we weren’t able to approach the
Best-Case algorithm. We have since been able to close the gap by improving our
algorithm.
There were issues that are important to an actual implementation of our algorithm that we
did not cover.
Malicious peers. Because malicious peers can use unlimited bandwidth, the system is
very sensitive to extra nodes that are not playing by the rules. If a malicious host sends
too many tokens, or manipulates the meta-data on a token to improve its own success
rate, for example, by putting a higher pervasiveness tag on it or claiming it is newer than
it really is, it harms the rest of the system, because the token takes up too much of the
system’s collective buffering capacity. Also, a malicious peer could identify
itself as a server, thus causing other nodes to clear their memories.
Also, there were assumptions we made about the system that probably won’t always hold.
One of them was that the “server,” that is, the wide-area network, is always available.
Another is that the destination of the message is on the wide-area network at all. It is
possible that
the destination is another node floating around on the same network. In this case, the
success rate of the system would be even lower.
Also, in our simulation, Servers are distinguishable from other nodes. When a node
makes a connection to a server, it transfers its tokens and then deletes all copies it has of
those tokens. If it for some reason thinks it is connected to the server, but is not, it will be
deleting tokens that haven’t been transferred, and so probably never will be.
We also assumed that tokens are equally valuable. This might not be true. For example, a
message that is delivered one year after it was sent probably won’t be very useful to the
recipient. We are, however, counting that delivery as a success.
Probably the most important assumption that we need to reexamine is that transactions
are atomic events. In other words, if a transfer fails, both nodes involved will know that
the transfer failed. Since our simulator does not simulate transfer failures, we have no
idea how the system will behave when we no longer guarantee atomic transactions. In
fact, we never were able to develop a foolproof system for guaranteeing that a node
would be notified if a transfer fails.
Using our protocol in the real world.
We wanted to know the effect on performance if the protocol were used in a situation
where two-way communication is required. We modified the simulation, so that
whenever a node produces a token to send on the network, it first creates a handshake
token and sends it. When the server receives the token, the server creates an
acknowledgement token and tries to send it back to the node. If any of these three tokens
fails, we programmed the simulator to register a failed transfer.
See Appendix A20.
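The modified send path can be sketched as three one-way deliveries, any of which may fail. `produce` and `deliver` below are stand-ins for the simulator's token creation and delivery attempts, not its actual interface:

```python
def send_with_handshake(produce, deliver):
    """Handshake token, then the message, then the server's acknowledgement;
    the transaction counts as a success only if all three legs arrive."""
    if not deliver(produce("handshake")):
        return False
    if not deliver(produce("message")):
        return False
    return deliver(produce("ack"))  # minted by the server on receipt
```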
At a high Gupta, the acknowledgement had no effect on success. That is, the chances that
a token would be successful were not affected by the pattern by which the tokens were
being created. However, at around 20 Gupta and below, the success rate starts to fall. This
is because the extra tokens introduced by the additional handshaking interfere with the
tokens carrying the actual messages: the nodes capable of carrying acknowledgements
back to the originating nodes are the same nodes whose memories are filled with the
original messages. At some point, around 11 Gupta, the problem is so severe that the
system does worse than the Worst-Case algorithm.
A – Data Tables
1 – success vs. memory
2 – success vs. range
3 – success vs. bandwidth
4 – success vs. velocity
5 – success vs. number of nodes
6 – success vs. A
7 – success vs. B
8 – success vs. C
9 – success vs. D
10 – success vs. A
11 – success vs. B
12 – success vs. C
13 – success vs. D
14 – success vs. E
15 – success vs. F
16 – success vs. G
17 – success vs. H
18 – success vs. A, with range=150
19 – All algorithms. Success vs. Gupta
20 – Success with acknowledgements
B – Code
C – Midcourse report
D – References
E – Presentation