Robotics and Autonomous Systems 30 (2000) 65–84




The call of duty: Self-organised task allocation in a population of up to twelve mobile robots

Michael J.B. Krieger a,1, Jean-Bernard Billeter b,∗,2

a Institute of Ecology, University of Lausanne, 1015 Lausanne, Switzerland
b Laboratoire de Micro-informatique, Swiss Federal Institute of Technology, 1015 Lausanne, Switzerland




Abstract
Teams of up to 12 real robots were given the mission of maintaining the energy stock of their nest by collecting food-items. To achieve this mission efficiently, we implemented a simple and decentralised task allocation mechanism based on individual activation-thresholds, i.e. the energy level of the nest below which a given robot decides to go collect food-items. The experiments show that such a mechanism — already studied among social insects — results in efficient dynamical task allocation even under the noisy conditions prevailing in real experiments. Experiments with different team sizes were carried out to investigate the effect of team size on performance and on the risk of mission failure. ©2000 Elsevier Science B.V. All rights reserved.
Keywords: Collective robotics; Task allocation; Division of labour; Self-organisation




1. Introduction

   In human and other social groups with advanced labour division, life is organised around a series of concurrent activities. For a society to function efficiently, the number of individuals (team size) involved in these activities has to be continuously adjusted so as to satisfy its changing needs. The process regulating team size — and thus modulating labour division — is called task allocation. It can be evident when centralised and embodied in a special agency (like a foreman dispatching men on a working site), or it can be less visible when decentralised (as with neighbours providing unsupervised help after an earthquake). Behind the diversity of possible task allocation mechanisms lies a common structure: they all act at the individual level, prompting individuals either to continue or to change their activities (Fig. 1). The condition that triggers the change to another activity may be a simple rule of thumb or a complex decision-making procedure.
   Task or role 3 allocation has been extensively studied in social insects (e.g. [6,10,13,15,26–28,32,33]). The study of task allocation in social insects is particularly interesting since their labour division and

  ∗ Corresponding author. Tel.: +41-22-7400094; fax: +41-22-7400094.
E-mail address: jb.billeter@pingnet.ch (J.-B. Billeter).
  1 Present address: Department of Entomology and Department of Microbiology, University of Georgia, Athens, GA 30602, USA.
  2 Laboratoire de Micro-informatique, EPFL, http://diwww.epfl.ch/lami
  3 The words task, activity and role may often be used one for the other, as in “My task, activity, role, is to sweep the yard”. Still, task specifies “what has to be done”, activity “what is being done”, and role “the task assigned to a specific individual within a set of responsibilities given to a group of individuals”. Caste defines a group of individuals specialised in the same role.

0921-8890/00/$ – see front matter ©2000 Elsevier Science B.V. All rights reserved.
PII: S0921-8890(99)00065-2
its regulation are organised by surprisingly simple and robust means. Task allocation within insect colonies was long considered a rigid process: the different activities were associated with different castes, and caste polymorphism was related to genetic or internal factors [11]. At the same time, other observations indicated that individuals could change activity during their life span [27,32], suggesting that factors other than genetic ones are relevant for task allocation. These findings have led to a reformulation of the caste definition, once based purely on morphological or genetic criteria, to incorporate age or simple behavioral differences [13,27]. Thus, recent research on task allocation in social insects concentrates on behavioral flexibility and stresses the importance of external and decentralised factors like pheromones or individual encounters [5,25]. One of the most inspiring models to explain this decentralised and flexible task allocation found in social insects is the activation-threshold model.
   In the activation-threshold model, individuals react to stimuli that are intrinsically bound to the task to be accomplished. For instance, neglected brood or the corpses of dead ants diffuse an odour of increasing strength. When this stimulus reaches an individual’s threshold value, the individual reacts by adopting the relevant activity (in our example: grooming the brood or carrying the corpse out of the nest) or by increasing its likelihood to do so. This is a proximal mechanism: individuals closer to the work to be done are most likely to be recruited. Moreover, if the individuals do not have the same threshold value, the recruitment is gradual, which may allow regulation of the teams’ sizes [6,26,27]. Indeed, Bonabeau et al. [4] have shown that such a simple activation-threshold model “. . . can account for the workers’ behavioral flexibility”. 4

Fig. 1. Individual choice between two activities.

   Similarly, models in sociology have shown that simple reaction-threshold differences among individuals may lead to complex social dynamics [9,30]. The purpose of the experiments described below was to test the efficiency of the activation-threshold mechanism for task allocation in practical robotics.
   The regulation of a group of robots engaged in several concurrent activities involves regulating the team members’ activity in real time (dynamical task allocation). A variety of mechanisms may achieve task allocation; however, when working with real mobile robots, whose perceptions, communications and actions are reckoned to be limited, it is advisable to select a mechanism for its simplicity and its robustness. A good candidate for a robust and simple task allocation mechanism is the activation-threshold mechanism described above [6,26,27]. Its task-related recruiting stimuli increase when the tasks to be accomplished are neglected, acting as a feedback. In a team of N agents whose choice of activities is limited to two, neglecting the first activity (because too many individuals are engaged in the second activity) causes the stimulus for the first activity to increase, prompting individuals to change from the second to the first activity; and conversely (Fig. 2). Choosing appropriate activation-thresholds is crucial for the performance of the robot team, since individuals with the same activation-thresholds and exposed to the same stimuli switch activity at the same time, generally yielding poor regulation. Hence, one of our objectives was to show that simply implementing different activation-thresholds is sufficient for an effective task allocation mechanism in robotic experiments.

  4 The idea of an activation-threshold might seem a truism, since any change can be traced back to the fact that a variable crossed a threshold! What the model stresses is the origin and the simplicity of the activation variable, directly tied to the recruiting activity. In the case of corpse carrying witnessed among some ant species, the stimulus is a natural by-product of the corpse decomposition, namely oleic acid. We are all familiar with this type of mechanism: when there is a smell of gas, we go check the gas-range; when there is a smell of cheese, we put the cheese away into the refrigerator. The point, however, is that social insects seem to regulate all their colony activities in this way, whereas, among humans, the dynamical allocation of workforce between the different trades is a rather complex procedure including considerations of aptitudes, tradition, interest, openings, wages, etc.
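The feedback loop of Fig. 2 is easy to reproduce in simulation. The sketch below is illustrative only, not the controller used in the paper: the stimulus growth rate, the per-worker work rate, the thresholds and the integer units are all invented so that the behaviour is exactly reproducible.

```python
def simulate(n_agents=6, steps=200):
    """Minimal sketch of the activation-threshold feedback loop (Fig. 2).

    A shared stimulus for activity 1 grows while that activity is
    neglected and is reduced by every agent currently performing it.
    An agent switches from activity 2 to activity 1 when the stimulus
    exceeds its personal threshold, and switches back once it drops.
    All constants are illustrative, not taken from the paper.
    """
    # Dispersed thresholds: identical thresholds would make every agent
    # switch at the same moment, which yields poor regulation.
    thresholds = [10 + 5 * i for i in range(n_agents)]
    active = [False] * n_agents            # engaged in activity 1?
    stimulus, history = 0, []
    for _ in range(steps):
        stimulus += 3                      # demand accumulates
        stimulus -= 2 * sum(active)        # ongoing work reduces it
        stimulus = max(stimulus, 0)
        for i in range(n_agents):
            active[i] = stimulus > thresholds[i]
        history.append(sum(active))
    return history

counts = simulate()
print(max(counts), min(counts[50:]))
```

With dispersed thresholds the number of active agents settles into a narrow band (here it oscillates between one and two workers) instead of the whole team switching activity at once, which is exactly the regulation effect the text describes.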
Fig. 2. Individual choice between two activities with a fixed activation-threshold. Neglecting activity 1 causes the stimulus for activity 1 to increase, prompting individuals to change from activity 2 to activity 1; and conversely.

   From the biologist’s point of view, the purpose of these experiments is to contribute a piece of heuristic evidence that complex social systems may be organised on decentralised organisation principles. Central to sociality is labour division, with its correlates of specialisation, cooperation and task allocation. As we have seen above, social insects are thought to organise an important aspect of labour division, i.e. task allocation, in a decentralised manner where each individual’s decision is made according to a simple set of rules based on local information only. Many complex patterns and collective behaviours observed at the colony (macroscopic) level emerge as the aggregated result of the decentralised interactions at the individual (microscopic) level. This mode of organisation, termed self-organisation [19], could account for many collective phenomena found in social insects [8]. Other authors have successfully used mathematical models of self-organisation to model either specific behaviours (e.g. [35,37]) or whole insect colonies [3,20,21]. Yet it is difficult to prove unequivocally that self-organisation is the main mechanism operating in social insect societies and that all complex collective behaviours emerge from interactions among individuals with simple stereotyped behaviours.
   Our experiments are intended to bring heuristic evidence, and thus shed some light, on two questions: first, can we imagine plausible mechanisms of automatic and decentralised control for insect societies; and secondly, do these mechanisms account for, or lend themselves to, the gradual evolution from a solitary, individual system to sociality? Such a gradual evolution implies that the transition from the ancestral, solitary state to a social system is beneficial. Hence, individuals in simple aggregations have to have a better pay-off than solitary individuals, and individuals in groups with cooperative interactions have to outperform individuals in aggregations. Since not all animals live in social systems, it follows that these conditions are not always given. One element which has proven to influence social organisation is the set of environmental factors. Among them, the distribution of food and its availability was identified as one of the key features [1,7,17,29]. Therefore, we also examined the influence of different food distributions on social organisation. However, it should be stressed that our robots do not mimic any specific social insect species and, therefore, no binding conclusion can be drawn from the comparative study of our robots’ behaviour and the behaviour of social insects.


2. The robots’ mission

2.1. The mission

   The robots’ mission was to search for and collect “food-items” in a foraging area (Figs. 3 and 4) and bring them back to the “nest” (Figs. 3 and 4) in order to keep the nest-energy at a safe level. Their energy consumption was activity-related: it was low when they were inactive in the nest, increased when they moved around, and reached a maximum when they were carrying a food-item. For the robots to achieve their mission efficiently, i.e. using globally as little energy as possible, we dispersed their individual activation-thresholds to prevent the robots from leaving the nest simultaneously (for comments on the activation-threshold distribution, see Section 5.1). The activation-thresholds were equidistributed between 3/4 of and the full initial nest-energy. This simple task allocation mechanism resulted in a good modulation of the number of individuals engaged in the two possible activities: staying in the nest (inactive) and foraging (active). All robots were able to execute either of the two tasks and, depending on the situation, were able to switch from one to the other. They had no prior knowledge about the number and the distribution pattern of the food-items, or about the robot team’s size.

Fig. 3. The experimental set-up. Experiments were carried out on a 9.24 m2 surface with a nest and a foraging area. The entrance of the nest was signalled by a light beacon. Next to the entrance, the robots unloaded the collected food-items into a basket. Depending on the experiment, the food-items were either grouped in two patches of 12 food-items each (clumped food-items) or placed singly at six locations (isolated food-items).

2.2. Basic mission cycle

   The basic mission cycle (Fig. 5) can be described with the following pseudo-procedural instructions:
1. Wait in nest
   • If a robot ahead leaves the waiting line, compact the waiting line by moving forward.
   • Keep listening to the control-station’s radio messages that periodically update the nest-energy level.
   • Radio to the control-station the energy you have been consuming while waiting in the line. At the same time, refill your personal energy store.
   • Leave the nest when the radioed nest-energy level is lower than your personal activation-threshold.
2. Leave nest
   • Leave the waiting line and move to the exit-lane.
   • If there are other robots, slow down and give way.
   • Follow the exit-lane; at its end, turn left and leave the nest.
   (Once the robot has left the nest, it is no longer updated about the current nest-energy level, nor can it inform the control-station about its energy consumption. During this time the robot depends entirely on its personal energy store. At the same time, the robot keeps track of its energy consumption.)
3. Look for food-items
   • Start a random search for food-items. Or, if you already know where to find one, try to return to the spot using odometry (a helpful but not very robust localisation method, since any free spinning of the wheels will cause the odometry




Fig. 4. An experiment under way (for schematic representation see Figs. 3 and 5).




                             Fig. 5. Mission cycle.
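The cycle shown in Fig. 5 and spelled out step by step in Section 2.2 can be summarised as a small finite-state machine. The state names below follow the seven numbered steps; the condition labels (energy_below_threshold, food_detected, etc.) are invented stand-ins for the robots' actual sensor and radio tests, and the recruitment additions of Section 2.3 are left out.

```python
# Hypothetical encoding of the basic mission cycle as a finite-state
# machine.  Each entry maps a state to (condition, successor) pairs;
# condition names are illustrative placeholders, not the paper's code.
TRANSITIONS = {
    "WAIT_IN_NEST":   [("energy_below_threshold", "LEAVE_NEST")],
    "LEAVE_NEST":     [("outside_nest", "SEARCH_FOOD")],
    "SEARCH_FOOD":    [("store_depleted", "RETURN_TO_NEST"),  # skips steps 4-5
                       ("food_detected", "LOAD_FOOD")],
    "LOAD_FOOD":      [("food_gripped", "EVALUATE_SITE")],
    "EVALUATE_SITE":  [("done", "RETURN_TO_NEST")],  # may record site (odometry)
    "RETURN_TO_NEST": [("at_entrance", "UNLOAD_FOOD")],
    "UNLOAD_FOOD":    [("energy_above_threshold", "WAIT_IN_NEST"),
                       ("energy_below_threshold", "LEAVE_NEST")],
}

def next_state(state, condition):
    """Return the successor state, or stay put if no rule matches."""
    for cond, target in TRANSITIONS[state]:
        if cond == condition:
            return target
    return state

# Walk one full foraging cycle, back to waiting in the nest.
path, state = ["WAIT_IN_NEST"], "WAIT_IN_NEST"
for cond in ["energy_below_threshold", "outside_nest", "food_detected",
             "food_gripped", "done", "at_entrance", "energy_above_threshold"]:
    state = next_state(state, cond)
    path.append(state)
print(path[-1])
```

Note how the cycle closes on itself at step 7: depending on the nest-energy relative to the robot's threshold, unloading leads either back to waiting or straight out of the nest again.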
      to be inaccurate). If you have reached the spot where you previously detected a food-item but do not detect one, start a random search.
   • If you have used up your personal energy store, return to the nest (ignoring steps 4 and 5).
   • When you detect a food-item, load it.
4. Load food-item
   • Turn toward the food-item, back up a little, open the gripper, lower the arm, and move forward until the gripper’s optical barrier is broken.
   • Close the gripper, raise the arm.
5. Evaluate site
   • Check the vicinity for the presence of more food-items.
   • If you detect one, turn on your odometry.
6. Return to nest
   • Head toward the nest using the light beacon at its entrance.
   • When reaching the nest’s entrance, radio your energy consumption and recharge the personal energy store.
   (Recharging is a data exchange with no visible behaviour associated.)
7. Unload food-item
   • Go to the basket and unload your food-item.
   • If the nest-energy level is higher than your activation-threshold, stay in the waiting line; otherwise exit the nest. If you stay in the nest, erase any information about locations of food-items.
Go back to 1.

2.3. Mission cycle with information exchange

   In a second series of experiments, we introduced a simple coordination among the robots. Robots that detected a food source were able to recruit and guide other robots to this food source (robots can record the position of food-item patches with their odometry). This was inspired by the tandem recruiting behaviour observed in the ant Camponotus sericeus [12]. Two reasons led us to choose such a behaviour-based recruitment: first, it represents a type of communication without symbolisation, which probably established itself very early during social evolution; and secondly, such a mechanism scales up without problems (i.e. can be used with larger groups), whereas radio transmissions relayed by a central base create communication bottlenecks.
   Recruitment was executed just before the robot was about to return to the food patch where it had previously found food-items. The recruiting itself was rather simple: the recruiting robot approached the robot at the head of the waiting line, which was the signal for the waiting robot to follow (Fig. 6). To incorporate information exchange among the robots, the following additions to the mission cycle were made:
• To the robots waiting in the nest:
  – If you are heading the waiting line, you are a potential follower: be ready to be asked to tandem.
  – Once you have been asked to tandem, follow the leading robot until you lose it. When you have lost it, start a local search for the food-items.
• To the recruiting robots:
  – If you know where to find more food-items, try recruiting the robot heading the waiting line by approaching it to a close distance (about 5 mm).
  – Go slowly toward the food source. When you have reached it, decouple the tandem by making a fast move forward, wait a while, and go back to collecting food-items. (By making a fast move forward, the leading robot gets out of the sensory field of the follower, which serves as a signal indicating arrival at the food patch.)


3. Experimental setup

3.1. Data acquisition

   Radio communication was strictly used for controlling the experiment and for sending data from the robots to the control station for later analysis. The operator could initialise the robots from his computer, as well as start, suspend, resume or stop an experiment. The messages were not broadcast simultaneously to all robots: every robot was individually addressed, and a message was considered received only when echoed properly to the radio base and displayed on the control station screen. Every 10 to 20 seconds, a data request was initiated from the control station to the robots. The request signal was radioed together with the nest-energy, the only information used by the robots to allocate tasks. The robots were programmed to consider and update this external variable while inside the nest, but to ignore it when outside, reflecting the fact that they
Fig. 6. Tandem recruitment. (a) Recruiting robot (r) backs up toward the waiting robot (w) at the head of the waiting line. When w’s proximity sensors detect r, w goes into follower mode. (b) Recruited robot (w) follows recruiting robot (r) to the food patch.

do not know the evolution of the nest-energy while working outside the nest. In return, the robots updated the control station on their current activity and energy consumption for the statistical analysis.

3.2. Experimental procedure

   The experiments were carried out on a 9.24 m2 surface tiled with 64 printed boards (0.38 m × 0.38 m) (Figs. 3 and 4) with copper strips alternately connected to the two poles of a direct-current power supply. The floor was bordered by a wall. Along the wall, a black line was painted on the floor to facilitate the distinction between the wall and the food-items (for a detailed description see Section 3.5). The nest, whose entrance was signalled by a lamp, was also enclosed by black-footed walls. Inside the nest and in its close proximity, black lines were used as tracks for easy navigation (Fig. 4). Next to the entrance, the robots unloaded the collected food-items into a basket (Figs. 3 and 4). The food-items were small plastic cylinders (3 cm diameter × 3 cm height) with narrow strips of infrared-reflecting tape. Depending on the experiment, they were either grouped in two patches of 12 food-items each (clumped food distribution) or placed singly at six locations (dispersed food distribution). All food-items were replaced soon after they were seized by the robots.
   We conducted three types of experiments (Table 1): food search in an environment with a dispersed food distribution, food search in an environment with a clumped food distribution, both without recruitment, and finally food search in an environment with a clumped food distribution with recruitment. Each type of experiment was carried out with teams of one, three, six, nine and twelve robots, with the exception of the experiments with recruitment, where the experiments with a group size of one were omitted since recruitment involves at least two robots. All experiments were repeated eight times (112 experiments in total) and had a duration of 30 minutes. Before the start of an experiment, all robots were positioned inside the nest. The only deliberate difference among them was their activation-threshold, whose values were equidistributed between 3/4 of and the full initial nest-energy level. In order to allow comparisons between different team sizes, the initial nest-energy was proportional to the number of robots participating in the experiment. A typical experiment with six robots is shown in Fig. 7.

Table 1
Each type of experiment was repeated eight times (the experiment with group size one was dropped in Series C for an obvious reason: a single robot cannot recruit another robot)

           Recruitment   Food distribution   Group size
Series A   No            Dispersed           1, 3, 6, 9, 12
Series B   No            Clumped             1, 3, 6, 9, 12
Series C   Yes           Clumped             3, 6, 9, 12

3.3. Statistical data analysis

   Two series of analyses were carried out. First, the effect of the two different food distributions (clumped versus dispersed) was investigated using the two types of experiments without recruitment; secondly, the effect of recruitment (recruitment versus non-recruitment) was investigated using the two types of experiments carried out in the environment with a
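The experimental design of Table 1 and the threshold assignment of Section 3.2 can be enumerated to check the totals. The energy constant and the exact threshold spacing below are assumptions: the paper states only that the initial nest-energy was proportional to team size and that thresholds were equidistributed between 3/4 of and the full initial nest-energy.

```python
# Table 1 as data: series name -> recruitment flag, food distribution,
# and the team sizes tested in that series.
SERIES = {
    "A": {"recruitment": False, "food": "dispersed", "sizes": (1, 3, 6, 9, 12)},
    "B": {"recruitment": False, "food": "clumped",   "sizes": (1, 3, 6, 9, 12)},
    "C": {"recruitment": True,  "food": "clumped",   "sizes": (3, 6, 9, 12)},
}
REPETITIONS = 8
ENERGY_PER_ROBOT = 100   # illustrative constant; only the proportionality
                         # to team size is stated in the paper

def activation_thresholds(n_robots):
    """Evenly spaced thresholds between 3/4 of and the full initial
    nest-energy.  Whether both endpoints were used is not specified in
    the paper; including them is an assumption of this sketch."""
    initial = ENERGY_PER_ROBOT * n_robots
    lo, hi = 0.75 * initial, float(initial)
    step = (hi - lo) / max(n_robots - 1, 1)
    return [lo + i * step for i in range(n_robots)]

configurations = [(name, size)
                  for name, d in SERIES.items() for size in d["sizes"]]
print(len(configurations) * REPETITIONS)   # 112 experiments in total
print(activation_thresholds(6))
```

The 14 configurations times 8 repetitions reproduce the paper's total of 112 experiments; with dispersed thresholds, the robot holding the highest threshold is the first to leave as the nest-energy drops below its full initial value.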
Fig. 7. A typical experiment with six robots. (a) Nest-energy; (b) number of active and inactive robots. At the beginning (a), none of the robots is active. As nest-energy decreases (b), the robots progressively leave the nest. At time 800 s (c), all six robots are active. Their harvest is good, which results in a nest-energy higher than the initial nest-energy (d). As a consequence the robots switch to the inactive mode and stay in nest, as all of them have done by time 1200 s (e). But as nest-energy decreases again (f), the robots resume their food search. At the end of the experiment all but one robot are inactive in the nest (g).

clumped food distribution. Statistical tests were conducted with ANOVAs [34]. Performance was calculated as the inverse of the total energy used during the course of the experiment. Robustness was measured as the lowest nest-energy recorded during the experiment. Values closer to zero indicate a higher risk of system collapse, since the system was considered to have crashed when the nest-energy dropped below zero. To assess the relative amount of work done by each team member, a skew measure [22] was used. This measure allows one to quantify the skew in individual contributions to a global task and ranges from zero to one. A skew of one indicates that one team member was doing all the work, whereas a skew of zero means that all team members contributed equally to the global task. The skew can be calculated with the following formula:

   S = (N − 1/Σ p_i^2) / (N − 1),
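The three measures of Section 3.3 translate directly into code. This is a plain transcription of the definitions just given, with p_i the relative contributions of the team members (summing to one); the example contribution vectors are invented.

```python
def performance(total_energy_used):
    # Performance: inverse of the total energy used during the run.
    return 1.0 / total_energy_used

def robustness(nest_energy_trace):
    # Robustness: the lowest nest-energy recorded; values near zero
    # mean the team came close to collapse (crash at energy < 0).
    return min(nest_energy_trace)

def skew(contributions):
    """Skew measure [22]: S = (N - 1/sum(p_i^2)) / (N - 1),
    with p_i the relative contribution of team member i.
    0 = all members contributed equally, 1 = one member did it all."""
    n = len(contributions)
    total = sum(contributions)
    p = [c / total for c in contributions]
    return (n - 1.0 / sum(x * x for x in p)) / (n - 1)

print(skew([1, 1, 1, 1]))   # equal contributions
print(skew([1, 0, 0, 0]))   # one robot does all the work
```

The two example calls return the extremes of the measure, 0.0 and 1.0 respectively.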
                            M.J.B. Krieger, J.-B. Billeter / Robotics and Autonomous Systems 30 (2000) 65–84              73

where N is the number of individuals in the group                      optical barrier (light modulation photo IC). The am-
and pi the relative contribution of team member i.                     bient light detector was used by the robots to navi-
3.4. The robots

   We used Khepera robots 5 with three additional modules: a gripper turret, a custom-made detection module, and a radio turret (Fig. 8). The Khepera's basic module is a miniature mobile robot designed as a research tool at the Laboratoire de Micro-informatique of the Swiss Federal Institute of Technology at Lausanne [18], and now produced by K-Team SA. 6 For our experiments, we redesigned the Khepera's basic module, adding four floor contacts plus a regulator for continuous power supply (allowing long experiments with additional turrets) [16], three floor-oriented IR sensors for floor readings and a castor wheel (Fig. 9). The main specifications of the Khepera's basic module are:

Diameter   55 mm
Motion     2 DC motors with incremental encoder (about 10 pulses per mm of advance)
Power      Rechargeable NiCd batteries or external
Autonomy   About 45 minutes (maximal activity, no additional module)
Sensors    Eight infrared proximity and light sensors
Processor  Motorola 68331
RAM        256 Kbytes
ROM        128 or 512 Kbytes

3.5. Additional modules

   The gripper turret is a standard Khepera module with an arm moving from the front to the back, plus the gripper, whose jaw can seize objects up to about 55 mm. An optical barrier detects the presence of objects between the jaws, and conductive surfaces (not used in our experiments) measure the electrical conductivity of the object seized.
   The detection module was custom-made, featuring two detection units: an ambient light detector and an optical barrier (light modulation photo IC). The ambient light detector was used by the robots to navigate back to the nest, which was signalled by a light beacon. The optical barrier was used to distinguish the three basic objects present in our experiments: walls, food-items and other robots. Distinction between the three was achieved in the following way: when the standard IR sensors detected the presence of an object, the robot made two additional readings: floor reflection and optical barrier status. If the floor reflection was low, signalling black paint, the robot was near a wall. An unbroken optical barrier indicated that another robot — the only object high enough to reflect its beam — was facing the robot. If the floor reflection was high (absence of black paint) and the optical barrier was broken (absence of other robots), the robot was facing a food-item. The programs used to detect the three basic objects ran continuously and in parallel; occasional misinterpretations were usually resolved by the frequent double-check procedures included in the program.
   The radio module is a standard Khepera turret using a low power 418 MHz transceiver [16]. A communication protocol allows complete control of the robot's functionality through an RS232 serial line.

5 Robot Khepera: http://lamiwww.epfl.ch/Khepera/#khepera.
6 K-team: http://diwww.epfl.ch/lami/robots/K-family/K-Team.html.

Fig. 8. (a) A Khepera robot with three additional modules: a gripper turret (bottom), a custom-made detection module (middle), and a radio turret (top). (b) A Khepera robot loads a "food-item".

Fig. 9. Modified base. Contacts for continuous power supply (a), floor-oriented IR sensors (b) and castor wheel (c).

3.6. Control architecture

   The goal of the robots' control architecture (Fig. 10) was to achieve the desired functionality by the proper connection of three units: the Sensory, Motor and Processing units. It can be considered a black box and does not claim any biological relevance to social insects.
   The Sensory unit took measurements in the environment and sent associated signals to the Processing unit. Apart from the sensors described in Sections 3.4 and 3.5, the robot also received sensory feedback from its own motor actions through incremental encoders for the wheel motors, and potentiometers for the arm and gripper. The Motor unit executed the commands sent by the Processing unit. Four different motors were addressed: left wheel, right wheel, gripper's arm and the gripper itself. Motor actions were associated with an energetic cost — purely mathematical — which was subtracted from the robot's energy
level (physiology). The Processing unit consisted of two parts: the Decision unit, for activating and deactivating small programs, and the programs themselves (Fig. 10). The Decision unit received inputs from three sources:
• the physiological variable set;
• the conceptual tags (= classified sensor inputs);
• the drives.
   The only physiological variable considered in our experiments was the robot's energy level, and the only drive used was hunger.
   At any time, depending on the inputs, the Decision unit decided to either activate or deactivate a program. For instance, when a robot had left the nest, the program "Follow nest tracks" was deactivated and "Search for food" was activated; when a robot reached the nest, "Return to nest" was deactivated and "Follow nest tracks" was activated. The programs themselves had one or several inputs and generated a corresponding output (for details, see below). Once a program was activated, it ran concurrently with all other activated programs until deactivated (this is facilitated by the multi-tasking capabilities of the Kheperas). Some programs had a higher priority and under certain conditions could override other running programs. The programs were categorised into four distinct classes depending on which unit they received their inputs from and which unit they sent their outputs to.
1. Input: Sensory unit, output: Processing unit ("Perception")
   These programs classified the inputs they received from the Sensory unit into conceptual tags like "obstacle", "dark floor" or "object in the gripper". Each conceptual tag was considered a user pre-defined concept.
2. Input: Processing unit, output: Processing unit ("Thoughts")
   These programs received one or several conceptual tags as an input and generated another conceptual tag as an output. Example: when both the propositions "there is an obstacle" and "the floor is dark" are confirmed, the Processing unit generates the concept "wall".
3. Input: Processing unit, output: Motor unit ("Intentional Actions")
   These programs used conceptual tags from the Processing unit to engage specific motor actions. For example, after having positively checked the proposition "there is a food-item in the gripper", the robot closed its gripper and lifted its arm to the transport position.
4. Input: Sensory unit, output: Motor unit ("Reflexes")
   Reflexes were programs which had to be executed fast. They received inputs directly from the Sensory unit and generated immediate motor action. No concepts were generated. Example: the obstacle avoidance routine.

4. Results

   Our experiments showed that an artificial, complex system can be regulated using a simple activation-threshold as the only control parameter. Nest-energy, the variable to be regulated, was stable and, in the experiments with the three-, six- and nine-robot teams, stayed well above the lowest individual activation-threshold (Fig. 11). Only in the experiments with the teams of twelve robots was a steady decrease in nest-energy observed (Fig. 11). Moreover, the activation-threshold mechanism allows shifting from a single-individual to a multi-individual system without any additional changes. This mechanism has proven well adapted to task allocation in robot teams.
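Our reading of this mechanism can be condensed into a few lines. The sketch below is an illustration, not the authors' controller code; the threshold limits and team size are invented, and the thresholds are drawn uniformly at random between fixed limits (one of the distributions discussed in Section 5.1):

```python
import random

LOWER, UPPER = 1000.0, 3000.0  # invented threshold limits, not the paper's values

class Robot:
    def __init__(self, threshold):
        self.threshold = threshold  # personal activation-threshold
        self.active = False

    def decide(self, nest_energy):
        # A robot becomes active (leaves to collect food-items) when the
        # nest-energy falls below its personal threshold, and returns to
        # the inactive state once the nest-energy is back above it.
        self.active = nest_energy < self.threshold

team = [Robot(random.uniform(LOWER, UPPER)) for _ in range(12)]

def active_count(team, nest_energy):
    for robot in team:
        robot.decide(nest_energy)
    return sum(r.active for r in team)

# The lower the nest-energy, the more robots are allocated to foraging.
assert active_count(team, UPPER + 1) == 0
assert active_count(team, LOWER - 1) == 12
```

Because each robot only compares the shared nest-energy against its own threshold, the number of active robots adjusts to the work load without any central coordinator.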

Fig. 10. Control architecture of the robots. The control architecture is made up of three main parts: the Sensory unit, the Processing unit (consisting of a Decision unit for activating and deactivating programs, and the programs themselves), and the Motor unit. The Sensory unit translates the outside world into internal variables, the Processing unit uses these variables to create more complex variables or to instruct the Motor unit to perform an action, and the Motor unit translates internal commands into a physical action. Depending on the inputs (conceptual tags, drives, physiology), the Decision unit decided to either activate or deactivate programs. The programs were categorised into four distinct classes depending on which unit they received their inputs from and which unit they sent their outputs to (for more details see Section 3.6). SIS = sensory input signal, CT = conceptual tag, MAC = motor action command.

4.1. Effect of group size and food distribution on performance

   The effect of group size and food distribution was tested using a 2-way ANOVA. The size of the robot team had a significant effect on its performance (2-way ANOVA, group size: df = 4,1, F = 4.72, P = 0.002). The relative performance, where the best robot team has a performance of 100%, was 88.9%, 99.5%, 100%, 91.5% and 91.6% for teams of one, three, six, nine and twelve robots, respectively. In both environments the one-robot teams had the lowest performance, whereas the three- and six-robot teams had the highest performance. Performance among the groups of one, nine and twelve, as well as between the groups of three and six, was, however, not significantly different (Fisher's PLSD, 5%). When the food-items were grouped in patches the robots had a significantly better performance (2-way ANOVA, food distribution: df = 4,1, F = 12.87, P < 0.001). The relative decrease in performance in the environment with dispersed food-items was 7.7%. Yet the difference in performance among the various group sizes was different for the two food distributions. The relative increase in performance for robot teams of three and six was more pronounced in the environment with a dispersed food-item distribution (Fig. 12).
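The relative figures above follow mechanically from the raw energy totals. A small sketch of that normalisation (the energy numbers below are invented, not the paper's data, and per-robot normalisation is our reading of the caption of Fig. 12):

```python
def relative_performance(total_energy, team_sizes):
    # Performance = inverse of the total energy used, normalised for the
    # number of robots, then expressed relative to the best team (= 100%).
    perf = [n / e for e, n in zip(total_energy, team_sizes)]
    best = max(perf)
    return [100.0 * p / best for p in perf]

sizes = [1, 3, 6, 9, 12]
energies = [4000.0, 11000.0, 21000.0, 34000.0, 47000.0]  # invented totals
scores = relative_performance(energies, sizes)
assert max(scores) == 100.0 and len(scores) == len(sizes)
```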




Fig. 12. Performance relative to the one-robot teams in environments with a clumped and a dispersed food distribution, with no information
sharing. Performance was measured as the inverse of the total energy used and then normalised for the number of robots.

4.2. Effect of information sharing and group size on performance

   When the location of the food patches was transmitted by recruiting and guiding a team member to the patch, the performance of the group increased significantly (2-way ANOVA, information sharing: df = 3,1, F = 39.93, P < 0.001). The increase in performance was 13%. Again, group size had a significant effect on performance (2-way ANOVA, group size: df = 3,1, F = 6.87, P < 0.001), with the best performance in the three- and six-robot teams.

4.3. Interferences in an environment with clumped food-item distribution

   An interference among robots was defined as any event in which a robot tried to perform a task but was hindered by another robot. The proportion of time spent in such interferences increased significantly with increasing group size (2-way ANOVA, group size: df = 3,1, F = 6.33, P = 0.003). The mean proportion of time spent in an interference was 0.4%, 0.6%, 1.0% and 2.3% for teams of three, six, nine and twelve robots, respectively. However, a post hoc analysis revealed (Fisher's PLSD, 5%) that only the twelve-robot team spent significantly more time in interferences.

4.4. Effect of group size and food distribution on robustness

   The size of the robot team had a significant effect on the minimal nest-energy recorded (2-way ANOVA, group size: df = 4,1, F = 5.41, P < 0.001). Groups with a larger number of robots had a higher minimal nest-energy (Fig. 13), except for the twelve-robot team, which had a higher minimal nest-energy than the one-robot team but a lower one than all other group sizes. The effect of the food distribution on the minimal nest-energy recorded was not significant (2-way ANOVA, food distribution: df = 4,1, F = 3.18, P = 0.079).




Fig. 13. Minimal nest-energy recorded during the experiments with no information sharing for teams of different group sizes. The
minimal nest-energy was normalised for the number of robots. Error bars indicate the 95% confidence interval.


4.5. Effect of information sharing and group size on robustness

   Information sharing had a positive effect on the robustness of the robot teams. The minimal nest-energy recorded during the experiment was higher in teams with information sharing, with an average of 2147 and 1870 in the experiments with and without information sharing, respectively. This difference was highly significant (2-way ANOVA, information sharing: df = 3,1, F = 11.77, P = 0.001). Again, groups with a larger number of robots had higher minimal nest-energies, with the exception of the twelve-robot teams (2-way ANOVA, group size: df = 3,1, F = 3.10, P = 0.034).

4.6. Work skew among the robots

   Due to their different activation-thresholds, the robots spent unequal amounts of time in the active foraging state. This skew in activity among the team members was expressed using the skew index. Larger groups had a better share-out of their work load, indicated by mean work skews of 0.370, 0.345, 0.197 and 0.172 for the three-, six-, nine- and twelve-robot teams, respectively. This difference in work skew among the teams of different group sizes was significant (2-way ANOVA, group size: df = 3,1, F = 13.03, P < 0.001). In contrast, information sharing had no significant effect on the work skew (2-way ANOVA, information sharing: df = 3,1, F = 0.421, P = 0.519).

4.7. Time active

   Information sharing significantly decreased the total time the robots spent in the active (working) state (2-way ANOVA, information sharing: df = 3,1, F = 6.86, P = 0.009). Robots with information sharing spent on average 48% of their time in an active state, whereas the robots without information sharing spent 56%. However, this was not true for all team members (Fig. 14). When the robots were categorised according to their activation-threshold into three threshold classes (high, middle and low), the robots with the lowest thresholds spent significantly more time in the active state than the robots of the same threshold class with no information sharing (ANOVA, df = 1, F = 4.76, P = 0.031). This was due to recruitment by other team members when inactive in the nest.
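The recruitment effect described above can be made concrete with a toy sketch; the data structures and names below are ours, for illustration, and not the authors' implementation:

```python
def recruit(nest_robots, patch_location):
    """A robot returning from a food patch recruits one robot resting in
    the nest and guides it to the patch, so the recruit skips the search
    phase. Robots with low activation-thresholds rest in the nest most
    often and are therefore the most likely to be recruited (cf. Section 4.7)."""
    if not nest_robots:
        return None  # nobody inactive in the nest, no recruitment possible
    recruited = nest_robots.pop()
    recruited["active"] = True
    recruited["target"] = patch_location  # guided straight to the patch
    return recruited

nest = [{"id": i, "active": False, "target": None} for i in range(3)]
helper = recruit(nest, patch_location=(1.2, 0.8))
assert helper["active"] and helper["target"] == (1.2, 0.8)
assert len(nest) == 2  # one robot left the nest by recruitment
```

Because recruitment activates a robot before the nest-energy crosses its personal threshold, it shortens the delay between a falling nest-energy and the first food retrieval.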




Fig. 14. Proportion of time spent in an active (working) state depending on the threshold class and information sharing. The robots were
categorised according to their activation-threshold in one of the three classes (high, middle, low). Error bars indicate the 95% confidence
interval.

5. Discussion

5.1. Threshold control

   During our experiments the nest-energy never decreased below zero. This held under a variety of different experimental settings, such as various group sizes, different food distributions and the presence or absence of information sharing. This result suggests that complex systems such as our artificial ant colony can be regulated with a single control parameter in a decentralised way; it also indicates that this mode of task allocation can be favourably used in practical robotics.
   Apart from its efficiency and simplicity, the activation-threshold mechanism presents a second advantage: it allows control of single robots as well as groups of robots. Simply choosing a different activation-threshold for each robot results in an automatic adjustment of the number of active robots to the current work load. This has positive implications for future robot applications. Imagine a system where several robots could work collectively on a given task. For financial or other reasons, only one robot is engaged at the beginning. Later, when more robots are added to the system, no special arrangements besides choosing different activation-thresholds have to be made to move from a single- to a multi-robot system. Likewise, when the number of robots is reduced, either by individual failures or by allocation of some robots to other work areas, no changes have to be made in the control program of the robots. This might even represent a more substantial advantage, since robot teams could be left unsupervised, individual breakdowns having little effect on such a coordination scheme.
   As already mentioned, the individual activation-thresholds were equidistributed between a lower and an upper value. Using other distributions between these two limits might modify the regulatory characteristics of the mechanism. Although we did not pursue the experiments in this direction, we speculate that any reasonable distribution of activation-thresholds would yield equivalent results. There is no reason to consider equidistribution best; we chose it for its simplicity. Also, on account of the noisy context of practical robotics, the exact values of the activation-thresholds are of minor interest. Among other possible activation-threshold distributions, the random distribution is of special interest, since there is no need for an operator to preset the thresholds. In our experiments, we had to preset a specific activation-threshold for each robot at the beginning of the experiments. In contrast, a random distribution of activation-thresholds would allow fixing the same lower and upper limits for all robots, from which each robot individually draws its own activation-threshold.

5.2. Performance

   The size of the robot team had a significant effect on its performance, with teams of three and six performing better than single robots or larger groups. This finding is caused by a trade-off between the positive and the negative effects of robot–robot interactions. Robots were programmed to avoid each other, which results in better coverage of the foraging arena when several robots are present, because each robot will avoid an area where another robot is already foraging. This effect caused single robots to perform less efficiently than small groups of robots. However, when the number of robots increases, robot–robot encounters also have negative consequences. First, robots more often avoided areas with food-items, because on average more robots were already present at locations with food-items. This was especially important in the environment where the food-items were grouped in patches. Secondly, robots more often seized the same food-item simultaneously, causing one of the robots to leave empty-handed. Finally, larger groups spent more time in interferences at the entrance of or inside the nest (see also Section 4.3). Thus, single robots could not benefit from the positive effect of robot–robot interactions found in small groups, whereas in large groups the negative effects of robot–robot interactions outweighed the positive ones.
   A second result was that the optimal group size was different for each of the two food distributions. This suggests that under some environmental conditions the benefit of living in a group of a particular size is large, whereas under other conditions the benefit is modest. Observed animal group sizes, which are assumed to have evolved and been optimised, can be correlated to ecological factors (e.g. [7,14,17]). Furthermore, the relative increase or decrease in performance from one group size to another was different in the two environments (Fig. 12). For example, the decrease in performance from a six- to a nine-robot team was large with the dispersed food distribution but relatively small with the clumped food distribution. This indicates that under some ecological conditions a change in group size is very likely to happen, whereas under other conditions this shift is unlikely to occur.

5.3. Robustness

   The risk of mission failure was measured as the minimal nest-energy recorded during the experiments, since a nest-energy below zero would have ended the experiment. Our results show that the larger the size of the team, the lower the risk of mission failure. This result holds for all group sizes with the exception of the twelve-robot team, which had a lower risk than the one-robot team but a higher risk than the three-, six- and nine-robot teams (Fig. 13). The risk of mission failure, however, should not be confounded with the performance of the system. Performance was measured as the inverse of the total energy used and hence measures the team's efficiency in accomplishing the task (keeping the nest-energy at a safe level). In contrast, the risk of mission failure indicates how close the system came to a nest-energy of zero at any time during the experiment. The apparently contradictory result of the three- and six-robot teams having the best performance but not the lowest risk of mission failure stems from the fact that smaller teams' nest-energies fluctuate more widely.
   In the experiments with groups of three, six and nine robots, the nest-energy usually stayed above the lowest activation-threshold (Fig. 11). As a result, at least one of the robots was still inactive in the nest and hence the system did not reach its full capacity. A different picture emerged in the experiments with twelve robots. Even if the final nest-energy never fell under half the initial nest-energy (Fig. 11(a)), the energy curve showed in all three experimental set-ups a tendency to decrease steadily. This suggests that in longer experiments the nest-energy would have reached zero and hence the system would have crashed. From our observations, it was the interferences between the robots that made the system less effective. Indeed, a detailed analysis of the robots' behaviour in the patchy environment (with and without information sharing) revealed that the time spent in interferences increased with increasing group size. This shows that a given experimental set-up (size of the environment, number of food-items) supports only a limited number of robots, around 12 with the
mission we chose for our experiments. This is similar to a phenomenon found in ecology termed carrying capacity, which is defined as the maximum population size that can be supported by a given environment [2].

5.4. Information sharing

   Sharing the information on the location of the food patches by recruitment had a positive effect on performance, resting time and robustness. The increased performance was due to the fact that not all robots had to search for their own food patch, which resulted on average in more efficient food retrievals. Thus, the robot teams needed less time to maintain the nest-energy. However, this was not the case for all team members. The robots with a low activation-threshold had to work significantly more under this scheme, since they were more likely to be recruited for work when inactive in the nest. This phenomenon is useful, since it allows teams organised with graduated activation-thresholds to distribute the work load more evenly among their members.
   The increased robustness originated from the early recruitment of inactive robots. In the experiments without recruitment, the robots left the nest only when the nest-energy was below their personal activation-threshold. This mechanism caused a large delay between the signal (low nest-energy) and the response (retrieving a food-item), which in turn resulted in greater fluctuation of the nest-energy. This delay was much smaller in the experiments with recruitment, since some robots became active (by recruitment) at nest-energies at which they normally would have stayed inactive in the nest.

5.6. Why not settle for a simulation?

   Demonstrating by simulation that a fixed activation-threshold mechanism can properly regulate teams of simulated robots is not enough for practical robotics. Computer simulation is a powerful and essential tool for robotics. Still, there is no warranty that an algorithm tested in simulation will work in practice with robots. First, real robot–robot and robot–environment interactions are more complex and unpredictable than their simulation. Secondly, a real environment, however simple, is subject to its own dynamics and puts the robots up against a set of constraints usually impossible to list and model. Sometimes, especially when the robot's activities are restricted to moving on a smooth surface, simulation yields excellent results. But as soon as the interactions between the robots and their environment increase, for instance when the robots seize objects or when the lighting conditions are not constant, practical experiments rapidly diverge from their simulations. Even if not mentioned in technical articles, nor seen on videos, autonomous robotics experiments are frequently marked by technical problems. Only mechanisms proven effective in such difficult environments should be considered. The ultimate test, in experimental robotics, is the practical test. In addition, experimentation is rich in unanticipated interactions which will not occur in simulations, most of them negative, some of them positive. The latter can prove useful enough to be integrated into the mission's strategy. 7 However, it should be mentioned that exploitable unexpected phenomena remain rare. Emergences — their noblest variety — do not pop up on request.

5.7. Related work

   In her seminal work, Parker [23] studied dynamical
5.5. Scaling up                                                   task allocation in a group of robots whose mission was
                                                                  to collect pucks in an arena. To choose between the
   As mentioned above (Section 2.3) information
exchange by tandem recruitment scales up easily                    7 A classical example is the robot programmed to aim at a light
with the number of robots. The same holds for the                 source, and devoid of an obstacle avoidance mechanism. Versino
fixed-threshold dynamical task allocation, especially              and Gambardella [36] tell how, after having inadvertently left
                                                                  an object between the robot and the light source, they saw the
if the activation-thresholds are drawn randomly be-
                                                                  robot turn around it without bumping into it! Explanation: entering
tween two preset limits. Whether the number of activ-             the shadow cone, the robot automatically turned toward a more
ities scales up as easily as the tandem recruitment and           luminous zone to the right or to the left. This emergence would
the activation-thresholds remains to be investigated.             not have taken place in a shadow-less simulation.
various tasks involved in their mission, the robots consulted their individual “motivation”, which integrated five factors: sensory feedback, inter-robot communication, inhibitory feedback from other active behaviours, robot impatience and robot acquiescence. This general-purpose architecture, called ALLIANCE, lends itself to many variations, since every motivation factor may be further developed, including learning [24]. In our research, we took a more restrictive approach, using the simplest possible mechanism to achieve task allocation. The mechanism we propose functions with a single parameter, the activation-threshold. The only broadcasted information used by our robots was the periodical update of the nest's energy.
   The question of labour division in collective robotics 8 was also studied by Schneider-Fontan and Mataric [31], who engaged two, three and four robots in an arena divided into as many equally-sized contiguous regions. Each robot is territorially attached to region i, where it collects and carries the pucks to region i−1, where robot i−1 will then transport them to region i−2, and so on. Eventually, the pucks reach region 1, where robot 1 will carry them to their final destination, a spot in region 1. The goal of this mission was to clean all regions of their pucks and move them all to their final destination (a spot in region 1). The authors compare the efficiency of accomplishing this task in relation to the different group sizes used in their experiments. Their best results were achieved with three robots, which shows “that increased group sizes can, in embodied agents, negatively impact the effectiveness of the territorial solution”. We witnessed a similar phenomenon in our experiments: the best results were achieved with intermediate team sizes. Territoriality, which “produces a physical division of space and all associated resources”, seems to be an effective way to reduce interferences between robots [31].

 8 A 15-minute video entitled “Task allocation in collective robotics” depicts and comments on our experiments. For copies (private use only), contact jean-bernard.billeter@epfl.ch

6. Conclusions

   From the robotics point of view, we demonstrated that dispersing the individual activation-thresholds of robots is an efficient way of allocating tasks in teams of real robots.
   From the biological point of view, we demonstrated that complex social systems can be regulated in a decentralised way. This adds further evidence to the hypothesis that social insect colonies are regulated in a self-organised manner. Moreover, teams of three and six robots had a better performance than a single robot, illustrating that a transition from a solitary individual to a social group would have been favoured in our experimental set-up. Even though the relative advantage of a given group size depends entirely on the system, we demonstrated that there are favourable conditions where such a transition is possible.

Acknowledgements

   We thank Professor J.D. Nicoud's LAMI (Microprocessors Systems Laboratory, EPFL) for its sustained help, and K-team SA for providing part of the hardware. We thank Laurent Keller, Alcherio Martinoli and Cristina Versino for comments on the manuscript. We are especially grateful to Edo Franzi, André Guignard, Alcherio Martinoli, Philip Maechler and Christian König. This work was funded by grants from the Fonds UNIL-EPFL 1997 (to J.D. Nicoud and L. Keller) and the “Stiftung für wissenschaftliche Forschung” (to M.J.B. Krieger).

References

 [1] R.D. Alexander, The evolution of social behavior, Annual Review of Ecology and Systematics 5 (1974) 325–383.
 [2] M. Begon, J.L. Harper, C.R. Townsend, Ecology: Individuals, Populations and Communities, 2nd ed., Blackwell Scientific Publications, MA, 1990.
 [3] E. Bonabeau, G. Theraulaz, J.-L. Deneubourg, Quantitative study of the fixed threshold model for the regulation of division of labour in insect societies, Proceedings of the Royal Society of London B 263 (1996) 1565–1569.
 [4] E. Bonabeau, G. Theraulaz, J.-L. Deneubourg, Fixed response thresholds and the regulation of division of labour in insect societies, Bulletin of Mathematical Biology 60 (1998) 753–807.
 [5] A.F.G. Bourke, N.R. Franks, Social Evolution in Ants, Princeton University Press, Princeton, NJ, 1995.
 [6] M.D. Breed, G.E. Robinson, R.E. Page, Division of labour during honey bee colony defense, Behavioral Ecology and Sociobiology 27 (1990) 395–401.
 [7] J.H. Crook, The evolution of social organisation and visual communication in the weaver birds (Ploceinae), Behaviour 10 (Suppl.) (1964) 1–178.
 [8] J.-L. Deneubourg, S. Goss, Collective patterns and decision-making, Ethology Ecology and Evolution 1 (1989) 295–311.
 [9] J.M. Epstein, R. Axtell, Growing Artificial Societies — Social Science from the Bottom Up, MIT Press, Cambridge, MA, 1996.
[10] D.M. Gordon, Dynamics of task switching in harvester ants, Animal Behaviour 38 (1989) 194–204.
[11] D.M. Gordon, The organization of work in social insect colonies, Nature 380 (1996) 121–124.
[12] B. Hölldobler, M. Möglich, U. Maschwitz, Communication by tandem running in the ant Camponotus sericeus, Journal of Comparative Physiology 90 (1974) 105–127.
[13] Z.Y. Huang, G.E. Robinson, Honeybee colony integration: Worker–worker interactions mediate hormonally regulated plasticity in division of labor, Proceedings of the National Academy of Sciences USA 89 (1992) 11726–11729.
[14] P.J. Jarman, The social organisation of antelope in relation to their ecology, Behaviour 48 (1974) 215–267.
[15] A. Lenoir, Feeding behaviour in young societies of the ant Tapinoma erraticum L.: Trophallaxis and polyethism, Insectes Sociaux 26 (1979) 19–37.
[16] A. Martinoli, E. Franzi, O. Matthey, Towards a reliable set-up for bio-inspired collective experiments with real robots, in: Proceedings of ISER'97, Barcelona, Spain, 1997.
[17] T. Maruhashi, C. Saito, N. Agetsuma, Home range structure and inter-group competition for land of Japanese macaques in evergreen and deciduous forests, Primates 39 (1998) 291–301.
[18] F. Mondada, E. Franzi, P. Ienne, Mobile robot miniaturization: A tool for investigation in control algorithms, in: Proceedings of ISER'93, Kyoto, Japan, 1993.
[19] G. Nicolis, I. Prigogine, Self-Organisation in Non-Equilibrium Systems, Wiley, New York, 1977.
[20] S.W. Pacala, D.M. Gordon, H.C.J. Godfray, Effects of social group size on information transfer and task allocation, Evolutionary Ecology 10 (1996) 127–165.
[21] R.E. Page, S.D. Mitchell, Self-organization and the evolution of division of labor, Apidologie 29 (1998) 171–190.
[22] P. Pamilo, R.H. Crozier, Reproductive skew simplified, Oikos 75 (1996) 533–535.
[23] L.E. Parker, Heterogeneous multi-robot cooperation, Ph.D. Thesis, MIT, Cambridge, MA, 1994.
[24] L.E. Parker, L-ALLIANCE: Task-oriented multi-robot learning in behavior-based systems, Advanced Robotics 11 (1997) 305–322.
[25] L. Passera, E. Roncin, B. Kaufmann, L. Keller, Increased soldier production in ant colonies exposed to interspecific competition, Nature 379 (1996) 630–631.
[26] G.E. Robinson, Modulation of alarm pheromone perception in the honey bee: Evidence for division of labour based on hormonally regulated response thresholds, Journal of Comparative Physiology 160 (1987) 613–619.
[27] G.E. Robinson, Regulation of division of labor in insect societies, Annual Review of Entomology 37 (1992) 637–665.
[28] G.E. Robinson, R.E. Page, Genetic determination of nectar foraging, pollen foraging, and nest-site scouting in honey bee colonies, Behavioral Ecology and Sociobiology 24 (1989) 317–323.
[29] C.H. Ryer, B.L. Olla, Social behavior of juvenile chum salmon, Oncorhynchus keta, under risk of predation — The influence of food distribution, Environmental Biology of Fishes 45 (1996) 75–83.
[30] T.C. Schelling, Micromotives and Macrobehavior, Norton, New York, 1978.
[31] M. Schneider-Fontan, M. Mataric, A study of territoriality: The role of critical mass in adaptive task division, in: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior: From Animals to Animats (SAB96), Cape Cod, MA, MIT Press, Cambridge, MA, 1996.
[32] T.D. Seeley, Adaptive significance of the age polyethism schedule in honeybee colonies, Behavioral Ecology and Sociobiology 11 (1982) 287–293.
[33] A.B. Sendova-Franks, N.R. Franks, Social resilience in individual worker ants, and its role in division of labour, Proceedings of the Royal Society of London B 256 (1994) 305–309.
[34] R.R. Sokal, F.J. Rohlf, Biometry, 2nd ed., Freeman, New York, 1981.
[35] G. Theraulaz, E. Bonabeau, Coordination in distributed building, Science 269 (1995) 686–688.
[36] C. Versino, L.M. Gambardella, Learning the visuomotor coordination of a mobile robot by using the invertible Kohonen map, in: J. Mira, F. Sandoval (Eds.), From Natural to Artificial Neural Computation: International Workshop on Artificial Neural Networks, Lecture Notes in Computer Science, Vol. 930, Springer, Berlin, 1995, pp. 1084–1091.
[37] J. Watmough, L. Edelstein-Keshet, Modelling the formation of trail networks by foraging ants, Journal of Theoretical Biology 176 (1995) 357–371.

Michael J.B. Krieger obtained his Master in Biology from the University of Basel, Switzerland. He received his Ph.D. from the University of Lausanne, Switzerland in 1999. He is currently a Post-Doctoral Associate in the Department of Entomology and Microbiology, University of Georgia, USA, where he continues to work on aspects of the social organisation of ants.

Jean-Bernard Billeter obtained his diploma in Electrical Engineering at the Swiss Federal Institute of Technology in Zurich (ETHZ). After numerous years out of the academic world, he joined EPFL from 1994 to 1998 to work on collective intelligence. He is currently working on exhibition robotics projects.
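The fixed activation-threshold mechanism with tandem recruitment analysed above can be pictured as a minimal discrete-time simulation. Everything numeric below (energy scale, decay and retrieval rates, recruitment probability, threshold range) is an illustrative assumption, not a value from the experiments, and the code is a sketch rather than the authors' controller:

```python
import random

def simulate(n_robots=6, steps=300, recruitment=True, seed=1):
    """Sketch of fixed-threshold task allocation with optional recruitment.
    All numeric parameters are illustrative assumptions."""
    rng = random.Random(seed)
    # Activation-thresholds drawn randomly between two preset limits,
    # as suggested for easy scaling in Section 5.5.
    thresholds = [rng.uniform(30.0, 70.0) for _ in range(n_robots)]
    active = [False] * n_robots
    energy = 100.0                          # nest energy (assumed 0-120 scale)
    trace = []
    for _ in range(steps):
        energy -= 2.0                       # the nest consumes energy
        for i in range(n_robots):
            if not active[i] and energy < thresholds[i]:
                active[i] = True            # threshold crossed: go foraging
        if recruitment and any(active):
            # An active forager may recruit one resting robot early,
            # before that robot's own threshold is reached (Section 5.4).
            resting = [i for i in range(n_robots) if not active[i]]
            if resting and rng.random() < 0.3:
                active[rng.choice(resting)] = True
        for i in range(n_robots):
            if active[i]:
                energy += 1.5               # each forager returns food energy
                if energy > thresholds[i] + 20.0:
                    active[i] = False       # enough energy: return to rest
        energy = min(energy, 120.0)
        trace.append(energy)
    return trace
```

Comparing `simulate(recruitment=True)` with `simulate(recruitment=False)` lets one inspect the fluctuation of the nest-energy trace, mirroring the robustness argument of Section 5.4: recruitment shortens the delay between low nest-energy and the foraging response.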
       Efficiency and Robustness of Threshold-Based
 Distributed Allocation Algorithms in Multi-Agent Systems
                   William Agassounon                                                                     Alcherio Martinoli
                 Collective Robotics Group                                                            Collective Robotics Group
              California Institute of Technology                                                   California Institute of Technology
                Pasadena, CA 91125, USA                                                              Pasadena, CA 91125, USA
                     +1 (626) 395 2243                                                                    +1 (626) 395 2208
             agassw@micro.caltech.edu                                                             alcherio@micro.caltech.edu

ABSTRACT
In this paper we present three scalable, fully distributed, threshold-based algorithms for allocating autonomous embodied workers to a given task whose demand evolves dynamically over time. Individuals estimate the availability of work based solely on local perceptions. The differences among the algorithms lie in the threshold distribution among teammates (homogeneous or heterogeneous team), in the mechanism used for establishing threshold values (fixed and parameter-based, or variable and rule-based), and in the sharing (public) or not sharing (private) of demand estimations through local peer-to-peer communication. We tested the algorithms' efficiency and robustness in a collective manipulation case study concerned with the clustering of initially scattered small objects. The aggregation experiment has been studied at two different experimental levels, using a microscopic model and embodied simulations. Results show that teams using a number of active workers dynamically controlled by one of the allocation algorithms achieve similar or better performances in aggregation than those characterized by a constant team size, while using on average a considerably reduced number of agents over the whole aggregation process. While differences in efficiency among the algorithms are small, differences in robustness are much more apparent. Threshold variability and peer-to-peer communication appear to be two key mechanisms for improving worker allocation robustness against environmental perturbations.

Categories and Subject Descriptors
J.2 [Computer Applications]: Physical Science and Engineering – engineering, electronics, mathematics and statistics.

General Terms: Algorithms, Performance, Reliability, Experimentation.

Keywords: Swarm intelligence, division of labor, response threshold, probabilistic modeling, embodied multi-agent systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AAMAS '02, July 15-19, 2002, Bologna, Italy.
Copyright 2000 ACM 1-58113-480-0/02/0007…$5.00.

1. INTRODUCTION
Swarm Intelligence (SI, first introduced in [2]) is a new computational metaphor for solving distributed problems using the principles guiding the collectively complex and intelligent behavior of natural systems consisting of many agents, such as ant colonies and bird flocks. The abilities of such systems appear to transcend the abilities of each constituent agent. In all the biological cases studied so far, the emergence of high-level control has been found to be mediated by nothing more than a small set of simple low-level interactions among individuals, and between individuals and the environment [5,7].

Generally speaking, our research focuses on the application of the SI approach to control embedded systems that consist of many autonomous decision-making entities endowed with local perception and possibly communication capabilities. In particular, we are interested in understanding task allocation and labor division mechanisms exploited in social insect societies that are suitable for artificial embedded systems such as multiple mobile robot platforms. One of the most appealing principles of the SI approach to a roboticist is the minimalism [3] at the individual agent's level. This characteristic prompts the roboticist to carefully evaluate each additional capability the single agent should be endowed with, which in turn should lead to an overall increase in system robustness and in cost effectiveness in mass production.

Recently, several macroscopic models, some of them based on threshold responses [4,16], others focusing only on task-switching probabilities [14], have been proposed to explain these mechanisms in natural colonies. However, none of these theoretical approaches has focused on how workers gather the information necessary to decide whether or not to switch task or to engage in a task. More specifically, they have not taken into consideration the partial perception in time and space of the demand and the embodiment of the agents. For instance, partial perception of the demand combined with real-world uncertainties could strongly influence the optimal distribution of thresholds among teammates or the switching mechanism itself (e.g. probabilistic vs. deterministic).

In the collective robotics literature, we find threshold-based [9,15], market-based, and publish/subscribe messaging approaches [6] that take into account the embodiment of the agents but which are not scalable, because of extensive communication requirements over a finite bandwidth or the necessity of an external supervisor. For instance, in the pioneering approach proposed by Parker [15], each robot at every instant of time and in every position is aware of the progress in task accomplishment of its teammates, based on global radio networking and an absolute positioning system. In Krieger and Billeter's experiment [9] the demand related to the
nest energy is assessed by an external supervisor and globally transmitted to all the robots. Using this method, the team of robots has to be heterogeneous and each agent has to be characterized by a different threshold in order to regulate the activity of the team. This in turn results in a different exploitation of the teammates, the one endowed with the lowest threshold systematically being more active than the one with the highest.

In [1] we proposed a threshold-based, distributed, scalable worker allocation algorithm that is based exclusively on the local estimation of the demand by the individuals. The individuals were all characterized by the same threshold, but since the agents did not perceive the demand globally but rather estimated it locally, they did not work or rest all at the same time, a behavior that would have arisen if the demand had been broadcast by an external supervisor. In this paper we propose two new algorithms of the same family and compare their efficiency and robustness with the one previously presented. Consistently with the SI approach, the two newly proposed algorithms slightly extend the individual capabilities in order to overcome some of the limitations of the first algorithm, in particular in the case of environmental perturbations. The first new algorithm endows each agent with the ability to calibrate its own response threshold before starting to adapt its activity based on this threshold. We therefore replace an a priori fixed parameter by a rule that adapts the value of this parameter according to some locally sensed environmental constraints. The second new algorithm allows the team of embodied agents to exchange individual estimations of the demand through local peer-to-peer communication while still relying on an a priori fixed threshold. Since the local estimation of the demand is noisy, sharing this information among teammates is a way to increase the update rate of this estimation and reduce the corresponding error without resorting to an external supervisor.

Collective embedded systems can be studied at several implementation levels, from macroscopic analytical models [1] to units in the real world [10,11], through microscopic, numerical models and embodied simulations. Models allow for a better understanding of the experiment's dynamics and for a generalization to other tasks, environmental constraints, and embedded platforms. When using quantitatively accurate models, optimal parameters of the control algorithms can be investigated much more quickly at more abstract levels, and the effectiveness of the devised solution can then be verified using embodied simulations and/or real embedded systems. In this paper we present results gathered at two implementation levels, using a microscopic model and embodied simulations. Both levels are well suited for studying noisy demand estimations and heterogeneous teams of agents without having to deal, in this exploratory phase, with all sorts of problems arising in real-robot experiments. The qualitative and quantitative reliability of both implementation levels for this type of experiment has been shown in previous work [1,8,10,11].

2. EXPERIMENTAL SETUP
2.1 The Aggregation Experiment
The case study used for assessing the efficiency of the worker allocation algorithm is concerned with the gathering and clustering of small objects scattered in an enclosed arena. In most of the work done so far [7], and more specifically [10,11], the size of the working team was kept constant during the whole aggregation process. These experiments define our baseline for an efficiency comparison with and without a worker allocation algorithm. In this paper, we use three primary team performance measurements: the average cluster size, the average number of clusters, and the average number of active workers in the environment. We then integrate all three primary team performances in a combined metric that represents the cost of aggregation over a certain time period.

2.2 The Embodied Simulator
We implemented the aggregation experiment in Webots 2.01, a 3D sensor-based, kinematic simulator [12] of Khepera robots [13]. The simulator computes trajectories and sensory inputs of the embodied agents in an arena corresponding to the physical setup (see Figure 1).

Figure 1a. Close-up of a simulated robot (5.5 cm in diameter) in Webots, equipped with a gripper turret, in front of a seed.

The mean comparative speed ratio for this experiment with 10 robots between Webots and real time is about 7 on a PC Pentium III 800 MHz workstation. The environment is represented by two square arenas of different sizes. Each of them contains a working zone of a corresponding size (80x80 cm and 178x178 cm) where twenty small seeds are randomly scattered at the beginning of the experiment. A resting zone surrounds the working zone, where non-active agents go to rest (or stay in an idle state to save energy). Agents are endowed with sensor capabilities to distinguish the border between resting and working zones. Without considering the mode-switching behavior (explained in Section 3), we can summarize each agent's behavior with the following simple rules. In its default behavior the agent moves straight forward within the working zone looking for seeds. When at least one of its six frontal proximity sensors is activated, the agent starts a discriminating procedure. Two cases can occur: if the agent is in front of a large object (a wall, another agent, or the body side of a cluster of seeds), the object is considered as an obstacle and the agent avoids it. In the second case, the small object is considered as a seed. If the agent is not already carrying a seed, it grasps this one with the gripper; otherwise, it drops the seed it is carrying close to the one it has found. Then, in both cases, the agent resumes searching for seeds. With this simple individual behavior, the team is able to gather objects in clusters of increasing size. A cluster is defined as a group of seeds whose neighboring elements are separated by at most one seed diameter. Note that, because agents identify only the two extreme seeds of a cluster as seeds (as opposed to obstacles), clusters are built in line.

Working with models also brings an additional time saving in comparison to embodied simulations. The mean speed ratio for this experiment with 10 agents between the microscopic probabilistic model and Webots is about 700 on a PC Pentium III 800 MHz workstation.
                                                                                 3. THE DISTRIBUTED WORKER
                                                                                 ALLOCATION MECHANISM
                                                                                 The main objective of this case study is to show that the
                                                                                 introduction of worker allocation mechanisms allows the team of
                                                                                 agents to increase its efficiency as a whole by allocating the right
                                                                                 number of workers as a function of the demand intrinsically
                                                                                 defined by the aggregation process. Intuitively, we can imagine
                                                                                 that at the beginning of the aggregation there are several possible
                                                                                 manipulation sites (i.e. several scattered seeds) that allow for a
                                                                                 parallel work of several agents. As the aggregation process goes
                                                                                 on, the number of these sites is reduced and having more agents
                                                                                 competing for the same manipulation sites decreases their
                                                                                 efficiency.
                                                                                 In threshold-based systems, the ‘propensity’ of any agent to act is
                                                                                 given by a response threshold. If the demand is above the agent’s
                                                                                 threshold then that agent continues to perform the task, conversely
                                                                                 if the demand is below its threshold then the agent stops
   Figure 1 b. Experimental setup: inner area corresponds                        performing that particular task. In all the algorithms presented in
    to the working zone, and outer area is the resting zone.                     this paper the time an agent spends before finding some work to
   Aggregation in progress with 10 agents in a 178X178 cm                        accomplish (i.e. to pick and drop a seed) represents the agent’s
                            arena.                                               estimation of the demand stimulus associated with the aggregation
                                                                                 task.
As shown in [11], if the agents do not drop a seed unless it is next
                                                                                 Our current worker allocation mechanism is as follows. When an
to another seed or pick up an internal seed of a cluster, the
                                                                                 agent has not been able to work (i.e. to pick up and drop a seed)
number of clusters is monotonically decreasing and eventually a
                                                                                 for a reasonable amount of time, its propensity to accomplish that
single cluster always arises.
                                                                                 particular task is decreased. If the stimulus goes below a certain
                                                                                 threshold (i.e. if the amount of time spent in the search for work to
2.3 The Microscopic, Probabilistic Model                                         accomplish is above a given Tsearch time-out), a deterministic
The central idea of the microscopic, probabilistic model is to                   switching mechanism prompts the agent to leave the working zone
describe the experiment as a series of stochastic events with                    for resting in the adjacent parking space. An agent carrying a seed
probabilities based on simple geometrical considerations and                     that decides to become inactive cannot do so until it finds an
systematic interaction experiments with a single real or embodied                appropriate spot (i.e. one tip of a cluster) to drop the seed. Thus,
agent. The probability for any agent to encounter any other object               with this simple algorithm characterized by a single threshold,
present in the arena (e.g. a seed, a teammate, the border between                each agent is able to estimate the aggregation demand locally and
the working field and the resting zone, etc.) is given by the ratio              to decide whether to work or rest.
of the extended area occupied by that object to the total arena area
in which the agent is moving. The extended area occupied by each                 In the following, ‘public’ refers to the existence of an explicit,
object is computed by considering the detection range of that                    collaborative information flow between the teammates, and
object by an active agent taken from the center of that agent. In                ‘private’, to no information sharing at all. Thus, in what follows
this specific collective manipulation case study, seed picking up                we present three different worker allocation algorithms: a private,
and dropping probabilities have also to be taken into account once               fixed-threshold algorithm, a private, variable-threshold algorithm,
a cluster is found and they depend on the angle of approach of the               and a public, fixed-threshold algorithm.
agent to the cluster (clusters can be modified only at their tips).
In the numerical, probabilistic model a finite state machine                     3.1 The Private, Fixed-Threshold Worker
defines the states of the agents, but instead of computing the                   Allocation Algorithm (PrFT)
detailed sensory information and trajectories of the agents the                  The PrFT algorithm is the same reported in [1]. Following the
change of states is determined randomly by rolling a dice. The                   worker allocation mechanism described previously, we assign the
overall behavior is then computed by averaging the results of                    same response threshold to all the agents. The team of agents is
several runs of the same experiment. A more detailed description                 therefore homogeneous from control point of view. The resulting
of this microscopic modeling methodology can be found in                         agents’ behavior (rhythm of activity) is not identical since it is
[8,10,11].                                                                       based on the local, private assessment of the current status of the
                                                                                 shared resource, i.e. the environment. In other words, diversity in




                                                                          1092
activity is created by exploiting the intrinsic noise of the system as          distribution of CR and Tsearch is homogeneous and fixed a priori
well as local perceptions and interactions.                                     among the teammates.

3.2 The Private, Variable-Threshold Worker
Allocation Algorithm (PrVT)
Using fixed-threshold algorithms does not endow the team of
workers with the robustness required to face external
perturbations brought to the system. For instance, if some key
control parameters such as the response thresholds have to assume
different values as a function of the characteristics of the
environment (e.g. arena size) for an optimal team performance, an
algorithm is required that automatically calculates the optimal
parameters for different environmental constraints.
In this paper, we propose a variable-threshold algorithm based on
a threshold self-calibration rule that works in two steps: a
threshold estimation phase followed by a worker allocation
phase. During the estimation phase, each autonomous agent
evaluates the spatial density of the demand and then sets its                   Figure 2. Example of two agents within communication range.
response threshold based on that individual estimation. During the
allocation phase, the algorithm works as explained above. Two                   4. RESULTS AND DISCUSSION
parameters govern the self-calibration mechanism: the Estimation                In this section we present and compare results collected at two
Steps (ES) and the Estimation Factor (EF). Each autonomous                      different experimental levels using microscopic modeling and
agent estimates the availability of work in the environment by                  embodied simulations. Unless otherwise stated, each aggregation
averaging the amount of time it spends to find some work to                     run lasted 10 hours (simulated time), for each team size 100 and
accomplish over its first ES successful attempts. The agent then                30 runs were carried out using the microscopic model and the
computes its response threshold by multiplying that average                     embodied simulator respectively, and error bars represent the
amount of time by EF. Notice that even if all the agents are                    standard deviations among runs. All results reported below were
characterized by the same values of ES and EF and therefore the                 obtained without using any free parameters. All the parameters
same calibration rule (homogeneous team), due to their partial                  introduced in the model (e.g. mean obstacle avoidance duration,
perceptions in time and space the agents will end up with different             mean time to pick up/drop a seed, mean time to leave the working
thresholds in the allocation phase. In addition, since PrVT is a                zone) were measured from a single embodied agent.
private algorithm, the transition time from one phase to the other
is also determined by the agents individually and asynchronously.
Equation 1 summarizes how each autonomous agent computes its                    4.1 Aggregation Without Worker Allocation
response threshold. TWk represents the amount of time an agent                  Figure 3 presents the model predictions and the embodied
spent to find work to accomplish at its kth successful attempt.                 simulation results of the aggregation experiment without the use
                                                                                of any worker allocation algorithm using a group of 10 agents in
                                                                                an 80X80 cm arena. In that plot the upper set of curves represents
                                                                                the (increasing) average size of the clusters over time while the
                    1 ES T                                                    other set shows the (decreasing) average number of clusters over
 T search
            = EF ∗     ∑ 
                    ES k =1 W 
                                  k
                                                                (1)             time. Figure 3 consists of a first phase when the average cluster
                                                                                size increases steadily from 1 seed to about 15 seeds and a second
                                                                                phase when the average cluster size remains on average constant
3.3 The Public, Fixed-Threshold Worker                                          around 15 seeds. Similarly, during the first phase the average
                                                                                number of clusters decreases asymptotically from 20 to about 1
Allocation Algorithm (PuFT)                                                     then remains close to 1 during the second phase of the
In order to allow this multi-agent system to react to dynamic
                                                                                aggregation process. This can be explained by the fact that, once a
external perturbations (e.g. sudden introduction of additional
                                                                                single cluster arises only two manipulation sites remain in the
seeds) more quickly than relying on the slow implicit
                                                                                environment (i.e. the two end tips of that cluster). The
communication through environmental modifications, we endow
                                                                                probabilities of picking up and dropping a seed are empirically
all the agents with peer-to-peer communication capabilities. An
                                                                                very close, therefore at any given time during the last phase of the
additional advantage of explicit communication is that each agent
                                                                                aggregation process, on average, half of the active workers will be
is able to gather information about the work demand both from its
                                                                                carrying a seed and the other half will not. A similar aggregation
individual experience and the experience of other teammates that
                                                                                evolution was recorded when using an arena of size 178X178 cm.
it may encounter. Our current collaborative scheme is as follows.
                                                                                The main difference lies in the time needed to reach a single
When two agents are within a certain Communication Range
                                                                                cluster, which takes on average twice as long in the larger arena.
(CR), they exchange their estimation of the demand; their
                                                                                Note that the latter arena has five times the surface of the 80X80
individual estimations are then set to the average of their original
                                                                                cm arena.
values. In the PuFT algorithm presented in this paper, the




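To make the mechanism of Section 3 concrete, the single-threshold work/rest rule (an agent retires once the time spent searching for work exceeds the Tsearch time-out, deferring retirement while it still carries a seed) can be sketched in code. This is an illustrative sketch only, not the authors' controller; the class structure, the time-stepping interface, and all names other than Tsearch are assumptions:

```python
class Agent:
    """Sketch of the single-threshold work/rest rule.

    The time elapsed since the agent last accomplished some work
    (picked up and dropped a seed) serves as its private estimate of
    the demand; once it exceeds the Tsearch time-out, the agent leaves
    the working zone, unless it must first drop a carried seed.
    """

    def __init__(self, t_search: float):
        self.t_search = t_search        # response threshold (time-out)
        self.time_since_work = 0.0      # private demand estimate
        self.carrying_seed = False
        self.active = True              # True: working zone, False: resting

    def step(self, dt: float, worked: bool, at_cluster_tip: bool = False) -> None:
        if not self.active:
            return
        if worked:                      # picked up and dropped a seed
            self.time_since_work = 0.0
            return
        self.time_since_work += dt
        if self.time_since_work > self.t_search:
            # Deterministic switch to resting, deferred while the agent
            # still carries a seed and has found no cluster tip to drop it.
            if not self.carrying_seed or at_cluster_tip:
                self.carrying_seed = False
                self.active = False
```

With the optimal PrFT value of Table 1 (Tsearch = 25 min), for example, an agent that fails to pick up and drop a seed for more than 25 minutes would retire to the resting zone.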
Figure 3. Results of the aggregation experiment without worker allocation, using a team of 10 agents.

The good agreement between the results collected at both implementation levels shows how reliable the microscopic model's predictions are. Additional results for the same case study have been reported in [1]. Therefore, in the following we present results obtained using the microscopic model exclusively.

4.2 Integrated Cost and Optimization of Algorithmic Parameters
To be able to optimize the different algorithmic parameters and to assess the cost effectiveness of the different algorithms, we introduced a cost function whose value integrated over the total observation time, named the Integrated Cost (IC), corresponds to the total cost of the experiment. The IC is an efficient combined metric for comparing the influence of each parameter of any given worker allocation algorithm, as well as the performances of different algorithms. The cost function is defined in Equation 2:

    F_{cost}(x_t, y_t, z_t) = \alpha_x (X_{opt} - x_t)^2 + \alpha_y (Y_{opt} - y_t)^2 + \alpha_z (Z_{opt} - z_t)^2    (2)

where:
• x_t, y_t, and z_t represent the average cluster size, the average number of clusters, and the average number of active workers at time step t, respectively;
• X_opt, Y_opt, and Z_opt are the optimal values of the variables above (i.e. 20, 1, and 0, respectively);
• α_x, α_y, and α_z are coefficients selected to weight the contribution of each of the variables accordingly.

On the right-hand side of Equation 2, the first two terms can be considered the 'penalty cost' (e.g. the cost of work not finished) and the third term the 'worker cost' (e.g. the cost of hiring workers). The Integrated Cost is computed by a discrete Riemann integration over any observation time. The results presented in Table 1 were obtained for an observation time of 10 hours. We chose α_x = 10^{-2} and α_x = α_y = 4α_z in order to obtain a higher penalty cost for unsuccessful aggregation results (cluster size and number of clusters) than worker cost. In Table 1 we optimized PuFT for a fixed communication range of 20 cm.

In this paper, we use the IC for two purposes: first, as a metric to optimize the parameters of the worker allocation algorithms; second, as a metric to compare the performances of the worker allocation algorithms under different environmental constraints. Additionally, the following criterion, based on central tendencies (mean and standard error), is used to assess the quality of a given set of parameters on the IC axis.

Assume that (E_i, σ_i) represents the pair of mean and standard error of the IC associated with the set of simulation runs i. We consider the IC achieved by set i to be better (more efficient) than that of set j if E_i + (1+γ)σ_i < E_j - (1+γ)σ_j, where γ is a criterion parameter. For the results reported here we used γ = 0. In the case where neither set satisfies the inequality above, we consider them equivalent. Considering the acceleration factor of the probabilistic model and the limited number of design points in a given parameter space, we performed a systematic test of all possible parameter combinations for a given algorithm.

Table 1 summarizes the algorithmic parameters and the optimized values for the three worker allocation algorithms in an 80x80 cm arena. Note that Equation 2 does not take into account the extra power consumption cost introduced by the communication capability each agent is endowed with in PuFT.

Table 1. Algorithmic parameters

  Algorithm   Parameter   Range (min-max)   Granularity   Optimal value
  PrFT        Tsearch     10-35 min.        5 min.        25 min.
  PrVT        ES          5-40              5             10
              EF          0.5-10            0.5           8
  PuFT        Tsearch     10-35 min.        5 min.        15 min.

4.3 Aggregation with Worker Allocation
In the following, we present a set of comparative results obtained by applying the worker allocation algorithms detailed in Section 3 to the same aggregation experiment. We started by optimizing their respective parameters (as explained in Section 4.2) for the same experimental setup (an 80x80 cm arena, 20 seeds originally scattered in the arena, and 10 active agents at the beginning of the experiment). We then studied the influence of external perturbations on the cluster formation (e.g. a different arena size, or 20% additional seeds introduced during the aggregation process) for the different algorithms. Table 2 presents the IC values and their standard errors for the three worker allocation algorithms in the different manipulation experiments introduced previously (values in bold correspond to the best performance, i.e. minimal cost, calculated by applying the criterion mentioned above). In Table 2, Arena1 corresponds to the original setup (an 80x80 cm arena and 20 seeds), Arena2 corresponds to the case with a static perturbation (20 seeds in a new 178x178 cm arena), and Arena3 refers to the case with dynamic perturbations (an 80x80 cm arena, 20 original seeds, and 5 additional seeds introduced 2 hours after the start of the aggregation process).

Table 2. Integrated cost

  Algorithm   Arena1        Arena2         Arena3
  PrFT        138.9 ± 7.0   324.9 ± 10.8   154.5 ± 7.9
  PrVT        155.1 ± 8.0   231.9 ± 10.7   152.2 ± 8.7
  PuFT        138.2 ± 6.9   337.6 ± 10.7   122.4 ± 6.4
  W/o WA      227.4 ± 4.8   310.8 ± 8.8    197.2 ± 5.9

The PuFT and PrFT algorithms appear to be the most efficient in Arena1 (i.e. equivalent performances following the criterion), PrVT in Arena2, and PuFT in Arena3.

4.3.1 Private, Fixed-Threshold Worker Allocation
Figure 4 shows the outcome of the aggregation experiment using the worker allocation algorithms with a team of 10 agents in an 80x80 cm arena. Here, conversely to the case without worker allocation, during the last phase of the aggregation the average cluster size remains an increasing function of time, eventually reaching 20 seeds, the largest value possible, while the number of active workers in the environment decreases. Intuitively, this can be explained by the fact that, with only two manipulation sites remaining in the arena, and with on average half of the active agents carrying a seed and the other half not, reducing the number of active agents consequently increases the size of the single cluster.

Figure 4a. Average cluster size for the aggregation experiment with worker allocation algorithms in an 80x80 cm arena.

Figure 4b. Average number of active workers for the aggregation experiment with worker allocation algorithms in an 80x80 cm arena.

However, due to the a priori fixed response threshold value, the agents behave sub-optimally in a different environment. For instance, when performing the same aggregation task in a 178x178 cm arena, the average size of the clusters they create is smaller than that of the clusters created by the team using the PrVT algorithm (with similar standard deviations), because the agents withdraw too soon. This is illustrated in Figure 5, where after 120 minutes the size of the clusters created using the PrFT algorithm becomes (and remains for the rest of the experiment) distinctly smaller on average than that of the clusters created using the PrVT algorithm. As a consequence, the aggregation efficiency of the PrFT algorithm deteriorated considerably in Arena2, as shown in Table 2.

4.3.2 Private, Variable-Threshold Worker Allocation
The density of manipulation sites (seeds that can be manipulated) is higher in the smaller arena, and the robots are more likely to encounter them there than in the larger arena. In response to this difference in the density of manipulation sites, variable-threshold workers autonomously set their response thresholds higher in Arena2. They therefore stay active longer in the larger arena than in the smaller one, which in turn allows them to continue performing the task, as most seeds are not yet gathered into a single cluster. This is illustrated by Figure 5, where it clearly appears that PrFT and PuFT under-perform, due to a homogeneous threshold value that is too low for the new environment and to their inability to adapt to it.

However, the PrVT algorithm is not appropriate for an optimal response of the agents to a dynamic change in the number of objects to manipulate. For instance, the results in Table 2 show that when additional seeds are dropped into the arena 2 h into the aggregation process, the efficiency of the PrVT algorithm deteriorates. This results from the absence of a continuously adaptive activity-threshold mechanism that would allow the agents to raise their activity thresholds when facing a sudden increase in the availability of work.
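The self-calibration rule of Equation 1 amounts to a single line of code. The sketch below is illustrative (the function name and argument layout are assumptions; ES, EF, and the times T_Wk follow the paper):

```python
def calibrate_threshold(work_search_times, EF):
    """Equation 1: the response threshold is EF times the average of
    the times T_W1 ... T_W_ES the agent spent finding work at its
    first ES successful attempts (the estimation phase)."""
    ES = len(work_search_times)
    return EF * sum(work_search_times) / ES
```

With Table 1's optimized values (ES = 10, EF = 8), an agent that on average needed 2 minutes to find work during the estimation phase would set its threshold to 16 minutes, while a teammate that averaged 3 minutes would settle on 24 minutes; this is how heterogeneous thresholds emerge from a homogeneous calibration rule.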
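The peer-to-peer averaging on which PuFT relies (Section 3.3) can likewise be sketched. The data structure and function below are illustrative assumptions; only the Communication Range (CR) condition and the averaging rule come from the paper:

```python
from dataclasses import dataclass

@dataclass
class Peer:
    estimate: float   # private estimate of the demand stimulus

def exchange_estimates(a: Peer, b: Peer, distance: float, cr: float) -> None:
    """When two agents are within the Communication Range (CR), both
    replace their demand estimates with the average of the two
    original values; out of range, nothing happens."""
    if distance <= cr:
        avg = 0.5 * (a.estimate + b.estimate)
        a.estimate = avg
        b.estimate = avg
```

Repeated pairwise averaging of this kind drives the teammates' estimates toward a common value, which is what lets a workload change sensed by a few agents propagate quickly through the whole team.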
                                                                              Figure 6) as well as to a faster decrease in the activity of the
                                                                              robots as soon as a single cluster arises. Consequently, the PuFT
                                                                              algorithm offers the best efficiency in Arena3 (see Table 2 and
                                                                              Figure 7). Figure 7 presents the IC for the different allocation
                                                                              algorithms as a function of the observation time i.e., the
                                                                              integration time interval in Arena3. Error bars represent the
                                                                              standard errors among sets of simulation runs. From Figure 7, it
                                                                              appears that despite the dynamic change in the workload the
                                                                              algorithms with worker allocation are still more efficient than that
                                                                              without any worker allocation. The PuFT algorithm offers the best
                                                                              efficiency after the introduction of the additional seeds by
                                                                              exploiting    teammates’      collaboration    via     peer-to-peer
                                                                              communication.




           Figure 5. Average cluster size for aggregation
         experiment with worker allocation algorithms in a
                        178X178 cm arena

4.3.3. Public, Fixed-Threshold Worker Allocation

PuFT can efficiently deal with a dynamic change in the number of objects to manipulate in a distributed way. However, because the threshold is fixed, the PuFT algorithm does not allow the agents to respond efficiently to all modifications of the arena surface, as shown in Figure 5 and Table 2. Nevertheless, this algorithm has the advantage of providing the team of autonomous agents with the ability to quickly access information about dynamic changes brought to the working environment through its peer-to-peer communication scheme. As shown in Figure 6, the sudden increase in the availability of work (i.e., 20% additional seeds dropped in the environment) was quickly sensed by the teammates. This results in an appropriate number of agents staying active to accomplish the task throughout the aggregation process. This in turn (based on the explanation in Section 4.1) contributes to the collaborative team achieving larger clusters (see Figure 6).

Figure 6. Average cluster size for the aggregation experiment with worker allocation algorithms in an 80×80 cm arena; 2 and 3 more seeds dropped in the arena after 2 h and 4 h into the experiment, respectively

Figure 7. Integrated cost as a function of observation time for the aggregation experiment in Arena3

5. CONCLUSION

In this paper, we have presented a comparative study of three scalable, distributed, threshold-based worker allocation algorithms that allow a team of autonomous, embodied agents to dynamically allocate an appropriate number of workers to a given task based solely on their individual estimations of the progress in the execution of the task. We compared their efficiency and robustness in a collective manipulation case study concerned with the gathering and clustering of seeds.

Teams consisting of active workers dynamically controlled by one of the allocation algorithms achieve similar or better performances in aggregation than those characterized by a constant team size, while using on average a considerably reduced number of agents over the whole aggregation process. Moreover, after a systematic optimization process in a given environment common to all the algorithms, it appeared that quantitative differences in efficiency among these allocation algorithms are less apparent. Although PuFT appears to be one of the two most efficient algorithms in the smaller arena, its energy cost due to communication was not included in these experiments. In addition, even if the demand can be estimated more accurately by sharing information and the collective reaction to external perturbations is faster, the PuFT algorithm still suffers from the same drawback as the PrFT algorithm in facing environmental changes for which its threshold was not a priori optimized.



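The public, fixed-threshold decision rule described in Section 4.3.3 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Agent class, the encounter-rate demand estimator, and the threshold value are all assumptions; only the overall scheme (agents share local demand estimates peer-to-peer and compare the pooled estimate against a fixed activation threshold) follows the text.

```python
# Hypothetical sketch of a public, fixed-threshold (PuFT) style
# worker-allocation rule. Names and the demand estimator are
# illustrative assumptions, not the paper's implementation.

class Agent:
    def __init__(self, threshold=0.4):
        self.threshold = threshold    # fixed activation threshold
        self.demand_estimate = 1.0    # local estimate of remaining work
        self.active = True

    def update_local_estimate(self, seeds_seen, steps):
        # Use the seed encounter rate as a crude proxy for remaining demand.
        self.demand_estimate = seeds_seen / max(steps, 1)

    def share_and_decide(self, peers):
        # "Public" step: pool the teammates' local estimates.
        estimates = [self.demand_estimate] + [p.demand_estimate for p in peers]
        public_estimate = sum(estimates) / len(estimates)
        # Fixed-threshold step: stay active only if shared demand is high enough.
        self.active = public_estimate >= self.threshold
        return self.active
```

Because the threshold is fixed, a value tuned for one environment may fit poorly after a perturbation, which mirrors the drawback discussed above.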
We believe that combining the characteristics of PuFT and PrVT, threshold variability and information sharing, will allow us in the near future to further improve the robustness (and maybe the efficiency) of the resulting worker allocation algorithm. In particular, transforming the calibration rule into a continuously adapting algorithm, as suggested in [16], seems to be a promising solution to static and dynamic external perturbations.

Finally, it is worth noting that an agent that withdraws from a task can allocate itself to a different task following a similar response-to-stimulus mechanism, thus making this set of algorithms easily applicable to multi-task problems and more complex labor division schemes.

6. ACKNOWLEDGEMENTS

Special thanks to Xiaofeng Li for helping to collect systematic simulation data. This work is supported in part by the TRW Foundation and the TRW Space and Technology Division. Further funding was received from the Caltech Center for Neuromorphic Systems Engineering as part of the NSF Engineering Research Center program under grant EEC-9402726.

7. REFERENCES

[1] Agassounon, W., Martinoli, A., and Goodman, R. A Scalable, Distributed Algorithm for Allocating Workers in Embedded Systems. In Proceedings of IEEE SMC '01 (Tucson, AZ, October 2001), 3367-3373.

[2] Beni, G., and Wang, J. Swarm Intelligence. In Proceedings of the 7th Annual Meeting, Robotics Society of Japan, 1989, 425-428.

[3] Böhringer, K., Brown, R., Donald, B., Jennings, J., and Rus, D. Distributed Robotic Manipulation: Experiments in Minimalism. In Proceedings of ISER '95 (Stanford, CA, June 1995), 11-25.

[4] Bonabeau, E., Theraulaz, G., and Deneubourg, J.-L. Fixed Response Thresholds and Regulation of Division of Labour in Insect Societies. Bulletin of Mathematical Biology, 1998, Vol. 60, 753-807.

[5] Bonabeau, E., Dorigo, M., and Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems. SFI Studies in the Sciences of Complexity. Oxford University Press, New York, NY, 1999.

[6] Gerkey, B. P., and Mataric, M. J. MURDOCH: Publish/Subscribe Task Allocation for Heterogeneous Agents. In Proceedings of Autonomous Agents '00 (Barcelona, Spain, June 2000), 203-204.

[7] Holland, O., and Melhuish, C. R. Stigmergy, Self-organization, and Sorting in Collective Robotics. Artificial Life, 1999, Vol. 5, 173-202.

[8] Ijspeert, A. J., Martinoli, A., Billard, A., and Gambardella, L. M. Collaboration through the Exploitation of Local Interactions in Autonomous Collective Robotics: The Stick Pulling Experiment. Autonomous Robots, 2001, Vol. 11, No. 2, 149-171.

[9] Krieger, M. J. B., and Billeter, J.-B. The Call of Duty: Self-Organised Task Allocation in a Population of up to Twelve Mobile Robots. Robotics and Autonomous Systems, 2000, Vol. 30, No. 1-2, 65-84.

[10] Martinoli, A., Ijspeert, A. J., and Mondada, F. Understanding Collective Aggregation Mechanisms: From Probabilistic Modeling to Experiments with Real Robots. Robotics and Autonomous Systems, 1999, Vol. 29, 51-63.

[11] Martinoli, A., Ijspeert, A. J., and Gambardella, L. M. A Probabilistic Model for Understanding and Comparing Collective Aggregation Mechanisms. In Proceedings of ECAL '99 (Lausanne, Switzerland, September 1999), 575-584.

[12] Michel, O. Webots: Symbiosis Between Virtual and Real Mobile Robots. In Proceedings of ICVW '98 (Paris, France, 1998), 254-263.

[13] Mondada, F., Franzi, E., and Ienne, P. Mobile Robot Miniaturization: A Tool for Investigation in Control Algorithms. In Proceedings of ISER '93 (Kyoto, Japan, October 1993), 501-513.

[14] Pacala, S. W., Gordon, D. M., and Godfray, H. C. J. Effects of Social Group Size on Information Transfer and Task Allocation. Evolutionary Ecology, 1996, Vol. 10, 127-165.

[15] Parker, L. E. ALLIANCE: An Architecture for Fault Tolerant Multi-Robot Cooperation. IEEE Transactions on Robotics and Automation, 1998, Vol. 14, No. 2, 220-240.

[16] Theraulaz, G., Bonabeau, E., and Deneubourg, J.-L. Response Threshold Reinforcement and Division of Labour in Insect Societies. In Proceedings of the Royal Society of London, Series B, 1998, Vol. 265, 327-332.




Submitted to Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-03), 2003




         Adaptive Division of Labor in Large-Scale Minimalist Multi-Robot Systems

                                Chris Jones and Maja J. Mataric
                                Computer Science Department
                              University of Southern California
                             941 West 37th Place, Mailcode 0781
                                Los Angeles, CA 90089-0781
                              {cvjones, maja}@robotics.usc.edu



                           Abstract

   A Large-Scale Minimalist Multi-Robot System (LMMS) is one composed of a group of robots each with limited capabilities in terms of sensing, computation, and communication. Such systems have received increased attention due to their empirically demonstrated range of capabilities and beneficial characteristics, such as their robustness to environmental perturbations and individual robot failure and their scalability to large numbers of robots. However, little work has been done in investigating ways to endow such an LMMS with the capability to achieve a desired division of labor over a set of dynamically evolving concurrent tasks, a necessity in any task-achieving LMMS. Such a capability can help to increase the efficiency and robustness of overall task performance as well as open new domains in which LMMS can be seen as a viable alternative to more complex control solutions. In this paper we present a method for achieving a desired division of labor in an LMMS, experimentally validate it in a realistic simulation, and demonstrate its potential to scale to large numbers of robots and its ability to adapt to environmental perturbations.

I. Introduction and Motivation

   A Large-Scale Minimalist Multi-Robot System (LMMS) is a multi-robot system composed of a large number of robots, each having limited capabilities in terms of sensing, computational power, and communication range and bandwidth. We define a minimalist robot as one which maintains little or no state information, extracts limited, local, and noisy information from its available sensors, and lacks the capability for active communication with other robots. Due to these limited capabilities, the world in which a minimalist robot is situated is formally partially observable and highly non-stationary, and it is therefore not practical to assume that such a robot is capable of reliably knowing a significant portion of the current global state of the environment or of overall task progress.

   These limitations in sensing, communication, and computation preclude a minimalist robot from performing tasks requiring significant computation or communication capabilities. Nonetheless, minimalist robots have been shown to be highly effective at a number of collective tasks, such as multi-robot formation control (Fredslund & Mataric 2002), collection tasks (Goldberg & Mataric 2002), and robotic soccer (Werger 1999). A system composed of a large number of such minimalist robots has the potential of conferring advantages including increased robustness to individual robot failure, as no single robot is critical to task performance; the prospect of scaling to increasingly larger numbers of robots, as there are few bottlenecks in terms of complex communication, planning, or coordination requirements; and increased adaptability to changes in the environment, since individuals act based on local information and are not tied to globally coordinated plans.

   The aim of this work is to investigate a method by which to endow an LMMS with the capability to achieve a desired division of labor over a set of dynamically evolving concurrent tasks, a critical requirement of any task-achieving large-scale multi-robot system. We define division of labor as the phenomenon in which individuals in a multi-robot system concurrently execute a set of tasks. The division of labor may need to be continuously adjusted in response to changes in the task environment or group performance. The broader scope of this work is in understanding ways in which to achieve robust, scalable, and efficient coordination in an LMMS.

   This paper is organized as follows. In Section II we provide the relevant related work. In Section III we give a detailed description of the concurrent foraging task domain we use as validation of our division of labor mechanism. In Section IV we describe our experimental setup used for empirical evaluation of our work. In Section V we present the robot controller we use to produce a division of labor in an LMMS. In Section VI we describe and analyze experimental results, and in Section VII we draw conclusions.
II. Related Work

   Here we summarize briefly the related work in physical LMMS using robots with similar capabilities to those on which our system is based. Mataric (1995) provides early work on group coordination in LMMS using a collection of simple basis behaviors. Theraulaz, Goss, Gervet & Deneubourg (1991) and Agassounon & Martinoli (2002) present minimalist methodologies for coordination in robot groups. Beckers, Holland & Deneubourg (1994) demonstrate the capabilities of minimalist multi-robot systems in object clustering and sorting. Kube & Zhang (1996) present an approach to box-pushing using a group of robots with simple sensors and reactive control. Werger & Mataric (1996) present a minimalist solution in the multi-robot foraging domain. Martinoli, Ijspeert & Mondada (1999) present work on the probabilistic modeling of robot behavior in the task regulation domain, demonstrating its performance as compared to experiments on physical and simulated robots. Werger (1999) presents coordinated behavior in a robot soccer team using a minimalist behavior-based control system. Krieger & Billeter (2000) present a decentralized task allocation mechanism for large mobile robot groups based on individual task-associated response thresholds in a collection domain. Holland & Melhuish (2000) use probabilistic behavior selection in minimalist robotic clustering and sorting. Goldberg & Mataric (2002) precisely define the foraging task for LMMS and provide a collection of general distributed behavior-based algorithms and their empirical evaluation. Fredslund & Mataric (2002) present work on the problem of achieving coordinated behavior in the context of formations using a distributed group of physical robots using only local sensing and minimal communication.

   In the multi-robot literature, there is work on more communication- and computationally complex forms of task regulation in multi-robot systems through the use of publish/subscribe and market-based methods (Gerkey & Mataric 2002) and systems in which significant global state is made known to all robots (Parker 1998).

   There is related work in the area of research that studies and simulates insect colonies and their behaviors. Theraulaz, Bonabeau & Deneubourg (1998) describe how the adaptability of complex social insect societies is increased by allowing members of the society to dynamically change tasks (behaviors) when necessary. Giving that ability to robots allows an LMMS to operate in domains requiring the simultaneous regulation of many tasks. Bonabeau, Theraulaz & Deneubourg (1996) describe a model of a task regulation mechanism in insect societies through the use of response thresholds for task-related stimuli. Theraulaz et al. (1998) extended that model by introducing an adaptive threshold that changes over time based on individual task performance.

   The division of labor mechanism we present can be considered an instance of a response threshold model as presented in Bonabeau et al. (1996), Krieger & Billeter (2000), Theraulaz et al. (1998), and Agassounon & Martinoli (2002). However, our task domain and division of labor mechanism differ in that the task-related stimuli are perceived locally by the individual robots and are not altered as a result of task performance. Furthermore, the individual robots are initially homogeneous, as opposed to Krieger & Billeter (2000), in which robots are initially assigned different response thresholds, and the robots do not learn or become specialized through adaptive response thresholds as is the case in Theraulaz et al. (1998) and Agassounon & Martinoli (2002).

III. Concurrent Foraging Task Domain

   In order to experimentally study a mechanism for providing an LMMS with division of labor capabilities, we investigated division of labor in a concurrent foraging task domain. Concurrent foraging, a variation on traditional foraging, consists of an arena populated by multiple types of objects to be collected. Each robot is equally capable of foraging all object types, but can only be allocated to foraging for one type at any given time. Additionally, all robots are engaged in foraging at all times (i.e., a robot cannot be idle). A robot may switch the object type according to its control policy, when it determines it is appropriate to do so. It is desirable for a robot to avoid thrashing (i.e., wasting time and energy) by needlessly switching the object type for which it is foraging.

A. Task Description

   Our experimental domain of concurrent foraging requires multiple object (puck) types to be foraged from a circular arena. Initially, the arena is randomly populated by two types of pucks, Puck_Red and Puck_Green, which are distinguishable by their color.

   In this task, the robots move around an enclosed arena and pick up encountered pucks. When a robot picks up a puck, the puck is consumed (i.e., it is immediately removed from the environment, not transported to another region) and the robot carries on foraging for other pucks. Immediately after a puck is consumed, another puck of the same type is placed in the arena at a random location. The reason for this replacement is to maintain a constant puck density in the arena through the course of an experiment. In some situations, the density of pucks can have an effect on the division of labor performance. This is an important consideration in mechanisms for division of labor in LMMS for many domains; however, in this work we want to limit the number of experimental variables impacting system performance. Therefore, we reserve the investigation of the impact of varying puck densities on division of labor in LMMS for future work.
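The constant-density replacement scheme described above can be sketched as follows. The Arena class, its method names, and the arena dimensions are illustrative assumptions, not code from the paper; only the bookkeeping (a consumed puck is respawned as a new puck of the same type at a random location) follows the text.

```python
import random

# Minimal sketch of the arena bookkeeping: consuming a puck respawns
# one of the same type elsewhere, so the number and proportion of each
# puck type stay constant over the course of an experiment.

class Arena:
    def __init__(self, n_red, n_green, size=10.0):
        self.size = size
        self.pucks = ([("red", self._random_position()) for _ in range(n_red)] +
                      [("green", self._random_position()) for _ in range(n_green)])

    def _random_position(self):
        return (random.uniform(0, self.size), random.uniform(0, self.size))

    def consume(self, index):
        # Remove the puck, then respawn one of the same type at random.
        color, _ = self.pucks.pop(index)
        self.pucks.append((color, self._random_position()))

    def proportion(self, color):
        return sum(1 for c, _ in self.pucks if c == color) / len(self.pucks)
```

Keeping the proportions fixed in this way isolates the division of labor mechanism from puck-density effects, as the paragraph above explains.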
   The division of labor portion of the task requires the robots to split their numbers by having some forage for Puck_Red pucks and others for Puck_Green pucks. For the purpose of our experiments, we desire a division of labor such that the proportion of robots foraging for Puck_Red pucks is equal to the proportion of Puck_Red pucks present in the foraging arena (e.g., if Puck_Red pucks make up 30% of the pucks present in the foraging arena, then 30% of the robots should be foraging for Puck_Red pucks). In general, the desired division of labor could take other forms. For example, it could be related to the relative reward or cost of foraging each puck type without change to our approach.

   As was stated earlier, due to their minimalist capabilities, individual robots do not have direct access to global information such as the size and shape of the foraging arena, the initial or current number of pucks to be foraged (total or by type), or the initial or current number of foraging robots (total or by foraging type). Also, it cannot be assumed that any robot or subset of robots will always be operational or that the proportion of pucks will remain constant over time.

Fig. 1. Close-up of the arena used in experiments; example pucks and robots are shown.

IV. Simulation Environment

   All simulations were performed using Player and Stage. Player (Gerkey, Vaughan, Stoy, Howard, Sukhatme & Mataric 2001) is a server that connects robots, sensors, and control programs over the network. Stage (Vaughan 2000) simulates a set of Player devices. Together, the two represent a high-fidelity simulation tool for individual robots and robot teams which has been validated on a collection of real-world robot experiments using Player and Stage programs transferred directly to physical Pioneer 2DX mobile robots.

   The experimental arena (a close-up is shown in Figure 1) was used in all experiments presented in this paper. The arena, shown populated with robots and pucks, is circular and has an area of approximately 315 square meters.

V. The Robots

   The robots used in the experimental simulations are realistic models of the ActivMedia Pioneer 2DX mobile robot. Each robot, approximately 30 cm in diameter, is equipped with a differential drive, an odometry system using wheel rotation encoders, 8 evenly spaced sonars covering the front 180 degrees used for obstacle avoidance, and a forward-looking Sony color camera with a 60-degree field of view and a color blob detection system (used for puck and robot detection and classification through color). Each robot is also equipped with a 2-DOF gripper on the front, capable of picking up a single puck at a time. There is no capability available for explicit, direct communication between robots, nor can pucks and other robots be uniquely identified.

A. Behavior-Based Controller

   All robots have identical behavior-based controllers consisting of the following mutually exclusive behaviors: Avoiding, Wandering, Visual Servoing, Grasping, and Observing. Descriptions of the behaviors used in the division of labor implementation are given below.

   - The Avoiding behavior causes the robot to turn to avoid obstacles in its path.
   - The Wandering behavior causes the robot to move forward and, after a random length of elapsed time, turn left or right through a random arc for a random period of time.
   - The Visual Servoing behavior causes the robot to move toward a detected puck of the desired type. If the robot's current foraging state is Robot_Red, the desired puck type is Puck_Red, and if the robot's current foraging state is Robot_Green, the desired puck type is Puck_Green.
   - The Grasping behavior causes the robot to use the gripper to pick up and consume a puck within the gripper's grasp.
   - The Observing behavior causes the robot to take an
     image from its camera and record the detected pucks and robots to their respective histories. The robot then updates its foraging state based on the puck and robot histories. A description of the histories is given in Section V-B and a description of the foraging state update procedure is given in Section V-C.

   Each behavior listed above has a set of activation conditions based on relevant sensor inputs and state values. When met, the conditions cause the behavior to become active. A description of when each activation condition is active is given below. The activation conditions of all behaviors are shown in Table I.

   - The Obstacle Detected activation condition is true when an obstacle is detected by the sonar within a distance of 1 meter. Pucks are not detectable by the sonar and are therefore not considered obstacles.
   - The Puck_Det Detected activation condition is true if the robot's current foraging state is Robot_Det and a puck of type Puck_Det (where Det is Red or Green) is detected by the color camera within a distance of approximately 5 meters and within ±30 degrees of the robot's direction of travel.
   - The Gripper Break-Beam On activation condition is true if the break-beam sensor between the gripper jaws detects an object.
   - The Observation Signal activation condition is true if the distance traveled by the robot according to odometry since the last time the Observing behavior was activated is greater than 2 meters.

B. State Information

   The robots maintain three types of state information: foraging state, observed puck history, and observed robot history. The foraging state identifies the type of puck the robot is currently involved in foraging. A robot with a foraging state of Robot_Red refers to a robot engaged in foraging Puck_Red pucks and a foraging state of Robot_Green refers to a robot engaged in foraging Puck_Green pucks.

   Each robot is outfitted with a colored beacon observable by nearby robots which indicates the robot's current foraging state. The color of the beacon changes to reflect the current state: a red beacon for a foraging state of Robot_Red and a green beacon for Robot_Green. Thus, the colored beacon acts as a form of local, passive communication conveying the robot's current foraging state. All robots maintain a limited, constant-sized history storing the most recently observed puck types and another constant-sized history storing the foraging state of the most recently observed robots. Neither of these histories contains a unique identity or location of detected pucks or robots, nor do they store a time stamp of when any given observation was made. The history of observed pucks is limited to the last MAX-PUCK-HISTORY pucks observed and the history of the foraging state of observed robots is limited to the last MAX-ROBOT-HISTORY robots observed.

   While moving about the arena, each robot keeps track of the approximate distance it has traveled by using odometry measurements. At every interval of 2 meters traveled, the robot makes an observation. An observation consists of the robot taking the current image from its color camera and, using simple color blob detection, classifying all currently visible pucks and robots through their respective colors and adding them to their respective histories. This procedure is nearly instantaneous; therefore, the robot's behavior is not outwardly affected. The area in which pucks and other robots are visible is within 5 meters and ±30 degrees of the robot's direction of travel. Observations are only made after traveling 2 meters because updating too frequently leads to over-convergence of the estimated puck and robot type proportions due to repeated observations of the same pucks and/or robots. On average, during our experiments, a robot detected 2 pucks and robots per observation.

C. Foraging State Transition Functions

   After it makes an observation, the robot re-evaluates its current foraging state given the newly updated puck and robot histories. We have experimented with several functions, given below, to determine the conditions at which a robot should change its current foraging state.

   The first method is a simple step transition function. The condition in which a robot with a current foraging state of Robot_Green will change its foraging state to Robot_Red is given by Change(Green-Red), shown in Equation 1. Similarly, the condition in which a robot with a current foraging state of Robot_Red will change its foraging state to Robot_Green is given by Change(Red-Green), shown in Equation 2. In Equations 1 and 2, RR is the proportion of Robot_Red entries in the Robot History and RP is the proportion of Puck_Red entries in the Puck History.

      Change(Green-Red) = 1 if RR < RP, 0 otherwise        (1)

      Change(Red-Green) = 1 if RR > RP, 0 otherwise        (2)

   The second method we explored uses a probabilistic transition function. The probability that a robot with a current foraging state of Robot_Green will change its foraging state to Robot_Red is given by the probability P(Green-Red), shown in Equation 3. Similarly, the probability that a robot with a current foraging state of Robot_Red will change its foraging state to Robot_Green is given by
                         Obstacle    Puck         Gripper Break-                                   Observation                       Active
                         Detected    Detected       Beam On                                          Signal                         Behavior
                            X           X               X                                              1                            Observing
                            1           X               X                                              X                            Avoiding
                            0           1               0                                              0                         Visual Servoing
                            0           X               1                                              0                            Grasping
                            0           X               X                                              X                           Wandering

                                                           TABLE I
B EHAVIOR A CTIVATION C ONDITIONS . B EHAVIORS ARE LISTED IN ORDER OF DECREASING RANK . H IGHER RANKING BEHAVIORS PREEMPT LOWER
   RANKING BEHAVIORS IN THE EVENT MULTIPLE ARE ACTIVE . X DENOTES THE ACTIVATION CONDITION IS IRRELEVANT FOR THE BEHAVIOR .




                                                                                                                  Division of Labor Using Step Transition Function (Puck Hist:2 Robot Hist:2)
the probability, P(Red-Green), shown in Equation 4. In                                              1
Equations 3 and 4, RR is the proportion of Robot                                                             Red Pucks
                                                                                                             Red Robots
entries in the Robot History and RP is the proportion of
                                                                                                   0.5
Puck      entries in the Puck History.

                                                                                                    0
                                                                                                         0    2000       4000       6000       8000       10000      12000      14000      16000   18000
                                                         (3)
                                                                   Proportion Puck or Robot Type
                                                                                                                 Division of Labor Using Step Transition Function (Puck Hist:10 Robot Hist:10)
                                                                                                    1


                                                         (4)
                                                                                                   0.5




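As a concrete illustration of the mechanisms above, the sketch below is our own illustrative code, not the authors' implementation: names such as ForagingController and arbitrate are assumptions. It implements the fixed-priority behavior arbitration of Table I, the bounded puck and robot observation histories, and the step and probabilistic transition rules of Equations 1-4, with the probabilistic switch assumed to fire with probability equal to the proportion gap.

```python
import random
from collections import deque

RED, GREEN = "red", "green"

def arbitrate(obstacle, puck, breakbeam, obs_signal):
    """Fixed-priority behavior selection following Table I (hypothetical
    helper): higher-ranked behaviors preempt lower-ranked ones."""
    if obs_signal:
        return "Observing"
    if obstacle:
        return "Avoiding"
    if puck and not breakbeam:
        return "Visual Servoing"
    if breakbeam:
        return "Grasping"
    return "Wandering"

class ForagingController:
    """Illustrative sketch of one robot's division-of-labor state."""

    def __init__(self, state, max_puck_history=10, max_robot_history=10):
        self.state = state  # current foraging state: RED or GREEN
        # Bounded histories: a full deque silently discards its oldest
        # entry, mirroring the constant-sized MAX-*-HISTORY buffers.
        self.puck_history = deque(maxlen=max_puck_history)
        self.robot_history = deque(maxlen=max_robot_history)

    def observe_puck(self, color):
        self.puck_history.append(color)

    def observe_robot(self, beacon_color):
        self.robot_history.append(beacon_color)

    def _proportions(self):
        # RR: proportion of Robot-Red entries in the robot history;
        # RP: proportion of Puck-Red entries in the puck history.
        rr = self.robot_history.count(RED) / len(self.robot_history)
        rp = self.puck_history.count(RED) / len(self.puck_history)
        return rr, rp

    def step_update(self):
        """Step transition rule (Equations 1 and 2)."""
        if not self.puck_history or not self.robot_history:
            return self.state
        rr, rp = self._proportions()
        if self.state == GREEN and rr < rp:
            self.state = RED    # too few red foragers for the red pucks seen
        elif self.state == RED and rr > rp:
            self.state = GREEN  # too many red foragers
        return self.state

    def probabilistic_update(self):
        """Probabilistic transition rule (Equations 3 and 4), assuming the
        switching probability equals the gap between the proportions."""
        if not self.puck_history or not self.robot_history:
            return self.state
        rr, rp = self._proportions()
        if self.state == GREEN and rr < rp:
            if random.random() < rp - rr:
                self.state = RED
        elif self.state == RED and rr > rp:
            if random.random() < rr - rp:
                self.state = GREEN
        return self.state
```

For example, a Robot-Green robot that has recently seen mostly red pucks but few red beacons switches immediately under the step rule, but only with probability RP - RR under the probabilistic rule, which is consistent with the fewer and smoother state changes reported for the probabilistic function below.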
VI. Experimental Results

   We experimentally validated our LMMS division of labor
mechanism using both the step and probabilistic transition
functions presented in Section V-C in the realistic simulated
environment described in Section IV. All experiments used 20
robots and 50 pucks, and all presented results have been
averaged over 20 experimental runs.
   To test the adaptability of the division of labor mechanism
to external perturbations in puck type proportions, they were
dynamically changed at various times during the experimental
trials. The experiments begin with 30% Puck-Red and 70%
Puck-Green pucks. At time 6000 seconds, the relative
proportions of pucks are changed to 80% Puck-Red and 20%
Puck-Green pucks, and at time 12000 seconds the relative
proportions are changed to 50% Puck-Red and 50% Puck-Green
pucks. The total number of pucks remains constant throughout
the experiment.
   The plots in Figures 2 and 3 show a comparison between the
performance of the step and the probabilistic transition
functions using MAX-PUCK-HISTORY and MAX-ROBOT-HISTORY values
of 2, 10, and 50. Both the step and probabilistic transition
functions converge to a stable division of labor. The use of
the step transition function leads to faster convergence to a
stable division of labor at the expense of more high-frequency
oscillation.
   When the puck proportions are equal, both transition
functions converge to the desired division of labor. As the
puck type proportions become more skewed, with one puck type
becoming significantly more prevalent than the other, the
systems using shorter history lengths tend to converge
slightly off of the desired division of labor. This is in part
due to the insufficient granularity in their puck and robot
histories.

Fig. 2. The proportion of Puck-Red pucks and robots foraging
for Puck-Red pucks over time when using the step transition
function with different puck and robot history lengths.
(Three panels: history lengths 2, 10, and 50; x-axis:
Simulation Time (seconds); y-axis: Proportion Puck or Robot
Type.)

   Another factor in evaluating the efficiency of these
methods is the frequency with which individual robots switch
between tasks. In some task domains, switching between tasks
can be very expensive and therefore should be avoided.
Figures 4 and 5 show the cumulative number of times the robots
change state during the course of the experiments. The data
points are obtained by summing the total number of foraging
state changes over the course of the previous 50 seconds of
the experiment (it is possible that a single robot could
change foraging state more than once during this interval). As
the plots show, the shorter the puck and robot history
lengths, the more foraging state changes occur. Also, for puck
and robot histories of the same length, the use of the
probabilistic transition function leads to fewer foraging
state changes than the step transition function.

Fig. 3. The proportion of Puck-Red pucks and robots foraging
for Puck-Red pucks over time when using the probabilistic
transition function with different puck and robot history
lengths.

Fig. 4. The number of foraging state changes when using the
step transition function with different puck and robot history
lengths. (Three panels: history lengths 2, 10, and 50; x-axis:
Simulation Time (seconds); y-axis: Foraging State Changes.)

   In general, shorter puck and robot history lengths result
in faster convergence to the desired division of labor but
lead to higher-frequency oscillations due to more frequent
changes in individual robot foraging state. For increasingly
small histories, convergence to a division of labor short of
the desired one is exhibited as the puck type proportions
become skewed. The probabilistic transition function results
in more stable convergence, with fewer robot state changes and
fewer high-frequency oscillations in the division of labor.
Which transition function and history lengths should be used
for an arbitrary task environment depends on factors such as
how quickly the environment changes, how often the environment
changes, and the expense of changing tasks.

VII. Conclusions

   We have presented a Large-Scale Minimalist Multi-Robot
System (LMMS), composed of 20 simulated mobile robots, in
which the individual robots maintain a minimal amount of state
information, extract a limited amount of information from
available sensors, and cannot actively or directly communicate
with other robots in the system. Using this LMMS, we have
demonstrated a method by which to achieve a desired division
of labor in a concurrent foraging task domain, experimentally
validated it in a realistic simulation, and demonstrated its
robustness and adaptability to environmental perturbations.

VIII. Acknowledgments

   This work is supported in part by DARPA TASK Grant
F30602-00-2-0573 and in part by NSF ITR Grant EIA-0121141.

IX. References

Agassounon, W. & Martinoli, A. (2002), Efficiency and
     robustness of threshold-based distributed allocation
     algorithms in multi-agent systems, in 'First Int. Joint
     Conf. on Autonomous Agents and Multi-Agent Systems', ACM
     Press, Bologna, Italy, pp. 1090-1097.
Beckers, R., Holland, O. & Deneubourg, J. (1994), From local
     actions to global tasks: Stigmergy and collective
     robotics, in R. Brooks & P. Maes, eds, 'Artificial Life
     IV: Proceedings of the Fourth International Workshop on
     the Synthesis and Simulation of Living Systems', MIT
     Press, Cambridge, MA, pp. 181-189.
Bonabeau, E., Théraulaz, G. & Deneubourg, J. (1996),
     'Quantitative study of the fixed threshold model for the
     regulation of division of labour in insect societies',
     Proceedings Royal Society of London B 263, 1565-1569.
Fredslund, J. & Matarić, M. J. (2002), 'A general, local
     algorithm for robot formations', IEEE Transactions on
     Robotics and Automation, Special Issue on Multi-Robot
     Systems 18(5), 837-846.
Fig. 5. The number of foraging state changes when using the
probabilistic transition function with different puck and
robot history lengths. (Three panels: history lengths 2, 10,
and 50; x-axis: Simulation Time (seconds); y-axis: Foraging
State Changes.)

Gerkey, B. P. & Matarić, M. J. (2002), 'Sold!: Auction
     methods for multi-robot coordination', IEEE Transactions
     on Robotics and Automation, Special Issue on Multi-Robot
     Systems 18(5), 758-768.
Gerkey, B., Vaughan, R., Støy, K., Howard, A., Sukhatme, G. &
     Matarić, M. (2001), Most valuable player: A robot device
     server for distributed control, in 'IEEE/RSJ
     International Conference on Intelligent Robots and
     Systems (IROS-01)', Maui, Hawaii, pp. 1226-1231.
Goldberg, D. & Matarić, M. J. (2002), Design and evaluation
     of robust behavior-based controllers for distributed
     multi-robot collection tasks, in T. Balch & L. E. Parker,
     eds, 'Robot Teams: From Diversity to Polymorphism', AK
     Peters, pp. 315-344.
Holland, O. & Melhuish, C. (2000), 'Stigmergy,
     self-organization, and sorting in collective robotics',
     Artificial Life 5(2), 173-202.
Krieger, M. J. B. & Billeter, J.-B. (2000), 'The call of duty:
     Self-organised task allocation in a population of up to
     twelve mobile robots', Robotics and Autonomous Systems
     30(1-2), 65-84.
Kube, C. & Zhang, H. (1996), The use of perceptual cues in
     multi-robot box-pushing, in 'IEEE International
     Conference on Robotics and Automation', Minneapolis,
     Minnesota, pp. 2085-2090.
Martinoli, A., Ijspeert, A. J. & Mondada, F. (1999),
     'Understanding collective aggregation mechanisms: From
     probabilistic modeling to experiments with real robots',
     Robotics and Autonomous Systems 29, 51-63.
Matarić, M. (1995), 'Designing and understanding adaptive
     group behavior', Adaptive Behavior 4(1), 51-80.
Parker, L. E. (1998), 'ALLIANCE: An architecture for fault
     tolerant multi-robot cooperation', IEEE Transactions on
     Robotics and Automation 14(2), 220-240.
Théraulaz, G., Bonabeau, E. & Deneubourg, J. (1998),
     'Threshold reinforcement and the regulation of division
     of labour in insect societies', Proceedings Royal Society
     of London B 265, 327-335.
Théraulaz, G., Goss, S., Gervet, J. & Deneubourg, J. (1991),
     Task differentiation in Polistes wasp colonies: A model
     for self-organizing groups of robots, in J. Meyer & S.
     Wilson, eds, 'International Conference on Simulation of
     Adaptive Behavior: From Animals to Animats', MIT Press,
     Paris, France, pp. 346-355.
Vaughan, R. (2000), 'Stage: A multiple robot simulator',
     Institute for Robotics and Intelligent Systems Technical
     Report IRIS-00-394, Univ. of Southern California.
Werger, B. B. (1999), 'Cooperation without deliberation: A
     minimal behavior-based approach to multi-robot teams',
     Artificial Intelligence 110, 293-320.
Werger, B. & Matarić, M. (1996), Robotic "food" chains:
     Externalization of state and program for minimal-agent
     foraging, in P. Maes, M. Matarić, J. Meyer, J. Pollack &
     S. Wilson, eds, 'From Animals to Animats 4, Fourth
     International Conference on the Simulation of Adaptive
     Behavior (SAB-96)', Cape Cod, Massachusetts, pp. 625-634.
To appear in the Intl. J. of Robotics Research
             Also Technical Report CRES-03-013, Center for Robotics and Embedded Systems, USC, July 2003




                      A formal analysis and taxonomy of task allocation
                                   in multi-robot systems
                            Brian P. Gerkey                                       Maja J Matarić
                       Artificial Intelligence Lab                         Computer Science Department
                          Stanford University                            University of Southern California
                    Stanford, CA 94305-9010, USA                        Los Angeles, CA 90089-0781, USA
                          gerkey@stanford.edu                                     mataric@cs.usc.edu



       Abstract

       Despite more than a decade of experimental work in
       multi-robot systems, important theoretical aspects of
       multi-robot coordination mechanisms have, to date, been
       largely untreated. To address this issue, we focus on
       the problem of multi-robot task allocation (MRTA). Most
       work on MRTA has been ad hoc and empirical, with many
       coordination architectures having been proposed and
       validated in a proof-of-concept fashion, but
       infrequently analyzed. With the goal of bringing
       objective grounding to this important area of research,
       we present a formal study of MRTA problems. A
       domain-independent taxonomy of MRTA problems is given,
       and it is shown how many such problems can be viewed as
       instances of other, well-studied, optimization
       problems. We demonstrate how relevant theory from
       operations research and combinatorial optimization can
       be used for analysis and greater understanding of
       existing approaches to task allocation, and show how
       the same theory can be used in the synthesis of new
       approaches.

       1 Introduction

       Over the past decade, a significant shift of focus has
       occurred in the field of mobile robotics as researchers
       have begun to investigate problems involving multiple,
       rather than single, robots. From early work on
       loosely-coupled tasks such as homogeneous foraging
       (Matarić 1992) to more recent work on team coordination
       for robot soccer (Stone & Veloso 1999), the complexity
       of the multi-robot systems being studied has increased.
       This complexity has two primary sources: larger team
       sizes and greater heterogeneity of robots and tasks. As
       significant achievements have been made along these
       axes, it is no longer sufficient to show, for example,
       a pair of robots observing targets or a large group of
       robots flocking as examples of coordinated robot
       behavior. Today we reasonably expect to see
       increasingly larger robot teams engaged in concurrent
       and diverse tasks over extended periods of time.

       1.1 Multi-Robot Task Allocation (MRTA)

       As a result of the growing focus on multi-robot
       systems, multi-robot coordination has received
       significant attention. In particular, multi-robot task
       allocation (MRTA) has recently risen to prominence and
       become a key research topic in its own right. As
       researchers design, build, and use cooperative
       multi-robot systems, they invariably encounter the
       fundamental question: "which robot should execute which
       task?" in order to cooperatively achieve the global
       goal. By "task," we mean a subgoal that is necessary
       for achieving the overall goal of the system, and that
       can be achieved independently of other subgoals (i.e.,
       tasks). Tasks can be discrete (e.g., deliver this
       package to room 101) or continuous (e.g., monitor the
       building entrance for intruders) and can also vary in a
       number of other ways, including timescale, complexity,
       and specificity. We do not categorize or distinguish
       between different kinds of robotic tasks, though others
       have done so (see Section 2). Task independence is a
       strong assumption, and one that clearly limits the
       scope of our study. For example, we do not allow
       ordering constraints on a set of tasks; in general we
       require that individual tasks can be considered and
       assigned independently of each other. This issue is
       addressed further in Section 7.2.
          In this work, we are concerned with methods for
       intentional cooperation (Parker 1998). In this model,
       robots cooperate explicitly and with purpose, often
       through task-related communication and negotiation.
       Intentional cooperation is clearly not a prerequisite
       for a multi-robot system to exhibit coordinated
       behavior, as demonstrated by minimalist or emergent
       approaches (Deneubourg, Theraulaz & Beckers 1991). In
       such systems, individuals coordinate their actions
       through their interactions with each other and with the
       environment, but without explicit negotiation or
       allocation of tasks. An open question is which tasks
       (if any) require intentional cooperation. For example,
       cooperative box-pushing has been demonstrated using
       both emergent (Kube & Zhang 1993) and intentional
       (Parker 1998) techniques, and there remains significant
       debate as to the relative value of the two approaches.
          However, emergent systems tend not to be amenable to
       analysis, with their exact behavior difficult, if not
       impossible, to predict. We assert that, as compared
       with emergent cooperation, intentional cooperation is
       usually better suited to the kinds of real-world tasks
       that humans might want robots to do. If the robots are
       deliberately cooperating with each other, then,
       intuitively, we expect that humans can deliberately
       cooperate with them, which is a long-term goal of
       multi-robot research. Furthermore, intentional
       cooperation has the potential to better exploit the
       capabilities of heterogeneous robot teams. In this
       work, the use of intentional cooperation is at the
       level of task allocation, and need not propagate to the
       level of task execution. Importantly, we do not
       prescribe or proscribe any particular method for
       implementing the details of a task. For example, if a
       foraging task is assigned to a team of robots because
       they are best fit for the job, they can execute the
       task in any way they wish, from probabilistic swarming
       to classical planning.

       1.2 Toward formal analysis

       do not currently exist good approximations; in such
       cases we provide formal characterizations of the
       problems but do not suggest how they should be solved.
          Our approach is not meant to be final or exhaustive
       and indeed it has limitations. However, we believe that
       the ideas we present constitute a starting point on a
       path toward a more complete understanding of problems
       involving MRTA, as well as other aspects of multi-robot
       coordination.

       2 Related work

       Research in multi-robot systems has focused primarily
       on construction and validation of working systems,
       rather than more general analysis of problems and
       solutions. As a result, in the literature, one can find
       many architectures for multi-robot coordination, but
       relatively few formal models of multi-robot
       coordination. We do not attempt here to cover the
       various proposed and demonstrated architectures. For a
       thorough treatment of implemented multi-robot systems,
       consult Cao, Fukunaga & Kahng (1997) or the more recent
       work of Dudek, Jenkin & Milios (2002). Each provides a
       taxonomy that categorizes the bulk of existing
       multi-robot systems along various axes, including team
       organization (e.g., centralized vs. distributed),
       communication topology (e.g., broadcast vs. unicast),
       and team composition (e.g., homogeneous vs.
       heterogeneous). Rather than characterize architec-
The question of task allocation must be answered, even              tures, we seek instead to categorize the underlying prob-
for relatively simple multi-robot systems, and its impor-           lems, although we do analyze and discuss several key ar-
tance grows with the complexity, in size and capability, of         chitectures that solve those problems; see Section 6.
the system under study. The empirically validated meth-                Formal models of coordination in multi-robot systems
ods demonstrated to date remain primarily ad hoc in na-             tend to target medium- to large-scale systems composed
ture, and relatively little has been written about the gen-         of simple, homogeneous robots, such as the CEBOTS
eral properties of cooperative multi-robot systems. After           (Fukuda, Nakagawa, Kawauchi & Buss 1988). Agas-
a decade of research, while cooperative architectures have          sounon & Martinoli (2002) explored the tradeoffs be-
been proposed, the field still lacks a prescription for how          tween using a coarse, macroscopic model of such sys-
to design a MRTA system. Similarly, there has been little           tems and using detailed, microscopic models of the indi-
attempt to evaluate or compare the proposed architectures,          viduals. Lerman & Galstyan (2002) presented a physics-
either analytically or empirically.                                 inspired macroscopic model of a cooperative multi-robot
   In this paper we present a particular taxonomy for               system and showed that it accurately described the behav-
studying MRTA, based on organizational theory from                  ior of physical robots engaged in stick-pulling and forag-
several fields, including operations research, economics,            ing tasks. That kind of model is descriptive but not pre-
scheduling, network flows, and combinatorial optimiza-               scriptive, in that it does not guide the design of control or
tion. We show how this taxonomy can be used to an-                  coordination mechanisms.
alyze and classify MRTA problems, and evaluate and                     Though simple and elegant, such models are insuffi-
compare proposed solutions. For the simpler (and most               cient for domains involving complex tasks or requiring
widely studied) problems, we provide a complete anal-               precise control. To study complex tasks, Donald, Jennings
ysis and prescribe provably optimal, yet tractable, algo-           & Rus (1997) proposed the formalism of information in-
rithms for their solution. For more difficult problems, we           variants, which models the information requirements of a
suggest candidate approximation algorithms that have en-            coordination algorithm and provides a mechanism to per-
joyed success in other application domains. There are also          form reductions between algorithms. Spletzer & Taylor
some extremely difficult MRTA problems for which there               (2001) developed a prescriptive control-theoretic model


Spletzer & Taylor (2001) developed a prescriptive control-theoretic model of multi-robot coordination and showed that it can be used to produce precise multi-robot box-pushing. Mason (1986) had earlier applied a similar control-theoretic model to box-pushing with dexterous manipulators.

Relatively little work has been done on formal modeling, analysis, or comparison of multi-robot coordination at the level of task allocation. Chien, Barrett, Estlin & Rabideau (2000) developed a baseline geological scenario and used it to compare three different planning approaches to coordinating teams of planetary rovers. Klavins (2003) showed how to apply the theory of communication complexity to the study of multi-robot coordination algorithms. Finally, Jennings & Kirkwood-Watts (1998) described the method of dynamic teams, concentrating on programmatic structures that enable the specification of multi-robot tasks.

Multi-robot systems can also be formally described by process models, such as Petri nets (Murata 1989) and Partially Observable Markov Decision Processes (Kaelbling, Littman & Cassandra 1998), both of which are highly expressive. Unfortunately, such models tend to be too complex to be directly analyzed or solved, even for modest-sized systems. Another formal model is that of the hybrid system (Alur, Courcoubetis, Halbwachs, Henzinger, Ho, Nicollin, Olivero, Sifakis & Yovine 1995), which characterizes discrete systems operating in an analog environment. Hybrid systems can also become complex and are usually used to describe or control the behavior of a single robot, via a so-called three-layer architecture (Gat 1998).

Our goal in this paper is to fill a gap in the existing literature on multi-robot coordination. We neither construct a formal model in support of a particular coordination architecture, nor compare different architectures in a particular task domain. Rather, we develop a task- and architecture-independent taxonomy, based on optimization theory, in which to study task allocation problems.

3 Utility

To treat task allocation in an optimization context, one must decide what exactly is to be optimized. Ideally the goal is to directly optimize overall system performance, but that quantity is often difficult to measure during system execution. Furthermore, when selecting among alternative task allocations, the impact on system performance of each option is usually not known. Consequently, some kind of performance estimate, such as utility, is needed.

Utility is a unifying, if sometimes implicit, concept in economics, game theory, and operations research, as well as in multi-robot coordination. It is based on the notion that each individual can internally estimate the value (or the cost) of executing an action. Depending on the context, utility is also called fitness, valuation, and cost. Within multi-robot research, the formulation of utility can vary from sophisticated planner-based methods (Botelho & Alami 1999) to simple sensor-based metrics (Gerkey & Matarić 2002b). We posit that utility estimation of this kind is carried out somewhere in every autonomous task allocation system, for the heart of any task allocation problem is comparison and selection among a set of available alternatives. Since each system uses a different method to calculate utility, we give the following generic and practical definition of utility for multi-robot systems.

It is assumed that each robot is capable of estimating its fitness for every task it can perform. This estimation includes two factors, which are both task- and robot-dependent:

  • expected quality of task execution, given the method and equipment to be used (e.g., the accuracy of the map that will be produced using a laser range-finder),

  • expected resource cost, given the spatio-temporal requirements of the task (e.g., the power that will be required to drive the motors and laser range-finder in order to map the building).

Given a robot R and a task T, if R is capable of executing T, then one can define, on some standardized scale, QRT and CRT as the quality and cost, respectively, expected to result from the execution of T by R. This results in a combined, nonnegative utility measure:

    URT = QRT − CRT   if R is capable of executing T and QRT > CRT
    URT = 0           otherwise

For example, given a robot A that can achieve a task T with quality QAT = 20 at cost CAT = 10 and a robot B that can achieve the same task with quality QBT = 15 at cost CBT = 5, there should be no preference between them when searching for efficient assignments, for:

    UAT = 20 − 10 = 10 = 15 − 5 = UBT.

Regardless of the method used for calculation, the robots’ utility estimates will be inexact due to sensor noise, general uncertainty, and environmental change. These unavoidable characteristics of the multi-robot domain will necessarily limit the efficiency with which coordination can be achieved. We treat this limit as exogenous, on the assumption that lower-level robot control has already been made as reliable, robust, and precise as possible and thus that we are incapable of improving it at the task allocation level. When we discuss “optimal” allocations, we mean “optimal” in the sense that, given the union of all information available in the system (with the concomitant noise, uncertainty, and inaccuracy), it is impossible to construct a solution with higher overall utility.
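As a concrete illustration (our own sketch, not code from the text), the combined utility measure and the robot A / robot B example can be written in a few lines:

```python
def utility(capable, quality, cost):
    """Combined, nonnegative utility URT of robot R for task T.

    Returns QRT - CRT when R is capable of executing T and the
    expected quality exceeds the expected cost; 0 otherwise.
    """
    if capable and quality > cost:
        return quality - cost
    return 0

# The example from the text: robots A and B are equally preferable.
U_AT = utility(True, 20, 10)   # QAT = 20, CAT = 10
U_BT = utility(True, 15, 5)    # QBT = 15, CBT = 5
assert U_AT == U_BT == 10
```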


This notion of optimality is analogous to that used in optimal scheduling (Dertouzos & Mok 1983).

It is important to note that utility is an extremely flexible measure of fitness that can encompass arbitrary computation. The only constraint on utility estimators is that they must each produce a single scalar value that can be compared for the purpose of ordering candidates for tasks. For example, if the metric for a particular task is distance to a location and a candidate robot employs a probabilistic localization mechanism, then a reasonable utility estimate might be to calculate the distance to the target using the center of mass of the current probability distribution. Other mechanisms, such as planning and learning, can likewise be incorporated into utility estimation. Regardless of the domain, it is vital that all relevant aspects of the state of the robots and their environment be included in the utility calculation. Signals that are left out of this calculation but are taken into consideration when evaluating overall system performance are what economists refer to as externalities (Simon 2001) and their effects can be detrimental, if not catastrophic.

4 Combinatorial optimization

Before entering into a discussion of task allocation problems as being primarily concerned with optimization, it will be necessary to provide some theoretical background. The field of combinatorial optimization provides a set-theoretic framework, based on subset systems, for describing a wide variety of optimization problems (Ahuja, Magnanti & Orlin 1993):

Definition (Subset System). A subset system (E, F) is a finite set of objects E and a nonempty collection F of subsets, called independent sets, of E that satisfies the property that if X ∈ F and Y ⊆ X then Y ∈ F.

That is, any subset of an independent set is also an independent set. A general maximization problem can be defined in the following way:

Definition (Subset Maximization). Given a subset system (E, F) and a utility function u : E → R+, find an X ∈ F that maximizes the total utility:

    u(X) = Σ_{e∈X} u(e)                                        (1)

The elements of F are usually not given directly, or at least are inconvenient to represent explicitly. Instead, it is assumed that an oracle is available that, given a candidate set X, can decide whether X ∈ F. The job of such an oracle, given a proposed solution, is to verify the feasibility of that solution. For many problems, this verification is computationally trivial when compared to the complexity of the optimization problem.

Given a maximization problem over a subset system, one can define algorithms that attempt to solve it. Of particular interest is the canonical Greedy algorithm (Ahuja et al. 1993):

Algorithm (The Greedy algorithm).

  1. Reorder the elements of E = {e1, e2, ..., en} such that u(e1) ≥ u(e2) ≥ ... ≥ u(en).

  2. Set X := ∅.

  3. For j = 1 to n: if X ∪ {ej} ∈ F then X = X ∪ {ej}.

This algorithm is an abstraction of the familiar and intuitive greedy algorithm for solving a problem: repeatedly take the best valid option. While the Greedy algorithm performs well on some optimization problems, it can do quite poorly on others. In particular, it performs well on certain subset systems that can be further classified as matroids:

Definition (Matroid). A subset system (E, F) is a matroid if, for each X, Y ∈ F with |X| > |Y|, there exists an x ∈ X \ Y such that Y ∪ {x} ∈ F.

That is, given two independent sets X and Y, with X larger than Y, Y can be “grown” by adding to it some element from X. With respect to the current discussion, an equivalent definition of a matroid is that a subset system (E, F) is a matroid if and only if the Greedy algorithm optimally solves the associated maximization problem (Korte & Vygen 2000). In the parlance of algorithmic analysis, matroids satisfy the greedy-choice property, which is a prerequisite for a greedy algorithm to produce an optimal solution (Cormen, Leiserson & Rivest 1997). Matroids are of particular interest precisely because their associated optimization problems are amenable to greedy solution.

While the Greedy algorithm does not optimally solve every maximization problem, it is useful to know how poor the greedy solution can be. For such purposes it is common to report a competitive factor for the sub-optimal algorithm. For a maximization problem, an algorithm is called α-competitive if, for any input, it finds a solution whose total utility is never less than 1/α of the optimal utility.
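The three steps of the Greedy algorithm translate directly into code. This sketch is our own illustration, with a uniform matroid as the example subset system; F is represented by its membership oracle:

```python
def greedy(E, u, independent):
    """The canonical Greedy algorithm over a subset system (E, F).

    `independent` is the oracle deciding whether a candidate set
    belongs to F. Elements are visited in order of decreasing
    utility (step 1); X starts empty (step 2); each element is kept
    only if the grown set remains independent (step 3).
    """
    X = set()
    for e in sorted(E, key=u, reverse=True):
        if independent(X | {e}):
            X = X | {e}
    return X

# Example: a uniform matroid, in which any set of at most k elements
# is independent. Because this is a matroid, Greedy is optimal here:
# it keeps the k highest-utility elements.
values = {'a': 3, 'b': 7, 'c': 5, 'd': 1}
X = greedy(values.keys(), values.__getitem__, lambda S: len(S) <= 2)
assert X == {'b', 'c'}
```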


5 A taxonomy of MRTA problems

We propose a taxonomy of MRTA problems based on axes laid out below. Our goals here are two-fold: 1) to show how various MRTA problems can be positioned in the resulting problem space; and 2) to explain how organizational theory relates to those problems and to proposed solutions from the robotics literature. In some cases, it will be possible to construct provably optimal solutions, while in others only approximate solutions are available. There are also some difficult MRTA problems for which there do not currently exist good approximations. When designing a multi-robot system, it is essential to understand what kind of task allocation problem is present in order to solve it in a principled manner.

We propose the following three axes for use in describing MRTA problems:

  • single-task robots (ST) vs. multi-task robots (MT): ST means that each robot is capable of executing at most one task at a time, while MT means that some robots can execute multiple tasks simultaneously.

  • single-robot tasks (SR) vs. multi-robot tasks (MR): SR means that each task requires exactly one robot to achieve it, while MR means that some tasks can require multiple robots.

  • instantaneous assignment (IA) vs. time-extended assignment (TA): IA means that the available information concerning the robots, the tasks, and the environment permits only an instantaneous allocation of tasks to robots, with no planning for future allocations. TA means that more information is available, such as the set of all tasks that will need to be assigned, or a model of how tasks are expected to arrive over time.

We denote a particular MRTA problem by a triple of two-letter abbreviations drawn from this list. For example, a problem in which multi-robot tasks must be allocated once to single-task robots is designated ST-MR-IA.

These axes are not meant to be exhaustive, but to allow for a taxonomy that is both broad enough and detailed enough to meaningfully characterize many practical MRTA problems. Furthermore, this taxonomy will often allow for a prescription of solutions. The following sections present the combinations allowed by these axes, discussing for each which MRTA problem(s) it represents and what organizational theory pertains. Section 7 treats some important MRTA problems that are not captured by this taxonomy.

5.1 ST-SR-IA: Single-task robots, single-robot tasks, instantaneous assignment

This problem is the simplest, as it is actually an instance of the Optimal Assignment Problem (OAP) (Gale 1960), which is a well-known problem that was originally studied in game theory and then in operations research, in the context of personnel assignment. A recurring special case of particular interest in several fields of study, this problem can be formulated in many ways. Given the application domain of MRTA, it is fitting to describe the problem in terms of jobs and workers.

Definition (Optimal Assignment Problem). Given m workers, each looking for one job, and n prioritized jobs, each requiring one worker. Also given for each worker is a nonnegative skill rating (i.e., utility estimate) that predicts his/her performance for each job; if a worker is incapable of undertaking a job, then the worker is assigned a rating of zero for that job. The goal is to assign workers to jobs so as to maximize overall expected performance, taking into account the priorities of the jobs and the skill ratings of the workers.

The OAP can be cast in many ways, including as an integral linear program (Gale 1960): find mn nonnegative integers αij that maximize

    U = Σ_{i=1}^{m} Σ_{j=1}^{n} αij Uij wj                     (2)

subject to

    Σ_{i=1}^{m} αij = 1,  1 ≤ j ≤ n
                                                               (3)
    Σ_{j=1}^{n} αij = 1,  1 ≤ i ≤ m.

The sum (2), in which wj is the priority of job j, is the overall system utility, while (3) enforces the constraint of working with single-worker jobs and single-job workers (note that since the αij are integers they must all be either 0 or 1). Given an optimal solution to this problem (i.e., a set of integers αij that maximizes (2) subject to (3)), an optimal assignment is constructed by assigning worker i to job j only when αij = 1.

The ST-SR-IA problem can be posed as an OAP in the following way: given m robots, n prioritized tasks, and utility estimates for each of the mn possible robot-task pairs, assign at most one task to each robot. If the robots’ utilities can be collected at one machine (or distributed to all machines), then a centralized linear programming approach (e.g., Kuhn’s (1955) Hungarian method) will find the optimal allocation in O(mn²) time.

Alternatively, a distributed auction-based approach (e.g., Bertsekas’s (1990) Auction algorithm) will find the optimal allocation, usually requiring time proportional to the maximum utility and inversely proportional to the minimum bidding increment.
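For small instances, the integral linear program (2)-(3) can be checked by brute force over permutations, since each permutation encodes a 0-1 matrix αij satisfying the constraints. This is our own illustrative sketch, exponential in n; the Hungarian method mentioned above solves the same problem in polynomial time:

```python
from itertools import permutations

def optimal_assignment(U, w):
    """Brute-force OAP for the symmetric case m = n.

    U[i][j] is robot i's utility for task j and w[j] is task j's
    priority. Each permutation p encodes an assignment alpha with
    alpha[i][p[i]] = 1, which satisfies constraints (3) by
    construction; we keep the permutation maximizing the sum (2).
    """
    n = len(U)
    best = max(permutations(range(n)),
               key=lambda p: sum(U[i][p[i]] * w[p[i]] for i in range(n)))
    return {i: best[i] for i in range(n)}   # robot i -> task best[i]

# Two robots, two equally weighted tasks: robot 0 takes task 0,
# since utility 10 + 2 = 12 beats 4 + 7 = 11.
U = [[10, 4],
     [7, 2]]
assert optimal_assignment(U, [1, 1]) == {0: 0, 1: 1}
```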


In order to understand such economically-inspired algorithms, it is necessary to consider the concept of linear programming duality. As do all maximum linear programs, the OAP has a dual minimum linear program, which can be stated as follows: find m integers ui and n integers vj that minimize:

    P = Σ_{i=1}^{m} ui + Σ_{j=1}^{n} vj                        (4)

subject to:

    ui + vj ≥ Uij,  ∀i, j.                                     (5)

The Duality Theorem states that the original problem (called the primal) and its dual are equivalent, and that the total utilities of their respective optimal solutions are the same (Gale 1960).

Optimal auction algorithms for task allocation usually work in the following way. Construct a price-based task market, in which tasks are sold by brokers to robots. Each task j is for sale by a broker, which places a value cj on the task. Also, robot i places a value hij on task j. The problem then is to establish task prices pj, which will in turn determine the allocation of tasks to robots. To be feasible, the price pj for task j must be greater than or equal to the broker’s valuation cj; otherwise, the broker would refuse to sell. Assuming that the robots are acting selfishly, each robot i will elect to buy a task ti for which its profit is maximized:

    ti = argmax_j {hij − pj}.                                  (6)

Such a market is said to be at equilibrium when prices are such that no two robots select the same task.

At equilibrium, each individual’s profit in this market is maximized. Furthermore, the profits made by the robots and the profits made by the brokers form an optimal solution to the dual of the OAP:

    ui = hiti − pti,  ∀i
                                                               (7)
    vj = pj − cj,  ∀j.

Thus, the allocation produced by the market at equilibrium is optimal (Gale 1960).

In MRTA problems, separate valuations are not given in this manner, but only combined utility estimates for robot-task pairs. However, task valuations can be defined for the robots and brokers as follows:

    hij = αij
                                                               (8)
    cj = 0.

The solution to the corresponding dual problem then becomes:

    ui = αiti − pti
                                                               (9)
    vj = pj.

Note that setting cj to 0 implicitly states that the brokers always prefer to sell their tasks, regardless of how much they are paid. In other words, it is always better to execute a task than not execute it, regardless of the expected performance. In economic terminology, those are lexicographic preferences with regard to the tasks (Pearce 1999). Such preferences violate important assumptions concerning the nature of utility values that are made when building or analyzing general economic systems. Fortunately, in constructing the market corresponding to the ST-SR-IA problem, no assumptions are made concerning the robots’ preferences, and so lexicographic preferences do not present a problem. On the other hand, the behavior of more complex, long-lived economies (such as the markets suggested by Dias & Stentz (2001) and Gerkey & Matarić (2002a)) may depend strongly on the nature of the robots’ preferences, especially if the synthetic economies are meant to interact with the human economy.

The two approaches (i.e., centralized and distributed) to solving the OAP represent a tradeoff between solution time and communication overhead. Centralized approaches generally run faster than distributed approaches, but incur a higher communication overhead. To implement a centralized assignment algorithm, n² messages are required to transmit the utility of each robot for each task; an auction-based solution usually requires far fewer (sometimes fewer than n) messages to reach equilibrium. With the addition of simple optimizations, such as buffering multiple utility values and transmitting them in one message, this gap in communication overhead will only become apparent in large-scale systems. Furthermore, the time required to transmit a message cannot be ignored, especially in wireless networks, which can induce significant latency. Thus, for small- to medium-scale systems, say n < 200, a broadcast-based centralized assignment solution is likely the better choice. Not surprisingly, many MRTA architectures implement some form of this approach (Parker 1998, Werger & Matarić 2001, Castelpietra, Iocchi, Nardi, Piaggio, Scalzo & Sgorbissa 2001, Weigel, Auerback, Dietl, Dümler, Gutmann, Marko, Müller, Nebel, Szerbakowski & Thiel 2001, Østergård, Matarić & Sukhatme 2001).

In order to determine its viability for solving MRTA problems, we implemented, in ANSI C, the Hungarian method¹ (Kuhn 1955). We tested on randomly generated symmetric assignment problems (i.e., problems where m = n) with uniformly distributed utilities, and found that the Hungarian method is easily fast enough to be used in the control loop in real-world MRTA domains.

¹ The code for our implementation is available from: http://robotics.stanford.edu/~gerkey.
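The price-based task market described above can be sketched as a simple auction in the style of Bertsekas (1990). This is our own minimal sketch; the bidding-increment choice and data layout are illustrative assumptions, and real implementations add refinements such as epsilon-scaling:

```python
def auction(h, eps=0.01):
    """Auction for the task market above: robot i values task j at
    h[i][j], brokers value every task at cj = 0, and the price of a
    task rises each time it is bid on. The market is at equilibrium
    when no two robots want the same task; the resulting assignment
    is optimal to within n * eps. Assumes n >= 2 tasks.
    """
    n = len(h)
    p = [0.0] * n               # task prices
    owner = [None] * n          # owner[j]: robot currently holding task j
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Robot i bids for the task ti maximizing its profit h[i][j] - p[j].
        profit = [h[i][j] - p[j] for j in range(n)]
        ti = max(range(n), key=lambda j: profit[j])
        second = max(profit[j] for j in range(n) if j != ti)
        # The price rises just enough to price out the previous owner,
        # who returns to the pool of unassigned bidders.
        p[ti] += profit[ti] - second + eps
        if owner[ti] is not None:
            unassigned.append(owner[ti])
        owner[ti] = i
    return owner, p

owner, prices = auction([[10.0, 4.0], [7.0, 2.0]])
assert owner == [0, 1]   # task 0 goes to robot 0, task 1 to robot 1
```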


tasks can be solved in less than 1ms and problems with              a frequency on the order of 10Hz. This utility-based dy-
300 robots and tasks can be solved in less than 1s.                 namic role assignment problem and has been studied by
                                                                    many (Stone & Veloso 1999, Weigel et al. 2001, Castelpi-
5.1.1 Variant: iterated assignment                                  etra et al. 2001, Emery, Sikorski & Balch 2002, Vail &
                                                                    Veloso 2003).
Few MRTA problems exhibit exactly the above one-
time assignment structure. However, many problems can
                                                                       It is common in the robot soccer domain for each robot
be framed as iterated instances of ST-SR-IA. Consider
                                                                    to calculate its utility for each role and periodically broad-
the cooperative multi-object tracking problem known as
                                                                    cast these values to its teammates. The robots can then ex-
CMOMMT, studied by Parker (1999) and Werger &
                                                                    ecute, in parallel, some centralized assignment algorithm.
       c
Matari´ (2001), which consists of coordinating robots to
                                                                    For example, Castelpietra et al.’s (2001) assignment algo-
observe multiple unpredictably moving targets. When
                                                                    rithm consists of ordering the roles in descending priority
presented with new sensor inputs (e.g., camera images)
                                                                    and then assigning each to the available robot with the
and consequent utility estimates (e.g., perceived distance
                                                                    highest utility. This algorithm is yet another instance of
to each target), the system must decide which robot should
                                                                    the Greedy algorithm. Vail & Veloso (2003) also employ
track which target.
                                                                    the Greedy algorithm with fixed priority roles. Weigel
                      c
   Werger & Matari´ ’s (2001) MRTA architecture, Broad-
                                                                    et al. (2001) employ a similar but slightly more sophis-
cast of Local Eligibility (BLE), solves this iterated assign-
                                                                    ticated algorithm that tries to address the problem of ex-
ment problem using the following algorithm:
                                                                    cessive role-swapping by imposing stricter prerequisites
Algorithm (BLE assignment algorithm).                               for reassignment. Among other things, the algorithm re-
                                                                    quires that both robots “want” to exchange roles in order
 1. If any robot remains unassigned, find the robot-task             to maximize their respective utilities, recalling the condi-
    pair (i, j) with the highest utility. Otherwise, quit.          tions for equilibrium in markets (see Section 5.1). How-
 2. Assign robot i to task j and remove them from con-              ever, Weigel et al.’s (2001) algorithm is not guaranteed to
    sideration.                                                     produce optimal assignments of roles, a fact that can eas-
                                                                    ily be shown by counterexample.
 3. Go to step 1.
   This algorithm is an instance of the canonical Greedy               Since the number of robots involved in many iterated
algorithm. The OAP is not a matroid (see Section 4) and             MRTA problems today is small (n ≤ 11 for robot soc-
so the Greedy algorithm will not necessarily produce an             cer, which is more than for most current multi-robot sys-
optimal solution. The Greedy algorithm is known to be 2-            tems), O(n3 ) optimal assignment algorithms could easily
competitive for the OAP (Avis 1983), and thus so is BLE.            replace the suboptimal ad hoc assignment algorithms that
That is, in the worst case, BLE will produce a solution             are typically used. As the performance results mentioned
whose benefit is 1 of the optimal benefit. Exactly this al-           in the previous section show, the Hungarian method can
                   2
gorithm, operating on a global blackboard, has been used            be used to solve typical problems in less than 1ms per it-
in a study of the impact of communication and coordi-               eration with the moderately powerful computers found on
nation on MRTA (Østerg˚ rd et al. 2001). A very similar
                            a                                       today’s robots.
assignment algorithm is also used by Botelho & Alami’s
(1999) M+ architecture.                                                Since there is some additional cost for running an opti-
   Parker’s (1998) MRTA architecture L-ALLIANCE,                    mal algorithm (if only in the work involved in the imple-
which can also perform iterated allocation, learns its as-          mentation), one might ask whether the optimal solution
signment algorithm from experience. The resulting algo-             provides a sufficient benefit. For example, it is known that
rithm is similar to, but potentially more sophisticated than,       for arbitrary assignment problems, the Greedy algorithm’s
the Greedy algorithm. If well-trained, the L-ALLIANCE               worst-case behavior is to produce a solution with half of
assignment algorithm can outperform the Greedy algo-                the optimal utility. However, it is not known how the
rithm (Parker 1994), but is not guaranteed to be optimal.           algorithm can be expected to perform on typical MRTA
   Another domain in which the iterated OAP arises is               problems, which exhibit some structure and are unlikely
robot soccer. Since many of the robots are interchange-             to present pathological utility combinations. Anecdotal
able, it is often advantageous to allow any player to take          evidence suggests that the Greedy algorithm works ex-
on any role within the team, according to the current sit-          tremely well on such problems. An interesting avenue
uation in the game. The resulting coordination problem              of research would be to analytically determine how well
can be cast as an iterated assignment problem in which              the Greedy algorithm will perform on the kinds of utility
the robots’ roles are periodically reevaluated, usually at          landscapes that are encountered in MRTA problems.


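The Greedy allocation loop used by BLE and the robot soccer systems above can be sketched in a few lines. This is an illustrative sketch, not any of the cited implementations; it assumes utilities are given as an m × n robot-by-task matrix:

```python
def greedy_assignment(utility):
    """BLE-style iterated greedy assignment: repeatedly take the
    highest-utility (robot, task) pair among those still unassigned."""
    free_robots = set(range(len(utility)))
    free_tasks = set(range(len(utility[0])))
    assignment = {}  # robot index -> task index
    while free_robots and free_tasks:
        # Step 1: find the best remaining robot-task pair.
        i, j = max(((r, t) for r in free_robots for t in free_tasks),
                   key=lambda pair: utility[pair[0]][pair[1]])
        # Step 2: assign it and remove both from consideration.
        assignment[i] = j
        free_robots.remove(i)
        free_tasks.remove(j)
    return assignment  # Step 3 ("go to step 1") is the loop itself.

# Two robots, two tasks: Greedy grabs the 10 first and is then forced
# onto the 1, for a total benefit of 11, although 9 + 9 = 18 is possible.
U = [[10, 9],
     [9, 1]]
print(greedy_assignment(U))  # {0: 0, 1: 1}
```

The 2 × 2 matrix illustrates the worst-case behavior discussed in the text: Greedy's benefit (11) is within a factor of 2 of the optimal benefit (18), consistent with the 2-competitive bound.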
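For comparison, the optimal assignment that the Hungarian method computes in O(n³) time can, for small n, be recovered by exhaustive search. The brute-force sketch below is for illustration only (the O(n!) enumeration is emphatically not the Hungarian method itself):

```python
from itertools import permutations

def optimal_assignment(utility):
    """Return the utility-maximizing robot -> task mapping for an n x n
    matrix by exhaustive search over all permutations. Illustration only:
    the Hungarian method reaches the same answer in O(n^3) time."""
    n = len(utility)
    best = max(permutations(range(n)),  # perm[i] = task given to robot i
               key=lambda perm: sum(utility[i][perm[i]] for i in range(n)))
    return {i: best[i] for i in range(n)}

# Same instance as the greedy worst case: the optimal pairing swaps the
# tasks, achieving benefit 9 + 9 = 18 where Greedy achieves only 11.
U = [[10, 9],
     [9, 1]]
print(optimal_assignment(U))  # {0: 1, 1: 0}
```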
5.1.2 Variant: online assignment

In some MRTA problems, the set of tasks is not revealed at once, but rather the tasks are introduced one at a time. If robots that have already been assigned cannot be reassigned, then this problem is a variant of ST-SR-IA, known as online assignment (Kalyanasundaram & Pruhs 1993). Instead of being initially given, the robot-task utility matrix is revealed one column (or row) at a time. If previously assigned robots can be reassigned, then the problem reduces to an instance of the iterated ST-SR-IA problem, which can be optimally solved with standard assignment algorithms.

   The MRTA problems solved by Gerkey & Matarić's (2002b) MURDOCH system, in which tasks are randomly injected into the system over time, are instances of the online assignment problem. The MURDOCH assignment algorithm can be stated as follows:

Algorithm (MURDOCH assignment algorithm).

 1. When a new task is introduced, assign it to the most fit robot that is currently available.

   This simple algorithm is yet another instance of the Greedy algorithm, and is known in the context of network flows as the Farthest Neighbor algorithm. Not surprisingly, the online assignment problem is not a matroid; the Greedy algorithm is known to be 3-competitive with respect to the optimal post hoc offline solution. Furthermore, this performance bound is the best possible for any online assignment algorithm (Kalyanasundaram & Pruhs 1993). Thus, without a model of the tasks that are to be introduced, and without the option of reassigning robots that have already been assigned, it is impossible to construct a better task allocator than MURDOCH.

5.2 ST-SR-TA: Single-task robots, single-robot tasks, time-extended assignment

When the system consists of more tasks than robots, or if there is a model of how tasks will arrive, then the robots' future utilities for the tasks can be predicted with some accuracy, and the problem is an instance of ST-SR-TA. This problem is one of building a time-extended schedule of tasks for each robot, with the goal of minimizing total weighted cost. Using Brucker's (1998) terminology, this problem is an instance of the class of scheduling problems

      R || Σ w_j C_j.

That is, the robots execute tasks in parallel (R) and the optimization criterion is the weighted sum of execution costs (Σ w_j C_j). Problems in this class are strongly NP-hard (Bruno, Coffman & Sethi 1974). Even for relatively small problems, the exponential space of possible schedules precludes enumerative solutions.

   A means of treating ST-SR-TA is to ignore the time-extended component and approximate the problem as an instance of the ST-SR-IA problem (Section 5.1), followed by an instance of the online assignment problem (Section 5.1.2). For example, given m robots and n tasks, with n > m, the following approximation algorithm can be used:

Algorithm (ST-SR-TA approximation algorithm).

 1. Optimally solve the initial m × n assignment problem.
 2. Use the Greedy algorithm to assign the remaining tasks in an online fashion, as the robots become available.

   The performance of this algorithm is bounded below by the normal Greedy algorithm, which is 3-competitive for online assignment. The more tasks that are assigned in the first step, the better this algorithm will perform. As the difference between the number of robots and the number of tasks that are initially presented decreases (i.e., (n − m) → 0), performance approaches optimality, wherein all tasks are assigned in one step. Thus, although it is not guaranteed to produce optimal solutions, this algorithm should work well in practice, especially for ST-SR-TA problems with short time horizons.

   Another way to approach this problem is to employ an iterative task allocation system, such as Dias & Stentz's (2001) price-based market. The robots would opportunistically exchange tasks over time, thereby modifying their schedules. This idea is demonstrated by the multi-robot exploration system described by Zlot, Stentz, Dias & Thayer (2002). However, without knowledge of the exact criteria used to decide when and with whom each robot will trade, it is impossible to determine the algorithmic characteristics (including solution quality) of this method.

5.2.1 Variant: ALLIANCE Efficiency Problem

Parker (1995) formulated a related MRTA problem called the ALLIANCE Efficiency Problem (AEP). Given is a set of tasks making up a mission, and the objective is to allocate a subset of these tasks to each robot so as to minimize the maximum time taken by a robot to serially execute its allocated tasks. Thus in order to solve the AEP, one must construct a time-extended schedule of tasks for each robot. This problem is an instance of the class of scheduling problems:

      R || C_max.

Problems in this class are known to be strongly NP-hard (Garey & Johnson 1978). Parker (1995) arrived at the same conclusion regarding the AEP, by reduction from the NP-complete problem PARTITION.

   To attack the AEP, Parker (1998) used a learning approach, in which the robots learn both their utility estimates and their scheduling algorithms from experience. When trained for a particular task domain, this system has the potential to outperform the approximation algorithm described above (but it is not guaranteed to do so).

5.3 ST-MR-IA: Single-task robots, multi-robot tasks, instantaneous assignment

Many MRTA problems involve tasks that require the combined effort of multiple robots. In such cases, we must consider combined utilities of groups of robots, which are in general not sums over individual utilities; utility may be defined arbitrarily for each potential group. For example, if a task requires a particular skill or device, then any group of robots without that skill or device has zero utility with respect to that task, regardless of the capabilities of the other robots in the group. This kind of problem is significantly more difficult than the previously discussed MRTA problems, which were restricted to single-robot tasks. In the multi-agent community, the ST-MR-IA problem is referred to as coalition formation, and has been extensively studied (Sandholm & Lesser 1997, Shehory & Kraus 1998).

   It is natural to think of the ST-MR-IA problem as splitting the set of robots into task-specific coalitions. A relevant concept from set theory is that of a set partition. A family X is a partition of a set E if and only if the elements of X are mutually disjoint and their union is E:

      ∩_{x∈X} x = ∅
                                                  (10)
      ∪_{x∈X} x = E.

   With the idea of partitions in mind, a well-known problem in combinatorial optimization called the (maximum utility) Set Partitioning Problem, or SPP (Balas & Padberg 1976), is relevant:

Definition (Set Partitioning Problem (SPP)). Given a finite set E, a family F of acceptable subsets of E, and a utility function u : F → R+, find a maximum-utility family X of elements in F such that X is a partition of E.

   The ST-MR-IA problem can be cast as an instance of SPP, with E as the set of robots, F as the set of all feasible coalition-task pairs, and u as the utility estimate for each such pair.

   Unfortunately, the SPP is strongly NP-hard (Garey & Johnson 1978). Fortunately, the problem has been studied in depth (Atamtürk, Nemhauser & Savelsbergh 1995), especially in the context of solving crew scheduling problems for airlines (Marsten & Shepardson 1981, Hoffman & Padberg 1993). As a result, many heuristic SPP algorithms have been developed.

   It remains to be seen whether such heuristic algorithms are applicable to MRTA problems. Some approximation algorithms, including those of Hoffman & Padberg (1993) and Atamtürk et al. (1995), have been shown to produce high-quality solutions to many instances of SPP. Even with hundreds of rows/columns and using mid-1990s workstation-class machines, these algorithms require at most tens of seconds to arrive at a solution. With ever-increasing computational power available on robots, it seems plausible that SPP approximation algorithms could be used to solve small- and medium-scale instances of the ST-MR-IA problem. To this end, a potentially important question is whether and how these algorithms can be parallelized. Shehory & Kraus (1998) showed how to implement a parallel SPP algorithm for coalition formation in a multi-agent context. Another important point is that, in order to apply certain SPP algorithms to ST-MR-IA problems, it may be necessary to enumerate a set of feasible coalition-task combinations. In the case that the space of such combinations is very large, there is a need to prune the feasible set; pruning can take advantage of sensor-based metrics such as physical distance (e.g., if two robots are more than 50 meters apart, then disallow any coalitions that contain them both).

5.4 ST-MR-TA: Single-task robots, multi-robot tasks, time-extended assignment

The ST-MR-TA class of problems includes both coalition formation and scheduling. For example, consider the problem of delivering a number of packages of various sizes from a single distribution center to different destinations. The number of packages and their destinations are known in advance, as is the size of each package, which determines the number of robots required to carry it. Given a pool of robots, the problem is to build a delivery schedule for the packages, while guaranteeing that a team of the appropriate size is assembled for each package.

   To produce an optimal solution, all possible schedules for all possible coalitions must be considered. This problem is NP-hard. If the coalitions are given, with no more than one coalition allowed for each task, the result is an instance of a multiprocessor scheduling problem:

      MPTm || Σ w_j C_j.

Even with two processors (MPT2 || Σ w_j C_j), this problem is strongly NP-hard (Hoogeveen, van de Velde & Veltman 1994), as is the unweighted version (MPT2 || Σ C_j) (Cai, Lee & Li 1998). With three processors, the maximum finishing time version (MPT3 || C_max) is also strongly NP-hard (Hoogeveen et al. 1994).

   A means of treating ST-MR-TA is to ignore the time-extended component and approximate the problem as an instance of iterated ST-MR-IA. For this purpose, a greedy approximation algorithm akin to the one given above for the ST-SR-TA problem can be employed. Unfortunately, the quality of such an approximation is difficult to determine. Another approach is to employ a leader-based mechanism to dynamically form coalitions and build task schedules for them, as described by Dias & Stentz (2002). However, the performance and overhead of this method will also be difficult, if not impossible, to predict without detailed information about the implementation (how many and which robots will be leaders, how does a leader select among candidate coalitions, how long do coalitions persist, etc.).

5.5 MT-SR-IA & MT-SR-TA: Multi-task robots, single-robot tasks

The MT-SR-IA and MT-SR-TA problems are currently uncommon, as they assume robots that can each concurrently execute multiple tasks. Today's mobile robots are generally actuator-poor. Their ability to affect the environment is typically limited to changing position, so they can rarely execute more than one task at a time. However, there are sensory and computational tasks that fit the MT-SR-IA or MT-SR-TA models quite well.

   Solving the MT-SR-IA problem is equivalent to solving the ST-MR-IA problem (see Section 5.3), with the robots and tasks interchanged in the SPP formulation. Likewise, the MT-SR-TA problem is equivalent to the ST-MR-TA problem (see Section 5.4). Thus the analysis and algorithms provided for the multi-robot task problems also directly apply here to the multi-task robot problems.

5.6 MT-MR-IA: Multi-task robots, multi-robot tasks, instantaneous assignment

When a system consists of both multi-task robots and multi-robot tasks, the result is an instance of the MT-MR-IA problem. For example, consider the allocation of surveillance tasks to a team of robots in an office building. Each robot continuously patrols a fixed portion of the building. Due to computational and/or sensory limitations, each robot can simultaneously detect only a limited number of environmental events (e.g., suspicious person, smoke, open door). Given a set of events to look for, and knowledge about where in the building each event is likely to occur, which robots should be tasked to look for each event?

   A relevant concept from set theory is the set cover. A family X is a cover of a set E if and only if the union of the elements of X is E:

      ∪_{x∈X} x = E.                              (11)

As compared with a partition (see Section 5.3), the subsets in a cover need not be disjoint. A well-known problem in combinatorial optimization called the (minimum cost) Set Covering Problem, or SCP (Balas & Padberg 1972), is relevant:

Definition (Set Covering Problem (SCP)). Given a finite set E, a family F of acceptable subsets of E, and a cost function c : F → R+, find a minimum-cost family X of elements in F such that X is a cover of E.

   The MT-MR-IA problem can be cast as an instance of the SCP, with E as the set of robots, F as the set of all feasible (and possibly overlapping) coalition-task pairs, and c as the cost estimate for each such pair.

   Though superficially similar to the SPP, the SCP is in fact a "distant relative," with the solution space of the SCP being far less constrained (Balas & Padberg 1976). The two problems are similar in that the SCP is also strongly NP-hard (Korte & Vygen 2000).

   Chvátal (1979) developed a greedy approximation algorithm for the SCP. The competitive factor for this algorithm is logarithmic in the size of the largest feasible subset (i.e., max_{f∈F} |f|), and the running time is polynomial in the number of feasible subsets (i.e., |F|). Bar-Yehuda & Even (1981) present another heuristic set covering algorithm, whose competitive factor is the maximum number of subsets to which any element belongs (i.e., max_{e∈E} |{f ∈ F : e ∈ f}|), and whose running time is the sum of the sizes of the feasible subsets (i.e., Σ_{f∈F} |f|) (Korte & Vygen 2000).

   The important trend to note is that these heuristic algorithms perform well when the space of feasible subsets is limited, and that they perform poorly in the most general case of the SCP, with all subsets allowed. For MRTA, this result suggests that such algorithms would best be applied in environments in which the space of possible coalitions is naturally limited, as is the case with heterogeneous and/or physically distantly separated robots. In the case of equally-skilled collocated robots, these algorithms would tend to run slowly and produce poor-quality solutions.

   To the authors' knowledge, set covering algorithms have not been applied to MRTA problems, and it is an open question as to whether such an application would be beneficial. However, Shehory & Kraus (1996) successfully adapted and distributed Chvátal's (1979) approximation algorithm for use in multi-agent systems, which suggests that SCP algorithms may indeed be viable for MRTA problems.

5.7 MT-MR-TA: Multi-task robots, multi-robot tasks, time-extended assignment

We can extend the surveillance domain described in the previous section by specifying that certain events need not be monitored immediately or continuously, but according to some predefined schedule. For example, "the left wing of the building should be checked every hour for open doors." The result is an MT-MR-TA problem, which is an instance of a scheduling problem with multiprocessor tasks and multipurpose machines:

      MPTm MPMn || Σ w_j C_j.

This problem is strongly NP-hard, because it includes as a special case the strongly NP-hard scheduling problem MPT2 || Σ w_j C_j. We are not aware of any heuristic or approximation algorithms for this difficult problem.

6 Analysis of existing approaches

Presumably because it is the simplest case of MRTA, the ST-SR-IA problem has received the most attention from the research community. Having developed a formal framework in which to study this MRTA problem, we can now apply it to an analysis of some of the key task allocation architectures from the literature. In this section six approaches to the ST-SR-IA problem are analyzed, focusing on the following three characteristics²:

 1. computation requirements
 2. communication requirements
 3. solution quality

These theoretical aspects of multi-robot coordination mechanisms are vitally important to their study, comparison, and objective evaluation, as the large-scale and long-term system behavior is strongly determined by the fundamental characteristics of the underlying algorithm(s). We can derive these characteristics for existing architectures by seeing them as solutions to the underlying utility optimization problems that we identified in our taxonomy. First, we explain the methodology used in the analysis.

Methodology  Computational requirements, or running time, are determined in the usual way, as the number of times that some dominant operation is repeated. For the MRTA domain that operation is usually either a calculation or comparison of utility, and running time is stated as a function of m and n, the number of robots and tasks, respectively. Since modern robots have significant processing capabilities on board and can easily work in parallel, in this analysis we assume that the computational load is evenly distributed over the robots, and state the running time as it is for each robot. For example, if each robot must select the task with the highest utility, then the running time is O(n), because each robot performs n comparisons, in parallel. Note that this analysis does not measure or consider the actual running time of the utility calculation, in large part because that information is not generally reported. Rather it is assumed that the utility calculations are computationally similar enough to be meaningfully compared.

   Communication requirements are determined as the total number of inter-robot messages sent over the network. In the analysis we do not consider message sizes, on the assumption that they are generally small (e.g., single scalar utility values) and approximately the same for different algorithms. Further, we assume that a perfect shared broadcast communication medium is used and that messages are always broadcast, rather than unicast. So if, for example, each robot must tell every other robot its own highest utility value, then the overhead is O(m), because each robot makes a single broadcast.

   Solution quality is reported as a competitive factor, which bounds an algorithm's performance as a function of the optimal solution (Section 4). The competitive factor for an architecture is determined by mapping its task allocation algorithm onto the underlying assignment problem. For any given task allocation architecture, this mapping could be arbitrarily complex and not necessarily informative. Fortunately, existing MRTA architectures tend to implement either the Greedy algorithm or a close variant. By identifying the allocation algorithm as such, we can put a lower bound on its performance, and thus gain some insight into how the architecture can be expected to perform, independent of the particular application domain.

   ² This analysis was originally presented in Gerkey & Matarić (2003).

Results & discussion  Next, six MRTA architectures that have been validated on either physical or simulated robots are analyzed. Three of the architectures solve the iterated assignment problem and the other three solve the online assignment problem. While there are a great many architectures in the literature, we have attempted to gather a set of approaches that is representative of the work to date.

   Of the iterated assignment architectures, the first is ALLIANCE (Parker 1998), one of the earliest demonstrated approaches to MRTA. This behavior-based archi-


                                                                          11
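The canonical Greedy algorithm that the analysis repeatedly refers to can be sketched in a few lines. This is a generic illustration of the technique, not any particular architecture's implementation, and it assumes a complete utility matrix:

```python
def greedy_assignment(utility):
    """Greedy ST-SR-IA assignment from a complete utility matrix.

    utility: dict mapping (robot, task) -> utility value.
    Repeatedly awards the highest-valued remaining (robot, task)
    pair; this is the Greedy algorithm, whose solution is known to
    be 2-competitive for the assignment problem.
    """
    robots = {r for r, _ in utility}
    tasks = {t for _, t in utility}
    assignment = {}
    while robots and tasks:
        # Dominant operation: one scan over the remaining utilities,
        # hence O(mn) work per awarded task.
        r, t = max(
            (pair for pair in utility if pair[0] in robots and pair[1] in tasks),
            key=lambda pair: utility[pair],
        )
        assignment[t] = r
        robots.remove(r)
        tasks.remove(t)
    return assignment
```

Each pass scans the remaining m x n utility values, which is the per-iteration computational cost reported for the iterated assignment architectures in Table 1.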
   Name                         Computation    Communication    Solution
                                / iteration    / iteration      quality
   ALLIANCE3                    O(mn)          O(m)             at least 2-competitive
   (Parker 1998)
   BLE                          O(mn)          O(mn)            2-competitive
   (Werger & Matarić 2001)
   M+                           O(mn)          O(mn)            2-competitive
   (Botelho & Alami 1999)

Table 1: Summary of selected iterated assignment architectures for MRTA. Shown here for each architecture are the
computational and communication requirements, as well as solution quality.


   Name                         Computation          Communication    Solution
                                / task               / task           quality
   MURDOCH                      O(1) / bidder        O(n)             3-competitive
   (Gerkey & Matarić 2002b)     O(n) / auctioneer
   First-price auctions         O(1) / bidder        O(n)             at least 3-competitive
   (Dias & Stentz 2001)         O(n) / auctioneer
   Dynamic role assignment      O(1) / bidder        O(n)             at least 3-competitive
   (Chaimowicz et al. 2002)     O(n) / auctioneer

Table 2: Summary of selected online assignment architectures for MRTA. Shown here for each architecture are the
computational and communication requirements, as well as solution quality.
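The competitive factors above are worst-case bounds. A toy instance shows how far greedy allocation can drift from the optimum while still respecting the 2-competitive guarantee; the utility values here are invented purely for illustration:

```python
from itertools import permutations

def greedy_total(U):
    """Total utility when (robot, task) pairs are taken greedily by value."""
    total, used_r, used_t = 0, set(), set()
    for (r, t), u in sorted(U.items(), key=lambda kv: -kv[1]):
        if r not in used_r and t not in used_t:
            total += u
            used_r.add(r)
            used_t.add(t)
    return total

def optimal_total(U, robots, tasks):
    """Brute-force optimal assignment (fine for toy instances)."""
    return max(sum(U[(r, t)] for r, t in zip(robots, perm))
               for perm in permutations(tasks))

U = {('r1', 't1'): 10, ('r1', 't2'): 9,
     ('r2', 't1'): 9,  ('r2', 't2'): 1}
robots, tasks = ['r1', 'r2'], ['t1', 't2']
# Greedy grabs ('r1','t1')=10 and is left with ('r2','t2')=1, total 11;
# the optimal pairing is 9 + 9 = 18. Greedy's 11 >= 18/2, consistent
# with the Greedy algorithm's 2-competitive bound.
```

Worst-case bounds of this kind say nothing about average-case behavior, which in practice is often much closer to optimal.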



tecture allocates tasks by maintaining, for each robot, levels of impatience and acquiescence concerning the available tasks. These motivation factors are combined to form, in effect, a utility estimate for each (robot, task) pair. Another behavior-based architecture is Werger & Matarić's (2001) Broadcast of Local Eligibility (BLE), which is a distributed version of the well-known Subsumption Architecture (Brooks 1986). As described in Section 5.1.1, BLE works as follows: at a fixed rate (1 Hz), the robots compute and broadcast to each other their utility estimates for all tasks; allocation is performed after each broadcast with the Greedy algorithm. Another architecture that employs the Greedy algorithm is M+ (Botelho & Alami 1999), whose use of auctions represents the first market-based approach to MRTA (or at least the first that was motivated with economic ideas). Reassignment of tasks is allowed in all three architectures, although the frequency of reassignment may vary. For example, in BLE reassignment occurs almost continuously, but in M+ reassignment occurs only when a new task becomes available.

3 In addition to solving the ST-SR-IA problem, the ALLIANCE architecture is also capable of building time-extended task schedules in order to solve a form of the ST-SR-TA problem (see Section 5.2.1).

   Of the online assignment architectures, the first is MURDOCH (Gerkey & Matarić 2002b), which uses a first-price auction to assign each task, and does not allow reassignment. As stated in Section 5.1.2, this approach is also an implementation of the Greedy algorithm. The two remaining architectures, from Dias & Stentz (2001) and Chaimowicz et al. (2002), also assign tasks with first-price auctions, but allow (in some circumstances) later reassignment.
   Tables 1 & 2 summarize the results for the iterated assignment architectures and online assignment architectures, respectively. Perhaps the most significant trend in these results is how similar the architectures look when examined in this manner. For example, the iterated architectures listed in Table 1, which assign all available tasks simultaneously, exhibit almost identical algorithmic characteristics. Only the ALLIANCE architecture (Parker 1998) shows any difference; in this case the decrease in communication overhead is achieved by having each robot internally model the fitness of the others, thereby effectively distributing the utility calculations. More striking are the results in Table 2, which lists architectures that assign tasks in a sequential manner: with respect to computational and communication requirements, these architectures are identical. In terms of solution quality, Dias & Stentz's (2001) and Chaimowicz et al.'s (2002) approaches, which allow reassignment of tasks, can potentially perform better than MURDOCH.
   These results are particularly interesting because they suggest that there is some common methodology underlying many existing approaches to MRTA. This trend is
difficult or impossible to discern from reading the technical papers describing the work, as each architecture is described in different terms, and validated in a different task domain. However, with the analysis described here, fundamental similarities of the various architectures become obvious. These similarities are encouraging because they suggest that, regardless of the details of the robots or tasks in use, the various authors are all studying a common, fundamental problem in autonomous coordination. As a corollary, there is now a formal grounding for the belief that these ad hoc architectures may have properties that allow them to be generalized and applied widely.
   Of course, the described analysis does not capture all relevant aspects of the systems under study. For example, in the ALLIANCE architecture, the robots' computational load is increased to handle modeling of other robots, but this analysis does not consider that extra load. Such details, which are currently not widely discussed in the literature, will likely become more important as the field moves toward improved cross-evaluation of solutions.
   In addition to enabling evaluation, this kind of analysis can be used to explain why certain solutions work in practice. For example, the online assignment architectures listed in Table 2 are all economically-inspired, built around task auctions. The designers of such architectures generally justify their approach with a loose analogy to the efficiency of the free market as it is used by humans. With a formal analysis, it is possible to gain a clearer understanding of why auction-based allocation methods work in practice. Specifically, it is well known that synthetic economic systems can be used to solve a variety of optimization problems. As explained in Section 5.1, an appropriately constructed price-based market, at equilibrium (i.e., when the prices are such that no two utility-maximizing robots would select the same task), produces optimal assignments. The previously described economically-inspired architectures approximate such a market to varying degrees.


7 Other problems

Although the taxonomy given in the previous sections covers many MRTA domains, several potentially important problems are excluded. Next we describe some problem domains that are not captured by the taxonomy.


7.1 Interrelated utilities

Consider the problem of assigning target points to a team of robots that are cooperatively exploring an unknown environment. Many targets (e.g., the frontiers of Yamauchi (1998)) may be known at one time, and so it is possible to build a schedule of targets for each robot. Unfortunately, this problem is not an instance of ST-SR-TA, because the cost for a robot to visit target C depends on whether that robot first visits target A or target B. Instead, this problem is an instance of the multiple traveling salesperson problem (MTSP); even in the restricted case of one salesperson, MTSP is strongly NP-hard (Korte & Vygen 2000). If, as is often the case with exploration, it is possible to discover new targets over time, then the problem is an instance of the dynamic MTSP, which is clearly at least as difficult as the classical MTSP.
   Given the difficulty of the multi-robot exploration problem, it is not surprising that researchers have not attempted to solve it directly or exactly. A heuristic approximation is offered by Zlot et al. (2002), who use TSP heuristics to build target schedules and derive costs that are used in Dias & Stentz's (2001) market-based task allocation architecture. When a robot discovers a new target, it inserts the new target into its schedule, but retains the option of later auctioning the target off to another, closer robot.
   The multi-robot exploration problem is an example of a larger class of problems, in which a robot's utility for a task may depend on which other tasks that robot executes. These problems in turn form part of another, more general class of problems in which a robot's utility for a task may depend on which other tasks any robot executes. That is, each robot-task utility can depend on the overall allocation of tasks to robots. Such interrelated utilities can sometimes be tractably captured with factored Partially Observable Markov Decision Processes (POMDPs), assuming that a world model is available (Guestrin, Koller & Parr 2001).
   For mobile robots, this situation can arise any time that physical interference contributes significantly to task performance. For example, consider a multi-robot resource transportation problem in which each robot must choose which of a predetermined number of source-sink roads to travel. The decision of which road to travel should take into account the congestion caused by other robots. Taking the position that interference effects are difficult or impossible to adequately model a priori, Dahl, Matarić & Sukhatme (2002) developed a reinforcement learning approach to the multi-robot resource transportation problem. The robots do not communicate with each other directly, but rather through physical interactions, with each robot maintaining and updating an estimate of the utility for each available road. This approach was shown to produce higher-quality solutions than those produced without learning, and added no communication overhead.
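The schedule-insertion step attributed above to Zlot et al. can be sketched with the classic cheapest-insertion heuristic. This is a simplified illustration assuming Euclidean travel costs, not their actual implementation:

```python
import math

def insert_target(schedule, start, new_target):
    """Insert new_target at the position of schedule that adds the
    least travel cost (cheapest insertion); returns the new schedule.

    schedule: list of (x, y) targets in visiting order.
    start: the robot's current (x, y) position.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    best_pos, best_extra = 0, float('inf')
    route = [start] + schedule
    for i in range(len(route)):
        prev = route[i]
        nxt = route[i + 1] if i + 1 < len(route) else None
        # Extra cost of visiting new_target between prev and nxt.
        extra = dist(prev, new_target)
        if nxt is not None:
            extra += dist(new_target, nxt) - dist(prev, nxt)
        if extra < best_extra:
            best_pos, best_extra = i, extra
    return schedule[:best_pos] + [new_target] + schedule[best_pos:]
```

The same marginal-cost computation can serve as the robot's bid when it later considers auctioning a target off to a closer teammate.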
7.2 Task constraints

In addition to an assumption of independent utilities, our taxonomy also assumes independent tasks. There may instead be constraints among the tasks, such as sequential or parallel execution. In principle, each set of tasks with such constraints could be phrased as a single monolithic task that requires multiple robots. The allocation of these larger tasks could then be described by the presented taxonomy (e.g., ST-MR-IA). Unfortunately, the difficult problem of reasoning about task constraints is not removed, but simply shifted into the utility estimation for each potential multi-robot team. In general, our analysis will not suffice in the presence of constraints among tasks.
   Although the topic of job constraints is addressed by the scheduling literature (Brucker 1998), the addition of such constraints generally increases problem difficulty, and tractable algorithms exist for only the simplest kinds of constraints. A possible way to approach this problem is with techniques for dynamic constraint satisfaction (Modi, Jung, Tambe, Shen & Kulkarni 2001).


8 Summary

In the field of mobile robotics, the study of multi-robot systems has grown significantly in size and importance. Having solved some of the basic problems concerning single-robot control, many researchers have shifted their focus to the study of multi-robot coordination. There are by now a plethora of examples of demonstrated coordinated behavior in multi-robot systems, and almost as many proposed coordination architectures. However, despite more than a decade of research, the field so far lacks a theoretical foundation that can explain or predict the behavior of a multi-robot system. Our goal in this paper has been to provide a candidate framework for studying such systems.
   The word "coordination" is somewhat imprecise, and has been used inconsistently in the literature. In order to be precise about the problem with which we are concerned, we defined a smaller problem: multi-robot task allocation (MRTA). That is, given some robots and some tasks, which robot(s) should execute which task(s)? This restricted problem is both theoretically and practically important, and is supported by the significant body of existing work that focuses on MRTA, in one form or another.
   To date, the majority of research in MRTA has been experimental in nature. The standard procedure, followed by a large number of researchers, has been to construct an MRTA architecture and then validate it in one or more application domains. This proof-of-concept method has led to the proposal of many MRTA architectures, each of which has been experimentally validated to a greater or lesser extent, sometimes in simulation and sometimes with physical robots. These research efforts are undeniably useful, as they demonstrate that successful multi-robot coordination is possible, even in relatively complex environments. However, to date it has not been possible to draw general conclusions regarding the underlying MRTA problems, or to establish a prescriptive strategy that would dictate how to achieve task allocation in a given multi-robot system.
   We view MRTA problems as fundamentally organizational in nature, in that the goal is to allocate limited resources in such a way as to efficiently achieve some task(s). In this paper we have shown how MRTA problems can be studied in a formal manner by adapting to robotics some of the theory developed in relevant disciplines that study organizational and optimization problems. These disciplines include operations research, economics, scheduling, network flows, and combinatorial optimization.
   Using such connections to relevant optimization theory, we have presented in this paper a formal analysis of MRTA problems. We have provided characterizations of a wide range of such problems, in the larger context of a taxonomy. For the easier problems, we have provided provably optimal algorithms that can be used in place of commonly-employed ad hoc or greedy solutions. For the more difficult problems, we have, wherever possible, provided suggestions toward their heuristic solution. Thus, this work can be used to aid further research into multi-robot coordination by allowing for the formal classification of MRTA problems, and by sometimes prescribing candidate solutions.
   The presented MRTA formalism is very general, in that it relies only on domain-independent theory and techniques. Thus, for example, the taxonomy given in Section 5 should apply equally well in multi-agent and multi-robot systems. However, in exchange for such generality, this formalism is only capable of providing coarse characterizations of MRTA problems and their proposed solutions. Consider the analysis showing that MURDOCH, as an implementation of the canonical Greedy algorithm, is 3-competitive for the online assignment problem. This kind of competitive factor gives an algorithm's worst-case behavior, which may be quite different from its average-case behavior. In this respect, the bounds established for existing MRTA architectures, in terms of computational overhead, communication overhead, and solution quality, are relatively loose.
   One way to tighten these bounds is to add domain-specific information to the formalism. By capturing and embedding models of how real MRTA domains behave and evolve over time, it should be possible to make more accurate predictions about algorithmic performance. For
example, while the classical theory of the OAP makes no assumptions about the nature of the utility matrices that form the input, MRTA problems are likely to exhibit significant structure in their utility values. Far from randomly generated, utility values generally follow one of a few common models, determined primarily by the kind of sensor data that are used in estimating utility. If only "local" sensor information is used (e.g., can the robot currently see a particular target, and if so, how close is it?), then utility estimates tend to be strongly bifurcated (e.g., a robot will have very high utility for those targets that it can see, and zero utility for all others). On the other hand, if "global" sensor information is available (e.g., how close is the robot to a goal location?), then utility estimates tend to be smoother (e.g., utility will fall off smoothly in space away from the goal). A promising avenue for future research would be to characterize this "utility landscape" as it is encountered in MRTA domains, and then classify different MRTA problems according to the shapes of their landscapes, and make predictions about, for example, how well a greedy assignment algorithm should be expected to work, as opposed to a more costly optimal assignment algorithm.


Acknowledgments

The research reported here was conducted at the Interaction Lab, part of the Center for Robotics and Embedded Systems (CRES) at the University of Southern California. This work was supported in part by the Intel Foundation, DARPA Grant DABT63-99-1-0015 (MARS), and Office of Naval Research Grants N00014-00-1-0638 (DURIP) and N00014-01-1-0354. We thank Herbert Dawid, Andrew Howard, Richard Vaughan, and Michael Wellman for their insightful comments.


References

Agassounon, W. & Martinoli, A. (2002), A Macroscopic Model of an Aggregation Experiment using Embodied Agents in Groups of Time-Varying Sizes, in 'Proc. of the IEEE Conf. on System, Man and Cybernetics (SMC)', Hammamet, Tunisia, pp. 250–255.

Ahuja, R. K., Magnanti, T. L. & Orlin, J. B. (1993), Network Flows: Theory, Algorithms, and Applications, Prentice Hall, Upper Saddle River, New Jersey.

Alur, R., Courcoubetis, C., Halbwachs, N., Henzinger, T. A., Ho, P.-H., Nicollin, X., Olivero, A., Sifakis, J. & Yovine, S. (1995), 'The algorithmic analysis of hybrid systems', Theoretical Computer Science 138(1), 3–34.

Atamtürk, A., Nemhauser, G. & Savelsbergh, M. (1995), 'A Combined Lagrangian, Linear Programming and Implication Heuristic for Large-Scale Set Partitioning Problems', J. of Heuristics 1, 247–259.

Avis, D. (1983), 'A Survey of Heuristics for the Weighted Matching Problem', Networks 13, 475–493.

Balas, E. & Padberg, M. W. (1972), 'On the Set-Covering Problem', Operations Research 20(6), 1152–1161.

Balas, E. & Padberg, M. W. (1976), 'Set Partitioning: A Survey', SIAM Review 18(4), 710–760.

Bar-Yehuda, R. & Even, S. (1981), 'A linear-time approximation algorithm for the weighted vertex cover problem', J. of Algorithms 2, 198–203.

Bertsekas, D. P. (1990), 'The Auction Algorithm for Assignment and Other Network Flow Problems: A Tutorial', Interfaces 20(4), 133–149.

Botelho, S. C. & Alami, R. (1999), M+: a scheme for multi-robot cooperation through negotiated task allocation and achievement, in 'Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)', Detroit, Michigan, pp. 1234–1239.

Brooks, R. A. (1986), 'A robust layered control system for a mobile robot', IEEE J. of Robotics and Automation 2(1), 14–23.

Brucker, P. (1998), Scheduling Algorithms, 2nd edn, Springer-Verlag, Berlin.

Bruno, J. L., Coffman, E. G. & Sethi, R. (1974), 'Scheduling Independent Tasks To Reduce Mean Finishing Time', Communications of the ACM 17(7), 382–387.

Cai, X., Lee, C.-Y. & Li, C.-L. (1998), 'Minimizing Total Completion Time in Two-Processor Task Systems with Pre-specified Processor Allocations', Naval Research Logistics 45(2), 231–242.

Cao, Y. U., Fukunaga, A. S. & Kahng, A. (1997), 'Cooperative Mobile Robotics: Antecedents and Directions', Autonomous Robots 4(1), 7–27.

Castelpietra, C., Iocchi, L., Nardi, D., Piaggio, M., Scalzo, A. & Sgorbissa, A. (2001), Communication and Coordination among heterogeneous Mid-size players: ART99, in P. Stone, T. Balch & G. Kraetzschmar, eds, 'RoboCup 2000, LNAI 2019', Springer-Verlag, Berlin, pp. 86–95.

Chaimowicz, L., Campos, M. F. M. & Kumar, V. (2002), Dynamic Role Assignment for Cooperative Robots, in 'Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)', Washington, DC, pp. 293–298.

Chien, S., Barrett, A., Estlin, T. & Rabideau, G. (2000), A comparison of coordinated planning methods for cooperating rovers, in 'Proc. of Autonomous Agents', Barcelona, Spain, pp. 100–101.

Chvátal, V. (1979), 'A greedy heuristic for the set cover problem', Mathematics of Operations Research 4, 233–235.

Cormen, T. H., Leiserson, C. E. & Rivest, R. L. (1997), Introduction to Algorithms, MIT Press, Cambridge, Massachusetts.
Dahl, T. S., Matarić, M. J. & Sukhatme, G. S. (2002), Adaptive spatio-temporal organization in groups of robots, in 'Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', Lausanne, Switzerland, pp. 1044–1049.

Deneubourg, J.-L., Theraulaz, G. & Beckers, R. (1991), Swarm-made architectures, in 'Proc. of the European Conf. on Artificial Life (ECAL)', Paris, France, pp. 123–133.

Dertouzos, M. L. & Mok, A. K. (1983), 'Multiprocessor On-Line Scheduling of Hard-Real-Time Tasks', IEEE Transactions on Software Engineering 15(12), 1497–1506.

Dias, M. B. & Stentz, A. (2001), A Market Approach to Multirobot Coordination, Technical Report CMU-RI-TR-01-26, The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania.

Dias, M. B. & Stentz, A. (2002), Opportunistic Optimization for Market-Based Multirobot Control, in 'Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', Lausanne, Switzerland, pp. 2714–2720.

Donald, B., Jennings, J. & Rus, D. (1997), 'Information invariants for distributed manipulation', The Intl. J. of Robotics Research 16(5), 673–702.

Dudek, G., Jenkin, M. & Milios, E. (2002), A Taxonomy of Multirobot Systems, in T. Balch & L. Parker, eds, 'Robot Teams: From Diversity to Polymorphism', A.K. Peters, Natick, Massachusetts, pp. 3–22.

Emery, R., Sikorski, K. & Balch, T. (2002), Protocols for Collaboration, Coordination, and Dynamic Role Assignment in a Robot Team, in 'Proc. of the IEEE Intl. Conf. on Robotics and Automation (ICRA)', Washington, DC, pp. 3008–3015.

Fukuda, T., Nakagawa, S., Kawauchi, Y. & Buss, M. (1988), Self organizing robots based on cell structures – CEBOT, in 'Proc. of the IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS)', Victoria, British Columbia, pp. 145–150.

Gale, D. (1960), The Theory of Linear Economic Models, McGraw-Hill Book Company, Inc., New York.

Garey, M. R. & Johnson, D. S. (1978), '"Strong" NP-Completeness Results: Motivation, Examples, and Implications', J. of the ACM 25(3), 499–508.

Gat, E. (1998), Three-Layer Architectures, in D. Kortenkamp, R. P. Bonasso & R. Murphy, eds, 'Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems', AAAI Press, Menlo Park, California, pp. 195–210.

Gerkey, B. P. & Matarić, M. J. (2002a), A market-based formu-

Guestrin, C., Koller, D. & Parr, R. (2001), Multiagent Planning with Factored MDPs, in 'Proc. of Advances in Neural Information Processing Systems (NIPS)', Vancouver, Canada, pp. 1523–1530.

Hoffman, K. L. & Padberg, M. W. (1993), 'Solving Airline Crew Scheduling Problems by Branch-and-Cut', Management Science 39(6), 657–682.

Hoogeveen, J., van del Velde, S. & Veltman, B. (1994), 'Complexity of scheduling multiprocessor tasks with prespecified processor allocations', Discrete Applied Mathematics 55, 259–272.

Jennings, J. S. & Kirkwood-Watts, C. (1998), Distributed Mobile Robotics by the Method of Dynamic Teams, in T. Luth, P. Dario & H. Worn, eds, 'Distributed Autonomous Robotic Systems 3', Springer-Verlag, New York, pp. 47–56.

Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. (1998), 'Planning and Acting in Partially Observable Stochastic Domains', Artificial Intelligence 101(1–2), 99–134.

Kalyanasundaram, B. & Pruhs, K. (1993), 'Online Weighted Matching', J. of Algorithms 14, 478–488.

Klavins, E. (2003), Communication Complexity of Multi-Robot Systems, in J.-D. Boissonnat, J. Burdick, S. Hutchinson & K. Goldberg, eds, 'Algorithmic Foundations of Robotics V', Springer-Verlag, New York, pp. 275–292.

Korte, B. & Vygen, J. (2000), Combinatorial Optimization: Theory and Algorithms, Springer-Verlag, Berlin.

Kube, C. R. & Zhang, H. (1993), 'Collective robotics: From social insects to robots', Adaptive Behavior 2(2), 189–219.

Kuhn, H. W. (1955), 'The Hungarian Method for the Assignment Problem', Naval Research Logistics Quarterly 2(1), 83–97.

Lerman, K. & Galstyan, A. (2002), 'Mathematical Model of Foraging in a Group of Robots: Effect of Interference', Autonomous Robots 13(2), 127–141.

Marsten, R. E. & Shepardson, F. (1981), 'Exact Solution of Crew Scheduling Problems Using the Set Partitioning Model: Recent Successful Applications', Networks 11, 165–177.

Mason, M. T. (1986), 'Mechanics and planning of manipulator pushing operations', The Intl. J. of Robotics Research 5(3), 53–71.
                                                                     Matari´ , M. J. (1992), Designing Emergent Behaviors: From
     lation of sensor-actuator network coordination, in ‘Proc.            Local Interactions to Collective Intelligence, in J.-A.
     of the AAAI Spring Symp. on Intelligent Embedded and                 Meyer, H. Roitblat & S. Wilson, eds, ‘From Animals to
     Distributed Systems’, Palo Alto, California, pp. 21–26.              Animats 2, Second International Conference on Simula-
Gerkey, B. P. & Matari´ , M. J. (2002b), ‘Sold!: Auction meth-
                      c                                                   tion of Adaptive Behavior (SAB-92)’, MIT Press, pp. 432–
     ods for multi-robot coordination’, IEEE Transactions on              441.
     Robotics and Automation 18(5), 758–768.                         Modi, J., Jung, H., Tambe, M., Shen, W.-M. & Kulkarni,
                        c
Gerkey, B. P. & Matari´ , M. J. (2003), Multi-Robot Task Allo-           S. (2001), A Dynamic Distributed Constraint Satisfaction
     cation: Analyzing the Complexity and Optimality of Key              Approach to Resource Allocation, in T. Walsh, ed., ‘Princi-
     Architectures, in ‘Proc. of the IEEE Intl. Conf. on Robotics        ples and Practice of Constraint Programming – CP 2001’,
     and Automation (ICRA)’, Taipei, Taiwan, pp. 3862–3868.              Springer-Verlag, New York, pp. 685–700.


                                                                    16
Murata, T. (1989), ‘Petri Nets: Properties, Analysis, and Appli-                             c
                                                                     Werger, B. B. & Matari´ , M. J. (2001), Broadcast of Local
    cations’, Proc. of the IEEE 77(4), 541–580.                          Eligibility for Multi-Target Observation, in L. E. Parker,
Ø      ˚
 sterg ard, E. H., Matari´ , M. J. & Sukhatme, G. S. (2001),
                            c                                            G. Bekey & J. Barhen, eds, ‘Distributed Autonomous
      Distributed multi-robot task allocation for emergency han-         Robotic Systems 4’, Springer-Verlag, New York, pp. 347–
      dling, in ‘Proc. of the IEEE/RSJ Intl. Conf. on Intelligent        356.
      Robots and Systems (IROS)’, Wailea, Hawaii, pp. 821–           Yamauchi, B. (1998), Frontier-Based Exploration Using Multi-
      826.                                                               ple Robots, in ‘Proc. of Autonomous Agents’, Minneapo-
                                                                         lis, Minnesota, pp. 47–53.
Parker, L. E. (1994), Heterogeneous Multi-Robot Cooperation,
     PhD thesis, MIT EECS Department.                                Zlot, R., Stentz, A., Dias, M. B. & Thayer, S. (2002), Multi-
                                                                           Robot Exploration Controlled by a Market Economy, in
Parker, L. E. (1995), L-ALLIANCE: A Mechanism for Adaptive
                                                                           ‘Proc. of the IEEE Intl. Conf. on Robotics and Automation
     Action Selection in Heterogeneous Multi-Robot Teams,
                                                                           (ICRA)’, Washington, DC, pp. 3016–3023.
     Technical Report ORNL/TM-13000, Oak Ridge National
     Laboratory, Knoxville, Tennessee.
Parker, L. E. (1998), ‘ALLIANCE: An architecture for fault-
     tolerant multi-robot cooperation’, IEEE Transactions on
     Robotics and Automation 14(2), 220–240.
Parker, L. E. (1999), ‘Cooperative Robotics for Multi-Target
     Observation’, Intelligent Automation and Soft Computing
     5(1), 5–19.
Pearce, D. W., ed. (1999), The MIT Dictionary of Modern
     Economics, 4th edn, The MIT Press, Cambridge, Mas-
     sachusetts.
Sandholm, T. W. & Lesser, V. R. (1997), ‘Coalitions among
    computationally bounded agents’, Artificial Intelligence
    94(1), 99–137.
Shehory, O. & Kraus, S. (1996), Formation of overlapping
    coalitions for precedence-ordered task-execution among
    autonomous agents, in ‘Proc. of the Intl. Conf. on Multi
    Agent Systems (ICMAS)’, Kyoto, Japan, pp. 330–337.
Shehory, O. & Kraus, S. (1998), ‘Methods for task allocation via
    agent coalition formation’, Artificial Intelligence 101(1–
    2), 165–200.
Simon, H. A. (2001), The Sciences of the Artificial, 3rd edn, MIT
    Press, Cambridge, Massachusetts.
Spletzer, J. R. & Taylor, C. J. (2001), A Framework for Sen-
     sor Planning and Control with Applications to Vision
     Guided Multi-robot Systems, in ‘Proc. of Computer Vision
     and Pattern Recognition Conf. (CVPR)’, Kauai, Hawaii,
     pp. 378–383.
Stone, P. & Veloso, M. (1999), ‘Task Decomposition, Dynamic
     Role Assignment, and Low-Bandwidth Communication
     for Real-Time Strategic Teamwork’, Artificial Intelligence
     110(2), 241–273.
Vail, D. & Veloso, M. (2003), Dynamic Multi-Robot Coordina-
      tion, in A. Schultz et al., eds, ‘Multi-Robot Systems: From
      Swarms to Intelligent Automata, Volume II’, Kluwer Aca-
      demic Publishers, the Netherlands, pp. 87–98.
                                       ¨
Weigel, T., Auerback, W., Dietl, M., Dumler, B., Gutmann, J.-
                      u
    S., Marko, K., M¨ ller, K., Nebel, B., Szerbakowski, B. &
    Thiel, M. (2001), CS Freiburg: Doing the Right Thing in
    a Group, in P. Stone, T. Balch & G. Kraetzschmar, eds,
    ‘RoboCup 2000, LNAI 2019’, Springer-Verlag, Berlin,
    pp. 52–63.


                                                                    17
                       Université Libre de Bruxelles
                       Institut de Recherches Interdisciplinaires
                       et de Développements en Intelligence Artificielle




Efficient Multi-Foraging in Swarm Robotics



               Alexandre Campo




       IRIDIA – Technical Report Series
             Technical Report No.
             TR/IRIDIA/2006-027
                 October 2006
 IRIDIA – Technical Report Series
                    ISSN 1781-3794

 Published by:
  IRIDIA, Institut de Recherches Interdisciplinaires
  et de Développements en Intelligence Artificielle
  Université Libre de Bruxelles
   Av F. D. Roosevelt 50, CP 194/6
   1050 Bruxelles, Belgium

 Technical report number TR/IRIDIA/2006-027
 Revision history:
     TR/IRIDIA/2006-027.001          September 2006




The information provided is the sole responsibility of the authors and does not necessarily
reflect the opinion of the members of IRIDIA. The authors take full responsibility for
any copyright breaches that may result from publication of this paper in the IRIDIA –
Technical Report Series. IRIDIA is not responsible for any use that might be made of
data appearing in this publication.
                  Efficient multi-foraging in swarm robotics
                                             Alexandre Campo∗
                                                October 2006


                                                   Abstract
           In the multi-foraging task under study, a group of robots has to efficiently retrieve two
       different types of prey to a nest. We define efficiency using a concept of energy that is spent
       by the robots when they leave the nest to explore the environment and gained when a prey
       is successfully retrieved to the nest.
           The goal of this study is to identify the characteristics of an efficient multi-foraging be-
       haviour. We design and validate a mathematical model that is used to predict the optimal
       behaviour. This model allows us to evaluate the performance of a group of robots on an
       absolute scale. We design a decision algorithm and study its performance and adaptivity in a
       wide range of experimental situations. This algorithm takes into account information about
       the number of prey encountered but also about the number of robots encountered, so that
       a suitable number of robots are allocated to foraging while the remaining robots rest at the
       nest and spare energy.

       Keywords: swarm-robotics, multi-foraging, mathematical modelling.


1      Introduction
Within collective robotics, swarm robotics is a relatively new approach to the coordination of a
system composed of a large number of autonomous robots. The coordination among the robots
is achieved in a self-organised manner: the collective behaviour of the robots is the result of local
interactions among robots, and between the robots and the environment. The concept of locality
refers to a situation in which a robot alone cannot perceive the whole system. Each single robot
typically has limited sensing, acting and computing abilities. The strength of swarm robotics lies
in the properties of robustness, adaptivity and scalability of the group [DSahin04].
    Foraging is a classical metaphor used in swarm robotics. In foraging, a group of robots has to
pick up objects that are scattered in the environment. The foraging task can be decomposed into
an exploration sub-task followed by a transport sub-task. Foraging can be applied to a wide range
of useful tasks. Examples of applications are toxic waste cleanup, search and rescue, demining
and collection of terrain samples. Central place foraging is a particular type of foraging task in
which robots must gather objects in a central place. Borrowing the terminology from biology, the
central place is also called the nest and the objects are called prey.
    Multi-foraging is a variation of the foraging task in which different types of objects to collect
are considered [Bal99]. These different types of objects can be concurrently and independently
collected by the individuals and can have different properties. Multi-foraging adds a level of com-
plexity with respect to the traditional foraging task as it may be necessary for the individuals to
choose which prey to take, and when.

    ∗ Corresponding author: alexandre.campo@ulb.ac.be

    The study of the efficiency of foragers was first the concern of biologists. In his seminal
article [Cha76], Charnov states the fundamental hypothesis that gave birth to the field of
optimal foraging: evolution has shaped the individual behaviours of foraging animals so as to
maximize the net energy intake. Three decades later, roboticists try to identify
how robots should cooperate in order to forage efficiently. Efficiency has been defined in several
ways: in biology, researchers use the term energy and measure the weights of food and animals
to quantify energy spent and gained. In robotics, the vocabulary is richer, owing to the
connections with the fields of game theory and scheduling: terms such as reward, income and
benefit have been used [UB04, LDD04, LJGM06, LWS+06]. For the sake of simplicity, we will use
the term energy, as in biology. Foraging efficiently is thus a quest to optimize the energy of a
group of foraging robots. Robotics researchers often consider that energy is spent when robots
move during exploration and that energy is gained when a prey is successfully retrieved to the
nest [LWS+ 06].
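As a concrete illustration of this convention, the energy bookkeeping can be sketched as follows; the function name and all numeric values are hypothetical, chosen only for the example:

```python
# Sketch of the energy convention described above (hypothetical values):
# energy is spent for every second a robot spends outside the nest, and
# gained whenever a prey is successfully retrieved to the nest.
def net_energy(seconds_outside, retrieved_counts, prey_energy, cost_per_second):
    """Net energy = retrieval gains minus exploration cost."""
    gain = sum(prey_energy[p] * n for p, n in retrieved_counts.items())
    cost = cost_per_second * seconds_outside
    return gain - cost

# Example: 300 robot-seconds outside the nest, two types of prey.
balance = net_energy(
    seconds_outside=300,
    retrieved_counts={"type1": 4, "type2": 1},
    prey_energy={"type1": 100.0, "type2": 250.0},
    cost_per_second=0.5,
)
# 4*100 + 1*250 - 0.5*300 = 500.0
```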

    We focus on a specific case of multi-foraging in which there are only two types of prey that have
to be retrieved to the nest. Robots can rest at the nest and in this way spare energy. Moreover, the
spatial aspect of the task is neglected: the prey do not have specific locations but rather densities
in the environment. The exploration mechanism used by the robots to find prey is a random walk;
robots therefore discover prey in the environment at a given rate.

    Our objective is to identify the characteristics of an individual behaviour that leads the group
of foraging robots to have an efficient collective behaviour. To achieve this objective, we first
design and validate a mathematical model of multi-foraging. Mathematical modelling of robotic
experiments is a well-established methodology [SS97, SSYA98, KAKG02]. Mathematical models
stand in contrast to individual-based models (IBMs) [ME03]: in an IBM, each robot and the
environment are represented explicitly, whereas a mathematical model is an analytic description
of the evolution of a system in which individuals are not represented separately. Such macroscopic
models are faster to evaluate than IBMs because their computation time does not depend on the
number of individuals. They
can be used as optimization tools: Ijspeert et al. [IMBG01] have used a stick pulling experiment
to demonstrate how the behaviour of the robots could be made efficient. Within the limits of the
mathematical tools available, it is also possible to draw conclusions on the dynamics and intrinsic
properties of the system.
    The model we devise predicts with very good confidence the optimal behaviour of the robots,
and hence the maximum amount of energy that a group of robots can accumulate during an
experiment. We use the model as a yardstick to evaluate the performance of the group of robots
and test different behavioural rules.
    Based on simplified equations, we introduce a decision algorithm to control the behaviour of
the robots. To evaluate the performance of that algorithm, we run simulations using a set of
2160 different experimental configurations. These configurations are obtained by varying the
parameters of the experiment. The efficient behaviour of the robots is not necessarily the same
from one configuration to another. We also test the adaptivity of the algorithm by drastically
changing the configuration of the experiments.

    In Section 2 we detail the experimental setup and the controller of the robots. Section 3 is
devoted to the description, analysis and validation of the mathematical model. Section 4 presents
the decision algorithm and the evaluation of its performance and adaptivity using the predictions
of the mathematical model. Section 5 concludes the paper with a discussion of the results and
some ideas for future work.


2     Methods
2.1    Experimental setup
Environment - The environment is a circular arena with a radius of 1.20 m (see Figure 1(a)).
Robots are randomly scattered in it at the beginning of an experiment.


    Nest - A circular nest is located in the center of the arena. Robots can locate the nest from
anywhere in the arena thanks to a lamp suspended above it, signalling its position. In order to
avoid overcrowding, the nest has a structure of three imbricated rings with different gray levels,
as presented in Figure 1(b). The limit of the innermost ring defines where robots can rest. By
distributing resting robots around the center instead of letting them aggregate, we avoid
situations in which a robot surrounded by others cannot freely leave the nest. The second ring
defines where a robot can safely release a prey, with good confidence that it is dropped inside
the nest; indeed, as robots have limited perception, they cannot distinguish whether a retrieved
prey is fully inside the nest or only half inside. Finally, the limit of the outermost ring defines
the boundary of the nest.
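A robot can infer where it stands in this ring structure from a single ground-sensor reading. The sketch below is illustrative only: the report states that the three rings have distinct gray levels, but the thresholds and zone names here are assumptions:

```python
# Map a ground-sensor gray reading (0 = black, 1 = white) to a nest zone.
# The thresholds below are hypothetical: the setup only specifies three
# imbricated rings with different gray levels.
def nest_zone(gray_level):
    if gray_level < 0.25:
        return "rest"       # innermost ring: robots may rest here
    elif gray_level < 0.50:
        return "drop"       # second ring: safe to release a prey
    elif gray_level < 0.75:
        return "nest"       # outermost ring: inside the nest boundary
    else:
        return "outside"    # arena floor
```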




[Figure 1 appears here. Panel (b) labels: Prey; Center of the nest; First mark (robots can rest);
Second mark (robots can drop); Nest mark (the boundary of the nest).]


Figure 1: (a) Setup of the multi-foraging experiment. The environment is a circular arena. The
nest is circular and centered. Robots and prey are initially scattered randomly. (b) Close-up of
the nest showing its structure of three imbricated rings. The structure is designed to minimize
interference (overcrowding phenomena) among robots in the nest. In addition, the structure
permits robots to have good confidence that prey are dropped inside the nest.



    Prey - Prey are introduced in the environment at random locations around the nest, at a fixed
distance from it. New prey appear at a constant rate per time unit. Prey have a lifetime, i.e.
they disappear at a constant rate; they are also removed once they are inside the nest. They
have a weight and friction that define the time required to retrieve them. An amount of energy
is associated with each prey and is credited to the group of robots once the prey is delivered to
the nest. Prey of the same type share all their characteristics (time to retrieve, colour, energy,
lifetime and incoming rate). There are only two different types of prey in the experiments.

    Robots - The simulated robots have the same characteristics as the s-bots of the swarm-bots
project [DTG+05]. We rely on the ground sensors to perceive whether a robot is outside or inside
the nest, and in the latter case on which ring it stands. Infrared sensors are used for collision
avoidance with walls and other robots. The camera is employed to determine the location of the
nest. The robots are also able to discriminate the type of a prey upon encounter, thanks to its
colour. Lastly, the robots use the camera to perceive whether a nearby prey is already being
retrieved by another robot.


2.2    Robot controller
The controller used is the same for all the robots. The architecture of the program is a finite state
machine (FSM). The scheme in Figure 2 represents the possible states, with arcs denoting the
possible transitions between states. At the beginning of an experiment, all robots are initialized
in the explore state.




Figure 2: Finite state machine representing the robot's controller. Transitions between states are
triggered according to the probabilities described in Table 2. The control parameters (probabilities
γ, π1 and π2) can be modified by a decision mechanism which is not displayed here. The subscript
letter i denotes different probabilities depending on the type of prey that was first encountered.


    • Explore [Description] The robot wanders within the environment, performing a random
      walk. An obstacle-avoidance subroutine is triggered if a wall or another robot is
      encountered. [Transitions] Return to the nest with a probability constant over time. Grasp
      a prey if it is close enough and no green colour is perceived; grasping is conditioned by the
      probabilities π1 and π2.
    • Grasp [Description] The robot has detected a prey and tries to perform a physical connec-
      tion. [Transitions] If the grasping is successful the robot tries to retrieve the prey, otherwise
      it enters the ignore state.
    • Retrieve [Description] The robot becomes green. This colour prevents any other robot
      from trying to take the prey already grasped. The robot heads toward the nest. [Transitions]
      If the robot reaches the nest, it releases the prey and enters the explore state. During the
      retrieval process, the robot has a probability constant over time to give up the retrieval and
      enter the ignore state.
    • Ignore [Description] The robot becomes blind to any prey. It simply performs a random
      walk with collision avoidance. [Transitions] After a delay of five seconds (enough to get
      away from a prey), the robot enters the explore state.
    • Rest [Description] The robot heads back to the nest. Once in the nest it does nothing.
      [Transitions] With a constant rate per time unit, the robot can decide to leave the nest by
      entering the explore state.
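The state machine above can be sketched as follows. The state names follow the report, but the transition helpers and the probability values are placeholders (in the report they correspond to β, the πi and the give-up rate):

```python
import random

# Illustrative sketch of the controller's finite state machine described
# above. Probability values are hypothetical placeholders.
class ForagerFSM:
    def __init__(self, beta=0.01, pi=(0.8, 0.4), rho=0.005, gamma=0.02):
        self.state = "explore"              # all robots start in explore
        self.beta, self.pi, self.rho, self.gamma = beta, pi, rho, gamma
        self.ignore_timer = 0

    def step(self, prey_type=None, grasp_ok=True, at_nest=False):
        if self.state == "explore":
            if prey_type is not None and random.random() < self.pi[prey_type - 1]:
                self.state = "grasp"        # take prey i with probability pi_i
            elif random.random() < self.beta:
                self.state = "rest"         # head back to the nest and rest
        elif self.state == "grasp":
            self.state = "retrieve" if grasp_ok else "ignore"
        elif self.state == "retrieve":
            if at_nest:
                self.state = "explore"      # prey released inside the nest
            elif random.random() < self.rho:
                self.state = "ignore"       # give up the ongoing retrieval
        elif self.state == "ignore":
            self.ignore_timer += 1
            if self.ignore_timer >= 5:      # five-second blind delay
                self.ignore_timer = 0
                self.state = "explore"
        elif self.state == "rest":
            if random.random() < self.gamma:
                self.state = "explore"      # leave the nest again
        return self.state
```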


3     Mathematical model
3.1    Modelling assumptions
The environment is circular and the robots perform a random walk when they move. We
assume that we face a memory-less process: robots find prey at a constant rate in time,
independent of the time already spent exploring.
    To test this assumption, we ran 1000 simulations with only one robot and one prey. A specific
robot controller was designed to perform a simple random walk. The experiment was halted as
soon as the robot established visual contact with the prey. The survival curve of finding times
matches a straight line with a 95% confidence level (see Figure 3(a)).
    In the same manner, we tested the time required for a robot to go back to the nest. We ran
1000 other simulations with a single robot and no prey in the environment. The experiment was
halted as soon as the robot was in the nest. The survival curve of the times to go back to the
nest also matches a straight line with a 95% confidence level (see Figure 3(b)).
    As the focus of this paper is not the study of interference among robots, we assume that
collision avoidance events are negligible in the mathematical model. Finally, we assume that the
probability to complete a retrieval can be modelled using a constant rate of prey retrieved per
unit of time.






Figure 3: (a) Survival curve of the finding times and (b) of the times to go back to the nest. The
survival curve analysis consists in plotting, in log-linear scale, the proportion of individuals that
remain in a given state (looking for a prey or going back to the nest) as a function of the time
elapsed since the beginning of this state. If the process is memory-less, then the decay of this
proportion follows a straight line according to the equation f(t) = log(e^{-kt}) = -kt. The slope
of the straight line is -k, where k is the rate of changing state and 1/k is the mean time spent in
the given state. The regression line is plotted in black and dashed lines represent a confidence
interval of 95%.
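The survival-curve check can be sketched numerically. The code below draws finding times from an exponential (memory-less) distribution and fits the slope of the log-survival curve, which should be close to the negative of the true rate; all values are illustrative:

```python
import math, random

# Sketch of the survival-curve analysis: for a memory-less process, the log
# of the proportion of individuals still waiting decays linearly with slope -k.
def survival_slope(times):
    """Least-squares slope of log(surviving proportion) versus time."""
    times = sorted(times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times[:-1]):          # skip last point (log 0)
        surviving = (n - (i + 1)) / n           # proportion not yet done at t
        xs.append(t)
        ys.append(math.log(surviving))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

random.seed(1)
k = 0.05                                        # true rate: mean time 1/k = 20 s
samples = [random.expovariate(k) for _ in range(1000)]
slope = survival_slope(samples)                 # expected to be close to -k
```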




3.2    Mathematical description
A set of differential equations is devised to model the flows of robots among five main states. We
neglect the modelling of the grasp and ignore states because they occur rarely and their duration
is relatively short. In order to describe how energy is gained, we model the retrieval process in two
distinct parts, one for each type of prey. In addition, we noticed that the time required to go back
to the nest before resting is not negligible and has to be modelled; to this end, we introduce the
back state. We end up with five main states among which flows of robots are exchanged. Figure 4
shows a scheme of these flows.
    We enumerate the variables of the macroscopic model in Table 1. We also clarify the meaning
of all the parameters in Table 2. Notice that none of these parameters is free: they can all be
measured in the experimental setup or decided by the experimenter, except the triplet (π1, π2, β),
which are control parameters set by the experimenter or by the controller of the robots.

Figure 4: The diagram of flows that are modelled using the set of differential equations. Flows
of robots are exchanged between five main states. Each arc describes a possible transition
between the states.
             Variable    Description
             E           the number of robots in the explore state
             B           the number of robots currently going back to the nest to rest
             I           the number of robots in the rest state (or inactive robots)
             R1          the number of robots in the retrieval state (prey of type 1)
             R2          the number of robots in the retrieval state (prey of type 2)
             N1          the number of prey of type 1 in the environment
             N2          the number of prey of type 2 in the environment

               Table 1: The meaning of each variable of the mathematical model.


    The set of differential equations (1) is used to model the flows of robots exchanged between
the descriptive variables. We provide a detailed explanation of the first equation and let the
reader use Tables 1 and 2 to figure out the details of the remaining equations. As explained in
Section 2.2 and depicted in Figure 4, several transitions lead robots to enter or leave the explore
state. Each term on the right-hand side of a differential equation is an average number of robots
performing a specific transition.

\begin{align}
\frac{\partial E}{\partial t} &= -\beta E + \gamma I + \sum_{i=1}^{2}\left(-\pi_i E N_i \lambda + \mu_i R_i + \rho R_i\right) \nonumber\\
\frac{\partial B}{\partial t} &= +\beta E - \kappa B \nonumber\\
\frac{\partial I}{\partial t} &= +\kappa B - \gamma I \nonumber\\
\frac{\partial R_i}{\partial t} &= \pi_i E N_i \lambda - \mu_i R_i - \rho R_i \qquad \forall i \in \{1, 2\} \nonumber\\
\frac{\partial N_i}{\partial t} &= \varphi_i - \pi_i E N_i \lambda - \xi_i N_i + \rho R_i \qquad \forall i \in \{1, 2\} \tag{1}
\end{align}
    1. First, robots can decide to go rest at the nest with a probability β. On average, βE robots
       leave the explore state and enter the back state.


 Parameter      Description
 T              the total number of robots in the experiment
 λ              rate of objects per second found in the environment by a single robot
 κ              probability for a single robot to find the nest
 En1            energy associated to a prey of type 1
 En2            energy associated to a prey of type 2
 Enp            energy (negative) associated to 1 second spent outside the nest for one robot
 ϕ1             incoming rate per second of prey of type 1
 ϕ2             incoming rate per second of prey of type 2
 ξ1             probability constant over time for a prey of type 1 to disappear
 ξ2             probability constant over time for a prey of type 2 to disappear
 µ1             inverse of the average time required to retrieve a prey of type 1
 µ2             inverse of the average time required to retrieve a prey of type 2
 ρ              probability to give up an ongoing retrieval
 β              probability for a robot to return to nest (i.e. switch to rest state)
 γ              probability for a robot to leave the nest and look for prey (i.e. switch to explore state)
 π1             probability to take a prey of type 1 upon encounter (i.e. switch to grasp state)
 π2             probability to take a prey of type 2 upon encounter (i.e. switch to grasp state)

                Table 2: The meaning of each parameter of the experiment.



  2. Conversely, robots in the rest state have a probability γ to come back to the explore state.
     Thus, on average, γI robots enter the explore state.

  3. Robots can also find a prey and decide to retrieve it. The probability for a single robot to
     find a single object being λ, the average number of exploring robots that find a prey of
     type i is ENi λ. As robots decide to retrieve the prey with probability πi , the average
     number of robots that leave the explore state to retrieve a prey of type i is πi ENi λ.

  4. We consider that robots have a probability µi to complete the retrieval of a prey of type
     i. Hence, on average, µi Ri robots complete a retrieval and come back to the explore
     state.

  5. Lastly, during the retrieval of a prey of type i, robots have a probability ρ to give up and
     come back to the explore state. On average, ρRi robots give up the retrieval of a prey of
     type i.
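System (1) can be integrated numerically. The sketch below uses forward Euler with illustrative parameter values (not taken from the report's experiments); a useful sanity check is that the equations conserve the total number of robots:

```python
# Forward-Euler integration of the model of Equation (1). Parameter values
# are illustrative. State: E, B, I, R = (R1, R2), N = (N1, N2).
def simulate(T=10, steps=200_000, dt=0.01,
             beta=0.01, gamma=0.02, kappa=0.05, lam=0.001,
             pi=(0.8, 0.4), mu=(0.05, 0.02), rho=0.005,
             phi=(0.02, 0.02), xi=(0.001, 0.001)):
    E, B, I = float(T), 0.0, 0.0     # all robots start in explore
    R, N = [0.0, 0.0], [0.0, 0.0]
    for _ in range(steps):
        # Derivatives, computed from the current state (Equation 1).
        dE = -beta * E + gamma * I
        for i in range(2):
            dE += -pi[i] * E * N[i] * lam + mu[i] * R[i] + rho * R[i]
        dB = beta * E - kappa * B
        dI = kappa * B - gamma * I
        dR = [pi[i] * E * N[i] * lam - (mu[i] + rho) * R[i] for i in range(2)]
        dN = [phi[i] - pi[i] * E * N[i] * lam - xi[i] * N[i] + rho * R[i]
              for i in range(2)]
        # Euler update.
        E += dt * dE; B += dt * dB; I += dt * dI
        for i in range(2):
            R[i] += dt * dR[i]
            N[i] += dt * dN[i]
    return E, B, I, R, N

E, B, I, R, N = simulate()
total = E + B + I + sum(R)   # conserved quantity: should remain equal to T
```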


3.3    Stable states
We shall now study the properties of the system once it has stabilized. We assume that the agents
no longer change their control parameters (π1, π2, β), and that the system thus converges to a
steady state in finite time. The number of robots in the environment is constant. Robots change
their state during the experiment: the macroscopic model describes the flows of robots exchanged
between those states. Hence, the sum of the robots in all states is always T, as described by the
conservation Equation (2).


\begin{equation}
T = B + I + E + \sum_{i=1}^{2} R_i \tag{2}
\end{equation}

    At steady state, the variables are constant and their time derivatives are zero. In the
following, we prove that there is only one steady state.




\begin{align}
\frac{\partial B}{\partial t} = 0 &\iff B = \frac{\beta E}{\kappa} \nonumber\\
\frac{\partial I}{\partial t} = 0 &\iff I = \frac{\beta E}{\gamma} \nonumber\\
\frac{\partial R_i}{\partial t} = 0 &\iff R_i = \frac{\pi_i E N_i \lambda}{\mu_i + \rho} \qquad \forall i \in \{1, 2\} \nonumber\\
\frac{\partial N_i}{\partial t} = 0 &\iff N_i = \frac{\varphi_i + \rho R_i}{\pi_i E \lambda + \xi_i} \qquad \forall i \in \{1, 2\} \tag{3}
\end{align}

   By substituting the expressions of the variables at steady state into the conservation Equation 2,
we get:

                   T = B + I + E + Σ_{i=1}^{2} Ri
                   T = βE/κ + βE/γ + E + Σ_{i=1}^{2} πi E Ni λ / (µi + ρ)
                   T = (1 + β (κ + γ)/(κγ)) E + Σ_{i=1}^{2} πi Eλϕi / (πi Eλµi + ξi µi + ξi ρ)  (4)

    According to Equation 4, and under the fair assumption that the parameters are strictly pos-
itive, there is a strictly monotonic relationship between E and T . As T is a constant, this implies
that for a given set of parameters defining an experiment, there is only one possible value of E at
stable state, and this value can be calculated using Equation 4. The remaining variables describing
the state of the system can be expressed, at stable state, as functions of E, as shown by Equation
3. Therefore, for a given set of parameters, there is only one possible stable state of the system.
Moreover, the values of the variables describing the system at stable state can be calculated.
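Since the right-hand side of Equation 4 is strictly increasing in E, the unique steady-state value of E can be found numerically by bisection. The following sketch illustrates this (Python; the function and argument names are ours, and the two prey types are passed as sequences):

```python
def steady_state_E(T, beta, kappa, gamma, lam, pi, mu, xi, phi, rho):
    """Solve Equation 4 for E by bisection, exploiting the fact that its
    right-hand side is strictly increasing in E.  Illustrative sketch;
    pi, mu, xi, phi are sequences indexed by prey type."""
    def rhs(E):
        total = (1 + beta * (kappa + gamma) / (kappa * gamma)) * E
        for i in range(len(pi)):
            total += (pi[i] * E * lam * phi[i]
                      / (pi[i] * E * lam * mu[i] + xi[i] * mu[i] + xi[i] * rho))
        return total
    lo, hi = 0.0, T  # E cannot exceed the total number of robots T
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if rhs(mid) < T:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the parameter values of Table 3, this converges to machine precision in a few dozen iterations.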

3.4     Validation
To evaluate the quality of the model and determine to what extent we can rely on its predictions,
it is mandatory to go through a validation process. This phase involves comparing the results
obtained in simulation against those of the model for a collection of typical experimental
situations. It is a critical stage, as we will rely on the model to draw conclusions about the
characteristics of the foraging algorithm introduced later.
   We define a range of reasonable values for each parameter of the experiment (except the
control parameters of the robots, π1 , π2 and β). Table 3 provides the ranges examined for each
parameter of the experiment. A configuration of the experimental setup is defined by selecting one
value for each parameter from its associated range. According to Table 3, there are 2160 possible
configurations, which define a large collection of possible experimental setups. We denote this
collection P, and Ci , i ∈ [1, 2160], one particular configuration of the experimental setup.

3.4.1   Prediction quality of the energy
The first validation test bears directly on the energy accumulated by a group of robots during an
experiment. The aim is to obtain a quantitative indication of the ability of the mathematical model
to predict the energy accumulated using a given behaviour.
    We use each configuration Ci ∈ P to parameterize a one-hour experiment. The
mathematical model is used to explore the space of the control parameters (π1 , π2 , β) ∈ [0, 1]3 .


                   Parameter       Range of values tested            Unit
                   T               1, 2, 3, 5, 10, 15                robot
                   N1(0)           5                                 prey of type 1
                   N2(0)           5                                 prey of type 2
                   λ               1/159.4                           probability
                   κ               1/19.51                           probability
                   En1             −100, −10, −1, 1, 10, 100         energy
                   En2             1                                 energy
                   Enp             −0.001, −0.01, −0.1               energy
                   ϕ1              1/15, 1/30, 1/60, 1/120, 1/180    prey / second
                   ϕ2              1/60                              prey / second
                   ξ1              0.002                             probability
                   ξ2              0.002                             probability
                   µ1              1/90, 1/40, 1/30, 1/60            1 / second
                   µ2              1/60                              1 / second
                   ρ               0.0111                            probability
                   β               control parameter                 probability
                   γ               1/400                             probability
                   π1              control parameter                 probability
                   π2              control parameter                 probability

Table 3: Each parameter of the experiment is given a range of reasonable values (i.e. compatible
with the dimensions of the experimental setup and not fostering interference among robots). By
assigning one value to each parameter, we define an experimental configuration. In total, there
are 2160 possible configurations.



The control space is discretized using a step of 0.05 units. The mathematical model is solved
for each triplet, and we identify the control parameters that yield the highest amount of energy
according to the predictions of the model. Hence, with each configuration Ci we associate the
predicted optimal behaviour OBi , found by an exhaustive exploration of the control space.
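The exhaustive exploration described above can be sketched as follows (Python; `predict_energy` is a hypothetical stand-in for solving the mathematical model at one triplet, not a function from the report):

```python
from itertools import product

def predicted_optimal_behaviour(predict_energy, step=0.05):
    """Exhaustively scan the control space (pi1, pi2, beta) in [0, 1]^3,
    discretized with the given step, and return the best triplet together
    with its predicted accumulated energy."""
    n = int(round(1.0 / step)) + 1
    grid = [min(1.0, i * step) for i in range(n)]  # 21 values per axis for step 0.05
    best_triplet, best_energy = None, float("-inf")
    for triplet in product(grid, repeat=3):        # 21^3 = 9261 evaluations
        energy = predict_energy(*triplet)
        if energy > best_energy:
            best_triplet, best_energy = triplet, energy
    return best_triplet, best_energy
```

At a step of 0.05 this costs 9261 model evaluations per configuration, which is cheap since solving the model reduces to the one-dimensional root-finding of Equation 4.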
    Now we define Rsim (C, B) and Rmac (C, B) as the energy accumulated by the collective
behaviour B in an experiment parameterized with the configuration C, respectively in simulation
and as predicted by the mathematical model.
    We plot the energy obtained in simulation as a function of the prediction of the mathematical
model. If the two values are identical, all the points lie on a straight line of slope 1. More
generally, if the model predicts the energy with a constant proportional error a and a constant
bias b, we have a linear relationship: Rsim = a · Rmac + b. We test the hypothesis of this linear
relationship using a linear regression.
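As an illustration, such a regression test can be carried out with a plain ordinary-least-squares fit; the sketch below is self-contained and makes no claim about the statistics package actually used for the report:

```python
def linear_fit(x, y):
    """Ordinary least squares y ~ a*x + b, returning slope a, bias b and
    the coefficient of determination r^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return a, b, r2
```

Applied to the pairs (Rmac, Rsim), the returned slope, bias and r2 correspond to the quantities a, b and r2 reported below.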
    Figure 5 presents the plot of Rsim (Ci , OBi ) as a function of Rmac (Ci , OBi ) for every (Ci , OBi )
with i ∈ [1, 2160]. The correlation coefficient of the linear regression (r2 = 0.98) is highly statistically
significant (p-value < 0.001). The slope of the regression line is a = 0.86 and the bias is b = −11.59,
indicating that the mathematical model predicts the energy obtained in simulation with a constant
error of 14%.
    The difference between the predicted energy and the one actually accumulated in simulation
can be explained partly by collisions, which are not modelled, and partly by the fact that the error
of the mathematical model accumulates over time.

3.4.2   Prediction quality of a comparison
The second test is meant to assess the ability of the model to compare the outcome of two different
behaviours. The test consists of selecting randomly two behaviours A and B from the control
space (π1 , π2 , β) ∈ [0, 1]3 . We compare the accumulated energy predicted for A and B. The same




                  [Figure 5: reward realized in simulation vs. predicted optimal reward]




Figure 5: The energy accumulated in simulation is plotted as a function of the energy predicted by
the mathematical model. The values were obtained using 2160 differently parameterized experimental
setups. The dashed line (with a slope of 1) indicates a perfect match between mathematical
predictions and simulations. A regression is performed on the data and displayed as the straight
line. The correlation coefficient is r2 = 0.98, the slope of the straight line is a = 0.86 and
the bias is b = −11.59, indicating that the model underestimates the outcome of the simulations
with a constant error of 14%.



comparison is carried out using a single simulation run for each behaviour. The question we
are asking is: does the mathematical model correctly predict which collective behaviour, A or B,
performs better?
    Again, we use each configuration Ci ∈ P to parameterize a one-hour experiment. For
each configuration Ci we generate 5 pairs of random behaviours (Aij , Bij ), j ∈ [1, 5]. Table 4
summarizes the frequencies of all possible comparison results for the 10800 tests performed. The
table indicates that in 85.35% of the tests, the mathematical model and the simulations agreed
on the ranking of the pair of behaviours. The table is almost symmetric and shows that the model
performs no better when A supersedes B than the opposite.

                                         Simulation
                    Model                R(A) < R(B)             R(A) > R(B)
                    R(A) < R(B)          43.22%                  7.64%
                    R(A) > R(B)          6.97%                   42.13%

Table 4: Comparison of the orders predicted by the mathematical model with the simulation
results. The notation R(A) < R(B) (respectively R(A) > R(B)) signifies that the energy
accumulated in conditions A is lower (respectively higher) than in conditions B.


    Moreover, we have studied the conditions in which disagreement between the mathematical
model and the simulations occurs. We did so by plotting the energy predicted for behaviour B
as a function of the energy predicted for behaviour A. Figure 6 shows as black circles the pairs of
behaviours that lead to disagreement between the mathematical model and the simulations. The
regression performed on the disagreeing data returns a correlation coefficient r2 = 0.98 and a
regression slope a = 1.00. The agreeing pairs of behaviours are plotted in gray. It is clear that


the wrong predictions of the model occur mainly when the two behaviours are expected to yield
very similar energies. Given that we use only one simulation run, without averaging, an error
caused by the noise in simulation is much more likely to appear for those pairs of behaviours.




              [Figure 6: reward predicted for behaviour B vs. reward predicted for behaviour A]




Figure 6: The energies predicted for behaviours B are plotted as a function of the energies predicted
for behaviours A. Two cases are superimposed: the gray circles are situations in which the
mathematical model and the simulation agree on which behaviour performs better, while the black
circles show where disagreement was found between the mathematical model and the simulations.
The dashed line delimits the possible orders, R(A) > R(B) below and R(A) < R(B) above. The
disagreeing cases in black are clearly aligned on the dashed line (r2 = 0.98, slope a = 1.00, bias
b = 7.84), indicating that the disagreements arise mainly when the expected energies of the two
behaviours are very much alike.




4     Efficient multi-foraging
4.1    Decision algorithm
The decision algorithm is a piece of code plugged in the controller of the robots. It is modifying
individual behaviour through the three control parameters π1 , π2 and β.
    To make decisions and modify their behaviours, robots rely on their partial perception. They
are “aware” of encounters with prey and with other robots, and of the time spent exploring the
environment. By dividing the number of perceived objects (prey or robots) by the exploration
time, robots can estimate a density of objects per second. As the environment is finite and
bounded, this density is a function of the spatial density. Robots are also able to perceive
individually the energy of a prey and the retrieval time of each type of prey.
    Depending on the environment, the best behaviour for a robot may be to stay in the nest
and save energy. Imagine that new prey appear in the environment: if the robots have decided
to stay in the nest forever, they will never find out that there is something better to do than
resting at the nest. This situation shows that the classical tradeoff between exploration and
exploitation holds. We implement a solution to this problem by setting a minimal value for
the time spent outside the nest and for the probabilities of taking prey of either type. On the
one hand, this threshold diminishes the efficiency of the robots; on the other hand, it guarantees
that robots keep updating their perception and sense the environment adequately.


    In the present algorithm we use a discount factor to limit the robots’ perception to a time
window. This technique lets the robots forget past observations and behave efficiently if the
environment changes. Instead of memorizing a sequence of observations, the robot counts the
encountered objects and the time spent exploring outside the nest. The discount factor weights
the influence of observations, giving more importance to the most recent events. Equation 5 is
used to estimate the density of a type of object in the environment (prey or robots).


        Object Count(t)   = Object Count(t − ∆t) · Discount Factor + Objects Observed
        Explore Time(t)   = Explore Time(t − ∆t) · Discount Factor + ∆t
        Object Density(t) = Object Count(t) / Explore Time(t)                                   (5)
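These update rules transcribe directly into code (Python sketch; the class and attribute names are ours):

```python
class DensityEstimator:
    """Discounted counters of Equation 5: each update multiplies the running
    counts by a discount factor, so recent observations dominate the estimate."""

    def __init__(self, discount=0.99):
        self.discount = discount
        self.object_count = 0.0
        self.explore_time = 0.0

    def update(self, objects_observed, dt):
        # Equation 5: decay the old counts, then add the new observations
        self.object_count = self.object_count * self.discount + objects_observed
        self.explore_time = self.explore_time * self.discount + dt

    def density(self):
        # Objects perceived per second of exploration
        return self.object_count / self.explore_time if self.explore_time > 0 else 0.0
```

A robot keeps one such estimator per object type (robots, prey of type 1, prey of type 2) and calls `update` at every control step spent exploring.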

    The decision algorithm relies on an equation that permits robots to individually estimate the
energy gained per second, or instantaneous amount of energy EI, that can be gained by the group.
In the following we detail the steps that lead to this equation.
    We start by calculating the rate of prey grasped per second by the group of robots:


                                                                      2
                                     preyRate =              Eλ           N i πi
                                                                  i=1


    The proportion of prey of type i grasped is:


                                   propi = EλNi πi / Σ_{j=1}^{2} EλNj πj
                                         = Ni πi / Σ_{j=1}^{2} Nj πj

    We also rely on the average retrieval time of a prey:


                                   retTime = Σ_{i=1}^{2} (1/µi) propi
                                           = (Σ_{i=1}^{2} (1/µi) Ni πi) / (Σ_{i=1}^{2} Ni πi)


   The previous expressions are used to calculate the average time to grasp a prey and then
retrieve it to the nest:


                   preyToNest = 1/preyRate + retTime
                              = 1/(Eλ Σ_{i=1}^{2} Ni πi) + (Σ_{i=1}^{2} (1/µi) Ni πi) / (Σ_{i=1}^{2} Ni πi)
                              = (1 + Eλ Σ_{i=1}^{2} (1/µi) Ni πi) / (Eλ Σ_{i=1}^{2} Ni πi)


    Lastly, we obtain the expression of EI, the instantaneous amount of energy acquired by the
group of robots:




       EI = E · Enp + Σ_{i=1}^{2} Eni · propi · (1/preyToNest)
          = E · Enp + Σ_{i=1}^{2} Eni · (Ni πi / Σ_{j=1}^{2} Nj πj) · (Eλ Σ_{j=1}^{2} Nj πj) / (1 + Eλ Σ_{j=1}^{2} (1/µj ) Nj πj)
          = E · Enp + Eλ (Σ_{i=1}^{2} Eni Ni πi) / (1 + Eλ Σ_{j=1}^{2} (1/µj ) Nj πj)           (6)

    Equation 6 can be used by each robot to estimate the rate of energy currently gained by the
group. All the parameters of this equation, except λ, are either control parameters or can be
estimated by the robots during exploration of the environment. Indeed, using Equation 5, each
robot can estimate the density of robots or prey of either type in the environment, which are
λE, λN1 and λN2 respectively. To compute EI, robots also need to know λ. This parameter could
be estimated, for instance, by measuring the time to return to the nest, but collisions with other
robots may degrade the quality of such an estimate. In the following, the robots are given the
parameter λ that characterizes the size of the environment.
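For illustration, Equation 6 transcribes directly into code (Python; the argument names mirror the report's symbols, with per-type quantities passed as sequences):

```python
def instantaneous_energy(E, Enp, Eni, Ni, pi, mu, lam):
    """Instantaneous energy rate EI of Equation 6 (direct transcription).
    Eni, Ni, pi and mu are sequences indexed by prey type."""
    # Shared denominator of Eq. 6: 1 + E*lambda * sum_j (1/mu_j) N_j pi_j
    retrieval_load = 1.0 + E * lam * sum(n * p / m for n, p, m in zip(Ni, pi, mu))
    # Gross gain from retrieved prey: E*lambda * sum_i En_i N_i pi_i
    gross_gain = E * lam * sum(e * n * p for e, n, p in zip(Eni, Ni, pi))
    # E * Enp accounts for the (negative) energy spent by exploring robots
    return E * Enp + gross_gain / retrieval_load
```

Every factor here is either a control parameter or a quantity the robot estimates online with the discounted counters of Equation 5 (except λ, which is given, as stated above).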
    Based on the previously described sensing capabilities and on Equation 6 for the instantaneous
energy, we introduce the decision algorithm (Algorithm 1). In a nutshell, this algorithm estimates
the parameters of the experiment using the observations of the robot. It then estimates the impact
of several possible parameter triplets (E, π1 and π2 ) on the rate of energy EI. The control
parameters of the robot are then updated to converge towards a behaviour that maximizes EI.
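One step of this update scheme might be sketched as follows (Python; `expected_EI` stands for an evaluation of Equation 6 under a candidate triplet, and the `floor` argument reflects the minimal values discussed above; the exact clamping used in the report is not specified, so that part is an assumption of ours):

```python
def decision_step(E, pi1, pi2, beta, expected_EI, learn_step=0.05, floor=0.05):
    """One update of the decision scheme: evaluate four candidate collective
    behaviours with expected_EI(E, pi1, pi2), then nudge the control
    parameters towards the best one."""
    candidates = {
        "b1": (E + 1, 1, 1),   # more explorers, take both prey types
        "b2": (E + 1, 0, 1),   # more explorers, ignore prey of type 1
        "b3": (E + 1, 1, 0),   # more explorers, ignore prey of type 2
        "b4": (0, 0, 0),       # everybody rests at the nest
    }
    best = max(candidates, key=lambda k: expected_EI(*candidates[k]))
    if best == "b1":
        beta, pi1, pi2 = beta - learn_step, pi1 + learn_step, pi2 + learn_step
    elif best == "b2":
        beta, pi1, pi2 = beta - learn_step, pi1 - learn_step, pi2 + learn_step
    elif best == "b3":
        beta, pi1, pi2 = beta - learn_step, pi1 + learn_step, pi2 - learn_step
    else:  # b4: resting pays best, raise the probability of returning to the nest
        beta, pi1, pi2 = beta + learn_step, pi1 - learn_step, pi2 - learn_step
    clamp = lambda x: min(1.0, max(floor, x))  # keep a minimal exploration level
    return clamp(pi1), clamp(pi2), clamp(beta)
```

Repeated calls move (π1, π2, β) by learnStep per decision, so the group converges gradually rather than switching behaviour abruptly.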

4.2    Performance
We assess the performance of the decision algorithm by carrying out a systematic comparison of the
energy accumulated in simulation with the energy obtained by the predicted optimal behaviour. As
in the validation process (see Section 3.4), we use P, the collection of 2160 different configurations
of the experimental setup. For each configuration Ci ∈ P, we use the mathematical model to find
the predicted optimal behaviour OBi .
    In order not to include the error of the model in the estimation of the energy accumulated
by this behaviour, we use a single simulation run to determine the energy gain Rsim (Ci , OBi )
associated with the predicted optimal behaviour. We also run a single simulated experiment with
the decision algorithm plugged into the behaviour of each individual. The control parameters are
initially set to (1, 1, 1), so that robots start by exploring the environment. The energy accumulated
in a simulated experiment with a configuration Ci and the decision algorithm plugged into the
robots’ controllers is denoted Rdec (Ci ).
    The energy Rdec (Ci ) is compared to the predicted optimal energy Rsim (Ci , OBi ). As in the
validation process, we rely on a linear regression applied to Rdec (Ci ) and Rsim (Ci , OBi ) (Figure 7).
The correlation coefficient (r2 = 0.98) indicates that the linear relationship hypothesis holds (p-value
< 0.001). The slope of the regression line is a = 0.99 and the bias is b = −23.73, which means
that the decision algorithm performs on average 99% as well as the predicted optimal behaviour.

4.3    Adaptivity
We now focus on the adaptivity of the decision algorithm. The goal is to study how close robots get
to the optimal behaviour in changing environments. We use experiments split into three
periods of 1 hour each. Two configurations Ci and Cj are randomly selected from the pool P of
2160 possible experimental setups. Each period is assigned a configuration, the first and third
periods sharing the same one. Thus, during the whole experiment the setup is parameterized
by the sequence (Ci , Cj , Ci ) of configurations. Repeating the configuration Ci permits checking
whether robots can revert harmlessly to a previous behaviour, that is, whether there is no memory


Algorithm 1: Decision algorithm

     Prepare four possible collective behaviours :
     Set b1 := (E + 1, 1, 1)
     Set b2 := (E + 1, 0, 1)
     Set b3 := (E + 1, 1, 0)
     Set b4 := (0, 0, 0)
     for each bi do
        use bi to compute an expected EI
     end
     Find the triplet bi that yields the highest expected EI
     if bi == b1 then
        decrease β by learnStep
        increase π1 by learnStep
        increase π2 by learnStep
     end
     if bi == b2 then
        decrease β by learnStep
        decrease π1 by learnStep
        increase π2 by learnStep
     end
     if bi == b3 then
        decrease β by learnStep
        increase π1 by learnStep
        decrease π2 by learnStep
     end
     if bi == b4 then
        increase β by learnStep
        decrease π1 by learnStep
        decrease π2 by learnStep
     end



effect elicited by the decision algorithm. If the number of robots differs from one period to
another, newly introduced robots have their control parameters initially set to (1, 1, 1); conversely,
robots to be removed are chosen randomly from the set of present robots. In total, we compute a
set of 1000 random combinations of configurations.
    To describe the adaptive performance of the algorithm, we again use linear regressions and
their associated correlation coefficients. Linear regressions are carried out on data extracted every
5 seconds during the experiments. These data are the energies Rdec obtained in
simulation and the energies Rsim obtained by the predicted optimal behaviour, for
each of the 1000 simulated experiments.
    Figure 8(b) presents the evolution of the correlation coefficients obtained from the linear
regressions. It shows that a linear relationship between the predictions of the model and the
realizations of the simulation can be assumed after approximately 10 minutes of experiment.
Figure 8(a) shows the evolution of the slope obtained from the linear regressions.
    At the very beginning of the experiment, the measured performance of the decision algorithm
is at its highest but drops very quickly, because the robots do not yet have enough observations to
make sensible decisions. After 10 minutes, the performance of the decision algorithm reaches on
average 90% of the energy accumulated by the predicted optimal behaviour. The performance keeps
growing slowly until the first change of configuration at 60 minutes; the configuration changes are
marked on the figures by vertical lines at t = 3600 and t = 7200 seconds. The sudden change of
the experimental parameters decreases the




                  [Figure 7: energy accumulated in simulation vs. predicted optimal energy]




Figure 7: The energy obtained using the decision algorithm is plotted as a function of the energy
of the predicted optimal behaviour. The values were obtained using 2160 differently parameterized
experimental setups. The dashed line (with a slope of 1) indicates the predicted optimal
performance. A regression is performed on the data and its result is represented by the straight
line. The correlation coefficient is r2 = 0.98, the slope of the straight line is a = 0.99 and the
bias is b = −23.73. This means that the decision algorithm performs on average 99% as well as the
predicted optimal behaviour.



performance. Although the group of robots adapts its behaviour, it shows a noticeable decline in
performance, down to 80% on average.
    The second sudden change occurs 120 minutes after the beginning of the experiment. As the
first and last periods have identical configurations, a memory effect would be noticeable there. Such
a phenomenon was indeed reported by [LDD04], albeit for a different setup and decision
algorithm. The similarities between that work and the present one are strong enough to prompt us
to verify the presence of a memory effect that could possibly prevent the group from adapting towards
the optimal behaviour. We see in Figure 8(a) that the performance during the third period (85%
on average) is lower than in the first period. This is because energy was lost during the second
period. The performance increases slightly at the beginning of the third period and stabilizes, but
the loss of energy incurred during the second phase cannot be fully recovered. Consequently, there
is noticeable evidence of a memory effect: once robots have adapted to a given environment, they
have difficulties reaching the same level of performance if the environment changes suddenly.


5     Discussion
5.1    Achievements and contributions
We have designed a mathematical model of the experiment. The validation of this model has
shown that the energy accumulated by a group of robots during an experiment can be predicted
with an average error of 14%. More importantly, we have shown that the model can be used
to successfully rank two different behaviours in 85% of the cases tested. We showed that the




           [Figure 8: two panels, (a) regression slope and (b) r squared, plotted against time (s)]


Figure 8: (a) The evolution of the slope of linear regressions performed every 5 seconds during
each experiment. Each linear regression is performed using the energies obtained by the decision
algorithm and those obtained by the predicted optimal behaviour at a given time. Linear
regressions are made using the data of 1000 tests. (b) The correlation coefficient of the linear
regressions indicates to what extent a linear relationship holds between the energies obtained by
the decision algorithm and by the predicted optimal behaviour. The dashed horizontal line indicates
the performance of the predicted optimal behaviour. The two dotted vertical lines indicate the
times when the configurations of the experiments were randomly changed (after 3600 and 7200
seconds). The decision algorithm adapts towards the optimal behaviour (at least 90% after 10
minutes) and is markedly impacted by the sudden changes of experimental parameters during
the second period.



errors in the remaining 15% arose only in ambiguous cases in which the energies yielded by the two
compared behaviours are very similar. This new tool, previously unavailable in the literature for
the multi-foraging task, makes it possible to evaluate the performance of robots on an absolute scale.
    An equation for calculating the average instantaneous reward gained by the group of robots has
been devised and used to implement a decision algorithm for the robots. The tests have
shown that robots using this decision algorithm manage to accumulate 99% of the energy that
can possibly be gained.
    Related works in the literature can be categorised into multi-foraging experiments and foraging
experiments.
    The study of Labella et al. [LDD04] focuses on division of labour among robots. Although
the authors define a concept of efficiency, they do not have a mathematical model to quantify
the performance of the robots on an absolute scale. Our model could be tested in this frame-
work to understand the similarity between their performance metric and the concept of energy we
introduced.
    Ulam & Balch [UB04] have proposed a Q-learning algorithm for a multi-foraging task that
differs from the one of the present document. Q-learning is used offline, and the learned behaviour
is evaluated against predictions made with a mathematical model of optimal foraging. The ad-
vantage of our method is that the decision algorithm works online and finds a very good solution
in approximately 10 minutes. Moreover, our task differs slightly in that its mathematical model
is non-linear. It is known that a classical Q-learning algorithm does not perform well if the system
is non-Markovian [WD92]. Therefore, a simple Q-learning algorithm may not be suited to our task.
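The tabular Q-learning update of Watkins & Dayan [WD92] referenced above can be sketched as follows. This is a generic illustration, not the algorithm of Ulam & Balch nor the decision algorithm of the present report; the states, actions, and reward used here are placeholders.

```python
# Generic tabular Q-learning update (Watkins & Dayan [WD92]).
# States, actions, and rewards are illustrative placeholders only.
ALPHA, GAMMA = 0.1, 0.9            # learning rate and discount factor
states = ["searching", "resting"]
actions = ["forage", "rest"]
Q = {(s, a): 0.0 for s in states for a in actions}

def update(s, a, reward, s_next):
    """One step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])

update("searching", "forage", 1.0, "resting")
print(Q[("searching", "forage")])  # the estimate moves towards the observed reward
```

The convergence guarantee behind this rule assumes a Markovian system, which is precisely the assumption that fails in our non-linear task.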
    Lerman et al. [LJGM06] have proposed a mathematical study of a multi-foraging task. The


authors mention the possible use of a concept of energy (they use the term reward) but do not
propose a solution. As a consequence, the robots' behaviour proposed in their article is probably
not efficient for our task, although several of their results may carry over.
    The last related work we would like to mention is that of Liu et al. [LWS+ 06]. In their
article, the authors introduce the concept of energy in a simple foraging task and propose several
behavioural mechanisms that lead the group of robots to forage more efficiently. The authors lack a
validated mathematical model, but their conclusions clearly apply to the present work; in
particular, implementing recruitment among robots could speed up foraging.

5.2       Perspectives and future work
In our work, we deliberately neglected collisions among robots. Lerman et al. [LG02] emphasized
the impact of interference on the efficiency of a group of robots. It is likely that robots could
perceive or estimate this decrease of performance and cope with the phenomenon automatically.
In the future, we would like to design a multi-foraging experiment in which collisions happen
at a high rate and strongly impact the performance of the robots, in order to study how well the
group can adapt its behaviour.
    It may be possible to have robots forage efficiently without knowledge of the λ parameter,
although that would probably degrade their performance. We will work in this direction to make
the behaviour of the robots entirely free of a priori knowledge of the environment.
    Our decision algorithm may be subject to a memory effect similar to the one reported by
Labella et al. [LDD04]. We plan to investigate a new method of updating perception that restricts
the long-term memory of the robots more strongly, and therefore possibly reduces the memory
effects that decrease the adaptivity of the robots. Another possibility is that the robots simply
need a long time to adapt to a sudden change. We will check this hypothesis and try to accelerate
adaptation if the problem is indeed a matter of speed.
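One way to restrict long-term memory in a perception update, sketched here purely as an assumed illustration and not as the method we plan to implement, is an exponential moving average whose rate parameter bounds the influence of old observations:

```python
# Hypothetical perception update: an exponential moving average (EWMA).
# A larger rate eta discounts old observations faster, shortening the
# robot's effective memory and thus limiting a memory effect.

def update_estimate(estimate, observation, eta):
    """Blend the new observation into the running estimate."""
    return (1.0 - eta) * estimate + eta * observation

# The weight of an observation made k steps ago decays as (1 - eta)**k,
# so the effective memory horizon is roughly 1/eta steps.
estimate = 0.0
for obs in [1.0, 1.0, 1.0, 1.0]:
    estimate = update_estimate(estimate, obs, eta=0.5)
print(estimate)  # approaches 1.0 as old (zero) history is forgotten
```

Tuning eta would then trade adaptivity against noise sensitivity, which is exactly the trade-off the memory-effect investigation would have to resolve.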
    As Labella et al. [LDD04] and Liu et al. [LWS+ 06] did, we observed a division of labour
among robots. We did not report these observations in the present document, but further
investigation may provide interesting information on the characteristics of the behaviour of the
group of robots.
    Recruitment of foraging robots could improve performance. We did not implement this feature
for the present study, but it seems a promising direction for enhancing the foraging capabilities
of the group.
    Lastly, to validate our approach and assess the realism of our simulations we plan to carry out
a number of robotic experiments.

Acknowledgments.          This research has been supported by COMP2SYS, a Marie Curie Early Stage
Research Training Site funded by the European Community’s Sixth Framework Programme under contract
number MEST-CT-2004-505079. The information provided is the sole responsibility of the authors and
does not reflect the opinion of the sponsors. The European Community is not responsible for any use that
might be made of data appearing in this publication.
   The authors would like to thank Thomas H. Labella and Mauro Birattari for fruitful discussions, and
Shervin Nouyan for careful reading.


References
[Bal99]      Tucker Balch. Reward and diversity in multirobot foraging. Workshop on Agents
             Learning About and with Other Agents (IJCAI-99), Stockholm, Sweden, 1999.

[Cha76]      Eric L. Charnov. Optimal foraging, the marginal value theorem. Theoretical Popula-
             tion Biology, 9(2):129–136, 1976.

[DSahin04] M. Dorigo and E. Şahin. Swarm robotics – special issue editorial. Autonomous Robots,
           17(2–3):111–113, 2004.
18                                    IRIDIA – Technical Report Series: TR/IRIDIA/2006-027


[DTG+ 05] Marco Dorigo, Elio Tuci, Roderich Groß, Vito Trianni, Thomas Halva Labella, Shervin
          Nouyan, Christos Ampatzis, Jean-Louis Deneubourg, Gianluca Baldassarre, Stefano
          Nolfi, Francesco Mondada, Dario Floreano, and Luca Maria Gambardella. The swarm-
          bots project. Lecture Notes in Computer Science, 3342:31–44, 2005.

[IMBG01] Auke Jan Ijspeert, Alcherio Martinoli, Aude Billard, and Luca Maria Gambardella.
         Collaboration through the exploitation of local interactions in autonomous collective
         robotics: The stick pulling experiment. Autonomous Robots, 11(2):149–171, 2001.
[KAKG02] S. Kazadi, A. Abdul-Khaliq, and R. Goodman. On the convergence of puck clustering
         systems. Robotics and Autonomous Systems, 38(2):93–117, 2002.
[LDD04]    Thomas H. Labella, Marco Dorigo, and Jean-Louis Deneubourg. Efficiency and task
           allocation in prey retrieval. Lecture Notes in Computer Science, 3141:274–289, 2004.
[LG02]     Kristina Lerman and Aram Galstyan. Mathematical model of foraging in a group of
           robots: Effect of interference. Autonomous Robots, 13(2):127–141, 2002.
[LJGM06] Kristina Lerman, Chris Jones, Aram Galstyan, and Maja J. Matarić. Analysis of
         dynamic task allocation in multi-robot systems. The International Journal of Robotics
         Research, 25(3):225–241, 2006.

[LWS+ 06] W. Liu, A. Winfield, J. Sa, J. Chen, and L. Dou. Strategies for energy optimisation in
          a swarm of foraging robots. In Lecture Notes in Computer Science: Swarm Robotics.
          Springer, Berlin/Heidelberg, 2006.
[ME03]     Alcherio Martinoli and Kjerstin Easton. Modeling swarm robotic systems. Springer
           Tracts in Advanced Robotics, 5:297–306, 2003.

[SS97]     Ken Sugawara and Masaki Sano. Cooperative acceleration of task performance: For-
           aging behavior of interacting multi-robots system. Physica D: Nonlinear Phenomena,
           100(3-4):343–354, 1997.
[SSYA98]   K. Sugawara, M. Sano, I. Yoshihara, and K. Abe. Cooperative behavior of interacting
           robots. Artificial Life and Robotics, 2:62–67, 1998.
[UB04]     Patrick Ulam and Tucker Balch. Using optimal foraging models to evaluate learned
           robotic foraging behavior. Adaptive Behavior - Animals, Animats, Software Agents,
           Robots, Adaptive Systems, 12(3-4):213–222, 2004.
[WD92]     Christopher J. C. H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8:279–
           292, 1992.
Towards Energy Optimization: Emergent Task Allocation in a Swarm of Foraging Robots
               Wenguo Liu, Alan F. T. Winfield, Jin Sa, Jie Chen and Lihua Dou
                              Adaptive Behavior 2007; 15; 289
                             DOI: 10.1177/1059712307082088

              Towards Energy Optimization: Emergent Task
              Allocation in a Swarm of Foraging Robots

              Wenguo Liu1, Alan F. T. Winfield1, Jin Sa1, Jie Chen2, Lihua Dou2
              1
               Bristol Robotics Lab, University of the West of England, Bristol, UK
              2
               Intellectual Information Technology Lab, Beijing Institute of Technology, China


This article presents a simple adaptation mechanism to automatically adjust the ratio of foragers to resters (division of labor) in a swarm of foraging robots and hence maximize the net energy income to the swarm. Three adaptation rules are introduced based on local sensing and communications. Individual robots use internal cues (successful food retrieval), environmental cues (collisions with team-mates while searching for food) and social cues (team-mate success in food retrieval) to dynamically vary the time spent foraging or resting. Simulation results show that the swarm demonstrates successful adaptive emergent division of labor and robustness to environmental change (in food source density), and we observe that robots need to cooperate more when food is scarce. Furthermore, the adaptation mechanism is able to guide the swarm towards energy optimization despite the limited sensing and communication abilities of the individual robots and the simple social interaction rules. The swarm also exhibits the capacity to collectively perceive environmental changes; a capacity that can only be observed at a group level and cannot be deduced from individual robots.


              Keywords swarm foraging · swarm robotics · task allocation · emergent division of labor



1    Introduction

Inspired by the swarm intelligence observed in social insects, robotic swarms are fully distributed systems in which overall system tasks are typically achieved through self-organization or emergence rather than direct control (Bonabeau, Dorigo, & Théraulaz, 1999). In swarm robotics a number of relatively simple robots, each with limited sensing, actuation and cognition, work together to collectively accomplish a task. Such tasks may be biologically plausible, such as cluster sorting (Holland & Melhuish, 1999) or cooperative stick-pulling (Martinoli, 1999), or they may have no parallel in nature, such as coherent wireless networking (Nembrini, Winfield, & Melhuish, 2002). Foraging, however, is a compelling example of a swarm behavior that can be transferred from natural to artificial systems because of the one-to-one correspondence between ant and robot and between food-item and energy units. Foraging can, in principle, be undertaken by a single robot given enough time, but a swarm of robots working together should be able to complete the task more quickly and effectively. Foraging is, therefore, an example of a swarm behavior in which it is not the task itself, but the way the task is (self-)organized, that is the desired emergent property of the swarm.

    It is a characteristic of foraging that the efficiency of the swarm will not increase monotonically with


Correspondence to: Wenguo Liu, Bristol Robotics Lab, Faculty of Computing, Engineering and Mathematical Sciences, University of the West of England, Bristol, BS16 1QY UK. E-mail: wenguo.liu@brl.ac.uk

Copyright © 2007 International Society for Adaptive Behavior (2007), Vol 15(3): 289–305. DOI: 10.1177/1059712307082088. Figures 2–8 appear in color online: http://adb.sagepub.com



290      Adaptive Behavior 15(3)

swarm size because of the negative impact of interfer-                                  communicate but observe each other and maintain a
ence, for instance due to robots competing for space                                    limited memory of observed activities of other robots
(overcrowding) either while searching or when con-                                      and tasks which need to be performed. Guerrero and
verging on the nest. Research has focused on how to                                     Oliver (2003) present an auction-like task allocation
carefully design the basic behavior or communication                                    model, partially inspired by auction and threshold-
protocols for the individual robots in order to mini-                                   based methods, to try to determine the optimal
mize the interference between robots. qstergaard,                                       number of robots needed for foraging, however, the
Sukhatme, and Matariƒ (2001) applied a bucket bri-                                      demands of communication between robots during the
gade approach to minimize the interference for robots                                   auction process constrains the scalability of their
near the home region, where competition for space                                       method
occurs much more frequently. In their experiment                                             This article builds upon previous work in task
each robot is allocated a specific working area and                                     allocation or division of labor in a number of ways.
delivery region so that they are less likely to collide                                 Firstly, our overall goal is that the swarm maximizes
with each other in a relatively crowded area. Lerman                                    its net energy income. Secondly, we investigate a
(2002) developed a macroscopic probabilistic model                                      richer set of adaptation rules, or cues, for individual
to quantitatively analyze the effect of swarm size and                                  robots: internal cues (successful food retrieval), envi-
interference among robots on the overall performance                                    ronmental cues (collisions with team-mates while
based on qstergaard’s experiment, and some research-                                    searching for food) and social cues (team-mate suc-
ers point out that there is a critical swarm size in order                              cess in food retrieval). Social cues are triggered by
to maximize the efficiency of the group for a given                                     pheromone-like local communication between robots.
task (Rosenfeld, Kaminka, & Kraus, 2005).                                               Thirdly, we evaluate a number of control strategies
     Other researchers have applied a threshold-based                                   based upon different combinations of these internal,
approach inspired from biological systems, as first                                     environmental and social cues, in order to discover the
described by ThJraulaz, Bonabeau, and Deneubourg                                        relative merit of the cues in optimizing the net energy
(1998) in investigating the division of labor in social                                 income to the swarm. Our foraging swarm makes use
insects, to allocate robots to each task: resting or for-                               of local sensing and communication only, but the
aging, in order to achieve the task efficiently. Krieger                                overall swarm exhibits properties of emergence and
and Billeter (2000) implement a swarm of up to 12                                       adaptive self-organization; that is adaptation to envi-
real robots engaged in a foraging task. In their experi-                                ronmental changes (in food density). The approach
ments each robot is characterized with a different ran-                                 presented in this article thus meets the criteria for
domly chosen activation-threshold, that is, the energy                                  swarm robotics articulated by Ôahin (2005) and Beni
level of the nest below which a given robot is trig-                                    (2005).
gered to go and collect food-items, in order to regulate                                     This article proceeds as follows. In Section 2 we
the activity of the team. Labella, Dorigo, and Deneu-                                   introduce the basic adaptation mechanism for the indi-
bourg (2006) introduce a simple adaptive mechanism                                      vidual robots in the swarm. In Section 3, descriptions
to change the ratio of foragers to resters to improve                                   of the foraging task and the experimental environment
the performance of the system where the probability                                     are given. Section 4 presents the experimental results,
of leaving home for one robot is adjusted dynamically                                   in which we compare the performance of the system
based on successful retrieval of food. They reward                                      with different adaptation rules. We conclude the arti-
successful food retrieval and punish failure in order to                                cle in Section 5.
vary the probability of leaving home. Self-organized
division of labor between resting and foraging is
observed with this simple adaptation mechanism. How-                                    2       Adaptation Mechanism
ever, a disadvantage of this approach is the absence of
knowledge about the other robots in the swarm.                                          2.1 Problem Description
     In contrast with the purely threshold-based                                        Consider the swarm foraging scenario: there are a
approaches, Jones and Matariƒ (2003) describe an                                        number of food-items randomly scattered in the arena
adaptive method for division of labor between a col-                                    and as food is collected more will, over time, “grow”
lection of red or green pucks, in which robots do not                                   to replenish the supply. A swarm of robots are search-




                                        Downloaded from http://adb.sagepub.com at PENNSYLVANIA STATE UNIV on February 7, 2008
                        © 2007 International Society of Adaptive Behavior. All rights reserved. Not for commercial use or unauthorized distribution.
                                             Liu et al. Emergent Task Allocation in a Swarm of Foraging Robots                                          291


ing and retrieving food-items back to the “nest”. Each                                      number of foragers x or decrease the average retrieval
food-item collected will deliver an amount of energy                                        time f(x,ρ). The more robots engaged in foraging,
to the swarm but the activity of foraging will consume                                      however, the more likely are the robots to compete for
a certain amount of energy at the same time. The goal                                       the limited resources of food and physical space. Thus,
of the swarm is to forage as much food as possible                                          increasing the number of foragers results in more inter-
over time in order to survive. Due to the limited avail-                                    ference between robots and robots take longer to find
ability of food, however, robots have to switch                                             and retrieve a food-item, that is, the average retrieval
between foraging and resting in order to maximize the                                       time for the swarm f(x,ρ) will increase with the number
net energy income to the swarm. The challenge is to                                         of foragers x increasing when the food density ρ
achieve this with no centralized control and robots                                         remains constant. Therefore, for a given ρ there should
with only minimal local sensing and communication.                                          be an optimal value, say X*, for x so that Eaverage in
Assume that:                                                                                Equation 3 is maximized. It is clearly the case that X*
                                                                                            will change if ρ changes. However, the function f(x,ρ)
•   each robot will consume energy at A units per sec-                                      will be quite complex and hard to model because of
    ond while searching or retrieving and at B units                                        the complexity of the interactions between robots;
    per second while resting, where A > B and energy                                        although it may ultimately be possible to develop a
    consumption depends on the actuators and sen-                                           detailed mathematical model in order to find an opti-
    sors used in different states;                                                          mal value of X*, using for example the probabilistic
•   each food-item collected by a robot will provide                                        approach developed by Martinoli and Easton (2004).
    C units of energy to the swarm;                                                         It seems more practical to adopt a bottom-up design
•   average retrieval time (the time spent on one suc-                                      process (a typical characteristic of the swarm robotics
    cessful search and retrieval cycle), denoted by t, is                                   methodology), resulting in a swarm that is able to
    a function of the number of foraging robots,                                            dynamically adapt the ratio of foragers to resters
    denoted by x, and the density of the food, denoted                                      through the interaction between robots and between
    by ρ, in the environment, say:                                                          robots and the environment.

    t = f(x,ρ)                                                                 (1)
                                                                                            2.2 Robot Controller
Let N be the size of swarm, Econsumed be the energy                                         We first need a controller design for the robot. Figure 1
consumed and Eretrieval be the energy collected per sec-                                    represents the control program for each robot in the
ond for the swarm, then we have                                                             swarm. Note that in order to keep the diagram clear,
                                                                                            with the exception of state resting, each state will move
    E consumed = Ax + B ( N – x )                 ( /s )                                    to state avoidance – not shown in Figure 1 – whenever
                                                                                            the proximity sensors are triggered. The behaviors for
                                                                                            the foraging task are:
                                Cx
     E retrieval = Cx ⁄ t = ---------------         ( /s )                     (2)
                            f ( x, ρ )
                                                                                            leavinghome: robot exits the nest region and resumes
                                                                                                its search;
The average energy income per second for the swarm                                          randomwalk: robot moves forward and at random
is                                                                                              intervals turns left or right through a random arc
                                                                                                using the camera to search for food;
    Eaverage = Eretrieval – Econsumed                                                       movetofood: if food is sensed by the camera, robot
                                                                                                adjusts its heading and moves towards the food;
               C                                                                          grabfood: if the food is close enough, triggered by
            =  --------------- – ( A – B ) x – BN                            (3)              sensors, robot closes its gripper and grabs the
               f ( x, ρ )                                                                     food;
                                                                                            scanarena: because of interference between robots
Equation 3 shows that in order to maximize energy                                               sometimes a robot will lose sight of the target
income for the swarm we need to either increase the                                             food-item while moving towards it; if this hap-




                                         Downloaded from http://adb.sagepub.com at PENNSYLVANIA STATE UNIV on February 7, 2008
                         © 2007 International Society of Adaptive Behavior. All rights reserved. Not for commercial use or unauthorized distribution.
292      Adaptive Behavior 15(3)




Figure 1 State transition diagram of foraging task.



    pens the robot will scan the immediate area by                                      2.3 Adaptation Rules
    turning a random angle to find the lost food. If
    successful, it will move to the food (movetofood)                                   In swarm foraging each individual robot has only lim-
    again, if not, then randomwalk;                                                     ited local sensing and communications. A robot can-
movetohome: robot moves towards the home location                                       not get global information, for instance how much food
    with the food;                                                                      is available or how many other robots are engaged in
deposit: robot unloads the food-item in the nest;                                       foraging. How, then, can robots adjust the two time
resting: robot rests at home;                                                           threshold parameters (Ths and Thr) in order to improve
homing: if searching time is up and no food has been                                    group efficiency, based on local information? Our
    grabbed, then return home;                                                          inspiration comes from the widely observed phenome-
avoidance: robot avoids obstacles, walls and other                                      non in nature, such as in schools of fish, flocks of
    robots whenever its proximity sensors are trig-                                     birds, and so forth. The group behavior emerges from
    gered; after completing a successful avoidance                                      the interactions of individuals by essentially following
    behavior the robot returns to its previous state.                                   the rule “I do what you do”, “I go where you go”
                                                                                        (Camazine et al., 2001). If a robot follows the rule “I
We now introduce two internal clock counters for the robots to regulate their activities (foraging or resting). Ts is used to count the time the robot spends searching, and Tr to count the time it rests in the nest. As shown in Figure 1, the transition from states randomwalk, scanarena and movetofood to state homing is triggered when the searching time Ts reaches its threshold Ths, that is, Ts ≥ Ths; such a transition reduces the number of foragers, which in turn minimizes the interference due to overcrowding, thus reducing the average retrieval time. The transition from state resting to state leavinghome, triggered when the robot has rested for long enough, that is, Tr ≥ Thr, drives the robot back to work to collect more food for the colony, which means increasing the number of foragers in the swarm. The efficiency of the swarm might be improved if robots are able to autonomously adjust their searching time threshold Ths and resting time threshold Thr.

forage when you forage", "I rest when you rest", it may be possible to change the ratio of the foragers and resters in the swarm and improve the efficiency of the system. In order to achieve this, we introduce three rules to change the two parameters, Ths and Thr, based on (i) environmental cues, (ii) internal cues, and (iii) social cues, explained as follows:

• Environmental cues. If a robot that is searching for food collides with other robots, it will reduce Ths and increase Thr because "there are already enough foragers in the arena, I'd better go back to the nest sooner so I don't get in the others' way".

• Internal cues. After a successful food retrieval cycle, a robot will reduce Thr because "there may be more food, I'd better go back to work as soon as possible". Alternatively, if a robot fails to find food, indicated by its searching time running out, the robot will increase Thr since "there seems to be little food available, I'd better rest for longer".




                                        Downloaded from http://adb.sagepub.com at PENNSYLVANIA STATE UNIV on February 7, 2008
                        © 2007 International Society of Adaptive Behavior. All rights reserved. Not for commercial use or unauthorized distribution.
                                            Liu et al. Emergent Task Allocation in a Swarm of Foraging Robots                                                         293

For social cues we introduce a pheromone-like mechanism into the swarm. Two types of virtual pheromone, indicating either success or failure of food retrieval, are deposited by the robots returning to the nest (implemented here by a shared whiteboard that each robot can both write to and read from). These pheromone levels gradually decay with time. All robots resting at home are able to sense the change in pheromone levels and adjust their Ths and Thr based on the following rules:

• Social cues. If a robot returning home deposits a successful-retrieval pheromone, thus increasing the level of that pheromone, then the robots resting at home will reduce Thr and increase Ths because "somebody else has found food, there may be more so I'd better get back to work sooner". Alternatively, if a resting robot perceives an increase in the failure-retrieval pheromone, it will increase Thr and reduce Ths because "somebody else failed to find food, there may be a food shortage so I'd better stay in the nest for longer".

Table 1 summarizes the cues and conditions for increasing or decreasing the time thresholds.

Table 1  Mapping from cues to time threshold increase/decrease.

    Cue                                              Ths        Thr
    Environmental: collision while searching         decrease   increase
    Internal: successful retrieval                   -          decrease
    Internal: failed retrieval                       -          increase
    Social: successful retrieval by team-mates       increase   decrease
    Social: failed retrieval by team-mates           decrease   increase

With the three adaptation rules described above, each robot in the swarm will, from one time step to the next, adapt its own time thresholds Ths^i and Thr^i as shown in the following equations, where i indicates the ID of each robot:

    Ths^i(t+1) = Ths^i(t) − α1 C^i(t) + β1 Ps^i(t) − γ1 Pf^i(t)                (4)

    Thr^i(t+1) = Thr^i(t) + α2 C^i(t) − β2 Ps^i(t) + γ2 Pf^i(t) − η R^i(t)     (5)

The second entry on the right-hand side of each equation represents the contribution of the environmental cues, where C^i(t) counts the collisions while searching, and α1 and α2 are adjustment factors that moderate this contribution. The third and fourth entries take the social cues into account, in which Ps^i(t) and Pf^i(t) represent the retrieval information (success or failure) from team-mates, received through the pheromone-like mechanism; the contribution from social cues is moderated by the adjustment factors β1, β2, γ1 and γ2. The final entry in Equation 5 denotes the contribution from internal cues, where R^i(t) stores up-to-date information on the robot's own food retrieval success or failure, and η is the adjustment factor for internal cues.

Note that the internal cues do not contribute to the change of Ths^i in Equation 4, because it is difficult to determine whether the robot should increase or decrease Ths^i solely on the basis of its own recent retrieval success or failure. Suppose the robot has retrieved a food-item to the nest: it is not necessary to increase its own Ths^i, since it has been successful with the current searching time threshold; conversely, if it reduces its Ths^i it may have less chance of finding food within the shortened searching time, even though the successful retrieval suggests there may be more food available to collect. Now suppose the robot has failed to find food: increasing or decreasing Ths^i could lead the robot either to waste more energy searching in a poor food source environment, or to have less chance of retrieving a food-item in a rich food source environment because there is not enough time to find and collect one. To avoid this potentially ambiguous impact on performance, the robot therefore takes no reward or punishment from internal cues to adjust its searching time threshold Ths^i.

C^i(t), R^i(t), Ps^i(t) and Pf^i(t) in Equations 4 and 5 are defined as follows:

    C^i(t) = 1   state randomwalk → state avoidance
             0   otherwise                                                     (6)

    R^i(t) = 1   state deposit → state resting
            −1   state homing → state resting
             0   otherwise                                                     (7)

    Ps^i(t) = SPs(t)                       state deposit → state resting
              Σj {R^j(t) | R^j(t) > 0}     in state resting
              0                            not in state resting                (8)

    Pf^i(t) = SPf(t)                       state homing → state resting
              Σj {|R^j(t)| | R^j(t) < 0}   in state resting                    (9)
              0                            not in state resting

where the sums run over the N robots, j = 1, …, N, and SPs and SPf in Equations 8 and 9 are used to simulate the two kinds of virtual pheromone, carrying retrieval success and failure information respectively. This information can only be accessed by robots resting in the nest. As shown in Equation 10, attenuation factors δs and δf are introduced here to simulate gradually decaying rather than instantly disappearing social cues, somewhat akin to ants leaving a decaying pheromone trail while foraging. Such a mechanism could be readily implemented on real robots:

    SPs(t+1) = SPs(t) − δs + Σj {R^j(t) | R^j(t) > 0}
                                                                               (10)
    SPf(t+1) = SPf(t) − δf + Σj {|R^j(t)| | R^j(t) < 0}

As the virtual pheromones are only accessible to the robots in the nest, two categories of robots are affected by the social cues: those already resting in the nest, and those about to move to state resting from states homing or deposit. The former can "monitor" the change of the two pheromones while resting and then adjust their time threshold parameters, while the latter benefit from the gradually decaying pheromone deposited by team-mates. These two situations for updating Ps^i(t) and Pf^i(t) are shown in Equations 8 and 9.

Clearly the values of Ths^i and Thr^i for each robot will change over time. However, if Ths^i becomes too small the robot may not be able to complete a food retrieval (because of the time taken to move towards the food-item and grab it after the food-item has been seen by the robot's camera), resulting in unexpected behavior. On the other hand, if Ths^i becomes too large the robot will consume much more energy than it can collect when the food source is relatively poor. Therefore, Ths^i and Thr^i are clamped to keep them within limits:

    Ths^i ∈ (Ths-min, Ths-max),  Thr^i ∈ (0, Thr-max)

The selection of values for the adjustment and attenuation factors depends on the environment and on the parameters of the robot behaviors. With careful selection of those factors, each robot can adapt its time threshold parameters through the interactions between robots and environment, resulting in task switching between foraging and resting. At the swarm level, the ratio of foragers to resters will dynamically adjust to the given food source environment; thus, we contend, the efficiency of the swarm can be optimized on the basis of the analysis in Section 2.1.

3  Experimental Setup

3.1 Simulation Environment

To validate the hypothesis presented in the previous section, we tested our swarm foraging adaptation scheme using the sensor-based simulation tools Player/Stage (Gerkey, Vaughan, & Howard, 2003). Player is a server that connects robots, sensors, and control programs over a network. Stage simulates a population of mobile robots, sensors and objects in a two-dimensional bitmapped environment. The robots used in the simulation are realistic models of the wireless networked Linuxbots in the Bristol Robotics Laboratory (Winfield & Holland, 2000). An advantage of this approach is that the Player control programs can be transferred directly from simulation to the real robots.

Figure 2 is a snapshot of the simulation. Between two and ten robots work in an 8 m × 8 m octagonally shaped arena.
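The whiteboard pheromone dynamics of Equation 10 are simple enough to implement directly in such a simulation: each virtual pheromone decays by a fixed amount per step and is topped up by the retrieval outcomes posted by robots entering the nest. A minimal sketch (function name and the clamp at zero are our own assumptions, not stated in the paper):

```python
def update_pheromones(SPs, SPf, outcomes, delta_s=0.1, delta_f=0.1):
    # Equation 10 sketch: SPs/SPf decay by delta_s/delta_f each
    # step and are replenished by the outcomes R_j posted by
    # returning robots (+1 = successful retrieval, -1 = failure).
    # Clamping at zero (our assumption) prevents decay from
    # driving a pheromone level negative.
    SPs = max(0.0, SPs - delta_s) + sum(r for r in outcomes if r > 0)
    SPf = max(0.0, SPf - delta_f) + sum(-r for r in outcomes if r < 0)
    return SPs, SPf
```

For example, one success and one failure posted in the same step raise both pheromone levels by 1 after decay.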








Figure 2  Screen shot of the simulation; robots move at speed 0.15 m/s. (A) Eight robots engaged in foraging. (B) Close-up view of a robot with a food-item in its gripper; the IR sensors range 0.2 m, the field of view of the camera is 60° and the view distance is 2 m.

The nest area is indicated with a green (gray) color, with one homing light source located at point A to indicate the direction of the nest. Each robot is of size 0.26 m × 0.26 m, the same as the real robots (Linuxbots) in our laboratory, and is equipped with three light intensity sensors, three infra-red proximity sensors, one camera, one color sensor and one gripper. Thus the robot can sense food at a distance using the camera and then grab the food using its gripper; it can also find its way back to the nest using the three front-mounted light intensity sensors, and can tell whether it is at home using the bottom-mounted color sensor. The control programs for each robot are identical, as shown in Figure 1.

3.2 Selection of Parameters

At the start of the simulation, all robots are in state resting with the same time thresholds Ths (= Ths-max) and Thr (= 0). In order to maintain the food density ρ at a reasonably constant level over time, a new food-item is placed randomly in the searching arena with probability Pnew, the growth rate, each second. By changing the growth rate we obtain the desired food source density. Collected food deposited in the home area is removed to prevent robots from retrieving food that has already been collected.

In each time step of the simulation, a robot consumes an amount of energy that varies with its state, since the robot uses different sensors and actuators in different states. For example, a robot consumes more energy when carrying food back to the nest than when wandering in the search area, because the gripper is used in the former state. We estimate these values by allocating each sensor and actuator a relatively arbitrary value representing the energy consumed by the component, and then summing these values in accordance with the specific sensors and actuators used in each state. Table 2 shows the resulting energy consumed per second in each state. Note that the energy consumed in state avoidance also varies depending on whether the robot is carrying a food-item. Moreover, the robot consumes a small amount of energy even when resting at home, currently 1 unit/s. A successful food-item retrieval delivers 2000 units of energy to the swarm.

Table 2  Energy consumed in each state by the robot.

    State          Energy consumed (units/s)
    leavinghome     6
    randomwalk      8
    movetofood      8
    grabfood       12
    scanarena       8
    movetohome     12
    deposit        12
    resting         1
    homing          6
    avoidance       6 or 9

Before implementing the algorithm we first need to choose the parameters in Equations 4 and 5. Clearly, the average time for a robot to move to a food-item and grab it depends on the forward speed of the robot v and the detection range of the camera Dr; a rough estimate is Dr/v ≈ 14 s. Based on this estimate we set Ths-min to 20 s. Ths-max is set to prevent the robot from wandering outside the nest for too long, which could result in a larger negative contribution to the energy of the swarm even if the robot does eventually collect a food-item. For robots returning a food-item, the average time spent moving back to the nest can be estimated by considering the approximate radius of the arena R and the robot forward speed v, that is, R/2v ≈ 27 s. Taking into account the relative energy consumption in each state (see Table 2), the robot will consume 27 × 12 = 324 units of energy during this period; thus, in order to make a positive contribution to the energy of the swarm, the maximum time it can spend searching is approximately (2000 − 324)/8 ≈ 210 s. Hence we choose Ths-max to be 200 s. A similar set of considerations led us to set Thr-max to 2000 s.

However, there are no comparably obvious guidelines for the selection of the adjustment factors, so we chose these values by trial and error. Generally, a large change in the two time threshold parameters will potentially cause oscillation or stabilization problems for the swarm, while a small change could lead to a slow adaptation process. We first chose η as 20; then γ2 was set to 40 and γ1 to 20 to balance the relative contributions of the social and internal cues. β1 and β2 were then set at 10 to take successful retrievals by team-mates into account. Consequently, we set α1 and α2 to 5, based on the same considerations. Table 3 summarizes all of these parameters. Note that δs and δf are arbitrarily chosen here.

Table 3  Parameter selection for the adaptation algorithm.

    Ths-min  Ths-max  Thr-max  α1  α2  β1  β2  γ1  γ2  η   δs   δf
      20       200     2000     5   5  10  10  20  40  20  0.1  0.1

4  Experimental Results

The performance metric of the swarm is the energy efficiency given by Equation 11:

    efficiency = (net energy income to swarm) / (energy available from environment)   (11)

In order to investigate whether and how our foraging adaptation mechanism can improve the energy efficiency of the swarm, three types of experiments were designed by changing the size of the swarm and the food source density. Additionally, to determine how each cue affects the energy efficiency, four strategies, each with a different combination of cues, were designed for each type of experiment. Table 4 shows the cue configuration of each strategy.

Table 4  The four strategies: cue combinations.

    Strategy   With internal cues   With social cues   With environmental cues
    S1              –                    –                   –
    S2              ✓                    –                   –
    S3              ✓                    ✓                   –
    S4              ✓                    ✓                   ✓

Strategy S1, in which no cues are taken into account, provided a homogeneous swarm (fixed time threshold parameters) as a benchmark. We then added cues one by one from strategy S2 to S4, the rationale being that any strategy must be able to adjust the two time threshold parameters in a bi-directional manner, namely both increasing and decreasing.
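With the Table 3 values in hand, the per-robot update of Equations 4 and 5, including the clamping described earlier, can be sketched as follows (the function and dictionary names are ours; inputs are the cue quantities C, Ps, Pf and R defined in Equations 6 to 9):

```python
# Table 3 parameter values for the adaptation algorithm.
PARAMS = dict(a1=5, a2=5, b1=10, b2=10, g1=20, g2=40, eta=20,
              Ths_min=20, Ths_max=200, Thr_max=2000)

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def update_thresholds(Ths, Thr, C, Ps, Pf, R, p=PARAMS):
    # Equation 4: collisions and social failure shorten searching,
    # social success lengthens it.
    Ths = Ths - p['a1'] * C + p['b1'] * Ps - p['g1'] * Pf
    # Equation 5: collisions and failures lengthen resting,
    # successes (social or own, R = +1) shorten it.
    Thr = Thr + p['a2'] * C - p['b2'] * Ps + p['g2'] * Pf - p['eta'] * R
    # Clamp to keep the thresholds within the stated limits.
    return (clamp(Ths, p['Ths_min'], p['Ths_max']),
            clamp(Thr, 0, p['Thr_max']))
```

For example, a single collision with no other cues moves a robot from (Ths, Thr) = (200, 0) to (195, 5): it will search a little less and rest a little more on the next cycle.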






Not all combinations of cues meet this requirement; for instance, a combination of internal and environmental cues does not. For each strategy, the adjustment and attenuation factors in Equations 4, 5 and 10 were either set to the values shown in Table 3 or set to zero, depending on the combination of cues. For example, for the swarm using strategy S1 all of these values were zero; for the swarm using S2 only η was set to its value in Table 3 and the others remained zero.

4.1 Fixed Food Density, Variable Swarm Size

To explore the optimal number of foragers, we ran experiments for swarms with 2, 4, 6, 8 and 10 robots respectively. The food source density was the same for these experiments, with Pnew set to 0.03. For each swarm we applied the four strategies, and each strategy was run 10 times in simulation; each simulation lasted 10,000 s. We monitored the net energy and the number of foragers during the whole simulation, and then averaged the data from the 10 runs of each experiment. The results are given in Table 5. With equal Pnew, the total amount of food "growing" in the search arena over time was almost the same (close to 300 items) in all experiments. We compare the food collection rate (collected/produced) in Figure 3.

Figure 3  Food collected rate (%) for swarms of 2, 4, 6, 8 and 10 robots under the four strategies.

Figure 4  The relationship between swarm size and efficiency for different strategies.

Figure 3 shows that the collection rate fell slightly from S1 to S4 for a swarm with a given number of robots. For the swarms with 4, 6 and 8 robots, more than 98% of the food was collected no matter which strategy was used. However, the swarm with 2 robots collected only 92% of the food; the same situation occurred for swarm size 10 with strategies S3 and S4, where 6–7% of the food was left in the searching arena. Checking the number of foragers in these experiments, we found that they all averaged fewer than 3 foraging robots. All robots were foraging all of the time in the swarm with 2 robots, since there was enough food available for retrieval; for the swarm of size 10 with strategies S3 and S4, the average numbers of foragers were 2.95 and 2.8, respectively. In other words, more than 2 robots needed to be engaged in foraging in order to collect all the food; meanwhile, the more robots resting in the nest, the more energy the swarm could save. Therefore, for the given food source density Pnew = 0.03, the optimal average number of foragers over time, in order to maximize the energy income of the swarm, must lie between 2 and 4.

We then compared the energy efficiency of the swarm under the different strategies. As shown in Figure 4, the efficiency of the four strategies in the swarm with 2 robots was nearly the same, since all robots were engaged in foraging. However, for swarms with more than 2 robots, the swarm with strategy S4 could always
                             92
                                                                                                          obtain the highest energy efficiency, while the swarm
                                  S1         S2           S3                    S4                        without any adaptation rules (S1) had the lowest
                                               Strategies
                                                                                                          energy efficiency.
Figure 3 Comparison of food collected rate (food pro-                                                          The energy efficiency decreased with increasing
duced/collected) for different strategies in the swarm with                                               swarm size for all strategies, but the efficiency gap
2, 4, 6, 8, and 10 robots.                                                                                between strategies S3 and S2 became bigger with




                                                       Downloaded from http://adb.sagepub.com at PENNSYLVANIA STATE UNIV on February 7, 2008
                                       © 2007 International Society of Adaptive Behavior. All rights reserved. Not for commercial use or unauthorized distribution.
298       Adaptive Behavior 15(3)

Table 5 Average results of 10 runs for swarms of different sizes with different strategies. Each simulation lasts for
10,000 seconds. The food density remains the same during each simulation (Pnew = 0.03). Efficiency is calculated
according to Equation 11.

Size   Strategy   Food produced   Food collected   Average foragers   Net energy   Efficiency
  2    S1             295.8           278.4              2               443634       75.0%
       S2             306.5           283.6              1.95            452045       73.7%
       S3             298.8           276.1              1.97            438392       73.4%
       S4             300.7           277.7              1.97            443186       73.7%
  4    S1             298.4           294.7              4               267806       44.9%
       S2             301.3           297.9              3.57            300569       49.9%
       S3             295.0           290.6              3.52            290855       49.3%
       S4             295.0           290.1              3.25            306355       51.9%
  6    S1             299.9           298.0              6               124218       20.8%
       S2             307.0           304.2              4.78            215932       35.2%
       S3             299.4           293.4              3.11            300533       50.2%
       S4             295.7           286.3              2.87            305278       51.6%
  8    S1             297.8           295.8              8               –23400       –3.9%
       S2             302.1           299.1              5.65            141720       23.5%
       S3             296.7           292.5              4.32            202260       34.1%
       S4             296.3           290.4              3.95            227020       38.3%
 10    S1             300.7           298.7             10              –167322      –27.8%
       S2             298.5           293.7              6.12             74761       12.5%
       S3             291.1           273.7              2.95            236173       40.6%
       S4             291.1           270.5              2.80            240932       41.3%

increasing swarm size. Since strategies S3 and S4 both used social cues but S2
did not, the more robots in the swarm, the more often the robots interact with
each other, and hence the more information about the environment (food source
density) the robots can obtain. This helps a robot switch tasks between foraging
and resting more effectively, and thus the efficiency of the swarm can be
improved. We argue that the efficiency gap between S2 and S3 will increase with
the increasing swarm size. Considering the average number of foragers in all
experiments (swarm size greater than 2), we can see the swarm with strategy S4
had the lowest average number of foragers, resulting in more energy saved since
most of the food produced was collected. The values varied from 2.80 to 3.95,
close to the optimal number of foragers for the given food source density as
previously indicated. Thus we can conclude the proposed adaptation mechanisms
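The trade-off behind the "optimal number of foragers between 2 and 4" reading of Table 5 can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not the paper's simulation: the energy reward per item, the per-robot foraging and resting costs, and the per-forager retrieval rate below are all illustrative guesses.

```python
# Toy model of the forager-count trade-off for a swarm of 10 robots.
# All constants are illustrative assumptions, not the paper's parameters.
T = 10_000          # simulation length in seconds, as in the experiments
P_NEW = 0.03        # probability of a new food item appearing per second
E_FOOD = 2_000      # energy gained per retrieved item (assumed)
COST_FORAGE = 1.0   # energy spent per forager per second (assumed)
COST_REST = 0.1     # energy spent per resting robot per second (assumed)
RATE = 0.012        # items one forager can retrieve per second (assumed)

def net_energy(foragers: int, swarm_size: int) -> float:
    """Expected net energy of the swarm for a fixed number of foragers."""
    produced = P_NEW * T                            # ~300 items, as observed
    collected = min(produced, foragers * RATE * T)  # retrieval saturates
    resters = swarm_size - foragers
    cost = (foragers * COST_FORAGE + resters * COST_REST) * T
    return collected * E_FOOD - cost

# Forager count that maximises net energy for a swarm of 10:
best = max(range(11), key=lambda f: net_energy(f, 10))
```

With these assumed numbers the optimum lands at 3 foragers, inside the 2–4 band inferred from Table 5; changing the assumed costs or retrieval rate shifts the optimum, but the shape of the trade-off (collection saturates while foraging cost keeps growing) is the same.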




                                                  Liu et al. Emergent Task Allocation in a Swarm of Foraging Robots                                                                                     299

[Figure 5: three rows of paired plots for swarm sizes 2, 4 and 6, showing
"Robots" (left) and "Energy of swarm (10^5 units)" (right) against "Time (s)",
with curves for strategies S1–S4.]

Figure 5 (continues next page) The instantaneous foragers (left) and net energy
(right) for swarm sizes 2, 4, 6, 8 and 10 (from top to bottom; for swarm sizes
8 and 10 see next page) with different strategies, Pnew = 0.03; each plot is the
average of 10 runs. Left column: for each strategy both the varying number of
foragers and the average number of foragers over the whole time period are
plotted.


can not only improve the performance of the system significantly but also guide
the swarm towards energy optimization for a given food source density
environment.
   A more interesting result is seen in Figure 5, in which the instantaneous
number of foragers over time is plotted. The number of foragers in all
experiments with strategies S3 and S4 kept oscillating with time, while staying
near an average value. This means a dynamic equilibrium between the number of
foragers and the number of resters was reached when we introduced the
adaptation mechanisms into the swarm.
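The settling of the forager count around a stable average can be illustrated with a mean-field caricature of task switching. In the sketch below, p_out and p_in are hypothetical per-second switching probabilities standing in for the paper's adaptation rules; the point is only that any such rest-to-forage/forage-to-rest turnover converges to a dynamic equilibrium rather than a fixed assignment of robots to tasks.

```python
# Mean-field caricature of forager/rester turnover in a swarm of 10 robots.
# p_out and p_in are hypothetical switching rates, NOT the paper's rules:
# resters take up foraging at rate p_out, foragers give up at rate p_in.
N = 10
p_out, p_in = 0.01, 0.02   # assumed per-second switching probabilities

f = float(N)               # start with every robot foraging (as in S1)
for _ in range(10_000):    # 10,000 s, as in the simulations
    f += p_out * (N - f) - p_in * f

# Analytic fixed point of the same update rule:
equilibrium = N * p_out / (p_out + p_in)   # 10/3, about 3.3 foragers
```

In the real system the instantaneous count keeps fluctuating around such a value instead of converging exactly, because switching is stochastic and driven by the cues each robot happens to receive.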





[Figure 5 (continued): paired plots for swarm sizes 8 and 10, showing "Robots"
(left) and "Energy of swarm (10^5 units)" (right) against "Time (s)", with
curves for strategies S1–S4.]

Figure 5 (continued).


Thus, the overall swarm task allocation (division of labor) emerges from the
low-level interactions between robots and the environment. Figure 5 also plots
the instantaneous energies of the swarm with different strategies, and we were
surprised to see that the rates of increase of swarm energy for S2, S3, and S4
in all experiments were almost linear, and that the swarm using all cues (S4)
had the fastest rate of energy increase of all.

4.2 Variable Food Density, Fixed Swarm Size

We designed a second set of experiments to investigate the effect of the
different cues on the efficiency of the swarm under different environmental
conditions; here we fixed the swarm size to 8 robots but ran the experiment
with three different food source densities, from poor (Pnew = 0.015) to
relatively rich (Pnew = 0.045). Different strategies were used for the swarm;
again, each experiment was run 10 times and each simulation lasted for
10,000 s. Table 6 provides the results by averaging the 10 runs' data for each
experiment. Most food produced in the environment was collected, since there
were enough robots engaged in the task. The efficiency of the swarm is plotted
in Figure 6A. As we expect, the swarm with strategy S4 has the highest
efficiency while the swarm without cues (S1) has the lowest efficiency in all
experiments. Although the average number of foragers in the swarm with S3 was
smaller when the food source became poorer, the efficiency gap between S3 and
S4 became larger. This is because the robots not carrying food collided with
each other more often in a low-density food source environment. So the
environmental cues had a bigger impact on the performance of the swarm when the
food source was poor than when it was rich. Comparing S3 and S2, Figure 6A
shows that the difference in efficiency between these two strategies became
smaller as the food source density increased. The reason is that with the food
source density increasing, a robot is more likely to find food and therefore
reward itself (by resting less). Thus we can deduce that the social cues have
less impact on the








Figure 6 (A) Efficiency of the swarm for different strategies and three different food source densities. (B) Average
retrieval time for the swarm with different strategies and three different food source densities.


Table 6 Average results of 10 runs for the swarm of size 8 with different strategies. Each simulation lasts for 10,000 s.
The value of Pnew is set to 0.015, 0.03 and 0.045; within each simulation this value remains constant. Efficiency is
calculated according to Equation 11.

Pnew     Strategy   Food produced   Food collected   Average foragers   Net energy   Efficiency
0.015    S1             147.3           146.5              8             –304996      –103.5%
         S2             158.0           155.6              4.05           –34466       –10.9%
         S3             147.1           143.1              2.38            45617        15.5%
         S4             149.7           142.0              2.08            64736        21.6%
0.030    S1             297.8           295.8              8              –23400        –3.9%
         S2             302.1           299.1              5.65           141720        23.5%
         S3             296.7           292.5              4.32           202260        34.1%
         S4             296.3           290.4              3.95           227020        38.3%
0.045    S1             451.6           448.3              8              258939        28.7%
         S2             447.2           442.3              6.57           343000        38.3%
         S3             441.6           436.9              6.09           361789        41.0%
         S4             439.8           431.5              5.65           381125        43.3%
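The two trends read off Figure 6A — the S4–S3 gap widening as food gets scarcer, and the S3–S2 gap shrinking as food gets richer — can be checked directly against the efficiency column of Table 6. A quick sanity check (values copied from the table):

```python
# Efficiency (%) per strategy from Table 6, keyed by food density Pnew.
efficiency = {
    0.015: {"S1": -103.5, "S2": -10.9, "S3": 15.5, "S4": 21.6},
    0.030: {"S1":   -3.9, "S2":  23.5, "S3": 34.1, "S4": 38.3},
    0.045: {"S1":   28.7, "S2":  38.3, "S3": 41.0, "S4": 43.3},
}
densities = (0.015, 0.030, 0.045)

# Gap between S4 and S3 grows as the food source gets poorer...
s4_s3 = [efficiency[p]["S4"] - efficiency[p]["S3"] for p in densities]

# ...while the gap between S3 and S2 shrinks as it gets richer.
s3_s2 = [efficiency[p]["S3"] - efficiency[p]["S2"] for p in densities]
```

Both gap lists decrease monotonically with increasing Pnew (roughly 6.1, 4.2, 2.3 points for S4–S3 and 26.4, 10.6, 2.7 points for S3–S2), matching the claims in the text.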


performance of the swarm when the food source is rich but more impact when the
food source is poor; that is, robots need to cooperate more when food is
scarce. Figure 6B plots the average retrieval time for each strategy. It shows
that despite the food source density changing, the average retrieval time for
the swarm with strategy S4 was quite stable in comparison with strategies S1
and S2, which implies the swarm with the adaptation mechanism is quite robust
to environmental change. Instantaneous energy and foragers in the swarm are
plotted in Figure 7 and, not surprisingly, a new dynamic equilibrium for the
number of foragers





[Figure 7: paired plots showing "Robots" (left) and "Energy of swarm
(10^5 units)" (right) against "Time (s)", with curves for strategies S1–S4.]

          0                                                                                                                       1
              0     2000     4000           6000             8000          1× 10 4
                                                                                                                                      0   2000    4000         6000    8000     1× 10 4
                             Time (s)                                                                                                              Time (s)

         10                                                                                                                     4.5

          9
                                                                                                                                            S1
                                                                                                                                 4
                                                                                                                                            S2
                                                                                                 Energy of swarm (105 units )




          8                                                                                                                     3.5         S3
          7
                                                                                                                                            S4
                                                                                                                                 3

          6                                                                                                                     2.5
Robots




          5                                                                                                                      2

          4                                                                                                                     1.5

          3                                                                                                                      1
                                           S1
          2                                S2                                                                                   0.5
                                           S3
          1                                                                                                                      0
                                           S4
          0                                                                                                                     0.5
              0     2000     4000            6000            8000           1× 10 4                                                   0   2000    4000        6000    8000    1× 10 4
                             Time (s)                                                                                                             Time (s)

Figure 7 Instantaneous number of foragers (left) and net energy (right) for swarm size 8 and different food source
densities. From top to bottom, Pnew is 0.015, 0.03 and 0.045, respectively.

in the swarm is observed each time the food source density is changed, and the gradient of energy increase is different for the same strategies in different food source environments.

4.3 Dynamic Food Density

To test whether our adaptation mechanism has the ability to adapt to a dynamic environment, we now disturbed the environment by introducing a step change of probability Pnew from 0.03 to 0.015 at t = 5,000 s and then from 0.015 to 0.045 at t = 10,000 s. A swarm with 8 robots, using different strategies, was engaged in the foraging task. Each experiment was repeated 10 times and each simulation lasted for 15,000 s, with other parameters remaining as above. We plotted the instantaneous number of foragers and net energy of the swarm with time in Figure 8. As expected, a new dynamic




                                               Downloaded from http://adb.sagepub.com at PENNSYLVANIA STATE UNIV on February 7, 2008
                               © 2007 International Society of Adaptive Behavior. All rights reserved. Not for commercial use or unauthorized distribution.
                                         Liu et al. Emergent Task Allocation in a Swarm of Foraging Robots                                          303




Figure 8 The instantaneous foragers (A) and net energy (B) for swarm size 8 with different strategies when the food
source density changes during the simulation. Pnew changes from 0.03 to 0.015 when t = 5,000 and from 0.015 to 0.045
when t = 10,000. A: for each strategy both the varying number of foragers and the average number of foragers within
each time segment are plotted.
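Concretely, the disturbance described above amounts to a step schedule for Pnew. The sketch below writes that schedule out as code; it is illustrative only — the function name and the per-call interpretation are our own, as the article gives no code:

```python
def p_new(t_seconds: float) -> float:
    """Step schedule for the food-appearance probability Pnew used in
    the dynamic-density experiment (Figure 8)."""
    if t_seconds < 5000:
        return 0.03    # initial food density
    if t_seconds < 10000:
        return 0.015   # food becomes scarcer at t = 5,000 s
    return 0.045       # food becomes richer at t = 10,000 s
```

Sampling this schedule at the simulation time step reproduces the two step changes that the swarm must adapt to.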



equilibrium for the number of foragers in the swarm was observed, after some delay, each time the food source density was changed. Figure 8 also shows that the swarms using social cues, S3 and S4, adapted more rapidly to the change of environment than the swarm without social cues, S2. The reason for this is that social cues provide more information about the environment (food density) for the individuals. The gradient of net energy increase for the different strategies shows that the swarm using all cues (S4) could still achieve a more rapid energy increase than the other strategies in the dynamic food density environment. Therefore, the swarm with the adaptation mechanism was more robust to dynamic environmental change.

5   Conclusions and Future Studies

In this article, we have proposed a simple adaptation mechanism for a swarm foraging task which is able to dynamically change the number of foragers and thus make the swarm more energy efficient. The individuals in the swarm use only internal cues (successful or unsuccessful food retrieval), environmental cues (collision with other robots while searching) and social cues (team-mate food retrieval success or failure) to determine whether they will rest in the nest for longer, to either save energy or minimize interference, or be actively engaged in foraging, which costs more energy for the individual but potentially gains more energy for the swarm. With the adaptation mechanism, the swarm demonstrates:

•   significantly improved performance compared with the swarm with no adaptation mechanism;
•   emergent dynamic task allocation (division of labor) between foraging and resting; and
•   robustness to environmental change (in food density).

Furthermore, the swarm with the adaptation mechanism seems to be able to guide the system towards energy optimization despite the limited sensing abilities of the individual robots and the simple social interaction rules. The swarm also exhibits the capacity to perceive the environment collectively if we take into account the average number of active foragers in the swarm over time. That is, more active robots indicate a richer food environment and more inactive robots indicate a poorer food environment. This correlation can only be observed at the overall swarm level and cannot be deduced from individual robots.

Other interesting conclusions are, firstly, that the swarms utilizing social cues achieve the highest net energy income to the swarm and the fastest adaptation in the ratio of foragers to resters when the food density changes; and, secondly, that the same social cues have the greatest impact when the food density is low.

This study is ongoing, and thus far we have tested our approach in swarms with small numbers of robots




304       Adaptive Behavior 15(3)

(no more than 10) but we have not tested the scalability of the approach to swarms with hundreds or thousands of robots; however, given the minimal local communication between robots, we have good reason to suppose the approach is scalable. We also have confidence that the approach will exhibit a high level of robustness to failure of individual robots, in keeping with the levels of robustness commonly seen in swarm robotic systems.

Currently, all values of the adjustment factors in Equations 4 and 5 are chosen on a trial-and-error basis, and all experiments described in this article used the same time adjustment values (see Table 3). Thus we cannot be sure how closely the swarm with the adaptation mechanism approaches optimal energy efficiency, since such optimal values, to the best of our knowledge, cannot be determined ideally through any modeling approach. Potentially, in order to achieve maximum energy efficiency, the swarm should also be able to automatically adapt or evolve these values. Therefore future studies will include: (i) introducing a learning mechanism so that the swarm can find its own time adjustment values, and (ii) analysis to determine how these values affect the performance of the swarm and to what extent the system will be able to remain robust over a range of time adjustment values. Future studies will also (iii) seek to develop a probabilistic or statistical model for our foraging swarm, and compare the optimal performance from the mathematical model with robot experiments.

Acknowledgments

The authors would like to thank Jan Dyre Bjerknes and Christopher Harper for their helpful discussions during preparation of the paper. The authors also wish to thank the anonymous referees for their insightful comments, which have guided improvements to the clarity and content of the paper. This work is partially supported by the EU Asia-link Selection Training and Assessment of Future Faculty (STAFF) project.

References

Beni, G. (2005). From swarm intelligence to swarm robotics. In E. Şahin & W. Spears (Eds.), Swarm robotics workshop: state-of-the-art survey (pp. 1–9). New York: Springer.
Bonabeau, E., Dorigo, M., & Théraulaz, G. (1999). Swarm intelligence: from natural to artificial systems. New York: Oxford University Press.
Camazine, S., Franks, N. R., Sneyd, J., Bonabeau, E., Deneubourg, J.-L., & Théraulaz, G. (2001). Self-organization in biological systems. Princeton, NJ: Princeton University Press.
Gerkey, B. P., Vaughan, R. T., & Howard, A. (2003). The player/stage project: tools for multi-robot and distributed sensor systems. In Proceedings of the International Conference on Advanced Robotics (pp. 317–323). Coimbra, Portugal.
Guerrero, J., & Oliver, G. (2003). Multi-robot task allocation strategies using auction-like mechanisms. Artificial Research and Development in Frontiers in Artificial Intelligence and Applications, 100, 111–122.
Holland, O., & Melhuish, C. (1999). Stigmergy, self-organization, and sorting in collective robotics. Artificial Life, 5, 173–202.
Jones, C., & Matarić, M. J. (2003). Adaptive division of labor in large-scale multi-robot systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1969–1974). Las Vegas, NV.
Krieger, M. J. B., & Billeter, J.-B. (2000). The call of duty: self-organised task allocation in a population of up to twelve mobile robots. Robotics and Autonomous Systems, 30, 65–84.
Labella, T. H., Dorigo, M., & Deneubourg, J.-L. (2006). Division of labor in a group of robots inspired by ants' foraging behavior. ACM Transactions on Autonomous and Adaptive Systems, 1, 4–25.
Lerman, K. (2002). Mathematical model of foraging in a group of robots: Effect of interference. Autonomous Robots, 13, 127–141.
Martinoli, A. (1999). Swarm intelligence in autonomous collective robotics: from tools to the analysis and synthesis of distributed collective strategies. PhD thesis, Lausanne, Switzerland: École Polytechnique Fédérale de Lausanne.
Martinoli, A., & Easton, K. (2004). Modeling swarm robotic systems: A case study in collaborative distributed manipulation. International Journal of Robotics Research, 23, 415–436.
Nembrini, J., Winfield, A., & Melhuish, C. (2002). Minimalist coherent swarming of wireless networked autonomous mobile robots. In B. Hallam, D. Floreano, J. Hallam, G. Hayes, & J.-A. Meyer (Eds.), From Animals to Animats 7: Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior (pp. 373–382). Cambridge, MA: MIT Press.
Østergaard, E. H., Sukhatme, G. S., & Matarić, M. J. (2001). Emergent bucket brigading – a simple mechanism for improving performance in multi-robot constrained-space foraging tasks. In Proceedings of the 5th International Conference on Autonomous Agents (pp. 29–30). New York.
Rosenfeld, A., Kaminka, G. A., & Kraus, S. (2005). A study of scalability properties in robotic teams. In P. Scerri, R. Vincent, & R. Mailler (Eds.), Coordination of large-scale multiagent systems (pp. 27–51). New York: Springer-Verlag.
Şahin, E. (2005). Swarm robotics: From sources of inspiration to domains of application. In E. Şahin & W. Spears (Eds.), Swarm robotics workshop: state-of-the-art survey (pp. 10–20). New York: Springer.





Théraulaz, G., Bonabeau, E., & Deneubourg, J.-L. (1998). Response threshold reinforcement and division of labour in insect societies. Proceedings of the Royal Society of London B: Biological Sciences, 265, 327–332.
Winfield, A. F., & Holland, O. E. (2000). The application of wireless local area network technology to the control of mobile robots. Microprocessors and Microsystems, 23, 597–607.



About the Authors

                             Wenguo Liu received his BSc in industry automation (2001) and MSc in pattern recogni-
                             tion and intelligent systems (2004) from the Beijing Institute of Technology, China. He is
                             currently a PhD student working in the Bristol Robotics Laboratory, University of the West
                             of England, Bristol, UK. His main research interests are modeling and adaptation in
                             swarm robotic systems.




                             Alan F. T. Winfield is Hewlett-Packard Professor of Electronic Engineering in the Faculty
                             of Computing, Engineering and Mathematical Sciences at the University of the West of
                             England (UWE), Bristol, UK. He received his PhD in digital communications from the Uni-
                             versity of Hull in 1984, then co-founded and led APD Communications Ltd until taking up
                             the appointment at UWE, Bristol in 1991. He co-founded UWE’s Intelligent Autonomous
                             Systems Laboratory (now the Bristol Robotics Laboratory) in 1993 and the focus of his
                             current research is on the engineering and scientific applications of swarm intelligence.


                             Jin Sa received her BSc in computing and information system (1983) and her PhD in
                             computer science (1987) from the University of Manchester, UK. Currently, she is a prin-
                             cipal lecturer within the Faculty of Computing, Engineering and Mathematical Sciences at
                             the University of the West of England. Her research interest has mainly been in applying
                             formal approaches to various application domains such as operating systems and process
                             modeling. More recently she has been working on applying formal methods to developing
                             swarm robotic systems.


                             Jie Chen is a Full Professor and Assistant President, and Director of the Office of Research
                             and Development Administration, at the Beijing Institute of Technology. He received his PhD
                             degree at the Beijing Institute of Technology. His research interests are intelligent control and
                             intelligent systems, multi-objective optimization, and decision-making.




                             Lihua Dou received her PhD degree in control theory and control engineering in 2000
                             from the Beijing Institute of Technology. She is currently a professor at Beijing Institute of
                             Technology. Her research interests include pattern recognition, image processing and
                             robotics.




                     Université Libre de Bruxelles
                     Institut de Recherches Interdisciplinaires
                     et de Développements en Intelligence Artificielle




Interference reduction through task
  partitioning in a robotic swarm


     Giovanni Pini, Arne Brutschy,
   Mauro Birattari, and Marco Dorigo




    IRIDIA – Technical Report Series
          Technical Report No.
          TR/IRIDIA/2009-006
                April 2009
IRIDIA – Technical Report Series
                   ISSN 1781-3794

Published by:
   IRIDIA, Institut de Recherches Interdisciplinaires
   et de Développements en Intelligence Artificielle
   Université Libre de Bruxelles
   Av F. D. Roosevelt 50, CP 194/6
   1050 Bruxelles, Belgium

Technical report number TR/IRIDIA/2009-006




The information provided is the sole responsibility of the authors and does not necessarily
reflect the opinion of the members of IRIDIA. The authors take full responsibility for any
copyright breaches that may result from publication of this paper in the IRIDIA – Technical
Report Series. IRIDIA is not responsible for any use that might be made of data appearing
in this publication.
IRIDIA – Technical Report Series: TR/IRIDIA/2009-006                                                             1




        INTERFERENCE REDUCTION THROUGH TASK
           PARTITIONING IN A ROBOTIC SWARM
                    or: “Don’t you step on my blue suede shoes!”


               Giovanni Pini, Arne Brutschy, Mauro Birattari, and Marco Dorigo
                       IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
                              {gpini,arne.brutschy,mbiro,mdorigo}@ulb.ac.be




Keywords:     Swarm robotics, foraging, self-organized task allocation, task partitioning, swarm intelligence.

Abstract:     This article studies the use of task partitioning as a way to reduce interference in a spatially con-
              strained harvesting task. Interference is one of the key problems in large cooperating groups. We
              present a simple method to allocate individuals of a robotic swarm to a partitioned task, and show
              that task partitioning can increase system performance by reducing sources of interference. The
              method is experimentally studied, both in an environment with a narrow area and an environ-
              ment without this constraint. The results are analyzed and compared to the case in which task
              partitioning is not employed.



1   INTRODUCTION

In collective robotics, interference is a critical problem limiting the growth of a group: the time each robot spends in non-task-relevant behaviors such as obstacle avoidance increases when the density of individuals rises; see, e.g., Lerman and Galstyan (2002). The performance on tasks that suffer from physical interference can typically be improved by spatial partitioning; for example, by keeping each robot in its own "working area". A known approach that uses this rationale is the so-called bucket-brigade (Fontán and Matarić, 1996; Shell and Matarić, 2006). In this approach, robots hand over objects to robots working in the following area, until the objects reach their destination. As tasks usually cannot be partitioned arbitrarily, this approach effectively limits the number of robots that can be employed in the task. A possible solution to this problem, treating working areas as non-exclusive, raises other problems: How should individuals be allocated to tasks? How can such an allocation help in limiting the amount of interference?

In this paper, we study how task partitioning can help in reducing sources of interference. Additionally, we show a simple way to achieve self-organized allocation to such a task partition when using a robotic swarm.

We use the foraging problem, one of the canonical testbeds for collective robotics, as a base for our studies. In our experiments, a swarm of homogeneous robots has to harvest prey objects from a source area and transport them to a home area. In this study, we limit ourselves to a harvesting task that is pre-partitioned by the designer into two subtasks with a sequential interdependency. We study a simple threshold-based model of self-organized allocation and focus on two questions: Under which environmental conditions is it advantageous to partition the task? Can this partition reduce interference between group members? These questions are studied in two experiments using a simulated robot swarm.

The paper is organized as follows. We first review related works in Section 2. In Section 3 we explain the task partitioning and the allocation method employed in this study. Section 4 gives the methods used in the experiments by describing the environments, the simulated robots, and the controller. In Section 5 the results of the experiments are given and discussed. Section 6 draws some conclusions and discusses future work.




2   RELATED WORK

Interference has long been acknowledged as being one of the key issues in multi-robot cooperation (Goldberg and Matarić, 2003). Lerman and Galstyan (2002) devised a mathematical model that allows a quantification of the interference and its effect on group performance. Probably the most thorough study was published by Goldberg (2001), who identified several types of multi-robot interactions. Goldberg notes that one of the most common types of interference is physical interference in a central area, for example the nest. This kind of interference results from resource conflicts, in this case over physical space, and can be arbitrated either by making sure that robots stay in different areas all the time or by employing a scheduling mechanism to ensure that robots use the same space only at different times.

A simple method for reducing interference by using the first arbitration method mentioned is the so-called bucket-brigade: robots are forced to stay in exclusive working areas and to pass objects to the following robot as soon as they cross the boundaries of their area (Fontán and Matarić, 1996; Shell and Matarić, 2006). Recently, this has been extended to work with adaptive working areas by Lein and Vaughan (2008). To the best of our knowledge, current works concerned with bucket brigading have only studied the influence of interference due to obstacle avoidance. Other sources of interference (e.g., object manipulation) were never studied, although they might have a critical impact on the performance of any task partitioning approach. To quote Shell and Matarić (2006): "If the cost of picking up or dropping pucks is significant [. . . ], then bucket brigading may not be suitable."

Task allocation for multi-robot systems is a wide field, which can be divided into intentional and self-organized task allocation. Intentional task allocation relies on negotiation and explicit communication [. . . ] based on threshold-based approaches, taking inspiration from division of labor in social insects. Krieger and Billeter (2000) were among the first to propose threshold-based approaches in multi-robot task allocation. Labella et al. (2006) used threshold-based task allocation in a multi-foraging task. Similarly, Campo and Dorigo (2007) used a notion of the group's internal energy to allocate individuals to a multi-foraging task. Finally, Liu et al. (2007) studied a multi-foraging task while focusing on the influence of the use of different social cues on the overall group performance.

3   TASK PARTITIONING AND ALLOCATION

In this work, we study a collective foraging task. By spatially partitioning the environment, the global foraging task is automatically partitioned into two subtasks: 1) harvesting prey objects from a harvesting area (source) and 2) transporting them to a home area (nest). Robots working on the first subtask harvest prey objects from the source and pass them to the robots working on the second subtask, which store the objects in the nest. These subtasks have a sequential interdependency in the sense that they have to be performed one after the other in order to complete the global task once: delivering a prey object to the home area.

Robots can decide to switch from one subtask to the other, thus creating a task allocation problem: individual robots have to be allocated to subtasks, and different allocations yield different performance. As a prey object has to be passed directly from one robot to the other, a robot usually has to wait some time before passing a prey object to or receiving a prey object from a robot working on the other subtask. This waiting time can therefore give an indication of the allocation
munication to create global allocations, whereas         quality for the respective subtask: if the wait-
in self-organized task allocation global allocations     ing time is very long, there might not be enough
result from local, stochastic decisions. A formal        robots allocated to the other subtask. Thus, the
analysis and taxonomy that covers intentional            robots can use this waiting time to decide whether
task allocation has been proposed by Gerkey and          to switch subtask or not. Ideally, the waiting time
        c
Matari´ (2004). Kalra and Martinoli (2006) re-           should be the same for the two subtasks in order
cently compared the two best-known approaches            for the system to reach a stable state and deliver
of intentional and self-organized task allocation.       optimal performance.
    The field of self-organized task allocation is            Our robots exploit a simple threshold-based
in its early stages, as most studies tackle simple       model to decide when to switch task: when the
problems without task interdependencies. Stud-           waiting time tw is higher than a threshold Θ, a
ies in self-organized task allocation are mostly         robot switches its subtask. The robot’s waiting
IRIDIA – Technical Report Series: TR/IRIDIA/2009-006                                                           3




time is a function of the average time the robots working on the other subtask need to complete their task. The task-completion time of a robot depends on two factors: 1) the round-trip time (i.e., the distance to travel) and 2) the time lost due to interference. Thus, the robot's threshold Θ is a function of the round-trip time and of the interference of the robots in the other subtask. Therefore, the optimal task switching threshold depends on the task (i.e., the time to harvest a prey object) and on the environment (i.e., the distance between the source and the nest). As the parameters of the environment are not pre-programmed into the robots, determining the optimal threshold can be a complex problem. In this work, we limit ourselves to a simple method for setting this threshold: at the start of the experiment, each robot draws a random threshold that is used as its task switching threshold throughout the experiment.
    In the following, we study the properties of this simple self-organized task allocation strategy, compare this strategy to a strategy without task partitioning, and analyze how it can help to reduce interference. We refer to the two strategies as partitioned and non-partitioned, respectively.


4    METHODS

This section describes the environments in which the experiments are carried out, the simulated robots, and the robots' controller. Additionally, we describe how we run the experiments, and we introduce some metrics that we use to evaluate the properties of the system.

4.1    Environments

We study task allocation in two different environments. In both environments, the nest is marked by a light source that can be perceived by all robots, thus providing directional information. The environment is spatially partitioned in two parts: the source is located on the left and the nest on the right side of the arena. We refer to the two sides of the arena as the harvest area and the store area, respectively. The exchange zone is located between these two areas. Robots working on the left side, called harvesters, gather prey objects in the source and move them to the exchange zone, where they pass them to the robots working on the other side. These are referred to as storers: their role is to transport prey objects to the nest and store them there. The nest, the source, and the exchange zone can be detected through environmental cues (ground color).
    At time t = 0, the robots are randomly placed in the harvest area. The experiments run for tmax = 18,000 time steps (a simulated time of one hour, with a time step length of 200 ms). The experiments are run in two different arenas (see Figure 1). The first arena (Figure 1a) is 4.125 m long, with a width of 1.6 m at the source and exchange zone, whereas the nest is 0.4 m wide. The exchange zone is located 3.125 m away from the source. This arena is characterized by the presence of an area, critical for the task, in which high interference between robots can be expected (the nest). Thus, this arena is referred to as the narrow-nest environment.
    The second arena (Figure 1b) has a rectangular shape: it is 3.75 m long and 1.6 m wide. Here as well, the exchange zone is located 3.125 m from the source. The arena shape does not suggest the presence of any zone where interference can be higher than in other places. This arena is referred to as the wide-nest environment.
    The area of both arenas is 6 m²: 5 m² for the harvest area and 1 m² for the store area. The overall area is the same in the two arenas, so that the same group size results in the same robot density. Thus, results are comparable across the two environments.

Figure 1: Depiction of (a) the narrow-nest environment used in the first experiment and (b) the wide-nest environment used in the second experiment. The gray stripes are the source (left) and the nest (right), each 0.25 m deep. The black stripe is the exchange zone, which is 0.5 m deep. The light source is marked with "L".
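The comparability claim above rests on the two arenas having the same area. A throwaway sketch to check this numerically; the `density` helper is our own naming, not something from the paper:

```python
# Sanity check: the two arenas of Section 4.1 have equal areas,
# so a given group size yields the same robot density in both.

def density(n_robots, area_m2=6.0):
    """Robots per square meter of arena (helper name is ours)."""
    return n_robots / area_m2

# Narrow-nest arena: 5 m^2 harvest area + 1 m^2 store area.
narrow_total = 5.0 + 1.0
# Wide-nest arena: rectangular, 3.75 m long and 1.6 m wide.
wide_total = 3.75 * 1.6
assert abs(narrow_total - wide_total) < 1e-9  # both 6 m^2

print(density(12))  # 12 robots -> 2.0 robots per m^2 in either arena
```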
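The waiting-time rule that drives task switching (introduced in Section 3 and detailed in Section 4.3 below) can be sketched in a few lines. This is a minimal illustration under our own naming (`Robot`, `step_in_exchange_zone`), not the controller code used in the experiments:

```python
import random

class Robot:
    """Sketch of the threshold-based task switching rule."""

    def __init__(self, partitioned=True):
        # Partitioned strategy: threshold drawn once, uniformly in
        # [0, 1000] control cycles, and never changed afterwards.
        # Non-partitioned strategy: threshold 0 -> switch immediately.
        self.theta = random.uniform(0, 1000) if partitioned else 0
        self.task = "harvest"  # all robots start as harvesters
        self.waiting = 0       # t_w: control cycles spent waiting

    def step_in_exchange_zone(self, transfer_done):
        """One control cycle spent waiting in the exchange zone."""
        if transfer_done:
            self.waiting = 0   # prey passed/received: keep current task
            return
        self.waiting += 1
        if self.waiting > self.theta:  # t_w > Theta: switch subtask
            self.task = "store" if self.task == "harvest" else "harvest"
            self.waiting = 0
```

With `partitioned=False` the first unsuccessful waiting cycle already exceeds Θ = 0, so the robot switches subtask as soon as it reaches the exchange zone, which mirrors the non-partitioned behavior described in Section 4.3.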
4.2    Simulation

The experiments are carried out in a custom simulation environment that models the geometries and functional properties of simple objects and robots. Our robot model is purely kinematic. Prey objects are simulated as an attribute a robot can possess, not as physical entities. Although the experiments are conducted in simulation only, the simulated robots have a real counterpart: the swarm-bot robotic platform (Mondada et al., 2004). The platform consists of a number of mobile autonomous robots called s-bots, which have been used for several studies, mainly in swarm intelligence and collective robotics; see, for instance, Groß et al. (2006) and Nouyan et al. (2008). The simulated s-bots are of round shape, with a diameter of 0.116 m. Each of them is equipped with 16 infrared proximity sensors, used to perceive obstacles up to a distance of 0.15 m. Eight ambient light sensors can be used to perceive light gradients up to a distance of 5.0 m. The robots are equipped with 4 ground sensors used to perceive the nest, the source, and the exchange zone. A ring of 8 LEDs is used to signal when a prey object is carried. An omnidirectional camera allows the perception of LEDs in a circle of radius 0.6 m surrounding the robot. A uniform noise of 10% is added to all sensor readings at each simulation step. The robots can move at a maximum speed of 0.1 m/s by means of a differential drive system.

4.3    Controller

All the robots share the same, hand-coded, finite state machine controller, depicted in Figure 2. The controller consists of two parts, each corresponding to a possible subtask a robot can perform. Gray states refer to the harvest subtask, white states to the store subtask. Since all the robots start in the harvest area, their controller is initially set to perform anti-phototaxis. In this way they will reach the source, where they can start retrieving prey objects. The behavior of each robot is a function of the task it is performing. Harvesters not carrying a prey object move towards the source, where they can find prey. Harvesters carrying a prey object move to the exchange zone and wait for a free storer. Upon arrival of such a storer, the harvester passes the prey object to it. Storers carrying a prey object move towards the nest, where they can deposit the object. Storers not carrying a prey object head to the exchange zone and search for a harvester with a prey object. Robots can detect other robots carrying a prey on the basis of the color of their LED ring. While moving, each robot avoids obstacles (walls and other robots).
    Task switches can occur: a harvester carrying a prey object can decide to become a storer, and a storer not carrying a prey object can decide to become a harvester. As mentioned before, robots switch task depending on an internal threshold Θ, representing the maximum number of control cycles they can spend in the exchange zone waiting to pass (harvesters) or receive (storers) a prey object. If a robot remains in the exchange zone longer than its threshold without passing or receiving a prey object (tw > Θ), it switches its task. The optimal threshold value is not trivial to determine. In the work presented here, we use a simple method to set the threshold Θ: at the beginning of the experiment, each robot draws a random threshold, sampled uniformly in the interval [0, 1000]. We chose this method because it is independent of the environment and does not rely on complex approximation techniques. The threshold value does not change during the experiment. In case of the non-partitioned strategy, the threshold is set to Θ = 0, causing the robots to switch subtask as soon as they reach the exchange zone.

Figure 2: Simplified state diagram of the controller of the robots. Gray states belong to the harvest task, white states to the store task. The obstacle avoidance state has been omitted for clarity, as it is applicable in all states of the robot. tw is the time spent in the exchange zone and Θ is the threshold.

4.4    Experiments

The goal of the experiments is to investigate whether task partitioning can reduce interference in task-critical zones, and how to allocate a robotic swarm to partitions. As pointed out by Lerman and Galstyan (2002), interference is related to the number of individuals in the sys-
tem. Additionally, the physical interference between robots is also a function of the environment the robots act in. The higher the group size, the higher the density, resulting in a higher amount of physical interference. Thus, in order to study interference in our experiments, we increase the size of the group in each of the two environments shown in Figure 1, while using both strategies (non-partitioned and partitioned). We study the performance of the system when the group size N ranges in the interval [1, 40]. We run 50 repetitions for each value of N and each experimental setting.

4.5    Metrics

In order to quantify the influence of interference, we measure the group performance P as the number of prey objects collected by the swarm by the end of the experiment (tmax = 1 hour). From the group performance measure we can derive the individual efficiency as follows:

                    Ieff = P/N,                    (1)

where N is the size of the group. Individual efficiency can help to understand the effect of interference on the performance.
    In order to measure the influence of environmental features on the interference, we define an interference measure taking inspiration from Rosenfeld et al. (2005). In their work, interference is measured as the time spent performing actions not strictly related to the task, but rather lost due to negative interactions with the environment (e.g., obstacle avoidance maneuvers). By registering the number of collisions in each area of the arena, we can draw conclusions about where physical interference happens most often. We measure interference through the state of the controller: in our case, a robot is experiencing interference each time its controller perceives an obstacle.
    In case of a partitioned task, there is another source of inefficiency that adds to interference: the time lost in the exchange zone. We define the strategy cost C as the sum of the time lost because of physical interference and the time lost in the exchange zone:

                    C = Tint + Tpart,              (2)

where Tint is the number of time steps during which the controller perceives an obstacle, and Tpart is the total number of time steps spent in prey passing maneuvers. By using this metric, the cost of the non-partitioned strategy is purely due to interference (Tpart = 0), while in case of the partitioned strategy, prey passing costs add to the interference costs. In a way, passing a prey object produces another kind of interference in the system. The strategy cost captures this effect, thus allowing for a comparison of strategies.


5    RESULTS AND DISCUSSION

The graphs in Figures 3a and 4 show the performance P for different group sizes in the narrow-nest and wide-nest environments, respectively. Figure 3b shows the individual efficiency Ieff of the robots in the narrow-nest environment. Black curves are the averages computed over the 50 repetitions of each setting; gray curves indicate the 95% confidence interval on the expected value. The performance graph in Figure 3a shows that the partitioned strategy improves performance in the narrow-nest environment. The graph shows that the non-partitioned strategy performs better than the partitioned strategy for small group sizes (up to N = 13 robots). However, increasing the group size makes the non-partitioned strategy collapse: the number of gathered prey objects drops dramatically for groups larger than 13. The individual efficiency graph (Figure 3b) can explain the behavior of the system. For small group sizes, the robots employing the partitioned strategy are less efficient than those employing the non-partitioned strategy. However, the addition of more individuals affects the efficiency of the non-partitioned group in a more dramatic way: at a certain point, its drop in efficiency becomes very steep. The partitioned strategy, on the other hand, scales better: individual efficiency drops smoothly. This explains why a group using the partitioned strategy performs better: it can benefit from the work of more individuals and therefore collects more prey objects. These considerations do not hold in the wide-nest environment. The performance graph in Figure 4 shows that the non-partitioned strategy performs better than the partitioned strategy for group sizes N < 33. In both environments, independently of the strategy used to accomplish the task, the system collapses when the area is saturated by the swarm.
    Figure 5 shows the effect on the cost of increasing the number of robots in the narrow-nest environment. The graph compares the cost C of
Figure 3: (a) Performance P and (b) individual efficiency Ieff for increasing number of robots in the narrow-nest
environment. The black continuous line refers to the case of no task partitioning, the black dashed line to the
case of partitioning. Gray lines indicate the 95% confidence interval on the expected value.
Figure 4: Performance P for increasing number of robots in the wide-nest environment. The black continuous
line refers to the case of no task partitioning, the black dashed line to the case of partitioning. Gray lines indicate
the 95% confidence interval on the expected value.


each of the two strategies for different group sizes. In case of the partitioned strategy (Figure 5a), the graph shows each component of the cost (Tint and Tpart). Clearly, task partitioning has the effect of reducing the cost due to interference, but has the disadvantage of increasing the cost due to time lost. The probability of two or more robots encountering each other increases with the robot density. Although this entails a higher interference cost (Tint), it also lowers the waiting times, and therefore the partitioning cost (Tpart), in the case of the partitioned strategy. Partitioning performs better when the gain from interference reduction is greater than the performance lost to partitioning inefficiencies. These considerations hold in the narrow-nest environment, where the likelihood of physical interference in a task-critical zone is very high. In the wide-nest environment, interference in the nest is as likely as interference in the exchange zone; thus, it is not beneficial to pay the cost of waiting, and the non-partitioned strategy performs better for any group size.
    The mechanism by which partitioning reduces interference costs can be deduced by comparing the interference graphs in Figure 6. The graphs show the number of times that physical interference (as defined in Section 4.5) was registered in each region of the narrow-nest environ-
Figure 5: Cost for the partitioned strategy (a) and the non-partitioned strategy (b) for increasing numbers of robots in the narrow-nest environment. (Plots not reproduced.)
                                                                                          −1       x                                                                  −1    x
                                                                  −0.5                                                                                    −0.5

                                                                              −2                                                                                 −2


Figure 6: Mean interference values registered for (a) the partitioned strategy and (b) the non-partitioned strategy,
both in the narrow-nest environment. Shown values are observation means of 50 repetitions with N = 18
robots. Coordinates on the x- and y-axis are given in meters. The arena is stretched along the y-axis for better
visualization. The dashed white line marks the location of the exchange zone.



                                                                                                                       ment. The total area was discretized in squares of
                                                                                                                       1 cm2 . Figure 6 shows the results obtained with
                    20000




                                         Collision costs                               partitioned strategy
                                         Partition costs                                                               18 robots, in the case of the non-partitioned strat-
                                                                                                                       egy (Figure 6a) and in the case of the partitioned
                    15000




                                                                                                                       strategy (Figure 6b). The graphs show that
Strategy cost (C)




                                                                                                                       the use of the non-partitioned strategy leads to
                    10000




                                                                                                                       high interference in the nest, which becomes con-
                                                                                                                       gested. Partitioning the task reduces the robot
                                                                                                                       density in the nest, thus spreading the interfer-
                    5000




                                                                                                                       ence more uniformly across the arena. In addi-
                                                                                                                       tion, the overall interference diminishes because
                                                                                                                       the exchange zone is wider: the robots have more
                    0




                               0             5         10    15          20   25        30      35         40          freedom of movement and collide less often. Al-
                                                                                                                       though the graphs show only data collected with
                    20000




                                                                                   non−partitioned strategy            18 robots, experiments with different group sizes
                                                                                                                       produced similar results.
                    15000
Strategy cost (C)

                    10000




                                                                                                                       6             CONCLUSIONS AND
                                                                                                                                     FUTURE WORK
                    5000




                                                                                                                       Interference can be an issue when working with
                                                                                                                       swarms of robots. In this work, we used task par-
                    0




                                                                                                                       titioning and allocation to reduce interference be-
                                                             Number of robots (N)
                                                                                                                       tween robots sharing the same physical space. We
Figure 5: Cost of interference in the narrow-nest envi-                                                                manually partitioned the environment and em-
ronment. Bars represent the cost C, sum of interfer-                                                                   ployed a simple self-organized strategy for allo-
ence time Tint and partition time Tpart (i.e., waiting
times). For easy reference, the outline of the bars of
                                                                                                                       cating individuals to subtasks. Results show that
the respective other graph has been added to each                                                                      a partitioning strategy improves performance in a
graph. (a) Costs for the partitioned strategy, where                                                                   constrained environment. Additionally, we iden-
interference cost stem from waiting times and colli-                                                                   tified cases in which partitioning is not advanta-
sions. (b) Cost in case of the non-partitioned strategy,                                                               geous and a non-partitioned strategy should be
where only physical interference through collisions ex-                                                                used. The proposed strategy is fairly simple and
ists.                                                                                                                  far from being an optimal solution, nevertheless
8                                                 IRIDIA – Technical Report Series: TR/IRIDIA/2009-006




we improved the performance of the swarm when interference was costly.
   Future work will concern the identification of the optimal allocation in the studied environments, as well as the development and study of a strategy that can find this optimal allocation in a self-organized and adaptive way. In addition, the interference metric proposed in Section 4.5 could be used by the robots to decide whether to partition the task. In this way, we could achieve even better performance, since partitioning would be employed only when strictly needed. Finally, the goal is to validate the system using the real robots.


ACKNOWLEDGEMENTS

This work was supported by the SWARMANOID project, funded by the Future and Emerging Technologies programme (IST-FET) of the European Commission under grant IST-022888, and by the VIRTUAL SWARMANOID project, funded by the Fund for Scientific Research F.R.S.–FNRS of Belgium's French Community. The information provided is the sole responsibility of the authors and does not reflect the European Commission's opinion. The European Commission is not responsible for any use that might be made of data appearing in this publication. Marco Dorigo and Mauro Birattari acknowledge support from the Fund for Scientific Research F.R.S.–FNRS of Belgium's French Community, of which they are a research director and a research associate, respectively.


REFERENCES

Campo, A. and Dorigo, M. (2007). Efficient multi-foraging in swarm robotics. In Advances in Artificial Life: Proceedings of the VIIIth European Conference on Artificial Life, LNAI 4648, pages 696–705, Berlin, Germany. Springer Verlag.

Fontán, M. S. and Matarić, M. J. (1996). A study of territoriality: The role of critical mass in adaptive task division. In From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, pages 553–561, Cambridge, MA. MIT Press.

Gerkey, B. P. and Matarić, M. J. (2004). A formal analysis and taxonomy of task allocation in multi-robot systems. The International Journal of Robotics Research, 23(9):939–954.

Goldberg, D. (2001). Evaluating the Dynamics of Agent-Environment Interaction. PhD thesis, University of Southern California, Los Angeles, CA.

Goldberg, D. and Matarić, M. J. (2003). Maximizing reward in a non-stationary mobile robot environment. Autonomous Agents and Multi-Agent Systems, 6(3):287–316.

Groß, R., Bonani, M., Mondada, F., and Dorigo, M. (2006). Autonomous self-assembly in swarm-bots. IEEE Transactions on Robotics, 22(6):1115–1130.

Kalra, N. and Martinoli, A. (2006). A comparative study of market-based and threshold-based task allocation. In Proceedings of the 8th International Symposium on Distributed Autonomous Robotic Systems (DARS), Minneapolis, Minnesota, USA.

Krieger, M. J. B. and Billeter, J.-B. (2000). The call of duty: Self-organised task allocation in a population of up to twelve mobile robots. Robotics and Autonomous Systems, 30:65–84.

Labella, T. H., Dorigo, M., and Deneubourg, J.-L. (2006). Division of labor in a group of robots inspired by ants' foraging behavior. ACM Transactions on Autonomous and Adaptive Systems, 1(1):4–25.

Lein, A. and Vaughan, R. (2008). Adaptive multi-robot bucket brigade foraging. In Proceedings of the Eleventh International Conference on Artificial Life (ALife XI), pages 337–342, Cambridge, MA. MIT Press.

Lerman, K. and Galstyan, A. (2002). Mathematical model of foraging in a group of robots: Effect of interference. Autonomous Robots, 13(2):127–141.

Liu, W., Winfield, A., Sa, J., Chen, J., and Dou, L. (2007). Towards energy optimization: Emergent task allocation in a swarm of foraging robots. Adaptive Behavior, 15(3):289–305.

Mondada, F., Pettinaro, G. C., Guignard, A., Kwee, I. V., Floreano, D., Deneubourg, J.-L., Nolfi, S., Gambardella, L. M., and Dorigo, M. (2004). SWARM-BOT: A new distributed robotic concept. Autonomous Robots, 17(2–3):193–221.

Nouyan, S., Campo, A., and Dorigo, M. (2008). Path formation in a robot swarm: Self-organized strategies to find your way home. Swarm Intelligence, 2(1):1–23.

Rosenfeld, A., Kaminka, G. A., and Kraus, S. (2005). A study of scalability properties in robotic teams. In Coordination of Large-Scale Multiagent Systems, pages 27–51, New York. Springer Verlag.

Shell, D. and Matarić, M. J. (2006). On foraging strategies for large-scale multi-robot systems. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2717–2723, Beijing, China.
                 Self-organized Task Partitioning
                       in a Swarm of Robots

     Marco Frison1,2 , Nam-Luc Tran1 , Nadir Baiboun1,3 , Arne Brutschy1 ,
     Giovanni Pini1 , Andrea Roli1,2 , Marco Dorigo1, and Mauro Birattari1
          1 IRIDIA, CoDE, Université Libre de Bruxelles, Brussels, Belgium
                    2 Università di Bologna, Bologna, Italy
           3 ECAM, Institut Supérieur Industriel, Brussels, Belgium
                    mfrison85@gmail.com, nadir ecam@hotmail.com,
              {namltran,arne.brutschy,gpini,mdorigo,mbiro}@ulb.ac.be,
                                andrea.roli@unibo.it


       Abstract. In this work, we propose a method for self-organized adap-
       tive task partitioning in a swarm of robots. Task partitioning refers to
       the decomposition of a task into less complex subtasks, which can then
       be tackled separately. Task partitioning can be observed in many species
       of social animals, where it provides several benefits for the group. Self-
       organized task partitioning in artificial swarm systems is currently not
       widely studied, although it has clear advantages in large groups. We pro-
       pose a fully decentralized adaptive method that allows a swarm of robots
       to autonomously decide whether to partition a task into two sequential
       subtasks or not. The method is tested on a simulated foraging problem.
       We study the method’s performance in two different environments. In
        one environment, the performance of the system is optimal when the foraging
        task is partitioned; in the other, when it is not. We show that
       by employing the method proposed in this paper, a swarm of autonomous
       robots can reach optimal performance in both environments.


1    Introduction
Many animal species are able to partition complex tasks into simpler subtasks.
The act of dividing a task into simpler subtasks that can be tackled by different
workers is usually referred to as task partitioning [15].
   Although task partitioning may have associated costs, for example because of
work transfer between subtasks, there are many situations in which partitioning
is advantageous. Benefits of task partitioning include, for example, a reduction of
interference between individuals, an improved exploitation of the heterogeneity
of the individuals, and an improved transport efficiency [9].
   Humans widely exploit the advantages of task partitioning in everyday activ-
ities. Over the centuries, humans have developed complex social rules to achieve cooperation, including planning, roles, and work-flows. The ancient Romans recognized the importance of partitioning and codified it in their military principle divide et impera (also known as divide and conquer), which became an axiom in many political [17] and sociological theories [10].

M. Dorigo et al. (Eds.): ANTS 2010, LNCS 6234, pp. 287–298, 2010.
© Springer-Verlag Berlin Heidelberg 2010
288     M. Frison et al.

   Examples of task partitioning can also be observed in social insects. A widely
studied case is the foraging task in ants and bees. In foraging, a group of in-
dividuals has to collect and transport material to their nest. Foraging involves
many different phases and partitioning can occur simultaneously in many of
them. Typical phases where task partitioning can occur are the exploration of
the environment and the preparation of raw materials [15]. Examples of task
partitioning are the harvesting of leaves by leaf-cutter ants [9], the excavation of nest chambers in Pogonomyrmex, and the fungus garden removal in Atta [18].
   Also in swarm robotics there are situations in which it is convenient to par-
tition a task into subtasks. Advantages include increased performance at group
level, stimulated specialization, and parallel task execution. In most of the cases,
task partitioning is done a priori and the focus is on the problem of allocating
individuals to subtasks in a way that maximizes efficiency. However, in many
cases, task partitioning cannot be done a priori because the relevant information
on the environment is not available. We consider self-organized task partitioning
as a suitable approach in these cases.
   In this work, we focus on the case in which a task is partitioned into subtasks
that are sequentially interdependent. We propose a simple method, based on
individuals’ perception and decisions, that allows a swarm of autonomous robots
to decide whether to partition a foraging task into subtasks. We test the method
with a swarm of simulated robots in two different environmental conditions.
   The rest of the paper is organized as follows. In Section 2 we describe the problem and review related work. In Section 3 we propose an adaptive method
that we tested with simulated robots. In Section 4 we provide a description of
the experimental framework we consider. In Section 5 we report and discuss the
results. Finally, in Section 6 we summarize the contribution of this work and
present some directions for future research.

2     Problem Description and Related Work
We study a swarm of robots that has to autonomously decide whether to parti-
tion a task into subtasks. Our focus is on situations in which a task is partitioned
into sequential subtasks: the subtasks have to be executed in a certain sequence,
in order to complete the global task once [5].
   In these cases, we can identify task interfaces, where one task ends and another begins. Through task interfaces, individuals working on one of the subtasks can interact, either directly or indirectly, with individuals working on other subtasks.
   An example of sequential task partitioning, observable in nature, is the forag-
ing activity in Atta leaf cutting ants. The sequential interdependency between
tasks stems from the fact that each leaf has to be cut from a tree before it can
be transported to the nest. Each individual can choose whether to perform both
the cutting and transporting subtasks, or to specialize in one subtask only.
   Hart et al. [9] described the strategy employed by Atta ants: some individuals
work on the tree, cutting and dropping leaves to the ground, while the rest of
the swarm gathers and transports these leaves to the nest. Here the advantage of
partitioning comes from the fact that the energy cost to climb the tree has to be
paid only once by those individuals working as leaf cutters. Disadvantages come
from the fact that energy has to be spent to search for leaves on the ground.
The task interface can be identified, in this case, as the area where the leaves
land. Such areas are usually referred to as caches, and facilitate indirect transfer
of material between individuals. Anderson and Ratnieks described how foragers
of different ant species partition the leaf transport task by using caches [4].
   Partitioning along foraging trails using direct transfer of material between
individuals can be observed in other ant species and in other social insects. In
the case of direct transfer, the benefit of partitioning can come from the fact that
material weight can be matched with the strength of the transporter [2]. Akre
et al. observed task partitioning within Vespula nectar foraging [1]. Anderson
and Ratnieks studied partitioned foraging in honeybee species, showing that
the larger the swarm, the higher the performance [3].
   Robotic swarms often face situations similar to those of their natural coun-
terparts. However, despite its importance, few works have been devoted to task
partitioning in swarm robotics. Notable exceptions are the works of Fontán and Matarić as well as Shell and Matarić on bucket-brigading [8,16]. In these works,
robots have to gather objects in an environment. Each robot operates in a lim-
ited area and drops the object it carries when it reaches the boundaries of its
working area. This process leads to objects being passed across working areas
until they reach the nest. Lein and Vaughan proposed an extension to this work,
in which the size of the robots’ working areas is adapted dynamically [11]. Pini
et al. showed that the loss of performance due to interference can be reduced by
partitioning the global task into subtasks [14]. To the best of our knowledge, self-
organized task partitioning in terms of adaptive task decomposition has never
been investigated.


3   The Method

The method we propose allows a swarm of robots to adaptively decide whether
to partition a task into sequential subtasks or not. Each individual makes the decision autonomously: if a robot decides to employ task partitioning, it works on only one of the subtasks; if it decides not to, it performs the whole sequence of subtasks. The method is fully distributed
and does not require explicit communication between individuals. The swarm
organizes in a way that maximizes the overall efficiency of the group, regardless
of the specific environment. Efficiency is defined as the throughput of the system.
   In the proposed method, each individual infers whether task partitioning is advantageous on the basis of its waiting time at task interfaces. We define the probability p that a robot employs task partitioning as:

                         p = 1 − 1/(1 + e^(−θ(w(k)))) ,                        (1)

with θ being:
                         θ(w(k)) = w(k)/s − d ,                                (2)
                                         s
and w(k) being the weighted average waiting time at task interfaces after k
visits, which is calculated as follows:

                           w(k) = (1 − α)w(k − 1) + αwM .                       (3)

In Equation 2, s and d are a scale and a delay factor, respectively. In Equation 3, α ∈ (0, 1] is a weight factor that influences the responsiveness to changes: higher values lead to a more responsive behavior. The values of these parameters can be determined empirically. The variable wM is the measured waiting time at the task interface and ranges in [0, wMAX). The upper limit wMAX ensures that robots eventually give up task partitioning when their waiting time becomes too high. Each time a robot completes a subtask, it decides whether or not to employ task partitioning for the next task execution.
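The update and decision rule above can be sketched as follows. This is a minimal sketch: the class name is ours, and the parameter values for s, d, α, and wMAX are illustrative placeholders, since the paper only states that they are determined empirically.

```python
import math


class PartitioningDecision:
    """Sketch of the per-robot rule of Equations 1-3.

    Parameter values below are illustrative, not those used in the paper.
    """

    def __init__(self, s=5.0, d=3.0, alpha=0.5, w_max=60.0):
        self.s = s            # scale factor (Equation 2)
        self.d = d            # delay factor (Equation 2)
        self.alpha = alpha    # weight factor in (0, 1] (Equation 3)
        self.w_max = w_max    # upper limit w_MAX on a measured wait
        self.w = 0.0          # weighted average waiting time w(k)

    def record_wait(self, w_measured):
        """Equation 3: update the weighted average waiting time."""
        w_m = min(w_measured, self.w_max)   # w_M ranges in [0, w_MAX)
        self.w = (1.0 - self.alpha) * self.w + self.alpha * w_m

    def partition_probability(self):
        """Equations 1 and 2: probability of employing partitioning."""
        theta = self.w / self.s - self.d    # Equation 2
        return 1.0 - 1.0 / (1.0 + math.exp(-theta))
```

With these placeholder values, a robot that never waits keeps p close to 1 and almost always partitions, while a run of long waits drives w(k) toward wMAX and p toward 0, so the robot falls back to performing the whole task itself.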


4     Experimental Setup
The purpose of the experiments described in this section is to show the validity
of the method described in Section 3. To illustrate the approach we have chosen
a foraging problem as a testbed. It is a canonical choice in collective robotics as
it is easy to model and its applications are numerous [7].
   In the experiments, the global task can be described as harvesting objects from
a single source and storing them in the nest. The global task can be partitioned
into two subtasks, referred to as harvesting and storing, respectively. Partitioning
enables the subdivision of the environment into two areas, linked by a task
interface as defined in Section 2. The task interface is represented by a cache
that can be used by the robots to exchange objects. As the cache has a limited
capacity, robots that decide to use it may have to wait. The waiting time is
defined as the delay between the moment when a robot decides to use the cache,
either for picking up or dropping objects, and the moment when this effectively
becomes possible. It is also possible to avoid the cache by using a separate
corridor, which directly links the source and the nest.
   Each robot has to autonomously decide whether to partition the foraging task (by using the cache) or not (by using the corridor). The swarm can also employ a mixed strategy in which some individuals use the cache and others use the corridor. Robots have no notion of the global performance of the swarm, and explicit communication is never used. Figure 1 shows a simplified
state diagram that represents the behavior of each individual.

4.1   Simulation Tools
All the results presented in the paper are obtained using the ARGoS simulation
framework [13]. ARGoS is a discrete-time, physics-based simulation environment
that allows one to simulate experiments at different levels of detail. For the
[Figure 1 diagram: four states, Harvest from source, Drop in cache, Pick up from cache, and Store in nest, connected by transitions with probabilities pd, 1 − pd, pp, and 1 − pp.]
Fig. 1. Simplified state machine representing the behavior of each individual. Prob-
abilities pd and pp are both defined using Equation 1 as described in Section 3. The
variable pd represents the probability of using the cache to drop an object. The variable
pp represents the probability of picking up an object from the cache. The states Avoid
obstacles and Navigate have been omitted for clarity.
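Our reading of this state machine can be sketched as a transition function. This is a hedged sketch: the state names and the probabilities pd and pp come from Figure 1, while the exact routing of the 1 − pd and 1 − pp transitions (skipping the cache via the corridor) is our interpretation of the diagram.

```python
import random


def next_state(state, p_d, p_p, rng=random):
    """Return the state following `state` in the simplified state machine.

    p_d: probability of using the cache to drop an object.
    p_p: probability of picking up an object from the cache.
    """
    if state == "harvest_from_source":
        # With probability pd, drop the object at the cache (partitioned);
        # otherwise carry it through the corridor and store it in the nest.
        return "drop_in_cache" if rng.random() < p_d else "store_in_nest"
    if state == "drop_in_cache":
        # After dropping at the cache, return to the source.
        return "harvest_from_source"
    if state == "store_in_nest":
        # With probability pp, fetch the next object from the cache;
        # otherwise travel through the corridor back to the source.
        return "pick_up_from_cache" if rng.random() < p_p else "harvest_from_source"
    if state == "pick_up_from_cache":
        # An object taken from the cache is carried to the nest.
        return "store_in_nest"
    raise ValueError(f"unknown state: {state}")
```

In the paper's method, pd and pp are not fixed: each robot recomputes them from its waiting-time estimate (Equation 1) every time it completes a subtask.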



experiments presented in this paper, it is sufficient to simulate kinematics in a
bi-dimensional space. A common control interface provides transparent access to real and simulated hardware, allowing the same controller to run on the real robots without modification.
   The robots we use in this research are the e-pucks1 . The e-puck has been
designed with the purpose of being both a research and an educational tool for
universities [12]. ARGoS simulates the whole set of sensors and actuators of the
e-puck. In our experiments we use the wheel actuators, the 8 IR sensors for light
and proximity detection, the VGA camera, and the ground sensors.

4.2    Harvesting Abstraction

As the e-pucks are not capable of grasping objects, we developed a device to
simulate this process [6]. Figure 2 shows a schematic representation of the device,
called an array2. It consists of a variable number of slots, named booths. Each
booth is equipped with two RGB LEDs that can be detected by the robots
through their color camera. A light barrier can detect the presence of a robot
within each booth. Reactions to this event, such as changes in LED color, are
programmable.
   In the experiments presented in the paper, a green booth, either in the source
or in the cache, means that an object is available there. Analogously, a blue
booth means that an object can be dropped there. By using this abstraction,
when a robot enters a booth in which the LEDs are lit up in green, we consider
that the robot picks up an object from that booth. When a robot enters a booth
in which the LEDs are lit up in blue, we consider that the robot drops an object
1
    http://www.e-puck.org/
2
    The array is under development and a working prototype is currently available.




Fig. 2. Schematics of an array of booths with four booths on each side. The light bar-
riers, represented by black semicircles, detect when a robot enters in the corresponding
booth. LEDs, used to signal available pick up or drop sites, are represented by blank
semicircles.


in that booth. In both cases, when the booth perceives the presence of the robot,
it reacts by turning the LEDs red, until the robot has left. Once the robot has
left, the booths behave in a different way, depending on whether they are source,
nest or cache booths. In the case of the source, the booth turns green, to signal
the availability of a new object to harvest. In the case of the nest, the booth turns
blue, to signal that the corresponding store spot is available again. In case of the
cache, if the robot leaves after picking up an object, the booth, previously green,
turns off and the corresponding booth on the other side turns blue signaling
that the spot is now available again for dropping an object. Conversely, if the
robot leaves after dropping an object, the booth, previously blue, turns off and
the corresponding booth on the other side turns green signaling that an object
is available for being picked up. This simple logic allows us to simulate object
transfer through the cache, as well as harvest from the source and store in the
nest.
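The booth logic just described can be summarized in a short model. The class and attribute names are ours, not the device's firmware API; this is a simplified sketch of the color-switching rules described above, assuming each cache booth is paired with one booth on the opposite side.

```python
class Booth:
    """Simplified model of a booth's LED logic (names are ours)."""

    def __init__(self, kind, color):
        self.kind = kind        # "source", "nest", or "cache"
        self.color = color      # "green" (pick up), "blue" (drop), or "off"
        self.partner = None     # paired booth on the other side (cache only)
        self.action = None      # what the robot currently inside is doing

    def robot_enters(self):
        # The light barrier detects the robot; the LEDs turn red until it leaves.
        self.action = "pick_up" if self.color == "green" else "drop"
        self.color = "red"

    def robot_leaves(self):
        if self.kind == "source":
            self.color = "green"    # a new object is available to harvest
        elif self.kind == "nest":
            self.color = "blue"     # the store spot is available again
        else:                       # cache: switch off and toggle the partner
            self.color = "off"
            if self.action == "pick_up":
                self.partner.color = "blue"    # spot free for dropping again
            else:
                self.partner.color = "green"   # object ready to be picked up
```

Under this model, an object "travels" through the cache purely via the paired LED toggling: a drop on one side makes the opposite booth signal an available object, with no physical transfer needed.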

4.3   Environments

We run the experiments in two different environments, named short-corridor
environment and long-corridor environment (see Figures 3a and 3b).
   In both environments, the nest array is located on the right-hand side,
while the source array is located on the left-hand side of a rectangular arena.
Both the nest and the source arrays have four booths, all on one side. The
cache array is located between the nest and the source and has three booths on
each side. Therefore, the cache has a limited capacity, which is determined by
the number of booths on each side. Different ground colors allow the robots to
recognize on which side of the cache they are.
   Although the cache array cannot be crossed by robots, a corridor links the
two areas and allows the robots to harvest/store objects without using the cache
                       Self-organized Task Partitioning in a Swarm of Robots       293




Fig. 3. Representation of the a) short-corridor and b) long-corridor environments
used in the experiments. The nest and source arrays each have four booths on one side.
The cache array has three booths on each side. Different ground colors help the robots
to distinguish between different parts of the environment and to navigate through the
corridor that connects the two areas. The light source, used as a landmark for navigating
in the corridor, is marked with “L”.



array (i.e., without partitioning the foraging task). A light source and two colored
trails help the robots navigate through the corridor. Both environments
are 1.6 m wide, but they differ in the corridor’s length: in the short-corridor
environment the corridor is 1.5 m long, while in the long-corridor environment it
is 3.5 m long. Both the use of the cache and the use of the corridor entail costs.
The cache can be seen as a shortcut between source and nest: robots cannot cross
the cache, but its use can make material transfer faster. However, the cache can
also become a bottleneck: the decision to use it can lead to delays if it is
busy when dropping objects, or empty when picking them up. The decision to
use the corridor imposes a cost due to the time spent traveling through it.
Thus, the transfer cost varies with cache and group size, while the travel cost
varies with corridor length. As we keep the size of the cache array constant
in our experiments, the corridor length determines the relative cost of
partitioning versus not partitioning. In the long-corridor environment, the use of
the cache is expected to be preferable. In the short-corridor environment, on the
other hand, the cost imposed by the corridor is low and should lead to the
decision not to use the cache.
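The trade-off just described can be condensed into a toy decision rule. This is purely illustrative: the robots never compute such a comparison explicitly, and the adaptive method of Section 3 instead estimates waiting times online.

```python
def preferred_strategy(corridor_travel_time, expected_cache_wait):
    """Toy rule for the cost trade-off: partitioning costs the expected
    waiting time at the cache, while not partitioning costs the extra
    time spent traversing the corridor.  Both arguments are in seconds;
    the function and its inputs are illustrative assumptions, not part
    of the robots' controller."""
    if expected_cache_wait < corridor_travel_time:
        return "partition"        # exchange objects at the cache
    return "do not partition"     # carry objects through the corridor
```

With a long corridor the traversal time dominates and partitioning pays off; with a short corridor the waiting time at the cache dominates and direct transport is preferable.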

4.4   Experimental Settings

We run two different sets of experiments: in the first set we are interested in
assessing the performance of the adaptive method that we propose, while in the
second set we aim at evaluating its scalability. In the first set of experiments
the robotic swarm is composed of twelve e-pucks, each controlled by the same
finite state machine depicted in Figure 1. In both environments, we compare the
adaptive method described in Section 3 to two strategies that always partition
(pd = pp = 1) or never partition (pd = pp = 0). These are used as reference
strategies for evaluating both the performance of the proposed method and its
capability of adapting to different environmental conditions. The values of the
parameters have been determined empirically and fixed to s = 20, d = 5, α = 0.75,
wMAX = 15 s. The duration of each experimental run is 150 simulated minutes.
For each experimental condition we run 30 repetitions, varying the seed of the
random number generator. At the beginning of each experiment the robots are
randomly positioned on the right side of the environment, where the nest array
is located. As the average waiting time w(k) is initially equal to zero, all the
robots start with probabilities pd = pp ≈ 1: from Equations 1 and 2, with d = 5
and w(k) = 0, we obtain p = 0.993.
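Equations 1 and 2 are not reproduced in this excerpt, but the quoted value suggests a sigmoidal mapping from the average waiting time w(k) to the partitioning probabilities. The sketch below assumes a logistic form, p = 1/(1 + e^{d(w/s − 1)}), chosen only because it reproduces p ≈ 0.993 for w = 0 and d = 5; treat it, and the moving-average update, as assumptions rather than the paper's exact formulas.

```python
import math

def partition_probability(w, s=20.0, d=5.0):
    # Assumed logistic form: high probability of using the cache when
    # the average waiting time w is well below the threshold s, low
    # probability when w exceeds s.  With w = 0 and d = 5 this gives
    # 1 / (1 + e**-5), which is approximately 0.993, matching the text.
    return 1.0 / (1.0 + math.exp(d * (w / s - 1.0)))

def update_waiting_time(w_prev, w_measured, alpha=0.75):
    # Hypothetical exponential moving average, consistent with the
    # single smoothing parameter alpha = 0.75 used in the experiments.
    return alpha * w_prev + (1.0 - alpha) * w_measured
```

Under this assumed form, the probability crosses 0.5 exactly at w = s, so s acts as a waiting-time threshold and d controls how sharply the decision switches.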
    In the second set of experiments we compare the adaptive method to the
reference strategies in the short-corridor environment. In this case the size of
the swarm varies in the interval [4, 60]. For each condition we run 10 randomly
seeded experiments. The remaining parameters of the experiment are the same
as described for the first set of experiments.


5     Results and Discussion

As we keep the capacity of the cache array constant, it is the length of the cor-
ridor that determines which behavior maximizes the throughput of the system.
Partitioning allows the robots to avoid the corridor but, in order to exploit the
cache efficiently, the swarm has to organize itself and work on both of its sides.
Additionally, as pointed out in Section 4, the robots might have to wait in order
to use the cache.
   In the short-corridor environment, the cost of using the cache is higher than
the cost of using the corridor. In this case the robots should opt for a non-
partitioning strategy, without exchanging objects at the cache. Conversely, in the
long-corridor environment the time required to travel along the corridor is high,
and partitioning the task by using the cache is expected to be more efficient.
   The graphs in Figure 4 show the average throughput of a swarm of twelve
robots in the two environments. Throughput is measured as the number of ob-
jects retrieved per minute. The adaptive method is compared with the two ref-
erence strategies in which the robots never/always use the cache. As expected,
each of these reference strategies performs well only in one environment: the
strategy that never uses the cache performs better in the short-corridor envi-
ronment, while the strategy that always uses the cache performs better in the



Fig. 4. Average throughput in the short-corridor (top) and long-corridor (bottom) en-
vironment for different strategies. Time is given in simulated minutes. Throughput is
measured as objects retrieved per minute. Parameters for the adaptive behavior are
set to s = 20, d = 5, α = 0.75, wMAX = 15 s. Parameter values have been obtained em-
pirically. Each experiment has been repeated 30 times, varying the seed of the random
number generator. The bars around the single data points represent the confidence
interval of 95% on the observed mean.


long-corridor environment. On the other hand, the adaptive method we propose
shows good performance in both environments.
   Concerning the long-corridor environment, we assumed that the best strategy
was to always use the cache and avoid the corridor. However, Figure 4 (bottom)
shows that the proposed adaptive method improves over this fixed strategy. To
better understand this behavior, we empirically determined the fixed-probability
strategy yielding the highest throughput for each environment. This analysis
revealed that the best strategy in the long-corridor environment is to use the
cache with a probability of around 80%. In the short-corridor environment a
non-partitioning strategy is preferable.
   Table 1 reports the average throughput obtained at the end of the run for
each strategy in each environment. The results reported in the table show that


Table 1. Average throughput at the end of the experiment for a swarm of 12 robots.
Results are reported for different partitioning strategies in the short-corridor and long-
corridor environments. The values in parentheses indicate the 95% confidence interval
on the value of the mean.

                 Never partition   Always partition   Fixed (pd, pp = 0.8)   Adaptive
Short-corridor   1.81 (±0.012)     1.57 (±0.015)      1.67 (±0.016)          1.79 (±0.017)
Long-corridor    1.22 (±0.006)     1.36 (±0.015)      1.40 (±0.013)          1.43 (±0.017)


Fig. 5. Impact of the swarm size in the short-corridor environment. The value of the
throughput, measured as number of objects retrieved per minute, is the average value
reached by the swarm at the end of the experiment. Each experimental run lasts 150
simulated minutes. For each experimental condition we run 10 repetitions, varying the
seed of the random number generator.


fixed-probability strategies perform well only in one of the two environments.
Our adaptive method, on the other hand, reaches good performance in both the
short-corridor and the long-corridor environment. These results confirm that
the proposed method allows a swarm of robots to decide whether or not to
partition a task into sequential subtasks.
   The graph in Figure 5 shows the results of the second set of experiments, in
which we focus on scalability. In this case we compare the different strategies for
different swarm sizes in the short-corridor environment. As discussed previously,
in the short-corridor environment the optimal strategy is to always use the cor-
ridor. It can be observed that for small swarm sizes this strategy performs well.
However, its performance drops drastically when the number of robots increases.
The reason for this degradation is the increasing interference between the robots,
which raises the cost of using the corridor. The partitioned strategy does not
suffer from such steep performance drops. However, its throughput is considerably
lower for smaller group sizes, as the waiting time at the cache becomes dominant.
The adaptive method performs well across all the studied swarm sizes, finding a
good balance between the robots that use the cache and those that use the corridor.
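The interference argument can be made concrete with a toy throughput model. All constants below are invented for illustration and are not fitted to the experiments: corridor congestion is assumed to grow quadratically with swarm size, while the expected wait at the cache shrinks as more robots keep both of its sides busy.

```python
def corridor_throughput(n, base_trip=40.0, congestion=0.5):
    # Non-partitioning strategy: per-cycle time (seconds) grows
    # superlinearly with swarm size n because robots interfere in
    # the corridor; throughput is objects per minute.
    return n * 60.0 / (base_trip + congestion * n ** 2)

def cache_throughput(n, base_trip=25.0, base_wait=120.0):
    # Partitioning strategy: the expected wait at the cache shrinks
    # as more robots keep both of its sides supplied.
    return n * 60.0 / (base_trip + base_wait / n)
```

With these invented constants the corridor strategy wins for small swarms, peaks, and then collapses as interference dominates, while the cache strategy only becomes competitive for larger swarms, qualitatively matching the behavior observed in Figure 5.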

6   Conclusions

In this research we investigated self-organized task partitioning in a swarm of
robots. In particular, we have proposed an adaptive method that tackles the
task partitioning problem with a simple strategy based on each individual’s
perception of subtask performance. In the proposed method, each robot estimates
subtask performance by measuring its waiting time at task interfaces. Results
show that the proposed adaptive method reaches the best performance in the two
environments we considered, employing task partitioning only in those cases in
which the benefits of partitioning outweigh its costs. The study of the impact of
group size reveals that the method scales well with swarm size. Future work will
concern the study of self-organized task partitioning in multi-foraging problems
in environments with several caches. In addition, we are interested in applying
the proposed method to cases in which partitioning happens through direct
material transfer.

Acknowledgements. This work was partially supported by the virtual
Swarmanoid project funded by the Fund for Scientific Research F.R.S.–FNRS
of Belgium’s French Community. Marco Dorigo and Mauro Birattari acknowl-
edge support from the F.R.S.–FNRS, of which they are a research director
and a research associate, respectively. Marco Frison acknowledges support from
“Seconda Facoltà di Ingegneria”, Alma Mater Studiorum, Università di Bologna.
Andrea Roli acknowledges support from the “Brains (Back) to Brussels” 2009
programme, funded by IRSIB – Institut d’encouragement de la Recherche Sci-
entifique et de l’Innovation de Bruxelles.


