Evolving Multimodal Behavior
PhD Proposal
Jacob Schrum
11/4/09
Introduction
   Challenge: Discover behavior automatically
     Simulations, video games, robotics
   Why challenging?
     Noisy sensors
     Complex domains
     Continuous states/actions
     Multiple agents, teamwork
     Multiple objectives
     Multimodal behavior required (focus)
What is Multimodal Behavior?
   Working definition:
     Agent exhibits distinct kinds of actions
      under different circumstances
   Examples:
     Offensive & defensive modes in soccer
     Search for weapons or opponents in
      video game
     Animal with foraging & fleeing modes
   Very important for teams
     Roles correspond to modes
     Example domains will involve teamwork
Previous Approaches
   Design Approaches
     Hand-code in a structured manner
   Value-Function Based Approaches
     Learn the utility of actions (RL)
   Evolutionary Approaches
     Selectively search based on performance
Design
   Subsumption Architecture (Brooks 1986)
     Hierarchical design
     Lower levels independent of higher levels
     Built incrementally
     Common in robotics
     Hand coded
Value-Function Based (1/2)
   MAXQ (Dietterich 1998)
     Hand-designed hierarchy
     TD learning at multiple levels
     Reduce state space
     Taxi domain
       Still just a grid world
       Discrete state & action
Value-Function Based (2/2)
   Basis Behaviors (Matarić 1997)
     Low-level behaviors pre-defined
     Learn high-level control
     Discrete state space
       High-level features (conditions)
     Reward shaping necessary
     Applied to real robots

   Too much expert knowledge
Evolutionary (1/2)
   Layered Evolution (Togelius 2004)
     Evolve components of
      subsumption architecture
     Applied to:
       EvoTanks (Thompson and Levine 2008)
       Unreal Tournament 2004 (van Hoorn et al. 2009)
     Must specify:
       Hierarchy
       Training tasks
     Similar to Layered
      Learning (Stone 2000)
Evolutionary (2/2)
   Neuro-Evolving Robotic Operatives
    (Stanley et al. 2005)
     ML game
     Train robot army
     Many objectives
       Weighted sum: z-scores method
       User changes weights during training
     Dynamic objective management
       Leads to multimodal behavior
Multiple Objectives
   Multimodal problems are typically multiobjective
     Modes associated with objectives
   Traditional: weighted sum (Cohon 1978)
     Must tune the weights
     Only one solution
     Bad for non-convex surfaces
     Need better formalism
   [Figure: tradeoff surface; each point corresponds to one set of
    specific weights, and non-convex regions cannot be captured by a
    weighted sum]
Greatest Mass Sarsa
(Sprague and Ballard 2003)

   Multiple MDPs with shared action space
   Learn via Sarsa(0) update rule:

     Q_i(s_i, a) ← Q_i(s_i, a) + α[r_i + γ Q_i(s_i′, a′) − Q_i(s_i, a)]

   Best value is sum of component values:

     a* = argmax_a Σ_i Q_i(s_i, a)

   Used in sidewalk
    navigation task
   Like weighted sum
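The module bookkeeping above can be sketched in a few lines; the encoding and the names `SarsaModule` and `greatest_mass_action` are illustrative, not from the paper:

```python
from collections import defaultdict

# Minimal sketch of GM-Sarsa(0), assuming each module i sees its own
# state s_i, receives its own reward r_i, and shares the action set.

class SarsaModule:
    """One component MDP with its own tabular Q-function."""
    def __init__(self, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # (state, action) -> value
        self.alpha, self.gamma = alpha, gamma

    def update(self, s, a, r, s2, a2):
        # Sarsa(0): Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)]
        target = r + self.gamma * self.q[(s2, a2)]
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])

def greatest_mass_action(modules, states, actions):
    """Choose the action whose summed Q-value across modules is largest."""
    return max(actions,
               key=lambda a: sum(m.q[(s, a)] for m, s in zip(modules, states)))
```

Summing component values is why the method behaves like a weighted sum: every module's value is given equal (unit) weight.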
Convex Hull Iteration
(Barrett and Narayanan 2008)

   Changes MDP formalism:
     Vector reward
   Find solutions for all possible
    weightings w where w_i ≥ 0, Σ_i w_i = 1
   Maximize:

     w · V^π

   Results in compact set of solutions
     Different trade-offs
   Cannot capture non-convex surfaces
   Discrete states/actions only

   Need a way to capture non-convex surfaces!
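The linear scalarization underlying such methods can be sketched as follows (function names are mine):

```python
# Sketch of linear scalarization: for a weight vector w (w_i >= 0,
# sum w_i = 1), pick the point maximizing w . v. Only points on the
# convex hull of the tradeoff surface can ever be chosen this way.

def scalarize(v, w):
    """Dot product of a score vector with a weight vector."""
    return sum(wi * vi for wi, vi in zip(w, v))

def best_for_weights(points, w):
    """The point a weighted-sum method would select for weights w."""
    return max(points, key=lambda v: scalarize(v, w))
```

For example, with the points (3, 0), (0, 3), and (1.4, 1.4), no choice of weights ever selects (1.4, 1.4), even though it is Pareto-optimal: it lies inside the convex hull of the other two points.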
Pareto-based Multiobjective Optimization
(Pareto 1890)
   Imagine game with two objectives:
     Damage Dealt
     Health Remaining
   v dominates u iff
    1. v_i ≥ u_i for all i, and
    2. v_i > u_i for some i
   [Figure: tradeoff between objectives — one point has high health but
    did not deal much damage, another dealt a lot of damage but lost
    most of its health]
   Population of points
    not dominated are best:
    Pareto Front
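The dominance test and front extraction above translate directly into code; this minimal sketch assumes every objective is maximized, and the function names are illustrative:

```python
# Pareto dominance and Pareto-front extraction for score tuples,
# assuming all objectives are maximized.

def dominates(v, u):
    """v dominates u iff v is >= u in every objective and > in at least one."""
    return (all(vi >= ui for vi, ui in zip(v, u))
            and any(vi > ui for vi, ui in zip(v, u)))

def pareto_front(points):
    """Keep the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```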
Non-dominated Sorting Genetic Algorithm II
(Deb et al. 2000)

   Population P with size N; Evaluate P
   Use mutation to get P´ size N; Evaluate P´
   Calculate non-dominated fronts of {P ∪ P´} size 2N
   New population size N from highest fronts of {P ∪ P´}
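The generation step above can be sketched as follows; crowding-distance tie-breaking within the cutoff front, which NSGA-II also uses, is omitted for brevity, and all names are mine:

```python
# Sketch of one NSGA-II generation: merge P and P', sort the 2N
# candidates into non-dominated fronts, and fill the next population
# from the best fronts. Assumes all objectives are maximized.

def dominates(v, u):
    return (all(a >= b for a, b in zip(v, u))
            and any(a > b for a, b in zip(v, u)))

def nondominated_fronts(scores):
    """Partition indices into fronts; front 0 is the Pareto front."""
    remaining = set(range(len(scores)))
    fronts = []
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)}
        fronts.append(sorted(front))
        remaining -= front
    return fronts

def next_population(parent_scores, offspring_scores, n):
    """Indices (into the merged pool) that survive to the next generation."""
    pool = parent_scores + offspring_scores
    chosen = []
    for front in nondominated_fronts(pool):
        for i in front:
            if len(chosen) < n:
                chosen.append(i)
    return chosen
```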
Constructive Neuroevolution
   Genetic Algorithms + Neural Networks
   Build structure incrementally
   Good at generating control policies
   Three basic mutations (no crossover used)
   Other structural
    mutations
    possible
     More later

   [Figure: Perturb Weight, Add Connection, Add Node]
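The three basic mutations can be sketched on a toy genome representation; the encoding (a node list plus a connection-weight dict) and all names here are my illustration, not the actual system:

```python
import random

# Sketch of the three basic mutations on a toy genome:
# genome = {"nodes": [ids...], "conns": {(src, dst): weight}}.

def perturb_weight(genome, rng, sigma=0.5):
    """Add Gaussian noise to one randomly chosen connection weight."""
    key = rng.choice(sorted(genome["conns"]))
    genome["conns"][key] += rng.gauss(0.0, sigma)

def add_connection(genome, rng):
    """Link two nodes that are not yet connected, with a small weight."""
    free = [(a, b) for a in genome["nodes"] for b in genome["nodes"]
            if (a, b) not in genome["conns"]]
    if free:
        genome["conns"][rng.choice(free)] = rng.uniform(-0.1, 0.1)

def add_node(genome, rng):
    """Split a connection: insert a new node along an existing link."""
    src, dst = rng.choice(sorted(genome["conns"]))
    w = genome["conns"].pop((src, dst))
    new = max(genome["nodes"]) + 1
    genome["nodes"].append(new)
    genome["conns"][(src, new)] = 1.0  # identity-like weight in,
    genome["conns"][(new, dst)] = w    # old weight out
```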
Evolution of Teamwork
   Heterogeneous
     Different roles
     Cooperation harder to evolve
     Team-level multimodal behavior
   Homogeneous
     Shared policy
     Individuals know how teammates act
     Individuals fill roles as needed: multimodal
Completed Work
   Benefits of Multiobjective Neuroevolution
     Pareto-based leads to multimodal behavior
   Targeting Unachieved Goals (TUG)
     Speed up evolution with objective management
   Evolving Multiple Output Modes
     Allow networks to have multiple policies/modes

   Need a domain to experiment in …
    Battle Domain
   Evolved monsters (yellow)
   Scripted fighter (green)
     Approach nearest monster
     Swing bat repeatedly
   Monsters can hurt fighter
   Bat can hurt monsters
   Multiple objectives
     Deal damage
     Avoid damage
     Stay alive
   Can multimodal teamwork evolve?
Benefits of Multiobjective Neuroevolution

   Research Questions:
     NSGA-II better than z-scores (weighted sum)?
     Homogeneous or heterogeneous teams better?
   30 trials for each combination
   Three evaluations per individual
     Average scores to overcome noisy evals
Incremental Evolution
   Hard to evolve against scripted strategy
     Could easily fail to evolve interesting behavior
   Incremental evolution against increasing speeds
     0%, 40%, 80%, 90%, 95%, 100%
   Increase speed when all
    goals are met
   End when goals met at 100%
Goals
   Average population performance high enough?
     Then increase speed
   Each objective has a goal:
     At least 50 damage to bot (1 kill)
     Less than 20 damage per monster on average (2 hits)
     Survive at least 540 time
      steps (90% of trial)
   A goal is achieved when the average
    population score in that objective
    meets the goal value
Evolved Behaviors
   Baiting + Side-Swiping
     Lure fighter
     Turns allow team to catch up
     Attacks on left side of fighter

   Taking Turns
     Hit and run
     Next counter-clockwise
      monster rushes in
     Fighter hit on left side

   Multimodal behaviors!
Multiobjective Conclusions
 NSGA-II faster than z-scores
 NSGA-II more likely to generate
  multimodal behavior

 Many runs did not finish/were slow
 Several “successful” runs did not have
  multimodal behavior
Targeting Unachieved Goals
   Research Question:
     How to speed up evolution and make it more reliable
   When objective’s goal is met, stop using it
   Restore objective if scores drop below goal
   Focuses on the most challenging objectives
   Combine NSGA-II with TUG

   [Figure: search effort concentrates on tough objectives]
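The objective-management rule above can be sketched as follows, assuming for simplicity that every objective is maximized (names are illustrative):

```python
# Sketch of TUG's objective management: an objective is dropped from
# selection while the population's average score meets its goal, and
# restored if the average falls back below the goal.

def active_objectives(avg_scores, goals):
    """Indices of objectives that selection should still use."""
    return [i for i, (avg, goal) in enumerate(zip(avg_scores, goals))
            if avg < goal]

def tug_score(score_vector, active):
    """Project an individual's scores onto the active objectives."""
    return tuple(score_vector[i] for i in active)
```

Selection (e.g. NSGA-II) then runs on the projected score vectors, so effort is never wasted on objectives whose goals are currently met.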
Evolved Behaviors
   Alternating Baiting
     Bait until another monster hits
     Then baiting monster attacks
     Fighter knocked back and forth

   Synchronized Formation
     Move as a group
     Fighter chases one bait
     Other monster rushes in with
      side swipe attacks
   More multimodal behaviors!
TUG Conclusions
   TUG results in huge speed-up
     No wasted effort on achieved goals
   TUG runs finish more reliably
   Heterogeneous runs have more multimodal
    behavior than homogeneous

   Some runs still did not finish
   Some “successful” runs still did not have
    multimodal behavior
Fight or Flight
   Separate Fight and Flight trials
   Fight = Battle Domain
   Flight:
     Scripted prey (red) instead of fighter
     Has no bat; has to escape
     Monsters confine and damage
     New objective: Deal damage in Flight

   Flight task requires teamwork
   Requires multimodal behavior
New-Mode Mutation
   Encourage multimodal behavior
   New mode with inputs from preexisting mode
     Initially very similar

   Maximum preference node determines mode
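Arbitration by preference node can be sketched as follows, assuming each mode contributes a fixed-size block of action outputs followed by one preference output (this output layout and the names are my assumption):

```python
# Sketch of output-mode arbitration: the network's output layer holds,
# for each mode, a block of action outputs plus one preference output;
# the mode whose preference node fires highest supplies the actions.

def choose_mode(outputs, n_modes, actions_per_mode):
    """Return (mode index, that mode's action outputs)."""
    block = actions_per_mode + 1          # action outputs + preference
    prefs = [outputs[m * block + actions_per_mode] for m in range(n_modes)]
    best = max(range(n_modes), key=lambda m: prefs[m])
    start = best * block
    return best, outputs[start:start + actions_per_mode]
```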
Evolving Multiple Output Modes
   Research Question:
     How to evolve teams that do well in both tasks
   Compare 1Mode to ModeMutation
   Three evals in Fight and three in Flight
     Same networks for two different tasks
    1Mode Behaviors
   Aggressive + Corralling
     Aggressive in Fight task
        Take lots of damage
        Deal lots of damage

     Corralling in Flight task
   Run/Rush + Crowding
     Run/Rush in Fight task
        Good timing on attack
        Kill fighter w/o taking too much damage

     Crowding in Flight task
        Get too close to prey
        Knock prey out and it escapes

   Networks can’t handle both tasks!
ModeMutation Behaviors
   Alternating Baiting + Corralling
     Alternating baiting in Fight task
     Corralling in Flight task
       Spread out to prevent escape
       Individuals rush in to attack
   Hit into Crowd + Crowding
     Hitting into Crowd in Fight task
       One attacker knocks fighter into others
     Crowding in Flight task
       Rush prey, ricochet back and forth
       Sometimes knocks prey free

   Networks succeed at both tasks!
Mode Mutation Conclusions
   ModeMutation slower than 1Mode
   ModeMutation better at producing
    multimodal behaviors

   Harder task resulted in more failed runs
   Many unused output modes created
     Slows down execution
     Bloats output layer
Proposed Work
   Extensions
    1. Avoiding Stagnation by Promoting Diversity
    2. Extending Evolution of Multiple Output Modes
    3. Heterogeneous Teams Using Subpopulations
    4. Open-Ended Evolution + TUG

   Evaluate in new tasks
   Killer App: Unreal Tournament 2004
1. Avoiding Stagnation by Promoting Diversity

   Behavioral diversity avoids stagnation
   Add a diversity objective (Mouret et al. 2009)
   Behavior vector:
     Given input vectors,
      concatenate outputs
   Diversity objective:
     AVG distance from
      other behavior
      vectors in pop.
   [Figure: example behavior vectors, e.g. (1.2, -2, …, -1, 2.2)]
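The diversity objective can be sketched as follows; a network here is just any callable from input vector to output list, and the names are mine:

```python
import math

# Sketch of the behavioral-diversity objective: evaluate every network
# on the same fixed inputs, concatenate its outputs into a behavior
# vector, and score each individual by its average distance to the
# other behavior vectors in the population.

def behavior_vector(network, input_vectors):
    """Concatenate the network's outputs over a fixed set of inputs."""
    vec = []
    for x in input_vectors:
        vec.extend(network(x))
    return vec

def diversity(vectors, i):
    """Average Euclidean distance from vector i to every other vector."""
    others = [v for j, v in enumerate(vectors) if j != i]
    return sum(math.dist(vectors[i], v) for v in others) / len(others)
```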
2. Extending Evolution of Multiple Output Modes
   Encourage mode differences
     Random input sources
   Probabilistic arbitration
     Bad modes less likely to persist
     Like softmax action selection

   Restrict New-Mode Mutation
     New objective: punish unused modes, reward used modes
     Delete similar modes
       Based on behavior metric
     Limit modes: make best use of limited resources
       Dynamically increase the limit?
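The softmax-style arbitration mentioned above can be sketched as follows (the temperature parameter and all names are my assumption):

```python
import math
import random

# Sketch of probabilistic mode arbitration: instead of always taking
# the mode with the highest preference output, sample a mode with
# softmax probabilities, so weak modes are exercised only rarely and
# tend not to persist.

def softmax(prefs, temperature=1.0):
    """Turn preference outputs into a probability distribution."""
    exps = [math.exp(p / temperature) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_mode(prefs, rng, temperature=1.0):
    """Draw a mode index with softmax probabilities."""
    r, acc = rng.random(), 0.0
    for mode, p in enumerate(softmax(prefs, temperature)):
        acc += p
        if r <= acc:
            return mode
    return len(prefs) - 1
```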
3. Heterogeneous Teams Using Subpopulations

   Each team member from different subpopulation
    (Yong 2007)

   Encourages division of labor across teammates
   Different roles lead to multimodal team behavior
4. Open-Ended Evolution + TUG
   Keep increasing goals
     Evolution has something to strive towards
   Preserves benefits of TUG
   Does not settle early
   When to increase goals?
     When all goals are achieved
     As individual goals are achieved
New Tasks
   More tasks require more modes
   Investigate single-agent tasks
     Only teams so far
   Investigate complementary objectives
     TUG only helps contradictory?
   Tasks are hard when combined with others
   Tasks:
    1. Predator
       • Opposite of Flight
       • Partial observability
    2. Sink the Ball
       • Very different from previous
       • Needs more distinct modes?
       • Less mode sharing?
Unreal Tournament 2004
   Commercial First-Person Shooter (FPS)
   Challenging domain
     Continuous   state and action
     Multiobjective
     Partial information
     Multimodal behaviors required

   Programming API: Pogamut
   Competitions:
     Botprize
     Deathmatch
Unreal Deathmatch
   Packaged bots are hand-coded
   Previous winners of Botprize were hand-coded
   Learning attempts
     Simplified version of game (van Hoorn et al. 2009)
     Limited to certain behaviors (Kadlec 2008)

   Multimodal behavior in full game: not done yet
Unreal Teams
   Team Deathmatch
     Largely ignored?
   Capture the Flag
     Teams protect own flag
     Bring enemy flag to base
     GP approach could not beat UT bots (Kadlec 2008)

   Domination
     King of the hill
     Teams defend key locations
     RL approach learned group strategy
      of hand-coded bots (Smith et al. 2007)
Review
   System for developing multimodal behavior
     Multiobjective Evolution
     Targeting Unachieved Goals
     New-Mode Mutation
     Behavioral Diversity
     Extending Mode Mutation
     Subpopulations
     Open-Ended Evolution

   Final evaluation in Unreal Tournament 2004
Conclusion
   Create system:
     Automatically discovers multimodal behavior
     No high-level hierarchy needed
     No low-level behaviors needed
     Works in continuous, noisy environments
     Discovers team behavior as well

   Agents with an array of different useful behaviors
   Leading to better agents/behaviors in
    simulations, games and robotics
Questions?
Auxiliary Slides
    Design (2)
   Behavior Trees (Isla 2005)
     Top-down    approach
     Used first in Halo 2
     Other commercial games since
     “Brute Force Approach to Common Sense”
     Add a behavior for every situation
     Hand coded
Evolutionary (0)
   Dangerous Foraging (Stanley et al. 2003)
     Don’t know if food is safe or poison
     Partial information
     Multimodal:
       Eat food
       Avoid food
   Adaptive Teams of Agents (Bryant and Miikkulainen 2003)
     Roman legions defend
     Homogeneous teams
     Barbarians plunder
     Multimodal:
       Defend town
       Chase barbarian
Other MOEAs
   PESA-II (Corne et al. 2001)
     External archive and
      internal population
     Region-based selection
      using squeeze factor
   SPEA2 (Zitzler and Thiele 1999)
     External archive and
      internal population
     Fitness based on strength
      and density
   [Figure: archive membership illustrated with strength and
    squeeze-factor values]
1. Predator
   Scripted agent is predator
     Chases nearest monster
   Monsters run to avoid damage
   When combined with Flight,
    is partially observable
     Opposite behavior from Flight
   Complementary objectives
     Avoid damage
     Stay alive
2. Sink the Ball
   Monsters push ball around
     No scripted agent
     Monster agent sensors tuned to ball
   Move ball to goal to sink it
   Very different from previous tasks
     More distinct behavioral modes
   Complementary objectives
     Min. distance between ball and goal
     Min. time to sink the ball
Heterogeneous z vs. NSGA-II
Homogeneous z vs. NSGA-II
   [Figure: performance curves]
   The z-scores method was faster

Heterogeneous NSGA-II vs. TUG
Homogeneous NSGA-II vs. TUG
   [Figure: performance curves]

Heterogeneous 1Mode vs. ModeMutation
   [Figure: performance curves]
   1Mode was faster

Homogeneous 1Mode vs. ModeMutation
   [Figure: performance curves]
   1Mode was faster

				