Hybrid Behavior Co-evolution and Structure Learning in Behavior-based Systems

Amir massoud Farahmand (a, b, c) (www.cs.ualberta.ca/~amir)
Majid Nili Ahmadabadi (b, c)
Caro Lucas (b, c)
Babak N. Araabi (b, c)

a) Department of Computing Science, University of Alberta
b) Control and Intelligent Processing Center of Excellence, Department of Electrical and Computer Engineering, University of Tehran
c) School of Cognitive Sciences, IPM
Motivation
• Situated real-world agents face various uncertainties:
  – Unknown environment/body
    • The exact model of the environment/body is not known
  – Non-stationary environment/body
    • Changing environments (offices, houses, streets, and almost everywhere)
    • Aging
    • …
• Designing a robust controller for such an agent is not easy.
Research Specification
• Goal: Automatic design of intelligent agents
• Architecture: Hierarchical behavior-based architectures (a version of the Subsumption architecture)
  – Behavior-based systems:
    • A robust, successful approach for designing situated agents
    • Behavioral decomposition
    • Behaviors: Sensors ---> Actions

[Figure: subsumption layers between sensors and actuators: manipulate the world, build maps, explore, avoid obstacles, locomote]

• Evaluation: An objective performance measure is available (the reinforcement signal)
  – [Agent] Did I perform it correctly?!
  – [Tutor] Yes/No! (or 0.3)
How should we DESIGN a behavior-based system?!
Behavior-based System Design Methodologies
• Hand Design
  – Common almost everywhere.
  – Complicated: may even be infeasible for complex problems.
  – Even if a working system can be found, it is probably not the best solution.
• Evolution
  – Good solutions can be found (+)
  – Biologically plausible (+)
  – Time consuming (-)
  – Not fast at producing new solutions (-)
• Learning
  – Biologically plausible (+)
  – Learning is essential for the life-time survival of the agent. (+)
  – May get stuck in a local minimum (-)
Taxonomy of Design Methods

Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of structure

Hybridization of Evolution and Learning
(this work: structure learning combined with behavior co-evolution)
Problem Formulation
Behaviors

$B_i : S_i \to A_i, \quad i = 1, \dots, n$

$A_i = \bar{A}_i \cup \{\text{No Action}\}$

$S_i = \{\, s_i \mid s_i = M_i(s),\ s \in S \,\}$

$S_i \subseteq S, \qquad A_i \subseteq A$

$M_i : S \to S_i$
Problem Formulation
Purely Parallel Subsumption Architecture (PPSSA)

$T = [\, B_{\mathrm{index}(1)}\ \ B_{\mathrm{index}(2)}\ \cdots\ B_{\mathrm{index}(m)} \,]^{\mathsf T}, \qquad m \le n$

$\mathrm{index}(i) = j$ (indicates that $B_j$ is in the $i$-th layer)

• Different behaviors are excited by their own stimuli.
• Higher behaviors can suppress lower ones.
• The excited behavior that actually drives the agent is the controlling behavior.

Example: $T = [\,\text{Wandering}\ \ \text{BallCollection}\ \ \text{ObstacleAvoidance}\,]^{\mathsf T}$
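As a hedged illustration (not the paper's code), one PPSSA action-selection step might look like the Python sketch below; the `Behavior` interface, the highest-layer-first ordering of `structure`, and the use of `None` for "No Action" are all assumptions of this sketch.

```python
# Minimal sketch of PPSSA action selection; interfaces are assumptions.

class Behavior:
    """A behavior maps its local perception M_i(s) to an action in A_i,
    or to None, which stands for "No Action" (the behavior is not excited)."""
    def __init__(self, name, perceive, policy):
        self.name = name
        self.perceive = perceive   # M_i : S -> S_i
        self.policy = policy       # B_i : S_i -> A_i (None = No Action)

    def act(self, s):
        return self.policy(self.perceive(s))

def controlling_action(structure, s):
    """Scan the layers from highest to lowest (assumed ordering of `structure`);
    the highest excited behavior suppresses the lower ones and controls the agent."""
    for behavior in structure:
        action = behavior.act(s)
        if action is not None:     # excited -> this is the controlling behavior
            return behavior.name, action
    return None, None              # no behavior was excited
```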
Problem Formulation
Reinforcement Signal and the Agent's Value Function

$$R = \frac{1}{N}\sum_{i=1}^{N} r_i$$

$$V_T = E\Big[\frac{1}{N}\sum_{t=1}^{N} r_t \,\Big|\, \text{the agent with structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n)\Big] = E\big[\, R \,\big|\, \text{the agent with structure } T \text{ and set of behaviors } B_i\ (i = 1, \dots, n)\big]$$

• This function states the value of using a set of behaviors in a specific structure.
• We want to maximize the agent's value function.
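Since $V_T$ is an expectation of the average per-step reward, it can be estimated by plain Monte-Carlo averaging. A minimal sketch, assuming a `run_episode` helper (not part of the paper) that runs one episode of the fixed agent and returns its reward sequence:

```python
def estimate_value(run_episode, num_episodes=100):
    """Monte-Carlo sketch of V_T = E[(1/N) sum_t r_t] for a fixed agent
    (structure T, behaviors B_i): average the empirical per-step reward
    over several episodes.  `run_episode` is an assumed callback that
    returns the list of rewards [r_1, ..., r_N] of one episode."""
    values = []
    for _ in range(num_episodes):
        rewards = run_episode()
        values.append(sum(rewards) / len(rewards))
    return sum(values) / num_episodes
```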
Problem Formulation
Design as an Optimization

• Structure Learning: finding the best structure given a set of behaviors, using learning:
  $T^{*} = \arg\max_{T} V_T$
• Behavior Learning: finding the best behaviors given the structure, using learning:
  $B_i^{*} = \arg\max_{B_i} V_T$
• Concurrent Behavior and Structure Learning:
  $(T^{*}, B_i^{*}) = \arg\max_{T,\, B_i} V_T$
• Behavior Evolution: finding the best behaviors given the structure, using evolution:
  $B_i^{*} = \arg\max_{B_i} V_T$
• Behavior Evolution and Structure Learning:
  $(T^{*}, B_i^{*}) = \arg\max_{T,\, B_i} V_T$
Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of structure
Hybridization of Evolution and Learning
Structure Learning

[Figure: the Behavior Toolbox contains build maps, explore, manipulate the world, locomote, and avoid obstacles]

The agent wants to learn how to arrange these behaviors in order to get maximum reward from its environment (or tutor).
Structure Learning

[Figure: "explore" is placed above "avoid obstacles" in the structure]

1. "explore" becomes the controlling behavior and suppresses "avoid obstacles".
2. The agent hits a wall!
Structure Learning

The tutor (environment) gives "explore" a punishment for being in that place of the structure.
Structure Learning

"explore" is not a very good behavior for the highest position of the structure, so it is replaced by "avoid obstacles".
              Structure Learning
                   Challenging Issues
• Representation: How should the agent represent knowledge
  gathered during learning?
   –   Sufficient (Concept space should be covered by Hypothesis space)
   –   Generalization Capability
   –   Tractable (small Hypothesis space)
   –   Well-defined credit assignment
• Hierarchical Credit Assignment: How should the agent assign
  credit to different behaviors and layers in its architecture?
   – If the agent receives a reward/punishment, how should we
     reward/punish the structure of the agent?
• Learning: How should the agent update its knowledge when it
  receives a reinforcement signal?
         Structure Learning
     Overcoming Challenging Issues
• Our approach is to define a representation that allows decomposing the
  agent's value function into simpler components.
• The structure itself can provide us with a lot of clues.
Structure Learning
Zero Order Representation

ZO Value Table in the agent's mind:

              | avoid obstacles | explore | locomote
Higher layer  |       0.8       |   0.7   |   0.4
Lower layer   |       0.6       |   0.9   |   0.4
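A minimal sketch of how such a table might be held and read in code; the dictionary layout and the greedy read-out (which ignores the constraint that each behavior should be placed only once) are assumptions of the illustration, not the paper's implementation:

```python
# The ZO value table from the slide as a plain dictionary: one entry per
# (layer, behavior) pair, V_ZO(i, j).
behaviors = ["avoid obstacles", "explore", "locomote"]
zo_table = {
    ("higher", "avoid obstacles"): 0.8, ("higher", "explore"): 0.7, ("higher", "locomote"): 0.4,
    ("lower",  "avoid obstacles"): 0.6, ("lower",  "explore"): 0.9, ("lower",  "locomote"): 0.4,
}

# Greedy read-out: pick, for each layer, the behavior with the highest ZO value.
for layer in ("higher", "lower"):
    best = max(behaviors, key=lambda b: zo_table[(layer, b)])
    print(f"{layer} layer -> {best} ({zo_table[(layer, best)]})")
# higher layer -> avoid obstacles (0.8)
# lower layer -> explore (0.9)
```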
Structure Learning
Zero Order Representation - Value Function Decomposition

$$
\begin{aligned}
V_T &= E[R] = E\Big[\tfrac{1}{N}\textstyle\sum_{t=1}^{N} r_t\Big] \\
&= E\Big[\tfrac{1}{N}\textstyle\sum_{t=1}^{N} r_t \,\Big|\, \text{``}L_1\text{ is controlling''} \vee \text{``}L_2\text{ is controlling''} \vee \dots \vee \text{``}L_m\text{ is controlling''}\Big] \\
&= E\Big[\tfrac{1}{N}\textstyle\sum_{t=1}^{N} r_t \,\Big|\, L_1 \text{ is controlling}\Big]\, P(L_1 \text{ is controlling}) \\
&\quad + E\Big[\tfrac{1}{N}\textstyle\sum_{t=1}^{N} r_t \,\Big|\, L_2 \text{ is controlling}\Big]\, P(L_2 \text{ is controlling}) \\
&\quad + \dots + E\Big[\tfrac{1}{N}\textstyle\sum_{t=1}^{N} r_t \,\Big|\, L_m \text{ is controlling}\Big]\, P(L_m \text{ is controlling})
\end{aligned}
$$

(The layers' "is controlling" events are mutually exclusive and exhaustive, so the expectation splits over them.)
Structure Learning
Zero Order Representation - Value Function Decomposition

ZO components:
$$V_{ZO}(i,j) = V_{ij} = E\Big[\tfrac{1}{N}\textstyle\sum_t r_t \,\Big|\, B_j \text{ is the controlling behavior in the } i\text{-th layer}\Big]$$

Layer's value:
$$E\Big[\tfrac{1}{N}\textstyle\sum_t r_t \,\Big|\, L_i \text{ is controlling}\Big] = \sum_{j=1}^{n} P(B_j \mid L_i)\, E\Big[\tfrac{1}{N}\textstyle\sum_t r_t \,\Big|\, B_j \text{ is the controlling behavior in } L_i\Big] = \sum_{j=1}^{n} P(B_j \mid L_i)\, V_{ij}, \qquad i = 1, \dots, m$$

Agent's value function:
$$V_T = \sum_{i=1}^{m}\sum_{j=1}^{n} P(B_j \mid L_i)\, V_{ij}\, P(L_i \text{ is controlling})$$
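As a hedged numerical example (the probabilities below are invented for illustration; the $V_{ij}$ are the values from the ZO table shown earlier): take $m = 2$ layers with $P(L_1 \text{ is controlling}) = 0.6$, $P(L_2 \text{ is controlling}) = 0.4$, and occupation probabilities $P(B_j \mid L_1) = (0.5, 0.3, 0.2)$, $P(B_j \mid L_2) = (0.2, 0.6, 0.2)$ over (avoid obstacles, explore, locomote). The layer values are $0.5 \cdot 0.8 + 0.3 \cdot 0.7 + 0.2 \cdot 0.4 = 0.69$ and $0.2 \cdot 0.6 + 0.6 \cdot 0.9 + 0.2 \cdot 0.4 = 0.74$, so $V_T = 0.6 \cdot 0.69 + 0.4 \cdot 0.74 = 0.71$.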
Structure Learning
Zero Order Representation - Value Function Decomposition

$$T^{*} = \arg\max_{T} V_T = \arg\max_{T} \sum_{i=1}^{m}\sum_{j=1}^{n} P(B_j \mid L_i)\, V_{ij}\, P(L_i \text{ is controlling})$$
Structure Learning
Zero Order Representation - Credit Assignment and Value Updating

• The controlling behavior is the only behavior responsible for the current reinforcement signal.

$$\tilde V_{ij} = P(B_j \mid L_i)\, V_{ij}\, P(L_i \text{ is controlling})$$

$$\tilde V_{n+1,\,ij} = (1 - \alpha_{n,ij})\, \tilde V_{n,ij} + \alpha_{n,ij}\, \mathbf{1}\big[\, B_j \text{ is active at time step } n \,\wedge\, L_i \text{ is controlling at time step } n \,\big]\, r_n$$
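In code, one plausible reading of this update rule (the function name, arguments, and the choice to leave non-visited cells unchanged are assumptions of this sketch, not the paper's API):

```python
def zo_update(zo_table, alpha, layer, behavior, reward, active, controlling):
    """Sketch of the ZO value update above: the table cell (i, j) is moved
    toward the reinforcement signal r_n only when behavior B_j was active at
    step n AND layer L_i was the controlling layer; other cells stay as-is."""
    if active and controlling:
        old = zo_table[(layer, behavior)]
        zo_table[(layer, behavior)] = (1.0 - alpha) * old + alpha * reward
```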
Behavior-based System Design
– Learning
  • Structure (hierarchy) learning
  • Behavior learning
– Evolution
  • Co-evolution of behaviors
  • Evolution of structure
Hybridization of Evolution and Learning
Behavior Co-evolution
Motivations

(+)
• Learning can get trapped in local maxima of the objective function
• Evolutionary methods have a better chance of finding the global maximum of the objective function
• Learning is sensitive (POMDP, non-Markov, …)
• The objective function may not be well-defined in robotics

(-)
• Evolutionary robotics methods are usually slow
  – Fast changes of the environment
• Non-modular controllers
  – Monolithic
  – No reusability
Behavior Co-evolution
Ideas

• Use evolution to search the difficult and big part of the parameter space
  – The behaviors' parameter space is usually the bigger one
• Use learning for fast responses
  – The structure's parameter space is usually the smaller one
  – A change in the structure results in different agent behavior
• Evolve behaviors separately
  – Modularity
  – Re-usability
Behavior Co-evolution

[Figure: Behavior Pool 1 … Behavior Pool n, all feeding into the Agent]

We have different behavior (genetic) pools.
Behavior Co-evolution

One behavior is selected randomly from each pool; we want to assess its fitness.
Behavior Co-evolution

The agent interacts with the environment using an architecture built from the selected behaviors …
Behavior Co-evolution

… and tries to maximize its reward.
Behavior Co-evolution

Based on the performance of the agent, a fitness is assigned to it.
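Putting the five steps above together, a hedged sketch of one evaluation round; `build_agent`, `evaluate`, and the `fitness_samples` record are assumed helpers, not the paper's implementation:

```python
import random

def coevolution_step(pools, build_agent, evaluate):
    """One evaluation round of behavior co-evolution, as on the slides:
    pick one behavior per pool at random, assemble the agent in the fixed
    architecture, let it interact with the environment, and record the
    resulting fitness for every selected behavior."""
    team = [random.choice(pool) for pool in pools]  # one behavior per pool
    agent = build_agent(team)                       # selected behaviors in the architecture
    fitness = evaluate(agent)                       # agent interacts; average reward measured
    for behavior in team:                           # uniform fitness sharing (next slide)
        behavior.fitness_samples.append(fitness)
    return fitness
```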
Behavior Co-evolution
Fitness Sharing

• We can evaluate the fitness of the agent after its interaction with the environment.
• How can we assess the fitness of each behavior based on the fitness of the agent? (Remember that we have separate behavior pools.)
• We approximate it!

$$V_B(\text{last } K \text{ episodes}) = f(\{B\}) = E\Big[\tfrac{1}{K}\textstyle\sum_{t \in \text{last } K \text{ episodes}} r_t \,\Big|\, \text{the agent with } B,\ t \in \text{last } K \text{ episodes}\Big]$$

$$f(B_i^j) = \frac{1}{N_{B_i}} \sum V_{B_i}(\text{last } K \text{ episodes}) \qquad \text{(Fitness; the sum runs over the } N_{B_i} \text{ agents in which } B_i^j \text{ participated)}$$
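A matching sketch of the per-behavior aggregation, under the same assumptions as the evaluation loop above:

```python
def behavior_fitness(behavior):
    """f(B_i^j): average the agent-level values over the teams that behavior
    B_i^j participated in (the uniform sharing of the slide); `fitness_samples`
    is the assumed record kept by coevolution_step above."""
    samples = behavior.fitness_samples
    return sum(samples) / len(samples) if samples else 0.0
```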
Behavior Co-evolution

Each behavior's genetic pool has conventional evolutionary operators/phenomena:
– Selection
– Genetic Operators
  • Crossover: $B_i^{j_{\text{new}}} = X\big(B_i^{j_{\text{old}}},\, B_i^{k_{\text{old}}}\big)$
  • Mutation
    – Hard: replacement
    – Soft: perturbation
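As a hedged sketch of these operators over an assumed real-valued genome encoding (the encoding, gene range, and parameter values are illustrative, not the paper's):

```python
import random

def crossover(parent_a, parent_b):
    """B_i^{j_new} = X(B_i^{j_old}, B_i^{k_old}): one-point crossover over
    two genomes of equal length >= 2 (an assumed list-of-floats encoding)."""
    cut = random.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(genome, p=0.05, sigma=0.1):
    """Hard mutation replaces a gene outright; soft mutation perturbs it."""
    child = list(genome)
    for k in range(len(child)):
        if random.random() < p:
            if random.random() < 0.5:
                child[k] = random.uniform(-1.0, 1.0)   # hard: replacement
            else:
                child[k] += random.gauss(0.0, sigma)   # soft: perturbation
    return child
```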
Multi-Robot Object Lifting Problem

• Three robots want to lift an object using their own local sensors
  – No central control
  – No communication
  – Local sensors
• Objectives
  – Reaching a prescribed height
  – Keeping the tilt angle small

[Figure: a group of robots lifts a bulky object.]
Multi-Robot Object Lifting Problem

[Simulation pictures omitted.]
Conclusion
• Hybridization of evolution and learning
• Evolution and learning search different subspaces of the solution space
• Results competitive with human designs
Important Questions
• Is it possible to benefit from the information gathered during learning?
  – Each agent learns an approximately good structure arrangement; however, we do not use it at all!
• Is there any other way of sharing the fitness of the agent between behaviors?
  – Currently, we share it uniformly among all behaviors.
It seems that the answer to these questions is positive!
           Future Research
• Can we decompose other problems (not just
  hierarchical behavior-based systems)
  similarly?!
  – Learning and evolution
  – Fast and Deep
  – Different subspaces of the solution space
• Other ways of fitness sharing
  – Low bias
  – Low variance
Questions?!

				