      Analyzing Complex Strategic Interactions in
          Multi-Agent Systems

        Walsh, Das, Tesauro, Kephart
 in Proceedings of the Workshop on Game Theoretic and
               Decision Theoretic Agents
                       (AAAI 2002)

           Presented by Deniz Sarioz
     in Simon Parsons' e-commerce seminar,
      CUNY Grad Center, April 19th 2005.
                      Premise
• Understanding interactions among various strategies can
  be valuable both to designers of markets (wishing to
  ensure economic efficiency and stability) and to designers
  of individual agents (wishing to 'maximize' profits).
• By “demystifying” strategic interactions among agents, we can
  improve our ability to predict (and design) the overall
  behavior of multi-agent systems.
• The question of which strategy is “best” is often not the most
  appropriate one: a mix of strategies can be an equilibrium.
• The tournament approach has the shortcoming of exploring only
  one trajectory through an infinite space of possible
  interactions.
• Authors present a more principled and complete method
  for analyzing interactions among strategies.
            Modeling Approach
• Start with a game that may include complex, repeated
  interactions between A agents.
• The rules specify particular actions that agents may take as
  a function of the game state (e.g., “bid b at time t.”)
• Each of the agents has a choice among S exogenously
  specified, heuristic strategies.
• Strategies are “heuristic” in that they are generally not the
  solution of a Bayes-Nash equilibrium analysis.
• Compute a heuristic-payoff table that specifies the
  expected payoff to each agent as a function of the
  strategies played by all agents.
• Agent strategies are selected independently from agent
  type (an assumption to help with tractability).
             Modeling Approach
• The heuristic-payoff table is an abstract representation of
  the fundamental game that reduces a potentially very
  complex game to a one-shot game in “normal form”.
• Treat the choice of heuristic strategies rather than basic
  actions as the level of decision making for strategic
  analysis.
• Standard payoff table for a normal-form game requires $S^A$
  entries, which can be huge even for moderate S, A.
   – e.g., 3 agents X Y Z and 4 strategies S T U V: need cells in the
     table for (XS, YS, ZS), (XS, YS, ZT), (XS, YS, ZU), ...
• Because all agents share the same distribution of types, we can
  express the payoff of each strategy as a function of the number of
  agents playing each strategy, e.g., (XS,YS,ZT) is equivalent to (XT,YS,ZS).
            Modeling Approach
• What does this combinatorial reduction buy us?
• Number of entries in the table is the number of unique
  ways A agents can be partitioned into S strategies.
         $\binom{A+S-1}{A} = \frac{(A+S-1)(A+S-2) \cdots (A+1)}{(S-1)!}$
which is bounded below by, and is approximately equal to (for A >> S):
                            $\frac{A^{S-1}}{(S-1)!}$
• This is fine for situations where even a lot of agents get to
  pick from / switch among a few strategies.
• For A=20 and S=3, the symmetric payoff table contains
  231 entries rather than $3^{20} \approx 3.5 \times 10^9$ in the asymmetric one.
• For sufficiently simple games, payoffs may be computed
  analytically. Complexity usually demands simulations.
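The table sizes quoted on this slide can be checked with a few lines of Python. This is only a sketch: the function names are illustrative and do not come from the paper.

```python
from itertools import combinations_with_replacement
from math import comb


def symmetric_table_size(num_agents, num_strategies):
    """Number of ways to split num_agents among num_strategies (counts only)."""
    return comb(num_agents + num_strategies - 1, num_agents)


def asymmetric_table_size(num_agents, num_strategies):
    """Number of cells when every agent's identity matters: S^A."""
    return num_strategies ** num_agents


def strategy_counts(num_agents, num_strategies):
    """Yield each symmetric table entry as a tuple (n_1, ..., n_S) summing to A."""
    for combo in combinations_with_replacement(range(num_strategies), num_agents):
        yield tuple(combo.count(j) for j in range(num_strategies))


print(symmetric_table_size(20, 3))          # 231
print(asymmetric_table_size(20, 3))         # 3486784401
print(len(list(strategy_counts(20, 3))))    # 231
```

The lower bound A^(S-1)/(S-1)! evaluates to 200 for A=20, S=3, which is indeed below 231.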
            Modeling Approach
• Once a payoff table is computed, a variety of
  techniques can be used. 3 such techniques in this paper:
• Static analysis: entails computing Nash equilibria of the
  payoff table.
• Model the dynamics of agents that switch to strategies that
  appear more successful.
• Suggest techniques for understanding strategies at a deeper
  level. Specifically: perturbation analysis
       Equilibrium Computation
• At the start of the game, each of the A agents chooses one
  of the S strategies
• The payoff to agent i is a real function u of the strategies
  played by all agents.
   – Payoff is the expected reward
• We assume symmetric strategy sets and payoffs, so payoff
  to an agent can be represented as the payoff to each
  strategy as a function of the number of agents playing each
  strategy.
        Equilibrium Computation
• Agent i may choose its strategies randomly according to a
  mixed strategy:
                  $\hat{p}_i = (\hat{p}_{i,1}, \hat{p}_{i,2}, \ldots, \hat{p}_{i,S})$
where $\hat{p}_{i,j}$ is the probability of agent i playing strategy j.
• The (AS-dim) vector of all agents' mixed strategies is $\hat{p}$
  and the ((AS-S)-dim) vector of mixed strategies for all
  agents except i is $\hat{p}_{-i}$.
• $\hat{p}_i = e^j$ denotes the special case of $\hat{p}_{i,j} = 1$.
• $u(e^j, \hat{p}_{-i})$ denotes the payoff to agent i for playing pure
  strategy j while the others play $\hat{p}_{-i}$.
• For mixed $\hat{p}_i$ we have $u(\hat{p}_i, \hat{p}_{-i}) = \sum_{j=1}^{S} u(e^j, \hat{p}_{-i}) \, \hat{p}_{i,j}$
         Equilibrium Computation
• In game theoretic analysis, usually assume everyone plays
  mixed Nash equilibrium strategies, i.e., no one can receive a
  higher payoff by unilaterally deviating to another strategy.
  Formally, $\hat{p}^*$ is a Nash equilibrium iff for every agent i and
  every mixed strategy $\hat{p}_i$:
                 $u(\hat{p}_i, \hat{p}^*_{-i}) \le u(\hat{p}^*_i, \hat{p}^*_{-i})$
• Rest of the paper: specifically focus on symmetric mixed
  strategy equilibria, that is, for all agents i, k: $\hat{p}^*_i = \hat{p}^*_k = p^*$
• Denote an arbitrary (not necessarily equilibrium) symmetric
  mixed strategy by p and the probability that a given agent
  plays pure strategy j by $p_j$.
• Symmetry assumption simplifies the analysis from
  A S-tuples of probabilities to only one such S-tuple.
   – Justification: a symmetric game (to be defined later) always has
     at least one (usually non-unique) symmetric Nash equilibrium.
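Under the symmetry assumption, the payoff to a pure strategy against a symmetric mixed strategy p can be read off the heuristic-payoff table by weighting each table entry with the multinomial probability of the other A-1 agents' strategy counts. The sketch below assumes a hypothetical table layout (a dict keyed by count tuples, holding one payoff per strategy) and a made-up two-agent, two-strategy table; neither is taken from the paper.

```python
from itertools import combinations_with_replacement
from math import factorial, prod


def count_profiles(num_agents, num_strategies):
    """Yield every tuple (n_1, ..., n_S) of strategy counts summing to num_agents."""
    for combo in combinations_with_replacement(range(num_strategies), num_agents):
        yield tuple(combo.count(j) for j in range(num_strategies))


def multinomial_prob(counts, p):
    """Probability that independent draws from p produce exactly these counts."""
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(pj ** c for pj, c in zip(p, counts))


def payoff_pure(j, p, table, num_agents):
    """u(e^j, p): expected payoff for playing j while the other A-1 agents play p."""
    total = 0.0
    for others in count_profiles(num_agents - 1, len(p)):
        counts = list(others)
        counts[j] += 1  # add the agent that plays pure strategy j
        total += multinomial_prob(others, p) * table[tuple(counts)][j]
    return total


def payoff_mixed(p, table, num_agents):
    """u(p, p) = sum_j p_j * u(e^j, p)."""
    return sum(pj * payoff_pure(j, p, table, num_agents) for j, pj in enumerate(p))


# Hypothetical table for A=2, S=2: counts -> (payoff to strategy 0, payoff to strategy 1).
# Payoffs for strategies that nobody plays in an entry are unused placeholders.
TOY_TABLE = {
    (2, 0): (0.0, 0.0),
    (1, 1): (3.0, 1.0),
    (0, 2): (0.0, 2.0),
}

p = [0.5, 0.5]
print(payoff_pure(0, p, TOY_TABLE, 2), payoff_pure(1, p, TOY_TABLE, 2))  # 1.5 1.5
```

In this toy table both pure strategies earn the same payoff against p = (0.5, 0.5), so no unilateral deviation helps and p is a symmetric mixed equilibrium.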
         Equilibrium Computation
• There are equivalent formulations of the concept of Nash
  equilibrium, but most do not exploit symmetry, which
  severely restricts the size of problem that can be solved.
• Formulate Nash equilibrium as the minimum of a function on
  a polytope.
• Considering only symmetric equilibria, the problem is to find
  the p that minimizes v [p* is a Nash equilibrium iff it is a global minimum of v]:
            $v(p) = \sum_{j=1}^{S} \left( \max[\, u(e^j, p) - u(p, p),\ 0 \,] \right)^2$
     restatement of v(p) in pseudo-C by way of “derivation”:
      sum over j = 1..S of [ (u(e^j, p) > u(p, p)) ? (u(e^j, p) - u(p, p))^2 : 0 ]
• The polytope is just the (S-1)-simplex in S dimensions defined
  by the constraints $\sum_{j=1}^{S} p_j = 1$ and $p_j \ge 0$ for all j.
• e.g., if we have 3 strategies, look for equilibria on a triangle
  (edges and vertices included), which is a bounded subset of 2 dimensional space.
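The objective v(p) is short enough to write out directly. The sketch below assumes a hypothetical two-player symmetric game given by a payoff matrix, so that u(e^j, p) is just a matrix-vector product; with A > 2 agents, u(e^j, p) would instead be computed from the heuristic-payoff table.

```python
# Hypothetical payoffs: PAYOFF[j][k] is the payoff for playing j against an opponent playing k.
PAYOFF = [[0.0, 3.0],
          [1.0, 2.0]]


def pure_payoffs(p):
    """u(e^j, p) for every pure strategy j when the opponent plays mixed strategy p."""
    return [sum(m * q for m, q in zip(row, p)) for row in PAYOFF]


def v(p):
    """Sum of squared positive gains from deviating to a pure strategy; zero at equilibrium."""
    u_pure = pure_payoffs(p)
    u_avg = sum(pj * uj for pj, uj in zip(p, u_pure))  # u(p, p)
    return sum(max(uj - u_avg, 0.0) ** 2 for uj in u_pure)


print(v([0.5, 0.5]))  # 0.0: no pure strategy beats the mixture
print(v([1.0, 0.0]))  # 1.0: switching to strategy 1 gains 1
```

Because v is a sum of squares it is never negative, so any p with v(p) = 0 is a global minimum.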
         Equilibrium Computation
• Used a software package called amoeba, a non-linear
  optimizer, to find the minimum of the stated function on the
  (S-1)-simplex.
• The function is evaluated at each vertex of amoeba's search
  polytope, and the polytope attempts to move down the estimated
  gradient by a series of geometric transformations that strive to
  replace the worst-performing vertex.
• The authors ran amoeba repeatedly, restarting at random points
  on the simplex and stopping once 30 previously discovered
  equilibria had been found in a row.
• For A=20, S=3, this took 10 minutes on a 450MHz machine.
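The restart-and-stop rule can be sketched as follows. Note the assumptions: a crude pattern search that shifts probability mass between pairs of strategies stands in for amoeba, the game is a hypothetical two-strategy payoff matrix, and all names and tolerances are illustrative rather than the authors' settings.

```python
import random

# Hypothetical payoffs: PAYOFF[j][k] is the payoff for playing j against an opponent playing k.
PAYOFF = [[0.0, 3.0],
          [1.0, 2.0]]


def v(p):
    """Objective from the slides: squared positive deviation gains, zero at equilibrium."""
    u_pure = [sum(m * q for m, q in zip(row, p)) for row in PAYOFF]
    u_avg = sum(pj * uj for pj, uj in zip(p, u_pure))
    return sum(max(uj - u_avg, 0.0) ** 2 for uj in u_pure)


def local_search(p, step=0.25, min_step=1e-10):
    """Stand-in for amoeba: move mass from strategy j to i whenever that lowers v."""
    p, best = list(p), v(p)
    while step > min_step:
        improved = False
        for i in range(len(p)):
            for j in range(len(p)):
                d = min(step, p[j])  # never push a probability below zero
                if i == j or d <= 0.0:
                    continue
                q = list(p)
                q[i] += d
                q[j] -= d
                value = v(q)
                if value < best:
                    p, best, improved = q, value, True
        if not improved:
            step /= 2.0  # refine the search once no move helps
    return p, best


def find_equilibria(num_strategies, patience=30, tol=1e-6, max_restarts=10000, seed=0):
    """Restart from random simplex points; stop after `patience` rediscoveries in a row."""
    rng = random.Random(seed)
    found, repeats = [], 0
    for _ in range(max_restarts):
        if repeats >= patience:
            break
        raw = [1.0 - rng.random() for _ in range(num_strategies)]  # values in (0, 1]
        total = sum(raw)
        p, value = local_search([x / total for x in raw])
        if value > 1e-12:
            continue  # a local minimum with v > 0 is not an equilibrium
        if any(max(abs(a - b) for a, b in zip(p, q)) < tol for q in found):
            repeats += 1
        else:
            found.append(p)
            repeats = 0
    return found


equilibria = find_equilibria(2)
print(len(equilibria), round(equilibria[0][0], 6))  # 1 0.5
```

The `patience` counter resets whenever a new equilibrium turns up, so the search only stops after a long unbroken run of rediscoveries.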
               Dynamic Analysis
• Nash equilibria provide a theoretically satisfying view of the
  ideal static properties of a multi-agent system.
• Dynamic properties may be of equal or greater concern.
• In actual systems, may be unreasonable to assume that agents
  all have correct and common knowledge necessary to compute
  equilibria.
• Borrow a well-developed model from evolutionary game
  theory (Weibull 1995) to analyze strategy choice dynamics.
• Posit a very large population of N agents, and A agents
  (A<<N) are randomly chosen to play at each 'tick'. Each agent
  plays one of the S pure strategies and the fraction of agents
  playing strategy j is $p_j$.
• “For sufficiently large N, $p_j$ may be treated as a continuous variable.”
                  Dynamic Analysis
• Use the replicator dynamics formalism to model the evolution
  of p with time as:
                   $\dot{p}_j = [\, u(e^j, p) - u(p, p) \,] \, p_j$
where u(p, p) is the population average payoff,
$u(e^j, p)$ is the average payoff to agents currently using
  pure strategy j,
and $\dot{p}_j$ is the time derivative of $p_j$, i.e., the rate at which the
  jth component of p changes.
• This equation models the tendency of strategies with greater
  than average payoff to attract followers and those with less than
  average payoff to suffer defections.
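A trajectory of the replicator equation can be approximated with simple Euler steps. The sketch below assumes a hypothetical two-strategy payoff matrix and an arbitrary step size; the paper's payoffs come from the heuristic-payoff table instead.

```python
# Hypothetical payoffs: PAYOFF[j][k] is the payoff for playing j against an opponent playing k.
PAYOFF = [[0.0, 3.0],
          [1.0, 2.0]]


def replicator_step(p, dt):
    """One Euler step of dp_j/dt = [u(e^j, p) - u(p, p)] * p_j."""
    u_pure = [sum(m * q for m, q in zip(row, p)) for row in PAYOFF]
    u_avg = sum(pj * uj for pj, uj in zip(p, u_pure))
    return [pj + dt * (uj - u_avg) * pj for pj, uj in zip(p, u_pure)]


def simulate(p, dt=0.01, steps=5000):
    """Follow the population mix p for steps * dt units of time."""
    for _ in range(steps):
        p = replicator_step(p, dt)
    return p


final = simulate([0.1, 0.9])
print([round(x, 6) for x in final])  # [0.5, 0.5]
```

Strategy 0 starts out earning more than the population average, so its share grows until both strategies earn the same payoff at p = (0.5, 0.5).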
                Dynamic Analysis
• Prefer that a dynamic model assume minimal informational
  requirements for agents beyond their own actions and payoffs.
• The replicator dynamics eq implies that agents know u(p, p)
  and that is a very implausible assumption.
• However, can obtain the same population dynamics with a
  “replication by imitation” model (Weibull 1995):
   – An agent switches to the strategy of a randomly chosen opponent
     who appears to be receiving a higher payoff.
   – Interpret p at any given time as representing a symmetric mixed
     strategy for all N players in the game. Then the fixed points of
     the replicator equation (where $\dot{p}_j = 0$ for all j) correspond to Nash equilibria.
   – When strategy trajectories governed by the replicator equation converge to an
     equilibrium, the equilibrium is an attractor.
   – However, trajectories do not necessarily converge.
                Dynamic Analysis
• Two ways of thinking about this:
    – When multiple Nash equilibria exist, those that are attractors
      are the only plausible equilibria within the evolutionary model.
      Those with larger basins of attraction are more likely, assuming
      that every initial population state is equally likely.
    – We can use the basins of attraction to understand which initial
      population mixes will lead to which equilibrium.
• Use the heuristic payoff table and the equation for $\dot{p}$ to generate a large
  number of strategy trajectories, starting from a broad
  distribution of initial strategy vectors p.
• For a choice of 3 strategies, the resulting flows can be plotted
  in a 2-D unit simplex and have an immediate visual
  interpretation.
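Basin sizes can be estimated by launching trajectories from a grid of initial mixes and tallying where each one ends up. The sketch below assumes a hypothetical two-strategy coordination game with two attractors; the payoff matrix, step size and grid are illustrative only.

```python
# Hypothetical coordination game: PAYOFF[j][k] is the payoff for playing j against k.
PAYOFF = [[2.0, 0.0],
          [0.0, 1.0]]


def replicator_step(p, dt):
    """One Euler step of dp_j/dt = [u(e^j, p) - u(p, p)] * p_j."""
    u_pure = [sum(m * q for m, q in zip(row, p)) for row in PAYOFF]
    u_avg = sum(pj * uj for pj, uj in zip(p, u_pure))
    return [pj + dt * (uj - u_avg) * pj for pj, uj in zip(p, u_pure)]


def attractor_reached(p, dt=0.05, steps=2000):
    """Integrate the flow and report the strategy that ends up dominating."""
    for _ in range(steps):
        p = replicator_step(p, dt)
    return max(range(len(p)), key=lambda j: p[j])


def basin_counts(grid=100):
    """Count how many evenly spaced initial mixes flow to each attractor."""
    counts = {}
    for k in range(1, grid):
        start = [k / grid, 1.0 - k / grid]
        winner = attractor_reached(start)
        counts[winner] = counts.get(winner, 0) + 1
    return counts


counts = basin_counts()
print(counts[0], counts[1])  # 66 33
```

In this toy game the unstable mixed equilibrium at p_0 = 1/3 separates the two basins, so roughly two thirds of the initial mixes flow to "everyone plays strategy 0".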
         Perturbation of Payoffs
• We are only considering very few strategies, whereas in
  fact the strategy space is infinite and strategies can in
  practice be slightly modified.
• Perturb the payoff table in some meaningful ways, in
  order to perform some directed study of plausible
  effects of abstract changes in strategy behavior.
• Key basic idea: the n+ agents of strategy + steal some
  fraction α of the payoff of the n- agents of strategy - in a way
  that preserves the total:
            q(+) += α min(n+ , n- ) q(+) / n+
            q(-) -= α min(n+ , n- ) q(-) / n-
• Alternative changes: pick on just one strategy, uniform
  improvements to all strategies, random, etc.
                  Applications
• Applied “the methodology” to:
   – Automated Dynamic Pricing (ADP)
   – Continuous Double Auction (CDA)
• Chosen because computationally intractable to compute
  equilibria in these.
• Body of literature exists which includes interesting
  heuristic strategies.
• Simulators are available for computing the heuristic
  payoff tables.
       Automated Dynamic Pricing
             (description)
• Recent emergence of “shopbots”: e.g., buy.com
  monitors and undercuts competitors' prices
• In this experiment, sellers choose one of three heuristic
  dynamic pricing strategies
   – GT: “game theory”: (Greenwald & Kephart 1999) plays a
     mixed-strategy Nash equilibrium computed for the underlying
     game assuming all pricing and purchasing decisions are made
     simultaneously
   – DF: “derivative follower”: simple hill-climbing adaptation:
     solely based on observed profitability, ignoring assumptions
     about other buyers and sellers.
   – NIR: “No Internal Regret”: (Greenwald & Kephart 1999)
     adapted from Foster & Vohra (1997).
       Automated Dynamic Pricing
               (analysis)
• Looked at two scenarios for these 3 strategies, with 5
  agents and with 20 agents.




• Only A is a pure-strategy Nash equilibrium
• When the number of agents is increased to 20, the number of
  equilibria drops to one!
Automated Dynamic Pricing
        (analysis)
• Since DF was never seen as an attractor, the authors
applied a perturbation analysis to compare the
other two, using the perturbation method described earlier.


• Result: NIR would start becoming a strong
strategy with a 6.75% improvement, and
nearly dominant with 10% improvement.

								