Game Theory, Machine Learning and Reasoning under Uncertainty


This workshop explores the benefits that may result from carrying out research at the in-
terface between machine learning and game theory. While classical game theory makes
limited provision for dealing with uncertainty and noise, research in machine learning, and
particularly probabilistic inference, has resulted in a remarkable array of powerful algo-
rithms for performing statistical inference from difficult real-world data.
Recent research at this interface has suggested computationally tractable algorithms for
analysing games that consist of a large number of players, whilst insights from game
theory have also inspired new work on strategic learning behaviour in probabilistic infer-
ence and are suggesting new algorithms for intelligent sampling in Markov chain Monte
Carlo methods.
The goal of this workshop is to explore the significant advantages that game theory and ma-
chine learning seem to offer to each other, to explore the correspondences and differences
between these two fields and to identify interesting and exciting areas of future work.


Morning Session
 07:30    Introduction and Aims, Iead Rezek, University of Oxford
 07:35    Invited Talk: Machine Learning: Principles, Probabilities and Perspectives,
              Stephen J. Roberts, Oxford University
 08:05    Invited Talk: Learning Topics in Game-Theoretic Decision Making,
              Michael Littman, Rutgers
 08:35    Invited Talk: Predictive Game Theory,
              David Wolpert, NASA Ames Research Center
 09:05    Break
 09:15    Model-based Reinforcement Learning for Partially Observable Games
              with Sampling-based State Estimation,
              Hajime Fujita, Nara Institute of Science and Technology
 09:40    Effective negotiation proposals using models of preference and risk behavior,
              Angelo Restificar, Oregon State
 10:05    Mechanism Design via Machine Learning,
              Yishay Mansour, Tel-Aviv University
 10:30    Ski Break
Afternoon Session
 16:00   N-Body Games,
             Albert Xin Jiang, University of British Columbia
 16:25   Probabilistic inference for computing optimal policies in MDPs,
             Marc Toussaint, University of Edinburgh
 16:50   Graphical Models, Evolutionary Game Theory, and the Power of Randomization,
             Siddharth Suri, University of Pennsylvania
 17:15   Probability Collectives for Adaptive Distributed Control,
             David H. Wolpert, NASA Ames Research Center
 17:40   Break
 17:50   Probability Collectives: Examples and Applications,
             Dev Rajnarayan, Stanford University
 18:15   A Formalization of Game Balance Principles,
             Jeff Long, University of Saskatchewan
 18:40   A stochastic optimal control formulation of distributed decision making,
             Bert Kappen, Radboud University, Nijmegen
 19:05   Discussion, Wrapping Up
 19:30   End

I. Rezek
University of Oxford, Oxford, UK.∼irezek
A. Rogers
University of Southampton, Southampton, UK.∼acr
David Wolpert
NASA Ames Research Center, California, USA.

Learning Topics in Game-Theoretic Decision Making, M. LITTMAN,
Rutgers, NJ
This presentation will review some topics of recent interest in AI and economics concern-
ing decision making in a computational game-theory framework. It will highlight areas in
which machine learning has played a role and could play a greater role in the future. Cov-
ered areas include recent representational and algorithmic advances, stochastic games and
reinforcement learning, no-regret algorithms, and the role of various equilibrium concepts.

Machine Learning: Principles, Probabilities and Perspectives, S.J.
ROBERTS, University of Oxford, UK
This talk will offer an overview of some of the key principles in machine learning. It will
discuss how uncertainty is involved, from data to models; how learning may be defined and
how we may evaluate the value of information. Based on simple principles, strategies may
be seen in the light of maximizing expected information. The differences (and similarities)
between machine learning and game theory will be considered.

Predictive Game Theory, DAVID H. WOLPERT, NASA Ames Research
Abstract: Conventional noncooperative game theory hypothesizes that the joint strategy of
a set of reasoning players in a game will necessarily satisfy an "equilibrium concept". All
other joint strategies are considered impossible. Under this hypothesis the only issue is
what equilibrium concept is "correct".
This hypothesis violates the first-principles arguments underlying probability theory. In-
deed, probability theory renders moot the controversy over what equilibrium concept is cor-
rect: every joint strategy can arise with non-zero probability. Rather than a first-principles
derivation of an equilibrium concept, game theory requires a first-principles derivation of
a distribution over joint (mixed) strategies. If you wish to distill such a distribution down
to the prediction of a single joint strategy, that prediction should be set by decision theory,
using your (!) loss function. Accordingly, for any fixed game, the predicted joint strategy,
one's "equilibrium concept", will vary with the loss function of the external scientist mak-
ing the prediction. Game theory based on such considerations is called Predictive Game
Theory (PGT).
This talk shows how information theory can provide such a distribution over joint strategies.
The connection of this distribution to the quantal response equilibrium is elaborated. It is
also shown that in many games, having a probability distribution with support restricted to
Nash equilibria - as stipulated by conventional game theory - is impossible.
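
The kind of smoothed prediction described here can be illustrated with a logit quantal
response equilibrium, in which every pure strategy profile keeps non-zero probability.
The following minimal sketch (the 2x2 payoffs, the rationality parameter beta, and the
fixed-point iteration are illustrative assumptions, not material from the talk) computes
such a distribution for a Prisoner's-Dilemma-like game:

```python
import math

# Row player's payoffs A[i][j] and column player's payoffs B[i][j]
# for actions i, j in {0 = cooperate, 1 = defect} (assumed example values).
A = [[3.0, 0.0], [5.0, 1.0]]
B = [[3.0, 5.0], [0.0, 1.0]]
beta = 2.0                     # rationality parameter: 0 = random play

def logit(utilities, beta):
    # Soft best response: probabilities proportional to exp(beta * utility).
    z = [math.exp(beta * u) for u in utilities]
    s = sum(z)
    return [v / s for v in z]

p = [0.5, 0.5]                 # row player's mixed strategy
q = [0.5, 0.5]                 # column player's mixed strategy
for _ in range(500):           # fixed-point iteration on the soft responses
    u_row = [sum(A[i][j] * q[j] for j in range(2)) for i in range(2)]
    u_col = [sum(B[i][j] * p[i] for i in range(2)) for j in range(2)]
    p, q = logit(u_row, beta), logit(u_col, beta)
```

Unlike a Nash point prediction, the resulting joint distribution assigns positive
probability to every joint strategy, with mass concentrating on defection as beta grows.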
PGT is also used to:
i) Derive an information-theoretic quantification of the degree of rationality;
ii) Derive bounded rationality as a cost of computation;
iii) Elaborate the close formal relationship between game theory and statistical physics;
iv) Use this relationship to extend game theory to allow stochastically varying numbers of
players.

Model-based Reinforcement Learning for Partially Observable Games
with Sampling-based State Estimation, HAJIME FUJITA AND SHIN
ISHII, Graduate School of Information Science, Nara Institute of Science
and Technology, Ikoma, JP
We present a model-based reinforcement learning (RL) scheme for large scale multi-agent
problems with partial observability, and apply it to a card game, Hearts. This game is
a well-defined example of an imperfect information game. To reduce the computational
cost, we use a sampling technique based on Markov chain Monte Carlo (MCMC) in which
the heavy integration required for the estimation and prediction can be approximated by
a plausible number of samples. Computer simulation results show that our RL agent can
perform learning of an appropriate strategy and exhibit a comparable performance to an
expert-level human player in this partially observable multi-agent problem.
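
The core sampling idea can be sketched as follows (a hedged illustration, using a plain
Monte Carlo average rather than full MCMC, and a toy payoff function in place of a
Hearts simulator; the card counts and hand sizes are invented for the example):

```python
import random

random.seed(0)

# Cards we have not yet observed in the toy game (assumed: 20 unseen cards).
unseen_cards = list(range(20))

def sample_hidden_state():
    # Draw one plausible assignment of the unseen cards to three opponents,
    # i.e., a sample from the distribution over hidden states.
    deal = random.sample(unseen_cards, len(unseen_cards))
    return deal[0:7], deal[7:14], deal[14:20]

def payoff(action, hidden_state):
    # Stand-in for a game simulator: reward depends on our action and on
    # the sampled deal. Real code would play the trick out.
    return -abs(action - sum(hidden_state[0]) % 5)

def estimate_values(actions, n_samples=500):
    # Replace the heavy integration over all possible deals with an average
    # over a plausible number of sampled hidden states.
    values = {}
    for a in actions:
        values[a] = sum(payoff(a, sample_hidden_state())
                        for _ in range(n_samples)) / n_samples
    return values

values = estimate_values(actions=[0, 1, 2, 3, 4])
best = max(values, key=values.get)   # greedy action under the estimate
```

The agent then acts greedily (or softly) with respect to these sampled value estimates
instead of enumerating every consistent deal.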

Effective negotiation proposals using models of preference and risk behavior,
ANGELO RESTIFICAR, Oregon State University, Corvallis, OR, and PETER
HADDAWY, Asian Institute of Technology, Pathumthani, Thailand
In previous work, we inferred implicit preferences and attitudes toward risk by interpreting
offer/counter-offer exchanges in negotiation as a choice between a certain offer and a gam-
ble [A. Restificar 2004]. Supervised learning can then be used to construct models of
preference and risk behavior by generating training instances from such implicit informa-
tion. In this paper, we introduce a procedure that uses these learned models to find effective
negotiation proposals. Experiments were performed using this procedure via repeated ne-
gotiations between a buyer and a seller agent. The results of our experiments suggest that
the use of learned opponent models leads to a significant increase in the number of agree-
ments and a remarkable reduction in the number of negotiation exchanges.

Mechanism Design via Machine Learning, YISHAY MANSOUR, Tel-Aviv
University, IL
We use techniques from sample-complexity in machine learning to reduce problems of
incentive-compatible mechanism design to standard algorithmic questions for a wide vari-
ety of revenue-maximizing pricing problems. Our reductions imply that given an optimal
(or beta-approximation) algorithm for the standard algorithmic problem, we can convert
it into a (1 + epsilon)-approximation (or beta(1 + epsilon)-approximation) for the problem of
designing a revenue-maximizing incentive-compatible mechanism, so long as the number
of bidders is sufficiently large as a function of an appropriate measure of complexity of the
comparison class of solutions. We apply these results to the problem of auctioning a digital
good, the "attribute auction" problem, and to the problem of item-pricing in unlimited-
supply combinatorial auctions. From a learning perspective, these settings present unique
challenges: in particular, the loss function is discontinuous and asymmetric, and the range
of bidders’ valuations may be large.
This is joint work with Maria-Florina Balcan, Avrim Blum, and Jason D. Hartline.

N-Body Games, ALBERT XIN JIANG, KEVIN LEYTON-BROWN AND
NANDO DE FREITAS, University of British Columbia, CA
This paper introduces n-body games, a new compact game-theoretic representation which
permits a wide variety of game-theoretic quantities to be efficiently computed both approx-
imately and exactly. This representation is useful for games which consist of choosing
actions from a metric space (e.g., points in space) and in which payoffs are computed as a
function of the distances between players’ action choices.

Probabilistic inference for computing optimal policies in MDPs, MARC
TOUSSAINT AND AMOS STORKEY, University of Edinburgh, UK
We investigate how the problem of planning in a stochastic environment can be translated
into a problem of inference. Previous work on planning by probabilistic inference was
limited in that a total time T had to be fixed and that the computed policy was not optimal
w.r.t. expected rewards. The generative model we propose treats the total time T as
a random variable, and we show equivalence to maximizing the expected future return for
arbitrary reward functions. Optimal policies are computed via Expectation-Maximization.

Graphical Models, Evolutionary Game Theory, and the Power of Randomization,
MICHAEL KEARNS AND SIDDHARTH SURI, University of
Pennsylvania, PA
We study a natural extension of classical evolutionary game theory to a setting in which
pairwise interactions are restricted to the edges of an undirected graph or network. We
generalize the definition of an evolutionarily stable strategy (ESS), and show a pair of comple-
mentary results that exhibit the power of randomization in our setting: subject to minimal
edge density conditions, the classical ESS of any game are preserved when the graph is
chosen randomly and the mutation set is chosen adversarially, or when the graph is chosen
adversarially and the mutation set is chosen randomly. We examine natural strengthenings
of our generalized ESS definition, and show that similarly strong results are not possible for
them.

Probability Collectives for Adaptive Distributed Control, DAVID H.
WOLPERT, NASA Ames Research Center
There are two major fields that analyze distributed systems: statistical physics and game
theory. Recently it was realized that these fields can be re-expressed in a way that makes
them mathematically identical. This provides a way to combine techniques from them, pro-
ducing a hybrid with many strengths that do not exist in either field considered in isolation.
This mathematical hybrid is called Probability Collectives (PC). As borne out by numer-
ous experiments, it is particularly well-suited to distributed optimization and to adaptive
distributed control. The unifying idea of these applications is that rather than directly op-
timize a variable of interest x, often it is preferable to optimize an associated probability
distribution, P (x).
In particular, since probabilities are real-valued, P(x) can be optimized using power-
ful techniques for optimization of continuous variables, e.g., gradient descent, Newton's
method, etc. This is true even if the underlying variable x is categorical, mixed type, time-
extended, etc. In this way PC allows us to (for example) apply gradient descent to optimize
a function over a categorical variable.
Another advantage of PC is that P(x) provides sensitivity information about the optimiza-
tion problem, e.g., telling us which variables are most important. In addition, finding P(x)
is an inherently adaptive process, with excellent robustness against noise. This makes it
particularly well-suited to real-world control problems. Moreover, PC algorithms typically
"fracture" in a way that allows completely distributed implementation, typically with ex-
cellent scaling behavior.
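
The "gradient descent over a categorical variable" idea can be sketched in a few lines
(a hedged toy, not the PC algorithms from the talk: the four category costs, the learning
rate, and the temperature are invented, and the objective is the usual maxent Lagrangian
E_P[f] minus temperature times entropy):

```python
import math

# Cost of each category of a 4-valued categorical variable x (assumed example).
f = [3.0, 1.0, 4.0, 0.5]

theta = [0.0] * len(f)            # softmax logits parameterising P(x)
lr, temperature = 0.5, 0.1

def probs(theta):
    # P(x) as a softmax of the logits, so it is a valid distribution by construction.
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [v / s for v in z]

for _ in range(200):
    p = probs(theta)
    expected = sum(pi * fi for pi, fi in zip(p, f))
    entropy = -sum(pi * math.log(pi + 1e-12) for pi in p)
    for i in range(len(theta)):
        # Exact gradient of E_P[f] - temperature * entropy w.r.t. logit i.
        grad = p[i] * ((f[i] - expected)
                       + temperature * (math.log(p[i] + 1e-12) + entropy))
        theta[i] -= lr * grad

# The distribution concentrates on the lowest-cost category, even though x
# itself is categorical and has no gradient of its own.
best = max(range(len(f)), key=lambda i: probs(theta)[i])
```

Continuous-optimization machinery thus acts on the real-valued logits of P(x) rather
than on the categorical x directly, which is the point made in the abstract above.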
Probability Collectives: Examples and Applications, DEV RAJNARAYAN,
Stanford University
Probability Collectives (PC) is a broad framework that translates and unifies concepts from
statistical mechanics, game theory and optimization. In this framework, optimization is
performed not on the variables of the problem, but on a probability distribution over those
variables. Such an approach has many advantages. In particular, we can now tackle cate-
gorical and mixed problems using powerful methods like gradient descent from continuous
optimization. Since PC is based on random sampling, it is inherently suited to large prob-
lems. Techniques of Simulated Annealing (SA) and Estimation of Distribution Algorithms
(EDAs) can be shown to be instances of particular variants in the PC framework. With a
language that describes the working of many global optimization approaches, we show how
one can analyze the performance of any particular approach and deduce how it needs to
be changed to improve performance. In this paper, we describe specific pedagogical and
real-world problems and a systematic approach to solving them using the PC framework.
The pedagogical examples range from categorical ones like two-player, common payoff,
matrix games and UAV path-planning over a 2-D grid of hexagonal cells, to standard con-
tinuous optimization benchmarks like the Rosenbrock function. Finally, we describe the
application of PC to two real-world problems: control design for flight control of a UAV
with novel distributed actuators, and control design for gust alleviation using distributed
actuators.

A Formalization of Game Balance Principles, JEFF LONG AND MICHAEL
C. HORSCH, University of Saskatchewan
Game balance is the problem of determining the fairness of actions or action sets in com-
petitive, multiplayer games. In this paper, I formalize issues related to game balance using
the mathematical language of game theory, as used in the economic sciences. I show how
to detect game imbalance in this language using existing concepts and algorithms, and
provide a new algorithm for correcting imbalances thus discovered. Finally, I discuss the
application of these techniques to large, real-world competitive games through the use of
high-level strategic abstraction.

A stochastic optimal control formulation of distributed decision making,
BERT KAPPEN, Radboud University, Nijmegen, The Netherlands
It has recently been shown that a class of stochastic optimal control problems can be for-
mulated as a path integral, in which the noise plays the role of a temperature. The path
integral displays symmetry breaking, and there exists a critical noise value that separates
regimes where optimal control yields qualitatively different solutions. The path integral
can be computed efficiently by Monte Carlo integration or by Laplace approximation, and
can therefore be used to solve high-dimensional stochastic control problems.
In this contribution, I discuss the consequences of this approach for distributed deci-
sion making in multi-agent systems. It is shown that the optimal cost-to-go function
J(x1, ..., xn) takes the form of a log partition sum over all configurations at the final time.
The optimal action for agent i is given by the gradient of J with respect to xi.
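
A minimal Monte Carlo sketch of such a log-partition cost-to-go, reduced to a single
agent in one dimension (the noise level, horizon, and double-well terminal cost are all
assumptions for illustration, not values from the talk):

```python
import math
import random

random.seed(1)

lam = 0.5          # noise level, playing the role of temperature (assumed)
T, dt = 1.0, 0.01  # horizon and integration step (assumed)

def end_cost(y):
    # Terminal cost with two symmetric targets at -1 and +1: the kind of
    # setting in which symmetry breaking between strategies can appear.
    return min((y - 1.0) ** 2, (y + 1.0) ** 2)

def estimate_J(x, n_samples=2000):
    # J(x) = -lam * log E[exp(-end_cost(y_T) / lam)], where the expectation
    # is over *uncontrolled* Brownian trajectories started at x; a plain
    # Monte Carlo average over sampled end states estimates it.
    acc = 0.0
    for _ in range(n_samples):
        y, t = x, 0.0
        while t < T:
            y += math.sqrt(lam * dt) * random.gauss(0.0, 1.0)
            t += dt
        acc += math.exp(-end_cost(y) / lam)
    return -lam * math.log(acc / n_samples)

J0 = estimate_J(0.0)   # cost-to-go from the symmetric start state
```

The optimal control then follows from the gradient of this estimate with respect to the
start state, in line with the gradient rule stated above.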
The free energy displays symmetry breaking as a function of the noise as well as the time
to go. This means that in different regimes the optimal behaviour of the agents changes
from averaging over all possible strategies to specializing to one particular strategy.
The cost-to-go is intractable for large systems. However, one can use a variety of methods to
approximate J. I will discuss Monte Carlo sampling, variational approximations, and belief
propagation. I will present a number of examples to show the phenomenology of this
model as well as the effectiveness of various approximations. Also, possible extensions
into competitive games will be discussed.
H.J. Kappen. A linear theory for control of non-linear stochastic systems. Physical Review
Letters, 2005. In press.
H.J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of
Statistical Mechanics: Theory and Experiment, 2005. In press.
