Behavioral Mathematics for Game AI

Document Sample
Behavioral Mathematics for Game AI Powered By Docstoc

                               DAVE MARK

                              Charles River Media
               A part of Course Technology, Cengage Learning

  Australia, Brazil, Japan, Korea, Mexico, Singapore, Spain, United Kingdom, United States
Behavioral Mathematics for Game AI               © 2009 Course Technology, a part of Cengage Learning.
Dave Mark
                                                 ALL RIGHTS RESERVED. No part of this work covered by
                                                 the copyright herein may be reproduced, transmitted,
Publisher and General Manager,                   stored, or used in any form or by any means graphic,
Course Technology PTR: Stacy L. Hiquet           electronic, or mechanical, including but not limited to
                                                 photocopying, recording, scanning, digitizing, taping, Web
Associate Director of Marketing: Sarah Panella   distribution, information networks, or information storage
                                                 and retrieval systems, except as permitted under Section
                                                 107 or 108 of the 1976 United States Copyright Act, without
Content Project Manager: Jessica McNavich        the prior written permission of the publisher.

Marketing Manager: Jordan Casey

Acquisitions Editor: Heather Hurley
                                                         For product information and technology
Project Editor: Karen A. Gill                                    assistance, contact us at
                                                      Cengage Learning Customer and Sales Support,
Technical Reviewer: Kevin Dill
                                                        For permission to use material from this text
Editorial Services Coordinator: Jen Blaney                or product, submit all requests online at
Copy Editor: Ruth Saavedra
                                                      Further permissions questions can be e-mailed to
Interior Layout: Shawn Morningstar                

Cover Creator: Dave Mark

Cover Designer: Mike Tanamachi
                                                 All trademarks are the property of their respective owners.
Indexer: Broccoli Information Management
                                                 Library of Congress Control Number: 2008940614
Proofreader: Mike Beady
                                                 ISBN-13: 978-1-58450-684-3
                                                 ISBN-10: 1-58450-684-9
                                                 eISBN-10: 1-58450-693-8

                                                 Course Technology, a part of Cengage Learning
                                                 20 Channel Center Street
                                                 Boston, MA 02210

                                                 Cengage Learning is a leading provider of customized
                                                 learning solutions with office locations around the globe,
                                                 including Singapore, the United Kingdom, Australia,
                                                 Mexico, Brazil, and Japan. Locate your local office at:

                                                 Cengage Learning products are represented in Canada
                                                 by Nelson Education, Ltd.

Printed in the United States of America          For your lifelong learning solutions, visit
1 2 3 4 5 6 7 11 10 09                           Visit our corporate Web site at
              A Little History
              and a Lot of Dedication

   first got into computers in the early 1980s. By the late 1980s, my very nontech-
   nical grandfather, Gilbert Todd, advised me to “grab onto them computers and
   stick with ’em… that’s where the future is.” I did just that, but more out of my
own inexorable need to be involved in technology than from his encouragement.
After a brief spin in the music business, I got back into computers in the mid-
1990s by doing all manner of IT consulting.
    In 2002, my wife Laurie and I had just started dipping into the game business
by forming Intrinsic Algorithm. On May 20 of that year, my dad drove me to the
airport to head off to what was my first Electronics Entertainment Expo (E3) con-
ference. My 88-year-old grandfather wanted to come along for the ride. As an ex-
pilot, he enjoyed watching the planes. He knew that I was going out to Los Angeles
to some sort of computer game–related conference, although he probably believed
that it was a little more of a heady event than what the then-ubiquitous loud music,
spectacular lights, and “booth babes” would have suggested to him.
     As I got out of the car in front of the terminal and retrieved my luggage from
the trunk, my grandfather gave me the best handshake his aging body could gener-
ate and told me, “You head out there and tell ’em how it’s done.”
    I didn’t have the heart to argue with my grandfather and say that I wasn’t going
to E3 to tell anyone anything, but rather to soak up as much of the state of the in-
dustry as I could. I simply thanked him for the encouragement and told him that I
would do my best to “set ’em straight.”
    Those words were the last thing my grandfather said to me. On May 25, 2002,
the Saturday after the show and the night before I was due to fly back, he had a heart
attack and died.
     My grandfather’s final words to me—which none of us knew at the time would
be so—have rattled around in my brain for the past six years. Just like the trip that
I left for on that day, I have spent more time listening and learning than telling any-
one how anything is done. However, as I started writing this book, my grandfather’s

iv   Dedications

     words came back to me once again. I couldn’t help but think that, perhaps, his
     advice was more timeless than the handful of days that I spent wandering through
     the expansive LA Convention Center. Maybe his encouragement was issued without
     an expiration date. Or maybe I should just assume that it is still in effect.
         Regardless, here I am six years later writing a book on game AI. So, PawPa, I
     may be a bit late, but I’m ready to head out there and tell ’em how it’s done. Thank
        In the meantime, there are others who have been helping me along the way
     who deserve some thanks:
         To my parents, who bought me my first Apple IIc back in 1984—although
     they were a bit less enthusiastic when I used it to play games. See? I told you I could
     make something out of that!
         To my best friend in my teen years, Gregg Bieser, who did more than simply
     play games with me on our respective Apples. It was with Gregg that I made my first
     awkward—even abortive—efforts to make my own games.
         To my AP Computer sidekicks and friends, Phil, Bob, Scott, and Rick, whose
     love of strategy board games and RPGs (fueled by the strangely never-diminishing
     supply of caffeine at Phil’s house) helped crystallize in me what I wanted my own
     games to look and play like some day.
         To my younger brothers, Jared and Kevin, who, at elementary school ages in
     the 1980s, were my guinea pigs as AD&D players. In retrospect, being a Dungeon
     Master at 15 years old was the first time I turned mathematics into behaviors.
          To the late Eric Dybsand, who, at the 2003 GDC, told me I knew what I was
     talking about—even when I didn’t think I did. Respawn soon, my friend! In the
     meantime, I hope you can get this book delivered to wherever you are. It seems like
     you were right. Oh, do me a favor and loan a copy to PawPa while you are at it. He
     won’t understand a word of it, but it will make him proud.
         To Steve Woodcock, not only for being there from the beginning, but for stick-
     ing around during the hard part, and then telling me it was time to get back on the
     horse after I had fallen off.
         To Alex Champandard for both giving me plenty of good advice and handing
     me the “megaphone and soapbox” that a weekly column can turn into.
         To Steve Rabin for his supply of wisdom regarding “AI Wisdom” and trusting
     me to help hold the reins to something bigger than both of us.
          To Heather Hurley and Karen Gill at Charles River Media for helping me make
     the jump from 13 pages to 500.
                                                                  Dedications       v

   To my tech editor, Kevin Dill, for reading every word and number—and
reminding me about the geometry of triangles.
     To my in-laws, Larry and Lenora Reynolds, who loaned Laurie and me some
bootstraps they had lying around—and were patient with us when we wore them
  To my son Steven, who constantly reminds me that games can bridge generations
—and teaches me that my thumbs are too slow.
    To my daughter, Kathy, who constantly reminds me that games can be about
more than shooting stuff—and teaches me how to observe people in ways that
continue to amaze me.
   To my son Aaron, who constantly reminds me of what I was like when I was his
age—which kinda scares both of us.
   To my stepdaughter Beth, who constantly reminds me that it can be fun to
completely beat the snot out of an opponent.
    And, finally, thanks must go out to my wife and business partner, Laurie.
Starting Intrinsic Algorithm was my dream. Her dream was helping me achieve
mine. During the conversation at a restaurant in late 2001 that launched us into the
game business, she gave me the best bit of advice I’ve ever had. When I asked her,
“But what if I can’t do it? What if I fail?” she simply shrugged and said, “Well, then
we’ll know.”
   Thanks, dear. You don’t know what that meant to me, then and now. This
book is dedicated to your dedication.

    Thank you all,
    Dave Mark
                   About the Author

         Dave Mark is the president and lead designer of Intrinsic Algorithm, LLC, an
     independent game development studio and AI consulting company in Omaha,
     Nebraska. He has been programming since 1984 when he was in high school where,
     much to the dismay of his teacher, he wrote his first text adventure on the school’s
     DEC PDP-1144.
         After a brief detour in the music business as a composer/arranger, keyboard
     player, and recording engineer during the early 1990s, Dave re-entered the technol-
     ogy arena in 1995. In 2002, he came to the startling realization that the corporate IT
     consulting and development world is a little short on creativity and fun. As a result,
     Dave left to start Intrinsic Algorithm LLC with his wife, Laurie.
         He is a contributor to the AI Game Programming Wisdom series of books, a
     regular columnist at, and a founding member of the AI Game
     Programmers Guild.
         Dave continues to attend the University of Life. He does not plan to graduate
     anytime soon.

              Author’s Preface

       s I have read different books on a variety of subjects over the years, I have
       found a variety of styles to be effective for communicating ideas. In addition
       to simply reading the text of a book on a technical subject, I have often
appreciated the extra effort that went into creating examples of the concept. I enjoy
diagrams and graphs so that I can simply have a place to anchor my thoughts as
I ponder the issue. Where appropriate, I like seeing the formulas used, already
normalized into a form that can be dropped into code. And, of course, I find actual
code snippets very helpful when the problem is one of programming.
    Some of these elements have been more helpful than others. Some of them
have been used to the point of overkill such that it was next to impossible to extract
meaningful content from the clutter. On the other hand, I have often felt that I have
been left wanting a diagram, a table, or a bit of code so that I could put things into
     To that end, as I set out to write this book, I made myself a promise to attempt
to find the balance that would be the most helpful to you, the reader. Throughout
this work, I attempt to cover four different approaches: theory, math, examples,
and code.
    Much of the discussion in this book will be about the theory behind the con-
cepts. This will be where I lay out the concept of a point as groundwork. While this
might seem to be the part of the book that is the least directly relevant to game pro-
gramming, it is also where much of the important material is. Everything spills out
of these ideas. Rather than bland recitation, I try to make the theory accessible
through example and anecdote.
    If necessary, the theory sections will show the relevant mathematics of the
concepts. Please note that, unlike some other books, my goal is not to overwhelm
you with esoteric proofs of how I got to where I did, but to give you as a developer
an end product that you can find useful in production.

viii   Author’s Preface

        IN   THE   G AME

       Very often throughout the book, I will lay out hypothetical game-like situations
       that show how to use a theoretical concept. While these examples may be game
       genre–specific, the concepts themselves can usually be applied universally. You will
       be able to quickly identify these sections by the icon at the left that will appear at the
       beginning of each in-game example. Note that there’s a listing of all the In the Games
       following the table of contents.

       P UTTING I T        IN   C ODE

       Where appropriate, I will offer up examples of how tools are used in programming
       situations. Sometimes these functions will be ones that you can use in your own
       projects. In these cases, you can find the code on the book’s Web site at
            For clarity’s sake, I list some of the naming conventions that I use in my pro-
       jects and, therefore, in this book.

             Type and struct names are in all caps: MY_TYPE.
             Struct names are preceded by a lowercase “s”: sMY_STRUCT.
             Variables and functions are in initial caps: MyFunction( MyVariable ).
             Member variables of a class are generally preceded with a lowercase “m”:
             List and vector names are preceded by a lowercase “l” and “v,” respectively. I
             combine these when necessary, such as in a member of a class that is also a vector.
             In this case, the name is preceded by “mv,” such as in mvMyMemberVector.

            Given that discussions of differences in programming style are often likened to
       religious wars, I hope my naming conventions do not cause you to view me as a
       heretic or to lob this book onto a blazing pyre of sacrilegious tomes with evil code-
       naming conventions.
            Oh yeah, open and close braces for functions belong in the same column, but
       open braces for if and while statements belong at the end of the if or while line.
       I want to follow the column from the end brace to the command that started it all.
       If you think differently, then you… you… you’re just wrong! So there!
                                                              Author’s Preface      ix

    OK… sorry. That was uncalled for. I truly hope that my quirks don’t annoy you
or make this book harder to understand. I’m just old. I learned some of these habits
in 1984. Blame my high school computer teacher.

     Throughout all of the sections, we will encounter terms that are worth noting.
To help you locate and remember these items, they will be bolded and marked with
the important lingo arrow in the margin.

Sometimes, I will add a cautionary note or an aside of some sort. This will be
set apart with the stylized, yet familiar comment brackets you see here.

    In addition to seeing in-game examples, you will encounter numerous exam-
ples from “life”—mine as well as those around me. Since, as AI programmers, we
are often in the business of re-creating lifelike situations and behaviors, I find this
approach fitting. Also, I hope that you forgive me for a sense of humor. It is my at-
tempt to keep you from falling into the mind-numbing stupor that technical books
have been known to induce. In a manner of speaking, I’m doing it for your own
good. Don’t just learn—enjoy!
This page intentionally left blank

Part I    Introduction                             1
     1    Why Behavioral Mathematics?              3
          Games and Choices                        4
          Going Beyond Looks                      10

     2    Observing the World                     15
          Identifying Factors                     16
          Finding Hidden Factors                  22
          Quantifying Observations                24
          Needing More Than Observations          28

     3    Converting Behaviors to Algorithms      29
          Using Numbers to Select                 30
          Using Numbers to Define                 33
          Using Algorithms to Construct Numbers   37

Part II   Decision Theory                         43
     4    Defining Decision Theory                45
          Normative Decision Theory               45
          Descriptive Decision Theory             48
          The Best of Both Worlds                 51

     5    Game Theory                             55
          Starting Simple                         56
             Matching Pennies                     57
             Prisoner’s Dilemma                   61

xii   Contents

                 Asymmetric Games                          69
                   Cutting the Cake                        70
                   Ultimatum Game                          72
                   Dictator Game                           75
                   Trust Game                              76

        6        Rational vs. Irrational Behavior         79
                 Perfect Rationality                       80
                   The Pirate Game                         80
                   Superrationality                        85
                   Guess Two-Thirds of the Average Game    87
                 Bounded Rationality                       96
                   Misusing Irrelevant Information         97
                   Ignoring Relevant Information          100
                 Rational Ignorance                       105
                   The Cost of Information                105
                   The Cost of Calculation                107
                 Combining It All                         109

        7        The Concept of Utility                   111
                 Decisions under Risk                     112
                   Pascal’s Wager                         113
                   No Pain, No Gain                       115
                 Utility of Money                         121
                   Value vs. Utility                      121
                   Utility vs. Risk                       123
                 Utility of Time                          140
                   Production over Time                   141
                   Distance over Time                     145
                   Changes in Utility over Time           150
                 Our Utility of Utility                   165
                                                        Contents    xiii

      8    Marginal Utility                                        167
           Value vs. Utility vs. Marginal Utility                  168
           Changes in Marginal Utility                             169
             Decreasing Marginal Utility                           170
             Increasing Marginal Utility                           177
           Marginal Risk vs. Marginal Reward                       181
           Defining Thresholds                                     187
           Multiple Utility Thresholds                             188
           The Utility of Marginal Utility                         194

      9    Relative Utility                                        195
           Hedonic Calculus                                        196
           Multi-Attribute Utility Theory                          198
           Inconsistencies                                         205
             Problems with Perception                              206
             Problems with Categorization                          207
             Problems with Understanding                           207
           Apparent Contradictions                                 210
             Giffen Goods                                          210
             Moral Dilemmas                                        219
           The Relative Benefit of Relative Utilities              225

Part III   Mathematical Modeling                                   227
     10    Mathematical Functions                                  229
           Simple Linear Functions                                 229
           Quadratic Functions                                     231
              Shifting the Quadratic                               231
              Tilting the Parabola                                 233
              Reshaping the Parabola                               234
           Sigmoid Functions                                       236
              The Logistic Function                                236
              The Logit Function                                   239
           Ad Hoc Functions                                        240
xiv   Contents

       11        Probability Distributions                     241
                 Identifying Population Features               242
                    Segmenting the Population                  244
                    Analyzing a Single Segment                 247
                 Uniform Distributions                         248
                 Normal (Gaussian) Distributions               250
                    Properties of Normal Distributions         250
                    Generating Normal Distributions            254
                 Triangular Distributions                      269
                    Simplified Normal Distributions            269
                    Parametric Building                        270
                 Uneven Distributions                          272
                 Parabolic Distributions                       274
                 Poisson Distributions                         276
                 Distributing the Distributions                279

       12        Response Curves                               285
                 Constructing Response Curves                  286
                   Building Buckets                            289
                   Retrieving a Result                         294
                 Converting Functions to Response Curves       296
                   Simple 1-to-1 Mappings                      296
                   Advanced 1-to-1 Mappings                    297
                 Converting Distributions to Response Curves   303
                   Data Structure                              304
                   Entering Data                               304
                   Selecting a Result                          308
                   Adjusting Data                              311
                 Search Optimization                           314
                 Hand-Crafted Response Curves                  316
                 Dynamic Response Curves                       317
                                             Contents    xv

    13    Factor Weighting                              319
          Scaling Considerations                        319
            Imposing Artificial Limits                  319
            Absolute vs. Relative Weights               322
            Granularity                                 325
          Weighting a Single Criterion                  329
            Concrete Numbers                            329
            Abstract Ratings                            329
          Combining Multiple Criteria                   331
            Normalizing                                 331
            Weighted Sums                               339
          Layered Weighting Models                      341
            Constructing a Layer                        342
            Propagation of Change                       345
            Compartmentalized Confidence                346
          Everything Is Relative                        348

Part IV   Behavioral Algorithms                         349
    14    Modeling Individual Decisions                 351
          Defining Decision                             351
          Deciding What to Decide                       353
          Analyzing a Single Option                     355
             Identifying Factors                        356
             Identifying Relationships                  375
             Building Connections                       379
             Scoring the Option                         382
          Comparing Options                             385
          Selecting an Option                           386
          Testing the Algorithm                         387
          Summarizing the Decision Process              392
xvi   Contents

       15        Changing a Decision                        395
                 Monitoring a Decision                      396
                   Time Frames                              396
                   A Hybrid Approach                        401
                 Perseverance and Decision Momentum         405
                   Ignoring Futility                        408
                   Building Decision Momentum               409
                 Our Final Decision on Changing Decisions   415

       16        Variation in Choice                        417
                 Reasons for Variation                      418
                    Variation between Actors                419
                    Variation by a Single Actor             420
                 Embracing Randomness                       422
                 Selecting from Multiple Choices            424
                    Random from Top n Choices               425
                    Weighted Random from Top n Choices      430
                    Weighted Random from All Choices        441
                 Scores and Weights                         444

                 Epilogue                                   447

                 Index                                      451
    In the Game Listings

Know When to Walk Away, Know When to Run    33
Expanding the Engagement Decision           39
Matching Punches                            59
Dueling Rocket Launchers                    65
Scouting the Enemy                         107
Counting the Enemy                         108
The Tortoise and the Harried               119
Protecting the Barracks                    130
Settlers and Warriors                      143
Taking Fire                                156
Building Soldiers                          171
Declining Health                           177
How Many Troops?                           191
The Engagement Decision Revisited          200
Wizardry and Wands                         216
Hippocratic Morals                         224
How Much Weight?                           332
Are We There Yet?                          334
Who’s Next?                                336
Which Dude to Kill?                        355
Dudes Revisited                            403
Flotilla of Futility                       406

This page intentionally left blank

  I            Introduction

       art I sets the table for the rest of this book in that it introduces us to the mindset
       that we will want to carry into our quest for improving our artificial intelligence
       (AI) agents through behavioral mathematics.

      Chapter 1, “Why Behavioral Mathematics?” asks us to ponder what it is we are
  trying to accomplish when we create opponents in games.
      Chapter 2, “Observing the World,” offers up the suggestion that many of the
  clues to creating intricate, deep, and meaningful behaviors are all around us.
       Chapter 3, “Converting Behaviors to Algorithms,” gives us a general overview
  of the approach that we will use when trying to place our observations into some-
  thing that we can place into our game engine.

This page intentionally left blank
1             Why Behavioral

       ame artificial intelligence (AI) has been an expanding and changing term in
       the past five to ten years. Even today, if you ask people what “game AI” en-
       tails, you will get significantly different answers. For a long time, however,
most of those answers could be tied to specific and simple functions.

    “It’s what makes the character walk toward me.”
    “It’s what makes the enemy shoot at me.”
    “It’s what makes the animation change from ‘idle’ to ‘attack.’”

     Those are all well and good—although taken at face value, not very encompass-
ing. In a general sense, you would be hard-pressed to really justify calling those
things “artificial intelligence.” They are too simple. In some ways, they could be
accomplished by a flip of a coin, a roll of the dice, or simply by a single “If…then”
statement such as the player entering a room.
    Lately, there has been a massive expansion of the ideas that fall into the purview
of game AI. In the name of “realism” in games, many of these expansions and ex-
plorations have been flowing in the direction of trying to mimic behaviors—either
by individuals, teams, or complex systems. Given the inherent complexity of indi-
vidual psychology and group dynamics, sometimes this task seems to be more
Sisyphean than Herculean. We can never quite be finished. Just when we think we
have emulated behaviors, another caveat appears, and our algorithmic rock rolls
back to the bottom of the hill. We will never solve the challenge of replicating
human, animal, or alien behavior on the technological side because we will likely
never be able to solve those behaviors on the psychological side. The only thing we
can do is continue to examine and study and break things down as far as we can.
Maybe the next time we push the rock up hill, it will feel a lot easier.

4        Behavioral Mathematics for Game AI


         There is a saying in the game industry that originated from industry pioneer Sid
         Meier. He suggested, “A game is a series of interesting choices.” That statement is,
         in my opinion, reasonable in a vague sense. However, with a little exploration, you
         can bracket things a little better. For example, if you were to pursue that concept by
         negation to the extreme, you would be left with the statement, “If there are no
         choices for the player, it is not a game.” That is certainly true. What would be left
         is the monodirectional narrative that games have been replacing for decades now.
         Aside from the initial choice of what to watch, television and movies do not provide
         choices for the viewer. They are not interactive. They are not games.
               Books have historically fallen into the same category. The exception to this is
         the “choose your own adventure” books. Unlike typical narratives, they allow the
         reader to make choices that, while very simple, do affect the outcome of the story.
         It is the addition of that simple mechanism that lifts the book from narrative enter-
         tainment into the realm of interactive entertainment. And, if the alternate paths
         and endings were divergent enough to be construed as “positive” or “negative,” one
         could make the case that there is now a goal—and a corresponding concept of
         “winning” and “losing.” In a sense, the book has now become a game.
             Rock-Paper-Scissors, the staple mechanism of alleviating sheer boredom and
         selecting hapless people for distasteful tasks, has but one choice event with three
         possible selections in any given round (Figure 1.1). Flick the fingers, and you are
         done. All that is left is to count the score. That hardly makes for long-term enter-
         tainment. (In fact, it may be disqualified from consideration in Sid’s definition in
         that there isn’t even really a series of choices.)

                   FIGURE 1.1 The decision matrix for the game Rock-Paper-Scissors.
                                      Chapter 1 Why Behavioral Mathematics?            5

     In Tic-Tac-Toe, the number of possible choices starts at nine initially and gets
worse from there. (Actually, it only looks like nine. Given that the board is symmet-
rical on two axes, the true number of starting moves is three: corner, side, and
middle. Interestingly, the second player either has five or two responding moves
depending on the initial selection by the first player.) Still, there are more choices
per game than are available in Rock-Paper-Scissors. Anyone over the age of eight,
however, realizes after about 10 times through the game that all the choices lead to
the rather bleak outcome of a draw. (Startlingly, the WOPR supercomputer in the
movie WarGames needed significantly more time than that to reach this conclusion.)
So, while there are certainly more choices in play, the game isn’t necessarily better.
    Blackjack has only a few choices of what to do at any given point: hit, stand,
double, and split. There are plenty of scenarios that the player can find himself in,
however. A glance at one of the cute little Blackjack cheat cards (Figure 1.2) will
show that there are 280 different meaningful combinations of what’s on the table at
any given time. While that is a significantly larger possibility space than Tic-Tac-
Toe and certainly Rock-Paper-Scissors, a second glance at the cheat card will show
that there is a statistically “right” way to play each of those combinations. So, while
there are plenty of choices, they aren’t necessarily “interesting.” In fact, seasoned
Blackjack players have that cheat sheet memorized—removing it, for all intents
and purposes, from the realm of choice entirely. They simply play the way that mil-
lions of models have shown to be statistically advantageous.
     Wandering down the casino aisle, you may encounter the Poker room. Again,
there aren’t many choices in the standard poker game. Aside from folding a hand,
the choices largely revolve around how much to bet. And yet, the choices that are
in play are far more interesting than those of Blackjack. What makes them that way?
After all, just like Blackjack, it really comes down to the probability statistics. “What
is the likelihood that my hand is better than yours?”
    To understand the difference between the two games that makes Poker’s choices
more interesting than those in Blackjack, we must realize that there is another
concept in play. There’s someone else in the room that is also making a “series of
interesting choices.” Theoretically, just like Blackjack, the number of possible Poker
hand combinations could be calculated. Just like Blackjack, there are a finite num-
ber of scenarios—although I would hate to see the size of the little cheat card. Given
that information—and even being forced to use some inference about the other
hands—you could make a reasonable assertion about the relative strength of your
hand. Your interesting choice would be reduced to what you were willing to risk
given the potential of winning and losing. The game would still take on the form of
Blackjack—albeit a bit more complicated. Except for the one knotty factor that no
amount of statistical modeling can entirely take into consideration—the guy on the
other side of the table with the dark glasses and stupid hat that keeps rattling his
chips is making an “interesting choice” as well.
6   Behavioral Mathematics for Game AI

                   FIGURE 1.2 The Blackjack “basic strategy” card shows the
                      mathematically optimum play for every combination
                     of what the player has and what the dealer is showing.

    My Choices…Your Choices…
    The astute reader might notice that, in a game of Rock-Paper-Scissors or Tic-Tac-
    Toe, the other player is making a choice. Upon further examination, however, what
    can we really say about those choices? In Rock-Paper-Scissors, for example, there
    are only three options. As a player, you have no information about what your
    opponent is going to do (unless we dig into some major data analysis on long-term
    play patterns of our opponent—or just happen to know that he has a particular
    affinity for geology). Therefore, there is no real guide for the strategy that we should
    adopt throughout the course of our 3-second game. At that point, we are left with
    the somewhat-tepid realization that all three of our options are, for purposes of se-
    lection, the same. All the results are tied up in the scoring matrix alone. Whether we
    admit it or not, our selection is basically random. Or, in theory, it would be no bet-
    ter than a random selection. So, while we can operate under the pretense that we are
    making a choice of which option to play, truly our act of volition doesn’t matter.
                                        Chapter 1 Why Behavioral Mathematics?                  7

    If you put that mentality on the other side of the table from you in the form of
a human player (assume for the sake of example continuity that it is the same guy
with the dark glasses and stupid hat), it fails to provide any sort of challenge over
and above playing against the roll of an Official Rock-Paper-Scissors Decision Die
(ORPSDD). You are playing against a random selection machine. The only differ-
ence is that the human player may be more likely to react to your victory dance than
would the ORPSDD. In fact, as long as the taunting was not included in the test, I
would suggest that an ORPSDD would admirably pass the Turing Test (see sidebar).


  The oft-cited Turing Test was conceived by famed AI researcher Alan Turing. He sug-
  gested that a machine entity could not be deemed “intelligent” unless it performed
  the action being tested in a manner indistinguishable from a human.
      Currently, the test is used in a far more loose sense than what Alan Turing originally
  proposed. That leads to its being used in situations for which it was not intended—
  much as the term “litmus test” is used in far more situations than the chemical test
  that it truly is.
      In a broad sense, the term “Turing Test” has become the equivalent to saying
  “believable.” If one says that an AI entity “passes the Turing Test,” it generally means
  that it looks as good as a human player. This believability is always going to be limited
  in scope, however. Usually, only a specific behavior or set of behaviors is being judged.
  This scope is, by necessity, also limited to game-relevant situations. For example, an
  agent may run, hide, and shoot in a manner that looks like a real player. However, if
  you try to engage this same agent in a conversation about the weather, the agent is
  going to be exposed for the narrowly designed entity that it is.
      That being said, the phrase “it passes Turing” is a colloquially understood badge
  of honor that means “it looks and acts pretty darn well.” As such, it is a goal to strive
  for as an AI designer and programmer.

     Tic-Tac-Toe has a few more options, as I mentioned earlier. However, the state
space of the game is so small that at any given moment the choices we have before
us can be reduced to “correct,” “incorrect,” or “delaying the inevitable.” If the point
is to win the game, there really is no choice involved. At least not much more than
answering the question, “Do you want to win the game?” So, while there are more
choices in play for both you and your opponent, victory comes down to a failure on
either your part or that of Mr. Glasses & Hat to correctly answer the question, “Do
you want to win the game?” (Figure 1.3). Assuming a static level of continual lucid-
ity for both of you, the only way someone is going to lose is if he allows the other
8   Behavioral Mathematics for Game AI

    person the advantage by intentionally letting him win through an incorrect play.
    Put another way, until someone chooses to lose, the actual choices made in the game
    are irrelevant. You are playing against a rigid, predictable, rule-based machine. In a
    way, it is almost an inverse Turing Test. To win (or draw), you simply play the same
    way a computer would.

        FIGURE 1.3 The decisions in Tic-Tac-Toe are usually limited based on whether
        you want to win or lose. In the game above, the player (X or O) had to play the
           locations dictated by the arrows or else lose the game in the next move.

        Blackjack gives the appearance of playing against another entity. In the typical
    casino setting, the House as represented by the Dealer. (For the sake of saking, let’s
    put the Dealer in dark glasses and a stupid hat.) According to the actual rules of the
    game, your opponent has no choices whatsoever. There is no random chance in the
    selections he is making like there is in Rock-Paper-Scissors. The random chance
    comes from the nature of the game itself—specifically, the order of the cards in the
        While there is a strict rule-based system in Blackjack (hit on less than 17, stand
    on 17 or more), it differs from the one in Tic-Tac-Toe. (That is, “I choose to win;
    therefore, I should move here.”) In this case, the rules aren’t in the decision model
    of the Dealer specifically; they are in the rules of the game as a whole. Unlike Tic-
    Tac-Toe, no matter how badly the Dealer wants to win (or lose, for that matter),
    his choices are entirely cast in stone. The Dealer isn’t “selecting” anything; he is
    playing the game. He’s reading a script, as it were. You are playing against a rigid,
    predictable opponent in the realm of a completely random environment. It is really
    no different from the Craps table nearby. You aren’t playing against the man with
    the curvy stick and the loud banter. You are playing against precalculated payoff
    odds in a completely random environment (represented, in this case, by dice
    instead of cards).
                                      Chapter 1 Why Behavioral Mathematics?           9

     That being said, Blackjack is yet another “made-for-Turing” game. From the
player or dealer’s standpoint, everything is calculatable and, therefore, predictable.
If you win, it has nothing to do with your besting the abilities and decisions of your
opponent. Your “opponent,” in a sense, is the random ordering of the cards, not the
Glasses & Hat stoic on the other side of the table. (And most Dealers, regardless of the
regalia, won’t respond whatsoever to your victory dance anyway. But security may.)
     In Poker, however, your opponent is a vital part of the game. While you are
both dealing in the realm of randomness, you are both allowed to react to that
randomness—each to his own advantage. This differs from Rock-Paper-Scissors,
where the randomness is the game. While you are dealing in the realm of choices,
you are both allowed to pursue choices that may or may not lead to victory. This
differs from Tic-Tac-Toe, where each choice is specifically, and almost obviously,
tied to the condition of victory. While you are both dealing in the realm of rules,
neither of you is rigidly limited in your core gameplay decisions so as to completely
disconnect the decision from the desire to win. This differs from the Dealer in
Blackjack, where the available decisions are so narrowly prescribed that the Dealer
becomes a nonentity in the game.
     In the game of Poker, the constraints of your opponent’s gameplay are not so
narrow as to be either entirely random or entirely predictable. As a player, you are
aware that not only do you have many interesting choices to select from, but so does
your opponent. What’s more, because there is a cyclical interaction between you
and your opponent, your selection process must necessarily involve all the poten-
tial selections your opponent may make. Taken an iteration forward, you must be
aware that your opponent is likely to base his reaction on the selection you do make.
For that matter, taken an iteration backward, perhaps your opponent’s previous
choice was made based on his assumption that you would take his decision into
account. In comedy, this effect is referenced by some variant of “I know you know
that I know what you think I know.” In mathematics, it is known as a combinator-
ial explosion. In games, it sure does make for Sid’s “interesting choices.”
    And therein lies the quest as AI designers and programmers. Given that our
game studios often lack both the ability and the budget to hire live orcs, dragons,
aliens, Nazi soldiers, or even bunnies to hop around nearby, our games face a
paucity of opponents that are able to don the dark glasses and stupid hats that
symbolize our nemeses. And, it has been shown that, if we choose to march out
denizens to face them that are entirely random, entirely predictable, or mercilessly
constrained in their behavior and try to pass them off as intelligent, thoughtful, and
responsive entities, our gaming clientele is unforgiving. Frankly, those do not lead
to necessarily “interesting choices” for the player. Therefore, our goal as AI designers
and programmers must be to give the player those “interesting choices” by imbuing
our AI progeny with the ability to make “interesting choices” of their own. We can
leave the dark glasses and stupid hat to the art department.
10    Behavioral Mathematics for Game AI


      For purposes of full disclosure, I have to admit that I have little artistic talent. I
      understand some of the concepts; I can draw perspective of blocky objects using a
      vanishing point, for example. I can even copy an existing drawing to some extent.
      To this day, I have a fantastic painting of a pig that I made in seventh grade that was
      copied from a picture in a magazine (Figure 1.4). However, that about exhausts my
      talent for the medium.

                           FIGURE 1.4 An original Dave Mark, circa 1981.

      Looking Like a Pig
      Despite my dearth of ability to perform in that particular discipline, I would still
      feel secure in making the claim that artists in the game industry have life a bit eas-
      ier than do AI programmers. After all, they can see what it is that they are supposed
      to be accomplishing. Before they begin to draw a pig, they can look at a pig. They
      can make changes to parts of their pig that are less than accurate—in effect, fine-
      tuning their pig-replication abilities. They can show their picture of a pig to anyone
      else who wanders by and ask, “What does this look like?” Unless the artist sub-
      scribes to a more Picasso-esque approach, the reaction should be, “Hey! It’s a pig!”
      (Unfortunately, my art teacher didn’t buy my claim of being a disciple of Picasso.)
      People know what a pig looks like. It can be rigidly defined as a collection of
      expected parts. For example, everyone knows that a pig has four feet. If your pig
      has five feet, one of which is located in front of an inexplicably present dorsal fin,
      viewers will be mildly disturbed.
                                     Chapter 1 Why Behavioral Mathematics?         11

     Artists also are comfortable with re-creating the environment. A light on one
side of an object makes a corresponding shadow on the other. We’ve all seen it; we all
know that it would be incorrect otherwise. The ability to perform observation and
criticism doesn’t simply lead to the realization that an error needs to be addressed;
it often leads to the solution itself. For example, “Make the floor darker on the side
of the object opposite the light.” Even though I lack the ability to necessarily fix it
properly, even I as a nonartist can often suggest the solution.
    From a technical standpoint, the solutions are often fairly intuitive as well. For
example, to make the color of the blue floor darker in the shadows, use a darker
color of blue. To make the buildings in the distance look smaller, draw them
smaller. Truly, the models of how to accomplish many of the core features of art
have been somewhat solved for hundreds of years.

Acting Like a Pig
In contrast, game AI provides some challenges in a number of respects. For instance,
we can’t just show our AI-enabled pig to someone and ask, “Does this act like a
pig?” The answer can’t be rattled off as easily as one regarding the look of a pig.
Certainly, there are some obvious failure scenarios such as said pig flying about the
room in proverbially unlikely fashion. That should tip some of us off that some-
thing is amiss. However, it is far more difficult to state for certain while watching
Mr. Swine’s behavior unfold onscreen that it is, indeed, acting the way a proper pig
should. There is a layer of abstraction involved that is not easy to translate through.
     With the artwork, we see real life and then we see the representation of it. There
is an implicit equality there. If what we see in real life doesn’t match what we see
in the representation, we can determine that it is wrong. Even if equality is not
reached, we are cognizant of a point of diminishing returns. We are accepting of a
representation that is pretty darn close to what we see in reality.
    When watching behavior, however, we have to pass our understanding of that
behavior through a filter of judgment. “Is that the correct thing for the pig to do?”
To answer this question of equality, we would have to have an established belief
about what the real-life pig would have been doing in the first place. While we can
give generalized answers to that question, none of us can state for certain that every
pig will act in a certain way every time that a situation occurs.
     Moving beyond pigs to behaviorally more complicated life-forms (such as
humans—although there may be some exceptions), the solution set gets significantly
larger. As that happens, our confidence in what we believe the entity in question
“should be” doing slips dramatically. While we may be more comfortable in think-
ing of possibilities of human behavior than those of a pig (seeing that we are, indeed,
human), the fact that there are so many subtle shades of those behaviors makes it
statistically less likely that any one of them is the “right thing” to be doing.
12   Behavioral Mathematics for Game AI

         Just as our ability to define what it is these life-forms should be doing wanes, we
     are ever more hard-pressed to judge an artificial representation of an entity’s be-
     havior. In Tic-Tac-Toe, it was obvious when the opponent was playing right or
     wrong—the ramifications were immediately apparent. In Poker, even looking over
     a player’s shoulder at his cards, it is often difficult to judge what his behavior
     “should be.” The combination of the possibility space of the game with the range of
     thought processes of different players makes for a staggering array of choices. The
     best we can come up with is, “That may be a decent choice, but this is what I would
     do if I were him.” And that statement itself needs to be taken with a grain of salt
     since we may not be taking the correct—or more correct—approach ourselves.

     Making Pigs Act Like Pigs
     What this means is that AI programmers have it tough. Unlike the artist who can
     see his subject and gauge the relative accuracy of his work to it, AI programmers
     don’t necessarily know where they are headed. Certainly, we can have ideas and
     wishes and goals—especially in the short run. (“I want my pig to eat at this
     trough.”) We are also well aware that those can tend to backfire on us. (“Why is my
     stupid pig eating at that trough when it is on fire?”) However, as the complexity of
     our world grows, we have to realize that there may not be a goal of perfection such
     as the goal of photo-realism in art. Behavior is too vague and ephemeral to explain,
     thereby making it impossible to accurately mimic. Additionally, the goal in many
     games is to support the overarching narrative or role of a character. Achieving
     perfect behavior for a background character in a crowded city scene is different from
     attempting to construct realistically responsive behavior for that same character.
     Often, the best we can do is to embrace methods that give us a good shot at coming
     close to something that looks reasonable.
          But how do we do that without going the route of complete randomness of the
     Rock-Paper-Scissors player, the monotonous predictability of the Tic-Tac-Toe
     opponent, or the rigid mindlessness of the rule-bound Blackjack dealer? Somehow
     we have to be able to create the mind of the Poker player. We have to approach the
     game from the inside of that Poker player’s psyche.
          We have to embody that soul with the ability to perceive the world in terms of
     relevant, not relevant, interesting, dangerous. We have to give him a way to concep-
     tualize more than just “right or wrong,” but rather shades of differentiation: better,
     worse, not quite as bad as. We have to create for him a translation mechanism
     to our own thought processes. And it is this last part that is the most difficult. To
     do that, we have to do so in a language that both of us can understand, yet one that
     is robust enough to convey all that we perceive and ponder. And that language is
     thankfully one that computers know best; in fact, it’s the only one they know—that
     of mathematics.
                                    Chapter 1 Why Behavioral Mathematics?        13

    The trick is, how do we convert behavior into mathematics? It isn’t quite as
simple as the what-you-see-is-what-you-get (WYSIWYG) model of the artists.
(“Make the blue floor in the shadow darker by making the blue darker.”) There is
no simple translation from behaviors into numbers and formulas. (For what it’s
worth, I already checked AltaVista’s translation tool, Babel Fish. No help there!)
Certainly, researchers have been taking notes, conducting surveys, and accumulat-
ing data for ages. That doesn’t help us to model some of the behaviors that as game
developers we find necessary but are so second-nature to us that no researcher has
bothered to measure them.
    So we are on our own—left to our own devices, as it were. The best we can do is
to observe the world around us, take our own notes, and conduct our own surveys.
Then, using tools to make expression, calculation, and selection simpler, we can at-
tempt to create our own interface into our hat-wearing Poker player so that he can
proceed to make “interesting choices” as our proxy.
    It is my hope that, through this book, I will be able to suggest to you how to
observe the world and construct those mathematical models and interfaces for
decision making. Will your AI pigs become model swine in all areas of their lives? I
don’t know. That’s not entirely up to me. However, that result is hopefully going to
be far better than if I were to attempt to paint one for you.
This page intentionally left blank
2             Observing the World

       he most important skill to develop that will help you in constructing reason-
       able behaviors is something you will not learn how to do in this book. You
       must, of course, know what it is you are trying to re-create. If you are at-
tempting to create a behavior for an entity that is not familiar (space aliens come to
mind, although there are people who claim to be subject matter experts), you are at
least going to want to find something that is similar to base your models on. Once
you know what it is you are trying to re-create, you need to be able to observe what
that entity generally does.
     If you have taken a biology class in school, for example, you may have created
what is often referred to as an “observation log.” This is really not much more
complicated than it sounds. At my high school, our biology teacher had a pet owl
in the classroom. When it came time to do the observation log, the students would
grab pencil and paper and, for 45 minutes, watch the owl and make a note of every-
thing it did. These observations were generally very simplistic (apparently owls don’t
do much when sitting on a perch in a school classroom). They made notes such as:

    Turns head left
    Turns head center
    Ruffles feathers
    Repositions right foot

16    Behavioral Mathematics for Game AI

          Certainly, there doesn’t seem to be a lot we can glean about the behaviors of
      owls from observations such as these. Yes, we have compiled a list of possible things
      that an owl could do at any one point in time, but it doesn’t give us a lot of infor-
      mation about when and why.
           Often, the secret to the conceptual side of behavioral mathematics is to sit back
      and try to see what is really going on. Things are not often as clear-cut as they seem
      on the surface. Because of that, there is a layer of abstraction that is less visible but
      far more powerful. What sounds cause him to turn his head? Perhaps the owl ruffles
      his feathers only when the fan turns on? Is there a pattern to the blinking? What is
      he thinking when he moves that foot? Is he even thinking about it at all? Does he say
      to himself, “Now I’m going to move my foot”? Probably not. But there is something
      going on inside that owl’s head on either a conscious or unconscious level, and
      the outward signs of that thought process are what makes our owl look and act like
      an owl.
          What’s more important is comparing the actions of one owl to those of an-
      other, or comparing the actions of the owl in one situation to his actions in another.
          The complexity that differentiates one Poker player from another is not in the
      absolute linearity of processing mundane probability statistics, but in the subtle
      nuances of human behavior. Many of the things we do that set us apart from one
      another are unconscious—or at least born from beliefs that are, themselves, com-
      plicated in their origins.


      It isn’t enough to simply observe actions and reactions. We must identify those
      actions for what they are. We must have an explainable correlation between cause
      and effect. The difference is similar to the difference between hearing and listening.
      We hear things all around us at all times, but often we don’t actually listen to what
      we hear. When we actually listen to things, we separate out distinguishable sounds
      from the aggregate sonic disturbance in the background. The same can be said
      about hearing a person speak but not necessarily listening to what he is saying—a
      staple of marital discord. (So I’ve been told. I always listen very attentively to what
      my wife says!)
           The point is, it is only when we separate different observations from the back-
      ground that we can put them into their proper local context. At that point, by iden-
      tifying them, we have taken a necessary step toward isolating them for further
      analysis. Because there is no shortage of things to observe and identify, the trick is
      to determine which of those items are important enough to proceed with.
                                                  Chapter 2 Observing the World          17

     Surprisingly, this process is a little dichotomous for us as humans. Some things
jump right out at us as important. On the other hand, most creatures (humans
included) have a little evolutionary quirk called latent inhibition built into us as
well. Latent inhibition is a filter system. On an instinctual level, it allows animals to
observe the world and classify things as either important or unimportant to sur-
vival. If something is deemed to be unimportant, the mind no longer pays attention
to it. That way, we don’t waste precious clock cycles in our brains reprocessing
something that has already been classified as irrelevant.


   It has been observed that people with lower levels of latent inhibition—that is, those
   who do not shut out their environment as much as others do—tend to be more creative
   than those with normal levels of latent inhibition. The theory is that these people do
   not dismiss things as irrelevant as quickly as do their peers. Instead, they keep com-
   ing back to them and, in doing so, may be better able to understand not just what the
   subject of the observation is but how that subject fits into the world in a relative
       While this seems like a desirable trait to have in a creative endeavor such as game
   design and development—and, admittedly, it does come in handy—lower levels of latent
   inhibition are also strongly correlated with psychosis or even diagnoses of mental
   illnesses such as schizophrenia, attention deficit disorder, and bipolar disorder.
       Be careful what you wish for.

    However, in some pursuits, this time-saving device gets in the way of doing im-
portant work. Sometimes, in our process of observation and investigation, we need
to consciously suppress our latent inhibition (which makes for an interesting double
negative, doesn’t it?). By doing that, we can reexamine things that are in our envi-
ronment. Often, the most interesting and relevant ways our world works are things
that we take for granted—right under our proverbial noses. It is through suppressing
our latent inhibition that we can see enough detail to begin to sketch out the struc-
tures of the decision-making process.

Running the Fifth Grade
A number of years ago, when my daughter Kathy was 10, I was treated to a 20-
minute dissertation in which she explained her entire reasoning for running for vice
president of the fifth grade at her elementary school. First, she explained to me why
she was running for the number-two spot. Her reasoning was that, since everyone
18   Behavioral Mathematics for Game AI

     wanted to be president, there was significant competition for that spot. There were
     only a few candidates for VP, however (Figure 2.1). I probed her somewhat on this
     decision, asking her why she didn’t want to be president instead of only vice presi-
     dent. After all, the president would have more power (for what that’s worth in the
     fifth-grade student council). She firmly explained that she would have no power if
     she didn’t get elected at all. She also noted that the difference between the two po-
     sitions was rather negligible. By being elected VP, she would be able to help advise
     and guide this august governing body. What we will explore later in this book is the
     problem that my 10-year-old not only ascertained intuitively but was able to explain
     in age-appropriate, nonmathematical terms—that a good chance of getting less is
     better than a poor chance of getting more.

          FIGURE 2.1 All other factors being equal, there is a 10% chance of becoming
              president and a 33% chance of becoming vice president. Running for
               VP offers a better possibility of being elected to the student council.

          Another observation that she made was her perceptions about why people were
     supporting the candidates that they were—even prior to the speeches. She told of
     two phenomena that she had witnessed and identified. First, she talked about how
     the people who bothered to put up posters and hand-make stickers got a lot of the
     buzz. She thought it was silly that people would support someone simply because
     they had more posters (or cooler posters, I suppose). However, she was aware of
     how the process was happening. If people didn’t know about a candidate, there was
     no one to support that candidate. What’s more, the more posters and stickers that
     a kid had up, it would broadcast—in a simplistic sort of way—that the kid was
     actually serious about running for office. Again, this had nothing to do with issues…
     just a matter of exposure.
                                              Chapter 2 Observing the World        19

    The second phenomenon that she noted was the “power of the group.” Kids
who were on the fence about a candidate could easily be swayed simply by the fact
that someone they knew was supporting a particular person. The more people a kid
encountered (especially in their own social circle) who supported a particular can-
didate, the more legitimacy was lent to that candidate (Figure 2.2). Kathy explained
that it was just like the exposure via posters and stickers, but was more influential.
While you may see and remember a poster, you trust your friends.

 FIGURE 2.2 The larger the network of kids exchanging their opinions on a candidate,
the more “information momentum” is created and reinforced in that group. This leads to
                       more people supporting that candidate.

     Even more powerful, she continued, was the fact that you didn’t want to seem
like you were going against your friends. Fear of social reprisal was a far more
important (if insidious) reason to support a candidate. You might not support a
candidate about which you had no opinion specifically because someone you didn’t
like was doing so. Despite the fact that she was appalled at this mindless peer
pressure, she expressed with no small amount of mirth that often this propagation
was bidirectional. Other than a subtle flow downstream from the “cool kids,” you
20   Behavioral Mathematics for Game AI

     couldn’t really tell which kid was the original one to support a candidate. Each one
     was doing it because they thought the other wanted them to. It just sort of hap-
     pened as a group. Again, my daughter, in all her wisdom, had grasped an important
     concept. She had observed and attempted to codify the complex rules of informa-
     tion and influence propagation across the social network of a diverse group of
          One last weighty issue (which was actually the reason for her phone call that
     day) was the consternation she felt about her campaign speech. Kathy was torn
     between whether to deliver meaningful, adult-sounding content or make more
     exciting, kid-friendly promises. She felt that she needed to include the relevant,
     responsible ideas she would bring to the student council. On the other hand, she
     feared that this sort of material would come across as boring and uninspiring to her
     peers. She knew she needed to mix in some kid stuff such as promising to push for
     ice cream on Fridays and what-not. (She was honest enough not to promise to ac-
     tually deliver on these issues, but only to do her best to get them enacted.) With that
     approach, she knew she could inspire her juvenile electorate. Of course, going solely
     with that approach had its own caveats in that she feared that she would sound just
     like the other candidates or not be taken as someone who was serious about the job.
          In the end, she decided on a compromise, selecting from both the more adult-
     like topics and the more kid-like ones that would be important to the people who
     would possibly be voting for her. She took a similar mindset that, if she didn’t appeal
     to the voters with the happy-happy stuff, she would never get the opportunity to
     bring up what she felt needed to be addressed. (As a note, I’m writing this part of
     the book in the thick of the campaign season for a U.S. presidential election. It
     makes reminiscing about my daughter’s “substance vs. fluff” dilemma all that more
     amusing.) She had analyzed the possible mindsets of her audience and selected a
     mixed strategy that balanced the needs of the fifth grade, the necessity of getting
     elected to satisfy those needs, and her own desire to be intellectually honest. She
     knew that to craft her speech appropriately, she would need to get into their heads
     and predict their reactions.

     Observations without Order
     All of these issues were rattling around in my daughter’s brain that week. She had
     observed and analyzed the relative odds of the positions of president and vice pres-
     ident. She had observed the results that an increase in the numbers of posters and
     stickers could achieve. She had observed how opinions could be swayed by sec-
     ondary factors such as the opinions of others in proximity and the social pressures
     that are endemic to a school environment. And my poor, conflicted offspring
     observed that, in some contexts, an idea without substance could actually have
     more influence than something concrete.
                                               Chapter 2 Observing the World        21

     What she lacked was a way of putting this all into the proper framework and
perspective with the appropriate weights and measures. To select the proper ap-
proaches to these problems, she would have had to accurately weigh how each
factor affected the other children. For example, while she knew that posters and
stickers were valuable for name recognition, she did not know whether those meth-
ods were more or less important than having an “in” to the right social networks.
She didn’t even have a way of quantifying the relative merits of posters vs. stickers.
Was one poster worth ten stickers? Twenty? Only five?
    She also had no way of assigning values to the general categories of serious and
childish issues, much less the value of each individual topic. Simply put, she needed
to crawl inside the brains of typical fifth graders and not just determine what was
important to them on both conscious and subconscious levels, but express how
important each factor was relative to the other. If she had been able to do this bit of
mental wizardry, she would have been able to then construct a mathematical model
that informed her about what decisions she should make.
    Of course, she didn’t do this detailed analysis in such an open, scientific way.
Much of it was based on things she would intuit simply from observing the goings
on around her. After all, she’s only a 10-year-old girl (although a remarkably
insightful one). Whether the reasons for behavior are conscious or unconscious,
however, there are often ways of probing that layer of abstraction to see what is there.
Most of it is based on the idea of determinism—or cause and effect. Taken as a whole,
the hundreds of causes producing hundreds of effects can be a bit daunting—even
to the point where the fact that there is a direct link between things is completely
obscured. Observed and described individually, though, we can start to see how
they all work individually yet combine to work together.
     Evidence of these sorts of effects is all around us. The challenge is to take these
observations and convert them into numbers and formulas that we can use. This is
at the heart of the process we must use in crafting behavioral game artificial intelli-
gence (AI). To create an agent that would make the decisions (such as my daughter),
we must also be able to model other aspects of the world (such as her peers). Each
of the components above shows an individual system that could be modeled. When
you put them together, you get bigger and more complex decisions.
    In the end, Kathy plotted her course with noble intent and shrewd political
savvy yet came up short of the coveted office of fifth-grade vice president. After all,
she was working uphill… she wasn’t one of the “popular kids” and she refused to
promise “ice cream Fridays.”
22    Behavioral Mathematics for Game AI


      Large decisions such as the ones that Kathy faced in her quest for fifth-grade total-
      itarianism can be daunting. What’s more, the relationship between the various pieces
      and parts can be obscure and intertwined. How, then, do we approach unwinding
      this mass of threaded relationships until we have broken them down to their
      component parts? Once we have reached a reasonably atomic level, we can begin to
      make judgments about just that one item. The smaller the decision, the fewer factors
      are involved in analyzing it. The fewer the factors, the easier it is to see how they
      interrelate. Therefore, the trick is to start small… and then simply ask, “What is

      The Layer between Supply and Demand
      If I may play the role of the proud father once again, I will relate what my youngest
      son, Aaron, surprised me with the same time as Kathy’s socio-political epiphany.
      Aaron is a little more than a year younger than his sister and was, therefore, nine
      years old when this lightning bolt struck.
           We were playing Zoo Tycoon together (which I whole-heartedly recommend as
      a teaching tool for elementary school children). We had been working on a new zoo
      for a while and had a handful of exhibits up and running. Aaron, pausing in
      thoughtful analysis commented, “Dad, we’ve got some interesting stuff for people
      to see now… and there are more people coming into our park. We should think
      about raising the park admission price. That way we would have more money to
      build with.”
          Eager to turn this into one of those vaunted “teaching moments,” I asked him
      how much he thought we should raise the admission. He sat back slightly and nar-
      rowed his eyes in thought and then gave a response that seemed very odd coming
      out of a fourth-grader. The following is probably an over-dramatized recollection
      of the exchange due to my unconscious desire to make my kid look brilliant.
      However, to the best of my knowledge, it is pretty darn close to what he really said
      that day.

      Aaron: “Well… I’m not sure. But we can’t raise it too much.”
      Curious Dad: “Why not?”
      Aaron: “Because if we raise it too much, we could actually lose money.”
      Astounded Dad: “Really? Aren’t we raising it so we can make more money?”
      Aaron: “Yeah… but if the price is too high, then less people will come. We will
      make lots of money off the ones that do come in, but we won’t get the money of the
      people who decide not to come in.”
                                                  Chapter 2 Observing the World          23

Amazed Dad: “So how do we know what price is right?”
Aaron: “Well… we should raise it a little at a time and keep watching to see how many
people are coming in. If less people come in, then we know that it was too much.”
Proud Dad: “Son, you just expressed a concept that many 40-year-olds don’t

    Aaron had made the observation that he wasn’t operating in a static environ-
ment. His prospective customers may react to his decisions. Therefore, to make the
correct decision, he needed to be able to somewhat predict what their behavior
would be. What Aaron needed to determine was what the interface was between his
decision and the reaction of the guests. There must be a relationship that can be fig-
ured out that would predict, within reason, their behavior. Of course, we all know
that what he was expressing was the connection between price and demand. If
something costs little, it is more attractive; as the price increases, it is less attractive.
     If that is put into a consumer model (and additionally considering that differ-
ence in consumers’ ideas as to what the “right price” is), you will find that a rising
price will satisfy fewer and fewer people. Once the number of willing buyers falls to
a certain point, it doesn’t matter what we charge them; we won’t be making as
much money as before. Taken to the extreme, you could charge one million dollars
for a zoo pass, but you probably won’t get many takers. Alternatively, if we try to get
more customers by cutting the price, we could reach a point where we are getting
lots of business but not making any money. Again, the extreme end would be mas-
sive numbers of guests for free. There is a businessman’s joke that says, “We’ll make
it up in volume.” Uh… no you won’t.

Turning Human Decisions about AI into AI Decisions
Interestingly, as this was a computer game, and the park guests in Zoo Tycoon did
react appropriately to changes in price, we know that some clever designer had
actually created a formula for that behavior. As humans playing the game, our job
was to determine what this formula was. We know that attitudes would move in a
certain direction, but we didn’t know how much. As I mentioned, my son’s approach
was to move the price a little at a time and observe the changes in behavior.
Eventually, we would be able to determine the right price for our zoo admission.
(Of course, since our zoo was constantly changing in what it offered, the right price
was a moving target—a sign of a good algorithm design. More on that later.)
    However, in game AI, the roles are slightly different. The decision that we hu-
mans were making is one that could be applied to an AI agent such as a shopkeeper
in a role-playing game (RPG). What is the right price to charge for a particular
item? If all else was equal, the role of zookeeper would have to do just as we did—
take into account the unknown formula that went into crafting the decision model
24    Behavioral Mathematics for Game AI

      of the guests. Rather than simply guessing (which is not a very suitable task for
      computers), we would be tasked with determining what formula(s) we would use
      to take into account what the other agents would be doing.
          It is in that neutral territory that behavioral mathematics comes into play, and
      likewise, where we need to use observation and a little inference to construct those
      behavioral algorithms properly to adequately fill that gap.


      Converting ideas into numbers alone might not be as hard as it seems, however.
      Quantifying things not just in ordinal terms (e.g., A is greater than B) but in pre-
      cisely relative terms (e.g., A is twice as big as B) nudges us closer to the realm that
      computers can use to calculate and therefore closer to what we can use in game AI.
      In this case, we must ask another question. Just as the prior query was “What is im-
      portant?” this one could be phrased, “How much more important is it?”

      Counting on the Edge of a Razor
      I will illustrate with an example from my own life that came as something of an
      epiphany to me—and, given the circumstances, was a sure sign that I needed to step
      away from AI development for a day or two. It came while I was shaving.
           If you happen to use refillable razor cartridges as I do, they may come in a plas-
      tic container of five cartridges (Figure 2.3). If I were to ask you the odds of any of
      those particular cartridges being in use, you would likely give the mathematically
      simple and straightforward answer of one in five—or 20%. That would certainly be
      reasonable to assume. However, there is a subtle difference to be illustrated here. If
      you were to randomly select from my five cartridges, each of the five would have a
      20% chance of being selected by you. Likewise, if you knew nothing else of the
      situation, a random selection that you made would have a 20% chance of being
      the correct cartridge. However, that doesn’t mean that each cartridge has a 20%
      chance of being in use at any given moment.
           What if I told you that I always use the cartridges in order, from the top one in
      the container down to the bottom one? Would that change your decision? Why?
      After all, you were not asked the order in which I use them. You were not asked to
      guess which one I would use next based on which one I was currently on. You don’t
      know how long ago I purchased the refill pack. You don’t know for how long I typ-
      ically use one blade.
                                                Chapter 2 Observing the World          25

                   FIGURE 2.3 All blades may be created equal, but
                       that doesn’t mean they get used equally.

     But wait… that last item is relevant, isn’t it? If you use those refill packs, or any-
thing similar for that matter, you may immediately recognize the point. For many
of us, a distinct pattern of behavior surrounds those blades. When I first open a new
pack and click the first blade into place, there is a quiet, almost subconscious oath
to myself. “This is a new pack. I’m going to make it last.” That mindset lasts for a
while until finally, in an act of self-preservation, I have to switch to the second
blade. Once I begin using that second blade, the mystique of the “first blade” is
gone. I may as well change as needed. My diminished regard for conservation con-
tinues through the second, third, and fourth blades. The other end of the tray leads
to a similar thought process as the first, however. This one is a bit more conscious
on my part—and therefore has a bit more weight. That mentality is “This is the last
one in the pack. I should make it last before busting open a new one.”
     If this subtle, but noticeable, shift in my allegiance to certain blades is taken
into account, there is no longer a blanket assumption of equal usage that underlies
the earlier premise. If we know the order of the blades, we can suggest that there is
a slight bias toward the first and last blades in the pack—more so toward the last
one. Therefore, rather than a straight 20% across the board, the distribution of
usage may look more like Figure 2.4.
    Notice that there is a slight bias toward blade 1 and a slightly larger bias toward
5. Obviously, the other three come in slightly under their starting point of 20%. The
numbers I selected are by no means scientific. In fact, I have no idea if they are ac-
curate about my blade usage or not. (In my less-funded college days, blade 5’s usage
would have been significantly higher due to “make it last because I can’t afford to
buy a new one.”) If I were to use this hypothesis as a model for simulating my true
blade usage, however, I believe it would be more accurate than a simple 20% across
the board. In any event, it would certainly be more interesting.
26   Behavioral Mathematics for Game AI

                      FIGURE 2.4 There is a bias toward using the first and
                         last blades in a pack longer than the other three.

          Just as my relatively lean college days led to a more significant reliance on the
     final blade, other people could have other patterns. Someone who is better off may
     not be concerned about moving on to the next refill tray, thereby not sticking with
     that fifth one long enough to risk a medical emergency. Someone who has a specific
     weekly pattern of usage may find he is always on the third blade before his “big date
     night” and will switch to the fourth to get that “clean, close shave” the commercials
     try to tell us is a strict requirement for actually being male. (I admit that I’m reach-
     ing a little bit on examples now.)
         Now, razor blade usage patterns may not seem terribly relevant to games, but
     the illustration is more of the thought process involved than it is of the specific
     example. There are differences in how a given individual approaches each blade, and
     there are differences in how multiple individuals approach the whole blade-usage
     predicament. The point is that many typical behaviors can be quantified this way.
     What’s more, the quantification can be done in such a way as to emulate the depth
     and personality of the subject.

     Turning Observations into Numbers
     Observing the distribution of possible choices is only the first step, of course. AI
     agents are there to act, not analyze. However, from a conceptual basis, the transi-
     tion is not difficult. Using the razor blade example, if we proceed with the premise
     that the figures in Figure 2.4 are accurate usage percentages, it would be reasonable
     to assume that those same percentages would apply to the likelihood of me using
     any one of those five blades at a particular time.
         So, if I were faced with an entire tray of five blades that I had used before, and
     the order has been preserved, you could make an educated guess as to which blades
     I would be likely to select.
         In this case, we know that blade 5 is likely to be used the most, followed by blade
     4. Blades 2 through 4 are the least used of the batch and would therefore be the
     most attractive to me. You could make an assumption based on these facts that
                                                Chapter 2 Observing the World         27

might look like Figure 2.5. At first glance, the distribution of the percentages looks
fairly inverse of the spread in Figure 2.4 (the probability of any particular one being
in use). However, is that really what we are trying to accomplish?

             FIGURE 2.5 The relative attractiveness of each blade is based
                           on how long I likely used it.

Making a Decision Based on the Numbers
Since I am aware of my proclivity toward over-using blades 1 and 5, if for some
reason I have to go back and select a blade to reuse, I will make a conscious choice
of selecting from blades 2, 3, and 4 (Figure 2.6). It is statistically more likely that
those are less-worn, therefore making them more attractive to use. On the other
hand, knowing that I probably used blade five to the breaking point, I’m not going
to be terribly likely to even consider it. That much is obvious. However, despite the
fact that blade 1 is probably in far better shape, I’m still going to be quite certain
that it is worse than any of blades 2, 3, or 4. Why would I even consider blade 1,
even though it may be in reasonably serviceable condition? It is far more likely that
I would choose from that middle trio. The result of this is that the odds of selecting
blades 1 or 5 are pretty much nonexistent. The probabilities of blades 2–4 climb
     Notice the subtle difference between the underlying logic of the numbers in
Figure 2.5 and Figure 2.6. The former was simply a relative inverse of the usage sta-
tistics. The latter took into account an intelligent decision that I, as the shaver, would
have made because I knew about the scenario that created the data in Figures 2.4
and 2.5. Despite the relatively small amount of difference between the conditions of
the blades, I knew that if I had to choose one that was in better shape, it would have
to be drawn from the three in the middle.
    Figure 2.4 is based on observation. Figure 2.5 is based on inferences from the
data in Figure 2.4. Figure 2.6 is designed to model an intelligent decision that could
be made from the inferences in Figure 2.5. I will cover ways of modeling these types
of decisions throughout this book.
28    Behavioral Mathematics for Game AI

                FIGURE 2.6 Based on the what I expect the conditions of the blades
               to be after using them before, which blade might I select to use again?


      As we have laid out, to be able to craft our pretend worlds, we need to be able to ob-
      serve and understand what is going on in our real worlds. Much can be discovered
      simply by observing what happens around us. We can identify the relevant factors
      and relationships—sometimes intuiting and inferring things that are not readily
      apparent. Most importantly, we must think about this information in a manner that
      allows for quantification. In some cases, this is simple. For example, my daughter
      could certainly make the claim that making three election posters was better than
      making only one. At other times, however, making direct comparisons is not pos-
      sible. Could my daughter Kathy make the same claim that making three posters is
      the same as making three stickers? What is the difference between three posters and
      three stickers?
          Even more difficult to ascertain is the relationships between items that cannot
      be counted. How much is it worth to have one popular kid evangelizing about your
      campaign for president of the fifth grade? How do we put a number on that? How
      do you put a value on promising ice cream Fridays?
          And yet, if we are to construct an algorithm to make decisions about such
      things, we need to condense all this information into a form that can be digested
      by those algorithms. We need to be able to put these very issues into the language
      of mathematics and numbers. After all, when it comes to computers and the algo-
      rithms they so happily process, the power is in the numbers. Without the algorithms,
      we cannot make our decisions. Therefore, we need to somehow take these observa-
      tions and comparisons about the behaviors we witness and turn them into numbers
      and algorithms.
          Conveniently, that’s our next stop…
3             Converting Behaviors
              to Algorithms

        ifferent fields of work have their tools. Some are the typical physical tools
        such as the hammers and nails that carpenters use. Some are more concep-
        tual such as a standard business process in the business world. In the game
industry, artists have their standard tools as well. Some are physical such as model-
ing and animation software packages, and some are conceptual such as the idea of
using the texture-mapped triangle. In my view, however, there is an intermediary
between a pure tool and the end product.
     A writer, for example, uses tools ranging from the word processor to pen and
paper to the venerable quill pen and stone tablet. There are more abstract tools as well.
The formalized rules of sentence structure and verb conjugation provide guidelines
for what the reader expects from the writer’s method of delivery. The words he uses,
however—the fine points of the language he chooses to express his thoughts—are
the intermediary. It is through the selection of words that the idea is articulated.
    A classical artist uses brushes as his tools of choice. At first glance, one would
label the paints as tools, too. I would contend that the paints are more of a language
of expression—the liaison between the pure tool and the end product. There are
suggestions as to what paints to blend to achieve certain colors and the proper way
to mix them for various effects. However, it is up to the artist to decide how to
achieve what he is attempting to portray. The paints are what allow the expression
of creativity. They are what give variety and depth of character to the work.
     Game artificial intelligence (AI) has its standard toolbox. Many algorithms and
data structures are repeated over and over throughout all the games on the shelf. In
fact, given the wide range of in-game behaviors that are exhibited, there are actually
startlingly few methods on the list. In this case, a technique such as A*, a behavior
tree, a state machine, or even a planning algorithm is a raw tool. Left to their own
devices, however, those algorithms don’t create behavior any more than words
alone create a soliloquy or paints alone create an image of a supine swine. The nuance
isn’t in the method itself; the depth of behavior comes from the data that is inserted
into the algorithm.

30    Behavioral Mathematics for Game AI


      The finite state machine (FSM) depicted in Figure 3.1 is an example of something
      that is often seen in games. This sort of linearity of action is evident—and even
      common—in the most rudimentary game AI. For instance, an enemy in a first-person
      shooter (FPS) or a role-playing game (RPG) may start out in an idle state. Once a
      criteria is established, such as the player entering a certain radius, the enemy
      changes to an approach state. This is similar to the transition from state A to state
      B in Figure 3.1. There is no other option. If the player doesn’t approach, state B is
      never entered. Continuing on, once the player is within range, the enemy may
      switch to state C, attack. If it never reaches the player, there is no other option; it
      stays in state B. Likewise, state D could represent “die,” the transition triggered by
      its health reaching zero. If the health never reaches zero, it doesn’t die (and keeps
      attacking). There’s not a lot of AI going on here.

          FIGURE 3.1 A very simple FSM with one static transition to and from each state.

           The fault is not in the FSM, itself, however. After all, a state machine is a tool.
      As simply a tool, there is very little depth to it. In fact, with the assumption that sim-
      ple logical triggers are behind the transitions (the arrows in Figure 3.1) from one
      state to the next, there is an implication of rigid predictability. If you know those
      transition cases, you know the next state. In the case of the FPS enemy, eventually
      a player will figure out that entering a certain radius triggers the transition from idle
      to approach. There is no variation whatsoever, and, accordingly, there is not a lot
      of drama. It is reminiscent of the predictability of Tic-Tac-Toe. “If the player can
      win next turn, block his movement.”
           By comparison, the state machine in Figure 3.2 provides something subtly
      different. While there are four states, just as in the one in Figure 3.1, state A can
                                Chapter 3 Converting Behaviors to Algorithms          31

transition to states B, C, and D. Even given that expansion of potential “next
moves” gives a small increase in the depth of the behavior.
     For example, let’s say state A is still the idle state of our FPS enemy. However,
states B, C, and D are three different methods of attack. Perhaps B is still approach
with the hopes of engaging in melee once the enemy arrives within range. State C,
on the other hand, could be a ranged attack of some sort, and state D could repre-
sent casting a powerful spell. Now, as the enemy changes from idle, we may not
necessarily know what is going to happen next.

     FIGURE 3.2 State (A) in this FSM could transition to any of three other states
        (B, C, or D) giving more than one possibility of what could happen next.
          The reasons those transitions occur are not defined but could simply
                           be concrete rules or random chance.

     However, if the transitions themselves were rigidly defined, there would still be
predictability that would not take long to discern. For example, in the diagram,
condition X leads to state B, condition Y leads to state C, and condition Z leads to
state D. If the player can determine what conditions X, Y, and Z are, then the pre-
dictability of the agent has returned. Maybe conditions X and Y are triggered based
on whether the player has a melee or ranged weapon out at the time—with the
enemy responding in kind. If the enemy selects “approach” (state B) every time the
player has a melee weapon out (condition X) and “ranged attack” (state C) every
time the player brandishes his own ranged weapon (condition Y), then things are
going to get mundane quickly. In this case, the transitions are reminiscent of the
earlier explanation of the rules the dealer has to follow in Blackjack: If my cards are
less than 17, hit; if they are 17 or greater, stand.
    There is one more addition that would add some depth of character to this state
machine. This addition would provide for the unpredictability that is reminiscent of
the Poker player. One way of creating this sort of variety is to make the conditions
32   Behavioral Mathematics for Game AI

     themselves somewhat fuzzy. Using Figure 3.3 as an example, if there were percent-
     age chances of transitioning to each of states B, C, and D, then a specific “next
     move” could be nailed down.
         Using our FPS enemy example, perhaps the only trigger for a transition is still
     a radius, but the next state (i.e., approach, ranged attack, or spell) is based on the
     percentages established by X, Y, and Z. That way, while the player would know that
     the enemy will no longer be idle, there is no way to predict what is going to happen
     next. The player has to be prepared to react to each eventuality.

                 FIGURE 3.3 The transitions from state A to states B, C, and D are
                controlled by the percentages expressed by X, Y, and Z, respectively.

          What’s more, those percentages’ chances can be dynamic as well. Rather than
     the X% chance of transition to state B always being the same value, we could allow
     it to change based on other factors. The result is that we may have a vague idea of
     tendencies, but there is always the potential for surprise. It is that unknown that
     lends character to our characters. Without that, our characters are automatons with
     the predictability of mechanical clocks.
          The challenge—or to continue my metaphor, the artistry—comes through the
     process of building those fuzzy conditions. Just as the mixes of colors on an artist’s
     palette are infinite, there are truly infinite possibilities on how to construct mathe-
     matical models. The models can be realistic, caricatures, or completely abstract. (My
     pig was an attempt at realism but ended up more as a caricature.) Even in the realm
     of realism, there are plenty of possibilities of how such realism can be represented.
         In this case, the brush is an FSM, and, just like a paint brush, left to itself, it is
     largely incapable of providing the expression that we desire. By adding the mathe-
     matical models to the relatively expressionless FSM, we can paint a picture for our
     player—one that has the nuance that presents our vision in a robust, detailed, im-
     mersive way.
                                       Chapter 3 Converting Behaviors to Algorithms           33

          Just like the Poker player, our agent is now making “a series of interesting
      choices.” And as the player experiences and interacts with our agent, he will likewise
      be able to make “a series of interesting choices.” According to Sid’s premise, at that
      point we have a game.


        FSMs have long been a staple of game AI. However, as behaviors grew more complex,
        the numbers of states grew as well. Accordingly, the number of potential transitions
        between the states grew at an exponential rate. The complexity of managing these
        states and their transitions began to work against the simple efficiency that was orig-
        inally the attraction of the FSM.
            One of the criticisms of FSMs was that they were rigid and predictable. I don’t
        completely subscribe to that theory. FSMs still have their place in game AI. Dismissing
        them outright is to say that the hammer is not a useful tool because it can’t perform
        all possible construction tasks. However, often they are matched with other, more
        expressive structures where they can play a supporting role. In fact, by utilizing some
        of the techniques in this book, you can expand and extend FSMs to a great degree
        while managing some of the increased complexity.


      For the numbers that we use to make decisions to be meaningful, they must repre-
      sent something. If the transition percentages in Figure 3.3 were simply random,
      they wouldn’t represent anything more than their own randomness. They wouldn’t
      “stand for” anything. At that point, the effect of using them to guide the agent from
      state A to states B, C, or D would be lost in a chaotic shuffle.

       IN   THE   G AME    Know When to Walk Away, Know When to Run

      To make those transitions mean something in a behavioral context, the numbers
      themselves need to be constructed so that they represent something meaningful in
      that same context. For example, if states B, C, and D were replaced by the three
      actions attack, hide, and flee, respectively (Figure 3.4), we would need to find a
      meaningful parameter (or collection of parameters) that would be relevant to the
      decision. Depending on the type of game and the data available, we may choose
      something as simple as the agent’s health.
34   Behavioral Mathematics for Game AI

           FIGURE 3.4 The transitions from the idle state to the attack, hide, and flee
          states are controlled by the percentages expressed by X, Y, and Z, respectively.

         The example in Figure 3.5, shows three different definitions of the state that our
     agent’s health could be (good, damaged, and critical). For each of those, there are
     three percentage values for each of the three actions (attack, hide, and flee). Part of
     defining those behaviors would be assigning those values. When our agent is in
     good health, how often do we want him to attack? To flee? What about when the
     agent is damaged? Should he ever attack when he is in a critical state of health? At
     that point, we are defining when those actions should be taken.

         FIGURE 3.5 Definitions of the transition percentages (X, Y, and Z) to the attack,
      hide, and flee states based on whether the agent’s health is good, damaged, or critical.
                Note the arrow showing the approximate trend from attack to flee.

         Of course, we would have to define what good, damaged, and critical health
     means. Does “good” mean 100%? Anything above 80%? 75%? That is yet another
     definition that must be made. The algorithm that processes the decision itself does
     not change. What is adjusted is the way the numbers are arranged. And this is
     where most of the subtlety comes into decision making.
                                 Chapter 3 Converting Behaviors to Algorithms        35

    To make even more robust algorithms, we may not use any categories at all.
Instead, we could build a formula that takes into account the health value and con-
verts it somehow into various percentages that we will then use to make our decision.
For example, we may elect to use the following formulas to create our transition

     At any given moment, we could use the agent’s current health value to calcu-
late the three percentages that we need to decide what the agent is going to do next.
(Actually, we only need two, as the third option would be in effect if the first two are
not selected.) Graphing those three formulas, we would get the results in Figure 3.6.

          FIGURE 3.6 Based on the formulas shown, the percentages for the
         state transitions (y-axis) to attack, hide, and flee change automatically
                          as the agent’s health (x-axis) changes.

    As you can see, the numbers follow a pattern similar to the static figures we
created in Figure 3.5. We start with about an 80% chance of attacking and small
chances of hiding of fleeing. As health reaches zero, the chance of attacking
approaches 10%, and that of fleeing arrives at 63%, with hiding coming in at 27%.
36   Behavioral Mathematics for Game AI

     Somewhere in the middle, we pass through values similar to what we had initially
     defined for the damaged value.
          By changing the formulas we can tweak the behaviors to tailor them to what we
     are trying to accomplish. For example, I decided that, even if the agent is about to
     die, there should still be a 10% chance that it would attack. Depending on the
     behavior I was trying to emulate, this might be unreasonable. A different formula
     would yield something entirely different. Let’s change the attack formula to the

         We now have a result in which the initial attack percentage is about the same
     (80.75) but falls to zero as the agent’s health gets to 15%. Below that, the only op-
     tions are to hide or flee with the appropriate percentages. (Note that we would have
     to be careful to not use the negative numbers generated by this formula in the cal-
     culations of our hide and flee values.) This change leads to a completely different
     behavior for the agent—and a very different experience for the player. As before,
     the player will see severely wounded agents using the various retreat behaviors
     more and more often, but they will never see a critically wounded agent attacking.
          Using a formulaic approach leads to smoother changes in values and, if con-
     structed properly, is often very computationally efficient. There are also plenty of
     variations that we can apply to this approach. Rather than simply linear formulas,
     we can use exponential, logarithmic, or even complex polynomial expressions to
     define our behaviors. Each of these types of lines and curves has its own character-
     istics and, therefore, can be used to create very different behavioral responses.
          On the other hand, we experience some loss of control by putting things en-
     tirely in formulaic terms. Notice that in the original three-column version (Figure
     3.5) there was a subtle rise in the hide value when the agent was in the damaged
     state. Using our strictly linear approach (Figure 3.6), we lost that subtlety. In fact,
     it may have been difficult to find an appropriate combination of formulas that sat-
     isfied exactly what we wanted. If a formula makes things too sterile, a hand-crafted
     approach may be more appropriate.
         There is generally no “right answer” to which approach to use. The number of
     possibilities is, for all intents and purposes, infinite. The trick is to select one that
     best approximates whatever it is that we are trying to emulate. We will investigate
     more of theses strategies in Part III, “Mathematical Modeling.”
                                      Chapter 3 Converting Behaviors to Algorithms        37


      Individual numbers alone aren’t always enough to do the trick. There usually isn’t
      a clean one-to-one ratio between something we would like to consider (an input)
      and the decision we need to make (an output). It is often important for us to con-
      sider a variety of inputs. Additionally, these inputs may be of significantly different
      types. This complicates things further.
          For instance, if we are simply looking at our health compared to our oppo-
      nent’s health, we are comparing two values that mean the same thing and may very
      well have the same scale. On the other hand, if we were to consider our health and
      the number of bullets left in our gun, we would have to determine which of these
      is more important and by how much. Do we put the bullets in terms of health?
      Health in terms of bullets? Or do we abstract things out to a level that takes both
      types of information into account? At this point, we have stepped out of simply
      using numbers and formulas… we are now creating more complex chains of logic
      and calculation to make our decision.

      Making the Grade
      If I can tap into my parenting again, my kids’ school district has this fancy Web-
      based portal so that I can see every bit and piece of their grades. More than simply
      getting a letter grade or even the current grade percentage, I get to see the grades for
      every single assignment, project, quiz, and test. It goes even deeper than that… I get
      to see how many points each of the above was worth to begin with. So a 20-point
      quiz is worth more than a 15-point quiz. A 50-point homework assignment is
      worth a lot more than a 10-point one. Heck, even getting 40% on the 50-point
      homework (i.e., 20 out of 50 points) is worth twice as much as getting 100% of the
      10-point one. Of course, the grade on the 50-point assignment is also worth five
      times as much as the grade on the 10-point one.
           What makes things more interesting is that a 30-point test or quiz is not neces-
      sarily equal to a 30-point homework assignment. The reason? The grading system
      allows for layers of combination weighting that is similar to our example above.
      The work is grouped into categories such as homework, projects, labs, quizzes, and
      tests depending on the way the teacher has arranged the class. Each category (sim-
      ilar to our mid-level groupings) is then given a weight. The weights are combined
      with the respective scores in each category, and a final grade is determined.
           Stepping through an example makes things a bit clearer. Here are some sample
      grades that I just made up. (My own kids’ grades at the moment aren’t all that in-
      teresting or conducive to an example.) In this example (Figure 3.7), there have
      been two tests, two quizzes, and four homework assignments. Each of those (white
      boxes) shows the number of points scored out of the maximum possible. For exam-
      ple, the first test grade was 35 out of a possible 50.
38   Behavioral Mathematics for Game AI

            FIGURE 3.7 A hierarchy of individual grades grouped into three categories:
         tests, quizzes, and homework. Each item has its own weight, as does each of the
          three categories. By combining them accordingly, we can arrive at a final grade.

         In each category, the total points scored and the total points possible are added
     up, for tests in this instance, leading to 87 out of 110, or 79.1%. When a grade has
     been found for each of the three categories (e.g., quizzes = 91.1 and homework =
     85.3), they will then be combined.

     Note the top row of arrows in Figure 3.7. Tests are worth 50% of the total grade,
     quizzes 20%, and homework 30%. This has nothing to do with the total number of
     points accumulated in each category. For example, despite the fact that there are 95
     total homework points, that doesn’t mean that it is nearly as important as the 110
     test points. The points in each category only have meaning inside their respective cat-
     egories. The only number that comes out of that category is the final percent scored,
     which, as mentioned, is applied to the weighting for that category. Proceeding
     through the simple math, we arrive at weighted values for each of the three cate-
     gories that, when added together, give us a final grade value—in this case, 83.36%.
         There are extraordinary benefits to this model. By setting up these categories
     and layers, we compartmentalize things to the extent that there is essentially a fire-
     wall between the different areas. We are free to set the number of points for any given
     homework assignment relative only to other homework assignments. If we feel that
     there is a proper balance between the relative importance of the four assignments,
     we can feel confident with the value coming out of the entire homework category.
                                 Chapter 3 Converting Behaviors to Algorithms     39

     What’s more, if we added a fifth homework assignment, we would not have to
consider the weight of that assignment beyond the homework category. We don’t
have to consider if the points for the new assignment are in balance with the quiz
tomorrow or the test this Friday. As long as it blends in well with the existing four
assignments, we are finished. The result would cascade out and down toward the
final grade just like any other changes, but only based on the already established re-
lationship that all homework be taken together and (in this case) figure in as 30%
of the final grade.
    Noting a couple of more items of interest in the example in Figure 3.7, the 91%
grade on quizzes seems good until you realize that, at 20%, the quiz weighting isn’t
enough to bring the grade up that much. Even if the teacher gave 20 more quizzes
and nothing else, if you maintain that 91% grade on them, the final grade would
not change. Even if the quizzes were worth 100 points each and you received a
grade of 91% on them, the final grade would not change. That is another one of the
features of building decision algorithms in a modular fashion such as this.
    That simple act of grouping like factors together to determine an aggregate
also cuts much of the complexity down. If all items, whether test, quiz, or home-
work, were to be tallied up without weighting, every time we added a new one of
any of those, we would have to compare its weight to all of the others that have
come before. In actuality, in the school environment, if you wanted to maintain a
certain spread of the weights, you would even have to plan into the future and
preweight everything you would do for the whole term. So by simply constructing
a logical, mathematical algorithm, we not only make things simpler to use, but
maintain more control over what we want to see as an end result.

 IN   THE   G AME   Expanding the Engagement Decision

In the “engagement decision” example in the previous section, we used only one
criterion, “agent’s health,” to decide whether we should attack, hide, or flee.
Unfortunately, the data that we use to construct our decisions is rarely as simple as
that. Something as simple as “agent’s health” is usually not the only consideration
in making a decision. Often, we need to account for other factors. For example, in
the engagement decision above, we may want to take into account such things as:

      Agent’s health
      Enemy’s health
      Agent’s weapon
      Enemy’s weapon
      Number of enemies
40   Behavioral Mathematics for Game AI

         Proximity to a leader
         Proximity to an important location
         Agent’s “anger” level

         If we were so inclined, we could measure each of the above criteria and build a
     formula that takes it all in and spits out one single number. Needless to say, a for-
     mula that took all of that into account in the appropriate manner would get fairly
         Alternatively, we could combine some of the above factors into intermediate
     values that would then be utilized in constructing the final decision (Figure 3.8).
     For instance, we could combine the agent’s health and the enemy’s weapon into a
     value representing the risk that is presented to the agent. On the flip side of that, the
     agent’s weapon combined with the enemy’s remaining health could be combined
     into a measure of the threat to the enemy. These two values could then be combined
     into an indication of the total threat balance between what the agent could receive
     and dole out.

            FIGURE 3.8 A hierarchy of factors can be combined in layers to eventually
               arrive at a single decision. In this case, nine factors are combined and
                   recombined until the final “engagement decision” is calculated.

         By analyzing the number of friends and foes, where the agent’s leader is, and
     the proximity to an important location (like a base), we could determine the morale
     that the agent may have at the time. However, if we were to combine these factors
     with the agent’s anger level, we could arrive at a “perceived morale” that might rep-
     resent how flying into a rage might color the agent’s decision making. In the end,
     we can funnel all of the above into one final “engagement decision.”
                               Chapter 3 Converting Behaviors to Algorithms      41

    Now we have taken the nine criteria that we started with and combined them
into more of a modular format. We are still using them all, but the relationships
between them have become more meaningful. Each time we combine factors, how-
ever, we will have to determine how to weight them and massage them to get the
effect we want.
     Going back to our engagement decision example, even starting small, in con-
structing the risk to agent value, we need to determine how important the two
factors—agent’s health and enemy’s weapon—are, respectively, to that value. Are
they equal? Is the agent’s health more important than the weapon? Do we calculate
it in terms of how long the agent can expect to live at a particular rate of damage?
Combining intermediate values can be a little more obscure. How do we process
total threat and perceived morale so that we are sure we are going to generate an ap-
propriate balance?
    In large part, when dealing with complex, multilayered processes, the key is to
be confident with each step along the way. If you feel good about what went into
two smaller decisions, you can more accurately combine those two results into a
larger one. Likewise, when debugging or even tweaking a full decision, it is impor-
tant to work backward appropriately. If you are not satisfied with one layer, it may
not be the direct inputs to that layer that are the problem. You must question
everything that leads to that point. Referring again to Figure 3.8, if you are not
happy with the result you are getting from the total threat value, you have to ques-
tion not just how you are combining risk to agent and threat to enemy together, but
the factors that you used in arriving at those figures to begin with.

    All of these approaches are the steps to a very complicated process of crafting
decisions. Once done, however, the resulting behaviors can be deep and powerful…
and yet easily changed and maintained.
     That being said, don’t expect “solutions” in this book. Unlike an area such as
pathfinding, there isn’t one problem here, much less a single answer to that prob-
lem. Using the metaphor from the beginning of this chapter, as you proceed
through the book you can expect to find plenty of brushes, paints, palettes, and
other tools of the trade. You will learn some of the researched and studied history
of the art and how what those masters discovered can be applied in your own work.
You may find suggestions for what colors mix well to accomplish certain effects.
I will even invite you to look over my shoulder to see how I paint some sample
works. However, there is no way I can tell you how to do your own painting. There
are simply too many ways to do it.
    All the artistry must be your own.
This page intentionally left blank

 II             Decision Theory

     n Part II, we will explore the roots of what goes into making decisions.

      Chapter 4, “Defining Decision Theory,” looks at the two major types of
  decision theory, their uses, their weaknesses, and how they can be blended together.
      Chapter 5, “Game Theory,” walks through some of the classic decision scenarios
  and investigates what we can glean from them to use in our own games.
      In Chapter 6, “Rational vs. Irrational Behavior,” we discuss the differences
  between purely rational behaviors, purely irrational behaviors, and seemingly irra-
  tional behaviors that actually make sense.
      Chapter 7, “The Concept of Utility,” explains how objective measurements in
  the world can be converted to a perspective that an artificial intelligence (AI) agent
  can understand.
       Chapter 8, “Marginal Utility,” explores how the utility of an object or action
  is sometimes a moving target that changes based on the perspective of the observer.
      Chapter 9, “Relative Utility,” discusses how the utility of one object or action
  can be compared, contrasted, and even combined with the utility of another object
  or action, and how this can either assist in a higher goal or lead to conflicting infor-

This page intentionally left blank
     4              Defining Decision Theory

             ecision theory seems, by its name, to be a fairly straightforward concept. At
             first blush, it would seem to be the theory of making decisions. While that
             is technically correct, these waters are deceptively deep and even a bit turbid.
      The notion of what actually constitutes a decision is sometimes called into question
      —which doesn’t make it any simpler to advance a theory about it.
          Regardless of what we settle on as a definition of decision, there are plenty of
      theories about not only the “right” way to come to one, but also theories on why
      humans and other such decision-making entities often fail miserably to make that
      right decision.


      There are a couple of different subdivisions of the larger umbrella of decision
      theory. The primary one is the area of normative or prescriptive decisions. Think
      of this as relating to what is normal or should be prescribed in the situation—that is,
      what should be done given the facts. Facts are (in fact) a requirement of normative
      decision theory. The only way a decision that should be made can be determined
      with such certainty is with certain assumptions in place (Figure 4.1). The decision

          Has all of the relevant information available
          Is able to perceive the information with the accuracy needed
          Is able to perfectly perform all the calculations necessary to apply those facts
          Is perfectly rational

46   Behavioral Mathematics for Game AI

              FIGURE 4.1 Normative decision theory takes a perfectly perceived
            model of the world, accurately performs all the necessary calculations, and
                         rationally selects the most appropriate decision.

          Most other game AI agents use normative (prescriptive) decision theory. In
     fact, this is the area that encompasses much of game AI today. A pathfinding algo-
     rithm uses all four of the above criteria and returns the path that an agent should
     take. Likewise, a planner that utilizes A* to find a way to solve a goal theoretically
     returns the “best” steps to take to accomplish its intended goal. A stereotypical bot
     in a first-person shooter (FPS), as an entity tied to the game world, has all the in-
     formation about its environment available to it, can perceive it all—through cheats
     if necessary—and is able to perform any and all necessary calculations to make a
     perfect head shot on anyone it pleases, and they are rational to a fault.
         It would seem that normative decision theory is perfectly suited for computers
     and, by association, game AI. If, as a designer or programmer, we are specifically
     concerned with looking at our game world and solving a problem to tell our agents
     what they should do, normative decision theory seems to be the channel through
     which we must pass. While this is the case to some extent, there are caveats.

     Predictable “Shoulds”
     In Chapter 1, we examined how some games have a very defined pattern of what
     should be done on any given move. In Tac-Tac-Toe, given any board arrangement,
     there is one play that you should do if you want to win (or not lose). Not only does
     that instruct us about what to do on our turn, but we can be reasonably certain that
     our opponent will respond with the corresponding move that he should make to
     counter it. Accordingly, in our opponent’s eyes, our move is just as predictable.
          In Blackjack, the moves on any given arrangement of cards are a bit more obscure
     due to the mathematical complexity involved. However, the best mathematical
     solution is printed in vivid colors on the cheat cards. Those moves represent a
     statistical aggregation representing what we should do. Does it always work? No…
                                           Chapter 4 Defining Decision Theory        47

The effectively random nature of dealing cards makes it possible that we will lose
the hand despite making the suggested play. However, if one were to do the math,
the result still shows us that our best chances lie in a certain direction. Therefore, we
should follow it. Any kind dealer (and even the often overly helpful people at the
table) will tell you what your play should be—making it fairly apparent that the
proper Blackjack play is predictable.
     Therein lies the problem. Using normative decision theory exclusively as a
source of game AI makes those decisions and actions very predictable. If, as a
player, you can track the same criteria the agent is using to make those decisions,
you can then match them with the decision the agent makes when those criteria are
in place. As these patterns repeat, you will be able to determine what those
“shoulds” are. Thus, you can predict with complete accuracy what the AI agent is
going to do at any given time. All you are doing is connecting the cause and effect
model that the agent is depending on completely for its thought process. You have
exposed what seems to be a complicated process for the Rube Goldberg machine
that it is: Everything happens for a reason—everything happens in order… because
it should.
    The other problem with using normative decision theory is that it is entirely
mechanical. I don’t necessarily mean mechanical in the sense of completely deter-
ministic as explained above. I mean it in the perceived sense that it doesn’t seem
human. Certainly, much of that effect is due to the rigid cause and effect chain.
However, there is another factor in play: Not all people act the same way all the
time. If you were to face 10 enemies that are using the same model, and they all act
the same way every time they are faced with a constellation of criteria, the notion of
humanity (or any other –ity you want to apply here) is lost. We are exposing our
agents for the completely mathematical and logical bots that they are. Not only does
that not look “real,” but how long do you think it generally takes for the cause and
effect rules to make it onto the Web or into strategy guides? “If you want to beat
[this game], do [this]—the AI will do [that]… every single time.”
     Using normative decision theory for game agents causes people playing against
them to subscribe to using it as well. We have reduced the exchange to the deter-
ministic equivalent of Tic-Tac-Toe. There is an answer for everything on both sides.
If you want to win, you must follow it. To harken back to Chapter 1 and Mr. Sid’s
idea of interesting choices, it would seem that we have unwittingly removed both
the “interesting” and the “choice” from the experience. There must be a better way.
48    Behavioral Mathematics for Game AI


      The other side of the decision theory coin is the realm of positive or descriptive
      decision theory. In this case, positive is not the antonym of negative, but rather spun
      off the word posit, that is, to lay down an assumption or theory. In fact, despite the
      affirmative nature of the word, positive decision theory deals in such a manner as
      to not make a judgment about what should be done.
          The other choice of words, descriptive, is somewhat clearer in this respect.
      Descriptive decision theory simply describes a correlation, for example, between cause
      and effect or other such relationships. This is the realm of study of what people tend
      to do, rather than what they should do (Figure 4.2).

           FIGURE 4.2 Positive decision theory uses historical observations of behavior,
         and summarizes and analyzes that data to express what has been done in the past.

           Part of the problem is that of converting scale. In Isaac Asimov’s Foundation
      trilogy, the character Hari Seldon was a proponent of what he called “psychohis-
      tory.” He believed that by studying the past actions of large groups of people, you
      could predict the future actions of large groups of people. Notice that the first part
      of that sentence looks much like our descriptive decision theory, that is, studying
      the past actions of large groups of people. However, Seldon was very aware that the
      data from psychohistory could not predict the actions of a single person. Asimov’s
      example (which actually is borrowed from real-world physicist Daniel Bernoulli)
      was that of the action of molecules of a gas. While science has a pretty good handle
      on how large quantities of gaseous molecules act in concert, they can’t predict the
      action of any single one of those molecules. Therefore, while psychohistory is valu-
      able for predicting the path of masses, it doesn’t help us determine what may hap-
      pen on a more granular scale—such as that of the individual.
                                            Chapter 4 Defining Decision Theory      49

    For example, I’ve heard it said that “four out of five dentists surveyed recom-
mend sugarless gum to their patients who chew gum.” That is a description of the
aggregate of the dentists surveyed. Literally, when the data was summarized, 80% of
them recommended sugarless gum. The other 20% just wanted more repeat business.
     There are a few things that we can’t assume based on this data. First, we cannot
make the case that dentists should recommend sugarless gum four-fifths of the time
and recommend the dental time bomb the rest of the time. It also doesn’t mean that
any one dentist recommends sugarless gum 80% of the time and regular gum to the
rest of his patients. Descriptive decision theory doesn’t deal with the specifics of
individual behaviors like that. However, from the description provided by this data,
we can pull a random dentist off the street, ask his opinion, and feel reasonably cer-
tain (80% so) that he is going to recommend sugarless gum to us. (Come to think
of it, is there any gum that has sugar in it anymore?)

Past Performance May Not Be an Indication
So, the problem with theory is that it’s… well… theoretical. It is not really what “is”
in a concrete sense but what “seems to be” if you back up far enough to blur out all
the detail. If I can draw from a prior life of mine, when I was a musician studying
theory and composition, my mother bestowed upon me her copy of the book
Harmony by Walter Piston. While reading about music theory is almost as dry as
reading about game AI, there was a line in the preface that caught my eye. Walter
said, “Theory must follow practice… theory is not a set of directions for composing
music. It tells us not how music will be written in the future, but how music has
been written in the past” (emphasis mine). What Mr. Piston was pointing out is that
music theory is descriptive in nature rather than prescriptive. It gives us a window
into what has become accepted practice over time rather than what should be done
by anyone writing music today. (One could make the point that today’s musicians
seem to take Piston’s edict the wrong way around… and, accordingly, that the
creative variety of today’s music suffers in merciless homogeneity.)
    The only way we can make a connection between what has been done (descriptive
decision theory) and what should be done (normative decision theory) is to assume
that what has come before had a valid, reasonable rationale. That is, the decisions
that the collection of dentists made were for a reason. We may not know what that
reason is, but, if we trust the dentists, we can decide that their collective opinion is
reasonably correct.
    One problem with using these assumptions is that we may be misled by what
has been collected. In investment program commercials, we hear this as “past per-
formance may not be an indication of future results.” The same could be said for
50   Behavioral Mathematics for Game AI

     sports teams over the course of many years. Just because a team has a history of
     success over the years, that doesn’t mean they are going to do that well during the
     upcoming season. (Or, in the case of my Cubbies… uh… never mind.)
         Another problem that can occur is when we end up in a cyclical arrangement.
     That is, when the availability of data on what has been done by others in the past is
     used to drive the decisions of others in the future. At times, this seems like it would
     be a good idea. If I were a dentist trying to decide on what sort of gum I would
     recommend to my patients, I may look at what other dentists before me have rec-
     ommended. In our clichéd “four out of five” scenario I may decide that sugarless
     gum must have merit enough that my colleagues are suggesting it to their patients.
     Therefore, I should do so as well. There are problems with this approach, however.

     What if They Are Wrong?
     As we covered above, we have to assume that the data is relevant and well informed.
     Sometimes it is not. There is a well-known (and oft-repeated) phenomenon in pol-
     itics, wherein polling data is used as news. If we were to see a poll on an issue or a
     person who told me that a majority of people felt a certain way, I might be inclined
     (like the dentist above) to assume that there is something to the position. As time
     passes and more people believe the data and shift their opinions accordingly, the
     poll itself shifts more. This may not be due to any sort of legitimacy of the topic in
     question. It may be entirely due to the fact that people trust each other’s opinions
     and think that the next person (as reported by the poll) must know more about the
     issue than they do.
         The more people that are reported as supporting the position, the more
     strongly my conviction that there must be some relevance or benefit to it, despite
     the fact that I may know nothing about the issue. If we were to make a decision
     based on that information, we would have committed a grievous error. We have
     taken descriptive decision theory (the polling data) and used it as a replacement for
     the prescriptive decision theory (what we should do with the information).
          As I related in Chapter 3, my daughter Kathy noticed the effect that the descrip-
     tive data had in the fifth-grade election. People were willing to base their support
     for a candidate on the knowledge that a group of friends supported that candidate
     as well (i.e., the survey). For what it’s worth, in the milieu of a fifth-grade student
     council election, there is no prescriptive decision theory that can be presented.
     There is no reason that you should vote for any one kid over another. In that case,
     the perceived opinion of the group is good enough. In Kathy’s observation, each
     kid believed that the other had a perfectly valid reason for supporting that candi-
     date… and they never bothered to compare notes. It was an exhibition of the social
     momentum of peer pressure at its finest. You wouldn’t need Hari Seldon’s psycho-
     history to predict the voting habits of 10-year-olds.
                                                   Chapter 4 Defining Decision Theory           51

            In a similar vein, the practical joke of stopping in a crowded place to stare and
       point up at something that isn’t there is a great example of how you can influence
       the public in this way. The more people start mimicking you, the easier it is to get
       other people to look as well. Their thought process is just the same as the polling
       data. They don’t have the facts that would be required for a prescriptive motive
       telling them they should look up. Instead, they have placed all their trust in the
       descriptive sense of “Everyone else is doing it… I’m sure they wouldn’t be doing it
       without reason.”


       Certainly, there are merits in both of these areas, but while normative decision the-
       ory (the bot) has obvious examples of how it should be used to make decisions in
       games, how does descriptive decision theory come in? After all, we don’t tap into a
       database of survey information à la Family Feud to help our bots make decisions.
       About the closest thing we have to that is capturing player data ahead of time and
       attempting to make our decision models from that. If game AI were that simple,
       these sorts of books wouldn’t be necessary. However, there is something to be
       gleaned from the use of descriptive decision theory.
            In game AI, we are tasked with something of a hybrid. On the one hand, we are
       trying to determine, on any given game frame, what our agents should do. We are try-
       ing to create a decision for the moment—the arena of normative decision theory.
       That may or may not be a simple undertaking, depending on the decision or situation.
            On the other hand, whether or not we can come to a decision about what
       should be done, we are trying to emulate and/or re-create behaviors that look like
       what people (or animals, or space aliens…) tend to do. Of course, these decisions
       may or may not be what they should do. Hopefully, those tendencies that we analyze
       with descriptive game theory (and attempt to re-create) will also be near enough to
       the “most logical choice” as per normative game theory so that they don’t look
       ridiculous—slightly misguided or a bit erroneous perhaps, but not outright silly.
            The result of all of this is that, to construct decisions that are meaningful and
       realistic, we can’t tie ourselves to the omniscient and purely rational tenets of nor-
       mative decision theory. Looking again at the list of requirements for that to be in place:

             Has all of the relevant information available
             Is able to perceive the information with the accuracy needed
             Is able to perfectly perform all the calculations necessary to apply those facts
             Is perfectly rational
52   Behavioral Mathematics for Game AI

        … none of those are at all human, and therefore any decision based on that
     model is bound to be flawed in its efforts to look human (or animal, etc.).
        If we swap a few things into those four items, however, we can start to sense a
     model that is a bit more like what we encounter in reality.

         Has some of the relevant information available
         Is able to perceive the information—albeit with some inaccuracies
         Is able to perform the calculations necessary within a margin of error
         Makes decisions that involve factors other than perfect rationality

         In each of the italicized areas above, we have washed out some of the perfect
     computational ability that computers just happen to be good at. In its place, we
     now have some more fuzzy ideas and nebulous concepts. How do we know how
     much information to make available? How do we construct some inaccuracies in the
     perception? How do we insert a degree of error into calculations? How much error?
     And what sorts of factors do we include other than perfect rationality? For that
     matter, what other factors are there?
          This is where we can put positive or descriptive decision theory to work. By
     starting with the raw logic of what they should do, applying an analysis of what
     people, animals, or even orcs tend to do, we can build models that guide what they
     will do in our games (Figure 4.3).

             FIGURE 4.3 Combining normative and positive decision theory takes the
           limited world model perceived by an agent, adds the potential for errors, and
        creates a belief about the world. When combined in a moderately rational fashion
         with a behavior model constructed from observations of what people tend to do,
                          it yields varied, yet believable realistic behaviors.
                                         Chapter 4 Defining Decision Theory       53

     Assembling all of this together is a balancing act. Whereas each of the areas is
relatively self-contained and self-reliant on its own, once the walls are down, we
encounter inconsistencies and even outright contradictions. In that case, which is
more important? Do you start with one and modify it with the other? Is there an
underlying law that points us in the right direction regardless of the rigors of logic
or the vagaries of observed behavior? Is there a theory that encompasses all of the
above? Interestingly, there is, or, at least, there has been a reasonable attempt to
explore and codify these issues.
    To paraphrase Fraulein Maria, let’s start at the very beginning… a very good
place to start. We must decide on a manner for determining what our agents should
do. And for that, we tap into the wonderful world of John von Neumann.
This page intentionally left blank
5             Game Theory

           hile plenty of study from a variety of disciplines has tackled the behemoth
           of decision theory, perhaps one of the most successful (and conveniently
           relevant to us) forays into this realm is the collection of efforts made in
the area of game theory.
    Game theory has never been a mainstream subject. Initial references to it date
back as far as the early 1700s. However, the spectacular mathematical genius
John von Neumann is generally credited as being the “father of game theory.” In
1928, he published a series of papers on the subject. A more visible launching point
of the concept occurred in 1944 with the publishing of the book Theory of Games
and Economic Behavior that von Neumann co-authored with economist Oskar
Morgenstern. Not long afterward, in 1951, John Nash took game theory to another
level with his analysis of equilibria. More on that later.
    On the whole, however, game theory went through ebbs and flows of interest.
Limited mostly to esoteric, military, or scientific arenas, it has also seen use in the
areas of political science, sociology, biology, computer science, logic, and even phi-
losophy. In the 1980s, it finally caught on in the world of economics—over 50 years
after von Neumann’s original papers and more than 35 years after Neumann and
Morgenstern published their massive dissertation on the subject.
     Despite its name, game theory is only loosely connected to the concept of
games. In fact, the name “game theory” is something of an unfortunate misnomer
(which may have led to the delay in it being considered seriously by the scientific
community). The core concepts tend more toward a generic analysis of decision
making—some of which can be applied to games. One of the most important tasks
that the field of artificial intelligence is charged with is imbuing agents with the
ability to make decisions—some of which can be applied to games. It would seem,
then, that the ideas that are endemic to game theory would be a natural starting
place for a study on how to create decisions in artificial agents in games.

56    Behavioral Mathematics for Game AI


        When reading about game theory, you will occasionally see references to the follow-
        ing terms.

                Two-person: A game played by two people. Alternatively, this can apply to
                two teams of people working together toward the same goals (e.g., Bridge).
                Zero-sum: A game that has a balancing end condition such as one winner and
                one loser (e.g., Chess, Checkers, Tic-Tac-Toe), or whatever is gained by one
                player is lost by the other player (e.g., Poker with a fixed amount of money at
                the table).
                Non-zero-sum: A game where there are degrees of winning and losing (e.g.,
                Poker, a lottery, a contest where you get to keep your score such as Bridge).
                Perfect knowledge: A game where each player sees everything that is going
                on (e.g., Chess, Checkers, Tic-Tac-Toe).
                Imperfect knowledge: A game where some information is hidden from the
                player (e.g., Battleship, Poker, anything with a “fog of war”).
                Cooperative: A game where the players can form binding commitments.
                Symmetric: A game where both sides have the same information and play by
                the same rules.
                Asymmetric: A game where one side has different rules and/or different infor-
                mation than the other.


      Fourteenth-century English logician and Franciscan friar William of Occam is cred-
      ited with one of the most useful notions in all of science. Occam’s razor states,
      “Entia non sunt multiplicanda praeter necessitatem.” For those of you who don’t
      speak Latin (I admit that I must include myself here), this roughly translates as
      “entities must not be multiplied beyond necessity.” There have been plenty of
      restatements of this edict that are a bit clearer. Two of the more oft-quoted are “all
      other things being equal, the simplest solution is the best” and “the simplest expla-
      nation that covers all the facts is usually the best.”
           Brother Occam would have been pleased with John von Neumann. Despite the
      fact that he was a brilliantly talented mathematician (he was famously known for
      declining to use the supercomputer he actually helped invent because he could do
      the calculations faster in his head), Mr. von Neumann knew when flash and style
      were inappropriate. One of the useful aspects of his presentation of game theory
                                                           Chapter 5 Game Theory        57

       is that it boiled down decision making into small enough parts that it is easy to
       discern the concepts in play. Examples in game theory are often characterized by
       simple rules and equally transparent decision possibilities. (The actual proofs of
       von Neumann and Morgenstern’s assertions of game theory are not even remotely
       as compact—to the point that Theory of Games and Economic Behavior is largely
       unreadable.) John von Neumann starts small—with the concept of the two-player,
       zero-sum, perfect knowledge game and works up from there.
             Through this reverence to Occam’s razor, we are given the smallest possible
       building blocks from which we can learn and with which we can build later. Using
       these examples, we can establish a foundation of understand about what goes into
       decisions, not only from a logical standpoint, but from a mathematical one as well.
       It is this application of mathematics to decision theory that allows us to model the
       subtlety to which I referred in Chapter 1.

       In Chapter 4 of Theory of Games and Economic Behavior, von Neumann and
       Morgenstern begin with a section entitled “Some Elementary Games,” which has
       a subsection, “The Simplest Games.” They weren’t kidding. The first one they
       introduce is one not so obscurely called Matching Pennies. It is similar to Rock-
       Paper-Scissors but with two choices instead of three. (I told you they weren’t
       kidding!) It is a zero-sum game in that one player’s loss is another’s gain.
           The game involves two players and two pennies. (I grudgingly use the term game
       despite the lack of interesting choices… but that’s just my two cents.) Each player
       hides his penny and turns it to either heads or tails. They then simultaneously show
       each other their pennies, and the enormously complex ordeal of scoring the round
       ensues. As you can see in Figure 5.1, if the pennies match—either both heads or
       both tails—then player A is the winner. Conversely, if the coins do not match, then
       player B is the winner. (I’m not sure how it is determined who will be player A, but
       I would suggest that they could flip a coin.)
            The lesson to be learned here is strictly one of terminology. Matching Pennies
       is a game in which there is no “pure strategy.” That is, there is no “best response.”
       A pure strategy is a complete definition of what a player should do at any given
       time. That is, if you play this way, you will win (or at least will maximize your
       chances of doing so). For example, as we discussed before, Tic-Tac-Toe has a pure
       strategy. If you want to win, you have to make certain moves. If you make those
       moves, you maximize your chances—which, in the case of Tic-Tac-Toe, generally
       leads to a draw. At worst, it doesn’t lead to a loss. At best, if your opponent makes
       a mistake (by not playing the pure strategy), you will win.
58   Behavioral Mathematics for Game AI

                     FIGURE 5.1 The decision matrix for Matching Pennies.
                       If both players select the same face, player A wins.
                        If the players select opposite faces, player B wins.

          Pure strategies are a gold mine when playing games, but they are a death knell for
     game developers. If players discover a pure strategy for playing against your artifi-
     cial intelligence (AI), they will be able to win all the time with very little thought or
     effort. On the other hand, if AI agents can determine a pure strategy that causes them
     to play the same way every time, they become very predictable and, thus, very boring.
         In Matching Pennies, since there is no knowledge of the game environment,
     nothing to perceive, and no calculations to make, even normative decision theory
     can’t come to the rescue and tell us what we should do on any given play. Therefore,
     there is no pure strategy to play. There is no “best” thing to do at any one time.
          Interestingly, despite there being no “best” strategy, over time there is a worst
     strategy… always selecting the same face for your coin. If you were to do that, it
     wouldn’t take long for the other person to determine your pattern and respond
     accordingly. Likewise, a parallel worst strategy is to always alternate between heads
     and tails. Your opponent should be able to pick up on your method and play in
     such a way that your predictable repetition words to his advantage. Notice that at
     that point, we have not introduced some knowledge to be aware of… the pattern of
     play. If an agent were able to discern that and calculate accordingly (i.e., extending
     the pattern), then it could make a pretty good stab at what it should play as per nor-
     mative decision theory.
         Barring that, about the only hope we have is to apply some sort of psychology
     and hope to ascertain what your partner’s pattern is—and, therefore, his next
     move. If your opponent is even reasonably adept at obfuscation, though, nothing
     short of the Jedi Mind Trick is going to yield any better results than chance.
                                                       Chapter 5 Game Theory          59

Mixing It Up
This opposite approach from pure strategy is called a mixed strategy. In this ap-
proach, you select between a variety of strategies that are available to you. Often,
you can assign probabilities to these decisions to select which one you can do. This
is something that we address in depth later on. Suffice to say that in Matching
Pennies, there are only two choices at any one time.
    That being said, the best approach is to vary your plays in a somewhat random
fashion. Of course, if you do that, you are no better off than if you had flipped your
coin. Taken one step further, why flip two coins and see if they match when you
and your opponent could simply flip a single coin and be done with it? But now I’m
arguing with the entire premise of this delightfully simple game. Occam would not
     As I mentioned, the game is roughly analogous to Rock-Paper-Scissors. The
difference is that there are three possible plays in Rock-Paper-Scissors rather than
simply two. That makes the scoring grid 3×3 as well. Regardless, as with Matching
Pennies, there is no “pure strategy” to Rock-Paper-Scissors that you can follow to
achieve any more success.

 IN   THE   G AME   Matching Punches

At first, it would seem that something as simple as Matching Pennies would have no
real comparison to the world of computer and video games. However, over the
years plenty of games have used this mechanism in some form or another. In fact,
one could claim that some games still do.
     If we were to imagine a fighting game of the simplest sort, we could start to
draw a parallel. Let’s say that player A has two attacks—high and low punches.
Player B, on the other hand, has two defenses—high and low blocks. The goal of the
game is for player A to get past player B’s block and score a hit. If B’s block is in the
same area as A’s attack (i.e., high vs. high), then A fails to score a point. If B’s block
is in the wrong area (i.e., high vs. low), then A scores a hit.
     Given the parameters of this game, we can analyze it in exactly the same way as
Matching Pennies. There is no “best” strategy to play—other than not repeating the
same sequence over and over. The only thing you can do is observe the other player
for potential clues that he is playing dumb and repeating patterns. If both players
are mixing their choices appropriately, the best you can attempt to accomplish here
is a random button-masher.
60   Behavioral Mathematics for Game AI

     P UTTING I T   IN   C ODE

     Although this may seem obvious, I may as well drop it in. If you were to put the
     entire AI for Matching Pennies or the above hypothetical fighting game into code,
     this is what it might look like.
         typedef enum { MV_HEADS = 0,

                           MV_TAILS = 1} PENNIES_MOVE;

         PENNIES_MOVE MyGame::SelectPenniesMove()


              PENNIES_MOVE ThisMove = PENNIES_MOVE(rand() % 2);

              return ThisMove;


         It seems kind of pointless, doesn’t it? We have one single function that returns
     0 or 1, in this case, representing heads or tails. Incidentally, if you make the func-
     tion return three choices instead of two, you have the AI for Rock-Paper-Scissors.
         typedef enum { MV_ROCK = 0,

                           MV_PAPER = 1,

                           MV_SCISSORS = 2} RPS_MOVE;

         RPS_MOVE MyGame::SelectRPSMove()


              RPS_MOVE ThisMove = RPS_MOVE(rand() % 3);

              return ThisMove;

         As I suggested in Chapter 1, games such as these only have the appearance of
     playing against another person. If your best bet is to play randomly, the only choice
     is whether to do so or not. If you choose to play randomly, and your opponent does
     so as well, then we do not have a game that meets the condition of “interesting
     choices.” It also becomes tiring rather quickly. (I do not believe this is a coincidence.)
                                                               Chapter 5 Game Theory        61

       One of the more oft-cited examples of game theory is the Prisoner’s Dilemma. It
       was originally framed by Merrill Flood and Melvin Dresher working at the famed
       RAND Institute think tank in 1950. Albert W. Tucker formalized the game with
       prison sentence payoffs and gave it the name Prisoner’s Dilemma. In the spirit of
       von Neumann, it is a two-person, perfect knowledge game. However, unlike
       Matching Pennies, it is not a zero-sum game… both people can “win” to varying
       degrees. Likewise, both players can “lose” as well. And that, we will see, is what
       becomes an important, yet deceiving factor in this contest.
            In its most often cited form, the Prisoner’s Dilemma uses a hypothetical situa-
       tion of two suspects being arrested by the police. The police have insufficient evi-
       dence for a conviction on a major crime that carries a 10-year sentence. The police
       separate both prisoners and visit each of them to offer the same deal. If one testifies
       (i.e., “defects”) for the prosecution against the other and the other remains silent,
       the betrayer goes free and the silent accomplice receives the full 10-year sentence.
       If both remain silent, both prisoners are sentenced to only six months in jail for a
       minor charge. If each betrays the other, each receives a five-year sentence. Each
       prisoner must choose to betray the other or to remain silent. Each one is assured
       that the other would not know about the betrayal before the end of the investiga-
       tion. The question posed is, given these parameters—including the specifics of the
       lengths of the prospective sentences—how should the prisoners act?
           From a mathematical standpoint, the issue is deceptively simple to solve.
       However, not everyone notices the solution right off. At first glance (Figure 5.2),
       the choices seem to be:

           If I stay silent I could go to jail for 10 years.
           If I tell on my partner, I could go free.

           The above assertions, however, don’t quite cover all the possibilities. Specifically,
       they leave out the fact that you truly don’t know what the other person will do—
       and that his choices make a significant impact on the results. In fact, it is because the
       other person’s psychology and resultant choices are in play (and yet unknown)
       that this is a particularly intriguing problem.

       Strictly Dominant Strategies
       Let’s begin with the assertion that the potential actions of the other prisoner are un-
       known and, we will have to assume for now, unpredictable. Any solution we devise,
       therefore, has to give equal credence to both possibilities—that the person in the
62   Behavioral Mathematics for Game AI

     next room over could betray us or keep silent. Keeping that in mind, the choices we
     have before us can apply a 50% chance to each of the potential outcomes of that
     choice. Using that logic, our choices now carry the following potential penalties.
         Stay silent:

         50% chance that we serve 6 months (if he stays silent as well)
         50% chance that we serve 10 years (if he rats us out)

         Betray our partner:

         50% chance that we go free (if he stays silent)
         50% chance that we serve 5 years (if he tells on us as well)

          We are tempted to do a purely mathematical solution for this. By applying the
     percentages to the time periods, we can reduce the unknown action of our cohort
     to a single figure. In this case, staying quiet would amount to

         Likewise, if we were to betray our partner, we would be facing

                   FIGURE 5.2 The decision matrix for Prisoner’s Dilemma.
          Depending on the actions of the two prisoners, a variety of results could occur.
                                                     Chapter 5 Game Theory         63

     So, in a situation where we treat each of our partner’s choices as equally possi-
ble, it would seem that it is in our best interest to betray him. The two possible
outcomes given our action are zero and five years… or an average of 2 1⁄2 years. That
is certainly better than the average of 5 1⁄2 years that we face if we stay quiet.
     Put logically, rather than mathematically, this seems to make sense as well. We
are putting the possibility of 10 years of incarceration out of play. Instead, we are
looking at a maximum of five years. We are also putting the six-month sentence out
of play as well, but who cares? We are keeping the possibility of doing no time what-
soever alive and well! We have the best maximum, the best minimum, and the best
average result all on our side of the ledger. There doesn’t seem to be a weakness to
this approach. In fact, if we accept all our premises as valid, our solution concept
is what is referred to in game theory as a strictly dominant strategy. That is, our
strategy of ratting out our partner makes us better off no matter what he does.
    From a strictly game theory standpoint, we have achieved what we set out to
find—an optimal solution given all the possibilities. We determined that cooperat-
ing is strictly dominated by defecting. That is, the “defect” strategy is inherently
weaker. (Even defective, I suppose.)

Can We Improve on “The Best” ?
A quandary arises when we analyze our premises, however. For the sake of argu-
ment, we replaced our inability to know what our partner’s choice was going to be
with equally probable outcomes—50% each. That gave us a mechanism for taking
into account the unknown. Unfortunately, we also treated our partner as if he was
no more intelligent than the randomly shuffled deck of cards that the Blackjack
dealer in Chapter 1 was using. We ascribed no rationality to him whatsoever. By
reducing him, in effect, to a flip of a coin, we are not even taking into account that
he was capable of making a choice.
     Again, this may not seem to matter on the surface. After all, we have already
determined the strictly dominant strategy for ourselves (to defect). That strategy is
in our best interests of self-preservation in that it guarantees us the lowest sentence
regardless of what the other person does. If it is in my best interest to utilize that
strategy, would my partner not be of the same mind and defect as well?
    The answer to this may lie in personalizing things in the other direction. If our
partner chooses the strategy of defecting, he has looked at us as a 50/50 wildcard as
well. He has determined that we are not capable of rational thought and, therefore,
has taken matters entirely into his own hands… optimizing his own benefit by
choosing to betray us. Of course, if he does betray us and we betray him, it looks like
we are both going to do five years.
64   Behavioral Mathematics for Game AI

         Even if we were to adjust our payoff formulas to account for an imaginary
     “likelihood” that he will choose to defect, say making it 80% instead of 50%, we
     would get the following:

         Never mind the 21⁄2 year average potential sentence that we were looking at. The
     premise on which it was based (a 50% probability for each of our partner’s choices)
     was flawed the whole time. He wasn’t a coin flip. He was a rational human (in dark
     glasses and a stupid hat) who was fully capable of making (interesting) choices that
     served his own best interest.

     Pareto Optimality
     Of course, if we now accept the notion that our partner is a rational human, out for
     his own benefit, we may have to include the idea that he would view us the same
     way. Perhaps all is not lost. If we both view each other as completely reasonable and
     wise, and, in doing so, assume the other would view us as just as enlightened, we
     may have to reanalyze our payoff matrix.
           We have recently come to the conclusion that, despite our happy thoughts of a
     2 1⁄2-year average sentence, we are looking at five years if we both choose to defect.
     That 2 1⁄2-year sentence doesn’t really exist. It was a mathematical fiction based on
     our application of probability. Now that we are working under the assumption
     that it is no longer likely that we will get off entirely with no time at all, the six-
     month option that we allowed to be taken off the table looks a lot more attractive.
     What if he is thinking the same way?
          To simulate what would happen if we could read each other’s minds, we may
     as well pretend that we can speak with our partner in crime. To be honest, if we
     were to do so, this entire exercise would no longer be a dilemma. We could simply
     agree to keep our mouths shut, serve our six months, and go home. That would
     achieve what is called a Pareto improvement. A Pareto improvement occurs when
     the solution shifts so that at least one of the players has a better situation at no detri-
     ment to the other. In this case, the shift actually benefits both players. In fact, a
     quick glance at the Prisoner’s Dilemma matrix shows that the scenario of both
     prisoners keeping quiet is actually the Pareto optimum because no further im-
     provements can be made.
          The irony of the situation is that, in acting in what is ostensibly our own best
     interests (betraying our partner), we actually avoid the optimal solution for both of
     us. It seems to be a contradiction that acting in our own self-interest does not in-
     clude the actual best option for ourselves. The mistake that was made, however, was
                                                     Chapter 5 Game Theory         65

the original assumption we made about our partner… that we could have no idea
what it was that he would do. That may or may not be the case—it depends on our
knowledge of our partner. It could still very well be wise to betray him if we have
reason to believe he is going to do the same to us. If we believe he is not only intel-
ligent enough to recognize the hidden “out,” but also thinks enough of us to assume
that we would take it as well, then we can choose to keep quiet and meet him for a
cheeseburger in six months. We will examine this more in Chapter 6.

 IN   THE   G AME   Dueling Rocket Launchers

To see how this sort of game theory staple is important, let’s put the Prisoner’s
Dilemma into a hypothetical game setting. To design a scenario that accurately re-
flects the quandary of the Prisoner’s Dilemma, we need to look at the parameters
that surround the choice. First, we must have two parties with two identical choices
facing them. Second, we must have a conservative choice and an aggressive choice.
Both of those choices must have a reasonably positive outcome and a somewhat
negative outcome based on how they match up with the other player’s choice. That
is, we need to have a 2×2 scoring matrix where the positives and negatives roughly
match those in the original Prisoner’s Dilemma.
     Imagine we are in a shooter type environment as shown in Figure 5.3. Both
players start in a position of being mostly hidden from the other player. The cover
we are behind protects us from anything except rockets. If a rocket were to strike
our hiding place, we would be killed. We are fully healthy but lightly armed. We can
fire on our opponent from our cover, but only do very light damage to him.
Likewise, if he stays hidden, he can do light damage to us as well.

 FIGURE 5.3 Either agent could elect to run into the open and grab a rocket launcher
                      or to hide and wait for reinforcements.
66   Behavioral Mathematics for Game AI

         Out in an open area between us and our enemy are two rocket launchers. If ei-
     ther player gets a rocket launcher, he will be able to fire it at the hiding place of the
     other player—killing him. Because the small arms fire is not accurate when firing
     at people on the move, running to retrieve a rocket launcher will not bring about
     any damage. However, if both players run to acquire rocket launchers, they will
     then be able to shoot rockets at each other as they run around. If this happens, it is
     assured that both of them will take heavy damage, although not as bad as what
     would have happened if they had simply stayed hidden as a vulnerable, unmoving
     target for the enemy’s rocket launcher.
          Both teams are waiting for reinforcements that will arrive soon and simultane-
     ously. If we are heavily damaged when the others arrive, we could possibly die
     (50%). This chance of death is not as certain, however, as if an enemy’s rocket were
     to strike us while we are hidden (100%). If we are only lightly damaged by the small
     arms fire we took while hidden, we will have only a 5% chance of dying when both
     sides’ reinforcements show up. (In all cases, these risks are mirrored by our enemy,
     that is, he’s facing the same possibilities for the same choices.)

     Making the Choice
     The choice before our agent is this: Do we chose to stay in hiding, perhaps taking
     occasional pot shots at our hidden enemy, and wait for the reinforcements to show
     up, or do we rush out to grab one of the two rocket launchers and either kill the enemy
     where he is hiding or run around trying to kill him while he does the same to us? It
     is a little easier to visualize the decisions and results by placing them in a grid just
     like the one we used for Matching Pennies and Prisoner’s Dilemma (Figure 5.4).

                  FIGURE 5.4 The 2×2 decision matrix for the rocket launcher
                 example is surprisingly similar to that of the Prisoner’s Dilemma.
                                                        Chapter 5 Game Theory           67

     The decision process is much the same as well. If we are trying to maximize the
damage to save ourselves from harm, just as we did in the Prisoner’s Dilemma, we
may want to look at the possible outcomes of our choices from that standpoint. If
we were to run out and grab the rocket launcher, there are two possibilities. First,
if the enemy stays hidden, we acquire the big gun without opposition, aim, and
blow him and his cover sky high with no damage to ourselves (Figure 5.5). That
doesn’t sound too bad.

       FIGURE 5.5 The scenario of one agent rushing to get the rocket launcher
      while the other one hides is analogous to the Prisoner’s Dilemma scenario of
      one prisoner betraying while the other stays silent. In either case, the former
                            wins and the latter loses entirely.

     Second, if he does run out to grab a rocket launcher for himself, we begin the
shoot-and-hop dance in the middle of the arena, take some damage, doing some
damage, and hoping that when the reinforcements show up for both sides we aren’t
too messed up to get through it alive. However, we realized that being damaged at
that point isn’t terribly attractive. Still, it gives us a fighting chance. We are still
alive. In fact, the two outcomes of this choice are both rather positive. We either
blow our enemy to heck or we knock him down a few pegs. There isn’t any imme-
diate risk of death… only the heightened possibility of death later on once the rest
of the troops arrive.
    The same cannot be said for one of the possible outcomes were we to stay put.
If we remain in hiding and our enemy runs out to get the rocket launcher, we aren’t
going to last long. At that point, we are good and dead—reinforcements or not.
However, that outcome is not certain. He may choose to not run out there at all,
instead using his own woefully inadequate peashooter to harass us the way we are
doing him. Once the respective reinforcements show up, the battle will begin in
68   Behavioral Mathematics for Game AI

     earnest, with us having only sustained minor damage. However, it is pretty much
     an all or nothing prospect to remain where we are. We live or we die. And it’s that
     last part that is a little disturbing. Aren’t we trying to avoid death?

     The Strictly Dominant versus Optimal Strategies
     So, just like in the original Prisoner’s Dilemma, we have found a strictly dominant
     strategy. Before, it was to betray our partner. By doing so, we were guaranteed no
     worse than five years of prison, no matter what our partner did. In this case, by run-
     ning out to get the rocket launcher, we are guaranteeing that we are not going to die in
     the near term. If he comes out as well, we are going to take some damage, but at least
     we are not dead… no matter what our enemy does. That all sounds well and good.
          However, just like in the Prisoner’s Dilemma, the strictly dominating strategy
     isn’t the optimal strategy. The optimal strategy, in a joint sense, would be for both
     parties to stay put and wait for the reinforcements to arrive. This is similar to not
     betraying our partner in the Prisoner’s Dilemma—and hoping that he does the
     same. But, as I mentioned, this is a joint strategy. In the original scenario, we could
     believe that our partner was looking out for us as well as himself. If we are support-
     ing ourselves on this premise, we find that it falls apart in the rocket launcher
     example. Why would our enemy be looking out for our best interests as well as his
     own? That line of thinking is based on the fallacy of our premise.
          In both the Prisoner’s Dilemma and rocket launcher problems, the other per-
     son isn’t looking out for our interests at all. He is looking out for his own interests
     and, in looking at what we might do, believes (rightly) that we are looking out for
     our own interests. Confused yet? It is the classic “I believe that he believes that I
     believe…” cliché. Put another way, if I believe that my enemy is interested in “living
     to fight another day” and that he realizes that I might also subscribe to that notion,
     we are both going to wait it out and only fight when the reinforcements arrive.
          What’s to prevent him from charging out of cover, grabbing a rocket launcher,
     and blowing my hidey-hole sky high? Probably the fear that I might charge out and
     start blasting as well.
          But can I make my decision based on that? What is preventing me from charg-
     ing out and blasting him into proverbial smithereens? Only my own distaste at
     limping back with half health when the troops from both armies arrive—a situation
     that I may not survive. I would much rather play it safe for now, and I assume that
     he is thinking the same thing. That is the optimal solution: both of us playing it safe
     until the rest of the group shows up.
          Certainly, there could be plenty of other issues that we would need to consider
     if this were a true game environment. What does death mean, for example? In an
     online game, is there a respawn rate? In a real-time strategy (RTS) game, how many
                                                            Chapter 5 Game Theory          69

      units like me can my general build? And how fast? Are we close to a goal such as
      capturing a strategic point where we may want to throw everything we have at the
      attack? Are we defending a strategic point and need to do so for as long as possible
      —therefore causing us to be conservative and careful? All of these could possibly be
      valid points, but they also complicate things somewhat—and make Friar Occam
      shift a little uncomfortably in his chair. For purposes of making the point, we have
      to resort to our standard disclaimer of “all other things being equal.” (Trust me, as
      we progress further through this book, we will dismiss the good friar from the
      room so we can start having some real fun.)

           Sure, the puzzle isn’t completely “solved.” It remains a dilemma, but it’s not as
      cryptic as it once was. (Unfortunately, it’s also not as superficially clear-cut either.)
      What you realize through the process is that the correct decision depends on other
      information that wasn’t given and that, in a hypothetical, depersonalized setting,
      you can’t ferret out for yourself (e.g., the knowledge of whether or not your partner
      is a selfish moron or a suicidal, berserking maniac). That is the point I am trying to
          Many problems have mathematical solutions that take a limited number of
      parameters into account. Often, those lead to strictly dominating strategies. At that
      point, the problem is solved and the hunt is over. However, by putting other factors
      into play, such as a rational, intelligent person with a possible twist of malice… or
      loyalty… on the other side of the table, we have created something that goes beyond
      a simple mathematical solution. We have created a dilemma that, in our established
      vernacular, is an “interesting decision.”


      Up to this point, the games we have touched on have been symmetrical. Both sides
      have had the same decisions, risks, and rewards available to them. Certainly, that is
      not always the case. In fact, even in a symmetrical game such as an FPS or RTS,
      plenty of situations will arise where an individual encounter is not symmetrical.
      Theoretically, those encounters are, in and of themselves, complete games with de-
      cisions to be made.
          Our rocket launcher example above would be entirely different if the rules were
      changed so that the unarmed player A was deciding to grab the rocket launcher
      while player B was already armed with a sniper rifle. Staying put behind meager
      cover is now no longer a low-risk solution, which makes taking your chances running
      into the open a lot more attractive.
70        Behavioral Mathematics for Game AI

              Sometimes the lack of symmetry is in the goals and sometimes it is in the avail-
          able choices. In Chapter 1, we touched on the game of Blackjack. This popular
          casino game is deceptively asymmetric. When it comes to scoring the hands, both
          the players and the dealer are held to the same criteria.

                Have a higher valued hand than the opponent.
                Don’t go over 21.

              Given this information only, the game looks as if it is the same on both sides.
          In fact, you could play Blackjack head-to-head against a friend rather than in the
          traditional sense against a dealer and you would be playing symmetrically. Both
          players would be able to freely ask for hits, stands, splits, and doubles at will.
          However, the dealer doesn’t have those options. The rules are very specific about
          what he can and cannot do.

                If he has less than 17, he must hit.
                If he has 17 or higher, he must stand.

               This is why Blackjack is a winnable game. Very often it behooves the player to
          elect to stand with a hand of less than 17. The dealer doesn’t have that option avail-
          able… and the player knows it. So, while the scoring is identical, the gameplay dif-
          fers significantly. Blackjack is an asymmetrical game.
               Needless to say, there is an entirely different dynamic to games where there are
          different choices to be made on each side. However, there is still an important fac-
          tor that needs to be taken into account just as much as in symmetrical games: What
          is the other player likely to do, and how should I adjust my strategy to accommo-
          date that? Sometimes this is easy to compute, and other times, you have to make a
          concerted effort to “get inside the mind” of the other party.

          One situation that many people may be familiar with is that of cutting something
          in half to share. In this process, a cake or other such desirable item needs to be di-
          vided between two people. The first person makes the cut, and the second person
          selects which piece he wants.
               Most of us wouldn’t necessarily think of this process as a game. In fact, it really
          isn’t a game at all. However, the point that it illustrates does speak quite a bit to
          game theory, and, like some of the examples that will follow, this act is definitely
          asymmetrical, as the very simple rules show. One person is the “Cutter,” and the
          other is the “Decider.” There is no overlap in their responsibilities. Neither has any
          say in the other’s role, and yet each must rely on the other person.
                                                      Chapter 5 Game Theory        71

     The decision of the second person is the more obvious as long as we assume
that the person is going to attempt to get as much cake as he can. (I don’t think
that’s a far stretch, do you?) The key point—and therefore the more interesting de-
cision—falls to the first person—the Cutter. Where does the Cutter divide the cake
so that he can also maximize the amount of cake he would get?
     I am going to put off the obvious solution for a moment so that we can formally
approach the logic involved. The Cutter has the power to do one of three things
(ignore the symmetry for the moment). He can make piece A bigger (and B smaller,
of course), he can make piece B bigger (at the expense of A), or he can do his darn-
dest to make them the same size. If he makes A bigger, the Decider is likely to select
A. If he makes B bigger, the Decider is, once again, likely to select B. In both cases,
the Cutter comes out on the short end of the spatula and gets a smaller piece.
However, as the Cutter minimizes the difference between the two pieces—even to
the point of theoretical equality—the Decider loses the ability to even detect which
of the two pieces is bigger, much less to select it.
     Even if the Decider could tell the difference down to miniscule amounts, as the
size of the larger piece approaches the size of the Cutter’s inevitably smaller piece,
that smaller piece is increasing in size as well. If we assume that the meticulous mea-
surements of the Decider will always allow him to pick the larger piece, the best the
Cutter can hope for is half the cake minus some nanoscale amount. Obviously, it is
in the Cutter’s best interest to make that amount as small as possible since it is the
only thing standing between him and a complete half of the cake.
     From a game theory standpoint, the only way the Cutter can determine his
strictly dominant strategy is to take into account what the Decider may do given
each of the possible ways of cutting the cake (Figure 5.6). So, like the Prisoner’s
Dilemma, the strictly dominant strategy is to attempt to maximize your own posi-
tion. Unlike the Prisoner’s Dilemma, however, the optimum strategy is the same as
the dominant one. You simply can’t improve the dominant strategy any further.
    The main similarity to the Prisoner’s Dilemma, however, is that the key to
finding that strategy is to think beyond your own position and put yourself into the
mind of the opponent. That is, “How would the other person react to the choice I
make?” If you didn’t have to take this into consideration, the “game” would be as
simple as taking as much cake as you like and leaving the rest for the other person.
After all, they wouldn’t have a choice but to accept what you offered them.

In late 2007, a rumor began circulating claiming that “the cake is a lie.” I can
assure you that this is, in no way, a reference to the cake in this example. Any
attempt by nefarious sorts to imply that the cake in this example is a falsehood
or other such fabrication will be dealt with swiftly.
72    Behavioral Mathematics for Game AI

       FIGURE 5.6 When cutting the cake, if it is assumed that the Decider will always take a
        larger piece, the only way the Cutter does not get a smaller piece is if they are equal.

      Admittedly, the cake example in the previous section would be much simpler if we
      didn’t allow the other person to select which piece he wanted. However, there is an-
      other popular twist on asymmetrical decision theory that is in a similar vein. In this
      scenario, often referred to as the Ultimatum Game, there are still two differing
      roles—the Giver (similar to the Cutter) and the Decider. Once again, the Giver is
      in charge of dividing and distributing a finite amount between himself and another
      person. As much as I would like to continue using confection as our medium of
      exchange, I find that this example works better with money.
           Let’s assume that we, as the Giver, have been entrusted with $100 to distribute
      between ourselves and another person (the Decider). We are told that we propose
      to give a non-zero amount to the other person (the titular ultimatum). Whatever
      we don’t give to that person, we will be allowed to keep for ourselves, with one
      major caveat: The Decider can select to accept or refuse our offering. If the Decider
      keeps his cut, we (as the Giver) get to keep ours as well. If the Decider turns down
      the offer for whatever reason, we do not get to keep our share of the money either.
      In effect, the Decider is electing that neither of us will get anything at all. The $100
      is returned to the mysterious source from whence it came.
          This game has been used often in psychological studies, with interesting results.
      People’s actions differ significantly from those that would be dictated by using sim-
      ple mathematics alone.
           To determine where things go awry, we first need to analyze the decision fac-
      ing the Giver. Theoretically, we must select a value that the Decider will accept. The
      reason for this is plainly obvious. If the Decider is displeased with our offering, he
      will reject us, and we will get nothing. (Does it sound like we need a volcano and a
      small farm animal in this example?) However, if we please the Decider, we will get
      to keep whatever is left. Mathematically, the problem is, how do we get the most
      that we can out of the $100 without risking getting nothing at all?
                                                      Chapter 5 Game Theory         73

    We have set up the classic struggle between greed and fear (which we will talk
about more in a bit). It would seem that we need to decide on some threshold
below which our gift would upset the Decider. Once we are confident of that
threshold, we could increase our cut of the pot up to that point. For example, if we
decide that the Decider would be happy with $20, we could offer him that and
attempt to keep the $80 for ourselves. On the other hand, the Decider might not be
pleased with anything less than a 50/50 split like we had in the cake example above.
In that case, we would want to offer $50 and keep $50 for ourselves. Perhaps, if we
don’t offer an actual majority of the $100 to the Decider, he may turn us down out
of spite—making sure we don’t get anything either.
     In a way, the Decider is holding us hostage. He could easily make sure we don’t
get anything no matter what we offer! So how do we determine what the Decider is
likely to do? Again, our strategy requires us to flip the situation around and look at
the problem from the point of view of the Decider.

Looking from the Other Side
If I am now the Decider, after the Giver proposes the amount that he is offering me
(x), my choices are as follows:

    Accept amount x.
    Decline x and receive nothing.

    Given that the rules of the game are that the Giver must offer a non-zero
amount, x > 0 will always be true. Therefore, if I accept, I will always be better off
than if I decline. Even if the Giver elects to offer me $1—or one cent—accepting
that token amount would net me more than if I had declined.
     Viewed in a decision matrix (Figure 5.7), this much is obvious. Incidentally, I
could have left it at one box since x is a variable and the choices for the Decider are
all the same. Additionally, I could have expressed this solution as a simple graph. I
separated the Giver’s choices into three columns, however, to illustrate a point.
    Switching our mindset back to that of the Giver, we should realize that no mat-
ter what we offer the Decider, he is going to be better off accepting than declining.
If we, therefore, assume that he will accept any legal (i.e., non-zero) offer, we should
offer him the smallest possible amount (let’s use $1 for now) and claim the rest
($99) for ourselves.
74   Behavioral Mathematics for Game AI

     FIGURE 5.7 The decision matrix for the Ultimatum Game shows that the only possibility
         for either participant to get anything is for the Decider to accept the Giver’s offer.

     The Failure to Act Rationally
     Yet, when this test is given by psychologists, that is not what happens. They have
     discovered that there is a threshold below which the Decider will not accept the gift.
     For whatever reason—likely the feeling of being “stiffed”—the Decider elects to
     take nothing and suffer as long as the Giver is suffering along with him. In nonclin-
     ical terms, it is the equivalent of “Oh yeah? Fine! Be that way! I’ll show you!”
         What we have seen is that the Decider is not acting in his best interests mathe-
     matically and logically. Theoretically, he is not acting “rationally.” It would be ra-
     tional to choose to receive something of value over receiving nothing at all.
     However, although it might be irrational, apparently it is also not unexpected.
         It turns out that plenty of people playing the Giver side of the game actually do
     offer more than the bare minimum to the Decider. Just as it is irrational for the
     Decider to turn down a certain thing (via accepting whatever is offered), by the
     same rules, it would be irrational for the Giver to offer anything more than the min-
     imum. And yet, more often than not, they do so.
          There could be two reasons for this. First, it could be that the Giver doesn’t
     have a good grasp on the theory that goes into making the decision from the point
     of the Decider (i.e., something is better than nothing). Second, it could be that the
     Giver is well aware that there could be something insulting about being completely
     rational. That is, despite realizing the complete rationality of trying to maximize his
     own position, he may realize that the Decider might not elect to do the completely
     rational thing by accepting.
          Put more succinctly, the Giver may find it rational to assume that the Decider
     will not do the rational thing—and, therefore, the Giver should bypass what would
     normally be rational for him to do. Any clearer yet?
                                                              Chapter 5 Game Theory          75

            This illustrates a divide between the types of decision theory examined in
       Chapter 4. The mathematical approach of normative decision theory (the
       “shoulds”) is not reflected in what people tend to do in practice (i.e., descriptive de-
       cision theory). As we move on, we will find plenty of these seeming contradictions,
       and they may make us begin to second guess the seemingly reliable normative
       approach to attempting to replicate behaviors.

       A game (if it can be called that) similar to the Ultimatum Game is the Dictator Game.
       The setup is the same. A pot is available to the Giver. However, the other person is
       now no longer the Decider; he can’t decide to do anything at all. He will accept
       whatever the Giver elects to offer him. In a way, the Giver is now the Decider as
       well. (We will now refer to the other person in this exchange simply as the
       Receiver—since that is his only role.)
             Because the Receiver now has no control whatsoever in the arrangement, we (as
       the Giver) could simply elect to give him the minimum amount (e.g., $1) and be
       done with it. This would maximize the amount that we keep for ourselves: $99. In
       fact, if the rules allowed for it, we could elect to give the Receiver nothing at all, and
       there would be nothing he could do about it. From a decision theory standpoint,
       it is obvious that the most logical solution for the Giver is to minimize the gift (x)
       because it maximizes his own take (100 – x).
           However, once again, psychological testing shows to us that people don’t al-
       ways act rationally. In numerous trials, it has been shown that the Givers usually
       give more than the minimum amount. How much they give is not important. The
       point is simply that they do so when they don’t have to. After all, every dollar they
       give away is one they can’t keep for themselves. What could cause them to do this?
            One suggested solution is similar to the first one in the Ultimatum Game—the
       Giver simply fails to maximize his own position. This doesn’t seem quite as likely
       in the Dictator Game, however. In the Ultimatum Game, the failure to understand
       how to maximize his own position would have likely been born out of a failure
       to put himself in the shoes of the Decider, whose predicament was simply the in-
       evitable decision between something and nothing, and seeing how that affected his
       own choice. In this case, the Giver doesn’t have to put himself in the shoes of the
       other person at all. This should make the decision simpler than the Ultimatum Game.
           The other suggested solution is that there is a factor involved that is not being
       tracked and measured by the rules of the game—altruism. It could be that the people
       playing the role of Giver in the Dictator Game are gaining some sort of satisfaction
       out of sharing with others that outweighs whatever monetary losses they are taking.
76     Behavioral Mathematics for Game AI

           Certainly, this could be related to the structure of the game and the testing
       environment. If some dude in a lab coat were to hand me $100 and say “Could you
       share some of this with a stranger, please?” I wouldn’t have too much of a problem
       with it. I’m still walking out with more money than I came in with. However, if you
       were to approach me and say “Take some of your own assets and share them now,”
       you would likely get a slightly more rude answer.

       One final twist on these sorts of games is a spin on the Dictator Game. We need to
       reset the names again, as we are now getting a little circular. Once again, we have a
       Giver who will bestow an amount on the Receiver. However, in a sort of mirror
       twist, the amount the Giver has to split is based on an initial gift by the eventual
       Receiver (Figure 5.8).

             FIGURE 5.8 The difference between the Dictator Game and the Trust Game
                 is that in the Trust Game, the Receiver gives a sum of money to the
                              Giver in the hopes of getting some of it back.

            At first glance, it would seem silly for the Receiver to give the Giver anything at
       all in the hope of getting it back. Therefore, to prime the pump, the pot is generally
       sweetened somewhat by the researchers through matching funds or some other en-
       ticement, so that the Receiver is encouraged to actually give something to the Giver.
       Yes, it’s twisted.
           The reason that this is interesting is that, once in possession of the funds, the
       Giver doesn’t have to give anything (back) to the Receiver. Just like the Dictator
       Game, to maximize his own portion, it is in the Giver’s best interest to not give any-
       thing to the Receiver, and yet the Givers still tend to do so. Perhaps it is altruism again.
       Yet, once again, altruism is not possible to track via strictly mathematical models.
                                                      Chapter 5 Game Theory          77

    However, there does seem to be another factor in play. If the Receiver doesn’t
give a lot of money to the Giver in the first place (and the Giver knows this), then the
Giver is less likely to give more of it back to the Receiver. This introduces another
dynamic—one that seems to affect the factor of altruism. Let’s call this one “spite.”
     If the Receiver trusts the Giver with a large amount of money, for example, the
Giver may feel the same altruism that he would feel in the Dictator game. However,
as the amount of money the Receiver trusts the Giver with declines, so does the
likelihood and size of the amount the Giver gives back. It is as if the Giver is saying
“OK, fine… if you don’t trust me, I’m not going to reward you.”
    Interestingly, when these experiments are repeated with the same partners, they
very rarely end in what is actually the perfect single game equilibrium of “no trust.”
That is, the Receiver gives nothing at all to the Giver. One could account for this in
that the researchers are kicking in a little to encourage it to begin with, but still, it
also may have something to do with the fact that something else is going on in the
Receiver’s psyche. Does the Receiver have a “need” to feel trustworthy? Is that pride?
Avoidance of shame? You can see how the more we delve into this, the further away
from simple logical, mathematical explanations we drift.
    We also can begin to see why there are often differences between the purely
mathematical normative decision theory models and the more anecdotal descriptive
decision theory ones. It is not only the inability of agents to calculate in a completely
rational fashion but also the inclusion of messy emotions like fear, spite, pride, and
shame that make decisions a bit more difficult to predict. Remember the difference
between Tic-Tac-Toe and Poker? The calculations were simple to make in Tic-Tac-
Toe. Your opponent’s move was just as simple to predict. With Poker, we add a little
more complication and a lot of psychology, and the choices get very interesting.
This page intentionally left blank
6              Rational vs. Irrational

          ne of the problems that von Neumann and others had with their applica-
          tion of game theory was the expectation that people behave rationally and,
          in doing so, will always attempt to select the best outcome. As we saw from
some of the examples in Chapter 5, this is not always the case. Often, people either
fail to select the best option or even elect not to for whatever reason. The result is
the reason for such a difference between normative decision theory and descriptive
decision theory. In that gap lies a whole lot of irrational behavior.
     Of course, trying to figure out what those behaviors are is a bit of a knotty prob-
lem. Computers are good at figuring out the rational answers. Coming up with an
irrational but reasonable-looking answer is another trick entirely. Most of us are
accustomed to the notion that irrationality is something to be avoided or even shunned.
And yet, as we have seen so far—and will continue to explore—irrationality is not
only very real but it is what bestows depth of character on behaviors.
     There is a significant problem in trying to work with irrational behavior, how-
ever. While generally there is only one correct (i.e., rational) answer to a problem,
the solution set on irrational behavior tends to be a bit wider. That is not to say that
everything that is not the “correct” answer is going to look reasonable. Some things
are just plain wrong.
     If you recall in Chapter 1, I wrote about my beloved pig painting. While my
prone porcine portrayal was less than perfect, it was well within the bounds of
“piggishness.” It didn’t have the fifth leg sticking out of the top of its back like a dor-
sal fin. While not perfect, it was reasonably pig-like. There are, in truth, an infinite
number of ways that one could paint a pig, but only a select number of them would
fall within an acceptable range that observers would accept as “looking like a pig.”
Sure, some of them might be categorized as “an odd-looking pig” but would still be
thought of as reasonable enough to not be confused with, say, a horse, an iguana,
or a platypus (although a platypus is confusing enough on its own).

80     Behavioral Mathematics for Game AI

           In the end, while normative decision theory and the utility-maximizing algo-
       rithms that fall out of it provide us with the sterile “should do” answers, we need to
       look a little further into the basis of reason and rationality to begin to replicate it.


       Because irrationality is so difficult to define, it is actually easier to start this foray by
       starting from the summit of the mountain of rationality and working down. Agents
       are said to have perfect rationality if they always act in the best possible manner,
       even if they have to perform extensive and difficult calculations to do so.
           If, for the sake of example, we were to reduce this to a simpler game space, we
       could use the game of Tic-Tac-Toe. As we noted in Chapter 1, the choices available
       at any point in the game can be narrowed down to a decision between whether or
       not we want to win. If we do want to win, there is an obvious selection. If we do not
       want to win, there is an equally obvious selection. Therefore, our success at Tic-
       Tac-Toe is based entirely on whether or not we want to win. A perfectly rational
       player will always play those correct moves. If we were to elect to play incorrectly on
       even one of those moves, we would no longer be considered perfectly rational.
           Other examples of perfect rationality can be applied to the games from Chapter
       5. The prisoner who, without any other information to go on, elects to betray his
       partner in the Prisoner’s Dilemma is exhibiting perfect rationality. The person who
       gives the minimum in the Ultimatum and Dictator games is acting in a perfectly
       rational fashion. Cutting the cake exactly in half so as to minimize exposure to the
       expected (and perfectly rational) greed of another player is perfectly rational. Even
       the person who plays a mixed strategy in Matching Pennies to keep from tipping off
       his opponent to patterns is being perfectly rational.
          If an optimal solution exists, the perfectly rational agent will take it every time.
       What could be so wrong about that? It turns out that perfect rationality has serious
       weaknesses that can only be exposed by running it through a test drive. For that, we
       need a test track so we can see perfect rationality in action.

       The Ultimatum Game is an interesting conflict between two people with its “take
       it or leave it” game of chicken. As we noted above, it is also an excellent example of
       how perfect rationality can lead to an extreme solution—in this case offering the
       bare minimum payout to the other person. The possibilities get even more intrigu-
       ing when it is extended to multiple people. The Pirate Game does just this.
                                         Chapter 6 Rational vs. Irrational Behavior   81

     In the Pirate Game, we have a number of rational pirates (for this example we
will use five). Despite my desire to come up with really cool pirate names, we will
refer to them as A, B, C, D, and E. The alphabetical monikers actually help us with
the next issue, that the pirates have a strict order of seniority: A is superior to B, who
is superior to C, who is superior to D, who is superior to E.
    As a group, the five pirates find 100 gold coins and are trying to decide how to
distribute them. In the pirate’s world the rules of distribution are as follows.

    The most senior pirate should propose a distribution of coins.
    The pirates should then vote on whether to accept this distribution.
    The proposer is able to vote.
    The proposer has the casting vote in the event of a tie.
    If the proposed allocation is approved by vote, that proposal goes into effect.
    If the vote fails, the proposer is thrown overboard from the pirate ship and dies.
    The next most senior pirate makes a new proposal to begin the process again.

    Pirates base their decisions on four factors. Each pirate:

     1. Is entirely rational.
     2. Wants to survive.
     3. Wants to maximize the amount of gold coins he receives.
     4. Would prefer to throw another overboard, if all other results would other-
        wise be equal.

    At first glance, it would seem that pirate A, being outnumbered by his peers,
might have to minimize his own allocation to avoid getting kicked off. After all, if
the other four pirates think he is taking too much, they would stand to benefit by
declining his proposal and sending him down the stereotypical plank. At that point,
the total would only be divided among the four of them, rather than five. This,
however, is not the solution—and is surprisingly divergent from his optimal approach.

Iterating Perfectly Rational Decisions
The pure strategy solution becomes more apparent if we work backward. To do so,
let’s assume that somehow we managed to get down to the final two pirates, D and
E. Knowing that, as senior pirate, he has the deciding vote over E, D proposes 100
for himself and zero for the hapless E. E can’t do anything about it, so this would
be the final result when it is just the two of them.
82   Behavioral Mathematics for Game AI

             Pirate      Gold        Vote
             D           100         Yes*
             E           0           No

          If we were in a situation where three pirates are left, C, D, and E, then C would
     know that D will perform the strategy above and offer E nothing in the next round.
     Therefore, C will offer E one coin, and keep the rest for himself—D getting nothing.
     E, being perfectly rational, also realizes that if it comes down to D and himself, he
     is going to get nothing. Therefore, the one coin from C looks pretty good compared
     to nothing, and E elects to vote for C’s proposal.

             Pirate      Gold        Vote
             C           99          Yes*
             D           0           No
             E           1           Yes

          Of course, in order for the three-pirate scenario to come about, something must
     have gone amiss with the four-pirate scenario. If B, C, D, and E were remaining, B
     (again, being perfectly rational) would know that the three-pirate proposal by C
     would have gone as above. Therefore, he needs to placate D because D knows the next
     round will not go well for him. If B offers D one coin, D will take that offer; it beats
     getting nothing from C. Having bought D’s vote with one coin, and knowing that he
     has the deciding vote in a tie, B is no longer concerned with C or E’s opinions and
     offers them nothing. The vote will be two to two but with B’s vote as the kicker.

             Pirate      Gold        Vote
             B           99          Yes*
             C           0           No
             D           1           Yes
             E           0           No

          Of course, B could have taken the same approach as C and offered E the one
     gold coin. After all, since E knows he won’t get any more than one coin if B is tossed
     overboard, you would think that he would vote for that proposal. However, remem-
     ber that according to rule 3 above, each pirate is eager to throw the others overboard.
     In that case, he would be interested in throwing B overboard since he can get that
     single gold coin from C in the next round and dispense with a rival. E would there-
     fore not vote yes simply because he is offered one gold piece the same way that D
     would. Knowing this, B would go with the original plan of buying D’s vote.
                                         Chapter 6 Rational vs. Irrational Behavior   83

     Of course, A is all-knowing and purely rational as well, and will therefore have
a pretty good handle on all of the above scenarios. As we saw, if B were in charge of
distribution, C and E would be left out in the cold with nothing at all. A knows
this—and knows that those are the two votes he needs. What’s more, he knows that
the price of those votes is simply one gold piece. C and E, being rational, will real-
ize that the one gold coin is better than either nothing at all or, in the case of C,
eventually being thrown off. With all of this in mind, A’s proposal is as follows.

        Pirate      Gold          Vote
        A           98            Yes*
        B           0             No
        C           1             Yes
        D           0             No
        E           1             Yes

     One final note: like B’s option above, A could possibly distribute those two
coins another way, for example, giving one to D instead of C. However, just like E
in the example above, D would rather chuck A overboard as pirates seem inclined
to do and collect his one gold coin from B in the next round. Therefore, A will sim-
ply give the one gold coin to C rather than D.

For those of you who are familiar with recursion, this is a perfect example of
how a problem can be solved by starting with the smallest possible scenario (in
this case, two pirates) and working backward… applying the same rules at each
level, but with the knowledge gained from the prior ones. Given that, if you
have enough gold to support the pirate population, you can solve the Pirate
Game for any number of greedy mateys. (You can even do up to 200 pirates
with 100 gold coins.)
For those of you who are not familiar with the definition of recursion, this is
the way it was presented to me in my AP computer class in high school.
recursion n: 1. (see recursion)

     The lesson to be learned here is that, even when agents are purely rational and
have perfect information on hand (or on hook), sometimes the rational thing to do
needs a little digging to find. In this case, A had to take into account what D, C, and
B would propose (in that order) and how each would vote for each of those propos-
als. All of those issues together get rolled up into what A needs to propose. This is
the normative decision theory approach. We have been told, mathematically and
algorithmically, what we should do.
84   Behavioral Mathematics for Game AI

     Questioning Iterative Rationality
     The solution we arrive at using iterated rationality differs from the first-glance ap-
     proach. When we looked at the game initially, we figured that pirate A needed to
     propose at least a marginally fair settlement. In fact, if we were to run a live Pirate
     Game with real people, it is very likely that, like the Ultimatum, Dictator, and Trust
     games in the previous chapter, pirate A would propose something significantly dif-
     ferent from the proven optimal solution. Modeling behaviors based on this is the
     descriptive behavior theory approach, that is, what people tend to do in that situa-
     tion. In this case the people tend to do things they should not do.
          What mechanism would cause him to do this? Certainly not the naked altruism
     as is seemingly the only explanation in the Dictator Game. Pirates aren’t known for
     their philanthropic leanings. This would suggest something else as the cause of the
     illogical offers.
          If it is not benevolent other-centric interest (even on a subconscious level),
     perhaps it is more self-interest? Basing the decision on self-interest, it would seem,
     would more likely have led us to the maximal (that is, greedy) solution we got. So
     how can self-interest lead us in the wrong direction? There is a different way that
     self-interest can be represented. Remember that money was not the only consider-
     ation in the game. Pirates were trying to avoid the very real threat of becoming fish
     food as well. Therefore, it is a legitimate expression of self-interest to say, “If I
     appear too greedy, they will kick me off.” To that end, a pirate may tend to give
     more than is required mathematically to avoid a particular fate.
         At this point, we have arrived at a similar mindset that plagued the Ultimatum
     Game. That is, “If I give too little, I will be rejected by the other person simply out
     of spite—and will receive nothing.” In the case of the Ultimatum Game, the think-
     ing error was that the other person (the Receiver) would reject an offer that was too
     small and instead take nothing. Of course, that is illogical for the Receiver to do. His
     decision matrix shows that getting anything at all—even one dollar—is better than
     nothing. Therefore, the Giver should only offer one dollar and expect it to be ac-
     cepted. But that is not what real players of the Ultimatum Game do.
          In the Pirate Game, the mindset is similar, but the consequences are slightly
     different. Instead of getting no money, there is the possibility of getting no money
     and getting death on top of it all. (And, as we all know, one of the classic blunders
     is “never go in against a pirate when death is on the line!”) Any given senior pirate
     may believe he has to worry about looking altruistic to his subordinates not only so
     they will accept his proposal, but so he will survive the whole process. As we discov-
     ered, however, this first-blush approach is not even remotely correct. Aside from
     buying a vote or two at a very small price, the ranking pirate can simply keep the
     rest for himself—and no one can do anything about it. At least not without sacri-
     ficing themselves later on in the process.
                                              Chapter 6 Rational vs. Irrational Behavior      85

            So, if we were to put this into a game situation, which method do we use? The
       normative one provided us with the answer of what a pirate should do. The descrip-
       tive one provided us with the answer of what pirates (or at least pretend ones) tend
       to do. The former is the optimum solution; the later is the more “realistic.”

       In the Prisoner’s Dilemma, we touched on the difference between the strictly dom-
       inant strategy of defecting and the Pareto optimal strategy of keeping quiet. In that
       case, the only way the optimal strategy could be accomplished was if our partner
       elected to stay quiet as well. The only way we could feel comfortable making that
       decision was if we knew that our partner would recognize the optimum strategy…
       and believed that we were going to follow it as well. Therefore, the important issue
       is not simply one of us behaving rationally but also acting under the assumption
       that all other players act rationally as well. This is known as superrationality, as coined
       by Douglas Hofstadter in one of the eclectic articles in his 1985 book, Metamagical
       Themas (ISBN 0-465-04566-9).
            Admittedly, making a rational decision whose success must be based on the fact
       that everyone else in the room is also making a rational decision is sometimes a bit
       of a stretch. In fact, in some cases all it would take is one person to not act rationally
       to send the entire framework into a tumble. As we will explore later, this is one of
       the greatest handicaps in normative decision theory as discussed in Chapter 4.
       Remember that the requirements for normative decision theory are:

           Has all of the relevant information available
           Is able to perceive the information with the accuracy needed
           Is able to perfectly perform all the calculations necessary to apply those facts
           Is perfectly rational

           This is all well and good if we are dealing with a puzzle that does not involve
       other thinking agents. However, merely by their inclusion, we are potentially dis-
       qualifying the first two items on the list. Short of some form of omniscience, we
       can’t read the other players’ minds—therefore putting all the relevant information
       behind a screen of doubt. Even information we can perceive will not necessarily
       be completely accurate. As we can see, while normative decision theory and the
       “shoulds” that it spits out for us can be rather helpful in solving some sorts of prob-
       lems, it begins to show weaknesses when other people are involved.
86   Behavioral Mathematics for Game AI

     Four Out of Five Pirates Surveyed…
     In the Pirate Game, we questioned pirate A’s decision because it doesn’t fit with our
     gut feeling of what he ought to do. Part of the reason for this is that our view of the
     other pirates may be a little off. One of the initial guiding premises of the set up of
     the game was that “all the pirates are rational.” Once again, it is that unreasonable
     assumption of superrationality that disqualifies a purely normative approach from
     believability. If the other pirates are not rational, then all of the calculations we did
     above are meaningless. If they are not purely rational, then those messy things like
     fear, spite, shame, and even simple calculation errors will creep into their decision
     making. If their decision making is compromised and unreliable, then we can’t accu-
     rately put ourselves in their shoes, can we?
          Therefore, whether or not we are purely rational is not the only question that
     needs to be addressed if we are to formulate our approach. If we are rational and so
     are the rest of them, all is well, but if even one of them is not rational, it can signif-
     icantly skew any sort of predictive result.
         Of course, in the Prisoner’s Dilemma, it was in our best interests to work together.
     In the Pirate Game, the competitors all want to launch each other overboard and
     get more gold for themselves. Can we really assume they are both perfectly rational
     and have our best interests in mind as well?
         So, perhaps our quandary lies in that we are trying to solve the wrong problem.
     Rather than trying to ascertain the “right answer” that optimizes our solution in the
     perfectly rational world, we need to try to model behavior that takes into account that
     people aren’t purely rational with all of the information and calculation ability.
         The answer to the question is more of a function of what game problem we are
     trying to solve. Also, the answer may be affected by what it is we are trying to ac-
     complish in this particular calculation. If we need mathematical accuracy, then the
     normative approach is the best one to use. If we need psychological believability,
     the descriptive one is preferable. And someplace in the middle we can draw traits
     from both. In fact, only by juxtaposing the two approaches were we led to the fact
     that the descriptive approach (e.g., being altruistic) was downright incorrect from
     an efficiency standpoint. Only by analyzing the two of them simultaneously were
     we able to determine that the normative approach (e.g., mathematical exhaustion)
     didn’t take into account potential psychological factors. That is, the “shoulds”
     seemed perfectly viable until we brought in the data that suggested that real people
     simply don’t do what they “should do.”
         So, maybe our gut instinct wasn’t too far off. All of this wonderful math and
     logic that computers speak so well and artificial intelligence (AI) programmers are
     so fond of goes right out the window. If our pirates are to look and act realistically
     they may not act completely rationally. If they don’t act completely rationally, they
                                             Chapter 6 Rational vs. Irrational Behavior      87

       won’t use the algorithms that generate the optimum outcome. So, without the
       fancy mathematics to help us, what would be the right settlement for pirate A to
       offer so as to avoid an unfortunate diving expedition?
           While there may be no immediate solution, there are certainly lessons to be
       learned here.

       Thankfully, in the Pirate Game, the rules stated that “all pirates are rational.” While
       the assertion of superrationality (that is, every participant is perfectly rational)
       seems to be a bit of a stretch based on the stereotypical presentation of pirates, it
       certainly made arriving at a solution a lot simpler. As we iterated through each step
       of the process, we could solve each of those steps with the assertion “…and there-
       fore, he will do this.” That assumption of rationality left us a single outcome for
       each of the subsolutions, which, in the end, built up to one single outcome as the
       “best” arrangement for the lead pirate to propose.
            At the time, we hinted at the fact that the mathematically optimum solution
       that we arrived at was different from what we would have guessed as mere nonrational
       pirates. Most of us would have offered our buccaneer colleagues more than was
       necessary to purchase their acquiescence. Thankfully, our fellow pirates would have
       been just as nonrational and probably would not have known the difference. As
       long as our offer made sense to them (whether they liked it or not), they would have
       likely gone along with it because it “felt right.”
            A popular example similar to the Pirate Game that exemplifies this lack of
       rationality is a game called Guess Two-Thirds of the Average. The title is not very
       cryptic. The game literally involves asking a group of people to guess a number be-
       tween zero and 100. The goal is to guess closest to what two-thirds of the average of
       all the guesses would be. For example, if the average guess of all of the participants
       was 75, then the winning guess would be the one closest to 50. If the average of all
       the guesses was 33, the winner would be the one who guessed closest to 22.
            Once again, just like the Pirate Game, many of us reading the rules of this game
       are already formulating little plans in our heads as to how we would approach this.
       I certainly did—and I was wrong. Kind of. Unlike the Pirate Game, there is no ex-
       plicit edict of superrationality. That is, the participants may be perfectly rational,
       but it is likely that they are not. This injects a level of uncertainty into the decision.
       The quandary that this presents, ironically enough, is that one of the decisions that
       must be made is whether or not to act completely rationally (if you know how).
       This depends on whether or not—and even to what extent—one thinks the other
       players are going to be rational.
88   Behavioral Mathematics for Game AI

         In the Prisoner’s Dilemma, this manifested itself rather well. The strictly dom-
     inant strategy was to betray the other player because that promised the best results
     no matter what the other person did. However, as we explored, the optimal strategy
     was for both players to keep quiet. The only way a player would select this option,
     however, was if he knew that the other person was going to act in the completely ra-
     tional fashion and keep quiet as well. If the other person’s rationality is not known
     (or he is known to be a loose cannon anyway), then the fall back is the strictly dom-
     inant strategy.
          In the Guess Two-Thirds Game, we are faced with a similar dilemma—but on
     a larger scale. We need to decide if the other players of the game are going to be
     acting rationally or not before we can decide if we should play the rational strategy
     or not. What’s more, now that we have strayed away from the simplicity of von
     Neumann’s game theory examples, we have to account for far more variables. With
     the Prisoner’s Dilemma, we had to ascertain the rationality of one person to deter-
     mine which of the two choices he would make. In this case, how many of the other
     players are going to be rational? And to what extent?

     Iteratively Eliminating Irrational Answers
     The problem here is that there is no strictly dominating strategy from which to start.
     That is, we can’t say, “This is the best way to play no matter what other people do.”
     Unlike in the Prisoner’s Dilemma, we can’t say, “Betraying gives us the best chance.”
     Interestingly, there is a unique pure strategy. In the Prisoner’s Dilemma, this strat-
     egy was to keep silent with the knowledge that our partner was going to do so as
     well—because he was rational. Similarly, in the Guess Two-Thirds Game, this
     approach leads us to the best possible result if everyone in the game is acting purely
     rationally. However, we get a far different answer than we would expect.
         To arrive at this answer, we need to walk through the exercise from the begin-
     ning, just like we did with our miserly brigands. In this case, we do this by iteratively
     eliminating strictly dominating strategies. To do that, we must find the strictly
     dominated strategies. By using that information to our advantage, we can narrow
     our solution set down significantly.
          Just as a strictly dominating strategy was one that was the best for the situation
     no matter what, a strictly dominated strategy is one that is the worst for the situation
     no matter what. In the Guess Two-Thirds Game, there is no strictly dominating
     strategy, but there are strictly dominated strategies. That is, there are ways to play
     that will always lose—no matter what.
        The reason for this is that there are mathematical impossibilities in the game.
     Because the rules of the game state that people select numbers between 0 and 100,
     we know that it is impossible for the average guess to be above 100. Certainly, it
                                      Chapter 6 Rational vs. Irrational Behavior   89

could be 100, but only if everyone in the game guessed 100. (Don’t laugh; as we will
see later, there are people who do this.) If the average cannot be above 100, then the
two-thirds point cannot be above 66.67. It would therefore be irrational to guess
anything above 66. So, cross off a big chunk of the possibilities.
     Now, because we are working from the premise that everyone is rational, we
must assume that they know that guessing above 66 is irrational. Therefore, we also
know that no one is going to guess above 66. Well, if no one is going to guess above
66, then we can also assert that the two-thirds point will not be above 44 (two-thirds
of 66). Because none of our rational co-players will guess above 44, we know the
two-thirds point will not be above 29.48 (two-thirds of 44). Lather, rinse, repeat…
(Figure 6.1).

         FIGURE 6.1 By iteratively eliminating the strictly dominated strategies
             (i.e., those that have no possibility of winning), we determine
                     that the only pure strategy is that of guessing zero.

     Eventually, by eliminating the possible guesses of rational players, we get to the
point where any guess above 0 is irrational. That means, in the Guess Two-Thirds
of the Average Game, the pure strategy guess is… zero. Of course, this answer only
exists in the world of superrationality. Everyone has to be playing by the same
purely rational strategy, and the odds of that happening are fairly slim.
90   Behavioral Mathematics for Game AI

     Guessing the Guesser’s Guesses
     Much like the problem that existed in the Pirate Game, this pure strategy doesn’t fit
     well with what we feel the solution should be. The difference is that in the Pirate
     Game, by proposing something less than the 98 coins for ourselves, we were not
     being rational. We were the ones being irrational by not realizing that we were in
     the driver’s seat. The solution in the Pirate Game would have been to calculate the
     most rational proposal, tell it to the other players, and then get them to see that they
     have no choice. If we failed to propose the 98 coins for ourselves, it was because we
     were not being rational ourselves.
          In this case, the solution is not entirely in our hands. It is a moving target based
     on the other players’ degree of rationality and, subsequently, their guesses. Since we
     can’t assume that the other players are entirely rational, and we aren’t in the posi-
     tion to propose the solution and explain the way things are simply “going to be
     ’round here,” we have to take into account the questionable rationality of the other
     players ahead of time. That’s why the pure strategy of guessing zero is not necessar-
     ily the right one.
          But what is the right guess? As we suggested earlier, there really isn’t one. Any
     given run of the game could generate wildly different answers with wildly different
     averages. However, there may be an optimal guess. This would be one that takes
     into account what people may tend to do. Remember, we don’t have to figure out
     what each individual person is going to do; we need only figure out what the group
     is going to do in an aggregate. If we are close enough to most of them, the average
     will fall in line with our general expectations. Someplace in here is a sweet spot… if
     only we could find it.
        Rather than approach this problem by iterating through rational solutions, we
     need to approach it by iterating through likely solutions.
         Because the goal of the game is to determine what two-thirds of the average
     would be, we need to first determine what we believe the average of all the other
     player’s guesses will be. If people were either completely unaware of the rules or
     mathematics of the game (or completely random), we could assume that the guesses
     would be spread evenly across the range from zero to 100. The average of all these
     guesses would be near 50. In that case, our guess should be two-thirds of that, or 33.
          But what if even some of the other players are thinking the same way we are?
     Would they guess 33 as well? If so, a disproportionate number of educated guesses
     at 33 would be mixed into the purely random background noise of irrational
     people. The average of all the guesses would now drop below 50 slightly—as would
     the two-thirds point. For example, if the average of the guesses is now 45, then two-
     thirds would be 30. Our guess of 33 would have been too high. Maybe a guess of 30
     is more accurate. But what if other people are thinking the same thing we are and,
                                       Chapter 6 Rational vs. Irrational Behavior      91

instead of guessing 33, elect to guess 30 instead? That drags down the average even
more, and likewise our two-thirds target.
     Certainly you can see where this is headed. The more people who are acting
rationally in the group, the more the average (and associated two-thirds target) is
affected. What’s more, some people may be acting rationally at the shallower level
(for example, guessing 33), and some may be acting slightly more rationally by tak-
ing into account the first level of rational players. Even others may be assuming that
everyone is acting rationally at a shallow level and that the average guess is going to
be 33 instead of 50. That would make their two-thirds guess 22! Kinda makes your
head spin, doesn’t it?
     In 2005, the Department of Economics at the University of Copenhagen in
Denmark ran a well-publicized trial of the game in the Danish newspaper Politiken.
They offered a cash prize of 5,000 Danish kroner (about $1,000) to whoever had the
closest guess. They attracted over 19,000 submissions via an Internet site. Needless
to say, that wasn’t a bad sample size. (In fact, about 1 in 300 Danes participated!) The
average of the guesses in their trial was 32.407, which led to a winning target of 21.605.
    Upon examination of the histogram of the submissions (Figure 6.2), a couple of
things stand out. First, there was a wide distribution of guesses—including some
people who actually did guess 100. In fact, while it is definitely the sparsest area of the
chart, a surprising number of people guessed above the “impossible point” of 66.7.

        FIGURE 6.2 The results of the Danish experiment show that the two
  most popular guesses were 33 and 22. Indeed, the average guess was 32.4, making
  the winning target 21.6. Graphic from “Making an Educated Guess” Working Paper,
      Department of Economics, University of Copenhagen by Jean-Robert Tyran
              and Frederik Roose Øvlisen (2009) used with permission.
92   Behavioral Mathematics for Game AI

          The guess receiving the largest number of votes (over 6%) was 33. Remember
     that we had touched on 33 being a potential solution—but only if all of the other
     votes were equally distributed between zero and 100, making the average guess 50.
     So, some people were obviously thinking along those lines. However, their fatal
     flaw was expecting that even distribution. Just as superrationality—the rationality
     of all participants—is an unreasonable expectation, expecting that all of the partic-
     ipants are completely irrational is just as flawed a premise.
          The guess receiving the second-most number of votes—and, at 6%, only
     slightly behind the 33 guess—was 22. Once again, the mindset seemingly at work
     here is one that we touched on earlier. The people who guessed 22 were counting
     on the fact that a large majority of the other folks were going to be guessing 33. They
     were hoping some players would be at least thinking about their answer (unlike the
     players who guessed 100, for example), but not thinking too much. By taking those
     people into account, and then basing their actions on that information, people who
     guessed 22 were, in fact, very close to the solution.

     Measuring the Depth of Rationality
     So, it would seem that assuming that at least a portion of the population was partially
     rational is a valid strategy that leads to an optimal solution. The question is, how
     many of the participants are going to be rational and to what extent? We know the
     answer lies between two extremes. Also, as we have seen, the answer presents itself
     as a continuum that is represented by how many iterations people go through in
     attempting to outguess the other people.
         For example, we have the random players whose guesses are spread all over the
     possible range from 0 to 100. Those players are not thinking at all about what any-
     one else could do. This is evidenced by the fact that people are willing to even select
     numbers of the impossible result threshold of 66.67. Let’s give these people a ratio-
     nality index of 0. In essence, they are simply randomly picking numbers.
     Statistically, the average of their guesses should be close to 50 (Figure 6.3).
         The next group of people takes the “zero rationality” folks into account. They
     work from the assumption that people are going to be all over the map with their
     guesses, figure that the average will be 50, and then guess around 33. They have per-
     formed one level of iteration through the logic progression. Let’s give them a ratio-
     nality index of 1.
         The people who are aware that the level-one people exist would also then take
     those votes into account. With the presumption of the 33 vote being a big draw,
     they would then guess 22. We will assign this group a rationality index of 2. They
     have performed two iterations of thought—they assumed there would be level ones
     who would count on the existence of level zeros.
                                     Chapter 6 Rational vs. Irrational Behavior     93

   FIGURE 6.3 Each group of people will take into account the groups before them.
 For example, the index 1 group will assume the index 0 people are guessing randomly.
       The index 2 group will assume the index 1 people are basing their guess on
                                  index 0, and so on.

     If some people take the level-two guessers into account and base their own
guesses on that (i.e., 15), we would give them a rationality index of 3. Again, as be-
fore, they performed three layers of calculation, in that they are assuming that
plenty of levels twos exist based on the level twos’ belief about the level ones which
are, in turn, based on the assumption of the level zeros.
    If you were to proceed all the way down to the bottom, you would get to a guess
of one at rationality index 10. That is, someone would have to assume that level
nines exist who are basing their own guess on the assumption that level eights exist,
and so on.

How Low Will You Go?
So, how many iterations are people either capable of doing, willing to do, or believe
is appropriate to do? Separating out those three reasons for not proceeding further
is a little difficult, but coming up with how many iterations of logic people do is eas-
ily measured. As we saw in the Danish example, if a significant number of people
“get it,” it tends to show up on the histogram of the results. In this case, there were
plenty of index 0 people, about 6.5% were index 1 (guessing 33), and about 6%
were index 2 (guessing 22). After that, things tend to blur out and are no longer
apparent from the histogram. This is not a coincidence.
    Many researchers have toyed with the game and found that those patterns re-
peat. By changing the game to “guess 70% of the average,” for example, you would
94   Behavioral Mathematics for Game AI

     expect the most popular guesses to change accordingly. It turns out that this was,
     indeed, the case. Most people were random (index 0), many people guessed 70% of
     the expected average of 50, or 35 (index 1), and another large batch guessed 70%
     percent of 35, or 24 (index 2). Just like in the Danish example, index 3 either didn’t
     show up or was diffused enough that it blended into the background.
          Changing the game to other rules provided similar results. When various stud-
     ies around the world have been run with multipliers such as 0.7, 0.9, 1.1, and 1.3,
     the pattern was repeated. It turns out that people very rarely get to level three. In
     fact, studies have shown that most people end up with a rationality index of
     between 0 and 3. (One study showed that computer science folks ended up the
     highest, with an average of 3.8 iterations. That isn’t to say they won; they probably
     didn’t because they were thinking too far beyond the other guessers.) So, we can
     determine that, for some reason, people don’t bother going any further than a cou-
     ple of iterations into the logic. This could be for a variety of reasons.

         They don’t think of it.
         They can’t do the calculations.
         They don’t believe it is relevant because other people can’t or won’t.

          An interesting twist to this is that the rationality index that people get to is
     based on who the other players are. For example, if you give the game twice to an
     economist who knows he is playing against the general public the first time and a
     bunch of other economists the second time, he will change his answer accordingly.
     He will assume that the other economists will iterate further than the general pub-
     lic and adjust his selection to a lower number (a higher rationality index) accord-
     ingly. This leads us to the assumption that some people aren’t stopping their
     iterating process because they can’t think that far, but rather that they are doing so
     because they realize it is pointless to continue.
         If we think back to the pure strategy of this game, we were led to the logical con-
     clusion that we should guess zero. We arrived at that conclusion under the as-
     sumption of the superrationality of the participants, which we know to be
     unreasonable. The above observation shows that—excluding the 2% of Danes who
     guessed zero—many people are aware of that unreasonable expectation of ubiqui-
     tous rationality.

     Logical Irrationality
     What we have found is that, when faced with the very likely absence of superrational-
     ity, it actually becomes less efficient to be purely rational. With our first approach to
     solving the Guess Two-Thirds Game, we were being purely rational at each step.
                                      Chapter 6 Rational vs. Irrational Behavior      95

We were iteratively eliminating logically impossible answers and, in doing so, as-
sumed that the entire population of guessers would do so as well. Eventually, this
led us to the purely rational answer of guessing zero.
     In our second attempt at solving the problem, we started with the premise that
the population was not superrational; that is, some people may be rational, but it is
not likely that everyone is. By taking away the aura of superrationality, it would
seem that our problems have been solved. However, we still had to decide how to
rationally proceed from that standpoint. As we showed, this can be done in stages—
which we labeled as index 0, index 1, and so on. The same caveat is in play, however.
If we proceeded on logically, we would eventually end up back in the territory of
guessing zero once again.
     Both of these approaches lead us to the wrong answer. While we may have
wonderful proofs of the perfect rationality of guessing zero, it is simply not a good
answer to this problem. In the first method, our premise of superrationality was
flawed. Put another way, it is irrational for us to assume that everyone else is ratio-
nal. In the second method, we started with the right idea. We gave up on the notion
that everyone was rational (giving us the index 0 guessers) but still had to count on
the fact that at least some of them were (the index 1 guessers). So, it is rational to as-
sume that some other people are rational, but how far can we take that? At some
point, the seemingly rational chain of iterated processes has to be abandoned. We
can no longer assume that things will be the way they “should be” because, at some
point, other people will no longer be acting the way they “should.”
     Therefore, it is in our best interests to, somewhere along the way, abandon the
rational method of arriving at an answer and say, “Enough!” We are logically con-
cluding that we are going to no longer be purely rational. It is logical to act irra-
tionally simply because the environment that we are making the decisions in is
irrational as well. If the other people involved aren’t going to be completely ratio-
nal, then it is not logical for us to be completely rational. (I don’t recommend using
that justification during an argument with your wife. And definitely don’t do it
where your kids can hear… you don’t want it coming back at you later!)
     The whole idea can be summed up in a single quote from Mr. Spock, Star
Trek’s renowned purveyor of logic and rationality. When Captain Kirk asked Mr.
Spock if a particular action was “the logical thing to do,” Spock replied, “No, but it
is the human thing to do.” And isn’t this, after all, what we are trying to accomplish
in crafting our AI?
96    Behavioral Mathematics for Game AI


      So, through an analysis of the Guess Two-Thirds Game, we have discovered that it
      is sometimes illogical to be perfectly rational—even when we can be. The reason for
      this is that we can’t count on superrationality. We know superrationality does not
      exist because, even despite our best intentions, humans are not always rational in
      their decision making. Of course, humans are not completely random either. If
      perfect rationality is the pinnacle of the mountain, random decisions are the boulders
      strewn around the base. Someplace along the slope between these two extremes is
      where the bulk of human decision making is positioned.
          But what makes up these decisions? What keeps us from making perfectly ra-
      tional choices, even when we may want to do so? Even when we realize that it is in
      our best interests to do so? What is the barrier that gets between the slope of the
      mountain where we are and the pinnacle of perfect rationality? It turns out that,
      while there are many different pitfalls along the way—too numerous to list, in
      fact—many of them can be generally categorized the same way. In a word, error.
            The term bounded rationality has come to represent the notion that people
      often are not or cannot be completely rational. This is usually due to failings in the
      ability to perceive or calculate adequately. If we were, once again, to go back to our
      list of criteria for normative decision theory, we would find that many of the expec-
      tations therein are the impossibilities (or at least unlikelihoods) that get in the way
      of perfect rationality.

          Has all of the relevant information available
          Is able to perceive the information with the accuracy needed
          Is able to perfectly perform all the calculations necessary to apply those facts
          Is perfectly rational

          Beginning at the top, we often do not have all the relevant information avail-
      able. In the case of the Guess Two-Thirds Game, for example, we simply do not
      have the ability to know what everyone else is going to guess. That isn’t to say that
      information isn’t there; we simply can’t access it. Even when the information is
      available to us, we may not realize it as being present or important.
           The second item is only slightly different. Even if we know the information is
      there and that it is relevant, we may not be able perceive it with the accuracy needed
      to lead us in the right direction. That is a simple human failing of not only our
      physical senses, but of attention and comprehension.
                                               Chapter 6 Rational vs. Irrational Behavior   97

       Calculating Which Calculations to Use
       One of the most important problems that humans have with mimicking perfect
       rationality—and therefore normative decision making—is the third point on the
       list. We simply are not wired to make many of the calculations that are necessary.
       This is not limited to performing math and whatnot. Surprisingly, we do more of
       that than we realize. Simply catching a fly ball requires us to instinctively perform
       a startling number of differential calculus equations, yet this is something that small
       children can do many years before they are comfortable with long division. The
       problem of flawed calculation is a little more subtle than that.
           Often, the calculations that need to be made are not the ones we are handed
       and can write down on paper. Instead, the problem of calculation is determining
       what calculations need to be made in the first place. This is similar to the percep-
       tion of information issue above. Faced with an endless array of potential factors and
       an exponentially larger number of combinations of those factors, simply deciding
       what to include and what to exclude is an enormous feat.
           Both including too much information and not including enough information in
       our decision-making process can lead to failures to arrive at logical conclusions.
       These are two of the important “boundaries” that lead to bounded rationality.

       Thankfully, as we mentioned in Chapter 2, most sentient life forms have a built-in
       filter known as latent inhibition. This helps us sort through things in our environment
       and dismiss those which are not relevant. If, as a kid, we were in left field dutifully
       doing our differential calculus to catch the incoming fly ball, we would probably
       concede that the length of Suzie Miller’s straw relative to the depth of her cup and
       her volume of remaining Cherry Coke is not relevant to our task at hand (or at glove).
           However, latent inhibition is not infallible. Sometimes things slip through that
       we may think are relevant but actually are not. In fact, even being aware of these
       points may detract from our calculation ability, such as our awareness that Suzie
       Miller is watching us run for the lazy fly ball and thinking that we need to look cool
       while making the catch. At that point, we might get a little too muddled to do all that
       heady differential calculus. (Please note: This is, by no means, autobiographical.
       When I was performing athletically as a kid, I never got muddled by girls watching
       me. No girls ever bothered to watch me!)

       Inventorying the Backyard
       There certainly is a purpose for latent inhibition. This is a very important charac-
       teristic in evolutionary theory. If we were to process absolutely everything in our
98   Behavioral Mathematics for Game AI

     environment at all times, many times we would be unable to sort out conflicting
     information. As I watch the various wildlife that we have in our back yard interact,
     it is interesting to attempt to determine to what things they do and do not pay
     attention. Here are a few observations:

         The sparrows at the feeder don’t pay attention to the cardinals—but will flee
         from the much bigger grackles.
         The male cardinals at the feeder don’t pay attention to the sparrows or the
         grackles, but will readily assault another cardinal.
         The birds on the ground under the feeder don’t pay attention to the squirrels
         next to them.
         The squirrels don’t pay attention to the birds.
         The birds are aware of perches near the feeder that are available for when the
         feeder is overcrowded or in use by an unwanted rival.
         The squirrels are aware of all escape routes from the ground so as to avoid me
         and the dogs.
         The squirrels are not aware of perches they can’t reach.
         Both the squirrels and the birds pay attention to a dog when one comes out.
         Both dogs pay attention to all squirrels regardless of location.
         One dog (Jake) pays attention to the birds; the other (Maya) does not seem to
         see them.
         Both dogs pay attention to the chew rope in the grass and will seek it out.
         The squirrels and birds do not acknowledge the existence of the chew rope.
         Both the birds and the squirrels pay attention to the state of the feeder and will
         even watch me come out to refill it. The dogs only think I’m going to play with
         them and the chew rope.

         Consider that the important factors in the lives of birds, squirrels, and dogs are
     only those that are relevant to food, survival, and, in the case of the dogs, playing
     with the chew rope. Looking back through the list above and mentally checking off
     the relationships between entities as either relevant or not, we see that there are far
     more interactions that are not important than those that are important. For example,
     our dog Maya doesn’t pay attention to:

         The location of birds
         The sizes and types of birds
         The locations of perches
                                      Chapter 6 Rational vs. Irrational Behavior     99

     The location of the feeder
     The state of the feeder
     The potential escape routes for squirrels

     She does, however, pay attention to:

     The location of squirrels
     The location of the chew rope

    When looking for a place to lie down in the yard, Maya doesn’t have to take
into account the locations of the bird feeder, perches, or squirrel escape routes. She
simply doesn’t need or want to know about the location, sizes, and types of the
birds. She has little use for the information on whether or not I need to fill the bird
feeder. And, since she is now 15 years old (Lorne Greene would be quick to remind
us that this is 105 in dog years), she may only be mildly interested in where the chew
rope happens to be. If she sees a squirrel (at her age, she can’t hear them any more)
she will definitely pay attention and may, for a time, express a modicum of excite-
ment. For the most part, however, this entire backyard inventory is not important
to her. All she wants is soft grass in the sun. (Darn. I forgot to include “condition of
grass” and “location of sun” in the above list. There is so much information to include!)

Statistical Overkill in Football
In our own life, we follow similar patterns. There is simply too much to worry
about. A good portion of it is going to stay irrelevant most of the time. Most of it
doesn’t have anything to do with any particular decision we are making. And yet, it
seems that humans try desperately to include this irrelevant information in their
decision-making processes. They ascribe meaning where there isn’t any. They see
relationships where no relationship exists.
    When I was working on a statistical handicapping algorithm for NFL games in
2002, I was amazed at how many things other handicappers took into account to
predict the outcomes of games. Obviously, such things like “home field advantage”
were relevant. However, some people went to great lengths to justify using statistics
that were based on grass vs. turf, indoor vs. outdoor, good vs. bad weather, the
month of the year, wearing normal vs. alternate home uniforms, and the entire his-
tory of one team against the other. Trust me, it doesn’t stop there… there are even
more esoteric and strange factors that people take into account in the holy quest for
gridiron prognosticatory perfection.
100    Behavioral Mathematics for Game AI

           Finding the flaw in using the above statistics takes only one little bit of informa-
       tion that many people overlook: The statistics are about the teams over the course
       of decades; the average tenure for the players with a team is measured in a few
       years. It doesn’t matter if Team A beat Team B 10 straight times in the 1990s.
       Chances are, none of the players who participated in that lopsided rivalry are on ei-
       ther one of those teams any more. So, despite being really nifty information to look
       at and excellent fodder for sports bar trash-talking, is any of it relevant enough to
       take into account for today’s game? How about just simply starting out with “is this
       team good or bad?” and building from there?
            And yet, people are addicted to seeing connections in information. Literally.
       Scientific studies have shown that humans want to see order and connection. They
       want to determine cause and effect, but in their zealous pursuit of doing so, they see
       patterns that aren’t there. How often, after seeing a coin land on heads five times in
       a row, have we said to ourselves—even momentarily—“The next one just has to be
       tails!”? We all know that those five tosses have nothing at all to do with this next
       one, but our mind tries to make the connection anyway. Thankfully, if asked what
       the odds are of the next coin flip being tails, we would probably be able to shake
       ourselves enough to still say “50/50,” but decisions are rarely that simple.

       Despite our obsession with ascribing meaning to information, typically it is not the
       act of being over-aware of possible considerations that becomes a problem. More
       often the failure happens when we decide that things are not important when they
       really are. By deciding—even subconsciously—that something is not relevant to the
       decision at hand, any calculations that would have included that information are ei-
       ther skewed to some degree or rendered completely null. This is typically the most
       problematic hitch that leads us to not act in a perfectly rational way.

       The Monty Hall Problem
       Excellent examples of people failing to calculate properly are exhibited in the way
       they make decisions when put on the spot on TV game shows. When I was a kid in
       the mid-1970s, I vaguely remember seeing Let’s Make a Deal. At the time, I was
       more amused by the adults wearing ridiculous costumes to get the host’s attention
       than I was by what has become an iconic statistical logic problem. However, one of
       the staples of that show has become a classic example of the way the human brain
       manages to neglect very important information needed for a calculation.
           The Monty Hall Problem, named after the host of the show, comes across as a
       deceptively simple decision. However, a staggering number of people get the actual
       answer wrong. Even when asked to explain their decisions, the respondents often
                                      Chapter 6 Rational vs. Irrational Behavior     101

offer a justification that shows they aren’t acting in an entirely random fashion…
they are simply wrong in their logic. Thankfully, plenty of studies have been done
on Mr. Hall’s challenge. Science has seemed to nail down exactly where people go
wrong. Before I give away the answer, however, it is educational to approach the
problem blindly.
    The actual scenario presented predates Let’s Make a Deal significantly.
Variations on it can be traced back over 100 years. Joseph Bertrand proposed a
similar question now known as “Bertrand’s Box Paradox” in his book, Calcul des
Probabilités in 1889. Another version is known as “The Three Prisoners.” The Monty
Hall version is a little easier to relate to, however, so we will use that as our example.
     In the classic version of the problem, we (as the contestant) are presented with
three doors. We are told that behind one of the doors is a valuable prize such as a
car. Behind the other two doors are gag prizes such as a goat. We are asked by the
host to select one of the doors behind which we believe is the car. Prior to opening
the door, the host opens one of the two doors that we did not choose and reveals
what is behind it. At this point, we are now asked if we want to stay with the door
that we chose or if we would like to switch and take what is behind the other door
instead. What should we do?

I really have to point out here that when I presented this question to my wife,
she responded, quite genuinely, “But what if I want the goat instead of the
car?” I think it had something to do with not having to mow the lawn. Believe
it or not, her comment was actually more helpful than you may think. We will
cover the very important concept of the relative subjective worth later on in
Chapter 9.

For now, let’s work from the premise that: car = cool; goat = lame.

     An overwhelming number of people to whom this problem is presented will
stick with their original choice. Usually, the percentage is up around 80%–90%.
Some people will switch, but the reasons they do so are based more on a “what the
heck” attitude more than from any sort of logical deduction. The tragic thing about
this is that, when presented with this problem, the best option is to always switch.
It doesn’t guarantee you the car, but it increases your odds of doing so.
    “But how can that be?” we may ask. After all, with two doors left, we know
there is a goat behind one and a car behind the other. Our odds of having already
selected the door with the car are 50/50, right? This is the logic that is presented by
the people who do switch as well. They figure that, with the even odds, switching
makes as much sense as staying. Right answer, wrong reason.
102   Behavioral Mathematics for Game AI

      Checking Our Premises
      To find out what the actual probabilities are given our remaining two doors, we
      need to walk through the problem. When our decision time comes, there are two
      possible scenarios.

           1. We picked correctly.
           2. We did not pick correctly.

          Before we let our 50/50 mentality lead us forward, we need to check the premises
      on which that 50/50 mentality is based. To do that, we have to understand what
      happened prior to the “keep or switch” decision. Somewhere between our initial
      choice of which door we wanted and our second choice of whether to keep or switch,
      something very relevant happened—but we have to know where to look for it.
          As we know, the only event that happens in that time frame is that the host opens
      a door for us, showing us what is behind one of the doors that we did not select. As
      we have done before in Chapter 5, we need to now put ourselves in the role of the
      host. What decision is he making in that time? What information is he using to
      make that decision? And most importantly, what information does that convey
      to us for our keep or switch evaluation?
           First, let us analyze what happens if we had picked correctly to begin with
      (Figure 6.4). At this point, the host is faced with two doors behind which are goats.
      It doesn’t matter which one he elects to reveal to us. He could pick either one and
      it doesn’t change the end result. We are, indeed, faced with a car (our door) and a
      goat (the door he left for us).

         FIGURE 6.4 In the Monty Hall Problem, because the host knows where the car is,
       his selection of which door to reveal is more important than our initial, blind selection.
            Always switching doors nets us a two-thirds chance of winning the car rather
                                 than the expected one-half chance.
                                     Chapter 6 Rational vs. Irrational Behavior    103

    On the other hand, if we had picked a goat door to begin with, the host is now
in control of two doors… one that hides the car and one that hides the other goat.
The host is not going to reveal the car to us. Therefore, he has to open the door that
hides the other grass-muncher. Logically, that means the door that he didn’t open
has the car.
    At this point, our question may be, “But how do we know whether we selected
correctly to begin with?” Well, we don’t. However, the solution is for us to use some
of the same simple probability math that we were willing to use before. Our error
was not doing this probability math sooner.

Rethinking the Probabilities
Originally, there were three doors from which to select. That means a random
choice (which was, in effect, what we were making) gives us a one-third chance of
selecting the car and a two-thirds chance of selecting a goat. Again, for emphasis, we
had a two-thirds chance of being wrong and only a one in three chance of being
correct in the first place. When brought forward to our second decision, the same
chances hold true. Regardless of what the host does in the interim, we still only have
a one in three chance of currently claiming the correct door. We have a two in three
chance of being the proud, temporary custodian of a domestic barnyard animal.
    This is to our advantage, however. As we mentioned above, if we selected incor-
rectly (two-thirds chance), we forced the hand of the host. Of his two remaining
doors, he must take the other goat off the table and leave the car on it (which is a fun
metaphor, isn’t it?). That means that if we originally selected one of the wrong
doors, we now know that the other remaining door is the car. It has to be.
    So, to sum up:

    We had a one-third chance of originally being right—we should stay with our
    door in order to win.
    We had a two-thirds chance of originally being wrong—we should switch
    doors in order to win.

    If we rephrase slightly to see what happens if we always switch, we arrive at:

    With a one-third chance of originally being right—switching doors would
    always lose.
    With a two-thirds chance of originally being wrong—switching doors would
    always win.
104   Behavioral Mathematics for Game AI

          Therefore, if we always switch, we win two-thirds of the time. The only time we
      would lose is if we had originally picked correctly—and that only happens one-
      third of the time. Notice how this is different from our original 50/50 premise?

      What Did We Miss?
      The question is, what information did we miss the first time that led us to the
      wrong conclusion? Research has shown that, for whatever reason, people seem to
      dismiss the host’s selection of doors to reveal. They are so caught up in the random-
      ness of their initial selection and the subsequent, seemingly random selection of
      whether to keep or switch that they neglect to consider the thought process that
      went into the host’s decision. People seem to ascribe a similar random nature to the
      host’s decision. They don’t take into account the fact that the host knows where the
      car is. His decision is not random; he is trying to make you choose. After all, Monty
      Hall’s game was not titled Guess the Right Door; it was called Let’s Make a Deal. The
      attraction to the game was not seeing whether or not you got the car on the first se-
      lection. It was in the tension surrounding the decision of whether or not to switch.
      For that scenario to arrive, the host could not show you the car—and in doing so,
      tip his hand.
          If we think back to the examples in Chapter 1, Monty was not the Blackjack
      dealer mindlessly dealing out random cards and following prewritten rules. Monty
      was in the role of the Poker player from Chapter 1, making his own decisions that
      directly affected yours. (Ironically, on any given show, he may have been the only
      person in the studio to not wear a stupid hat.)
          So, it turns out that a puzzle that is certainly solvable from a logical, rational,
      mathematical standpoint fools most people to whom it is presented. Even people
      who get it right often do so for the wrong reason. Looking back at our normative
      decision theory criteria:

          Has all of the relevant information available
          Is able to perceive the information with the accuracy needed
          Is able to perfectly perform all the calculations necessary to apply those facts
          Is perfectly rational

          We had all the relevant information available to make a decision. We were able
      to perceive the information accurately enough. Most people would be able to perform
      the very simple probability calculations with all the accuracy needed. And, with all
      of the above, we seemed to be acting in a perfectly rational fashion. The problem
      that slipped through was that, while we were able to perceive all the information, we
                                            Chapter 6 Rational vs. Irrational Behavior   105

       didn’t perceive it all. We simply left out the fact that Monty’s decision on which door
       to open for us was a relevant fact. Somehow, we failed to take into consideration
       that his choice was not random—he knew where the car was and was intentionally
       avoiding showing it to us. More importantly, even if we knew that he was hiding it
       from us, we failed to take into account that this significantly skewed the odds in one
       direction. It was available and perceivable. We just didn’t think it mattered.
           The lesson here is that just because all the information is available to as and we
       know what to do with it, we won’t necessarily think to use it. In short, humans
       fail… even with the simplest of decisions. Therefore, their quest for perfect ratio-
       nality is bounded.


       While Monty Hall taught us that we often fail to take all the relevant factors into
       account when making a decision, there is a different reason we may not reach the
       pinnacle of rational decision making. Sometimes we may elect to not perform in a
       completely rational manner. This is different from what we explored above with the
       Guess Two-Thirds Game where we decided to not pursue a purely rational line of
       thought. In this case, by conceding to rational ignorance, we are making a rational
       decision to remain ignorant of all the relevant information. By doing so, we will be
       violating the rules of normative decision theory in that we will not have all the rel-
       evant information, may not perceive it accurately, or may not calculate it properly.
       There are certainly reasons for each of these.


       Thinking back to Cutting the Cake in Chapter 5, as the Cutter, we knew that our
       opponent (the Decider) was going to select the biggest piece of the two. Therefore,
       it was in our best interests to cut as close to the middle of the cake as possible.
       However, short of laboratory conditions and a handy electron microscope, there is
       no way we could accurately ascertain the exact middle of the cake. Even as the
       Decider, after a less-than-perfect cut, we would have very little way of knowing
       which of the two pieces was bigger. In fact, the closer the cut is to being perfectly
       equitable, the harder it would become to make the correct decision. Given enough
       time and through painfully protracted measurements, we could eventually identify
       the biggest piece, but what would we have gained? And what condition would our
       confection be in after such a lengthy investigative process? That is, what is the cost
       of gathering all that information?
106   Behavioral Mathematics for Game AI

           We face these decisions many times a day in our own lives. For example, imag-
      ine that at the grocery store, we see two cans of green beans from different compa-
      nies on the shelf. While they are otherwise identical, there is a price difference of
      five cents between them. We could stand and ponder what would have caused the
      difference. Does the store have too much inventory of one and is trying to reduce
      it? Why is there too much inventory? Was it not selling? Why was it not selling?
      Perhaps the wholesaler of the more expensive brand is taking more profit? Or,
      turning to a factor that is likely more important to us, is there a quality difference
      between the two brands?
          We could launch an investigation, interviewing the store managers and the
      wholesaler and researching the prices at other stores. We could jump out onto the
      Web to see ratings and reviews of other people’s experiences and opinions with the
      two brands. We could pull the financial statements of both companies and exam-
      ine their relative profit margins to see how much they are really making off their
          Or we could just grab a can and continue shopping.
           While all of that information could certainly be relevant to our decision, espe-
      cially the quality difference, is it really worth the five-cent difference to go through
      all of the effort? We are far more likely to decide between the two based on a single
      factor: Is the possibility of a little extra quality in our beloved green beans worth five
           This would certainly be a different story if we were purchasing a new car and
      were looking at a difference of $5,000 instead. Why are those two seemingly similar
      automobiles priced so differently? In that instance, the information may be worth
           I’m not going to attempt to solve this problem now. We will attempt to quan-
      tify the costs of pleasure, pain, happiness, and even time in future chapters. The
      point is that by just grabbing a can of beans and moving on, we have made a ratio-
      nal choice to remain ignorant of facts that could possibly be relevant to us. The cost
      of that information is just too high for the benefit we would get out of it.
           These decisions are common in the game world as well. Many times, the cost of
      the information is directly related to the computational power available to us. That
      is an external barrier that is not directly related to modeling of behavior, however.
      Even if we had more than enough processing power with which to compute things,
      there are times when it is more rational to not acquire information.
                                               Chapter 6 Rational vs. Irrational Behavior   107

           IN   THE   G AME   Scouting the Enemy

       If we are playing a strategy game, we may be faced with a decision about what sort
       of units to build in an army. Do we want to construct melee units? Archers? Do we
       need fast units such as mounted ones? Do we need defensive, area-of-effect units
       such as catapults? Much of our decision is based on what the other player would be
       building. If he is constructing primarily melee units, for example, we may want to
       load up on archers so we can attack him at a distance before he gets the opportu-
       nity to close on our forces. But how do we know what he has built already or is
       building currently?
           One way of solving this problem would be to send scouting units to the enemy
       base. Once we discover him and spy on the make-up of his armies, we can better
       decide how to proceed with our own production.
            This approach has a few problems, however. First, we may incur a cost of time
       and material in creating a scout. We may lose that scout if it gets discovered. This is
       especially problematic if the scout is discovered prior to reaching the enemy base—
       we would get no information for our efforts. Additionally, we have to wait for the
       scout to reach the enemy base before any information can be gathered. That is time
       we could be spending producing an army but instead are electing to use waiting for
       the information to make the proper decision. And, on top of all of this, the scout
       could acquire incorrect information. In fact, our enemy may intentionally work to
       deceive potential scouts by building things differently initially and then changing
       later. Or, our enemy may hide the bulk of his real force and expose what he wants
       our scout to see—and for us to believe.
           All of these are things that can go wrong through us wanting to collect informa-
       tion with which to make a decision. We are costing ourselves valuable time and
       resources for what may be questionably useful information. At that point, most of
       us would be content to say “never mind” and simply start building an army with
       what we believe will be a correct balance. As we gather information later (and at less
       direct expense), we can change our plans or adapt accordingly.
           What we have elected to do is proceed with our plans in rational ignorance.
       We left ourselves ignorant of some of the information we could find useful simply
       because the cost of acquiring it would be too high.

       Another reason for adopting rational ignorance may be the cost of calculation. This
       is very similar to expending effort on the cost of information above. In this case, we
108   Behavioral Mathematics for Game AI

      are stating that sometimes the cost of calculating the relative merit of something is
      too high to justify the additional information it may provide us.
          For the most part, humans are very accepting of generalized answers to ques-
      tions. When we ask what time it is, we don’t need an atomic clock on hand. When
      we ask how long it will take to drive to the store, we don’t care to know down to the
      second. When figuring a tip at a restaurant, most of us shoot for the rounded-off
      dollar amount that represents whatever tip percentage we would like to bestow.
           Imagine for a moment that you knew someone who did calculate this.
      Conveniently, I did know someone who did this, which makes for a wonderful, if
      hair-pulling example of this point. When splitting the bill with people, she would
      calculate the exact percentage of the bill that each person was responsible for (in-
      cluding the proper division of the total sales tax). Then, she would apply that per-
      centage to an exactly figured 15% tip. When she was done, each of us knew not only
      what we owed for the meal, including splitting the sales tax, but exactly—to the
      penny!—what we should tip the waitress. (Don’t bother e-mailing me with obser-
      vations on how there was an easier way to do this… we tried to tell her.) The fun
      really began when she tried to collect money from everyone for each of their
      portions… down to the penny. She once had to ask the waitress for change so that
      she could give us change on our portions so that—you guessed it!—we could give
      the change back to the waitress.
           Although we have never come to a conclusion on the exact pathology that caused
      her to do this (the cost of calculation was too high), we did come to the agreement
      that it wasn’t worth it to go through all of that. Not one person in the arrangement—
      neither diner nor waitress—was all that interested in a penny here or there. We were
      all content to be rationally ignorant of the exact amount that each person owed or the
      exact amount that we should tip the waitress (which is a spectacularly arbitrary figure
      anyway). Just like in the above example of the cost of information, we were pleased
      enough to say, “Here… It’s good enough. Take this so we can leave.”

       IN   THE   G AME   Counting the Enemy

      In the previous example, we considered waiting until we knew the make-up of the
      enemy forces before we started trying to build our own. The hope was that we
      could then tailor our forces to match the type of units the enemy was building. One
      thing we overlooked for the sake of simplicity was that it was likely that the enemy
      would be building a reasonably balanced force. They were not likely to build all of
      one type of unit—archers, for example—and none of any other. Such homogene-
      ity is, in and of itself, not very efficient. It would also lead to an answer that is not
      much more difficult than the decision matrix for Rock-Paper-Scissors.
                                            Chapter 6 Rational vs. Irrational Behavior   109

           If we are then to assume that the enemy forces are a cornucopia of varying unit
       types and strengths, how do we go about building our own forces to adequately
       match theirs? For example, if we were to discover that the enemy is building 60%
       melee units, is our decision going to be different from if they were building
       70% melee units? Or 75%? What if they were building 60% melee units but also
       30% mounted units? Is that different than 60% melee and 25% mounted? How
       much different? If there were five or six different unit types, the complexity mounts
       considerably. What if our archers are 80% effective against melee units but only
       60% effective against mounted units? Is that much different from if the efficiencies
       were 70 and 50%, respectively? How much different?
           Of course, this all comes with the caveat that we are able to perceive the num-
       bers exactly. While this may be a simple task for a computer to do, it is somewhat
       more difficult for our senses to process. Humans can only handle counting so many
       objects at one time before we start to get a little bleary eyed. Unless the enemy is
       polite enough to line his army up in neat rows of 10 units each, simply getting an
       accurate count of the various units by type is a significant challenge. Oh… don’t
       forget that this number is also likely to be continually changing as well.
           The best we can hope to come up with is that the enemy is creating “more of
       A than of B and maybe twice as many A as C.” Calculating with any greater accuracy
       than that may not only difficult, but, given the transient nature of any instanta-
       neous count, it could also be a degree of specificity that is irrelevant. So what do we
       do? While we are still interested in the general makeup of the enemy army, we
       choose to be rationally ignorant of the exact order of battle.


       So what is the logical, rational conclusion of all of this? At the beginning of the
       chapter, we suggested that the human behavior that is presented by descriptive
       decision theory (i.e., what people tend to do) lies somewhere between the rigid per-
       fection of normative decision theory (i.e., what people should do) and complete
       randomness. Let’s review what we have explored.

           Perfect Rationality Is Flawed: While we can admit that acting under perfect
           rationality leads to decisions that are “right” in a theoretical sense, they often
           fail to yield realistic-looking decisions.
           Not Everyone Is Rational: The reason acting perfectly rational often does not
           yield good results is that we cannot expect superrationality—the perfect ratio-
           nality of all other players.
110   Behavioral Mathematics for Game AI

          Not Everyone Is Irrational: Although we cannot expect superrationality, we
          can expect that at least some players will act in a rational fashion at least some
          of the time.
          Pursuing Rationality Has Limits: Even when people are attempting to be ratio-
          nal, they are often limited in their ability or desire to do so. This results in them
          acting partially rationally—often to varying degrees.
          Acting Irrationally Can Be Logical: Because we cannot expect everyone else to
          act either perfectly rationally or partially rationally, it is sometimes logical for us
          to act in a fashion that is not perfectly rational.
          People Exhibit Bounded Rationality: People are not perfectly rational because
          they often fail in their abilities to perceive and calculate the information cor-
          Pursuing Rationality Can Be Prohibitive: Sometimes the costs of attempting
          perfect rationality are so high compared to the benefits we would gain that we
          can justify remaining in rational ignorance of the information.

           The above observations lead us, not to a pinpoint solution of how to approach
      modeling human behavior, but rather to the notion that a range of behaviors is
      more appropriate. On one end of this range is the pure logic and math of perfect
      rationality; on the other end is the mindless chaos of randomness. Neither endpoint
      is acceptable as a solution that resembles the way “real people” think and act. But
      what lies in the middle?
          Before we can proceed with modeling behaviors, we need to know what that
      range is. And before we can begin to map out what that range is, we need to deter-
      mine how ranges like this are measured. That is, we need a measuring stick.
7             The Concept of Utility

         ne thing that was left unsaid in the analysis of the Prisoner’s Dilemma was
         the reason behind making the decision. Perhaps this is obvious, but the
         underlying premise is that going to jail was something negative. It is an
undesirable end—which is the reason it is considered a “loss.” However, strange as
this may sound, it is not a rigidly defined loss condition. Different people may have
different beliefs about and weights for the negative aspects of incarceration. For
instance, if you were quite certain that your wife would leave you if you were gone
for five years but not if you were only gone for six months, it makes the six-month
option look a lot better than the joint possibilities of either no time or five years.
However, if even six months away would be horribly detrimental to your life, you
may be more willing to take a chance on getting no time whatsoever by betraying
your partner and hoping he doesn’t do the same to you. What it comes down to is
a matter of how you, as an individual, value the various factors involved.
     There are other possible considerations. It is my understanding that there is an
unwritten code among the criminal masses: You do not rat people out. Not only are
there ramifications on the street for doing so, but there are ramifications from the
other inmates while incarcerated as well. That adds another aspect of personal util-
ity to the equation. How important is it—from a moral standpoint (and that term
is used within the context of the criminal mind)—to not betray a partner simply on
principle alone? These are all factors that provide the subtle shading of the decision
process well beyond what we thought we had determined was the black and white
of mathematics.
     In economics, utility is a measure of the relative satisfaction from or desirabil-
ity of consumption of goods. Put more simply, “How much does this mean to you?”
Whether it is money, time, the value of building a tower, or the effectiveness of a
razor blade, the problem with attempting to quantify the utility of an item is that
different things are worth different amounts to different people.

112   Behavioral Mathematics for Game AI

          For instance, in Chapter 6, in my introduction to the Monty Hall Problem, I
      mentioned that my wife suggested that getting the goat instead of the car wouldn’t
      be too bad. I disagreed. She and I have a differing opinion on the relative merits of
      goats and cars. We put different utilities on them.
           As a slightly less bizarre example, I put a differing value on a decent steak than
      does my wife. She, on the other hand will pay massive amounts of money for her
      particular favorite latte—which I view as paying money for coffee-flavored milk. I
      just don’t get it. The technical restatement would be, I have a different utility for
      lattes than does my wife. It costs the same no matter which one of us buys it, so the
      value of the latte is the same regardless. This is something that we accept as part of
      human nature (and something we will attempt to quantify later).
          To some extent, these differences can be modeled mathematically. While not
      accounting for everything, it gives us a starting point from which we can grow,
      tweak, and massage.


      Probably one of the easier ways to examine utility is to do so in the realm of com-
      parative values. By using ordinal rankings, we are able to achieve a sense of relation
      between objects or thoughts—that is, “This is more important to me than that.” At
      that point, we can then begin the process of scaling things in a way that provides not
      only quantification but direct comparison—for example, “This is twice as impor-
      tant to me as that.”
          Blaise Pascal was one of the 17th century’s more amazing minds. His contribution
      to mathematics came by way of trying to help out a French aristocrat kick his gambling
      habit. Despite Pascal’s religiosity, he did not preach to his friend about the evils of gam-
      bling. Instead, he provided mathematical advice on how to win at gambling.
           He took up the question with Pierre Fermat, the wellspring of what we now
      know as modern calculus. In what was likely a series of conversations that included
      more numbers and symbols than actual words, it is likely that Pascal invented prob-
      ability theory—and set in stone the underpinnings of today’s casinos.
          Pascal stated that when it comes to making bets, it is not enough to know the
      odds of winning or losing. You need to know what is at stake. For example, you
      might want to jump into a bet despite unfavorable odds if the payoff for winning
      would be really huge. (Which explains today’s multi-million dollar lotteries.)
      Conversely, you might consider playing conservatively by betting on a sure thing
      even if the payoff is small. Of course, betting on a long shot wouldn’t be terribly
      wise if the payoff is small. You might as well hand your money out.
                                                    Chapter 7 The Concept of Utility    113

            If you have any awareness whatsoever of how gambling odds are set up in such
       things as horse races, you are familiar with this concept. The “favorite” horse gen-
       erally has nothing to do with how congenial said steed is at parties. It is based
       entirely on the fact that many people believe he is the most likely to win. Naturally,
       if the aggregate opinion of the people can be trusted, we can assume that makes Mr.
       Horse a sure bet—and the payoff is low. However, if there is a horse that is consid-
       ered a “long shot” in track parlance, the payoff is significantly higher. It can be…
       you aren’t likely to win anyway. In a way, the excessive payout would be a reward
       for your foolhardiness.
            Looking at Pascal’s observations today, they don’t seem terribly spectacular. It
       is something that is ingrained in us on many levels and in many areas of our lives.
       However, the premise does merit a closer look at some specifics, if only to help us
       identify the steps in the process that has become so intuitive. (We may even find
       some surprises.)
            The essence of these sorts of decisions is that there is no way of knowing what
       the outcome is. Regardless, you have something of yours on the line—the wager.
       Given no control over the outcome for whatever reason (be it random chance such
       as a die roll or not enough knowledge of the situation such as in Matching Pennies),
       you are making what is referred to as a “decision under risk.” Pascal was attempt-
       ing to solve this problem in the realm of gambling, but it could be applied to every-
       thing from the stock market to whether or not to ask out a potential date to
       consenting to a potentially risky surgical procedure. (“As with any surgery, there is
       the risk of death.”) The best you can do is collect knowledge of the situation, ana-
       lyze your choices, and determine what the potential payouts might be. This is what
       Pascal brought to the table.

       After putting all of this thought into the mathematical laws of probability theory
       (and I assume passing on the advice to his gambler pal), Pascal applied his new con-
       cept back to his religious beliefs. What resulted is now known as “Pascal’s Wager.”
       Ol’ Blaise wanted to justify, to himself mostly, his belief in God. More accurately,
       he wanted to justify his living his life as if God existed (which, naturally, includes
       believing in God).
           He knew, as a good scientist, that it was impossible to prove the existence of
       God. In fact, he allowed that you couldn’t even determine the probability that God
       existed. However, from his beliefs in the religious teachings, he could apply what
       was “at stake” in the wager… eternal life.
114   Behavioral Mathematics for Game AI

          He created a decision matrix (Figure 7.1) much like what we have seen in
      Matching Pennies and Prisoner’s Dilemma. In this case, one question was whether
      or not God does indeed exist. Admittedly, we cannot determine the probability for
      this—much like we couldn’t determine the likelihood of our friend turning up
      heads or tails. As we will see, using Pascal’s logic, this turns out not to matter all that

            FIGURE 7.1 The scoring matrix for Pascal’s Wager. Pascal used these options
                and outcomes to determine that it was better to live as if God exists.

          The other side of the grid involves “our play” in the game. We are trying to
      decide what to do. His two choices were:

          Live as if God exists (e.g., believe, follow the rules, etc.)
          Live as if God doesn’t exist

          His payoff matrix was based on his understanding of what was at stake. This
      speaks more to Pascal’s reminder that we need to know what is being wagered and
      what is on the line.
          Walking through the grid, we can infer Pascal’s thought process. The most
      obvious possible outcome is the upper left. If we live as if God exists and it turns out
      that he does exist, we achieve eternal life in Heaven. That is, the payout is +∞ . On
      the other hand, if God exists and we life a… shall we say… less than exemplary life,
      we would be damned to hell for all eternity (or variations on that theme). The
      payout is – ∞ . The interesting part comes in the right-hand column of the matrix.
      If God does not exist, then it doesn’t matter. We lose nothing by living well; we gain
      nothing by living wild and free. For all intents and purposes those can be expressed
      by –0 and +0.
                                                      Chapter 7 The Concept of Utility      115

       Pascal’s Strictly Dominant Strategy
       Looking at those potential outcomes, the strictly dominating strategy is not very
       coy. In fact, it kinda jumps right out at you. If we live well, the possibilities are + ∞
       and –0. If we do not, our payouts are – ∞ and +0. Again, no matter what the prob-
       ability of the fact that God exists—even if it is a 1% chance, we come out ahead by
       living as if God exists. (1% of infinite happy stuff beats 1% of infinite nastiness.)
       When you combine it with Pascal’s premise that you gain and lose nothing if God
       doesn’t exist, the solution is obvious.
            However, much of the controversy surrounding Pascal’s Wager stems from
       that last point. Pascal treated the two lifestyles in question as being equal. If they
       were not, Pascal could not have made the claim that you gain nothing and lose
       nothing if God does not exist. Apparently, Mr. P. found neither appeal in the ben-
       efits of loose living, nor restriction in the relatively conservative nature of piety. For
       all intents and purposes, the two rows of that decision matrix are no different in any
       sense other than as it applies to the question of eternal life.
            Much like the blanket assumptions that were made in the Prisoner’s Dilemma
       about how awful jail is, for Pascal’s argument to be meaningful, we need to accept
       his views on a number of counts. What do we gain or lose by living in a Godly fash-
       ion? Exactly what does that mean for us? What are our “utility” values for lying,
       cheating, drinking beer, swearing profusely, and kicking puppies? For that matter,
       what are our views on the nature of Heaven and Hell? Do we have a way of quanti-
       fying our preferences for the proffered trappings of those two afterlife extremes? If
       we change those ideas based on our utility values, we also must change the payout
       matrix. This can occur even to the point where what we may believe a priori about
       whether God exists or not may come back into play much like we had to begin con-
       sidering what our partner would do in the Prisoner’s Dilemma.
           What we must realize here is that the simple matrix is not so simple once we
       begin questioning the inputs themselves. Again, while the basics of decision theory
       can be expressed in such an uncomplicated manner, it also provides numerous
       hooks upon which we can attach more interesting and expressive statements. And
       when we do so, we inch ever closer on our quest for “interesting decisions.”

       Pascal cautioned us that we must know what is being wagered before we can make
       a decision. However, while that is certainly valid and wise, that is really only part of
       the issue. The other side to the equation when making a decision under risk is that
       we have to have a good idea of why we are risking.
116   Behavioral Mathematics for Game AI

           In the case of Pascal’s gambler friend, it was obvious that the point of the gam-
      bling was to win something. The odds and the amount wagered itself were only a
      means to an end. If the end is not accomplished, adjusting what it is you are willing
      to risk is no longer an issue. To frame things back into terms of Pascal’s religious
      wager, if the terms of “living as if God exists” were structured such that it was im-
      possible to accomplish, then even attempting that approach is no longer worth it.
      Put another way, if your risk-management strategy is not capable of producing the
      desired outcome, then it is not a viable solution. Therefore, why risk at all?
          On the other hand, if your desired outcome is feasible if and only if you risk,
      then you will not accomplish your goal unless you enter into the risky behavior. If
      you elect not to risk, neither the odds nor the value of the prize will be of any im-
      portance. It’s like multiplying by zero—nothing else in the equation matters at all.
      Therefore, at times, the utility that is present in a situation is specifically tied up in
      taking the risk in the first place.

      A Monopoly on Preventing Monopolies
      Many years ago, I had the opportunity to play the game Monopoly with some
      friends. All of the people involved had played the game numerous times. With that
      background, there is an assumption that people have an understanding of the core
      game mechanics. By this, I do not mean the mechanics of rolling the dice, moving
      around the board, and so on. I am referring to the fact that the overarching strat-
      egy of the game is “get more and better stuff so that you can charge your opponents
      lots when they land on it.” Of course, the pinnacle of this concept is the titular
      “monopoly”—acquiring a collection of like assets so that their combined value is
      significantly greater than the sum of their parts.
           One method of acquiring a monopoly is through the random chance of landing
      on the associated spaces (usually three of them) and purchasing them before anyone
      else does. The odds of this happening are generally slim—especially seeing that we
      had five or six players that night. Instead, it is a generally understood and accepted
      tenet that, if anyone is going to get a monopoly at all, there will have to be barter-
      ing of some sort. Sometimes this is straight up property-for-property, sometimes it
      involves cash, and sometimes there is a combination of transactions. Sometimes
      only two people are involved. Other times, there can be three-way transactions…
      or even more. The bottom line is, unless some sort of transaction takes place, the
      game stagnates quickly. Once all the properties are bought, the only action in the
      game becomes “roll and pay.” And at the fees that are listed for the nonmonopo-
      lized, nondeveloped properties are relative loose change in the scheme of things.
           In this particular game, one of the participants, Marci, exhibited a peculiar
      strategy. By the time the properties on the board were all bought up, she had man-
      aged to accumulate pretty much one of everything important. Of the nine main
                                             Chapter 7 The Concept of Utility     117

groups of properties (eight residential groups and the railroad group), she had one
—and only one—property in five of them. At this point in the game, the transactions
needed to start flowing if anyone was to get ahead at all. Because of the variety of
Marci’s holdings, she was a rather popular person to approach with deals.
     At first, we thought she was taking full advantage of her position and holding
out for better offers. Every different combination of offers was made. “I will give
you A for X. No? How about A for Y? B for X? B for Y?” The solicitations were
flying. However, after numerous turns and an exhaustive list of offers from all of
the other players, it became clear to us that she simply was not selling. We began to
query her as to why she wouldn’t trade anything at all to any one of us. Her reason-
ing was, much to our amazement, staggeringly simple. She told us, “I don’t want to
give anyone else a monopoly.”

Playing Not to Lose
While we conceded that she was certainly doing a bang-up job in that regard, we
tried desperately to reason with her. We pointed out that, by trading someone a
particular property, it wouldn’t be giving them the third piece necessary to form
a group… only the second. Her answer was that it would allow the possibility that
they acquire the third item from someone else. We cajoled her by illustrating that,
unless she traded, she would not be able to acquire a monopoly either—thereby
weakening her own chances. To this she very simply asserted that she didn’t care
about winning by getting monopolies—she just didn’t want to lose by allowing
others to get them. Needless to say, her response elicited some significant confusion
(and no small amount of frustration) in the rest of us.
    For those of you who are curious, the game limped along in this fashion. It
turned out that the assets were arranged such that no two people could directly
trade so that each could acquire a monopoly without Marci’s help. Likewise, any
three-person deal (again sans-Marci) would have left one of the three severely
wanting, and was therefore nixed. Slowly, there was some forced consolidation,
however. Eventually, people started to gather power while Marci steadfastly held to
her zero-risk strategy. As we had prophesied, with no means of reasonable income,
she gradually bled out financially and succumbed to defeat. Of course, at that point,
her properties were taken over one by one… and the game became the fast-and-
furious trade-fest that it is generally meant to be.
     What went wrong for Marci was that she had defined “risk” in a very different
way than had the rest of us. Going back to her words, “I don’t want to give anyone
else a monopoly.” At that, she succeeded, at least for a while. If you put her strategy
aside for the moment and assume that her idea of winning was in line with the
accepted definition of “winning” in the game of Monopoly (i.e., being the last one
standing at the end), we can analyze the decision with all other things being equal.
118   Behavioral Mathematics for Game AI

           Regardless of whether you elect to buy, sell, or trade properties in the game of
      Monopoly, winning is not a sure thing. At a very minimum, the random nature of
      the dice ensure that there isn’t “one sure path” to winning. Therefore, a decision
      analysis such as the one we did for Pascal’s Wager cannot include “winning the
      game” as one of the outcomes. The best we can hope for is to stay in the game and
      continue to have the possibility of winning. Of course, with that comes the possi-
      bility of losing as well, but that’s why we play the game.

      Marci, Meet Blaise Pascal…
      We can analyze Marci’s peculiar approach to Monopoly in a manner similar to how
      Pascal justified his own risk vs. reward scenario. If we construct a decision matrix
      similar to the one we used for Pascal’s Wager (Figure 7.2), something does jump out
      at us. We can start by placing Marci’s decision—trade or don’t trade—along the left
      side of the matrix. Along the top, we list the possible uncontrolled outcomes. This
      represents the notion that, despite Marci’s best intentions, people may or may not
      get monopolies without help from her. This is similar to the unknown existence of
      God that Pascal was facing. By analyzing the matrix as we have done often to this
      point, we can find the strictly dominating strategy… that of trading properties.

                   FIGURE 7.2 The scoring matrix for Marci’s Monopoly strategy
                shows that the approach of trading properties is the only one where
                      winning is possible no matter what the other players do.

          There is a difference in this dominating strategy than what we have found in
      other analyses, however. In this case, it isn’t dominating in that it guarantees a win,
      but rather in that it gives us a chance to win. After that, all the other factors of the
      game of Monopoly (including random dice rolls) take over. What we do find is
      that there is a selection that holds only negative consequences. If we do not trade
      properties with other players, the best we can hope for is a prolonged stalemate.
                                               Chapter 7 The Concept of Utility     119

Worse still, if people manage to garner the monopolies without our help, we will
almost definitely lose.
     One lesson to learn here is that utility is often not inherent to a particular
item—but in a particular action attached to an item. Marvin Gardens has a value as
listed on the deed in the game. However, its utility in the game is in how it is used—
which may have nothing to do with the values listed. In this case, if trading away
Marvin Gardens could net you a property that you need to complete a trio, then
Marvin Gardens’ utility is worth something only in getting rid of it. If you don’t do
the action, then the utility is not realized.
     To exercise this utility, however, Marci had to take a risk. She was correct in de-
termining that, by trading, she was possibly giving someone else what they needed
to eventually win. However, by focusing only on that aspect of the transaction, she
hamstrung herself. She locked herself into the top row of Figure 7.2, but was only
looking at the top left corner… stalemate. Notice that the stalemate result is the only
one of the four possibilities that doesn’t include the word lose. Of course, the top row
is the only row that doesn’t include the word win. That word only appears in the bot-
tom row. It is that observation that very much sums up the entire point… in order
to have the possibility of winning, you often have to accept the possibility of losing.

 IN   THE   G AME   The Tortoise and the Harried

Marci’s approach in Monopoly is played out in the video game world as well, and it
is often just as frustrating for the opponents. To dip into the vernacular of the play-
ers of our games, the term turtling is used both in strategy games and in team-based
online first-person shooter (FPS) games. In both cases, it represents an approach of
playing almost entirely defensively at the expense of the offense. Obviously, the
metaphor is that of a turtle pulling into his protective shell and hoping to wait out
his attackers. Much as was the case on that ill-fated Monopoly night in the early
1990s, this is an effective strategy (and frustrating for the opponents), but only if all
you are trying to accomplish is to survive. For that reason, turtling is, more often
than not, a pejorative term.
    Certainly, depending on the game and the situation, there are times when it is
more appropriate than others. In an online team game where the goal is for one team
to capture the other’s territory, then it behooves the second team to be entirely de-
fensive. In a “capture the flag” scenario, however, the strategy of playing entirely
defensively looks a lot like Marci’s approach to Monopoly. In fact, if we put this
scenario into a decision matrix such as in Figure 7.3, it will look strikingly like what
we saw in Figure 7.2. If you elect to turtle and only defend, the other team may not
get your flag, but you certainly are never going to get theirs.
120   Behavioral Mathematics for Game AI

                     FIGURE 7.3 The scoring matrix for a theoretical game of
                    capture the flag shows that a turtling strategy (top row) may
                    either prolong the game into a stalemate or eventually lead
                        to a loss—but is not a valid strategy to win. The only
                         possible way of winning is to attack at some point.

           Again, depending on the game and genre, building armies or spawning soldiers
      is only part of the solution—just like holding on to Marvin Gardens. If the rules of
      the game dictate that a goal needs to be accomplished to win, and achieving that
      goal has an element of risk involved, then no amount of calculation will benefit you
      if you do not first elect to take the risk.

           Now certainly, there are risks worth taking and those that are not. Much of
      the remainder of this book is designed to help with those calculations. However, the
      point made here is important. Just as Pascal posited that, in addition to the odds of
      the game, “you must know what is at stake,” we must also know that there are times
      when putting things at risk actually changes the odds of the game. The question is
      not “Should I trade my properties in Monopoly?” The answer to that is “Yes.” The
      question is actually “How should I trade my properties?” The answer to that involves
      more detailed calculation, as we shall see shortly. However, the assumption can be
      made that depending on how much and which properties you trade, the odds of
      winning will actually change significantly… for better or worse.
          Similarly, in a capture the flag game, the question is not “Should some of my
      forces go on offense?” If your team ever hopes to win, again the answer is “Yes.”
      The question would need to be “How many of my team members should go on
      offense?” Once again, this is a question that demands more calculation than a two-
      by-two matrix can provide. But, unlike betting on the fixed odds of a flip of a coin
                                                       Chapter 7 The Concept of Utility     121

          or the roll of a die, the amount of the wager (in this case, the team members com-
          mitted to offense) actually changes the odds of success… again, for better or worse.
               Of course, our next question—indeed, our next challenge—becomes determin-
          ing the answers to those “how much” questions. And for that, we need more than
          vague ideas of none, some, more, or all. It’s time to start putting values on the util-
          ity of our playing pieces.


          Part of our difficulty with Pascal’s Wager was the inability to quantify the inputs. In
          fact, it was even made worse by using dissimilar measures such as Heaven and Hell
          vs. lifestyle. How do those compare? Is cheating on my wife or coveting my neigh-
          bor’s ox worth even 20% of the value of the Heaven we may give up? Is telling the
          truth to my wife about how her new hairstyle makes her look like our neighbor’s ox
          worth more than 14% of Hell? These are calculations that are near impossible to
          make because the comparisons themselves are so vague. (We’ll give it a shot
          nonetheless a little later on.)
              However, by making sure all measures are of the same type, our calculation
          process gets much easier. For the moment, let’s follow a more intuitive approach by
          employing something that we use all the time as a universal quantifier—money.
               Money has value that is hard to argue with. Most of the time, the actual value
          of any given item of currency is emblazoned right on it for all to see. A $20 bill is
          worth… 20 dollars. After all, it says so. The value of that currency relative to other
          currencies may change—that’s what gives us fluctuating exchange rates. The value
          of a currency compared to what it can buy may even change—which gives us infla-
          tion and deflation. At that point all we are doing is comparing the value of that cur-
          rency with the value of something else.

          Value is different from utility, however. As we mentioned at the beginning of this
          chapter, utility represents the relative satisfaction that we get out of something.
          Satisfaction, in this case, can also represent more than just happiness or pleasure. It
          can represent the usefulness of an item in achieving a goal.
             In the case of Marci holding Marvin Gardens, the original value was represented
          by what she paid for it in the first place. As she held it, that value changed as
          people were willing to pay more and more for it. The reason for that shift is that
          Marvin Gardens had a utility over and beyond the face value of the property—other
          people needed it to improve their positions. For Marci, the utility of Marvin
122   Behavioral Mathematics for Game AI

      Gardens was expressed in terms of her not wanting to lose. She had little hope of
      matching it with the other properties to get a monopoly—but she also did not de-
      sire that. To her (and to the rest of us), the utility of Marvin Gardens had nothing
      to do with its value. It had to do with power (such as it was).
          When money is used as the example, many of us can relate to the difference be-
      tween value and utility. We all have likely gotten into a discussion over whether or
      not something was “worth the price” that was set for it. The price the parties were
      discussing—the one on the tag—was the same for both people. Therefore, the value
      that would be paid for the item would be the same. The difference is in what we
      expected to get for that amount. Some people would be willing to pay that price for
      an object and others would not. They may have a greater notion of utility for their
      money than other people.
          Even this example is complicated somewhat by necessarily including the opin-
      ions of value (and utility) that the people have for the item in question. If you were
      to remove any exchange from the process entirely, people’s relative utility for the
      same value of money becomes a bit more apparent. For example, if you were to lose
      20 dollars somewhere, would you try to find it? Ten dollars? Five? One? At what
      point is the utility of the money you lost so low that you no longer care?
          Even this question is a little knotty in that we must account for the time spent
      looking for the lost money. If I dropped a dollar bill at my feet, I would bend down
      to pick it up. If I was told I dropped a dollar back in the parking lot, I wouldn’t
      bother. While the value was still the same, my utility for a single dollar is not worth
      the time and effort to locate it.

      What’s It Worth to You?
      In a different scenario, I was once caught behind a woman in the grocery store.
      Apparently a small can of peas had been scanned incorrectly—or at least different
      from the price she claimed she saw on the shelf. We were stalled waiting for the
      helpful grocery clerk to go and ascertain the validity of what she saw. I was stand-
      ing behind her with a single item—mildly perturbed over having to wait. When I
      overheard that the difference between the scanned price and her claim was about 20
      cents, I reached into my pocket, pulled out 20 cents and placed it on the counter
      between the customer and the cashier saying “here, I’ll cover the difference.”
          The customer turned to me and (much to my amazement) explained with
      rather animated excitement that the store rules stated that if something was
      scanned incorrectly, you received it for free. It didn’t take me but a moment to
      glance at the register and see that the can of peas had rung up at a whopping
      78 cents. Knowing that I had plenty of change in my pocket, I reached back in and
      plopped what likely amounted to two entire cans of peas on the counter.
                                                       Chapter 7 The Concept of Utility    123

          “Here,” I said. “I don’t know how much value you put on five minutes of your time,
          but five minutes of my time is worth more than the price of a can of peas.”
              There was a silent moment during which she displayed a thoroughly bewil-
          dered look (which was worth more to me than a can of peas). Collecting herself
          (but still probably uncomprehending that her time could possibly be worth more
          than 20 cents), she managed a brief nod to the cashier who, by this point, was
          beaming worshipfully at me like I was the grocery store equivalent of Charles
          Bronson in Death Wish. The price check was cancelled, the lady paid, and I left the
          store 30 seconds later.
              What is the point of my reflective anecdote? While the value of the can of
          peas was the same 78 cents for both of us, the lady obviously had a higher utility for
          that 78 cents than did I. And the utility was the important factor in making the
          decision… not the value.

          As we have discussed thus far, goals and tools can often be expressed in terms of
          utility. This may or may not equate to their stated value. Earlier in this chapter we
          also discussed the act of taking risks (or not taking them) in terms of utility. This
          becomes useful to us when we combine and compare the utility of the risk with the
          utility of the outcome. When we do this, we arrive at an expression that may yield
          an inequality of varying magnitude. It is this inequality that allows us to determine
          whether or not a risky activity is “worth it.”
              For purposes of keeping to a simple example, it would be best if we stay within
          the realm of a single, measurable unit. For now, we can construct an example that
          uses money as the manner of measurement for both the risk and the reward. To
          that end, let’s begin with a scenario that is likely familiar to many of us.

          Warranties and Risk
          Imagine purchasing a computer for $1,400. With tiresome predictability, the sales-
          person offers you a warranty that offers free repair for your computer for a year at
          a cost of $300. (Yeah, I know… that’s a little steep.) The question, naturally, is
          whether or not you should purchase the warranty.
              Obviously, the decision as to whether or not to purchase a warranty is based
          largely on our expectation about whether the computer will need repair during the
          warranty period. (Repairs normally cost $200.) We are also going to consider
          the possibility that the machine could completely implode beyond repair and need
          to be replaced.
124   Behavioral Mathematics for Game AI

           Given our vast experience with technology, we could apply what we believe are
      reasonable probability figures on this. We believe the odds of the computer contin-
      uing to work for the entire year are 70%. (Again, give me a break… it’s a hypothet-
      ical.) We give it a 20% chance of needing a minor repair and a 10% chance of
      becoming a proverbial doorstop.
           By applying the percentages and the associated costs for the various events, we
      achieve the figures shown in Figure 7.4. No matter what happens during the dura-
      tion of the warranty, we are out the $300 we paid for it. If we don’t buy the warranty
      and nothing goes wrong, then we don’t lose anything. If the computer requires
      fixing, we have to pay $200 for the repair job. If something goes amiss and we must
      toss it on our ever-growing pile of defunct electronic equipment, we are out the
      entire $1,400 we paid for it. (May I suggest at this point that we don’t replace it with
      a computer of the same model?)

            FIGURE 7.4 The prices for the computer and the warranty can be combined
                with the expected percentages of needing repair or replacement to
               determine whether or not to purchase a warranty for a new computer.

          Similar to what we did with previous examples, we are going to attempt to
      calculate our expected utility for the prospective warranty. For a change, we now have
      solid numbers with which to work. As much as I don’t want to make this a formula-
      laden book, I want to walk through the mathematics of this particular example.
          For those of you not familiar with reading statistical and logical notation, I will
      explain a bit of terminology. First, we will define W as meaning the purchasing the
      warranty. The symbol “¬” means “not.” So ¬W would mean “not W”—or in this
      case, “not purchasing the warranty.”
          To calculate the relative benefits of each of those two paths, we need to calculate
      the expected utility (E) of each of either purchasing (E(W)) or not purchasing
      (E(¬W)) the warranty. This involves taking into account the cost of the three
      possible outcomes in a manner that also respects the likelihood of those outcomes
                                               Chapter 7 The Concept of Utility     125

In these examples, the math may seem somewhat redundant or excessive. For
example, when buying a warranty, in all three cases the cost is 300. Multiplying
300 by 0.7, 0.2, and 0.1 only to add them back together to arrive at 300 may
seem a little silly. The reason that I do this is to keep in mind that we do have
three possible outcomes that we may need to treat separately. While in this case
those three outcomes were the same (i.e., paying $300), in other examples that
we will use shortly, this will not be the case. Please bear with me until then.

    The expected values for the two actions are:

    In English, the expected utility of purchasing the warranty is to lose $300 over
the course of that year, which is the cost of the warranty no matter what happens.
    In the case of not purchasing the warranty:

    Again, in more readable terms, the expected utility of not purchasing the war-
ranty is a cost of $180 for the year. This reflects the 20% chance of needing a $200
repair and the 10% chance of replacing the $1,400 computer.
    Because E(¬W) > E(W), this means that the value of not purchasing the war-
ranty is greater than the value of purchasing it. However, the result of –180 leads us
to expect that we will have to pay about $180 on repairs over the course of the year.

Different Inputs, Different Outputs
The example above is not a proof that all warranties are rip-offs. Given that we per-
formed a mathematical analysis of the problem, our solution is only as good as the
numbers we used. Some of the figures were established in a concrete fashion. For
example, the price of the computer and the price of the warranty are values that we
can point to as being known quantities. Other figures in the example were estimates.
We would have to estimate the likelihood that the computer we are purchasing is
going to need repair work or be cast into the bowels of hell. The question then
arises: What happens when those estimates change?
126   Behavioral Mathematics for Game AI

           The happy part is that, once established, these formulas can be applied to any
      values we want to use. For example, simply changing the percentages of computer
      failure will change the results significantly.
           Figure 7.5 shows a slightly different scenario. In this case, the computer is a
      complete piece of garbage and only has a 50% chance of surviving that first year. It
      will need a repair 30% of the time (remember, those cost you $200 without the war-
      ranty), and 20% of the time it will need to be replaced. By changing those figures,
      we get the following:

          In this case, we find that E(W) > E(¬W). Therefore, the expected value of
      buying the warranty is greater than the expected value of not purchasing it. (Has
      anyone yet considered not purchasing this computer at all?)
           Another scenario, shown in Figure 7.6, takes into account the possibility that
      the warranty is simply over-priced. Let’s return to the original assumptions about
      failure rates (20% = needs repair, 10% = boat anchor). However, we are going to
      reduce the price of the warranty from $300 to $150. (Apparently there’s a half-price
      sale on warranties for junky computers.)

        FIGURE 7.5 By increasing the percentage chances that the computer requires repair
          or replacement, the expected utility of purchasing the warranty increases as well.
                                              Chapter 7 The Concept of Utility     127

  FIGURE 7.6 By reducing the price of the warranty, the expected utility of purchasing
         the warranty is better than what we can expect by not purchasing it.

    The math works out to:

     Now, with our half-price warranty in hand, the expected value of the whole
shebang is a negative $150. However, as we saw before, not purchasing the warranty
will cost us about $180. This time, despite the fact that the computer is just as reli-
able as it was in the first example in Figure 7.6 (for whatever that’s worth), we see
that E(W) > E(¬W). The difference in the price of the warranty made it more
worthwhile to purchase. More accurately, it made it less of a drain than the poten-
tial of the computer breaking and us not having the warranty.

Generalizing the Formulas
Remember, however, that we are assuming that the failure rates are static and ac-
curate. We have only “solved” the situation for those figures. It would behoove us
to formalize the formula (so to speak) so that we can apply it to all combinations of
parameters just by plugging in the numbers.
    The above equations show that this is actually a simple process. Once we have
the different variables that we need laid out in the proper arrangement, calculating
the two utility values is rather straightforward.
128   Behavioral Mathematics for Game AI

          Given the following variables:

          c:   The cost of the computer
          x: The cost of the warranty
          f:   The likelihood of complete failure
          b: The likelihood of a breakdown needing a repair
          r:   The cost of a repair

         Our formulas for the estimated utility of both purchasing a warranty and not
      purchasing one are as follows:

         If we wanted to decide whether we should purchase or not, we would then just
      need to determine whether E(W) > E(¬W).

      P UTTING I T   IN   C ODE

      Of course, to get this calculation into a game-type environment, we will need to
      arrange it in terms of code. Once we have arranged our formulas in such a way that
      we are solving for the utility values that we need, this is actually a rather straight-
      forward process.
           When we were solving this problem “by hand” above, we had two formulas and
      a comparison of their results. Similarly, we will lay out two functions—one for
      calculating the utility of the warranty and one for calculating the utility of not
      purchasing the warranty. A third function will compare them and return a Boolean
      as to whether or not we should purchase the warranty with the given numbers.
          For purposes of simplicity, we are assuming that the appropriate numbers are
      calculated elsewhere and stored as member variables in our class. Note that the
      function WorksOdds() simply returns the percentage chance that the computer
      works based on what is already set by mRepairOdds and mReplaceOdds.
          float MyGame::UtilityWarranty()


               // Returns the utility value of purchasing the warranty.
                                     Chapter 7 The Concept of Utility   129

     float Utility;

     Utility =

         ( WorksOdds() * -mWarrantyPrice )

         + ( mRepairOdds * -mWarrantyPrice )

         + ( mReplaceOdds * -mWarrantyPrice );

     return Utility;


float MyGame::UtilityNotWarranty()


     // Returns the utility value of not purchasing the warranty.

     float Utility;

     Utility =

         ( mRepairOdds * -mRepairCost )

         + ( mReplaceOdds * -mComputerCost );

     return Utility;


bool MyGame::PurchaseWarranty()


     // Returns the decision on whether or not to purchase the

     // warranty based on the relative utility values returned from

     // the two functions.

     if ( UtilityWarranty() > UtilityNoWarranty() ) {

         return true;

     } else {

         return false;


130   Behavioral Mathematics for Game AI

          Given the above functions, we could set the appropriate values and call the
      function PurchaseWarranty() to get our answer at any time. In this case, we are
      using the values from the original example.
            // Set the appropriate values

            mComputerPrice = 1400;

            mWarrantyPrice = 300;

            mRepairCost = 200;

            mRepairOdds = 0.2f;

            mReplaceOdds = 0.1f;

            bool ShouldWeGetWarranty;

            // So, should we purchase the warranty

            // with the values set above?

            ShouldWeGetWarranty = PurchaseWarranty();

      Relative Utility
      As simple and comprehensive as the above solution seems to be, there is another
      factor we have yet to consider with our warranty purchase. We have left out the no-
      tion of the utility that an individual may place on a working computer. How much
      is it worth to us to avoid downtime on a computer? How much is that computer an
      integral part our lives? If it breaks and we cannot fix it, are we in dire need?
          I know that if my computer were to break, my entire life would be on hold until
      such time as it was repaired or replaced. A warranty is a necessity for me. (In fact, I
      happen to have next day, on-site with accidental damage coverage… I can get my
      computer repaired or replaced faster than I can get in to see my doctor.) On the
      other hand, my mother would probably not even realize something was wrong
      with her computer for months. She can spend that money elsewhere on things that
      she can actually get usage out of.
          These sorts of ruminations get a little beyond the scope of this chapter. We will
      attempt to approach this level of depth later on in the book.

       IN   THE   G AME   Protecting the Barracks

      While the above scenario is a familiar one to us in real life, I am not aware of too
      many games where the AI agents purchase a computer, much less have to decide
      whether or not to purchase a warranty for it. However, by using a little creativity it
      is not hard to envision a scenario in which the concept is easily applied.
                                              Chapter 7 The Concept of Utility     131

    In our computer plus warranty example, the values we used were either based
on the cost of something (i.e., money) or the percentage chance that something
would happen. By limiting the value references to money, we don’t have to concern
ourselves with converting between two different types of values. For the sake of
continuity, we will stick with that formula here to best mimic the warranty example.
     It is a staple of real-time strategy (RTS) games to construct buildings of some
sort for whatever reason. These buildings cost money to produce and are, therefore,
a valuable commodity much like the computer. Once we have spent the money for
the building—let’s just say it is a barracks—we would rather not lose the building
by having it damaged or destroyed by the enemy. Put simply, we want to keep our
investment intact. To do so, it may behoove us to protect our investment against
damage or destruction.
    One common method of protecting buildings in a game such as this is to build
defensive structures such as towers. A tower could repel invaders that would seek to
damage our building. Of course, towers also cost money to build. Spending money on
a tower is a big decision because the money could be spent elsewhere, for example,
on more military or on other, more productive buildings. So, what sorts of factors
must go into our decision to build an accompanying tower hovering over our
beloved barracks?
    Certainly, we must have a value associated with the construction of our barracks.
How much did it cost us? We can assume that a total destruction of the barracks
would require a replacement cost that is equal to the original construction cost. We
can also assume that there is a cost of repairing any damage that is done to the bar-
racks. Therefore, we should be able to assign a cost of repairing the damage done to
the barracks with or without the tower present. Of course, for all of this benefit that
the tower provides, we need to know what the cost of building a tower is.
    Once the values of building and repair are established, we also need to know
what the likelihood of the barracks being attacked is. We will assume that the bar-
racks could be attacked by a small force, a large force, or not at all. If the tower is
present, the difference between the two sizes of attacking forces is reflected in the
amount of damage done. If there is no tower, a large force will destroy the barracks,
whereas a small force will only damage it heavily before they are driven off.
    We must note that this is different from our warranty example in that, with a
warranty in place, we paid nothing for a repair or replacement. The only cost was
that of the warranty itself. In this case, some of the entries represent the cost of the
tower plus the cost of damage done despite the presence of the tower.
132   Behavioral Mathematics for Game AI

          Laying out some hypothetical costs:
          Building a Barracks                                         500
          Building a Tower                                            150
          Repair Small Damage with Tower Present                      100
          Repair Small Damage without Tower Present                   250
          Repair Large Damage with Tower Present                      250
          Repair Large Damage without Tower Present                   500 (new Barracks)

          And some hypothetical probabilities of attack:
          No Attack:                      30%
          Small Attack:                   50%
          Large Attack:                   20%

           If we lay out the possibilities as we did in the case of the computer warranty
      scenario (Figure 7.7), we arrive at two utility values. Note that they are negative util-
      ities because, in this case, they represent costs to us. With that in mind, we are look-
      ing for the highest utility, that is, the least cost. Using the figures above, we find that
      it would be better for us not to build a tower.

               FIGURE 7.7 In this situation, we are not assuming a great probability
              of receiving a large attack. Therefore, when we figure the cost of building
                  a tower plus the benefit that it provides, the utility is less than the
                  utility of not having one (and rebuilding the barracks if necessary).

          For the sake of completeness, let’s lay out the formulas themselves. We are now
      solving for E(T), the estimated utility of building a tower.
                                            Chapter 7 The Concept of Utility    133

    Compare this to the estimated utility of not building a tower.

    Again, we find that E(¬T) > E(T). Therefore, the estimated utility of not build-
ing the tower is greater than that of building it.
     Many factors go into this determination, so the decision is rather tenuous.
Certainly, the building costs of the barracks and the tower have a lot to do with the
determination. However, one of the major components is our assumptions about
the likelihood of attack. In this example, we are using what is probably a rather
naïve assumption that there is a 30% chance that our barracks won’t be attacked at
all. Likewise, we are assuming only a 20% chance that it will suffer a large attack.
This may be the case if the barracks is well back behind our lines deep in our city.
In that case, by simply using some nonmathematical reflection, we could agree that
not having a tower over the barracks would be justified.
    However, if we were to change these assumptions about the susceptibility to at-
tack—perhaps by assuming the barracks would be along the front lines—we would
expect the utility of an accompanying tower to shift as well. For instance, let’s use
the same costs for the buildings and the damage estimates but change the attack
probability figures to something that is more in line with a typical “in the thick of
things” RTS scenario.
    No Attack:                   5%
    Small Attack:                60%
    Large Attack:                 35%

    As we can see by the updated figures in Figure 7.8, by simply changing the ex-
pected percentages of attack, the utility functions have swung significantly in favor
of building a tower. The equations themselves would now be as follows:
134   Behavioral Mathematics for Game AI

          FIGURE 7.8 In this situation, we are assuming a greater probability of receiving
              an attack. Now, the benefits provided by the tower in case of attack are
               more prevalent—making the utility cost for building one much greater.

          Compare this to the estimated utility of not building a tower.

          Now, we find that E(T) > E(¬T). The utility of building (–297.5) is greater than
      the utility we achieve by not building (–325). Based on the numbers provided, our
      decision should now be to build the tower.

      P UTTING I T   IN   C ODE

      Just as we did with the computer warranty example before, the act of converting
      this example to code is relatively simple. In fact, the functions and flow look strik-
      ingly familiar. Again, we have one function for the utility of building a tower, one
      for the utility of not building a tower, and a third that does a simple comparison of
      those utilities to advise us. The numbers used for the building costs, the repair
      costs, and the odds of attack can be set elsewhere in the game (which is important,
      as we shall see in a moment).
          float MyGame::UtilityTower()


              // Returns the utility value of purchasing the tower.

              float Utility;
                                    Chapter 7 The Concept of Utility   135

     Utility = ( OddsNoAttack() * -mTowerCost)

         + ( mOddsSmallAttack * -( mTowerCost + mSmallDamageTower ) )

         + ( mOddsLargeAttack * -( mTowerCost + mLargeDamageTower ) );

     return Utility;


float MyGame::UtilityNoTower()


     // Returns the utility value of not purchasing the tower.

     float Utility;

     Utility = ( mOddsSmallAttack * -mSmallDamageNoTower )

            + ( mOddsLargeAttack * -mBarracksCost );

     return Utility;


bool MyGame::BuildTower()


     // Returns the decision on whether or not to purchase the

     // tower based on the relative utility values returned from

     // the two functions.

     if ( UtilityTower() > UtilityNoTower() ) {

         return true;

     } else {

         return false;


136   Behavioral Mathematics for Game AI

          At any point in our game, we could call the BuildTower() function, and it
      would process whatever values we had in place for the necessary variables and yield
      a Boolean value as to whether the expected utility of building the tower was greater
      than the expected utility of not building it… that is, is it worth it to build a tower
      over our barracks or not?
          Again, using the values from the original example:
          // Set the appropriate values

          mBarracksPrice = 500;

          mTowerPrice = 150;

          mSmallDamageTower = 100;

          mSmallDamageNoTower = 250;

          mLargeDamageTower = 250;

          mOddsLargeAttack = 0.2f;

          mOddsSmallAttack = 0.5f;

          The code to make the decision would simply look like:

          bool ShouldWeBuildTower;

          // So, should we build the tower

          // given the values set above?

          ShouldWeBuildTower = BuildTower();

      When Changing Changes the Changers
      There is one thing that we have not yet considered in this calculation that may be
      of interest in our utility functions. It is possible (if not likely) that simply building
      a tower next to the barracks may change the probability that it will be attacked at
      all. An undefended building is an almost irresistible target for even small forces to
      take on. On the other hand, the presence of the defensive structure provides a
      deterrent—especially to those lighter squads. To reflect this, we may want to work
      with two layers of probability of attack—one that reflects an undefended barracks
      and one that takes into account the fact that enemies may more carefully consider
      attacking a building with a tower hovering nearby. Needless to say, the complexity
      of our calculations goes up significantly at this point.
           To see this effect in action, we need to define two different sets of expectations
      for the likelihood of attack based on the presence of the tower. First, let’s assume
      that without a tower present, we can expect:
                                              Chapter 7 The Concept of Utility     137

    No Attack:                     10%
    Small Attack:                  50%
    Large Attack:                  40%

     Next, let’s assume that with a tower present, the probabilities change to the
    No Attack:                     50%
    Small Attack:                  10%
    Large Attack:                  40%

    The differences in the above figures represent what would likely be reluctance
on the part of the enemy to attack a protected barracks with a small force—perhaps
electing instead to only attack when it can commit the larger force that would be
necessary. The resulting figures are shown in Figure 7.9.

   FIGURE 7.9 By including the idea that the presence of the tower may discourage
  attack, we must change the probability numbers—which, in turn, change the way the
                               utility costs are applied.

     Note that to visualize this new wrinkle, we need to change the way we are laying
out our table. Before, we were putting the likelihood of attack at the top of each col-
umn. Now, these figures change based on the row as well. For clarity, I have broken
the table into two parts—one for building the tower and one for not building it.
     If we calculate the utility of not building the tower (E(¬T)) using the probabil-
ities that we have in place for building it (i.e., 50%, 10%, 40%), the resulting utility
is –225. The utility of building the tower (E(T)) is –260. E(¬T) > E(T). Therefore,
the utility of not building it is greater—that is, it would not be worth it to build.
     However, once we adjust for our “deterrent factor,” the utility of not building
the tower (E(¬T)) becomes –325. That means E(T) > E(¬T). Without changing any
138   Behavioral Mathematics for Game AI

      of the figures, including the cost of building the tower itself, we find that the deter-
      rent factor alone is enough to justify building the tower.
           Of course, a major disclaimer over all of this is that the numbers we have se-
      lected for the likelihood of attack may not be accurate. A change of even 5% on one
      of those figures could make a significant change in the decision about whether or
      not a tower is worth building. Additionally, we may want to vary what our esti-
      mates are of the damage that would result from the differently sized forces. What’s
      more, we may have more than those two sets of attack probabilities that we want to
      use. For example, we may want to have different assumptions based on where the
      barracks is in relation to enemy territory or other buildings that we have already
      constructed. We may even want to create a separate function that determines those
      factors for us in a different part of the AI code.

      P UTTING I T   IN   C ODE

      Thankfully, if we wanted to include this very specific decision into our game, we
      could package everything we have done into a few small functions that simply re-
      turn whether or not, based on our estimates, building a tower to cover our barracks
      has a utility value that makes it worthwhile.
          In this case, rather than holding the odds of there being a large or small attack
      in external variables that we access from the utility functions, we actually pass the
      variables into those utility functions. That way, we can send it whatever we want. Of
      course, that means our base function BuildTower() needs to have all the values
      passed in as well so it can distribute those values appropriately. Therefore, we will
      change BuildTower() to ask for the four odds values that represent the likelihood
      of a small or large attack depending on the presence of the tower. (Note that the
      odds of no attack is simply the percentage left over after the odds of the small and
      large attacks; you will see the function call for this in the code below but not the
      very simple function itself.)
          float MyGame::UtilityTower( float OddsLargeAttack, float OddsSmallAttack )


               // Returns the utility value of purchasing the tower using

               // alternate attack odds

               float Utility;

               float NoAttack = OddsNoAttack(OddsLargeAttack, OddsSmallAttack);
                                    Chapter 7 The Concept of Utility   139

     Utility = ( NoAttack * -mTowerCost )

             + ( OddsSmallAttack * -( mTowerCost + mSmallDamageTower ) )

             + ( OddsLargeAttack * -( mTowerCost + mLargeDamageTower )

     return Utility;


float MyGame::UtilityNoTower( float OddsLargeAttack, float
OddsSmallAttack )


     // Returns the utility value of not purchasing the tower using

     // alternate attack odds

     float Utility;

     Utility = ( OddsSmallAttack * -mSmallDamageNoTower )

            + ( OddsLargeAttack * -mBarracksCost );

     return Utility;


bool MyGame::BuildTower( float OddsLargeAttackTower,

                          float OddsLargeAttackNoTower,

                          float OddsSmallAttackTower,

                          float OddsSmallAttackNoTower )


     // Returns the decision on whether or not to purchase the

     // tower based on the relative utility values returned from

     // the two functions - including the different odds of attacks

     // on the barracks based on the presence of the tower.

     float thisUtilityTower;

     float thisUtilityNoTower;
140    Behavioral Mathematics for Game AI

                  thisUtilityTower =

                        UtilityTower( OddsLargeAttackTower, OddsSmallAttackTower );

                  thisUtilityNoTower =

                        UtilityNoTower( OddsLargeAttackNoTower, OddsSmallAttackNoTower );

                  if ( thisUtilityTower > thisUtilityNoTower ) {

                        return true;

                  } else {

                        return false;



            As before, this could be called from code at any point that we needed to make
       a determination as to whether or not we should build a tower over our barracks. We
       are still ignoring where we are getting the numbers for the attack odds—assuming
       that they are being set elsewhere.


       Money certainly is an obvious and reasonably measurable commodity to which we
       can ascribe value and utility. Even items on which we can hang a price tag, such as
       a computer of questionable quality, can be converted to be expressed in terms of
       the value of money.
           Another commodity, however, is important to consider when discussing the
       concept of utility—especially as it corresponds to decision theory. The idea of time
       as having measurable utility is not a foreign concept. After all, we often utter mus-
       ings to ourselves or to others along the lines of “if I only had the time to spend,”
       “but I don’t want to waste the time,” “it’s not worth my time,” and “could you buy
       me some time?” Notice that the emphasized words are all ones that can be used in
       similar sentences regarding money. Just as we can say we spend money and waste
       money, that our money is worth something, and we can even buy it, we have inter-
       nalized a measurement of time as being a measurable quantity against which we can
       judge other items.
                                                     Chapter 7 The Concept of Utility       141

           Even the direct comparison of time to money is something that is a given in our
       mental approaches to life. We often get paid a specific rate of money per hour. We
       rent cars, apartments, and carpet steam-cleaners in dollars per day. Even without
       being explicitly addressed, people will make the comparison. For example, paying
       $50 for a video game that has only two hours worth of gameplay content and zero
       replayability would likely make a few people grumble. If they took the time to think
       about it, they have paid $25 per hour for the privilege of playing the game.
       However, a role-playing game (RPG) that managed to offer up 50 hours’ worth of
       gameplay drops the rate to $1 per hour… a much more respectable number.
           Notice that the above rates are all expressed in the form of the amount of money
       spent per a fixed amount of time. It would seem that, when the two forms of mea-
       surement are in the same formula, we naturally tend to think in terms of the money
       spent. Is this simply a result of the fact that our society thinks in terms of money as
       the universal measuring stick? Perhaps not.

       Even when dealing with issues of production, for example, the amount of time
       seems to get reduced to the role of denominator in the ratio. For example, we may
       think of how many widgets we can produce in a fixed amount of time. Chuck can
       churn out six widgets per hour, whereas Ralph can only produce five per hour
       (Figure 7.10). It would seem that Chuck is more of a widget-meister than is Ralph.
       It is perfectly viable to think the other way around, of course. We could have said
       that it takes Chuck 10 minutes to produce a single widget, but it takes Ralph 12 to
       do so. The math works out the same, but it seems strangely uncomfortable. We feel
       almost drawn to do the mental flip-flop to convert it once again to “widgets per
       hour.” Why is that?

              FIGURE 7.10 Depending on what problem we are trying to solve, production
                 over time can also be expressed in terms of time per unit of production.
142   Behavioral Mathematics for Game AI

           Often, it comes down to what we are measuring. What is important? Of course,
      one thought process is that we are measuring widgets, not time. Therefore the wid-
      get measurement should be in the position of importance. In the above example, the
      reason for this way of thinking may be that it seems more intuitive to think in terms
      of “widgets per hour” because we are expecting that lying around nearby is a mea-
      surement of how much we are paying our widget-making pair. After all, if we are
      paying them the same amount (per hour), and all widgets have the same value, then
      it isn’t a stretch to determine that Chuck is more productive (per hour) than is
      Ralph. Notice how we didn’t even bother to compare how much we are paying Chuck
      or Ralph per widget? It’s always in terms of widgets per hour and dollars per hour.
          Even when money is not involved, time seems to get the back seat. Giving an-
      other, more personal, example… When I entered into the agreement to write this
      book, I gave my publisher a wild guess as to an expected number of pages (at the
      time of this writing, we have penciled in a figure of 500). In turn, we also agreed that
      I would have the book completed within six months. Only after that point did I
      come to the startling realization that I would have to write about 20 pages per week.
      I even subdivided it further to about three pages per day. Notice that my thought
      process was in pages over time. At no time did I bother to think in terms of how
      long it would take me to write a single page. Why is that?
           Again, it comes back to what is important. In the case of writing the book, I was
      not calculating how many pages I could write per day. I was determining how many
      I needed to write per day. There’s a subtle difference there. In my case, I realized that
      I needed to work the book writing around the rest of my life. I could tell myself, “I
      need to manage to work in my three pages today.” However, while this approach
      makes for a great mental bookmark, it doesn’t do me much good for making deci-
      sions. For that, I do need to start to calculate things in terms where time is the
      important factor.

      Putting Time at the Top
      For instance, if I were to try to decide what my plans for the evening were on any
      given night, I would look at the available time. Let’s say that after getting home
      from my “day job” of doing AI consulting, I determined that I had five hours into
      which to work all of my evening routine. Dinner would take one hour to cook and
      eat. Spending some time with the wife and kids would take another hour. That
      brings me down to three hours available for that evening. The question then arises,
      “Can I afford to watch a one-hour TV show and still get my three pages in?”
          The only way I can answer this question is to know how much time it would
      take me to write those three pages—which is directly related to the value of time per
      page. I am able to assign time expenditures to dinner, time with the family, and
      even to the TV show that I am pondering. The only thing I don’t know is the time
                                              Chapter 7 The Concept of Utility     143

value to assign to the three pages of writing. The rest of this decision exercise is not
terribly relevant (let’s just say I’m spending as much time as I can this weekend try-
ing to catch up on the writing that I didn’t get done during the week). The point is,
however, that there are two separate measurements in play.
     Originally, I had calculated how many pages per day I would need to write to
finish the book on time (three). That differs significantly, however, from the calcu-
lation of how long I would have to leave in each day to accomplish the three-page
goal. In that case, the value and utility of time as the primary measuring device
becomes important. “Can I spend time watching the TV show and still have enough
time to spend writing my book?” Notice the emphasis on spending time. How im-
portant is that TV show to me that I would spend time on it—perhaps in exchange
for something that is not as important to me later in the week so that I can make up
the time on my book?

 IN   THE   G AME   Settlers and Warriors

Similar sorts of gymnastics are often necessary when doing calculations for games.
At times, it is not enough to know how long it takes to create something. You need
to know how many somethings you can create in a particular amount of time.
     Imagine that we are calculating a build order in a turn-based strategy (TBS)
game. (We could do the same with an RTS game, but thinking in terms of “turns”
instead of seconds or minutes is easier for this example.) In our game, we know that
it takes three turns for a city to create a warrior. We also would like to use the
production of these cities to create a new settler unit so we can expand our empire
—as soon as possible. The settler takes five turns to create. Additionally, creating
the settler unit is useless unless we can send a warrior along to protect it.
    We also know that we could expect an attack in as few as 20 turns. We want to
be able to defend our city with a minimum number of four warriors. The four war-
riors must be built and in place on that twentieth turn when the attacks could begin
to occur.
    One approach would be to build the four warrior units first and, once the city
is suitably defended, proceed with building the settler and the escort warrior.
However, because expanding our empire is a high priority, we want to begin that as
soon as possible. The question is, can we afford to build the settler unit and its ac-
companying escort warrior right away, send them off in search of the promised
land, and then begin cranking out our defenses? Will we have the four defensive
warriors in place by the time the twentieth turn arrives?
144   Behavioral Mathematics for Game AI

          Now, most of us will recognize this exercise as not much more than the word
      problems that were so maligned back when we were in math classes in school. (I
      never understood why people griped about the word problems so much. I kinda
      liked them! But then again, here I am writing a book on the mathematics of game
      AI.) Just like those word problems, the trick is putting things into their proper
      frame of reference.
          For instance, we know by the rules of the game that a warrior takes three turns
      to produce. We also know that it takes five turns to produce our settler unit. For the
      sake of visualization, let’s assign those to variables.

          Time to produce a warrior:       TW = 3
          Time to produce a settler:       TS = 5

          If we were to produce our settler and warrior exploration team, it would take
      us eight turns:

           Subtracting that from the 20 turns during which we know we are going to be
      safe from attack leaves us with 12 turns. If we were to take the remaining 12 turns
      and build nothing but warriors, could we make our quota of four to have a suitable
      defense when the attacks come? I don’t mean to be pedantic about this. I know we
      have all done this in our heads about three paragraphs ago. I’m merely trying to
      express a point… bear with me.
          To do that, we need to express the equation not in terms of turns per warrior,
      but in terms of warriors per turn. Knowing that it takes three turns to produce a
      warrior, we know that we can produce one-third of a warrior per turn. Twelve
      turns of producing one-third of a warrior each gives us the following:

          So, it would seem that we will be able to complete our four warriors in time for
      that magical twentieth turn deadline.
                                                          Chapter 7 The Concept of Utility       145

           Certainly, we could have done the math the other way around—dividing the
       remaining 12 turns by 3 turns each—and arrived at the same result. The point was
       the mentality shift involved in putting things in terms of how much can we do per
       time period rather than in terms of how much time per action. This is important be-
       cause it helps us frame time as a valid and useful component of utility rather than
       simply a measurement of utility. That is, we are “spending time” doing something
       rather than doing something “over a period of time.”
            When viewed as a whole process, building five soldiers and a settler takes 20
       turns no matter which way we do it (Figure 7.11). However, we know that the set-
       tler unit has another job to do—that of starting a city. When he starts the city, the
       city will grow over time. The longer the time the city is established, the more it will
       grow. Therefore, time has a utility with regard to cities. If we want to maximize this
       utility, we want to maximize the time the city has to grow, which means that we
       want to start the city as soon as possible—that is, in the least amount of time.
       Therefore, the utility of time that is so important to the city is transferred to the set-
       tler itself. Any time saved in the process of building the settler has a valuable utility
       in our overall process.

            FIGURE 7.11 In terms of the number of each type of units produced, the end result
           of each build order is the same. If the settler has another job to do (presumably starting
                a new city), the time saved by building it with its escort first has a utility value.

       Another very important aspect to take into account when designing game AI is the
       relative value of time when attached to travel distance. If we had an agent deciding
       between two available cover points (see Figure 7.12), one consideration would be
       how far away those cover points are from the current location. All other things
146   Behavioral Mathematics for Game AI

      being equal, the closer cover point is a more attractive option. The further away
      the cover point is, the longer it would take to get there and, we would assume, the
      longer the agent is exposed to the threat from which he is trying to get cover.

              FIGURE 7.12 A common consideration for the utility of time is when it is
             calculated as a result of a traveled distance. In this case, the longer distance
                 to cover B would require being exposed for a longer period of time.

          This is hardly a surprise. We do calculations like this all the time. In fact, when
      constructing an AI that would take this into account, we likely wouldn’t bother
      with converting the distance to time. It would be easier to simply sort the cover
      points by distance and pick the nearest one. As we discussed above, this is relegat-
      ing time to the back seat once again.
           Other considerations could be in play that may make us start to think about the
      value of time, however. For instance, if the path to cover point A is tougher terrain
      that would cause you to move slower, now we are thinking in terms of time. The
      distances may be the same, but it takes a longer time… that is, we are spending our
      time in traveling the rougher terrain. If we were using a pathfinding algorithm that
      incorporated terrain types or another such method of representing traversal time,
      this is a calculation that can be performed through those numbers. Regardless, the
      distance-time relationship is still somewhat interlinked.
          There are times, however, when the time required to travel a distance would
      have to be compared to something less closely related. In the above example of
      cover points, the assumption was that all cover points are created equal. However,
      game agents are often presented with two objectives that are not only not the same
      thing (i.e., cover = cover) but are actually two different goals entirely (e.g., “raze this
      building” or “destroy that army”). When this occurs, the time involved must be
      included in whatever decision we are making regarding those goals.
                                              Chapter 7 The Concept of Utility       147

     If we were in a situation where the distance to each goal was the same (Figure
7.13), then the time factor would not be important (assuming there are no other
factors to consider). We could make the decision based entirely on the relative
values (V) of the two goals. Likewise, if the values of the goals were the same, such
as in our cover point example, then we could rely solely on the distance compari-
son to make our decision. The situation is slightly more complicated when the
distances and the values of the goals are different.

       FIGURE 7.13 If the distances to the goals are the same, the values of the
           goals would be the deciding factor. On the other hand, if the values
      of the goals are the same, the distances to them may be the deciding factor.

Unequal Times to Unequal Goals
Let’s assume that our agent is deliberating between accomplishing goal A and goal
B (Figure 7.14). Each of those goals has a value (V) that we have calculated with
other criteria. In addition to this, let’s assume that the agent is currently closer to
goal A. We now have a number of factors to consider. First, how much closer are
we to goal A than to goal B? Second, what is the difference in importance of accom-
plishing goal A vs. goal B? Third, and perhaps most evasively, what is the relative
importance of time compared to the values of the goals themselves?
      Let’s deal with the distance question first. For purposes of this argument, let’s
assume there are no terrain issues, so that distance can be directly related to time.
It is not a difficult proposition to determine which distance is greater. The value we
will need to determine, however, is how much different the distances are. After all,
if the difference is negligible, there isn’t much to lose by tackling the further goal.
Just for the sake of putting it out there, however, let’s show the formula.
148   Behavioral Mathematics for Game AI

          We are simply calculating the difference (Tn) in the time it would take to reach
      goals A and B (Ta and Tb, respectively). For purposes of this example, we are assum-
      ing that we had already determined that B was the most distant—this way I don’t
      have to clutter things up with absolute values and so on.

       FIGURE 7.14 In this example, goal A is closer than goal B. We must consider both the
            values of the relevant goals as well as the time it takes to get to the goals.

          At this point, Tn is simply a linear difference. We have only determined what
      the difference between them is numerically. For example, if Ta = 10 and Tb = 25,
      then Tn would be 15. While that is valid mathematically in that it allows us to com-
      pare the two values, it doesn’t tell us as much about the relationship between them.
      For instance, if the distances to A and B were 1,000 and 1,015, respectively, Tn
      would still be 15. At that point, the difference of 15 is not terribly relevant.
      Obviously, that difference of 15 doesn’t mean as much as the difference of 15 when
      the values were 10 and 25.
           The solution, as we shall see throughout this section and indeed throughout
      this book, is to normalize the result against one of the values. We need to determine
      not what the linear difference is, but rather the relative difference. In other words,
      we want to express one of the values in terms of the other one. In this case, let’s
      specifically compare all time values to that of the shortest time, Tn. Therefore,

          If we insert the values Ta = 10 and Tb = 25, we find that Tn = 2.5. That is, trav-
      eling to goal B (Tb) is going to cost us two and a half times as much as we would
      spend getting to goal A (Ta).
                                              Chapter 7 The Concept of Utility     149

    All of this is fairly simple so far… no surprises yet. In fact, we can do much the
same calculation with regard to the value of the goals themselves. Without getting
too deep into what the goals are and how we arrived at the values (we’ll tackle this
problem later on in the book), let’s assume that the value of goal A (Va) is 50 and
the value of goal B (Vb) is 75. Performing the same sort of calculation as we did with
the time expenditures, we could determine that the difference in the values of the
goals is:

     In English, the value of achieving goal B is one and a half times more valuable
than what we secure by achieving goal A. Putting numbers like this into the form
of relative difference helps us put things into perspective when comparing unlike
types of values. For instance, if the difference in the value of the goals was 50%
(such as above), but the difference in the time was only 1%, then the time difference
isn’t nearly as significant as the goal value difference.
     All of that is well and good, but the question still comes up: Which goal should
we pursue? Goal A is closer but not worth as much. Goal B is farther away but
worth more. The only way we can solve this equation is to find a way of equating
the value of time and the values of the goals themselves. We need to place some sort
of algorithmic translation between the goal values and the time values. To do this,
we need to have an idea of what their relationship is. The problem is very context
specific, of course. The values would depend largely on the game design. Some ex-
amples might be:

    Destroying an army before it can attack
    Destroying a building before it can be completed
    Destroying a building before it can finish producing something
    Arriving at a defense location prior to the arrival of opposing forces

    Of course, there may not be a particular reason that it is important to even pay
attention to the time factor. However, if there is no time pressure involved in com-
pleting either goal A or goal B, and if we know that we could do both if we want
to—and only need to select which order in which to do them—this problem is no
longer a big deal. Naturally, the course of action would be to accomplish goal A first
since it is nearby and then proceed to goal B when we are finished with A (Figure 7.15).
150   Behavioral Mathematics for Game AI

                      FIGURE 7.15 If the respective values of goal A and goal B are equal,
                     there are no time restraints and no other considerations, then it is more
                           efficient to complete goal A first and then proceed to goal B.

           Regardless, if there is no detriment to putting off goal B (or goal A for that mat-
      ter) until we can get around to it, there is really nothing to consider. In fact, what
      is to stop us from taking the scenic route and going all the way over to goal B first
      and then backtracking to goal A? Of course, if there is no time limit, then the value
      of the time spent getting there is not really relevant. So let’s make it relevant…

      To make this problem far more interesting, let’s assume that the values of accom-
      plishing goals A and B will diminish over time. For simplicity’s sake, let’s assume
      that the rate of decay (r) for the goal values is the same for both A and B. This
      change causes an inherent utility in the time spent (or saved) in traveling to goals
      A and B. We can no longer take a leisurely approach. Again, this puts us back in the
      mindset of taking the shortest route through the two goals. There is one caveat,
      however… what if goal B was so important that it couldn’t wait? Is there a point
      where accomplishing goal B quickly is urgent enough that we would postpone
      accomplishing goal A until later despite the fact that we are relatively close to it?
      Let’s construct our formula and test a couple of cases.
               The components that we need to consider are:

               Va        The value of goal A
               Vb        The value of goal B
               Vab       The total value of goals A and B
               Ta        The time to move to goal A
               Tb        The time to move to goal B
                                                 Chapter 7 The Concept of Utility   151

Tab         The time to move between A and B
r           The rate that the values of goals A and B decay

    The entire formula that we need for calculating the value of going to A first and
then to B would be:

      Let’s restate that in English (left to right):

       1. The total value of accomplishing A and B—in that order (Vab)…
       2. is the value of A (Va)…
       3. after reducing it by the decay over the travel time (1–(r×Ta)) …
       4. plus the value of B (Vb)…
       5. after reducing it by the decay over the travel time to A and from A to B

    In step 5, it is important to note that we have to include the time it took to get
to A and then from A to B. After all, the value of B was decaying that whole time.
That’s why we have to multiply the decay rate by (Ta+Tab).
    If we want to calculate the value of doing things via the reverse path (Vba), which
we need to do to compare them, we simply flip the As and Bs accordingly. (We can
assume for now that the distance from A to B is the same as from B to A.)

    To decide which route we should take, all that remains for us to do is to com-
pare the values of Vab and Vba to see which one provides us with the highest value.
   To test our scenario, let’s plug in some numbers (Figure 7.16). First, let’s see
what happens when the goal values are the same. Let’s use the following values.

      Va      The value of goal A                       100
      Vb      The value of goal B                       100
      Ta      The time to move to goal A                10
      Tb      The time to move to goal B                25
      Tab     The time to move between A and B          45
      r       The decay rate                            1% per time period
152   Behavioral Mathematics for Game AI

          Putting those values into the formula above, we arrive at:

           As we can see, Vab = 135 and Vba = 105. That means, of the original combined
      value of goals A and B (200), we could achieve 135 by visiting A first (then B) and
      105 by going the long way to B first and proceeding back to A. Put another way, by
      performing A first, we will achieve 68% of the possible total score, whereas we
      would only get 53% the other direction. (As these values change, it will be increas-
      ingly important for us to express the outcomes in terms of percentages.) This intu-
      itively makes sense to us. At that point, we would expect that the greatest value lies
      in the path that takes the shortest time to execute… A first and then B.

       FIGURE 7.16 If the values of goal A and goal B start the same and are both decaying
          at the same rate, we should complete the closest goal first—in this case, goal A.

           However, what if we raise the value of goal B to 200, for instance? We now have
      300 potential value points between A and B. Leaving the distances the same as in the
      first example, we would arrive at:
                                               Chapter 7 The Concept of Utility       153

    We still have a situation where Vab > Vba, although the margin is a little tighter.
The 210 out of 300 possible points is only 70%. We have lost 5% of the value that
we accomplished in the first example. However, it is still greater than the 65%
that we would achieve by going to B first. The best decision is still goal A, then B.
    The fact that the margin of importance is decreasing tips us off to something.
As the relative value of goal B increases as compared to goal A, we are no longer
quite as sure of our decision to take on goal A first. What would happen if this trend
were to continue?

   FIGURE 7.17 If the value of goal B is significantly greater than goal A, it becomes
   more worthwhile to travel the extra distance to goal B first to accomplish it as soon
    as possible before the value decays too much. We can then backtrack to goal A.

     Let’s try a third example (Figure 7.17). This time, we will increase the value of
goal B to 400. All other values remain the same as before. This means the total value
of all goals is now 500.

    This time, Vba > Vab. We have passed a threshold where the importance of goal
B has increased enough to warrant us going out of our way to accomplish it and
only afterward backtracking to touch on goal A. In fact, if we were to put the equa-
tions up against each other and solve for Vb, we would find that when Vb = 300, the
two solutions are equal… it wouldn’t matter in which order you visited them.
   The only reason the order was a factor in this decision is because there was a
mechanism in play that caused time to be a factor—namely the decay rate (r) of the
154   Behavioral Mathematics for Game AI

      values of goals A and B. In the final example, goal B was not only worth more but,
      due to the 1% decay rate, was losing value four times as fast as goal A. In layman’s
      terms, we “didn’t have time to waste” in dealing with the lesser objective of goal A.
           So, as we can see, the time one spends in travel is a valuable commodity.
      Similarly, the amount of time that may be spent in performing an activity also
      needs to be considered. If we were in an RTS environment, for example, goals A
      and B could be constructing buildings. Not only would the travel time to the loca-
      tions need to be considered (such as we did above), but the time spent in the actual
      building process would need to be added. The process is much the same—the only
      difference being that we would have to include two more variables, one for the con-
      struction time of each building.

      P UTTING I T   IN   C ODE

      Just as we did with the warranty and tower examples above, we can create a simple
      utility function to calculate the utility values of accomplishing the goals at various
      times. In this case, we can create a function, GoalUtility(), that calculates the
      utility of a particular goal. This is based on the value of the goal itself, the time spent
      traveling to the goal, and the decay rate at which the goal value will be reduced over
      that time. This function is the business end of the process in that it is the sole place
      where the utility of the goals is being calculated.
          int MyGame::GoalUtility( int ValGoal,             // value of the goal

                                        int Time,           // time to the goal

                                        float Decay )       // decay rate %


               // Calculate utility of a goal based on the value of the

               // goal, the time traveled to get to it,

               // and the decay rate

               int Utility; // Temporary local utility value

               // Utility is the value of the goal reduced by however

               // much the value decays during the time over which

               // the distance is traveled

               Utility = ValGoal * ( 1.0 - ( Time * Decay ) );
                                           Chapter 7 The Concept of Utility   155

        // Make sure the utility is a minimum of zero.

        if ( Utility < 0 ) Utility = 0;

        return Utility;


    Then, in the function SelectGoal(), by calling that function with the appro-
priate values, we can determine which of the two approaches is going to yield the
greatest utility. Note that for clarity, we have declared a type, GOAL_TYPE, which
enumerates which of the two goals we are returning as our decision.
    typedef enum {



    } GOAL_TYPE;

    GOAL_TYPE MyGame::SelectGoal( int ValA, int ValB,

                                    int TimeA, int TimeB, int TimeAB,

                                    float Decay )


        // Process the utilities of going to goal A first then goal B

        // and going to goal B first then A.

        float UtilityAB; // Utility of going to goal A first

        float UtilityBA; // Utility of going to goal B first

        // Get utility of going to A then B

        UtilityAB = GoalUtility( ValA, TimeA, Decay )

                     + GoalUtility( ValB, ( TimeA + TimeAB ), Decay );

        // Get utility of going to B then A

        UtilityBA = GoalUtility( ValB, TimeB, Decay )

                     + GoalUtility( ValA, ( TimeB + TimeAB ), Decay );

        // Compare the two utilities to determine which is preferable

        if ( UtilityAB > UtilityBA ) {
156   Behavioral Mathematics for Game AI

                        return GOAL_A;

                } else {

                        return GOAL_B;



           There is nothing magical about the SelectGoal() function. Its purpose is sim-
      ply to arrange the values appropriately for the GoalUtility() function to calculate
      the utility. If the scenario in which we are working changes, those changes would be
      reflected in SelectGoal()—or any other function that we want.
           For example, if we were to change the scenario to include three possible goals
      from which we wanted to select the best two to combine, we could write a function
      that arranged and passed the appropriate values to GoalUtility(). This is a stan-
      dard function pattern that, if you have done programming, you will likely recog-
      nize. Accordingly, you will see similar patterns throughout this book. For now, the
      point is that we have to get used to treating time as a parameter that has an inherent
      utility all its own.

       IN THE   G AME     Taking Fire

      The consideration of the utility of time becomes clearer once we put it into a com-
      mon game example. In keeping with a similar two-goal example like we have used
      above, imagine the following all-too-common shooter scenario.
           We are in a firefight, currently wounded and pinned down behind some cover
      (Figure 7.18). On one side of us is some available Armor that, once used, reduces
      the rate of damage we take from enemy fire. On the other side is a Health Kit that,
      as one would expect, increases our total health. Both of them are exposed to enemy
      fire. As soon as we step out toward either one, we are going to start taking damage.
      We would like to acquire one or both of them and then return to our place of cover
      with more health than when we left. There are a few questions to solve:

          Is it even possible for us to arrive back at cover better off?
          Can we acquire the Armor as well as the Health Kit and have at least some of
          it remaining when we arrive back at cover?
          Does it matter in which order we acquire the items?
                                               Chapter 7 The Concept of Utility       157

    As with our previous examples, we need to know a bit about the world before
we can process these decisions. Let’s use the following variables to define the rele-
vant items we will be using in the example.

    Current Health                                               H0
    Ending Health                                                Hn
    Value of Health Kit                                          Vh
    Value of Armor                                               Va
    Time to Health Kit                                           Th
    Time to Armor                                                Ta
    Time between Health Kit and Armor                            Tha
    Rate of damage from enemy fire without Armor                 D
    Rate of damage from enemy fire with Armor                    Da

 FIGURE 7.18 The agent needs to decide whether or not it can acquire the Health Kit,
  the Armor, or both and arrive back behind the cover better off than it left. The utility
    value of the time spent traveling under enemy fire is the greatest consideration.

    As we are starting to get more complex in our calculations, at this point it
would benefit us to break this process down into manageable sections. First, we
need to establish formulas for how much damage we would take both with and
without armor. We know the rates at which we will take damage, but how much
damage we take is based entirely on the time we will be exposed. For example, if we
run to the Health Kit, we would take damage as follows:
158   Behavioral Mathematics for Game AI

          Similarly, if we run to the Armor, the formula would be:

          And, if we ran between the Health Kit and Armor (assuming we did not yet
      have the armor), we would incur:

           The common thread here is that it is always the damage rate multiplied by the
      time spent. Assuming the values of the items themselves never change, the whole de-
      cision would be based on the time we are exposed to enemy fire and under what
      conditions (e.g., armored or not). Therefore, it is a simple and logical next step to
      condense the above three functions into a more generalized one that provides us
      the amount of damage we are going to incur in a period of time given a damage rate.

          With this utility function, we can throw any combination of time and damage rate
      we want at the problem. To solve our problem, we need three of these combinations
      —one for each leg of the trip. The times would be dependant on the distance and
      any other considerations we need to take into account (e.g., running slower while
      wearing the Armor). The damage rate, of course, would be based on whether or not
      we had picked up the Armor.
           To test our theory, we need to try a couple of examples. Since the rate of dam-
      age taken on different legs of the journey is a factor, it would be best to have our
      first examples expose that as the deciding factor. Therefore, let’s assume that the
      time to get to the Health Kit and the Armor are the same—five seconds. The long
      run from the Health Kit to the Armor (or vice versa) is eight seconds.
          In the first test, let’s assume that we are going to retrieve the Health Kit first and
      then go to the Armor. Using our damage utility function above, we arrive at the fol-
      lowing results for each leg of the foray:

          Leg                           Time          Rate           Damage
          Cover to Health Kit           5             2              10
          Health Kit to Armor           8             2              16
          Armor to Cover                5             1              5
          Total                         -             -              31

          The total amount of damage that would be taken is 31.
                                             Chapter 7 The Concept of Utility      159

    If we were to reverse this path, getting the Armor first, the data would be:

    Leg                          Time         Rate           Damage
    Cover to Armor               5            2              10
    Armor to Health Kit          8            1              8
    Health Kit to Cover          5            1              5
    Total                        -            -              23

    In this case, the total amount of damage taken is only 23 points. This intuitively
makes sense to us in that the difference in the second case was the fact that the long
run across the middle was done after acquiring the Armor. Therefore, only half
damage was taken during those eight seconds.
     We can now run a couple of new tests with one of the objects being significantly
farther away than the other. In this case, let’s make the two cover runs 10 and 5,
respectively. The run across the middle will have to be longer as well—we will make
it 20. Once again, we will do two examples with this configuration—one with the Health
Kit on the long side and one with the Armor there. The data ends up as follows:

    Leg                          Time         Rate           Damage
    Cover to Health Kit          3            2              6
    Health Kit to Armor          8            2              16
    Armor to Cover               7            1              7
    Total                        -            -              29

    The total amount of damage we would take is 29.
    Again, if we were to reverse this path, getting the Armor first, the data would be:

    Leg                          Time         Rate           Damage
    Cover to Armor               7            2              14
    Armor to Health Kit          8            1              8
    Health Kit to Cover          3            1              3
    Total                        -            -              25

    That is, by getting the Armor first and then going to the Health Kit we would
take a total of 25 points of damage. Once again, the results “feel right” to us. The
best solution, given these circumstances, is to get the Armor first—even if it is far-
ther away from cover. The reason for that is that while we may take a lot of damage
running to get the distant Armor, we are very exposed for a long period of time
160   Behavioral Mathematics for Game AI

      running all the way across the zone. It would be better to have the Armor on for
      those 8 seconds than to not have it.
          Of course, if we arranged the scenario so that our starting health was at or
      below 25, we would not be able to make it to the Armor and all the way back to the
      Health Kit before we died. In that case, every health point would matter signifi-
      cantly until we increased it. Therefore, it would be better to get the Health Kit first.
      But we shall explore those sorts of factors in the next chapter.

      P UTTING I T   IN   C ODE

      As we discussed above, the utility function we used multiple times in this example
      is simply one of time and rate (of damage). Expressed in code, this function is as
      simple as it sounds.
          int MyGame::Damage( int Time, int Rate )


               return ( Time * Rate );


          The rest of the decision is based on how and when we use the utility function.
      By using it multiple times with the relevant data, we can construct a total amount
      of damage taken. All of this action is performed in the function GetBestAction().
           In this function, we have put a slightly different spin on how we arrive at the re-
      sult. We start with our current health (mMyHealth) and then reduce that health
      value by the damage that is taken on each leg of the trip. The result is stored in
      HealthFirst or ArmorFirst, respectively. The reason we do this becomes clear at
      the bottom of the function. If our starting health is greater than both of the results,
      then it is clearly in our best interests to not leave the cover spot at all! (Note that I
      did not check for the possibility of running out of health along the way. That logic
      is not the point of the example and will actually be handled in a far more interest-
      ing way in the next chapter.)
          Once again, for visual clarity in returning a value from the function
      GetBestAction(),     we use an enumerated type, GOAL_TYPE. This is, of course,
      entirely cosmetic for purposes of this example.
          typedef enum {

               GOAL_NONE,                // Do not get either

               GOAL_HEALTH_ARMOR,        // Get Health Kit first

               GOAL_ARMOR_HEALTH         // Get Armor first
                                    Chapter 7 The Concept of Utility   161


GOAL_TYPE MyGame::GetBestAction()


    // Calc our remaining health if we get the Health Kit first

    int HealthFirst = mMyHealth

                    - Damage( mTimeToHealth, mDamageRate )

                    - Damage( mTimeHealthToArmor, mDamageRate )

                    - Damage( mTimeToArmor, mArmoredDamageRate )

                    + mHealthKitValue;

    // Calc our remaining health if we get the Armor first

    int ArmorFirst = mMyHealth

                   - Damage( mTimeToArmor, mDamageRate )

                   - Damage( mTimeHealthToArmor, mArmoredDamageRate )

                   - Damage( mTimeToHealth, mArmoredDamageRate )

                   + mHealthKitValue;

    // If both methods would leave us WORSE off than we started

    // then do neither.

    if (    ( mMyHealth > HealthFirst ) &&

           ( mMyHealth > ArmorFirst ) ) {

        return GOAL_NONE;


    // Return which of the two approaches will leave us with

    // the most remaining health.

    if ( HealthFirst > ArmorFirst ) {

        return GOAL_HEALTH_ARMOR;
162   Behavioral Mathematics for Game AI

                } else {

                    return GOAL_ARMOR_HEALTH;



           The above function is now able to decide based on any values we throw at it. No
      matter what the travel times are or what the damage rate is, we will find out which
      of the three approaches is best.

      One More Factor Couldn’t Hurt
      Upon further examination of the details of this adventure, we can prove that getting
      the Armor first is always the correct thing to do. The reason is that the sum of the
      lengths of two sides of a triangle is always longer than the length of the third. Given
      the parameters of our situation, it is better to be wearing armor while running two
      sides rather than running only one.
          However, this is a visitor from the realm of cold geometric theory. If we could
      guarantee that all our scenarios ONLY had those components in them, that would
      be, as they say in certain geometric circles, peachy. However, trying to teach our
      bots basic theories and getting them to apply them at the appropriate times is a bit
      outside the scope of game development. Our job, instead, is to provide them with
      tools they can apply in any situation.
          If we add one more component to this example, we ruin the simple triangle
      theory. Let’s assume the value of the Health Kit declines over time at a rate of 1 point
      per second. Therefore, the quicker we get to it, the more benefit we get from it. Now
      we are dealing with two different changes over time. The amount of change in our
      health from taking fire and the amount of benefit we get from the Health Kit.
          Tweaking our example once again, let’s assume that a Health Kit starts at 100
      points and declines by 1 point per second.

          Leg                         Time    Rate    Damage       Health Added       Net
          Cover to Health Kit         3       2       –6           +97
          Health Kit to Armor         8       2       –16
          Armor to Cover              7       1       –7
          Total                       -       -       –29          +97                +68

          We would sustain 29 points of damage and gain 97 health. This results in a net
      gain of 68 points.
                                              Chapter 7 The Concept of Utility     163

    Again, if we were to reverse this path, getting the Armor first, the data would be:

    Leg                       Time     Rate     Damage      Health Added         Net
    Cover to Armor            7        2        –14
    Armor to Health Kit       8        1        –8          +85
    Health Kit to Cover       3        1        –3
    Total                     -        -        –25         +85                  +60

    By adding this one simple factor, we have reversed our original decision. If we
had proceeded on our original route of getting the Armor first, we still would have
taken 25 damage points over that time. However, the Health Kit that was worth 100
when we started was only worth 85 by the time we reached it 15 seconds later. Our
net gain for the three legs of the triangle would be 60. On the other hand, when we
go to get the Health Kit first, it only loses 3 points, down to 97. We still take 29
points of damage from fire during our brief stroll, but we end up with a net of 68
points greater than when we started.
    In this example, if we were to only consider the (very relevant) factor that it is
better to be wearing armor when running out in front of enemy fire, we would have
decided to get the Armor first—and we would have been worse off in the end.
However, by recognizing the fact that the utility of time counts in two ways, we were
able to make the correct decision.


To reflect the fact that the value of the Health Kit is not static, we need to simply
replace the line where we added the Health Kit with a call to a function. This function
acts much the same way as our damage function does.
    int MyGame::HealthKitValue( int Time, int Rate )
          int HealthKitEndingValue;

          // Calculate the ending value of the Health Kit
          HealthKitEndingValue = mHealthKitValue - ( Time * Rate );

          // A Health Kit can never be less than 0
          if ( HealthKitEndingValue < 0 ) HealthKitEndingValue = 0;

          return HealthKitEndingValue;
164   Behavioral Mathematics for Game AI

          Notice that we have trapped this function so that the value of the Health Kit can
      never be negative. No matter how late we are in arriving at the kit, it will never
      actually damage us.
          Our resulting decision code would then be:
          GOAL_TYPE MyGame::GetBestAction2()


              // This implementation includes the fact that

              // the value of the Health Kit decays over time

              // Calc our remaining health if we get the Health Kit first

              int HealthFirst = mMyHealth

                                - Damage( mTimeToHealth, mDamageRate )

                                - Damage( mTimeHealthToArmor, mDamageRate )

                                - Damage( mTimeToArmor, mArmoredDamageRate )

                                + HealthKitValue( mTimeToHealth,
                                                  mHealthKitDecayRate );

              // Calc our remaining health if we get the Armor first

              int ArmorFirst = mMyHealth

                               - Damage( mTimeToArmor, mDamageRate )

                               - Damage( mTimeHealthToArmor, mArmoredDamageRate )

                               - Damage( mTimeToHealth, mArmoredDamageRate )

                               + HealthKitValue(mTimeToArmor + mTimeHealthToArmor,

                                                   mHealthKitDecayRate );

              // If both methods would leave us WORSE off than we started

              // then do neither.

              if ( ( mMyHealth > HealthFirst ) &&

                    ( mMyHealth > ArmorFirst ) ) {

                   return GOAL_NONE;

                                                    Chapter 7 The Concept of Utility    165

                // Return which of the two approaches will leave us with

                // the most remaining health.

                if ( HealthFirst > ArmorFirst ) {

                      return GOAL_HEALTH_ARMOR;

                } else {

                      return GOAL_ARMOR_HEALTH;



           The differences from our earlier version is that we replaced the variable
       mHealthKitValue with    the function call HealthKitValue(mTimeToHealth,
            If we wanted to, we could continue to add factors to this decision. What if we
       want to consider the likelihood that the Health Kit or Armor will still be there by
       the time we arrive? All we need to do is add additional variables or functions to cal-
       culate the various utilities and apply them to the decision equation itself.
            Additionally, we could extend this function to accommodate many possible
       end decisions. For example, perhaps it is best to go out and get only the Armor? Or
       only the Health Kit? What if we included a third potential stop along the way—such
       as a nifty weapon. We would only need to process our different utility values and
       compare them at the end. Theoretically, we could have many different possibilities
       stored in an array or a list and, after processing them all, sort them by the result
       value and see which one is the best. But now we are really getting ahead of our-
       selves. We will revisit this later on. (Aren’t you excited?)


       What we have covered in this chapter is what amounts to a core building block of
       decision making. It is the measuring stick with which we determine how important
       the different sides of a problem are to us. In the Prisoner’s Dilemma, this was the
       length of time in jail. In Cutting the Cake, our utility for cake was what made us
       want to make sure we got the biggest piece possible.
           In the Ultimatum, Dictator, and Trust Games, utility was measured in terms of
       money—and yet there a few other things sneaking into the equation… Guilt? Fear?
       Altruism? These factors prevented us from electing to purely maximize our utility
166   Behavioral Mathematics for Game AI

      for money. In the Pirate Game, money was certainly a factor, but fear was definitely
      on our minds. We wanted to avoid getting tossed overboard!
           We found out that, despite our fears, you have to sometimes risk utility to gain
      utility. Pascal was afraid of risking eternal damnation—whatever he believed that
      entailed. He apparently put a pretty negative utility value on it. He was willing to
      risk changing his lifestyle to avoid that and garner the infinitely positive outcome.
           Making the decision to buy a warranty on our seemingly doomed computer
      was a simple calculation—until we pondered how much the utility of having a
      working computer was really worth to us. In a similar vein, how much is it worth
      to protect our barracks with a tower? Is the utility more than the simple value of
      building the tower? It seemed like it was—especially when we considered that there
      may be a deterrent factor to simply having one around. Even if it never gets used, if
      it prevents the enemy from attacking simply because it is there, that is of great util-
      ity to us.
           Marci wanted to win the game of Monopoly (I think) but her utility of winning
      wasn’t as high as her utility of not losing. She didn’t want to take the risk, so she kept
      her properties to herself. That drove up the relative utilities of the properties she
      held in other people’s eyes. Those utilities made the asking price far more than
      what was printed on the Monopoly board. Getting those properties meant potential
      for the future. Just like building the tower could reduce the potential of an attack,
      getting a matched property in Monopoly gave us the potential for more income
      later on.
          And time… we spend it, we waste it, we take damage under fire during it. Time
      has a utility that often needs to be taken into consideration. The actual passage of
      time, in and of itself, is a powerful factor that needs to be measured and pondered.
          So… with all of the above, we seem to have found a way of placing concrete
      numbers on factors in our environment. This is worth that. This is worth far more
      than this other thing. On the other hand, this is nowhere near as important as those
      things over there. However, another caveat is lurking about. The utility of an item
      may change. Not just from one day to the next, but from one moment to the next.
      From one item to the next—even if they are the exact same type of item.
          Even from one chapter to the next…
8              Marginal Utility

     n Chapter 7, we discussed how objects or actions can have relative merits that
     may differ from their values. Different items have different utilities to different
     people or even in different situations. However, going a bit further, the same item
may have a different utility to the same person at a different time. This complicates
our calculations significantly. We cannot simply count on something having a
particular utility any time we encounter it. We experience a constant ebb and flow in
the relative utilities of those items. How, then, can we use utility in our calculations?
If the utility is always changing, what is something really worth? That is the struggle
with trying to quantify utilities.
      For instance, when I get up in the morning, having a can of caffeinated pop is
pretty high on my priority list. I have to admit that, at that point in my day, even the
idea of my wife’s coffee-flavored milk (latte) is almost attractive to me (although
still a little peculiar). That early in the day, caffeine of any sort has a higher appeal.
I figure I’m not alone in this assessment. At a different time of the day, however, my
utility for caffeine is not as great. The can size is the same, the milligrams of caffeine
per ounce are the same—everything about my intake of the product is the same.
The difference is that it’s just not that important to me at that time. The utility has
changed simply based on the time of day.
     Moving one square further down this path, even the changes in the amount of
something that someone already has in an item can represent different increases in
utility. As I mentioned, I like that first can of pop in the morning. In fact, I like it
about as much as I can like any can of pop during the day (100% value). I like the
second shot of caffeine quite a bit as well. It isn’t the same as getting that first dose,
however. There has been a drop-off in the utility that I have for it. As the day pro-
gresses, each additional can of pop has a smaller utility for me (Figure 8.1). I am still
gaining utility from each can, just not as much as the one before, and certainly not
as much as I received from the first can. Eventually, I get to a point where the util-
ity of a can of pop is next to zero. I am not getting any utility out of it at all. I could
take it or leave it.

168      Behavioral Mathematics for Game AI

                FIGURE 8.1 After I have had my first can of caffeinated pop in the morning, the
                 utility that I get from each additional can drops. Eventually, that utility drops to
                    almost nothing—that is, I’m not really getting anything out of it any more.

              Each additional can does not mean the same as the one before it. The amount
         of pop that each can adds (the value) is the same. Similarly, each additional can
         provides me with utility. However, it is simply not as useful to me. The change in
         utility from one can to the next is the marginal utility that I have for that next can.


         As we discussed earlier in Chapter 7, the notion of “value” is generally linked to
         something concrete. However, “utility” is usually somewhat more ephemeral. Even
         something as simple as a $20 bill has a somewhat amorphous utility. Its value is
         based on the two and the zero emblazoned on it for all to see. Its utility can be more
         of an “eye of the beholder” function, however. What makes marginal utility tick,
         however, is that the beholder himself can change his perception of the exact same
         value… much like me and my morning pop. As the perception of the value changes,
         the utility changes. The change from one point to the next is the marginal utility.
             For instance, a starving person on the street would find the $20 bill extraordi-
         narily attractive, would covet it greatly, and do whatever he could to acquire and
         protect it. That $20 bill has a large utility for this person. For that matter, a second
         $20 bill would have a fairly large marginal utility as well—only slightly smaller than
                                                           Chapter 8 Marginal Utility      169

      the first. Note that marginal utility works both ways. Just as gaining that first $20
      was very important, losing it would be tragic.
           A person of moderate means (such as a game developer) would find the $20 bill
      at least interesting and likely wouldn’t turn it down if offered. On the other hand,
      that same person may be of the mind to part with it if someone were in need. The
      marginal utility of $20 has tapered off somewhat. In this case, gaining $20 is mod-
      erately attractive and losing $20, while not pleasant, is only moderately distressing.
          For someone who has plenty of twenties lying around, the utility of any given
      one of them is low. Getting an additional one is likely not worth the calories you
      would burn in raising an eyebrow about it. Likewise, giving one away is just as un-
      eventful. In fact, I’m sure that there are people who would not be terribly concerned
      if one fell out of their pockets and landed on the street in front of the aforemen-
      tioned starving person. After all, what is twenty bucks when you’re sitting on twenty
          The differences between how these three people view that $20 bill is a function
      of how many twenties they already have. The more twenties they have, the smaller
      the marginal utility of each individual one that they gain or lose. When placed into
      the equation of decisions such as purchasing the warranty on a computer or buy-
      ing a $5 latte, these differences in the utility that person places on both the money
      and the purchase come into play.


      The very fact that utility can change is the reason utility is important. If utility never
      changed no matter where we were on the scale against which we were measuring it,
      it would be far easier to calculate. We could often simply apply a ratio between the
      value of an item or action and its corresponding utility. However, the changes in
      utility that take place over that scale are what justifies treating utility as a separate
      entity from value in the first place. Therefore, much of the discussion about utility
      revolves around marginal utility—that is, the changes in utility.
          The reason marginal utility is important is that it can change as well. In the ex-
      ample of the $20 bill above, the utility of the $20 bill was different for the poor per-
      son, the average person, and the rich person. Because of that, the marginal utility of
      an additional $20 bill was different for them as well. However, there are not simply
      three categories of people—poor, average, and rich. There is a continuous transi-
      tion between someone who has nothing and someone who has everything. So at
      what point does the utility (and likewise the marginal utility) of those $20 bills
      change? The answer is that there is a continuous change happening. Each additional
      $20 bill has a utility. As we move from one bill to the next, the utility changes.
170    Behavioral Mathematics for Game AI

       Returning to my caffeine example, in Figure 8.1 we graphed the utility of each can
       of pop and the utility that each can provided me in terms of caffeine. Each addi-
       tional can provided less and less utility. If we were to graph this in terms of my total
       caffeine intake, we can see how the changes in the utility of each can are reflected in
       my overall utility for caffeine (Figure 8.2). For clarity, we have made the graph
       continuous instead of per can. (Think in terms of milligrams of caffeine instead
       of cans.)

             FIGURE 8.2 As additional milligrams of caffeine are added, the utility I gain
               from them still increases, but at a slower rate. The change in utility from
                  one milligram to the next is the marginal utility for that milligram.

            As we can see, each milligram of caffeine adds more utility. The amount of util-
       ity it adds changes depending on how many I already have in me. Initially (left side
       of the graph), each unit increases the total utility by a large amount. This is the same
       as what is shown in Figure 8.1—the first can of caffeine was the most important.
       This is apparent from the initial slope of the graph in Figure 8.2.
            By the time we get to the right side of Figure 8.2, it is similar to what we saw
       with the last can in Figure 8.1. Utility is being added, but not as much. Again, this
       is evidenced in Figure 8.2 by the slope of the graph being almost horizontal. Not
       much utility is added by increasing the amount of caffeine.
                                                        Chapter 8 Marginal Utility        171

    The slope of the graph is particularly important. At any given point along the
graph, it is that slope that represents the marginal utility. As the marginal utility
between any two points changes, the slope of the line between those two points changes
as well. If the slope is changing as we progress along the line, we can identify the very
important features decreasing marginal utility and increasing marginal utility.
     Figure 8.3 shows a small segment of a utility curve. As the value changes from
x to y on the graph, the utility changes by a. Therefore a is the marginal utility of the
change in value from x to y. Likewise, as we change from value y to z, the utility
changes by b. The marginal utility of y to z is b. When we compare the marginal
utilities a and b, we find that b < a. Therefore, as our value moves from 0 to n, we
see decreasing marginal utility.

       FIGURE 8.3 A segment of a utility curve. Despite the fact that the total utility
          is increasing, the marginal utility of each additional unit is decreasing.

 IN   THE   G AME   Building Soldiers

Marginal utility has a very important part in game mathematics. Remember that
utility represents the “importance” of something. Naturally, decision making in-
volves deciding what things are important, which things are more important than
others, and even how much more important things are. However, the importance of
something is not static—just as the importance of the $20 bill is not static and the
importance of any given milligram of caffeine is not static. Things change.
172   Behavioral Mathematics for Game AI

           Marginal utility represents the change in importance of something. Accordingly,
      the factors that go into our agents’ decision making may change in importance.
      By taking into account not only the importance of something but the change in
      importance, we can craft our agents with more realistic, dynamic decision-making
           Plenty of examples can be invoked of times when a linear approach would look
      inefficient or even silly compared to one that respects marginal utility. An example of
      utility gone awry would be building units in a strategy game (Figure 8.4). When you
      build your first military unit, that unit has a utility that may even be greater than its
      value. (Remember the “deterrent effect” of the tower in Chapter 7?) After all, it is im-
      portant to us to have something there to defend our city. However, as we build more
      and more units, the marginal utility of each one diminishes. We could lose one and
      not even really care. At some point, our build manager might want to say “enough is
      enough” lest we end up with a ridiculously sized army that is an essay in overkill. If
      we were to not take decreasing marginal utility into account, the build manager may
      continue to crank out units assuming that each one is just as important as the first.
           This becomes even more important if there are multiple uses for the same re-
      sources. There is probably never a point where we could say that building another
      soldier is not valuable. However, once our army is “big enough,” that is, the marginal
      utility of an extra unit is negligible, the value of the resources that would have been
      spent on that unit could be spent on something else that had a higher utility. That is,
      as the importance of building a soldier decreases, it may look less important than

                      FIGURE 8.4 The first soldiers built are very important.
                 As we build more units, the marginal utility of each additional unit
                   decreases until we arrive at a point where we have “enough.”
                                                     Chapter 8 Marginal Utility     173

building other things. A more colloquial way of saying this is that you could spend
your resources where you get the most “bang for your buck” at any given time.

Constructing the Formula
There are a number of different ways that we could construct formulas to express
the changing marginal utility of building the soldiers. Which one we choose de-
pends largely upon how we want our build manager to view the relative importance
of the soldiers.
    For a simple example, we will use a linear function. This is similar to our caf-
feine example at the beginning of the chapter (Figure 8.1). There is a steady decline
from the first unit to the last. Let’s assume we want to build 10 soldiers. As we dis-
cussed earlier, we are more interested in building the first soldier then we are in
building the tenth. If we were to start with a maximum utility of 100 for the first
soldier and reduce the utility by 10 points for each additional soldier, the utility of
the 10th soldier built would be 10. This could be expressed with the following for-
mula (in terms of the utility of the nth soldier):

    This would yield the following utility values for the soldiers:

    Soldier           Utility
    1                 100
    2                 90
    3                 80
    …                 …
    9                 20
    10                10

     Just as we have seen in our previous examples, the utility of the first soldier is
high. The second soldier is also important. While the 10th soldier does have a pos-
itive utility, it is relatively insignificant compared to the respective utilities of the
earlier soldiers. The marginal utility of each subsequent soldier decreases; the im-
portance of building each soldier after the first decreases.
    One problem with this approach, however, is that it continues on in this fashion
beyond the 10th soldier. The 11th soldier would have a utility of zero, which is prob-
ably interpreted as being useless. Worse still, the 12th soldier would have a utility of
–10. The marginal utility has become negative. In other words, the importance
of acquiring this next soldier is actually detrimental. The implication is that we
174   Behavioral Mathematics for Game AI

      would be better off if this soldier had never been built. While that may sound like
      something a particularly abusive drill sergeant may scream at a recruit, it doesn’t
      make much sense in what we are trying to accomplish.
          Certainly, there are times when marginal utility could turn negative. In my caf-
      feine example, having too much is probably not a good thing. This effect is far more
      noticeable with beer. Eventually, having that next one is going to be detrimental.
      Having one more beyond that is going to even be more injurious. The next one
      beyond that… well, you get the point.
           In this case, the problem we encountered was that we assumed that 10 soldiers
      are all we would need. (As we will see a little later, this approach does have its ben-
      efits in certain circumstances.) A better solution is to construct a formula that re-
      duces the marginal utility of each soldier as we acquire more yet allows that
      marginal utility to stay positive.
          For example, consider the formula:

          The values we are arrive at are:

          Soldier            Utility
          1                  100
          2                  50
          3                  33
          4                  25
          …                  …
          9                  11.1
          10                 10
          11                 9.1
          …                  …
          15                 6.7
          …                  …
          20                 5
          …                  …
          50                 2
          …                  …
          100                1
                                                     Chapter 8 Marginal Utility      175

     We are certainly showing decreasing marginal utility in the above table. Also, as
we wanted, the formula will yield a positive marginal utility for any positive num-
ber of soldiers. However, the curve is significantly different from what we achieved
from our original formula. Notice that there is a very large drop-off between the
first and second soldiers. In our original example, the second soldier had a marginal
utility of 90. It was almost as important to us as the first one. Likewise, the third sol-
dier had a marginal utility of 80, whereas in this example, it is 33.
    This seems to run counter to what we originally intended… concentrating on
building initial soldiers. What we need is a formula that supports n soldiers with-
out giving a negative utility but keeps to the spirit of the original formula—that the
marginal utility should stay high initially and drop off later on once a reasonably
sized force has been built.
    Let’s change to this (somewhat contrived) formula:

    Our results would now be:

    Soldier             Utility
    1                   100
    2                   86
    3                   74
    4                   63
    …                   …
    9                   29
    10                  25
    11                  21
    …                   …
    15                  12
    …                   …
    20                  5
    …                   …
    50                  0.053
    …                   …
    100                 0.00002
176   Behavioral Mathematics for Game AI

           This is a bit more in line with what we originally intended. The marginal util-
      ity of the second and third soldiers is still high, that is, it is still important for us to
      build them. Additionally, as we look toward the 9th and 10th soldiers, we see that
      the values are much lower than those of the initial soldiers. We have also solved our
      problem of negative utility. While the marginal utility continues to decrease, it
      approaches but never reaches zero. Even the 100th soldier has some marginal util-
      ity to us, albeit a very small one.
           The difference between the three formulas is apparent when we graph them to-
      gether (Figure 8.5). In the case of the first, linear formula (the solid line), the mar-
      ginal utility decreased but was destined to cross into negative territory shortly after
      the 10th soldier. The dashed line shows the undesirable rapid drop-off in marginal
      utility that the second formula provided.

                  FIGURE 8.5 All three of the example formulas show decreasing
                   marginal utility yet exhibit very different characteristics such as
                     the rate of decrease or eventually intercepting the x-axis.

          The dotted line is the result of the third formula. Its curve is in the same neigh-
      borhood as the original, linear formula. Even by simply eyeballing the graph, we
      can tell there is a different characteristic, however. It approaches the axis but at a
      steadily decreasing rate. As we saw from the data, even if we extended this line out
      to 100, it would not have reached a marginal utility of zero. Constructing curves
      that exhibit this characteristic is a key to building suitable algorithms in behavioral
      mathematics. We will get into more detail on how to approach these issues later in
      the book.
                                                            Chapter 8 Marginal Utility      177

       In the above case, the marginal utility decreased as a certain point was approached.
       Contrary to that, sometimes marginal utility can increase. In this case, something
       that may not be important originally may become more important as the situation
       changes. As the other end of the scale is approached, the marginal utility increases,
       making each additional unit more and more important.
            Often, increasing marginal utility is tied to goals. For example, the minimum
       purchase price of an object is a goal that must be attained. If we wanted to purchase
       a new video game at a price of $60 (I may have just unwittingly dated this book),
       the first dollar we save is not very important. We do not have much money, and the
       goal is distant. In fact, we may not be certain that we are ever going get to the goal
       of $60. On the other hand, if we have $59, we know we are almost at our goal. We
       find ourselves saying, “Just one more!” The marginal utility of that last dollar is far
       greater to us than the early ones.
           Another example of increasing marginal utility is accumulating properties in
       Monopoly. In Chapter 7, I related how Marci would not accumulate properties
       through trading. She did not realize the value of how each additional property she
       acquired gave her more leverage than the one before.
            In Monopoly, it is important to continue to acquire properties. This is evi-
       denced by how one single property does not provide much income. Similarly, hav-
       ing two properties—even if they are the same color—does not provide a utility that
       is much greater than the stated values on the deeds. Acquiring the third property in
       a group, however, greatly increases the amount of income that can be collected
       from an opponent landing on any one of the three properties. The marginal utility
       of acquiring the third property is greater than the utilities of purchasing either the
       first for the second property. It’s because of this increased marginal utility of
       acquiring the third property in any given set that the other players and I were so
       desperate and willing to trade so heavily to acquire those properties from Marci.
       We found ourselves saying, “Just one more!”
           In fact, identifying scenarios that involve increasing marginal utility is often as
       simple as finding situations in which the phrases “just one more” and “almost
       there” are appropriate. Those phrases highlight the relative importance of the steps
       that are necessary to finish a task or achieve a goal.

         IN THE   G AME   Declining Health

       In our Health Kit and Armor example in the previous chapter, we alluded briefly to
       a caveat that could possibly affect our interest in pursuing the Health Kit. Specifically,
178   Behavioral Mathematics for Game AI

      the lower we are on health, the more important the Health Kit is to us—despite the
      fact that the value of the Health Kit does not change. If we have 80% health and are
      only loosing it at the rate of two points per second, getting a 50% Health Kit isn’t a
      very high priority. If we have 50% health and are losing it at the same rate, we might
      want to keep an eye out for one. If we only have 20% health, it becomes a top pri-
      ority. The reason for this is that the last 20 points of health we lose—those between
      20 and 0—are far more important than the first 20 points of health we lose —from
      100 down to 80. That is, the marginal utility of each health point increases as our
      health drops lower.
           In that example, we considered that the utility of a single point of health was
      static. Fifty percent is fifty percent. However, we may not have been comfortable
      letting our health get too low in the interim even if the outcome would have been
      better in the end.
           This phenomenon is more apparent if we compare the importance of the first
      health point (e.g., from 100 to 99) lost to that of the last health point we may lose
      (Figure 8.6). Obviously, losing that first point of health is not important. Losing the
      last one is… well… fatal. By extension, losing the next-to-last one is pretty darn im-
      portant as well. On the other hand, going from 99 to 98 isn’t anything to fret about.
      The same could be said for going from 98 to 97. Someplace in the middle ranges of
      health, however, these two views on the importance of a single point of health are
      going to have to meet. Eventually, as we continue to lose health point by point, we
      are going to have to be more concerned about bleeding all over the place. Accordingly,
      the importance of each health point lost is gradually going to increase. That is, the
      marginal utility of each health point increases as we approach zero health.

               FIGURE 8.6 As our level of health decreases, marginal utility of the
               Health Kit increases. The lower we are on health, the more important
                          it is to find a Health Kit to replenish our supplies.
                                                     Chapter 8 Marginal Utility      179

    If we had taken that into consideration as we decided whether to acquire the
Health Kit or the Armor first, our outcome may have been different. The solution
to that problem may have been to assign a utility to the Health Pack that differed
from its “face value.” That is, treat it as less than 50 if we have high health, but more
than 50 if we are running low. That way, when we considered our health status at
the end of each leg, we may have revised our decision as to whether or not to get the
Armor or the Health Kit first. To repeat, if our health is low (e.g., 20), 50 points of
added health mean more to us than adding the same 50 points when our health is
high (e.g., 80).

Increasing Is Not Always the Inverse of Decreasing
It is worth noting that the Health Kit example could have been expressed in terms
of decreasing marginal utility as well. However, caution must be exercised in trans-
posing the utility curve. In Figure 8.6, our level of health is expressed as decreasing
as we move from left to right. Consequently, the utility of the Health Kit is increas-
ing as we moved from left to right as well. On the left side, we express very little util-
ity for the Health Kit, as our health level is high. On the right, as our health nears
zero, the Health Kit is of great utility. When dealing with marginal utility, however,
the shape of the curve is important. It is not enough to simply say “low utility when
health is high; high utility when health is low.” How the marginal utility changes
over the course of the progression is important as well.
     Again referring to Figure 8.6, there is no movement in the utility of the Health
Kit when our health is high (the left side). Only at the end (when health gets criti-
cal) do we see major movement in the marginal utility of the Health Kit. The rate
of increase in the utility (the marginal rate) is at its greatest just as our health value
approaches zero. This makes sense to us—the urgency of acquiring a Health Kit gets
significant as we are about to die.
     When we reverse the axis so that health is now increasing from left to right, we
want to represent the marginal utility of the Health Kit as decreasing from left to
right (Figure 8.7). However, we must take care to replicate the same progressive
effect on the marginal rate. Both lines in the graph represent decreasing marginal
utility. The difference between them is when that change is at its greatest rate.
     If we were to duplicate the effect in Figure 8.6, we would need to use the solid
line in Figure 8.7. This is best examined by working from the right side of the graph
(n) and back toward zero at the left. Traversing the graph in this direction gives us
the same result as that illustrated in Figure 8.6: The utility for a Health Kit stays
small and then slowly increases until the urgency rises significantly close to zero.
180   Behavioral Mathematics for Game AI

                 FIGURE 8.7 We can restate increasing marginal utility as decreasing
              by changing the direction of the axis. Because both curves are descending,
           they both show decreasing marginal utility. They exhibit different characteristics,
               however. Care must be taken to represent the effect in the desired way.

           In contrast, both the solid line and the dashed line in Figure 8.7 show decreasing
      marginal utility while moving from left to right (i.e., health increasing). The effect
      is very different, however. In the case of the dashed line, the utility for a Health Kit
      would start high when the agent’s health is zero. It would stay high until moderate
      health was reached, at which point it would slowly begin to decrease. As the level of
      health approaches the maximum, there is a rapid change from one point to the
      next. In fact, most of the change in utility on the dashed line takes place over a span
      of health where the solid line’s utility is already close to zero, that is, not important.
           If we were to put this into practice in an example of an agent incrementally
      losing health (i.e., moving right to left on the graph), we would notice an almost im-
      mediate jump in the utility of acquiring a Health Kit. The first small wound would
      assign great importance to finding a way to heal. This is not the original behavior
      that we had constructed in Figure 8.6. The dotted line is incorrect. The lesson to
      learn here is that when we converted an expression of increasing marginal utility
      into terms of decreasing marginal utility by flipping the axis, we had to make sure
      that our resultant utility curve was the same as well.
           Many more subtle caveats can trip up the unwary designer when dealing with
      curves in this fashion. And yes, keeping track of the difference between value and
      utility makes for magnificent headaches. (Of course, this increases the utility of
      pain killers compared to their value.)
                                                           Chapter 8 Marginal Utility     181

      We could have addressed this without flipping the axis as we did above. The
      same idea can be represented by going the opposite direction on a marginal util-
      ity curve. If moving left to right on a graph gives us decreasing marginal
      utility, then moving right to left represents increasing marginal utility. Often,
      the choice of which direction to express a utility curve will depend on what the
      predominant usage of it will be.


      Marginal utility is used for more than just determining the relative importance of
      things we are either acquiring or already have (such as the soldiers and health).
      Marginal utility can also be used to weigh things that we are either giving up or
      putting at risk.
           For example, each additional dollar of our money that we spend has an increas-
      ing marginal utility. While we may be willing to spend our first or second dollar, we
      may not be as enthusiastic about spending our last dollar. Along the way, each ad-
      ditional dollar that we spent meant more and more to us.
          In the example of purchasing the warranty for a computer in Chapter 7, one
      question that was never addressed but is relevant in an intriguing sort of way is
      “How much money do we have in the bank?” We aren’t simply talking about
      whether we can afford to purchase the warranty in an absolute sense, that is,
      whether we have that extra $600 over and above the cost of the computer itself. We
      are addressing something a little more ephemeral. We want to know what that
      extra $600 means to us.
           When Pascal cautioned us that we “need to know what is at stake,” there was
      more to it than a simple value measurement of the wager alone (in this case, the
      dollar amount of the warranty). Not only do we have to consider the relative value
      of the wager to the potential outcome, but we must consider the value of the wager
      itself relative to what we can afford. If money is tight, a person may opt to not take
      the extra cost of the warranty and instead take his chances on getting through that
      first year unscathed. The value of the money saved by not buying the warranty
      might be needed elsewhere. To use a rather obvious example, rather than purchas-
      ing the warranty to guard against the possibility that we might need it, we may choose
      to keep the money to fund the extreme likelihood that we will want to purchase food
      in the next 12 months.
         Conversely, a multimillionaire would probably not bother with the hassle of a
      warranty on this computer. To him the replacement cost is negligible. (One could
      make the case that a multimillionaire should be buying a better computer than the
182   Behavioral Mathematics for Game AI

      hypothetical brick-to-be that we have been using in the example.) For that matter,
      maybe the multimillionaire should just buy the warranty and be done with it. After
      all, the cost of the warranty is loose change to him.
           It all comes down to how important that money is for a person—not so much in
      an absolute sense (“there is enough money in my account to cover the warranty…”),
      but more so in a relative one (“…but ensuring a working computer in the future is
      not as important to me as having beer money right now”). What we have witnessed
      is the application of marginal utility to something that is being given up (or even
      potentially given up). This is the marginal utility of risk.

      The St. Petersburg Paradox
      The phenomenon of marginal utility of risk was illustrated in spectacular, yet
      controversial, fashion by Swiss mathematician, Daniel Bernoulli, in 1738. In intro-
      ducing the St. Petersburg paradox, Daniel showed that using a purely mathematical
      approach of probability theory to determine decision theory can cause problems.
      (Credit where credit’s due: The original problem was conceived by Daniel’s cousin,
      Nicolas Bernoulli. Daniel just presented it and got it published in Commentaries of
      the Imperial Academy of Science of Saint Petersburg.)
           The St. Petersburg paradox is based on a lottery concept of a very simple sort.
      The pot starts at one dollar. You then repeatedly flip a coin. On any given round,
      if the coin shows tails, you win the pot. If it shows heads, you get to flip the coin
      again. Therefore, if you flip tails on the first toss, you would win the initial value of
      the pot: $1. However, if you were to flip heads first and your second flip showed
      tails, you would win $2. If you showed heads on the second flip and tails on the
      third, you would win $4. Skipping ahead a bit, if you got on a roll and threw ten
      heads in a row before tossing tails, you would win $1,024 (210). The big question is,
      “How much would you be willing to pay to enter this lottery?”
          The question makes for an interesting twist on Pascal’s charge that we need to
      determine what is at stake. In this case, we are going to try to determine what it is
      that we want to put at stake. The temptation is to look for the purely mathematical
      solution to this question. After all, that approach worked admirably for us in deter-
      mining how much we should pay for a warranty on our (less than stellar) com-
          The starting point involves trying to figure out how much we are likely to win.
      This part, at least, is relatively straightforward mathematically. All of the informa-
      tion we need was given to us in the rules of the game. A single flip of a coin is a
      50/50 proposition. The pot doubles with each “successful” flip of a head. Let’s bust
      out the math.
                                                    Chapter 8 Marginal Utility    183

   On the first flip, we have a 50% chance of winning the $1 pot. So our expected
winnings after one round (E1) are

    We can expect to win an average of 50 cents on the first round. If we were to be
betting only on one flip of a coin, we would be done now. We can expect to win 50
cents on average, so we would be willing to bet 50 cents on the game. In theory, over
time, we would hope to break even with this. If we are allowed to wager less than 50
cents, that’s a bonus, as we would most likely come out ahead in the long run. A
wager over 50 cents is not in our best interests, as we are going to be bleeding cash
the longer we play. (Although that doesn’t seem to stop people from lining up at
casinos, does it?)
    Our little lottery game doesn’t stop there. If we pass the first flip (50% chance),
we then have another 50% chance of winning $2. Therefore,

    Third flip?

    Or, to simplify,

    So now we can expect to win $1.50 on average after three flips. It seems that, if
we were to stop here, we should wager $1.50 because we have a chance of winning
that after three flips.
     The peculiarity comes into play when we extend things out to infinity. After all,
in theory, we could flip heads an infinite number of times in a row before throwing
that tail and collecting our winnings. Looking at it in a formula,
184   Behavioral Mathematics for Game AI

           Translated into English, this reads: The estimated payoff on n tosses (as n goes
      to infinity) is an infinite accumulation of 50 cents, which, of course, is an infinite
      payoff. Following our premise that we are willing to wager what we are likely to win
      over time, we should be willing to put up an infinite amount of money for the priv-
      ilege of playing the St. Petersburg lottery. But is that necessarily feasible?
          On one hand, who would be willing to put up an infinite amount of money to
      play this lottery? Certainly, none of us has an infinite amount of money, so we
      should be a little more reasonable and suggest that we could put up everything we
      own for this lottery. Doesn’t this sound a little silly? It doesn’t matter how many
      times I tell you that you could win an infinite amount of money, or even that it is
      possible that you could at least win all your assets back. You are still probably not
      going to take me up on this deal.
          On the other hand, if there is a potential for such a massive payoff, who would
      offer this lottery in the first place? Even if the players are wagering massive sums of
      money, there is always the possibility that you are going to pay out far more than
      what has come in. Jokes of governmental debt aside, I can’t even see a city, state, or
      country being able to back the potential “infinite” payoff.
          So what went wrong? Why won’t people be willing to put up massive amounts
      of money to play the St. Petersburg lottery? Why won’t anyone offer it anyway? The
      problem came into play when the nasty little concept of infinity got involved.
      The trouble you can get into when infinity is involved is actually worthy of its own
      book. (In fact, To Infinity and Beyond by Eli Maor is a fun read about the cultural
      effect that the notion of infinity has had over time.) However, the lesson we have
      learned actually has gone beyond the pitfalls of limitless numbers.
           By using infinity, Bernoulli unwittingly crafted a problem that can be posed to
      any person, regardless of their personal wealth. The hidden factor that makes this
      possible is the underlying question of “how much is this worth to you?” That is a
      question that each of us would be forced to ask ourselves as we pondered how
      much we would be willing to put on the line in the St. Petersburg lottery. The com-
      plicating factor is that there is not one factor to consider (and one corresponding
      question to ask). There are actually two. Both of them involve marginal utility. In
      fact, they are actually almost reciprocals of each other.

      Marginal Utility of Reward
      First, we must consider what value we would put on the money won. Sure, we would
      like to win a few bucks. That’s always nice. However, as the money goes up, it starts
      to look the same. The increase from $1 to $2 on a wager is rather substantial. We
      have doubled our money! In fact, the increase from $2 to $4 is just as attractive.
      Again, we have doubled our money. The case can be made that having $4 is twice
      as good as having $2, just as having $2 is twice as good as having $1. However, as
                                                       Chapter 8 Marginal Utility      185

many psychological studies have shown, there is a sort of leveling out point. Once
a certain threshold is reached (although that threshold is fairly vague), extra money
isn’t as important as it was earlier on in the acquisition process.
    In similar fashion to the caffeine I ingest so ravenously in the morning, after a
certain point, people tend to say, “I have enough” with other things as well. Let’s say
you have reached approximately $2 million in winnings from your coin-flipping
adventure in the St. Petersburg lottery. Is 4 million really twice as good as only 2
million? What about 8 million? Is that really twice as good as having only four mil-
lion? If you were sitting on $150 million, is your life going to be twice as good if you
double it to $300 million?
     The quandary would be a little more obvious if we were working with a fixed
value of money rather than doubling each time. For instance, increasing from $100
to $200 is nice. Increasing from $1,000 to $1,100 isn’t as big of a deal. Increasing
from $1,000,000 to $1,000,100 is loose change. Certainly, the effect is more obvious
with this example—particularly because we can see the percentage change that each
additional $100 has. Is it doubling the first hundred? Is it a 10% addition to $1,000?
Or is it a miniscule 0.01% increase when tacked on to a million? However, while it
is more obvious when expressed this way, even doubling the value each time trans-
lates into a diminishing utility at some point. It’s simply a matter of scale.
     Regardless of scale, we can represent the concept of diminishing marginal util-
ity with a curve such as the one on the left side of Figure 8.8. Note that, despite
being the same width as the areas represented by a, margins b and c show signifi-
cantly less of a vertical increase. In other words, despite the fact that the face value
of a prize in a lottery increases, each additional amount of increase begins to mean
less and less in terms of the utility that we would have for that prize.

 FIGURE 8.8 As the value of a prize goes up, the marginal utility of each increase gets
smaller. As the value of a wager goes up, the marginal utility of each increase gets larger.
186   Behavioral Mathematics for Game AI

      Marginal Utility of Risk
      As interesting as the mental exercise of pondering “how much is too much?” can be,
      it is still a little difficult to see what is at stake. After all, we can win an infinite
      amount of money. Who wouldn’t want that? Why artificially limit it? Well, things
      become a little more sobering when the other side of the wager is analyzed. How
      much are we willing to risk in this lottery? The reason this isn’t quite the same
      problem is that the scale of the x-axis is a little more defined. After all, most of us
      don’t have an infinite amount of money to plop down on a wager. In fact, we usu-
      ally have a maximum net worth with which we can work. And that draws the
      proverbial “line in the sand” on making our decisions.
          As we discussed earlier, marginal utility can increase. Just as with the curve for
      the reward, we can envision a similar curve for risk such as the one on the right side
      of Figure 8.8. In this case, the utility change of the initial money is not that big of a
      deal for most people. Wagering a dollar is fine for most of us. The second dollar as
      well. As the value changes early on, the utility difference is not all that significant.
      However, there’s a point where we start to raise our eyebrows a bit. Perhaps not if we
      go dollar by dollar… but more likely if, as above, we go $100 at a time. Naturally,
      if we start doubling the values, things get out of hand quickly.

      Maximizing the Difference
      Either way, there comes a point where the respective utilities of the money we are
      risking and the money we could win are moving in opposition to one another. Any
      additional money we stand to lose means more and more to us, that is, the marginal
      utility goes up significantly. On the other hand, the marginal utility of any addi-
      tional money we stand to win gets smaller. Eventually, we arrive at a conflict in which
      the additional risk is not worth the additional reward.
           The answer to the St. Petersburg lottery question—“How much would you be
      willing to wager?”—is, therefore, highly subjective when expressed in terms of value
      (which is, after all, what the question is asking). The secret is for each person to find
      the “sweet spot” where there is a maximization of the ratio of the utility—not value
      —that would be gained over the utility of what is being risked.
          If we were to overlay the utility curves from Figure 8.8, we would see, in a very
      abstract sense, this point illustrated in Figure 8.9. There is a point where the two
      curves are the furthest apart. In English, “For that kind of payout, I can risk this
      much!” That is the maximization point. Accordingly, there is a point where the two
      curves cross. Once again, in lay terms, “It’s not worth the risk.”
         Given the scenario of the St. Petersburg lottery, I wouldn’t be willing to wager
      more than a dollar. (Perhaps a by-product of running the numbers too much?)
                                                             Chapter 8 Marginal Utility      187

       FIGURE 8.9 As the value of a prize goes up, the marginal utility of each increase gets
      smaller. As the value of a wager goes up, the marginal utility of each increase gets larger.


      Of course, as with many of the concepts that we discuss in the book, trying to
      mathematically describe “enough” is a bit more art than science. In the St.
      Petersburg paradox, the graphs we used were indistinct in that our scale of value
      ran from zero to n, infinity, or generally “somewhere over there someplace.” The
      utility axis was even more hazy in that it really only expressed “more” or “less” utility.
      In a sense, that was by necessity since the entire dilemma dealt with “personal pref-
      erence in the space of possible infinity.” If that isn’t vague, I don’t know what is.
           In the examples we have shown so far, some of the thresholds we have set have
      been concrete. In the case of declining health, reaching zero health is a fixed point.
      There is nothing subjective about it. The same could be said for spending our last
      dollar. Some thresholds are defined by the rules of the environment. In Monopoly,
      the rules of the game state that three properties of the same group have more util-
      ity when combined than they do individually. Therefore, it is an inherent property
      of the game that three properties is an important threshold for determining mar-
      ginal utility.
           On the other hand, some of the thresholds we suggested were somewhat more
      flexible. In the case of building soldiers to defend our town, we arbitrarily decided
      that four soldiers was enough. We made a design decision. The same could be said
188   Behavioral Mathematics for Game AI

      for deciding that we need six soldiers before we send out our patrol. Those thresh-
      olds were entirely up to us to set. We needed to consider what would look and feel
          Generally, thresholds are set at a fixed number. How we arrive at that fixed
      number, however, could be determined in many different ways. In fact, the thresh-
      old target could continually be changing. From the standpoint of calculating utility
      and marginal utility, however, we need to know what the threshold is at that moment.
      Once we have that fixed number, we can begin to include it in our calculations.
           Also, as we mentioned earlier, thresholds could represent different approaches
      in the way our marginal utility behaves. Marginal utility could change as we move
      toward a threshold or it could change as we move away from a threshold. Marginal
      utility could either decrease or increase as a threshold is neared. There are no magic
      solutions for how thresholds are either defined or constructed. What is important
      is that we recognize that they exist so that we can start thinking in terms of them.
      We will explore methods of calculating utility, marginal utility, and thresholds later
      in the book.


      Up until this point, we have been envisioning utility curves as being somewhat
      asymptotic. That is, they curve in a single direction, approaching either a specific
      value or a particular slope. There are times when utility curves—both for risk and
      reward—may reverse themselves. This has the effect of making the extra value
      risked “worth it” again.
           The quandary posed by the St. Petersburg paradox is a staple of many of the
      game shows that we see on TV these days. Games like Who Wants to Be a Millionaire
      and Deal or No Deal have a component to them that encourages the players to
      weigh the marginal utility of the money they could win by continuing. There is an
      assumption that, as we saw in Figure 8.9, a person will reach a point where it is “no
      longer worth it” and will quit (if he hasn’t failed already). There is a subtle flaw in
      this premise.
           Interestingly, examples of this behavior are evident in those shows if you know
      where to look for them. Often, players in those games and others like them change
      their minds about whether to risk more money for achieving a greater reward. We
      will use Who Wants to Be a Millionaire as an example.
                                                    Chapter 8 Marginal Utility    189

Who Wants to Be a Thousandaire?
According to the rules of the game, if players get to the $32,000 level in Millionaire,
they are also guaranteed that if they lose over the next few questions they will still
receive that $32,000. There are a few of those safety net values in the game that
guarantee that, once you achieve those levels, you won’t go home empty handed.
    Imagine that you are a player who is at a prize level of $125,000. Obviously, you
have passed the $32,000 safety net—you are guaranteed that you will take home
that much. The next prize level is $250,000—if you get the question right. If you risk
going forward by attempting the next question, you could win another $125,000 for
a total of $250,000. On the other hand, you could also risk losing $93,000—drop-
ping you back to the $32,000. So, if your odds were 50/50 on winning $125,000 or
losing $93,000, which would you choose? Also, remember that your third option is
to not wager any more and simply walk away with the $125,000.
    Solving this mathematically from a strictly value standpoint (which is the only
thing numbers can convey without context) is fairly simple. Working with the as-
sumption that we have a 50/50 chance of selecting the right answer (which, in fact,
one of the “lifelines” will guarantee you) makes things even easier. We have a 50%
chance of winning $125,000 and a 50% chance of losing $93,000. We’ll shave the
zeros off for simplicity.

    This equation says that our estimated winnings (E(W)) are statistically likely to
be a gain of $16,000. Of course, that is based entirely on the odds being extended
over time. According to the rules of the game, there is no way that after this single
decision we could actually walk out with $16,000 more than the $125,000 we have
at the moment. The formula only shows us that the advantage is slightly in our
favor… and would be statistically more accurate if we were to make this bet numer-
ous times over the long term. Because we are only making this wager one time,
however, the bottom line is that we will either have $32,000 or $250,000. But what
about not wagering? We still do have the option of not continuing on and simply
taking our $125,000 off the table.
     Given those three values—$250,000, $125,000, and $32,000, many of us have
an opinion about the utility of the three prizes relative to their face values. Most of
us would be quite pleased to walk away at this point. You often hear players or their
families saying, “That’s good enough—I don’t want to risk it anymore.” Taking
home $125,000 is “enough” to make us happy about our little adventure. On the
other hand, for others, simply knowing that the minimum that they could leave with
was 32 Big Ones is fine with them—they would be willing to risk losing that $93,000
to try for another $125,000. But why the difference? Aren’t the numbers clear?
190   Behavioral Mathematics for Game AI

      Enough Is Enough Until It’s Not Enough
      Much of it comes down to the individual perception of wealth. Some of this is a
      psychological factor. A number of studies have shown that once certain plateaus are
      reached, people are satisfied. Interestingly, these plateaus are often in tiers (Figure
      8.10). That is, people might be satisfied for some time at one tier so that the marginal
      utility of additional wealth drops as it does in Figure 8.10. As that wealth increases
      for a while, however, another perceived “level of wealth” is approached so that the
      marginal utility increases rapidly once again.

       FIGURE 8.10 As the value of a prize goes up, the marginal utility of each increase gets
      smaller. As the value of a wager goes up, the marginal utility of each increase gets larger.

           I can’t provide firm figures on where these tiers are since they differ wildly
      based on the cost of living, the mentality of the society, and obviously the current
      value of the dollar. However, the function seems to be tied up in the perception of
      lifestyle changes that could happen at those tiers.
           If a lottery prize is less than $100, it might buy you a nice dinner for the family
      —or a single AAA game title. It’s not going to cause a significant lifestyle change,
      however (unless you lock yourself in your house to play that AAA game 16 hours
      per day for weeks on end… but I digress). Prizes in the hundreds of dollars are
      intriguing because they could help you pay off some bills or buy a new next-gen
      console. Prizes in the thousands get a little more fun, as do those in the tens of
      thousands. At that point, people are beginning to think “new car,” for example.
      However, those sums aren’t going to put you on Lifestyles of the Rich and Famous.
                                                    Chapter 8 Marginal Utility    191

     When you get to six figures, however, people start thinking about entire new
houses. (At least here in the Midwest, six figures gets you a nice house.) And that
seems to be a magic number on shows such as Millionaire and Deal or No Deal.
That tier is enough of a lifestyle change to make a difference to people. When you
are sitting on $125,000, an extra 10 grand isn’t going to be that much of a deal. Even
the extra $125,000 in the example above isn’t as important because the threshold of
“nice new house” has already been reached.
    The next threshold seems to be in the area of a million, which is not surprising
given the popularity of the term millionaire. In fact, it is not a coincidence that the
TV show Who Wants to be a Millionaire? is named the way it is. Can you envision
answering the titular question any other way than “Uh… I do!” However, using the
standard rules of both that show and Deal or No Deal, the top edge of the game is
the million dollars. Therefore, along the way most people end up asking them-
selves the question “Do I want a hundred thousand dollars?” Usually, when they get
to that point, they go home. If you were to extend those games to include tens of
millions, I suspect we would see another tier at the one-million mark. The statements
by the players and families would then be, “We’ve got our million, that’s enough.
Quit now.”
    Sometimes, that is good advice. Even Fast Eddie should have taken Charlie’s
advice in the movie The Hustler. When Eddie was up $11,400 on Minnesota Fats,
Charlie tried to get him to quit for the night saying, “You wanted ten thousand?
You got ten thousand.” Eddie didn’t quit and proceeded to lose it all.

 I N T HE G AME   How Many Troops?

The phenomenon of multiple thresholds can be applied to games as well. Earlier,
we used an example of the marginal utility of building extra units in a strategy
game. Once a certain number of troops is reached, the marginal utility of building
additional units begins to decrease—we simply have “enough.” However, as we
have been examining above, having “enough” is often tied to context. With the
prize money, we theorized that people could be thinking in terms of a nice dinner,
a new video game, a new game console, a new car, paying off a mortgage, buying an
entire new house, and so on. Each of those was a milestone that defined enough—
at least temporarily. Once that landmark was reached, the marginal utility of receiv-
ing additional value dropped somewhat. However, as the next milestone was
neared, the frame of mind changed so that additional value was now thought of in
terms of “if we only had a little more.” When this happens, the marginal utility of
additional value increases again. Getting additional value is important if the next
goal is to be attained.
192   Behavioral Mathematics for Game AI

          In our example of building units, “enough” can be relative to certain thresholds
      or goals as well. For example, rather than the lifestyle changes that the prize money
      can help us attain, let us frame building units in terms of game-contextual goals.
      We may think of counting our units in terms of having enough to:

          Defend our base against a small raid
          Defend our base against an attack
          Defend our base and harry a small enemy outpost
          Defend our base and destroy a small enemy outpost
          Defend our base and destroy a medium enemy outpost
          Defend our base and destroy a well-defended enemy base

          Each of those goals would require a specific number of troops. As we reach the
      number, we would be temporarily satisfied with that number. While we may not
      stop building troops entirely, the urgency of building them isn’t as great as it was
      when we were just short of the required number.

      A Few Good Men
      Let’s work from the top down (or bottom up in terms of troop levels). Early in a
      real-time strategy (RTS) game, we may decide that we want to have four soldiers in
      our fledgling base just in case the enemy sends a couple of soldiers of his own over.
      If we haven’t built any of them yet, the utility for the first soldier is fairly high. We
      really need that first one finished so we don’t have to worry. If someone comes,
      we can at least slow them down a little. Once the first soldier is built, the second one
      is a little less urgent, but important nonetheless. Three is certainly better than two,
      but it is no great tragedy if an attack were to come prior to the third one being fin-
      ished. The fourth soldier may very well be our insurance policy. The utility it pro-
      vides is less than the third—and certainly far less than the utility of getting that first
      soldier out the door. Once we reach that minimum of four soldiers, we may have a
      comfort level that allows us to concentrate on other things such as training more
      workers, erecting other buildings, and so on. Note that those other concerns also
      have utility values that would be taken into consideration as well… but let’s not
      complicate things just yet. (Yes, that sort of fun comes later in the book.)
          So, because we have achieved that threshold of four soldiers, building a fifth
      one doesn’t provide us with much utility at all—at this time. This is hardly a perma-
      nent solution, though. It should be obvious that we aren’t going to accomplish
      much in a typical strategy game with only four soldiers. In fact, we will likely
      continue cranking them out, but only if other more immediate priorities (i.e., those
      with higher utility values) don’t get in the way.
                                                  Chapter 8 Marginal Utility    193

    Eventually, we are going to be concerned about defending against a larger attack,
and to do so, we will need more units. For that contingency, let’s assume we feel
that 16 soldiers is the right number. While the numbers—and even the scale of the
numbers—may have changed, the pattern is similar. If we only have 9 or 10 soldiers,
we are going to place a great deal of importance on the 11th and 12th. Once we
arrive at 14 or 15, the urgency starts to fade—as expressed by the marginal utility
of each individual unit dropping slowly once again. The pattern will repeat as we
move down our list of goals, although its character may change somewhat.

Are We There Yet?
As we discussed earlier in this chapter, marginal utility can increase as a threshold
is approached. This generally occurs when we have specified a minimum for a pro-
ject rather than a preferred level that makes us feel safe. For example, let’s assume
we have our 16 defenders and accomplished anything else on our nonmilitary list.
We may want to construct a patrol dedicated to finding (and subsequently annoying
the heck out of) nearby enemy outposts (Figure 8.11). If we decide, through whatever
algorithmic crystal ball we employ for this sort of decision, that this patrol needs
to be comprised of six units—and we aren’t leaving until we have six—then that
becomes another threshold. Building the fifth and sixth units is of great importance.
Why would we build the other four if they are going to be left waiting?
    Notice how this differs from the prior examples of the defending force. In those
examples, if we were one short, we were still rather comfortable. Three instead of
four initial defenders was not a great setback. The same could be said about 15 in-
stead of 16. If there was a priority to do something else, we could have stopped

   FIGURE 8.11 As we approach and reach various thresholds in building our army,
       the marginal utility of each additional soldier may increase and decrease.
194    Behavioral Mathematics for Game AI

       without building the 16th soldier for a time. The first 15 could do the job almost as
       well as if they had waited for that 16th one. Likewise, the value we spent on those
       first 15 would not have been wasted or left sitting idle. However, by specifically
       defining that we want a patrol of six soldiers, it does us no good to build the first
       five and then quit. In that case, the marginal utility per soldier actually increases as
       we near the goal, rather than leveling off the closer we get.


       As we have seen, marginal utility can assist us in expressing some of the subtlety in
       how an agent could perceive the world. Just as utility allows us to give an object or
       an action a level of importance that was different than its value, marginal utility allows
       us to vary that level of importance as the number of objects or actions increases or
            Also, as we saw with Bernoulli’s St. Petersburg paradox, marginal utility can
       assist us in analyzing the differences between two utilities such as risk and reward.
       By thinking in terms of the differences in utility rather than differences in value, we
       can solve problems that would otherwise yield irrational results such as betting an
       infinite amount of money.
           We explored the idea of thresholds as anchors for these utility judgments. By
       finding places where we say, “That’s enough!” we establish targets and goals for our
       marginal utility to approach. However, we also saw that the thresholds themselves
       may not be as much of an anchor as we thought. Sometimes, a threshold may be
       “good enough for now,” only to change to a different one later on.
            In the end, we have found that marginal utility can be a powerful tool in ex-
       pressing shifting opinions about things. And, because most decision makers are not
       purely rational, opinions are a major component of the decision process. As we
       continue through this book, we will encounter numerous places where marginal
       utility plays an important role in constructing realistic and dynamic behaviors.
9             Relative Utility

       here is still one problem that we have yet to solve with regard to utility and
       marginal utility. While we can now express differences in importance between
       one item and another item of the same type, the numbers that we have been
using so far have often been abstract. What does a utility of 100 on my first can of
pop mean? We know that it is twice the utility of my sixth can of pop (50), but what
does it mean when compared to my utility for salsa, for instance?
    As we examined the potential uses for multiple thresholds when building our
armies in Chapter 8, one of the reasons we cited for pausing the building of soldiers
was so we could potentially build other things that were more important at the
time. How do we know if something is more important? How much more impor-
tant is it? Would we have built the tower over our barracks in Chapter 7 if we had
needed the resources to finish one final soldier for our patrol? For that matter,
would we have built the relatively unimportant final soldier in our town defense
force if we needed those resources to build the tower?
    Utility and, by extension, marginal utility are only valuable when they are put
into a comparative context. As we have discussed, to compare things, we need to be
able to measure them. More importantly, we need to measure them in such a way
that they can be compared to other measurements. The advantages of money and
time are that they can be measured in some form—for example, dollars and cents
or hours, minutes, and seconds. Production of a manufacturer can be measured in
the number of units—and compared to the dollars we are paying the manufacturer
and the price we are receiving for the unit. The efficacy of a fertilizer can be mea-
sured in the growth of the plant in inches or in the output of a crop in bushels. This
can be compared to the price we pay for the fertilizer and the price we receive for
the bushel. Many things can be empirically measured in some form or another.
However, plenty of considerations go into making a decision that can’t necessarily
be counted up and compared. Some things just don’t lend themselves to being
measured in so distinct a fashion. People have certainly tried it before, however,
and those attempts merit a closer look.

196   Behavioral Mathematics for Game AI


      In the late 18th century, the English utilitarian philosopher Jeremy Bentham devel-
      oped what came to be known as the felicific or hedonic calculus. He developed an
      algorithm that was formulated, in his mind, to help in calculating the degree or
      amount of pleasure (i.e., hedonism) that an action was likely to cause. Likewise, it
      could be used to calculate the amount of pain that said action would cause—perhaps
      doing one or the other to two different people—or even doing both to the same
          By rating any given action, he posited that you could determine whether an
      action was worthwhile—or even moral. In fact, by using his algorithm and the
      various weights he applied to actions, Bentham constructed arguments in favor of
      individual and economic freedom, the separation of church and state, freedom
      of expression, equal rights for women, the end of slavery, the abolition of physical
      punishment, animal abuse laws, the right to divorce, free trade, the interest charged
      on loans, and even the decriminalization of homosexual acts (and remember, this
      was the 18th century!).
          Although this may be presumptuous, his methods were definitely crafted in
      such a way as to make life easier for game artificial intelligence (AI) developers. As
      a unit of measurement (i.e., a common “utility” value), he reduced everything to
      “hedons” and “dolors”—positive and negative, respectively. He rated actions using
      a combination of variables, which he called “elements” or “dimensions.” What they
      amounted to was vectors of pleasure (or pain) that could be added together to
      combine into one final result. These dimensions included:

           1. Intensity: How strong is the pleasure?
           2. Duration: How long will the pleasure last?
           3. Certainty or Uncertainty: How likely or unlikely is it that the pleasure will
           4. Propinquity or Remoteness (Time Distance): How soon will the pleasure
           5. Fecundity: How probable is it that the action will be followed by sensations
              of the same kind?
           6. Purity: How probable is it that the action will not be followed by sensations
              of the opposite kind?
           7. Extent: How many people will be affected?

         The first six of these were designed to be applied to the individual in question,
      whereas the seventh was meant to encompass others as well.
                                                     Chapter 9 Relative Utility    197

     By analyzing any given situation or decision and applying what should osten-
sibly be an objective rating to the various facets involved, he believed you could
arrive at a score that was descriptive of what should or should not be done. The
decisions themselves could be of low importance such as selecting a flavor of ice
cream, or they could be of great weight such as selecting the benefit of one human
life over another during a deadly medical outbreak.

Bentham Goes to Dinner
To show how the formula works, we can toss all sorts of anecdotal examples at it.
If I were to try to decide how to spend money on a night out with the family, for
example, I would likely want to weigh the pros and cons of various activities. If we
were to go to dinner, I may be presented with a wide variety of choices of where to
go. If I were to analyze certain components of the dining experience, however, we
may be able to sift through the possibilities a bit easier. By using Bentham’s seven
criteria, I can at least get a better idea of how these choices stack up.

    Intensity: There may be a significant difference in the quality of the food from
    one establishment to another (intensity of the pleasure). On the other hand,
    that can easily be countered with a significant difference in the amount of money
    I would need to pay (intensity of displeasure). Of course, that necessitates me
    weighing the relative utilities of the pleasure from the food and the displeasure
    of spending money.
    Duration: Do we enjoy spending time at an eating establishment? If so, we may
    include the value of how long the pleasure lasts. Driving through a burger joint
    and inhaling their wares in the car doesn’t provide a long duration of pleasure;
    sitting down in a restaurant with a pleasant atmosphere and taking our time
    may be an important part of the experience for us.
    Certainty: If we are familiar with a restaurant, and know what our favorite
    dishes are there, we can be more certain of the pleasure than if we were to go to
    a new, unfamiliar establishment where we don’t know what we are getting into.
    In that case, if we aren’t feeling experimental—such as a night when we are
    treating company to dinner—we may want to play it safer and avoid the uncer-
    tainty involved in visiting a new restaurant.
    Remoteness (Time Distance): If we are very hungry or are concerned with
    time, we may be more inclined to select a place that is closer to our house than
    to drive across town. The delay in the pleasure would be a significant factor to
    us. The same could be said for the expected waiting time. If we know that we
    can expect to wait an hour before we get served, it may greatly affect our decision.
198   Behavioral Mathematics for Game AI

          Fecundity and Purity: If, despite liking the food that a place serves, we are
          aware that there is a possibility that our digestion will suffer for the remainder
          of the evening, we may not be enamored with the purity of the experience. Do
          we really want to take a chance on feeling ill later on?
          Extent: How many people in the family are going to enjoy this experience? Is it
          something that most people can agree on or is it going to alienate the wife and
          kids? This addresses Bentham’s last point—how many people are going to be
          affected by my decision?

          The trick to the whole process (and unfortunately its major sticking point) is
      trying to determine what the utility values should be. Even ranking things within
      one category can be hard—do I like pizza more or less than tacos? Or steak? Or
      pasta? How much more? Still worse is comparing things across categories. For
      example, is the difference in how much I like pizza compared to tacos worth more
      than the difference between a 20-minute wait and a 5-minute wait?
          To score things appropriately, we would need to have more than an ordinal
      ranking such as pizza > tacos > pasta. We would need to know how much greater
      pizza is than tacos and how much greater tacos are than pasta. And that scale would
      need to be directly comparable to the scales used elsewhere… the duration, the wait,
      the certainty of the experience, the likelihood of potential indigestion, and whether
      I have to put up with my family griping that they don’t like my choice of restaurant.
      Arranging all of these disparate factors into a framework that can help us make one
      decision is no small task (yet we will take a crack at it later on in the book).


      Bentham’s hedonic calculus attempts to combine multiple scores in such a fashion
      so as to create a single score. For the goal for which hedonic calculus was intended,
      those categories were reasonably adequate. In fact, there could possibly be game sit-
      uations where those categories would be appropriate. Of course, not all of
      Bentham’s categories would necessarily fit in a game environment decision.
      However, most game environments have a constellation of factors that can be taken
      into account to weigh the available choices that an agent might have. These factors
      are very decision-dependent. That is, no one set of criteria or factors works in all
         The important thing to learn from Bentham’s approach is that decisions can be
      composed of many different pros and cons. Not only do each of the individual pros
      and cons need to be scored or weighted, but the relationship between them needs
                                                    Chapter 9 Relative Utility    199

to be defined. Eventually, we can arrive at one value that represents the total utility
that the agent would derive from the action. This concept is known as multi-
attribute utility theory (MAUT).
     MAUT is used in many industries as a decision tool, although it may go by dif-
ferent names (e.g., multi-criteria decision analysis, multi-criteria decision making).
The business world presents an overflowing cornucopia of formalized methodolo-
gies available to assist decision-makers in processing massive amounts of data. A
quick search even turned up the International Society on Multiple Criteria Decision
Making, which claims to have over 1,400 members in 87 countries. On their Web
site (, they define their field as follows:

    “MCDM can be defined as the study of methods and procedures by which
    concerns about multiple conficting (sic) criteria can be formally incorpo-
    rated into the management planning process.”

     Aside from the egregious spelling error, that definition pretty much sums up
what we are trying to accomplish with game AI. Working backward, “management
planning process” is a buzzword for making a decision. To arrive at that decision,
we need to “formally incorporate” a number of “conflicting criteria.” That sounds
strikingly like what Bentham had in mind.
     This approach is also at the core of designing and programming game AI. If we
look back over the examples we used earlier in this book, we find that MAUT would
have been appropriate in many of them. They use many, possibly conflicting crite-
ria that we need to formally incorporate into a unifying algorithm to arrive at a
decision. For example:

    When my daughter was formulating her strategy for running for VP of the
    fifth grade, she identified many criteria that would have led to decisions about
    whether to run for president or vice president, whether or not to make stickers
    and posters, and if she should tailor her speech to be more kid-friendly.
    When my son was considering raising prices in his zoo, he pondered the per-
    ceived value of his zoo based on how many exhibits he had already created and
    the balance between more visitors at a lower price and fewer visitors at a higher
    When I analyzed my decision to use the next razor blade in the pack, I took into
    account how many blades were left, the cost of purchasing more blades, my
    comfort in shaving with a used blade, and how much I had already used prior
200   Behavioral Mathematics for Game AI

         Even some of the examples in the previous chapters on utility were based on
      some assumed premises that could have been further explored and defined with
      MAUT. For instance:

            Pascal was far to simplistic when he relegated “live as if God exists” and “live as
            if God does not exist” to equal status. I’m sure that Mr. Bentham would have
            been more than happy to point out to the conflicted Mr. Pascal that those two
            choices carry a significant amount of complexity in and of themselves.
            When we built our tower over the barracks, we assumed that the barracks was
            important to defend. Why is it important? What if we already had three bar-
            racks buildings and this was a fourth that we were building for insurance? What
            if we were currently beating back our opponent so badly that the likelihood of
            having our barracks attacked was nil?
            When we decided to build our settler and his escort soldier first, we worked from
            the premise that sending out a settler earlier was better than doing so later.
            Why? How much better? Is this as important if we are not concerned with ex-
            panding our empire? Why would we be concerned about expanding our empire?
            Are two smaller cities worth more than one slightly larger one? How much more?

          All of these factors could be defined in terms of a utility attribute. We would
      then be able to score the factors appropriately and include them in our decision.

       IN   THE   G AME   The Engagement Decision Revisited

      In Chapter 3, we showed a brief example in which we took multiple criteria into
      account to construct a decision about whether or not an agent should engage the
      enemy. In this case, the seven categories that Bentham defined would not be appro-
      priate. Instead, we identified eight factors that we thought we should consider.
      Those were:

            Agent’s health
            Enemy’s health
            Agent’s weapon
            Enemy’s weapon
            Number of enemies
            Proximity to a leader
            Proximity to an important location
            Agent’s “anger” level
                                                   Chapter 9 Relative Utility   201

    If we define each of those factors as a utility, the process of combining them
into a single utility function is an instance of MAUT (Figure 9.1). Specifically, we
are taking multiple, disparate attributes and blending them together to arrive at a
single, deciding factor—in this case, should we attack, hide, or flee?

                 FIGURE 9.1 The engagement decision from Chapter 3
                              is an example of MAUT.

Analyzing the Attributes
To show how the thought process is similar to hedonic calculus, we can step
through each of the attributes above just as we did with our dining decision. For
each, we can list what we believe the relevant factors are.

    Agent’s Health: How healthy are we? Are we fully healed? Damaged? If we are
    almost dead, taking any additional damage would be a serious concern.
    (Remember the increasing marginal utility of declining health from Chapter 8?)
    Enemy’s Weapon: What sort of weapon does the enemy have? How much
    damage does it do in one hit? How much damage does it do over time? This is
    important because it needs to be considered in direct relation to our current
    health. While a stronger weapon would be dangerous at all times, a weaker
    weapon would still be dangerous if we were low on health. Therefore, the
    enemy’s weapon is not an isolated factor.
    Enemy’s Health: We need to consider the current state of the enemy. How
    damaged is he currently? It would be a shame for us to run away if our foe was
    only barely clinging to life.
202   Behavioral Mathematics for Game AI

          Agent’s Weapon: What sort of weapon do we have? Are we packing something
          serious like the rocket launcher from Chapter 5, or are we stuck with the equiv-
          alent of a military issue peashooter? How much damage could we do in one hit?
          Over time? What is the accuracy? Is it area of effect or direct fire? All other
          things being equal, how long would it take us to dispatch our horrible nemesis?
          Number of Enemies: How many baddies are out there after all? Is it just this
          one dude with dark glasses and a stupid hat, or are there many shade-wearing,
          ridiculously crowned nemeses that we have to deal with? If it is more than one,
          is it still a reasonable enough number that we would feel confident in our abil-
          ities, or are we counting on the art department to provide our character with a
          change of pants after this encounter?
          Number of Allies: How many of our own chaps do we have around to assist us?
          We certainly would feel far more comfortable taking on the enemy if we had
          some support. Are we alone, or do we have strength in numbers?
          Proximity to Leader: Is our fearless and inspiring leader within visual range? Is
          he the type who would encourage us to press on into the battle—either through
          positive reinforcement or the threat of his boot to our head? If he is close by,
          we may feel more comfortable (or more afraid of him than the enemy). If he is
          distant—or dead—we may begin to feel that all is lost.
          Proximity to Location: Is our pitched battle taking place near an important
          location we need to claim? Or near one we absolutely must defend? Are we near
          a safe fall-back point, or is our back up against a wall?
          Anger: How clearly are we thinking about all of this? Is our morale artificially
          heightened to an irrational level by a sense of vengeance? Do we just want to
          “get even” regardless of our situation?

          All of these issues need to be addressed and codified in some manner. When we
      do that, each of them can be represented as a utility value. Constructing some of
      them will likely use the principles of marginal utility as well.
           As utility values, they become our core building blocks. As such, everything
      that follows from this point on will be based on what we determine to be the appro-
      priate interpretations of these factors. If we are inconsistent or flawed in our ap-
      proach to creating them, these inconsistencies will be cascaded throughout the rest
      of the decision algorithm.

      Assembling Our Blocks
      Additionally, as is illustrated in Figure 9.1, we can begin combining these individ-
      ual factors into intermediate concepts. Each of these concepts, once created, can
      stand on its own as a new attribute that can then be combined with other attributes.
                                                   Chapter 9 Relative Utility   203

For example:

    Risk to Agent: Our current health combined with the enemy’s weapon consti-
    tutes a risk to us. Is his weapon so powerful that a single hit can take us down?
    If not, how long would it take?
    Threat to Enemy: In the reverse of Risk to Agent, how much health does our
    enemy still have left? How does that compare to our weapon? Can we take him
    out in one hit? If not, how long would it take?

    These two attributes can then be combined into yet another attribute that rep-
resents an aggregate of the two of them.

    Total Threat: Given the risk that the enemy poses to us and the threat we pose
    to the enemy, who holds the upper hand? All other things being equal, who
    would die first in a firefight?

     By looking carefully and perhaps changing a few words, we start to recognize
similar patterns to problems we have addressed in previous chapters. For example,
in determining the “total threat” utility, we included the risk that we are undertak-
ing and the threat we pose to the enemy (i.e., our reward). The respective utilities
of risk and reward are concepts that we dealt with in previous chapters. In Chapter
5, we pondered the risks and rewards of running out to grab the rocket launcher. In
Chapter 7, we addressed the risks and benefits of building a tower over our barracks
and whether we should grab the armor or boost our health first. In Chapter 8, we
put risk and reward directly opposed to each other to help us analyze the St.
Petersburg paradox.
     In this case, we have constructed the concept of “Total Threat” from two other
utilities—each of which was constructed from two other utilities, which, in turn, we
created as measurements of utility. Assuming that we have confidence in the
choices we have made in the process so far, we can express similar confidence that
Total Threat represents an accurate assessment of whether or not our agent is in a
position of power in the engagement.

When the Heart Rules the Mind
Total Threat is only part of the total decision, however. It is a very mechanical “my
gun is bigger than your gun” question. Not a lot of subjectivity is involved in the
analysis of whether or not the rate of damage I can deal is going to be enough to
counter the rate of damage he is dealing to me. The inputs are fairly concrete num-
bers yielding an equation that can be solved by some simple algebra. The other side
of the problem, however, involves some speculation and subjectivity.
204   Behavioral Mathematics for Game AI

           We started by combining the respective number of allies and enemies that were
      present, our proximity to our leader, and the proximity to a point of interest into a
      concept of “Situational Morale.” While the actual count of allies and enemies
      nearby is concrete, how those results factor into morale is not. How many allies is
      enough? Is the morale we gain from allies a linear function? Are we twice as secure
      if there are four allies as we were when there are only two? Is there decreasing mar-
      ginal utility? What is the threshold?
           Additionally, the conversion of the proximity distances into utility functions is
      not something that immediately leaps into the realm of the obvious. What does
      proximity mean? Within sight? Unobstructed sight? Within a certain radius? Is it a
      linear function of distance, or does it exhibit marginal utility as well? Which direc-
      tion? Does the utility start flat but then drop off when we get to a certain distance?
      Does the utility drop off quickly as soon as we move away a little and then slowly
      trickle off from there?
           There is another consideration in the complexity of interaction. What if the
      state of one utility affected another? For example, let’s say the distance from our
      leader inside which we can still feel comfortable is dependant on how many ene-
      mies there are. The more enemies, the closer we need to be to our leader to feel em-
      boldened. The examples of how these factors can be combined together are infinite.
      Certainly, many of those infinite answers would not make much sense—but that
      doesn’t make our problem much easier to address.
          Regardless, at some point, we could craft a method for combining those four
      factors into a notion of Situational Morale. As we noted on our diagram and in the
      descriptions, this morale as a whole could be colored somewhat by our anger state.
      If we are angry, the morale that is implied by the four factors could be thrown out
      the window—at least for the duration of our anger. (Snapping back to reality after
      acting angry can provide for tragic—or humorous—situations… “Oh heck… now
      what have I got myself into?”) By factoring in our anger level, we are potentially
      skewing our model of Situational Morale into one of “Perceived Morale.”

      The Final Connection
      We have rolled up our original nine utility factors into two factors: Total Threat
      and Perceived Morale. These two variables need to be glued together in the same
      manner that we have done others up to this point. What is the relationship between
      them? Anecdotally, we could say that if our morale is higher, we are more willing to
      stick around despite a less than preferable threat level. In fact, we could say that
      there is a direct relationship between the two. The higher our morale is, the better
      our tolerance for threat; the lower our moral is, the lower our tolerance.
                                                           Chapter 9 Relative Utility    205

          The goal, therefore, is to construct an equation that reflects this relationship. Is
      that equation linear? Exponential? Does it approach an asymptote at some point?
      What is that asymptote?
          It’s worth stressing (and believe me, we will stress it repeatedly over the next
      200+ pages) that the only thing we should have to consider at this point is the rela-
      tionship between the threat level and our perceived morale. All the other consider-
      ations that are present in our model have been included already. At any point in the
      process, we should be able to trust our underlying assumptions and decisions that
      we made based on them. If we find ourselves questioning the validity of either of
      these two factors—or, worse yet, making accommodations for them—then we
      need to return to the previous levels and address our misgivings there.
           If we have done our work properly to this point, and if we find a relationship
      between our last two factors, then the result of that last step would provide us with
      the decision utility that we need to determine whether we should attack, hide, flee,
      faint, or simply suffer a massive psychological trauma and collapse into a quivering
      pile of wimpiness. By using MAUT on our nine factors, we have reduced a relatively
      large state-space into a single decision.
           Yes, I know that we have provided no fun specifics yet. We’re all dying to see
      real numbers and formulas! Don’t worry. We’ll get there—in Ryan Seacrest’s pop-
      culture staple—“after the break.”


      As we saw in our musings about both the dining out and engagement decisions, there
      is a lot of undefined “wiggle room” in how we could determine the weights of and
      associations between the factors. Much of the problem is due to the very fact that
      we are trying to quantify ideas that are not necessarily quantifiable.
           The effect of this problem often becomes exposed when people are polled
      about something that should be common knowledge. Even when faced with the
      exact same situation, different people can come away with different views on what
      something actually is. This is not a surprise to any of us; we have all experienced it.
      Differences in perception, understanding, and opinion are the staple of crime
      dramas, romantic comedies, and reality shows alike. It isn’t that we all just come to
      different conclusions; often we are all working with a completely different set of
      perceived facts or conceived understanding about the situation. It is these differences
      in the input of data that start to skew the subsequent processing of that data for pur-
      poses of making decisions. In short, we are using different measuring sticks.
206    Behavioral Mathematics for Game AI

       Often, the way people perceive and process things is the major bottleneck to estab-
       lishing a sorting of how we would prefer things. One of these problems is the actual
       perception of a stimulus. While this may seem obvious, it is an important facet of the
       process to cover. At the very core, a stimulus must be perceived to be included in
       consideration. This is somewhat different than the simple fact of whether or not the
       stimulus exists. Plenty of stimuli exist and yet are either below a threshold of de-
       tectability (e.g., too quiet to hear) or outside the sphere of awareness (e.g., happening
       behind or around the corner from the intended observer). In those cases there is a
       legitimate argument that, despite the very real occurrence of the trigger event, the
       agent was simply not able to be aware of it. If the agent was not aware of it, he
       couldn’t use it as part of the decision-making process.
            The science of psychophysics is an area of study that deals with a much more
       subtle and even scientifically intriguing area. It deals with the way that we psycho-
       logically process differences in stimuli. For example, if someone were to hand you
       two objects of similar weight—one in each hand, could you tell if they were differ-
       ent weights or the same? Could you tell how much different the weights were. If one
       weight was five pounds and the other one was six, would you be able to tell that the
       difference was the same as if they weighed four and five pounds respectively? In
       both cases, the difference is one pound, yet they may feel significantly different.
       How much different would the weights have to be before you were able to accu-
       rately determine which of the two was heavier?
            If we are modeling the sense of hearing for an agent, we would likely want to
       take into account what level of sound would be perceptible to that agent. We also
       may want to take into account how loud that sound is compared to other sounds
       in the area—that is, the relative sound level. It is entirely possible that a loud sound
       could go unnoticed when it is heard in concert (so to speak) with other loud
       sounds. Also, could the sound be mistaken for something else? My wife swears that
       there is a stairwell door in the hospital at which she works that squeaks in a manner
       that sounds remarkably (and eerily) like our cat that died in 2005. It is not a case of
       feline reincarnation as a fire door; it is a matter of flawed perception.
            Likewise, different people can perceive visual stimuli in dramatically different
       ways. My dad is red-green color blind, for example. If he isn’t specifically looking
       for it, he will not see a red traffic light in a background of a green tree. (Or a green
       light in a background of a red tree, I suppose.) If he were to visit a city that flipped
       their traffic lights upside down, he would really be in trouble!
                                                           Chapter 9 Relative Utility    207

       Difficulties also arise when you are trying to categorize things. Often, the definition
       of a category is entirely subjective. In our color example, people may have different
       understandings of what a color “should be.” My ex-wife used to ridicule me for
       claiming that one of the four colors of tiles in our Rummikub set was orange. She
       asserted that it was yellow. In fact, neither of us was “right.” Orange and yellow are
       examples of invented classifications that we humans have devised for the sake of
       simplicity. By being able to say “orange” or “yellow” we spare ourselves the headache
       of trying to communicate color through an expression of relative wavelengths of
       light. (On the other hand, there is a growing portion of the computer-graphic-
       savvy population that can speak quite fluently in red-green-blue [RGB] triads.)
            My ex and I were looking at the same color wavelengths. For the sake of argu-
       ment, let’s assume that our eyes worked well enough to perceive those wavelengths
       properly. The problem arose because we likely had differing opinions of where
       yellow ended and orange began (or vice versa). In fact, the color was somewhere in
       between the commonly accepted definitions of those two colors. However, not
       having a box of Crayola crayons handy to help us negotiate a compromise such as
       “sunglow,” “dandelion,” “goldenrod,” or even simply “yellow orange,” neither one
       of us was willing to back down on our assertion about the color of the tile.
            (Full disclosure: Wikipedia lists the color as yellow, but the instructions on the
       official Rummikub site list it as orange. I feel better about myself now. Kinda
       vindicated, ya know?)
           This concept is surprisingly important to modeling not only perception systems
       in games, but decision systems as well. Let’s look back over some of the classifica-
       tions we have to make for our engagement decision: What is “low health?” How
       many enemies need to be present before we begin to get nervous about being “out-
       numbered?” What is the actual linear distance away from our leader that no longer
       constitutes “close enough to feel safe?” These are examples of decisions we would
       need to make if we wanted to measure and compare attributes in a meaningful way.

       The issue that generates the most inconsistency and subjectivity when people address
       attributes is based more in psychology than perception, however. Even when obser-
       vation and measurement can be exact, therefore taking the vagaries of perception
       out of the loop, there are other issues in play. Extensive studies have shown that not
       only do different people put differing values on things, but the same person may put
       differing values on things based on their understanding of the situation. By stating
       the situation another way or in terms of another scale of measurement, the answers
       that people give may change significantly. The facts don’t change, but the meaning
       people ascribe to them may differ significantly.
208   Behavioral Mathematics for Game AI

      When Is Dying not Dying?
      One of the more famous examples showing how people’s decisions can be greatly
      affected by their perceptions was offered up by Daniel Kahneman and Amos Tversky.
      Incidentally, despite being a psychologist, Kahneman won the Nobel Prize in
      economics in 2002. (Tversky had died a few years prior to that; otherwise he would
      have been the co-recipient.) The point is that there is an increasingly significant and
      accepted link between psychology and the realm of economics.
          In their experiment, they presented the following choice to one group of people:

          Imagine that the U.S. is preparing for the outbreak of an unusual Asian
          disease, which is expected to kill 600 people. Two alternative programs to
          combat the disease have been proposed. Assume that the exact scientific
          estimates of the consequences of the programs are as follows:
              If program A is adopted, 200 people will be saved.
              If program B is adopted, there is a 1⁄3 probability that 600 people
              will be saved and a 2⁄3 probability that no people will be saved.
          Which of the two programs would you favor?

          Additionally, they presented the same scenario to a second group of
          people but with different choices:
              If program C is adopted, 400 people will die.
              If program D is adopted, there is a 1⁄3 probability that nobody
              will die, and a 2⁄3 probability that 600 people will die.
          Which of the two programs would you favor?

          It may take a bit of looking at the question, and maybe a pencil and paper, but
      closer examination shows that the choice between A and B is the same as the choice
      between C and D. If we were to show the various choices in terms of expected
      deaths, without regard to wording, we are able to cut through the clutter somewhat.
                                                      Chapter 9 Relative Utility     209

    In the first group, most of the people chose A—that is, to save one-third of the
people (200) for certain rather than roll the proverbial dice and gamble about sav-
ing everyone yet risk losing everyone in the process. On the other hand, the second
group chose program D—apparently thinking that the odds of saving everyone
were preferable to heartlessly letting two-thirds of them (400) die. The contradic-
tion is clearly irrational, given that there is no mathematical or statistical difference
between the two courses of action: A/C vs. B/D (Figure 9.2). The only difference
was in the perception of the situation based on the carefully chosen language of the
questions. The words we used played upon people’s feelings—the horror of losing
400 people when they could have been saved.

  FIGURE 9.2 In Kahneman and Tversky’s outbreak problem, people overwhelmingly
selected different choices based entirely on the wording of the question despite the fact
that the meaning of the words was exactly the same. (The selected answers are circled.)

    If flawed (or at least skewed) judgments like these were used in constructing
mathematical ratings for Bentham’s hedonic calculus, the results of any given
calculation would be all over the map. The premise that he had begun with, that his
method would cut through all the complexity of making multivariate decisions, was
hamstrung by the very fact that most of the variables are next to impossible to
     Put another way, it isn’t the solving of the equation that is the problem; it is the
determination of what values to use when we are solving the equation that is the
most important part. When approaching the task of constructing behaviors, many
people stop at the point of coming up with an algorithm without considering the
fact that the numbers they are feeding into the algorithm may have been subjectively
skewed—even slightly—in such a way as to not yield the outcome they would like.
210    Behavioral Mathematics for Game AI

       Furthermore, they may look to place the blame on the algorithm itself without
       analyzing and testing the validity of the components. The point is that both the con-
       struction of the algorithm and the values and weights that are used in the algorithm
       are of equal importance. Later on, we will address both of these components in
       more detail.


       Another interesting issue to consider is the possibility that there might be times when
       it is actually preferable to select a less attractive option. I use the preceding sentence
       with the disclaimer that both the terms preferable and less attractive are entirely sub-
       jective. As in the deadly outbreak example, when we analyze what is truly happening
       at a mathematical level, a statement that initially looks like a contradiction may
       actually become clearer.
            This is different from the outbreak example above, where the options were the
       same. The perception of the outcomes was what was flawed. In this case, we face a
       choice where, on the surface, one option is mathematically superior to another—
       in the short term. Once we place the choice into the entire framework, however, it
       is exposed as actually being less preferable. This issue can be even more apparent
       when placed in a continuum. For instance, as the situation changes, it would seem
       that one choice would become more advantageous. Yet surprising results can occur.

       Interestingly, it seems that we humans often confuse ourselves by making situations
       overly complex. Animals often seem to “get it.” One example that showed how an-
       imals can grasp the sometimes elusive complexity involved in hedonic calculus was
       an experiment conducted by Raymond Battalio and John Kagel. They showed that
       laboratory rats were perfectly capable of analyzing the relative merit of controlled
       supplies of pleasure and discomfort and making the correct decisions with that
            The experiment involved placing rats in a box equipped with the equivalent of
       two vending machines. Each machine would dispense a measured amount of liquid
       for the rats to drink for each press of the lever. They also changed how many lever
       presses were available to the rats in each session in the box. With a moment of re-
       flection, you can see how the variables in play correspond to familiar terms in the
       world of economics. By capping the number of lever presses available to the rats,
       they were setting the income level, that is, the most the rats had available to spend
       in any one period. By adjusting the amounts of liquid that the respective lever
                                                     Chapter 9 Relative Utility    211

presses dispensed, they were setting prices for the liquid, that is, how much bang the
rats were getting for their buck.
   After some time learning how the levers worked, the rats had a pretty good
handle on how much liquid was dispensed by each one. Invariably, they would
head for the one that gave them the most liquid. This doesn’t seem important in
and of itself. In fact, even without a lever-press limit, we can all see that it would
make sense to patronize the establishment that was dishing out the most liquid.

Root Beer for Rats
The next change, however, gave the rats something a bit more complicated to pon-
der in their quest for quenching their thirst. The experimenters modified the box so
that one lever produced root beer—apparently a favorite beverage of lab rats—and
the other one dished out water flavored with quinine. Since rats do not seem to
appreciate the bitter taste of tonic water, it made for an appropriate counterpoint
to the tasty root beer. All other things being equal, a rat would choose root beer
over quinine. However, not all other things were equal!
    The dispensers were set so that the amount of root beer provided was signifi-
cantly less than the amount of quinine. Again, without a limit on lever presses, this
decision is only slightly more complex. If you are a rat willing to work for it, the
root beer is still available to you… you just have to press more often to get it.
However, the researchers did set a maximum number of times that you could press
the two levers during a single session. (Note that this was total lever presses—each
lever counted against the single total.) With that limit in place (your fixed income),
however, you now have to decide how to spend wisely.
     Enter the final criteria… you are a rat who gets thirsty—more thirsty than you
can satisfy by simply selecting the small doses of root beer. You are going to have to
satisfy your thirst with some combination of quinine and root beer. And that is what
gives us our decision to make. How much quinine should we put up with when we
really would prefer root beer?
     Over time, the rats would work out a balance that satisfied both their individ-
ual levels of thirst and their preferences for root beer relative to quinine. For each
rat, there would be a slightly different level. In the end, however, equilibrium would
be established. They learned that they would have to quench much of their thirst
with the quinine and only occasionally dip into the root beer for a marvelous taste
     Again, that the rats established an equilibrium that balanced their desires is not
a surprise. The math is similar to that of purchasing the warranty, selecting work-
ers based on productivity, and deciding which goal is the most important when you
consider how long it takes to get there. The calculation is intuitive for the most part.
212   Behavioral Mathematics for Game AI

      If I were to ask, though, what you would do (or what the rats would do if you want
      to stay out of it) if the researchers were to cause the price of the quinine to go up—
      that is, decrease the amount of quinine per lever press—what would your answer be?

      Too Much of a Good Thing
      Analyzing it at a purely surface level, our initial instinct may be that we would have
      to drink less quinine per session… perhaps even increasing our intake of root beer.
      (And what a tragedy that would be!) However, this is not what the rats did. When
      the price of the quinine was increased (less per press), the rats actually increased
      their intake of the bitter liquid by selecting the quinine lever more often. Of course,
      that also meant that they drank less root beer as a result. But why would the rats use
      more of something after the price went up?
          The answer to that question is what lies at the heart of a concept called “Giffen
      goods.” In his 1890 book, Principles of Economics, Alfred Marshall named the
      concept after the 19th century British statistician and economist Robert Giffen, to
      whom he attributed the original idea. A Giffen good is an inferior product or com-
      modity whose usage defies the principles of demand in that its usage increases as
      the price goes up and goes down as the price lowers. There are only a few examples
      of Giffen goods, yet studies of microeconomic mathematics show how this phe-
      nomenon certainly can exist. In fact, the example used by Marshall of staple food-
      stuffs is remarkably similar to that exhibited by our thirsty rodents.
           To find where the core principle kicks in, let’s break down the issue mathemat-
      ically. With all the talk of the prices of the various drinks and the amounts that are
      dispensed for each press, it is easy to get caught up in just those two issues. After all,
      it was the “price” of the quinine that changed, so the answer must lie close to that
      particular facet. There are two other important factors to remember, however.
      More importantly, they seem to be the factors that get forgotten first:

           1. The rats have a fixed level of need (i.e., the amount of thirst per session).
           2. The rats are on a fixed income (i.e., the number of lever presses per session).

           When the price of the quinine was raised, neither of these two factors changed.
      Our parched pests still needed to drink something to satisfy that need. Also, they
      knew that they only had a limited number of presses available to them regardless of
      what they chose to drink or how much of each drink they were granted. It is these
      fixed values that sets the boundaries for the possible decisions made when the vari-
      able values change. What the rats realized was that to slake their thirst for the dura-
      tion of the session, they were going to have to spend more of their (hard-earned?)
      lever presses on the quinine than they used to get the same amount of liquid.
      Therefore, if they were using more lever presses to get the quinine, they had fewer
      lever presses left to “splurge” on the luxury of root beer.
                                                      Chapter 9 Relative Utility   213

Doing the Rat’s Math
Put into a utility formula similar to what we have used earlier, we can express it as
follows. First, let’s define our terms and the initial settings of the levers:

    Pr      Number of root beer lever presses
    Pq      Number of quinine lever presses
    Pqr     Number of lever presses per session
    Ar      Amount of root beer per press
    Aq      Amount of quinine per press
    Aqr     Amount of quinine and root beer to be consumed per session (thirst)

    We also need a few formulas. Let’s start small. We can glue them all together later.
    The amount of liquid consumed is the number of presses for each multiplied
by how much liquid is dispensed for each:

     The number of lever presses for root beer and quinine respectively add up
to the total number of lever presses:

    We need to put everything in terms of quinine. Therefore, we are trying to
eliminate references to root beer in our equations. Obviously, the number of
presses for root beer will be the maximum number of lever presses minus how
many are used for quinine. To express this, we just need to flip the terms around:

    Substituting into the original formula, we get:

    Jumping through the hoops to solve for the number of presses of the quinine
lever () gives us:
214   Behavioral Mathematics for Game AI

          In its current form, that looks a little cryptic. Let’s plug in some numbers. Note
      that in all of our examples, the only number that we will change is the amount of
      quinine that is bestowed per lever press (Aq).

          Pqr    Number of lever presses per session         25
          Ar     Amount of root beer per press               1
          Aq     Amount of quinine per press                 100
          Aqr    Amount of liquid to be consumed             200

          Placed into the formula above, we arrive at:

          This means that we only need 1.8 presses of the quinine lever to give us all the
      liquid we would need for our thirst. We can spend the other 23 presses splurging on
      root beer. At a rate of 100 per press, quinine is incredibly cheap compared to root
      beer. We could buy a lot of it, but we don’t have to. With just under two presses, we
      have satisfied everything we need from the less-than-pleasant quinine. When it
      comes to root beer, we rats can go hog wild, so to speak.
           As we reduce Aq dramatically (to 36, for example), we start to see a significant
      difference (Figure 9.3). Now the quinine is getting more expensive. (Nothing about
      the root beer has changed.) However, we now have to spend more of our presses on
      acquiring quinine. When Aq = 36, Pq climbs to five. That still leaves 20 presses for
      root beer. It’s not really time to panic.
          Continuing the trend of raising the price of quinine to the point where we are
      only getting 10 per press, we find that we now need to press the quinine lever over
      19 times to get what we need to survive. Despite the fact that the relative value of
      root beer to quinine has gotten significantly better, we can actually afford less of it.
      We are spending too many presses just getting enough liquid to survive. We can
      only afford to tap into the root beer keg five times.
          When Aq reaches 8, it is obvious that root beer is now completely out of the
      picture. Our 25 presses at 8 each will just get us enough to satisfy our thirst. Root
      beer is tantalizingly similar in cost to the annoying tonic water—at least compared
      to what it used to be—and yet we can’t bother ourselves with it.
                                                     Chapter 9 Relative Utility     215

     FIGURE 9.3 Because the number of lever presses per session is capped at 25,
        as the amount of quinine dispensed per press goes down, the number of
            quinine level presses must increase for the rat to quench its thirst.
                  The rat can no longer “splurge” on the tasty root beer.

Bentham vs. the Rats
There are a few lessons to be learned here. One is that utility levels aren’t always
obvious and aren’t always simple. It would be difficult for Bentham to have ascribed
values to the root beer and the quinine that would have, without other considera-
tions, allowed us to rank them appropriately in his hedonic calculus. Let’s look again
at Bentham’s seven criteria:

     1. Intensity: In this case, the root beer is far more pleasurable than the quinine.
        This much is simple.
     2. Duration: This is similar to how much liquid is being dispensed per lever
     3. Certainty or Uncertainty: The rats learned over time that the liquid was a
        certain outcome of a lever press, rendering this issue irrelevant.
     4. Propinquity or Remoteness (Time Distance): The dispensing was immediate
     5. Fecundity: Irrelevant
     6. Purity: Irrelevant
     7. Extent: Irrelevant
216   Behavioral Mathematics for Game AI

          Looking at the list, we would think that the only things that mattered were the
      intensity and duration (amount) of the pleasure. However, the one thing that was
      missing in Bentham’s list was the notion of context. The decision in question did not
      happen in a vacuum. It was not entirely stand-alone. Some ramifications were not
      covered by the seven parameters.
          The second lesson is that utility values don’t always ebb and flow the way we
      would expect them to. Or at least not in a way that is as simple and clear-cut as
      we would like. Everything that we have internalized about the laws of supply and
      demand tells us that as a price goes up, we should consume less of it. However, that
      outlook is simplistic in the sense that it doesn’t take into account all of the other
      laws that are in effect at the time… notably that we were working between the
      bookends of finite quantities and finite demands.
            The third lesson is that lab rats are a lot smarter than we give them credit for.

       IN   THE   G AME   Wizardry and Wands

      The reason the rats even had a decision to make in the first place was that they were
      on a budget. They had a limited number of level presses available to them. We can
      think of that limit in terms of “income.” Additionally, they had a fixed amount of
      thirst. They had a set amount of liquid (of any type) that they wanted to consume.
      This is analogous to “expenses.” Given the juxtaposition of those two limits, we
      have a very simple income vs. expense comparison. That’s what Giffen goods were
      originally designed to analyze.
           The question of income vs. expense is no stranger in the game world. Obvious
      examples come from the strategy game world, where purchasing decisions are com-
      mon. Examples abound in which we must decide how to distribute a fixed amount
      of a resource between two or more options with differing costs. However, there are
      other game choices that involve income and expense transactions that aren’t quite
      as obvious simply because they don’t use the familiar “buy and spend” terminology.
           Imagine we are a spell-casting character in a role-playing game (RPG). We have
      the ability to cast a fireball spell but with only moderate results—dealing a paltry
      five points of damage. However, we are packing a magic wand that casts a blizzard
      spell that does 60 points of damage. The drawback of the wand, however, is that it
      only has a limited number of charges. We are reluctant to use it unless it is absolutely
      necessary. With our own spell-casting, for all intents and purposes we can cast spells
      as many times as we want. Given the choice, we would prefer to use our fireball spell
      and save the wand’s blizzard for urgent situations. Regardless of whether we are
      casting a spell or using our wand, it takes us five seconds to perform either action.
                                                      Chapter 9 Relative Utility    217

    We encounter a massive foe that will take 260 points of damage to kill.
Additionally, we determine that this foe is so powerful that it is likely it will kill us
within about 60 seconds. Therefore, at five seconds per action, we only have 12 total
actions that we can perform before our time is up. With all that in mind, we decide
that this certainly qualifies as an “urgent situation” and pull out our trusty blizzard
   Before we proceed any further with this pending altercation, let’s lay out the
numbers that we have established so far:

    Our fireball spell:               5 points of damage
    Our blizzard wand:                60 points of damage
    Maximum total actions:            12 (i.e., 60 seconds at 5 seconds each)
    Damage we must deal:              260 points

    We can, of course, drop these into the same formula that our mathematical
rodents used to determine their choice of beverage:


    Cb       Casts of blizzard
    Cbf      Total casts of blizzard and fireball allowed
    Db       Damage from blizzard
    Df       Damage from fireball
    Dbf      Damage necessary to defeat the enemy

    This would yield the following:

    Remembering that we want to limit the number of uses of our blizzard wand,
our optimum course of action would be to use the wand four times, dealing 240
points of damage, and then cast four fireball spells to deal the remaining 20 points.
218   Behavioral Mathematics for Game AI

      (Note that 3.6 is the minimum, and since we can’t cast partial spells, we need to
      round up.) That only amounts to eight spell casts. However, there is no more effi-
      cient way of doing it that would save our precious wand charges. If we were to only
      use three blizzards, we would do 180 points. The remaining time allows for nine
      spell casts and, at five points each, we could only do another 45 points of damage—
      bringing the total to only 225. Therefore, the four and four distribution is best.

      Getting the Cold Shoulder
      Just before we begin the combat, however, our foe pulls out a ring of cold resistance.
      With our vast mental repository of magical lore, we know that the ring will provide
      a 20% resistance to our blizzard wand. Therefore, each usage of the wand will only
      do 48 points of damage rather than 60. How do we proceed?
           Originally, the blizzard wand dealt 12 times the damage that our pathetic fire-
      ball did. Now, the ratio is less than 10 to 1. In a relative sense, our fireball is more
      powerful than it was before our enemy donned the ring of cold resistance. Likewise,
      the blizzard wand is now less effective. At first glance, at 48 points of damage, we are
      getting less bang for our buck (or gore for our gold, I suppose) than we were when
      we were doing 60 points of damage. If we were to base our decision only on how
      powerful our two attacks were relative to each other, we would be less inclined to use
      the blizzard wand charges and lean more on the usage of the fireball.
          The lesson of Giffen goods taught us differently, however. As with the rats and
      their root beer, when the cost of an undesirable selection goes up—in this case,
      using the precious wand charges—it sometimes is necessary to use it more often. To
      confirm this, we place the new numbers into our formula. The only one that has
      changed is the first number in the denominator—from 60 to 48.

          With the amount of blizzard damage per cast reduced to 48, we now have to
      use it five times instead of four. (The remainder of the damage can still be dealt by
      two fireballs.)
          Continuing down that trend (Figure 9.4), if the damage from the blizzard was
      reduced to 40, we would have to use it six times. At 30, we would need to cast it
      eight times. If the effectiveness was reduced to 25—only five times greater than our
      piddly little fireball spell—we would have to use it ten times! What went wrong?
                                                          Chapter 9 Relative Utility   219

           FIGURE 9.4 Because the number of attacks is limited to 12 and the amount
              of damage we need to do is fixed, as the damage dealt by the blizzard
                   wand decreases, the minimum number of usages increases.

          Originally, the factors that went into our decision seemed only to be how much
      damage we could do with each weapon and which one we preferred to use (our
      never-ending fireball spell). However, just as in the Monty Hall Problem, we must
      be careful to take into account all the parameters that affect the equation. In this
      case, the reason we can’t simply use the relative strengths of the two weapons as our
      guide is that other factors are in play. First, we have a limited number of actions we
      can take in the time allotted to us (12). Second, we have a minimum amount of
      damage we need to do (260) in that amount of time. Therefore, as the efficiency
      of our more powerful, yet less desirable, weapon goes down, the necessity of its
      usage goes up.

          The moral of our story is that like the rats and their root beer, our agents need
      to be able to identify situations where a change in one factor may require a corre-
      sponding change in action that seems contradictory to what we would expect. The
      problem often results from not taking every factor into account.
          Another stumbling block is an assumption that some things are equal when
      they are not. For the moral of that story, we turn to a story about morals.

      In the outbreak problem, the decisions were exactly the same—people simply didn’t
      realize it because their perceptions were colored by the emotionality conveyed by
220   Behavioral Mathematics for Game AI

      the ideas of saving people and letting them die. Because of this, the issue isn’t as
      relevant in the realm of hedonic calculus. People who would choose group A over B
      but not C over D would do so not because A was more moral than C—they would
      do it because they perceived that there was a moral difference when there was not.
      There was a mathematical equivalency there that they did not understand. In plenty
      of situations, however, a perceived mathematical equivalency can actually resonate
      deeply on a moral level, and this is the sort of quandary that Bentham was after
      when he constructed his hedonic calculus.
           There is a famous series of “moral dilemma” problems that have various ver-
      sions and spins. All of them center on a runaway trolley and are, with another nod
      to Sir Occam, starkly simple. Taken as individual choices, they make for interesting
      discussion. The real intrigue comes when they are taken as a series, however. That
      way, they are not being judged as the simple A vs. B questions that they are. Instead,
      we are forced to deal with why option A in one question is better or worse than
      option A in another one.

      When Is Killing not Killing?
      In the initial question—the one that sets up the series as a whole—we are given the
      following scenario:

          An unmanned, runaway trolley is heading down a track. In its path are five
          people (who are either unaware of the approach of the trolley or, in a more
          dramatic version, tied to the track by some nefarious dude) who will be
          killed by the trolley if it reaches them. Thankfully, you are by a switch that
          will send the careening trolley onto a siding. On the siding, however, is a
          single person (also either blissfully unaware or tragically trussed) who will
          be killed if you were to send the trolley onto the siding. What should you do?

           It would seem, given the relative simplicity of the scenario that the answer is
      clear—you must throw the switch, saving the five people on the main track despite
      killing the one on the siding. Tossed up against Bentham’s criteria (particularly the
      seventh one), this makes sense. The extreme negativity of death is multiplied by the
      five people in the first choice and only by the one in the second. Five is greater than
      one. Throw the switch. Divert the trolley. Save five people. (Sorry buddy….) Done.
          Let’s put a spin on this issue (Figure 9.5). Imagine that we are now on a bridge
      over the track with a large weight available nearby. If we toss the weight onto the
      track, we can stop the trolley before it hits the five people. Once again, this much is
      simple: Toss the weight. Stop the trolley. Save the people. Done.
                                                      Chapter 9 Relative Utility        221

     But what if the trolley-stopping weight on the bridge was a rather obese man
who just happened to be minding his own business? Should we toss him off the
bridge to stop our horrible trolley from killing the five people? This is something
more problematic, as most of us would agree, although we can’t seem to put our
fingers on exactly why. If you were to approach it mathematically, it would seem to
be similar to the original solution: One person dies so that five may live. That doesn’t
seem satisfying—or even comfortable to us, though. Bentham’s seven rules don’t
seem to address anything that would make this scenario different from the original
siding-based one, either. So what is the issue?

     FIGURE 9.5 In its various forms, the Trolley Moral Dilemmas provide different
       insights into not only relative tragedy but the morality of active vs. passive
           roles in a tragedy. For example, allowing one person to die to save
                others is different from killing one person to save others.

      Looking back at the two examples, there is a difference between the original
man on the siding and the one on the bridge. In the latter, we are actually using the
man (presumably against his will) to stop the trolley. He would not be involved at
all if it weren’t for us bringing him into the problem through our decision. In the
former, we are not using the individual to save the five people on the main line—
we are using the siding to save them. The hapless human on the siding just happens
to be on the siding. It’s not our fault. The five people on the main line would have
been saved by utilizing the siding whether the single person was present or not. The
difference, therefore, is the fact that we chose specifically to cause the death of the
man on the bridge as a tool rather than as a side effect of an action.
222   Behavioral Mathematics for Game AI

          Of course, the case could be made that, if the track is currently set to run on the
      main line toward the five victims, by doing nothing at all, we are not really killing
      them—the trolley and track configuration is doing that. However, if we were to
      throw the switch to the siding, we would be selecting to kill the one person on it. He
      would not have died if we had not been involved, whereas the five people would
      have died regardless of whether we were present or not. The deaths of the five peo-
      ple are not our responsibility; the death of the one person would be.
           Despite being constructed for the purpose of determining dicey propositions
      such as those we refer to as “the lesser of two evils,” this form of instinctive moral-
      ity does not seem to get covered in Bentham’s calculus. Somehow, we would need
      to quantify the relative tragedies of their deaths.

      When Is a Victim More of a Victim?
      Absent from the above problems is any differentiation between the people. They are
      all just “victims” in our example. While this makes for simpler calculations—five
      deaths is greater that one, for instance—it does seem rather impersonal. We don’t
      know anything about these people whom we are using as markers in our morbid
      decision equation. Are there factors that we have not considered that may make our
      simple calculation more complicated? Consider the following:

          The five people are all in a van that became stuck on the tracks while transporting
          them to a critical care facility for treatment of short-term, terminal illnesses.
          Are they less valuable than the person on the siding?
          The single person on the siding is a nine-year-old child in perfect health. The
          five people on the main line are in their 50s and 60s. Do you sacrifice the one
          child to save the five older people?
          The five people are all criminals who have committed heinous crime for which
          they are serving life sentences. The person on the siding is an average person.
          Are the five felons less valuable than one free man?
          The one person on the siding is a wealthy, philanthropic benefactor who has
          dedicated his life and fortune to helping the poor, sick, and unfortunate. Do
          you sacrifice the saint to save the five normal people?
          The one person on the siding is a doctor who is the only person who can stop
          an outbreak of a deadly virus. Do we sacrifice five people so that the doctor can
          potentially save hundreds?
          The five people on the main line already have a deadly virus. There is a two-
          thirds chance that they may all die anyway—but a one in three chance that they
          may all survive. Do we consider the likelihood that they will die and save the
          one person for sure?
                                                      Chapter 9 Relative Utility    223

    The five people are regular folks, but the obese person on the bridge is a con-
    victed murderer? Do you now feel better about using him as a trolley-stopper?
    The person on the siding is a prodigious and talented artist—Mozart, for example.
    Do you now feel worse about silencing him forever?

    By personalizing the potential victims somewhat, we have made the decision
more complicated. We are now including factors that aren’t as simple as raw math-
ematics (e.g., five is greater than one) and further distorting the already confusing
issues of morality (e.g., a intentional victim vs. an unfortunate bystander). Yet, to
make a decision, these are factors that we must consider, and to consider these fac-
tors, we need to quantify them in some way.
     Bentham’s hedonic calculus can provide some support in this effort. However,
it can only go so far as to identify that we should consider certain factors. It doesn’t
offer guidelines as to the exact weight that we should put on those factors. For
example, citing the “duration” rule, Bentham may have suggested that the life of a
healthy young person is more valuable than the lives of five people who are likely
to die soon. One method of scoring this factor would be to count the years. If the
one person likely has 50 more years of life left and the five people likely have only
one year each, then we would be more inclined to save the young person. Although
saving five lives is greater than saving one, saving 50 years of life is greater than
saving only five. Of course, there is the possibility that the young person may not
live for 50 years. But that is how actuaries make their money.
     And what about the lives that would be touched indirectly? Assume that each
person has the same number of family members who would grieve their loss.
Assume as well that grief lasts for one year. If we let the one person die, a small
circle of people will grieve. By letting five people die, we would be causing great
emotional trauma to five times as many people for a year each.
     As we have discussed numerous times in the past few chapters, there are factors
that aren’t quite as comparable. How much pleasure would the artist bring to how
many people? If the artist (such as Mozart) has the capacity to “touch” millions of
lives for hundreds of years, is that worth sacrificing five otherwise nondescript
people? Certainly, purveyors of eugenics would claim that the artist’s long-term
benefit to society outweighs the loss of the five “normal” people that the decision
would incur. Of course, extreme genius and talent often come packaged together
with mental illness. Does that change our position on saving the artist?
     As we have suggested numerous times throughout this book, deciding how to
assign utilities to all of these disparate factors is more art than science. More specif-
ically, it is an art that is often driven by the game scenario that we are trying to fill
out. Some aspects of decision making are more relevant in certain game genres or
224   Behavioral Mathematics for Game AI

      game situations than in others. In a first-person shooter (FPS) game, for example,
      we need to consider a different constellation of factors than we would be concerned
      with in a real-time strategy (RTS). Similarly, there will be different considerations
      in an RPG than there would be in a sports game. For purposes of illustration, how-
      ever, we can conjure up an example that is similar to our moral dilemmas above.

       IN   THE   G AME   Hippocratic Morals

      Imagine that we are a medic for a platoon of soldiers. We have a number of seri-
      ously wounded squad-mates. Three of the injured are normal combatants. One of
      them, however, is the only demolitions expert in the group. Given the arrangement
      of the situation and the relative severity of the wounds, we must choose between
      saving the lives of the three normal soldiers or the life of the demolitions expert.
      We are also aware that the demolitions expert has the ability to destroy the enemy
      forces that are massing for a final assault that will assuredly wipe out the entire
      platoon. As the medic, who should we tend to?
          At first, the exercise seems strictly mathematical: Three people are more impor-
      tant than one. However, we are also aware that there is another layer involved. We
      must consider that by saving the demolitions expert, we are actually doing a greater
      good (by preventing an inevitable greater loss). The unfortunate loss of the other
      three soldiers is a by-product. We are not choosing to kill them; we are choosing to
      save many people over and above the four injured soldiers that are in our immedi-
      ate care. By realizing that extra layer of abstraction in the problem, we avoid the
      apparent contradiction that saving one person is more important than saving three
           To arrive at the solution to this problem, we would have to assign utility to the
      people. For example, we can assign that value based on the number of other lives
      they would save. This could either be direct, such as by healing another medic so he
      can assist others, or it could be indirect, such as in the scenario above where demo
      dude is going to prevent the enemy from killing more of our own side than are in
      jeopardy at present. Of course, there are other considerations. The three soldiers
      may have the ability to do some damage as well. If we were to heal them, they could
      start reaping vengeance upon our enemies… but how much vengeance? How does
      it compare to what the demo man can do?
           This example still comes down to a question of “how many?” Instead of stop-
      ping merely at “how many lives can I save right now?” it becomes “how many lives
      will be saved eventually?”
                                                           Chapter 9 Relative Utility    225

           Other scenarios may have more abstract considerations. Regardless of the form
       they take, however, if we can convert them to a utility value, we can measure and
       compare them. Through those comparisons, we can determine which course of
       action is the most preferable. On a simpler level, they may seem contradictory. When
       we put the considerations into the entire framework of the scenario, however, we
       can determine that an apparent moral tragedy is actually the lesser of two evils.


       In the end, the benefit of utility theory depends on not just ascribing utility to an
       object or an action, but framing it in a manner that we can compare to the utilities
       of all other objects or actions. We may find that this process is clear and simple, or
       we may find that we must apply a liberal dose of situational subjectivity.
           Once we have determined those utilities (regardless of whether they are static
       or dynamic), we can proceed with the delicate and often equally subjective task of
       applying them in some meaningful manner. Using hedonic calculus as a model, we
       can determine a way of combining the factors through MAUT practices so that
       each receives a proper weight in the final decision.
            However, as we have shown, caveats abound. Human perception and under-
       standing does not lend itself well to firm measurement, categorization, and inter-
       pretation. Messy things such as the emotional content of words such as save, live,
       die, and kill can place a smoky, distorted filter over comparisons that should yield
       equality. Worse still, things that should be obvious inequalities in one direction can
       tip in the opposite direction by simply putting them into a larger context.
            Despite all the vagaries, relative utility is the foundational building block from
       which all of decision theory is constructed. It is an extraordinarily powerful and
       flexible building block as well. For much of the remainder of this book we will
       focus on what you can build with this magic toy.
This page intentionally left blank

III             Mathematical Modeling

             hen constructing decisions for game artificial intelligence (AI), it is im-
             portant to tap into various types of statistical modeling and probability
             math. Part III explains some of the basic mathematical constructs that
  we will use throughout the remainder of the book.
     Chapter 10, “Mathematical Functions,” shows examples of some of the com-
  mon mathematical functions that we can use in the modeling of behaviors.
      Chapter 11, “Probability Distributions,” explains the terminology of distrib-
  utions of data, shows how we can model different distributions of populations,
  and gives examples of their usage.
      Chapter 12, “Response Curves,” shows we can construct data structures that
  allow for simple creation, manipulation, and retrieval of activation functions and
  probability distributions.
      Chapter 13, “Factor Weighting,” explains considerations and methods in the
  process of codifying, scaling, and combining utility values.

This page intentionally left blank
10                  Mathematical Functions

          t’s no secret that mathematical functions are integral (Oooh… a calculus pun!)
          to game development as a whole. This is especially true in the subspecialty
          of game artificial intelligence (AI). In Part II, we explored the heady world of
      decision theory. Much of the discussion, however, was theoretical rather than prac-
      tical. Even as we delved into the mathematics of utility, much of what we explored
      was relatively straightforward. If we are to convert that theory into practice, we
      need to explore the language necessary for expressing those theories.
           A good example is that of marginal utility. In Chapter 8, we discussed how util-
      ity can change over a range such as time or quantity. Simply knowing that it can
      change—or even which direction it is changing—is not enough. We need to define
      what that rate of change is. We may need to even define the rate of change of the
      change! There are numerous ways of accomplishing this task, the selection of which
      depends on exactly the effect that we desire.
           As we have repeated throughout this book so far, the best approach is for us to
      lay out the tools and make note of some convenient examples of their usage. In that
      pursuit, this chapter will consist of a reference guide to these basic tools.


      The most basic function we will use is the linear function—named as such due to
      the result forming a straight line. We usually see this early algebraic staple defined
      in the slope-intercept form,

      where m is the slope of the line and b is the y-intercept (the point where the line
      crosses the y-axis).

230   Behavioral Mathematics for Game AI

          We can also use a version that is solved in terms of x instead:

          In this case, c is the x-intercept rather than the y-intercept.
           There are a number of other variations on how to express a line. However,
      many of them are slightly more difficult to work with from a programming stand-
      point in that they have more than one term on the left side of the equal sign. For
      example, rather than a slope, we could substitute in values that result in the slope
      of the line:

          Note that in the above equation x2 cannot be equal to x1 lest we encounter a
      “division by zero” error and cause a catastrophic and untimely end to our faux
           There is really only one simple rule to remember about the linear function. If
      m < 0, the line is descending. If m > 0, the line is ascending. (For that matter,
      if m = 0, the line is horizontal.) Because the slope is, for all intents and purposes, the
      only defining characteristic of the linear function, m is the only value that changes
      the effect of the function. Changing the value of b merely moves the line around in
      space. We can see examples of these changes in Figure 10.1.

         FIGURE 10.1 The graph of three linear functions with different values for m and b.
                    Note that a negative value of m creates a descending line.
                                                        Chapter 10 Mathematical Functions         231


           We can create another common and useful function in a similar fashion to the lin-
           ear function above. By inserting an exponent into the formula, we create one of the
           veritably plethoric varieties of quadratic functions. Rather than being a straight
           line, the quadratic family exhibits parabolic curves. The most simple of these equa-
           tions is the familiar

               This function creates a parabola centered on the y-axis whose vertex (the point
           where the parabola changes direction) crosses the origin. The exponent (k) can be
           any non-zero real number. For simplicity’s sake, however, let’s restrict ourselves to
           k = 2 over the next few examples. (A little further on, we will see why different
           values of k can complicate things.)

           Most of us are used to thinking of equations with exponents as exponential
           functions. That term, however, has a very specific meaning. Technically, functions
           with a squared component (or higher) are referred to as quadratic.

           Additionally, while quadratic specifically refers to equations whose highest
           exponent is 2, it gets a little clumsy to refer to cubic, quartic, and quintic func-
           tions for exponents of 3, 4, and 5, respectively. This will get even more confusing
           when we use non-integer exponents. Therefore, when I refer to a quadratic
           function, please take it to mean simply one with parabolic characteristics.

               We can perform numerous manipulations on this base equation to shape it to
           our liking and move it to our desired position in the graph space.

           First, we are likely going to get more use out of the line produced by the quadratic
           equation if we can position it in different locations. To move the line vertically, we
           must add a value to it in a way similar to what we did with the linear equation
           above. In fact, we will term this value b since it performs a similar function to the
           y-intercept value of a linear equation. This causes the resulting function to be

                Positive values of b move the curve up the graph; negative values move it down.
           It is important to note, however, that b is not necessarily going to be the point
232   Behavioral Mathematics for Game AI

      where the line crosses the y-axis. If we move the parabola horizontally, the same
      function will cross the y-axis in a different place. Therefore, it is better to think of
      b in terms of shifting the vertex of the parabola up or down by b. That is, if b = 5,
      then the vertex will be at y = 5 regardless of what the x value is.
           To shift the vertex (and the entire parabola) horizontally, we subtract the value
      from x before raising it to the exponent. In keeping with the terminology estab-
      lished above, we will term this value c (although it is not necessarily the x-intercept).
      The equation for this is

         If we set c = 5, the curve is shifted to the right by five units. Therefore, the vertex
      would be located at the point (5, 0).
          Combining the two adjustments into one equation, we would arrive at

          By using this formula, we can generate a simple parabola with a vertex at the
      point (c, b). We can see three examples of shifted parabolas in Figure 10.2.

          FIGURE 10.2 Three examples of quadratic functions of the form y = (x – c)2 + b.
                The value of b shifts the curve vertically, and c shifts it horizontally.
                          The vertex of the parabola is located at (c, b).
                                                       Chapter 10 Mathematical Functions           233

          By adding the value of x without an exponent, we can “tilt” the parabola one direc-
          tion or the other. For example, assuming x > 0, the following formula will tilt the
          parabola slightly to the left:

             By adding a coefficient to the first-degree component (the one without an ex-
          ponent), we determine the magnitude of the effect. For example, the formula

          generates a parabola that tilts significantly further to the left than does the one
          from the original equation above (Figure 10.3).

                     FIGURE 10.3 By adding a first-degree component to the otherwise
                  second-degree function, we can “tilt” the parabola. In this case, the addition
                           of 10x to the basic parabola tilted the curve to the left.

               Note that the addition of other components will also change the vertex loca-
          tions much like the inclusion of the b and c parameters. In the case of the formula
          in Figure 10.3, the new vertex is at (–5, –25). When x = –5, the original x2 portion
          of the equation still wants to be at y = 25. The new 10x portion yields a –50, how-
          ever, which pushes that side of the curve down. Similarly, positive values of x push
          the right side of the curve up by adding 10x to it—thus the “tilt” effect. Note that
          you can’t tilt the parabola such that you can have two y values for the same x value.
          (Parabolas wobble, but they don’t fall down!)
234    Behavioral Mathematics for Game AI

       As elegant as the above parabola is, it is unlikely that its shape will exactly fit our
       needs. We may want to make it wider or narrower, or skew it slightly one direction
       or the other. The formula can be modified a number of ways to affect the shape of
       a parabola. Many of them come with caveats, however, that we must be aware of.

       Increasing the Exponent
       The first major modification that we can make to a quadratic equation is to change
       the exponent (k). Doing so changes the rate at which the slope of the parabola
       changes. That is, it makes the parabola narrower or wider.
            One thing we must remember, however, is that odd exponents (e.g., 3, 5) will
       cause the portion of the parabola to the left of the vertex to curve downward (Figure
       10.4). The reason we note this as “to the left of the vertex” rather than “below 0”
       is that if we shift the parabola horizontally using the techniques above, the x loca-
       tion of vertex may be greater than or less than 0. The inflection change happens at
       the vertex… not at the y-axis. We can avoid this effect by taking the absolute value
       of the function. In that case, both arms of the parabola still extend in the same

             FIGURE 10.4 The quadratic function y = xk with the values of k as 2, 3, 4, and 5.
               As the exponent increases, the parabola narrows. Also, odd exponents cause
                      the portion of the curve below 0 to descend rather that ascend.
                        (Note that we can avoid the negative y effect by taking the
                                       absolute value of the function.)
                                           Chapter 10 Mathematical Functions        235

    This is not an issue if we are only concerned with the portion of the parabola
that is to the right of the vertex. In fact, by focusing only on that side of the
parabola, we can open up another bag of tricks with regard to the exponent.
    By dispensing with the possibility of x values less than 0, we are allowed to raise
x to fractional exponents (Figure 10.5). The result of doing this is the same as
increasing the exponent by whole numbers—the higher the exponent, the steeper
the resulting parabola. However, by allowing real numbers rather than integers, we
have much more flexibility in the exact shape of the curve.

FIGURE 10.5 The quadratic function y = xk with the values of k as 1.5, 2.0, 2.2, and 2.5.
       As before, as the exponent increases, the bowl of the parabola narrows.

Rotating the Curve
Another convenient manipulation is to use values of k that are between 0 and 1.
Remember that another way of writing the square root of x is x0.5. Therefore, by
raising x to k values between 0 and 1, we are actually taking a root of x. The result-
ing curve is a parabola whose axis of symmetry is parallel to the x-axis rather than
the y-axis (Figure 10.6).
    As with exponents greater than 1, we can change the shape of the parabola by
modifying the magnitude of the value of k. In this case, when we make the exponent
smaller, the parabola narrows. An easier way to remember this is for us to think in
terms of closer to or farther away from 1. Just as with k > 1, the farther away from 1
we move, the narrower the curve.
236    Behavioral Mathematics for Game AI

             FIGURE 10.6 The quadratic function, y = x0.5 (or the square root of x), yields
                 a parabolic curve with its primary axis horizontal rather than vertical.


       One of the more useful, and therefore oft-used, functions is the sigmoid function.
       Specifically, the name sigmoid refers to a function with an “s-curve.” Therefore,
       sigmoid functions are actually an entire family of curves. We will examine some of
       the more common ones below.

       The most common of these is the logistic function. It pops up in a variety of appli-
       cations ranging from biology, economics, probability, statistics, and (strikingly
       relevant to us) the mathematics of behavioral psychology.
           The formula for the logistic function is

            Because logistic functions are often used for population and time-based calcu-
       lations, the value P stands for population, and t stands for a specified point in time.
       Therefore, we would read the function above as “the population at time t equals … .”
       However, this is simply a naming convention.
                                         Chapter 10 Mathematical Functions      237

    Converting it to our standard notation, we could rewrite the formula as

    We can attribute the usefulness and popularity of the deceptively simple-looking
logistic function to the presence of that ever-so-mystical value e. The value e (also
known as Euler’s number) is the base of the natural logarithm. Like the constant pi,
the value of e extends off to unfathomable lengths. For our purposes, a reasonable
approximation is 2.718281828. In fact, we can usually get away with something less
precise such as 2.718.
    The tails of the logistic curve asymptotically approach 0 and 1 (Figure 10.7).
Given the rate at which they close, when we are using the value e, there really isn’t
much need to calculate the logistic curve outside the x range of [–10...10]. At those
points, the function is within 0.0001 of the respective bounds. We can even define
cutoffs at –6 and 6, respectively, and not lose much resolution. For example, at
x = 6, y = 0.9975.
    If we want, we can use other, non-negative numbers besides e for the base in this
function. The greater the number, the narrower the shape of the logistic function.
Regardless of the base that we use, however, the values still approach the asymptotes
0 and 1.

        FIGURE 10.7    The logistic function is the most commonly encountered
                      member of the family of sigmoid functions.
238   Behavioral Mathematics for Game AI

           One of the advantages of the vertical range of the logistic function being [0...1]
      is that it works very well as an expression of percentage. Despite being apparent on
      the graph in Figure 10.6, it’s worth noting that the function crosses the “50% mark”
      at x = 0. If we think in those terms, we can see how the curve would be advanta-
      geous in applications of psychology such as subjective utility functions.
          For instance, if we were to say that we were “50% satisfied” with something
      when it was “normal” (i.e., 0), we could then make the further observation that, as
      that thing improved (i.e., x > 0), we would be more satisfied with it. Eventually, if
      the improvement continued, we would approach a point where we were almost
      100% satisfied with the object or action. Likewise, as the quality of the object or
      action grew worse (i.e., x < 0), we would be less satisfied with it until such time as
      we had almost 0% satisfaction.
           In both cases, the further that the quality of that thing moved away from
      “normal,” the less dramatic the effect is. This is reminiscent of marginal utility…
      but in both directions away from the starting point. Unlike the root-based quadratic
      equations that continued on to infinity in both directions, the asymptotic nature of
      the logistic function provides the natural boundaries of 0% and 100%. After all, we
      can’t be less satisfied than “not at all” or more satisfied than “perfectly so.”
          Also, because the vertical range of the curve is neatly defined, we can flip the
      curve upside down quite easily by changing the formula to

          Using the above formula, we find that the curve now starts near 1.0 when x <
      0 and ends near 0.0 when x > 0.
          We can also shift the curve left or right in a fashion similar to the way we did
      with other functions above. Using the same notation as before (the variable c), our
      new function would look like this:

           Because the line extends infinitely in both directions, using the point where y
      crosses the 0.5 mark is our best landmark. In the unshifted curve, when x = 0, y =
      0.5. If we include a value for c, we know that y will cross the 0.5 mark where x = c.
      As we noted before, the differences in value get insignificant outside the range of
      about –6 to 6. Therefore, if we wanted the curve to start with x = 0, y = 0, we could
      set c = 6. Accordingly, y would be within the same margin of 1.0 at about x = 12.
                                                  Chapter 10 Mathematical Functions          239

       Close kin to the logistic function is the logit function (pronounced low-jit). We can
       think of a logit curve as a logistic curve rotated 90 degrees. Where the logistic curve
       approached y = 0 and y = 1, the s-curve defined by the logit function approaches
       the asymptotes of x = 0 and x = 1. The formula for the logit function is

           As with the logistic function above, we can define any base to the logarithm that
       we wanted, although our good friend e is typically used (Figure 10.8). By increas-
       ing the value of the logarithm base, we can “flatten” the s-curve horizontally.

              FIGURE 10.8 We can think of the logit function as a rotated version of the
              logistic function. The curve approaches but does not reach x = 0 and x = 1.
               Therefore, we must take care not to attempt to solve for those values of x.

           As with the logistic function, we can use the logit function in numerous appli-
       cations with regard to behavioral and psychological perceptions and reaction. For
       example, rather than resembling the notion of decreasing marginal utility like the
       logistic function, as we move from the “center” point (x = 0.5), the logit curve
       resembles increasing marginal utility.
           Because the value of y approaches infinity the closer x gets to 0 and 1, there
       truly are no upper and lower bounds. However, as you can see from Figure 10.8, the
240   Behavioral Mathematics for Game AI

      shift is so dramatic that the actual boundaries are based on the decimal precision
      that we use for x. If we are not using very precise values of x, those extreme values
      of y do not come into play. For example, using e as the root (as shown in Figure
      10.8), y < –5 only when x < 0.0057. Therefore, if we only concerned ourselves with
      two decimal places, we would never see values of y less than –5 or greater than 5.
           Because of this, one common shift we can perform is to move the curve up by
      five points. This means the effective range of the logit function becomes 0 to 10
      rather than –5 to 5. For completeness, the formula would be


      By no means are the above functions an exhaustive list. What they do provide is a
      number of core building blocks from which to work. By using the basics, tweaking
      them with coefficients, and even combining multiple types into the same equation,
      we can create many very distinct curves.
           As we will see over the next few chapters, fine-tuning a function to fit a need is a
      skill that is core to constructing proper behavioral mathematics. Often, constructing
      the right curve takes a lot of trial and error. I very much recommend using a tool
      such as Excel to construct formulas and examine the resulting graphs.
          When all else fails, there are ways of handcrafting data points that break com-
      pletely out of the formula-based approach. We shall cover those in Chapter 12.
11             Probability Distributions

        s we discovered in earlier chapters, there are many reasons for the gulf
        between the one “should be done” solution that normative decision theory
        provides and the myriad possibilities that are summarized by descriptive
 decision theory—that is, what people as a whole tend to do. The suggestions and
 guides that descriptive decision theory provides often are based on observation
 and collection of information. This data is necessarily an aggregate. As such, we have
 a picture of the population as a whole rather than any individual member of that
     The end result is that, while we know that people do different things, we don’t
 know why they do them. While we can ascertain to some degree some of the
 possible non-optimal solutions that people may provide to a particular decision,
 often the reasons why they do so get lost in the background noise of human indi-
      The other shortcoming of the descriptive decision theory approach is that, for
 obvious reasons, the game development world doesn’t have data on the behaviors
 we would like to model. We simply don’t have enough data on how often orcs elect
 to use their hand axes or the accuracy with which demons from hell can toss balls
 of flaming plasma. Even in cases where we do have data from which to work, it may
 not provide the sort of behavior mix that makes for a believable, engaging game
 character. Therefore, to model these behaviors, we must be able to assemble our
 own substitutes for the data that we believe is representative of the population as a

 Throughout this section, the term population may represent a group of data
 points rather than simply the typical definition of “a group of people.” In large
 part, these concepts can be interchanged. For example, population could signify
 many people each making one decision about X or one person making many
 successive decisions about X.

242   Behavioral Mathematics for Game AI


      To artificially model a population, we need to identify the key components of that
      population. We need to make some broad assumptions about the decision that the
      population faces and use those conjectures as both building blocks and constraints
      as appropriate. Much like setting out stakes and lines defines where we will build a
      house, identifying, measuring out, and fixing in position the important features of
      our population is a necessary step in the process of constructing an accurate model
      of a population. From that model, we can then make the assumption that any one
      sample drawn from that population is representative of a randomly selected mem-
      ber of that population. The law of large numbers suggests that the more samples
      we draw from that model, the more the aggregate of our selections begins to mimic
      the population as a whole.
           For example, if we fill a bag with five red balls, three blue balls, and two green
      balls, and then select randomly from the balls in the bag, we have no idea which
      color ball we are going to select on any given draw. We can say nothing about the
      individual selection. While we can suggest that we may select a red ball about as
      often as a non-red ball, we can’t exactly predict which color we are going to select
      next. We also can’t say why we picked the color we just did. We didn’t follow a rule
      that said, “You will pick a green ball this time.” However, over time (and assuming
      that we replace the selected ball each time), we will find that we are selecting red
      balls 50% of the time, blue balls 30% of the time, and green balls for the remaining
      20% of the time.
           By artificially defining a distribution of colored balls, we can model an eventual
      distribution of those colors without necessarily prescribing the color of any individ-
      ual draw. Similarly, by defining a distribution of any behavioral trait or decision, we
      can model an eventual distribution of these traits or decisions without prescribing
      any individual event or agent.
           In Chapter 4, we glibly recalled the marketing staple “four out of five dentists
      surveyed recommended sugarless gum….” This does not mean that any given den-
      tist recommends sugarless gum to 80% of his patients. This means that if we were
      to select a random dentist, we are 80% likely to have selected one who recommends
      sugarless gum. The statistic as a whole tells us nothing about any individual dentist.
      However, if we were to attempt to model that population in a game, we could
      leverage that one number. It wouldn’t be difficult to create a game (“Dental
      Prophylaxis Simulator”) wherein the player would discover that 80% of the dentists
      he encounters recommend sugarless gum. That experience would mesh with what
      we know of the real world (or at least what the marketing folks have told us about
      the real world).
                                          Chapter 11 Probability Distributions    243

     Numbers such as the “four out of five” statistic not only tell us about the behav-
ior of populations, but they offer us useful tools with which we can reconstruct the
behavior of populations. If we were to construct a histogram about the dentists the
same way we did with the number guessers, it would look like the one in Figure 11.1.

          FIGURE 11.1 A simple histogram showing how the marketing adage
                    “four out of five dentists surveyed” would look.

     Based on this histogram, if we were to ask what was the percentage chance of
randomly selecting a dentist who answered “yes,” the answer would be four out
of five. Therefore, if we were to build an agent pretending to be a dentist, and we
wanted this dentist to answer the “sugarless gum” question, we could use this exact
same histogram as a source. In this (admittedly simple) case, we would leverage that
same mathematical proportion—four out of five… or 80%.

Turning the Tables
The trick is in turning the statistics back upon themselves. By looking at the survey
data, we can realize that we are 80% likely to encounter a dentist who recommends
sugarless gum. Therefore, when we create a random dentist, we make him 80%
likely to recommend sugarless gum. We are taking survey data from five dentists
and boiling it down into collection of probabilities that we can use to construct one
dentist at a time.
    To reiterate, in this case we created the histogram from the knowledge we had
of dentists (i.e., “four out of five”). By using that histogram as a decision tool for
our dental agent, we applied the process in reverse. We gave a four in five chance
that he would respond “yes” and a one in five chance that he would respond “no.”
If we change the numbers, the histogram would change. Of course, if we changed
244    Behavioral Mathematics for Game AI

       the numbers, the odds of the two potential outcomes would change as well. The
       histogram is representative of the odds. As such, the shape of the curve that it ex-
       hibits is an important tool in determining what outcomes we can draw from its data.

       Throughout this book, I will use the word curve often in regard to graphs of
       data. While the line implied by the data may or may not be curved, I will use
       curve regardless. Please do not get out a straight edge to determine if a line that
       looks straight to you actually has a curve to it simply because I referred to it as

       In the Guess Two-Thirds Game in Chapter 6, we identified some general categories
       of people. There were the random guessers who we termed Index 0. We saw a spike
       in the number of people who guessed 33 (Index 1) and a spike in the number who
       guessed 22 (Index 2). Those were three distinct categories for which we could make
       a reasonable case as to why they selected what they did. However, a significant
       majority of the population guessed plenty of answers that were neither mindlessly
       random nor calculatingly centered on the magical numbers of 33 and 22. Why did
       those people guess that way? The distribution of guesses showed that it wasn’t com-
       pletely random, but it wasn’t completely rational either. So how do we get into
       those people’s heads?
            The truth is that we don’t really need to get inside their heads as to exactly why
       they chose the way they did. If we were to ask 100 members of that large majority
       of people what their rationale was in choosing their number, chances are we would
       get 100 different answers. Everyone has a different thought process that may or may
       not be based on logical premises. It may even be based partially on a logical premise
       but has been obscured or adulterated by an interloping, non-logical thought. In
       Chapter 6, we covered many potential pitfalls that, if they don’t completely ensnare
       a person’s train of thought, can at least make it stumble somewhat. Each of the in-
       dividual person’s perceptions and thought processes may be composed of a differ-
       ent cocktail of truth, falsehood, error, and subjective coloring. Trying to model all
       of those individual rationales is not only prohibitive but usually largely irrelevant.
           We must ask ourselves, what problem are we trying to solve? If we were (for
       some obscure reason) trying to simulate people trying to play the Guess Two-Thirds
       Game, we would have to include three different approaches (Figure 11.2). First, we
       would have to model the intelligent calculators. This is actually a fairly straightfor-
       ward process of determining what percentage of people picked those two numbers
       (33 and 22).
                                         Chapter 11 Probability Distributions     245

FIGURE 11.2 The Guess Two-Thirds Game had three types of participants: the random
   guessers, the logical guessers, and a group of individuals who each had their own
  methodology. Without knowing the rationale of each individual, we can still identify
           where those people tended to guess and in what concentrations.

    Glancing at the data from Chapter 6, we see that about 6.5% of the population
picked 33, and just over 6% guessed 22. However, we must take note that only
about half of those two populations stuck up “above the crowd.” We would be rea-
sonable in making the assumption that only those people above the rest of the nearby
population specifically selected those numbers for a reason. The others selected 33
or 22 with as random a process of the people who selected 31, 34, 21, or 23. Let us
assume that 3% of the guessers selected 22 in a logical fashion, and 4% of the
guessers selected 33 in the same manner. That accounts for 7% of our population
     Second, we would have to model the random guessers. Likewise, this is a rela-
tively simple task. By determining the number of people who are guessing randomly,
we can then spread them evenly across the entire range of possibilities by simply
selecting a random number from 0 to 100. For example, we could say that 30% of
the population guessed randomly. Evenly spread, this would mean about 0.3%
would pick each of the 101 possibilities from 0 to 100.
    The third task is slightly more involved and certainly less definable. We would
have to come up with a method of representing the 63% of the players who weren’t
random, but weren’t logically calculating. Because we can’t model the very personal,
unique thought processes that went on in each person’s mind, we have to look at
them as an aggregate. Using the terminology from descriptive decision theory,
“What do people tend to do?”
246   Behavioral Mathematics for Game AI

          Regardless of their motives, people playing the game pick numbers. We need to
      find a way of modeling those picks. We do not start from the lowest level of why
      they pick the way they do, however. We need to generate the equivalent of what the
      descriptive decision theory would show us—that four out of five dentists recom-
      mend sugarless gum, for example. If we were to look at the histogram of the guesses
      that the players offered, we could identify the populations we mentioned above.
           Looking at Figure 11.2 again (which has been grossly simplified from the orig-
      inal histogram from Chapter 6), we have identified our three groups. Spread across
      the entire spectrum from 0 to 100 are the random guessers. Those are the small
      number of people who will guess anything without rhyme or reason. Additionally,
      we saw that there were two spikes of people who guessed 33 and 22, respectively. In
      Chapter 6 we showed why people would have guessed those particular logical an-
      swers. The darkened section in the figure represents our mystery population. There
      was an obvious bulge in the lower half of the histogram. Many people guessed those
      semi-logical answers—far more so than guessed randomly. Unlike the random
      guessers and the targeted 33/22 folks, we have no idea what these people were
      thinking. We only know that they tended to be between approximately 10 and 50,
      with a distinct bulge between 20 and 30.

      Re-Creating the Bulge
      Turning to artificial intelligence (AI), if we wanted to model an agent that would
      play the game the way humans do, we would have to take all of these populations
      into account. Creating an agent that generated purely random guesses wouldn’t
      create the histogram that the Denmark study yielded. Creating an agent that logi-
      cally solved the answer would generate the same answer every time—entirely de-
      pendant on the single logical formula we devised. Neither of these solutions, when
      repeated enough times to resemble a large population of people would mimic the
      results that 19,000 very real people gave us in the Denmark study.
          The solution is to incorporate three approaches. If we were to assume that a
      small portion of the population did guess randomly, we could allow some of our
      agents to do that. If we were to assume that a small portion of the population did
      guess logically, we could allow some of our agents to do that. However, the remain-
      der of our agents fall into that shadow zone. What we need is a way of creating the
      guesses that those agents would offer. That process requires us to do more investi-
      gation on the characteristics of only that population.
                                                     Chapter 11 Probability Distributions   247

       To analyze only the population of semi-logical guessers, we will isolate them from
       the graph in Figure 11.2. If we look at the counts of the guesses of the people who
       are in this category, we may see something like what is in Figure 11.3. Note that un-
       like the random guessers, who were equally probably all across the range from 0 to
       100, and unlike the targeted guessers who focused only on the numbers 33 and 22,
       these guessers exhibit certain characteristics. Let’s describe the observable features
       of this group one by one.

                Their guesses range from 0 to about 70.
                The majority of people guessed between about 15 and 35.
                There was a minority who guessed from 0 to about 15.
                There was a minority who guessed from about 35 to about 60.
                There were some rare “outliers” who guessed above 60.
                Significantly more people guessed any given number in the majority range
                (e.g., 27) than guessed any given number in the minority ranges (e.g., 50)—as
                much as three or four times as many.

                        FIGURE 11.3 The guesses of the semi-logical guessers shown
                          are spread over the range from 0 to 70. The majority of the
                           guesses are clustered between approximately 15 and 35.

           Even without the graph, from that list of descriptive characteristics, we can
       begin to generalize what the population of semi-logical guessers looked like. We can
       then start to construct an artificial curve using those characteristics that we could
       use to represent the population. But what would we do with this information?
248   Behavioral Mathematics for Game AI

      Reconstructing a Guesser
      In the case of the Denmark experiment, each possible selection from 0 to 100 has a
      number of people who selected it. Focusing only on the segment of semi-logical
      guessers again, we don’t care why they selected the number that they did. After all,
      there are probably 15,000 of the initial 19,000+ respondents in that category. However,
      by analyzing how many picked each number, we can remix those exact figures and
      turn them into likelihoods that any one person would pick those numbers.
          For example, if 2% of the people picked the number 29, we could assume that
      any random person had a 2% chance of picking 29. If 1.3% of the people picked 15,
      we could assume that any random person had a 1.3% chance of picking 15. If 0.3%
      picked the number 70, we can assume that any given person has a 0.3% chance of
      picking 70 as well.

      Selecting the Proper Tool
      While the curve that defined the dentist’s recommendations was relatively simplistic,
      and the curve that resulted from the respondents in the Guess Two-Thirds Game
      was slightly more involved, there are plenty of standard curves that fit many situa-
      tions. Additionally, we can tweak, adjust, and even combine each of the standard
      curves with each other to build in subtle nuances. We can use these results just as
      we did above—to model the likelihood of any given person from a population in
      proportion to that type of person’s occurrence in the population. If we do this
      process carefully and correctly, we can model a startlingly deep array of behaviors.
          But first, we must learn about some of the common tools of the trade. At the
      end of this chapter, we will use some of the tools to address the Guess Two-Thirds
      population problem above.


      For the sake of completeness and of setting a base from which to work, we must
      first at least pay a passing recognition to uniform distribution. As the name im-
      plies, it is a probability distribution that is… well… uniform. Every item in the pop-
      ulation has an equal chance of being selected. Examples are not hard to come by. If
      we were to flip a coin, the odds of heads or tails showing would be uniformly dis-
      tributed between the two options. If we roll a fair die numerous times, we would
      find the result of the rolls to be uniformly distributed among all the sides. Each of
      the 52 cards in a deck has an equal opportunity to be picked; therefore, the proba-
      bilities of each card being selected would be uniform. When we ask the computer
      for random numbers (yes, I know they are actually pseudo-random numbers), we
                                             Chapter 11 Probability Distributions        249

receive results that are theoretically uniformly distributed between 0 and some
maximum value. Certainly, we have made the point by now.
     This sort of random selection is probably the most leaned on by game develop-
ers as well. Very often, programmers simply ask for a random number between two
values. If we have three options in a decision, it is perfectly legitimate for us to gen-
erate a three-possibility random number and select the associated option. Uniform
distributions of this sort are also delightfully extensible. If we add a fourth potential
option to our decision, it is a simple matter to change the random number call to
add a fourth result.
    There are those who say that a person’s greatest strength is also their greatest
weakness. Whether or not that holds true for people, it is definitely the case with
uniform distributions. Their biggest weakness is that they are… well… uniform.
There are plenty of phenomena that we can’t model with them.


     A number of terms apply to distributions as a whole. While not directly related to
  our endeavor, they are important to recognize nonetheless.

          Cumulative distribution function (CDF): The CDF represents the sum of all
          the probabilities equal to or less than a given value x. That is, rather than the
          probability of x alone, it represents the probability that a random number will
          be ≤ x.
          Discrete random variable: A probability distribution is considered discrete if
          there are one or more places where the distribution function is not continuous.
          For example, while there are probabilities of rolling a 3 or a 4 on a die, there
          is no corresponding probability of rolling a 3.5.
          Continuous random variable: A probability distribution is considered continu-
          ous if its cumulative distribution function is continuous. A continuous function
          is one that has a solution for all real numbers in the range specified.
          Probability mass function (PMF): The probability that a discrete random
          variable is equal to some value.
          Probability density function (PDF): Because a continuous random variable
          can be subdivided into infinitely small segments, the probability of any one
          value is likewise infinitely small. The PDF expresses the probability that a
          continuous random variable will occur within a specified range. As such, it is
          based on integral calculus.
250    Behavioral Mathematics for Game AI

            Referring back to our Guess Two-Thirds example, the group that we had iden-
       tified as random guessers seemed to be a uniform distribution across the range from
       0 to 100. We could get away with modeling their guesses as random numbers
       from 0 to 100. However, the distribution of the slightly more thoughtful group was
       certainly not uniformly distributed (Figure 11.2). If we wanted to model the guesses
       that group would yield, we would need a slightly more involved approach.


       Perhaps the most well-known of all probability distributions is the normal distri-
       bution. People often refer to it as a Gaussian distribution after Carl Friedrich Gauss,
       the 19th century German mathematician and scientist. Most of us best know this
       function, however, by its colloquial name of the bell curve. The reason for this is
       purely visual coincidence—in its standard form, it looks like a bell.
           We find the hallmark of the bell curve in all its shapes and sizes all throughout
       nature, science, sociology, astronomy, and behavior. Even in just observation of
       humans, much of what we can measure ends up on a bell curve somewhere. Height,
       weight, strength, speed, and intelligence (IQ) are measurements that tend to lie in
       a bell curve. Even statistical processes as simple as flipping multiple coins or rolling
       multiple dice churn out bell curves. It is this connection that makes normal distri-
       butions one of the most valuable components in modeling behavior… and, there-
       fore, one of the ones we will examine in greater detail.

       Every normal distribution has key components (Figure 11.4). These properties reflect
       the size and shape of the distribution—and therefore the population it represents.
       If we know these properties, we can make assumptions about the population.
       Additionally, and more importantly to AI developers, we can also reasonably re-
       create the curve that the distribution shows.

       The range of a distribution is a measurement of the distance between its lowest and
       highest members. Sometimes, this is a function of the parameters of what we are
       trying to measure. For example, in the Guess Two-Thirds Game the minimum
       guess was 0 and the maximum was 100. The range could not be any wider than 101.
       As long as at least one person guessed 0 and one guessed 100, the range would be
                                           Chapter 11 Probability Distributions     251

  FIGURE 11.4 A generic representation of a normal (Gaussian) distribution showing
     the range, mean, median, upper and lower limits, and one standard deviation.

    Of course, it could certainly be smaller than 100. We must make certain to not
confuse the parameters of the scenario and the reality of the sample. If no one in
our sample guessed 100, the uppermost bound would not be 100. In our example
subset illustrated in Figure 11.2, the range was about 71—from 0 to 70. In this case
the actual range differed from the theoretical maximum that the parameters of the
scenario defined.
     There are times, however, when there are no parameters that would artificially
constrain a range. For example, the distribution of people’s heights has no theoret-
ical maximum—or at least not one that we have found. Therefore, the uppermost
limit of the distribution is the height of the tallest person. Similarly, the lower limit
of the range is pinned at the height of the shortest person in the sample. The result-
ing range is the difference between those two extremes.
    We must remember that the range is specific to the population. Two different
populations may have completely different ranges. An obvious example would be the
distribution of the heights of NBA players compared to the heights of the general
population. They each have their own maximums and minimums which, in turn,
yield different ranges.

Mean, Median, and Mode
Another major player in defining a normal distribution is the relative locations of
the mean, the median, and the mode. The mean (represented by the Greek letter
mu: μ) is the average value of all the items in the population. The median is the point
where there are as many items greater than it as less than it. The mode is the point in
the range that has the most samples.
252   Behavioral Mathematics for Game AI

           In its purest (and most cited) form, the bell curve is symmetrical. That means
      that the mean, median, and mode values are identical. They are also co-located
      exactly in the middle of the range. In a normal distribution with a range of [0...100],
      all three figures would be 50. Therefore, we could assert that:

          The average of all the values is 50.
          There are as many items below 50 as there are above 50.
          The number 50 is the most common value in the population.

          As we will find out later, these assertions come in handy.

      When mean, median, and mode are not aligned we call that skew. If the “tail” of the
      curve is longer in the positive direction, we call that “positive” skew. Naturally, if
      the “tail” is longer in the negative direction, we say that the distribution is “nega-
      tively” skewed. We can look at the features of this a different way. In a positively
      skewed distribution, the bulk of the population (and therefore the mean) is on the
      left half of the range (i.e., left of the median). In a negatively skewed distribution,
      the bulk of the population is on the right side.
          A convenient example of a skewed distribution is national household income.
      The curve has a significant skew in the positive direction. That is, the tail extending
      to the positive end of the range is longer than the one to the negative side. For the
      sake of example, let us place the median household income at $50,000. That means
      that half the households make more than $50,000 and half make less.
           The lower limit of the range is necessarily $0—there are people with no income
      whatsoever. (For the sake of saking, ignore the fact that people can actually have
      negative income.) On the other hand, there is no limitation on how much money
      you can make. That is, there is no upper limit to the range. The current upper limit
      is the income of the household making the most money. Because people on that
      long positive tail can make millions of dollars, the mean income is well above the
      median $50,000—for purposes of example, let’s say $70,000. The mode, on the other
      hand, is below the median. Again, pulling numbers out of the air, let us assume that
      the most common household income is $40,000.
           When a distribution is skewed, the mean, median, and mode are no longer
      co-located. They do have tendencies, however. In a positively skewed distribution
      such as the one showed in Figure 11.5, the mode will be less than the median, which
      will, in turn, be less than the mean. In a negatively skewed distribution, the oppo-
      site will be true: mean < median < mode.
                                          Chapter 11 Probability Distributions    253

            FIGURE 11.5 A skewed probability distribution has the bulk of
                   the population on one side and a tail on the other.
           As a result, the mean, median, and mode are no longer co-located.

Standard Deviation
Another measurement that gives us valuable information about the makeup of the
distribution is standard deviation (represented by the Greek letter sigma: Σ). It tells
us how spread out the bulk of the population is—also known as “dispersion.” A
probability distribution can have exactly the same range, mode, median, and mean,
and yet many different standard deviations (Figure 11.6). This is important to
remember because it is the standard deviation that determines the “character” of the
population. When most of the population is clustered tightly around the mean,
the standard deviation is small. If the standard deviation is high, then we can assume
that there is a wide dispersion of the population. A normal distribution with Σ = 1
is considered a standard normal distribution.
    There are actually multiple layers of standard deviations for the same curve.
They lie outside each other. For a normal distribution, about 68% of the data is
within the first standard deviation—34% on each side of the mean. The second
standard deviation is significantly wider than the first—encompassing 95% of the
population. For any normally distributed population, 99.7% lie within three standard
deviations of the mean. It is important to note that this is the case for all normal
distributions. No matter what the shape of the curve, 68% of the population will be
within that first standard deviation. This is known as the 68-95-99.7 rule—the rule
being that, in a normal distribution, almost all the data is within three standard
deviations of the mean.
254    Behavioral Mathematics for Game AI

            FIGURE 11.6 The standard deviation of a normal distribution determines the
       characteristic of the population. A small standard deviation makes a narrower, higher bulk
            of the population. A higher standard deviation leads to a broader, flatter curve.

       The formulas for generating true normal distributions are delightfully cryptic and
       involved. The following formula, the Gaussian function, generates the continuous
       probability density function of the normal distribution:

            In the above equation, Σ (sigma) is the standard deviation, e is our friendly
       natural logarithmic constant (≈ 2.71828), and the parameter μ (mu) is the expected
       value… a fun little term used in statistics. And of course, π (pi) is, well… pi. All of
       this squished together gives us a value for θ (theta) that shows the density of the
       curve at or around x. Now isn’t that pleasant?
            As you can see, dealing with this function is a little messy. Also, for the most part,
       it doesn’t do us a lot of good in the game development world. After all, we aren’t
       using these functions for analyzing existing data; we are trying to re-create data.
       Certainly, if we knew the exact figures needed to model a population’s behavior, we
       could plug those into a normal distribution curve and churn out figures. These cal-
       culations can be fairly processor-intensive, however—and processor time is not a
       luxury producers often grant us AI folks in the trenches of game development.
           Even more importantly, we aren’t trying to model things exactly. There is a point
       of diminishing returns where the minutia of hundredths of a percent gets lost in the
       shuffle. Our job is to generate something similar to what it is we are trying to repli-
       cate. Close enough for the effect but not so involved as to grind the process to a halt.
                                                Chapter 11 Probability Distributions   255

    Thankfully, we have plenty of techniques at our disposal for generating normal
distributions of all sorts of sizes and shapes. Interestingly, many of us “older folks”
were introduced to these techniques in the form of hunks of acrylic or hardened

A Fantasy RPG Introduction to Real Statistics
The pen-and-paper role-playing games (RPGs) in the mold of Dungeons & Dragons
(D&D) were many people’s unwitting introduction to probability and statistics.
This phenomenon became so ingrained in the gaming culture that many people
still refer to probability curves in terms of combinations of dice. If I write terms
such as “2d8” and “3d6,” there is no shortage of people that recognize them as “the
sum of the rolls of two eight-sided dice” and “the sum of the rolls of three six-sided
dice,” respectively. In fact, the latter was a D&D staple for generating the six char-
acter “ability scores” —strength, intelligence, wisdom, dexterity, constitution, and
charisma. This is not a coincidence.
     As we mentioned before, we often find normal distributions in nature. Traits
such as the six listed above are natural candidates for us to plot on normal distrib-
utions. Using strength as an example, most people are going to have a strength that
is near a concept of “average” strength. The farther away from that average we go,
the fewer people we will find who fit the description. There will be a very select few
who are at the extremes—those who can barely lift their own hands and those that
can heft small planetoids. This fits with what we know of the characteristics of a
normal distribution. The same can be said for intelligence; most people are fairly
close to average, with very few at the extremes. These characteristics of population
distribution are why the creators of D&D elected to generate ability scores in the
fashion that they did: “Roll 3d6” (Figure 11.7).
    Perusing the D&D literature, there is a veritable cornucopia of die-roll combi-
nations in evidence. Although I am not willing to invest the time to prove it, I believe
the creators used every possible combination of dice at least once. (That is not a
small number, either.) Die-roll instructions have their own formalized protocol
that is very expressive. Using simple operators, you can instruct someone on exactly
the way to use those tumbling tools to generate the distribution that you are trying
to achieve. Some examples are:

     1.   d20
     2.   3d4
     3.   2d6 + 1
     4.   d6 + d10
     5.   3d8 + 2d4 + 10
     6.   d10×10 + d10 (for generating numbers from 0 to 99)
256   Behavioral Mathematics for Game AI

      FIGURE 11.7    The result of adding the results of tossing three six-sided dice generates a
                                        normal distribution.

          In the above list, the first entry is a simple flat probability. Each of the 20
      possibilities is equally probable. We can say the same for number 6. It yields a flat
      probability from 1 to 100. Numbers 2 to 5, however, all create some variant of a
      normal distribution. We can ascertain characteristics of the resulting curves by
      looking for some key features of the die-roll equation. By isolating those features, we
      can learn how to utilize them in constructing our own probability distributions.

      Generating a Simple Curve
      For purposes of example, we will standardize on curves that produce a range of
      [0...30]. Additionally, we are not going to limit ourselves to the Pythagorean solids
      (and the 10-sided die) that make up the holy relics of D&D dice. We will allow random
      numbers to be generated in the spirit of dice of any number of faces. For example,
      0–15 will be a perfectly legitimate die roll for our purposes, although making a fair
      16-sided die would be an interesting feat of engineering.
          Given all of those options, there are plenty of ways that we can generate ran-
      dom numbers from between 0 and 30. The simplest, of course, is by simply rolling
      a d31. (Note that a d30 would only generate either [0...29] or [1...30].) This would
      give us a uniform probability distribution. The probabilities would be even across
      the entire range.
                                            Chapter 11 Probability Distributions      257

    Another method would be to roll 3d11 (each of which allows the results 0 to
10). If we were to graph the probability distribution of this method (Figure 11.8),
we would see the familiar bell shape of a normal distribution. The median and
mean are both 15. Naturally, the mode is 15 as well, with 6.84% of the rolls adding
up to it (91 of 1,331 possible combinations).

           FIGURE 11.8 This chart shows the random numbers from 0 to 30
           that would be generated using the distribution 3d11. The standard
             deviation of 5.48 means that the middle 11 selections ranging
             from 10 to 20 (in black) encompass ≈ 68% of the possibilities.

    The sample has a standard deviation of 5.48. That means that the data entries
±5.48 from the mean of 15 (e.g., 10 to 20) are in the first standard deviation.
Therefore, approximately 68% of the sample is between 10 and 20 inclusive.


For convenience, the die-rolling code in this chapter is on the Web site at It is contained in a single class, CDie.
By inserting this class into your projects, you can use it to simulate a variety of
die-roll combinations.

    We can create a simple function for generating normal distributions by first
creating one that parameterizes a die roll. For example, the following code block
simulates one die roll.
258   Behavioral Mathematics for Game AI

          unsigned short CDie::SingleDie( unsigned short NumSides,

                                               bool ZeroBased /*= true */ )


               unsigned short ThisRoll = rand() % NumSides;

               // if the die roll is not 0-based,

               // add one to it.

               if ( !ZeroBased ) {

                   ThisRoll += 1;

               } // end if

               return ThisRoll;


           By calling the function with the desired number of sides as the parameter, we
      will get a random number simulating the roll of the single die.

          unsigned short RollResult = SingleDie( 11 );

           Note that the parameter ZeroBased allows us to determine what the starting
      number of the die will be. If ZeroBased is true (the default), then the die roll will
      start with 0. For example, a d6 roll would generate the numbers 0 to 5. If ZeroBased
      is false, then a d6 roll will generate the numbers 1 to 6 like the six-sided die we are
      familiar with.
           If we wanted to simulate the 3d11 roll from our example above, we could call
      the function SingleDie() three times, passing in the number 11 each time, and
      add the results together. On the other hand, since we may be finding ourselves
      doing plenty of multiple-dice calls, we can create a function that condenses all of
      this into one package.
          unsigned short CDie::MultipleDie( unsigned short NumDie,

                                                unsigned short NumSides,

                                                bool ZeroBased /*= true */ )


               unsigned short TotalRoll = 0;
                                            Chapter 11 Probability Distributions       259

         for ( unsigned short i = 1; i <= NumDie; i++ ) {

              TotalRoll += SingleDie( NumSides, ZeroBased );

         } // end for

         return TotalRoll;


    The call to the function MultipleDie() takes an additional parameter for the
number of dice we want to roll. In this way, we can get our random 3d11 roll
through a single function call:

    unsigned short RollResult = MultipleDie( 3, 11 );

Changing the Shape
If we change our number generator from 3d11 to 5d7, it still generates numbers
from 0 to 30. The curve exhibits some different characteristics, however (Figure
11.9). The range of the distribution is still 0 to 30, but the center of the chart is taller
and narrower than the one in Figure 11.8. This is because the standard deviation of
this distribution is only 3.4, rather than the wider 5.48 of the prior curve. The re-
sult of this is that the 68% of the population that is in the first standard deviation is
condensed into a smaller range— from 11 to 19. In plain English, this means we can
more reliably expect numbers nearer the mean of 15.

     FIGURE 11.9 The random numbers generated using 5d7 create a taller, more
        condensed curve. The standard deviation is only 3.4. Therefore, only the
          selections from 11 to 19 (in black) are in the first standard deviation.
260   Behavioral Mathematics for Game AI

           Laying the two curves over each other (Figure 11.10) makes it easier to see the
      differences between them. The first thing we notice is that the 5d7 curve is taller
      than the 3d11 distribution. The reason is that the combination of five dice makes
      for more possibilities for the middle numbers than 3d11 makes. Specifically, when
      using 5d7, 1,451 out of the 16,807 possible combinations (8.63%) yield our median
      value of 15. With 3d11, only 91 out of the 1,331 combinations (6.84%) give 15.

        FIGURE 11.10 The distribution of numbers generated by 5d7 is narrower and taller
           than the one generated by 3d11. Accordingly, the tails of the curve are flatter.

          On the other hand, with more dice involved, it is significantly harder to get
      them all to play nicely together and roll extreme numbers such as 1 or 30. For
      example, to roll a 1 with 3d11, you would need any one of your dice to come up a
      1 and the other two dice to show 0. With 5d7, you need to show a 1 on one die…
      and then get four other dice to cooperate and show a 0 as well. It is my experience
      that dice simply don’t work well together when you want them to be all nice and
           This dynamic is not only what causes the 5d7 curve to be taller in the middle,
      but explains why the “bulge” is narrower. The additional probability of the middle
      numbers comes at the expense of the extreme numbers. After all, the area under the
      two curves is identical. Both of them represent 100% of the possible outcomes. If
      we add somewhere, we need to take away from somewhere else. We can see evidence
      of this in the difference between the two standard deviations. The 5d7 curve has a
      narrower range for its standard deviation because more of the population is com-
      pressed in the middle.
                                            Chapter 11 Probability Distributions       261

Skewing the Curve
The two examples that we created above were similar in that they were symmetri-
cal. That is, the mean, median, and mode were all 15. If we wanted to generate a
probability distribution that did not have the bulk of the population in the middle,
we would have to take a slightly different approach.
     For example, instead of rolling 3d11, we could use 4d11 and, once rolled, drop
the lowest from consideration. We add the remaining three together just as if they
were the only three that we rolled. We can see the result in Figure 11.11. The median
of this curve is now 19 rather than 15. The standard deviation is almost the same as
it was with 3d11, but it has been skewed to the right by four places. Now, the first
standard deviation (68% of the possible outcomes) is in the range of 19 ± 5.2.

       FIGURE 11.11 By rolling 4d11 and dropping the lowest die, we skew the
       curve to the right (i.e., negative skew). Now, the higher results (around 19)
           are more likely than the middle ones from the original (around 15).
                   The original 3d11 roll is shown by the dashed line.

    Note that we could have skewed the curve in the opposite direction by drop-
ping the highest of the four dice rather than the lowest. Additionally, we could have
made the skew more pronounced (in either direction) by rolling five dice and drop-
ping two of them.
    The last thing to note—something that has almost been forgotten in the shuffle
—is that these distributions have the same range… [0...30]. In all the cases, skewed
or not, we can legitimately claim that we are generating a random number between
0 and 30. This is important to remember. When faced with a situation that calls for
a random number between 0 and 30, we must ask ourselves the question, “How do
we want that distribution to look?”
262   Behavioral Mathematics for Game AI

      Why Skew?
      Arranging a probability distribution this way allows for very expressive character-
      istics. As I mentioned, in the original D&D, the prescribed method for generating
      the six character stats was 3d6. That created the bell curve that we saw in Figure
      11.6. However, if the definition of the numbers 10 and 11 for traits was “average,”
      then we were generally creating average characters. In a game such as D&D, playing
      an average character is hardly the point. In fact, it is rather unlikely that the average
      peasant would be the type to foray out into the wilderness to dispatch all sorts of
      baddies and perform the requisite “noble deeds.” That sort of adventure was more
      the purview of people who were inherently above average. But how could a D&D
      player create an above-average character worthy of such mighty endeavors?
           One method, of course, is the tried and true “brute force” method. Roll up a
      character. If you don’t like it, discard it and try again (or just give it to your little
      brother). However, many house rules started cropping up that made for slightly
      better than average character generation but still kept the spirit of randomness.
      Similar to our solution above, rather than roll 3d6, players would roll 4d6 and drop
      the lowest one. (Around our house, we rolled 5d6 and dropped two!) Just as we saw
      above, doing this skews the curve to the right (Figure 11.12). In the context of the
      game, this curve made for slightly above-average characters while still allowing for
      the occasional abysmal score on one attribute.

           FIGURE 11.12 By rolling 4d6 and dropping the lowest die, we skew the curve
          to the right. The stats of our D&D character will generally be better but still allow
           for the occasional low score. The original 3d6 roll is shown by the dashed line.
                                          Chapter 11 Probability Distributions    263

     Naturally, we can skew curves in the other direction as well. If we need to re-
create the example of household income mentioned earlier in this chapter, we
would create a distribution curve that is heavily skewed to the left. This would in-
dicate that the bulk of the households have a relatively low income but still allow for
the likes of Bill Gates and Warren Buffet on the extreme right end.


The code necessary to roll multiple dice and then remove one or more is slightly
more complicated. The reason for this is that we need to keep track of all the roll
results until the end. We can’t simply add them together as we go. If we know that
we are only going to be removing one die result, we can keep track of the highest or
lowest roll (depending on which direction we are skewing the curve), add the rolls
as we go along, and then add or subtract the stored value from the total of the rolls.
While this will certainly work, we would be limiting ourselves to only skewing by
one factor. For example, we could perform “4d6, drop the lowest” but not “5d6,
drop the lowest two.”
    By extending our multiple dice function above, we can make it account for sit-
uations where we want to skew the curve. It will handle skews of both directions (or
no skew at all) and an arbitrary number of dice.
    typedef enum {




    } SKEW_TYPE;

    unsigned short CDie::MultipleDie(

                                unsigned short NumDie,

                                unsigned short NumSides,

                                bool ZeroBased /*= true */,

                                SKEW_TYPE SkewDirection /*= SKEW_NONE*/,

                                unsigned short SkewCount /*= 0*/ )


         // Vector container to hold our die rolls

         std::vector<unsigned short> vRolls;
264   Behavioral Mathematics for Game AI

              // Current die roll

              unsigned short ThisRoll;

              // Total number of dice to roll

              unsigned short TotalToRoll = NumDie + SkewCount;

              // Get as many die rolls as we need

              // including the ones to drop for the skew

              for ( int i = 0; i < TotalToRoll; i++ ) {

                   ThisRoll = SingleDie( NumSides, ZeroBased );

                   vRolls.push_back( ThisRoll );

              } // end for

              // Total of all dice

              unsigned short TotalRoll = 0;

              // Array indexes of endpoint dice to consider

              unsigned short iFirstDie;

              unsigned short iLastDie;

              // Determine if we need to drop dice —

              // if so, sort and set the range of the ones to consider.

              // Remember that the vector indexes are 0-based, thus we

              // subtract one from the number of dice to roll to get the index.

              switch( SkewDirection ) {

                   case SKEW_NONE:

                       iFirstDie = 0;

                       iLastDie = TotalToRoll - 1;

                                         Chapter 11 Probability Distributions   265

             case SKEW_LEFT:

                 std::sort( vRolls.begin(), vRolls.end() );

                 iFirstDie = TotalToRoll - NumDie - 1;

                 iLastDie = TotalToRoll -1;


             case SKEW_RIGHT:

                 std::sort( vRolls.begin(), vRolls.end());

                 iFirstDie = 0 ;

                 iLastDie = NumDie - 1;




        } // end switch

        for ( i = iFirstDie; i <= iLastDie; i++ ) {

             TotalRoll += vRolls[i];

        } // end for

        return TotalRoll;


     To handle any number of dice, we create a std::vector container. This will be
our temporary die-roll storage location. The number of dice that we are going to
roll is set by the number that we want to consider for our result (NumDie) plus the
number that we need to accomplish the skew (SkewCount).
    After filling this vector with all the die roll results by calling our original
SingleDie() function, we sort the results. Depending on which way we are skew-
ing the curve, we then move the endpoints of the dice that we are going to consider.
For example, if the results of 4d6 rolls are [2, 3, 5, 6] and we are only going to be
dropping the lowest, we would only want to consider the second through fourth
positions in the array. (Note that the vectors are zero-based arrays, so we would be
266   Behavioral Mathematics for Game AI

      using array indexes 1 through 3.) Likewise, if we were going to be dropping the
      highest, we would use the first three positions (indexes 0 through 2). We accom-
      plish this by setting iFirstDie and iLastDie accordingly based on the switch
          The last portion of the function simply iterates through the vector between
      iFirstDie and iLastDie and adds the results together as if they were the only dice.
          As an example of calling this function, the “4d6, drop the lowest” call would be
      passed as:

          RollTotal = MultipleDie( 3, 6, true, SKEW_LEFT, 1 );

          Reading this in English: “roll 3d6, zero-based, skewed to the left by one die.”
          Note that on the Web site at, this func-
      tion has the third, fourth, and fifth parameters set with defaults as follows:
          unsigned short MultipleDie(

                   unsigned short NumDie,

                   unsigned short NumSides,

                   bool ZeroBased = true,

                   SKEW_TYPE SkewDirection = SKEW_NONE,

                   unsigned short SkewCount = 0 );

          Therefore, by not passing those parameters, we can use this same function even
      if we do not want to skew the distribution.

      Shifting the Curve
      Looking back at the D&D example in Figures 11.6 and 11.12, we will notice that the
      lower bound is 3. This is an artifact of using three dice with the numbers one
      through six on their faces. Mathematically, it is impossible to generate the number
      two by rolling three dice constructed in that fashion. Likewise, we arrived at the
      maximum of 18 in the same way. There is no way to achieve a 19 with these three,
      standard-issue dice.
           There are ways at arriving at numbers such as these, however. If we decide that
      we like the shape of the curve generated by 3d6—namely the range of 16 and standard
      deviation of about 3, we can pick that shape up and move it to many other places.
      For example, by using the formula (3d6 – 10) we could generate the numbers –7 to
      +8. The resulting curve would share the same probability characteristics as our
                                           Chapter 11 Probability Distributions      267

original 3d6… just in a different location on the number line. Similarly, we could
generate numbers from 50 to 65 using the equation (3d6 + 47). Again, the only dif-
ference is the numbers generated. The functional performance of the numbers rel-
ative to each other is the same. The range is the same (with different endpoints), the
relative locations of the mean, median, and mode are the same, and the standard
deviation is identical to the pre-shifted version of the distribution. This is going to be
an important feature to remember as we move through the remainder of this book.

The Distribution Checklist
By arranging the tools above, we can create a massive variety of normal probability
distributions to serve a variety of needs. Each feature of a normal distribution has
an associated factor that we can adjust to generate it. When constructing a distrib-
ution, we need to ask ourselves how we want these features to look and take the
related steps to accomplish that.

    Range: What is the range of the distribution? That is, how wide is the “footprint”
    going to be? This range will determine what the sum of our die rolls needs to be.
    Standard Deviation: How clustered or spread out should our population be?
    This will determine how many dice we will use to create the desired range. The
    more dice we use, the more compact the population will be.
    Skew: Will the distribution be symmetrical or skewed to the positive or nega-
    tive direction? This will determine how many dice we will roll initially, keeping
    only the number necessary to generate our desired standard deviation.
    Position: This will determine the lower and upper bounds of the range. By
    adding or subtracting a number from our die rolls, we slide the resulting distri-
    bution left or right.

    If we have in mind the sort of distribution that we would like to generate, we
only need to adjust these four parameters to craft our curve.


There is a function on the Web site at that
simplifies the generation of normal distributions. The declaration of this function is:
    int RandomFromNormalDist (

         int LowBound, // lower boundary

         int HighBound, // upper boundary
268   Behavioral Mathematics for Game AI

               unsigned short Pinch = 0, // extra die to narrow bulge

               SKEW_TYPE SkewDirection = SKEW_NONE, // side tail extends to

               unsigned short SkewFactor = 0); // extra die to skew curve )

           Rather than manually calculating the number of dice needed, this function al-
      lows us to pass in a lower and upper bound for our range. This feature will also shift
      the curve to a final location for us. For example, setting the boundaries to –10 and
      10, respectively, would generate a range from 0 to 21 and then shift it down by 10.
           When the range requested is not an exact multiple of the number of dice, the
      function adds an extra die roll for the remainder. For example, 3d11 (0-indexed)
      generates the range 0 to 30. If we wanted the range to be 0 to 32, the die roller would
      use 3d11 and add the result of an extra d3 (0 to 2). We could have also dealt with
      this by rolling 2d12 and a single d11, which yield the ranges of 0 to 22 and 0 to 10,
      respectively. Adding them together results in the range 0 to 32. Please note that this
      is not the method that is on the Web site at
           The pinch parameter changes how narrow the standard deviation will be. This
      parameter actually adds additional dice to the minimum of three. Therefore, pinch
      = 2 would divide the range up in five dice rather than three. As we saw before, this
      raises the center of the curve and flattens the tails.
         The last two parameters, SkewDirection and SkewFactor, are the same as we
      encountered before.
          We could call this function similarly to these examples:

          RollTotal = RandomFromNormalDist( 0, 10 );

          The above line returns a random number from a symmetrical normal distrib-
      ution with a range from 0 to 10.

          RollTotal = RandomFromNormalDist( -50, 50, 0, SKEW_NONE, 0 );

          This call returns a random number from a symmetrical normal distribution
      with a range from –50 to 50.

          RollTotal = RandomFromNormalDist( 20, 100 , 2, SKEW_LEFT, 2 );

          The above function returns a random number from a normal distribution with
      a range of 81, skewed to the left (the bulge is on the right). The bulk of the popula-
      tion will also be in a relatively narrow range due to the “pinch factor” of 2 in the
      third parameter.
                                                 Chapter 11 Probability Distributions     269

           This function is included as a part of the class CDie on the Web site at We can easily add the class to any project.


       A triangular distribution is, in essence, a stripped-down form of a normal distrib-
       ution. It shares some of the same characteristics, but without some of the additional
       complexity. For example, a triangular distribution has a mean, median, and mode
       just like its fancy cousin. Naturally, it also has a range in which its values lie.
       Triangular distributions, however, have some special qualities that make them
       useful in decision generation and simulation.

       Triangular distributions can be used as a simple, slightly faster version of normal
       distributions. To see why, let’s go back to our repeated efforts at generating a num-
       ber from 0 to 30. We showed two different methods that produced different curves:
       3d11 and 5d7. If we had continued to increase the number of dice used, the curve
       would have continued to get taller in the middle, with more of the population clus-
       tered tightly around the mean. The curves in the curve (if you will pardon the
       expression) became more pronounced as the population sample had to ramp up
       from a large span of very small values to very large ones, top off, and then plunge
       back toward 0% again.
           However, if we were to reduce the number of dice from three to two (i.e.,
       2d16), we would find that something very interesting occurs. There are no curves in
       the curve whatsoever. In essence, we are left with two straight lines that meet at a
       peak (Figure 11.13). Just as in the symmetrical normal distribution, the peak of an
       unskewed triangular distribution is the mode, median, and mean.
            Using a triangular distribution as a substitute for a normal distribution is often
       an acceptable practice. While we may lose a little of the subtlety of the normal dis-
       tribution’s shape (specifically, the tapered tails), we gain quite a bit in the rapidity
       of calculation. Obviously, using the above example of reducing a 3dx to a 2dx curve,
       we are cutting the number of random number calls by 33%. In heavily used deci-
       sion models, that can add up to a substantial savings of processor time.
270    Behavioral Mathematics for Game AI

         FIGURE 11.13 By using 2d16 to generate numbers between 0 and 30, our bell curve
        becomes a triangular distribution. The range, mean, median, and mode are all the same
        as the 3d11 distribution (dashed line). The standard deviation (black bars), however, is
                  slightly greater due to the slightly flatter spread of the population.

       While the above is merely a simplification of the normal distribution, there is another
       method that we can use to create triangular distributions that is enormously pow-
       erful and flexible. By setting only three parameters, we can automatically generate
       a continuous triangular distribution. The first two parameters are simply the upper
       and lower bounds of the range. The third parameter is the peak of the triangle. We
       can actually think of these three items as “worst case,” “best case,” and “likely case.”
       Looking at a sample triangular distribution in Figure 11.14, we can identify these
           If a and c are integers, then for any integer value x between the points a and c,
       we can determine what the percentage of the total sample under the curve x is by
       using one of the following formulas. The formula we use is based on whether the
       value x is above or below c. The value of x must be within the range specified by a
       and c. Therefore, the actual deciding factors and their formulas are:
           If a ≤ x ≤ c then:
                                           Chapter 11 Probability Distributions    271

         FIGURE 11.14 By setting the variables a, b, and c, such that a ≤ c ≤ b,
             any triangular distribution can be created quickly and easily.

    If c ≤ x ≤ b then:

     By programming the formulas and their constraints, we can yield quick prob-
ability distributions for all x between a and b. For example, by using the relatively
arbitrary numbers 6 and 53 for a and b, respectively, and 41 for c, we get the prob-
ability distribution shown in Figure 11.15.

          FIGURE 11.15 The triangular probability distribution generated by
            a = 6, b = 53 and c = 41. Note that because the curve crosses
                 the axis at a and b, the value at these points is 0%.
272   Behavioral Mathematics for Game AI

      Most distribution formulas are polite enough to give us either the number or
      the percentage of occurrences for a given value of x. However, unlike using the
      “dice-rolling” method, we can’t turn the function back around and give us a
      randomly selected x out of the distribution. This is a problem we will encounter
      throughout the remainder of this chapter. We will discover how to turn a
      distribution function into a random selection method in the next chapter.


      Another simple probability curve to create is an uneven distribution. While not
      technically a recognized form of probability distribution, we can still find it useful.
      In a way, it is a variant of the triangular distribution above, but with the c point
      co-located at either a or b. Rather than use the triangular distribution method,
      however, we can lean on the linear function we examined in the previous chapter:
      the formula, y = m(x) + b where m is the slope of the line and b is the y-intercept.

           The probability distribution that results from a linear distribution has very
      simple but identifiable characteristics. First, we can easily determine the “more
      likely” options compared to the “less likely” ones. In fact, due to the linear nature
      of the formula, we can assert that as x moves in one direction, the probability y will
      always move in one direction as well. In the distribution shown in Figure 11.16, for
      example, as x increases, y always decreases. Not only does it always decrease, but it
      does so at a constant rate. The decrease in y over the range xa to xa+1 is the same as
      the decrease in y from xb to xb+1.

                  FIGURE 11.16 A linear distribution is constructed from a simple
                     equation-based line such as those of the form y = m(x) + b
                        with which we all became so familiar with in school.
                                           Chapter 11 Probability Distributions       273

    Second, if we haven’t determined a range artificially (for example, stating that
we are only concerned with x values between 5 and 15), we can easily identify the
range of the distribution as the areas where y > 0. Through some painless algebraic
yoga we can solve any equation for the point y = 0 to determine the exact point at
which this occurs along the x-axis.
     Some variations to linear equations merit mention. Not all lines will intercept
the x-axis in the range that we are concerned with. In Figure 11.17A, for example,
the probability increases over the range between xmin and xmax. At no time, however,
is that probability 0.

     FIGURE 11.17 (A) A linear distribution that does not reach 0 within the range.
        (B) A linear distribution that reaches 0 within the range. Values below 0
      must be clamped to 0. (C) A linear distribution that extends beyond a defined
                      maximum must be clamped to the maximum.

      If we have established a range for the possible events (e.g., xmin to xmax), we may
encounter a situation where the probability line proceeds below 0 (Figure 11.17B).
We need to be careful to trap our code to avoid the possibility of negative probabil-
ity. In those circumstances, overriding the formula to set y to 0 is a simple solution.
    Similarly, if we set a maximum value for y, we need to be aware of another po-
tential problem. In the graph shown in Figure 11.17C, we see that the line proceeds
above the maximum value that we have set for y. In these cases, we must be careful
to cap the value returned by our formula to ymax.
     The bottom line on linear distributions is that they are a simple yet elegant so-
lution to crafting increasing or decreasing probabilities. It certainly makes for a nice
change of pace from uniform distributions but without time-consuming clutter. As
mentioned above, we will soon show a method for selecting random numbers out
of this distribution.
274   Behavioral Mathematics for Game AI


      We can create another common and useful distribution in a similar fashion to the
      linear distribution above. By inserting an exponent into the formula, thus making
      the formula a quadratic function such as we discussed in Chapter 10, we create one
      of the veritably plethoric varieties of parabolic distributions. The linear distribu-
      tions have a subtle advantage of expression over the uniform distributions in that
      the probability value changes over the range. Similarly, a parabolic distribution is
      slightly more expressive than a linear distribution in that the rate of change in
      probability changes over the range (Figure 11.18).

              FIGURE 11.18 Despite starting and ending at the same pair of locations
           as the linear function y = x (dashed line), the parabolic probability distribution
                        y = x2/100 shows significantly different characteristics.

           We can manipulate parabolic distributions in the ways that we covered in the
      previous chapter. First, by changing the exponent, we can dramatically change the
      inflection of the curve. For example, raising x to the power of three rather than two
      causes a more pronounced “corner” to appear. We can also utilize non-integer ex-
      ponents (e.g., x1.7) to fine-tune the shape of the curve.
           By using exponents less than one, we can reverse the inflection of the curve. For
      example, the formula x.5×10 (Figure 11.19) starts by climbing quickly and tapering
      off. This curve is reminiscent of some of the decreasing marginal utility curves that
      we discussed in Chapter 8.
                                          Chapter 11 Probability Distributions       275

   FIGURE 11.19 Exponents less than 1 create a curve with a downward inflection.
      The graph above shows the formula x.5×10 (which could also be referred to
                         as the square root of x times 10).

    Again, by changing the exponent, we can dramatically or subtly affect the shape
of this curve. However, if we are trying to get the values to fit into a certain range
(such as how the curve in Figure 11.19 ranges from 0 to 100), we will need to
change the rest of the formula to accommodate the massive shifts that changing an
exponent will cause.

       FIGURE 11.20 By using three different exponents, we have crafted three
      different curves. We divided the formulas by different numbers so that the y
            values remained between 0 and 100 over the range x = [0...100].
276   Behavioral Mathematics for Game AI

          In Figure 11.20, we show three similar parabolic curves being subtracted from
      100 and arranged such that when x reaches 100, y reaches 0. The paths that the three
      curves follow to get there are different, however. The shapes of the curves themselves
      are different because the exponents are different: 2, 3, and 4. However, because the
      results expand so rapidly, we needed to divide the equation by 100, 10,000, and
      1,000,000, respectively. By doing that, we ensure that when x = 100, y = 0.


      Another useful, if slightly more esoteric, probability curve is the Poisson distribution.
      Siméon-Denis Poisson originally developed it in 1838 as a way of expressing the
      probability of events happening over time. Accordingly, we can use it in game AI to
      generate events that occur at average intervals but without resorting to predictable
      time periods. On the other hand, we are not limited to using the Poisson distribution
      in connection with time-based events. We can benefit from using the unique prop-
      erties of the Poisson distribution in other areas as well.
          The formula for a Poisson distribution is:

          The values in the equation are a little counter-intuitive at first and merit expla-
      nation. The value k (the equivalent of x on the graph) represents the number of
      occurrences of an event over a time period. The value λ (lambda) is a positive, real
      number equal to the expected number of occurrences over a given time period.
      The familiar value e is the base of the natural logarithm (approximately 2.71828). The
      result of the equation is the probability that k events will happen in that time period.
           For example, if we know that an event happens five times in a minute on aver-
      age (λ = 5), the Poisson distribution suggests that 17.6% of the time we can expect
      it to occur exactly 25 times (5×5) times in exactly five minutes.

           On the other hand, if we want to know how often we could expect it to happen
      25 times (5×5) in six minutes, we would find:
                                               Chapter 11 Probability Distributions        277

    As we can see, it is less likely that the 25 events would be distributed over six
minutes than the usual five. Similarly, plugging k = 7 into the equation yields
10.4%; it is even less likely that 25 events would be spread over seven minutes. Each
value for λ produces a distinct curve that expresses the probability of λ events oc-
curring in each discrete value of k. Figure 11.21 shows three examples for λ values
of 5, 15, and 30.

      FIGURE 11.21        If we change the value for λ , the resulting curve spreads out
                             to account for the different probabilities.

Another Use for Poisson
We can also use a Poisson distribution to model an event that happens once every
k minutes instead of λ times per minute. We use the same equation but simply
change our mindset. Referring again to Figure 11.21, we could use the curves shown
to represent events that happen every 5, 15, and 30 minutes. In this case, the prob-
abilities are the probability that the event happens during the specified time period.
     For example, if we are expecting an event to happen on average every 15
minutes, (λ = 15), we would find that the actual recurrence of the event would be
distributed to values around 15 minutes. It may occur at the 15-minute mark, but
it also may occur at 14 or 16 minutes. For that matter, it may occur at 5 or 30 min-
utes. The odds of those results are significantly less likely than 14 or 16, however.
We can observe this result easily by examining the associated curve.
    The shapes of the curves are directly related to the value of λ. Loosely described,
the Poisson distribution takes an average interval and “fuzzies it up” a bit. The longer
the interval, the more room for variation there is. That is, the longer the average
interval, the more possibility for “spread” we can experience. For instance, if the
278   Behavioral Mathematics for Game AI

      average interval was 5 minutes, it is very unlikely that we would encounter an
      interval of more than 10 minutes. The reason for that is an almost necessarily math-
      ematical symmetry that is necessary for generating an average. That is, with an
      average of five minutes, every six-minute interval can be balanced by a four-minute
      one, every seven-minute interval can be balanced by a three-minute one, and so
      on. While an interval of 10 minutes or more is certainly possible, to maintain the
      5-minute average it would require more than one corresponding interval of less
      than 5 minutes.
           The result of this balancing act is that as the average interval decreases, the
      range of the distribution narrows. As the average interval increases, the range
      widens. It is worth noting that because of this balancing problem, the distributions
      are right-skewed—that is, the median and mean are to the right of the mode. The
      tails are longer on the positive side than they are on the negative side.
           Another artifact of this characteristic is that the likelihood of the average inter-
      val occurring decreases because the distribution is spread over more intervals. This
      is similar to the effect we saw with the normal distributions. The wider the bulk of
      the curve (i.e., the standard deviation) is, the fewer occurrences of the result at the
          This distribution is excellent for randomizing the times between events that
      otherwise would occur at regular intervals. For example, if we want an agent’s ac-
      tion to occur every 15 seconds, we can generate a Poisson distribution with λ = 15
      to determine in which second the event should occur.

      Computational Drawbacks
      One of the drawbacks of the Poisson distribution is the computational overhead in-
      volved in calculating factorials. From a calculation time standpoint, factorials are
      relatively inexpensive to compute. We can determine them from a simple loop. If
      we did want to speed up this calculation, we could use a lookup table instead.
           Of more concern is how quickly the size of factorial results increases. 10! =
      3,628,800. 11! checks in at just under 40 million. Likewise, we face some signifi-
      cantly large intermediate calculations with λk in the numerator of the equation.
      With λ = 15 and k = 15, that single calculation is 437,893,890,380,859,375… and we
      haven’t even multiplied it with e15 yet! I know the graphics folks like to brag about
      how many calculations and big numbers they push around, but we don’t need to
      try to out-gun them quite this much!
           As percentages, the actual results of Poisson distributions are not terribly large.
      Therefore, if we know the ranges of the potential λ and k values that we want to use
      ahead of time, they can be calculated offline and stored in a lookup table. (As we
      will see soon, storing probability values in an array-like structure will come in handy
                                                   Chapter 11 Probability Distributions   279

      Substituting Normal or Triangular Distributions
      If the above calculations are prohibitive, we can obtain much the same result by
      using a normal distribution. If we set the range to be slightly more than twice what
      we would normally be using for λ (that is, half of the curve would be on either side
      of λ), position the mode at λ, and slightly skew the curve to the right (i.e., make the
      positive portion of the tail longer than the negative), we can achieve a similar
      distribution. We can even generate very similar results without the skew on the nor-
      mal distribution. What we end up with is a normal distribution that amounts to λ,
      +/– half of λ, centered on λ.


      To tie off a loose end that we left dangling earlier in the chapter, we will revisit re-
      constructing our Guess Two-Thirds participants. We had identified three different
      types of guessers. We are going to break that into four types for purposes of this
      exercise. We also identified the rough percentages of the whole that each of these
      groups of people represented. To recap:

          Group                          %
          “33” guessers                  4
          “22” guessers                  3
          Random guessers                30
          Semi-logical guessers          63

          When generating a random guesser, the first thing we need to determine is
      which of the four categories the guesser is in. The result of that will determine our
      next course of action. Once we know the type of guesser we are working with, we
      then select what process we want to use to generate their actual guess. In the case of
      the “33” and “22” guessers, the answer is simply 33 and 22, respectively. The ran-
      dom guessers (30% of the population), as we determined earlier, would divide their
      guesses up across the entire range from 0 through 100.
          As we discussed before, the semi-logical bunch requires a bit of special treat-
      ment. We noticed that they fall roughly into a normal distribution with a range
      from 0 to about 70. The distribution has a slight right skew that shifts the bulge of
      the population from its natural center of 35 to a new mode of about 25. However
      we construct it, we know that 63% of the time, our guesser needs to pick a guess
      from that distribution.
          So… let’s throw this model into some code using the tools we laid out earlier.
280   Behavioral Mathematics for Game AI

      P UTTING I T   IN   C ODE

      For clarity, we will define an enumerated type GUESS_TYPE to represent our four
      possible types of guessers.
          typedef enum {





          } GUESS_TYPE;

         To generate a guess using the complete model, we only need call the function
      GetGuess(). The first order of business in GetGuess() is to call GetGuessType().
          GetGuessType() is a function that will, using the percentages laid out above,
      identify which type of guesser we are modeling.
          GUESS_TYPE CGuesser::GetGuessType()


              int index = DieRoller.SingleDie( 100, false );

              if ( index <= 4 ) return GUESS_33; // 1..4 = 4%

              if ( index <= 7 ) return GUESS_22; // 5..7 = 3%

              if ( index <= 37 ) return GUESS_RANDOM;       // 8..37 = 30%

              return GUESS_SEMI; // 38..100 = 63%


           Note that we are using the object DieRoller that is an object of type CDie that
      is declared in the header of CGuesser. We are asking for a number from 1 to 100.
      (By leaving the second parameter, ZeroBased, as true, we would have received one
      between 0 and 99.)
          Once we have received a type of guesser back from GetGuessType(), we can
      select the appropriate manner in which to generate the guess.
          unsigned short CGuesser::GetGuess()


              GUESS_TYPE GuessType = GetGuessType();

              unsigned short Guess;
                                          Chapter 11 Probability Distributions    281

         switch( GuessType ) {

             case GUESS_33:

                  Guess = 33;


             case GUESS_22:

                  Guess = 22;


             case GUESS_RANDOM:

                  Guess = DieRoller.SingleDie( 101 );


             case GUESS_SEMI:

                  Guess =

                      DieRoller.RandomFromNormalDist( 0, 70, 0,

                                                            SKEW_RIGHT, 2 );


         } // end switch

         return Guess;


    The switch statement is fairly self-explanatory. If GUESS_TYPE is GUESS_33,
then the guess is 33. Likewise, if the guess type is 22, the guess is set to 22. In the
other two cases, we generate random guesses based on the parameters shown.
    For example, if the guesser is one of those who is guessing randomly, we simply
ask our number generator for a number between 0 and 100 via the call DieRoller.
SingleDie( 101 ). This returns a uniform distribution. All of the possibilities are
    If the guesser is one of the semi-logical guessers we identified earlier in the
chapter, we ask for a random guess from the normal distribution called for by
the parameters:
282   Behavioral Mathematics for Game AI

          LowBound                 0
          HighBound                70
          Pinch                    0
          SkewDirection            Right
          SkewFactor               2

          As we can see, this distribution has a range of 70 and is skewed to the right—
      meaning that the “tail” extends more to the right than it does to the left. This
      description fits the curve that we identified in Figure 11.2.
           By calling the function GetGuess(), we are creating random guessers of all four
      types in the percentages that we laid out. That means that each call to GetGuess()
      could generate a 33, a 22, a random guess between 0 and 100, or a random guess
      between 0 and 70 with a bias toward about 25. If we call this function many times
      and examine a histogram of the results (Figure 11.22), we see a distribution similar
      to the one we found in the original data from the Denmark experiment.

      FIGURE 11.22 The histogram of 3,000 random guesses using the GetGuess() function.
           Aside from the two specific guesses of 22 and 33, we created the rest using
             two different types of probability distribution: a uniform distribution from
                  0 to 100, and a right-skewed normal distribution from 0 to 70.

           With some analysis of the population (which we performed earlier in the chapter),
      some planning, a little tweaking of our percentages, and a few calls to our probabil-
      ity distribution functions, we are able to replicate the population that entered the
                                         Chapter 11 Probability Distributions    283

Guess Two-Thirds contest in Denmark. Notice that while we certainly understood
some of the logic involved in people’s guesses (especially that of the “33” and “22”
guessers), we didn’t necessarily need to apply that logic in reconstructing our some-
what complex population distribution.
     If we were to hook this function up to a research project similar to the one that
the chaps from Copenhagen did, we would likely be hard-pressed to determine the
difference between the real guessers and the simulated ones. We could even be so bold
as to suggest that our Guess Two-Thirds simulation passes the Turing test… and
yet we did it without actually modeling the mindset behind any given guess. In fact,
our function knows nothing of the rules of the game. The major drawback of this
is that if the rules of the game change (such as becoming “Guess 70% of the
Average”), our function does not know how to respond. What’s more, we don’t know
how to change it until we either see empirical evidence of how real people play or
translate some of our understanding of the original game into the new rule set.

    Regardless, we have shown that by using various forms of probability distribu-
tions, we can simulate a relatively complex population. This skill will play a part
throughout much of the remainder of the book—in both large ways and small.
This page intentionally left blank
12             Response Curves

          ne of the drawbacks of some of the functions we looked at in Chapter 10 is
          that we are entirely at the mercy of the mathematics. Even if we could con-
          struct a function that gives us close to the shape that we wanted, we are still
 stuck with every bit of it. We have no way of tweaking a portion here or there to be
 “just a bit higher” or “not quite as steep in this part.” More importantly, there are
 plenty of situations where no mathematical function—no matter how convoluted
 —is going to give us anything close to what we need.
      Similarly, in Chapter 11, some of the probability distribution functions we ex-
 amined allowed us to construct pretty curves but did not offer a way for us to extract
 a random number from them. Being able to do so is important if we are to use those
 probability distributions to construct decisions and behaviors. It doesn’t matter if
 we know that choice x should occur y% of the time if we can’t cause x to occur at
 all. For example, while we know that a coin should land on heads 50% of the time,
 until we actually toss the coin, we won’t know who wins the coin toss. We know that
 rolling a 7 on 2d6 occurs 16.7% of the time—and yet the nice man running the
 craps table will be very annoyed with us if we never actually throw the dice.
      To address both of these problems, we need to introduce a new method of deal-
 ing with functional data. To solve the problem of customizability, we need a way to
 store the results of a function, tweaking those results at will, and extract what we
 need out of it. To solve the problem of extracting a random number x, y% of the
 time as determined by a continuous probability distribution function, we need to
 do something very similar: store the results in a data structure that allows for re-
 trieval in the appropriate proportion of occurrences. Response curves handle both
 of these situations admirably.

286   Behavioral Mathematics for Game AI


      One of the advantages of implementing response curves is that it gives us a new way
      of looking at data. By changing our vantage point, so to speak, we are able to process
      this data in ways that are more conducive to manipulation and selection.
          We will start with a simple example from the previous chapter… our helpful
      dentists. As we have recalled a few times, rumor has it that “four out of five dentists
      recommend sugarless gum.” If we were to generate random dentists from this data,
      we would want to make sure that 80% of the time, the dentist was of the mind to
      recommend sugarless gum. According to what we’ve been told, that’s realistic, right?
          Certainly, there are plenty of ways that we could generate a sugarless dentist
      80% of the time. It is actually a rather simple exercise. However, for purposes of this
      example, let’s look at the histogram from Chapter 11. On the left side of Figure
      12.1, we see the histogram representing the dentist data.

           FIGURE 12.1 By laying the histogram bars end-to-end, we lay the results over
            a number line. This allows us to mark the beginnings and ends of each range.

           I don’t mean to wander into pedantic territory here, but there is an aspect of
      this histogram that we should make a note of. By the very nature of histograms, we
      know that the “yes” bar is four times the size of the “no” bar. After all, another way
      of expressing the recommendations of the dentists is “dentists recommend sugar-
      less gum at a 4-to-1 ratio over gum with sugar in it.” It logically follows that a rep-
      resentation that measures ratios should be ratio-based in its portrayal. However,
      these two vertical bars don’t do us much good for randomly selecting which camp
      our prospective dentist is in.
                                                 Chapter 12 Response Curves        287

From Bars to Buckets
If we were to turn the bars on their sides, however, and lay them end to end, we
change our perspective. The bars are still in the same proportion as they were before:
4 to 1. By placing them in this orientation (Figure 12.1, right side), we can see how
they lie across what is now an x-axis. In this example, the “yes” answers run from 0
through 4 and the “no” answer is the single unit between 4 and 5.
     We can refer to these ranges as buckets—a name that makes sense when you
extend the metaphor slightly. Imagine a game of randomly dropping a ball into
these buckets (not too dissimilar from the game Plinko from the game show, The
Price Is Right). If the ball drop is truly random, it would have a 4:1 chance of land-
ing in the “yes” bucket. This is a result of the “yes” bucket being four times as big
as the “no” bucket.
    Now, because these are not real buckets and we are not dropping a real ball, we
have to simulate dropping a ball into the buckets. To do this, we generate a random
number between 1 and 5. By referring to the number line below our buckets, we
can determine which number corresponds to which bucket. For example, if we
were to roll a 2 (using our dice terminology again), we would let that signify that
our random ball has fallen into the “yes” bucket. In fact, if we were to roll a 1, 3, or
4, our ball would have landed in the “yes” bucket as well. On the other hand, if we
were to roll a 5, our ball would have landed in the “no” bucket. The difference is
that the ball landed on the other side of the edge that defines the separation be-
tween the buckets—in this case, the edge is 4.
     In such a simple example, all of this seems rather obvious. However, as we shall
see, there is a lot of potential wrapped up in this method of approaching random

Adding More Buckets
For a slightly more involved scenario, let us return to another example from the pre-
vious chapter. When we were trying to re-create the results of the Guess Two-Thirds
Game, we identified four segments of the population that had distinct characteristics.
Each of those four types of guessers had their own method of approaching the game.
We also identified what we believe to be the relative occurrence percentages of the
four groups. To reiterate:

    Group                        %
    “33” guessers                4
    “22” guessers                3
    Random guessers              30
    Semi-logical guessers        63
288   Behavioral Mathematics for Game AI

          We can lay out this data in a similar fashion as we did with our dentist data. Just
      as we based the sizes of the “dentists’ recommendation” buckets on the ratio of
      those recommendations, we construct our buckets based on the relative sizes of the
      four segments of the “guesser” population. When we lay them end-to-end, the total
      width is 100. Because the figures were percentages of the whole, and we have ac-
      counted for all of the groups that make up the whole, it makes sense that they add
      up to 100. (We will find later that this is not a necessity.)
          Once again, by laying the buckets side-by-side over our x-axis, we can deter-
      mine the edges of the buckets (Figure 12.2). By dropping our metaphorical ball into
      the buckets (by generating a random number between 1 and 100), we determine
      which population segment our next guesser is going to represent. Theoretically,
      63% of the guessers are going to be semi-logical, 30% will be random, and so on.
      While the relative frequencies of the “33” and “22” guessers are small, there still is
      a possibility that our ball will find its way into one of those two buckets.

              FIGURE 12.2 The buckets created by arranging the relative population
             segments of the Guess Two-Thirds Game. Note that the proportional sizes
                 of the buckets persist regardless of in what order we place them.

          Notice that the order that we place the buckets in doesn’t matter. In the bottom
      half of Figure 12.2, we moved the buckets into a different arrangement. However,
      because the sizes of the buckets haven’t changed, the odds of our random ball drop-
      ping into any one of them do not change either. For example, there is still a 4%
      chance of a “33” guesser appearing.
                                                        Chapter 12 Response Curves         289

           The locations of the edges of the buckets do change, however, and this is where
       our focus must lie.

       The numbers at the bottom of each of the two depictions in Figure 12.2 represent
       the cumulative sizes of the buckets that we have added. For example, in the top
       group, the first bucket we added was the probability of the group that guesses “33.”
       We had determined that the size of that group was 4%. The edge of this bucket is,
       therefore, 4. If our random number is 1, 2, 3, or 4, we pick the first group.
            It is important that we notice that the right edge of a bucket is inclusive. For the
       bucket above, the edge is 4, not 5. We can think of this as being “anything in the 4s
       is still fair game… but 5 is on the other side.” This will be an important distinction
       to remember as we write our code later.
           The second group that we add—the “22” guessers—occurs 3% of the time. We
       add this 3% to the original 4% from the first group. Therefore, they would occupy
       the next three slots on the number line—5, 6, and 7. The bucket edge for this group
       would be 7. As we will explore later, we only need to store one edge for each bucket.
       We can infer the other edge by the bucket immediately to the left.
           The third group, the random guessers, represented 30% of the whole. As above,
       we add 30 to the edge of the preceding group (7). Therefore, the edge of this bucket
       would be 37. If our randomly generated number falls anywhere in the range of 8 to
       37, we select the third bucket.
            Naturally, we repeat the process for the fourth bucket, the semi-logical
       guessers. The width of their bucket is 63; their bucket spans the range from 38 to
       100. It is important that we do not simply assume that anything that doesn’t land in
       the first three buckets lands in the fourth. We need to make sure that we keep track
       of the actual width that we intend the last bucket to be. The reason this is impor-
       tant is that we do not want to assume the total width for the combined buckets. We
       shall revisit this issue in a moment.

       P UTTING I T   IN   C ODE

       In the previous chapter, we laid out some code for selecting which of the four groups
       of guessers we were going to generate. The code for that was relatively simple.
           GUESS_TYPE CGuesser::GetGuessType()


                int index = DieRoller.SingleDie( 100, false );
290   Behavioral Mathematics for Game AI

               // 1..4 = 4

               if ( index <= 4 ) return GUESS_33;

               // 5..7 = 3

               if ( index <= 7 ) return GUESS_22;

               // 8..37 = 30

               if ( index <= 37 ) return GUESS_RANDOM;

               // 38..100 = 63

               return GUESS_SEMI;


            By looking closely, we can see some familiar numbers. In each of the if state-
      ments, we were checking to see if the random number was less than a specified
      number. The first one, if ( index <= 4 ), is testing to see if index lands in the
      first bucket (the “33” guessers). Likewise, the second statement, if ( index <= 7 ),
      is testing to see if our random number lands in the second bucket—between 5 and
      7 inclusive. This continues to the third if statement, checking to see if the number
      is between 8 and 37 (inclusive). If we have not exited the routine after the third
      statement, the number is above 37, and we return GUESS_SEMI from the function.
           This arrangement is a very familiar construct to most programmers. While it is
      certainly functional, it has one serious drawback. If we want to change the bucket
      widths—even just one of them, we have to change some (or even all) of the if state-
      ments. Specifically, we have to change the statement for the bucket we are chang-
      ing the size of and all the ones that occur after it. The worst-case scenario occurs if
      we want to change the first bucket. That means we have to change all of the if
      statements in the entire function. For example, if we decide that the “33” guessers
      occur 5% of the time instead of 4% (at the expense of 1% of the semi-logical
      guessers), our new code would look like this:
          GUESS_TYPE CGuesser::GetGuessType()


               int index = DieRoller.SingleDie( 100, false );

               // 1..5 = 5

               if ( index <= 5 ) return GUESS_33;
                                                Chapter 12 Response Curves        291

         // 6..8 = 3

         if ( index <= 8 ) return GUESS_22;

         // 9..38 = 30

         if ( index <= 38 ) return GUESS_RANDOM;

         // 39..100 = 62

         return GUESS_SEMI;


     As we can see above, we had to change all three if statements, increasing the
test number by one in each case. It is easy to see that this is not a very flexible way
of laying out code. Trust me: As I was fine-tuning this example in the last chapter,
I changed those numbers a few times… it’s not fun. (The process was made even
more annoying by the fact that I had to change my comments as well.)
    There are a few considerations over and above the time consumption argu-
ment. First, it is ridiculously prone to errors. For example, if we forget to change
one of those numbers, we are going to skew the probability of two occurrences
rather than just the one that we are changing. Second, the difficulty in keeping
track of our problems increases with the number of possibilities. The above exam-
ple had only four selections (which gives us three if statements. If we have
dozens… or even scores of possible actions, managing the bucket edges efficiently
gets prohibitive quickly.
    Perhaps the most problematic issue in constructing the probabilities in this
manner is the fact that it is hard-coded, however. We have no way of changing the
edges during run time. This goes beyond the ability to have data-driven code such
as probabilities based on a difficulty setting that a designer sets beforehand. We
have no way of efficiently changing these values on the fly. We will address the myr-
iad uses for this later on in the book.
   The solution to this is to store the edge values in a data structure. For example,
we will create a struct named sGUESSER_BUCKET in our project that represents
a bucket. Each bucket represents one type of guesser. The components of
sGUESSER_BUCKET are simple: a width, an edge (both of type USHORT), and a

    typedef enum {


292   Behavioral Mathematics for Game AI



          } GUESS_TYPE;

          typedef unsigned short USHORT; // for simplicity of declaration

          struct sGUESSER_BUCKET


               USHORT Width;      // the actual width of the bucket

               USHORT Edge;      // the calculated edge of the bucket

               GUESS_TYPE GuessType;        // the guess type this bucket represents


          Once we have defined our bucket structures, we create a vector of them:
          typedef std::vector< sGUESSER_BUCKET > GUESS_TYPE_LIST;

          GUESS_TYPE_LIST mvGuessTypeList;


        Now that the programming part of this book is starting to get more involved, perhaps
        it is a good time to reiterate some of the naming conventions that I use in my code.

               Type and struct names are in all caps: MY_TYPE
               Struct names are preceded by a lowercase “s”: sMY_STRUCT.
               Variables and functions are in initial caps: MyFunction( MyVariable )
               Member variables of a class are generally preceded with a lowercase “m”:
               List and vector names are preceded by a lowercase “l” and “v,” respectively.
               I combine these when necessary such as in a member of a class that is
               also a vector. In this case the name is preceded by “mv” such as in
                                                Chapter 12 Response Curves       293

     When we run our program, the buckets do not exist in the vector. We need to
set them up with the initial data. We do this by setting the data for each bucket and
pushing it onto the vector. We can isolate this process in a function such as this:
    void CGuesser::AddBucket(GUESS_TYPE GuessType, USHORT Width)


         sGUESSER_BUCKET CurrentBucket;

         mMaxIndex += Width;

         CurrentBucket.GuessType = GuessType;

         CurrentBucket.Width = Width;

         CurrentBucket.Edge = mMaxIndex;

         mvGuessTypeList.push_back( CurrentBucket );


     Notice that, despite the fact that our buckets have three members (GuessType,
Width,  and Edge), we only pass two variables into the AddBucket function. We
don’t need to pass in Edge because it is based on the running total of the bucket
sizes that have been pushed before it. We track this with the member variable
mMaxIndex, which represents the maximum array index of the vector. When we are
finished pushing buckets into our vector, mMaxIndex will represent the combined
width of all the buckets.
    To add our four guesser types to this vector, we call AddBucket() once for each
type. It doesn’t matter where we get our data. For simplicity’s sake, in this example,
we have hard-coded the probabilities for each of the four types.
    void CGuesser::InitBuckets()


         AddBucket( GUESS_33, 4 );

         AddBucket( GUESS_22, 3 );

         AddBucket( GUESS_RANDOM, 30 );

         AddBucket( GUESS_SEMI, 63 );


   In the function above, we are using the same probabilities that we used in
Chapter 11 and again in our initial example above. If we decide that we want to
change the probability values, however, our task is much simpler now than it was
when we were using the if statements. If we want to make the same change to the
294    Behavioral Mathematics for Game AI

       data that we did a few pages back (the “33” guessers being 5% and the semi-logical
       ones being only 62%), we only need to change the two relevant numbers. Our
       function would now read:
                 void CGuesser::InitBuckets()


                     AddBucket( GUESS_33, 5 );

                     AddBucket( GUESS_22, 3 );

                     AddBucket( GUESS_RANDOM, 30 );

                     AddBucket( GUESS_SEMI, 62 );


          The bucket edges would now be different than they were with the original
       numbers. (mMaxIndex still adds up to 100).

       Once we have our buckets set up, tossing our ball in to determine a result is a fairly
       simple process. Originally, we generated our random number and then tested it
       against three if statements to find out which of our four possibilities was selected.
       That is not much different than what we are going to do here. Thankfully, by hold-
       ing our results in vector, we can now perform this search in a loop.
                 GUESS_TYPE CGuesser::GetGuessType()


                     // Generate a random number between 1 and mMaxIndex

                     USHORT index = DieRoller.SingleDie( mMaxIndex, false );

                     // Count the number of buckets

                     USHORT NumBuckets = mvGuessTypeList.size();

                     // Loop through all the buckets

                     for ( USHORT i = 0; i < NumBuckets; i++ ) {

                          // See if index fits in this bucket

                          if ( index <= mvGuessTypeList[i].Edge ) {

                             return mvGuessTypeList[i].GuessType;

                          } // end if
                                                    Chapter 12 Response Curves         295

         } // end for

         // Index didn’t land in a bucket!

         assert( 0 && “Index out of range” );

         // As a default, however, we will return a random guesser

         return GUESS_RANDOM;


     The first thing we did in the function GetGuessType() is to generate our ran-
dom number, index. There is one change to this line from the technique we used
before. Instead of hard-coding the number 100, we changed our random number
call to be between 1 and mMaxIndex. This is important. We are now set up to gen-
erate a random number between 1 and whatever the combined width of all of our
buckets happens to be. For example, if we decided that our “33” guess bucket was
a width of 8 wide rather than 4 and made no other changes, the total width of all
buckets would be 104 rather than 100 (notice that we are no longer saying “per-
cent”). If we had continued to generate a random number between 1 and 100, we
would not be giving full credit to the bucket that we have now pushed to the right—
ending at 104 instead of 100. We will address how dynamic bucket widths can be
used to our advantage a little later on.
    The next statement in the function sets NumBuckets to the number of buckets
we have in our vector. Again, this is something that we can leverage. If we decide to
add a fifth type of guesser to our experiment, this code would automatically ac-
count for it.
    Once we know the number of buckets that we are going to search, we loop
through them. The test is the same as we did before: We check to see if our ran-
domly generated index is less than the edge of the current bucket designated by
mvGuessTypeList[i]. If it is, we return the GuessType associated with that bucket.
If not, we move on.

Note that this code should always return a GuessType before it exits the loop.
I leave it to you, gentle reader, to insert error-trapping code of your choice (such
as my assert() function), return a default GuessType, or devise any other
manner of graceful exit.

     There is another method of finding out in which bucket our metaphorical
bouncing ball landed. It will be much more entertaining to set up a few more buck-
ets to search before we open the lid on that method.
296    Behavioral Mathematics for Game AI


       As we touched on at the beginning of this chapter, response curves have another
       valuable role to play for us. The functions from Chapter 10 are rather inflexible in
       that we couldn’t tweak specific areas of the curves the way we might want to. We are
       stuck with whatever was spit out of the equation for a given x value. Similarly,
       some of the probability distributions in Chapter 11 are function-based. While we
       can calculate the probability (y) for a given x value, we lack the ability to extract a
       random x value based on all of the different y probabilities.
            While these two problems may not seem related, we can actually solve them
       in much the same manner using response curves. The secret to both solutions is
       converting the results of the function into a custom response curve. Once we have
       created the response curve, we have much more flexibility available to us. Before we
       get too far ahead of ourselves, however, we need to address the methods and code
       for putting the numbers into the response curve to begin with.

       To keep the numbers manageable at first, we will start with a simple equation:

           Because we are going to be filling a finite space with our results, it is important
       that we establish the range with which we are working. In this case, we will limit
       ourselves to the range 0 ≤ x ≤ 40.
           As usual, we first need to define our vector.
           typedef std::vector< double > CURVE_VECTOR;

           CURVE_VECTOR mvEquationResults;

           The process of filling the vector is rather intuitive.

           void CLinearFunction::FillVector( int Size )


                double y;

                for ( int x = 0; x <= Size; x++ ) {

                    y = ( -2 * x ) + 100;

                    mvEquationResults.push_back( y );

                } // end for

                                                        Chapter 12 Response Curves         297

           To fill mvEquationResults from 0 to 40, we simply call:

           FillVector( 41 );

           Below is what that data would look like:

           x             y
           0             100
           1             98
           2             96
           3             94
           …             …
           38            24
           39            22
           40            20

            Again, this seems ridiculously easy. In fact, it seems like a lot of wasted effort
       when we could have simply used the equation itself for any value of x. However,
       this does allow us to manipulate the results of that equation. For instance, if we
       want y to be the result of the equation except when x = 27, in which case we want
       y = 1.0, we can change that single entry in the vector.

           mvEquationResults[27] = 1.0;

            This may seem like an inconsequential benefit at the moment. However, we
       will soon see that this ability lies at the heart of the power behind response curves.
           Just for the sake of completeness, we can recover a value by simply retrieving
       the contents of the vector element.

           y = mvEquationResults[x]

           Not a lot to it, eh?

       The above example is simplified somewhat by the fact that the x range that we are
       working with is between 0 and 40. We have the luxury of using the vector indices
       that, for a group of 41 elements, start at 0 and end at 40. We are not able to do this if
       the range with which we want to deal starts at, for example, 125 and extends to 165.
298   Behavioral Mathematics for Game AI

      To accommodate this, we need to abandon using the index of the vector as our x
      value. Instead, we create a struct that contains data for both x and y. We then can
      create a vector composed of that struct.
          struct sELEMENT


               int x;

               double y;


          typedef std::vector< sELEMENT > ELEMENT_VECTOR;

          ELEMENT_VECTOR mvElementVector;

      Entering Data
      Filling the vector with the equation results is not much different with this new
      twist. Instead of simply pushing a y value onto the next element of the vector, we
      now store both the x and y value in an sELEMENT and push the whole thing onto the
      end of the vector.
          void CLinearFunction::FillVector( int Low, int High )


               sELEMENT thisElement;

               for ( int x = Low; x <= High; x++ ) {

                      thisElement.x = x;

                      thisElement.y = ( -2 * x ) + 100;

                      mvElementVector.push_back( thisElement );

               } // end for


           If, as we stated above, we want to store the data for x values from 125 to 165, we
      call FillVector with:

          FillVector( 125, 165 );

          By running this new version of FillVector, we fill mvElementVector with 41
      entries. The x values range from 125 to 165, with the corresponding y values being
      the result of our function.
                                               Chapter 12 Response Curves       299

    For the 41 potential values of vector index i, the corresponding values of the
elements x and y would be:
    i                  x              y
    0                  125            –150
    1                  126            –152
    2                  127            –154
    3                  128            –156
    …                  …              …
    38                 163            –226
    39                 164            –228
    40                 165            –230

Extracting a Value
Now that the index of the vector no longer corresponds to the x value, recovering
the data that we need is a slightly more involved process. There are three primary
ways of handling this.

Brute Force
The first option we could take is the brute force method. By iterating through the
entire vector, we could check all the x values until we find the one we want and re-
turn the corresponding y value. This is a perfectly viable solution—especially for
small data sets. It looks much like what we did with the guess type function earlier.
    double CLinearFunction::GetY_BruteForce(int x)


         for ( int i = 0; i < mvElementVector.size(); i++ ) {

              if ( mvElementVector[i].x = x ) {

                     return mvElementVector[i].y;

              } // end if

         } // end for

         // Code should never get here!

         assert( 0 && “Index not found” );

         return 0.0;

300   Behavioral Mathematics for Game AI

           The drawback of this method is that it doesn’t scale well to large data sets. The
      search time scales linearly with the number of members that we add to our vector.
      That is, if we have 41 members (as in our example above), we are going to average
      searching 20.5 members to find our match. If we have 1,000 members in the data
      set, we are going to be searching 500 of them on average… and as many as all 1,000
      in the worst-case scenario.

      Certainly, if we know the offset from the vector index to the lowest x value, we
      could leverage that to find the proper container. For example, we know that our
      lowest number in this exercise was 125. The lowest index in a vector is 0. Therefore,
      to find the container that corresponds to any given x value, we subtract 125 from
      the x value we actually want to look up. If we wanted to find y when x = 130, we
      could find it this way:

          y = mvElementVector[x – 125].y;

          If x = 130, the above statement would yield the equivalent of:

          y = mvElementVector[5].y;

           This method is problematic in that we have to keep track of the offset in a sep-
      arate location—either hard-coded as a constant such as above or held separately in
      a variable. Either way, if we change the range of our x values, we have to remember
      to change this offset. We could avoid this by setting the offset to the x value of the
      first element in the first place:
          Offset = mvElementVector[0].x;

          y = mvElementVector[x – Offset].y;

          The above method works in the scenario that we have set out. It also is signifi-
      cantly faster than the brute force method. There is a better way, however—one that
      allows us to do some nifty tricks down the road.

      Binary Search
      For those who aren’t familiar with the binary search technique, it is, in essence, a
      game of “guess the number.” The three possible outcomes are “higher,” “lower,”
      and “got it!” If you have every played this game, you realize that the most efficient
      way of guessing the number is to take a “divide and conquer” approach.
                                               Chapter 12 Response Curves       301

    For example, if guessing a number between 1 and 100, we want to start at 50. If
we are told that the number is higher than 50, we would guess 75. If we are then told
“lower,” our next guess would be 62. On each turn, we divide the remaining range
in half, thereby maximizing the possibility that it is on either side of an incorrect
guess. Compare this to the brute force method we listed above. That approach is
analogous to guessing the lowest possible number on each iteration and being told
“higher” until, one step at a time, we reach the target. Using that method, the max-
imum number of guesses is equal to the number of possibilities. If we are guessing
a number between 1 and 100, we could end up guessing 100 times to determine the
target (if the number was 100).
    On the other hand, a binary search performs in O(log2 n) time, where n is the
number of elements. In the classic “guess the number between 1 and 100” game, we
would need a maximum of seven guesses to determine the target. If we were to
expand the game from 1 to 200 instead, we would only need one additional guess
(eight total). Guessing a number between 1 and 400 requires only nine guesses.
Between 1 and 800? Only 10 guesses. It becomes apparent very quickly that a binary
search is an efficient way of finding data.
     The major requirement for a binary search is that the data we are searching is
stored in a sorted fashion. If it was not, then we could not determine “higher” or
“lower”… only whether or not we were correct. Imagine asking someone to guess
the name of an animal. When they did so, we could tell them “correct” or “incor-
rect.” The answer of “incorrect” doesn’t help them determine what direction they
should take on their subsequent guess, however. If we told them that the animal
they were attempting to guess was, for example, “earlier in the alphabet” or “heav-
ier” than the one they just guessed, we would be giving them a direction in which
to head. With that information, they could use a similar “divide and conquer”
strategy to close in on the correct answer.
    In the example we are working with, the values of x are stored in the vector in
sorted order from lowest to highest. That way, when we test the value of x at a par-
ticular array index (i), we can determine if we are higher or lower (assuming we are
not correct) and move in the proper direction from that point. Therefore, a binary
search is a valid approach.
    The code for a binary search isn’t difficult to write. We need only keep track of
the highest and lowest possible bounds at any given time. Then, we calculate the
midpoint between them as our guess and check to see if our guess was correct, high,
or low. The following searches our vector for the proper value of x to return the
corresponding y.
    double CLinearFunction::GetY_BinarySearch( int x )

302   Behavioral Mathematics for Game AI

              // Get number of elements in the vector

              int iCount = mvElementVector.size();

              // Set the boundaries of our search range

              // to the first and last elements (remember that

              // the vector indices are 0-referenced... that’s

              // why we subtract 1 from iCount!)

              int iLow = 0;

              int iHigh = iCount - 1;

              bool found = false;

              int i = 0;      // the vector index

              while ( !found ) {

                   // use the mid-way point as our index guess

                   i = iLow + ( ( iHigh - iLow ) / 2 );

                   if ( x = mvElementVector[i].x ) {

                       // the guess is correct

                       return mvElementVector[i].y;

                   } // end if

                   if ( x < mvElementVector[i].x ) {

                       // lower the high boundary to the current guess

                       iHigh = i - 1;

                   } else {

                       // raise the low boundary to the current guess

                       iLow = i + 1;

                   } // end if

              } // end while

              return mvElementVector[i].y;

                                                      Chapter 12 Response Curves        303

          Stepping through the function from the beginning, we first determine the
      number of elements in the vector. We set our initial bounds for the search at 0 and
      one less than the number of elements. (Vectors are 0-referenced, so n elements
      means the last index is n – 1.) In the while loop, we set our guess index (i) to the
      halfway point between whatever the current high and low are. We then check the
      value of x held at the point in the vector referenced by i. If it matches the x we are
      looking for, we are finished and return the corresponding y. If not, we then deter-
      mine whether our guess was too high or too low and change our upper or lower
      bounds accordingly. We then repeat the while loop to make a new guess until such
      time as we guess correctly.
          Notice that, as written, the while loop should not end because we never change
      the value of found. If we wanted to, we could write in various error traps to avoid
      such things as infinite loops. I left them out here for clarity of code.
           The binary search method has given us a few improvements over the prior
      methods. As we discussed above, the binary search method is significantly faster
      than the brute force method—especially when we work with larger data sets.
      Additionally, unlike the offset method, we no longer have to keep track of the rela-
      tionship between the vector index and the data contained in the vector. This last
      part is significant for one last reason: by design, the vector indices necessarily have
      to increment by one—we may not want to hold our data to the same requirement.


      In the previous examples, we were matching up a single x with a single y. That is,
      any given input generated an output. However, if we think back to the original (and
      delightfully simple) dentist example, we encounter a different requirement.
           We can think of the “ball into bucket” metaphor as having two different types
      of input. First, we think of the ball in terms of dropping into one of five different
      segments of the range. In a way, we have recast this as being five buckets rather than
      two. In this case, buckets 1 through 4 mean that the dentist recommends sugarless
      gum and bucket 5 indicates that he doesn’t.
          If we were to create our dentist example using the 1-to-1 methods outlined
      above, we would be inclined to create a five-unit vector—the first four of which
      were mapped to one output (“sugarless”) and the fifth mapped to the other
      (“tooth-rotting”). However, it does seem rather inefficient to have four slots in our
      vector all pointing to the same outcome.
         On the other hand, we can also think of the ball dropping into one of only two
      buckets—the two buckets representing our two choices. It just so happens, of
304    Behavioral Mathematics for Game AI

       course, that one of those buckets is four times as large as the other one. The differ-
       ence is subtle but important from an algorithmic standpoint.
            It would seem that a more accurate way of modeling this idea would be to truly
       have only two buckets—that is, two items in our vector. However, we would have
       to also represent the reality that the first bucket was four times as large as the second
       one. This is where the edges come into play.

       We don’t need to change too much of our data structure to represent this method
       of thinking. When we use a pattern similar to what we have done already, our den-
       tist recommendation structure would look like this.
           typedef enum {




           struct sRECOMMENDATION {

                 USHORT Size; // The size of this bucket

                 USHORT Edge; // The edge of this bucket based on its position

                 GUM_RECOMMENDATION Recommendation; // The actual recommendation


           typedef std::vector< sRECOMMENDATION > GUM_VECTOR;

           We have replaced the x parameter of the struct with two components: Size
       and Edge. The first one, Size represents the width of the bucket. Edge, on the other
       hand, represents the position of the edge on the x scale. This is similar to how we
       were using x in the prior examples. There is not much of a functional difference
       between the two—the edge value is simply an x value after all. The difference is that
       we are no longer representing every value of x.

       Entering data into the new structure works from the same premise that we used
       earlier. The function AddBucket takes a bucket size and a recommendation, places
       them into a temporary struct, and pushes that struct onto the back of the vector.
                                                 Chapter 12 Response Curves        305

    void CDentist::AddBucket( USHORT Size,

                                    GUM_RECOMMENDATION Recommendation )


         sRECOMMENDATION CurrentRecommendation;

         mTotalSize += Size; // Calculate the new edge

         CurrentRecommendation.Size = Size;

         CurrentRecommendation.Edge = mTotalSize;

         CurrentRecommendation.Recommendation = Recommendation;

         mvRecommendations.push_back( CurrentRecommendation );


     One important thing to note is how Edge works. As we add each bucket, we re-
trieve the total size of all the buckets we have added so far. That value represents the
right-most edge of the whole collection. Because we are adding our new bucket on
the end of the row, the edge of the new bucket is the total size plus the size of the
new bucket.
    We can fill our dentist recommendation list with the following function:
    void CDentist::InitVector()


         mTotalSize = 0;

         AddBucket( 4, SUGARLESS );

         AddBucket( 1, SUGARRY );


    This adds our two buckets to the vector. After running InitVector(), the data
stored in the vector is:

    i          Size          Edge           Recommendation
    0          4             4             SUGARLESS
    1          1             5             SUGARY
306   Behavioral Mathematics for Game AI

      Converting Functions to Distributions
      To convert a larger dataset such as the results of a function, we need to automate the
      process of adding buckets. For this example, we will use an uneven probability dis-
      tribution applied to 10 items. The probabilities of the 10 items follow the formula:

          For a visual reference, the graph of the number of occurrences looks like the
      one in Figure 12.3.

         FIGURE 12.3     The histogram showing the number of occurrences of each selection
                            is based on the formula y = –1(x – 100) + 12.

          The first thing we do is create the struct that will act as our data bucket.
          struct sBUCKET {

               USHORT Size; // The size of this bucket = probability

               USHORT Edge; // The edge of this bucket based on its position

               USHORT Result; // The result we are generating = x


          As with our dentist example, each record holds the size of the bucket, its edge
      location, and what the result will be. We’ve changed the terminology slightly here.
                                               Chapter 12 Response Curves     307

Rather than referring to an x value as we have done previously, we now call this
variable Result. We do this because, with a probability distribution, we are going
to be selecting one of our buckets based on the probability represented by Size. We
could have also named it something like “Name,” “Selection,” “Action,” or, as in
the dentist example, “Recommendation.” What we call it will be case-specific. In
any event, it is the name of what the bucket represents. For now, Result it is.
    As usual, we create a vector to hold our distribution.
    typedef std::vector< sBUCKET > DIST_VECTOR;

    DIST_VECTOR mvDistribution;

    We then create our function, InitVector(), that fills our vector with the 10
results that we want to track the probability of.
    void CDistribution::InitVector()


        sBUCKET ThisBucket;

        USHORT ThisSize;

        USHORT MaxItems = 10;

        for ( USHORT x = 0; x < MaxItems; x++ ) {

             ThisSize = (-1 * ( x -100 ) ) + 12;

             ThisBucket[x].Size = ThisSize;

             If ( x == 0 ) {

                 // this is the first entry

                 ThisBucket[x].Edge = ThisSize;

             } else {

                 ThisBucket[x].Edge = ThisBucket[x-1].Edge + ThisSize;


             ThisBucket[x].Result = x;

             mvDistribution.push_back( ThisBucket );

        } // end for


   In InitVector(), we loop through the 10 items, using the value of x in the
equation we specified above to determine the size of the bucket that we then store.
308     Behavioral Mathematics for Game AI

        The next step in the above function is slightly different than what we have done be-
        fore. Rather than hold a separate value for the total size to determine what the edge
        is, we use the edge of the previous bucket. By adding the size of the current bucket
        to the prior edge, we determine the edge of the current bucket. (Note that for the
        first bucket, we don’t have a prior bucket to use. The edge is the same as the size.)
        Later, when we need to know the last bucket edge—that is, the total width of the
        group—we can retrieve the edge value of the last bucket. As a last step, we assign
        the value of x to Result and then push the bucket onto the end of our vector.
            After running InitVector(), our data will look like this (i is the index of the

                i            Size             Edge             Result
                0            12               12               100
                1            11               23               101
                2            10               33               102
                3            9                42               103
                4            8                50               104
                5            7                57               105
                6            6                63               106
                7            5                68               107
                8            4                72               108
                9            3                75               109

             As we can see from the above table and from Figure 12.4, the edges represent
        the cumulative sizes of the buckets as we “lay them end to end.” The total width of
        all 10 buckets is 75.

        We can retrieve a random result out of the distribution using the binary search
        method outlined above. The function GetResult() is entirely self-sufficient. That
        is, we don’t need to pass it or otherwise store the number of items in our distribu-
        tion or the value of the right-most edge. When we call the function, it returns a
        random result from mvDistribution determined by where the generated random
        number lands in the distribution.
                                           Chapter 12 Response Curves      309

    FIGURE 12.4 The histogram in Figure 12.3 rearranged to show the data
       generated by the InitVector() function. The terminology of the
                   response curve data structure is labeled.

USHORT CDistribution::GetResult()


    // The number of buckets in the disribution

    USHORT NumBuckets = mvDistribution.size();

    // The maximum roll is the edge of the last bucket

    USHORT MaxRoll = mvDistribution[NumBuckets - 1].Edge;

    // The random number we are looking for

    USHORT Target = DiceRoller.SingleDie( MaxRoll, false );

    // Bucket indexes

    USHORT iHigh = mvDistribution.size() - 1;

    USHORT iLow = 0;

    USHORT iGuess;

    bool found = false;

    while ( !found ) {

         // Guess is halfway between the low and high indexes
310   Behavioral Mathematics for Game AI

                   iGuess = iLow + ( ( iHigh - iLow ) / 2 );

                   // Check for correct guess

                   if ( InBucket( iGuess, Target ) ) {

                        return mvDistribution[iGuess].Result;

                   } // end if

                   // If not correct...

                   if ( Target > mvDistribution[iGuess].Edge ) {

                        // guess is too low, change the bottom boundary

                        iLow = iGuess;

                   } else {

                        // guess is too high, change the top boundary

                        iHigh = iGuess;

                   } // end if

               } // end while

               // Code should never get here!

               assert( 0 && “Code fell through while loop!”);

               return 0;


           There is one main difference between this function and the binary search we
      used earlier. Because the bucket is a range rather than a discrete point on the x-axis,
      we must perform a slightly more involved check to see if our random number lands
      in it. To do this, we create a function InBucket that takes our current bucket guess
      and the random target as parameters.
          bool CDistribution::InBucket( USHORT i, USHORT Target )


               if ( i == 0 && Target <= mvDistribution[i].Edge ) {

                   return true;

               } // end if
                                                        Chapter 12 Response Curves         311

                 if ( Target <= mvDistribution[i].Edge &&

                      Target > mvDistribution[i-1].Edge ) {

                    return true;

                 } else {

                    return false;

                 } // end if


            This Boolean function checks to see if the random number Target is between
       the edge of the specified bucket mvDistribution[i] and the edge of the bucket to
       the left of it, mvDistribution[i–1]. We need to take care with the operators in the
       two statements. Because the edge of a bucket is inclusive, we also need to test for
       equality on the current bucket but not equality of the previous bucket.
            It is also important that we trap for the possibility that this is the lowest bucket
       (that is, i is 0). If that is the case, we cannot utilize the edge of the bucket below it
       without generating an index that is out of bounds (e.g., –1). We only check to see
       if Target is less than the edge of the bucket.
           Going back to GetResult(), if InBucket returns false, then we know that our
       current guess, iGuess, is not correct. We then test to see if our random number
       (Target) is higher than the edge of our current bucket. If it is, we move the lowest
       bucket to search up to the current bucket. If Target is lower, we move the highest
       bucket to search down to the current bucket.

       In all of the above examples, we create the response curve at the beginning and do
       not adjust it afterward. There are times, however, when we would want to adjust the
       contents of the response curve. It would be inefficient for us to erase all the contents
       of the existing vector and rebuild it from scratch. Depending on the change that is
       made, we can use a number of approaches to modify the existing data without hav-
       ing to start over.
            One of the most common (and most useful) adjustments we can make to the
       data in a response curve is to adjust the weights of one or more of the buckets.
       When we are dealing with a 1-to-1 response curve, adjusting the data held in one
       of the buckets is inconsequential. We return the data held in the bucket that we find
       at the selected index.
          The only difference to this approach is when we introduce the binary search.
       Because we are searching for our result by the edge values rather than the indices,
       we need to maintain the integrity of the edge values. With the distribution-based
312   Behavioral Mathematics for Game AI

      response curves, when we change the size of one bucket, we also affect the edge
      values of all the buckets after the changed one. It is a requirement that we rebuild
      the edge values in the vector any time the data changes so the change is reflected in
      all the edge values. For example, let’s look again at the data from the above uneven
          i                Size             Edge             Result
          0                12               12               100
          1                11               23               101
          2                10               33               102
          3                9                42               103
          4                8                50               104
          5                7                57               105
          6                6                63               106
          7                5                68               107
          8                4                72               108
          9                3                75               109

           If we arbitrarily decide to change the weight (“size”) of the result “104” from 8
      to 10, we need to make a number of changes. Obviously, the first change is that the
      bucket at index 4 would now have an edge of 52 rather than 50. (It now stretches
      from 43 to 52.) However, looking now at the bucket at index 5, when we add its size
      of 7 to the new edge of 4, we arrive at 59 rather than 57. This process cascades down
      to the last bucket, whose edge we increase by 2 to a value of 77. The contents of the
      new vector are (changes from above are emphasized in bold):
          i                Size             Edge             Result
          0                12               12               100
          1                11               23               101
          2                10               33               102
          3                9                42               103
          4                10               52               104
          5                7                59               105
          6                6                65               106
          7                5                70               107
          8                4                74               108
          9                3                77               109
                                                Chapter 12 Response Curves       313

     To accomplish this properly, we need to construct a function that rebuilds the
edges. Rather than rebuild all of the edges, however, it is more efficient (especially
in larger data sets) to rebuild only from the changed bucket forward to the end of
the vector. We can do this with a function such as this one:
    void CDistribution::RebuildEdges( USHORT iStartBucket /*= 0*/ )


        USHORT VectorSize = mvDistribution.size();

        for ( USHORT i = iStartBucket; i < VectorSize; i++ ) {

             if ( i > 0 ) {

                  mvDistribution[i].Edge =

                      mvDistribution[i-1].Edge + mvDistribution[i].Size;

             } else {

                  mvDistribution[i].Edge = mvDistribution[i].Size;

             } // end if

        } // end for


    The function RebuildEdges( USHORT iStartBucket ) will rebuild the edges
from the array starting point identified by iStartBucket and continuing on to the
end of the vector. If we do not specify a value for iStartBucket, the default of 0 is
used and the entire vector is rebuilt.
    As with our earlier example, we need to account for the possibility that the
index is 0. In that case, attempting to access the data at mvDistribution[i–1]
would generate an error. If the index is 0, we know that the edge is equal to the size
     We can use this function every time a data element in the vector is changed.
There is an exception to this approach, however. If we are going to be changing a
number of elements at the same time (that is, before we try to extract data from it
again), it would be redundant to keep recalculating the edges for each change. It is
very likely that we will be rewriting the edge data over and over to account for each
change. It is much more efficient to make all of our changes first and then recalcu-
late the necessary edges. (We could either rebuild all the edges at that point or keep
track of the lowest numbered index that was changed and start from there.)
314   Behavioral Mathematics for Game AI


      Because we are using the edge values to search for the chosen bucket, we don’t nec-
      essarily have to have the buckets sorted by size. However, we can optimize searches
      through having our buckets sorted by size. This method works very well when there
      are many buckets of widely disparate sizes. We can construct an example of when
      this optimization is useful by using a quadratic distribution such as:

           If we use this formula over the x range of 0 to 30, we find that the probabilities
      (y) of any given x range from 100 to 10. That gives us a large difference in bucket
      sizes. This is even more apparent when we realize that the final edge of these 31
      buckets will be at 2155. The first bucket (x = 0) has a 4.6% chance of being picked.
      On the other end of the spectrum the last bucket has only a 0.5% chance.
           The important factor to address, however, is the starting point of our “divide
      and conquer” approach. The purpose of guessing the midpoint is to reduce the
      amount of space left to search regardless of whether our guess was high or low. By
      starting with the middle bucket, we are reducing the number of buckets on each
      side of our guess to a joint minimum. No matter which way we missed (too high or
      too low), we know that we have reduced the remaining buckets to search as low as
      we can.
          In the above examples, we started at the middle bucket found by using the

                   iGuess = iLow + ( ( iHigh - iLow ) / 2 );

            Applied to this example, our initial index would be 15 (Figure 12.5). Upon a
      little examination, however, we find that the edge of bucket 14 is 1399. That means
      that almost 65% of the data is below what we calculated as the midpoint of the vec-
      tor. That also tells us that, if our initial guess of 15 is incorrect, we are going to be
      moving left more often than right. If that is the case, we would want to optimize for
      quicker searches on the side of the vector that we are going to be landing in more
          By changing our initial guess to the bucket that is in the middle of the possible
      choices rather than the middle bucket, we can achieve better results. That is, rather
      than selecting the middle bucket, we want to select the bucket at the point where we
      know the ball will fall half the time on one side and half the time on the other side.
      We can do this by determining a theoretical bucket edge that is half the value of the
      farthest edge.
                                                 Chapter 12 Response Curves        315

  FIGURE 12.5 We can improve the performance of searching for the selected bucket
     by starting in the middle of the data distribution rather than the middle bucket.
    Most of the data occurrences are on the left side of the histogram. Therefore, we
      bias our search to that side by starting with bucket 11 rather than bucket 15.

    In this example, the edge of bucket 30 is 2155. Half of that is 1077. The value
1077 falls into bucket 11. If we examine the bucket below number 11, we find that
the edge of bucket 10 is 1062. That tells us that 49.3% (1062/2155) of the values are
in bucket 10 or below. On the other hand, the edge of bucket 11 is 1149. There are
1005 (2155 – 1149) numbers above bucket 11. That represents 46.7% of the total.
(Bucket 11 is the missing 4.1%.) This means, if the answer is not bucket 11, we have
about an even chance of being either too high or too low.
    To accomplish this, we need a new way to determine our starting bucket. As we
explained, this is simply a matter of dividing the highest edge value in half and then
searching for it via the original, slightly inefficient method. We then store that value
for use in all of our regular searches.
      We don’t need to do this every time we search. If we did, it would be very inef-
ficient anyway because we would be searching twice for every real search. As long
as the data does not change, the suggested starting point does not change. In fact,
because the binary search is so fast, as long as the data doesn’t change much, we can
still use that suggested starting point.
316   Behavioral Mathematics for Game AI

          This point exposes the fact that these optimizations are very situation-dependant.
      In general, we can follow these rules:

          Any change to a bucket size:                 Rebuild edges
          Large changes to bucket sizes:               Re-sort vector
          Change in lopsided distribution:             Recalculate starting bucket

         By no means is the above an exhaustive list. Depending on the number of ele-
      ments in a vector, the relative sizes of the elements, the nature of the data, and even
      how often the data is changed or accessed, we can select our optimization and search
      methods to provide the best possible response.
           For example, the only time that we would want to re-sort the vector and change
      the start bucket is if we are dealing with very large data sets (e.g., hundreds of buck-
      ets) or are doing many searches on a data set that changes rarely (or both). In either
      case, we can see some minor gains in performance. However, we need to be aware
      that the sort and recalculate optimizations should happen very rarely lest we give
      back the gained time through the time it takes to perform the actual optimization.


      Other than the occasional manual tweak, we have generated most of the response
      curves in this chapter through a function of some sort. This is not a necessary lim-
      itation. It is not only possible, but very useful to hand-craft a distribution to match
      a desired effect.
          Thinking back to our five dentists, we manually created a two-bucket response
      curve representing the two bits of masticatory advice a dentist could offer to us:
      sugarless gum or sugared gum. We also manually set our two sizes of four and one,
      respectively to simulate the legendary “four out of five dentists surveyed” phenom-
      enon. We also used this same approach to model the four types of guessers in the
      Guess Two-Thirds Game. These are very small examples of hand-crafted distribution.
          Applying this process to a more extreme example, rather than go through the
      great pains that we went to in Chapter 11 to model the results of the Guess Two-
      Thirds contest run by the University of Copenhagen that we explored in Chapter 6,
      we could craft a 101-bucket response curve that exactly duplicated the results of the
      study. Rather than breaking down the four groups and then modeling the distrib-
      ution of the guesses for each group, we can simply create a single response curve
      that holds the data for all 101 possible selections. By sizing the buckets according to
      how the results appeared in the actual contest, we can generate a strikingly similar
                                                      Chapter 12 Response Curves        317

          The most common method of constructing hand-crafted response curves is to
      save the data in a file. At run time, we read the data from the file into the response
      curve in almost the same looping fashion as generating it from a function.
      Retrieving the data uses the same process as we have discussed above.


      We can also generate data for a response curve at any point during run time. In fact,
      as we will find, response curves are at their best when storing and selecting data that
      is constantly changing. If we think of the buckets as decisions (e.g., “sugarless” vs.
      “sugary”) and the sizes of the buckets as weights, priorities, or utilities for those
      decisions, we can see how being able to selected a dynamically weighted decision
      opens up many methods of adjusting and selecting behaviors. As the game runs,
      we can change the weights by any method we choose. Any time the weights change,
      we update the edges, re-sort the vector if necessary, and go about the business of
      selecting our behaviors. We will deal with this process in more detail in the next few
This page intentionally left blank
 13                   Factor Weighting

            n Chapters 7, 8, and 9 we discussed various methods of measuring utility. More
            importantly, we discussed that measurements of utility are a core component of
            the decision-making process. It behooves us to examine how we can measure,
        track, and adjust utility in a manner that is clean, efficient, understandable, and
        easy to use.


        When I confessed some of the intricacies of my caffeine usage earlier in the book,
        it would have done you, the gentle reader, very little good for me to say that I “like
        pop.” You would have no idea what that means in the wider arena of likes and dis-
        likes. Even if I were to say that I like pop “a lot,” you have no frame of reference.
        Does that mean “more than average”? More than I like rutabagas? More than I like
        watching football? How close is “a lot” to “more than I could possibly like anything
        in the world”? Or is it more along the lines of “yeah… I suppose I wouldn’t mind
        having a pop”? Without a frame of reference, many of our statements are lost in the
        inherent vagaries of language.

        Even when the words are very specific, we have to take them with a proverbial grain
        of salt. My sister, for example, has met “the funniest guy ever,” seen “the funniest
        show ever,” and eaten “the best sushi ever,”—and done all of them numerous times.
        In fact, she can manage to experience any one of those superlatives in startlingly
        rapid succession. (She once heard three “funniest joke ever” candidates within a
        two-hour period.) My conclusion is that she is either extraordinarily fortunate in

320   Behavioral Mathematics for Game AI

      her life’s adventure, or she just misuses the word ever. Given the spectacularly com-
      mon Internet usage of the word ever in the titles of videos, top 10 lists, and other
      such fare, I have determined that she is not alone in her overuse of the term.
          To make things more complicated, our perceptions can change rapidly depend-
      ing on what we are looking for; something that is the “the most” or “the best” can
      change from moment to moment. For instance, in a previous life, I worked at a
      recording studio as a composer, arranger, keyboard player, and recording engineer.
      (I got out of the biz when the kids wanted to eat more regularly than I was paid.)
      One of the phenomena that I observed there was what I termed the “climbing fader
          When I would be with a band working on the mix of their low-budget master-
      piece, I was often the recipient of a nonstop barrage of requests. Most often, these
      took the form of “dude… I can’t hear the [insert instrument here].” It doesn’t take
      much speculation to guess that each of the members specifically focused on their
      own part of the project. Therefore, each person would ask to hear more of their
      own part. When I acquiesced to each new request by turning up their track, it
      caused the other members to hear less of their own respective parts. In short order,
      another person would complain that their part was too soft. And another… and
      another. I was nudging each instrument higher in the mix, one at a time until,
      eventually, the original complainant was, once again, tapping me on the arm and
      indicating I should boost him further.
           The root problem that caused the climbing fader syndrome was that each per-
      son felt (consciously or subconsciously) that his part should be “the loudest”—an
      acoustical impossibility, to say the least. However, most of them didn’t really think
      that their part should be “out front in the mix,” but because it wasn’t, they didn’t
      perceive it as being there at all. I could usually prove the point fairly simply. When
      I received the comment “I can’t hear my [instrument],” I would respond by simply
      turning off their track for a moment and then turning it back on. The reaction was
      priceless… sometimes involving the person saying, “Oh, there it is! That’s a lot bet-
      ter!” Nothing had changed in the mix. They simply noticed their instrument again
      because they had missed it during the brief absence.
          The lesson of the “climbing fader syndrome” is that not everything can be “the
      best” or “the most.” And more importantly, even when something isn’t at the ex-
      treme, it still is important.
          So what is “the best”? What is “the most”? In a world where we reduce concepts
      to numbers, this is an important concept to hammer out. We could assert that “the
      best” selection is whatever happens to have the highest measurement at any one
      point. The problem with that approach is that we have no frame of reference for
      what “the best” could be. This is an important distinction.
                                                 Chapter 13 Factor Weighting       321

There’s Always Something More
In my recording studio example, at any one time there was a “loudest” instrument.
Most of the time, however, none of them were as loud as they could be. That is why
the climbing fader syndrome could happen. No matter what volume they were at,
I could always turn them up a little bit more. Eventually, however, I would have hit
the top of the range for that particular slider. I would have to then say, “It’s as loud
as it will go.” That point would have been “the loudest.” To make a particular track
louder, the only solution would have been to turn everything else down.
    There is a difference between the topmost value of a group and the topmost
possible value. Making note of where this artificial edge is is essential to scaling our
possible values. Sometimes there is a limit and sometimes there is not. In football,
basketball, or baseball, for example, there is no upper limit to the score that a team
can generate. (Sure, you eventually would run into the limitations of physics and
time, but for all practical purposes, we don’t consider that a problem.)

There’s No Room at the Top
On the other hand, in sports like gymnastics, figure skating, and diving, there is
certainly a maximum that one can obtain. Gymnasts can’t get more than a 10.0, no
matter how accomplished they are. (It used to be that figure skating maxed out at
6.0… I’m not sure what they do now. If you want to see multi-attribute utility
theory in action, check out “ISU Judging System” on Wikipedia.) To make matters
worse, there are floors where the gymnastics scoring starts: 8.8 or 9.2 depending on
the system. You really have to work hard to merit a score below those numbers.
(I was always curious to see what would happen if a skater just sat down on the ice
and didn’t move. Is it even possible to get a 0 in those sports?)
    Capped systems like this can generate something of a quandary. If you, as a
judge, give a contestant a 10.0 on a routine, and someone else comes along later
who performs better, what score can you give that one? The artificial restraints of
the scoring system have limited you to giving the second person a 10.0 as well.
When this happens, you sound a lot like my sister saying that you have just seen
“the best gymnastics routine ever”—from two different people! Certainly that is
unlikely. One of them would have been better than the other—at least marginally.
     On the other hand, if we work with a capless system, we run the risk of encoun-
tering a phenomenon similar to the climbing fader syndrome. If there is no restric-
tion on how high a score can go, we can always justify a score by comparing it to
others. Now, the second gymnast above could score a 10.1. And the next one that
is better could score a 10.2. Unfettered by the practical logistics of physics and time,
subjective scores can get out of hand quickly. Eventually, it is entirely possible that
gymnasts of the future could be scoring 11, 20, 40, or even in the hundreds!
322    Behavioral Mathematics for Game AI

           The solution lies in analyzing the approach that we use to generate our utility
       scores. To do that, we need to take into account a number of different components.
       We need to delve into what makes a score not only relevant, but usable. With that
       in mind, we can tailor our system to make it useable and meaningful without open-
       ing ourselves to pitfalls similar to the ones above.

       We can put a lot of these issues into a better perspective by understanding the dif-
       ference between types of measurements. There are three ways that we can refer to
       values. Each has its advantages and pitfalls. We can see each of the three illustrated
       in Figure 13.1. Note that the graph bars remain the same length in each of the three
       examples. The values that we assign to those bars change, however, depending on
       in what context we place them.

       Absolute Weights
       First, we can assign a concrete value to each bar. This approach is similar to a score
       in a sporting event such as baseball or basketball. The value “is what it is.” This
       method is useful when we truly want to know what the count of something is (like
       the points in a game). For example, if we were calculating how many units we could
       kill with four different strategies, we would want to know the actual number. We
       could then use those values to determine absolute weights for each strategy.
           We have already seen this idea in action. In Chapter 7, we used the actual val-
       ues of the health and damage as absolute weights in our decision. We also used the
       actual costs of building a barracks and a tower and the estimates of how much
       damage they would take. Those were absolute weights as well. The numbers were
       exactly what they represented. If we take 10 points of damage, we represent it as
       “10” in our formulas.
            In Figure 13.1 (left), we would determine that the rightmost bar (16) is the best
       of the four. However, there is nothing precluding us from selecting a different op-
       tion if something better comes along. For example, a different strategy could lead
       us to destroy 17 units. Another could net 20. In fact, there is no real maximum for
       this value. Another strategy at another time may allow us to kill 100. The results of
       our utility function are only useful for determining which of the four strategies will
       give the best results.
           One drawback is that we have absolutely no context in which to judge a given
       score. A frame of reference is required to interpret an absolute score. Unless we
       have other scores to compare it to, we don’t know whether a score is good or bad.
       For instance, if I was to tell you that I achieved a score of 8,423,128 on an unnamed
                                                 Chapter 13 Factor Weighting      323

game, you would have no idea whether that was good or bad. Only if you have
access to other people’s scores, could you then compare my score directly to them
and determine whether or not my braggadocio is warranted.

        FIGURE 13.1 By changing the context, we can refer to three otherwise
           identical measurements in different ways. In the middle example,
                  we have defined an arbitrary maximum at 20 units.

Weights Relative to a Maximum
The second method we can use to assign values to a score is to compare the absolute
weight of what we are measuring to a different value that we predetermine as the
maximum possible value. In the middle of Figure 13.1, we see the same bars as used
in the left example. However, we have added the rule that the maximum value is 20.
The numbers shown in the bars are the relative weights to what they could be. For
example, the first bar in the graph is still an absolute size of 12, but in the context
of a maximum of 20, the score is 0.60. It is 60% of what it could be.
    Anyone who has taken a test is familiar with this sort of weighting. There is
usually a maximum score that we can achieve on the test. If we compare the score
that we actually receive to the maximum possible score, we arrive at a percentage.
This percentage gives us two pieces of information.
    First, we can still compare different scores. We know that the 0.60 for the first
bar is better than the 0.45 for the second one but worse than the 0.80 of the fourth
bar. However, we can also determine the fitness of the scores themselves compared
to the maximum. Even if we have no other decision against which to compare, we
know where it falls in relation to what it could be. This is similar to how, while our
egos may be curious about how our test score compared to our peers, the grader
only really cares about how we did relative to the maximum.
324   Behavioral Mathematics for Game AI

           It is also similar to the gymnastics and skating scoring systems. Without know-
      ing how the other contestants did in any given competition, we can glean informa-
      tion from a single score. We know that a score of 9.975 on a 10-point scale is pretty
      darn good. We can not, however, determine if the recipient of that score won that
      particular competition without knowing the scores of the other participants.
      Therefore, while a weight relative to a maximum allows us to judge the fitness of an
      individual score on its merit alone, we can’t determine “the best” until we compare
      it directly to other scores.
           A variation on this method is that the defined point does not need to be a max-
      imum. Instead, we can define an anchor value to which we compare our values.
      The result is similar, but we now have the possibility that our relative weights can
      be greater than zero. For example, using the same bars as in our previous example,
      we see in Figure 13.2 that setting the anchor to a value of 10 produces different
      relative weights.

         FIGURE 13.2 The value to which we are comparing our scores does not have to
       be a maximum. We can use other anchors. In the right-hand graph, we are comparing
              our absolute weights to a value of 10 to arrive at the weighted scores.

      Weights Relative to Each Other
      The third method, illustrated on the right of Figure 13.1, is another variation of rel-
      ative weighting. In this case, the maximum is not a predetermined value. Instead,
      we compare all the scores to whichever score is the greatest. Therefore, the fourth bar
      (size of 16) is the de facto maximum for the moment. We score the other three bars
      relative to 16. For example, the first bar (size of 12) is 0.75 the size of the fourth.
           One advantage to this method is that we aren’t limited by an arbitrary maximum
      score. Because of this, the method scales itself as the scores change. For example, as
      a situation changes, the idea of what makes a “good score” may vary significantly.
                                                         Chapter 13 Factor Weighting       325

       In the example in Figure 13.1 (right), the fourth bar is the front-runner with a
       value of 16. Later on, the top score may be 32, for example. At that point, a score of
       16 (0.50) doesn’t look so hot.
            Also, because of the automatic scaling of this method, the relative differences
       between scores yield important information. When the top score is 16, the differ-
       ence between the scores of 12 and 9 are fairly significant: 0.75 – 0.56 = 0.19.
       However, if the top score is 32, the scores of 12 and 9 are 0.375 and 0.281, respec-
       tively. The difference between them is only 0.09 now. If the top score is 100, that
       difference drops to 0.03. For all intents and purposes, when compared to a value of
       100, the scores of 12 and 9 are becoming identically poor.
           As a variation, we don’t have to use the greatest value as our comparison point.
       At times, it may be beneficial to use the lowest value instead (Figure 13.3). For
       example, if we were concerned with the least time in which an action could be per-
       formed (as opposed to the highest score), we might want to rate the other options
       by comparing them to the quickest one. Once again, using our original four values,
       we recalculate their weights relative to the smallest of the four—the third bar’s size of
       7. The method that we select will depend on the problem we are trying to address.

              FIGURE 13.3 The value to which we compare the other scores does not have
                to be the maximum. At times, we may want to know how the other scores
                                   compare to the lowest selection.

       Another issue that we need to be mindful of when scaling our weight scores is
       granularity. This is comparable to the concept of significant digits in science and
       math—often with the description of superfluous precision. For our purposes, we can
       think of it in terms of the accuracy to which we need to either calculate or keep
       track of something.
326   Behavioral Mathematics for Game AI

      One of the main considerations for establishing a correct level of granularity is
      accuracy of differentiation. This is the level at which we can discern differences
      between adjacent measurements. An excellent example of this is hanging in many
      hospital and doctor’s examination rooms. Medical professionals have standardized
      a “pain scale” to assist in quantifying patients’ otherwise (very) subjective reports of
      the discomfort they feel (Figure 13.4).

           FIGURE 13.4 A typical pain chart runs from 0 (no pain) to 10 (extreme pain).
            The granularity of the ratings allows 11 selections. The faces allow 6 ratings.

           The pain scale ranges from 0 to 10, which represents the range from “no pain”
      to “extreme pain.” The patients report the number that corresponds to how they
      feel at the moment. The face icons included on most charts are helpful for children
      to identify how they feel.
           The question that we must ask ourselves is “Why 0 to 10?” Aside from 10 being
      a nice round number, there is a reason the creators of the pain scale designed it with
      this granularity. On the 10-point scale, we can differentiate between 6 and 7. The
      smallest possible unit on the 10-point scale represents a gradation that we can ac-
      tually discern and describe. The 11 selections are no more accurate than our own
      ability to sense and rate our pain level. We probably would have no reason to spec-
      ify that our pain level was at a 6.5, for example.

      Too Many
      On the other hand, imagine that the pain scale ran from 0 to 100 instead. The
      smallest possible unit on a 100-point scale is one. Can we tell the difference between
      60 and 61? We find that one out of 100 units is too small to be able to meaningfully
      differentiate the levels of pain.
                                                 Chapter 13 Factor Weighting       327

     A similar reason for the 0 to 10 range of the pain scale is the usability of the
information. While a doctor may respond differently if you report your pain at 7
rather than 6, he is unlikely to react much differently if your pain is a 61 rather than
60. This is the portion that is similar to significant digits in science and math calcu-
lations. Certainly, there is a numerical difference between 61 and 60, but what does
it really mean? If the granularity at which you are tracking a value is finer than the
granularity at which you are able to measure the value in the first place, the extra
data is superfluous precision.

Not Enough
On the other hand, a pain scale whose range was less than 10 may possibly not be
expressive enough. For example, if we change the scale so that “extreme pain” is 3
rather than 10, it is more difficult to express exactly how we feel. (“Well, doc… I’m
not a 1, and 2 seems a little high. I’m kinda somewhere between 1 and 2 but closer
to 2 than to 1.”) It also means that the doctor doesn’t get as accurate of a picture of
our condition from which to work.
     Interestingly, most of the pain charts show only 6 face icons as I did in Figure
13.4. The reason for this becomes startlingly clear when we attempt to determine
how we would “tweak” the faces to create ones in between the existing set. (There’s
only so much expression that you can present with smiley faces!) This leads to a dif-
ferent granularity for the “smiley face” version of the pain chart than the number
line version.
     However, as we will discuss later, it isn’t necessarily a correct assumption that
we should limit ourselves to only the outputs that we can display such as in the case
of the faces. Using response curves, for example, we can always combine portions
of the range into one displayed behavior. We can still benefit from tracking the un-
derlying data without being concerned with what we can exhibit externally. This is
especially important in the game development world where we are typically limited
on which animations and actions we can display.
    An extreme example would be a behavior that had only two outwardly visible
signs such as the “fight or flee” scenario we have referenced a few times. The response
curve could continue to track a more granular “degree of fear” (or some other such
value) and only trigger the change to “flee” when “degree of fear” reaches a partic-
ular threshold.
    The essential secret to constructing a weighting scale, therefore, is to arrange it
so that there is enough granularity to express the detail you need to differentiate
meaningful differences—and no more!
328   Behavioral Mathematics for Game AI

      Data Considerations
      Another consideration when determining granularity is the data structure we will
      use. Many people use a variable that can represent a plethora of decimal values
      from 0 to 1. That level of detail is often more than we need. Often, we only need to
      track a relative handful of values. For example, a single byte allows us to store 256
      values. Most of the utility scales we would need to track in our games would not ne-
      cessitate a finer granularity than that.
           By using response curves, we can map the 256 values in a byte onto a large variety
      of actual figures. For example, if we wanted to represent numbers from 0 to 1,000
      but didn’t need to have a granularity of 1/1000, we could use a response curve that
      associates the indices 0 through 250 with appropriately spaced decimal values using
      the formula y = 4x. The contents of our response curve vector would look like this:
          i                 y
          0                 0
          1                 4
          2                 8
          3                 12
          …                 …
          125               500
          …                 …
          249               996
          250               1,000

          We are still representing numbers from 0 to 1,000, but we are not doing so with
      possibly unnecessary granularity. By using a char instead of a short int, we are
      saving one byte of data storage. While this may not seem like much at first, when
      multiplied many times over, the savings can add up.
          Certainly, one could make the case that we could simply store the numbers 0 to
      250 and multiply them by 4 when we are ready to use them. As the formulas get
      more complex—and especially when we hand-craft our response curves—this
      approach yields significant benefits.
          Moreover, when dealing with arrays or vectors, by only storing token values
      along the way, we don’t have to have an array that covers the entire range of values.
      In the above example, despite wanting to represent values that range from 0 to
      1,000 in our array, we did so in only 251 array locations. This effect is even more
      noticeable when we want to store larger ranges. Just because we are storing numbers
      ranging from 0 to 25,000 doesn’t mean we care about every individual increment.
      We may not need a granularity that fine.
                                                        Chapter 13 Factor Weighting      329


       With all of the above methods, tips, and tricks in place, we are now able to approach
       the issue of actually scoring the value of something. To make our decisions, we need
       to consider information. Often, the information is not as simple as a binary “yes or
       no.” For example, in a decision about whether or not we should get a health kit, we
       aren’t using “alive or dead” as a factor. We are concerned with the gray areas of how
       much health we currently have. Are we fully healthy? Doing well? A little injured?
       Are we dragging our limbs behind us? By taking that entire spectrum of possibili-
       ties into account, we can craft deeper, more meaningful decisions.
            One of the first considerations when deciding how we should weight a single
       criterion is whether we are tracking the actual value of something or the utility. The
       former case is more intuitive and certainly more common.

       We count and measure plenty of things in games: units, buildings, numbers of bul-
       lets in our guns, damage dealt over time, health, and how much money items cost.
       We often use these concrete numbers in our decisions. In fact, much of the heavy lift-
       ing of decision making involves concrete numbers. The game world is no exception.
           For example, we may take the number of bullets remaining in our gun as a de-
       ciding factor on whether or not we should reload. Obviously, if the number drops
       to 0, reloading becomes relatively important. However, we can treat this in a non-
       binary fashion as well: the lower the number, the more we may consider taking a
       moment to reload.
            Another decision may involve comparing our health to the damage we are ex-
       pecting to receive from an opponent over time. We can compare this to the dam-
       age we are likely to deal over time and the opponent’s health to determine which of
       us is going to survive a conflict. All of the figures in the above calculation are con-
       crete numbers: damage (d) over time (t) vs. health (h).

       The game world is sometimes too enamored with concrete numbers. This is likely
       a by-product of the fact that computers count and perform math so well. Therefore,
       we programmers tend to gravitate toward using things that our computers can
       count and mathematically juggle. A typical thought process may sound like:

           “Ok, then we will add up the number of…, multiply it by the number of…
           and, if it is greater than three times the number of…, we will do [insert noble
           deed] as many times as it takes to construct the right number of….”
330   Behavioral Mathematics for Game AI

          Everything there involves concrete counts and figures. “The number of…” is a
      very safe and comfortable place for computers (and their programmers).
           As we discussed earlier in the book, concrete counts and measurements don’t
      necessarily correspond equally to utility. Therefore, when dealing with these types
      of items, we need to determine which of the two values we are going to measure. In
      Chapter 8, we discussed numerous examples where we were building armies. The
      number of soldiers we had already built was a criterion in the decision. More im-
      portantly, however, we were measuring the marginal utility of building additional
      soldiers. The abstract concept of utility is different from the concrete count that we
      started with.
           If we are using the raw number or value of an object in a calculation, there is lit-
      tle that we need to consider with regard to how we are weighting it as a decision cri-
      terion. However, when switching to utility, we have some subjective judgments to
      make. As such, the range and granularity that we select for our scale will be largely
           In this book so far, we have covered many different examples that we would
      treat in a variety of manners. Even going back to my razor blades in Chapter 2, we
      could devise a way of scoring my relative satisfaction with the blades. We may use
      the number of uses as a starting point, but my satisfaction with a blade is not as lin-
      ear as counting the times I have used it. For example, we could state that a blade
      starts with a quality of 100 when it is brand new, but, as I use it, its quality decreases
      rapidly at first and then gradually flattens out toward a value of 0 where I am not
      satisfied with it at all.
           Regardless of how we arrived at the satisfaction value, our range for this value
      is 0 to 100. We can even think of it as a percentage of satisfaction. (Note that we
      don’t need a lot of granularity here. Therefore, we don’t need to use a decimal
      value. A byte type to store the values of 0 to 100 performs just fine in this regard.)
      Once we have determined our satisfaction with all five blades, we can then compare
      them and decide which one we are going to use.
          In other examples, my daughter, Kathy, needed ways of measuring the worth of
      various topics that she wanted to put into her speech for her fifth grade election
      campaign. I needed a way of rating the quality of the food and the atmosphere at a
      particular restaurant when I took my family out to dinner. We could have even put
      a subjective value on the sense of altruism that we felt by giving more than was nec-
      essary in the Ultimatum Game.
          Of course, it is certainly feasible for us to track both the value and the utility
      separately. This is often the case when the corresponding subjective utility for a
      concrete number of items may change based on other parameters. In our $20 bill
                                                         Chapter 13 Factor Weighting       331

       example, the face value of $20 didn’t mean the same to everyone. The 20 is a concrete
       value; the utility that people put on those $20 is an abstract rating. We will revisit
       this later.


       The important task above was to decide on the range (and, by association, the gran-
       ularity) we were going to use to score our criterion. When all we are concerned
       about is a single criterion, we have plenty of flexibility in how we do this. However,
       when we plan on using one utility score in conjunction with others, it behooves us
       to keep the whole picture in mind.

       One way of making the job of comparing, contrasting, and combining disparate
       numbers easier is to use the same scale for all of them. If, for example, we are track-
       ing satisfaction ratings for various things, we want to place all satisfaction ratings on
       the same scale. In Chapter 9, we examined the hedonic calculus factors that could
       be involved in a decision on where to go out for dinner. While some of the factors
       were concrete (such as cost, travel time, and wait time), some of them were subjec-
       tive satisfaction ratings. By making sure that we were rating all satisfaction-based
       values on the same scale, we can ensure that we have little difficulty later on.
            Technically, the term normalization refers to the process of stretching or
       shrinking a range by a normalizing constant so that it fills the space from 0 to 1.
       This is a very common practice in probability math. While we may not be using it for
       exactly the same purpose, the rationale behind normalizing subjective factors is
       similar: put everything into one, homogenous template. For our purposes, this
       template doesn’t have to be 0 to 1. As we have suggested above, we could use the in-
       tegers 0 to 100 (pseudo-percentages), 0 to 255 (fill a byte variable), or something
       as simple as 0 to 10 such as what the pain scale uses. There is one very significant ad-
       vantage to using 0 to 1, however: when we multiply factors that range from 0 to 1
       together, the product is still in the range of 0 to 1. That is remarkably handy when
       producing weighted sums or weighted averages, for instance. Regardless of the range,
       however, the main consideration is the granularity that we require for the value.
            As we mentioned above, we can normalize concrete values as well. We can do
       this linearly or through a function such as the ones we covered in Chapter 10.
332   Behavioral Mathematics for Game AI

      Linear Normalization
      Linear normalization is the process of converting a concrete range into a normalized
      range on a proportional basis. We do this by determining the normalizing constant
      that we need to apply to the raw value to convert it to the normalized version.

       I N T HE G AME   How Much Weight?

      For example, we could convert the amount of weight a character in a role-playing
      game (RPG) is carrying into a normalized value. If the character can carry a maxi-
      mum total weight of 75 pounds of equipment, we can use that as the endpoint of
      our normalized range. Therefore, 75 pounds = 1.0. In this case, the normalizing
      constant is 75. At any point, we can calculate the load percentage that the character
      is toting around by dividing the actual weight by 75.

         If our character is burdened with 52 pounds of items, the normalized weight
      would be

          While this may look like an unnecessary complication, one advantage we gain
      from normalization of factors is that we can then standardize the rules we use to
      apply those factors to decisions.
          For example, let us assume that our game design states that different characters
      can carry different amounts of weight. At any time, we can calculate the character’s
      normalized weight (burden) through

          Also, we want to determine that a character can no longer play hopscotch when
      he is carrying too much of a load. However, the load that prohibits playing hop-
      scotch is based on the percentage that he is loaded (after all, stronger people can still
      hop around and pick up little pebbles while carrying more junk). If we state the rule
      that hopscotch is no longer possible once a character has more than 60% of his car-
      rying capacity, our hapless dude in the above example is going to be left out of the
      game until he can rid himself of at least some of his burden. On the other hand (or
      other foot), someone who could carry a maximum of 90 pounds of equipment
      would still be able to play hopscotch while carrying the same 52 pounds of items.
                                                 Chapter 13 Factor Weighting       333

     We could do the calculations for the percentage only when we need them, or
we could do them ahead of time by normalizing the idea of “burden” at all times.
There is a significant advantage to normalizing. The decision to play hopscotch
does not need to take into account the weight the player is carrying or the maxi-
mum capacity. All we need to check is if the normalized weight (burden) is less than
the threshold figure of 0.60. In fact, we can then construct all rules for all decisions
that have to do with burden based on this single normalized value. We may decide
that jumping rope requires a burden of less than 0.40, basketball requires less than
a 0.20 burden, and that even walking requires that the character is less that 90%
loaded (Figure 13.5).

FIGURE 13.5 By dividing the actual weight carried by the maximum weight a character
   can carry, we can determine a normalized “burden” value that ranges from 0 to 1.
         We can then set thresholds for various activities all on the same scale.
                We don’t have to know the actual weight that permits or
                  prohibits each activity—only the normalized burden.

     We can even use burden in other ways. We could state that fatigue over time
is a function of relative burden rather than weight. We could represent this by the
334   Behavioral Mathematics for Game AI

          Notice that we do not have to reference the amount of weight the character is
      carrying or the maximum amount that he can carry. We have rolled all of that in-
      formation into the tidy package of the factor Wnormalized. We can always expect a
      value of between 0 and 1 regardless of how much the character can carry or how
      much he has on him at the time.

      Nonlinear Normalization
      There are times when a direct mapping of values provided by the linear normaliza-
      tion method above does not suit our purposes. Thinking back to the chapters on
      utility (especially increasing and decreasing marginal utility), we often need a way
      of representing the changes in a number that reflects a particular formula. Using
      the same approach as we did with linear normalization, we can generate normalized
      values that fall within a certain range (such as between 0 and 1). Instead of using a
      constant to normalize the values, however, we can use a normalizing function. The
      specifics of how we would go about this vary widely depending on the type of for-
      mula we are using. The end requirement is that our values lie within a specified range.

       I N T HE G AME   Are We There Yet?

      One of the examples we used earlier in Chapter 8 was building squads and armies.
      We used this example to show the effects of marginal utility—both increasing and
      decreasing. However, as with the weight example above, it could be awkward to
      constantly change the marginal utility formula based on a differing number of sol-
      diers. At times, we may want to achieve a threshold of having four soldiers. At
      other times, for other tasks, we may want 10. For bigger, more aggressive jobs, we
      could say that 20 is enough. Even changes in the difficulty level of the game could
      warrant having slightly different numbers.
          To accommodate these variable requirements, we need to normalize our mar-
      ginal utility curve so that it scales to the number of units we are trying to create. We
      begin by deciding that our utility values for each soldier will range from 0 to 1.
      More accurately, they will range from 1 down to 0 as we add additional soldiers. We
      then set the utility of the nth soldier to the following formula, with goal being the
      number of soldiers that we will be satisfied with acquiring.
                                                    Chapter 13 Factor Weighting         335

     As with the weight example above, the denominator of the fraction here repre-
sents the maximum value that we are using as our “endpoint.” In the weight exam-
ple, we divided by the constant that represented the maximum weight a character
could carry. In this case, goal is the maximum number of soldiers we are striving
for. The approach is the same—actual number divided by maximum number. The
only difference is that we have an exponent in the formula. If we apply this expo-
nent to both the numerator and the denominator, the proportion of actual to max-
imum remains intact.
     If we graph results from a number of different values of goal, we find that the
curve is identical for all of them (Figure 13.6). In all cases, the utility of the first sol-
dier is 1.0. Then as we move through the range from 1 to goal, the utility of each sol-
dier decreases as a rate proportional to how many total soldiers we are building.
When n = goal, the utility is 0. (In practice, we would adjust the formula so that the
utility reaches 0 after the last soldier by changing the denominator of the fraction to
(goal + 1)3. Otherwise, with a utility of 0, we would never build the last soldier. For
clarity of the graphs, I did not do so here.)

        FIGURE 13.6 By constructing a formula that normalizes the utility values,
          the results follow the same curve from 1 down to 0 regardless of how
                               many soldiers we are building.
336   Behavioral Mathematics for Game AI

      Comparing Normalized Values
      By normalizing utility values, we can more easily use the resulting figure in a variety
      of comparisons and calculations that are not directly dependant on the potentially
      variable components that make up the utility. We have a homogenous “yardstick”
      that requires no additional processing to determine which is more important. As
      we touched on earlier, scoring systems in gymnastics and skating have a defined
      range in which to work. Regardless of the complicated (and sometimes controver-
      sial) processes that go into determining the final score, once we have that score, we
      can directly compare contestants and easily determine who performed better. We
      can determine the winner by the startlingly simple formula, “is ScoreA > ScoreB?”

       I N T HE G AME   Who’s Next?

      Going back to our soldier example, if we have followed a similar procedure in de-
      termining the marginal utility of building a new worker unit, we can compare that
      utility with the utility of building another soldier.
          To illustrate this example, let us assume that the formula for workers is

         Also, to homogenize the nomenclature (since that’s what this section is all
      about!), let us re-label the formula for soldiers

           By only raising the worker’s formula to the power of two, it makes for a flatter,
      less cornered curve. This is the equivalent of saying that we aren’t quite as driven to
      make sure we have a minimum number of workers as we are for soldiers.
           Theoretically, the decision to build a worker instead of a soldier would then be
      as simple as the comparison

           This direct comparison would not need to take into account how many work-
      ers or soldiers we need to build to accomplish our current civic and military agen-
      das, respectively, as this would already be included in the normalization process for
      each type of unit.
                                                Chapter 13 Factor Weighting     337

    To see this process in action, we will now plug in some numbers. Let’s assume
that our current goals and counts for workers and solders are:

    Unit                Goal          Count            Utility
    Worker              20            5                0.938
    Soldier             20            5                0.984

     By setting the goals and counts equal to each other, we show that the priority
for soldiers is, indeed, higher. Let’s assume that we have built a few more soldiers
and see how this changes.

    Unit                Goal          Count            Utility
    Worker              20            5                0.938
    Soldier             20            8                0.936

    Once we have built our eighth soldier, the utility for the next soldier drops to
the point where it is now less than the utility for the next worker. Our next build,
therefore, would be our sixth worker. The utility of our seventh worker is 0.910,
meaning we would build yet another soldier. Eventually, as we acquire more of each
type, the count of workers catches up to the soldiers. (Specifically, we would build
two workers in a row a couple of times. At a count of 18, the number of workers
matches that of the soldiers, and they alternate again from that point.)

Different Goals for Different Folks
While this is all well and good, the power of using normalized utility is when we are
working with unequal goals. For example, using the same utility formulas, let us as-
sume that we would like to have 35 soldiers and 20 workers. At the beginning, our
data would look like this:

    Unit                Goal          Count            Utility
    Worker              20            0                1.000
    Soldier             35            0                1.000

     After building one of each, the next four builds are of soldiers. Once we have
five, the normalized utility of the next soldier falls to 0.997, while the normalized
utility of the second worker is 0.998. Therefore, only after we have five soldiers
would we build our second worker.
338   Behavioral Mathematics for Game AI

          Unit              Goal               Count          Utility
          Worker            20                 1              0.998
          Soldier           35                 5              0.997

          We build our third worker when we have eight soldiers, and so on.
           Going the other way, let’s assume that we only want to build eight soldiers.
      Remember, however, that we biased the curves to build soldiers earlier. The results
      are surprising. The builds alternate until we have five of each. Notice that we would
      then have 63% of our soldiers built but only 25% of our workers! Only then do we
      start building multiple workers before building another soldier. In fact, only after
      building our tenth worker would we build our sixth soldier.

          Unit              Goal               Count          Utility
          Worker            20                 10             0.750
          Soldier           8                  5              0.756

          At 13 workers, we would build our seventh soldier and at 17 workers we build
      our eighth soldier.
          By normalizing the utility values so that they are represented on the same scale,
      the formulas that we used to calculate the marginal utility for each unit type are
      hidden from the comparison that determines what to build next. We were able to
      concentrate on just the single “which is greater?” expression to determine what to
      build next.

      Soldiers, Sailors, Airmen… or Workers?
      The worth of this uncomplicated method is even more apparent when we consider
      adding more unit types. For example, even if we have 15 or 20 different types of
      units, each with their own utility formulas, goal populations, and current counts,
      we can trust that the normalized utilities are all on the same 0 to 1 scale. Therefore,
      by sorting the list of units by their current utility values, we can select the highest
      one as the “most important” to build. This selection process would still be valid
      even if we changed the marginal utility formula for any of the unit types.

         As we have seen, normalization is a process that helps us make sense of what
      would otherwise be moving targets. It is a tool that we will use often.
                                                        Chapter 13 Factor Weighting      339

       Most of the time, we find that a single criterion is not enough to make a decision.
       Many decisions are a combination of two or more criteria. Except for rare circum-
       stances, the weights that factors have in the decision are usually not equal. One fac-
       tor is typically more important than another. As we increase the number of factors
       to consider in a decision, the likelihood that they all have the same importance
            The most common method of balancing the significance of each factor is
       through the process known as weighted sums. Through weighted sums, we con-
       struct a single value from multiple factors—each with their own value. We pair
       each component value with a coefficient that represents its weight in the overall de-
       cision. The resulting combined value reflects not only the component values, but
       the proportional “meaningfulness” for each of those components.
            If we think in terms of vector math, the weights are the same as the magnitude
       of a vector. While the direction of the vector provides directional information, the
       magnitude tells us how far to move in that direction.
           A generic formula for a two-component weighted sum looks like the following:

           In English, the weighted result of X and Y (RXY) is the weight for X (wX) times
       the value of X (vX) plus the weight for Y (wY) times the value of Y (vY).
           We can normalize the result by dividing by the sum of the weights.

           This results in a weighted mean of the values. By doing this, we achieve the effect
       of putting the result on the same scale as the component values. This is especially
       advantageous if we have already normalized the values to the same scale. By using
       the weighted mean, we ensure that the result is directly comparable to the original
       values. For example, if the component values are normalized to a range of 0 to 1,
       the weighted mean is between 0 and 1 as well.
           Certainly, we are not limited to two components. For the sake of completeness,
       the formula for n components is
340   Behavioral Mathematics for Game AI

          Or (if you are like me and the big E sign scares you)

      Deciding on Dinner
      If my wife, Laurie, and I were to decide what we were going to eat for dinner (I
      really have to stop working on this book when I’m hungry), we may have a conver-
      sation about what we are interested in eating. Naturally, we both have preferences,
      and, just as naturally, those preferences may differ. We may not, however, always
      put equal weight on our preferences. In a typical example, Laurie may say that she
      really doesn’t care what we have. (Whether or not I can believe that she really doesn’t
      care what we eat or whether I am supposed to guess is a topic for a book on under-
      standing women, however, and beyond the scope of this chapter.) While I don’t
      want to entirely disregard her preferences, I can assume at this point that she wants
      my preference to have more weight than her own. To reflect this, we can weight my
      utility for varying dinner selections more than we do her utility for those same
          Let’s assume that our four options for the evening are a steak dinner at a restau-
      rant (I’m just not firing up my grill when it’s freezing out!), Chinese take-out, nuking
      some burritos, and cooking a frozen pizza. First, we would establish the coefficients
      that we will use in the formula so that our preferences are proportional to how
      much consideration we put on our respective preferences. From those coefficients,
      we can calculate a combined utility that reflects our joint decision.
           Because I don’t want to completely disregard my wife’s desires, we will estab-
      lish the coefficients as:

          Dave             2
          Laurie           1

          This means my desires are worth twice as much as hers. Put another way, instead
      of splitting the decision 50/50, my preferences are worth 67% and hers are worth
          Put into a utility formula, this gives us:

           Assuming that we have normalized the two personal utilities on a scale from 0
      to 1, we can normalize the combined score by dividing by the sum of the coefficients.
                                                      Chapter 13 Factor Weighting      341

           When we identify our personal preferences for the four potential dinner op-
      tions and insert them into the above formula, we arrive at a combined utility value
      for each one.

          Selection                 Dave           Laurie         Combined
          Steak dinner              1.0            1.0            1.0
          Chinese take-out          0.1            0.7            0.3
          Microwave burritos        0.4            0.1            0.3
          Frozen pizza              0.4            0.6            0.5

          Examining the data above yields a few interesting observations. First, while I
      look at the burritos and pizza with the same opinion, Laurie really doesn’t like the
      burritos. Therefore, her preference acts a tie-breaker. We can see this reflected in
      the combined score where the pizza ranks higher than the burritos.
           Additionally, despite the fact that she likes Chinese take-out quite a bit—more
      than the frozen pizza, because I’m not in the mood for Chinese, the score suffers
      significantly. Much to Laurie’s dismay, the Chinese take-out option ends up with
      the same score as the burritos. Her difference in preference between the two is very
      large (0.6), whereas mine is relatively small (0.3). Because my opinion matters twice
      as much, however, that 0.3 difference means as much as her 0.6.
          The end result is less than surprising. With both of us preferring to go out for
      steak, that option easily wins out. (And the agreement likely spares me the potential
      of annoying my wife with my choice regardless of the fact that she said she didn’t
      care. That is a factor that carries a lot of utility value for me!)


      Even after we have calculated or combined factors to arrive at a utility function, we
      may not have enough information to process a decision. While weighted sums
      allow us to combine (and even normalize) similar information, decisions often
      take disparate pieces of information into account. It is usually simpler to combine
      similar items together first to arrive at an aggregate value. For example, we com-
      bined the two sets of dinner preferences into one combined dinner preference
      value. Once we have arrived at these combined figures, we can combine them with
      other figures in another step of the process. By doing this, we are creating a layered
      weighting model. In essence, we are building our decision in tiers.
342    Behavioral Mathematics for Game AI

           We touched on this process briefly in Chapter 3 using the example of how my
       children’s teachers calculate grades through first combining the scores for home-
       work, tests, and quizzes together into their own scores and then combining those
       scores together into one aggregate score. We revisited the idea through our discus-
       sion of multi-attribute utility theory in Chapter 9. Each of these examples follows
       the same pattern. We start with small blocks of information that we then combine
       into larger blocks. Theoretically, there is no limit to how many layers we can build
       or how many pieces of information can go into any one aggregate.
            Once again, we can use any of the above methods to do this. Typically, however,
       weighted sums allow us to combine the component parts together in the most flex-
       ible manner.

       The unfortunate weakness in the dinner example above is that our desire (separate
       or combined) is not the only factor in the decision. There are other thoughts that we
       must entertain. For example, we often have three factors that we want to take into

           desire          How much we desire the selection
           price           The price of the selection
           time            The time it takes to acquire the selection

            We can address each of the three items above separately. In the previous exam-
       ple, we utilized weighted sums to construct a combined utility of desire. We could
       generate utilities for the price of the food and the time it would take to acquire it in
       a similar fashion. We could use weighted sums, response curves, or any other
       method to determine the utility of each of those factors. Regardless of how we cal-
       culated them, we would arrive at a single utility figure for price and time. When we
       have arrived at those figures, we will once again use weighted sums to combine our
       preferences for the four dinner options with the other factors.

       You Can’t Always Eat What You Want
       Just as we decided how to weight my preference for dinner against Laurie’s, we must
       do the same with the three factors that we are including in this layer of the decision.
       Because we have a lot of game programming (or book writing) to do, we decide that
       the time it takes to acquire the selection is the most important factor. Additionally,
       because game programming (or writing about game artificial life [AI]) doesn’t
       make us rich, we decide that the price of our prospective forage is the second most
       important factor. While there is some consideration for how much we actually like
       the food, its impact is incidental.
                                                  Chapter 13 Factor Weighting       343

    Looking at the factors in order of decreasing importance, therefore, we find:
    time > price > desire
    The question we need to answer is, “What is the proportional relationship
between these factors?” To determine this, we can assign coefficients to each of the
three factors so that their respective ratios are reflective of the relative merits of
the factors. Let’s say, being the wise AI programmers that we have become by this
point, we decide that the coefficients are as follows:

    time                    5
    price                   3
    desire                  2

     As you can see, price is 1.5 times as important as desire but a little more than
half as important as time. Also, time is 2.5 times as important as desire.
    With the magnitude established, we can pair them with the individual utility
scores for each of the factors to construct a single formula that takes all three com-
ponents into account.

     As we can see, the utility (U) is the sum of the three component utility values
(t, p, d) after weighting them appropriately. By plugging in some sample data, we
can test drive our formula and its weights. We are working with the assumption
that we have calculated the three utility values elsewhere and come away with nor-
malized utilities for each of them.
     One important thing to note is that the time values are inversely proportional
to the actual time that it takes to acquire the food. In this case, driving to the steak-
house and waiting for the food is the longest time, so it is the lowest utility.
Microwaving a couple of burritos is the shortest time, giving it the maximum utility
of 1.0. The same pattern exists for price: The steak dinner is the most expensive,
thereby garnering the lowest utility score, and the frozen pizza is the least expensive.

    Selection                   Time            Price         Desire         Utility
    Steak dinner                0.1             0.1           1.0            2.8
    Chinese take-out            0.3             0.6           0.3            3.9
    Microwave burritos          1.0             0.8           0.3            8.0
    Frozen pizza                0.7             1.0           0.5            7.5
344   Behavioral Mathematics for Game AI

           As we can see from the results, heating up some burritos is our selected cena de
      noche. By analyzing the chart, we can see evidence of the weighted sum in action.
      First, we can reflect on the somewhat depressing fact that it is the selection that we
      least desired (right along with the kung pao chicken). However, because we were
      more interested in the time factor, the quick availability of the burritos had a sig-
      nificant advantage. The combination of having to drive to pick up our Chinese and
      the fact that it is a little more expensive seriously reduced the overall utility for the
      take-out food.
          Similarly, as we had suggested above, desire alone is not the only factor in this
      decision. Despite a craving to go out for steak that is double the utility of the sec-
      ond-place food (pizza), because it is the slowest and most expensive option, it
      dropped to last on our list. Put another way, regardless of how much we would have
      liked to eat steak at our neighborhood restaurant, our current priority structure
      (time > price > desire) did not allow it.

      You Don’t Always Have to Eat at Your Desk
      For the sake of completeness, let’s assume that I have finished writing this book
      (thereby freeing up my schedule) and that, because my publisher has anticipated
      that it will sell 500,000 copies, I receive a massive check from them. (OK… 500,000
      is a stretch, but run with me on this, would ya?) Obviously, our priorities for a
      celebratory dinner would be different from a typical night of writing and program-
      ming. To reflect this, we now weigh our utilities with the following numbers:

          time            1
          price           1
          desire          4

          For the sake of example, let’s assume that our utilities for desire, price, and time
      have not changed, nor have the relative prices of the four selections. They taste the
      same to us, they take the same amount of time to prepare or acquire, and they cost
      the same as they did before. The only change is in the priority of how we view those
      three factors.
          Using the new formula,

      our results are now:
                                                           Chapter 13 Factor Weighting    345

           Selection                  Time           Price            Desire        Utility
           Steak dinner               0.1            0.1              1.0           4.2
           Chinese take-out           0.3            0.6              0.3           2.1
           Microwave burritos         1.0            0.8              0.3           3.0
           Frozen pizza               0.7            1.0              0.5           3.7

            As we would expect, with time and price being relatively unimportant this time
       around, we would now elect to go out for steak. It is the most time-consuming and
       the most costly, yet those two factors combined cannot make up for the fact that it
       is also the most desired meal. Tonight, what we want to eat is twice as important as
       time and price combined. The frozen pizza gets some consideration by being cheap
       and fast, but it is not enough to overcome the very important factor of desire.
       Similarly, the burritos are cheap and fast, but those factors are simply not as rele-
       vant anymore.

       It’s worth repeating that three different dynamics are in play here—one in each of
       the three layers (our individual desires, the weighted combination of desires be-
       tween the two of us, and the combination of desire with price and time). We can
       think of the multi-layered weighting model as a large filter. Changes that happen in
       any portion of the process will change, to some extent, the end result. The amount
       of change depends on the number and configuration of the filter layers through
       which a decision must pass. In this example, when we submit a dinner option (such
       as pizza), it passes through the different layers of the filter process (Figure 13.7).

       FIGURE 13.7 A multi-layered weighting model acts like a filter through which our utility
        information passes. Each component adjusts and weighs the data as it passes through.
                 Changes to any part of the process will also change the end product.
346    Behavioral Mathematics for Game AI

           First, any changes Laurie and I make to our preferences for food are going to
       change our personal desire utilities. (I actually do really like Chinese food most of
       the time!) We can change how much we desire various foods. Additionally, we
       could change the weights of who gets to decide. There may be times when we would
       weight her preference equal to—or even greater than—mine. However, that is only
       the first step in the process.
            Any changes we make to process that determines the utilities of time or price
       for a specific meal are going to propagate from the first layer to the second just as
       the changes to our personal desire weights do. However, the weight of the factor
       where we made the change throttles the magnitude of the changes somewhat. It is
       entirely possible that a major change in one factor will have very little effect in the
       overall decision because that factor’s weight is minimal. On the other hand, if a fac-
       tor weighs heavily in the decision, even a minor change can cause a significant shift
       in the final number.
           The last layer is the weights themselves. We must remember to think of the
       weights as their own utility function. Naturally, changes we make to those weights
       have a direct effect on the outcome.
           After passing through all layers of the filter, we arrive at a single value that is a
       composite of all the processes and weights that are in place above it. We can then
       compare these final utility values to determine which selection has passed through

       There are no limits to the depth or breadth of the multi-layered weighting process.
       We can have as many levels as we need. Each layer can have many different compo-
       nents as well. Our only limit is the information that we have available in our design.
       (From a design standpoint, an alternate mentality is to say, “This is the decision I
       want my characters to make… what information should my game track to facilitate
       it?”) Of course, the complexity increases as we add more factors and layers.
            There are methods for managing this complexity, however. Most of the meth-
       ods involve making sure we don’t confuse ourselves through our own process. One
       such method is a topic we already covered in this chapter. By ensuring the homo-
       geneity of our data by establishing limits and practicing normalization, we keep the
       relationships between factors relatively simple.
           As a quick example, if we were to have scored Laurie’s dinner preferences on
       one scale and mine on another, the process of weighting them appropriately would
       have been more complicated. By scoring all preferences on a normalized scale of 0
       to 1, we ensured that the 2:1 weight ratio between my desires and hers was, indeed,
                                                  Chapter 13 Factor Weighting        347

2 to 1. If, instead, we score my ratings between 0 and 3 and hers between 0 and 5,
we have to perform one extra step to say that, at the moment, my desires are worth
twice as much as hers.
    The goal is to isolate each individual component as much as possible. If we can
ensure the integrity of a particular component, we do not need to know the process
that generates the inputs to it, or what is going to happen to the output it generates.
We have established compartmentalized confidence in that single portion of the
    If we trust that the decision model for each of our components is perfectly
valid—in its own scope—then we can subsequently trust that they will all work
together well (Figure 13.8). That is, if we believe that each step is correct, the entire
decision model will be correct as well. Of course, if the outcome of the decision
model doesn’t seem to make sense to us, we need to go back through each layer of
the model and question our premises.

FIGURE 13.8    If we ensure the integrity of each stage of a layered weighting process, we
                   can feel confident that the output is correct as well.

    For example, if we are comfortable with the idea that time, price, and desire are
the three components to our dinner decision, and we believe that the weights of 5,
3, and 2 (respectively) were correct for a night when time is tight and money is
scarce, we should be happy with the results that these weights generate. We trust
that the processes that determine the utility scores for time, price, and desire are
doing their jobs. We also trust that whatever score we generate at this step will be
used correctly down the road. (Theoretically, we could use the final result of
the “dinner decision” example in a larger decision model—such as “plans for the
348    Behavioral Mathematics for Game AI

            If we do need to make a change, we must be careful to only change components
       where there are problems. Of course, that necessitates finding where the problem
       begins in the first place. We must resist the temptation to begin tweaking one com-
       ponent to solve problems in another. We must locate the problem first and then
       change only that component until it generates the results we want or expect (which
       are not always the same thing). The process can be involved, but the end result is
       that our decision models can handle many widely disparate pieces of information
       in a manageable fashion.


       If the prior chapters have given us the tools with which we can build our AI algorithms,
       this chapter gave us the rules for drawing maps and blueprints and measuring our
       construction materials. As with maps and blueprints, through the right application
       of the techniques above we can put the entire world into perspective.
            I once had a social studies teacher who repeatedly proclaimed, “Everything is
       relative.” During class discussions, he would often respond to a student’s unqualified
       claim with “Compared to what?” For example, the word hot is only meaningful in
       context. A hot day may be 90 degrees (F). “Billy’s head feels hot” may mean a 100-
       degree fever. On the other hand, both 90 and 100 degrees make for comfortable, not
       hot, bath water. I’m sure that cooking your steak over “hot coals” would suggest
       something hotter than what you would be willing to bathe in.
            His point was that information needs to be processed together with other related
       information. Additionally, we can combine information with other less strongly
       related information if we can find some sort of common ground as an intermediary.
           While this was neither his forte nor his intent, his assertion is prophetic in the
       realm of behavioral AI. Because we are often trying to find that one decision factor,
       we must combine many different pieces and parts. Often the information that has
       common ground is only meaningful when we put it in context. And for those con-
       cepts that aren’t so obviously related, we define that common ground. The tools
       above allow us to do this. Through the iterative and layered process, we can take
       innumerable bits of information and construct one final expression that, in the
       proper context, represents to us what should be done.
          Of course, as we shall soon revisit… just because it should be done doesn’t
       mean that we will do it! After all, who among us is perfectly rational?

IV              Behavioral Algorithms

      Chapter 14, “Modeling Individual Decisions,” puts the tools we have assem-
  bled to use by working through a complex decision process.
      Chapter 15, “Changing a Decision,” addresses the caveats and complications
  that can arise through the process of monitoring the validity of our current plan of
  action—and changing decisions accordingly.
       Chapter 16, “Variation in Choice,” explores methods of selecting different
  options to ensure that our agents do not become predictable and, therefore, look
  less human.

This page intentionally left blank
14                  Modeling Individual

            o this point, we have pondered a lot of theory, laid out plenty of tools, and
            even examined ways of measuring our workspace. While all of that prepara-
            tory work was necessary, we have arrived at the point where we can put all of
      what we have learned to use.
          Before we proceed with the glorious and rewarding process of crafting our
      decision models, however, we really need to determine what we are doing. Before
      we act, we must choose. Before we choose, we must decide what we are choosing.
      After all, the decision to choose (or choice to decide?) isn’t one to take lightly. I’ve
      also heard it said that “if you choose not to decide, you still have made a choice.”
      (Wow… just saying that gives me a Rush.)
           With all of this choosing and deciding and acting ahead of us, perhaps a defin-
      ition of terms is in order.


      The most atomic structure in behavioral game artificial intelligence (AI) is the indi-
      vidual decision. I use the word atomic, not in the literal sense that it was first used—
      that is, “the smallest possible object”—but rather in the sense of “what bigger things
      are built out of.” The true definition of the word atom is “something so small as to
      prohibit further division.” Scientists of the past originally named atoms “atoms”
      because the belief was that there was nothing smaller from which an atom was
      made. They believed it was impossible to divide them further.
          Since that point, of course, we have discovered otherwise. The etymology of the
      word has drifted as well. When we talk about the chemical nature of a substance,
      we don’t make a count of the electrons, protons, and neutrons that are involved. We
      refer to the atoms. We may refer to a molecule as well, but usually that molecule
      is made up of atoms. We even name molecules after the atoms that are in them.

352   Behavioral Mathematics for Game AI

      Only through changing the name slightly do we imply that a subatomic particle is
      missing or that there is an extra one along for the ride. So, despite being divisible
      (and putting the lie to the original meaning of their name), atoms are still the core
      building blocks from which everything we see, touch, and feel is made.
          We can say the same for individual choices in game AI. When we look at a
      game character on a screen, we see many actions. Some are individual choices (e.g.,
      “use the gun instead of my fists”), and some are actual physical events (such as “fire
      the gun one time”).
           We also witness conglomerations of multiple actions (such as “draw the gun,
      raise the gun, aim the gun, fire the gun”). We sometimes refer to these collections
      of actions as behaviors. Behaviors are roughly analogous to molecules (Figure
      14.1). They are often composed of multiple actions (atoms). Some behaviors have
      many actions; some only have a few. Some actions combine well together to make
      a stable, understandable behavior; other actions don’t bind quite as readily to each
      other (e.g., “draw the gun, throw the gun up in the air, pick the flower, smell the
      flower, aim the flower, eat the flower”). Choices and actions are the atoms we use
      to make up those behavior molecules.
         The subatomic particles—the electrons, protons, and neutrons—in my obscure
      metaphor are the bits and pieces that we use to construct the individual decision.

                   FIGURE 14.1 Choices and actions are the atoms of game AI.
               We think of them as the smallest building blocks of character behavior.
                                          Chapter 14 Modeling Individual Decisions       353

      These include the tools we have covered so far in this book: value, utility, formulas,
      response curves, scales, granularity, and weighted sums, just to name a few. While,
      like the atom, we can’t construct the decision without them, we don’t think about
      the pieces and parts outside the context of the decision itself. For example, a response
      curve doesn’t have much meaning outside the context of a decision that utilizes it.
      Naturally, we need to understand how these tools work and how they combine. We
      need to understand their dynamics and how they affect the bigger picture. The
      entire existence of those parts, however, is given meaning by their role in forming
      those atoms—those decisions that our AI agents need.
           Along the way through this book, we have illustrated many individual decision-
      making processes through our examples. Most of those were specifically con-
      structed examples so that we could use the tool we were learning about in a context
      that was more familiar to us than simply an abstract theory or dry description. We
      will revisit some of these decisions and craft new ones throughout this chapter. Our
      goal is to begin to put everything we have covered into one decision-making
      process. Because of this, much of this chapter (and the next) falls into our familiar
      “In the Game” category.
           Remember, while a particular example may be of a specific behavior or endemic
      to a stereotypical genre, it is the decision process that is important to learn. Strange
      as it may sound, deciding what weapon to use in a role-playing game (RPG) is not
      all that different from deciding what attraction to visit in a theme-park-style game.
      Deciding whom to shoot or where to hide in a first-person shooter (FPS) is similar
      in many respects to deciding to whom to pass the ball in a sports simulation. The
      genres are different, the situations are different, and the behaviors are different.
      However, the choices our agents are making in those examples are similar, just as
      the atoms that make up wildly dissimilar substances are made of the same compo-
      nents. And the tools that we use to arrive at those decisions are definitely the same.
      A response curve is a response curve just as an electron is an electron. The bottom
      line is that, while a particular example may not seem similar to a challenge we face
      in our own game, the process may certainly be what the proverbial doctor ordered.
      (And I give you 80% odds that the process is sugarless.)


      We need to go through a number of steps to construct a decision-making algorithm.
      Because our atom is a single decision, it is naturally the place to focus on. Once we
      have made some decisions, we can assemble them into behaviors. In fact, making a
      single decision about an action often makes a decision about a behavior as well.
354   Behavioral Mathematics for Game AI

      For example, a decision to “attack Bad Dude with our gun” means we have made a
      decision about which enemy to attack, which weapon to use, that we need to draw
      it, aim it, fire it, and so on. All of those other actions are included in the key deci-
      sion of “attack Bad Dude with our gun.”
           The reason we view “attack Bad Dude with our gun” as a single decision is that
      we are processing all of the components as a whole. We could have put it up against
      “attack Bad Dude with our fists,” “attack Evil Dude with our gun,” or even “attack
      Evil Dude with our fists.” We were not breaking down the decision into “attack Bad
      Dude or Evil Dude?” or “attack with gun or fists?” While we certainly could have
      divided the quandary into two separate parts (i.e., who to attack and how), we may
      want to score the decision based on the combination of the criteria.
           For example, if we compare the threats posed by Bad Dude and Evil Dude, we
      may find that Evil Dude is more of a threat (Figure 14.2). If we compare the rela-
      tive strengths of our gun and our fists, we will likely find that our gun is a more po-
      tent weapon. Those two observations may lead us to attack Evil Dude with our gun
      (in the library?).

       FIGURE 14.2 If Bad Dude has a particular weakness to fist attacks, separating the two
         decisions would not have brought us to that choice. Only by combining the factors
           into a single decision algorithm would we have discovered the correct action.

           If we were to combine the target and weapon decisions together into one (mys-
      terious) utility equation, however, we might find that Bad Dude has a weakness for
      melee attacks and that our best option (of the four combinations) would be to
      attack him with our fists. By making two separate decisions and gluing them
      together, we arrived at a suboptimal behavior. By combining them, we determined
      the best choice for the situation.
                                           Chapter 14 Modeling Individual Decisions      355

      More or Less?
      The way around this is for us to decide what our decision is going to entail. One ac-
      tion? Two? A whole cluster of them? The more actions we lump into one decision,
      the more complex the decision becomes. On the other hand, the fewer actions we
      group into one decision, the more actual decisions we need to make. We also need
      to be careful not to run into logical pitfalls such as the one illustrated above.
          As always, the rationale for any given combination is very context-dependent.
      Most of the time, we want to group actions together if they are strongly related. For
      example, walking to an object is strongly related to the decision to pick the object
      up. We wouldn’t separate those two actions. The statements “should I pick up the
      box?” and “should I walk to box?” sound odd together. If we connect the answer to
      the first question to the second with “therefore,” however, it makes a lot more
      sense to us. “Should I pick up the box? Yes. Therefore, I should walk to the box.”
      The decision (such as it is) to walk to the box is a necessary component of the decision
      to pick up the box. We can’t pick it up if we don’t walk to it. (You also can’t pick
      up the box if you can’t walk to it.) Of course, we could not have even considered
      picking the box up if the assumption wasn’t there that we were going to walk to it.
      The two actions are almost inextricably linked—which means they should be con-
      sidered in one decision.


      Once we settle on what a decision entails, we need to analyze it. By sifting all of the
      relevant information through the tools we’ve discussed, we can begin to home in on
      the right decision.

       IN   THE   G AME   Which Dude to Kill?

      As our example throughout this next section, we define Evil Dude and Bad Dude as
      types of antagonistic dudes. We will also add a boss type, Arch Dude, to the mix. (If
      it helps complete the picture, we can imagine all the Dudes in dark glasses and stu-
      pid hats.) There are four types of weapons that we and the Dudes can arm ourselves
      with: a pistol, a shotgun, a machine gun, and a rocket launcher. The decision we
      need to make is, when confronted by dudes of various types, armed with any of the
      available weapons, and at various distances from us, which of the three should we
      attack first (Figure 14.3)?
356     Behavioral Mathematics for Game AI

             FIGURE 14.3 When our agent is confronted by Dudes of varying types, armed
               with any of the four weapons, and at a variety of distances, he must select
                         which of the Dudes to attack and with which weapon.

        Beginning as early as Chapter 2, we talked about the necessity of identifying and an-
        alyzing all the factors relevant to a decision. Throughout the examples in the book,
        we limited ourselves to the ones that helped illustrate the point we were trying to
        make at that time. We are now going to revisit this idea.
           To determine which factors are relevant to our pending Dude-icide, we need to
        make a list of things that could help or hinder our ability to successfully attack a
        Dude in general. After long consideration, here is the list we will use.

            Distance to enemy: How far away from us is our target? This relates to the
            weapon range (below) as well as the threat factor to our own safety (also below).
            Our weapon range: How far away can we shoot with our current weapon?
            Our weapon damage: How much damage per second does our current weapon
            deal? Is the amount of damage related to the distance?
            Our weapon accuracy: How accurate is our current weapon? Is the accuracy
            related to distance?
            Our health: How much damage can we take?
            Opponent’s weapon range: From how far away can they shoot us?
                                    Chapter 14 Modeling Individual Decisions      357

    Opponent’s weapon damage: How much damage does the opponent’s weapon
    do per second? Is it related to the distance of the shot?
    Opponent’s weapon accuracy: How accurate is the opponent with his weapon?
    Is the accuracy related to distance?
    Opponent’s health: How much damage can the target take?

  While there could be other considerations, we are going to stop there for the
moment. We can always add more fun stuff later on.
     As we discussed in the previous chapter, we need to determine if each criterion
is concrete or abstract, what range they will fall into, and at what granularity we are
measuring. By doing this, we get a better idea of what we are working with. We need
to know the shape of each piece before we can start fitting them together.
    A quick glance through the list tells us that all of the criteria are concrete val-
ues. They are nonsubjective, measurable values. In fact, they are all values that are
either listed as a property of an object (such as weapon damage) or that the game
engine can calculate for us (such as distance). This simplifies our process somewhat
for now.

Here a Dude, There a Dude, Everywhere a Dude, Dude…
To give us a better idea of the concrete data we are working with, we need to list the
specifics for each of the three types of Dudes. Each of the three has an amount of
health, with the Arch Dude being able to absorb the most damage. Additionally,
they have accuracy modifiers that adjust their ability to shoot their weapons. Being
the least trained, the Bad Dudes are… well… bad. The Arch Dudes, on the other
hand, have a bonus to their weapon accuracy.

    Dude Type             Health             Acc. Mod.
    Bad                   100                –20%
    Evil                  120                0%
    Arch                  150                +10%

Choose Your Weapon
Our agent and the Dudes can use any one of four weapons. These are a pistol, a
shotgun, a machine gun, and a rocket launcher. Each of the four types has its own
accuracy and damage-dealing characteristics. The weapons start with a base damage
and accuracy rate. Rather than a simple concrete number, however, these values are
358   Behavioral Mathematics for Game AI

           To construct these formulas, we use some of the suggestions in Chapter 10. To
      exhibit the characteristic of decreasing damage, we need to have a formula that was
      at its maximum result at a distance of 0 (with one exception, as we shall see). From
      there, we want it to fall away at an increasing rate. The natural starting point was a
      parabolic curve that we subtract from our maximum point.
          All four weapons use the same base formula. This ensures that the general,
      distance dependency characteristic is present. Each weapon has specific values for
      each variable, however, which is what separates one weapon from another. The
      formulas are

           Notice that the structure of the two formulas is the same. The magic of each
      happens with the numbers that we plug in. There are a few things to note, however.
      First, it is possible that the formula can generate a number less than 0. As we shall
      see, this is by design. Because we can’t have negative damage or negative accuracy,
      we need to clamp the result to a minimum of 0.
          As cryptic as the formulas look in this form, they begin to make more sense as
      we plug in the weapon-specific values. The figures for each weapon are:

          Value                      Pistol        Shotgun      Mach. Gun      Rocket L.
          Range                      137           57           300            300
          Base Damage (/sec)         10            50           30             100
          Dmg. Decay Exp.            2.2           2.1          2.0            2.0
          Dmg. Decay Divisor         5,000         100          5,000          1,500
          Dmg. Decay Shift           0             0            0              50
          Base Accuracy              0.70          0.95         0.80           0.50
          Acc. Decay Exp.            1.5           2.2          1.8            2.0
          Acc. Decay Divisor         3,000         15,000       50,000         30,000
          Acc. Decay Shift           0             0            0              50
                                   Chapter 14 Modeling Individual Decisions     359

    Unfortunately, looking at the figures in the table doesn’t make them any less
mysterious. To shed a little more light on how they operate together, we could use
one of the weapons an example. We will leave the Distance and Modifier values
blank for now, as those would be specific to the situation.

A word of caution for those of us who seek to infer meaning from things: The
numbers used in this example are neither specifically related to anything nor
drawn from anything. Often, in the search for an effect for a mathematical
model, we select these numbers through a trial-and-error approach. There is
no “correct way” to approach this process. We end up using “whatever works.”

   Using the formula and numbers above, the damage calculation for a machine
gun is:

     Assuming that the machine gun is in the hands of an Evil Dude (no modifiers)
at a range of 100 feet, we can now calculate how much damage the machine gun will
do per second.

    As we can see, the travel distance of 100 reduces the damage done by the ma-
chine gun by 2 points down to 28.0. If we extend the shot further, to 200 feet, we
would find that the damage is reduced further, to 22.0.

    The formula for accuracy works the same way.
    One aspect not shown above is the Modifier for the skill of user. The Modifier
parameter directly affects the starting point of the curve. We can see this by com-
paring the accuracy rates for a Bad Dude and an Evil Dude. The Bad Dude has a
–20% modifier to accuracy. Therefore, his accuracy with a pistol at 75 feet would be
360   Behavioral Mathematics for Game AI

          On the other hand, the accuracy for an Evil Dude with a pistol at 75 feet would be

          The only difference between the two is the inclusion of the Modifier value.
           The other aspect that we haven’t seen in action yet is the Shift parameter. If we
      think back to Chapter 10, we will remember that it is possible to shift the vertex of
      a parabola left or right. We do this by adjusting the value of x under the exponent.
      If we realize that Distance in our formula is the equivalent of x, then the placement
      of the Shift parameter along with Distance explains the horizontal movement of the
           For example, the peak accuracy of a rocket launcher is not at 0 feet. By specify-
      ing that Shift = 50, we ensure that the peak accuracy of the rocket launcher (i.e., the
      vertex of the parabola) is at 50 feet. At 20 feet, for example, the accuracy of
      the rocket launcher is only 47%—down from its peak of 50%.

          All of this data is better visualized (and more easily constructed) by looking at
      graphs. Figure 14.4 shows the accuracy of the four weapons based on the data for
      each inserted into the accuracy formula.

             FIGURE 14.4 The accuracy curves of the four weapons are a result of the
                data for the respective weapons entered into the accuracy equation.
                                    Chapter 14 Modeling Individual Decisions      361

    By looking at the accuracy curves on the same graph, we can see not only how
each weapon performs over the given range, but also how they perform compared
to each other. For example, while the shotgun is the most accurate close-range
weapon (due to the scatter effect), its accuracy falls off dramatically as the distance
increases. On the other hand, the accuracy of the machine gun remains relatively
good over the range of the graph.
    Of particular note is the graph of the rocket launcher. As we mentioned above,
the Shift parameter moved the vertex of the parabola to the right. Rather than hav-
ing a peak accuracy at a range of 0, we can see that it is at its best at a range of 50
(the Shift value for a rocket launcher). Both nearer and farther than 50 feet, its ac-
curacy decreases.
    We can view the results of the damage formula on a graph as well (Figure 14.5).
The range of the graph is the same as in the accuracy graph (Figure 14.4). Again, we
can see the telltale parabolas of the quadratic equation.

     FIGURE 14.5 The damage curves of the four weapons are a result of the data
            for the respective weapons entered into the damage equation.

     For what should be obvious reasons, the rocket launcher is the most potent of
the four weapons. The lowly pistol is on the low end of the range. It is more impor-
tant to note the effects of the different shapes of the damage curves as the range
increases. As we would expect, a shotgun blast is fairly potent at close range. As the
range increases, however, a shotgun blast loses much of its kick. In fact, at about 50
feet, it would be less powerful (per second) than being struck by a bullet from a
pistol. At 60 feet, the shotgun blast does no damage whatsoever.
362   Behavioral Mathematics for Game AI

          Whereas each of the two formulas gives us valid and important information
      about the four weapons, we learn a lot more about the effectiveness of the weapons
      when we combine the graphs. By simply multiplying the damage per second by the
      percent chance of scoring a hit, we arrive at a new figure: expected damage per second.
      We can then graph this combination of data in the same manner as either accuracy
      or damage alone (Figure 14.6).

               FIGURE 14.6 By multiplying the damage by the accuracy rate, we can
                 calculate the overall effectiveness of a weapon at different ranges.

           Once again, analyzing the four curves provides us with some interesting infor-
      mation. First, as we would expect, the high accuracy and high damage rates make the
      shotgun the weapon of choice at close range. While the rocket launcher certainly
      packs a punch close in, its accuracy rate causes it to be slightly unreliable. On the
      other hand, because the accuracy and damage rates of the shotgun drop so quickly,
      its effectiveness drops swiftly as well. That leaves the rocket launcher as a prime
      weapon for mid-range strikes. However, because of the poor long-range accuracy
      of the rockets, its formidable damage rate becomes less important as range increases.
      Eventually, the machine gun’s reliable accuracy and moderate damage-dealing
      capability wins out. For longer-range strikes, it becomes the weapon of choice.
           The pistol may not look impressive in the company of the other, more power-
      ful weapons. However, we have to consider that we may not always have access to
      (or ammo for) the other weapons. If we only had a shotgun and a pistol, for instance,
      we would elect to use the pistol at ranges of over 50 feet. If we had a machine gun
      (and no rocket launcher), we would elect to use it instead of the shotgun for ranges
      of over 35 feet.
                                     Chapter 14 Modeling Individual Decisions       363

   It’s important for us to remember that we are not the only one with a gun. The
Dudes are armed as well. Other than the accuracy modifiers that we identified
above, the Dudes’ weapons perform identically to ours.

Did I Mention the Detonator?
The above information helps us determine what the optimum weapon is for each
range. That goes a long way toward helping us make our decision. If we know the
range to each Dude, his health, and what weapon he is carrying, we can determine
which Dude to attack first and what the optimum weapon is for us to attack him
with. The solution is to determine which Dude is some combination of the biggest
threat and the easiest kill. However, before we go further, we are going to add one
last wrinkle to our example.
    We will now assume that there is an important point in the area. In true epic
James Bond style, we will say that it is a detonator for a large explosive device.
(On second thought, this could be in Austin Powers style, too. Or Jack Bauer style.
Or Jedi Knight sty–… never mind.) We now have two goals to address.
    First, as before, we need to avoid allowing ourselves to be killed by a Dude. That
is what we were addressing above when we were going to dispatch the biggest
threat–easiest kill combination. However, our second goal is to prevent someone
from triggering the detonator. By adding a parameter to each Dude “range to goal,”
we can determine who is the most dangerous target in that respect. The two priority
systems may not yield the same answer. For example, a Dude who we may have
judged as the lowest-priority target before may suddenly become extremely impor-
tant to attack if he moves close to the detonator (Figure 14.7).

   FIGURE 14.7 The poor fighter (Bad Dude) with a poor weapon (shotgun) at long
     range would normally be the lowest-priority target. If he is standing next to the
       dreaded detonator, however, his priority as a target increases significantly.
364   Behavioral Mathematics for Game AI

          We need to include a way of adding that priority into our target selection algo-
      rithm. Before we do that, however, we need to define what “close to” and “increased
      priority” mean. To do that, we construct another formula. Once again, we tap into
      a type of formula from Chapter 10. We will define Urgency as the result of a formula
      with an exponent that is less than 1 (a root). By subtracting from the high value, 1.0,
      we arrange it so that as distance from the detonator increases, the urgency of the
      target drops away from 1.0. The formula we will use is

          Once again, the effect of this formula is easier to visualize as a graph (Figure
      14.8). In this case, we are using a parabola to simulate the rise in Urgency as the
      range decreases. The nature of an exponent-based curve is such that the range of
      change is very significant near the vertex (Range = 0, Urgency = 1.0). While there is
      an increase in Urgency as the distance diminishes throughout the entire range of the
      graph, the rate of change increases markedly as the distance approaches 0.

              FIGURE 14.8 As the range to the detonator increases, the urgency level
                for the target drops. As the target’s range to the detonator decreases,
                    especially as it closes within 25 feet, the urgency rises rapidly.

      A Jumble of Blocks
      We have now defined all of the pieces and parts that we will use in our decision. We
      have not only set values for the various Dudes that we will encounter, but we have
      established formulas for calculating more complex (yet still concrete) values such
      as the range-based damage and accuracy figures. However, none of these parts
      work together. We have lots of facts and formulas, but no cohesion.
                                     Chapter 14 Modeling Individual Decisions      365

   Before we start putting these blocks together, we need to ensure that we have
them built correctly. To establish the compartmentalized confidence we talked
about in Chapter 13, we need to ensure that we are comfortable with each compo-
nent. Only then can we trust that what we build with the blocks will be valid.


There are three major components in this example: the agent (us), Dudes, and
weapons. Accordingly, we create classes for the three types of entities (CAgent,
CDude, and CWeapon). Additionally, for Dudes and weapons, we create collection
classes to hold the individual objects (Figure 14.9). We will look at the basics of each
of the classes for clarity.

           FIGURE 14.9 The class structure for the “Shoot Dudes” example.
               CDudeCollection contains a vector of CDude objects.
           CWeaponCollection contains an array of four CWeapon objects.

This logical arrangement is optional, of course. This book is not meant to be an
educational tome on design patterns or memory management. I have stripped
the example down to a simple, easy-to-understand model. Feel free to insert the
AI logic into a design of your own choosing.

Because we utilize CWeapon in both CDude and CAgent, we will begin by defining it.
First, we should note that we have enumerated a type to make referencing the
weapons easier throughout the entire program.
366   Behavioral Mathematics for Game AI

          typedef enum {





          } WEAPON_TYPE;

          The header of CWeapon is self-explanatory. (For space purposes, I have cut out
      the constructor, destructor, and the usual “get and set” accessor functions.)
          class CWeapon



              //[Ctor/Dtor snipped for space…]

              //[Accessors snipped for space…]


              // Accuracy and Damage Calculations


              double GetAccuracy( USHORT Dist = 0, double Modifier = 0.0 );

              double GetDamage( USHORT Dist = 0, double Modifier = 0.0 );



              // Member Variables


              char* mName;                 // Name

              USHORT mMaxRange;            // Max range

              USHORT mBaseDamage;          // Base (max) damage

              double mDmgDecayExp;         // Decay formula exponent

              USHORT mDmgDecayDiv;         // Decay formula divisor

              USHORT mDmgDecayShift;       // Decay formula shift (horiz.)
                                   Chapter 14 Modeling Individual Decisions     367

         double mBaseAccuracy;      // Base (max) accuracy

         double mAccDecayExp;       // Decay formula exponent

         USHORT mAccDecayDiv;       // Decay formula divisor

         USHORT mAccDecayShift;     // Decay formula shift (horiz.)


    The member variables for the weapons hold the numbers that we used for the
damage and accuracy calculations. For both damage and accuracy, we have a base,
a decay exponent, a decay divisor, and a decay shift. Based on the formulas we laid
out above, those are the only figures that we need to define the appropriate curves.
     By calling GetDamage() and GetAccuracy() with the distance and any modi-
fiers (e.g., the accuracy modifiers for the different types of Dudes), we receive the
appropriate damage value or accuracy percentage. These two functions operate
very simply.
    double CWeapon::GetAccuracy( USHORT Dist /*= 0*/,

                                     double Modifier /*= 0.0 */ )


         double Accuracy = mBaseAccuracy + Modifier -

             (pow(( Dist - mAccDecayShift ), mAccDecayExp )) / mAccDecayDiv;

         if ( Accuracy < 0 ) { Accuracy = 0; }

         return Accuracy;


    double CWeapon::GetDamage( USHORT Dist /*= 0*/,

                                  double Modifier /*= 0.0 */ )


         double Damage = mBaseDamage + Modifier -

             (pow(( Dist - mDmgDecayShift ), mDmgDecayExp )) / mDmgDecayDiv;

         if ( Damage < 0 ) { Damage = 0; }

         return Damage;

368   Behavioral Mathematics for Game AI

           Notice that, because our formulas can generate values that are less than 0, we
      must clamp the minimum on both values to 0. We can’t do negative damage, and we
      can’t have less than a 0% chance of hitting our target. Other than that, the formulas
      for calculating the values are familiar as code-ese translations of their mathematical
      versions we expressed above.
          We define an array of CWeapon objects in the CWeaponCollection as follows:
          #define MAX_WEAPONS 4

          CWeapon mWeapons[MAX_WEAPONS];       // Array of all weapons

           CWeaponCollection also has accessors for looking up information about any
      of the members of its array by index. These accessors simply pass the request on to
      the matching accessor of the referenced object.
           The important function that CWeaponCollection performs, however, is to ini-
      tialize the array with the four weapons we have defined in our game. We do this
      through the function InitOneWeapon().
          void CWeaponCollection::InitOneWeapon( WEAPON_TYPE Type,

                                                       char* Name,

                                                       USHORT MaxRange,

                                                       USHORT BaseDamage,

                                                       double DmgDecayExp,

                                                       USHORT DmgDecayDiv,

                                                       USHORT DmgDecayShift,

                                                       double BaseAccuracy,

                                                       double AccDecayExp,

                                                       USHORT AccDecayDiv,

                                                       USHORT AccDecayShift )


              mWeapons[Type].SetMaxRange( MaxRange );

              mWeapons[Type].SetBaseDamage( BaseDamage );

              mWeapons[Type].SetDmgDecayExp( DmgDecayExp );

              mWeapons[Type].SetDamageDecayDiv( DmgDecayDiv );

              mWeapons[Type].SetDamageDecayShift( DmgDecayShift );

              mWeapons[Type].SetBaseAccuracy( BaseAccuracy );

              mWeapons[Type].SetAccDecayExp( AccDecayExp );
                                  Chapter 14 Modeling Individual Decisions    369

        mWeapons[Type].SetAccDecayDiv( AccDecayDiv );

        mWeapons[Type].SetAccDecayShift( AccDecayShift );

        mWeapons[Type].SetName( Name );


    In the function InitWeapons(), we call InitOneWeapon() once for each of our
four weapons.
    void CWeaponCollection::InitWeapons()


        InitOneWeapon( WEAPON_PISTOL, “Pistol”, 137,

                          10, 2.2, 5000, 0,         // Damage

                          0.7, 1.5, 3000, 0 );      // Accuracy

        InitOneWeapon( WEAPON_SHOTGUN, “Shotgun”, 57,

                          50, 2.1, 100, 0,          // Damage

                          0.95, 2.2, 15000, 0 );    // Accuracy

        InitOneWeapon( WEAPON_MACHINEGUN, “M/G”, 300,

                          30, 2.0, 5000, 0,         // Damage

                          0.8, 1.8, 50000, 0 );     // Accuracy

        InitOneWeapon( WEAPON_ROCKETS, “R/L”, 300,

                          100, 2.0, 1500, 50,       // Damage

                          0.5, 2.0, 30000, 50 );    // Accuracy


     In each call to InitOneWeapon(), we are passing in the numbers from the char-
acteristics table that we laid out earlier. Once we have defined the weapon parame-
ters, the damage and accuracy functions have all the information that they need to
do their magic. If we want to change the characteristics of one weapon, we simply
change the initialization parameters that we pass in when we create the CWeapon
object. The formula does the rest. (In a production environment, we would store
these values in a configuration file so that we can tweak them without touching
the code.)
370   Behavioral Mathematics for Game AI

      What Makes a Dude a Dude?
      We now turn our attention to CDude and CDudeCollection. There really isn’t a lot
      to CDude in our implementation. As we did with the weapons, we define an enu-
      merated type for the three different types of Dudes.
          typedef enum {




          } DUDE_TYPE;

          The header file contains the usual suspects. The constructor takes the arguments
      that set all the member variables. (Again, I have snipped the accessor functions.)
          class CDude




              // Ctor/Dtor


              CDude( DUDE_TYPE Type,

                     char* Name,

                     USHORT Health,

                     USHORT Location,

                     USHORT DistToGoal,

                     CWeapon* pWeapon );

              virtual ~CDude();

              //[Accessors snipped for space…]

              void SetAccAdjust();



              // Member Variables

                                  Chapter 14 Modeling Individual Decisions    371

         char* mName;         // Name of Dude

         DUDE_TYPE mType;     // Dude Type

         USHORT mHealth;      // Health of this Dude

         USHORT mLocation;    // Location (1D) of this Dude

    USHORT mDistToGoal; // Distance to the goal

         double mAccAdjust;   // Accuracy adjustment

         CWeapon* mpWeapon;   // Pointer to weapon armed


    Please note that, for purposes of our example, we are tracking the positions of
the Dudes and our agent in one dimension only. Therefore, the distance between
them is simply a linear measurement on a single axis.
    The only function in CDude that is not precisely an accessor is SetAccAdjust().
When a Dude is created, the constructer calls SetAccAdjust() to set the accuracy
adjustment based on the type of Dude.
    void CDude::SetAccAdjust()


         switch( mType ) {

         case BAD_DUDE:

             mAccAdjust = -0.2;


         case EVIL_DUDE:

             mAccAdjust = 0.0;


         case ARCH_DUDE:

             mAccAdjust = 0.1;



             mAccAdjust = 0.0;


         } // end switch

372   Behavioral Mathematics for Game AI

          The CDudeCollection is just as bland. It contains, as its member variable, a
      vector containing CDude objects.
          typedef std::vector < CDude > DUDE_LIST;

          DUDE_LIST mlDudes; // List of Dudes in the game

           We fill this vector with our Dudes when the constructor for CDudeCollection
      calls InitDudes() which, in turn, calls InitOneDude().

          void CDudeCollection::InitDudes()


               InitOneDude( BAD_DUDE, “Baddie”,

                             100, 180, 50, WEAPON_SHOTGUN );

               InitOneDude( EVIL_DUDE, “Evilmeister”,

                             120, 200, 150, WEAPON_MACHINEGUN );

               InitOneDude( ARCH_DUDE, “Boss Man”,

                             150, 150, 110, WEAPON_ROCKETS );


          void CDudeCollection::InitOneDude( DUDE_TYPE Type,

                                               char* Name,

                                               USHORT Health,

                                               USHORT Location,

                                               USHORT DistToGoal,

                                               WEAPON_TYPE Weapon )


               CWeapon* pWeapon = mpWeaponCollection->GetWeaponPointer( Weapon );

               CDude NewDude ( Type, Name, Health, Location, DistToGoal, pWeapon

               mlDudes.push_back( NewDude );

                                    Chapter 14 Modeling Individual Decisions    373

The Mind of an Agent
The last class we need to establish is CAgent. This is where our AI decision-making
process will be taking place, of course. Before that, however, we need to establish
what our class looks like.
    The CAgent class uses a pair of structs to help track its information. The first
one is for keeping track of the weapons that we have on our person.
    struct sWEAPON_INFO


         CWeapon* pWeapon;

         USHORT Ammo;


    typedef std::vector < sWEAPON_INFO > WEAPON_VECTOR;

    Each item in a vector of type WEAPON_LIST contains a pointer to a weapon and
a record of the amount of ammunition that we carry for this weapon.
    We also have a similar structure for our list of targets.
    struct sTARGET_INFO


         CDude* pDude;

         CWeapon* pWeapon;

         double Score;

         bool operator<( const sTARGET_INFO& j ) {return Score > j.Score;}


    typedef std::vector < sTARGET_INFO > TARGET_VECTOR;

    The information contained in sTARGET_INFO may be confusing at first until we
discuss what our decision process will entail. We will be making a choice that is ac-
tually two in one. We must decide the best combination of which enemy we are
going to kill and with what weapon. As we touched on earlier in this chapter, decid-
ing on our target separately from the weapon we want to use may lead us to a false
“best solution.” Therefore, we have to address them as part of the same decision
process. Each entry that we place into a TARGET_LIST will be a combination of an
enemy and a weapon. We will then score that combination individually and select
the best entry.
374   Behavioral Mathematics for Game AI

          We overload the < operator for the std::sort algorithm because we have to
      explain to the compiler that we are sorting by a particular member of the
      sTARGET_INFO (specifically, Score).
           Rather than list the entire header file here and spoil the surprise, we will look only
      at a few key items of CAgent. First, there are no surprises in the member variables.
          USHORT mHealth;         // Current Health

          USHORT mLocation;       // Current Location in 1D

          TARGET_LIST mvTargets;       // List of available targets

          WEAPON_LIST mvWeapons;       // List of available weapons

          CDude* mpCurrentTarget;           // Current target to attack

          WEAPON_TYPE mCurrentWeapon; // Current weapon to use

          We have defined two member list variables—one for our current targets and
      one for our current weapons.
          When the agent needs the distance to a selected enemy, we call the function

          GetDistToTarget( CDude* pTarget )

          The implementation is simply
          USHORT CAgent::GetDistToTarget(CDude *pTarget)


               USHORT Dist = abs( mLocation - pTarget->GetLocation() );

               return Dist;


          The constructor for CAgent sets mLocation = 0 and mHealth = 100. Its other
      function is to initialize our weapons. We do this through the functions
      InitWeapons() and AddWeapon().

          void CAgent::InitWeapons()


               AddWeapon( WEAPON_PISTOL, 20 );

               AddWeapon( WEAPON_MACHINEGUN, 100 );

               AddWeapon( WEAPON_SHOTGUN, 10 );
                                            Chapter 14 Modeling Individual Decisions      375

                 AddWeapon( WEAPON_ROCKETS, 6 );


            void CAgent::AddWeapon( WEAPON_TYPE WeaponType, USHORT Ammo )


                 sWEAPON_INFO ThisWeapon;

                 ThisWeapon.Ammo = Ammo;

                 ThisWeapon.pWeapon =

                      mpWeaponCollection->GetWeaponPointer( WeaponType );

                 mvWeapons.push_back( ThisWeapon );


        All Dressed Up and No One to Kill
        By this point, we have created our objects and filled our lists. Our weapons have
        stats and functions describing their abilities, our enemies have stats and weapons,
        and our agent has stats, weapons, and enemies. None of that veritable cornucopia
        of information, however, yields a decision on its own. All we have is information.
        Even the nifty formulas that calculate the accuracy and damage that the weapons
        cause are only different forms of information.
             We have assembled the factors that we believe will be important to our decision
        just as a glance at the ingredient list for a cake tells us what we need to have handy.
        It does not tell us how we should combine those ingredients or what we need to do
        with them once they are in the same bowl. For that, we need to begin to identify
        how these components will work together.

        The first step we must take in combining our data into something meaningful is to
        identify the items that are directly related. We already touched on a few simple
        relationships. For example, our formulas for weapon damage and accuracy use
        distance as one of their components. Therefore, if we know the distance to a target,
        we can enter that into the formula for a particular weapon and generate a result. We
        have established a relationship between our location, that of our target, and the
376   Behavioral Mathematics for Game AI

          Another relationship we identified earlier also involved the weapons. The
      strength of a weapon is neither based only on the damage that it can do nor only on
      the accuracy with which it can be used. We combine these two factors into a single
      measurement. We measure it as the amount of damage that a weapon can do per
      second when we factor in the probability of a hit. In the graph in Figure 14.6, we
      labeled this “effectiveness.” By combining these two pieces of data into one concept,
      we establish a relationship.
          We can add another component to this relationship. This one makes it far
      more useful and relevant to the decision we are trying to make. By taking into ac-
      count the health of the target that we are considering, we can determine how long
      it would take for us to kill that target with a given weapon at a given distance.

           Of course, what is missing in the above equation is that Accuracy and Damage
      are the results of the distance-based formulas for the weapons. Figure 14.10 shows
      the entire cascade of relationships that leads us to the TimeToKill value.

         FIGURE 14.10 As we add more relationships between our individual components
             of data, we create data that is more useful to our overall decision process.

      P UTTING I T   IN   C ODE

      The code for this relationship is as simple as it seems. We create a function that takes
      the accuracy rate, the damage rate, and the target’s health and combines them into
      one figure based on the formula above.
                                    Chapter 14 Modeling Individual Decisions       377

    double CAgent::TimeToKill( USHORT Health,

                                    USHORT Damage,

                                    double Accuracy )


         USHORT TimeToKill;

         double DamagePerSec = ( Accuracy * Damage );

         // avoid divide by zero

         if ( DamagePerSec != 0 ) {

             TimeToKill = Health / DamagePerSec;

         } else {

             // if damage/sec is 0, clamp time to 1000

             TimeToKill = 1000;

         } // end if

         return TimeToKill;


    Notice that the process in this function is broken into two steps so that we can
avoid “division by zero” errors. We are serving another purpose here. If the amount
of damage per second that we can inflict is 0, there is no way that we can injure,
much less kill the target with the selected weapon. We want to exclude this target-
weapon combination from reasonable consideration. By setting TimeToKill to
an absurd value such as 1,000, we are ensuring that it will significantly reduce any
further calculations that we perform with it. However, we are not removing it from
consideration entirely. If we find ourselves in a situation where none of the weapons
are effective against any of the targets, we would still want to use the best combina-
tion to at least do some damage.
    Processing this relationship is as simple as sending the appropriate data into the
function. The data comes from a few simple functions as well. First, we have to retrieve
the accuracy and damage rates based on the weapon and distance. The following
functions provide that information for us.
    USHORT CAgent::GetDamage( CWeapon* pWeapon, USHORT Dist )


         USHORT Damage = pWeapon->GetDamage( Dist );
378   Behavioral Mathematics for Game AI

              return Damage;


          double CAgent::GetAccuracy( CWeapon* pWeapon, USHORT Dist )


              double Accuracy = pWeapon->GetAccuracy( Dist );

              return Accuracy;


          Therefore, the entire process for determining the time it takes to dispatch a
      particular enemy is:
          USHORT Dist = GetDistToTarget( pTarget );

          USHORT TargetHealth = pTarget->GetHealth();

          USHORT DamageIDeal = GetDamage( pWeapon, Dist );

          double MyAccuracy = GetAccuracy( pWeapon, Dist );

          double TimeToKillEnemy = TimeToKill( TargetHealth,


                                                   MyAccuracy );

      What Goes ’Round Comes ’Round
      Because we know what weapon our enemy is carrying, his accuracy rate (adjusted
      for the type of Dude if necessary), and our own health, we can reverse the above
      process and determine how long it would take for a particular enemy to dispatch us.
      The only difference in the functions is that GetAccuracy() must account for the
      Dude adjustment. By passing a pointer to the selected enemy into an overloaded
      version of GetAccuracy(), we can add that bit of additional information.
          double CAgent::GetAccuracy( CWeapon* pWeapon,

                                           USHORT Dist,

                                           CDude* pDude )


              // Get Dude’s accuracy adjustment

              double AccAdjust = pDude->GetAccAdjust();

              // Get weapon accuracy adjusted for Dude’s skill
                                            Chapter 14 Modeling Individual Decisions        379

                double Accuracy = pWeapon->GetAccuracy( Dist, AccAdjust );

                return Accuracy;


           Assembling the data requires a similar set of function calls.

           double DamageITake = GetDamage( pTarget->GetWeapon(), Dist );

           double EnemyAccuracy =

                     GetAccuracy( pTarget->GetWeapon(),


                                    pTarget->GetPointer() );

           double TimeToKillMe = TimeToKill( mHealth,


                                                    EnemyAccuracy );

           Through the processes above, we know how long it will take for us to kill a
       selected enemy and how long it will take for him to kill us… but now what? What
       do we do with this information?
            As we discussed in the first half of the book, the utility of an action is not always
       as obvious as we would like it to be. Sometimes we have to add a little bit of subjec-
       tivity as well.

       Certainly, each of the two values, TimeToKillEnemy and TimeToKillMe, is impor-
       tant. It is helpful to know how long it would take us to mow down a Dude with a
       particular weapon. It is also helpful to know how long we can expect to survive an
       onslaught from a particular Dude. However, there is not an inherent “cause and
       effect” relationship between TimeToKillEnemy and TimeToKillMe. We have to
       define a meaningful connection between the two.
           As with our simple example earlier, we cannot base our decision solely on one
       or the other of these two factors. Anecdotally, if we select the Dude we can kill the
       quickest, we may be exposing ourselves to a Dude who can kill us the quickest.
       Alternatively, if we elect to assault the one who is the greatest threat, we may be
       overlooking the fact that we could have quickly dismissed his weaker pals. What we
       must do is build a connection between these two independent values that expresses
       a combined utility to us.
380   Behavioral Mathematics for Game AI

          In Chapter 9, we explored the idea of relative utility. The decision that we often
      had to make at each step of the process was to decide if two items or actions were
      equally important or if one was more desirable than the other. We also had to decide
      how much more important one selection was than the other. This leads to a decision
      on how to weigh the utility of each action (Chapter 13). This is what we must do
      now to build a connection between the independent values of TimeToKillEnemy
      and TimeToKillMe.
          Certainly, we would prefer a situation where it takes us less time to vanquish
      our foe than it does for him to annihilate us. On the other hand, we also want to
      pick off the targets that are quick for us to kill. If we get them out of our hair, we
      have less to worry about. We will call the figure we are calculating ThreatRatio.
          After experimenting with various combinations, we can arrive at something
      that “feels” reasonable. We will define ThreatRatio with the following formula:

          By multiplying TimeToKillEnemy by 3, we are making it three times as impor-
      tant as TimeToKillMe. By dividing the weighted sum by 4, we are converting it to a
      weighted average.
          It’s worth noting that we are, once again, relying on the principle of compart-
      mentalized confidence. We trust that the processes that led us to the two “time to kill”
      values are valid. With that trust, we can turn all of our exploration and experimenta-
      tion to the components we are adding—the numbers involved in the weighted sum.
          By selecting a few examples, we can visualize the effect of our formula. For now,
      we will ignore how we arrived at these numbers (because we are confident, remember?)
      and instead focus only on the effect of the weighted average.

          TimeToKillMe                TimeToKillEnemy               ThreatRatio
          5.1                         3.3                           3.76
          5.1                         427.1                         321.56
          12.6                        4.7                           6.69
          12.6                        1,000.0                       753.15
          39.9                        1.0                           10.72
          39.9                        10.7                          18.02

          In the examples above, we can see a few extremes. The “best” score is the lowest,
      so we see that the first line is the best option for us to select. In that example, we are
      facing an enemy who can kill us more quickly than the other examples (5.1 seconds).
                                     Chapter 14 Modeling Individual Decisions       381

On the other hand, by using whatever weapon is represented in the first line, we will
dispatch him in a fairly rapid 3.3 seconds.
    Further examination of the data shows how it is working. For example, the sec-
ond line is the same dangerous enemy. However, our selection of weapon is less
than stellar. It would take us a tragic 427 seconds to finish him off. The threat ratio
score reflects this with a very poor result.
     On the other hand, if we had based our decision solely on how quickly we can
kill an enemy, we would have elected to go with the fifth line. In that combination,
we can kill the enemy in a single second. However, looking at the first column
shows that dispatching that enemy is a pressing problem. We have almost a full 40
seconds in which to deal with him.
     One other item to note is the appearance of the 1,000-second time in the fourth
line. This is due to a weapon choice where the damage per second would have been
zero and we manually overrode the value. We still calculate a threat ratio for that
selection in case we find ourselves in a hopeless situation where we must choose the
“lesser of two evils.” The threat ratio of 753 effectively puts that selection out of
the running, however.

Remember that Detonator?
While the formula above successfully connects the two expected kill times into a
single threat value, we still have the messy problem of the detonator to worry about.
The connection between the time to kill values was relatively obvious. We could
sum it up with the trite phrase “kill him before he kills you!” However, the Dude’s
proximity to the detonator is less directly related.
    We could calculate the amount of time that it would take a Dude to reach the
detonator based on the Dude’s speed and distance, but that assumes that the Dude
is moving directly for it without anything else on his mind. Instead, because we
constructed a utility curve that yielded the value, Urgency, we will use that as our
sole measurement.
     As with the two “kill times” above, we need to construct an abstract connection
between ThreatRatio and Urgency that meaningfully describes their relative merit in
our overall decision. We begin by making an assertion that we believe is important:
“If a Dude is close to the detonator, he becomes the most dangerous enemy.”
    While this assertion is indisputably relevant, it still doesn’t tell us how important
the factor is. For example, when we say “if a Dude is close to the detonator,” what
do we mean by “close?” The Urgency function already accounts for the phenomenon
that “closer means more urgent,” but until we relate it to the threat ratio, the no-
tion of “more urgent” is rather vague. As a result, we have no idea when the enemy
would become the “most dangerous” one out there. That depends on a lot of factors.
382       Behavioral Mathematics for Game AI

          If the Dudes arrayed before us all have the same threat ratio, a small tweak in the
          distance to the detonator could make the difference. On the other hand, if there is
          a large difference between the threats that the various Dudes present, it may not
          matter much how close any one of them is to the detonator. As with many occur-
          rences of subjective ratings, we are left to the obscure and very nonscientific method
          known as “pulling numbers out of the air to see what works.”
              Thankfully, by performing a simple division, we can significantly affect our result
          as both of our two factors change. Consider the formula

               Because we had already established that a lower ThreatRatio is more important,
          it helps for us to keep that arrangement. We also constructed Urgency so that the
          maximum value is 1. We want Score to be at its extreme (lowest) when Urgency = 1.
          By placing it in the denominator, we achieve this effect. As Urgency goes down, the
          value of the fraction would go up—that is, the target would become less of a prior-
          ity. Likewise, if Urgency stayed the same and ThreatRatio increased (became less
          important), Score would increase as well—again, making the target less of a priority.
          Therefore, the arrangement of dividing ThreatRatio by Urgency achieves the desired

          We can run some numbers through the formula to ensure that this is working
          properly. Consider the following values:

                ThreatRatio         Urgency          Score
                3.8                 0.26             14.64
                3.8                 0.32             11.85
                4.1                 0.26             15.77
                4.1                 0.32             12.81
                10.9                0.51             21.37
                10.9                0.80             13.62

               Again, as we peruse these examples, some things jump out at us. First, consider
          the first two lines. The threat ratio stays the same, but the urgency increases from
          the first to the second line. As we would expect, the value for the score decreases (or
          becomes more important). Using the same two urgency values but increasing the
                                    Chapter 14 Modeling Individual Decisions       383

threat ratio slightly shows that the resulting scores increase as well. However, we
can see that a ThreatRatio of 4.1 and an Urgency of 0.32 yields a lower Score than the
lower threat ratio (3.8) and lower Urgency (0.26). This exhibits the subtlety in the
combination of the two values.
    In the last two lines, the threat of 10.9 isn’t terribly dangerous in and of itself.
When combined with a significantly higher Urgency value (the last line), however,
the final score falls into the same range as the more threatening situations. This
would be comparable to a nonthreatening foe approaching the detonator. Whether
he can kill us or not, he becomes a priority target. In fact, if we increase the Urgency
of that last line to 0.95, the final score for that enemy becomes 11.47. That would
make it the lowest score (i.e., highest priority) of the ones shown.


Assuming we are now comfortable with the relationships and connections we have
established, we can now go about coding the scoring process. We looked at a portion
of this process already when we laid out the function calls that generated the time-
to-kill values. We will now put it into the whole function.
    void CAgent::ScoreTarget( CDude* pTarget, sWEAPON_INFO Weapon )


         CWeapon* pWeapon = Weapon.pWeapon;

         USHORT Dist = GetDistToTarget( pTarget );

         // Calculate time to kill enemy

         USHORT TargetHealth = pTarget->GetHealth();

         USHORT DamageIDeal = GetDamage( pWeapon, Dist );

         double MyAccuracy = GetAccuracy( pWeapon, Dist );

         double TimeToKillEnemy = TimeToKill( TargetHealth,


                                                    MyAccuracy );

         // Calculate time for enemy to kill me

         double DamageITake = GetDamage( pTarget->GetWeapon(), Dist );

         double EnemyAccuracy =
384   Behavioral Mathematics for Game AI

                   GetAccuracy( pTarget->GetWeapon(),


                                  pTarget->GetPointer() );

               double TimeToKillMe = TimeToKill( mHealth,


                                                      EnemyAccuracy );

               // Calculate threat ratio

               double ThreatRatio = ( TimeToKillMe + ( 3 * TimeToKillEnemy ) ) / 4;

               // Calculate target urgency based on proximity to the goal

               double Urgency = CalcTargetUrgency( pTarget->GetDistToGoal() );

               // Create and store the target information

               sTARGET_INFO ThisTarget;

               ThisTarget.pDude = pTarget;

               ThisTarget.pWeapon = Weapon.pWeapon;

               ThisTarget.Score = ThreatRatio / Urgency;

               mvTargets.push_back( ThisTarget );


           The function ScoreTarget() takes a pointer to a CDude and a weapon type as
      its parameters. We will see where these come from in a moment.
          The beginning of the function is familiar. First, we calculate the time it takes us
      to kill the selected enemy. Second, we calculate the same information in the other
      direction—the time it would take for the selected enemy to kill us. This is the portion
      we addressed above. Immediately after that, however, we calculate ThreatRatio
      according to the weighted average formula we defined earlier.
          We then calculate the urgency of the target based on his distance from the
      detonator. The function we call to do this is simple.
          double CAgent::CalcTargetUrgency( USHORT Dist )


               double Urgency = 1 - ( pow( Dist, 0.4) / 10);
                                         Chapter 14 Modeling Individual Decisions      385

              // Clamp negative urgency to 0

              if ( Urgency < 0.0 ) {

                   Urgency = 0.0;

              } // end if

              return Urgency;


           Once we have ThreatRatio and Urgency, we proceed with building the infor-
      mation about this target. Remember that sTARGET_INFO represents a combination
      of a Dude and a weapon that we are scoring for comparison purposes. We create a
      new instance of sTARGET_INFO, set the appropriate Dude and weapon information,
      and then save the score using the simple equation we decided on above.

          ThisTarget.Score = ThreatRatio / Urgency;

          We then push it onto the list of targets and exit the function. That is the entire
      process for scoring the utility of attacking one of our enemies with one of our


      Of course, we need to repeat ScoreTarget() for each of the enemies we face and
      for each of the weapons we carry. If we are facing three Dudes and have four
      weapons available to us, we have 12 options to score. By looping through the two
      lists, we can call ScoreTarget() for each of the 12 combinations.
          void CAgent::ScoreAllTargets()


              sWEAPON_INFO ThisWeapon;

              CDude* pThisEnemy;

              USHORT ei, wi; // loop indexes

              mvTargets.empty(); // start with a fresh list of targets
386   Behavioral Mathematics for Game AI

                  for ( ei = 0; ei < mpDudeCollection->GetSize(); ei++ ) {

                      pThisEnemy = mpDudeCollection->GetPointer( DUDE_TYPE(ei) );

                      for ( wi = 0; wi < mvWeapons.size(); wi++ ) {

                         ThisWeapon = mvWeapons[wi];

                         // Only consider loaded weapons

                         if ( ThisWeapon.Ammo > 0 ) {

                             ScoreTarget( pThisEnemy, ThisWeapon );

                         } // end if

                      } // end for

                  } // end for


          We have also placed our call to ScoreTarget() inside an if statement that
      checks to see if we have ammo for the selected weapon. If not, there is no need to
      check it and add it to the list.


      At the end of ScoreAllTargets(), we have a list of all the possible attack combi-
      nations. Selecting an option is a simple exercise at this point. Because we built our
      scoring algorithm so that the lower the value, the better the option, we can logically
      deduce that the target with the lowest score is the best option for us to select. We can
      sort mvTargets of targets by the Score value so that the lowest value is at the
      beginning of the vector. (Naturally, we could simply walk the list and make a note
      of the lowest-scored item. I sort it here because doing so will come in handy in
      Chapter 16.) We then set our target and weapon to the values held in that location.
      The entire function for selecting a target is:
             CDude* CAgent::SelectTarget()



                  std::sort( mvTargets.begin(), mvTargets.end() );
                                            Chapter 14 Modeling Individual Decisions      387

                  mpCurrentTarget = mvTargets[0].pDude;

                  mpCurrentWeapon = mvTargets[0].pWeapon;


           Reading it in order, we score all the targets, sort the vector by the score, set our
      target, and change our weapon. That’s it. Easy enough. We did all the heavy lifting
      in the layers of formulas that we built piece by piece (Figure 14.11).

                 FIGURE 14.11 The scoring algorithm for selecting a target and a weapon
                combination cascades through many levels. We combine Time to Kill Enemy
                      with Time to Kill Me, to arrive at Threat Ratio, which we then
                       combine with Urgency to yield our final score for the target.

      In Chapter 16, we will show some alternatives for selecting options that will
      allow us to create less predictable, deeper-looking behaviors in our agents.


      We will now take our new algorithm for a complete test drive to test some combi-
      nations of factors. When conducting this process, it is best to start with something
      simple and predictable. For example, if we place our three Dudes the same distance
      away from us and the same distance away from the detonator, and arm them all
      with the same weapon, we eliminate many of the possible variables.
388   Behavioral Mathematics for Game AI


          Dude                Distance          Dist. to Det.       Weapon
          Baddie              150                100                Machine Gun
          Evilmeister         150               100                 Machine Gun
          Boss Man            150               100                 Machine Gun


          Name                 Weapon            Threat          Urgency        Score
          Baddie               Pistol            754.6           0.369          2,044.7
          Baddie               M/G               6.9             0.369          18.8
          Baddie               Shotgun           754.6           0.369          2,044.7
          Baddie               R/L               7.0             0.369          19.0
          Evilmeister          Pistol            753.1           0.369          2,040.8
          Evilmeister          M/G               6.6             0.369          18.1
          Evilmeister          Shotgun           753.1           0.369          2,040.8
          Evilmeister          R/L               6.7             0.369          18.3
          Boss Man             Pistol            752.7           0.369          2,039.6
          Boss Man             M/G               7.4             0.369          20.1
          Boss Man             Shotgun           752.7           0.369          2,039.6
          Boss Man             R/L               7.5             0.369          20.4

           As we can see, because all the Dudes are at the same distance and using the
      same weapon, their threat ratios are very similar. The difference comes from two
      factors. First, the differences in their accuracy rates adjusts how much damage they
      can do to us. Boss Man will be a little more dangerous to us, and Baddie will be
      slightly less so.
          Second, the different types of Dudes have different amounts of health (50, 75,
      and 100). This makes a significant difference in how long it takes us to kill them.
      This yields an interesting result. Notice that killing the lowly Baddie is the second-
      most-preferable option. Despite the fact that Baddie is a less dangerous attacker,
      because of his lower health, we can kill him quickly. The difference is small, but in
      such a similar situation as this, we might as well get him out of the picture. In this
      situation, though, our algorithm suggests that killing an Evil Dude (in this case, Mr.
      Evilmeister) is our best bet. Also, given the distance, our trusty algorithm informs
      us that our best choice is the machine gun.
                                     Chapter 14 Modeling Individual Decisions        389

     Because the distance to the detonator is the same for all three Dudes, the urgency
ratings are all the same. Therefore, there is no change in the best choice as reflected by
the final score. We should whip out our trusty machine gun and let Evilmeister have it!

Here… Use This Instead
If we make one change to the above scenario and give Evilmeister a shotgun, his
threat ratio drops significantly. At 150 feet away, he can’t hurt us. Replacing only
his lines from the above table, we can see how his threat ratio (and his final score)

    Name                   Weapon              Threat         Urgency        Score
    Evilmeister            Pistol              1,000.0        0.369          2,709.7
    Evilmeister            M/G                 253.5          0.369          687.0
    Evilmeister            Shotgun             1,000.0        0.369          2,709.7
    Evilmeister            R/L                 253.6          0.369          687.3

     Evilmeister’s threat scores are high enough that he is taken completely out of
the running as our prime target. (Lucky him!) Notice that the threat ratio for using
a pistol or a shotgun is pinned at 1,000. This is as we designed. Because a pistol and
shotgun can’t reach him at 150 feet, the damage we could due with those two
weapons is 0. They are not even considered.
   Because Evilmeister is no longer a threat, we turn our attention to Baddie the Bad
Dude, who still comes in with a score slightly under (more important than) Boss Man.

Back Off, Son!
We will grudgingly give Evilmeister back his machine gun for the moment. There is
something else we need to test. What would happen, for example, if Baddie were to
begin to approach us? If we change the distance to Baddie to 100 feet rather than
150, his accuracy and damage rate go up. That makes him more dangerous. Of
course, our rates go up as well, making it easier for us to kill him. As we would
expect, Baddie becomes a higher priority as he gets closer to us. Again, showing
only Baddie’s new data:

    Name                   Weapon              Threat         Urgency        Score
    Baddie                 Pistol              29.0           0.369          78.6
    Baddie                 M/G                 5.3            0.369          14.3
    Baddie                 Shotgun             753.4          0.369          2,041.6
    Baddie                 R/L                 4.3            0.369          11.8
390   Behavioral Mathematics for Game AI

          When we recall that the prior best score before Baddie made his move was
      Evilmeister’s 18.1, we can see how much those 50 feet meant. Not only should we
      now attack Baddie, but we should do it with our rocket launcher. The reason for the
      weapon switch is because of the accuracy and damage curves that we designed very
      early on. A quick glance at Figure 14.6 reminds us that the rocket launcher is our
      most effective weapon at a range of 100 feet. (We should notice as well that while the
      shotgun is still not a good choice, the pistol is now something to at least consider.)

      Don’t Touch That!
      So far, all we have changed is the Dudes’ distance to us. Because we have not
      changed their respective distances to the detonator, the urgency scores have not
      changed. To test the effect this has on our scenario, let’s assume that Boss Man has
      made a break for the dastardly device. His path takes him no nearer to us, however.
      The only distance that is changing is the range to the detonator—to which he is
      coming alarmingly close. To complete the picture, we will leave Baddie at his closer
      range of 100 feet. The updated parameters, therefore, are:

          Dude                Distance          Dist. to Det.       Weapon
          Baddie              100               100                 Machine Gun
          Evilmeister         150               100                 Machine Gun
          Boss Man            150               20                  Machine Gun

          As we would expect, as Boss Man gets closer to his goal, our urgency to drop him
      increases. To help us make the various comparisons, we will list all 12 options again.

          Name                 Weapon            Threat          Urgency        Score
          Baddie               Pistol            29.0            0.369          78.6
          Baddie               M/G               5.3             0.369          14.3
          Baddie               Shotgun           753.4           0.369          2,041.6
          Baddie               R/L               4.3             0.369          11.8
          Evilmeister          Pistol            753.1           0.369          2,040.8
          Evilmeister          M/G               6.6             0.369          18.1
          Evilmeister          Shotgun           753.1           0.369          2,040.8
          Evilmeister          R/L               6.7             0.369          18.3
          Boss Man             Pistol            752.7           0.669          1,125.9
          Boss Man             M/G               7.4             0.669          11.1
          Boss Man             Shotgun           752.7           0.669          1,125.9
          Boss Man             R/L               7.5             0.669          11.3
                                     Chapter 14 Modeling Individual Decisions    391

    Checking back against our original results shows us that the threat scores for
Boss Man have not changed. This makes sense because his weapon, our weapon,
and the distance between us have not changed. On the other hand, because he is
now only 20 feet away from the detonator, our urgency level for him has climbed
to 0.669. Despite the fact that Baddie and Evilmeister have a more important
(lower) threat ratio than does Boss Man, the Arch Dude’s proximity to the awful
annihilatory apparatus drops his final score under that of the other options. Our
decision would now be to attack Boss Man with our machine gun.

Dude, There Were Dudes Everywhere!
In one last test, our original three Dudes are joined by reinforcements. There are now
eight Dudes total: four Bad Dudes, three Evil Dudes, and their infamous Arch Dude
leader, Boss Man. They are armed with different weapons, and arrayed at different
distances away from us and from the detonator.

    Dude                Distance           Dist. to Det.       Weapon
    Baddie 1            40                 50                  Pistol
    Baddie 2            80                 125                 Machine Gun
    Baddie 3            30                 130                 Shotgun
    Baddie 4            60                 90                  Shotgun
    Evil Genius         120                80                  Rocket Launcher
    Evil Knievel        180                40                  Machine Gun
    Evilmeister         60                 110                 Machine Gun
    Boss Man            90                 125                 Rocket Launcher

    At first glance, it is difficult to sort through the list and determine who might
be the highest-priority target. Consider the following observations:

    Baddie 3 is closest to us and has a powerful shotgun.
    Evil Knievel is closest to the detonator.
    Baddie 1 is close to us and to the detonator.
    Baddies 1 through 4 have only 50 points of health and are, therefore, the easiest
    to kill.
    Boss Man is the most accurate shot and has a rocket launcher.

    Unfortunately, none of those observations bring us to the correct conclusion.
Only when we run the information through our algorithm do we take all of that
information into account, rate it according to the formulas we have devised, combine
392   Behavioral Mathematics for Game AI

      it in the ways we have defined, and arrive at a single score do we come to the most
      logical answer:

              Use our rocket launcher to kill Evil Genius.

          Because there are eight enemies and four weapon choices for each, there are 32
      entries in our target list. Rather than clutter things up, we will only list the best
      weapon for each of the eight targets. (It’s not much of a surprise that it’s usually the
      rocket launcher, is it?)

          Name                  Weapon              Threat        Urgency        Score
          Baddie 1              R/L                 14.1          0.5218         27.0
          Baddie 2              R/L                 4.0           0.3101         13.1
          Baddie 3              R/L                 2.9           0.2992         9.7
          Baddie 4              R/L                 250.7         0.3950         634.7
          Evil Genius           R/L                 3.2           0.4229         7.7
          Evil Knievel          M/G                 8.0           0.5626         14.3
          Evilmeister           R/L                 3.3           0.3445         9.8
          Boss Man              R/L                 2.6           0.3101         8.5

           As we ponder the stats, we begin to see why Evil Genius didn’t immediately
      attract our attention. He ranks third on threat ratio (remember, lowest is most
      threatening) and third on urgency (higher is more urgent). If we had looked only
      at the disconnected data such as distance to us, distance to the detonator, weapon
      strength, and so on, we would not have detected the combination of information
      that makes Evil Genius the highest-priority target. (Rather ingenious of him, in an
      evil sort of way… don’t you think?)


      While it seems like it was a long road to accomplish the decision of which Dude to
      assault with which weapon, by breaking it down into individual steps, we were able
      to simplify the entire process.
          One of the important decisions we made about constructing our agent’s deci-
      sion is what the decision was going to entail. As we theorized early in the chapter
      and subsequently confirmed in our last example, we could not have separated the
      two individual choices of who to attack and what weapon to use. Doing so may
      have led us to less-than-optimum results. This is reminiscent of the Prisoner’s
                                     Chapter 14 Modeling Individual Decisions       393

Dilemma, where thinking only about our own choice rather than taking into con-
sideration our partner’s mindset led us to an acceptable yet not optimal solution.
Only when we considered both of the inputs and results did we arrive at the best
possible outcome.
    Once we decided what it was we were going to decide, we identified the individ-
ual components of the whole decision and dealt with each portion individually.
Establishing compartmentalized confidence in each of those steps as we went along
freed us up to concentrate only on the next step. For example, we were confident
that the formulas for the accuracy of the weapons were accurate. We were also con-
fident that the formulas for the damage from the weapons were accurate. Feeling
good about both of those as individual functions, we felt comfortable combining the
two into damage per second. Feeling that damage per second was an accurate mea-
surement of strength, we felt quite secure in the validity of comparing our damage-
dealing power with that of the enemy. We continued the process of adding more
layers—each layer only concerned with the one immediately before it. In the end,
we arrived at our final decision of who to attack and with what.
    One of the payoffs of the time we spent in developing this model is that our
agent is now highly dynamic. It responds well to changes in its environment. As
Dudes move, it adapts. Adaptation and change in AI agents is one of the major steps
in making AI seem more “alive” than mechanical and scripted.

There’s Always Something Bigger
That doesn’t have to be the end, however. We could have continued to combine
this result with something else. For example, we could introduce the idea of other
actions that are not related to attacking: fleeing, hiding, surrendering, grabbing a
health pack or a new weapon, running to a detonator of our own, or even pausing
to take a photo to memorialize the occasion.
    To incorporate these other possibilities, we would build a process similar to the
one for attacking and, as we have done a few times already, define a connection
algorithm between them. Our process above then becomes part of a bigger picture.
Rather than simply asking, “Who do we kill and with what?” a higher-level compo-
nent would be asking, “If we decide to attack, who would we kill and with what?”
There is a subtle difference between those two statements. The former is a final
decision; the latter is a suggestion.
    The difference becomes clearer if we imagine processing the decision from the
top down, instead. Imagine that our first decision was between the nebulous con-
cepts of attack, flee, hide, get health, get weapon, and take memorial photo. How can
we decide between them without knowing more about their relative merits? Sure,
we can decide that get health is a high priority if we are low on health, but if there is
394   Behavioral Mathematics for Game AI

      no health pack nearby, the arbitrary decision to “go get some health” could send us
      off on a wild goose chase (because everyone knows that geese provide health). We
      can’t decide to hide unless we know there are decent hiding places nearby. What
      constitutes “nearby” anyway? And what are we hiding from? What if we could eas-
      ily win the battle because the attack decision tells us the Dudes that are arrayed
      against us have no hope at all? We would have no reason to hide even if there was
      “the very bestest hiding place in the history of ever” right next to us! In fact, because
      we completely overpower them, we can take a moment to fulfill that burning desire
      to pause and take a photo of the poor Dudes for our MySpace page prior to blow-
      ing them up.
          The point is, all of those decisions are related through information that is
      specifically tied to them. Based on criteria that each would process on its own—
      hide needing a convenient hiding spot, for example—the possible decisions would
      generate their own utility values. We can then compare and contrast these utility
      values to decide which action best suits our needs at the moment.
          The moral of the story of the Dudes is:

          We can’t make decisions without information.
          Information is often ambiguous until we relate it to something.
          We can combine lots of little decisions into bigger decisions.
          We can roll bigger decisions into huge decisions.
          Beware the Evil Genius with the rocket launcher!

          There is one problem with all of the above, however: Information rarely stays
      the same—and especially not for very long.
          So then what do we do?
15             Changing a Decision

     n the last chapter, we illustrated many aspects of what goes into a making a de-
     cision. Indeed, throughout much of this book, we have thought about a decision
     as a static, isolated event—a puzzle to be solved. We have even examined some
 of the examples from various starting states.
      In the example in Chapter 13, we looked at how the weighted decision model
 for “what to do for dinner” yielded a different output if we offered it different inputs.
 We didn’t address the situation in terms of when the inputs change. The scenario
 for a given evening (i.e., the time and price restrictions and our desire for types of
 food) was presented as inputs to the algorithm. The algorithm crunched the num-
 bers and gave us an output. We were done. There was no further consideration
 given to what to have for dinner. One situation led to a single choice.
      Similarly, in Chapter 14, we laid out numerous starting conditions for our
 Dude assault. What types of Dudes are we facing? What weapons are they carrying?
 Where are they? Where is the detonator? We can represent all of those conditions
 in a single slice of time—the world as it is now. That slice is as thin as the simulation
 rate—a single frame of information… literally, a fraction of a second. We proceed
 with the decision process based only on that snapshot and yield a decision just as
 static. This is what we are doing now.
     But then what?
      The problem with a static, thin-sliced mentality is that time and circumstances
 are usually fluid. Situations change. Information varies. Inputs fluctuate. Accordingly,
 our decisions must adapt as well. On the other hand, changing our decision can be
 a weakness as well as a strength, as we shall examine in a little bit. Before we get too
 involved with the caveats, however, we need to address whether we need to change
 our minds in the first place.

396    Behavioral Mathematics for Game AI


       Just because a decision is valid at one point in time doesn’t mean it will always be
       so. When we base a decision entirely on information that we input into it from the
       world, we tie the decision itself directly to that information. That is not to say that
       any change in input would automatically yield a different decision. The problem we
       have is that we don’t necessarily know what changes will cascade into a different
            For instance, looking back at our Dude example from the previous chapter, it is
       unlikely that if any one of the Dudes took one step in any direction there would be a
       significant enough effect in the output to warrant a change in the decision. But how
       many steps would constitute enough of an adjustment? And what if all the Dudes
       were moving? And what if all of the Dudes’ health levels were fluctuating? And what
       if our ammo was changing? If all the inputs are changing, even in small amounts, the
       combination of all the changes makes for a dizzying array of possibilities.
            To account for the possibility that our results will change, we need to establish
       a method of monitoring those inputs so that our agent can respond accordingly.
       Unfortunately, there is no single solution to this problem. The approaches to mon-
       itoring a scenario are as widely varied as the procedures for generating the decision
       in the first place. As we have seen throughout this book, there is no single, all-
       encompassing answer. However, we do have some general considerations that we
       can take into account. By matching these tools to the task at hand, we can build our
       routines in a way that solves the problem we are attempting to address.

       One of the first concerns that we must address is how often we are going to check
       ourselves. We can think of this in the same manner as the concept of granularity
       that we discussed in Chapter 13. We want to establish a “Goldilocks” rule on how
       often to reprocess our decisions that is a balance between too often and not often
           Sometimes the time frame of the decision is obvious. For example, it would be
       overkill to process the “what to do for dinner” decision very often. In fact, one
       could make the case that once a night is enough (although we will throw a wrench
       into that in a minute). Other times, the notion of how often to reanalyze our deci-
       sion may be a little more obscure.
           If we were driving down the street in traffic, how often should we consider
       changing lanes? Is it something that we, as human drivers think about multiple times
       per second? Every second? Every five seconds? (On second thought, I know some
       people who seem to follow that rule.) What if we were on a highway in western
                                             Chapter 15 Changing a Decision      397

Nebraska with no other traffic in sight? Would you need to ever think of changing
lanes? (For my gentle readers who have not been to western Nebraska, think of a
highway on the moon instead.)
    It could be just as awkward to think about changing lanes over a longer period
such as five minutes, however. What if a situation arose where a lane change was not
only available, but desirable? Or even necessary? We would look a little silly sitting
there waiting patiently for our “lane-change timer” to expire. (On second thought,
I know people like this, too.) The problem is that there is no “natural” interval at
which we should consider changing lanes.
    While the correct approach to any given problem may be a little obscure at
times, we can generally narrow it down to one of four approaches:

    Every frame (continuous recalculation)
    Defined time periods
    Polling for changes
    In response to an external event

    Let’s examine the strengths and weakness of each approach.

Continuous Recalculation
The term continuous recalculation doesn’t evoke a lot of mystery about how often
we are reprocessing the world. In a nutshell, this method recalculates a decision
every frame. The only variation is how many frames per second we are running.
Suffice to say, for all intents and purposes, this method is equivalent to “as fast as
we possibly can.”
    There are uses for this approach throughout much of artificial intelligence (AI).
Simulation of flight, racing, aiming, and other continuous forms of movement, for
example, depends heavily on continuously updating speed and directional control
to match the information of the exact moment. Animation AI needs to know the
position of body, the states various limbs are in, what objects are in the way, and
what forces are present. One could make the case, however, that these items are
reactions and not decisions or behaviors. While reactions certainly do need to be up-
dated quickly, continuous updates aren’t as appropriate for behavioral AI.
     Above, we suggested that our decision on what to do for dinner should be cal-
culated once per evening. Barring certain contingencies (such as receiving a phone
call from friends inviting us over), imagine our dinner decision being constantly
monitored with the question, “Are we still having microwave burritos? Are we still
having microwave burritos? Are we still having…” (I’ll quit there. You probably get
398   Behavioral Mathematics for Game AI

      the point.) The continual analysis of a decision that we have already made and that
      is unlikely to change quickly is horribly inefficient (and sounds amazingly like a

      Defined Time Periods
      Going back to the reactions vs. behaviors question, from a pragmatic standpoint,
      one argument to not use continuous processing is a matter of reaction time. Human
      reaction time to a stimulus generally runs in the range of 150–300 milliseconds.
      A game running at 50 frames per second averages 20 ms per frame. Therefore, a
      human reaction time would be about 7 to 15 frames.
           For purposes of calculation, let us use 12 frames (240 ms) as our typical bench-
      mark. If our agent reprocesses the world every frame, and reacts to changes in the
      world in that same frame, it is acting in 1/12 the time that a human would. What’s
      more, if the stimulus continues to change every frame, our agent could theoretically
      make 12 different decisions in the time it should take for a real human to even react
      to the first stimulus.

      You can test yourself (and see the results of other people’s tests) at All they ask is that you
      click the mouse when the image changes from red to green. (This is similar to
      the test that cab drivers in major urban areas put the rest of us through at stop-
      lights—their horns being the punishment for failure.)
      It’s actually startling to realize how long it takes humans to react to things. I
      refuse to divulge what my average was… but I’m old. Let’s just say that my
      teenagers fared significantly better. Even my teenagers’ relatively rapid times
      and those of the people on the “high score” board make the idea of a single
      frame reaction look a little absurd.

           As we can see, continuous recalculation makes for serious overkill. We can
      mitigate this effect somewhat by ensuring that we don’t process more than one
      reaction in a set amount of time. By waiting for an animation to finish before we
      check for another stimulus, for example, we can slow down our agent’s hypersen-
      sitivity. If a “fire weapon” animation takes a full second, we can only decide to
      check for a new target at the end of each fire animation. We have limited the pro-
      cessing to once every 50 frames.
           We can also set an arbitrary limit on how often an agent will process decisions.
      If, for example, we use the 240 ms reaction time as our guide, we can elect to have
      an agent only process a decision every 12 frames. By setting this pace for the reactions
      of one agent, we open up those frames to allow other agents to make their decisions.
                                              Chapter 15 Changing a Decision        399

Theoretically, 1/12 of our agents will be making a decision in any given frame. (We
can actually specifically enforce this balance if we like.) This allows us to process
more agents and still have a short enough reaction time to make our agents look and
feel like they are reacting in human-like time spans. We aren’t really modeling re-
action time with this method, only taking advantage of the time that it gives us. The
possibility still exists, for example, that the decision would be processed in a single
    We can make use of this time savings in other ways. If we use those extra frames
to process more information about the world in more depth, we can possibly con-
struct better behaviors. Using 12 frames rather than 1 to make a decision can be
rather freeing. Exactly how we would do this depends largely on the type of decision
we are making and the architecture we are using. Although an extended discussion
on these techniques is beyond the scope of this book, some suggestions are multi-
threading and the well-documented techniques for spreading search algorithms
(such as A*) over multiple frames.

Polling for Changes
While we mentioned above that one of the inefficiencies of constant recalculation
is when the data hasn’t changed much, there is the possibility that the data has not
changed at all from frame to frame. If the data that we are incorporating in the de-
cision has not changed, we are processing the same inputs only to repeatedly arrive
at the same decision. To avoid this problem, we first check to see if any of our inputs
have changed. If they have, we can then proceed with recalculating the decision. We
refer to this method of asking the world for information as polling.
Polling has serious drawbacks, however. To check if something has changed, we
have to remember what it was before. This means that an agent would have to re-
member the state of all of the inputs that it uses in all of its decisions. Multiply this
by multiple agents and the data demands can increase rapidly.
    Continuously polling the world to see if an event has happened that would re-
quire a change in decisions is taxing to the processor. Imagine the simple act of an
agent determining if anyone in the local area has thrown a grenade that he would
need to react to. That seemingly simple test involves the following steps:

     1. Determine enemies in range of agent.
     2. Determine if agent has line of sight to enemies.
     3. Check each visible enemy to see if they have thrown a grenade since the last
        time we checked.
     4. Determine if grenade will land close to us.
     5. If necessary, process the reaction.
400   Behavioral Mathematics for Game AI

           Simply recalculating the enemies in range and determining if we have a line of
      sight to them (items 1 and 2) is an expensive proposition. Asking each enemy if he
      has thrown a grenade every frame (or even every few frames) when it is statistically
      likely that they have not done so, is wasting precious clock cycles. For example, if an
      agent only throws a grenade on average every 30 seconds, that is 1,500 times we
      must ask (at 50 frames per second, or fps) to receive one “yes” answer. If we also de-
      cide to check to see if they have aimed a weapon, fired a weapon, taken a step in any
      direction, surrendered, died, or stopped to take a memorial photo of the moment,
      we are asking a lot of repetitive questions for very little gain.

      Event-Driven Recalculations
      The usual solution to “polling paralysis” is an event-driven model. Rather than having
      each agent poll the environment to see if anything has changed in the 20 ms since
      he last checked, we can instruct agents to tell each other if they have done anything.
      We call this an event-based system.
          In our dinner example, receiving a phone call from friends inviting us over is
      an event that we can specifically associate with the dinner decision. Naturally, despite
      having made our decision earlier, we can use this excuse to recalculate our plans
      (with the addition of a new option).
          Using an event-based system, we broadcast actions performed by one agent to
      other agents. For example, if our agent throws a grenade, he performs steps one and
      two above and sends a message to the other nearby agents informing them that he
      has done something that is worth of their attention. Only then do the other agents
      know to include the new grenade in their decision process. The new action list is now:

           1. Determine enemies in range of agent.
           2. Determine if enemies have line of sight to agent (notice we reversed this).
           3. If necessary, inform each affected agent of the event.

          More importantly, rather than performing the above tasks every frame, we only
      execute them when there is something to say. In our grenade example, if we only
      lob a grenade every 30 seconds, we only send a “grenade!” message out every 30 sec-
      onds. We have achieved a 1:1 efficiency rate.
           Unfortunately, this is not a completely fail-safe solution either. We may not
      want our decision-making agent to wait until the state of the world changes. We have
      all played games where an enemy sat and waited, doing nothing until some trigger
      event happened. This is most noticeable in games where an enemy gets “aggroed”
      (angered into attacking) by the proximity or line of sight to a player. In older
      games, the agent would idle in either a state of suspended animation or an idle loop.
                                                  Chapter 15 Changing a Decision      401

       Some recent games have incorporated more realistic-looking combinations of idle
       behaviors that don’t rely on player-triggered events. Despite the appearance of
       added complexity, however, the enemy is still “waiting for something.”
           A completely event-triggered method can also be extremely brittle. Depending
       on when and in what order the messages arrive, our agents may react in different and
       unexpected ways. Even messages that arrive in the same frame may be processed in
       an order that leads to problems simply because of the order they were placed in the
       message queue. Also, considering that some of the actions that we may trigger as a
       result of a message can take a significant amount of time to perform (especially
       when measured in 20 ms frames), we may effectively preclude a response to some-
       thing that needs to be considered.
           For example, if we receive a message that causes our agent to decide to do
       something relatively mundane (such as answering the phone), we initiate that task,
       and then receive information of a high priority (the grenade), we may not process
       that information until much later. Our agent can get stuck doing the mundane task
       when he needed to consider something else.
           The mechanics of event-driven architectures, the messaging systems that sup-
       port them, and the difficulties involved in interrupting animations are well docu-
       mented in other books. However, it is important to be aware that they exist from a
       conceptual standpoint.

       As we have seen with the various tools and techniques throughout this book, a
       combination approach is often called for. We can draw from the strengths of the
       different methods to build an appropriate self-monitoring scheme. As we discussed
       at length in Chapter 2, observation of our own behaviors holds many clues to what
       the proper approach is. Just as we broke down a decision process into smaller val-
       ues, formulas, and algorithms, we can subdivide the decision-monitoring process
       into component parts.
            For example, think about what we do when we drive a car—an example we
       touched on briefly above. The decisions we make require a reevaluation of our en-
       vironment to determine that our current course of action is the correct one. For
       simplicity’s sake, let’s list only the factors we consider to determine if we should
       change our speed. While we monitor many items on a regular basis, we monitor
       none of them on a continual basis—that is, taking up all of our attention. We tend
       to alternate our attention between the following:

           Our current perceived speed
           Our current reported speed
402   Behavioral Mathematics for Game AI

          The current posted speed limit
          The proximity of the next traffic light
          The state of the next traffic light
          The proximity of the nearest traffic ahead of us
          The speed of the nearest traffic ahead of us
          The speed of traffic ahead of the traffic ahead of us
          The speed of traffic near us that may help or interfere with our plans (for a lane
          change, for example)
          Unexpected events to which we may have to react

          While all of those components are factors that we would want to build into a
      decision on whether to speed up, slow down, apply the brakes, or maintain our cur-
      rent velocity, we certainly don’t think about them all at the same time. How often
      do we make a mental note of our speed, for example? Every few seconds? Most of
      us likely check our speedometer even less than we make a mental note of our
      speed—perhaps every 10 seconds. How often do we consciously look for the posted
      speed limit? More likely in the process of “scanning the world” we happen to notice
      the sign and compare it to our mental record of what we thought it was.
           While we may make a note of how far away the next traffic light is, we don’t
      continually monitor its state until it is close enough that we would need to react to
      a change. (My father gleefully tells the story of how, when I was three years old,
      upon seeing a traffic light at the next corner turn red, I would stop walking… even
      if I was in the middle of the block. Obviously my AI was still being debugged at age

      Determining Minimum Granularity
      Most of the examples in the above list are addressable by setting different timers for
      checking information. One problem with this analogy, however, is that we have a
      hard time differentiating between the collection of information and the reprocessing
      of a decision. In a computer situation, the information is already there. We aren’t
      necessarily modeling the sensory system of “checking the speed.” Our driver agent
      already “knows” the speed. We aren’t modeling the process of looking out the win-
      dow at the scenery and occasionally glancing at the car ahead of us or the distant
      traffic light. All of that information is immediately accessible. By only processing
      the decision calculations occasionally rather than continuously, we are leveraging
      the fact that people don’t process stimuli that quickly.
                                             Chapter 15 Changing a Decision      403

     If we were to list the time periods between updates of the different actions in
the list, we would find a range of times. We, as drivers, may perform some of the ac-
tions every second, some every few seconds, and some every minute or so. It would
not make sense for us to wait until we have updated everything, however. In fact, the
opposite is true. If we have updated any of our information, we need to reprocess
the entire decision using the information that we already have. What we are estab-
lishing is a minimum granularity. If the shortest update interval is one second, for
instance, it is possible that our information would change in as little as one second.
Therefore, we need to reprocess our decision at that interval.

Interrupting with an Event
The one glaring exception that doesn’t work on a timed-update-based system is the
final item: unexpected events. Almost by definition, continually checking for an
unexpected event is counter-productive. This is where the event-based system is
helpful. Putting our driving example into computer terms, if the driver ahead slams
on his brakes, that car sends an “I’m braking!” message to other nearby cars. The ef-
fect is much like the grenade message earlier. This event would immediately trigger
a recalculation—not necessarily to stop (what if the driver “ahead of us” is a full
block ahead?), but to recalculate our current situation and construct an appropri-
ate decision. In essence, the message is telling us, “You really need to recalculate
everything right now!”

Timers as Interrupts
By constructing the timers we discussed above so that they send messages as well,
we effectively combine the two methods. The difference is minor and mostly logis-
tical. The agent’s message receiver (again, beyond the scope of this book) would not
need to differentiate whether the message instructing him to recalculate the deci-
sion arrived from an internal timer or an external event. The result is the same:
“Recalculate now.”

 IN   THE   G AME   Dudes Revisited

In the previous chapter, we constructed an elaborate method of determining who
the “best” target is for us to shoot. Let us now examine the possible solutions for
how and when to reexamine our decision and, possibly, reevaluate our target.
    First, we do have a natural division that we can work from—one shot. The ratio-
nale for using a single shot as the minimum granularity is simple; we can’t change
targets in the middle of a shot. Of course, the time period that each shot may take
404   Behavioral Mathematics for Game AI

      is different. Shooting our machine gun is a fairly rapid event. Similarly, a pistol is
      quick but requires more effort to aim. The shotgun doesn’t require as much time
      to aim, but the reload time is going to be more involved. Lastly, the rocket launcher
      is slower to wield, slower to aim, and slower to reload. The result of this is that,
      while we have a defined action that constitutes a single decision, we don’t have a
      single amount of time.
           The solution, therefore, is to recalculate the decision of whom to shoot next
      after the conclusion of each shot. As we prepare to shoot the next target, we proceed
      with all the nifty stuff that goes into facing the target and aiming. Upon completion
      of that shot (regardless of how long that takes), the decision about whom to pum-
      mel next is reevaluated and the process starts anew.
          This seems simple and foolproof at the moment, but there are plenty of wrenches
      that we could find haphazardly lobbed into our tidy little process. For example,
      consider the following sequence:

           1. We complete a shot at Bad Dude.
           2. We reload our weapon.
           3. We process the decision of whom to shoot next.
           4. We turn toward the target.
           5. We raise our weapon.
           6. We aim our weapon.
           7. We get ready to fire… and…
           8. We find that Bad Dude is already dead!

           Somewhere between step 3 and step 8, the decision of whom to dispatch was
      rendered irrelevant. While this might not look too out of the ordinary for a fast-
      firing weapon such as a machine gun or a pistol, our agent would look startlingly
      ridiculous going through the protracted action of aiming a rocket launcher at the
      space formerly occupied by a Bad Dude. If we only build our model to recalculate
      after a shot, we have to go through with the shooting process to select a new target.
      What’s more, if the game code insists that the actual act of shooting requires a
      valid, live target to complete, we may now be stuck.
          To avoid all of this, we can add a verification process throughout steps 3
      through 7. In essence, we are asking the simple question, “Can I still do this?” If not,
      we can trigger our decision evaluation process immediately (through an event mes-
      sage, for example).
                                                     Chapter 15 Changing a Decision         405

           To make this process more involved, let us return to the idea that the decision
      about “who to shoot” is only a part of the whole. At the end of Chapter 14, we listed
      other actions that we could consider: fleeing, hiding, surrendering, grabbing a
      health pack or a new weapon, running to a detonator of our own, or (as we so glibly
      joked) pausing to take a photo to memorialize the occasion. When we consider
      those actions as well, the line that demarks when we should process a new decision
      is a little more blurry.
          If we are hiding, for example, when do we peek out? How often do we think
      about peeking out? If we have decided to head off in pursuit of the health pack in
      the distance, what happens if someone else takes it first? Or shoots a rocket at it? Or
      simultaneously makes a break for the world-ending detonator? All of those are sit-
      uations that may cause us to change our mind about what we should be doing.
      Continuing on with our current course of action could make us look… well… less
      than intelligent.
            The answers to some of those events are triggers. A Dude shooting a rocket at
      us is certainly a trigger that would get our attention and spur us into careful recon-
      sideration of our priorities. On the other hand, some of the criteria are polling-
      based. “Is the health kit still there?” is very similar to checking to see if our target is
      still alive, for example.
           Other answers are a little harder to nail down. For instance, we don’t want to
      trigger an event every time a Dude moves. We would never stop triggering events!
      We can stick to polling the relevant information about the location of the Dudes to
      determine if any one of them is either an urgent threat or an opportunistic easy kill.
      However, if our polling interval is too long, the Dude may get a jump on his dash
      for the doomsday switch before we can react.
          Again, there isn’t a “right” answer that covers every contingency. We must
      keep in mind, however, is that we must find that elusive balance between useless
      reprocessing and the accuracy of the current decision.


      While changing our mind at appropriate times and for appropriate reasons is a ne-
      cessity to building agents that react adequately to their surroundings, problems can
      result. Even a seemingly logical decision change—in both the timing and content—
      can be perceived as odd, unproductive, or incorrect. What’s worse, by making our
      agents reactive to their environment, we can be opening them up to manipulation
      and exploits.
          Rather than doing theory first and then an example, let’s reverse the process.
406   Behavioral Mathematics for Game AI

       IN   THE   G AME   Flotilla of Futility

      Consider the following example (Figure 15.1):

            In a tile-based strategy game, we have assembled our naval fleet for a
            bombardment and invasion. We analyze the coastal cities of our enemy to
            determine which one is the most vulnerable. After using our complicated
            utility algorithm, we decide to attack Dudetown (I just had to use the
            Dudes again!), an ill-defended city on the western side of the continent.
            Our fleet sets out in the direction of the soon-to-be-ravaged city (1).
            When the fleet has traveled a significant portion of the distance (2), we
            reevaluate the situation by once again analyzing the coastal cities. Much to
            our surprise, we find that another city, Suckerville, is completely unpro-
            tected! Our trusty attack algorithm determines that you can’t get much
            better than an undefended city. We issue new orders to our fleet to head in
            the direction of Suckerville. Unfortunately, our new target happens to be
            on the eastern side of the continent. Oh well, for the easy spoils, it is worth
            the trip.
            All the way around the continent (3), we continue to check our status.
            Every time, the still-undefended Suckerville comes up on top of our prior-
            ity list. Upon closing to attack range of Suckerville (4), we are astounded to
            discover that, at the last moment, our enemy has moved large numbers of
            defensive units into the city. Strangely (and almost simultaneously, it
            seems), the heretofore well-fortified Dudetown is now devoid of military
            Well aware that an attack on Suckerville would be met with fierce resistance,
            we see Dudetown once again as a far more attractive target. Our ships turn
            around (5) and begin the long trip back around the continent toward what
            must be inevitable victory on the western rim of this vast land (6).
            As we once again spot the doomed city of Dudetown on the horizon (7), we
            check our intelligence reports one last time and find… wait a minute… are
            you kidding me?!?
         Anyone who has played strategy games probably uttered a knowing chuckle
      about two paragraphs in. It turns out that Suckerville was appropriately named.
      Our decision model, and by association, our fleet, was exploited by the enemy.
           This is a common technique to confuse the AI. It is most effective in tile-based
      games where there is a distinct grid space that defines “in the city.” Moving all the
      military units one space outside of a city causes the AI that is analyzing the poten-
      tial targets to see the city as being completely undefended.
                                               Chapter 15 Changing a Decision        407

FIGURE 15.1 By not taking the travel time into account, the AI fleet does not realize that
          it is wasting significant time alternating between two distant goals.

     When the fleet approaches the city, the player moves the units back into the
city. Simultaneously, the player removes units from a distant and otherwise safe city
somewhere else. The AI, upon recalculating the situation, views the second city as a
prime target—especially when compared to the original target that is now mysteri-
ously bristling with defenders. Using the same, simple utility model, the AI picks
the “best target” and sets off.
    The player can repeat this process with a minimum of attention and effort. It
also effectively takes the AI player’s navy completely out of the game. None of the
player’s cities get attacked because the AI can’t ever settle on a decision—at least not
long enough to act on it. If the naval forces are an invasion force to the continent,
the benefits are even more significant. The player can live quite comfortably as
long as he needs to in relative peace. He has not built defenses of stone and steel; he
has built a defense of flawed logic and confusion.
    Oh yeah… and the AI looks bad. We may as well put him in dark glasses and a
stupid hat.

   There are a number of ways that our AI can stumble in the manner exemplified
above. The AI in the above scenario fell prey to one of the classic blunders (the most
famous of which is, never get involved in a land war in Asia): It didn’t take into
408     Behavioral Mathematics for Game AI

        account all of the possible inputs. We will explore the exact mechanics of what went
        wrong in a moment. By not taking key factors into account, it allowed itself to be
        led into alternating between decisions. This is a phenomenon known as strobing,
        flip-flopping, or even just flipping.
            While strobing can manifest in many ways, the common thread is the effect
        that strobing has—nothing. In mild cases, strobing AI can look inefficient or mud-
        dled. At its worst, strobing leads to the behavior seen in our example above where
        the fleet alternated between sailing toward one city and the other, never attacking
        either one.

        While that example implied a nefarious player intentionally exploiting the decision
        algorithm of the fleet, AI agents can strobe between behaviors based on less direct
        and less obvious changes in their decision factors. In fact, a worse problem is when
        our game model generates strobing behaviors entirely on its own.
            Agents that depend on each other for decision criteria can enter into a joint
        concert of strobing. To illustrate with a simple example (with yet another nod to
        Brother Occam), imagine two agents facing off in a Rock-Paper-Scissors contest.
        There is a twist to the game, however: each agent can see the other player’s inten-
        tion before they make their play and elect to change what they are going to do.
            At first, each player selects randomly. Then, upon examination of the pending
        match, they can elect either to proceed or to change their selection. Because this is
        a zero-sum game, the only two outcomes are a win-loss combination or a draw.
        Unless there is a draw, one player is going to be on the losing side of the match.
        (Even a draw is less than preferable.) Naturally, one would expect the soon-to-lose
        player to be dissatisfied with this outlook and decide to change his play. Additionally,
        because he can see the other player’s choice, he changes his selection such that it will
        put him in the position to win.
             This act, of course, causes his opponent to be in the potential losing role. The
        logical reaction is for the second player to switch his play as well. You can see where
        this is headed, I’m sure. None of them will ever be satisfied with the pending out-
        come to actually commit to playing. Between the two of them, they will spend all of
        their time switching “weapons” in the battle and never fire a shot.
             While this example might seem absurd, consider a scenario such as the Dude
        assault from Chapter 14. As part of the decision process, our agent took into account
        both the weapon that he was using and the one the other player was using. If the sce-
        nario was drawn up so that there were distinct strength vs. weakness correlations
        (as there are in Rock-Paper-Scissors), two agents with the same decision algorithm
        may engage in much the same dance.
                                                      Chapter 15 Changing a Decision        409

            The reason that we, as intelligent people, find this behavior absurd is that we
       recognize the futility of the situation. In fact, the popular (and generally very accu-
       rate) layman’s definition of insanity is “doing the same thing over and over and ex-
       pecting a different result.” Given that one of the early structural flaws about which
       beginning programmers are cautioned is the “infinite loop,” most of us are very
       cognizant of the fact that computers are blissfully unaware of the existence of, much
       less the absurdity of, futility. Unless instructed otherwise or until a different com-
       plication (such as your processor chip melting) brings it to a halt, a computer will
       contentedly repeat mundane tasks ad infinitum.
            One of our tasks, then, is to make sure our AI agents understand the concept of
       futility. We can do this through pattern recognition or a variety of other esoteric
       techniques. However, it is usually better to avoid the situation altogether by provid-
       ing the agent with more information and a better decision model.

       We often refer to the antithesis of strobing as decision momentum. The use of the
       word momentum is not entirely figurative. It actually is appropriate in a metaphor-
       ical sense. We use the word in the same sense as we use it in physics. In the world
       of Sir Isaac Newton, when something is barreling along in one direction, it is going
       to take something pretty big to get it to change direction, much less reverse its di-
       rection entirely.
           Remember that in Newtonian physics, mass is only part of the equation. The
       momentum of an object is a product of its mass and its velocity. When two objects
       of equal mass meet in a collision, the one that is moving will “win” against the one
       that it is at rest. For example, imagine a moving billiard ball colliding with a station-
       ary one. Despite their equal mass, the velocity of the system continues in the direc-
       tion that the original, moving ball was going prior to the collision. However, if the
       moving billiard ball were to collide with something more massive such as a bowl-
       ing ball (to extend things to easily illustrative lengths), the larger (albeit stationary)
       mass is going to be rather convincing that the hapless billiard ball needs to head
       back in the other direction. To sum up, the bowling ball had enough weight to re-
       verse the direction of the entire system.
            We can apply this to our choices as well. Once we have set about doing some-
       thing, we have added velocity to our decision. (This is not a stretch, really. Don’t we
       say that we have “put a plan in motion?”) We can elect to continue performing that
       action—unless and until something “big enough” comes along to change our direc-
       tion. Because information is the impetus that puts our decisions into motion in the
       first place, it is reasonable to assume that information is the force that would act
       upon our decisions to get them to change their course.
410   Behavioral Mathematics for Game AI

           To do this, we judge the weight (now being used figuratively) of new informa-
      tion not only against the weight of the previous information, but against that
      weight times the velocity that the action is already traveling. If we already have
      committed heavily (see how these weight words work so well?) to a decision, the
      momentum of the system will be fairly significant. If information of equal weight
      to what we already used is placed in front of our decision, the idea of decision mo-
      mentum tells us that we would continue on in the same direction. However, if we
      encounter new information that has greater weight than what we originally had
      considered, the pace of our overall decision system will slow down somewhat. The
      greater the difference in the weight of the information, the greater the effect it has
      on our decision. Eventually, if the new information becomes massive enough, it will
      act as the bowling ball above did, and our decision will reverse in the face of this im-
      posing new mass of data.
           Keeping this in mind, rather than focusing on stopping our decisions from
      strobing after the fact, we can ensure that enough of the relevant information is
      available for the agent to take into account so that the problem doesn’t occur in the
      first place. Often, by making the agent a little more aware of information and a lit-
      tle more mindful of ramifications, we are giving it more “mass” and, therefore,
      more decision momentum. Rather than changing a decision on the fluctuations of
      the moment, our agents can persevere with their chosen course of action.

      Time Commitment
      In the Flotilla of Futility example above, there was one more piece of information
      that, if the AI had considered it, may have made the difference in its decision: time.
          In Chapter 7, we spent some time pondering the utility of time as a component
      to consider in decisions. We discussed time spent, time saved, time wasted, and
      other such ways that we can view chronological matters. In one example, we set out
      a scenario in which we were deciding whether we wanted to accomplish goal A be-
      fore goal B or vice versa. We illustrated how the decision depended on factors such
      as the relative value of the goals and the time it would take to travel to them. We
      complicated the situation a little more by introducing decay in the value of the goals
      over time. By doing so, we caused the time we spent traveling to the goals to be even
      more influential in the final decision calculation. All of those factors are transfer-
      able to the example of the aimless armada.
           In the example above, the AI calculated the relative utility of attacking the two
      cities (and presumably others as well). The AI would have to weigh various pros
      and cons about his own fleet, the defenders, and so on. This is, of course, similar to
      the Dude Assault example from Chapter 14. This is where the calculation stopped,
      however. Unlike our decision regarding which Dude to fire a single shot at, there is
                                               Chapter 15 Changing a Decision        411

a significant time commitment involved in execution of the naval attack. While a
portion of the solution was the continual monitoring of the prospective targets for
a change in their status, the weakness in the process was not monitoring what the
resultant decision would entail.
    The “attack” that we were deciding to launch only took into account the ex-
pected odds of success or failure and the relative worth of the target. All of those
factors take place between the time the first shot is fired and the time the last defender
(or attacker) gives up and throws down his weapons. There is no accounting for the
entirety of the action. Put another way, rather than a decision about “what city to
attack,” the decision needed to be “what city to travel to and then attack.” By
considering the whole process as one decision, time is automatically included in the

Information Expiry
In addition to the utility of the time we would spend traveling to someplace or
doing something, there is another reason that we need to consider time heavily in
the decision-making process. An intentional exploit by a player is not the only rea-
son we should treat the decision to attack a particular city with caution. In the
highly dynamic environment that games tend to be, information has a “shelf life.”
Units are created and moved, defenses are constructed, values change… in short,
things happen. Even if the decision to attack a city is a legitimate one based on the
information we have, we have to question the likelihood that the information will
still be valid by the time we get there.
    This is similar, but not identical, to the “fog of war.” The legendary “fog” en-
compasses how information may change while we are not able to view it. Information
expiry deals with the prospect that information probably will change over time
whether we are watching it or not.
    We touched on this above when we discussed the value of updating informa-
tion. However, by factoring in the possibility that the situation will be different as
part of the initial decision process, we are taking a more proactive approach. Rather
than the “we can always change our mind” method that continual updates provide,
we are saying, “let’s not even start.”
    While we could conduct complicated analyses of fluctuation frequencies and
ranges, it is usually simpler to build the idea that the state of the world could change
into the weight given to the time that we will need to spend. We can do this through
biasing the marginal utility in a nonlinear way. Using a simple exponential model,
we could say that our confidence in the validity of information drops away from
100% at an accelerating rate as time passes.
412   Behavioral Mathematics for Game AI

         The graph in Figure 15.2, for example, follows one of our trusty formulas from
      Chapter 10.

                     FIGURE 15.2 By constructing a decreasing marginal utility
                     curve such as this one, we can simulate our gradual loss of
                      confidence that what we know now will be relevant later.

          If we can act on our decision in the next turn, we can remain fairly confident
      that the information we are acting on is going to still be valid when we get there. On
      the other hand, if it will take us 10 turns to arrive, we may only ascribe 90% confi-
      dence to the data. As the time frame lengthens, we grow less confident more quickly.
      Doubling the distance (and therefore the time) to a target would yield a 60% con-
      fidence rate per the formula above. At 25 turns, our confidence is 38%, and if it will
      take us 30 turns to arrive, we may only believe there is a 10% chance that our esti-
      mate will be correct.
           We would only apply this confidence rating to information about items that are
      not in our control. In the example above, we would only apply it to the makeup of
      the city’s defenses. The value of the city as a strategic target likely wouldn’t change,
      and we know that the makeup of our forces won’t change. (If it does, we would
      need to recalculate our plans anyway.)
                                               Chapter 15 Changing a Decision        413

     Exactly how we apply this factor would depend largely on how we constructed
the decision algorithm in the first place. However, it is usually a good idea to bias a
distant objective pessimistically. For example, we would not serve ourselves well if
we assumed that, by the time we arrived, the enemy will have graciously removed
all their defenders.
     In that respect, another method of biasing our decision is to add a pessimistic,
time-based value to the strength of the opponents in the city. Imagine flipping the
curve in Figure 15.2 upside down. Instead of a confidence factor moving downward
from 1.0 toward 0, we could create a multiplier that moves up from 1.0. By using
this value as a coefficient for the defenses, we are artificially inflating our estimate
of the defense capabilities of the enemy. This could represent the possibility that
they will see us coming and have time to move units into the city, for example. The
result is that we will be less likely to consider that option as viable.
     On the other hand, we must also take care to not paint such a bleak picture of
all distant targets that our fleet despondently remains at anchor and doesn’t do any-
thing. Remember, much of the time we are using the information to bias an other-
wise equal decision one way or another… not negate the whole process of action

The Value of Thoroughness
Not all of the considerations on whether or not to switch from one task to another
are based on time, of course. Another significant factor reoccurs in many different
types of decision processes—completing the task at hand. Unfortunately, one of the
side effects of the task monitoring that we discussed at the beginning of the chapter
is that our agent may be tempted to abandon what it was doing to perform what it
believes is a higher priority task. While that is certainly laudable, and is a necessary
part of a proper decision framework, we don’t want our agents simply abandoning
what they were doing because something else comes up. We would often end up
with many partially finished projects and nothing that ever gets completed.
     One simple illustration of this effect is a slight modification of our fleet example.
Imagine that our little armada actually began to attack Dudetown rather than
changing its goal before the assault. However, just as the last battered and bleeding
defenders were about to fall, we received word that the distant Suckerville was un-
defended. If we leave at that moment and don’t finish the sacking of Dudetown, we
give the residents time to rebuild, the military time to regroup and heal, and per-
haps even allow time for reinforcements to arrive. Even if we are only gone for a few
turns, we may be right back where we started in our quest to eradicate Dudetown
of all Dudes. What did we accomplish by our initial attack?
414   Behavioral Mathematics for Game AI

      Finish Him!
      In a similar way, we can extend our Dude assault example from Chapter 14. If we
      have selected our target, fired on him enough to almost kill him, and yet changed
      to target someone else who has become even slightly more preferable, we are still
      exposed to the threat of the original target. However, if we incorporate the utility of
      finishing the job, we would decide to continue attacking the wounded warrior until
      we have eliminated him. This is analogous to the mindset, “I have noticed that the
      Dude over there is now important. I will need to attack him next… after I finish
      killing this Dude.”
          This is especially important in games where the offensive capabilities of a unit
      are not reduced by the damage that it has sustained. For example, if we face four
      dudes who can do 10 points of damage each second, it doesn’t matter if they each
      have 50% health remaining. We will still face all four of them and, as a result, con-
      tinue to take 40 points of damage per second. On the other hand, if we had used the
      same firepower over time to completely kill two of the four dudes, we would have
      done the same amount of damage and yet reduced the damage to ourselves to 20
      points per second instead of 40. The difference was, we finished the job. Taken to an
      extreme, if we were to continue alternating between different Dudes, we could
      theoretically face four Dudes who each have 1% health remaining—and are still
      dishing out 10 points of damage per second each. Obviously, this is not an efficient
      way to handle business.
          We can return to the principles of increasing marginal utility (Chapter 8) to
      help us bias our decisions and imbue our current course of action with some extra
      momentum. We incorporated some of this already in our Dude assault algorithm
      by using the amount of remaining health that a Dude had left (instead of his max-
      imum health) in our calculations. To recall two equations from Chapter 14, we first
      constructed TimeToKill out of some relevant information.

          We then used two different TimeToKill values to calculate ThreatRatio.

           The effect that the TimeToKill results had on ThreatRatio was linear, however. If
      we replace the raw TimeToKillEnemy with a value that represents the marginal util-
      ity of the time it would take to kill an enemy, we could increase the meaningfulness
                                                   Chapter 15 Changing a Decision      415

      of that factor in the formula as the time decreased. To do this, we could pass
      TimeToKill through a function that converts it from linear to exponential, for
           Notice that this is different from simply changing the weight of the factor com-
      pared to other factors. This is utilizing a response curve to change the weight of the
      factor compared to itself over other ranges. For instance, we may not care so much
      if the target has 100% health or 90% health. That 10% difference doesn’t mean
      much to us. We may be slightly more interested as the health drops to 80%. If the
      target’s health is only 20%, however, we become very interested in trying to finish
      this Dude off—and need to increase the utility of this factor accordingly. Inflicting
      that last 20% of damage to the Dude is far more important to us than doing the first

      Just Gimme a Minute, Boss!
      Changing genres to something more constructive, we can use the concepts of mar-
      ginal utility once again with regard to completion of buildings in a strategy game.
      If our game involves the allocation of time, labor, and resources to construct a
      building, we may want to continually check to see if there is something more
      important that the units should be building or doing. There would definitely be cir-
      cumstances that would warrant a worker putting down his hammer for a moment
      to take up another task. (Being attacked comes to mind as something that Mr.
      Carpenter would find mildly distracting.) We may even decide that, despite our
      partially finished building, our labor or money needs to be diverted to a more
      important building. However, it would be tragic (and our players would view it as
      ridiculous) if our workers were to abandon a 99% completed building to start a
      new one elsewhere. Even if the new building is important, is it so urgent that our
      laborers couldn’t simply finish the one they were working on first?
           The incorporation of increasing marginal utility as we near completion artifi-
      cially biases the importance of each successive hammer swing. As the building nears
      completion, we generate sufficient decision momentum to cause the workers to be
      more interested in completing their current task than shuffling off to a new one.


      As has been something of a mantra throughout this vast tome, there is no “right
      answer.” What we have discussed above is a handful of considerations, caveats, and
      strategies. “Ways to think” is probably more accurate. There are a few things to
416   Behavioral Mathematics for Game AI

          First, we need to be willing to change decisions. We need to not be afraid to
      allow our agents to do so. After all, we real people do it all the time—and usually for
      good reason. On the other hand, the ability to change a decision is like a powerful
      weapon that can be deadly if it falls into the wrong hands.
           Second, decisions do not exist in a vacuum. We can’t simply process the world
      as it is at that moment and expect consistently realistic-looking results. Somehow we
      need to step out of the present and look at the whole picture—whether it is the
      repetitive strobing behaviors of the past or the possibility that the decision we make
      now will not be valid in the future.
           Third, we must find that balance between making more decisions or fewer.
      More active processing adds to the responsiveness of our agent—but can make him
      too responsive and flighty. Less processing is more efficient and even somewhat more
      realistic but can lead to an appearance of indifference, bewilderment, or outright
           We can throw one final twist into our decision-making engine, however, and
      this is where the human-ness really sinks in! Rather than changing our mind about
      a decision we already made and replacing it with a newer, better decision, how
      about changing our mind about a decision that we are making and, before we act
      on it, decide to replace it with a worse one?
         Confused? You could always decide to turn the page… unless you have changed
      your mind.
16              Variation in Choice

     n the past few chapters, all of our examples have been solved by finding “the
     best” answer to the problem. We built all of our utility functions to rate and
     score the aspects of the decision, sorted the options by the score, and selected
 the one with “the best” score (be that highest or lowest). Naturally, this sounds
 strikingly like normative (or prescriptive) decision theory. In fact, if we recast it in
 the vernacular we used in Chapter 4, we were determining what we should do.
 Remember that normative decision theory assumes that our agent:

     Has all of the relevant information available
     Is able to perceive the information with the accuracy needed
     Is able to perfectly perform all the calculations necessary to apply those facts
     Is perfectly rational

      We spent a good deal of time in Part II of this book showing examples of how
 humans just don’t do what is perfectly rational. On the other side of the spectrum,
 descriptive decision theory tells us what people elect to do. More specifically, it reports
 what they have already done. Because our data for how demons from the under-
 world will counter the assault of a single space marine is somewhat lacking, we have
 to construct our own models—which defies the survey-based definition of descrip-
 tive decision theory.
      Our goal, however, is to create behaviors that only look like they are born out
 of the wide palette of descriptive decision theory’s data. The first step in this direc-
 tion is away from the single “best answer” that normative decision theory gives.
 That is, we are looking for variation in our decisions. How do we create this varia-
 tion without data from which to work? To find the answer, let’s first examine why
 we want variation in the first place.

418   Behavioral Mathematics for Game AI


      There are a number of reasons why we would want to create variation in the deci-
      sions our agents make. We can illustrate some of them best by using examples of
      what it looks like when variation is not present. Imagine the following:
           We enter a bank containing 20 customers, and 5 employees. We approach
           the middle of the lobby and begin yelling loudly that we are robbing the
           bank. As we look around to assess the impact that our display is having
           on the people, we are stunned to realize that all 25 people are reacting in
           exactly the same way. They are all standing there in stunned silence, too
           afraid to move. Not a single one has screamed or spoken. Not a single one
           has taken a step away from us—or toward us, for that matter. All 25 people
           have reacted in exactly the same way!
           Curious about this odd display of solidarity, we reach into our jacket for
           our gun. The moment our hand disappears into our pocket, every single
           customer in the bank shrinks into a crouch, ready to either duck or run. All
           of them. A glance at the tellers behind the counter shows that all five of them
           have raised their hands.
           As we pull out our gun and wave it wildly in the air, every player in this
           peculiar show drops to the floor and screams. None of them only duck.
           None of them try to run for the door. None of them are still paralyzed with
           fear. None of them even fall to the floor faster or slower than the others.
           The tellers, in the mean time, have all raised their right hands toward us in
           a placating gesture. Exactly two seconds later, their left hands reach into
           their bank tills. Simultaneously, all five say, “Here, take the money!”
           Dying to find out what this impromptu synchronized exhibit will do next,
           we turn the gun to point it at our own head. The customers all stop scream-
           ing, even the small girl in the corner. The employees all put down their
           hands in such rigid precision it seems like they are about to bust into a
           group version of the Macarena.
           We pull the trigger on the gun and the water inside squirts against the side
           of our head. All 25 people gasp in surprise in what sounds like a single,
           magnified intake of breath.
           As the water trickles to the floor, we yell “Hey folks! I was just kidding! It
           was a joke!” As if cued by the producer of a bad TV sit-com, all 25 people
           begin laughing.
           We put our now empty squirt gun back into our pocket, whereupon all 25
           people turn back to what they were doing prior to our entrance. To a man,
           they all seem completely unfazed by the trauma we just inflicted on them.
                                                      Chapter 16 Variation in Choice      419

             On the other hand, we are a little disturbed by what we just witnessed. We
             turn to the door, muttering under our breath, “Wow… it was like a bad
             video game or something.”

       As ludicrous as the above scenario seems, many of us have probably experienced
       something like this in games. The underlying technical problem with the above
       scenario (if you didn’t get it already) is that all the agents were using the same algo-
       rithm to determine their reaction to a stimulus. Every time we changed the world
       state through our actions, we fed the 25 people the same information with which to
       calculate their decisions. As a result, they all arrived at the same conclusion about
       what the “best” reaction should be.
            The only difference in the actions of the various people was that the employees
       and the customers had different reactions at one point. The reason for that is that
       they had different programmed reactions to the stimulus of us waving our gun
       around. The customers dropped to the floor and began screaming; the employees
       tried to placate us. The only reason for this occurring is that, at some point, either
       a designer or a programmer differentiated between the two: “When the player does
       [insert action], I want the customers to [insert reaction] and the employees to
       [insert reaction].”
            While the separation of customer vs. employee is admirable and was certainly
       a logical division, there is a subtle implication in that statement. Imagine the above
       statement rephrased this way: “When the player does [insert action], I want every
       single customer to [insert reaction] and every single employee to [insert reaction].”
       By inserting the specifier “every single…” we draw attention to the inherent weakness
       in this line of thinking.
            The differentiation between customer and employee was based on the reason-
       able premise that customers and employees would likely react to the situation in
       different ways. This is a logical assumption given the fact that their roles in a bank
       robbery would be different. They enter the scenario of a bank robbery with differ-
       ent goals. The customers’ primary goal is “don’t get killed.” The employees have
       two goals: “don’t get killed” and “defuse the situation by giving the really scary man
       what he wants.” The problem with the artificial intelligence (AI) as theoretically
       designed in our example is that every person’s solution for how to reach those goals
       was exactly the same as every other person’s.
           If we were to imagine how a scenario such as this one would go in real life (or
       even in a movie, for those of you who can’t envision yourself in the position of the
       bank robber), we likely would have a different picture than the one painted above.
       Simply making a list of possible customer reactions is enlightening.
420    Behavioral Mathematics for Game AI

             Stand paralyzed with fear
             Scream and run
             Scream and duck
             Scream and grab your child/parent/husband/wife/random stranger
             Just run (no screaming)
             Just duck (no screaming)
             Just grab your child/parent/husband/wife/random stranger (no screaming)
             Move slowly away from the robber
             Move toward the robber
             Try to get behind a desk
             Try to jump over a desk
             Reach for the robber’s gun
             Reach for our own concealed weapon
             Try to call 911 on our phone
             Laugh because the robber is wearing dark glasses and a stupid hat

           As we can see, there are a lot of options that the customers and the employees
       could have chosen from. There are probably many more. Add to this list that most
       of these actions could be done with various delays and speeds. For example, not
       everyone is going to scream at exactly the same time.
            If we can anecdotally list so many actions that our agents could select, the fact
       that they all selected the same action becomes even more startling. Variation in
       behavior is what makes humans look… well… human. Because, as we are fond of
       saying, “no two people are exactly alike,” it is unlikely that their reactions to a given
       set of stimuli will be the same.

       While it is important to consider the behaviors of a group of actors, we also need to
       consider the actions of a single actor. Imagine, if you will, the following scenario:

             We enter a bank containing 20 customers and 5 employees. We approach
             the middle of the lobby and begin yelling loudly that we are robbing the
             bank. The woman to our left shrieks in terror. The man to our right drops
             to the ground and assumes the fetal position. The teenager in back looks up
             from her phone where she is text-messaging, shrugs, and looks back down.
                                               Chapter 16 Variation in Choice      421

    The security guard by the wall turns toward us and stares slack-jawed in
    disbelief and, after two seconds, reaches for the gun at his belt. The girl
    in the pink dress says, “Look at the man with the dark glasses, Mommy! His
    hat is stupid!”

    After displaying our nifty squirt gun trick to the bank patrons and employees
    (and shooting an annoyed look at the little girl with the big mouth), we
    leave the bank.

    Later in the day, we return to the bank, which (surprisingly) still contains
    20 customers and 5 employees. We approach the middle of the lobby and
    begin yelling loudly that we are robbing the bank. The woman to our left
    shrieks in terror. The man to our right drops to the ground and assumes
    the fetal position. The teenager in back looks up from her phone where she
    is text-messaging, shrugs, and looks back down. The security guard by
    the wall turns toward us and stares slack-jawed in disbelief and, after two
    seconds, reaches for the gun at his belt. The girl in the pink dress says,
    “Look at the man with the dark glasses, Mommy! His hat is stupid!”

    Wow. That’s odd. Did I just step into my own version of the movie
    Groundhog Day? What’s with these people? That’s exactly what they did
    earlier today! Wait a minute… let’s try this again.

    We step outside onto the sidewalk and count to ten before we reenter the bank
    lobby. We approach the middle of the lobby and begin yelling loudly that
    we are robbing the bank. The woman to our left shrieks in terror. The man
    to our right drops to the ground and assumes the fetal position. The
    teenager in back looks up from her phone where she is text-messaging,
    shrugs, and looks back down. The security guard by the wall turns toward
    us and stares slack-jawed in disbelief and, after two seconds, reaches for the
    gun at his belt. The girl in the pink dress says, “Look at the man with the dark
    glasses, Mommy! His hat is stupid!”

      My guess is that you, my ever-gentle reader, skipped the last paragraph of our
little story above. Why? Because it was stale the third time around. You knew what
was going to happen. You had already seen how each person was going to react to
our declaration of impending robbery. If I asked you what the woman on the left
would do if we reentered the bank and once again made our threat, you would be
able to tell me with unflagging certainty that she was going to shriek in terror. After
all, we have been presented with the impression that she always shrieks in terror.
Every time. Without fail. Why would we think anything else?
422   Behavioral Mathematics for Game AI

           Not only is rigid predictability boring to us, the observer, but it is not even re-
      motely human. If we think of a situation in our own lives that occurs regularly, it is
      likely that we can also make a list of our actions or reactions to that situation at dif-
      ferent times. For the sake of continuity, let’s look again at the list of bank robbery
      reactions above. How many of them could we see ourselves possibly exhibiting?
      Some of them might be more likely than others, of course, but we would have at
      least a small chance of doing a great number of them. There also may be a few that
      we may never consider. (I can no longer see myself jumping over a desk, for example.)
           More importantly, if we found ourselves in a bank robbery situation again later
      (hopefully not the same day!), would we pick the exact same reaction? We still
      would have the same list of things we might do, but could we guarantee that any one
      of them is the one we would do every time? It is far more likely that we would react
      different ways each time we were faced with a bank robbery situation. If we experi-
      enced ten different robberies that were all similar in nature, how many different
      reactions would we exhibit? Two or three? Five? Would we have ten different reac-
      tions? (At this point in our saga, I am thinking that banking exclusively online
      would also start to look more attractive.)
          The quandary becomes apparent when we juxtapose having all those choices
      that we might select from against a formula or algorithm that tells us that this is “the
      best” reaction to have every time. Just as it was a false assumption to think that every
      person in the bank would react the same as every other person one time, it is likewise
      a tenuous assertion to say that one person would react the same way every time.


      Individual people make decisions based on a massive collection of information—
      only some of which is the current environment. The adage goes that “we are the sum
      of our life experiences.” Those experiences were, themselves, information at one
      time. Now, however, they have taken on a different role in our lives. What were
      events at the time have become memories. Experiences have shaped beliefs. Traumas
      have begotten fears. Pleasures have nurtured cravings and desires. Put into mathe-
      matical terms, those life experiences of the past are now the formulas, equations, and
      algorithms through which we pass the inputs of the current environment.
           The problem is, even if we were to analyze our own life in great detail, we
      would be hard-pressed to codify our own psychology accurately enough to
      construct the algorithms that would yield the “right way” for us to respond to those
      inputs. Even if we could do so, the construction of such a system would be prohib-
      itively labor-intensive. (Yes, this is a massive understatement.) Additionally, the
      computations involved would be processor-intensive. Oh yeah… don’t forget that
                                              Chapter 16 Variation in Choice     423

we would have to model the game world in such excruciatingly fine detail that we may
as well be creating the holodeck from Star Trek. To sum up, it ain’t gonna happen.
    Of course, constructing one algorithm to tell us what we “would do” is a lot like
the normative decision theory approach of one algorithm to tell us what we “should
do.” That’s what got us into this problem in the first place. To return to our bank
example, we would have to construct algorithms to replicate the life experiences of
20 typical bank customers and 5 typical bank employees. And when the cops show
up at the bank to arrest us, we would have to create three or four individual “life
experience models” of typical police officers. If constructing and using one such
model was prohibitive, we probably don’t have a word for what the prospect of
creating dozens of them would be like. (I nominate “insane.”)

Never Mind All That…
Remember, however, that we really don’t care about how a bank customer got to the
point in his life that would cause him to react in a particular way; we care about
the result of all of that information and experience-building. This is the same ratio-
nale that we used with the dentists dispensing their helpful advice. We don’t know
or care why four of the five dentists recommended sugarless gum or why the fifth
did not. We just know that they did. If we were going to create our Dental Advice
Simulator, we would not have to worry about the fact that one of the dentists’
spouses recently died of complications due to diabetes and another is taking kick-
backs from the Sugar Growers of the World Association. To simulate the dentists
we would simply codify that 80% of dentists we encounter recommend sugarless
gum and the remaining 20% do not. We can leave it up to the players to overlay
such creative interpretations if they really feel the driving need to do so.
    As we dealt with in Chapters 11 and 12, probability distribution and random
selection is something we can model rather well. How hard is it, after all, to recom-
mend sugarless gum 80% of the time? Using that simple procedure, our Dental
Advice Simulator could simulate a whole convention hall full of dentists! We have
substituted simple probability distributions for the complex (and largely unneces-
sary) process of generating the minutia of what goes into individualized human
decisions. The results are strikingly compelling, however. When we query our faux
dentists about their thoughts on sugarless gum, rather than acting in rigid unison,
80% of them (give or take a few, I’m sure) would raise their hands. A player of our
Dental Advice Simulator would see that, nod to himself, and say, “Wow… looks like
about four out of five to me. That’s a reasonably accurate simulation of dentists!”

A Framework for Randomness
It’s important to note that we did not make the decision completely random. We
did not flip a coin to decide. Doing so would have generated a 50/50 split on the
424   Behavioral Mathematics for Game AI

      Great Sugarless Debate. We used a random number process in conjunction with a
      carefully constructed probability distribution (i.e., four out of five).
          Yes, the dentist example was simple, but we can do far more if necessary. In fact,
      we explored some of this in Chapter 11 when we randomly created a population of
      guesses in the Guess Two-Thirds the Average Game that was similar to the results
      of the Danish study. Our approach in that exercise was to use random generation
      of guesses based on a combination of specifically tailored probability curves.
           Thinking further back to Chapter 6, recall the real lesson of the Guess Two-
      Thirds the Average Game. We were trying to find the sweet spot between the purely
      rational (yet very unlikely) answer of 0 and a completely random (and also unlikely)
      guess. That is, we were trying to get away from the rigid rationality of normative
      decision theory and move toward the more human-like descriptive decision theory.
      Our solution at the time was the carefully crafted and controlled application of
      structured randomness.
           The question that we still must answer, however, is how do we build that struc-
      ture around our randomness? In the Guess Two-Thirds the Average Game, we had
      an example from which to work. It was much like how I was able to paint my some-
      what realistic-looking pig because I was looking at a photo of a pig. From an artis-
      tic standpoint, I couldn’t draw a pig from scratch. However, by looking at one, I
      could copy the bits and pieces and replicate what was already there. We did this in
      Chapter 11 by reconstructing the guesser distribution one part at a time. We
      weren’t modeling why they were guessing the way they did, only that they did so
      because a survey told us, “This is how 19,000 Danes guessed.”
           Unfortunately, we don’t always have a handy survey to tell us how people di-
      vided themselves among the choices that range between “the best” and “the most
      ridiculous ever.” We need to have a more procedural way of outlining the frame-
      work that will support our otherwise stochastic process. Surprisingly, we are already
      closer to the solution than we might think.


      Let’s break our problem down into the features we need in our decision model and
      the features we don’t want.

             We want more variety than “the best” answer.
             We don’t want to include “the worst” answer.
             We do want to include “reasonable” answers.
             We don’t want to include “unreasonable” answers.
                                                          Chapter 16 Variation in Choice       425

             In the above criteria for what we do and don’t want, we were coming at the
         problem from both ends. We want more than just the best choice. We acknowl-
         edged that there are other choices that might be acceptable. On the other hand, we
         certainly don’t want the worst choice, and there are likely a lot of choices that aren’t
         the worst but are pretty darn bad as well. We want to avoid those. But how do we
         separate the good from the bad?

         If we list our choices ranked from optimum down to downright silly, someplace in
         the middle is a meeting point between the good ones and the bad ones. The prob-
         lem is that we don’t know where that point happens to be. This causes us to get a
         little arbitrary at first.
             In our Dudes example, we scored each combination of Dude and weapon. By
         selecting the best score (in this case, the lowest number), we chose which weapon
         to use and which Dude to target. However, that means that every time we encounter
         that exact arrangement of Dudes, we will do exactly the same thing. While this may
         not be startling in and of itself, what if there are two of us… or five… or a whole
         bank full of us all running the same algorithm. We would all respond exactly the
         same way. We would all pull out the same weapon and fire at the same Dude.
              Never mind that this collective response is a tactical nightmare; it just looks silly.
         It looks even more ludicrous when we translate it back out of our Dude scenario
         and into something like our bank example. Even if everyone was processing the
         same inputs, would everyone react exactly the same way? Not likely. Therefore,
         despite the fact that there may be a single best Dude to attack with a single best
         weapon, it probably does not behoove us to insist that we follow that model rigidly.

         Opening the Playbook
         Thankfully, one of the advantages of ranking the scores was that we learned more
         than simply which one was best. We also are aware that there were others that were
         decent choices as well… just not the best. To confirm this, let’s examine the results
         of our final Dude example from Chapter 14. (Note that in Chapter 14, we had lim-
         ited our list to only the best weapon for each Dude. Here, we include them all. We’ll
         see why in a moment.)

             Name                   Weapon              Score
             Evil Genius            R/L                 7.7
             Boss Man               R/L                 8.5
             Baddie 3               R/L                 9.7
             Evilmeister            R/L                 9.8
426   Behavioral Mathematics for Game AI

          Evil Genius          M/G                 10.8
          Baddie 3             Shotgun             11.2
          Baddie 3             M/G                 12.6
          Baddie 2             R/L                 13.1
          Evilmeister          M/G                 13.8
          Evil Knievel         M/G                 14.3
          Boss Man             M/G                 14.8
          Baddie 2             M/G                 16.3
          Baddie 1             R/L                 27.0
          Baddie 3             Pistol              28.7
          Baddie 1             M/G                 28.8
          Baddie 1             Shotgun             29.4
          Baddie 1             Pistol              38.6
          Evilmeister          Pistol              44.0
          Baddie 2             Pistol              54.2
          Boss Man             Pistol              100.0
          Evil Genius          Pistol              257.7
          Baddie 4             R/L                 634.7
          Baddie 4             M/G                 637.0
          Baddie 4             Pistol              654.5
          Evil Knievel         Pistol              1,339.7
          Evil Knievel         Shotgun             1,339.7
          Evil Knievel         R/L                 1,339.7
          Evil Genius          Shotgun             1,777.0
          Evilmeister          Shotgun             2,183.4
          Boss Man             Shotgun             2,421.3
          Baddie 2             Shotgun             2,428.8
          Baddie 4             Shotgun             2,531.1

          Remember that we had 8 enemies and 4 weapon choices for each, giving us 32
      possible combinations. We have ranked them here in order from the best choice to
      the worst. Recall that our eventual selection in this scenario was to kill Evil Genius
      with our rocket launcher. After all, our delightfully thorough algorithm told us
      that was the best answer.
                                                Chapter 16 Variation in Choice       427

Wheat and Chaff
Now that we have looked at all the possibilities, however, we can see that, while
“shoot Evil Genius with the rocket launcher” is, indeed, the lowest score, there
were plenty of other options that were very close. How much worse would it have
been for us to attack Boss Man with the rocket launcher instead? In the heat of
battle, would it have looked completely illogical for us to attack Boss Man instead
of Evil Genius? (After all, he’s Boss Man! And he’s got a rocket launcher too!) What
about Baddie 3 (who was 30 feet away with his potent shotgun) or Evilmeister
(only 60 feet away with his machine gun)? Those would certainly have been under-
standable and legitimate targets for us—and we see this reflected in the scores.
      On the other end of the spectrum, we can see that there were plenty of options
that would have looked just plain odd. With all these Dudes running around near
and far, why would we elect to attack the ones that are far away with a shotgun that
is all but useless at those distances? That explains the bottom five options in the list.
They are simply out of the question.

Drawing a Line
As we theorized, someplace in the middle of this list, the good meets the bad (by
blending into the mediocre). While we would have been comfortable selecting one
of the “reasonable” answers, we want to avoid the “unreasonable” ones. The trick
is deciding where that line is. Everything under 10? Under 50? Under 500?
     Pragmatically, we realize that we can’t even use the scores in this list as a guide.
In a different situation, we may not have any options that score under 10… or even
50. The scores will change all the time relative to whatever arbitrary line in the sand
we draw. However, one thing is certain: No matter what the scores are, we can always
arrange them relative to each other. After all, that’s how we determined the best
option before. “Attack Evil Genius with our rocket launcher” was the best relative to
the other scores. The solution, therefore, lies not in basing the threshold on an absolute
score but on the score relative to other scores.
    Instead of limiting ourselves to only the single best item, we can select from a
handful of the best scores. How many options we elect to choose from will be very
problem-specific. (Certainly, you are getting tired of hearing that by now.) We
could, for example, choose to select from the top 10 options. However, what if there
are fewer Dudes and fewer weapons available to us? We may not have 10 options
from which to select.
    Usually, it is a better idea to use the top n% of the choices rather than the top
n choices. In the case of the Dudes, for example, we could decide that our cut-off is
25% of the total possible choices. In this case, we would consider the top eight
items. If we had more options, that number would increase. As we removed Dudes
428   Behavioral Mathematics for Game AI

      from the battlefield and perhaps ran out of ammo for some of our weapons, the
      total number of options available to us would decrease, and the number that we
      would consider would contract with it.
          Now that we have decided that we will consider the top eight choices, we have
      truncated our list to:

          Name                    Weapon          Score
          Evil Genius             R/L             7.7
          Boss Man                R/L             8.5
          Baddie 3                R/L             9.7
          Evilmeister             R/L             9.8
          Evil Genius             M/G             10.8
          Baddie 3                Shotgun         11.2
          Baddie 3                M/G             12.6
          Baddie 2                R/L             13.1

          Those all look like fairly reasonable actions for us to take. Their scores are so
      similar, it almost doesn’t matter which one we choose. Therefore, a simple solution
      would be to pick one of them randomly.

      P UTTING I T   IN   C ODE

      We can do this very simply by changing the selection code in CAgent. The original
      SelectTarget() function sorted the vector and selected the target in the 0 index—
      the lowest-scored combination.
          void CAgent::SelectTarget()



              std::sort( mvTargets.begin(), mvTargets.end() );

              mpCurrentTarget = mvTargets[0].pDude;

              mpCurrentWeapon = mvTargets[0].pWeapon;

                                              Chapter 16 Variation in Choice    429

    By adding two lines, we can determine how many records we should consider
(25% of the total number) and generate a random vector index in that range. Rather
than returning the record at index 0, we are returning a random one between index
0 and (NumberToConsider – 1).
    void CAgent::SelectTarget()



         std::sort( mvTargets.begin(), mvTargets.end() );

         USHORT NumberToConsider = mvTargets.size() * 0.25;

         USHORT i = rand() % ( NumberToConsider – 1 );

         mpCurrentTarget = mvTargets[i].pDude;

         mpCurrentWeapon = mvTargets[i].pWeapon;


    Selecting randomly may seem counterintuitive and even mildly alarming. Why,
after all the work we went through in Chapter 14 to arrive at these scores, would we
seemingly cast it all aside and go back to random selection?
    The answer to this is in the lessons we learned all the way back in Chapter 5. In
the Ultimatum Game, we determined through similarly mathematical methods
that the best option for the Giver to bestow upon the Receiver was $1. That was the
most logical answer. Likewise, it was in the best interests of the Receiver to accept
that amount. (After all $1 > $0, right?) And yet, people placed in those two roles
rarely do either of those “best” actions.
     We explored this further in Chapter 6 with examples such as the Pirate Game,
the Guess Two-Thirds the Average Game, and even the Monty Hall Problem. We
humans like to talk about how rational we are, but, for a variety of reasons, we usu-
ally aren’t completely so. We may occasionally select the “best” answer but usually
only get close by selecting a reasonable answer.

The Payoff
That is what we accomplish by using a hybrid of our scoring system (pure rational-
ity) and random selection (inexplicable irrationality). By allowing our agent to
select from a number of reasonable-looking choices, we are effectively modeling all
430   Behavioral Mathematics for Game AI

      the little vagaries of humanness that we don’t care to model. Perhaps when our
      agent asked for his first rocket launcher at the age of eight, someone told him
      “You’ll put your eye out, kid,” and he’s been a little wary of them ever since.
      Perhaps he is particularly offended by the dark glasses and stupid hat that Baddie 3
      has donned for this encounter. It doesn’t really matter why they act differently; hu-
      mans simply do. Therefore, we don’t need to model the whys of the different deci-
      sions. We simply provide a mechanism that simulates those different (but still
      somewhat reasonable) decisions.
           Think back to Chapter 1 and the definition of a game provided by Sid Meier:
      “A game is a series of interesting choices.” We examined that notion by using ex-
      amples such as Rock-Paper-Scissors, Tic-Tac-Toe, Blackjack, and Poker. Rock-
      Paper-Scissors is predominantly random. The choices in Tic-Tac-Toe are almost
      entirely predictable. Blackjack has random components, but the dealer’s choices are
      entirely rule-bound; we don’t know what either of us will get next, but we know
      exactly what the dealer will do with it when he gets it. Only when we get to Poker
      do we see a challenge provided by the other player (i.e., the agent). The reason this
      is so compelling is that while we have an idea of what he might do, we can’t predict
      exactly what he will do. Responding to his interesting choices is what makes our
      choices that much more interesting.
           We will find that the result of changing our agent’s model to include more
      options is that he will display broader behavior than when he was restricted to only
      the one choice. In fact, if the agent were to encounter the same circumstances again,
      he may or may not select the same approach. If we are playing as one of the Dudes,
      this is actually a boon. Just as with the Poker player, the agent’s interesting (yet rea-
      sonable) variation of choices makes our choices that much more interesting as well.
           From a player’s standpoint, it also helps avoid situations like that uncomfort-
      able moment in the bank when everyone reacted in exactly the same way. In this
      case, if we were a Dude facing multiple AI agents, they would look just as peculiar
      all doing exactly the same thing. It seems that not only is variety the spice of life, but
      it keeps AI from looking like the Stepford Wives—completely identical in thought
      and deed… and horribly boring.

      We still have one problem with the approach we are using. We arbitrarily chose to
      randomly select from the top 25% of options. In the case of our primary example,
      that meant there were eight possibilities in play. Once we had split those eight
      options off, for all intents and purposes, we treated them equally. We didn’t bias the
      random selection whatsoever. Because the top eight options were similar, there
      was little difference between them anyway. We can see this in the black bars of
      Figure 16.1.
                                               Chapter 16 Variation in Choice       431

    While that approach worked for our sample data, imagine that our scores had
turned out more like what we see with the gray bars in Figure 16.1. In that example,
there is a large jump between the score of the first two and the rest of the options.
We can no longer consider these eight options as being equal. That hamstrings our
idea of selecting randomly from that list of eight.

       FIGURE 16.1 If there is a significant difference between the scores of our
                 top n selections, we should not treat them equally.

     One possible solution to this problem is to reduce our cutoff to two possibilities
instead of eight. However, that almost negates the purpose of expanding it in the
first place. Also, we have no way to predict where that cusp will be on any iteration
of this process. Looking at our original data (black bars), we could have included
the first 12 records before we saw a significant jump in the scores.
     To continue to provide some variety in our choices, we would like to continue
drawing from the top eight selections (or, more generally, the top 25% of the
possible selections). What we do not want to do, however, is to treat all eight pos-
sibilities as equal.
    Thankfully, our scoring system provided us with something other than an or-
dinal ranking of the possibilities. The very fact that the scores vary from one option
to the next reminds us that these differences are proportional. In layman’s speak,
some options are “more better” than the options that come after them. We need a
way to leverage these proportional differences.
432   Behavioral Mathematics for Game AI

          We already know that we would like for th