Your Federal Quarterly Tax Payments are due April 15th Get Help Now >>

Genetic Algorithms by chenmeixiu


									    Class Project
   Due at end of finals week
   Essentially anything you want, so long
    as it’s AI related and I approve
   Any programming language you want
   In pairs or individual
   Email me by Wednesday, November 3
   Implementing Knn to Classify Bedform Stability Fields
   Blackjack Using Genetic Algorithms
   Computer game players:Go, Checkers, Connect Four,
    Chess, Poker
   Computer puzzle solvers: Minesweeper, mazes
   Pac-Man with intelligent monsters
   Genetic algorithms:
       blackjack strategy
   Automated 20-questions player
   Paper on planning
   Neural network spam filter
   Learning neural networks via GAs
   Solving neural networks via backprop
   Code decryptor using Gas
   Box pushing agent (competing against an
    What didn’t work as well
   Too complicated games: Risk, Yahtzee, Chess, Scrabble,
    Battle Simulation
        Got too focused in making game work
        I sometimes had trouble running the game
        Game was often incomplete
        Didn’t have time to do enough AI
   Problems that were too vague
        Simulated ant colonies / genetic algorithms
        Bugs swarming for heat (emergent intelligence never happened)
        Finding paths through snow
   AdaBoost on protein folding data
        Couldn’t get boosting working right, needed more time on
         small datasets (spent lots of time parsing protein data)
    Reinforcement Learning
   Game playing: So far, we have told the
    agent the value of a given board
   How can agent learn which positions
    are important?
       Play whole bunch of games, and receive
        reward at end (+ or -)
       How to determine utility of states that
        aren’t ending states?
    The setup: Possible game
   Terminal states have reward
   Mission: Estimate utility of all possible game
    What is a state?
   For chess: state is a combination of
    position on board and location of
       Half of your transitions are controlled by
        you (your moves)
       Other half of your transitions are
        probabilistic (depend on opponent)
   For now, we assume all moves are
    probabilistic (probabilities unknown)
Passive Learning
   Agent learns by “watching”
   Fixed probability of moving from one state
    to another
Sample Results
Technique #1: Naive Updating
   Also known as Least Mean Squares (LMS)
   Starting at home, obtain sequence of
    states to terminal state
   Utility of terminal state = reward
   loop back over all other states
        utility for state i = running average of all
         rewards seen for state i
    Naive Updating Analysis
   Works, but converges slowly
       Must play lots of games
   Ignores that utility of a state should
    depend on successor
    Technique #2: Adaptive
    Dynamic Programming
   Utility of a state depends entirely on the
    successor state
       If a state has one successor, utility should
        be the same
       If a state has multiple successors, utility
        should be expected value of successors
    Finding the utilities
   To find all utilities, just solve equations

   Set of linear equations, solveable
   Changes each iteration as you learn
   Completely intractable for large problems:
       For a real game, it means finding actual utilities of
        all states
    Technique 3: Temporal
    Difference Learning
   Want utility to depend on successors, but
    want to solve iteratively
   Whenever you observe a transition from i to

   a = learning rate
   difference between successive states =
    temporal difference
   Converges faster than Naive updating
    Active Learning
   Probability of going from one state to
    another now depends on action
   ADP equations are now:
    Active Learning
   Active Learning with Temporal Difference
    Learning: works the same way (assuming
    you know where you’re going)

   Also need to learn probabilities to
    eventually make decision on where to go
    Exploration: where should
    agent go to learn utilities?
   Suppose you’re trying to learn optimal
    game playing strategies
       Do you follow best utility, in order to win?
       Do you move around at random, hoping to
        learn more (and losing lots in the process)?
   Following best utility all the time can
    get you stuck at an imperfect solution
   Following random moves can lose a lot
    Where should agent go to
    learn utilities?
   f(u,n) = exploration function
       depends on utility of move (u), and number of
        times that agent has tried it (n)
   One possibility: instead of using utility to
    decide where to go, use

   Try a move a bunch of times, then eventually
   Alternative approach for temporal difference
   No need to learn probabilities: considered
    more desirable sometimes
   Instead, looking for “quality” of (state, action)
    Generalization in
    Reinforcement Learning
   Maintaining utilities for all seen states in a real game
    is intractable.
   Instead, treat it as a supervised learning problem
   Training set consists of (state, utility) pairs
       Or, alternatively, (state, action, q-value) triples
   Learn to predict utility from state
   This is a regression problem, not a classification
       Radial basis function neural networks (hidden nodes are
        Gaussians instead of sigmoids)
       Support vector machines for regression
       Etc…
    Other applications
   Applies to any situation where
    something is to learn from
   Possible examples:
       Toy robot dogs
       Petz
       That darn paperclip
       “The only winning move is not to play”

To top