Production Utility
Shared by: alicejenny
Posted: 9/29/2012

• When multiple productions match, how do we choose the best one to fire?
– Conflict set: the productions that match
– Conflict resolution: the scheme for choosing the rule to fire
• In ACT-R: the production with the highest utility fires
Utility
• Each production has a utility u associated
with it
– Explicitly set or
– Learned from experience
• Utility = u + noise
– Noise controlled by :egs (expected gain noise), s
• Logistic distribution with mean 0 and variance
$\sigma^2 = \frac{\pi^2}{3} s^2$
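To make the noise term concrete, here is a minimal Python sketch (not ACT-R code) that draws logistic noise by inverting its CDF and compares the sample variance with π²s²/3. The function names, the seed, and the sample size are illustrative assumptions.

```python
import math
import random


def logistic_noise(s, rng):
    """Draw one sample from a logistic distribution with mean 0 and scale s."""
    p = rng.random()
    while p == 0.0:  # random() can return exactly 0.0; avoid log(0)
        p = rng.random()
    return s * math.log(p / (1.0 - p))  # inverse CDF of the logistic


def theoretical_variance(s):
    """Variance of the logistic noise: (pi^2 / 3) * s^2."""
    return (math.pi ** 2 / 3.0) * s ** 2


rng = random.Random(0)  # fixed seed (assumption) so the run is repeatable
s = 3.0                 # plays the role of :egs
samples = [logistic_noise(s, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"sample variance {var:.2f}, theoretical {theoretical_variance(s):.2f}")
```

The sample variance should land close to the theoretical value, with the gap shrinking as the number of samples grows.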
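Probability of firing
Given the productions j in the conflict set, each with utility U_j,
the probability of choosing production i is:
$P(i) = \frac{e^{U_i / \sqrt{2} s}}{\sum_j e^{U_j / \sqrt{2} s}}$
s = expected gain noise (:egs)
The summation is over all productions in the conflict set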
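The firing probability is a softmax over the utilities with temperature √2·s. A minimal Python sketch follows, assuming that form; the function name and the example utilities are hypothetical, and this is not part of ACT-R itself.

```python
import math


def firing_probabilities(utilities, s):
    """Return P(i) = exp(U_i / (sqrt(2) s)) / sum_j exp(U_j / (sqrt(2) s))."""
    temperature = math.sqrt(2.0) * s
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical productions; the higher-utility one fires more often.
probs = firing_probabilities([12.0, 10.0], s=1.0)
print(probs)

# Larger noise flattens the distribution toward equal probabilities.
print(firing_probabilities([12.0, 10.0], s=10.0))
```

Subtracting the maximum utility before exponentiating leaves the probabilities unchanged but avoids overflow for large utilities.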
Building-sticks problem
• Unlimited supply of building sticks
• Three lengths
• Build a target stick of a particular length
• 2 basic strategies
– Start small
– Start big and saw off
• Tendency to hill climb
Building Sticks
Bst-nolearn
• A = 15 B = 200 C = 41 Goal = 103
– Diff B and Goal = 97, diff C and Goal = 62 (the two differences are 35 apart)
– Only solution B-2C-A
• A = 10 B = 200 C = 29 Goal = 132
– Diff B and Goal = 68, diff C and Goal = 103
– Only solution B-2C-A
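As a quick arithmetic check on the two problems, the sketch below computes the overshoot difference (B minus goal), the undershoot difference (goal minus C), the gap between them, and whether B-2C-A reaches the goal. The helper name and dictionary keys are hypothetical.

```python
def bst_summary(a, b, c, goal):
    """Summarize one building-sticks problem with sticks a, b, c and a goal length."""
    over = b - goal    # how far stick B overshoots the goal
    under = goal - c   # how far stick C falls short of the goal
    return {
        "over": over,
        "under": under,
        "gap": abs(over - under),
        "b_minus_2c_minus_a_solves": b - 2 * c - a == goal,
    }


for problem in [(15, 200, 41, 103), (10, 200, 29, 132)]:
    print(problem, bst_summary(*problem))
```

In both problems the two differences are 35 apart, and B-2C-A lands exactly on the goal.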
Model
• Run model
• Look at productions and trace
• Encode-under
– Encodes difference between goal and stick-c
• Encode-over
– Encodes difference between goal and stick-b
• (spp decide-over decide-under force-over force-under)
Production parameters
> (spp force-over force-under decide-over decide-under)
Parameters for production FORCE-OVER:
:utility 8.720
:u 10.000
:at 0.050
Parameters for production FORCE-UNDER:
:utility 8.328
:u 10.000
:at 0.050
Parameters for production DECIDE-OVER:
:utility 16.871
:u 13.000
:at 0.050
Parameters for production DECIDE-UNDER:
:utility 6.597
:u 13.000
:at 0.050
Setting Utility
(spp decide-over :u 13)
(spp decide-under :u 13)
(spp force-over :u 10)
(spp force-under :u 10)
Calculate probabilities
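Difference of 35 between the two distances -> 1 decide and 2 force
productions match; :ut = -100 (utility threshold) means 0 chance that
no production fires; :egs = 3
Probability(each force production) = .248. Thus there is a .248 chance
that the model will try to solve in the wrong direction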
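The .248 figure can be reproduced by hand. The worked computation below assumes the conflict set holds one decide production with u = 13 and two force productions with u = 10, with s = :egs = 3, and uses the firing probability $P(i) = e^{U_i/\sqrt{2}s} / \sum_j e^{U_j/\sqrt{2}s}$. Values are rounded.

```latex
\begin{align*}
\sqrt{2}\,s &= \sqrt{2}\cdot 3 \approx 4.243 \\
e^{13/4.243} &\approx 21.42, \qquad e^{10/4.243} \approx 10.56 \\
P(\text{decide}) &= \frac{21.42}{21.42 + 2(10.56)} \approx 0.503 \\
P(\text{each force}) &= \frac{10.56}{21.42 + 2(10.56)} \approx 0.248
\end{align*}
```

The three probabilities sum to 1 (0.503 + 0.248 + 0.248, up to rounding).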
Utility Learning
• Difference Learning Equation
$U_i(n) = U_i(n-1) + \alpha [R_i(n) - U_i(n-1)]$
– α = learning rate (0.2)
• R_i(n) = reward that rule i receives for its nth
application
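A minimal Python sketch of the difference learning equation, assuming a constant reward of 20 and a starting utility of 10 purely for illustration (the function name is hypothetical):

```python
def update_utility(u_prev, reward, alpha=0.2):
    """One step of U(n) = U(n-1) + alpha * (R(n) - U(n-1))."""
    return u_prev + alpha * (reward - u_prev)


u = 10.0          # starting utility (illustrative)
history = []
for _ in range(5):
    u = update_utility(u, reward=20.0)  # same reward on every application
    history.append(u)
print(history)
```

Each application moves the utility a fraction α of the way toward the reward received, so under a constant reward the utility approaches that reward without overshooting it.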
Reward
• Asynchronous
• R_i(n) = reward value - time from selection of rule i to
the reward
• Applies to all productions that fired between the previous
reward and the current one
• (trigger-reward val), where val = number or nil
• spp
(spp read-done :reward 20)
(spp pick-another-strategy :reward 0)
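The sketch below illustrates how one reward might propagate back to every production fired since the previous reward, each receiving the reward value minus the time elapsed since its own selection. It is not ACT-R code: the function name, the selection times (assumed to be in seconds), and the starting utility of read-done are hypothetical.

```python
def apply_reward(utilities, fired, reward_value, reward_time, alpha=0.2):
    """Update utilities for productions fired since the previous reward.

    fired is a list of (production_name, selection_time) pairs.
    """
    updated = dict(utilities)  # leave the caller's dictionary untouched
    for name, selected_at in fired:
        # Effective reward shrinks with the delay between selection and reward.
        effective = reward_value - (reward_time - selected_at)
        updated[name] = updated[name] + alpha * (effective - updated[name])
    return updated


utilities = {"decide-over": 13.0, "read-done": 0.0}   # read-done value is assumed
fired = [("decide-over", 1.0), ("read-done", 5.0)]    # hypothetical selection times
updated = apply_reward(utilities, fired, reward_value=20.0, reward_time=5.0)
print(updated)
```

Productions selected long before the reward receive a smaller effective reward, which is how earlier choices get less credit than later ones.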
Learning in BST
a    b    c    Goal  %OVERSHOOT
15   250  55   125   20
10   155  22   101   67
14   200  37   112   20
22   200  32   114   47
10   243  37   159   87
22   175  40   73    20
15   250  49   137   80
10   179  32   105   93
20   213  42   104   83
14   237  51   116   13
12   149  30   72    29
14   237  51   121   27
22   200  32   114   80
14   200  37   112   73
15   250  55   125   53
• The majority look like undershoot problems, but the majority
can only be solved by overshoot
• 1st and last problems
– Can be solved by either strategy
– 20% overshoot for the 1st, 53% overshoot for the last
BST-Learn
• (sgp :ul t)
• Initial:
– force-over :u 10.0
– force-under :u 10.0
– decide-over :u 13.0
– decide-under :u 13.0
> (collect-data 100)
CORRELATION: 0.775
MEAN DEVIATION: 18.506
Trial  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      29 51 61 67 89 42 79 79 57 44 28 35 62 75 51
DECIDE-OVER : 13.1984
DECIDE-UNDER: 11.3041
FORCE-OVER : 12.0823
FORCE-UNDER : 6.5455
Set/show production parameters (spp)
• Name
• :at – action time of production
– defaults to :dat
• :u
– Current U value
– Defaults to :iu
• :utility
– Utility last computed during conflict resolution
– Can’t be set
• :reward
– Reward triggered by production firing