                 PUMA: Programmable User Modelling Applications

        Towards a Knowledge-Level
         Specification for Cardbox

                          Richard M Young
                                 March 1997


             Working paper only – please do not cite

Principal Investigator       Dr Ann Blandford           Middlesex University
Research Fellow              Richard Butterworth        Middlesex University
Academic Collaborator        Dr David Duke              University of York
Research Fellow              Jason Good                 Middlesex University
Industrial Collaborator      Sue Milner                 Praxis Critical Systems Ltd
Academic Collaborator        Dr Richard Young           APU Cambridge

 Contact :           Dr Ann Blandford
             School of Computing Science
             Middlesex University
             Bounds Green Road
             London N11 2NQ, UK
             tel :   +44 (0)181 362 6163
             fax :   +44 (0)181 362 6411
             email :

             Project funded by EPSRC grant number GR/L00391
      Towards a Knowledge-Level Specification for Cardbox
                                        (PUMA WP3)
                                       Richard M Young
                             MRC Applied Psychology Unit, Cambridge

                                           March 1997


       This document is a Working Paper addressed to my fellow PUMA colleagues. The
       purpose of the paper is to build on paper WP2 and a flurry of recent discussion, to
       try to sketch out how one could construct a formal description of Cardbox and its
       user at the knowledge level.

1. Preliminaries

This document is a Working Paper addressed to my fellow PUMA colleagues. It therefore does
not have the “feel” of a paper intended for publication, even though — if it survives scrutiny
within the project — we would presumably hope to see something derived from it in print.
Nonetheless, I have tried to make the writing as clear as possible, because clarity of
communication is (at least!) as important within the project as it is to outsiders. The purpose of
the paper is to build on paper WP2 and a flurry of recent discussion, to try to sketch out how one
could construct a formal description of Cardbox and its user at the knowledge level.
In case it wasn’t already clear, it will become very apparent that I’m at best an amateur at this
formal specification game. I therefore do not intend the specification as written to be taken too
seriously, and I ask indulgence of those who know better. Instead I’ll grab bits of notation as
seems appropriate, and make some up as we go along, always with the overriding aim of clarity. I
should make it clear that I don’t consider this work to be a serious contribution to the important
quest within the project for an appropriate logic and notation. On the other hand, it is offered as
a contribution to the equally important question of what needs to be captured in a formal
specification.
Normally I dislike footnotes, but it seemed appropriate in this document to use them heavily for
technical and notational matters. That way I hope to keep the flow of the argument reasonably
clear, while still having a chance to note and discuss my technical uncertainties.
The paper is organised as follows. After dealing with the device and the task (§2) and a few
general issues (§3), I attempt a series of four technical exercises (§4-7). The exercises are
increasingly difficult, placing increasing demands on the formal description (and in fact I don’t
“complete” the fourth exercise, indeed we barely scratch the surface). The representation,
primarily of the user’s knowledge and cognitive architecture, is developed cumulatively as needed
for each exercise. Section 8 offers some conclusions and reflections on the whole enterprise.
Finally, I’ll write ‘U’ for ‘user’ occasionally, and employ feminine pronouns (because it’s got to
the point where I now feel guilty if I write “he”).
WP3        Knowledge Level Specification for Cardbox                     Young                      2

2. Device and Task

2.1 Cardbox

PUMA readers will probably already be familiar with Cardbox, but for the sake of completeness
here is a description lifted from another document:
   Figure 1 shows a simple program called Cardbox that runs on a personal computer, for
   storing and retrieving information in a database of people’s addresses and other personal
   information. Each person in the database is represented by a “card”, which has the person’s
   name written on the top edge, with their address and other contact information on the body of
   the card. Cards are stored and displayed in alphabetical order of names. Cards appear in a
   window on the screen, stacked in such a way that the body of only the front card is visible, but
   the top edges of all the cards in the window, bearing the names, can be read. The number of
   cards displayed depends on the size of the window, but is typically around ten or so. The
   database might typically hold a few hundred cards.

        [Figure: a window showing a stack of cards; the top edges bear names such as
        SMITH, A. and SMITH, P., and the front card shows an address (15 Peter Pan Way)
        and telephone number (tel: 01234 567-890).]

                                           Figure 1. Cardbox.

   Cardbox offers an impressive variety of means for navigating through the database in order to
   find a desired card. First, clicking on the visible, top edge of any card in the window brings it
   to the front of the stack. Since this is the only position in which the information on the card
   can be read, this action is normally the final step in accessing a target card, once it has been
   brought onto the screen. This is also the only unidirectional method of navigation. It can
   move only forward through the database, whereas all other methods can move either
   backwards or forwards. There is also a scrollbar which supports a range of manipulations.
   Clicking on either end arrow causes the stack to move by one card through the database in the
   corresponding direction. (We will call this a step.) Clicking in the scrollbar on either side of
   the “elevator” causes the stack to move by a whole windowful in either direction — what we
   will call a jump. Both stepping and jumping have their continuous counterparts. If the mouse
   is held down on an end arrow, then the display steps continuously through the cards in the
   specified direction. Similarly, if the mouse is held down to the left or right of the elevator, the
   display jumps continuously. In addition to all these, the elevator can be dragged to a
   different position, which changes the stack to the corresponding portion of the database.
   (Furthermore, the real device allows the user to type in the first few letters of the name, which
   is such a simple and convenient way to find a card that the navigation facilities just described
    are hardly ever used — a fact we will simply ignore for the purposes of this analysis.) Our
    analysis here will, for the sake of sanity, deal with just a few of the possible navigational
    actions.
In fact we will consider five actions, called StepF, StepB (F for forward, B for backward), JumpF,
JumpB, and Click. We will further assume a variant of Cardbox in which the list of cards is
“circular”, so that the alphabetically first one also follows the last one (‘Aasman’ follows
‘Zygarnik’). And if you don’t like the idea of using a linear scrollbar with a circular list, then
imagine instead that there are four buttons labelled StepF, StepB, JumpF, and JumpB. For this
analysis it doesn’t matter.
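To make the circular variant concrete, here is a minimal Python sketch. The parameter values and function names are my own, not part of any specification in this paper; the point is only that every position is taken modulo the number of cards, so the alphabetically first card follows the last one.

```python
# Illustrative parameters, not part of the specification
N = 8      # number of cards in the box
SHOW = 3   # number of cards shown in the window

# All positions are taken modulo N, which is what makes the list circular
def step_f(top): return (top + 1) % N     # StepF: forward one card
def step_b(top): return (top - 1) % N     # StepB: back one card
def jump_f(top): return (top + SHOW) % N  # JumpF: forward one windowful
def jump_b(top): return (top - SHOW) % N  # JumpB: back one windowful
```

Stepping forward from the last card wraps to the first, so `step_f(N - 1)` gives `0` ('Aasman' follows 'Zygarnik').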

2.2 The task

The canonical task is to retrieve an item of information from a named card, e.g. to find
Murgatroyd’s telephone number. More precisely, the goal is to get Murgatroyd’s address known
to the user, which as we shall see, involves making it visible, which in turn requires bringing
Murgatroyd’s card to the front of the screen. For some of the exercises, again as we shall see, we
ignore the main goal and regard the task simply as that of getting the card to the front of the
stack.
One of the intriguing things about the Cardbox domain is that, although the device is very simple,
the user’s relevant knowledge can be quite rich. Furthermore, there are interesting variations to
play. For instance, the cards could be in alphabetical order or random order: what difference
does it make to the user’s behaviour? Similarly, there are shortcuts that the user may or may not
know about or discover. For example, she may Click on the back card rather than use JumpF. Or
she might compare the target name with that on the back card rather than the front card. It’s of
interest to enquire into how she might come to know these tricks. Similarly (although it’s outside
the scope of this analysis) one can ask about what happens if the user accidentally “overshoots”
on a repeated Jump, so that a target that was ahead of her is now behind.

2.3 Formal specification of the device

The device is a good place to start with the formal description, because it is so delightfully simple,
perhaps because Cardbox is one of those applications which is “all interface”.

The primitive object classes are as follows1 :
    card:              ; the cards in the box                                                                   [D1]
    name:              ; the name on a card                                                                     [D2]
    address:           ; the address and other information on a card                                            [D3]
    action:            ; actions on the device                                                                  [D4]

The obvious relations name-of and address-of between cards and their associated information:
    name-of:           card ↔ name2                                                                             [D5]
    address-of:        card → address                                                                           [D6]
It will be useful also to label the inverse of name-of:
    card-of:           name → card                                                                              [D7]

1 These kinds of declarations don’t usually seem to be included in formal specifications, though I think they’re
useful for clarity and explicitness, and we do include them in IL.
2 By writing a double-headed arrow, I mean to state that the mapping is 1:1 and there is therefore an inverse
function. How should I write that properly?

    card-of ≡ (name-of)⁻¹                                                                                        [D8]

We need a couple of parameters for the cardbox:
    N:                  integer                ; the number of cards in the box                               [D9]
    Show:               integer                ; the number of cards shown on the display                    [D10]

State information is given by:
    topcard:            card                   ; the card at the front of the stack                          [D11]
    shown:              card → {T, F} 3        ; whether a card is on the screen                             [D12]

The cards have a circular ordering, which I do not bother to axiomatise in full4 . But we do
define the adjacency relations next and prev:
    next, prev:         card → card                                                                          [D13]

In order to be able to describe the effects of the actions without getting tangled up in recursive
embeddings of next(next(…)) etc., it is convenient to regard cards as having a sequence number.
We could define mappings back and forth between cards and their sequence numbers, but for
simplicity we will refer to cards within the device only directly by their sequence number. Thus
we have
    next(c) ≡ c + 1                                                                                          [D14]
    prev(c) ≡ c – 1                                                                                          [D15]
where all arithmetic is performed modulo N, the number of cards.
Device invariants arise because a given topcard determines also which cards are shown. So we have
    shown(c) ≡ (c – topcard) < Show            ; again with arithmetic modulo N.                             [D16]

In this domain, because the operators correspond 1:1 with actions, there is no need to distinguish
between them or name them separately. The actions and their effects on the device are
    StepF, StepB, JumpF, JumpB, Click:                   action                                              [D17]
    topcard = t ⇒ [StepF] topcard = t + 1                                                                    [D18]
    topcard = t ⇒ [StepB] topcard = t – 1                                                                    [D19]
    topcard = t ⇒ [JumpF] topcard = t + Show                                                                 [D20]
    topcard = t ⇒ [JumpB] topcard = t – Show                                                                 [D21]
    shown(c) ⇒ [Click(c)] topcard = c 5                                                                      [D22]
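The device axioms are simple enough to animate directly. The Python sketch below is my own framing, not part of the specification: it represents the device state by the topcard's sequence number, implements each action's effect with arithmetic modulo N as in [D14]/[D15], and checks the physical precondition on Click that the card be on screen [D22]. A jump moves the display by a whole windowful, i.e. Show cards.

```python
N, SHOW = 100, 10   # box size [D9] and window size [D10]; values illustrative

def shown(c, topcard):
    """[D16]: card c is on screen iff it lies within SHOW cards of the topcard."""
    return (c - topcard) % N < SHOW

def apply_action(action, topcard, c=None):
    """Effects of the five actions on the topcard; all arithmetic modulo N."""
    if action == "StepF":
        return (topcard + 1) % N        # forward one card
    if action == "StepB":
        return (topcard - 1) % N        # back one card
    if action == "JumpF":
        return (topcard + SHOW) % N     # forward a whole windowful
    if action == "JumpB":
        return (topcard - SHOW) % N     # back a whole windowful
    if action == "Click":
        # physical precondition: the card must be on screen to be clicked on
        assert shown(c, topcard)
        return c                        # the clicked card comes to the front
    raise ValueError(action)
```

For example, starting from topcard 0, JumpF gives topcard 10; card 12 is then shown, and clicking on it brings it to the front.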

The cards are assumed to be in alphabetical order. Again, I’m going to duck out of specifying
this ordering6 , which plays a crucial but fortunately only informal role in the analysis.
Finally, and getting closer to the user, there’s the issue of visibility. The essential facts are that
‘topcard’ and ‘shown’ are visible on the display. All shown cards have their name visible, and

3 I’m not sure how one is supposed to write a simple predicate.

4 Indeed, I’m not confident that I’d know how to define a circular ordering. It looks rather trickier than the
standard full linear ordering.
5 What’s the proper way to write the precondition, that the card must be on the screen in order to be clicked on?
Is it in fact a precondition? If so, notice that it’s a physical precondition.
6 The approach would presumably be (a) to define a circular ordering on the names, similar to that for the cards;
and (b) connect the two orderings together by asserting something like next(x) = y ⇔ next(name-of(x)) =
name-of(y).

the topcard also has its address visible. We can write this with what looks to me suspiciously like a
modal operator:
    visible(topcard)                        [D23]
    shown(c) ⇒ visible(shown(c))                                                                    [D24]
    shown(c) ⇒ visible(name-of(c))                                                                  [D25]
    visible(address-of(topcard))                                                                    [D26]
Axioms [D24] and [D25] need care in how they are interpreted and applied, especially
concerning the card designator ‘c’. The operator visible(X) means something like “information
about X is available to the user through visual means”. It therefore cannot convey information
that is not visually available. So, the designator ‘c’ is best restricted to identifiers that are visually
communicable, such as the deictic “that card there on the screen” (see §3.5). For example,
suppose that some card #183 is on the screen, and suppose that it happens to be the card that
Lucy added to Cardbox last Thursday. Then it is true that shown(card-added-by-Lucy-last-
Thursday), but we certainly would not want to assert visible(shown(card-added-by-Lucy-last-
Thursday)), since information about the history of the card is not visually explicit. This point is
taken up again in §4.2.

3. General Issues concerning Knowledge and Cognition

The bulk of the development of a formal description of the user is done in the context of the four
exercises undertaken next. Before we start down that path, there are a few general issues to be
taken care of, and a short by-way to explore.

3.1 Knowledge-level analysis

One of the purposes of this paper is to explore the feasibility of writing a formal description of
the user and her cognitive architecture at the knowledge level. This is not the place to try to pin
down precisely what is meant by working at the knowledge level, but an essential aspect is to make
it possible for us (as analysts) to be able to reason about the user’s behaviour in terms of what she
knows. [This contrasts, for example, with the analysis presented in WP2 which, although it
(necessarily) includes some reference to U’s knowledge, reasons about U primarily in terms of
the mechanics of her cognition, in other words at the symbol level.]
The name of the game is thus to specify enough to be able to carry through the necessary
derivations, while saying as little as possible about the mechanism of the cognitive architecture.
We also, of course, want to minimise the amount of knowledge attributed to the user, since for us
an important reason for working at the knowledge level is precisely to perform a “knowledge
analysis”, i.e. to say what knowledge is needed by the user in order to perform the task.
Treading a path between these various desiderata and constraints is not always easy.

3.2 Policy

While we’re at this level of generality, this is a good place to introduce the notion of policy and to
describe what I’ll call the “standard policy”.7
From a cognitive point of view, one of the dominant characteristics of Cardbox is that it presents
the user with choices: at any time, of the several different navigational actions that are available,
she has to choose which one to take. That observation raises the possibility of describing U’s
behaviour in terms of regularities in the choices she makes. By a policy, we mean simply a set of

7 Yes, I know I sound like an insurance salesman.

decision rules for choosing an action. For the sake of concreteness, here is the policy — the
“standard policy” — that we’ll use in this paper:
    A. If the target is on the screen, click on it.                                                            [P1]
    B. If the target is off screen and ahead, then jump forward.                                               [P2]
    C. If the target is off screen and behind, then jump backward.                                             [P3]
(I’ve numbered these rules for ease of reference, but they are not intended as part of the formal
specification.) The notions of ahead and behind will be expanded on later, but should be
sufficiently clear for present purposes.
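Read operationally, the standard policy is just a three-way decision function. The Python sketch below is my own framing, not part of the paper's notation; in particular it reduces "ahead" and "behind" to whichever direction around the circular list is shorter, which is one possible reading of those notions.

```python
def standard_policy(target, topcard, n=100, show=10):
    """[P1]-[P3]: click if the target is on screen, else jump toward it."""
    if (target - topcard) % n < show:        # target is shown => click on it  [P1]
        return "Click"
    # Off screen: compare the modular distances in the two directions
    ahead = (target - topcard) % n           # cards to travel going forward
    behind = (topcard - target) % n          # cards to travel going backward
    return "JumpF" if ahead <= behind else "JumpB"   # [P2] / [P3]
```

With n = 100 and show = 10, a target at position 5 seen from topcard 0 is on screen (Click), a target at 40 is ahead (JumpF), and a target at 90 is closer going backwards (JumpB).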
A policy is not intended as a cognitive construct. In other words, we are not asserting that users
have policies and make their decisions by consulting them. On the contrary, the point of our
working with policies is that it abstracts away from the details of the cognitive processes involved
in the decision making, and instead focusses on the outcome, i.e. on the action chosen. It is thus
compatible with a knowledge-level analysis. Notice, though, that to be consistent with the spirit of
the enterprise, the conditions must be things that the user knows — after all, it’s the user who has
to choose the action. Then we, as analysts, can assert that if the user knows that the conditions of
a policy rule are satisfied, she will end up choosing the action identified in the rule.

3.3 Knowledge and visibility

To get started on the knowledge analysis, let’s look at how we say that U knows things that
Cardbox makes visible on the screen. We’ll employ two modal operators, K and Ke. K means
“the user knows that”. So to assert that U knows that X, we simply write K(X). So far as
possible, I want to avoid being drawn into details of the formal semantics of K. We’ll simply
postulate some axioms as we need them — which will be very little — and I’ll follow the only
approach I have any familiarity with, which is that given in Ernest Davis’s Representations of
Commonsense Knowledge (Morgan Kaufmann, 1990).
Because of the interactive nature of the situation, we will frequently want to assert that U knows
the value of certain variables or expressions. For this we will use the operator Ke. Suppose that E
is an expression with value V. Then Ke(E) means that U knows that the value of E is V. For
example, we will assume later that Ke(name-of(target-card)), i.e. that U knows the name of the
target card. If the target name is in fact Murgatroyd, then she knows that the name of the target
card is Murgatroyd.
The two operators are of course connected. Suppose that Ke(E) for some expression E with value
V, then the user knows that E = V. We can write
    Ke(E) ∧ E =: V ⇒ K(E = V)8                                                                                 [K1]
where the symbol ‘=:’ means ‘has value’, and is more restrictive than ordinary equality.
So, then, how does U know things from the screen? Simply, we assert that anything visible is
known to the user:
    visible(X) ⇒ Ke(X)                        ; the principle of visibility                                    [K2]
For example, one of the cards on the screen is the topcard, and so from [D25] we have
visible(name-of(topcard)). Axiom [K2] tells us Ke(name-of(topcard)), the user knows the name
of the topcard. If, as in Figure 1, the name is in fact Smee, then [K1] gives us
K(name-of(topcard) = Smee), the user knows that the name of the topcard is Smee.9

8 Notice that if E is a predicate P, then we get K(P = true), which is normally written just K(P).

9 We might want to go further and add “visible(X) ⇒ K(visible(X))”, if something is visible on the screen
then the user knows that it is — after all, she can see it. Now, this is not the place for a treatise on the formal

3.4 Knowledge of task

Let’s proceed with spelling out a couple more things the user knows. She obviously knows
something about the task. Objectively, i.e. from the analyst’s viewpoint, the task specifies a target
name, and there’s supposed to be a card that bears that name:
    target-name:    name                                                                                       [T1]
    target-card:    card                                                                                       [T2]
    name-of(target-card) = target-name                                                                         [T3]

Just how much of that does the user know? She certainly knows the target name, in the concrete
sense that if the target name is, say, Murgatroyd, then she knows that it is. It’s much harder to say
that she knows the target card, if only because we’re being pretty vague about what a ‘card’ is for
the user (see §3.5). But she does know that she’s trying to find a card, and that it’s the one with
the target name, so let’s try this and see how it works out:
    Ke(target-name)                                    ; U knows the target name                               [K3]
    K(name-of(target-card) = target-name)              ; and that it’s the name of the target card             [K4]

3.5 Basic classes and relations

It's clear that U knows about the basic concepts and relations pertaining to the device, as expressed
in [D1] to [D13]. For example, she knows there are such things as cards and names, and that each
card has a name. What is much less clear is how to say it. There are considerable difficulties. For
one thing, if we simply wrap a Ke() around the “declarative” stuff about concepts and mappings,
it’s unclear what that would mean, although I suppose we could define the meaning as something
like U knows about the concept X and can use it in her reasoning. There is a further difficulty
with variables and parameters, which is that we’ve already gone to some trouble (e.g. in §3.3, and
see also §3.6) to explain that Ke(variable) means that U knows that the variable has its current
value. So on that reading Ke(N) would mean that U knows the total number of cards, which she
probably doesn’t and is certainly not what we intend to say. On the other hand, it may be that we
do need to say something, because otherwise it’s not clear that we can make use of those terms
within a modal K() context. (Although, come to think of it, I don’t remember Davis saying
anything specifically about this.)
Rather than make any serious attempt to address the problem, I’ll simply assert:
    K([D1] … [D13])                           [K5]
and gloss it as
         The user is familiar with the basic classes and relations of the device.

representation of knowledge (and if it were, I’m not the person to write it), but it’s worth pointing out some of
the subtleties involved. From the earlier axioms it’s easy to derive visible(topcard), from which would follow
K(visible(topcard)), the user knows that she can see the topcard. But just what does that mean? There are at
least three possible readings of the underlined phrase. (1) Suppose that the current topcard is #127 with name
Mapleby. Then one meaning is that she can see that card (i.e. the one with name Mapleby) and knows that she
can. Notice that if we were to cut a slot in a cardboard sheet and hold it in front of the screen so that only that
one card were visible, this interpretion would continue to hold true: she would see the card and know that she
could see it, even though she wouldn’t know that it was the topcard. (2) A second reading is that she can see
the card (and knows that she can) and also knows that it’s the topcard. On this interpretation, the statement
would cease to hold if the cardboard mask were interposed. (3) A third reading of K(visible(topcard)) is that it
describes something about U’s knowledge of the device: namely, that she knows the device is such that the
topcard is always visible. Notice that this reading makes no reference to card #127 and is not affected by
cardboard masks. These distinctions matter, because the different readings sanction different inferences. The
potential virtue of a modal logic representation for knowledge is that, applied with care, it can capture these
different meanings and support the correct conclusions.

and leave it at that until trouble strikes.
Unlike the device, the user does not have access to a uniform representation for cards.
Conceptually, I find this bothersome, but it doesn’t seem to cause problems in practice. That’s
mainly because, for most of the cards we might want to refer to, U does indeed have a way of
representing them. For cards which have a known name, the relation card-of() provides a
descriptive handle. Cards on the screen also have a deictic representation, of “that card there”.
This is clearly a shallow analysis, but is probably sufficient.

3.6 Ke at the symbol level

This section is a detour from the main line of the paper, which is concerned with knowledge-level
analysis. But I thought it might be helpful if I showed how, in a symbol-level analysis, we can
spell out the meaning of Ke in a concrete, first-order representation. Take as example Ke(target-
name), the user knows the target name.
Imagine that inside the user’s head there’s a blackboard with two columns, listing the expressions
that are known about and their known values. Thus the blackboard would have a row containing
the entries ‘target-name’ and ‘Murgatroyd’. If we call the blackboard Ke-beliefs, and represent
each row as a pair, then we get something like
    Ke(target-name) ∧ target-name = Murgatroyd ⇒ <target-name, Murgatroyd> ∈ Ke-beliefs.
Indeed in general, and bearing in mind that translating what’s written in one notation into another
is a meta-transformation that can’t properly be captured in an ordinary rule, we have something
like
    Ke(E) ∧ E =: V ⇒ <E, V> ∈ Ke-beliefs.

Notice that this is very similar to the approach taken in WP2, of representing information as a set
of “beliefs” on a state. A similar technique could presumably be adopted for more complicated
things the user knows, such as inference rules or in general anything containing variables, but we
would there be going beyond “ordinary logic”.
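The blackboard picture translates almost directly into code. The minimal Python sketch below is my own framing (not WP2's notation): Ke-beliefs is a set of ⟨expression, value⟩ pairs, and Ke(E) holds just when some row of the blackboard mentions E.

```python
ke_beliefs = set()   # the 'blackboard': rows are <expression, value> pairs

def assert_ke(expression, value):
    """Ke(E) with E =: V puts the row <E, V> on the blackboard."""
    ke_beliefs.add((expression, value))

def knows_value_of(expression):
    """Ke(E) holds iff the blackboard has a row for E."""
    return any(e == expression for (e, v) in ke_beliefs)

# The worked example from the text: U knows the target name
assert_ke("target-name", "Murgatroyd")
```

On this representation Ke("target-name") holds while, say, Ke("N") does not, matching the intended reading that U knows the target name but not the total number of cards.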

4. Exercise 1

4.1 The four exercises

We haven’t listed the exercises yet, so here they are:
  1. Simulate a policy (say, the standard policy) to show that it achieves the task.
  2. Without assuming specific cardbox contents (which therefore precludes simulation), show
     that a policy (say, the standard policy) achieves the task.
  3. Show that even without an explicit policy, a rational user with specified knowledge can
     perform the task, by making local decisions.
  4. Show something interesting about the user’s reasoning, for example that she can figure out
     a successful policy. (In fact we will replace that question by another.)
As already mentioned, these exercises are intended to pose increasingly demanding challenges for
the formal analysis. The exercise done in WP2 doesn’t correspond exactly to any one of these,
but it’s something like #1 with elements of #3.
For each of these exercises, we’ll focus mainly on the subtask of getting the target card to the
front of the screen. At the end of each exercise we’ll mention what has to be added to get the whole
task dealt with, where the aim is to have the address on the target card known by the user.

4.2 Knowing whether the target card is on the screen

Exercise 1 requires us to simulate the standard policy and show that it achieves the task. In order
to work with the standard policy, we have to encode it in formal terms, and that means re-writing
[P1]-[P3] in a more formal notation.
The standard policy has conditions that refer to whether or not the target card is on the screen.
Those conditions are easy to write, K(shown(target-card)) and K(¬shown(target-card)), but the
question is whether they are something that the user can know.
To take the positive case first, the answer to the question is yes, it is something that the user can
know according to the axioms we have already written. However, demonstrating that answer will
introduce us to some of the delights of working with modal logic. It is tempting to argue as
follows. Suppose that the target card is in fact on the screen, so we have shown(target-card). By
[D24], we therefore have that visible(shown(target-card)). But [K2] says that anything visible is
known, so we get Ke(shown(target-card)) from which K(shown(target-card)), which is what we
wanted. Easy!
Unfortunately, the modal operator K, and more particularly in this case the modal operator
‘visible’, do not allow inferences of that kind. And quite rightly so: in a modal context,
descriptions matter. At the end of §2 we showed that visibility cannot handle a designator such as
card-added-by-Lucy-last-Thursday. ‘Target-card’ is another such designator that is not itself
visually apparent. Instead of that short-cut route, the derivation has to make use of the name of
the target card, just as the user does:
  • We’re supposing that the target card is on the screen, so in other words there is a card on the
    screen with the name Murgatroyd.
  • Designate that card as ‘*’, which from the user’s point of view can be thought of as “that
    card there”. For example, perhaps the user puts the mouse cursor on it.
  • So we have shown(*) and name-of(*) = Murgatroyd.
  • From [D24] we have visible(shown(*)), and from [D25] we have visible(name-of(*)).
  • From [K2], the principle of visibility, we therefore get both Ke(shown(*)) and Ke(name-
    of(*)), which from [K1] expands to K(shown(*)) and K(name-of(*) = Murgatroyd). What
    we have so far shown is that, from the screen, the user knows that there’s a card with the
    name Murgatroyd on it.
  • From what the user knows about the task, we have by expanding [K3], that K(target-name =
    Murgatroyd).
  • Substituting into [K4] gives us K(name-of(target-card) = Murgatroyd).
  • But from the screen we also had K(name-of(*) = Murgatroyd). So we have K(name-of(*) =
    name-of(target-card)).
  • Applying card-of from [D8] to both sides yields K(* = target-card), in other words the user
    knows that the card in question is the target card.
  • Since we have from the screen that K(shown(*)), we finally conclude K(shown(target-card)),
    i.e. the user knows that the target card is on the screen.
This result will be useful enough, and was sufficiently hard to derive, that it’s worth designating it
as a theorem:
    shown(target-card) ⇒ K(shown(target-card))                                                   [Th1]
        If the target card is on the screen, then the user knows so.

Also, embedded in the derivation is a simpler theorem that we will find useful:
    shown(*) ⇒ K(shown(*))                                                                       [Th2]
        If a visually designated card is on the screen, then the user knows so.

We turn now to the negative question, namely: given that the target card is not on the screen, does
the user know that? This question is much harder than the positive case, and I will not go into so
much detail of the step-by-step derivation. It also turns out that the answer (I believe) is no: from
the axioms we have written it does not follow that U will know that the target card is not on the screen.
We can get quite close. Given that U’s knowledge of what is on the screen is correct — after all, if
U can hallucinate, then all bets are off, especially about negative knowledge10 — then we
certainly have ¬K(shown(target-card)), the user does not believe that the target card is on the
screen. Furthermore, we can move back from that ¬K() to concluding that the target card is in
fact not on the screen, simply from the contrapositive of [Th1]. But can the user make that inference?
The difficulty is that we have no axioms that infer negative knowledge, i.e. something of the form
K(¬X), U knows that X is not the case. Specifically, we have nowhere stated that U knows that the
objects she can see on the screen are all the objects there are on the screen. We do seem to need
an axiom along those lines. The axiom most in accord with commonsense is probably to give the
user knowledge equivalent to [Th2], or in other words to assume that U understands enough
about the screen and about vision to know that if a card were shown on the screen, she would see
it11 :
    K(shown(*) ⇒ K(shown(*)))                 ; the user knows that she’s aware of all cards shown                [K6]

The essence of the derivation is that from ¬K(shown(target-card)), the user’s not believing that
the target item is on the screen, we can also state that she knows that she doesn’t believe it’s on the
screen: K(¬K(shown(target-card))). Applying the contrapositive within [K6], we can then get
K(¬shown(target-card)) — except that, as with the argument for the positive case, we have to be
careful to use the target name as a handle for the card. The details are a bit messy12 , but the idea
is that, with the addition of [K6], we can now show that if the target card is not on the screen, the
user knows so.
Given that we’ve now argued both the positive and negative cases, we can summarise the result by writing:
    Ke(shown(target-card))                    ; U knows whether the target card is on the screen.                [Th3]

4.3 Knowing the direction to the target

It is reasonable to assume that most users, by deploying their knowledge of the alphabet, will
usually have some idea of the relative direction and approximate distance to the target. In some
circumstances, this information can be pretty specific, such as “about 2-3 screenfuls behind”.
The notions of distance and direction on a circular list would obviously need some care in a full
definition, but for now the right thing is to adopt a simple assumption, which is that the user can
judge the direction of the target on the basis of the target name and the topcard name:
    direction:         name × name → {ahead, behind}                                                              [K7]
In practice, we can drop the arguments, since they are almost always the topcard and the target
card, so we can assert simply

10 In this analysis, we really are dealing with knowledge (i.e. assumed to be correct), not just belief. So the
analysis as it stands will not be able to model U’s false beliefs, for example about the internal state of the
device, say the contents of a hidden buffer.
11 It’s probably better to go the whole hog and have K(visible(X) ⇒ K(visible(X))), the user knows that if
something is visible then she knows that it’s visible — after all, she can see it. But see earlier footnote 9.
12 And, to be honest, I’m not confident that I’ve got them right.

    Ke(direction)                            ; the user knows the direction to the target card               [K8]

We specify more about this judgement of direction in §5.3. Notice that we have said nothing yet
about the correspondence of this judged direction to any notion of “true” direction.

4.4 Formulating the standard policy

We can now write the formal version of the standard policy [P1]-[P3]:
    K(shown(target-card)) ⇒ K(recommended-action = Click(target-card))                                       [K9]
       If the user knows the target card is on the screen, she knows the
       recommended action is to Click on it.
    K(¬shown(target-card)) ∧ K(direction = ahead) ⇒ K(recommended-action = JumpF)                          [K10]
       If the user knows the target card is not on the screen, and that the target is
       ahead, then she knows the recommended action is to Jump forward.
    K(¬shown(target-card)) ∧ K(direction = behind) ⇒ K(recommended-action = JumpB)                         [K11]
         If the user knows the target card is not on the screen, and that the target is
         behind, then she knows the recommended action is to Jump backward.
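
The three rules can be rendered as a small executable sketch. The function name `recommended_action` and the encoding of the user’s knowledge (True for K(shown(target-card)), False for K(¬shown(target-card)), None for neither) are illustrative conventions of this sketch, not part of the formal notation:

```python
def recommended_action(knows_shown, direction):
    """Standard policy [K9]-[K11]: map the user's knowledge to an action.

    knows_shown: True for K(shown(target-card)), False for
    K(not-shown(target-card)), None if neither is known.
    direction: the content of K(direction = ...), or None.
    """
    if knows_shown is True:                             # [K9]
        return "Click(target-card)"
    if knows_shown is False and direction == "ahead":   # [K10]
        return "JumpF"
    if knows_shown is False and direction == "behind":  # [K11]
        return "JumpB"
    return None                                         # no rule fires

print(recommended_action(False, "ahead"))   # JumpF
```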

We have here assumed that the outcome of the policy rules is to determine a ‘recommended-
action’. This formulation attributes just slightly more knowledge to the user than is strictly
necessary, in that having just “recommended-action = X” rather than “K(recommended-action
= X)” would be enough for us to make predictions of U’s behaviour on the basis of her
knowledge. However, it is in the spirit of being a rational agent that U should know that an action
is appropriate before taking it, so it seems worth including the K(). Notice, however, that we do
not state that the user knows the policy rules themselves.
The one thing left is to have the user actually execute the recommended action. I’ll write
    K(recommended-action = a) ⇒ apply a13                                                                  [K12]
but do please see the footnote.

4.5 Specific device assumptions

Simulations are a somewhat unnatural activity for Cardbox (which is in part why I chose this
domain). In order to carry out a simulation, we need to make some very specific, concrete,
assumptions about the contents and configuration of the device:
    Show = 10                                ; ten cards on the screen                                       [S1]
    N = 255                                  ; the exact number doesn’t matter                               [S2]
    name-of(141) = Murgatroyd                ; the target card                                               [S3]
    [] initial14                                                                                             [S4]

13 I really don’t know how to write this properly. I’m using “condition ⇒ apply a” with the understanding
that when the inference applies, the user actually takes the action. (That may give rise to problems later when it
comes to representing mental planning.) I could of course write “… ⇒ apply(action)”, which is I think what
has been done in some of our email, but that seems to have two drawbacks. One is that the device description
would need to have [apply(action)] instead of just [action], which seems an unnecessary complication. Second is
that writing “apply(action)” doesn’t really get us anywhere, since we still have to say that that’s something the
user actually does, and if we can say that then we don’t need the “apply()”. The problem seems to be got round
in MAL by writing “obl(a)”, and having the actual invocation of the action performed by MAL somewhere off-
stage. I suspect there’s something here I badly misunderstand.
14 How do we handle initialisation?

   initial ⇒ topcard = 113             ; card #113 is at the front                             [S5]
   name-of(113) = Lamartine                                                                    [S6]
   name-of(123) = Lorenzo                                                                      [S7]
   name-of(133) = Moffleman                                                                    [S8]

We also need to assume the user’s judgement of direction is sufficiently trustworthy that in these
very easy cases she judges the target to be forward:
   K(direction(Lamartine, Murgatroyd) = ahead)                                                 [S9]
   K(direction(Lorenzo, Murgatroyd) = ahead)                                                  [S10]

4.6 The simulation

We are finally in a position to perform the simulation. Table 1 shows the progression as a series of
inference rules firing, together with the information they generate. After the initialisation, the
simulation has three major cycles, the first two of which involve the firing of policy rule #2
leading to a JumpF, and the last of which fires policy rule #1 for a Click, which brings the target
card to the front of the screen. There is then a coda (steps 19 to 25) which shows in detail how the
user comes to know Murgatroyd’s address.

              Step      Rule(s)        New information
                 1      S4             initial
                 2      S5             topcard = 113
                 3      K3             K(target-name = Murgatroyd)
                 4      K4             K(name-of(target-card) = Murgatroyd)
                 5      Th3            K(¬shown(target-card))
                 6      S9             K(direction = ahead)
                 7      K10            K(recommended-action = JumpF)
                 8      K12            apply JumpF
                 9      S1, D17        topcard = 123
                10      (Th3)          (K(¬shown(target-card)))
                11      S10            K(direction = ahead)
                12      (K10)          (K(recommended-action = JumpF))
                13      (K12)          (apply JumpF)
                14      S1, D20        topcard = 133
                15      Th3            K(shown(target-card))
                16      K9             K(recommended-action = Click(target-card))
                17      K12            apply Click(target-card)
                18      S3, D22        topcard = 141
                19      D25           visible(name-of(topcard))
                20      K2            Ke(name-of(topcard))
                21      S3, K1        K(name-of(topcard) = Murgatroyd)
                22      step 4, D8    K(topcard = target-card)
                23      D26           visible(address-of(topcard))
                24      K2            Ke(address-of(topcard))
                25      step 22       Ke(address-of(target-card)) *Bingo*
                                Table 1. Simulation of Exercise 1.
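
The device side of the simulation can be checked mechanically. The following sketch encodes the concrete assumptions [S1]-[S10] and the standard policy directly in Python; the representation of cards as integers modulo N, and the paraphrase of the policy and the device effects [D17]/[D22], are illustrative conventions of this sketch:

```python
# Concrete assumptions: Show = 10 [S1], N = 255 [S2], target card 141 [S3],
# topcard initially 113 [S5].
SHOW, N, TARGET = 10, 255, 141

def shown(card, topcard):
    # a card is on the screen if it lies within Show positions ahead of topcard
    return (card - topcard) % N < SHOW

topcard = 113
trace = []
while topcard != TARGET:
    if shown(TARGET, topcard):            # policy rule #1: Click the target
        trace.append("Click")
        topcard = TARGET                  # [D22]-style effect: target to front
    else:                                 # policy rule #2: direction ahead [S9]-[S10]
        trace.append("JumpF")
        topcard = (topcard + SHOW) % N    # [D17]-style effect of JumpF

print(trace, topcard)   # ['JumpF', 'JumpF', 'Click'] 141
```

The trace reproduces the three major cycles of Table 1: two JumpFs (topcard 113 → 123 → 133) followed by a Click that brings card 141 to the front.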

The simulation certainly reveals things about the overall structure of the logical model which I’m
not getting right, about the handling of inferences15 and the invocation of actions16 , and I won’t
get them right without help from others. But I don’t think they’re serious enough to affect the
value of the demonstration.

4.7 Solving the whole task

As stated earlier, the analysis here has focussed on the subtask of getting the target card to the
front of the screen, rather than on the overall task of having the user know the address. Here are
some brief remarks about dealing with the overall task.
The knowledge-level architecture we have defined for this exercise has no problem-solving
capability worth speaking of: it is simply a dumb “policy follower”. So it would not be
surprising if it were unable to solve the whole task. As it happens, however, the model does
complete the task, as shown in the final steps of the simulation. Once the target card is on the
screen, the user comes to know the address automatically, because it is assumed to be visible and
therefore known to the user.

5. Exercise 2

Exercise 2 requires us again to show that the standard policy can perform the task, but this time
without making any concrete assumptions about the content and configuration of the cardbox,
and therefore without using simulation. This exercise places more stringent demands on the
formal description, because it is now required to support what is effectively a proof of the
adequacy of the policy.
Most of the machinery is already in place. We do have to say a bit more about the user’s
judgement of direction to the target. The easiest approach seems to be first to make an
unrealistically strong assumption about that judgement, and to sketch a proof on that basis. We
can then discuss the consequences of relaxing the assumption to something more realistic.

5.1 Judgement of direction

What would be an “objectively correct” decision about the direction of the target, for example as
Cardbox itself might compute it? Consider the following decision process:
  • Calculate A = (target-card – topcard) and B = (topcard – target-card), in both cases with
    arithmetic modulo N.
  • We have either A ≤ B, in which case we’ll say that the direction is ahead, or else A > B, in
    which case we’ll say that the direction is behind.
That process computes the direction in which the target is closer to the topcard. Let’s call the
resulting function d-direction, for “device-based direction”,
    d-direction:       name × name → {ahead, behind}                                                           [K13]

15 The model is strongly driven by inferences from a small set of state variables (in both the device and the
user). There is no provision for “undoing” the inferences every so often and having to make them afresh. So in
the table, steps 10, 12, and 13 don’t actually correspond to the firing of an inference rule. Instead, the
information provided by the previous firing is still there. I expect that structure is basically correct, but it does
have a consequence for the invocation of action: see the next footnote.
16 Because, as just explained, rule K12 does not actually re-apply at step 13, there is nothing to invoke the
second JumpF action. I don’t think this is a deep flaw in the model, but it does confirm that I simply don’t
know how to get actual behaviour from it.

and leave the statement of the function in mathematical notation as an exercise for the reader. As
before, in practice we won’t bother with the name arguments.
Let’s now make the (unrealistic) assumption that the user’s own judgement of direction coincides
with the d-direction:
       d-direction = d ⇒ K(direction = d)                                                       [K14]

5.2 Sketch of proof

We’re now in a position to sketch a proof that the policy will succeed with the task. The
following argument is, I believe, rigorous, even though not expressed formally:
  1.     We’re given the target-name. From it, identify the target-card, t, using [D8].
  2.     Split the proof into two cases, depending upon d-direction. Consider the case where the
         target is ahead.
  3.     If (t – topcard) < Show, then the target is on-screen: go to step 10.
  4.     By assumption [K14], the user will know that direction is ahead.
  5.     Policy rule #2 applies, and the user will perform JumpF.
  6.     The effect of JumpF on the device is to increase topcard by the amount Show. This
         decreases the distance to the target.
  7.     Return to step 3.
  8.     After a finite number of iterations through steps 3-7, each of which reduces the distance to
         the target, we will have …
  9.     … (t – topcard) < Show, and the target card is on-screen.
  10. Policy rule #1 applies, and the user will perform Click on the target card.
  11. The target card will be at the front, with its address visible.
  12. An analysis similar to steps 3-11 can be done for the case where the direction is behind.
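
For a concrete instance, the argument of steps 3-9 can be checked exhaustively: under assumption [K14] the policy reaches the target from every starting topcard. The sketch below reuses the Exercise 1 configuration ([S1]-[S3]); the encoding is an illustration, not part of the proof itself:

```python
SHOW, N, TARGET = 10, 255, 141   # Show, N and target as in [S1]-[S3]

def d_direction(topcard):
    # [K13]: the direction in which the target is closer, modulo N
    a, b = (TARGET - topcard) % N, (topcard - TARGET) % N
    return "ahead" if a <= b else "behind"

def policy_terminates(start):
    """Run the standard policy from `start`; True if the target card
    comes on-screen within a bounded number of Jumps."""
    topcard, steps = start, 0
    while (TARGET - topcard) % N >= SHOW:      # step 3: target not on-screen
        if d_direction(topcard) == "ahead":    # steps 4-5: JumpF
            topcard = (topcard + SHOW) % N
        else:                                  # step 12 case: JumpB
            topcard = (topcard - SHOW) % N
        steps += 1
        if steps > N:                          # safety bound
            return False
    return True                                # steps 9-11: Click succeeds

assert all(policy_terminates(s) for s in range(N))
```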

5.3 More realistic judgement of direction

Most of that proof is boringly obvious, and therefore unilluminating. But it does highlight the
crucial role of the user’s judgements of direction. Assumption [K14] above wants the user’s
judgements to align exactly with the objectively closer direction. That is unrealistic, especially
around the cross-over point, where over a considerable range the user has no way to really know
which direction is closer.
In fact, if we examine the structure of the proof, it turns out that the accuracy of the user’s
judgement — i.e. whether it coincides with the objective d-direction — doesn’t really matter. The
important criterion is for the judgements to be consistent, so that once the policy starts Jumping in
a particular direction, it continues in that direction until the target card is on the screen. A more
careful statement of the requirement is that if, in a given configuration, the user judges the
direction to be D, then after a Jump in the D direction the user should still judge the direction to
be D. That implies the existence of a cross-over point, such that for all cards on one side of the
point the user judges one direction, and for all cards on the other side of the point the user judges
the other direction. But it doesn’t matter where the cross-over actually is. It can be anywhere.
It should be a straightforward matter to formalise that requirement, and to replace [K14] by an
assumption that the user’s judgements satisfy the consistency requirement. This is left as another
exercise. Some such requirement is needed. If the user’s judgements are inconsistent, it’s
possible for the standard policy never to achieve the task.
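
The difference between accuracy and consistency can be illustrated concretely. In the sketch below, `crossover_judge` is inaccurate (its cross-over point is nowhere near the objective mid-point) but consistent, and the policy still terminates; `flipper` is inconsistent and the policy oscillates forever. Both judgement functions are invented for the illustration:

```python
SHOW, N, TARGET = 10, 255, 141   # the Exercise 1 configuration

def crossover_judge(topcard):
    # consistent but inaccurate: judges "ahead" for forward distances up
    # to 200, rather than the objectively closer direction (mid-point ~N/2)
    return "ahead" if (TARGET - topcard) % N < 200 else "behind"

calls = [0]
def flipper(topcard):
    # inconsistent: alternates its judgement on every call
    calls[0] += 1
    return "ahead" if calls[0] % 2 else "behind"

def run(judge, limit=1000):
    """Follow the standard policy using `judge` for direction; True iff
    the target comes on-screen within `limit` Jumps."""
    topcard, steps = 113, 0
    while (TARGET - topcard) % N >= SHOW and steps < limit:
        step = SHOW if judge(topcard) == "ahead" else -SHOW
        topcard = (topcard + step) % N
        steps += 1
    return steps < limit

print(run(crossover_judge))   # True:  reaches the target despite inaccuracy
print(run(flipper))           # False: Jumps forward and back indefinitely
```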

5.4 Solving the whole task

The model described here is, apart from the addition of [K14] or its replacement, the same as that
simulated in Exercise 1. We pointed out there that it is not a problem solver, and cannot be
expected necessarily to perform the whole task. Nevertheless it does complete the task, as shown
in steps 19-25 of Table 1. For Exercise 2 we need to demonstrate that the result holds in general,
i.e. that once the target card is on the screen, the user will know the address of the target card.
Such a proof should be very easy to construct, and indeed follows closely steps 19-25 of the
simulation except for being “abstracted” away from the concrete particulars, so that Murgatroyd,
for example, is replaced by target-name. Once again, details are left for the reader.

6. Exercise 3

Exercise 3 asks us to show that even without an explicit policy, a rational user with specified
knowledge can perform the task, by making local decisions. This exercise forces us for the first
time to confront seriously the question of what the user knows about the device, and particularly
about the effects of the actions. We’ll start with an easy one, to see what issues it raises.

6.1 Knowledge of Click

The user surely knows that the effect of Clicking on a card is to bring it to the front. We can write17 :
    K([Click(c)] topcard = c)                                                                                [K15]
         The user knows that the effect of Clicking on a card is that it becomes the topcard.

It’s much less clear what the user knows about any preconditions or filters of the Click action.
Indeed, it’s unclear what the preconditions or filter for Click are. Most of the time Click will be
considered only for cards that are already on the screen, and for which the user has a visually
apparent designator [e.g. topcard, or “that card”, or card-of()]. But it’s clear that the user can
conceive of Clicking on other cards, such as perhaps the target card. We’ll return to these
questions in §6.7.

6.2 Spatial metaphor for location and movement

When we come to consider the Jump actions, we encounter a whole new set of issues. The crucial
question is what the user knows, in relation to the task, that enables her to choose appropriate
actions. The device itself represents the cards in terms of a circular sequence of integers, and the
actions can be defined on that representation (as we have seen, e.g. [D18]-[D22]). But the user

17 An obvious alternative is to write “K(effect(Click(c)) = (topcard = c))”. What are their respective merits?
[K15] as written has the advantage of being consistent with the way effects are specified for the device itself, and
by having an interpretation of “in the new situation resulting from a Click(c), …”, makes a clean separation
between the current value of topcard and topcard in the new, resulting state. A disadvantage is that if we’re
going to try to select actions by virtue of their known effects, then we’ll have to allow variablisation over the
actions in []-brackets. That may or may not cause problems for the MAL. The alternative just suggested has no
problem with binding variables to the action. On the other hand, there is potentially a serious difficulty over the
statement of the effect itself, “topcard = c”. In the modal context of K() that just about works, maybe, because
the whole expression is effectively quoted. But if we ever moved “topcard = c” outside of a modal context, we’d
be in dead trouble, since we’d have two different values for topcard. Come to think of it, that probably means
there are problems with the notation used in WP2, either of inconsistency or the use of unrecognised modal contexts.

does not have access to such a representation. We could just about construct a “nominal”18
description of the actions. For example, if we defined the notion of backcard as the card at the
back of the stack on the screen, then we could state that the effect of JumpF is that the current
next(backcard) becomes the new topcard. But from the user’s perspective, and for capturing
their reasoning processes, that approach strikes me as somewhat perverse. Instead, users are surely
typically going to think about their relationship to the cards in terms of location and movement.
For example, an important component of the effect of JumpF is that it moves us through the list
of cards, in a forward direction.
What I’ll do here is to import a spatial metaphor, with a notion of the target being ahead of or
behind the user, and with ideas of the actions moving the user forwards or backwards through the
list. These ideas are of course vague, but in fact the formal specification approach is very
attractive because it potentially provides a way of saying just what the semantic content of such
operations is (and is not). One big advantage of such a route is that it provides a semantic basis
for the knowledge that the user could have brought to the task from her previous experience,
without requiring us to attribute to her an implausible amount of device- or task-specific
knowledge. In other words, it offers the possibility of explaining where the relevant knowledge
comes from.
Specifically, the spatial metaphor is that the user is standing in the list at the current position of
the topcard. The target card is either ahead of or behind her, and we’ve already seen that she
knows which, in terms of the direction function. The metaphor extends to the idea that the
navigation actions move the user through the list.19 All I can do here is to present the merest
sketch of the formalisation of such a metaphor. I will try to give enough to carry us through the
Exercise, but parts of it will be very crude.
One primitive object class is:
    location:          ; the set of locations                                                        [M1]
    at:                location               ; the current location                                           [M2]

The locations have some mathematical structure which is something like a circular ordering.
There are relations of next20 and prev on the ordering, and a notion of direction:
    next, prev:        location → location                                                           [M3]
    direction:         location × location → {ahead, behind}                                                   [M4]

Although this looks a lot like the circular list of cards in the device, and indeed in many ways it is,
it’s a lot more abstract. For example, in the device, relations like ‘next’ have concrete
extensions21 : next(16) = 17, for example. But there are no individually named locations, so
relations like next will be reasoned about primarily in terms of their abstract properties, as given in
these axioms.

18 By a nominal description I mean something like a formulation that “works”, i.e. delivers the correct
conclusions, but that has no connection with the (typical) user’s normal conceptualisation of the domain. The
Peano axioms for arithmetic provide a good example. In other contexts I’d probably say “formal” description,
but that obviously won’t do here.
19 Notice that on the version of Cardbox illustrated in Figure 1, with its linear rather than circular ordering of
cards, the spatial metaphor is strongly supported by the depiction and behaviour of the scrollbar.
20 I’m deliberately “overloading” the relation names where they correspond to those of the device. I believe
there’s a technical term for extending functions in this way (polymorphism?).
21 Probably not the correct technical term.

The metaphor also supports a notion of approximate distance:
    extent:           ; the set of possible extents                                                       [M5]
    distance:         location × location → extent          ; the distance between two locations          [M6]

Extents have a weak kind of ordering:
    <:                extent × extent → {T, F}                                                            [M7]

We identify some individual extents:
    adjacent, partscreen, screenful, far:  extent                                                         [M8]
    adjacent < partscreen < screenful < far                                                               [M9]
The intended interpretation is that adjacent refers to adjacent locations (and is therefore linked to
the next and prev relations); partscreen is roughly, say, half a screenful apart; screenful is about a
screenful apart; and far is several screenfuls.
People’s spatial knowledge is potentially very rich. It includes the understanding of such facts as:
if you move from one location towards (i.e. in the direction of) another, then you are closer (the
distance to it is less); the bigger the extent of such a move, the closer you are; and so on. I can’t
here carry through the formalisation of such notions, but just to show that it can be done (or at
least, some of it), here is the encoding of “if you move towards a location, you get closer”:
    direction(L0,T) = dir ∧ distance(L0,T) = D ∧ direction(L0,L1) = dir ∧ distance(L0,L1) < D ⇒
                     direction(L1,T) = dir ∧ distance(L1,T) < D                                 [M10]
         If you move from a location towards a target T, but by less than the distance to
         T, then you get closer to T and T is still in the same direction.
Much of this spatial knowledge is applicable to reasoning about navigation through Cardbox.
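
[M10] can be spot-checked in one concrete model of the metaphor, in which locations are integers modulo N, direction is the closer way round, and distance is the shorter arc length. This model is an assumption of the sketch: the paper deliberately keeps locations abstract, and the graded extents of [M8] are collapsed here into exact arc lengths:

```python
import random

N = 255   # an arbitrary concrete size for the circular ordering

def direction(l0, l1):
    if l0 == l1:
        return None   # direction between identical locations is undefined
    return "ahead" if (l1 - l0) % N < (l0 - l1) % N else "behind"

def distance(l0, l1):
    return min((l1 - l0) % N, (l0 - l1) % N)   # shorter arc

def m10_holds(l0, l1, t):
    """Check one instance of [M10]: moving from l0 towards t (to l1, by
    less than the distance to t) leaves t in the same direction, closer."""
    if direction(l0, l1) != direction(l0, t) or \
       not distance(l0, l1) < distance(l0, t):
        return True   # premises fail, so the implication holds vacuously
    return (direction(l1, t) == direction(l0, t)
            and distance(l1, t) < distance(l0, t))

random.seed(0)
assert all(m10_holds(random.randrange(N), random.randrange(N),
                     random.randrange(N)) for _ in range(10000))
```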

6.3 Knowledge of actions

The user’s knowledge about actions, in the approach we are taking, has two parts: the literal part
and the metaphoric part. Let’s try it for StepF.
It’s plausible that the user is aware of the effect of StepF in literal terms of its effect on the cards,
namely that the second card becomes the new topcard. After all, that card is visible on the screen, and
the effect of stepping is visually apparent:
    K(next(topcard) = c ⇒ [StepF] topcard = c)                                                           [K16]

There is also a parallel interpretation on the metaphoric side, that we move to the next location,
which is in the forward direction, and is only one location distant:
    K(at = L ⇒ [StepF] at = next(L) ∧ direction(L,at) = ahead ∧ distance(L,at) = adjacent)               [K17]
It’s worth noticing how the metaphoric information is “richer” than the literal.22
Let’s now try JumpF. It’s hard to know how the user might conceive of the effect in literal terms.
She knows of course that there’s a linear sequence of cards (by [D13] and [K5]), so we can try
writing that the card “just over the horizon”, next(backcard), becomes the new topcard
    K(next(backcard) = c ⇒ [JumpF] topcard = c)                                                          [K18]

even though to some extent that’s exactly the sort of “nominal” description (see footnote 18)
we’re trying to avoid. So let’s go for the metaphor:
    K(at = L ⇒ [JumpF] direction(L,at) = ahead ∧ distance(L,at) = screenful)                             [K19]

22 The direction and distance information would be derivable from the semantics of ‘next’ if we provided a more
complete description of the metaphoric domain.

         The user knows a JumpF will move her to a new location a screenful ahead of
         her current one.

The description of StepB and JumpB is left as an exercise … .

6.4 Cognitive architecture

For the cognitive architecture, we need to provide a minimum amount of mechanism to translate
knowledge into behaviour. I will not explore this topic in depth. WP2 provides a lot of detail
about a possible mechanism at the symbol level, and my email of 28 Feb 97 sketches out how one
might set about describing corresponding functionality more at the knowledge level. Here I’ll
provide just the minimum needed to get through the exercise.
We haven’t yet talked about tasks at all. Whatever they are, they’re known to the user:
    Ke(tasks)                                                                                             [C1]

In the spirit of these exercises, we’ll focus on the subtask of getting the target card at the front of
the screen:
    (topcard = card-of(Murgatroyd)) ∈ tasks                                                               [T4]
         One task is to have the topcard be the card with the name Murgatroyd.

Various kinds of knowledge may come into play — such as selecting actions known to have
desired effects — to propose various relevant actions. Further knowledge will apply to narrow
down the choice, hopefully to a single recommended action. We already have rule [K12] to say
that the user carries out the recommended action. Some of the relevant actions may be preferred
to others, in which case only the preferred one becomes recommended:23
    K(a ∈ relevant-actions) ∧ ¬K(a2 ∈ relevant-actions ∧ prefer(a2, a))
                                        ⇒ a = recommended-action                                          [C2]
         If the user knows of a relevant action, and doesn’t know of any other relevant
         action preferred to it, then it is the recommended action.

We obviously need some kind of way to choose actions by virtue of their effects. Here is the
simplest possible version:
    K(t ∈ tasks) ∧ K([a] t) ⇒ a ∈ relevant-actions                                                        [C3]
         If the user knows of an action that achieves a task, then it’s a relevant
         action. 24
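The selection machinery of [C2] and [C3] can be sketched in Python as follows (my own encoding, not the paper's notation; task and action names are illustrative):

```python
def relevant_actions(tasks, known_effects):
    """[C3]: a is relevant if the user knows [a] t for some task t.

    known_effects maps each action name to a predicate over tasks saying
    whether the user knows the action achieves that task."""
    return {a for a, achieves in known_effects.items()
            if any(achieves(t) for t in tasks)}

def recommended_action(relevant, prefer):
    """[C2]: a relevant action with no known preferred rival is recommended."""
    for a in sorted(relevant):  # sorted only to make the choice deterministic
        if not any(prefer(a2, a) for a2 in relevant if a2 != a):
            return a
    return None
```

For the task of [T4], an action known to achieve `topcard = card-of(Murgatroyd)` would come out relevant and, absent a preferred rival, recommended.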

6.5 Applying the metaphor

There is a small problem with our spatial metaphor. The metaphoric interpretations of the actions
achieve metaphoric tasks. Because the main reason for invoking actions will be to achieve their
effects, if we have only real (i.e. literal) tasks we will never make contact with the metaphoric
information. To solve this problem, rather than having an elaborate mechanism to map the task
into the metaphoric domain and then back again, we will simply apply the metaphor by adding
metaphoric tasks to the real ones. We won’t need to map back again because, in the approach
we’re taking, the actions are real ones, not metaphoric ones. We can illustrate the essence of the
idea with the mapping rule:

23 I’m skating round the problem of what to do if there’s more than one recommended action, trusting that the
situation won’t arise in this exercise.
24 See further discussion of this rule in §8.2.

    K((topcard = c) ∈ tasks) ⇒ (at = c) ∈ tasks25                                                             [K20]
         If the user knows there’s a task of having a certain card be the topcard, then
         she adopts a metaphoric task of being located at that card.

Familiar spatial knowledge then provides strong guidance about appropriate actions to take:
    K((at = x) ∈ tasks) ∧ K(direction(at,x) = dir) ∧
        ∃ a • K(at = here ⇒ [a] direction(here,at) = dir ∧ direction(at,x) = dir)
                                            ⇒ a ∈ relevant-actions                                            [M11]
        If the user has a metaphoric task of being somewhere in a particular direction,
        and if she knows of an action which moves her in that direction but not past
        the target, then it’s a relevant action.
    K((at = x) ∈ tasks) ∧ K(direction(at,x) = dir) ∧ ∃ a • K(at = here ⇒ [a] direction(here,at) = dir) ∧
        K(a1, a2 ∈ relevant-actions ∧ (at = here ⇒ [a1] distance(here,at) = d1) ∧
            (at = here ⇒ [a2] distance(here,at) = d2) ∧ d1 > d2)
                                            ⇒ prefer(a1, a2)                                           [M12]
         If the user has a metaphoric task of being somewhere, and she knows of two
         relevant actions, she prefers the one which moves her farther.26

That should be enough to see us through.
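The metaphoric machinery of [K20], [M11], and [M12] can be rendered in Python along these lines (all concrete names and distances here are my own assumptions, e.g. a screenful of 3):

```python
# What the user knows about how each action moves her ([K17], [K19]).
MOVES = {
    "StepF": ("ahead", 1),   # [K17]: moves to the adjacent location
    "JumpF": ("ahead", 3),   # [K19]: moves a screenful ahead (3 assumed)
}

def metaphoric_tasks(literal_tasks):
    """[K20]: a task topcard = c induces the metaphoric task at = c."""
    return {("at", c) for slot, c in literal_tasks if slot == "topcard"}

def relevant(direction_to_target, distance_to_target):
    """[M11]: actions moving toward the target but not past it."""
    return {a for a, (d, n) in MOVES.items()
            if d == direction_to_target and n <= distance_to_target}

def prefer(a1, a2):
    """[M12]: prefer the action that moves farther."""
    return MOVES[a1][1] > MOVES[a2][1]
```

With the target five locations ahead, both actions are relevant and JumpF is preferred; with it only two ahead, JumpF overshoots and only StepF is relevant.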

6.6 Proving the model

I now claim that the model we have described, with just one or two further elaborations still to be
mentioned, will perform the task. So long as the target card is off-screen, both StepF and JumpF
will be relevant because they metaphorically move the user towards the target, but JumpF will be
preferred by [M12] because it moves her closer. Once the target is on-screen, Click will be
proposed on literal grounds, because it achieves the task directly.27
How would we demonstrate that the model does indeed succeed with the task? The thought of
having to do an extensive case analysis appals me. A simpler way would be to show that the
behaviour of the model conforms to the standard policy. We showed in Exercise 2 that the
standard policy succeeds with the task, so if the model conforms to the policy then we’re home.
I’ll work through just part of the demonstration, which is to show that the model conforms to
policy rule #2, which recommends doing JumpF. Suppose that the target card is off-screen and is
judged by the user to be ahead. Given the literal task, rule [K20] applies to set up the metaphoric
task of being at the target card. Rule [M11] applies twice, to infer that both StepF (by rule [K17])
and JumpF (by rule [K19]) are relevant actions. The user knows (from [K17] and [K19]) that
StepF moves to just an adjacent location and JumpF moves by a screenful, so from [M12] she
prefers JumpF to StepF. From rule [C2], JumpF becomes the recommended action, which is what
the standard policy says.

25 This is not quite right, because we haven’t yet specified a correspondence between cards (in the literal
domain) and locations (in the metaphoric). Details, details … .
26 There really ought to be some further guards on the applicability of this rule, such as that the action doesn’t
move us too far. But it’s complicated enough already.
27 Click will have competition from StepF, which will be proposed on metaphoric grounds. There are several
possibilities for choosing Click. One would be that if we added the metaphoric interpretation of Click, it would
be seen to be preferable because it moves us farther (except in the case where the target is the second card, in
which case there is a true indifference). Or we could simply say that an action proposed on literal grounds takes
precedence over one proposed only on metaphoric grounds.

To complete the proof, StepB and JumpB would have to be added to the model, and a
demonstration similar to the above done for policy rules #1 and #3.

6.7 Solving the whole task

The original task is to achieve Ke(address-of(card-of(Murgatroyd))), i.e. that the user knows the
address on Murgatroyd’s card. Moving from that task to the subtask of (topcard = card-of(Murgatroyd))
involves some kind of subgoaling, which I don’t wish to analyse in detail for reasons given in
§6.4 above. These remarks will be kept brief.
There would appear to be two alternative approaches. One is that, provided we have set things up
right, the subtask leads directly to the main task,
    K(topcard = card-of(Murgatroyd) ⇒ Ke(address-of(card-of(Murgatroyd)))).                    [Th4]
Now the user is certainly free to set up as a subtask anything she knows will lead to an existing
task. I have no idea how to write that, but let’s have something like
    K(t ∈ tasks) ∧ K(x ⇒ t) ⇒ optional x ∈ tasks.28
So, if we can find a way to handle the technical issues properly, that could provide one route to
the subgoal.
The second route would be that, by virtue of [Th4], the user knows that one of the consequences
of performing Click(card-of(Murgatroyd)) would be that the main task is achieved. So she could
subgoal on Click, which presumably generates a subtask of having the card on-screen (rather than
at the front, which is the subtask we have assumed). This would complicate the story about the
application of the spatial metaphor, but I don’t think seriously undermine it.
So, there is more work to be done, but the situation looks promising.

7. Exercise 4

Exercise 4 requires us to show something interesting about the user’s reasoning, for example that
she can figure out a successful policy. In fact, there are some weird aspects to that request. First
is that, by working at the knowledge level, we precisely do not want to say anything (interesting or
otherwise) about the user’s reasoning processes. Second is that the notion of policy is anyway
cognitively implausible. Third is that deriving a policy is a challenging intellectual task,
equivalent to programming, and it would probably require “full reflection” by the user, i.e. the
assumption that the user knows as much about the user and (certain aspects of) the device as we
do. I don’t even want to think about it.
Instead, I’ll tackle a more feasible challenge: to show that the user knows that the net effect of
doing a Click(backcard) followed by a StepF is equivalent to doing a JumpF. Even this simpler
exercise raises some interesting issues for knowledge-level analysis.
First, what do we mean by backcard, and what does the user know about it? We’ve employed the
notion several times in the paper. It’s a concept that’s more useful to the user than to the device.
For the device there’s nothing special about that card, but for the user the top card and the back
card are visually apparent in a way that the other cards aren’t. So it’s reasonable to assume that
the user “knows” the back card just as much as she does the topcard, something like
Ke(backcard). But which card is the back card? We could presumably define the backcard as,
say, the card shown on the screen whose next() is not on the screen. But that’s just the kind of
cutesy mathematical trick that I’ve referred to as a nominal description. It surely doesn’t
correspond to the way the user thinks about it. For the user, presumably, the key property of the

28 We don’t want a plain ‘⇒’ without any guard, because that would give us too many tasks!

card is that it is at the back of the stack shown on the screen. Since our formal language isn’t
developed enough to talk about such visual properties, the best thing probably is not to try to
define “which card it is”, but to allow ‘backcard’ as a visually apparent designator (so that it can
be an argument to Click, for example).
The spatial metaphor for the actions is not of itself precise enough to solve the problem.
However, it can be used to check the claim for plausibility. With appropriate axioms for the spatial
domain, we could show that the user would know that the effect of Click(backcard) followed by
StepF would be to move in the ahead direction, as does JumpF, and also (maybe) that the net
distance is about a screenful, as also for JumpF — though to handle the matter of distance we
would have to extend the metaphor to include cards on the screen, which we have so far avoided.
If we use the literal effect of JumpF as given in rule [K18], the question is easy. From [K15], we
have that the user knows the effect of Clicking on the last card, K(lastcard = L ⇒
[Click(lastcard)] topcard = L). From [K16] we have that the user knows the effect of a
subsequent StepF, K(lastcard = L ⇒ [Click(lastcard)] [StepF] topcard = next(L)), which
corresponds to the effect of JumpF. If we defined the effect of JumpF in some other way, then
more work would be required on our part to demonstrate the equivalence.
I’ll leave the exercise at that point.
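Before leaving it, though, the claim admits a quick machine check (my own sketch, not part of the paper; ring contents and screen size are arbitrary assumptions): composing Click(backcard) with StepF reaches the same topcard as JumpF under the literal effect [K18], for every starting position.

```python
CARDS = list("ABCDEFG")   # the circular ordering (assumed)
SCREENFUL = 3             # cards visible at once (assumed)

def nxt(c):
    """next() in the circular ordering."""
    return CARDS[(CARDS.index(c) + 1) % len(CARDS)]

def screen(top):
    i = CARDS.index(top)
    return [CARDS[(i + k) % len(CARDS)] for k in range(SCREENFUL)]

def backcard(top):
    """The last card shown on the screen."""
    return screen(top)[-1]

def click(top, c):
    """[K15]: Click(c) makes c the topcard (c assumed on-screen)."""
    assert c in screen(top)
    return c

def stepf(top):
    """[K16]: StepF moves to next(topcard)."""
    return nxt(top)

def jumpf(top):
    """[K18]: next(backcard) becomes the new topcard."""
    return nxt(backcard(top))

# The equivalence: Click(backcard) then StepF ends where JumpF does.
for top in CARDS:
    assert stepf(click(top, backcard(top))) == jumpf(top)
```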

8. Conclusions

It’s going to take a while (at least for me) to digest the implications of the analyses in this paper,
but in the meantime I offer some observations that strike me at the end of writing it. What I’ve
tried to do is to offer an analysis based on the user’s knowledge, by adopting a modal logic-like
notation and performing the analysis at the knowledge level. I’ll start with those two aspects.

8.1 Use of modal logic

This is the first time, so far as I can remember, that we’ve tried to employ a modal logic especially
adapted to representing knowledge as part of a knowledge analysis. It has been, I suppose, a bit
scary in places, though it has demonstrated its ability to represent fine — but significant —
differences in meaning. It therefore allows us to be precise about what the user does and doesn’t
know, and helps us to be more rigorous, i.e. to draw the conclusions, but only those conclusions,
that follow from the knowledge. In return for the fussiness it imposes, we are forced to be clear
about what the user has to know, which is surely the game that we’re trying to play.
Working with the modal logic has made me realise that the existing work on PUM IL, and
analyses like that in WP2, use modal contexts ubiquitously (even apart from MAL) in ways that I
at least had not recognised. For example, the specification of a task is inherently modal: a task
takes a sentential argument, and refers to a future situation different to the present one. The same
is true of goals, preconditions, and filters. The same is (perhaps more obviously) true when
describing the effects of operators. MAL makes that clear of course, which I think is a merit of its
[action] notation. But even if we simply write “effects(A) = X” we’re invoking a modal context.
The danger is that if we don’t recognise the modality and handle it properly, we will inevitably
fail to employ a proper semantics for the description.

8.2 Knowledge level analysis

In this paper I’ve deliberately aimed for a knowledge-level analysis, which I’ve once or twice
contrasted with the symbol-level analysis done in WP2. I don’t think one is necessarily better
than the other, but I do think it’s important that we explore both approaches.

It’s perhaps worth pointing out one of the places where the knowledge-level approach makes
things easier for us as analysts, and that’s the core step of identifying an action to achieve some
result. Suppose that the user needs to choose an action A to achieve some subtask T. Then in a
symbol-level analysis (see, e.g. the definition of “acceptable-ops” in WP2) the model needs to
find an action A which has explicitly listed as one of its effects (or purposes) some item E that
matches at the symbol level with T. In consequence, the analyst has to anticipate the exact
contexts and purposes for which an action can be used, and list them explicitly.
The corresponding selection done at the knowledge level is much more flexible. The relevant
rule is [C3], repeated here:
    K(t ∈ tasks) ∧ K([a] t) ⇒ a ∈ relevant-actions                                    [C3], repeated
Glossed carefully, that rule says that if the user has a subtask t and knows of an action for which t
is among its consequences, then … . There is no need for t to be explicitly associated with a in a
declaration. It only has to be the case that (the user knows that) t follows from the application of
a. We saw an example of that flexibility in the discussion of §6.7, where we saw that the user can
choose the action Click(x) to achieve the task Ke(address-of(x)). Rule [K15] asserts only that the
effect of Click(x) is that topcard = x, and makes no mention of Ke() or address-of(). But the user
knows that topcard = x implies Ke(address-of(x)), so the selection can be made. That kind of
knowledge-based selection is simply not available at the symbol level, precisely because the
knowledge-level analysis abstracts over the details of the cognitive processing required.
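That flexibility can be illustrated with a small sketch (my own Python encoding; all names are assumed): Click(x)'s declared effect is only `topcard = x`, but closing that effect under the user's known implications lets a [C3]-style rule propose Click(x) for the address task as well.

```python
# [Th4]-style knowledge: the user knows that having the card on top
# yields the address. (Illustrative strings, not the paper's notation.)
KNOWN_IMPLICATIONS = {
    "topcard = Murgatroyd": {"knows address of Murgatroyd"},
}

def known_consequences(direct_effect):
    """Close an action's declared effect under the user's known implications."""
    out = {direct_effect}
    out |= KNOWN_IMPLICATIONS.get(direct_effect, set())
    return out

def relevant(task, actions):
    """[C3] with chaining: actions maps each name to its declared effect."""
    return {a for a, eff in actions.items()
            if task in known_consequences(eff)}
```

Nothing in the declaration of Click mentions addresses, yet the selection goes through; a symbol-level effect list would have had to anticipate it explicitly.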

8.3 Simulation vs proof

In the paper, I’ve tried to sharpen (and have perhaps exaggerated) the distinction between using
the formal description to simulate the behaviour of the model as against using it to demonstrate
(or “prove”) abstract properties of its behaviour. One thing that the enterprise has clarified for
me is that both are needed, and that they deliver complementary results. It seems to me that a
simulation shows that the model “works”, in the sense that the formal description is done right
and all the bits fit together, whereas a proof shows that the model “does the right thing”, e.g. is
able to achieve the task.
The distinction showed up clearly in Exercises 1 and 2, which asked for simulation and proof
respectively. Exercise 1 was invaluable in trying to sort out the details of exactly what the user
knows from the device display, while Exercise 2 forced an unanticipated clarification of the
requirements on the user’s judgements of direction.

8.4 Circular ordering

My decision to assume a circular ordering — a “ring” — for the Cardbox has caused a number
of complications. I think it was the correct thing to do for this particular enterprise. Having a
structure which it was not obvious (to me) how to specify fully forced us to work more abstractly,
stating just some properties of the structure and trying to draw conclusions on the basis of them
— and doing that was one of the things I wanted to explore. It also led (in §5.3) to the interesting
process of clarifying the requirements on the user’s judgement of direction.
However, if this analysis is to be pursued further, e.g. for publication, I would recommend
reverting to the linear Cardbox depicted in Figure 1. That would make it feasible to tighten up
the derivations, and getting this kind of stuff right is hard enough even without the extra
complications introduced by the ring.

8.5 And finally …

… let me end on a personal note. Although this working paper has been written very fast —
essentially over a weekend — some of the ideas have been incubating for a long time. I have
long wanted to be able to describe rigorously the use of spatial metaphor (or mental models) for
navigating through a Cardbox-like system, and to apply formal notation to specify the content
of the mental model. In fact, in the late 1980s I once spent a day at York discussing with various
people (Harrison, Thimbleby, Dix, I think it was) the possibility of such an enterprise. (At that
time I was working with something called the Option Ring scenario, closely analogous to
Cardbox.) We didn’t really get anywhere then, which I presume was at least partly because the
kind of formal models we were considering (PIE and so on) simply lend themselves less well to
the job. I think it represents real progress that the style and models we are now using are able to
get this far.
Anyway, it must be apparent that I’m excited by the possibilities of this approach. I’m sure that
much of what’s in the paper is technically wrong, but I hope that there’s enough underlying
correctness that the idea of doing a formal, knowledge-level analysis can be fruitfully developed.
