13th ICCRTS: C2 for Complex Endeavors
"Mutual Information and the Analysis of Deception"
Topic 4: Cognitive and Social Issues

E. John Custy (contact) and Neil C. Rowe
SPAWAR Systems Center and Naval Postgraduate School
Code 24527, San Diego, CA 92152 and Code CS/RP, 1411 Cunningham Road, Monterey CA 93943
(619) 553-6167, (831) 656-2462
john.custy@navy.mil and ncrowe@nps.edu

Abstract

This paper describes a general analysis technique for deception. We show that deception can be placed within the same modeling framework commonly associated with communication, and that elementary concepts from information theory can then be applied. In particular, the average effectiveness of a deception can be measured in terms of the mutual information between two random variables: the random variable representing the true state of nature, and the one representing a target's perception of the state of nature. Our analysis technique provides (i) a general yet simple framework for understanding deception, and (ii) a practical method for measuring the effectiveness of specific deception scenarios.

Keywords: deception, information theory, communication, spoofing channels
I. Introduction

One of the most powerful ways of countering an adversary is through deception. Intuition suggests that deception and communication are somehow related, but the precise nature of any relationship that might exist has not yet been established. This is unfortunate because the past several decades have seen sophisticated and useful mathematical tools developed to characterize communication systems, yet few, if any, analysis techniques currently exist for deception [17]. In this paper we describe how the average effectiveness of a deception can be quantified with the same conceptual tools used to characterize conventional communication systems.

Our analysis technique is based on the average mutual information between two specific random variables that arise in the course of any deception. These two random variables are (i) the actual state of nature, and (ii) the state of nature perceived by the deception target. Mutual information is a measure of how much information the value of one random variable provides, on average, about the value of another. In our model, the deception target attempts to "communicate" with reality, and specifically tries to determine the value taken by some specific state variable. The deceiver plays a role analogous to noise in a communication system. Justification for this model comes from the concept of deception being the imposition of a specific false version of reality onto a target.

It is important to note that our analysis technique gives information only about the average effectiveness of a deception, not about the success or failure of any particular deception event. This information about average effectiveness nonetheless provides important insight into, for example, how deceptions can be expected to "evolve" as participants become aware of the tactics being used by others.
This paper is structured as follows. The following section introduces some basic terminology and concepts about communication systems to support later discussions. Section III, which contains the core ideas of this paper, describes how deception can be treated as communication. These general concepts are illustrated by some examples in Section IV. Section V presents some material that we feel is interesting and relevant, but which must still be considered "work in progress." Finally, Section VI concludes with a high-level summary of major points.

II. A Sketch of Communication Terminology and Concepts

In simplest terms, a conventional communication system consists of an information source (or "transmitter") and an information receiver ("destination" or "sink"), the two being linked through a channel. The source picks at random a message from a large number of possible messages and impresses it onto the channel. The channel delivers to the receiver what can be interpreted as "evidence" about the message chosen by the source, and the receiver uses this evidence to infer the sent message. The most useful way to characterize both the source and the channel is in terms of their statistical behavior, because anything deterministic about their behavior is not of much conceptual interest [13]. Stated in different terms, the source is characterized statistically because the specific information to be communicated is unknown when the system is designed, and likewise the deterministic behavior of the channel can be accommodated before the system is used. To simplify terminology, the phrase random variable will be used to designate any random outcome, regardless of whether numbers have been assigned to those outcomes. Our discussion of deception will rely heavily on Figure 1, which shows an abstract representation of a discrete binary communication system.
When interpreted in conventional communication terms, the information source generates symbols denoted A and B, with a-priori probabilities pA and pB = 1 − pA. That is, the fraction of A's in a long sequence generated by this source will be about pA, and the fraction of B's will be about pB = 1 − pA. The transition probabilities pAB and pBA indicate the rates at which each symbol type is delivered to the destination either correctly or incorrectly. That is, non-zero transition probabilities will cause a symbol of one type to be delivered as the other type. For example, pAB is a probability that represents the rate at which an input of symbol A is delivered to the destination as symbol B. Transitions of this sort in communication systems are typically caused by noise.

Significantly, transition probabilities close to one do not necessarily imply poor communication performance. For example, the case pAB = pBA = 1 is equivalent to the case pAB = pBA = 0, as far as communication effectiveness is concerned. Both of these cases provide noiseless communication, with the situation pAB = pBA = 1 requiring only a little post-processing at the destination, namely the conversion of all A's to B's and all B's to A's.

Figure 1. An abstract representation of a discrete binary communication system.

A particular type of statistical average useful for describing communication systems is the entropy of a random variable [Cover and Thomas, 2006]. The entropy of a random variable X is denoted H(X), and represents (in informal terms) the average number of bits needed to describe the values taken on by that random variable. For example, if a discrete random variable X has an entropy of 2 bits/value, then we can say that, on the average, we need 2n bits to represent n particular values of this random variable. A natural generalization of entropy is the mutual information I(X;Y) between two random variables X and Y.
Mutual information, which takes the form

I(X;Y) = H(X) − H(X|Y),

can be interpreted as the average number of bits that one random variable provides about the other. It can be shown that I(X;Y) ≥ 0; I(X;Y) = I(Y;X); and I(X;X) = H(X). The entropy concept is useful because it conveniently encapsulates the (weak) law of large numbers, which allows the use of simple "counting" arguments for evaluating communication systems.

Mutual information finds use, for example, when a channel capable of conveying a signal of some sort (optical, radio frequency, acoustic, etc.) becomes part of a communication system. This is accomplished by introducing interfaces between the source and channel, and between the channel and destination. A typical goal is to optimize performance subject to cost and other constraints, with performance evaluated by comparing the actual value of I(X;Y) against the maximum possible value of I(X;Y) for the channel. Statistical descriptions of channel characteristics allow computation of the maximum possible value of I(X;Y). Though these ideas find their most direct application in the development of communication systems, they are very general and are routinely applied under a variety of circumstances.

Perhaps the most important point for our purposes is that the mathematical tools used to analyze communication systems can just as easily be used to analyze sensor systems, because these terms refer fundamentally to the same thing. Informally, a communication system can be viewed as simply a specially configured sensor system, and a sensor system, on the other hand, can be viewed as a special type of communication system. Stated differently, the random variable X in Figure 1 may be generated by, or located alongside, an agent who has an interest in transferring information to the destination, or, alternately, X can be associated with a process that is completely indifferent to that transfer process.
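As a concrete illustration (our own sketch, not part of the original analysis; the function names and parameter choices are illustrative assumptions), the quantities above can be computed directly for the binary channel of Figure 1:

```python
import math

def h2(p):
    """Binary entropy in bits; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(pA, pAB, pBA):
    """I(X;Y) in bits for the binary channel of Figure 1.

    pA  : a-priori probability of state A
    pAB : probability that an input A is delivered as B
    pBA : probability that an input B is delivered as A
    """
    pB = 1 - pA
    # Probability that the destination observes A
    pYA = pA * (1 - pAB) + pB * pBA
    # H(Y|X) averages the per-input uncertainty over the two inputs
    h_y_given_x = pA * h2(pAB) + pB * h2(pBA)
    return h2(pYA) - h_y_given_x      # I(X;Y) = H(Y) - H(Y|X)

# Noiseless channel: one full bit per observation
print(mutual_information(0.5, 0.0, 0.0))   # 1.0
# pAB = pBA = 1 is also noiseless (just relabel A and B at the destination)
print(mutual_information(0.5, 1.0, 1.0))   # 1.0
# Binary symmetric channel at p = 0.5: observations are useless
print(mutual_information(0.5, 0.5, 0.5))   # 0.0
```

The last three lines reproduce the equivalence noted in Section II: transition probabilities of one are as good as zero, since the destination can simply invert its labels.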
The mutual information between the source and destination does not depend on the interests of the participants.

III. A Link Between Communication & Deception

Our analysis technique for deception is based fundamentally on the following definition, which is based on [Godson and Wirtz, 2002, p. 6]. We use the term deception target or just target to refer to anyone who has a deception applied against them, whether or not they "fall" for it.

Definition 1: Deception is the presentation of a specific false version of reality by a deceiver to a target for the purpose of changing the target's actions in a specific way that benefits the deceiver.

Deception is the imposition of a specific false version of reality onto an adversary; that is, a deceiver does not surround the correct version of reality with an obscuring fog, but rather replaces it with a specific and carefully created false version of reality. Deception is thus quite distinct from the denial of information to an adversary, and it is quite distinct from efforts that direct an adversary in a random, haphazard direction. As stated eloquently in [Whaley, 2007], a successful deception will make an adversary "... quite certain, very decisive, and wrong" [emphasis in the original].

These ideas can be made precise. People in general, and deception targets in particular, are constantly observing their environment in an ongoing effort to make correct inferences about that environment. A deceiver manipulates the environment of a target so that observations will suggest some specific incorrect version of reality. Figure 1 depicts this idea with the environment, or "state of nature," represented as a source of information, and the deception target represented as an information destination.
A full representation of nature in terms of state variables would be hopelessly complicated, but for our deception model we need only be concerned with a single state variable that can take either of two states, which we will call A and B. State A is essentially an arbitrary state of nature, subject only to the requirement that it not be the "false" state of nature mentioned in Definition 1, with other requirements perhaps imposed by the specific deception. These mild requirements make the deception possible, so we refer to state A as the precipitating state for the deception (though it can also be described as the actual state). State B is the false version of reality referred to in Definition 1. We refer to state B as the "false" or bogus state of nature that the deceiver uses as a mask, or disguise, for state A. Our model requires that either state A or state B hold, but that both not hold simultaneously.

Of particular interest is that Definition 1 implies that there are necessarily circumstances under which a given deception cannot be carried out. Specifically, any given deception has associated with it a "false," or bogus, version of reality, and it must be possible for this bogus version of reality to actually occur. When the bogus version of reality is actually in effect, the given deception cannot be carried out. For similar reasons, when state A is in effect, state B cannot be. Thus states A and B are mutually exclusive, and though they do not necessarily exhaust all the possible values that a state variable can take, any other values are of no interest for the deception, and we can, for convenience, condition all probabilities on the event that either A or B holds. Because the precipitating and false versions of reality are mutually exclusive, and because we are interested only in cases where one or the other holds, the mathematical tools developed for binary communication systems can be applied to deception scenarios.
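The conditioning step is simple renormalization. In this small numerical sketch (the probabilities are hypothetical, chosen only for illustration), all state-variable values irrelevant to the deception are lumped into a single "other" category:

```python
# Hypothetical state-variable distribution; "other" lumps together all
# values irrelevant to the deception.
p = {"A": 0.3, "B": 0.2, "other": 0.5}

# Condition on the event that either A or B holds.
p_A_or_B = p["A"] + p["B"]
pA = p["A"] / p_A_or_B
pB = p["B"] / p_A_or_B

print(pA, pB)  # 0.6 0.4 -- the model is now binary, as required
```

After this step the a-priori probabilities pA and pB = 1 − pA of Figure 1 are well defined, and the binary-channel machinery applies directly.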
As an aside we note that if the bogus version of reality were impossible, it would be pointless to try to convince a target that it was actually in effect. Miracle weight-loss pills and other outlandish products are not counter-evidence to this statement; it is only necessary that the target consider the false state of nature to be plausible. It is interesting to note that these phenomena can be modeled as sensors operating with unrealistic, or "skewed," a-priori probabilities.

The role of the deceiver in this model is analogous to that of channel noise in a communication system. When nature is in a state appropriate for a specific deception, and when a deceiver decides to exploit those circumstances, one or more deception targets will have evidence of a specific false version of reality placed into their environment. The imposition of a false version of reality is represented in Figure 1 by a non-zero value assigned to the transition probability pAB. That is, pAB represents the rate at which the precipitating state A appears as the bogus state of nature B in the eyes of the target. In broad terms, our model of deception has the deception target attempting to infer the correct state of nature while a deceiver attempts to impose a specific false state of nature. In essence, the deception target is separated from reality by a communication, or sensor, channel, and successes by a deceiver act as channel noise which cause errors in the target's inferences about nature.

A. One-Sided Deceptions

The effectiveness of a deception is reflected in the mutual information between two random variables which, as described above, are present in any deception. These are (i) the random variable representing an information source or state of nature, denoted X in Figure 1; and (ii) the random variable representing a target's decision about the source, denoted Y.
The mutual information I(X;Y) between these two random variables represents the number of bits of valid information that a deception target obtains, per observation, about the value of X. An "observation" is the total information gathered by a target before acting, and the target's actions reveal which state they believe to be correct.

Figure 2. Mutual information between random variables at the source and destination of a noiseless channel, Z-channel, and binary symmetric channel.

The most straightforward interpretation of Definition 1 in terms of Figure 1 occurs with 0 < pAB ≤ 1 and pBA = 0. The resulting mutual information I(X;Y) as a function of pAB is shown by the dashed curve in Figure 2. This most elementary type of deception, which is analogous to a Z-channel in communication systems, will be referred to hereafter as a one-sided deception. An important characteristic of a one-sided deception is that a deception target can make only one type of error. Specifically, a deception target is only capable of misinterpreting a state variable as being in state B when it is actually in state A.

A one-sided deception is most effective when pAB = 1. When this occurs, repeated deployment of the deception results in the target (or target community) perceiving only state B, regardless of whether state A or B is in effect. Figure 2 shows that the mutual information is sensitive to the value of pAB, and in fact the derivative of I(X;Y) at pAB = 1 is −1/2. As a consequence, when pAB ≈ 1, a small decrease of ∆ in pAB yields an increase of about ∆/2 bits per observation to the deception target. This sensitive behavior comes about because the target is only capable of misinterpreting A as B, and appearances of A are valid. In summary, a one-sided deception, the most elementary type of deception that can be represented in our model, corresponds to 0 < pAB ≤ 1 and pBA = 0 in Figure 1.
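The sensitivity claim can be checked numerically. The following sketch (our own illustration with equal priors assumed; function names are ours, not the paper's) evaluates the Z-channel mutual information near pAB = 1:

```python
import math

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def one_sided_info(pAB, pA=0.5):
    """I(X;Y) for a one-sided deception (Z-channel: pBA = 0)."""
    p_y_a = pA * (1 - pAB)            # destination sees A only when no flip occurs
    return h2(p_y_a) - pA * h2(pAB)   # I(X;Y) = H(Y) - H(Y|X)

# A perfect one-sided deception (pAB = 1) leaks nothing:
print(one_sided_info(1.0))            # 0.0
# A 1% lapse by the deceiver already hands the target roughly 0.005 bits
# per observation, consistent with the derivative of -1/2 at pAB = 1:
print(round(one_sided_info(0.99), 4))
```

Sweeping pAB over [0, 1] with this function reproduces the dashed Z-channel curve of Figure 2.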
The mutual information I(X;Y) is given by the dashed curve in Figure 2 and, as with any deception, tells us the average number of bits of information the deception target gains per observation. Under a one-sided deception, the deception target can make only one type of error, because only one value of a specific random variable is being disguised.

B. Symmetric Complements and Symmetric Deceptions

Following the terminology introduced above, a one-sided deception results, when successful, in state B replacing, or masking, state A when state A occurs. Consider the transition probability pBA, which is zero for a given one-sided deception. This transition probability can be formally interpreted as the rate at which state A is made to stand in place of state B. An error of this type is related to, but quite distinct from, the given one-sided deception on which it is based. A one-sided deception with pAB > 0 and pBA = 0 has associated with it a deception characterized by pBA > 0 and pAB = 0, which we will call the symmetric complement of the original. A deception and its symmetric complement used together will be referred to as a symmetric deception. That is, a symmetric deception disguises state A as state B when A occurs, and disguises state B as state A when B occurs.

The idea of the symmetric complement of a deception can be considered from another perspective. Note that Figure 1 can be used to represent two distinct one-sided deceptions: one with pAB > 0 and pBA = 0, and one with pBA > 0 and pAB = 0. Each of these one-sided deceptions is the symmetric complement of the other. When used together they form a symmetric deception. The solid "U-shaped" curve in Figure 2 shows the mutual information associated with a symmetric deception with pAB = pBA. This curve shows that a symmetric deception can cause a target's observations to become useless (i.e., I(X;Y) ≈ 0) without transition probabilities taking the extreme values of zero or one.
Additionally, at I(X;Y) = 0 the derivative of I(X;Y) with respect to pAB = pBA is zero. There is no reason to expect that the transition probability of a deception will equal the transition probability of its symmetric complement, and in fact these parameters are completely independent of each other. However, it can be shown that symmetric deceptions in general achieve I(X;Y) = 0 for values of pAB and pBA away from zero and one, and that the derivative of I(X;Y) with respect to pAB and pBA at I(X;Y) = 0 is zero. These two properties suggest that the requirements for an effective symmetric deception are much less stringent than those associated with a one-sided deception. As an aside it should be noted that these two properties are not implied by the fact that I(X;Y) is a convex function of the transition probabilities.

A further important characteristic of symmetric deceptions is that, as shown in Figure 2, the value of I(X;Y) increases as transition probabilities increase beyond 1/2. This situation is analogous to a binary communication channel that "flips" most of the bits sent through it. When this occurs in a communication system, a little post-processing at the receiver, namely inverting all received symbols, can easily remedy the situation. In the case of a symmetric deception, however, the increase in I(X;Y) as transition probabilities increase beyond 1/2 implies that information about the actual state values is available to the deception target. This information can be extracted by the target in the following way: the target simply decides which state is most strongly implied by the available evidence, and then behaves as if the opposite state were in effect.
Stated in different terms, if the target community knows that a symmetric deception is being applied against them, and this community knows that their performance is worse than would occur if they completely neglected any relevant observations, then this community can exploit the deception to their advantage by behaving according to the state opposite to that implied by observations. The interesting conclusion is that a deception target can always exploit a deception that is too strong. Thus, unless the mutual information between these two random variables is low, deceivers leave themselves vulnerable to counter-deception techniques. This suggests that after a long enough exchange of measures and counter-measures, a deception will degenerate into what is essentially a case of denial.

C. Further Significance of Mutual Information

Interpreting deception in terms of mutual information leads to the concepts of one-sided and symmetric deceptions, which constitute unique perspectives into deception provided by information theory. We note here for completeness that our model also allows mutual information to serve the same purposes for deception that it does for communication. For example, mutual information allows apparently disparate deceptions to be compared. That is, because the effectiveness of a target in perceiving the correct state of nature is measured in the common units of bits for all deceptions, it becomes possible to compare deceptions of very different types. The use of mutual information also allows computations to be carried out for deceptions. If two communication channels provide mutual information values of I1(X;Y) and I2(X;Y), then the two channels operating independently of each other in parallel provide a mutual information of I1(X;Y) + I2(X;Y).
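This additivity for independent channels can be verified numerically rather than taken on faith. The sketch below (our own illustration; the channel parameters are arbitrary) computes the joint mutual information of two independent binary channels directly from the joint distribution and compares it with the sum of the individual values:

```python
import math

def mi_binary(pA, pAB, pBA):
    """I(X;Y) in bits for a single binary channel."""
    def h2(p):
        return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    pY_A = pA * (1 - pAB) + (1 - pA) * pBA
    return h2(pY_A) - pA * h2(pAB) - (1 - pA) * h2(pBA)

def mi_joint(ch1, ch2):
    """I((X1,X2);(Y1,Y2)) for two independent binary channels, computed
    from the full joint distribution (additivity is NOT assumed)."""
    def dist(pA, pAB, pBA):
        # Joint P(x, y) over states 0 (=A) and 1 (=B)
        return {(0, 0): pA * (1 - pAB), (0, 1): pA * pAB,
                (1, 0): (1 - pA) * pBA, (1, 1): (1 - pA) * (1 - pBA)}
    d1, d2 = dist(*ch1), dist(*ch2)
    joint = {((x1, x2), (y1, y2)): d1[(x1, y1)] * d2[(x2, y2)]
             for (x1, y1) in d1 for (x2, y2) in d2}
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

ch1 = (0.5, 0.2, 0.0)   # a one-sided deception
ch2 = (0.5, 0.3, 0.3)   # a symmetric deception
print(abs(mi_joint(ch1, ch2) - (mi_binary(*ch1) + mi_binary(*ch2))) < 1e-9)  # True
```

Because the two channels are statistically independent, the brute-force joint computation and the simple sum agree, which is the property that lets bits from different deception scenarios be pooled.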
Likewise, if a target is able to gather I1(X;Y) bits of information about the state of nature in the face of some particular deception, and another target in another particular deception scenario (that is, operating independently of the first) obtains I2(X;Y) bits of information about the state of nature in the face of that deception, then it can be said that the community of targets gathers I1(X;Y) + I2(X;Y) bits of information about the state of nature through the two deceptions.

In addition, we can use mutual information to evaluate the cost effectiveness to a deceiver of changing the transition probabilities associated with a deception. Suppose that a one-sided deception is operating with a transition probability p1. A change ∆p1, which will presumably change the effort and costs incurred by the deceiver, will result in a change of ∆I(X;Y). This change is a non-linear function of p1, and a change that may be cost effective at p1 may not be cost effective at some other p2.

IV. Examples

So far our presentation has been as general as possible. The following examples, summarized in Table 1, provide specifics that may be helpful for understanding the material introduced above. The first two examples are prototypical deceptions, at least in the popular imagination: the sale of used cars, and claims made on income tax forms. The next two examples, camouflage and identity theft, are like the previous two in that they are one-sided deceptions that do not have useful symmetric complements. The last two examples, one involving runways and the other involving honeypots, are interesting because they have natural symmetric complements.

A. Sale of Shoddy Item

Participants exchanging goods, services, and/or money must share a common interest, because otherwise the exchange would not take place. However, there is also conflict: each party wants to "get" as much as possible and "give" as little as possible.
Potential state variables include the maximum amount of money that the purchaser is willing to spend, the minimum amount of money that the seller is willing to accept, and the quality of any products being exchanged. This example will consider the sale of a used car which, for simplicity, is assumed to have a fixed price, but which may be a "high quality" or "shoddy" item.

A potential buyer has available some class of used cars from which to make a choice, say, all of the cars on a particular lot, or all of the cars of a particular model and year. Before making any observations, our hypothetical purchaser knows only the a-priori probabilities associated with the available collection: that is, they know that x% of the members of a particular class are of "high quality" and that (1-x)% are of "low quality." As far as a potential purchaser is concerned, the quality of a used car in this collection is random: the quality of any candidate for purchase may be high, or it may be low. Before making any observations, the best choice that the purchaser can make is based on the a-priori probabilities associated with the class in question. For example, if 90% of the cars in the collection are of low quality, then the best choice to assign to an arbitrary member chosen at random from the class is "low quality." If half of the cars in the collection are high quality and half are low quality, then on the average neither choice will be better than the other for an arbitrary member of the class chosen at random.

Before making a decision, however, a potential purchaser will usually supplement the a-priori probabilities with some observations. That is, our purchaser will look at several specimens from the collection, noting mileage, cleanliness, service records, and so forth. These observations provide no guarantees about the state of the automobile, but they do provide evidence that can be used to form an estimate of the automobile's state.
We will assume that salespersons have no incentive to disguise the actual state of nature to a customer examining a high quality car. That is, we will assume that causing a high quality car to appear of low quality will only hurt the salesperson. On the other hand, on those occasions when a potential purchaser is making observations of a low quality car, the salesperson may attempt to counter these observations in some systematic way, say by "rolling back" the odometer. However, the particular techniques used by the salesperson to create this false version of reality are irrelevant to our analysis. The only significant point is that a potential purchaser observing a low-quality car will have a false version of reality presented to them by the salesperson. The deceptive salesperson will presumably be successful some fraction of the time. This is a one-sided deception because the salesperson is only attempting to make a low quality item look as if it were high quality; there is no attempt to make high quality items appear as low quality.

Assume for simplicity that half the cars on a lot are of high quality, and the other half are of low quality. Suppose further that a particular salesman, when presenting a low quality car, is successful x% of the time in making the car appear as high quality. The dashed curve in Figure 2 will provide a value of mutual information, say y bits, associated with the one-sided deception with this transition probability. Then the community of deception targets (i.e., used car purchasers) can view that particular salesperson as a channel which provides each customer with y bits of information about the quality of a candidate car. Each salesperson provides an information gathering ability that conforms to the dashed line in Figure 2. Consider the extreme case in which the salesperson is totally ineffective in their deception efforts.
Then a customer can make noiseless observations about the quality of a car through this salesperson, resulting in one bit per observation. Through this salesperson, a purchaser can perfectly determine the quality of a used car. Imagine, on the other hand, a salesperson who is a perfectly effective deceiver. This salesperson always makes every low quality automobile appear as a high quality car. Then all observations made through this salesperson will be "high quality." This salesperson is useless to the customer, because he/she conveys no information; in essence, the noise associated with this channel is so bad that zero bits of information can be conveyed through it. Note that in order to achieve this extreme case, the deceiver must be completely effective; a small decrease in this salesperson's effectiveness means that a non-zero number of bits will be transferred on the average for each observation made by the average target. That is, the target can be sure that any car that appears to be of low quality actually is of low quality, due to this deception being one-sided.

Under the one-sided deception outlined above, the best the dealer can do is make all cars appear to be of high quality. However, suppose the salesperson presents high quality cars as low quality. This situation is not entirely implausible: it could arise if the dealer wants to "hold" the high quality cars for selected customers or for their own use, or to circumvent poorly conceived tax laws; or it could result from a dealership that is a "front" for some illegal activity. Under these circumstances, a potential customer is subject to two distinct types of errors: the customer may mistake a low quality car as one of high quality, and the customer may mistake a high quality car as being of low quality.
If the probability of each of these types of errors is equal, then the solid curve in Figure 2 gives the average number of bits of information gained by a purchaser per observation. Under these circumstances, a salesperson only has to be 50% successful at each type of deception in order to make I(X;Y) = 0. When I(X;Y) = 0, a target's observations are useless, so a salesperson who is successful about half the time at each type of deception provides the same amount of information to a customer about the quality of a car as does the flip of a fair coin. This situation is more robust for the salesperson than the corresponding one-sided deception: even if the salesperson's success rate varies slightly from 50%, the amount of information gathered by the target will still remain about zero.

The salesperson may desire to cause an error rate of greater than 50% on the part of the targets. However, if the salesperson is successful in causing more than 50% errors on the part of automobile purchasers, then there is sure to be some systematic characteristic of the deception technique(s) that could be exploited by the target community. That is, if targets systematically believe that more than half of the high quality cars are low quality, and more than half the low quality cars are believed to be high quality, then, as intuition would suggest, there is a non-zero amount of mutual information that the target community can take advantage of. However, it may be non-trivial for the target community to discern and exploit this mutual information.

B. Reporting Income for Tax Purposes

Another prototypical deception is the reporting of income for tax purposes.
The state variable being manipulated is the income of a taxpayer, and for simplicity we will assume that income can take on only two values: A denotes "high income," and B denotes "low income." The taxpayer is the deceiver in this scenario, and the deception consists of reporting that income is low when in fact it is high. Only a person with high income can carry out this deception; a person with low income is powerless to attempt it. The tax examiner is the deception target, and this person uses material such as (but not necessarily limited to) the form submitted by the taxpayer to discern the correct value of the state variable. The transition probability pAB is the rate at which the tax examiner accepts state B when state A is in effect; that is, pAB is the rate at which the examiner accepts as true a high income taxpayer's statement that their income is low. The transition probability pBA, on the other hand, represents the rate at which someone who actually has low earnings is believed to have high earnings. If the tax examiner makes no "honest" mistakes, pBA will be zero. However, if for some reason low income taxpayers provided evidence that they were high income, the tax examiner would be dealing with a symmetric deception. If enough high income persons gave convincing evidence that they were low income, and enough low income persons gave convincing evidence that they were high income, the tax examiner's observations would become worthless for determining income.

C. Camouflage

A soldier or a hunter who dons special clothing to blend in with, say, a forest background is manipulating a state variable related to location. The state variable can take on the values "someone is located in this forest," which we will denote A, and "no one is located in this forest," denoted B.
The special clothing and slow, quiet movements of the deceiver provide evidence to observers that B is in effect when A actually is. Someone who is not in the forest cannot carry out this deception. If we imagine that the deception target is a sentry who is surveilling the forest, and that the camouflage is perfectly effective, then the sentry will believe that state B holds. The symmetric complement of camouflage is a deception in which a region is made to appear populated when in fact it is empty. Conceivably, such a deception could be carried out with noisemakers and/or mechanical devices for creating motion. Appropriate use of this symmetric complement can achieve the same ends as perfect camouflage, namely, making observations by the sentry useless.

D. Identity of an Individual

Many cases of social engineering [Mitnick, 2002] involve a deceiver who assumes the identity of an "insider." The state variable of interest here can take on either the value "this person has authority," which we will denote B, or the value "this person is not who they claim to be," denoted A. The transition probability pAB then represents the rate at which an "outsider" succeeds at being accepted as an "insider." The symmetric complement is a deception in which a person with authority provides evidence that they have no authority. As in all the cases above, this symmetric complement is not very natural or practical.

E. Runway Strafing

This example illustrates that the symmetric complement of a deception can be a very intuitive concept. Consider the following passage [Whaley, 2002], which involves the use of "dummy" aircraft to divert attacks away from real aircraft. Sometime around mid-1942, Major Oliver Thynne was a novice planner with Colonel Dudley Clarke's "A" Force, the Cairo-based British deception team.
From intelligence, Thynne had just discovered that the Germans had learned to distinguish the dummy British aircraft from the real ones because the flimsy dummies were supported by struts under their wings. When Major Thynne reported this to his boss, Brigadier Clarke, the "master of deception" fired back: "Well, what have you done about it?" "Done about it, Dudley? What could I do about it?" "Tell them to put struts under the wings of all the real ones, of course!"

Here the original one-sided deception is perpetrated by the British defenders, who use dummy aircraft to deceive the enemy attackers. The state variable being manipulated by the deceivers is the identity of an item sitting on a runway. Each item on the runway has a state variable associated with it, with possible values "real aircraft" and "dummy aircraft." The only mistake that the deception targets can make is to believe mistakenly that a dummy aircraft is a real aircraft; the deception is thus one-sided. However, the one-sided deception turns out to be imperfect, and because of its simplicity, the symmetric complement is deployed. If the one-sided deception were perfect, the attackers would not be able to distinguish real aircraft from the dummies. This story illustrates that an imperfect one-sided deception can be "salvaged" by proper deployment of the symmetric complement.

F. Honeypots and False Honeypots

One of the most effective ways of gathering information about the techniques used by computer intruders is the honeypot [The Honeynet Project, 2004]. A honeypot is a computer that is placed on a network for the purpose of being broken into by computer intruders. Honeypots do not contain any information of value, and are usually highly instrumented so that the maximum amount of information about intruders can be gathered.
Most computer intruders try to avoid honeypots so that their intrusion techniques, which may have required significant effort to develop, will not be revealed. A honeypot may thus be "tricked out" to appear as a non-honeypot computer containing valuable information. A honeypot therefore represents a deception involving a state variable that can take the states "this computer is ordinary," denoted B, and "this computer is a honeypot," denoted A. Of particular interest is the symmetric complement of this deception, in which an ordinary computer is made to appear as a honeypot. Deployment of this symmetric complement could potentially lead to a situation where a computer intruder cannot determine whether a computer they have broken into is a honeypot or an ordinary computer. That is, the intruder's observations about the machine they have broken into, gathered by examining all different aspects of the machine, would be useless for determining whether the machine is a honeypot or an ordinary computer. Because most computers on large networks like the Internet are ordinary, a computer intruder is safe in assuming that an arbitrary computer chosen for intrusion is ordinary. We thus suspect that the main benefits of this symmetric deception would come to those deploying honeypots, because honeypots make up only a small part of the computer population, and little or no evidence would be available to an intruder to foil the deception. However, computer intrusion in general may become less appealing when intruders have to work "blind" to honeypots.

V. Miscellaneous Notes & Ongoing Work

This section contains material that is relevant and interesting, but does not fit naturally anywhere else in this paper.

A. Deception Inputs and Outputs

Our model treats "reality" as non-deterministic.
All that anyone, including a potential deception target, can do is make observations and then act on inferences based on those observations. Observations can only provide evidence that a particular state of nature holds, and though this evidence can be extremely strong, no guarantees come with an observation. Deception is possible only because of the non-deterministic nature of reality, and only because it is possible for deceivers to "feed" false observations to a target. No one can be deceived about something that is deterministic. The success or failure of a deception is measured by a target's actions, which in turn reflect that target's perception of reality. That is, either the target behaves according to the specific false version of reality advocated by the deceiver (the deception succeeds), or the target behaves according to the actual, correct version of reality (the deception fails). If the target behaves in a way that is independent of the particular state of nature in question, then that target should be considered irrelevant to evaluating the success of the deception. For example, if a bogus money-making opportunity is thoroughly presented to a potential target, but the target suddenly drops dead before having an opportunity to accept or decline, then we argue that this specific deployment cannot be counted as a success or a failure.

B. Targets That Act as Ill-Designed Sensors

We have seen that the average success of a deception can be analyzed as if it were a form of noise in a sensor channel between a deception target and reality, and that this model provides interesting insights into deception and counter-deception. It turns out that certain odd behavior on the part of deception targets can also be modeled in terms of communication concepts. In particular, a deception target that acts in accord with a very unlikely or impossible state of nature (miracle weight-loss pills, etc.)
is similar to a communication receiver that is operating according to incorrect a-priori probabilities. To see this, imagine a communication receiver designed to receive binary symbols that occur with equal frequencies; that is, if the symbols are denoted 0 and 1, then any long sequence generated by the source will have equal numbers of 0's and 1's. Suppose now that 90% of the symbols generated by the source are 0's and 10% are 1's. In this case, the threshold for deciding between 0's and 1's will be significantly different from that in the previous case; in essence, the receiver requires much more evidence to decide that a 1 was sent. On the other hand, a receiver designed to operate with a 90/10% mixture of 0's and 1's will, if provided with a 50/50% input, err by reporting too many 0's. These ideas are illustrated by the following story, which can be interpreted as that of a sensor operating with incorrect a-priori probabilities [cartalk, 2007].

I worked my way through college as a Volvo mechanic, 1969-71. During those years, the extremely dependable but dated Volvo 120 series was being replaced by the extremely trendy but unreliable 140 series. Our shop foreman decided to buy a small Fiat, about 1500cc, saying that he could no longer trust the Volvo, and furthermore, he REALLY loved the TREMENDOUS gas mileage of the Fiat. The first week he had the Fiat, he did nothing but rave about the gas mileage, so we decided to help him. Every day we would add, at first a pint, then more and more gas to his tank when he wasn't looking. He went crazy. Our skeptical-looking (we were all in on it) crew would be regaled by his tales of getting, well, first it was 34, then 50, then 63 miles per gallon. He would snarl condescendingly at our gas guzzling Volvos, then reflect on the brilliance of Italian engineering. The Fiat dealership, of course, had several explanations. Tight engine. American gas. Driving habits.
Then we gradually began to reduce the amount we added, until it was zero, and then of course we siphoned increasing amounts from the Fiat's tank. At first, the bragging slowed to a stop. He became surly. How was the Fiat? Wouldn't answer. Then of course he kept taking it back to the Fiat dealership, which, of course, had several explanations. Tight engine. American gas. Driving habits. In the end, he found us out, and our schedules were screwed for months.

The behavior of this target is analogous to a communication receiver with a detection threshold set inappropriately low for certain types of symbols. This can result in (i) accepting as valid states of nature that are extremely unlikely (as in this story); or (ii) accepting unremarkable states of nature based on small amounts of evidence.

C. Spoofing Channels

The analysis technique presented in this paper provides no real guidance for the development of specific deception techniques. However, we are working on a specific deception technique that is worth mention here because of the interesting relationship this deception holds to its symmetric complement. Our deception technique developed from thoughts on how best to respond to a computer intrusion. A number of computer Intrusion Detection Systems (IDS's) are available for detecting intrusions and alerting system administrators when one occurs; a well known example of an IDS is SNORT [SNORT, 2007]. However, it is not clear what a system administrator should do when an intrusion has been detected. One option is to immediately drop the connection to the intruder, which ensures that sensitive information is protected to the maximum extent possible. This option, though, has the significant disadvantage that the intruder is unequivocally notified that they have been detected. It would be much better if the connection could be maintained and the intruder's activities
on the target machine observed. This would aid forensic work and would provide guidance for strengthening defenses against further intrusions. However, it is critically important that sensitive information be protected. We are developing a computer Intrusion Response System (IRS) based on the idea of a spoofing channel. A spoofing channel is like a communication channel, but with a slightly less stringent performance requirement. A conventional communication channel is obligated to provide as output the same string of symbols provided as input; in contrast, the output of a spoofing channel is required only to have the same statistical structure as the input, with no stronger relationship promised. When used in an IRS, a spoofing channel delivers to an intruder not the original document residing on the target computer, but rather a spoof of that document, with the spoof having the same statistical structure as the original, but no stronger relationship. The most significant characteristic of the spoofing channel is that it exploits for deception the uncertainty that conventional communication resolves. That is, communication is carried out because the information at the source is not known at the destination. If the information inside a target computer were known to the intruder, there would be no point in committing the intrusion. Our work so far has focused on spoofing channels for natural language text. In this special case, when we say that the output of a spoofing channel has the same statistical structure as the input, we mean that individual words appear with the same frequency in the input and output, that word pairs appear with the same frequency, that word triples do, and so on; in short, all n-tuples of words appear with the same frequency. Two fundamental techniques have suggested themselves for automatic generation of spoofs of natural language text documents.

1.
One technique for modifying a document's meaning while maintaining its "style" is to manipulate the document's semantic structure. This is intuitively the most straightforward approach to changing the meaning of a document while keeping the same "style." An example of this sort of technique would consist of negating and un-negating some particular subset of the assertions in the subject document.

2. Another technique for automatically changing a document's meaning is through manipulations based on syntactic structure. A technique of this sort might consist of simply swapping two successive noun phrases (which may appear in the same, or in different, sentences). This technique depends heavily on pareidolia, the psychological phenomenon of finding meaning in random and presumably ambiguous patterns [pareidolia entry on Wikipedia, 2007].

On occasion, spoofing channels have been observed "in the wild." Examples include the classic spoof created by hand by Alan Sokal [Sokal], and SCIgen, which automatically generates random computer science research papers [SCIgen].

Interestingly, the spoofing channel deception is identical to its symmetric complement. The fundamental reason for this is that the deception target is inherently unable to distinguish between valid information and a properly constructed spoof: the point of using a communication channel is to identify, or distinguish, valid information from among all the possibilities. Stated in different terms, a properly constructed spoof makes the observations of a deception target inherently useless. At this point, the spoofing channel appears to require handling as a "special case" for analysis. Analysis of the spoofing channel is part of our ongoing research.
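As a concrete (and greatly simplified) illustration of the word-pair case, the sketch below is our own, not the system described above: it builds a bigram chain from a source text and walks it at random, so word-pair statistics of the output follow those of the input only in expectation, which is weaker than what a true spoofing channel requires. All names and the dead-end restart rule are our assumptions.

```python
import random
from collections import defaultdict

def build_bigram_chain(text):
    """Map each word to the list of words that follow it in the input."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain, words

def spoof(text, length=20, seed=0):
    """Emit text whose word-pair statistics mimic the input's.

    Each step samples a successor in proportion to how often that word
    pair occurs in the source, so bigram frequencies are preserved in
    expectation; the output string itself bears no stronger relationship
    to the input.
    """
    rng = random.Random(seed)
    chain, words = build_bigram_chain(text)
    word = rng.choice(words)
    output = [word]
    for _ in range(length - 1):
        successors = chain.get(word)
        if not successors:              # dead end: restart the walk
            word = rng.choice(words)
        else:
            word = rng.choice(successors)
        output.append(word)
    return " ".join(output)

source = ("the report states that the budget is approved and "
          "the report states that the schedule is approved")
print(spoof(source, length=12))
```

Matching word triples, and n-tuples generally, would require conditioning each step on the previous n-1 words rather than one, at the cost of output that increasingly just reproduces the source.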
VI. Conclusion

The material in this paper does not constitute a mathematical theory of deception, but it is just about as good: we have shown that the existing theory of communication can be used, almost "as is," to describe deception. Based on a simple and natural definition of deception, our modeling technique allows the effectiveness of a deception to be evaluated in bits, and thus allows insights and computations that would otherwise not be possible. Because information theory has traditionally described the transfer of valid pictures of reality from one location to another, it is reasonable that the transfer of invalid pictures of reality can be described by the same means. Also, we have illustrated a form of duality between deception and communication. In a conventional communication system, the mutual information between the random variables at the information source and the information destination is of great interest: the goal of a communication system designer is to make the mutual information between these two random variables as large as possible. A successful deception, on the other hand, reduces the mutual information between reality and a target's perception of reality to the lowest value possible. This material is exciting not because of the specifics presented here, but rather because of the many open questions that remain. A few outstanding topics include the relationship of rate distortion theory [Cover and Thomas, 2006] to deception; analysis of deception as a game [Garg and Grosu, 2007] with payoffs quantified by mutual information; models of deception using continuous state variables; and the influence of deception on the stability of signaling systems [Searcy and Nowicki, 2005].

References

[cartalk, 2007] http://www.cartalk.com/content/features/hell/01.05.html, retrieved 14 August 2007.

Cover, Thomas M., and Joy A. Thomas. 2006. Elements of Information Theory, 2nd Ed. Hoboken, NJ: John Wiley and Sons.

Godson, Roy, and James J. Wirtz, Eds. 2002.
Strategic Denial and Deception: The Twenty-First Century Challenge. New Brunswick, NJ: Transaction Publishers.

The Honeynet Project. 2004. Know Your Enemy: Learning About Security Threats, 2nd Ed. New York: Addison-Wesley.

Garg, Nandan, and Daniel Grosu. 2007. Deception in Honeynets: A Game-Theoretic Analysis. Proceedings of the 2007 IEEE Workshop on Information Assurance, United States Military Academy, West Point, NY.

Mitnick, Kevin D., and William L. Simon. 2002. The Art of Deception: Controlling the Human Element of Security. Indianapolis, IN: Wiley Publishing.

[pareidolia entry on Wikipedia, 2007] http://en.wikipedia.org/wiki/Pareidolia

Rowe, Neil C., Binh T. Duong, and E. John Custy. 2006. "Fake Honeypots: A Defensive Tactic for Cyberspace." Proceedings of the 7th IEEE Workshop on Information Assurance, U.S. Military Academy, West Point, NY.

Rowe, Neil C., Han C. Goh, Sze L. Lim, and Binh T. Duong. "Experiments with a Testbed for Automated Defensive Deception Planning for Cyber-Attacks."

[SCIgen] http://pdos.csail.mit.edu/scigen/

Searcy, William A., and Stephen Nowicki. 2005. The Evolution of Animal Communication: Reliability and Deception in Signaling Systems. Princeton, NJ: Princeton University Press.

Shannon, Claude E., and Warren Weaver. 1949. The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press.

[SNORT, 2007] http://www.snort.org

[Sokal] http://www.physics.nyu.edu/faculty/sokal/

Whaley, Barton. 2002. Conditions Making for Success and Failure of Denial and Deception: Authoritarian and Transitional Regimes. Printed as Chapter 3 of [Godson and Wirtz, 2002].

Whaley, Barton. 2007. Stratagem: Deception and Surprise in War. Boston: Artech House.