A Bayesian Approach to Intention-Based Response Generation

European Journal of Scientific Research, ISSN 1450-216X, Vol. 32 No. 4 (2009), pp. 477-489
© EuroJournals Publishing, Inc. 2009
http://www.eurojournals.com/ejsr.htm
Aida Mustapha
Faculty of Computer Science and Information Technology, University Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
E-mail: aida@fsktm.upm.edu.my; Tel: +603-8946-6585; Fax: +603-8946-6577
Md. Nasir Sulaiman
Faculty of Computer Science and Information Technology, University Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
E-mail: nasir@fsktm.upm.edu.my; Tel: +603-8946-6585; Fax: +603-8946-6577
Ramlan Mahmod
Faculty of Computer Science and Information Technology, University Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
E-mail: ramlan@fsktm.upm.edu.my; Tel: +603-8946-6585; Fax: +603-8946-6577
Mohd. Hasan Selamat
Faculty of Computer Science and Information Technology, University Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia
E-mail: hasan@fsktm.upm.edu.my; Tel: +603-8946-6585; Fax: +603-8946-6577
Abstract
The statistical approach to natural language generation known as overgeneration-and-ranking suffers from expensive overgeneration. This article reports the findings of a response classification experiment in a new approach, intention-based classification-and-ranking. Possible responses are deliberately chosen from a dialogue corpus rather than wholly generated. The approach therefore allows short, ungrammatical utterances as long as they satisfy the intended meaning of the input utterance. We hypothesize that a response is relevant when it satisfies the intention of the preceding utterance. This approach therefore depends heavily on the intentions of the input utterance rather than on its syntactic characterization. The response classification experiment is tested on a mixed-initiative, transaction dialogue corpus in the theater domain.
This article reports a promising start of 73% accuracy in predicting response classes in a classification experiment using Bayesian networks.
Keywords: Bayesian Networks, Classification, Dialogue Systems, Natural Language Generation
1. Introduction
Research on text generation systems generates paragraph-length text based on rhetorical structure. By contrast, research on natural language generation in dialogue systems often avoids detailed linguistic realization, for two main reasons. First, dialogue utterances are typically short, single-sentence responses. Second, dialogue utterances are often incomplete and may even be grammatically incorrect, yet they are acceptable as long as they are coherent with the preceding utterance.
Because dialogues are intention-driven, the major concern in a dialogue system is the coherence of the response utterances. The measure of a coherent dialogue is the relevance of the response to its preceding utterance (Hulstijn, 2000). This means that a probabilistic response generation component in a dialogue system must be able to recognize a relevant response while maintaining parity in semantic content.
In 1986, Sperber and Wilson presented Relevance Theory, which connects relevance to cognitive processing load. When people understand an utterance, they often try to maximize relevance by choosing the context in which the utterance is most relevant. Context goes deeper than semantics, because semantics only encodes the information within the utterance. Reasoning about the context, however, is the very essence of pragmatics, because pragmatic information can only be made relevant by the act of uttering the utterance (Bach, 2002). In other words, the linguistic form of an utterance offers little help in distinguishing between what the speaker means and what the words mean.
The relationship between linguistic form and context has to be characterized from the perspective of pragmatics, namely intentionality. Based on this, Mustapha et al. (2008) argue that a response is relevant when it satisfies the intention of the preceding utterance. To model this effect, they propose an intention-based response generation module for dialogue systems. In this module, response realization is based on the intentions of the input utterance rather than on its syntactic and semantic form. The philosophy behind the architecture is that the generation system learns to manage its own response strategies from an in-domain corpus of dialogues. The basic assumption concerns every utterance pair in the dialogue corpus: the first part, from the user, creates expectations that constrain the possibilities for the second part, the system utterance.
The remainder of this article is organized as follows. Section 2 presents an overview of related work in generation. Section 3 presents the classification task and the details of the experiments, while Section 4 presents the results and discussion. Finally, Section 5 concludes with remarks and plans for future work.
2. Previous Research
The decision to choose one response utterance over another requires a considerable amount of domain knowledge; hence, a knowledge-based approach is necessary. Deep generation determines the content of an utterance, or what to say. Surface generation realizes the structure of the utterance, or how to say it. Deep generation requires a high degree of linguistic abstraction to produce the fine-grained input specifications that drive surface generators (McKeown, 1985; Elhadad, 1993; Lavoie and Rambow, 1997; Oberlander et al., 1998). Its primary drawback is therefore the classic knowledge engineering bottleneck.
General-purpose, domain-independent natural language generators such as KPML (Bateman, 1996), FUF/SURGE (Elhadad, 1993), and RealPro (Lavoie and Rambow, 1997) are difficult to adapt to a small, task-oriented dialogue system. They demand precise grammar and lexical choice, and a daunting amount of other linguistic detail is needed to specify their rich input features. A cheaper alternative to the grammar-based approach is to provide templates for trivial parts of responses, such as greetings or other standard expressions. Hybrids of grammar- and template-based generators are more flexible in determining variations of response utterances (Stent, 2002; Klarner and Ludwig, 2004). One significant advantage of templates is that they do not require lexical selection, part-of-speech tagging, sentence organization, or rhetorical relations during the generation process. Furthermore, templates need not be hand-coded, because they can be acquired automatically from a corpus through reinforcement learning (RL) (Walker, 2000) and probabilistic grammars (Zhong and Stent, 2005). However, templates are highly domain-specific, which means that they cannot be reused in another domain. This limitation of template-based generation has been the main motivation for the statistical, corpus-based approach to language generation in dialogue systems.
The statistical approach to natural language generation is based on an overgeneration-and-ranking architecture (Langkilde and Knight, 1998). In this architecture, the generation component produces a collection of sentence realizations, including partial ones, and a separate ranking component selects the most likely realization (Langkilde, 2000; Bangalore and Rambow, 2000; Oh and Rudnicky, 2002; Ratnaparkhi, 2002).
The main limitation of this approach is that it is computationally expensive, because the generation component has to overgenerate to produce the set of candidate sentences. Moreover, statistical language models such as n-grams are often biased towards shorter strings (Belz, 2005). The likelihood of a string of words is the joint probability of its words, a product of per-word probabilities that can only shrink as words are added. This bias is undesirable for generating dialogue utterances, where all candidates should be treated as equally good realizations regardless of length and, in fact, regardless of grammaticality. More importantly, each dialogue utterance bears an individual intention.
In modeling intentions and dialogue behaviors, the works closest to ours are on generating dialogue contributions (Stent, 2002) and on learning dialogue structure (Stent, 2002; Bangalore et al., 2006). Stent (2002) applies Conversation Acts Theory (Traum and Hinkelman, 1992) to generate dialogue contributions before passing them to a separate stage of surface realization. However, Stent defines the argumentation acts of Conversation Acts Theory in the form of adjacency pairs. We instead use phases of negotiation, i.e., information, proposal, and confirmation, as argumentation acts to model the dialogue structure. Furthermore, Bangalore, Di Fabbrizio, and Stent (2006) use dialogue acts and manually annotated, domain-specific subtask segmentations to learn the dialogue structure. We, on the other hand, exploit combinations of conversation acts to automatically extract the argumentation acts that form the structure.
In this article, we present a classification experiment under the intention-based classification-and-ranking architecture (Mustapha et al., 2009). Classification-based NLG has previously been explored for content planning (Reiter and Mellish, 1992) and syntactic realization (Marciniak and Strube, 2004).
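The length bias of n-gram ranking mentioned above can be illustrated with a minimal overgenerate-and-rank sketch. The bigram table, its probabilities, and the two candidate strings are invented for illustration. They come neither from SCHISMA nor from the cited systems. The ranking simply sorts candidates by log joint probability.

```python
import math

# Toy bigram model P(word | previous word); all values are invented for illustration.
BIGRAM_PROB = {
    ("<s>", "yes"): 0.5,
    ("yes", "</s>"): 0.6,
    ("yes", "there"): 0.3,
    ("there", "are"): 0.8,
    ("are", "seats"): 0.5,
    ("seats", "available"): 0.7,
    ("available", "</s>"): 0.9,
}

def log_prob(sentence: str) -> float:
    """Log joint probability of a word string: the sum of log bigram probabilities."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Unseen bigrams get a tiny floor probability instead of zero.
        total += math.log(BIGRAM_PROB.get((prev, word), 1e-6))
    return total

def rank(candidates: list[str]) -> list[str]:
    """Ranking component: order overgenerated candidates, most probable first."""
    return sorted(candidates, key=log_prob, reverse=True)

candidates = ["yes there are seats available", "yes"]
ranked = rank(candidates)
```

Every factor is at most 1, so each extra word can only lower the score. Here "yes" (joint probability 0.3) therefore outranks the longer, more informative candidate (about 0.038).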
Reiter and Mellish (1992) perform classification-based content planning, classifying the user input utterance into a class of content rules. However, while content planning is performed entirely by classification, only a small part of discourse planning, namely lexicalization, employs the classification-based approach. Similarly, the classification-based generation of Marciniak and Strube (2004) works at the surface level. It concerns linguistic features based on tree adjoining grammar (TAG) and the grammatical structure of sentences. Previous classification-based experiments therefore do not take a fully stochastic approach to response generation; they apply classification only at selected stages of language generation.
3. Experiments
Our experiments concern the classification of user input utterances into response classes, based on features extracted from those utterances. There are 15 response classes, as shown in Table 1.
Table 1: Statistics for Response Classes
No  Response class  Frequency  Percentage (%)
1   title           104        11.3
2   genre           28         3.0
3   artist          42         4.6
4   time            32         3.5
5   date            90         9.8
6   review          56         6.1
7   person          30         3.3
8   reserve         150        16.3
9   ticket          81         8.8
10  cost            53         5.8
11  avail           14         1.5
12  reduc           73         7.9
13  seat            94         10.2
14  theater         12         1.3
15  other           61         6.6
In the classification stage, as in any classification task, the main task is to determine which of a set of classes an observation belongs to. In our case, we want to identify the response class of each user utterance, maximizing P(response class | user utterance). The objective of our response classification task is to measure the accuracy of predicting the correct response class rc given the user utterance U. Each utterance is analyzed from the perspective of speech acts. It is fully characterized by its (1) communicative function and (2) semantic content in the form of an input frame (refer to Figure 3).
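The percentages in Table 1 are relative frequencies over the 920 user utterances. The majority baseline reported in Section 4 is simply the largest of them. The short sketch below recomputes both from the Table 1 counts. The dictionary and function names are illustrative only.

```python
# Response class frequencies transcribed from Table 1 (920 user utterances in total).
RESPONSE_CLASS_FREQ = {
    "title": 104, "genre": 28, "artist": 42, "time": 32, "date": 90,
    "review": 56, "person": 30, "reserve": 150, "ticket": 81, "cost": 53,
    "avail": 14, "reduc": 73, "seat": 94, "theater": 12, "other": 61,
}

def relative_frequencies(freq):
    """Percentage of utterances in each response class."""
    total = sum(freq.values())
    return {rc: 100.0 * n / total for rc, n in freq.items()}

def majority_baseline(freq):
    """Accuracy (%) of always predicting the most frequent response class."""
    total = sum(freq.values())
    best = max(freq, key=freq.get)
    return best, 100.0 * freq[best] / total

pct = relative_frequencies(RESPONSE_CLASS_FREQ)
cls, acc = majority_baseline(RESPONSE_CLASS_FREQ)
print(cls, round(acc, 1))  # reserve 16.3
```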
These observed features are utterance properties that together constitute the user utterance U during a particular turn of a conversation. We use $\hat{rc}$ to mean “our estimate of the correct response class”. By Bayes' rule, P(rc | U) = P(U | rc) P(rc) / P(U). Since P(U) is the same for every class, the best response class can be computed with equation (1), where R is the set of response classes:
$$\hat{rc} = \arg\max_{rc \in R} P(U \mid rc)\,P(rc) \qquad (1)$$
For each user utterance, we manually tag the response class according to the topic of the response utterance (as opposed to the topic of the user utterance). Tagging response classes in this way follows the patterns of input and response utterances in each turn throughout the conversation. These patterns are described by the rules of adjacency pairs (Schegloff and Sacks, 1973), which maintain coherence across a sequence of two utterances.
3.1. Dialogue Corpus
SCHISMA (SCHouwburg Informatie Systeem) is a theater information and ticket reservation system (Hoeven et al., 1995). The dialogue corpus is a collection of 64 text-based, human-machine dialogues obtained through a series of Wizard of Oz experiments. It contains 920 user utterances and 1,127 system utterances in total. SCHISMA is a mixed-initiative, transaction dialogue with two types of interaction: inquiry and transaction (Hulstijn and Hessen, 1998). During inquiry, the dialogue is user-initiative: users ask about details of the theater performances, such as dates, artists, reviews, or authors, and the system answers their questions. At some point, the user indicates that they would like to make a reservation. The system then shifts into transaction mode and takes the initiative. From this point onwards, the system asks the user a series of questions, such as the number of tickets to reserve and any discount cards. The user answers these questions to complete the reservation details required by the system.
In transaction dialogue, before reaching the point of reservation, user and system must collaborate to reach agreement on several issues, such as ticket price, seating arrangement, or discount availability. This model is more complex than question-answering systems because both parties may request information from each other at any point. The user in particular may retract previous decisions and take the conversation in a totally different direction. The dialogue excerpt in Figure 1 illustrates the complexities of mixed-initiative, transaction dialogues.
Figure 1: Illustrations of Mixed-initiative, Transaction Dialogue in SCHISMA
[1] USER: When does “Eugen Onegin” play?
[2] SYSTEM: “Eugen Onegin”, played by National Opera of Saints Petersburg, is to be seen on 27 January 1994.
[3] USER: Is there still a place available for?
[4] SYSTEM: Yes there are still 826 seats available.
[5] USER: What is the price for one?
[6] SYSTEM: Valid reductions are CJP, JTK, Normal and Senior Pass. Do you have a reduction card?
[7] USER: No
[8] SYSTEM: The price of show “Eugen Onegin” is f75.00.
[9] USER: What expensive!
[10] USER: Forget about that reservation.
[11] SYSTEM: So there are no tickets to be reserved.
[12] USER: What where the other operas again?
Modeling the dialogue structure and user behaviors such as turn-taking involves considerable uncertainty. The system must decide whether the current response should answer the user's utterance immediately or interrupt. When it interrupts, the system holds the answer and asks a question instead. In turn [6], for example, the price requested in [5] is given only after the system asks about reduction cards. The system must also recognize which utterance it is currently addressing when one turn contains multiple utterances. In our proposed intention-based architecture, this is solved by extracting pragmatic features, particularly the argumentation acts, as presented in the following section.
3.2. Feature Extraction
An intention-based response generation system may use any model of intentions. Examples include the Belief, Desire, Intention (BDI) model and domain-specific dialogue acts in various dialogue act schemes (Carletta et al., 1997; Alexandersson and Reithinger, 1997). Following Mustapha et al. (2008), the response classification experiment relies heavily on dialogue acts adopted from the Dialogue Act Markup in Several Layers (DAMSL) framework (Allen and Core, 1997). The DAMSL annotation scheme consists of five layers, each covering a different aspect of communicative function. This research concerns only two of them: the forward-looking functions (FLF) and the backward-looking functions (BLF). Both indicate the communicative functions of an utterance. FLF tags indicate the type of speech act that the utterance conveys, for example assert, info-request, and commit. BLF tags indicate how the utterance relates to the previous utterance. They include answers to questions (positive, negative, or no-feedback) and degrees of understanding or disagreement. Table 2 lists the dialogue acts in the SCHISMA corpus, represented as FLF and BLF, with their frequencies in user and system utterances.
Table 2: FLF and BLF for SCHISMA
FLF                    User  System    BLF                       User  System
conventional           29    31        signal_understanding      8     0
commit                 0     4         signal_non_understanding  3     20
offer                  0     66        positive_answer           162   399
action_directive       239   11        negative_answer           30    42
open_option            0     111       no_answer_feedback        3     63
query_if               71    38        correction_feedback       0     1
query_ref              433   165       accept, accept_part       70    54
assert                 123   694       reject, reject_part       15    8
exclamation            4     0         Hold                      39    161
explicit_performative  2     0         Maybe                     1     0
other_ff               19    7         no_blf                    589   379
Total                  920   1127      Total                     920   1127
The SCHISMA corpus was tagged with the DAMSL annotation scheme by Keizer and op den Akker (2006).
Apart from the dialogue acts (intentions) provided by a DAMSL-annotated corpus such as SCHISMA, we defined two additional sets of features derived from user input utterances: semantic features and pragmatic features.
3.2.1. Semantic Features
Recall that our response classification is judged by the relevance of the response utterance to the user utterance. According to Carberry (1990), relevance is related to the topic of the conversation. An utterance is considered relevant only when it contributes to a contextual plan for achieving the current goal, e.g., making a reservation. Table 3 lists the semantic features extracted from both user utterances and response utterances in the dialogue corpus: context, topic, and focus. The features from each participant, however, are used at different stages of our intention-based approach to response generation. The semantics of the user utterance are used to classify response classes, while the semantics of the response utterance are used as attributes for ranking the responses.
Table 3: Semantic Features for SCHISMA
ID  Context      Topic/Focus  Topic: User  Topic: System  Focus: User  Focus: System
0   performance  title        103          124            200          219
1   reservation  genre        29           48             69           36
2   other        artist       42           59             57           32
3                time         32           18             11           22
4                date         91           64             58           84
5                review       56           50             18           8
6                person       30           26             12           9
7                reserve      150          74             91           80
8                ticket       80           55             75           84
9                cost         53           131            26           16
10               avail        14           12             11           28
11               reduc        73           87             116          68
12               seat         94           89             101          86
13               theater      12           24             12           20
14               other        61           59             63           128
    Total                     920          920            920          920
As Table 3 shows, Context is the global topic of the user utterance (i.e., performance or reservation), while Topic is local to the particular utterance. For each user utterance, the global topic is inferred from the subject of the utterance. The context changes from performance to reservation if the user explicitly requests a reservation. The extraction of utterance topics was based on the theory of Information Structure (Halliday, 1967), which treats the topic as the first element of a sentence. Topics are assessed according to the mood of the utterance, i.e., assertive, imperative, or interrogative. As a rule of thumb, the topic is the subject and the focus is the object of the utterance. Utterances that fall into none of these moods are tagged with the value Other. These include opening and closing utterances, as well as topics not covered by the domain, such as payment or the opening hours of the theater.
3.2.2. Pragmatic Features
Pragmatics is concerned with analyzing speaker meaning at the level of utterances. Because utterances are situated in a context, we need pragmatic features to model the user utterance before we can assign it its intended response. Our pragmatic features are based on Conversation Acts Theory (Traum and Hinkelman, 1992). According to this theory, four levels of action are required to maintain the content and coherence of a conversation. The first are core speech acts, which coordinate the local flow of changes in belief, intention, and obligation. The second are turn-taking acts, which coordinate which party is in immediate control of the speaking channel and should therefore have the other participant's attention. The third are grounding acts, which coordinate the state of mutual understanding about what has been contributed. The fourth are argumentation acts, which coordinate the higher discourse purpose the participants have for engaging in the conversation. Table 4 shows the pragmatic features extracted from both user utterances and response utterances in the dialogue corpus.
Table 4: Pragmatic Features for SCHISMA
ID  Mood      Freq  Control  Freq  Role       Freq  Turn-taking  Freq  Negotiation  Freq
0   assert    123   user     753   initiator  753   keep         84    open         1
1   question  504   system   167   responder  167   release      556   inform       340
2   command   239                                   take         280   propose      451
3   prompt    54                                                       confirm      94
4   -                                                                  close        34
    Total     920   Total    920   Total      920   Total        920   Total        920
Because SCHISMA is already annotated with dialogue acts, we use FLF to represent the speech acts and BLF to represent the grounding acts. Modeling turns, however, requires more information than intentions alone. The key step in modeling turn-taking is to identify the participant in control of a particular utterance. Only then can we predict whether the next turn is assigned to, or seized by, the other participant. Turn-taking acts are derived using the initiative tagging method of Walker and Whittaker (1990). They modeled mixed-initiative dialogue with an utterance type classification and a set of rules for transferring control between the participants. The utterance type classification is based on intentions; hence Mood can be seen as an abstraction of the FLF. The other component of the turn-taking model is a set of control rules, which also depend on the utterance types. Control denotes the participant who holds the initiative in a given turn, and it helps identify the turn-taking act of each utterance. For example, if the input utterance is a question and the user is currently in control, the user releases the turn by selecting the system to answer the question. Similarly, if the input utterance is an answer, the user is taking the turn, because the system has just released it to the user. Finally, Role indicates whether the speaker acts as initiator or responder.
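The two control rules stated above can be sketched as a small function that derives the Turn feature. This is a hypothetical rule set, not the authors' extraction code. The function name and the set of answer tags (taken from the BLF labels in Table 2) are our own choices. The fallback value "keep" for all other utterances is an assumption the text does not state.

```python
from typing import Optional

# Answer-type BLF tags from Table 2 (assumed to mark "the input utterance is an answer").
ANSWER_BLF = {"positive_answer", "negative_answer"}

def turn_taking_act(mood: str, control: str, blf: Optional[str] = None) -> str:
    """Derive the Turn feature of a user utterance (hypothetical rule set)."""
    if mood == "question" and control == "user":
        # The user holds control and asks a question, releasing the turn to the system.
        return "release"
    if blf in ANSWER_BLF:
        # The user answers a system question: the system released the turn, the user takes it.
        return "take"
    # Assumption: any other utterance keeps the turn with the current speaker.
    return "keep"
```

Under these rules, turn [5] in Figure 1 ("What is the price for one?", asked while the user holds the initiative) would be tagged release. A negative answer such as turn [7] ("No") would be tagged take.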
Speech acts, grounding acts, and turn-taking acts represent general behavior found in any dialogue corpus. The fourth type, argumentation acts, depends on the dialogue model the system imposes, i.e., information-seeking, question-answering, or collaborative planning. Because SCHISMA is a negotiation dialogue, its argumentation acts are also called negotiation acts.
3.3. Bayesian Networks
Bayesian networks are graphical models for reasoning under uncertainty. Their nodes represent variables (discrete or continuous), and their arcs represent direct connections between them, which may be causal or otherwise (Jensen, 2001). A Bayesian network is composed of qualitative and quantitative components. The qualitative component is the network structure. The quantitative component is the set of network parameters, namely the conditional probability distributions (CPDs) of the nodes in the network. Table 5 shows the features used as nodes in the Bayesian networks; the nodes are semantic and pragmatic features of user utterances.
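For reference, the standard factorization behind the quantitative component is shown below. The joint distribution over nodes X_1, ..., X_n is the product of each node's CPD given its parents Pa(X_i). Treat the response class as one more node and let the remaining nodes form the user utterance U. Maximizing the joint over rc then gives the same answer as equation (1). This is a generic identity, not a description of the specific network structures learned in these experiments.

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$
$$\hat{rc} = \arg\max_{rc \in R} P(rc, U) = \arg\max_{rc \in R} P(U \mid rc)\,P(rc)$$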
Table 5: Features used as Nodes in Bayesian Networks
Node name    Type    Description                                      Values
Context      Scalar  Global topic of user utterance                   {performance, reservation}
Topic        Scalar  Topic of conversation in user utterance          {title, genre, artist, time, date, review, person, reserve, ticket, cost, avail, reduc, seat, theater, other}
Action       Scalar  Classification of user utterance based on purpose, i.e., declarative, interrogative, or imperative   {assert, question, command, other}
Control      Scalar  Control holder at the point of user utterance    {client, system}
Role         Scalar  Role of the user                                 {initiator, responder}
Turn         Scalar  Turn-taking act for user utterance               {release, take, keep}
Negotiation  Scalar  Negotiation act for user utterance               {open, inform, propose, confirm, close}
FLF          Scalar  Speech act for user utterance                    Refer to Table 2
BLF          Scalar  Grounding act for user utterance                 Refer to Table 2
The goal of our Bayesian classification is to find the response class that responds most coherently to the user input utterance, as set forth in equation (1). The Bayesian networks were modeled using the Probabilistic Networks Library (Intel, 2004). Structure learning was carried out using hill-climbing, while the conditional probability distributions were estimated using maximum likelihood estimation (MLE). MLE finds the parameters θ that maximize the likelihood of the training data. Finally, the BN classifier computes the joint probability distribution (JPD) over all nodes by multiplying all CPDs in the network. Figure 2 illustrates an example Bayesian network for our response classification experiment.
Figure 2: A Bayesian Network for Response Classification Experiment (nodes: Context, Topic, FLF, BLF, Action, Negotiation, Control, Turn, Role, and Response Class, arranged in a pragmatic level and a semantic level)
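Equation (1), MLE parameter estimation, and the 10-fold cross validation used in the experiments can be sketched together in a few lines. For brevity, this sketch assumes a naive Bayes structure, in which every feature node depends only on the response class. The networks in this study were instead learned by hill-climbing with the Probabilistic Networks Library, so their structure may differ. Feature names follow Table 5; the toy examples and all function names are invented for illustration.

```python
import math
import random
from collections import Counter, defaultdict

def train(examples):
    """MLE as raw counts: class counts for P(rc) and, per class,
    feature-value counts for P(feature = value | rc)."""
    class_counts = Counter(rc for _, rc in examples)
    value_counts = defaultdict(Counter)  # (rc, feature name) -> Counter of values
    for feats, rc in examples:
        for name, value in feats.items():
            value_counts[(rc, name)][value] += 1
    return class_counts, value_counts

def predict(model, feats):
    """Equation (1) under the naive Bayes assumption:
    argmax over rc of log P(rc) + sum_i log P(f_i | rc)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_rc, best_score = None, -math.inf
    for rc, n_rc in class_counts.items():
        score = math.log(n_rc / total)
        for name, value in feats.items():
            count = value_counts[(rc, name)][value]
            if count == 0:  # unsmoothed MLE assigns probability 0
                score = -math.inf
                break
            score += math.log(count / n_rc)
        if best_rc is None or score > best_score:
            best_rc, best_score = rc, score
    return best_rc

def cross_validate(examples, k=10, seed=0):
    """k-fold cross validation: each fold is used once for testing while the
    other k-1 folds are used for training. Returns the per-fold accuracies."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    accuracies = []
    for i, test in enumerate(folds):
        train_set = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(train_set)
        correct = sum(predict(model, feats) == rc for feats, rc in test)
        accuracies.append(correct / len(test))
    return accuracies

# Invented toy data: feature dicts keyed by Table 5 node names -> response class.
toy = [
    ({"Context": "performance", "Topic": "title", "FLF": "query_ref"}, "date"),
    ({"Context": "reservation", "Topic": "ticket", "FLF": "assert"}, "reserve"),
] * 10
```

With unsmoothed MLE, any feature value never seen with a class rules that class out entirely. Smoothing (for example add-one) is a common remedy, but it goes beyond what the text specifies. The sketch also assumes at least k examples, so that no fold is empty. On the trivially separable toy data, every fold scores 1.0.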
To validate the impact of intentions on response classification, we divided the experiments into three cases. The cases differ in the feature combinations that serve as random variables in the Bayesian networks. The first case investigates dialogue act features. The second extends them with conversation act features, and the third adds high-level discourse act features. For each case, we performed 10-fold cross validation to split the data into training and testing sets. The data is split into ten approximately equal partitions; each partition is used in turn for testing while the remainder is used for training.
4. Results and Discussion
The baseline for the experiment is the majority baseline, i.e., the relative frequency of the most frequent response class. That class is Reserve (150 of 920 user utterances), which yields 16.3%. Table 6 shows the results of our response classification experiments using different sets of features.
Table 6: Comparison of Results for Different Sets of Features
Case  Semantic Features  Pragmatic Features                   Minimum Accuracy (%)  Maximum Accuracy (%)
1     None               FLF, BLF                             10.7                  13.4
2     None               Case 1, Action, Control, Turn, Role  24.6                  28.5
3     Context, Topic     Case 2, Negotiation                  72.5                  73.9
Case 1 – Dialogue Act Features
This case investigates the impact of intentions in the form of dialogue acts, namely forward-looking functions (FLF) and backward-looking functions (BLF). Both are readily available from the DAMSL-annotated SCHISMA corpus. To study the impact of dialogue acts alone in characterizing user utterances, we excluded the semantic features from response classification. As shown in Table 6, case 1 consists only of the speech acts (FLF) and grounding acts (BLF) of user utterances.
However, the recognition accuracy of 10.7% falls even below the baseline accuracy of 16.3%. This indicates that speech acts and grounding acts alone do not capture enough of the behavior of user utterances, hence the low response classification accuracy.
Case 2 – Conversation Act Features
This case investigates the impact of intentions in an utterance when interpreted within a conversational framework, based on Conversation Acts Theory (Traum and Hinkelman, 1992). The underlying idea of Conversation Acts Theory is that certain types of behavior are subsumed by other behaviors. This premise is powerful for dialogue modeling because it captures the interaction of intentions at all levels of communication. Recall that Traum and Hinkelman (1992) distinguish four levels of action needed to maintain the coherence and content of a conversation. In case 2, we began with the general conversational behaviors found in any dialogue corpus: speech acts, grounding acts, and turn-taking acts. Case 2 thus adds turn-taking acts, together with the modeling of control and initiative. Argumentation acts are excluded in case 2 because extracting negotiation acts requires semantic features. Modeling general conversational behaviors in user utterances generally requires no semantic information. Semantic features are therefore excluded from the random variables in the Bayesian networks in case 2. As Table 6 shows, however, accuracy increases only modestly, from a maximum of 13.4% in case 1 to a maximum of 28.5% in case 2.
Case 3 – Discourse Act Features
Because accuracy did not increase substantially despite the pragmatic features from Conversation Acts Theory, we went on to investigate argumentation acts.
Argumentation acts also serve as the benchmark for testing how adequately Conversation Acts Theory characterizes user utterances in dialogue systems. Argumentation acts shape higher-level coherence according to the discourse objective or the dialogue model. They are therefore highly specific to the type of dialogue corpus (e.g., task-oriented or question-answering). Argumentation acts are defined as sequences of speech acts constrained by timing and semantic content (Traum, 1999). The first constraint, timing, refers to the negotiation phases within transaction dialogues, for example Opening, Information, Proposal, Confirmation, and Closing. The timing of negotiation phases, in turn, can only be determined by semantic interpretation of the input utterances. This is because the same intention (e.g., Query) can occur in more than one negotiation phase (e.g., Proposal and Confirmation). This led us to add semantic features to the extended pragmatic features in case 3.
Returning to Table 6, case 3 shows a substantial increase in response class recognition accuracy, from a minimum of 24.6% in case 2 to a minimum of 72.5% in case 3. We attribute this to the argumentation acts, which complete the pragmatic features, together with the added semantic features. Figure 3 compares the recognition accuracies across the experiment cases.
Figure 3: Recognition Accuracy for Response Classification (accuracy in % per case; baseline 16.3 for every case; minimum/maximum accuracy 10.7/13.4 for case 1, 24.6/28.5 for case 2, and 72.5/73.9 for case 3)
In conclusion, the response classification experiments were deliberately designed to show how important intentions (speech acts), and the conversation acts derived from them, are in recognizing response utterances. The results indicate that pragmatic information, in particular intentions, contributes strongly to the performance of the classification task.
We believe that the remaining 26% loss in response class recognition accuracy comes from unforeseen imperfections in the automatic extraction of semantic and pragmatic features. We do not discuss our feature extraction process due to space limitations, but its success rate depends heavily on the nature of the user utterances. In SCHISMA, many user utterances are only partial phrases, and some are gibberish or out of domain (e.g., asking about the theater's opening hours). Such utterances were tagged Other when the system could not make out their significance. Partial utterances like these may be sufficient for human interpretation in dialogue, but our feature extraction process generalized their tags to Other. Also, when context information is assigned, a single utterance may bear a different contextual meaning if it occurs in a different negotiation phase. This may be improved by refining the values of the Topic feature, e.g., distinguishing Cost (as in the information phase) from Total_cost (as in the reservation phase).
For limited, task-oriented dialogue systems, we argue that a generation engine can learn its responses automatically from a corpus based on the user's intention. One may object that a ticket reservation system cannot simply predict a response, since tickets are available at some times and not at others. This objection concerns the role of the knowledge base in such reservation systems. However, a knowledge base is always part of a dialogue system, just as the response generation component is. Our stochastic approach to generation still interacts with the knowledge base through an abstraction of the response utterances in the response database. In each response utterance, we substitute keywords such as ‘Eugen Onegin’ with Title and ‘f75.00’ with Cost.
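The keyword substitution just described can be sketched as a pair of functions. One replaces known domain values in a response utterance with attribute placeholders. The other fills the placeholders back in from a knowledge-base lookup at generation time. The angle-bracket placeholder syntax, the value table, and both function names are illustrative assumptions, not the authors' implementation.

```python
import re

# Hypothetical mapping from surface keywords to domain attributes (cf. Title, Cost).
DOMAIN_VALUES = {
    "Eugen Onegin": "Title",
    "f75.00": "Cost",
}

def delexicalize(utterance, values=DOMAIN_VALUES):
    """Replace known domain values with attribute placeholders such as <Title>."""
    for surface, attribute in values.items():
        utterance = utterance.replace(surface, f"<{attribute}>")
    return utterance

def relexicalize(template, slots):
    """Fill each <Attribute> placeholder from a (hypothetical) knowledge-base result."""
    return re.sub(r"<(\w+)>", lambda m: slots[m.group(1)], template)

stored = delexicalize('The price of show "Eugen Onegin" is f75.00.')
# stored == 'The price of show "<Title>" is <Cost>.'
answer = relexicalize(stored, {"Title": "Eugen Onegin", "Cost": "f75.00"})
```

Keeping only the delexicalized form in the response database is what lets ticket availability and prices vary. The current values come from the knowledge base when the response is produced.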
Although they follow the same naming convention as the values of the Topic feature, we refer to them as domain attributes in response utterances. The use of domain attributes is important to ensure that the approach is viable for cross-domain application. The experiments for the classification of response classes will serve as a benchmark for future intention-based response generation.
5. Summary and Concluding Remarks
In this article, we have presented findings on the response classification experiment under the classification-and-ranking architecture (Mustapha et al., 2008). Similar to Marciniak and Strube (2004), the approach (as opposed to the overgenerate-and-rank technique) avoids overgeneration altogether. However, as in the overgenerate-and-rank technique, once the response class has been identified, the response generation system has to rank the response utterances within that class to choose the one returned to the user. The classification-and-ranking architecture is referred to as intention-based because the dialogue behaviors and dialogue structures are learned from conversation acts, which elaborate speech acts further into argumentation acts. Future work will continue with the ranking experiment under the Maximum Entropy framework, for the reasons put forward in Section 4.3. To improve classification accuracy, we intend to refine the feature extraction process and to explore other probabilistic or machine learning techniques. In particular, we are interested in whether previous user utterances can improve accuracy on the classification task.
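The two stages of classification-and-ranking can be sketched as a single selection function, assuming a response database keyed by response class. The classifier and the scoring function are passed in as plain callables; `toy_classifier` and `toy_score` are hypothetical stand-ins for the trained classifier and for the planned Maximum Entropy ranker, and the class names and utterances in `RESPONSE_DB` are invented for illustration.

```python
from typing import Callable, Dict, List

Features = Dict[str, str]


def classify_and_rank(
    features: Features,
    response_db: Dict[str, List[str]],
    classify: Callable[[Features], str],
    score: Callable[[Features, str], float],
) -> str:
    """Pick a response class, then return the highest-scoring utterance in that class.

    Only the utterances stored under the predicted class are scored; nothing is
    generated or scored for the other classes.
    """
    response_class = classify(features)
    candidates = response_db[response_class]
    # max() returns the first candidate among equal scores (database order).
    return max(candidates, key=lambda u: score(features, u))


# Hypothetical response database keyed by response class.
RESPONSE_DB = {
    "Confirm_Reservation": [
        "Shall I reserve the tickets for you?",
        "So that is <Cost> in total, shall I reserve?",
    ],
    "Inform_Title": ["Tonight we are showing <Title>."],
}


def toy_classifier(features: Features) -> str:
    # Hypothetical stand-in for the trained classifier: routes on phase only.
    return "Confirm_Reservation" if features["phase"] == "Confirmation" else "Inform_Title"


def toy_score(features: Features, utterance: str) -> float:
    # Hypothetical stand-in for the planned Maximum Entropy ranker: prefer
    # utterances that carry the user's topic as a domain attribute.
    return 1.0 if f"<{features['topic']}>" in utterance else 0.0


if __name__ == "__main__":
    user = {"speech_act": "Query", "phase": "Confirmation", "topic": "Cost"}
    print(classify_and_rank(user, RESPONSE_DB, toy_classifier, toy_score))
    # So that is <Cost> in total, shall I reserve?
```

Passing the ranker in as a function keeps the classification stage fixed while the ranking model is swapped out, which mirrors the plan to replace the ranking step without retraining the classifier.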
Since previous utterances can be represented as time-series information, the experiments on the SCHISMA corpus will be repeated under the framework of dynamic Bayesian networks.
References
[1] Alexandersson, J. and Reithinger, N. 1997. Learning dialogue structures from a corpus. In Proceedings of EuroSpeech ’97, Rhodes.
[2] Allen, J. and Core, M. 1997. Draft of DAMSL: Dialog act markup in several layers. Technical Report, Discourse Research Initiative, Schloss Dagstuhl.
[3] Austin, J. 1962. How to do Things with Words. Oxford: Clarendon.
[4] Bach, K. 2002. Semantic, Pragmatic. In J. Keim Campbell, M. O’Rourke, and D. Shier, editors, Meaning and Truth. Seven Bridges Press, New York, pages 284–292.
[5] Bangalore, S. and Rambow, O. 2000. Exploiting a probabilistic hierarchical model for generation. In Proceedings of the 18th International Conference on Computational Linguistics, Saarbrucken, Germany.
[6] Bangalore, S., Di Fabbrizio, G., and Stent, A. 2006. Learning the structure of task-driven human-human dialogues. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 201–208. Sydney, Australia.
[7] Bateman, J. 1996. KPML development environment. Technical Report, IPSI, GMD, Darmstadt, Germany.
[8] Belz, A. 2005. Statistical generation: Three methods compared and evaluated. In Proceedings of the 10th European Workshop on Natural Language Generation, ENLG 05.
[9] Bunt, H. 2000. Dialogue pragmatics and context specification. In H. Bunt and W. Black, editors, Abduction, Belief and Context in Dialogue. Studies in Computational Pragmatics, 1:81–150. Amsterdam, Benjamins.
[10] Carberry, S. 1990. Plan Recognition in Natural Language Dialogue. Cambridge, Mass., MIT Press.
[11] Carletta, J., Isard, A., Isard, S., Kowtko, J., Doherty-Sneddon, G., and Anderson, A. 1997. The reliability of a dialogue structure coding scheme. Computational Linguistics, 23(1):13–31.
[12] Elhadad, M. 1993. Using Argumentation to Control Lexical Choice: A Functional Unification-based approach. Ph.D. thesis, Department of Computer Science, Columbia University, New York.
[13] Halliday, M. 1967. Notes on transitivity and theme in English. Part II. Journal of Linguistics, 3:199–244.
[14] Hoeven, v.d. G., Andernach, A., Burgt, v.d. S., Kruijff, G-J., Nijholt, A., Schaake, J., and Jong, F. 1995. SCHISMA: A natural language accessible theatre information and booking system. In Proceedings of the 1st International Workshop on Applications of Natural Language to Data Bases, pages 271–285. Versailles, France.
[15] Hulstijn, J. 2000. Dialogue Models for Inquiry and Transaction. Ph.D. thesis, University of Twente, Netherlands.
[16] Hulstijn, J. and van Hessen, A. 1998. Utterance generation for transaction dialogues. In Proceedings of International Conference of Spoken Language Processing, Sydney.
[17] Intel. 2004. Probabilistic Network Library: User Guide and Reference Manual. Intel Corporation. Implementation available at http://www.intel.com/technology/computing/pnl.
[18] Jensen, F. V. 2001. Bayesian Networks and Decision Graphs. Springer, New York.
[19] Keizer, S. and op den Akker, R. 2006. Dialogue act recognition under uncertainty using Bayesian networks. Natural Language Engineering, pages 1–30.
[20] Klarner, M. and Ludwig, B. 2004. Hybrid natural language generation in a spoken language dialogue system. In S. Biundo, T. Fruhwirth, and G. Palm, editors, KI 2004, LNAI 3238, pages 97–111. Springer-Verlag, Berlin Heidelberg.
[21] Kowtko, J., Isard, S., and Doherty, G. 1992. Conversational games within dialogue. HCRC Technical Report RP-31, University of Edinburgh.
[22] Langkilde, I. 2000. Forest-based statistical sentence generation. In Proceedings of the North American Meeting of the Association for Computational Linguistics, COLING-00.
[23] Langkilde, I. and Knight, K. 1998. Generation that exploits corpus-based statistical knowledge. In Proceedings of the 36th Annual Meeting on Association for Computational Linguistics, Montreal, Quebec, Canada.
[24] Lavoie, B. and Rambow, O. 1997. A fast and portable realizer for text generation systems. In Proceedings of the Fifth Conference on Applied Natural Language Processing, ANLP-97, Washington, DC.
[25] Marciniak, T. and Strube, M. 2004. Classification-based generation using TAG. In Proceedings of International Conference of Natural Language Generation, pages 100–109, Brockenhurst, UK.
[26] McKeown, K. 1985. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text. Studies in Natural Language Processing. Cambridge University Press.
[27] Oberlander, J., O’Donnell, M., Knott, A., and Mellish, C. 1998. Conversation in the museum: Experiments in dynamic hypermedia with the intelligent labeling explorer. New Review of Hypermedia and Multimedia, 4:11–32.
[28] Oh, A. and Rudnicky, A. 2002. Stochastic natural language generation for spoken dialogue systems. Computer Speech & Language, pages 387–407.
[29] Ratnaparkhi, A. 2002. Trainable approaches to surface natural language generation and their application to conversational dialog systems. Computer Speech & Language, pages 435–455.
[30] Reiter, E. and Mellish, C. 1992. Using classification to generate text. In Proceedings of the 9th COLING.
[31] Schegloff, E. and Sacks, H. 1973. Opening up closings. Semiotica, 7(4):289–327.
[32] Searle, J. 1969. Speech Acts. Cambridge University Press.
[33] Sperber, D. and Wilson, D. 1986. Relevance: Communication and Cognition. Oxford: Blackwell.
[34] Stent, A. 2002. A conversation acts model for generating spoken dialogue contributions. Computer Speech & Language, 16(3–4):313–352.
[35] Traum, D. 1999. Speech acts for dialogue agents. In M. Wooldridge and A. Rao, editors, Foundations of Rational Agency. Kluwer, pages 169–201.
[36] Traum, D. and Hinkelman, E. 1992. Conversation acts in task-oriented spoken dialogue. Computational Intelligence, 8(3):575–599.
[37] Walker, M. and Whittaker, S. 1990. Mixed-initiative in dialogue: An investigation into discourse segmentation. In Proceedings of the 28th Annual Meeting on Association for Computational Linguistics, Pittsburgh, Pennsylvania, pages 70–78.
[38] Walker, M. 2000. An application of reinforcement learning to dialogue strategy selection in a spoken dialogue system for email. Journal of Artificial Intelligence Research, 12:387–416.
[39] Zhong, H. and Stent, A. 2005. Building surface realizers automatically from corpora. In Proceedings of the Corpus Linguistics-05, Workshop on Using Corpora for Natural Language Generation.
