Student simulation and evaluation
DOD meeting
Hua Ai (hua@cs.pitt.edu)
03/03/2006
Outline

 §   Motivations
 §   Background
 §   Corpus
 §   Student Simulation Model
 §   Comparisons
 §   Conclusions & Future Work

Motivations

 § For a larger corpus
   § Reinforcement Learning (RL) is used to
     learn the best policy for spoken dialogue
     systems automatically
   § The best strategy is often not even present
     in a small dataset
 § For a cheaper corpus
   § Human subjects are expensive

[Figure: strategy learning using a simulated user (Schatzmann et al., 2005). Labeled components: Dialog Corpus, Simulation models, Simulated User, Dialog Manager, Reinforcement Learning, Strategy.]

Background (1)

 § Education community
   § Focuses on changes in the student's internal
     knowledge representation
   § Usually not dialogue based
   § Simulated students used for (VanLehn et al., 1994)
     § Tutor training
     § Collaborative learning

Background (2)

 § Dialogue community
   § Focuses on interactions and dialogue
     behaviors
   § Simulated users have a limited set of actions
   § (Schatzmann et al., 2005)
     § Simulation at the dialogue act (DA) level

Corpus (1)
 § Spoken dialogue physics tutor (ITSPOKE)




Corpus (2)

 § Tutoring procedure (5 problems)

[Figure: the tutoring procedure, shown per problem as a sequence of blocks: (T) Question, (S) Answer; Dialogue with (T) Q / (S) A turns; Essay revision; Dialogue.]

Corpus (3)

 § Tutor’s behaviors
   § Defined in KCD (Knowledge Construction
     Dialogues)
[Figure: KCD flow, with branches labeled Correct and Incorrect/Partially Correct.]

Corpus (4)

 Corpus                #dialogues  stat   stuWord    stuTurn    tutorWord  tutorTurn
 f03 (synthesized)     100         avg    57.16      23.35      1256.92    29.64
                                   stdev  45.57638   17.44334   849.8195   19.76351
 05syn (synthesized)   136         avg    91.0963    30.78519   1655.467   38.06667
                                   stdev  53.82931   14.42551   757.8744   16.32469
 05pre (pre-recorded)  135         avg    87.34559   30.11765   1597.206   37.33088
                                   stdev  55.48004   16.96972   832.9845   18.20096

 f03 vs. s05: different groups of subjects

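The per-corpus averages and standard deviations in the table could be computed along the following lines. This is only a sketch: the dialogue representation (a list of speaker/utterance pairs), the whitespace word count, and the use of the sample standard deviation are assumptions, since the slides do not say how the numbers were produced.

```python
from statistics import mean, stdev

# Assumed representation: a dialogue is a list of (speaker, utterance) pairs,
# where speaker is "S" (student) or "T" (tutor).

def dialogue_features(turns):
    """Count turns and whitespace-separated words per speaker for one dialogue."""
    feats = {"stuWord": 0, "stuTurn": 0, "tutorWord": 0, "tutorTurn": 0}
    for speaker, utterance in turns:
        prefix = "stu" if speaker == "S" else "tutor"
        feats[prefix + "Turn"] += 1
        feats[prefix + "Word"] += len(utterance.split())
    return feats

def corpus_summary(dialogues):
    """Average and sample standard deviation of each feature over a corpus.

    Needs at least two dialogues, because statistics.stdev requires two values.
    """
    per_dialogue = [dialogue_features(d) for d in dialogues]
    summary = {}
    for name in per_dialogue[0]:
        values = [f[name] for f in per_dialogue]
        summary[name] = {"avg": mean(values), "stdev": stdev(values)}
    return summary
```

If the original figures used the population standard deviation instead, `statistics.pstdev` would be the matching call.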
Simulation Models (1)

 § Simulating at the word level
   § Students have more complex behaviors
   § DA info alone isn't enough for the system
 § Two models trained on two corpora
   § Models: ProbCorrect, Random
   § Corpora: f03, s05
   § Resulting simulations: 03ProbCorrect, 03Random,
     05ProbCorrect, 05Random

Simulation Models (2)

 § ProbCorrect Model
   § Simulates the average knowledge level of real
     students
   § Simulates meaningful dialogue behaviors
 § Random Model
   § Produces nonsense answers
   § Serves as a contrast

ProbCorrect Model

 § Real corpus (c = correct, ic = incorrect)
   § question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
   § question2: Answer2_1 (c), Answer2_2 (ic)
 § Candidate answers
   § For question1, c:ic = 1:2
     § c: Answer1_1
     § ic: Answer1_2, Answer1_3
   § For question2, c:ic = 1:1
     § c: Answer2_1
     § ic: Answer2_2
 § ProbCorrect Model, e.g. for Question 1, the answer is generated by:
   1) Choosing to give a c/ic answer with the same average
      probability as real students
   2) Randomly choosing one answer from the corresponding
      answer set

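The two-step procedure on this slide can be sketched as follows. This is a minimal illustration, not the original implementation: the toy corpus mirrors the slide's example, and the function names, the dictionary layout, and the fallback for an empty answer set are assumptions.

```python
import random

# Toy corpus mirroring the slide: question -> observed (answer, is_correct) pairs.
REAL_CORPUS = {
    "question1": [("Answer1_1", True), ("Answer1_2", False), ("Answer1_3", False)],
    "question2": [("Answer2_1", True), ("Answer2_2", False)],
}

def train_prob_correct(corpus):
    """Per question: estimate P(correct) and split answers into c / ic sets."""
    model = {}
    for question, answers in corpus.items():
        correct = [a for a, ok in answers if ok]
        incorrect = [a for a, ok in answers if not ok]
        model[question] = {
            "p_correct": len(correct) / len(answers),
            "c": correct,
            "ic": incorrect,
        }
    return model

def simulate_answer(model, question, rng=random):
    """Generate one simulated student answer for the given question."""
    entry = model[question]
    # Step 1: decide between a correct and an incorrect answer, using the
    # same average probability as the real students.
    give_correct = rng.random() < entry["p_correct"]
    pool = entry["c"] if give_correct else entry["ic"]
    # Assumption: if the chosen set is empty, fall back to the other one.
    if not pool:
        pool = entry["c"] or entry["ic"]
    # Step 2: pick uniformly from the corresponding answer set.
    return rng.choice(pool)
```

For question1 the trained probability of a correct answer is 1/3, matching the c:ic = 1:2 ratio on the slide.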
Random Model

 § HC03&05
   § Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
   § Question2: Answer2_1, Answer2_2
 § Candidate answers
   1) Answer1_1
   2) Answer1_2
   3) Answer1_3
   4) Answer1_4
   5) Answer2_1
   6) Answer2_2
 § Big random Model, for Question i:
   § Answer: any of the 6 answers with the same probability
     (regardless of the question!)

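For contrast, the Random model pools every candidate answer and ignores the question entirely. A minimal sketch, with hypothetical names and the slide's six example answers:

```python
import random

# Toy corpus mirroring the slide: question -> observed answers.
HYPOTHETICAL_CORPUS = {
    "Question1": ["Answer1_1", "Answer1_2", "Answer1_3", "Answer1_4"],
    "Question2": ["Answer2_1", "Answer2_2"],
}

def build_candidate_pool(corpus):
    """Merge the answers to all questions into one flat candidate list."""
    pool = []
    for answers in corpus.values():
        pool.extend(answers)
    return pool

def random_answer(pool, question, rng=random):
    """Return any pooled answer with equal probability.

    The question argument is deliberately ignored: that is what makes
    this model a nonsense baseline.
    """
    return rng.choice(pool)
```

With this pool, an answer recorded for Question1 is just as likely to be returned for Question2 as one of Question2's own answers.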
Experiments

 § Comparisons between real corpora
 § Comparisons between real & simulated
   corpora
 § Comparisons between simulated corpora




Real corpora comparisons (1)
 § Evaluation metrics
   §   High-level dialog features
   §   Dialog style and cooperativeness
   §   Dialog Success Rate and Efficiency
   §   Learning Gains




Real corpora comparisons (2)
 § High-level dialog features

Real corpora comparisons (3)
 § Dialogue style features

Real corpora comparisons (4)
 § Dialogue success rate

Real corpora comparisons (5)
 § Learning gains features

Results

 § The differences captured by these simple
   metrics are not enough to conclude whether a
   corpus is real or not (Schatzmann et al.,
   2005)
 § The differences could be due to different user
   populations

Real vs. simulated corpora comparisons

Results (1)

 § Most of the measurements are able to
   distinguish between the Random and
   ProbCorrect models
 § The ProbCorrect model generates more
   realistic behaviors
 § We can't draw conclusions about the power of
   these metrics, since the two simulated corpora
   are very different

Results (2)

 § Differences between the real corpora and the
   Random model are captured clearly, but
   differences between the real corpora and the
   ProbCorrect model are not clear
 § We don't expect this simple model to produce a
   very realistic corpus. It's surprising that the
   differences are small

Results (3)

 § s05 variety > f03 variety =>
   05ProbCorrect variety > 03ProbCorrect
   variety
 § However, we don't get significantly more
   variety in the simulated corpora than in the
   real ones
   § Could be because the computer tutor is simple (c/ic)
   § We're using the same candidate answer set

Results (4)

 § ProbCorrect models trained on different
   real corpora are quite different
 § Each ProbCorrect model is more similar to
   the real corpus it was trained on than to
   the other real corpus

Comparisons between simulated dialogues with different dialogue structures

Results

 § Larger differences between the two
   simulated corpora in prob7 than in
   prob34
 § Dialogue structure of prob34 is more
   restricted
 § The power of these simple metrics is
   restricted by the dialogue structure

Conclusions

 § The simple measurements can
   distinguish between
   § real corpora
     § Different populations
   § simulated and real corpora
     § To different extents
   § simulated corpora
     § Different models
     § Trained on different corpora
     § Limited by the dialogue structure

Future work

 §   Explore “deep” evaluation metrics
 §   Test simulated corpus on policy
 §   More simulation models
     § More human features
        § Emotion, learning
     § Special cases
        § Quick learners, slow learners



				