An attempt at capturing Tutoring Style in Tutorial Dialog

                                                 Rohit Kumar
                            Language Technologies Institute, Carnegie Mellon University

Introduction & Motivation:
The Intelligent Tutoring Systems community has accepted the assumption that tutoring is the most effective
method of instruction. It is known that expert human tutors are better than less skilled tutors, and expert
human tutors set the benchmark for building Intelligent Tutoring Systems. To develop Intelligent Tutoring
Systems comparable to expert tutors, we need to understand what makes some human tutors better than
others.

Our group has conducted human tutoring studies using tutorial dialog as the mode of instruction in the
thermodynamic cycle domain. We have observed that different human tutors produce different learning
effects. We want to understand what makes these tutors different from each other. This difference may
hold the key to understanding what makes human tutoring an effective mode of instruction.

In this work, I attribute the differences we have observed between tutors to their individual tutoring
styles, which appear as a preference for a particular choice in the conversational dialog.

I want to study what characterizes the different tutoring styles of individual human tutors and whether these
tutors follow their tutoring style consistently. I also want to know whether following the characteristic style
of an effective human tutor leads to a consistent learning effect. A study of these characteristic styles will
point us toward effective, finer-grained strategies to implement in Tutorial Dialog Systems.

Before we proceed, let's look at some of the human tutoring dialog we have collected for three different
human tutors to see if we can find a characteristic of each tutor in their dialog.

The following are observations of two of the human tutors, V and M, conversing with students about
“Material Constraints” in the design of Thermodynamic Cycles.

Tutor: V
           <agent="tutor">but you cant inc it forever as it is constrained my material properties</sentence>
           <agent="tutor">so change the temp now to inc eff</sentence>
           <agent="tutor">did it inc?</sentence>
           <agent="tutor">go to the max tem p you can</sentence>
           <agent="tutor">no no</sentence>

Tutor: M
           <agent="tutor">you can change the value of tmax</sentence>
           <agent="tutor">to the maximum possible value</sentence>
           <agent="tutor">for stainless steel</sentence>
           <agent="tutor">which is 570 c</sentence>

           <agent="tutor">there is another constraint</sentence>
           <agent="student">and how do i know the high temperature</sentence>
           <agent="student">they didnt give me any tables</sentence>
           <agent="tutor">that is limited by how much temperature the boiler materials can withstand</sentence>
           <agent="tutor">on page 1</sentence>
           <agent="tutor">it is specified</sentence>
           <agent="tutor">that maximum t and p in the cycle are</sentence>
           <agent="tutor">570 c</sentence>
           <agent="tutor">and 20000 kpa respectively</sentence>
           <agent="tutor">did you see it?</sentence>

The tutor utterances above show the difference in how tutors V and M prefer to handle similar
situations. While V states in a negative way that the temperature cannot be changed beyond a maximum
level, tutor M says it positively. Also, tutor V is more commanding in conversation, whereas
tutor M is more collaborative.

Let us look at another set of examples of the difference in preference between three tutors (V, M and E). The
tutors are discussing the effect of temperature on the efficiency of a Thermodynamic Cycle.

Tutor: V
           <agent="tutor">note the initial thermal eff then try to improve it</sentence>
           <agent="tutor">ya plot eff vs tmax</sentence>
           <agent="tutor">tools and sesitivity</sentence>
           <agent="tutor">select the state</sentence>
           <agent="student">which one</sentence>
           <agent="tutor">temp is maximum after the boiler right</sentence>
           <agent="tutor">plot is from 300 to 600</sentence>
           <agent="tutor">see your instructions</sentence>
           <agent="tutor">do you understand what this means</sentence>
           <agent="tutor">as the temp inc eff inc</sentence>

Notable stylistic characteristics in Tutor: V’s utterances
    o Authoritative
    o Didactic Tutoring (Instructive)
    o Strong and Dominant
    o Often Negative

Tutor: M
           <agent="tutor">ok now lets g to tools sensitivity</sentence>
           <agent="tutor">refer page 5</sentence>
           <agent="tutor">lets see what happens to efficiency when we change the heater temperature</sentence>
           <agent="tutor">do analysis 1</sentence>
           <agent="tutor">the independent parametrs are filled in already</sentence>
           <agent="tutor">you just need ti fill in dependent parametrs</sentence>
           <agent="tutor">i mean you need to fill in independent parametrs</sentence>
           <agent="tutor">select parameter to vary</sentence>
           <agent="tutor">so what does this plot tell you?</sentence>
           <agent="student">the higher the temp the higher the eff</sentence>
           <agent="tutor">which is what we thought</sentence>
           <agent="tutor">higher the steam temperature higher the efficiency</sentence>

Notable stylistic characteristics in Tutor: M’s utterances
    o Similar to Tutor V in some ways
    o More collaborative than Authoritative
    o Wordier
    o More and continuous feedback
    o Rarely Negative

Tutor: E
           <agent="tutor">what max t did you use in your sensitivity analysis?</sentence>
           <agent="student">not tmax?</sentence>
         <agent="student">it says the temp leaving heater 2 =tmax from optimized simple cycle</sentence>
         <agent="tutor">ok there's a reason 4 that</sentence>
         <agent="tutor">we'd see that it needs to be changed</sentence>
         <agent="student">so then it's not tmax?</sentence>
         <agent="tutor">please go on</sentence>
         <agent="student">so i leave it as 575?</sentence>

Notable stylistic characteristics in Tutor: E’s utterances
    o Mostly confused and highly divergent from topic
    o Often negative
    o Rare Feedback
    o Lesser content words & short utterances

We note that each tutor has a distinctive style of their own, and we hypothesize that using the best stylistic
strategies of each tutor can help us develop better Tutorial Dialog Systems. We are also motivated to
develop tutorial dialog systems which can adopt these stylistic variations with minimal repetition of effort
(such as re-authoring knowledge sources).

What is Tutoring Style?
With this thought in mind, I looked at literature referring to the various features that characterize tutoring
styles. A tutoring strategy corresponds to a choice of these features. By tutoring style, we refer to
the preference for a particular choice of these features. Different tutors may follow different strategies when
put in different contexts due to their characteristic preferences. Modeling the tutoring style would involve
modeling the tutor’s preference for a choice of strategy in a given context.

If we can predict and follow the optimal strategy dynamically as the dialog proceeds, by modeling the best
of these strategies, we can create human-like Tutorial Dialog Systems.

Literature Review
To review the characterizing features of tutoring strategies referred to by the intelligent tutoring
community, I looked at the available literature. [1] provides a list of tutor turn actions studied from several
human tutoring corpora. In particular, types of feedback and types of responses are listed. [2] also lists types
of feedback along a different dimension.

Most of the literature [2][3][4] describes cross comparisons between conditions involving different types of
each feature, and the effect of each type is reported. The idea that a particular feature may have both
positive and negative effects depending on the context is not pursued. [1] refers to corpus analysis with
contextual features which include the turn types of the previous student turn and the previous tutor turn.

[5] discusses the use of different basic units for content analysis.

I will be working with features motivated by this literature review, chosen such that they can be retrieved
automatically using linguistic analysis tools. The context I use is similar to the context referred to by [1]. I
have also done data analysis with different basic units, including dialogs, topics and utterances.

The corpus I am working with for this project was collected during our experiments on tutoring about
Thermodynamic Cycles by human tutors. 3 different tutors tutored 17 students in all. The distribution of
students to tutors is shown in table 1.

               Tutor      No. of students
                 V               5
                 M               9
                 E               3

The tutoring task involved tutoring undergraduate mechanical engineering students over an Instant
Messenger while the students worked with a simulation of a thermodynamic cycle to achieve maximum
efficiency. The students were given a set of 3 thermodynamic cycles to set up, analyze and optimize in the
CyclePad [6] environment.

First Analysis: Significance of Shallow Linguistic Features
In line with another investigation we were working on, we decided to use Linguistic Inquiry and Word
Count (LIWC) [7] dimensions as features for the tutoring conversation. Tutor utterances over every
tutoring session were extracted, and the LIWC dimensions for each conversation were calculated. Note that
the unit of corpus analysis here is a conversation session, as we are not splitting it further for this analysis.
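
As a rough illustration of the session-level scoring described above, the sketch below computes the share of
tutor tokens that fall in one word category. It is only a sketch under stated assumptions: the word list
COGMECH_WORDS is a tiny hypothetical stand-in (the real LIWC dictionary is far larger and is not
reproduced here), and the tokenizer and the function names are illustrative, not part of LIWC.

```python
import re

# Hypothetical stand-in for a LIWC category word list (assumption, not the
# real LIWC dictionary).
COGMECH_WORDS = {"because", "think", "know", "means", "understand", "reason"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_percentage(utterances, category_words):
    """Percentage of all tokens in the utterances that belong to the category."""
    tokens = [tok for utt in utterances for tok in tokenize(utt)]
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in category_words)
    return 100.0 * hits / len(tokens)


# Two tutor utterances taken from the excerpts above, treated as one session.
session = [
    "do you understand what this means",
    "as the temp inc eff inc",
]
score = category_percentage(session, COGMECH_WORDS)
```

Here the whole session yields a single score, which matches the choice of the conversation session as the
unit of analysis.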

We observed that, of all the LIWC dimensions, the dimension for cognitive processes (referred to as
CogMech) was closely correlated with student learning. CogMech along with PreTest is highly predictive
of PostTest (R-Sq=83.8%). Also, by using a couple of other LIWC dimensions we were able to make R-

We continued to investigate CogMech further. CogMech (T=4.42) was a slightly better predictor of PostTest
than PreTest (T=4.01). Students tutored with a high CogMech dimension in the tutor utterances showed a
significant learning effect (1.577 standard deviations) over students tutored with a low CogMech dimension.
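
The two quantities reported above, the R-Sq of a regression of PostTest on PreTest and CogMech and a
learning effect expressed in standard deviations, can be sketched as follows. This is a minimal sketch, not
the analysis actually run: the numbers are made up for illustration, the function names are hypothetical,
and dividing by the pooled standard deviation is an assumption, since the text does not say how the
standardized effect was computed.

```python
import numpy as np


def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def effect_size(high, low):
    """Difference in group means in units of the pooled standard deviation
    (assumed definition of the standardized effect)."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    n1, n2 = len(high), len(low)
    pooled_var = (
        (n1 - 1) * high.var(ddof=1) + (n2 - 1) * low.var(ddof=1)
    ) / (n1 + n2 - 2)
    return float((high.mean() - low.mean()) / np.sqrt(pooled_var))


# Made-up illustrative numbers, NOT the study data.
pretest = [0.30, 0.45, 0.50, 0.35, 0.60, 0.40]
cogmech = [4.0, 6.5, 7.0, 3.5, 8.0, 5.0]
posttest = [0.50, 0.70, 0.78, 0.48, 0.90, 0.60]

fit = r_squared(np.column_stack([pretest, cogmech]), posttest)
gap = effect_size([0.78, 0.90, 0.70], [0.50, 0.48, 0.60])
```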

We also observed that high CogMech consistently characterized Tutor V, who was also the most effective
tutor among the 3. A pairwise comparison on CogMech showed that Tutor V was significantly higher on
CogMech than Tutor M and Tutor E. Tutor M was slightly better than Tutor E. These observations on
CogMech were in line with our observations of the effectiveness of each tutor.

These results give us reason to believe that LIWC Dimensions like CogMech may be suitable features to
characterize tutors. I decided to proceed with the next analysis with CogMech, Negative Response,
Optimism and Question dimensions as the features of tutor utterances.

The second set of analyses was intended to give insight into how individual tutors employ words that trigger
cognitive processes to create their characteristic effect. We also want to know how consistently
tutors really follow a style in a principled way. We want to pick out the style aspects that lead to
higher learning effects.

Inventory Building
For the second set of analyses, I chose the tutor utterance as the basic unit of analysis. Each tutor
utterance was tagged with 4 LIWC features indicating the presence of CogMech words, Optimism, Negative
Feedback and Question. Along with these, every utterance was given a context consisting of the 4 features of
the previous tutor utterance and the 4 features of the previous student utterance. These feature sets were
grouped by tutor and by high/low learning gain for the analyses performed in the next step. Note that the
unit of analysis here is an utterance.
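
The inventory described above can be pictured as one row of 12 binary features per tutor utterance. The
sketch below is one possible construction, not the code used for the study: the class, field and feature
names are hypothetical, and treating a missing previous utterance at the start of a dialog as all-False is
an assumption.

```python
from dataclasses import dataclass

FEATURES = ("cogmech", "optimism", "negative", "question")


@dataclass
class Utterance:
    agent: str      # "tutor" or "student"
    features: dict  # maps each name in FEATURES to True/False


def tag(**present):
    """Build a feature dict; features not named are False."""
    return {name: present.get(name, False) for name in FEATURES}


def build_instances(dialog):
    """Return one 12-feature row per tutor utterance.

    Each row holds the utterance's own 4 features, the 4 features of the
    previous tutor utterance and the 4 of the previous student utterance.
    """
    # Assumption: with no previous utterance, the context is all-False.
    prev = {"tutor": tag(), "student": tag()}
    rows = []
    for utt in dialog:
        if utt.agent == "tutor":
            row = {}
            for name in FEATURES:
                row[name] = utt.features[name]
                row["prev_tutor_" + name] = prev["tutor"][name]
                row["prev_student_" + name] = prev["student"][name]
            rows.append(row)
        prev[utt.agent] = utt.features
    return rows


dialog = [
    Utterance("tutor", tag(question=True)),
    Utterance("student", tag(negative=True)),
    Utterance("tutor", tag(cogmech=True)),
]
rows = build_instances(dialog)
```

Rows built this way can then be grouped by tutor or by learning gain before any model is fitted.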

Modeling Tutor Style
I tried modeling tutor style using unpruned decision trees (based on the ID3 algorithm). A decision tree was
built to predict each of the four features of a tutor utterance. This set of 4 decision trees was built for each
tutor and for each learning gain group (high/low).
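
For readers unfamiliar with ID3, the sketch below shows only the information gain criterion it uses to pick
the attribute to split on at each node, not a full tree learner. The rows are made-up toy data, and the
attribute names are hypothetical examples of context features.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction in `target` from splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(
        len(part) / len(rows) * entropy(part) for part in partitions.values()
    )
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute with the highest information gain, as ID3 picks at a node."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Made-up toy rows: does the tutor utterance contain a CogMech word?
toy_rows = [
    {"prev_student_question": True, "prev_tutor_cogmech": False, "cogmech": True},
    {"prev_student_question": True, "prev_tutor_cogmech": True, "cogmech": True},
    {"prev_student_question": False, "prev_tutor_cogmech": True, "cogmech": False},
    {"prev_student_question": False, "prev_tutor_cogmech": False, "cogmech": False},
]
context = ["prev_student_question", "prev_tutor_cogmech"]
chosen = best_split(toy_rows, context, "cogmech")
```

In this toy data the previous student question separates the classes perfectly, so it is chosen; an unpruned
tree keeps splitting this way until the leaves are pure or no attributes remain.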

               Train    Measure            Test: V      Test: M      Test: E
                 V      Accuracy (%)       40.9574      37.2745      47.7941
                        Kappa               0.0186       0.012       -0.0309

                 M      Accuracy (%)       39.8936      42.6854      56.6176
                        Kappa               0.001       -0.0047       0.011

                 E      Accuracy (%)       38.2979      41.483       53.6765
                        Kappa              -0.0219      -0.026        0.0177
          Table 1: Prediction accuracies of Decision tree based model of CogMech for each Tutor
The results were marked by consistently low values of Kappa and erratic prediction accuracies, as shown in
Table 1 above. Reflecting on why these analyses did not produce any interesting numbers, I believe it
is because we are looking at a very fine unit of analysis and the corpus is too noisy to be
modeled properly. It may also be a shortcoming of the modeling approach and of
the context features used in this case.
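
To see why an accuracy of 40 to 57% can coexist with a Kappa near zero, it helps to recall that Kappa
corrects the observed agreement for the agreement expected by chance. The sketch below uses the standard
definition of Cohen's kappa on made-up labels; the function name is hypothetical and the handling of the
degenerate single-label case is an assumption.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Agreement between two label sequences, corrected for chance."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(
        actual_counts[label] * predicted_counts[label]
        for label in actual_counts
    ) / (n * n)
    if expected == 1.0:
        return 0.0  # assumption: report 0 when both sides use a single label
    return (observed - expected) / (1.0 - expected)


# Made-up labels: a model that always answers "yes" is right 60% of the time
# here, yet it carries no information beyond the class distribution.
actual = ["yes"] * 6 + ["no"] * 4
always_yes = ["yes"] * 10
kappa_trivial = cohen_kappa(actual, always_yes)
kappa_perfect = cohen_kappa(actual, actual)
```

A model that only reproduces the majority label scores a Kappa of about zero regardless of its raw accuracy,
which is the pattern seen in Table 1.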

To see whether tutors were following a principled style in any sense, I decided to cluster the
tutor features along with the context using Simple k-means clustering into K clusters, where K is equal to
the number of possible values for each feature. For example, Optimism can have 2 values: Yes or No. The
clustering was marked by high classification errors, as shown in table 2.

                        CogMech             Negation            Optimism            Question
          Group         n    Error          n    Error          n    Error          n    Error
          Tutor V       6   63.298          2   19.681          2   14.894          2   13.83
          Tutor M       6   76.754          2   49.098          2   44.89           2   40.681
          Tutor E       6   63.971          2   36.765          2   38.971          2   38.235
          High gain     6   68.354          2   46.835          2   48.734          2   39.241
          Low gain      6   71.714          2   24.571          2   22.857          2   33.429
          (n: number of clusters K; Error: clustering error in %)
            Table 2: Clustering errors for each class of tutor utterances for each feature
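
The clustering check above can be sketched as plain k-means followed by a comparison of clusters against
the known feature values. This is a sketch under assumptions, not the tool used for the study: the error
here counts points that disagree with the majority label of their cluster, which may differ from how the
reported errors were computed, and the points are made-up toy data.

```python
from collections import Counter

import numpy as np


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns the cluster index of every point."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        distances = np.linalg.norm(
            points[:, None, :] - centers[None, :, :], axis=2
        )
        assignment = distances.argmin(axis=1)
        for j in range(k):
            members = points[assignment == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return assignment


def cluster_error(assignment, labels):
    """Percent of points whose label differs from their cluster's majority."""
    wrong = 0
    for cluster in set(assignment):
        members = [lab for a, lab in zip(assignment, labels) if a == cluster]
        wrong += len(members) - Counter(members).most_common(1)[0][1]
    return 100.0 * wrong / len(labels)


# Made-up, well separated toy points with a two-valued label (K = 2).
toy_points = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5]]
toy_labels = ["no", "no", "no", "yes", "yes", "yes"]
toy_error = cluster_error(kmeans(toy_points, k=2), toy_labels)
```

When the features carry a consistent pattern, the clusters line up with the label values and the error is
low; errors close to chance, as in several cells of table 2, suggest the clusters do not track the feature.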

Topic as basic unit of analysis
As a final set of analyses, I used the Thermodynamic corpus with utterances divided into topics. The unit
of analysis here is coarser than the utterance and finer than the dialog. The dialog utterances were grouped
into 15 different topics. We found that the number of times and the order in which the topics were covered
are characteristic of each tutor. A regression tree based on the topic distribution in each dialog classified
the dialogs according to tutor with an accuracy of 82.35% (kappa=0.7167). There were 3 incorrect
classifications associated with the intermediate tutor (Tutor M), whose characteristics matched the other
tutors (V and E) in several ways.
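
One way to turn "how often and in what order topics were covered" into per-dialog features is sketched
below. The encoding is an assumption made for illustration (visit counts plus the rank of each topic's first
visit); the text does not specify how the topic distribution was represented, and the classifier itself is not
shown.

```python
from collections import Counter


def topic_features(topic_sequence, num_topics):
    """Per-dialog features: visit count and first-visit rank for each topic.

    `topic_sequence` holds one topic id (0..num_topics-1) per utterance.
    """
    # Collapse consecutive repeats so that a "visit" is one contiguous
    # stretch of utterances on the same topic.
    visits = [
        t for i, t in enumerate(topic_sequence)
        if i == 0 or t != topic_sequence[i - 1]
    ]
    counts = Counter(visits)
    first_rank = {}
    for topic in visits:
        first_rank.setdefault(topic, len(first_rank) + 1)
    features = []
    for topic in range(num_topics):
        features.append(counts.get(topic, 0))
        features.append(first_rank.get(topic, 0))  # 0 means never covered
    return features


# Made-up dialog over 3 topics (the corpus itself uses 15).
example = topic_features([0, 0, 2, 2, 1, 2], num_topics=3)
```

Each dialog then becomes a fixed-length vector that a tree-based classifier could use to predict the tutor.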

This result seems promising, and the choice of unit of analysis seems to have a significant effect on the
results. As work in progress, I plan to investigate further by modeling transitions between topics as tutor

Conclusions & Future Work
We observed that LIWC dimensions are good predictors of learning when we do content analysis with the
dialog as the unit. My attempts at fine-grained modeling of tutor behavior failed, which I attribute to an
improper choice of modeling technique and to insufficient and incorrect features. Use of the topic as the unit
of analysis has shown promise in characterizing a tutor, and I wish to include it as a contextual feature in
further analysis.

[1]. Natalie K. Person, Arthur C. Graesser, “Fourteen Facts about Human Tutoring: Food for Thought for
     ITS Developers,” AIED 2003 Supplementary Proceedings, Sydney, Australia, July 2003
[2]. Lepper et. al. “Self Perception…
[3]. B. Di Eugenio, X. Lu, T. C. Kershaw, A. Corrigan-Halpern, S. Ohlsson, “Positive and negative verbal
     feedback for Intelligent Tutoring Systems.” AIED 2005, the 12th International Conference on
     Artificial Intelligence in Education, Amsterdam, The Netherlands, July 2005
[4]. C. P. Rosé, J. D. Moore, K. VanLehn, D. Allbritton, “A Comparative Evaluation of Socratic versus
     Didactic Tutoring,” Proceedings of Cognitive Sciences Society
[5]. B. De Wever, T. Schellens, M. Valcke, H. van Keer, “Content analysis schemes to analyze transcripts
     of online asynchronous discussion groups: A review”, Computers & Education, 2005
[6]. CyclePad Reference
[7]. LIWC Reference
[8]. Weka Reference
