Intentional insincerity

Alexandria Marder
INLS 512 Spring 2011

Literature review

15 March 2011

Intentional insincerity
        Intentional insincerity, which includes sarcasm, verbal irony, and satire, is a popular tone in

natural language, but it can be difficult for humans to detect, much less computers. Some work has been

done in recent years on automatic detection of this tone, but the field is still in its infancy.

        Utsumi (1996) was a foundational work on developing a computational model of irony. The

paper described the ironic environment, which contains certain requirements for speech or writing to be

considered ironic. Utsumi found that an ironic utterance “implicitly displays the fact that its utterance

situation is surrounded by ironic environment” (p. 2). This environment is displayed when an utterance

“alludes to the speaker’s expectation, violates pragmatic principles, and implies the speaker’s emotional

attitude” (p. 1). Specifically, the utterance must allude to an expectation and the failure of that

expectation, resulting in the speaker’s disappointment or other negative attitude. Utsumi concluded

that in order for a person to interpret an utterance as ironic, only two of these three components need

to be displayed (p. 5).

        Utsumi’s unified theory of irony laid the foundation for automating irony detection. He criticized

previous theories of irony for not being “clear enough to be formalized in a computable fashion” (p. 2),

but as he wrote, “This paper provides a basis for dealing with irony in NLP systems” (p. 6). Some

important implications of his research were the potential of a system to distinguish between ironic and

non-ironic utterances and a demonstration that ironic utterances can be interpreted without

intonational cues (p. 6).

How well can humans detect intentional insincerity?
        Researchers have struggled to automate the detection of intentional insincerity, partly due to its

subtlety; even humans do not always interpret it correctly. Kreuz and Caucci (2007) tested human

recognition of sarcasm by asking college students to evaluate the sarcasm levels in several texts. Some

of the texts originally included the phrase “said sarcastically.” Kreuz and Caucci deleted the word

“sarcastically” in each of these texts and randomly interspersed them with control texts with utterances

that were not sarcastic in tone. The students rated each of the excerpts (including the utterance and the

two paragraphs before and after it) on a seven-point scale of likely sarcastic intent (p. 2). Then two

judges coded each text based on the presence of adjectives and adverbs, the presence of interjections,

and the use of exclamation points or question marks (pp. 2-3). These dimensions were hypothesized to

be relevant to human readers’ interpretation of sarcasm. Kreuz and Caucci found that the sarcastic

excerpts were rated significantly higher (more likely to be sarcastic) than the control excerpts (p. 3). The

hand-coded dimensions were also analyzed, but only the presence of interjections was found to be a

significant predictor of high ratings of sarcasm (p. 3). Kreuz and Caucci’s research demonstrates that

humans have an ability to detect sarcasm and suggests some of the lexical cues they use. It could also

prove useful for automatic detection of sarcasm. The results “suggest that, in some contexts, the use of

interjections, and perhaps other textual factors, may provide reliable cues for identifying sarcastic

intent” (p. 4).
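
The lexical dimensions Kreuz and Caucci coded by hand lend themselves to a simple automated counter. The sketch below assumes a small illustrative interjection list (the paper does not publish a fixed lexicon) and counts the cues for one excerpt:

```python
import re

# Hypothetical interjection list for illustration; Kreuz and Caucci (2007)
# coded interjections by hand and did not publish a fixed lexicon.
INTERJECTIONS = {"oh", "wow", "gee", "ah", "huh", "yeah"}

def sarcasm_cues(text):
    """Count lexical cues hypothesized to signal sarcastic intent."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "interjections": sum(t in INTERJECTIONS for t in tokens),
        "exclamations": text.count("!"),
        "questions": text.count("?"),
    }

cues = sarcasm_cues("Oh, great. Another Monday! Wow.")
```

On this example the counter reports two interjections and one exclamation point, the kind of evidence that predicted high sarcasm ratings in their data.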

        Kreuz’s earlier work with Roberts (1995) assessed the relative importance of hyperbole and

veridicality (truthfulness) in interpretation of irony. Kreuz and Roberts compiled short scenario texts and

asked college students to evaluate their levels of verbal irony. Kreuz and Roberts explained that

nonveridicality “is essential for the perception of irony. That is, an ironic statement must be contrary to

the true state of affairs to be interpreted correctly. There must be some discrepancy between the reality

and the utterance, and the listener must recognize this discrepancy in order to interpret the utterance

as it was intended” (p. 22). They also explained the importance of hyperbole, suggesting that “There

seems to be a standard frame for such [ironic] utterances in English; it can be characterized as an

adverb, followed by an extreme, positive adjective” (p. 24).
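
This "adverb followed by an extreme, positive adjective" frame can be approximated with a surface pattern matcher. The frame itself is from Kreuz and Roberts, but the particular word lists below are assumptions made for the sketch:

```python
import re

# Illustrative word lists; the frame comes from Kreuz and Roberts
# (1995, p. 24), but these specific adverbs and adjectives are assumed.
ADVERBS = r"(?:absolutely|simply|just|really|truly)"
EXTREME_ADJS = r"(?:fantastic|wonderful|brilliant|perfect|amazing)"
FRAME = re.compile(rf"\b{ADVERBS}\s+{EXTREME_ADJS}\b", re.IGNORECASE)

def has_ironic_frame(utterance):
    """True when the utterance matches the hyperbolic frame."""
    return bool(FRAME.search(utterance))

result = has_ironic_frame("Well, that was simply wonderful.")
```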

        In the study, Kreuz and Roberts presented college students with texts that included scenarios

with different variations of veridicality and hyperbole (p. 26). In other words, some scenarios set up

situations with utterances that made sense and others with utterances that were contrary to fact. In

some scenarios, the utterances were exaggerated, and in others they were non-hyperbolic. In this way,

the researchers were able to compare the relative importance of veridicality and hyperbole. They found

that both veridicality and hyperbole were significant, with the scenarios presenting a combination of the

two factors being rated most sarcastic (p. 27). Kreuz and Roberts did not suggest any computer-based

applications for their findings, but the lexical patterns they discovered could perhaps be used in

automatic irony detection.

How do automatic systems compare?
Tsur et al. (2010) devised the most successful system to date: a novel algorithm for sarcasm

identification using 66,000 reviews as a corpus. Their Semi-supervised Algorithm for

Sarcasm Identification (SASI) included two steps: a semi-supervised pattern acquisition algorithm and a

classification algorithm (p. 163). The pattern acquisition algorithm was trained with manually labeled

sentences rated one to five (not at all sarcastic to definitely sarcastic) (p. 163). The researchers then

extracted syntactic and pattern-based features (p. 163). The patterns were based on high-frequency

words and content words (p. 164). The strongest patterns were selected for the feature vectors, which

also included sentence length and punctuation features for analysis (p. 164). The researchers added to

their data set by searching on the web for sentences with similar patterns (p. 165). They compared their

results to a star-sentiment baseline based on the star rating associated with each review (p. 165). They used 5-fold cross validation and a gold-standard annotation to evaluate their

results. For the 5-fold cross validation, the combination of all features yielded the best results, with

patterns+punctuation close behind (p. 166). The gold-standard annotation evaluation revealed a

significant improvement over the baseline (p. 165).
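
A toy version of the pattern acquisition step can illustrate the idea: words above a frequency threshold are treated as high-frequency words, and everything else becomes a content-word (CW) slot. The threshold and mini-corpus below are assumptions for illustration, not values from the paper:

```python
from collections import Counter

def extract_patterns(sentences, hfw_threshold=2):
    """Rewrite each sentence as a pattern of high-frequency words and CW slots."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    hfw = {w for w, c in counts.items() if c >= hfw_threshold}
    return [" ".join(w if w in hfw else "CW" for w in s.lower().split())
            for s in sentences]

patterns = extract_patterns([
    "this is a great book",
    "this is a terrible phone",
    "great camera",
])
```

Shared skeletons such as "this is a CW ..." can then be matched against new sentences, analogous to how SASI searched the web for sentences with similar patterns.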

Davidov et al. (2010), written by the same three researchers as Tsur et al. (2010), used the SASI

algorithm to investigate 66,000 product reviews and 5.9 million Twitter messages (p. 107).

Davidov et al. used the same training process as described in Tsur et al. (2010). They used the #sarcasm

Twitter hashtag to train the system on the Twitter corpus, but this proved too noisy, so they performed

cross-domain training instead, using the Amazon data set (p. 111). The researchers used 15 annotators

from Amazon's Mechanical Turk service for annotating a gold standard for evaluation purposes (p. 112).

They also used the #sarcasm Twitter hashtag as a secondary gold standard; all tweets with this hashtag

were considered to be sarcastic (p. 113). As in Tsur et al., the combination of all features yielded the best

results, again with patterns+punctuation close behind (p. 113). The gold-standard evaluations were high

for the new sentences (both Twitter and Amazon), and the Mechanical Turk standard outperformed the

#sarcasm Twitter hashtag (p. 113). Davidov et al. built on the success of Tsur et al. and showed that the

SASI algorithm could be expanded successfully.

Carvalho et al. (2009) investigated the use of linguistic cues for detecting irony in user comments

on a Portuguese newspaper website. They achieved relatively high levels of precision “by exploring

certain oral or gestural clues in user comments, such as emoticons, onomatopoeic expressions for

laughter, heavy punctuation marks, quotation marks and positive interjections” (p. 53). The paper

followed a previous Carvalho et al. study on opinion mining that achieved high precision for negative

opinions but lower precision for positive opinions. One of the major errors was found to be verbal irony,

and the 2009 paper investigated how to detect verbal irony in order to avoid false positive opinions in

opinion detection (p. 53). In particular, it focused on “the specific case where a word or expression with

prior positive polarity is figuratively used for expressing a negative opinion” (p. 53).

Carvalho et al. devised eight linguistic patterns that they hypothesized would be related to verbal

irony. The patterns were constrained by required inclusion of positive opinion polarity and human

named entities (p. 54). The patterns included diminutive forms, demonstrative determiners,

interjections, verb morphology, cross-constructions, heavy punctuation, quotation marks, and laughter

expressions (pp. 54-55). They used a named-entity lexicon and a sentiment lexicon to find excerpts from

the newspaper corpus that fit the constraints of the study (p. 54). The researchers evaluated common

sentence patterns and marked them as ironic, not ironic, undecided, or ambiguous (p. 54). They found

that the most productive patterns were the ones that relied on punctuation and keyboard characters,

“which are ways of representing oral or gestural expressions in written text” (p. 54). The patterns based

on laughter and quotation yielded the best results (p. 55).
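
These clues are straightforward to express as surface patterns. The regular expressions below are illustrative approximations, not the exact patterns from the paper:

```python
import re

# Approximate regexes for the oral/gestural clues in Carvalho et al. (2009);
# the paper's exact patterns are not reproduced here.
CLUES = {
    "laughter": re.compile(r"\b(?:ha(?:ha)+h?|lol+)\b", re.IGNORECASE),
    "heavy_punctuation": re.compile(r"!{2,}|\?!|!\?"),
    "quotation": re.compile(r'"[^"]+"'),
    "positive_smiley": re.compile(r"[:;]-?\)"),
}

def irony_clues(comment):
    """Return the names of clue patterns found in a comment."""
    return [name for name, rx in CLUES.items() if rx.search(comment)]

found = irony_clues('Great "leadership", hahaha!!')
```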

        Burfoot and Baldwin (2009) used support vector machines and feature weighting to

differentiate between true and satirical news stories, using newswire and satirical news articles as their

corpus. They focused on three feature types that were strongly related to satirical stories: headlines,

profanity, and slang (pp. 162-63). They also evaluated the validity of stories by comparing the

combinations of named entities in each story with web queries for the same combinations. Valid (and

less likely to be satirical) combinations of named entities had more web matches than the novel

combinations of named entities that characterized satirical stories (p. 163). The classifiers achieved high

precision but low recall. They detected the most obvious satirical stories but they could not catch the

subtler ones (p. 164).
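
The validity heuristic can be sketched as a threshold on match counts for a combination of named entities. The hit counts and entity names below are hypothetical stand-ins; the actual system issued live web queries:

```python
# Simulated web-match counts; a real implementation would query a search
# engine for each combination of named entities, as Burfoot and Baldwin did.
HITS = {
    frozenset({"President", "Congress"}): 125_000,
    frozenset({"President", "Martians"}): 3,
}

def likely_satirical(entities, hits, threshold=10):
    """A novel (rarely co-occurring) entity combination suggests satire."""
    return hits.get(frozenset(entities), 0) < threshold

plausible = likely_satirical({"President", "Congress"}, HITS)
novel = likely_satirical({"President", "Martians"}, HITS)
```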

Tepperman et al. (2006) studied sarcasm recognition in spoken dialogue using prosodic, spectral,

and contextual cues (p. 1838). They built their study around occurrences of the expression “yeah right”

from telephone dialogues in the Switchboard and Fisher corpora in order to capture a variety of

sarcastic and non-sarcastic uses (p. 1838). First they categorized each example of the expression “yeah

right” as one of four types of speech act: acknowledgment, agreement/disagreement, indirect

interpretation, or internal phrase (pp. 1838-39). They also coded the excerpts for the “yeah right”

expression preceded or followed by laughter, the expression as a question or answer, the expression as

the start or end of a turn, the expression preceded or followed by a pause, and the gender of the

speaker (p. 1839). They used 19 prosodic features to characterize the tone of voice for each utterance,

as well (p. 1839). Spectral information was recorded automatically. Two human annotators annotated

the excerpts, both with and without the context of the two or three turns before and after the target

expression (p. 1840). Having the context included improved inter-annotator agreement significantly,

suggesting that the context was an important factor in correctly interpreting the excerpts (p. 1840). The

researchers found that contextual and spectral features outperformed prosodic features (p. 1841).

Laughter was found to be the most important predictive feature (p. 1841). They concluded “that

prosody alone is not sufficient to discern whether a speaker is being sarcastic” and “that spectral and

contextual features can be used to detect sarcasm as well as a human annotator would” (p. 1838).
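
The hand-coded contextual cues reduce naturally to a binary feature vector per utterance. The field names below are paraphrases of the cues listed above, not the paper's exact feature labels:

```python
def contextual_features(utt):
    """Encode the hand-coded contextual cues as a binary feature vector."""
    return [
        int(utt.get("laughter_before", False) or utt.get("laughter_after", False)),
        int(utt.get("is_question", False)),
        int(utt.get("starts_turn", False)),
        int(utt.get("ends_turn", False)),
        int(utt.get("pause_before", False) or utt.get("pause_after", False)),
    ]

vec = contextual_features({"laughter_after": True, "starts_turn": True})
```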

Why is it important?
        The automatic detection of irony, sarcasm, and satire is important to the broader area of

sentiment detection as an interesting computational problem. It is also important as a way to clear away

sentimental noise in texts that causes false sentiment detection. For example, a sarcastic movie review

might contain all positive words but actually portray negative sentiment. Several researchers have run

up against this problem in sentiment detection research. Read (2005) performed a sentiment

classification study employing emoticons. He compiled a corpus of excerpts that included smile or frown

emoticons (p. 45) and optimized them for sentiment classification on a second corpus (p. 46). The

optimized system performed well on data from the emoticon corpus but not on data from the second

corpus (p. 47). Read suggested that the problem may be that the emoticon extracts “may be noisy with

respect to sentiment” (p. 47). He suggested that sarcasm was a significant contributor of noise in this corpus.
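
Read's corpus construction can be sketched as emoticon-based labeling; the emoticon lists below are assumptions for illustration:

```python
# Illustrative emoticon lists; Read (2005) built his corpus from extracts
# containing smile or frown emoticons.
SMILES = (":-)", ":)")
FROWNS = (":-(", ":(")

def emoticon_label(text):
    """Label an extract positive or negative by its emoticon, else None."""
    if any(e in text for e in SMILES):
        return "positive"
    if any(e in text for e in FROWNS):
        return "negative"
    return None

label = emoticon_label("Best day ever :-)")
```

A sarcastic extract such as "Oh, wonderful, it broke again :)" would be labeled positive under this scheme, which is exactly the kind of noise Read describes.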


Das et al. (2009) used unsupervised learning to detect sentiment in political blogs by detecting

their themes and orientations. The results were mixed, partly because of problematic noise caused by

sarcasm. Das et al. wrote, “Some articles have a vocabulary that is dominated by all terms related to the

promises made by one candidate, but ends with a sentence that changes the overall tone of the article.

Some articles are humor based where all the policies made by a candidate is debated using sarcasm or

jokes” (p. 91). They concluded, “Detecting sarcasm in text is indeed very hard and remains an open

problem” (p. 92).

Davidov et al. (2010) summarized the problem: “The difficulty in recognition of sarcasm causes

misunderstanding in everyday communication and poses problems to many NLP systems such as online

review summarization systems, dialogue systems or brand monitoring systems due to the failure of state

of the art sentiment analysis systems to detect sarcastic comments” (p. 107).

What’s next?
        Automatic detection of intentional insincerity has a rich variety of potential applications. Tsur et

al. (2010) wrote:

        Beyond the obvious psychology and cognitive science interest in suggesting models for the use

        and recognition of sarcasm, automatic detection of sarcasm is interesting from a commercial

        point of view. Studies of user preferences suggest that some users find sarcastic reviews biased

        and less helpful while others prefer reading sarcastic reviews. (p. 163)

Tsur et al. also listed content ranking, personalization, recommendation systems, and review summarization

and opinion mining systems as areas of potential application (p. 164). Meanwhile, Kreuz and Caucci

(2007) wrote about the potential to investigate “certain formulaic expressions (e.g., thanks a lot, good

job), foreign terms (e.g., au contraire), rhetorical statements (e.g., tell us what you really think), and

repetitions (e.g., perfect, just perfect) [that] are also common in sarcastic statements” (p. 4). Further

research building on the early successes in automatic irony, sarcasm, and satire detection should

continue to yield better results with commercial applications, particularly in product reviews and

consumer personalization.

References
Burfoot, C., & Baldwin, T. (2009). Automatic satire detection: Are you having a laugh? Proceedings of the
        ACL-IJCNLP 2009 Conference Short Papers, 161–164.

Carvalho, P., Silva, M., Sarmento, L., & de Oliveira, E. (2009). Clues for detecting irony in user-generated
        contents: Oh...!! It's "so easy" ;-). TSA'09 - 1st International CIKM Workshop on Topic-Sentiment
        Analysis for Mass Opinion Measurement, 53–56.

Das, P., Srihari, R., & Mukund, S. (2009). Discovering voter preferences in blogs using mixtures of topic
        models. AND '09 Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text
        Data, 85–92.

Davidov, D., Tsur, O., & Rappoport, A. (2010). Semi-supervised recognition of sarcastic sentences in
        Twitter and Amazon. Proceedings of the Fourteenth Conference on Computational Natural
        Language Learning, 107–116.

Kreuz, R., & Caucci, G. (2007). Lexical influences on the perception of sarcasm. Proceedings of the
        Workshop on Computational Approaches to Figurative Language, 1–4.

Kreuz, R., & Roberts, R. (1995). Two cues for verbal irony: Hyperbole and the ironic tone of voice.
        Metaphor and Symbol, 10(1), 21–31.

Read, J. (2005). Using emoticons to reduce dependency in machine learning techniques for sentiment
        classification. Proceedings of the ACL Student Research Workshop, 43–48.

Tepperman, J., Traum, D., & Narayanan, S. (2006). "Yeah right": Sarcasm recognition for spoken dialogue
        systems. INTERSPEECH 2006 - ICSLP, 1838–1841.

Tsur, O., Davidov, D., & Rappoport, A. (2010). ICWSM - A great catchy name: Semi-supervised
        recognition of sarcastic sentences in online product reviews. Proceedings of the Fourth
        International AAAI Conference on Weblogs and Social Media, 162–169.

Utsumi, A. (1996). A unified theory of irony and its computational formalization. COLING 1996, 962–967.
