PhD Comprehensive Exam

					AI Seminar


Our web page is at:
www.cs.nmsu.edu/~gradrep
Under “Events” in left frame

September 5, 2001   Melanie Martin - AI Seminar   1
Identifying Ideological Point of View
Part II


    Melanie Martin
    September 5, 2001


Outline of this presentation
■    Where are we???
■    Ideology
■    Statistical NLP and Machine Learning
■    Discourse features
■    Internet
■    Conclusion


Where are we???

■    Let’s recall what we want to do:

■    Build a system that could take
     information from web pages and Usenet
     newsgroups on a given topic and
     segment, classify, or cluster it by
     ideological point of view.

The Proposed System
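[Diagram: the user inputs a topic to a Search Engine, which draws on the
Internet (Web pages, Usenet). The results pass through Topic Clustering,
Filtering to give a Set of documents on topic. Ideological Clustering then
produces Docs on topic clustered by IPV (ideological point of view).]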

Where are we???
■    What do we need?
       – A computationally feasible definition of
         ideological point of view

       – A search engine, possibly with additional
         processing, to produce a collection of
         documents on the topic specified by the
         user


Where are we???

■    What else do we need?
       – A module to cluster documents by
         ideological point of view

       – A user interface

       – A way to evaluate the system


Where are we???

■    Why do we need this?
■    Some examples using Google (approximate result counts):
       – query: back pain ~2,220,000
              • scoliosis ~121,000
       – query: lyme disease ~163,000
       – query: zoning shopping center ~65,100
              • (add) clark county nv ~299
       – query: un racism conference ~74,000
Ideology

■    Working definition from van Dijk:
     “Ideologies are the fundamental beliefs
     of a group and its members.”
       – instantiated as Us vs. Them
       – predefined ideologies will not work across
         domains
       – want to avoid researcher bias
       – definition likely needs more work
Ideology

■    Linguistics
       – van Dijk (1998)
       – Blommaert & Verschueren (1998)
       – Wang (1993)
       – Wortham & Locher (1996)




Ideology

■    The Systems
       – Ideology Machine - 1965 to 1973 - Abelson et al.
       – Politics - 1979 - Carbonell
       – Pauline - 1987 - Hovy
       – Tracking Point of View in Narrative - 1994 - Wiebe
       – Spin Doctor - 1994 - Sack
       – Terminal Time - 2000 - Mateas et al.




Ideology

■    Some issues
       – Evaluation!!!
       – Hard-coded knowledge
       – Domain dependence
       – Cognitive plausibility
       – More precise definitions



Statistical NLP and ML

■    Two techniques we will consider
       – Latent Semantic Analysis
       – Probabilistic Classification




Statistical NLP and ML

■    Issues
       – clustering versus classification
              • categories may not be predefined
              • may want to take a variety of features into
                account
       – favor learning over hard-coding knowledge
       – supervised versus unsupervised
              • cost of annotated training data

Statistical NLP and ML

■    Latent Semantic Analysis
       – text represented as a matrix
              • entries are weighted frequencies of words in
                contexts
       – semantic space obtained through SVD
              • words appearing in similar contexts have similar
                feature vectors
       – characterizes semantic content of words in
         context
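
The LSA steps above can be sketched as follows. This is a minimal illustration, not the system's implementation: the toy corpus, the log weighting, and the choice of k = 2 dimensions are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (hypothetical); each document serves as one "context".
docs = [
    "tax cuts help business",
    "tax cuts help growth",
    "unions protect workers",
    "unions protect wages",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-by-document matrix of raw counts.
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[index[w], j] += 1

# Log weighting is one common choice (an assumption here;
# the slides do not say which weighting is used).
weighted = np.log1p(counts)

# Semantic space: SVD truncated to the k largest singular values.
U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity(w1, w2):
    """Similarity of two words in the reduced semantic space."""
    return cosine(term_vectors[index[w1]], term_vectors[index[w2]])


print(similarity("business", "growth"))
print(similarity("business", "wages"))
```

In this toy corpus "business" and "growth" never occur in the same document, yet they share contexts ("tax cuts help"), so they end up close together in the reduced space, while "business" and "wages" do not.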
Statistical NLP and ML
■   Why LSA is a good choice here
      – semantics is a key component of ideological
        discourse
      – clustering without need for predefined
        categories
      – already shown useful for:
             • summarization (Ando 2000)
             • text segmentation (Choi 2001)
             • measuring text coherence (Foltz 1998)
Statistical NLP and ML

■    We want to look a little more closely at
     Ando’s work
       – uses term, sentence, and document
         vectors
       – modified SVD algorithm
       – interesting interface
■    Multi-document summarization by visualizing topical content.
     Rie Kubota Ando, Branimir Boguraev, Roy Byrd, and Mary Neff.
     ANLP/NAACL '00 Workshop on Automatic Summarization

Statistical NLP and ML

■    Another option is a probabilistic
     classifier
       – assigns the most probable class to an object
         based on a probability model
       – can we get around predefined classes?




Statistical NLP and ML

■    Probability model
       – defines joint distribution of variables
              • set of feature variables and a class variable
■    Wiebe and Bruce (1995) got around the
     issue of not knowing the classes in
     advance by breaking up the problem
     and using a series of classifiers
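
To make the probability model concrete, here is a minimal sketch of a probabilistic classifier. It assumes a naive Bayes model, which is only one simple way to define a joint distribution over feature variables and a class variable, and is not necessarily the model the proposed system would use. The feature names and class labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count classes and features from (features, label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, label in examples:
        class_counts[label] += 1
        for f in features:
            feature_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feature_counts, vocab


def classify(model, features):
    """Return the most probable class for a list of features."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        # log P(class) + sum of log P(feature | class), add-one smoothing
        score = math.log(n / total)
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            score += math.log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical annotated training data.
examples = [
    (["freedom", "choice"], "pro"),
    (["choice", "rights"], "pro"),
    (["danger", "ban"], "con"),
    (["ban", "risk"], "con"),
]
model = train(examples)
print(classify(model, ["choice", "freedom"]))
print(classify(model, ["ban"]))
```

Note that this sketch needs labeled examples with classes known in advance, which is exactly the limitation the series-of-classifiers approach is meant to work around.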

Statistical NLP and ML

■    We need to come up with a set of
     features…our next topic

■    The choice of which features to use
     can then be made statistically, using the
     goodness of fit of graphical models


Statistical NLP and ML

■    Both methods seem to have a lot of
     potential
■    LSA would be easier to implement
       – possibly a baseline for evaluation of
         probabilistic classifiers
■    LSA is likely to yield less linguistic
     insight

Discourse features

■    If we use probabilistic classifiers, we
     need features, so we look at:
       – linguistics
       – previous systems
       – discourse theory
       – literary theory


Discourse features

■    From linguistics and discourse:
■    General strategy of most ideological
     discourse (van Dijk’s Ideological Square):
       – Emphasize positive things about Us
       – Emphasize negative things about Them
       – De-emphasize negative things about Us
       – De-emphasize positive things about Them

Discourse features

■    How are these strategies instantiated in
     discourse? (van Dijk)
       – What is there:
              •     argument structure
              •     syntactic patterns
              •     style and non-literal language
              •     actor descriptions
              •     thematic structure
              •     topoi (standardized topics)
Discourse features

       – What is not there
              •     implication
              •     presupposition
              •     inference
              •     goals and plans




Discourse features

■    Disclaimers, selected examples:
       – Apparent Negation: I have nothing against X, but...
       – Apparent Concession: They may be very smart,
         but...
       – Apparent Empathy: They may have had problems,
         but...
       – Apparent Effort: We do everything we can, but...
■    Positive self-representation and
     face-keeping
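
One way the disclaimer examples above might be turned into classifier features is simple pattern matching. The sketch below is only an illustration: the regular expressions are hypothetical, built directly from the four example phrasings, and real disclaimers would need far broader coverage.

```python
import re

# Hypothetical patterns, one per disclaimer type listed above.
DISCLAIMER_PATTERNS = {
    "apparent_negation": re.compile(
        r"\b(?:i|we) have nothing against\b.*\bbut\b", re.I),
    "apparent_concession": re.compile(
        r"\bthey may be\b.*\bbut\b", re.I),
    "apparent_empathy": re.compile(
        r"\bthey may have had\b.*\bbut\b", re.I),
    "apparent_effort": re.compile(
        r"\bwe do everything we can\b.*\bbut\b", re.I),
}


def disclaimer_features(sentence):
    """Map each disclaimer type to True if its pattern matches."""
    return {name: bool(pattern.search(sentence))
            for name, pattern in DISCLAIMER_PATTERNS.items()}


print(disclaimer_features("They may be very smart, but they are wrong."))
```

Each sentence yields a small dictionary of boolean features that could be fed to a classifier alongside other discourse features.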
Discourse features

■    Some discourse theories from
     Computational Linguistics
       – Mann & Thompson (RST) (1988)
       – Grosz & Sidner (G&S) (1986)
       – Morris & Hirst (Lexical chains) (1991)



Discourse features
■    Issues

       – implementation
              • G&S, RST
       – finite number of fixed primitives
              • RST
       – domain specific
              • RST depends on training


Discourse features

■    A reasonable first approach: Lexical
     Chains (Morris & Hirst)
■    Sequences of related words spanning a
     topical unit in the text
       – based on lexical cohesion
       – encapsulates context
       – helps identify key phrases

Discourse features

■    Idea of Algorithm
       – read next word
              • if candidate
                    – check chains within suitable span
                         » check thesaurus or WordNet
                         » check other knowledge sources
                    – if found
                         » include in chain
                         » recalculate chain
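
The algorithm idea above can be sketched roughly as follows. Several things here are assumptions for illustration: the tiny RELATED table stands in for a thesaurus or WordNet, a stopword list decides which words are candidates, the span is measured in word positions, and an unmatched candidate starts a new chain (the slide does not say what happens in that case).

```python
# Toy stand-in for a thesaurus or WordNet (hypothetical entries).
RELATED = {
    "car": {"engine", "driver", "vehicle"},
    "bread": {"loaf", "bakery"},
}
STOPWORDS = {"the", "a", "and", "of", "is"}


def related(a, b):
    """True if the two words are identical or linked in the toy thesaurus."""
    return (a == b
            or b in RELATED.get(a, set())
            or a in RELATED.get(b, set()))


def build_chains(words, span=5):
    """Group related words into chains of (position, word) pairs."""
    chains = []
    for pos, word in enumerate(words):
        if word in STOPWORDS:  # not a candidate word
            continue
        for chain in chains:
            last_pos = chain[-1][0]
            # Only consider chains whose last word is within the span.
            if pos - last_pos <= span and any(
                    related(word, w) for _, w in chain):
                chain.append((pos, word))  # include in chain
                break
        else:
            chains.append([(pos, word)])  # assumption: start a new chain
    return chains


words = ["car", "the", "engine", "bread", "driver", "loaf"]
for chain in build_chains(words):
    print([w for _, w in chain])
```

Here "recalculate chain" reduces to updating the chain's last position. A fuller version would also rescore chain strength as words are added.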


Discourse features

■    Lexical chains could help us in:
       – topic segmentation
       – intentional structure
       – lexical features for a classifier




Discourse features

■    Lexical chains are easy to implement,
     but are unlikely to be sufficient…
■    For the next approximation: RST
       – Marcu’s implementation incorporating G&S
       – Mostly used for summarization and
         generation
       – Would help get at the argument structure of
         the text
Discourse features
■    RST Basics
       – about 23 rhetorical relations
              • account for discourse coherence
              • link adjacent spans of text
       – 5 schemas
              • defined in terms of relations
              • specify how spans can co-occur
       – nucleus and satellite spans
       – end up with tree structure
Discourse features

■    Would most likely use RST to generate
     features for a classifier or as input to a
     pattern recognizer
■    Nuclear spans help pick out the more
     important segments of text
■    Produces a tree that captures the
     rhetorical structure of the text
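
As a rough illustration of how nuclear spans pick out the important text, the sketch below models an RST tree as nested nucleus and satellite spans and follows the nucleus links down to a leaf. The Span class and the example sentences are hypothetical, and the sketch covers only nucleus-satellite relations, not multinuclear ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A node in a simplified RST tree (hypothetical structure)."""
    text: str = ""
    relation: Optional[str] = None
    nucleus: Optional["Span"] = None
    satellite: Optional["Span"] = None


def nuclear_text(node):
    """Follow nucleus links from the root down to a leaf span."""
    while node.nucleus is not None:
        node = node.nucleus
    return node.text


claim = Span(
    relation="evidence",
    nucleus=Span(text="The policy failed."),
    satellite=Span(text="Unemployment rose."),
)
root = Span(
    relation="concession",
    nucleus=claim,
    satellite=Span(text="Although supporters disagree,"),
)
print(nuclear_text(root))
```

The relation labels along the path from the root (here "concession", then "evidence") are the kind of information that could serve as classifier features.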

Internet

■    We would like to mine the structure of
     the Internet
       – see if there is a correspondence with
         groups
       – improved IR by topic
       – figure out what search engine to use as a
         base for our system


Internet

■    Issues
       – topic or query disambiguation
       – what is a minimal unit
       – how to use the structure of the web
              • finding authorities
              • communities and subgraphs
       – Evaluation!!!


Internet

■    Kleinberg (1997)
       – link-based model
       – hub - links to many related authorities
       – authority - linked to by many good hubs
       – iterative weighting algorithm that converges
         (rapidly in practice)
       – can disambiguate authorities by sense
       – can be used to trawl for cyber communities
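
The iterative weighting can be sketched as below, in the spirit of Kleinberg's hub and authority scores. This is a simplified illustration: the link graph is a hypothetical toy, and the fixed iteration count stands in for a proper convergence test.

```python
import math


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of authority scores of pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors to unit length.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical link graph: three hub pages pointing at two authorities.
links = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Pages with high authority scores would be candidates for the improved document collection, and the hub pages that point to them hint at communities around a topic.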
Conclusion
■   It seems that such a system can be built
      – find a good search engine
      – use Kleinberg’s algorithm to improve
        collection of documents retrieved
      – use LSA and/or a probabilistic classifier to
        handle the ideological point of view
      – with a probabilistic classifier use linguistic
        and discourse features
      – develop evaluation methodology
The End


Thanks for listening!
If you want to know more, my
   Comprehensive Exam paper is at:
www.CS.NMSU.Edu/~mmartin/courses/comps_all.html
