Authorship
Attribution
  By Allison Pollard
What is
Authorship Attribution?
   The way of determining who wrote a text
    when it is unclear who wrote it.
   It is useful when two or more people
    claim to have written something or when
    no one is willing (or able) to say that
    (s)he wrote the piece.
The Basis
   A text makes use of all linguistic domains:
    semantics, syntax, lexicography, phonology
    (orthography) and morphology. Each of these
    domains is rule governed, yet, within these rules
    and among the components, the grammar
    offers the writer choices.
   The text as an end product is an outcome of the
    particular choices taken by its author. This is
    why each specific text carries the fingerprints of
    its creator.
The Assumptions:
   there is a specific single author
   there are choices to be made
   the author is consistent in his/her
    preferred choices
   these choices are present and could be
    detected in all end products of that
    creator
Computerized Analysis
   Developed in the 1980s
   Based on stylometry—the statistical
    analysis of literary style [quantifying
    some of the features of an author’s style]
Method 1:
Word or Sentence Length
   The origin of stylometry
   First developed in 1887, later extended in
    1938
   NOT reliable methods
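
The two measurements behind this method can be sketched in a few lines of Python. This is only an illustration: the function name `length_profile` and the simple rules for splitting words and sentences are assumptions made here, not part of the original studies.

```python
import re

def length_profile(text):
    """Return (mean word length, mean sentence length in words).

    Assumption: sentences end at '.', '!' or '?', and a word is a run
    of letters or apostrophes. Real texts need more careful splitting.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not words or not sentences:
        return 0.0, 0.0
    mean_word = sum(len(w) for w in words) / len(words)
    mean_sentence = len(words) / len(sentences)
    return mean_word, mean_sentence

# Seven three-letter words spread over two sentences.
print(length_profile("The cat sat. The dog ran off."))
```

Two authors can easily share the same averages, which is one way to see why the slide calls these measures unreliable.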
Method 2:
Function Words
   Relies on word usage and context-free
    (“function”) words
   Analyze frequency, position, or
    immediate context of words
   Criticized because it cannot reliably
    distinguish between certain types of
    literature
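
As a rough sketch of the frequency part of this method, the Python below counts how often each function word appears relative to the length of the text. The short word list `FUNCTION_WORDS` and the function name are hypothetical, chosen only for illustration; a real study would use a much longer list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words.
FUNCTION_WORDS = ("the", "of", "and", "to", "upon", "by")

def function_word_rates(text, vocabulary=FUNCTION_WORDS):
    """Return each function word's share of all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in vocabulary}

rates = function_word_rates("The power of the people rests upon the law")
for word, rate in rates.items():
    print(f"{word}: {rate:.3f}")
```

Comparing such rate profiles between a disputed text and texts of known authorship is the basic idea; position and immediate context are not modeled in this sketch.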
Method 3:
Vocabulary Distributions
   Measuring the “richness” or “diversity” of
    an author’s vocabulary
   Analyzes the frequency profile of word-
    usage to glimpse the author’s extent of
    vocabulary
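
One simple way to illustrate a frequency profile is sketched below in Python. It reports the type-token ratio (distinct words divided by total words) and how many words occur exactly once, twice, and so on. These particular measures and the name `vocabulary_profile` are assumptions chosen for the example; the slide does not say which richness measure is meant.

```python
import re
from collections import Counter

def vocabulary_profile(text):
    """Return (type-token ratio, frequency spectrum) for a text.

    The spectrum maps i to the number of distinct words that occur
    exactly i times.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    spectrum = dict(Counter(counts.values()))
    ratio = len(counts) / len(tokens) if tokens else 0.0
    return ratio, spectrum

ratio, spectrum = vocabulary_profile("the cat and the dog and the bird")
print(ratio)     # share of distinct words among all words
print(spectrum)  # words seen once, twice, three times
```

The type-token ratio depends on text length, so samples of similar size should be compared.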
Method 4:
Content Analysis
   Tabulates the frequency of types of
    words in a text
   Aims to reach the denotative or
    connotative meaning of the text
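
The tabulation step can be sketched as follows in Python. The category dictionary `CATEGORIES` is entirely hypothetical; an actual content analysis would rely on a carefully built dictionary of word types.

```python
import re

# Hypothetical word categories, for illustration only.
CATEGORIES = {
    "emotion": {"love", "hate", "fear"},
    "money": {"gold", "coin", "debt"},
}

def category_counts(text, categories=CATEGORIES):
    """Count how many tokens in the text fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        name: sum(1 for token in tokens if token in words)
        for name, words in categories.items()
    }

print(category_counts("Love of gold breeds fear and debt"))
```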
Method 5:
Neural Networks
   Recognize the underlying organization of
    data, which is vitally important for any
    pattern recognition problem, and
    stylometry is one
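
To make the pattern recognition idea concrete, here is a minimal Python sketch using a single-neuron perceptron, the simplest kind of neural model, rather than a full network. The feature vectors (two made-up function word rates per text) and the author labels 0 and 1 are invented for illustration and do not come from any real study.

```python
def train_perceptron(samples, epochs=20, rate=1.0):
    """Train a single neuron on (features, label) pairs; labels are 0 or 1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            error = label - predict(weights, bias, features)
            # Weights move only when the prediction is wrong.
            weights = [w + rate * error * x for w, x in zip(weights, features)]
            bias += rate * error
    return weights, bias

def predict(weights, bias, features):
    """Return 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# Hypothetical training data: author 0 favors the first word, author 1 the second.
training = [
    ((0.9, 0.1), 0),
    ((0.8, 0.2), 0),
    ((0.1, 0.9), 1),
    ((0.2, 0.8), 1),
]
weights, bias = train_perceptron(training)
print(predict(weights, bias, (0.85, 0.15)))  # unseen text resembling author 0
print(predict(weights, bias, (0.15, 0.85)))  # unseen text resembling author 1
```

A single neuron can only separate groups with a straight line; multi-layer networks are used when the stylistic patterns are less clear-cut.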
Past Uses—
Scholarly
   Did Shakespeare write his own plays?
   Who wrote the Federalist Papers?
Recent Uses—
Literary
   Determine who wrote the anonymously
    published novel Primary Colors [Joe
    Klein]
   Target suspects for the authorship of the
    Unabomber’s Manifesto [Ted Kaczynski]
Future Uses—
Beyond
   Identifying and blocking spam
   Detecting lies and flagging potential
    inconsistencies
   Locating authors of malicious code
References
   Ephratt, Michal. Authorship attribution - the case of lexical
    innovations.
    http://www.cs.queensu.ca/achallc97/papers/p006.html
   Gerritsen, Corey M. Authorship Attribution Using Lexical
    Attraction.
    http://genesis.csail.mit.edu/papers/Gerritsen2003.pdf
   Holmes, David I. Stylometry: Its Origins, Development
    and Aspirations.
    http://www.cs.queensu.ca/achallc97/papers/s004.html
   Pfleeger, Charles P. and Shari Lawrence Pfleeger.
    Security in Computing. Pg 342.

				