Authorship Attribution
By Allison Pollard

What is Authorship Attribution?
- The process of determining who wrote a text when its authorship is unclear.
- It is useful when two or more people claim to have written something, or when no one is willing (or able) to say that he or she wrote the piece.

The Basis
- A text makes use of all linguistic domains: semantics, syntax, lexicography, phonology (orthography), and morphology.
- Each of these domains is rule-governed, yet within these rules and among the components, the grammar offers the writer choices.
- The text as an end product is the outcome of the particular choices made by its author.
- This is why each text carries the fingerprints of its creator.

The Assumptions
- There is a single, specific author.
- There are choices to be made.
- The author is consistent in his or her preferred choices.
- These choices are present, and can be detected, in all end products of that author.

Computerized Analysis
- Developed in the 1980s
- Based on stylometry: the statistical analysis of literary style (quantifying some of the features of an author’s style)

Method 1: Word or Sentence Length
- The origin of stylometry
- First developed in 1887, later extended in 1938
- NOT reliable methods

Method 2: Function Words
- Relies on word usage and context-free (“function”) words
- Analyzes the frequency, position, or immediate context of words
- A criticized method: it cannot reliably distinguish between certain types of literature

Method 3: Vocabulary Distributions
- Measures the “richness” or “diversity” of an author’s vocabulary
- Analyzes the frequency profile of word usage to glimpse the extent of the author’s vocabulary

Method 4: Content Analysis
- Tabulates the frequency of types of words in a text
- Aims to reach the denotative or connotative meaning of the text

Method 5: Neural Networks
- Recognize the underlying organization of data, which is vitally important for any pattern-recognition problem, stylometry included

Past Uses: Scholarly
- Did Shakespeare write his own plays?
- Who wrote the Federalist Papers?

Recent Uses: Literary
- Determine who wrote the anonymously published novel Primary Colors [Joe Klein]
- Target suspects for the authorship of the Unabomber’s Manifesto [Ted Kaczynski]

Future Uses: Beyond
- Identifying and blocking spam
- Detecting lies and flagging potential inconsistencies
- Locating authors of malicious code

References
- Ephratt, Michal. Authorship attribution - the case of lexical innovations. http://www.cs.queensu.ca/achallc97/papers/p006.html
- Gerritsen, Corey M. Authorship Attribution Using Lexical Attraction. http://genesis.csail.mit.edu/papers/Gerritsen2003.pdf
- Holmes, David I. Stylometry: Its Origins, Development and Aspirations. http://www.cs.queensu.ca/achallc97/papers/s004.html
- Pfleeger, Charles P. and Shari Lawrence Pfleeger. Security in Computing. p. 342.
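
Supplementary sketch: Methods 2 and 3 in code

As a rough illustration of Method 2 (function words) and Method 3 (vocabulary distributions), the Python sketch below computes a function-word frequency profile and a simple type-token ratio, then picks the candidate author whose known sample has the closest profile. It is only a minimal sketch under stated assumptions, not the method used in any of the cited works: the short FUNCTION_WORDS list, the tokenizer, the absolute-difference distance, and all function names are hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption of this
# sketch); a real study would use a much longer, carefully chosen list.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "by", "upon", "on")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_profile(text):
    """Method 2: relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {word: counts[word] / total for word in FUNCTION_WORDS}


def type_token_ratio(text):
    """Method 3: distinct words divided by total words (a crude richness measure)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def profile_distance(p, q):
    """Sum of absolute differences between two function-word profiles."""
    return sum(abs(p[word] - q[word]) for word in FUNCTION_WORDS)


def closest_author(disputed, candidates):
    """Return the candidate name whose known sample has the nearest profile.

    `candidates` maps an author name to a sample of that author's known writing.
    """
    target = function_word_profile(disputed)
    return min(
        candidates,
        key=lambda name: profile_distance(
            target, function_word_profile(candidates[name])
        ),
    )
```

For example, given known samples from two candidates, `closest_author(disputed_text, {"A": sample_a, "B": sample_b})` returns the name with the smaller profile distance. Short texts give unstable frequencies, and the type-token ratio depends on text length, so results from a sketch like this should be treated as suggestive at most.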