Interactive Task of the TREC Legal Track:
Theory meets Practice
Making the world better for lawyers

                         Douglas W. Oard
                    College of Information Studies and
                 Institute for Advanced Computer Studies
                   University of Maryland, College Park

Joint work with Jason Baron (NARA), Bruce Hedin (H5), Stephen Tomlinson (Open Text)
[Figure: a search request concerning Tobacco Policy, applied to 32 million White House emails]
National Archives hired 25 [what was hired appears only in the lost image] for 6 months …
Federal Rules of Civil Procedure
Rule 26(f): At the parties’ planning meeting,
issues expected to be discussed include:

– “Any issues relating to disclosure or discovery of
  electronically stored information, including the
  form or forms in which it should be produced”

– “Any issues relating to preserving discoverable

Judge Grimm, writing for the
U.S. District Court for the District of Maryland

 “all keyword searches are not
 created equal; and there is a
 growing body of literature that
 highlights the risks associated
 with conducting an unreliable or
 inadequate keyword search”

 Victor Stanley, Inc. v. Creative Pipe, Inc.,
 ---F.Supp.2d---, 2008 WL 2221841, * 3 & n.9 (D. Md. May 29, 2008)
 The Design Space


“Features”              “Specification”

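      What Does “Better” Mean?
[Figure: two curves, “Baseline” Technique and “Better” Technique, with points A and D marked.
 Vertical axis: label truncated, ends in “relevant documents)”.
 Horizontal axis: INCREASING EFFORT (time, resources expended, etc.)]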
            Other Desiderata
• Two-party
  – Negotiated information needs

• Comprehensive
  – “Smoking gun detection” + completeness

• Justifiable
  – Quantifiable comparison to present practice

• Affordable
  – Minimize amount of human review
Text Retrieval Conference (TREC)
• Goals
  – Foster development of research communities
  – Create “benchmark” evaluation resources
  – Establish baseline results

• History
  – Sponsored by NIST since 1992
  – “Legal Track” started in 2006; E-Discovery focus
  – Annual evaluation cycle
Evaluation Design


   Interactive Task
2008 Interactive Task Participants
4 research teams submitted 7 runs

Each run: YES/NO for all 7 million documents
          for a single production request

          Clearwell Systems
          University at Buffalo
          University of Pittsburgh
 “Complaint” and “Production Request”
…12. On January 1, 2002, Echinoderm announced record results for the prior year, primarily
attributed to strong demand growth in overseas markets, particularly China, for its products. The
announcement also touted the fact that Echinoderm was unique among U.S. tobacco companies
in that it had seen no decline in domestic sales during the prior three years.
13. Unbeknownst to shareholders at the time of the January 1, 2002 announcement, defendants
had failed to disclose the following facts which they knew at the time, or should have known:
a. The Company's success in overseas markets resulted in large part from bribes paid to foreign
government officials to gain access to their respective markets;
b. The Company knew that this conduct was in violation of the Foreign Corrupt Practices Act and
therefore was likely to result in enormous fines and penalties;
c. The Company intentionally misrepresented that its success in overseas markets was due to
superior marketing.
d. Domestic demand for the Company's products was dependent on pervasive and ubiquitous
advertising, including outdoor, transit, point of sale and counter top displays of the Company's
products, in key markets. Such advertising violated the marketing and advertising restrictions to
which the Company was subject as a party to the Attorneys General Master Settlement
Agreement ("MSA").
e. The Company knew that it could be ordered at any time to cease and desist from advertising
practices that were not in compliance with the MSA and that the inability to continue such
practices would likely have a material impact on domestic demand for its products. …
      All documents which describe, refer to, report on, or mention
      any “in-store,” “on-counter,” “point of sale,” or other retail
      marketing campaigns for cigarettes.
          ~7 Million Documents
Scanned: [document image]

OCR:
            Philip Moxx's. U.S.A. x.dr~am~c.
            cvrrespoaa.aa
            Benffrts Departmext Rieh>pwna, Yfe&ia
            Ta: Dishlbutfon Data aday 90,1997.
            From: Lisa Fislla
            Sabj.csr CIGNA WeWedng Newsbttsr -
            Yntsre StratsU
            During our last CIGNA Aatfoa Plan
            meadng, tlu iasuo of wLetSae to i0op
            artieles aod discontinue mndia6 CIGNA
            Well-Being aawslener to om employees
            was a
            msiter of disanision . I Imvm done
            somme reaearc>>, and wanted to
            pruedt you with my
            Sadings and pcdiminary
            recwmmeadatioa for PM's atratezy
            Ieprding l4aas aewelattee* .
            I believe .vayone'a input is valusble, and
            would epproolate hoarlng fmaa aaeh of
            you on
            whetlne you concur with my

Metadata:
            Title: CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY
            Organization Authors: PMUSA, PHILIP MORRIS USA
            Person Authors: HALLE, L
            Document Date: 19970530
            Document Type: MEMO,
            Bates Number: 2078039376/9377
            Page Count: 2
            Collection: Philip Morris
      Relevance Assessment
• Volunteer assessors
  – Mostly from 13 law schools

• Web-based assessment system
  – Based on document images + metadata
Estimating Retrieval Effectiveness

         67% relevant in this region

         33% relevant in this region
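
The two regional rates above suggest how sampling can stand in for judging every document. What follows is only a minimal sketch of that idea, not the track's actual estimator: the region names, region sizes, and sample counts are hypothetical, chosen so the sampled rates come out at 67% and 33%.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str      # hypothetical label for a region of the collection
    size: int      # documents in the region (hypothetical)
    sampled: int   # documents drawn at random and assessed (hypothetical)
    relevant: int  # sampled documents judged relevant (hypothetical)


def estimate_relevant(regions):
    """Scale each region's sampled relevance rate up to the region's size."""
    return sum(r.size * r.relevant / r.sampled for r in regions)


regions = [
    Region("region 1", size=3000, sampled=300, relevant=201),  # 67% relevant
    Region("region 2", size=6000, sampled=300, relevant=99),   # 33% relevant
]
estimated_total = estimate_relevant(regions)
print(f"Estimated relevant documents: {estimated_total:.0f}")
```

Only 600 documents are assessed in this toy setup, yet the estimate covers all 9,000; the price is sampling error, which a fuller treatment would report as a confidence interval.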
   Everyone Gets High Precision

Precision = RelRet / Ret

Recall = RelRet / Rel

  Rel Ret
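
The two ratios on this slide can be computed directly when the relevant set is known. The sketch below is illustrative only: the document identifiers and set contents are made up, and in a collection of millions of documents Rel would itself have to be estimated by sampling rather than counted.

```python
def precision_recall(retrieved, relevant):
    """Return (RelRet / Ret, RelRet / Rel) for two sets of document IDs."""
    rel_ret = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = rel_ret / len(retrieved) if retrieved else 0.0
    recall = rel_ret / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 8 relevant documents, 4 retrieved, 3 of them relevant.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
retrieved = {"d1", "d2", "d3", "d9"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy case most of what is retrieved is relevant, but most of what is relevant is missed, which is why a high precision figure alone says little about comprehensiveness.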
Interaction Time Effect
[Figure panels: High OCR-accuracy documents only; All documents]
        Takeaway Messages
• Leverage guided interactive refinement
  – Factor of two in comprehensiveness

• Vibrant research community
  – 22 research teams in 7 countries

• Unique test collection
  – Sampling for “recall-oriented” evaluation
      Some Useful References
• TREC Legal Track
  – Papers at
  – Mailing list (contact

• DESI-3 Workshop on
  “Global E-Discovery and E-Disclosure”
  – June 8, 2009 in Barcelona
