					  Expert Finding
         =
  Finding People
         +
Assessing Expertise
   Arjen P. de Vries
   arjen@acm.org

    Future Challenges in Expertise Retrieval,
   SIGIR 2008 Workshop, July 24th, Singapore
        Real life expert finding
• Information need (from email request
  originally sent to ILLC Amsterdam):

 ... to develop an algorithm that enables us to
 enrich unstructured content (XML) with
 semantics and metadata using intelligent
 software, where the objective is to minimize
 the number of manual actions for the user.
         Real life expert finding
• Asks an intern:

  Could you let me know whether you know
  someone who would like to do this internship,
  or whether you could find someone through
  your network.
        Real life expert finding
• Intern answers:
  ...
  There is also another department that does
  research on pattern recognition. Maybe
  someone there could place a student?
         Real life expert finding
• Marcel Reinders (colleague):

  I am forwarding this request to Arjen de
  Vries. The topic you propose is very much in
  line with his research field and interests, so
  I do see points of connection. I will also
  forward it to Emile Hendriks. He is the
  student coordinator within our group and can
  interest students in this assignment.
         Real life expert finding
• Email to me:

 ... I have kept the term “pattern recoginition”
 [sic], but I am not sure whether this is
 correct. Could you tell me what the usual
 terms are for research of this kind?
            What do we learn?
• People who need an expert do not know what
  to ask for
  – 'pattern recognition'
  – 'unstructured data (XML)'
• People (try to) search for expertise by referral,
  via trusted people
• Candidate mention does not necessarily imply
  expertise
  – MSc coordinator
                  Motivation
• Find staff with the right expertise
  – Increasing productivity and pay-off
• Improve account management
• Tighten social networks




                                          Source: Teezir.nl
                       Case Study
• IBM’s ‘Professional Marketplace’
  – Fulfillment rates have improved, with
    engagements staffed 20% faster and better
    matched to requested qualifications
  – A nearly 10% decrease in the use of
    subcontractors due to better utilization
  – Improved efficiencies have saved IBM US$500
    million thus far [June 2006]

      See also http://www-306.ibm.com/software/success/cssdb.nsf/CS/LJKS-6RMJZS
      and http://endeca.com/corporate-info/press-room/nr/n_072005_wsj.html
   Motivation for System Support
• Busy experts do not have time to maintain
  adequate descriptions of their continuously
  changing specialized skills
• Expert seekers have poorly articulated
  requirements and are not well equipped to
  tell a good expert from a bad one
            Complicating factors
• Volume of communication/publication is not a
  reliable indication of expertise
• Certain topics engender more opinion than facts
• Lack of information about past performance of
  experts
• New employees don’t know about informal social
  networks
• Access to expertise is often controlled (informally or
  formally, by the experts or their management)
• Solutions to complex problems require diverse
  ranges of expertise
            Evidence of Expertise
• Content
  – Email or bulletin board messages
  – Corporate communications
  – Shared folders in file system
  – Resumes and homepages
  – Employee database
• Social networks
  – Email flow
  – Bibliographic information
  – Software library usage
• Activities
  – Search and publication history
  – Project time charges

           See also bibliography on TREC-ENT wiki:
           http://www.ins.cwi.nl/projects/trec-ent/wiki/index.php/Bibliography
                Assumptions
• Content
  – Experts are mentioned in relevant documents
  – Experts author relevant documents
• Social networks
  – People that interact are likely to share expertise
  – Evidence in records of information exchange (and
    co-authorship, co-work) and/or organizational
    structure
  Document-based Expert Finding
• Find and score documents about the topic
  – Title about topic
  – Abstract about topic
• Aggregate scores for each distinct author
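
The two steps above can be sketched in a few lines of Python (the function and data names are illustrative, not taken from any particular system):

```python
# Minimal sketch of document-based expert finding:
# 1. retrieve documents matching the topic, each with a relevance score;
# 2. credit each document's score to its authors;
# 3. rank candidates by their aggregated score.
from collections import defaultdict

def rank_experts(topic_scores, authorship):
    """topic_scores: {doc_id: relevance score for the topic}
    authorship: {doc_id: list of candidate names}"""
    expert_score = defaultdict(float)
    for doc, score in topic_scores.items():
        for person in authorship.get(doc, []):
            expert_score[person] += score
    # Highest aggregated score first
    return sorted(expert_score.items(), key=lambda kv: -kv[1])

# Example: two relevant documents, one shared author.
ranking = rank_experts(
    {"d1": 0.9, "d2": 0.4},
    {"d1": ["ann"], "d2": ["ann", "bob"]},
)
# "ann" (credited for both documents) ranks above "bob"
```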
           Additional Techniques
                  Research Systems

• Combine the two basic approaches
• Estimate the quality of the evidence
• Use of collection/structural knowledge
  – Treat emails different from documents
  – Treat email’s subject/sender/receiver different
    from body
  – Locate homepages
• Use social network extracted from co-
  authorship or email lists
                              See also TREC proceedings 2005-2007
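
As a sketch of the last point, a co-authorship network can be built from authorship records and used to let a candidate inherit some expertise evidence from co-authors (the propagation rule and the `alpha` mixing weight are illustrative assumptions, not a specific TREC system):

```python
# Hedged sketch: build a co-authorship graph, then do one-step score
# propagation so candidates inherit credit from their co-authors.
from collections import defaultdict
from itertools import combinations

def coauthor_graph(authorship):
    """authorship: {doc_id: list of authors} -> {person: set of co-authors}"""
    graph = defaultdict(set)
    for authors in authorship.values():
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def propagate(scores, graph, alpha=0.8):
    """Mix each candidate's own score with the mean score of neighbours."""
    out = {}
    for p in set(scores) | set(graph):
        nbrs = graph.get(p, set())
        nbr_mean = sum(scores.get(n, 0.0) for n in nbrs) / len(nbrs) if nbrs else 0.0
        out[p] = alpha * scores.get(p, 0.0) + (1 - alpha) * nbr_mean
    return out

# Example: "bob" never authored a relevant document himself,
# but gets some credit through co-authorship with "ann".
graph = coauthor_graph({"d1": ["ann", "bob"], "d2": ["bob", "carol"]})
scores = propagate({"ann": 1.0}, graph, alpha=0.5)
```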
          Additional Techniques
               Commercial Systems

• Self-declarations of expertise
• Metadata




                   M. Maybury, Expert Finding Systems, MITRE Technical
                   Report MTR 06B000040, 2006
              Key Requirements
• Identify experts via self-nomination and/or
  automated analysis of expert communications,
  publications, and activities
• Classify the type and level of expertise of individuals
  and communities
• Validate the breadth and depth of expertise of an
  individual
• Recommend experts, including the ability to rank
  order experts on multiple dimensions including skills,
  experience, certification and reputation
             Current systems
• Hardly validate the breadth and depth of
  expertise
  – Count mentions
  – Weight with relevance score
  – Sometimes weight with authority of document
    containing candidate mention
• Do not really attempt to classify the type and
  level of expertise
          Evidence of Expertise
• Information about true expertise is often not
  explicit in artifacts (as opposed to factual
  knowledge)
• Information about expertise is expressed using
  specialized terms and concepts
              How to improve?
• Integrate more sources of evidence
  – CV information
  – Project related data
     • Including temporal information
  – Training data (HR dept)
• Cost of acquiring this evidence for an expert vs. a
  non-expert as a weighting factor
  – Participation in TREC, authoring a book, getting a
    PhD in IR, ...
                                    Raymond D'Amore, Expert Finding in
                                         Disparate Environments, 2008
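One way to read this cost-based weighting is as a scoring rule in which evidence that is expensive for a non-expert to produce counts more than cheap evidence such as a single document mention. The evidence types and cost values below are invented for illustration only:

```python
# Illustrative cost-weighted expertise score: each piece of evidence
# contributes its relevance weighted by how costly it is to fake.
EVIDENCE_COST = {           # hypothetical relative costs
    "document_mention": 1.0,
    "trec_participation": 5.0,
    "authored_book": 8.0,
    "phd_in_topic": 10.0,
}

def weighted_expertise(evidence):
    """evidence: list of (evidence_type, relevance score in [0, 1])."""
    return sum(EVIDENCE_COST.get(kind, 1.0) * rel for kind, rel in evidence)

score = weighted_expertise([
    ("document_mention", 0.9),   # cheap evidence, small contribution
    ("trec_participation", 1.0), # costly evidence, large contribution
])
# 0.9 * 1.0 + 1.0 * 5.0 = 5.9
```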
                  However...
• Two types of challenges to be overcome:
  – System challenge
  – Evaluation challenge
            System Challenges
• Multi-lingual entity extraction
• Privacy management
  – E.g., Tacit can email top N experts with private
    profiles (only recipient knows)
• Interoperability with heterogeneous data
  sources
  – IMAP, Exchange, Lotus Notes
  – LDAP, JDBC/ODBC, XML repositories, Peoplesoft,
    Oracle Financials, Word/Excel/PDF, ...
           Where is my data?
• > 80% of data not in relational databases
  – Documents, spreadsheets, presentations
  – Web pages
  – Email, instant messages, news feeds
  – Images, audio, video
                       Dataspaces
• The complete set of information belonging to
  one organization or task
• Examples:
  – Personal dataspace
  – Enterprise dataspace
  – Community dataspace
     • E.g., scientific, sports club, ...


             “From Databases to Dataspaces: A New Abstraction for Information Management”,
                Michael Franklin, Alon Halevy, David Maier, SIGMOD Record, December 2005.
  Dataspace Support Platform (DSSP)
• Deal with the complete dataspace
  – Not only content that was explicitly entered
• Data co-existence
  – DSSP may not assume full control over the data
• Pay-as-you-go services
  – Keyword search as the bare minimum
  – Incremental schema integration for more
    advanced querying functionality
           “From Databases to Dataspaces: A New Abstraction for Information Management”,
              Michael Franklin, Alon Halevy, David Maier, SIGMOD Record, December 2005.
          Evaluation Challenge
• How to get realistic data for research
  evaluation?
• So far: build corpora from externally facing
  parts of company intranets
              W3C Limitations
• Topics and assessments developed by
  researchers instead of users
• Difficult to situate information needs in an
  organization you are not part of –
  representative of new members?
• Public e-mail lists; no private/personal
  communication, and receivers unknown
               CERC Limitations
• Real data, but...
• TREC has to use externally-facing data
  – Publicly available pages from csiro.au only
  – Can’t address security, databases, click data
     • No candidate list!
  – Participants do the assessments
         UvT Expert Collection
• Based on a publicly accessible database of UvT
  employees who are involved in research or
  teaching
• ~38,000 documents
  – Research descriptions
  – Course descriptions
  – Publications
  – Personal homepages

                        Balog et al., Broad Expertise Retrieval in Sparse
                        Data Environments, SIGIR 2007
          UvT Expert Collection
• Clean, heterogeneous, structured, and
  focused (but comprises a limited number of
  documents)
• Contains hierarchical information from an
  organization (institutes and faculties)
• List of expertise areas provided by the
  employees themselves
  – http://ilk.uvt.nl/uvt-expert-collection/


                         Balog et al., Broad Expertise Retrieval in Sparse
                         Data Environments, SIGIR 2007
           Conclusions so far...
• Expert finding could in principle use many more
  resources that indicate expertise, possibly
  more reliably, but it is difficult to set up the
  research
  – System challenges
  – Data availability
• Motivates research in operational setting
  – E.g., Raymond D'Amore, Expert Finding in Disparate
    Environments, 2008
            ‘In Situ’ Evaluation
• Advantages
  – Evaluation on live search system
  – Click-through data useful in inferring preference,
    even in the absence of explicit user judgements.
• Limitations
  – Experimenter effect
  – Experiments not repeatable
  – A is better than B, but by how much? Multi-way
    comparisons?

                     Thomas and Hawking, Evaluation by comparing result
                     sets in context, CIKM 2006
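
The click-through idea can be made concrete with a minimal pairwise comparison in the spirit of Thomas and Hawking's side-by-side evaluation (the interleaving representation and decision rule here are a simplified illustration, not their exact method):

```python
# Illustrative sketch: infer a preference between systems A and B from
# clicks on an interleaved result list. Whichever system contributed
# more of the clicked results "wins" this impression.
def click_preference(interleaving, clicks):
    """interleaving: {rank: 'A' or 'B'}; clicks: set of clicked ranks."""
    a = sum(1 for r in clicks if interleaving.get(r) == "A")
    b = sum(1 for r in clicks if interleaving.get(r) == "B")
    return "A" if a > b else "B" if b > a else "tie"

# Example: ranks 1 and 3 came from A, ranks 2 and 4 from B;
# the user clicked results 1 and 3, so A wins this impression.
pref = click_preference({1: "A", 2: "B", 3: "A", 4: "B"}, {1, 3})
```

Note the slide's limitation still applies: such comparisons say A beat B on an impression, not by how much.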
   Alternative benchmark model?
• Standard benchmark distributes the data +
  queries + relevance information
• Q: Could we distribute the systems instead?
  – Need organization(s) willing to evaluate in own
    intra-net, with their own users
  – Need participants with robust systems that run off-
    site, unmonitored...
                What else?
• Advanced applications of expert finding
  technology
• Alternative people finding applications
• Where has the referral disappeared to?
        Advanced Applications:
          Discover Expertise
• Rapidly locating individuals or communities of
  expertise to accelerate R&D
• Rapid formation of operational or proposal
  teams
• Support formation of cross-disciplinary teams
  to respond to new market threats and
  opportunities

                    M. Maybury, Expert Finding Systems,
                    MITRE Technical Report MTR 06B000040, 2006
          Advanced Applications:
             Assess Expertise
•   Assess enterprise skill sets
•   Enable identification of atrophy
•   Discover new and emerging skill areas
•   Predict effects of skill loss
    (attrition/retirement) or gain
    (merger/acquisition)
        Related Applications
  Entity Ranking & Expert Finding
• Expert finding as a special case of entity
  retrieval?
  – queries specify ‘expertise on T’
• XER topics with ‘People’ as core category
  – Artists related to Pablo Picasso
  – Actors who played Hamlet
  – 2007 XER test collection includes topics searching
    for presidents, tennis players and composers
              Types of Expertise
•   Technical [W3C]
•   Key people [CERC]
•   Developer [Alonso et al.]
•   Stakeholder [Arguello and Callan]
•   Bloggers [Balog]
•   Reviewer
•   Consultant
•   Plumber/Painter/...
                People Search
• Crucial step in social search?
  – Typed social links?
           Where's the referral
• Social network structure and its role in expert
  finding
• Interactive expert finding?

				