Expressing Faceted Subject Index

FACETED SEMANTIC SUBJECT ANNOTATION SYSTEM

   Anand Kumar Pandey
   Junior Research Fellow
   Documentation Research & Training Centre
   Indian Statistical Institute, Bangalore, India
COMMENTARY

 “Faceted classification [is] one of the most powerful, yet
  least understood, methods of organizing information.”
  Peter Merholz: Innovations in Classification.
  http://www.peterme.com/archives/00000063.html

 “I personally find the term facet to be confusing. I prefer the terms
  attributes and attribute values. These terms are used in both the
  database world and the artificial intelligence world, to describe a
  very similar functionality, sometimes the exact same functionality.”
  Re: SIGIA-l Faceted approach applied to content
  From: Donna M. Fritzsche
  Date: Fri Nov 14 2003 - 13:54:23 EST
  http://www.info-arch.org/lists/sigia-l/0311/0161.html

 “Faceted classification serves up multiple ‘pure’ classification
  schemes rather than a single ‘motley’ Taxonomy.”
  Rosenfeld, L; Morville, P (2002). Information Architecture for the World Wide Web.
  2nd Ed. Cambridge, MA: O’Reilly.

 “My complaint is that there is a lot of talk about facets, but little of any substance.
  Most of it won't help you build your own faceted classification scheme. It amounts
  to saying the grass is greener on the other (faceted) side, but fails to give you a
  map explaining how to get there and what obstacles you'll face along the way. And
  the academic literature doesn't help much either. It's too dense and I can't
  recommend it to the practitioner (not the stuff I've seen).”
  May 27, 2004
  Gordon Luk: http://www.getluky.net/archives/000052.html making reference to
  Christina Wodtke’s posting on her blog: Elegant Hack
  http://www.eleganthack.com/MT/mt-tb.cgi/2

OVERVIEW
 Present state of document annotation
 Alternative approach
 Discussion about facets
 Faceted subject indexing: POPSI
 Elementary categories of POPSI
 SKOS (Simple Knowledge Organization System)
 Model of the proposed Faceted Semantic
  Annotation System
 Conclusion


PRESENT STATE OF DOCUMENT ANNOTATION

   At present, subject metadata are assigned
    to express the subject of the document.
       Limitation: the assigned terms are not always given in context.

   For example, in KIM, Named Entities (NEs) are
    identified and relationships among them are established.
       Limitation: the meaning of an NE may change with the
        context in which it is used in a document, e.g.
         Plant in agriculture
         Steel plant
ALTERNATIVE APPROACH
   Represent the basic constituent elements of
    the subject content; in other words, provide
    context for the keywords.




EFFICIENT INFORMATION RETRIEVAL
LANGUAGE

   An efficient information retrieval language should be capable of:
     •   Dealing with the complex structure of knowledge
     •   Sequencing a set of selected terms
         according to their probable relevance to a particular topic
     •   Contextualizing concepts
     •   Helping the searcher choose the right
         keywords for searching
     •   Combining searching and browsing facilities so that they work
         in co-ordination
    Vickery, B.C. (2006). “Structure and Function in Retrieval Language”, Journal of
    Documentation, Vol. 62 No. 1, pp. 7-20

WHY A FACETED SUBJECT INDEXING
LANGUAGE?

It uses the faceted classification structure, which:
o Uses logical structure to organize……

o Uses a standard set of categories to analyze
  concepts; these categories are not locked but
  are free to combine with one another
o Breaks free from the restriction of traditional
  classification to hierarchical, genus-species
  relations. By combining terms in compound
  subjects it introduces new logical relations
  between them, thus better reflecting the
  complexity of knowledge
WHAT IS THE “FACET”?
 “A generic term used to denote any component - be it
  a basic subject or an isolate - of a Compound
  subject... Facets inhere in the subjects themselves,
  whether we sense them or not”. – S. R. Ranganathan
 A homogeneous group or category derived according
  to the principles of facet analysis




WHAT IS THE “FACET”?
Near synonyms:
small components of larger entities/units, properties,
attributes, characteristics, categories, classes, groups,
concepts, and dimensions


By analogy, facets are the flat faces of a diamond,
which reflect the underlying symmetry of the crystal
structure.


QUICK RECIPE FOR BUILDING A FACETED
CLASSIFICATION
 Define the subject field: which entities are of
  interest to the intended users of the system?
 Formulate the facets: sort the terms and arrange
  them in homogeneous groups, known as facets
 Structure each facet, following the postulates
  and principles given by Ranganathan
 Arrange the facets




BUILDINGS
FACET ANALYSIS
“Fundamental concepts are analyzed and grouped together as
  facets” (following the principles and postulates given by
  Ranganathan)
                Hunter, E. (2002). Classification Made Simple. Ashgate.

  Building Facets
       Location
       Composition
       Purpose
       Date/Period constructed
       Performance
       Style
       Associated persons
       etc.
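
A minimal sketch of how the building facets above could be used, assuming each building is described by one value per facet. The facet names come from the slide; the three records, their values, and the helper names `facet_values` and `select` are hypothetical and only illustrate combining facets freely in a query.

```python
from collections import defaultdict

# Hypothetical sample records: facet names are from the slide, values are invented.
BUILDINGS = {
    "b1": {"Location": "Bangalore", "Composition": "Stone", "Purpose": "Library"},
    "b2": {"Location": "Bangalore", "Composition": "Brick", "Purpose": "Residence"},
    "b3": {"Location": "Mysore", "Composition": "Stone", "Purpose": "Residence"},
}


def facet_values(records):
    """Collect the distinct values observed under each facet."""
    values = defaultdict(set)
    for facets in records.values():
        for facet, value in facets.items():
            values[facet].add(value)
    return {facet: sorted(vals) for facet, vals in values.items()}


def select(records, **wanted):
    """Return the ids of records that match every requested facet value."""
    return sorted(
        record_id
        for record_id, facets in records.items()
        if all(facets.get(facet) == value for facet, value in wanted.items())
    )


if __name__ == "__main__":
    print(facet_values(BUILDINGS))
    print(select(BUILDINGS, Location="Bangalore", Composition="Stone"))
```

Because no facet is subordinate to another, a query may name any subset of facets in any order, which is the freedom of combination that a single hierarchy does not offer.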

WHAT IS FACETED SUBJECT
INDEXING?
o   Subject indexing is the technique that
    indicates the location of resources according
    to their specific subject. It has a two-fold job:
    o Translating the name of the subject of the document
      (natural language, NL) into the preferred artificial
      language of the system
    o Translating the user's queries (NL) into the system's
      language
o   Faceted subject indexing is a system that
    uses facet-analytic theory to bring
    context into the indexing system.

POSTULATE BASED PERMUTED SUBJECT
INDEXING (POPSI)
It is a generalized model for representing the
  thought content of an information resource, as
  well as for modelling a particular subject domain.

It consists of:
o Four elementary categories (fundamental
  categories)
o Modifiers


  Bhattacharyya, G. (1979), "POPSI: its fundamentals and procedure
  based on a general theory of subject indexing languages", Library
  Science with a Slant to Documentation, Vol. 16 No. 1, March, pp. 1-34.
ELEMENTARY CATEGORIES OF POPSI
 Discipline
  Covers the conventional fields of study, or
  any aggregate of such fields
 Entity
  Covers any manifestation that is the core
  of the subject, whether concrete or abstract, as
  contrasted with its properties or the actions
  performed on or by it



ELEMENTARY CATEGORIES OF POPSI
 Action
  Covers manifestations denoting the
  concept of doing, including the processes and steps
  of doing. An action may be a self action or an external
  action.
 Property
  Covers manifestations denoting the
  concept of attribute



MODIFIERS IN POPSI
Modifiers are divided into two categories:
o Dependent modifiers
o Independent modifiers / common isolates

o   Dependent modifiers are used in conjunction
    with the elementary categories to
    sharpen a particular facet.
     For example: Romantic in Romantic Love
                  Infectious in Infectious Disease

COMMON MODIFIERS/COMMON
ISOLATES
These modifiers can modify or
 sharpen any of the elementary categories. Some
 of them are:
o Space modifiers
o Time modifiers
o Language modifiers
o Form modifiers, and so on




TAKING CARE OF THE COMPLEX
SUBJECTS
Phase relations:
o General relation
o Bias relation
o Comparison
    o   Similarity
    o   Difference
o Application relation
o Influence relation


EXAMPLE: 1
In Medical Science, Treatment of Infectious
  Disease of Lungs.

Discipline: Medical Science
Entity: Lung
Property: Infectious Disease
Action: Treatment




EXAMPLE: 2
In Medical Science, A Report on the Treatment of
  Infectious Disease of Lungs in India during 1950-
  1965.
Discipline: Medical Science
Entity: Lung
Property: Infectious Disease
Action: Treatment
Space Modifier: India
Time Modifier: 1950-1965
Form Modifier: Report
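
The analysis in Example 2 can be sketched as a small data structure. This is an illustration only: the class `PopsiAnalysis`, its field names, and the function `permuted_entries` are hypothetical, and the rotation shown (each term in turn as the lead heading, followed by the whole chain as context) is a simplified stand-in for POPSI's actual permutation procedure and notation, which the slides do not spell out. The time span is taken from the subject statement of the example.

```python
from dataclasses import dataclass, field


@dataclass
class PopsiAnalysis:
    """One subject statement analysed into elementary categories and modifiers."""
    discipline: str
    entity: str
    prop: str        # the Property category ("prop" avoids the Python builtin name)
    action: str
    modifiers: dict = field(default_factory=dict)  # e.g. space, time, form

    def chain(self):
        """Return all terms in a fixed order: categories first, then modifiers."""
        terms = [self.discipline, self.entity, self.prop, self.action]
        terms.extend(self.modifiers.values())
        return terms


def permuted_entries(analysis):
    """Simplified permutation: every term leads one entry, the full chain follows."""
    chain = analysis.chain()
    context = ", ".join(chain)
    return [f"{lead}: {context}" for lead in chain]


EXAMPLE_2 = PopsiAnalysis(
    discipline="Medical Science",
    entity="Lung",
    prop="Infectious Disease",
    action="Treatment",
    modifiers={"space": "India", "time": "1950-1965", "form": "Report"},
)

if __name__ == "__main__":
    for entry in permuted_entries(EXAMPLE_2):
        print(entry)
```

Every entry carries the full chain, so a searcher who enters the index under any one term still sees that term in its context.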
EXPRESSING THE POPSI IN SKOS
   SKOS (Simple Knowledge Organization System):

      claims to provide a simple, machine-understandable,
       representation framework for Knowledge Organisation
       Systems (KOS)…

      has the flexibility and extensibility to cope with the
       variation found in KOS idioms…

      is fully capable of supporting the publication and use of
       KOS within a decentralised, distributed, information
       environment such as the world wide (semantic) web.

   Source: http://www.w3.org/2004/02/skos/
SKOS …CONT..
 In scope…
      controlled vocabularies
      thesauri
      taxonomies
      classification schemes
      subject heading systems
 Grey area…
      terminologies (sensu ISO TC37 SC4)
      wordnets
      lexical databases
      synonym rings
      glossaries
      dictionaries
      ‘ontologies’
      ‘folksonomies’
POPSI CLASSES & PROPERTIES (1/2)
                   ElementaryCategory
                   Discipline
                   Entity
                   Property
                   Action

Property Classes
form                        hasProperty     phaseRelation
time                        isPropertyOf    general
subPropertyOf: DAML/Time    hasAction       biasedBy
(TemporalEntity)            isActionOf      influenceBy
place                       hasPart         comparisonWith
subPropertyOf: DAML/Place   isPartOf        similarityWith
                                            differenceWith
                                            application
                                            tool
POPSI CLASSES & PROPERTIES (2/2)
    |-ElementaryCategory
    | |-Discipline
    | |-Entity
    | |-Property
    | |-Action
    | |-Form
    | |-Environment
    | |-place
    | |-Time
    |-modifier
    | |-type
    | |-discipline (hasDiscipline, isDisciplineOf)
    | |-entity (hasEntity, isEntityOf)
    | |-property (hasProperty, isPropertyOf)
    | |-action (hasAction, actionOn)
    | |-phaseRelation
    | | |-general
    | | |-bias (biased, biasing)
    | | |-influence (influenced, influencing)
    | | |-comparison (comparedWith)
    | | |-difference (differencedBy, differencing)
    | | |-application
    | | |-tool
Facetizing Concepts
   (Discipline) Medicine,
   (Entity) Human body,
   (property of Entity) disease,
   (action on property) treatment,
   (type of action) radiation therapy,
   (Entity of action) X-ray,
   (method of action) treatment using Rotation technique,
   (action of action) determination,
   (application of action) depth dose,
   (tool of action) Ionized packet chamber
POPSI IN RDF
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
    xmlns:popsi="http://drtc.isibang.ac.in/~guha/popsi/popsi-skos#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://hdl.net/1849/234">
    <popsi:ElementaryCategory>
      <skos:OrderedCollection>
        <popsi:Discipline>Medicine</popsi:Discipline>
        <popsi:Entity>Human Body</popsi:Entity>
        <popsi:Property>Disease</popsi:Property>
        <popsi:hasAction>Treatment</popsi:hasAction>
        <popsi:type>Radiation Therapy</popsi:type>
        <popsi:hasEntity>X-ray</popsi:hasEntity>
        <popsi:application>Rotation Technique</popsi:application>
        <popsi:tool>Ionized packet chamber</popsi:tool>
      </skos:OrderedCollection>
    </popsi:ElementaryCategory>
  </rdf:Description>
</rdf:RDF>
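
A minimal sketch of reading such a record back with Python's standard library, treating it as plain XML rather than going through an RDF parser. The embedded record is a flattened copy made for illustration (the popsi elements sit directly under rdf:Description), and the function name `read_facets` is hypothetical. Since the function walks all descendants of each description, it would also pick up the same elements when they are nested inside a collection wrapper.

```python
import xml.etree.ElementTree as ET

POPSI_NS = "http://drtc.isibang.ac.in/~guha/popsi/popsi-skos#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Flattened copy of the slide's record, kept small for illustration.
RECORD = """<rdf:RDF
    xmlns:popsi="http://drtc.isibang.ac.in/~guha/popsi/popsi-skos#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://hdl.net/1849/234">
    <popsi:Discipline>Medicine</popsi:Discipline>
    <popsi:Entity>Human Body</popsi:Entity>
    <popsi:Property>Disease</popsi:Property>
    <popsi:hasAction>Treatment</popsi:hasAction>
    <popsi:type>Radiation Therapy</popsi:type>
    <popsi:hasEntity>X-ray</popsi:hasEntity>
    <popsi:application>Rotation Technique</popsi:application>
    <popsi:tool>Ionized packet chamber</popsi:tool>
  </rdf:Description>
</rdf:RDF>"""


def read_facets(rdf_xml):
    """Map each described resource URI to its (facet name, value) pairs, in document order."""
    root = ET.fromstring(rdf_xml)
    result = {}
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        uri = description.get(f"{{{RDF_NS}}}about")
        pairs = []
        for element in description.iter():
            in_popsi = element.tag.startswith(f"{{{POPSI_NS}}}")
            text = (element.text or "").strip()
            if in_popsi and text:
                # Strip the "{namespace}" prefix that ElementTree puts on tag names.
                pairs.append((element.tag.split("}", 1)[1], text))
        result[uri] = pairs
    return result


if __name__ == "__main__":
    for uri, pairs in read_facets(RECORD).items():
        print(uri)
        for name, value in pairs:
            print(f"  {name}: {value}")
```

Document order is preserved because the sequence of facets carries meaning in a POPSI chain; a production system would more likely load the record into an RDF store, which this sketch does not attempt.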
GRAPHICAL REPRESENTATION
[Figure: RDF graph for the resource http://hdl.net/1849/234. Nodes: Medicine, Human body, Disease, treatment, Radiation Therapy, X-Ray, Rotation Technique, Ionized Packet Chamber. Edge labels: popsi:Discipline, popsi:Entity, popsi:Property, popsi:hasAction, popsi:typeOf, popsi:hasEntity, popsi:application, popsi:tool.]
FACETED SEMANTIC ANNOTATION SYSTEM
It will consist of two parts:
1. The Classaurus
    It will be arranged in two parts:
     •     Hierarchical display of all the facets, arranged by
           elementary categories and modifier classes
     •     Alphabetical listing of the keywords (word clouds)
2. The Associative Index
    It will be an inverted index of the classaurus facets.
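
The associative index can be sketched as an ordinary inverted index keyed by (category, term) pairs. This is an assumption about one possible shape, not the proposed system's design: the sample annotations, document ids, and the names `build_index` and `lookup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical annotations: document id -> facet category -> term.
ANNOTATIONS = {
    "doc-1": {"Discipline": "Medical Science", "Entity": "Lung",
              "Property": "Infectious Disease", "Action": "Treatment"},
    "doc-2": {"Discipline": "Agriculture", "Entity": "Plant", "Action": "Irrigation"},
    "doc-3": {"Discipline": "Metallurgy", "Entity": "Plant", "Property": "Capacity"},
}


def build_index(annotations):
    """Invert the annotations: (category, lower-cased term) -> set of document ids."""
    index = defaultdict(set)
    for doc_id, facets in annotations.items():
        for category, term in facets.items():
            index[(category, term.lower())].add(doc_id)
    return index


def lookup(index, **facets):
    """Return the documents that carry every requested (category, term) pair."""
    postings = [index.get((category, term.lower()), set())
                for category, term in facets.items()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


if __name__ == "__main__":
    index = build_index(ANNOTATIONS)
    print(lookup(index, Entity="Plant"))
    print(lookup(index, Entity="Plant", Discipline="Agriculture"))
```

Keying on the category as well as the term is what lets the keyword "Plant" be narrowed by its discipline, which is the context problem raised at the start of the presentation.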



FACETED SEMANTIC ANNOTATION SYSTEM
[Figure: system architecture. Components shown: Users / Annotators; Classaurus; Meta Data Pool; SVM; Meta Data Harvester; IRs and Domain DLs; WordNet Lexical Database; POPSI Schema in SKOS/RDF.]
FURTHER RESEARCH
 Better algorithms and models for automatic text
  categorization
 Inclusion of the Faceted Semantic Subject
  Annotation model in existing annotation
  systems
 Formalization of the process of facet analysis
 Bringing the associative effect into the index




THANK YOU



