					  - TKE 2005 -
Copenhagen, Denmark




        “Terminology and Knowledge
       Engineering in Fraud Detection”

         Koen Kerremans
         Rita Temmerman
         Centrum voor Vaktaal en Communicatie (CVC)
         Department of Applied Linguistics, Erasmushogeschool Brussel
         http://cvc.ehb.be

         Gang Zhao
         Semantics Technology and Applications Research Laboratory (STAR Lab)
         Department of Computer Science, Vrije Universiteit Brussel
         http://www.starlab.vub.ac.be




   How are terminology and knowledge
   engineering used in the fight against financial
   fraud?

   How can terminology and knowledge engineering
   methods be organised into a development
   process for technological solutions in the fight
   against financial fraud?
General outline

   FF POIROT:
      Aims
      Cases
      Partners
   Methodologies:
      AKEM (knowledge engineering)
      Termontography (terminology engineering)
   Interaction of methodologies
   Future work
   Conclusion
FF POIROT


   Financial Fraud Prevention Oriented Information
       Resources using Ontology Technology


   Aims:
      Apply Semantic Web technology to fraud detection and
      prevention, thereby showing the potential of ontologies in these
      areas
      Construct multilingual terminological as well as formal
      knowledge repositories covering the domains of interest
      Propose methods and guidelines in terminology and knowledge
      engineering
      Develop new and/or improve existing tools to support
      terminology and knowledge engineering
FF POIROT: cases

   VAT carousel fraud
      = VAT fraud in which fraudsters sell goods at VAT-inclusive
      prices and disappear without paying to the tax authorities
      the VAT their customers have paid
      Companies unwittingly involved in this type of fraud can
      be held responsible for the 'missing' VAT
      Each company has to find out whether or not it is 'safe' to
      do business with a trader from another EU country

   On-line investment fraud
      = the selling of overpriced or worthless shares, bonds, or
      other financial instruments to the general public
      In Italy, Consob searches for suspicious websites via
      'traditional' search engines such as Google, Altavista, …
FF POIROT

   Use of the ontology:
      Knowledge management of fraud investigative expertise
      Information exchange between investigative bodies
      Automation of parts of monitoring or investigative
      procedures with knowledge-based applications (e.g.
      information extraction)
   Use of multilingual terminology:
      Dictionary purposes
      Multilingual keywords in information extraction
      Explanation of reasoning in natural language
      Knowledge resource consulted during ontology
      development
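To make "multilingual keywords in information extraction" concrete, here is a minimal Python sketch, assuming a termbase that maps a concept identifier to terms per language variety. The concept IDs, the `FR-FR` code and the functions `expand_keywords` and `find_keywords` are hypothetical; the terms themselves come from examples elsewhere in this talk. Matching is a naive substring search, far simpler than real information extraction.

```python
from typing import Dict, List

# Hypothetical termbase: concept ID -> language variety -> terms.
TERMBASE: Dict[str, Dict[str, List[str]]] = {
    "investment_firm": {
        "EN-UK": ["investment firm"],
        "IT-IT": ["impresa di investimento"],
    },
    "chargeable_event": {
        "EN-UK": ["chargeable event"],
        "EN-IR": ["chargeable event"],
        "FR-FR": ["fait générateur"],
    },
}


def expand_keywords(concept_ids: List[str], languages: List[str]) -> List[str]:
    """Collect the terms for the given concepts in the given language varieties, without duplicates."""
    keywords: List[str] = []
    for concept in concept_ids:
        by_language = TERMBASE.get(concept, {})
        for language in languages:
            for term in by_language.get(language, []):
                if term not in keywords:
                    keywords.append(term)
    return keywords


def find_keywords(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords occurring in the text (naive case-insensitive substring match)."""
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


keywords = expand_keywords(["investment_firm", "chargeable_event"], ["EN-UK", "EN-IR", "IT-IT", "FR-FR"])
print(keywords)  # ['investment firm', 'impresa di investimento', 'chargeable event', 'fait générateur']

# Italian sample sentence: "This investment firm is not authorised"
print(find_keywords("Questa impresa di investimento non è autorizzata", keywords))  # ['impresa di investimento']
```

Because the same English term is recorded for both EN-UK and EN-IR, the duplicate check keeps the keyword list short without losing coverage.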
FF POIROT: partners
   ontology modellers
   terminographers
   tool developers
   computer linguists
   legal experts
   investment regulator
   VAT experts
AKEM



   Application Knowledge Engineering Methodology
   Development cycle:
      Knowledge scoping (result: stories)
      Knowledge analysis
      Ontology development
      Deployment
AKEM

   Based on DOGMA:
     Developing Ontology-Guided Mediation for Agents
     Ontology = a set of lexons and their commitments in
     particular applications
     Lexon = a grouping element stored in a lexon base
     and composed of terms and roles
     <Context, Term_1, Role_1, Term_2, Role_2>:

          Context        Term_1       Role_1   Term_2   Role_2

       SixthDirective   MemberState   Adopt     Law     Adopted
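The lexon structure can be mirrored directly in code. The following minimal Python sketch is not the DOGMA implementation: the class name `Lexon` and its field and method names are our own. It shows that one lexon can be read in two directions, with Role_2 acting as the co-role of Role_1.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Lexon:
    """Sketch of a lexon <Context, Term_1, Role_1, Term_2, Role_2> (illustrative field names)."""
    context: str  # context in which the terms are interpreted
    term1: str
    role1: str    # role Term_1 plays with respect to Term_2
    term2: str
    role2: str    # co-role: role Term_2 plays with respect to Term_1

    def forward(self) -> Tuple[str, str, str]:
        """Read the lexon from Term_1 to Term_2."""
        return (self.term1, self.role1, self.term2)

    def inverse(self) -> Tuple[str, str, str]:
        """Read the same relationship back from Term_2 to Term_1."""
        return (self.term2, self.role2, self.term1)


# The example lexon from the slide.
adopt = Lexon("SixthDirective", "MemberState", "Adopt", "Law", "Adopted")
print(adopt.forward())  # ('MemberState', 'Adopt', 'Law')
print(adopt.inverse())  # ('Law', 'Adopted', 'MemberState')
```

Making the dataclass frozen makes lexons hashable, so a lexon base can be kept as a set in which the same lexon is stored only once.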
AKEM

   Why Application Knowledge Engineering
   Methodology?
      There is a need to organise a geographically distributed,
      multidisciplinary team of domain experts, knowledge
      analysts and engineers in a methodical, traceable
      development cycle
      There is a need to examine how knowledge can be
      extracted from different perspectives on fraud to improve
      the quality of the fraud ontology


   [Diagram: linguistic, conceptual, legal, … perspectives feeding into the fraud ontology]
AKEM
   [Figure: An example of a legal view: Wigmore chart.
    Legend: blue = hypothesis, red = claim, purple = evidence, green = fact.
    Nodes: H1; 1, …; 1.1, 1.2, 1.3; 1.1.1, E1.2.1, E1.3.1; E1.1.1, F1.1.1.1, F1.1.1.2]
AKEM

   H1: Public offer of company X is unlawful
      1.1: X solicits investors on the WWW
        • E1.1.1: X manages website that solicits investors
           – F1.1.1.1: Website states name 'X'
           – F1.1.1.2: Registration details indicate 'X' as registrant of
             website
      1.2: No advance notice of solicitation to Consob
        • E1.2.1: X did not give a notification to Consob regarding
          public offer to purchase
      1.3: No related prospectus filed with Consob
        • E1.3.1: X did not draft or file a prospectus with Consob
          regarding public offer to purchase
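The chart above is a tree of typed nodes, which a knowledge analyst could store and check mechanically. The following Python sketch models it under stated assumptions: the `Node` class, its `kind` values (taken from the chart legend) and the helpers `outline` and `unsupported_claims` are hypothetical, and the "every claim needs evidence or a fact beneath it" check is our own illustrative addition, not part of Wigmore's method or of AKEM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node in a Wigmore-style chart (hypothetical data model)."""
    label: str
    kind: str  # "hypothesis", "claim", "evidence" or "fact", as in the chart legend
    text: str
    children: List["Node"] = field(default_factory=list)


def has_support(node: Node) -> bool:
    """True if some descendant of the node is evidence or a fact."""
    return any(child.kind in ("evidence", "fact") or has_support(child) for child in node.children)


def unsupported_claims(node: Node) -> List[str]:
    """Labels of claims with no evidence or fact beneath them, in depth-first order."""
    found = [node.label] if node.kind == "claim" and not has_support(node) else []
    for child in node.children:
        found.extend(unsupported_claims(child))
    return found


def outline(node: Node, depth: int = 0) -> List[str]:
    """Render the chart as an indented outline."""
    lines = ["  " * depth + f"{node.label}: {node.text}"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


h1 = Node("H1", "hypothesis", "Public offer of company X is unlawful", [
    Node("1.1", "claim", "X solicits investors on the WWW", [
        Node("E1.1.1", "evidence", "X manages website that solicits investors", [
            Node("F1.1.1.1", "fact", "Website states name 'X'"),
            Node("F1.1.1.2", "fact", "Registration details indicate 'X' as registrant of website"),
        ]),
    ]),
    Node("1.2", "claim", "No advance notice of solicitation to Consob", [
        Node("E1.2.1", "evidence", "X did not give a notification to Consob regarding public offer to purchase"),
    ]),
    Node("1.3", "claim", "No related prospectus filed with Consob", [
        Node("E1.3.1", "evidence", "X did not draft or file a prospectus with Consob regarding public offer to purchase"),
    ]),
])

print("\n".join(outline(h1)))
print(unsupported_claims(h1))  # []
```

Each node's `text` is a natural-language statement from which knowledge constituents such as Offerer, Notice or Prospectus can later be abstracted.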
AKEM

   Extracting knowledge constituents and abstracting them
   into production rules allows knowledge modellers to
   identify and organise the abstract concepts and
   relations into a lexon base
   Example:

     Context            Term_1          Role_1       Term_2           Role_2
     58.94.1             Offerer         Make      PublicOffering     MadeBy
     58.94.1          PublicOffering   SubTypeOf     Offering       SuperTypeOf
     58.94.1             Offerer         Give      AdvanceNotice     GivenBy
     58.94.1            Regulator       Receive       Notice        ReceivedBy
     58.94.1             Notice         Contain     Prospectus      ContainedBy
     58.94.1           Solicitation     Target       Investor        Targeted
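Once lexons sit in a lexon base, modellers can navigate them from any term. The Python sketch below loads the example lexons as plain tuples and builds a term index that reads every lexon in both directions (role and co-role). The helpers `index_by_term` and `supertypes` are hypothetical, and treating `SubTypeOf` as a taxonomy link is our reading of the table, not a documented DOGMA convention.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (context, term_1, role_1, term_2, role_2), copied from the example table.
LEXON_BASE: List[Tuple[str, str, str, str, str]] = [
    ("58.94.1", "Offerer", "Make", "PublicOffering", "MadeBy"),
    ("58.94.1", "PublicOffering", "SubTypeOf", "Offering", "SuperTypeOf"),
    ("58.94.1", "Offerer", "Give", "AdvanceNotice", "GivenBy"),
    ("58.94.1", "Regulator", "Receive", "Notice", "ReceivedBy"),
    ("58.94.1", "Notice", "Contain", "Prospectus", "ContainedBy"),
    ("58.94.1", "Solicitation", "Target", "Investor", "Targeted"),
]


def index_by_term(lexons) -> Dict[str, List[Tuple[str, str]]]:
    """Map every term to (role, other term) pairs, reading each lexon in both directions."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for _context, term1, role1, term2, role2 in lexons:
        index[term1].append((role1, term2))  # Term_1 --Role_1--> Term_2
        index[term2].append((role2, term1))  # Term_2 --Role_2--> Term_1 (co-role)
    return dict(index)


def supertypes(lexons, term: str) -> List[str]:
    """Follow SubTypeOf links upwards; assumes one parent per term and no cycles."""
    chain: List[str] = []
    current = term
    while True:
        parents = [t2 for _c, t1, r1, t2, _r2 in lexons if t1 == current and r1 == "SubTypeOf"]
        if not parents:
            return chain
        current = parents[0]
        chain.append(current)


index = index_by_term(LEXON_BASE)
print(index["Offerer"])                          # [('Make', 'PublicOffering'), ('Give', 'AdvanceNotice')]
print(supertypes(LEXON_BASE, "PublicOffering"))  # ['Offering']
```

In this index, AdvanceNotice and Notice remain unconnected because the example contains no lexon relating them; an index like this makes such gaps easy to spot during ontology development.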
Termontography



   = a terminological approach in which (multilingual)
   terminological knowledge, retrieved from texts, is
   structured according to a framework of knowledge (i.e.
   categorisation framework)
   Why Termontography?
      Terminographers need a common reference framework to
      scope the terminology work
      There are significant commonalities between terminology
      compilation and text-based ontology development
      In our view, a terminological analysis can contribute to the
      formalisation of a given domain
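The core idea of Termontography, structuring multilingual terms retrieved from texts according to a categorisation framework, can be sketched as a small data structure. In this Python sketch, the classes `Category` and `CategorisationFramework` and the category names `Actor` and `InvestmentFirm` are hypothetical. Rejecting terms for categories outside the framework stands in for the scoping role the framework plays for terminographers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Category:
    """A category in the framework, with the terms attached to it per language variety."""
    name: str
    parent: Optional[str] = None
    terms: Dict[str, List[str]] = field(default_factory=dict)


class CategorisationFramework:
    """Hypothetical minimal framework: a tree of categories to which terms are attached."""

    def __init__(self) -> None:
        self.categories: Dict[str, Category] = {}

    def add_category(self, name: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.categories:
            raise KeyError(f"unknown parent category: {parent}")
        self.categories[name] = Category(name, parent)

    def attach_term(self, category: str, language: str, term: str) -> None:
        """Attach a term found in the corpus; terms outside the framework are rejected (scoping)."""
        if category not in self.categories:
            raise KeyError(f"category outside the framework: {category}")
        self.categories[category].terms.setdefault(language, []).append(term)

    def path(self, name: str) -> List[str]:
        """Categories from the root down to the given category."""
        chain: List[str] = []
        current: Optional[str] = name
        while current is not None:
            chain.append(current)
            current = self.categories[current].parent
        return chain[::-1]


framework = CategorisationFramework()
framework.add_category("Actor")
framework.add_category("InvestmentFirm", parent="Actor")
framework.attach_term("InvestmentFirm", "EN-UK", "investment firm")
framework.attach_term("InvestmentFirm", "IT-IT", "impresa di investimento")
print(framework.path("InvestmentFirm"))           # ['Actor', 'InvestmentFirm']
print(framework.categories["InvestmentFirm"].terms)
```

Because terms hang off shared categories rather than off each other, English and Italian terms for the same category line up automatically. This is also what makes the resulting database usable both as a dictionary and as input for an ontology.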
Termontography

   [Figure: Termontography workflow]
      Knowledge analysis phase (1): domain experts; TSR + categorisation framework
      Information gathering phase (2): (mono- or multilingual) domain-specific corpus
      Search phase (3): first version of termontological database
      Refinement phase (4): (mono- or multilingual) termontological database
      Verification phase (5)
      Validation phase (6)
      Resulting resources: ontology, dictionary
Interaction of methodologies

   [Figure: interaction of Termontography, AKEM and knowledge system development]
   Problem & knowledge space: problem determination

   Termontography                          AKEM                                  Knowledge system development
   Analysis & information gathering        Knowledge scoping (result: stories)   System requirements
     phases (result: text corpus)
   Search phase (extraction of             Knowledge analysis                    System design
     terminology & knowledge-rich
     contexts)
   Refinement phase (termbase              Ontology development                  System development
     refinement)
   Verification & validation phases        (result: ontology)                    (result: system)
     (result: terminology base)

   Deployment → Solution
Interaction of methodologies

   Knowledge scoping:
      Developing terminological resources and ontological
      repositories requires, above all, insight into the domain of
      interest
      Domain experts can support the knowledge acquisition
      process by pointing out the relevant categories/topics (given
      the envisaged tasks/applications)
      Example: 'Transactions for which no VAT is required'
      [Figure: lexicalisations of this topic in different language varieties]
         Terms: Vrijstelling (NL-BE), Exemption (EN-UK), Zero-rated (EN-UK),
                Exemption (EN-IR)
         Categories: transaction not allowing the supplier to deduct VAT;
                     transaction occurring outside the territory of the VAT
                     legislation at stake; transaction occurring outside the
                     scope of VAT
Interaction of methodologies

   Terminology base  ontology development:
      Rationale:
        • The AKEM extraction task seeks for basic semantic
          elements and follows linguistic units in natural language
          texts
        • Experience shows that ontology engineers resort from
          time to time to terminological resources for background
          information or exact definitions
      Characteristics of terminological analysis:
        • Special emphasis on documenting semantic contexts by
          means of textual contexts
        • Entries also include linguistic semantic descriptions such
          as agent-predicate-patient/recipient links and cross
          references among items of contents
Interaction of methodologies


Entry Number 1272
EN-UK  Investment firm
   Domain:       Investment
   Description:  any legal person the regular occupation or business of which is the
                 provision of investment services for third parties on a professional
                 basis  [Dir-93-22_EN.txt-69]
   Co-text:      Whereas an investment firm should not be able to invoke this Directive
                 in order to carry out spot or forward exchange transactions other than
                 as services connected with the provision of investment services;
                 whereas, therefore, the use of a branch solely for such
                 foreign-exchange transactions would constitute misuse of the
                 machinery of this Directive;  [Dir-93-22_EN.txt-23]
   Relations:    Investment firm   Provide   Investment service   [Dir-93-22_EN.txt-50]
                 Investment firm   Has       Registered office    [Dir-93-22_EN.txt-137]

IT-IT  Impresa di investimento
   Domain:       Investment
   Descriptions (translated from Italian):
                 entities (including natural persons) that provide investment advice as
                 their main or exclusive activity will have to obtain authorisation to
                 operate as "investment firms" under the ISD (Investment Services
                 Directive), replacing the specific national regimes to which they are
                 currently subject  [pro_0625_IT.txt-388]
                 any legal person whose regular occupation or business consists of
                 providing investment services on a professional basis
                 [pro_0625_IT.txt-658]
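The relations recorded in a termbase entry are close to lexons, which is what makes terminologists' suggestions useful to ontology engineers. The following Python sketch turns the relations of entry 1272 into candidate lexons for review. The classes `Relation` and `TermEntry` and the function `to_candidate_lexons` are hypothetical. It also assumes that the source file name can serve as the lexon context and that the trailing number in a reference identifies a position in that file. The co-role is left empty because the entry does not record one.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Relation:
    term1: str
    role: str
    term2: str
    reference: str  # source file plus position, e.g. "Dir-93-22_EN.txt-50"


@dataclass
class TermEntry:
    number: int
    language: str
    term: str
    domain: str
    relations: List[Relation] = field(default_factory=list)


CandidateLexon = Tuple[str, str, str, str, Optional[str]]


def to_candidate_lexons(entry: TermEntry) -> List[CandidateLexon]:
    """Turn termbase relations into candidate lexons for an ontology engineer to review.

    Assumptions: the source file name serves as the lexon context, and the
    co-role is None because the termbase entry does not record it.
    """
    candidates: List[CandidateLexon] = []
    for rel in entry.relations:
        context = rel.reference.rsplit("-", 1)[0]  # drop the trailing position number
        candidates.append((context, rel.term1, rel.role, rel.term2, None))
    return candidates


entry = TermEntry(1272, "EN-UK", "Investment firm", "Investment", [
    Relation("Investment firm", "Provide", "Investment service", "Dir-93-22_EN.txt-50"),
    Relation("Investment firm", "Has", "Registered office", "Dir-93-22_EN.txt-137"),
])
for candidate in to_candidate_lexons(entry):
    print(candidate)
```

Leaving the co-role as `None` keeps the decision with the ontology engineer: the termbase suggests the relation and points back to its textual evidence, and the modeller completes and commits it.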
Interaction of methodologies




   [Figure: terminology and ontology, with editorial info]
Interaction of methodologies

   Consequences:
     Productivity of ontology engineers is improved by
     suggestions from terminologists, who examine the
     same knowledge resources
     Terminography adds a linguistic viewpoint to
     application-specific modelling
     During ontology development, multilingual
     terminological information can help discover
     'semantic gaps' across languages due to social and
     cultural differences and facilitate consensus
     building in a multilingual team of developers
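A first, mechanical step towards finding semantic gaps is to list, for each category, the language varieties in which no term has been recorded. In the following Python sketch, the function `semantic_gaps`, the category names and the `FR-FR` code are hypothetical, and the data is a toy example rather than FF POIROT's termbase.

```python
from typing import Dict, List

# Toy data: category -> language variety -> terms recorded in the termbase.
LEXICALISATIONS: Dict[str, Dict[str, List[str]]] = {
    "ChargeableEvent": {
        "EN-UK": ["chargeable event"],
        "EN-IR": ["chargeable event"],
        "FR-FR": ["fait générateur"],
    },
    "ZeroRatedSupply": {
        "EN-UK": ["zero-rated"],
    },
}


def semantic_gaps(lexicalisations, languages: List[str]) -> Dict[str, List[str]]:
    """For each category, list the language varieties with no recorded term."""
    gaps: Dict[str, List[str]] = {}
    for category, by_language in lexicalisations.items():
        missing = [lang for lang in languages if not by_language.get(lang)]
        if missing:
            gaps[category] = missing
    return gaps


print(semantic_gaps(LEXICALISATIONS, ["EN-UK", "EN-IR", "FR-FR"]))
# {'ZeroRatedSupply': ['EN-IR', 'FR-FR']}
```

A gap in this output only means that no term was recorded. The team still has to decide whether it reflects a genuine social or cultural difference or simply a hole in the corpus, which is where the consensus building mentioned above comes in.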
Future work

   How to represent meaning variations of
   lexicalisations referring to the same category?
      E.g.: “An event on which VAT is to be paid”
        • Irish legislation: 'chargeable event'
            – VAT will be due on the date the invoice is issued
        • UK legislation: 'chargeable event'
            – VAT will be due no later than the 15th day following
              the month in which the supply takes place
        • French legislation: 'fait générateur'
            – VAT will be due at the moment the goods are
              supplied
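These examples show the representation problem concretely. One category carries three lexicalisations, and two identical English terms hide different timing rules. The following Python sketch is a toy model that attaches a due-date rule to each jurisdiction-specific lexicalisation, transcribed from the simplified summaries above. It is not a statement of VAT law, and the jurisdiction codes and function names are hypothetical.

```python
from datetime import date
from typing import Callable, Dict, Optional, Tuple

Rule = Callable[[date, Optional[date]], date]


def due_on_invoice(supply: date, invoice: Optional[date]) -> date:
    # Irish reading (per the summary above): due on the date the invoice is issued.
    if invoice is None:
        raise ValueError("this rule needs an invoice date")
    return invoice


def due_by_15th_next_month(supply: date, invoice: Optional[date]) -> date:
    # UK reading (per the summary above): no later than the 15th day of the month after the supply.
    year = supply.year + supply.month // 12
    month = supply.month % 12 + 1
    return date(year, month, 15)


def due_on_supply(supply: date, invoice: Optional[date]) -> date:
    # French reading (per the summary above): due when the goods are supplied.
    return supply


# One category, "an event on which VAT is to be paid", with one lexicalisation per jurisdiction.
EVENT_ON_WHICH_VAT_IS_PAID: Dict[str, Tuple[str, Rule]] = {
    "IE": ("chargeable event", due_on_invoice),
    "UK": ("chargeable event", due_by_15th_next_month),
    "FR": ("fait générateur", due_on_supply),
}


def vat_due(jurisdiction: str, supply: date, invoice: Optional[date] = None) -> Tuple[str, date]:
    """Return the local term and the (latest) due date under that jurisdiction's toy rule."""
    term, rule = EVENT_ON_WHICH_VAT_IS_PAID[jurisdiction]
    return term, rule(supply, invoice)


print(vat_due("UK", date(2005, 12, 3)))  # ('chargeable event', datetime.date(2006, 1, 15))
```

Keying the rules by jurisdiction rather than by term keeps the two identical 'chargeable event' lexicalisations apart, which a term-keyed dictionary could not do.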
Future work

   Termontography workbench
Conclusion

   We have shown how terminology and
   knowledge engineering are used in the fight
   against financial fraud
   We have shown how the methods of experts
   with different backgrounds have been
   translated into a coherent and traceable
   workflow
Conclusion

  System                     Task                     Data             Source                         Status
  Web search                 Web surveillance         Unstructured     Web pages                      Industrial prototype
  Invoice management         VAT fraud prevention     Structured       User input                     Industrial application
  Web IE                     Web surveillance         Unstructured     Web pages                      Research prototype
  E-mail IE                  Detection                Unstructured     E-mails                        Research prototype
  Reader Assistant           Information management   (Un)structured   Websites, databases            Conceptual design
  EC regulation management   Document management      Unstructured     EC Directives & national law   Conceptual design
Conclusion

   Project:
      http://www.ffpoirot.org
   Partners:
      STAR Lab: http://www.starlab.vub.ac.be
      CVC: http://cvc.ehb.be
      JBC: http://www.cfslr.ed.ac.uk/
      RACAI: http://www.racai.ro
      KS: http://www.knowledgestones.com/
      L&C: http://www.landcglobal.com/index.php
      Consob: http://www.consob.it/main/index.html
      VAT@: http://www.vatat.com

				