									KYOTO (ICT-211423)
Yielding Ontologies for Transition-Based Organization
FP7: Intelligent Content and Semantics

http://www.kyoto-project.eu/

Piek Vossen, VU University Amsterdam
     FP7, Information Day Call 5, Luxembourg, May 11-12, 2009




                        Project goals
• Open platform for knowledge sharing across languages
  and cultures
   – Wiki environment that allows people in the field to maintain their
     knowledge and agree on meaning without knowledge engineering
     skills
   – Bootstrap this knowledge through open text mining & concept
     learning
   – Enables knowledge transition and information search across
     different target groups, transgressing linguistic, cultural and
     geographic boundaries.
   – Enables deep semantic search for facts and knowledge
• Free, open source license (GPL)





                                   Scope
• Languages:
   – English, Dutch, Italian, Spanish, Basque, Chinese, Japanese
• Domain:
   – Environmental domain, BUT usable in any domain
• Global:
   – Both European and non-European languages
• Available:
   – Free: as open source system and data (GPL)
• Future perspective:
   – Content standardization that supports world wide communication






            KYOTO (ICT-211423)
• Funded:
  – 7th Framework Program-ICT of the European Union:
    Intelligent Content and Semantics
  – Taiwan and Japan funded by national grants
• STREP project: research & development
• Duration:
  – March 2008 – March 2011
• Effort:
  – 364 person months of work.


                           Consortium
•   Vrije Universiteit Amsterdam (Amsterdam, The Netherlands),
•   Consiglio Nazionale delle Ricerche (Pisa, Italy),
•   Berlin-Brandenburg Academy of Sciences and Humanities (Berlin,
    Germany),
•   Euskal Herriko Unibertsitatea (San Sebastian, Spain),
•   Academia Sinica (Taipei, Taiwan),
•   National Institute of Information and Communications Technology
    (Kyoto, Japan),
•   Irion Technologies (Delft, The Netherlands),
•   Synthema (Rome, Italy),
•   European Centre for Nature Conservation (Tilburg, The Netherlands),
•   Subcontractors:
    –   World Wide Fund for Nature (Zeist, The Netherlands),
    –   Masaryk University (Brno, Czech Republic)





  Current situation in the environmental domain
• Vast amount of information in all kinds of formats
  and structures: websites, documents, databases,
  experts, community networks
• Scattered over the world: different regions,
  languages and cultures
• Highly dynamic and developing
• Increasing time and information pressure
• Technology gap: users rely on the first results from Google
• Critical knowledge dependency






         KYOTO cycle






                    KYOTO's Solution
• Text mining:
   –   Massive and accurate indexing of facts from vast amounts of text;
   –   In any language/culture from scattered sources;
   –   Again and again to detect trends and changes;
   –   Direct relation between knowledge modeling effort and text mining
• Knowledge modeling:
   – automatic learning of terms and concepts from text in any language;
   – formalization of knowledge in computer usable format -> wordnets &
     ontologies
• Community software:
   – For experts in the field and not knowledge engineers
   – Continuous and collaborative effort:
        • adapt to the changing domain;
        • consensus in the field;
        • consensus across languages and cultures
   – Produce interoperable, formal, standardized knowledge structures;
   – Relate knowledge structure to expressions in languages
[Figure: KYOTO system overview, steps 1-6]
• Actors shown: environmental organizations, citizens, governments, companies
• 1: Distributed, diverse & dynamic data
• 2: Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
• 3: Tybot (term yielding robot) yields terms, e.g. "CO2 emission"
• 4: Wikyoto: maintain terms & concepts in wordnets and an ontology
   – Ontology levels: Top, Middle, Domain
   – Nodes shown include Abstract, Physical, Process, Substance, H2O, CO2,
     Greenhouse Gas, Pollution and Emission
• 5: Kybot (knowledge yielding robot) indexes facts:
   – Process:  Emission
   – Involves: CO2
   – Property: increase, sudden
   – When:     2008
   – Where:    Europe
• 6: Text & Fact Index, used for Semantic Search
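The fact that Kybot indexes from the example sentence in the diagram can be pictured as a small structured record. The sketch below is only an illustration in Python: the `Fact` class, its field names and the `search_facts` helper are assumptions made for this example and do not describe KYOTO's actual data format or API. The record is filled in by hand from the slide; no text mining is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One indexed fact. Field names are hypothetical, mirroring the slide."""
    process: str
    involves: str
    properties: tuple = ()
    when: str = ""
    where: str = ""
    source_text: str = ""


# The fact from the slide's example sentence, entered by hand.
EXAMPLE = Fact(
    process="Emission",
    involves="CO2",
    properties=("increase", "sudden"),
    when="2008",
    where="Europe",
    source_text="Sudden increase of CO2 emissions in 2008 in Europe",
)


def search_facts(index, **criteria):
    """Return the facts whose fields equal every given criterion."""
    return [
        fact for fact in index
        if all(getattr(fact, name) == value for name, value in criteria.items())
    ]
```

A call such as `search_facts([EXAMPLE], involves="CO2", where="Europe")` returns the example fact. Searching on structured fields rather than on the words of the sentence is the basic idea behind the "Text & Fact Index" box in the diagram.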




     Achievements after 1st year
• First version of all system components
   – Wordnets in 7 languages in uniform database formats
   – Standard representation for output of linguistic
     processing for 7 languages, based on ISO proposals
   – Tybot (term extraction), Kybot (fact extraction) and
     Wikyoto (user editor)
   – Semantic search
• Extensive definition of user requirements
• Integration of system components
Potential impact
           Kyoto Knowledge Base
[Figure: a shared ontology, extended with a domain ontology, sits at the centre.
Around it are the wordnets WnJP, WnIT, WnNL, WnES, WnEN, WnEU and WnCH, each
with its own domain extension.]
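The Kyoto Knowledge Base diagram links every language's wordnet to one shared ontology. A minimal Python sketch of why that helps: if terms in each language point at the same concept, a term can be carried from one language to another through the concept. The `WORDNETS` table, the concept identifiers and the three helper functions are hypothetical, hand-written for this illustration; they are not taken from the KYOTO wordnets.

```python
# Toy wordnets: language code -> {term: concept identifier}.
# Entries are hand-picked for illustration only.
WORDNETS = {
    "en": {"emission": "Emission", "greenhouse gas": "GreenhouseGas"},
    "nl": {"emissie": "Emission", "broeikasgas": "GreenhouseGas"},
    "es": {"emisión": "Emission", "gas de efecto invernadero": "GreenhouseGas"},
    "it": {"emissione": "Emission", "gas serra": "GreenhouseGas"},
}


def concept_for(lang, term):
    """Look up the shared concept for a term, or None if it is unknown."""
    return WORDNETS.get(lang, {}).get(term.lower())


def terms_for(concept, lang):
    """List the terms of one language that point at the given concept."""
    return sorted(t for t, c in WORDNETS.get(lang, {}).items() if c == concept)


def translate(term, src, dst):
    """Map a term across languages by going through the shared concept."""
    concept = concept_for(src, term)
    return terms_for(concept, dst) if concept else []
```

For instance, `translate("emissie", "nl", "en")` goes from the Dutch term to the concept `Emission` and back out to the English term. Adding a language only requires linking its terms to existing concepts, not to every other language.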


               Linking Open Data dataset cloud
          http://richard.cyganiak.de/2007/10/lod/
[Figure: the Linking Open Data cloud with added nodes per domain: wordnet terms
and ontology concepts for environment, legal, medical and sailing, and facts for
environment, legal and medical.]
Project characteristics




         Why a STREP project?
• Major technical challenges
• Cross-cultural and cross-lingual
• Small consortium for intense collaboration
  and discussion
• Bridge the gap between users and
  technology: two-directional process
• Roll-out needs to follow from technical
  achievements




           How to keep focus?
• Use existing state of the art technology
• Start from current practice as baseline
• Develop robust platform that adds to baseline,
  with baseline as fall back
• Gradually add richer data, more precision and new
  functionalities
• Allow end-users to control the process, driven by
  textual examples
• Open standardized architecture that can be
  developed further
Thank you for your attention

								