
									Spatiotemporal Infrastructure for
  Semantic Network in Digital
           Archives

              Eric Yen
   Computing Centre, Academia Sinica
               Dec, 2002


                    2002APEC Workshop on e-Learning and Digital Libraries
                              Academia Sinica, Taipei, Taiwan. Dec. 16-20
                       Outline
 Introduction
 NDAP Approaches – Space-Time-Language Coordinates
 Archiving and processing of millions of geospatial materials
  in AS
    Characteristics
    How to delve into the knowledge level
    Experiences & Lessons we learned
    Extend to more general solution
 Geolibrary
 The Trends
 Conclusions



        Introduction to Digital Archive
 Digital Archive is a collection of digital objects.
 A digital object is defined as something (e.g., an image, an audio
  recording, a text document, a movie, a map) that has been
  digitally encoded and integrated with metadata to support
  discovery, use, and storage of those objects.
 Goals for Digital Archive (functional point of view)
      Protection of the original
      Duplication for safety
      Search and Retrieval
      Easy Access
      Resource Sharing
      Lower cost of maintenance and dissemination
      Max. flexibility for integration of heterogeneous/homogeneous
       information resources
      Providing abundant resources for knowledge discovery and knowledge
       construction

Knowledge Discovery and Construction
 Knowledge construction means the active process of
  manipulating data to arrive at abstract models of relationships
  among phenomena in the world that facilitate our
  understanding of those phenomena and, ultimately, of the
  world. [1]
 Knowledge discovery is a nontrivial process of identifying
  valid, novel, useful, and understandable patterns in data. [2]
 Persistent cataloging, classification, and segmentation of
  digital objects are the groundwork for finding patterns, models,
  and trends in large volumes of data.

References:
1. MacEachren, A. et al., Constructing knowledge from multivariate spatiotemporal data: integrating
   geographic visualization with knowledge discovery in database methods.
2. Fayyad, U., Piatetsky-Shapiro, G. and Smyth, P., 1996, From data mining to knowledge discovery:
   An overview. In Advances in Knowledge Discovery and Data Mining, pp. 1-34.


     Types of Elementary
Knowledge Organization Systems
   Classification Systems
   Ontologies
   Taxonomies
   Index Languages
   Thesauri and other controlled lists of keywords
   Glossary
   Dictionaries
   Clustering Approaches
   Lexical Databases
   Concept Maps/Spaces
   Semantic Road Maps
   …                           2002APEC Workshop on e-Learning and Digital Libraries
                                                  Academia Sinica, Taipei, Taiwan. Dec. 16-20
                     Why a Knowledge-based
                  Approach for Digital Libraries? (1)
 Providing “Conceptual Infrastructure”
     Mapping out the conceptual structure and providing a common language for a
      field
     Providing classification/typology and concept definitions. Clarifying
      concepts by putting them into context, thus providing orientation and
      serving as a reference tool for individual researchers and practitioners
     Assisting with the exploration of the conceptual context of a research
      problem and in structuring the problem, thereby providing the conceptual
      basis for the design of good research, for the consistent definition of
      variables, and thus the cumulation of research results.
     Providing the conceptual basis for the exploration of the various aspects
      of a program in program planning, in the identification of approaches and
      strategies, and in the development of evaluation criteria
 Assisting users in understanding context
 Assisting information providers with conceptualizing a topic and with finding
  the proper term
 Discovery of high quality resources
 Providing frameworks for information exchange and resource interoperability

           Dagobert Soergel, Evaluation of Knowledge Organization Systems (KOS)
                      Why a Knowledge-based
                   Approach for Digital Libraries? (2)
 Information Storage & Retrieval
     Information system(s) in which the vocabulary is to be used
     Use of the vocabulary
         Vocabulary control in indexing and searching (controlled vocabulary)
         Vocabulary control only for searching. Assist with clarifying a search topic and
          assembling all applicable concepts and terms, whether searching with a controlled
          vocabulary or free text.
     ISAR technique(s) (such as: printed index, computer search system). Support of
      inclusive (hierarchically expanded) searching
     Automated vs. manual indexing or query formulation. Approach to indexing to be
      supported: Request-oriented vs. entity-oriented
     Techniques for eliciting user needs (e.g., menu based on search tree; questions based
      on facet structure)
     Summary evaluation of the vocabulary's adequacy for the stated purpose, based on a
      more detailed analysis.
 Translation
 Language learning
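
The inclusive (hierarchically expanded) searching listed above can be illustrated with a minimal sketch. The term hierarchy, the index, and the function names below are hypothetical and chosen only for illustration; they are not taken from any system described in these slides.

```python
from collections import deque

# Hypothetical broader-term -> narrower-terms hierarchy (illustrative only).
NARROWER = {
    "map": ["topographic map", "cadastral map"],
    "topographic map": ["contour map"],
}

# Hypothetical controlled-vocabulary index: term -> ids of records indexed with it.
INDEX = {
    "map": {"r1"},
    "contour map": {"r2"},
    "cadastral map": {"r3"},
    "photo": {"r4"},
}


def expand_term(term, narrower=NARROWER):
    """Return the term followed by all of its narrower terms (breadth-first)."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def inclusive_search(term, index=INDEX):
    """Collect record ids indexed under the term or any of its narrower terms."""
    hits = set()
    for expanded in expand_term(term):
        hits.update(index.get(expanded, ()))
    return hits
```

With this sketch, a search for "map" also retrieves records indexed only under "contour map" or "cadastral map", which is the point of hierarchical expansion.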
            Dagobert Soergel, Evaluation of Knowledge Organization Systems (KOS)
        Digital library requirements for
       knowledge organization schemas
 The need for knowledge organization in subject gateways
  and discovery services, issues of application and use
 Web-based directory structures as knowledge
  organization systems
 Knowledge organization as support for web-based
  information retrieval, query expansion, cross-language
  searching
 Semantic portals




   ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
   http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
       Digital library requirements for
      knowledge based data processing
 Knowledge organization for filtering, information
  extraction, summary
 Knowledge organization support for multilingual systems,
  natural language processing or machine translation
 Structured result display, clustering
 End-user interactions with knowledge organization
  systems, evaluation and studies of use, knowledge bases
  for supportive user interfaces, visualization




   ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
   http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
    Digital library requirements for
 knowledge structuring and management
 Suitable vocabulary structures, conceptual relationships
 Comparison between established library classification
  systems and home-grown browsing structures
 Methodologies, tools and formats for the construction and
  maintenance of vocabularies and for mapping between
  terms, classes and systems
 Frameworks for the analysis of assumptions and
  viewpoints underlying the construction and application of
  terminology systems
 Methods for the combination and adaptation of different
  vocabularies


   ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
   http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
Digital library requirements for access to
           knowledge structures
 Data exchange and description formats for knowledge
  organization systems, the potential and limitations of
  XML and RDF schemas
 Handling of subject information in metadata formats
 Standards and repositories for machine-readable
  description of networked knowledge organization
  schemas (as collections/systems)
 Interoperability, cross-browsing and cross-searching
  between distributed services based on knowledge
  organization systems
 Distributed access to knowledge organization systems:
  standard solutions and protocols for query and response,
  taxonomy servers
   ECDL2000, Special Workshop on Networked Knowledge Organization Systems,
   http://nkos.slis.kent.edu/ECDL-NKOS-final.htm
          Discover Knowledge from
               Digital Archive
 Geospatial information means geo-materials that are
  georeferenced and have well-documented metadata
 Ref. Components of a digital object in digital archive
 Geospatial Content Based
 Extracting knowledge by space-time-language




                    Knowledge about Space
 Temporal characteristics are embedded and cannot be neglected
 Acquisition
     Direct Experience
         Locomotion through the environment (crawling, walking, running, bicycling, driving,
          flying, etc.)
         Stationary viewing
     Secondary Environmental Experience
         Static medium: maps, diagrams, paintings, photos, etc.
         Dynamic medium: animate static visual figures to show changes over time
     Other ways to conceive of things that cannot be viewed
 Characteristics
     Multimodal: proprioceptive, kinesthetic, auditory, visual, etc.
     Language is often used to convey spatial information
     Multi-perspective and scales
 A thorough understanding of how people acquire, integrate, and use spatial
  information will promote more effective use of such information and help build
  application mechanisms that better meet actual needs (e.g., aid for decision making)
      Spatial Representation in GIS
 Data Model
    Vector: explicit
        Basic elements: point, line and polygon
    Raster: implicit
 Geographic space is organized into partitions (layers)
 Space-dominant representations focus on the spatial arrangement
  of entities based on the geometric and thematic properties of
  these entities.
    Space is a neutral container
    Entities only exist when associated with a layer or theme
    Applied primarily in traditional mapping
    Layer-based raster and vector models
    Each layer is associated with a period or point in time
    Change- or update-based scenario
    Analysis based on similarity or dissimilarity between aggregations
     (layers) at different points of time
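
The layer-based, change-oriented analysis described above can be sketched as a comparison of two raster layers of the same area at different times. The grids, class codes, years, and function names are made up for illustration; this is a minimal sketch, not the method of any particular GIS.

```python
# Two hypothetical raster layers of the same area at different points in time.
# Each cell holds a thematic class code (0 = water, 1 = farmland, 2 = urban).
layer_1900 = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]
layer_2000 = [
    [0, 1, 2],
    [0, 2, 2],
    [0, 0, 1],
]


def changed_cells(before, after):
    """Return (row, col, old, new) for every cell whose class differs."""
    changes = []
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (old, new) in enumerate(zip(row_before, row_after)):
            if old != new:
                changes.append((r, c, old, new))
    return changes


def similarity(before, after):
    """Fraction of cells that kept the same class between the two layers."""
    total = sum(len(row) for row in before)
    return 1 - len(changed_cells(before, after)) / total
```

Because time is attached to whole layers rather than to individual entities, change can only be observed by comparing snapshots like these, which is the limitation the space-dominant view implies.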
      Why Think in Spatio-Temporal
                 Ways?
Because the earth keeps moving: describing an
 event or object in the spatial domain only is
 incomplete.
Learn from the past, and plan for (predict)
 the future.
Characteristics of Space & Time
Importance
To organize space over time
                     Discover Knowledge from
                      Geospatial Information
      Geospatial information means geo-materials that are
       georeferenced and have well-documented metadata
      Ref. Components of a digital object in digital archive
      Geospatial Content Based
         Feature identification
         Feature comparison: enhance the likelihood of noticing relationships
          among features
         Feature interpretation: relate the identified features and their
          relationships to real-world entities, using domain knowledge
         Linking to other resources that are related to this feature, this place
          and the time → parsing the information collected from metadata or
          lexical analysis
      Demands
         Link spatiotemporal data analysis techniques to GIS
Feature interpretation tools must provide connections between abstract representations
of data, metadata that describe those data, an analyst's knowledge, and knowledge
sources external to the data set being explored (e.g., through a digital library)
 Discover Knowledge from Geospatial Information
                     Feature Identification
   Def: Finding instances of identifiable features in spatiotemporal data
   Emphasis is on examining the distribution of data in all of its dimensions in an effort to
    notice any distinct object, regularity, anomaly, hot spot, etc.
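
One simple way to notice a hot spot in point data is to count points per grid cell and report the densest cells. The sketch below assumes planar coordinates in arbitrary map units; the sample points are invented and the function names are hypothetical.

```python
from collections import Counter


def grid_counts(points, cell_size):
    """Count points per square grid cell; points are (x, y) tuples."""
    counts = Counter()
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


def hot_spots(points, cell_size, min_count):
    """Return (cell, count) pairs with at least min_count points, densest first."""
    counts = grid_counts(points, cell_size)
    cells = [(cell, n) for cell, n in counts.items() if n >= min_count]
    return sorted(cells, key=lambda item: (-item[1], item[0]))


# Made-up site coordinates in arbitrary map units (not real data).
sites = [(1.0, 1.2), (1.5, 1.8), (1.9, 1.1), (7.2, 3.3), (4.4, 8.9), (1.3, 1.6)]
```

Calling `hot_spots(sites, cell_size=2, min_count=3)` flags the single cell that holds four of the six sites. A fixed grid is the crudest choice; results depend on the cell size and grid origin.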




Example:
Distribution of Tombs in
Han Dynasty




Integrated Support for
       Research




WebGIS-based System Architecture




              Challenges of
    Geospatial Information Processing
 High threshold for general users
 Hard to find required geospatial content/service
 New retrieval technology for geospatial
  information
 Persistent metadata and archive
 Mechanism for effective management of huge
  volumes of data
 Efficient ways for digitization/vectorization of
  geospatial materials
 Integration with other information resources
              Discover Knowledge by
         Space-Time-Language Coordinates
 Constructing the linkage among diversified archives thru
  language (vocabulary)
 Lingual coordinate has both spatial and temporal extents
    Lingual-Temporal Plane: evolution of language through time
    Lingual-Spatial Plane: spatial distribution of dialects
 Multi-lingual support for digital archive
 Establishment of domain-specific controlled vocabulary sets
  to serve as the basis of an ontology




             Discover Knowledge by
        Space-Time-Language Coordinates
[Figure: three coordinate axes labeled Time, Space, and Language]
Space, Time and Language Coordinates for Digital Archives
[Figure: Digital Archives at the center, with Time, Space, and Language around it. Labels:
Historical GIS (between Time and Space); Language in Time (language changes); Language in
Space (language variations); Language in Text, in Speech...]
          Lingual Coordinate in NDAP
 A lexis/vocabulary in context is analogous to the basic unit of a concept in knowledge
      Lexis is the basic unit for any kind of language process, such as recognition, parsing,
       word formation, semantics, conversation and analysis
      Through lexical analysis, the collection of all lexical types (parts of speech), lexical
       patterns (grammar), and instances could lay the basis for the lingual coordinate.
      Once enough description (context, incl. metadata) is collected for a specific domain (which
       could be a set of digital objects), the ontology (collection of concepts for the domain) of
       that field can be constructed. → How do we know whether that is enough? → The mechanism
       needs a self-learning capability
 Atomic attributes of a place name
      Name
          Glyph & stroke: original writing, all historical and contemporary writings, and Romanization (pinyin)
          Pronunciation: indigenous and later evolutions
          Meaning (if we could restore the original fonts & sound)
      Footprint
          Could be ambiguous: M:N
      Time: (start, end), could be vague for historical names
      Type: geographic type (also indicates the administrative level if the name represents an
       administrative area)
 Atomic attributes of a datum
      People, event, time, place, object
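
The atomic attributes of a place name listed above map naturally onto a small record type. The following is a sketch only: the class name, field names, and the sample entry are hypothetical, and treating unknown dates as open-ended is an assumption made here to reflect the vagueness of historical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Footprint as a bounding box: (min_lon, min_lat, max_lon, max_lat).
BBox = Tuple[float, float, float, float]


@dataclass
class PlaceName:
    written_forms: List[str]      # original, historical, and romanized writings
    pronunciations: List[str]     # indigenous and later pronunciations
    footprints: List[BBox]        # one name may map to several footprints
    start_year: Optional[int]     # None when the start is unknown or vague
    end_year: Optional[int]       # None when still in use or unknown
    feature_type: str             # geographic type, e.g. "town"
    meaning: Optional[str] = None

    def valid_in(self, year: int) -> bool:
        """True if the name may have been in use in the given year.

        Unknown bounds are treated as open-ended (an assumption of this sketch).
        """
        after_start = self.start_year is None or year >= self.start_year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


# Entirely made-up entry, for illustration only.
example = PlaceName(
    written_forms=["Example Town", "Old Example"],
    pronunciations=["ig-zam-pul"],
    footprints=[(120.0, 23.0, 121.0, 24.0)],
    start_year=1700,
    end_year=None,
    feature_type="town",
)
```

Keeping `footprints` as a list leaves room for the ambiguity noted above, where one name refers to several areas.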
Constructing Space-Time-Language Coordinates for NDAP
 Geographic searching is a powerful and important tool
     More than 80% of information resources pertain to specific geographic areas and are either
      explicitly or implicitly geo-referenced.
     To benefit from geographic search, we first have to geo-reference the information content.
     The cost of creating geographic footprints for each record is very high (the Alexandria Digital
      Library Project spent $4m over four years). Automatic extraction of geo-referenced information
      is also possible, but it needs sophisticated tools that go further than geographic name
      extraction.
 Moving from information management toward knowledge management
     (Demands) New ways of information search & retrieval
           Traditional full-text search
           Keyword-based or query by example search
           Query by information content (image, audio, video, and multimedia contents)
           Incorporation of geographic & temporal search
     Versatile ways for presenting information & knowledge
         2D, 3D, or 4D
         Multimedia, virtual reality
         Map-on-demand, by parsing geographic names from context or directly from coordinates
     Separation of content representation & presentation
     The core is the metadata-based content analysis
         CA (information content) → metadata schemes for management of contents
         Identify the best way to represent information so that it becomes a persistent archive
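
Geo-referencing by name extraction followed by a geographic search can be sketched as below. The gazetteer, its rough bounding boxes, the sample records, and all function names are assumptions for illustration. Plain substring matching is exactly the naive name extraction that the slide says real tools must go beyond.

```python
# Hypothetical mini-gazetteer: place name -> (min_lon, min_lat, max_lon, max_lat).
# The boxes are rough, illustrative values, not authoritative footprints.
GAZETTEER = {
    "Taipei": (121.45, 24.96, 121.67, 25.21),
    "Tainan": (120.03, 22.89, 120.66, 23.41),
}

# Made-up record descriptions.
RECORDS = {
    "doc1": "Survey map of Taipei, 1905",
    "doc2": "Land deeds from Tainan",
    "doc3": "Undated letter with no place name",
}


def extract_footprints(text, gazetteer=GAZETTEER):
    """Return {name: bbox} for every gazetteer name that occurs in the text."""
    return {name: bbox for name, bbox in gazetteer.items() if name in text}


def intersects(a, b):
    """True if two bounding boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def geographic_search(records, query_bbox, gazetteer=GAZETTEER):
    """Return ids of records mentioning a place whose footprint meets the query box."""
    hits = []
    for rec_id, text in records.items():
        footprints = extract_footprints(text, gazetteer)
        if any(intersects(bbox, query_bbox) for bbox in footprints.values()):
            hits.append(rec_id)
    return hits
```

A query box around northern Taiwan, such as `(121.0, 24.5, 122.0, 25.5)`, returns only `doc1`; `doc3` can never be found this way, which illustrates why records without explicit place names need other geo-referencing methods.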

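Integrated Application of Chinese Historical and Cultural Maps
[Figure labels: Qing Dynasty local gazetteer search; full-text search of Chinese classics
(Hanji); union catalog of books query; biographical database query]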




            Roles of Visualization in
             Knowledge Discovery
 Role
    Useful in finding holes or errors in data sets
    Useful for noticing abstract features and patterns
    Condense complex relations in data sets into visual form
    Facilitate access to multiple perspectives on information, thru
     interactivity
    Facilitate decisions on appropriate model representation during
     analysis stage.
    Process tracking: uncover key aspects of a process
    Parameter control to get corresponding outcome on the fly
 Functionality




                           Geolibrary
 Objective: Lower the barriers for applying GIScience
  technologies
 Approaches
    Collecting and providing basic georeferenced spatial data/knowledge
     persistently
    Building up application environment and tools for utilization of
     spatiotemporal knowledge and technologies
    Development of spatiotemporal-based technologies for multi-disciplinary
     contents integration, aggregation, and knowledge discovery through a map metaphor
 Focus & Approach
    Construction of the System Infrastructure for Spatial and Temporal
     Information Technology
    Development of Core Technology
    Establishment of Effective Service Model for Research Support


                   Clearinghouse
 An instance of implementation of interoperability
 Functionality
    Locating the required resources/services
    Maintaining a persistent catalog of resources/services for
     sharing
    Exchange of information content
    Format transformation
[Figure labels: Clearinghouse (catalog), Metadata, Framework, GEOdata, Standards, Partnerships]
       Effective Management System for Huge
                  Volume of Data
 Remote sensing data: 2 TB/day, accumulating to 5 petabytes
  in 2005.
 According to the statistics of EU Space Center
    Raw data from satellite: 100 GB/day, 500 GB/day (after Feb. 2002)
    800 TB of data had been archived
 A big IT challenge for cataloging, searching, retrieval,
  management, identification, knowledge discovery, and integration
 Trading off decentralization against consolidation on cost
    Converging to multiple centers of information resources on the Internet
    Think about how to facilitate collaboration among those centers:
     community and virtual organization
 Demands for a complete architecture and services → Data Grid
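
As a back-of-the-envelope illustration of the scale involved, the rates quoted above can be turned into accumulation times. The sketch assumes constant daily rates and decimal units (1 PB = 1000 TB, 1 TB = 1000 GB); neither assumption is stated on the slide, and the function names are hypothetical.

```python
TB_PER_PB = 1000  # decimal units assumed
GB_PER_TB = 1000  # decimal units assumed


def days_to_accumulate(target_pb, rate_tb_per_day):
    """Days needed to reach target_pb at a constant rate in TB/day."""
    return target_pb * TB_PER_PB / rate_tb_per_day


def archive_growth_tb(rate_gb_per_day, days):
    """Archive growth in TB after the given number of days at a constant rate."""
    return rate_gb_per_day * days / GB_PER_TB


days_for_5_pb = days_to_accumulate(5, 2)          # at 2 TB/day
yearly_growth_tb = archive_growth_tb(500, 365)    # at 500 GB/day
```

Under these assumptions, 5 PB at 2 TB/day corresponds to 2500 days of acquisition, and 500 GB/day adds 182.5 TB per year to an archive.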


               What’s the Solution
 Support sharing and coordinated use of diverse resources in
  dynamic “virtual organizations” – Grid !
 Good technical solutions for key problems, such as
      Security enhancement like authentication and authorization
      Resource discovery and monitoring
      Reliable remote service invocation
      High-performance remote data access
    -- Grid !
 Good quality reference implementation, multi-lingual support,
  interfaces to many systems, large user base, industrial support,
  etc. – Grid !
 Persistent Web Services – Grid !




                Measuring Success
   High degree of component autonomy
   Low cost of infrastructure
   Ease of contributing components
   Ease of using components
   Breadth of task complexity supported by the approach
   Scalability in the number of components




     Conclusions and Future Work
 Building the right infrastructure will be crucial
 Intersection of spatiotemporal coordinates and lingual
  coordinate constitutes a good framework both for knowledge
  extraction and interoperability
 Consensus gathering and technology development are still the
  major challenges for interoperability
 Open System, Open Standard, and Open Source



