

Amicalola Report: Database and Information Systems Research Challenges and Opportunities in Semantic Web and Enterprises¹

Amit Sheth, University of Georgia, Athens, GA, USA
Robert Meersman, Vrije Universiteit Brussel, Brussels, Belgium

1 Executive Summary
This report describes opportunities for the database and information systems (DB/IS) community to contribute to the advancement of the Semantic Web, as well as the challenges and new research topics that the vision of the Semantic Web presents to DB/IS researchers. It is based on the NSF-OntoWeb Invitational Workshop on DB/IS Research for Semantic Web and Enterprises, held April 3-5, 2002 at Amicalola Falls State Park in the mountains of northern Georgia. Most of the workshop participants were industry R&D leaders or senior academics in database management and information systems who have at various times been deeply involved with semantics or with interdisciplinary work in knowledge representation. Others included AI and database researchers active in Semantic Web projects, and researchers who have worked on semantic modeling and interoperability in specific domains (e.g., geographic) and/or media (video, images). This report could not have been produced without their generous contributions, which we gratefully acknowledge.

Amicalola Working Group:
Organizers: Robert Meersman, Amit Sheth
Applications subgroup: Michael Brodie (Coordinator), Umeshwar Dayal (Coordinator), Ramesh Jain, Frank Manola, Hans-Jorg Stork, Bhavani Thuraisingham
Ontology subgroup: Stefan Decker (Coordinator), Yahiko Kambayashi, Vipul Kashyap (Coordinator), Max Egenhofer, William Grosky, Michael Uschold
Web Services subgroup: Karl Aberer, Isabel Cruz, Dieter Fensel (Coordinator), Mike Huhns, Munindar Singh (Coordinator), Ling Liu, Rudi Studer

Participants identified significant past successes in DB/IS that are likely to play an important role in realizing the Semantic Web, in particular this community's unique technical strengths in semantic modeling, query processing, transactions and workflow systems. Equally important is the community's proven ability to develop technologies that are scalable, high-performance and robust. Although semantics is not a new topic for this community, the participants identified several new research challenges that the Semantic Web poses for DB/IS researchers. Besides the broad vision of the entire Web as a global information system, with semantics as the primary enabler of the scalability required by the next generation of the Web, the community also sees more immediate applications: the scalability and productivity improvements that semantics can bring to enterprises, to e-commerce among groups of enterprises, and to industry. Several ideas for community building, outreach and funding initiatives were also discussed.

¹ This workshop was sponsored by NSF CISE-IIS-0211606 (Program Manager: Dr. Bhavani Thuraisingham), OntoWeb, the University of Georgia Research Foundation, Inc., and the LSDIS Lab.

2 Background
The Semantic Web concept has been widely adopted as a vision, a challenge and, by some, a necessity. Many characterizations have been offered, including:
• The Semantic Web is a computer system, a distributed machine which should function so as to perform socially useful tasks. [B98b]
• "The Web of data (and connections) with meaning in the sense that a computer program can learn enough about what data means to process it." [B99]
• "The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." [BHL01]
• "…next generation internet, where we will not only surf the web, but work the web." [A01]
• "The Semantic Web is a vision: the idea of having data on the Web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications." [W3C01]
• "The Semantic Web is a web of data, in some ways like a global database." [B98a]

For the purposes of this report, we focus on the key distinction between the current Web and the Semantic Web. The current Web is sometimes called an "eyeball Web", where all interpretation of accessed information occurs, literally, in the eye of the beholder, i.e., a human. On the Semantic Web, interpretation will be done primarily by software agents: every information-dependent resource, including enterprises, information services, application services and devices, needs to be augmented with machine-processable descriptions that support finding it, reasoning about it (e.g., which service is best) and using it (e.g., executing or manipulating it). The idea is that self-descriptions of data, together with other techniques, would allow context-understanding programs to find selectively what users want, or allow programs to work on behalf of humans and organizations to make them more scalable, efficient and productive.

None of the above definitions of, or perspectives on, the Semantic Web excludes a significant role for database and information systems (DB/IS); quite the contrary. Semantics has long been an important undercurrent in the database areas of modeling, query processing and transactions. Yet, as observed at a CoopIS panel [CoopIS01] and in the background to the Amicalola workshop [Agenda], most recent Semantic Web workshops and conferences have had limited involvement and participation by the DB/IS research community.

Arguably, a significant majority of Semantic Web researchers today come from AI. This has allowed early Semantic Web research to benefit from the strengths of past AI research, including expertise in knowledge modeling and representation languages. There are, however, significant differences in how different research communities view the approaches and mechanisms for achieving a Semantic Web.

One distinction stands out. The database community has long recognized the value of data independence and has distinguished between schema and data. This has been key to the scalability, efficiency and robustness of data management solutions. Because it aims to annotate every resource, the Semantic Web vision calls for creating the equivalent of a massive new distributed database of metadata (annotations), whose size may be of the same order of magnitude as the data itself and whose complexity will likely exceed that of the data. This should clearly be seen as an opportunity for DB/IS to contribute, in synergy with other disciplines, to making the Semantic Web a reality.

The workshop's agenda was thus to discuss what DB/IS can do for the Semantic Web and to identify new research challenges for the DB/IS research community in achieving the Semantic Web vision. In this, the Amicalola workshop complements and continues the work of other workshops that studied the relationship of the Semantic Web vision to various disciplines [S02], including AI subcommunities such as knowledge representation [E02] and machine learning.

As noted earlier, semantics has been part of various methods and techniques in database management, including (but not limited to) modeling, query processing and transaction management. However, the emerging Semantic Web changes the thinking about semantics at two levels:
• semantic annotation of all resources changes the scale at which techniques need to exploit semantics, and
• broader forms and types of semantics, such as domain semantics, open new research opportunities in applying them.

3 Workshop Overview
The workshop consisted of three activities. The first day involved short presentations by most of the participants (presentations and position papers appear on the workshop web site and in the proceedings, respectively). The second day consisted of workgroup discussions: the participants formed three workgroups, on ontology, web services and application pull. This division was likened to that of anatomy, physiology and pathology, respectively, in medicine. The final half day consisted of a review of the workgroup results, an exercise in discussing the role of DB/IS in enabling the Semantic Web and making it successful, and the new challenges that the emerging area of the Semantic Web poses for DB/IS research. (A table of the results from this last activity is appended at the end of this Report.) We briefly review the output of each workgroup, followed by a review of the relationships between DB/IS and the Semantic Web.

3.1 Application Pull
There was significant agreement, especially among the industry participants, that a future Semantic Web promises significant benefits to businesses. Semantics was seen as a necessary contributor to the efficiency of the world (e-)economy in at least three concrete ways. First, by generically improving the efficiency (e.g., reducing the cost) of business, government and personal processes that are currently on, or planned for, the Web, through easily accessible, standardized, meaningful interfaces to, and descriptions of, systems and data. Second, semantics is required to address the challenges posed by the growth and sophistication of the Web: machine-processable semantics is seen as a critical element of a scalable solution for dealing with the current and anticipated growth of the Web and with the expected vast number and sophistication of services available over it. Third, semantics is required to exploit the unique opportunities the Semantic Web will offer, such as converting all relevant processes (e.g., tax preparation, supply-chain management) from incomplete (e.g., using only accessible information) and discrete (e.g., computed once, and again whenever the solution becomes grossly sub-optimal) to comprehensive (e.g., using all relevant information) and continuous (e.g., tax planning and preparation become an integral part of the life of a person or organization, so that every financial event can be taken into account in real time).

Focusing on the process perspective as well, rather than only on data, the Application Pull subgroup observed that web services are real: the participants' organizations have started to build prototypes with them, and the semantic composition of processes (as in workflows) holds huge promise for business productivity and efficiency.

Key discussion areas and conclusions of this WG included the following:
(a) Applications ranging from individual applications (e.g., continuous tax preparation) to B2B (e.g., supply chain) and scientific/engineering research could benefit from Semantic Web R&D, with the corresponding beneficiaries varying
from individuals to organizations and society.
(b) The Semantic Web should and can lead to significant benefits, including lower barriers to entry, adaptability or dynamic behavior (to support changing situations), support for continuous activity, and various improvements (timeliness, accuracy, transparency, etc.).
(c) Challenges to realizing the Semantic Web's potential for applications include the design and specification of upper ontologies and domain ontologies with broad acceptance, and support for ontology management activities (create, search, select, maintain, map/integrate, etc.).

Much of the discussion outlined a real possibility of obtaining an order-of-magnitude or greater improvement in key business applications such as supply-chain management if even a limited part of the Semantic Web vision is realized. Participants also noted that at some companies, Web Services and their use of, and support for, semantics can be seen as initial forays toward Semantic Web applications. This WG felt that the Semantic Web vision is more than a research initiative, and that plentiful real-world applications can benefit as aspects of the vision are realized.

3.2 Ontology
This subgroup focused on the role of database management in supporting ontology engineering and management. The subgroup discussed many aspects of the ontology lifecycle (ontology search, matching, merging/refinement, maintenance, creation, modification/versioning, requirements analysis, evaluation, learning, consistency checking and deployment). It then identified the potential role of known database research and technology in addressing the various steps of the ontology lifecycle, and identified differences between the assumptions and focus of database research and the unique features and requirements of the methods and techniques that support those steps. A selection of the research items identified in this WG includes:
• Inference vs. query rewriting/processing for semantic integration (e.g., RichPerson = (AND Person (> Salary 100)))
• Distributed inference, and loss of information when supporting relationships other than equality
• Query languages for combining metadata and data queries
• Graph-based data models and query languages
• Schema correspondences/mappings
• Intensional answers (when answers are descriptions, e.g., (AND Person (> Salary 100)) instead of a list of all rich people)
• Semantic associations (identification of meaningful or contextually relevant relationships between classes and instances)

3.3 Web Services
Perhaps predictably, there was significant interest in web services at Amicalola Falls, with overlapping and complementary discussions in this subgroup and in the Application Pull subgroup. The subgroup quickly identified Semantic Web Services (SWS), i.e., Web Services that are "formally self-described", as being of primary research interest and of critical importance to the Semantic Web. The role of P2P (peer-to-peer) protocols as a possible new way of organizing WS-based systems was discussed. Moving from natural language (as in textual descriptions of Web Services) to tags to domain ontologies was described as a way to provide increasing levels of semantics.

[Figure: levels of Web service description. Recoverable labels: People, html, Hard code, Program, Amazon, Std, Self-described, Formally self-described, Worth pursuing.]
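The progression from natural-language descriptions to tags to domain ontologies can be illustrated with a small Python sketch. It assumes a hypothetical three-service registry and a toy concept hierarchy (every service name, description, tag and concept below is invented), and it is not drawn from the workshop material; it only shows why matching against ontology concepts, with subsumption, can find services that keyword or tag matching misses.

```python
from dataclasses import dataclass, field

# Hypothetical toy domain ontology: each concept maps to its parent (None = root).
ONTOLOGY = {
    "Product": None,
    "Book": "Product",
    "UsedBook": "Book",
    "Music": "Product",
}


def is_subconcept(concept, ancestor):
    """True if `concept` is `ancestor` or lies below it in ONTOLOGY."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


@dataclass
class Service:
    name: str
    text: str                                 # level 1: natural-language description
    tags: set = field(default_factory=set)    # level 2: free-form tags
    sells: str = "Product"                    # level 3: ontology concept


# Hypothetical registry; names, texts, tags and concepts are invented.
REGISTRY = [
    Service("shopA", "We sell second-hand novels and textbooks", {"used", "books"}, "UsedBook"),
    Service("shopB", "Online store for CDs and vinyl", {"music", "cd"}, "Music"),
    Service("shopC", "Bookseller with new releases", {"book", "new"}, "Book"),
]


def find_by_keyword(query):
    """Level 1: a service matches if any query word occurs in its free text."""
    words = set(query.lower().split())
    return [s.name for s in REGISTRY if words & set(s.text.lower().split())]


def find_by_tag(tag):
    """Level 2: exact tag match; 'book' does not match the tag 'books'."""
    return [s.name for s in REGISTRY if tag in s.tags]


def find_by_concept(concept):
    """Level 3: a service matches if what it sells is subsumed by `concept`."""
    return [s.name for s in REGISTRY if is_subconcept(s.sells, concept)]


print(find_by_keyword("book"))   # []
print(find_by_tag("book"))       # ['shopC']
print(find_by_concept("Book"))   # ['shopA', 'shopC']
```

Only the ontology-level query finds the second-hand bookseller, because UsedBook is declared a subconcept of Book; the keyword and tag levels depend on the exact words a provider happened to choose.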

The Web Services subgroup noted that, compared with issues that deal with data, web services are more challenging in matters such as modeling, organizing collections, discovery and comparison, distribution and replication, access and composition, fulfillment (contracts, coordination versus transactions, compliance), and quality aspects more general than correctness or precision. They are also more dynamic, and their security and trust are more difficult to characterize. This discussion led to the following research challenges for realizing SWS in the future:
• Conversational (state-based, event-based, history-based) web services
• Interoperability, composition and translation of web services
• Representations for services: programmatic self-description
• Commitments, contracts, negotiation
• Discovery, location, binding
• Compliance
• Cooperation
• Transactional workflow: rollback, roll-forward, semantic exception handling, recovery
• Trustworthy service (discovery, provisioning, composition, description)
• Security; privacy vs. personalization
• Quality of Service, w.r.t. various aspects
• Esoteric and advanced issues

The workshop also included presentations and some discussion on areas related to the Semantic Web: multi-model semantics, context-aware computing, semantics to pragmatics, and experiential computing. Although these may not yet be identified as core areas of the Semantic Web, they may become critical new areas in their own right.

In summary, the current Web supports virtually every type of human endeavor, and these uses are growing dramatically in coverage, sophistication and adoption. Semantics is viewed as the most important enabler for continuing this growth with better scalability and productivity. DB/IS research has the potential to play an increasingly important role in making the Semantic Web happen for business and scientific uses, significantly affecting how the technology and the Web support individuals, organizations and society at large.

4 Next Steps

4.1 Outreach and Community Building
This workshop has already been followed by another workgroup on the Semantic Web at the NSF-IDM PIs' workshop in May [S+02]. In addition, the organizers reviewed the results at the panel "Research Directions for the Semantic Web" organized by Rudi Studer at OntoWeb3 in Sardinia, Italy, in June 2002.

We also expect improved interaction between the relevant communities through the involvement of prominent DB/IS researchers in specific Semantic Web activities such as the ISWC conference, and through increased participation in the Semantic Web tracks being added to a number of relevant recurring events such as the WWW Conference.

A number of networks and resources supporting the emerging Semantic Web community are being or have been set up. Probably the best known at present is the OntoWeb Thematic Network of the EU (under its 5th Framework Program), in which 180 partners (more than 50% from industry) are actively collaborating to gather, represent and disseminate knowledge about relevant technologies, methods and tools. As the Semantic Web grows, we expect such initiatives to multiply and spread to other networks supporting a variety of interested communities, at least as Special Interest Groups.

4.2 Nourishment and Sustenance
With the basis provided by the Semantic Web Working Symposium, this workshop and the NSF-IDM workgroup, NSF-IIS is currently evaluating the possibility of
initiating a program that can sponsor research in this area.

A number of initiatives are also envisaged in Europe, notably as part of the planned 6th Framework Program, due to start in 2003, in which the Semantic Web will be the cornerstone of more than one Key Action of its Work Program (…). A number of concretely focused calls were already issued as part of the 5th Framework Program, and a number of projects are under way or starting up as this Report appears (ibid.).

5 Conclusions
The success and potential of the Web are leading to the possibility that every information resource, person and organization, and many of the activities relating to them, will be located on or driven by the Web. This offers the opportunity for qualitatively improved interactions, but it also quantitatively changes the scale and scope of already well-understood challenges in computer science. Even a simple extrapolation of the current Web (e.g., simply more resources) requires qualitatively improved solutions to the problems of interaction between resources, currently called interoperation, integration and collaboration. The only scalable solution involves improving the automation of interactions, which in turn is possible only with access to the enhanced "meaning" of all resources and the ability of software agents on the Web to deal with this enhanced meaning.

We see the Semantic Web as a long-term and fundamental research direction for DB/IS that requires a vigorous research program. It raises challenges in areas such as scalability, performance and robustness, which DB/IS has successfully tackled in the past, yet it also poses unique new research challenges. The Amicalola group believes that both a significant funding program targeted at DB/IS and collaboration with allied disciplines should be part of the research agenda.

References

[A01] J. Andersen, The Semantic Web Tutorial, XML 2001, Finland.
[Agenda] …opAgenda.htm
[B98a] T. Berners-Lee, Semantic Web Road map, 1998.
[B98b] T. Berners-Lee, Interpretation and Semantics on the Semantic Web, 1998.
[B99] T. Berners-Lee, Weaving the Web, Harper, 1999.
[CoopIS01] Panel on "Semantic Web: Rehash or Research Goldmine?", D. Fensel, R. Meersman, J. Mylopoulos and A. Sheth, CoopIS, Trento, Italy, 2001.
[E02] J. Euzenat, Report from the NSF-EU Workshop "Research Challenges and Perspectives of the Semantic Web", Sophia-Antipolis, October 2001.
[S02] R. Studer, "Research Directions for the Semantic Web" (Panel Introduction), OntoWeb3, Sardinia, Italy, June 2002.
[S+02] A. Sheth et al., Semantic Web Information Systems: NSF IDM workgroup report on challenges and opportunities in Semantic Web, July 2002.
[W3C01] W3C: Semantic Web Activity Statement, 2001.

Appendix: Compilation of the Amicalola Working Group's collective perception on the
(bidirectional) interaction between the SW and the DB/IS research

DB     /  IS How is it relevant to research on          How may the SW stimulate research in
subcommunity the SW                                     this community
DB theory    Type theory, Complexity, theory of         Ontology axiomatics and theory; formal
             concurrency                                semantics;     semantics    for    incomplete,
                                                        inconsistent and evolving representations
Data(base)       Everything; in particular ontology Ontology modeling; formal semantics of web
semantics        language development; constraints; services
                 data structures
Normalization/   Not specifically as such; some work Requirement for formal properties for
design           on Non-First Normal Form               ontology organization; perhaps ontology
                                                        design guidelines or “semantic normal
                                                        forms”; conflict resolution; redundancy
                                                        checks in general
Data modeling    reuse/extend/map DM formalisms, semantic data modeling; ontology content
                 techniques and methods e.g. EER, creation techniques and methods; complex
                 ORM, UML for ontology (content) ontological relationships; domain models
                 specification and design
View             Ontology alignment, translation, see Federated DBs; ontology support for view
integration      object identities, updateable views…; and      application   integration;    ontology
                 model mappings                         composition and update
Schema           apply to autonomously designed Ontology alignment; new kinds of models will
integration      schemas; global schemas as pre- pose new kinds of problems
                 ontologies? conflict detection
Deductive        Learn from its failure, query how to handle different complexity levels
DB/Datalog       processing and F-logic                 efficiently
Multimedia DB    Image ontologies; semantic indexing; Image-based ontologies?
                 similarity-based search
Temporal/Spati   GIS semantics and archiving; requirement to model temporal knowledge as
al DB            histories data management;             first class citizen in ontologies; spatial,
                                                        temporal modeling in upper ontologies;
                                                        versioning of GIS becomes critical issue
Document DB      Digital libraries, unstructured data; Lack of a priori global model presents a
                 standards for digital library resource research challenge
                 descriptions to beused on the SW
OO DB            Object-oriented and object-based management of large collections of object-,
                 models for ontologies, extensible behavior- and resource identifiers
                 databases; modeling of object
                 behavior; build OODB into Java
Visual DB        Visualization for the SW, visual semantic upgrades of image databases to be
                 queries; ontology visualization        used as visual ontologies
XML/Web DB       Most relevant, caching                 Size and semantics; XML shortcomings for
                                                        semantics definition
Distributed DB   everything                             trust/privacy/compliance issues in distributed
                                                        DBMS; design/dynamic tailoring of DDBMS
                                                        underlying web services
Constraint DB    Constraint enforcement as semantics Non-closed world assumption issues
                 mechanism; semantics-based query

Transaction      loosening of ACID properties           Web      services,     Extended     distributed
modeling                                                transaction models; non-CWA issues; smart
                                                        user profiling
Transaction      limits of what can/must be             ACID properties of Web services; semantic
processing       transactional                          support for very long transactions
Mobile DB        not directly; “mobile” is a platform   context-aware computing; device location-
                 issue                                  independent semantics; mobility issues
                                                        raised/enabled by the (Semantic) Web
Main memory Semantic caching                            possibly semantic caching i.e. using
DB                                                      application semantics or context
Parallel DB      unclear at present; straightforward    Not clear at present Web SoA; parallel
                 reuse/apply (e.g. parallel queries,    architectures for ontology servers?
                 transactions, …) in certain niches
DB machines                                             Not clear at present Web SoA
DB security      A lot, e.g., access control            trust and privacy, QoS; dynamically changing
                                                        and conflicting security requirements
Federated DB     Autonomy; approaches for integrating   www = huge federated DB; develop more
                 heterogeneous data sources, in         powerful (scalable) approaches for ontology
                 particular web information sources;    alignment and integration; heterogeneous
                 mediator/wrapper-based architectures   sources may have different credibility;
                                                        service
Query            high applicability; e.g. “smart” query
processing       enhancement
Query            high applicability; e.g. use domain
optimization     knowledge to optimize query
                 execution and rewriting
Information      broad applicability of techniques and
retrieval        theory
DB               Everything; esp. see federated DBs;    Semantic aspects of interoperability; see
interoperability see schema integration                 federated DBs; quality of interoperation
DB versioning    Link maintenance; ontology             Annotations, ontology modeling, versioning
                 versioning                             of instance data
Metadata                                                Annotations, ontology modeling, versioning
Mediation/       Web services will benefit              P2P, collaboration, new market for mediating
Middleware                                              components
DB               DW architectures for decision          Smart data warehousing; share/compose
warehousing      support; improve e.g. web service      application semantics; ontology behind “real”
                 efficiency; see the (S)Web as a
                 giant data
Data(base)       web mining; clustering; learning;      mining from text; exploit semantics in
mining           information extraction profiles        mining; derive semantics inductively from
                                                        query results on “real” data including
                                                        exceptions; machine learning
Database         DBMS (components) as web               Ontology support in data dictionaries; new,
architectures    service(s); add semantics to every     more flexible DB architectures for better SW
and DBMS         function/module in a DBMS              support and processing on the web
Web-IS           fitting enterprise IS (components)     New architectures and design principles for
architectures    into the SW; Web IS; also see DBMS     Web IS
Functional       design of web services; functional     Decomposition and composition of web
modeling         modeling that deals explicitly with a  services; event modeling
                 domain’s semantics
IS in            the looser coupling required provides  serving new organizations of business,
organizations    potential for organizations to morph   community and government with emergent
                 into the SW; see also workflow         SW-based IS technology
Web-IS                                                   smart (ontology-driven) SW portals and
applications                                             search engines (“Google++”-type); SW-based
                                                         “direct marketing”-style systems; smart user
IS workflow      exception handling in long (business)  unreliability of components; unavailability of
modeling         transactions; workflows as “the”       services
                 paradigm for “programming” the SW
IS               ontology lifecycle issues; as IS       New thinking required! E.g., Web IS in
methodologies    components become more intelligent,    enterprises; how business processes must
                 work shifts to self-organization       change to deal with the existence of the SW;
                                                        develop/maintain SW-based systems for a user
                                                        community unknown a priori
CASE tools      ontology management systems
User             new applications of design principles  New and complex requirements and methods,
interfaces       for GUIs                               immersive environments
DB application                                          Web application service
AI-and-DB       knowledge representation, inference
Uncharted                                               Sensor input and stream data management
territory 1
Uncharted        In general, most algorithms in DM
territory 2      perform poorly when they are applied
                 to access, report, etc. data on the
                 web. Domain semantics in such requests
                 need to be exploited; however,
                 “centralized” solutions (in which
                 resources need to notify potential
                 requestors) will not be scalable.

