Getty Metafinder Prototype


Helping people find content …
preparing content to be found
Enabling the Semantic Web
Joseph Busch
Outline
Why Semantics Matter
What is the Semantic Web
Semantic Content Management
Why Semantics Matter
When you own a Rembrandt you can spell his name any way you want.
But when you want to find a Rembrandt … you'd better spell his name correctly.
Vocabulary resources can help find the right
artist even if their name is typed incorrectly.
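As a minimal sketch of how a vocabulary resource might catch a misspelled artist name, the Python below checks a query against known name variants and falls back to fuzzy matching. The vocabulary contents, the function names, and the 0.75 cutoff are illustrative assumptions, not part of any actual vocabulary product.

```python
import difflib

# Hypothetical mini-vocabulary: preferred name -> known variant spellings.
ARTIST_VARIANTS = {
    "Rembrandt van Rijn": ["Rembrandt", "Rembrant", "Rembrandt van Ryn"],
    "Vincent van Gogh": ["Van Gogh"],
}


def build_lookup(vocab):
    """Map every lowercased name form to its preferred name."""
    lookup = {}
    for preferred, variants in vocab.items():
        for name in [preferred, *variants]:
            lookup[name.lower()] = preferred
    return lookup


def find_artist(query, vocab=ARTIST_VARIANTS, cutoff=0.75):
    """Return the preferred name for a query, tolerating typos, or None."""
    lookup = build_lookup(vocab)
    key = query.strip().lower()
    if key in lookup:  # exact hit on a known variant
        return lookup[key]
    # Otherwise take the closest known form, if it is similar enough.
    matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None
```

A known variant such as "Rembrant" resolves directly, while an unseen typo such as "Rembrandtt" is caught by the fuzzy fallback.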
Users cannot type in the complex queries needed to find all the relevant items... But this can be done automatically.
Complex queries are even more important when you search the entire web.
So you find Rembrandt the Dutch guy … and not Rembrandt the toothpaste.
Search Failure
19% Character errors (Young, et al.)
40% Vocabulary errors (Seaman)
20% Index confusion
21% Successful (Nielsen)
Search Solution
Generate more consistent content to search on.
Correct user errors.
Map the language of users to the language of the target
content.
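Mapping the language of users to the language of the target content can be sketched as automatic query expansion. The mapping table and the OR-query syntax below are assumptions chosen for illustration; a real search engine would have its own query grammar.

```python
# Hypothetical mapping from user terms to the terms used in the content.
USER_TO_CONTENT = {
    "car": ["automobile", "motor vehicle"],
    "rembrandt": ["Rembrandt van Rijn", "Rembrandt Harmensz. van Rijn"],
}


def expand_query(user_query, mapping=USER_TO_CONTENT):
    """Build an OR query from the user's term plus its mapped content terms."""
    terms = [user_query]
    terms.extend(mapping.get(user_query.strip().lower(), []))
    # Quote each term so multi-word names are treated as phrases.
    return " OR ".join(f'"{term}"' for term in terms)
```

The user types one word; the system submits the complex query on their behalf.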
Search Alternatives
Personalization: Content needs to be tagged with attributes that map to user categories.
Analytics: Users don’t follow predictable & consistent pathways.
Taxonomies: Automatically generated taxonomies reflect ambiguities of natural language.
Syndication: Requires subscriber profiles, well-categorized content, & managed rules.
Solution for Search Alternatives
Predictable standardized structures, and
Consistent semantics to work on
… so machines can understand it.
What is the Semantic Web
Berners-Lee’s Semantic Web
Formatting content so that machines can understand it.
Use XML/RDF:
Infinitely flexible markup language.
Process content in many more ways than simply for viewing
it.
Problem: Mostly syntax … not semantics (in the human
sense of meaning, i.e., language)
XML is a Grail-like Object
XML is just a means for encoding information—an
envelope standard. The real value is still in the information
that you put in the envelope.
Filling XML placeholders such as <meta>, <subject>, and
<maker> requires semantic information management.
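To illustrate the envelope point, the sketch below fills `<subject>` and `<maker>` placeholders with Python's standard `xml.etree.ElementTree`. The `<record>` and `<title>` element names and the sample values are assumptions; the markup is trivial to produce, and the value lies in choosing the right controlled terms to put in it.

```python
import xml.etree.ElementTree as ET


def build_record(title, subject, maker):
    """Wrap already-controlled values in a simple XML envelope."""
    record = ET.Element("record")  # hypothetical wrapper element
    ET.SubElement(record, "title").text = title
    ET.SubElement(record, "subject").text = subject
    ET.SubElement(record, "maker").text = maker
    return ET.tostring(record, encoding="unicode")


# The maker value should come from a managed vocabulary, not free typing.
xml_text = build_record("The Night Watch", "Group portraits", "Rembrandt van Rijn")
```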
Soergel’s SemWeb Proposal
System of integrated access to data on concepts and
terminology.
Bring together variety of sources that exist largely in
separate worlds, including dictionaries, thesauri,
classification schemes, etc.
Federated system with multiple collaborators.
Common interface to all concept & terminology knowledge
bases on the Internet.
The Real Semantic Web
Namespace for uniquely identifying a semantic scheme &
each concept within each scheme.
Broad template or conceptual schema for holding all types
of semantic information & specifying relationships among
them.
Definitions of services for interacting with the System.
Vocabulary Markup Language (VocML)
XML schema for the Semantic Web.
Broad template for structured representation of semantic
schemes.
Dublin Core metadata.
Tags and syntax for uniquely identifying each concept.
Typed relationships (hierarchical, associative, etc.)
Typed notes.
Networked Knowledge Organization Systems
nkos.slis.kent.edu
<?xml version="1.0"?>
<!DOCTYPE VocML SYSTEM "VocML.dtd">
<VocML version="1.1">
<SrcVocab>
<SVHeader>
<!-- Dublin Core -->
<dc:Title>DFSIC-1998</dc:Title>
<dc:Source>Standard Industrial Classification (1987)</dc:Source>
<dc:Creator>Interwoven</dc:Creator>
<dc:Contributor>U.S. Department of Commerce</dc:Contributor>
…
<workNum UIDprefix="DFSIC-1998" DisplayTitle="Standard Industrial Classification" BriefDisplay="SIC"/>
</SVHeader>
<!-- Unique ID -->
<SVTerm UID="DFSIC-1998::0139" CCID="104:43">
<label>Field Crops, except Cash Grains, not elsewhere classified</label>
<definition>Establishments primarily engaged in the production of field crops, except cash grains, not elsewhere
classified. This industry also includes establishments deriving 50 percent or more of their total value of sales of
agricultural products from field crops, except cash grains (Industry Group 013), but less than 50 percent from
products of any single industry.</definition>
<cla>0139</cla>
<!-- Typed Relationships -->
<typedRelation UREF="DFSIC-1998::013" UTYPE="Z39.19-1980::2" Name="BT"/>
<typedRelation UREF="DFSIC-1998::013900" UTYPE="Z39.19-1980::3" Name="NT"/>
…
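A minimal sketch of how an application might read the unique identifiers and typed relationships from such a record, using Python's standard `xml.etree.ElementTree`. The fragment is a simplified, self-contained stand-in for the sample (the `dc:` header is omitted and the empty elements are self-closed), and the function names are hypothetical; this is not a VocML reference implementation.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for one term from the sample; an assumption, not full VocML.
FRAGMENT = """
<SVTerm UID="DFSIC-1998::0139">
  <label>Field Crops, except Cash Grains, not elsewhere classified</label>
  <typedRelation UREF="DFSIC-1998::013" UTYPE="Z39.19-1980::2" Name="BT"/>
  <typedRelation UREF="DFSIC-1998::013900" UTYPE="Z39.19-1980::3" Name="NT"/>
</SVTerm>
"""


def split_uid(uid):
    """Split 'scheme::concept' into its scheme and concept parts."""
    scheme, _, concept = uid.partition("::")
    return scheme, concept


def read_term(xml_text):
    """Return the term's identifier, label, and typed relationships."""
    term = ET.fromstring(xml_text.strip())
    relations = [
        (rel.get("Name"), split_uid(rel.get("UREF")))
        for rel in term.findall("typedRelation")
    ]
    return {
        "id": split_uid(term.get("UID")),
        "label": term.findtext("label"),
        "relations": relations,
    }


term = read_term(FRAGMENT)
```

Here `BT` and `NT` are the usual thesaurus abbreviations for broader term and narrower term, so the parsed relations place concept 0139 between group 013 and item 013900.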
Implementing the Semantic Web
The Holy Grail is ...
Accurate information automatically processed so that it
can easily be found and used for applications.
A rich web of linked information, with markup allowing
machines to route relevant information to the audiences
that value it most.
Metatagging
The hard work is mining content to extract key information:
Recognize the mentions of people, organizations, places,
and things.
Infer the subject matter.
And putting it into formats with standard labels for effective
exploitation.
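The recognition step can be sketched with a simple vocabulary lookup. The gazetteer entries and the type labels below are assumptions for illustration; production metatagging would use much larger vocabularies and handle name variants and ambiguity.

```python
import re

# Hypothetical gazetteer: entity type -> names to recognize.
GAZETTEER = {
    "person": ["Joseph Busch", "Rembrandt"],
    "organization": ["Interwoven", "U.S. Department of Commerce"],
    "place": ["Amsterdam"],
}


def tag_mentions(text, gazetteer=GAZETTEER):
    """Return {entity_type: [names found]} for whole-word mentions in text."""
    tags = {}
    for entity_type, names in gazetteer.items():
        found = [
            name
            for name in names
            # Lookarounds keep 'Rembrandt' from matching inside a longer word.
            if re.search(r"(?<!\w)" + re.escape(name) + r"(?!\w)", text)
        ]
        if found:
            tags[entity_type] = found
    return tags
```

The resulting dictionary is the kind of labeled output that can then be written into standard metadata fields.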
Semantic Content Management
Raw Content
• unstructured text
• untagged data
Tag It (Vocabularies)
Structured Content
• metadata
• XML/RDF
Exploit It (User Queries)
• database search
• text search
Relevant Information
• found items
• granular text
Exploiting the Semantic Web
Route content to audience segments that value it most.
Link mentions of people, organizations, places, and things
to other information related to those entities.
Populate portal directories.
Precisely search heterogeneous content items.
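Once content carries consistent tags, routing it to audience segments reduces to matching tags against profiles. The segment names and interest sets below are hypothetical, and real syndication would add managed rules on top of this sketch.

```python
# Hypothetical audience profiles: segment name -> tags that segment values.
SEGMENT_PROFILES = {
    "art-historians": {"Rembrandt", "Dutch painting"},
    "agriculture-analysts": {"Field Crops", "SIC 0139"},
}


def route(item_tags, profiles=SEGMENT_PROFILES):
    """Return the segments whose interests overlap the item's tags."""
    tags = set(item_tags)
    return sorted(name for name, interests in profiles.items() if tags & interests)
```

An item tagged "Rembrandt" goes to the art historians; an item with no matching tags goes nowhere, which is why tagging quality matters.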
Predictions
VocabularyML.
Semantic standard for unique identifiers (a namespace) for
people, organizations, places, and things and the
relationships among them.
See: nkos.slis.kent.edu
Technologies that enable the persistent naming of the
information inside XML envelopes.
Generation of enormous value through interoperability
among web applications.
Joseph A. Busch
Content Intelligence Evangelist
ASIST President, 2001
415-778-3129
fax 415-778-3131
jbusch@interwoven.com
Moving business to the Web
www.interwoven.com