Taxonomy Strategies LLC
FAQs About Taxonomies & Metadata
Joseph A. Busch & Ron Daniel, Jr.
May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
Taxonomy Strategies LLC The business of organized information 2
Who is Joseph Busch?
Over 25 years in the business of organized information
Founder, Taxonomy Strategies
Director, Solutions Architecture, Interwoven
VP, Infoware, Metacode Technologies
Program Manager, Getty Foundation
Manager, Pricewaterhouse
Metadata and taxonomies community leadership
President, American Society for Information Science & Technology
Director, Dublin Core Metadata Initiative
Adviser, National Research Council Computer Science and
Telecommunications Board
Reviewer, National Science Foundation Division of Information and
Intelligent Systems
Founder, Networked Knowledge Organization Systems/Services
Who is Ron Daniel, Jr.?
Over 15 years in the business of metadata & automatic
classification
Principal, Taxonomy Strategies
Standards Architect, Interwoven
Senior Information Scientist, Metacode Technologies
Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership
Chair, PRISM (Publishers Requirements for Industry Standard Metadata)
working group
Acting Chair, XML Linking working group
Member, RDF working groups
Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2
reports.
Who has Taxonomy Strategies worked with?
Government:
Commodity Futures Trading Commission
Defense Intelligence Agency
ERIC
Federal Aviation Administration
Federal Reserve Bank of Atlanta
Forest Service
GSA Office of Citizen Services (www.firstgov.gov)
Head Start
Infocomm Development Authority of Singapore
NASA (nasataxonomy.jpl.nasa.gov)
Small Business Administration
Social Security Administration
USDA Economic Research Service
USDA e-Government Program (www.usda.gov)
Commercial:
Allstate Insurance
Blue Shield of California
Debevoise & Plimpton
Halliburton
Hewlett Packard
Motorola
PeopleSoft
PricewaterhouseCoopers
Siderean Software
Sprint
Time Inc.
Commercial subcontracts:
Agency.com – Top financial services
Critical Mass – Fortune 50 retailers
Deloitte Consulting – Big credit card
Gistics/OTB – Direct selling giant
International orgs & Non-profits:
CEN
IDEAlliance
IMF
OCLC
What we do
Organize Stuff
Who are you? What do you want out of today?
Government / NGO / SME / Global 2000?
IT / Library & IM / Public Affairs / Product Management
/ Engineering / HR & Finance / Other?
Webmaster / Technical / Researcher / Editorial /
Supervisory / Executive?
Competing session – Search & Content Management:
Putting the Puzzle Pieces Together
What brought you HERE instead of THERE?
What is metadata? Different definitions
Library & Information Science:
Author/Title/Subject
Controlled Vocabularies for Subject Codes (e.g. Dewey)
Authority Files for Author Names
Database:
Tables/Columns/Datatypes/Relationships
References for some values
What is metadata? Another view of Dublin Core
Better resource description = Better navigation & discovery
Subject metadata – What & Why: Subject, Description, Coverage
Use metadata – How can it be used: Rights & Permissions
Asset metadata – Who, Where & When: Title, Creator, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language
Relational metadata – Links between and to: Relation
[Diagram axes: Difficult to Generate; Functionality]
Are there extensions to the Dublin Core?
Elements: 1. Identifier, 2. Title, 3. Creator, 4. Contributor, 5. Publisher, 6. Subject, 7. Description, 8. Coverage, 9. Format, 10. Type, 11. Date, 12. Relation, 13. Source, 14. Rights, 15. Language
Refinements: Abstract, Access rights, Alternative, Audience, Available, Bibliographic citation, Conforms to, Created, Date accepted, Date copyrighted, Date submitted, Education level, Extent, Has format, Has part, Has version, Is format of, Is part of, Is referenced by, Is replaced by, Is required by, Issued, Is version of, License, Mediator, Medium, Modified, Provenance, References, Replaces, Requires, Rights holder, Spatial, Table of contents, Temporal, Valid
Encodings: Box, DCMIType, DDC, IMT, ISO3166, ISO639-2, LCC, LCSH, MESH, Period, Point, RFC1766, RFC3066, TGN, UDC, URI, W3CDTF
Types: Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
What is metadata: A scheme for recipes
Element | Data Type | Length | Source | Purpose
Asset Metadata
Unique ID | Integer | Fixed | System supplied | Basic accountability
Recipe Title | String | Variable | Licensed Content | Text search & results display
Recipe summary | String | Variable | Licensed Content | Content
Main Ingredients | List | Variable | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list
Subject Metadata
Meal Types | List | Variable | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | List | Variable | Cuisines | Browse or group recipes & filter search results
Courses | List | Variable | Courses vocab | Browse or group recipes & filter search results
Cooking Method | Flag | Fixed | Cooking vocab | Browse or group recipes & filter search results
Link Metadata
Recipe Image | Pointer | Variable | Product Group | Merchandize products
Use Metadata
Rating | String | Variable | Licensed Content | Filter, rank, & evaluate recipes
Release Date | Date | Fixed | Product Group | Publish & feature new recipes
What is a taxonomy?
Systematics view: Biological taxonomy places an organism in one and only one place.
Linnaeus: Kingdom Animalia > Phylum Chordata > Class Mammalia > Order Carnivora > Family Canidae > Genus Canis > Species C. familiaris
Pragmatic view: But most of the time things belong to more than one category.
Dogs: Pets, Mammals, Farm Animals, …
Are there other organizational schemes?
Type | Remarks
Synonym Ring | Connects a series of terms together; treats them as equivalent for search purposes
Authority File | Used to control variant names with a preferred term; typically used for names of countries, individuals, organizations
Classification Scheme | An arrangement of knowledge; does not follow taxonomy rules; usually enumerated, e.g., LC or Dewey
Thesaurus | Expresses semantic relationships of: Hierarchy (broader & narrower terms), Equivalence (synonyms), Associative (related terms)
Ontology | Resembles faceted taxonomy but uses richer semantic relationships among terms and attributes and strict specification rules
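A synonym ring is the simplest of these schemes to sketch in code: a set of terms treated as equivalent when a query is expanded. The rings and the `expand_query` helper below are illustrative assumptions, not taken from any real vocabulary or search product.

```python
# Hypothetical synonym rings: each set lists terms treated as equivalent for search.
SYNONYM_RINGS = [
    {"car", "automobile", "auto"},
    {"soda", "pop", "soft drink"},
]


def expand_query(term):
    """Return the term plus every term that shares a synonym ring with it."""
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded


print(sorted(expand_query("auto")))
```

Unlike an authority file, a ring has no preferred term: every member expands to the same set.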
Another point of view ….
[Diagram: vocabularies arranged from Simple to Complex: Synonym Rings, Authority Files, Classification Schemes, Thesauri, with Taxonomies and Ontologies shown above them. Relationships along the same scale: Equivalence, Hierarchical, Associative.]
Source: Amy Warner. Metadata and Taxonomies for a More Flexible Information
Architecture (http://www.lexonomy.com/presentations/metadataAndTaxonomies.ppt)
Taxonomic metadata – e-Forms example
Metadata elements (facets) and their taxonomies (values):
Agency: 0001 Legislative; 1000 Judicial; 1100 Executive Office of Pres; 0003 Exec Depts; 1200 Agriculture; 1300 Commerce; 9700 Defense; 9100 Education; 8900 Energy; 7500 HHS; 7000 DHS; 8600 HUD; 1400 Interior; 1500 Justice; 1600 Labor; 1900 State; 6900 Transport; 2000 Treasury; 3600 Veterans; Ind Agencies; Intl Orgs
Form Type: Application; Approval; Claim; Information request; Information submission; Instructions; Legal filing; Payment; Procurement; Renewal; Reservation; Service request; Test; Other input; Other transaction
Industry Impact: 00 Generic; 11 Agriculture; 21 Mining; 22 Utilities; 23 Construct; 31-33 Manuf; 42 Wholesale; 44-45 Retail; 48-49 Trans; 51 Info; 52 Finance; 54 Profession; 55 Mgmt; 56 Support; 61 Education; 62 Health Care; 71 Arts; 72 Hospitality; 81 Other Services; 92 Public Admin
Jurisdiction: Federal; State +; Local +; Other +
BRM Impact: Citizen Srvcs; Social Srvs; Defense; Disasters; Econ Dev; Education; Energy; Env Mgmt; Law Enf; Judicial; Correctional; Health; Security; Income Sec; Intelligence; Intl Affairs; Nat Resour; Transport; Workforce; Science; Delivery; Support; Management
Keyword Topic: Agriculture & food; Commerce; Communications; Education; Energy; Env pro; Foreign rels; Govt; Health & safety; Housing & comm dev; Labor; Law; Named grps; National def; Nat resources; Recreation; Sci & tech; Social pgms; Transport
Audience: All; General; Citizen; Business; Govt; Employee; Native American; Non-resident; Tourist; Special group
Why use faceted taxonomies?
4 independent categories of 10 nodes each have the same discriminatory power as one hierarchy of 10,000 nodes (10^4)
Easier to maintain
Can be easier to navigate
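The arithmetic behind the 10,000-node claim can be checked with a short sketch. The four facet names are placeholders chosen for illustration; only the counts (4 facets of 10 values) come from the slide.

```python
from itertools import product
from math import prod

# Hypothetical facets with 10 placeholder values each.
facets = {
    name: [f"{name}-{i}" for i in range(10)]
    for name in ("topic", "audience", "content_type", "location")
}

# Terms someone has to maintain: the facets are added together.
nodes_to_maintain = sum(len(values) for values in facets.values())

# Distinct combinations a document can be tagged with: the facets multiply.
distinct_combinations = prod(len(values) for values in facets.values())

# Enumerating every combination confirms the count.
assert distinct_combinations == sum(1 for _ in product(*facets.values()))

print(nodes_to_maintain, distinct_combinations)
```

Maintenance effort grows with the sum of the facet sizes (40 terms), while discriminatory power grows with their product (10,000 combinations).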
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
Can I get a taxonomy off-the-shelf or create one with software?
How do you know it is good?
How do you build or modify to make it good?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
How do I get a good Taxonomy? – Seven practical rules
1) Incremental, extensible process that identifies and enables
users, and engages stakeholders.
2) Quick implementation that provides measurable results as
quickly as possible.
3) Not monolithic—has separately maintainable facets.
4) Re-uses existing IP as much as possible.
5) A means to an end, and not the end in itself.
6) Not perfect, but it does the job it is supposed to do—such as
improving search and navigation.
7) Improved over time, and maintained.
Can I get a taxonomy off the shelf?
Sure:
www.taxonomywarehouse.com
There are usually license fees, but they will be less than
the effort to develop an equivalent taxonomy.
The voice of experience says these will usually not be
what you want.
We recommend:
Adopt a faceted approach.
Reuse existing (esp. internal) vocabularies for as many
of the facets as reasonable.
Plan on doing full-custom “Content Type” and “Subject”
taxonomies.
Sources for 8 common taxonomies
Taxonomy | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Your records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, Your market segments, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, Your sales regions, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, Your business functions, etc.
Topic | Business topics relevant to your mission & goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, Your research areas, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, Your psycho-graphics or personas, etc.
Products & Services | Names of products/programs & services. | ERP system, Your products and services, etc.
What about automatically created taxonomies?
Documents can be
‘clustered’ based on
similarities and differences.
Problems:
Typically only a single
hierarchy
No overall plan
Results hard for people to
navigate
What does “North” mean on this map?
What should I expect from automatic taxonomy construction software?
Software can scan large quantities of
content and extract statistically significant
words and phrases.
Example: Archive of 10 publications was
analyzed for topics significant to ‘copyright’.
Software does a poor job of
de-duplication
turning those significant words and phrases
into a larger structure
discriminating between gold and garbage
Software is good for
getting an understanding of the key phrases
in a large amount of content
providing test cases for evaluating a taxonomy
Source: Sample data courtesy of Randy Marcinko and nStein.
How can I test a Taxonomy? – Qualitative methods
Method | Process | Validation
Walk-throughs | Show and explain | Approach; Consistency to rules; Appropriateness to task
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Quantitative Method – How evenly does it divide the content?
Background:
Documents do not distribute uniformly across categories
Zipf (1/x) distribution is expected behavior
80/20 rule in action (actually 70/20 rule)
[Chart: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. Y axis: Number of Records (0 to 350,000). X axis: Top 10 Content Types.]
Methodology:
Part of alpha test of ‘content type’ for corporate intranet
115 URLs selected at random from search index were manually categorized.
Inaccessible files and ‘junk’ were removed
Results:
Results were slightly more uniform than the Zipf distribution, which is better than expected
[Chart: Measured and Expected Distribution of Content Types in an Intranet. Y axis: # Documents (0 to 25). X axis: Content Type. Series: Measured, Expected.]
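One way to produce the "expected" series for such a comparison is to spread the same total across ranks in proportion to 1/rank. The sketch below assumes that reading of "Zipf (1/x)"; the measured counts are invented for illustration and are not the intranet test data.

```python
def zipf_expected(total, n_categories):
    """Expected count per rank if counts are proportional to 1/rank."""
    weights = [1 / rank for rank in range(1, n_categories + 1)]
    scale = total / sum(weights)
    return [weight * scale for weight in weights]


# Hypothetical measured counts for five content types, sorted largest first.
measured = [40, 25, 20, 18, 12]
expected = zipf_expected(sum(measured), len(measured))

for rank, (m, e) in enumerate(zip(measured, expected), start=1):
    print(f"rank {rank}: measured {m}, expected {e:.1f}")
```

A measured series that sits flatter than the expected one, as in this made-up sample, means the categories divide the content more evenly than a pure 1/x curve would.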
Quantitative Method – How intuitive (repeatable) are the categorizations?
Methodology: Closed Card Sort
For alpha test of a grocery site
15 Testers put each of 100 best-selling products into one of 10 pre-defined categories
Categories where fewer than 14 of 15 testers put product into same category were flagged
Results:
% of Testers | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%
“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
In the trade, “Corn Tortillas” are a Dairy item!
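The flagging step of a closed card sort reduces to counting votes per product. In this sketch the 14-of-15 threshold comes from the slide; the function names and the vote splits are hypothetical.

```python
from collections import Counter


def agreement(placements):
    """Return (most common category, number of testers who chose it)."""
    category, votes = Counter(placements).most_common(1)[0]
    return category, votes


def flag_products(results, threshold=14):
    """List products whose top category received fewer than `threshold` votes."""
    return [
        product
        for product, placements in results.items()
        if agreement(placements)[1] < threshold
    ]


# Hypothetical votes from 15 testers for two products.
results = {
    "Orange Juice": ["Beverages"] * 15,
    "Cocoa Drinks - Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
}
flagged = flag_products(results)
print(flagged)
```

A flagged product is a candidate for cross-listing in more than one category rather than proof that the taxonomy is wrong.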
Quantitative Method – How does taxonomy “shape” match that of content?
Background:
Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
Counts of terms and documents summed within taxonomy hierarchy
Results:
Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
Mismatches between term% and document% flagged
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
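Flagging mismatches can be as simple as comparing the two percentages for each term group. The figures below are the ones in the slide's table; the factor-of-3 cutoff is an assumption made for this sketch, since the slide does not say what threshold was used.

```python
# (term group, % of terms, % of docs) as given in the slide's table.
ROWS = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 1.1),
    ("News Media", 0.6, 3.1),
    ("Other", 7.3, 2.0),
    ("Parents and Families", 2.8, 6.0),
    ("Policymakers", 4.5, 11.5),
    ("Researchers", 2.2, 3.6),
    ("School Support Staff", 2.2, 0.2),
    ("Student Financial Aid Providers", 1.7, 0.7),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(rows, factor=3.0):
    """Flag groups whose doc share differs from their term share by `factor` or more."""
    flagged = []
    for group, pct_terms, pct_docs in rows:
        ratio = pct_docs / pct_terms
        if ratio >= factor or ratio <= 1 / factor:
            flagged.append((group, round(ratio, 2)))
    return flagged


flagged = flag_mismatches(ROWS)
for group, ratio in flagged:
    print(f"{group}: docs/terms = {ratio}")
```

A high ratio suggests a branch that may need more terms for the content it carries; a low ratio suggests a branch that is more detailed than its content justifies.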
How do large corporations typically extend the Dublin Core?
[Bar chart. Categories: Doc Types, Products & Services, Roles. Data labels: 100%, 86%, 57%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of
Dublin Core metadata in Corporate Environments
(http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp)
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
How are we going to populate metadata elements with complete and consistent
values?
What can we expect to get from automatic classifiers?
What kinds of tools do people use?
How do different automatic classification tools compare?
What else should I keep in mind?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
General remarks on tagging
Province of authors (SMEs) or editors?
Taxonomy often highly granular to meet task and re-use needs.
Vocabulary dependent on originating department.
The more tags there are (and the more values for each tag), the
more hooks to the content.
If there are too many, authors will resist and use “general” tags
(if available)
Automatic classification tools exist, and are valuable, but results
are not as good as humans can do.
“Semi-automated” is best.
Degree of human involvement is a cost/benefit tradeoff.
What methods do large companies use to create & maintain metadata?
[Bar chart. Categories: Forms, Distributed Production, Centralized production, Not Automated. Data labels: 71%, 57%, 43%, 43%.]
Base: 20 corporate information managers
Source: CEN/ISSS Workshop on Dublin Core. Guidance information for the deployment of
Dublin Core metadata in Corporate Environments
(http://www.cenorm.be/cenorm/businessdomains/businessdomains/isss/cwa/cwa15247.asp)
How do tools compare? Analyst viewpoint
[Chart: tools plotted by Content Volumes (low to high) against Accuracy Level (low to high).]
What accuracy should we expect from an automatic classifier?
Classification Performance is measured by “Inter-cataloger agreement”
Trained librarians agree less than 80% of the time
Errors are subtle differences in judgment, or big goofs
Automatic classification struggles to match human performance
Exception: Entity recognition can exceed human performance
Classifier performance limited by algorithms available, which is limited by development effort
Very wide variance in one vendor’s performance depending on who does the implementation, and how much time they have to do it
[Chart: Accuracy against Development Effort/Licensing Expense. Labels: Regexps, Trained Librarians, potential performance gain.]
1) 80/20 tradeoff where 20% of effort gives 80% of performance.
2) Smart implementation of inexpensive tools will outperform naive implementations of world-class tools.
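Inter-cataloger agreement can be estimated, in its simplest form, as the share of documents on which two people assign the same category; the same measure then scores a classifier against a human. This is a minimal sketch with invented labels, and it ignores chance agreement, which a fuller evaluation would correct for.

```python
def percent_agreement(tags_a, tags_b):
    """Share of documents to which both sources assigned the same category."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty label lists")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


# Hypothetical labels for the same five documents.
librarian_1 = ["Policy", "News", "Manual", "News", "Policy"]
librarian_2 = ["Policy", "News", "Manual", "Policy", "Policy"]
classifier = ["Policy", "Manual", "Manual", "Policy", "News"]

human_baseline = percent_agreement(librarian_1, librarian_2)
machine_score = percent_agreement(librarian_1, classifier)
print(human_baseline, machine_score)
```

The human baseline, not 100%, is the realistic ceiling to hold an automatic classifier against.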
How do tools compare? Pragmatic viewpoint
[Chart: tools plotted by Content Volumes (low to high) against Accuracy Level (low to high).]
What kind of metadata creation and maintenance process is needed?
Even ‘purely’ automatic meta-tagging systems need a manual error correction procedure.
Should add a QA sampling mechanism
Tagging models:
Author-generated
Central librarians
Hybrid – central auto-tagging service, distributed manual review and correction
[Workflow diagram: Sample of ‘author-generated’ metadata workflow. Steps: Compose in Template; Automatically fill-in metadata; Submit to CMS; Approve/Edit metadata; Review content; Copy Edit content. Decision points: Problem? (Y/N). Outputs: Hard Copy, Web site. Roles: Tagging Tool, Analyst, Editor, Copywriter, Sys Admin.]
Tagging tool example: Interwoven MetaTagger
Manual form fill-in w/ check
boxes, pull-down lists, etc.
Auto keyword &
summarization
Tagging tool example: Interwoven MetaTagger
Auto-categorization
Rules & pattern matching
Parse & lookup (recognize names)
Where do I put the metadata?
Where can I store metadata?
In the content – HTML Headers, File properties, etc.
In a centralized repository – Search index, Metadata database, etc.
Where should I store metadata? It depends.
If you are moving files through a process, putting it in the file keeps it
from getting dropped at system borders.
If you are doing search across multiple documents, it has to be at
least copied out of the files.
If you make copies of files and modify them, consistent in-file
metadata will be impossible.
Real question is not where to STORE the metadata, it is how to
MAINTAIN the metadata.
Web CMS as an example
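Copying in-file metadata out to a central index can be sketched with the Python standard library. The example assumes metadata stored in HTML `<meta>` tags with `DC.`-prefixed names; the sample page and the `extract_metadata` helper are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            # A name may repeat (e.g. several subjects), so keep a list.
            self.metadata.setdefault(name, []).append(content)


def extract_metadata(html_text):
    """Return {meta name: [values]} for one HTML document."""
    reader = MetaTagReader()
    reader.feed(html_text)
    return reader.metadata


# Hypothetical page carrying its own metadata in the HTML header.
page = """<html><head>
<meta name="DC.title" content="Sample page">
<meta name="DC.subject" content="Taxonomies">
<meta name="DC.subject" content="Metadata">
</head><body></body></html>"""

index_record = extract_metadata(page)
print(index_record)
```

The extracted record is a copy, so the maintenance question remains: when the file changes, something has to refresh the index.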
Agenda
9:00 Who are we?
9:10 What are taxonomies & metadata?
9:30 What kinds of taxonomies are there, and what do I need?
9:40 How do I get a good taxonomy?
10:05 How do I associate the taxonomy with content?
10:30 Break
10:45 What do taxonomies and metadata have to do with search?
Does adding a taxonomy mean replacing my search engine?
How are they used behind the scenes in a search implementation
How are they used in the Search UI to aid searching?
How can we make our current search engine better?
11:15 How can I sell my management on a taxonomy project?
11:45 Any more questions?
12:00 Adjourn
How to fix search? … Add metadata to search on!
“Adding metadata to unstructured content allows it to be managed
like structured content. Applications that use structured content work
better.”
“Enriching content with structured metadata is critical for supporting
search and personalized content delivery.”
“Content that has been adequately tagged with metadata can be
leveraged in usage tracking, personalization and improved
searching.”
“Better structure equals better access: Taxonomy serves as a
framework for organizing the ever-growing and changing information
within a company. The many dimensions of taxonomy can greatly
facilitate Web site design, content management, and search
engineering. If well done, taxonomy will allow for structured Web
content, leading to improved information access.”
How does Google do so well without metadata?
They don’t, they just use particular types of metadata:
Number of incoming links
PageRank for each incoming link
Text of incoming links
Dublin Core framework for corporate use
Not just 15 elements
A framework to enable cross-resource exploration and
use
Dublin Core is framework
for “integration metadata”
at BellSouth
Source: Courtesy of Todd Stephens, BellSouth
What about Search? Integration Metadata
Element | Dublin Core | Data Type | Length | Req. / Repeat | Source | Purpose
Asset Metadata
Unique ID | dc:identifier | Integer | Fixed | 1 | System supplied | Basic accountability
Recipe Title | dc:title | String | Variable | 1 | Licensed Content | Text search & results display
Recipe summary | dc:description | String | Variable | 1 | Licensed Content | Content
Main Ingredients | X | List | Variable | ? | Main Ingredients vocabulary | Key index to retrieve & aggregate recipes, & generate shopping list
Subject Metadata
Meal Types | X | List | Variable | * | Meal Types vocab | Browse or group recipes & filter search results
Cuisines | X | List | Variable | * | Cuisines | Browse or group recipes & filter search results
Courses | X | List | Variable | * | Courses vocab | Browse or group recipes & filter search results
Cooking Method | X | Flag | Fixed | * | Cooking vocab | Browse or group recipes & filter search results
Link Metadata
Recipe Image | dcterms:hasPart | Pointer | Variable | ? | Product Group | Merchandize products
Use Metadata
Rating | (none shown) | String | Variable | 1 | Licensed Content | Filter, rank, & evaluate recipes
Release Date | dc:date | Date | Fixed | 1 | Product Group | Publish & feature new recipes
Legend: ? – 1 or more; * – 0 or more
dc:type=“recipe”, dc:format=“text/html”, dc:language=“en”
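An integration layer built on this slide could translate local recipe fields into Dublin Core properties before the record is handed to search. In the sketch below the property names and constant values are the ones on the slide; the snake_case field names, the sample record, and the `to_dublin_core` helper are assumptions made for illustration.

```python
# Local field -> Dublin Core property, as listed on the slide. Fields the slide
# marks "X", or shows without a mapping, are left out.
FIELD_TO_DC = {
    "unique_id": "dc:identifier",
    "recipe_title": "dc:title",
    "recipe_summary": "dc:description",
    "recipe_image": "dcterms:hasPart",
    "release_date": "dc:date",
}

# Constant values given in the slide legend.
DC_CONSTANTS = {"dc:type": "recipe", "dc:format": "text/html", "dc:language": "en"}


def to_dublin_core(record):
    """Return a Dublin Core view of a recipe record, dropping unmapped fields."""
    dc_record = dict(DC_CONSTANTS)
    for field, value in record.items():
        if field in FIELD_TO_DC:
            dc_record[FIELD_TO_DC[field]] = value
    return dc_record


# Hypothetical recipe record.
recipe = {
    "unique_id": 1234,
    "recipe_title": "Corn Tortillas",
    "cuisines": ["Mexican"],
    "release_date": "2005-05-16",
}
dc_view = to_dublin_core(recipe)
print(dc_view)
```

Local-only fields such as `cuisines` stay in the source system; only the shared Dublin Core view crosses the system boundary.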
How do I sell Management on a Taxonomy Project?
Don’t sell “metadata” or “taxonomy”, sell the vision of
what you want to be able to do.
Clearly understand what the problem is and what the
opportunities are.
Do the calculus (costs and benefits)
Design the taxonomy (in terms of LOE) in relation to
the value at hand.
Fundamentals of metadata ROI
Tagging content using metadata and a taxonomy are
costs, not benefits.
There is no benefit without exposing the tagged
content to users in some way that cuts costs or
improves revenues.
Putting metadata and a taxonomy into operation
requires UI changes and/or backend system changes,
as well as data changes.
You need to determine those changes, and their costs,
as part of the ROI.
What are the typical metadata ROI scenarios?
Catalog site
Increased sales.
Increased productivity.
Customer support
Cutting costs.
Increased sales.
Compliance
Avoiding penalties.
Knowledge worker productivity
Less time searching, more time working.
Metadata ROI: Catalog site
Guided Navigation
2-3 clicks to product
No dead ends
http://www.tesco.com/winestore
Metadata ROI: Catalog site
Increased sales
Product findability.
Product cross-sells and up-sells.
Customer loyalty.
Enterprise portal cost: $6M
1-5% increase in sales
$57.6B sales (’04): $600M to $2B/year
$2.1B net income (’04): $21M to $105M/year
1-5% increase in productivity: $155M to $776M/year
$50K average cost per employee
310,400 employees (’04)
Source: Proforma based on Hoover’s data.
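The pro forma ranges can be reproduced with a few lines of arithmetic. The inputs are the slide's figures; the `gain_range` helper is a name chosen for this sketch.

```python
def gain_range(base, low_pct, high_pct):
    """Return (low, high) annual gain for a percentage range applied to `base`."""
    return base * low_pct / 100, base * high_pct / 100


SALES = 57.6e9              # $57.6B sales ('04)
NET_INCOME = 2.1e9          # $2.1B net income ('04)
EMPLOYEES = 310_400         # employees ('04)
COST_PER_EMPLOYEE = 50_000  # $50K average cost per employee

sales_gain = gain_range(SALES, 1, 5)
income_gain = gain_range(NET_INCOME, 1, 5)
productivity_gain = gain_range(EMPLOYEES * COST_PER_EMPLOYEE, 1, 5)

for label, (low, high) in [
    ("sales", sales_gain),
    ("net income", income_gain),
    ("productivity", productivity_gain),
]:
    print(f"{label}: ${low / 1e6:,.0f}M to ${high / 1e6:,.0f}M per year")
```

The net income and productivity ranges match the slide. For sales, 5% of $57.6B works out to about $2.9B, so the slide's $2B upper figure appears to be rounded down or truncated.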
Metadata ROI: Customer support model
Help on search page, not a click away.
Type and go to search for specific policies
Policy categories for browsing
Refine search offered with results
Good search results for policy topics, e.g., “pets”
Metadata ROI: Customer support model
Self service
Fewer customer calls.
Faster, more accurate CSR responses through better information access.
Manual processing: $800K
100,000 documents
2 pages per document
$4 per page
25-50% service efficiency increase: $5.4M to $10.8M/yr
300K customer service calls per month
$6 cost per call
1-5% increased sales
$18.6B sales (’04): $186M to $930M/year
($761M) net income (’04): ($575M) to $169M/year
Source: Proforma based on Hoover’s data.
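The customer support figures follow from the same kind of arithmetic. The inputs are the slide's; the helper name is hypothetical, and the net income line assumes, as the slide's numbers imply, that every added sales dollar goes straight to net income.

```python
def annual_call_savings(calls_per_month, cost_per_call, efficiency_pct):
    """Yearly saving if `efficiency_pct` percent of call cost is avoided."""
    return calls_per_month * 12 * cost_per_call * efficiency_pct / 100


# Cost side: manual processing of the document set.
manual_tagging_cost = 100_000 * 2 * 4  # documents x pages per document x $ per page

# Benefit side: 25-50% service efficiency increase on 300K calls a month at $6.
savings = (annual_call_savings(300_000, 6, 25), annual_call_savings(300_000, 6, 50))

# 1-5% increased sales on $18.6B, added to a net loss of $761M.
SALES = 18.6e9
NET_INCOME = -761e6
sales_gain = (SALES * 0.01, SALES * 0.05)
net_after = (NET_INCOME + sales_gain[0], NET_INCOME + sales_gain[1])

print(manual_tagging_cost, savings, net_after)
```

Set against the $800K tagging cost, even the low-end call saving of $5.4M a year covers the manual processing several times over.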
Metadata ROI: Compliance
Avoiding penalties for
breaching regulations
SOX: up to 5 years in jail
SOX: up to $5M
Following required
procedures
Loss of company
$100B revenue (’00) $100B
Loss of partner companies
Arthur Andersen
Source: Proforma based on Hoover’s data.
Knowledge workers spend up to 2.5 hours each day looking for information …
[Pie chart segments: Communicating, Searching, Creating]
… But find what they are looking for only 40% of the time.
— Kit Sims Taylor
High cost of not finding information
“The amount of time wasted in futile searching for vital
information is enormous, leading to staggering costs …”
— Sue Feldman
High cost of poor classification
Poor classification costs a 10,000 user organization
$10M each year—about $1,000 per employee.
— Jakob Nielsen, useit.com
But “better search” itself is a weak ROI
Knowledge workers spend more time re-creating existing content than creating new content
[Pie chart segments: Communicating; Searching; Creating new content 9%; Recreating existing content 26%]
— Kit Sims Taylor
Metadata ROI: Productivity
Decreased cost to market
Decreased development cost
Increased R&D productivity
Reduced time for sales & marketing
Enterprise document management system cost: $10M
1-5% decrease in drug development cost
$800M/drug: $8M to $16M/drug
5-10% increase in R&D productivity
13% of revenue
$39B in sales (’04): $254M to $507M/year
10-20% decrease in time for sales & marketing
13% of revenue: $254M to $507M/year
Source: Proforma based on Hoover’s data.
Metadata ROI: Executive Mandate
There is no ROI out of the box
Just someone with a vision
…and the budget to make it happen.
What’s really needed?
Demos and proofs of value.
So that a stronger cost benefit argument can be made for
continuing the work
Productivity, loyalty, and revenue have provided the
ROI
Intranet has provided the best ROI
[Chart categories:]
Intranet
Web/online customer sales
Web dev infrastructure
Web/online business sales
Middleware to link Web to ERP
Extranet/supply chain
e-billing/payment systems
Wireless Web access
e-marketplace/portal
None
Taxonomy Strategies LLC
Contact Info
Ron Daniel
925-368-8371
rdaniel@taxonomystrategies.com
Joseph Busch
415-377-7912
jbusch@taxonomystrategies.com