Database Researchers:
Plumbers or Thinkers?
Gerhard Weikum
Max Planck Institute for Informatics
http://www.mpi-inf.mpg.de/~weikum/
Acknowledgements
Personal Motivation
ACM SIGMOD Record Interview:
Gerhard Weikum Speaks Out
on Why We Should Go for the Grand Challenges,
Why SQL Is Too Powerful, the Myth of Precision,
How to Have a Big Research Group in Germany, and More
Q: Someone suggested that …
you may be being seduced by the
dark side and end up doing AI.
Are you heading in this direction?
A: Mike Stonebraker used to say "This problem is AI-complete",
meaning the problem was beyond any hope of solution, was science fiction,
was on the same level as "Scotty, beam me up!".
Now, as I grow older, I think this attitude is wrong.
7. World Memex: Build a system that given a text corpus,
can answer questions about the text and summarize the text
as precisely and quickly as a human expert in that field.
Do the same for music, images, art, and cinema.
This is a demanding task.
It is probably AI Complete, but is an excellent goal …
(Jim Gray: What's Next? A Dozen Information-Technology Research Goals, 1999)
Take-Home Message
Timeline: 1970, 1980, 1990, 2000, 2010, 2020
• B-tree code → parallel / distr. / manycore / megacore B-tree code
• index mgt. → multidim. / parallel / distr. / cloud / supercloud index mgt.
• trans. proc. → query optim. → auto-admin → custom stores & engines → scalable real-time analytics → automatic data integration
• universal relation model → parallel DB sys. → knowledge base → Semantic Web → Deep QA → Turing test
Cool Problem: Semantic Queries on Web
www.google.com/squared/
Cool Problem: Deep QA in NL
William Wilkinson's "An Account of the
Principalities of Wallachia and Moldavia"
inspired this author's most famous novel
This town is known as "Sin City" & its
downtown is "Glitter Gulch"
As of 2010, this is the only
former Yugoslav republic in the EU
99 cents got me a 4-pack of Ytterlig
coasters from this Swedish chain
question classification & decomposition
knowledge back-ends (YAGO)
D. Ferrucci et al.: Building Watson: An Overview of the
DeepQA Project. AI Magazine, Fall 2010.
www.ibm.com/innovation/us/watson/index.htm
Cool Problem: Machine Reading
It’s about the disappearance forty years ago of Harriet Vanger, a young
scion of one of the wealthiest families in Sweden, and about her uncle,
determined to know the truth about what he believes was her murder.
Blomkvist visits Henrik Vanger at his estate on the tiny island of Hedeby.
The old man draws Blomkvist in by promising solid evidence against Wennerström.
Blomkvist agrees to spend a year writing the Vanger family history as a cover for the real
assignment: the disappearance of Vanger's niece Harriet some 40 years earlier. Hedeby is
home to several generations of Vangers, all part owners in Vanger Enterprises. Blomkvist
becomes acquainted with the members of the extended Vanger family, most of whom resent
his presence. He does, however, start a short lived affair with Cecilia, the niece of Henrik.
After discovering that Salander has hacked into his computer, he persuades her to assist
him with research. They eventually become lovers, but Blomkvist has trouble getting close
to Lisbeth who treats virtually everyone she meets with hostility. Ultimately the two
discover that Harriet's brother Martin, CEO of Vanger Industries, is secretly a serial killer.
A 24-year-old computer hacker sporting an assortment of tattoos and body piercings
supports herself by doing deep background investigations for Dragan Armansky, who, in
turn, worries that Lisbeth Salander is “the perfect victim for anyone who wished her ill."
Relation labels annotated on the text: same, owns, uncleOf, hires, enemyOf, affairWith, headOf
O. Etzioni, M. Banko, M.J. Cafarella: Machine Reading, AAAI '06
T. Mitchell et al.: Populating the Semantic Web by Macro-Reading Internet Text, ISWC’09
Outline
Intro: Motivation & Cool Problems
From Data Mining to Knowledge Harvesting
From Snapshots to Eternity
From Record Linkage to NL Disambiguation
Wrap-up
...
Goal: Turn Web into Knowledge Base
Source:
DB & IR methods for
knowledge discovery.
Communications of
the ACM 52(4), 2009
comprehensive DB of human knowledge
• everything that Wikipedia knows
• everything machine-readable
• capturing entities, classes, relationships
Approach: Harvesting Facts from Web
YAGO2: 10 Mio. entities, 500 000 classes,
300 Mio. facts for 100 relations,
100 languages, 95% accuracy
Politician | Political Party
Angela Merkel | CDU
Karl-Theodor zu Guttenberg | CDU
Christoph Hartmann | FDP
…
PoliticalParty | Spokesperson
CDU | Philipp Wachholz
Die Grünen | Claudia Roth
…
Politician | Position
Angela Merkel | Chancellor Germany
Karl-Theodor zu Guttenberg | Minister of Defense Germany
Christoph Hartmann | Minister of Economy Saarland
…
Company | AcquiredCompany
Google | YouTube
Yahoo | Overture
Facebook | FriendFeed
Software AG | IDS Scheer
…
Company | CEO
Google | Eric Schmidt
…
Movie | ReportedRevenue
Avatar | $ 2,718,444,933
The Reader | $ 108,709,522
…
Actor | Award
Christoph Waltz | Oscar
Sandra Bullock | Oscar
Sandra Bullock | Golden Raspberry
…
Projects: SUMO, YAGO-NAGA, IWP, Cyc, TextRunner, WikiTax2WordNet, ReadTheWeb
Knowledge in a KB
• facts / assertions: bornIn (GretaGarbo, Stockholm),
hasWon (GretaGarbo, AcademyAward),
playedRole (GretaGarbo, MataHari), livedIn (GretaGarbo, Klosters), …
• taxonomic: instanceOf (GretaGarbo, actress),
subclassOf (actress, artist), …
• lexical / terminology: means ("Big Apple", NewYorkCity),
means ("Apple", AppleComputerCorporation),
means ("MS", Microsoft), means ("MS", MultipleSclerosis) …
• common-sense properties:
apples are green, red, juicy, sweet, sour … - but not fast, smart …
balls are round, smooth, slippery … - but not square, funny …
• common-sense axioms:
∀x: human(x) ⇒ male(x) ∨ female(x)
∀x: (male(x) ⇒ ¬female(x)) ∧ (female(x) ⇒ ¬male(x))
∀x: animal(x) ⇒ (hasLegs(x) ⇒ isEven(numberOfLegs(x))) …
• procedural: how to fix/install/prepare/remove …
• epistemic / beliefs: believes (Ptolemy, shape(Earth, disc)),
believes (Copernicus, shape(Earth, sphere)) …
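The factual and taxonomic knowledge listed above can be pictured as a set of relation triples. The following is only a minimal sketch: the triples come from the slide, while the function names objects and classes_of are hypothetical and not part of YAGO or any other system.

```python
# Toy fact store. Relation and entity names are taken from the slide;
# the helper functions are illustrative assumptions.
FACTS = {
    ("bornIn", "GretaGarbo", "Stockholm"),
    ("hasWon", "GretaGarbo", "AcademyAward"),
    ("instanceOf", "GretaGarbo", "actress"),
    ("subclassOf", "actress", "artist"),
    ("means", "MS", "Microsoft"),
    ("means", "MS", "MultipleSclerosis"),
}

def objects(relation, subject, facts=FACTS):
    """All o such that relation(subject, o) is in the fact set."""
    return {o for (r, s, o) in facts if r == relation and s == subject}

def classes_of(entity, facts=FACTS):
    """Direct classes of an entity plus all their superclasses."""
    found = set()
    frontier = list(objects("instanceOf", entity, facts))
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            # follow subclassOf edges upwards in the taxonomy
            frontier.extend(objects("subclassOf", cls, facts))
    return found

print(classes_of("GretaGarbo"))
print(objects("means", "MS"))
```

The two means facts for "MS" show why lexical knowledge alone leaves names ambiguous.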
...
Knowledge for Intelligence
• entity recognition & disambiguation
• understanding natural language & speech
• knowledge services & reasoning for semantic apps
(e.g. deep QA)
• semantic search: precise answers to advanced queries
(by scientists, students, journalists, analysts, etc.)
Swedish king's wife when Greta Garbo died?
FIFA 2010 finalists who played in a Champions League final?
Politicians who are also scientists?
Relationships between
Max Planck, Angela Merkel, Jim Gray, and the Dalai Lama?
Enzymes that inhibit HIV?
Influenza drugs for teens with high blood pressure?
...
KB Growth, Dynamics, Life-Cycle
Great mileage from semistructured data:
(infoboxes, category systems, tables, lists, etc.)
YAGO, DBpedia, Freebase, True Knowledge, etc.:
billions of facts about tens of millions of entities,
for thousands of relations
But: "To know that we know what we know, and
that we do not know what we do not know,
that is true knowledge."
Confucius,
551-479 BC
Most new & interesting facts/statements are in:
news, blogs, forums, tabloids,
essays, books, scientific papers, …
knowledge harvesting from natural-language text!
KB needs continuous updates & long-term mgt.!
French Marriage Problem
facts in KB:
married (Hillary, Bill)
married (Carla, Nicolas)
married (Angelina, Brad)
new facts or fact candidates:
married (Cecilia, Nicolas)
married (Carla, Benjamin)
married (Carla, Mick)
married (Michelle, Barack)
married (Yoko, John)
married (Kate, Leonardo)
married (Carla, Sofie)
married (Larry, Google)
1) for recall: pattern-based harvesting
2) for precision: consistency reasoning
Pattern-Based Harvesting
(Hearst 92, Brin 98, Agichtein 00, Etzioni 04, …)
Facts & Fact Candidates:
(Hillary, Bill), (Carla, Nicolas), (Angelina, Brad), (Victoria, David),
(Yoko, John), (Kate, Pete), (Carla, Benjamin), (Larry, Google), …
Patterns:
X and her husband Y
X and Y on their honeymoon
X and Y and their children
X has been dating with Y
X loves Y
• good for recall
• noisy, drifting
• not robust enough for high precision
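The fact-pattern bootstrapping on this slide can be sketched in a few lines. This is a toy illustration under stated assumptions: the corpus sentences are invented for the example, names are assumed to be single capitalized words, and induce_patterns and apply_patterns are hypothetical helper names, not the method of any of the cited papers.

```python
import re

SEEDS = {("Hillary", "Bill"), ("Carla", "Nicolas")}
# Invented example sentences (assumption, not from a real corpus).
CORPUS = [
    "Hillary and her husband Bill attended the dinner.",
    "Carla and her husband Nicolas flew to Rome.",
    "Angelina and her husband Brad adopted a child.",
    "Larry loves Google.",
    "Carla loves Nicolas.",
]
NAME = r"([A-Z][a-z]+)"  # simplification: a name is one capitalized word

def induce_patterns(seeds, corpus):
    """Collect the text between X and Y for every sentence containing a seed pair."""
    patterns = set()
    for x, y in seeds:
        for sentence in corpus:
            m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sentence)
            if m:
                patterns.add(m.group(1))
    return patterns

def apply_patterns(patterns, corpus):
    """Match 'X <infix> Y' for every induced infix and return all (X, Y) pairs."""
    candidates = set()
    for infix in patterns:
        regex = re.compile(NAME + re.escape(infix) + NAME)
        for sentence in corpus:
            candidates.update(regex.findall(sentence))
    return candidates

patterns = induce_patterns(SEEDS, CORPUS)
new_candidates = apply_patterns(patterns, CORPUS) - SEEDS
print(patterns)
print(new_candidates)
```

On this toy corpus the induced "loves" pattern also yields (Larry, Google), which illustrates the noise and drift named in the bullets.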
Reasoning about Fact Candidates
Use consistency constraints to prune false candidates
FOL rules (restricted):
spouse(x,y) ∧ diff(y,z) ⇒ ¬spouse(x,z)
spouse(x,y) ∧ diff(w,x) ⇒ ¬spouse(w,y)
spouse(x,y) ⇒ f(x)
spouse(x,y) ⇒ m(y)
spouse(x,y) ⇒ (f(x) ∧ m(y)) ∨ (m(x) ∧ f(y))
ground atoms:
spouse(Hillary,Bill), spouse(Carla,Nicolas), spouse(Cecilia,Nicolas),
spouse(Carla,Ben), spouse(Carla,Mick), spouse(Carla,Sofie),
f(Hillary), f(Carla), f(Cecilia), f(Sofie),
m(Bill), m(Nicolas), m(Ben), m(Mick)
Rules reveal inconsistencies
Find consistent subset(s) of atoms
("possible world(s)", "the truth")
Rules can be weighted
(e.g. by fraction of ground atoms that satisfy a rule)
→ uncertain / probabilistic data
→ compute prob. distr. of subset of atoms being the truth
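To make "find consistent subset(s) of atoms" concrete, here is a brute-force sketch over the spouse candidates from this slide. It assumes the two functional rules and the rule spouse(x,y) ⇒ f(x) ∧ m(y); the function names are hypothetical, and exhaustive enumeration is only feasible for a handful of atoms.

```python
from itertools import combinations

FEMALE = {"Hillary", "Carla", "Cecilia", "Sofie"}
MALE = {"Bill", "Nicolas", "Ben", "Mick"}
ATOMS = [("Hillary", "Bill"), ("Carla", "Nicolas"), ("Cecilia", "Nicolas"),
         ("Carla", "Ben"), ("Carla", "Mick"), ("Carla", "Sofie")]

def violates_alone(atom):
    """Checks spouse(x,y) => f(x) and m(y) for a single atom."""
    x, y = atom
    return not (x in FEMALE and y in MALE)

def conflict(a, b):
    """spouse is functional in both arguments: a shared x or a shared y clashes."""
    return a != b and (a[0] == b[0] or a[1] == b[1])

def consistent(world):
    return (not any(violates_alone(a) for a in world)
            and not any(conflict(a, b) for a, b in combinations(world, 2)))

def maximal_worlds(atoms):
    """All consistent subsets that cannot be extended by a further atom."""
    worlds = [set(c) for n in range(len(atoms) + 1)
              for c in combinations(atoms, n) if consistent(c)]
    return [w for w in worlds if not any(w < other for other in worlds)]

for world in maximal_worlds(ATOMS):
    print(sorted(world))
```

Several maximal worlds remain, so the rules alone do not single out "the truth"; this is where weights come in.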
Markov Logic Networks (MLN's)
(M. Richardson / P. Domingos 2006)
Map logical constraints & fact candidates
into probabilistic graph model: Markov Random Field (MRF)
Constraints:
s(x,y) ∧ diff(y,z) ⇒ ¬s(x,z)
s(x,y) ∧ diff(w,x) ⇒ ¬s(w,y)
s(x,y) ⇒ f(x)
s(x,y) ⇒ m(y)
f(x) ⇒ ¬m(x)
m(x) ⇒ ¬f(x)
Fact candidates: s(Carla,Nicolas), s(Cecilia,Nicolas), s(Carla,Ben), s(Carla,Sofie), …
Grounding: Literal → Boolean Var
Literal → binary RV
Ground clauses:
s(Ca,Nic) ⇒ ¬s(Ce,Nic)
s(Ca,Nic) ⇒ ¬s(Ca,Ben)
s(Ca,Nic) ⇒ ¬s(Ca,So)
s(Ca,Ben) ⇒ ¬s(Ca,So)
s(Ca,Nic) ⇒ m(Nic)
s(Ce,Nic) ⇒ m(Nic)
s(Ca,Ben) ⇒ m(Ben)
s(Ca,So) ⇒ m(So)
Markov Logic Networks (MLN's)
(M. Richardson / P. Domingos 2006)
MRF nodes (ground literals as RVs): s(Ca,Nic), s(Ce,Nic), s(Ca,Ben), s(Ca,So), m(Nic), m(Ben), m(So), …
RVs coupled by MRF edge if they appear in same clause
MRF assumption: P[Xi|X1..Xn]=P[Xi|N(Xi)]
joint distribution has product form over all cliques
Variety of algorithms for joint inference:
Gibbs sampling, other MCMC, belief propagation, randomized MaxSat, …
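For a handful of ground atoms, the MLN joint distribution can be computed exactly by enumerating all possible worlds, which helps to see what the sampling algorithms approximate. The sketch below assumes the standard MLN form P(world) ∝ exp(sum of weights of satisfied ground clauses); the clause weights are invented for illustration and do not come from the slides.

```python
from itertools import product
from math import exp

VARS = ["s(Ca,Nic)", "s(Ce,Nic)", "s(Ca,Ben)"]
# (weight, clause); a clause is a disjunction of (variable, required truth value).
# All weights are assumed values for illustration.
CLAUSES = [
    (2.0, [("s(Ca,Nic)", False), ("s(Ce,Nic)", False)]),  # not both wives of Nic
    (2.0, [("s(Ca,Nic)", False), ("s(Ca,Ben)", False)]),  # Ca not married to both
    (1.5, [("s(Ca,Nic)", True)]),                         # evidence for each candidate
    (0.5, [("s(Ce,Nic)", True)]),
    (0.5, [("s(Ca,Ben)", True)]),
]

def score(world):
    """Sum of weights of the ground clauses satisfied in this world."""
    return sum(w for w, clause in CLAUSES
               if any(world[v] == val for v, val in clause))

def marginals():
    """Exact marginal P[atom = true] by summing over all 2^n worlds."""
    worlds = [dict(zip(VARS, bits))
              for bits in product([False, True], repeat=len(VARS))]
    weights = [exp(score(w)) for w in worlds]
    z = sum(weights)  # partition function
    return {v: sum(wt for w, wt in zip(worlds, weights) if w[v]) / z
            for v in VARS}

for atom, p in marginals().items():
    print(atom, round(p, 3))
```

Enumeration is exponential in the number of atoms, which is why the slide lists Gibbs sampling, belief propagation, and MaxSat-style methods for realistic sizes.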
Reasoning for KB Growth: Direct Route
(F. Suchanek et al.: WWW'09)
facts in KB:
married (Hillary, Bill)
married (Carla, Nicolas)
married (Angelina, Brad)
+ new fact candidates:
married (Cecilia, Nicolas)
married (Carla, Benjamin)
married (Carla, Mick)
married (Carla, Sofie)
married (Larry, Google)
patterns:
X and her husband Y
X and Y and their children
X has been dating with Y
X loves Y
?
Direct approach:
1. facts are true; fact candidates & patterns → hypotheses
grounded constraints → clauses with hypotheses as vars
2. type signatures of relations greatly reduce #clauses
3. cast into Weighted Max-Sat with weights from pattern stats
→ customized approximation algorithm
unifies: fact cand consistency, pattern goodness, entity disambig.
www.mpi-inf.mpg.de/yago-naga/sofie/
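One plausible reading of "weights from pattern stats" is to score a pattern by how often its occurrences coincide with known facts, and to pass that score on to the fact candidates it produces. The sketch below is an assumption for illustration only: it is not the weighting actually used in SOFIE, and the occurrence list and function names are invented.

```python
KNOWN = {("Hillary", "Bill"), ("Carla", "Nicolas"), ("Angelina", "Brad")}
# (pattern, x, y) occurrences, as a hypothetical extractor might report them.
OCCURS = [
    ("X and her husband Y", "Hillary", "Bill"),
    ("X and her husband Y", "Carla", "Nicolas"),
    ("X and her husband Y", "Cecilia", "Nicolas"),
    ("X loves Y", "Carla", "Nicolas"),
    ("X loves Y", "Larry", "Google"),
    ("X loves Y", "Carla", "Mick"),
    ("X loves Y", "Kate", "Pete"),
]

def pattern_weights(occurs, known):
    """Fraction of a pattern's occurrences that match a known fact."""
    total, hits = {}, {}
    for p, x, y in occurs:
        total[p] = total.get(p, 0) + 1
        hits[p] = hits.get(p, 0) + ((x, y) in known)
    return {p: hits[p] / total[p] for p in total}

def candidate_weights(occurs, known):
    """Weight of a new candidate: sum of the weights of the patterns it occurs with."""
    pw = pattern_weights(occurs, known)
    cw = {}
    for p, x, y in occurs:
        if (x, y) not in known:
            cw[(x, y)] = cw.get((x, y), 0.0) + pw[p]
    return cw

print(pattern_weights(OCCURS, KNOWN))
print(candidate_weights(OCCURS, KNOWN))
```

Such candidate weights could then serve as clause weights in the Weighted Max-Sat instance, with the consistency constraints as additional clauses.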
Facts & Patterns Consistency with SOFIE
(F. Suchanek et al.: WWW’09)
constraints to connect facts, fact candidates, patterns
pattern-fact duality:
occurs(p,x,y) ∧ expresses(p,R) ∧ type(x)=dom(R) ∧ type(y)=rng(R) ⇒ R(x,y)
occurs(p,x,y) ∧ R(x,y) ∧ type(x)=dom(R) ∧ type(y)=rng(R) ⇒ expresses(p,R)
name(-in-context)-to-entity mapping:
¬ means(n,e1) ∨ ¬ means(n,e2) ∨ …
functional dependencies:
spouse(X,Y): X → Y, Y → X
relation properties:
asymmetry, transitivity, acyclicity, …
type constraints, inclusion dependencies:
spouse ⊆ Person × Person
capitalOfCountry ⊆ cityOfCountry
domain-specific constraints:
bornInYear(x) + 10years ≤ graduatedInYear(x)
hasAdvisor(x,y) ∧ graduatedInYear(x,t) ∧ graduatedInYear(y,s) ⇒ s < t
…
>95% accuracy, >95% coverage, in one night
1) recall: gather temporal scopes for base facts
2) precision: reason on mutual consistency
consistency constraints are potentially helpful:
• functional dependencies: husband, time → wife
• inclusion dependencies: marriedPerson ⊆ adultPerson
• age/time/gender restrictions: birthdate + Δ < marriage < divorce
Difficult Dating
(Even More Difficult) Implicit Dating
explicit dates vs.
implicit dates relative to other dates
(Even More Difficult) Relative Dating
vague dates
relative dates
narrative text
relative order
Framework for T-Fact Extraction
(M. Theobald et al.: MUD’10, Y. Wang et al.: EDBT’10)
1) represent temporal scopes of facts
in the presence of incompleteness and uncertainty
2) gather & filter candidates for t-facts:
extract base facts R(e1, e2) first; then
focus on sentences with e1, e2 and date or temporal phrase
3) aggregate & reconcile evidence from observations
4) reason on joint constraints about facts and time scopes
Joint Reasoning on Facts and T-Facts
(M. Dylla et al.: BTW’11)
Combine & reconcile t-scopes across different facts
constraint:
marriedTo (m) is an injective function at any given point
∀ X, Y, Z, T1, T2:
m(X,Y) ∧ m(X,Z) ∧ validTime(m(X,Y),T1) ∧ validTime(m(X,Z),T2)
⇒ ¬overlaps(T1, T2)
after grounding:
m(Carla, Nicolas) ∧ m(Cecilia, Nicolas) ⇒ ¬overlaps ([2008,2010], [1996,2007])
overlaps is false: m(Ca,Nic) and m(Ce,Nic) are compatible
m(Carla, Nicolas) ∧ m(Carla, Benjamin) ⇒ ¬overlaps ([2008,2010], [2009,2011])
overlaps is true: m(Ca,Nic) and m(Ca,Ben) are in conflict
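The grounded temporal constraint can be checked mechanically. The sketch below assumes closed year intervals [begin, end] and represents a t-fact as a (person, person, interval) tuple; the function names overlaps and violates are illustrative.

```python
def overlaps(t1, t2):
    """Closed intervals overlap iff neither one ends before the other begins."""
    return t1[0] <= t2[1] and t2[0] <= t1[1]

def violates(tfact_a, tfact_b):
    """Two marriage t-facts clash if they share a partner and overlap in time."""
    (x1, y1, t1), (x2, y2, t2) = tfact_a, tfact_b
    same_pair = (x1, y1) == (x2, y2)
    shares_partner = x1 == x2 or y1 == y2
    return (not same_pair) and shares_partner and overlaps(t1, t2)

ca_nic = ("Carla", "Nicolas", (2008, 2010))
ce_nic = ("Cecilia", "Nicolas", (1996, 2007))
ca_ben = ("Carla", "Benjamin", (2009, 2011))

print(violates(ca_nic, ce_nic))  # intervals are disjoint
print(violates(ca_nic, ca_ben))  # intervals share 2009 and 2010
```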
Joint Reasoning on Facts and T-Facts
T-facts on the time axis: m(Ca, Mi), m(Ca, Ben), m(Ca, Nic), m(Ce, Nic), m(Ce, Mi)
Conflict graph nodes:
m(Ca, Mi) [2004,2008]
m(Ca, Ben) [2009,2011]
m(Ce, Nic) [1996,2007]
m(Ca, Nic) [2008,2010]
m(Ce, Mi) [1998,2005]
Find maximal independent set:
subset of nodes w/o adjacent pairs
with (evidence-)weighted nodes
Joint Reasoning on Facts and T-Facts
T-facts on the time axis: m(Ca, Mi), m(Ca, Ben), m(Ca, Nic), m(Ce, Nic), m(Ce, Mi)
Conflict graph nodes with evidence weights:
m(Ca, Mi) [2004,2008], weight 30
m(Ca, Ben) [2009,2011], weight 10
m(Ce, Nic) [1996,2007], weight 100
m(Ca, Nic) [2008,2010], weight 80
m(Ce, Mi) [1998,2005], weight 20
Find maximal independent set:
subset of nodes w/o adjacent pairs
with (evidence-)weighted nodes
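The conflict-graph view can be sketched by deriving edges from the time scopes and then searching for the heaviest independent set. Assumptions: the pairing of weights to nodes follows one reading of the slide, intervals are closed, and the search is exhaustive, which only works for tiny graphs (the general problem is NP-hard). Function names are hypothetical.

```python
from itertools import combinations

# name -> (person, person, closed interval, evidence weight)
TFACTS = {
    "m(Ca,Mi)":  ("Ca", "Mi",  (2004, 2008), 30),
    "m(Ca,Ben)": ("Ca", "Ben", (2009, 2011), 10),
    "m(Ce,Nic)": ("Ce", "Nic", (1996, 2007), 100),
    "m(Ca,Nic)": ("Ca", "Nic", (2008, 2010), 80),
    "m(Ce,Mi)":  ("Ce", "Mi",  (1998, 2005), 20),
}

def conflict_edges(tfacts):
    """Edge between two t-facts that share a partner and overlap in time."""
    edges = set()
    for a, b in combinations(tfacts, 2):
        xa, ya, ta, _ = tfacts[a]
        xb, yb, tb, _ = tfacts[b]
        if (xa == xb or ya == yb) and ta[0] <= tb[1] and tb[0] <= ta[1]:
            edges.add(frozenset((a, b)))
    return edges

def best_independent_set(tfacts):
    """Exhaustive search for the maximum-weight set without adjacent nodes."""
    edges = conflict_edges(tfacts)
    names = list(tfacts)
    best, best_weight = set(), 0
    for n in range(1, len(names) + 1):
        for subset in combinations(names, n):
            if any(frozenset(p) in edges for p in combinations(subset, 2)):
                continue  # subset contains a conflicting pair
            weight = sum(tfacts[f][3] for f in subset)
            if weight > best_weight:
                best, best_weight = set(subset), weight
    return best, best_weight

print(best_independent_set(TFACTS))
```

Under these assumptions the search keeps m(Ca,Nic) and m(Ce,Nic), the two best-supported marriages whose time scopes do not clash.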
Joint Reasoning on Facts and T-Facts
alternative approach:
split t-scopes and reason on
consistency of t-fact partitions
m(Ca, Mi)
m(Ca, Ben)
m(Ca, Nic)
m(Ce, Nic)
m(Ce, Mi)
time
Outline
Intro: Motivation & Cool Problems
From Data Mining to Knowledge Harvesting
From Snapshots to Eternity
From Record Linkage to NL Disambiguation
Wrap-up
Record Linkage (Entity Resolution)
record 1 | record 2 | record 3 | … | record N
Susan B. Davidson | O.P. Buneman | P. Baumann | … | Y. Davidson
Peter Buneman | S. Davison | S. Davidson | … | Sean Penn
Yi Chen | Y. Chen | Cheng Y. | … | S. Chen
University of Pennsylvania | U Penn | Penn State | … | Penn Station
Issues in … | Issues in … | Issues in … | … | Issues in …
Int. Conf. on Very Large Data Bases | VLDB Conf. | PVLDB | … | XLDB Conference
Find equivalence classes of entities, and records, based on:
• similarity of values (edit distance, n-gram overlap, etc.)
• joint agreement of linkage
similarity joins, grouping/clustering, collective learning, etc.
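The two value-similarity measures named above can be sketched as follows: Levenshtein edit distance via the standard dynamic program, and Jaccard overlap of character trigrams. The function names are illustrative, and real record linkage would combine such scores across fields and records.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]

def ngrams(s, n=3):
    """Set of lower-cased character n-grams of s."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_overlap(a, b, n=3):
    """Jaccard overlap of the character n-gram sets of a and b."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

print(edit_distance("Davidson", "Davison"))
print(ngram_overlap("S. Davidson", "S. Davison"))
print(ngram_overlap("S. Davidson", "Sean Penn"))
```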
Halbert L. Dunn: Record Linkage. American Journal of Public Health. 1946
H.B. Newcombe et al.: Automatic Linkage of Vital Records. Science, 1959.
Linked Data: Record Linkage at Web Scale
Source: Christian Bizer, Tom Heath, Tim Berners-Lee, Michael Hausenblas,
WWW 2010 Workshop on Linked Data on the Web
linkeddata.org
Linked Data: Record Linkage at Web Scale
yago/wordnet:Artist 109812338
yago/wordnet:Movie 106613686
yago/wikicategory:SwedishFilmDirectors
imdb.com/title/tt0050986/
dbpedia.org/resource/Ingmar_Bergman
dbpedia.org/resource/Woody_Allen
dbpedia.org/resource/David_Lynch
dbpedia.org/resource/Uppsala
rdf.freebase.com/ns/Uppsala
data.nytimes.com/lynch_david_per
data.nytimes.com/uppsala_sweden_geo
quotationsbook.com/author/4561
sws.geonames.org/2666199/
N 59° 51' 30'' E 17° 38' 43''
?
need referential data quality for Linked Data:
automatic & dynamic!
Named-Entity Disambiguation in Text
Harry fought with you know who. He defeats the dark lord.
candidate entities: Dirty Harry, Harry Potter, Prince Harry of England, The Who (band), Lord Voldemort
Three NLP tasks:
1) named-entity detection: segment & label by HMM or CRF
(e.g. Stanford NER tagger)
2) co-reference resolution: link to preceding NP
(trained classifier over linguistic features)
3) named-entity disambiguation:
map each mention (name) to canonical entity (entry in KB)
Mentions, Meanings, Mappings
Text: Agnetha, Björn, Benny, and Anni-Frid were Sweden's most successful pop music group. Their greatest hits were Waterloo and Mamma Mia.
Candidate entities: Agnetha Qvarnström, Agnetha Fältskog, Benny Goodman, Benny Andersson, Battle of Waterloo, Waterloo Station, Waterloo (song)
KB:
Agnetha means Agnetha Fältskog
Agnetha means Agnetha Munther
Agnetha means Agnetha Qvarnström
Björn means Björn Borg
Björn means Björn Ulvaeus
Björn means Björn the Viking
Benny means Benny Goodman
Benny means Benny Andersson
Waterloo means Battle of Waterloo
Waterloo means Waterloo (Ontario)
Waterloo means Waterloo Station
Waterloo means Waterloo (song)
Mention-Entity Graph
weighted undirected graph with two types of nodes
Mentions (in text): Agnetha, Björn, Benny, and Anni-Frid were Sweden's most successful pop music group. Their greatest hits were Waterloo and Mamma Mia.
Candidate entities: Agnetha Q., Agnetha F., Benny G., Benny A., B. Waterloo, Waterloo St., Waterloo (s)
Mention-entity edge weights (m,e), from KB+Stats:
• Popularity: freq(m,e|m), length(e), #links(e)
• Similarity: cos/Dice/KL (context(m), context(e))
Mention-Entity Graph
weighted undirected graph with two types of nodes
Entity-entity edge weights (e,e'), from KB+Stats:
• Coherence: dist(types), overlap(links), overlap(anchor words)
Mention-Entity Graph
weighted undirected graph with two types of nodes
Coherence (e,e') via dist(types); types of candidate entities:
Agnetha F.: Swedish female singers, people from Jönköping, singers, musicians
Benny A.: Swedish songwriters, people from Stockholm, composers, musicians
Waterloo (s): ABBA songs, #1 chart singles, songs, artifacts
Mention-Entity Graph
weighted undirected graph with two types of nodes
Coherence (e,e') via overlap(links); links of candidate entities:
Agnetha F.: http://.../wiki/ABBA, http://.../wiki/Anni-Frid_Lyngstad, http://.../wiki/Jönköping, http://.../wiki/Eurovision_Song_Con
Benny A.: http://.../wiki/ABBA, http://.../wiki/Anni-Frid_Lyngstad, http://.../wiki/Mamma_Mia!, http://.../wiki/Agnetha_Fältskog
Waterloo (s): http://.../wiki/ABBA, http://.../wiki/Eurovision_Song_Con, http://.../wiki/Mamma_Mia!
Mention-Entity Graph
weighted undirected graph with two types of nodes
Coherence (e,e') via overlap(anchor words); keyphrases of candidate entities:
Agnetha F.: pop group ABBA, best-selling music artist in history, Melodifestivalen, The Winner Takes It All
Benny A.: pop group ABBA, Grammy Award nomination, Melodifestivalen, Mamma Mia!
Waterloo (s): Agnetha Fältskog, Benny Andersson, number-one single in Norway, Mamma Mia!
Different Approaches
Combine Popularity, Similarity, and Coherence Features
(Cucerzan: EMNLP'07, Milne/Witten: CIKM'08):
• for sim (context(m), context(e)):
consider surrounding mentions
and their candidate entities
• use their types, links, anchors
as features of context(m)
• set m-e edge weights accordingly
• use greedy methods for solution
Collective Learning with Prob. Factor Graphs
(Chakrabarti et al.: KDD'09):
• model P[m|e] by similarity and P[e1|e2] by coherence
• consider likelihood of P[m1 … mk | e1 … ek]
• factorize by all m-e pairs and e1-e2 pairs
• use hill-climbing, LP, etc. for solution
Graph Algorithms with Online DB
[Figure: mention-entity graph with weighted edges]
• Build mention-entity graph and compute edge weights
from knowledge and statistics in online DB
• Compute dense subgraph (e.g., high edge weight) such that:
each m is connected to exactly one e (or at most one e)
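The objective of picking exactly one entity per mention while favouring a dense, coherent subgraph can be illustrated on the ABBA example. Everything numeric here is an assumption: the popularity priors are invented, link sets for the non-ABBA candidates are placeholders, coherence is taken to be Jaccard overlap of link sets, and exhaustive search stands in for the greedy or dense-subgraph algorithms the slides refer to.

```python
from itertools import combinations, product

# mention -> {candidate entity: popularity prior}; priors are invented.
CANDIDATES = {
    "Agnetha": {"Agnetha Fältskog": 0.9, "Agnetha Qvarnström": 0.1},
    "Benny": {"Benny Goodman": 0.6, "Benny Andersson": 0.4},
    "Waterloo": {"Battle of Waterloo": 0.7, "Waterloo (song)": 0.3},
}
# Link sets; the first three follow the slide, the rest are placeholders.
LINKS = {
    "Agnetha Fältskog": {"ABBA", "Anni-Frid_Lyngstad", "Jönköping", "Eurovision"},
    "Benny Andersson": {"ABBA", "Anni-Frid_Lyngstad", "Mamma_Mia!", "Agnetha_Fältskog"},
    "Waterloo (song)": {"ABBA", "Eurovision", "Mamma_Mia!"},
    "Benny Goodman": {"Jazz", "Clarinet"},
    "Battle of Waterloo": {"Napoleon", "Belgium"},
    "Agnetha Qvarnström": set(),
}

def coherence(e1, e2):
    """Jaccard overlap of the link sets of two entities."""
    a, b = LINKS[e1], LINKS[e2]
    return len(a & b) / len(a | b) if a | b else 0.0

def disambiguate(candidates, use_coherence=True):
    """Try every joint assignment; score = sum of priors + pairwise coherence."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(candidates[m] for m in mentions)):
        score = sum(candidates[m][e] for m, e in zip(mentions, choice))
        if use_coherence:
            score += sum(coherence(a, b) for a, b in combinations(choice, 2))
        if score > best_score:
            best, best_score = dict(zip(mentions, choice)), score
    return best

print(disambiguate(CANDIDATES, use_coherence=False))
print(disambiguate(CANDIDATES))
```

With popularity alone, "Benny" and "Waterloo" map to the more prominent but wrong entities; adding coherence pulls all three mentions towards the ABBA-related entities.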
Online Disambiguation (Prototype)
Outline
Intro: Motivation & Cool Problems
From Data Mining to Knowledge Harvesting
From Snapshots to Eternity
From Record Linkage to NL Disambiguation
Wrap-up
...
Research Opportunities
Knowledge Harvesting from Text
• recall & precision by patterns & reasoning
• efficiency & scalability
• soft constraints, hard constraints, richer logics, …
• discovery of new relation types (open IE)
Temporal Knowledge
• capture uncertain / incomplete temporal scopes of facts
• joint reasoning on base-facts and time-scopes
• long-term life-cycle of KB maintenance
Named Entity Disambiguation in NL
• near-human accuracy, using popularity, similarity, coherence
• efficient algorithms and real-time response
• automatic sameAs for Linked Data at Web-scale
Overall Take-Home
AI-complete problems:
knowledge harvesting, semantic search,
deep QA, machine reading
• exciting times, major progress
• many data-centric sub-problems
DB community = data-centric research
• storing & managing data was yesterday
• tomorrow is: analyzing, distilling, making sense of data
(turning data into knowledge)
Tapping into natural language crucial for:
• knowledge mostly produced in news, books, papers
• smartphone UI for ad-hoc real-time QA and KDD
...
Thank You !
“The plumber and Michelangelo
used marble from the same quarry,
but what each saw in the marble
made the difference between
a sink and a brilliant sculpture.”
(Bob Kall)
“Not only is there no God, but try
finding a plumber on a weekend.”
(Woody Allen)