Treatment of “Duplicates”
in the ADL Gazetteer
Jordan Hastings & Linda Hill
Alexandria Digital Library Project
Department of Geography
University of California Santa Barbara
GIScience ’02 Conference
Boulder, Colorado
September 28, 2002
Gazetteer “Duplicates” Introduction (1)
What is a gazetteer?
“Spatial dictionary of named and typed features located in the environment.”
Traditional: Appendix in Atlas
Digital: Computer Database
Gazetteer “Duplicates” Introduction (2)
What are “duplicates”?
Features that are somehow conflicted re: names, types, or locations
One feature - many names
One name - many features
Gazetteer “Duplicates” Introduction (3)
What is the ADL Gazetteer?
http://www.alexandria.ucsb.edu/gazetteer
Key access component for digital geodata
Pilot implementation of publicly-accessible placename (feature) database & service
Fundamental GI Science research activity
Gazetteer “Duplicates” Outline of Talk
I. Tour of gazetteer-related issues in California-Nevada, esp. Lake Tahoe
II. Discussion of approach to resolving issues regarding “duplicates”
III. Demonstration of software that implements the approach
I.
California & Nevada
[map]
Source Data: ESRI ArcView 3.2
Gazetteer “Duplicates” Names & Features (4)
Lake Bigler, thru 1920s
Lake Bonpland (also Bondland), thru 1890s
Da-ow-a-ga, thru 1850s
[map labels: Tahoe Vista, Dollar Point, Kings Beach, Incline Village, Crystal Bay, Carson, Sunnyside, Tahoe City, Indian Hills, Johnson Lane, Zephyr Cove, Round Hill Village, South Lake Tahoe, Kingsbury, Stateline, Minden]
Gazetteer “Duplicates” Discussion (1) Definitions
DEF:: Feature: Humanly recognizable,
±persistent phenomenon in the environment
Each feature integrates & interrelates three different kinds of attributes (with special issues):
Location (framework & scale, accuracy)
Name (linguistics, culture)
Type (taxonomy, ontology)
DEF:: Gazetteer: Database of Features
i.e., a spatial dictionary, continually evolving
Gazetteer “Duplicates” Discussion (2) Approach
Multiple metrics of feature similarity
Geospatial
  Proximity (familiarity)
  Containment (hierarchy)
Textual
  Notation (as written)
  Diction (as spoken)
Weighted combinations of these metrics
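One way to picture a weighted combination is a normalized weighted average of per-metric scores. The sketch below is only an illustration: the function name, the score keys, and the weights are hypothetical and are not taken from the ADL software, and it assumes each metric has already been scaled to the range 0 to 1.

```python
def combined_similarity(scores, weights):
    """Weighted average of per-metric similarity scores in [0, 1].

    Both arguments are dicts keyed by metric name. A metric missing
    from `scores` is treated as 0.0 (no evidence of similarity).
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * scores.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


# Hypothetical weights; the talk notes that such weighting is subjective.
example_weights = {"proximity": 2.0, "containment": 1.0,
                   "notation": 2.0, "diction": 1.0}
example_scores = {"proximity": 0.9, "containment": 1.0,
                  "notation": 0.5, "diction": 0.8}
print(round(combined_similarity(example_scores, example_weights), 3))
```

Changing the weights shifts which candidate pairs are flagged, so any threshold applied to the combined score would need tuning per dataset.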
Gazetteer “Duplicates” Discussion (3) Specifics
Geospatial Metrics (w/Subtleties)
Great Circle Distance
Bounding Box Topology (Polygons may not be better!)
  Inside
  Near to
  (both scaled areally)
[figure: Twin Lakes, Half Moon Lake]
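The geospatial metrics named here can be sketched in a few lines. This is a minimal illustration, not the ADL implementation: great-circle distance uses the haversine formula on a spherical Earth, boxes are assumed to be (west, south, east, north) tuples in degrees, and the "near to" test is one possible reading of "scaled areally", in which each box is grown in proportion to its own size before testing for overlap. The function names and the `factor` parameter are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; a spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def inside(inner, outer):
    """True if box `inner` lies entirely within box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def expand(box, factor):
    """Grow a box on every side by `factor` times its own width/height."""
    west, south, east, north = box
    dx, dy = (east - west) * factor, (north - south) * factor
    return (west - dx, south - dy, east + dx, north + dy)


def intersects(a, b):
    """True if two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def near_to(a, b, factor=0.5):
    """True if the boxes overlap after each is grown relative to its size."""
    return intersects(expand(a, factor), expand(b, factor))
```

Because the growth is relative to box size, a large lake counts as "near" things farther away than a small pond does, which is the kind of scale dependence the slide points at.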
Gazetteer “Duplicates” Discussion (4) Specifics
Textual Metrics (w/Subtleties)
Hamming Distance (hd)
hd (“Lake”, “Pond”) = 4
Edit Distance (ed)
ed (“Lake”, “Lakes”) = 1
Soundex (sdx)
sdx (“Pyramid Lake”) = [P653]
sdx (“Lake Tahoe”) = [T000]
Soundex codes:
1: B,P,F,V
2: C,S,K,G,J,Q,X,Z
3: D,T
4: L
5: M,N
6: R
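The three textual metrics can be sketched as follows. This is an illustrative implementation, not the code demonstrated in the talk: the edit distance is the standard Levenshtein dynamic program, and the Soundex follows the common American variant using the letter codes listed above. The slide's Soundex values match the codes of the specific words "Pyramid" and "Tahoe", so the sketch encodes a single word at a time; how the generic term "Lake" is set aside is an assumption left to the caller.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


SOUNDEX_CODES = {}
for digit, letters in enumerate(
        ["BPFV", "CSKGJQXZ", "DT", "L", "MN", "R"], start=1):
    for letter in letters:
        SOUNDEX_CODES[letter] = str(digit)


def soundex(word):
    """Four-character Soundex code of a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


print(hamming("Lake", "Pond"), edit_distance("Lake", "Lakes"))
print(soundex("Pyramid"), soundex("Tahoe"))
```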
Gazetteer “Duplicates” Discussion (5) Specifics
Canonical Names
Tahoe / Lake Tahoe / Tahoe, Lake -> Tahoe, Lake
but
Lake Bigler -> Bigler, Lake
Big Frog Lake -> ? Big Frog, Lake / Frog, Big, Lake
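A simple rule for producing a canonical form is to move a leading or trailing generic term behind a comma. The sketch below assumes a small, hypothetical list of generic terms and a hypothetical function name; it is not the ADL rule set, and it settles the "Big Frog Lake" question only by fiat, treating everything except the generic term as the specific name.

```python
# Hypothetical list of generic terms; a real gazetteer would need many more.
GENERIC_TERMS = {"Lake", "Lakes", "Mount", "Point", "Peak"}


def canonical_name(name):
    """Return 'Specific, Generic' when a generic term leads or trails."""
    if "," in name:
        # Already inverted: only normalize spacing around the commas.
        return ", ".join(part.strip() for part in name.split(","))
    words = name.split()
    if len(words) < 2:
        return name.strip()
    if words[0] in GENERIC_TERMS:
        return " ".join(words[1:]) + ", " + words[0]
    if words[-1] in GENERIC_TERMS:
        return " ".join(words[:-1]) + ", " + words[-1]
    return " ".join(words)


for raw in ["Lake Tahoe", "Tahoe, Lake", "Lake Bigler", "Big Frog Lake"]:
    print(raw, "->", canonical_name(raw))
```

A bare name such as "Tahoe" passes through unchanged, since the rule has no way to infer the missing generic term.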
Gazetteer “Duplicates” Demonstration (1) - Background
GNIS Dataset http://geoname.usgs.gov
Public product of USGS / Mapping Div. for BGN
Centroid point features, from many 1:100K- maps
Web-accessible, updated ad hoc
GDT Dataset http://www.gdt1.com
Private product, sold into logistics & mapping firms
Polygon & line features, from DLGs, other sources
CD-publication (75+ for U.S.), updated quarterly
[demo]
Gazetteer “Duplicates” Summary (1)
Features Cover a Large Territory
Crisp or Diffuse
Compact or Extended
Tangible or Abstract
Naming Features is Human Necessity
Linguistic Reference
Identity and Ownership
Navigation and Wayfinding
Gazetteer “Duplicates” Summary (2)
Feature Names are Numerous & Various
Polynymous, multi-lingual
Suffused with linguistic conventions
Time-variant
Feature Locations also Numerous & Various
Projected, multi-scale
Obscured by cartographic conventions
Time-variant
Types, too, can be Numerous & Various
Gazetteer “Duplicates” Summary (3)
Automatic Recognition of “Duplicates”
Essential to gazetteer construction
Relies on both geospatial & textual metrics; weighting of combinations is subjective
Results in multiple characterizations for a single feature in many (most) cases, with database & visualization implications
Gazetteers pushing at the limits of GIS
spatially, temporally, and ontologically
Gazetteer “Duplicates” Observations
Features are subjective, not objective
“Duplicate” features are not problems, but clews to important subtleties
No “right” answer to feature-izing the environment. Features vary:
Spatially (scale)
Temporally
Culturally (socially)
Cognitively (personally)
[end]
Gazetteer “Duplicates” Future Work
Widening beyond California & Nevada
Adjusting metrics & weights, regionally
Testing computational costs/benefits of polygon vs. bounding box calculations
Exploring database mechanisms to deal with complexity of gazetteer knowledge
Implementing in Web-mapping GIS
Gazetteer “Duplicates” Fundaments (2)
Duplicates: An approximate notion
Firm types, ±close in hierarchy
Locations ±close dependent on scale
Names ±close dependent on language … or not at all
All aspects variant in time
[end]