“Duplicate” Entries in Gazetteers
Jordan Hastings
Department of Geography
University of California
Santa Barbara
Gazetteer “Duplicates”
Names & Features (1)
Naming Features in the Environment
Linguistic Necessity
Identity and Ownership
Navigation and Wayfinding
Features Cover a Large Territory
Crisp or Diffuse
Compact or Extended
Tangible or Abstract
Names & Features (2)
Locations are Numerous & Various
Multiscale
Generalized
Dis-coordinated
Time-variant
Names & Features (3)
Names are Numerous & Various
Polynymous
Mis-spelled
Multilingual
Time-variant
Names & Features (4)
Lake Bigler, thru 1920s
Lake Bonpland (also Bondland), thru 1890s
Da-ow-a-ga, thru 1850s
[Map: Lake Tahoe region with labeled places: Kings Beach, Incline Village-Crystal Bay, Tahoe Vista, Dollar Point, Carson, Sunnyside-Tahoe City, Indian Hills, Johnson Lane, Zephyr Cove-Round Hill Village, Kingsbury, Stateline, South Lake Tahoe, Minden]
Feature Types (1)
Dependable Type System
Because Features are “Objects”
Because Human Mind Categorizes
Types present in Taxonomy
Hierarchy is Natural in Environment
Because Human Mind Categorizes
Feature Types (2) – Examples
Cultural Environment
Nations -> States -> Provinces -> Districts
Physical Environment
Watersources:
Springs -> Seeps
Watercourses:
Rivers -> Streams -> Creeks
Waterbodies:
Lakes -> Ponds -> Sloughs
?Glaciers
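A minimal sketch of how such a type hierarchy might be stored and compared. It assumes that each arrow on the slide runs from a broader type to a narrower one, and it adds a hypothetical common root, "Hydrography", which is not in the original. The names PARENT, ancestors, and type_distance are illustrative only.

```python
# Hypothetical parent links, read off the slide's arrows (broader -> narrower).
# "Hydrography" is an assumed common root, not part of the original slide.
PARENT = {
    "Seeps": "Springs",
    "Springs": "Watersources",
    "Creeks": "Streams",
    "Streams": "Rivers",
    "Rivers": "Watercourses",
    "Sloughs": "Ponds",
    "Ponds": "Lakes",
    "Lakes": "Waterbodies",
    "Watersources": "Hydrography",
    "Watercourses": "Hydrography",
    "Waterbodies": "Hydrography",
}


def ancestors(feature_type):
    """Return [feature_type, its parent, ..., root]."""
    chain = [feature_type]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def type_distance(a, b):
    """Count the edges between two types via their closest common ancestor.

    Returns None when the two types share no ancestor, as happens for a
    type that has not been placed in the hierarchy yet.
    """
    depth_in_b = {t: i for i, t in enumerate(ancestors(b))}
    for i, t in enumerate(ancestors(a)):
        if t in depth_in_b:
            return i + depth_in_b[t]
    return None


print(type_distance("Creeks", "Streams"))   # adjacent types
print(type_distance("Creeks", "Ponds"))     # related only through the root
print(type_distance("Lakes", "Glaciers"))   # "Glaciers" is unplaced here
```

A small distance suggests that two entries typed differently could still describe the same feature; an unplaced type such as "Glaciers" yields no distance at all.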
Fundaments (1)
Definition: Gazetteer
A spatial dictionary of
named & typed features
in the environment
Implications
Features uniquely identified
Searchable by name and type
Also searchable geospatially
Fundaments (2)
Duplicates: An approximate notion
Firm types, ±close in hierarchy
Locations ±close dependent on scale
Names ±close dependent on language
… or not at all
All aspects variant in time
Fundaments (3)
Database Implications / Support
Custom Datatypes
Hierarchy
Geometry
Multiple Attribution (unlimited)
Names
Locations
Efficient Geospatial Processing
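One way to picture these requirements is a record with a hierarchical type plus unlimited lists of names and locations. The sketch below is an assumption for illustration, not the schema used in this work; the class and field names (Name, Location, Feature, type_path, valid_to) are hypothetical, and the coordinates are approximate.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Name:
    text: str
    language: Optional[str] = None
    valid_to: Optional[str] = None  # free-text period, e.g. "1920s"


@dataclass
class Location:
    # Geometry as a list of (lon, lat) vertices; a single vertex is a point.
    coords: List[Tuple[float, float]]
    scale: Optional[str] = None  # e.g. "1:24,000"


@dataclass
class Feature:
    feature_id: int
    type_path: Tuple[str, ...]  # position in the type hierarchy
    names: List[Name] = field(default_factory=list)          # unlimited
    locations: List[Location] = field(default_factory=list)  # unlimited

    def has_name(self, text):
        """Case-insensitive exact match against any recorded name."""
        wanted = text.casefold()
        return any(n.text.casefold() == wanted for n in self.names)


# Example record using the historical names from the Lake Tahoe slide.
lake = Feature(
    feature_id=1,
    type_path=("Waterbodies", "Lakes"),
    names=[
        Name("Lake Bigler", valid_to="1920s"),
        Name("Lake Bonpland", valid_to="1890s"),
        Name("Da-ow-a-ga", valid_to="1850s"),
    ],
    locations=[Location(coords=[(-120.03, 39.09)])],  # approximate point
)

print(lake.has_name("lake bigler"))
print(len(lake.names))
```

Keeping names and locations as lists, rather than single columns, is what lets one feature carry several spellings, languages, and time periods without being split into apparent duplicates.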
Approach (1)
Independent Measures of Duplicates
1. Type Thesaurus Metrics
Inter-feature: hierarchy, explicit linkages
2. Geospatial Metrics
Intra-feature: size, compactness, …
Inter-feature: distance, overlap, …
3. Geonomial Metrics
Intra-feature: NL translation [not considered yet]
Intra-feature: stemming, soundex, substitution
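Two of these measures can be sketched briefly: a great-circle distance between point locations, and two name comparisons. This is an illustration under stated assumptions, not the implementation used in this work. Standard American Soundex stands in for "soundex", Levenshtein edit distance is used as one reading of "substitution", and the function names are hypothetical.

```python
import math

# Soundex digit for each consonant; vowels, h, w, and y have no code.
CODES = {}
for digit, letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Standard American Soundex: first letter plus three digits."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    out = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not break a run of equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            out += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (out + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points, on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


print(soundex("Bonpland"), soundex("Bondland"))
print(edit_distance("Bonpland", "Bondland"))
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```

The two spellings of Lake Bonpland receive different Soundex codes but are only one substitution apart, which is one reason to keep several independent name measures rather than rely on a single one.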
Approach (2)
Unified Assessment of Duplicates
Weighted Combination of Measures
1. Type
2. Location(s)
3. Name(s)
Geographic Visualization, over Maps
Final Authority of Human Cataloger
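A weighted combination could look like the sketch below. It assumes each measure has already been scaled to a similarity between 0 and 1; the weights, the 0.8 threshold, and the function names are placeholders chosen for illustration, not values from this work. The result only flags a pair for attention, since the human cataloger keeps the final say.

```python
def combined_score(type_sim, location_sim, name_sim, weights=(1.0, 1.0, 1.0)):
    """Weighted mean of the three similarities, each assumed to lie in [0, 1]."""
    w_type, w_loc, w_name = weights
    total = w_type + w_loc + w_name
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_type * type_sim + w_loc * location_sim + w_name * name_sim) / total


def flag_for_review(score, threshold=0.8):
    """Return True when a pair looks similar enough to show the cataloger."""
    return score >= threshold


# Same type, nearby locations, dissimilar names; type weighted double.
score = combined_score(1.0, 0.5, 0.0, weights=(2.0, 1.0, 1.0))
print(score)
print(flag_for_review(score))
```

Dividing by the sum of the weights keeps the score in the same 0 to 1 range as its inputs, so a threshold stays meaningful when the weights are retuned.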
Processing Cycle
[Figure: processing-cycle flow diagram. Steps: prep, weigh, review, post, rework, reject. States and stores: random features, grouped features, accepted, suspended, feature database, trash.]
[end]