CP204C - Intro to GIS Maggie Witt

LAB #7: Address Geocoding





1. GEOCODE 100% OF THE ADDRESSES IN THE OAKLAND SCHOOLS DATABASE:



In this first part of the lab, I geo-coded 100% of the addresses in the Oakland Schools database. The following describes the steps I took to get there:







Steps to Geocode Schools Data:



I started this portion of the lab by loading the streets layer in my map and then adding the Schools data (by selecting "Sheet 1," as shown in the screenshot at right).









Once the database (.xls file) had loaded, I opened the attribute table to examine the data. I noticed that all of the schools had an address associated with them, so I refrained from building an alias table. Additionally, I noted that the table included the following fields: SiteName, Addresses, City, State, and ZipCode.














In order to create an address locator, I then opened ArcCatalog. I navigated to the appropriate folder with my lab assignment data and selected File > New > Address Locator. In the next window, I chose "US Streets with Zone" and filled out the Address Locator window as shown here:









After running the address locator, I could see in ArcCatalog that I now had an "Oakland Schools" address locator:









In ArcMap, I added this locator to the map by selecting Tools > Address Locator Manager; then I selected "Oakland Schools." I checked a few intersections with the "Find" tool to make sure that the address locator was working (which it was!).
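Roughly speaking, a "US Streets with Zone" locator compares each input's house number and street name against the streets layer and uses the zone (here, the zip code) as an additional component to match. The Python sketch below illustrates only the splitting step; the regular expression, the `parse_address` name, and the uppercase normalization are assumptions for illustration, not how ArcGIS parses addresses internally.

```python
import re
from typing import NamedTuple, Optional


class ParsedAddress(NamedTuple):
    house_number: Optional[int]  # None when the address has no leading number
    street: str                  # normalized street name, e.g. "CHABOT"
    zone: Optional[str]          # zip code used as the "zone", if supplied


# Hypothetical pattern: optional leading digits, then the rest is the street name.
_ADDRESS_RE = re.compile(r"^\s*(?:(\d+)\s+)?(.+?)\s*$")


def parse_address(address: str, zip_code: Optional[str] = None) -> ParsedAddress:
    """Split a one-line street address into house number and street name."""
    match = _ADDRESS_RE.match(address)
    if match is None:
        raise ValueError(f"cannot parse address: {address!r}")
    number, street = match.groups()
    # Collapse repeated whitespace and uppercase so comparisons ignore case.
    street = " ".join(street.upper().split())
    return ParsedAddress(int(number) if number else None, street, zip_code)


print(parse_address("6786 Chabot"))
# ParsedAddress(house_number=6786, street='CHABOT', zone=None)
```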



Now, to geo-code the addresses, I right-clicked on "Sheet1$," which contains the Oakland Schools data, and selected "Geocode Addresses." This is the initial result that I got:









In order to work on reconciling the tied and unmatched addresses, I navigated to Tools > Geocoding > Review/Rematch Addresses and selected "Geocoding_Result_Schools."










Reconciling Matched Addresses with Candidates Tied:

First, I tried to address the three "tied" results:



(1) "Chabot": 6786 Chabot

For the address on Chabot, I noted that the first candidate seemed closer to the stated address (6670 is close to 6786), so I went ahead and matched this one.
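The Chabot tie-break amounts to picking the candidate whose house number is closest to the input. This is a simplified sketch: it treats each candidate as a single representative number (real candidates are street segments with address ranges), and the second candidate value (7001) is made up for illustration.

```python
from typing import Sequence


def closest_by_house_number(target: int, candidates: Sequence[int]) -> int:
    """Return the index of the candidate house number nearest to `target`."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # min() returns the first index among equal distances, i.e. the first tied candidate.
    return min(range(len(candidates)), key=lambda i: abs(candidates[i] - target))


# 6786 and 6670 come from the lab; 7001 is a made-up second candidate.
print(closest_by_house_number(6786, [6670, 7001]))  # → 0
```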









(2) Claremont Middle School:

To geo-code this address, I used Google Maps to locate the school and the two candidates. From this map (see right), I decided to go with the 5776 College Ave. address, since it seemed closer to the actual location of the middle school (shown in pink).
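When candidates are checked against a location looked up on a web map, the same idea becomes "pick the candidate nearest to the reference point." The sketch below uses the haversine great-circle distance; all coordinates are hypothetical, since the lab compared the points visually rather than numerically.

```python
import math
from typing import Sequence, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def nearest_candidate(reference: LatLon, candidates: Sequence[LatLon]) -> int:
    """Index of the candidate closest to the reference (e.g., web-map) location."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return min(range(len(candidates)), key=lambda i: haversine_m(reference, candidates[i]))


# Hypothetical coordinates: a school location read off a web map and two candidates.
school = (37.8400, -122.2500)
candidates = [(37.8450, -122.2520), (37.8405, -122.2510)]
print(nearest_candidate(school, candidates))  # → 1
```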









(3) Explore Middle School:

For this school, both candidates were 100% matches, so I consulted Google Maps to see if I could identify a clear winner. From the map (see right), the school appears to be at the intersection, so I selected the first candidate in the "tie."





Unmatched Addresses:

To begin with, I had 12 unmatched addresses. To attempt to resolve these, I went through the following steps:



(1) Looked for suspicious zip codes:



First, I noticed a seemingly out-of-place zip code: 94270. A Google Maps search revealed that this zip code is in France (see right), so I searched Google Maps for the correct zip code and found that it should be 94610. I made this change in ArcMap and searched for a new result, but the match score was still very low (42%). The top match was "746 W. Grand Ave.," so I used Google Maps to evaluate whether this was indeed a match. I found that West Grand and Grand were NOT in the same location, so I decided to place the point where it should be, using the "Pick Address from Map" tool.
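The zip code checks in this section can be partly automated by flagging codes that are not five digits or do not begin with an expected prefix. The sketch assumes that valid Oakland codes start with "946" (a simplification to adjust for the study area); "Hypothetical School A" and "Hypothetical School B" are placeholder names.

```python
from typing import Iterable, List, Tuple

# Assumption: valid Oakland zip codes start with "946". This is a simplification;
# adjust EXPECTED_PREFIXES for the study area.
EXPECTED_PREFIXES = ("946",)


def suspicious_zips(records: Iterable[Tuple[str, str]],
                    prefixes: Tuple[str, ...] = EXPECTED_PREFIXES) -> List[Tuple[str, str]]:
    """Return (name, zip) pairs whose zip is not 5 digits with an expected prefix."""
    flagged = []
    for name, zip_code in records:
        z = zip_code.strip()
        # str.startswith accepts a tuple, so several prefixes can be allowed.
        if not (len(z) == 5 and z.isdigit() and z.startswith(prefixes)):
            flagged.append((name, z))
    return flagged


schools = [
    ("Lakeview", "94270"),
    ("International Community School", "91601"),
    ("Hypothetical School A", "99601"),  # made-up name for illustration
    ("Hypothetical School B", "94601"),  # made-up name for illustration
]
print(suspicious_zips(schools))
```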














Screenshots (left to right): from Google Maps; where Lakeview should be; using "Pick Address from Map" to get there.









In another example, I found an unusual zip code of 91601 for "International Community School." I looked up this school in Google Maps and found that the zip code should be 94601. I found a few other examples of unusual zip codes (e.g., 99601), so I either searched for the actual zip code in Google Maps or eliminated the zip codes altogether in order to find the correct match.



International Blvd.

I knew from the in-class lab that International Blvd. was also 14th St., so I changed this in ArcMap. For four "International Blvd." addresses, changing the address to "14th St." generated results that matched at 82%, so I selected these results as the new matches. An example of the window where I edited the name from "International Blvd." to "14th St." is shown below:









Avenues vs. Streets:

For two of the remaining four addresses, I found (via Google Maps) that one (Howard, listed on "Fontaine Ave.") was supposed to be on Fontaine St., and that Community Day High School was on Mountain Blvd. (rather than Mountain Ave.). Making these changes gave me two more matches!
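Both fixes above (the International Blvd./14th St. alias and the Ave./St./Blvd. swaps) amount to generating alternate spellings of an address to retry. A minimal sketch, assuming a hand-built alias dictionary and a fixed list of street types (`STREET_ALIASES` and `STREET_TYPES` are hypothetical names); the house number in the examples is made up.

```python
import re
from typing import Dict, List

# Hypothetical alias table: from the lab, "International Blvd" is also "14th St".
STREET_ALIASES: Dict[str, str] = {"INTERNATIONAL BLVD": "14TH ST"}
# Street types to swap when a name may carry the wrong suffix.
STREET_TYPES = ("AVE", "ST", "BLVD")
_SUFFIX_RE = re.compile(r"^(.*\S)\s+(" + "|".join(STREET_TYPES) + r")$")


def address_variants(address: str) -> List[str]:
    """Return the normalized address followed by alias and street-type rewrites."""
    norm = " ".join(address.upper().replace(".", "").split())
    variants = [norm]
    for alias, official in STREET_ALIASES.items():
        if alias in norm:
            variants.append(norm.replace(alias, official))
    # Swap a trailing street type, e.g. "FONTAINE AVE" -> "FONTAINE ST".
    m = _SUFFIX_RE.match(norm)
    if m:
        base, suffix = m.groups()
        variants.extend(f"{base} {t}" for t in STREET_TYPES if t != suffix)
    return variants


print(address_variants("Fontaine Ave."))
# ['FONTAINE AVE', 'FONTAINE ST', 'FONTAINE BLVD']
```

Each variant would then be tried against the locator in turn, keeping the first one that produces an acceptable score.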














Finally, to resolve the last two unmatched addresses (Brookfield and Carl Munck), I used Google Maps to locate the schools and the "Pick Address from Map" tool to set the points (since checking zip codes, streets vs. avenues, etc. didn't produce any sure matches). To do so, I followed the same steps as outlined above.



Carl Munck, as an example:









Note that, in the ArcMap image, the "candidates" were pretty far off from where the school is actually located, so it's a good thing that I didn't geo-code according to these candidates!



After adding these points manually, my map was now 100% matched (see right)!














2. GEOCODE 100% OF THE ADDRESSES IN THE OAKLAND FOOD DATABASE:



In this second part of the lab, I geo-coded 100% of the addresses in the Oakland food database. The following describes the steps I took to get there:









Steps to Geocode Food Data:



I started this portion of the lab just as I had done for the Oakland schools data, by adding the food database to the map. Once it was in my TOC, I opened the attribute table to take a look at the data. Once again, while I noticed some problems with the data, it appeared that there would be no need for an alias table. Additionally, I noted that the fields for this table included: FoodStore, Address, City, State, Phone, and Rating (note: no zip code).









As before, I then switched to ArcCatalog to create the address locator. I repeated the same steps as before, and when I was finished, I could see in ArcCatalog that I had a new address locator for the food data (see right).



Back in ArcMap, I added the food data and geo-coded the addresses the same way that I did for the Oakland Schools (right-clicked on "food$" > selected "Geocode Addresses").



The first time I tried to geo-code, I received zero matches. I then realized that I had specified matching on both address and city; since the food database is missing zip code data, matching on city returned no matches. When I set "city" to "none," I was able to recover ~75% matches (see right).














I started reconciling this data by working with the tied results, and then moved on to the unmatched results.



Reconciling Matched Addresses with Candidates Tied:

To save time, I chose to match the first tied candidate (rather than going through them individually as I had done with the schools), leaving more time for working with the unmatched results. This strategy worked for most of the tied results, EXCEPT for "Miss Pearl's Jam House" (since the address for this entry was just "Broadway," and I figured that I could find the actual address) and 3325 Grand Ave. (since there appeared to be a need for some data cleanup here). For "Miss Pearl's Jam House," I found an address via Google Maps, 1 Broadway, and picked the point on the map that corresponded with it (I knew from the in-class lab that the streets data did not contain this address). Then, I changed the address to 3325 Grand Ave. for the remaining entry and matched it.





Reconciling Unmatched Addresses:

I started sifting through and matching the unmatched addresses by "picking the low-hanging fruit." First, as in the in-class lab, I geo-coded all addresses with a >60% match score, deleted all of the "Emeryville" records from the database table, and deleted repeat entries that did not have a match address, like the Addis restaurant in the example pasted below:









After implementing these measures, I was left with just 19 unmatched entries.
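The first-pass rules above can be sketched as a filter over the geocoded records. This assumes each record carries a name, a city, and the best candidate's score (`None` when there was no candidate), and it interprets a "repeat entry" as an unmatched copy of a store that already matched elsewhere. The field names, the Addis score, and the other sample stores are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]  # hypothetical fields: "name", "city", "score"


def triage(records: List[Record], min_score: float = 60.0) -> Tuple[List[Record], List[Record]]:
    """Apply the first-pass rules; return (accepted, still_unmatched)."""
    # Rule 1: drop records outside the study area (Emeryville).
    kept = [r for r in records if r["city"].strip().lower() != "emeryville"]

    # Rule 2: accept any record whose best score is above the threshold.
    accepted: List[Record] = []
    rest: List[Record] = []
    for r in kept:
        if r["score"] is not None and r["score"] > min_score:
            accepted.append(r)
        else:
            rest.append(r)

    # Rule 3: drop unmatched repeats of a store that already has an accepted copy.
    accepted_names = {r["name"] for r in accepted}
    still_unmatched = [r for r in rest if r["name"] not in accepted_names]
    return accepted, still_unmatched


records = [
    {"name": "Addis", "city": "Oakland", "score": 85.0},
    {"name": "Addis", "city": "Oakland", "score": None},
    {"name": "Hypothetical Cafe", "city": "Emeryville", "score": 90.0},
    {"name": "Hypothetical Diner", "city": "Oakland", "score": 45.0},
]
accepted, unmatched = triage(records)
print([r["name"] for r in accepted], [r["name"] for r in unmatched])
# ['Addis'] ['Hypothetical Diner']
```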









From my examination of the data in the attribute table, I knew that some data cleanup would be necessary to get at the remaining unmatched addresses, so I worked through the table to fix key errors, like mislabeled fields (City in the Address field, etc.).

For example, I found one record whose address field said "Gastro," but it was clear that this field was just mislabeled: I found the correct address a few columns over, under "Rating." I changed the address to "3986 Adeline" and was able to match it. Additionally, I went into the attribute table to make edits so that the fields would match up like they were supposed to.
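The "Gastro" fix suggests a simple heuristic for spotting misplaced addresses: if the Address field does not start with a house number but another field does, that other field probably holds the address. A sketch under that assumption (the regex, the function names, and the store name are hypothetical; a phone number written with spaces, such as "510 555 0100", would fool it):

```python
import re
from typing import Dict, Optional

# Hypothetical heuristic: a street address starts with a house number, then a word.
_LOOKS_LIKE_ADDRESS = re.compile(r"^\s*\d+\s+\S")


def find_misplaced_address(row: Dict[str, str], address_field: str = "Address") -> Optional[str]:
    """Name of another field that appears to hold the address, or None."""
    if _LOOKS_LIKE_ADDRESS.match(row.get(address_field) or ""):
        return None  # the Address field already looks like an address
    for field, value in row.items():
        if field != address_field and _LOOKS_LIKE_ADDRESS.match(value or ""):
            return field
    return None


def repair_row(row: Dict[str, str], address_field: str = "Address") -> Dict[str, str]:
    """Return a copy of the row with a misplaced address copied into Address."""
    fixed = dict(row)  # other fields are left untouched for manual review
    source = find_misplaced_address(row, address_field)
    if source is not None:
        fixed[address_field] = row[source]
    return fixed


row = {"FoodStore": "Hypothetical Bistro", "Address": "Gastro", "City": "Oakland",
       "State": "CA", "Phone": "", "Rating": "3986 Adeline"}
print(repair_row(row)["Address"])  # → 3986 Adeline
```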



After cleaning up some of the data, I started using Google Maps and Bing, as I had for the schools data, to identify any other common problems with the data (e.g., mislabeled streets vs. avenues). In mapping the data in Google Maps, I was sometimes able to pinpoint specific locations for the unmatched addresses, so I used the "Pick Address from Map" tool to manually match them. An example of this is shown below for the restaurant "Il Pescatore."



Screenshots (left to right): location from Google Maps; location identified in ArcMap and point selected.



Note: To help me identify where the point should go, I selected "Zoom to Candidates," so that the turquoise markings would indicate where Webster Street is; I then followed this street to its terminus to place the address point.







I followed the same process (using Google Maps) to locate some of the other addresses in question. Grand Avenue came up a few times, and I learned from Google where these locations should appear on the map. To figure out which street ArcMap attributed to these locations, I chose "Select Features," selected the polyline, and located it in the attribute table as W. Grand Ave. With this info, I was able to select the appropriate candidate from the search window or pick the addresses "by hand" (using "Pick Address from Map").







With the "Grand Avenue" unmatched addresses taken care of, I had just one last point to geocode: Hahn's Hibachi in Jack London Square. This time, since I knew that the square is fairly small, I tried removing the numerical address and came up with a match of 66%, so I matched this point and completed the geocoding to 100%!
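Dropping the house number and accepting the first query that clears a score threshold can be sketched as below. The scorer is a stand-in dictionary (only the 66% figure comes from the lab; the house number 55 and the full-address score are invented), and the default threshold mirrors the >60% rule used earlier.

```python
import re
from typing import Callable, Iterable, Optional, Tuple


def without_house_number(address: str) -> str:
    """Strip a leading house number (e.g., "55 " or "12B ") from an address."""
    return re.sub(r"^\s*\d+[A-Za-z]?\s+", "", address).strip()


def first_acceptable(queries: Iterable[str], score: Callable[[str], float],
                     min_score: float = 60.0) -> Optional[Tuple[str, float]]:
    """Return the first (query, score) whose score is above min_score, else None."""
    for query in queries:
        s = score(query)
        if s > min_score:
            return query, s
    return None


# Stand-in scores: only the 66 comes from the lab; the rest are invented.
fake_scores = {"55 Jack London Sq": 40.0, "Jack London Sq": 66.0}
address = "55 Jack London Sq"
result = first_acceptable([address, without_house_number(address)],
                          lambda q: fake_scores.get(q, 0.0))
print(result)  # → ('Jack London Sq', 66.0)
```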


















(3) For each School and Food dataset, what percent gets easily geo-coded? Why?



For the Schools dataset, 88% (106) of the addresses were easily geo-coded. This is likely because more of the fields were correctly aligned between the Schools database and the reference database (streets). For example, unlike the food database, both the streets and schools databases include a field for zip code, which gives the program more certainty in matching the addresses.







For the Food dataset, 75% (212) of the addresses were easily geo-coded. The lower initial match rate is likely because the fields in the streets and food datasets don't line up as well as those for streets and schools. For example, the food dataset is missing the zip code field, which means that the only way to match addresses is via the address, city, and state fields (and the address plays the biggest role, since most of the data comes from Oakland/Emeryville, CA).














(4) What steps and techniques do you apply to fully geocode your datasets?



Since I detailed the steps that I used in #1 and #2 above, I'll refrain from repeating them and just list a few common steps/techniques:

(1) Eliminate inapplicable/unnecessary entries from the database file.

a. Removing the Emeryville data along with duplicates of unmatched entries helped to whittle away at unmatched addresses and pick off some of the "low-hanging fruit" before delving too deeply into investigating unmatched addresses.

(2) Look for data entry problems:

a. Misspellings, incorrect placement in fields, etc. are often obvious and quick to fix, which eliminates errors and yields new matches. Additionally, obvious deviations from the appropriate zip code can raise early red flags for adjustments that must be made.

(3) Use Google Maps or Bing to identify problems that aren't as obvious:

a. Once I hit a wall looking for and eliminating erroneous data, I often used other sources (like Google Maps) to do a quality assurance/quality control check on the data. By doing this, I was able to identify some avenues that should have been labeled as "St." and vice versa.

(4) Try eliminating elements of the address altogether to see how this changes the match candidates:

a. In some cases, eliminating the zip code or numerical address helped to identify the correct addresses for schools/restaurants. This was especially effective when the other elements of the address were correct but one factor (e.g., the zip code) was throwing the match way off.

(5) When all else fails, "Pick Address from Map":

a. When I had gone through all of the above steps, I used Google Maps to help me locate the unmatched address, found the corresponding point in ArcMap, and placed the point/geo-coded the address "by hand."





(5) What techniques prove most useful in geo-coding? Why?



(1) Editing data in Excel before geo-coding:

a. While I didn't do this in the lab, in hindsight I wish I had, since it would likely have saved me a lot of time. From my perspective, working with the data in Excel has two advantages: (1) it allows for more flexibility in making edits (and seems to be more user-friendly for doing so than ArcMap), and (2) making changes in the database from the outset eliminates the need to essentially make the same change twice (i.e., once to match the address and again to edit the data in the database).

(2) Identify aliases:

a. While I did not do this in parts #1 and #2 above or for the in-class lab, in hindsight I think that creating an alias table to capture some commonly misnamed streets would have been very helpful and saved time. For example, in both the in-class lab and this assignment, creating an alias for "International Boulevard" would have saved a LOT of the time I spent individually editing and matching each of these entries.

(3) Using Google Maps (and other spatial databases):

a. While it was time-consuming, I would not have been able to geo-code 100% of the addresses in the schools and food databases without the use of internet-based spatial databases.














(6) What school is most impacted by poorly rated food establishments?



To answer this, I decided to look for the school surrounded by the greatest number of poorly rated food establishments. To evaluate this, I set a rating of 3.5 or lower as my benchmark for "poorly rated" establishments, and I selected these from the attribute table for "Geocoded_Result_Food" by sorting the rating column and selecting all entries less than or equal to 3.5, excluding those that lacked a rating altogether.
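The selection above, together with the per-school comparison that follows it, can be approximated numerically by counting poorly rated stores within a fixed radius of each school. This sketch follows the "3.5 or lower" benchmark and skips missing ratings; it assumes projected coordinates in meters, and the radius, coordinates, and second school are hypothetical (the lab itself judged proximity visually on the map).

```python
import math
from typing import Any, Dict, List, Tuple

Point = Tuple[float, float]  # projected (x, y) in meters; an assumption
Store = Dict[str, Any]       # hypothetical fields: "rating" (float or None), "xy"


def poorly_rated(stores: List[Store], threshold: float = 3.5) -> List[Store]:
    """Keep stores rated at or below the threshold; skip stores with no rating."""
    return [s for s in stores if s.get("rating") is not None and s["rating"] <= threshold]


def most_impacted(schools: Dict[str, Point], stores: List[Store],
                  radius_m: float) -> Tuple[str, int]:
    """School with the most poorly rated stores within radius_m, and that count."""
    bad = poorly_rated(stores)
    counts = {
        name: sum(1 for s in bad if math.dist(xy, s["xy"]) <= radius_m)
        for name, xy in schools.items()
    }
    best = max(counts, key=counts.get)  # the first school listed wins ties
    return best, counts[best]


schools = {"Lincoln": (0.0, 0.0), "Hypothetical School": (5000.0, 0.0)}
stores = [
    {"rating": 3.0, "xy": (100.0, 100.0)},
    {"rating": 2.5, "xy": (-300.0, 200.0)},
    {"rating": 4.5, "xy": (50.0, 50.0)},    # well rated: ignored
    {"rating": None, "xy": (10.0, 10.0)},   # no rating: ignored
    {"rating": 3.5, "xy": (5100.0, 0.0)},
]
print(most_impacted(schools, stores, radius_m=800.0))  # → ('Lincoln', 2)
```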









With the poorly rated establishments selected, I then looked on the map to see which school seemed to be surrounded by the greatest number of them, and selected Lincoln as the school most impacted: there were ~9 poorly rated establishments within ~10 blocks of the school (if counting the grid cells as blocks):














(7) Which record never gets coded (in the alcohol license exercise) and why? Why is 9332 VISTA CT a problem?



According to a Google Maps search, the address 9332 Vista Ct. does exist in the Oakland area; however, this address cannot be geo-coded in the alcohol license database.









To investigate the cause, I looked in the attribute table for the streets layer (see next page) and saw that Vista Court does not exist there (there is a Vista St., but this is different from Vista Court). As a result, the Vista Ct. location cannot be geo-coded. Because the street does not even exist on the map, I cannot use "Pick Address From Map," which would typically be my last resort.
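The Vista Ct. problem comes down to a lookup: after normalizing suffixes, both the street name and its type must appear in the reference streets table, so "Vista Ct" does not match "Vista St." A sketch with a hypothetical suffix map and a tiny stand-in street list (not the actual streets layer):

```python
from typing import Iterable, Set

# Hypothetical abbreviation map so "Court"/"Ct" and "Street"/"St" compare equal.
SUFFIXES = {"COURT": "CT", "STREET": "ST", "AVENUE": "AVE", "BOULEVARD": "BLVD"}


def normalize_street(name: str) -> str:
    """Uppercase, drop periods, and abbreviate common street types."""
    words = name.upper().replace(".", "").split()
    return " ".join(SUFFIXES.get(w, w) for w in words)


def street_in_reference(street: str, reference_streets: Iterable[str]) -> bool:
    """True if the normalized street name appears in the reference street list."""
    known: Set[str] = {normalize_street(s) for s in reference_streets}
    return normalize_street(street) in known


reference = ["Vista St", "Grand Ave", "W Grand Ave"]  # stand-in reference list
print(street_in_reference("Vista Ct.", reference))    # → False
```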











