CP204C - Intro to GIS                                                       Maggie Witt
LAB #7: Address Geocoding


In this first part of the lab, I geo-coded 100% of the addresses in the
Oakland Schools database. The following describes the steps I took to get

Steps to Geocode Schools Data:

I started this portion of the lab by loading the streets layer in my map and then adding the Schools data (by selecting "Sheet 1," as shown in the screenshot at right).

Once the database/.xls file had loaded, I opened the attribute table to examine the data. I noticed that all of the schools had an address associated with them, so I refrained from building an alias table. Additionally, I noted that the table included the following fields: SiteName, Addresses, City, State, ZipCode.


In order to "evoke" an address locator, I then opened ArcCatalog. I navigated to the appropriate folder with my lab assignment data and selected File > New > Address Locator. In the next window, I chose "US Streets with Zone" and filled out the Address Locator window as shown here:

After running the address locator, I could see in
ArcCatalog that I now had an "Oakland Schools"
address locator:

In ArcMap, I added this to the map by selecting Tools > Address Locator Manager; then I selected "Oakland Schools." I checked a few intersections with the "Find" tool to make sure that the address locator was working (which it was!).
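Conceptually, a street-based locator like "US Streets with Zone" estimates a point by interpolating the house number along a street segment's address range. The sketch below shows only that core idea; the `Segment` class, coordinates, and address range are hypothetical, and a real ArcGIS locator also handles left/right sides of the street, odd/even parity, offsets, and match scoring.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A street centerline segment with a hypothetical address range."""
    start: tuple[float, float]  # (x, y) at the "from" end
    end: tuple[float, float]    # (x, y) at the "to" end
    from_addr: int              # lowest house number on the segment
    to_addr: int                # highest house number on the segment


def interpolate(segment: Segment, house_number: int) -> tuple[float, float]:
    """Place a house number proportionally along the segment."""
    lo, hi = segment.from_addr, segment.to_addr
    if not lo <= house_number <= hi:
        raise ValueError("house number outside the segment's address range")
    # Fraction of the way along the range: 0.0 at from_addr, 1.0 at to_addr.
    t = 0.0 if hi == lo else (house_number - lo) / (hi - lo)
    (x0, y0), (x1, y1) = segment.start, segment.end
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


seg = Segment(start=(0.0, 0.0), end=(100.0, 0.0), from_addr=6700, to_addr=6798)
print(interpolate(seg, 6749))  # halfway along the range: (50.0, 0.0)
```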

Now, to geo-code the addresses, I right-clicked on
"Sheet1$," which contains the Oakland Schools data, and
selected "Geocode Addresses." This is the initial result that
I got:

In order to work on reconciling the tied and unmatched addresses, I navigated to Tools > Geocoding > Review/Rematch Addresses and selected "Geocoding_Result_Schools."


Reconciling Unmatched Addresses with Candidates Tied:
First, I tried to address the three "tied" results:

(1) "Chabot" 6786 Chabot
For the address on Chabot, I noted that the first
candidate seemed closer to the stated address
(6670 close to 6786), so I went ahead and matched
this one.
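The rule of thumb used for Chabot (prefer the tied candidate whose house number is nearest the stated one) can be written as a tiny helper. This is a sketch, not how ArcMap ranks ties; the second candidate's number (6500) is made up for illustration, since only 6670 appears above.

```python
def closest_candidate(target: int, candidates: list[tuple[str, int]]) -> tuple[str, int]:
    """Return the (label, house_number) candidate nearest the target number.

    On an exact tie in distance, min() keeps the first candidate listed.
    """
    return min(candidates, key=lambda c: abs(c[1] - target))


# 6786 is the stated address; "candidate 2" is hypothetical.
print(closest_candidate(6786, [("candidate 1", 6670), ("candidate 2", 6500)]))
# ('candidate 1', 6670)
```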

(2) Claremont Middle School:
To geo-code this address, I used Google Maps to locate the school and the two candidates. From this map (see right), I decided to go with the 5776 College Ave. address, since it seemed closer to the actual location of the middle school (in pink).

(3) Explore Middle School:
    For this school, both candidates were 100% matches, so I consulted Google
    Maps to see if I could identify a clear winner. From this map (see right), the
    school looks to be at the intersection, so I selected the first candidate in the "tie."

Unmatched Addresses:
To begin with, I had 12 unmatched addresses. To attempt to resolve these, I went through the following steps:

(1) Looked for suspicious zip codes:

First, I noticed a seemingly out-of-place zip code: 94270. A Google Maps search revealed that this zip code was in France (see right), so I searched Google Maps for the correct zip code and found that it should be 94610. I made this change in ArcMap and searched for a new result, but the match was still very low (42%). The top match was "746 W. Grand Ave.," so I used Google Maps to evaluate whether this was indeed a match. I found that West Grand and Grand were NOT in the same location, so I decided to place the point where it should be, using the "Pick Address from Map" tool.
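Match scores like the 42% above come from the locator's own scoring, which this sketch does not reproduce. As a rough stand-in, Python's `difflib.SequenceMatcher` gives a 0 to 100 string similarity; note that it rates "746 Grand Ave" and "746 W Grand Ave" as very similar even though they are different streets, which is why checking a map mattered here.

```python
from difflib import SequenceMatcher


def match_score(input_address: str, candidate_address: str) -> float:
    """Rough 0-100 string similarity between two addresses (not ArcGIS's scoring)."""
    # Normalize case and collapse repeated whitespace before comparing.
    a = " ".join(input_address.upper().split())
    b = " ".join(candidate_address.upper().split())
    return round(100 * SequenceMatcher(None, a, b).ratio(), 1)


print(match_score("746 Grand Ave", "746 W Grand Ave"))  # 92.9
```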


Left: From Google Maps, where Lakeview should be. Right: Using "Pick Address from Map" to get there.

In another example, I found an unusual zip code of 91601 for "International Community School." I looked up this school in Google Maps and found that the zip code should be 94601. I found a few other examples of unusual zip codes (e.g., 99601), so I either searched for the actual zip code in Google Maps or eliminated the zip codes altogether in order to find the correct match.
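Suspicious zip codes like these can be screened before geocoding. A minimal sketch, assuming the `ZipCode` values are stored as text and that valid Oakland zip codes start with "946" (e.g., 94601, 94610). That prefix is a rough screen, not a rule: it would also flag any legitimate Oakland-area zip code outside it. The "School A/B/C" records are hypothetical.

```python
def suspicious_zips(records: list[dict], expected_prefix: str = "946") -> list[dict]:
    """Return records whose ZipCode is missing, not five digits, or outside the prefix."""
    flagged = []
    for rec in records:
        zip_code = str(rec.get("ZipCode", "")).strip()
        if len(zip_code) != 5 or not zip_code.isdigit() or not zip_code.startswith(expected_prefix):
            flagged.append(rec)
    return flagged


schools = [
    {"SiteName": "International Community School", "ZipCode": "91601"},
    {"SiteName": "School A", "ZipCode": "94270"},  # hypothetical record
    {"SiteName": "School B", "ZipCode": "94610"},  # hypothetical record
    {"SiteName": "School C", "ZipCode": "99601"},  # hypothetical record
]
for rec in suspicious_zips(schools):
    print(rec["SiteName"], rec["ZipCode"])
```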

International Blvd.
I knew from the in-class lab that International Blvd. was also 14th St., so I changed this in ArcMap. For four "International Blvd." addresses, changing the address to "14th St." generated results that matched at 82%, so I selected these results as the new matches. An example of the window where I edited the name from "International Blvd." to "14th St." is shown below:

Avenues vs. Streets:
For two of the remaining 4 addresses, I found (via Google Maps) that one (Howard, listed on "Fontaine Ave.") was supposed to be on Fontaine St., and that the other (Community Day High School) was on Mountain Blvd. rather than Mountain Ave. Making these changes gave me two more matches!
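The Ave./St./Blvd. mix-ups above suggest a simple retry: generate the same street with each alternative suffix and rematch each one. A minimal sketch; the three-item `SUFFIXES` list is a hypothetical simplification, and the variants still need to be checked against a map before accepting a match.

```python
# Street-type suffixes to try (a deliberately short, hypothetical list).
SUFFIXES = ("St", "Ave", "Blvd")


def suffix_variants(address: str) -> list[str]:
    """Return copies of `address` with its final street-type word swapped."""
    words = address.split()
    if len(words) < 2:
        return []  # need at least a street name plus a suffix
    current = words[-1].rstrip(".").title()
    if current not in SUFFIXES:
        return []  # no recognizable suffix to swap
    stem = " ".join(words[:-1])
    return [f"{stem} {alt}" for alt in SUFFIXES if alt != current]


print(suffix_variants("Fontaine Ave."))  # ['Fontaine St', 'Fontaine Blvd']
```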


Finally, to resolve the last 2 unmatched addresses (Brookfield & Carl Munck), I used Google Maps to locate the schools and used the "Pick Address from Map" tool to set the points (since checking zip codes, streets vs. avenues, etc. didn't result in any sure matches). To do so, I followed the same steps as outlined above.

Carl Munck, as an example:

Note that, in the ArcMap image, the "candidates" were pretty far off from where the school is actually
located, so it's a good thing that I didn't geo-code according to these candidates!

After adding these points manually, my map was now 100%
matched! (see right).



In this second part of the lab, I geo-coded 100% of the addresses in the
Oakland food database. The following describes the steps I took to get

Steps to Geocode Food Data:

I started this portion of the lab just as I had done for the Oakland school data, by adding the food database to the map. Once it was in my TOC, I opened the attribute table to take a look at the data. Once again, while I noticed some problems with the data, it appeared that there would not be a need for an alias table. Additionally, I noted that the fields for this table included: FoodStore, Address, City, State, Phone, Rating (note: no zip code).

As before, I then switched to ArcCatalog to "evoke" the
address locator. I repeated the same steps as before, and when
I was finished, I could see in ArcCatalog that I had a new
Address Locator for the food data (see right).

Back in ArcMap, I added the food data and geo-coded the addresses the same way that I did for the Oakland Schools (right-clicked on "food$" and selected "Geocode Addresses").

The first time I tried to geo-code, I received zero matches. I then realized that I had specified matching on both address and city; since the food database is missing zip code data, this setup did not return any matches. When I changed the "city" setting to "none," I was able to recover ~75% matches (see right).


I started reconciling this data by working with the tied results, and then moved on to the unmatched addresses.

Reconciling Unmatched Addresses with Candidates Tied:
To save time for working with the unmatched results, I chose to match the first tied candidate in each case (rather than going through them individually as I had done with the schools). This strategy worked for most of the tied results, EXCEPT for "Miss Pearl's Jam House" (since the address for this entry was just "Broadway," and I figured that I could find the actual address) and 3325 Grand Ave. (since there appeared to be a need for some data cleanup here). For "Miss Pearl's Jam House," I found an address via Google Maps (1 Broadway) and picked the point on the map that corresponded with it (I knew from the in-class lab that the streets data did not contain this address). Then, I changed the address to 3325 Grand Ave. for the remaining entry and matched it.

Reconciling Unmatched Addresses:
I started sifting through/matching the unmatched addresses by "picking the low-hanging fruit." First, as in the in-class lab, I geo-coded all addresses with a >60% match score, deleted all of the "Emeryville" records from the database table, and deleted repeat entries that did not have a match address, like the Addis restaurant in the example pasted below:
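The three "low-hanging fruit" passes can be sketched as one filter over the geocoding results. This assumes each record has the `FoodStore` and `City` fields named above plus a `Score` and a `Status` field (ArcGIS-style "M" for matched); those two field names and the sample records are assumptions for illustration, not the lab's data.

```python
def triage(records: list[dict], threshold: float = 60.0) -> tuple[list[dict], list[dict]]:
    """Return (accepted, still_to_fix) after the three low-hanging-fruit passes."""
    # Pass 1: drop records outside the study area.
    in_area = [r for r in records if r["City"].strip().lower() != "emeryville"]

    # Pass 2: accept anything already matched or scoring above the threshold.
    accepted, unresolved = [], []
    for r in in_area:
        if r["Status"] == "M" or r["Score"] > threshold:
            accepted.append(r)
        else:
            unresolved.append(r)

    # Pass 3: drop unmatched repeats of stores that already have a match.
    matched_names = {r["FoodStore"] for r in accepted}
    still_to_fix = [r for r in unresolved if r["FoodStore"] not in matched_names]
    return accepted, still_to_fix


# Hypothetical records.
rows = [
    {"FoodStore": "Addis", "City": "Oakland", "Status": "M", "Score": 100},
    {"FoodStore": "Addis", "City": "Oakland", "Status": "U", "Score": 0},
    {"FoodStore": "Store X", "City": "Emeryville", "Status": "U", "Score": 0},
    {"FoodStore": "Store Y", "City": "Oakland", "Status": "U", "Score": 72},
    {"FoodStore": "Store Z", "City": "Oakland", "Status": "U", "Score": 40},
]
accepted, todo = triage(rows)
print(len(accepted), [r["FoodStore"] for r in todo])  # 2 ['Store Z']
```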

After implementing these measures, I was left with just 19 unmatched entries.

From my examination of the data in the attribute table, I knew that some data cleanup would be necessary to get at the remaining unmatched addresses, so I worked through the table to fix key errors, like mislabeled fields (City in the Address field, etc.).

For example, I found one entry that said "Gastro" in the address field, but it was clear that this field was just mislabeled: I found the correct address a few columns over, under "Rating." I changed the address to "3986 Adeline" and was able to match it. Additionally, I went into the attribute table to make edits so that the fields would match up like they were supposed to.
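Misfiled values like the "Gastro"/"3986 Adeline" case can be caught with a simple pattern check. A minimal sketch: the `ADDRESS_LIKE` pattern (a house number followed by a word) is a hypothetical heuristic, and `Phone` is deliberately left out of the fields searched because a phone number written with spaces could also match it.

```python
import re

# Hypothetical heuristic: a house number followed by at least one word.
ADDRESS_LIKE = re.compile(r"^\d+\s+\S+")


def repair_address(record: dict, fields: tuple[str, ...] = ("Rating", "City")) -> dict:
    """If Address doesn't look like a street address, pull one from another field."""
    fixed = dict(record)  # leave the input record untouched
    if ADDRESS_LIKE.match(str(fixed.get("Address", "")).strip()):
        return fixed  # already looks like an address
    for name in fields:
        value = str(fixed.get(name, "")).strip()
        if ADDRESS_LIKE.match(value):
            fixed["Address"] = value
            fixed[name] = ""  # clear the field the address was misfiled in
            break
    return fixed


print(repair_address({"Address": "Gastro", "Rating": "3986 Adeline"}))
# {'Address': '3986 Adeline', 'Rating': ''}
```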

After cleaning up some of the data, I started using Google Maps and Bing, as I had for the schools data, to identify any other common problems with the data (e.g., mislabeled streets vs. avenues). In mapping the data in Google Maps, I was sometimes able to pinpoint specific locations for the unmatched addresses, so I used the "Pick Address from Map" tool to manually match the address. An example of this is shown below for the restaurant "Il Pescatore."

Left: Location from Google Maps. Right: Location identified in ArcMap, with the point selected.

Note: To help me identify where the point should go, I selected "Zoom to Candidates," so that the turquoise markings would indicate where Webster Street is; I then followed this street to its terminus to place the address point.

I followed the same process (using Google Maps) to locate some of the other addresses in question. Grand Avenue was one that came up a few times, and I learned from Google where these locations should appear on the map. To figure out which street ArcMap attributed to these locations, I chose "Select Features," selected the polyline, and located it in the attribute table as W. Grand Ave. With this info, I was able to select the appropriate candidate from the search window or pick the addresses "by hand" (using "Pick Address from Map").

With the "Grand Avenue" unmatched addresses taken care of, I had just one last point to geocode: Hahn's Hibachi in Jack London Square. This time, since I knew that the square is fairly small, I tried removing the numerical address and came up with a match of 66%, so I matched this point and completed the geocoding to 100%!


(3) For each School and Food dataset, what percent gets easily geo-coded? Why?

For the Schools dataset, 88% (106) of the addresses were
easily geo-coded. This is likely due to the fact that more of the
fields were correctly aligned between the Schools database and
the reference database: streets. For example, unlike the food
database, both the streets and schools databases include a field
for zip code, which gives the program more certainty in
matching the addresses.

For the Food dataset, 75% (212) of the addresses were easily
geo-coded. The lower percentage of initial matching is likely
due to the fact that the fields in the streets and food datasets
don't match as well as the fields for streets & schools. For
example, the food dataset is missing the zip code field, which
means that the only means of matching addresses is via the
addresses, city, and state fields (and the addresses play the
biggest role, since most of the data comes from
Oakland/Emeryville, CA).
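The "easily geo-coded" percentages above are automatic matches divided by all records. A minimal sketch of that arithmetic, assuming each result row carries an ArcGIS-style status code ("M" matched, "T" tied, "U" unmatched); the input list is hypothetical, not the lab's data.

```python
from collections import Counter


def status_summary(statuses: list[str]) -> dict[str, float]:
    """Percent of records per geocoding status: M (matched), T (tied), U (unmatched)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to summarize")
    # Counter returns 0 for statuses that never appear.
    return {s: round(100 * counts[s] / total, 1) for s in ("M", "T", "U")}


# Hypothetical example: 3 matched, 1 tied, 0 unmatched.
print(status_summary(["M", "M", "M", "T"]))  # {'M': 75.0, 'T': 25.0, 'U': 0.0}
```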


(4) What steps and techniques do you apply to fully geocode your datasets?

Since I detailed the steps that I used in #1 and #2 above, I'll refrain from repeating them and just list a few
common steps/techniques:
    (1) Eliminate inapplicable/unnecessary entries from the database file.
             a. Removing the Emeryville data along with duplicates of unmatched entries helped to
                 whittle away at unmatched addresses and picked off some of the "low hanging fruit"
                 before delving too deeply into investigating unmatched addresses.
    (2) Look for data entry problems:
             a. Misspellings, incorrect placement in fields, etc. can be obvious/quick for eliminating
                 errors and finding matches. Additionally, obvious deviations from the appropriate zip
                 code can raise early red flags for adjustments that must be made.
    (3) Use Google Map or Bing to identify problems that aren't as obvious:
             a. Once I hit a wall looking for errors and eliminating erroneous data, I often used other
                 sources (like Google Maps) to do a quality assurance/quality control check on the data.
                 By doing this, I was able to identify some avenues that should have been labeled as "St."
                 and vice versa.
    (4) Try eliminating elements of the address altogether to see how this changes the match candidates:
             a. In some cases, eliminating the zip code or numerical address helped to identify the
                 correct addresses for schools/restaurants. This was especially effective when other
                 elements of the address were correct, but one factor was throwing it way off (e.g., the zip code).
    (5) When all else fails, "Pick Address from Map":
             a. When I had gone through all of the above steps, I used Google Maps to help me identify
                 the unmatched address, found the corresponding point on the ArcMap, and placed the
                 point/geo-coded the address "by hand."
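Steps (4) and (5) amount to a fallback cascade: try the address as entered, then progressively looser versions, and only then place the point by hand. A minimal sketch; `geocode` stands for any lookup function and `toy_geocoder` is a hypothetical stand-in, not an ArcMap API, and the house number and zip in the example are made up.

```python
import re
from typing import Callable, Optional

Point = tuple[float, float]
Geocoder = Callable[[str, Optional[str]], Optional[Point]]


def strip_house_number(address: str) -> str:
    """Drop a leading house number, e.g. '100 Jack London Square' -> 'Jack London Square'."""
    return re.sub(r"^\d+\s+", "", address.strip())


def geocode_with_fallbacks(address: str, zip_code: Optional[str], geocode: Geocoder) -> Optional[Point]:
    """Try progressively looser versions of an address; None means 'pick it by hand'."""
    attempts = [
        (address, zip_code),                  # 1. the address as entered
        (address, None),                      # 2. drop a possibly wrong zip code
        (strip_house_number(address), None),  # 3. drop the house number too
    ]
    for addr, z in attempts:
        point = geocode(addr, z)
        if point is not None:
            return point
    return None  # last resort: locate it on a map and place the point manually


def toy_geocoder(address: str, zip_code: Optional[str]) -> Optional[Point]:
    """Hypothetical geocoder that only knows one street-level location."""
    known = {"JACK LONDON SQUARE": (1.0, 2.0)}
    return known.get(address.upper())


print(geocode_with_fallbacks("100 Jack London Square", "94607", toy_geocoder))  # (1.0, 2.0)
```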

(5) What techniques prove most useful in geo-coding? Why?

    (1) Editing data in Excel before geo-coding:
            a. While I didn't do this in the lab, in hindsight, I wish I had, since it would likely have saved
                me a lot of time. From my perspective, working with the data in Excel has two advantages:
                (1) it allows for more flexibility in making edits (and seems to be more user-friendly for
                doing so than ArcMap), and (2) changes made in the database from the outset eliminate
                the need to essentially make the same change twice (i.e., once to match the address and then
                again to edit the data in the database).
    (2) Identify aliases:
            a. While I did not do this in parts #1 and #2 above or for the in-class lab, in hindsight I
                think that creating an alias table to capture some commonly misnamed streets would have
                been very helpful and saved time. For example, in both the in-class lab and this assignment,
                creating an alias for "International Boulevard" would have saved a LOT of time otherwise
                spent individually editing and matching each of these entries.
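The idea behind an alias table is a lookup from the name as typed in the data to the name used in the streets layer, applied before matching. The sketch below only illustrates that lookup in plain Python; it is not how ArcGIS stores or applies alias tables, and the `ALIASES` dictionary and house number 1234 are hypothetical.

```python
# Hypothetical alias table: name as typed in the data -> name used in the streets layer.
ALIASES = {"INTERNATIONAL BLVD": "14TH ST"}


def apply_aliases(address: str, aliases: dict[str, str] = ALIASES) -> str:
    """Normalize case/punctuation, then swap in the reference street name if an alias appears."""
    normalized = " ".join(address.upper().replace(".", "").split())
    for alias, official in aliases.items():
        if alias in normalized:
            return normalized.replace(alias, official)
    return normalized


print(apply_aliases("1234 International Blvd."))  # 1234 14TH ST
```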
    (3) Using Google Maps (and other spatial databases):
            a. While time-consuming, I would not have been able to geo-code 100% of the addresses in
                the schools and food databases without the use of internet-based spatial databases.


(6) What school is most impacted by poorly rated food establishments?

To answer this, I decided to look for the school that is surrounded by the greatest number of poorly rated food establishments. I set a rating of 3.5 or lower as my benchmark for "poorly rated" establishments, and I selected these from the attribute table for "Geocoded_Result_Food" by sorting the rating column and selecting all entries less than or equal to 3.5, but not those that lacked a rating altogether.
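The selection rule (rating of 3.5 or lower, unrated stores excluded) and a simple count of selected stores near a school can be sketched as below. The count swaps in a fixed-radius distance check for the visual block-counting used here; it assumes `Rating` is numeric and that stores carry projected `x`/`y` coordinates in the same units as `radius`. All sample values are hypothetical.

```python
import math


def poorly_rated(stores: list[dict], cutoff: float = 3.5) -> list[dict]:
    """Stores rated at or below `cutoff`; stores with no rating are left out."""
    return [s for s in stores if s.get("Rating") is not None and s["Rating"] <= cutoff]


def count_nearby(school_xy: tuple[float, float], stores: list[dict], radius: float) -> int:
    """Count stores whose (x, y) lies within `radius` of the school (same projected units)."""
    return sum(1 for s in stores if math.dist(school_xy, (s["x"], s["y"])) <= radius)


# Hypothetical stores and school location.
stores = [
    {"Rating": 3.0, "x": 0, "y": 0},
    {"Rating": 4.5, "x": 1, "y": 1},
    {"Rating": None, "x": 0, "y": 0},
    {"Rating": 3.5, "x": 10, "y": 0},
]
bad = poorly_rated(stores)
print(len(bad), count_nearby((0, 0), bad, radius=5))  # 2 1
```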

With the poorly rated attributes selected, I then looked on the map to see which school seemed to be
surrounded by the greatest number of establishments, and selected Lincoln as the school most impacted:
there were ~9 poorly rated establishments within ~10 blocks of the school (if counting the grids as


    (7) Which record never gets coded (in the alcohol license exercise) and why? Why is 9332
        VISTA CT a problem?

According to a Google Maps search, the address 9332 Vista Ct. does exist in the Oakland area; however, this street cannot be geo-coded in the alcohol license exercise.
To investigate the cause, I looked in the attribute table for the streets layer (see next page) and saw that Vista Court does not exist there (there is a Vista St., but this is different from Vista Court). As a result, the Vista Ct. location cannot be geo-coded. Because the street does not even exist on the map, I cannot "Pick Address From Map," which would typically be my last resort.
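The Vista Ct. problem comes down to a missing (street name, street type) pair in the reference layer, which can be checked directly. A minimal sketch, assuming the names and types have been exported from the streets attribute table as pairs; apart from Vista St., the sample pairs are hypothetical.

```python
def street_in_layer(name: str, street_type: str, layer: list[tuple[str, str]]) -> bool:
    """True if the (name, type) pair, e.g. ("Vista", "Ct"), exists in the streets layer."""
    def norm(n: str, t: str) -> tuple[str, str]:
        # Compare case-insensitively and ignore a trailing period on the type.
        return n.strip().upper(), t.strip().upper().rstrip(".")

    known = {norm(n, t) for n, t in layer}
    return norm(name, street_type) in known


# Hypothetical slice of the streets layer's (name, type) values.
streets = [("Vista", "St"), ("Grand", "Ave"), ("Webster", "St")]
print(street_in_layer("Vista", "Ct", streets))   # False
print(street_in_layer("Vista", "St.", streets))  # True
```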

