The list of OPAC typographical errors

The list of OPAC typographical errors:
• How it started, and why it will never stop growing

By Terry Ballard
Associate Professor and Automation Librarian
Quinnipiac University
Hamden, Connecticut

“It’s no wonder that truth is stranger than fiction. Fiction has to make sense.”

First contact with the problem
• I had been a systems librarian at Adelphi University in Garden City, Long Island, for about a year when my Associate Dean read a sidebar article in American Libraries giving a selection of possible typos that may be found in an OPAC. We found one or two. I thought that would be the end of it. It wasn’t. My boss kept looking and found a word here and a word there.
• I got the idea to look through the entire keyword database, find the words that were wrong, and fix them.
• It was easy to do that in an INNOPAC system because you could type in a deliberately wrong search like “aaaaa” and it would give you “Nearby Words” 8 at a time, and you could browse this list until you reached “zzzzzzz,” thousands of screens later.

The Work Flow
• I found that it was easy to browse one letter per day, so the whole process took about a month.
• I had expected to find single typos randomly distributed through the database. Wrong!
• Within days, I found out that some words had 5-10 instances of certain errors. Even in an OPAC that was, at best, medium-sized, we had 10 occurrences of “adminstrative.”

“We made too many of the wrong mistakes.”

This led to speculation:
• 1. There are certain kinds of words prone to typos in OPACs.
• 2. We got the records that contain them from OCLC.
• 3. Other, bigger libraries must have the same problems, but more so.

Confirmation
• This was 1991, just before library OPACs were commonly available on the web.
• I visited other university libraries and found the same typos in every OPAC – in greater numbers because they were bigger libraries.
• I concluded that all libraries have this problem unless they systematically go after it.
• But as far as I could tell, nobody had a list like the one I was developing, so I spent a fair amount of effort getting the word out.
• I called OCLC and informed the Quality Control department that there were a thousand references to “Commerical” in their database. They were properly horrified!
• They said that they normally corrected a title only when a participating library sent a photocopy of the book’s title page.
• After some consideration, they decided that this was too much, so they corrected more than 700 of those records that week.

I spoke with the IT reporters at the New York Times and Library Journal
• Both said that they would report on this problem.
• Neither one did.

Getting the word out
• With my colleague Arthur Lifshin, I wrote an article on the subject for the journal Information Technology and Libraries.
• I volunteered to speak on the subject at the March 1992 Computers in Libraries conference in DC. The speech went very well – particularly when I told the librarians there that these words were found in all of their OPACs.

I was asked to write an article about my findings for the journal Computers in Libraries
• The next year, this was awarded the “Article of the Year” by Computers in Libraries.
• For 7 years, I got a steady number of questions about the study, but the list did not really grow in any major way.

In 2000, I got an email that got things moving again.
• Phalbe Henriksen, a library director in Florida, wrote me to say how much she enjoyed reading about the work I had done.
• We started an online discussion at Yahoo Groups to find out if anyone else was interested in this work.
• Soon, we had more than 50 members and considerable cooperation in adding new typos to the list.
• Tina Gunther, an early member, began keeping the master list and making appropriate changes.
• Tina revised the list this week to coincide with this presentation. The words added by list members since 2000 far exceed the original Adelphi list.
Current methodology
• Our test database is OHIOLINK, because it is the biggest database we know of that can be searched for free, and because they index the maximum number of fields.
• Placement in the master list is based on the number of hits in OHIOLINK – from one (very low probability) to 100+ (very high).

Later Developments
• We found out that our study has been the subject of some discussion at OHIOLINK meetings. The last I heard, they were using our work as an opportunity to generate a more perfect catalog.
• Phalbe Henriksen started “More Typos,” a list that tracks words that are wrong in one context but not another.

Biggest problem words
• Repons+ (more than 600)
• Accomodat+ (more than 500)
• Commeric+ (more than 300)

This goes beyond OPACs
• These same words can be located in nearly all online databases. For instance, “Commerical”:
• 97 hits in Lexis Universe newspapers
• 1973 hits in Proquest
• 1598 in JSTOR
• 6 in the Oxford English Dictionary (it surprised them, too)

Oh yes, Google
• 422,000 hits for “commerical” as of January 5, 2004
• Other typos are even more prevalent in Google.

The Future
• This work will never end. I was amused to look at the ALICAT database that was the basis for the original study. I found 6 hits for “Commerical” and 15 for “Adminstration.”

The URL of this presentation
• A PDF of this PowerPoint can be found at http://faculty.quinnipiac.edu/libraries/tballard/typos.pdf
• Too much typing? Just type in Ballard Typographical in Google.
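
A rough sketch of the ranking rule from the Current methodology slide (placement by number of hits, from one hit as very low probability to 100+ as very high) might look like the following. It is not the list maintainers' actual procedure. The presentation gives only the two endpoints, so the middle thresholds in BANDS are assumptions, as are the function names and the reading of a trailing “+” (as in Repons+) as right truncation. The sketch counts matching words in a plain word list, whereas a real catalog search would count matching records.

```python
from collections import Counter

# Assumed band thresholds: the source states only that 1 hit is
# "very low" and 100+ hits is "very high"; the middle cut-offs are
# invented here purely for illustration.
BANDS = [
    (100, "very high"),
    (25, "high"),
    (10, "moderate"),
    (2, "low"),
    (1, "very low"),
]


def matches(pattern: str, word: str) -> bool:
    """Return True if word matches a list entry.

    A trailing '+' is treated as right truncation (an assumption),
    so 'Commeric+' matches 'commerical' but not 'commercial'.
    """
    pattern = pattern.lower()
    word = word.lower()
    if pattern.endswith("+"):
        return word.startswith(pattern[:-1])
    return word == pattern


def count_hits(patterns, index_words):
    """Count how many words in index_words match each pattern."""
    hits = Counter()
    for word in index_words:
        for pattern in patterns:
            if matches(pattern, word):
                hits[pattern] += 1
    return hits


def band(hit_count: int) -> str:
    """Map a hit count to a probability band using BANDS."""
    for minimum, label in BANDS:
        if hit_count >= minimum:
            return label
    return "not found"


def rank(patterns, index_words):
    """Return (pattern, hits, band) rows, most hits first."""
    hits = count_hits(patterns, index_words)
    rows = [(p, hits[p], band(hits[p])) for p in patterns]
    return sorted(rows, key=lambda row: -row[1])
```

Calling rank(["Commeric+", "Accomodat+"], words) on a list of keyword-index terms returns the entries ordered by hit count, each tagged with its band. Keeping the thresholds in one table makes it easy to swap in whatever cut-offs a list maintainer actually uses.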
