                          LOCKSS on the Last Frontier
       Resources and information relating to the Alaska State Library’s
                document gathering and preservation efforts
                     (Compiled by Daniel Cornwall, April 27, 2007)

Document Discovery and Gathering

The main engine for document discovery is our use of the “Capturing Electronic
Publications” software developed by Larry Jackson of UIUC. The basic software and
documentation of the project can be found at http://www.isrl.uiuc.edu/pep/. In addition to
gathering agency web sites, it produces a monthly e-mail of added, changed and deleted
files from agency web servers. If you’d like to see a sample, e-mail me.
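
As a rough illustration of how a monthly added/changed/deleted report can be derived, the sketch below compares two snapshots of a web server. It is not the CEP software's actual method; the snapshot format (a mapping of file path to checksum), the function name and the sample paths are all assumptions made for this example.

```python
def diff_snapshots(previous, current):
    """Compare two {path: checksum} snapshots of one web server.

    Hypothetical helper: the snapshot format is an assumption for this
    sketch, not something taken from the CEP software.
    """
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    return {"added": added, "changed": changed, "deleted": deleted}


# Made-up sample data standing in for two monthly crawls.
march = {"/reports/annual.pdf": "a1", "/news/old.html": "b2"}
april = {"/reports/annual.pdf": "a9", "/reports/budget.pdf": "c3"}

report = diff_snapshots(march, april)
for status, paths in report.items():
    print(status, paths)
```

Sorting each list keeps the report stable from month to month, which makes it easier to scan in an e-mail.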

Although the CEP software gathers agency web sites automatically, we go to the original
agency web site to harvest the documents we have identified. If a document has dropped
off the web, we go to the CEP archive, which is not easy to search and use but is a handy
backup.

Web Server Storage – files and space usage

Before a publication can be ingested into LOCKSS, it must first be posted to a web server
with a LOCKSS permission statement. We currently use our main web server to store
agency electronic publications.
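
Before asking LOCKSS to collect a page, it can help to confirm that the posted page really carries the permission statement. The sketch below does a simple text check. The statement wording used as the default is an assumption for this example and should be checked against current LOCKSS documentation; the function name and sample HTML are hypothetical.

```python
# Assumed wording of the permission statement; verify against LOCKSS
# documentation before relying on it.
ASSUMED_PERMISSION_TEXT = (
    "LOCKSS system has permission to collect, preserve, and serve "
    "this Archival Unit"
)


def has_permission_statement(html, statement=ASSUMED_PERMISSION_TEXT):
    """Return True if the page text contains the permission statement.

    Whitespace runs are collapsed and case is ignored so that line
    wrapping in the HTML source does not cause a false negative.
    """
    normalized_page = " ".join(html.split()).lower()
    normalized_statement = " ".join(statement.split()).lower()
    return normalized_statement in normalized_page


# Made-up page content for illustration.
sample_page = """
<html><body>
<p>LOCKSS system has permission to collect, preserve,
and serve this Archival Unit</p>
</body></html>
"""
print(has_permission_statement(sample_page))
```

This checks only that the text is present; it does not fetch the page or confirm that a LOCKSS box can reach it.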

Monograph files are stored in a directory structure of
edocs/[year]/[mo]/[OCLCnumber].[file extension] if the publication is a single file. If the
document is made up of multiple files, the directory structure is
edocs/[year]/[mo]/[OCLCnumber]/. Inside the directory [OCLCnumber] is a file called
index.html and then whatever files there were on the original agency server for that
document. We make extensive use of the OCLC Number to name files and directories so
that commonly used file names (e.g., FinalReport.pdf) do not collide.
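
The naming scheme described above can be sketched as a small helper. This is an illustration only: the function name, the zero-padded two-digit month and the sample OCLC number are assumptions, not details taken from our server.

```python
from pathlib import PurePosixPath


def edocs_path(year, month, oclc_number, extension=None):
    """Build the storage path for a monograph.

    With an extension, the result is a single-file path. Without one,
    the result is the directory for a multi-file document, which holds
    index.html plus the original agency files.
    Assumption: [mo] is a zero-padded two-digit month.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    base = PurePosixPath("edocs") / f"{year:04d}" / f"{month:02d}"
    if extension is None:
        return f"{base / str(oclc_number)}/"
    return str(base / f"{oclc_number}.{extension.lstrip('.')}")


# 123456789 is a made-up OCLC number used only for illustration.
single_file = edocs_path(2007, 3, 123456789, "pdf")
multi_file = edocs_path(2007, 3, 123456789)
print(single_file)
print(multi_file + "index.html")
```

Because the OCLC number is unique to each catalog record, two agencies can both publish a file called FinalReport.pdf without overwriting each other on the server.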

We also store electronic serial and annual report files on our web server, and the
structure for them is similar to that for multipart monographs. Contact me for details.

The State Library first started using LOCKSS (http://www.lockss.org) as part of GPO’s
pilot project in July 2005. We purchased a low-end Dell server with a 250 GB hard drive
for approximately $1,200. We are a member of the LOCKSS Alliance, which charges
state libraries a rate of $1,080/yr.

LOCKSS is a secure system because its operating system and software are all on a CD
and the configuration files are on a write-protected floppy disk. Because neither can be
altered, a compromised server can be restored by rebooting.

Storage for FY 2006 Alaska state documents is 1.3 GB. As of March 2007, storage for FY
2007 Alaska state documents was 580 MB. The total number of files preserved is 1,681;
this figure includes 35 multipart documents. At this rate, we have decades of storage
space, but we expect to replace our servers every five years or so.
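
The storage estimate can be checked with simple arithmetic. The sketch below uses the figures given in this handout (a 250 GB drive, 1.3 GB for FY 2006, 580 MB so far for FY 2007). It assumes the whole drive is available for content and that growth stays at the FY 2006 rate, both of which are simplifications; the function name is hypothetical.

```python
def years_until_full(capacity_gb, annual_growth_gb, used_gb=0.0):
    """Estimate the years of growth a drive can absorb.

    Simplifying assumptions: constant annual growth and no space lost
    to the operating system or file system overhead.
    """
    if annual_growth_gb <= 0:
        raise ValueError("annual growth must be positive")
    return (capacity_gb - used_gb) / annual_growth_gb


drive_gb = 250.0          # hard drive size from this handout
fy2006_gb = 1.3           # FY 2006 storage
fy2007_so_far_gb = 0.58   # FY 2007 storage as of March 2007 (580 MB)

remaining_years = years_until_full(
    drive_gb, fy2006_gb, used_gb=fy2006_gb + fy2007_so_far_gb
)
print(f"Roughly {remaining_years:.0f} years of growth at the FY 2006 rate")
```

Even if annual growth were several times higher, the drive would outlast the planned five-year server replacement cycle, so hardware age rather than disk space is the practical limit.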

Trading out LOCKSS machines for new servers is simple. You ask the LOCKSS team
for your “archive unit” backup configuration file. You then take the CD and floppy out of
the old machine, put them into the new machine, and boot up. After the machine is up and
the new drive has been prepared by the LOCKSS platform, you log into the
administration UI and choose "Journal Configuration -> Restore". It prompts you
for the backup file to upload.

An alternative method is to temporarily connect both the old drive and the new
drive. You still move the CD and the floppy over to the new machine, but there is a
procedure to move all the content over from the old disk to the new disk. This saves a lot
of time that would otherwise be spent re-collecting and re-auditing content.

Contact information for Alaska State Publications Program

Alaska State Library
Attn: Daniel Cornwall
PO Box 110571
Juneau AK 99811-0571
Ph: 907-465-2927
Fax: 907-465-2665
Email: dan_cornwall@eed.state.ak.us
Web Page: http://library.state.ak.us/asp/asp.html
