					            Collection and Preservation of Web-Based
          Provincial/Territorial Government Publications:
                     An Action Plan for CARL

                             Andrew Hubbertz
                              Ottawa, Ontario

                              September 2005

   This study was sponsored by the Canadian Association of Research
Libraries; however, the contents are solely the responsibility of the author.

Summary of recommendations

First priority

What should be collected?

How should material be collected and organized?

Who should build these collections?

Addendum I: Copyright

Addendum II: Looking further ahead

Appendix I: Collecting web-based government publications:
standard practices in Canadian libraries

Appendix II: Major trends identified by the PREMIS Working Group

                          Summary of Recommendations

Recommendation 1: Ensure that there is at least one comprehensive collection of
government web-based publications for every province and territory in Canada.

Recommendation 2: At the outset, collections should consist of print-like materials,
i.e. monographs, serials, etc.

Recommendation 3: At the outset, collections should be built on the basis of
established practice in Canada, namely downloading material to a local server and

Recommendation 4: CARL should initiate discussions with APLIC, offering
support and assistance to legislative libraries in their efforts to collect and preserve
web-based government information.

Recommendation 5: CARL should undertake to raise public awareness of the need
to collect and preserve web-based provincial/territorial government information.
CARL should attempt to recruit other interested parties to this cause.

Recommendation 6: CARL should set a date, approximately six months from
acceptance of this report, to review progress and decide next steps, as appropriate.


In the fall of 2004, the Canadian Association of Research Libraries (CARL) contracted
with the present author to survey CARL members and determine what was being done to
collect and preserve web-based publications produced by provincial and territorial
governments in Canada, and to recommend a course of action, as appropriate, to ensure
that this portion of our heritage is adequately preserved.

It was decided, in consultation, to separate these two issues, first assessing the current
state of affairs through a survey, and then compiling recommendations.

Accordingly, in the fall of 2004 a survey instrument was developed and tested, then sent
to CARL members. It was also sent to provincial and territorial legislative libraries,
through the Association of Parliamentary Libraries in Canada (APLIC). Following the
survey, libraries were contacted via email and telephone to obtain more information on
major projects.

Data from the survey and follow-up communication were compiled in January 2005, and
the report was published by CARL in March 2005. The report, Collection and Preservation
of Web-Based Provincial/Territorial Government Publications: Report on a Survey of
CARL and APLIC Libraries, is available on the CARL website at: http://www.carl-

The principal findings of the survey were the following:

   •   The situation is highly variable across the country, some provinces having well-
       established programs, other jurisdictions having nothing;
   •   Most such collections are found in legislative libraries, the principal exception
       being Quebec, where the Bibliothèque nationale du Québec has a legal mandate
       as depository for government electronic publications;
   •   Existing collections are limited to print-like material: serials, monographs,
       leaflets, etc. in PDF, HTML and other electronic formats. Data files, databases,
       online services, and material from the “dark web” are not collected;
   •   These collections are organized so as to maintain continuity with print collections,
       i.e. as serials, monographs, etc., stored on a server, and catalogued with an 856 tag
       linking to the material itself. Collections are not organized in such a way as to
       preserve the contemporary context (i.e. websites, webpages, etc.);
   •   Where major collections exist, in particular in Ontario and Quebec, CARL
       members are choosing to link to those collections from their catalogues, rather
       than build collections of their own.
   •   For the most part, holdings are not reported to the Amicus database in Library and
       Archives Canada.

The Report was distributed widely and led to further discussion and communication, to
the benefit of the present report. On one point, the views expressed in the Report were
probably too optimistic. The claim was made that in the short and medium term,
electronic versions of statutes, regulations, legislative records, annual reports, and budget
documents were probably safe, as their importance was widely recognized. This statement
underestimated the risk to these publications when they exist solely in electronic format,
for example the New Brunswick statutes, which are no longer published in print.

One of the most important findings was that such collections as now exist are to be found
almost exclusively in legislative libraries. The present report and recommendations are
framed within that knowledge.

                                       First Priority

The first priority should be to ensure that there is at least one comprehensive collection of
government web-based publications for every province and territory in Canada.

These collections should be generally comparable to such collections as now exist in
various Canadian legislative libraries, the Bibliothèque nationale du Québec, and (for
federal publications) Library and Archives Canada.

The jurisdictions currently lacking a comprehensive collection, or at present any attempts
to establish one, are:
    • New Brunswick
    • Newfoundland and Labrador
    • Northwest Territories
    • Nova Scotia
    • Nunavut
    • Prince Edward Island
    • Yukon

For the remaining jurisdictions, the situation is described below, with particular emphasis
on those materials not currently collected:

Alberta

The Alberta Legislature Library collects provincial web-based publications, but limits the
collection to materials available solely in electronic format.

Not collected: web-based publications that co-exist in print format, perhaps 75% of
Alberta government publications.

British Columbia

The Legislative Library of British Columbia collects web-based publications from the
Government of British Columbia, limited to those materials of most interest to members
of the legislature.

Not collected: curriculum materials, scientific publications, posters.

Manitoba

The Manitoba Legislative Library collects an estimated 75% of Manitoba government
web-based publications, except for legislative records (collected by the Clerk’s Office of
the Legislative Assembly) and statutes and regulations (archived by the Department of
Justice). For technical reasons, the collection is not yet accessible to the public; however,
this problem should eventually be rectified.

Not collected: no evident exclusions, with the qualifications for legislative records and
statutes and regulations as indicated.

Ontario

The Legislative Library of the Legislative Assembly of Ontario has a fairly
comprehensive collection of Ontario government web-based publications.

Not collected: web-based serials that co-exist in print.

Quebec

The Bibliothèque nationale du Québec has a mandate to collect and preserve all Quebec
web-based publications. It also appears to have the commitment and resources to carry
out the mandate.

Not collected: no evident exclusions.

Saskatchewan

The University of Saskatchewan Library several years ago undertook to collect provincial
government web-based publications, but appears to have abandoned the project. Recent
provincial legislation provides for deposit of electronic publications with the Legislative
Library. The latter has begun plans to build a collection.

Not collected: no evident exclusions in the University Library, but the project has stalled
and may be dead. The Legislative Library has not yet begun to collect material.

Recommendation 1: Ensure that there is at least one comprehensive collection of
government web-based publications for every province and territory in Canada.

                             What Should be Collected?

These comprehensive collections should be directed at the outset to electronic equivalents
of print publications, such as monographs, serials, news releases, leaflets, maps, etc. In
many ways, libraries already know how to handle such material, even in electronic
format; it represents the central documentary record of government; and it is the kind of
material already being collected in such collections as now exist.

Once such materials are being adequately collected and preserved, collections should be
expanded to include data files, databases, online services, and material from the “dark”
web, the latter being material that is more or less print-like, but is accessed via a
database. Such collecting will probably require formal agreements with governments or
government institutions to arrange for their transfer.

Recommendation 2: At the outset, collections should consist of print-like materials,
i.e. monographs, serials, etc.

                  How Should Material be Collected and Organized?

Collecting in most cases should proceed by downloading material from government
websites and transferring it to a local server. The material should be catalogued, with an
856 link to individual items, and catalogue records should be contributed to the Amicus
database in Library and Archives Canada.1
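The collecting step described above can be sketched in code. What follows is a minimal Python illustration under stated assumptions, not a description of any library's actual workflow: the function names `fetch_url` and `file_publication`, the one-directory-per-agency layout, and the SHA-256 checksum are all illustrative choices of this sketch. The `fetch` parameter exists so that the live download can be replaced by a stand-in when testing.

```python
import hashlib
import urllib.request
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def fetch_url(url: str) -> bytes:
    """Download a publication over HTTP(S) and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read()


def file_publication(url: str, agency: str, root: str, fetch=fetch_url) -> dict:
    """Download one publication and file it under <root>/<agency>/<file name>.

    The layout (one directory per government agency) and the returned
    fields are illustrative assumptions, not a published standard.
    """
    data = fetch(url)
    # Use the last segment of the URL path as the file name, keeping the
    # native format; fall back to index.html when the path is empty or "/".
    filename = Path(urlparse(url).path).name or "index.html"
    target_dir = Path(root) / agency
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(data)
    return {
        "source_url": url,
        "local_path": str(target),
        "date_collected": date.today().isoformat(),
        # A checksum recorded at capture lets staff verify the file later.
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }
```

A call such as `file_publication("https://www.gov.example.ca/pubs/annual-report-2004.pdf", "finance", "/srv/govpubs")` (hypothetical URL and paths) would write the file to `/srv/govpubs/finance/annual-report-2004.pdf` and return the facts needed later for the catalogue record. In practice a capture tool such as HTTrack may do the downloading; the point of the sketch is what gets recorded at the moment of capture.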

These are the practices in most existing collections, save contributing to Amicus, which
is unfortunately the exception. Appendix I enumerates these practices in more detail; it
also includes recommendations of this author, where they differ from common practice.
These practices (and recommended practices) cannot be called standards or best
practices, as standards and best practices have not yet been identified. However, they
may guide libraries newly undertaking to build collections. Adopting practices of other
Canadian libraries will reduce the decisions to be made by libraries just starting out, will
facilitate development of national strategies, and will make it easier to assess how
comprehensively material is being collected from each jurisdiction. They may also help
establish a common understanding among libraries undertaking collaborative collection
building, i.e. two or more libraries collaborating to build a collection of material from a
single jurisdiction.

The practices and recommendations in Appendix I have been incorporated into a guide
produced by this author, Cookbook for a Basic Collection of Web-based Government
Information (Canadian Association of Research Libraries, 2005).2 The Cookbook is
intended to provide step-by-step guidance in setting up a basic collection of web-based
government publications. It is hoped that it might be of use to libraries, including
legislative libraries, wanting to start collections but not knowing quite how to begin.

Appendix II reproduces a list of major trends identified by the OCLC/RLG PREMIS
Working Group as “trends in practice that may ultimately emerge as best practices.” The
institutions which form the basis of these observations, essentially institutions in the
United States, are for the most part well ahead of us in Canada.

Recommendation 3: At the outset, collections should be built on the basis of
established practice in Canada, namely downloading material to a local server and

1 In a few cases, namely Quebec and Saskatchewan, there is a legal mandate for deposit of these materials,
which will affect the mechanisms of acquisition.
2 The Cookbook is available at:

                          Who Should Build these Collections?

Here we face a critical dilemma:

We must ensure that there are comprehensive collections of web-based government
publications from every province and territory; however, the smaller jurisdictions lack the
institutional infrastructure to undertake this work for themselves.

This dilemma mirrors, on a smaller scale, the perennial constitutional debate in Canada.
How are we to ensure equal treatment for the records of each jurisdiction, when these
jurisdictions themselves are vastly different in size, population, institutions, and wealth?
How are we to balance the centralizing and centrifugal responsibilities of provincial,
territorial, and federal institutions?

If there is any lesson to be learned from our political experience of recent decades, it is to
avoid attempts to find a formula that will apply to each province and territory. What this
report proposes instead is a strategy that will lead, one may hope, to a set of practical
arrangements, arrangements that may be quite different from one jurisdiction to another.

As a strategy, it will not predict what these arrangements will be. However, it does
recognize that most existing collections of this kind are in legislative libraries and that
legislative libraries are in many ways the natural and preferred home for such collections.
Accordingly, we propose a ranked list of possible solutions, ranging from most preferred
to least preferred.

    1. Legislative library or equivalent3
    2. A consortium of legislative libraries
    3. A CARL academic library located within the province (or University of Prince
       Edward Island in that province)
    4. A CARL library (an academic library or Library and Archives Canada) outside
       the province or territory
    5. Periodic capture of provincial/territorial websites by Library and Archives Canada
       (or delegate) under s. 8(2) of the Library and Archives Canada Act.

The list is not intended to be exhaustive. The addition of other solutions is invited.

As noted, different solutions will be appropriate in different jurisdictions. Solution 1 is
more or less in place in several provinces. Just as clearly, solution 1 will not be
implemented in the smaller jurisdictions in the foreseeable future. It would be desirable
to see solution 1 or 2 in place for every jurisdiction.

A consortium under the control of participating libraries (solution 2) might be a
politically acceptable means to out-source preservation from smaller jurisdictions to other
public institutions in Canada or to the private sector. There are numerous precedents that
might serve as governance models, e.g. the Canadian National Site Licensing Project.
Participating legislative libraries might form a legal entity, which could then pursue
funding from parent governments (or allocate funding from their internal budgets) and
contract with external bodies for the collection and preservation of material from their
jurisdictions. Ownership of the collections would remain with the contracting libraries.
The assumption is that a consortium would offer economies of scale unavailable to
individual provinces and territories, and that a consortium would ensure a common

3 In Quebec, the Bibliothèque nationale du Québec; in Prince Edward Island, the Government Services
Failing solutions 1 or 2, CARL directors should consider solutions 3 to 5.

Clearly, solution 3 (a CARL library building a collection of material from the province in
which it is located) will be preferred over solution 4 (a CARL library building a
collection of material from a province or territory other than its own), since the collection
would reside within the jurisdiction that is the source of the publications.

Solution 5 would amount to a salvage operation. It would do little to afford current
access, but would allow future librarians and scholars to recover government publications
from defunct websites. (It is also unclear whether the Act can be turned to this purpose.)

Possible solutions, as delineated above, are not mutually exclusive. For example, in
some jurisdictions the legislative library might seek to collaborate with another library in
the jurisdiction, which might entail a combination of solution 1 and solution 3.

In light of this ranked list of solutions, CARL should adopt two immediate action items:

   1. Offer support and assistance to provincial/territorial legislative libraries through
      APLIC;
   2. Undertake to raise awareness of the issue in the minds of the public and with
      provincial and territorial governments.

Discussion of these items follows.

Action item 1: Supporting APLIC and the legislative libraries

CARL may wish to enter into open-ended discussions with APLIC, but should address
one question in particular:

Is there anything that CARL or its members can do to help legislative libraries create
comprehensive collections of web-based government information?

Assistance might take many different forms. It might include joining with them in
lobbying for additional government support, in providing technical advice, or even
something as simple as providing access to a server. Such actions have minimal cost and
might be of limited duration. On the other hand, assistance might entail a major
commitment, such as agreeing to take responsibility for collecting and preserving some
portion of the provincial or territorial output of web-based publications.

Needless to say, CARL cannot make substantive commitments on behalf of member
libraries. It might, however, help match needs with CARL libraries willing and able to
provide assistance.

This action recognizes and supports legislative libraries as the natural and preferred
location for such collections.

Recommendation 4: CARL should initiate discussions with APLIC, offering
support and assistance to legislative libraries in their efforts to collect and preserve
web-based government information.

Action item 2: Raising public awareness

The campaign to raise public awareness should aim to motivate provincial and territorial
governments to fund adequately the preservation of the web-based information they have
created. Such funding would make it possible to effect preservation through solution 1 or
2 above.

It would be appropriate to recruit allies within the library community, e.g. APLIC and
other library associations. But it is also important to create alliances with law societies,
journalists, public interest groups and scholars. It would be a boon to such a campaign to
have a high profile “champion”.

There may be a special window of opportunity for such a campaign. The Canadian
Newspaper Association recently released the results of a nation-wide survey of Canadian
freedom of information legislation and how it is being implemented. (The survey is
sometimes referred to as the FOI audit.4) The study is notable for examining the success
of such legislation at the provincial level of government, not solely the federal level.

Armed with the results of our recent survey, as well as anecdotal evidence (e.g. New
Brunswick statutes being published only in electronic format), CARL and allied
organizations might launch a campaign in the form of a public response to the Newspaper
Association’s FOI audit.

The argument needs to be made that public access to information rests upon the twin
pillars of freedom of information legislation and ordinary, published government
information. Freedom of information has a high profile and is what most interests
journalists. Government publications are essentially taken for granted. What is not
understood by journalists and by the public at large is the grave risk that they may be lost,
particularly when they are solely electronic.

4 Canadian Newspaper Association. “Public’s Right to Know in Failing Health in Canada.” May 28, 2005.
Available at:'s+right+to+know+in+failing+health.

Whether or not CARL chooses to tie the campaign to the FOI audit, the immediate objective is
to motivate provincial and territorial governments to provide for long-term preservation
of the information under their control.

CARL should be prepared for provincial and territorial governments to be defensive and,
if cornered, to look for internal solutions. Least likely of all is any admission of failure or
lack of foresight. An internal solution may be acceptable if it means empowering (and
funding) a memory organization such as a legislative library, provincial/territorial archives,
or provincial/territorial library to take on the job. The risk is that government will claim
to have addressed the problem by turning it over to individual departments, to a Queen’s
Printer, or to an information technology office. The argument has to be made that
institutions that create and distribute information should not be entrusted with its

Although there is a continuing need to raise public awareness about this and other issues,
what is intended here is a campaign of limited scope and duration, perhaps six months. If
it is not entirely successful, and it is hardly likely to be successful in every province and
territory, CARL will need to take more direct action, as described below.

Recommendation 5: CARL should undertake to raise public awareness of the need
to collect and preserve web-based provincial/territorial government information.
CARL should attempt to recruit other interested parties to this cause.

Assessment after six months

After a suitable period, perhaps six months from acceptance of this report, CARL should
assess what progress has been made in creating comprehensive collections of
provincial/territorial web-based government information. Six months should be enough
time to observe any results following from discussions with APLIC, from the
recommended public awareness campaign, and from the general effect of increased
awareness of the issue within the library community.

The assessment will not require a full-scale survey. It will require no more than a brief
query to CARL and APLIC libraries to learn of any new developments: collections
begun, expanded, reduced or indeed abandoned. It is likely that at least one additional
legislative library will be starting a collection by then, and there may be others.

Given this information, CARL members should enter into discussion on a course of
action to carry out the objectives of Recommendation 1, i.e. ensuring that there is at least
one comprehensive collection of web-based government publications for every province
and territory in Canada. It is probably best to avoid being too specific about the
discussions at this time; however, the assumption here is that it is in the interest of CARL
libraries to undertake collecting this material, if that is what is required to ensure its
This author has identified three options entailing CARL participation (solutions 3 to 5
above). To these might be added other options, unforeseen by this author, but arising
perhaps out of the discussions. As well, CARL might choose to delay action for various
reasons or, in fact, choose to do nothing at all.

Needless to say, it is to be hoped that CARL members will reject doing nothing. Indeed,
there would be little purpose in this study and its predecessor if that is the outcome.

Before closing this section, it may be worth making a crude estimate of the cost to CARL
libraries in building collections of this material. From discussions, it appears that Ontario
and British Columbia legislative libraries each have two to three staff working full-time
downloading material. To this must be added the cost of cataloguing, hardware, and
technical support. These are considerable resources; however, Ontario and British
Columbia are among the largest provinces in the country. The jurisdictions most likely to
need the attention of CARL libraries are precisely the smallest ones, and the output of
these jurisdictions is correspondingly small. In the smallest, this may amount to no more
than thirty or forty serials (including legislative records, which are voluminous, but also
annual reports, estimates, and public accounts, which are annuals) and a few hundred

Recommendation 6: CARL should set a date, approximately six months from
acceptance of this report, to review progress and decide next steps, as appropriate.

                                       Conclusion

In this report, we have proposed that the primary objective should be to ensure that there
are complete collections of web-based government publications for every province and
territory in Canada. It has been relatively easy to identify what should be included in
these collections and how they should be built. The hard question, as anyone can see, is
who is to do it.

The position taken here is that no simple formula will serve provinces and territories that
are so different in size, wealth, population, and infrastructure. What is proposed
therefore is a strategy for finding what will amount to a set of practical arrangements.
What is important are results, not the path by which we get there.

Addressing the challenge of preserving provincial and territorial web-based information
will exceed the resources of the legislative libraries that are perhaps the natural and
preferred homes for this material and that have done most of the work to date. The most
obvious candidates to supplement their efforts are the members of CARL. In the course
of compiling this report and its predecessor, I have been impressed by the spirit of
cooperation among the libraries involved, both CARL libraries and legislative libraries. I
trust that this spirit will continue.

                              Addendum I: Copyright

This author is not a copyright specialist, and if CARL wishes to address copyright, it
should seek an opinion from a properly qualified authority. It is probably safe, however,
to make a couple of observations.

Clearly, crown copyright applies to web-based provincial and territorial government
information, just as it applies to print. To our knowledge, there have been no judicial
decisions relevant to the legality of collecting “free” information from the web, whether
put there by government or by anyone else.

Libraries that have collected web-based government publications appear uniformly to
block webcrawlers from the servers on which the material resides. This is certainly a wise
decision, as “publishing” material on the web normally means making it available to webcrawlers.
Copyright clearly prohibits unauthorized publishing of such material.

Finally, we note from developments in Ontario and Quebec that CARL libraries have
shown little interest in building collections of their own when permanent, properly
organized collections are available to them at institutions like the Ontario Legislative
Library and the Bibliothèque nationale du Québec. This suggests that CARL libraries are
chiefly interested in guaranteed access, not necessarily in building or owning collections
of the materials themselves. Any venture into this area by CARL libraries, therefore, is
in the nature of last resort. Such actions are motivated by public interest, not by any
desire to acquire material illicitly.

                        Addendum II: Looking Further Ahead

There is scarcely need to explain that the measures here, especially as they relate to what
to collect and how to collect and organize it, will be superseded as more advanced
techniques become available.

As indicated several times here, it is necessary that we capture and preserve web-based
information that goes beyond print-equivalent material, in particular data files, databases,
interactive services, and material from the “dark” web. For practical purposes, this will
require formal arrangements for transfer and deposit, perhaps along the lines of the OAIS
Reference Model. On the whole, legislative libraries, administratively and sometimes
legally, are in a privileged position, relative to CARL libraries, for this kind of
arrangement. Governments are going to be more willing to transfer material for long-
term preservation to a government institution than to an institution that is outside

Other issues facing us are technical.

It will be necessary to collect preservation metadata that supports long-term preservation.
XML-based schemas will be more adaptable to these requirements than MARC records.
Leading work in this area has been undertaken by the OCLC/RLG PREMIS Working

In September 2004, the California Digital Library received a grant of $2.4 million from
the Library of Congress to develop tools for capture and preservation of web-based
government and political information. Government information produced in the United
States and Canada is similar in most respects relevant to preservation, so tools coming
out of this project may be applicable within our own environment.

These are two of the most important developments to monitor.

 OCLC/RLG PREMIS Working Group, Data Dictionary for Preservation Metadata: Final Report of the
PREMIS Working Group (May 2005). Available at:
The PREMIS report was released too recently to be incorporated into this report.

                                       Appendix I

                Collecting Web-Based Government Publications:
                    Standard Practices in Canadian Libraries

The following represents what may be termed standard practices in Canadian libraries at
the time of this report. They cannot be termed “best practices”, as best practices have not
yet been identified (see Appendix II), and current practices are destined to be superseded
by better practices, as they emerge.

“Recommended” practices below represent the opinion of the author and may not be (and
generally are not) the current practice in most libraries in Canada.

Collections and collection policies

   •   Collections consist principally of web-based publications analogous to print
       publications, usually in PDF or HTML;
   •   There is not usually a formal agreement between the government or government
       agency and the library, although in the case of legislative libraries there may be a
       legal mandate to collect publications;
   •   Material is collected by downloading and transfer to a server.

Storage

   •   Material is stored on a server connected to the internet;
   •   Material may be organized:
          o By government agency, or
          o By serial number, bibliographic number, etc.
   •   Material on the server is blocked to webcrawlers;
   •   Material may be stored
          o In its native format, or
          o Converted to PDF
          o Note: when converting HTML, Word, or other formats to PDF, it is
              recommended that the original be retained for long-term preservation. See
              Appendix II, no. 4.
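Blocking webcrawlers is most commonly done with a robots.txt file at the root of the web server. The sketch below assumes the collection is served from its own host (the host name in the usage note is hypothetical), so that a blanket `Disallow: /` is appropriate, and uses Python's standard `urllib.robotparser` to check the effect of the rule. Note that robots.txt is a convention honoured by well-behaved crawlers, not an access control: it keeps the material out of search engines but does not restrict library users.

```python
from urllib.robotparser import RobotFileParser

# Disallow every path for every crawler that honours robots.txt.
ROBOTS_TXT = """User-agent: *
Disallow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the blanket rule, `crawler_allowed(ROBOTS_TXT, "Googlebot", "https://govpubs.example.ca/ontario/report.pdf")` returns False for that and any other path on the server; an empty `Disallow:` line would instead permit everything.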

Cataloguing

   •   Materials are catalogued on the system used for the library’s general collection,
       with a link from 856 tag to the item;
   •   When an item is held by a library in both print and electronic formats, it is
       recommended that a separate record be created for each format;
   •   Preservation metadata is recorded in an unstructured note, including:

          o  Date collected;
          o  Source URL;
          o  Format and version;
          o  Tools used for capture, e.g. HTTrack;
          o  Software required to read it, e.g. “requires Adobe Acrobat Reader Version
   •   It is recommended that catalogue records for these materials be contributed to the
       Amicus database in Library and Archives Canada.
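The cataloguing practice above can be illustrated with a short sketch that renders the 856 link and an unstructured preservation note from the facts gathered at capture. Assumptions: the `Capture` fields and the example values are hypothetical; the note is placed in a MARC 500 (general note) field, which is this sketch's choice rather than a documented practice; and fields are rendered as simple text lines rather than through any particular cataloguing system. In MARC 21, first indicator 4 in the 856 field means HTTP access and second indicator 0 means the link is to the resource itself.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """Facts recorded when a web publication is collected (hypothetical fields)."""
    local_url: str       # address of the copy on the library's own server
    source_url: str      # where the item was found on the government website
    date_collected: str  # ISO date, e.g. "2005-09-01"
    file_format: str     # format and version, e.g. "PDF 1.4"
    capture_tool: str    # e.g. "HTTrack"
    software: str        # software needed to read the file


def catalogue_fields(c: Capture) -> list:
    """Render an 856 link and an unstructured preservation note as text lines."""
    # 856 with indicators 4 (HTTP) and 0 (link to the resource itself).
    link = f"856 40 $u {c.local_url}"
    # Preservation metadata kept as free text, as in current practice.
    note = (
        f"500    $a Collected {c.date_collected} from {c.source_url}. "
        f"Format: {c.file_format}. Captured with {c.capture_tool}. "
        f"Requires {c.software}."
    )
    return [link, note]
```

Keeping the preservation facts in a single free-text note mirrors current practice; a structured alternative is what the PREMIS work discussed in Addendum II is aimed at.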

Serials

• Annuals and other low-frequency serials may be accessed from the holdings
  record, on some library systems;
• For frequent serials, and all serials in the case of library systems without the
  capacity described above, a web page in HTML is used to organize issues and

                                         Appendix II

              Major Trends Identified by the PREMIS Working Group

    The PREMIS Working Group identified the following as “trends in practice that may
    ultimately emerge as best practices”:

       1. Store metadata redundantly in an XML or relational database and with the
          content data objects. Metadata stored in a database allows fast access for use
          and flexible reporting, while storing them with the object makes the object
          self-defining outside the context of the preservation repository;
       2. Use the METS format for structural metadata and as a container for
          descriptive and administrative metadata; use Z39.87/MIX for technical
          metadata for still images;
       3. Use the OAIS model as a framework and starting point for designing the
          preservation repository, but retain the flexibility to add functions and services
          that go beyond the model;
       4. Maintain multiple versions (originals and at least some normalized or
          migrated versions) in the repository, and store complete metadata for all
          versions. Retention of the original reduces risk in case better preservation
          treatments become available in the future;
       5. Choose multiple strategies for digital preservation. There are good reasons to
          have more than one approach in a developing field.
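Trend 1 (storing metadata redundantly) can be illustrated with a small sketch that writes the same record to a relational database and to a sidecar file beside the content object, then checks that the two copies agree. To keep the example short it uses SQLite and JSON from the Python standard library; the trends above name XML or a relational database for the stored copy and METS as the container, so the JSON sidecar is a stand-in, not the recommended format. The table name, the `.metadata.json` naming convention and the metadata keys are hypothetical.

```python
import json
import sqlite3
from pathlib import Path


def store_metadata(db: sqlite3.Connection, object_path: str, metadata: dict) -> Path:
    """Store one metadata record twice: in a database and in a sidecar file."""
    record = json.dumps(metadata, sort_keys=True)
    # Copy 1: a database row, for fast access and flexible reporting.
    db.execute(
        "CREATE TABLE IF NOT EXISTS metadata (object_path TEXT PRIMARY KEY, record TEXT)"
    )
    db.execute(
        "INSERT OR REPLACE INTO metadata (object_path, record) VALUES (?, ?)",
        (object_path, record),
    )
    db.commit()
    # Copy 2: a file stored with the object, so it stays self-describing
    # outside the context of the repository.
    sidecar = Path(object_path + ".metadata.json")
    sidecar.write_text(record, encoding="utf-8")
    return sidecar


def metadata_consistent(db: sqlite3.Connection, object_path: str) -> bool:
    """Return True if both copies exist and hold the same record."""
    row = db.execute(
        "SELECT record FROM metadata WHERE object_path = ?", (object_path,)
    ).fetchone()
    sidecar = Path(object_path + ".metadata.json")
    if row is None or not sidecar.exists():
        return False
    return json.loads(row[0]) == json.loads(sidecar.read_text(encoding="utf-8"))
```

Running the consistency check periodically is one way to notice when either copy has been altered or lost.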

There is additional discussion of these trends in section 6 of the Report, pp. 53-55.

  OCLC/RLG PREMIS Working Group. 2004. “Implementing Preservation Repositories for Digital
Materials: Current Practice and Emerging Trends in the Cultural Heritage Community.” Report by
the joint OCLC/RLG Working Group Preservation Metadata: Implementation Strategies
(PREMIS). Dublin, Ohio: OCLC Online
Computer Library Center, Inc. Available online at: (PDF:668K/66pp.), p. 7-8.

