Collection and Preservation of Web-Based Provincial/Territorial Government Publications: An Action Plan for CARL

Andrew Hubbertz
Ottawa, Ontario
September 2005

This study was sponsored by the Canadian Association of Research Libraries; however, the contents are solely the responsibility of the author.

Contents

Summary of recommendations
Introduction
First priority
What should be collected?
How should material be collected and organized?
Who should build these collections?
Conclusion
Addendum I: Copyright
Addendum II: Looking further ahead
Appendix I: Collecting web-based government publications: standard practices in Canadian libraries
Appendix II: Major trends identified by the PREMIS Working Group

Summary of Recommendations

Recommendation 1: Ensure that there is at least one comprehensive collection of government web-based publications for every province and territory in Canada.

Recommendation 2: At the outset, collections should consist of print-like materials, i.e. monographs, serials, etc.

Recommendation 3: At the outset, collections should be built on the basis of established practice in Canada, namely downloading material to a local server and cataloguing.

Recommendation 4: CARL should initiate discussions with APLIC, offering support and assistance to legislative libraries in their efforts to collect and preserve web-based government information.

Recommendation 5: CARL should undertake to raise public awareness of the need to collect and preserve web-based provincial/territorial government information. CARL should attempt to recruit other interested parties to this cause.
Recommendation 6: CARL should set a date, approximately six months from acceptance of this report, to review progress and decide next steps, as appropriate.

Introduction

In the fall of 2004, the Canadian Association of Research Libraries (CARL) contracted with the present author to survey CARL members and determine what was being done to collect and preserve web-based publications produced by provincial and territorial governments in Canada, and to recommend a course of action, as appropriate, to ensure that this portion of our heritage is adequately preserved.

It was decided, in consultation, to separate these two issues, first assessing the current state of affairs through a survey, and then compiling recommendations.

Accordingly, in the fall of 2004 a survey instrument was developed and tested, then sent to CARL members. It was also sent to provincial and territorial legislative libraries, through the Association of Parliamentary Libraries in Canada (APLIC). Following the survey, libraries were contacted via email and telephone to obtain more information on major projects. Data from the survey and follow-up communication were compiled in January 2005 and the report published by CARL in March 2005. The report, Collection and Preservation of Web-Based Provincial/Territorial Government Publications: Report on a Survey of CARL and APLIC Libraries, is available on the CARL website at: http://www.carl-abrc.ca/projects/preservation/pdf/provincial_web-pubs_report.pdf.

The principal findings of the survey were the following:

• The situation is highly variable across the country, some provinces having well-established programs, other jurisdictions having nothing;
• Most such collections are found in legislative libraries, the principal exception being Quebec, where the Bibliothèque nationale du Québec has a legal mandate as depository for government electronic publications;
• Existing collections are limited to print-like material: serials, monographs, leaflets, etc. in PDF, HTML and other electronic formats. Data files, databases, online services, and material from the “dark web” are not collected.
• These collections are organized so as to maintain continuity with print collections, i.e. as serials, monographs, etc., stored on a server, and catalogued with an 856 tag linking to the material itself. Collections are not organized in such a way as to preserve the contemporary context (i.e. websites, webpages, etc.).
• Where major collections exist, in particular in Ontario and Quebec, CARL members are choosing to link to those collections from their catalogues, rather than build collections of their own.
• For the most part, holdings are not reported to the Amicus database in Library and Archives Canada.

The Report was distributed widely and led to further discussion and communication, to the benefit of the present report. On one point, the views expressed in the Report were probably too optimistic. The claim was made that in the short and medium term, electronic versions of statutes, regulations, legislative records, annual reports, and budget documents were probably safe, as their importance is widely recognized. This statement underestimated the risk to these publications when they exist solely in electronic format, for example the New Brunswick statutes, which are no longer published in print.

One of the most important findings was that such collections as now exist are to be found almost exclusively in legislative libraries. The present report and recommendations are framed within that knowledge.

First Priority

The first priority should be to ensure that there is at least one comprehensive collection of government web-based publications for every province and territory in Canada. These collections should be generally comparable to such collections as now exist in various Canadian legislative libraries, the Bibliothèque nationale du Québec, and (for federal publications) Library and Archives Canada.
The jurisdictions that currently lack a comprehensive collection, or any attempt at present to establish one, are:

• New Brunswick
• Newfoundland and Labrador
• Northwest Territories
• Nova Scotia
• Nunavut
• Prince Edward Island
• Yukon

For the remaining jurisdictions, the situation is described below, with particular emphasis on those materials not currently collected:

Alberta

The Alberta Legislature Library collects provincial web-based publications, but limits the collection to materials available solely in electronic format.

Not collected: web-based publications that co-exist in print format, perhaps 75% of Alberta government publications.

British Columbia

The Legislative Library of British Columbia collects web-based publications from the Government of British Columbia, limited to those materials of most interest to members of the legislature.

Not collected: curriculum materials, scientific publications, posters.

Manitoba

The Manitoba Legislative Library collects an estimated 75% of Manitoba government web-based publications, except for legislative records (collected by the Clerk’s Office of the Legislative Assembly) and statutes and regulations (archived by the Department of Justice). For technical reasons, the collection is not yet accessible to the public; however, this problem should eventually be rectified.

Not collected: no evident exclusions, with the qualifications for legislative records and statutes and regulations as indicated.

Ontario

The Legislative Library of the Legislative Assembly of Ontario has a fairly comprehensive collection of Ontario government web-based publications.

Not collected: web-based serials that co-exist in print.

Quebec

The Bibliothèque nationale du Québec has a mandate to collect and preserve all Quebec web-based publications. It also appears to have the commitment and resources to carry out the mandate.

Not collected: no evident exclusions.
Saskatchewan

The University of Saskatchewan Library several years ago undertook to collect provincial government web-based publications, but appears to have abandoned the project. Recent provincial legislation provides for deposit of electronic publications with the Legislative Library. The latter has begun plans to build a collection.

Not collected: no evident exclusions in the University Library, but the project has stalled and may be dead. The Legislative Library has not yet begun to collect material.

Recommendation 1: Ensure that there is at least one comprehensive collection of government web-based publications for every province and territory in Canada.

What Should be Collected?

These comprehensive collections should be directed at the outset to electronic equivalents of print publications, such as monographs, serials, news releases, leaflets, maps, etc. In many ways, libraries already know how to handle such material, even in electronic format; it represents the central documentary record of government; and it is the kind of material already being collected in such collections as now exist.

Once such materials are adequately being collected and preserved, collections should be expanded to include data files, databases, online services, and material from the “dark” web, the latter being material that is more or less print-like, but is accessed via a database. Such collecting will probably require formal agreements with governments or government institutions to arrange for transfer of the material.

Recommendation 2: At the outset, collections should consist of print-like materials, i.e. monographs, serials, etc.

How Should Material be Collected and Organized?

Collecting in most cases should proceed by downloading material from government websites and transferring it to a local server.
The material should be catalogued, with an 856 link to individual items, and catalogue records contributed to the Amicus database in Library and Archives Canada.[1] These are the practices in most existing collections, save contributing to Amicus, which is unfortunately the exception.

Appendix I enumerates these practices in more detail; it also includes recommendations of this author, where they differ from common practice. These practices (and recommended practices) cannot be called standards or best practices, as standards and best practices have not yet been identified. However, they may guide libraries newly undertaking to build collections. Adopting practices of other Canadian libraries will reduce the decisions to be made by libraries just starting out, will facilitate development of national strategies, and will make it easier to assess how comprehensively material is being collected from each jurisdiction. They may also help establish a common understanding among libraries undertaking collaborative collection building, i.e. two or more libraries collaborating to build a collection of material from a single jurisdiction.

The practices and recommendations in Appendix I have been incorporated into a guide produced by this author, Cookbook for a Basic Collection of Web-based Government Information (Canadian Association of Research Libraries, 2005).[2] The Cookbook is intended to provide step-by-step guidance in setting up a basic collection of web-based government publications. It is hoped that it might be of use to libraries, including legislative libraries, wanting to start collections but not knowing quite how to begin.

Appendix II reproduces a list of major trends identified by the OCLC/RLG PREMIS Working Group as “trends in practice that may ultimately emerge as best practices.” The institutions which form the basis of these observations, essentially institutions in the United States, are for the most part well ahead of us in Canada.
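The 856 link mentioned in this section can be illustrated with a short sketch. It is not taken from the report or the Cookbook: the function name, the example URL, and the human-readable display layout are assumptions made for illustration. It relies only on the general MARC 21 definition of field 856 (first indicator 4 for HTTP access, second indicator 0 when the link points to the resource itself, subfield $u for the URI, $q for the electronic format type, $z for a public note).

```python
def marc_856(url: str, electronic_format: str = "", note: str = "") -> str:
    """Return a human-readable 856 field linking a record to a stored file.

    This is a display string for illustration, not a MARC serialization.
    """
    # First indicator 4 means access by HTTP, so only web links qualify here.
    if not url.startswith(("http://", "https://")):
        raise ValueError("first indicator 4 applies to HTTP links only")
    # Second indicator 0: the link leads to the catalogued resource itself.
    parts = ["856", "40", f"$u {url}"]
    if electronic_format:
        parts.append(f"$q {electronic_format}")
    if note:
        parts.append(f"$z {note}")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical location of a downloaded publication on a library server.
    print(marc_856("http://library.example.ca/archive/report.pdf",
                   electronic_format="application/pdf"))
```

In practice the field would be created in the library's own cataloguing system; the sketch only shows which pieces of information the link carries.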
Recommendation 3: At the outset, collections should be built on the basis of established practice in Canada, namely downloading material to a local server and cataloguing.

[1] In a few cases, namely Quebec and Saskatchewan, there is a legal mandate for deposit of these materials, which will affect the mechanisms of acquisition.
[2] The Cookbook is available at: http://www.carl-abrc.ca/projects/preservation/pdf/Cookbook.pdf.

Who Should Build these Collections?

Here we face a critical dilemma: we must ensure that there are comprehensive collections of web-based government publications from every province and territory; however, the smaller jurisdictions lack the institutional infrastructure to undertake this work for themselves.

This dilemma mirrors, on a smaller scale, the perennial constitutional debate in Canada. How are we to ensure equal treatment for the records of each jurisdiction, when these jurisdictions themselves are vastly different in size, population, institutions, and wealth? How are we to balance the centralizing and centrifugal responsibilities of provincial, territorial, and federal institutions?

If there is any lesson to be learned from our political experience of recent decades, it is to avoid attempts to find a formula that will apply to each province and territory. What this report proposes instead is a strategy that will lead, one may hope, to a set of practical arrangements, arrangements that may be quite different from one jurisdiction to another. As a strategy, it will not predict what these arrangements will be. However, it does recognize that most existing collections of this kind are in legislative libraries and that legislative libraries are in many ways the natural and preferred home for such collections.

Accordingly, we propose a ranked list of possible solutions, ranging from most preferred to least preferred.

1. Legislative library or equivalent[3]
2. A consortium of legislative libraries
3. A CARL academic library located within the province (or the University of Prince Edward Island in that province)
4. A CARL library (an academic library or Library and Archives Canada) outside the province or territory
5. Periodic capture of provincial/territorial websites by Library and Archives Canada (or delegate) under s. 8(2) of the Library and Archives Canada Act.

[3] In Quebec, the Bibliothèque nationale du Québec; in Prince Edward Island, the Government Services Library.

The list is not intended to be exclusive. The interpolation of other solutions is invited. As noted, different solutions will be appropriate in different jurisdictions.

Solution 1 is more or less in place in several provinces. Just as clearly, solution 1 will not be implemented in the smaller jurisdictions in the foreseeable future. It would be desirable to see solution 1 or 2 in place for every jurisdiction.

A consortium under the control of participating libraries (solution 2) might be a politically acceptable means to out-source preservation from smaller jurisdictions to other public institutions in Canada or to the private sector. There are numerous precedents that might serve as governance models, e.g. the Canadian National Site Licensing Project. Participating legislative libraries might form a legal entity, which could then pursue funding from parent governments (or allocate funding from their internal budgets) and contract with external bodies for the collection and preservation of material from their jurisdictions. Ownership of the collections would remain with the contracting libraries. The assumption is that a consortium would offer economies of scale unavailable to individual provinces and territories, and that a consortium would ensure a common standard.

Failing solutions 1 or 2, CARL directors should consider solutions 3 to 5.
Clearly, solution 3 (a CARL library building a collection of material from the province in which it is located) will be preferred over solution 4 (a CARL library building a collection from a province or territory other than the one in which it is located), since the collection would reside within the jurisdiction which is the source of the publications.

Solution 5 would amount to a salvage operation. It would do little to afford current access, but would allow future librarians and scholars to recover government publications from defunct websites. (It is also unclear whether the Act can be turned to this purpose.)

Possible solutions, as delineated above, are not mutually exclusive. For example, in some jurisdictions the legislative library might seek to collaborate with another library in the jurisdiction, which might entail a combination of solution 1 and solution 3.

In light of this ranked list of solutions, CARL should adopt two immediate action items:

1. Offer support and assistance to provincial/territorial legislative libraries through APLIC
2. Undertake to raise awareness of the issue in the minds of the public and with provincial and territorial governments.

Discussion of these items follows.

Action item 1: Supporting APLIC and the legislative libraries

CARL may wish to enter into open-ended discussions with APLIC, but should address one question in particular: Is there anything that CARL or its members can do to assist legislative libraries in creating comprehensive collections of web-based government information?

Assistance might take many different forms. It might include joining with them in lobbying for additional government support, providing technical advice, or even something as simple as providing access to a server. Such actions have minimal cost and might be of limited duration.
On the other hand, assistance might entail a major commitment, such as agreeing to take responsibility for collecting and preserving some portion of the provincial or territorial output of web-based publications.

Needless to say, CARL cannot make substantive commitments on behalf of member libraries. It might, however, help match needs with CARL libraries willing and able to provide assistance.

This action recognizes and supports legislative libraries as the natural and preferred location for such collections.

Recommendation 4: CARL should initiate discussions with APLIC, offering support and assistance to legislative libraries in their efforts to collect and preserve web-based government information.

Action item 2: Raising public awareness

The campaign to raise public awareness should be for the purpose of motivating provincial and territorial governments to properly fund preservation of web-based information they have created. Such funding would make it possible to effect preservation through solution 1 or 2 above.

It would be appropriate to recruit allies within the library community, e.g. APLIC and other library associations. But it is also important to create alliances with law societies, journalists, public interest groups and scholars. It would be a boon to such a campaign to have a high-profile “champion”.

There may be a special window of opportunity for such a campaign. The Canadian Newspaper Association recently released the results of a nation-wide survey of Canadian freedom of information legislation and how it is being implemented. (The survey is sometimes referred to as the FOI audit.[4]) The study is notable for examining the success of such legislation at the provincial, not solely the federal, level of government.

Armed with the results of our recent survey, as well as anecdotal evidence (e.g. New Brunswick statutes being published only in electronic format), CARL and allied organizations might launch a campaign in the form of a public response to the Newspaper Association’s FOI audit. The argument needs to be made that public access to information rests upon the twin pillars of freedom of information legislation and ordinary, published government information. Freedom of information has a high profile and is what most interests journalists. Government publications are essentially taken for granted. What is not understood by journalists and by the public at large is the grave risk that they may be lost, particularly when they are solely electronic.

[4] Canadian Newspaper Association. “Public’s Right to Know in Failing Health in Canada.” May 28, 2005. Available at: http://www.cna-acj.ca/client/CNA/cna.nsf/web/Public's+right+to+know+in+failing+health.

Whether or not CARL chooses to tie the campaign to the FOI audit, the immediate objective is to motivate provincial and territorial governments to provide for long-term preservation of the information under their control.

CARL should be prepared for provincial and territorial governments to be defensive and, if cornered, to look for internal solutions. Least likely of all is any admission of failure or lack of foresight. An internal solution may be acceptable if it means empowering (and funding) a memory organization such as a legislative library, provincial/territorial archive, or provincial/territorial library to take on the job. The risk is that government will claim to have addressed the problem by turning it over to individual departments, to a queen’s printer, or to an information technology office. The argument has to be made that institutions that create and distribute information should not be entrusted with its preservation.

Although there is a continuing need to raise public awareness about this and other issues, what is intended here is a campaign of limited scope and duration, perhaps six months.
If it is not entirely successful, and it is hardly likely to be successful in every province and territory, CARL will need to take more direct action, as described below.

Recommendation 5: CARL should undertake to raise public awareness of the need to collect and preserve web-based provincial/territorial government information. CARL should attempt to recruit other interested parties to this cause.

Assessment after six months

After a suitable period, perhaps six months from acceptance of this report, CARL should assess what progress has been made in creating comprehensive collections of provincial/territorial web-based government information. Six months should be enough time to observe any results following from discussions with APLIC, from the recommended public awareness campaign, and from the general effect of increased awareness of the issue within the library community.

The assessment will not require a full-scale survey. It will require no more than a brief query to CARL and APLIC libraries to learn of any new developments: collections begun, expanded, reduced or indeed abandoned. It is likely that at least one additional legislative library will be starting a collection by then, and there may be others.

Given this information, CARL members should enter into discussion on a course of action to carry out the objectives of Recommendation 1, i.e. ensuring that there is at least one comprehensive collection of web-based government publications for every province and territory in Canada.

It is probably best to avoid being too specific about the discussions at this time; however, the assumption here is that it is in the interest of CARL libraries to undertake collecting this material, if that is what is required to ensure its preservation.

This author has identified three options entailing CARL participation (solutions 3 to 5 above). To these might be added other options, unforeseen by this author, but arising perhaps out of the discussions.
As well, CARL might choose to delay action for various reasons or, in fact, choose to do nothing at all. Needless to say, it is to be hoped that CARL members will reject doing nothing. Indeed, there would be little purpose in this study and its predecessor if that is the outcome.

Before closing this section, it may be worth making a crude estimate of the cost to CARL libraries in building collections of this material. From discussions, it appears that Ontario and British Columbia legislative libraries each have two to three staff working full-time downloading material. To this must be added the cost of cataloguing, hardware, and technical support. These are considerable resources; however, Ontario and British Columbia are among the largest provinces in the country. The jurisdictions most likely to need the attention of CARL libraries are precisely the smallest ones, and the output of these jurisdictions is correspondingly small. In the smallest, this may amount to no more than thirty or forty serials (including legislative records, which are voluminous, but also annual reports, estimates, and public accounts, which are annuals) and a few hundred monographs.

Recommendation 6: CARL should set a date, approximately six months from acceptance of this report, to review progress and decide next steps, as appropriate.

Conclusion

In this report, we have proposed that the primary objective should be to ensure that there are complete collections of web-based government publications for every province and territory in Canada.

It has been relatively easy to identify what should be included in these collections and how they should be built. The hard question, as anyone can see, is who is to do it.

The position taken here is that no simple formula will serve provinces and territories that are so different in size, wealth, population, and infrastructure. What is proposed therefore is a strategy for finding what will amount to a set of practical arrangements.
Results matter, not the path by which we reach them.

Addressing the challenge of preserving provincial and territorial web-based information will exceed the resources of the legislative libraries that are perhaps the natural and preferred homes for this material and that have done most of the work to date. The most obvious candidates to supplement their efforts are the members of CARL.

In the course of compiling this report and its predecessor, I have been impressed by the spirit of cooperation among the libraries involved, both CARL libraries and legislative libraries. I trust that this spirit will continue.

Addendum I: Copyright

This author is not a copyright specialist, and if CARL wishes to address copyright, it should seek an opinion from a properly qualified authority. It is probably safe, however, to make a couple of observations.

Clearly, Crown copyright applies to web-based provincial and territorial government information, just as it applies to print. To our knowledge, there have been no judicial decisions relevant to the legality of collecting “free” information from the web, whether put there by government or by anyone else.

Libraries that have collected web-based government publications appear uniformly to block webcrawlers from the servers on which the publications reside. This is certainly a wise decision, as “publishing” material on the web normally means making it available to webcrawlers. Copyright clearly prohibits unauthorized publishing of such material.

Finally, we note from developments in Ontario and Quebec that CARL libraries have shown little interest in building collections of their own when permanent, properly organized collections are available to them at institutions like the Ontario Legislative Library and the Bibliothèque nationale du Québec. This suggests that CARL libraries are chiefly interested in guaranteed access, not necessarily in building or owning collections of the materials themselves.
Any venture into this area by CARL libraries, therefore, is in the nature of a last resort. Such actions are motivated by public interest, not by any desire to acquire material illicitly.

Addendum II: Looking Further Ahead

There is scarcely need to explain that the measures here, especially as they relate to what to collect and how to collect and organize it, will be superseded as more advanced techniques become available.

As indicated several times here, it is necessary that we capture and preserve web-based information that goes beyond print-equivalent material, in particular data files, databases, interactive services, and material from the “dark” web. For practical purposes, this will require formal arrangements for transfer and deposit, perhaps along the lines of the OAIS Reference Model. On the whole, legislative libraries, administratively and sometimes legally, are in a privileged position, relative to CARL libraries, for this kind of arrangement. Governments are going to be more willing to transfer material for long-term preservation to a government institution than to an institution that is outside government.

Other issues facing us are technical. It will be necessary to collect preservation metadata that supports long-term preservation. XML-based schemas will be more adaptable to these requirements than MARC records. Leading work in this area has been undertaken by the OCLC/RLG PREMIS Working Group.[5]

In September 2004, the California Digital Library received a grant of $2.4 million from the Library of Congress to develop tools for capture and preservation of web-based government and political information. We know that government information produced in the United States and Canada is similar in most aspects relevant to preservation, so tools coming out of this project may be applicable within our own environment.

These are two of the most important developments to monitor.
[5] OCLC/RLG PREMIS Working Group, Data Dictionary for Preservation Metadata: Final Report of the PREMIS Working Group. (May, 2005). Available at: http://www.oclc.org/research/projects/pmwg/. The PREMIS report was released too recently to be incorporated into this report.

Appendix I: Collecting Web-Based Government Publications: Standard Practices in Canadian Libraries

The following represents what may be termed standard practices in Canadian libraries at the time of this report. They cannot be termed “best practices”, as best practices have not yet been identified (see Appendix II), and current practices are destined to be superseded by better practices, as they emerge.

“Recommended” practices below represent the opinion of the author and may not be (and generally are not) the current practice in most libraries in Canada.

Collections and collection policies

• Collections consist principally of web-based publications analogous to print publications, usually in PDF or HTML;
• There is not usually a formal agreement between the government or government agency and the library, although in the case of legislative libraries there may be a legal mandate to collect publications;
• Material is collected by downloading and transfer to a server.

Storage

• Material is stored on a server connected to the internet;
• Material may be organized:
  ◦ By government agency, or
  ◦ By serial number, bibliographic number, etc.
• Material on the server is blocked to webcrawlers;
• Material may be stored
  ◦ In its native format, or
  ◦ Converted to PDF
  ◦ Note: when converting HTML, Word, or other formats to PDF, it is recommended that the original be retained for long-term preservation. See Appendix II, no. 4.
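The report does not say how the libraries surveyed block their storage servers to webcrawlers. One common mechanism is a robots.txt file at the root of the server, and the following Python sketch, offered only under that assumption, checks such a file with the standard library's robot parser. The host name, file paths, and the helper name is_blocked are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A blanket exclusion: every crawler is asked to stay away from every path.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_blocked(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text asks `agent` not to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical archive host and stored publication.
    url = "http://archive.example.org/gov/annual-report-2004.pdf"
    print(is_blocked(ROBOTS_TXT, url))
```

A robots.txt file is only a request that well-behaved crawlers honour. A library that needs a firm guarantee would have to add access controls on the server itself.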
Metadata

• Materials are catalogued on the system used for the library’s general collection, with a link from the 856 tag to the item;
• When an item is held by a library in both print and electronic formats, it is recommended that a record be created for each item;
• Preservation metadata is recorded in an unstructured note, including:
  ◦ Date collected;
  ◦ Source URL;
  ◦ Format and version;
  ◦ Tools used for capture, e.g. HTTrack;
  ◦ Software required to read it, e.g. “requires Adobe Acrobat Reader Version 5.0”.
• It is recommended that catalogue records for these materials be contributed to the Amicus database in Library and Archives Canada.

Serials

• Annuals and other low-frequency serials may be accessed from the holdings record, on some library systems;
• For frequent serials, and all serials in the case of library systems without the capacity described above, a web page in HTML is used to organize issues and “volumes”.

Appendix II: Major Trends Identified by the PREMIS Working Group

The PREMIS Working Group identified the following as “trends in practice that may ultimately emerge as best practices”:[6]

1. Store metadata redundantly in an XML or relational database and with the content data objects. Metadata stored in a database allows fast access for use and flexible reporting, while storing them with the object makes the object self-defining outside the context of the preservation repository;
2. Use the METS format for structural metadata and as a container for descriptive and administrative metadata; use Z39.87/MIX for technical metadata for still images;
3. Use the OAIS model as a framework and starting point for designing the preservation repository, but retain the flexibility to add functions and services that go beyond the model;
4. Maintain multiple versions (originals and at least some normalized or migrated versions) in the repository, and store complete metadata for all versions. Retention of the original reduces risk in case better preservation treatments become available in the future;
5. Choose multiple strategies for digital preservation. There are good reasons to have more than one approach in a developing field.

There is additional discussion of these trends in section 6 of the Report, p. 53-55.

[6] OCLC/RLG PREMIS Working Group. 2004. “Implementing Preservation Repositories for Digital Materials: Current Practice and Emerging Trends in the Cultural Heritage Community.” Report by the joint OCLC/RLG Working Group Preservation Metadata: Implementation Strategies (PREMIS). Dublin, O.: OCLC Online Computer Library Center, Inc. Available online at: http://www.oclc.org/research/projects/pmwg/surveyreport.pdf (PDF:668K/66pp.), p. 7-8.
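The preservation metadata elements listed in Appendix I, together with the first PREMIS trend (keeping metadata both in a database and alongside the object), could be sketched roughly as follows. This is an illustration only, not a practice documented in the report: the class and function names, the JSON sidecar file, the SQLite table, and the example values are all assumptions, with SQLite standing in for the relational database and JSON for whatever structured format a library might choose.

```python
import json
import sqlite3
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PreservationNote:
    """The five elements Appendix I lists for the unstructured note."""
    date_collected: str
    source_url: str
    format_and_version: str
    capture_tool: str
    reader_software: str

    def as_note(self) -> str:
        """Render the elements as a single free-text catalogue note."""
        return (
            f"Collected {self.date_collected} from {self.source_url}; "
            f"format: {self.format_and_version}; "
            f"captured with {self.capture_tool}; "
            f"requires {self.reader_software}."
        )


def store_redundantly(note: PreservationNote, item_path: Path,
                      db: sqlite3.Connection) -> Path:
    """Write the metadata next to the object and also into a database."""
    # Copy 1: a sidecar file beside the object, so the object stays
    # self-describing outside the repository (hypothetical naming scheme).
    sidecar = item_path.with_name(item_path.name + ".metadata.json")
    sidecar.write_text(json.dumps(asdict(note), indent=2), encoding="utf-8")
    # Copy 2: a database row, for fast lookup and reporting.
    db.execute(
        "CREATE TABLE IF NOT EXISTS preservation_metadata "
        "(item TEXT PRIMARY KEY, metadata TEXT)"
    )
    db.execute(
        "INSERT OR REPLACE INTO preservation_metadata VALUES (?, ?)",
        (item_path.name, json.dumps(asdict(note))),
    )
    db.commit()
    return sidecar


if __name__ == "__main__":
    # Example values are invented for the demonstration.
    note = PreservationNote(
        date_collected="2005-09-01",
        source_url="http://www.example.gov.ca/annual-report.pdf",
        format_and_version="PDF 1.4",
        capture_tool="HTTrack",
        reader_software="Adobe Acrobat Reader Version 5.0",
    )
    with tempfile.TemporaryDirectory() as folder:
        item = Path(folder) / "annual-report.pdf"
        item.write_bytes(b"%PDF-1.4")  # stand-in for the downloaded file
        connection = sqlite3.connect(":memory:")
        print(store_redundantly(note, item, connection).name)
        print(note.as_note())
        connection.close()
```

Holding the elements in named fields, and generating the free-text note from them, would leave a library free to move later from an unstructured note to a structured schema without re-keying the information.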