Legal deposit from the Internet in Denmark – experiences
with the law from 1997 and the need for adjustments

First I will briefly sketch the history of legal deposit in Denmark and what effect the
changes in the 1997 Legal Deposit Law have had for downloading of electronic
material from the internet for the Danish legal deposit libraries.
Next I will focus on some of the changes which were made in the 1902 Legal Deposit
Law, in order to compare them with the current conditions for electronic
material that follow from the 1997 law.
Finally I will point to adjustments in the Danish Legal Deposit Law which are
necessary if we are to be capable of archiving either all or representative parts of the
Danish internet for the future.

Historical background

When we celebrated the passing of the new law in the summer of 1997, we could also
celebrate the three-hundredth anniversary of the first ordinance issued on Legal
Deposit, namely a royal order that all printers in the kingdom deposit copies of the
works that they printed with the Royal Library in Copenhagen.
Revised legislation has since been passed at long intervals.
The 1902 law was the most extensive. It was passed just as the Danish Industrial
Revolution hit the printing industry and thus immensely increased the amount of
printed matter deposited.

The reason for having legal deposit has changed.
Contrary to what one might expect, the first ordinance was not issued in order to
control the printed expressions of the king's subjects, but to ensure free copies for the
absolute monarch to exchange with his royal colleagues in other countries.
It didn't quite work that way and the idea of exchange was dropped, but not the legal
deposit.
There have been other reasons for the many changes which were made up to 1997, but
not until the remarks to the present law was the purpose explicitly stated as being the
preservation of that part of the cultural heritage that is made up of published works.


The 1997 law and the system that supports the law
In 1997, the Danish legislation on Legal Deposit was modernised and updated.
In working out a definition of what was to be deposited, the text ended up with two key
words, “work” and “published”, and with the important qualification “regardless of medium”:
“work” being a delimited quantity of information which must be considered a final
and independent unit;
“published” being when one or any number of copies of the work have been placed on
sale or otherwise distributed to the public.
The modernised law thus covers a selective collection and archiving of internet
material.
When the law was passed, the accompanying governmental instruction was still being
produced. It was during this stage that the concept of “dynamic - static” appeared.
Static material, i.e. monographs and periodicals (including periodically updated
ones), was included, while dynamic material, i.e. databases and homepages, was
excluded.

This distinction was created partly as an attempt to define and also limit the number
of works to be deposited, partly as a measure to satisfy the software industry, which
had successfully protested against the deposit of computer programs.
At present only static documents are covered by the law and therefore archived in our
system.

Documents are downloaded from open as well as password protected web sites.

The web site www.pligtaflevering.dk contains information about the Legal Deposit
Law from 1997, its interpretation and forms for notification of different kinds of
publications. The website has recently been updated with a new search facility. This
web site constitutes the public part of the system.

To support the law, a system was developed for retrieving, archiving, viewing and
interacting with the rendered form of the archived net publications. Due to copyright
legislation, we are not allowed to give access over the net to deposited digital works
and this part is therefore non-public and the archived net publications can only be
viewed at the reading rooms in the legal deposit libraries where print-outs for personal
use are allowed.

The formulation of the existing law, which only requires that a portion of the content
of the net be deposited, makes it difficult to create a fully automatic model by which
all relevant material is harvested and registered.
Until now our harvesting has been based on a model where a notification starts a
download of the publication. The person in charge of the technical completion of the
digital copy is responsible for the notification, which is made by filling out a form
at the website.
But this approach has the disadvantage that far from every producer of works subject
to the Legal Deposit Law is aware of it. Whereas printers and publishers have known
about Legal Deposit for many years, the new depositors have rarely if ever heard
about it and so we need to “educate” them if this approach is ever to succeed.
Mail campaigns and advertising in the newspapers have speeded up notification and
the number of harvested documents has increased, but as you shall see, the figures are
still low.
The material is archived exactly as it is received, without modification. When it is
placed at the disposal of users in our display system, it passes through a database
in such a way that all URLs are rewritten as references within the archive instead of to
active documents on the net.
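
The display-time URL correction described above could look roughly like the following. This is only a minimal sketch and not the Royal Library's actual implementation: the prefix `/archive/`, the function name `rewrite_links` and the choice to leave links to non-archived targets untouched are all assumptions made for illustration. The stored copy is never changed; only the text sent to the reading-room viewer is rewritten.

```python
import re
from urllib.parse import urljoin

# Hypothetical prefix under which archived copies are served in the reading room.
ARCHIVE_PREFIX = "/archive/"


def rewrite_links(html: str, base_url: str, archived: set[str]) -> str:
    """Return a display copy of `html` whose href/src attributes point into the archive.

    `archived` holds the absolute URLs that exist in the archive. Links to
    anything else are left as they are (an assumption of this sketch).
    """
    def replace(match: re.Match) -> str:
        attr, quote, target = match.group(1), match.group(2), match.group(3)
        absolute = urljoin(base_url, target)  # resolve relative references
        if absolute in archived:
            return f"{attr}={quote}{ARCHIVE_PREFIX}{absolute}{quote}"
        return match.group(0)

    # Simple attribute matcher; a production system would use a real HTML parser.
    return re.sub(r'\b(href|src)=(["\'])(.*?)\2', replace, html, flags=re.IGNORECASE)
```

Because the rewriting happens on the way out of the archive, the archived files themselves stay byte-for-byte identical to what was received.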

At present we have archived parts of fewer than 1000 Danish sub-domains in the
system.
The 300.000 registered domains are far from all active or unique, but if we reduce
them to c. 2/3 to compensate for this, we are archiving less than 0.5% of these c.
200.000 domains.
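
The coverage estimate can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names are illustrative, and 1000 is used as an upper bound for the number of archived sub-domains.

```python
registered_domains = 300_000   # registered Danish domains, from the text
active_share = 2 / 3           # rough correction for inactive or duplicate domains
archived_domains = 1_000       # upper bound: "fewer than 1000" sub-domains

active_domains = registered_domains * active_share
coverage = archived_domains / active_domains

print(f"Estimated active domains: {active_domains:,.0f}")
print(f"Upper bound on coverage: {coverage:.2%}")
```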

Figure 1: Volume in Archived Material

                       June 1999             June 2000             June 2001
# Net publications     958                   5424                  9175
# Files total          87.886                346.685               569.150
# Bytes total          1,66 GByte            12,0 GByte            18,2 GByte


The system has archived material since January 1st 1998, and the figures give the
number of publications, monographs as well as periodical issues, archived from then
until each of the dates shown.
The falling number of files per work is due to the rising percentage of periodical
issues; this type of publication is less complex and is generally
made up of fewer files per issue than is the case for monographs.

569.000 files have been collected in 3.5 years. In comparison, a total sweep of the
whole Icelandic domain, 5750 domains in all, in January 2001 resulted in
565.000 files (nearly the same number of files), and a robot took one week to make the
collection.

The average need for storage is around 2 Mbyte per publication.
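
The falling number of files per work and the storage need per publication follow directly from Figure 1. The sketch below recomputes both; it assumes 1 GByte = 1000 Mbyte (the text does not say which convention is used), and the function and variable names are illustrative.

```python
# Cumulative totals copied from Figure 1.
figure1 = {
    "June 1999": {"publications": 958, "files": 87_886, "gbytes": 1.66},
    "June 2000": {"publications": 5_424, "files": 346_685, "gbytes": 12.0},
    "June 2001": {"publications": 9_175, "files": 569_150, "gbytes": 18.2},
}


def per_publication(row, mbytes_per_gbyte=1000):
    """Return (files per publication, Mbyte per publication) for one column."""
    files = row["files"] / row["publications"]
    mbytes = row["gbytes"] * mbytes_per_gbyte / row["publications"]
    return files, mbytes


for date, row in figure1.items():
    files, mbytes = per_publication(row)
    print(f"{date}: {files:.1f} files and {mbytes:.2f} Mbyte per publication")
```

The number of files per publication drops from roughly 92 in June 1999 to roughly 62 in June 2001, while the storage per publication ends up close to 2 Mbyte.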


Figure 2: Monographs vs Periodicals

                       Before July 1st 1999   Before July 1st 2000   Before July 1st 2001
                       #          %           #          %           #          %
 Monographs            642        67          1594       29          2850       31
 Periodicals (issues)  316        33          3830       71          6325       69

As the numbers show, there has been a considerable increase in the archiving of
periodical issues, and this type of publication now makes up more than 2/3 of the
publications which are archived.
By far the majority of static net periodicals are collected because the Royal Library's
own personnel notify the system of these titles.

The 6325 periodical issues are distributed over 564 different periodicals.

Figure 3: Public vs. Private Publishers

                June 1999            June 2000            June 2001
                #         %          #         %          #         %
Public          648       68         3985      71         6200      67,5
Private         304       32         1430      26,4       2975      32,5

As is clear from the table above, the division between works from sites with public
sector publishers and those from private publishers has been nearly constant
throughout the 3-year period. About 2/3 of the archived publications come from public
sites and about 1/3 come from private sites.

The material collected from public publishers is mainly working papers, reports or
scientific reports, guides, periodicals and newsletters.
The material collected from private publishers is mainly periodicals including
newsletters, reports and Danish literature.
The closer we come to the individual citizen and his private concerns, the less the
citizen is represented in the archive.

Figure 4: Staff resources

          Man years   Paid hours per   Comments
                      publication
 1998     2,3         12,75            System being developed and set up
 1999     1,9         1,2              Downloading, cataloguing and classifying
                                       all publications
 2000     1,3         0,6              Downloading all publications; cataloguing
                                       and classifying periodicals

Paid hours include vacation, education etc.
0.6 man hour is the price per archived net publication including a classification and
cataloguing of net periodicals. The net publication classification is made for the
national bibliography and ISSN.
William Arms suggests in his article about the Minerva Prototype at the Library of
Congress in the latest issue of DigiNews
(http://www.rlg.org/preserv/diginews/diginews5-2.html#feature1) that selective
collections are at least 100 times as expensive as bulk collection.
I do not know if 100 times per file or per net publication is the right factor, but the
experiences from our system show that building up a selective collection is expensive
and that it costs more than 1 man hour to download, store, catalogue and classify a
notified publication, which means that we have to be careful about which materials we
select for this expensive treatment. Selection ought therefore to be directed to a greater
degree towards types of materials which we wish to preserve, but which we cannot
catch via the more mechanical methods – e.g. interactive works, streamed material or
collecting according to themes or events.


Figure 5: MIME Type Statistics - % of collected files

                          June 1999        June 2000        June 2001
 TEXT/HTML                56,0 %           58,6 %           59,3 %
 Image (GIF, JPEG, PNG)   41,8 %           38,4 %           37,9 %
 PDF                      1,3 %            1,6 %            1,7 %
 Other formats            0,9 %            1,4 %            1,1 %

One of the duties of the Royal Library is to collect, store and make the files available
now and in the future. This could be problematic, but if you look at this slide, you
will see that by far the major portion of the archive is at present in generally well
known and widespread formats, which we must expect will be maintained and
available somehow in the future. Only c. 1 % of the archive is in other formats.
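
A format distribution of the kind shown in Figure 5 can be derived from the MIME types recorded for the archived files. The following is a minimal sketch under the assumption that each file has a stored MIME type string; the grouping mirrors the rows of Figure 5, and the function names are illustrative.

```python
from collections import Counter

IMAGE_TYPES = {"image/gif", "image/jpeg", "image/png"}


def mime_group(mime_type: str) -> str:
    """Map one MIME type to the row it would fall under in Figure 5."""
    mime_type = mime_type.lower()
    if mime_type == "text/html":
        return "TEXT/HTML"
    if mime_type in IMAGE_TYPES:
        return "Image"
    if mime_type == "application/pdf":
        return "PDF"
    return "Other formats"


def mime_shares(mime_types):
    """Return the percentage of files in each group."""
    counts = Counter(mime_group(m) for m in mime_types)
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}
```

Tracking the "Other formats" share over time gives an early warning when formats that are harder to preserve start to grow.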

That dynamic publications are not to be archived, and that e.g. streamed material as a
whole is not included, does of course skew the distribution of data formats in the
archive. But the figures from Sweden show that the picture does not change
dramatically merely by changing from selective collection to harvesting. The picture
becomes more cloudy, and the number of data formats would rise, but the major
portion would still be types that can be preserved for posterity.
If other forms of collection are also to be used, e.g. an actual delivery of the more
difficult material, the picture could easily change.

The technical problems we have run into in connection with our current system are
similar to some of the problems we hear about in connection with the large harvesting
projects.
We often run into documents that violate the HTML standards or contain other errors
which make archiving difficult or impossible.
Similarly, there are problems with the download of, and later access to, documents that
contain Java, client-side elements such as JavaScript, or other types of embedded code.
We have within the last few weeks received advice from various sources regarding
solutions to e.g. the JavaScript problems, on which we intend to follow up.
And this is the situation: what was impossible yesterday may be possible tomorrow.
The global boom in portals offering e.g. search engine services – both highly
specialised and very general – indicates that search-engine developers are faced with
many of the same problems regarding document decoding as people in the archives.
This 'boom' implies that there are now many more 'heads' working on the problem
than just a few archivists, and thus solutions will be found.

Reasons for harvesting.
Figure 6: Three generations using the internet

                      1st (age 74)           2nd (age 40)                     3rd (age 10-15)
 Professional life    Professional online    Professional online              Uncritically, all
 (work/school         periodicals/portals    periodicals/portals              available material
 related)                                    Product information
                                             Institutions and organisations
                                             Newsgroups
 Entertainment        Just surfing around    Auctions                         Events
                                             Game services                    Game services
                                             Bizarre websites                 Gimmicks
                                             Newsgroups                       Chat services
 Searching for        Search engines         Search engines (including        Search engines
 information          News                   cached web pages)/portals
                      Municipal sites        News and media/portals
                                             State and municipal sites
                                             Product databases
 Special interests    Homebanking            Homebanking and info related     Sport clubs (results)
                      Stock exchange         to family economy                Live role play
                                             E-commerce
                                             Organisations
                                             Seasonal interests


I have looked at my own family and our use of the net to see if what is harvested
today for the archive in any way reflects our use of the net. We are three generations
who all use the net, but each in our own way. Each of us has at least one email address
and we all use search engines, including the cache possibilities.
In our professional life use of the net is concentrated on professional periodicals and
portals, institutional reports, web sites with product information as well as use of news
groups in connection with problem solving.

In our private life the pattern of use is completely different: here we use the net for
searching for information, for entertainment or to search for specific services.
Information searches are directed at online news (daily newspapers and specialized
journals such as Jyllandsposten or Ingeniøren, portals such as DR and TV2) and portals
such as Netdoctor. Special services at 'authorized' web sites are also much used, for
example weather reports at the Meteorological Institute, the travel route planner at the
Danish railways (DSB) and the maps at Krak. We also use public sites and
services to the extent that we need them, whether they are municipal or state.

As to entertainment, there is quite a spread in the field according to age. The first
generation virtually does not use the net for entertainment, but this is not the case for
the next two, despite the differences in their patterns of use. Use of newsgroups, net
conferences and online auctions belongs to the parent generation, while chat and the
following of events such as Big Brother are almost exclusively the province of the
youngest generation. Both generations play on the net and download updates to
previously purchased offline games. In addition to home banking there is information
searching on all other matters of family economy such as insurance, stock
quotations, interest rates on housing loans etc. We investigate the market for all kinds
of goods – models and prices are checked in product databases, the modern version
of the mail order catalogue.
We buy things from the net's B-to-C websites: traditional things like programs and
hardware but also food, books, videos, clothes, airplane tickets and hotel reservations.
We use the net to follow our various leisure interests: the youth club calendar, lists of
arrangements, newsletters and minutes of meetings, the calendar for church services
in the local church, the variety of information from the garden society, the national
and local football clubs for results and information about upcoming games, and sign-ups
and information about upcoming role-playing events. My conclusion is that only a
minor part of the material we use is actually archived; the items highlighted in
red in Figure 6 are those that are not archived. The nearer we come to our daily lives as
citizens on the net, the less we have succeeded in archiving, primarily because the
strength of the net lies in newly updated content and in actual
services such as reservations and e-commerce, i.e. dynamic material.

Figure 7: Modifications from 1902

Material added in 1902                       Dynamic material used by family
Brochures and advertisements                 Brochures
Catalogues                                   Product databases/portals
Election campaign material
Club/organization magazines                  Organization websites
Scouting magazines, church newsletters       News/minutes on the websites
Maps                                         Online services like krak.dk
Portraits
Art prints                                   Net Art
Songs

The legal deposit was expanded considerably in 1902. The commentary on the law
refers repeatedly to technical developments in the production of illustrations and
printed matter.
All the new types of material were the direct result of the industrialization of the
printing process.
If we look at the types of material added in 1902, we find many of the categories listed
on the previous slide. Many of the things in the right column do not have a
matching printed publication, as institutional or scientific reports do.

Online news corresponds to printed newspapers, which were already included in the
law from 1781.

The materials which were collected at the library as a consequence of the changes in
the 1902 law are today the material which researchers use when the histories of firms
are written or research is done on the break-through of industrialization, as well as
when illustrations typical of the period are needed.

In 2000, 168.000 printed items of the type called 'småtryk' (ephemera) in Denmark
were archived, whereas a total of 30.000 items were discarded. Of the corresponding
electronic publications nothing was archived, except annual reports from a few firms.
And in line with rising costs connected to printing and distribution, more and more of
this material will only exist in electronic versions. An example of this could be
BMW's latest advertising campaign, in which they use short films on the internet.

If we are going to collect electronic material matching the 1902 law we have to use
techniques like harvesting the entire Danish web space.
We would not only get a better coverage of Denmark outside the public sphere, but –
depending on the frequency of harvesting – also get all the changes in the contents.
Last, but not least, we would be able to catch new trends in functionality, contents and
design on the net as soon as they appear.
A complete ordinary archiving of the Ministry of Research's website at the end of
May 2001 would have captured not only a new era in the ministry, but also the
enthusiasm for technology that characterizes our day: the Ministry of Research began
offering readings of its documents by speech synthesis.

Why not only harvesting?

The purpose of legal deposit in our time is preservation of the national cultural
heritage for the future.

One of the possible ways of preserving the Danish web is by harvesting. But
harvesting may not be enough, not even if we were allowed to harvest from all sites,
on the internet as well as on extranets and intranets, mail and news servers. Some
material available to you as users simply isn't available to a harvester. Streamed and
webcast material, for instance, must be delivered if it is to be covered by the archive.

Sites which adapt their contents and design to the current user would also be difficult
to archive as a whole and likewise with publications depending on a program running
on the webserver. I will show an example in a minute.

For preservation, the 'background' version, e.g. in XML, used as the basis for the actual
web publication may well be a superior format.

N2art.nu is a site started by the five Nordic culture nets which presents a series of
digital art works produced for the net. Here is one of the Danish contributions:
Molecubes by one of Denmark's leading digital musicians, Peter Fjeldberg.
This is a project in which the user – via a browser window interface – is able to
generate music that is transformed into molecular objects. The project is controlled by
the user, and the user can extend the work of art by adding sounds and structures, and
she can also see and hear contributions created by other users. Everything that is
contributed by the users is stored and added to the work via a database.

The task of archiving this piece of net art would not be trivial. Harvesting will not
succeed and other methods must be considered.
In the presentation by Birte Christensen-Dalsgaard, an idea is given for archiving
such a piece of net art.

Harvesting may not always give the best format for long-term preservation. A new
website, www.adl.dk or Archive for Danish Literature, will open this autumn,
produced in partnership by the Danish Society for Language and Literature and The
Royal Library. Over the next few years this web site will publish the works of older
classical Danish authors in more or less their full extent.
All texts are structured in XML, which means that the information is separated from
the layout. Each XML file contains a whole publication with mark-up for page
breaks, chapters and so on. The XML is loaded into a database and the database
application performs the transformation to well-formed HTML. The works are
presented as desired (either as images of the printed page or as HTML documents) with
the possibility of navigating between the individual pages in an edition.
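
To illustrate why the XML source is the richer object, the sketch below turns one XML publication into separate HTML pages, split at the page-break mark-up. It is a hypothetical example and not the adl.dk application: the element names `work`, `title`, `chapter`, `pb` and `p` are assumptions, and the real system performs the transformation inside a database application.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mark-up: <pb n="..."/> marks a page break, <p> holds a paragraph.
SAMPLE = """<work><title>Digte</title>
<chapter><pb n="1"/><p>First page.</p><pb n="2"/><p>Second page.</p></chapter>
</work>"""


def pages_as_html(xml_text: str) -> dict[str, str]:
    """Split one XML publication at its page breaks and render each page as HTML."""
    root = ET.fromstring(xml_text)
    title = escape(root.findtext("title", default=""))
    pages: dict[str, list[str]] = {}
    current = None
    for element in root.iter():  # document order
        if element.tag == "pb":
            current = element.get("n")
            pages[current] = []
        elif element.tag == "p" and current is not None:
            pages[current].append(f"<p>{escape(element.text or '')}</p>")
    return {
        n: f"<html><head><title>{title}</title></head><body>{''.join(body)}</body></html>"
        for n, body in pages.items()
    }
```

The HTML pages can always be regenerated from the XML in this way, but the chapter and page structure cannot be reliably recovered from harvested HTML alone.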

By harvesting the website we will therefore get the HTML files and not the XML
files, which are tagged ASCII and therefore much easier to preserve and migrate to
new standards in the future. SGML and XML are already widely used formats in
electronic publishing and the trend is expected to continue.

So we have to decide what should be the main purpose of the archive.

Needed adjustments
We do not imagine that we can archive the whole internet. The Internet covers not
only information and access to information, but also phenomena such as shared
computing (like SETI and THINK) and integration with other units such as mobile
telephones and small handheld computers.
But we wish to achieve a broader coverage of the types of material that are to be
collected, and to minimize the costs not only of the collection and registration
phases but also of the subsequent long-term preservation phase.
Harvesting is suitable for this and should be used to gather net material, but harvesting
cannot solve all the problems. We still need the possibility to collect selectively for
various reasons.
Therefore we urge that the present Legal Deposit Law be amended with rules that will
make it possible for the national Legal Deposit institutions to harvest those sections of
the Internet that the institutions deem essential for documenting the national digital
heritage, at intervals that will best serve this purpose.
At this moment access to the archived material is only given at the legal deposit
libraries. It should be possible to find a solution where material could be freely
accessible from the archive with the owner's permission, at least after a period. In the
selective collection model such permission could be given at the notification time and
in the harvester model this could be done by adding a simple tag to the web page.
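
How a harvester might read such a permission tag can be sketched as follows. No tag has been defined, so the name `archive-access` and the value `public` are purely hypothetical, as are the class and function names; the sketch only shows that the check is cheap to add to a harvesting run.

```python
from html.parser import HTMLParser


class AccessTagParser(HTMLParser):
    """Looks for a hypothetical <meta name="archive-access" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.access = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "archive-access":
            self.access = (attributes.get("content") or "").lower()


def may_publish(html: str) -> bool:
    """True only if the page owner has explicitly allowed free access."""
    parser = AccessTagParser()
    parser.feed(html)
    return parser.access == "public"
```

Pages without the tag stay restricted to the reading rooms, so the default remains the present legal situation.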


Copenhagen June 2001,
Birgit N. Henriksen,
bnh@kb.dk

								