FRANCE

The Bibliothèque nationale de France (BnF) has been active since 2000 in developing a combined methodology including:

- Automatic large-scale domain crawls several times a year
- Continuous crawl of automatically selected sites (10% of the total)
- Deposit of deep Web sites that cannot be harvested online
- Thematic event-based collection for very ephemeral sites (which crawlers would take too long to find)

In collaboration with INRIA (the French National Institute for Research in Computer Science and Automatic Control), the BnF has been experimenting with Web collection techniques using automatic harvesting and evaluation methods. A test of automatic selection parameters based on in-link ranking (as used by Google) has been made. The initial phase of the program, which ran until June 2001, focused on content gathering: 16 audiovisual sites were collected with small robots to assess the limits of these crawlers. In 2002, two harvests were made:

- a thematic collection of sites relating to the French elections (1,900 sites)
- a comprehensive crawl of the .fr Web

In a second pilot on methods for archiving hidden Web sites, which ran from 2002 to June 2003, 100 selected Web site owners were approached to deposit their Web sites with the BnF for permanent archiving.

Legal Deposit

The Loi du 20 juin 1992 relative au dépôt légal (law of 20 June 1992 on legal deposit), which revised the legal deposit legislation, came into force in 1994. It requires legal deposit of printed, graphic, photographic, sound, audiovisual and multimedia documents, whatever the technical means of production, as soon as they are made accessible to the public through publication on a physical carrier. The legislation does not cover online electronic publications. An extension of the French legal deposit law to networked digital material is expected to be voted on by the French Parliament in 2004.
It will allow the BnF and INA (Institut National de l'Audiovisuel, which is in charge of TV and radio preservation) to harvest sites online, to request a deposit from publishers when online harvesting is impossible, and to make the collection publicly available on site.

Project

There is no specific project at the moment.

Software

A portable extraction tool (DeepArc) was developed to enable producers to carry out simple extraction of databases to XML. This tool, currently being tested by IIPC members, will be released as open source soon. It is envisaged that harvester technology may be employed as a discovery tool to identify deep Web sites of interest for collection, which would then be transferred to the BnF after technical negotiation with site owners. Web crawling is based on techniques developed at INRIA for Xyleme, a project concerned with the creation of large dynamic data warehouses of XML data (http://www.xyleme.com). Like other robots, the Xyleme crawler is designed to retrieve HTML and XML pages from the Web to a place where they can be monitored and stored. A key feature of Xyleme is that certain pages can be refreshed regularly.
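The Xyleme crawler itself is not publicly documented here, but the per-page refresh behaviour described above can be sketched as a simple scheduler in which each tracked URL carries its own revisit interval. This is an illustrative sketch only; the class, the URLs and the intervals are hypothetical, not part of Xyleme:

```python
import time
from dataclasses import dataclass


@dataclass
class PageEntry:
    """One tracked URL with its own refresh policy (hypothetical example)."""
    url: str
    interval: float          # seconds between re-crawls
    last_fetch: float = 0.0  # epoch time of last fetch (0.0 = never fetched)


class RefreshScheduler:
    """Minimal sketch of a crawl queue in which selected pages are
    re-fetched at regular, per-page intervals, as in Xyleme's
    'certain pages can be refreshed regularly' behaviour."""

    def __init__(self):
        self.pages = {}

    def track(self, url, interval):
        """Register a URL with its desired refresh interval in seconds."""
        self.pages[url] = PageEntry(url, interval)

    def due(self, now=None):
        """Return the URLs whose refresh interval has elapsed."""
        now = time.time() if now is None else now
        return [p.url for p in self.pages.values()
                if now - p.last_fetch >= p.interval]

    def mark_fetched(self, url, now=None):
        """Record that a URL was just fetched."""
        self.pages[url].last_fetch = time.time() if now is None else now


# Example: a frequently changing page vs. a stable one (fictional URLs)
sched = RefreshScheduler()
sched.track("http://example.fr/news", interval=3600)      # refresh hourly
sched.track("http://example.fr/archive", interval=86400)  # refresh daily
sched.mark_fetched("http://example.fr/news", now=0)
sched.mark_fetched("http://example.fr/archive", now=0)

# Two hours later, only the hourly page is due for a re-crawl
print(sched.due(now=7200))  # → ['http://example.fr/news']
```

A real crawler would combine such a schedule with politeness rules and change detection; the point here is only that refresh frequency is a per-page property rather than a single global setting.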