Member Archives

Archive List

Alphabetically (All archives are open onsite and offsite (via the Internet) unless otherwise noted after the archive name.)

Bibliotheca Alexandrina's Internet Archive
  • Collecting institution: Bibliotheca Alexandrina - http://www.bibalex.org
  • Start Date: 1996
  • Archive interface language(s): English
  • Access methods: URL Search
  • Harvesting methods: Bulk
Visit archive

Bibliothèque nationale de France - Archives de l'Internet (Bibliothèque nationale de France Web Archives) - (Access is open to researchers onsite at the institution)
  • Collecting institution: Bibliothèque nationale de France (National Library of France) - http://www.bnf.fr
  • Start Date: 2002
  • Archive interface language(s): French
  • Access methods: URL Search, Keyword Search, Full-Text Search, Topical Collections
  • Harvesting methods: National Domain, Bulk, Selective, Event, Thematic
  • Description: Since 2006, BnF has shared with INA responsibility for the legal deposit of French online publications and web material at large. BnF's web archiving program started in 2002 with the first snapshots of election websites, then continued from 2004 with a five-year partnership with the Internet Archive, which included annual crawls of the French domain and the acquisition of historical collections. Today, BnF runs both domain and selective crawls internally.

    As of 2010, the BnF archive consists of ca. 180 TB of data (13 billion files) from 1996 to the present. The scope of this collection is the French web (.fr and beyond), and it combines domain, thematic and event harvests. Special collections include a range of national, local and European election harvests, along with topical collections such as online diaries, blogs and literary websites, or activist websites documenting the social history of the Web. 85 curators contribute to the selection of seeds, forming collections in most areas of knowledge, in line with BnF's encyclopedic heritage.

    Due to legal restrictions, BnF web archives can only be searched and browsed by researchers within the library premises in Paris.
Visit archive

Columbia University Libraries Web Archive
  • Collecting institution: Columbia University Libraries - http://library.columbia.edu/
  • Start Date: 2008
  • Archive interface language(s): English
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Full-Text Search, Topical Collections
  • Harvesting methods: Selective, Thematic
  • Description: The Columbia University Libraries web resources collection program archives selected websites in thematic areas corresponding to the Libraries' existing collection strengths, websites produced by affiliates of Columbia University, and websites from organizations or individuals whose papers or records are held in the Libraries' physical archives.
Visit archive

Government of Canada Web Archive / Archives du Web du gouvernement du Canada (Government of Canada Web Archive)
  • Collecting institution: Library and Archives Canada / Bibliothèque et Archives Canada (Library and Archives Canada) - http://www.collectionscanada.ca
  • Start Date: 2005
  • Archive interface language(s): English and French
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Full-Text Search
  • Harvesting methods: Regional Domain
  • Description: The Library and Archives of Canada Act received Royal Assent on April 22, 2004. For the purposes of preservation, it allows Library and Archives Canada (LAC) to collect a representative sample of Canadian websites. To meet its new mandate, LAC began to harvest the web domain of the Federal Government of Canada in December 2005. As resources permit, this harvesting activity will be undertaken on a semi-annual basis. The harvested website data is stored in the Government of Canada Web Archive (GC WA). Client access to the content of the GC WA is provided through searching by keyword, by department name, and by URL. It is also possible to search by specific format type, e.g. .pdf. At the time of its launch in Fall 2007, approximately 100 million digital objects (over 4 terabytes) of archived Federal Government website data were made accessible via the LAC website. The GC WA currently contains over 170 million digital objects and more than 7 terabytes of data.
Visit archive

Harvard's Web Archiving Collection Service (WAX)
  • Collecting institution: Harvard University Library - http://hul.harvard.edu/
  • Start Date: 2009
  • Archive interface language(s): English, Japanese
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Full-Text Search, Topical Collections
  • Harvesting methods: Selective
  • Description: WAX is part of the Harvard University Library's central infrastructure for the capture, management, storage, preservation and display of web sites for long-term archiving.
Visit archive

Hrvatski arhiv weba (Croatian Web Archive (HAW))
  • Collecting institution: National and University Library of Croatia - http://www.nsk.hr/digarhiv
  • Start Date: 2004
  • Archive interface language(s): Croatian, soon in English as well
  • Access methods: URL Search, Keyword Search
  • Harvesting methods: Selective
  • Description: The Croatian Web Archive (HAW) is a collection of selected material harvested from the Web. The archive is built to collect and preserve selected Web resources which are part of the Croatian national heritage. Web resources have been subject to legal deposit since 1997. Resources archived in the HAW are an integral part of the National and University Library collection and can be searched through the Library's online catalogue (http://katalog.nsk.hr/) and at HAW's URL: http://haw.nsk.hr/
Visit archive

Ina (Institut National de l'Audiovisuel) (Ina) - (Access is open to researchers onsite at the institution)
  • Collecting institution: Ina (Institut National de l'Audiovisuel) (Ina) - http://www.ina.fr
  • Start Date: 2009
  • Archive interface language(s): French
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Subject Browsing, Topical Collections
  • Harvesting methods: Selective, Thematic
  • Description: Since February 2009, Ina has carried out focused and selective archiving of websites related to audiovisual media. A core list of about 5000 web sites is regularly updated and enriched. They are crawled on a daily basis. Access will shortly be available on site at the Ina consultation centre, which is hosted within the research library of the François-Mitterrand site of the BnF.
Visit archive

Internet Archive
  • Collecting institution: Internet Archive - http://www.archive.org
  • Start Date: 1996
  • Archive interface language(s): English
  • Access methods: URL Search, Topical Collections
  • Harvesting methods: National Domain, Regional Domain, Bulk, Selective, Event, Thematic
  • Description: The Internet Archive is a non-profit organization that is compiling a historic database of Web sites and other digital content. IA's web archives now exceed 2 PB of data (compressed) and encompass over 150 billion captures collected from 1996 to the present, culled from every domain, over 200 million web sites and 40+ languages. This archival database expands with the Internet, and so grows by nearly 100 TB (compressed) every month. Usage of IA's web collections via the Wayback Machine averages 400-500 requests per second.

    Recently, the Internet Archive invested in Sun's open storage server and software technologies, specifically a Sun Modular Datacenter (Sun MD), installed at Sun's Santa Clara campus, supported by the Sun MD remote monitoring service.

    The new Sun MD was installed in March 2009. It is equipped with 60 Sun Fire X4500 (Thumper) Open Storage Systems that run the Solaris 10 OS, including the Solaris ZFS file system. Sun's servers with Solaris ZFS storage pools enabled the Internet Archive to double the storage capacity of its old system while using up to 50 percent less power than other servers would use.

    Sun engineers monitor power, heating and cooling, fire, smoke, and water detection, and physical access points, and dispatch repair technicians, if necessary. IA Engineers manage the repository software, archival data, and access services provided to researchers and the general public.
Visit archive

Íslenska vefsafnið (The Icelandic Web Archive)
  • Collecting institution: Landsbókasafn Íslands - Háskólabókasafn (National and University Library of Iceland) - http://www.landsbokasafn.is/
  • Start Date: 2004
  • Archive interface language(s): Icelandic, Limited English translation
  • Access methods: URL Search
  • Harvesting methods: National Domain, Selective, Event
  • Description: The Icelandic Web Archive contains all web sites hosted on the Icelandic domain .is and many web sites hosted elsewhere that are in Icelandic or refer directly to matters of interest to Iceland.
    Access to the complete Web Archive is open to the world, except for web sites where the user must pay for access and web sites that have been closed at the owner's request.
    The .is domain has been harvested since October 2004, and the policy is to harvest the complete .is domain three times a year. In addition, selected web sites are harvested at least weekly, and relevant web sites are harvested for national events such as elections.
Visit archive

Kansalliskirjaston verkkoarkisto (Finnish Web Archive) - (Other)
  • Collecting institution: Kansalliskirjasto (The National Library of Finland) - http://www.kansalliskirjasto.fi
  • Start Date: 2006
  • Archive interface language(s): Finnish / Swedish / English
  • Access methods: URL Search, Full-Text Search
  • Harvesting methods: National Domain, Regional Domain, Event, Thematic
  • Description: Each year, the National Library of Finland collects a representative sample of web pages from web servers that 1) have .fi or .ax domain names, 2) reside physically within Finland, or 3) contain material that is targeted at the Finnish public.

    Between 2006 and 2009, the National Library of Finland collected about 160 million files and over 10 TB of data (uncompressed) from the Internet.

    ARCHIVE'S AVAILABILITY:
    The contents of the archive can only be accessed from special legal deposit workstations that are available in selected libraries within Finland (including the National Library of Finland).

    Anyone can use the archive, but digital copying of material from the archive is prohibited.
Visit archive

Library of Congress Web Archive
  • Collecting institution: Library of Congress - http://www.loc.gov
  • Start Date: 2000
  • Archive interface language(s): English
  • Access methods: URL Search, Alphabetic Browsing, Subject Browsing, Topical Collections
  • Harvesting methods: Selective, Event, Thematic
  • Description: The Library of Congress Web Archives (LCWA) is composed of collections of archived web sites selected by subject specialists to represent web-based information on a designated topic. It is part of a continuing effort by the Library to evaluate, select, collect, catalog, provide access to, and preserve digital materials for future generations of researchers.
Visit archive

Netarkivet.dk (Netarchive.dk)
  • Collecting institution: Netarchive.dk (Royal Library and the State and University Library, Aarhus) - http://www.netarchive.dk
  • Start Date: 2005
  • Archive interface language(s): Danish and English
  • Access methods: URL Search
  • Harvesting methods: National Domain, Bulk, Selective, Event
  • Description: The legal foundation for Netarchive.dk is the Act on Legal Deposit of Published Material of 22 December 2004. In order to collect the Danish internet as completely as possible, three different strategies are followed:
    1) Bulk harvesting (snapshots) 4 times a year
    2) Selective harvesting of 80 sites that are updated often and are of special importance to society (e.g. news sites)
    3) Event harvesting (e.g. national and local elections).
    Access to the archive is restricted to research purposes.
Visit archive

Nettarkivet Norge (Web Archive Norway) - (Everything is in a dark archive but Web archiving or other staff can see the content)
  • Collecting institution: Nasjonalbiblioteket (The National Library of Norway) - http://www.nb.no
  • Start Date: 2001
  • Archive interface language(s): Norwegian
  • Access methods: Keyword Search
  • Harvesting methods: National Domain, Event
  • Description: The Act relating to the Legal deposit of generally available documents, which came into force in 1990, is the National Library's foundation with regard to web harvesting. Different harvesting approaches have been followed since 2001: 1) Selective harvesting of web sites 2001-2004 and from 2009. 2) Domain crawls once or twice a year since 2002. 3) Event harvesting since 2001, for events of national interest, such as general and local elections, royal weddings etc. Due to privacy protection, access to the Web Archive is restricted for the time being.
Visit archive

New Zealand Web Archive
  • Collecting institution: National Library of New Zealand - http://www.natlib.govt.nz
  • Start Date: 1999
  • Archive interface language(s): English
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Subject Browsing
  • Harvesting methods: Selective
  • Description: The New Zealand Web Archive forms part of the Alexander Turnbull Library's collection within the National Library of New Zealand.

    Access to websites is available by searching the National Library's online catalogue and then clicking on the link to the archived copy.
Visit archive

OASIS
  • Collecting institution: National Library of Korea - http://www.nl.go.kr
  • Start Date: 2005
  • Archive interface language(s): Korean
  • Access methods: URL Search, Keyword Search, Subject Browsing
  • Harvesting methods: Selective
  • Description: "Online Archiving & Searching Internet Sources (OASIS)" is a project designed to acquire online resources, such as web sites and web documents.
Visit archive

PANDORA Australia's Web Archive
  • Collecting institution: National Library of Australia - http://www.nla.gov.au
  • Start Date: 1996
  • Archive interface language(s): English
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Full-Text Search, Subject Browsing
  • Harvesting methods: Selective, Event
  • Description: PANDORA is a selective archive with a broad coverage of web materials relating to the social, cultural, political and intellectual life of Australia and Australians. It includes government sites, blogs, organisational sites, examples of commercial sites, some online newspapers and collections relating to events such as elections.
Visit archive

Patrimoni Digital de Catalunya (PADICAT) (Digital Heritage of Catalonia (PADICAT))
  • Collecting institution: Biblioteca de Catalunya (Library of Catalonia) - http://www.bnc.cat
  • Start Date: 2005
  • Archive interface language(s): Catalan / Spanish / English
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Subject Browsing, Topical Collections
  • Harvesting methods: Regional Domain, Bulk, Selective, Event, Thematic
  • Description: PADICAT is a repository destined to collect and preserve the entire cultural, scientific and general output of Catalonia in digital format, that is, to preserve Catalan websites and to guarantee their open and permanent access.

    The Biblioteca de Catalunya (Library of Catalonia), which is the national library of Catalonia, initiated the PADICAT project in June 2005, with the technological collaboration of the Centre de Supercomputació de Catalunya (CESCA) and the support of the Secretaria de Telecomunicacions i Societat de la Informació de la Generalitat de Catalunya.

    The aim of the project is to acquire, preserve and make available knowledge and information on the Internet of the day for coming generations and to create the Web archive of Catalonia.
Visit archive

Spletni arhiv Slovenije (Webarchive of Slovenia) - (Everything is in a dark archive but Web archiving or other staff can see the content)
  • Collecting institution: National and University Library of Slovenia - http://www.nuk.uni-lj.si
  • Start Date: 2007
  • Archive interface language(s): Slovenian
  • Access methods: URL Search, Alphabetic Browsing
  • Harvesting methods: Selective
 

The UK Government Web Archive
  • Collecting institution: The National Archives (U.K.) - http://www.nationalarchives.gov.uk
  • Start Date: 1997
  • Archive interface language(s): English and Welsh
  • Access methods: URL Search, Alphabetic Browsing
  • Harvesting methods: Event, Thematic
  • Description: Archive of UK Central Government websites
Visit archive

UK Web Archive (UK Web Archive)
  • Collecting institution: British Library - http://www.bl.uk
  • Start Date: 2005
  • Archive interface language(s): English
  • Access methods: URL Search, Alphabetic Browsing, Full-Text Search, Subject Browsing, Topical Collections
  • Harvesting methods: Selective, Event, Thematic
  • Description: The UK Web Archive is a corpus of websites selected by leading UK institutions for their historical, social and cultural significance, for the benefit of researchers. The archive is free to view and has collected over 5,000 selected websites since it was set up in mid-2005.

    The UK Web Archive is provided by the British Library in partnership with the National Library of Wales, JISC and The Wellcome Library. It also contains records contributed by the National Archives and the National Library of Scotland.
Visit archive

Web Archiving Project
  • Collecting institution: National Diet Library, Japan - http://www.ndl.go.jp/
  • Start Date: 2002
  • Archive interface language(s): Japanese
  • Access methods: Keyword Search, Full-Text Search, Topical Collections
  • Harvesting methods: Bulk, Selective
  • Description: Our web archiving project started in FY2002 on an experimental basis and moved into the operational stage in FY2006. We started to collect public institution sites based on the law in FY2010. We use the Web Curator Tool and archive content in WARC format.
Visit archive

Web Archiving Service
  • Collecting institution: California Digital Library - http://www.cdlib.org
  • Start Date: 2003
  • Archive interface language(s): English
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Full-Text Search, Topical Collections
  • Harvesting methods: Selective, Event, Thematic
  • Description: The California Digital Library provides the Web Archiving Service to enable partner institutions to build and publish web archives. Current partners include the University of California campuses, New York University and Stanford University. The archives contain extensive information on the State of California, but also reflect a range of collections such as U.S. Labor Movements, African Elections, Middle East Political Sites and more.
Visit archive

Webarchief van Nederland (Web archive of The Netherlands)
  • Collecting institution: Koninklijke Bibliotheek (National Library of The Netherlands) - http://www.kb.nl
  • Start Date: 2007
  • Archive interface language(s): Dutch
  • Access methods: URL Search, Alphabetic Browsing, Full-Text Search
  • Harvesting methods: Selective
  • Description: As the national library, KB is responsible for collecting, cataloguing and archiving publications issued in the Netherlands. More and more publications, such as websites, are published exclusively in digital form. This digital cultural heritage is under threat of becoming inaccessible in the (near) future. Therefore, KB sees it as its task to collect, archive and provide permanent access to websites.

    KB's selection of Dutch websites is based on its collection policy (Dutch history, language and culture). The selection focusses on websites containing scientific and cultural content. Another area of interest is innovative websites. A subsequent step will be to extend the selection by cooperating with other Dutch knowledge institutions.

    KB uses a selective approach to web archiving for several reasons:
    The .nl domain is enormous. KB prefers to archive a selection of websites in full rather than archive all websites in part.
    KB asks the owner of a website for permission in advance before crawling, archiving and presenting the website.
 

WebArchiv - archiv českého webu (WebArchiv - archive of the Czech web)
  • Collecting institution: Národní knihovna České republiky (National Library of the Czech Republic) - http://www.nkp.cz
  • Start Date: 2000
  • Archive interface language(s): Czech
  • Access methods: URL Search, Subject Browsing
  • Harvesting methods: National Domain, Event
  • Description: The National Library of the Czech Republic has been building the archive of the Czech web since 2000. It deploys a combination of automated large-scale crawls of the TLD (.cz) and harvesting of selected "Czech" websites regardless of the domain. Recently, the library has been experimenting with automated crawls of "Czech" websites outside of the TLD using the WebAnalyzer tool. Regular automated crawls outside of .cz will be launched in 2010. Automated crawls of .cz are conducted once a year (twice since 2009) and selective harvests of about 1500 websites every two months. In addition, the library builds thematic or event-driven collections. The public part of the archive (from the selective harvests) is accessible to anyone online via the Internet, while the rest of the archive is available only to library patrons onsite in the library building.
Visit archive

Webarchiv Schweiz (Web Archive Switzerland) - (Other)
  • Collecting institution: Schweizerische Nationalbibliothek (Swiss National Library) - http://www.nb.admin.ch
  • Start Date: 2008
  • Archive interface language(s): German, French, Italian, English (planned)
  • Access methods: URL Search, Keyword Search, Alphabetic Browsing, Full-Text Search, Subject Browsing, Topical Collections
  • Harvesting methods: Selective, Event, Thematic
  • Description: Web Archive Switzerland is a project undertaken in collaboration with the Swiss cantonal libraries as part of the e-Helvetica programme at the Swiss National Library. The goal of the Web Archive Switzerland project is to set up a collection of selected regional and cultural Swiss websites and to preserve them in the Digital Archive of the Swiss National Library. In collaboration with several cantonal libraries, we have designed a shared workflow for the collection, cataloguing, archiving and dissemination of the websites. Standards are used to harvest the websites and to ingest them into the Digital Archive (Heritrix, webform), to describe the websites in the catalogue (MARC, AACR2, CONSER) and to identify the websites within the Digital Archive (URN on the basis of NBN). Web Archive Switzerland has been operational since May 2008. Nevertheless, it is not yet open to the public. Access is planned to be provided within the Swiss National Library by the end of 2010.
Visit archive

Webarchiv Österreich (Webarchive Austria) - (Other)
  • Collecting institution: Österreichische Nationalbibliothek (Austrian National Library) - http://www.onb.ac.at/
  • Start Date: 2008
  • Archive interface language(s): German
  • Access methods: URL Search, Topical Collections
  • Harvesting methods: National Domain, Bulk, Selective, Event, Thematic
  • Description: The new Austrian Media Law became operative in March 2009. This amendment to the law is the legal basis for web archiving and governs the collection of online publications.
    In principle the webpages with the domain .at and pages that have a specific connection with Austria (for example, the Austrian Cultural Institute in New York) are to be collected.
    The Austrian National Library will start the web archiving with a pilot phase that will then become a permanent service in 2010. Access will be possible for anyone on site at the Austrian National Library and approx. 20 other libraries in Austria.
Visit archive



© 2004-2011 IIPC