Working Groups

To achieve its mission, the consortium sets up dedicated committees, made up of members from some of the participating libraries, that work on specific topics and provide the consortium with various deliverables. A technical committee supervises and runs projects that affect the overall Web archiving framework and technical architecture. It guarantees convergence and consistency of standards and practices in areas such as harvesting, access and preservation.

In 2009, the chartered working groups are Harvesting, Access, and Preservation.

Harvesting

The Harvesting Working Group’s primary focus is the development of web harvesting technologies, particularly around the Internet Archive’s Heritrix web crawler. Areas of focus include:

  • Development of a smart crawler and improving harvesting performance
  • Development and support of the WARC file format
  • Best practices and databases for sharing crawl information in bulk or selective harvesting
  • Feature requests for the crawler
  • Harvesting the deep web
  • Harvesting video and streaming media

Access

The Access Working Group will focus on the initiatives, procedures and tools required to provide immediate access, and to preserve future access, to Internet material in a Web archive. Focus areas include:

  • Defining user requirements to improve existing access tools such as the Open Source Wayback Machine (OSWM)
  • Testing full text indexing with NutchWax
  • Defining requirements for user authentication/authorization/access controls
  • Best practices and sharing experiences with Web archive researchers and end users
  • Access tools for the analysis of the content of the archived internet material
  • Access tools for the analysis of the structure of the archived internet material
  • Technology watch for innovative ways of accessing web archives

Preservation

The IIPC Preservation Working Group is looking at policy, practices and resources in support of preserving the content and accessibility of Web archives. Over the past decade, great attention has been paid to the processes of capturing online resources as a necessary step in their preservation; however, work on maintaining accessibility for the long term remains relatively undeveloped. At the same time, many approaches have been proposed and implemented for other kinds of digital collections. The Preservation Working Group aims to understand and report on how such approaches might be used with Web archives, as well as on the special characteristics of Web archives that might require new approaches. Following its Canberra and London meetings in 2008, the Preservation Working Group defined a set of work packages related to the following topics, which are the core of its current activity:

  • Discussion of Preservation Objectives
  • Preservation Skills Development
  • Preservation Strategies
  • Environmental Scan of Technical Environment
  • WARC Issues
  • Metadata
  • Workflows

2007-2008

In 2007 and 2008, there was a Standards and Interoperability Working Group, focused on the WARC standardization process. Once that process was finalized, the Standards WG was phased out. Because it will be focusing on interoperability issues, the IIPC Technical Committee is expected to address most of the issues previously in scope of the Standards and Interoperability Working Group.

2003-2006

During 2003-2006, the following working groups were chartered:

