  • Live Archiving HTTP Proxy

    Status: Past Project

    Project Leaders: Institut National de l'Audiovisuel (Ina), Netarchive.dk

    Approved August 2012
    The Live Archiving Proxy (LAP) is an HTTP proxy that captures the traffic flowing through it. The LAP delegates the handling of the captured data to one or more writers using a simple network protocol. Writers exist for the DAFF, WARC and ARC formats. Using an HTTP proxy for web archiving allows any HTTP client to be used for crawling (Heritrix, PhantomJS, HTTrack, Scrapy, etc.) while keeping a single, simple storage backend. The LAP is designed to be high-performance, easy to use and archive-format agnostic. It runs on any 64-bit Linux system.
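
    Because capture happens in the proxy, a crawler only has to route its requests through it. The following is a minimal sketch of that idea using Python's standard library. The proxy address `http://localhost:8080` and the helper names `build_lap_opener` and `fetch_via_proxy` are assumptions made for illustration; they are not taken from the LAP documentation.

```python
import urllib.request

# Hypothetical address of a running LAP instance; the real host and port
# depend on how the proxy is deployed.
LAP_PROXY = "http://localhost:8080"


def build_lap_opener(proxy_url: str = LAP_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends plain-HTTP requests through the given proxy."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy_handler)


def fetch_via_proxy(url: str, proxy_url: str = LAP_PROXY, timeout: float = 10.0) -> bytes:
    """Fetch a URL through the proxy and return the response body.

    The client does no archiving itself; any capture is done by the proxy
    that the request passes through.
    """
    opener = build_lap_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

    With a proxy listening at the assumed address, `fetch_via_proxy("http://example.org/")` would return the page body to the caller as usual. The same pattern applies to any client that accepts an HTTP proxy setting, which is what lets different crawlers share one storage backend.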

     
    Project phases
    Phase 1: Ina develops a first prototype of the Proxy to share with partners;
    Phase 2: Netarchive develops the Java WARC writer plugin and tests it with the Proxy prototype;
    Phase 3: Designated partners test the Proxy and WARC writer, report bugs and give feedback;
    Phase 4: The Proxy and WARC writer are modified according to feedback.
     
    Deliverables (Ina)
    • A standalone package that contains the Live Archiving Proxy software (via GitHub);
    • The source code of the Live Archiving Proxy (via GitHub or SVN);
    • A generic Writer Plugin library in Java (via Maven);
    • The source code of the generic Writer Plugin library (via Maven);
    • Documentation for the Live Archiving Proxy.
     
    Deliverables (Netarchive)
    • A WARC Writer Plugin library in Java;
    • The source code of the WARC Writer Plugin library;
    • Documentation for the WARC Writer Plugin.

    Project Closure Report