Approved August 2012
The Live Archiving Proxy (LAP) is an HTTP proxy that captures the traffic flowing through it. The LAP delegates the handling of the captured data to one or more writers using a simple network protocol. Writers exist for the DAFF, WARC and ARC formats. Using an HTTP proxy for Web archiving makes it possible to crawl with any HTTP client (Heritrix, PhantomJS, HTTrack, Scrapy, etc.) while keeping a unified and simple storage backend. The LAP is designed to be high-performance, easy to use and archive-format agnostic. It will run on any 64-bit Linux system.
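Because capture happens at the proxy, a crawler needs no archiving logic of its own: it only has to send its requests through the LAP. The sketch below shows one way a minimal Python client might do that with the standard library. The proxy address (`localhost:8080`) and the helper names `build_lap_opener` and `fetch_via_lap` are assumptions made for illustration; they are not part of the LAP distribution, and the actual listening address depends on how the proxy is deployed.

```python
import urllib.request

# Assumed address of a running proxy instance; adjust to the real deployment.
LAP_HOST = "localhost"
LAP_PORT = 8080


def build_lap_opener(host: str = LAP_HOST, port: int = LAP_PORT) -> urllib.request.OpenerDirector:
    """Return an opener that routes plain-HTTP requests through the proxy.

    Hypothetical helper: only the "http" scheme is mapped, since the
    description above only mentions HTTP traffic.
    """
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_lap(url: str, opener=None, timeout: float = 30.0) -> bytes:
    """Fetch a URL through the proxy and return the response body.

    The client does nothing archive-specific: whatever the proxy captures
    is handed to its writers independently of this code.
    """
    if opener is None:
        opener = build_lap_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

With a proxy listening at the assumed address, calling `fetch_via_lap("http://example.org/")` would return the page body to the client as usual. The same idea applies to the crawlers named above, each of which has its own way of configuring an HTTP proxy.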
- A standalone package that contains the Live Archiving Proxy software (via GitHub);
- The source code of the Live Archiving Proxy (via GitHub or SVN);
- A generic Writer Plugin library in Java (via Maven);
- The source code of the generic Writer Plugin library (via Maven);
- Documentation for the Live Archiving Proxy.
- A WARC Writer Plugin library in Java;
- The source code of the WARC Writer Plugin library;
- Documentation for the WARC Writer Plugin.