webis-web-archiver

Source code and scripts for the Webis Web Archiver

Quickstart

You need to have Docker installed.

Then, on a Unix machine:

  • run src-bash/archive.sh for archiving web pages. It will display usage hints.
  • run src-bash/reproduce.sh for reproducing from an archive. It will display usage hints.

The scripts automatically download and run the Docker image (2 GB or more, because it includes many fonts).

For other operating systems, have a look at the shell scripts and adjust their call to docker run accordingly.

Custom user simulation scripts

  • Write a class that extends InteractionScript.

  • You can use the ScrollDownScript as an example, or extend it.

  • The utility class Windows offers static helper methods for frequently used interactions.

  • Compile your script with the binaries in the class path and create a JAR from it.

  • Place the JAR into a directory named "scriptname-1.0.0", where you replace "scriptname" by the name of your script.

  • Create a file "script.conf" with the following content and put it into the same directory:

    script = packages.of.your.ScriptClass
    environment.name = de.webis.java
    environment.version = 1.0.0
    

    where you replace "packages.of.your.ScriptClass" with the fully qualified name of your script class. For the example ScrollDownScript, that would be:

    script = de.webis.webarchive.environment.scripts.ScrollDownScript
    
  • When running archive.sh or reproduce.sh, pass the parent directory that contains the new directory with "--scriptsdirectory" and pass the script name (as in the new directory) with "--script".
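
The directory layout described in the steps above can be prepared with a few shell commands. The following is a minimal sketch, not part of the repository: the script name "myscript", the class name "com.example.MyScript", the parent directory "scripts", and the JAR file name "myscript.jar" are all hypothetical placeholders. The script.conf it writes follows the form of the ScrollDownScript example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names, chosen for illustration only; replace them with your own.
script_name="myscript"                 # name later given with --script
script_class="com.example.MyScript"    # your class that extends InteractionScript
scripts_dir="scripts"                  # directory later given with --scriptsdirectory
jar_file="myscript.jar"                # JAR built from your compiled script

# The script directory must be named "<scriptname>-1.0.0".
script_dir="${scripts_dir}/${script_name}-1.0.0"
mkdir -p "${script_dir}"

# Write script.conf with the three entries described above.
cat > "${script_dir}/script.conf" <<EOF
script = ${script_class}
environment.name = de.webis.java
environment.version = 1.0.0
EOF

# Copy the JAR next to script.conf if it has already been built.
if [ -f "${jar_file}" ]; then
  cp "${jar_file}" "${script_dir}/"
fi

echo "Prepared ${script_dir}"
```

With this layout, "--scriptsdirectory" would point to the "scripts" directory and "--script" would take the script name; the remaining arguments are listed in the usage hints of archive.sh and reproduce.sh.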

Building

Coming soon