A monitoring system for Heritrix 3.

Monitrix is a monitoring/analytics frontend for the Heritrix 3 Web crawler. There are two prototypes:

  • Prototype 1: A dedicated web application.
  • Prototype 2: A version built on the generic log visualisation platform, Kibana.

We are moving towards exclusively using the Kibana version. It is based on this ELK Docker image, but we add configuration so Logstash knows how to parse and store Heritrix logs.
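To give a feel for what "parsing Heritrix logs" involves, here is a minimal Java sketch that splits one crawl.log line into fields. It is not the Logstash configuration used by this project. It assumes the line is whitespace-separated and starts with the timestamp, fetch status code, document size and URI, as described in the Heritrix documentation. The class name CrawlLogLineSketch and the sample line in the usage note are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: holds the leading fields of one Heritrix crawl.log line. */
public class CrawlLogLineSketch {
    public final String timestamp;
    public final int statusCode;               // Heritrix also uses negative codes for errors
    public final String size;                  // kept as text because it may be "-"
    public final String uri;
    public final List<String> remainingFields; // everything after the URI, unparsed

    private CrawlLogLineSketch(String timestamp, int statusCode, String size,
                               String uri, List<String> remainingFields) {
        this.timestamp = timestamp;
        this.statusCode = statusCode;
        this.size = size;
        this.uri = uri;
        this.remainingFields = remainingFields;
    }

    /** Splits a line on runs of whitespace and reads the first four columns. */
    public static CrawlLogLineSketch parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 4) {
            throw new IllegalArgumentException("Expected at least 4 fields: " + line);
        }
        return new CrawlLogLineSketch(
                f[0],
                Integer.parseInt(f[1]),
                f[2],
                f[3],
                Arrays.asList(f).subList(4, f.length));
    }
}
```

For example, CrawlLogLineSketch.parse("2013-02-01T10:15:30.123Z 200 1234 http://example.org/ - - text/html") yields an entry whose statusCode is 200 and whose uri is http://example.org/.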

Visit the Wiki for more information.

About The Dedicated Web Application

Developers: Quick Start

To start monitrix in development mode, change into the project root folder and type play run. The application will then be available at http://localhost:9000.

To generate an Eclipse project, type play eclipse.

Getting Data into monitrix

To load data into monitrix, open the 'Admin' section and enter the absolute path of a log file in the form. The log file should immediately appear in the list above the form with status 'CATCHING UP' (or 'PENDING', followed shortly afterwards by 'CATCHING UP'). monitrix then loads the log file into the database. Once the initial load is complete, monitrix keeps checking the log file for updates.

Warning: loading data takes time. On my machine, a 10 GB log sample takes about 1 hour to process.

Alternatively, you can populate the database 'manually' using either of the following Java utilities, located in the /test folder of the project:

  • uk.bl.monitrix.util.BatchLogProcessor will load a log file into the database in one go and then terminate.
  • uk.bl.monitrix.util.IncrementalLogProcessor will load a log file into the database and then continue to monitor that file (incrementally syncing the DB) until the process is killed.
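
The difference between the two utilities is essentially "read once" versus "read, then keep polling for appended lines". The following Java sketch shows the polling idea in isolation. It is not the code of IncrementalLogProcessor; the class name LogTailSketch and its method are hypothetical, and writing the lines to a database is left out.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: remembers how far a log file has been read. */
public class LogTailSketch {

    /** Byte offset of the first byte that has not been consumed yet. */
    private long offset = 0;

    /**
     * Returns the complete lines appended since the previous call.
     * A trailing partial line (no newline yet) is left for the next call.
     * All new lines are collected in memory, which is fine for a sketch
     * but not for catching up on a multi-gigabyte file in one call.
     */
    public List<String> readNewLines(Path logFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(logFile, StandardOpenOption.READ)) {
            if (channel.size() < offset) {
                offset = 0; // file was truncated or replaced: start over
            }
            channel.position(offset);
            InputStream in = new BufferedInputStream(Channels.newInputStream(channel));
            ByteArrayOutputStream current = new ByteArrayOutputStream();
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines.add(current.toString("UTF-8"));
                    consumed += current.size() + 1; // line bytes plus the newline
                    current.reset();
                } else {
                    current.write(b);
                }
            }
            offset += consumed; // bytes of an unfinished last line are not counted
        }
        return lines;
    }
}
```

A batch load corresponds to calling readNewLines once and exiting. An incremental load corresponds to calling it in a loop with a pause between calls, processing whatever lines each call returns.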