Monitrix is a monitoring/analytics frontend for the Heritrix 3 Web crawler. There are two prototypes:
- Prototype 1: A dedicated web application.
- Prototype 2: A version built on the generic log visualisation platform, Kibana.
We are moving towards using the Kibana version exclusively. It is based on an ELK (Elasticsearch, Logstash, Kibana) Docker image, with added configuration so that Logstash knows how to parse and store Heritrix logs.
Visit the Wiki for more information.
About The Dedicated Web Application
- installing monitrix
- using monitrix
- monitrix internals
Developers: Quick Start
To start monitrix in development mode, change into the project root folder and type
The application will be at http://localhost:9000.
To generate an Eclipse project, type
Getting Data into monitrix
To load data into monitrix, open the 'Admin' section and enter the absolute path of a log file in the form. The log file should immediately appear in the list above the form with status 'CATCHING UP' (or 'PENDING', followed shortly by 'CATCHING UP'). monitrix then loads the log file into the database. Once this initial load is complete, monitrix keeps checking the log file for updates.
Warning: Loading data takes time. On my machine, a 10 GB log sample currently takes about 1 hour to process.
Alternatively, you can also populate the database 'manually' using either of the following Java utilities, located in the /test folder of the project:
- uk.bl.monitrix.util.BatchLogProcessor will load a log file into the database in one go and then terminate.
- uk.bl.monitrix.util.IncrementalLogProcessor will load a log file into the database, and then continue to monitor that file (and incrementally sync the DB) until it is terminated forcefully.
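The difference between the two modes can be illustrated with a minimal sketch of incremental log reading. This is not monitrix's actual implementation: the class LogTailSketch and its method readNewLines are hypothetical names invented for this example, and the database write is left out. The sketch only shows the general idea of remembering a byte offset so that each pass picks up the lines appended since the previous pass.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the monitrix code base. */
class LogTailSketch {

    private final String path;

    /** Byte offset of the first log byte that has not been processed yet. */
    private long offset = 0;

    LogTailSketch(String path) {
        this.path = path;
    }

    /** Returns every complete line appended since the previous call. */
    List<String> readNewLines() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);
            while (true) {
                String raw = file.readLine();
                if (raw == null) {
                    break; // reached the end of the file
                }
                long end = file.getFilePointer();

                // Look at the last byte consumed: if it is not a line
                // terminator, the crawler is still writing this line,
                // so leave it for the next call.
                file.seek(end - 1);
                int last = file.read();
                if (last != '\n' && last != '\r') {
                    break;
                }

                // readLine() maps each byte to one char; re-decode as UTF-8.
                lines.add(new String(
                        raw.getBytes(StandardCharsets.ISO_8859_1),
                        StandardCharsets.UTF_8));
                offset = end;
            }
        }
        return lines;
    }
}
```

Under these assumptions, a batch-style run would call readNewLines() once and exit, while an incremental run would call it in a loop (for example with a short Thread.sleep between passes) and hand each batch of lines to the database layer until the process is stopped.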