Reports

The reports provide up-to-date summary statistics on the dataset, present the test sets we have compiled and annotated ourselves, and allow the results of the pipeline to be verified and improved. This readme gives an overview of the relevant items. Several summary tables serve as input for the general pipeline and are placed in the config/ folder. The relevant files are also linked in this readme.

Main overview

main_descriptives.pdf - A live report with up-to-date summary statistics collected from the curated dataset.

Place names

Overviews:

Build tables:

Test sets:

Publishers

Overviews:

Build tables:

Test sets:

Improving the pipeline

The processing algorithms are built to provide an efficient and semi-automated solution to data enrichment. While we have done our best to reduce errors, there is always room for improvement.

The two main ways to improve the pipeline are to refine the rules underlying the algorithms or to manually correct the output tables. Both options help development and improve the quality for all users.

If you find errors or room for improvement, you can 1) edit the summary tables in the repository and send a pull request, 2) edit the rules behind the algorithms and send a pull request, or 3) contact the maintainers by e-mail (current address krister.kruusmaa@tlu.ee) or by creating an issue on GitHub.
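
To illustrate what a manual correction to a summary table amounts to, here is a minimal Python sketch. It is not the pipeline's own code: the function `apply_corrections`, the column name `place`, and the sample rows are all hypothetical, and the actual tables in config/ may be structured differently.

```python
import csv
import io


def apply_corrections(rows, corrections, column):
    """Return copies of `rows` with values in `column` replaced per `corrections`.

    `rows` is a list of dicts (e.g. from csv.DictReader); `corrections` maps
    an uncorrected value to its corrected form. The input rows are not modified.
    """
    corrected = []
    for row in rows:
        new_row = dict(row)
        value = new_row.get(column)
        if value in corrections:
            new_row[column] = corrections[value]
        corrected.append(new_row)
    return corrected


# Hypothetical sample data; real table names and columns are assumptions.
SAMPLE_CSV = """id,place
1,Tallinn
2,Reval
3,Tartu
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
corrections = {"Reval": "Tallinn"}  # hypothetical manual correction
fixed = apply_corrections(rows, corrections, column="place")
```

Keeping corrections in a separate mapping, rather than editing values in place, makes each change easy to review in a pull request.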