Automated checks to ensure data meets the expectations of OpenSpending and the vulekamali system before going through the upload and data build process.
This is intended to identify issues early and make them quicker to fix when adding new data to the site.
This repository describes the datasets and the checks that should be run against them.
The datasets are described as Data Packages by files named `datapackage.json`, each in a dataset-specific directory under the `datapackages` directory.
Each dataset should be in a directory named for the type of data release (e.g. `epre`, `ene`, `aene`, `aepre`, `annual-report`) under a directory for the specific financial year that is the focus of the release (e.g. `2018-19`). For example:
```
datapackages
├── 2018-19
│   └── epre
│       └── datapackage.json
└── 2019-20
    └── ene
        └── datapackage.json
```
`datapackage.json` files are automatically discovered and checked against the schema they refer to when changes are uploaded to this repository on GitHub.com in a Pull Request.
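The exact fields a `datapackage.json` must carry depend on the schema it references. As a rough sketch only (the field names and values below are illustrative assumptions, not this project's actual schema), a descriptor pointing at a remotely hosted CSV could be written like this:

```python
import json

# Illustrative descriptor: the real required fields depend on the schema
# this project checks against.
descriptor = {
    "name": "epre-2018-19",
    "title": "Estimates of Provincial Revenue and Expenditure 2018-19",
    "resources": [
        {
            "name": "epre",
            # URL where the uploaded data is consistently available
            "path": "https://example.org/data/epre-2018-19.csv",
            "format": "csv",
        }
    ],
}

# Write the descriptor into the dataset's directory.
with open("datapackage.json", "w") as f:
    json.dump(descriptor, f, indent=2)
```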
Additionally, some custom checks are also run depending on the type of dataset (e.g. `epre`, `ene`, etc.) that was uploaded.
To add a new dataset:

- Create a folder for the dataset in the appropriate financial year, with the right name for the type of release (see above)
- Upload the dataset somewhere that provides a consistently available URL, e.g. the Treasury website
- Copy an existing `datapackage.json` into your dataset's folder
- Update the URL to your dataset's public URL, and any references to other financial years to the correct one
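The last step amounts to a small edit of the copied descriptor. A sketch with the standard library (the `name` and `path` fields and the example values are assumptions about how descriptors are laid out, not the project's actual files):

```python
import json

# Assumed: a descriptor copied from last year's dataset, still pointing at
# the old URL and financial year.
descriptor = {
    "name": "epre-2018-19",
    "resources": [{"path": "https://example.org/data/epre-2018-19.csv"}],
}

# Point the descriptor at the new financial year and public URL.
old_year, new_year = "2018-19", "2019-20"
descriptor["name"] = descriptor["name"].replace(old_year, new_year)
for resource in descriptor["resources"]:
    resource["path"] = resource["path"].replace(old_year, new_year)

print(json.dumps(descriptor, indent=2))
```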
Python version: see `.travis.yml`.
Use a Python virtual environment to manage dependencies.

Install dependencies:

```
pip install -r requirements.txt
```
Running locally:

```
pip install -e .
python bin/run-checks.py                  # looks for datapackage.json files in the datapackages directory
python bin/run-checks.py <directory-path> # looks for datapackage.json files in <directory-path>
```
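The discovery step can be approximated with the standard library. This is a sketch of the behaviour, not the actual implementation in `bin/run-checks.py`:

```python
import os

def find_datapackages(root="datapackages"):
    """Yield the path of every datapackage.json found anywhere under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "datapackage.json" in filenames:
            yield os.path.join(dirpath, "datapackage.json")
```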
Running the tests:

```
python -m unittest
```
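A custom check and its unit test could look like the following. The check itself is a hypothetical example (validating a financial-year label such as `2018-19`), not one of the repository's actual checks:

```python
import re
import unittest

def valid_financial_year(year):
    """True for labels like '2018-19' where the end year follows the start."""
    match = re.fullmatch(r"(\d{4})-(\d{2})", year)
    if not match:
        return False
    start, end = int(match.group(1)), int(match.group(2))
    # The two-digit suffix must be the year after the four-digit start year.
    return (start + 1) % 100 == end

class TestFinancialYear(unittest.TestCase):
    def test_valid_year(self):
        self.assertTrue(valid_financial_year("2018-19"))
        self.assertTrue(valid_financial_year("1999-00"))

    def test_invalid_year(self):
        self.assertFalse(valid_financial_year("2018-20"))
        self.assertFalse(valid_financial_year("2018"))
```

`python -m unittest` discovers and runs `TestCase` classes like this one automatically.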