vulekamali automated data checks

Automated checks to ensure data meets the expectations of OpenSpending and the vulekamali system before going through the upload and data build process.

The aim is to identify issues early, so that they are quicker to fix when adding new data to the site.

This repository describes the datasets and the checks that should be run against them.

Each dataset is described as a Data Package by a file named datapackage.json, which lives in that dataset's own directory under the datapackages directory.

Each dataset should be in a directory specific to the type of data release (e.g. epre, ene, aene, aepre, annual-report) under a directory for the specific financial year that is the focus of the release (e.g. 2018-19). For example:

datapackages
├── 2018-19
│   └── epre
│       └── datapackage.json
└── 2019-20
    └── ene
        └── datapackage.json

datapackage.json files are automatically discovered and checked against the schema they refer to whenever changes are submitted to this repository on GitHub.com in a Pull Request.
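As a rough illustration of the discovery step, the sketch below walks a directory tree and reports the financial year and release type of each datapackage.json it finds. It is not the repository's actual implementation: the function names find_datapackages and describe are hypothetical, and it assumes the year/release-type layout shown above.

```python
import tempfile
from pathlib import Path


def find_datapackages(root):
    """Return every datapackage.json below root, in sorted order."""
    return sorted(Path(root).rglob("datapackage.json"))


def describe(path, root):
    """Split a descriptor path into (financial_year, release_type).

    Assumes the <year>/<release-type>/datapackage.json layout.
    """
    year, release_type = Path(path).relative_to(root).parts[:2]
    return year, release_type


# Build a throwaway tree that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "datapackages"
    for year, release in [("2018-19", "epre"), ("2019-20", "ene")]:
        folder = root / year / release
        folder.mkdir(parents=True)
        (folder / "datapackage.json").write_text("{}")

    found = [describe(p, root) for p in find_datapackages(root)]

print(found)
```

Deriving the year and release type from the path keeps the directory names as the single source of that information, which is why the folder naming matters.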

Additionally, some custom checks are run depending on the type of dataset (e.g. epre, ene, etc.) that was uploaded.
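One way to picture type-specific checks is a small registry that maps each release type to the checks that apply to it. This is only a sketch under assumptions: CHECKS, check_for, has_resources and run_custom_checks are hypothetical names, and the single check shown is illustrative rather than one this repository is known to run.

```python
# Hypothetical registry: release type -> list of check functions.
CHECKS = {}


def check_for(*release_types):
    """Decorator that registers a check for the given release types."""
    def register(func):
        for release_type in release_types:
            CHECKS.setdefault(release_type, []).append(func)
        return func
    return register


@check_for("epre", "aepre")
def has_resources(descriptor):
    """Illustrative check: the descriptor should list at least one resource."""
    return [] if descriptor.get("resources") else ["no resources listed"]


def run_custom_checks(release_type, descriptor):
    """Run every check registered for release_type and collect error messages."""
    errors = []
    for check in CHECKS.get(release_type, []):
        errors.extend(check(descriptor))
    return errors


print(run_custom_checks("epre", {}))
print(run_custom_checks("ene", {}))
```

Release types with no registered checks simply return an empty list, so adding a new type does not require touching the runner.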

Data owners adding new datasets to be checked

Create a folder for the dataset in the appropriate financial year, named after the type of release (see above).
Upload the dataset somewhere that provides a URL which will remain available, e.g. the Treasury website.
Copy an existing datapackage.json into your dataset's folder.
Update the URL to your dataset's public URL, and change any references to other financial years to the correct one.
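The last step is easy to get partly wrong after copying a file from another year. The sketch below shows one way to spot leftovers before opening a Pull Request. It assumes the dataset URL is stored as a string in each resource's path field, as in the Data Package specification; the function names and the sample descriptor are hypothetical.

```python
import json
import re
from urllib.parse import urlparse

# Matches financial years written like 2018-19.
YEAR_PATTERN = re.compile(r"\b\d{4}-\d{2}\b")


def stale_years(descriptor_text, expected_year):
    """Return financial years mentioned in the text other than expected_year."""
    return sorted(set(YEAR_PATTERN.findall(descriptor_text)) - {expected_year})


def non_url_paths(descriptor):
    """Return resource paths that are not absolute http(s) URLs.

    Assumes each resource path is a single string.
    """
    bad = []
    for resource in descriptor.get("resources", []):
        path = resource.get("path", "")
        if urlparse(path).scheme not in ("http", "https"):
            bad.append(path)
    return bad


# Hypothetical descriptor copied from 2018-19 and only partly updated.
text = json.dumps({
    "name": "epre-2019-20",
    "title": "EPRE 2018-19",
    "resources": [{"name": "epre", "path": "epre-2019-20.csv"}],
})

leftover_years = stale_years(text, "2019-20")
local_paths = non_url_paths(json.loads(text))
print(leftover_years, local_paths)
```

Here the title still mentions the previous year and the resource path is a local file name rather than a public URL, so both helpers report something to fix.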

Developer setup:

Python version: See .travis.yml

Use a Python virtual environment to manage dependencies.

Install dependencies:

pip install -r requirements.txt

Running locally:

pip install -e .
python bin/run-checks.py # Looks for datapackage.json files in the datapackages directory
python bin/run-checks.py <directory-path> # Looks for datapackage.json files in directory-path

Running the tests:

python -m unittest
