This repository has been archived by the owner on May 23, 2023. It is now read-only.

DUCT is a Django application that allows users to convert CSV files into a harmonised datastore modelled on the SDMX standard, with a standardised output.


zimmerman-team/DUCT


DUCT: Data Universal Conversion Tool

License: AGPLv3

DUCT is a Django application that allows users to convert CSV files into a harmonised datastore modelled on the SDMX standard, with a standardised output. It provides two different APIs for interfacing with the data: one to convert data (PUT) and one to extract data into your bespoke UI (GET). DUCT uses a Django REST API as its base API and, on top of this, provides GraphQL to connect to your datastore for data modelling, data conversion, data integration and data interfacing.

DUCT has been built as part of Zoom, a data platform for data-informed strategy in combating the AIDS epidemic, in cooperation with Aidsfonds and HumanityX. Aidsfonds works towards ending AIDS in a world where all people affected by HIV/AIDS have access to prevention, treatment, care and support. HumanityX supports organisations in the peace, justice and humanitarian sectors in adopting digital innovations to increase their impact on society.

Auth0

DUCT uses Auth0 for authenticated access to certain endpoints and certain data.
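The Set up steps configure Auth0 through two `.env` variables, `AUTH0_DOMAIN` and `API_IDENTIFIER`. The sketch below assumes DUCT follows the usual Auth0 pattern for protecting a Django API (RS256-signed access tokens checked against the tenant's published keys); it is not DUCT's own code, and the helper names `auth0_jwt_settings` and `bearer_header` are hypothetical. It shows which values a token check typically derives from those two variables and how a client presents its token.

```python
# Hypothetical helpers (not part of DUCT): derive the values a typical Auth0
# JWT check needs from the AUTH0_DOMAIN and API_IDENTIFIER .env variables.
from typing import Dict


def auth0_jwt_settings(auth0_domain: str, api_identifier: str) -> Dict[str, str]:
    # Accept the domain with or without an https:// scheme or trailing slash.
    domain = auth0_domain.replace("https://", "").strip("/")
    return {
        # Auth0 publishes the tenant's public signing keys at this URL.
        "jwks_url": f"https://{domain}/.well-known/jwks.json",
        # Auth0 access tokens carry this issuer (note the trailing slash).
        "issuer": f"https://{domain}/",
        # The API identifier is what Auth0 puts in the token's "aud" claim.
        "audience": api_identifier,
        # Assumption: the API uses Auth0's default RS256 signing algorithm.
        "algorithm": "RS256",
    }


def bearer_header(access_token: str) -> Dict[str, str]:
    # Clients send the Auth0 access token in the Authorization header.
    return {"Authorization": f"Bearer {access_token}"}
```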

Requirements

| Name | Recommended version |
| --- | --- |
| Python | 3.6.5 |
| PostgreSQL | 10.5 |
| virtualenv | 16.1 |
| pip | 8.1 |
| python-dev | -- |
| python3.6-dev | -- |
| libpython-dev | -- |
| libpython3.6-dev | -- |
| rabbitmq | 3.7 |
| libsqlite3-dev | -- |
| tippecanoe | 1.34.6 |
| Supervisor (for deployment) | 3.2 |
| nginx (for deployment) | 1.14 |
| PostGIS | See: installing PostGIS |
| Ubuntu (documentation only covers Ubuntu) | 16.04 |
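Before running the setup scripts, it can help to check which of the requirements above are already available. This is a hypothetical pre-flight sketch, not a DUCT script: the executable names in `TOOLS` (for example `rabbitmq-server` and `psql`) are assumptions about how these packages are usually installed on Ubuntu, and it only checks presence on `PATH`, not versions.

```python
# Hypothetical pre-flight check (not part of DUCT): reports which tools are on
# PATH and whether the running Python interpreter is at least 3.6.
import shutil
import sys
from typing import Dict, Optional

# Assumed executable names for some of the requirements in the table above.
TOOLS = ("psql", "rabbitmq-server", "tippecanoe", "virtualenv", "pip")


def check_requirements(tools=TOOLS) -> Dict[str, Optional[str]]:
    # Map each tool name to its resolved path, or None when it is not found.
    found = {name: shutil.which(name) for name in tools}
    found["python>=3.6"] = sys.executable if sys.version_info >= (3, 6) else None
    return found


if __name__ == "__main__":
    for name, path in check_requirements().items():
        print(f"{name:20} {'OK  ' + path if path else 'MISSING'}")
```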

Set up


  • git clone https://github.com/zimmerman-zimmerman/DUCT.git
  • cd DUCT

If you have Docker installed:

  • Create a file called 'docker_settings.py' in the folder 'DUCT/ZOOM/ZOOM' and add the following settings to it (the email and task settings are used to send a notification email once data mapping is done):

      from ZOOM.settings import *
    
      DATABASES = {
          'default': {
              'ENGINE': 'django.contrib.gis.db.backends.postgis',
              'NAME': 'zoom',
              'USER': 'zoom',
              'PASSWORD': 'zoom',
              'HOST': 'db',
          },
      }
    
      # SEND EMAIL CONFIG
    
      EMAIL_HOST = 'your_email_host'
      EMAIL_PORT = 'your_email_host_port'
      EMAIL_HOST_USER = 'your_email_host_user'
      EMAIL_HOST_PASSWORD = 'your_email_host_password'
      EMAIL_USE_TLS = True
    
      # TASKS
    
      ZOOM_TASK_EMAIL_CONFIRMATION_ENABLE = True
      ZOOM_TASK_EMAIL_SENDER = 'your_email_sender'
      ZOOM_TASK_EMAIL_RECEIVER = 'your_default_email_receiver'
    
      # DOCKER RABBIT MQ
    
      CELERY_BROKER_URL = 'amqp://rabbitmq'
      CELERY_RESULT_BACKEND = 'amqp://rabbitmq'
    

then run:

docker-compose build
docker-compose up

Or you can install manually:

  • sudo sh bin/setup/install_dependencies.sh

  • Run virtualenv <name> -p python3 to create a virtual environment

  • Run source <name>/bin/activate to activate the virtual environment

  • pip install -r ZOOM/requirements.txt

  • sudo sh bin/setup/sync_db.sh

  • sudo sh bin/setup/create_django_user.sh

  • cd ZOOM/scripts

  • If you want geolocations for Netherlands PC4 postcode areas in your DUCT, download the PC4 geo json here and add it to your 'DUCT/ZOOM/geodata/data_backup' folder. Note that the setup project script may take up to 30 minutes longer to finish if this file is present.

  • If you want geolocations for Netherlands PC6 postcode areas in your DUCT, download the PC6 geo json here and add it to your 'DUCT/ZOOM/geodata/data_backup' folder. Note that the setup project script may take a day or more to finish if this file is present.

  • ./setup_project.sh

  • cd ..

  • Create a file called 'local_settings.py' in the folder 'DUCT/ZOOM/ZOOM' and add the following settings to it (they are used to send a notification email once data mapping is done):

      # SEND EMAIL CONFIG
    
      EMAIL_HOST = 'your_email_host'
      EMAIL_PORT = 'your_email_host_port'
      EMAIL_HOST_USER = 'your_email_host_user'
      EMAIL_HOST_PASSWORD = 'your_email_host_password'
      EMAIL_USE_TLS = True
    
      # TASKS
    
      ZOOM_TASK_EMAIL_CONFIRMATION_ENABLE = True
      ZOOM_TASK_EMAIL_SENDER = 'your_email_sender'
      ZOOM_TASK_EMAIL_RECEIVER = 'your_default_email_receiver'
    
  • In local_settings.py you can also set the variable 'POCESS_WORKER_AMOUNT' to the number of process workers you want. It is used for processing large geodata, i.e. data containing more than 40,000 data points. The right number depends on your machine: it should never exceed the number of CPU cores, and using too many process workers (e.g. 20) can be slower than using fewer (e.g. 4) because of the overhead of starting each process. The default is 2.

  • In the 'DUCT/ZOOM' folder, create a file called '.env' and add the following variables to it (they are used by the DUCT endpoints that can only be accessed by a user signed in via the Auth0 API):

     AUTH0_DOMAIN=your_auth_domain
     API_IDENTIFIER=your_auth_api_identifier
    
  • In the 'DUCT/ZOOM' folder, create a folder called 'media' and, inside it, a folder called 'tmpfiles' (if they do not already exist)

  • Start your RabbitMQ service

  • python manage.py runserver

  • Reactivate your virtual environment if it was deactivated, then in the folder 'DUCT/ZOOM' run the Celery worker: celery -A ZOOM worker -l info

  • Reactivate your virtual environment if it was deactivated, then in the folder 'DUCT/ZOOM' run Celery beat: celery -A ZOOM beat -l info

...and visit 0.0.0.0:8000.

This will start a development environment (using Django's development server) for DUCT.
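The manual steps that create 'media/tmpfiles' and the '.env' file can also be scripted. This is a hypothetical helper, not something shipped with DUCT: `prepare_zoom_dir` and `ENV_TEMPLATE` are illustrative names, and the template only contains the placeholder values from the steps above, which you still need to replace with your real Auth0 settings.

```python
# Hypothetical helper (not part of DUCT): creates DUCT/ZOOM/media/tmpfiles and
# writes a placeholder DUCT/ZOOM/.env if one does not exist yet.
import os

ENV_TEMPLATE = "AUTH0_DOMAIN=your_auth_domain\nAPI_IDENTIFIER=your_auth_api_identifier\n"


def prepare_zoom_dir(zoom_dir: str) -> None:
    # exist_ok=True makes this safe to re-run when the folders already exist.
    os.makedirs(os.path.join(zoom_dir, "media", "tmpfiles"), exist_ok=True)
    env_path = os.path.join(zoom_dir, ".env")
    if not os.path.exists(env_path):
        # Never overwrite an existing .env that may already hold real values.
        with open(env_path, "w") as fh:
            fh.write(ENV_TEMPLATE)
```

For example, `prepare_zoom_dir("DUCT/ZOOM")` run from the directory containing the clone, followed by editing the generated '.env'.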

Note that the DUCT Docker image is pulled from Docker Hub, not built locally.
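The guidance on 'POCESS_WORKER_AMOUNT' in the manual install steps (never more workers than CPU cores, and a moderate number such as 4 rather than 20) can be expressed directly in local_settings.py. This is a hypothetical sketch: `choose_worker_amount` is an illustrative name, the preferred value of 4 is taken from the example in the text, and the setting name is spelled exactly as it appears there.

```python
# Hypothetical local_settings.py fragment (not part of DUCT): pick a worker
# count for POCESS_WORKER_AMOUNT that never exceeds the number of CPU cores.
import os


def choose_worker_amount(preferred: int = 4, cores=None) -> int:
    # os.cpu_count() can return None on some platforms; fall back to 1.
    available = cores if cores is not None else (os.cpu_count() or 1)
    # Clamp between 1 and the core count: extra workers beyond the cores add
    # process start-up overhead without adding parallelism.
    return max(1, min(preferred, available))


POCESS_WORKER_AMOUNT = choose_worker_amount()
```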

Extra Info

  • Make sure that your tippecanoe executable is at '/usr/local/bin/tippecanoe'. If it is somewhere else, add the variable 'TIPPECANOE_DIR' to your local_settings.py, pointing to the directory that contains the tippecanoe executable. For the recommended/default location this would be TIPPECANOE_DIR = '/usr/local/bin/'
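A quick way to confirm that 'TIPPECANOE_DIR' points at a usable binary is to join the directory with the executable name and check that the file can be run. This is a hypothetical sketch, not DUCT's own lookup code; `tippecanoe_executable` is an illustrative name and the default mirrors the '/usr/local/bin/' location given above.

```python
# Hypothetical check (not part of DUCT): resolve the tippecanoe executable from
# a TIPPECANOE_DIR-style directory and verify that it can be executed.
import os

DEFAULT_TIPPECANOE_DIR = "/usr/local/bin/"


def tippecanoe_executable(tippecanoe_dir: str = DEFAULT_TIPPECANOE_DIR) -> str:
    path = os.path.join(tippecanoe_dir, "tippecanoe")
    # isfile + os.access(X_OK): the file exists and has execute permission.
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(f"tippecanoe not found or not executable at {path}")
    return path
```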

Documentation


Clone the project

sudo apt-get install git
git clone https://github.com/zimmerman-zimmerman/DUCT.git;
cd DUCT;

Install dependencies

Install all the dependencies using the bin/setup/install_dependencies.sh script.

sudo sh bin/setup/install_dependencies.sh

Install a python virtual environment

sudo apt-get install python-pip;
sudo pip install virtualenvwrapper;
export WORKON_HOME=~/envs;
/usr/local/bin/virtualenvwrapper.sh;
source /usr/local/bin/virtualenvwrapper.sh;
mkvirtualenv zoom;
workon zoom;

Install pip packages

pip install --upgrade pip;
pip install -r ZOOM/requirements.txt;

Configuration

Create a database

sudo -u postgres bash -c "psql -c \"CREATE USER zoom WITH PASSWORD 'zoom';\""
sudo -u postgres bash -c "psql -c \"ALTER ROLE zoom SUPERUSER;\""
sudo -u postgres bash -c "psql -c \"CREATE DATABASE zoom;\""

Migrate the database, create a superuser, and run the server (for production, we use nginx/gunicorn).

cd ZOOM/scripts
sh setup_project.sh
cd ../
python manage.py createsuperuser
python manage.py runserver

Optionally, you can put your own modifications to the Django configuration in a new file at ZOOM/local_settings.py
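As an illustration of such a modification, the fragment below points Django at the database created in the Configuration step (user, password and database all named "zoom"). It is a sketch under assumptions: PostgreSQL is assumed to run on the same machine on the default port 5432, and the backend matches the one used in docker_settings.py. Use different credentials for anything beyond local development.

```python
# Hypothetical ZOOM/local_settings.py override (assumes a local PostgreSQL
# server on the default port); matches the user/database created above.
DATABASES = {
    "default": {
        # Same PostGIS backend as in docker_settings.py.
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "NAME": "zoom",
        "USER": "zoom",
        "PASSWORD": "zoom",
        "HOST": "localhost",
        "PORT": "5432",
    }
}
```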

Endpoints Overview

Rest endpoints

| URL | Code location |
| --- | --- |
| /api/indicators/ | api.indicator.views.IndicatorList |
| /api/mapping/ | api.mapping.views.MappingJob |
| /api/mapping/get_data | api.mapping.views.get_data |
| /api/mapping/status | api.mapping.views.MappingJobResult |
| /api/metadata/ | api.metadata.views.FileListView |
| /api/metadata/pk/ | api.metadata.views.FileDetailView |
| /api/metadata/sources/ | api.metadata.views.FileSourceListView |
| /api/metadata/sources/pk/ | api.metadata.views.FileSourceDetailView |
| /api/metadata/upload/ | api.metadata.views.FileUploadView |
| /api/validate/ | api.validate.views.Validate |
| /api/validate/check_file_valid/ | api.validate.views.check_file_valid |
| /api/error-correction/ | api.error_correction.views.ErrorCorrectionView |
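A minimal standard-library client for the REST endpoints above might look like the sketch below. Everything here is an assumption for illustration: `BASE_URL` targets the development server from Set up, `build_get` and `fetch_json` are hypothetical names, and the Authorization header only matters for endpoints protected by Auth0 (see the Auth0 section). The response format of each endpoint is not documented here, so `fetch_json` simply decodes whatever JSON is returned.

```python
# Hypothetical client sketch (not part of DUCT) for the REST endpoints,
# using only the standard library. BASE_URL assumes the dev server.
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"


def build_get(path: str, token: Optional[str] = None) -> urllib.request.Request:
    headers = {"Accept": "application/json"}
    if token:
        # Only needed for endpoints that require an Auth0-authenticated user.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(BASE_URL + path, headers=headers, method="GET")


def fetch_json(path: str, token: Optional[str] = None):
    # Sends the request and decodes the JSON body (needs a running server).
    with urllib.request.urlopen(build_get(path, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_json("/api/indicators/")` while `python manage.py runserver` is running.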

GraphQL

| URL | Code location |
| --- | --- |
| /graphql | graphene_django.views.GraphQLView |

| Query | Code location |
| --- | --- |
| allMappings | gql.mapping.schema.Query |
| allIndicators | gql.indicator.schema.Query |
| datapointsAggregation | gql.indicator.schema.Query |
| fileSource | gql.metadata.schema.Query |
| allFileSources | gql.metadata.schema.Query |
| file | gql.metadata.schema.Query |
| allFiles | gql.metadata.schema.Query |
| country | gql.geodata.schema.Query |
| allCountries | gql.geodata.schema.Query |
| geolocation | gql.geodata.schema.Query |
| allGeolocations | gql.geodata.schema.Query |

| Mutation | Code location |
| --- | --- |
| mapping | gql.mapping.mutation.Mutation |
| indicator | gql.indicator.mutation.Mutation |
| fileSource | gql.metadata.mutation.Mutation |
| file | gql.metadata.mutation.Mutation |
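GraphQL queries such as `allIndicators` are sent to the single `/graphql` endpoint as a JSON POST body with a `query` string and optional `variables`. The sketch below builds such a request with the standard library. It is hypothetical: `GRAPHQL_URL` assumes the development server, and the `edges { node { id } }` selection assumes a Relay-style connection, which graphene_django commonly exposes; inspect the actual schema (e.g. with an introspection query) for the real field names.

```python
# Hypothetical GraphQL request builder (not part of DUCT). The field
# selection below assumes a Relay-style connection; verify it against
# the real schema before use.
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8000/graphql"

ALL_INDICATORS = """
query {
  allIndicators {
    edges { node { id } }
  }
}
"""


def build_graphql_request(query: str, variables=None) -> urllib.request.Request:
    # GraphQL over HTTP: POST a JSON body with "query" and "variables".
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_graphql_request(ALL_INDICATORS))` requires a running server.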

About the project


Can I contribute?


Yes please! We are mainly looking for coders to help on the project. If you are a coder, feel free to fork the repository and send us pull requests!

Running the tests


Django Rest API

The REST API endpoints can be tested with:

python manage.py test api.<Test Choice>

For example, to run a single test module:

python manage.py test api.mapping.tests.test_file_manual_mapping

GraphQL

The GraphQL endpoints can also be tested with:

python manage.py test gql.<Test Choice>

For example, to run a single test module:

python manage.py test gql.tests.test_mapping
