This repo contains a Docker- and Makefile-based Data Dictionary development workflow, i.e., packaging around dictionary tools (Docker images) for conversion, visualisation, testing and validation, so that a Data Modeller can iteratively develop a schema locally.
Our aim is to develop the UMCCR Data Dictionary for the Gen3 platform.
git clone https://github.com/umccr/umccr-dictionary.git
cd umccr-dictionary
make test dd=beacon
make compile dd=beacon
make validate dd=beacon
make simulate dd=beacon
make load dd=beacon
make import dd=beacon
make psql
metadata=> \dt node_*
metadata=> select * from node_cohort;
metadata=> \q
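
The quick-start targets above can also be chained so that a failing step stops the run. The following is a minimal sketch and not part of this repo: the function name `run_dd_pipeline` and the `MAKE_CMD` override are hypothetical, and the target order simply mirrors the commands listed above.

```shell
# Hypothetical helper (not part of this repo): run the quick-start targets
# in the order listed above, stopping at the first failure.
# MAKE_CMD is an assumed override, e.g. MAKE_CMD=gmake, or MAKE_CMD=echo for a dry run.
run_dd_pipeline() {
  dd="$1"
  if [ -z "$dd" ]; then
    echo "usage: run_dd_pipeline <dictionary_name>" >&2
    return 2
  fi
  for target in test compile validate simulate load import; do
    "${MAKE_CMD:-make}" "$target" "dd=$dd" || return 1
  done
}
```

For example, `run_dd_pipeline beacon` runs the six targets for the `beacon` dictionary; setting `MAKE_CMD=echo` first prints each step instead of executing it.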
An example Node to Edge linking:
- Read design_notes.md
- Read search-query.md
How do I develop a Gen3 Data Dictionary?
- Read pointers from this FAQ.
- Gen3 Data Dictionaries are essentially authored in YAML files as a DSL (Domain-Specific Language). However, there are tools available for Excel/CSV/TSV to YAML to JSON conversion. So, to start with, it can be as simple as modelling the metadata requirements for the Data Dictionary in Excel, i.e. to get a rough idea of the base entities (nodes in a graph), their attributes (node properties) and links (edges/relations). See README.md
- Alternatively, you can pick the baseline dictionary that most closely suits your needs and work from there. By default, this is the GDC dictionary. You can search around in the uc-cdis repos, using keywords such as:
- Or, have a look:
- http://mindset-gen3-gallery.s3-website-us-east-1.amazonaws.com
- and, browse around deployed sites
In this repo, we have selected a few data dictionaries in the `dictionary` folder to work with the following data dictionary development workflow.
- Docker Desktop (at least v3.2.1)
- GNU Make
- GNU Make comes with most Linux distributions and with Xcode on macOS
- Try `make --version` to see whether you already have it installed
- Otherwise, `brew install make` for macOS and try `gmake --version`
- On Ubuntu, try `apt-get install make`
- If `make` is not possible, then you will need to execute each target in the Makefile by hand
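
The check described above can be scripted. This is a small sketch under the assumption that GNU Make is reachable as either `make` or `gmake`; the helper name `first_available` and the `MAKE_BIN` variable are hypothetical.

```shell
# Hypothetical helper: print the first command from the argument list
# that exists on PATH, or return non-zero if none does.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Prefer `make`; fall back to `gmake`, as suggested above for Homebrew installs.
MAKE_BIN="$(first_available make gmake)" || echo "GNU Make not found" >&2
```

Afterwards, `"$MAKE_BIN" --version` shows which GNU Make would be used.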
- Download all the latest images in the stack
make pull
- Bring up the stack
make up
- Check the stack
make ps
- Restart the stack
make restart
- Bring it down
make down
- By default, it uses `.env-sample` for the PostgreSQL connection and credentials.
- You may override it by simply making a copy of the file named `.env`, like so:
cp .env-sample .env
- You can then modify `.env` with your own custom values.
- This `.env` is ignored by Git, so it will not be pushed to GitHub.
NOTE: You do not need to do this if you are happy with the default values in `.env-sample`. However, if you do, you need to `make down` and `make up` for the changes to take effect.
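
To see which value is currently in effect, the override rule above (`.env` if present, otherwise `.env-sample`) can be sketched as a small lookup. The helper `env_value` is hypothetical, it assumes plain `KEY=value` lines without quoting, and `EXAMPLE_KEY` is a placeholder; the real key names are whatever `.env-sample` defines.

```shell
# Hypothetical helper: print the value of KEY from .env if that file exists,
# otherwise from .env-sample. Assumes simple KEY=value lines without quoting.
# Usage: env_value KEY [directory]   (directory defaults to the current one)
env_value() {
  key="$1"
  dir="${2:-.}"
  file="$dir/.env"
  [ -f "$file" ] || file="$dir/.env-sample"
  # Take the last matching line; cut keeps everything after the first "=".
  grep "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}
```

For example, `env_value EXAMPLE_KEY` run from the repo root prints the value the stack would pick up after the next `make down` and `make up`.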
- Visit http://localhost:8080
- You can switch dictionary as follows:
http://localhost:8080/#schema/<dictionary_name>.json
e.g.
- http://localhost:8080/#schema/anvil.json
- http://localhost:8080/#schema/dcf.json
- http://localhost:8080/#schema/gdc.json
- http://localhost:8080/#schema/kf.json
DEBUG: To debug the visualisation, use the browser's built-in developer tools. Typically, right click > Inspect > select the "Console" tab > reload the page.
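
When switching between several dictionaries, it can be handy to generate the visualisation links rather than type them. This sketch only applies the `http://localhost:8080/#schema/<dictionary_name>.json` pattern given above; the function name `schema_url` is hypothetical.

```shell
# Hypothetical helper: build the visualisation URL for a dictionary name,
# following the http://localhost:8080/#schema/<dictionary_name>.json pattern.
schema_url() {
  echo "http://localhost:8080/#schema/$1.json"
}

# Print one URL per dictionary name listed above.
for dd in anvil dcf gdc kf; do
  schema_url "$dd"
done
```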
- Say you are working on the `umccr` dictionary
- Modify the schema YAML files in `dictionary/umccr/gdcdictionary/schemas`
- Compile into JSON
make compile dd=umccr
make compile dd=kf
make compile dd=gdc
make compile dd=anvil
make compile dd=dcf
- Visit http://localhost:8080/#schema/umccr.json
- Reload the page (do twice if necessary)
- To test and validate the dictionary:
make test dd=umccr
- To validate the DD graph:
make validate dd=umccr
- To simulate test data for the minted JSON Data Dictionary, e.g., for the `umccr` dictionary:
make simulate dd=umccr
- This will validate the DD's graph and create mock test data in the `/data/umccr/` folder.
- This step populates the database schema tables in the local PostgreSQL server, based on the JSON Data Dictionary schema that you have designed in the previous steps.
- To load the minted JSON Data Dictionary into the Gen3 Metadata Database tables, e.g., for the `umccr` dictionary:
make load dd=umccr
- To import simulated data based on the `umccr` dictionary:
make import dd=umccr
- As part of the data importing process, it also creates `*.tsv` counterparts of the simulated `*.json` data. Please see output/README.md for more.
- Get into the PSQL console
make psql
- Once inside the PSQL console, try the following:
metadata=> \l
metadata=> \dt
metadata=> \dt node_*
metadata=> \dt edge_*
metadata=> \d node_program
Table "public.node_program"
Column | Type | Modifiers
---------+--------------------------+------------------------
created | timestamp with time zone | not null default now()
acl | text[] |
_sysan | jsonb | default '{}'::jsonb
_props | jsonb | default '{}'::jsonb
node_id | text | not null
metadata=> select * from node_program;
metadata=> select * from node_project;
metadata=> \q
- The Data Dictionary is populated into the PostgreSQL `public` schema
- If you'd like to reset the `public` schema, do like so:
make reset
- This will reset the current `metadata` database, so that you can (re)load a data dictionary again. For example:
make load dd=umccr
make psql
metadata=> \dt node_*
metadata=> \q
make reset
make load dd=anvil
make psql
metadata=> \dt node_*
metadata=> \q
- At this point, you have a couple of options for working with the local PostgreSQL database. Use the following connection info:
Host: localhost
Port: 5432
Database: metadata
Username: metadata
Password: metadata
- For the sa (System Admin) account, use these instead:
Host: localhost
Port: 5432
Database: metadata
Username: postgres
Password: postgres
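
Many PostgreSQL clients accept a single connection URI instead of separate fields. As a sketch, the values above can be assembled like this; the function name `pg_uri` is hypothetical, and the defaults are simply the host, port and database listed above.

```shell
# Hypothetical helper: build a PostgreSQL connection URI.
# Arguments: user, password; host, port and database default to the
# local values listed above (localhost, 5432, metadata).
pg_uri() {
  user="$1"
  password="$2"
  host="${3:-localhost}"
  port="${4:-5432}"
  database="${5:-metadata}"
  echo "postgresql://${user}:${password}@${host}:${port}/${database}"
}
```

Assuming a `psql` client is installed on the host (this repo otherwise provides `make psql`), a query could then be run as `psql "$(pg_uri metadata metadata)" -c 'select node_id from node_program;'`.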
If you are new to PSQL, try the following for starters:
- http://postgresguide.com/utilities/psql.html
- https://www.postgresqltutorial.com/psql-commands/
- https://www.postgresqltutorial.com
- Try the following for some GUI-based IDE tooling:
- Setup PyCharm Community Edition
- SQL Developer (freeware)
Screenshot: PyCharm
Screenshot: PSQL Console