Skip to content
forked from abedul1980/RefData

Reference data and schema for RefData project stream

License

Notifications You must be signed in to change notification settings

tomyems/RefData

 
 

Repository files navigation

Reference Data Governance Service Schema Definitions

Running Local Simulation

This repository contains just enough code to start all of the services required to run the reference data service. Once you have successfully cloned this repository into your local ide, run git pull to ensure you have the most up to date version of the schema. Next you need to build a reference data image by running ./dev.sh rebuild in bash (or .\dev.ps1 -Rebuild in powershell). If using a Windows device use powershell to create your reference data image.

Then you can start it by running ./dev.sh (or .\dev.ps1 in powershell)

The script will map current flyway subdirectory to /flyway and current ../csv directory to /csvs. Once started, your terminal will run inside reference data world.

By default, the image does not include Keycloak service, because most of the time you don't need it and startup times increase dramatically. If you need to build an image with Keycloak, run ./dev.sh rebuild full (.\dev.ps1 -Rebuild -Full).

Next run ./flyway.sh this pushes schema changes to the pgweb instance. This should be run in the first instance such that you have the most up to date schema within your local simulation, when making csv changes although this is not required it is good practice to run this command as it re-reads the schema and restarts the tables to include your csv changes.

Any version of the container runs an instance of pgweb (tiny postgrest UI) which you can access by going to http://localhost:8081/ in your browser. It's already pre-connected and pre-authenticated to local reference data instance:

The tool allows to view schema, data, export data to CSV etc.

Customising Run

NOTE: If you are only using this code to run the Reference Data service customising the run is unnecessary and you can skip to making schema changes section

The image is serving the following ports inside the container:

  • 3000 - PostgREST.
  • 5432 - Postgres DB.
  • 8000 - Keycloak.
  • 8081 - pgweb.

It's better not to expose them in order not to conflict with anything running on your machine, but if you need to say expose port 5432 just use docker syntax i.e. run this instead:

docker run -it --rm -v "$(pwd)/flyway:/flyway" -v "$(dirname $(pwd))/csv:/csvs" -p 5432:5432 refdata

You can customise the command if you need to map flyway and/or flyway to different location.

Making Schema Changes

With any schema change create a separate branch from master for all of your changes.

  1. Create a new flyway file in schemas/reference - follow flyway documentation if you are not familiar with this system. Look at the existing scripts and try to be like a ninja - make your scripts blend in. Please refer to table requirements document to understand how it should be done. The file format we use for reference schema is as shown: V1__example.sql where two _ are used.
  2. Change flyway.target in docker/flyway_reference_docker.conf to your script number.
  3. Raise a PR and wait for validation to complete. Check build logs if anything goes wrong - you will get a detailed reason why something is not working.

You can test these changes made outside of docker image from inside the docker image, thereby pushing them to your local pgweb instance by starting

./flyway.sh

because local flyway directory is mapped as a docker volume inside.

Validating Schema Changes

Run

./validate.sh

When the validation script runs you will see a progress bar within your terminal, be sure to check the file you have changed is displayed to ensure the script has recognised your file when running tests.

Applying CSV Changes

Due to the fact this repository does not contain any csv data, and assuming you have already cloned the csv data folder one level above, test csv changes by running

./csv.sh

inside docker container as it maps local ../csvfolder as /csv. This command only pushes csv data to your local pgweb instance, to apply changes to the master branch follow the same process as outlined for schema changes.

Known Issue

When running the reference data image in bash on a Windows device, the container does not always recognise flyway changes and as a result the validate script will not pick up any errors. It is unclear why this occurs however the current workaround is if you are using a Windows device use powershell to build the reference data image.

Appendix 1. PSQL.

Use PSQL

psql -U postgres -d ref

to check table schema:

SELECT column_name, data_type FROM information_schema.columns WHERE TABLE_NAME = 'table_name';

or to get table values:

SELECT * from table_name;

Appendix 2. Foreign Keys.

As we have duplicating IDs (but unique id+validfrom) there is no way to create the usual foreign key constraint, unless the constraint is using id+validfrom as a source and target key. This is not what you want though, because latest version of the target table's record is always preferred.

In order to validate that the range of a column is in range of ids of a target table's id column, you can add a trigger. Let's say address table has countryid column, and it has to be in range of id columns from the country table:

create trigger address_countryid before insert on address
    for each row execute procedure validate_id('countryid', 'country', 'id');

validate_id is a custom function which is defined as follows at the moment of this writing:

-- the following function validates that a column is in range of other table column's values
-- expects 3 parameters
-- 0: source column name
-- 1: target table name
-- 2: target column name
create or replace function validate_id() returns trigger as $$
declare
    src TEXT;
    tbl TEXT;
    dest TEXT;
    value TEXT;
    in_range BOOLEAN;
begin
    src := tg_argv[0];
    tbl := tg_argv[1];
    dest := tg_argv[2];

    -- log
    --raise notice '% -> %.%', src, tbl, dest;

    -- dynamically query whether source value is in range
    -- i.e.
    -- select [new value] in (select [id] from [dest_table])
    execute format('select ($1).%s in (select %I from %I)', src, dest, tbl)
        using NEW
        into in_range;

    if in_range then
        return NEW;
    end if;

    -- raise proper error

    -- get source value
    execute format('select ($1).%s::text', src)
        using NEW
        into value;

    raise exception '% % is not in range of %.%', src, value, tbl, dest;

end;
$$ language plpgsql;

Appendix 3. Export to CSV.

From psql you can execute the copy command:

\copy (select * from port) to /csvs/port_export.csv with csv

About

Reference data and schema for RefData project stream

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 45.5%
  • Shell 24.9%
  • Dockerfile 24.0%
  • PLpgSQL 4.1%
  • PowerShell 1.5%