# GMNS Format Validation for Networks Stored as CSV Files

This notebook demonstrates how to validate whether a GMNS network conforms to the schema.
It uses a modified version of [GMNSpy](https://github.com/e-lo/GMNSpy), originally developed by Elizabeth Sall.

The first time you run this notebook after cloning this repo, you may need to run the following commands to update and install the working copy of `gmnspy`:

In [2]:
!git submodule update --init --recursive --remote --merge
# if you don't have command-line git, instead download this zip file: 
# https://github.com/zephyr-data-specs/GMNSpy/archive/refs/heads/master.zip
# and extract the contents of the `GMNSpy-master` folder in that zip archive
# into the folder named `gmnspy` in the same directory as this notebook.

!pip install ./gmnspy

Processing c:\users\ian.berg\documents\github\gmns\validation_tools\gmnspy
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: gmnspy
  Building wheel for gmnspy (pyproject.toml): started
  Building wheel for gmnspy (pyproject.toml): finished with status 'done'
  Created wheel for gmnspy: filename=gmnspy-0.0.3-py3-none-any.whl size=41526 sha256=7cdb7b7a2a54a1c4f8e378b59229f97b682058c33212e9558d886927193efb4e
  Stored in directory: C:\Users\ian.berg\AppData\Local\Temp\pip-ephem-wheel-cache-bhz69tu7\wheels\ce\a1\9b\83840c9048712fe5979bd021f40235c3256131f3e6e750da89
Successfully built gmnspy
Installing collected packages: gmnspy
  Attempting uninstall: gmnspy
    Found existin

## Inputs
GMNSpy takes CSV files as inputs for the network. Place all network files in a single directory. 

The validation constraints are defined in a set of JSON files. `gmns.spec.json` lists the required files and gives the path to the JSON definition for each table. Each table has its own `.schema.json` file, which defines its required fields and field constraints. These files may be edited to meet a user's specific needs (e.g., to add user-defined fields or to relax constraints).
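
As an illustration of such an edit, the sketch below appends a user-defined field to a table schema. It assumes the schema is a JSON object with a top-level `fields` list whose entries carry `name`, `type`, and optional `constraints` keys (the Frictionless Table Schema layout). The helper `add_user_field`, the field name `my_custom_flag`, and the trimmed sample schema are hypothetical and are not part of GMNSpy.

```python
import json


def add_user_field(schema, name, field_type, required=False):
    """Return a copy of `schema` with one extra field definition appended.

    Assumes `schema` has a top-level "fields" list (Frictionless-style layout).
    `add_user_field` is a hypothetical helper, not a GMNSpy function.
    """
    if any(f["name"] == name for f in schema["fields"]):
        raise ValueError(f"field {name!r} is already defined")
    new_field = {"name": name, "type": field_type}
    if required:
        new_field["constraints"] = {"required": True}
    # Build a new dict so the caller's schema object is left unchanged.
    return {**schema, "fields": [*schema["fields"], new_field]}


# Hypothetical, heavily trimmed schema used only for illustration.
example_schema = {
    "primaryKey": "link_id",
    "fields": [
        {"name": "link_id", "type": "any", "constraints": {"required": True}},
        {"name": "lanes", "type": "integer", "constraints": {"minimum": 0}},
    ],
}

edited = add_user_field(example_schema, "my_custom_flag", "boolean")
print(json.dumps(edited, indent=2))
```

To apply the same idea to a real schema file, one would load it with `json.load`, pass the resulting dict to the helper, and write the result back with `json.dump`. Keeping a copy of the unedited file makes it easy to return to the published constraints.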

## Outputs
Reading a GMNS network using the command below checks the set of files in the `data_directory` against the spec defined by `config`. The script currently performs the following checks:
- Checks whether the required tables, as defined in `gmns.spec.json`, are present.
- Checks each file in the `data_directory` whose name matches one defined in the spec against its associated `.schema.json` file. The following checks are performed:
    - whether any required fields are missing (report a FAIL message if so). 
    - whether any fields outside the spec are present (report a WARN message if so).
    - whether the values present in each field have the same datatype (integer, number, boolean, string) as required by the spec (report a FAIL message if so).
    - whether any required fields have missing values (report a FAIL message if so).
    - whether the primary key field has any duplicate values (report a FAIL message if so).
    - whether any values in fields with strict constraints (minimum, maximum, enum) fall outside of those constraints (report a FAIL message if so).
    - whether any values in fields with warning constraints (minimum, maximum) fall outside of those constraints (report a WARN message if so).
- Checks the foreign keys specified in each table. The following checks are performed:
    - whether the foreign key specified exists on the reference table (report a FAIL message if not).
    - whether the foreign key specified has unique values on the reference table (report a FAIL message if not).
    - whether all values of the foreign key contained in a source table exist on the reference table (report a FAIL message if not).
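
To make a few of the checks listed above concrete, here is a minimal standard-library sketch of the required-field, duplicate-primary-key, and foreign-key checks. It is not GMNSpy's implementation: the functions `read_rows`, `check_table`, and `check_foreign_key`, the message wording, and the tiny inline tables are all assumptions made for illustration.

```python
import csv
import io


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def check_table(rows, required_fields, primary_key):
    """Return FAIL messages for missing required fields/values and duplicate keys."""
    messages = []
    present = set(rows[0].keys()) if rows else set()
    for field in required_fields:
        if field not in present:
            messages.append(f"FAIL: required field {field} is missing")
        elif any(not row[field] for row in rows):
            messages.append(f"FAIL: required field {field} has missing values")
    if primary_key in present:
        keys = [row[primary_key] for row in rows]
        if len(keys) != len(set(keys)):
            messages.append(f"FAIL: primary key {primary_key} has duplicate values")
    return messages


def check_foreign_key(source_rows, source_field, ref_rows, ref_field):
    """Return FAIL messages if source values are absent from the reference table."""
    if not ref_rows or ref_field not in ref_rows[0]:
        return [f"FAIL: foreign key {ref_field} does not exist on the reference table"]
    ref_values = {row[ref_field] for row in ref_rows}
    # Blank source values are skipped here (an assumption of this sketch).
    source_values = {row.get(source_field) for row in source_rows}
    missing = sorted(v for v in source_values if v and v not in ref_values)
    if missing:
        return [f"FAIL: {source_field} values not found in {ref_field}: {missing}"]
    return []


# Hypothetical miniature tables with two deliberate errors:
# node_id 2 appears twice, and link 11 points at a node that does not exist.
nodes = read_rows("node_id,x_coord,y_coord\n1,0.0,0.0\n2,1.0,0.0\n2,1.0,1.0\n")
links = read_rows("link_id,from_node_id,to_node_id\n10,1,2\n11,2,3\n")

node_messages = check_table(nodes, ["node_id", "x_coord", "y_coord"], "node_id")
link_messages = check_foreign_key(links, "to_node_id", nodes, "node_id")
for message in node_messages + link_messages:
    print(message)
```

The sketch treats every value as a string, so it does not cover the datatype, minimum, maximum, or enum checks. Those depend on the constraints recorded in each `.schema.json` file.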

Here's an example of a "good" dataset (no `FAIL` messages, and reviewing the `WARN` messages reveals no issues).

In [1]:
import gmnspy
out = gmnspy.in_out.read_gmns_network(
    data_directory="../Small_Network_Examples/Arlington_Signals",
    config="../Specification/gmns.spec.json",
)

This example shows the same dataset, but with some errors introduced.

In [4]:
import gmnspy
out = gmnspy.in_out.read_gmns_network(
    data_directory="../Small_Network_Examples/Arlington_Signals_Errors",
    config="../Specification/gmns.spec.json",
)

In [5]:
# To find the logging output, look for `gmnspy.log` in the system temporary directory

import tempfile
tempfile.gettempdir()

'C:\\Users\\IAN~1.BER\\AppData\\Local\\Temp'
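
Once the log has been located, it can be convenient to pull out only the `FAIL` and `WARN` lines. The sketch below does this with a plain substring match. The helper `filter_log_lines` is hypothetical, and the assumption that each message occupies one line containing the literal text `FAIL` or `WARN` has not been checked against GMNSpy's actual log format.

```python
import tempfile
from pathlib import Path


def filter_log_lines(log_path, levels=("FAIL", "WARN")):
    """Return the lines of a log file that mention any of the given levels."""
    path = Path(log_path)
    if not path.exists():
        # No log yet (for example, validation has not been run).
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if any(level in line for level in levels)
        ]


# `gmnspy.log` in the temporary directory is the location named above.
log_file = Path(tempfile.gettempdir()) / "gmnspy.log"
for line in filter_log_lines(log_file):
    print(line)
```

Passing `levels=("FAIL",)` narrows the output to the messages that indicate the network does not conform to the spec.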