# GMNS Format Validation for Networks Stored as CSV Files

This notebook demonstrates how to validate whether a GMNS network conforms to the schema.
It uses a modified version of [GMNSpy](https://github.com/e-lo/GMNSpy), originally developed by Elizabeth Sall.

The first time you run this notebook after cloning this repo, you may need to run the following commands to update and install the working copy of `gmnspy`:

In [1]:
!git submodule update --init --recursive --remote --merge
# if you don't have command-line git, instead download this zip file: 
# https://github.com/ianberg-volpe/GMNSpy/archive/refs/heads/hide_output.zip
# and extract the contents of the `GMNSpy-hide_output` folder in that zip archive
# into the folder named `gmnspy` in the same directory as this notebook.

!pip install ./gmnspy

Already up to date.
Submodule path 'GMNSpy': merged in '7d750063c2db06085fedc82d24af5a9c4ae9945f'
Processing c:\users\ian.berg\documents\github\gmns\validation_tools\gmnspy


  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
   pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.


Building wheels for collected packages: gmnspy
  Building wheel for gmnspy (setup.py): started
  Building wheel for gmnspy (setup.py): finished with status 'done'
  Created wheel for gmnspy: filename=gmnspy-0.0.2-py3-none-any.whl size=14675 sha256=24ff7766f223cbdd9309cae95e449f358e4b3b57d394e726c6ff4702a8e0ccbe
  Stored in directory: C:\Users\ian.berg\AppData\Local\Temp\pip-ephem-wheel-cache-e_il15v4\wheels\ce\a1\9b\83840c9048712fe5979bd021f40235c3256131f3e6e750da89
Successfully built gmnspy
Installing collected packages: gmnspy
  Attempting uninstall: gmnspy
    Found existing installation: gmnspy 0.0.2
    Uninstalling gmnspy-0.0.2:
      Successfully uninstalled gmnspy-0.0.2
Successfully installed gmnspy-0.0.2


## Inputs
GMNSpy takes CSV files as inputs for the network. Place all network files in a single directory. 

The validation constraints are checked using a set of JSON files. `gmns.spec.json` provides information about required files and paths to the JSON definition for each table. Each table has its own `.schema.json` file which defines required fields and field constraints. These may be edited to meet a user's specific needs (e.g., to add user-defined fields, or to relax constraints).
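
As a rough illustration of the kind of edit described above, the sketch below adds a user-defined field to a table schema and relaxes one constraint. The schema here is a hypothetical, trimmed-down stand-in: the layout (a `fields` list whose entries carry `name`, `type`, and `constraints`) is an assumption, and `agency_note`, `add_user_field`, and `drop_constraint` are made-up names for this example. Check the real `.schema.json` files before applying the same idea to them.

```python
import copy
import json

# Hypothetical, trimmed-down table schema. The layout (a "fields" list whose
# entries carry "name", "type", and "constraints") is an assumption; compare it
# with the real .schema.json files before editing them.
link_schema = {
    "primaryKey": "link_id",
    "fields": [
        {"name": "link_id", "type": "any", "constraints": {"required": True}},
        {
            "name": "bike_facility",
            "type": "string",
            "constraints": {
                "enum": ["unknown", "none", "wcl", "sharrow",
                         "bikelane", "cycletrack", "offstreet_path"]
            },
        },
    ],
}


def add_user_field(schema, name, field_type):
    """Return a copy of the schema with one extra, unconstrained field."""
    updated = copy.deepcopy(schema)
    updated["fields"].append({"name": name, "type": field_type})
    return updated


def drop_constraint(schema, field_name, constraint):
    """Return a copy of the schema with one constraint removed from a field."""
    updated = copy.deepcopy(schema)
    for field in updated["fields"]:
        if field["name"] == field_name:
            field.get("constraints", {}).pop(constraint, None)
    return updated


# Add a user-defined field, then relax the enum constraint on bike_facility.
custom_schema = add_user_field(link_schema, "agency_note", "string")
custom_schema = drop_constraint(custom_schema, "bike_facility", "enum")
print(json.dumps(custom_schema, indent=2))
```

Both helpers return modified copies, so the original schema stays available for comparison.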

## Outputs
Reading a GMNS network using the command below checks the set of files in the `data_directory` against the spec defined by `config`. The script currently performs the following checks:
- Checks whether the required tables, as defined in `gmns.spec.json`, are present.
- Checks each file in the `data_directory` whose name matches one defined in the spec against its associated `.schema.json` file. The following checks are performed:
    - whether any required fields are missing (report a FAIL message if so). 
    - whether any fields outside the spec are present (report a WARN message if so).
    - whether the values present in each field have the same datatype (integer, number, boolean, string) as required by the spec (report a FAIL message if so).
    - whether any required fields have missing values (report a FAIL message if so).
    - whether the primary key field has any duplicate values (report a FAIL message if so).
    - whether any values in fields with strict constraints (minimum, maximum, enum) fall outside of those constraints (report a FAIL message if so).
    - whether any values in fields with warning constraints (minimum, maximum) fall outside of those constraints (report a WARN message if so).
- Checks the foreign keys specified in each table. The following checks are performed:
    - whether the foreign key specified exists on the reference table (report a FAIL message if not).
    - whether the foreign key specified has unique values on the reference table (report a FAIL message if not).
    - whether all values of the foreign key contained in a source table exist on the reference table (report a FAIL message if not).
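
To make a few of these checks concrete, here is a minimal standard-library sketch covering required fields, fields outside the spec, duplicate primary keys, enum constraints, and foreign keys. It is not GMNSpy's implementation: the sample tables, the helper names (`read_rows`, `check_table`, `check_foreign_key`), and the message wording are all made up for illustration, and the datatype and minimum/maximum checks are left out.

```python
import csv
import io

# Made-up sample tables; the data and the my_note field are illustrative only.
LINK_CSV = """link_id,from_node_id,to_node_id,bike_facility,my_note
1,10,11,bikelane,a
2,11,12,offstreet path,b
2,12,99,none,c
"""
NODE_CSV = """node_id
10
11
12
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def check_table(rows, required, known, primary_key, enums):
    """Run a few table-level checks and return the resulting messages."""
    messages = []
    present = set(rows[0]) if rows else set()
    for name in required:
        if name not in present:
            messages.append(f"FAIL. required field {name} is missing")
        elif any(not row[name] for row in rows):
            messages.append(f"FAIL. required field {name} has missing values")
    for name in sorted(present - set(known)):
        messages.append(f"WARN. field {name} is not in the spec")
    keys = [row.get(primary_key) for row in rows]
    duplicates = sorted({key for key in keys if keys.count(key) > 1})
    if duplicates:
        messages.append(f"FAIL. duplicate {primary_key} values: {duplicates}")
    for name, allowed in enums.items():
        # Empty values are skipped here; only non-empty values are compared.
        bad = [i for i, row in enumerate(rows)
               if row.get(name) and row[name] not in allowed]
        if bad:
            messages.append(f"FAIL. {name} has values outside the enum at rows {bad}")
    return messages


def check_foreign_key(source_rows, source_field, ref_rows, ref_field):
    """Check that source_field values all exist, uniquely, in ref_field."""
    if not ref_rows or ref_field not in ref_rows[0]:
        return [f"FAIL. {ref_field} does not exist on the reference table"]
    messages = []
    ref_values = [row[ref_field] for row in ref_rows]
    if len(ref_values) != len(set(ref_values)):
        messages.append(f"FAIL. {ref_field} is not unique on the reference table")
    used = {row[source_field] for row in source_rows if row.get(source_field)}
    orphans = sorted(used - set(ref_values))
    if orphans:
        messages.append(f"FAIL. {source_field} values {orphans} not found in {ref_field}")
    return messages


links, nodes = read_rows(LINK_CSV), read_rows(NODE_CSV)
required = ["link_id", "from_node_id", "to_node_id"]
bike_values = {"unknown", "none", "wcl", "sharrow",
               "bikelane", "cycletrack", "offstreet_path"}
table_messages = check_table(links, required, required + ["bike_facility"],
                             "link_id", {"bike_facility": bike_values})
fk_messages = check_foreign_key(links, "to_node_id", nodes, "node_id")
for message in table_messages + fk_messages:
    print(message)
```

The sample data is deliberately broken in four ways: an extra field, a duplicated `link_id`, an enum value spelled with a space instead of an underscore, and a `to_node_id` that has no matching node.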

Here's an example of a "good" dataset (no `FAIL` messages, and reviewing the `WARN` messages reveals no issues).

In [5]:
import gmnspy
out = gmnspy.in_out.read_gmns_network(data_directory="../Small_Network_Examples/Arlington_Signals",
                                      config="../Specification/gmns.spec.json")

Checking Presence of Required Files:  ['link', 'node']
Found following files to define network: 
 - link
 - node
 - lane
 - location
 - movement
 - use_definition
 - use_group
 - segment
 - segment_lane
 - signal_controller
 - signal_coordination
 - signal_phase_mvmt
 - signal_timing_plan
 - signal_timing_phase
 - signal_detector

SCHEMA ../Specification\link.schema.json
...validating ../Small_Network_Examples/Arlington_Signals\link.csv against ../Specification\link.schema.json
Passed field type coercion
Passed Field Required Constraint Validation

SCHEMA ../Specification\node.schema.json
...validating ../Small_Network_Examples/Arlington_Signals\node.csv against ../Specification\node.schema.json
Passed field type coercion
Passed Field Required Constraint Validation

SCHEMA ../Specification\lane.schema.json
...validating ../Small_Network_Examples/Arlington_Signals\lane.csv against ../Specification\lane.schema.json
Passed field type coercion
Passed Field Required Constraint Validation

S
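
The validation report can run long on a network with many tables, so it can help to tally the messages rather than read every line. The sketch below assumes the report is written to standard output with `print` (if the library uses `logging` instead, this capture would come back empty) and that problem lines begin with `FAIL` or `WARN`; the `WARN` prefix in particular is an assumption. `fake_validator` is a hypothetical stand-in for the real reader call, and `summarize_messages` is a made-up helper.

```python
import contextlib
import io


def summarize_messages(log_text):
    """Count the lines in log_text that start with FAIL or WARN."""
    counts = {"FAIL": 0, "WARN": 0}
    for line in log_text.splitlines():
        stripped = line.strip()
        for level in counts:
            if stripped.startswith(level):
                counts[level] += 1
    return counts


def fake_validator():
    """Hypothetical stand-in that prints a short validation report."""
    print("Passed field type coercion")
    print("FAIL. bike_facility field has errors.")


# Capture everything printed inside the with-block, then tally it.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    fake_validator()

summary = summarize_messages(buffer.getvalue())
print(summary)
```

In a notebook, replacing the call to `fake_validator()` with the actual reader call would apply the same tally to a real run, under the assumptions stated above.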

This example shows the same dataset, but with some errors introduced.

In [7]:
import gmnspy
out = gmnspy.in_out.read_gmns_network(data_directory="../Small_Network_Examples/Arlington_Signals_Errors",
                                      config="../Specification/gmns.spec.json")

Checking Presence of Required Files:  ['link', 'node']
Found following files to define network: 
 - link
 - node
 - lane
 - location
 - movement
 - segment
 - segment_lane
 - signal_controller
 - signal_coordination
 - signal_phase_mvmt
 - signal_timing_plan
 - signal_timing_phase
 - signal_detector

SCHEMA ../Specification\link.schema.json
...validating ../Small_Network_Examples/Arlington_Signals_Errors\link.csv against ../Specification\link.schema.json
Passed field type coercion
FAIL. bike_facility field has errors. ["Values: ['offstreet path'] not in enumerated list: ['unknown', 'none', 'wcl', 'sharrow', 'bikelane', 'cycletrack', 'offstreet_path']. Index of row(s) with bad values: [0, 1, 12, 13]"]
FAIL. ped_facility field has errors. ["Values: ['offstreet path'] not in enumerated list: ['unknown', 'none', 'shoulder', 'sidewalk', 'offstreet_path']. Index of row(s) with bad values: [0, 1, 12, 13]"]

SCHEMA ../Specification\node.schema.json
...validating ../Small_Network_Examples/Arlin