A validator for General Transit Feed Specification (GTFS) Schedule (static) feeds.
This is a command-line tool written in Java that performs the following steps:
- Loads input GTFS zip file from a URL or disk
- Checks file integrity, numeric type parsing and ranges, and string formats against the GTFS Schedule specification
- Performs basic GTFS business rule validation
- Performs advanced GTFS business rule validation (work-in-progress)
Fork this repository, open a PR on master within it, edit the file .github/workflows/end_to_end.yml following the instructions on lines 5, 43-45, and push to your PR branch. Name your branch after the agency/authority/publisher of the feed you are testing.
You should now see the workflow End to end / run-on-data start automatically in your PR checks, running the validator on the dataset you just added. The validation report is collected as a run artifact in the Actions tab of your fork repository on GitHub.
If the workflow run crashes or something doesn't look right in the validation report JSON file, please see the Contribute section; we may be able to help!
- Click on the fork button in the top right corner
- Wait for the fork creation; you should now see your fork (https://github.com/YOUR_USERNAME/gtfs-validator)
- Navigate to .github/workflows/end_to_end.yml
- Click the pencil icon to enter edit mode
- On line 5, replace transport-agency-name with something meaningful, like societe-de-transport-de-montreal if you were adding a dataset from STM
- Keep this name around, as you'll need it when naming your branch
- Uncomment line 43 by removing the # character
- On line 43, replace ACRONYM with an acronym for the agency/publisher; in our example that would be STM
- Uncomment line 44 by removing the # character
- On line 44, replace [[[ACRONYM]]] in [[[ACRONYM]]].zip with the acronym you chose for line 43. NO SPACES OR SPECIAL CHARACTERS, and keep the .zip extension intact
- On line 44, replace DATASET_PUBLIC_URL with a public URL pointing to a GTFS Schedule zip archive
- Click the green Start commit button on the right of the page
- Select the option Create a new branch for this commit and start a pull request.
- Replace the proposed default branch name with the name you chose for line 5. Note that the branch name must exactly match the line 5 text (e.g., societe-de-transport-de-montreal).
- Click the green Propose changes button
- On the next screen, click Create pull request
You should now see the workflow End to end / run-on-data start automatically in your PR checks, running the validator on the dataset you just added. The validation report is collected as a run artifact in the Actions tab of your fork repository on GitHub.
If the workflow run crashes or something doesn't look right in the validation report JSON file, please see the Contribute section; we may be able to help!
- Install Docker
- Retrieve an image from our package page. For snapshot versions of the master branch:
docker pull ghcr.io/mobilitydata/gtfs-validator:master
- We also provide images of our tagged versions, starting with v1.3.0
- Run the image either in the Docker Dashboard UI (don't forget to bind port 8090) or via this command:
docker run -p 8090:8090 ghcr.io/mobilitydata/gtfs-validator:[[[REPLACE_WITH_YOUR_TAG]]]
By default, you will then have access to the web version of the validator at http://localhost:8090/. See Web app usage.
If you want to use the CLI version within Docker, you must first stop the web app with the following command:
TODO: Could not figure out command.
Note: if you don't, the CLI app will compete with the web app for resources within the container.
After attaching a terminal to the running container, navigate to the CLI jar folder:
cd /usr/gtfs-validator/cli-app
You can then follow the instructions in the next sections.
Note: As a convenience, a shell script file is provided in the same directory. It is copied into the Docker image from end_to_end.sh.
It can be used to run the validator in an automated way via Java on your local computer, as described in the next section. Only community-based support is provided for local runs of the validator.
- Install Java 11 or higher
- Download the latest gtfs-validator JAR file (cli or web) from our Releases page, or a snapshot artifact from GitHub Actions or Circle-CI Pipelines
Sample usage:
java -jar gtfs-validator-v1.3.0_cli.jar -i relative/path/to/zipped_dataset -o relative/output/path -e relative/extraction/path -x enumeration_of_files_to_exclude_from_validation_process
...which will:
- Search for a zipped GTFS dataset located at relative/path/to/zipped_dataset
- Extract the zip content to a directory located at relative/extraction/path
- Validate the GTFS data and output the results to the directory located at relative/output/path. Validation results are exported to JSON by default, so this folder will contain a single .json file with information related to the validation process. The validation process is not executed on the files listed via option -x, nor on the files that rely on them.
- The generated .json file will be beautified if option -b or --beautify has been provided and set to true. Note that if this option is not specified, the validator generates a beautified version of the validation report by default.
Note:
- Export validation report as a .json file: after validating MBTA's GTFS archive on 2020-10-20 at 09:07:48 (America/Montreal timezone), the validation report will be named as follows: MBTA__2020-10-20_09/07/48.442365.json
- Export validation report as .pb files: after validating MBTA's GTFS archive on 2020-10-20 at 09:07:48 (America/Montreal timezone), the validation reports will be named as follows:
MBTA__2020-10-20_09/07/48.442365-1.pb
MBTA__2020-10-20_09/07/48.442365-2.pb
...
MBTA__2020-10-20_09/07/48.442365-n.pb
Those names come from concatenating the information found in feed_info.feed_publisher_name and the local time of execution, separated with __, then replacing each whitespace character with _.
If the GTFS file feed_info.txt is not provided, the validation report name is limited to __2020-10-20_09/07/48.442365.json or __2020-10-20_09/07/48.442365-1.pb.
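The naming rule described in this note can be sketched in a few lines of Java. This is only an illustration, not the validator's actual code: the class and method names are hypothetical, and the timestamp pattern is an assumption modelled on the hyphen-separated file names that appear in the sample output listing later in this document (e.g. MBTA__2020-10-26_08-51-29.211229.json).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class ReportNameSketch {
    // Assumed timestamp layout (date, time of day, microseconds); the real pattern may differ.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss.SSSSSS");

    // Hypothetical helper: publisher name + "__" + local time, whitespace replaced by "_".
    public static String reportName(String publisherName, LocalDateTime when, String extension) {
        // A missing feed_info.txt is modelled here as a null publisher name.
        String publisher = publisherName == null ? "" : publisherName;
        String raw = publisher + "__" + when.format(STAMP);
        return raw.replaceAll("\\s", "_") + extension;
    }

    public static void main(String[] args) {
        LocalDateTime when = LocalDateTime.of(2020, 10, 26, 8, 52, 19, 24_739_000);
        System.out.println(reportName("Orange County Transportation Authority", when, ".json"));
        System.out.println(reportName(null, when, ".json"));
    }
}
```

With a publisher name containing spaces, every space becomes an underscore; with no publisher name, the result starts directly with the double underscore.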
java -jar gtfs-validator-v1.3.0_cli.jar -i gtfs-dataset.zip -o output_folder -e extraction_folder
In order, this command line will:
- Search for a zipped GTFS dataset named gtfs-dataset.zip located in the working directory
- Extract its content to a directory named extraction_folder
- Validate the GTFS data and output the results to the directory named output_folder. This folder will contain a single .json file with information related to the validation process.
- The generated .json file will be beautified if option -b or --beautify has been provided and set to true. Note that if this option is not specified, the validator generates a beautified version of the validation report by default.
java -jar gtfs-validator-v1.3.0_cli.jar -i gtfs-dataset.zip -x fare_attributes.txt,attributions.txt
In order, this command line will:
- Search for a zipped GTFS dataset named gtfs-dataset.zip located in the working directory
- Create a directory named input
- Extract the content of gtfs-dataset.zip to the input directory
- Create a directory named output
- Exclude the files fare_attributes.txt and attributions.txt from the validation process, as well as the files that rely on them: translations.txt and fare_rules.txt
- Validate the GTFS data and output the results to the output directory. This folder will contain a single .json file with information related to the validation process.
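The way an exclusion list grows to include dependent files can be pictured with a small Java sketch. It is a hedged illustration only: the class name, the method, and the dependency table are hypothetical, and the pairing of each dependent file with the file it relies on is an assumption based on the example above, not taken from the validator's source.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ExclusionSketch {
    // Hypothetical table: key = a GTFS file, value = files assumed to rely on it.
    public static final Map<String, List<String>> DEPENDENTS = Map.of(
            "fare_attributes.txt", List.of("fare_rules.txt"),
            "attributions.txt", List.of("translations.txt"));

    // Returns the requested files plus every file that directly or indirectly relies on them.
    public static Set<String> expand(Set<String> requested, Map<String, List<String>> dependents) {
        Set<String> excluded = new TreeSet<>(requested);
        Deque<String> pending = new ArrayDeque<>(requested);
        while (!pending.isEmpty()) {
            String file = pending.pop();
            for (String dependent : dependents.getOrDefault(file, List.of())) {
                // add() returns false for files already excluded, which avoids endless loops.
                if (excluded.add(dependent)) {
                    pending.push(dependent);
                }
            }
        }
        return excluded;
    }

    public static void main(String[] args) {
        Set<String> requested = Set.of("fare_attributes.txt", "attributions.txt");
        System.out.println(expand(requested, DEPENDENTS));
    }
}
```

Passing the two files from the -x example yields those two files plus fare_rules.txt and translations.txt.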
Sample usage:
java -jar gtfs-validator-v1.3.0_cli.jar -u url/to/dataset -o relative/output/path -e relative/extraction/path -i input.zip
...which will:
- Download the GTFS feed at the URL url/to/dataset and name it input.zip
- Extract the input.zip content to the directory located at relative/extraction/path
- Validate the GTFS data and output the results to the directory located at relative/output/path. Validation results are exported to JSON by default.
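The download and extraction steps listed above can be approximated with the Java standard library alone. The sketch below is an assumption-laden illustration, not the validator's implementation: the class and method names are hypothetical, and it adds a guard against zip entries that would land outside the extraction directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DownloadAndExtractSketch {
    // Copies the resource at the given URL to zipPath (the role played by -u and -i).
    public static void download(String url, Path zipPath) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            Files.copy(in, zipPath, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Unpacks zipPath into extractionDir (the role played by -e).
    public static void extract(Path zipPath, Path extractionDir) throws IOException {
        Path root = extractionDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(zipPath))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../x" that would escape the extraction directory.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry escapes extraction directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    // Reads only the current entry; the zip stream stays open for the next one.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Calling download with the feed URL and input.zip, then extract with input.zip and the extraction path, mirrors the first two steps of the list.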
java -jar gtfs-validator-v1.3.0_cli.jar -u url/to/dataset -o output_folder -e extraction_folder -i local-dataset.zip -p
In order, this command line will:
- Download the GTFS feed at the URL url/to/dataset and name it local-dataset.zip
- Extract the local-dataset.zip content to the directory extraction_folder
- Validate the GTFS data and output the results to the directory output_folder. As option -p is provided, results will be exported as .pb files
Example: Validate a GTFS dataset without specifying command arguments or providing a configuration file
java -jar gtfs-validator-v1.3.0_cli.jar
In order, this command line will:
- Search for a zipped folder in the working directory
- Extract the content of the zipped GTFS dataset to the default directory gtfs-validator/input/
- Validate the GTFS data and output the results to the directory gtfs-validator/output. Validation results are exported to JSON by default.
For a list of all available commands, use --help:
java -jar gtfs-validator-v1.3.0_cli.jar --help
Execution parameters are configurable through the command line or via a configuration file named execution-parameters.json.
By default, if no command-line option is provided, the validation process looks for execution parameters in the user-configurable file execution-parameters.json.
If that file cannot be found or is incomplete, default values are used.
Note that if both command-line options and a configuration file are provided, the configuration file takes precedence over the command-line options.
Sample usage:
The following two sample usages are equivalent, provided that the execution-parameters.json file is located in the working directory:
java -jar gtfs-validator-v1.3.0_cli.jar -e relative/extraction/path -o relative/output/path -i relative/path/to/zipped_dataset -x agency.txt,routes.txt
{
"extract": "relative/extraction/path",
"output": "relative/output/path",
"input": "relative/path/to/zipped_dataset",
"exclude": "agency.txt,routes.txt"
}
Note that you'll need to change the above JAR file name to whatever release version you download.
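The precedence rule stated in this section (defaults first, then command-line options, then the configuration file on top) can be sketched as a simple map merge. This is a minimal illustration under assumptions: the class and method are hypothetical, the default values shown are placeholders, and whether the validator applies precedence option by option, as done here, is not confirmed by this document.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExecutionParamsSketch {
    // Later putAll calls overwrite earlier ones, so the configuration file wins over the command line.
    public static Map<String, String> resolve(Map<String, String> defaults,
                                              Map<String, String> commandLine,
                                              Map<String, String> configFile) {
        Map<String, String> resolved = new LinkedHashMap<>(defaults);
        resolved.putAll(commandLine);
        resolved.putAll(configFile);
        return resolved;
    }

    public static void main(String[] args) {
        // Placeholder defaults for illustration only.
        Map<String, String> defaults = Map.of("extract", "input", "output", "output");
        Map<String, String> commandLine = Map.of("output", "cli/output/path", "input", "dataset.zip");
        Map<String, String> configFile = Map.of("output", "relative/output/path");
        System.out.println(resolve(defaults, commandLine, configFile));
    }
}
```

In this example the output value comes from the configuration file, the input value from the command line, and the extract value from the defaults.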
A second implementation of gtfs-validator uses the SpringBoot framework and a user interface (based on React).
java -jar gtfs-validator-v1.3.0_web.war
Which will:
- Launch the server side of the application on port 8090
- Launch the client side of the application on port 8090
Open your favorite browser and go to http://localhost:8090; the user interface of the application should be displayed.
The entire validation process can be monitored in the terminal.
- Drag and drop your configuration in the area indicated for this purpose
- Click on validate
The validation report will be generated and saved at the default location or at the path specified via the configuration file's output field.
The validation report can be displayed with a simple click on the Display validation report button, which will automatically open your default text editor with the content of the validation report.
See the configuration section for more details regarding software configuration.
$ yarn start
This command runs the app in development mode. Note that this command should be run in /application/web-app/react-client/. Open http://localhost:3000 to view it in your browser.
The development page will reload if you edit the React project.
You will also be able to see any lint errors in the console.
You can refer to this documentation for more information regarding the React implementation of the web-app.
We use clean architecture principles to implement this validator, which modularizes the project.
Some important modules:
- domain - Entity classes
- use cases - Business logic
- adapter - Converters (e.g., parsers and exporters)
- application/cli-app - The main command-line application
- application/web-app/react-client - The local web ui as a React project
- application/web-app/spring-server - The implementation of the application that relies on SpringBoot framework
To run tests:
- Run Java tests
$ ./gradlew check
- Run JS tests
$ cd react-client/
$ npm test
There is a way to locally execute the run-on-data job of the end_to_end GitHub workflow.
You need Docker to be installed.
Install act:
brew install act
In the repo root folder, run:
act -j run-on-data
Note: we run into a known issue of act when trying to collect artifacts:
[End to end/run-on-data] ❗ ::error::Unable to get ACTIONS_RUNTIME_TOKEN env variable
The .zip dataset files and .json validation report files are still available within the Docker container (docker exec -it) for manual collection:
MacBook-Pro-de-Fabrice:~ fabricev$ docker exec -it b22cf048e47ad10c65be3071dd14dad999dbcf59531a2e31326733c05d861048 /bin/sh; exit
# ls
ADDING_NEW_RULES.md build.gradle mst.zip
Dockerfile config null
LICENSE domain octa.zip
MTBA.zip gradle one_empty_gtfs_file.zip
README.md gradlew output
RELEASE.md gradlew.bat settings.gradle
RULES.md input usecase
adapter mbta.zip
application mixed_empty_full_gtfs_files.zip
# cd output
# ls
MBTA__2020-10-26_08-51-29.211229.json
MST__2020-10-26_08-51-39.835092.json
Orange_County_Transportation_Authority__2020-10-26_08-52-19.024739.json
Code licensed under the Apache 2.0 License.
If you have followed the instructions in Usage via GitHub Actions and have a fork with an open PR on your master branch, you've already done most of the work! Complete the following steps to send us all the relevant information so we can diagnose and fix the issue.
- Go to https://github.com/MobilityData/gtfs-validator
- Select the Pull requests tab
- Click the green New pull request button
- In the Compare changes section, click the blue link compare across forks.
- On the left side of the ←, base repository: should be MobilityData/gtfs-validator and base: should be master
- On the right side of the ←, use the first dropdown to change head repository: to your forked one (like ilovetramways/gtfs-validator for GitHub handle ilovetramways)
- On the right side of the ←, use the second dropdown to change compare: to the branch in your fork containing the changes you made to end_to_end.yml that led to an issue
- Click the green Create pull request button
- Use the dropdown on the green Create pull request button to select Create draft pull request
- Click the green Draft pull request button
Then we're all set, thank you very much! The end to end workflow will run on the newly created PR in our repository and automatically collect all relevant information. We take care of everything from there and will follow up directly in the PR.
While we welcome all contributions, our members and sponsors see their PRs and issues prioritized.