Build process for OpenTripPlanner-data-container


This project contains tools for fetching, building, and deploying fresh OTP data containers for consumption by the hsl, waltti, and finland OTP instances.

Main components

otp-data-builder

The actual data builder application. This is a node.js application that fetches and processes new gtfs/osm data. It is built around gulp, and each step of the data-building process can also be invoked directly from the source tree. The only required external dependency is docker, which is used for launching external commands, for example for data manipulation.

  • install the gulp CLI: npm install -g gulp-cli
  • install app dependencies: npm install
  • update OSM data: gulp osm:update
  • download new gtfs data for waltti: ROUTERS=waltti gulp gtfs:dl

Configuration

The behaviour of the data builder can be changed by defining environment variables (an example invocation follows the list).

  • "DOCKER_USER" defines username for authenticating to docker hub.
  • "DOCKER_AUTH" defines password for authenticating to docker hub.
  • (Optional, defaults to latest plus a tag based on the date) "DOCKER_TAG" defines the docker tag applied to the updated data container images in the remote container registry.
  • (Optional, default latest) "SEED_TAG" defines what version of data container should be used for seeding.
  • (Optional, default latest) "OTP_TAG" defines what version of OTP is used for testing and building graphs.
  • (Optional, default latest) "TOOLS_TAG" defines what version of otp-data-tools image is used for testing.
  • (Optional, default ${process.cwd()}/data) "HOST_DATA" defines base path for volume directories.
  • (Optional, default 'finland, waltti, hsl') "ROUTERS" defines which data containers are being built and deployed.
  • (Optional, default ${process.cwd()}/data) "DATA" defines base path for data directories in container's file system.
  • (Optional, default 23:00:00) "BUILD_TIME" defines when data build is being run. Uses UTC time.
  • (Optional, default dev) "BUILDER_TYPE" is used as a postfix to the Slack bot name.
  • (Optional, default {}) "EXTRA_SRC" defines gtfs src values that should be overridden, or completely new srcs that should be added under a unique id. "routers" is always a mandatory field. Example format:
    • {"FOLI": {"url": "http://data.foli.fi/gtfs/gtfs.zip", "fit": false, "rules": ["router-waltti/gtfs-rules/waltti.rule"], "routers": ["hsl", "finland"]}}
    • You can remove a src by including "remove": true, e.g. {"FOLI": {"remove": true, "routers": ["hsl"]}}
  • (Optional, default {}) "EXTRA_UPDATERS" defines router-config.json updater values that should be overridden, or completely new updaters that should be added under a unique id. "routers" is always a mandatory field. Example format:
    • {"turku-alerts": {"type": "real-time-alerts", "frequencySec": 30, "url": "https://foli-beta.nanona.fi/gtfs-rt/reittiopas", "feedId": "FOLI", "fuzzyTripMatching": true, "routers": ["waltti"]}}
    • You can remove an updater by including "remove": true, e.g. {"turku-alerts": {"remove": true, "routers": ["waltti"]}}
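
For example, a one-off build for the waltti router with an extra gtfs source could be launched roughly like this (a sketch: the credential values are placeholders, and the FOLI entry is the example from above with its "routers" field narrowed to waltti):

    # placeholder credentials; EXTRA_SRC adds the FOLI feed for this run
    DOCKER_USER=builder \
    DOCKER_AUTH=secret \
    ROUTERS=waltti \
    EXTRA_SRC='{"FOLI": {"url": "http://data.foli.fi/gtfs/gtfs.zip", "fit": false, "rules": ["router-waltti/gtfs-rules/waltti.rule"], "routers": ["waltti"]}}' \
    node index.js once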

Data processing steps

Seed data can be retrieved with a single gulp command:

  1. seed

Downloads the previous data containers (the env variable SEED_TAG can be used to customize which tag is pulled), extracts the osm and gtfs data from them, and places it in the 'data/ready' directory. The old data acts as a backup in case fetching or validating new data fails.
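
For example, to seed from a specific container tag instead of latest (the tag value shown is illustrative):

    SEED_TAG=2020-08-01 gulp seed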

Currently there is a single processing step for OSM data. Because the gtfs processing steps require osm data, the osm data must be available before running the gtfs:fit stage described below.

  1. osm:update

This command downloads the required OSM packages from the configured location, tests the file(s) with otp, and, if the tests pass, copies the data to the 'data/downloads/osm' directory.

The data is then processed with the following steps (a combined sketch follows the list):

  1. gtfs:dl Downloads a gtfs package from the configured location, tests the file with otp, and, if the test passes, copies the data to the 'data/fit/gtfs' directory. The resulting zip is named after the gtfs feed id (<feedid>.zip).

  2. gtfs:fit Runs the configured map fits and copies the data to the 'data/filter/gtfs' directory.

  3. gtfs:filter Runs the configured filterings and copies the data to the 'data/id/gtfs' directory.

  4. gtfs:id Sets the gtfs feed id and copies the data to the 'data/ready/gtfs' directory.
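
Assuming the osm data is already in place (see osm:update above), the whole gtfs pipeline for one router set can be run step by step, for example (a sketch; it assumes each stage honours the ROUTERS variable the same way gtfs:dl does):

    ROUTERS=waltti gulp gtfs:dl
    ROUTERS=waltti gulp gtfs:fit
    ROUTERS=waltti gulp gtfs:filter
    ROUTERS=waltti gulp gtfs:id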

Building the router from available (seeded or downloaded and processed) data:

  1. router:buildGraph

Prebuilds the graph with either the current latest version of OTP or a user-defined version (set via the env variable OTP_TAG), and creates zip files ready for building the otp-data container.
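
For example, to prebuild graphs against a pinned OTP version instead of latest (the tag value shown is illustrative):

    OTP_TAG=v1.5.0 gulp router:buildGraph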

The final step is router deployment:

  1. deploy.sh

Builds a data container, starts it, starts either the latest or a user-defined version (via the env variable OTP_TAG) of otp, and runs routing tests to verify that the data container looks OK (the latest otp-data-tools image is used for this by default; the TOOLS_TAG env variable can be used to change that). If the tests pass, the fresh data container is pushed to Docker Hub.
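
A minimal sketch of a deployment run (this assumes deploy.sh reads the same environment variables as the builder; the credential values are placeholders):

    # placeholder credentials; tags pin which images are used for testing
    DOCKER_USER=builder DOCKER_AUTH=secret OTP_TAG=latest TOOLS_TAG=latest ./deploy.sh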

Normally, when the application is running (as a container), index.js is used. It runs the data updating process on a schedule specified as a cron pattern. A data build can be executed immediately by attaching to the builder container with bash ('docker exec -i -t <container> /bin/bash') and then executing the command 'node index.js once'. The end result of the build is two docker images uploaded to Docker Hub. Digitransit-deployer detects the changes and restarts the OTP instances so that the new data comes into use.
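
Spelled out as commands (the container name 'otp-data-builder' is a placeholder for whatever name your builder container runs under):

    docker exec -i -t otp-data-builder /bin/bash
    node index.js once   # inside the container: run a single build immediately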

Each data container image runs an http server listening on port 8080, serving both a gtfs data bundle and a pre-built graph.
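
A sketch of inspecting a freshly built container locally (the image name and bundle file name are illustrative; the actual names depend on the router and the registry in use):

    docker run --rm -p 8080:8080 example/opentripplanner-data-container-hsl:latest
    curl -O http://localhost:8080/router-hsl.zip   # hypothetical bundle name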

otp-data-tools

Contains tools for gtfs manipulation, such as the One Bus Away gtfs filter. These tools are packaged inside a docker container and are used during the data build process.
