Introduction

This repo contains temporal and spatial speed models, targeting two scenarios:

Speed imputation: increase the coverage of speed derived from GPS observations.
Speed forecasting: predict future speed for the next 15 minutes or 1 hour.

The implemented models:

Graph Convolutional Networks + LSTMs/CNNs
(unfinished) Tensor Factorization

Requirements

PyTorch version >= 1.2.0
Python version >= 3.6
For training new models, you'll also need an NVIDIA GPU

Getting Started

preprocess.py performs feature extraction.
trainer.py trains and evaluates the TGCN model:
- python trainer.py --train: train the model
- python trainer.py --test: run prediction and save the results in Parquet format

Output Data

For each subgrid and each dataset (i.e., test, train, validation), a Parquet file with the predicted values is created, with the following schema:

message schema {
  optional double 1553983200;
  optional double 1553986800;
  optional double 1553990400;
  optional double 1553994000;
  optional double 1553997600;
  optional double 1554001200;
  optional double 1554004800;
  optional double 1554008400;
  optional double 1554012000;
  ...
  optional double 1554883200;
  optional int64 from_node;
  optional int64 to_node;
}
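The numeric column names in the schema look like Unix epoch seconds spaced one hour apart (3600 s); the README does not state this explicitly, so treat it as an assumption. Under that assumption, a minimal sketch for separating the time columns from the node-id columns and decoding them could look like this. The helper names split_columns and to_utc are hypothetical and not part of the repo.

```python
from datetime import datetime, timezone


def split_columns(columns):
    """Split column names into (timestamp columns, other columns).

    Assumes timestamp columns are purely numeric strings, as in the schema above.
    """
    ts_cols, id_cols = [], []
    for name in columns:
        (ts_cols if name.isdigit() else id_cols).append(name)
    return ts_cols, id_cols


def to_utc(column_name):
    """Interpret a numeric column name as Unix epoch seconds (assumption)."""
    return datetime.fromtimestamp(int(column_name), tz=timezone.utc)


# A shortened column list taken from the schema above.
columns = ["1553983200", "1553986800", "1553990400", "from_node", "to_node"]
ts_cols, id_cols = split_columns(columns)

for name in ts_cols:
    print(name, "->", to_utc(name).isoformat())
print("edge key columns:", id_cols)
```

In practice the column list would come from the Parquet file itself, for example from the columns of a DataFrame loaded with pandas.read_parquet.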

Deployment to Kubernetes

GPU(s) can be claimed when creating a pod with the k8s-trainer.yml file. Currently, only one GPU can be claimed per pod.

Feature extraction and data transformation

To speed up training and balance the CPU / GPU load, we moved all the data transformation logic from the PyTorch dataloader into the preprocessing component. Specifically:

The preprocess.py script expects the input data as NumPy tensors (one tensor per grid) of shape [num_nodes, 1, num_timesteps]. Currently, all the downstream transformations are done with NumPy / pandas on a single machine.

Then, for every tensor, we add features extracted from JURBEY and some basic time-sensitive features, and window-slice it into a new tensor of shape [num_nodes, num_features, num_look_back_step, num_timesteps] for the data and [num_nodes, num_features, num_look_ahead_step, num_timesteps] for the target. The mask is sliced in the same way.
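The README does not list which time-sensitive features are used. As an illustration only, the sketch below appends a cyclic hour-of-day encoding (sine and cosine) to a speed tensor of shape [num_nodes, 1, num_timesteps]. The function name add_time_features, the choice of features, and the hourly step are all assumptions, not the repo's actual implementation.

```python
import numpy as np


def add_time_features(speed, start_epoch, step_seconds=3600):
    """Append sin/cos hour-of-day features to a speed tensor.

    speed: array of shape [num_nodes, 1, num_timesteps]
    start_epoch: Unix epoch seconds of the first time step (assumed UTC)
    returns: array of shape [num_nodes, 3, num_timesteps]
    """
    num_nodes, _, num_steps = speed.shape
    epochs = start_epoch + step_seconds * np.arange(num_steps)
    hour = (epochs // 3600) % 24                  # hour of day in UTC
    angle = 2.0 * np.pi * hour / 24.0
    time_feats = np.stack([np.sin(angle), np.cos(angle)])           # [2, T]
    # The same time features apply to every node, so broadcast over nodes.
    time_feats = np.broadcast_to(time_feats, (num_nodes, 2, num_steps))
    return np.concatenate([speed, time_feats], axis=1)


speed = np.zeros((4, 1, 5))
features = add_time_features(speed, start_epoch=1553983200)
print(features.shape)
```

A cyclic encoding keeps 23:00 and 00:00 close together in feature space, which a raw hour index would not.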
NOTE 1: The grid-based clustering is done separately, and the mapping is stored in the cluster-mapping.csv file.
NOTE 2: The current memory bottleneck is the window-slicing of the feature tensors. There are two alternatives: (1) do it sequentially for every chunk of time steps, or (2) use a distributed framework such as Spark.
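The window-slicing step and alternative (1) from NOTE 2 can be sketched as follows. This is a minimal illustration, not the code in preprocess.py: the names slice_windows and iter_window_chunks are hypothetical, the last output axis is taken to index the windows, and numpy.lib.stride_tricks.sliding_window_view requires NumPy 1.20 or newer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def slice_windows(x, look_back, look_ahead):
    """Slice x of shape [num_nodes, num_features, num_timesteps] into windows.

    Returns (data, target) with shapes
    [num_nodes, num_features, look_back, num_windows] and
    [num_nodes, num_features, look_ahead, num_windows].
    """
    num_windows = x.shape[-1] - look_back - look_ahead + 1
    if num_windows <= 0:
        raise ValueError("not enough time steps for the requested windows")
    # sliding_window_view puts the window axis last: [N, F, start, window].
    back = sliding_window_view(x, look_back, axis=-1)[:, :, :num_windows]
    ahead = sliding_window_view(x, look_ahead, axis=-1)
    ahead = ahead[:, :, look_back:look_back + num_windows]
    # Swap so the window length comes before the window index.
    return np.swapaxes(back, -1, -2), np.swapaxes(ahead, -1, -2)


def iter_window_chunks(x, look_back, look_ahead, chunk_size):
    """Yield (data, target) for chunk_size window starts at a time."""
    total = x.shape[-1] - look_back - look_ahead + 1
    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)
        # Each chunk needs the extra steps covered by its last window.
        piece = x[..., start:stop + look_back + look_ahead - 1]
        yield slice_windows(piece, look_back, look_ahead)


x = np.arange(20, dtype=float).reshape(2, 1, 10)
data, target = slice_windows(x, look_back=3, look_ahead=2)
chunks = list(iter_window_chunks(x, look_back=3, look_ahead=2, chunk_size=4))
print(data.shape, target.shape, len(chunks))
```

The arrays returned here are views on the input, so the slicing itself allocates little memory; the cost appears once the windows are copied or written out. Processing one chunk at a time bounds that cost by chunk_size. A mask tensor of the same shape can be passed through the same functions.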

License

Apache 2.0
