Machine learning for transportation data imputation and prediction.


MIT License | Python 3.7


Machine learning models have driven important advances in spatiotemporal data modeling, such as forecasting near-future traffic states of road networks. But what happens when these models are built on the incomplete data commonly collected in real-world systems?

About the Project

In the transdim (transportation data imputation) project, we build machine learning models to help address some of the toughest challenges of spatiotemporal data modeling -- from missing data imputation to time series prediction.

In a hurry? Jump straight to a section from the table of contents below.

Table of Contents
Strategic Aim
Tasks and Challenges
What we do just now!
Our Implementation
Selected References
Our Publications

Strategic Aim

Creating accurate and efficient solutions for the spatiotemporal traffic data imputation and prediction tasks.

Tasks and Challenges

Missing data are there, whether we like them or not. The really interesting question is how to deal with incomplete data.

  • Missing data imputation

    • Random missing (RM): Each sensor loses its observations completely at random. (★★★)
    • Non-random missing (NM): Each sensor loses its observations over several days. (★★★★)
  • Spatiotemporal prediction

    • Forecasting without missing values. (★★★)
    • Forecasting with incomplete observations. (★★★★★)
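The two missing patterns above can be sketched with NumPy masks. This is a minimal illustration under stated assumptions, not the code used in the notebooks: the function names, the array sizes, the sensor × day × interval layout, and the convention of marking missing entries with zero are all hypothetical choices made for the example.

```python
import numpy as np

def random_missing_mask(shape, missing_rate, rng):
    """RM: every entry is dropped independently with probability `missing_rate`."""
    return rng.random(shape) >= missing_rate  # True means "observed"

def non_random_missing_mask(n_sensors, n_days, n_intervals, missing_rate, rng):
    """NM: each (sensor, day) pair is dropped as a whole with probability `missing_rate`."""
    day_mask = rng.random((n_sensors, n_days)) >= missing_rate
    # Copy the per-day decision to every time interval of that day.
    return np.repeat(day_mask[:, :, None], n_intervals, axis=2)

rng = np.random.default_rng(0)

# Hypothetical speed tensor: 5 sensors x 7 days x 24 intervals per day.
speed = rng.uniform(20.0, 60.0, size=(5, 7, 24))

rm = random_missing_mask(speed.shape, 0.3, rng)
nm = non_random_missing_mask(5, 7, 24, 0.3, rng)

# Assumed convention for this sketch: a missing entry is stored as 0.
sparse_rm = np.where(rm, speed, 0.0)
sparse_nm = np.where(nm, speed, 0.0)
```

Under NM, a sensor's whole day is either fully observed or fully missing, which is why it is typically the harder of the two imputation scenarios.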

What we do just now!

  • Task 1: Industrial tensor completion framework for multi-dimensional missing traffic data imputation.


  • Task 2: An illustration of single-step rolling prediction task under a matrix factorization framework.

    • Example: Traffic forecasting using matrix factorization models.
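To make the idea of single-step rolling prediction under a matrix factorization framework concrete, here is a simplified sketch. It is not the TRMF or BTMF implementation from the notebooks: it assumes a fully observed sensor × time matrix, swaps in a truncated SVD for the factorization, and fits an independent AR(1) coefficient to each temporal factor. All function names are hypothetical.

```python
import numpy as np

def factorize(Y, rank):
    """Rank-`rank` factorization Y ~ W @ X via truncated SVD (a stand-in technique)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    W = U[:, :rank] * s[:rank]   # spatial factors, shape (n_sensors, rank)
    X = Vt[:rank, :]             # temporal factors, shape (rank, n_steps)
    return W, X

def forecast_next(Y, rank):
    """Predict the next column of Y from its low-rank temporal factors."""
    W, X = factorize(Y, rank)
    prev, curr = X[:, :-1], X[:, 1:]
    # Least-squares AR(1) coefficient per factor: x[t] ~ a * x[t-1].
    denom = np.maximum(np.sum(prev * prev, axis=1), 1e-12)
    a = np.sum(prev * curr, axis=1) / denom
    x_next = a * X[:, -1]
    return W @ x_next

def rolling_forecast(Y, rank, start):
    """Single-step rolling prediction: forecast column t using only columns before t."""
    preds = [forecast_next(Y[:, :t], rank) for t in range(start, Y.shape[1])]
    return np.stack(preds, axis=1)

# Synthetic rank-one example: three sensors sharing one decaying temporal pattern.
steps = np.arange(12)
Y = np.outer(np.array([1.0, 2.0, 3.0]), 0.9 ** steps)
preds = rolling_forecast(Y, rank=1, start=5)
```

Each forecast uses only the data available before the target time step, and the window then rolls forward by one step. Handling missing entries inside the factorization, as the models in this repository do, is outside the scope of this sketch.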


What we care about!

  • Best algebraic structure for data imputation.
  • The context of urban transportation.
  • Data noise avoidance.
  • Competitive imputation and prediction performance.
  • Capable of various missing data scenarios.

Our Implementation

Open data

In this repository, we have adapted public data sets for our experiments. For example, the following code reads the Guangzhou data set:


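import scipy.io

# Each .mat file stores one array under a key that matches its file name.
tensor = scipy.io.loadmat('../Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']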

If you want to view the original data, please check out the following links:

Model implementation

In our experiments, we implemented the machine learning models mainly with NumPy and wrote the Python code in Jupyter Notebooks. To evaluate these models, you can download and run the notebooks directly (prerequisite: download the data sets first).

Task Jupyter Notebook link Gdata Bdata Hdata Sdata Ndata
Imputation BTMF 🔶 🔶
BayesTRMF 🔶 🔶
TRMF 🔶 🔶
BPMF 🔶 🔶
BTTF 🔶 🔶 🔶 🔶
BayesTRTF 🔶 🔶 🔶 🔶
BPTF 🔶 🔶 🔶 🔶
Prediction BTMF 🚧 🔶
BayesTRMF 🚧 🔶
TRMF 🚧 🔶
BTTF 🔶 🔶 🔶 🔶
BayesTRTF 🔶 🔶 🔶 🔶
TRTF 🔶 🔶 🔶 🔶
  • — Covered
  • 🔶 — Does not cover
  • 🚧 — Under development

If anything in this repository is unclear or you have any suggestions, please feel free to contact Xinyu Chen by email.

Recommended email subject: Suggestions on transdim from [+ your name].

Imputation/Prediction performance

  • Imputation example

(a) Time series of actual and estimated speed within two weeks from August 1 to 14.

(b) Time series of actual and estimated speed within two weeks from September 12 to 25.

The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with a third-order tensor representation, where the estimated result for road segment #1 is shown as an example. In both panels, red rectangles mark fiber missing (i.e., speed observations are lost for a whole day).
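As a rough illustration of what the CP rank means in a third-order tensor representation, the sketch below rebuilds a road segment × day × time interval tensor from three factor matrices. It only shows the reconstruction step of a CP model. The factor matrices here are random placeholders rather than factors learned by BGCP, and the sizes and names are hypothetical.

```python
import numpy as np

def cp_reconstruct(U, V, W):
    """Sum of rank-one tensors: X[i, j, k] = sum over r of U[i, r] * V[j, r] * W[k, r]."""
    return np.einsum('ir,jr,kr->ijk', U, V, W)

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 road segments, 6 days, 8 intervals per day, CP rank 3.
n_segments, n_days, n_intervals, rank = 4, 6, 8, 3
U = rng.normal(size=(n_segments, rank))    # road segment factors
V = rng.normal(size=(n_days, rank))        # day factors
W = rng.normal(size=(n_intervals, rank))   # time-of-day factors

X_hat = cp_reconstruct(U, V, W)

# One whole-day fiber (segment 0, day 2). Its estimate depends only on the
# factor rows U[0], V[2], and W, so it is defined even if every observation
# in that day was missing.
fiber = X_hat[0, 2, :]
```

Because each day fiber is estimated from factors shared with other segments, days, and intervals, a CP model can still produce values for a fiber with no observations at all.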

  • Prediction example




Selected References

Our Publications

  • Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [doi] [slide] [data] [Matlab code]

  • Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]

  • Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]

    Please consider citing our papers if they help your research.

Our Blog Posts (in Chinese)


This work is released under the MIT license.
