Machine learning models have driven important advances in spatiotemporal data modeling, such as forecasting the near-future traffic states of road networks. But what happens when these models are built on the incomplete data commonly collected in real-world systems?
In the transdim (transportation data imputation) project, we build machine learning models to help address some of the toughest challenges of spatiotemporal data modeling, from missing data imputation to time series prediction. The strategic aim of this project is to create accurate and efficient solutions for spatiotemporal traffic data imputation and prediction tasks.
In a hurry? Check out the contents below.
Missing data are there, whether we like them or not. The really interesting question is how to deal with incomplete data.
- Missing data imputation 🔥
  - Random missing (RM): Each sensor loses observations completely at random. (★★★)
  - Non-random missing (NM): Each sensor loses observations over several days. (★★★★)
Example: Tensor completion framework for multi-dimensional missing traffic data imputation.
- Spatiotemporal prediction 🔥
  - Forecasting without missing values. (★★★)
  - Forecasting with incomplete observations. (★★★★★)
Example: An illustration of the single-step rolling prediction task under a matrix factorization framework.
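To make the idea of single-step rolling prediction under a matrix factorization framework more concrete, here is a minimal sketch. It is not one of the models in this repository: truncated SVD stands in for the factorization, a least-squares first-order autoregression links the temporal factors, and the data are assumed to be fully observed. The function name `rolling_single_step_forecast` and the toy data are hypothetical.

```python
import numpy as np


def rolling_single_step_forecast(data, rank, n_train):
    """Forecast data[:, t] for every t >= n_train, one step at a time.

    Hypothetical helper: truncated SVD stands in for the matrix
    factorization, and a least-squares VAR(1) model links temporal factors.
    `data` is a (series x time) matrix without missing values.
    """
    n_series, n_steps = data.shape
    forecasts = np.zeros((n_series, n_steps - n_train))
    for t in range(n_train, n_steps):
        history = data[:, :t]  # every column observed so far
        u, s, vt = np.linalg.svd(history, full_matrices=False)
        spatial = u[:, :rank] * s[:rank]  # spatial factors, (n_series, rank)
        temporal = vt[:rank, :]           # temporal factors, (rank, t)
        # Fit temporal[:, k + 1] ~ A @ temporal[:, k] by least squares.
        a_transposed, *_ = np.linalg.lstsq(
            temporal[:, :-1].T, temporal[:, 1:].T, rcond=None
        )
        next_factor = a_transposed.T @ temporal[:, -1]
        forecasts[:, t - n_train] = spatial @ next_factor
        # Rolling step: the true column t joins the history in the next pass.
    return forecasts


# Toy data: two latent factors that rotate over time, mixed into five series.
angle = 0.3
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
n_steps = 40
factors = np.zeros((2, n_steps))
factors[:, 0] = [1.0, 0.0]
for k in range(1, n_steps):
    factors[:, k] = rotation @ factors[:, k - 1]
loadings = np.random.default_rng(0).normal(size=(5, 2))
data = loadings @ factors

forecasts = rolling_single_step_forecast(data, rank=2, n_train=30)
max_error = np.abs(forecasts - data[:, 30:]).max()
print(f"largest absolute forecast error: {max_error:.2e}")
```

Because the toy data are exactly low rank with linear dynamics, the forecasts should match the held-out columns up to floating-point error. Real traffic data are noisy and incomplete, which is where the models listed in this project come in.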
In this repository, we have adapted several public data sets for our experiments. For example, to load the Guangzhou data set in a Python console, you can use the following code:
import scipy.io
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']
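The sketch below shows one plausible way that arrays like `random_tensor` and `random_matrix` could be turned into the two missing patterns described earlier. Everything here is an assumption made for illustration: the arrays are synthetic stand-ins with made-up shapes (sensor × day × time interval), the random arrays are assumed to hold uniform values in [0, 1), the missing rate is arbitrary, and zeros are assumed to mark missing entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins with assumed shapes: sensors x days x time intervals.
n_sensors, n_days, n_intervals = 6, 7, 24
tensor = rng.uniform(20.0, 60.0, size=(n_sensors, n_days, n_intervals))
random_tensor = rng.uniform(size=(n_sensors, n_days, n_intervals))
random_matrix = rng.uniform(size=(n_sensors, n_days))

missing_rate = 0.3  # hypothetical choice

# Random missing (RM): each entry is dropped independently.
rm_mask = random_tensor >= missing_rate

# Non-random missing (NM): a whole (sensor, day) fiber is dropped at once.
nm_mask = np.broadcast_to(
    (random_matrix >= missing_rate)[:, :, None], tensor.shape
)

# Assumption: missing entries are encoded as zeros.
rm_tensor = tensor * rm_mask
nm_tensor = tensor * nm_mask

print("RM observed fraction:", rm_mask.mean())
print("NM observed fraction:", nm_mask.mean())
```

Under RM the gaps are scattered across individual entries, while under NM every sensor-day is either fully observed or fully missing, which is usually the harder case for imputation.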
If you want to view the original data, please check out the following links:
- Gdata: Guangzhou urban traffic speed data set.
- Bdata: Birmingham parking data set.
- Hdata: Hangzhou metro passenger flow data set.
- Ndata: NYC taxi data set.
- Sdata: Seattle freeway traffic speed data set.
In our experiments, we have implemented the machine learning models mainly with NumPy and written the Python code in Jupyter Notebooks. To evaluate these models, you can download and run the notebooks directly (prerequisite: download the data sets before evaluation).
Task | Jupyter Notebook link | Gdata | Bdata | Hdata | Sdata | Ndata |
---|---|---|---|---|---|---|
Missing Data Imputation | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
BPMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
BGCP | ✅ | ✅ | ✅ | ✅ | ✅ | |
TF-ALS | ✅ | ✅ | ✅ | ✅ | ✅ | |
BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
BPTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
Single-Step Prediction | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
TRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
Multi-Step Prediction | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 | |
BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ | |
TRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
- ✅: Covered
- 🔶: Not covered
- 🚧: Under development
If you have any suggestions, please feel free to contact Xinyu Chen (email: chenxy346@mail2.sysu.edu.cn).
Recommended email subject: Suggestions on transdim from [+ your name].
- Imputation example
(a) Time series of actual and estimated speed within two weeks from August 1 to 14.
(b) Time series of actual and estimated speed within two weeks from September 12 to 25.
The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with third-order tensor representation, where the estimated result of road segment #1 is selected as an example. In both panels, red rectangles represent fiber missing (i.e., speed observations are lost for a whole day).
- Prediction example
-
Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski, 2019. Deep Factors for Forecasting. ICML 2019. (★★★★★)
-
Danielle C. Maddix, Yuyang Wang, Alex Smola, 2018. Deep Factors with Gaussian Processes for Forecasting. arXiv.
-
Syama Sundar Rangapuram, Matthias Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, Tim Januschowski, 2018. Deep State Space Models for Time Series Forecasting. NeurIPS 2018.
-
San Gultekin, John Paisley, 2019. Online Forecasting Matrix Factorization. IEEE Transactions on Signal Processing, 67(5): 1223-1236. [Python code]
-
Zheyi Pan, Yuxuan Liang, Junbo Zhang, Xiuwen Yi, Yong Yu, Yu Zheng, 2018. HyperST-Net: hypernetworks for spatio-temporal forecasting. arXiv.
-
Truc Viet Le, Richard Oentaryo, Siyuan Liu, Hoong Chuin Lau, 2017. Local Gaussian processes for efficient fine-grained traffic speed prediction. arXiv.
-
Yaguang Li, Cyrus Shahabi, 2018. A brief overview of machine learning methods for short-term traffic forecasting and future directions. ACM SIGSPATIAL, 10(1): 3-9.
-
Bing Yu, Haoteng Yin, Zhanxing Zhu, 2017. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv. (appeared in IJCAI 2018)
-
Feras A. Saad, Vikash K. Mansinghka, 2018. Temporally-reweighted Chinese Restaurant Process mixtures for clustering, imputing, and forecasting multivariate time series. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. PMLR: Volume 84.
-
Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu, 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(6085).
-
Zhengping Che, Sanjay Purushotham, Guangyu Li, Bo Jiang, Yan Liu, 2018. Hierarchical deep generative models for multi-rate multivariate time series. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), PMLR 80:784-793, 2018.
-
Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla, 2018. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. arXiv.
-
Wang, X., Chen, C., Min, Y., He, J., Yang, B., Zhang, Y., 2018. Efficient metropolitan traffic prediction based on graph recurrent neural network. arXiv.
-
Peiguang Jing, Yuting Su, Xiao Jin, Chengqian Zhang, 2018. High-order temporal correlation model learning for time-series prediction. IEEE Transactions on Cybernetics, early access.
-
Oren Anava, Elad Hazan, Assaf Zeevi, 2015. Online time series prediction with missing data. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 37: 2191-2199.
-
Shanshan Feng, Gao Cong, Bo An, Yeow Meng Chee, 2017. POI2Vec: Geographical latent representation for predicting future visitors. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
-
Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos, Tomoharu Iwata, Masatoshi Yoshikawa, 2012. Fast mining and forecasting of complex time-stamped events. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2012).
-
Yasuko Matsubara, Yasushi Sakurai, Willem G. van Panhuis, Christos Faloutsos, 2014. FUNNEL: automatic mining of spatially coevolving epidemics. Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD 2014).
-
Koh Takeuchi, Hisashi Kashima, Naonori Ueda, 2017. Autoregressive tensor factorization for spatio-temporal predictions. 2017 IEEE International Conference on Data Mining (ICDM 2017).
-
Shun-Yao Shih, Fan-Keng Sun, Hung-yi Lee, 2018. Temporal pattern attention for multivariate time series forecasting. arXiv.
-
Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara, Shin Ishii, 2003. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19: 2088-2096. [Matlab code]
-
Li Qu, Li Li, Yi Zhang, Jianming Hu, 2009. PPCA-based missing data imputation for traffic flow volume: a systematical approach. IEEE Transactions on Intelligent Transportation Systems, 10(3): 512-522.
-
Li Li, Yuebiao Li, Zhiheng Li, 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation Research Part C: Emerging Technologies, 34: 108-120.
-
Michalis K. Titsias, Magnus Rattray, Neil D. Lawrence, 2009. Markov chain Monte Carlo algorithms for Gaussian processes, Chapter.
-
Filipe Rodrigues, Kristian Henrickson, Francisco C. Pereira, 2018. Multi-output Gaussian processes for crowdsourced traffic data imputation. IEEE Transactions on Intelligent Transportation Systems, early access. [Matlab code]
-
Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv. [Python code]
-
Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, Guillermo Sapiro, 2012. Kernelized probabilistic matrix factorization: exploiting graphs and side information. [slide]
-
John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani, 2017. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv.
-
David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus, 2019. High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. arXiv. (★★★★)
-
Nikhil Rao, Hsiangfu Yu, Pradeep Ravikumar, Inderjit S Dhillon, 2015. Collaborative filtering with graph information: Consistency and scalable methods. Neural Information Processing Systems (NIPS 2015). [Matlab code]
-
Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. [Matlab code]
-
Yongshun Gong, Zhibin Li, Jian Zhang, Wei Liu, Yu Zheng, Christina Kirsch, 2018. Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. In The 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy.
-
Ruslan Salakhutdinov, Andriy Mnih, 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland. [Matlab code (official)] [Python code] [Julia and C++ code] [Julia code]
-
Neil D. Lawrence, Raquel Urtasun, 2009. Non-linear Matrix Factorization with Gaussian Processes. ICML 2009. (★★★★★)
-
Ilya Sutskever, Ruslan Salakhutdinov, Joshua B. Tenenbaum, 2009. Modelling relational data using Bayesian clustered tensor factorization. NIPS 2009.
-
Ankan Saha, Vikas Sindhwani, 2012. Learning evolving and emerging topics in social media: A dynamic NMF approach with temporal regularization. WSDM 2012. (★★★★)
-
Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv.
-
Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell, 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. Proceedings of the 2010 SIAM International Conference on Data Mining. SIAM, pp. 211-222.
-
Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1751-1763.
-
Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian sparse Tucker models for dimension reduction and tensor completion. arXiv.
-
Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David B. Dunson, Lawrence Carin, 2014. Scalable Bayesian low-rank decomposition of incomplete multiway tensors. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China.
-
Ömer Deniz Akyildiz, Theodoros Damoulas, Mark F. J. Steel, 2019. Probabilistic sequential matrix factorization. arXiv. (★★★★★)
-
Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, Pierre Vandergheynst, 2014. Matrix completion on graphs. arXiv. (appeared in NIPS 2014)
-
Rianne van den Berg, Thomas N. Kipf, Max Welling, 2018. Graph convolutional matrix completion. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), London, UK.
-
Federico Monti, Michael M. Bronstein, Xavier Bresson, 2017. Geometric Matrix Completion with Recurrent Multi-Graph Neural Networks. NIPS 2017.
-
Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.
-
Bin Ran, Huachun Tan, Yuankai Wu, Peter J. Jin, 2016. Tensor based missing traffic data completion with spatial–temporal correlation. Physica A: Statistical Mechanics and its Applications, 446: 54-63.
-
Brandon Amos, 2016. Image completion with deep learning in TensorFlow. blog post. [github]
-
Jinsung Yoon, James Jordon, Mihaela van der Schaar, 2018. GAIN: missing data imputation using Generative Adversarial Nets. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden. [supplementary materials] [Python code]
-
Ian Goodfellow, 2016. NIPS 2016 tutorial: Generative Adversarial Networks.
-
Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, Georg Langs, 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv.
-
Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, Xiaojie Yuan, 2018. Multivariate time series imputation with generative adversarial networks. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]
-
Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, 2017. Factorized variational autoencoders for modeling audience reactions to movies. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.
-
Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. WWW 2018.
-
John T. McCoy, Steve Kroon, Lidia Auret, 2018. Variational Autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine, 51(21): 141-146. [Python code] [VAE demo]
-
Pierre-Alexandre Mattei, Jes Frellsen, 2018. missIWAE: Deep generative modelling and imputation of incomplete data. Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montréal, Canada. [related slide]
-
Guillaume Rabusseau, Hachem Kadri, 2016. Low-rank regression with tensor responses. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
-
Rose Yu, Yan Liu, 2016. Learning from multiway data: simple and efficient tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
-
Masaaki Imaizumi, Kohei Hayashi, 2016. Doubly decomposing nonparametric tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
-
Rose Yu, Guangyu Li, Yan Liu, 2018. Tensor regression meets Gaussian processes. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. [Matlab code]
-
Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang, 2018. Boosted sparse and low-rank tensor regression. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
-
Liangjie Hong, 2015. Poisson matrix factorization. blog post.
-
Ali Taylan Cemgil, 2009. Bayesian inference for nonnegative matrix factorisation models. Computational intelligence and neuroscience.
-
Prem Gopalan, Jake M. Hofman, David M. Blei, 2015. Scalable recommendation with hierarchical poisson factorization. In UAI, 326-335. [C++ code]
-
Laurent Charlin, Rajesh Ranganath, James McInerney, 2015. Dynamic Poisson factorization. Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Austria. [C++ code]
-
Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee, 2017. Recurrent Poisson factorization for temporal recommendation. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, Nova Scotia, Canada. [Matlab code]
-
Aaron Schein, Scott W. Linderman, Mingyuan Zhou, David M. Blei, Hanna Wallach, 2019. Poisson-Randomized Gamma Dynamical Systems. arXiv. (★★★★★)
-
Arman Hasanzadeh, Xi Liu, Nick Duffield, Krishna R. Narayanan, Byron Chigoy, 2017. A graph signal processing approach for real-time traffic prediction in transportation networks. arXiv.
-
Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, Pierre Vandergheynst, 2018. Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE, 106(5): 808-828. [slide]
-
Structured deep models: Deep learning on graphs and beyond. slide.
-
gcn: Implementation of Graph Convolutional Networks in TensorFlow. GitHub project.
-
gated-graph-neural-network-samples: Sample Code for Gated Graph Neural Networks. GitHub project.
-
Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, Yan Liu, 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. AAAI 2019.
-
Menglin Wang, Baisheng Lai, Zhongming Jin, Yufeng Lin, Xiaojia Gong, Jiangqiang Huang, Xiansheng Hua, 2018. Dynamic spatio-temporal graph-based CNNs for traffic prediction. arXiv.
-
Daniel J. Stekhoven, Peter Bühlmann, 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1): 112–118. [missingpy - PyPI] or [missingpy - GitHub]
-
fancyimpute: A variety of matrix completion and imputation algorithms implemented in Python. [homepage]
-
Dimitris Bertsimas, Colin Pawlowski, Ying Daisy Zhuo, 2018. From predictive methods to missing data imputation: An optimization approach. Journal of Machine Learning Research, 18(196): 1-39.
-
Wei Cao, Dong Wang, Jian Li, Hao Zhou, Yitan Li, Lei Li, 2018. BRITS: Bidirectional Recurrent Imputation for Time Series. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]
-
Xinyu Chen, Lijun Sun (2019). Bayesian temporal factorization for multidimensional time series prediction. arXiv: 1910.06366. [preprint] [slide] [data & Python code]
-
Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [doi] [slide] [data] [Matlab code]
-
Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]
-
Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]
This project originates from our papers. Please consider citing them if they help your research.
Xinyu Chen 💻 | Jinming Yang 💻 | Yixian Chen 💻 | Lijun Sun 💻 | Tianyang Han 💻 |
See the list of contributors who participated in this project.
This work is released under the MIT license.