Made by Xinyu Chen • 🌐 https://twitter.com/chenxy346
Machine learning models have made important advances in spatiotemporal data modeling, such as forecasting the near-future traffic states of road networks. But what happens when these models are built with the incomplete data commonly collected from real-world systems?
In the transdim (transportation data imputation) project, we build machine learning models to address some of the toughest challenges of spatiotemporal data modeling, from missing data imputation to time series prediction. The strategic aim of this project is to create accurate and efficient solutions for spatiotemporal traffic data imputation and prediction tasks.
In a hurry? Check out the contents below.
Missing data are there, whether we like them or not. The really interesting question is how to deal with incomplete data.
- Missing data imputation 🔥
  - Random missing (RM): each sensor loses its observations completely at random. (★★★)
  - Non-random missing (NM): each sensor loses its observations during several whole days. (★★★★)
- Spatiotemporal prediction 🔥
  - Forecasting without missing values. (★★★)
  - Forecasting with incomplete observations. (★★★★★)
Figure 3: Illustration of our proposed Low-Rank Tensor Completion (LATC) imputer/predictor with a prediction window τ (green nodes: observed values; white nodes: missing values; red nodes/panel: prediction; blue panel: training data to construct the tensor).
In this repository, we have adapted publicly available data sets for our experiments. If you want to view or use these data sets, please download them into the ../datasets/ folder in advance, then run the following code in your Python console:
import scipy.io
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']
If you want to view the original data, please check out the following links:
- Gdata: Guangzhou urban traffic speed data set.
- Bdata: Birmingham parking data set.
- Hdata: Hangzhou metro passenger flow data set.
- Sdata: Seattle freeway traffic speed data set.
- Ndata: NYC taxi data set.
In particular, we also consider large-scale traffic data imputation/prediction on the PeMS-4W and PeMS-8W data sets:
- PeMS-4W/8W/12W: Large-scale traffic speed data sets in California, USA.
You can download the data sets from Zenodo and place them in the datasets folder (data path example: ../datasets/California-data-set/pems-4w.csv). Then you can open the data in Python using pandas:
import pandas as pd
data = pd.read_csv('../datasets/California-data-set/pems-4w.csv', header = None)
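Each row of the loaded CSV presumably corresponds to one sensor and each column to one time step. Under that assumption (288 five-minute intervals per day, so the column count is days × 288), the matrix can be reshaped into the (sensor, day, time of day) tensor used throughout this project. A sketch with synthetic stand-in data (the `demo_` names are hypothetical, not from the notebooks):

```python
import numpy as np

# Assumed layout: one row per sensor, 288 five-minute intervals per day.
num_sensors, num_days, steps_per_day = 5, 28, 288
demo_mat = np.random.rand(num_sensors, num_days * steps_per_day)

# Reshape the sensor-by-time matrix into a third-order tensor.
demo_tensor = demo_mat.reshape(num_sensors, num_days, steps_per_day)
print(demo_tensor.shape)  # (5, 28, 288)
```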
For model evaluation, we mask certain entries of the "observed" data as missing values and then perform imputation for these "missing" values.
In our experiments, we have implemented these machine learning models mainly with NumPy, and written the Python code in Jupyter notebooks. So, if you want to evaluate these models, please download and run the notebooks directly (prerequisite: download the data sets in advance).
- Our models
Task | Jupyter Notebook | Gdata | Bdata | Hdata | Sdata | Ndata |
---|---|---|---|---|---|---|
Missing Data Imputation | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BGCP | ✅ | ✅ | ✅ | ✅ | ✅ |
| | LRTC-TNN | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
Single-Step Prediction | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
Multi-Step Prediction | BTMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BTTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
- Baselines
Task | Jupyter Notebook | Gdata | Bdata | Hdata | Sdata | Ndata |
---|---|---|---|---|---|---|
Missing Data Imputation | BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BPMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | HaLRTC | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | TF-ALS | ✅ | ✅ | ✅ | ✅ | ✅ |
| | BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| | BPTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
Single-Step Prediction | BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| | TRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
Multi-Step Prediction | BayesTRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | TRMF | ✅ | ✅ | ✅ | ✅ | 🔶 |
| | BayesTRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
| | TRTF | 🔶 | 🔶 | 🔶 | 🔶 | ✅ |
- ✅ — Covered
- 🔶 — Not covered
- 🚧 — Under development
- Imputation example (on Gdata)
(a) Time series of actual and estimated speed within two weeks from August 1 to 14.
(b) Time series of actual and estimated speed within two weeks from September 12 to 25.
The imputation performance of BGCP (CP rank r=15 and missing rate α=30%) under the fiber missing scenario with a third-order tensor representation, where the estimated result of road segment #1 is selected as an example. In both panels, red rectangles mark fiber missing, i.e., speed observations are lost for a whole day.
- Prediction example
This is an imputation example of Low-Rank Tensor Completion with Truncated Nuclear Norm minimization (LRTC-TNN). Notably, unlike the complex equations in our paper, the Python implementation is straightforward to work with.
- First, import the necessary packages:
import numpy as np
- Define the operators of tensor unfolding (ten2mat) and matrix folding (mat2ten) using NumPy:
def ten2mat(tensor, mode):
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order = 'F')

def mat2ten(mat, tensor_size, mode):
    index = list()
    index.append(mode)
    for i in range(tensor_size.shape[0]):
        if i != mode:
            index.append(i)
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order = 'F'), 0, mode)
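As a quick sanity check (definitions repeated here so the snippet runs on its own): folding should exactly invert unfolding along every mode.

```python
import numpy as np

def ten2mat(tensor, mode):
    # Mode-`mode` unfolding with Fortran (column-major) ordering.
    return np.reshape(np.moveaxis(tensor, mode, 0), (tensor.shape[mode], -1), order='F')

def mat2ten(mat, tensor_size, mode):
    # Inverse of ten2mat: fold the matrix back into a tensor.
    index = [mode] + [i for i in range(tensor_size.shape[0]) if i != mode]
    return np.moveaxis(np.reshape(mat, list(tensor_size[index]), order='F'), 0, mode)

tensor = np.random.rand(3, 4, 5)
dim = np.array(tensor.shape)
for mode in range(3):
    mat = ten2mat(tensor, mode)
    # Unfolding puts mode `mode` on the rows, everything else on the columns.
    assert mat.shape == (tensor.shape[mode], tensor.size // tensor.shape[mode])
    # Folding recovers the original tensor exactly.
    assert np.allclose(mat2ten(mat, dim, mode), tensor)
```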
- Define Singular Value Thresholding (SVT) for Truncated Nuclear Norm (TNN) minimization:
def svt_tnn(mat, alpha, rho, theta):
    tau = alpha / rho
    [m, n] = mat.shape
    if 2 * m < n:
        u, s, v = np.linalg.svd(mat @ mat.T, full_matrices = 0)
        s = np.sqrt(s)
        idx = np.sum(s > tau)
        mid = np.zeros(idx)
        mid[:theta] = 1
        mid[theta:idx] = (s[theta:idx] - tau) / s[theta:idx]
        return (u[:, :idx] @ np.diag(mid)) @ (u[:, :idx].T @ mat)
    elif m > 2 * n:
        return svt_tnn(mat.T, alpha, rho, theta).T
    u, s, v = np.linalg.svd(mat, full_matrices = 0)
    idx = np.sum(s > tau)
    vec = s[:idx].copy()
    vec[theta:idx] = s[theta:idx] - tau
    return u[:, :idx] @ np.diag(vec) @ v[:idx, :]
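For intuition, here is a standalone illustration (not taken from the notebooks) of the shrinkage rule that svt_tnn implements: keep the theta largest singular values untouched and soft-threshold the remaining ones by tau = alpha / rho.

```python
import numpy as np

# Standalone sketch of the TNN proximal step on a small random matrix.
np.random.seed(1)
mat = np.random.rand(6, 5)
u, s, v = np.linalg.svd(mat, full_matrices=0)
theta, tau = 2, 0.1  # assumed example parameters

shrunk = s.copy()
# Top-theta singular values are preserved; the rest are soft-thresholded
# (values at or below tau vanish entirely).
shrunk[theta:] = np.maximum(s[theta:] - tau, 0)
low_rank = u @ np.diag(shrunk) @ v
print(np.round(s, 3))
print(np.round(shrunk, 3))
```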
- Define performance metrics (i.e., RMSE, MAPE):
def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]
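A quick hand-check of both metrics (definitions repeated so the snippet runs on its own):

```python
import numpy as np

def compute_rmse(var, var_hat):
    return np.sqrt(np.sum((var - var_hat) ** 2) / var.shape[0])

def compute_mape(var, var_hat):
    return np.sum(np.abs(var - var_hat) / var) / var.shape[0]

# Errors are 1 and 2, so RMSE = sqrt((1 + 4) / 2) and MAPE = (1/2 + 2/4) / 2.
var = np.array([2.0, 4.0])
var_hat = np.array([1.0, 6.0])
print(compute_rmse(var, var_hat))  # sqrt(2.5) ≈ 1.5811
print(compute_mape(var, var_hat))  # 0.5
```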
- Define LRTC-TNN:
def LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter):
    """Low-Rank Tensor Completion with Truncated Nuclear Norm, LRTC-TNN."""
    dim = np.array(sparse_tensor.shape)
    pos_missing = np.where(sparse_tensor == 0)
    pos_test = np.where((dense_tensor != 0) & (sparse_tensor == 0))

    X = np.zeros(np.insert(dim, 0, len(dim)))  # \boldsymbol{\mathcal{X}}
    T = np.zeros(np.insert(dim, 0, len(dim)))  # \boldsymbol{\mathcal{T}}
    Z = sparse_tensor.copy()
    last_tensor = sparse_tensor.copy()
    snorm = np.sqrt(np.sum(sparse_tensor ** 2))
    it = 0
    while True:
        rho = min(rho * 1.05, 1e5)
        for k in range(len(dim)):
            X[k] = mat2ten(svt_tnn(ten2mat(Z - T[k] / rho, k), alpha[k], rho, int(np.ceil(theta * dim[k]))), dim, k)
        Z[pos_missing] = np.mean(X + T / rho, axis = 0)[pos_missing]
        T = T + rho * (X - np.broadcast_to(Z, np.insert(dim, 0, len(dim))))
        tensor_hat = np.einsum('k, kmnt -> mnt', alpha, X)
        tol = np.sqrt(np.sum((tensor_hat - last_tensor) ** 2)) / snorm
        last_tensor = tensor_hat.copy()
        it += 1
        if it % 50 == 0:
            print('Iter: {}'.format(it))
            print('RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
            print()
        if (tol < epsilon) or (it >= maxiter):
            break

    print('Imputation MAPE: {:.6}'.format(compute_mape(dense_tensor[pos_test], tensor_hat[pos_test])))
    print('Imputation RMSE: {:.6}'.format(compute_rmse(dense_tensor[pos_test], tensor_hat[pos_test])))
    print()
    return tensor_hat
- Let us try it on the Guangzhou urban traffic speed data set (Gdata):
import scipy.io
tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/tensor.mat')
dense_tensor = tensor['tensor']
random_tensor = scipy.io.loadmat('../datasets/Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']
missing_rate = 0.2
### Random missing (RM) scenario:
binary_tensor = np.round(random_tensor + 0.5 - missing_rate)
sparse_tensor = np.multiply(dense_tensor, binary_tensor)
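The snippet above covers the random missing (RM) scenario; the non-random missing (NM) scenario can be constructed analogously from random_matrix, which is sensor-by-day. A sketch with synthetic stand-in data (the `demo_` names are hypothetical, not from the notebooks): whole days are dropped per sensor by thresholding a sensor-by-day random matrix and broadcasting the 0/1 mask over the within-day dimension.

```python
import numpy as np

np.random.seed(0)
demo_tensor = np.random.rand(10, 7, 144) + 1   # (sensor, day, time of day)
demo_random_matrix = np.random.rand(10, 7)     # stand-in for random_matrix
demo_missing_rate = 0.2

# Rounding maps values below 0.5 (i.e., random values below the missing
# rate) to 0 and the rest to 1, so each (sensor, day) pair is either
# fully observed or fully missing.
demo_binary_matrix = np.round(demo_random_matrix + 0.5 - demo_missing_rate)
demo_binary_tensor = np.broadcast_to(demo_binary_matrix[:, :, None], demo_tensor.shape)
demo_sparse_tensor = demo_tensor * demo_binary_tensor
```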
- Run the imputation experiment:
import time
start = time.time()
alpha = np.ones(3) / 3
rho = 1e-5
theta = 0.30
epsilon = 1e-4
maxiter = 200
LRTC(dense_tensor, sparse_tensor, alpha, rho, theta, epsilon, maxiter)
end = time.time()
print('Running time: %d seconds'%(end - start))
This example is from ../experiments/Imputation-LRTC-TNN.ipynb; check out that Jupyter notebook for advanced usage.
- Time series forecasting
- Time series imputation
- Yuyang Wang, Alex Smola, Danielle C. Maddix, Jan Gasthaus, Dean Foster, Tim Januschowski, 2019. Deep Factors for Forecasting. ICML 2019. (★★★★★)
- Danielle C. Maddix, Yuyang Wang, Alex Smola, 2018. Deep Factors with Gaussian Processes for Forecasting. arXiv.
- Syama Sundar Rangapuram, Matthias Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, Tim Januschowski, 2018. Deep State Space Models for Time Series Forecasting. NeurIPS 2018.
- Zheyi Pan, Yuxuan Liang, Junbo Zhang, Xiuwen Yi, Yong Yu, Yu Zheng, 2018. HyperST-Net: hypernetworks for spatio-temporal forecasting. arXiv.
- Truc Viet Le, Richard Oentaryo, Siyuan Liu, Hoong Chuin Lau, 2017. Local Gaussian processes for efficient fine-grained traffic speed prediction. arXiv.
- Yaguang Li, Cyrus Shahabi, 2018. A brief overview of machine learning methods for short-term traffic forecasting and future directions. ACM SIGSPATIAL, 10(1): 3-9.
- Bing Yu, Haoteng Yin, Zhanxing Zhu, 2017. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. arXiv. (appeared in IJCAI 2018)
- Feras A. Saad, Vikash K. Mansinghka, 2018. Temporally-reweighted Chinese Restaurant Process mixtures for clustering, imputing, and forecasting multivariate time series. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. PMLR: Volume 84.
- Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, Yan Liu, 2018. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(6085).
- Zhengping Che, Sanjay Purushotham, Guangyu Li, Bo Jiang, Yan Liu, 2018. Hierarchical deep generative models for multi-rate multivariate time series. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), PMLR 80: 784-793.
- Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng, Cristian Lumezanu, Wei Cheng, Jingchao Ni, Bo Zong, Haifeng Chen, Nitesh V. Chawla, 2018. A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data. arXiv.
- X. Wang, C. Chen, Y. Min, J. He, B. Yang, Y. Zhang, 2018. Efficient metropolitan traffic prediction based on graph recurrent neural network. arXiv.
- Peiguang Jing, Yuting Su, Xiao Jin, Chengqian Zhang, 2018. High-order temporal correlation model learning for time-series prediction. IEEE Transactions on Cybernetics, early access.
- Oren Anava, Elad Hazan, Assaf Zeevi, 2015. Online time series prediction with missing data. Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), 37: 2191-2199.
- Shanshan Feng, Gao Cong, Bo An, Yeow Meng Chee, 2017. POI2Vec: Geographical latent representation for predicting future visitors. Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI 2017).
- Yasuko Matsubara, Yasushi Sakurai, Christos Faloutsos, Tomoharu Iwata, Masatoshi Yoshikawa, 2012. Fast mining and forecasting of complex time-stamped events. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012).
- Yasuko Matsubara, Yasushi Sakurai, Willem G. van Panhuis, Christos Faloutsos, 2014. FUNNEL: automatic mining of spatially coevolving epidemics. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014).
- Koh Takeuchi, Hisashi Kashima, Naonori Ueda, 2017. Autoregressive tensor factorization for spatio-temporal predictions. 2017 IEEE International Conference on Data Mining (ICDM 2017).
- Shun-Yao Shih, Fan-Keng Sun, Hung-yi Lee, 2018. Temporal pattern attention for multivariate time series forecasting. arXiv.
- Dingxiong Deng, Cyrus Shahabi, Ugur Demiryurek, Linhong Zhu, Rose Yu, Yan Liu, 2016. Latent space model for road networks to predict time-varying traffic. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016).
- Shigeyuki Oba, Masa-aki Sato, Ichiro Takemasa, Morito Monden, Ken-ichi Matsubara, Shin Ishii, 2003. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics, 19: 2088-2096. [Matlab code]
- Li Qu, Li Li, Yi Zhang, Jianming Hu, 2009. PPCA-based missing data imputation for traffic flow volume: a systematical approach. IEEE Transactions on Intelligent Transportation Systems, 10(3): 512-522.
- Li Li, Yuebiao Li, Zhiheng Li, 2013. Efficient missing data imputing for traffic flow by considering temporal and spatial dependence. Transportation Research Part C: Emerging Technologies, 34: 108-120.
- Michalis K. Titsias, Magnus Rattray, Neil D. Lawrence, 2009. Markov chain Monte Carlo algorithms for Gaussian processes (book chapter).
- Filipe Rodrigues, Kristian Henrickson, Francisco C. Pereira, 2018. Multi-output Gaussian processes for crowdsourced traffic data imputation. IEEE Transactions on Intelligent Transportation Systems, early access. [Matlab code]
- Nicolo Fusi, Rishit Sheth, Huseyn Melih Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv. [Python code]
- Tinghui Zhou, Hanhuai Shan, Arindam Banerjee, Guillermo Sapiro, 2012. Kernelized probabilistic matrix factorization: exploiting graphs and side information. [slide]
- John Bradshaw, Alexander G. de G. Matthews, Zoubin Ghahramani, 2017. Adversarial examples, uncertainty, and transfer testing robustness in Gaussian process hybrid deep networks. arXiv.
- David Salinas, Michael Bohlke-Schneider, Laurent Callot, Roberto Medico, Jan Gasthaus, 2019. High-Dimensional Multivariate Forecasting with Low-Rank Gaussian Copula Processes. arXiv. (★★★★)
- Nikhil Rao, Hsiang-Fu Yu, Pradeep Ravikumar, Inderjit S. Dhillon, 2015. Collaborative filtering with graph information: consistency and scalable methods. Neural Information Processing Systems (NIPS 2015). [Matlab code]
- Hsiang-Fu Yu, Nikhil Rao, Inderjit S. Dhillon, 2016. Temporal regularized matrix factorization for high-dimensional time series prediction. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain. [Matlab code]
- Yongshun Gong, Zhibin Li, Jian Zhang, Wei Liu, Yu Zheng, Christina Kirsch, 2018. Network-wide crowd flow prediction of Sydney trains via customized online non-negative matrix factorization. The 27th ACM International Conference on Information and Knowledge Management (CIKM 2018), Torino, Italy.
- Hanbaek Lyu, Georg Menz, Deanna Needell, Christopher Strohmeier, 2020. Applications of Online Nonnegative Matrix Factorization to Image and Time-Series Data.
- San Gultekin, John Paisley, 2019. Online Forecasting Matrix Factorization. IEEE Transactions on Signal Processing, 67(5): 1223-1236. [Python code]
- Ruslan Salakhutdinov, Andriy Mnih, 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. Proceedings of the 25th International Conference on Machine Learning (ICML 2008), Helsinki, Finland. [Matlab code (official)] [Python code] [Julia and C++ code] [Julia code]
- Neil D. Lawrence, Raquel Urtasun, 2009. Non-linear matrix factorization with Gaussian processes. ICML 2009. (★★★★★)
- Ilya Sutskever, Ruslan Salakhutdinov, Joshua B. Tenenbaum, 2009. Modelling relational data using Bayesian clustered tensor factorization. NIPS 2009.
- Ankan Saha, Vikas Sindhwani, 2012. Learning evolving and emerging topics in social media: a dynamic NMF approach with temporal regularization. WSDM 2012. (★★★★)
- Nicolo Fusi, Rishit Sheth, Melih Huseyn Elibol, 2017. Probabilistic matrix factorization for automated machine learning. arXiv.
- Liang Xiong, Xi Chen, Tzu-Kuo Huang, Jeff Schneider, Jaime G. Carbonell, 2010. Temporal collaborative filtering with Bayesian probabilistic tensor factorization. Proceedings of the 2010 SIAM International Conference on Data Mining (SDM 2010), pp. 211-222.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian CP factorization of incomplete tensors with automatic rank determination. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(9): 1751-1763.
- Qibin Zhao, Liqing Zhang, Andrzej Cichocki, 2015. Bayesian sparse Tucker models for dimension reduction and tensor completion. arXiv.
- Piyush Rai, Yingjian Wang, Shengbo Guo, Gary Chen, David B. Dunson, Lawrence Carin, 2014. Scalable Bayesian low-rank decomposition of incomplete multiway tensors. Proceedings of the 31st International Conference on Machine Learning (ICML 2014), Beijing, China.
- Ömer Deniz Akyildiz, Theodoros Damoulas, Mark F. J. Steel, 2019. Probabilistic sequential matrix factorization. arXiv. (★★★★★)
- Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, Pierre Vandergheynst, 2014. Matrix completion on graphs. arXiv. (appeared in NIPS 2014)
- Rianne van den Berg, Thomas N. Kipf, Max Welling, 2018. Graph convolutional matrix completion. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2018), London, UK.
- Federico Monti, Michael M. Bronstein, Xavier Bresson, 2017. Geometric matrix completion with recurrent multi-graph neural networks. NIPS 2017.
- Tianyang Han, Kentaro Wada, Takashi Oguchi, 2019. Large-scale traffic data imputation using matrix completion on graphs. IEEE Intelligent Transportation Systems Conference (ITSC 2019), Auckland, New Zealand, pp. 2252-2258.
- Ji Liu, Przemyslaw Musialski, Peter Wonka, Jieping Ye, 2013. Tensor completion for estimating missing values in visual data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1): 208-220.
- Bin Ran, Huachun Tan, Yuankai Wu, Peter J. Jin, 2016. Tensor based missing traffic data completion with spatial-temporal correlation. Physica A: Statistical Mechanics and its Applications, 446: 54-63.
- Brandon Amos, 2016. Image completion with deep learning in TensorFlow. Blog post. [github]
- Jinsung Yoon, James Jordon, Mihaela van der Schaar, 2018. GAIN: missing data imputation using Generative Adversarial Nets. Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden. [supplementary materials] [Python code]
- Ian Goodfellow, 2016. NIPS 2016 tutorial: Generative Adversarial Networks.
- Thomas Schlegl, Philipp Seeböck, Sebastian M. Waldstein, Ursula Schmidt-Erfurth, Georg Langs, 2017. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. arXiv.
- Yonghong Luo, Xiangrui Cai, Ying Zhang, Jun Xu, Xiaojie Yuan, 2018. Multivariate time series imputation with generative adversarial networks. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]
- Yonghong Luo, Ying Zhang, Xiangrui Cai, Xiaojie Yuan, 2019. E²GAN: end-to-end generative adversarial network for multivariate time series imputation. IJCAI 2019.
- Yukai Liu, Rose Yu, Stephan Zheng, Eric Zhan, Yisong Yue, 2019. NAOMI: Non-Autoregressive Multiresolution Sequence Imputation. NeurIPS 2019.
- Vincent Fortuin, Gunnar Rätsch, Stephan Mandt, 2019. GP-VAE: Deep Probabilistic Time Series Imputation. AISTATS 2020.
- Oleg Ivanov, Michael Figurnov, Dmitry Vetrov, 2019. Variational autoencoder with arbitrary conditioning. ICLR 2019.
- Guillem Boquet, Antoni Morell, Javier Serrano, Jose Lopez Vicario, 2020. A variational autoencoder solution for road traffic forecasting systems: missing data imputation, dimension reduction, model selection and anomaly detection. Transportation Research Part C: Emerging Technologies, 115: 102622.
- Karol Gregor, George Papamakarios, Frederic Besse, Lars Buesing, Theophane Weber, 2019. Temporal difference variational auto-encoder. ICLR 2019.
- Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, Yisong Yue, 2017. Factorized variational autoencoders for modeling audience reactions to movies. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA.
- Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng, Jie Chen, Zhaogang Wang, Honglin Qiao, 2018. Unsupervised anomaly detection via variational auto-encoder for seasonal KPIs in web applications. WWW 2018.
- John T. McCoy, Steve Kroon, Lidia Auret, 2018. Variational Autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine, 51(21): 141-146. [Python code] [VAE demo]
- Pierre-Alexandre Mattei, Jes Frellsen, 2018. missingIWAE: Deep generative modelling and imputation of incomplete data. Third workshop on Bayesian Deep Learning (NeurIPS 2018), Montréal, Canada. [related slide]
- Guillaume Rabusseau, Hachem Kadri, 2016. Low-rank regression with tensor responses. 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
- Rose Yu, Yan Liu, 2016. Learning from multiway data: simple and efficient tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Masaaki Imaizumi, Kohei Hayashi, 2016. Doubly decomposing nonparametric tensor regression. Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA.
- Rose Yu, Guangyu Li, Yan Liu, 2018. Tensor regression meets Gaussian processes. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (AISTATS 2018), Lanzarote, Spain. [Matlab code]
- Lifang He, Kun Chen, Wanwan Xu, Jiayu Zhou, Fei Wang, 2018. Boosted sparse and low-rank tensor regression. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada.
- Liangjie Hong, 2015. Poisson matrix factorization. Blog post.
- Ali Taylan Cemgil, 2009. Bayesian inference for nonnegative matrix factorisation models. Computational Intelligence and Neuroscience.
- Prem Gopalan, Jake M. Hofman, David M. Blei, 2015. Scalable recommendation with hierarchical Poisson factorization. In UAI, 326-335. [C++ code]
- Laurent Charlin, Rajesh Ranganath, James McInerney, 2015. Dynamic Poisson factorization. Proceedings of the 9th ACM Conference on Recommender Systems (RecSys 2015), Vienna, Italy. [C++ code]
- Seyed Abbas Hosseini, Keivan Alizadeh, Ali Khodadadi, Ali Arabzadeh, Mehrdad Farajtabar, Hongyuan Zha, Hamid R. Rabiee, 2017. Recurrent Poisson factorization for temporal recommendation. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, Nova Scotia, Canada. [Matlab code]
- Aaron Schein, Scott W. Linderman, Mingyuan Zhou, David M. Blei, Hanna Wallach, 2019. Poisson-Randomized Gamma Dynamical Systems. arXiv. (★★★★★)
- Arman Hasanzadeh, Xi Liu, Nick Duffield, Krishna R. Narayanan, Byron Chigoy, 2017. A graph signal processing approach for real-time traffic prediction in transportation networks. arXiv.
- Antonio Ortega, Pascal Frossard, Jelena Kovačević, José M. F. Moura, Pierre Vandergheynst, 2018. Graph signal processing: overview, challenges, and applications. Proceedings of the IEEE, 106(5): 808-828. [slide]
- Structured deep models: Deep learning on graphs and beyond. Slide.
- gcn: Implementation of Graph Convolutional Networks in TensorFlow. GitHub project.
- gated-graph-neural-network-samples: Sample code for Gated Graph Neural Networks. GitHub project.
- Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, Yan Liu, 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. AAAI 2019.
- Menglin Wang, Baisheng Lai, Zhongming Jin, Yufeng Lin, Xiaojia Gong, Jianqiang Huang, Xiansheng Hua, 2018. Dynamic spatio-temporal graph-based CNNs for traffic prediction. arXiv.
- Daniel J. Stekhoven, Peter Bühlmann, 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics, 28(1): 112-118. [missingpy - PyPI] [missingpy - GitHub]
- fancyimpute: A variety of matrix completion and imputation algorithms implemented in Python. [homepage]
- Dimitris Bertsimas, Colin Pawlowski, Ying Daisy Zhuo, 2018. From predictive methods to missing data imputation: an optimization approach. Journal of Machine Learning Research, 18(196): 1-39.
- Wei Cao, Dong Wang, Jian Li, Hao Zhou, Yitan Li, Lei Li, 2018. BRITS: Bidirectional Recurrent Imputation for Time Series. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montréal, Canada. [Python code]
- Xinyu Chen, Lijun Sun (2020). Low-rank autoregressive tensor completion for multivariate time series forecasting. arXiv: 2006.10436. [preprint] [data & Python code]
- Xinyu Chen, Jinming Yang, Lijun Sun (2020). A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 117: 102673. [preprint] [doi] [data & Python code]
- Xinyu Chen, Lijun Sun (2019). Bayesian temporal factorization for multidimensional time series prediction. arXiv: 1910.06366. [preprint] [slide] [data & Python code]
- Xinyu Chen, Zhaocheng He, Yixian Chen, Yuhuan Lu, Jiawei Wang (2019). Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model. Transportation Research Part C: Emerging Technologies, 104: 66-77. [preprint] [doi] [slide] [data] [Matlab code]
- Xinyu Chen, Zhaocheng He, Lijun Sun (2019). A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transportation Research Part C: Emerging Technologies, 98: 73-84. [preprint] [doi] [data] [Matlab code] [Python code]
- Xinyu Chen, Zhaocheng He, Jiawei Wang (2018). Spatial-temporal traffic speed patterns discovery and incomplete data recovery via SVD-combined tensor decomposition. Transportation Research Part C: Emerging Technologies, 86: 59-77. [doi] [data]
This project stems from the above papers; please cite them if they help your research.
- Contributors: Xinyu Chen 💻, Jinming Yang 💻, Yixian Chen 💻, Lijun Sun 💻, Tianyang Han 💻
- Principal Investigator (PI): Lijun Sun 💻
See the list of contributors who participated in this project.
Our transdim project is still under development. More machine learning models and technical features will be added, and we always welcome contributions to help make transdim better. If you have any suggestions about this project or want to collaborate with us, please feel free to contact Xinyu Chen (email: chenxy346@gmail.com). We would like to thank everyone who has helped this project in any way.
Recommended email subjects:
- Suggestion on transdim from [+ your name]
- Collaboration statement on transdim from [+ your name]
This research is supported by the Institute for Data Valorization (IVADO).
This work is released under the MIT license.