In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import IPython
import time
import random
import Helper as helper
from Imputer import LRTC_TSpN

plt.rcParams['figure.figsize'] = (10,6)
%matplotlib inline
IPython.display.set_matplotlib_formats('svg')

This notebook gives a toy example showing how to apply LRTC-TSpN (low-rank tensor completion based on the truncated tensor Schatten p-norm) to two small traffic data sets. Users can apply this model to any spatial-temporal traffic data. For a more detailed discussion of LRTC-TSpN, please see [1].

<div class="alert alert-block alert-info">
<font color="black">
<b>[1]</b> Tong Nie, Guoyang Qin, Jian Sun (2022). <b>Truncated tensor Schatten p-norm based approach for spatiotemporal traffic data imputation with complicated missing patterns</b>. arXiv.xxxx.xxxxx. <a href="https://arxiv.org/abs/xxxx" title="PDF"><b>[PDF]</b></a> 
</font>
</div>

## Preparation
### Third-order Tensor Structure

We organize the multivariate traffic time series as a third-order
tensor of size $\text{time intervals} \times \text{locations (sensors)} \times \text{days}$. This three-dimensional structure captures
spatial and temporal information jointly, which makes it more efficient to impute missing values.
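
As a minimal sketch of this layout, the snippet below folds a (locations × time steps) matrix into an (intervals × locations × days) tensor. The function name `matrix_to_tensor`, the synthetic input, and the assumption that the time axis is ordered day by day are illustrative only; they are not taken from the `Helper` or `Imputer` modules.

```python
import numpy as np

def matrix_to_tensor(mat, n_intervals):
    """Fold a (locations, days * intervals) matrix into an
    (intervals, locations, days) tensor.

    Assumes the time axis is ordered day by day, i.e. column
    d * n_intervals + i holds interval i of day d.
    """
    n_locations, n_steps = mat.shape
    if n_steps % n_intervals != 0:
        raise ValueError("time axis is not a whole number of days")
    n_days = n_steps // n_intervals
    # (locations, days, intervals) -> (intervals, locations, days)
    return mat.reshape(n_locations, n_days, n_intervals).transpose(2, 0, 1)

# Synthetic stand-in for real sensor data: 50 locations, 15 days, 144 intervals/day.
rng = np.random.default_rng(0)
mat = rng.random((50, 15 * 144))
tensor = matrix_to_tensor(mat, n_intervals=144)
print(tensor.shape)  # (144, 50, 15)
```

With this convention, `tensor[i, l, d]` is the reading of location `l` at interval `i` on day `d`.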

### Spatial-temporal traffic sensor data

- **Guangzhou-small:** An urban traffic speed data set covering 214 road segments in Guangzhou, China, over two months (61 days, from August 1, 2016 to September 30, 2016) at 10-minute intervals. We use only the speed data from the first 50 locations and the first 15 days, so the tensor size is (144 × 50 × 15).
- **Portland-small:** A link volume data set collected from highways in Portland, covering 1156 loop detectors over one month at 15-minute intervals. We use only the volume data from the first 80 locations and the first 15 days, so the tensor size is (96 × 80 × 15).
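
The first tensor dimension follows directly from the sampling interval: 10-minute data gives 144 intervals per day and 15-minute data gives 96. The sketch below checks this arithmetic and shows how the small subset could be sliced from a full tensor. The helper `intervals_per_day` and the all-zero placeholder array are assumptions for illustration; the actual data loading is not shown here.

```python
import numpy as np

def intervals_per_day(minutes_per_interval):
    """Number of sampling intervals in one day."""
    minutes_per_day = 24 * 60
    if minutes_per_day % minutes_per_interval != 0:
        raise ValueError("interval length must divide a day evenly")
    return minutes_per_day // minutes_per_interval

# Placeholder with the shape of the full Guangzhou tensor:
# (intervals, locations, days) = (144, 214, 61). Real values are not loaded here.
guangzhou_full = np.zeros((intervals_per_day(10), 214, 61))

# Keep the first 50 locations and the first 15 days.
guangzhou_small = guangzhou_full[:, :50, :15]

print(guangzhou_small.shape)   # (144, 50, 15)
print(intervals_per_day(15))   # 96
```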

### Complicated missing patterns
Besides the element-wise random missing case, we define three structured fiber mode-$n$ missing scenarios, which are generated through the two-by-two combinations of tensor mode-$n$ fibers. They are described as follows:
- **'Intervals' mode fiber-like missing (FM-0)** represents a temporal missing pattern, caused by adverse weather, breakdown of wireless connections, or apparatus maintenance;
- **'Locations' mode fiber-like missing (FM-1)** represents a spatial missing pattern, which can be explained by a loss of power across successive sensors or a malfunction of the Internet Data Center;
- **'Days' mode fiber-like missing (FM-2)** represents a mixed spatial-temporal missing pattern, in which specific sensors are offline (not operating) during the same time intervals every day.
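
One way to read these scenarios is that a mode-$n$ fiber is removed as a whole: a random keep/drop decision is drawn for every index pair of the other two modes and then repeated along mode $n$. The sketch below generates such masks on that assumption. The function names, the 30% missing rate, and the use of `True` for observed entries are illustrative choices, and the generators in the `Helper` module may work differently.

```python
import numpy as np

def random_missing_mask(shape, missing_rate, rng):
    """Element-wise random missing: each entry is dropped independently."""
    return rng.random(shape) >= missing_rate

def fiber_missing_mask(shape, mode, missing_rate, rng):
    """Fiber-like missing along `mode` (0: intervals, 1: locations, 2: days).

    One keep/drop decision is drawn per fiber, i.e. per index pair of the
    other two modes, and broadcast along `mode`.
    """
    fiber_shape = list(shape)
    fiber_shape[mode] = 1  # a single decision shared by the whole fiber
    keep = rng.random(fiber_shape) >= missing_rate
    return np.broadcast_to(keep, shape).copy()

rng = np.random.default_rng(42)
shape = (144, 50, 15)  # (intervals, locations, days)
masks = {
    "RM": random_missing_mask(shape, 0.3, rng),
    "FM-0": fiber_missing_mask(shape, 0, 0.3, rng),
    "FM-1": fiber_missing_mask(shape, 1, 0.3, rng),
    "FM-2": fiber_missing_mask(shape, 2, 0.3, rng),
}
for name, mask in masks.items():
    # Empirical missing rate; True marks an observed entry.
    print(name, round(1.0 - mask.mean(), 3))
```

A partially observed tensor can then be formed as `np.where(mask, tensor, np.nan)` or `tensor * mask`, depending on how the imputer expects missing entries to be encoded.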

# License

<div class="alert alert-block alert-danger">
<b>This work is released under the MIT license.</b>
</div>