### GAIN: Missing data imputation using Generative Adversarial Nets

A **Generative Adversarial Network (GAN)** is a special case of an adversarial process in which the components are neural networks. The first network, the Generator ($G$), generates data (i.e., fake data), and the second network, the Discriminator ($D$), tries to tell the real data apart from the fake data.
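
For reference, the standard GAN game can be written as a minimax problem over a value function, where $z$ is random noise fed to $G$:

$$\min_G\max_D~\mathbb{E}_{x\sim p_{\text{data}}}\left[\log D(x)\right]+\mathbb{E}_{z\sim p_z}\left[\log\left(1-D\left(G(z)\right)\right)\right]$$

$D$ is rewarded for assigning high probability to real samples and low probability to generated ones, and $G$ is rewarded for the opposite.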

**Generative Adversarial Imputation Network**, or **GAIN** in short, is a novel method for imputing missing data by adapting the well-known GAN framework.

In GAIN, the Generator observes some elements of a real data vector, imputes the missing values conditioned on what is actually observed, and outputs a complete vector. The Discriminator then takes a complete vector and attempts to determine which elements were actually observed and which were imputed.
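
Put formally, the Discriminator in GAIN outputs one probability per component instead of a single real/fake score, and the mask $M$ plays the role of the label. With $\hat{X}$ the imputed vector and $H$ a hint vector that reveals part of $M$ to the Discriminator, the paper's minimax objective reads:

$$\min_G\max_D~\mathbb{E}_{\hat{X},M,H}\left[M^{\top}\log D(\hat{X},H)+(1-M)^{\top}\log\left(1-D(\hat{X},H)\right)\right]$$

Up to a constant factor (it averages over components instead of summing them), the `D_loss1` defined in the code below is the negative of this objective.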

**Reference**

Jinsung Yoon, James Jordon, Mihaela van der Schaar, 2018. [GAIN: missing data imputation using Generative Adversarial Nets](http://proceedings.mlr.press/v80/yoon18a/yoon18a.pdf). Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden. [[supplementary materials](http://medianetlab.ee.ucla.edu/papers/ICML_GAIN_Supp.pdf)] [[Python code - GitHub](https://github.com/jsyoon0823/GAIN)]

### Data preparation

The following experiment uses the [Urban traffic speed data set in Guangzhou, China](https://zenodo.org/record/1205229).

In [1]:
import numpy as np
import scipy.io
from tensorly import unfold

tensor = scipy.io.loadmat('Guangzhou-data-set/tensor.mat')
tensor = tensor['tensor']
random_matrix = scipy.io.loadmat('Guangzhou-data-set/random_matrix.mat')
random_matrix = random_matrix['random_matrix']
random_tensor = scipy.io.loadmat('Guangzhou-data-set/random_tensor.mat')
random_tensor = random_tensor['random_tensor']

mat = unfold(tensor, 0)
missing_rate = 0.2

# =============================================================================
### Set the random missing (RM) scenario by:
# binary_mat = unfold(np.round(random_tensor + 0.5 - missing_rate), 0)
# =============================================================================

# =============================================================================
### Set the non-random missing (NM) scenario by:
binary_tensor = np.zeros(tensor.shape)
for i1 in range(tensor.shape[0]):
    for i2 in range(tensor.shape[1]):
        binary_tensor[i1,i2,:] = np.round(random_matrix[i1,i2] + 0.5 - missing_rate)
binary_mat = unfold(binary_tensor, 0)
# =============================================================================

sparse_mat = np.multiply(mat, binary_mat)

Using numpy backend.
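
The two scenarios differ only in how the binary mask is drawn. The toy sketch below assumes the tensor is indexed as (road segment, day, time interval) and that `random_tensor` and `random_matrix` hold uniform draws on $[0, 1)$. It replaces the `.mat` files with random arrays and replaces `tensorly.unfold(..., 0)` with a row-major reshape, which is assumed to be equivalent for mode 0. Since $\operatorname{round}(u + 0.5 - r)$ equals 1 exactly when $u > r$, each mask drops roughly a fraction $r$ of the entries. NM draws one value per (segment, day) pair and therefore removes whole days at once.

```python
import numpy as np

# Toy stand-in for the traffic tensor: (road segment, day, time interval).
# Sizes are illustrative only; the real data set is much larger.
rng = np.random.default_rng(0)
n_seg, n_day, n_int = 4, 3, 5
tensor = rng.uniform(20, 60, size=(n_seg, n_day, n_int))
missing_rate = 0.2

# Assumed: uniform draws on [0, 1), standing in for random_tensor / random_matrix
random_tensor = rng.random((n_seg, n_day, n_int))
random_matrix = rng.random((n_seg, n_day))

# Random missing (RM): every entry is kept independently when u > missing_rate
rm_tensor = np.round(random_tensor + 0.5 - missing_rate)

# Non-random missing (NM): one draw per (segment, day), broadcast over all time
# intervals; a vectorized version of the double loop in the cell above
nm_tensor = np.broadcast_to(
    np.round(random_matrix + 0.5 - missing_rate)[:, :, None], tensor.shape
)

def unfold0(t):
    # Mode-0 unfolding as a row-major reshape: segments x (day * interval)
    return t.reshape(t.shape[0], -1)

rm_mat = unfold0(rm_tensor)
nm_mat = unfold0(nm_tensor)
sparse_mat = unfold0(tensor) * nm_mat

print(rm_mat.shape)                          # (4, 15)
print(float(nm_tensor.std(axis=2).max()))    # 0.0: each NM mask fiber is constant
```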


In [2]:
mat = mat.T
binary_mat = binary_mat.T
sparse_mat = sparse_mat.T

num_row = mat.shape[0]
num_col = mat.shape[1]

num_h1 = num_col
num_h2 = num_col

X0 = mat
M0 = binary_mat
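
After the transposition, each row of `X0` is one time step and each column is one road segment, assuming that the first tensor mode indexes road segments. The networks therefore see one vector of `num_col` speeds at a time. The training loop further below feeds the whole matrix at every iteration (full-batch training). The toy sketch that follows shows this layout together with a mini-batch alternative. The helper `sample_rows`, the batch size, and the toy sizes are hypothetical and are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy matrix in the notebook's original orientation: (road segments, time steps)
n_seg, n_steps = 5, 12
mat = rng.uniform(20, 60, size=(n_seg, n_steps))
binary_mat = (rng.random((n_seg, n_steps)) > 0.2).astype(float)

# After transposing, each row is one time step: a vector of num_col segment speeds
X0 = mat.T
M0 = binary_mat.T
num_row, num_col = X0.shape

def sample_rows(num_row, batch_size, rng):
    # Hypothetical mini-batch helper: batch_size distinct row indices
    return rng.choice(num_row, size=batch_size, replace=False)

idx = sample_rows(num_row, 4, rng)
X_mb, M_mb = X0[idx], M0[idx]
New_X_mb = M_mb * X_mb  # observed entries kept, missing entries set to 0

print(X0.shape, X_mb.shape)  # (12, 5) (4, 5)
```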

### GAIN architecture

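1) For training, we define the following `placeholder`s: $X$ (the complete data, used only to compute the test loss), $M$ (the mask), $H$ (the hint), and `New_X` ($\tilde{X}$, the data with missing entries set to zero).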

In [3]:
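# This notebook uses the TensorFlow 1.x graph API (placeholders and sessions);
# the compat module lets the same code run under TensorFlow 2.x as well.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()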

X = tf.placeholder(tf.float32, shape = [None, num_col])
M = tf.placeholder(tf.float32, shape = [None, num_col])
H = tf.placeholder(tf.float32, shape = [None, num_col])
New_X = tf.placeholder(tf.float32, shape = [None, num_col])

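2) Define the initialization and the parameters of both networks. `xavier_init` draws weights from a normal distribution with standard deviation $1/\sqrt{n_{\text{in}}/2}$, where $n_{\text{in}}$ is the input dimension of the layer; biases start at zero.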

**Discriminator**: $\Theta^{(D)} =\left\{W_1^{(D)},b_1^{(D)},W_2^{(D)},b_2^{(D)},W_3^{(D)},b_3^{(D)}\right\}$

**Generator**: $\Theta^{(G)} =\left\{W_1^{(G)},b_1^{(G)},W_2^{(G)},b_2^{(G)},W_3^{(G)},b_3^{(G)}\right\}$

In [4]:
def xavier_init(size):
    in_dim = size[0]
    xavier_stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape = size, stddev = xavier_stddev)

D_W1 = tf.Variable(xavier_init([2 * num_col, num_h1]))
D_b1 = tf.Variable(tf.zeros(shape = [num_h1]))

D_W2 = tf.Variable(xavier_init([num_h1, num_h2]))
D_b2 = tf.Variable(tf.zeros(shape = [num_h2]))

D_W3 = tf.Variable(xavier_init([num_h2, num_col]))
D_b3 = tf.Variable(tf.zeros(shape = [num_col]))

theta_D = [D_W1, D_W2, D_W3, D_b1, D_b2, D_b3]

G_W1 = tf.Variable(xavier_init([2 * num_col, num_h1]))
G_b1 = tf.Variable(tf.zeros(shape = [num_h1]))

G_W2 = tf.Variable(xavier_init([num_h1, num_h2]))
G_b2 = tf.Variable(tf.zeros(shape = [num_h2]))

G_W3 = tf.Variable(xavier_init([num_h2, num_col]))
G_b3 = tf.Variable(tf.zeros(shape = [num_col]))

theta_G = [G_W1, G_W2, G_W3, G_b1, G_b2, G_b3]
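
The NumPy analogue below shows the same initialization and checks it empirically. It also counts the parameters per network, given that the first layer takes the concatenation of the data with the mask (Generator) or the hint (Discriminator). The value of `num_col` and the name `xavier_init_np` are illustrative assumptions.

```python
import numpy as np

def xavier_init_np(size, rng):
    # NumPy analogue of xavier_init: mean 0, std = 1 / sqrt(in_dim / 2) = sqrt(2 / in_dim)
    in_dim = size[0]
    stddev = 1.0 / np.sqrt(in_dim / 2.0)
    return rng.normal(0.0, stddev, size=size)

rng = np.random.default_rng(0)
num_col = 214              # illustrative number of columns (road segments)
num_h1 = num_h2 = num_col

# Weight shapes shared by D and G; the first layer sees a 2 * num_col input
shapes = [(2 * num_col, num_h1), (num_h1, num_h2), (num_h2, num_col)]
weights = [xavier_init_np(s, rng) for s in shapes]

for s, w in zip(shapes, weights):
    # Empirical std of the sampled weights next to the target std
    print(s, round(float(w.std()), 4), round(float(np.sqrt(2.0 / s[0])), 4))

# Trainable parameters per network (weights + biases)
n_params = sum(n_in * n_out + n_out for n_in, n_out in shapes)
print(n_params)  # 183826
```

A standard deviation of $\sqrt{2/n_{\text{in}}}$ is what is more commonly called He initialization; Glorot (Xavier) normal initialization uses $\sqrt{2/(n_{\text{in}}+n_{\text{out}})}$. The name `xavier_init` simply follows the notebook.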

### Generator and Discriminator networks implementation

We implement the **Generator** and **Discriminator** networks using the following functions.

In [5]:
def generator(New_X, M):
    inputs = tf.concat(axis = 1, values = [New_X, M])
    G_h1 = tf.nn.elu(tf.matmul(inputs, G_W1) + G_b1)
    G_h2 = tf.nn.elu(tf.matmul(G_h1, G_W2) + G_b2)
    G_prob = tf.nn.elu(tf.matmul(G_h2, G_W3) + G_b3)
    return G_prob

def discriminator(New_X, H):
    inputs = tf.concat(axis = 1, values = [New_X, H])
    D_h1 = tf.nn.elu(tf.matmul(inputs, D_W1) + D_b1)
    D_h2 = tf.nn.elu(tf.matmul(D_h1, D_W2) + D_b2)
    D_logit = tf.matmul(D_h2, D_W3) + D_b3
    D_prob = tf.nn.sigmoid(D_logit)
    return D_prob
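
To make the data flow and the tensor shapes concrete, here is a NumPy forward pass of the same two architectures: three dense layers with ELU hidden activations, an ELU output for the Generator, and a sigmoid output for the Discriminator. The sizes, the random parameters, and the helper names (`mlp3`, `dense`) are toy assumptions, and the hint is taken to be $H = M$ as in the training loop.

```python
import numpy as np

def elu(x):
    # ELU with alpha = 1 (the tf.nn.elu form): x for x > 0, exp(x) - 1 otherwise
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp3(inputs, params, out_act):
    # Three dense layers, ELU on the hidden layers, out_act on the last one
    W1, b1, W2, b2, W3, b3 = params
    h1 = elu(inputs @ W1 + b1)
    h2 = elu(h1 @ W2 + b2)
    return out_act(h2 @ W3 + b3)

rng = np.random.default_rng(0)
num_col, batch = 6, 3  # toy sizes

def dense(n_in, n_out):
    # Weight with std sqrt(2 / n_in) (same scale as xavier_init) and zero bias
    return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

params_G = [*dense(2 * num_col, num_col), *dense(num_col, num_col), *dense(num_col, num_col)]
params_D = [*dense(2 * num_col, num_col), *dense(num_col, num_col), *dense(num_col, num_col)]

X = rng.uniform(0.0, 1.0, (batch, num_col))             # toy, already scaled data
M = (rng.random((batch, num_col)) > 0.3).astype(float)  # 1 = observed
New_X = M * X

# Generator: input [New_X, M], ELU output layer
G_sample = mlp3(np.concatenate([New_X, M], axis=1), params_G, elu)
New_X_hat = New_X * M + G_sample * (1 - M)
# Discriminator: input [New_X_hat, H] with H = M, per-entry sigmoid probability
D_prob = mlp3(np.concatenate([New_X_hat, M], axis=1), params_D, sigmoid)

print(G_sample.shape, D_prob.shape)  # (3, 6) (3, 6)
```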

### GAIN training

#### Generator:

The Generator, $G$, takes $\tilde{X}$ and $M$ as inputs and outputs an imputed vector. The random variable $\hat{X}$ is defined as follows:

$$\hat{X}=M\circledast\tilde{X}+(1-M)\circledast G\left(\tilde{X},M\right)$$
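where $\circledast$ denotes the Hadamard product (element-wise multiplication). The Discriminator is trained to predict $M$ from $\hat{X}$ and the hint $H$. The Generator is trained both to fool it on the imputed entries and to reconstruct the observed entries, through an MSE term weighted by `alpha`.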
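
As a worked example of this formula, take a toy vector with two observed and two missing entries, and use a fixed stand-in for the Generator output. All numbers are made up for illustration. Observed entries are copied from $\tilde{X}$ even where $G$ disagrees with them.

```python
import numpy as np

# Toy speeds; 0.0 marks a missing entry in X_tilde (as New_X = M * X does)
X_tilde = np.array([52.0, 0.0, 47.5, 0.0])
M = np.array([1.0, 0.0, 1.0, 0.0])          # 1 = observed, 0 = missing
G_out = np.array([50.1, 44.3, 48.0, 39.8])  # stand-in for G(X_tilde, M)

X_hat = M * X_tilde + (1 - M) * G_out
print(X_hat)  # observed 52.0 and 47.5 are kept; 44.3 and 39.8 come from G
```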

In [6]:
alpha = 5

G_sample = generator(New_X, M)
New_X_hat = New_X * M + G_sample * (1 - M)
D_prob = discriminator(New_X_hat, H)

D_loss1 = -tf.reduce_mean(M * tf.log(D_prob + 1e-8) + (1 - M) * tf.log(1. - D_prob + 1e-8))
G_loss1 = -tf.reduce_mean((1 - M) * tf.log(D_prob + 1e-8))
MSE_train_loss = tf.reduce_mean((M * New_X - M * G_sample) ** 2) / tf.reduce_mean(M)

D_loss = D_loss1
G_loss = G_loss1 + alpha * MSE_train_loss

MSE_test_loss = tf.reduce_mean(((1 - M) * X - (1 - M) * G_sample) ** 2) / tf.reduce_mean(1 - M)

D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)
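
The loss terms above can be checked by hand in NumPy. The sketch below reimplements `D_loss1`, `G_loss1`, and the masked MSE on a toy row. It shows that a Discriminator reproducing the mask has almost zero loss and that one outputting 0.5 everywhere has loss $\log 2$. The MSE is normalized by the fraction of entries it covers. The function names and toy values are illustrative only.

```python
import numpy as np

EPS = 1e-8  # the same small constant the TensorFlow code adds inside tf.log
alpha = 5

def d_loss(D_prob, M):
    # Cross-entropy between the per-entry "observed" probability and the mask (D_loss1)
    return -np.mean(M * np.log(D_prob + EPS) + (1 - M) * np.log(1.0 - D_prob + EPS))

def g_adv_loss(D_prob, M):
    # The Generator wants D to label imputed entries (M == 0) as observed (G_loss1)
    return -np.mean((1 - M) * np.log(D_prob + EPS))

def masked_mse(A, B, W):
    # MSE over entries with W == 1 (W = M: MSE_train_loss, W = 1 - M: MSE_test_loss)
    return np.mean((W * A - W * B) ** 2) / np.mean(W)

M = np.array([[1.0, 0.0, 1.0, 0.0]])
X = np.array([[52.0, 48.0, 47.5, 41.0]])         # complete (toy) data
G_sample = np.array([[50.0, 44.0, 48.5, 39.0]])  # toy Generator output

confused_D = np.full_like(M, 0.5)  # a D that cannot tell observed from imputed

print(d_loss(M, M))                      # essentially 0 (about -1e-8 because of EPS)
print(round(d_loss(confused_D, M), 4))   # 0.6931 = log(2)
print(masked_mse(X, G_sample, M))        # 2.5  = (2^2 + 1^2) / 2 on observed entries
print(masked_mse(X, G_sample, 1 - M))    # 10.0 = (4^2 + 2^2) / 2 on missing entries
G_loss = g_adv_loss(confused_D, M) + alpha * masked_mse(X, G_sample, M)
print(round(G_loss, 4))                  # 12.8466
```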

In [7]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

max_iter = 4000
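for it in range(1, max_iter + 1):
    # Full-batch training: every iteration uses the whole matrix
    X_mb = X0
    M_mb = M0
    H_mb = M0  # the hint is the mask itself
    New_X_mb = M_mb * X_mb  # missing entries set to zero
    
    # One Discriminator update followed by one Generator update
    _, D_loss_curr = sess.run([D_solver, D_loss1], feed_dict = {M: M_mb, New_X: New_X_mb, H: H_mb})
    _, G_loss_curr, MSE_train_loss_curr, MSE_test_loss_curr = sess.run([G_solver, G_loss1, MSE_train_loss, MSE_test_loss], 
                                                                       feed_dict = {X: X_mb, M: M_mb, New_X: New_X_mb, H: H_mb})
    
    if it % 500 == 0:
        sample = sess.run(G_sample, feed_dict = {X: X0, M: M0, New_X: M0 * X0})
        # Keep the observed entries; use the Generator output only where data are missing
        sample = M0 * X0 + (1 - M0) * sample
        
        # The reported losses are RMSEs (square roots of the MSEs) on observed (M = 1) and masked (M = 0) entries
        print('Iter: {}'.format(it))
        print('Train_loss: {:.4}'.format(np.sqrt(MSE_train_loss_curr)))
        print('Test_loss: {:.4}'.format(np.sqrt(MSE_test_loss_curr)))
        print()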

Iter: 500
Train_loss: 8.653
Test_loss: 9.767

Iter: 1000
Train_loss: 8.09
Test_loss: 9.143

Iter: 1500
Train_loss: 7.984
Test_loss: 9.148

Iter: 2000
Train_loss: 7.915
Test_loss: 9.174

Iter: 2500
Train_loss: 6.609
Test_loss: 7.984

Iter: 3000
Train_loss: 5.627
Test_loss: 6.947

Iter: 3500
Train_loss: 5.018
Test_loss: 6.435

Iter: 4000
Train_loss: 4.963
Test_loss: 6.507
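
In the training loop above, the hint is simply the mask (`H_mb = M0`), so the Discriminator receives the very quantity it is asked to predict. The GAIN paper instead reveals only part of $M$. For each sample it draws $B\in\{0,1\}^d$ with a single zero at a uniformly random position $k$ and sets $H = B\circledast M + 0.5(1-B)$, so the Discriminator has to infer whether component $k$ was observed. The paper argues that a hint of this kind is needed for the game to single out the true data distribution. A NumPy sketch of that hint follows; `sample_hint` is a hypothetical helper and is not part of the notebook or of any library.

```python
import numpy as np

def sample_hint(M, rng):
    # H = B * M + 0.5 * (1 - B), where each row of B has exactly one zero at a random column
    n, d = M.shape
    B = np.ones_like(M)
    k = rng.integers(0, d, size=n)  # one hidden component per row
    B[np.arange(n), k] = 0.0
    return B * M + 0.5 * (1.0 - B)

rng = np.random.default_rng(0)
M = np.array([[1.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
H = sample_hint(M, rng)
print(H)  # each row equals M except for a single 0.5
```

Plugging it in would mean replacing `H_mb = M0` with `H_mb = sample_hint(M0, rng)` for some `numpy.random.Generator` `rng`. The results reported in this notebook were obtained with $H = M$.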



In [8]:
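# Evaluate only on entries observed in the raw data (mat > 0) but masked out for the experiment (sparse_mat == 0)
ind = np.argwhere((mat > 0) & (sparse_mat == 0))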
mape = np.sum(abs(mat[ind[:, 0], ind[:, 1]] - sample[ind[:, 0], ind[:, 1]]) / mat[ind[:, 0], ind[:, 1]] / len(ind))
rmse = np.sqrt(np.sum((mat[ind[:, 0], ind[:, 1]] - sample[ind[:, 0], ind[:, 1]]) ** 2) / len(ind))
print('MAPE = {:.6}'.format(mape))
print('RMSE = {:.6}'.format(rmse))

MAPE = 0.127649
RMSE = 6.55002
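
The same metrics can be written with a boolean mask instead of `np.argwhere` indices. The sketch below is an equivalent formulation on a toy matrix, with made-up values and a hypothetical function name `masked_metrics`. Only the two entries that are positive in `mat` and zero in `sparse_mat` are scored.

```python
import numpy as np

def masked_metrics(mat, sparse_mat, imputed):
    # MAPE and RMSE over entries observed in mat (> 0) but hidden in sparse_mat (== 0)
    pos = (mat > 0) & (sparse_mat == 0)  # boolean mask instead of np.argwhere indices
    truth, est = mat[pos], imputed[pos]
    mape = np.mean(np.abs(truth - est) / truth)
    rmse = np.sqrt(np.mean((truth - est) ** 2))
    return mape, rmse

mat = np.array([[50.0, 40.0, 0.0],
                [30.0, 20.0, 60.0]])
sparse_mat = np.array([[50.0, 0.0, 0.0],
                       [0.0, 20.0, 60.0]])
imputed = np.array([[50.0, 44.0, 35.0],
                    [27.0, 20.0, 60.0]])

mape, rmse = masked_metrics(mat, sparse_mat, imputed)
print('MAPE = {:.4f}, RMSE = {:.4f}'.format(mape, rmse))  # MAPE = 0.1000, RMSE = 3.5355
```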


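**Experiment results** of missing data imputation using GAIN. The scenario gives the missing rate and pattern (RM: random missing; NM: non-random missing), and MAPE and RMSE are computed on the masked entries: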

|  scenario |`alpha`| `num_h1` | `num_h2` | `max_iter` | noise | MAPE | RMSE |
|:----------|------:|---------:|---------:|-----------:|------:|-----:|-----:|
|**0.2, RM**|   10 | `num_col` | `num_col` | 5000 | `None` | 0.1031 | 4.8601 |
|**0.2, RM**|   10 | `num_col` | `num_col` | 4000 | `None` | 0.1034 | **4.6718** |
|**0.2, RM**|    5 | `num_col` | `num_col` | 5000 | `None` | 0.1062 | 5.5075 |
|**0.2, RM**|    5 | `num_col` | `num_col` |10000 | `None` | 0.1105 | 6.1929 |
|**0.2, RM**|   20 | `num_col` | `num_col` | 4000 | `None` | 0.1021 | 4.5394 |
|**0.2, RM**|   20 | `num_col` | `num_col` | 5000 | `None` | 0.1071 | 5.4990 |
|**0.4, RM**|   10 | `num_col` | `num_col` | 4000 | `None` | 0.1063 | **5.1776** |
|**0.4, RM**|   10 | `num_col` | `num_col` | 5000 | `None` | 0.1114 | 5.8402 |
|**0.2, NM**|   10 | `num_col` | `num_col` | 4000 | `None` | 0.1276 | **6.5500** |
|**0.4, NM**|   10 | `num_col` | `num_col` | 5000 | `None` | 0.1813 | 10.6302 |
|**0.4, NM**|    5 | `num_col` | `num_col` | 5000 | `None` | 0.1501 | 7.1339 |
|**0.4, NM**|    5 | `num_col` | `num_col` | 5000 | `None` | 0.1648 | 7.5626 |
|**0.4, NM**|    5 | `num_col` | `num_col` | 1000 | `None` | 0.1382 | **6.9947** |
|**0.4, NM**|   20 | `num_col` | `num_col` | 5000 | `None` | 0.1560 | 7.2674 |
