# Making sparse predictions, and forecasting the requests of the government declaration of natural disaster for a drought event in France
In this notebook, we show how to predict the cost of a drought event.
The methodology is presented in the paper [Making sparse predictions, and forecasting the requests of the government declaration of natural disaster for a drought event in France]() by 
T. T. Y. Nguyen, G. Ecoto and A. Chambaz. 

## Imports and installs

In [1]:
import numpy as np
import torch
import pandas as pd
from predicters import OTpreds
from utils import *
import logging
from sklearn.metrics import f1_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_absolute_error as MAE
from sklearn.metrics import mean_squared_error as MSE


logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.debug("test")
# Use float64 tensors by default (torch.set_default_tensor_type is deprecated in recent PyTorch)
torch.set_default_dtype(torch.float64)



# Example 1: synthetic data

We now present an illustration based on simulated data.

## Data loading
The dataset loader utilities assume there is a "simulations/" folder in the current directory.


In [2]:
file = './simulations/data_simulate.npz'
data = np.load(file, allow_pickle = True)

train = data['train']
test = data['test']
theta_SL_1 = data['theta_SL1']
theta_SL_2 = data['theta_SL2']
rmse_SL1 = data['rmse_SL1']
rmse_SL2 = data['rmse_SL2']

rmse_sl1 = np.mean(rmse_SL1)
rmse_sl2 = np.mean(rmse_SL2)


nb = 0
theta_sl1 = theta_SL_1[nb].squeeze()
theta_sl2 = theta_SL_2[nb].squeeze()
z = train[nb]
y = train[nb][:,2]
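
Before indexing into the archive, it can help to check which arrays it actually contains. The sketch below is only an illustration: `describe_npz` is a hypothetical helper (not part of `utils`), and the toy archive and its shapes are made up so that the example runs without the "simulations/" folder.

```python
import os
import tempfile

import numpy as np


def describe_npz(path):
    """Return {key: shape} for every array stored in an .npz archive."""
    with np.load(path, allow_pickle=True) as archive:
        return {key: archive[key].shape for key in archive.files}


# Toy archive with hypothetical shapes, standing in for data_simulate.npz
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy.npz")
np.savez(toy_path, train=np.zeros((2, 5, 3)), test=np.zeros((2, 4, 3)))

shapes = describe_npz(toy_path)
print(shapes)
```

Calling `describe_npz('./simulations/data_simulate.npz')` on the real file would list the keys used above (`train`, `test`, `theta_SL1`, ...), provided the file is in place.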

### Prepare the test data and mask the target

In [3]:
zp_true = test[nb]
theta_true = zp_true[:, -1]      # ground-truth target (last column)
tau_true = np.mean(theta_true)   # mean of the ground-truth target

# Copy the test set and hide the target column
zp = zp_true.copy()
zp[:, -1] = np.nan

Z = torch.from_numpy(z)
Zp = torch.from_numpy(zp)
Zp_true = torch.from_numpy(zp_true)
Theta_true = torch.from_numpy(theta_true)
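
The cell above hides the last column of the test set by overwriting it with `np.nan`, while keeping an untouched copy as ground truth. A minimal sketch of the same step on a toy array (the values are invented for illustration):

```python
import numpy as np

# Toy stand-in for one test set: 4 rows, last column is the target (hypothetical values)
toy_true = np.array([[0.1, 1.0, 0.0],
                     [0.4, 2.0, 1.0],
                     [0.7, 3.0, 0.0],
                     [0.9, 4.0, 1.0]])

toy_masked = toy_true.copy()   # copy first, so the ground truth is left untouched
toy_masked[:, -1] = np.nan     # hide the target column

n_missing = int(np.isnan(toy_masked).sum())
print(n_missing)               # one missing entry per row
```

Without the `.copy()`, the assignment would also overwrite the target in `toy_true`, since both names would refer to the same array.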



### Hyperparameters

In [4]:
alpha = 500
init = theta_sl2
tau = np.mean(init)
cost = dissim2(alpha).cost2

batchsize = 128
eps = 1e-2
lr = 1e-3
mu = 1e-4
report_interval = 2
niter = 30


### Optimal transport sparse prediction

In [5]:
sk_imputer = OTpreds(n_pairs = 1, noise = 0.01, batchsize = batchsize, niter = niter, eps = eps, lr = lr, 
                     mu = mu, tau = tau, cost = cost)

theta_ot, maes, rmses, aucs = sk_imputer.fit_transform_update(Z, Zp, theta_true = Theta_true,
                                                           init = None, verbose=True, 
                                                           report_interval=report_interval)



INFO:root:batchsize = 128, epsilon = 0.0100
INFO:root:Iteration, learning rate 0:	 Loss: 40.8341	 Validation MAE: 0.1368	RMSE: 0.2275	AUC: 0.5588
INFO:root:Iteration, learning rate 2:	 Loss: 80.1990	 Validation MAE: 0.1348	RMSE: 0.2268	AUC: 0.6171
INFO:root:Iteration, learning rate 4:	 Loss: 65.9078	 Validation MAE: 0.1331	RMSE: 0.2259	AUC: 0.6849
INFO:root:Iteration, learning rate 6:	 Loss: 55.0829	 Validation MAE: 0.1317	RMSE: 0.2251	AUC: 0.7459
INFO:root:Iteration, learning rate 8:	 Loss: 32.5202	 Validation MAE: 0.1303	RMSE: 0.2246	AUC: 0.7600
INFO:root:Iteration, learning rate 10:	 Loss: 71.2195	 Validation MAE: 0.1289	RMSE: 0.2240	AUC: 0.7928
INFO:root:Iteration, learning rate 12:	 Loss: 45.2270	 Validation MAE: 0.1278	RMSE: 0.2235	AUC: 0.7954
INFO:root:Iteration, learning rate 14:	 Loss: 39.2918	 Validation MAE: 0.1268	RMSE: 0.2232	AUC: 0.7894
INFO:root:Iteration, learning rate 16:	 Loss: 91.8793	 Validation MAE: 0.1258	RMSE: 0.2227	AUC: 0.7898
INFO:root:Iteration, learning rate

In [14]:

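# Hybrid prediction: geometric mean of the SL2 and OT predictions
theta_hybrid = np.sqrt(theta_sl2 * theta_ot)

# Binarize each prediction by thresholding at its (1 - tau_true) quantile,
# so that roughly a fraction tau_true of the entries is flagged positive
theta_ot_np = theta_ot > np.quantile(theta_ot, 1 - tau_true)
theta_sl2_np = theta_sl2 > np.quantile(theta_sl2, 1 - tau_true)
theta_hybrid_np = theta_hybrid > np.quantile(theta_hybrid, 1 - tau_true)

# Root mean squared errors of the three continuous predictions
rmse_glm = MSE(theta_true, theta_sl2)**(1/2)
rmse_ot_glm = MSE(theta_true, theta_hybrid)**(1/2)
rmse_ot = MSE(theta_true, theta_ot)**(1/2)

# Accuracy of the binary OT and SL2 predictions
print(accuracy_score(theta_true, theta_ot_np), accuracy_score(theta_true, theta_sl2_np))
# Last validation metrics reported during the OT fit
print(maes[-1], rmses[-1], aucs[-1])

# RMSE: hybrid, SL2, OT
print(rmse_ot_glm, rmse_glm, rmse_ot)
print(f1_score(theta_true, theta_ot_np), f1_score(theta_true, theta_sl2_np))
print('F1-score:\nbinary OT prediction:', f1_score(theta_true, theta_ot_np), '\t binary SL2 prediction:', f1_score(theta_true, theta_sl2_np), 
      '\t binary hybrid prediction:', f1_score(theta_true, theta_hybrid_np))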


0.944 0.96
0.12029944580581978 0.2203472433954151 0.8240720447889065
0.19364991120043143 0.17512161251811864 0.2203472433954151
0.4716981132075472 0.6226415094339622
F1-score:
binary OT prediction: 0.4716981132075472 	 binary SL2 prediction: 0.6226415094339622 	 binary hybrid prediction: 0.6037735849056604
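
The evaluation cell combines two predictions through their geometric mean and turns each continuous prediction into a binary one by thresholding at its `1 - tau_true` quantile. The sketch below reproduces these two steps on made-up scores; `binarize_top_fraction` is a hypothetical helper name, and the numbers are not taken from the simulation.

```python
import numpy as np


def binarize_top_fraction(scores, fraction):
    """Flag the entries strictly above the (1 - fraction) quantile of `scores`."""
    threshold = np.quantile(scores, 1 - fraction)
    return scores > threshold


# Two hypothetical score vectors playing the roles of theta_sl2 and theta_ot
scores_a = np.array([0.05, 0.10, 0.20, 0.40, 0.80, 0.90, 0.15, 0.30, 0.60, 0.02])
scores_b = np.array([0.10, 0.20, 0.30, 0.20, 0.90, 0.70, 0.25, 0.10, 0.50, 0.04])

# Geometric mean: lies between the two scores, and is small as soon as one of them is small
hybrid = np.sqrt(scores_a * scores_b)

# Flag the top 20% of the hybrid scores
labels = binarize_top_fraction(hybrid, 0.2)
print(np.flatnonzero(labels))
```

With ten distinct scores and a fraction of 0.2, exactly two entries are flagged. With ties or other sample sizes the flagged fraction is only approximately the requested one.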


# Part II: Real data 