# Point to Uncertainty


#### Converting an accuracy-competition submission file (score 0.000) into a submission for the uncertainty competition

This notebook is based on the example notebook [https://www.kaggle.com/kneroma/from-point-to-uncertainty-prediction](https://www.kaggle.com/kneroma/from-point-to-uncertainty-prediction).

The input submission file was produced by the code written for the accuracy competition.

Import the required libraries. When running in Google Colab, Google Drive access is needed to reach the input files.

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

import scipy.stats as stats

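Load the best accuracy submission and merge it with the hierarchy columns (item, department, category, store, state) from the sales train validation file. Because `sales_train_validation.csv` contains only `_validation` ids, the inner merge keeps only the validation rows of the submission.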

In [None]:
best = pd.read_csv("../input/submission-accuracy/submission.csv")
sales_train_val = pd.read_csv('../input/m5-forecasting-uncertainty/sales_train_validation.csv')

sub = best.merge(sales_train_val[["id", "item_id", "dept_id", "cat_id", "store_id", "state_id"]], on = "id")
sub["_all_"] = "Total"

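Quantile and ratio computation. Each quantile is passed through a logit, scaled by 0.065, mapped through the standard normal CDF, and divided by the value at the median, so that the 0.5 quantile gets a ratio of 1. The point forecast is later multiplied by these ratios.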

In [None]:
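quantiles = np.array([0.005, 0.025, 0.165, 0.25, 0.5, 0.75, 0.835, 0.975, 0.995])

# Logit of each quantile, shrunk by a factor of 0.065.
quantiles_2 = np.log(quantiles/(1-quantiles))*.065

# Map through the standard normal CDF, then divide by the median's value
# (index 4, quantile 0.5) so that the median ratio is exactly 1.
ratios = stats.norm.cdf(quantiles_2)
ratios /= ratios[4]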
ratios = pd.Series(ratios, index=quantiles)
ratios.round(3)
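
As a quick sanity check on the step above, the following sketch recomputes the ratios in isolation and confirms two properties the rest of the notebook relies on: the median ratio is exactly 1 (so the 0.5 quantile reproduces the point forecast), and the ratios increase with the quantile. The names `demo_quantiles` and `demo_ratios` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

# Same quantiles and scaling factor as in the cell above.
demo_quantiles = np.array([0.005, 0.025, 0.165, 0.25, 0.5, 0.75, 0.835, 0.975, 0.995])
scaled_logit = np.log(demo_quantiles / (1 - demo_quantiles)) * 0.065

demo_ratios = stats.norm.cdf(scaled_logit)
demo_ratios = pd.Series(demo_ratios / demo_ratios[4], index=demo_quantiles)

# The median must map to 1, and higher quantiles must get larger multipliers.
assert demo_ratios.loc[0.5] == 1.0
assert demo_ratios.is_monotonic_increasing
print(demo_ratios.round(3))
```

With the factor 0.065, the multipliers span roughly 0.73 to 1.27, so the widest interval is about 27% either side of the point forecast.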

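Functions that turn the point forecasts into one forecast per quantile: the forecasts are summed within each aggregation level (or pair of levels) and then multiplied by the ratio of each quantile.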

In [None]:
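def quantile_coefs(q):
  # Look up the multiplier for each quantile in q.
  return ratios.loc[q].values

def get_group_preds(pred, level):
  # Sum the point forecasts within each group of a single aggregation level.
  # `cols` (F1..F28) is defined below, before these functions are called.
  submission = pred.groupby(level)[cols].sum()
  # One copy of the grouped forecasts per quantile, with q aligned row by row.
  q = np.repeat(quantiles, len(submission))
  submission = pd.concat([submission]*len(quantiles), axis=0, sort=False)
  submission.reset_index(inplace = True)
  # Scale every forecast column by the ratio of the row's quantile.
  submission[cols] *= quantile_coefs(q)[:, None]
  if level != "id":
    # Aggregated levels use "X" as the placeholder for the second level.
    submission["id"] = [f"{lev}_X_{q:.3f}_validation" for lev, q in zip(submission[level].values, q)]
  else:
    # Series-level ids already end in "_validation"; insert the quantile before the suffix.
    submission["id"] = [f"{lev.replace('_validation', '')}_{q:.3f}_validation" for lev, q in zip(submission[level].values, q)]
  submission = submission[["id"]+list(cols)]
  return submission

def get_couple_group_preds(pred, level1, level2):
  # Same as get_group_preds, but grouped by a pair of levels.
  submission = pred.groupby([level1, level2])[cols].sum()
  q = np.repeat(quantiles, len(submission))
  submission = pd.concat([submission]*len(quantiles), axis=0, sort=False)
  submission.reset_index(inplace = True)
  submission[cols] *= quantile_coefs(q)[:, None]
  submission["id"] = [f"{lev1}_{lev2}_{q:.3f}_validation" for lev1,lev2, q in zip(submission[level1].values,submission[level2].values, q)]
  submission = submission[["id"]+list(cols)]
  return submission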

levels = ["id", "item_id", "dept_id", "cat_id", "store_id", "state_id", "_all_"]
couples = [("state_id", "item_id"),  ("state_id", "dept_id"),("store_id","dept_id"),("state_id", "cat_id"),("store_id","cat_id")]
cols = [f"F{i}" for i in range(1, 29)]
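
To make the group-level step concrete, here is a minimal, self-contained sketch of the same pattern on toy data: sum within a group, stack one copy per quantile, and scale each row by its quantile's ratio. The data, the three quantiles, and the multipliers in `toy_ratios` are hypothetical values chosen for easy arithmetic, not the ratios computed in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three series in two stores, two forecast days.
toy = pd.DataFrame({
    "store_id": ["CA_1", "CA_1", "TX_1"],
    "F1": [1.0, 2.0, 4.0],
    "F2": [3.0, 1.0, 0.0],
})
toy_cols = ["F1", "F2"]
toy_quantiles = np.array([0.25, 0.5, 0.75])
# Illustrative multipliers only.
toy_ratios = pd.Series([0.5, 1.0, 1.5], index=toy_quantiles)

# One row per store, holding the summed point forecasts.
grouped = toy.groupby("store_id")[toy_cols].sum()
# Each quantile repeated once per group, in the same order as the stacked rows.
q = np.repeat(toy_quantiles, len(grouped))
stacked = pd.concat([grouped] * len(toy_quantiles)).reset_index()
# The (n_rows, 1) column of ratios broadcasts across all forecast columns.
stacked[toy_cols] *= toy_ratios.loc[q].values[:, None]
stacked["id"] = [f"{s}_X_{qq:.3f}_validation" for s, qq in zip(stacked["store_id"], q)]
toy_result = stacked[["id"] + toy_cols]
print(toy_result)
```

The row order matters here: `np.repeat` yields all groups for the first quantile, then all groups for the second, which matches the order produced by concatenating whole copies of the grouped frame.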

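Compute the quantile forecasts for every aggregation level and level pair, then duplicate the validation rows and relabel the copies as evaluation rows. The result is saved in the submission format required by the Kaggle competition.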

In [None]:
submission = []
for level in levels :
    submission.append(get_group_preds(sub, level))
for level1,level2 in couples:
    submission.append(get_couple_group_preds(sub, level1, level2))
submission = pd.concat(submission, axis=0, sort=False)
submission.reset_index(drop=True, inplace=True)
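# Duplicate the validation rows; the second half becomes the evaluation rows.
n_val = len(submission)
submission = pd.concat([submission, submission], axis=0, sort=False)
submission.reset_index(drop=True, inplace=True)
# regex=True is needed for the "$" anchor; recent pandas versions default to regex=False.
is_eval = submission.index >= n_val
submission.loc[is_eval, "id"] = submission.loc[is_eval, "id"].str.replace("_validation$", "_evaluation", regex=True)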

submission.to_csv("../output/submission-uncertainty/submission.csv", index = False)
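
Before uploading, it can be worth checking the ids. The sketch below shows the suffix relabeling on two sample ids and a small check that ids are unique and split evenly between validation and evaluation. The helper `check_submission_ids` is a hypothetical name introduced here; it is not part of the original notebook or of any library.

```python
import pandas as pd

def check_submission_ids(ids: pd.Series) -> None:
    """Hypothetical helper: basic consistency checks on submission ids."""
    assert ids.is_unique, "duplicate ids found"
    n_val = ids.str.endswith("_validation").sum()
    n_eval = ids.str.endswith("_evaluation").sum()
    assert n_val == n_eval, "validation and evaluation halves differ in size"
    assert n_val + n_eval == len(ids), "some ids have an unexpected suffix"

# Two sample validation ids in the format built above.
val_ids = pd.Series(["CA_1_X_0.500_validation", "Total_X_0.500_validation"])
# Anchor the pattern at the end of the string so only the suffix is replaced.
eval_ids = val_ids.str.replace("_validation$", "_evaluation", regex=True)
all_ids = pd.concat([val_ids, eval_ids], ignore_index=True)

check_submission_ids(all_ids)
print(all_ids.tolist())
```

On the real data the same check could be called as `check_submission_ids(submission["id"])`.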