## M5 Setup
This competition is a little different from others: the true labels for the public leaderboard have now been released.

The following is an excerpt from the [M5 Competition Guide](https://mofc.unic.ac.cy/m5-competition/):

> After the end of the validation phase, i.e., from June 1, 2020 to 30 June of the same year, the participants will be provided with the actual values of the 28 days of data used for scoring their performance during the validation phase. They will be asked then to re-estimate or adjust (if needed) their forecasting models in order to submit their final forecasts and prediction intervals for the following 28 days, i.e., the data used for the final evaluation of the participants. During this time, there will be no leaderboard, meaning that no feedback will be given to the participants about their score after submitting their forecasts. Thus, although the participants will be free to (re)submit their forecasts any time they wish (a maximum of 5 entries per day), they will not be aware of their absolute, as well as their relative performance. The final ranks of the participants will be made available only at the end of competition, when the test data will be made available. This is done in order for the competition to simulate reality as closely as possible, given that in real life forecasters do not know the future.

So the public LB on Kaggle will either fill up with scores that exploit the true labels, or Kaggle will freeze it. Either way, we now have the actual labels and can calculate the validation scores (and the rank as of 31st May, 2020) at every level of aggregation ourselves.

The weights used in this notebook are the weights for the public LB (validation data). Note that the private LB (evaluation data) uses a different set of weights. A summary of the weights comparison is shared here: https://www.kaggle.com/rohanrao/m5-the-weighing-scale
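As a rough illustration of what "weights" means here: the evaluator class used later in this notebook derives them from dollar sales (units sold times sell price) over the last 28 training days, normalised to sum to 1. The sketch below uses made-up toy data, a 4-day window instead of 28, and one constant price per series (the real data has weekly prices), so every name and number in it is an assumption for illustration only.

```python
import pandas as pd

# Toy unit sales for 3 series over a 4-day window (stand-in for the last 28 days)
sales = pd.DataFrame(
    {"d_1": [1, 0, 2], "d_2": [0, 3, 2], "d_3": [2, 1, 0], "d_4": [1, 0, 0]},
    index=["item_a", "item_b", "item_c"],
)

# Hypothetical constant price per series (the real prices vary by week)
prices = pd.Series({"item_a": 2.0, "item_b": 1.0, "item_c": 5.0})

# Dollar sales per series over the window, then normalise so weights sum to 1
revenue = sales.mul(prices, axis=0).sum(axis=1)
weights = revenue / revenue.sum()

print(weights)
```

Because the window moves forward by 28 days between the validation and evaluation phases, the same series can receive a different weight in each phase.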

Note that the final private LB ranking will be based on the test data at the end of the competition.


## Validation Data
The actual validation data (*d_1914 to d_1941*) is now available in [sales_train_evaluation.csv](https://www.kaggle.com/c/m5-forecasting-accuracy/data?select=sales_train_evaluation.csv). Since this file also contains all of the train data that was available earlier, we can completely ignore [sales_train_validation.csv](https://www.kaggle.com/c/m5-forecasting-accuracy/data?select=sales_train_validation.csv) for the rest of this competition.
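
One way to separate the old training days from the newly released validation days is to select columns by day number rather than by position. This is a minimal sketch on a toy frame that only mimics the file's layout; `FIRST_VALID_DAY` and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy wide frame with the same layout: id column followed by d_<n> columns
toy = pd.DataFrame({
    "id": ["A_evaluation", "B_evaluation"],
    "d_1912": [1, 0], "d_1913": [0, 2], "d_1914": [3, 1], "d_1915": [0, 0],
})

FIRST_VALID_DAY = 1914  # first day of the validation window, per the text above

day_cols = [c for c in toy.columns if c.startswith("d_")]
valid_cols = [c for c in day_cols if int(c.split("_")[1]) >= FIRST_VALID_DAY]
train_cols = [c for c in day_cols if c not in valid_cols]

print(train_cols, valid_cols)
```

Selecting by name keeps working if extra non-day columns are appended to the frame, which a positional slice such as `iloc[:, -28:]` would silently get wrong.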


In [2]:
## new train data
import pandas as pd

df_train_full = pd.read_csv("../input/sales_train_evaluation.csv")
df_train_full.iloc[:, -31:].head()


Unnamed: 0,d_1911,d_1912,d_1913,d_1914,d_1915,d_1916,d_1917,d_1918,d_1919,d_1920,...,d_1932,d_1933,d_1934,d_1935,d_1936,d_1937,d_1938,d_1939,d_1940,d_1941
0,0,1,1,0,0,0,2,0,3,5,...,2,4,0,0,0,0,3,3,0,1
1,0,0,0,0,1,0,0,0,0,0,...,0,1,2,1,1,0,0,0,0,0
2,1,1,1,0,0,1,1,0,2,1,...,1,0,2,0,0,0,2,3,0,1
3,3,7,2,0,0,1,2,4,1,6,...,1,1,0,4,0,1,3,0,2,6
4,2,2,4,1,0,2,3,1,0,3,...,0,0,0,2,1,0,0,2,1,0


In [8]:
df_train_full.head()

Unnamed: 0,id,item_id,dept_id,cat_id,store_id,state_id,d_1,d_2,d_3,d_4,...,d_1932,d_1933,d_1934,d_1935,d_1936,d_1937,d_1938,d_1939,d_1940,d_1941
0,HOBBIES_1_001_CA_1_evaluation,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,...,2,4,0,0,0,0,3,3,0,1
1,HOBBIES_1_002_CA_1_evaluation,HOBBIES_1_002,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,...,0,1,2,1,1,0,0,0,0,0
2,HOBBIES_1_003_CA_1_evaluation,HOBBIES_1_003,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,...,1,0,2,0,0,0,2,3,0,1
3,HOBBIES_1_004_CA_1_evaluation,HOBBIES_1_004,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,...,1,1,0,4,0,1,3,0,2,6
4,HOBBIES_1_005_CA_1_evaluation,HOBBIES_1_005,HOBBIES_1,HOBBIES,CA_1,CA,0,0,0,0,...,0,0,0,2,1,0,0,2,1,0


In [9]:
df_train_full.columns

Index(['id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id', 'd_1',
       'd_2', 'd_3', 'd_4',
       ...
       'd_1932', 'd_1933', 'd_1934', 'd_1935', 'd_1936', 'd_1937', 'd_1938',
       'd_1939', 'd_1940', 'd_1941'],
      dtype='object', length=1947)

In [12]:
id_cols = ['id', 'item_id', 'dept_id', 'cat_id', 'store_id', 'state_id']
data_melt = df_train_full.melt(id_vars=id_cols, var_name='d', value_name='sold_num')

In [13]:
data_melt.head()

Unnamed: 0,id,item_id,dept_id,cat_id,store_id,state_id,d,sold_num
0,HOBBIES_1_001_CA_1_evaluation,HOBBIES_1_001,HOBBIES_1,HOBBIES,CA_1,CA,d_1,0
1,HOBBIES_1_002_CA_1_evaluation,HOBBIES_1_002,HOBBIES_1,HOBBIES,CA_1,CA,d_1,0
2,HOBBIES_1_003_CA_1_evaluation,HOBBIES_1_003,HOBBIES_1,HOBBIES,CA_1,CA,d_1,0
3,HOBBIES_1_004_CA_1_evaluation,HOBBIES_1_004,HOBBIES_1,HOBBIES,CA_1,CA,d_1,0
4,HOBBIES_1_005_CA_1_evaluation,HOBBIES_1_005,HOBBIES_1,HOBBIES,CA_1,CA,d_1,0


In [None]:
data_melt.to_csv('m.csv')

## Submission
The test data is for the predictions from *d_1942 to d_1969* corresponding to the sample submission format *F1 to F28*.   
We can still submit on Kaggle up to 5 times a day till the end of the competition.
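
The correspondence between submission columns and calendar days for the test period can be written down explicitly. The constant names below are illustrative, not part of the competition files.

```python
HORIZON = 28           # number of forecast days in a submission row
FIRST_TEST_DAY = 1942  # first day of the test period, per the text above

# Map F1..F28 to d_1942..d_1969
f_to_day = {f"F{i}": f"d_{FIRST_TEST_DAY + i - 1}" for i in range(1, HORIZON + 1)}

print(f_to_day["F1"], f_to_day["F28"])
```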


## Public LB Score
We can use the actual validation data labels to score our models and get the exact public LB score. For predicting on the final test data, it is highly recommended to retrain your models with the newly available validation data included.

The code below can be used to get your public LB score. Thanks to [sakami](https://www.kaggle.com/sakami) for providing a neat class for the evaluation metric [here](https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/133834).   
The [dataset of M5 public LB](https://www.kaggle.com/rohanrao/m5-accuracy-final-public-lb) can be used to get your public LB rank.
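
For reference, the metric can be written as follows, using the notation of the M5 Competition Guide. Here $n$ is the number of training observations counted from the first non-zero sale, $h = 28$ is the forecast horizon, and the weights $w_i^{(l)}$ within each of the 12 aggregation levels sum to 1. Averaging the 12 level scores is how this notebook combines them.

$$
\mathrm{RMSSE} = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_t-\hat{Y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_t-Y_{t-1}\right)^2}},
\qquad
\mathrm{WRMSSE} = \frac{1}{12}\sum_{l=1}^{12}\ \sum_{i \,\in\, \text{level } l} w_i^{(l)}\,\mathrm{RMSSE}_i^{(l)}
$$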

I've also shared a notebook showing how to dig deeper into your submission's performance against the public LB: https://www.kaggle.com/rohanrao/m5-anatomy-of-the-public-lb

So you can now work without needing to make submissions.

I've verified the calculations below with [Konstantin Yakovlev](https://www.kaggle.com/kyakovlev)'s two public kernel submission files and corresponding scores on public LB:   
https://www.kaggle.com/kyakovlev/m5-three-shades-of-dark-darker-magic   
https://www.kaggle.com/kyakovlev/m5-witch-time
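
Before the full evaluator, here is a minimal sketch of the per-series RMSSE for a single toy series. It mirrors the scaling used in the class below (leading zeros are dropped, then the mean squared one-step difference of the training series is the scale), but the function name and the toy numbers are assumptions for illustration.

```python
import numpy as np

def rmsse(train, actual, forecast):
    """RMSSE for one series; `train` is the history, `actual`/`forecast` the horizon."""
    train = np.asarray(train, dtype=float)
    # Drop leading zeros so scaling starts at the first non-zero sale
    train = train[np.argmax(train != 0):]
    # Scale: mean squared one-step difference of the training series
    scale = np.mean(np.diff(train) ** 2)
    mse = np.mean((np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)) ** 2)
    return float(np.sqrt(mse / scale))

# Trimmed history is [2, 4, 2, 4] -> scale 4; forecast errors are [2, 2] -> MSE 4
print(rmsse([0, 0, 2, 4, 2, 4], actual=[3, 5], forecast=[1, 3]))
```

Note that a series with a constant (or all-zero) history has a scale of 0, so this sketch would divide by zero for it.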


In [3]:
## importing packages
import numpy as np
import pandas as pd

from typing import Union
from tqdm.auto import tqdm  # falls back to a plain-text progress bar if ipywidgets is unavailable


In [4]:
## evaluation metric
## from https://www.kaggle.com/c/m5-forecasting-accuracy/discussion/133834 and edited to get scores at all levels
class WRMSSEEvaluator(object):

    def __init__(self, train_df: pd.DataFrame, valid_df: pd.DataFrame, calendar: pd.DataFrame, prices: pd.DataFrame):
        train_y = train_df.loc[:, train_df.columns.str.startswith('d_')]
        train_target_columns = train_y.columns.tolist()
        weight_columns = train_y.iloc[:, -28:].columns.tolist()

        train_df['all_id'] = 0  # for lv1 aggregation

        id_columns = train_df.loc[:, ~train_df.columns.str.startswith('d_')].columns.tolist()
        valid_target_columns = valid_df.loc[:, valid_df.columns.str.startswith('d_')].columns.tolist()

        if not all([c in valid_df.columns for c in id_columns]):
            valid_df = pd.concat([train_df[id_columns], valid_df], axis=1, sort=False)

        self.train_df = train_df
        self.valid_df = valid_df
        self.calendar = calendar
        self.prices = prices

        self.weight_columns = weight_columns
        self.id_columns = id_columns
        self.valid_target_columns = valid_target_columns

        weight_df = self.get_weight_df()

        self.group_ids = (
            'all_id',
            'cat_id',
            'state_id',
            'dept_id',
            'store_id',
            'item_id',
            ['state_id', 'cat_id'],
            ['state_id', 'dept_id'],
            ['store_id', 'cat_id'],
            ['store_id', 'dept_id'],
            ['item_id', 'state_id'],
            ['item_id', 'store_id']
        )

        for i, group_id in enumerate(tqdm(self.group_ids)):
            train_y = train_df.groupby(group_id)[train_target_columns].sum()
            scale = []
            for _, row in train_y.iterrows():
                series = row.values[np.argmax(row.values != 0):]
                scale.append(((series[1:] - series[:-1]) ** 2).mean())
            setattr(self, f'lv{i + 1}_scale', np.array(scale))
            setattr(self, f'lv{i + 1}_train_df', train_y)
            setattr(self, f'lv{i + 1}_valid_df', valid_df.groupby(group_id)[valid_target_columns].sum())

            lv_weight = weight_df.groupby(group_id)[weight_columns].sum().sum(axis=1)
            setattr(self, f'lv{i + 1}_weight', lv_weight / lv_weight.sum())

    def get_weight_df(self) -> pd.DataFrame:
        day_to_week = self.calendar.set_index('d')['wm_yr_wk'].to_dict()
        weight_df = self.train_df[['item_id', 'store_id'] + self.weight_columns].set_index(['item_id', 'store_id'])
        weight_df = weight_df.stack().reset_index().rename(columns={'level_2': 'd', 0: 'value'})
        weight_df['wm_yr_wk'] = weight_df['d'].map(day_to_week)

        weight_df = weight_df.merge(self.prices, how='left', on=['item_id', 'store_id', 'wm_yr_wk'])
        weight_df['value'] = weight_df['value'] * weight_df['sell_price']
        weight_df = weight_df.set_index(['item_id', 'store_id', 'd']).unstack(level=2)['value']
        weight_df = weight_df.loc[list(zip(self.train_df.item_id, self.train_df.store_id)), :].reset_index(drop=True)
        weight_df = pd.concat([self.train_df[self.id_columns], weight_df], axis=1, sort=False)
        return weight_df

    def rmsse(self, valid_preds: pd.DataFrame, lv: int) -> pd.Series:
        valid_y = getattr(self, f'lv{lv}_valid_df')
        score = ((valid_y - valid_preds) ** 2).mean(axis=1)
        scale = getattr(self, f'lv{lv}_scale')
        return (score / scale).map(np.sqrt)

    def score(self, valid_preds: Union[pd.DataFrame, np.ndarray]):
        assert self.valid_df[self.valid_target_columns].shape == valid_preds.shape

        if isinstance(valid_preds, np.ndarray):
            valid_preds = pd.DataFrame(valid_preds, columns=self.valid_target_columns)

        valid_preds = pd.concat([self.valid_df[self.id_columns], valid_preds], axis=1, sort=False)

        group_ids = []
        all_scores = []
        for i, group_id in enumerate(self.group_ids):
            lv_scores = self.rmsse(valid_preds.groupby(group_id)[self.valid_target_columns].sum(), i + 1)
            weight = getattr(self, f'lv{i + 1}_weight')
            lv_scores = pd.concat([weight, lv_scores], axis=1, sort=False).prod(axis=1)
            group_ids.append(group_id)
            all_scores.append(lv_scores.sum())

        return group_ids, all_scores


In [5]:
## public LB rank
def get_lb_rank(score):
    """
    Get rank on public LB as of 2020-05-31 23:59:59
    """
    df_lb = pd.read_csv("../input/m5-accuracy-final-public-lb/m5-forecasting-accuracy-publicleaderboard-rank.csv")

    return (df_lb.Score <= score).sum() + 1


In [7]:
## reading data
df_calendar = pd.read_csv("../input/calendar.csv")
df_prices = pd.read_csv("../input/sell_prices.csv")
df_sample_submission = pd.read_csv("../input/sample_submission.csv")
df_sample_submission["order"] = range(df_sample_submission.shape[0])

## copy the slices: the evaluator adds an 'all_id' column to the train frame
df_train = df_train_full.iloc[:, :-28].copy()
df_valid = df_train_full.iloc[:, -28:].copy()

evaluator = WRMSSEEvaluator(df_train, df_valid, df_calendar, df_prices)



In [6]:
## structure of validation data
preds_valid = df_valid.copy() + np.random.randint(100, size = df_valid.shape)
preds_valid.head()


Unnamed: 0,d_1914,d_1915,d_1916,d_1917,d_1918,d_1919,d_1920,d_1921,d_1922,d_1923,...,d_1932,d_1933,d_1934,d_1935,d_1936,d_1937,d_1938,d_1939,d_1940,d_1941
0,86,43,88,55,97,64,28,54,6,40,...,42,49,14,59,7,25,34,32,49,49
1,8,60,92,80,29,15,46,47,89,95,...,1,29,45,93,91,56,25,25,62,31
2,36,68,100,33,24,12,65,5,2,95,...,65,26,34,98,89,82,96,32,73,96
3,63,36,57,40,38,78,8,12,9,13,...,3,61,56,6,52,90,71,57,40,8
4,82,89,64,61,26,11,3,100,55,90,...,12,10,41,50,31,56,19,53,26,90


In [7]:
## evaluating random submission
groups, scores = evaluator.score(preds_valid)

score_public_lb = np.mean(scores)
score_public_rank = get_lb_rank(score_public_lb)

for i in range(len(groups)):
    print(f"Score for group {groups[i]}: {round(scores[i], 5)}")

print(f"\nPublic LB Score: {round(score_public_lb, 5)}")
print(f"Public LB Rank: {score_public_rank}")


Score for group all_id: 254.74625
Score for group cat_id: 266.09364
Score for group state_id: 232.62688
Score for group dept_id: 266.07847
Score for group store_id: 221.7439
Score for group item_id: 82.41056
Score for group ['state_id', 'cat_id']: 236.42144
Score for group ['state_id', 'dept_id']: 232.04109
Score for group ['store_id', 'cat_id']: 212.79594
Score for group ['store_id', 'dept_id']: 198.70739
Score for group ['item_id', 'state_id']: 51.69169
Score for group ['item_id', 'store_id']: 31.2954

Public LB Score: 190.55439
Public LB Rank: 4539


In [8]:
## evaluating submission from public kernel M5 - Three shades of Dark: Darker magic
## from https://www.kaggle.com/kyakovlev/m5-three-shades-of-dark-darker-magic
preds_valid = pd.read_csv("../input/m5-three-shades-of-dark-darker-magic/submission_v1.csv")
preds_valid = preds_valid[preds_valid.id.str.contains("validation")]
preds_valid = preds_valid.merge(df_sample_submission[["id", "order"]], on = "id").sort_values("order").drop(["id", "order"], axis = 1).reset_index(drop = True)
## F1..F28 -> d_1914..d_1941
preds_valid = preds_valid.rename(columns = {f"F{i}": f"d_{1913 + i}" for i in range(1, 29)})

groups, scores = evaluator.score(preds_valid)

score_public_lb = np.mean(scores)
score_public_rank = get_lb_rank(score_public_lb)

for i in range(len(groups)):
    print(f"Score for group {groups[i]}: {round(scores[i], 5)}")

print(f"\nPublic LB Score: {round(score_public_lb, 5)}")
print(f"Public LB Rank: {score_public_rank}")


Score for group all_id: 0.21289
Score for group cat_id: 0.25887
Score for group state_id: 0.29803
Score for group dept_id: 0.33461
Score for group store_id: 0.39143
Score for group item_id: 0.79453
Score for group ['state_id', 'cat_id']: 0.35703
Score for group ['state_id', 'dept_id']: 0.43278
Score for group ['store_id', 'cat_id']: 0.45556
Score for group ['store_id', 'dept_id']: 0.53982
Score for group ['item_id', 'state_id']: 0.80873
Score for group ['item_id', 'store_id']: 0.81651

Public LB Score: 0.47507
Public LB Rank: 1053


In [9]:
## evaluating submission from public kernel M5 - Witch Time
## from https://www.kaggle.com/kyakovlev/m5-witch-time
preds_valid = pd.read_csv("../input/m5-witch-time/submission.csv")
preds_valid = preds_valid[preds_valid.id.str.contains("validation")]
preds_valid = preds_valid.merge(df_sample_submission[["id", "order"]], on = "id").sort_values("order").drop(["id", "order"], axis = 1).reset_index(drop = True)
## F1..F28 -> d_1914..d_1941
preds_valid = preds_valid.rename(columns = {f"F{i}": f"d_{1913 + i}" for i in range(1, 29)})

groups, scores = evaluator.score(preds_valid)

score_public_lb = np.mean(scores)
score_public_rank = get_lb_rank(score_public_lb)

for i in range(len(groups)):
    print(f"Score for group {groups[i]}: {round(scores[i], 5)}")

print(f"\nPublic LB Score: {round(score_public_lb, 5)}")
print(f"Public LB Rank: {score_public_rank}")


Score for group all_id: 0.17417
Score for group cat_id: 0.23089
Score for group state_id: 0.27553
Score for group dept_id: 0.31848
Score for group store_id: 0.38106
Score for group item_id: 0.79245
Score for group ['state_id', 'cat_id']: 0.34073
Score for group ['state_id', 'dept_id']: 0.42337
Score for group ['store_id', 'cat_id']: 0.44684
Score for group ['store_id', 'dept_id']: 0.53514
Score for group ['item_id', 'state_id']: 0.80796
Score for group ['item_id', 'store_id']: 0.81658

Public LB Score: 0.46193
Public LB Rank: 420


In [10]:
!ls /kaggle/input/m5-submissions/

submission_giba.csv	submission_giba_5.csv  submission_giba_8.csv
submission_giba_1.csv	submission_giba_6.csv  submission_giba_9.csv
submission_giba_10.csv	submission_giba_7.csv


In [11]:
preds_valid = pd.read_csv("/kaggle/input/m5-submissions/submission_giba_10.csv")
#preds_valid.iloc[:, -28:] = preds_valid.iloc[:, -28:].round()
preds_valid.iloc[:, -28:] = preds_valid.iloc[:, -28:]*1.35

preds_valid = preds_valid[preds_valid.id.str.contains("validation")]
preds_valid = preds_valid.merge(df_sample_submission[["id", "order"]], on = "id").sort_values("order").drop(["id", "order"], axis = 1).reset_index(drop = True)

## F1..F28 -> d_1914..d_1941
preds_valid = preds_valid.rename(columns = {f"F{i}": f"d_{1913 + i}" for i in range(1, 29)})


groups, scores = evaluator.score(preds_valid)

score_public_lb = np.mean(scores)
score_public_rank = get_lb_rank(score_public_lb)

for i in range(len(groups)):
    print(f"Score for group {groups[i]}: {round(scores[i], 5)}")

print(f"\nPublic LB Score: {round(score_public_lb, 5)}")
print(f"Public LB Rank: {score_public_rank}")

Score for group all_id: 0.57063
Score for group cat_id: 0.97258
Score for group state_id: 0.5953
Score for group dept_id: 1.21698
Score for group store_id: 0.72642
Score for group item_id: 1.04492
Score for group ['state_id', 'cat_id']: 0.94703
Score for group ['state_id', 'dept_id']: 1.15958
Score for group ['store_id', 'cat_id']: 0.96669
Score for group ['store_id', 'dept_id']: 1.12071
Score for group ['item_id', 'state_id']: 0.93867
Score for group ['item_id', 'store_id']: 0.85959

Public LB Score: 0.92659
Public LB Rank: 3625


In [12]:
## Top Score: 0.42714
print(get_lb_rank(0.42713))
print(get_lb_rank(0.42714))
print(get_lb_rank(0.42715))


1
2
2


Whenever your score exactly matches a score already on the public LB, you are ranked below the existing entry.
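
The tie rule follows directly from counting scores that are less than or equal to yours. A small sketch with a hypothetical in-memory leaderboard (the scores below are invented, apart from the top score quoted above, and `rank_against` is an illustrative name):

```python
import pandas as pd

# Hypothetical leaderboard scores (lower is better); not the real public LB
lb_scores = pd.Series([0.42714, 0.45000, 0.45000, 0.50000])

def rank_against(scores: pd.Series, score: float) -> int:
    """Rank = 1 + number of existing scores that are <= `score`."""
    return int((scores <= score).sum()) + 1

print(rank_against(lb_scores, 0.42713))  # better than every entry
print(rank_against(lb_scores, 0.42714))  # ties the top entry, so placed below it
print(rank_against(lb_scores, 0.45000))  # ties two entries, so placed below both
```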

## Notes
* Merging with the sample submission and sorting are not required if the predictions are already in sample-submission order.
* Renaming the columns is not required if the predictions already have the columns *d_1914 to d_1941*.
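
The preparation steps repeated in the cells above (filter validation rows, restore sample-submission order, rename columns) could be wrapped in one helper. This is a sketch only: `to_validation_frame` and its parameters are hypothetical names, and the toy frames use a 2-day horizon to stay readable.

```python
import pandas as pd

def to_validation_frame(submission, sample_submission, first_day=1914, horizon=28):
    """Hypothetical helper: keep validation rows, restore sample order, rename F* to d_*."""
    order = sample_submission[["id"]].copy()
    order["order"] = range(len(order))
    out = submission[submission["id"].str.contains("validation")]
    out = (
        out.merge(order, on="id")
        .sort_values("order")
        .drop(columns=["id", "order"])
        .reset_index(drop=True)
    )
    mapping = {f"F{i}": f"d_{first_day + i - 1}" for i in range(1, horizon + 1)}
    return out.rename(columns=mapping)

# Toy data: the submission rows are deliberately shuffled
sample = pd.DataFrame({"id": ["A_validation", "B_validation", "A_evaluation"], "F1": 0, "F2": 0})
sub = pd.DataFrame({
    "id": ["B_validation", "A_evaluation", "A_validation"],
    "F1": [2.0, 9.0, 1.0],
    "F2": [20.0, 90.0, 10.0],
})

result = to_validation_frame(sub, sample, horizon=2)
print(result)
```

The returned frame has no `id` column and one row per validation series in sample-submission order, which is the shape that `evaluator.score` expects in the cells above.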
