# **FOREWORD**

This kernel probes the public leaderboard to confirm that the **test set currently available is the last 90 days of the training period**. See the data page [here](https://www.kaggle.com/competitions/mitsui-commodity-prediction-challenge/data), where the host writes:

### Dataset Description
This competition dataset consists of multiple financial time series data obtained from markets around the world. The dataset includes various financial instruments such as metals, futures, US stocks, and foreign exchange. Participants are challenged to develop models that predict the returns of multiple target financial time series.

### Competition Phases and Data Updates
The competition will proceed in two phases:

- A model training phase with a test set of roughly three months of historical data. Because these prices are publicly available, leaderboard scores during this phase are not meaningful.
- A forecasting phase with a test set to be collected after submissions close. You should expect this test set to be about the same size as the test set in the first phase.

During the forecasting phase the evaluation API will serve test data from the beginning of the public set to the end of the private set.

### What I do here

I probe the public leaderboard with a modified version of the dummy submission, as follows:
- Try to match the date served by the API against the train labels
- If the date is present, simply borrow the predictions from the train labels (ground truth)
- Otherwise, fall back to the dummy submission

If all the dates served by the API are also present in the train labels, my score will be implausibly high. Otherwise it will match the score of the dummy submission kernel [here](https://www.kaggle.com/code/sohier/mitsui-demo-submission).
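The three steps above can be sketched on toy data as follows. The helper name `lookup_or_dummy`, the toy frame, and the constant fallback values are illustrative assumptions, not part of the competition API.

```python
import numpy as np
import pandas as pd


def lookup_or_dummy(date_id: int, labels: pd.DataFrame, target_cols: list[str]) -> pd.DataFrame:
    """Return one row of predictions for `date_id` (hypothetical helper)."""
    match = labels.loc[labels["date_id"] == date_id, target_cols]
    if len(match) > 0:
        # Date is present in the labels: borrow the ground truth,
        # replacing missing targets with 0.
        return match.iloc[[0]].fillna(0.0).reset_index(drop=True)
    # Date is unseen: fall back to an arbitrary constant row (assumed dummy values).
    dummy = {col: i / 1000 for i, col in enumerate(target_cols)}
    return pd.DataFrame([dummy])


# Toy stand-in for train_labels.csv
toy_labels = pd.DataFrame({
    "date_id": [0, 1, 2],
    "target_0": [0.01, np.nan, -0.02],
    "target_1": [0.03, 0.04, np.nan],
})
cols = ["target_0", "target_1"]

seen = lookup_or_dummy(1, toy_labels, cols)     # ground-truth row, NaN filled with 0
unseen = lookup_or_dummy(99, toy_labels, cols)  # constant fallback row
```

If every served date takes the first branch, the submission is effectively the ground truth, which is what the probe is designed to detect.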

# **IMPORTS**

In [1]:
import pandas as pd, polars as pl, numpy as np
import os
from warnings import filterwarnings 
filterwarnings("ignore")

pd.set_option(
    'display.max_rows' , 30, 
    'display.max_columns' , 35 ,
    'display.max_colwidth',  100,
    'display.precision' , 4,
    'display.float_format', '{:,.4f}'.format
) 

NUM_TARGET_COLUMNS = 424

# **PROBING**

In [2]:
%%time 

train_labels = pd.read_csv(
    "/kaggle/input/mitsui-commodity-prediction-challenge/train_labels.csv"
)

sel_cols = train_labels.columns.tolist()

train_labels["date_id"] = train_labels["date_id"].astype(np.uint16)
display(train_labels.head(10))

Unnamed: 0,date_id,target_0,target_1,target_2,target_3,target_4,target_5,target_6,target_7,target_8,target_9,target_10,target_11,target_12,target_13,target_14,target_15,...,target_407,target_408,target_409,target_410,target_411,target_412,target_413,target_414,target_415,target_416,target_417,target_418,target_419,target_420,target_421,target_422,target_423
0,0,0.0059,-0.0029,-0.0047,-0.0006,,,-0.0067,0.0061,,0.0034,,-0.0057,,0.0003,,-0.0054,...,,-0.0426,-0.013,0.0276,-0.0413,0.0316,,,0.0212,-0.0056,,-0.0046,0.0338,,0.0382,,0.0273
1,1,0.0058,-0.0241,-0.0071,-0.019,-0.0319,-0.0195,0.003,-0.0069,-0.002,0.0213,0.0177,0.0048,0.0105,-0.0183,0.0137,0.0233,...,-0.0187,-0.0226,-0.006,0.0212,-0.0403,0.0294,-0.0065,0.0034,0.0214,-0.0015,0.0128,0.0105,0.0305,-0.0008,0.025,0.0035,0.0209
2,2,0.001,0.0238,-0.0089,-0.0221,,,0.0374,0.0077,,-0.0268,,-0.0021,,0.0294,,0.0107,...,-0.0128,-0.0074,0.0081,0.0134,-0.0902,0.0168,-0.0032,-0.0067,0.0093,0.0019,-0.0128,-0.0023,0.0175,-0.0054,0.0048,-0.0091,0.0017
3,3,0.0017,-0.0246,0.0119,0.0048,,,-0.0125,-0.0169,,0.0148,,0.0045,,-0.0328,,0.0005,...,,0.0288,-0.0157,0.0014,-0.0623,0.0682,,,0.0369,-0.0152,,0.0081,0.0011,,-0.0151,,-0.033
4,4,-0.0033,0.0052,0.0069,0.0133,0.024,0.0107,-0.0116,0.002,0.0039,-0.009,-0.0107,-0.0096,0.0004,0.0154,-0.0074,-0.0191,...,-0.0369,0.0509,0.0314,-0.0061,,,-0.0038,,0.0049,,-0.0067,-0.0161,-0.0049,,,0.0095,
5,5,0.0073,-0.0077,-0.0166,-0.0179,-0.0053,0.0068,0.0026,0.0082,0.0048,0.004,0.0078,0.0039,0.0093,-0.0035,0.002,0.0035,...,-0.0342,0.0079,0.008,0.0228,-0.0025,0.0108,0.0015,0.0103,0.0071,-0.0275,0.0072,-0.0163,0.0218,-0.0068,0.0124,0.0188,-0.0126
6,6,0.0079,-0.0134,-0.0035,0.0183,0.0142,-0.0156,-0.023,-0.0063,0.0065,0.0074,0.0156,0.0125,0.0192,0.0051,0.0198,-0.011,...,-0.0171,0.0136,0.0152,0.0042,0.0335,0.0218,0.0158,0.0058,0.0026,-0.0206,0.0127,0.0004,0.0083,-0.0162,0.0137,0.0129,-0.0068
7,7,,,0.0024,-0.0058,-0.0005,0.0065,0.0145,,,-0.0189,,-0.013,0.0061,0.0162,0.0012,-0.0056,...,-0.0063,-0.001,0.0203,-0.0093,0.0495,-0.0162,0.0056,0.0045,-0.0224,0.0039,0.0079,0.0002,-0.0022,-0.0211,0.0088,0.0049,0.0167
8,8,,,-0.0131,-0.0118,-0.016,-0.002,0.0044,,,0.012,,0.0176,0.0006,-0.0116,0.0026,0.0196,...,-0.0044,-0.0115,-0.007,-0.0005,,,0.0146,,-0.0004,,0.0033,-0.0004,0.0053,,,0.0032,
9,9,0.0079,-0.0106,0.0047,0.0123,0.0064,-0.0121,-0.0041,-0.0214,0.0179,0.0119,0.0021,-0.0014,-0.0039,-0.0044,-0.0082,0.0011,...,-0.0136,0.007,0.0219,-0.0283,0.0395,0.0238,0.013,0.0018,-0.0186,0.0027,0.0178,-0.0147,-0.0256,-0.0248,0.0196,0.0142,-0.0108


CPU times: user 238 ms, sys: 36.3 ms, total: 274 ms
Wall time: 459 ms
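
Looking up a row by label, as in `train_labels.loc[date_id]`, only returns the right row while the default integer index coincides with `date_id`. A minimal sketch of that check on toy data (the frame and variable names are illustrative), together with an explicit `set_index` that removes the dependency on row order:

```python
import pandas as pd

toy = pd.DataFrame({"date_id": [0, 1, 2, 3], "target_0": [0.1, 0.2, 0.3, 0.4]})

# True when row position and date_id agree, so .loc[date_id] is safe.
index_matches = bool((toy.index == toy["date_id"]).all())

# Indexing by date_id explicitly avoids relying on row order.
by_date = toy.set_index("date_id")
value = by_date.loc[2, "target_0"]

# After filtering and resetting the index, the positional assumption breaks.
filtered = toy[toy["date_id"] >= 2].reset_index(drop=True)
still_matches = bool((filtered.index == filtered["date_id"]).all())
```

In the head of the frame printed above, the index and `date_id` columns agree for the first ten rows, but the check is cheap enough to run on the full frame.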


In [3]:
%%time 

import kaggle_evaluation.mitsui_inference_server
NUM_TARGET_COLUMNS = 424


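def predict(
    test: pl.DataFrame,
    label_lags_1_batch: pl.DataFrame,
    label_lags_2_batch: pl.DataFrame,
    label_lags_3_batch: pl.DataFrame,
    label_lags_4_batch: pl.DataFrame,
) -> pl.DataFrame | pd.DataFrame:

    Xtest   = test.to_pandas()
    date_id = Xtest["date_id"][0]

    # Look the served date up by its date_id value rather than by row position
    match = train_labels.loc[train_labels["date_id"] == date_id, sel_cols[1:]]

    if len(match) > 0:
        # Date exists in the train labels: borrow the ground truth (NaN -> 0)
        test_preds  = match.iloc[0].fillna(0).to_dict()
        predictions = pl.DataFrame(test_preds).select(pl.all().cast(pl.Float64))
        print(f"Captured ground truth | {date_id}")
    else:
        # Unseen date: fall back to constant predictions
        # (values assumed to mirror the dummy submission kernel)
        predictions = pl.DataFrame(
            {f"target_{i}": i / 1000 for i in range(NUM_TARGET_COLUMNS)}
        )
        print(f"Dummy submission | {date_id}")

    assert isinstance(predictions, (pd.DataFrame, pl.DataFrame))
    assert len(predictions) == 1
    return predictions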


CPU times: user 174 ms, sys: 35 ms, total: 209 ms
Wall time: 358 ms


In [4]:
%%time 

inference_server = kaggle_evaluation.mitsui_inference_server.MitsuiInferenceServer(predict)

if os.getenv('KAGGLE_IS_COMPETITION_RERUN'):
    inference_server.serve()
else:
    inference_server.run_local_gateway(('/kaggle/input/mitsui-commodity-prediction-challenge/',))

Captured ground truth | 1827
Captured ground truth | 1828
Captured ground truth | 1829
Captured ground truth | 1830
Captured ground truth | 1831
Captured ground truth | 1832
Captured ground truth | 1833
Captured ground truth | 1834
Captured ground truth | 1835
Captured ground truth | 1836
Captured ground truth | 1837
Captured ground truth | 1838
Captured ground truth | 1839
Captured ground truth | 1840
Captured ground truth | 1841
Captured ground truth | 1842
Captured ground truth | 1843
Captured ground truth | 1844
Captured ground truth | 1845
Captured ground truth | 1846
Captured ground truth | 1847
Captured ground truth | 1848
Captured ground truth | 1849
Captured ground truth | 1850
Captured ground truth | 1851
Captured ground truth | 1852
Captured ground truth | 1853
Captured ground truth | 1854
Captured ground truth | 1855
Captured ground truth | 1856
Captured ground truth | 1857
Captured ground truth | 1858
Captured ground truth | 1859
Captured ground truth | 1860
Captured groun