# [OTTO – Multi-Objective Recommender System](https://www.kaggle.com/competitions/otto-recommender-system)

## Many thanks to:
- [0.578 | Ensemble of Public Notebooks](https://www.kaggle.com/code/karakasatarik/0-578-ensemble-of-public-notebooks)
- [💡 [2 methods] How-to ensemble predictions 🏅🏅🏅](https://www.kaggle.com/code/radek1/2-methods-how-to-ensemble-predictions)
- [Candidate ReRank Model - [LB 0.575]](https://www.kaggle.com/code/cdeotte/candidate-rerank-model-lb-0-575)
- [otto-pipeline2 [LB 0.576]](https://www.kaggle.com/code/tuongkhang/otto-pipeline2-lb-0-576)
- [OTTO: Tuning Candidate ReRank Model[LB 0.577]](https://www.kaggle.com/code/utm529fg/otto-tuning-candidate-rerank-model-lb-0-577)

# Loading the data

In [1]:
!pip install polars

Collecting polars
  Downloading polars-0.15.15-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.0 MB)
Installing collected packages: polars
Successfully installed polars-0.15.15

In [2]:
import polars as pl
paths = ['/kaggle/input/0-578-ensemble-of-public-notebooks/submission.csv',  # 0.578
         #'/kaggle/input/candidate-rerank-model-lb-0-575/submission.csv', # 0.575
         '/kaggle/input/otto-pipeline2-lb-0-576/submission.csv', # 0.576
         '/kaggle/input/otto-tuning-candidate-rerank-model-lb-0-577/submission.csv' # 0.577
        ]

In [3]:
def read_sub(path, weight=1): # by default, assign a weight of 1 to the predictions from each submission; this is akin to a standard vote ensemble
    '''a helper function for loading and preprocessing submissions'''
    return (
        pl.read_csv(path)
            .with_column(pl.col('labels').str.split(by=' '))
            .with_column(pl.lit(weight).alias('vote'))
            .explode('labels')
            .rename({'labels': 'aid'})
            .with_column(pl.col('aid').cast(pl.UInt32)) # cast `aid` to `UInt32`: compact dtypes keep memory usage down so we don't run out of resources
            .with_column(pl.col('vote').cast(pl.UInt8))
    )

In [4]:
subs = [read_sub(path) for path in paths]
subs[0].head()

session_type,aid,vote
str,u32,u8
"""14279927_carts...",872695,1
"""14279927_carts...",922440,1
"""14279927_carts...",67054,1
"""14279927_carts...",153333,1
"""14279927_carts...",215472,1

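To make the reshaping done by `read_sub` concrete, here is a minimal plain-Python sketch of the same idea: split each space-separated `labels` string and emit one `(session_type, aid, vote)` row per prediction. The helper name `explode_submission` and the toy rows are hypothetical and only serve as an illustration; the notebook itself does this with polars.

```python
def explode_submission(rows, weight=1):
    """Turn (session_type, 'aid aid ...') rows into one row per predicted aid."""
    exploded = []
    for session_type, labels in rows:
        for aid in labels.split(' '):
            # every prediction carries the weight of the submission it came from
            exploded.append((session_type, int(aid), weight))
    return exploded

# Toy data, not taken from the competition files.
toy_rows = [('1_clicks', '10 20 30'), ('1_carts', '20 40')]
exploded = explode_submission(toy_rows)
for row in exploded:
    print(row)
```

Each submission row with up to 20 labels becomes up to 20 rows, which is why compact dtypes matter for the exploded frame.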

In [5]:
# outer joins keep every (session_type, aid) pair that appears in at least one submission
subs = subs[0].join(subs[1], how='outer', on=['session_type', 'aid']).join(subs[2], how='outer', on=['session_type', 'aid'], suffix='_right2')
subs.head()

session_type,aid,vote,vote_right,vote_right2
str,u32,u8,u8,u8
"""12899779_click...",59625,1,1,1
"""12899779_click...",1253524,1,1,1
"""12899779_click...",737445,1,1,1
"""12899779_click...",438191,1,1,1
"""12899779_click...",731692,1,1,1


In [6]:
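subs = (subs
    .fill_null(0) # an aid missing from a submission contributes 0 votes
    .with_column((pl.col('vote') + pl.col('vote_right') + pl.col('vote_right2')).alias('vote_sum'))
    .drop(['vote', 'vote_right', 'vote_right2'])
    .sort(by='vote_sum') # ascending sort...
    .reverse()           # ...then reverse, so the most-voted aids come first
)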

subs.head()

session_type,aid,vote_sum
str,u32,u8
"""14571581_carts...",1764910,3
"""14571581_carts...",978060,3
"""14571581_carts...",1497245,3
"""14571581_carts...",984794,3
"""14571581_carts...",1072049,3

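The outer joins followed by `fill_null(0)` and the sum amount to counting, for every `(session_type, aid)` pair, how many submissions predicted it. A minimal sketch of that counting in plain Python follows. The function `sum_votes`, its `weights` parameter, and the toy submissions are assumptions made for illustration and are not part of the notebook.

```python
from collections import defaultdict

def sum_votes(submissions, weights=None):
    """Sum the votes per (session_type, aid) across several submissions.

    `submissions` is a list of dicts mapping session_type -> list of aids.
    """
    if weights is None:
        weights = [1] * len(submissions)  # equal weights: a plain vote ensemble
    totals = defaultdict(int)
    for sub, weight in zip(submissions, weights):
        for session_type, aids in sub.items():
            for aid in aids:
                # a pair absent from a submission simply adds nothing (the fill_null(0) case)
                totals[(session_type, aid)] += weight
    return dict(totals)

# Toy submissions, not taken from the competition files.
toy_a = {'1_clicks': [10, 20]}
toy_b = {'1_clicks': [20, 30]}
toy_c = {'1_clicks': [20, 10]}
vote_sum = sum_votes([toy_a, toy_b, toy_c])
print(vote_sum)
```

Passing unequal `weights` would turn this into a weighted vote, which is what the `weight` argument of `read_sub` allows for.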

In [7]:
%%time
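# rows are already sorted by vote_sum (descending), so head(20) keeps the 20 most-voted aids per session
preds = subs.groupby('session_type').agg([
    pl.col('aid').head(20).alias('labels')
])

# turn each list of aids back into the space-separated string the submission format expects
preds = preds.with_column(pl.col('labels').apply(lambda lst: ' '.join([str(aid) for aid in lst])))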

CPU times: user 5min, sys: 8.03 s, total: 5min 8s
Wall time: 4min 44s

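The selection step can be sketched in plain Python as well: rank the pairs by vote count, keep at most `k` aids per session, and join them back into a space-separated string. The name `top_k_labels` and the toy vote counts are hypothetical. Ties here are broken by input order, which is an assumption of this sketch and not necessarily the order the polars pipeline produces.

```python
from collections import defaultdict

def top_k_labels(vote_sum, k=20):
    """Keep the k most-voted aids per session and format them as label strings."""
    # sorted() is stable, so pairs with equal votes keep their input order
    ranked = sorted(vote_sum.items(), key=lambda item: item[1], reverse=True)
    per_session = defaultdict(list)
    for (session_type, aid), _votes in ranked:
        if len(per_session[session_type]) < k:
            per_session[session_type].append(aid)
    return {s: ' '.join(str(a) for a in aids) for s, aids in per_session.items()}

# Toy vote counts, not taken from the competition files.
toy_votes = {
    ('1_clicks', 10): 2,
    ('1_clicks', 20): 3,
    ('1_clicks', 30): 1,
    ('2_carts', 7): 1,
}
labels = top_k_labels(toy_votes, k=2)
print(labels)
```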

In [8]:
preds.write_csv('submission.csv')