# Bi-linear interaction model with group-by augmentations

## Prepare dataset for training

Following the same steps as in [the training of a simple model](https://github.com/tinyclues/recsys-multi-atrribute-benchmark/blob/master/training/movielens%20simple%20model.ipynb), we first load the split dataset generated in [the preprocessing notebook](https://github.com/tinyclues/recsys-multi-atrribute-benchmark/blob/master/dataset_preprocessing/movielens%20with%20imdb.ipynb).

In [1]:
DATASET = 'movielens_imdb'

In [2]:
from utils import load_dataset

datasets = {}
for split_name in ['train', 'val', 'test']:
    datasets[split_name] = load_dataset(DATASET, split_name)

Then we parse the feature names to obtain a list of offer features (used to model a film) and a list of user features (the user's history aggregated up to the chosen date).

In [3]:
from utils import AGG_PREFIX

all_columns = list(datasets['train'].element_spec.keys())
technical_columns = ['userId', 'date']
user_features = list(filter(lambda x: x.startswith(AGG_PREFIX), all_columns))
offer_features = list(filter(lambda x: x not in user_features + technical_columns, all_columns))

### Rebatching datasets

We split the dataset into smaller batches in the same way as described in [the training of a simple model](https://github.com/tinyclues/recsys-multi-atrribute-benchmark/blob/master/training/movielens%20simple%20model.ipynb).

In [4]:
%%time

from functools import partial
from uuid import uuid4

from utils import rebatch_by_events

datasets['train'] = rebatch_by_events(datasets['train'], batch_size=10080, date_column='date', nb_events_by_user_by_day=8)
for key in ['val', 'test']:
    datasets[key] = rebatch_by_events(datasets[key], batch_size=50400, date_column='date', nb_events_by_user_by_day=8,
                                      seed=1729).cache(f'/tmp/{uuid4()}.tf')

CPU times: user 41.2 s, sys: 6.32 s, total: 47.5 s
Wall time: 35.3 s


In [5]:
from utils import add_equal_weights

for key in datasets:
    datasets[key] = datasets[key].map(partial(add_equal_weights, features=offer_features))

## Define the model

First, we need the number of distinct values (modalities) each input can take, which we read from the saved vectorizers. These sizes are used to define the embedding layers:

In [6]:
from utils import load_inverse_lookups
inverse_lookups = load_inverse_lookups(DATASET)

In [7]:
import re

vocabulary_sizes = {}

for feature in offer_features:
    vocabulary_sizes[feature] = inverse_lookups[feature].vocabulary_size()

for feature in user_features:
    for key in inverse_lookups:
        pattern = re.compile(r"{}(\w+)_{}".format(AGG_PREFIX, key))
        if pattern.match(feature):
            vocabulary_sizes[feature] = vocabulary_sizes[key]
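
The name matching above can be checked in isolation. The sketch below is only an illustration: the prefix value `"aggregated_"` and the feature names are hypothetical (the real `AGG_PREFIX` comes from `utils`), and `matching_offer_features` is a helper introduced here, not part of the repository.

```python
import re

AGG_PREFIX = "aggregated_"  # hypothetical value; the real one is defined in utils


def matching_offer_features(user_feature, offer_feature_names, prefix=AGG_PREFIX):
    """Return every offer feature whose name matches '<prefix><anything>_<name>'."""
    matches = []
    for key in offer_feature_names:
        # same pattern as in the cell above
        pattern = re.compile(r"{}(\w+)_{}".format(prefix, key))
        if pattern.match(user_feature):
            matches.append(key)
    return matches


# hypothetical feature names, for illustration only
print(matching_offer_features("aggregated_last_month_genre", ["genre", "director"]))
print(matching_offer_features("aggregated_all_sub_genre", ["genre", "sub_genre"]))
```

The second call returns both names: when one offer feature name is a suffix of another, several keys match, and in the loop above the last matching key in iteration order decides the vocabulary size.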

### Layers definitions

To define a model with group-by augmentations, we need to create several layers described in [this blog post](https://medium.com/p/508d5080c0c6/). In all the diagrams below, the weights learned during training are shown in red.

#### Generation of group-by augmentations

To build group-by augmentations, we first randomly choose a few offer features. Group-by keys are then formed from AND and OR combinations of the values of the chosen features; this is implemented in the `KeyGenerator` layer. Once we have the keys, we group by them and compute the mean and variance of the embedding vectors of the other features within each group. Finally, we broadcast the mean and variance vectors back to the original batch size. Both the computation and the broadcast are implemented in the `GroupBy` layer.

<img src="resources/group_by_augmentation.png" alt="group-by augmentation generation" width="800" />

In [8]:
import tensorflow as tf
from layers import KeyGenerator, GroupBy

In [9]:
test_key_generator = KeyGenerator(number_of_offer_attributes=len(offer_features),
                                  average_number_of_attributes_in_key=2,
                                  name='test_key_generator')

In [10]:
# sampling randomly values for offer features
test_offer_features = tf.random.uniform((10, len(offer_features)), maxval=5, dtype=tf.int32)
test_offer_features

<tf.Tensor: shape=(10, 7), dtype=int32, numpy=
array([[4, 1, 3, 1, 1, 4, 1],
       [1, 0, 0, 2, 1, 4, 0],
       [3, 1, 2, 0, 0, 1, 3],
       [1, 1, 4, 4, 0, 4, 3],
       [4, 3, 0, 1, 4, 1, 3],
       [3, 3, 3, 2, 4, 4, 2],
       [0, 1, 4, 0, 2, 3, 4],
       [1, 4, 1, 1, 0, 1, 4],
       [0, 0, 4, 3, 2, 1, 2],
       [3, 4, 2, 2, 0, 0, 1]], dtype=int32)>

In [11]:
# key generator returns hashed keys for group by
test_keys = test_key_generator(test_offer_features)
test_keys

<tf.Tensor: shape=(10,), dtype=int32, numpy=array([0, 1, 2, 3, 0, 4, 5, 6, 7, 4], dtype=int32)>
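
To make the idea of a group-by key more concrete, here is a simplified NumPy sketch. It is an assumption-laden illustration and not the `KeyGenerator` implementation: it only covers AND combinations (OR combinations are left out), it selects each attribute independently with probability `average / n_attributes`, and the function name `sample_and_keys` is hypothetical.

```python
import numpy as np


def sample_and_keys(offer_features, average_number_of_attributes_in_key, rng):
    """Pick random attributes and give rows with equal selected values the same key."""
    offer_features = np.asarray(offer_features)
    n_attrs = offer_features.shape[1]
    # each attribute is kept independently, so about `average` attributes are kept
    p = min(1.0, average_number_of_attributes_in_key / n_attrs)
    selected = rng.random(n_attrs) < p
    if not selected.any():
        # make sure at least one attribute takes part in the key
        selected[rng.integers(n_attrs)] = True
    # AND combination: two rows share a key only if all selected values agree
    _, keys = np.unique(offer_features[:, selected], axis=0, return_inverse=True)
    return selected, keys.reshape(-1)


rng = np.random.default_rng(0)
features = rng.integers(0, 5, size=(10, 7))
selected, keys = sample_and_keys(features, 2, rng)
print(selected)
print(keys)
```

Unlike the layer output above, the keys here are numbered in sorted order of the selected values rather than hashed, which does not matter for the group-by step.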

In [12]:
group_by = GroupBy(name='group_by')

In [13]:
# if we have some embeddings vectors
test_embeddings = tf.random.normal((10, 3))

In [14]:
# we can apply group-by operation for all features
test_mean, test_var = group_by(test_keys, test_embeddings)
test_mean.shape, test_var.shape

(TensorShape([10, 3]), TensorShape([10, 3]))

In [15]:
# or more direct example of group by
import numpy as np
group_by([0, 0, 1], np.eye(3))

(<tf.Tensor: shape=(3, 3), dtype=float64, numpy=
 array([[0.5, 0.5, 0. ],
        [0.5, 0.5, 0. ],
        [0. , 0. , 1. ]])>,
 <tf.Tensor: shape=(3, 3), dtype=float64, numpy=
 array([[0.25, 0.25, 0.  ],
        [0.25, 0.25, 0.  ],
        [0.  , 0.  , 0.  ]])>)
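
The group-by-and-broadcast step itself can be reproduced in a few lines of NumPy. This is a minimal sketch under the assumption of 2-D embeddings of shape `(batch, dim)`; `group_by_mean_var` is a hypothetical helper, not the `GroupBy` layer, which may be implemented differently.

```python
import numpy as np


def group_by_mean_var(keys, embeddings):
    """Per-group mean and variance, broadcast back to one row per input row."""
    keys = np.asarray(keys)
    embeddings = np.asarray(embeddings, dtype=np.float64)
    # map arbitrary keys to dense group ids 0..n_groups-1
    _, group_ids = np.unique(keys, return_inverse=True)
    group_ids = group_ids.reshape(-1)
    n_groups = group_ids.max() + 1
    counts = np.bincount(group_ids, minlength=n_groups)[:, None]
    # accumulate first and second moments per group
    sums = np.zeros((n_groups, embeddings.shape[1]))
    sq_sums = np.zeros_like(sums)
    np.add.at(sums, group_ids, embeddings)
    np.add.at(sq_sums, group_ids, embeddings ** 2)
    mean = sums / counts
    var = sq_sums / counts - mean ** 2
    # broadcast group statistics back to the original batch size
    return mean[group_ids], var[group_ids]


mean, var = group_by_mean_var([0, 0, 1], np.eye(3))
print(mean)
print(var)
```

On the toy input it gives the same mean and variance as the cell above.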

#### Compression of user features

To make the interaction cheaper to compute, we reduce the number of user features. To do so, we generate meta features with a sequence of fully connected layers based on the `tf.keras.layers.experimental.EinsumDense` layer.

<img src="resources/user_features_compression.png" alt="compression of user features" width="800" />

In [16]:
from layers import UserFeaturesCompressor
test_user_compressor = UserFeaturesCompressor(number_of_meta_features=2,
                                              dropout_rate=0.1,
                                              name='test_user_compressor')
test_user_compressor(tf.random.normal((10, 3, 7))).shape

TensorShape([10, 2, 7])
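
As a rough illustration of what one such fully connected step does to the shapes, the NumPy sketch below mixes the feature axis with a single weight matrix. It is an assumption that a single linear map is representative: the actual `UserFeaturesCompressor` stacks several layers and applies dropout, and `compress_features` is a hypothetical name.

```python
import numpy as np


def compress_features(embeddings, mixing_weights):
    """Mix `n_features` embeddings into `n_meta` meta features, keeping the embedding axis."""
    # b: batch, f: input features, d: embedding dimension, m: meta features
    return np.einsum("bfd,fm->bmd", embeddings, mixing_weights)


rng = np.random.default_rng(0)
user_embeddings = rng.normal(size=(10, 3, 7))  # same shape as the test input above
mixing_weights = rng.normal(size=(3, 2))       # plays the role of the learned weights
print(compress_features(user_embeddings, mixing_weights).shape)  # (10, 2, 7)
```

Each meta feature is a learned linear combination of the input feature embeddings, so the embedding dimension is untouched and only the number of features shrinks.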

#### Compression of offer features and MaskNet

This is the key layer: it creates offer meta features and applies an instance-guided mask over the embedding dimension. For the meta features, the idea is the same as on the user side: we want fewer features before the interaction. Here, however, the variance information lets us completely deactivate some features, depending on the offer we want to predict.

<img src="resources/offer_features_compression.png" alt="compression of offer features" width="800" />

In [17]:
from layers import OfferFeaturesCompressor
test_offer_compressor = OfferFeaturesCompressor(number_of_meta_features=2,
                                                dropout_rate=0.1,
                                                name='test_offer_compressor')
test_offer_compressor([tf.random.normal((10, 3, 7)), tf.random.normal((10, 3, 7))]).shape

TensorShape([10, 2, 7])

In [18]:
from layers import MaskNet
test_mask = MaskNet(number_of_meta_features=2, dropout_rate=0.1)
test_mask([tf.random.normal((10, 3, 7)), tf.random.normal((10, 3, 7))]).shape

TensorShape([10, 2, 7])
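
The sketch below shows one possible shape of an instance-guided mask in NumPy. Everything about its internals is an assumption made for illustration: the real `OfferFeaturesCompressor` and `MaskNet` layers may combine mean and variance differently, and the weight names and the sigmoid gate are hypothetical choices.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def masked_offer_meta_features(mean, variance, w_compress, w_mask_mean, w_mask_var):
    """Compress offer features, then gate every embedding coordinate with a mask in (0, 1)."""
    # meta features: linear mix over the feature axis (assumed, simplified)
    compressed = np.einsum("bfd,fm->bmd", mean, w_compress)
    # the mask is computed from the instance's own mean and variance embeddings
    mask_logits = (np.einsum("bfd,fm->bmd", mean, w_mask_mean)
                   + np.einsum("bfd,fm->bmd", variance, w_mask_var))
    mask = sigmoid(mask_logits)
    # element-wise product, like the Multiply layer used when applying a mask
    return compressed * mask


rng = np.random.default_rng(0)
mean = rng.normal(size=(10, 3, 7))
variance = rng.normal(size=(10, 3, 7)) ** 2
weights = [rng.normal(size=(3, 2)) for _ in range(3)]
print(masked_offer_meta_features(mean, variance, *weights).shape)  # (10, 2, 7)
```

Because the mask depends on the input row, coordinates can be pushed towards zero for one offer and left almost unchanged for another.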

#### Bi-linear feature-wise interaction

The last step computes the interaction, using a bi-linear kernel for each pair of user and offer meta features:

<img src="resources/bi_linear_interaction.png" alt="bi-linear feature wise interaction" width="800" />

This layer also incorporates mini-batch generation of negative examples, in a similar way to that described in [the training of a simple model](https://github.com/tinyclues/recsys-multi-atrribute-benchmark/blob/master/training/movielens%20simple%20model.ipynb).

In [19]:
from layers import BiLinearInteraction
test_interaction = BiLinearInteraction(number_of_negatives=3, dropout_rate=0., name='test_interaction')
test_interaction([tf.random.normal((10, 4, 7)), tf.random.normal((10, 3, 5))], generate_negatives=False).shape

TensorShape([10, 12])
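
The output shape `(10, 12)` comes from 4 user meta features times 3 offer meta features. A minimal NumPy sketch of such a pairwise bi-linear form is given below; it assumes one independent kernel per pair and ignores dropout and regularization, so it should be read as an illustration rather than as the `BiLinearInteraction` implementation.

```python
import numpy as np


def bi_linear_interaction(user, offer, kernels):
    """Score u_i^T W_ij o_j for every pair of user meta feature i and offer meta feature j."""
    # user: (batch, U, d_user), offer: (batch, O, d_offer), kernels: (U, O, d_user, d_offer)
    scores = np.einsum("bud,uode,boe->buo", user, kernels, offer)
    # flatten the (U, O) grid into one vector of pairwise scores per row
    return scores.reshape(scores.shape[0], -1)


rng = np.random.default_rng(0)
user = rng.normal(size=(10, 4, 7))
offer = rng.normal(size=(10, 3, 5))
kernels = rng.normal(size=(4, 3, 7, 5))  # plays the role of the learned kernels
print(bi_linear_interaction(user, offer, kernels).shape)  # (10, 12)
```

Note that the user and offer embeddings may have different dimensions (7 and 5 here), since each kernel is a rectangular matrix.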

In [20]:
test_interaction([tf.random.normal((12, 4, 7)), tf.random.normal((12, 3, 5))], generate_negatives=True).shape

TensorShape([48, 12])
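
With `generate_negatives=True` the batch grows from 12 to 48 rows, that is 12 positives plus 3 negatives for each of them. One common way to obtain such negatives inside a mini-batch is to pair each user with offers taken from other rows. The sketch below does this by rolling the offer tensor; the rolling strategy, the row ordering and the function name are assumptions, not a description of the layer's internals.

```python
import numpy as np


def add_in_batch_negatives(user, offer, number_of_negatives):
    """Append `number_of_negatives` mismatched (user, offer) pairs per positive pair."""
    batch_size = user.shape[0]
    users, offers = [user], [offer]
    for shift in range(1, number_of_negatives + 1):
        users.append(user)
        # pair each user with the offer of another row in the same batch
        offers.append(np.roll(offer, shift, axis=0))
    labels = np.concatenate([np.ones(batch_size),
                             np.zeros(batch_size * number_of_negatives)])
    return np.concatenate(users), np.concatenate(offers), labels


rng = np.random.default_rng(0)
users, offers, labels = add_in_batch_negatives(rng.normal(size=(12, 4, 7)),
                                               rng.normal(size=(12, 3, 5)),
                                               number_of_negatives=3)
print(users.shape, offers.shape, labels.shape)  # (48, 4, 7) (48, 3, 5) (48,)
```

A sampled negative can coincide with an offer the user actually interacted with, and such collisions become more likely as the number of negatives grows.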

### Model architecture

Now we can assemble all these layers into the final model. Note that the offer compression weights and the interaction kernels are shared between the different augmentations we generate.

<img src="resources/group_by_augmentations_model.png" alt="model" width="800" />

### Model parameters

To regularize the model we combine several strategies:
* `weight_decay` in the optimizer (for the L2 penalty)
* an explicit L1 penalty on the embedding layers
* dropout in the fully connected layers (after the interaction and inside the compression)

We use the `AdamW` optimizer and binary cross-entropy (`BCE`) loss, but in some cases it may be interesting to use [`FocalLoss`](https://www.tensorflow.org/addons/api_docs/python/tfa/losses/SigmoidFocalCrossEntropy) (with $\gamma$ between 1.5 and 2.5), which automatically concentrates on harder examples.
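
To see why a focal loss concentrates on harder examples, here is a small NumPy sketch of the sigmoid focal loss computed from logits. It is a simplified illustration: the class-balancing factor $\alpha$ available in the linked implementation is omitted, and `sigmoid_focal_loss` is a name introduced here.

```python
import numpy as np


def sigmoid_focal_loss(labels, logits, gamma=2.0):
    """Binary cross-entropy scaled by (1 - p_t) ** gamma, computed per example."""
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    p = 1.0 / (1.0 + np.exp(-logits))
    # probability assigned to the true class
    p_t = labels * p + (1.0 - labels) * (1.0 - p)
    bce = -np.log(p_t)
    # well-classified examples (p_t close to 1) are strongly down-weighted
    return (1.0 - p_t) ** gamma * bce


labels = [1.0, 1.0]
logits = [4.0, -4.0]  # an easy positive and a hard positive
print(sigmoid_focal_loss(labels, logits, gamma=0.0))  # plain BCE
print(sigmoid_focal_loss(labels, logits, gamma=2.0))
```

With $\gamma = 0$ the loss reduces to plain BCE; increasing $\gamma$ shrinks the contribution of the easy example much more than that of the hard one.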

Some model parameters can be changed and tuned. During our experiments we found the following typical values:

| parameter                                    | description                                      | typical values | comment                                                                                                          |
|----------------------------------------------|--------------------------------------------------|----------------|------------------------------------------------------------------------------------------------------------------|
| batch size, set above                        | batch size                                       | 5k … 20k       | should not be too small, because the group-by is computed on the fly within a batch                              |
| learning rate inside `OPTIMIZER`             | learning rate                                    | 0.001 … 0.005  | usually set to half of the learning rate used in standard training                                               |
| `USER_META_FEATURES`, `OFFER_META_FEATURES`  | number of meta features after compression        | 2 … 6          | prefer bigger values for a larger number of offer features and a complex (non-hierarchical) feature structure    |
| `NB_AUGMENTATIONS`                           | number of augmentations per step                 | 3 … 10         | bigger for a larger number of offer features                                                                     |
| `AVERAGE_NUMBER_OF_FEATURES_IN_AUGMENTATION` | how many offer features form a group-by key      | 1.5 … 3        | bigger for a larger number of offer features                                                                     |
| `EPOCHS`                                     | number of epochs                                 | 2 … 50         | needs to be doubled or tripled compared to standard training                                                     |
| `EMBEDDING_DIM`                              | embedding latent dimension                       | 15 … 60        | usually depends on the amount of data and on the features' modularity                                            |
| `NUMBER_OF_NEGATIVES`                        | number of negative examples                      | 3 … 10         | a bigger number of negative examples may create collisions for higher-level offers                               |

In [21]:
EMBEDDING_DIM = 100
L1_COEFF = 8.5e-7
DROPOUT = 0.17

NB_AUGMENTATIONS = 3
AVERAGE_NUMBER_OF_FEATURES_IN_AUGMENTATION = 2
USER_META_FEATURES = 5
OFFER_META_FEATURES = 3

def REGULARIZER():
    return {'class_name': 'L1L2', 'config': {'l1': L1_COEFF, 'l2': 0.}}

def OUTPUT_DNN():
    return tf.keras.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(80,
                              kernel_regularizer=REGULARIZER(),
                              bias_regularizer=REGULARIZER()),
        tf.keras.layers.Dropout(DROPOUT),
        tf.keras.layers.Activation('gelu'),
        tf.keras.layers.Dense(40,
                              kernel_regularizer=REGULARIZER(),
                              bias_regularizer=REGULARIZER()),
        tf.keras.layers.Dropout(DROPOUT),
        tf.keras.layers.Activation('gelu'),
        tf.keras.layers.Dense(1,
                              kernel_regularizer=REGULARIZER(),
                              bias_regularizer=REGULARIZER()),
    ], name='output_dnn')

EPOCHS = 12

NUMBER_OF_NEGATIVES = 4
LOSS = tf.keras.losses.BinaryCrossentropy(from_logits=True)
AUC_METRIC = tf.keras.metrics.AUC(from_logits=True)

import tensorflow_addons as tfa
OPTIMIZER = tfa.optimizers.AdamW(weight_decay=8.5e-8, learning_rate=0.0008)

### Embeddings

We define embeddings with the same `WeightedEmbeddings` layer as described in [the training of a simple model](https://github.com/tinyclues/recsys-multi-atrribute-benchmark/blob/master/training/movielens%20simple%20model.ipynb), with one addition:
* for offer features, when aggregating a list of embedding vectors, we compute the variance in addition to the mean vector

This is easy to do in the same sparse-dense matrix multiplication as the mean calculation: we compute the second moment and then derive the variance from it.
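
The identity behind this is $\mathrm{Var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2$. The NumPy sketch below shows it with a dense weight matrix; the real layer works on sparse inputs, and `weighted_mean_and_variance` is a hypothetical helper used only for illustration.

```python
import numpy as np


def weighted_mean_and_variance(weights, embedding_table):
    """Weighted mean and variance of embedding rows, using only matrix products."""
    weights = np.asarray(weights, dtype=np.float64)
    # normalize so that the weights of each row sum to one
    weights = weights / weights.sum(axis=1, keepdims=True)
    mean = weights @ embedding_table
    # the second moment reuses the same product with squared embeddings
    second_moment = weights @ embedding_table ** 2
    return mean, second_moment - mean ** 2


# one row that averages the first two of three one-hot "embeddings"
mean, variance = weighted_mean_and_variance([[1.0, 1.0, 0.0]], np.eye(3))
print(mean)      # [[0.5 0.5 0. ]]
print(variance)  # [[0.25 0.25 0.  ]]
```

The only extra cost compared to the mean is one more product against the element-wise squared embedding table.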

In [22]:
from layers import get_input_layer, WeightedEmbeddings
from utils import WEIGHT_SUFFIX

inputs = {}
embedded_user_features, embedded_offer_features, variance_offer_features = {}, {}, {}
for feature in user_features:
    inputs[feature] = get_input_layer(feature)
    emb_layer = WeightedEmbeddings(vocabulary_sizes[feature],
                                   EMBEDDING_DIM, name=f'{feature}_embedding',
                                   embeddings_regularizer=REGULARIZER())
    embedded_user_features[feature] = emb_layer(inputs[feature])
for feature in offer_features:
    # for offer features we need weights:
    # with dummy weights during training, and the ones used for a feature's averaging at inference time
    inputs[f'{feature}_weight'] = get_input_layer(f'{feature}_weight', tf.float32)
    inputs[feature] = get_input_layer(feature)
    emb_layer = WeightedEmbeddings(vocabulary_sizes[feature],
                                   EMBEDDING_DIM, name=f'{feature}_embedding',
                                   embeddings_regularizer=REGULARIZER(),
                                   calculate_variance=True)
    embedded_offer_features[feature], variance_offer_features[feature] =\
        emb_layer(inputs[feature], inputs[f'{feature}_weight'])

### Combining everything into model

Now we can define the model architecture described above on top of the embeddings.

In [23]:
user_stacked = tf.stack(list(embedded_user_features.values()), axis=1)
offer_stacked = tf.stack(list(embedded_offer_features.values()), axis=1)
offer_variance = tf.stack(list(variance_offer_features.values()), axis=1)
stacked_raw_offer_attrs = tf.stack([tf.cast(inp.values, tf.int32) for feature, inp in inputs.items()
                                    if feature in offer_features], axis=1)

Note that we added an intermediate tensor where we stacked all raw offer features - it will be used in `KeyGenerator`:

In [24]:
stacked_raw_offer_attrs

<KerasTensor: shape=(None, 7) dtype=int32 (created by layer 'tf.stack_3')>

In [25]:
key_generator = KeyGenerator(number_of_offer_attributes=len(offer_features),
                             average_number_of_attributes_in_key=AVERAGE_NUMBER_OF_FEATURES_IN_AUGMENTATION,
                             name='grp_key_generator')

user_compressed = UserFeaturesCompressor(USER_META_FEATURES, DROPOUT,
                                         name='user_compressor')(user_stacked)
offer_features_compressor = OfferFeaturesCompressor(OFFER_META_FEATURES, DROPOUT, name='offer_compressor')
mask_net = MaskNet(OFFER_META_FEATURES, DROPOUT, name='mask_generation')
apply_mask = tf.keras.layers.Multiply(name='apply_mask')
bi_linear_interaction = BiLinearInteraction(number_of_negatives=NUMBER_OF_NEGATIVES, dropout_rate=DROPOUT,
                                            initializer='random_normal', regularizer=REGULARIZER(),
                                            name='interaction')
output_dnn = OUTPUT_DNN()

augmentation_predictions = []
for i in range(NB_AUGMENTATIONS):
    group_by_key = key_generator(stacked_raw_offer_attrs)
    mean_offer_emb, variance_offer_emb = group_by(group_by_key, offer_stacked)
    compressed_offer_embeddings = offer_features_compressor([mean_offer_emb, variance_offer_emb])
    mask = mask_net([mean_offer_emb, variance_offer_emb])
    masked_offer_embeddings = apply_mask([compressed_offer_embeddings, mask])
    _output = output_dnn(bi_linear_interaction([user_compressed, masked_offer_embeddings], generate_negatives=True))
    augmentation_predictions.append(_output)
output = tf.concat(augmentation_predictions, axis=1)

For evaluation we don't need to create augmentations: we simply take the mean and variance of the offer features coming from the inputs.

In [26]:
compressed_offer_embeddings = offer_features_compressor([offer_stacked, offer_variance])
mask = mask_net([offer_stacked, offer_variance])
masked_offer_embeddings = apply_mask([compressed_offer_embeddings, mask])

eval_output = output_dnn(bi_linear_interaction([user_compressed, masked_offer_embeddings], generate_negatives=True))

In [27]:
from utils import BroadcastLoss, BroadcastMetric

model = tf.keras.Model(inputs, output, name='group_by_augmentations')
model.compile(optimizer=OPTIMIZER,
              loss=BroadcastLoss(LOSS, NUMBER_OF_NEGATIVES),
              metrics=[BroadcastMetric(AUC_METRIC, NUMBER_OF_NEGATIVES)])

eval_model = tf.keras.Model(inputs, eval_output, name='group_by_augmentations_eval')

In [28]:
tf.keras.utils.plot_model(model, show_shapes=True, show_layer_names=True, to_file=f'models/{DATASET}_group_by_augmentations.png')

You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model/model_to_dot to work.


### Training

In [29]:
model.fit(datasets['train'], epochs=EPOCHS, validation_data=datasets['val'])

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f74421d0fa0>

## Single task models benchmark

Using the same approach as in [the simple model notebook](https://github.com/tinyclues/recsys-multi-atrribute-benchmark/blob/master/training/rees%20simple%20model.ipynb), we compare the model with group-by augmentations against a set of models, each specialized on the task corresponding to a single offer feature. We don't use augmentations in these baseline models, because they are already aligned with the offers used in the evaluation afterwards. To illustrate the importance of augmentations, we also train a single model on all offer features without group-by augmentations.

In [30]:
# offer columns we want to evaluate, specific to dataset we test
TASKS = ['imdbId', 'director', 'genre']

In [31]:
def bi_linear_interaction_model(offer_features, name='bi_linear_model'):
    # user_features, vocabulary_sizes, EMBEDDING_DIM, REGULARIZER, OPTIMIZER,
    # OUTPUT_DNN, LOSS, NUMBER_OF_NEGATIVES
    # come from global scope, but can be passed as params instead
    inputs = {}
    embedded_user_features, embedded_offer_features = {}, {}
    for feature in user_features:
        inputs[feature] = get_input_layer(feature)
        emb_layer = WeightedEmbeddings(vocabulary_sizes[feature],
                                       EMBEDDING_DIM, name=f'{feature}_embedding',
                                       embeddings_regularizer=REGULARIZER())
        embedded_user_features[feature] = emb_layer(inputs[feature])

    # for offer feature we need weights:
    # with dummy weights during training, and the ones used for a feature's averaging at inference time
    for feature in offer_features:
        inputs[feature] = get_input_layer(feature)
        inputs[f'{feature}_weight'] = get_input_layer(f'{feature}_weight', tf.float32)
        emb_layer = WeightedEmbeddings(vocabulary_sizes[feature],
                                       EMBEDDING_DIM, name=f'{feature}_embedding',
                                       embeddings_regularizer=REGULARIZER())
        embedded_offer_features[feature] = emb_layer(inputs[feature], inputs[f'{feature}_weight'])
    
    user_stacked = tf.stack(list(embedded_user_features.values()), axis=1)
    offer_stacked = tf.stack(list(embedded_offer_features.values()), axis=1)
    
    user_compressed = UserFeaturesCompressor(USER_META_FEATURES, DROPOUT,
                                             name='user_compressor')(user_stacked)
    
    output_dnn = OUTPUT_DNN()
    bi_linear_interaction = BiLinearInteraction(number_of_negatives=NUMBER_OF_NEGATIVES, dropout_rate=DROPOUT,
                                                initializer='random_normal', regularizer=REGULARIZER(),
                                                name='interaction')
    
    output = output_dnn(bi_linear_interaction([user_compressed, offer_stacked], generate_negatives=True))

    model = tf.keras.Model(inputs, output, name=name)
    model.compile(optimizer=OPTIMIZER,
                  loss=BroadcastLoss(LOSS, NUMBER_OF_NEGATIVES),
                  metrics=[BroadcastMetric(AUC_METRIC, NUMBER_OF_NEGATIVES)])
    
    return model

In [32]:
model_wo_augmentations = bi_linear_interaction_model(offer_features, name='model_wo_augm')
model_wo_augmentations.fit(datasets['train'], epochs=EPOCHS, validation_data=datasets['val'])

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x7f75575006a0>

In [33]:
mono_feature_models = {}
for task_offer_feature in TASKS:
    mono_feature_models[task_offer_feature] =\
        bi_linear_interaction_model([task_offer_feature], name=f'{task_offer_feature}_model')
    mono_feature_models[task_offer_feature].fit(datasets['train'],
                                                epochs=EPOCHS,
                                                validation_data=datasets['val'])

Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


## Evaluation

In [34]:
%%time
from utils import prepare_single_task_dataset
test_datasets = {}
for task_offer_feature in TASKS:
    test_datasets[task_offer_feature] = \
        prepare_single_task_dataset(datasets['test'], task_offer_feature, offer_features)

CPU times: user 1min 30s, sys: 7.08 s, total: 1min 37s
Wall time: 1min 28s


In [35]:
%%time
from collections import defaultdict
from utils import evaluate_model

aucs = defaultdict(dict)
for task_offer_feature in TASKS:
    kw = {'single_task_feature': task_offer_feature, 'test_datasets': test_datasets,
          'number_of_negatives': NUMBER_OF_NEGATIVES, 'inverse_lookups': inverse_lookups}
    
    aucs[task_offer_feature]['group_by augmentations'] = evaluate_model(eval_model, **kw)
    aucs[task_offer_feature]['without augmentations'] = evaluate_model(model_wo_augmentations, **kw)
    
    for model_name in TASKS:
        aucs[task_offer_feature][f'MONO:{model_name}'] = evaluate_model(mono_feature_models[model_name], **kw)


CPU times: user 21min 19s, sys: 3min 21s, total: 24min 41s
Wall time: 5min 45s


In [36]:
from utils import save_metrics
save_metrics(aucs, DATASET, 'group_by_augmentations')

## Aggregating results

### Popular offers

In [37]:
import pandas as pd
from utils import wAUC

results = pd.DataFrame()
for task_name in aucs:
    for model_name in aucs[task_name]:
        w_auc = wAUC(aucs[task_name][model_name], cutoff_low=200)
        results = pd.concat([results,
                             pd.Series({'wAUC': w_auc, 'offers': task_name, 'model': model_name}).to_frame().T],
                            ignore_index=True)

In [39]:
pd.pivot_table(results, 'wAUC', 'model', 'offers')\
    .rename(columns={'imdbId': 'film'}, index={'MONO:imdbId': 'MONO:film'})\
    .iloc[[3, 4, 2, 0, 1]][['film', 'director', 'genre']]\
    .style.background_gradient(cmap='coolwarm').format(precision=3)

| model                  | film  | director | genre |
|------------------------|-------|----------|-------|
| group_by augmentations | 0.608 | 0.592    | 0.546 |
| without augmentations  | 0.610 | 0.591    | 0.534 |
| MONO:film              | 0.612 | 0.593    | 0.538 |
| MONO:director          | 0.594 | 0.595    | 0.542 |
| MONO:genre             | 0.524 | 0.532    | 0.558 |
