# Channel attribution modelling using an LSTM neural network

Using the [Criteo Attribution Modeling for Bidding Dataset](https://ailab.criteo.com/criteo-attribution-modeling-bidding-dataset/) (the link gives a full description of the dataset, related papers and download instructions), we apply an ML-based attribution strategy in which we:
1. Learn channel attribution weights using a long short-term memory (LSTM) recurrent neural network
2. Optimise budget allocation using the estimated weights
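
Step 2 can take many forms. As a minimal sketch, assuming the simplest possible rule (split a fixed budget in proportion to each channel's estimated attribution weight), the allocation could look like the block below. The helper `allocate_budget` and the example weights are hypothetical; they come neither from the dataset nor from the trained model.

```python
import numpy as np

def allocate_budget(channel_weights, total_budget):
    """Split total_budget across channels in proportion to their attribution weights.

    channel_weights: dict mapping a channel (e.g. a campaign id) to a non-negative weight.
    Proportional allocation is an illustrative assumption, not the notebook's method.
    """
    channels = list(channel_weights)
    w = np.array([channel_weights[c] for c in channels], dtype=float)
    if np.any(w < 0):
        raise ValueError("attribution weights must be non-negative")
    if w.sum() == 0:
        # No signal at all: fall back to an even split across channels
        w = np.ones_like(w)
    shares = w / w.sum()
    return {c: float(total_budget * s) for c, s in zip(channels, shares)}

# Hypothetical weights for three channels
allocation = allocate_budget({"A": 0.5, "B": 0.3, "C": 0.2}, total_budget=1000.0)
print(allocation)
```

Proportional splitting ignores diminishing returns per channel; it is only meant to show how the weights from step 1 feed into step 2.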

This project is heavily inspired by the excellent [notebook](https://github.com/ikatsov/tensor-house/blob/master/promotions/channel-attribution-lstm.ipynb) shared by [Ilya Katsov](https://github.com/ikatsov). The main difference in our project is that we implement the LSTM model with attention in PyTorch.
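
A common way to put attention on top of an LSTM is additive attention pooling. Each hidden state receives a score, the scores are softmax-normalised over the time steps, and the normalised weights both pool the sequence into one vector and can be read as per-touchpoint credit. The NumPy sketch below shows only this pooling step. It uses random stand-ins for the LSTM outputs and for the learned parameters `w`, `b` and `u` (all hypothetical), so it is an illustration of the idea rather than the project's PyTorch model.

```python
import numpy as np

def attention_pool(hidden_states, w, b, u, mask=None):
    """Additive attention over a sequence of LSTM hidden states.

    hidden_states: (T, d) array, one row per touchpoint (stand-in for LSTM outputs).
    w: (d, k) projection, b: (k,) bias, u: (k,) context vector; these are
    hypothetical learned parameters, filled with random values below.
    mask: optional (T,) boolean array, False for padded positions.
    Returns the attention weights (T,) and the pooled context vector (d,).
    """
    scores = np.tanh(hidden_states @ w + b) @ u          # (T,) unnormalised scores
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)          # padded steps get zero weight
    scores = scores - scores.max()                        # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()                     # softmax over time steps
    context = weights @ hidden_states                     # weighted sum of hidden states
    return weights, context

rng = np.random.default_rng(0)
T, d, k = 5, 8, 4
H = rng.normal(size=(T, d))
alpha, ctx = attention_pool(H, rng.normal(size=(d, k)), np.zeros(k), rng.normal(size=k),
                            mask=np.array([True, True, True, True, False]))
print(alpha.round(3), ctx.shape)
```

In a trained model, averaging `alpha` over all converting journeys per channel is one way to obtain channel-level weights; here the values are random and carry no meaning.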

## Import libraries

```python
import pandas as pd
import numpy as np

# Machine learning utilities
from sklearn.model_selection import train_test_split

# Plots
import matplotlib.pyplot as plt
import seaborn as sns
```

## Load and prepare data

```python
def print_key_counts(df):
    """Print the shape of df and its numbers of unique users and campaigns."""
    print(f"Dataframe shape: {df.shape}")
    print(f"Number of unique users: {df.uid.nunique()}")
    print(f"Number of unique campaigns: {df.campaign.nunique()}")

def sample_campaigns(df, n):
    """Return only the rows of df that belong to n randomly chosen campaigns."""
    campaigns = np.random.choice(df['campaign'].unique(), n, replace=False)
    return df[df['campaign'].isin(campaigns)]
```
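
`sample_campaigns` draws from NumPy's global random state, so each run of the notebook can select a different set of campaigns. A minimal sketch of a reproducible variant follows. It assumes a hypothetical `sample_campaigns_seeded` helper and a toy frame, and passes an explicit seed to `np.random.default_rng`.

```python
import numpy as np
import pandas as pd

def sample_campaigns_seeded(df, n, seed=None):
    """Like sample_campaigns, but draws campaigns from a seeded Generator
    so the same subset can be reproduced across runs (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    campaigns = rng.choice(df["campaign"].unique(), size=n, replace=False)
    return df[df["campaign"].isin(campaigns)]

# Toy frame with four campaigns
toy = pd.DataFrame({"uid": [1, 1, 2, 3, 4, 5],
                    "campaign": [10, 20, 10, 30, 40, 20]})
a = sample_campaigns_seeded(toy, n=2, seed=42)
b = sample_campaigns_seeded(toy, n=2, seed=42)
print(a.equals(b), a["campaign"].nunique())
```

With the same seed both calls select the same two campaigns, so this prints `True 2`.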

```python
# Path to the downloaded dataset (adjust to where the file was saved)
DATA_FILE = 'criteo_attribution_dataset.tsv.gz'
raw_df = pd.read_csv(DATA_FILE, sep='\t', compression='gzip')
print_key_counts(raw_df)
raw_df.head()
```

```
Dataframe shape: (16468027, 22)
Number of unique users: 6142256
Number of unique campaigns: 675
```


| | timestamp | uid | campaign | conversion | conversion_timestamp | conversion_id | attribution | click | click_pos | click_nb | ... | time_since_last_click | cat1 | cat2 | cat3 | cat4 | cat5 | cat6 | cat7 | cat8 | cat9 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 20073966 | 22589171 | 0 | -1 | -1 | 0 | 0 | -1 | -1 | ... | -1 | 5824233 | 9312274 | 3490278 | 29196072 | 11409686 | 1973606 | 25162884 | 29196072 | 29196072 |
| 1 | 2 | 24607497 | 884761 | 0 | -1 | -1 | 0 | 0 | -1 | -1 | ... | 423858 | 30763035 | 9312274 | 14584482 | 29196072 | 11409686 | 1973606 | 22644417 | 9312274 | 21091111 |
| 2 | 2 | 28474333 | 18975823 | 0 | -1 | -1 | 0 | 0 | -1 | -1 | ... | 8879 | 138937 | 9312274 | 10769841 | 29196072 | 5824237 | 138937 | 1795451 | 29196072 | 15351056 |
| 3 | 3 | 7306395 | 29427842 | 1 | 1449193 | 3063962 | 0 | 1 | 0 | 7 | ... | -1 | 28928366 | 26597095 | 12435261 | 23549932 | 5824237 | 1973606 | 9180723 | 29841067 | 29196072 |
| 4 | 3 | 25357769 | 13365547 | 0 | -1 | -1 | 0 | 0 | -1 | -1 | ... | -1 | 138937 | 26597094 | 31616034 | 29196072 | 11409684 | 26597096 | 4480345 | 29196072 | 29196072 |
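
In the rows above, `-1` appears wherever a field does not apply: `conversion_timestamp` and `conversion_id` on rows with `conversion == 0`, and `click_pos` on rows with `click == 0`. Assuming `-1` is a "not applicable" sentinel in these columns (an interpretation of this sample, not something the notebook documents), the sketch below makes the missing values explicit and sanity-checks them on a few columns copied from the five rows shown. The helper `sentinel_to_nan` is hypothetical.

```python
import numpy as np
import pandas as pd

# A few columns copied from the five rows shown above
rows = pd.DataFrame({
    "conversion":           [0, 0, 0, 1, 0],
    "conversion_timestamp": [-1, -1, -1, 1449193, -1],
    "conversion_id":        [-1, -1, -1, 3063962, -1],
    "click":                [0, 0, 0, 1, 0],
    "click_pos":            [-1, -1, -1, 0, -1],
})

def sentinel_to_nan(df, cols, sentinel=-1):
    """Return a copy of df with the sentinel value replaced by NaN in cols (hypothetical helper)."""
    out = df.copy()
    out[cols] = out[cols].replace(sentinel, np.nan)
    return out

clean = sentinel_to_nan(rows, ["conversion_timestamp", "conversion_id", "click_pos"])
# Sanity check: a conversion id is present exactly when conversion == 1
consistent = (clean["conversion_id"].notna() == (clean["conversion"] == 1)).all()
print(consistent)
```

Note that `click_pos == 0` is a real position (the first click), which is why only `-1` is replaced.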


```python
# Keep 10 randomly chosen campaigns and free the memory used by the full dataset
sample_df = sample_campaigns(raw_df, n=10)
print_key_counts(sample_df)
del raw_df
```

```
Dataframe shape: (503090, 22)
Number of unique users: 205835
Number of unique campaigns: 10
```
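
Before an LSTM can be trained, the impression rows in `sample_df` have to become fixed-length sequences of touchpoints with a label. A minimal sketch under stated assumptions follows: a journey is every impression of one `uid` ordered by `timestamp`, it is labelled 1 if any of its impressions has `conversion == 1`, and it is left-padded with 0 up to a hypothetical `max_len`. The helper `build_journeys` and the toy frame are illustrative only.

```python
import numpy as np
import pandas as pd

def build_journeys(df, max_len=10):
    """Turn impression rows into padded per-user campaign sequences (hypothetical helper).

    Assumptions: a journey is all impressions of one uid ordered by timestamp,
    truncated to the last max_len touchpoints; the label is 1 if any impression
    in the journey has conversion == 1. Campaign ids are re-indexed to 1..C so
    that 0 can be used as padding.
    """
    campaign_index = {c: i + 1 for i, c in enumerate(sorted(df["campaign"].unique()))}
    ordered = df.sort_values(["uid", "timestamp"])
    seqs, labels = [], []
    for _, g in ordered.groupby("uid", sort=False):
        ids = [campaign_index[c] for c in g["campaign"]][-max_len:]
        seqs.append([0] * (max_len - len(ids)) + ids)   # left-pad with 0
        labels.append(int(g["conversion"].max()))
    return np.array(seqs, dtype=np.int64), np.array(labels), campaign_index

toy = pd.DataFrame({
    "uid":        [7, 7, 7, 9, 9],
    "timestamp":  [30, 10, 20, 5, 6],
    "campaign":   [111, 222, 111, 333, 222],
    "conversion": [1, 0, 0, 0, 0],
})
X, y, index = build_journeys(toy, max_len=4)
print(X)
print(y)
```

On the toy frame this gives `X = [[0, 2, 1, 1], [0, 0, 3, 2]]` and `y = [1, 0]`. A Python-level loop over `groupby` is easy to read but slow for the roughly 206k users in `sample_df`; a vectorised approach may be preferable at that scale.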
