# Introduction

This notebook joins the `prepare_data` and `pytorch-bst` notebooks so that the whole pipeline can be run in Google Colab.

# Prepare data section

In [1]:
!pip install pytorch_lightning


In [2]:
import pandas as pd
import torch
import pytorch_lightning as pl
from tqdm import tqdm
import torchmetrics
import math
from urllib.request import urlretrieve
from zipfile import ZipFile
import os
import torch.nn as nn
import numpy as np
from math import sqrt

## Settings

In [3]:
WINDOW_SIZE = 20

## Data

In [4]:
urlretrieve("http://files.grouplens.org/datasets/movielens/ml-1m.zip", "movielens.zip")
ZipFile("movielens.zip", "r").extractall()

In [5]:
# The "::" separator is longer than one character, so pandas needs the Python parsing engine.
users = pd.read_csv(
    "ml-1m/users.dat",
    sep="::",
    names=["user_id", "sex", "age_group", "occupation", "zip_code"],
    engine="python",
)

ratings = pd.read_csv(
    "ml-1m/ratings.dat",
    sep="::",
    names=["user_id", "movie_id", "rating", "unix_timestamp"],
    engine="python",
)

movies = pd.read_csv(
    "ml-1m/movies.dat",
    sep="::",
    names=["movie_id", "title", "genres"],
    encoding="ISO-8859-1",
    engine="python",
)



In [6]:
## Movies
movies["year"] = movies["title"].apply(lambda x: x[-5:-1])
movies.year = pd.Categorical(movies.year)
movies["year"] = movies.year.cat.codes
## Users
users.sex = pd.Categorical(users.sex)
users["sex"] = users.sex.cat.codes


users.age_group = pd.Categorical(users.age_group)
users["age_group"] = users.age_group.cat.codes


users.occupation = pd.Categorical(users.occupation)
users["occupation"] = users.occupation.cat.codes


users.zip_code = pd.Categorical(users.zip_code)
users["zip_code"] = users.zip_code.cat.codes

#Ratings
ratings['unix_timestamp'] = pd.to_datetime(ratings['unix_timestamp'],unit='s')


In [7]:
# Save primary csv's
if not os.path.exists('data'):
    os.makedirs('data')


users.to_csv("data/users.csv",index=False)
movies.to_csv("data/movies.csv",index=False)
ratings.to_csv("data/ratings.csv",index=False)

In [8]:
## Movies
movies["movie_id"] = movies["movie_id"].astype(str)
## Users
users["user_id"] = users["user_id"].astype(str)

##Ratings
ratings["movie_id"] = ratings["movie_id"].astype(str)
ratings["user_id"] = ratings["user_id"].astype(str)

In [10]:
genres = [
    "Action",
    "Adventure",
    "Animation",
    "Children's",
    "Comedy",
    "Crime",
    "Documentary",
    "Drama",
    "Fantasy",
    "Film-Noir",
    "Horror",
    "Musical",
    "Mystery",
    "Romance",
    "Sci-Fi",
    "Thriller",
    "War",
    "Western",
]

for genre in genres:
    movies[genre] = movies["genres"].apply(
        lambda values: int(genre in values.split("|"))
    )
movies.head()

Unnamed: 0,movie_id,title,genres,year,Action,Adventure,Animation,Children's,Comedy,Crime,...,Fantasy,Film-Noir,Horror,Musical,Mystery,Romance,Sci-Fi,Thriller,War,Western
0,1,Toy Story (1995),Animation|Children's|Comedy,75,0,0,1,1,1,0,...,0,0,0,0,0,0,0,0,0,0
1,2,Jumanji (1995),Adventure|Children's|Fantasy,75,0,1,0,1,0,0,...,1,0,0,0,0,0,0,0,0,0
2,3,Grumpier Old Men (1995),Comedy|Romance,75,0,0,0,0,1,0,...,0,0,0,0,0,1,0,0,0,0
3,4,Waiting to Exhale (1995),Comedy|Drama,75,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,5,Father of the Bride Part II (1995),Comedy,75,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


### Transform the movie ratings data into sequences

First, let's sort the ratings data by `unix_timestamp`, and then group the
`movie_id` values and the `rating` values by `user_id`.

The output DataFrame will have a record for each `user_id`, with two ordered lists
(sorted by rating datetime): the movies they have rated, and their ratings of these movies.

In [17]:
ratings_group = ratings.sort_values(by=["unix_timestamp"]).groupby("user_id")

ratings_data = pd.DataFrame(
    data={
        "user_id": list(ratings_group.groups.keys()),
        "movie_ids": list(ratings_group.movie_id.apply(list)),
        "ratings": list(ratings_group.rating.apply(list)),
        "timestamps": list(ratings_group.unix_timestamp.apply(list)),
    }
)
ratings_data.head()

Unnamed: 0,user_id,movie_ids,ratings,timestamps
0,1,"[3186, 1721, 1270, 1022, 2340, 1836, 3408, 120...","[4, 4, 5, 5, 3, 5, 4, 4, 5, 4, 3, 5, 4, 4, 4, ...","[2000-12-31 22:00:19, 2000-12-31 22:00:55, 200..."
1,10,"[597, 858, 743, 1210, 1948, 2312, 3751, 1282, ...","[4, 3, 3, 4, 4, 5, 5, 5, 3, 3, 3, 5, 4, 4, 4, ...","[2000-12-31 00:59:35, 2000-12-31 00:59:35, 200..."
2,100,"[260, 1676, 1198, 541, 1210, 3948, 3536, 2567,...","[4, 3, 4, 3, 4, 3, 1, 1, 5, 4, 4, 3, 2, 3, 4, ...","[2000-12-23 17:46:35, 2000-12-23 17:46:35, 200..."
3,1000,"[971, 260, 2990, 2973, 1210, 3068, 3153, 1198,...","[4, 5, 4, 3, 5, 5, 2, 5, 5, 4, 5, 4, 3, 5, 5, ...","[2000-11-24 04:36:06, 2000-11-24 04:36:06, 200..."
4,1001,"[1198, 1617, 2885, 3909, 3555, 1479, 3903, 394...","[4, 4, 4, 2, 2, 1, 4, 5, 5, 4, 4, 4, 4, 3, 4, ...","[2000-11-24 04:19:51, 2000-11-24 04:21:42, 200..."
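As a dependency-free illustration of the sort-then-group step, the sketch below orders a handful of made-up rating tuples by timestamp and collects each user's movies and ratings into ordered lists. The sample rows and the `group_by_user` helper are hypothetical; the notebook itself does this with pandas `sort_values` and `groupby`.

```python
from collections import defaultdict

# Hypothetical (user_id, movie_id, rating, timestamp) rows, deliberately out of order.
sample_ratings = [
    ("1", "1270", 5, 30),
    ("2", "597", 4, 10),
    ("1", "3186", 4, 10),
    ("1", "1721", 4, 20),
    ("2", "858", 3, 20),
]


def group_by_user(rows):
    """Return {user_id: {"movie_ids": [...], "ratings": [...]}} in timestamp order."""
    grouped = defaultdict(lambda: {"movie_ids": [], "ratings": []})
    # Sorting first means every per-user list is built in chronological order.
    for user_id, movie_id, rating, _ in sorted(rows, key=lambda r: r[3]):
        grouped[user_id]["movie_ids"].append(movie_id)
        grouped[user_id]["ratings"].append(rating)
    return dict(grouped)


sequences = group_by_user(sample_ratings)
print(sequences["1"]["movie_ids"])  # ['3186', '1721', '1270']
print(sequences["1"]["ratings"])    # [4, 4, 5]
```

Because the two lists are filled in the same loop, position `i` in `movie_ids` always corresponds to position `i` in `ratings`.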


Now, let's split the `movie_ids` list into a set of sequences of a fixed length.
We do the same for the `ratings`. Set the `sequence_length` variable to change the length
of the input sequence to the model. You can also change the `step_size` to control the
number of sequences to generate for each user.

In [18]:
sequence_length = 8
step_size = 1


def create_sequences(values, window_size, step_size):
    sequences = []
    start_index = 0
    while True:
        end_index = start_index + window_size
        seq = values[start_index:end_index]
        if len(seq) < window_size:
            seq = values[-window_size:]
            if len(seq) == window_size:
                sequences.append(seq)
            break
        sequences.append(seq)
        start_index += step_size
    return sequences


ratings_data.movie_ids = ratings_data.movie_ids.apply(
    lambda ids: create_sequences(ids, sequence_length, step_size)
)
ratings_data.movie_ids.head()
ratings_data.ratings = ratings_data.ratings.apply(
    lambda ids: create_sequences(ids, sequence_length, step_size)
)
ratings_data.head()
del ratings_data["timestamps"]
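To see what the windowing does, the sketch below runs the same `create_sequences` function on a ten-item toy list (the list is made up; the function body is copied from the cell above so the block runs on its own).

```python
def create_sequences(values, window_size, step_size):
    sequences = []
    start_index = 0
    while True:
        end_index = start_index + window_size
        seq = values[start_index:end_index]
        if len(seq) < window_size:
            # Tail handling: fall back to the last full window, if one exists.
            seq = values[-window_size:]
            if len(seq) == window_size:
                sequences.append(seq)
            break
        sequences.append(seq)
        start_index += step_size
    return sequences


toy = list(range(1, 11))  # ten items standing in for one user's movie ids
windows = create_sequences(toy, window_size=8, step_size=1)
for window in windows:
    print(window)
# [1, 2, 3, 4, 5, 6, 7, 8]
# [2, 3, 4, 5, 6, 7, 8, 9]
# [3, 4, 5, 6, 7, 8, 9, 10]
# [3, 4, 5, 6, 7, 8, 9, 10]

print(create_sequences([1, 2, 3], window_size=8, step_size=1))  # []
```

Two details are worth noting. With a `step_size` of 1, the tail-handling branch re-appends the last full window, so each user's final sequence appears twice. A user with fewer than `window_size` ratings produces no sequences at all.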

After that, we process the output so that each sequence becomes a separate record in
the DataFrame. In addition, we join the user features with the ratings data.

In [19]:
ratings_data_movies = ratings_data[["user_id", "movie_ids"]].explode(
    "movie_ids", ignore_index=True
)
ratings_data_movies.head()

Unnamed: 0,user_id,movie_ids
0,1,"[3186, 1721, 1270, 1022, 2340, 1836, 3408, 1207]"
1,1,"[1721, 1270, 1022, 2340, 1836, 3408, 1207, 2804]"
2,1,"[1270, 1022, 2340, 1836, 3408, 1207, 2804, 260]"
3,1,"[1022, 2340, 1836, 3408, 1207, 2804, 260, 720]"
4,1,"[2340, 1836, 3408, 1207, 2804, 260, 720, 1193]"


In [20]:
ratings_data_rating = ratings_data[["ratings"]].explode("ratings", ignore_index=True)
ratings_data_rating.head()

Unnamed: 0,ratings
0,"[4, 4, 5, 5, 3, 5, 4, 4]"
1,"[4, 5, 5, 3, 5, 4, 4, 5]"
2,"[5, 5, 3, 5, 4, 4, 5, 4]"
3,"[5, 3, 5, 4, 4, 5, 4, 3]"
4,"[3, 5, 4, 4, 5, 4, 3, 5]"


In [21]:
ratings_data_transformed = pd.concat([ratings_data_movies, ratings_data_rating], axis=1)
ratings_data_transformed.head()

Unnamed: 0,user_id,movie_ids,ratings
0,1,"[3186, 1721, 1270, 1022, 2340, 1836, 3408, 1207]","[4, 4, 5, 5, 3, 5, 4, 4]"
1,1,"[1721, 1270, 1022, 2340, 1836, 3408, 1207, 2804]","[4, 5, 5, 3, 5, 4, 4, 5]"
2,1,"[1270, 1022, 2340, 1836, 3408, 1207, 2804, 260]","[5, 5, 3, 5, 4, 4, 5, 4]"
3,1,"[1022, 2340, 1836, 3408, 1207, 2804, 260, 720]","[5, 3, 5, 4, 4, 5, 4, 3]"
4,1,"[2340, 1836, 3408, 1207, 2804, 260, 720, 1193]","[3, 5, 4, 4, 5, 4, 3, 5]"


In [22]:
ratings_data_transformed = ratings_data_transformed.join(
    users.set_index("user_id"), on="user_id"
)
ratings_data_transformed.head()

Unnamed: 0,user_id,movie_ids,ratings,sex,age_group,occupation,zip_code
0,1,"[3186, 1721, 1270, 1022, 2340, 1836, 3408, 1207]","[4, 4, 5, 5, 3, 5, 4, 4]",0,0,10,1588
1,1,"[1721, 1270, 1022, 2340, 1836, 3408, 1207, 2804]","[4, 5, 5, 3, 5, 4, 4, 5]",0,0,10,1588
2,1,"[1270, 1022, 2340, 1836, 3408, 1207, 2804, 260]","[5, 5, 3, 5, 4, 4, 5, 4]",0,0,10,1588
3,1,"[1022, 2340, 1836, 3408, 1207, 2804, 260, 720]","[5, 3, 5, 4, 4, 5, 4, 3]",0,0,10,1588
4,1,"[2340, 1836, 3408, 1207, 2804, 260, 720, 1193]","[3, 5, 4, 4, 5, 4, 3, 5]",0,0,10,1588


In [23]:
ratings_data_transformed.movie_ids = ratings_data_transformed.movie_ids.apply(
    lambda x: ",".join(x)
)
ratings_data_transformed.ratings = ratings_data_transformed.ratings.apply(
    lambda x: ",".join([str(v) for v in x])
)

del ratings_data_transformed["zip_code"]

ratings_data_transformed.rename(
    columns={"movie_ids": "sequence_movie_ids", "ratings": "sequence_ratings"},
    inplace=True,
)
ratings_data_transformed.head()

Unnamed: 0,user_id,sequence_movie_ids,sequence_ratings,sex,age_group,occupation
0,1,"3186,1721,1270,1022,2340,1836,3408,1207","4,4,5,5,3,5,4,4",0,0,10
1,1,"1721,1270,1022,2340,1836,3408,1207,2804","4,5,5,3,5,4,4,5",0,0,10
2,1,"1270,1022,2340,1836,3408,1207,2804,260","5,5,3,5,4,4,5,4",0,0,10
3,1,"1022,2340,1836,3408,1207,2804,260,720","5,3,5,4,4,5,4,3",0,0,10
4,1,"2340,1836,3408,1207,2804,260,720,1193","3,5,4,4,5,4,3,5",0,0,10


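With the settings used above (`sequence_length = 8`, `step_size = 1`), we end up with more than 963,000 sequences (the row index of the test split below reaches 963,962). The figure of 498,623 sequences corresponds to a `sequence_length` of 4 and a `step_size` of 2.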

Finally, we split the data into training and testing splits, with 85% and 15% of
the instances, respectively, and store them to CSV files.

In [24]:
random_selection = np.random.rand(len(ratings_data_transformed.index)) <= 0.85
train_data = ratings_data_transformed[random_selection]
test_data = ratings_data_transformed[~random_selection]

train_data.to_csv("data/train_data.csv", index=False, sep=",")
test_data.to_csv("data/test_data.csv", index=False, sep=",")
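The split above draws from NumPy's global random state without a seed, so the train and test files differ from run to run. A minimal sketch of a repeatable variant follows, using only the standard library. The `split_rows` helper and the seed value are assumptions for illustration, not part of the notebook.

```python
import random


def split_rows(n_rows, train_fraction=0.85, seed=42):
    """Assign each row index to train or test with a seeded generator."""
    rng = random.Random(seed)  # private generator, independent of global state
    train_idx, test_idx = [], []
    for i in range(n_rows):
        # Each row lands in the training set with probability train_fraction.
        (train_idx if rng.random() <= train_fraction else test_idx).append(i)
    return train_idx, test_idx


train_idx, test_idx = split_rows(1000)
print(len(train_idx), len(test_idx))

# The same seed always reproduces the same split.
assert split_rows(1000) == (train_idx, test_idx)
```

As in the notebook, the split is per row and random, so the proportions are only approximately 85/15. Windows from the same user, including overlapping ones, can land on both sides of the split.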

In [25]:
test_data

Unnamed: 0,user_id,sequence_movie_ids,sequence_ratings,sex,age_group,occupation
2,1,1270102223401836340812072804260,55354454,0,0,10
4,1,234018363408120728042607201193,35445435,0,0,10
8,1,2804260720119391960826921961,54354445,0,0,10
9,1,2607201193919608269219612028,43544455,0,0,10
33,1,3114279110292321119759423981545,44533444,0,0,10
...,...,...,...,...,...,...
963949,999,351331093156282188820920779,33443243,1,2,15
963954,999,209207792875231621653612688,24343133,1,2,15
963960,999,36126882422641959267625401363,33321323,1,2,15
963962,999,24226419592676254013637653565,32132334,1,2,15


# BST Implementation and training

In [26]:
import pandas as pd
import torch
import pytorch_lightning as pl
from tqdm import tqdm
import torchmetrics
import math
from urllib.request import urlretrieve
from zipfile import ZipFile
import os
import torch.nn as nn
import numpy as np

In [27]:
users = pd.read_csv(
    "data/users.csv",
    sep=",",
)

ratings = pd.read_csv(
    "data/ratings.csv",
    sep=",",
)

movies = pd.read_csv(
    "data/movies.csv", sep=","
)

## Pytorch dataset

In [28]:
import pandas as pd
import torch
import torch.utils.data as data


class MovieDataset(data.Dataset):
    """Movie dataset."""

    def __init__(self, ratings_file, test=False):
        """
        Args:
            ratings_file (string): Path to the CSV file with one rating sequence per row.
            test (bool): Stored on the instance; not used by __getitem__.
        """
        self.ratings_frame = pd.read_csv(
            ratings_file,
            delimiter=",",
        )
        self.test = test

    def __len__(self):
        return len(self.ratings_frame)

    def __getitem__(self, idx):
        row = self.ratings_frame.iloc[idx]
        user_id = row.user_id

        # Sequences are stored as comma-separated strings, e.g. "3186,1721,...".
        # Parse them explicitly instead of calling eval() on file contents.
        movie_history = [int(v) for v in row.sequence_movie_ids.split(",")]
        movie_history_ratings = [int(v) for v in row.sequence_ratings.split(",")]

        # The last element of each sequence is the prediction target.
        target_movie_id = movie_history[-1]
        target_movie_rating = movie_history_ratings[-1]

        # Everything before the last element is the history fed to the model.
        movie_history = torch.LongTensor(movie_history[:-1])
        movie_history_ratings = torch.LongTensor(movie_history_ratings[:-1])

        sex = row.sex
        age_group = row.age_group
        occupation = row.occupation

        return user_id, movie_history, target_movie_id,  movie_history_ratings, target_movie_rating, sex, age_group, occupation
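Each item the dataset returns splits a stored eight-element sequence into a seven-element history and a final target. The sketch below shows that split on the first row of the transformed data, without pandas or torch. The `split_history_and_target` helper is hypothetical and only mirrors the indexing done in `__getitem__`.

```python
def split_history_and_target(sequence_movie_ids, sequence_ratings):
    """Split comma-separated sequences into (history, target) parts."""
    movie_ids = [int(v) for v in sequence_movie_ids.split(",")]
    ratings = [int(v) for v in sequence_ratings.split(",")]
    # The last element is the target; everything before it is the history.
    return movie_ids[:-1], movie_ids[-1], ratings[:-1], ratings[-1]


history, target_movie, history_ratings, target_rating = split_history_and_target(
    "3186,1721,1270,1022,2340,1836,3408,1207",
    "4,4,5,5,3,5,4,4",
)
print(history)          # [3186, 1721, 1270, 1022, 2340, 1836, 3408]
print(target_movie)     # 1207
print(history_ratings)  # [4, 4, 5, 5, 3, 5, 4]
print(target_rating)    # 4
```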

In [29]:
genres = [
    "Action",
    "Adventure",
    "Animation",
    "Children's",
    "Comedy",
    "Crime",
    "Documentary",
    "Drama",
    "Fantasy",
    "Film-Noir",
    "Horror",
    "Musical",
    "Mystery",
    "Romance",
    "Sci-Fi",
    "Thriller",
    "War",
    "Western",
]

for genre in genres:
    movies[genre] = movies["genres"].apply(
        lambda values: int(genre in values.split("|"))
    )

sequence_length = 8

In [32]:
class PositionalEmbedding(nn.Module):
    """
    Learned positional embedding: one trainable vector per sequence position
    (unlike the fixed sinusoidal encoding of "Attention Is All You Need").
    """

    def __init__(self, max_len, d_model):
        super().__init__()

        # One trainable d_model-sized vector for each of the max_len positions.
        self.pe = nn.Embedding(max_len, d_model)

    def forward(self, x):
        batch_size = x.size(0)
        return self.pe.weight.unsqueeze(0).repeat(batch_size, 1, 1)


class BST(pl.LightningModule):
    def __init__(
        self, args=None,
    ):
        super().__init__()

        self.save_hyperparameters()
        self.args = args
        #-------------------
        # Embedding layers
        ##Users
        self.embeddings_user_id = nn.Embedding(
            int(users.user_id.max())+1, int(math.sqrt(users.user_id.max()))+1
        )
        ###Users features embeddings
        self.embeddings_user_sex = nn.Embedding(
            len(users.sex.unique()), int(math.sqrt(len(users.sex.unique())))
        )
        self.embeddings_age_group = nn.Embedding(
            len(users.age_group.unique()), int(math.sqrt(len(users.age_group.unique())))
        )
        self.embeddings_user_occupation = nn.Embedding(
            len(users.occupation.unique()), int(math.sqrt(len(users.occupation.unique())))
        )
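        # Not used in forward(): zip_code is dropped during data preparation.
        # The embedding width here is derived from the `sex` cardinality.
        self.embeddings_user_zip_code = nn.Embedding(
            len(users.zip_code.unique()), int(math.sqrt(len(users.sex.unique())))
        )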

        ##Movies
        self.embeddings_movie_id = nn.Embedding(
            int(movies.movie_id.max())+1, int(math.sqrt(movies.movie_id.max()))+1
        )

        ###Movies features embeddings
        genre_vectors = movies[genres].to_numpy()
        self.embeddings_movie_genre = nn.Embedding(
            genre_vectors.shape[0], genre_vectors.shape[1]
        )



        self.embeddings_movie_year = nn.Embedding(
            len(movies.year.unique()), int(math.sqrt(len(movies.year.unique())))
        )

        self.positional_embedding = PositionalEmbedding(8, 9)

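        # Network
        # Inputs are shaped (batch, sequence, features), so batch_first=True is needed;
        # with the default (False) the layer would attend across the batch dimension.
        self.transfomerlayer = nn.TransformerEncoderLayer(72, 3, dropout=0.2, batch_first=True)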
        self.linear = nn.Sequential(
            nn.Linear(
                661,
                1024,
            ),
            nn.LeakyReLU(),
            nn.Linear(1024, 512),
            nn.LeakyReLU(),
            nn.Linear(512, 256),
            nn.LeakyReLU(),
            nn.Linear(256, 1),
        )
        self.criterion = torch.nn.MSELoss()
        self.mae = torchmetrics.MeanAbsoluteError()
        self.mse = torchmetrics.MeanSquaredError()



    def encode_input(self,inputs):
        user_id, movie_history, target_movie_id,  movie_history_ratings, target_movie_rating, sex, age_group, occupation = inputs

        #MOVIES
        movie_history = self.embeddings_movie_id(movie_history)
        target_movie = self.embeddings_movie_id(target_movie_id)

        target_movie = torch.unsqueeze(target_movie, 1)
        transfomer_features = torch.cat((movie_history, target_movie),dim=1)

        #USERS
        user_id = self.embeddings_user_id(user_id)

        sex = self.embeddings_user_sex(sex)
        age_group = self.embeddings_age_group(age_group)
        occupation = self.embeddings_user_occupation(occupation)
        user_features = torch.cat((user_id, sex, age_group,occupation), 1)

        return transfomer_features, user_features, target_movie_rating.float()

    def forward(self, batch):
        transfomer_features, user_features, target_movie_rating = self.encode_input(batch)
        positional_embedding = self.positional_embedding(transfomer_features)
        transfomer_features = torch.cat((transfomer_features, positional_embedding), dim=2)
        transformer_output = self.transfomerlayer(transfomer_features)
        transformer_output = torch.flatten(transformer_output,start_dim=1)

        #Concat with other features
        features = torch.cat((transformer_output,user_features),dim=1)

        output = self.linear(features)
        return output, target_movie_rating

    def training_step(self, batch, batch_idx):
        out, target_movie_rating = self(batch)
        out = out.flatten()
        loss = self.criterion(out, target_movie_rating)

        mae = self.mae(out, target_movie_rating)
        mse = self.mse(out, target_movie_rating)
        rmse =torch.sqrt(mse)
        self.log(
            "train/mae", mae, on_step=True, on_epoch=False, prog_bar=False
        )

        self.log(
            "train/rmse", rmse, on_step=True, on_epoch=False, prog_bar=False
        )

        self.log("train/step_loss", loss, on_step=True, on_epoch=False, prog_bar=False)
        return loss

    def validation_step(self, batch, batch_idx):
        out, target_movie_rating = self(batch)
        out = out.flatten()
        loss = self.criterion(out, target_movie_rating)

        mae = self.mae(out, target_movie_rating)
        mse = self.mse(out, target_movie_rating)
        rmse =torch.sqrt(mse)

        return {"val_loss": loss, "mae": mae.detach(), "rmse":rmse.detach()}

    # def validation_epoch_end(self, outputs):
    #     avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
    #     avg_mae = torch.stack([x["mae"] for x in outputs]).mean()
    #     avg_rmse = torch.stack([x["rmse"] for x in outputs]).mean()

    #     self.log("val/loss", avg_loss, on_step=False, on_epoch=True, prog_bar=False)
    #     self.log("val/mae", avg_mae, on_step=False, on_epoch=True, prog_bar=False)
    #     self.log("val/rmse", avg_rmse, on_step=False, on_epoch=True, prog_bar=False)
    def on_validation_epoch_end(self):
        outputs = self.validation_outputs
        avg_loss = torch.stack([x["val_loss"] for x in outputs]).mean()
        avg_mae = torch.stack([x["mae"] for x in outputs]).mean()
        avg_rmse = torch.stack([x["rmse"] for x in outputs]).mean()

        self.log("val/loss", avg_loss, on_step=False, on_epoch=True, prog_bar=False)
        self.log("val/mae", avg_mae, on_step=False, on_epoch=True, prog_bar=False)
        self.log("val/rmse", avg_rmse, on_step=False, on_epoch=True, prog_bar=False)

        self.validation_outputs.clear()  # clear the stored outputs

    # def test_epoch_end(self, outputs):
    #     users = torch.cat([x["users"] for x in outputs])
    #     y_hat = torch.cat([x["top14"] for x in outputs])
    #     users = users.tolist()
    #     y_hat = y_hat.tolist()

    #     data = {"users": users, "top14": y_hat}
    #     df = pd.DataFrame.from_dict(data)
    #     print(len(df))
    #     df.to_csv("lightning_logs/predict.csv", index=False)
    def on_test_epoch_end(self):
        outputs = self.test_outputs
        users = torch.cat([x["users"] for x in outputs])
        y_hat = torch.cat([x["top14"] for x in outputs])
        users = users.tolist()
        y_hat = y_hat.tolist()

        data = {"users": users, "top14": y_hat}
        df = pd.DataFrame.from_dict(data)
        print(len(df))
        df.to_csv("lightning_logs/predict.csv", index=False)

        self.test_outputs.clear()  # clear the stored outputs


    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=0.0005)

    @staticmethod
    def add_model_specific_args(parent_parser):
        # ArgumentParser is not imported at the top of the notebook.
        from argparse import ArgumentParser

        parser = ArgumentParser(parents=[parent_parser], add_help=False)
        parser.add_argument("--learning_rate", type=float, default=0.01)
        return parser

    ####################
    # DATA RELATED HOOKS
    ####################

    def setup(self, stage=None):
        print("Loading datasets")
        self.train_dataset = MovieDataset("data/train_data.csv")
        self.val_dataset = MovieDataset("data/test_data.csv")
        self.test_dataset = MovieDataset("data/test_data.csv")
        print("Done")

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
            self.train_dataset,
            batch_size=128,
            shuffle=False,
            num_workers=os.cpu_count(),
        )

    def val_dataloader(self):
        return torch.utils.data.DataLoader(
            self.val_dataset,
            batch_size=128,
            shuffle=False,
            num_workers=os.cpu_count(),
        )

    def test_dataloader(self):
        return torch.utils.data.DataLoader(
            self.test_dataset,
            batch_size=128,
            shuffle=False,
            num_workers=os.cpu_count(),
        )

model = BST()
trainer = pl.Trainer(accelerator="gpu", devices=1, max_epochs=50)
trainer.fit(model)
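The hard-coded sizes in the model (`72` for the transformer width and `661` for the first linear layer) follow from the embedding widths. The sketch below redoes that arithmetic with plain integers. The cardinalities are assumed MovieLens 1M values, chosen because they agree with the parameter counts printed in the model summary; they are not read from the data here.

```python
import math

# Assumed MovieLens 1M cardinalities (consistent with the model summary's parameter counts).
MAX_MOVIE_ID = 3952
MAX_USER_ID = 6040
N_SEX, N_AGE_GROUP, N_OCCUPATION = 2, 7, 21

SEQUENCE_LENGTH = 8  # seven history items plus the target movie
POSITIONAL_DIM = 9   # second argument of PositionalEmbedding(8, 9)
N_HEADS = 3          # second argument of nn.TransformerEncoderLayer(72, 3, ...)

# Movie embedding width, as computed in BST.__init__.
movie_dim = int(math.sqrt(MAX_MOVIE_ID)) + 1  # 63
# Each position carries the movie embedding concatenated with its positional embedding.
d_model = movie_dim + POSITIONAL_DIM  # 72

# User-side widths, as computed in BST.__init__.
user_dim = int(math.sqrt(MAX_USER_ID)) + 1  # 78
side_dims = sum(int(math.sqrt(n)) for n in (N_SEX, N_AGE_GROUP, N_OCCUPATION))  # 1 + 2 + 4

# The transformer output is flattened, then concatenated with the user features.
linear_in = SEQUENCE_LENGTH * d_model + user_dim + side_dims

print(d_model, linear_in)  # 72 661
assert d_model % N_HEADS == 0  # multi-head attention needs an even split across heads
```

Changing `sequence_length`, the positional width, or any embedding size therefore means updating both constants in the model.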

INFO:pytorch_lightning.utilities.rank_zero:💡 Tip: For seamless cloud uploads and versioning, try installing [litmodels](https://pypi.org/project/litmodels/) to enable LitModelCheckpoint, which syncs automatically with the Lightning model registry.
INFO:pytorch_lightning.utilities.rank_zero:GPU available: True (cuda), used: True
INFO:pytorch_lightning.utilities.rank_zero:TPU available: False, using: 0 TPU cores
INFO:pytorch_lightning.utilities.rank_zero:HPU available: False, using: 0 HPUs


Loading datasets


INFO:pytorch_lightning.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


Done


INFO:pytorch_lightning.callbacks.model_summary:
   | Name                       | Type                    | Params | Mode 
--------------------------------------------------------------------------------
0  | embeddings_user_id         | Embedding               | 471 K  | train
1  | embeddings_user_sex        | Embedding               | 2      | train
2  | embeddings_age_group       | Embedding               | 14     | train
3  | embeddings_user_occupation | Embedding               | 84     | train
4  | embeddings_user_zip_code   | Embedding               | 3.4 K  | train
5  | embeddings_movie_id        | Embedding               | 249 K  | train
6  | embeddings_movie_genre     | Embedding               | 69.9 K | train
7  | embeddings_movie_year      | Embedding               | 729    | train
8  | positional_embedding       | PositionalEmbedding     | 72     | train
9  | transfomerlayer            | TransformerEncoderLayer | 318 K  | train
10 | linear                     | Sequential  

Sanity Checking: |          | 0/? [00:00<?, ?it/s]

AttributeError: 'BST' object has no attribute 'validation_outputs'
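
The `AttributeError` comes from `on_validation_epoch_end` reading `self.validation_outputs`, which nothing in the class ever creates: `validation_step` returns its results instead of storing them. A framework-free sketch of the usual pattern follows: create the list in `__init__`, append to it in the step, then aggregate and clear it at the end of the epoch. `MetricCollector` is a hypothetical stand-in, not a Lightning class, and plain floats stand in for tensors.

```python
class MetricCollector:
    """Hypothetical stand-in for the relevant parts of a LightningModule."""

    def __init__(self):
        # Create the buffer up front so the epoch-end hook always finds it.
        self.validation_outputs = []

    def validation_step(self, loss):
        # Store the per-batch result instead of only returning it.
        self.validation_outputs.append({"val_loss": loss})
        return loss

    def on_validation_epoch_end(self):
        losses = [x["val_loss"] for x in self.validation_outputs]
        avg_loss = sum(losses) / len(losses)
        self.validation_outputs.clear()  # empty the buffer for the next epoch
        return avg_loss


collector = MetricCollector()
for batch_loss in (1.0, 2.0, 3.0):
    collector.validation_step(batch_loss)

avg = collector.on_validation_epoch_end()
print(avg)  # 2.0
```

In the `BST` class, the equivalent change would be to add `self.validation_outputs = []` to `__init__` and append the result dictionary in `validation_step`. `on_test_epoch_end` depends on a `self.test_outputs` list in the same way.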