```markdown
## SentenceTransformer("Prashasst/anime-recommendation-model")

This notebook shows how I fine-tuned a BERT-based Sentence Transformer, with pooling and normalization layers, for anime recommendations.

You can load the fine-tuned model with the sentence_transformers library as follows:

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("Prashasst/anime-recommendation-model")

```


In [2]:
import pandas as pd
import numpy as np
# import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sentence_transformers.readers import InputExample
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import BatchSamplers
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction
from transformers import TrainingArguments
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    SentenceTransformerModelCardData,
)
from datasets import Dataset

In [38]:
thedata = pd.read_csv("thedata100.csv")

In [41]:
thedata.shape

(2942, 3)

In [16]:
thedata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 156950 entries, 0 to 156949
Data columns (total 3 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   description  153180 non-null  object 
 1   genre        156950 non-null  object 
 2   score        156950 non-null  float64
dtypes: float64(1), object(2)
memory usage: 3.6+ MB


In [31]:
import re

def clean_text(input_text):
    """
    Cleans the input text by removing slashes, quotes, exclamation marks, and <i> tags,
    then dropping everything from the first <br> tag onward.

    Args:
        input_text (str): The input string to be cleaned.

    Returns:
        str: The cleaned text.
    """
    # Chain the replacements so that each one builds on the previous result
    # (calling input_text.replace every time would keep only the last replacement)
    text = input_text.replace("/", "")
    text = text.replace("'", "")
    text = text.replace('"', "")
    text = text.replace("<i>", " ")
    text = text.replace("<I>", " ")
    text = text.replace("!", "")

    # Remove everything after the first <br> tag and strip leading/trailing spaces.
    # \s* also matches "<br >", which is what "<br />" becomes once slashes are removed.
    cleaned_text = re.split(r'<br\s*>', text, maxsplit=1)[0].strip()

    return cleaned_text
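
A chained variant of the cleaning step, where each replacement operates on the result of the previous one, can be sanity-checked on a made-up description. This is only a sketch: the helper name `clean_description` and the sample string are hypothetical and do not come from the dataset.

```python
import re


def clean_description(text: str) -> str:
    """Illustrative cleaning helper (hypothetical name, not part of the notebook)."""
    # Apply each replacement to the output of the previous one
    replacements = [("/", ""), ("'", ""), ('"', ""), ("<i>", " "), ("<I>", " "), ("!", "")]
    for old, new in replacements:
        text = text.replace(old, new)
    # Keep only the part before the first <br> tag, then trim whitespace
    return re.split(r"<br\s*>", text, maxsplit=1)[0].strip()


# Made-up sample: an italic title followed by a trailing note after <br>
sample = "<i>Toradora!</i><br>Written by a made-up author"
cleaned = clean_description(sample)
print(cleaned)  # prints: Toradora
```

Because slashes are removed first, a closing `</i>` turns into `<i>` and is then handled by the same replacement rule.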


In [42]:
thedata.dropna(inplace=True)

In [43]:
thedata["description"] = thedata["description"].apply(clean_text)

In [44]:
thedata.to_csv("thedata100.csv", index=False)

In [36]:

# Split the data into train (80%), validation (10%), and test (10%) sets
train_data, temp_data = train_test_split(thedata, test_size=0.2, random_state=42)
val_data, test_data = train_test_split(temp_data, test_size=0.5, random_state=42)




# Create a Dataset object for each set
train_dataset = Dataset.from_dict({
    "description": list(train_data["description"]),
    "genre": list(train_data["genre"]),
    "label": list(train_data["score"])
})


val_dataset = Dataset.from_dict({
    "description": list(val_data["description"]),
    "genre": list(val_data["genre"]),
    "label": list(val_data["score"])
})

test_dataset = Dataset.from_dict({
    "description": list(test_data["description"]),
    "genre": list(test_data["genre"]),
    "label": list(test_data["score"])
})



# Load a pretrained SentenceTransformer model
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Define a loss function
loss = CosineSimilarityLoss(model=model)

# Create evaluators for validation and test datasets
val_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=val_dataset["description"],
    sentences2=val_dataset["genre"],
    scores=val_dataset["label"],
    main_similarity=SimilarityFunction.COSINE,
    name="anime-recommendation-dev"
)

test_evaluator = EmbeddingSimilarityEvaluator(
    sentences1=test_dataset["description"],
    sentences2=test_dataset["genre"],
    scores=test_dataset["label"],
    main_similarity=SimilarityFunction.COSINE,
    name="anime-recommendation-test"
)

# Training arguments
training_args = SentenceTransformerTrainingArguments(
    output_dir="models/anime-recommendation",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    eval_strategy="steps",
    eval_steps=100,
    save_steps=100,
    logging_steps=1,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    save_total_limit=2,
    fp16=True
)

# Define a Trainer
trainer = SentenceTransformerTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    loss=loss,
    evaluator=val_evaluator
)

# Train the model
trainer.train()

# Evaluate the model on the test set
test_evaluator(model)

# Save the trained model
model.save("models/anime_recom/final")

# Push the model to the Hugging Face Hub
model.push_to_hub("anime-recommendation-model")


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.26.0`: Please run `pip install transformers[torch]` or `pip install 'accelerate>={ACCELERATE_MIN_VERSION}'`
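
The ImportError above says that the trainer needs `accelerate>=0.26.0`. A small check like the one below could be run before building the trainer to see whether the installed version is new enough. This is a minimal sketch using only the standard library; the helper names `parse_version` and `accelerate_is_recent_enough` are hypothetical, and pre-release suffixes are ignored for simplicity.

```python
import re
from importlib import metadata

# Minimum version named in the error message
MIN_ACCELERATE = (0, 26, 0)


def parse_version(version: str) -> tuple:
    """Turn '0.26.1' into (0, 26, 1); suffixes such as 'rc1' or '.dev0' are ignored."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def accelerate_is_recent_enough(installed=None) -> bool:
    """Return True if accelerate is installed and at least MIN_ACCELERATE."""
    if installed is None:
        try:
            installed = metadata.version("accelerate")
        except metadata.PackageNotFoundError:
            return False  # accelerate is not installed at all
    return parse_version(installed) >= MIN_ACCELERATE


print("accelerate OK:", accelerate_is_recent_enough())
```

If the check prints `False`, installing or upgrading the package as the error message suggests should resolve it.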

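To make the training objective concrete: with its default settings, `CosineSimilarityLoss` embeds the two text columns (here `description` and `genre`), takes the cosine similarity of the two embeddings, and penalizes the squared difference from the `label`. The sketch below reproduces that arithmetic in plain Python on made-up 2-D vectors; the vectors and labels are illustrative assumptions, not model outputs.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs, labels):
    """Mean squared error between each pair's cosine similarity and its label."""
    errors = [(cosine_similarity(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)


# Made-up "embeddings": one identical pair and one orthogonal pair
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.5]  # hypothetical target similarities

loss = cosine_similarity_loss(pairs, labels)
print(loss)  # prints: 0.125
```

Since a cosine similarity always lies in [-1, 1], labels outside that range can never be matched exactly. The notebook does not show the range of the `score` column, so whether it needs rescaling first is an open assumption.
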
In [45]:


sentences = [
    "suggest me an action and adventure anime",
    "Attack on Titan is an anime where humans fight against titans",
    "One Piece is an anime about a pirate named Monkey D. Luffy",
    "Toradora is a romance anime"
]
embeddings = model.encode(sentences)

# Pairwise cosine similarities between all four sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)



torch.Size([4, 4])
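
The 4x4 result holds one similarity per pair of sentences, so row 0 (`similarities[0]`) compares the query sentence with every sentence in the list. Turning such a row into a recommendation order amounts to sorting the candidates by score. The sketch below shows that step with made-up 2-D vectors standing in for real embeddings; the vectors and the helper name `rank_by_similarity` are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, candidates):
    """Return (title, score) pairs sorted from most to least similar to the query."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up vectors: the first axis loosely stands for "action", the second for "romance"
query = [0.9, 0.1]
candidates = {
    "Attack on Titan": [0.8, 0.2],
    "One Piece": [0.7, 0.3],
    "Toradora": [0.1, 0.9],
}

ranking = rank_by_similarity(query, candidates)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

With real embeddings from `model.encode`, the same sort applies to the scores in the query's row, skipping the query's similarity to itself.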
