```markdown
## NLP Methods for Anime Information Retrieval

I tried several NLP methods for extracting as much signal as possible from a natural-language query, so that the query can be matched against anime-specific information such as genres, tags, and descriptions.
```

```markdown
## Data Preprocessing

In this section, we preprocess the collected data so that it is clean and ready for analysis. This includes handling missing values, encoding categorical variables, and normalizing numerical features.
```
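The notebook does not show missing-value handling or normalization being applied, so the following is only a minimal sketch of what those two steps could look like on the numeric columns `averageScore` and `episodes`. The toy values and the choice of median imputation and min-max scaling are assumptions for illustration, not the pipeline actually used here.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking two numeric columns of anime_data.csv (values are made up).
toy = pd.DataFrame({
    "averageScore": [84.0, np.nan, 77.0, 69.0],
    "episodes": [25.0, 12.0, np.nan, 1.0],
})

numeric_cols = ["averageScore", "episodes"]

# Fill missing values with each column's median.
filled = toy[numeric_cols].fillna(toy[numeric_cols].median())

# Min-max scale every column to the [0, 1] range.
col_min = filled.min()
col_range = filled.max() - col_min
normalized = (filled - col_min) / col_range

print(normalized)
```

Median imputation is used in this sketch because columns such as `episodes` are heavily skewed by long-running series; a different strategy may suit the real data better.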

In [2]:
import pandas as pd
import numpy as np
# from sklearn.preprocessing import StandardScaler, LabelEncoder
# from sklearn.impute import SimpleImputer



In [3]:
df=pd.read_csv(r'..\DataCollection\anime_data.csv')

In [4]:
df.head()

Unnamed: 0,id,genres,averageScore,popularity,episodes,description,format,season,seasonYear,tags,title_romaji,title_english,title_native,relations_edges
0,16498,"['Action', 'Drama', 'Fantasy', 'Mystery']",84.0,823529,25.0,"Several hundred years ago, humans were nearly ...",TV,SPRING,2013.0,"[{'name': 'Male Protagonist', 'rank': 100}, {'...",Shingeki no Kyojin,Attack on Titan,進撃の巨人,"[{'node': {'id': 53390, 'title': {'english': '..."
1,101922,"['Action', 'Adventure', 'Drama', 'Fantasy', 'S...",83.0,780741,26.0,"It is the Taisho Period in Japan. Tanjiro, a k...",TV,SPRING,2019.0,"[{'name': 'Demons', 'rank': 96}, {'name': 'Sho...",Kimetsu no Yaiba,Demon Slayer: Kimetsu no Yaiba,鬼滅の刃,"[{'node': {'id': 87216, 'title': {'english': '..."
2,1535,"['Mystery', 'Psychological', 'Supernatural', '...",84.0,749742,37.0,Light Yagami is a genius high school student w...,TV,FALL,2006.0,"[{'name': 'Crime', 'rank': 96}, {'name': 'Dete...",DEATH NOTE,Death Note,DEATH NOTE,"[{'node': {'id': 30021, 'title': {'english': '..."
3,113415,"['Action', 'Drama', 'Supernatural']",85.0,728351,24.0,"A boy fights... for ""the right death.""<br>\n<b...",TV,FALL,2020.0,"[{'name': 'Urban Fantasy', 'rank': 93}, {'name...",Jujutsu Kaisen,JUJUTSU KAISEN,呪術廻戦,"[{'node': {'id': 101517, 'title': {'english': ..."
4,21459,"['Action', 'Adventure', 'Comedy']",77.0,706853,13.0,What would the world be like if 80 percent of ...,TV,SPRING,2016.0,"[{'name': 'Super Power', 'rank': 98}, {'name':...",Boku no Hero Academia,My Hero Academia,僕のヒーローアカデミア,"[{'node': {'id': 85486, 'title': {'english': '..."


```markdown
### One-Hot Encoding Genres

Splits the `genres` column into one binary (one-hot) column per genre by collecting every genre that appears in any row.
```
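As a rough illustration of the idea, here is a minimal sketch on three toy rows stored in the same string-encoded list format as the `genres` column. It parses the strings with `ast.literal_eval` and lets pandas build the indicator columns; it assumes no genre name contains the `|` separator, and the toy rows are made up.

```python
import ast
import pandas as pd

# Toy rows in the same string-encoded list format as the 'genres' column.
toy = pd.DataFrame({"genres": ["['Action', 'Drama']", "['Comedy']", "[]"]})

# Parse each string into a real Python list once, without eval.
parsed = toy["genres"].apply(ast.literal_eval)

# Join each list with a separator and let pandas build one indicator column per genre.
genre_one_hot = parsed.str.join("|").str.get_dummies(sep="|")

print(genre_one_hot)
```

Rows with an empty genre list end up with 0 in every column.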

In [4]:
"""

Each entry in the 'genres' column is a string-encoded list of genres. The function below creates a new column for each unique genre found across all rows and populates it with binary values indicating the presence (1) or absence (0) of that genre in each row.

Returns:
    tuple: The DataFrame with the additional one-hot encoded genre columns, and the set of unique genres.
"""

In [5]:
import ast

def one_hot_encode_genres(df):
    # Parse each string-encoded list once (literal_eval is safer than eval)
    parsed_genres = df['genres'].apply(ast.literal_eval)

    # Create a set of all unique genres
    unique_genres = set(genre for sublist in parsed_genres for genre in sublist)

    # Create a column for each genre and populate with binary values
    for genre in unique_genres:
        df[f"{genre}_genere "] = parsed_genres.apply(lambda genres: 1 if genre in genres else 0)

    return df, unique_genres

# Apply the function to the dataframe (df is modified in place, so genere_df is the same object)
genere_df, Unique_genres = one_hot_encode_genres(df)
# df = pd.concat([df, genere_df], axis=1)

In [6]:
Unique_genres

{'Action',
 'Adventure',
 'Comedy',
 'Drama',
 'Ecchi',
 'Fantasy',
 'Hentai',
 'Horror',
 'Mahou Shoujo',
 'Mecha',
 'Music',
 'Mystery',
 'Psychological',
 'Romance',
 'Sci-Fi',
 'Slice of Life',
 'Sports',
 'Supernatural',
 'Thriller'}

In [7]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,19640,19641,19642,19643,19644,19645,19646,19647,19648,19649
id,16498,101922,1535,113415,21459,11061,21087,20605,20958,11757,...,120185,119458,118035,116567,115876,109354,106504,13829,170659,169763
genres,"['Action', 'Drama', 'Fantasy', 'Mystery']","['Action', 'Adventure', 'Drama', 'Fantasy', 'S...","['Mystery', 'Psychological', 'Supernatural', '...","['Action', 'Drama', 'Supernatural']","['Action', 'Adventure', 'Comedy']","['Action', 'Adventure', 'Fantasy']","['Action', 'Comedy', 'Sci-Fi', 'Supernatural']","['Action', 'Drama', 'Horror', 'Mystery', 'Psyc...","['Action', 'Drama', 'Fantasy', 'Mystery']","['Action', 'Adventure', 'Fantasy', 'Romance']",...,[],['Comedy'],['Mystery'],[],['Psychological'],[],[],"['Comedy', 'Slice of Life']","['Comedy', 'Slice of Life']",['Psychological']
averageScore,84.0,83.0,84.0,85.0,77.0,89.0,83.0,75.0,84.0,69.0,...,,,,,,,,,,
popularity,823529,780741,749742,728351,706853,667466,620632,592755,585346,579439,...,45,45,45,45,45,45,45,45,44,44
episodes,25.0,26.0,37.0,24.0,13.0,148.0,12.0,12.0,12.0,25.0,...,1.0,12.0,1.0,1.0,1.0,1.0,3.0,1.0,26.0,1.0
description,"Several hundred years ago, humans were nearly ...","It is the Taisho Period in Japan. Tanjiro, a k...",Light Yagami is a genius high school student w...,"A boy fights... for ""the right death.""<br>\n<b...",What would the world be like if 80 percent of ...,A new adaption of the manga of the same name b...,"Saitama has a rather peculiar hobby, being a s...",The suspense horror/dark fantasy story is set ...,Eren Jaeger swore to wipe out every last Titan...,"In the near future, a Virtual Reality Massive ...",...,Anthology film by animation collective G9+1 ab...,,A short animation from Tomori Komazaki for the...,Music video directed by Minori Yamada for the ...,Where do I come From? What am I? Where am I go...,A short film by Kana Imai.,"A heartwarming story of a acrobatic jet-plane,...",A Christmas special of Kamiusagi no Rope aired...,Second season of Chickip Dancers.,A very short animation drawn on tracing paper ...
format,TV,TV,TV,TV,TV,TV,TV,TV,TV,TV,...,MOVIE,ONA,MOVIE,MUSIC,MOVIE,MOVIE,OVA,SPECIAL,TV_SHORT,MOVIE
season,SPRING,SPRING,FALL,FALL,SPRING,FALL,FALL,SUMMER,SPRING,SUMMER,...,,,,,,,FALL,WINTER,FALL,
seasonYear,2013.0,2019.0,2006.0,2020.0,2016.0,2011.0,2015.0,2014.0,2017.0,2012.0,...,,,,,,,2004.0,2011.0,2022.0,
tags,"[{'name': 'Male Protagonist', 'rank': 100}, {'...","[{'name': 'Demons', 'rank': 96}, {'name': 'Sho...","[{'name': 'Crime', 'rank': 96}, {'name': 'Dete...","[{'name': 'Urban Fantasy', 'rank': 93}, {'name...","[{'name': 'Super Power', 'rank': 98}, {'name':...","[{'name': 'Shounen', 'rank': 95}, {'name': 'Su...","[{'name': 'Superhero', 'rank': 98}, {'name': '...","[{'name': 'Gore', 'rank': 90}, {'name': 'Urban...","[{'name': 'Survival', 'rank': 95}, {'name': 'P...","[{'name': 'Virtual World', 'rank': 97}, {'name...",...,[],[],[],[],[],[],"[{'name': 'Advertisement', 'rank': 60}, {'name...",[],[],[]


```markdown
## Tag Score One-Hot Encoding


The function below processes the `tags` column of the anime dataset and creates one column per unique tag, holding that tag's rank score.

Key steps:
1. Extracts all unique tag names from the `tags` column, where each entry contains a list of tag dictionaries with `name` and `rank` fields
2. For each unique tag, creates a new column named `{tag}_tag_score`
3. Populates the score columns by looking up the rank value for each tag in the original tags list, defaulting to 0 if the tag is not present
4. Returns the transformed dataframe and the set of unique tags

The resulting dataframe has a separate column for each tag's score, which makes it easier to analyze tag distributions and importance across anime titles.
```
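The key steps can be sketched on a few toy rows as follows. This is an illustrative variant, not necessarily the notebook's implementation: it parses every row exactly once and builds all score columns in a single `DataFrame` construction, which avoids re-parsing the strings for every tag. The toy rows are made up.

```python
import ast
import pandas as pd

# Toy rows in the same string-encoded format as the 'tags' column.
toy = pd.DataFrame({
    "tags": [
        "[{'name': 'Pirates', 'rank': 98}, {'name': 'Shounen', 'rank': 93}]",
        "[{'name': 'Shounen', 'rank': 95}]",
        "[]",
    ]
})

# Parse every row exactly once.
parsed = toy["tags"].apply(ast.literal_eval)

# One {tag name: rank} dict per anime; absent tags are simply missing keys.
rank_dicts = [{tag["name"]: tag["rank"] for tag in tags} for tags in parsed]

# Build all score columns in one step; missing tags become 0.
tag_scores = (
    pd.DataFrame(rank_dicts, index=toy.index)
    .fillna(0)
    .astype(int)
    .add_suffix("_tag_score")
)
unique_tags = {name for ranks in rank_dicts for name in ranks}

print(tag_scores)
```

With several hundred unique tags and roughly 20,000 rows, parsing once rather than once per tag is likely to matter for runtime.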

In [None]:
import ast

def one_hot_encode_tags_with_scores(df):
    # Parse each string-encoded tag list once into a {name: rank} dict,
    # instead of calling eval on every row for every tag
    tag_ranks = df['tags'].apply(
        lambda x: {item['name']: item['rank'] for item in ast.literal_eval(x)}
    )

    # Create a set of unique tag names
    unique_tags = set()
    for ranks in tag_ranks:
        unique_tags.update(ranks)

    # Create columns for each tag's score (0 if the tag is absent)
    for tag in unique_tags:
        col_name = f"{tag}_tag_score"
        df[col_name] = tag_ranks.apply(lambda ranks: ranks.get(tag, 0))

    return df, unique_tags

# Apply the function
tags_df, tags = one_hot_encode_tags_with_scores(df)

KeyboardInterrupt: 

In [9]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,19640,19641,19642,19643,19644,19645,19646,19647,19648,19649
id,16498,101922,1535,113415,21459,11061,21087,20605,20958,11757,...,120185,119458,118035,116567,115876,109354,106504,13829,170659,169763
genres,"['Action', 'Drama', 'Fantasy', 'Mystery']","['Action', 'Adventure', 'Drama', 'Fantasy', 'S...","['Mystery', 'Psychological', 'Supernatural', '...","['Action', 'Drama', 'Supernatural']","['Action', 'Adventure', 'Comedy']","['Action', 'Adventure', 'Fantasy']","['Action', 'Comedy', 'Sci-Fi', 'Supernatural']","['Action', 'Drama', 'Horror', 'Mystery', 'Psyc...","['Action', 'Drama', 'Fantasy', 'Mystery']","['Action', 'Adventure', 'Fantasy', 'Romance']",...,[],['Comedy'],['Mystery'],[],['Psychological'],[],[],"['Comedy', 'Slice of Life']","['Comedy', 'Slice of Life']",['Psychological']
averageScore,84.0,83.0,84.0,85.0,77.0,89.0,83.0,75.0,84.0,69.0,...,,,,,,,,,,
popularity,823529,780741,749742,728351,706853,667466,620632,592755,585346,579439,...,45,45,45,45,45,45,45,45,44,44
episodes,25.0,26.0,37.0,24.0,13.0,148.0,12.0,12.0,12.0,25.0,...,1.0,12.0,1.0,1.0,1.0,1.0,3.0,1.0,26.0,1.0
description,"Several hundred years ago, humans were nearly ...","It is the Taisho Period in Japan. Tanjiro, a k...",Light Yagami is a genius high school student w...,"A boy fights... for ""the right death.""<br>\n<b...",What would the world be like if 80 percent of ...,A new adaption of the manga of the same name b...,"Saitama has a rather peculiar hobby, being a s...",The suspense horror/dark fantasy story is set ...,Eren Jaeger swore to wipe out every last Titan...,"In the near future, a Virtual Reality Massive ...",...,Anthology film by animation collective G9+1 ab...,,A short animation from Tomori Komazaki for the...,Music video directed by Minori Yamada for the ...,Where do I come From? What am I? Where am I go...,A short film by Kana Imai.,"A heartwarming story of a acrobatic jet-plane,...",A Christmas special of Kamiusagi no Rope aired...,Second season of Chickip Dancers.,A very short animation drawn on tracing paper ...
format,TV,TV,TV,TV,TV,TV,TV,TV,TV,TV,...,MOVIE,ONA,MOVIE,MUSIC,MOVIE,MOVIE,OVA,SPECIAL,TV_SHORT,MOVIE
season,SPRING,SPRING,FALL,FALL,SPRING,FALL,FALL,SUMMER,SPRING,SUMMER,...,,,,,,,FALL,WINTER,FALL,
seasonYear,2013.0,2019.0,2006.0,2020.0,2016.0,2011.0,2015.0,2014.0,2017.0,2012.0,...,,,,,,,2004.0,2011.0,2022.0,
tags,"[{'name': 'Male Protagonist', 'rank': 100}, {'...","[{'name': 'Demons', 'rank': 96}, {'name': 'Sho...","[{'name': 'Crime', 'rank': 96}, {'name': 'Dete...","[{'name': 'Urban Fantasy', 'rank': 93}, {'name...","[{'name': 'Super Power', 'rank': 98}, {'name':...","[{'name': 'Shounen', 'rank': 95}, {'name': 'Su...","[{'name': 'Superhero', 'rank': 98}, {'name': '...","[{'name': 'Gore', 'rank': 90}, {'name': 'Urban...","[{'name': 'Survival', 'rank': 95}, {'name': 'P...","[{'name': 'Virtual World', 'rank': 97}, {'name...",...,[],[],[],[],[],[],"[{'name': 'Advertisement', 'rank': 60}, {'name...",[],[],[]


In [76]:
df.shape

(19650, 425)

In [10]:
columns = list(df.columns)

In [11]:
columns

['id',
 'genres',
 'averageScore',
 'popularity',
 'episodes',
 'description',
 'format',
 'season',
 'seasonYear',
 'tags',
 'title_romaji',
 'title_english',
 'title_native',
 'relations_edges',
 'Mystery_genere ',
 'Ecchi_genere ',
 'Sci-Fi_genere ',
 'Sports_genere ',
 'Slice of Life_genere ',
 'Hentai_genere ',
 'Music_genere ',
 'Action_genere ',
 'Horror_genere ',
 'Adventure_genere ',
 'Comedy_genere ',
 'Mahou Shoujo_genere ',
 'Supernatural_genere ',
 'Drama_genere ',
 'Mecha_genere ',
 'Romance_genere ',
 'Fantasy_genere ',
 'Psychological_genere ',
 'Thriller_genere ',
 'Work_tag_score',
 'Criminal Organization_tag_score',
 'Fellatio_tag_score',
 'Rimjob_tag_score',
 'Hypersexuality_tag_score',
 'Historical_tag_score',
 'Josei_tag_score',
 'Dancing_tag_score',
 'Konbini_tag_score',
 'Environmental_tag_score',
 'Primarily Teen Cast_tag_score',
 'Espionage_tag_score',
 'Rape_tag_score',
 'Athletics_tag_score',
 'CGI_tag_score',
 'POV_tag_score',
 'Ensemble Cast_tag_score',
 '

In [12]:
max([len(eval(tags)) for tags in df.tags if tags != '[]'])

68

In [13]:
for i in range(len(df)):
    # Parse the string-encoded tag list once per row
    row_tags = eval(df.iloc[i].tags)
    # Stop at the first row that has the maximum number of tags (68)
    if len(row_tags) == 68:
        print(i)
        break

12


In [14]:
eval(df.iloc[12].tags)

[{'name': 'Pirates', 'rank': 98},
 {'name': 'Shounen', 'rank': 93},
 {'name': 'Ensemble Cast', 'rank': 93},
 {'name': 'Travel', 'rank': 93},
 {'name': 'Super Power', 'rank': 90},
 {'name': 'Found Family', 'rank': 86},
 {'name': 'Male Protagonist', 'rank': 83},
 {'name': 'Ships', 'rank': 81},
 {'name': 'Conspiracy', 'rank': 81},
 {'name': 'Slapstick', 'rank': 81},
 {'name': 'Time Skip', 'rank': 80},
 {'name': 'Tragedy', 'rank': 80},
 {'name': 'Anthropomorphism', 'rank': 80},
 {'name': 'Slavery', 'rank': 79},
 {'name': 'Politics', 'rank': 77},
 {'name': 'War', 'rank': 77},
 {'name': 'Fugitive', 'rank': 75},
 {'name': 'Crime', 'rank': 74},
 {'name': 'Dystopian', 'rank': 74},
 {'name': 'Gods', 'rank': 73},
 {'name': 'Prison', 'rank': 72},
 {'name': 'Lost Civilization', 'rank': 71},
 {'name': 'Swordplay', 'rank': 71},
 {'name': 'Food', 'rank': 71},
 {'name': 'Samurai', 'rank': 70},
 {'name': 'Monster Boy', 'rank': 70},
 {'name': 'Henshin', 'rank': 68},
 {'name': 'Medicine', 'rank': 67},
 {'

In [15]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Input text and predefined tags
input_text = "Suggest me an action anime with a pirate and an overpowered male lead and no female_lead."
tags = ["pirate", "overpowered", "action", "slavery", "female_lead", "angels", "samurai", "dance"]

# TF-IDF Vectorizer
vectorizer = TfidfVectorizer(vocabulary=tags)
tfidf_scores = vectorizer.fit_transform([input_text])

# Extract matching tags
extracted_tags = [tags[i] for i in tfidf_scores.toarray().argsort()[0] if tfidf_scores[0, i] > 0]

print(extracted_tags)


['pirate', 'overpowered', 'action', 'female_lead']


In [16]:

import spacy
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# Load NLP models
nlp = spacy.load("en_core_web_sm")
model = SentenceTransformer('all-MiniLM-L6-v2')

# Input text and predefined tags
input_text = "Suggest me an action anime with male lead and no slavery"
tags = ["pirate", "overpowered", "action", "slavery", "male_lead", "female_lead", "angels", "samurai", "dance"]

# Detect negations
def detect_negations(input_text, tags):
    doc = nlp(input_text)
    print(doc)
    negated_tags = set()
    for token in doc:
        print(token.dep_)
        if token.dep_ == "neg":  # Negation dependency
            negated_head = token.head.text.lower()
            if negated_head in tags:
                print(negated_head)
                negated_tags.add(negated_head)
    return negated_tags

# Compute embeddings and similarities
input_embedding = model.encode(input_text, convert_to_tensor=True)
tag_embeddings = model.encode(tags, convert_to_tensor=True)
similarities = cosine_similarity([input_embedding], tag_embeddings)[0]

# Adjust scores for negations
negated_tags = detect_negations(input_text, tags)
tag_scores = []

for i, tag in enumerate(tags):
    if tag in negated_tags:
        tag_scores.append(-similarities[i])  # Negative score for negated tags
    else:
        tag_scores.append(similarities[i])  # Positive score for relevant tags

# Normalize scores
tag_scores = np.array(tag_scores)
max_abs = np.abs(tag_scores).max()
if max_abs > 0:  # Avoid division by zero
    tag_scores = tag_scores / max_abs

# Print results
print("Tags:", tags)
print("Scores:", tag_scores)




Suggest me an action anime with male lead and no slavery
ROOT
dative
det
compound
dobj
prep
amod
pobj
cc
det
conj
Tags: ['pirate', 'overpowered', 'action', 'slavery', 'male_lead', 'female_lead', 'angels', 'samurai', 'dance']
Scores: [0.48376837 0.31745082 0.7674215  0.77175695 1.         0.8690108
 0.3915192  0.9007799  0.5357921 ]
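In the output above, no token carries the `neg` dependency: the parser labels "no" in "no slavery" as `det`, so `negated_tags` stays empty and `slavery` keeps a positive score. One possible workaround, sketched below without spaCy, is to look for a small set of cue words and treat any tag that appears within the next few tokens as negated. The cue list, the window size, and the `find_negated_tags` helper are assumptions made for illustration.

```python
import re

NEGATION_CUES = {"no", "not", "without", "except"}  # assumed cue list

def find_negated_tags(text, tags, window=3):
    """Return tags that occur within `window` tokens after a negation cue."""
    # Lowercase and keep letters/underscores so tags like "female_lead" stay whole.
    tokens = re.findall(r"[a-z_]+", text.lower())
    tag_set = set(tags)
    negated = set()
    for i, token in enumerate(tokens):
        if token in NEGATION_CUES:
            for follower in tokens[i + 1 : i + 1 + window]:
                if follower in tag_set:
                    negated.add(follower)
    return negated

tags = ["pirate", "overpowered", "action", "slavery", "male_lead", "female_lead"]
query = "Suggest me an action anime with male lead and no slavery"
print(find_negated_tags(query, tags))
```

A fixed window is crude: in a query like "no slavery but action", it would also mark `action` as negated, so a real system would need a better notion of negation scope.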


In [17]:
df.shape

(19650, 55)

In [None]:
def preprocess_text(text, nlp=nlp):
    """
    Preprocess text for NLP tasks using spaCy
    
    Args:
        text (str): Input text to be preprocessed
        nlp: spaCy language model (defaults to already loaded model)
        
    Returns:
        str: Preprocessed text with lemmatization and stopword removal
    """
    # Process text with spaCy
    doc = nlp(text.lower())
    
    # Remove stopwords and punctuation, lemmatize tokens
    tokens = [token.lemma_ for token in doc 
             if not token.is_stop 
             and not token.is_punct
             and not token.is_space]
    
    # Join tokens back into text
    processed_text = ' '.join(tokens)
    
    return processed_text
preprocess_text("Suggest me an action anime with pirates")

'suggest adventure action anime female lead'

In [18]:
from sentence_transformers import SentenceTransformer

def get_text_embedding(text, model_name='all-MiniLM-L6-v2'):
    """
    Transform natural language text to vector embeddings using SentenceTransformer
    
    Args:
        text (str): Input text to be transformed
        model_name (str): Name of the HuggingFace model to use
        
    Returns:
        torch.Tensor: Vector embedding of the input text
    """
    # Load model (reuse existing if already loaded)
    try:
        embedding_model = model
    except NameError:
        embedding_model = SentenceTransformer(model_name)
    
    # Generate embedding
    embedding = embedding_model.encode(preprocess_text(text), convert_to_tensor=True)
    
    return embedding

# Example usage
text = "Suggest me an action anime with pirates"
embedding = get_text_embedding(text)
print(f"Embedding shape: {embedding.shape}")

NameError: name 'preprocess_text' is not defined

In [None]:
df.iloc[12]

id                                                                             21
genres                          ['Action', 'Adventure', 'Comedy', 'Drama', 'Fa...
averageScore                                                                 88.0
popularity                                                                 569270
episodes                                                                      NaN
                                                      ...                        
Angels_tag_score                                                               47
Alternate Universe_tag_score                                                    0
Meta_tag_score                                                                  0
Crime_tag_score                                                                74
Urban_tag_score                                                                 0
Name: 12, Length: 425, dtype: object

In [141]:
tags_12=[preprocess_text(df.iloc[12].description),preprocess_text(df.iloc[12].title_english)]
for tag in eval(df.iloc[12].tags):
    tags_12.append(tag["name"])
onepiece=preprocess_text(" ".join(tags_12))
print(onepiece)


gold roger know pirate king strong infamous sail grand line capture death roger world government bring change world word death reveal location great treasure world piece revelation bring grand age pirate man dream find piece promise unlimited rich fame possibly covet title person find title pirate king.<br><br > enter monkey d. luffy 17 year old boy defy standard definition pirate popular persona wicked harden toothless pirate ransack village fun luffy reason pirate pure wonder think exciting adventure meet new intriguing people find piece reason pirate follow footstep childhood hero luffy crew travel grand line experience crazy adventure unveil dark mystery battle strong enemy order reach piece.<br><br > < b>*this include follow special episodes:</b><br > chopperman rescue protect tv station shore episode 336)<br > strong tag team luffy toriko hard struggle episode 492)<br > team formation save chopper episode 542)<br > history strong collaboration vs. glutton sea episode 590)<br > 20

In [158]:
def anime_emb(idx):
    tags=[preprocess_text(df.iloc[idx].description) ,preprocess_text(df.iloc[idx].title_english)]
    for tag in eval(df.iloc[idx].tags):
        tags.append(tag["name"])
    txt=preprocess_text(" ".join(tags))
    return get_text_embedding(txt),txt

In [142]:
onepiece_embd=get_text_embedding(onepiece)

In [143]:
onepiece_embd.shape

torch.Size([384])

In [144]:
cosine_similarity([onepiece_embd], [embedding])

array([[0.50159776]], dtype=float32)
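For reference, the score returned by `cosine_similarity` is the dot product of the two vectors divided by the product of their L2 norms. A minimal NumPy sketch on made-up 3-dimensional vectors (the real embeddings here have 384 dimensions):

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a, so similarity is 1
c = [2.0, -1.0, 0.0]  # orthogonal to a, so similarity is 0

print(cosine(a, b))
print(cosine(a, c))
```

Because only the direction of the vectors matters, a long description and a short query can still score highly if they point the same way in embedding space.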

In [160]:
onepiece_embd,onepiece = anime_emb(12)
vinland_embd,vinland = anime_emb(47)
input_embedding = get_text_embedding("Suggest me an action anime with pirates")

In [162]:
vinland

'thorfinn son vike great warrior father kill battle mercenary leader askeladd swear revenge thorfinn join askeladd band order challenge duel end catch middle war crown england < br><br > source kodansha usa vinland saga viking revenge foreign historical male protagonist war tragedy philosophy come age seinen primarily male cast military politic anti hero swordplay primarily adult cast gore survival ship conspiracy slavery time skip pirate language barrier orphan spearplay religion cgi heterosexual archery cult snowscape'

In [None]:
query = "vinland saga like anime with pirates"
vinland_embd, _ = anime_emb(47)  # anime_emb returns (embedding, text)
cosine_similarity([vinland_embd], [get_text_embedding(query)])

array([[0.44499922]], dtype=float32)
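Comparing one title at a time does not scale to the full catalogue. A possible next step, sketched here with random stand-in vectors rather than real embeddings, is to stack all anime embeddings into one matrix and rank every row against the query with a single matrix product. The `top_k` helper and the toy data are hypothetical.

```python
import numpy as np

def top_k(query_vec, matrix, k=3):
    """Return (indices, scores) of the k rows most cosine-similar to query_vec."""
    # Normalize the rows and the query so a dot product equals cosine similarity.
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = matrix_norm @ query_norm
    order = np.argsort(scores)[::-1][:k]  # highest score first
    return order, scores[order]

rng = np.random.default_rng(0)
anime_matrix = rng.normal(size=(5, 8))               # 5 fake anime, 8-dim embeddings
query = anime_matrix[2] + 0.01 * rng.normal(size=8)  # query almost identical to row 2

indices, scores = top_k(query, anime_matrix, k=2)
print(indices, scores)
```

In the notebook's setting, each row of the matrix would be the embedding produced by `anime_emb` for one title, computed once and reused for every query.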

In [None]:

# Find the positional index of the row whose id is 101348
print(np.flatnonzero(df['id'].to_numpy() == 101348)[0])

47


In [163]:
tags

['pirate',
 'overpowered',
 'action',
 'slavery',
 'male_lead',
 'female_lead',
 'angels',
 'samurai',
 'dance']

In [165]:
unique_tags = set()
for tags_list in df['tags']:
    tags = eval(tags_list)
    for tag in tags:
        unique_tags.add(tag['name'])

In [None]:
tags=list(unique_tags)

In [None]:
tags.sort()

In [175]:
genres = sorted(Unique_genres)  # Unique_genres is returned by one_hot_encode_genres

In [173]:
for tag in tags:
    print(tag)

4-koma
Achromatic
Achronological Order
Acrobatics
Acting
Adoption
Advertisement
Afterlife
Age Gap
Age Regression
Agender
Agriculture
Ahegao
Airsoft
Alchemy
Aliens
Alternate Universe
American Football
Amnesia
Amputation
Anachronism
Anal Sex
Ancient China
Angels
Animals
Anthology
Anthropomorphism
Anti-Hero
Archery
Armpits
Aromantic
Arranged Marriage
Artificial Intelligence
Asexual
Ashikoki
Asphyxiation
Assassins
Astronomy
Athletics
Augmented Reality
Autobiographical
Aviation
Badminton
Band
Bar
Baseball
Basketball
Battle Royale
Biographical
Bisexual
Blackmail
Board Game
Boarding School
Body Horror
Body Swapping
Bondage
Boobjob
Bowling
Boxing
Boys' Love
Bullying
Butler
CGI
Calligraphy
Camping
Cannibalism
Card Battle
Cars
Centaur
Cheerleading
Chibi
Chimera
Chuunibyou
Circus
Class Struggle
Classic Literature
Classical Music
Clone
Coastal
College
Coming of Age
Conspiracy
Cosmic Horror
Cosplay
Cowboys
Crime
Criminal Organization
Crossdressing
Crossover
Cult
Cultivation
Cumflation
Cunnilingus
C

In [177]:
import random

# Define base components


# Base templates
templates = [
    "Suggest me a [GENRE] anime with a [TAG].",
    "I want to watch a [GENRE] anime focused on [TAG].",
    "Looking for a [GENRE] anime with [TAG].",
    "Recommend me an anime with [TAG].",
    "Can you suggest a [GENRE] anime?",
    "Find me a [GENRE] anime with a lot of [TAG].",
    "What are some [GENRE] anime centered around [TAG]?",
    "Show me a good [GENRE] anime about [TAG].",
    "I'm in the mood for a [GENRE] anime with a [TAG].",
    "Are there any [GENRE] anime featuring [TAG]?",
    "Tell me about an anime with a lot of [TAG] elements.",
    "What is a must-watch anime with [TAG]?",
    "Give me a multi-genre anime combining [GENRE] and [GENRE] with a focus on [TAG].",
    "Recommend a [GENRE] and [GENRE] anime with [TAG].",
    "Suggest an anime with [TAG] and some [GENRE] themes.",
    "What [GENRE] anime has [TAG] as a central theme?",
    "I need an anime with [TAG], preferably in the [GENRE] genre.",
    "Can you find an anime with a mix of [GENRE] and [TAG]?",
    "Recommend a good [GENRE] anime that explores [TAG].",
    "I want to explore a [GENRE] anime without [TAG]."
]

# Data augmentation
augmentations = [
    "I'm looking for something similar to [EXAMPLE_ANIME].",
    "I enjoyed [EXAMPLE_ANIME], any recommendations like that?",
    "Can you suggest a new anime like [EXAMPLE_ANIME]?",
    "What's a good follow-up to [EXAMPLE_ANIME]?",
    "I've heard about [EXAMPLE_ANIME], but I want something different with [TAG].",
    "[EXAMPLE_ANIME] was amazing; what else is good in [GENRE]?"
]
example_anime = ["Naruto", "One Piece", "Attack on Titan", "Your Lie in April", "Steins;Gate", "Demon Slayer"]

# Generate 100 unique templates
unique_templates = set()

while len(unique_templates) < 100:
    # Randomly pick a base template and components
    template = random.choice(templates)
    genre1 = random.choice(genres)
    genre2 = random.choice(genres)
    tag = random.choice(tags)

    # Replace placeholders
    sentence = template
    if "[GENRE]" in template:
        sentence = sentence.replace("[GENRE]", genre1, 1)
        if "[GENRE]" in sentence:  # For multi-genre templates
            sentence = sentence.replace("[GENRE]", genre2, 1)
    if "[TAG]" in template:
        sentence = sentence.replace("[TAG]", tag, 1)

    # Add augmented examples
    if random.random() < 0.3:  # 30% chance to use augmentation
        aug_template = random.choice(augmentations)
        sentence = aug_template.replace("[EXAMPLE_ANIME]", random.choice(example_anime))
        if "[TAG]" in aug_template:
            sentence = sentence.replace("[TAG]", tag, 1)
        if "[GENRE]" in aug_template:
            sentence = sentence.replace("[GENRE]", genre1, 1)

    # Add the sentence to the set
    unique_templates.add(sentence)

# Print the results
for idx, template in enumerate(unique_templates):
    print(f"{idx+1}: {template}")


1: What is a must-watch anime with Agriculture?
2: Give me a multi-genre anime combining Sports and Music with a focus on Gambling.
3: Can you find an anime with a mix of Psychological and MILF?
4: Naruto was amazing; what else is good in Hentai?
5: Tell me about an anime with a lot of Satire elements.
6: I want to watch a Sci-Fi anime focused on Lacrosse.
7: What Music anime has Jazz Music as a central theme?
8: I'm looking for something similar to Naruto.
9: Give me a multi-genre anime combining Adventure and Sci-Fi with a focus on Agender.
10: What Psychological anime has Netori as a central theme?
11: Show me a good Horror anime about Love Triangle.
12: Can you suggest a Thriller anime?
13: What is a must-watch anime with 4-koma?
14: Recommend a Fantasy and Mecha anime with Scuba Diving.
15: I've heard about One Piece, but I want something different with Baseball.
16: I want to watch a Ecchi anime focused on War.
17: I'm looking for something similar to Your Lie in April.
18: Recom

In [19]:
one_desc=df.iloc[12].description

In [182]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("summarization", model="facebook/bart-large-cnn")

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Device set to use cpu


In [186]:
def generate_summary(description):
    # `pipe` is the summarization pipeline created in the previous cell
    return pipe(description, max_length=200, min_length=20, do_sample=False)[0]['summary_text']

In [187]:
generate_summary(one_desc)

'gold Roger was known as the Pirate King, the strongest and most infamous being to have sailed the Grand Line . his last words before his death revealed the location of the greatest treasure in the world, One Piece . he was captured and killed by the world government by the grand age of pirates .'

In [22]:
import transformers

In [23]:
transformers.__version__


'4.48.0.dev0'

In [26]:
import torch._dynamo

# Suppress errors and fall back to eager mode
torch._dynamo.config.suppress_errors = True

sentences = [
    "Gold Roger was known as the Pirate King, the strongest and most infamous being to have sailed the Grand Line. The capture and death of Roger by the World Government brought a change throughout the world. His last words before his death revealed the location of the greatest treasure in the world, One Piece. It was this revelation that brought about the Grand Age of Pirates, men who dreamed of finding One Piece (which promises an unlimited amount of riches and fame), and quite possibly the most coveted of titles for the person who found it, the title of the Pirate King.Enter Monkey D. Luffy, a 17-year-old boy that defies your standard definition of a pirate. Rather than the popular persona of a wicked, hardened, toothless pirate who ransacks villages for fun, Luffy’s reason for being a pirate is one of pure wonder; the thought of an exciting adventure and meeting new and intriguing people, along with finding One Piece, are his reasons of becoming a pirate. Following in the footsteps of his childhood hero, Luffy and his crew travel across the Grand Line, experiencing crazy adventures, unveiling dark mysteries and battling strong enemies, all in order to reach One Piece."
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)

W1224 11:25:30.459000 10580 Lib\site-packages\torch\_dynamo\convert_frame.py:1125] WON'T CONVERT compiled_embeddings e:\New-Codes\Repositories\Advanced-Anime-Recommendation-System\venv\Lib\site-packages\transformers\models\modernbert\modeling_modernbert.py line 204 
W1224 11:25:30.459000 10580 Lib\site-packages\torch\_dynamo\convert_frame.py:1125] due to: 
W1224 11:25:30.459000 10580 Lib\site-packages\torch\_dynamo\convert_frame.py:1125] Traceback (most recent call last):
W1224 11:25:30.459000 10580 Lib\site-packages\torch\_dynamo\convert_frame.py:1125]   File "e:\New-Codes\Repositories\Advanced-Anime-Recommendation-System\venv\Lib\site-packages\torch\_inductor\cpp_builder.py", line 130, in check_compiler_exist_windows
W1224 11:25:30.459000 10580 Lib\site-packages\torch\_dynamo\convert_frame.py:1125]     subprocess.check_output([compiler, "/help"], stderr=subprocess.STDOUT)
W1224 11:25:30.459000 10580 Lib\site-packages\torch\_dynamo\convert_frame.py:1125]   File "C:\Users\prash\AppData

torch.Size([1, 1])


In [28]:
print(embeddings)

[[ 0.47324497  0.45060447 -0.2920536  ...  0.62320244  0.31446344
  -0.42435697]]


In [106]:
unique_tags = set()
for tags_list in df['tags']:
    tags = eval(tags_list)
    for tag in tags:
        unique_tags.add(tag['name'])

In [107]:
unique_genres = set(genre for sublist in df['genres'].apply(eval) for genre in sublist)

In [316]:
sentences

['Gold Roger was known as the Pirate King, the strongest and most infamous being to have sailed the Grand Line. The capture and death of Roger by the World Government brought a change throughout the world. His last words before his death revealed the location of the greatest treasure in the world, One Piece. It was this revelation that brought about the Grand Age of Pirates, men who dreamed of finding One Piece (which promises an unlimited amount of riches and fame), and quite possibly the most coveted of titles for the person who found it, the title of the Pirate King.Enter Monkey D. Luffy, a 17-year-old boy that defies your standard definition of a pirate. Rather than the popular persona of a wicked, hardened, toothless pirate who ransacks villages for fun, Luffy’s reason for being a pirate is one of pure wonder; the thought of an exciting adventure and meeting new and intriguing people, along with finding One Piece, are his reasons of becoming a pirate. Following in the footsteps of

In [333]:
df.iloc[12].tags

"[{'name': 'Pirates', 'rank': 98}, {'name': 'Shounen', 'rank': 93}, {'name': 'Ensemble Cast', 'rank': 93}, {'name': 'Travel', 'rank': 93}, {'name': 'Super Power', 'rank': 90}, {'name': 'Found Family', 'rank': 86}, {'name': 'Male Protagonist', 'rank': 83}, {'name': 'Ships', 'rank': 81}, {'name': 'Conspiracy', 'rank': 81}, {'name': 'Slapstick', 'rank': 81}, {'name': 'Time Skip', 'rank': 80}, {'name': 'Tragedy', 'rank': 80}, {'name': 'Anthropomorphism', 'rank': 80}, {'name': 'Slavery', 'rank': 79}, {'name': 'Politics', 'rank': 77}, {'name': 'War', 'rank': 77}, {'name': 'Fugitive', 'rank': 75}, {'name': 'Crime', 'rank': 74}, {'name': 'Dystopian', 'rank': 74}, {'name': 'Gods', 'rank': 73}, {'name': 'Prison', 'rank': 72}, {'name': 'Lost Civilization', 'rank': 71}, {'name': 'Swordplay', 'rank': 71}, {'name': 'Food', 'rank': 71}, {'name': 'Samurai', 'rank': 70}, {'name': 'Monster Boy', 'rank': 70}, {'name': 'Henshin', 'rank': 68}, {'name': 'Medicine', 'rank': 67}, {'name': 'Shapeshifting', 'ra

In [25]:
import ast

def get_genres_and_tags(anime_id):
    # Look up one anime by its `id` column and parse the stringified lists
    row = df.loc[df['id'] == anime_id].iloc[0]
    genres = ast.literal_eval(row['genres'])
    tags = {tag['name']: tag['rank'] for tag in ast.literal_eval(row['tags'])}
    return genres, tags

# Example usage
genres, tags = get_genres_and_tags(21)
print(f"Genres: {genres}")
print(f"Tags: {tags}")

Genres: ['Action', 'Adventure', 'Comedy', 'Drama', 'Fantasy']
Tags: {'Pirates': 98, 'Shounen': 93, 'Ensemble Cast': 93, 'Travel': 93, 'Super Power': 90, 'Found Family': 86, 'Male Protagonist': 83, 'Ships': 81, 'Conspiracy': 81, 'Slapstick': 81, 'Time Skip': 80, 'Tragedy': 80, 'Anthropomorphism': 80, 'Slavery': 79, 'Politics': 77, 'War': 77, 'Fugitive': 75, 'Crime': 74, 'Dystopian': 74, 'Gods': 73, 'Prison': 72, 'Lost Civilization': 71, 'Swordplay': 71, 'Food': 71, 'Samurai': 70, 'Monster Boy': 70, 'Henshin': 68, 'Medicine': 67, 'Shapeshifting': 66, 'Cyborg': 66, 'Robots': 66, 'Artificial Intelligence': 66, 'Primarily Adult Cast': 65, 'Desert': 65, 'Animals': 64, 'Guns': 64, 'Skeleton': 63, 'Anti-Hero': 63, 'Dragons': 63, 'Anachronism': 62, 'Marriage': 62, 'Post-Apocalyptic': 61, 'Espionage': 60, 'Asexual': 60, 'Monster Girl': 60, 'Fairy': 60, 'Philosophy': 60, 'Drugs': 57, 'Kuudere': 56, 'Assassins': 56, 'Clone': 55, 'Battle Royale': 55, 'Aromantic': 55, 'Trains': 54, 'Ninja': 53, 'Ado

In [115]:
meta_data_1=[]
for genre in genres:
    gen=f" One piece anime is a {genre} anime."
    meta_data_1.append(gen)
    for tag in tags:
        meta_data_1.append(f" One piece anime is a  {genre} and {tag} anime.")

In [116]:
meta_data_1

[' One piece anime is a Action anime.',
 ' One piece anime is a  Action and Pirates anime.',
 ' One piece anime is a  Action and Shounen anime.',
 ' One piece anime is a  Action and Ensemble Cast anime.',
 ' One piece anime is a  Action and Travel anime.',
 ' One piece anime is a  Action and Super Power anime.',
 ' One piece anime is a  Action and Found Family anime.',
 ' One piece anime is a  Action and Male Protagonist anime.',
 ' One piece anime is a  Action and Ships anime.',
 ' One piece anime is a  Action and Conspiracy anime.',
 ' One piece anime is a  Action and Slapstick anime.',
 ' One piece anime is a  Action and Time Skip anime.',
 ' One piece anime is a  Action and Tragedy anime.',
 ' One piece anime is a  Action and Anthropomorphism anime.',
 ' One piece anime is a  Action and Slavery anime.',
 ' One piece anime is a  Action and Politics anime.',
 ' One piece anime is a  Action and War anime.',
 ' One piece anime is a  Action and Fugitive anime.',
 ' One piece anime is a 

In [317]:
sentences = [
    "Gold Roger was known as the Pirate King, the strongest and most infamous being to have sailed the Grand Line. The capture and death of Roger by the World Government brought a change throughout the world. His last words before his death revealed the location of the greatest treasure in the world, One Piece. It was this revelation that brought about the Grand Age of Pirates, men who dreamed of finding One Piece (which promises an unlimited amount of riches and fame), and quite possibly the most coveted of titles for the person who found it, the title of the Pirate King.Enter Monkey D. Luffy, a 17-year-old boy that defies your standard definition of a pirate. Rather than the popular persona of a wicked, hardened, toothless pirate who ransacks villages for fun, Luffy’s reason for being a pirate is one of pure wonder; the thought of an exciting adventure and meeting new and intriguing people, along with finding One Piece, are his reasons of becoming a pirate. Following in the footsteps of his childhood hero, Luffy and his crew travel across the Grand Line, experiencing crazy adventures, unveiling dark mysteries and battling strong enemies, all in order to reach One Piece."
]

In [319]:
len("".join(sentences).split(" "))

201

In [320]:
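# unique_tags is a set, so sort it before slicing (rerun In [106] first after a kernel restart)
sorted(unique_tags)[:10]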

NameError: name 'unique_tags' is not defined

In [5]:
df.dropna(subset=['description'], inplace=True)

In [6]:
df.shape

(18543, 14)

In [15]:
df.head()

Unnamed: 0,id,genres,averageScore,popularity,episodes,description,format,season,seasonYear,tags,title_romaji,title_english,title_native,relations_edges
0,16498,"['Action', 'Drama', 'Fantasy', 'Mystery']",84.0,823529,25.0,"Several hundred years ago, humans were nearly ...",TV,SPRING,2013.0,"[{'name': 'Male Protagonist', 'rank': 100}, {'...",Shingeki no Kyojin,Attack on Titan,進撃の巨人,"[{'node': {'id': 53390, 'title': {'english': '..."
1,101922,"['Action', 'Adventure', 'Drama', 'Fantasy', 'S...",83.0,780741,26.0,"It is the Taisho Period in Japan. Tanjiro, a k...",TV,SPRING,2019.0,"[{'name': 'Demons', 'rank': 96}, {'name': 'Sho...",Kimetsu no Yaiba,Demon Slayer: Kimetsu no Yaiba,鬼滅の刃,"[{'node': {'id': 87216, 'title': {'english': '..."
2,1535,"['Mystery', 'Psychological', 'Supernatural', '...",84.0,749742,37.0,Light Yagami is a genius high school student w...,TV,FALL,2006.0,"[{'name': 'Crime', 'rank': 96}, {'name': 'Dete...",DEATH NOTE,Death Note,DEATH NOTE,"[{'node': {'id': 30021, 'title': {'english': '..."
3,113415,"['Action', 'Drama', 'Supernatural']",85.0,728351,24.0,"A boy fights... for ""the right death.""<br>\n<b...",TV,FALL,2020.0,"[{'name': 'Urban Fantasy', 'rank': 93}, {'name...",Jujutsu Kaisen,JUJUTSU KAISEN,呪術廻戦,"[{'node': {'id': 101517, 'title': {'english': ..."
4,21459,"['Action', 'Adventure', 'Comedy']",77.0,706853,13.0,What would the world be like if 80 percent of ...,TV,SPRING,2016.0,"[{'name': 'Super Power', 'rank': 98}, {'name':...",Boku no Hero Academia,My Hero Academia,僕のヒーローアカデミア,"[{'node': {'id': 85486, 'title': {'english': '..."


In [9]:
df["cleaned_description"] = df["description"].apply(clean_text)

In [369]:
newdata=[]

In [370]:
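for i in range(100):
    anime = df.iloc[i]
    # get_genres_and_tags expects the anime's id, not its row position
    anime_genre, anime_tags = get_genres_and_tags(anime["id"])
    desc = anime.cleaned_description

    # Genres are binary labels, so they get the maximum score
    for gen in anime_genre:
        newdata.append({"description": desc, "genre": gen, "score": 1.0})
    # Tag ranks are percentages (0-100); rescale them to 0-1
    for tag_name, tag_rank in anime_tags.items():
        newdata.append({"description": desc, "genre": tag_name, "score": tag_rank / 100})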

In [371]:
thedata=pd.DataFrame(newdata)

In [372]:
thedata.head()

Unnamed: 0,description,genre,score
0,"Several hundred years ago, humans were nearly ...",Action,1.0
1,"Several hundred years ago, humans were nearly ...",Drama,1.0
2,"Several hundred years ago, humans were nearly ...",Fantasy,1.0
3,"Several hundred years ago, humans were nearly ...",Mystery,1.0
4,"Several hundred years ago, humans were nearly ...",Male Protagonist,1.0
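
The table above pairs each description with one label and a score: genres get 1.0 and tags get their rank divided by 100. As a rough sketch of that row-building step in isolation, the function below takes already-parsed genres and tags. The name `build_label_rows` and the second tag are hypothetical; the other values are copied from the first row shown above.

```python
def build_label_rows(description, genres, tags):
    """Return one {'description', 'genre', 'score'} row per genre and per tag."""
    rows = [{"description": description, "genre": g, "score": 1.0} for g in genres]
    rows += [
        {"description": description, "genre": name, "score": rank / 100}
        for name, rank in tags.items()
    ]
    return rows

# Values taken from the first row shown above; "Example Tag" is hypothetical
rows = build_label_rows(
    "Several hundred years ago, humans were nearly ...",
    ["Action", "Drama", "Fantasy", "Mystery"],
    {"Male Protagonist": 100, "Example Tag": 50},
)
for row in rows:
    print(row["genre"], row["score"])
```

Keeping genres and tags in the same `genre` column mirrors the notebook's layout; a separate column marking the label type might be worth adding if the two need to be distinguished later.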


In [125]:
sentences.append(" ".join(tags))
sentences.append(" ".join(genres))

In [131]:
sentences

['Gold Roger was known as the Pirate King, the strongest and most infamous being to have sailed the Grand Line. The capture and death of Roger by the World Government brought a change throughout the world. His last words before his death revealed the location of the greatest treasure in the world, One Piece. It was this revelation that brought about the Grand Age of Pirates, men who dreamed of finding One Piece (which promises an unlimited amount of riches and fame), and quite possibly the most coveted of titles for the person who found it, the title of the Pirate King.Enter Monkey D. Luffy, a 17-year-old boy that defies your standard definition of a pirate. Rather than the popular persona of a wicked, hardened, toothless pirate who ransacks villages for fun, Luffy’s reason for being a pirate is one of pure wonder; the thought of an exciting adventure and meeting new and intriguing people, along with finding One Piece, are his reasons of becoming a pirate. Following in the footsteps of

In [188]:
data_onepiece={}

In [None]:
import random

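# Base components: `genres` (list) and `tags` (dict of name -> rank)
# are expected to come from get_genres_and_tags() above
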
# Base templates
templates = [
    "Suggest me a [GENRE] anime with a [TAG].",
    "I want to watch a [GENRE] anime focused on [TAG].",
    "Looking for a [GENRE] anime with [TAG].",
    "Recommend me an anime with [TAG].",
    "Can you suggest a [GENRE] anime?",
    "Find me a [GENRE] anime with a lot of [TAG].",
    "What are some [GENRE] anime centered around [TAG]?",
    "Show me a good [GENRE] anime about [TAG].",
    "I'm in the mood for a [GENRE] anime with a [TAG].",
    "Are there any [GENRE] anime featuring [TAG]?",
    "Tell me about an anime with a lot of [TAG] elements.",
    "What is a must-watch anime with [TAG]?",
    "Give me a multi-genre anime combining [GENRE] and [GENRE] with a focus on [TAG].",
    "Recommend a [GENRE] and [GENRE] anime with [TAG].",
    "Suggest an anime with [TAG] and some [GENRE] themes.",
    "What [GENRE] anime has [TAG] as a central theme?",
    "I need an anime with [TAG], preferably in the [GENRE] genre.",
    "Can you find an anime with a mix of [GENRE] and [TAG]?",
    "Recommend a good [GENRE] anime that explores [TAG].",
    "I want to explore a [GENRE] anime without [TAG]."
]
# Data augmentation
augmentations = [
    "I'm looking for something similar to [EXAMPLE_ANIME].",
    "I enjoyed [EXAMPLE_ANIME], any recommendations like that?",
    "Can you suggest a new anime like [EXAMPLE_ANIME]?",
    "What's a good follow-up to [EXAMPLE_ANIME]?",
    "I've heard about [EXAMPLE_ANIME], but I want something different with [TAG].",
    "[EXAMPLE_ANIME] was amazing; what else is good in [GENRE]?"
]
example_anime = ["Naruto", "One Piece", "Attack on Titan", "Your Lie in April", "Steins;Gate", "Demon Slayer"]

# Generate 100 unique templates
unique_templates = set()

while len(unique_templates) < 100:
    # Randomly pick a base template and components
    template = random.choice(templates)
    genre1 = random.choice(genres)
    genre2 = random.choice(genres)
    # `tags` is a dict (name -> rank), so sample from its keys
    tag = random.choice(list(tags))

    # Replace placeholders
    sentence = template
    if "[GENRE]" in template:
        sentence = sentence.replace("[GENRE]", genre1, 1)
        if "[GENRE]" in sentence:  # For multi-genre templates
            sentence = sentence.replace("[GENRE]", genre2, 1)
    if "[TAG]" in template:
        sentence = sentence.replace("[TAG]", tag, 1)

    # Add augmented examples
    if random.random() < 0.3:  # 30% chance to use augmentation
        aug_template = random.choice(augmentations)
        sentence = aug_template.replace("[EXAMPLE_ANIME]", random.choice(example_anime))
        if "[TAG]" in aug_template:
            sentence = sentence.replace("[TAG]", tag, 1)
        if "[GENRE]" in aug_template:
            sentence = sentence.replace("[GENRE]", genre1, 1)

    # Add the sentence to the set
    unique_templates.add(sentence)

# Print the results
for idx, template in enumerate(unique_templates):
    print(f"{idx+1}: {template}")
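

The generator above picks `genre1` and `genre2` independently, so a two-genre template can end up naming the same genre twice. The sketch below shows one possible way to fill the placeholders with distinct genres by sampling without replacement. The function `fill_template` and its `rng` parameter are hypothetical additions, not part of the original cell.

```python
import random

def fill_template(template, genres, tags, rng=random):
    """Fill [GENRE] and [TAG] placeholders, using distinct genres when possible."""
    n_genres = template.count("[GENRE]")
    # sample() draws without replacement, so two [GENRE] slots never match
    picked = rng.sample(genres, k=min(n_genres, len(genres)))
    sentence = template
    for genre in picked:
        sentence = sentence.replace("[GENRE]", genre, 1)
    if "[TAG]" in sentence:
        # `tags` may be a dict (name -> rank) or a list of names
        sentence = sentence.replace("[TAG]", rng.choice(list(tags)))
    return sentence

# Hypothetical inputs; a seeded generator keeps the example repeatable
example = fill_template(
    "Recommend a [GENRE] and [GENRE] anime with [TAG].",
    ["Action", "Comedy"],
    {"Pirates": 98},
    rng=random.Random(0),
)
print(example)
```

If a template has more `[GENRE]` slots than there are genres, the extra placeholders are left untouched in this sketch.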


In [None]:
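custom_data = {"anchor": []}

for i in range(len(df)):
    anchor = random.choice(templates)
    # TODO: fill the [GENRE]/[TAG] placeholders for row i; the raw template is stored for now
    custom_data["anchor"].append(anchor)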

In [315]:
def max_words_in_description(df):
    """
    Returns the maximum number of words present in the 'description' column of the dataframe.

    Args:
        df (pd.DataFrame): The input dataframe containing a 'description' column.

    Returns:
        int: The maximum number of words in any description.
    """
    # Treat missing descriptions as empty strings without modifying df in place
    descriptions = df['description'].fillna('')
    return descriptions.apply(lambda x: len(clean_text(x).split())).max()

# Example usage
max_words = max_words_in_description(df)
print(f"Maximum number of words in description: {max_words}")

Maximum number of words in description: 286


In [311]:
yi=df.description.iloc[12]

In [355]:
thedata

Unnamed: 0,description,genre,score
0,"Several hundred years ago, humans were nearly ...",Action,1.0
1,"Several hundred years ago, humans were nearly ...",Drama,1.0
2,"Several hundred years ago, humans were nearly ...",Fantasy,1.0
3,"Several hundred years ago, humans were nearly ...",Mystery,1.0
4,"Several hundred years ago, humans were nearly ...",Male Protagonist,1.0
...,...,...,...
156945,A Christmas special of Kamiusagi no Rope aired...,Comedy,1.0
156946,A Christmas special of Kamiusagi no Rope aired...,Slice of Life,1.0
156947,Second season of Chickip Dancers.,Comedy,1.0
156948,Second season of Chickip Dancers.,Slice of Life,1.0


In [373]:
thedata.to_csv("thedata100.csv",index=False)

In [374]:
df.head()

Unnamed: 0,id,genres,averageScore,popularity,episodes,description,format,season,seasonYear,tags,...,Athletics_tag_score,CGI_tag_score,POV_tag_score,Ensemble Cast_tag_score,Surfing_tag_score,Facial_tag_score,Asphyxiation_tag_score,Bondage_tag_score,Oiran_tag_score,cleaned_description
0,16498,"['Action', 'Drama', 'Fantasy', 'Mystery']",84.0,823529,25.0,"Several hundred years ago, humans were nearly ...",TV,SPRING,2013.0,"[{'name': 'Male Protagonist', 'rank': 100}, {'...",...,0,46,0,73,0,0,0,0,0,"Several hundred years ago, humans were nearly ..."
1,101922,"['Action', 'Adventure', 'Drama', 'Fantasy', 'S...",83.0,780741,26.0,"It is the Taisho Period in Japan. Tanjiro, a k...",TV,SPRING,2019.0,"[{'name': 'Demons', 'rank': 96}, {'name': 'Sho...",...,0,67,0,0,0,0,0,0,0,"It is the Taisho Period in Japan. Tanjiro, a k..."
2,1535,"['Mystery', 'Psychological', 'Supernatural', '...",84.0,749742,37.0,Light Yagami is a genius high school student w...,TV,FALL,2006.0,"[{'name': 'Crime', 'rank': 96}, {'name': 'Dete...",...,0,0,0,0,0,0,0,0,0,Light Yagami is a genius high school student w...
3,113415,"['Action', 'Drama', 'Supernatural']",85.0,728351,24.0,"A boy fights... for ""the right death.""<br>\n<b...",TV,FALL,2020.0,"[{'name': 'Urban Fantasy', 'rank': 93}, {'name...",...,0,0,0,68,0,0,0,0,0,"A boy fights... for ""the right death."""
4,21459,"['Action', 'Adventure', 'Comedy']",77.0,706853,13.0,What would the world be like if 80 percent of ...,TV,SPRING,2016.0,"[{'name': 'Super Power', 'rank': 98}, {'name':...",...,0,0,0,75,0,0,0,0,0,What would the world be like if 80 percent of ...


In [381]:
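# Row 12 is One Piece; look it up by its id, which is what get_genres_and_tags expects
ge, tg = get_genres_and_tags(df.iloc[12]['id'])
# Tag names first, then genres, joined into one comma-separated string
tag_ = ", ".join(list(tg) + ge)
tag_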

'Pirates, Shounen, Ensemble Cast, Travel, Super Power, Found Family, Male Protagonist, Ships, Conspiracy, Slapstick, Time Skip, Tragedy, Anthropomorphism, Slavery, Politics, War, Fugitive, Crime, Dystopian, Gods, Prison, Lost Civilization, Swordplay, Food, Samurai, Monster Boy, Henshin, Medicine, Shapeshifting, Cyborg, Robots, Artificial Intelligence, Primarily Adult Cast, Desert, Animals, Guns, Skeleton, Anti-Hero, Dragons, Anachronism, Marriage, Post-Apocalyptic, Espionage, Asexual, Monster Girl, Fairy, Philosophy, Drugs, Kuudere, Assassins, Clone, Battle Royale, Aromantic, Trains, Ninja, Adoption, Revenge, Female Protagonist, Mermaid, Gender Bending, Time Manipulation, Musical Theater, CGI, Angels, Unrequited Love, Zombie, Body Swapping, Acting, Action, Adventure, Comedy, Drama, Fantasy'

In [27]:
def get_name_genres_tags_and_description(id):
    """
    Builds a single text summary (name, genres, tags, description) for an anime.

    Args:
        id (int): The anime's value in the `id` column of df.

    Returns:
        str: A formatted string containing the name, genres, tags, and description.
    """
    genre, tags = get_genres_and_tags(id)
    name_en = df.loc[df['id'] == id, 'title_english'].values[0]
    name_jp = df.loc[df['id'] == id, 'title_romaji'].values[0]
    desc = df.loc[df['id'] == id, 'cleaned_description'].values[0]
    base = f"Name of the anime is {name_en} and {name_jp}. Genres are {', '.join(genre)} tags are {', '.join(tags.keys())} and description is {desc}"
    return base


In [56]:
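# Note: the variable names do not match the titles. Per the outputs below, these ids
# resolve to Food Wars!, Dr. STONE, Redo of Healer and Vinland Saga respectively.
one_peice=get_name_genres_tags_and_description(20923)
vinland_saga=get_name_genres_tags_and_description(105333)
toradora=get_name_genres_tags_and_description(113425)
new_1=get_name_genres_tags_and_description(101348)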

In [38]:
one_peice

'Name of the anime is Food Wars! and Shokugeki no Souma. Genres are Comedy, Ecchi tags are Food, Male Protagonist, Primarily Teen Cast, Shounen, School, Tsundere, Nudity, Educational, Tanned Skin, Gyaru, Twins, Female Harem, Tentacles, Found Family and description is Ever since he was little, Souma Yukihira’s main goals have been to beat his father in a cooking contest and take over the family diner. That’s why, when his dad suddenly announces that he’s shutting the restaurant down and sending Souma to cooking school, Souma is shocked. However, Totsuki Academy is no ordinary cooking school. This elite institution is filled with culinary giants like Erina “God Tongue” Nakiri, who does her best to keep a lowly short-order like Souma from even making it inside the door. However, if the snobs can dish it out, Souma can serve it up, and when he learns that only a handful of the students manage to graduate every year, he swears that the only way he’ll leave is as number one!'

In [55]:
vinland_saga

'Name of the anime is Dr. STONE and Dr. STONE. Genres are Action, Adventure, Comedy, Sci-Fi tags are Post-Apocalyptic, Survival, Lost Civilization, Educational, Time Skip, Male Protagonist, Shounen, Environmental, Ensemble Cast, Primarily Teen Cast, Kingdom Management, Spearplay, Primarily Male Cast, Found Family, Espionage, Asexual, Tomboy and description is After five years of harboring unspoken feelings, high-schooler Taiju Ooki is finally ready to confess his love to Yuzuriha Ogawa. Just when Taiju begins his confession however, a blinding green light strikes the Earth and petrifies mankind around the world— turning every single human into stone.'

In [52]:
toradora

'Name of the anime is Redo of Healer and Kaifuku Jutsushi no Yarinaoshi. Genres are Action, Adventure, Ecchi, Fantasy tags are Revenge, Sadism, Rape, Nudity, Anti-Hero, Male Protagonist, Torture, Time Manipulation, Bullying, Tragedy, Magic, Drugs, Memory Manipulation, Threesome, Primarily Female Cast, Fellatio, Alternate Universe, Gore, Slavery, Nakadashi, Large Breasts, Crime, Cunnilingus, Human Pet, Inseki, Defloration, Anal Sex, Feet, Ahegao, Heterosexual, Amnesia, Squirting, Villainess, Female Harem, Irrumatio, Kemonomimi, Conspiracy, Bondage, Death Game, Group Sex, Handjob, Maids, Masturbation, Ojou-sama, Travel, Femdom, Cannibalism, Flat Chest, Gender Bending, Monster Girl, Facial, Dystopian, Crossdressing, Twins, Assassins, Blackmail, LGBTQ+ Themes, Bisexual, Afterlife, Watersports, Boobjob, Dungeon, Goblin, Adoption, Animals, Agriculture, Asphyxiation, Incest and description is In a dark world of monsters, adventurers and mages, some of the most gifted healers are subjugated to

In [57]:
new_1

'Name of the anime is Vinland Saga and VINLAND SAGA. Genres are Action, Adventure, Drama tags are Vikings, Revenge, Foreign, Historical, Male Protagonist, War, Tragedy, Philosophy, Coming of Age, Seinen, Primarily Male Cast, Military, Politics, Anti-Hero, Swordplay, Primarily Adult Cast, Gore, Survival, Ships, Conspiracy, Slavery, Time Skip, Pirates, Language Barrier, Orphan, Spearplay, Religion, CGI, Heterosexual, Archery, Cult, Snowscape and description is Thorfinn is son to one of the Vikings greatest warriors, but when his father is killed in battle by the mercenary leader Askeladd, he swears to have his revenge. Thorfinn joins Askeladds band in order to challenge him to a duel, and ends up caught in the middle of a war for the crown of England.'

In [14]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Prashasst/anime-recommendation-model")





In [60]:
query="anime released in 2022"

In [59]:
embeddings = model.encode([query,one_peice,vinland_saga,toradora,new_1])

similarities = model.similarity(embeddings, embeddings)
print(similarities)

tensor([[1.0000, 0.7938, 0.8259, 0.8122, 0.8071],
        [0.7938, 1.0000, 0.9486, 0.9450, 0.9345],
        [0.8259, 0.9486, 1.0000, 0.9714, 0.9542],
        [0.8122, 0.9450, 0.9714, 1.0000, 0.9584],
        [0.8071, 0.9345, 0.9542, 0.9584, 1.0000]])
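

Only the first row of this matrix matters for retrieval, since it compares the query with each candidate summary. A small sketch of turning that row into a ranking follows; the helper `rank_against_query` is hypothetical, and the numbers are copied by hand from the output above rather than recomputed.

```python
def rank_against_query(similarity_row, labels):
    """Sort candidate labels by their similarity to the query, best first."""
    # Index 0 is the query compared with itself, so skip it
    scored = zip(labels, similarity_row[1:])
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# First row of the matrix printed above, copied by hand
query_row = [1.0000, 0.7938, 0.8259, 0.8122, 0.8071]
labels = ["one_peice", "vinland_saga", "toradora", "new_1"]

ranking = rank_against_query(query_row, labels)
for label, score in ranking:
    print(f"{label}: {score:.4f}")
```

The four scores fall within roughly 0.03 of each other, which may mean this query is not well separated by the summaries; one plausible reason is that the summary string built by `get_name_genres_tags_and_description` contains no release year.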
