# The Cinematic Nexus: Unveiling the Future of Movie Recommendations and Analysis

by Anthony Amadasun

## 1.3 Evaluation Assessment/Executive Summary/Recommendation

---

### 1.3.1 Introduction




Our objective in this section is to assess the performance of our movie recommendation system and draw insights from it. This notebook evaluates the effectiveness of the recommendation models used and identifies areas for improvement. The evaluation framework consists of a detailed analysis of metrics and a closer look at the collaborative filtering, content-based filtering, and K-Means genre clustering models, used to validate specific modifications.

Through this evaluation, we hope to give movie enthusiasts a clear understanding of how well our recommendation system aligns with their preferences, guiding them toward better-informed decisions when selecting their next movie, and to identify future enhancements for the system. Through metrics assessment, model analysis, and hypothesis testing, we will examine how the system behaves, noting its successes and addressing the challenges encountered. This section concludes by outlining the next steps in refining our movie recommendation system.

---

#### Imports

In [26]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


#nltk import
import nltk
from nltk.corpus import wordnet
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

#gensim imports
from gensim.models import Doc2Vec
from gensim.models.doc2vec import TaggedDocument

In [27]:
tmdb_df = pd.read_csv('../data/tmdb_data.csv')

----

### 1.3.2 Model Filtering Assessment and Analysis




#### Collaborative-Based Approach

<ins> **Steps:**</ins> 

1. Created an interaction matrix.
2. Applied collaborative filtering using cosine similarity on interaction_data_pivot.
3. Utilized item_similarity_matrix to recommend movies for a given input movie using the get_movie_recommendations function.
4. Implemented an interactive recommendation function that allows users to input their favorite movie and receive recommendations based on their preferences.
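
The steps above can be sketched on a toy ratings table. This is a minimal illustration, not the notebook's original code: the column names (`user_id`, `title`, `rating`), the toy data, and the signature of `get_movie_recommendations` are assumptions; only the names `interaction_data_pivot` and `item_similarity_matrix` come from the steps listed above.

```python
import numpy as np
import pandas as pd

# Toy interaction data (hypothetical; the real notebook uses its own ratings table).
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "title":   ["A", "B", "C", "A", "B", "B", "C", "D"],
    "rating":  [5, 4, 1, 4, 5, 2, 5, 4],
})

# Step 1: interaction matrix with movies as rows and users as columns.
interaction_data_pivot = ratings.pivot_table(
    index="title", columns="user_id", values="rating", fill_value=0
)

# Step 2: item-item cosine similarity between the movie rating vectors.
vectors = interaction_data_pivot.to_numpy(dtype=float)
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # avoid division by zero for movies with no ratings
unit_vectors = vectors / norms
item_similarity_matrix = pd.DataFrame(
    unit_vectors @ unit_vectors.T,
    index=interaction_data_pivot.index,
    columns=interaction_data_pivot.index,
)

# Step 3: rank the other movies by similarity to the input movie.
def get_movie_recommendations(title, similarity, top_n=3):
    if title not in similarity.index:
        return []  # unknown title: nothing to recommend
    scores = similarity[title].drop(labels=title)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()

print(get_movie_recommendations("A", item_similarity_matrix))
```

Filling missing ratings with zero treats "not rated" the same as "rated zero", which is a simplification; mean-centering each user's ratings before computing similarity is one alternative worth testing.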

<ins> **Findings**</ins> 

- Collaborative filtering excels in capturing user preferences by leveraging their interaction history.
- Recommendations are based on the similarity between movies and users' historical preferences.
- Well-suited for users with clear preferences and substantial interaction history.

#### Content-Based Approach:

<ins> **Steps:**</ins> 

1. Text preprocessing steps, including lowercasing, tokenization, removing stopwords, and stemming/lemmatization.
2. Feature extraction by converting text into numerical features (TF-IDF or Count Vectorizer).
3. Combined features and calculated similarity.
4. Implemented a function to get similar movies based on a given movie title.
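
As a rough sketch of steps 2 to 4, the snippet below builds TF-IDF vectors from a text column and ranks movies by cosine similarity. The toy catalogue, the `combined_features` column name, and the `get_similar_movies` signature are assumptions made for illustration; the text is assumed to be already preprocessed as in step 1.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue (hypothetical); the text column stands in for the combined features.
movies = pd.DataFrame({
    "title": ["Space Wars", "Galaxy Quest", "Love Story", "Romance in Paris"],
    "combined_features": [
        "space battle alien galaxy science fiction",
        "alien galaxy crew comedy science fiction",
        "romance love drama couple",
        "romance love paris couple comedy",
    ],
})

# Step 2: convert the text into TF-IDF features.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(movies["combined_features"])

# Step 3: pairwise cosine similarity between all movies.
content_similarity = cosine_similarity(tfidf_matrix)

# Step 4: look up the most similar titles for a given movie.
def get_similar_movies(title, data, similarity, top_n=2):
    matches = data.index[data["title"] == title]
    if len(matches) == 0:
        return []  # unknown title
    pos = data.index.get_loc(matches[0])
    ranked = similarity[pos].argsort()[::-1]              # best match first
    ranked = [i for i in ranked if i != pos][:top_n]      # skip the movie itself
    return data.iloc[ranked]["title"].tolist()

print(get_similar_movies("Space Wars", movies, content_similarity, top_n=1))
```

Swapping `TfidfVectorizer` for `CountVectorizer` changes only step 2; the similarity and lookup code stays the same.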

<ins> **Findings**</ins> 

- Content-based filtering relies on the inherent characteristics of movies (genre, director, cast, etc.).
- Recommendations are based on the similarity of content features.
- Effective for users with specific preferences for genres, directors, or actors/actresses.

#### K-Means Genre Cluster Approach:

<ins> **Steps:**</ins> 

1. Selected features for clustering (vote_average, popularity).
2. Fit KMeans on the original features with the correct number of clusters (19 in this case).
3. Plotted a scatter plot of the data points, with a unique color for each cluster.
4. Showcased centroids representing the average characteristics of each genre cluster.
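
A minimal sketch of the clustering steps follows. It uses synthetic data and two clusters so the result is easy to inspect; the notebook itself uses 19 clusters on the real `vote_average` and `popularity` columns. The synthetic values and the `toy` DataFrame are assumptions, and the scatter plot is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the two selected features (hypothetical values).
rng = np.random.default_rng(42)
toy = pd.DataFrame({
    "vote_average": np.concatenate([rng.normal(5.0, 0.3, 50), rng.normal(8.0, 0.3, 50)]),
    "popularity":   np.concatenate([rng.normal(20, 3, 50), rng.normal(80, 3, 50)]),
})

# Steps 1-2: select the features and fit KMeans on them as-is.
features = toy[["vote_average", "popularity"]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
toy["cluster"] = kmeans.fit_predict(features)

# Step 4: centroids are the average feature values of each cluster.
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=features.columns)
print(centroids)
```

Because `popularity` typically spans a much wider range than `vote_average`, distances on the raw features are dominated by popularity. Standardizing both columns first (for example with `sklearn.preprocessing.StandardScaler`) is an option to compare against, not something the original steps describe.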

<ins> **Findings**</ins> 

- K-Means clustering groups movies into clusters based on their vote averages and popularity scores.
- Each cluster represents a distinct genre based on similar characteristics.
- Useful for identifying clusters of movies with shared characteristics.

#### Comparative Analysis and Improvement Suggestions:

<ins> **Comparison Analysis:**</ins> 

- Collaborative-Based Approach: Effective for diverse user preferences and leveraging historical interactions.
- Content-Based Approach: Ideal for users with specific content preferences and focuses on movie characteristics.
- K-Means Genre Cluster Approach: Useful for identifying groups of movies with similar characteristics.

<ins> **Improvement Suggestion:**</ins> 

1. Consider hybrid models combining collaborative and content-based filtering for a more comprehensive approach.
2. Fine-tune clustering parameters for K-Means to enhance genre cluster accuracy.
3. Incorporate additional features for content-based filtering to improve recommendation precision.
4. Gather more user interaction data for collaborative filtering to enhance model accuracy.
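
For the second suggestion, one common way to compare cluster counts is the silhouette score. The sketch below runs on synthetic points with three obvious groups; the data, the range of `k` tried, and the choice of silhouette as the criterion are all assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic 2-D points around three centres (hypothetical stand-in for movie features).
rng = np.random.default_rng(0)
centres = np.array([[2.0, 2.0], [8.0, 2.0], [5.0, 8.0]])
points = np.vstack([c + rng.normal(0, 0.4, size=(40, 2)) for c in centres])

# Fit KMeans for several cluster counts and record the silhouette score of each.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real movie data the scores are usually much closer together, so the result is better read as a guide alongside domain knowledge than as a single correct answer.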

----

### 1.3.3 Hypothesis Testing



- Null Hypothesis: The accuracy of movie recommendations remains equivalent between collaborative filtering (user preference) and content-based filtering (advanced preference) models.

- Alternative Hypothesis: The accuracy of movie recommendations significantly differs between collaborative filtering (user preference) and content-based filtering (advanced preference) models.
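
The notebook does not state which statistical test was applied. One way such a comparison could be run is a paired permutation test on per-user accuracy scores, sketched below. The function name, the choice of test, and the score values are all assumptions; the numbers are placeholders, not results from this project.

```python
import numpy as np

def paired_permutation_test(scores_a, scores_b, n_permutations=10_000, seed=0):
    """Two-sided paired permutation test on the mean difference of two score lists."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(diffs.mean())
    # Under the null hypothesis the sign of each paired difference is arbitrary,
    # so we flip signs at random and recompute the mean difference.
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    # Small tolerance guards against floating-point ties; add-one keeps p > 0.
    extreme = np.sum(permuted >= observed - 1e-12)
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical per-user accuracy scores (placeholders only).
collab_scores = [0.62, 0.70, 0.58, 0.66, 0.71, 0.64, 0.69, 0.60]
content_scores = [0.55, 0.61, 0.57, 0.59, 0.63, 0.58, 0.60, 0.56]

p_value = paired_permutation_test(collab_scores, content_scores)
print(f"p-value: {p_value:.4f}")
```

A p-value below the chosen significance level (0.05 is a common convention) would lead to rejecting the null hypothesis that the two models are equally accurate.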

Conclusion: The hypothesis testing results reveal a difference in the accuracy of movie recommendations between the collaborative filtering (user preference) and content-based filtering (advanced preference) models. The collaborative-based approach is powerful for capturing user preferences, while the content-based approach excels in recommending movies based on inherent characteristics. The optional K-Means genre cluster approach offers insights into genre-based grouping but requires fine-tuning for optimal results.

In summary, the rejection of the null hypothesis suggests that the introduction of advanced preferences had a notable impact on the accuracy of movie recommendations. The collaborative-based approach excelled in capturing user preferences, while the content-based approach leveraged inherent movie characteristics for recommendations. These findings contribute valuable insights to guide further enhancements and optimizations in the movie recommendation system.

----

### 1.3.4 Conclusion and Future Recommendation


**Conclusion:**

Throughout this process, the development and evaluation of the movie recommendation system have uncovered valuable insights into the effectiveness of different approaches. The hypothesis testing revealed a difference in the accuracy of movie recommendations between the collaborative (user preference) and content-based (advanced preference) filtering models.

- Collaborative Filtering: Proved powerful in capturing user preferences, offering personalized recommendations based on historical interactions.
- Content-Based Filtering: Excelled in recommending movies based on inherent characteristics, providing diversity beyond user history.

The optional inclusion of the K-Means genre cluster approach showcased potential insights into genre-based grouping but requires further fine-tuning for optimal results.

**Future Recommendation**

1. Integration of Chatbot (Alpha Version):

- Objective: Enhance user experience and gather preferences seamlessly.
- Approach: Develop a version of a chatbot using Gensim models to understand and respond to user preferences and feedback.
- Benefits: Improved user engagement, real-time preference extraction, and expanded dataset for recommendation models.

2. Fine-Tuning of K-Means Genre Cluster Approach:

- Objective: Optimize genre-based clustering for more accurate genre-based recommendations.
- Approach: Experiment with different cluster counts, feature selections, and clustering algorithms to improve the grouping of movies.
- Benefits: Enhanced genre-specific recommendations, providing a complementary approach to user and content-based models.

3. Hybrid Model Integration:

- Objective: Combine strengths of collaborative and content-based models for a comprehensive recommendation system.
- Approach: Develop a hybrid model that intelligently blends collaborative and content-based recommendations.
- Benefits: Leverage the strengths of both models to overcome individual limitations, providing more robust and diverse recommendations.
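
One simple way to blend the two models is a weighted sum of their normalized scores. The sketch below is an assumption about how such a hybrid could look, not an implemented part of this project: the function names, the `alpha` weight, and the score values are hypothetical.

```python
import numpy as np

def min_max_scale(scores):
    """Rescale scores to [0, 1] so both models contribute on the same scale."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        return np.zeros_like(scores)  # all scores equal: no ranking information
    return (scores - scores.min()) / span

def hybrid_scores(collab_scores, content_scores, alpha=0.5):
    """Blend scores; alpha weights the collaborative model, 1 - alpha the content model."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * min_max_scale(collab_scores) + (1 - alpha) * min_max_scale(content_scores)

# Hypothetical similarity scores for four candidate movies from each model.
titles = ["A", "B", "C", "D"]
collab = [0.9, 0.2, 0.5, 0.1]
content = [0.1, 0.8, 0.6, 0.3]

blended = hybrid_scores(collab, content, alpha=0.5)
ranking = [titles[i] for i in np.argsort(blended)[::-1]]
print(ranking)
```

Lowering `alpha` for users with little interaction history would lean on content features when collaborative signals are weak, which is one way to address the cold-start limitation.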

**Final Note:**

The conclusion of this project marks a milestone in the development of a movie recommendation system and has outlined a path toward an even more sophisticated and user-centric one. The incorporation of a chatbot, refinement of existing models, and exploration of hybrid approaches promise an exciting path forward in delivering strong cinematic recommendations and truly creating a cinematic nexus.

**Chatbot Gensim Model (Alpha Version)**

In [28]:
# Functions for text preprocessing

def preprocess_text(text):
    """Lowercase, tokenize, drop stopwords and punctuation, then stem."""
    # Lowercasing
    text = text.lower()
    
    #tokenization
    tokens = word_tokenize(text)
    
    #removing stopwords and punctuation
    stop_words = set(stopwords.words('english'))
    tokens = [token for token in tokens if token.isalnum() and token not in stop_words]
    
    #Stemming
    stemmer = PorterStemmer()
    tokens = [stemmer.stem(token) for token in tokens]
    
    return ' '.join(tokens)

def get_wordnet_pos(word):
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"N": wordnet.NOUN, "V": wordnet.VERB, "R": wordnet.ADV, "J": wordnet.ADJ}
    return tag_dict.get(tag, wordnet.NOUN)

def lemmatize_text(text):
    
    if pd.isnull(text):  # Check for NaN values
        return ''
    
    lemmatizer = WordNetLemmatizer()
    stop_words = set(stopwords.words('english'))
    
    tokens = word_tokenize(text)
    filtered_tokens = [token.lower() for token in tokens if token.isalpha() and token.lower() not in stop_words]

    lemmatized_tokens = [lemmatizer.lemmatize(token, get_wordnet_pos(token)) for token in filtered_tokens]
    return ' '.join(lemmatized_tokens)


In [29]:
tmdb_df['preprocessed_overview'] = tmdb_df['overview'].apply(lemmatize_text)

In [30]:
documents = [TaggedDocument(words=doc.split(), 
                            tags=[str(i)]) for i, doc in enumerate(tmdb_df['preprocessed_overview'])]

In [31]:
#Train a Doc2Vec model
doc2vec_model = Doc2Vec(vector_size=100, window=5, min_count=1, workers=4, epochs=20)
doc2vec_model.build_vocab(documents)
doc2vec_model.train(documents, total_examples=doc2vec_model.corpus_count, epochs=doc2vec_model.epochs)

In [32]:
def recommend_based_on_user_input(user_input, model, data):
    """Rank every movie in `data` by Doc2Vec similarity to a free-text query."""
    # Preprocess the query the same way as the training overviews
    query_tokens = lemmatize_text(user_input).split()

    # Convert user input into a document embedding
    user_embedding = model.infer_vector(query_tokens)

    # Get similar movies using most_similar method (all documents, best match first)
    similar_movies = model.dv.most_similar([user_embedding], topn=len(model.dv))

    # Document tags are the row positions assigned when building `documents`
    similar_movie_indices = [int(idx) for idx, _ in similar_movies]

    # Get movie titles based on indices
    recommended_movies = data.iloc[similar_movie_indices]['title'].tolist()

    return recommended_movies

In [33]:
user_input = "japan"
recommended_movies = recommend_based_on_user_input(user_input, doc2vec_model, tmdb_df)
print(recommended_movies[:7])

['Detective Conan: Black Iron Submarine', 'The Happening', 'Dredd', 'The Man with the Golden Gun', 'RoboCop 3', 'Independence Day', 'U.S. Marshals']
