# Movie Recommendation System
### Content-Based Recommendation using NLP



## Problem Definition & Objective

With the increasing number of movies available on digital platforms, users often find it difficult to discover content that matches their interests. Simple popularity-based recommendations do not provide personalization.

**Objective:**  
To design a **content-based movie recommendation system** that suggests similar movies using storytelling elements and genre similarity, without relying on user ratings or historical data.

---

## Selected Project Track

- **Track:** Recommendation Systems  
- **Approach:** Content-Based Filtering  
- **Domain:** Entertainment / Movies  

---

## Problem Statement

The challenge is to recommend relevant movies to users in the absence of user interaction data. The system must generate meaningful recommendations using only movie metadata while remaining transparent and explainable.

---

## Real-World Relevance & Motivation

Recommendation systems are widely used by platforms such as Netflix and Amazon Prime to improve user engagement.  
This project addresses the **cold-start problem** by using movie content instead of user behavior, making it suitable for real-world deployment where user data may be unavailable.

---

## Data Understanding & Preparation

- **Dataset:** TMDB 5000 Movie Dataset  
- **Source:** Public TMDB metadata  

**Features Used:**
- Movie title
- Overview
- Genres
- Keywords
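- Cast and crew (the top three cast members and the director are extracted during preprocessing, but neither recommendation mode currently uses them)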

**Preprocessing Steps:**
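- Merged the movie and credits datasets
- Dropped rows with missing values in the selected columns
- Parsed the JSON-encoded `genres`, `keywords`, `cast`, and `crew` columns into lists of names, and split each overview into whitespace-separated tokens
- Joined the relevant features into one text string per movie for each recommendation mode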

---

## Model / System Design

The system uses a **content-based recommendation approach** with Natural Language Processing.

### Recommendation Modes
- **Storytelling-Based:**  
  Uses movie overview and keywords to capture narrative similarity.
- **Genre-Based:**  
  Uses genre information to recommend movies within similar categories.

**Similarity Metric:** Cosine Similarity  
**Vectorization Technique:** Bag-of-Words
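
As a minimal sketch of how these two pieces fit together, the snippet below builds term-count vectors for three made-up tag strings and compares them with cosine similarity, that is, the dot product of two vectors divided by the product of their lengths. The strings and variable names are illustrative assumptions, not data from the TMDB set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy tag strings standing in for real movie text (illustrative only).
docs = [
    "space marine alien planet",
    "alien planet war",
    "romantic comedy wedding",
]

# Bag-of-words: each column is a vocabulary term, each cell a term count.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# Pairwise cosine similarity between all rows, returned as a dense (3, 3) array.
similarity = cosine_similarity(counts)

# Documents 0 and 1 share "alien" and "planet": 2 / (2 * sqrt(3)), about 0.577.
# Document 2 shares no terms with the others, so its off-diagonal scores are 0.
print(similarity.round(3))
```

Because the vectors hold non-negative counts, every score falls between 0 (no shared terms) and 1 (identical term proportions).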

---

## Core Implementation

- Text data is converted into numerical vectors using `CountVectorizer`
- Cosine similarity is computed between movie vectors
- Separate similarity matrices are created for storytelling and genre modes
- Top-N similar movies are selected and displayed
- A Streamlit interface is used for interaction and visualization
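
The top-N selection step is not shown in the notebook cells, so here is one way it could look. This is a sketch under assumptions: the function name `recommend`, its parameters, and the toy 4x4 matrix are hypothetical, and a real call would pass the movie titles and one of the precomputed similarity matrices instead.

```python
import numpy as np

def recommend(title, titles, similarity, top_n=5):
    """Return (title, score) pairs for the top_n movies most similar to `title`."""
    if title not in titles:
        raise KeyError(f"Unknown title: {title!r}")
    # list.index returns the first match; duplicate titles would need an id lookup.
    idx = titles.index(title)
    scores = np.asarray(similarity[idx], dtype=float)
    # Sort by descending score; a stable sort keeps ties in their original order.
    order = np.argsort(-scores, kind="stable")
    # Skip the query movie itself, which always scores highest against itself.
    ranked = [i for i in order if i != idx][:top_n]
    return [(titles[i], float(scores[i])) for i in ranked]

# Toy data: four placeholder movies and a symmetric similarity matrix.
titles = ["A", "B", "C", "D"]
similarity = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

top = recommend("A", titles, similarity, top_n=2)
print(top)  # [('C', 0.9), ('D', 0.5)]
```

A front end such as the Streamlit interface could call a function of this shape once per mode, passing the storytelling matrix or the genre matrix as `similarity`.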

---

## Evaluation & Analysis

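Offline accuracy metrics such as precision@k or RMSE require ground-truth ratings or interaction logs. This project deliberately uses neither, so those metrics cannot be computed here.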

**Evaluation Approach:**
- Qualitative evaluation of recommendation relevance
- Comparison of storytelling-based vs genre-based results
- Visual validation through the Streamlit user interface
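
One way to make the storytelling-versus-genre comparison slightly more concrete is to measure how much the two modes' top-N lists overlap for the same query movie. The sketch below uses Jaccard overlap; the helper name and the placeholder titles are assumptions for illustration, not results produced by this system.

```python
def jaccard_overlap(list_a, list_b):
    """Share of distinct items that appear in both lists (0.0 to 1.0)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty lists: define the overlap as zero
    return len(set_a & set_b) / len(union)

# Placeholder recommendation lists (made up, not actual system output).
story_recs = ["Movie A", "Movie B", "Movie C"]
genre_recs = ["Movie A", "Movie D", "Movie C"]

overlap = jaccard_overlap(story_recs, genre_recs)
print(overlap)  # 2 shared titles out of 4 distinct ones -> 0.5
```

A low overlap would suggest the two modes surface genuinely different movies, while a high overlap would suggest they are largely redundant for that query.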

On manual inspection, the recommendations are thematically consistent with the query movie. This is a qualitative judgment and has not been measured quantitatively.

---

## Ethical Considerations & Responsible AI

- No personal or sensitive user data is collected
- Recommendations are explainable and transparent: each similarity score comes directly from the terms two movies share
- Dataset bias toward popular or English-language movies is acknowledged
- The system is intended to assist users, not replace human judgment

---

## Conclusion & Future Scope

**Conclusion:**  
The project successfully implements a content-based movie recommendation system using NLP and cosine similarity, providing explainable and meaningful recommendations through a dual-mode approach.

**Future Scope:**
- Incorporate collaborative filtering
- Include user feedback and ratings
- Improve recommendation diversity
- Deploy as a scalable web application




In [33]:
import numpy as np
import pandas as pd
import ast
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


In [34]:
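# Load the TMDB 5000 movie metadata and the matching credits file.
movies = pd.read_csv("tmdb_5000_movies.csv")
credits = pd.read_csv("tmdb_5000_credits.csv")

# Join on the TMDB id rather than the title: titles are not guaranteed to be
# unique, and a title join cross-matches every pair of same-titled films.
movies = movies.merge(credits.drop(columns="title"), left_on="id", right_on="movie_id")
movies.head()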

| | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | ... | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | movie_id | cast | crew |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | `[{"id": 28, "name": "Action"}, {"id": 12, "nam...` | http://www.avatarmovie.com/ | 19995 | `[{"id": 1463, "name": "culture clash"}, {"id":...` | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | `[{"name": "Ingenious Film Partners", "id": 289...` | ... | 162.0 | `[{"iso_639_1": "en", "name": "English"}, {"iso...` | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 | 19995 | `[{"cast_id": 242, "character": "Jake Sully", "...` | `[{"credit_id": "52fe48009251416c750aca23", "de...` |
| 1 | 300000000 | `[{"id": 12, "name": "Adventure"}, {"id": 14, "...` | http://disney.go.com/disneypictures/pirates/ | 285 | `[{"id": 270, "name": "ocean"}, {"id": 726, "na...` | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | `[{"name": "Walt Disney Pictures", "id": 2}, {"...` | ... | 169.0 | `[{"iso_639_1": "en", "name": "English"}]` | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 | 285 | `[{"cast_id": 4, "character": "Captain Jack Spa...` | `[{"credit_id": "52fe4232c3a36847f800b579", "de...` |
| 2 | 245000000 | `[{"id": 28, "name": "Action"}, {"id": 12, "nam...` | http://www.sonypictures.com/movies/spectre/ | 206647 | `[{"id": 470, "name": "spy"}, {"id": 818, "name...` | en | Spectre | A cryptic message from Bond’s past sends him o... | 107.376788 | `[{"name": "Columbia Pictures", "id": 5}, {"nam...` | ... | 148.0 | `[{"iso_639_1": "fr", "name": "Fran\u00e7ais"},...` | Released | A Plan No One Escapes | Spectre | 6.3 | 4466 | 206647 | `[{"cast_id": 1, "character": "James Bond", "cr...` | `[{"credit_id": "54805967c3a36829b5002c41", "de...` |
| 3 | 250000000 | `[{"id": 28, "name": "Action"}, {"id": 80, "nam...` | http://www.thedarkknightrises.com/ | 49026 | `[{"id": 849, "name": "dc comics"}, {"id": 853,...` | en | The Dark Knight Rises | Following the death of District Attorney Harve... | 112.31295 | `[{"name": "Legendary Pictures", "id": 923}, {"...` | ... | 165.0 | `[{"iso_639_1": "en", "name": "English"}]` | Released | The Legend Ends | The Dark Knight Rises | 7.6 | 9106 | 49026 | `[{"cast_id": 2, "character": "Bruce Wayne / Ba...` | `[{"credit_id": "52fe4781c3a36847f81398c3", "de...` |
| 4 | 260000000 | `[{"id": 28, "name": "Action"}, {"id": 12, "nam...` | http://movies.disney.com/john-carter | 49529 | `[{"id": 818, "name": "based on novel"}, {"id":...` | en | John Carter | John Carter is a war-weary, former military ca... | 43.926995 | `[{"name": "Walt Disney Pictures", "id": 2}]` | ... | 132.0 | `[{"iso_639_1": "en", "name": "English"}]` | Released | Lost in our world, found in another. | John Carter | 6.1 | 2124 | 49529 | `[{"cast_id": 5, "character": "John Carter", "c...` | `[{"credit_id": "52fe479ac3a36847f813eaa3", "de...` |


In [24]:
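# Keep only the columns used downstream, drop incomplete rows, and reset the
# index so row positions line up with the similarity matrices built later.
movies = movies[['movie_id','title','overview','genres','keywords','cast','crew']]
movies = movies.dropna().reset_index(drop=True)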

In [25]:
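def convert(text):
    """Parse a JSON-like list of dicts and return every 'name' value."""
    return [i['name'] for i in ast.literal_eval(text)]

def get_cast(text):
    """Return the names of the first three listed cast members."""
    return [i['name'] for i in ast.literal_eval(text)[:3]]

def get_director(text):
    """Return the first crew member whose job is 'Director', as a one-item list."""
    for i in ast.literal_eval(text):
        if i['job'] == 'Director':
            return [i['name']]
    return []  # no director listed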

In [26]:
# Rows with a missing overview were already dropped, so no fillna is needed.
movies['overview'] = movies['overview'].apply(lambda x: x.split())
movies['genres'] = movies['genres'].apply(convert)
movies['keywords'] = movies['keywords'].apply(convert)
# cast and crew are parsed here but are not used in either tag set below.
movies['cast'] = movies['cast'].apply(get_cast)
movies['crew'] = movies['crew'].apply(get_director)


In [27]:
movies['story_tags'] = movies['overview'] + movies['keywords']
movies['story_tags'] = movies['story_tags'].apply(lambda x: " ".join(x))

In [28]:
movies['genre_tags'] = movies['genres'].apply(lambda x: " ".join(x))


In [29]:
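# Bag-of-words over the 5000 most frequent terms, excluding English stop words.
cv_story = CountVectorizer(max_features=5000, stop_words='english')
story_vectors = cv_story.fit_transform(movies['story_tags'])  # kept sparse

# cosine_similarity accepts sparse input and returns a dense (n, n) array.
similarity_story = cosine_similarity(story_vectors)

with open("similarity_story.pkl", "wb") as f:
    pickle.dump(similarity_story, f)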

In [31]:
# Multi-word genres such as "Science Fiction" are split into separate tokens.
cv_genre = CountVectorizer(max_features=1000, stop_words='english')
genre_vectors = cv_genre.fit_transform(movies['genre_tags'])  # kept sparse

similarity_genre = cosine_similarity(genre_vectors)

with open("similarity_genre.pkl", "wb") as f:
    pickle.dump(similarity_genre, f)


In [32]:
# Record order matches the similarity matrices: list position i is matrix row i.
movie_dict = movies[['movie_id','title']].to_dict(orient='records')
with open("movie_dict.pkl", "wb") as f:
    pickle.dump(movie_dict, f)
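
The three pickle files written above are presumably what the interface loads at start-up. As a hedged sketch of that step, the helpers below read them back into memory; the function names `load_pickle` and `load_artifacts` and the returned dictionary keys are assumptions, and only the file names come from the cells above.

```python
import pickle
from pathlib import Path

def load_pickle(path):
    """Read one pickled object, closing the file afterwards."""
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as fh:
        return pickle.load(fh)

def load_artifacts(directory="."):
    """Load the movie list and both similarity matrices from `directory`."""
    base = Path(directory)
    return {
        "movies": load_pickle(base / "movie_dict.pkl"),        # list of {movie_id, title}
        "story": load_pickle(base / "similarity_story.pkl"),   # storytelling-mode matrix
        "genre": load_pickle(base / "similarity_genre.pkl"),   # genre-mode matrix
    }
```

After loading, `artifacts["movies"][i]` and row `i` of either matrix refer to the same movie, since both were written from the same DataFrame order.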