# Movie Recommendation System

##### This project uses concepts of Singular Value Decomposition to create a mock version of a movie recommendation system.

Initially, data is provided for **51 movies** and **50 different users**.  
A rating of **0** indicates that the user has not seen the movie yet, while all other values represent the ratings.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import pprint

In [3]:
data = {
    'The Shawshank Redemption': [4, 5, 0, 5, 0, 4, 0, 2, 3, 0, 4, 3, 4, 5, 1, 3, 2, 5, 4, 1, 0, 0, 3, 2, 5, 4, 0, 3, 3, 2, 3, 3, 5, 4, 0, 0, 2, 2, 5, 5, 3, 0, 2, 0, 1, 0, 5, 2, 1, 5],
    'The Godfather': [1, 2, 4, 1, 4, 4, 2, 3, 1, 3, 1, 3, 1, 3, 5, 3, 2, 1, 5, 1, 2, 2, 3, 3, 4, 0, 2, 0, 0, 3, 5, 1, 0, 2, 5, 3, 5, 5, 1, 0, 1, 1, 4, 2, 1, 3, 3, 5, 0, 2],
    'The Dark Knight': [0, 1, 5, 4, 4, 3, 1, 0, 3, 2, 2, 3, 4, 4, 4, 5, 1, 1, 3, 5, 3, 5, 3, 4, 1, 3, 5, 1, 5, 4, 0, 4, 1, 5, 3, 1, 4, 5, 1, 1, 3, 1, 3, 2, 4, 5, 5, 1, 5, 0],
    'Schindler\'s List': [0, 2, 0, 3, 4, 4, 2, 1, 2, 2, 2, 2, 0, 4, 5, 0, 5, 2, 2, 1, 3, 3, 2, 1, 0, 0, 3, 3, 5, 0, 3, 5, 3, 1, 1, 0, 5, 0, 0, 4, 3, 1, 1, 3, 2, 5, 0, 1, 3, 2],
    'Pulp Fiction': [5, 4, 3, 4, 1, 3, 2, 5, 4, 4, 2, 4, 4, 0, 1, 2, 0, 2, 3, 5, 1, 2, 4, 4, 4, 1, 2, 5, 5, 5, 4, 5, 4, 3, 1, 3, 0, 0, 1, 5, 2, 3, 0, 5, 0, 5, 1, 4, 4, 3],
    'The Lord of the Rings: The Return of the King': [2, 5, 2, 4, 5, 3, 3, 5, 1, 3, 2, 1, 1, 1, 4, 0, 1, 1, 5, 5, 3, 0, 0, 4, 4, 0, 5, 0, 0, 2, 3, 1, 5, 3, 2, 5, 0, 4, 5, 0, 1, 0, 4, 0, 2, 5, 3, 2, 5, 5],
    'Forrest Gump': [1, 0, 2, 2, 3, 5, 4, 4, 1, 3, 4, 3, 3, 5, 4, 1, 4, 2, 0, 1, 0, 1, 1, 0, 1, 3, 1, 1, 1, 1, 4, 1, 1, 2, 1, 4, 4, 3, 3, 2, 5, 0, 0, 4, 0, 0, 4, 2, 0, 5],
    'Fight Club': [2, 5, 2, 1, 4, 0, 2, 0, 3, 2, 2, 1, 0, 3, 1, 3, 0, 0, 5, 1, 4, 3, 4, 4, 0, 1, 0, 0, 2, 4, 1, 0, 1, 1, 2, 4, 3, 5, 4, 5, 1, 4, 3, 2, 5, 5, 2, 2, 2, 4],
    'Inception': [1, 0, 2, 2, 5, 3, 2, 4, 0, 3, 1, 3, 4, 4, 2, 5, 2, 5, 3, 1, 1, 5, 5, 0, 1, 4, 5, 2, 4, 3, 4, 1, 2, 4, 2, 4, 4, 4, 2, 5, 0, 5, 4, 0, 5, 5, 0, 4, 3, 1],
    'The Matrix': [1, 0, 2, 4, 1, 4, 1, 2, 3, 5, 0, 5, 1, 1, 1, 1, 3, 4, 2, 1, 4, 0, 0, 5, 2, 5, 2, 0, 3, 1, 4, 0, 3, 4, 2, 4, 4, 1, 1, 3, 0, 5, 5, 5, 2, 4, 0, 5, 2, 2],
    'The Lord of the Rings: The Fellowship of the Ring': [3, 2, 1, 5, 0, 0, 2, 3, 3, 1, 5, 3, 4, 4, 1, 3, 3, 3, 3, 0, 2, 5, 5, 2, 1, 5, 4, 3, 2, 0, 2, 0, 4, 1, 0, 0, 4, 3, 4, 2, 1, 1, 5, 1, 3, 4, 0, 4, 0, 1],
    'Star Wars: Episode V - The Empire Strikes Back': [5, 5, 3, 0, 3, 0, 0, 0, 1, 5, 3, 4, 5, 4, 2, 0, 3, 3, 0, 3, 5, 3, 3, 3, 0, 0, 0, 2, 2, 0, 0, 1, 3, 4, 0, 3, 1, 0, 3, 4, 1, 5, 2, 4, 3, 5, 0, 5, 1, 2],
    'The Lord of the Rings: The Two Towers': [1, 1, 4, 3, 2, 3, 3, 2, 1, 4, 4, 5, 4, 0, 2, 3, 0, 4, 5, 4, 2, 4, 5, 5, 5, 3, 3, 4, 3, 1, 0, 0, 0, 5, 0, 3, 5, 3, 5, 3, 5, 1, 1, 0, 4, 1, 3, 4, 3, 0],
    'Goodfellas': [4, 1, 3, 1, 5, 5, 0, 2, 2, 4, 1, 1, 4, 0, 5, 2, 5, 1, 3, 2, 3, 0, 5, 4, 3, 5, 3, 3, 2, 5, 5, 0, 1, 4, 0, 3, 1, 3, 4, 2, 2, 5, 5, 0, 4, 0, 0, 3, 4, 1],
    'The Godfather Part II': [1, 3, 5, 0, 5, 1, 3, 2, 0, 1, 4, 5, 1, 1, 2, 2, 3, 0, 2, 3, 2, 2, 3, 3, 3, 3, 1, 4, 4, 0, 4, 0, 5, 5, 2, 2, 4, 2, 4, 1, 4, 3, 1, 2, 5, 0, 2, 5, 2, 3],
    'The Avengers: Endgame': [0, 5, 2, 1, 0, 0, 5, 2, 4, 4, 2, 1, 1, 4, 0, 2, 5, 1, 0, 5, 0, 2, 1, 1, 5, 3, 0, 4, 3, 3, 4, 4, 2, 5, 0, 4, 1, 2, 3, 1, 5, 0, 5, 0, 1, 0, 2, 2, 4, 1],
    'The Silence of the Lambs': [3, 2, 5, 5, 5, 4, 2, 3, 0, 1, 4, 2, 2, 5, 2, 1, 3, 1, 3, 2, 1, 0, 1, 4, 2, 1, 4, 1, 5, 3, 0, 2, 0, 1, 3, 5, 5, 1, 1, 5, 5, 4, 4, 1, 3, 5, 5, 4, 0, 5],
    'Interstellar': [3, 0, 2, 4, 1, 4, 5, 3, 3, 0, 1, 2, 1, 0, 2, 3, 3, 1, 3, 0, 4, 3, 4, 3, 4, 2, 1, 5, 4, 1, 5, 3, 5, 0, 0, 4, 5, 0, 3, 5, 5, 4, 1, 1, 4, 4, 5, 5, 4, 2],
    'Parasite': [3, 2, 4, 4, 1, 4, 3, 1, 0, 4, 3, 0, 4, 5, 4, 5, 2, 1, 2, 1, 5, 5, 5, 3, 0, 5, 0, 3, 2, 5, 3, 5, 1, 5, 2, 4, 3, 5, 0, 1, 2, 3, 0, 2, 1, 3, 1, 3, 2, 5],
    'Saving Private Ryan': [3, 4, 3, 1, 3, 0, 2, 3, 3, 3, 5, 1, 5, 2, 2, 3, 1, 4, 4, 5, 4, 3, 2, 2, 4, 3, 5, 2, 4, 3, 3, 2, 4, 5, 1, 1, 5, 2, 2, 4, 3, 5, 1, 2, 1, 1, 0, 4, 0, 3],
    'The Green Mile': [1, 1, 1, 0, 5, 5, 2, 0, 1, 3, 4, 1, 4, 5, 5, 3, 4, 4, 1, 1, 0, 2, 3, 1, 5, 0, 3, 0, 4, 5, 5, 4, 5, 2, 0, 4, 5, 2, 1, 4, 4, 3, 0, 5, 0, 5, 2, 0, 1, 0],
    'Se7en': [2, 1, 3, 3, 5, 3, 2, 2, 5, 0, 4, 0, 3, 2, 1, 1, 3, 3, 3, 5, 0, 0, 0, 2, 3, 0, 2, 4, 2, 2, 5, 1, 4, 3, 1, 2, 3, 0, 1, 2, 4, 5, 5, 5, 2, 3, 4, 3, 1, 4],
    'Joker': [4, 0, 3, 0, 4, 2, 4, 0, 3, 2, 0, 1, 2, 0, 2, 1, 2, 0, 0, 4, 0, 1, 2, 5, 1, 2, 0, 4, 3, 2, 2, 5, 3, 1, 1, 0, 3, 5, 2, 0, 4, 4, 1, 0, 2, 4, 3, 2, 0, 3],
    'Gladiator': [3, 3, 3, 3, 0, 1, 0, 3, 1, 2, 5, 0, 1, 3, 5, 2, 5, 3, 0, 1, 4, 4, 0, 4, 3, 2, 1, 3, 4, 1, 3, 2, 2, 3, 1, 5, 5, 5, 4, 3, 1, 3, 1, 0, 2, 1, 3, 4, 3, 5],
    'Avengers: Infinity War': [5, 2, 5, 5, 4, 4, 3, 4, 5, 4, 3, 1, 5, 5, 4, 5, 1, 5, 3, 1, 1, 4, 5, 2, 3, 1, 4, 0, 2, 2, 1, 3, 4, 4, 5, 1, 0, 2, 4, 4, 1, 0, 2, 0, 5, 4, 3, 2, 3, 1],
    'Titanic': [3, 1, 2, 5, 1, 1, 1, 1, 1, 5, 3, 3, 0, 1, 5, 2, 3, 3, 5, 5, 1, 2, 5, 0, 1, 0, 2, 2, 3, 1, 0, 1, 3, 4, 5, 0, 0, 2, 3, 3, 3, 5, 0, 3, 3, 5, 4, 0, 2, 5],
    'The Departed': [0, 1, 5, 4, 4, 1, 2, 1, 4, 3, 0, 5, 5, 2, 2, 2, 0, 2, 0, 0, 3, 5, 2, 4, 3, 0, 5, 0, 4, 1, 0, 3, 0, 1, 1, 5, 4, 0, 1, 3, 1, 4, 0, 2, 4, 4, 3, 4, 0, 4],
    'Whiplash': [0, 0, 0, 1, 4, 4, 4, 4, 4, 0, 2, 5, 0, 4, 3, 2, 2, 4, 5, 5, 4, 1, 5, 0, 5, 3, 3, 3, 1, 3, 5, 2, 4, 2, 5, 5, 4, 5, 5, 4, 2, 4, 1, 2, 1, 4, 3, 2, 4, 5],
    'The Prestige': [1, 3, 3, 1, 5, 5, 4, 5, 5, 4, 0, 5, 5, 0, 1, 4, 1, 1, 1, 0, 2, 4, 4, 5, 1, 2, 3, 5, 5, 4, 4, 3, 2, 2, 3, 3, 3, 0, 3, 4, 3, 3, 0, 3, 0, 5, 0, 0, 5, 0],
    'Django Unchained': [0, 4, 3, 0, 2, 0, 5, 4, 0, 5, 5, 2, 3, 2, 4, 1, 5, 3, 4, 3, 5, 0, 3, 2, 3, 2, 4, 5, 1, 0, 3, 4, 3, 1, 4, 2, 1, 4, 2, 3, 0, 5, 5, 1, 3, 5, 1, 4, 4, 2],
    'The Lion King': [3, 2, 1, 2, 0, 5, 5, 3, 1, 5, 4, 3, 0, 5, 1, 0, 4, 1, 5, 2, 2, 5, 5, 0, 1, 0, 5, 4, 5, 5, 1, 3, 5, 0, 3, 5, 1, 0, 3, 0, 1, 1, 4, 0, 3, 3, 2, 1, 4, 5],
    'The Social Network': [3, 5, 5, 4, 2, 4, 3, 0, 2, 5, 4, 2, 0, 5, 5, 1, 4, 4, 3, 3, 3, 4, 5, 4, 5, 5, 0, 3, 3, 2, 1, 4, 2, 4, 1, 5, 3, 1, 3, 3, 5, 0, 0, 3, 5, 4, 1, 2, 0, 1],
    'The Truman Show': [5, 1, 5, 4, 1, 2, 0, 0, 5, 5, 2, 1, 0, 4, 5, 1, 3, 3, 0, 4, 3, 4, 3, 0, 5, 4, 4, 2, 1, 2, 0, 5, 3, 2, 1, 2, 5, 5, 5, 3, 4, 4, 2, 3, 4, 3, 3, 1, 2, 4],
    'The Grand Budapest Hotel': [0, 5, 2, 1, 3, 2, 0, 4, 0, 2, 0, 4, 4, 0, 0, 1, 5, 4, 0, 2, 4, 5, 5, 2, 2, 1, 3, 0, 1, 0, 2, 1, 0, 5, 2, 4, 2, 1, 4, 0, 1, 1, 0, 0, 3, 4, 4, 2, 1, 0],
    'A Beautiful Mind': [0, 3, 4, 4, 0, 1, 3, 3, 5, 0, 5, 5, 2, 3, 3, 2, 3, 3, 1, 4, 0, 5, 3, 3, 3, 3, 1, 4, 3, 1, 0, 5, 3, 2, 2, 0, 2, 0, 4, 1, 4, 2, 1, 3, 5, 3, 3, 2, 3, 1],
    'The Wolf of Wall Street': [4, 2, 0, 4, 5, 4, 2, 2, 1, 1, 4, 4, 3, 4, 0, 1, 2, 4, 5, 5, 1, 2, 1, 5, 0, 3, 4, 3, 5, 3, 5, 3, 5, 5, 1, 4, 5, 1, 1, 3, 5, 4, 5, 2, 5, 4, 0, 0, 2, 0],
    'Mad Max: Fury Road': [5, 1, 4, 2, 2, 3, 5, 3, 0, 3, 4, 4, 5, 1, 5, 3, 1, 1, 3, 3, 1, 1, 5, 3, 1, 5, 2, 5, 5, 3, 5, 4, 4, 5, 5, 0, 1, 5, 0, 1, 2, 1, 3, 2, 4, 5, 1, 3, 2, 5],
    'Spider-Man: Into the Spider-Verse': [0, 4, 3, 3, 2, 1, 2, 5, 2, 1, 0, 5, 4, 1, 3, 2, 1, 2, 0, 1, 5, 0, 1, 2, 4, 2, 2, 5, 0, 3, 1, 2, 1, 3, 4, 5, 5, 5, 0, 5, 3, 4, 0, 5, 0, 2, 5, 2, 5, 1],
    'La La Land': [1, 0, 2, 1, 1, 5, 0, 4, 3, 1, 4, 2, 2, 1, 3, 1, 3, 4, 1, 0, 5, 0, 4, 0, 4, 5, 5, 1, 3, 1, 4, 5, 4, 5, 1, 3, 3, 2, 0, 2, 3, 0, 3, 0, 2, 5, 3, 3, 1, 1],
    'Toy Story 3': [0, 4, 5, 1, 4, 3, 1, 3, 1, 0, 1, 4, 2, 1, 2, 1, 3, 2, 1, 1, 1, 2, 4, 2, 4, 3, 0, 0, 3, 2, 0, 1, 3, 2, 1, 2, 3, 4, 3, 1, 5, 5, 4, 0, 2, 1, 5, 3, 2, 5],
    'Coco': [4, 0, 0, 4, 2, 0, 5, 1, 4, 1, 1, 4, 3, 0, 3, 4, 4, 0, 5, 0, 0, 3, 0, 0, 3, 5, 4, 2, 0, 4, 5, 3, 3, 4, 1, 3, 3, 0, 3, 5, 0, 4, 3, 1, 4, 3, 1, 5, 4, 1],
    'WALL-E': [2, 0, 3, 5, 5, 2, 2, 0, 0, 2, 0, 4, 3, 0, 4, 5, 1, 2, 2, 5, 5, 5, 1, 3, 3, 4, 5, 4, 1, 3, 0, 4, 2, 2, 1, 5, 4, 5, 1, 2, 1, 4, 3, 4, 0, 5, 3, 1, 0, 0],
    'The Incredibles': [1, 2, 5, 4, 3, 2, 3, 2, 2, 4, 4, 2, 3, 1, 1, 2, 3, 0, 0, 0, 1, 4, 4, 2, 0, 3, 3, 2, 5, 1, 4, 4, 1, 5, 0, 0, 3, 4, 2, 3, 3, 1, 1, 5, 1, 3, 0, 2, 3, 3],
    'Up': [3, 3, 2, 4, 5, 4, 0, 1, 1, 1, 0, 5, 2, 1, 5, 4, 5, 2, 4, 5, 1, 3, 4, 1, 5, 1, 3, 2, 1, 4, 2, 4, 5, 5, 1, 3, 0, 1, 2, 4, 5, 0, 2, 2, 2, 2, 5, 3, 1, 3],
    'Finding Nemo': [5, 5, 5, 4, 0, 5, 5, 2, 4, 3, 5, 5, 4, 2, 2, 3, 4, 3, 0, 0, 1, 1, 5, 0, 5, 0, 3, 3, 1, 4, 3, 3, 2, 3, 3, 4, 5, 1, 3, 5, 0, 0, 5, 1, 0, 1, 3, 3, 4, 2],
    'Ratatouille': [1, 4, 0, 0, 5, 1, 1, 2, 2, 0, 5, 3, 1, 3, 0, 5, 3, 0, 4, 1, 4, 5, 4, 5, 5, 4, 5, 1, 1, 3, 1, 2, 5, 5, 1, 2, 0, 5, 2, 2, 1, 1, 1, 0, 3, 2, 0, 3, 0, 2],
    'Shrek': [3, 3, 4, 1, 0, 0, 4, 5, 0, 5, 2, 5, 0, 4, 0, 0, 1, 2, 1, 5, 4, 1, 2, 4, 2, 5, 2, 5, 5, 1, 2, 4, 5, 5, 0, 5, 3, 4, 4, 3, 1, 3, 2, 2, 2, 4, 4, 2, 5, 4],
    'Frozen': [0, 3, 4, 0, 0, 3, 5, 4, 5, 0, 3, 3, 5, 3, 0, 1, 1, 0, 3, 4, 0, 2, 1, 5, 1, 1, 4, 1, 1, 4, 4, 1, 4, 1, 5, 1, 5, 2, 0, 2, 0, 5, 2, 3, 2, 3, 1, 3, 1, 0],
    'Inside Out': [1, 1, 1, 0, 1, 4, 1, 2, 1, 2, 4, 0, 0, 3, 4, 3, 2, 1, 1, 3, 0, 3, 0, 1, 0, 3, 1, 0, 4, 5, 5, 3, 2, 0, 0, 4, 0, 3, 0, 3, 3, 3, 0, 3, 0, 4, 0, 3, 3, 5],
    'The Lego Movie': [4, 5, 2, 4, 0, 2, 4, 5, 3, 2, 1, 3, 2, 4, 3, 1, 2, 4, 4, 0, 3, 3, 1, 5, 1, 1, 3, 0, 1, 1, 2, 4, 2, 3, 3, 5, 2, 1, 3, 2, 4, 5, 4, 5, 2, 4, 5, 4, 3, 2],
    'Moana': [0, 5, 2, 5, 5, 1, 5, 1, 2, 5, 4, 5, 3, 0, 1, 4, 5, 1, 5, 3, 5, 3, 5, 5, 4, 4, 0, 2, 0, 1, 5, 2, 3, 2, 3, 0, 0, 2, 5, 4, 2, 5, 0, 0, 2, 4, 5, 2, 1, 2],
}

In [5]:
all_users = [
    f"User {i+1}" for i in range(len(next(iter(data.values()))))
]

df = pd.DataFrame(data, index=all_users)

In [7]:
# applying the SVD concept
U, S, Vt = np.linalg.svd(df.fillna(0), full_matrices=False)

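> U has one row per user and one column per latent feature. Each row describes how strongly that user relates to each feature, and the columns of U are orthonormal vectors spanning the column space of the ratings matrix.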

In [64]:
# print(U) # for 50 users

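> S contains the singular values, sorted from largest to smallest. The larger the singular value, the more important the corresponding "feature". NumPy returns S as a 1-D array with one value per latent feature; it is converted to a diagonal matrix further down before being used in the reconstruction.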

In [67]:
# print(S) # singular values, largest first

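> Vt relates the latent features to the movies: each row of Vt is a latent feature and each column corresponds to a movie. Its rows span the row space of the ratings matrix.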

In [70]:
# print(Vt) # rows: latent features, columns: movies
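
To make the roles of the three factors concrete, the sketch below runs the same `np.linalg.svd` call on a random stand-in matrix with the same shape as the ratings table (50 users by 51 movies, counting the entries of `data`). The names `ratings`, `U_demo`, `S_demo` and `Vt_demo` are hypothetical and used only for illustration.

```python
import numpy as np

# Hypothetical stand-in for the ratings table: 50 users (rows) x 51 movies (columns).
rng = np.random.default_rng(0)
ratings = rng.integers(0, 6, size=(50, 51)).astype(float)

# Same call as in the notebook: full_matrices=False gives the "thin" SVD.
U_demo, S_demo, Vt_demo = np.linalg.svd(ratings, full_matrices=False)

print(U_demo.shape)   # (50, 50): one row per user, one column per latent feature
print(S_demo.shape)   # (50,): singular values as a 1-D array, largest first
print(Vt_demo.shape)  # (50, 51): one row per latent feature, one column per movie
```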

In [18]:
# conversion to diagonal matrix
S_matrix = np.diag(S)

### Relationship of S with A^T.A

In [21]:
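# computing eigenvalues of A^T.A
# A^T.A is symmetric, so eigvalsh applies; it returns real eigenvalues in ascending order
eigenvalues = np.linalg.eigvalsh(df.T @ df)[::-1]  # reversed to match S (largest first)
squared_S = S**2

# zip stops at the shorter sequence, so any extra (near-zero) eigenvalues are skipped
# for eigenvalue, squared in zip(eigenvalues, squared_S):
#     print("Eigenvalue: ", eigenvalue, " Corresponding Squared S: ", squared)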

#### Therefore, the nonzero eigenvalues of A^T.A match the squared singular values of A (S**2), up to floating-point error.
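
The relationship follows directly from the decomposition. A short derivation, writing $\Sigma$ for the diagonal matrix of singular values (`S_matrix` in the code) and $V$ for the transpose of `Vt`:

$$
A = U \Sigma V^{T}
\quad\Longrightarrow\quad
A^{T} A = V \Sigma^{T} U^{T} U \Sigma V^{T} = V \Sigma^{2} V^{T},
\qquad \text{since } U^{T} U = I .
$$

Each column $v_i$ of $V$ therefore satisfies $A^{T} A \, v_i = \sigma_i^{2} v_i$, so the squared singular values $\sigma_i^{2}$ are eigenvalues of $A^{T} A$. Any remaining eigenvalues of $A^{T} A$ are zero.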

> Reconstruction of the matrix as the matrix product of all three factors (U, S, Vt).

In [25]:
reconstructed_matrix = U @ S_matrix @ Vt
total_number_of_users = len(reconstructed_matrix)
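
Because every singular value is kept, this product reproduces the input matrix up to floating-point error. The sketch below checks that on a small random matrix and, for contrast, builds a rank-`k` approximation that keeps only the largest singular values. The matrix `A`, the value `k = 2` and the `_d` names are illustrative assumptions; the notebook itself does not truncate.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 6, size=(6, 8)).astype(float)  # small hypothetical ratings matrix

U_d, S_d, Vt_d = np.linalg.svd(A, full_matrices=False)

# Full reconstruction: all singular values are kept, so A is recovered.
full = U_d @ np.diag(S_d) @ Vt_d
print(np.allclose(full, A))  # True

# Rank-k approximation (k is an illustrative choice): keep the k largest singular values.
k = 2
low_rank = U_d[:, :k] @ np.diag(S_d[:k]) @ Vt_d[:k, :]
print(np.abs(low_rank - A).max())  # largest entry-wise deviation from A
```

With truncation, entries that were 0 in the input generally come back as nonzero estimates, which is one common way to turn an SVD into actual rating predictions.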

### Input User here

In [28]:
user = 44

# Conclusion About Some Users:

# Most users have the same closest user under both approaches.
# User 2,3,4... is close to User 39 using both approaches

# Users ...29,30,32,36,38,40,41,42 get the same result from both approaches, with different values

# Some unique valued users:
# User 11
# User 17
# User 19
# User 21
# User 26
# User 35
# User 44
# User 47

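> Comparing what the user entered for each movie with what we obtained after decomposition and reconstruction. Column I shows the user's initial rating. Because every singular value is kept, the reconstructed values match the initial ratings up to floating-point error.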

In [31]:
user -= 1

user_predictions = reconstructed_matrix[user]

print('Movie_# I Prediction')
for index, component in enumerate(data):
    print(component, data[component][user], user_predictions[index])

Movie_# I Prediction
The Shawshank Redemption 0 1.6667030538440123e-14
The Godfather 2 1.9999999999999958
The Dark Knight 2 1.9999999999999916
Schindler's List 3 2.999999999999995
Pulp Fiction 5 5.000000000000012
The Lord of the Rings: The Return of the King 0 2.1724114765773907e-15
Forrest Gump 4 4.0
Fight Club 2 1.9999999999999936
Inception 0 -1.009360580418923e-14
The Matrix 5 4.999999999999987
The Lord of the Rings: The Fellowship of the Ring 1 0.9999999999999977
Star Wars: Episode V - The Empire Strikes Back 4 3.9999999999999942
The Lord of the Rings: The Two Towers 0 -1.9675658115949794e-15
Goodfellas 0 -1.3063208404966977e-15
The Godfather Part II 2 2.0000000000000013
The Avengers: Endgame 0 1.573877486511573e-14
The Silence of the Lambs 1 0.9999999999999944
Interstellar 1 0.9999999999999993
Parasite 2 1.999999999999994
Saving Private Ryan 2 2.0000000000000093
The Green Mile 5 5.000000000000001
Se7en 5 5.0000000000000036
Joker 0 3.371691003793416e-15
Gladiator 0 9.16488178077597

### What Next

> We will now group similar users. The goal is to cluster users with similar movie taste and recommend the movies. If User A hasn't seen the movie, but User B, whose preference is similar to User A, has seen the movie and likes it, we will recommend it to User A also.

## Cosine Similarity Approach

> Cosine similarity = dot product of two vectors / product of their magnitudes
> This gives us the similarity between two vectors (here, the rating rows of two users)
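
Written out for two rating vectors $u$ and $v$ (symbols chosen here for illustration), the definition is:

$$
\cos\theta = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
= \frac{\sum_i u_i v_i}{\sqrt{\sum_i u_i^{2}} \, \sqrt{\sum_i v_i^{2}}}
$$

For non-negative ratings the value lies between 0 and 1, with 1 meaning the two vectors point in exactly the same direction.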

In [37]:
# function to find cosine similarity

def cosineSimilarity(vector1, vector2):
    dot_product = vector1 @ vector2
    magnitude = np.linalg.norm(vector1) * np.linalg.norm(vector2)
    return dot_product/magnitude if magnitude != 0 else 0

> Cosine relationship of each user with each user

In [40]:
cosine_relationships_of_all_users = [[None for _ in range(total_number_of_users)] for _ in range(total_number_of_users)]

for i in range(total_number_of_users):
    for j in range(total_number_of_users):
        cosine_relationships_of_all_users[i][j] = cosineSimilarity(reconstructed_matrix[i],reconstructed_matrix[j])

cosine_relationships_of_all_users = pd.DataFrame(cosine_relationships_of_all_users, index=all_users, columns=all_users)
cosine_relationships_of_all_users

Unnamed: 0,User 1,User 2,User 3,User 4,User 5,User 6,User 7,User 8,User 9,User 10,...,User 41,User 42,User 43,User 44,User 45,User 46,User 47,User 48,User 49,User 50
User 1,1.0,0.635977,0.692267,0.750317,0.55619,0.678872,0.631122,0.609243,0.645343,0.717587,...,0.634586,0.637764,0.668947,0.579267,0.695919,0.698199,0.591341,0.71823,0.620046,0.706064
User 2,0.635977,1.0,0.723696,0.652176,0.611924,0.576772,0.679121,0.727135,0.644666,0.701178,...,0.652915,0.618971,0.621982,0.574668,0.64585,0.676555,0.674794,0.71705,0.676983,0.65633
User 3,0.692267,0.723696,1.0,0.747129,0.705813,0.720787,0.738129,0.718495,0.706706,0.79458,...,0.747269,0.719811,0.659985,0.68507,0.768442,0.759632,0.751346,0.806006,0.699267,0.713675
User 4,0.750317,0.652176,0.747129,1.0,0.633516,0.720738,0.68144,0.659052,0.71877,0.703592,...,0.718268,0.643241,0.660244,0.661124,0.729836,0.77182,0.770148,0.714747,0.677524,0.691111
User 5,0.55619,0.611924,0.705813,0.633516,1.0,0.732129,0.621956,0.615921,0.579681,0.640085,...,0.715232,0.720292,0.62308,0.602486,0.716942,0.773225,0.642255,0.682094,0.585698,0.622488
User 6,0.678872,0.576772,0.720787,0.720738,0.732129,1.0,0.688337,0.733548,0.67526,0.688003,...,0.769177,0.604201,0.662656,0.646545,0.637978,0.756838,0.686564,0.690899,0.693584,0.689848
User 7,0.631122,0.679121,0.738129,0.68144,0.621956,0.688337,1.0,0.770444,0.714686,0.731198,...,0.672502,0.686208,0.663914,0.61482,0.681855,0.756463,0.661519,0.74289,0.771336,0.664854
User 8,0.609243,0.727135,0.718495,0.659052,0.615921,0.733548,0.770444,1.0,0.678623,0.662723,...,0.650075,0.674773,0.676488,0.611625,0.634825,0.745879,0.697558,0.783877,0.802426,0.696174
User 9,0.645343,0.644666,0.706706,0.71877,0.579681,0.67526,0.714686,0.678623,1.0,0.607764,...,0.667075,0.66696,0.621882,0.661649,0.665023,0.702288,0.636806,0.693119,0.694489,0.582916
User 10,0.717587,0.701178,0.79458,0.703592,0.640085,0.688003,0.731198,0.662723,0.607764,1.0,...,0.62852,0.690718,0.622743,0.663503,0.689399,0.788281,0.578492,0.707989,0.729536,0.684005
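
The nested loops above can also be expressed as one matrix product. The following is a minimal sketch, assuming the rows of the input are user vectors; the helper name `cosine_similarity_matrix` and the `demo` array are hypothetical and not part of the notebook.

```python
import numpy as np

def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of `matrix`."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Avoid division by zero: all-zero rows stay zero and get similarity 0,
    # mirroring the `magnitude != 0` guard in cosineSimilarity.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit_rows = matrix / safe_norms
    return unit_rows @ unit_rows.T

demo = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0],   # same direction as row 0, twice the scale
                 [0.0, 0.0, 0.0]])  # a user with no ratings
sims = cosine_similarity_matrix(demo)
print(np.round(sims, 4))
```

Rows 0 and 1 differ only in scale, so their similarity is 1 up to rounding, while the all-zero row has similarity 0 with every row.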


> Now finding the user closest to the current_user.

In [43]:
cosine_relationship_with_current_user = [0] * total_number_of_users

for i in range(total_number_of_users):
    if i == user:
        continue
    cosine_relationship_with_current_user[i] = cosineSimilarity(reconstructed_matrix[user],reconstructed_matrix[i])

closest_user = cosine_relationship_with_current_user.index(max(cosine_relationship_with_current_user))
print("The closest user to User", user+1, "is \033[1mUser", closest_user+1, "\033[0m")

The closest user to User 44 is User 46


> Finding the movies that the current user didn't watch but the close user did.

In [46]:
close_user_predictions = reconstructed_matrix[closest_user]
recommendations_using_cosine_similarity = {}

for index, component in enumerate(data):
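    # user_predictions comes from the earlier cell (the current user's reconstructed row);
    # unseen movies reconstruct to ~0, so `< 0` only catches those whose rounding error is negative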
    if user_predictions[index] < 0 and close_user_predictions[index] > 0:
        recommendations_using_cosine_similarity[close_user_predictions[index]] = component

recommendations_using_cosine_similarity = {key: recommendations_using_cosine_similarity[key] for key in sorted(recommendations_using_cosine_similarity, reverse=True)}

print("\nRecommendations using \033[1mCosine Similarity\033[0m: \n")
pprint.pprint(recommendations_using_cosine_similarity, sort_dicts=False)


Recommendations using Cosine Similarity: 

{5.000000000000006: 'La La Land',
 4.99999999999999: 'Inception',
 3.999999999999994: 'The Grand Budapest Hotel',
 0.9999999999999993: 'Toy Story 3',
 0.9999999999999992: 'The Lord of the Rings: The Two Towers',
 5.491165123979575e-15: 'Goodfellas'}


> For users whose recommendation list comes out empty, the second-closest user could be used to retrieve movies of similar taste; this is not currently part of the algorithm.
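
One way such a fallback could look is sketched below. It is an assumption-laden illustration, not the notebook's method: the helper `recommend_with_fallback`, its `min_rating` parameter and the toy arrays are hypothetical, and "unseen" is decided from the original rating being 0 rather than from the sign of the reconstructed value.

```python
import numpy as np

def recommend_with_fallback(ratings, similarity, user, min_rating=1):
    """Return (neighbour, movie indices) from the most similar neighbour that
    has rated at least one movie the target user has not seen."""
    unseen = ratings[user] == 0
    # Rank all users from most to least similar to the target user.
    order = np.argsort(similarity[user])[::-1]
    for neighbour in order:
        if neighbour == user:
            continue
        candidates = np.flatnonzero(unseen & (ratings[neighbour] >= min_rating))
        if candidates.size > 0:
            # Order the candidates by the neighbour's rating, highest first.
            by_rating = np.argsort(-ratings[neighbour][candidates], kind="stable")
            return int(neighbour), candidates[by_rating].tolist()
    return None, []

# Toy data: user 1 is closest to user 0 but has seen nothing new,
# so the search falls back to user 2.
ratings = np.array([[5, 0, 0, 4],
                    [5, 0, 0, 5],
                    [1, 3, 5, 0]])
norms = np.linalg.norm(ratings, axis=1, keepdims=True)
similarity = (ratings / norms) @ (ratings / norms).T

neighbour, movies = recommend_with_fallback(ratings, similarity, user=0)
print(neighbour, movies)  # 2 [2, 1]
```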

## Euclidean Distance Approach

In [50]:
# function to find euclidean distance

def euclideanDistance(vector1, vector2):
    return np.sqrt(np.sum((vector1 - vector2) ** 2))

> Euclidean Distance of each user with each user

In [53]:
euclidean_distance_of_all_users = [[None for _ in range(total_number_of_users)] for _ in range(total_number_of_users)]

for i in range(total_number_of_users):
    for j in range(total_number_of_users):
        euclidean_distance_of_all_users[i][j] = euclideanDistance(reconstructed_matrix[i],reconstructed_matrix[j])

euclidean_distance_of_all_users = pd.DataFrame(euclidean_distance_of_all_users, index=all_users, columns=all_users)
euclidean_distance_of_all_users

Unnamed: 0,User 1,User 2,User 3,User 4,User 5,User 6,User 7,User 8,User 9,User 10,...,User 41,User 42,User 43,User 44,User 45,User 46,User 47,User 48,User 49,User 50
User 1,0.0,17.435596,16.792856,15.033296,20.420578,16.881943,17.549929,17.663522,16.309506,15.652476,...,17.578396,18.734994,16.062378,17.435596,15.716234,18.920888,18.384776,15.588457,17.088007,16.03122
User 2,17.435596,0.0,16.431677,18.275667,19.77372,20.07486,17.088007,15.491933,17.262677,16.703293,...,17.860571,19.77372,18.05547,18.601075,17.748239,19.748418,17.146428,16.217275,16.613248,17.972201
User 3,16.792856,16.431677,0.0,16.062378,17.691806,16.822604,16.0,16.370706,16.431677,14.317821,...,15.779734,17.406895,17.832555,16.8523,14.933185,17.378147,15.556349,13.892444,16.733201,16.941074
User 4,15.033296,18.275667,16.062378,0.0,19.621417,16.703293,17.492856,17.832555,15.937377,17.058722,...,16.522712,19.519221,17.663522,17.262677,15.968719,16.911535,14.832397,16.703293,17.146428,17.464249
User 5,20.420578,19.77372,17.691806,19.621417,0.0,16.733201,19.519221,19.416488,19.924859,19.235384,...,17.029386,17.606817,19.104973,19.209373,16.792856,17.0,18.947295,18.05547,19.924859,19.748418
User 6,16.881943,20.07486,16.822604,16.703293,16.733201,0.0,17.233688,15.716234,17.0,17.435596,...,14.899664,20.493902,17.521415,17.521415,18.384776,17.406895,17.233688,17.320508,16.643317,17.435596
User 7,17.549929,17.088007,16.0,17.492856,19.519221,17.233688,0.0,14.21267,15.491933,15.84298,...,17.349352,17.972201,17.029386,17.720045,16.822604,17.320508,17.492856,15.459625,14.0,17.748239
User 8,17.663522,15.491933,16.370706,17.832555,19.416488,15.716234,14.21267,0.0,16.062378,17.464249,...,17.635192,18.083141,16.370706,17.378147,17.691806,17.606817,16.248077,13.96424,12.727922,16.643317
User 9,16.309506,17.262677,16.431677,15.937377,19.924859,17.0,15.491933,16.062378,0.0,18.411953,...,16.822604,18.027756,17.204651,15.684387,16.522712,18.814888,17.378147,16.278821,15.362291,19.052559
User 10,15.652476,16.703293,14.317821,17.058722,19.235384,17.435596,15.84298,17.464249,18.411953,0.0,...,18.708287,18.0,18.303005,16.881943,16.8523,16.278821,19.77372,16.673332,15.459625,17.435596
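
The same table can be computed without explicit loops by using NumPy broadcasting. This is a minimal sketch; the helper name `euclidean_distance_matrix` and the `demo` array are hypothetical.

```python
import numpy as np

def euclidean_distance_matrix(matrix):
    """Pairwise Euclidean distance between the rows of `matrix`."""
    # diffs[i, j] holds row i minus row j; broadcasting gives shape (n, n, d).
    diffs = matrix[:, None, :] - matrix[None, :, :]
    return np.sqrt(np.sum(diffs ** 2, axis=-1))

demo = np.array([[5.0, 4.0, 3.0],
                 [4.0, 3.0, 2.0],
                 [1.0, 1.0, 1.0]])
dists = euclidean_distance_matrix(demo)
print(np.round(dists, 4))
```

The intermediate array has `n * n * d` entries, which is fine for 50 users but would need a more memory-efficient formulation for large user counts.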


> Now finding the user closest to the current user

In [56]:
euclidean_distance_with_current_user = [float('inf')] * total_number_of_users

for i in range(total_number_of_users):
    if i == user:
        continue
    euclidean_distance_with_current_user[i] = euclideanDistance(reconstructed_matrix[user],reconstructed_matrix[i])

closest_user = euclidean_distance_with_current_user.index(min(euclidean_distance_with_current_user))
print("The closest user to User", user+1, "is \033[1mUser", closest_user+1, "\033[0m")

The closest user to User 44 is User 9


In [58]:
close_user_predictions = reconstructed_matrix[closest_user]
recommendations_using_euclidean_distance = {}

for index, component in enumerate(data):
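    # user_predictions comes from the earlier cell (the current user's reconstructed row);
    # unseen movies reconstruct to ~0, so `< 0` only catches those whose rounding error is negative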
    if user_predictions[index] < 0 and close_user_predictions[index] > 0:
        recommendations_using_euclidean_distance[close_user_predictions[index]] = component

recommendations_using_euclidean_distance = {key: recommendations_using_euclidean_distance[key] for key in sorted(recommendations_using_euclidean_distance, reverse=True)}


print("\nRecommendations using \033[1mEuclidean Distance\033[0m: \n")
pprint.pprint(recommendations_using_euclidean_distance, sort_dicts=False)


Recommendations using Euclidean Distance: 

{3.000000000000005: 'La La Land',
 1.999999999999994: 'Goodfellas',
 0.9999999999999976: 'Toy Story 3',
 0.9999999999999964: 'The Lord of the Rings: The Two Towers'}


## Comparison Between Recommendations

In [61]:
print("\nUsing \033[1mCosine Similarity\033[0m: \n")
pprint.pprint(recommendations_using_cosine_similarity, sort_dicts=False)

print("\n\nUsing \033[1mEuclidean Distance\033[0m: \n")
pprint.pprint(recommendations_using_euclidean_distance, sort_dicts=False)


Using Cosine Similarity: 

{5.000000000000006: 'La La Land',
 4.99999999999999: 'Inception',
 3.999999999999994: 'The Grand Budapest Hotel',
 0.9999999999999993: 'Toy Story 3',
 0.9999999999999992: 'The Lord of the Rings: The Two Towers',
 5.491165123979575e-15: 'Goodfellas'}


Using Euclidean Distance: 

{3.000000000000005: 'La La Land',
 1.999999999999994: 'Goodfellas',
 0.9999999999999976: 'Toy Story 3',
 0.9999999999999964: 'The Lord of the Rings: The Two Towers'}


### Which one to Use?

> **Cosine Similarity** measures the relative preferences, which means the angle between the vectors. It accounts for **direction** but not **magnitude**.

> **Euclidean Distance** measures the **absolute difference** in ratings.


Therefore,

- **Cosine Similarity** is ideal when the goal is to find users with similar **patterns of preference**, regardless of the scale of their ratings.  
  For example, a user who consistently rates movies **5, 4, and 3** would be considered similar to another who rates the same movies **4, 3, and 2**.

- **Euclidean Distance** is more appropriate when the scale of ratings is significant and we want to account for **absolute rating similarity**.  
  For example, two users who both rate a movie **5** are closer in preference than users who rate the same movie **5 and 4**.
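
The first example can be worked through numerically. The sketch below computes both measures for the rating patterns **5, 4, 3** and **4, 3, 2**; the names `user_a` and `user_b` are hypothetical.

```python
import numpy as np

user_a = np.array([5.0, 4.0, 3.0])
user_b = np.array([4.0, 3.0, 2.0])

# Cosine similarity: dot product divided by the product of the magnitudes.
cosine = user_a @ user_b / (np.linalg.norm(user_a) * np.linalg.norm(user_b))
# Euclidean distance: magnitude of the difference vector.
distance = np.linalg.norm(user_a - user_b)

print(round(cosine, 4))    # 0.9979
print(round(distance, 4))  # 1.7321
```

The cosine value is close to 1, so the two users count as very similar in pattern, while the Euclidean distance still registers the one-point gap on every movie. Cosine similarity is exactly 1 only when one vector is a scaled copy of the other (for example 5, 4, 3 against 10, 8, 6); a constant offset such as the one here lowers it slightly.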


### Conclusion:

- If we are looking for users who have **similar preference patterns** no matter what scale their ratings fall in, we should use **Cosine Similarity**.  
- However, if the **scale of ratings** matters to us, we should use **Euclidean Distance**.
