# KNN-CF Recommender Engine
### This notebook contains a recommender engine built on a k-nearest-neighbors (KNN) collaborative filtering model.

### Preprocessing and preliminary code:

Libraries used in preprocessing and data manipulation:

In [6]:
import numpy as np 
import pandas as pd

# Libraries for Recommendation System
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors


In [16]:
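# To set up the engine, we start with the MovieLens 100k dataset.
# Forward slashes work on every OS and avoid accidental escape sequences such as '\m' in the path.
movie_100k = pd.read_csv('data/movielens_100k/movies.csv')
rating_100k = pd.read_csv('data/movielens_100k/ratings.csv')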
print("Original data columns:")
print(movie_100k.columns)
print(rating_100k.columns)

#We create new dataframes only using the columns that are necessary
movies = movie_100k.loc[:,["movieId","title"]]
ratings = rating_100k.loc[:,["userId","movieId","rating"]]

print("\n\n\nTruncated dataframes;")
print(movies.columns)
print(ratings.columns)


Original data columns:
Index(['movieId', 'title', 'genres'], dtype='object')
Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')



Truncated dataframes;
Index(['movieId', 'title'], dtype='object')
Index(['userId', 'movieId', 'rating'], dtype='object')
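As an aside, `pd.read_csv` can drop unneeded columns at load time through its `usecols` parameter, so `genres` and `timestamp` are never loaded at all. The sketch below assumes the same CSV layout printed above; to stay self-contained it reads a tiny inline sample through `io.StringIO` instead of the real files, and the sample rows are illustrative only.

```python
import io

import pandas as pd

# Tiny inline stand-ins for movies.csv and ratings.csv (illustrative rows only),
# using the same column layout as the MovieLens files printed above.
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
"""
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
2,2,3.5,964981247
"""

# usecols keeps only the listed columns while parsing,
# so "genres" and "timestamp" are never materialized.
movies = pd.read_csv(io.StringIO(MOVIES_CSV), usecols=["movieId", "title"])
ratings = pd.read_csv(io.StringIO(RATINGS_CSV), usecols=["userId", "movieId", "rating"])

print(list(movies.columns))
print(list(ratings.columns))
```

With the real files this would be, for example, `pd.read_csv('data/movielens_100k/movies.csv', usecols=["movieId", "title"])`. Note that `usecols` keeps the columns in file order, not in the order of the list.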


Next, we build a movie × user rating matrix: each row is a movie, each column is a user, and each cell holds that user's rating of that movie.

In [26]:
# Merge ratings and movies on their common column (movieId) so every rating carries its movie title
data_100k = pd.merge(movies, ratings, on="movieId")

# Rows = movies, columns = users gives an item-based matrix; swapping index and columns would give a user-based one
user_movie_table = data_100k.pivot_table(index=["title"], columns=["userId"], values="rating").fillna(0)

user_movie_table.head(10)

userId,1,2,3,4,5,6,7,8,9,10,...,601,602,603,604,605,606,607,608,609,610
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
'71 (2014),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
'Hellboy': The Seeds of Creation (2004),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Round Midnight (1986),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Salem's Lot (2004),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Til There Was You (1997),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'Tis the Season for Love (2015),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"'burbs, The (1989)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
'night Mother (1986),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
(500) Days of Summer (2009),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.5
*batteries not included (1987),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
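The table above is almost entirely zeros, because each user rates only a small fraction of the movies. An alternative (not what this notebook does) is to build the sparse movie × user matrix directly from the long-format ratings and skip the dense pivot. The sketch below assumes a `ratings` frame with `userId`, `movieId` and `rating` columns as above; the toy values are made up. It keys rows by `movieId` instead of `title`, and transposing the result gives the user-based (user × movie) matrix.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Toy long-format ratings (made-up values), same columns as the notebook's `ratings`.
ratings = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Map each movieId / userId to a contiguous 0-based row / column position.
movie_cat = pd.Categorical(ratings["movieId"])
user_cat = pd.Categorical(ratings["userId"])

# Item-based matrix: one row per movie, one column per user.
# Unrated cells are simply not stored (implicit zeros), matching fillna(0).
# Assumes at most one rating per (movie, user) pair: duplicate pairs would be
# summed here, whereas pivot_table averages them.
item_user = csr_matrix(
    (ratings["rating"].to_numpy(), (movie_cat.codes, user_cat.codes)),
    shape=(len(movie_cat.categories), len(user_cat.categories)),
)

# Transposing gives the user-based matrix (one row per user).
user_item = item_user.T.tocsr()

print(item_user.toarray())
print(list(movie_cat.categories))  # row labels (movieIds), in row order
```

Only the 5 observed ratings are stored, so memory grows with the number of ratings rather than with movies × users.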


For the code above, here is ChatGPT's explanation of the methods:

*Here, the pivot_table function is used to reshape the data_100k dataframe. Let's understand the parameters:*

- *index=["title"]: This sets the "title" column as the index of the pivot table. Each unique movie title will become a row in the pivot table.*
- *columns=["userId"]: This sets the "userId" column as the columns of the pivot table. Each unique user ID will become a column in the pivot table.*
- *values="rating": This specifies that the values to be filled in the pivot table come from the "rating" column of the data_100k dataframe.*
- *fillna(0): This fills any missing (NaN) values in the pivot table with 0.*

*As a result, user_movie_table becomes a matrix where rows represent movies, columns represent users, and the cells contain ratings given by users to movies. Any missing ratings are filled with 0, indicating no rating was given.*
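To see these parameters in action, here is the same `pivot_table(...).fillna(0)` call on a tiny made-up frame shaped like `data_100k`. One behavior worth knowing: `pivot_table` aggregates with the mean by default, so if two different `movieId`s ever shared a title, their ratings from the same user would be averaged into a single cell when `title` is the index. The toy data below includes such a case on purpose.

```python
import pandas as pd

# Made-up merged data with the same columns as data_100k.
# movieIds 3 and 4 deliberately share the title "C (2002)".
data = pd.DataFrame({
    "movieId": [1, 1, 2, 3, 4],
    "title":   ["A (2000)", "A (2000)", "B (2001)", "C (2002)", "C (2002)"],
    "userId":  [1, 2, 1, 2, 2],
    "rating":  [4.0, 5.0, 3.0, 2.0, 4.0],
})

# Rows = titles, columns = users, cells = rating; missing pairs become 0.
table = data.pivot_table(index=["title"], columns=["userId"], values="rating").fillna(0)
print(table)
```

Here user 2's ratings of the two "C (2002)" movies (2.0 and 4.0) collapse into one cell with value 3.0, and user 2's missing rating of "B (2001)" becomes 0.0.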

### KNN Engine Implementation


Below, I use the previous matrix to implement a basic item-based KNN recommender with scikit-learn's `NearestNeighbors`. To demonstrate it, I pick a random movie and print its 6 nearest neighbors. The first neighbor is the movie itself (distance 0), so this leaves 5 actual recommendations.
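With `metric='cosine'`, `NearestNeighbors` ranks rows by cosine distance, which is one minus cosine similarity:

$$d(a, b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

A distance of 0 means two movies' rating vectors point in the same direction, and larger values mean less similar rating patterns. The small check below, on made-up rating vectors, computes this formula by hand and compares it with the library's result using the same settings as the notebook.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Made-up movie x user rating rows (0 = not rated).
X = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 4.0],
])


def cosine_distance(a, b):
    """1 - cosine similarity between two rating vectors."""
    return 1.0 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


# Same configuration as the notebook's model_knn.
model = NearestNeighbors(metric="cosine", algorithm="brute").fit(X)
distances, indices = model.kneighbors(X[0].reshape(1, -1), n_neighbors=3)

# Recompute each returned distance by hand and compare.
manual = [cosine_distance(X[0], X[i]) for i in indices[0]]
print(indices[0])    # nearest first; row 0 matches itself at distance ~0
print(distances[0])
print(np.allclose(distances[0], manual))
```

Rows 0 and 1 are rated highly by the same two users, so they end up close (about 0.036), while row 2 is rated by a different pair of users and ends up far away (about 0.904).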

In [58]:
# Pick a random movie (row) of the movie x user matrix to query
query_index = np.random.choice(user_movie_table.shape[0])

# Number of neighbors to return; the first one is normally the query movie itself
k_value = 6

# Item-based KNN: cosine distance between movies' rating vectors, exhaustive (brute-force) search
user_movie_table_matrix = csr_matrix(user_movie_table.values)
model_knn = NearestNeighbors(metric='cosine', algorithm='brute')
model_knn.fit(user_movie_table_matrix)
distances, indices = model_knn.kneighbors(user_movie_table.iloc[query_index, :].values.reshape(1, -1), n_neighbors=k_value)


# Print each neighbor's title with its cosine distance, nearest first
print(f"Random movie: \"{user_movie_table.index[query_index]}\"'s closest {k_value} neighbors are:")
for rank, (i, dist) in enumerate(zip(indices[0], distances[0]), start=1):
    print(f"\tRecommendation {rank}\tMovie: {user_movie_table.index[i]}, Distance: {dist}")


Random movie: "Norte, El (1984)"'s closest 6 neighbors are:
	Recommendation 1	Movie: Norte, El (1984), Distance: 0.0
	Recommendation 2	Movie: Away from Her (2006), Distance: 0.5926934600187216
	Recommendation 3	Movie: Color Purple, The (1985), Distance: 0.6739643997990017
	Recommendation 4	Movie: Next (2007), Distance: 0.6921403520065204
	Recommendation 5	Movie: Fracture (2007), Distance: 0.6923076923076923
	Recommendation 6	Movie: Flashdance (1983), Distance: 0.7103379389891713
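As the output shows, the first neighbor is the query movie itself (distance 0), so only `k_value - 1` results are real recommendations. A small helper that looks a movie up by title and drops the self-match could look like the sketch below. The function name `recommend_similar` is hypothetical. It assumes a movie × user table like `user_movie_table` with unique titles as its index, plus a model fitted on that table's values in the same row order. The toy table in the usage part is made up.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors


def recommend_similar(title, table, model, n_recs=5):
    """Return up to n_recs (title, distance) pairs most similar to `title`.

    Hypothetical helper: assumes `table` is a movie x user rating table with
    unique titles as its index, and `model` was fitted on `table.values`.
    """
    row = table.index.get_loc(title)
    # Ask for one extra neighbor because the query movie matches itself.
    distances, indices = model.kneighbors(
        table.iloc[row, :].to_numpy().reshape(1, -1), n_neighbors=n_recs + 1
    )
    recs = [
        (table.index[i], float(d))
        for i, d in zip(indices[0], distances[0])
        if i != row  # drop the self-match by position, not by assuming rank 1
    ]
    return recs[:n_recs]


# Made-up toy table: rows = movies, columns = users, 0 = not rated.
toy = pd.DataFrame(
    [[5, 4, 0, 1], [4, 5, 0, 0], [0, 0, 5, 4], [1, 0, 4, 5]],
    index=["A", "B", "C", "D"], columns=[1, 2, 3, 4], dtype=float,
)
knn = NearestNeighbors(metric="cosine", algorithm="brute").fit(csr_matrix(toy.values))

for rec_title, dist in recommend_similar("A", toy, knn, n_recs=2):
    print(f"{rec_title}: {dist:.3f}")
```

Filtering by position rather than always discarding the first result keeps the helper correct even when another movie has an identical rating vector and ties with the query at distance 0.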
