# Troy Quicksall
# DSC 630
# Week 10 Assignment

## Importing all of the datasets into dataframes

In [171]:
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

movies_df = pd.read_csv('ml-latest-small/movies.csv')
movies_df.head()

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [172]:
links_df = pd.read_csv('ml-latest-small/links.csv')
links_df.head()

| | movieId | imdbId | tmdbId |
|---|---|---|---|
| 0 | 1 | 114709 | 862.0 |
| 1 | 2 | 113497 | 8844.0 |
| 2 | 3 | 113228 | 15602.0 |
| 3 | 4 | 114885 | 31357.0 |
| 4 | 5 | 113041 | 11862.0 |


In [173]:
ratings_df = pd.read_csv('ml-latest-small/ratings.csv')
ratings_df.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |


In [174]:
tags_df = pd.read_csv('ml-latest-small/tags.csv')
tags_df.head()

| | userId | movieId | tag | timestamp |
|---|---|---|---|---|
| 0 | 2 | 60756 | funny | 1445714994 |
| 1 | 2 | 60756 | Highly quotable | 1445714996 |
| 2 | 2 | 60756 | will ferrell | 1445714992 |
| 3 | 2 | 89774 | Boxing story | 1445715207 |
| 4 | 2 | 89774 | MMA | 1445715200 |


Because the dataset includes user-based data (ratings and tags), I believe a collaborative recommender system makes sense, since it gets the most out of the data provided.

## Building Collaborative Recommender System Based Off User Ratings
I will create a pivot table from the ratings dataframe so that the columns are user IDs and the rows are unique movies. This lets us train a nearest-neighbor algorithm to find movies that received similar ratings from the same users. Given a preferred movie, we can then locate other movies that were liked by the people who liked that movie.

In [175]:
# Creating pivot table so that columns are userIds and rows are unique movies
rating_pivot = ratings_df.pivot_table(values='rating', columns='userId', index='movieId').fillna(0)
rating_pivot.head()

| userId | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | ... | 601 | 602 | 603 | 604 | 605 | 606 | 607 | 608 | 609 | 610 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| **movieId** | | | | | | | | | | | | | | | | | | | | | |
| 1 | 4.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.5 | 0.0 | 0.0 | 0.0 | ... | 4.0 | 0.0 | 4.0 | 3.0 | 4.0 | 2.5 | 4.0 | 2.5 | 3.0 | 5.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 | 4.0 | 0.0 | 0.0 | ... | 0.0 | 4.0 | 0.0 | 5.0 | 3.5 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 3 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
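
To make the shape of the pivot table concrete, here is a small sketch on made-up ratings. The `toy_ratings` frame and its values are hypothetical and are not taken from the MovieLens files. It shows that each row becomes one movie, each column becomes one user, and `fillna(0)` turns every missing rating into a 0.

```python
import pandas as pd

# Hypothetical toy ratings, not taken from the MovieLens files
toy_ratings = pd.DataFrame({
    'userId':  [1, 1, 2, 3, 3],
    'movieId': [10, 20, 10, 20, 30],
    'rating':  [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Rows are movies, columns are users; unrated (movie, user) pairs become 0
toy_pivot = toy_ratings.pivot_table(values='rating', columns='userId',
                                    index='movieId').fillna(0)
print(toy_pivot)

# Share of cells that hold an actual rating
density = (toy_pivot != 0).to_numpy().mean()
print(f'density: {density:.2f}')
```

One consequence of filling with 0 is that "not rated" and "rated very low" look similar to the distance metric, which may be worth keeping in mind when reading the neighbors.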


In [176]:
from sklearn.neighbors import NearestNeighbors
nn_algo = NearestNeighbors(metric='cosine')

# Training the nearest neighbor algorithm based off of the rating pivot table
nn_algo.fit(rating_pivot)
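
The sketch below, on a hypothetical 4-movie matrix, illustrates what a fitted cosine nearest-neighbor model returns. The names `toy_pivot` and `toy_nn` and all values are made up for illustration. With cosine distance, the closest neighbor of a movie is the movie itself at a distance of about 0, which is why a query for 5 recommendations asks for 6 neighbors.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical movie-by-user rating matrix (rows: movies, columns: users)
toy_pivot = pd.DataFrame(
    [[5.0, 4.0, 0.0],
     [4.0, 5.0, 0.0],
     [0.0, 0.0, 5.0],
     [0.0, 1.0, 4.0]],
    index=[10, 20, 30, 40],
)

toy_nn = NearestNeighbors(metric='cosine')
toy_nn.fit(toy_pivot.to_numpy())

# Ask for 3 neighbors of movie 10: the movie itself plus 2 others
distances, neighbors = toy_nn.kneighbors([toy_pivot.loc[10].to_numpy()],
                                         n_neighbors=3)

# Map row positions back to movie ids
neighbor_ids = [toy_pivot.index[i] for i in neighbors[0]]
print(neighbor_ids)
print(distances[0].round(3))
```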

Next, I write a function that gets the distances and neighbors of a particular movie and returns a list of its 5 closest neighbors. I use 5 because I am going to combine the top 5 from this collaborative recommender with the top 5 from a content-based recommender.

In [177]:
 
def recommend_movie_by_rating(movie):
    # Retrieving the movieId by locating the exact title (the title must exist in movies_df)
    movieid = int(movies_df.loc[movies_df['title'] == movie, 'movieId'].iloc[0])
    # Getting distances and neighbors from the nn algorithm
    distance, neighbors = nn_algo.kneighbors([rating_pivot.loc[movieid]],
                                             n_neighbors=6)  # number of recommendations + 1

    # Mapping neighbor row positions back to movieIds
    movieids = [rating_pivot.index[i] for i in neighbors[0]]
    # Generating the list of movie titles, skipping the query movie itself
    recommends = [movies_df.loc[movies_df['movieId'] == mid, 'title'].iloc[0]
                  for mid in movieids if mid != movieid]
    return recommends[:5]
    

## Content-Based Filtering Using Tags
I will now create a content-based nearest-neighbor algorithm based on movie tags. I will retrieve the top 5 nearest neighbors and combine them with the list generated by the previous algorithm. I will use a vectorizer to generate a table where the tag words are the columns and each row is one entry from the tags dataframe. I will then train a new nearest-neighbor algorithm on that table.

In [178]:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(stop_words='english')
# vectorizing the 'tag' column from the tag dataframe
tags = vectorizer.fit_transform(tags_df.tag).toarray()
# generating table from the tags (feature names of the vectorizer)
contents = pd.DataFrame(tags,columns=vectorizer.get_feature_names_out())
contents.head()

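| | 06 | 1900s | 1920s | 1950s | 1960s | 1970s | 1980s | 1990s | 2001 | 250 | ... | wrongful | wry | york | younger | zellweger | zither | zoe | zombie | zombies | zooey |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |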


In [179]:
nn_algo_tag = NearestNeighbors(metric='cosine')
nn_algo_tag.fit(contents)

In [180]:
# Function to get the top 5 neighbors from the tag-based nearest neighbor algorithm
def recommend_on_movie_tags(movie):
    # Getting the row position of the movie in movies_df
    movie_pos = movies_df[movies_df['title'] == movie].index[0]
    # Getting distances and neighbors from the tag algorithm
    distance, neighbors = nn_algo_tag.kneighbors([contents.iloc[movie_pos]],
                                                 n_neighbors=6)  # recommendations + 1
    # Mapping neighbor positions to titles in movies_df, skipping the query position
    recommends = [movies_df.iloc[i]['title'] for i in neighbors[0] if i != movie_pos]
    return recommends[:5]
    


## Combining Both Recommenders to Get 10 Total Recommendations

In [181]:
# Combining the top 5 nearest neighbors from both algorithms to get a list of 10
# The loop runs until q (or Q) is entered
while True:
    mov = input('Enter movie and year made in parenthesis i.e. Friday (1995) or q to quit: ').strip()
    if mov.lower() == 'q':
        break
    # Combining both lists
    rating_list = recommend_movie_by_rating(mov)
    tag_list = recommend_on_movie_tags(mov)
    total_list = rating_list + tag_list
    print('Recommendations: ')
    for rec in total_list:
        print(rec)

Enter movie and year made in parenthesis i.e. Friday (1995) or q to quit: Friday (1995)
Recommendations: 
Kingpin (1996)
Menace II Society (1993)
Set It Off (1996)
Half Baked (1998)
Meatballs III (1987)
Cats & Dogs (2001)
Crew, The (2000)
5,000 Fingers of Dr. T, The (1953)
Annie (1982)
Russia House, The (1990)
Enter movie and year made in parenthesis i.e. Friday (1995) or q to quit: Titanic (1997)
Recommendations: 
Men in Black (a.k.a. MIB) (1997)
Star Wars: Episode I - The Phantom Menace (1999)
Saving Private Ryan (1998)
Shrek (2001)
Catch Me If You Can (2002)
Somebody to Love (1994)
Nutty Professor, The (1996)
Tape (2001)
Daylight (1996)
Powder (1995)
Enter movie and year made in parenthesis i.e. Friday (1995) or q to quit: q
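
Because the two recommenders are queried independently, the same title could in principle appear in both lists, and a title that is not in the movies dataframe would raise an error inside the lookup functions. A minimal sketch of one way to handle both cases follows. `merge_recommendations` and `is_known_title` are hypothetical helpers, and the sample titles are made up for illustration.

```python
def merge_recommendations(rating_list, tag_list):
    """Concatenate two ranked lists, keeping the first occurrence of each title."""
    merged = []
    seen = set()
    for title in rating_list + tag_list:
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged


def is_known_title(title, known_titles):
    """Return True if the exact title string appears in known_titles."""
    return title in set(known_titles)


# Made-up example lists with one overlapping title
sample_rating_list = ['Movie A (1990)', 'Movie B (1991)', 'Movie C (1992)']
sample_tag_list = ['Movie B (1991)', 'Movie D (1993)']

combined = merge_recommendations(sample_rating_list, sample_tag_list)
print(combined)
```

In the input loop, a check such as `is_known_title(mov, movies_df['title'])` could run before the two recommenders are called, so that a mistyped title prints a message instead of stopping the loop.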


## References

Sachinsarkar. (2021, November 6). Movielens Movie Recommendation System. Kaggle. https://www.kaggle.com/code/sachinsarkar/movielens-movie-recommendation-system 

Markgraf, M. (2020, August 27). Recommendation system for movies - movielens: Grouplens. Medium. https://medium.com/swlh/recommendation-system-for-movies-movielens-grouplens-171d30be334e