### Recommendations with MovieTweetings: Most Popular Recommendation

Now that you have created the columns used throughout the rest of this lesson, let's get started with the first of our recommendation methods.

To get started, read in the libraries and the two datasets you will be using throughout the lesson using the code below.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

#### Part I: How To Find The Most Popular Movies?

This notebook has a single task: no matter who the user is, we need to provide a list of recommendations based simply on the most popular items.

For this task, we will consider what is "most popular" based on the following criteria:

* A movie with the highest average rating is considered best
* With ties, movies that have more ratings are better
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both their average rating and number of ratings, the movie with the most recent rating ranks higher
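
The criteria above can be sketched on a small, made-up table before applying them to the real data. The dataframe `toy` and its values are hypothetical; only the column names mirror the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical per-movie summary; values are invented for illustration
toy = pd.DataFrame({
    'movie': ['A', 'B', 'C', 'D', 'E'],
    'avg_rating': [9.0, 9.0, 9.0, 10.0, 8.0],
    'num_ratings': [10, 10, 7, 3, 50],
    'last_rating': pd.to_datetime(['2019-01-01', '2020-01-01', '2020-06-01',
                                   '2020-06-01', '2018-01-01']),
})

# Criterion 3: a movie needs at least 5 ratings to be considered
eligible = toy[toy['num_ratings'] >= 5]

# Criteria 1, 2, and 4: sort by average rating, then number of ratings,
# then most recent rating, all in descending order
ranked = eligible.sort_values(['avg_rating', 'num_ratings', 'last_rating'],
                              ascending=False)

toy_ranking = list(ranked['movie'])
print(toy_ranking)  # ['B', 'A', 'C', 'E']
```

Movie D has the highest average but is dropped for having only 3 ratings. A and B tie on average and count, so the more recently rated B comes first.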

With these criteria, the goal for this notebook is to take a **user_id** and provide back the **n_top** recommendations.  Use the function below as the scaffolding that will be used for all the future recommendations as well.

Before you implement the `popular_recommendations` function, here is a helper function called `create_ranked_df`. It transforms the `movies` and `reviews` dataframes into a `ranked_movies` dataframe that is sorted by highest average rating, number of ratings, and most recent rating, and that keeps only movies with more than 4 ratings.

In [8]:
# This helper function transforms `movies` and `reviews` dataframes
# into a `ranked_movies` dataframe of movies that are sorted 
# by the highest average rating & time and have more than 4 ratings.

def create_ranked_df(movies, reviews):
    '''
    INPUT
    movies - the movies dataframe
    reviews - the reviews dataframe

    OUTPUT
    ranked_movies - a dataframe with movies that are sorted by highest avg rating, more reviews,
                    then time, and must have more than 4 ratings
    '''

    # Pull the average rating, number of ratings, and most recent rating date for each movie
    movie_ratings = reviews.groupby('movie_id')['rating']
    avg_ratings = movie_ratings.mean()
    num_ratings = movie_ratings.count()
    last_rating = pd.DataFrame(reviews.groupby('movie_id')['date'].max())
    last_rating.columns = ['last_rating']

    # Combine the per-movie statistics into one dataframe
    rating_count_df = pd.DataFrame({'avg_rating': avg_ratings, 'num_ratings': num_ratings})
    rating_count_df = rating_count_df.join(last_rating)

    # Merge with the movies dataset
    movie_recs = movies.set_index('movie_id').join(rating_count_df)

    # Sort by top avg rating, then number of ratings, then most recent rating
    ranked_movies = movie_recs.sort_values(['avg_rating', 'num_ratings', 'last_rating'], ascending=False)

    # Keep only movies with 5 or more ratings (movies with no ratings are dropped here too)
    ranked_movies = ranked_movies[ranked_movies['num_ratings'] > 4]

    return ranked_movies

ranked_movies = create_ranked_df(movies, reviews)

In [9]:
def popular_recommendations(user_id, n_top, ranked_movies):
    '''
    INPUT:
    user_id - the user_id (str) of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    ranked_movies - a pandas dataframe of the already ranked movies based on avg rating, count, and time

    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''

    # ranked_movies is already sorted best to worst, so take the first n_top titles
    top_movies = list(ranked_movies['movie'][:n_top])
    return top_movies

Using the four criteria above, you should be able to put together the above function. If you feel confident in your solution, check the results of your function against our solution. On the next page, you can see a walkthrough, and you can of course get the solution by looking at the solution notebook available in this workspace.

In [12]:
# Top 20 movies recommended for id 1
recs_20_for_1 = popular_recommendations('1', 20, ranked_movies)
recs_20_for_1

['Be Somebody (2016)',
 'Doctor Zhivago (1965)',
 'Taare Zameen Par (2007)',
 'Coldplay: A Head Full of Dreams (2018)',
 'City Lights (1931)',
 'Nema-ye Nazdik (1990)',
 'The Lord of the Rings: The Return of the King (2003)',
 'Tarzan (1999)',
 'Mimi wo sumaseba (1995)',
 'Drishyam (2015)',
 '12 Angry Men (1957)',
 'The Shawshank Redemption (1994)',
 'La meglio gioventù (2003)',
 "It's a Wonderful Life (1946)",
 'The Lord of the Rings: The Two Towers (2002)',
 'The Sound of Music (1965)',
 'Hotaru no haka (1988)',
 'Terminator 2: Judgment Day (1991)',
 'Hiroshima mon amour (1959)',
 'Aladdin (1992)']

In [10]:
# Top 5 movies recommended for id 53968
recs_5_for_53968 = popular_recommendations('53968', 5, ranked_movies)

# Top 100 movies recommended for id 70000
recs_100_for_70000 = popular_recommendations('70000', 100, ranked_movies)

# Top 35 movies recommended for id 43
recs_35_for_43 = popular_recommendations('43', 35, ranked_movies)


In [None]:
### You Should Not Need To Modify Anything In This Cell

# check 1 
assert t.popular_recommendations('1', 20, ranked_movies) == recs_20_for_1,  "The first check failed..."
# check 2
assert t.popular_recommendations('53968', 5, ranked_movies) == recs_5_for_53968,  "The second check failed..."
# check 3
assert t.popular_recommendations('70000', 100, ranked_movies) == recs_100_for_70000,  "The third check failed..."
# check 4
assert t.popular_recommendations('43', 35, ranked_movies) == recs_35_for_43,  "The fourth check failed..."

print("If you got here, looks like you are good to go!  Nice job!")

**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
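
As a rough illustration of the time-window idea, the sketch below ranks movies by how many ratings they received in a recent window. The function `trending`, the `window_days` parameter, and the toy data are all hypothetical and not part of this lesson's code. It also assumes the `date` column has been parsed as datetimes.

```python
import pandas as pd

# Hypothetical reviews table; values are invented for illustration
toy_reviews = pd.DataFrame({
    'movie_id': [1, 1, 2, 2, 2, 3],
    'rating': [9, 8, 10, 9, 10, 7],
    'date': pd.to_datetime(['2020-03-01', '2020-03-20', '2019-01-05',
                            '2020-03-15', '2020-03-18', '2018-07-01']),
})

def trending(reviews, now, window_days=30):
    """Rank movies by rating count, then mean rating, within the last `window_days` days."""
    cutoff = now - pd.Timedelta(days=window_days)
    recent = reviews[reviews['date'] >= cutoff]
    stats = recent.groupby('movie_id')['rating'].agg(['count', 'mean'])
    return stats.sort_values(['count', 'mean'], ascending=False)

trend = trending(toy_reviews, now=pd.Timestamp('2020-03-25'))
print(trend)
```

Here movie 3 drops out entirely because its only rating falls outside the window, which is one of the subjective trade-offs a window length introduces.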

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


#### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to accept **year** and **genre** arguments as **lists** of **strings**. The results should then be filtered to only movies within the provided lists of years and genres (the values within each list are treated as `or` conditions). If no list is provided, no filter should be applied.

You can adjust other inputs as necessary to retrieve the final results you are looking for!

In [13]:
ranked_movies.head()

movie_id,movie,genre,date,1800's,1900's,2000's,Documentary,Adventure,Animation,Thriller,...,Film-Noir,Sport,Mystery,Western,Horror,Comedy,Music,avg_rating,num_ratings,last_rating
5512872,Be Somebody (2016),Comedy|Drama|Romance,2016,0,0,1,0,0,0,0,...,0,0,0,0,0,1,0,10.0,41.0,2018-05-17 00:29:18
59113,Doctor Zhivago (1965),Drama|Romance|War,1965,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,10.0,5.0,2017-11-19 09:22:58
986264,Taare Zameen Par (2007),Drama|Family,2007,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,9.666667,18.0,2020-03-23 20:57:38
9095324,Coldplay: A Head Full of Dreams (2018),Documentary|Music,2018,0,0,1,1,0,0,0,...,0,0,0,0,0,0,1,9.555556,9.0,2020-02-08 16:48:18
21749,City Lights (1931),Comedy|Drama|Romance,1931,0,1,0,0,0,0,0,...,0,0,0,0,0,1,0,9.5,10.0,2019-05-31 02:44:26


In [32]:
ranked_movies[ranked_movies['date'].isin(['2015'])]

movie_id,movie,genre,date,1800's,1900's,2000's,Documentary,Adventure,Animation,Thriller,...,Film-Noir,Sport,Mystery,Western,Horror,Comedy,Music,avg_rating,num_ratings,last_rating
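
The empty result above is consistent with a type mismatch: if `read_csv` parsed the `date` column as integers, then a string such as `'2015'` never equals the integer `2015`, and `isin` matches nothing. This is an assumption, since the column's dtype is not shown in this notebook. A minimal sketch on a made-up dataframe (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in where `date` is stored as integers
toy = pd.DataFrame({'movie': ['X (2015)', 'Y (2016)'], 'date': [2015, 2016]})

# String years do not match integer values, so no rows are selected
no_match = toy[toy['date'].isin(['2015'])]

# Casting the column to str first makes the comparison type-consistent
match = toy[toy['date'].astype(str).isin(['2015'])]

print(len(no_match), len(match))
```

Checking `ranked_movies['date'].dtype` would confirm whether this applies to the real data.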


In [36]:
ranked_movies[['History']].sum(axis = 1)

movie_id
5512872     0
59113       0
986264      0
9095324     0
21749       0
           ..
3079016     0
118688      0
1085492     0
5690360     0
10367276    0
Length: 3434, dtype: int64
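
Summing across several genre indicator columns gives, for each movie, the number of requested genres it belongs to, so a sum greater than zero acts as an `or` condition. The sketch below shows this on a made-up dataframe (`toy` and its values are hypothetical); `any(axis=1)` is an equivalent way to build the same mask.

```python
import pandas as pd

# Hypothetical 0/1 genre indicator columns, as in ranked_movies
toy = pd.DataFrame({
    'movie': ['A', 'B', 'C'],
    'History': [1, 0, 0],
    'News': [0, 0, 1],
    'Comedy': [0, 1, 0],
})

genres = ['History', 'News']

# A row-wise sum > 0 means the movie belongs to at least one requested genre
in_any_genre = toy[genres].sum(axis=1) > 0

# Equivalent boolean mask using any()
same_mask = toy[genres].any(axis=1)

selected = list(toy.loc[in_any_genre, 'movie'])
print(selected)  # ['A', 'C']
```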

In [42]:
def popular_recs_filtered(user_id, n_top, ranked_movies, years=None, genres=None):
    '''
    INPUT:
    user_id - the user_id (str) of the individual you are making recommendations for
    n_top - an integer of the number recommendations you want back
    ranked_movies - a pandas dataframe of the already ranked movies based on avg rating, count, and time
    years - a list of strings with years of movies
    genres - a list of strings with genres of movies
    
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    
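    # Step 1: filter movies based on year and genre
    if years is not None:
        # isin matches values exactly, so the entries in `years` must have
        # the same type as the values in the `date` column
        ranked_movies = ranked_movies[ranked_movies['date'].isin(years)]
    if genres is not None:
        # Keep movies flagged in at least one of the requested genre columns
        num_genres = ranked_movies[genres].sum(axis=1)
        ranked_movies = ranked_movies.loc[num_genres > 0, :]

    # Step 2: create top movies list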
    top_movies = list(ranked_movies['movie'][:n_top])
    return top_movies

In [43]:
# Top 20 movies recommended for id 1 with years=['2015', '2016', '2017', '2018'], genres=['History']
recs_20_for_1_filtered = popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])
print(recs_20_for_1_filtered)

# Top 5 movies recommended for id 53968 with no genre filter but years=['2015', '2016', '2017', '2018']
recs_5_for_53968_filtered = popular_recs_filtered('53968', 5, ranked_movies, years=['2015', '2016', '2017', '2018'])
print(recs_5_for_53968_filtered)

# Top 100 movies recommended for id 70000 with no year filter but genres=['History', 'News']
recs_100_for_70000_filtered = popular_recs_filtered('70000', 100, ranked_movies, genres=['History', 'News'])
print(recs_100_for_70000_filtered)

[]
[]
['Hotel Rwanda (2004)', "Schindler's List (1993)", 'Amadeus (1984)', 'Gone with the Wind (1939)', 'Lawrence of Arabia (1962)', 'Braveheart (1995)', 'Barry Lyndon (1975)', 'Gandhi (1982)', 'Taeksi woonjunsa (2017)', 'Before the Flood (2016)', 'Ayla: The Daughter of War (2017)', 'The Grapes of Wrath (1940)', 'Hacksaw Ridge (2016)', 'Il gattopardo (1963)', 'Persepolis (2007)', 'Portrait de la jeune fille en feu (2019)', 'Good Night, and Good Luck. (2005)', 'Missing (1982)', 'Changeling (2008)', 'The Irishman (2019)', 'United 93 (2006)', 'Frost/Nixon (2008)', 'Yip Man (2008)', 'Chi bi xia: Jue zhan tian xia (2009)', 'We Were Soldiers (2002)', 'The Last Emperor (1987)', 'Al Midan (2013)', "The King's Speech (2010)", 'They Shall Not Grow Old (2018)', 'Elizabeth (1998)', 'JFK (1991)', 'The Great Escape (1963)', 'Seppuku (1962)', 'The Salt of the Earth (2014)', 'Spartacus (1960)', 'Rurôni Kenshin: Meiji kenkaku roman tan (2012)', 'Straight Outta Compton (2015)', 'Pride (2014)', 'Malcolm 

In [44]:
### You Should Not Need To Modify Anything In This Cell

# check 1 
assert t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History']) == recs_20_for_1_filtered,  "The first check failed..."
# check 2
assert t.popular_recs_filtered('53968', 5, ranked_movies, years=['2015', '2016', '2017', '2018']) == recs_5_for_53968_filtered,  "The second check failed..."
# check 3
assert t.popular_recs_filtered('70000', 100, ranked_movies, genres=['History', 'News']) == recs_100_for_70000_filtered,  "The third check failed..."

print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!
