### Recommendations with MovieTweetings: Most Popular Recommendation

Now that you have created the columns we will use throughout the rest of this lesson, let's get started with the first of our recommendation methods.

To get started, read in the libraries and the two datasets you will be using throughout the lesson using the code below.


In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tests as t

%matplotlib inline

# Read in the datasets
movies = pd.read_csv('movies_clean.csv')
reviews = pd.read_csv('reviews_clean.csv')
del movies['Unnamed: 0']
del reviews['Unnamed: 0']

#### 1. How To Find The Most Popular Movies

This notebook has a single task: no matter who the user is, provide a list of recommendations based simply on the most popular items.

For this task, we will consider what is "most popular" based on the following criteria:

* A movie with the highest average rating is considered best
* When average ratings are tied, the movie with more ratings ranks higher
* A movie must have a minimum of 5 ratings to be considered among the best movies
* If movies are tied in both average rating and number of ratings, the movie with the most recent rating ranks higher

With these criteria, the goal for this notebook is to take a **user_id** and provide back the **n_top** recommendations.  Use the function below as the scaffolding that will be used for all the future recommendations as well.
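To make the criteria concrete, here is a minimal sketch on a small made-up dataset. The data and the helper `rank_toy_movies` are hypothetical and exist only for illustration; the column names (`movie_id`, `rating`, `date`) mirror the reviews data, and dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the reviews dataset.
toy_reviews = pd.DataFrame({
    'movie_id': [1] * 5 + [2] * 6 + [3] * 5 + [4] * 2 + [5] * 5,
    'rating': [10] * 18 + [9] * 5,
    'date': (['2015-01-01'] * 5 + ['2014-01-01'] * 6 + ['2016-01-01'] * 5
             + ['2018-01-01'] * 2 + ['2017-01-01'] * 5),
})

def rank_toy_movies(reviews, min_ratings=5):
    """Rank movie_ids by mean rating, then rating count, then latest rating date."""
    stats = reviews.groupby('movie_id').agg(
        avg_rating=('rating', 'mean'),
        num_ratings=('rating', 'count'),
        last_rating=('date', 'max'),
    )
    # Drop movies that do not meet the minimum number of ratings
    stats = stats[stats['num_ratings'] >= min_ratings]
    # Sort by all three criteria, best first
    return stats.sort_values(
        ['avg_rating', 'num_ratings', 'last_rating'], ascending=False
    )

toy_ranked = rank_toy_movies(toy_reviews)
print(toy_ranked.index.tolist())
```

In this toy data, movie 2 ranks first because it has the most ratings among the movies averaging 10, movie 3 beats movie 1 on the more recent rating, movie 5 follows with its lower average, and movie 4 is dropped for having only two ratings.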

In [2]:
movies

,movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,...,Fantasy,Romance,Game-Show,Action,Documentary,Animation,Comedy,Short,Western,Thriller
0,8,Edison Kinetoscopic Record of a Sneeze (1894),Documentary|Short,1894,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
1,10,La sortie des usines Lumière (1895),Documentary|Short,1895,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
2,12,The Arrival of a Train (1896),Documentary|Short,1896,1,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
3,25,The Oxford and Cambridge University Boat Race ...,,1895,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,91,Le manoir du diable (1896),Short|Horror,1896,1,0,0,0,0,1,...,0,0,0,0,0,0,0,1,0,0
5,417,Le voyage dans la lune (1902),Short|Adventure|Fantasy,1902,0,1,0,0,0,0,...,1,0,0,0,0,0,0,1,0,0
6,439,The Great Train Robbery (1903),Short|Action|Crime,1903,0,1,0,0,0,0,...,0,0,0,1,0,0,0,1,0,0
7,443,"Hiawatha, the Messiah of the Ojibway (1903)",,1903,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,628,The Adventures of Dollie (1908),Action|Short,1908,0,1,0,0,0,0,...,0,0,0,1,0,0,0,1,0,0
9,833,The Country Doctor (1909),Short|Drama,1909,0,1,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


In [3]:
reviews

,user_id,movie_id,rating,timestamp,date,month_1,month_2,month_3,month_4,month_5,...,month_9,month_10,month_11,month_12,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018
0,1,68646,10,1381620027,2013-10-12 23:20:27,0,0,0,0,0,...,0,1,0,0,1,0,0,0,0,0
1,1,113277,10,1379466669,2013-09-18 01:11:09,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,2,422720,8,1412178746,2014-10-01 15:52:26,0,0,0,0,0,...,0,1,0,0,0,1,0,0,0,0
3,2,454876,8,1394818630,2014-03-14 17:37:10,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
4,2,790636,7,1389963947,2014-01-17 13:05:47,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
5,2,816711,8,1379963769,2013-09-23 19:16:09,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
6,2,1091191,7,1391173869,2014-01-31 13:11:09,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
7,2,1103275,7,1408192129,2014-08-16 12:28:49,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
8,2,1322269,7,1391529691,2014-02-04 16:01:31,0,0,0,0,0,...,0,0,0,0,0,1,0,0,0,0
9,2,1390411,8,1451374513,2015-12-29 07:35:13,0,0,0,0,0,...,0,0,0,1,0,0,1,0,0,0


In [4]:
movie_is_five = pd.DataFrame(reviews.groupby('movie_id')['rating'].count() >= 5)

In [5]:
reviews.groupby('movie_id').max()['date']

movie_id
8          2014-04-08 18:20:11
10         2014-10-09 18:15:53
12         2015-08-10 23:16:19
25         2017-02-27 10:04:59
91         2013-11-23 18:59:55
417        2018-06-26 21:08:46
439        2018-06-24 02:59:09
443        2016-10-23 21:08:32
628        2013-11-29 12:55:28
833        2013-11-29 15:27:27
1223       2013-10-12 17:48:58
1740       2013-11-29 12:54:57
2101       2014-03-24 06:28:08
2130       2014-03-30 04:30:03
2354       2016-02-24 19:41:51
2844       2013-03-09 06:36:12
3740       2018-01-29 03:25:17
3863       2015-05-01 00:03:27
4099       2017-09-09 01:01:51
4100       2017-01-26 19:52:16
4101       2015-04-30 23:49:52
4210       2015-05-01 00:07:39
4395       2015-05-01 00:10:28
4457       2016-05-02 00:03:29
4518       2015-05-04 05:12:52
4546       2015-05-04 05:23:56
4936       2013-07-06 00:21:09
4972       2017-11-06 15:47:21
5074       2015-04-06 20:34:50
5078       2013-08-23 12:56:15
                  ...         
7942708    2018-03-14 22:25:17

In [6]:
df = pd.merge(reviews, movie_is_five, on='movie_id', how='left')
df = df.loc[df['rating_y'] == True]
df

,user_id,movie_id,rating_x,timestamp,date,month_1,month_2,month_3,month_4,month_5,...,month_10,month_11,month_12,year_2013,year_2014,year_2015,year_2016,year_2017,year_2018,rating_y
0,1,68646,10,1381620027,2013-10-12 23:20:27,0,0,0,0,0,...,1,0,0,1,0,0,0,0,0,True
1,1,113277,10,1379466669,2013-09-18 01:11:09,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,True
2,2,422720,8,1412178746,2014-10-01 15:52:26,0,0,0,0,0,...,1,0,0,0,1,0,0,0,0,True
3,2,454876,8,1394818630,2014-03-14 17:37:10,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,True
4,2,790636,7,1389963947,2014-01-17 13:05:47,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,True
5,2,816711,8,1379963769,2013-09-23 19:16:09,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,True
6,2,1091191,7,1391173869,2014-01-31 13:11:09,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,True
7,2,1103275,7,1408192129,2014-08-16 12:28:49,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,True
8,2,1322269,7,1391529691,2014-02-04 16:01:31,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,True
9,2,1390411,8,1451374513,2015-12-29 07:35:13,0,0,0,0,0,...,0,0,1,0,0,1,0,0,0,True


In [7]:
df.groupby('movie_id')['rating_x'].mean().sort_values(ascending=False)

movie_id
5262972    10.000000
2219210    10.000000
2059318    10.000000
4921860    10.000000
5688932    10.000000
4448444    10.000000
5131914    10.000000
1431149    10.000000
2560840    10.000000
2737018    10.000000
5512872     9.985836
4148400     9.882353
6798422     9.800000
1629443     9.800000
423176      9.800000
111341      9.800000
2396421     9.666667
58888       9.666667
2592910     9.666667
12364       9.625000
57565       9.600000
6054758     9.600000
5134588     9.600000
363473      9.600000
2265179     9.571429
5323386     9.555556
45274       9.500000
29843       9.500000
73440       9.428571
6781498     9.428571
             ...    
2088923     2.750000
3727824     2.714286
2769184     2.714286
3203620     2.714286
466342      2.666667
2006801     2.666667
3283792     2.600000
2325518     2.444444
185183      2.437500
1540767     2.434783
2357489     2.400000
3551400     2.400000
110978      2.333333
5988370     2.285714
829176      2.269231
3036740     2.200000
2622

In [8]:
movies.set_index('movie_id').head()

movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,Musical,...,Fantasy,Romance,Game-Show,Action,Documentary,Animation,Comedy,Short,Western,Thriller
8,Edison Kinetoscopic Record of a Sneeze (1894),Documentary|Short,1894,1,0,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
10,La sortie des usines Lumière (1895),Documentary|Short,1895,1,0,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
12,The Arrival of a Train (1896),Documentary|Short,1896,1,0,0,0,0,0,0,...,0,0,0,0,1,0,0,1,0,0
25,The Oxford and Cambridge University Boat Race ...,,1895,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
91,Le manoir du diable (1896),Short|Horror,1896,1,0,0,0,0,1,0,...,0,0,0,0,0,0,0,1,0,0


In [9]:
def create_ranked_df(movies, reviews):
    """Return the movies ranked from best to worst.

    Ranking criteria:
        - A movie with the highest average rating is considered best
        - With ties, movies that have more ratings are better
        - A movie must have a minimum of 5 ratings to be considered among the best movies
        - If movies are tied in their average rating and number of ratings, the movie with the most recent rating ranks higher
    """
    # Per-movie average rating and number of ratings
    movie_ratings = reviews.groupby('movie_id')['rating']
    avg_ratings = movie_ratings.mean()
    num_ratings = movie_ratings.count()

    # Date of the most recent rating for each movie
    last_ratings = reviews.groupby('movie_id')['date'].max().rename('last_rating')

    ratings_count_df = pd.DataFrame({'avg_rating': avg_ratings, 'num_rating': num_ratings})
    ratings_count_df = ratings_count_df.join(last_ratings)

    # Attach the rating statistics to the movie metadata
    movie_recs = movies.set_index('movie_id').join(ratings_count_df)

    # Sort by all three criteria (descending), then drop movies with fewer than 5 ratings
    ranked_movies = movie_recs.sort_values(['avg_rating', 'num_rating', 'last_rating'], ascending=False)
    ranked_movies = ranked_movies.loc[ranked_movies['num_rating'] >= 5]

    return ranked_movies

In [10]:
def popular_recommendations(user_id, n_top):
    '''
    INPUT:
    user_id - the user_id of the individual you are making recommendations for
    n_top - an integer of the number of recommendations you want back
    OUTPUT:
    top_movies - a list of the n_top recommended movies by movie title in order best to worst
    '''
    # Popularity does not depend on the user, so user_id is unused here.
    # Relies on the module-level ranked_movies built with create_ranked_df.
    top_movies = list(ranked_movies['movie'][:n_top])

    return top_movies # a list of the n_top movies as recommended

Using the criteria above, you should be able to put together the above function.  If you feel confident in your solution, check the results of your function against our solution. On the next page, you can see a walkthrough, and you can of course get the solution by looking at the solution notebook available in this workspace.

In [11]:
ranked_movies = create_ranked_df(movies, reviews)

# Top 20 movies recommended for id 1
recs_20_for_1 = popular_recommendations(1, 20)

# Top 5 movies recommended for id 53968
recs_5_for_53968 = popular_recommendations(53968, 5)

# Top 100 movies recommended for id 70000
recs_100_for_70000 = popular_recommendations(70000, 100)

# Top 35 movies recommended for id 43
recs_35_for_43 = popular_recommendations(43, 35)



In [12]:
recs_20_for_1

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)',
 'Be Somebody (2016)',
 'Birlesen Gonuller (2014)',
 'Agnelli (2017)',
 'Sátántangó (1994)',
 'Shijie (2004)',
 'Foster (2011)',
 'CM101MMXI Fundamentals (2013)',
 'Akahige (1965)',
 'Crystal Lake Memories: The Complete History of Friday the 13th (2013)',
 'Körkarlen (1921)']

In [13]:
### You Should Not Need To Modify Anything In This Cell
ranked_movies = t.create_ranked_df(movies, reviews) # only run this once - it is not fast

# check 1 
assert t.popular_recommendations('1', 20, ranked_movies) == recs_20_for_1,  "The first check failed..."
# check 2
assert t.popular_recommendations('53968', 5, ranked_movies) == recs_5_for_53968,  "The second check failed..."
# check 3
assert t.popular_recommendations('70000', 100, ranked_movies) == recs_100_for_70000,  "The third check failed..."
# check 4
assert t.popular_recommendations('43', 35, ranked_movies) == recs_35_for_43,  "The fourth check failed..."

print("If you got here, looks like you are good to go!  Nice job!")

If you got here, looks like you are good to go!  Nice job!


In [14]:
t.popular_recommendations('1', 20, ranked_movies)

['MSG 2 the Messenger (2015)',
 'Avengers: Age of Ultron Parody (2015)',
 'Sorry to Bother You (2018)',
 'Selam (2013)',
 "Quiet Riot: Well Now You're Here, There's No Way Back (2014)",
 'Crawl Bitch Crawl (2012)',
 'Make Like a Dog (2015)',
 'Pandorica (2016)',
 'Third Contact (2011)',
 'Romeo Juliet (2009)',
 'Be Somebody (2016)',
 'Birlesen Gonuller (2014)',
 'Agnelli (2017)',
 'Sátántangó (1994)',
 'Shijie (2004)',
 'Foster (2011)',
 'CM101MMXI Fundamentals (2013)',
 'Akahige (1965)',
 'Crystal Lake Memories: The Complete History of Friday the 13th (2013)',
 'Körkarlen (1921)']

**Notice:** This wasn't the only way we could have determined the "top rated" movies.  You can imagine that in keeping track of trending news or trending social events, you would likely want to create a time window from the current time, and then pull the articles in the most recent time frame.  There are always going to be some subjective decisions to be made.  
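As a rough illustration of the time-window idea, the sketch below counts only the ratings that fall within a fixed number of days before a reference time. The helper `popular_in_window`, its parameters, and the toy data are all hypothetical; "popular" here is assumed to mean "most ratings in the window", which is just one of several reasonable definitions.

```python
import pandas as pd

def popular_in_window(reviews, now, window_days=30, n_top=3):
    """Return the movie_ids with the most ratings in the window_days before now."""
    dates = pd.to_datetime(reviews['date'])
    end = pd.Timestamp(now)
    start = end - pd.Timedelta(days=window_days)
    # Keep only ratings made inside the window (start, end]
    recent = reviews[(dates > start) & (dates <= end)]
    counts = recent.groupby('movie_id')['rating'].count()
    return counts.sort_values(ascending=False).index[:n_top].tolist()

# Hypothetical toy data: movie 2 has the most ratings overall, but most are old.
toy_reviews = pd.DataFrame({
    'movie_id': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3],
    'rating': [8, 9, 7, 10, 10, 10, 10, 9, 6, 7],
    'date': ['2018-06-10', '2018-06-15', '2018-06-20',
             '2017-01-01', '2017-01-01', '2017-01-01', '2017-01-01', '2018-06-25',
             '2018-06-05', '2018-06-30'],
})

print(popular_in_window(toy_reviews, now='2018-07-01', n_top=2))
```

With a 30-day window ending on 2018-07-01, movie 1 (three recent ratings) and movie 3 (two) come out ahead of movie 2, even though movie 2 has the most ratings overall.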

If you find that no one is paying any attention to your most popular recommendations, then it might be time to find a new way to recommend, which is what the next parts of the lesson should prepare us to do!


### Part II: Adding Filters

Now that you have created a function to give back the **n_top** movies, let's make it a bit more robust.  Add arguments that will act as filters for the movie **year** and **genre**.  

Use the cells below to adjust your existing function to accept **year** and **genre** arguments as **lists** of **strings**.  The final results should contain only movies that match any of the provided years and any of the provided genres (each list acts as an `or` condition).  If no list is provided for an argument, no filter should be applied for it.

You can adjust other inputs as needed to retrieve the final results you are looking for!
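One way to picture the `or` semantics is with boolean masks over a tiny made-up table. The helper `filter_ranked` and the toy data are hypothetical; the sketch assumes, as in the ranked movies data, that the year is stored as a string in `date` and that each genre is a 0/1 indicator column.

```python
import pandas as pd

# Hypothetical toy table with a string year and one-hot genre columns.
toy_ranked = pd.DataFrame({
    'movie': ['A (2015)', 'B (2016)', 'C (2010)', 'D (2017)'],
    'date': ['2015', '2016', '2010', '2017'],
    'History': [1, 0, 1, 0],
    'Comedy': [0, 1, 0, 0],
})

def filter_ranked(ranked, years=None, genres=None):
    """Keep rows matching any given year and any given genre; None skips a filter."""
    mask = pd.Series(True, index=ranked.index)
    if years is not None:
        # True when the movie's year is any one of the requested years
        mask &= ranked['date'].isin(years)
    if genres is not None:
        # True when at least one of the requested genre flags is set
        mask &= ranked[genres].sum(axis=1) > 0
    return ranked[mask]

filtered = filter_ranked(toy_ranked,
                         years=['2015', '2016', '2017'],
                         genres=['History', 'Comedy'])
print(filtered['movie'].tolist())
```

Here movie C is dropped because of its year and movie D because it has neither genre, while calling `filter_ranked(toy_ranked)` with no lists leaves the table unchanged.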

Try writing a few tests that compare your function against the one in our `tests` module.  The call below returns the top 20 movies for user 1 based on the specified year and genre filters.  Does yours return the same?

```
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])
```

In [15]:
ranked_movies.head()

movie_id,movie,genre,date,1800's,1900's,2000's,History,News,Horror,Musical,...,Action,Documentary,Animation,Comedy,Short,Western,Thriller,avg_rating,num_ratings,last_rating
4921860,MSG 2 the Messenger (2015),Comedy|Drama|Fantasy,2015,0,0,1,0,0,0,0,...,0,0,0,1,0,0,0,10.0,48,2016-08-14 17:16:50
5262972,Avengers: Age of Ultron Parody (2015),Short|Comedy,2015,0,0,1,0,0,0,0,...,0,0,0,1,1,0,0,10.0,28,2016-01-08 00:44:43
5688932,Sorry to Bother You (2018),Comedy|Fantasy|Sci-Fi,2018,0,0,1,0,0,0,0,...,0,0,0,1,0,0,0,10.0,14,2018-06-17 01:44:48
2737018,Selam (2013),Drama|Romance,2013,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,10.0,10,2015-05-10 22:56:01
2560840,"Quiet Riot: Well Now You're Here, There's No W...",Documentary|Music,2014,0,0,1,0,0,0,0,...,0,1,0,0,0,0,0,10.0,6,2016-01-23 00:30:44


In [16]:
t.popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])

['Ayla: The Daughter of War (2017)',
 'I Believe in Miracles (2015)',
 'The Farthest (2017)',
 'Sado (2015)',
 'Hatred (2016)',
 'Kincsem (2017)',
 'Nise - O Coração da Loucura (2015)',
 'LA 92 (2017)',
 'Straight Outta Compton (2015)',
 'Manjhi: The Mountain Man (2015)',
 'Only the Dead (2015)',
 'Spotlight (2015)',
 'Under sandet (2015)',
 'Airlift (2016)',
 'Dunkirk (2017)',
 'Taeksi Woonjunsa (2017)',
 'The Battleship Island (2017)',
 'Darkest Hour (2017)',
 'Best of Enemies (2015)',
 'The Ghazi Attack (2017)']

In [25]:
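def popular_recs_filtered(user_id, n_top, ranked_movies, years=None, genres=None):
    """Return the n_top movie titles, optionally filtered by year and genre.

    years and genres are lists of strings; None applies no filter for that argument.
    """
    top_movies = ranked_movies
    # Keep movies released in any of the requested years
    if years is not None:
        top_movies = top_movies.loc[top_movies['date'].isin(years)]
    # Keep movies flagged with at least one of the requested genres
    if genres is not None:
        top_movies = top_movies.loc[top_movies[genres].sum(axis=1) > 0]

    return list(top_movies['movie'][:n_top])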

In [26]:
popular_recs_filtered('1', 20, ranked_movies, years=['2015', '2016', '2017', '2018'], genres=['History'])

['Ayla: The Daughter of War (2017)',
 'I Believe in Miracles (2015)',
 'The Farthest (2017)',
 'Sado (2015)',
 'Hatred (2016)',
 'Kincsem (2017)',
 'Nise - O Coração da Loucura (2015)',
 'LA 92 (2017)',
 'Straight Outta Compton (2015)',
 'Manjhi: The Mountain Man (2015)',
 'Only the Dead (2015)',
 'Spotlight (2015)',
 'Under sandet (2015)',
 'Airlift (2016)',
 'Dunkirk (2017)',
 'Taeksi Woonjunsa (2017)',
 'The Battleship Island (2017)',
 'Darkest Hour (2017)',
 'Best of Enemies (2015)',
 'The Ghazi Attack (2017)']