# Date Night Movie

#### Grading:


- Code: 90 pts
- Markdown Documentation: 10 pts


In this assignment we are going to use pandas to figure out: what's the best **date-night movie**?

This assignment is going to use:
- Joining
- Groupby
- Sorting
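
As a quick orientation, the three operations listed above can be sketched on a tiny made-up dataset. The table names `toy_ratings` and `toy_movies` and all values are invented for illustration and are not part of the assignment data.

```python
import pandas as pd

# Toy stand-ins for the ratings and movies tables (values are made up).
toy_ratings = pd.DataFrame({
    "movie_id": [1, 1, 2, 2, 2],
    "rating": [5, 4, 3, 4, 2],
})
toy_movies = pd.DataFrame({
    "movie_id": [1, 2],
    "title": ["Movie A", "Movie B"],
})

# Joining: attach a title to every rating through the shared movie_id key.
toy_joined = pd.merge(toy_ratings, toy_movies, on="movie_id")

# Groupby + sorting: average rating per title, best first.
toy_ranking = (
    toy_joined.groupby("title")["rating"]
    .mean()
    .sort_values(ascending=False)
)
print(toy_ranking)
```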


In [1]:
import os
import pandas as pd

##### Read in the movie data: `pd.read_table`

In [2]:
def get_movie_data():
    # engine='python' is needed because the '::' separator is longer than
    # one character, which the default C parser does not support.
    unames = ['user_id','gender','age','occupation','zip']
    users = pd.read_table(os.path.join('../data','users.dat'),
                          sep='::', header=None, names=unames, engine='python')

    rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    ratings = pd.read_table(os.path.join('../data', 'ratings.dat'),
                            sep='::', header=None, names=rnames, engine='python')

    mnames = ['movie_id', 'title','genres']
    movies = pd.read_table(os.path.join('../data', 'movies.dat'),
                           sep='::', header=None, names=mnames, engine='python')

    return users, ratings, movies

In [3]:
users, ratings, movies = get_movie_data()


In [4]:
users.head()

| | user_id | gender | age | occupation | zip |
|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| 2 | 3 | M | 25 | 15 | 55117 |
| 3 | 4 | M | 45 | 7 | 2460 |
| 4 | 5 | M | 25 | 20 | 55455 |


In [5]:
ratings.head()

| | user_id | movie_id | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |


In [6]:
movies.head()

| | movie_id | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


##### Clean up the `movies` table

- Get the `year`
- Shorten the `title`


In [7]:
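# Split "Title (1995)" into the title text (column 0) and the year (column 1).
tmp = movies.title.str.extract(r'(.*) \(([0-9]+)\)')
# DataFrame.apply runs once per column, so x[0] is the first row's value;
# these two calls only display a result and do not modify tmp.
tmp.apply(lambda x:x[0] if len(x) > 0 else None)
tmp.apply(lambda x: x[0][:40] if len(x) > 0 else None)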

0    Toy Story
1         1995
dtype: object

In [8]:
movies['year'] = tmp[1]
movies['short_title'] = tmp[0]

In [9]:
movies.head()

| | movie_id | title | genres | year | short_title |
|---|---|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy | 1995 | Toy Story |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy | 1995 | Jumanji |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance | 1995 | Grumpier Old Men |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama | 1995 | Waiting to Exhale |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy | 1995 | Father of the Bride Part II |
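
The cleanup step can also be written so that the title is truncated element by element. The following is a minimal sketch on made-up titles, assuming a four-digit year in parentheses at the end of each title; the names `sample_titles` and `parts` are hypothetical.

```python
import pandas as pd

# Made-up titles; the last one has no year, to show what a non-match produces.
sample_titles = pd.Series(["Toy Story (1995)", "Jumanji (1995)", "No Year Here"])

# Two capture groups: the text before the final " (dddd)" and the four digits.
parts = sample_titles.str.extract(r"^(.*) \((\d{4})\)$")
parts.columns = ["short_title", "year"]

# The .str accessor slices each string individually; missing values stay NaN.
parts["short_title"] = parts["short_title"].str[:40]
print(parts)
```

Titles that do not match the pattern come back as `NaN` in both columns, which makes them easy to find with `isna()`.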


##### Join the tables with `pd.merge` (20 pts)

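Merges the three tables into one table and displays its first rows. With no keys given, `pd.merge` joins on the columns the tables share (`user_id`, then `movie_id`).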

In [10]:
merged = pd.merge(users, ratings)
allMerged = pd.merge(merged, movies)
allMerged.head()

| | user_id | gender | age | occupation | zip | movie_id | rating | timestamp | title | genres | year | short_title |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | F | 1 | 10 | 48067 | 1193 | 5 | 978300760 | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 1 | 2 | M | 56 | 16 | 70072 | 1193 | 5 | 978298413 | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 2 | 12 | M | 25 | 12 | 32793 | 1193 | 4 | 978220179 | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 3 | 15 | M | 25 | 7 | 22903 | 1193 | 4 | 978199279 | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
| 4 | 17 | M | 50 | 1 | 95350 | 1193 | 5 | 978158471 | One Flew Over the Cuckoo's Nest (1975) | Drama | 1975 | One Flew Over the Cuckoo's Nest |
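
The join keys can also be spelled out, which guards against accidentally joining on an unintended shared column. This is a sketch on made-up tables (`toy_users`, `toy_ratings`, `toy_movies` are hypothetical names); `on` and `validate` are standard `merge` parameters.

```python
import pandas as pd

toy_users = pd.DataFrame({"user_id": [1, 2], "gender": ["F", "M"]})
toy_ratings = pd.DataFrame({
    "user_id": [1, 1, 2],
    "movie_id": [10, 20, 10],
    "rating": [5, 3, 4],
})
toy_movies = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})

# Name the join keys explicitly. validate raises a MergeError if a key that
# should be unique on one side turns out to be duplicated.
toy_merged = (
    toy_users
    .merge(toy_ratings, on="user_id", validate="one_to_many")
    .merge(toy_movies, on="movie_id", validate="many_to_one")
)
print(toy_merged)
```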


##### What's the highest rated movie? (20 pts)

Ranks the movies by the sum of their ratings and displays the top five; the movie with the largest sum is taken as the highest rated. Because a sum grows with the number of ratings, this ranking favors widely watched movies.

In [11]:
print("The first five highest rated movies are:")
# Takes the sum of the ratings and outputs the 5 highest rated movies.
allMerged.groupby(['title'])['rating'].sum().sort_values(ascending = False).head()

The first five highest rated movies are:


title
American Beauty (1999)                                   14800
Star Wars: Episode IV - A New Hope (1977)                13321
Star Wars: Episode V - The Empire Strikes Back (1980)    12836
Star Wars: Episode VI - Return of the Jedi (1983)        11598
Saving Private Ryan (1998)                               11507
Name: rating, dtype: int64

Based on the highest sum of ratings, the highest rated movie is **American Beauty**.
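
A sum of ratings rewards movies that many people rated, even if the individual ratings are middling. One common alternative, sketched here on made-up data, is to rank by mean rating and require a minimum number of ratings. The cutoff `min_ratings = 3` is an arbitrary assumption for this toy example, not a value from the assignment.

```python
import pandas as pd

# Made-up ratings: "Popular" has many average ratings, "Loved" has fewer
# but higher ratings, "Obscure" has a single perfect rating.
toy = pd.DataFrame({
    "title": ["Popular"] * 5 + ["Loved"] * 3 + ["Obscure"],
    "rating": [3, 3, 4, 3, 3, 5, 5, 4, 5],
})

stats = toy.groupby("title")["rating"].agg(["sum", "mean", "count"])

min_ratings = 3  # hypothetical cutoff so one-off ratings do not dominate
by_sum = stats["sum"].sort_values(ascending=False)
by_mean = (
    stats.loc[stats["count"] >= min_ratings, "mean"]
    .sort_values(ascending=False)
)

print("Top by sum: ", by_sum.index[0])
print("Top by mean:", by_mean.index[0])
```

On the real data the same pattern would apply to `allMerged.groupby('title')['rating']`, with a cutoff chosen to suit the dataset.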

##### What is a good rated movie for date night? (60 pts)

- Hint: a movie that is highly rated
    - by both partners (who may or may not be the same gender),
    - based on genre preferences,
    - with age group optionally combined

This splits the ratings by gender: rows where `gender` is `M` go into `male`, and rows where `gender` is `F` go into `female`.

In [12]:
allMovieUsers = allMerged.merge(users)
male = allMovieUsers[allMovieUsers['gender'] == 'M']
female = allMovieUsers[allMovieUsers['gender'] == 'F']
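
The hint asks for a movie that both partners rate highly. One way to express that in a single number, sketched on made-up data, is to compute the mean rating per gender and score each movie by the lower of the two means. The column name `date_score` and this scoring rule are assumptions for illustration, not part of the assignment.

```python
import pandas as pd

# Made-up ratings: Movie A splits opinion, Movie B is liked by both groups.
toy = pd.DataFrame({
    "title":  ["Movie A"] * 4 + ["Movie B"] * 4,
    "gender": ["M", "M", "F", "F"] * 2,
    "rating": [5, 5, 2, 3, 4, 4, 4, 5],
})

# One row per title, one column per gender, each cell a mean rating.
by_gender = toy.pivot_table(
    index="title", columns="gender", values="rating", aggfunc="mean"
)

# Hypothetical score: a movie is only as good as its less enthusiastic group.
by_gender["date_score"] = by_gender[["F", "M"]].min(axis=1)
best = by_gender["date_score"].idxmax()
print(by_gender)
print("Best for both:", best)
```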

Displays the top five movies among male users, ranked by the sum of ratings.

In [13]:
print("The best male movies are:")
male.groupby(['title'])['rating'].sum().sort_values(ascending=False).head()

The best male movies are:


title
American Beauty (1999)                                   10790
Star Wars: Episode IV - A New Hope (1977)                10537
Star Wars: Episode V - The Empire Strikes Back (1980)    10175
Saving Private Ryan (1998)                                9141
Star Wars: Episode VI - Return of the Jedi (1983)         9074
Name: rating, dtype: int64

The best male movie is **American Beauty**.

Displays the top five movies among female users, ranked by the sum of ratings.

In [14]:
print("The best female movies are:")
female.groupby(['title'])['rating'].sum().sort_values(ascending=False).head()

The best female movies are:


title
American Beauty (1999)              4010
Shakespeare in Love (1998)          3337
Silence of the Lambs, The (1991)    3016
Sixth Sense, The (1999)             2973
Shawshank Redemption, The (1994)    2846
Name: rating, dtype: int64

The best female movie is **American Beauty**.

Displays the five most common genre labels among ratings by males, with the number of ratings for each.

In [15]:
male.genres.value_counts().head()

Comedy             87675
Drama              78571
Comedy|Drama       29937
Comedy|Romance     27112
Action|Thriller    21929
Name: genres, dtype: int64

The number one genre choice for males is comedy, followed by drama.

Displays the five most common genre labels among ratings by females, with the number of ratings for each.

In [16]:
female.genres.value_counts().head()

Drama             32852
Comedy            29208
Comedy|Romance    15600
Comedy|Drama      12308
Drama|Romance     11749
Name: genres, dtype: int64

The number one genre choice for females is drama, followed by comedy. This is the opposite of the order for males.

The next cell filters the male ratings by genre so that the best five movies can be shown for each genre. I will use the genres comedy, drama, and comedy|drama because those are the top three for males.
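
Note that `value_counts` on the `genres` column treats each combined label, such as `Comedy|Drama`, as its own category. If the individual genres are of interest, the labels can be split first. A minimal sketch on made-up rows (the column name `genre` is hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],
    "genres": ["Comedy|Drama", "Comedy", "Drama|Romance"],
})

# Split the pipe-separated label into a list, then give each genre its own row.
exploded = toy.assign(genre=toy["genres"].str.split("|")).explode("genre")
genre_counts = exploded["genre"].value_counts()
print(genre_counts)
```

With this layout a movie labelled `Comedy|Drama` counts once toward comedy and once toward drama.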

In [17]:
maleGenre1 = male[male['genres'] == "Comedy"]
maleGenre2 = male[male['genres'] == "Drama"]
maleGenre3 = male[male['genres'] == "Comedy|Drama"]

This displays the best five comedy movies among male users.

In [18]:
maleGenre1.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                                   genres
Being John Malkovich (1999)             Comedy    6878
Monty Python and the Holy Grail (1974)  Comedy    5441
Airplane! (1980)                        Comedy    5426
Ferris Bueller's Day Off (1986)         Comedy    4493
Clerks (1994)                           Comedy    4374
Name: rating, dtype: int64

The best comedy movie for males is **Being John Malkovich**.

This displays the best five drama movies among male users.

In [19]:
maleGenre2.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                                   genres
Shawshank Redemption, The (1994)        Drama     7297
One Flew Over the Cuckoo's Nest (1975)  Drama     5660
Fight Club (1999)                       Drama     4766
Good Will Hunting (1997)                Drama     4553
Amadeus (1984)                          Drama     4146
Name: rating, dtype: int64

The best movie for drama is **The Shawshank Redemption**.

This displays the best five comedy|drama movies among male users.

In [20]:
maleGenre3.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                                       genres      
American Beauty (1999)                      Comedy|Drama    10790
Breakfast Club, The (1985)                  Comedy|Drama     4300
Christmas Story, A (1983)                   Comedy|Drama     4272
As Good As It Gets (1997)                   Comedy|Drama     3861
Life Is Beautiful (La Vita è bella) (1997)    Comedy|Drama     3365
Name: rating, dtype: int64

The best movie for comedy|drama is **American Beauty**.

The next cell filters the female ratings by genre so that the best five movies can be shown for each genre. I will use the genres drama, comedy, and comedy|romance since they are the top three genres for females.

In [21]:
femaleGenre1 = female[female['genres'] == "Drama"]
femaleGenre2 = female[female['genres'] == "Comedy"]
femaleGenre3 = female[female['genres'] == "Comedy|Romance"]

This displays the best five drama movies among female users.

In [22]:
femaleGenre1.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                                   genres
Shawshank Redemption, The (1994)        Drama     2846
One Flew Over the Cuckoo's Nest (1975)  Drama     1914
Good Will Hunting (1997)                Drama     1912
Amadeus (1984)                          Drama     1730
Erin Brockovich (2000)                  Drama     1687
Name: rating, dtype: int64

The best movie for drama is **The Shawshank Redemption**.

This displays the best five comedy movies among female users.

In [23]:
femaleGenre2.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                                   genres
Being John Malkovich (1999)             Comedy    2367
Election (1999)                         Comedy    1620
Ferris Bueller's Day Off (1986)         Comedy    1572
Full Monty, The (1997)                  Comedy    1559
Monty Python and the Holy Grail (1974)  Comedy    1491
Name: rating, dtype: int64

The best movie for comedy is **Being John Malkovich**.

This displays the best five comedy|romance movies among female users.

In [24]:
femaleGenre3.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                               genres        
Shakespeare in Love (1998)          Comedy|Romance    3337
Groundhog Day (1993)                Comedy|Romance    2458
When Harry Met Sally... (1989)      Comedy|Romance    2120
Four Weddings and a Funeral (1994)  Comedy|Romance    1829
Clueless (1995)                     Comedy|Romance    1814
Name: rating, dtype: int64

The best movie for comedy|romance is **Shakespeare in Love**.

Displays the five most common genre labels across all movie users, with the number of ratings for each.

In [25]:
allMovieUsers.genres.value_counts().head()

Comedy            116883
Drama             111423
Comedy|Romance     42712
Comedy|Drama       42245
Drama|Romance      29170
Name: genres, dtype: int64

Most movie viewers enjoy comedy, followed by drama.

The next cell selects two genres across all movie users. I will use comedy because it is the top genre and I enjoy it for a date night, and drama|romance as my next choice because I feel it would make a good date movie.

In [26]:
movieGenre1 = allMovieUsers[allMovieUsers['genres'] == "Comedy"]
movieGenre2 = allMovieUsers[allMovieUsers['genres'] == "Drama|Romance"]

This displays the best five comedy movies across all movie users.

In [27]:
movieGenre1.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                                   genres
Being John Malkovich (1999)             Comedy    9245
Monty Python and the Holy Grail (1974)  Comedy    6932
Airplane! (1980)                        Comedy    6874
Ferris Bueller's Day Off (1986)         Comedy    6065
Election (1999)                         Comedy    5982
Name: rating, dtype: int64

Based on the highest sum, the best comedy movie is **Being John Malkovich**.

This displays the best five drama|romance movies across all movie users.

In [28]:
movieGenre2.groupby(['title', 'genres'])['rating'].sum().sort_values(ascending=False).head()

title                       genres       
Titanic (1997)              Drama|Romance    5540
Graduate, The (1967)        Drama|Romance    5354
Edward Scissorhands (1990)  Drama|Romance    5292
Jerry Maguire (1996)        Drama|Romance    5086
Chasing Amy (1997)          Drama|Romance    3650
Name: rating, dtype: int64

The best drama|romance movie based on the highest sum is **Titanic**.

Splits all users into age groups.

In [29]:
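underEightteen = allMovieUsers[allMovieUsers['age'] < 18]
# Note: the next two filters have no lower bound, so each of those groups
# also contains every younger user from the groups above it.
eightteenTOtwentyfive = allMovieUsers[allMovieUsers['age']  < 26]
twentysixTOfortyfive = allMovieUsers[allMovieUsers['age'] < 46]
fortysixUp = allMovieUsers[allMovieUsers['age'] > 45]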
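
If non-overlapping age groups are wanted, each group needs both a lower and an upper bound. One way to get that, sketched here on made-up ages, is `pd.cut` with explicit bin edges; the labels and the column name `age_group` are hypothetical.

```python
import pandas as pd

# Made-up ages for illustration.
toy = pd.DataFrame({"age": [1, 18, 25, 35, 45, 50, 56]})

# right=False makes every bin include its left edge and exclude its right
# edge: [0, 18), [18, 26), [26, 46), [46, inf).
bins = [0, 18, 26, 46, float("inf")]
labels = ["under 18", "18-25", "26-45", "46+"]
toy["age_group"] = pd.cut(toy["age"], bins=bins, labels=labels, right=False)
print(toy)
```

Each row then belongs to exactly one group, so a later `groupby('age_group')` does not count any user twice.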

Displays the best five movies for the ages of the users that are under 18 years old.

In [30]:
underEightteen.groupby(['title'])['rating'].sum().sort_values(ascending=False).head()

title
Sixth Sense, The (1999)                      466
Matrix, The (1999)                           442
Toy Story (1995)                             439
Star Wars: Episode IV - A New Hope (1977)    431
Toy Story 2 (1999)                           416
Name: rating, dtype: int64

The best movie for users under 18 was **The Sixth Sense**.

Displays the best five movies for users under 26. The filter `age < 26` has no lower bound, so this group also includes the under-18 users.

In [31]:
eightteenTOtwentyfive.groupby(['title'])['rating'].sum().sort_values(ascending=False).head()

title
American Beauty (1999)                                   9418
Star Wars: Episode V - The Empire Strikes Back (1980)    8120
Star Wars: Episode IV - A New Hope (1977)                8077
Matrix, The (1999)                                       7568
Star Wars: Episode VI - Return of the Jedi (1983)        7466
Name: rating, dtype: int64

The best movie was **American Beauty**.

Displays the best five movies for users under 46. The filter `age < 46` has no lower bound, so this group also includes all younger users.

In [32]:
twentysixTOfortyfive.groupby(['title'])['rating'].sum().sort_values(ascending=False).head()

title
American Beauty (1999)                                   13015
Star Wars: Episode IV - A New Hope (1977)                11869
Star Wars: Episode V - The Empire Strikes Back (1980)    11539
Star Wars: Episode VI - Return of the Jedi (1983)        10445
Matrix, The (1999)                                       10360
Name: rating, dtype: int64

The best movie was **American Beauty**.

Displays the best five movies for users who are 46 and older.

In [33]:
fortysixUp.groupby(['title'])['rating'].sum().sort_values(ascending=False).head()

title
American Beauty (1999)                       1785
Star Wars: Episode IV - A New Hope (1977)    1452
Godfather, The (1972)                        1444
Schindler's List (1993)                      1400
Fargo (1996)                                 1399
Name: rating, dtype: int64

The best movie was **American Beauty**.

Displays the five most common genre labels, with rating counts, for users under 18.

In [34]:
underEightteen.genres.value_counts().head()

Comedy             3703
Drama              2239
Comedy|Romance     1219
Comedy|Drama       1000
Action|Thriller     745
Name: genres, dtype: int64

The best genre is **comedy**.

Displays the five most common genre labels, with rating counts, for users under 26 (`age < 26`, which includes the under-18 users).

In [35]:
eightteenTOtwentyfive.genres.value_counts().head()

Comedy             76351
Drama              62751
Comedy|Romance     25973
Comedy|Drama       25906
Action|Thriller    17736
Name: genres, dtype: int64

The best genre is **comedy**.

Displays the five most common genre labels, with rating counts, for users under 46 (`age < 46`, which includes all younger users).

In [36]:
twentysixTOfortyfive.genres.value_counts().head()

Comedy            106570
Drama              95632
Comedy|Romance     37950
Comedy|Drama       37501
Drama|Romance      25370
Name: genres, dtype: int64

The best genre is **comedy**.

Displays the five most common genre labels, with rating counts, for users 46 and above.

In [37]:
fortysixUp.genres.value_counts().head()

Drama             15791
Comedy            10313
Comedy|Romance     4762
Comedy|Drama       4744
Drama|Romance      3800
Name: genres, dtype: int64

The best genre is **drama**.

I am creating a list of six highly rated movies from the data, and then I will pick the one movie from the list that I feel would be the best.

In [38]:
moviesForDateNight = ["American Beauty", "The Breakfast Club", "Being John Malkovich", "Shakespeare in Love", "Star Wars: Episode IV - A New Hope", "Titanic"]

This prints the list of the six choices for your date night viewing.

In [39]:
print(moviesForDateNight)

['American Beauty', 'The Breakfast Club', 'Being John Malkovich', 'Shakespeare in Love', 'Star Wars: Episode IV - A New Hope', 'Titanic']


The best date-night movie would be **American Beauty**, based on all the data for age and gender and on the highest sum of ratings. It is a mixture of comedy and drama, which would make it a great movie for a date.