# Course Solutions

1. [Pandas Intro](#1.-Pandas-Intro)
1. [Setting a meaningful index](#2.-Setting-a-meaningful-index)
1. [Making the most of a Jupyter Notebook](#3.-Making-the-most-of-a-Jupyter-Notebook)
1. [Selecting Subsets of Data from DataFrames with just the brackets](#4.-Selecting-Subsets-of-Data-from-DataFrames-with-just-the-brackets)
1. [Selecting Subsets of Data from DataFrames with `.loc`](#5.-Selecting-Subsets-of-Data-from-DataFrames-with-.loc)
1. [Selecting Subsets of Data from DataFrames with `.iloc`](#6.-Selecting-Subsets-of-Data-from-DataFrames-with-.iloc)
1. [Selecting Subsets of Data - Series](#7.-Selecting-Subsets-of-Data---Series)
1. [Boolean Indexing Single Conditions](#8.-Boolean-Indexing-Single-Conditions)
1. [Boolean Indexing Multiple Conditions](#9.-Boolean-Indexing-Multiple-Conditions)
1. [Boolean Indexing More](#10.-Boolean-Indexing-More)

# 1. Pandas Intro

In [3]:
import pandas as pd
import numpy as np

pd.options.display.max_columns = 40
bikes = pd.read_csv('../data/bikes.csv')

### Problem 1
<span  style="color:green; font-size:16px">Select the column **`events`**, the type of weather that was recorded, and assign it to a variable with the same name. Output its first 10 values.</span>

In [2]:
events = bikes['events']
events.head(10)

0    mostlycloudy
1    partlycloudy
2    mostlycloudy
3    mostlycloudy
4    partlycloudy
5    mostlycloudy
6          cloudy
7          cloudy
8          cloudy
9    mostlycloudy
Name: events, dtype: object

### Problem 2
<span  style="color:green; font-size:16px">What type of object is **`events`**?</span>

In [3]:
# it's a Series
type(events)

pandas.core.series.Series

### Problem 3
<span  style="color:green; font-size:16px">Select the last 2 rows of the **`bikes`** DataFrame and assign them to the variable **`bikes_last_2`**. What type of object is **`bikes_last_2`**?</span>

In [4]:
# it's a DataFrame
bikes_last_2 = bikes.tail(2)
type(bikes_last_2)

pandas.core.frame.DataFrame

### Problem 4
<span  style="color:green; font-size:16px">What type of object is returned from the **`dtypes`** attribute?</span>

In [6]:
# a Series
type(bikes.dtypes)

pandas.core.series.Series

### Problem 5
<span  style="color:green; font-size:16px">What type of object is returned from the **`shape`** attribute?</span>

In [6]:
# a tuple of rows, columns
type(bikes.shape)

tuple

### Problem 6
<span  style="color:green; font-size:16px">What type of object is returned from the **`info`** method?</span>

The object **`None`** is returned. What you see is just output printed to the screen.

In [7]:
info_return = bikes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50089 entries, 0 to 50088
Data columns (total 19 columns):
trip_id              50089 non-null int64
usertype             50089 non-null object
gender               50089 non-null object
starttime            50089 non-null object
stoptime             50089 non-null object
tripduration         50089 non-null int64
from_station_name    50089 non-null object
latitude_start       50083 non-null float64
longitude_start      50083 non-null float64
dpcapacity_start     50083 non-null float64
to_station_name      50089 non-null object
latitude_end         50077 non-null float64
longitude_end        50077 non-null float64
dpcapacity_end       50077 non-null float64
temperature          50089 non-null float64
visibility           50089 non-null float64
wind_speed           50089 non-null float64
precipitation        50089 non-null float64
events               50089 non-null object
dtypes: float64(10), int64(2), object(7)
memory usage: 7.3+ MB


In [8]:
type(info_return)

NoneType
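
The same behavior can be sketched on a small made-up DataFrame (the `toy` data below is an assumption, not the bikes dataset). Because `info` writes its report to a stream and returns `None`, one way to keep the text is to pass a buffer through the `buf` parameter:

```python
import io

import pandas as pd

# Hypothetical toy data standing in for the bikes dataset
toy = pd.DataFrame({'trip_id': [1, 2, 3],
                    'events': ['cloudy', 'rain', 'cloudy']})

buffer = io.StringIO()
result = toy.info(buf=buffer)  # the report is written into the buffer, not returned
report = buffer.getvalue()     # the text that would normally be printed

print(result is None)          # info itself returns None
print(report.splitlines()[0])  # first line of the captured report
```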

### Problem 7
<span  style="color:green; font-size:16px">The memory usage reported by the **`info`** method is only a lower bound when your DataFrame has object columns (note the `+` in the output above). Read its docstring and get the true memory usage.</span>

In [9]:
bikes.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50089 entries, 0 to 50088
Data columns (total 19 columns):
trip_id              50089 non-null int64
usertype             50089 non-null object
gender               50089 non-null object
starttime            50089 non-null object
stoptime             50089 non-null object
tripduration         50089 non-null int64
from_station_name    50089 non-null object
latitude_start       50083 non-null float64
longitude_start      50083 non-null float64
dpcapacity_start     50083 non-null float64
to_station_name      50089 non-null object
latitude_end         50077 non-null float64
longitude_end        50077 non-null float64
dpcapacity_end       50077 non-null float64
temperature          50089 non-null float64
visibility           50089 non-null float64
wind_speed           50089 non-null float64
precipitation        50089 non-null float64
events               50089 non-null object
dtypes: float64(10), int64(2), object(7)
memory usage: 28.9 MB
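
To see where the difference comes from, here is a minimal sketch using the `memory_usage` method on a made-up DataFrame (the `toy` data is an assumption; the string column is forced to `object` dtype so the effect shows regardless of pandas version). Without `deep=True`, an object column is measured only by its references; with it, the Python string objects are counted too:

```python
import pandas as pd

# Hypothetical toy data; the string column is explicitly stored as object dtype
toy = pd.DataFrame({
    'n': [1, 2, 3],
    'name': pd.Series(['short', 'a much longer string value', 'mid'], dtype=object),
})

shallow = toy.memory_usage()        # object column: references only
deep = toy.memory_usage(deep=True)  # object column: references plus the strings themselves

print(shallow['name'], deep['name'])  # deep is larger for the object column
print(shallow['n'] == deep['n'])      # numeric columns are unaffected
```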


# 2. Setting a meaningful index

# 3. Making the most of a Jupyter Notebook

# 4. Selecting Subsets of Data from DataFrames with just the brackets

In [9]:
movie = pd.read_csv('../data/movie.csv', index_col='title')

### Problem 1
<span  style="color:green; font-size:16px">Select the column with the director's name as a Series.</span>

In [11]:
movie['director_name'].head()

title
Avatar                                            James Cameron
Pirates of the Caribbean: At World's End         Gore Verbinski
Spectre                                              Sam Mendes
The Dark Knight Rises                         Christopher Nolan
Star Wars: Episode VII - The Force Awakens          Doug Walker
Name: director_name, dtype: object

### Problem 2
<span  style="color:green; font-size:16px">Select the columns with the director's name and the director's number of Facebook likes.</span>

In [12]:
movie[['director_name', 'director_fb']].head()

title,director_name,director_fb
Avatar,James Cameron,0.0
Pirates of the Caribbean: At World's End,Gore Verbinski,563.0
Spectre,Sam Mendes,0.0
The Dark Knight Rises,Christopher Nolan,22000.0
Star Wars: Episode VII - The Force Awakens,Doug Walker,131.0


# 5. Selecting Subsets of Data from DataFrames with `.loc`

### Problem 1
<span  style="color:green; font-size:16px">Select all columns for the movie 'The Dark Knight Rises'.</span>

In [10]:
movie.loc['The Dark Knight Rises']

year                                                            2012
color                                                          Color
content_rating                                                 PG-13
duration                                                         164
director_name                                      Christopher Nolan
director_fb                                                    22000
actor1                                                     Tom Hardy
actor1_fb                                                      27000
actor2                                                Christian Bale
actor2_fb                                                      23000
actor3                                          Joseph Gordon-Levitt
actor3_fb                                                      23000
gross                                                    4.48131e+08
genres                                               Action|Thriller
num_reviews                       

In [11]:
type(movie.loc['The Dark Knight Rises'])

pandas.core.series.Series
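
As a small sketch of why the result is a Series: a single label passed to `.loc` returns one row as a Series, while a one-item list of labels returns a one-row DataFrame. The two-row `toy` frame below is an assumption built from values shown elsewhere in this notebook, not the full movie dataset.

```python
import pandas as pd

# Hypothetical two-row stand-in for the movie dataset
toy = pd.DataFrame({'year': [2009, 2012], 'imdb_score': [7.9, 8.5]},
                   index=['Avatar', 'The Dark Knight Rises'])

row = toy.loc['The Dark Knight Rises']      # single label -> Series
frame = toy.loc[['The Dark Knight Rises']]  # list of labels -> DataFrame

print(type(row).__name__)
print(type(frame).__name__, frame.shape)
```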

### Problem 2
<span  style="color:green; font-size:16px">Select all columns for the movies 'Tangled' and 'Avatar'.</span>

In [14]:
movie.loc[['Tangled', 'Avatar']]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Tangled,2010.0,Color,PG,100.0,Nathan Greno,15.0,Brad Garrett,799.0,Donna Murphy,553.0,M.C. Gainey,284.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,324.0,294810,17th century|based on fairy tale|disney|flower...,English,USA,260000000.0,7.8
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9


### Problem 3
<span  style="color:green; font-size:16px">In what year were 'Tangled' and 'Avatar' made, and what were their IMDB scores?</span>

In [15]:
movie.loc[['Tangled', 'Avatar'], ['year', 'imdb_score']]

title,year,imdb_score
Tangled,2010.0,7.8
Avatar,2009.0,7.9


### Problem 4
<span  style="color:green; font-size:16px">Can you tell what the data type of the `year` column is by just looking at its values?</span>

Yes. The values are displayed with a decimal point (for example, 2010.0), so the column must be a float. Integers are displayed without decimals.
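
One common reason a column of whole-number years ends up as a float is missing data: with the default NumPy-backed integer dtype there is no way to store `NaN`, so pandas falls back to `float64`. A minimal sketch with made-up Series (the names `complete` and `with_gap` are illustrative only):

```python
import numpy as np
import pandas as pd

complete = pd.Series([2009, 2007, 2015])    # no missing values
with_gap = pd.Series([2009, np.nan, 2015])  # one missing value

print(complete.dtype)  # an integer dtype
print(with_gap.dtype)  # float64, because NaN cannot be stored as an integer
```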

### Problem 5
<span  style="color:green; font-size:16px">Use a single method to output the data type and number of non-missing values of `year`. Is it missing any?</span>

In [16]:
# yes, it's missing 106 values: 4810 non-missing vs 4916 total
movie.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4916 entries, Avatar to My Date with Drew
Data columns (total 21 columns):
year               4810 non-null float64
color              4897 non-null object
content_rating     4616 non-null object
duration           4901 non-null float64
director_name      4814 non-null object
director_fb        4814 non-null float64
actor1             4909 non-null object
actor1_fb          4909 non-null float64
actor2             4903 non-null object
actor2_fb          4903 non-null float64
actor3             4893 non-null object
actor3_fb          4893 non-null float64
gross              4054 non-null float64
genres             4916 non-null object
num_reviews        4867 non-null float64
num_voted_users    4916 non-null int64
plot_keywords      4764 non-null object
language           4904 non-null object
country            4911 non-null object
budget             4432 non-null float64
imdb_score         4916 non-null float64
dtypes: float64(10), int64(1), 

### Problem 6
<span  style="color:green; font-size:16px">Select every 100th movie between 'Tangled' and 'Forrest Gump'. Why doesn't 'Forrest Gump' appear in the results?</span>

In [17]:
# Forrest Gump's row is not a multiple of 100 rows after Tangled's
movie.loc['Tangled':'Forrest Gump':100]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Tangled,2010.0,Color,PG,100.0,Nathan Greno,15.0,Brad Garrett,799.0,Donna Murphy,553.0,M.C. Gainey,284.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,324.0,294810,17th century|based on fairy tale|disney|flower...,English,USA,260000000.0,7.8
Shrek the Third,2007.0,Color,PG,93.0,Chris Miller,50.0,Justin Timberlake,3000.0,Eric Idle,795.0,Rupert Everett,692.0,320706665.0,Adventure|Animation|Comedy|Family|Fantasy,227.0,211971,disney spoof|fairy tale|prince|princess|tough guy,English,USA,160000000.0,6.1
X-Men 2,2003.0,Color,PG-13,134.0,Bryan Singer,0.0,Hugh Jackman,20000.0,Bruce Davison,505.0,Aaron Stanford,346.0,214948780.0,Action|Adventure|Fantasy|Sci-Fi|Thriller,289.0,405973,mutant|prison|professor|school|x men,English,Canada,110000000.0,7.5
Cloud Atlas,2012.0,Color,R,172.0,Tom Tykwer,670.0,Tom Hanks,15000.0,Jim Sturgess,5000.0,Jim Broadbent,1000.0,27098580.0,Drama|Sci-Fi,511.0,284825,composer|future|letter|nonlinear timeline|nurs...,English,Germany,102000000.0,7.5
Divergent,2014.0,Color,PG-13,139.0,Neil Burger,168.0,Kate Winslet,14000.0,Theo James,5000.0,Mekhi Phifer,1000.0,150832203.0,Adventure|Mystery|Sci-Fi,459.0,341058,army|brother sister relationship|dystopia|fath...,English,USA,85000000.0,6.7
Hidalgo,2004.0,Color,PG-13,136.0,Joe Johnston,394.0,J.K. Simmons,24000.0,Viggo Mortensen,10000.0,Peter Mensah,1000.0,67286731.0,Action|Adventure|Western,140.0,67856,arab|cowboy|horse|race|sheik,English,USA,100000000.0,6.7
Doom,2005.0,Color,R,113.0,Andrzej Bartkowiak,43.0,Dwayne Johnson,12000.0,Ben Daniels,585.0,Dexter Fletcher,452.0,28031250.0,Action|Adventure|Horror|Sci-Fi,237.0,88146,commando unit|extra chromosome|first person sh...,English,UK,60000000.0,5.2
Gone Girl,2014.0,Color,R,149.0,David Fincher,21000.0,Patrick Fugit,835.0,Sela Ward,812.0,Emily Ratajkowski,625.0,167735396.0,Crime|Drama|Mystery|Thriller,568.0,569841,based on novel|disappearance|missing person|mi...,English,USA,61000000.0,8.1
"Sabrina, the Teenage Witch",,Color,TV-G,22.0,,,Nate Richert,870.0,Soleil Moon Frye,558.0,Caroline Rhea,271.0,,Comedy|Family|Fantasy,20.0,24420,female protagonist|hereditary gift of witchcra...,English,USA,3000000.0,6.6

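The step behavior can be checked on a tiny example. This is a sketch on a made-up eight-row DataFrame (the `toy` frame and its letter labels are assumptions): the stop label is part of the slice, but it only shows up in the result when the step happens to land on it.

```python
import pandas as pd

# Hypothetical data: labels 'a'..'h' at integer positions 0..7
toy = pd.DataFrame({'score': range(8)}, index=list('abcdefgh'))

# 'b' is position 1 and 'h' is position 7; 7 - 1 is a multiple of 3, so 'h' appears
lands_on_stop = toy.loc['b':'h':3]
print(lands_on_stop.index.tolist())  # ['b', 'e', 'h']

# 'g' is position 6; 6 - 1 is not a multiple of 3, so 'g' is skipped
misses_stop = toy.loc['b':'g':3]
print(misses_stop.index.tolist())    # ['b', 'e']
```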

# 6. Selecting Subsets of Data from DataFrames with `.iloc`

### Problem 1
<span  style="color:green; font-size:16px">Select the rows with integer location 10, 5, and 1.</span>

In [18]:
movie.iloc[[10, 5, 1]]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Batman v Superman: Dawn of Justice,2016.0,Color,PG-13,183.0,Zack Snyder,0.0,Henry Cavill,15000.0,Lauren Cohan,4000.0,Alan D. Purwin,2000.0,330249062.0,Action|Adventure|Sci-Fi,673.0,371639,based on comic book|batman|sequel to a reboot|...,English,USA,250000000.0,6.9
John Carter,2012.0,Color,PG-13,132.0,Andrew Stanton,475.0,Daryl Sabara,640.0,Samantha Morton,632.0,Polly Walker,530.0,73058679.0,Action|Adventure|Sci-Fi,462.0,212204,alien|american civil war|male nipple|mars|prin...,English,USA,263700000.0,6.6
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1


### Problem 2
<span  style="color:green; font-size:16px">Select the columns with integer location 10, 5, and 1.</span>

In [19]:
movie.iloc[:, [10, 5, 1]].head()

title,actor3,director_fb,color
Avatar,Wes Studi,0.0,Color
Pirates of the Caribbean: At World's End,Jack Davenport,563.0,Color
Spectre,Stephanie Sigman,0.0,Color
The Dark Knight Rises,Joseph Gordon-Levitt,22000.0,Color
Star Wars: Episode VII - The Force Awakens,,131.0,


### Problem 3
<span  style="color:green; font-size:16px">Select the rows with integer locations 100 up to, but not including, 105, along with the column at integer location 5.</span>

In [20]:
movie.iloc[100:105, 5]

title
The Fast and the Furious                   357.0
The Curious Case of Benjamin Button      21000.0
X-Men: First Class                         905.0
The Hunger Games: Mockingjay - Part 2      508.0
The Sorcerer's Apprentice                  226.0
Name: director_fb, dtype: float64

# 7. Selecting Subsets of Data - Series

### Problem 1
<span  style="color:green; font-size:16px">Read in the bikes dataset. We will be using it for the next few questions. Select the wind speed column as a Series and assign it to a variable and output the head. What kind of index does this Series have?</span>

In [21]:
bikes = pd.read_csv('../data/bikes.csv')
bikes.head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
4,13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


In [22]:
wind = bikes['wind_speed']
wind.head()

0    12.7
1     6.9
2    16.1
3    16.1
4    17.3
Name: wind_speed, dtype: float64

This index is a **`RangeIndex`**

In [23]:
wind.index

RangeIndex(start=0, stop=50089, step=1)

In [24]:
wind.loc[4:10]

4     17.3
5     17.3
6     15.0
7      5.8
8      0.0
9     12.7
10     9.2
Name: wind_speed, dtype: float64

### Problem 2
<span  style="color:green; font-size:16px">From the wind speed Series, select integer locations 4 up to, but not including, 10.</span>

In [25]:
wind.iloc[4:10]

4    17.3
5    17.3
6    15.0
7     5.8
8     0.0
9    12.7
Name: wind_speed, dtype: float64

### Problem 3
<span  style="color:green; font-size:16px">Copy and paste your answer to problem 2 below but use `.loc` instead. Do you get the same result? Why not?</span>

In [26]:
wind.loc[4:10]

4     17.3
5     17.3
6     15.0
7      5.8
8      0.0
9     12.7
10     9.2
Name: wind_speed, dtype: float64

This is tricky: the index here contains integers rather than strings, so the labels are integers that happen to coincide with the integer locations. `.iloc` and `.loc` still produce different results because `.loc` slices by label and includes the stop label (10), while `.iloc` slices by position and excludes the stop position.
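
The difference is easier to see when labels and positions no longer coincide. A minimal sketch with two made-up Series (`s` and `shifted` are illustrative names, not part of the bikes data):

```python
import pandas as pd

# Labels equal positions, as with a default RangeIndex
s = pd.Series([10, 20, 30, 40, 50], index=[0, 1, 2, 3, 4])
by_position = s.iloc[1:3]  # positions 1 and 2; stop is excluded
by_label = s.loc[1:3]      # labels 1, 2 and 3; stop is included

print(by_position.index.tolist())  # [1, 2]
print(by_label.index.tolist())     # [1, 2, 3]

# Same values, but integer labels that differ from the positions
shifted = pd.Series([10, 20, 30, 40, 50], index=[5, 6, 7, 8, 9])
print(shifted.iloc[1:3].index.tolist())  # [6, 7]    -> chosen by position
print(shifted.loc[6:8].index.tolist())   # [6, 7, 8] -> chosen by label
```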

### Problem 4
<span  style="color:green; font-size:16px">Read in the movie dataset and set the index to be the title. Select `actor1` as a Series. Who is the `actor1` for 'My Big Fat Greek Wedding'?</span>

In [27]:
movie = pd.read_csv('../data/movie.csv', index_col='title')
actor1 = movie['actor1']

In [28]:
actor1.loc['My Big Fat Greek Wedding']

'Nia Vardalos'

### Problem 5
<span  style="color:green; font-size:16px">Find `actor1` for your two favorite movies.</span>

In [29]:
actor1.loc[['Titanic', 'Blood Diamond']]

title
Titanic          Leonardo DiCaprio
Blood Diamond    Leonardo DiCaprio
Name: actor1, dtype: object

### Problem 6
<span  style="color:green; font-size:16px">Select the last 10 values from `actor1` in two different ways.</span>

In [30]:
actor1.iloc[-10:]

title
Primer                       Shane Carruth
Cavite                         Ian Gamazon
El Mariachi                Carlos Gallardo
The Mongol King             Richard Jewell
Newlyweds                      Kerry Bishé
Signed Sealed Delivered        Eric Mabius
The Following                  Natalie Zea
A Plague So Pleasant           Eva Boehnke
Shanghai Calling                 Alan Ruck
My Date with Drew              John August
Name: actor1, dtype: object

In [31]:
actor1.tail(10)

title
Primer                       Shane Carruth
Cavite                         Ian Gamazon
El Mariachi                Carlos Gallardo
The Mongol King             Richard Jewell
Newlyweds                      Kerry Bishé
Signed Sealed Delivered        Eric Mabius
The Following                  Natalie Zea
A Plague So Pleasant           Eva Boehnke
Shanghai Calling                 Alan Ruck
My Date with Drew              John August
Name: actor1, dtype: object

# 8. Boolean Indexing Single Conditions

In [32]:
import pandas as pd

In [33]:
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(3)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8


### Problem 1
<span  style="color:green; font-size:16px">Read in the movie dataset and set the index to be the title. Select all movies that have Tom Hanks as `actor1`. How many of these movies has he starred in?</span>

In [36]:
filt = movie['actor1'] == 'Tom Hanks'
hanks_movies = movie[filt]
hanks_movies.head(3)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Toy Story 3,2010.0,Color,G,103.0,Lee Unkrich,125.0,Tom Hanks,15000.0,John Ratzenberger,1000.0,Don Rickles,721.0,414984497.0,Adventure|Animation|Comedy|Family|Fantasy,453.0,544884,college|day care|escape|teddy bear|toy,English,USA,200000000.0,8.3
The Polar Express,2004.0,Color,G,100.0,Robert Zemeckis,0.0,Tom Hanks,15000.0,Eddie Deezen,726.0,Peter Scolari,267.0,665426.0,Adventure|Animation|Family|Fantasy,188.0,120798,boy|christmas|christmas eve|north pole|train,English,USA,165000000.0,6.6
Angels & Demons,2009.0,Color,PG-13,146.0,Ron Howard,2000.0,Tom Hanks,15000.0,Ayelet Zurer,745.0,Armin Mueller-Stahl,294.0,133375846.0,Mystery|Thriller,298.0,207839,conclave|illuminati|murder|reference to bernin...,English,USA,150000000.0,6.7


He is listed as `actor1` in 24 movies in this dataset.

In [37]:
hanks_movies.shape

(24, 21)
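
Another way to get the count, sketched here on made-up data (the `toy` frame is an assumption), is to sum the boolean filter directly. `True` counts as 1 and `False` as 0, so the sum equals the number of matching rows:

```python
import pandas as pd

# Hypothetical four-row stand-in for the movie dataset
toy = pd.DataFrame({'actor1': ['Tom Hanks', 'Al Pacino', 'Tom Hanks', None]},
                   index=['A', 'B', 'C', 'D'])

filt = toy['actor1'] == 'Tom Hanks'  # a missing actor compares as False

n_from_sum = filt.sum()              # count the True values
n_from_shape = toy[filt].shape[0]    # count the rows after filtering

print(n_from_sum, n_from_shape)      # 2 2
```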

### Problem 2
<span  style="color:green; font-size:16px">Select movies with an IMDB score greater than 9.</span>

In [38]:
filt = movie['imdb_score'] > 9
movie[filt]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
The Shawshank Redemption,1994.0,Color,R,142.0,Frank Darabont,0.0,Morgan Freeman,11000.0,Jeffrey DeMunn,745.0,Bob Gunton,461.0,28341469.0,Crime|Drama,199.0,1689764,escape from prison|first person narration|pris...,English,USA,25000000.0,9.3
Towering Inferno,,Color,,65.0,John Blanchard,0.0,Martin Short,770.0,Andrea Martin,179.0,Joe Flaherty,176.0,,Comedy,,10,,English,Canada,,9.5
Dekalog,,Color,TV-MA,55.0,,,Krystyna Janda,20.0,Olaf Lubaszenko,3.0,Olgierd Lukaszewicz,2.0,447093.0,Drama,53.0,12590,meaning of life|moral challenge|morality|searc...,Polish,Poland,,9.1
The Godfather,1972.0,Color,R,175.0,Francis Ford Coppola,0.0,Al Pacino,14000.0,Marlon Brando,10000.0,Robert Duvall,3000.0,134821952.0,Crime|Drama,208.0,1155770,crime family|mafia|organized crime|patriarch|r...,English,USA,6000000.0,9.2
Kickboxer: Vengeance,2016.0,,,90.0,John Stockwell,134.0,Matthew Ziff,260000.0,T.J. Storm,454.0,Sam Medina,354.0,,Action,2.0,246,,,USA,17000000.0,9.1


### Problem 3
<span  style="color:green; font-size:16px">Select all movies from the 1970s.</span>

In [39]:
filt1 = movie['year'] >= 1970
filt2 = movie['year'] <= 1979
filt = filt1 & filt2
movie[filt].head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
All That Jazz,1979.0,Color,R,123.0,Bob Fosse,189.0,Roy Scheider,813.0,Ben Vereen,388.0,Max Wright,87.0,,Comedy|Drama|Music|Musical,84.0,19228,dancer|editing|stand up comedian|surgery|vomiting,English,USA,,7.8
Superman,1978.0,Color,PG,188.0,Richard Donner,503.0,Marlon Brando,10000.0,Margot Kidder,593.0,Ned Beatty,467.0,134218018.0,Action|Adventure|Drama|Romance|Sci-Fi,169.0,126357,1970s|clark kent|planet|superhero|year 1978,English,USA,55000000.0,7.3
Solaris,1972.0,Black and White,PG,115.0,Andrei Tarkovsky,0.0,Donatas Banionis,29.0,Anatoliy Solonitsyn,29.0,Natalya Bondarchuk,12.0,,Drama|Mystery|Sci-Fi,144.0,54057,hallucination|ocean|psychologist|scientist|spa...,Russian,Soviet Union,1000000.0,8.1
Mean Streets,1973.0,Color,R,112.0,Martin Scorsese,17000.0,Robert De Niro,22000.0,David Carradine,926.0,David Proval,354.0,32645.0,Crime|Drama|Romance|Thriller,112.0,67797,bar|catholic guilt|epilepsy|italian american|m...,English,USA,500000.0,7.4
Star Trek: The Motion Picture,1979.0,Color,PG,143.0,Robert Wise,338.0,Leonard Nimoy,12000.0,Nichelle Nichols,664.0,Walter Koenig,643.0,82300000.0,Adventure|Mystery|Sci-Fi,134.0,63330,alien|space|space station|spacecraft|warp speed,English,USA,35000000.0,6.4
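
As an alternative to combining two comparisons, the Series method `between` expresses the same range test in one call; by default both endpoints are included. A minimal sketch on a made-up `year` Series (an assumption, not the movie data):

```python
import numpy as np
import pandas as pd

# Hypothetical years, including both boundaries and one missing value
year = pd.Series([1969.0, 1970.0, 1975.0, 1979.0, 1980.0, np.nan])

two_step = (year >= 1970) & (year <= 1979)  # the approach used in the solution
one_step = year.between(1970, 1979)         # inclusive on both ends by default

print(one_step.tolist())          # [False, True, True, True, False, False]
print(one_step.equals(two_step))  # True
```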


# 9. Boolean Indexing Multiple Conditions

### Problem 1
<span  style="color:green; font-size:16px">Select all movies from the 1970s that had IMDB scores greater than 8.</span>

In [40]:
filt1 = movie['year'] >= 1970
filt2 = movie['year'] <= 1979 
filt3 = movie['imdb_score'] > 8

filt = filt1 & filt2 & filt3
movie[filt].head(3)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Solaris,1972.0,Black and White,PG,115.0,Andrei Tarkovsky,0.0,Donatas Banionis,29.0,Anatoliy Solonitsyn,29.0,Natalya Bondarchuk,12.0,,Drama|Mystery|Sci-Fi,144.0,54057,hallucination|ocean|psychologist|scientist|spa...,Russian,Soviet Union,1000000.0,8.1
Apocalypse Now,1979.0,Color,R,289.0,Francis Ford Coppola,0.0,Harrison Ford,11000.0,Marlon Brando,10000.0,Robert Duvall,3000.0,78800000.0,Drama|War,261.0,450676,army|green beret|insanity|jungle|vietnam,English,USA,31500000.0,8.5
The Deer Hunter,1978.0,Color,R,183.0,Michael Cimino,517.0,Robert De Niro,22000.0,Meryl Streep,11000.0,John Savage,652.0,,Drama|War,140.0,232577,escape|friend|party|pittsburgh steelers|vietnam,English,UK,15000000.0,8.2
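
The three filters can also be written on a single line, but then each comparison needs its own parentheses, because `&` binds more tightly than `>=`, `<=`, and `>` in Python. A sketch on a made-up three-row frame (the `toy` data and its `A`, `B`, `C` labels are assumptions):

```python
import pandas as pd

# Hypothetical stand-in for the movie dataset
toy = pd.DataFrame({'year': [1972, 1979, 1985],
                    'imdb_score': [8.1, 7.0, 8.8]},
                   index=['A', 'B', 'C'])

# Parentheses force each comparison to be evaluated before the & operators.
# Leaving them out would make Python try `1970 & toy['year']` first.
filt = (toy['year'] >= 1970) & (toy['year'] <= 1979) & (toy['imdb_score'] > 8)

print(toy[filt].index.tolist())  # ['A']
```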


### Problem 2
<span  style="color:green; font-size:16px">Select movies that were rated either R, PG-13, or PG.</span>

In [41]:
filt = movie['content_rating'].isin(['R', 'PG-13', 'PG'])
movie[filt].head(3)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
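
For comparison, `isin` gives the same result as chaining one equality test per rating with `|`, just more compactly. A minimal sketch on a made-up `rating` Series (an assumption; it is stored as `object` dtype with one missing value):

```python
import pandas as pd

# Hypothetical ratings, including one missing value
rating = pd.Series(['R', 'PG-13', 'G', 'PG', None], dtype=object)

with_isin = rating.isin(['R', 'PG-13', 'PG'])
with_or = (rating == 'R') | (rating == 'PG-13') | (rating == 'PG')

print(with_isin.tolist())         # [True, True, False, True, False]
print(with_isin.equals(with_or))  # True
```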


### Problem 3
<span  style="color:green; font-size:16px">Select movies that are either rated PG-13 or were made after 2010.</span>

In [42]:
filt1 = movie['content_rating'] == 'PG-13'
filt2 = movie['year'] > 2010
filt = filt1 | filt2

movie[filt].head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,Joseph Gordon-Levitt,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
John Carter,2012.0,Color,PG-13,132.0,Andrew Stanton,475.0,Daryl Sabara,640.0,Samantha Morton,632.0,Polly Walker,530.0,73058679.0,Action|Adventure|Sci-Fi,462.0,212204,alien|american civil war|male nipple|mars|prin...,English,USA,263700000.0,6.6


### Problem 4
<span  style="color:green; font-size:16px">Find all the movies that have at least one of the three actors with more than 10,000 Facebook likes.</span>

In [44]:
filt1 = movie['actor1_fb'] > 10000
filt2 = movie['actor2_fb'] > 10000
filt3 = movie['actor3_fb'] > 10000
filt = filt1 | filt2 | filt3

movie[filt].head()

Unnamed: 0_level_0,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,Joseph Gordon-Levitt,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Spider-Man 3,2007.0,Color,PG-13,156.0,Sam Raimi,0.0,J.K. Simmons,24000.0,James Franco,11000.0,Kirsten Dunst,4000.0,336530303.0,Action|Adventure|Romance,392.0,383056,sandman|spider man|symbiote|venom|villain,English,USA,258000000.0,6.2
Avengers: Age of Ultron,2015.0,Color,PG-13,141.0,Joss Whedon,0.0,Chris Hemsworth,26000.0,Robert Downey Jr.,21000.0,Scarlett Johansson,19000.0,458991599.0,Action|Adventure|Sci-Fi,635.0,462669,artificial intelligence|based on comic book|ca...,English,USA,250000000.0,7.5


### Problem 5
<span  style="color:green; font-size:16px">Reverse the condition from problem 3. In words, what have you selected?</span>

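The following selects movies that are not rated PG-13 and were made in 2010 or earlier. Rows with a missing rating or year can also appear, because comparisons against missing values evaluate to `False` before the reversal.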

In [45]:
filt1 = movie['content_rating'] == 'PG-13'
filt2 = movie['year'] > 2010
filt = filt1 | filt2

movie[~filt].head()

Unnamed: 0_level_0,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,,,,Documentary,,8,,,,,7.1
Tangled,2010.0,Color,PG,100.0,Nathan Greno,15.0,Brad Garrett,799.0,Donna Murphy,553.0,M.C. Gainey,284.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,324.0,294810,17th century|based on fairy tale|disney|flower...,English,USA,260000000.0,7.8
Harry Potter and the Half-Blood Prince,2009.0,Color,PG,153.0,David Yates,282.0,Alan Rickman,25000.0,Daniel Radcliffe,11000.0,Rupert Grint,10000.0,301956980.0,Adventure|Family|Fantasy|Mystery,375.0,321795,blood|book|love|potion|professor,English,UK,250000000.0,7.5
The Chronicles of Narnia: Prince Caspian,2008.0,Color,PG,150.0,Andrew Adamson,80.0,Peter Dinklage,22000.0,Pierfrancesco Favino,216.0,Damián Alcázar,201.0,141614023.0,Action|Adventure|Family|Fantasy,258.0,149922,brother brother relationship|brother sister re...,English,USA,225000000.0,6.6
Alice in Wonderland,2010.0,Color,PG,108.0,Tim Burton,13000.0,Johnny Depp,40000.0,Alan Rickman,25000.0,Anne Hathaway,11000.0,334185206.0,Adventure|Family|Fantasy,451.0,306320,alice in wonderland|mistaking reality for drea...,English,USA,200000000.0,6.5
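
Reversing an "or" condition is the same as negating each part and joining them with "and" (De Morgan's law): `~(a | b)` equals `~a & ~b`. The sketch below checks this on a small made-up DataFrame; the titles and values are hypothetical and not taken from `movie.csv`. It also shows that a row with a missing year fails the `> 2010` test and so lands in the reversed selection.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from movie.csv
toy = pd.DataFrame(
    {'content_rating': ['PG-13', 'PG', 'R', None],
     'year': [2009, 2010, 2015, np.nan]},
    index=['A', 'B', 'C', 'D'])

filt1 = toy['content_rating'] == 'PG-13'
filt2 = toy['year'] > 2010

reversed_or = ~(filt1 | filt2)   # reverse the combined condition
de_morgan = ~filt1 & ~filt2      # negate each condition, then use "and"

print(reversed_or.equals(de_morgan))     # True
print(toy[reversed_or].index.tolist())   # ['B', 'D']
```

Row `D` has no year and no rating, so both comparisons are `False` and the reversal selects it.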


# 10. Boolean Indexing More

In [46]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv')

### Problem 1
<span  style="color:green; font-size:16px">Select the wind speed column as a Series and assign it to a variable. Are there any negative wind speeds?</span>

In [47]:
wind = bikes['wind_speed']
wind.head()

0    12.7
1     6.9
2    16.1
3    16.1
4    17.3
Name: wind_speed, dtype: float64

Yes, there is really strong negative wind! Or maybe it's just bad data...

In [48]:
filt = wind < 0
wind[filt].head()

22990   -9999.0
27168   -9999.0
28368   -9999.0
29308   -9999.0
29309   -9999.0
Name: wind_speed, dtype: float64
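
A value of `-9999` often acts as a placeholder for a missing reading. One possible way to handle it, assuming every negative speed is such a placeholder, is to convert those values to missing values before computing anything. The sketch uses a small made-up Series (`wind_toy` is a hypothetical name) rather than the real `bikes` data.

```python
import pandas as pd

# Hypothetical wind speeds; -9999.0 stands in for a missing reading
wind_toy = pd.Series([12.7, -9999.0, 6.9, -9999.0], name='wind_speed')

# mask replaces values where the condition is True with NaN
wind_clean = wind_toy.mask(wind_toy < 0)

print(wind_clean.isnull().sum())   # number of readings treated as missing
print(wind_clean.mean())           # mean skips missing values by default
```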

### Problem 2
<span  style="color:green; font-size:16px">Select all wind speeds between 12 and 16.</span>

In [49]:
filt = wind.between(12, 16)
wind[filt].head()

0     12.7
6     15.0
9     12.7
18    13.8
19    13.8
Name: wind_speed, dtype: float64
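
By default, `between` includes both endpoints, so it matches two comparisons joined with `&`. A minimal check on made-up values (`wind_toy` is a hypothetical Series, not the real data):

```python
import pandas as pd

# Hypothetical values chosen to sit on and around the endpoints
wind_toy = pd.Series([11.9, 12.0, 13.8, 16.0, 16.1])

filt_between = wind_toy.between(12, 16)
filt_manual = (wind_toy >= 12) & (wind_toy <= 16)

print(filt_between.equals(filt_manual))   # True
print(wind_toy[filt_between].tolist())    # [12.0, 13.8, 16.0]
```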

### Problem 3
<span  style="color:green; font-size:16px">Select the events and gender columns for all trip durations longer than 1,000 seconds.</span>

In [50]:
filt = bikes['tripduration'] > 1000
cols = ['events', 'gender']
bikes.loc[filt, cols].head()

Unnamed: 0,events,gender
2,mostlycloudy,Male
8,cloudy,Male
10,mostlycloudy,Male
11,mostlycloudy,Male
12,partlycloudy,Male


### Problem 4
<span  style="color:green; font-size:16px">Read in the movie dataset with the title as the index. We will use this DataFrame for the rest of the problems. Select all the movies such that the Facebook likes for actor 2 are greater than those for actor 1.</span>

In [51]:
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head()

Unnamed: 0_level_0,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,Wes Studi,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,Jack Davenport,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,Stephanie Sigman,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,Joseph Gordon-Levitt,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,,,,Documentary,,8,,,,,7.1


There are none!

In [52]:
# compare actor 2 against actor 1, not against itself
filt = movie['actor2_fb'] > movie['actor1_fb']
movie[filt]

Unnamed: 0_level_0,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
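
When a selection comes back empty, it can help to confirm that the filter itself holds no `True` values. A minimal sketch on made-up numbers; the column names mirror the movie data, but the values are hypothetical:

```python
import pandas as pd

# Hypothetical likes; actor 1 always has more likes than actor 2 here
toy = pd.DataFrame({'actor1_fb': [1000, 40000, 11000],
                    'actor2_fb': [936, 5000, 393]})

filt = toy['actor2_fb'] > toy['actor1_fb']

print(filt.any())   # whether at least one row matches
print(filt.sum())   # number of matching rows (True counts as 1)
```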


### Problem 5
<span  style="color:green; font-size:16px">Select the year, content rating, and IMDB score columns for movies from the year 2016 with IMDB score less than 4.</span>

In [53]:
filt1 = movie['year'] == 2016
filt2 = movie['imdb_score'] < 4
filt = filt1 & filt2
cols = ['year', 'content_rating', 'imdb_score']

movie.loc[filt, cols]

Unnamed: 0_level_0,year,content_rating,imdb_score
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Fifty Shades of Black,2016.0,R,3.5
Cabin Fever,2016.0,Not Rated,3.7
God's Not Dead 2,2016.0,PG,3.4


### Problem 6
<span  style="color:green; font-size:16px">Select all the movies that are missing values for content rating.</span>

In [54]:
filt = movie['content_rating'].isnull()
movie[filt].head(3)

Unnamed: 0_level_0,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,actor3,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,,,,Documentary,,8,,,,,7.1
Godzilla Resurgence,2016.0,Color,,120.0,Hideaki Anno,28.0,Mark Chinnery,544.0,Shin'ya Tsukamoto,106.0,Atsuko Maeda,12.0,,Action|Adventure|Drama|Horror|Sci-Fi,1.0,374,blood|godzilla|monster|sequel,Japanese,Japan,,8.2
Harry Potter and the Deathly Hallows: Part II,2011.0,Color,,,Matt Birch,0.0,Rupert Grint,10000.0,Dave Legeno,570.0,Ralph Ineson,159.0,,Action|Fantasy,1.0,381,,English,UK,,7.5


### Problem 7
<span  style="color:green; font-size:16px">Select all the movies that are missing both the gross and budget. Return just those columns to verify that those values are indeed missing.</span>

In [55]:
filt = movie['gross'].isnull() & movie['budget'].isnull()
cols = ['gross', 'budget']
movie.loc[filt, cols].head()

Unnamed: 0_level_0,gross,budget
title,Unnamed: 1_level_1,Unnamed: 2_level_1
Star Wars: Episode VII - The Force Awakens,,
The Lovers,,
Godzilla Resurgence,,
Harry Potter and the Deathly Hallows: Part II,,
Harry Potter and the Deathly Hallows: Part I,,


### Problem 8
<span  style="color:green; font-size:16px">Write a function `find_missing` that has three parameters, `df`, `col1` and `col2` where `df` is a DataFrame and `col1` and `col2` are column names. This function should return all the rows of the DataFrame where `col1` and `col2` are missing. Only return the two columns as well. Answer problem 7 with this function.</span>

In [56]:
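def find_missing(df, col1, col2):
    # rows where both columns are missing
    filt = df[col1].isnull() & df[col2].isnull()
    cols = [col1, col2]

    # return every matching row; the caller decides how many to display
    return df.loc[filt, cols]

In [57]:
find_missing(movie, 'gross', 'budget').head()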

Unnamed: 0_level_0,gross,budget
title,Unnamed: 1_level_1,Unnamed: 2_level_1
Star Wars: Episode VII - The Force Awakens,,
The Lovers,,
Godzilla Resurgence,,
Harry Potter and the Deathly Hallows: Part II,,
Harry Potter and the Deathly Hallows: Part I,,
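
As an alternative to chaining `isnull()` calls with `&`, the same filter can be built from a list of columns with `isnull().all(axis=1)`. This is a sketch on made-up data, not part of the original solution; the index labels and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: only row 'A' is missing both values
toy = pd.DataFrame(
    {'gross': [np.nan, 100.0, np.nan],
     'budget': [np.nan, np.nan, 50.0]},
    index=['A', 'B', 'C'])

cols = ['gross', 'budget']

filt_pair = toy['gross'].isnull() & toy['budget'].isnull()
filt_all = toy[cols].isnull().all(axis=1)   # True where every listed column is missing

print(filt_pair.equals(filt_all))               # True
print(toy.loc[filt_all, cols].index.tolist())   # ['A']
```

The list-based form extends to any number of columns without adding more `&` terms.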
