# 10. More Boolean Indexing

### Objectives

+ Boolean Selection with the brackets on a Series
+ Using the `between` method instead of an `and` condition
+ Simultaneously select rows with boolean selection and columns with a list of names with `.loc`
+ Select rows with missing values with the `isna` method

## Boolean Selection on a Series
All the examples thus far have taken place on the bikes DataFrame. Boolean selection on a Series happens almost identically. Since there is only one dimension of data, the queries you ask are usually going to be simpler.

First, let’s select a single column of data as a Series such as the temperature column.

In [2]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv', parse_dates=['starttime', 'stoptime'])

In [3]:
temp = bikes['temperature']
temp.head()

0    73.9
1    69.1
2    73.0
3    72.0
4    73.0
Name: temperature, dtype: float64

Let's select temperatures greater than 90.

In [4]:
filt = temp > 90
temp[filt].head()

54    91.0
55    91.0
56    91.0
61    93.0
62    93.0
Name: temperature, dtype: float64

Select temperatures less than 0 or greater than 95.

In [6]:
filt1 = temp < 0
filt2 = temp > 95
filt = filt1 | filt2

temp[filt].head()

395     96.1
396     96.1
397     96.1
1871    -2.0
2049    -2.0
Name: temperature, dtype: float64
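
The same "or" selection can be written in a single line. The sketch below uses a small made-up Series (the values are assumptions, not the real `bikes` data) to show why each comparison needs its own parentheses when the conditions are combined inline.

```python
import pandas as pd

# Hypothetical temperature readings standing in for bikes['temperature']
temp = pd.Series([73.9, 96.1, -2.0, 50.0, 98.6], name='temperature')

# | binds more tightly than < and >, so each comparison is wrapped in parentheses
extreme = temp[(temp < 0) | (temp > 95)]
print(extreme)
```

Naming the two filters first, as in the cell above, avoids the parentheses entirely and is often easier to read.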

## Re-read data with `starttime` in the index
The default index is not very helpful. Let's re-read the data with **`starttime`** in the index. While this column may not be unique, it does provide useful information for the index.

In [7]:
bikes = pd.read_csv('../data/bikes.csv', 
                    parse_dates=['starttime', 'stoptime'], 
                    index_col='starttime')
bikes.head()

starttime,trip_id,usertype,gender,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2013-06-28 19:01:00,7147,Subscriber,Male,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
2013-06-28 22:53:00,7524,Subscriber,Male,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2013-06-30 14:43:00,10927,Subscriber,Male,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
2013-07-01 10:05:00,12907,Subscriber,Male,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
2013-07-01 11:16:00,13168,Subscriber,Male,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


In [8]:
temp2 = bikes['temperature']
temp2.head()

starttime
2013-06-28 19:01:00    73.9
2013-06-28 22:53:00    69.1
2013-06-30 14:43:00    73.0
2013-07-01 10:05:00    72.0
2013-07-01 11:16:00    73.0
Name: temperature, dtype: float64
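
To see what a non-unique datetime index looks like without the full file, here is a minimal sketch that reads a three-row CSV from a string. The rows and the `sample` name are made up for illustration; only the `read_csv` arguments mirror the cell above.

```python
import io
import pandas as pd

# Hypothetical three-row extract; two trips share the same start time
csv_text = """starttime,stoptime,temperature
2013-06-28 19:01:00,2013-06-28 19:17:00,73.9
2013-06-28 19:01:00,2013-06-28 19:30:00,73.9
2013-06-30 14:43:00,2013-06-30 15:01:00,73.0
"""

sample = pd.read_csv(io.StringIO(csv_text),
                     parse_dates=['starttime', 'stoptime'],
                     index_col='starttime')

print(sample.index.is_unique)  # False here, since one timestamp repeats

# Selecting a duplicated label returns every row that carries it
rows = sample.loc[pd.Timestamp('2013-06-28 19:01:00')]
print(len(rows))
```

Boolean selection is unaffected by duplicate labels, because the filter is matched to the rows one by one.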

Let's select temperatures greater than 90. We expect to get a summer month and we do.

In [9]:
filt = temp2 > 90
temp2[filt].head()

starttime
2013-07-16 15:13:00    91.0
2013-07-16 15:31:00    91.0
2013-07-16 16:35:00    91.0
2013-07-17 17:08:00    93.0
2013-07-17 17:25:00    93.0
Name: temperature, dtype: float64

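Build the filter for temperatures less than 0 or greater than 95. Displaying only the rows that satisfy the second condition (`filt2`) shows late-August dates, as expected for readings above 95; `temp2[filt]` would return the sub-zero winter readings as well.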

In [10]:
filt1 = temp2 < 0
filt2 = temp2 > 95
filt = filt1 | filt2

temp2[filt2].head()

starttime
2013-08-30 15:33:00    96.1
2013-08-30 15:37:00    96.1
2013-08-30 15:49:00    96.1
Name: temperature, dtype: float64

## The `between` method
The `between` method returns a boolean Series by testing whether each value is between two given values. For instance, if we want to select the temperatures between 50 and 60 degrees (inclusive), we do the following:

In [11]:
filt = temp2.between(50, 60)
filt.head()

starttime
2013-06-28 19:01:00    False
2013-06-28 22:53:00    False
2013-06-30 14:43:00    False
2013-07-01 10:05:00    False
2013-07-01 11:16:00    False
Name: temperature, dtype: bool

In [12]:
temp2[filt].head()

starttime
2013-09-13 07:55:00    54.0
2013-09-13 08:04:00    57.9
2013-09-13 08:04:00    57.9
2013-09-13 08:06:00    57.9
2013-09-13 08:22:00    57.9
Name: temperature, dtype: float64
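
As a quick check that `between` really replaces an `and` condition, the sketch below compares the two on a few made-up readings that sit on and around the boundaries.

```python
import pandas as pd

# Hypothetical readings chosen to sit on and just outside the boundaries
temp = pd.Series([49.9, 50.0, 54.0, 60.0, 60.1])

with_between = temp.between(50, 60)
with_and = (temp >= 50) & (temp <= 60)

print(with_between.equals(with_and))
print(temp[with_between].tolist())
```

Both endpoints are included by default, so 50.0 and 60.0 are selected while 49.9 and 60.1 are not.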

# Simultaneous boolean selection of rows and column labels with `.loc`
The **`.loc`** indexer was covered thoroughly in an earlier notebook; here we use it to select rows and columns simultaneously. Earlier, it was stated that **`.loc`** selects only by label. That is not strictly true: it also accepts a boolean Series, and boolean selection can be combined with selection by label.

Remember that **`.loc`** takes both a row selection and a column selection separated by a comma. Since the row selection comes first, you can pass it exactly the same inputs that you pass to just the brackets and get the same results.

Let's run some of the older examples of boolean selection with **`.loc`**.

In [13]:
filt = bikes['tripduration'] > 1000
bikes.loc[filt].head()

starttime,trip_id,usertype,gender,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2013-06-30 14:43:00,10927,Subscriber,Male,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
2013-07-03 15:21:00,21028,Subscriber,Male,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
2013-07-04 17:17:00,24383,Subscriber,Male,2013-07-04 17:42:00,1523,Morgan St & 18th St,41.858086,-87.651073,15.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,79.0,10.0,9.2,-9999.0,mostlycloudy
2013-07-04 18:13:00,24673,Subscriber,Male,2013-07-04 18:42:00,1697,Ashland Ave & Armitage Ave,41.917859,-87.668919,15.0,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,79.0,10.0,10.4,-9999.0,mostlycloudy
2013-07-05 10:02:00,26214,Subscriber,Male,2013-07-05 10:40:00,2263,Jefferson St & Monroe St,41.880422,-87.642746,19.0,Jefferson St & Monroe St,41.880422,-87.642746,19.0,79.0,10.0,0.0,-9999.0,partlycloudy


In [14]:
filt = bikes['events'].isin(['rain', 'snow', 'tstorms', 'sleet'])
bikes.loc[filt].head()

starttime,trip_id,usertype,gender,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2013-07-15 16:43:00,66336,Subscriber,Male,2013-07-15 16:55:00,727,Greenwood Ave & 47th St,41.809835,-87.599383,15.0,State St & Harrison St,41.873958,-87.627739,19.0,82.9,10.0,5.8,0.0,rain
2013-07-21 16:35:00,89180,Subscriber,Male,2013-07-21 17:06:00,1809,Michigan Ave & Pearson St,41.89766,-87.62351,23.0,Millennium Park,41.881032,-87.624084,35.0,82.4,10.0,11.5,0.0,tstorms
2013-07-21 16:47:00,89228,Subscriber,Male,2013-07-21 17:03:00,999,Carpenter St & Huron St,41.894556,-87.653449,19.0,Carpenter St & Huron St,41.894556,-87.653449,19.0,82.4,10.0,11.5,0.0,tstorms
2013-07-23 00:16:00,95044,Subscriber,Female,2013-07-23 00:26:00,563,Wabash Ave & Roosevelt Rd,41.867173,-87.625955,19.0,Daley Center Plaza,41.884337,-87.630183,47.0,78.8,10.0,17.3,0.0,tstorms
2013-07-26 19:10:00,111568,Subscriber,Male,2013-07-26 19:33:00,1395,Larrabee St & Kingsbury St,41.897764,-87.642884,27.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,66.9,8.0,12.7,0.0,rain


## Separate row and column selection with a comma for `.loc`
The great benefit of **`.loc`** is that it allows us to simultaneously do boolean selection along the rows and make column selections by label.

Let's select just the events rain and snow and only the columns events and trip duration.

In [15]:
filt = bikes['events'].isin(['rain', 'snow'])
cols = ['events', 'tripduration']
bikes.loc[filt, cols].head()

starttime,events,tripduration
2013-07-15 16:43:00,rain,727
2013-07-26 19:10:00,rain,1395
2013-07-30 18:53:00,rain,442
2013-08-05 17:09:00,rain,890
2013-09-07 16:09:00,rain,978


## Column to Column Comparisons
So far, we have created conditionals by comparing each of our column values to a single scalar value. It is possible to do element-by-element comparisons by comparing two columns to one another.

For instance, if we wanted to test whether there was more capacity at the start of the ride than at the end, we would do the following:

In [16]:
filt = bikes['dpcapacity_start'] > bikes['dpcapacity_end']

In [19]:
type(filt)

pandas.core.series.Series

Let's use this filter with **`.loc`** to return all the rows where the start capacity is greater than the end.

In [17]:
cols = ['dpcapacity_start', 'dpcapacity_end']
bikes.loc[filt, cols].head()

starttime,dpcapacity_start,dpcapacity_end
2013-06-28 22:53:00,31.0,19.0
2013-07-02 17:47:00,31.0,19.0
2013-07-03 15:21:00,31.0,15.0
2013-07-07 00:06:00,19.0,15.0
2013-07-08 17:06:00,23.0,19.0


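### Boolean selection with `.iloc` does not accept a boolean Series
**`.iloc`** selects by integer location and raises an error when given a boolean Series such as `filt`, because a Series carries an index. It does accept a plain boolean array, so `bikes.iloc[filt.values]` works, but **`.loc`** is the more direct choice.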

In [18]:
bikes.iloc[filt]

ValueError: iLocation based boolean indexing cannot use an indexable as a mask
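
The error comes from the index attached to the boolean Series, not from boolean selection itself. Here is a minimal sketch on a made-up three-row DataFrame: it triggers the error, then converts the filter to a plain NumPy array, which `.iloc` accepts.

```python
import pandas as pd

# Hypothetical data with string labels so that labels and positions differ
df = pd.DataFrame({'start': [31, 15, 23], 'end': [19, 15, 27]},
                  index=['a', 'b', 'c'])
filt = df['start'] > df['end']

try:
    df.iloc[filt]
except (ValueError, NotImplementedError) as err:
    # The exception type depends on the kind of index the Series carries
    print(type(err).__name__, err)

by_position = df.iloc[filt.to_numpy()]  # plain boolean array: accepted
by_label = df.loc[filt]                 # boolean Series: accepted by .loc
```

Both selections return the same rows, so in practice **`.loc`** is the simpler choice.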

# Finding Missing Values with `isna`
The **`isna`** method called from either a DataFrame or a Series returns True for every value that is missing and False for any other value. 

Let's see this in action by calling **`isna`** on the start capacity column.

In [20]:
bikes['dpcapacity_start'].isna().head()

starttime
2013-06-28 19:01:00    False
2013-06-28 22:53:00    False
2013-06-30 14:43:00    False
2013-07-01 10:05:00    False
2013-07-01 11:16:00    False
Name: dpcapacity_start, dtype: bool

### Filtering for missing values

We can now use this boolean Series to select all the rows where the capacity start column is missing. Verify that the `dpcapacity_start` values in the result are all missing.

In [None]:
filt = bikes['dpcapacity_start'].isna()
bikes[filt]
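
Here is a small sketch of the same idea on a made-up `stations` DataFrame (the names and values are assumptions). It also counts the missing values and shows that a placeholder number such as the -9999.0 seen in the precipitation column is not treated as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of the bikes data
stations = pd.DataFrame({
    'from_station_name': ['A', 'B', 'C', 'D'],
    'dpcapacity_start': [11.0, np.nan, 19.0, np.nan],
    'precipitation': [-9999.0, 0.0, -9999.0, 0.0],
})

filt = stations['dpcapacity_start'].isna()
n_missing = filt.sum()          # True counts as 1, so this is the number missing
missing_rows = stations[filt]   # rows where the capacity is missing
present_rows = stations[~filt]  # ~ inverts the filter

# A placeholder such as -9999.0 is an ordinary number, so isna does not flag it
flagged = stations['precipitation'].isna().any()
```

Placeholder values like this need a regular comparison, for example `stations['precipitation'] == -9999.0`.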

## `isnull` is an alias for `isna`
There is an identical method named **`isnull`** that you will see in other tutorials. It is an **alias** of **`isna`**, meaning it does exactly the same thing under a different name. Either one is fine to use, but I prefer **`isna`** because of the similarity of **na** to **NaN**, the representation of missing values.
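
A quick way to convince yourself that the two names behave the same is to compare their results directly. The sketch below does that on a made-up Series and also checks `notna`, the opposite test.

```python
import numpy as np
import pandas as pd

# Hypothetical values; None is stored as NaN in a float Series
s = pd.Series([1.5, np.nan, None, 3.0])

same = s.isna().equals(s.isnull())     # the two names give identical results
inverse = s.notna().equals(~s.isna())  # notna is the element-wise opposite
print(same, inverse)
```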

# Exercises

### Problem 1
<span  style="color:green; font-size:16px">Select the wind speed column as a Series and assign it to a variable. Are there any negative wind speeds?</span>

In [23]:
# your code here
wind = bikes['wind_speed']
filt = wind < 0
wind[filt].head()

starttime
2016-03-19 10:08:00   -9999.0
2016-06-30 11:47:00   -9999.0
2016-07-21 21:02:29   -9999.0
2016-08-07 09:16:42   -9999.0
2016-08-07 09:29:44   -9999.0
Name: wind_speed, dtype: float64

### Problem 2
<span  style="color:green; font-size:16px">Select all wind speeds between 12 and 16.</span>

In [25]:
# your code here
filt = bikes['wind_speed'].between(12, 16)
bikes.loc[filt, 'wind_speed'].head()

starttime
2013-06-28 19:01:00    12.7
2013-07-02 17:47:00    15.0
2013-07-04 15:00:00    12.7
2013-07-09 13:12:00    13.8
2013-07-09 13:14:00    13.8
Name: wind_speed, dtype: float64

### Problem 3
<span  style="color:green; font-size:16px">Select the events and gender columns for all trip durations longer than 1,000 seconds.</span>

In [28]:
# your code here
filt = bikes['tripduration'] > 1000
cols = ['events', 'gender']

In [29]:
bikes.loc[filt, cols].head()

starttime,events,gender
2013-06-30 14:43:00,mostlycloudy,Male
2013-07-03 15:21:00,cloudy,Male
2013-07-04 17:17:00,mostlycloudy,Male
2013-07-04 18:13:00,mostlycloudy,Male
2013-07-05 10:02:00,partlycloudy,Male


In [26]:
bikes.columns


Index(['trip_id', 'usertype', 'gender', 'stoptime', 'tripduration',
       'from_station_name', 'latitude_start', 'longitude_start',
       'dpcapacity_start', 'to_station_name', 'latitude_end', 'longitude_end',
       'dpcapacity_end', 'temperature', 'visibility', 'wind_speed',
       'precipitation', 'events'],
      dtype='object')

### Problem 4
<span  style="color:green; font-size:16px">Read in the movie dataset with the title as the index. We will use this DataFrame for the rest of the problems. Select all the movies such that the Facebook likes for actor 2 are greater than those for actor 1.</span>

In [33]:
# your code here
movies = pd.read_csv('../data/movie.csv', index_col='title')
movies.head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1
Spectre,2015.0,Color,PG-13,148.0,Sam Mendes,0.0,Christoph Waltz,11000.0,Rory Kinnear,393.0,...,161.0,200074175.0,Action|Adventure|Thriller,602.0,275868,bomb|espionage|sequel|spy|terrorist,English,UK,245000000.0,6.8
The Dark Knight Rises,2012.0,Color,PG-13,164.0,Christopher Nolan,22000.0,Tom Hardy,27000.0,Christian Bale,23000.0,...,23000.0,448130642.0,Action|Thriller,813.0,1144337,deception|imprisonment|lawlessness|police offi...,English,USA,250000000.0,8.5
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,...,,,Documentary,,8,,,,,7.1


In [36]:
filt = movies['actor2_fb'] > movies['actor1_fb']
movies[filt]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score


### Problem 5
<span  style="color:green; font-size:16px">Select the year, content rating, and IMDB score columns for movies from the year 2016 with IMDB score less than 4.</span>

In [None]:
# your code here
filt1 = movies['year'] == 2016
filt2 = movies['imdb_score'] < 4
filt = filt1 & filt2
cols = ['year', 'content_rating', 'imdb_score']
movies.loc[filt, cols].head()


### Problem 6
<span  style="color:green; font-size:16px">Select all the movies that are missing values for content rating.</span>

In [38]:
# your code here
filt = movies['content_rating'].isna()
movies[filt]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,...,,,Documentary,,8,,,,,7.1
Godzilla Resurgence,2016.0,Color,,120.0,Hideaki Anno,28.0,Mark Chinnery,544.0,Shin'ya Tsukamoto,106.0,...,12.0,,Action|Adventure|Drama|Horror|Sci-Fi,1.0,374,blood|godzilla|monster|sequel,Japanese,Japan,,8.2
Harry Potter and the Deathly Hallows: Part II,2011.0,Color,,,Matt Birch,0.0,Rupert Grint,10000.0,Dave Legeno,570.0,...,159.0,,Action|Fantasy,1.0,381,,English,UK,,7.5
Harry Potter and the Deathly Hallows: Part I,2010.0,Color,,,Matt Birch,0.0,Rupert Grint,10000.0,Toby Jones,2000.0,...,1000.0,,Fantasy,4.0,252,,English,UK,,6.4
Asterix at the Olympic Games,2008.0,Color,,116.0,Frédéric Forestier,0.0,Alain Delon,936.0,Santiago Segura,276.0,...,141.0,,Adventure|Comedy|Family|Fantasy,33.0,20567,1st century b.c.|lightsaber|local blockbuster|...,French,France,78000000.0,5.1
"10,000 B.C.",,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,,...,,,Comedy,,6,,,,,7.2
Evolution,2015.0,Color,,81.0,Lucile Hadzihalilovic,92.0,Nissim Renard,23.0,Roxane Duran,21.0,...,8.0,,Drama|Horror|Mystery|Sci-Fi,63.0,979,boy|giving birth|nurse|sea|ultrasonography,French,France,,6.4
Life,,Color,,45.0,,,Adam Arkin,374.0,Brent Sexton,130.0,...,0.0,,Crime|Drama|Mystery,12.0,29450,cop|murder|partner|police|protective male,English,USA,,8.3
The Missing,,Color,,60.0,,,Jason Flemyng,1000.0,James Nesbitt,773.0,...,575.0,,Crime|Drama|Mystery,14.0,8739,france|journalist|limp|police detective|reporter,English,UK,,8.1
Xi you ji zhi: Sun Wukong san da Baigu Jing,2016.0,Color,,119.0,Pou-Soi Cheang,3.0,Li Gong,879.0,Aaron Kwok,107.0,...,22.0,,Action|Adventure|Fantasy,14.0,1212,buddhism|demon|journey to the west|monk|monkey...,English,China,68005000.0,6.0


### Problem 7
<span  style="color:green; font-size:16px">Select all the movies that are missing both the gross and budget. Return just those columns to verify that those values are indeed missing.</span>

In [40]:
# your code here
filt1 = movies['gross'].isna()
filt2 = movies['budget'].isna()
filt = filt1 & filt2 
movies[filt]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Star Wars: Episode VII - The Force Awakens,,,,,Doug Walker,131.0,Doug Walker,131.0,Rob Walker,12.0,...,,,Documentary,,8,,,,,7.1
The Lovers,2015.0,Color,R,109.0,Roland Joffé,596.0,Tamsin Egerton,622.0,Alice Englert,525.0,...,283.0,,Action|Adventure|Romance|Sci-Fi,10.0,2138,1770s|british india|great barrier reef|india|ring,English,Belgium,,4.5
Godzilla Resurgence,2016.0,Color,,120.0,Hideaki Anno,28.0,Mark Chinnery,544.0,Shin'ya Tsukamoto,106.0,...,12.0,,Action|Adventure|Drama|Horror|Sci-Fi,1.0,374,blood|godzilla|monster|sequel,Japanese,Japan,,8.2
Harry Potter and the Deathly Hallows: Part II,2011.0,Color,,,Matt Birch,0.0,Rupert Grint,10000.0,Dave Legeno,570.0,...,159.0,,Action|Fantasy,1.0,381,,English,UK,,7.5
Harry Potter and the Deathly Hallows: Part I,2010.0,Color,,,Matt Birch,0.0,Rupert Grint,10000.0,Toby Jones,2000.0,...,1000.0,,Fantasy,4.0,252,,English,UK,,6.4
The A-Team,,Color,TV-PG,60.0,,,George Peppard,669.0,Dirk Benedict,554.0,...,432.0,,Action|Adventure|Crime,29.0,25402,1980s|cult tv|famous opening theme|good versus...,English,USA,,7.6
"10,000 B.C.",,,,22.0,Christopher Barnard,0.0,Mathew Buck,5.0,,,...,,,Comedy,,6,,,,,7.2
Ben-Hur,2016.0,Color,PG-13,141.0,Timur Bekmambetov,335.0,Morgan Freeman,11000.0,Ayelet Zurer,745.0,...,635.0,,Adventure|Drama|History,1.0,57,,English,USA,,6.1
Hannibal,,Color,TV-14,44.0,,,Caroline Dhavernas,544.0,Scott Thompson,183.0,...,148.0,,Crime|Drama|Horror|Mystery|Thriller,103.0,159910,blood|cannibalism|fbi|manipulation|psychiatrist,English,USA,,8.6
All That Jazz,1979.0,Color,R,123.0,Bob Fosse,189.0,Roy Scheider,813.0,Ben Vereen,388.0,...,87.0,,Comedy|Drama|Music|Musical,84.0,19228,dancer|editing|stand up comedian|surgery|vomiting,English,USA,,7.8


### Problem 8
<span  style="color:green; font-size:16px">Write a function `find_missing` that has three parameters, `df`, `col1` and `col2` where `df` is a DataFrame and `col1` and `col2` are column names. This function should return all the rows of the DataFrame where `col1` and `col2` are missing. Only return the two columns as well. Answer problem 7 with this function.</span>

In [44]:
# your code here
def find_missing(df, col1, col2):
    filt1 = df[col1].isna()
    filt2 = df[col2].isna()
    filt = filt1 & filt2
    cols = [col1, col2]
    return df.loc[filt, cols]

In [45]:
find_missing(movies, 'gross', 'budget')

title,gross,budget
Star Wars: Episode VII - The Force Awakens,,
The Lovers,,
Godzilla Resurgence,,
Harry Potter and the Deathly Hallows: Part II,,
Harry Potter and the Deathly Hallows: Part I,,
The A-Team,,
"10,000 B.C.",,
Ben-Hur,,
Hannibal,,
All That Jazz,,
