# 8. Boolean Indexing Single Conditions

### Objectives

+ Boolean Indexing or Boolean Selection is the selection of a subset of a Series/DataFrame based on the **values** themselves and not the row/column labels or integer location
+ Boolean means **`True`** or **`False`**
+ Each row of the DataFrame will be kept or discarded based on the boolean value aligned with it
+ Boolean selection has a two-step process
    + First create a **filter** - a sequence of True/False values the same length as the DataFrame/Series
    + Second, pass this filter to one of the indexers **`[ ]`** or **`.loc`**
+ Boolean selection with a boolean Series does not work with **`.iloc`** (it accepts only a plain list or array of booleans)
+ The indexing operators are overloaded: they change functionality depending on what is passed to them
+ The filter is commonly created by comparing a column of data (a Series) against some scalar value
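
The two-step process listed above can be sketched on a tiny DataFrame. The `rides` data and its column values here are made up for illustration and are not the bikes dataset; this is a minimal sketch, not the only way to do it.

```python
import pandas as pd

# Hypothetical toy data, not the bikes dataset
rides = pd.DataFrame({
    'rider': ['Ana', 'Ben', 'Cy', 'Di'],
    'tripduration': [993, 623, 1040, 667],
})

# Step 1: create the filter, one boolean per row
filt = rides['tripduration'] > 900

# Step 2: pass the filter to one of the indexers
by_brackets = rides[filt]
by_loc = rides.loc[filt]

print(by_brackets)
print(by_brackets.equals(by_loc))  # both indexers select the same rows
```

Both `[ ]` and `.loc` keep only the rows whose aligned boolean is `True`.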


# Boolean Indexing
Boolean indexing, also referred to as **Boolean Selection**, is the process of selecting subsets of rows from DataFrames (or Series) based on the actual data values and NOT by their labels or integer locations.

# Examples of Boolean Indexing

Let's see some examples of actual questions (in plain English) that boolean indexing can help us answer from the bikes dataset.

+ Find all male riders
+ Find all rides with duration longer than 2 hours
+ Find all rides that took place between March and June of 2015.
+ Find all the rides with a duration longer than 2 hours by females with temperature higher than 90 degrees

The term **query** is used to refer to these sorts of questions.

### All queries have a logical condition
Each of the above queries has a strict logical condition that must be checked one row at a time.

This is the same idea as the `WHERE` clause in SQL.

### Keep or discard an entire row of data
If you were to manually answer the above queries, you would need to scan each row and determine whether the row as a whole meets the condition. If so, then it is kept, otherwise it is discarded.

### Each row will have a True or False value associated with it
When you perform boolean indexing, each row of the DataFrame (or value of a Series) will have a True or False value associated with it depending on whether or not it meets the condition. True/False values are known as booleans. The documentation refers to the entire procedure as boolean indexing.

Since we are using the booleans to select data, it is sometimes referred to as **boolean selection**. We are using booleans to select subsets of data.
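
To make the "one True or False per row" idea concrete, here is a small sketch that checks a condition row by row with a plain loop and then does the same check on the whole Series at once. The `durations` values and the threshold of 900 are illustrative assumptions.

```python
import pandas as pd

durations = pd.Series([993, 623, 1040, 667, 130])  # illustrative values

# Manual approach: check the condition one row at a time
manual = []
for value in durations:
    manual.append(value > 900)

# Vectorized approach: compare the whole Series in a single expression
vectorized = durations > 900

print(manual)
print(vectorized.tolist())
```

The two approaches produce the same sequence of booleans; the vectorized comparison is the form typically used with Pandas.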

### Beginning with a small DataFrame
We will perform our first boolean indexing on a dataset of 5 rows. Let's assign the head of the bikes dataset to its own variable. The `bikes_head` DataFrame has five rows in it.

In [1]:
import pandas as pd
bikes = pd.read_csv('../data/bikes.csv')
bikes_head = bikes.head()
bikes_head

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
1,7524,Subscriber,Male,2013-06-28 22:53:00,2013-06-28 23:03:00,623,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wells St & Walton St,41.89993,-87.63443,19.0,69.1,10.0,6.9,-9999.0,partlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
3,12907,Subscriber,Male,2013-07-01 10:05:00,2013-07-01 10:16:00,667,Carpenter St & Huron St,41.894556,-87.653449,19.0,Clark St & Randolph St,41.884576,-87.63189,31.0,72.0,10.0,16.1,-9999.0,mostlycloudy
4,13168,Subscriber,Male,2013-07-01 11:16:00,2013-07-01 11:18:00,130,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,73.0,10.0,17.3,-9999.0,partlycloudy


### A manual filtering of the data
Let's find all the rides with a trip duration greater than 900. We will do this manually by inspecting the data. 


### Create a list of booleans
By inspecting the data, we see that the 1st and 3rd rows have a trip duration greater than 900. A list of 5 boolean values is created, one for each row. The 1st and 3rd values are `True`. The others are `False`.

In [2]:
filt = [True, False, True, False, False] # row filter, keep the first and third rows

## Variable name `filt`
All of the tutorials will use the variable name `filt` to contain the sequence of booleans. `filt` simply stands for filter. Being consistent with variables makes your code easier to understand.

### Pass this list into just the brackets
The above list has a True in both the 1st and 3rd position. These will be the rows that are kept during boolean indexing. To formally do boolean indexing, we place the list inside the brackets.

In [3]:
bikes_head[filt]  # selection operator: a list of boolean values filters the rows
# put the filter in the brackets to get the rows that are True, i.e. that meet the condition

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy


## Wait a second… Isn’t `[ ]` just for column selection?

The primary purpose of *just the brackets* for a DataFrame is to select one or more columns by using either a string or a list of strings. Now, all of a sudden, this example is showing that entire rows are selected with boolean values. This is what makes Pandas, unfortunately, a confusing library to use.

## Operator Overloading
*Just the brackets* is **overloaded**. This means that, depending on the inputs, Pandas will do something completely different. Here are the rules for the different objects you pass to the brackets.

* **string** — return a column as a Series
* **list of strings** — return all those columns as a DataFrame
* **slice** — select rows (can do both label and integer location — confusing!) I never do this as it is ambiguous
* **sequence of booleans** — select all rows where True

In summary, *just the brackets* primarily selects columns, but if you pass it a sequence of booleans it will select all rows where the value is True.
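
The four rules above can be checked on a small example. The DataFrame `df` below is hypothetical; only the column names are borrowed from the bikes dataset.

```python
import pandas as pd

# Hypothetical three-row DataFrame for illustration
df = pd.DataFrame({
    'gender': ['Male', 'Female', 'Male'],
    'tripduration': [993, 623, 1040],
    'events': ['cloudy', 'rain', 'cloudy'],
})

col = df['tripduration']          # string -> one column as a Series
cols = df[['gender', 'events']]   # list of strings -> DataFrame of those columns
first_two = df[0:2]               # slice -> rows
kept = df[[True, False, True]]    # sequence of booleans -> rows where True

print(type(col).__name__)
print(type(cols).__name__)
print(first_two.shape)
print(kept.index.tolist())
```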

## Using booleans in a Series and not a list
Instead of using a list to contain our booleans, we can store them in a Series. This produces the same output. Below, we use the Series constructor to create a Series object.

In [4]:
filt = pd.Series([True, False, True, False, False])
filt

0     True
1    False
2     True
3    False
4    False
dtype: bool

### Use the boolean Series to do the boolean selection
Placing the Series directly in the brackets will again select only the rows which have True values in the Series.

In [5]:
bikes_head[filt]  # a boolean Series works the same way as a boolean list: it tells which rows to keep

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
0,7147,Subscriber,Male,2013-06-28 19:01:00,2013-06-28 19:17:00,993,Lake Shore Dr & Monroe St,41.88105,-87.61697,11.0,Michigan Ave & Oak St,41.90096,-87.623777,15.0,73.9,10.0,12.7,-9999.0,mostlycloudy
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
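
As a quick check that a list and a Series of booleans select the same rows, here is a minimal sketch on a hypothetical one-column DataFrame. It assumes the Series has the same default index (0 through 4) as the DataFrame.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..4
df = pd.DataFrame({'tripduration': [993, 623, 1040, 667, 130]})

bool_list = [True, False, True, False, False]
bool_series = pd.Series(bool_list)  # its default index 0..4 matches df's index

from_list = df[bool_list]
from_series = df[bool_series]

print(from_list.equals(from_series))
```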


# Practical Boolean Selection
We will almost never create boolean lists/Series manually like we did above but instead use the actual data to create them.

## Creating Boolean Series from Column Data
By far the most common way to create a boolean Series will be from the values of one particular column. We will test a condition using one of the six comparison operators:

* `<`
* `<=`
* `>`
* `>=`
* `==`
* `!=`
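
Each of these operators, applied to a Series and a scalar, returns a boolean Series of the same length. A short sketch with a made-up three-value Series `s`:

```python
import pandas as pd

s = pd.Series([900, 1000, 1100])  # illustrative values

# Compare every value in the Series against the scalar 1000
comparisons = {
    '<': s < 1000,
    '<=': s <= 1000,
    '>': s > 1000,
    '>=': s >= 1000,
    '==': s == 1000,
    '!=': s != 1000,
}

for op, result in comparisons.items():
    print(op, result.tolist())
```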


## Create a Boolean Series
Let's create a boolean Series by determining which rows have a trip duration of over 1000 seconds.

In [6]:
filt = bikes['tripduration'] > 1000
# select the column and test whether each value is greater than 1000,
# producing True or False for every row
filt.head(10)

0    False
1    False
2     True
3    False
4    False
5    False
6    False
7    False
8     True
9    False
Name: tripduration, dtype: bool

### Manually verify correctness
Let's output the head of the trip duration Series to manually verify that integer locations 2 and 8 are indeed the ones greater than 1000.

In [15]:
bikes['tripduration'].head(10)

0     993
1     623
2    1040
3     667
4     130
5     660
6     565
7     505
8    1300
9     922
Name: tripduration, dtype: int64

## Complete our boolean indexing
We created our boolean Series, **`filt`**, using the greater than comparison operator on the **`tripduration`** column. We can now pass this result into the brackets to filter the entire DataFrame. Verify that all **`tripduration`** values are greater than 1000.

In [8]:
bikes[filt].head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
10,24383,Subscriber,Male,2013-07-04 17:17:00,2013-07-04 17:42:00,1523,Morgan St & 18th St,41.858086,-87.651073,15.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,79.0,10.0,9.2,-9999.0,mostlycloudy
11,24673,Subscriber,Male,2013-07-04 18:13:00,2013-07-04 18:42:00,1697,Ashland Ave & Armitage Ave,41.917859,-87.668919,15.0,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,79.0,10.0,10.4,-9999.0,mostlycloudy
12,26214,Subscriber,Male,2013-07-05 10:02:00,2013-07-05 10:40:00,2263,Jefferson St & Monroe St,41.880422,-87.642746,19.0,Jefferson St & Monroe St,41.880422,-87.642746,19.0,79.0,10.0,0.0,-9999.0,partlycloudy


### How many rows have a trip duration greater than 1000?
To answer this question, let's assign the result of the boolean selection to a variable and then retrieve the **`shape`** of the DataFrame.

In [9]:
bikes.shape

(50089, 19)

In [10]:
bikes_duration_1000 = bikes[filt]
bikes_duration_1000.shape # 10178 out of 50089 meet the condition

(10178, 19)

About 20% of the rides are longer than 1000 seconds.
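
An alternative way to count matching rows, not used in the cells above, is to work on the filter itself: `True` is treated as 1 and `False` as 0, so `sum` gives the count and `mean` gives the fraction. The sketch below uses only the ten trip durations shown earlier from the head of the column, so its numbers describe those ten rows and not the full dataset.

```python
import pandas as pd

# The first ten trip durations shown earlier
durations = pd.Series([993, 623, 1040, 667, 130, 660, 565, 505, 1300, 922])
filt = durations > 1000

n_true = filt.sum()   # True counts as 1, False as 0
share = filt.mean()   # fraction of values that are True

print(n_true)
print(share)
```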

# Boolean selection in one line
Often, you will see boolean selection happen in a single line of code instead of the multiple lines we used above. Put the expression for the filter directly inside the brackets.

In [11]:
bikes[bikes['tripduration'] > 1000].head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
10,24383,Subscriber,Male,2013-07-04 17:17:00,2013-07-04 17:42:00,1523,Morgan St & 18th St,41.858086,-87.651073,15.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,79.0,10.0,9.2,-9999.0,mostlycloudy
11,24673,Subscriber,Male,2013-07-04 18:13:00,2013-07-04 18:42:00,1697,Ashland Ave & Armitage Ave,41.917859,-87.668919,15.0,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,79.0,10.0,10.4,-9999.0,mostlycloudy
12,26214,Subscriber,Male,2013-07-05 10:02:00,2013-07-05 10:40:00,2263,Jefferson St & Monroe St,41.880422,-87.642746,19.0,Jefferson St & Monroe St,41.880422,-87.642746,19.0,79.0,10.0,0.0,-9999.0,partlycloudy


In [16]:
filt = bikes['tripduration'] > 1000
bikes[filt].head()  # same as above, but done in separate steps

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
2,10927,Subscriber,Male,2013-06-30 14:43:00,2013-06-30 15:01:00,1040,Sheffield Ave & Kingsbury St,41.909592,-87.653497,15.0,Dearborn St & Monroe St,41.88132,-87.629521,23.0,73.0,10.0,16.1,-9999.0,mostlycloudy
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
10,24383,Subscriber,Male,2013-07-04 17:17:00,2013-07-04 17:42:00,1523,Morgan St & 18th St,41.858086,-87.651073,15.0,Damen Ave & Pierce Ave,41.909396,-87.677692,19.0,79.0,10.0,9.2,-9999.0,mostlycloudy
11,24673,Subscriber,Male,2013-07-04 18:13:00,2013-07-04 18:42:00,1697,Ashland Ave & Armitage Ave,41.917859,-87.668919,15.0,Lincoln Ave & Armitage Ave,41.918273,-87.638116,19.0,79.0,10.0,10.4,-9999.0,mostlycloudy
12,26214,Subscriber,Male,2013-07-05 10:02:00,2013-07-05 10:40:00,2263,Jefferson St & Monroe St,41.880422,-87.642746,19.0,Jefferson St & Monroe St,41.880422,-87.642746,19.0,79.0,10.0,0.0,-9999.0,partlycloudy


## I recommend using a separate variable for the filter

## Single condition expression
Our first example tested a single condition (whether the trip duration was greater than 1,000). Let’s test a different single condition and look for all the rides that happened when the weather was cloudy.

We use the `==` operator to test for equality and again pass this variable to the brackets, which completes our selection.

In [12]:
filt = bikes['events'] == 'cloudy'  # test each value for equality with 'cloudy'
bikes[filt].head()

Unnamed: 0,trip_id,usertype,gender,starttime,stoptime,tripduration,from_station_name,latitude_start,longitude_start,dpcapacity_start,to_station_name,latitude_end,longitude_end,dpcapacity_end,temperature,visibility,wind_speed,precipitation,events
6,18880,Subscriber,Male,2013-07-02 17:47:00,2013-07-02 17:56:00,565,Clark St & Randolph St,41.884576,-87.63189,31.0,Ravenswood Ave & Irving Park Rd,41.95469,-87.67393,19.0,66.0,10.0,15.0,-9999.0,cloudy
7,19689,Subscriber,Male,2013-07-03 09:07:00,2013-07-03 09:16:00,505,State St & Van Buren St,41.877181,-87.627844,27.0,Franklin St & Jackson Blvd,41.877708,-87.635321,27.0,64.0,7.0,5.8,-9999.0,cloudy
8,21028,Subscriber,Male,2013-07-03 15:21:00,2013-07-03 15:42:00,1300,Clinton St & Washington Blvd,41.88338,-87.64117,31.0,Wood St & Division St,41.90332,-87.67273,15.0,71.1,8.0,0.0,-9999.0,cloudy
18,40924,Subscriber,Male,2013-07-09 13:12:00,2013-07-09 14:42:00,5396,Canal St & Jackson Blvd,41.878114,-87.639971,35.0,Millennium Park,41.881032,-87.624084,35.0,79.0,10.0,13.8,0.0,cloudy
19,40879,Subscriber,Male,2013-07-09 13:14:00,2013-07-09 13:20:00,384,Aberdeen St & Madison St,41.881487,-87.654752,19.0,Canal St & Jackson Blvd,41.878114,-87.639971,35.0,79.0,10.0,13.8,0.0,cloudy


# Exercises

### Problem 1
<span  style="color:green; font-size:16px">Read in the movie dataset and set the index to be the title. Select all movies that have Tom Hanks as `actor1`. How many of these movies has he starred in?</span>

In [17]:
# your code here
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head(2)

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1


In [18]:
# your code here
filt = movie['actor1'] == 'Tom Hanks' 
movie[filt].head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Toy Story 3,2010.0,Color,G,103.0,Lee Unkrich,125.0,Tom Hanks,15000.0,John Ratzenberger,1000.0,...,721.0,414984497.0,Adventure|Animation|Comedy|Family|Fantasy,453.0,544884,college|day care|escape|teddy bear|toy,English,USA,200000000.0,8.3
The Polar Express,2004.0,Color,G,100.0,Robert Zemeckis,0.0,Tom Hanks,15000.0,Eddie Deezen,726.0,...,267.0,665426.0,Adventure|Animation|Family|Fantasy,188.0,120798,boy|christmas|christmas eve|north pole|train,English,USA,165000000.0,6.6
Angels & Demons,2009.0,Color,PG-13,146.0,Ron Howard,2000.0,Tom Hanks,15000.0,Ayelet Zurer,745.0,...,294.0,133375846.0,Mystery|Thriller,298.0,207839,conclave|illuminati|murder|reference to bernin...,English,USA,150000000.0,6.7
The Da Vinci Code,2006.0,Color,PG-13,174.0,Ron Howard,2000.0,Tom Hanks,15000.0,Seth Gabel,574.0,...,362.0,217536138.0,Mystery|Thriller,294.0,314253,based on supposedly true story|holy grail|mary...,English,USA,125000000.0,6.6
Cloud Atlas,2012.0,Color,R,172.0,Tom Tykwer,670.0,Tom Hanks,15000.0,Jim Sturgess,5000.0,...,1000.0,27098580.0,Drama|Sci-Fi,511.0,284825,composer|future|letter|nonlinear timeline|nurs...,English,Germany,102000000.0,7.5


### Problem 2
<span  style="color:green; font-size:16px">Select movies with an IMDB score greater than 9.</span>

In [19]:
# your code here
filt = movie['imdb_score'] > 9
movie[filt].head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
The Shawshank Redemption,1994.0,Color,R,142.0,Frank Darabont,0.0,Morgan Freeman,11000.0,Jeffrey DeMunn,745.0,...,461.0,28341469.0,Crime|Drama,199.0,1689764,escape from prison|first person narration|pris...,English,USA,25000000.0,9.3
Towering Inferno,,Color,,65.0,John Blanchard,0.0,Martin Short,770.0,Andrea Martin,179.0,...,176.0,,Comedy,,10,,English,Canada,,9.5
Dekalog,,Color,TV-MA,55.0,,,Krystyna Janda,20.0,Olaf Lubaszenko,3.0,...,2.0,447093.0,Drama,53.0,12590,meaning of life|moral challenge|morality|searc...,Polish,Poland,,9.1
The Godfather,1972.0,Color,R,175.0,Francis Ford Coppola,0.0,Al Pacino,14000.0,Marlon Brando,10000.0,...,3000.0,134821952.0,Crime|Drama,208.0,1155770,crime family|mafia|organized crime|patriarch|r...,English,USA,6000000.0,9.2
Kickboxer: Vengeance,2016.0,,,90.0,John Stockwell,134.0,Matthew Ziff,260000.0,T.J. Storm,454.0,...,354.0,,Action,2.0,246,,,USA,17000000.0,9.1


### Problem 3
<span  style="color:green; font-size:16px">Select all movies from the 1970s.</span>

In [22]:
filt1 = movie['year'] >= 1970
filt2 = movie['year'] <= 1979
movie[filt1 & filt2].head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
All That Jazz,1979.0,Color,R,123.0,Bob Fosse,189.0,Roy Scheider,813.0,Ben Vereen,388.0,...,87.0,,Comedy|Drama|Music|Musical,84.0,19228,dancer|editing|stand up comedian|surgery|vomiting,English,USA,,7.8
Superman,1978.0,Color,PG,188.0,Richard Donner,503.0,Marlon Brando,10000.0,Margot Kidder,593.0,...,467.0,134218018.0,Action|Adventure|Drama|Romance|Sci-Fi,169.0,126357,1970s|clark kent|planet|superhero|year 1978,English,USA,55000000.0,7.3
Solaris,1972.0,Black and White,PG,115.0,Andrei Tarkovsky,0.0,Donatas Banionis,29.0,Anatoliy Solonitsyn,29.0,...,12.0,,Drama|Mystery|Sci-Fi,144.0,54057,hallucination|ocean|psychologist|scientist|spa...,Russian,Soviet Union,1000000.0,8.1
Mean Streets,1973.0,Color,R,112.0,Martin Scorsese,17000.0,Robert De Niro,22000.0,David Carradine,926.0,...,354.0,32645.0,Crime|Drama|Romance|Thriller,112.0,67797,bar|catholic guilt|epilepsy|italian american|m...,English,USA,500000.0,7.4
Star Trek: The Motion Picture,1979.0,Color,PG,143.0,Robert Wise,338.0,Leonard Nimoy,12000.0,Nichelle Nichols,664.0,...,643.0,82300000.0,Adventure|Mystery|Sci-Fi,134.0,63330,alien|space|space station|spacecraft|warp speed,English,USA,35000000.0,6.4


In [24]:
# your code here
movie[(movie['year'] >= 1970) & (movie['year'] <= 1979)].head()

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
All That Jazz,1979.0,Color,R,123.0,Bob Fosse,189.0,Roy Scheider,813.0,Ben Vereen,388.0,...,87.0,,Comedy|Drama|Music|Musical,84.0,19228,dancer|editing|stand up comedian|surgery|vomiting,English,USA,,7.8
Superman,1978.0,Color,PG,188.0,Richard Donner,503.0,Marlon Brando,10000.0,Margot Kidder,593.0,...,467.0,134218018.0,Action|Adventure|Drama|Romance|Sci-Fi,169.0,126357,1970s|clark kent|planet|superhero|year 1978,English,USA,55000000.0,7.3
Solaris,1972.0,Black and White,PG,115.0,Andrei Tarkovsky,0.0,Donatas Banionis,29.0,Anatoliy Solonitsyn,29.0,...,12.0,,Drama|Mystery|Sci-Fi,144.0,54057,hallucination|ocean|psychologist|scientist|spa...,Russian,Soviet Union,1000000.0,8.1
Mean Streets,1973.0,Color,R,112.0,Martin Scorsese,17000.0,Robert De Niro,22000.0,David Carradine,926.0,...,354.0,32645.0,Crime|Drama|Romance|Thriller,112.0,67797,bar|catholic guilt|epilepsy|italian american|m...,English,USA,500000.0,7.4
Star Trek: The Motion Picture,1979.0,Color,PG,143.0,Robert Wise,338.0,Leonard Nimoy,12000.0,Nichelle Nichols,664.0,...,643.0,82300000.0,Adventure|Mystery|Sci-Fi,134.0,63330,alien|space|space station|spacecraft|warp speed,English,USA,35000000.0,6.4
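
A range check like this one can also be written with the Series method `between`, which is inclusive on both ends by default. The sketch below is an alternative to the two-comparison approach above and uses a hypothetical `films` DataFrame with made-up titles, not the movie dataset.

```python
import pandas as pd

# Hypothetical data: placeholder titles A..E as the index
films = pd.DataFrame(
    {'year': [1969.0, 1970.0, 1975.0, 1979.0, 1980.0]},
    index=['A', 'B', 'C', 'D', 'E'],
)

# Two comparisons joined with &, as in the solution above
two_step = (films['year'] >= 1970) & (films['year'] <= 1979)

# Single call; both endpoints are included by default
one_call = films['year'].between(1970, 1979)

print(two_step.equals(one_call))
print(films[one_call].index.tolist())
```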


# Do some boolean selection on your own