# Boolean Selection: Single Conditions

**Boolean Selection**, also referred to as **boolean indexing**, is the process of selecting subsets of rows from DataFrames (or Series) based on their actual **values**, NOT on labels or integer locations.
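To make the contrast concrete, here is a small sketch that uses a hypothetical four-row frame copied from the sample data loaded below. Label-based selection with `.loc` requires knowing the row names in advance. Boolean selection instead picks rows by what their values contain.

```python
import pandas as pd

# Hypothetical subset of the sample data (state and age columns only)
df = pd.DataFrame(
    {"state": ["NY", "TX", "FL", "TX"], "age": [30, 2, 12, 33]},
    index=pd.Index(["Jane", "Niko", "Aaron", "Christina"], name="name"),
)

# Selection by label: we must already know which rows we want
by_label = df.loc[["Niko", "Christina"]]

# Boolean selection: rows are chosen because their 'state' value is 'TX'
by_value = df[df["state"] == "TX"]

print(by_value)
```

Both approaches return the same two rows here. Only the boolean version would keep working if more Texas rows were added to the data.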

In [1]:
```python
import pandas as pd
df = pd.read_csv('/content/sample_data.csv', index_col=0)
df
```

| name | state | color | food | age | height | score |
|---|---|---|---|---|---|---|
| Jane | NY | blue | Steak | 30 | 165 | 4.6 |
| Niko | TX | green | Lamb | 2 | 70 | 8.3 |
| Aaron | FL | red | Mango | 12 | 120 | 9.0 |
| Penelope | AL | white | Apple | 4 | 80 | 3.3 |
| Dean | AK | gray | Cheese | 32 | 180 | 1.8 |
| Christina | TX | black | Melon | 33 | 172 | 9.5 |
| Cornelia | TX | red | Beans | 69 | 150 | 2.2 |


## Operator overloading

*Just the brackets* is **overloaded**. Depending on the inputs, pandas will do something completely different. Here are the rules for the different objects passed to *just the brackets*.

* **string** — return a column as a Series
* **list of strings** — return those columns as a DataFrame
* **sequence of booleans** — select all rows where `True`

In summary, *just the brackets* primarily selects columns, but, if you pass it a sequence of booleans, it will select all rows that are `True`.
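The three rules can be checked on a tiny frame. This is a minimal sketch with hypothetical values taken from the first three rows of the sample data. The boolean list must have one entry per row.

```python
import pandas as pd

df = pd.DataFrame(
    {"state": ["NY", "TX", "FL"], "age": [30, 2, 12], "score": [4.6, 8.3, 9.0]},
    index=["Jane", "Niko", "Aaron"],
)

col = df["age"]                  # string -> a single column as a Series
cols = df[["state", "score"]]    # list of strings -> those columns as a DataFrame
rows = df[[True, False, True]]   # sequence of booleans -> the rows where True

print(type(col).__name__, type(cols).__name__)
print(rows.index.tolist())
```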

In [5]:
```python
bikes = pd.read_csv('/content/bikes.csv')
bikes.head(3)
```

| | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | 2013-06-28 19:01:00 | 2013-06-28 19:17:00 | 993.0 | Lake Shore Dr & Monroe St | 11.0 | Michigan Ave & Oak St | 15.0 | 73.9 | 12.7 | mostlycloudy |
| 1 | Male | 2013-06-28 22:53:00 | 2013-06-28 23:03:00 | 623.0 | Clinton St & Washington Blvd | 31.0 | Wells St & Walton St | 19.0 | 69.1 | 6.9 | partlycloudy |
| 2 | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040.0 | Sheffield Ave & Kingsbury St | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |


### Create a boolean Series

Let's create a boolean Series by determining which rows have a trip duration greater than 1,000 seconds. To make the comparison, we select the `tripduration` column as a Series and compare it against the integer 1,000.

In [15]:
```python
filter = bikes['tripduration'] > 1000  # compare every trip duration against 1000
filter.head(3)
```

```
0    False
1    False
2     True
Name: tripduration, dtype: bool
```

When we write `bikes['tripduration'] > 1000`, pandas compares each value in the `tripduration` column against 1,000. It returns a new Series, the same length as `tripduration`, whose boolean values record the outcome of each comparison. Let's verify that the `filter` Series is the same length as the DataFrame.
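The element-wise behavior can be seen on a short stand-alone Series. This sketch assumes a hypothetical four-value Series whose durations are copied from rows printed in this notebook. The comparison keeps the original index and name and produces one boolean per element.

```python
import pandas as pd

# Hypothetical durations in seconds, copied from rows shown in this notebook
trips = pd.Series([993.0, 623.0, 1040.0, 1300.0], name="tripduration")

mask = trips > 1000  # one comparison per element, result has dtype bool

print(mask)
```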

In [13]:
```python
len(filter)
```

24391

In [14]:
```python
len(bikes)
```

24391

In [16]:
```python
bikes[filter].head(3)
```

| | gender | starttime | stoptime | tripduration | from_station_name | start_capacity | to_station_name | end_capacity | temperature | wind_speed | events |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Male | 2013-06-30 14:43:00 | 2013-06-30 15:01:00 | 1040.0 | Sheffield Ave & Kingsbury St | 15.0 | Dearborn St & Monroe St | 23.0 | 73.0 | 16.1 | mostlycloudy |
| 8 | Male | 2013-07-03 15:21:00 | 2013-07-03 15:42:00 | 1300.0 | Clinton St & Washington Blvd | 31.0 | Wood St & Division St | 15.0 | 71.1 | 0.0 | cloudy |
| 10 | Male | 2013-07-04 17:17:00 | 2013-07-04 17:42:00 | 1523.0 | Morgan St & 18th St | 15.0 | Damen Ave & Pierce Ave | 19.0 | 79.0 | 9.2 | mostlycloudy |


In [17]:
```python
len(bikes[filter])
```

4816

### How many rows have a trip duration greater than 1,000 seconds?

In [18]:
```python
bikes_duration_1000 = bikes[filter]
```

Let's find the number of rows in each DataFrame.

In [19]:
```python
len(bikes)
```

24391

In [20]:
```python
len(bikes_duration_1000)
```

4816

Dividing the two lengths shows that roughly 20% (about 19.7%) of the rides are longer than 1,000 seconds.

In [21]:
```python
len(bikes_duration_1000) / len(bikes)
```

0.19744987905374933
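Because pandas treats `True` as 1 and `False` as 0 in arithmetic, the boolean Series alone can answer both questions. Summing it counts the matching rows, and its mean is the proportion of matching rows. The sketch below uses a hypothetical five-value Series of durations copied from rows shown above. Applied to the notebook's data, `filter.sum()` should equal `len(bikes[filter])` and `filter.mean()` should equal the ratio just computed.

```python
import pandas as pd

# Hypothetical durations in seconds, copied from rows shown in this notebook
durations = pd.Series([993.0, 623.0, 1040.0, 1300.0, 1523.0])
mask = durations > 1000

count = mask.sum()   # number of True values, i.e. rows that would be selected
share = mask.mean()  # fraction of True values, i.e. count / len(mask)

print(count, share)
```

This avoids building an intermediate DataFrame when only the count or the proportion is needed.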

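## Boolean selection in one line

The boolean Series does not have to be stored in a variable first. The comparison can be written directly inside the brackets.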

In [22]:
```python
len(bikes[bikes['tripduration'] > 1000])
```

4816
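The same one-line pattern works on any column. This is a minimal sketch that rebuilds only the `age` column of the sample data shown at the top of this notebook. It selects everyone older than 30. Jane is excluded because her age is exactly 30 and `>` is a strict comparison.

```python
import pandas as pd

# Rebuild the age column of the sample data shown at the top of the notebook
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {"age": [30, 2, 12, 4, 32, 33, 69]},
    index=pd.Index(names, name="name"),
)

# Build the boolean Series and apply it inside the brackets in one expression
older_than_30 = df[df["age"] > 30]

print(older_than_30.index.tolist())
print(len(older_than_30))
```

Writing the mask inline is compact. Storing it in a variable, as earlier in this notebook, is handier when the same condition is reused several times.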