---
# Filtering
Using conditionals to filter rows and columns.

#TODO test out filtering in series

---

In [1]:
import pandas as pd
import numpy as np
from IPython.display import display

In [2]:
people = {
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
}
df = pd.DataFrame(people)
df

Unnamed: 0,first,last,email
0,Lorem,Ipsum,lorem@yahoo.com
1,John,Doe,john@gmail.com
2,Jane,Doe,jane@outlook.com


---
## Creating Filters
To create a filter, write a conditional that compares the values in a column to another value. The result is a Series of bools with one entry per row.

---

In [3]:
# Filter people with last name Doe
# I prefer creating a variable for the filter first
# for readability

# fmt: off
# Turns off black formatting so that the parentheses
# are not removed. I use it for readability

# Creating filter
my_filter1 = (df["last"] == "Doe")
my_filter1

# fmt: on

0    False
1     True
2     True
Name: last, dtype: bool

---
### Creating Filters - using isin()
isin() checks whether each value in a column exists in an iterable and returns a Series of bools.

---

In [4]:
# Create a filter of people whose first name is in an iterable
# called my_iterable.
my_iterable = ["John", "Lorem"]
my_filt = df["first"].isin(my_iterable)
my_filt

0     True
1     True
2    False
Name: first, dtype: bool

---
### Creating Filters - using str.contains()
str.contains() is a string method that works almost like the opposite of isin(): it checks whether a given string (treated as a regular expression by default) occurs inside each value in a column and returns a Series of bools indicating a match.  
This is useful when a cell can hold several values instead of just one.

---

In [5]:
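# Create a filter of people whose email provider is gmail.
# regex=False matches "gmail.com" literally; by default the pattern
# is a regular expression, where "." matches any character.
email_filt = df["email"].str.contains("gmail.com", na=False, regex=False)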
email_filt

0    False
1     True
2    False
Name: email, dtype: bool
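
The two points above (regular-expression matching and cells that hold several values) can be sketched as follows. The `contacts` frame and its values are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical data for illustration; not the notebook's df.
contacts = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "emails": [
            "a@gmail.com, a@work.org",  # one cell holding two addresses
            "b@gmailXcom",              # not a real gmail.com address
            None,                       # missing value
            "d@outlook.com",
        ],
    }
)

# Default regex=True: "." matches any character, so "gmailXcom" also matches.
regex_filt = contacts["emails"].str.contains("gmail.com", na=False)

# regex=False does a plain substring match; na=False maps missing cells to False.
literal_filt = contacts["emails"].str.contains("gmail.com", na=False, regex=False)

print(regex_filt.tolist())    # [True, True, False, False]
print(literal_filt.tolist())  # [True, False, False, False]
```

The first row matches in both cases because the search string only has to appear somewhere inside the cell.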

---
## Applying Filters
Filtering in pandas works by passing a Series of bools, known as a Boolean mask or a **filter** (like the ones created above), into a DataFrame.

---

In [None]:
# Applying the filter using loc (preferred)
# I prefer this because loc can select columns in the same call,
# so no double indexing is needed
# (i.e. df.loc[filter, column] instead of df[filter][column])
display(df.loc[my_filter1])

# Another way to apply the filter: plain indexing with [].
display(df[my_filter1])

# Done in a single line:
df[df["last"] == "Doe"]
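
As a minimal sketch of why `loc` is handy here, the block below rebuilds the same `people` data and selects (then assigns to) a column in one indexing step. The name `doe_filter` and the replacement address are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first": ["Lorem", "John", "Jane"],
        "last": ["Ipsum", "Doe", "Doe"],
        "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
    }
)
doe_filter = df["last"] == "Doe"

# One indexing step: rows come from the mask, the column from its label.
doe_emails = df.loc[doe_filter, "email"]
print(doe_emails.tolist())  # ['john@gmail.com', 'jane@outlook.com']

# The same form works for assignment, which chained indexing
# (df[doe_filter]["email"] = ...) does not do reliably.
df.loc[doe_filter & (df["first"] == "John"), "email"] = "john.doe@example.com"
print(df["email"].tolist())
# ['lorem@yahoo.com', 'john.doe@example.com', 'jane@outlook.com']
```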

In [None]:
# A comparison like the one in the previous example is case
# sensitive, so only Doe (with a capital D) matches.
# For example:
my_filter2 = df["last"] == "doe"
display(my_filter2)
# Notice how the filter does not find any matching record.


# Filter people with last name Doe (case insensitive)
# using str accessors
my_filter3 = df["last"].str.lower() == "doe"
display(my_filter3)
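
Since the cells above were not executed, here is a small sketch of what the two kinds of comparison return. It uses a hypothetical `last_names` Series with mixed casing and one missing value rather than the notebook's df.

```python
import pandas as pd

# Hypothetical last names with mixed case and a missing value.
last_names = pd.Series(["Doe", "DOE", "doe", None, "Ipsum"], name="last")

# Exact comparison: only the identically cased value matches.
exact = last_names == "Doe"

# Lowercase first, then compare: casing no longer matters.
# The missing value stays missing after str.lower() and compares as False.
lowered = last_names.str.lower() == "doe"

print(exact.tolist())    # [True, False, False, False, False]
print(lowered.tolist())  # [True, True, True, False, False]
```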

---
## Filter using Multiple Conditions
Filters can be combined with the **&** operator (AND)  
and the **|** operator (OR).

---

In [None]:
## Filtering using & (And)

# Filter rows that have last name of Doe, AND first name of John.
# Note that when combining filters, each condition must be enclosed
# in parentheses, because & and | bind more tightly than ==
my_filter4 = (df["last"] == "Doe") & (df["first"] == "John")
df.loc[my_filter4]
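
A common slip is to reach for Python's `and` instead of `&`. The sketch below, on a trimmed copy of the same data, shows the element-wise result and the error that `and` produces; the names `john_doe` and `raised` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first": ["Lorem", "John", "Jane"],
        "last": ["Ipsum", "Doe", "Doe"],
    }
)

# Element-wise AND: each comparison sits in its own parentheses.
john_doe = (df["last"] == "Doe") & (df["first"] == "John")
print(john_doe.tolist())  # [False, True, False]

# Python's `and` asks each Series for a single True/False, which pandas
# refuses to guess, so this raises ValueError.
try:
    (df["last"] == "Doe") and (df["first"] == "John")
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```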

In [None]:
## Filtering using | (Or)

# Filter rows that have a first name of Lorem OR first name of Jane
# Return first name and email
my_filter5 = (df["first"] == "Lorem") | (df["first"] == "Jane")
df.loc[my_filter5, ["first", "email"]]
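
When every OR condition tests the same column for equality, `isin()` from the earlier section gives the same mask in a shorter form. A minimal sketch on the same data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first": ["Lorem", "John", "Jane"],
        "last": ["Ipsum", "Doe", "Doe"],
        "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
    }
)

# Two equality checks joined with | ...
or_filter = (df["first"] == "Lorem") | (df["first"] == "Jane")

# ... produce the same Boolean mask as a single isin() call.
isin_filter = df["first"].isin(["Lorem", "Jane"])

print(or_filter.equals(isin_filter))  # True
print(df.loc[isin_filter, ["first", "email"]])
```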

---
## Inversion of Filter
Inverting a filter returns the rows that are NOT covered by it.  
This is done with the bitwise NOT operator, tilde ( ~ ), which flips every bool in the Boolean mask.  
i.e.  df[~filter]

---

In [None]:
## To get the rows whose last name is NOT Doe:

# Make a filter that filters the Doe last name.
my_filter6 = df["last"] == "Doe"

# Use the tilde (~) operator to select all rows that do
# NOT have the last name Doe. (i.e. the inverse of our filter)
df[~my_filter6]
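
The sketch below, again on a trimmed copy of the same data, shows the inverted mask and how to invert a combined filter. Because ~ binds more tightly than & and |, the combined condition needs its own parentheses. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first": ["Lorem", "John", "Jane"],
        "last": ["Ipsum", "Doe", "Doe"],
    }
)

doe = df["last"] == "Doe"

# ~ flips each bool, so only the non-Doe row remains.
print(df[~doe]["first"].tolist())  # ['Lorem']

# Inverting a combined filter: wrap the whole condition before applying ~.
not_john_doe = ~((df["last"] == "Doe") & (df["first"] == "John"))
print(df.loc[not_john_doe, "first"].tolist())  # ['Lorem', 'Jane']

# For a single comparison, != gives the same mask as ~ on ==.
print((~doe).equals(df["last"] != "Doe"))  # True
```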