## Colab Prep

Execute the following code cells whenever you open or restart the notebook in Google Colab.

In [None]:
!pip install "polars[all]" #execute each time you start/restart a Colab session

In [None]:
!wget https://github.com/WSU-DataScience/dsci_325_module6_basic_data_management_in_python/raw/main/sample_data.zip

In [None]:
!unzip ./sample_data.zip

# Module 6.6 - Review of `polars` basics

## Topic 1 - Dataframe basics
**Topics.**

1. Reading and writing `CSV` files
2. Column expressions
3. Inspecting a dataframe

In [23]:
import polars as pl
pl.Config.with_columns_kwargs = True

### Reading from a data file with `read_csv`

#### Open a CSV file from a local file using a relative path

In [3]:
artists = pl.read_csv('./sample_data/Artists.csv')
artists.head()

ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
i64,str,str,str,str,i64,i64,str,i64
1,"""Robert Arneson""","""American, 1930–1992""","""American""","""Male""",1930,1992,,
2,"""Doroteo Arnaiz""","""Spanish, born 1936""","""Spanish""","""Male""",1936,0,,
3,"""Bill Arnold""","""American, born 1941""","""American""","""Male""",1941,0,,
4,"""Charles Arnoldi""","""American, born 1946""","""American""","""Male""",1946,0,"""Q1063584""",500027998.0
5,"""Per Arnoldi""","""Danish, born 1941""","""Danish""","""Male""",1941,0,,


#### Open a CSV using a web address

In [5]:
url = "https://github.com/MuseumofModernArt/collection/raw/main/Artists.csv"
artists =  pl.read_csv(url)
artists.head()

ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
i64,str,str,str,str,i64,i64,str,i64
1,"""Robert Arneson""","""American, 1930–1992""","""American""","""male""",1930,1992,,
2,"""Doroteo Arnaiz""","""Spanish, born 1936""","""Spanish""","""male""",1936,0,,
3,"""Bill Arnold""","""American, born 1941""","""American""","""male""",1941,0,,
4,"""Charles Arnoldi""","""American, born 1946""","""American""","""male""",1946,0,"""Q1063584""",500027998.0
5,"""Per Arnoldi""","""Danish, born 1941""","""Danish""","""male""",1941,0,,


### Lazy column expressions

In [12]:
pl.col('BeginDate')

In [10]:
pl.col('BeginDate').cast(pl.Utf8).str.contains('19') # BeginDate is Int64, so cast to a string before using .str

In [11]:
pl.col('BeginDate') >= 1946

### Inspecting a dataframe

In [13]:
artists.dtypes

[Int64, String, String, String, String, Int64, Int64, String, Int64]

In [14]:
artists.shape

(15595, 9)

In [30]:
artists.describe()

statistic,ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
str,f64,str,str,str,str,f64,f64,str,f64
"""count""",15595.0,"""15595""","""13400""","""13106""","""12352""",15595.0,15595.0,"""3247""",2931.0
"""null_count""",0.0,"""0""","""2195""","""2489""","""3243""",0.0,0.0,"""12348""",12664.0
"""mean""",23854.509458,,,,,1490.80109,703.983584,,500070000.0
"""std""",29037.329213,,,,,810.710112,947.599848,,86603.459826
"""min""",1.0,"""""a.r."" group""","""1858–ca. 1910""","""Afghan""","""female""",0.0,0.0,"""Q1000203""",500000006.0
"""25%""",4371.0,,,,,1854.0,0.0,,500017574.0
"""50%""",9436.0,,,,,1923.0,0.0,,500033033.0
"""75%""",35521.0,,,,,1948.0,1967.0,,500114615.0
"""max""",138323.0,"""…XYZ Dot Dot Dot Ex Why Zed De…","""Łódź, Poland, est. 1929 – 1936""","""Zimbabwean""","""unknown. (non-binary or trans?…",2017.0,2024.0,"""Q993400""",500356571.0


In [31]:
artists.columns

['ConstituentID',
 'DisplayName',
 'ArtistBio',
 'Nationality',
 'Gender',
 'BeginDate',
 'EndDate',
 'Wiki QID',
 'ULAN']

In [32]:
artists.head()

ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
i64,str,str,str,str,i64,i64,str,i64
1,"""Robert Arneson""","""American, 1930–1992""","""American""","""male""",1930,1992,,
2,"""Doroteo Arnaiz""","""Spanish, born 1936""","""Spanish""","""male""",1936,0,,
3,"""Bill Arnold""","""American, born 1941""","""American""","""male""",1941,0,,
4,"""Charles Arnoldi""","""American, born 1946""","""American""","""male""",1946,0,"""Q1063584""",500027998.0
5,"""Per Arnoldi""","""Danish, born 1941""","""Danish""","""male""",1941,0,,


In [34]:
artists.tail()

ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
i64,str,str,str,str,i64,i64,str,i64
138318,"""Angelo González""",,,"""male""",0,0,,
138319,"""Roy Battiste""",,,"""male""",0,0,,
138320,"""(Moses) Anthony Figueroa""",,,,0,0,,
138321,"""Sal Becker""",,,"""male""",0,0,,
138323,"""MTA (Marina Tabassum Architect…","""Bangladesh, founded 2005""","""Bangladesh""",,2005,0,,


## Topic 2 - Select, Filter, and Mutate in `polars`

In [16]:
heroes = pl.read_csv('./sample_data/heroes_information.csv')
heroes.head()

Unnamed: 0_level_0,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""","""-""","""good""",441.0
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0
3,"""Abomination""","""Male""","""green""","""Human / Radiation""","""No Hair""",203.0,"""Marvel Comics""","""-""","""bad""",441.0
4,"""Abraxas""","""Male""","""blue""","""Cosmic Entity""","""Black""",-99.0,"""Marvel Comics""","""-""","""bad""",-99.0


### Selecting Columns with `select`

In [18]:
(heroes
 .select(['Eye color', 
          pl.col('name'),  
          heroes['Gender'],
         ])
 .head()
)

Eye color,name,Gender
str,str,str
"""yellow""","""A-Bomb""","""Male"""
"""blue""","""Abe Sapien""","""Male"""
"""blue""","""Abin Sur""","""Male"""
"""green""","""Abomination""","""Male"""
"""blue""","""Abraxas""","""Male"""


### How to filter in `polars`

In [19]:
(heroes 
 .filter(pl.col('Gender') == 'Male') # With a column expression
 .head()
)

Unnamed: 0_level_0,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""","""-""","""good""",441.0
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0
3,"""Abomination""","""Male""","""green""","""Human / Radiation""","""No Hair""",203.0,"""Marvel Comics""","""-""","""bad""",441.0
4,"""Abraxas""","""Male""","""blue""","""Cosmic Entity""","""Black""",-99.0,"""Marvel Comics""","""-""","""bad""",-99.0


### Dot-chaining methods in a "pipe"

In [12]:
(heroes
.select(['name', 'Gender', 'Weight'])
.filter(pl.col('Gender') == 'Male')
.filter(pl.col('Weight') > 0)
.head()
)

name,Gender,Weight
str,str,f64
"""A-Bomb""","""Male""",441.0
"""Abe Sapien""","""Male""",65.0
"""Abin Sur""","""Male""",90.0
"""Abomination""","""Male""",441.0
"""Absorbing Man""","""Male""",122.0


### Two ways to `MUTATE` in `polars`

1. Inside `select` like SQL
2. Using `with_columns`

In [20]:
(heroes 
 .select(['name', 
          'Gender', 
          'Weight',
          (pl.col('Weight')/2.2046).alias('Weight_kg'),
         ]) 
 .head()
)

name,Gender,Weight,Weight_kg
str,str,f64,f64
"""A-Bomb""","""Male""",441.0,200.036288
"""Abe Sapien""","""Male""",65.0,29.483807
"""Abin Sur""","""Male""",90.0,40.823732
"""Abomination""","""Male""",441.0,200.036288
"""Abraxas""","""Male""",-99.0,-44.906105


In [24]:
# Note: requires the `pl.Config.with_columns_kwargs = True` setting

(heroes 
 .select(['name', 
          'Gender', 
          'Weight'
         ]) 
 .with_columns(Weight_kg = pl.col('Weight')/2.2046,
               Weight_g =  pl.col('Weight')/2.2046*1000,
               ) 
 .head()
)

name,Gender,Weight,Weight_kg,Weight_g
str,str,f64,f64,f64
"""A-Bomb""","""Male""",441.0,200.036288,200036.287762
"""Abe Sapien""","""Male""",65.0,29.483807,29483.806586
"""Abin Sur""","""Male""",90.0,40.823732,40823.732196
"""Abomination""","""Male""",441.0,200.036288,200036.287762
"""Abraxas""","""Male""",-99.0,-44.906105,-44906.105416


### WARNING: Referencing a new column is tricky!

#### Cannot reference a new column in the same `with_columns`

In [27]:
pl.Config.with_columns_kwargs = True

(heroes 
 .select(['name', 
          'Gender', 
          'Weight'
         ]) 
 .with_columns(Weight_kg = pl.col('Weight')/2.2046,
               Weight_g =  pl.col('Weight_kg')*1000) # References new column ==> CRASH!
 .filter(pl.col('Weight_kg') < 100) 
 .head()
)

ColumnNotFoundError: Weight_kg

#### New column $\longrightarrow$ new `with_columns`

In [28]:


(heroes 
 .select(['name', 
          'Gender', 
          'Weight'
         ]) 
 .with_columns(Weight_kg = pl.col('Weight')/2.2046)
 .with_columns(Weight_g =  pl.col('Weight_kg')*1000) # Can reference in new call
 .filter(pl.col('Weight_kg') < 100) 
 .head()
)

name,Gender,Weight,Weight_kg,Weight_g
str,str,f64,f64,f64
"""Abe Sapien""","""Male""",65.0,29.483807,29483.806586
"""Abin Sur""","""Male""",90.0,40.823732,40823.732196
"""Abraxas""","""Male""",-99.0,-44.906105,-44906.105416
"""Absorbing Man""","""Male""",122.0,55.338837,55338.836977
"""Adam Monroe""","""Male""",-99.0,-44.906105,-44906.105416


## Topic 3 - Advanced Applications of Select

In [29]:
survey_raw = pl.read_csv("./sample_data/health_survey.csv")
survey_raw.columns

['',
 'F1',
 'F5',
 'F2',
 'F1.1',
 'F2.1',
 'F6',
 'F4',
 'F3',
 'F5.1',
 'F1.2',
 'F2.2',
 'F6.1',
 'F2.3',
 'F4.1',
 'F2.4',
 'F5.2',
 'F2.5',
 'F6.2',
 'F1.3',
 'F2.6',
 'F5.3',
 'F4.2',
 'F2.7',
 'F3.1',
 'F2.8',
 'F5.4',
 'F3.2',
 'F1.4',
 'F3.3',
 'F1.5',
 'F5.5',
 'F6.3',
 'F1.6',
 'F5.6',
 'F2.9',
 'F3.4',
 'F4.3',
 'F2.10',
 'F1.7',
 'F6.4',
 'F4.4',
 'F5.7',
 'F3.5',
 'F2.11']

### Renaming columns

In [35]:
survey = survey_raw.rename({'':'ID'})
survey.head()

ID,F1,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,F1.2,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F1.3,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F1.4,F3.3,F1.5,F5.5,F6.3,F1.6,F5.6,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""
3,"""Strongly Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Strongly Agree""","""Strongly Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Strongly Agree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Strongly Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Strongly Agree""","""Strongly Disagree""","""Somewhat Agree"""
4,"""Somewhat Agree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Agree""","""Strongly Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Strongly Agree""","""Strongly Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Strongly Agree""","""Somewhat Disagree""","""Strongly Agree""","""Somewhat Agree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree"""
5,"""Strongly Agree""","""Strongly Disagree""","""Neither Agree nor Disagree""","""Strongly Agree""","""Somewhat Agree""","""Strongly Disagree""","""Strongly Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Strongly Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Strongly Disagree""","""Strongly Agree""","""Strongly Agree""","""Strongly Agree""","""Strongly Agree""","""Strongly Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Strongly Disagree""","""Neither Agree nor Disagree""","""Strongly Agree""","""Strongly Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Strongly Agree""","""Somewhat Disagree""","""Somewhat Agree"""


### Dropping one or more columns with `drop`

#### Dropping a single column or a list of multiple columns

In [37]:
(survey
 .drop('F1')
 .head(2)
)

ID,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,F1.2,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F1.3,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F1.4,F3.3,F1.5,F5.5,F6.3,F1.6,F5.6,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


In [38]:
(survey
 .drop(['F1', 'F2'])
 .head(2)
)

ID,F5,F1.1,F2.1,F6,F4,F3,F5.1,F1.2,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F1.3,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F1.4,F3.3,F1.5,F5.5,F6.3,F1.6,F5.6,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


### `select` using the column type

In [47]:
(survey
 .select(pl.col(pl.Utf8))
 .head(2)
)

F1,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,F1.2,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F1.3,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F1.4,F3.3,F1.5,F5.5,F6.3,F1.6,F5.6,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


### `select` using regular expression

In [48]:
(survey
 .select(pl.col(r'^F1.*$')) # Needs to be a raw string, e.g. `r'...'`
 .head(2)
)

F1,F1.1,F1.2,F1.3,F1.4,F1.5,F1.6,F1.7
str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree"""


In [49]:
(survey
 .select(pl.col(r'^(F1|F2).*$'))
 .head(2)
)

F1,F2,F1.1,F2.1,F1.2,F2.2,F2.3,F2.4,F2.5,F1.3,F2.6,F2.7,F2.8,F1.4,F1.5,F1.6,F2.9,F2.10,F1.7,F2.11
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""


In [50]:
(survey
 .select(pl.col(r'^.*\..*$'))
 .head(2)
)

F1.1,F2.1,F5.1,F1.2,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F1.3,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F1.4,F3.3,F1.5,F5.5,F6.3,F1.6,F5.6,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


### `exclude` a column with `pl.all().exclude('reg_ex')`

In [51]:
(survey
 .select(pl.all().exclude(r'^F1.*$'))
 .head(2))

ID,F5,F2,F2.1,F6,F4,F3,F5.1,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F3.3,F5.5,F6.3,F5.6,F2.9,F3.4,F4.3,F2.10,F6.4,F4.4,F5.7,F3.5,F2.11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


#### Excluding columns by type

In [52]:
(survey
 .select(pl.all().exclude(pl.Int64))
 .head(2))

F1,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,F1.2,F2.2,F6.1,F2.3,F4.1,F2.4,F5.2,F2.5,F6.2,F1.3,F2.6,F5.3,F4.2,F2.7,F3.1,F2.8,F5.4,F3.2,F1.4,F3.3,F1.5,F5.5,F6.3,F1.6,F5.6,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


### `select` helpers from `more_polars`

In [None]:
# Run as needed
%pip install more_polars

In [55]:
from more_polars import cols

(survey
 .select(survey.columns >> cols.startswith('F1'))
 .head(2)
)

F1,F1.1,F1.2,F1.3,F1.4,F1.5,F1.6,F1.7
str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree"""


In [48]:
(survey
 .select(survey.columns >> cols.between('F1', 'F1.1'))
 .head(2))

F1,F5,F2,F1.1
str,str,str,str
"""Somewhat Agree...","""Somewhat Disag...","""Somewhat Agree...","""Somewhat Agree..."
"""Somewhat Agree...","""Somewhat Disag...","""Somewhat Agree...","""Somewhat Agree..."


In [50]:
(survey
 .select(survey.columns >> cols.from_('F5.7'))
 .head(2))

F5.7,F3.5,F2.11
str,str,str
"""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree..."
"""Somewhat Agree...","""Neither Agree ...","""Somewhat Agree..."


In [51]:
(survey
 .select(survey.columns >> cols.to('F1.1'))
 .head(2))

ID,F1,F5,F2,F1.1
i64,str,str,str,str
1,"""Somewhat Agree...","""Somewhat Disag...","""Somewhat Agree...","""Somewhat Agree..."
2,"""Somewhat Agree...","""Somewhat Disag...","""Somewhat Agree...","""Somewhat Agree..."


In [60]:
(survey
 .select( (survey.columns >> cols.startswith('F1'))
          + (survey.columns >> cols.startswith('F2'))
         )
 .head(2))

F1,F1.1,F1.2,F1.3,F1.4,F1.5,F1.6,F1.7,F2,F2.1,F2.2,F2.3,F2.4,F2.5,F2.6,F2.7,F2.8,F2.9,F2.10,F2.11
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Neither Agree ...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree..."
"""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Neither Agree ...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Neither Agree ...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree...","""Somewhat Agree..."


#### Beware of duplicate columns!

In [58]:
(survey
 .select( (survey.columns >> cols.startswith('F1'))
          + (survey.columns >> cols.contains('.'))
         )
 .head(2))

DuplicateError: the name 'F1.1' is duplicate

It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.

## Topic 4 - Advanced Filters

In [59]:
heroes = (pl.read_csv('./sample_data/heroes_information.csv', null_values=['-', '-99.0', ''])
          .rename({'':'ID'})
         )
heroes.head()

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""",,"""good""",441.0
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0
3,"""Abomination""","""Male""","""green""","""Human / Radiation""","""No Hair""",203.0,"""Marvel Comics""",,"""bad""",441.0
4,"""Abraxas""","""Male""","""blue""","""Cosmic Entity""","""Black""",,"""Marvel Comics""",,"""bad""",


#### Check equality using `==`

In [60]:
(heroes
 .filter(pl.col('Eye color') == 'blue')
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0


#### Check not equal using `!=`

In [61]:
(heroes
 .filter(pl.col('Eye color') != 'blue')
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""",,"""good""",441.0
3,"""Abomination""","""Male""","""green""","""Human / Radiation""","""No Hair""",203.0,"""Marvel Comics""",,"""bad""",441.0


#### Other inequalities

In [62]:
(heroes
 .filter(pl.col('Height') > 200)
 .filter(pl.col('Weight') <= 440)
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
17,"""Alien""","""Male""",,"""Xenomorph XX121""","""No Hair""",244.0,"""Dark Horse Comics""","""black""","""bad""",169.0
19,"""Amazo""","""Male""","""red""","""Android""",,257.0,"""DC Comics""",,"""bad""",173.0


#### `LIKE 'Super%'`

In [63]:
(heroes
 .filter(pl.col('name').str.starts_with('Super'))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
643,"""Superboy""","""Male""","""blue""",,"""Black""",170.0,"""DC Comics""",,"""good""",68.0
644,"""Superboy-Prime""","""Male""","""blue""","""Kryptonian""","""Black / Blue""",180.0,"""DC Comics""",,"""bad""",77.0


#### `LIKE '%boy'`

In [64]:
(heroes
 .filter(pl.col('name').str.ends_with('boy'))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
142,"""Bumbleboy""","""Male""",,,,,"""Marvel Comics""",,"""good""",
321,"""Hellboy""","""Male""","""gold""","""Demon""","""Black""",259.0,"""Dark Horse Comics""",,"""good""",158.0


#### `LIKE '%boy%'`

In [65]:
(heroes
 .filter(pl.col('name').str.contains('boy'))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
142,"""Bumbleboy""","""Male""",,,,,"""Marvel Comics""",,"""good""",
321,"""Hellboy""","""Male""","""gold""","""Demon""","""Black""",259.0,"""Dark Horse Comics""",,"""good""",158.0


#### `ILIKE` using `str.to_lowercase()`

In [66]:
(heroes
 .filter(pl.col('name').str.to_lowercase().str.contains('boy'))
 .head(3))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
46,"""Astro Boy""","""Male""","""brown""",,"""Black""",,,,"""good""",
75,"""Beast Boy""","""Male""","""green""","""Human""","""Green""",173.0,"""DC Comics""","""green""","""good""",68.0
142,"""Bumbleboy""","""Male""",,,,,"""Marvel Comics""",,"""good""",


#### Using RegEx with `str.contains`

In [67]:
(heroes
 .filter(pl.col('Publisher').str.contains('DC|Marvel'))
 .filter(pl.col('name').str.contains(r'\s[Bb]oy|\wboy'))
 .head()
)


ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
75,"""Beast Boy""","""Male""","""green""","""Human""","""Green""",173.0,"""DC Comics""","""green""","""good""",68.0
142,"""Bumbleboy""","""Male""",,,,,"""Marvel Comics""",,"""good""",
183,"""Colossal Boy""","""Male""",,,,,"""DC Comics""",,"""good""",
643,"""Superboy""","""Male""","""blue""",,"""Black""",170.0,"""DC Comics""",,"""good""",68.0
644,"""Superboy-Prime""","""Male""","""blue""","""Kryptonian""","""Black / Blue""",180.0,"""DC Comics""",,"""bad""",77.0


#### `IN` 

In [69]:
(heroes
 .filter(pl.col('Publisher').is_in(['DC Comics', 'Marvel Comics']))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""",,"""good""",441.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0


#### `NOT IN` 

In [70]:
(heroes
 .filter(~pl.col('Publisher').is_in(['DC Comics', 'Marvel Comics']))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
6,"""Adam Monroe""","""Male""","""blue""",,"""Blond""",,"""NBC - Heroes""",,"""good""",


#### `IS NULL`

In [71]:
(heroes
 .filter(pl.col('Skin color').is_null())
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""",,"""good""",441.0
3,"""Abomination""","""Male""","""green""","""Human / Radiation""","""No Hair""",203.0,"""Marvel Comics""",,"""bad""",441.0


#### `IS NOT NULL` using `is_not_null`

In [72]:
(heroes
 .filter(pl.col('Skin color').is_not_null())
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0


#### `AND` using `&`

In [73]:
(heroes
 .filter((pl.col('Hair color') == 'No Hair') & (pl.col('Eye color') == 'blue'))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0
2,"""Abin Sur""","""Male""","""blue""","""Ungaran""","""No Hair""",185.0,"""DC Comics""","""red""","""good""",90.0


#### `OR` using `|`

In [74]:
(heroes
 .filter((pl.col('Hair color') == 'No Hair') | (pl.col('Eye color') == 'blue'))
 .head(2))

ID,name,Gender,Eye color,Race,Hair color,Height,Publisher,Skin color,Alignment,Weight
i64,str,str,str,str,str,f64,str,str,str,f64
0,"""A-Bomb""","""Male""","""yellow""","""Human""","""No Hair""",203.0,"""Marvel Comics""",,"""good""",441.0
1,"""Abe Sapien""","""Male""","""blue""","""Icthyo Sapien""","""No Hair""",191.0,"""Dark Horse Comics""","""blue""","""good""",65.0


#### WARNING: Don't use Python's `and` and `or` (an easy mistake to make); use `&` and `|` instead

In [75]:
(heroes
 .filter((pl.col('Hair color') == 'No Hair') and (pl.col('Eye color') == 'blue'))
 .head(2))

TypeError: the truth value of an Expr is ambiguous

You probably got here by using a Python standard library function instead of the native expressions API.
Here are some things you might want to try:
- instead of `pl.col('a') and pl.col('b')`, use `pl.col('a') & pl.col('b')`
- instead of `pl.col('a') in [y, z]`, use `pl.col('a').is_in([y, z])`
- instead of `max(pl.col('a'), pl.col('b'))`, use `pl.max_horizontal(pl.col('a'), pl.col('b'))`


## Topic 5 - Conditional Expressions

### Conditional expressions in `polars`

To perform a `CASE WHEN` in `polars`, build a single dot-chain:
* Start with `pl.when(...).then(...)`
* Add any number of additional `.when(...).then(...)` to the dot-chain
* Add a `.otherwise(...)` to catch all remaining cases.

In [77]:
df = pl.DataFrame({'cat':['a','b','b','c','c'],
                   'val':[ 1,  1,  2,  1, 2]})
df

cat,val
str,i64
"""a""",1
"""b""",1
"""b""",2
"""c""",1
"""c""",2


In [80]:
(df
 .with_columns(new = pl.when(pl.col('cat') == 'a')
                       .then(pl.col('val') + 1)
                       .when(pl.col('cat') == 'b')
                       .then(pl.col('val') + 10)
                       .otherwise(pl.col('val'))
              )
)

cat,val,new
str,i64,i64
"""a""",1,2
"""b""",1,11
"""b""",2,12
"""c""",1,1
"""c""",2,2


### Including `polars` literal values

Note that
* `polars` is actually implemented in Rust.
* Literal/constant values are wrapped in `pl.lit`, which turns a Python value into a `polars` expression.

In [81]:
0 # Python integer

0

In [82]:
pl.lit(0) # Gets converted to Rust/Apache Arrow

In [83]:
pl.lit(0, pl.Int8) # Cast to a specific int type

#### `when`/`then`/`otherwise` with a literal fallback value

In [84]:
(df
 .with_columns(new = pl.when(pl.col('cat') == 'a')
                       .then(pl.col('val') + 1)
                       .when(pl.col('cat') == 'b')
                       .then(pl.col('val') + 10)
                       .otherwise(pl.lit(0))
              )
)

cat,val,new
str,i64,i64
"""a""",1,2
"""b""",1,11
"""b""",2,12
"""c""",1,0
"""c""",2,0
