## Scaling Up Select and Drop with Comprehensions

In [1]:
import polars as pl
import re

## Outline

1. Using a `list` comprehension to select and drop columns,
2. Using regular expressions to create a complex filter,
3. Writing helper functions for selecting columns by position.

## Example - Health Survey

In [4]:
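# Normalize a column name: lowercase, with dots replaced by underscores
fix_name = lambda n: n.lower().replace('.', '_')

survey_raw = pl.read_csv("./data/health_survey.csv")

survey = (survey_raw
          # The first column has an empty header; name it 'ID'
          .rename({'': 'ID'})
          # Skip '' here: it was renamed to 'ID' above, so it no longer exists
          .rename({n: fix_name(n) for n in survey_raw.columns if n != ''})
         )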
survey.head(2)

ID,f1,f5,f2,f1_1,f2_1,f6,f4,f3,f5_1,f1_2,f2_2,f6_1,f2_3,f4_1,f2_4,f5_2,f2_5,f6_2,f1_3,f2_6,f5_3,f4_2,f2_7,f3_1,f2_8,f5_4,f3_2,f1_4,f3_3,f1_5,f5_5,f6_3,f1_6,f5_6,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""
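The effect of `fix_name` is easy to check on a few strings. The raw header names in this sketch are hypothetical, since the original CSV headers are not shown here; only the lowercase-and-replace behavior is taken from the cell above.

```python
# Same helper as in the cell above.
fix_name = lambda n: n.lower().replace('.', '_')

# Hypothetical raw headers; the real CSV headers may differ.
raw_names = ['F1', 'F5.1', 'F2.10']

# Build the old-name -> new-name mapping that `rename` expects.
cleaned = {n: fix_name(n) for n in raw_names}
print(cleaned)  # {'F1': 'f1', 'F5.1': 'f5_1', 'F2.10': 'f2_10'}
```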


## Dropping one or more columns with `drop`

### Selecting or Dropping a list of columns

#### Example - Starts with `f1`

In [6]:
(survey
 .select([c for c in survey.columns if c.startswith('f1')])
 .head(2))

f1,f1_1,f1_2,f1_3,f1_4,f1_5,f1_6,f1_7
str,str,str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree"""


In [8]:
(survey
 .drop([c for c in survey.columns if c.startswith('f1')])
 .head(2))

ID,f5,f2,f2_1,f6,f4,f3,f5_1,f2_2,f6_1,f2_3,f4_1,f2_4,f5_2,f2_5,f6_2,f2_6,f5_3,f4_2,f2_7,f3_1,f2_8,f5_4,f3_2,f3_3,f5_5,f6_3,f5_6,f2_9,f3_4,f4_3,f2_10,f6_4,f4_4,f5_7,f3_5,f2_11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""
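`str.startswith` also accepts a tuple of prefixes, so a single comprehension can cover several question types at once. The following is a minimal sketch on a hand-copied subset of the column names, so it runs without the survey file.

```python
# A subset of the survey's column names, copied by hand for illustration.
columns = ['ID', 'f1', 'f5', 'f2', 'f1_1', 'f2_1', 'f6', 'f4', 'f3']

# A tuple of prefixes matches names that start with any one of them.
f1_or_f2 = [c for c in columns if c.startswith(('f1', 'f2'))]
rest = [c for c in columns if not c.startswith(('f1', 'f2'))]

print(f1_or_f2)  # ['f1', 'f2', 'f1_1', 'f2_1']
print(rest)      # ['ID', 'f5', 'f6', 'f4', 'f3']
```

Passing a list like `f1_or_f2` to `select` keeps those columns, while passing the same list to `drop` removes them.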


#### Example - Ends with `_1`

In [10]:
(survey
 .select([c for c in survey.columns if c.endswith('_1')])
 .head(2))

f1_1,f2_1,f5_1,f6_1,f4_1,f3_1
str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree"""
"""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree"""


In [11]:
(survey
 .drop([c for c in survey.columns if c.endswith('_1')])
 .head(2))

ID,f1,f5,f2,f6,f4,f3,f1_2,f2_2,f2_3,f2_4,f5_2,f2_5,f6_2,f1_3,f2_6,f5_3,f4_2,f2_7,f2_8,f5_4,f3_2,f1_4,f3_3,f1_5,f5_5,f6_3,f1_6,f5_6,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
i64,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
1,"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
2,"""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


#### Regex for more complex logic

**Example.** Keep questions of type `f1` or `f2`, but only those numbered `1`, `2`, or `3`.
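A pattern like this can be sanity-checked on a plain list of names before it touches the data frame. The sketch below uses a hand-copied subset of the survey's column names and compares the anchored pattern `^f[12]_[123]$` with a version that omits the trailing `$`; the variable names are illustrative only.

```python
import re

# A subset of the survey's column names, copied by hand for illustration.
names = ['f1', 'f2', 'f1_1', 'f2_1', 'f2_3', 'f2_10', 'f2_11', 'f3_1']

anchored = re.compile(r'^f[12]_[123]$')   # must match the whole name
unanchored = re.compile(r'^f[12]_[123]')  # same pattern without the trailing `$`

kept_anchored = [n for n in names if anchored.match(n)]
kept_unanchored = [n for n in names if unanchored.match(n)]

print(kept_anchored)    # ['f1_1', 'f2_1', 'f2_3']
print(kept_unanchored)  # ['f1_1', 'f2_1', 'f2_3', 'f2_10', 'f2_11']
```

Without the `$`, names such as `f2_10` slip through because `re.match` only requires a match at the start of the string.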

In [13]:
col_to_keep = re.compile(r'^f[12]_[123]$')

(survey
 .select([c for c in survey.columns if col_to_keep.match(c)])
 .head(2))

f1_1,f2_1,f1_2,f2_2,f2_3,f1_3
str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree"""


#### But `polars` already supports regex selection (the comprehension approach will be useful with `pyspark`)

In [14]:
(survey
 .select(pl.col(r'^f[12]_[123]$'))
 .head(2))

f1_1,f2_1,f1_2,f2_2,f2_3,f1_3
str,str,str,str,str,str
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree"""


## `select` by position

To `select` columns by their position relative to a named column, we need to

1. Use `index` to find the position of the named column,
2. Use `enumerate` and an `if` clause in a comprehension to keep only the columns at the desired positions.

#### Example - Keep all columns starting from `f6`.

#### Step 1 - Play around to figure out the inner logic

In [15]:
# Finding the correct method to determine the index of a given column
[m for m in dir(survey.columns) if not m.startswith('__')]

['append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [16]:
help(survey.columns.index)

Help on built-in function index:

index(value, start=0, stop=9223372036854775807, /) method of builtins.list instance
    Return first index of value.

    Raises ValueError if the value is not present.



In [17]:
survey.columns.index('f6')

6
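As the help text notes, `index` raises `ValueError` when the value is not in the list, so a misspelled column name fails loudly rather than returning a wrong position. A minimal sketch on a hand-copied prefix of the column list (the name `f9` is a made-up example of a missing column):

```python
# The first seven survey column names, copied by hand for illustration.
columns = ['ID', 'f1', 'f5', 'f2', 'f1_1', 'f2_1', 'f6']

position = columns.index('f6')
print(position)  # 6

# 'f9' is a made-up name that is not in the list.
try:
    columns.index('f9')
    missing = False
except ValueError:
    missing = True

print(missing)  # True
```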

In [19]:
# Filtering by index using `enumerate` and `index` method
[c for i, c in enumerate(survey.columns) if i >= survey.columns.index('f6')]

['f6',
 'f4',
 'f3',
 'f5_1',
 'f1_2',
 'f2_2',
 'f6_1',
 'f2_3',
 'f4_1',
 'f2_4',
 'f5_2',
 'f2_5',
 'f6_2',
 'f1_3',
 'f2_6',
 'f5_3',
 'f4_2',
 'f2_7',
 'f3_1',
 'f2_8',
 'f5_4',
 'f3_2',
 'f1_4',
 'f3_3',
 'f1_5',
 'f5_5',
 'f6_3',
 'f1_6',
 'f5_6',
 'f2_9',
 'f3_4',
 'f4_3',
 'f2_10',
 'f1_7',
 'f6_4',
 'f4_4',
 'f5_7',
 'f3_5',
 'f2_11']

In [23]:
# Abstract the pattern with a lambda
from_ = lambda df, col: [c for i, c in enumerate(df.columns) if i >= df.columns.index(col)]

from_(survey, 'f6')

['f6',
 'f4',
 'f3',
 'f5_1',
 'f1_2',
 'f2_2',
 'f6_1',
 'f2_3',
 'f4_1',
 'f2_4',
 'f5_2',
 'f2_5',
 'f6_2',
 'f1_3',
 'f2_6',
 'f5_3',
 'f4_2',
 'f2_7',
 'f3_1',
 'f2_8',
 'f5_4',
 'f3_2',
 'f1_4',
 'f3_3',
 'f1_5',
 'f5_5',
 'f6_3',
 'f1_6',
 'f5_6',
 'f2_9',
 'f3_4',
 'f4_3',
 'f2_10',
 'f1_7',
 'f6_4',
 'f4_4',
 'f5_7',
 'f3_5',
 'f2_11']
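Because `df.columns` is a plain Python `list`, slicing from the index gives the same result and looks the index up only once. This is an alternative sketch, not the `enumerate` approach the exercise below asks for. The name `from_slice` is hypothetical, and the `SimpleNamespace` object only mimics a data frame's `columns` attribute so the block runs without the survey file.

```python
from types import SimpleNamespace

# Stand-in for a data frame: only the `columns` attribute is needed here.
df = SimpleNamespace(
    columns=['ID', 'f1', 'f5', 'f2', 'f1_1', 'f2_1', 'f6', 'f4', 'f3']
)

# Comprehension version from above.
from_ = lambda df, col: [c for i, c in enumerate(df.columns) if i >= df.columns.index(col)]

# Slicing version (hypothetical name): one `index` lookup, then a slice.
from_slice = lambda df, col: df.columns[df.columns.index(col):]

print(from_(df, 'f6'))       # ['f6', 'f4', 'f3']
print(from_slice(df, 'f6'))  # ['f6', 'f4', 'f3']
```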

In [24]:
(survey
 .select(from_(survey, 'f6'))
 .head(2)
)

f6,f4,f3,f5_1,f1_2,f2_2,f6_1,f2_3,f4_1,f2_4,f5_2,f2_5,f6_2,f1_3,f2_6,f5_3,f4_2,f2_7,f3_1,f2_8,f5_4,f3_2,f1_4,f3_3,f1_5,f5_5,f6_3,f1_6,f5_6,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree"""
"""Somewhat Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Neither Agree nor Disagree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Somewhat Agree""","""Somewhat Disagree""","""Neither Agree nor Disagree""","""Somewhat Agree""","""Neither Agree nor Disagree""","""Somewhat Agree"""


## <font color="red"> Exercise 3.4 </font>

Write each of the following functions.

1. A `lambda` function named `to_` that takes a data frame and column name as inputs and returns a list of all columns up to and including that column. Do this directly by using `enumerate` and keeping the corresponding indexes.
2. A `lambda` function named `between` that takes a data frame and two column names as inputs and returns a list of all columns between the first and second column names (inclusive). Do this directly by using `enumerate` and keeping the corresponding indexes.

In [56]:
# Your code here