## Advanced Applications of Select

In [2]:
import pandas as pd
from dfply import *

## Outline

* `select` helper functions
* Basic `filter` format

## Example - Health Survey

In [3]:
survey_raw = pd.read_csv("./data/health_survey.csv")
survey_raw.columns

Index(['Unnamed: 0', 'F1', 'F5', 'F2', 'F1.1', 'F2.1', 'F6', 'F4', 'F3',
       'F5.1', 'F1.2', 'F2.2', 'F6.1', 'F2.3', 'F4.1', 'F2.4', 'F5.2', 'F2.5',
       'F6.2', 'F1.3', 'F2.6', 'F5.3', 'F4.2', 'F2.7', 'F3.1', 'F2.8', 'F5.4',
       'F3.2', 'F1.4', 'F3.3', 'F1.5', 'F5.5', 'F6.3', 'F1.6', 'F5.6', 'F2.9',
       'F3.4', 'F4.3', 'F2.10', 'F1.7', 'F6.4', 'F4.4', 'F5.7', 'F3.5',
       'F2.11'],
      dtype='object')
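pandas gives the name `Unnamed: 0` to a column whose header field is blank. This is typical when a DataFrame was written to CSV together with its unnamed row index. Whether `health_survey.csv` was produced that way is an assumption. The sketch below uses a tiny made-up CSV string to show the effect. It also shows the `index_col=0` option, which reads that column as the row index instead.

```python
import io

import pandas as pd

# Made-up CSV whose first header field is empty, as happens when a
# DataFrame is saved together with its (unnamed) row index.
csv_text = ",F1,F5\n1,Somewhat Agree,Somewhat Disagree\n2,Strongly Agree,Somewhat Agree\n"

# Read as-is: pandas names the blank header field 'Unnamed: 0'.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # ['Unnamed: 0', 'F1', 'F5']

# Alternative: use the first column as the row index rather than data.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['F1', 'F5']
```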

## Renaming columns

Use `rename` to rename one or more columns.

1. Pass keyword arguments of the form `new_name = old_name`.
2. The right-hand side can be a string (e.g. `'Unnamed: 0'`) or an `X` intention (e.g. `X.F1`).

In [4]:
(survey_raw
 >> rename(id_col = 'Unnamed: 0', f1 = X.F1) 
 >> head
)

,id_col,f1,F5,F2,F1.1,F2.1,F6,F4,F3,F5.1,...,F2.9,F3.4,F4.3,F2.10,F1.7,F6.4,F4.4,F5.7,F3.5,F2.11
0,1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree
2,3,Strongly Agree,Neither Agree nor Disagree,Somewhat Agree,Strongly Agree,Strongly Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Strongly Disagree,Somewhat Agree
3,4,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Agree,Strongly Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Disagree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Disagree,Somewhat Agree
4,5,Strongly Agree,Strongly Disagree,Neither Agree nor Disagree,Strongly Agree,Somewhat Agree,Strongly Disagree,Strongly Agree,Somewhat Agree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Disagree,Somewhat Agree
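For comparison, here is a plain-pandas sketch of the same renaming on a small made-up frame. `DataFrame.rename` takes a `columns` mapping in the opposite direction to dfply's keywords, `{old_name: new_name}`. Any column not in the mapping keeps its name.

```python
import pandas as pd

# Made-up stand-in for a few columns of survey_raw.
df = pd.DataFrame({
    'Unnamed: 0': [1, 2],
    'F1': ['Somewhat Agree', 'Strongly Agree'],
    'F5': ['Somewhat Disagree', 'Somewhat Agree'],
})

# pandas maps old -> new and returns a new DataFrame; df is unchanged.
renamed = df.rename(columns={'Unnamed: 0': 'id_col', 'F1': 'f1'})
print(list(renamed.columns))  # ['id_col', 'f1', 'F5']
```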


## Renaming many columns with a `dict`

In [6]:
from composable.strict import map
from composable.sequence import head as head_, to_list

fix_name = lambda n: n.lower().replace('.', '_')

(survey_raw.columns[:10]
 >> map(fix_name)
)


['unnamed: 0', 'f1', 'f5', 'f2', 'f1_1', 'f2_1', 'f6', 'f4', 'f3', 'f5_1']

In [7]:
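survey = (survey_raw 
          # give the ID column and F1 explicit names
          >> rename(id_col = 'Unnamed: 0', f1 = X.F1) 
          # rename every column via fix_name (new_name: old_name); entries for the
          # two columns renamed above no longer match and leave them untouched
          >> rename(**{fix_name(n):n for n in survey_raw.columns}) 
         )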
survey.head(2)

,id_col,f1,f5,f2,f1_1,f2_1,f6,f4,f3,f5_1,...,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
0,1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree
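Two details make this work. First, Python accepts any string key when a dict is unpacked into `**kwargs`, including keys such as `'unnamed: 0'` that are not valid identifiers. The hypothetical `show_kwargs` function below demonstrates this. Second, as a plain-pandas alternative, `DataFrame.rename` also accepts a function such as `fix_name` for `columns=` and applies it to every label. The frame is made up.

```python
import pandas as pd

def show_kwargs(**kwargs):
    """Hypothetical helper: return the keyword arguments it received."""
    return kwargs

# Keys need not be valid identifiers when passed via ** unpacking.
received = show_kwargs(**{'unnamed: 0': 'Unnamed: 0', 'f1_1': 'F1.1'})
print(received)  # {'unnamed: 0': 'Unnamed: 0', 'f1_1': 'F1.1'}

fix_name = lambda n: n.lower().replace('.', '_')

# Made-up frame with survey-style column labels.
df = pd.DataFrame(columns=['Unnamed: 0', 'F1', 'F1.1', 'F2.10'])

# pandas applies a callable passed as columns= to each label.
fixed = df.rename(columns=fix_name)
print(list(fixed.columns))  # ['unnamed: 0', 'f1', 'f1_1', 'f2_10']
```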


## Dropping columns with `~`

In [8]:
(survey 
 >> select(~X.f1) 
 >> head(2)
)

,id_col,f5,f2,f1_1,f2_1,f6,f4,f3,f5_1,f1_2,...,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
0,1,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree
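Conceptually, `~` selects the complement: every column except the ones named. A rough plain-pandas analogue (an illustration, not how dfply implements it) builds a boolean mask over the column labels with `Index.isin` and inverts it with `~`. The frame below is made up.

```python
import pandas as pd

# Made-up frame with a few survey-style columns.
df = pd.DataFrame({'id_col': [1, 2], 'f1': ['a', 'b'], 'f5': ['c', 'd'], 'f2': ['e', 'f']})

# isin() gives a boolean array over the labels; ~ flips it to "everything else".
keep = ~df.columns.isin(['f1'])
print(list(df.loc[:, keep].columns))  # ['id_col', 'f5', 'f2']

# The same mask idea handles several columns at once.
keep_many = ~df.columns.isin(['f1', 'f2'])
print(list(df.loc[:, keep_many].columns))  # ['id_col', 'f5']
```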


## Dropping multiple columns

In [9]:
(survey 
 >> select(~X.f1, ~X.f2) 
 >> head(2)
)

,id_col,f5,f1_1,f2_1,f6,f4,f3,f5_1,f1_2,f2_2,...,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
0,1,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree


## Dropping columns with `drop`

dfply's `drop` removes the listed columns and keeps the rest, so `drop(X.f1, X.f2)` gives the same result as `select(~X.f1, ~X.f2)`.

In [10]:
(survey 
 >> drop(X.f1, X.f2) 
 >> head(2))

,id_col,f5,f1_1,f2_1,f6,f4,f3,f5_1,f1_2,f2_2,...,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
0,1,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree
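The closest plain-pandas call is `DataFrame.drop(columns=...)`. Note that pandas' `drop` raises a `KeyError` for a label that does not exist unless `errors='ignore'` is passed. A sketch on a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({'id_col': [1, 2], 'f1': ['a', 'b'], 'f5': ['c', 'd'], 'f2': ['e', 'f']})

# Drop by label; returns a new DataFrame and leaves df unchanged.
dropped = df.drop(columns=['f1', 'f2'])
print(list(dropped.columns))  # ['id_col', 'f5']

# A missing label raises KeyError by default...
try:
    df.drop(columns=['f9'])
except KeyError:
    print('f9 not found')

# ...unless errors='ignore' is passed.
tolerant = df.drop(columns=['f9'], errors='ignore')
print(list(tolerant.columns))  # ['id_col', 'f1', 'f5', 'f2']
```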


## `select` helpers based on label content

Select columns based on their names (labels) using:

* `starts_with(prefix)`: find columns that start with a string prefix.
* `ends_with(suffix)`: find columns that end with a string suffix.
* `contains(substr)`: find columns that contain a substring in their name.

In [11]:
(survey 
 >> select(starts_with('f1')) 
 >> head(2))

,f1,f1_1,f1_2,f1_3,f1_4,f1_5,f1_6,f1_7
0,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree
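Similar name-based selections can be written in plain pandas with the `.str` accessor on the column index. It yields a boolean mask for `.loc`. The sketch below is an analogue on a made-up frame, not dfply's implementation. Its last line negates the mask, mirroring `~starts_with('f1')`.

```python
import pandas as pd

df = pd.DataFrame(columns=['id_col', 'f1', 'f5', 'f1_1', 'f2_10', 'f3'])

# Analogues of starts_with / ends_with / contains on the column labels.
starts = list(df.loc[:, df.columns.str.startswith('f1')].columns)
ends = list(df.loc[:, df.columns.str.endswith('_10')].columns)
has = list(df.loc[:, df.columns.str.contains('_', regex=False)].columns)
print(starts)  # ['f1', 'f1_1']
print(ends)    # ['f2_10']
print(has)     # ['id_col', 'f1_1', 'f2_10']

# Negating the mask: every column that does not start with 'f1'.
rest = list(df.loc[:, ~df.columns.str.startswith('f1')].columns)
print(rest)    # ['id_col', 'f5', 'f2_10', 'f3']
```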


## `~` composes with `select` helpers

In [12]:
(survey 
 >> select(~starts_with('f1')) 
 >> head(2))

,id_col,f5,f2,f2_1,f6,f4,f3,f5_1,f2_2,f6_1,...,f5_6,f2_9,f3_4,f4_3,f2_10,f6_4,f4_4,f5_7,f3_5,f2_11
0,1,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,...,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree


## More detail on `select` helpers

`select` helpers are

* Lazy: calling a helper returns an `Intention` instance, not a result.
* Resolved later: evaluated against a DataFrame, they produce a list of column-name strings.

In [13]:
starts_with('f1')

<dfply.base.Intention at 0x7f75c6c843a0>

In [14]:
starts_with('f1').evaluate(survey)

['f1', 'f1_1', 'f1_2', 'f1_3', 'f1_4', 'f1_5', 'f1_6', 'f1_7']
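The laziness can be pictured with a small hypothetical class. It stores *what* to select and only computes column names when `evaluate` is given a DataFrame. This is a simplified sketch for intuition, not dfply's `Intention`, which is more general. `LazyStartsWith` and `LazyNot` are made-up names.

```python
import pandas as pd


class LazyStartsWith:
    """Hypothetical lazy selector: remembers a prefix, resolves it later."""

    def __init__(self, prefix):
        self.prefix = prefix  # nothing is computed yet

    def evaluate(self, df):
        # Resolution happens only here, against a concrete DataFrame.
        return [c for c in df.columns if c.startswith(self.prefix)]

    def __invert__(self):
        # ~selector builds another lazy object instead of a result.
        return LazyNot(self)


class LazyNot:
    """Hypothetical complement of another lazy selector."""

    def __init__(self, inner):
        self.inner = inner

    def evaluate(self, df):
        chosen = set(self.inner.evaluate(df))
        return [c for c in df.columns if c not in chosen]


df = pd.DataFrame(columns=['id_col', 'f1', 'f5', 'f1_1'])

sel = LazyStartsWith('f1')  # just an object, no column names yet
print(sel.evaluate(df))     # ['f1', 'f1_1']
print((~sel).evaluate(df))  # ['id_col', 'f5']
```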

## `select` helpers based on position

* `everything()`: all columns.
* `columns_between(start_col, end_col, inclusive=True)`: find columns between a specified start and end column. The inclusive boolean keyword argument indicates whether the end column should be included or not.
* `columns_to(end_col, inclusive=True)`: get columns up to a specified end column. The inclusive argument indicates whether the ending column should be included or not.
* `columns_from(start_col)`: get the columns starting at a specified column.

In [15]:
(survey 
 >> select(columns_between('f1', 'f1_1')) 
 >> head(2))

,f1,f5,f2,f1_1
0,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree
1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree
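pandas label slicing with `.loc` includes both endpoints. On a frame with unique column names, it therefore behaves like these helpers with their default `inclusive=True`. Here is a sketch on a made-up frame whose column order matches the start of `survey`:

```python
import pandas as pd

df = pd.DataFrame(columns=['id_col', 'f1', 'f5', 'f2', 'f1_1', 'f2_1'])

# Like columns_between('f1', 'f1_1'): both endpoints are kept.
between = list(df.loc[:, 'f1':'f1_1'].columns)
print(between)  # ['f1', 'f5', 'f2', 'f1_1']

# Like columns_to('f2') with inclusive=True.
upto = list(df.loc[:, :'f2'].columns)
print(upto)     # ['id_col', 'f1', 'f5', 'f2']

# Like columns_from('f1_1').
onward = list(df.loc[:, 'f1_1':].columns)
print(onward)   # ['f1_1', 'f2_1']
```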


## These helpers work with the `X` intention

In [16]:
(survey 
 >> select(columns_between(X.f1, X['f1_1'])) 
 >> head(2))

,f1,f5,f2,f1_1
0,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree
1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree


In [17]:
(survey 
 >> select(~columns_between(X.f1, X['f1_1'])) 
 >> head(2))

,id_col,f2_1,f6,f4,f3,f5_1,f1_2,f2_2,f6_1,f2_3,...,f2_9,f3_4,f4_3,f2_10,f1_7,f6_4,f4_4,f5_7,f3_5,f2_11
0,1,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
1,2,Somewhat Agree,Somewhat Disagree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,...,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree


## <font color="red"> Exercise 1 </font> 

Use the `select` helper functions to create a table that contains all `F2` and `F3` questions (the `f2*` and `f3*` columns of `survey`).  


**Hint:** `select` helpers return intentions that turn into lists.  Think about how you can combine two lists.

In [35]:
# Your code here
(survey 
 >> select(starts_with('f2'), starts_with('f3')) 
).sample(5)

,f2,f2_1,f2_2,f2_3,f2_4,f2_5,f2_6,f2_7,f2_8,f2_9,f2_10,f2_11,f3,f3_1,f3_2,f3_3,f3_4,f3_5
48,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree,Neither Agree nor Disagree
208,Strongly Agree,Somewhat Agree,Strongly Agree,Strongly Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree
249,Somewhat Agree,Strongly Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Somewhat Agree,Neither Agree nor Disagree,Neither Agree nor Disagree,Somewhat Agree,Neither Agree nor Disagree
65,Somewhat Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Somewhat Disagree,Strongly Agree,Somewhat Disagree,Somewhat Disagree,Strongly Disagree,Somewhat Disagree,Strongly Disagree,Strongly Disagree
203,Somewhat Agree,Strongly Agree,Somewhat Agree,Somewhat Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree,Strongly Agree
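The hint works because each helper resolves to a plain list of names, and Python lists combine with `+`. The plain-pandas sketch below applies the same idea to a made-up frame. It builds the two lists with comprehensions and indexes with their concatenation. This keeps the `f2` block ahead of the `f3` block, as in the output above.

```python
import pandas as pd

df = pd.DataFrame(columns=['id_col', 'f1', 'f2', 'f3', 'f2_1', 'f3_1', 'f4'])

f2_cols = [c for c in df.columns if c.startswith('f2')]
f3_cols = [c for c in df.columns if c.startswith('f3')]

# Concatenation puts all f2 columns first, then all f3 columns.
combined = f2_cols + f3_cols
print(combined)                    # ['f2', 'f2_1', 'f3', 'f3_1']
print(list(df[combined].columns))  # ['f2', 'f2_1', 'f3', 'f3_1']
```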
