# It's all the same!

When releasing the `dplyr` library, Wickham proclaimed the functions in the library to be the *Grammar of data manipulation*.  While I might quibble with this claim, he makes a strong point: **The verbs presented in `dplyr` provide a platform-independent language for manipulating tables.**

In this lecture, we will illustrate this point by constructing wrapper functions for `sqlalchemy` that provide the same interface and functionality as the corresponding `dplyr` functions.

In [1]:
import pandas as pd

## Set up

Let's load a data set with `sqlalchemy`.

In [2]:
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine, func
from sqlalchemy.ext.automap import automap_base

engine = create_engine("sqlite:///databases/heroes_2_1.db")

Base = automap_base()
Base.prepare(engine, reflect=True)
Hero = Base.classes.heroes

# `select` and `filter_by` functions for `sqlalchemy`

* For `pandas` and `pyspark`: `df >> func` returns a new `df`
* For `sqlalchemy`, we manipulate a select statement, not a `df`
* **Consequence:** `stmt >> func` returns a new `SQL stmt`
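
The `pipeable` decorator used in the cells below is imported from `functoolz`, and its implementation is not shown in this lecture. As a rough sketch of the idea only, a decorator of this kind can be built on Python's reflected right-shift hook, `__rrshift__`. The class `Pipeable`, the helpers `add_clause` and `shout`, and the use of plain strings in place of real statements are all assumptions made for illustration; the real decorator may behave differently.

```python
class Pipeable:
    """Hypothetical stand-in for a pipeable function (illustration only)."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # In this sketch, calling never runs the function;
        # it only binds the leading arguments and returns a new Pipeable.
        return Pipeable(self.func, *self.args, *args, **{**self.kwargs, **kwargs})

    def __rrshift__(self, left):
        # `left >> self` lands here when `left` does not handle `>>` itself.
        # The piped value is passed in the LAST positional slot.
        return self.func(*self.args, left, **self.kwargs)


def pipeable(func):
    return Pipeable(func)


@pipeable
def add_clause(clause, stmt):
    # Returns a NEW string; the incoming statement is left untouched.
    return stmt + " " + clause


@pipeable
def shout(stmt):
    return stmt.upper()


result = "select name from heroes" >> add_clause("limit 5") >> shout
print(result)  # SELECT NAME FROM HEROES LIMIT 5
```

Each step receives a statement and hands a new one to the next step, which is the `stmt >> func` pattern described above.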

## Starting the `pipe`

Since we will be transforming `stmt`s into `stmt`s, we need
* A starting function to turn a table class into a starting statement
* A `head` function to test our statements
* A function to pretty-print the current `stmt`, then pipe it along
* A function to produce `pandas` tables of the output

#### Switching a table class `to_statement`

In [3]:
from sqlalchemy import select as select_sql
from functoolz import pipeable
from more_sqlalchemy import everything

@pipeable
def to_statement(table_class):
    return select_sql(everything(table_class))

#### Using `stmt.limit` to create a `headq` helper

**Note:** To avoid name conflicts, we will name all `sqlalchemy` helpers by appending a `q`.

In [4]:
@pipeable
def headq(stmt, num = 5):
    return stmt.limit(num)

#### Embedding a `pprint` in the pipe

In [5]:
from more_sqlalchemy import pprint

@pipeable
def pprintq(stmt):
    pprint(stmt)
    return stmt

#### Using `pd.read_sql_query` to create a `to_pandasq` helper

**Note:** `pd.read_sql_query` requires `con=engine`.  We will make `engine` an explicit first parameter, which keeps `stmt` in the last position for piping.

In [6]:
@pipeable
def to_pandasq(engine, stmt):
    """Run `stmt` against `engine` and return the result as a `pandas` DataFrame."""
    return pd.read_sql_query(stmt, con=engine)

## Testing our helper functions

In [7]:
(Hero
 >> to_statement
 >> headq
 >> to_pandasq(engine))

Unnamed: 0,id,name,gender,eye_color,race,hair_color,height,publisher,skin_color,alignment,weight
0,0,A-Bomb,Male,yellow,Human,No Hair,203.0,Marvel Comics,,good,441.0
1,1,Abe Sapien,Male,blue,Icthyo Sapien,No Hair,191.0,Dark Horse Comics,blue,good,65.0
2,2,Abin Sur,Male,blue,Ungaran,No Hair,185.0,DC Comics,red,good,90.0
3,3,Abomination,Male,green,Human / Radiation,No Hair,203.0,Marvel Comics,,bad,441.0
4,4,Abraxas,Male,blue,Cosmic Entity,Black,,Marvel Comics,,bad,


## Using `pprintq` to view the intermediate statements

In [8]:
_ = (Hero
     >> to_statement
     >> pprintq
     >> headq
     >> to_pandasq(engine))

SELECT heroes.id,
       heroes.name,
       heroes.gender,
       heroes.eye_color,
       heroes.race,
       heroes.hair_color,
       heroes.height,
       heroes.publisher,
       heroes.skin_color,
       heroes.alignment,
       heroes.weight
FROM heroes


In [9]:
_ = (Hero
     >> to_statement
     >> headq
     >> pprintq
     >> to_pandasq(engine))

SELECT heroes.id,
       heroes.name,
       heroes.gender,
       heroes.eye_color,
       heroes.race,
       heroes.hair_color,
       heroes.height,
       heroes.publisher,
       heroes.skin_color,
       heroes.alignment,
       heroes.weight
FROM heroes
LIMIT :param_1
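
The `LIMIT :param_1` line shows that `limit(num)` is rendered with a named placeholder rather than the literal `5`; the value is supplied separately when the statement is executed. The same idea can be sketched with the standard-library `sqlite3` module, which also accepts named placeholders. The in-memory table is a three-row stand-in that reuses names from the output above, not the lecture's database.

```python
import sqlite3

# Tiny in-memory stand-in for the heroes table (assumption, for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE heroes (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO heroes (id, name) VALUES (?, ?)",
    [(0, "A-Bomb"), (1, "Abe Sapien"), (2, "Abin Sur")],
)

# The SQL text carries only the placeholder; the value travels alongside it.
sql = "SELECT heroes.name FROM heroes ORDER BY heroes.id LIMIT :param_1"
rows = conn.execute(sql, {"param_1": 2}).fetchall()
print(rows)  # [('A-Bomb',), ('Abe Sapien',)]
conn.close()
```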


## Example 3 - Creating `selectq` for `sqlalchemy`

Now we will create a pipeable wrapper function for selecting columns.  To make this work, we will use `stmt.with_only_columns`; let's explore this method using `help`.  To simplify the presentation, we will assume `columns` is a list of column-name strings.

In [10]:
s = (Hero >> to_statement)
help(s.with_only_columns)

Help on method with_only_columns in module sqlalchemy.sql.selectable:

with_only_columns(columns) method of sqlalchemy.sql.selectable.Select instance
    Return a new :func:`.select` construct with its columns
    clause replaced with the given columns.
    
    This method is exactly equivalent to as if the original
    :func:`.select` had been called with the given columns
    clause.   I.e. a statement::
    
        s = select([table1.c.a, table1.c.b])
        s = s.with_only_columns([table1.c.b])
    
    should be exactly equivalent to::
    
        s = select([table1.c.b])
    
    This means that FROM clauses which are only derived
    from the column list will be discarded if the new column
    list no longer contains that FROM::
    
        >>> table1 = table('t1', column('a'), column('b'))
        >>> table2 = table('t2', column('a'), column('b'))
        >>> s1 = select([table1.c.a, table2.c.b])
        >>> print s1
        SELECT t1.a, t2.b FROM t1, t2
        >>> s2 = s

## Accessing columns from a statement

* `selectq` will filter columns
* We need to
    * Iterate over **all current columns**
    * Filter the result **based on the column name**

#### Accessing all columns using `stmt.columns`

In [11]:
[c for c in s.columns]

[Column('id', INTEGER(), table=<Select object>, primary_key=True, nullable=False),
 Column('name', VARCHAR(), table=<Select object>),
 Column('gender', VARCHAR(), table=<Select object>),
 Column('eye_color', VARCHAR(), table=<Select object>),
 Column('race', VARCHAR(), table=<Select object>),
 Column('hair_color', VARCHAR(), table=<Select object>),
 Column('height', FLOAT(), table=<Select object>),
 Column('publisher', VARCHAR(), table=<Select object>),
 Column('skin_color', VARCHAR(), table=<Select object>),
 Column('alignment', VARCHAR(), table=<Select object>),
 Column('weight', FLOAT(), table=<Select object>)]

#### Accessing column names using `col.name`

In [12]:
[c.name for c in s.columns]

['id',
 'name',
 'gender',
 'eye_color',
 'race',
 'hair_color',
 'height',
 'publisher',
 'skin_color',
 'alignment',
 'weight']

#### Filtering columns using a list comprehension

In [13]:
cols_to_keep = ['name', 'gender', 'eye_color']
[c for c in s.columns if c.name in cols_to_keep]

[Column('name', VARCHAR(), table=<Select object>),
 Column('gender', VARCHAR(), table=<Select object>),
 Column('eye_color', VARCHAR(), table=<Select object>)]

#### Creating a new statement using `with_only_columns`

In [14]:
cols_to_keep = ['name', 'gender', 'eye_color']
s_new = s.with_only_columns([c for c in s.columns if c.name in cols_to_keep])
pprint(s_new)

SELECT name,
       gender,
       eye_color
FROM
  (SELECT heroes.id AS id,
          heroes.name AS name,
          heroes.gender AS gender,
          heroes.eye_color AS eye_color,
          heroes.race AS race,
          heroes.hair_color AS hair_color,
          heroes.height AS height,
          heroes.publisher AS publisher,
          heroes.skin_color AS skin_color,
          heroes.alignment AS alignment,
          heroes.weight AS weight
   FROM heroes)
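
Note that the generated SQL wraps the original statement in a subquery. The columns taken from `s.columns` belong to the select itself (see `table=<Select object>` in the earlier output) rather than to the `heroes` table. The rows returned should match those from selecting the three columns directly. A quick check with the standard-library `sqlite3` module, assuming a tiny in-memory stand-in for `heroes` with only a few of its columns and rows:

```python
import sqlite3

# Small stand-in table (assumption); values are copied from the output shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE heroes (id INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
    "eye_color TEXT, race TEXT)"
)
conn.executemany(
    "INSERT INTO heroes VALUES (?, ?, ?, ?, ?)",
    [
        (0, "A-Bomb", "Male", "yellow", "Human"),
        (1, "Abe Sapien", "Male", "blue", "Icthyo Sapien"),
        (2, "Abin Sur", "Male", "blue", "Ungaran"),
    ],
)

# Same shape as the pretty-printed statement: an outer select over a subquery.
nested = """
SELECT name, gender, eye_color
FROM (SELECT heroes.id AS id, heroes.name AS name, heroes.gender AS gender,
             heroes.eye_color AS eye_color, heroes.race AS race
      FROM heroes)
"""
# The flat form a person would probably write by hand.
flat = "SELECT heroes.name, heroes.gender, heroes.eye_color FROM heroes"

nested_rows = sorted(conn.execute(nested).fetchall())
flat_rows = sorted(conn.execute(flat).fetchall())
print(nested_rows == flat_rows)  # True
conn.close()
```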


## Creating the `selectq` helper


In [15]:
@pipeable
def selectq(columns, stmt):
    """Keep only the columns of `stmt` whose names appear in `columns`."""
    cols_to_keep = [c for c in stmt.columns if c.name in columns]
    return stmt.with_only_columns(cols_to_keep)
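
One consequence of this comprehension is worth noting: the kept columns follow the statement's column order, not the order of `columns`, and names that match nothing are dropped silently. In `dplyr`, `select` returns columns in the order requested. The sketch below applies the same filtering logic to plain strings standing in for column objects, next to an alternative that follows the requested order; `available` and `requested` are illustrative names only.

```python
available = ["id", "name", "gender", "eye_color"]  # stand-in for stmt.columns
requested = ["eye_color", "name", "not_a_column"]  # stand-in for columns

# Same shape as the comprehension in selectq: iterate over the statement's columns.
statement_order = [c for c in available if c in requested]

# Alternative: iterate over the requested names instead.
requested_order = [c for c in requested if c in available]

print(statement_order)  # ['name', 'eye_color']
print(requested_order)  # ['eye_color', 'name']
```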
    

#### Testing `selectq`

In [16]:
(Hero
 >> to_statement
 >> selectq(['name', 'gender', 'eye_color'])
 >> headq
 >> to_pandasq(engine))

Unnamed: 0,name,gender,eye_color
0,A-Bomb,Male,yellow
1,Abe Sapien,Male,blue
2,Abin Sur,Male,blue
3,Abomination,Male,green
4,Abraxas,Male,blue


#### Inspecting `selectq` statements

In [17]:
(Hero
 >> to_statement
 >> selectq(['name', 'gender', 'eye_color'])
 >> pprintq
 >> headq
 >> to_pandasq(engine))

SELECT name,
       gender,
       eye_color
FROM
  (SELECT heroes.id AS id,
          heroes.name AS name,
          heroes.gender AS gender,
          heroes.eye_color AS eye_color,
          heroes.race AS race,
          heroes.hair_color AS hair_color,
          heroes.height AS height,
          heroes.publisher AS publisher,
          heroes.skin_color AS skin_color,
          heroes.alignment AS alignment,
          heroes.weight AS weight
   FROM heroes)


Unnamed: 0,name,gender,eye_color
0,A-Bomb,Male,yellow
1,Abe Sapien,Male,blue
2,Abin Sur,Male,blue
3,Abomination,Male,green
4,Abraxas,Male,blue


## <font color="red" > Exercise 2 - Construct `filter_byq` </font>

* This should be a wrapper around `stmt.where`
    * Explore using help.  How many additional arguments?  What are they?
* Must be `pipeable` with `stmt` in the last position.
* Include test(s) of your function.

In [39]:
# Your function definition here

In [40]:
# Your test expression here

## Up Next

Stuff