# More on Intentions and Column Expressions

In [4]:
import pandas as pd
from dfply import *
import matplotlib.pyplot as plt
%matplotlib inline

In [5]:
artist_url = "https://github.com/MuseumofModernArt/collection/raw/master/Artists.csv"
artists = pd.read_csv(artist_url)
artists.head()

Unnamed: 0,ConstituentID,DisplayName,ArtistBio,Nationality,Gender,BeginDate,EndDate,Wiki QID,ULAN
0,1,Robert Arneson,"American, 1930–1992",American,Male,1930,1992,,
1,2,Doroteo Arnaiz,"Spanish, born 1936",Spanish,Male,1936,0,,
2,3,Bill Arnold,"American, born 1941",American,Male,1941,0,,
3,4,Charles Arnoldi,"American, born 1946",American,Male,1946,0,Q1063584,500027998.0
4,5,Per Arnoldi,"Danish, born 1941",Danish,Male,1941,0,,


In [6]:
artwork_url = "https://github.com/MuseumofModernArt/collection/raw/master/Artworks.csv"
artwork = pd.read_csv(artwork_url) # Big file, be patient
artwork.head()

Unnamed: 0,Title,Artist,ConstituentID,ArtistBio,Nationality,BeginDate,EndDate,Gender,Date,Medium,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
0,"Ferdinandsbrücke Project, Vienna, Austria (Ele...",Otto Wagner,6210,"(Austrian, 1841–1918)",(Austrian),(1841),(1918),(Male),1896,Ink and cut-and-pasted painted pages on paper,...,http://www.moma.org/media/W1siZiIsIjU5NDA1Il0s...,,,,48.6,,,168.9,,
1,"City of Music, National Superior Conservatory ...",Christian de Portzamparc,7470,"(French, born 1944)",(French),(1944),(0),(Male),1987,Paint and colored pencil on print,...,http://www.moma.org/media/W1siZiIsIjk3Il0sWyJw...,,,,40.6401,,,29.8451,,
2,"Villa near Vienna Project, Outside Vienna, Aus...",Emil Hoppe,7605,"(Austrian, 1876–1957)",(Austrian),(1876),(1957),(Male),1903,"Graphite, pen, color pencil, ink, and gouache ...",...,http://www.moma.org/media/W1siZiIsIjk4Il0sWyJw...,,,,34.3,,,31.8,,
3,"The Manhattan Transcripts Project, New York, N...",Bernard Tschumi,7056,"(French and Swiss, born Switzerland 1944)",(),(1944),(0),(Male),1980,Photographic reproduction with colored synthet...,...,http://www.moma.org/media/W1siZiIsIjEyNCJdLFsi...,,,,50.8,,,50.8,,
4,"Villa, project, outside Vienna, Austria, Exter...",Emil Hoppe,7605,"(Austrian, 1876–1957)",(Austrian),(1876),(1957),(Male),1903,"Graphite, color pencil, ink, and gouache on tr...",...,http://www.moma.org/media/W1siZiIsIjEyNiJdLFsi...,,,,38.4,,,19.1,,


## `X` is an `Intention`

<img src="img/dfply_X_intention_1.png" width="800">

Think of `X` as a placeholder that records an expression for later evaluation.

In [4]:
expr = X.BeginDate.head()
expr

<dfply.base.Intention at 0x13612e850>

## Use `evaluate` to apply the expression

We can apply an expression *later* using `evaluate` with a dataframe.

In [5]:
expr.evaluate(artists)

0    1930
1    1936
2    1941
3    1946
4    1941
Name: BeginDate, dtype: int64
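
To make the record-now, evaluate-later idea concrete, here is a minimal sketch of how such a placeholder could work. The `Deferred` class and the name `D` are hypothetical and exist only for illustration; this is not dfply's actual implementation of `Intention`. Each attribute access or call returns a new object that wraps a function, and `evaluate` runs that function on whatever object is supplied.

```python
from types import SimpleNamespace


class Deferred:
    """Hypothetical stand-in for an Intention-like object (not dfply's code)."""

    def __init__(self, func=lambda obj: obj):
        # The wrapped function describes what to do once an object arrives.
        self._func = func

    def evaluate(self, obj):
        # Replay the recorded steps on a concrete object.
        return self._func(obj)

    def __getattr__(self, name):
        # Only reached for names not found normally; record the attribute access.
        if name.startswith("__"):
            raise AttributeError(name)
        return Deferred(lambda obj: getattr(self.evaluate(obj), name))

    def __call__(self, *args, **kwargs):
        # Record a call on whatever the earlier steps produce.
        return Deferred(lambda obj: self.evaluate(obj)(*args, **kwargs))


D = Deferred()
expr = D.name.upper()  # nothing is computed yet

first = SimpleNamespace(name="ada")
second = SimpleNamespace(name="grace")

first_result = expr.evaluate(first)    # "ADA"
second_result = expr.evaluate(second)  # "GRACE"
```

The recorded expression holds no reference to any particular object, which is why one `expr` can be evaluated against as many inputs as needed.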

## Intention expressions are reusable!

In [6]:
# Reusable!
expr.evaluate(artwork)

0    (1841)
1    (1944)
2    (1876)
3    (1944)
4    (1876)
Name: BeginDate, dtype: object

## <font color="red"> Exercise 2.3.1 </font>
    
**Tasks:**

1. Create a column expression, saved as `my_expr`, that checks whether the `Height (cm)` column is larger than 40. **Hint:** The space and `()` in the column name require the `X['col name']` format.
2. Apply that column expression to the `artists` and `artwork` data frames, that is, run `my_expr.evaluate(df)`.
3. Use the expression object in a filter, e.g. `filter_by(my_expr)`, on the `artwork` data frame.
4. Now try to perform the filter with a pipe and no expression, e.g. `filter_by(artwork['Height (cm)'] > 40)`. Why does this still work? When might we run into trouble?
5. Write a paragraph that summarizes how this all works.

In [7]:
# Code for Task 1
my_expr = X['Height (cm)'] > 40
my_expr

<dfply.base.Intention at 0x7fe70bf05e80>

In [8]:
# Code for Task 2
my_expr.evaluate(artwork)

0          True
1          True
2         False
3          True
4         False
          ...  
139932    False
139933    False
139934    False
139935    False
139936    False
Name: Height (cm), Length: 139937, dtype: bool

In [16]:
# Code for Task 3
(artwork 
>> filter_by(my_expr)
>> sample(5)
)

Unnamed: 0,Title,Artist,ConstituentID,ArtistBio,Nationality,BeginDate,EndDate,Gender,Date,Medium,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
73112,Number 1,Max Schnitzler,5254,"(American, born Poland. 1903–1999)",(American),(1903),(1999),(Male),1955,Oil on canvas,...,,,,,167.3,,,126.8,,
138068,Asociácie (Associations),Stano Filko,31341,"(Slovak, 1937–2015)",(Slovak),(1937),(2015),(Male),1967-1970/c.2000,"Letterpress on plastic sheet, letterpress with...",...,http://www.moma.org/media/W1siZiIsIjUwMDQ2MCJd...,,3.0,,75.0,,,55.0,,
74379,Obsol,James Brooks,798,"(American, 1906–1992)",(American),(1906),(1992),(Male),1964,Oil on canvas,...,http://www.moma.org/media/W1siZiIsIjE3OTQxMyJd...,,,,203.2,,,188.0,,
129629,Untitled,David Zink Yi,68288,"(Peruvian, born 1973)",(Peruvian),(1973),(0),(Male),2014,Gelatin silver print,...,http://www.moma.org/media/W1siZiIsIjQyMjU4NyJd...,,0.0,,59.8,,,75.8,,
78990,Byron Leaves Lasca's Apartment from Nigger Hea...,E. McKnight Kauffer,3020,"(American, 1890–1954)",(American),(1890),(1954),(Male),1931,Gouache and pencil on paper,...,http://www.moma.org/media/W1siZiIsIjMzMTU5OSJd...,,,,55.9,,,38.7,,


In [17]:
# Code for Task 4
(artwork 
>> filter_by(X['Height (cm)'] > 40)
>> sample(5)
)


Unnamed: 0,Title,Artist,ConstituentID,ArtistBio,Nationality,BeginDate,EndDate,Gender,Date,Medium,...,ThumbnailURL,Circumference (cm),Depth (cm),Diameter (cm),Height (cm),Length (cm),Weight (kg),Width (cm),Seat Height (cm),Duration (sec.)
124681,Heritage Studies #3,Iman Issa,49198,"(Egyptian and American, born 1979)",(Egyptian),(1979),(0),(Female),2015,"Silicon bronze, gesso-finished plywood, and vi...",...,,,28.0,,59.0,,,130.0,,
36110,Liberation,Ben Shahn,5366,"(American, born Lithuania. 1898–1969)",(American),(1898),(1969),(Male),1945,Gouache on board,...,http://www.moma.org/media/W1siZiIsIjUyMDg2MSJd...,,,,75.6,,,101.6,,
58492,Chevelle,Peter Stampfli,7837,"(Swiss, born 1937)",(Swiss),(1937),(0),(Male),(1968),Screenprint,...,http://www.moma.org/media/W1siZiIsIjI0OTQwNCJd...,,,,61.0,,,49.7,,
4914,Meta,Niklaus Stoecklin,5671,"(Swiss, 1896–1982)",(Swiss),(1896),(1982),(Male),1941,Lithograph,...,http://www.moma.org/media/W1siZiIsIjQzNzMiXSxb...,,,,128.0,,,90.0,,
109961,J gentle stirring from circus alphabet,Corita Kent (Sister Mary Corita),41140,"(American, 1918–1986)",(American),(1918),(1986),(Female),1968,One from a series of twenty-six screenprints,...,http://www.moma.org/media/W1siZiIsIjQ1NzQ4MyJd...,,,,53.0,,,51.0,,


> Summary: `X` is a placeholder that is replaced by an actual data frame when the intention expression is evaluated. Evaluating the expression checks, row by row, whether the condition (`Height (cm)` > 40) holds and returns a boolean Series. In this example, the expression is then used in a pipe, where `filter_by` keeps only the rows for which it is true. This is useful when working with several data frames that share column names, because the same expression can be reused on all of them.
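
Task 4 hinges on the difference between a condition that is computed immediately and one that is computed when the pipe reaches it. A boolean mask built directly from `artwork` is a fixed sequence of `True`/`False` values tied to the rows of `artwork` at that moment, whereas an expression in `X` is re-evaluated against whatever data arrives at that step. The sketch below illustrates the idea with plain Python lists standing in for data frames; the rows and the `is_tall` helper are made up for illustration and are not part of dfply.

```python
# Made-up rows standing in for the artwork data frame.
rows = [
    {"title": "A", "height": 50.0},
    {"title": "B", "height": 30.0},
    {"title": "C", "height": 45.0},
    {"title": "D", "height": 10.0},
]


def is_tall(row):
    # Lazy condition: plays the role of X['Height (cm)'] > 40.
    return row["height"] > 40


# Eager mask: plays the role of artwork['Height (cm)'] > 40.
eager_mask = [is_tall(row) for row in rows]

# With no earlier step, the mask lines up with the rows and both agree.
lazy_titles = [r["title"] for r in rows if is_tall(r)]
eager_titles = [r["title"] for r, keep in zip(rows, eager_mask) if keep]

# After an earlier step changes the rows, the old mask no longer lines up.
later_rows = rows[1:]  # pretend a previous pipe step dropped the first row
lazy_later = [r["title"] for r in later_rows if is_tall(r)]
eager_later = [r["title"] for r, keep in zip(later_rows, eager_mask) if keep]

print(lazy_titles, eager_titles)  # ['A', 'C'] ['A', 'C']
print(lazy_later, eager_later)    # ['C'] ['B', 'D']
```

The eager version is only safe as the first step of a pipe on the same data frame the mask was built from. Once an earlier step filters, sorts, or samples the rows, a precomputed mask may describe the wrong rows or have the wrong length.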

## Not just for data frames ... works for any* expression

In [18]:
double, line = 2*X, 3*X + 5

In [20]:
double.evaluate(3), line.evaluate(6)

(6, 23)
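
Arithmetic on a placeholder can be deferred through Python's operator hooks. The sketch below uses a hypothetical `Lazy` class and placeholder `Y` (an illustration, not dfply's code) to show how expressions like `2*X` and `3*X + 5` could be recorded. It also shows one reason an asterisk on "any" may be warranted: Python does not let a class overload `and`, `or`, or `not`, so a deferred expression has to combine conditions with `&`, `|`, and `~` instead.

```python
import operator


class Lazy:
    """Hypothetical deferred value; an illustration, not dfply's Intention."""

    def __init__(self, func=lambda value: value):
        self._func = func

    def evaluate(self, value):
        return self._func(value)

    def _binary(self, other, op):
        # Build a new Lazy that resolves both operands, then applies `op`.
        def run(value):
            right = other.evaluate(value) if isinstance(other, Lazy) else other
            return op(self.evaluate(value), right)
        return Lazy(run)

    def __add__(self, other):
        return self._binary(other, operator.add)

    def __mul__(self, other):
        return self._binary(other, operator.mul)

    # Reflected versions reuse the same code; fine for commutative numeric ops.
    __radd__ = __add__
    __rmul__ = __mul__

    def __gt__(self, other):
        return self._binary(other, operator.gt)

    def __lt__(self, other):
        return self._binary(other, operator.lt)

    def __and__(self, other):
        return self._binary(other, operator.and_)


Y = Lazy()
double, line = 2 * Y, 3 * Y + 5
results = (double.evaluate(3), line.evaluate(6))  # (6, 23)

in_range = (Y > 1) & (Y < 10)   # `&` is overloadable, so both sides are kept
broken = (Y > 1) and (Y < 10)   # `and` is not: only `Y < 10` survives

print(in_range.evaluate(0))  # False
print(broken.evaluate(0))    # True, because the `Y > 1` check was dropped
```

In this sketch the `and` version fails silently: a `Lazy` object is truthy, so Python returns the second operand and discards the first.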

## We can make functions lazy too!

Decorate a function with `make_symbolic` so that it returns a lazy expression when it is called with an `Intention` such as `X`.

In [21]:
from math import log
log = make_symbolic(log)

In [22]:
log(8, 2) # Works as expected with numbers

3.0

## Passing in `X` now makes a `log` expression

In [23]:
expr = log(X, 2) # Passing in X makes it lazy/symbolic
expr

<dfply.base.Intention at 0x7fe70c638340>

In [24]:
expr.evaluate(16) # Evaluate later

4.0
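
The behavior shown above can be sketched as a decorator that inspects its arguments: if none of them is a deferred expression, it calls the function right away; otherwise it returns a new deferred expression that resolves the arguments later. The names `Deferred`, `resolve`, `make_lazy`, and `P` below are hypothetical and chosen for illustration; dfply's actual `make_symbolic` may be implemented differently.

```python
import functools
from math import log


class Deferred:
    """Hypothetical placeholder that wraps a function to run later."""

    def __init__(self, func=lambda value: value):
        self._func = func

    def evaluate(self, value):
        return self._func(value)


def resolve(arg, value):
    # Deferred arguments are evaluated against `value`; others pass through.
    return arg.evaluate(value) if isinstance(arg, Deferred) else arg


def make_lazy(func):
    """Sketch of a make_symbolic-style decorator (an assumption, not dfply's code)."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        all_args = list(args) + list(kwargs.values())
        if not any(isinstance(a, Deferred) for a in all_args):
            return func(*args, **kwargs)  # ordinary call, computed immediately

        def run(value):
            new_args = [resolve(a, value) for a in args]
            new_kwargs = {k: resolve(v, value) for k, v in kwargs.items()}
            return func(*new_args, **new_kwargs)

        return Deferred(run)

    return wrapper


lazy_log = make_lazy(log)
P = Deferred()

eager = lazy_log(8, 2)         # plain numbers: evaluated right away
log_expr = lazy_log(P, 2)      # placeholder argument: returns a Deferred
later = log_expr.evaluate(16)  # computed only now
```

Because `make_lazy` takes a function and returns a function, it can also be applied with the `@make_lazy` decorator syntax on a function definition.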