## Pipeline Persistence

### Persistence
A pipeline persists its steps, context, and dependencies. This example shows how each of those features works.

Here we will move this pipeline from our local environment to Cortex.

To set up:

In [8]:
import numpy as np
import pandas as pd
from cortex import Cortex

Next, create a Cortex Builder instance:

In [9]:
cortex = Cortex.local()
builder = cortex.builder()

Create the test pipeline:

In [12]:
data_set = builder.dataset('forest_fires').title('Forest Fire Data')\
    .from_csv('data/ff.sample.csv').build()
tp = data_set.pipeline('prep')

Pipelines have a `to_camel()` method that produces a textual representation of the pipeline's state. Let's call it now to verify that this is the right dataset with the right pipeline.

In [13]:
tp.to_camel()

{'name': 'prep',
 'steps': [{'name': 'add_bui',
   'function': {'name': 'add_bui',
    'code': 'gANjZGlsbC5fZGlsbApfY3JlYXRlX2Z1bmN0aW9uCnEAKGNkaWxsLl9kaWxsCl9sb2FkX3R5cGUKcQFYCAAAAENvZGVUeXBlcQKFcQNScQQoSwJLAEsCSwVLQ0MwZAF8AWQCGQAUAHwBZAMZABQAfAFkAhkAZAR8AWQDGQAUABcAGwB8AWQFPABkAFMAcQUoTkc/6ZmZmZmZmlgDAAAARE1DcQZYAgAAAERDcQdHP9mZmZmZmZpYAwAAAEJVSXEIdHEJKVgIAAAAcGlwZWxpbmVxClgCAAAAZGZxC4ZxDFgfAAAAPGlweXRob24taW5wdXQtMjYtYzFkY2ZhYTIzOTUwPnENWAcAAABhZGRfYnVpcQ5LAUMCAAFxDykpdHEQUnERfXESaA5OTn1xE3RxFFJxFS4=',
    'type': 'inline'}},
  {'name': 'fix_area',
   'function': {'name': 'fix_area',
    'code': 'gANjZGlsbC5fZGlsbApfY3JlYXRlX2Z1bmN0aW9uCnEAKGNkaWxsLl9kaWxsCl9sb2FkX3R5cGUKcQFYCAAAAENvZGVUeXBlcQKFcQNScQQoSwJLAEsCSwRLQ0MafAFkARkAoABkAmQDhAChAXwBZAE8AGQAUwBxBShOWAQAAABhcmVhcQZoBChLAUsASwFLA0tTQwp0AKABfAChAVMAcQdOhXEIWAQAAABtYXRocQlYBQAAAGxvZzFwcQqGcQtYAQAAAGFxDIVxDVgfAAAAPGlweXRob24taW5wdXQtMjctODgyOTFkMWU1NzgwPnEOWAgAAAA8bGFtYmRhPnEPSwJDAHEQKSl0cRFScRJYGgAAAGZpeF9hcmVhLjxsb2NhbHM+LjxsY
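The `code` values look opaque, but their first bytes can be inspected with the standard library. The sketch below decodes only the first 40 characters of the value shown above. The bytes suggest that step functions are serialized with the third-party `dill` library and then Base64-encoded; this is an inference from the payload, not documented Cortex behavior.

```python
import base64

# First 40 characters of the 'code' value printed above. 40 is a multiple
# of 4, so this slice is valid Base64 on its own.
prefix = "gANjZGlsbC5fZGlsbApfY3JlYXRlX2Z1bmN0aW9u"

raw = base64.b64decode(prefix)
print(raw)             # b'\x80\x03cdill._dill\n_create_function'
print(raw[:2].hex())   # 8003: pickle PROTO opcode followed by protocol 3
```

Only the header is decoded here. Unpickling a full payload executes code, so it should only be done with payloads from a trusted source.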

## What is pipeline context?

A pipeline's context is a dictionary of values or functions that could be used to operate on the dataframe when each pipeline step is run.
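To make the idea concrete, here is a minimal sketch of a context shared between steps. `MiniPipeline` and `center_area` are hypothetical stand-ins written for this illustration, not Cortex classes, and a plain dict of lists stands in for the dataframe so the example runs with the standard library alone.

```python
import statistics


class MiniPipeline:
    """Hypothetical stand-in for a pipeline; not the Cortex class."""

    def __init__(self, name):
        self.name = name
        self._context = {}   # shared values or functions, keyed by name
        self._steps = []     # callables with signature step(pipeline, data)

    def set_context(self, key, value):
        self._context[key] = value
        return self

    def get_context(self, key):
        return self._context.get(key)

    def add_step(self, fn):
        self._steps.append(fn)
        return self

    def run(self, data):
        for step in self._steps:
            step(self, data)   # each step mutates data in place
        return data


def center_area(pipeline, data):
    # Read a value that was computed earlier and stored in the context.
    mid = pipeline.get_context("area_median")
    data["area"] = [a - mid for a in data["area"]]


frame = {"area": [0.0, 2.0, 10.0]}
p = MiniPipeline("demo")
p.set_context("area_median", statistics.median(frame["area"]))
p.add_step(center_area)
result = p.run(frame)
print(result["area"])   # [-2.0, 0.0, 8.0]
```

The step never recomputes the median itself; it looks the value up by key, which is what lets a stored pipeline reuse a statistic computed at setup time.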

In [20]:
df=data_set.as_pandas()
print(df)

    X  Y month  day  FFMC    DMC     DC   ISI  temp  RH  wind  rain     area
0   7  5   mar  fri  86.2   26.2   94.3   5.1   8.2  51   6.7   0.0     0.00
1   7  4   oct  tue  90.6   35.4  669.1   6.7  18.0  33   0.9   0.0     0.00
2   7  4   oct  sat  90.6   43.7  686.9   6.7  14.6  33   1.3   0.0     0.00
3   8  6   mar  fri  91.7   33.3   77.5   9.0   8.3  97   4.0   0.2     0.00
4   8  6   mar  sun  89.3   51.3  102.2   9.6  11.4  99   1.8   0.0     0.00
5   8  6   aug  sun  92.3   85.3  488.0  14.7  22.2  29   5.4   0.0     0.00
6   8  6   aug  mon  92.3   88.9  495.6   8.5  24.1  27   3.1   0.0     0.00
7   8  6   aug  mon  91.5  145.4  608.2  10.7   8.0  86   2.2   0.0     0.00
8   8  6   sep  tue  91.0  129.5  692.6   7.0  13.1  63   5.4   0.0     0.00
9   3  4   sep  fri  93.3  141.2  713.9  13.9  18.6  49   3.6   0.0    35.88
10  4  3   mar  mon  87.6   52.2  103.8   5.0  11.0  46   5.8   0.0    36.85
11  2  2   jul  fri  88.3  150.3  309.9   6.8  13.4  79   3.6   0.0    37.02

In [21]:
# np.asscalar was removed in NumPy 1.23; float() yields a plain Python scalar.
tp.set_context('one_median', float(df['area'].astype(float).median()))
tp.to_camel()

{'name': 'prep',
 'steps': [{'name': 'add_bui',
   'function': {'name': 'add_bui',
    'code': 'gANjZGlsbC5fZGlsbApfY3JlYXRlX2Z1bmN0aW9uCnEAKGNkaWxsLl9kaWxsCl9sb2FkX3R5cGUKcQFYCAAAAENvZGVUeXBlcQKFcQNScQQoSwJLAEsCSwVLQ0MwZAF8AWQCGQAUAHwBZAMZABQAfAFkAhkAZAR8AWQDGQAUABcAGwB8AWQFPABkAFMAcQUoTkc/6ZmZmZmZmlgDAAAARE1DcQZYAgAAAERDcQdHP9mZmZmZmZpYAwAAAEJVSXEIdHEJKVgIAAAAcGlwZWxpbmVxClgCAAAAZGZxC4ZxDFgfAAAAPGlweXRob24taW5wdXQtMjYtYzFkY2ZhYTIzOTUwPnENWAcAAABhZGRfYnVpcQ5LAUMCAAFxDykpdHEQUnERfXESaA5OTn1xE3RxFFJxFS4=',
    'type': 'inline'}},
  {'name': 'fix_area',
   'function': {'name': 'fix_area',
    'code': 'gANjZGlsbC5fZGlsbApfY3JlYXRlX2Z1bmN0aW9uCnEAKGNkaWxsLl9kaWxsCl9sb2FkX3R5cGUKcQFYCAAAAENvZGVUeXBlcQKFcQNScQQoSwJLAEsCSwRLQ0MafAFkARkAoABkAmQDhAChAXwBZAE8AGQAUwBxBShOWAQAAABhcmVhcQZoBChLAUsASwFLA0tTQwp0AKABfAChAVMAcQdOhXEIWAQAAABtYXRocQlYBQAAAGxvZzFwcQqGcQtYAQAAAGFxDIVxDVgfAAAAPGlweXRob24taW5wdXQtMjctODgyOTFkMWU1NzgwPnEOWAgAAAA8bGFtYmRhPnEPSwJDAHEQKSl0cRFScRJYGgAAAGZpeF9hcmVhLjxsb2NhbHM+LjxsY

Now we add a step that uses the context:

In [22]:
def standard_score(pipeline, df):
    mu = pipeline.get_context('one_median')
    df['area'] = df['area'] * 100 - mu  # scales by 100 and subtracts the stored median (not a true standard score)
    
tp.add_step(standard_score)

tp.run(df)

running pipeline [prep]:
> add_bui 
> fix_area 
> standard_score 


Unnamed: 0  X  Y month  day  FFMC    DMC     DC   ISI  temp  RH  wind  rain       area         BUI
         0  7  5   mar  fri  86.2   26.2   94.3   5.1   8.2  51   6.7   0.0     -35.88   30.921902
         1  7  4   oct  tue  90.6   35.4  669.1   6.7  18.0  33   0.9   0.0     -35.88   62.529409
         2  7  4   oct  sat  90.6   43.7  686.9   6.7  14.6  33   1.3   0.0     -35.88    75.40672
         3  8  6   mar  fri  91.7   33.3   77.5   9.0   8.3  97   4.0   0.2     -35.88   32.108865
         4  8  6   mar  sun  89.3   51.3  102.2   9.6  11.4  99   1.8   0.0     -35.88   45.501063
         5  8  6   aug  sun  92.3   85.3  488.0  14.7  22.2  29   5.4   0.0     -35.88   118.72057
         6  8  6   aug  mon  92.3   88.9  495.6   8.5  24.1  27   3.1   0.0     -35.88  122.752218
         7  8  6   aug  mon  91.5  145.4  608.2  10.7   8.0  86   2.2   0.0     -35.88  182.015602
         8  8  6   sep  tue  91.0  129.5  692.6   7.0  13.1  63   5.4   0.0     -35.88  176.497663
         9  3  4   sep  fri  93.3  141.2  713.9  13.9  18.6  49   3.6   0.0  324.88694  188.963689


### Dependencies

Pipelines can be chained so that one runs after another. The mechanism for this is `add_dependency()`: when a pipeline runs, its dependencies run first.
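The ordering can be sketched with a small stand-in class. `MiniPipeline`, `add_temp_f`, and `flag_hot` are hypothetical names used only for this illustration, and a dict of lists stands in for the dataframe. The sketch assumes that dependencies run in the order they were added, before the pipeline's own steps.

```python
class MiniPipeline:
    """Hypothetical stand-in for a pipeline; not the Cortex class."""

    def __init__(self, name):
        self.name = name
        self._steps = []
        self._deps = []

    def add_step(self, fn):
        self._steps.append(fn)
        return self

    def add_dependency(self, other):
        self._deps.append(other)
        return self

    def run(self, data, log=None):
        log = [] if log is None else log
        for dep in self._deps:        # dependencies run first
            dep.run(data, log)
        for step in self._steps:      # then this pipeline's own steps
            log.append(f"{self.name}:{step.__name__}")
            step(self, data)
        return log


def add_temp_f(pipeline, data):
    data["temp_f"] = [c * 9 / 5 + 32 for c in data["temp"]]


def flag_hot(pipeline, data):
    # Needs the column that add_temp_f creates.
    data["hot"] = [f > 60 for f in data["temp_f"]]


prep = MiniPipeline("prep").add_step(add_temp_f)
downstream = MiniPipeline("downstream").add_step(flag_hot)
downstream.add_dependency(prep)

frame = {"temp": [10.0, 20.0]}
log = downstream.run(frame)
print(log)            # ['prep:add_temp_f', 'downstream:flag_hot']
print(frame["hot"])   # [False, True]
```

Running only `downstream` is enough: the dependency guarantees that `temp_f` exists before `flag_hot` reads it.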

Create another pipeline:

In [23]:
# TODO: change this to a Fahrenheit conversion
dtp = builder.pipeline('depends_on_prep_pipeline')

def diff_temp_rain(pipeline, df):
    df['c4'] = df['temp'] - df['rain']
    

dtp.add_step(diff_temp_rain)

<cortex.pipeline.Pipeline at 0x12a99a940>

The `prep` pipeline must run before this step, so `dtp` depends on `tp`:

In [24]:
dtp.add_dependency(tp)

<cortex.pipeline.Pipeline at 0x12a99a940>

Now that a second pipeline is set up, run `dtp` to cause `tp` to run. 

In [25]:
dtp.run(df)

running pipeline [prep]:
> add_bui 
> fix_area 


ValueError: math domain error
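One plausible cause of this error, inferred from the outputs above rather than confirmed: the earlier `tp.run(df)` appears to have modified `df` in place, leaving `area` at -35.88 in most rows, and the names `math` and `log1p` are visible in the serialized `fix_area` payload. `math.log1p(x)` computes log(1 + x), which is undefined for x <= -1, so a second pass over the already-transformed column would fail. The sketch below reproduces only that arithmetic.

```python
import math

# log1p is fine for the untouched data, where area >= 0.
first_pass = math.log1p(0.0)
print(first_pass)   # 0.0

# After the first run, most rows hold -35.88, which is outside the domain.
raised = False
try:
    math.log1p(-35.88)
except ValueError as err:
    raised = True
    print(type(err).__name__)   # ValueError
```

If this is indeed the cause, passing a fresh copy of the data (for example `data_set.as_pandas()` again) to the second run would likely avoid it.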

## Builders vs Datasets

Datasets persist between kernel invocations, so any pipeline attached to a dataset is persisted with it. Pipelines constructed directly from a builder exist only in memory and are lost when the kernel restarts.
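The difference can be sketched with the standard library. The notebook does not show how Cortex stores datasets, so the JSON file layout and the `save_pipeline` and `load_pipeline` helpers below are assumptions made purely for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_pipeline(store_dir, dataset_name, pipeline_doc):
    """Hypothetical helper: write a pipeline description next to its dataset."""
    path = Path(store_dir) / f"{dataset_name}.json"
    path.write_text(json.dumps(pipeline_doc))
    return path


def load_pipeline(store_dir, dataset_name):
    """Hypothetical helper: read the pipeline description back by name."""
    return json.loads((Path(store_dir) / f"{dataset_name}.json").read_text())


with tempfile.TemporaryDirectory() as store:
    # A builder-style pipeline lives only in this process.
    in_memory = {"name": "scratch", "steps": ["add_one"]}

    # A dataset-style pipeline is also written to storage.
    save_pipeline(store, "sample", {"name": "dsp", "steps": ["add_one"]})

    del in_memory   # simulates losing the interpreter's namespace
    restored = load_pipeline(store, "sample")

print(restored)   # {'name': 'dsp', 'steps': ['add_one']}
```

Only the pipeline that was written out can be recovered by name afterwards, which mirrors the builder versus dataset distinction described above.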

In [None]:
data_set = builder.dataset('sample').title('Persistence Sample Data')\
    .from_df(df).build()

data_frame = data_set.as_pandas()
data_frame.tail()

In [None]:
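# Hypothetical stand-in: the original notebook does not show add_one's body.
def add_one(pipeline, df):
    df['X'] = df['X'] + 1

dsp = data_set.pipeline('dsp')
dsp.reset() # see below for an explanation of pipeline reset
dsp.add_step(add_one)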
data_set.save()

When the Python kernel is reset, all in-memory data is cleared. However, the dataset, along with its pipeline, is persisted.

In [None]:
# WARNING: %reset -f clears every variable in the namespace without asking for confirmation!
%reset -f

Notice that the test pipeline is gone:

In [None]:
try:
    tp
except NameError:
    print('No tp')

Next we demonstrate that the `sample` dataset is persisted as is the `dsp` pipeline. In order to show how the pipeline persists, re-establish the required libraries and the Cortex builder:

In [None]:
import numpy as np
import pandas as pd
from cortex import Cortex

cortex = Cortex.local()
builder = cortex.builder()

Get the dataset, using the name we previously specified:

In [None]:
data_set = cortex.dataset('sample')

Get the pipeline, again using the name we previously specified:

In [None]:
new_pipeline = data_set.pipeline('dsp')

Get the dataframe from the dataset and then run the pipeline. 

In [None]:
df = data_set.as_pandas()
new_pipeline.run(df)

Notice that the `add_one` function is still present in the `new_pipeline`.

## `reset()` affects pipeline state

Without parameters, `reset()` removes all steps in the pipeline.

In [1]:
new_pipeline.to_camel()

NameError: name 'new_pipeline' is not defined

In [None]:
new_pipeline.reset()

new_pipeline.to_camel()

You can also delete all context and dependencies. 

In [None]:
another_new_pl = data_set.pipeline('dsp')

Add a context:

In [None]:
another_new_pl.set_context('for_example','a string')

Add a dependency:

In [None]:
tp = builder.pipeline('test_pipeline')
another_new_pl.add_dependency(tp)

another_new_pl.to_camel()

Next reset everything, including pipeline steps, dependencies, and context:

In [None]:
another_new_pl.reset(reset_deps=True, reset_context=True)

another_new_pl.to_camel()
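The two forms of `reset()` can be summarized with a small stand-in. `MiniPipeline` is a hypothetical class written for this illustration. The sketch assumes that the default call leaves dependencies and context untouched, which the text above implies but does not state outright.

```python
class MiniPipeline:
    """Hypothetical stand-in used only to illustrate reset semantics."""

    def __init__(self, name):
        self.name = name
        self.steps = []
        self.dependencies = []
        self.context = {}

    def reset(self, reset_deps=False, reset_context=False):
        self.steps.clear()             # steps are always removed
        if reset_deps:
            self.dependencies.clear()
        if reset_context:
            self.context.clear()
        return self

    def state(self):
        # (number of steps, number of dependencies, number of context keys)
        return (len(self.steps), len(self.dependencies), len(self.context))


pl = MiniPipeline("dsp")
pl.steps.append("add_one")
pl.dependencies.append("test_pipeline")
pl.context["for_example"] = "a string"

after_plain = pl.reset().state()
after_full = pl.reset(reset_deps=True, reset_context=True).state()
print(after_plain)   # (0, 1, 1)
print(after_full)    # (0, 0, 0)
```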