## Pipeline Persistence

### Persistence
A pipeline persists its steps, context, and dependencies. This example shows how each of those features works.

Here we will move this pipeline from our local environment to Cortex.

To set up:

In [None]:
import numpy as np
import pandas as pd
from cortex import Cortex

Next, create a Cortex Builder instance:

In [None]:
cortex = Cortex.local()
builder = cortex.builder()

Create the test pipeline:

In [None]:
data_set = builder.dataset('forest_fires').title('Forest Fire Data')\
    .from_csv('data/ff.sample.csv').build()
tp = data_set.pipeline('prep')

Pipelines have a `to_camel()` method that produces a textual representation of the pipeline's state. Let's look at that now and verify that this is the right dataset with the right pipeline.

In [None]:
tp.to_camel()

## What is pipeline context?

A pipeline's context is a dictionary of values or functions that could be used to operate on the dataframe when each pipeline step is run.

In [None]:
df = data_set.as_pandas()
print(df)

In [None]:
# np.asscalar was removed from NumPy; float() yields the same plain Python scalar
tp.set_context('one_median', float(df['area'].astype(float).median()))
tp.to_camel()

Now we add a step that uses the context:

In [None]:
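def standard_score(pipeline, df):
    # read the median that was stored in the pipeline context
    mu = pipeline.get_context('one_median')
    # scale area by 100, then subtract the stored median (modifies df in place)
    df['area'] = df['area'] * 100 - mu
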
tp.add_step(standard_score)

tp.run(df)
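
The context mechanism used above can be pictured with a small stand-in class. This is a minimal sketch, not the Cortex implementation: `ToyPipeline`, `center_area`, and the list-of-dicts data are hypothetical names chosen for illustration, and only the method names `set_context`, `get_context`, `add_step`, and `run` mirror the calls shown in this notebook.

```python
from statistics import median


class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not the Cortex class."""

    def __init__(self, name):
        self.name = name
        self.steps = []     # functions called as step(pipeline, rows)
        self.context = {}   # shared values that steps look up by key

    def set_context(self, key, value):
        self.context[key] = value

    def get_context(self, key):
        return self.context[key]

    def add_step(self, fn):
        self.steps.append(fn)

    def run(self, rows):
        for step in self.steps:
            step(self, rows)  # each step mutates rows in place
        return rows


def center_area(pipeline, rows):
    # same arithmetic as the notebook step: scale by 100, subtract the median
    mu = pipeline.get_context('one_median')
    for row in rows:
        row['area'] = row['area'] * 100 - mu


rows = [{'area': 1.0}, {'area': 2.0}, {'area': 4.0}]
toy = ToyPipeline('prep')
toy.set_context('one_median', median(r['area'] for r in rows))
toy.add_step(center_area)
result = toy.run(rows)
```

The step never receives the median as an argument; it asks the pipeline for it at run time, which is why the value has to be stored with the pipeline rather than in a notebook variable.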

### Dependencies

Pipelines can be chained so that one runs after another. Chaining is expressed by declaring dependencies with `add_dependency()`.

Create another pipeline:

In [None]:
# TODO: change this to a Fahrenheit conversion
dtp = builder.pipeline('depends_on_prep_pipeline')

def diff_temp_rain(pipeline, df):
    # store the difference between temperature and rainfall in a new column
    df['c4'] = df['temp'] - df['rain']

dtp.add_step(diff_temp_rain)

The `prep` pipeline must run before this step, so `dtp` depends on `tp`:

In [None]:
dtp.add_dependency(tp)

Now that the second pipeline is set up, run `dtp`; this causes `tp` to run as well.

In [None]:
dtp.run(df)
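
The ordering implied by a dependency can be sketched in plain Python. This is an illustration under stated assumptions, not Cortex code: `run_pipeline`, the `pipelines` dictionary layout, and `scale_area` are hypothetical, and the sketch assumes that dependencies run before the dependent pipeline's own steps and that there are no dependency cycles.

```python
def run_pipeline(name, pipelines, row, order):
    """Run the dependencies of `name` first, then its own steps."""
    if name in order:  # already run during this call chain
        return row
    for dep in pipelines[name]['deps']:
        run_pipeline(dep, pipelines, row, order)
    for step in pipelines[name]['steps']:
        step(row)
    order.append(name)  # record completion order for inspection
    return row


def scale_area(row):
    row['area'] = row['area'] * 100


def diff_temp_rain(row):
    row['c4'] = row['temp'] - row['rain']


pipelines = {
    'prep': {'deps': [], 'steps': [scale_area]},
    'depends_on_prep_pipeline': {'deps': ['prep'], 'steps': [diff_temp_rain]},
}

order = []
row = {'area': 0.5, 'temp': 18.0, 'rain': 0.5}
run_pipeline('depends_on_prep_pipeline', pipelines, row, order)
```

After the call, `order` lists `prep` before `depends_on_prep_pipeline`, which is the behavior the dependency is meant to guarantee.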

## Builders vs Datasets

A dataset persists between kernel invocations, and so do the pipelines attached to it. A pipeline constructed directly from a builder exists only in memory, so it is lost when the kernel restarts.

In [None]:
data_set = builder.dataset('sample').title('Persistence Sample Data')\
    .from_df(df).build()

data_frame = data_set.as_pandas()
data_frame.tail()

In [None]:
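# Assumption: `add_one` is not defined earlier in this notebook; this placeholder increments `area`
def add_one(pipeline, df):
    df['area'] = df['area'] + 1

dsp = data_set.pipeline('dsp')
dsp.reset()  # see below for an explanation of pipeline reset
dsp.add_step(add_one)
data_set.save()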

When the Python kernel is reset, all in-memory data is cleared. However, the dataset, along with its pipeline, is persisted.

In [None]:
# WARNING!! `%reset -f` clears every user-defined name in the namespace without asking for confirmation!
%reset -f

Notice that the test pipeline is gone:

In [None]:
try:
    tp
except NameError:
    print('No tp')

Next we demonstrate that the `sample` dataset is persisted, as is the `dsp` pipeline. To do so, re-import the required libraries and re-create the Cortex builder:

In [None]:
import numpy as np
import pandas as pd
from cortex import Cortex

cortex = Cortex.local()
builder = cortex.builder()

Get the dataset, using the name we previously specified:

In [None]:
data_set = cortex.dataset('sample')

Get the pipeline, again using the name we previously specified:

In [None]:
new_pipeline = data_set.pipeline('dsp')

Get the dataframe from the dataset and then run the pipeline. 

In [None]:
df = data_set.as_pandas()
new_pipeline.run(df)

Notice that the `add_one` function is still present in the `new_pipeline`.
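
One way a step function can outlive the namespace that defined it is to store its compiled code next to the pipeline definition. The sketch below is an assumption made for illustration and does not describe how Cortex actually stores steps: it uses the standard-library `marshal` module (whose format is specific to the Python version), a temporary JSON file, and a hypothetical document layout with `name` and `steps` keys.

```python
import base64
import json
import marshal
import os
import tempfile
import types


def add_one(pipeline, row):
    row['area'] = row['area'] + 1


# "Save": serialize the function's code object into a JSON document.
doc = {
    'name': 'dsp',
    'steps': [{
        'name': add_one.__name__,
        'code': base64.b64encode(marshal.dumps(add_one.__code__)).decode('ascii'),
    }],
}
path = os.path.join(tempfile.mkdtemp(), 'dsp.json')
with open(path, 'w') as fh:
    json.dump(doc, fh)

del add_one  # simulate losing the in-memory definition

# "Load": rebuild a callable from the stored code object.
with open(path) as fh:
    loaded = json.load(fh)
step = loaded['steps'][0]
code = marshal.loads(base64.b64decode(step['code']))
restored = types.FunctionType(code, globals(), step['name'])

row = {'area': 41}
restored(None, row)  # the pipeline argument is unused by this step
```

This only works for simple functions without closures, and the stored bytes can only be loaded by the same Python version that wrote them.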

## `reset()` affects pipeline state

Without parameters, `reset()` removes all steps in the pipeline.

In [None]:
new_pipeline.to_camel()

In [None]:
new_pipeline.reset()

new_pipeline.to_camel()

`reset()` can also delete all context and dependencies.

In [None]:
another_new_pl = data_set.pipeline('dsp')

Add a context:

In [None]:
another_new_pl.set_context('for_example', 'a string')

Add a dependency:

In [None]:
tp = builder.pipeline('test_pipeline')
another_new_pl.add_dependency(tp)

another_new_pl.to_camel()

Next reset everything, including pipeline steps, dependencies, and context:

In [None]:
another_new_pl.reset(reset_deps=True, reset_context=True)

another_new_pl.to_camel()
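
The two reset variants shown in this section can be summarized with a small model. This is a sketch of the described behavior only: the `reset` function and the dictionary representation of pipeline state are hypothetical, while the `reset_deps` and `reset_context` parameter names are taken from the call above.

```python
def reset(state, reset_deps=False, reset_context=False):
    """Return a copy of `state` with steps cleared, plus optional extras."""
    new_state = dict(state)
    new_state['steps'] = []  # steps are always removed
    if reset_deps:
        new_state['dependencies'] = []
    if reset_context:
        new_state['context'] = {}
    return new_state


state = {
    'steps': ['add_one'],
    'dependencies': ['test_pipeline'],
    'context': {'for_example': 'a string'},
}

steps_only = reset(state)
everything = reset(state, reset_deps=True, reset_context=True)
```

With no arguments only the steps are cleared, so context and dependencies survive; passing both flags leaves an empty pipeline.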