# Flor

## Prepare your environment before starting the activities

We're going to start by importing Flor and letting it know the name of our notebook.

In [None]:
# Import Flor
import flor

# If the notebook name has not already been set, you can set it in code.
flor.setNotebookName('tutorial_2.ipynb')

## Interpreting someone else's work in Flor

In this next exercise, as in many "real-world" cases, you'll be joining an in-progress model development effort. Bob, a fellow member of your team, has already attempted two different data-preprocessing steps.

Run the cell below, and notice that we are using a different experiment name (`bob_preproc` rather than `risecamp_demo`). Here, we summarize someone else's past experiment versions.

In [None]:
flor.Experiment('bob_preproc').summarize()

Let's interpret the output. The first column, `utag`, lists the different versions of the experiment by name. There are two past versions of the experiment `bob_preproc`: `first_preproc` and `second_preproc`. Now, pause for a second, run the next cell, and continue reading.

In [None]:
flor.Experiment('bob_preproc').plot('first_preproc')

We can now see the structure of the dataflow graph. It contains four artifacts: `preprocess`, `data_loc`, `intermediate_X`, and `intermediate_y`.

Next, we inspect the structure of the dataflow graph for the second version:

In [None]:
flor.Experiment('bob_preproc').plot('second_preproc')

We see that both node-link diagrams look the same. This means that the structure of the different experiment versions is the same; however, it is very likely that the contents of the computation graph differ. To see where the difference is, we `diff` the two versions of Bob's experiment.

In [None]:
flor.Experiment('bob_preproc').diff('first_preproc', 'second_preproc')

We see that `preprocess.py` was modified, so Bob probably tried two different preprocessing techniques.
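To make concrete what a diff between two versions of a script conveys, here is a minimal sketch using Python's standard `difflib` module. It is not how Flor computes its diff, and the two script bodies below are invented for illustration; they are not Bob's actual `preprocess.py`.

```python
import difflib

# Hypothetical contents of two versions of a preprocessing script.
# These strings are made up for illustration; they are not Bob's code.
first_version = """\
def preprocess(text):
    text = text.lower()
    return text
"""

second_version = """\
def preprocess(text):
    text = text.lower()
    text = text.strip()
    return text
"""

# unified_diff yields header lines, hunk markers, and one line per
# unchanged (' '), removed ('-'), or added ('+') source line.
diff_lines = list(
    difflib.unified_diff(
        first_version.splitlines(),
        second_version.splitlines(),
        fromfile="first_preproc/preprocess.py",
        tofile="second_preproc/preprocess.py",
        lineterm="",
    )
)

print("\n".join(diff_lines))
```

Lines prefixed with `+` exist only in the second version, which is the kind of evidence that lets us infer a change in preprocessing technique.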

We could continue auditing Bob's work with Flor, which would be an interesting and worthwhile activity on its own. For the purposes of this tutorial, however, we will start by _using_ the preprocessed data Bob created, and inspect it only if we need to.

## Using someone else's work in Flor

Earlier, we brought ourselves up to speed on the preprocessing work our colleague Bob had done. We will now use Bob's preprocessed data instead of preprocessing the data ourselves. This shows how two different Flor users can share their experiments, along with the artifacts those experiments derive or consume.

Here's a reminder of what the previous experiment versions look like:

In [None]:
flor.Experiment('bob_preproc').summarize()

Below, we have copied the pipeline you're already familiar with. As before, we highlight the changes with `###`.

In [None]:
@flor.func              ##############  ##############  ############
def split_train_and_eval(intermediate_X, intermediate_y, n_estimators, max_depth, **kwargs):
                        ##############  ##############  ############
    import pandas as pd
    import json

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import classification_report
            
              ##############
    with open(intermediate_X) as json_data:
              ##############
        X = json.load(json_data)

              ##############
    with open(intermediate_y) as json_data:
              ##############
        y = json.load(json_data)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.20, random_state=92)

    vectorizer = TfidfVectorizer()
    vectorizer.fit(X_tr)
    X_tr = vectorizer.transform(X_tr)
    X_te = vectorizer.transform(X_te)
    
                                              ############
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth).fit(X_tr, y_tr)
                                              ############
    
    y_pred = clf.predict(X_te)

    score = clf.score(X_te, y_te)
    print(score)
    
    return {'score': score}
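The function above reads its two inputs from JSON files at the paths it receives. The following is a minimal, standard-library sketch of that load step on its own. The file contents are invented stand-ins, and the temporary directory is only there so the example is self-contained; the point is that the `with` statement closes the file when its block exits, so no explicit `close()` call is required.

```python
import json
import os
import tempfile

# Hypothetical stand-in for a preprocessed artifact: a JSON list of documents.
documents = ["good movie", "bad movie", "great plot"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data_clean_X.json")

    # Write the stand-in data so there is something to load.
    with open(path, "w") as out_file:
        json.dump(documents, out_file)

    # Load it back the same way the pipeline function does.
    with open(path) as json_data:
        X = json.load(json_data)

    # The file object is already closed once the with block has exited.
    file_closed_after_with = json_data.closed

print(X)
print(file_closed_after_with)
```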

In [None]:
with flor.Experiment('bob_preproc') as bob, flor.Experiment('risecamp_demo') as ex:
    data_x = bob.artifact('data_clean_X.json', 'intermediate_X', utag="first_preproc")
    data_y = bob.artifact('data_clean_y.json', 'intermediate_y', utag="first_preproc")
    
    n_estimators = ex.literal(7, 'n_estimators')
    max_depth = ex.literal(100, 'max_depth')
    do_split_train_and_eval = ex.action(split_train_and_eval, [data_x, data_y, n_estimators, max_depth])
    score = ex.literal(name='score', parent=do_split_train_and_eval)
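The cell above declares a graph of literals, artifacts, and an action; the later `score.pull(...)` call is what runs it. As a rough mental model only, the sketch below shows pull-based evaluation in plain Python. The `Literal`, `Action`, and `Output` classes and the `toy_train` function are hypothetical and written for this example; they are not Flor's classes and make no claim about how Flor is implemented.

```python
from typing import Any, Callable, Dict, List


class Literal:
    """A named constant in the toy graph (hypothetical, not Flor's class)."""

    def __init__(self, value: Any, name: str) -> None:
        self.value = value
        self.name = name

    def pull(self) -> Any:
        return self.value


class Action:
    """A node that calls a function on the pulled values of its inputs."""

    def __init__(self, func: Callable[..., Dict[str, Any]], inputs: List[Literal]) -> None:
        self.func = func
        self.inputs = inputs
        self.call_count = 0  # counts how often the function has actually run

    def pull(self) -> Dict[str, Any]:
        # Resolve every upstream node first, then run the function.
        kwargs = {node.name: node.pull() for node in self.inputs}
        self.call_count += 1
        return self.func(**kwargs)


class Output:
    """A named value produced by a parent action."""

    def __init__(self, name: str, parent: Action) -> None:
        self.name = name
        self.parent = parent

    def pull(self) -> Any:
        return self.parent.pull()[self.name]


def toy_train(n_estimators: int, max_depth: int, **kwargs: Any) -> Dict[str, int]:
    # Stand-in for training: returns a made-up number, not a model score.
    return {"toy_score": n_estimators * max_depth}


# Declaring the graph does not run anything yet.
n_estimators = Literal(7, "n_estimators")
max_depth = Literal(100, "max_depth")
do_train = Action(toy_train, [n_estimators, max_depth])
toy_score = Output("toy_score", do_train)
calls_before_pull = do_train.call_count

# Pulling the output walks the graph upstream and executes the action.
result = toy_score.pull()
calls_after_pull = do_train.call_count

print(calls_before_pull, calls_after_pull, result)
```

In this toy model, work happens only when an output is pulled, which is why building the graph and running it are separate steps.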


In [None]:
score.plot()

In [None]:
# Run the experiment
score.pull('third_pull')

In [None]:
flor.Experiment('risecamp_demo').summarize()

## Pull again, trying a different dataset

In [None]:
flor.Experiment('bob_preproc').summarize()

In [None]:
data_x.version = "second_preproc"
data_y.version = "second_preproc"

In [None]:
score.plot()

In [None]:
score.pull('fourth_pull')

In [None]:
flor.Experiment('risecamp_demo').summarize()