Turn our model into a serverless prediction service that runs on its own, generates predictions automatically, and shows them in a user interface.

# Iris Flower - Feature Pipeline

In this notebook we will:

1. Run in either "Backfill" or "Normal" mode.
2. Load our DataFrame:
   - if *BACKFILL==True*, with historical data from the Iris dataset (loaded here from sklearn; the iris.csv file is an alternative source);
   - if *BACKFILL==False*, with one synthetic Iris flower sample.
3. Write our DataFrame to a Feature Group.

In [1]:
import random
import pandas as pd
import hopsworks

from sklearn.datasets import load_iris

BACKFILL=False

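What is backfill and what do we use it for?

Backfilling means populating the feature group with historical data, as opposed to adding newly arriving samples. Set **BACKFILL=True** if you want to create features from the historical Iris dataset. Leave it as **False** to generate a single synthetic sample on each run.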
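Because the pipeline is meant to run unattended, the flag could be read from the environment instead of being edited by hand. The following is a minimal sketch under that assumption. The environment variable name `BACKFILL` and the helper `read_backfill_flag` are hypothetical and not part of the original notebook.

```python
import os


def read_backfill_flag(env=None, default=False):
    """Return True when the (hypothetical) BACKFILL environment variable is truthy."""
    env = os.environ if env is None else env
    raw = env.get("BACKFILL")
    if raw is None:
        # Variable not set: fall back to normal (non-backfill) mode
        return default
    return raw.strip().lower() in {"1", "true", "yes"}


BACKFILL = read_backfill_flag()
```

With this in place, a scheduled run would stay in normal mode unless the variable is explicitly set to a truthy value.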

## Synthetic data

We'll define a function that creates synthetic data.
It runs each time the pipeline executes in normal (non-backfill) mode and returns a DataFrame containing a single Iris flower sample.

In [16]:
def generate_new_iris_flower(name, sepal_len_max, sepal_len_min, sepal_width_max, 
                             sepal_width_min, petal_len_max, petal_len_min,
                             petal_width_max, petal_width_min):
    """
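    Returns a single iris flower as a single row in a DataFrame.
    Each feature is drawn uniformly between its min and max bound
    (random.uniform accepts the bounds in either order).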
    """
    df = pd.DataFrame({ "sepal_length": [random.uniform(sepal_len_max, sepal_len_min)],
                       "sepal_width": [random.uniform(sepal_width_max, sepal_width_min)],
                       "petal_length": [random.uniform(petal_len_max, petal_len_min)],
                       "petal_width": [random.uniform(petal_width_max, petal_width_min)]
                      })
    df['species'] = name
    return df

In [22]:
def get_random_iris_flower():
    """
    Returns a DataFrame containing one random iris flower
    """
    setosa_df =  generate_new_iris_flower("setosa", 5.8, 4.3, 4.5, 2.3, 1.9, 1, 2.5, 0.1)
    versicolor_df = generate_new_iris_flower("versicolor", 7, 4.9, 3.4, 2.0, 5.1, 3, 1.8, 1.0)
    virginica_df = generate_new_iris_flower("virginica", 7.9, 4.9, 3.8, 2.2, 6.9, 4.5, 2.5, 1.4)
    
    # randomly pick one of these 3 species with equal probability
    pick_random = random.uniform(0,3)
    if pick_random >= 2:
        iris_df = virginica_df
    elif pick_random >= 1:
        iris_df = versicolor_df
    else:
        iris_df = setosa_df

    return iris_df

In [23]:
get_random_iris_flower()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.399762 | 3.089884 | 1.799603 | 1.456563 | setosa |
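The generator passes each range to `random.uniform` as (max, min). That works because `random.uniform` accepts its bounds in either order. The small sketch below checks this for the setosa sepal-length range used above. The seed and the helper name `draw` are only for illustration.

```python
import random

random.seed(42)  # illustrative seed so the check is repeatable


def draw(upper, lower, n=1000):
    """Draw n samples with the bounds passed as (max, min), as in the notebook."""
    return [random.uniform(upper, lower) for _ in range(n)]


samples = draw(5.8, 4.3)

# Check that every sample stays inside the closed interval [4.3, 5.8]
all_in_range = all(4.3 <= s <= 5.8 for s in samples)
print(all_in_range)
```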


## Backfill or create new synthetic input data

This pipeline can be run in either backfill mode, which loads the full Iris dataset, or synthetic data mode, which generates a single new sample.

In [24]:
# Get the data from sklearn dataset
def get_iris():
    """
    This function loads the iris dataset from sklearn datasets
    """
    # Load the dataset and get the features and labels
    dataset = load_iris(as_frame=True)
    feature = dataset.data
    label = dataset.target

    # Merge the features and labels into one DataFrame
    df = feature.copy()
    df["species"] = label

    # Convert the numeric target into actual class names
    target = {
        0: "setosa",
        1: "versicolor",
        2: "virginica"
    }
    
    df['species'] = df["species"].map(target)

    rename_col = {
        "sepal length (cm)": "sepal_length",
        "sepal width (cm)": "sepal_width",
        "petal length (cm)": "petal_length",
        "petal width (cm)": "petal_width"
    }
    df = df.rename(columns=rename_col)
   
    return df

In [31]:
if BACKFILL:
    # iris_df = pd.read_csv("https://repo.hops.works/master/hopsworks-tutorials/data/iris.csv")
    iris_df = get_iris()
else:
    iris_df = get_random_iris_flower()
    
iris_df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.190488 | 3.051881 | 4.700561 | 1.441109 | versicolor |


## Authenticate with Hopsworks using your API Key

Hopsworks will prompt you to paste in your API key and provide you with a link to find your API key if you have not stored it securely already.

What is a Hopsworks API key?
It is a credential that lets users interact with the Hopsworks platform without entering a username and password.

In [8]:
project = hopsworks.login()
fs = project.get_feature_store()

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/70809
Connected. Call `.close()` to terminate connection gracefully.


## Create and write to a feature group - primary keys

To prevent duplicate entries, Hopsworks requires that each feature group has a *primary_key*.
A *primary_key* is one or more columns that uniquely identify a row. Here, we assume
that each Iris flower has a unique combination of ("sepal_length","sepal_width","petal_length","petal_width")
feature values. If you randomly generate a sample that already exists in the feature group, the insert operation will fail.

The *feature group* will create its online schema using the schema of the Pandas DataFrame.
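Before inserting, it can help to check that the chosen primary-key columns really are unique in the DataFrame. Here is a minimal pandas sketch on a toy DataFrame. The data and the helper name `find_duplicate_keys` are made up for illustration and are not part of the Hopsworks API.

```python
import pandas as pd

PRIMARY_KEY = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def find_duplicate_keys(df, key_columns):
    """Return the rows whose primary-key values occur more than once."""
    return df[df.duplicated(subset=key_columns, keep=False)]


# Toy data: the first and third rows share the same four feature values
toy_df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.1],
    "sepal_width": [3.5, 3.0, 3.5],
    "petal_length": [1.4, 1.4, 1.4],
    "petal_width": [0.2, 0.2, 0.2],
    "species": ["setosa", "setosa", "setosa"],
})

duplicates = find_duplicate_keys(toy_df, PRIMARY_KEY)
print(len(duplicates))  # 2: the first and third rows share the same key
```

If the result is non-empty, the assumption that the four measurements identify a flower uniquely does not hold for that DataFrame.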

In [32]:
iris_fg = fs.get_or_create_feature_group(name="sklearn_iris_dataset",
                                  version=1,
                                  primary_key=["sepal_length", "sepal_width", "petal_length", "petal_width"],
                                  description="Iris flower dataset from sklearn datasets"
                                 )
iris_fg.insert(iris_df)

Uploading Dataframe: 0.00% |          | Rows 0/1 | Elapsed Time: 00:00 | Remaining Time: ?

Launching job: sklearn_iris_dataset_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai/p/70809/jobs/named/sklearn_iris_dataset_1_offline_fg_materialization/executions


(<hsfs.core.job.Job at 0x7fabad72e6e0>, None)

In [None]:
import time

time.sleep(120)

The sleep above gives the materialization job time to complete. Make sure the job has finished running before you move on to the next step.
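A fixed two-minute sleep may be longer or shorter than the job actually needs. One alternative is to poll until the job reports completion. The helper below is a generic sketch: `wait_until` and the `is_finished` callable are hypothetical, and how you query the job state depends on your Hopsworks client version.

```python
import time


def wait_until(is_finished, timeout=300.0, interval=5.0):
    """Poll is_finished() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_finished():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real job-state check: reports "finished" on the third poll.
polls = {"count": 0}


def fake_job_finished():
    polls["count"] += 1
    return polls["count"] >= 3


finished = wait_until(fake_job_finished, timeout=5.0, interval=0.01)
print(finished)  # True
```

In a real run, the stand-in would be replaced by a function that asks Hopsworks for the state of the materialization job.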