# Saving and Loading a Model Notebook

You can save and/or load an ESM using three different options:
1. **Workspace**: you can save and load an ESM to and from your workspace.
2. **MLflow Run**: you can load an ESM from an MLflow run.
3. **MLflow Model Registry**: if you are working on Databricks, you can register and load an ESM to and from the MLflow Model Registry.

# Check Environment Variables
Before installing Hybrid Intelligence in the notebook, set the following environment variables externally, as described in the User Guide: https://docs.umnai.com/set-up-your-environment.
This section checks that the environment variables are present and raises an error if any of them is missing.

In [1]:
import os

umnai_env_vars = {
    'UMNAI_CLIENT_ID',
    'UMNAI_CLIENT_SECRET',
    'PIP_EXTRA_INDEX_URL',
}

if not umnai_env_vars.issubset(os.environ.keys()):
    raise ValueError(
        'UMNAI environment variables not set correctly. They need to be set before using the Umnai library.'
    )
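
When the check fails, it can help to know exactly which variable is absent. The following is a minimal sketch of that idea; the helper name `missing_env_vars` is hypothetical and is not part of the UMNAI library.

```python
import os

REQUIRED_VARS = {'UMNAI_CLIENT_ID', 'UMNAI_CLIENT_SECRET', 'PIP_EXTRA_INDEX_URL'}

def missing_env_vars(required, environ=None):
    """Return the sorted names in `required` that are absent from `environ`.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to try out without touching the real environment.
    """
    environ = os.environ if environ is None else environ
    return sorted(set(required) - set(environ))

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print('Missing environment variables:', ', '.join(missing))
```

Like the check above, this only confirms that the variables exist; it does not validate their values.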

# Install Hybrid Intelligence
Next we install the UMNAI Platform.

In [2]:
%pip install umnai-platform --quiet

Note: you may need to restart the kernel to use updated packages.


# Set Workspace Paths According to Your Environment
Now we will set the workspace path and the experiment path automatically. They will be set to a local path if you are using a local machine environment or to a Databricks path if you are using a Databricks environment.

## Install Databricks SDK

This cell checks whether you are running on Databricks and, if so, installs the Databricks SDK.

In [3]:
import os
if os.environ.get('DATABRICKS_RUNTIME_VERSION') is not None:
    %pip install databricks-sdk --quiet

If you are on Databricks, you can choose whether the workspace is created in the shared area (available to all users in your account) or in your personal user account area. You can ignore this setting if you are running in a local environment.

In [4]:
# Set to 1 if you want to use shared or 0 to use personal user account area.
USE_SHARED_WORKSPACE = 1 

## Set Paths
Next the workspace and experiment paths are set automatically.

In [5]:
import os

EXP_NAME = 'saveloadESM_adult_income'
if os.environ.get('DATABRICKS_RUNTIME_VERSION') is not None:
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()

    # For a Databricks Environment
    WS_PATH = '/dbfs/FileStore/workspaces/'+EXP_NAME
    if USE_SHARED_WORKSPACE:
        EXP_PREFIX = '/Shared/experiments/'
    else:
        USERNAME = dbutils.notebook.entry_point.getDbutils().notebook().getContext().userName().get()
        EXP_PREFIX = f'/Users/{USERNAME}/experiments/'
    w.workspace.mkdirs(EXP_PREFIX)
    EXP_PATH = EXP_PREFIX + EXP_NAME
else:
    # For a Local Machine Environment
    WS_PATH = 'resources/workspaces/'+EXP_NAME
    EXP_PATH = EXP_NAME
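
The path rules in the cell above can also be written as a pure function, which makes them easy to check without a Databricks session. This is only a sketch: `resolve_paths` and its parameters are hypothetical names, and unlike the cell above it does not create the experiment directory.

```python
def resolve_paths(exp_name, on_databricks, use_shared=True, username=None):
    """Return (workspace_path, experiment_path) following the rules above."""
    if not on_databricks:
        # Local machine: workspace under resources/, experiment named directly
        return 'resources/workspaces/' + exp_name, exp_name
    if use_shared:
        prefix = '/Shared/experiments/'
    else:
        if not username:
            raise ValueError('username is required for a personal workspace')
        prefix = f'/Users/{username}/experiments/'
    return '/dbfs/FileStore/workspaces/' + exp_name, prefix + exp_name

print(resolve_paths('saveloadESM_adult_income', on_databricks=False))
```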

# Import and Prepare Dataset
Import the dataset into a Pandas DataFrame and clean the data in preparation for onboarding into Hybrid Intelligence.

In [6]:
import pandas as pd
import numpy as np

# Import Adult Income Dataset to pandas dataframe: 
# This dataset can be downloaded from https://archive.ics.uci.edu/dataset/2/adult 
column_names = ["Age", "WorkClass", "fnlwgt", "Education", "EducationNum", "MaritalStatus", "Occupation", "Relationship", "Race", "Gender", "CapitalGain", "CapitalLoss", "HoursPerWeek", "NativeCountry", "Income"]
dataset_df = pd.read_csv('https://raw.githubusercontent.com/umnaibase/umnai-examples/main/data/adult.data', names = column_names)

# Data Preparation:
dataset_df = dataset_df.apply(lambda x: x.str.strip() if x.dtype == 'object' else x)    # Remove whitespaces
dataset_df["Income"] = np.where((dataset_df["Income"] == '<=50K'), 0, 1)                # Replace Target values with [0,1]
dataset_df.tail(5)

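|       | Age | WorkClass | fnlwgt | Education | EducationNum | MaritalStatus | Occupation | Relationship | Race | Gender | CapitalGain | CapitalLoss | HoursPerWeek | NativeCountry | Income |
|-------|-----|-----------|--------|-----------|--------------|---------------|------------|--------------|------|--------|-------------|-------------|--------------|---------------|--------|
| 32556 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | 0 |
| 32557 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | 1 |
| 32558 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | 0 |
| 32559 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | 0 |
| 32560 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | 1 |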
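
The order of the two preparation steps matters: the raw file pads its string values with a leading space, so the target comparison against `'<=50K'` only matches after the whitespace is stripped. The sketch below illustrates this on a two-row stand-in DataFrame (the sample values are made up for illustration). It uses `pd.api.types.is_string_dtype` instead of the `dtype == 'object'` test, which is a small variation intended to also cover pandas' dedicated string dtypes.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the raw data, including the leading spaces
sample_df = pd.DataFrame({
    'Age': [39, 50],
    'WorkClass': [' State-gov', ' Private'],
    'Income': [' <=50K', ' >50K'],
})

# Step 1: strip whitespace from string columns only; numeric columns pass through
sample_df = sample_df.apply(
    lambda col: col.str.strip() if pd.api.types.is_string_dtype(col) else col
)

# Step 2: encode the target as 0 for '<=50K' and 1 otherwise
sample_df['Income'] = np.where(sample_df['Income'] == '<=50K', 0, 1)

print(sample_df)
```

If step 2 ran first, no value would equal `'<=50K'` exactly and every row would be labelled 1.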


# Create or Open a Hybrid Intelligence Workspace
Workspaces are used by the Hybrid Intelligence framework to organize your data and models together in one place.

In [7]:
from umnai.workspaces.context import Workspace

# Open a workspace
ws = Workspace.open(
    path = WS_PATH,
    experiment = EXP_PATH
)

ws # Prints workspace details to confirm created/opened

2023-08-30 14:20:00.752293: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-08-30 14:20:00.783083: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-08-30 14:20:00.784063: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


<umnai.workspaces.context.WorkspaceContext at 0x7f63c94aab50>

# Onboard Hybrid Intelligence Dataset

Onboard the Pandas DataFrame into a Hybrid Intelligence dataset.

In [8]:
from umnai.data.datasets import Dataset
from umnai.data.enums import PredictionType

dataset = Dataset.from_pandas(
    dataset_df,
    prediction_type=PredictionType.CLASSIFICATION,
    features=list(dataset_df.drop(['Income'], axis=1).columns),    # All columns except 'Income' are features
    targets=['Income'],
)

dataset # Prints dataset details to confirm created/opened

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/30 14:20:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


[ObservationSpec] - MLFLOW Run ID: 9ffa04600c3c48b1bac60746220d0a1d:   0%|          | 0/60 [00:00<?, ?it/s]

23/08/30 14:20:52 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.


Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method


2023-08-30 14:21:16.224743: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'fnlwgt' with dtype int64 and shape [?,1]
	 [[{{node fnlwgt}}]]
2023-08-30 14:21:16.376154: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'HoursPerWeek' with dtype int64 and shape [?,1]
	 [[{{node HoursPerWeek}}]]
2023-08-30 14:21:16.412195: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'EducationNum' with dtype int64 and shape [?,1]
	 [[{{node EducationNum}}]

INFO:tensorflow:Assets written to: /opt/atlassian/pipelines/agent/build/demo-notebooks/resources/workspaces/saveloadESM_adult_income/preprocessing/dataset_name=Dataset_56d209eb/assets


Dataset(id=aa99f238-65ee-4fb3-86ff-734093d7beff; name=Dataset_56d209eb; is_named=False; workspace_id=None)

# Induce a Hybrid Intelligence Model

First create the ModelInducer to set up the induction parameters and settings. Then simply use the ModelInducer to induce the ESM from the onboarded dataset.

In [9]:
from umnai.induction.inducer import ModelInducer
from umnai.esm.model import ESM

# Induce a simple model quickly using fast execution parameters
model_inducer = ModelInducer(
    max_interactions=3,
    max_interaction_degree=2,
    max_polynomial_degree=2,
    trials=2,
    estimators=2,
    batch_size=512,
    iterations=2,
)

# # Induce a more realistic model using default Induction parameters:
# model_inducer = ModelInducer()

# Create an ESM using Induction
esm = model_inducer.induce(dataset)

[Modules] - MLFLOW Run ID: 2f2b63ca68b2492ab41b596df6ac85ea:   0%|          | 0/25 [00:00<?, ?it/s]



Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method

2023-08-30 14:27:29.853462: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'fnlwgt' with dtype int64 and shape [?,1]
	 [[{{node fnlwgt}}]]
2023-08-30 14:27:30.015671: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'HoursPerWeek' with dtype int64 and shape [?,1]
	 [[{{node HoursPerWeek}}]]
2023-08-30 14:27:30.053726: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'EducationNum' with dtype int64 and shape [?,1]
	 [[{{node EducationNum}}]

INFO:tensorflow:Assets written to: /tmp/tmpvhdmrkga/model/data/model/assets


# Note ESM ID and MLflow run ID
Note down:
- ESM ID: needed to load a model from a workspace.
- MLflow run ID: needed to load a model from an MLflow run.

In [10]:
# Save ESM ID and MLflow Run ID for use in this notebook session
MLflow_Run_ID = esm.producer_run_id
ESM_ID = esm.id

# Note ESM ID and MLflow Run ID for use in another notebook or session
print("MLflow Run ID: ", esm.producer_run_id)
print("ESM ID: ", esm.id)

MLflow Run ID:  2f2b63ca68b2492ab41b596df6ac85ea
ESM ID:  Dataset_56d209eb_2f2b63ca68b2492ab41b596df6ac85ea
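
Rather than copying the printed IDs by hand, you could write them to a small file and read them back in another notebook or session. This is a sketch using only the Python standard library; the file name `esm_ids.json` and the helpers `save_ids` and `load_ids` are hypothetical, and the literal IDs are the ones printed above (in practice you would pass `esm.id` and `esm.producer_run_id`).

```python
import json
import tempfile
from pathlib import Path

def save_ids(path, esm_id, run_id):
    """Write the ESM ID and MLflow run ID to a JSON file."""
    payload = {'esm_id': esm_id, 'mlflow_run_id': run_id}
    Path(path).write_text(json.dumps(payload, indent=2))

def load_ids(path):
    """Read back (esm_id, mlflow_run_id) from a file written by save_ids."""
    data = json.loads(Path(path).read_text())
    return data['esm_id'], data['mlflow_run_id']

ids_file = Path(tempfile.gettempdir()) / 'esm_ids.json'
save_ids(
    ids_file,
    esm_id='Dataset_56d209eb_2f2b63ca68b2492ab41b596df6ac85ea',
    run_id='2f2b63ca68b2492ab41b596df6ac85ea',
)
print(load_ids(ids_file))
```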


# Model Save Options

## Save a model to a workspace

In [12]:
# Save the ESM to your workspace
esm.save_to_workspace()



2023-08-30 14:28:23.634512: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'fnlwgt' with dtype int64 and shape [?,1]
	 [[{{node fnlwgt}}]]
2023-08-30 14:28:23.783604: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'HoursPerWeek' with dtype int64 and shape [?,1]
	 [[{{node HoursPerWeek}}]]
2023-08-30 14:28:23.820532: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INVALID_ARGUMENT: You must feed a value for placeholder tensor 'EducationNum' with dtype int64 and shape [?,1]
	 [[{{node EducationNum}}]

INFO:tensorflow:Assets written to: /opt/atlassian/pipelines/agent/build/demo-notebooks/resources/workspaces/saveloadESM_adult_income/models/Dataset_56d209eb_2f2b63ca68b2492ab41b596df6ac85ea/assets


## Register a Model on MLflow Registry

**NOTE:** The MLflow Registry is only supported on Databricks. This cell will only run on Databricks.

You can register an ESM on the MLflow Model Registry by calling `register_mlflow()` and passing the desired model name. If `name` is not specified, the experiment run name is used. If a model with the same name is already registered, a new version is created.

In [13]:
# Only runs on Databricks
import os

if os.environ.get('DATABRICKS_RUNTIME_VERSION') is not None:
    # Register ESM on MLflow Registry
    esm.register_mlflow(name='AdultIncome')


# Model Load Options

## Load a model from a workspace

To load an ESM from your workspace, you will need the ESM ID.

In [14]:
# Use the ESM_ID previously obtained after the induction step above
esm_ws = ESM.from_workspace(id=ESM_ID)

# Check ESM was loaded by comparing its ID
if esm_ws.id == ESM_ID: 
    print("\nLoad from Workspace SUCCESSFUL.")
else:
    print("\nLoad from Workspace NOT SUCCESSFUL.")


Load from Workspace SUCCESSFUL.


## Load a model from an MLflow Run

To load an ESM from an MLflow Run, you will need the MLflow Run ID.

In [15]:
# Use the MLflow_Run_ID previously obtained after the induction step above
esm_mlrun = ESM.from_mlflow_run(run_id=MLflow_Run_ID)

# Check ESM was loaded by comparing its ID
if esm_mlrun.id == ESM_ID: 
    print("\nLoad from MLflow run SUCCESSFUL.")
else:
    print("\nLoad from MLflow run NOT SUCCESSFUL.")


Load from MLflow run SUCCESSFUL.


## Load a model from the MLflow Registry 

**NOTE:** The MLflow Registry is only supported on Databricks. This cell will only run on Databricks.

You can load an ESM from the MLflow Model Registry using `from_mlflow_registry()` and passing the desired model name and version. 

The code below automatically detects the latest model version and loads it.

In [16]:
import os

from mlflow import MlflowClient

# Only runs on Databricks
if os.environ.get('DATABRICKS_RUNTIME_VERSION') is not None:

    # Get the latest version number of a registered model.
    client = MlflowClient()
    # Look up the registered model by name
    registered_models = client.search_registered_models(filter_string="name='AdultIncome'")
    # Find the latest version; version values are strings, so compare them numerically
    latest_version = max(registered_models[0].latest_versions, key=lambda v: int(v.version)).version

    # Load latest version
    esm_mlreg = ESM.from_mlflow_registry(model_name='AdultIncome', model_version=latest_version)

    # Check ESM was loaded by comparing its ID
    if esm_mlreg.id == ESM_ID:
        print("\nLoad from MLflow Registry SUCCESSFUL.")
    else:
        print("\nLoad from MLflow Registry NOT SUCCESSFUL.")
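
When picking the latest version, note that version numbers compared as strings sort lexicographically, so `'9'` ranks above `'10'`. The sketch below shows a numeric comparison on stand-in objects; it assumes version values arrive as strings, and `SimpleNamespace` is used here only to imitate objects that have a `version` attribute.

```python
from types import SimpleNamespace

def latest_version(versions):
    """Return the highest version value, comparing versions as integers."""
    return max(versions, key=lambda v: int(v.version)).version

# Stand-ins for model version objects with string version numbers
versions = [SimpleNamespace(version='9'), SimpleNamespace(version='10')]

print(latest_version(versions))                        # numeric comparison
print(max(versions, key=lambda v: v.version).version)  # string comparison
```

The first call selects `'10'`, while the plain string comparison selects `'9'`.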
