
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px; height: 163px">
</div>

# Lab: Grid Search with MLflow

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) In this lab you:<br>
 - Import the housing data
 - Perform grid search using scikit-learn
 - Log the best model on MLflow
 - Load the saved model

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Setup & Classroom-Cleanup<br>

For each lesson to execute correctly, please make sure to run the **`Classroom-Setup`** cell at the start of each lesson (see the next cell) and the **`Classroom-Cleanup`** cell at the end of each lesson.

In [4]:
%run "../Includes/Classroom-Setup"

In [5]:


#######################################################################
#                                                                     #
#    This installs MLflow for you only on Databricks Runtime          #
#    NOTE: this code does not work with ML runtime (see below)        #
#                                                                     #
#######################################################################


dbutils.library.installPyPI("mlflow", "1.0.0")
dbutils.library.restartPython()



## Data Import

Load the same Airbnb data and create a train/test split.

In [7]:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("/dbfs/mnt/training/airbnb/sf-listings/airbnb-cleaned-mlflow.csv")
X_train, X_test, y_train, y_test = train_test_split(df.drop(["price"], axis=1), df[["price"]].values.ravel(), random_state=42)


## Perform Grid Search using scikit-learn

We want to know which combination of hyperparameter values is most effective. Fill in the code below to perform <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV" target="_blank">grid search using `sklearn`</a> over the two hyperparameters we looked at in the 02 notebook, `n_estimators` and `max_depth`.
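To get a feel for how much work a grid search does, the sketch below expands a small grid by hand using only the standard library. The name `param_grid` and the fold count are assumptions chosen to mirror a two-by-two grid with 3-fold cross-validation; this is an illustration of the idea, not how scikit-learn implements it internally.

```python
from itertools import product

# Assumed grid: two candidate values for each of the two hyperparameters
param_grid = {"n_estimators": [100, 1000], "max_depth": [10, 15]}
cv_folds = 3  # assumed number of cross-validation folds

# Build every combination of hyperparameter values (the Cartesian product)
names = list(param_grid)
combinations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]

# Each combination is trained once per fold during the search
total_cv_fits = len(combinations) * cv_folds

for combo in combinations:
    print(combo)
print("combinations:", len(combinations), "cross-validation fits:", total_cv_fits)
```

With its default `refit=True`, `GridSearchCV` also trains the winning combination once more on the full training set, so adding values to the grid multiplies the training cost quickly.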

In [9]:

# ANSWER
import mlflow.sklearn
from sklearn.ensemble import RandomForestRegressor
# ------------------------------------------------
from sklearn.model_selection import GridSearchCV

# dictionary containing hyperparameter names and list of values we want to try
parameters = {'n_estimators':[100,1000], 
              'max_depth':[10,15]}

rf = RandomForestRegressor()

grid_rf_model = GridSearchCV(rf, parameters, cv=3)

grid_rf_model.fit(X_train, y_train)

best_rf = grid_rf_model.best_estimator_

for p in parameters:
  print("Best '{}': {}".format(p, best_rf.get_params()[p]))
  
  

In [10]:

# Inspect the hyperparameter grid that was searched:

for key,value in parameters.items():
  print(key, " ::  ", value)



## Log Best Model on MLflow

Log the best model as `grid-random-forest-model`, its parameters, and its MSE metric under a run with name `RF-Grid-Search` in our new MLflow experiment.

In [12]:


# ANSWER

from sklearn.metrics import mean_squared_error


with mlflow.start_run(run_name="RF-Grid-Search") as run:
  
  
  # Create predictions of X_test using best model
  predictions = best_rf.predict(X_test)
  # will use this in a bit 
  
  
  # Log model
  mlflow.sklearn.log_model(best_rf, "grid-random-forest-model")
  
  
  # Log params
  model_params = best_rf.get_params()
  
  
  # Log the value the best model uses for each hyperparameter in the grid
  for p in parameters:
    mlflow.log_param(p, model_params[p])
  
  
  # Create and log MSE metrics using predictions of X_test and its actual value y_test
  mse = mean_squared_error(y_test, predictions)
  mlflow.log_metric("mse", mse)
  
  
  runID = run.info.run_id  # run_uuid is the older, deprecated name for this field
  artifactURI = mlflow.get_artifact_uri()
  print("Inside MLflow Run with id {} and artifact URI {}".format(runID, artifactURI))
  
  

In [13]:

# Print every parameter of the best model, not only the ones in the grid

for key,value in model_params.items():
  print(key,  "        " , value)


In [14]:
print(predictions)
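The `mse` metric logged above is the average of the squared differences between the actual and predicted prices. As a rough illustration of what `mean_squared_error` computes, here is a hand-rolled version run on made-up numbers. The function name and the sample prices are hypothetical and are not taken from the Airbnb data.

```python
def mean_squared_error_by_hand(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)

# Made-up prices for illustration only
actual_prices = [100.0, 150.0, 200.0]
predicted_prices = [110.0, 140.0, 230.0]

# Squared errors are 100, 100 and 900, so the mean is 1100 / 3
toy_mse = mean_squared_error_by_hand(actual_prices, predicted_prices)
toy_rmse = toy_mse ** 0.5  # root of the MSE, back in the units of price
print(toy_mse, toy_rmse)
```

Because the errors are squared, one large miss (the 30-unit error here) dominates the metric, which is worth keeping in mind when comparing runs.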

Check in the MLflow UI that the run `RF-Grid-Search` is logged and has the best parameter values found by grid search.

* https://docs.databricks.com/applications/mlflow/quick-start.html

## Load the Saved Model

Load the trained and tuned model we just saved. Check that the hyperparameters of this model match those of the best model we found earlier.

<img alt="Hint" title="Hint" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.3em" src="https://files.training.databricks.com/static/images/icon-light-bulb.svg"/>&nbsp;**Hint:** Use the `artifactURI` variable declared above.

In [19]:

# ANSWER
model = mlflow.sklearn.load_model(artifactURI + "/grid-random-forest-model/")
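One way to check that the hyperparameters match is to compare the two `get_params()` dictionaries key by key. The helper below is a minimal sketch of that comparison. `diff_params` is a hypothetical name, and the two dictionaries are made-up stand-ins for `best_rf.get_params()` and `model.get_params()`.

```python
def diff_params(expected, actual, keys=None):
    """Return {name: (expected_value, actual_value)} for every mismatch."""
    keys = expected.keys() if keys is None else keys
    return {
        k: (expected.get(k), actual.get(k))
        for k in keys
        if expected.get(k) != actual.get(k)
    }

# Made-up stand-ins for best_rf.get_params() and model.get_params()
best_params = {"n_estimators": 1000, "max_depth": 15, "bootstrap": True}
loaded_params = {"n_estimators": 1000, "max_depth": 15, "bootstrap": True}

mismatches = diff_params(best_params, loaded_params)
print(mismatches or "All hyperparameters match")
```

In the lab, calling `diff_params(best_rf.get_params(), model.get_params())` should return an empty dictionary if the model was saved and loaded correctly. Passing `keys=parameters` would restrict the check to the hyperparameters in the grid.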


Time permitting, continue the grid search over a wider range of hyperparameters and automatically log the best-performing parameters back to MLflow.

Time permitting, use the `MlflowClient` to interact programmatically with your run.
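As a starting point for the `MlflowClient` exercise, the sketch below wraps a single `get_run` call and pulls out the logged parameters and metrics. `summarize_run` is a hypothetical helper, and it assumes an MLflow version in which `run.data.params` and `run.data.metrics` are dictionaries. The client is passed in as an argument so the function itself needs no tracking server.

```python
def summarize_run(client, run_id):
    """Return the params and metrics recorded for one MLflow run.

    `client` is expected to be an `mlflow.tracking.MlflowClient`;
    only its `get_run` method is used here.
    """
    run = client.get_run(run_id)
    return {
        "run_id": run.info.run_id,
        "params": dict(run.data.params),    # logged parameter values are stored as strings
        "metrics": dict(run.data.metrics),  # one value per metric key
    }

# In the lab notebook this could be called as follows (not executed here):
#   from mlflow.tracking import MlflowClient
#   summary = summarize_run(MlflowClient(), runID)
#   print(summary["params"], summary["metrics"])
```

Since parameters come back as strings, convert them (for example with `int(...)`) before comparing them with the values in `best_rf.get_params()`.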

### Examining the actual model object:

In [23]:
dir(model)

In [24]:
"""
 'apply',
 'base_estimator',
 'base_estimator_',
 'bootstrap',
 'class_weight',
 'criterion',
 'decision_path',
 'estimator_params',
 'estimators_',
 'feature_importances_',
 'fit',
 'get_params',
 'max_depth',
 'max_features',
 'max_leaf_nodes',
 'min_impurity_decrease',
 'min_impurity_split',
 'min_samples_leaf',
 'min_samples_split',
 'min_weight_fraction_leaf',
 'n_estimators',
 'n_features_',
 'n_jobs',
 'n_outputs_',
 'oob_score',
 'predict',
 'random_state',
 'score',
 'set_params',
 'verbose',
 'warm_start']
 """

In [25]:
model.get_params()

In [26]:
model.max_depth

In [27]:
model.n_features_

In [28]:
model.score(X_test, y_test)  # R^2 of the loaded model on the test set

In [29]:
model.verbose

### Run Object Examination:

In [33]:
dir(run)


In [34]:
"""
 'data',
 'from_dictionary',
 'from_proto',
 'info',
 'to_dictionary',
 'to_proto']
"""

In [35]:
for key,value in run.info:
  print(key, " . . . . . . . ", value)

### Viewing the Saved Model Files:

In [38]:
#  /Users/tbresee@mail.smu.edu/ACADEMY_TRAINING/MLflow-1.2.0-SPNC/Python/Solutions/Labs/02-Lab


In [39]:
#  	Artifact Location:    dbfs:/databricks/mlflow/2056266854504964

In [40]:

%fs ls dbfs:/databricks/mlflow/2056266854504964


path,name,size
dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/,c2e07b090d4b417c86eb8a8704e118ea/,0


In [41]:

%fs ls dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/artifacts/grid-random-forest-model/




path,name,size
dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/artifacts/grid-random-forest-model/MLmodel,MLmodel,362
dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/artifacts/grid-random-forest-model/conda.yaml,conda.yaml,130
dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/artifacts/grid-random-forest-model/model.pkl,model.pkl,15123079


In [42]:

%fs head dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/artifacts/grid-random-forest-model/conda.yaml


In [43]:

%fs head dbfs:/databricks/mlflow/2056266854504964/c2e07b090d4b417c86eb8a8704e118ea/artifacts/grid-random-forest-model/MLmodel


<img alt="Side Note" title="Side Note" style="vertical-align: text-bottom; position: relative; height:1.75em; top:0.05em; transform:rotate(15deg)" src="https://files.training.databricks.com/static/images/icon-note.webp"/> See the solutions folder for an example solution to this lab.

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Classroom-Cleanup<br>

Run the **`Classroom-Cleanup`** cell below to remove any artifacts created by this lesson.

In [51]:
%run "../Includes/Classroom-Cleanup"

## ![Spark Logo Tiny](https://files.training.databricks.com/images/105/logo_spark_tiny.png) Next Steps

Start the next lesson, [Packaging ML Projects]($../03-Packaging-ML-Projects ).

&copy; 2020 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>