# MLflow Quickstart (Python)

With MLflow's autologging capabilities, a single line of code automatically logs the resulting model, the parameters used to create it, and a model score. MLflow autologging is available for several widely used machine learning packages. This notebook trains a Random Forest model on a simple dataset and uses the MLflow autolog() function to log the information generated by the run.

For details about what information is logged with autolog(), refer to the MLflow documentation.

# Setup
If you are using a cluster running Databricks Runtime, you must install the mlflow library from PyPI. See Cmd 3.
If you are using a cluster running Databricks Runtime ML, the mlflow library is already installed.
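
Because this notebook later notes that mlflow.autolog() requires mlflow 1.12.0 or above, it can help to confirm the installed version before going further. The following is a minimal sketch of such a check using only the standard library; the helper names `version_tuple` and `meets_minimum` are hypothetical, and the comparison looks only at the leading numeric components, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_MLFLOW = "1.12.0"  # minimum stated in this notebook for mlflow.autolog()


def version_tuple(text):
    """Return the leading major.minor.patch numbers of a version string as ints."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # Missing components (for example "1.12") are treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def meets_minimum(installed, minimum=MIN_MLFLOW):
    """Hypothetical helper: True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


try:
    installed = version("mlflow")
except PackageNotFoundError:
    print("mlflow is not installed; install it from PyPI first.")
else:
    status = "supports" if meets_minimum(installed) else "is too old for"
    print(f"mlflow {installed} {status} mlflow.autolog()")
```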

In [0]:
import mlflow
import mlflow.sklearn
import pandas as pd
import matplotlib.pyplot as plt
 
from numpy import savetxt
 
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
 
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

In [0]:
# Import the dataset from scikit-learn and create the training and test datasets.

db = load_diabetes()
X = db.data
y = db.target
X_train, X_test, y_train, y_test = train_test_split(X, y)
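
The split above is unseeded, so each execution of the notebook trains and tests on different rows. When runs are going to be compared against each other, a fixed seed may be preferable. The sketch below assumes a `random_state` of 42, which is an arbitrary choice and not part of the original notebook, and prints the resulting shapes.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

db = load_diabetes()
X, y = db.data, db.target

# random_state=42 is an assumed, arbitrary seed; any fixed integer makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# By default, train_test_split holds out 25% of the rows for testing.
print("train:", X_train.shape, "test:", X_test.shape)
```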

In [0]:
display(db.data)

In [0]:
# Create a random forest model and log its parameters, metrics, and the fitted model using mlflow.autolog().

# Enable general autologging.
# mlflow.autolog() requires mlflow 1.12.0 or above.
mlflow.autolog()

mlflow.set_experiment("/Users/dong.qiaoyang@databricks.com/publicdemo/mlflow quickstart")

# With autolog() enabled, all model parameters, a model score, and the fitted model are automatically logged.  
with mlflow.start_run():
  
  # Set the model parameters. 
  n_estimators = 110
  max_depth = 6
  max_features = 3
  
  # Log a custom parameter manually, in addition to the autologged ones.
  mlflow.log_param('custom', 20)

  # Create and train model.
  rf = RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth, max_features=max_features)
  rf.fit(X_train, y_train)
  
  # Use the model to make predictions on the test dataset.
  predictions = rf.predict(X_test)
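
The cell above computes predictions on the test set but does not score them. One way to do so is with `mean_squared_error`, which the notebook already imports. The sketch below is standalone and uses scikit-learn only; the `random_state` values are assumptions added for repeatability. Inside an active run, the resulting value could be recorded with `mlflow.log_metric("test_mse", test_mse)`, where the metric name `test_mse` is a hypothetical choice.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
# random_state=0 is an assumption for repeatability; the notebook's split is unseeded.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same hyperparameters as the notebook's model.
rf = RandomForestRegressor(n_estimators=110, max_depth=6, max_features=3, random_state=0)
rf.fit(X_train, y_train)
predictions = rf.predict(X_test)

# Mean squared error on the held-out data: the average of (actual - predicted)^2.
test_mse = mean_squared_error(y_test, predictions)
print(f"test MSE: {test_mse:.1f}")
```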

To view the results, click Experiment at the upper right of this page. The Experiments sidebar appears. This sidebar displays the parameters and metrics for each run of this notebook. Click the circular arrows icon to refresh the display to include the latest runs.

When you click the square icon with the arrow to the right of the date and time of the run, the Runs page opens in a new tab. This page shows all of the information that was logged from the run. Scroll down to the Artifacts section to find the logged model.