<h4 style="font-variant-caps: small-caps;font-size:35pt;">Databricks-ML-professional-S02b-Model-Management</h4>

<div style='background-color:black;border-radius:5px;border-top:1px solid'></div>
<br/>
<p>This Notebook adds information related to the following requirements:</p><br/>
<b>Model Management:</b>
<ul>
<li>Describe the basic purpose and user interactions with Model Registry</li>
<li>Programmatically register a new model or new model version</li>
<li>Add metadata to a registered model and a registered model version</li>
<li>Identify, compare, and contrast the available model stages</li>
<li>Transition, archive, and delete model versions</li>
</ul>
<br/>
<p><b>Download this notebook in ipynb format <a href="Databricks-ML-professional-S02b-Model-Management.ipynb">here</a>.</b></p>
<br/>
<div style='background-color:black;border-radius:5px;border-top:1px solid'></div>

<a id="modelregistry"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">1. Describe the basic purpose and user interactions with Model Registry</span></div>

<b>MLflow Model Registry is a collaborative hub where teams can share ML models, work together from experimentation to online testing and production, integrate with approval and governance workflows, and monitor ML deployments and their performance.</b>
<ul>
<li>Is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model.</li>
<li>Provides model lineage (which MLflow Experiment and Run produced the model), model versioning, stage transitions (e.g. from staging to production), annotations (e.g. with comments, tags), and deployment management (e.g. which production jobs have requested a specific model version)</li>
<li>Features of Model Registry:<ul>
<li><b>Central Repository:</b> Register MLflow models with the MLflow Model Registry. A registered model has a unique name, version, stage, and other metadata.</li>
<li><b>Model Versioning:</b> Automatically keep track of versions for registered models when updated.</li>
<li><b>Model Stage:</b> Assign preset or custom stages to each model version, like “Staging” and “Production”, to represent the lifecycle of a model.</li>
<li><b>Model Stage Transitions:</b> Record new registration events or changes as activities that automatically log users, changes, and additional metadata such as comments.</li>
<li><b>CI/CD Workflow Integration:</b> Record stage transitions, request, review and approve changes as part of CI/CD pipelines for better control and governance.</li>
</ul>
</li>
<li>Can be managed using the UI or in pure Python.</li>
</ul>
<div><img src="https://files.training.databricks.com/images/eLearning/ML-Part-4/model-registry.png" style="height: 400px; margin: 20px"/></div>
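
<p>The relationship between a registered model name, its auto-incremented versions, and the stage attached to each version can be pictured with a small in-memory toy. This is only an illustration: <code>ToyRegistry</code> and <code>ToyModelVersion</code> are hypothetical names, the run IDs are placeholders, and none of this is how MLflow is implemented.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelVersion:
    name: str
    version: int
    source: str
    stage: str = "None"  # every new version starts in stage "None"
    tags: dict = field(default_factory=dict)


class ToyRegistry:
    """Illustrative in-memory stand-in for a model registry (not MLflow)."""

    def __init__(self):
        self._models = {}  # registered model name -> list of versions

    def register(self, name, source):
        # Registering under an existing name adds a new version, numbered from 1.
        versions = self._models.setdefault(name, [])
        mv = ToyModelVersion(name=name, version=len(versions) + 1, source=source)
        versions.append(mv)
        return mv

    def latest(self, name):
        return self._models[name][-1]


registry = ToyRegistry()
v1 = registry.register("lr_sns_diamonds", "runs:/run_a/sns_diamonds")
v2 = registry.register("lr_sns_diamonds", "runs:/run_b/sns_diamonds")
print(v1.version, v2.version, registry.latest("lr_sns_diamonds").stage)  # 1 2 None
```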

<a id="programmaticregistration"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">2. Programmatically register a new model or new model version</span></div>
<p>Let's quickly train a model and programmatically register it to the Model Registry:</p>

In [1]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
#
import seaborn as sns
#
import mlflow
#
import logging
import re

In [12]:
TRACKING_URI = "http://127.0.0.1:8080"
mlflow.set_tracking_uri(TRACKING_URI)
mlflow.set_experiment("register_diamonds")

<Experiment: artifact_location='mlflow-artifacts:/1', creation_time=1728978842565, experiment_id='1', last_update_time=1728978842565, lifecycle_stage='active', name='register_diamonds', tags={}>

In [13]:
mlflow.sklearn.autolog(disable=True)
logging.getLogger("mlflow").setLevel(logging.FATAL)

In [14]:
diamonds_df = sns.load_dataset("diamonds").drop(["cut", "color", "clarity"], axis=1)
#
X_train, X_test, y_train, y_test = train_test_split(diamonds_df.drop(["price"], axis=1), diamonds_df["price"], test_size=0.33)
#
model = LinearRegression().fit(X_train, y_train)
model_path = "sns_diamonds"
#
with mlflow.start_run(run_name="register_diamonds") as run:
    mlflow.sklearn.log_model(sk_model     =model,
                             artifact_path=model_path)

<p>Programmatically register the latest logged model:</p>
<p><i>Note that running the command below multiple times registers a new model version each time, with the version number incremented by one from the last registered version.</i></p>

In [15]:
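# get the run ID of the most recently finished run
# (.iloc[0] selects the first row by position; [0] would look up index label 0, which is not the top row after sorting)
latest_run_id = mlflow.search_runs().sort_values(by="end_time", ascending=False)["run_id"].iloc[0]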
#
mlflow.register_model(f"runs:/{latest_run_id}/{model_path}", name="lr_sns_diamonds");

Successfully registered model 'lr_sns_diamonds'.
Created version '1' of model 'lr_sns_diamonds'.
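
<p>The <code>runs:/</code> URI used above points at an artifact inside a run. Once a version is registered, it can also be addressed through the registry with a <code>models:/</code> URI, either by version number or by stage. A minimal sketch of both forms follows; the helper names <code>runs_uri</code> and <code>models_uri</code> are made up for illustration and are not part of MLflow, and the run ID is a placeholder.</p>

```python
def runs_uri(run_id: str, artifact_path: str) -> str:
    """URI of a model logged under a run (the form passed to register_model)."""
    return f"runs:/{run_id}/{artifact_path}"


def models_uri(name: str, version_or_stage) -> str:
    """URI of a registered model, addressed by version number or by stage name."""
    return f"models:/{name}/{version_or_stage}"


print(runs_uri("<run_id>", "sns_diamonds"))
print(models_uri("lr_sns_diamonds", 1))
print(models_uri("lr_sns_diamonds", "Production"))
```

<p>Either kind of URI can be passed to <code>mlflow.pyfunc.load_model</code> to load the model for inference.</p>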


<p>Alternatively, a newly logged model can be registered automatically by passing the parameter <code>registered_model_name</code> to <code>mlflow.sklearn.log_model</code>:</p>

In [16]:
diamonds_df = sns.load_dataset("diamonds").drop(["cut", "color", "clarity"], axis=1)
#
X_train, X_test, y_train, y_test = train_test_split(diamonds_df.drop(["price"], axis=1), diamonds_df["price"], test_size=0.33)
#
model = LinearRegression().fit(X_train, y_train)
model_path = "sns_diamonds"
#
with mlflow.start_run(run_name="register_diamonds") as run:
    mlflow.sklearn.log_model(sk_model     =model,
                             artifact_path=model_path,
                             registered_model_name="lr_sns_diamonds")

Registered model 'lr_sns_diamonds' already exists. Creating a new version of this model...
Created version '2' of model 'lr_sns_diamonds'.


<p>Alternatively, a new registered model can be created from scratch, and then filled with a model from an existing run:</p>

In [17]:
from mlflow.store.artifact.runs_artifact_repo import RunsArtifactRepository

In [26]:
# Register the model name in the Model Registry
# (run the commented line once: the call fails if the name already exists)
client = mlflow.MlflowClient()
#client.create_registered_model("sns_diamonds_create")

# Create a new version of the linear regression model under the registered model name
desc = "A new version of sns diamonds dataset linear regressions model"
runs_uri = f"runs:/{latest_run_id}/{model_path}"
model_src = RunsArtifactRepository.get_underlying_uri(runs_uri)
mv = client.create_model_version("sns_diamonds_create", model_src, latest_run_id, description=desc)
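
<p>The commented-out <code>create_registered_model</code> call above fails when the name is already taken, which is why it has to be toggled by hand between runs. One possible way to make the cell re-runnable is to check the registry first. The sketch below assumes <code>client</code> behaves like <code>mlflow.MlflowClient</code>; <code>ensure_registered_model</code> is a hypothetical helper, and <code>StubClient</code> is only a stand-in so the snippet runs without a tracking server.</p>

```python
from types import SimpleNamespace


def ensure_registered_model(client, name):
    """Create the registered model `name` unless the registry already lists it."""
    if client.search_registered_models(filter_string=f"name='{name}'"):
        return False  # already registered, nothing to do
    client.create_registered_model(name)
    return True


class StubClient:
    """Hypothetical offline stand-in for mlflow.MlflowClient (illustration only)."""

    def __init__(self):
        self.names = set()

    def search_registered_models(self, filter_string):
        wanted = filter_string.split("'")[1]  # pull the name out of "name='...'"
        return [SimpleNamespace(name=n) for n in self.names if n == wanted]

    def create_registered_model(self, name):
        if name in self.names:
            raise RuntimeError(f"Registered model {name} already exists")
        self.names.add(name)


stub = StubClient()
print(ensure_registered_model(stub, "sns_diamonds_create"))  # True
print(ensure_registered_model(stub, "sns_diamonds_create"))  # False
```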

<a id="updatemetadata"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'><span style="font-variant-caps: small-caps;font-weight:700">3. Add metadata to a registered model and a registered model version</span></div>

In [27]:
client = mlflow.MlflowClient()

<p>Registered model metadata can be listed:</p>

In [28]:
for val in client.get_registered_model("sns_diamonds_create"):
    print(val)

('aliases', {})
('creation_timestamp', 1728978848451)
('description', '')
('last_updated_timestamp', 1728978910712)
('latest_versions', [<ModelVersion: aliases=[], creation_timestamp=1728978910712, current_stage='None', description='A new version of sns diamonds dataset linear regressions model', last_updated_timestamp=1728978910712, name='sns_diamonds_create', run_id='246754e0f988487bb1ef6107a6e65c20', run_link='', source='mlflow-artifacts:/1/246754e0f988487bb1ef6107a6e65c20/artifacts/sns_diamonds', status='READY', status_message='', tags={}, user_id='', version='2'>])
('name', 'sns_diamonds_create')
('tags', {'task': 'classification'})


In [29]:
# Set registered model tag
client.set_registered_model_tag("sns_diamonds_create", "task", "classification")
for val in client.get_registered_model("sns_diamonds_create"):
    print(val)

('aliases', {})
('creation_timestamp', 1728978848451)
('description', '')
('last_updated_timestamp', 1728978910712)
('latest_versions', [<ModelVersion: aliases=[], creation_timestamp=1728978910712, current_stage='None', description='A new version of sns diamonds dataset linear regressions model', last_updated_timestamp=1728978910712, name='sns_diamonds_create', run_id='246754e0f988487bb1ef6107a6e65c20', run_link='', source='mlflow-artifacts:/1/246754e0f988487bb1ef6107a6e65c20/artifacts/sns_diamonds', status='READY', status_message='', tags={}, user_id='', version='2'>])
('name', 'sns_diamonds_create')
('tags', {'task': 'classification'})


In [30]:
# Set model version tag
client.set_model_version_tag("sns_diamonds_create", "2", "validation_status", "approved")
for val in client.get_registered_model("sns_diamonds_create"):
    print(val)

('aliases', {})
('creation_timestamp', 1728978848451)
('description', '')
('last_updated_timestamp', 1728978910712)
('latest_versions', [<ModelVersion: aliases=[], creation_timestamp=1728978910712, current_stage='None', description='A new version of sns diamonds dataset linear regressions model', last_updated_timestamp=1728978910712, name='sns_diamonds_create', run_id='246754e0f988487bb1ef6107a6e65c20', run_link='', source='mlflow-artifacts:/1/246754e0f988487bb1ef6107a6e65c20/artifacts/sns_diamonds', status='READY', status_message='', tags={'validation_status': 'approved'}, user_id='', version='2'>])
('name', 'sns_diamonds_create')
('tags', {'task': 'classification'})


In [None]:
# Add or update description
client.update_model_version(
    name="sns_diamonds_create",
    version=1,
    description="This is the first version of sns_diamonds_create model",
)
for version in client.search_model_versions(filter_string="name='sns_diamonds_create'"):
    print(f"Description of version {version.version} of the model: {version.description}")

Description of version 1 of the model: This is the first version of sns_diamonds_create model
Description of version 2 of the model: A new version of sns diamonds dataset linear regressions model


<p>See more information on how to update a registered model <a href="https://mlflow.org/docs/latest/model-registry.html" target="_blank">here</a>.</p>

<a id="modelstages"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">4. Identify, compare, and contrast the available model stages</span></div>
<ul><li>The MLflow Model Registry defines several model stages: 
<ul>
<li><b>None</b>: the model is in development</li>
<li><b>Staging</b>: the model is to be tested</li>
<li><b>Production</b>: the model is tested, validated and in production</li>
<li><b>Archived</b>: the model version is retired and no longer in use, but kept for reference</li>
</ul>
</li></ul>
<p>By default, a newly registered model version is assigned the stage <b>None</b>.</p>

<a id="transitionarchivedelete"></a>
<div style='background-color:rgba(30, 144, 255, 0.1);border-radius:5px;padding:2px;'>
<span style="font-variant-caps: small-caps;font-weight:700">5. Transition, archive, and delete model versions</span></div>

<p>List registered models and their latest version:</p>

In [31]:
client = mlflow.MlflowClient()
#
registered_models = []
for model in client.search_registered_models():
    if len(model.latest_versions)>0:
        registered_models.append((model.latest_versions[0].name,
                                  model.latest_versions[0].run_id,
                                  model.latest_versions[0].version,
                                  model.latest_versions[0].current_stage,
                                  model.latest_versions[0].status,
                                  model.latest_versions[0].tags,
                                  model.latest_versions[0].source))
#
import pandas as pd  # `spark` is predefined only on Databricks; pandas works in any Jupyter session
display(pd.DataFrame(registered_models, columns=['name', 'run_id', 'latest_version', 'current_stage', 'status', 'tags', 'source']))

<p>Info about one specific registered model and its latest version:</p>

In [32]:
client.get_registered_model('lr_sns_diamonds')

<RegisteredModel: aliases={}, creation_timestamp=1728978846571, description='', last_updated_timestamp=1728978848131, latest_versions=[<ModelVersion: aliases=[], creation_timestamp=1728978848131, current_stage='None', description='', last_updated_timestamp=1728978848131, name='lr_sns_diamonds', run_id='c4d59e626f1e40bea38159946fe438f7', run_link='', source='mlflow-artifacts:/1/c4d59e626f1e40bea38159946fe438f7/artifacts/sns_diamonds', status='READY', status_message='', tags={}, user_id='', version='2'>], name='lr_sns_diamonds', tags={}>

<p>Info about one specific model and a given version:</p>

In [33]:
client.get_model_version('lr_sns_diamonds', 2)

<ModelVersion: aliases=[], creation_timestamp=1728978848131, current_stage='None', description='', last_updated_timestamp=1728978848131, name='lr_sns_diamonds', run_id='c4d59e626f1e40bea38159946fe438f7', run_link='', source='mlflow-artifacts:/1/c4d59e626f1e40bea38159946fe438f7/artifacts/sns_diamonds', status='READY', status_message='', tags={}, user_id='', version='2'>

<p>Transition a specific version of a registered model to a given stage. Valid values for stage are: <b>Production</b>, <b>Staging</b>, <b>Archived</b>, <b>None</b></p>

In [37]:
client.transition_model_version_stage('lr_sns_diamonds', 2, 'Production')


<ModelVersion: aliases=[], creation_timestamp=1728978848131, current_stage='Production', description='', last_updated_timestamp=1728978990760, name='lr_sns_diamonds', run_id='c4d59e626f1e40bea38159946fe438f7', run_link='', source='mlflow-artifacts:/1/c4d59e626f1e40bea38159946fe438f7/artifacts/sns_diamonds', status='READY', status_message='', tags={}, user_id='', version='2'>

<p>Archiving a specific version of a registered model uses the same command:</p>

In [39]:
client.transition_model_version_stage('lr_sns_diamonds', 1, 'Archived')


<ModelVersion: aliases=[], creation_timestamp=1728978846600, current_stage='Archived', description='', last_updated_timestamp=1728979004635, name='lr_sns_diamonds', run_id='246754e0f988487bb1ef6107a6e65c20', run_link='', source='mlflow-artifacts:/1/246754e0f988487bb1ef6107a6e65c20/artifacts/sns_diamonds', status='READY', status_message='', tags={}, user_id='', version='1'>
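
<p>Promoting a version and archiving the one it replaces are often done together, and <code>transition_model_version_stage</code> accepts an optional <code>archive_existing_versions</code> flag for that purpose. The toy function below sketches the intended effect on a plain <code>{version: stage}</code> dictionary; <code>transition</code> and <code>VALID_STAGES</code> are illustrative names, and the logic is a simplification rather than MLflow's own code.</p>

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def transition(stages, version, stage, archive_existing_versions=False):
    """Move `version` to `stage` in a {version: stage} dict (toy model only)."""
    if stage not in VALID_STAGES:
        raise ValueError(f"Invalid stage: {stage!r}")
    if version not in stages:
        raise KeyError(f"Version {version} not found")
    if archive_existing_versions:
        # Retire every other version currently sitting in the target stage.
        for other, current in stages.items():
            if other != version and current == stage:
                stages[other] = "Archived"
    stages[version] = stage
    return stages


stages = {1: "Production", 2: "Staging", 3: "None"}
transition(stages, 2, "Production", archive_existing_versions=True)
print(stages)  # {1: 'Archived', 2: 'Production', 3: 'None'}
```

<p>With the real client, the corresponding call would be <code>client.transition_model_version_stage('lr_sns_diamonds', 2, 'Production', archive_existing_versions=True)</code>.</p>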

<p>Delete a registered model version:</p>

In [40]:
client.delete_model_version('lr_sns_diamonds', 6)

RestException: RESOURCE_DOES_NOT_EXIST: Model Version (name=lr_sns_diamonds, version=6) not found
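
<p>The error above comes from asking the registry to delete a version that does not exist. A defensive variant can list the existing versions first and only delete when the requested one is present. The sketch assumes <code>client</code> behaves like <code>mlflow.MlflowClient</code>; <code>delete_version_if_exists</code> is a hypothetical helper, and <code>StubClient</code> merely stands in for a real client so the example runs offline.</p>

```python
from types import SimpleNamespace


def delete_version_if_exists(client, name, version):
    """Delete `version` of model `name` only if the registry lists it.

    Returns True when a version was deleted, False otherwise.
    """
    existing = {str(mv.version) for mv in client.search_model_versions(f"name='{name}'")}
    if str(version) not in existing:
        return False
    client.delete_model_version(name, str(version))
    return True


class StubClient:
    """Hypothetical offline stand-in for mlflow.MlflowClient (illustration only)."""

    def __init__(self, versions):
        self.versions = set(versions)

    def search_model_versions(self, filter_string):
        return [SimpleNamespace(version=v) for v in sorted(self.versions)]

    def delete_model_version(self, name, version):
        self.versions.remove(version)


stub = StubClient({"1", "2"})
print(delete_version_if_exists(stub, "lr_sns_diamonds", 6))  # False
print(delete_version_if_exists(stub, "lr_sns_diamonds", 2))  # True
```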

<p>Get a list of available properties and methods:</p>

In [41]:
import pandas as pd  # `spark` is predefined only on Databricks
display(pd.DataFrame({'props_and_methods': [method for method in dir(client) if not method.startswith('_')]}))