# Deployment
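In this notebook, our objective is to deploy a BERT question-answering model (`bert_qa`) hosted on Hugging Face: we log it to MLflow, register it in the Model Registry, and test the registered version.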

# Notebook Overview
- Install Requirements and Import Dependencies
- Configurations
- Model Load
- Model Registry
- Testing the Latest Registered Model

# Install Requirements and Import Dependencies

In [1]:
%pip install -r ../requirements.txt --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Standard Library Imports
import json
import logging
import os
import shutil
import warnings

# Third-Party Libraries
# MLflow for Experiment Tracking and Model Management
import mlflow
from mlflow import MlflowClient
from mlflow.types.schema import Schema, ColSpec
from mlflow.types import ParamSchema, ParamSpec
from mlflow.models import ModelSignature

# Transformers
from transformers import pipeline


# Configurations

In [3]:
warnings.filterwarnings("ignore")

In [4]:
# Global names used throughout the notebook
MODEL_PERSONAL_NAME = "morgana-rodrigues/bert_qa"  # Hugging Face Hub model ID
EXPERIMENT_NAME = "BERT model for Q&A"  # MLflow experiment name
MODEL_NAME = "BERT_QA"  # temporary save folder and MLflow artifact path
RUN_NAME = "BERT_QA"  # MLflow run name
NAME = "BERT_QA"  # Model Registry name

# Set up the chunk separator for text processing
CHUNK_SEPARATOR = "\n\n"
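
`CHUNK_SEPARATOR` is defined here but not referenced again in this notebook. As a hypothetical illustration of how it could be used, a long context might be split into paragraph-sized chunks before passing each one to the QA pipeline. The helper `split_into_chunks` and the sample text are illustrative assumptions, not part of the original workflow:

```python
from __future__ import annotations

CHUNK_SEPARATOR = "\n\n"

def split_into_chunks(text: str, separator: str = CHUNK_SEPARATOR) -> list[str]:
    """Hypothetical helper: split text on the separator, dropping empty chunks."""
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]

# Hypothetical sample text with two paragraphs and a trailing separator
document = "Marta is mother of John and Amanda.\n\nJohn lives in Paris.\n\n"
chunks = split_into_chunks(document)
print(chunks)  # ['Marta is mother of John and Amanda.', 'John lives in Paris.']
```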

In [5]:
# === Create logger ===
logger = logging.getLogger("deployment-notebook")
logger.setLevel(logging.INFO)

formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s", 
                             datefmt="%Y-%m-%d %H:%M:%S") 

stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
logger.propagate = False

In [6]:
logger.info('Notebook execution started.')

2025-04-11 12:01:02 - INFO - Notebook execution started.
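
The formatter renders each record as `timestamp - level - message`, the `INFO` level filters out `DEBUG` records, and `propagate = False` keeps records from also reaching the root logger (which would print them twice). A self-contained check of the same configuration, writing to an in-memory stream instead of stderr; the logger name `format-demo` is hypothetical:

```python
import io
import logging

# Same format string and date format as the notebook's logger
formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s",
                              datefmt="%Y-%m-%d %H:%M:%S")

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)  # write to memory instead of stderr
handler.setFormatter(formatter)

demo_logger = logging.getLogger("format-demo")  # hypothetical logger name
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False

demo_logger.info("Notebook execution started.")
demo_logger.debug("Hidden: below the INFO threshold")  # filtered out

line = buffer.getvalue().strip()
print(line)  # e.g. "<YYYY-MM-DD HH:MM:SS> - INFO - Notebook execution started."
```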


# Model Load

In this part of the code, we load the Transformer model from the Hugging Face Hub into a local `question-answering` pipeline and test the pipeline on a simple sample.

In [7]:
model_name = MODEL_PERSONAL_NAME

qa_pipeline = pipeline(
    'question-answering',
    model=model_name,
    device=0  # 0 = first CUDA GPU; use -1 to run on CPU
)

Device set to use cuda:0


In [8]:
qa_pipeline(context="Take me down to Paradise City where the grass is green and the girls are pretty", question="What colour is the grass")

{'score': 0.9356583952903748, 'start': 49, 'end': 54, 'answer': 'green'}
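
The returned `start` and `end` values are character offsets into `context` (with `end` exclusive), so the answer can be recovered by slicing the original text. A quick check using the context and output from the cell above:

```python
# Context and pipeline output copied from the cells above
context = "Take me down to Paradise City where the grass is green and the girls are pretty"
result = {'score': 0.9356583952903748, 'start': 49, 'end': 54, 'answer': 'green'}

# 'start' and 'end' index into the context string; 'end' is exclusive
span = context[result["start"]:result["end"]]
print(span)  # green
```

This is what makes it possible, for example, to highlight the answer inside the original passage in a UI.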

The class below wraps the model in the format that is logged and registered in MLflow. Its `log_model` method receives a pipeline (or a trainer), saves the model to a temporary folder named after `model_name`, and logs it, together with the demo UI folder, as artifacts in MLflow. When MLflow loads the model for deployment, `load_context` rebuilds a question-answering pipeline from these artifacts, and `predict` uses it to perform inference.

In [9]:
class DistilBERTModel(mlflow.pyfunc.PythonModel):
    def _preprocess(self, inputs):
        """
        Preprocesses the input data.

        Args:
            inputs: A dictionary containing two keys:
                - 'context': A list with the context text.
                - 'question': A list with the question to be answered.

        Returns:
            tuple: The first entry of 'context' and the first entry of 'question'.
        """
        try:
            context = inputs['context'][0]
            question = inputs['question'][0]
            print("pre processing", context, question)
            return context, question
        except Exception as e:
            logger.error(f"Error preprocessing the input data: {str(e)}")
            raise

    def load_context(self, context):
        """
        Loads the question-answering pipeline using the saved model artifact.

        Args:
            context: The MLflow context object containing the loaded artifacts.
        """
        try:
            self.model = pipeline(
                'question-answering',
                model=context.artifacts["model"],
                device=0  # 0 = first CUDA GPU; use -1 to run on CPU
            )
        except Exception as e:
            logger.error(f"Error loading the question-answering pipeline: {str(e)}")
            raise

    def predict(self, context, model_input, params=None):
        """
        Runs inference using the loaded model and input data.

        Args:
            context: The MLflow context object with access to artifacts.
            model_input: A dictionary containing 'context' and 'question' keys.
            params: Optional inference parameters. The signature declares
                'show_score', but this implementation does not read it.

        Returns:
            The raw pipeline output, e.g. a dict with 'score', 'start', 'end' and 'answer'.
        """
        try:
            in_ctx, question = self._preprocess(model_input)
            output = self.model(context=in_ctx, question=question)
            return output
        except Exception as e:
            logger.error(f"Error running inference: {str(e)}")
            raise

    @classmethod
    def log_model(cls, model_name, source_trainer=None, source_pipeline=None, demo_folder="../demo"):
        """
        Logs the model to MLflow, including artifacts, dependencies, and input/output signatures.

        Args:
            model_name: Path where the model is temporarily saved before logging;
                also used as the MLflow artifact path.
            source_trainer: A trainer object with a `.save_model()` method. Defaults to None.
            source_pipeline: A pipeline object with a `.save_pretrained()` method. Defaults to None.
            demo_folder: Path to the folder containing the compiled demo UI. Defaults to "../demo".
        """
        try:
            input_schema = Schema([
                ColSpec("string", "context"),
                ColSpec("string", "question"),
            ])
            output_schema = Schema([
                ColSpec("string", "answer"),
            ])
            params_schema = ParamSchema([
                ParamSpec("show_score", "boolean", False),
            ])
            signature = ModelSignature(inputs=input_schema, outputs=output_schema, params=params_schema)

            # Save the model locally so it can be logged as an artifact
            if source_trainer is not None:
                source_trainer.save_model(model_name)
            elif source_pipeline is not None:
                source_pipeline.save_pretrained(model_name)
            else:
                raise ValueError("Either source_trainer or source_pipeline must be provided.")

            requirements = [
                "transformers==4.48.0",
                "tf_keras"
            ]
            mlflow.pyfunc.log_model(
                model_name,
                python_model=cls(),
                artifacts={"model": model_name, "demo": demo_folder},
                signature=signature,
                pip_requirements=requirements
            )
            # Remove the temporary local copy once it has been logged
            shutil.rmtree(model_name)
            logger.info("Logging model to MLflow done successfully")

        except Exception as e:
            logger.error(f"Error logging model to MLflow: {str(e)}")
            raise
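
The signature declares a `show_score` parameter (default `False`), but `predict` returns the raw pipeline output without reading `params`. A minimal sketch of how the parameter could be honored; `format_qa_output` is a hypothetical helper, and inside `predict` it might be called as `format_qa_output(output, (params or {}).get("show_score", False))`:

```python
from __future__ import annotations

from typing import Any

def format_qa_output(output: dict[str, Any], show_score: bool = False) -> dict[str, Any]:
    """Hypothetical post-processing for the `show_score` parameter.

    Always returns the answer; adds the confidence score only when requested.
    """
    result = {"answer": output["answer"]}
    if show_score:
        result["score"] = output["score"]
    return result

# Raw pipeline output copied from the earlier test cell
raw = {'score': 0.9356583952903748, 'start': 49, 'end': 54, 'answer': 'green'}
print(format_qa_output(raw))                   # {'answer': 'green'}
print(format_qa_output(raw, show_score=True))  # {'answer': 'green', 'score': 0.9356583952903748}
```

Returning only `answer` by default would also line up with the declared output schema, which lists just `answer`.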

# Model Registry

In [10]:
mlflow.set_experiment(experiment_name=EXPERIMENT_NAME)

2025/04/11 12:01:55 INFO mlflow.tracking.fluent: Experiment with name 'BERT model for Q&A' does not exist. Creating a new experiment.


<Experiment: artifact_location='/phoenix/mlflow/405365081274429657', creation_time=1744372915675, experiment_id='405365081274429657', last_update_time=1744372915675, lifecycle_stage='active', name='BERT model for Q&A', tags={}>

In [11]:
with mlflow.start_run(run_name=RUN_NAME) as run:
    logger.info(f"Run's Artifact URI: {run.info.artifact_uri}")
    DistilBERTModel.log_model(model_name=MODEL_NAME, source_pipeline=qa_pipeline)
    mlflow.register_model(model_uri=f"runs:/{run.info.run_id}/{MODEL_NAME}", name=NAME)

2025-04-11 12:01:55 - INFO - Run's Artifact URI: /phoenix/mlflow/405365081274429657/962d6ddaf429493cbab0b7f0b2876502/artifacts


2025-04-11 12:02:00 - INFO - Logging model to MLflow done successfully
Successfully registered model 'BERT_QA'.
Created version '1' of model 'BERT_QA'.
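
The cell above relies on two MLflow URI schemes: `runs:/<run_id>/<artifact_path>` points at the model logged inside a run, and registering it creates a versioned entry that is addressed later as `models:/<name>/<version>`. The helper names below are hypothetical; the run ID, artifact path, name, and version come from the output above:

```python
from __future__ import annotations

def run_model_uri(run_id: str, artifact_path: str) -> str:
    """Hypothetical helper: URI of a model logged under a specific MLflow run."""
    return f"runs:/{run_id}/{artifact_path}"

def registry_model_uri(name: str, version: int | str) -> str:
    """Hypothetical helper: URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"

# Values taken from the run and registration output above
run_uri = run_model_uri("962d6ddaf429493cbab0b7f0b2876502", "BERT_QA")
registered_uri = registry_model_uri("BERT_QA", 1)
print(run_uri)         # runs:/962d6ddaf429493cbab0b7f0b2876502/BERT_QA
print(registered_uri)  # models:/BERT_QA/1
```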


# Testing the Latest Registered Model

In [12]:
client = MlflowClient()
model_metadata = client.get_latest_versions(NAME, stages=["None"])
latest_model_version = model_metadata[0].version
print(latest_model_version, mlflow.models.get_model_info(f"models:/{NAME}/{latest_model_version}").signature)

1 inputs: 
  ['context': string (required), 'question': string (required)]
outputs: 
  ['answer': string (required)]
params: 
  ['show_score': boolean (default: False)]
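
`get_latest_versions` with `stages` depends on model stages, which recent MLflow releases deprecate. One stage-independent alternative, sketched here under the assumption that the versions are listed with `MlflowClient.search_model_versions(f"name='{NAME}'")`, is to pick the highest version number yourself. The `latest_version` helper is hypothetical, and `SimpleNamespace` objects stand in for the registry's version objects so the sketch runs without a tracking server:

```python
from types import SimpleNamespace

def latest_version(model_versions):
    """Hypothetical helper: return the highest version among ModelVersion-like objects.

    Versions may be returned as strings, so compare them as integers:
    lexicographically "10" < "9", but numerically 10 > 9.
    """
    return max(model_versions, key=lambda mv: int(mv.version)).version

# Stand-ins for the objects a registry search would return
versions = [SimpleNamespace(version=v) for v in ["1", "9", "10"]]
print(latest_version(versions))  # 10
```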



In [13]:
model = mlflow.pyfunc.load_model(model_uri=f"models:/{NAME}/{latest_model_version}")
context = "Marta is mother of John and Amanda"
question = "what is the name of Marta's daugther?"
model.predict({"context": [context], "question": [question]})

Device set to use cuda:0


pre processing ['Marta is mother of John and Amanda'] ["what is the name of Marta's daugther?"]


{'score': 0.6202742457389832, 'start': 28, 'end': 34, 'answer': 'Amanda'}
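
Once registered, the model could also be served over HTTP, for example with `mlflow models serve -m models:/BERT_QA/1`. Assuming MLflow's standard scoring server, a request to its `/invocations` endpoint could carry the same kind of question in the `dataframe_split` input format, with the optional `params` field passing `show_score` (which `predict`, as written, does not read). This sketch only builds the JSON body; sending it, e.g. with `curl -H 'Content-Type: application/json'`, is left out:

```python
import json

# Hypothetical request body for the /invocations endpoint of a served model
payload = {
    "dataframe_split": {
        "columns": ["context", "question"],
        "data": [["Marta is mother of John and Amanda",
                  "what is the name of Marta's daughter?"]],
    },
    "params": {"show_score": True},  # matches the ParamSpec declared in the signature
}
body = json.dumps(payload)
print(body)
```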

Built with ❤️ using Z by HP AI Studio.