# Logging for ML Model Deployments

In previous blog posts we [introduced the decorator pattern](https://www.tekhnoal.com/ml-model-decorators.html) for ML model deployments and then showed how to use the pattern to build extensions for a ML model deployment. For example, in [this blog post](https://www.tekhnoal.com/data-enrichment-for-ml-models.html) we did data enrichment using a PostgreSQL database. The extensions were added without having to modify the machine learning model code at all, we were able to do it by using the decorator pattern. In this blog post we’ll add logging to a model deployment without having to modify the model code, using a decorator. In this blog post, we'll show how to use the decorator pattern to add logging functionality to an ML model.

This blog post is written in a Jupyter notebook and we'll be switching between Python code and shell commands, the formatting will reflect this.

## Introduction

As software systems become more and more complex, the people that build and operate these systems are finding that they are very hard to debug and inspect. To be able to solve this issue, a software system needs to be observable. An observable system is a system that allows an outside observer to infer the internal state of the system based purely on the data that it generates. The quality of "observability" helps the operators of a system to understand the inner workings of the system and to solve issues that may come up, even when the issues may be unprecedented.

Observability is a non-functional requirement (NFR) of a system. An NFR is a requirement that is placed on the operation of a system that has nothing to do with the specific functions of the system. Rather, it is a cross-cutting concern that needs to be addressed within the whole system design. Logging is a way that we can implement observability of a software system. 

In the world of software systems, a "log" is a record of events that happen as software runs. A log is made up of individual records called log records that each represent a single event in the software system. Logs are useful for debugging the system, keeping a permanent record of it's activities, and many other purposes. 

A log record can be made of any data, but is usually encoded as text which makes a log record easy to create and read by humans. A log record can contain many pieces of information, but the most common are the date and time when the log record was created, a unique identifier for the log record, and some description of the event that caused the log record to be created. Logs are also usually tagged with the level of the severity of the event, for example an exception thrown during the execution of a program can be a handled or it can stop the execution, in either case a log can be generated with the correct level of the log. In general, log records are designed for debugging, alerting, and auditing the activities of the system.

Just like any other software component, machine learning models need to create a log of events that may be useful later on. For example, we may want to know how many predictions the model made, how many errors occurred, and any other interesting events that we may want to keep track of. In this blog post we'll create a decorator that creates a log for a machine learning model.

This post is not meant to be a full guide for doing logging in Python, but we'll include some background information to make it easier to understand. Logging in Python can get complicated and there are other places that cover it more thoroughly. [Here](https://realpython.com/python-logging/) is a good place to learn more about Python logging.

All of the code is available in [this github repository](https://github.com/schmidtbri/logging-for-ml-models).

## Software Architecture

The logging decorator will operate within the model service, but it requires outside services to handle the logs that it produces. This makes the software architecture more complicated and requires that we add several more services to the mix. 

![Software Architecture](software_architecture_lfmlm.png)
![Software Architecture]({attach}software_architecture_lfmlm.png){ width=100% }

The logging decorator is executing right after the prediction request is received from the client and a prediction is made by the model, it will send logs to be handled by other services. The other services are:

- Log Forwarder: a service that runs on each cluster node that forwards logs from the local hard drive to the log aggregator service.
- Log Aggregator: a network service that receives logs from many sources, processes them, and formats them for the log storage service.
- Log Storage: a network service that can store logs and also query them.
- Log User Interface: a network service with a web interface that provides access to the logs stored in the log storage service.

The specific services that we'll use will be detailed later in the blog post.

## Logging Best Practices

There are certain things that we can do when we create a log for our application that makes it more useful, especially in productions settings. For example, attaching a "level" to each log record makes it easy to filter the log according to the severity of the events. For example, a log record is of level "INFO" when it communicates a simple action that the system has taken. A "WARNING" log event is an event that may indicate a problem in the system, but the system can continue to run. A good description of the common log levels is [here](https://sematext.com/blog/logging-levels/).

Another good practice for logs is to include contextual information that can help to debug any problems that may arise in the execution of the code. For example, we can include the location in the codebase where the log record was generated. This information is very helpful during debugging and helps to quickly find the code that caused the event to happen. The information is often presented as the function name, code file name, and line number where the log message was generated. Another piece of useful contextual information is the hostname of the machine where the log was generated.

Logs should be easy to interpret for both humans and machines, this means that log records  are often written in text strings. Humans can easily read text, but parsing a text string is complicated for machines. To allow both humans and machines to easily parse a log message, a good middle ground is to use JSON formatting. JSON-formatted logs are easy to parse, but also allow a human to quickly read and understand a log message.

Unique identifiers are useful to include in logs because they allow us to correlate many different log records together into a cohesive picture. For example, a correlation id is a unique ID that is generated to identify a specific transaction or query in a system. Adding unique identifiers to each log record can make it possible to debug complex problems that happen across system boundaries. A good description of correlation ids is [here](https://hilton.org.uk/blog/microservices-correlation-id).

## Logging in Python

The python standard library has a module that can simplify logging. The logging module imported and used like this:

In [1]:
import logging

logger = logging.getLogger()

logger.warning("Warning message.")



To start logging, we instantiated a logger object using the logging.getLogger() function. Then we used the logger object to log a WARNING message.

The log records are being sent to the stderr output of the process by default. We'll change that by instantiating a StreamHandler and pointing it at the stdout stream:

In [2]:
import sys


stream_handler = logging.StreamHandler(sys.stdout)

logger.addHandler(stream_handler)
logger.warning("Warning message.")



We can also log messages at other levels, here is a WARNING and DEBUG message:

In [3]:
logger.warning("Warning message.")
logger.debug("Debug message.")



When the code above executed, only the WARNING message was printed because the logger only sends log messages to the output that are at the WARNING level or above by default. This filtering functionality is helpful when you are only interested in logs above a certain level. We can change that by configuring the logger:

In [4]:
logger.setLevel(logging.DEBUG)

logger.warning("Warning message.")
logger.debug("Debug message.")

Debug message.


We can put in more information to the log record by adding a formatter to the log handler:

In [5]:
formatter = logging.Formatter('%(asctime)s:%(name)s:%(levelname)s: %(message)s')
stream_handler.setFormatter(formatter)

logger.warning("Warning message.")
logger.debug("Debug message.")

2022-06-12 18:06:30,165:root:DEBUG: Debug message.


The log record now contains the date and time of the event, the name of the logger that generated the message, the level of the log, and the log message.

Each logger has a name attached to it when it is created, the name of the current logger is "root" because it is the first logger created. We can create a new logger with a name like this:

In [6]:
logger = logging.getLogger("test_logger")

logger.debug("Debug message.")

2022-06-12 18:06:32,442:test_logger:DEBUG: Debug message.


The log record has the name of the logger. 

### Logging the Hostname

To log extra information that is not available by the default within the logger we have to extend the logging module by creating Filter classes. A Filter is simply a class that accepts log records and can modify them. 

We'll add the hostname of the machine where the process is running to the log records by creating a Filter class:

In [7]:
import platform


class HostnameFilter(logging.Filter):
    """Logging filter that adds the hostname to log records."""

    def filter(self, record):
        record.hostname = platform.uname()[1]
        return True

To use the HostnameFilter class, we'll instantiate it and add it to the logger:

In [8]:
hostname_filter = HostnameFilter()

logger.addFilter(hostname_filter)

To actually add the hostname to the log record, we'll need to modify the format:

In [9]:
formatter = logging.Formatter('%(asctime)s:%(name)s:%(levelname)s:%(hostname)s:%(message)s')
stream_handler.setFormatter(formatter)

logger.warning("Warning message.")
logger.debug("Debug message.")

2022-06-12 18:06:41,880:test_logger:DEBUG:Brians-MBP.attlocal.net:Debug message.


The name of the field added to the log record is "hostname" so to add it to a log record we needed to add "%(hostname)s" to the format string.

### Logging Environment Variables

We may want to add more contextual information to the log in the future. This information will come from the environment variables of the process in which the logger is running. To do this we'll create another Filter that is able to pick up information from the environment variables. This filter is more interesting because it needs to be configured with the names of the environment variables that it needs to add to the log records.



In [10]:
import os
from typing import List


class EnvironmentInfoFilter(logging.Filter):
    """Logging filter that adds information to log records from environment variables."""
    
    def __init__(self, env_variables: List[str]):
        super().__init__()
        self._env_variables = env_variables

    def filter(self, record):
        for env_variable in self._env_variables:
            record.__setattr__(env_variable.lower(), os.environ.get(env_variable, "N/A"))
        return True

To try it out we'll have to add an environment variable that will be logged:

In [11]:
os.environ["NODE_IP"] = "198.197.196.195"

Next, we'll instantiate the Filter class and add it to a logger instance to see how it works.

In [12]:
environment_info_filter = EnvironmentInfoFilter(env_variables=["NODE_IP"])

logger.addFilter(environment_info_filter)

In [13]:
formatter = logging.Formatter('%(asctime)s:%(name)s:%(levelname)s:%(hostname)s:%(node_ip)s:%(message)s')
stream_handler.setFormatter(formatter)

logger.warning("Warning message.")
logger.debug("Debug message.")

2022-06-12 18:07:47,285:test_logger:DEBUG:Brians-MBP.attlocal.net:198.197.196.195:Debug message.


The log record now contains the IP address that we set in the environment variables.

### Logging in JSON

So far, the logs we've been generated have been in a slightly structured format that we came up with. It uses colons to separate out different sections of the log record. If we want to easily parse the logs, we should instead use JSON records. In this section we'll use the python-json-logger package to format the log records as JSON strings. 

First, we'll install the package:

In [14]:
from IPython.display import clear_output

!pip install python-json-logger

clear_output()

We'll instantiate a JsonFormatter object that will conver the logs to JSON:

In [15]:
from pythonjsonlogger import jsonlogger


json_formatter = jsonlogger.JsonFormatter("%(asctime)s:%(name)s:%(levelname)s:"
                                          "%(hostname)s:%(node_ip)s:%(message)s")

We'll add the formatter to the stream handler that we created above like this:

In [16]:
stream_handler.setFormatter(json_formatter)

Now when we log, the output will be a JSON string:

In [17]:
logger.error("Error message.")

{"asctime": "2022-06-12 18:07:58,346", "name": "test_logger", "levelname": "ERROR", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "message": "Error message."}


We can add easily add more fields from the log record to make it more comprehensive:

In [18]:
json_formatter = jsonlogger.JsonFormatter(
    "%(asctime)s %(hostname)s %(node_ip)s "
    "%(process)s %(thread)s %(pathname)s "
    "%(lineno)s %(levelname)s %(message)s")

stream_handler.setFormatter(json_formatter)

logger.error("Error message.")

{"asctime": "2022-06-12 18:08:02,496", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/469864656.py", "lineno": 8, "levelname": "ERROR", "message": "Error message."}


The JSON formatter can also add extra fields to the log record by using the "extra" parameter:

In [19]:
extra = {
    "action": "predict",
    "model_qualified_name": "model_qualified_name",
    "model_version": "model_version",
    "status":"error",
    "error_info": "error_info"
}

logger.error("message", extra=extra)

{"asctime": "2022-06-12 18:08:04,808", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/1433050719.py", "lineno": 9, "levelname": "ERROR", "message": "message", "action": "predict", "model_qualified_name": "model_qualified_name", "model_version": "model_version", "status": "error", "error_info": "error_info"}


The extra fields are:

- action: the method called on the MLModel instance
- model_qualified_name: the qualified name of the model
- model_version: the version of the model
- status: whether the action succeeded or not, can be "success" or "error"
- error_info:

This information would normally be included in the "message" field of the log record as unstructured text, but by breaking it out and putting it into individual fields in the JSON log record we'll be able to parse it later.

### Putting It All Together

We've done a few things with the logger module, now we need to put it all together into one configuration that we can use to set up the logger the way we want it.

The logging.config.dictConfig() function can accept all of the options of the loggers, formatters, handlers, and filters and set them up with one function call.

In [20]:
import logging.config


logging_config = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "json_formatter": {
            "class": "pythonjsonlogger.jsonlogger.JsonFormatter",
            "format": "%(asctime)s %(hostname)s %(node_ip)s %(process)s %(thread)s %(pathname)s %(lineno)s %(levelname)s %(message)s"
        }
    },
    "filters": {
        "hostname_filter": {
            "()": "__main__.HostnameFilter"            
        },
        "environment_info_filter": {
            "()": "__main__.EnvironmentInfoFilter",
            "env_variables": ["NODE_IP"]
        }
    },
    "handlers": {
        "stdout":{
            "level":"INFO",
            "class":"logging.StreamHandler",
            "stream": "ext://sys.stdout",
            "formatter": "json_formatter",
            "filters": ["hostname_filter", "environment_info_filter"]
        }
    },
    "loggers": {
        "root": {
            "level": "INFO",
            "handlers": ["stdout"],
            "propagate": False
        }
    }
}

logging.config.dictConfig(logging_config)

In [21]:
logger = logging.getLogger()

logger.debug("Debug message.")
logger.info("Info message.")
logger.error("Error message.")

{"asctime": "2022-06-12 18:08:17,392", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/4067465749.py", "lineno": 4, "levelname": "INFO", "message": "Info message."}
{"asctime": "2022-06-12 18:08:17,394", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/4067465749.py", "lineno": 5, "levelname": "ERROR", "message": "Error message."}


The logger behaved in the same way as when we created it programatically.

## Installing a Model

We won't be training an ML model from scratch in this blog post because it would take a lot of space in the post. We'll be reusing a model that we built in a [previous blog post](https://www.tekhnoal.com/regression-model.html). The model's code is hosted in [this github repository](https://github.com/schmidtbri/regression-model). The model is able to predict  health insurance premiums.

The model itself can be installed as a normal Python package, using the pip command:

In [22]:
!pip install -e git+https://github.com/schmidtbri/regression-model#egg=insurance_charges_model

clear_output()

Making a prediction with the model is done through the InsuranceChargesModel class, which we'll import like this:

In [23]:
from insurance_charges_model.prediction.model import InsuranceChargesModel

clear_output()

Now we'll instantiate the model class in order to make a prediction.

In [24]:
model = InsuranceChargesModel()

clear_output()

In order to make a prediction with the model instance, we'll need to instantiate the input:

In [25]:
from insurance_charges_model.prediction.schemas import InsuranceChargesModelInput, \
    SexEnum, RegionEnum

model_input = InsuranceChargesModelInput(
    age=25, 
    sex=SexEnum.male,
    bmi=21.0,
    children=0,
    smoker=False,
    region=RegionEnum.northwest)

The model's input schema is called InsuranceChargesModelInput and it holds all of the features required by the model to make a prediction.

Now we can make a prediction with the model by calling the predict() method with an instance of the InsuranceChargesModelInput class.

In [26]:
prediction = model.predict(model_input)

prediction

InsuranceChargesModelOutput(charges=2696.69)

The model predicts that the insurance charges will be $2696.69.

The model provides it's input and output schemas through the "input_schema" and "output_schemas" class attributes. We can view these schemas a JSON schemas by calling the .schema() method on the instance.

In [27]:
model.input_schema.schema()

{'title': 'InsuranceChargesModelInput',
 'description': "Schema for input of the model's predict method.",
 'type': 'object',
 'properties': {'age': {'title': 'Age',
   'description': 'Age of primary beneficiary in years.',
   'minimum': 18,
   'maximum': 65,
   'type': 'integer'},
  'sex': {'title': 'Sex',
   'description': 'Gender of beneficiary.',
   'allOf': [{'$ref': '#/definitions/SexEnum'}]},
  'bmi': {'title': 'Body Mass Index',
   'description': 'Body mass index of beneficiary.',
   'minimum': 15.0,
   'maximum': 50.0,
   'type': 'number'},
  'children': {'title': 'Children',
   'description': 'Number of children covered by health insurance.',
   'minimum': 0,
   'maximum': 5,
   'type': 'integer'},
  'smoker': {'title': 'Smoker',
   'description': 'Whether beneficiary is a smoker.',
   'type': 'boolean'},
  'region': {'title': 'Region',
   'description': 'Region where beneficiary lives.',
   'allOf': [{'$ref': '#/definitions/RegionEnum'}]}},
 'definitions': {'SexEnum': {'titl

The output schema looks like this:

In [28]:
model.output_schema.schema()

{'title': 'InsuranceChargesModelOutput',
 'description': "Schema for output of the model's predict method.",
 'type': 'object',
 'properties': {'charges': {'title': 'Charges',
   'description': 'Individual medical costs billed by health insurance to customer in US dollars.',
   'type': 'number'}}}

## Creating the Logging Decorator

Now that we have a logging configuration with all of the basics, we'll start working on a Decorator that can help us do logging around an MLModel instance. 

In order to build a MLModel decorator class, we'll need to inherit from the MLModelDecorator class and add some functionality.

In [29]:
from typing import List, Optional
import logging
from ml_base.decorator import MLModelDecorator
from ml_base.ml_model import MLModelSchemaValidationException


class LoggingDecorator(MLModelDecorator):
    """Decorator to do logging around an MLModel instance."""

    def __init__(self, input_fields: Optional[List[str]] = None, 
                 output_fields: Optional[List[str]] = None) -> None:
        super().__init__(input_fields=input_fields, output_fields=output_fields)
        self.__dict__["_logger"] = None
        
    def predict(self, data):
        if self.__dict__["_logger"] is None:
            self.__dict__["_logger"] = logging.getLogger("{}_{}".format(
                self._model.qualified_name, "_logger"))
        
        # extra fields to be added to the log record
        extra = {
            "action": "predict",
            "model_qualified_name": self._model.qualified_name,
            "model_version": self._model.version
        }
        
        # adding model input fields to the extra fields to be logged
        new_extra = dict(extra)
        if self._configuration["input_fields"] is not None:
            for input_field in self._configuration["input_fields"]:
                new_extra[input_field] = getattr(data, input_field)
        
        self.__dict__["_logger"].info("Prediction requested.", extra=new_extra)
        
        try:
            prediction = self._model.predict(data=data)
            extra["status"] = "success"
            
            # adding model output fields to the extra fields to be logged
            new_extra = dict(extra)
            if self._configuration["output_fields"] is not None:
                for output_field in self._configuration["output_fields"]:
                    new_extra[output_field] = getattr(prediction, output_field)            
            self.__dict__["_logger"].info("Prediction created.", extra=new_extra) 
            return prediction
        except Exception as e:
            extra["status"] = "error"
            extra["error_info"] = str(e)
            self.__dict__["_logger"].error("Prediction exception.", extra=extra)
            raise e

The LoggingDecorator class has most of its logic in the predict method. This method simply instantiates a logger object and logs a message before a prediction is made, after it is made, and in the case when an exception is raised. Notice that the exception information is logged, but the exception is re-raised immediately after. We don't want to keep the exception from being handled by whatever code is using the model.

The decorator also adds a few fields to the log message:

- action: the action that the model is performing, in this case "prediction"
- model_qualified_name: the qualified name of the model performing the action
- model_version: the version of the model performing the action
- status: the result of the action, can be either "success" or "error"
- error_info: an optional field that adds error information when an exception is raised

These fields are added on top of all the regular fields that the logging package provides. The extra information should allow us to easily filter logs later.

## Decorating the Model

To test out the decorator we’ll first instantiate the model object that we want to use with the decorator.

In [30]:
model = InsuranceChargesModel()

Next, we’ll instantiate the decorator:

In [31]:
logging_decorator = LoggingDecorator()

We can add the model instance to the decorator after it’s been instantiated like this:

In [32]:
decorated_model = logging_decorator.set_model(model)

We can see the decorator and the model objects by printing the reference to the decorator:

In [33]:
decorated_model

LoggingDecorator(InsuranceChargesModel)

The decorator object is printing out it's own type along with the type of the model that it is decorating.

Now we can try out the logging decorator by making a few predictions:

In [34]:
prediction = decorated_model.predict(model_input)

prediction

{"asctime": "2022-06-12 18:09:45,971", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/578892535.py", "lineno": 33, "levelname": "INFO", "message": "Prediction requested.", "action": "predict", "model_qualified_name": "insurance_charges_model", "model_version": "0.1.0"}
{"asctime": "2022-06-12 18:09:46,016", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/578892535.py", "lineno": 44, "levelname": "INFO", "message": "Prediction created.", "action": "predict", "model_qualified_name": "insurance_charges_model", "model_version": "0.1.0", "status": "success"}


InsuranceChargesModelOutput(charges=2696.69)

Calling the predict method on the decorated model now emits two log messages. The first message is a "Prediction requested." message and happens before the model's predict method is called. The second is a "Prediction created." message and happens after the prediction is returned by the model to the decorator. The decorator can also log exceptions made by the model.

The logging decorator is also able to grab fields from the model's input and output and log those alongside the other fields. Here is how to configure the logging decorator to do this:

In [35]:
logging_decorator = LoggingDecorator(input_fields=["age", "bmi"],
                                     output_fields=["charges"])

decorated_model = logging_decorator.set_model(model)

prediction = decorated_model.predict(model_input)

prediction

{"asctime": "2022-06-12 18:09:58,567", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/578892535.py", "lineno": 33, "levelname": "INFO", "message": "Prediction requested.", "action": "predict", "model_qualified_name": "insurance_charges_model", "model_version": "0.1.0", "age": 25, "bmi": 21.0}
{"asctime": "2022-06-12 18:09:58,611", "hostname": "Brians-MBP.attlocal.net", "node_ip": "198.197.196.195", "process": 36023, "thread": 4445095424, "pathname": "/var/folders/vb/ym0r3p412kg598rdky_lb5_w0000gn/T/ipykernel_36023/578892535.py", "lineno": 44, "levelname": "INFO", "message": "Prediction created.", "action": "predict", "model_qualified_name": "insurance_charges_model", "model_version": "0.1.0", "status": "success", "charges": 2696.69}


InsuranceChargesModelOutput(charges=2696.69)

The "Prediction requested." log message now has two extra fields, the "age" field and the "bmi" field which were directly copied from the model input. The "Prediction created." log message also has the "charges" field, which is the prediction returned by the model.

We now have a working logging decorator that can help us to do logging if the model does not do logging for itself.

## Adding the Decorator to a Deployed Model

Now that we have a working decorator that works locally, we can deploy it with a model inside of a service. The [rest_model_service package](https://pypi.org/project/rest-model-service/) is able to host ML models and create a RESTful API for each individual model. We don't need to write any code to do this because the service can decorate the models that it hosts with decorators that we provide. You can learn more about the package in [this blog post](https://www.tekhnoal.com/rest-model-service.html).

To install the service package, execute this command:

In [36]:
!pip install rest_model_service>=0.3.0

clear_output()

The configuration for our model and decorator looks like this:

```yaml
service_title: Insurance Charges Model Service
models:
  - qualified_name: insurance_charges_model
    class_path: insurance_charges_model.prediction.model.InsuranceChargesModel
    create_endpoint: true
    decorators:
      - class_path: ml_model_logging.logging_decorator.LoggingDecorator
logging:
    version: 1
    disable_existing_loggers: true
    formatters:
      json_formatter:
        class: pythonjsonlogger.jsonlogger.JsonFormatter
        format: "%(asctime)s %(hostname)s %(node_ip)s %(process)s %(thread)s %(pathname)s %(lineno)s %(levelname)s %(message)s"
    filters:
      hostname_filter:
        "()": ml_model_logging.filters.HostnameFilter
      environment_info_filter:
        "()": ml_model_logging.filters.EnvironmentInfoFilter
        env_variables:
        - NODE_IP
    handlers:
      stdout:
        level: INFO
        class: logging.StreamHandler
        stream: ext://sys.stdout
        formatter: json_formatter
        filters:
        - hostname_filter
        - environment_info_filter
    loggers:
      root:
        level: INFO
        handlers:
        - stdout
        propagate: false
```

The two main sections in the file are the "models" section and the "logging" section. The models section is simpler and simply lists the InsuranceChargesModel, along with the LoggingDecorator. This section works in the same way as in previous blog posts. The logging configuration is set up exactly like we set it up is the examples above, the YAML is converted to a dictionary and passed directly into the logging.config.dictConfig() function.

To run the service locally, execute these commands:

```bash
export PYTHONPATH=./
export REST_CONFIG=./configuration/rest_configuration.yaml
uvicorn rest_model_service.main:app --reload
```

The service should come up and can be accessed in a web browser at http://127.0.0.1:8000. When you access that URL you will be redirected to the documentation page that is generated by the FastAPI package:

![Service Documentation](service_documentation_lfmlm.png)
![Service Documentation]({attach}service_documentation_lfmlm.png){ width=100% }

The documentation allows you to make requests against the API in order to try it out. Here's a prediction request against the insurance charges model:

![Prediction Request](prediction_request_lfmlm.png)
![Prediction Request]({attach}prediction_request_lfmlm.png){ width=100% }

And the prediction result:

![Prediction Response](prediction_response_lfmlm.png)
![Prediction Response]({attach}prediction_response_lfmlm.png){ width=100% }


The prediction made by the model had to go through the logging decorator that we configured into the service, so we got these two log records from the process:


![Prediction Log](prediction_log_lfmlm.png)
![Prediction Log]({attach}prediction_log_lfmlm.png){ width=100% }

By using the MLModel base class provided by the ml_base package and the REST service framework provided by the rest_model_service package we're able to quickly stand up a service to host the model. The decorator that we want to test can also be added to the model through configuration, including all of its parameters.


The local web service process emits the logs to stdout just as we configured it.

## Deploying the Model Service

Now that we have a working service that is running locally, we can work on deploying it to a cloud provider. We'll be using the [managed Kubernetes service](https://www.digitalocean.com/products/kubernetes) on DigitalOcean to deploy the model service. 

### Creating a Docker Image

Kubernetes needs to have a Docker image in order to deploy something, we'll build an image using this Dockerfile:

```dockerfile
FROM python:3.9-slim

ARG DATE_CREATED
ARG VERSION
ARG REVISION

LABEL org.opencontainers.image.title="Logging for ML Models"
LABEL org.opencontainers.image.description="Logging for machine learning models."
LABEL org.opencontainers.image.created=$DATE_CREATED
LABEL org.opencontainers.image.authors="6666331+schmidtbri@users.noreply.github.com"
LABEL org.opencontainers.image.source="https://github.com/schmidtbri/logging-for-ml-models"
LABEL org.opencontainers.image.version=$VERSION
LABEL org.opencontainers.image.revision=$REVISION
LABEL org.opencontainers.image.licenses="MIT License"
LABEL org.opencontainers.image.base.name="python:3.9-slim"

WORKDIR ./service

RUN apt-get update
RUN apt-get --assume-yes install git

COPY ./ml_model_logging ./data_enrichment
COPY ./configuration ./configuration
COPY ./LICENSE ./LICENSE
COPY ./service_requirements.txt ./service_requirements.txt

RUN pip install -r service_requirements.txt

CMD ["uvicorn", "rest_model_service.main:app", "--host", "0.0.0.0", "--port", "8000"]

```

The Dockerfile includes a set of labels from the [Open Containers annotations specification](https://github.com/opencontainers/image-spec/blob/main/annotations.md). Most of the labels are hardcoded in the Dockerfile, but there are three that we need to add from the outside: the date created, the version, and the revision. To do this we'll pull some information into environment variables:

In [37]:
DATE_CREATED=!date +"%Y-%m-%d %T"
# current git revision which is a SHA5 hash
REVISION=!git rev-parse HEAD

!echo "$DATE_CREATED"
!echo "$REVISION"

['2022-06-12 18:11:44']
['7f21864be6b0566f7b1883c20e0f6909d4776952']


Now we can use the values to build the image. We'll also provide the version as a build argument.

In [38]:
!docker build \
  --build-arg DATE_CREATED="$DATE_CREATED" \
  --build-arg VERSION="0.1.0" \
  --build-arg REVISION="$REVISION" \
  -t insurance_charges_model_service:0.1.0 ..\

clear_output()

To find the image we just built, we'll search through the local docker images:

In [43]:
!docker images | grep insurance_charges_model_service

insurance_charges_model_service                                                                   0.1.0     b2445b60123b   2 minutes ago   1.25GB
registry.digitalocean.com/dev-model-services-container-registry/insurance_charges_model_service   0.1.0     a3624392da83   10 days ago     1.25GB


Next, we'll start the image to see if everything is working as expected.

In [44]:
!docker run -d \
    -p 8000:8000 \
    -e REST_CONFIG=./configuration/rest_configuration.yaml \
    -e NODE_IP="198.197.196.195" \
    --name insurance_charges_model_service \
    insurance_charges_model_service:0.1.0

91d91849e46fdde11eeef83f120340cf1071705a9114f6ce9a738c21436a1cf6


Notice that we added an environment variable called NODE_IP, this is just so we have a value to pull into the logs later, its not the real node IP address.

The service should be accessible on port 8000 of localhost, so we'll try to make a prediction using the curl command:

In [45]:
!curl -X 'POST' \
  'http://127.0.0.1:8000/api/models/insurance_charges_model/prediction' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{ \
      \"age\": 65, \
      \"sex\": \"male\", \
      \"bmi\": 50, \
      \"children\": 5, \
      \"smoker\": true, \
      \"region\": \"southwest\" \
    }"

{"charges":46277.67}

The service is up and running in the docker container. To view the logs coming out of the process, we'll use the docker logs command:

In [46]:
!docker logs insurance_charges_model_service

{"asctime": "2022-06-12 22:17:26,583", "hostname": "91d91849e46f", "node_ip": "198.197.196.195", "process": 1, "thread": 140681388955456, "pathname": "/usr/local/lib/python3.9/site-packages/rest_model_service/main.py", "lineno": 26, "levelname": "INFO", "message": "Starting 'Insurance Charges Model Service'."}
{"asctime": "2022-06-12 22:17:27,924", "hostname": "91d91849e46f", "node_ip": "198.197.196.195", "process": 1, "thread": 140681388955456, "pathname": "/usr/local/lib/python3.9/site-packages/rest_model_service/main.py", "lineno": 53, "levelname": "INFO", "message": "Loaded insurance_charges_model model."}
{"asctime": "2022-06-12 22:17:27,925", "hostname": "91d91849e46f", "node_ip": "198.197.196.195", "process": 1, "thread": 140681388955456, "pathname": "/usr/local/lib/python3.9/site-packages/rest_model_service/main.py", "lineno": 71, "levelname": "INFO", "message": "Added LoggingDecorator decorator to insurance_charges_model model."}
{"asctime": "2022-06-12 22:17:27,926", "host

As we expected, the logs are coming out in JSON format. We're done with the docker container so we'll stop it and stop it and remove it.

In [47]:
!docker kill insurance_charges_model_service
!docker rm insurance_charges_model_service

insurance_charges_model_service
insurance_charges_model_service


### Setting up DigitalOcean

To deploy the model to DigitalOcean, we'll need to connect to their API. To interact with the API, we'l be using the DigitalOcean CLI, which is a commmand line tool. Installation instructions for the CLI package are [here](https://docs.digitalocean.com/reference/doctl/how-to/install/). To interact with the API, we'll also need an authentication token, which we can get by following [these instructions](https://docs.digitalocean.com/reference/api/create-personal-access-token/). To add the token to the doctl CLI tool, we execute this command:

```bash
doctl auth init --context model-services-context
```

The command asks for the token and saves it for later use.

Now that we have the credentials set up, we can start creating the infrastructure for our model deployment.

### Creating the Kubernetes Cluster

Some of the commands in this section will have a lot of output, we we'll be shortening the output to maintain clarity.

We wont be creating the managed Kubernetes cluster by hand, we'll be doing it through an Infrastructure as Code tool called [Terraform](https://www.terraform.io/). Terraform will allow us to declaratively state our infrastructure requirements in configuration files, and then create, manage, and destroy it with simple commands. The command line Terraform tool can be installed by following [these intructions](https://learn.hashicorp.com/tutorials/terraform/install-cli). 

The actual Terraform module that we'll be using is not in the current repository, it's in [a separate repository](https://github.com/schmidtbri/do-kubernetes-cluster). We'll be referencing the Terraform module and adding our own variables to customize the infrastrucutre. Putting the IaC code in a different repository and importing it makes the code reusable in many different contexts.

The Terraform code that configures the infrastructure is in the terraform/k8s_cluster.tf file in this repository. The file looks like this:

```terraform
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
      version = "~> 2.0"
    }
  }
}

module "kubernetes_cluster" {
  source = "github.com/schmidtbri/do-kubernetes-cluster"

  project_name                = "model-services"
  environment                 = "dev"
  region                      = "nyc3"
  default_pool_size           = 2
  default_pool_worker_type    = "s-1vcpu-2gb"
  enable_additional_pool      = true
  additional_pool_size        = 2
  additional_pool_worker_type = "s-2vcpu-4gb"
}
```

The terraform code sets up the DigitalOcean provider at the top, which will be used to interact with the DigitalOcean API. The "kubernetes_cluter" module is configured to pull the source from the IaC repository, configuring it to create these resources:

- a Digital Ocean project to hold resources
- a docker registry
- a VPC for the cluster nodes
- a kubernetes cluster
- an additional node pool within the same cluster

The variables are used to customize the resources to our needs. The variables are:

- project_name: Name of the project, this name will be added to the names of all of the resources created.
- environment: Environment name that will be added to the name of all of the resources created.
- region: Geographical region to use for the resources.
- default_pool_size: Number of nodes to create in default node pool.
- default_pool_worker_type: Type of the droplets to use for the default node pool.	
- enable_additional_pool: Enable or disable creation of additional node pool.	
- additional_pool_size: Number of nodes to create in the additional node pool.	
- additional_pool_worker_type: Type of the droplets to use for the additional node pool.	

We'll be creating a cluster for hosting model services and we want to create a development environment to experiment in. We'll be creating two node pools because we want to run extra workloads that we want to keep separate from the model service deployments. The default node pool will have two nodes, and the additional node pool will also have two nodes.

To create the resources we'll need to apply the module. First we'll change into the terraform folder and save the DigitalOcean token as an environment variable.

In [48]:
%cd ../terraform

%env DIGITALOCEAN_TOKEN=dop_v1_4e7db92644ece702e39f628ba4c5348c55ae57de845de5fc2e55765823ba5263

clear_output()

Next, we'll initialize the Terraform environment.

In [50]:
!terraform init

[0m[1mInitializing modules...[0m

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m
- Reusing previous version of digitalocean/digitalocean from the dependency lock file
- Using previously-installed digitalocean/digitalocean v2.20.0

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m


The Terraform CLI tool has downloaded the necessary provider code and started the environment. In order to deploy infrastructure, we'll first need to make a plan.

In [51]:
!terraform plan -out=tfplan


Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  [32m+[0m create
[0m
Terraform will perform the following actions:

[1m  # module.kubernetes_cluster.digitalocean_container_registry.container_registry[0m will be created[0m[0m
[0m  [32m+[0m[0m resource "digitalocean_container_registry" "container_registry" {
      [32m+[0m [0m[1m[0mcreated_at[0m[0m             = (known after apply)
      [32m+[0m [0m[1m[0mendpoint[0m[0m               = (known after apply)
      [32m+[0m [0m[1m[0mid[0m[0m                     = (known after apply)
      [32m+[0m [0m[1m[0mname[0m[0m                   = "dev-model-services-container-registry"
      [32m+[0m [0m[1m[0mregion[0m[0m                 = "nyc3"
      [32m+[0m [0m[1m[0mserver_url[0m[0m             = (known after apply)
      [32m+[0m [0m[1m[0mstorage_usage_bytes[0m[0m    = (known after

As we expected, there will be five resources created. To create the resources, we'll use the apply terraform command.

In [52]:
!terraform apply -auto-approve tfplan

[0m[1mmodule.kubernetes_cluster.digitalocean_container_registry.container_registry: Creating...[0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_vpc.cluster_vpc: Creating...[0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_vpc.cluster_vpc: Creation complete after 1s [id=b96875c9-ed67-46d7-892e-3a3c4d9329c0][0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_cluster.cluster: Creating...[0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_container_registry.container_registry: Creation complete after 7s [id=dev-model-services-container-registry][0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_cluster.cluster: Still creating... [10s elapsed][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_cluster.cluster: Still creating... [20s elapsed][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_cluster.cluster: Still creating... [30s elapsed][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_cluster.cluster

The cluster is now ready for use. To push workloads to the cluster we'll need to upload Docker images to the registry and then access them from the cluster. To enable access to the registry from the cluster, we'll use another doctl command:

In [53]:
!doctl kubernetes cluster registry add dev-model-services-cluster

In [54]:
%cd ../

/Users/brian/Code/logging-for-ml-models


### Pushing the Image

We've built a Docker image and created a cluster to deploy it to, but we still need to upload the image to the registry so that we can pull it from the cluster. To login to the registry we can use a doctl command:

In [55]:
!doctl registry login

Logging Docker in to registry.digitalocean.com


The image exists in the local Docker registry, so we'll need to tag it with the remote registry name to upload it.

In [56]:
!docker tag insurance_charges_model_service:0.1.0 registry.digitalocean.com/dev-model-services-container-registry/insurance_charges_model_service:0.1.0

Now we can push the image to the DigitalOcean docker registry.

In [57]:
!docker push registry.digitalocean.com/dev-model-services-container-registry/insurance_charges_model_service:0.1.0

The push refers to repository [registry.digitalocean.com/dev-model-services-container-registry/insurance_charges_model_service]

[1Baff73edb: Preparing 
[1B67322c6f: Preparing 
[1B6a3c9a87: Preparing 
[1B90be6b9a: Preparing 
[1Bf8a5a5e9: Preparing 
[1Bd31b029e: Preparing 
[1B93bba98e: Preparing 
[1Bb9285d50: Preparing 
[1B5ff75c19: Preparing 
[1B6adc64fd: Preparing 
[1B3e0df215: Preparing 
[1B341968bc: Preparing 


[13Bff73edb: Pushing  310.5MB/1.023GB[13A[2K[13A[2K[13A[2K[13A[2K[11A[2K[10A[2K[6A[2K[13A[2K[7A[2K[8A[2K[7A[2K[5A[2K[7A[2K[13A[2K[7A[2K[5A[2K[7A[2K[8A[2K[13A[2K[7A[2K[6A[2K[5A[2K[7A[2K[8A[2K[7A[2K[4A[2K[8A[2K[5A[2K[7A[2K[13A[2K[7A[2K[5A[2K[8A[2K[7A[2K[7A[2K[8A[2K[13A[2K[5A[2K[7A[2K[4A[2K[5A[2K[7A[2K[8A[2K[13A[2K[5A[2K[8A[2K[13A[2K[7A[2K[8A[2K[5A[2K[3A[2K[3A[2K[8A[2K[5A[2K[7A[2K[13A[2K[5A[2K[3A[2K[7A[2K[8A[2K[13A[2K[5A[2K[3A[2K[7A[2K[8A[2K[3A[2K[5A[2K[7A[2K[8A[2K[3A[2K[5A[2K[5A[2K[8A[2K[7A[2K[3A[2K[13A[2K[8A[2K[7A[2K[3A[2K[7A[2K[3A[2K[8A[2K[3A[2K[13A[2K[7A[2K[3A[2K[7A[2K[3A[2K[8A[2K[7A[2K[13A[2K[3A[2K[7A[2K[8A[2K[5A[2K[3A[2K[7A[2K[3A[2K[2A[2K[2A[2K[2A[2K[7A[2K[3A[2K[7A[2K[8A[2K[7A[2K[3A[2K[8A[2K[7A[2K[7A[2K[7A[2K[13A[2K[8A[2K[7A[2K[8A[2K[3A[2K[7A[2K[8A[2

[13Bff73edb: Pushing  916.5MB/1.023GB[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2

[13Bff73edb: Pushed   1.034GB/1.023GB[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2K[13A[2

### Accessing the Kubernetes Cluster

Now that we have a running cluster, we can connect to it by setting up the kubectl command line tool. The DigitalOcean CLI tool can do this for us with this command:

In [58]:
!doctl kubernetes cluster kubeconfig save cf9dd0d1-b67a-43fa-a7d2-f3b30d35d2da

[32mNotice[0m: Adding cluster credentials to kubeconfig file found in "/Users/brian/.kube/config"
[32mNotice[0m: Setting current-context to do-nyc3-dev-model-services-cluster


The unique identifier is for the cluster that was just created, we can get it through the DigitalOcean console. When the command finishes, the current kubectl context should be switched to the newly created cluster.

To make sure everything is working we can get a list of the nodes in the cluster with this command:

In [59]:
!kubectl get nodes

NAME                                           STATUS   ROLES    AGE    VERSION
dev-model-services-additional-pool-c2s5n       Ready    <none>   119m   v1.22.8
dev-model-services-additional-pool-c2ss9       Ready    <none>   119m   v1.22.8
dev-model-services-default-worker-pool-c2spi   Ready    <none>   122m   v1.22.8
dev-model-services-default-worker-pool-c2spv   Ready    <none>   123m   v1.22.8


As we expected, we have four nodes, two in each node pool. We'll be using [node tags](https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes/) to do scheduling later, so we'll look the node tags up like this:

In [60]:
!kubectl get nodes --show-labels

NAME                                           STATUS   ROLES    AGE    VERSION   LABELS
dev-model-services-additional-pool-c2s5n       Ready    <none>   119m   v1.22.8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=s-2vcpu-4gb,beta.kubernetes.io/os=linux,doks.digitalocean.com/node-id=9c522467-1824-444d-8e5a-8696ffcb8ac2,doks.digitalocean.com/node-pool-id=0734e25f-54da-465d-a6ca-833678b6b067,doks.digitalocean.com/node-pool=dev-model-services-additional-pool,doks.digitalocean.com/version=1.22.8-do.1,failure-domain.beta.kubernetes.io/region=nyc3,kubernetes.io/arch=amd64,kubernetes.io/hostname=dev-model-services-additional-pool-c2s5n,kubernetes.io/os=linux,node.kubernetes.io/instance-type=s-2vcpu-4gb,region=nyc3,topology.kubernetes.io/region=nyc3
dev-model-services-additional-pool-c2ss9       Ready    <none>   119m   v1.22.8   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=s-2vcpu-4gb,beta.kubernetes.io/os=linux,doks.digitalocean.com/node-id=96963518-6e

The output is long so we trimmed it down to a single node. We can see a lot of information about a node through the tags that it has. For example, the label "beta.kubernetes.io/arch=amd64" tells us the architecture of the node, the label "doks.digitalocean.com/version=1.22.8-do.1" tells us the kubernetes version installed. We're most interested in the "doks.digitalocean.com/node-pool=dev-model-services-additional-pool" label which tells us which node pool the node belongs to. We'll be using this later to schedule workloads.

### Creating a Kubernetes Namespace

Now that we have a cluster and are connected to it, we'll create a namespace to hold the resources for our model deployment. The resource definition is in the kubernetes/namespace.yaml file. To apply the manifest to the cluster, execute this command:

In [64]:
!kubectl create -f kubernetes/namespace.yaml

Error from server (AlreadyExists): error when creating "kubernetes/namespace.yaml": namespaces "model-services" already exists


To take a look at the namespaces, execute this command:

In [65]:
!kubectl get namespace

NAME              STATUS   AGE
default           Active   128m
kube-node-lease   Active   128m
kube-public       Active   128m
kube-system       Active   128m
model-services    Active   11s


The new namespace should appear in the listing along with other namespaces created by default by the system. To set the namespace as the default one for all kubectl commands, execute this command:

In [216]:
!kubectl config set-context --current --namespace=model-services

Context "do-nyc3-dev-model-services-cluster" modified.


### Creating the Model Service

To create the model service we need two types of kubernetes resources, these are:

- Deployment: a declarative way to manage a set of pods, the model service pods are managed through the Deployment.
- Service: a way to expose a set of pods in a Deployment, the model services is made available to the outside world through the Service, the service type is LoadBalancer which means that a load balancer will be created for the service.

Both of these resources are defined in the kubernetes/model_service.yaml file, the file is long so we won't list it here. The env section in the containers definition in the Deployment has a special section which is allowing us to access information about the pod and the node:

```yaml
...
env:
  # environment variable pointing at the configuration file to use
  - name: REST_CONFIG
    value: ./configuration/rest_configuration2.yaml
  # environment variables with downward API information
  - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
...
```

The pod definition is using the [downward API provided by Kubernetes](https://kubernetes.io/docs/tasks/inject-data-application/downward-api-volume-expose-pod-information/) to access the node name and the pod name. This information is made available as environment variables. We'll be adding this information to the log by adding the names of the environment variables to the logger configuration that we'll give to the model service. We built a logging context class above for the purpose of adding environment variables to log records.

Another section of the Deployment definition is the nodeSelector section:

```yaml
nodeSelector:
  doks.digitalocean.com/node-pool: "dev-model-services-default-worker-pool"
```

The nodeSelector section is making sure that the service pods are going to be scheduled on the default worker pool. We're choosing to schedule the service pods on this node pool so that we can keep the workload separate from the logging system, which we'll be working on later.

The service is deployed to the Kubernetes cluster with this command:

In [217]:
!kubectl apply -f kubernetes/model_service.yaml

deployment.apps/insurance-charges-model-deployment configured
service/insurance-charges-model-service unchanged


The deployment and service for the model service were created together. You can see the new service with this command:

In [218]:
!kubectl get services

NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
insurance-charges-model-service   LoadBalancer   10.245.24.248   138.197.51.107   80:30890/TCP   75m


You can also view the pods that are running the service:

In [219]:
!kubectl get pods -l app=insurance-charges-model-service

NAME                                                  READY   STATUS    RESTARTS   AGE
insurance-charges-model-deployment-79c64f6f46-m6kkf   1/1     Running   0          7s


The Service type is LoadBalancer, which means that the cloud provider is providing a load balancer and public IP address through which we can contact the service. To view details about the load balancer provided by DigitalOcean for this Service, we'll execute this command:

In [220]:
!kubectl describe service insurance-charges-model-service | grep "LoadBalancer Ingress"

LoadBalancer Ingress:     138.197.51.107


The load balancer can take a while longer than the service to come up, until the load balancer is running the command won't return anything. The IP address that the DigitalOcean load balancer sits behind will be listed in the output of the command.

To make a prediction, we'll hit the IP service with a request:

In [221]:
!curl -X 'POST' \
  'http://138.197.51.107/api/models/insurance_charges_model/prediction' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{ \
      \"age\": 65, \
      \"sex\": \"male\", \
      \"bmi\": 50, \
      \"children\": 5, \
      \"smoker\": true, \
      \"region\": \"southwest\" \
    }"

{"charges":46277.67,"prediction_id":"8c19b7fe-ee69-46f4-90bd-9ff1c8908bf7"}

The model service is up and running and returning predictions!

We can check which node the pods are scheduled on by describing one of the pods:

In [223]:
!kubectl describe pod insurance-charges-model-deployment-79c64f6f46-m6kkf  | grep "Node-Selectors"

Node-Selectors:              doks.digitalocean.com/node-pool=dev-model-services-default-worker-pool


The pod was scheduled on the default worker pool, as we expect.

### Accessing the Logs

Kubernetes has a built-in system that receives the stdout and stderr outputs of the running containers and saves them to the hard drive of the node for a limited time. You can view the logs emmitted by the containers by using this command:

In [224]:
!kubectl logs insurance-charges-model-deployment-79c64f6f46-m6kkf  | grep "\"action\": \"predict\""

{"asctime": "2022-06-13 01:57:22,641", "hostname": "insurance-charges-model-deployment-79c64f6f46-m6kkf", "pod_name": "insurance-charges-model-deployment-79c64f6f46-m6kkf", "node_name": "dev-model-services-default-worker-pool-c2spi", "process": 1, "thread": 140424458041088, "pathname": "/service/./ml_model_logging/logging_decorator.py", "lineno": 33, "levelname": "INFO", "message": "Prediction requested.", "action": "predict", "model_qualified_name": "insurance_charges_model", "model_version": "0.1.0", "prediction_id": null}
{"asctime": "2022-06-13 01:57:22,716", "hostname": "insurance-charges-model-deployment-79c64f6f46-m6kkf", "pod_name": "insurance-charges-model-deployment-79c64f6f46-m6kkf", "node_name": "dev-model-services-default-worker-pool-c2spi", "process": 1, "thread": 140424458041088, "pathname": "/service/./ml_model_logging/logging_decorator.py", "lineno": 44, "levelname": "INFO", "message": "Prediction created.", "action": "predict", "model_qualified_name": "insurance_char

The logs contain every field that we configured and they are in JSON format, as we expected. The log records also contain the pod_name and node_name fields that we added through the downward API.

Although we can view the logs like this, this is not the ideal way to hold logs. We need to be able to search through the logs generated across the whole system. To do this we'll need to export the logs to an external logging system. We'll be working on that in another section of this blog post.

### Adding a Prediction ID

Now that we have a working logging decorator, we'll add one more field to the logs that it produces. We want to be able to uniquely identify each prediction made by the model within the logs, and also to be able to provide a unique identifier for a prediction to clients of the model service. This allows us to quickly debug problems with individual predictions. 

We built a prediction ID decorator is [a previous blog post](https://www.tekhnoal.com/ml-model-decorators.html). We'll reuse this decorator to add a prediction_id field to the model's input and output. The code is in the current repository in the ml_model_logging/prediction_id_decorator.py file.

To add the prediction id decorator to the model, all we need to do is add it to the service configuration file in the decorators section:

```yaml
...
decorators:
  - class_path: ml_model_logging.prediction_id_decorator.PredictionIDDecorator
  - class_path: ml_model_logging.logging_decorator.LoggingDecorator
    configuration:
      input_fields: ["prediction_id"]
      output_fields: ["prediction_id"]
...
```

The PredictionIDDecorator class implements the functionality, we wont do a deep dive here because it would make the blog post too long. The decorator is instantiated by the model service and added last to the model, so that a prediction ID is generated before the LoggingDecorator sees the input data.

The LoggingDecorator is then configured to log the prediction_id field that is created by the PredictionIDDecorator instance. It's also configured to log the prediction_id from the output of the model because the decorator will generate an id if the client does not provide one.

To apply the new configuration, we'll change the kubernetes Deployment and by changing the environment variable that points at the configuration file that the model service is using:

```yaml
env:
  # environment variable pointing at the configuration file to use
  - name: REST_CONFIG
    value: ./configuration/rest_configuration3.yaml
```

Now we can modify the Deployment in the cluster with this command:

In [213]:
!kubectl apply -f kubernetes/model_service.yaml

deployment.apps/insurance-charges-model-deployment created
service/insurance-charges-model-service created


In [214]:
!kubectl get pods

NAME                                                  READY   STATUS    RESTARTS      AGE
elasticsearch-master-0                                1/1     Running   0             48m
elasticsearch-master-1                                1/1     Running   0             48m
insurance-charges-model-deployment-79c64f6f46-pt9g2   1/1     Running   0             9s
kibana-kibana-9d8cfff9c-xlksr                         1/1     Running   1 (32m ago)   38m


The model service Deployment gets recreated with a different value in the REST_CONFIG environment variable. Now when we make a prediction with the service, we'll get a prediction id in the response:

In [141]:
!curl -X 'POST' \
  'http://138.197.51.107/api/models/insurance_charges_model/prediction' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{ \
      \"age\": 65, \
      \"sex\": \"male\", \
      \"bmi\": 50, \
      \"children\": 5, \
      \"smoker\": true, \
      \"region\": \"southwest\" \
    }"

{"charges":46277.67,"prediction_id":"cfd6006b-69e8-4aac-a6bc-4be5500f336f"}

Since we did not provide a prediction_id input for the model, the PredictionIDDecorator instance generated one and returned it along with the prediction. Now we can look into the logs to see if the prediction_id field made it into the log records:

In [144]:
!kubectl logs insurance-charges-model-deployment-cf9f7c767-fsmzz | grep "\"action\": \"predict\""

{"asctime": "2022-06-13 01:05:08,291", "hostname": "insurance-charges-model-deployment-cf9f7c767-fsmzz", "pod_name": "insurance-charges-model-deployment-cf9f7c767-fsmzz", "node_name": "dev-model-services-default-worker-pool-c2spv", "process": 1, "thread": 140558518556416, "pathname": "/service/./ml_model_logging/logging_decorator.py", "lineno": 33, "levelname": "INFO", "message": "Prediction requested.", "action": "predict", "model_qualified_name": "insurance_charges_model", "model_version": "0.1.0", "prediction_id": null}
{"asctime": "2022-06-13 01:05:08,402", "hostname": "insurance-charges-model-deployment-cf9f7c767-fsmzz", "pod_name": "insurance-charges-model-deployment-cf9f7c767-fsmzz", "node_name": "dev-model-services-default-worker-pool-c2spv", "process": 1, "thread": 140558518556416, "pathname": "/service/./ml_model_logging/logging_decorator.py", "lineno": 44, "levelname": "INFO", "message": "Prediction created.", "action": "predict", "model_qualified_name": "insurance_charges_

The logs now have a prediction_id field that matches the value that was generated by the PredictionIDDecorator instance working in the model service. 

By building some flexibility into the code, we were able to add a prediction ID to the model by using a decorator. We were also able to add the prediction ID to the logs emitted by the logging decorator by configuring it. Lastly, we were able to add the prediction ID decorator to the model service by adding configuration to the model service.

The most poweful aspect of this is that we did not need to modify the code of the model, or the code of the decorators, or the code of the model service at all to add this extra field to the model and the logs.

## Creating the Logging System

The complexity of modern cloud environment makes it hard to manage logs in individual servers since we really don't know where our workloads are going to be scheduled ahead of time. Kubernetes workloads are also highly distributed, meaning that an application can be replicated in many different nodes in a cluster. This makes it necessary to gather logs together in one place so that we can more easily view and analyze them.

A logging system is responsible for gathering  log records from all of the instances of a running application and make them searchable from one centralized location. In this section, we'll add such a logging system to the cluster and use it to monitor the model service we've deployed.

### Logging in Kubernetes

Kubernetes is able to 

### Creating a Namespace

To begin building the logging system, we'll create a Kubernetes namespace for the system so we can keep thing separate from the rest of the workloads we'll be running in the cluster.

In [145]:
!kubectl create -f kubernetes/logging_namespace.yaml

namespace/logging-system created


The namespace is called "logging-system". 

Next we'll switch the kubectl context so that we'll be working exclusively within this namespace:

In [225]:
!kubectl config set-context --current --namespace=logging-system

Context "do-nyc3-dev-model-services-cluster" modified.


### Installing the Helm Charts

All of the services we'll be installing in this section are available as [Helm charts](https://helm.sh/) from [this](https://github.com/elastic/helm-charts) open source repository. We can add the Helm repository with a single command:

In [147]:
!helm repo add elastic https://helm.elastic.co

"elastic" already exists with the same configuration, skipping


The "elastic" repository contains several charts:

In [148]:
!helm search repo elastic

NAME                     	CHART VERSION	APP VERSION	DESCRIPTION                                       
elastic/elasticsearch    	7.17.3       	7.17.3     	Official Elastic helm chart for Elasticsearch     
elastic/apm-server       	7.17.3       	7.17.3     	Official Elastic helm chart for Elastic APM Server
elastic/eck-operator     	2.2.0        	2.2.0      	A Helm chart for deploying the Elastic Cloud on...
elastic/eck-operator-crds	2.2.0        	2.2.0      	A Helm chart for installing the ECK operator Cu...
elastic/filebeat         	7.17.3       	7.17.3     	Official Elastic helm chart for Filebeat          
elastic/kibana           	7.17.3       	7.17.3     	Official Elastic helm chart for Kibana            
elastic/logstash         	7.17.3       	7.17.3     	Official Elastic helm chart for Logstash          
elastic/metricbeat       	7.17.3       	7.17.3     	Official Elastic helm chart for Metricbeat        


We'll be using these charts:

- elastic/filebeat
- elastic/logstash
- elastic/elasticsearch
- elastic/kibana

### Creating the Log Storage Service

The log aggregator needs to have a place to store the logs, to do this we'll use  [ElasticSearch](https://www.elastic.co/elasticsearch/). ElasticSearch is a distributed full-text search enginer with a RESTful API.  The ElasticSearch service is ideal for our needs because our logs are made up of text strings.

To install ElasticSearch, we'll provide the ./helm/elasticsearch_values.yaml file to Helm using the elastic/elasticsearch chart:

In [149]:
!helm install elasticsearch elastic/elasticsearch -f ./helm/elasticsearch_values.yaml

NAME: elasticsearch
LAST DEPLOYED: Sun Jun 12 21:07:18 2022
NAMESPACE: logging-system
STATUS: deployed
REVISION: 1
NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=logging-system -l app=elasticsearch-master -w2. Test cluster health using Helm test.
  $ helm --namespace=logging-system test elasticsearch


Now we can view the pods running the service:

In [165]:
!kubectl get pods -l app=elasticsearch-master

NAME                     READY   STATUS    RESTARTS   AGE
elasticsearch-master-0   1/1     Running   0          5m46s
elasticsearch-master-1   1/1     Running   0          5m46s


ElasticSearch also has a [Service](https://kubernetes.io/docs/concepts/services-networking/service/):

In [166]:
!kubectl get services

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
elasticsearch-master            ClusterIP   10.245.219.122   <none>        9200/TCP,9300/TCP   5m53s
elasticsearch-master-headless   ClusterIP   None             <none>        9200/TCP,9300/TCP   5m53s


ElasticSearch is deployed as a [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/), which is a way to schedule workloads on Kubernetes that need to maintain state. 

The StatefulSet is also running on the additional node pool. We did this by adding a nodeSelector to the Helm values file:

```yaml
...
nodeSelector:
    doks.digitalocean.com/node-pool: "dev-model-services-additional-pool"
...
```

This scheduling requirement will prevent us from running the logging system on the same nodes as the model service.

### Creating the Log User Interface Service

To view the logs we'll be using [Kibana](https://www.elastic.co/kibana/). Kibana is a web application that can provide access to and visualize logs stored in ElasticSearch.

To install Kibana, we'll provide the ./helm/kibana_values.yaml file to Helm using the elastic/kibana chart:

In [168]:
!helm install kibana elastic/kibana -f ./helm/kibana_values.yaml

NAME: kibana
LAST DEPLOYED: Sun Jun 12 21:18:03 2022
NAMESPACE: logging-system
STATUS: deployed
REVISION: 1
TEST SUITE: None


In [178]:
!kubectl get pods -l app=kibana

NAME                            READY   STATUS    RESTARTS        AGE
kibana-kibana-9d8cfff9c-xlksr   1/1     Running   1 (5m34s ago)   11m


Kibana is deployed as a normal web service. We can view the Kibana service like this:

In [179]:
!kubectl get services | grep "kibana"

kibana-kibana                   ClusterIP   10.245.52.191    <none>        5601/TCP            11m


Te Kibana pods are also scheduled on the additional node pool using a nodeSelector:

```yaml
...
nodeSelector:
  doks.digitalocean.com/node-pool: "dev-model-services-additional-pool"
...
```

To access the Kibana web UI, we'll forward the port from the pod to a local port:

```bash
kubectl port-forward svc/kibana-kibana 5601:5601
```

We can view the Kibana UI on a local browser:

![Kibana UI](kibana_ui.png)
![Kibana UI]({attach}kibana_ui.png){ width=100% }

### Creating the Log Aggregator Service

Once the logs have been forwarded from the individual nodes in the cluster, we'll need to aggregated them and store them somewhere. To aggregate the logs, we'll use [Logstash](https://www.elastic.co/logstash/). Logstash is able to ingest data from many sources, process it, and save it to a destination.

To install Logstash, we'll provide the ./helm/logstash_values.yaml file to Helm using the elastic/logstash chart:

In [240]:
!helm install logstash elastic/logstash -f ./helm/logstash_values.yaml

NAME: logstash
LAST DEPLOYED: Sun Jun 12 22:04:59 2022
NAMESPACE: logging-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Watch all cluster members come up.
  $ kubectl get pods --namespace=logging-system -l app=logstash-logstash -w


We can view the pods that are running the Logstash service like this:

In [247]:
!kubectl get pods -l app=logstash-logstash

NAME                  READY   STATUS    RESTARTS   AGE
logstash-logstash-0   0/1     Running   0          88s


In [248]:
!kubectl describe pod logstash-logstash-0

Name:         logstash-logstash-0
Namespace:    logging-system
Priority:     0
Node:         dev-model-services-additional-pool-c2ss9/10.108.16.5
Start Time:   Sun, 12 Jun 2022 22:05:01 -0400
Labels:       app=logstash-logstash
              chart=logstash
              controller-revision-hash=logstash-logstash-65c4bdd744
              heritage=Helm
              release=logstash
              statefulset.kubernetes.io/pod-name=logstash-logstash-0
Annotations:  pipelinechecksum: 4839e081dd1fb1ef4e11ac386773fc6fbb9af9e4be1ff574f51527138017016
Status:       Running
IP:           10.244.1.225
IPs:
  IP:           10.244.1.225
Controlled By:  StatefulSet/logstash-logstash
Containers:
  logstash:
    Container ID:   containerd://bcf6d71add9888f536ee54cd0d36e097423e3bca5c4c32be29be2e3bfe30bcf1
    Image:          docker.elastic.co/logstash/logstash:7.15.0
    Image ID:       docker.elastic.co/logstash/logstash@sha256:ba6ee9c11620d0bb9d5bff5937bdf995b71bc7a2bcd1047b1458c

In [243]:
!kubectl logs logstash-logstash-0

Using bundled JDK: /usr/share/logstash/jdk


In [None]:
!kubectl get services

In [246]:
!kubectl logs logstash-logstash-0

Using bundled JDK: /usr/share/logstash/jdk


Logstash is deployed as a [Service](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/), which is a way to schedule workloads on Kubernetes that need to maintain state. 

The StatefulSet is also running on the additional node pool. We did this by adding a nodeSelector to the Helm values file, just like Logstash.

### Creating the Log Forwarder Service

In order to centralize access to logs, we'll first need a way to get the logs off of the individual cluster nodes and forward them to the aggregator service. The service we'll use to do this is called [Filebeat](https://www.elastic.co/beats/filebeat). Filebeat is a lightweight service that can forward logs stored in files to an outside service.

To install Filebeat, we'll provide the ./helm/filebeat_values.yaml file to Helm using the elastic/filebeat chart:

In [226]:
!helm install filebeat elastic/filebeat -f ./helm/filebeat_values.yaml

NAME: filebeat
LAST DEPLOYED: Sun Jun 12 21:58:51 2022
NAMESPACE: logging-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Watch all containers come up.
  $ kubectl get pods --namespace=logging-system -l app=filebeat-filebeat -w


We can view the pods that are running the Filebeat processes like this:

In [230]:
!kubectl get pods -l app=filebeat-filebeat

NAME                      READY   STATUS    RESTARTS   AGE
filebeat-filebeat-6tr9w   0/1     Running   0          36s
filebeat-filebeat-pdcvp   0/1     Running   0          36s


In [231]:
!kubectl describe pod filebeat-filebeat-pdcvp

Name:         filebeat-filebeat-pdcvp
Namespace:    logging-system
Priority:     0
Node:         dev-model-services-default-worker-pool-c2spv/10.108.16.2
Start Time:   Sun, 12 Jun 2022 21:58:53 -0400
Labels:       app=filebeat-filebeat
              chart=filebeat-7.17.3
              controller-revision-hash=5cb557986c
              heritage=Helm
              pod-template-generation=1
              release=filebeat
Annotations:  configChecksum: 8bff89f37192b6bcf6c9ac083a23400476e58d2aaa210ee43dd98650fa51b5f
Status:       Running
IP:           10.244.0.94
IPs:
  IP:           10.244.0.94
Controlled By:  DaemonSet/filebeat-filebeat
Containers:
  filebeat:
    Container ID:  containerd://1ab3612d5de06434cf92fb1f9c73dd1eea95f9ec64715707d7204afcec73e479
    Image:         docker.elastic.co/beats/filebeat:7.15.0
    Image ID:      docker.elastic.co/beats/filebeat@sha256:bb436cf141e03a2e5a2dc589971c2f20b621e46380d8f25016f9df4a2dba67a2
    Port:          <none>
    Hos

The Filebeat pods are running inside of a [DaemonSet](https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/), which is a way to schedule pods in Kubernetes that guarantees that a pod will run on all nodes. We have two nodes running on the worker node pool of the cluster so we see two pods listed above.

The DaemonSet is actually not running on the nodes in the additional node pool we set up using Terraform above. This was done by adding a nodeSelector section to the Helm values file for the Filebeat installation:

```yaml
...
nodeSelector:
    doks.digitalocean.com/node-pool: "dev-model-services-default-worker-pool"
...
```

This nodeSelector guarantees that we'll only run the log forwarding processes in the default worker pool of the cluster which is running the model service. This is to be able to keep the workloads separated in two different node pools, the default pool is for the model services, and the additional node pool is for the supporting services like the logging system.

## Viewing the Logs

A Kibana dashboard specifically for the model, created by using the fields that were added by the decorator…


Number of predictions...

Number of exceptions...

Number of predictions per instance...

## Deleting the Resources

Now that we're done with the service we need to destroy the resources. 

To delete the logging system, we'll delete the helm deployments:

In [249]:
!helm uninstall filebeat
!helm uninstall logstash
!helm uninstall elasticsearch
!helm uninstall kibana

release "filebeat" uninstalled
release "logstash" uninstalled
release "elasticsearch" uninstalled
release "kibana" uninstalled


The persistent volume claims are not deleted along with the deployments, so we'll list them and then delete them:

In [250]:
!kubectl get pvc

NAME                                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
elasticsearch-master-elasticsearch-master-0   Bound    pvc-a5788ff6-1165-48a2-a55e-3ca3409f4501   30Gi       RWO            do-block-storage   67m
elasticsearch-master-elasticsearch-master-1   Bound    pvc-0721169a-345f-41ac-9d91-deb166e54f09   30Gi       RWO            do-block-storage   67m


In [251]:
!kubectl delete pvc -l app=elasticsearch-master

persistentvolumeclaim "elasticsearch-master-elasticsearch-master-0" deleted
persistentvolumeclaim "elasticsearch-master-elasticsearch-master-1" deleted


Now we can delete the logging system's namespace:

In [252]:
!kubectl delete -f kubernetes/logging_namespace.yaml

namespace "logging-system" deleted


To delete the model service kubernetes resources, we'll execute this command:

In [254]:
!kubectl config set-context --current --namespace=model-services
!kubectl delete -f ./kubernetes/model_service.yaml

Context "do-nyc3-dev-model-services-cluster" modified.
deployment.apps "insurance-charges-model-deployment" deleted
service "insurance-charges-model-service" deleted


We'll also delete the model service namespace:

In [255]:
!kubectl delete -f kubernetes/namespace.yaml

namespace "model-services" deleted


Lastly, we'll need to delete the cloud infrastructure using the Terraform:

In [256]:
%cd ./terraform

!terraform plan -destroy -out=tfplan

/Users/brian/Code/logging-for-ml-models/terraform
[0m[1mmodule.kubernetes_cluster.digitalocean_container_registry.container_registry: Refreshing state... [id=dev-model-services-container-registry][0m
[0m[1mmodule.kubernetes_cluster.digitalocean_vpc.cluster_vpc: Refreshing state... [id=b96875c9-ed67-46d7-892e-3a3c4d9329c0][0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_cluster.cluster: Refreshing state... [id=cf9dd0d1-b67a-43fa-a7d2-f3b30d35d2da][0m
[0m[1mmodule.kubernetes_cluster.digitalocean_project.project: Refreshing state... [id=a4470669-8c90-468a-9bdb-5e3a27428eff][0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_node_pool.additional_pool[0]: Refreshing state... [id=0734e25f-54da-465d-a6ca-833678b6b067][0m

Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:
  [31m-[0m destroy
[0m
Terraform will perform the following actions:

[1m  # module.kubernetes_clu

In [257]:
!terraform apply -auto-approve -destroy tfplan

[0m[1mmodule.kubernetes_cluster.digitalocean_project.project: Destroying... [id=a4470669-8c90-468a-9bdb-5e3a27428eff][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_container_registry.container_registry: Destroying... [id=dev-model-services-container-registry][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_node_pool.additional_pool[0]: Destroying... [id=0734e25f-54da-465d-a6ca-833678b6b067][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_container_registry.container_registry: Destruction complete after 1s[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_project.project: Destruction complete after 3s[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_node_pool.additional_pool[0]: Still destroying... [id=0734e25f-54da-465d-a6ca-833678b6b067, 10s elapsed][0m[0m
[0m[1mmodule.kubernetes_cluster.digitalocean_kubernetes_node_pool.additional_pool[0]: Still destroying... [id=0734e25f-54da-465d-a6ca-833678b6b067, 20s elapsed][0m[0m
[0m[1mm

## Closing

In this blog post we showed how to do logging with the Python logging package, and how to create a decorator that can help us to do logging around an MLModel. We also set up and used a logging system within a Kubernetes cluster and used it to aggregate logs and view them. Logging is usually the first thing that is implemented when we need to monitor how a system performs, and machine learning models are no exception to this. The logging decorator allowed us to do complex logging without having to modify the implementation of the model at all, thus simplifying a common aspect of software observability.

One of the benefits of using the decorator pattern is that we are able to build up complex behaviors around an object by combining decorator instances in the right order. We saw how to do this when we added a unique id to each prediction that the model made by adding the PredictionIDDecorator to the configuration. The prediction_id field was then added to the configuration of the LoggingDecorator as an extra field to add to the log records it produced. This approach allowed us to add a unique identifier to each prediction and also log the identifier with each log generated by the logging decorator, and we didnt have to write new code to do it.

The LoggingDecorator class is very configurable, since we are able to configure it to log input and output fields from the model. This approach makes the implementation very flexible, since we do not need to modify the decorator's code to add fields to the log. The EnvironmentInfoFilter class that we implemented to grab information from the environment for logs is also built this way. We were able to get information about the Kubernetes deployment from the logs without having to modify the code at all.

The LoggingDecorator class is designed to work with MLModel classes, and this is the only hard requirement of the code. This makes the decorator very portable, because we are able to deploy it inside of any other model deployment service we may choose to build in the future. For example, we can just as easily decorate an MLModel instance runnning inside of an gRPC service, since the decorator would work exactly the same way. This is due to interface-driven approach that we took when designing the MLModel interface.

The fact is that we added logging to the ML model from the "outside" and we were not able to access information about the internals of the model. This is a limitation of the decorator approach to logging which only has access to the model inputs, model outputs, and exceptions raised by the model. This approach is best used to add logging functionality to an ML model implementation that we do not control, or in simple situations in which the limitations of the approach do not affect us. If any logging of internal model state is needed, we'll need to generate logs from within the MLModel class. 