This notebook introduces the MLOps platform you will be using in this course. The notebook first provides an overview of the platform's architecture and then links you to the tutorials that help you prepare for this week's assignments. 

# An introduction to the MLOps platform
### Overview 
The following figure illustrates the architecture of the MLOps platform.

![Overview of the MLOps platform](./images/overview.jpg)

The MLOps platform has the following main components:

**[Kubeflow Pipelines](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/)** is an open-source ML task orchestrator. It lets users define an ML pipeline, which codifies and automates a set of predefined tasks such as data preprocessing, model training, and model deployment. Kubeflow Pipelines manages the dependencies between these tasks and the order in which they execute. It runs on a Kubernetes cluster and can run multiple ML pipelines in parallel by using the cluster's resources.
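To make "managing dependencies and execution order" concrete, here is a minimal, library-free sketch of what an orchestrator does with a pipeline definition: it resolves the task dependencies into stages, and tasks in the same stage can run in parallel. The task names and the `execution_stages`/`run_pipeline` helpers are hypothetical, and this is not the Kubeflow Pipelines API. With Kubeflow Pipelines itself you declare steps with the `kfp` SDK (e.g., the `@dsl.component` and `@dsl.pipeline` decorators) and the platform schedules them on the cluster.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: dict[str, set[str]] = {
    "preprocess": set(),
    "train_xgboost": {"preprocess"},
    "train_lightgbm": {"preprocess"},
    "evaluate": {"train_xgboost", "train_lightgbm"},
    "deploy": {"evaluate"},
}


def execution_stages(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into stages; every task in a stage has all of its dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the pipeline contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all finished
        stages.append(ready)
        sorter.done(*ready)
    return stages


def run_pipeline(
    graph: dict[str, set[str]], tasks: dict[str, Callable[[], object]]
) -> dict[str, object]:
    """Run the tasks stage by stage, executing independent tasks in parallel threads."""
    results: dict[str, object] = {}
    with ThreadPoolExecutor() as pool:
        for stage in execution_stages(graph):
            futures = {name: pool.submit(tasks[name]) for name in stage}
            for name, future in futures.items():
                results[name] = future.result()  # re-raises the exception if a task failed
    return results


if __name__ == "__main__":
    for i, stage in enumerate(execution_stages(PIPELINE), start=1):
        print(f"stage {i}: {stage}")
```

The two training tasks land in the same stage because both depend only on `preprocess`, so an orchestrator is free to run them at the same time.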

**[MLflow](https://mlflow.org/docs/latest/index.html)** is an open-source platform for managing ML experiments. It records model training metadata (e.g., hyperparameters), training datasets, and metrics (e.g., the evaluation metrics obtained when the model is evaluated against a test dataset). MLflow can also save training artifacts (e.g., model artifacts and other files generated during model training) in a central location. In the MLOps platform, MLflow is deployed as a service that provides endpoints for recording model training metadata and saving training artifacts. As shown in the figure above, MLflow uses a [PostgreSQL database](https://www.postgresql.org/) as a metadata store for the training metadata and [MinIO storage](https://min.io/) as an artifact store for persisting training artifacts, e.g., model artifacts, text, and datasets.
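Because MLflow runs as a service, logging ends up as HTTP requests to the tracking server's REST API. The sketch below builds (but does not send) a request for MLflow's `runs/log-batch` endpoint, which records parameters and metrics for an existing run in one call. The tracking URI, run ID, and logged values are hypothetical placeholders. In the tutorials you will use the `mlflow` Python client (e.g., `mlflow.log_param` and `mlflow.log_metric`), which talks to the tracking server over this REST API for you.

```python
import json
import time
import urllib.request

# Hypothetical values: replace with your tracking server URL and an existing run ID.
TRACKING_URI = "http://localhost:5000"
RUN_ID = "0123456789abcdef"


def build_log_batch_request(
    tracking_uri: str, run_id: str, params: dict, metrics: dict, step: int = 0
) -> urllib.request.Request:
    """Build a POST request for MLflow's REST endpoint /api/2.0/mlflow/runs/log-batch."""
    timestamp_ms = int(time.time() * 1000)  # MLflow expects milliseconds since the epoch
    body = {
        "run_id": run_id,
        # Parameter values are sent (and stored) as strings.
        "params": [{"key": k, "value": str(v)} for k, v in params.items()],
        "metrics": [
            {"key": k, "value": float(v), "timestamp": timestamp_ms, "step": step}
            for k, v in metrics.items()
        ],
    }
    return urllib.request.Request(
        url=f"{tracking_uri.rstrip('/')}/api/2.0/mlflow/runs/log-batch",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_log_batch_request(
    TRACKING_URI, RUN_ID, params={"alpha": 0.5}, metrics={"rmse": 0.72}
)
# urllib.request.urlopen(request) would send it; that requires a running MLflow server.
```

The metadata in this request (parameters and metrics) ends up in the metadata store, while files such as model artifacts travel separately to the artifact store.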

**[KServe](https://kserve.github.io/website/0.11/)** is an open-source model serving platform built on Kubernetes. It is designed to simplify the deployment and management of machine learning models in production environments, relying on Kubernetes for the infrastructure needed to deploy and scale them. In more detail, upon receiving a model deployment request, KServe fetches the model artifact from the configured storage (the MinIO storage service in our platform). It then serves the model from a container running a model server, which exposes HTTP and gRPC APIs for querying predictions. Out of the box, KServe supports a wide range of model formats, such as SKLearn, XGBoost, LightGBM, TensorFlow, and PyTorch.

**Monitoring stack**: The monitoring stack consists of three components:
- [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/docs/grafana/latest/): A common solution for monitoring and visualizing service performance (e.g., throughput, latency, response status). Prometheus collects and stores the metrics, and Grafana visualizes them in dashboards.
- [Evidently](https://docs.evidentlyai.com/): An open-source tool for monitoring ML model performance (e.g., prediction error).
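Grafana panels backed by Prometheus are driven by PromQL queries sent to Prometheus' HTTP API. The sketch below builds an instant query against the `/api/v1/query` endpoint and reads the values out of its JSON response. The Prometheus URL and the histogram metric `request_duration_seconds_bucket` are hypothetical; which metrics actually exist depends on what your services export.

```python
import json
import urllib.parse

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical address

# 95th-percentile latency over the last 5 minutes, from a hypothetical histogram metric.
P95_LATENCY_QUERY = (
    "histogram_quantile(0.95, "
    "sum(rate(request_duration_seconds_bucket[5m])) by (le))"
)


def build_query_url(prometheus_url: str, promql: str) -> str:
    """Build a GET URL for Prometheus' instant-query endpoint."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_instant_vector(response_body: bytes) -> list[tuple[dict, float]]:
    """Return (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(response_body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each sample's "value" is [unix_timestamp, "value as a string"].
    return [(r["metric"], float(r["value"][1])) for r in payload["data"]["result"]]
```

Fetching `build_query_url(PROMETHEUS_URL, P95_LATENCY_QUERY)` with `urllib.request.urlopen` and passing the body to `parse_instant_vector` would return the current p95 latency, assuming the metric exists.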

### Week 1 tutorials
**Please run the tutorials using the `mlops_eng` Conda environment.**

This week, we'll focus on tracking model training with MLflow. Specifically, you'll learn how to log model training metadata and upload trained models to MLflow. You'll also learn how to use [Deepchecks](https://docs.deepchecks.com/stable/getting-started/welcome.html) to evaluate a trained model. In the upcoming weeks, you'll learn more about KServe and Kubeflow Pipelines.

Below are the links to the MLflow and Deepchecks tutorials.

* [MLflow tutorial](./tutorials/1_try_mlflow.ipynb)
* [Deepchecks tutorial](./tutorials/2_try_deepchecks.ipynb)
