# Overview

According to the [homepage](https://mlflow.org/), MLflow is "An open source platform for the machine learning lifecycle". Its intended purpose is to provide a single tool that lets data scientists address the ML-specific parts of the software development life cycle (SDLC), including experimentation, reproducibility, model deployment, and model storage in a central model registry.

The project breaks down into the following sub-components:
- MLflow Tracking - Lets users record and query experiments: code, data, config, and results
- MLflow Projects - A standard package format allowing reproducible deployments of models to any platform regardless of where they were authored
- MLflow Models - A unified abstraction layer allowing support for integration with multiple machine learning model libraries and providers
- Model Registry - Store, annotate, discover, and manage models in a central repository

We will see that each sub-component has a corresponding UI and API.

The project page lists integrations with many well-known providers and technology stacks, and the project receives contributions from many established names in the space.

## Agenda

In this notebook we will get our feet wet and explore the basic functionality. For official documentation, see the [MLflow quickstart guide](https://mlflow.org/docs/latest/quickstart.html).



# 1. Architecture

Before we get started with MLflow, it is important to understand its architecture. As we will see in section 2, there are a number of ways to deploy MLflow. Whichever method we choose, an MLflow deployment consists of the following components:
- Backend - Persists MLflow entities (runs, parameters, metrics, tags, notes, metadata, etc)
- Artifact Store - Persists artifacts (files, models, images, in-memory objects, model summaries, etc)
- REST API - An optional component that exposes the MLflow API over HTTP
- Tracking UI - The web UI that lets you visualize, search, and compare runs, download run artifacts or metadata for analysis in other tools, and register or tag models
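To make the split between the backend and the artifact store concrete, here is a minimal sketch using the MLflow Tracking API. The parameter, metric, and file names are hypothetical values chosen for illustration, and the import is guarded so the cell still runs where MLflow is not installed. With no other configuration, the run is recorded in whatever default tracking location the installed MLflow version uses.

```python
# Hypothetical values, chosen only to show where each kind of data goes.
params = {"alpha": 0.5}
metrics = {"rmse": 0.8}
summary_text = "toy model summary"

try:
    import mlflow
except ImportError:
    mlflow = None  # without MLflow installed, the sketch is a no-op

artifact_location = None
if mlflow is not None:
    with mlflow.start_run() as run:
        # Entities (parameters, metrics) are persisted by the backend.
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
        # Files are persisted by the artifact store.
        mlflow.log_text(summary_text, "summary.txt")
        artifact_location = mlflow.get_artifact_uri()
    print("run id:", run.info.run_id)
    print("artifacts stored under:", artifact_location)
```

The parameters and metrics are later visible in the Tracking UI as run entities, while `summary.txt` appears among that run's artifacts.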



# 2. Deployments

According to the [documentation](https://mlflow.org/docs/latest/tracking.html), if you are deploying your own MLflow installation, you have the following documented options:

<table >
	<tbody>
		<tr>
			<td>Scenario</td>
			<td>Description</td>
			<td>Backend</td>
			<td>Artifact Store</td>
		</tr>
		<tr>
			<td>1</td>
			<td>MLflow on localhost</td>
			<td>local ./mlruns&nbsp;directory</td>
			<td>local ./mlruns&nbsp;directory</td>
		</tr>
		<tr>
			<td>2</td>
			<td>MLflow on localhost with SQLite</td>
			<td>local mlruns.db file (SQLite)</td>
			<td>local ./mlruns directory</td>
		</tr>
		<tr>
			<td>3</td>
			<td>MLflow on localhost with Tracking Server</td>
			<td>A REST API utilizes&nbsp;./mlruns directory</td>
			<td>A REST API utilizes ./mlruns directory</td>
		</tr>
		<tr>
			<td>4</td>
			<td>MLflow with remote Tracking Server, backend and artifact stores</td>
			<td>A REST API utilizes remote backend</td>
			<td>A REST API utilizes remote artifact store</td>
		</tr>
	</tbody>
</table>

**Note**: None of these scenarios covers the MLflow UI.

Databricks and AWS also provide MLflow deployments that data scientists can leverage.

For our purposes, we will keep it simple and go with scenario 1 and host our own UI.
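From the client's point of view, the scenarios mostly differ in the tracking URI it is pointed at. The sketch below is one possible mapping, not a prescribed configuration: the `SCENARIO_TRACKING_URIS` dictionary, the `tracking_uri_for` helper, the port `5000`, and the host `tracking.example.com` are all assumptions made for illustration. The MLflow import is guarded so the cell also runs where MLflow is not installed.

```python
from pathlib import Path

# Assumed mapping from the scenarios in the table to a tracking URI.
# The server addresses are placeholders, not values from the documentation.
SCENARIO_TRACKING_URIS = {
    1: Path("mlruns").resolve().as_uri(),    # local ./mlruns directory
    2: "sqlite:///mlruns.db",                # local SQLite backend file
    3: "http://localhost:5000",              # local tracking server (assumed port)
    4: "http://tracking.example.com:5000",   # hypothetical remote tracking server
}


def tracking_uri_for(scenario: int) -> str:
    """Return the assumed tracking URI for one of the four scenarios."""
    if scenario not in SCENARIO_TRACKING_URIS:
        raise ValueError(f"unknown scenario: {scenario}")
    return SCENARIO_TRACKING_URIS[scenario]


# We go with scenario 1: everything lives under the local ./mlruns directory.
uri = tracking_uri_for(1)
print("tracking URI:", uri)

try:
    import mlflow

    mlflow.set_tracking_uri(uri)  # later runs in this session are recorded here
except ImportError:
    pass  # MLflow not installed; the URI is still shown above
```

Switching to another scenario later would then only require changing the argument to `tracking_uri_for`, assuming the corresponding database or server is already in place.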

# 3. Object Model

# 4. Launching MLFlow