# <span style="color: #ff6D04; ">MLflow + FiftyOne Workflow</span>


# A Guided Walkthrough


In this notebook, we will walkthrough a demo workflow using FiftyOne and MLflow together to train YOLOv9, a state of the art detection model. We will demonstate each step of the process, including loading in our training dataset, covering how to curate your data, training and monitoring our model, and performing post training evaluation. The notebook will demonstrate how MLflow and FiftyOne can be paired to level up any training workflow through precise model monitoring and data curation. Paired with FiftyOne's app all this information can be easily digested and explored to increase the peformance of a machine learning engineer.

Be sure to have started your MLflow server as well as downloaded all necesary requirements. Steps for both of these can be found below!

## Installing Requirements

First install the required python libraries below

In [None]:
!pip install mlflow fiftyone torch torchvision voxel51-eta[storage] ultralytics

Next we will install the fiftyone-mlflow-plugin that will allow us to view and manage our MLflow client in the FiftyOne App! The App can be run in your browser at localhost:5151 or even in your Databricks Notebook!

In [None]:
!fiftyone plugins download https://github.com/voxel51/fiftyone_mlflow_plugin

## Start the MLflow Server

Before we begin, we will start our MLflow server locally to serve as our backend for the demo.

In [None]:
# Please enter the following in another bash terminal in the same project directory!

mlflow server --backend-store-uri runs/mlflow

## Prepping for Training

Let's kick things off by loading in all of our required libraries. While we are at it, we will start our MLflow client and specifying our `tracking_uri`

In [1]:
import os

# Set tracking URI across libraries
os.environ["MLFLOW_TRACKING_URI"] = "http://127.0.0.1:5000"

import fiftyone as fo
import fiftyone.utils.random as four

For our example workflow, I will be using a subset of the [VisDrone](https://github.com/VisDrone/VisDrone-Dataset?tab=readme-ov-file) dataset, a state of the art drone imagery dataset from  Lab of Machine Learning and Data Mining, Tianjin University, China. It features a wide range of locations, time of day, objects, and angles. The subset we will be using can be downloaded on [Google Drive](https://drive.google.com/file/d/1a2oHjcEcwXP8oUF95qiwrqzACb2YlUhn/view). Once the file is downloaded and unzipped, we can load it in by following our ingestor below!</span>

In [None]:
!eta gdrive download --public 1a2oHjcEcwXP8oUF95qiwrqzACb2YlUhn VisDrone2019-DET-train.zip

In [None]:
!unzip VisDrone2019-DET-train.zip -d VisDrone-train

In [5]:
import os

dataset_dir="./VisDrone-train/VisDrone2019-DET-train/images"
name = "VisDrone"

# Create the dataset by loading in the directory of images
dataset = fo.Dataset.from_dir(
    dataset_dir=dataset_dir,
    dataset_type=fo.types.ImageDirectory,
    name=name,
    overwrite=True
)

# We compute the metadata of the dataset to get height and width of all our samples
dataset.compute_metadata()



 100% |███████████████| 6471/6471 [469.6ms elapsed, 0s remaining, 13.8K samples/s]      
Computing metadata...
 100% |███████████████| 6471/6471 [971.6ms elapsed, 0s remaining, 6.7K samples/s]      


VisDrone features 12 different classes which we will create a dictionary for. The annotations are stored as <x, y, w, h, confidence, label, truncation, occlusion> in txt files. Since it is a custom format, we ingest it by looping through our datasets and grabbing each sample. Next we open up the text file and add the detections and all their metadata on a sample by sample basis

In [8]:
class_map = {0:"ignore_regions",
             1:"pedestrians",
             2:"people",
             3:"bicycle",
             4:"car",
             5:"van",
             6:"truck",
             7:"tricycle",
             8:"awning_tricycle",
             9:"bus",
             10:"motor",
             11:"others",
}

ann_dir = "./VisDrone-train/VisDrone2019-DET-train/annotations/"

for sample in dataset:

    # Grab the annotation file
    filename = os.path.basename(sample.filepath)
    ann = ann_dir + os.path.splitext(filename)[0] + ".txt"
    if os.path.exists(ann):
        with open(ann, 'r') as file:
            detections = []
            for line in file:
                split_line = line.strip().split(",")
                ann_list = [int(x) for x in split_line[:8]]

                # Grab all the detection information from the line
                label = class_map[ann_list[5]]
                trunc = ann_list[6]
                occ = ann_list[7]

                # FiftyOne takes in normalized (x,y,w,h) bounding boxes
                x = ann_list[0] / sample.metadata.width
                y = ann_list[1] / sample.metadata.height
                w = ann_list[2] / sample.metadata.width
                h = ann_list[3] / sample.metadata.height
                det = fo.Detection(
                    label=label,
                    bounding_box = [x,y,w,h],
                    truncation=trunc,
                    occlusion=occ
                )
                detections.append(det)

            sample["ground_truth"] = fo.Detections(detections=detections)
            sample.save()

# Set our dataset as persistent
dataset.persistent=True

After loading both our images and annotations in, we set the dataset as persistent to have it persist in the database and make sure any new changes will saved. This also allows for easy reloading on future sessions with the following: 

In [2]:
dataset = fo.load_dataset("VisDrone")

Finally, we can launch our FiftyOne app with the line below to visualize our dataset. Learn about all the different ways you can use the FiftyOne app [here](https://docs.voxel51.com/user_guide/app.html)!

In [None]:
session = fo.launch_app(dataset, auto=False)
session.open_tab()

At this point, we can begin the data curation process and begin to look for issues or mistakes in our datasets. We can leverage powerful features within FiftyOne to help bring new insights into our dataset and create high quality subsets of our data to train on.

- [Visualize embeddings with FiftyOne Brain](https://docs.voxel51.com/user_guide/brain.html#visualizing-embeddings)
- [Search your datasets with text prompts or sort by similarity](https://docs.voxel51.com/user_guide/brain.html#similarity)
- [Find image quality issues](https://github.com/jacobmarks/image-quality-issues)
- [Find exact and approximate duplicates](https://github.com/jacobmarks/image-deduplication-plugin)
- [Find outliers in your dataset](https://github.com/danielgural/outlier_detection)
- [Create interesting views of your dataset by filtering, slicing, sorting, and more!](https://docs.voxel51.com/user_guide/using_views.html)

All these curation tools, the MLflow panel and more are powered by [FiftyOne Plugins](https://github.com/voxel51/fiftyone-plugins)

Once you have created a view you like, we need to export the dataset in YOLO format in order to train YOLO9. We do so by randomly splitting and using the `export` method

In [22]:
class_map = {0:"ignore_regions",
             1:"pedestrians",
             2:"people",
             3:"bicycle",
             4:"car",
             5:"van",
             6:"truck",
             7:"tricycle",
             8:"awning_tricycle",
             9:"bus",
             10:"motor",
             11:"others",
}

# Replace below with you own saved view, or use the whole dataset
#curated = dataset.load_saved_view("Curated")
curated = dataset

four.random_split(curated, {"val": 0.15, "train": 0.85})
classes = list(class_map.values())

for split in ["val","train","test"]:
    view =  curated.match_tags(split)
    view.export(
        export_dir="VisDrone_curated/",
        split=split,
        dataset_type=fo.types.YOLOv5Dataset,
        classes=classes
    )




 100% |███████████████| 1779/1779 [12.3s elapsed, 0s remaining, 126.8 samples/s]      
Directory 'VisDrone_curated/' already exists; export will be merged with existing files
 100% |███████████████| 6308/6308 [43.9s elapsed, 0s remaining, 91.8 samples/s]       
Directory 'VisDrone_curated/' already exists; export will be merged with existing files
 100% |█████████████████████| 0/0 [6.2ms elapsed, ? remaining, ? samples/s] 


## Beginning Training

To get started, we will be training with [Ultralytics YOLOv9](https://docs.ultralytics.com/models/yolov9/). We will take advantage of the Ultralytics MLflow integration to round out our stack for this workflow. Also, Ultralytics is [integrated](https://docs.voxel51.com/integrations/ultralytics.html) with FiftyOne for easy use!

The run will be stored on MLflow with information of the hyperparameters, dataset contents, and metrics during training like mAP score! A custom run will also be saved to the FiftyOne dataset that saves information like the tracking_uri and experiment name from MLflow as well as allows for you to come back to the view the run was trained on whenever! 

In [None]:
from ultralytics import YOLO
import fiftyone.operators as foo

log_mlflow_run = foo.get_operator("@voxel51/mlflow_tracking/log_mlflow_run")


# Build a YOLOv9c model from pretrained weight
model = YOLO('yolov9c.pt')

# Display model information (optional)
model.info()

# Train the model on the  dataset for 1 epochs, set project name for experiment_name, name for the run name
model.train(
    data='../VisDrone_curated/dataset.yaml',
    epochs=1,
    imgsz=640, 
    batch=4,
    project="mlflow_fiftyone",
    name="Curated"
)

# Add predictions to our dataset
dataset.apply_model(model, label_field="Curated")


#Log the completed run to our FiftyOne Dataset
log_mlflow_run(dataset, "mlflow_fiftyone", predictions_field="Curated")

We can start monitoring right away in FiftyOne! To open the MLflow panel, click the `+` button next to the sample tab and select `MLflow Dashboard`. You can also open the panel with the MLflow button shown below. If the dataset has an associated experiment, it will open that experiment as well in the dashboard!

<img src="./assets/open_mlflow.gif" alt="MLFLow Monitoring">

During our run, we can monitor its status in the FiftyOne App through the MLflow panel:

<img src="./assets/mlflow.gif" alt="MLFLow Monitoring">

We can even come back to a run whenever using the `show_mlflow_run` operator! With it, we can select an experiment followed by a run to pull up the panel to see training results as well as all the samples it was trained on!

<img src="./assets/view_mlflow_run.gif" alt="MLFLow Run">

## Evaluating Our Models

We can use `evaluate_detections` and calculate the mAP of our model. We also add metadata to our sample detections such if they were a false potive or a true positive!

In [17]:
results = dataset.evaluate_detections(pred_field="Curated", gt_field="ground_truth", eval_key="eval", compute_mAP=True)

Evaluating detections...
 100% |███████████████| 6471/6471 [20.0m elapsed, 0s remaining, 6.6 samples/s]      
Performing IoU sweep...
 100% |███████████████| 6471/6471 [6.8m elapsed, 0s remaining, 14.2 samples/s]      


We can repeat the workflow of adding predictions and evaluating for any number of models on our dataset! You can even compare predicitions from one model to another using the [model comparision](https://github.com/allenleetc/model-comparison) plugin!

<img src="./assets/model_compare_input.gif" alt="Model Compare Input">

We can choose from a variety of options to see exactly where your two models differ. Forget searching across hundreds of thousands of detection, the model comparision plugin will bring only the samples of interest right in front of you! 

<img src="./assets/model_compare_out.gif" alt="Model Compare Input">

A trained model can also help use during data curation! One of the most common ways is to check your high confidence false postives. This is where you are most likely to find annotation mistakes in your data!

<img src="./assets/high_cf_fp.gif" alt="High Conf False Positives">