# Waste identification with instance segmentation in Detectron2

Welcome to the Instance Segmentation Colab! This notebook walks you through running an "out-of-the-box" Mask R-CNN instance segmentation model from Detectron2 on your images.

To run the notebook, you need to provide valid paths to the model and to the input image(s). The labels on which the model was trained are stored in the waste_identification_ml directory inside the TensorFlow Model Garden repository. The label file for the model is picked up automatically.

## Clone and install Detectron2

In [None]:
# Clone the Detectron2 repository and install the required packages.
# Relax as installing packages might take a while.
!git clone -q 'https://github.com/facebookresearch/detectron2'
!pip install -q 'git+https://github.com/facebookresearch/detectron2.git'

In [None]:
# Install the supervision package, which is used to post-process the
# outputs of the Detectron2 Mask R-CNN model.
!pip install -q supervision

## Clone the TF Model Garden repo where the waste identification project is located

In [None]:
!git clone --depth 1 https://github.com/tensorflow/models 2>/dev/null

## Imports and Setup

In [None]:
# Standard Library Imports
import csv
# Third-Party Imports
import cv2
# Detectron2 Imports
import detectron2
from detectron2.config import get_cfg
from detectron2.data.catalog import Metadata
from detectron2.engine import DefaultPredictor
from detectron2.structures import Boxes, Instances
from detectron2.utils.logger import setup_logger
from detectron2.utils.visualizer import Visualizer
import matplotlib.pyplot as plt
from PIL import Image
import supervision as sv
import torch

# Setup Detectron2 Logger
setup_logger()

In [None]:
# @title Utilities


def convert_detections_to_instances(
    outputs: dict,
    image_size: tuple[int, int] = (1024, 1024),
    nms_threshold: float = 0.8,
    class_agnostic: bool = True,
) -> dict[str, Instances]:
  """Convert Detectron2 model outputs to an Instances object with Non-Maximum Suppression (NMS) applied.

  Args:
      outputs: Detectron2 model output containing instance predictions.
      image_size: Image dimensions (height, width).
      nms_threshold: Non-Maximum Suppression (NMS) threshold.
      class_agnostic: Whether NMS should be applied in a class-agnostic manner.

  Returns:
      Reformatted Detectron2 output as {"instances": Instances}.
  """
  # Apply NMS and convert to supervision Detections format
  detections = sv.Detections.from_detectron2(outputs).with_nms(
      threshold=nms_threshold, class_agnostic=class_agnostic
  )

  # Convert extracted values to PyTorch tensors
  bboxes = torch.tensor(detections.xyxy, dtype=torch.float32)
  scores = torch.tensor(detections.confidence, dtype=torch.float32)
  classes = torch.tensor(detections.class_id, dtype=torch.int64)

  # Create an Instances object
  output_instances = Instances(image_size)
  output_instances.set("pred_boxes", Boxes(bboxes))
  output_instances.set("scores", scores)
  output_instances.set("pred_classes", classes)

  # Add masks if available
  if detections.mask is not None:
    masks = torch.tensor(detections.mask, dtype=torch.uint8)
    output_instances.set("pred_masks", masks)

  return {"instances": output_instances}


def read_csv(file_path: str) -> list[str]:
  """Reads a CSV file and returns its contents as a list.

  This function reads the given CSV file and keeps only the first column of
  each row. No header row is skipped: every non-empty row is read, including
  the first one, and returned as a string.

  Args:
      file_path: The path to the CSV file.

  Returns:
      The first column of the CSV file as a list of strings.
  """
  data_list = []
  with open(file_path, "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
      if row:  # Skip blank lines instead of failing on row[0].
        data_list.append(row[0])
  return data_list
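
To make the `nms_threshold` and `class_agnostic` arguments above more concrete, here is a minimal pure-Python sketch of greedy box NMS. It is only an illustration: the helper names `box_iou` and `greedy_nms` and the toy boxes are made up for this example, and this is not the code that `supervision` runs internally.

```python
def box_iou(a, b):
  """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
  ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
  ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
  inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
  area_a = (a[2] - a[0]) * (a[3] - a[1])
  area_b = (b[2] - b[0]) * (b[3] - b[1])
  union = area_a + area_b - inter
  return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, class_ids, threshold=0.8, class_agnostic=True):
  """Returns indices of the boxes that survive NMS, highest score first."""
  order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
  kept = []
  for i in order:
    suppressed = False
    for j in kept:
      # In class-aware mode only boxes of the same class compete.
      same_group = class_agnostic or class_ids[i] == class_ids[j]
      if same_group and box_iou(boxes[i], boxes[j]) > threshold:
        suppressed = True
        break
    if not suppressed:
      kept.append(i)
  return kept


# Two nearly identical boxes with different classes, plus one far away.
boxes = [(0, 0, 100, 100), (2, 2, 100, 100), (200, 200, 300, 300)]
scores = [0.9, 0.7, 0.8]
class_ids = [3, 7, 3]

agnostic = greedy_nms(boxes, scores, class_ids, class_agnostic=True)
per_class = greedy_nms(boxes, scores, class_ids, class_agnostic=False)
print("class-agnostic keeps:", agnostic)
print("per-class keeps:", per_class)
```

With `class_agnostic=True` the lower-scoring duplicate is dropped even though its class differs, which is usually what you want when one physical object should produce one detection.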

## Import and load the labels

In [None]:
LABELS_PATH = (
    'models/official/projects/waste_identification_ml/pre_processing/'
    'config/data/45_labels.csv'
)

labels = read_csv(LABELS_PATH)

my_metadata = Metadata()
my_metadata.set(thing_classes=labels)

## Import the Detectron2 Mask R-CNN model

In [None]:
%%bash
wget https://storage.googleapis.com/tf_model_garden/vision/\
waste_identification_ml/Detectron2_Jan2025_1024_1024.zip

unzip Detectron2_Jan2025_1024_1024.zip > /dev/null 2>&1

## Load the model and perform inference (Non-TRT)

You will need to supply the input and output folders below. Among other options, you can use local files or connect to a Google Drive folder that holds your images. See examples [here](https://colab.sandbox.google.com/notebooks/io.ipynb).
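
If you want to check your folders before running inference, a small helper along these lines can do it. This is only a sketch: `prepare_folders` and `IMAGE_EXTENSIONS` are names invented for this example, and the extension list simply mirrors the one used in the loop in this notebook.

```python
from pathlib import Path

# Hypothetical constant; adjust if your images use other extensions.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def prepare_folders(input_folder, output_folder):
  """Creates the output folder and lists the images in the input folder."""
  input_dir = Path(input_folder)
  if not input_dir.is_dir():
    raise FileNotFoundError(f"Input folder not found: {input_dir}")

  # Create the output folder (and any parents) if it does not exist yet.
  Path(output_folder).mkdir(parents=True, exist_ok=True)

  # Keep only regular files with a known image extension, in sorted order.
  return sorted(
      p
      for p in input_dir.iterdir()
      if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
  )
```

For example, `prepare_folders("images", "images_output")` would return the image paths under `images` and make sure `images_output` exists.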

In [None]:
# Initialize the Detectron2 configuration object
cfg = get_cfg()

# Load the model configuration from a YAML file.
cfg.merge_from_file("config.yaml")

# Set the confidence threshold.
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

# Specify the path to the trained model weights.
cfg.MODEL.WEIGHTS = "model_final.pth"

# Create a predictor object using the configured model.
predictor = DefaultPredictor(cfg)

In [None]:
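import collections
import os

HEIGHT = 1024
WIDTH = 1024

INPUT_FOLDER = "images"
OUTPUT_FOLDER = "images_output"
LABEL_COUNTS = collections.defaultdict(int)

# cv2.imwrite does not raise if the destination folder is missing,
# so make sure it exists before the loop starts.
os.makedirs(OUTPUT_FOLDER, exist_ok=True)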


for filename in os.listdir(INPUT_FOLDER):
  if filename.lower().endswith(
      (".jpg", ".jpeg", ".png")
  ):  # Adjust extensions if needed
    input_path = os.path.join(INPUT_FOLDER, filename)
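    img = cv2.imread(input_path)
    if img is None:  # cv2.imread returns None for unreadable files.
      print(f"Skipping {filename}: could not read image")
      continue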

    original_height, original_width = img.shape[:2]

    resized_image = cv2.resize(
        img, (WIDTH, HEIGHT), interpolation=cv2.INTER_AREA
    )

    outputs = predictor(resized_image)
    outputs = convert_detections_to_instances(outputs)

    # Extract the predicted instances
    instances = outputs["instances"].to("cpu")

    # Re-scale bounding boxes back to the original image size
    scale_x = original_width / WIDTH
    scale_y = original_height / HEIGHT
    instances.pred_boxes.scale(scale_x, scale_y)

    # Resize masks to match the original image size
    if instances.has("pred_masks") and len(instances) > 0:
      pred_masks = instances.pred_masks.numpy()  # Convert to NumPy array
      resized_masks = []

      for mask in pred_masks:
        resized_mask = cv2.resize(
            mask.astype("uint8"),
            (original_width, original_height),
            interpolation=cv2.INTER_NEAREST,
        )
        resized_masks.append(torch.from_numpy(resized_mask))

      # Stack into one (N, H, W) uint8 tensor. This avoids the slow path of
      # calling torch.tensor on a list of NumPy arrays.
      instances.pred_masks = torch.stack(resized_masks)

    # Initialize the visualizer with the original image
    visualizer = Visualizer(
        img_rgb=img,  # Use the original image
        metadata=my_metadata,  # Metadata containing class labels, colors, etc.
        scale=1,  # Scale factor for visualization
    )

    # Draw predictions on the original image
    visualized_image = visualizer.draw_instance_predictions(
        instances
    ).get_image()

    output_path = os.path.join(OUTPUT_FOLDER, filename)
    cv2.imwrite(output_path, visualized_image)
    print(f"Processed {filename} and saved to {output_path}")

    # Count the labels
    for label in instances.pred_classes:
      label_name = labels[label.item()]  # Get label name from index
      LABEL_COUNTS[label_name] += 1  # Increment the count for this label

# Print the label counts
for label_name, count in LABEL_COUNTS.items():
  print(f"{label_name}: {count}")

## TensorRT Conversion and Prediction

In [None]:
!pip install -q onnx onnxruntime common
!pip install -q git+https://github.com/NVIDIA/TensorRT#subdirectory=tools/onnx-graphsurgeon
!git clone -q https://github.com/NVIDIA/TensorRT.git #-b v8.6.1
!pip install -q TensorRT

## Create TRT sample image

The conversion needs a 1344x1344 image for calibration; see the docs [here](https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2/README.md).

In [None]:
original_image = cv2.imread('sample_image.jpg')
original_height, original_width = original_image.shape[:2]

trt_sample_image = cv2.resize(
    original_image, (1344, 1344), interpolation=cv2.INTER_AREA
)

cv2.imwrite('trt_sample_image.jpg', trt_sample_image)

## Modify export_model.py

According to the docs [here](https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2/README.md#detectron-2-deployment), we need to modify the export_model.py script in detectron2/tools/deploy as follows:

```
aug = T.ResizeShortestEdge(
    [cfg.INPUT.MIN_SIZE_TEST, cfg.INPUT.MIN_SIZE_TEST], cfg.INPUT.MAX_SIZE_TEST
)
```

becomes:


```
aug = T.ResizeShortestEdge(
    [1344, 1344], 1344
)
```

This matches the resolution required by the TensorRT pipeline and ensures proper calibration. Note that ResizeShortestEdge produces a scaling factor that is later used during inference pre-processing: infer.py first creates an image at the model's training resolution, then scales it and adds padding.
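
To get a feel for the scaling factor and the padding mentioned above, here is a rough pure-Python sketch of the shortest-edge rule as we understand it from Detectron2. The function names are invented for this example, and where exactly infer.py places the padding is not covered here, so the second helper only reports how many pixels are left to fill.

```python
def resize_shortest_edge_shape(height, width, short_edge=1344, max_size=1344):
  """Returns the (height, width) after a shortest-edge resize.

  The shortest side is scaled to `short_edge`; if the longest side would
  then exceed `max_size`, both sides are shrunk so that it fits.
  """
  size = float(short_edge)
  scale = size / min(height, width)
  if height < width:
    new_h, new_w = size, scale * width
  else:
    new_h, new_w = scale * height, size
  if max(new_h, new_w) > max_size:
    shrink = max_size / max(new_h, new_w)
    new_h, new_w = new_h * shrink, new_w * shrink
  # Round to the nearest integer pixel.
  return int(new_h + 0.5), int(new_w + 0.5)


def padding_to_square(height, width, target=1344):
  """Returns the (rows, columns) still missing to reach target x target."""
  new_h, new_w = resize_shortest_edge_shape(height, width, target, target)
  return target - new_h, target - new_w


for h, w in [(1344, 1344), (1080, 1920)]:
  print((h, w), "->", resize_shortest_edge_shape(h, w), padding_to_square(h, w))
```

A 1344x1344 sample image passes through unchanged, which is why the calibration image is created at exactly that size.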

In [None]:
!python detectron2/tools/deploy/export_model.py \
    --sample-image trt_sample_image.jpg \
    --config-file config.yaml \
    --export-method tracing \
    --format onnx \
    --output ./ \
    MODEL.WEIGHTS model_final.pth \
    MODEL.DEVICE cuda

In [None]:
!python TensorRT/samples/python/detectron2/create_onnx.py \
    --exported_onnx model.onnx \
    --onnx converted.onnx \
    --det2_config config.yaml \
    --det2_weights model_final.pth \
    --sample_image trt_sample_image.jpg

In [None]:
!python3 TensorRT/samples/python/detectron2/build_engine.py \
--onnx converted.onnx --engine engine32.trt --precision fp32

## Change the labels in the infer.py script to match your labels.csv

The script TensorRT/samples/python/detectron2/infer.py has a preset list of labels. These need to be modified manually to match your labels.csv / metadata.

The input for inference below can be a file or a folder, and the expected output is image(s) with visualizations and an accompanying .txt file:

[Inference in Python reference](https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2/README.md#inference-in-python)

## Run TRT Inference

You can also take the TRT converted model here (engine32.trt) and use the script below to run inference from it.

The infer.py script comes from
https://github.com/NVIDIA/TensorRT/blob/main/samples/python/detectron2/infer.py

In [None]:
!rm -rf tensorrt_predictions
!mkdir tensorrt_predictions
!python TensorRT/samples/python/detectron2/infer.py \
    --engine engine32.trt \
    --input images \
    --det2_config config.yaml \
    --output tensorrt_predictions

## Summarize the results (optional)

This Colab features TensorRT conversion and inference as a demo and is not designed for production use. However, you can still take a look at your results below by parsing the output text files.

In [None]:
from collections import Counter
import os


def count_prediction_classes(folder_path):
  """Count the predicted classes in TensorRT prediction files.

  Args:
      folder_path (str): Path to the folder containing .txt prediction files

  Returns:
      Counter: Counts of each predicted class
  """
  # Counter to store class counts
  class_counts = Counter()

  # Loop through all txt files in the folder
  for filename in os.listdir(folder_path):
    if filename.endswith(".txt"):
      file_path = os.path.join(folder_path, filename)

      # Read the file
      try:
        with open(file_path, "r") as f:
          lines = f.readlines()

        # Extract class values (last column) from each line
        for line in lines:
          if line.strip():  # Skip empty lines
            columns = line.strip().split()
            if len(columns) >= 6:  # Ensure we have enough columns
              class_value = int(
                  columns[5]
              )  # The class value is the 6th column (index 5)
              class_counts[class_value] += 1

      except Exception as e:
        print(f"Error processing file {filename}: {e}")

  return class_counts


folder_path = "tensorrt_predictions"

# Count classes
class_counts = count_prediction_classes(folder_path)

# Print results
print("Class Counts:")
for class_id, count in sorted(class_counts.items()):
  print(f"Class {class_id}: {count}")
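
If you would rather see label names than numeric IDs, a helper like the one below can translate the counts. It is a sketch under one assumption that you should verify for your setup: that the class IDs written by infer.py index directly into the same label list used earlier in this notebook. The function name and the example labels are hypothetical.

```python
from collections import Counter


def name_class_counts(class_counts, labels):
  """Maps {class_id: count} to {label_name: count}.

  Assumes class IDs are zero-based indices into `labels`. IDs outside
  that range are reported as unknown instead of raising an error.
  """
  named = {}
  for class_id, count in sorted(class_counts.items()):
    if 0 <= class_id < len(labels):
      name = labels[class_id]
    else:
      name = f"unknown ({class_id})"
    named[name] = named.get(name, 0) + count
  return named


# Hypothetical labels and counts, for illustration only.
example_labels = ["bottle", "can", "carton"]
example_counts = Counter({0: 4, 2: 1, 9: 2})
print(name_class_counts(example_counts, example_labels))
```

In the notebook you would call it as `name_class_counts(class_counts, labels)`.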