# Train and deploy a Faster R-CNN Object Detection model

<a target="_blank" href="https://colab.research.google.com/github/unionai-oss/faster-rcnn-object-detection-computer-vision-train-and-deploy/blob/main/tutorial.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>



### Setup

In [None]:
try:
    import google.colab
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    !git clone https://github.com/unionai-oss/faster-rcnn-object-detection-computer-vision-train-and-deploy
    %cd faster-rcnn-object-detection-computer-vision-train-and-deploy
    !pip install -r requirements.txt

### 🔐 Authenticate
To use **Union.ai**, you'll need to authenticate your account. Follow the appropriate step based on your setup:  

##### 🔸 **Using Union BYOC Enterprise**  

If you're using a **[Union BYOC Enterprise](https://www.union.ai/pricing)** account, log in with the following command:  
```bash
union create login --host <union-host-url>
```

Replace `<union-host-url>` with your organization's Union instance URL.

##### 🔸 Using Union Serverless
If you're using [Union Serverless](https://www.union.ai/), create a free account at [Union.ai](https://union.ai) if you don't have one yet.

Then authenticate by running the command below:
 

In [None]:
# 🌟 Authenticate to union serverless
!union create login --serverless --auth device-flow

## Training the Faster R-CNN Object Detection Model Pipeline

Run the command below to train a Faster R-CNN object detection model using the Union.ai CLI. This command will create a new pipeline and start the training process.

The first time you run this command, it will take a while to download the model and set up the environment. Subsequent runs will be faster because the model will be cached.


In [None]:
# Run this command to start the training workflow & container building
!union run --remote workflows/train-frcnn-pipeline.py faster_rcnn_train_workflow --epochs 3

In [None]:
%%writefile workflow.py



In [None]:
# containers.py

In [None]:
# requirements.txt

In [None]:
# data tasks


In [None]:
# model tasks

## Run Model Locally

Let's pull the same dataset down locally so we can run the model on examples.

In [None]:
# Let's call the download_hf_dataset task from earlier, this time locally
!union run tasks/data.py download_hf_dataset

We will pull down the latest Model Artifact from Union and save it locally. 


In [None]:
from union import Artifact, UnionRemote
from flytekit.types.file import FlyteFile
import torch


# --------------------------------------------------
# Download & save the fine-tuned model from Union Artifacts
# --------------------------------------------------
FRCCNFineTunedModel = Artifact(name="frccn_fine_tuned_model")

query = FRCCNFineTunedModel.query(
    project="default",
    domain="development",
    # version="anmrqcq8pfbnlp42j2vp/n3/0/o0"  # Optional: specify version. Will download the latest version if not specified
)
remote = UnionRemote()
artifact = remote.get_artifact(query=query)
model_file: FlyteFile = artifact.get(as_type=FlyteFile)
model = torch.load(model_file.download(), map_location="cpu", weights_only=False)


# Put the model on the same device the input tensors will be sent to
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
model.eval()


Let's run the model locally on an example image and draw the bounding boxes on the image. 

In [None]:
import cv2
import numpy as np
import requests
import torch
from flytekit.types.file import FlyteFile
from PIL import Image, ImageDraw, ImageFont
from torchvision.transforms import functional as F
from union import Artifact, UnionRemote
from io import BytesIO

# Define labels map
labels_map = {1: "Union Sticker", 2: "Flyte Sticker"}

# Check and set the available device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")


font_url = "https://github.com/google/fonts/raw/refs/heads/main/apache/ultra/Ultra-Regular.ttf"
response = requests.get(font_url)
font = ImageFont.truetype(BytesIO(response.content), size=20)


# Function to draw bounding boxes
def draw_boxes(image, boxes, labels, scores, labels_map):
    draw = ImageDraw.Draw(image, 'RGBA')
    # font = ImageFont.truetype(urlopen(truetype_url), size=20)
    # font = ImageFont.load_default() # default font in pil


    # Colors are keyed by label id; ids without an entry fall back to green below
    colors = {
        0: (255, 173, 10, 200),  # Label 0 outline color (orange)
        1: (28, 140, 252, 200),  # Label 1 outline color (blue)
    }
    colors_fill = {
        0: (255, 173, 10, 100),  # Label 0 fill color (translucent orange)
        1: (28, 140, 252, 100),  # Label 1 fill color (translucent blue)
    }

    for box, label, score in zip(boxes, labels, scores):
        if score > 0.6:  # adjust threshold as needed
            color = colors.get(label, (0, 255, 0, 200))
            fill_color = colors_fill.get(label, (0, 255, 0, 100))
            draw.rectangle([(box[0], box[1]), (box[2], box[3])], outline=color, width=3)
            draw.rectangle([(box[0], box[1]), (box[2], box[3])], fill=fill_color)
            label_text = f"{labels_map[label]}: {score:.2f}"
            # getbbox returns (left, top, right, bottom); right and bottom
            # give the extent of the text when it is drawn at the origin
            _, _, text_w, text_h = font.getbbox(label_text)
            # Draw a filled banner just above the box, then the label on top of it
            draw.rectangle([(box[0], box[1] - text_h), (box[0] + text_w, box[1])], fill=color)
            draw.text((box[0], box[1] - text_h), label_text, fill="white", font=font)

    return image


# Load a single test image
# Path is relative to the repository root so it works both in Colab (after %cd) and locally
image_path = 'dataset/swag/images/1bd5a6b5-20240916_133544.jpg'
image = Image.open(image_path).convert("RGB")
image_tensor = F.to_tensor(image).unsqueeze(0).to(device)

# Run inference
with torch.no_grad():
    outputs = model(image_tensor)

# Get the boxes, labels, and scores
boxes = outputs[0]['boxes'].cpu().numpy()
labels = outputs[0]['labels'].cpu().numpy()
scores = outputs[0]['scores'].cpu().numpy()

# Define labels map
labels_map = {0: "Background", 1: "Union Sticker", 2: "Flyte Sticker"}

# Draw the boxes on the image
image_with_boxes = draw_boxes(image, boxes, labels, scores, labels_map)

# Display the image
image_with_boxes.show()

# Save the image
image_with_boxes.save('output_image.jpg')
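
The score threshold used inside `draw_boxes` can also be applied on its own, for example to log or count detections without drawing them. Below is a minimal sketch that is not part of the repository: `filter_detections` and the `example_*` values are hypothetical names introduced here, and the inputs stand in for the `boxes`, `labels`, and `scores` arrays produced above.

```python
def filter_detections(boxes, labels, scores, labels_map, threshold=0.6):
    """Keep detections scoring above `threshold` and attach readable names."""
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if score > threshold:
            label = int(label)  # NumPy integers become plain ints
            kept.append({
                "box": [float(v) for v in box],
                "label": label,
                # Fall back to a generic name for ids missing from the map
                "name": labels_map.get(label, f"class {label}"),
                "score": float(score),
            })
    return kept


# Hypothetical values standing in for outputs[0] from the model
example_boxes = [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]]
example_labels = [1, 2]
example_scores = [0.92, 0.31]
example_map = {0: "Background", 1: "Union Sticker", 2: "Flyte Sticker"}

detections = filter_detections(example_boxes, example_labels, example_scores, example_map)
print(detections)
```

Because the function only iterates over its inputs, it should accept the NumPy arrays from the cell above as well as plain lists. Using `labels_map.get` with a fallback avoids a `KeyError` if the model ever returns a label id that is missing from the map.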

We can use the same `draw_boxes` function to loop over the frames in a video.

In [None]:
import cv2
import numpy as np
import requests
import torch
from flytekit.types.file import FlyteFile
from PIL import Image, ImageDraw, ImageFont
from torchvision.transforms import functional as F
from union import Artifact, UnionRemote
from io import BytesIO


# ------------------------------------
# create video writer
# ------------------------------------

# Video path and properties
video_path = "dataset/swag/videos/union_sticker_video.mp4"
video = cv2.VideoCapture(video_path)
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
frames_per_second = video.get(cv2.CAP_PROP_FPS)
num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))


# Initialize video writer
video_writer = cv2.VideoWriter(
    "object_detection_video.mp4",
    cv2.VideoWriter_fourcc(*"mp4v"),
    fps=float(frames_per_second),
    frameSize=(width, height),
    isColor=True,
)


def run_inference_video(video, model, device, labels_map):
    while True:
        hasFrame, frame = video.read()
        if not hasFrame:
            break

        # Convert frame to PIL image
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        image_tensor = F.to_tensor(image).unsqueeze(0).to(device)

        # Run inference
        with torch.no_grad():
            outputs = model(image_tensor)

        # Get the boxes, labels, and scores
        boxes = outputs[0]["boxes"].cpu().numpy()
        labels = outputs[0]["labels"].cpu().numpy()
        scores = outputs[0]["scores"].cpu().numpy()

        # Draw the boxes on the image
        image_with_boxes = draw_boxes(image, boxes, labels, scores, labels_map)

        # Convert back to OpenCV image format
        result_frame = cv2.cvtColor(np.array(image_with_boxes), cv2.COLOR_RGB2BGR)

        yield result_frame


# Run inference and write video
for frame in run_inference_video(video, model, device, labels_map):
    video_writer.write(frame)

# Release resources
video.release()
video_writer.release()


The example below shows how you could run the model on a live video feed. It won't run in the notebook, but you can run it in your local environment.


In [None]:
# THIS WON'T WORK IN COLAB
# Webcam example

import torch
import cv2
import time
from torchvision.transforms import functional as F
from union import Artifact, UnionRemote
from flytekit.types.file import FlyteFile

# --------------------------------------------------
# Load the fine-tuned Faster R-CNN model from Union Artifact
# --------------------------------------------------
FRCCNFineTunedModel = Artifact(name="frccn_fine_tuned_model")
query = FRCCNFineTunedModel.query(
    project="default",
    domain="development",
    # version="anmrqcq8pfbnlp42j2vp/n3/0/o0"  # Optional: specify version
)
remote = UnionRemote()
artifact = remote.get_artifact(query=query)
model_file: FlyteFile = artifact.get(as_type=FlyteFile)
model = torch.load(model_file.download(), map_location="cpu", weights_only=False)

model.eval()
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)

# --------------------------------------------------
# Function to process a single frame and draw bounding boxes
# --------------------------------------------------
def process_frame(frame):
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    image_tensor = F.to_tensor(frame_rgb).unsqueeze(0).to(device)

    with torch.no_grad():
        prediction = model(image_tensor)

    boxes = prediction[0]['boxes'].cpu().numpy()
    scores = prediction[0]['scores'].cpu().numpy()
    labels = prediction[0]['labels'].cpu().numpy()

    for i, box in enumerate(boxes):
        if scores[i] > 0.5: # Confidence threshold for detection 
            x_min, y_min, x_max, y_max = box
            cv2.rectangle(frame, (int(x_min), int(y_min)), (int(x_max), int(y_max)), (0, 255, 0), 2)
            label = f"Class {labels[i]}: {scores[i]:.2f}"
            cv2.putText(frame, label, (int(x_min), int(y_min) - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)

    return frame

# --------------------------------------------------
# Run feed with frame skipping option for efficiency
# --------------------------------------------------
def run_video_feed(skip_frames=5):
    frame_skip = skip_frames
    frame_count = 0
    last_processed_frame = None

    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Error: Could not open video stream.")
        return

    prev_time = time.time()

    while True:
        ret, frame = cap.read()
        if not ret:
            print("Error: Failed to capture frame.")
            break

        current_time = time.time()
        fps = 1 / (current_time - prev_time)
        prev_time = current_time

        if frame_count % frame_skip == 0:
            last_processed_frame = process_frame(frame)
            fps_text = f"FPS: {fps:.2f}"
            cv2.putText(last_processed_frame, fps_text, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 2)

        if last_processed_frame is not None:
            cv2.imshow('Object Detection RCNN', last_processed_frame)

        frame_count += 1

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

# Run the video feed function
if __name__ == "__main__":
    run_video_feed()
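
The FPS value in the webcam loop is computed from a single frame interval, so it can jump around from frame to frame and would raise a `ZeroDivisionError` if two timestamps were ever equal. One possible alternative is an exponentially smoothed counter. This is a minimal sketch under stated assumptions: `FPSMeter`, its `smoothing` parameter, and the timestamps in the usage lines are hypothetical and not part of the original code.

```python
import time


class FPSMeter:
    """Exponentially smoothed frames-per-second estimate."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing  # closer to 1.0 means a steadier readout
        self.prev_time = None
        self.fps = 0.0

    def tick(self, now=None):
        """Record one frame and return the current smoothed FPS."""
        now = time.time() if now is None else now
        if self.prev_time is not None:
            elapsed = now - self.prev_time
            if elapsed > 0:  # skip the update instead of dividing by zero
                instant = 1.0 / elapsed
                if self.fps == 0.0:
                    self.fps = instant  # first measurement seeds the average
                else:
                    self.fps = (self.smoothing * self.fps
                                + (1 - self.smoothing) * instant)
        self.prev_time = now
        return self.fps


# Hypothetical timestamps (in seconds) in place of real frame times
meter = FPSMeter()
for timestamp in (0.0, 0.1, 0.1, 0.6):
    print(f"FPS: {meter.tick(timestamp):.2f}")
```

In `run_video_feed`, a meter like this could be created before the loop and `meter.tick()` called once per captured frame in place of the `1 / (current_time - prev_time)` calculation.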

## Serving as an App on Union

We can serve our model as an app on Union. This allows us to run the model in a production environment and make it available for use by other applications or users.

This example will use Gradio, but we could also use any other web framework like Flask or FastAPI to serve our model as an API. 