Add YOLO dataset format support #74

mario-dg · 2025-03-28T13:01:43Z

This pull request introduces functionality to convert datasets from YOLO format to COCO format within the rfdetr package. The key changes include adding utility functions for this conversion and integrating these functions into the training workflow.

This PR addresses and fixes #69.

New functionality:

rfdetr/util/coco_to_yolo.py: Added utility functions is_valid_yolo_format and convert_to_coco to check the format of YOLO datasets and convert them to COCO format.

Integration into training workflow:

rfdetr/detr.py: Imported the new utility functions and added logic to convert datasets from YOLO to COCO format during training if they are detected to be in YOLO format.

def train_from_config(self, config: TrainConfig, **kwargs):
    if is_valid_yolo_format(config.dataset_dir):
        config.dataset_dir = convert_to_coco(config.dataset_dir)

    with open(
        os.path.join(config.dataset_dir, "train", "_annotations.coco.json"), "r"
    ) as f:
        anns = json.load(f)
        num_classes = len(anns["categories"])

After a successful conversion, the dataset_dir config is overwritten with the converted dataset's path, so training continues seamlessly on the COCO version.
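
For readers unfamiliar with the layout, a format check of this kind might look roughly like the sketch below. It assumes a Roboflow-style YOLO export (a `data.yaml` at the root plus `images/` and `labels/` folders inside each split). The function name `looks_like_yolo_dataset` and the default split list are hypothetical; this is not the PR's actual `is_valid_yolo_format` implementation.

```python
from pathlib import Path


def looks_like_yolo_dataset(dataset_dir: str, splits=("train", "valid")) -> bool:
    """Heuristic check for a YOLO-style directory layout (assumed, not the PR's code)."""
    root = Path(dataset_dir)
    # Assumption: class names live in a data.yaml (or data.yml) file at the dataset root.
    if not any((root / name).is_file() for name in ("data.yaml", "data.yml")):
        return False
    # Assumption: every split keeps images and label .txt files in sibling folders.
    return all(
        (root / split / "images").is_dir() and (root / split / "labels").is_dir()
        for split in splits
    )
```

A check like this only inspects the directory structure, so it stays cheap even for large datasets; it does not validate the contents of the label files.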

Type of change

[ x ] New feature (non-breaking change which adds functionality)

How has this change been tested, please provide a testcase or example of how you tested the change?

Will create a Colab later. Locally the conversion was successful, but I haven't tested this change in a full training workflow yet.

Any specific deployment considerations

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

Will be updated later.

Docs updated? What were the changes:

SkalskiP · 2025-03-28T14:54:44Z

Hi 👋🏻 @mario-dg, thanks a lot for opening this PR!

It’s a solid step toward making training smoother for users working with YOLO datasets, and I appreciate the effort you’ve put into this. That said, I do have a couple of concerns I'd love your thoughts on:

I'm a bit worried about the performance of the YOLO-to-COCO conversion, especially for larger datasets. Have you tried running a full training pipeline on something like TFT-ID or SKU 110k? I’m curious if we’d observe any memory spikes or long conversion times.

It seems like we’re not caching or checking whether the dataset has already been converted. That could lead to unnecessary reprocessing on each run, which might become costly over time.
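
One way to avoid the repeated work would be to skip conversion when the COCO annotation file already exists and is newer than every YOLO label file. The sketch below is only an illustration: `needs_conversion` is a hypothetical helper, and the `_annotations.coco.json` location and the `labels/` subfolder are assumptions based on the paths mentioned in this thread.

```python
from pathlib import Path


def needs_conversion(yolo_dir: str, coco_dir: str, split: str = "train") -> bool:
    """Return True when the converted COCO file is missing or older than any YOLO label."""
    coco_json = Path(coco_dir) / split / "_annotations.coco.json"
    if not coco_json.is_file():
        return True  # nothing cached yet
    converted_at = coco_json.stat().st_mtime
    label_files = (Path(yolo_dir) / split / "labels").glob("*.txt")
    # Reconvert if any label file changed after the cached conversion was written.
    return any(label.stat().st_mtime > converted_at for label in label_files)
```

Comparing modification times is a simple heuristic; it would not notice added or removed images, so a content hash or file count could be a more robust signal.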

Given that, I wonder if it might make more sense in the long run to introduce a lightweight native loader for YOLO datasets instead of converting everything to COCO on the fly.

Looking forward to hearing your thoughts! @probicheaux @isaacrob-roboflow

Jordan-Pierce · 2025-03-28T15:19:45Z

@mario-dg what about:

# datasets.__init__.py

def build_dataset(image_set, args, resolution):
    if args.dataset_file == 'coco':
        return build_coco(image_set, args, resolution)
    if args.dataset_file == 'o365':
        return build_o365(image_set, args, resolution)
    if args.dataset_file == 'roboflow':
        return build_roboflow(image_set, args, resolution)
    if args.dataset_file == 'yolo':
        return build_yolo(image_set, args, resolution)
    raise ValueError(f'dataset {args.dataset_file} not supported')

# datasets.yolo.py

def build_yolo(image_set, args, resolution):
    ...
    # Mimic build_coco in datasets.coco.py


# Load the YOLO annotations per usual, but output the results as COCODetection does
import os

import torch
from PIL import Image
from torch.utils.data import Dataset


class YOLODetection(Dataset):
    def __init__(self, img_folder, ann_folder, transforms=None):
        """
        YOLO detection dataset.
        
        Args:
            img_folder: Path to the folder containing images
            ann_folder: Path to the folder containing YOLO annotation .txt files
            transforms: Optional transforms to be applied to images and targets
        """
        super().__init__()
        self.img_folder = img_folder
        self.ann_folder = ann_folder
        self._transforms = transforms
        
        # Get all image files with corresponding annotation files
        self.images = []
        self.annotations = []
        
        # Get supported image extensions
        img_extensions = ['.jpg', '.jpeg', '.png', '.bmp']
        
        # Find all valid image files that have corresponding annotation files
        for filename in os.listdir(img_folder):
            name, ext = os.path.splitext(filename)
            if ext.lower() in img_extensions:
                ann_path = os.path.join(ann_folder, name + '.txt')
                if os.path.exists(ann_path):
                    self.images.append(os.path.join(img_folder, filename))
                    self.annotations.append(ann_path)

    def __len__(self):
        return len(self.images)
    
    def __getitem__(self, idx):
        # Load image
        img_path = self.images[idx]
        img = Image.open(img_path).convert('RGB')
        img_width, img_height = img.size
        
        # Load annotations
        ann_path = self.annotations[idx]
        boxes = []
        labels = []
        
        # Read YOLO format annotations
        with open(ann_path, 'r') as f:
            for line in f.readlines():
                if line.strip():
                    values = line.strip().split()
                    if len(values) >= 5:  # class, coordinates
                        class_id = int(values[0])
                        
                        if len(values) == 5:  # Standard bounding box format
                            # Convert normalized YOLO format to absolute coordinates
                            x_center = float(values[1]) * img_width
                            y_center = float(values[2]) * img_height
                            width = float(values[3]) * img_width
                            height = float(values[4]) * img_height
                            
                            # Convert from center coordinates to top-left corner
                            x_min = x_center - (width / 2)
                            y_min = y_center - (height / 2)
                            
                            boxes.append([x_min, y_min, width, height])
                            labels.append(class_id)
                        
                        elif len(values) > 5:  # Polygon format
                            # Parse polygon coordinates
                            polygon_points = []
                            for i in range(1, len(values), 2):
                                if i + 1 < len(values):
                                    # Convert normalized coordinates to absolute
                                    x = float(values[i]) * img_width
                                    y = float(values[i + 1]) * img_height
                                    polygon_points.append((x, y))
                            
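                            # Convert polygon to its enclosing bounding box
                            # (plain min/max here; supervision's utilities could be used instead)
                            if polygon_points:
                                xs = [p[0] for p in polygon_points]
                                ys = [p[1] for p in polygon_points]
                                x_min, y_min = min(xs), min(ys)
                                boxes.append([x_min, y_min, max(xs) - x_min, max(ys) - y_min])
                                labels.append(class_id)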
        
        # Create target dictionary
        target = {
            # reshape keeps an (N, 4) shape even when the image has no boxes
            'boxes': torch.tensor(boxes, dtype=torch.float32).reshape(-1, 4),
            'labels': torch.tensor(labels, dtype=torch.int64),
            'image_id': torch.tensor([idx]),
            'orig_size': torch.tensor([img_height, img_width]),
        }
        
        # Apply transforms if available
        if self._transforms is not None:
            img, target = self._transforms(img, target)
            
        return img, target

# Do something with transforms
...
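
The coordinate arithmetic in `__getitem__` can be checked in isolation. Below is a minimal standalone sketch, where `yolo_to_coco_box` is a hypothetical helper name that is not part of the proposal above: it converts one YOLO label line into a pixel-space `[x_min, y_min, width, height]` box and handles both the five-value box form and the polygon form by taking the polygon's axis-aligned extent.

```python
def yolo_to_coco_box(line: str, img_width: int, img_height: int):
    """Convert one YOLO label line to (class_id, [x_min, y_min, width, height]) in pixels."""
    values = line.split()
    class_id = int(values[0])
    coords = [float(v) for v in values[1:]]
    if len(coords) == 4:
        # Normalized center/size box: scale to pixels, then shift center to top-left.
        x_center, y_center, width, height = coords
        width, height = width * img_width, height * img_height
        x_min = x_center * img_width - width / 2
        y_min = y_center * img_height - height / 2
    elif len(coords) >= 6 and len(coords) % 2 == 0:
        # Polygon: take the axis-aligned extent of the (x, y) vertices.
        xs = [x * img_width for x in coords[0::2]]
        ys = [y * img_height for y in coords[1::2]]
        x_min, y_min = min(xs), min(ys)
        width, height = max(xs) - x_min, max(ys) - y_min
    else:
        raise ValueError(f"unrecognized YOLO label line: {line!r}")
    return class_id, [x_min, y_min, width, height]


print(yolo_to_coco_box("0 0.5 0.5 0.25 0.5", 640, 480))
# prints (0, [240.0, 120.0, 160.0, 240.0])
```

For a 640x480 image, a box centered at (0.5, 0.5) with normalized size 0.25 x 0.5 is 160 x 240 pixels, so its top-left corner sits at (320 - 80, 240 - 120) = (240, 120).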

mario-dg · 2025-03-28T15:20:34Z

Yes, I agree. This is a very naive approach.
I haven't tested my implementation extensively yet, but from previous experiments I know that the supervision conversion can take a lot of time for large datasets. Hence, a new data loader approach, which @Jordan-Pierce already mentioned in the issue, might be needed.

isaacrob-roboflow · 2025-03-28T18:16:58Z

in general I'm in favor of a native data loader as opposed to conversion. the downside is I don't want to build custom loaders for n+1 dataset formats :) do y'all think it's likely people will want native support for more than just yolo and coco formats? if no I am pro data loader

Jordan-Pierce · 2025-03-28T18:24:47Z

> in general I'm in favor of a native data loader as opposed to conversion. the downside is I don't want to build custom loaders for n+1 dataset formats :) do y'all think it's likely people will want native support for more than just yolo and coco formats? if no I am pro data loader

I think lots of people (not all, obviously) who might want to use RF-DETR are people who might also use libraries that use YOLO-formatted datasets (🙋‍♂️). Other dataset formats that are likely contenders are what, PASCAL-VOC?

But, given the demographic of people who use RF and also libraries that use YOLO-formatted datasets, I feel like these two formats would cover a lot of users.

mario-dg · 2025-03-29T12:11:50Z

I've been working on a YOLO format data loader for a while now. The loader itself seems to work when tested in isolation with the script below, but training is still failing.

Colab available now: https://colab.research.google.com/drive/143icsDIfvgOmtzfzEDLEeh4wMQu2g181?usp=sharing

#!/usr/bin/env python3
"""
Test script for the YOLO dataloader in RF-DETR.
This script checks if the YOLO dataloader can correctly read a YOLO format dataset.
"""

import os
import argparse
import matplotlib.pyplot as plt
import numpy as np
import torch
import random
from torchvision.transforms import functional as F
from matplotlib.patches import Rectangle

from rfdetr.datasets.yolo import build_yolo


def parse_args():
    parser = argparse.ArgumentParser(description="Test YOLO dataloader")
    parser.add_argument("--dataset-dir", type=str, required=True, help="Path to YOLO dataset")
    parser.add_argument("--image-set", type=str, default="train", help="Image set (train, val, test)")
    parser.add_argument("--resolution", type=int, default=640, help="Image resolution")
    parser.add_argument("--n-samples", type=int, default=5, help="Number of samples to display")
    parser.add_argument("--random", action="store_true", help="Randomly select samples instead of the first n")
    parser.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility")
    return parser.parse_args()


class Args:
    """Dummy class to hold args for the dataloader"""
    def __init__(self, dataset_dir):
        self.dataset_dir = dataset_dir
        self.multi_scale = False
        self.expanded_scales = False
        self.square_resize_div_64 = False


def plot_sample(img, target, idx, class_names, args):
    """Plot a sample with bounding boxes"""
    # Convert tensor to numpy for visualization
    img_np = img.permute(1, 2, 0).numpy()
    
    # Denormalize the image
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    img_np = img_np * std + mean
    img_np = np.clip(img_np, 0, 1)
    
    # Create the figure
    fig, ax = plt.subplots(1, figsize=(10, 10))
    ax.imshow(img_np)
    
    # Get boxes and labels
    boxes = target["boxes"].numpy()
    labels = target["labels"].numpy()
    
    # Get image dimensions for denormalizing coordinates
    h, w = img_np.shape[0:2]
    
    # Plot each box
    for box, label in zip(boxes, labels):
        # RF-DETR stores boxes in normalized [centerX, centerY, width, height] format
        # We need to convert to absolute pixel coordinates for visualization
        cx, cy, bw, bh = box
        
        # Convert center coordinates to top-left
        x1 = (cx - bw/2) * w
        y1 = (cy - bh/2) * h
        width = bw * w
        height = bh * h
        
        # Print for debugging
        print(f"Box: {box}, Label: {label}")
        print(f"  Denormalized: x={x1:.1f}, y={y1:.1f}, w={width:.1f}, h={height:.1f}")
        
        # Create and add the rectangle
        rect = Rectangle((x1, y1), width, height, linewidth=2, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        
        # If class_names is available, use the class name instead of the numeric label
        class_label = class_names[label-1] if class_names and label-1 < len(class_names) else f"Class: {label}"
        ax.text(x1, y1, class_label, color='white', fontsize=12, 
                backgroundcolor='red', verticalalignment='top')
    
    ax.set_title(f"Sample {idx} - {len(boxes)} objects detected")
    plt.axis('off')
    plt.tight_layout()
    
    # Create output directory if it doesn't exist
    os.makedirs("test_output", exist_ok=True)
    plt.savefig(f"test_output/sample_{idx}.png")
    plt.close()


def main(args):
    print(f"Testing YOLO dataloader with dataset: {args.dataset_dir}")
    
    # Set random seed for reproducibility
    if args.random:
        random.seed(args.seed)
        np.random.seed(args.seed)
        torch.manual_seed(args.seed)
        print(f"Using random seed: {args.seed}")
    
    # Initialize Args for dataloader
    loader_args = Args(args.dataset_dir)
    
    try:
        # Build dataset
        print(f"Building dataset with image_set={args.image_set}, resolution={args.resolution}")
        dataset = build_yolo(args.image_set, loader_args, args.resolution)
        
        print(f"Dataset size: {len(dataset)} samples")
        print(f"Class names: {dataset.class_names}")
        
        # Verify dataset configurations
        print(f"Class mapping (YOLO class ID -> COCO class ID): {dataset.class_to_coco_id}")
        
        # Verify COCO API functionality
        coco_api = dataset.coco
        print(f"Total annotations: {len(coco_api.anns)}")
        print(f"Total categories: {len(coco_api.cats)}")
        print(f"Total images: {len(coco_api.imgs)}")
        
        # Print actual category IDs for debugging
        print(f"Category IDs in the dataset: {list(coco_api.cats.keys())}")
        
        # Select sample indices
        n_samples = min(args.n_samples, len(dataset))
        if args.random:
            sample_indices = random.sample(range(len(dataset)), n_samples)
            print(f"Randomly selected samples: {sample_indices}")
        else:
            sample_indices = list(range(n_samples))
            print(f"Using first {n_samples} samples")
        
        # Display selected samples
        print(f"Displaying {n_samples} samples...")
        
        # Check for potential issues in samples
        for i, idx in enumerate(sample_indices):
            try:
                img, target = dataset[idx]
                print(f"Sample {i} (dataset index {idx}):")
                print(f"  Image shape: {img.shape}")
                print(f"  Boxes: {target['boxes'].shape}")
                print(f"  Labels: {target['labels']}")
                
                # Validate labels - check if any label is out of range
                if len(target['labels']) > 0:
                    max_label = target['labels'].max().item()
                    min_label = target['labels'].min().item()
                    num_classes = len(dataset.class_names) + 1  # +1 for background class
                    
                    if max_label >= num_classes or min_label <= 0:
                        print(f"  WARNING: Invalid label range: min={min_label}, max={max_label}, valid range=[1, {num_classes-1}]")
                        
                    # Debug output for label values
                    label_counts = {}
                    for label in target['labels']:
                        l = label.item()
                        if l not in label_counts:
                            label_counts[l] = 0
                        label_counts[l] += 1
                    print(f"  Label distribution: {label_counts}")
                
                # Plot sample
                plot_sample(img, target, i, dataset.class_names, args)
            except Exception as e:
                print(f"Error processing sample {idx}: {str(e)}")
                import traceback
                traceback.print_exc()
        
        print(f"Sample images saved to test_output/ directory.")
    except Exception as e:
        print(f"Error testing YOLO dataloader: {str(e)}")
        import traceback
        traceback.print_exc()


if __name__ == "__main__":
    args = parse_args()
    main(args)
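
The label check in the script expects category IDs in the range [1, number of classes], whereas YOLO label files use 0-based class indices. A minimal sketch of such a mapping follows. `build_class_mapping` is a hypothetical helper, and the +1 offset is an assumption inferred from that range check rather than taken from the PR's `yolo.py`.

```python
def build_class_mapping(class_names):
    """Map 0-based YOLO class indices to 1-based category IDs (assumed convention)."""
    class_to_coco_id = {yolo_id: yolo_id + 1 for yolo_id in range(len(class_names))}
    # COCO-style category records, one per class, using the shifted IDs.
    categories = [
        {"id": coco_id, "name": class_names[yolo_id]}
        for yolo_id, coco_id in class_to_coco_id.items()
    ]
    return class_to_coco_id, categories
```

With this convention, the script's lookup `class_names[label-1]` recovers the original YOLO class name from a category ID.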

SkalskiP · 2025-03-29T17:14:28Z

@mario-dg I'm blown away by the progress in this PR! Have you managed to solve it?

I agree with @Jordan-Pierce. YOLO format support is a must-have in a project that is here to compete with YOLO models.

Jordan-Pierce · 2025-03-29T17:15:54Z

Nice job @mario-dg !

mario-dg · 2025-03-29T20:13:43Z

No, not yet. I'll try to solve it tomorrow. In the meantime, maybe @Jordan-Pierce has an idea? The main parts of the data loader are similar to his initial idea.

mario-dg · 2025-03-31T08:52:54Z

Ok, I am getting there. The first test training run went through, both locally and in a Colab 🚀
I will try to train a model on a larger dataset, but I am limited by the free Google Colab runtime.
The datasets used can be found in this Colab (same as before). They were just downloaded in different formats from Roboflow Universe: https://colab.research.google.com/drive/143icsDIfvgOmtzfzEDLEeh4wMQu2g181?usp=sharing

Small Dataset in YOLO format

Small Dataset in COCO format

SkalskiP · 2025-03-31T11:00:42Z

Awesome @mario-dg! 🔥 I'll run some tests myself and do a code review.

SkalskiP · 2025-03-31T11:40:23Z

@mario-dg I did a first round of code review. Is there any chance you could start working on those changes today? Also, are you on our Discord server?

mario-dg · 2025-03-31T11:53:10Z

Yes, I will work on them right away! And yes I am, but have not been really active yet.

mario-dg · 2025-03-31T14:19:17Z

Did most of them, will be able to do the last remaining one tonight. Will also start a test training then 😄

SkalskiP · 2025-03-31T16:01:34Z

Thanks a lot @mario-dg ! 🙏🏻 We will have one more review round. I'm sorry to drag you through that review process, but I want us to get it right.

mario-dg · 2025-03-31T16:33:38Z

No worries. I wanna make sure that we have a solid code base that everyone can build on. So do as many rounds as you feel are necessary.

SkalskiP · 2025-03-31T16:57:17Z

We are on the same page! 🔥

mario-dg · 2025-03-31T18:28:12Z

@SkalskiP, we're good to go. Implemented your feedback and ran the test again with the small dataset. Everything working again 🚀

probicheaux · 2025-04-01T02:37:58Z

Hi @mario-dg, everything is looking good, and thanks for your contribution. I'm afraid another merged PR has introduced a merge conflict here. Can you resolve it? Then we should be good to go.

CLAassistant · 2025-04-01T04:55:32Z

All committers have signed the CLA.

mario-dg · 2025-04-01T05:01:40Z

@probicheaux fixed the merge conflict. Sorry for the force push, I accidentally committed with my work account.

mario-dg · 2025-04-04T13:28:28Z

Any news?

isaacrob-roboflow · 2025-04-04T17:16:08Z

@probicheaux are you comfortable owning making this happen? assuming since you were engaged prior

SkalskiP · 2025-04-04T18:30:41Z

@mario-dg I'll drive it to the finish line next week ;)

mario-dg · 2025-04-04T22:33:18Z

@SkalskiP, anything I can/should improve or change?

mario-dg · 2025-04-15T07:27:01Z

@SkalskiP, when can we expect updates?

Jordan-Pierce · 2025-04-22T18:50:42Z

Any updates on this?

sergiev · 2025-04-24T18:14:02Z

tagging @SkalskiP just as a reminder that this feature is what we, the crowd, crave :)

nok · 2025-05-25T19:06:43Z

Hello @mario-dg, @SkalskiP,

FYI, I pushed some improvements and added the latest changes/commits of this repository by creating a new pull request to your forked repository: mario-dg#1

I've added the latest missing commits to this pull request. However, this makes the development history of the feature less transparent. To help with review, I recommend focusing on the changes in yolo.py, as it's the core part of this update.

mario-dg · 2025-05-26T14:37:35Z

Hi @nok,
thanks for your improvements and updates from main. Before I merge your PR, I would like some feedback and updates on this feature from the Roboflow team, since this PR is already quite complex. When we get an answer, I'm gonna have a closer look at your additions and merge.

@SkalskiP, @isaacrob-roboflow, @probicheaux

Akhp888 · 2025-06-17T11:38:51Z

Hello Everyone,

I see a lot of progress has been made on this topic in the conversation here, and that the merge has been pending on the requested changes for some weeks now. If nothing major is left, can we have this addition sooner?

Thanks

mario-dg added 2 commits March 28, 2025 13:59

Add YOLO dataset format support

79ad788

Rename is_correct_yolo_format method

9d6c8d4

mario-dg changed the title ~~Add YOLO dataset format support (#69)~~ Add YOLO dataset format support Mar 28, 2025

Add YOLO format data loader, ensure COCO API compliance

c690c93

mario-dg added 2 commits March 29, 2025 13:15

Clean up dataloader

b9ae481

Undo last changes

8234f66

mario-dg added 6 commits March 31, 2025 08:58

image_ids should start at 1 and class ids were mismatched

fda8d9b

Cleanup docs of latest changes

67c6191

Ensure non-string idx

d388ef4

Try to fix COCO like API

4a98273

Use proper dataset structure in COCOLikeAPI

19c4a18

Fix image and annotation ID in COCOLikeAPI

ff4bc6c

Consistency between images and annotations in COCOLikeAPI

89029a0

mario-dg force-pushed the support-yolo-format branch from 04ef250 to 89029a0 March 31, 2025 11:10

Remove default list when retrieving class names from data.yaml

e92ced3

SkalskiP requested changes Mar 31, 2025

Code Review improvements

834c54e

mario-dg added 6 commits March 31, 2025 19:37

More code review improvements

54dc467

Fix imports

40e5273

Notify user, if image or label files are skipped

19adea7

Correct usage of supervision file util methods

e4b361e

Fix params of parse_yolo_annotations

d09d468

Forgot to return boxes and labels

b3844f2

Use constant in build_yolo for consistency

5c9b972

Merge remote-tracking branch 'origin/main' into support-yolo-format

f8e15da

mario-dg force-pushed the support-yolo-format branch from 1524bcc to f8e15da April 1, 2025 04:59

SkalskiP changed the base branch from main to develop April 3, 2025 08:53

Merge branch 'develop' into support-yolo-format

b1aeaad

mario-dg requested a review from SkalskiP June 17, 2025 13:14
