
Feat: MOT dataset loader #29


Open

rolson24 wants to merge 6 commits into main

Conversation

rolson24

Description

This PR is the first part of the evaluation framework started in #7. It contains a base dataset prototype and a MOTChallenge dataset implementation. The MOTChallengeDataset class can load, parse, preprocess, and provide an iterator for each sequence in the dataset, much in the same way TrackEval does. It also supports loading "public detections": pre-computed detections stored in a det.txt file for each sequence, which standardize the detections that object trackers are evaluated on.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

Example usage:

from pathlib import Path
import supervision as sv
from trackers.dataset.mot_challenge import MOTChallengeDataset

local_mot17_path = "/path/to/MOT17"
mot_dataset_path = Path(local_mot17_path) / "train"

# Initialize the dataset with the path to the MOT folder.
mot_dataset = MOTChallengeDataset(dataset_path=mot_dataset_path)

# Print the sequences in the dataset
available_sequences = mot_dataset.get_sequence_names()
print("Available sequences:", available_sequences)

# Print the info of the first sequence
first_sequence_name = available_sequences[0]
seq_info = mot_dataset.get_sequence_info(first_sequence_name)
print("Sequence Info:", seq_info)

# Get frames from the iterator
frame_count = 0
for frame_info in mot_dataset.get_frame_iterator(first_sequence_name):
    if frame_count < 5:
        print(frame_info)
    frame_count += 1
print(f"Total frames iterated: {frame_count}")

# Load the ground truth tracks for the first sequence
gt_detections = mot_dataset.load_ground_truth(first_sequence_name) # sv.Detections object
for i in range(min(5, len(gt_detections))):
    print(
        f"Frame: {gt_detections.data['frame_idx'][i]}, "
        f"ID: {gt_detections.tracker_id[i]}, "
        f"Class: {gt_detections.class_id[i]}, "
        f"Conf/Vis: {gt_detections.confidence[i]:.2f}, "
        f"Box: {gt_detections.xyxy[i]}"
    )

# Load public detections for the sequence
mot_dataset.load_public_detections(min_confidence=0.1) # Optional: set a confidence threshold

if mot_dataset.has_public_detections:
    frame_iter = mot_dataset.get_frame_iterator(first_sequence_name)
    first_frame_info = next(frame_iter)
    # Get the image path for the first frame
    first_frame_path = first_frame_info.get('image_path')

    if first_frame_path:
        print(f"Getting detections for image: {first_frame_path}")
        public_dets_frame1 = mot_dataset.get_public_detections(first_frame_path)
        print(f"Found {len(public_dets_frame1)} public detections for the first frame.")
        if len(public_dets_frame1) > 0:
            print("First 5 public detections for this frame:")
            for i in range(min(5, len(public_dets_frame1))):
                print(
                    f"Conf: {public_dets_frame1.confidence[i]:.2f}, "
                    f"Box: {public_dets_frame1.xyxy[i]}"
                )

I also have a Colab notebook testing out the data loader.

Any specific deployment considerations?

For example, documentation changes, usability, usage/costs, secrets, etc.

Docs

  • Docs updated? What were the changes:

@soumik12345 (Collaborator) left a comment

Thanks a lot for this PR!
Kudos for attaching such a detailed colab notebook with the PR, really helped me test the code quickly!

I have recommended some minor changes. Can you please address them?

soumik12345 previously approved these changes Apr 29, 2025

@soumik12345 (Collaborator) left a comment

Hi @rolson24, the PR looks good to me.
I have some minor nitpick comments.

import supervision as sv


# --- Base Dataset ---
Collaborator

let’s remove Python comments like these; we can already tell it’s a base dataset since it only has abstract methods and implements the ABC interface.


Returns:
A dictionary containing sequence information (e.g., 'frame_rate',
'seqLength', 'img_width', 'img_height', 'img_dir'). Keys and value
Collaborator

can we make sure all these dict keys follow the same naming convention? It looks like everything is in snake_case except for seqLength. Also, wouldn’t it make more sense to use a dataclass or a namedtuple instead of a regular dict here?
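
For illustration, a rough sketch of that suggestion: a frozen dataclass holding the sequence info. The field names mirror the example keys in the docstring (with seqLength renamed to seq_length), and the class name SequenceInfo is only a placeholder, not something from the PR.

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SequenceInfo:
    # snake_case fields mirroring the docstring keys above;
    # seqLength becomes seq_length for consistency.
    frame_rate: float
    seq_length: int
    img_width: int
    img_height: int
    img_dir: Path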

Comment on lines +66 to +67
iou_threshold: float = 0.5,
remove_distractor_matches: bool = True,
Collaborator

it seems that the preprocess implementation is highly dataset-specific. given that, does it really make sense to add iou_threshold and remove_distractor_matches to the base class, especially since there’s no guarantee they’ll be needed here? maybe it would be better to allow for additional keyword arguments instead?
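
A minimal sketch of that idea, assuming the base class keeps an abstract preprocess method; the ground_truth/predictions arguments and the return type are assumptions, only the two keyword options come from the diff above.

import abc

import supervision as sv


class EvaluationDataset(abc.ABC):
    @abc.abstractmethod
    def preprocess(
        self,
        ground_truth: sv.Detections,
        predictions: sv.Detections,
        **kwargs,
    ) -> tuple[sv.Detections, sv.Detections]:
        # Dataset-specific options (e.g. iou_threshold or
        # remove_distractor_matches for MOTChallenge) are passed as
        # keyword arguments by concrete subclasses instead of being
        # hard-coded in the base signature.
        ...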



# --- Base Dataset ---
class EvaluationDataset(abc.ABC):
Collaborator

should we really call this dataset EvaluationDataset? Maybe it would make more sense to use a more general name, like TrackingDataset or MOTDataset instead?

logger = get_logger(__name__)


def relabel_ids(detections: sv.Detections) -> sv.Detections:
Collaborator

  • Let’s rename this function from relabel_ids to remap_ids.
  • We should also allow users to pass a custom mapping that defines how each source_id should be mapped to a target_id.
  • I definitely think we should add a unit test for this function.
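
A hedged sketch covering those three points; the id_map parameter and the test below are illustrative, not taken from the PR.

from typing import Optional

import numpy as np
import supervision as sv


def remap_ids(
    detections: sv.Detections,
    id_map: Optional[dict[int, int]] = None,
) -> sv.Detections:
    # Return a copy with tracker ids remapped; without an explicit
    # mapping, relabel ids to a dense 0..N-1 range.
    remapped = detections[:]
    if remapped.tracker_id is None:
        return remapped
    if id_map is None:
        id_map = {
            int(old): new for new, old in enumerate(np.unique(remapped.tracker_id))
        }
    remapped.tracker_id = np.array(
        [id_map[int(i)] for i in remapped.tracker_id], dtype=int
    )
    return remapped


def test_remap_ids_with_custom_mapping():
    detections = sv.Detections(
        xyxy=np.array([[0, 0, 10, 10], [5, 5, 20, 20]], dtype=float),
        tracker_id=np.array([7, 9]),
    )
    remapped = remap_ids(detections, id_map={7: 0, 9: 1})
    assert remapped.tracker_id.tolist() == [0, 1]
    assert detections.tracker_id.tolist() == [7, 9]  # input left untouched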

Comment on lines +109 to +115
with open(file_path, "r") as f:
for line in f:
line = line.strip()
if not line:
continue

parts = line.split(",")
Collaborator

supervision has a util for loading text file lines; let's use it here.

# or detection confidence in det.txt
confidence = float(parts[6])

class_id = int(parts[7]) if len(parts) > 7 else -1
Collaborator

I'd move the -1 magic number to the top of the file and set it as a Python constant.
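
For illustration, something along these lines near the top of the module (the constant name is only a suggestion):

# Class id used when a gt.txt / det.txt row has no class column.
UNKNOWN_CLASS_ID = -1

# ...and in the parser:
# class_id = int(parts[7]) if len(parts) > 7 else UNKNOWN_CLASS_ID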

Comment on lines +212 to +218
return sv.Detections(
xyxy=xyxy,
confidence=confidence,
class_id=class_id,
tracker_id=tracker_id,
data={"frame_idx": frame_idx},
)
Collaborator

I think this method must return a separate Detections object per frame.

Author

So instead of storing the frame index in the data attribute of the Detections object, we should just return a list of Detections objects? That probably does align more with how supervision uses Detections objects.
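
For illustration, a small sketch of that split, assuming the combined object stores the frame index under data['frame_idx'] as in the current code; the helper name and the dict return type are just one option (a list ordered by frame would work too).

import numpy as np
import supervision as sv


def split_by_frame(detections: sv.Detections) -> dict[int, sv.Detections]:
    # One sv.Detections object per frame, keyed by frame index.
    frame_idx = np.asarray(detections.data["frame_idx"])
    return {
        int(idx): detections[frame_idx == idx]
        for idx in np.unique(frame_idx)
    }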

logger.warning(f"No valid MOTChallenge sequences found in {self.root_path}")
return sorted(sequences)

def _parse_mot_file(
Collaborator

I think we need to separate loading the data from the file from processing it; this will allow us to unit test the data processing part.
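
A sketch of that split; the function names and exact column handling are illustrative, based on the standard MOT text format rather than the PR's current _parse_mot_file.

from pathlib import Path

import numpy as np
import supervision as sv


def read_mot_lines(file_path: Path) -> list[str]:
    # I/O only: read the raw, non-empty lines of a gt.txt / det.txt file.
    with open(file_path) as f:
        return [line.strip() for line in f if line.strip()]


def parse_mot_lines(lines: list[str]) -> sv.Detections:
    # Pure parsing: easy to unit test with hard-coded strings.
    rows = [list(map(float, line.split(","))) for line in lines]
    data = np.array(rows)
    # MOT row format: frame, id, x, y, w, h, conf, class, visibility
    xyxy = data[:, 2:6].copy()
    xyxy[:, 2:4] += xyxy[:, 0:2]  # (x, y, w, h) -> (x1, y1, x2, y2)
    return sv.Detections(
        xyxy=xyxy,
        confidence=data[:, 6],
        class_id=data[:, 7].astype(int) if data.shape[1] > 7 else None,
        tracker_id=data[:, 1].astype(int),
        data={"frame_idx": data[:, 0].astype(int)},
    )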

Comment on lines +322 to +326
# Try common extensions explicitly if glob fails
if (img_dir / f"{1:06d}.jpg").exists():
img_ext = ".jpg"
elif (img_dir / f"{1:06d}.png").exists():
img_ext = ".png"
Collaborator

I'm a bit confused as to how any of this would work if glob fails.

@SkalskiP (Collaborator)

Hi @rolson24 👋🏻 After my conversation with @soumik12345 yesterday, it became clear to me that we really need to enable benchmarking in the trackers package as soon as possible. Because of this, getting this PR over the finish line is my top priority for the week. I’ve left my comments on the PR—do you think you’ll have time to address them? If not, no worries, I can take care of it to keep things moving forward.

@rolson24 (Author) commented Jun 6, 2025

Hi @SkalskiP, I can address these requested changes this weekend. Sorry, I was finishing up school last week, but I should have some more time now.
