New vectorized distance functions. #211

Merged (5 commits) on Nov 18, 2022
13 changes: 3 additions & 10 deletions README.md
@@ -27,7 +27,7 @@ Using Norfair, you can add tracking capabilities to any detector with just a few

- Supports moving camera, re-identification with appearance embeddings, and n-dimensional object tracking (see [Advanced features](#advanced-features)).

- The function used to calculate the distance between tracked objects and detections is defined by the user, enabling the implementation of different tracking strategies.
- Norfair provides several predefined distance functions to compare tracked objects and detections. The distance functions can also be defined by the user, enabling the implementation of different tracking strategies.

- Fast. The only thing bounding inference speed will be the detection network feeding detections to Norfair.

@@ -98,14 +98,7 @@ Most tracking demos are showcased with vehicles and pedestrians, but the detecto

## How it works

Norfair works by estimating the future position of each point based on its past positions. It then tries to match these estimated positions with newly detected points provided by the detector. For this matching to occur, Norfair can rely on any distance function specified by the user of the library. Therefore, each object tracker can be made as simple or as complex as needed.

The following is an example of a particularly simple distance function calculating the Euclidean distance between tracked objects and detections. This is possibly the simplest distance function you could use in Norfair, as it uses just one single point per detection/object.

```python
def euclidean_distance(detection, tracked_object):
return np.linalg.norm(detection.points - tracked_object.estimate)
```
Norfair works by estimating the future position of each point based on its past positions. It then tries to match these estimated positions with newly detected points provided by the detector. For this matching to occur, Norfair can rely on any distance function. There are some predefined distances already integrated in Norfair, and the users can also define their own custom distances. Therefore, each object tracker can be made as simple or as complex as needed.
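
To keep this passage self-contained, here is a minimal sketch (not part of this diff) of the kind of custom distance the paragraph refers to; it mirrors the Euclidean example removed above and contrasts it with the new string-based configuration, with the threshold values being illustrative only.

```python
# Minimal sketch (not part of this diff): the custom callable mirrors the
# Euclidean example that this change removes from the README.
import numpy as np
from norfair import Tracker

def euclidean_distance(detection, tracked_object):
    # Distance between the detection's points and the tracked object's estimate.
    return np.linalg.norm(detection.points - tracked_object.estimate)

# A user-defined callable can still be passed directly...
tracker_custom = Tracker(distance_function=euclidean_distance, distance_threshold=20)

# ...while a predefined, vectorized distance is now selected by name.
tracker_builtin = Tracker(distance_function="euclidean", distance_threshold=20)
```

Either form yields a valid `distance_function` for the `Tracker`; the string form simply selects one of the vectorized built-ins introduced by this PR.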

As an example we use [Detectron2](https://github.com/facebookresearch/detectron2) to get the single point detections to use with this distance function. We just use the centroids of the bounding boxes it produces around cars as our detections, and get the following results.

@@ -132,7 +125,7 @@ detector = DefaultPredictor(cfg)

# Norfair
video = Video(input_path="video.mp4")
tracker = Tracker(distance_function=euclidean_distance, distance_threshold=20)
tracker = Tracker(distance_function="euclidean", distance_threshold=20)

for frame in video:
detections = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
2 changes: 1 addition & 1 deletion demos/3d_track/Dockerfile
@@ -7,7 +7,7 @@ RUN apt-get update && \
rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip && \
pip install mediapipe==0.8.10.1 && \
pip install mediapipe==0.8.11 && \
pip install opencv-python==4.5.5.64

RUN pip install git+https://github.com/tryolabs/norfair.git@master#egg=norfair
3 changes: 1 addition & 2 deletions demos/alphapose/writer.py
@@ -22,7 +22,7 @@
EVAL_JOINTS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

detection_threshold = 0.2
keypoint_dist_threshold = None
keypoint_dist_threshold = 10


def keypoints_distance(detected_pose, tracked_pose):
@@ -262,7 +262,6 @@ def update(self):

final_result.append(result)
if self.opt.save_img or self.save_video or self.opt.vis:

img = orig_img.copy()
global keypoint_dist_threshold
keypoint_dist_threshold = img.shape[0] / 30
2 changes: 1 addition & 1 deletion demos/camera_motion/src/demo.py
@@ -224,7 +224,7 @@ def run():
)

tracker = Tracker(
distance_function="frobenius",
distance_function="euclidean",
detection_threshold=args.confidence_threshold,
distance_threshold=args.distance_threshold,
initialization_delay=args.initialization_delay,
2 changes: 1 addition & 1 deletion demos/detectron2/src/demo.py
@@ -20,7 +20,7 @@

# Norfair
video = Video(input_path=args.file)
tracker = Tracker(distance_function="frobenius", distance_threshold=20)
tracker = Tracker(distance_function="euclidean", distance_threshold=20)

for frame in video:
detections = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
4 changes: 1 addition & 3 deletions demos/mmdetection/mmdetection_cars.py
@@ -3,8 +3,6 @@
from mmdet.core import get_classes

from norfair import Detection, Tracker, Video, draw_tracked_objects
from norfair.distances import frobenius
from norfair.tracker import TrackedObject

#
# MMDetection setup
@@ -27,7 +25,7 @@


tracker = Tracker(
distance_function=frobenius, distance_threshold=20, detection_threshold=0.6
distance_function="euclidean", distance_threshold=20, detection_threshold=0.6
)


4 changes: 1 addition & 3 deletions demos/mmdetection/src/demo.py
@@ -5,8 +5,6 @@
from mmdet.core import get_classes

from norfair import Detection, Tracker, Video, draw_tracked_objects
from norfair.distances import frobenius
from norfair.tracker import TrackedObject

parser = argparse.ArgumentParser(description="Track human poses in a video.")
parser.add_argument("files", type=str, nargs="+", help="Video files to process")
@@ -34,7 +32,7 @@
for input_path in args.files:

tracker = Tracker(
distance_function=frobenius, distance_threshold=20, detection_threshold=0.6
distance_function="euclidean", distance_threshold=20, detection_threshold=0.6
)

video = Video(input_path=input_path, output_path=args.output_path)
3 changes: 1 addition & 2 deletions demos/motmetrics4norfair/src/motmetrics4norfair.py
@@ -5,7 +5,6 @@

from norfair import drawing, metrics, Tracker, video
from norfair.camera_motion import MotionEstimator
from norfair.distances import iou
from norfair.filter import FilterPyKalmanFilterFactory

DETECTION_THRESHOLD = 0.5
@@ -103,7 +102,7 @@ def build_mask(frame, detections, tracked_objects):
)

tracker = Tracker(
distance_function=iou,
distance_function="iou_opt",
distance_threshold=DISTANCE_THRESHOLD,
detection_threshold=DETECTION_THRESHOLD,
pointwise_hit_counter_max=POINTWISE_HIT_COUNTER_MAX,
11 changes: 2 additions & 9 deletions demos/reid/src/demo.py
@@ -12,10 +12,6 @@
from norfair.filter import OptimizedKalmanFilterFactory


def distance_func(detection, tracked_object):
return np.linalg.norm(detection.points - tracked_object.estimate)


def embedding_distance(matched_not_init_trackers, unmatched_trackers):
snd_embedding = unmatched_trackers.last_detection.embedding

@@ -50,19 +46,16 @@ def main(
if disable_reid:
tracker = Tracker(
initialization_delay=1,
distance_function=distance_func,
distance_function="euclidean",
hit_counter_max=10,
filter_factory=OptimizedKalmanFilterFactory(),
distance_threshold=50,
past_detections_length=5,
# reid_distance_function=embedding_distance,
# reid_distance_threshold=0.5,
# reid_hit_counter_max=500,
)
else:
tracker = Tracker(
initialization_delay=1,
distance_function=distance_func,
distance_function="euclidean",
hit_counter_max=10,
filter_factory=OptimizedKalmanFilterFactory(),
distance_threshold=50,
3 changes: 1 addition & 2 deletions demos/sahi/src/demo.py
@@ -6,7 +6,6 @@
from utils import create_arg_parser, obtain_detection_model

from norfair import Detection, Tracker, Video, draw_boxes, draw_tracked_boxes
from norfair.distances import iou
from norfair.filter import OptimizedKalmanFilterFactory


@@ -48,7 +47,7 @@ def main(

tracker = Tracker(
initialization_delay=initialization_delay,
distance_function=iou,
distance_function="iou",
hit_counter_max=hit_counter_max,
filter_factory=OptimizedKalmanFilterFactory(),
distance_threshold=distance_threshold,
2 changes: 1 addition & 1 deletion demos/yolopv2/src/demo.py
@@ -96,7 +96,7 @@ def detect():

# Norfair Tracker init
tracker = Tracker(
distance_function=iou,
distance_function="iou",
distance_threshold=0.7,
)

1 change: 1 addition & 0 deletions demos/yolov4/requirements.txt
@@ -1 +1,2 @@
opencv-python==4.6.0.66
importlib-metadata==4.8.3
2 changes: 1 addition & 1 deletion demos/yolov4/src/demo.py
@@ -62,7 +62,7 @@ def get_centroid(yolo_box, img_height, img_width):
for input_path in args.files:
video = Video(input_path=input_path, output_path=args.output_path)
tracker = Tracker(
distance_function="frobenius",
distance_function="euclidean",
distance_threshold=max_distance_between_points,
)

3 changes: 1 addition & 2 deletions demos/yolov5/src/demo.py
@@ -6,7 +6,6 @@

import norfair
from norfair import Detection, Paths, Tracker, Video
from norfair.distances import frobenius, iou

DISTANCE_THRESHOLD_BBOX: float = 0.7
DISTANCE_THRESHOLD_CENTROID: int = 30
@@ -127,7 +126,7 @@ def yolo_detections_to_norfair_detections(
for input_path in args.files:
video = Video(input_path=input_path)

distance_function = iou if args.track_points == "bbox" else frobenius
distance_function = "iou_opt" if args.track_points == "bbox" else "euclidean"
distance_threshold = (
DISTANCE_THRESHOLD_BBOX
if args.track_points == "bbox"
3 changes: 1 addition & 2 deletions demos/yolov7/src/demo.py
@@ -8,7 +8,6 @@

import norfair
from norfair import Detection, Paths, Tracker, Video
from norfair.distances import frobenius, iou

DISTANCE_THRESHOLD_BBOX: float = 0.7
DISTANCE_THRESHOLD_CENTROID: int = 30
@@ -137,7 +136,7 @@ def yolo_detections_to_norfair_detections(
for input_path in args.files:
video = Video(input_path=input_path)

distance_function = iou if args.track_points == "bbox" else frobenius
distance_function = "iou_opt" if args.track_points == "bbox" else "euclidean"

distance_threshold = (
DISTANCE_THRESHOLD_BBOX
6 changes: 3 additions & 3 deletions docs/getting_started.md
@@ -44,7 +44,7 @@ from norfair import Detection, Tracker, Video, draw_tracked_objects

detector = MyDetector() # Set up a detector
video = Video(input_path="video.mp4")
tracker = Tracker(distance_function="frobenius", distance_threshold=100)
tracker = Tracker(distance_function="euclidean", distance_threshold=100)

for frame in video:
detections = detector(frame)
@@ -76,11 +76,11 @@ After inspecting the detections you might find issues with the tracking, several
- Objects take **too long to start**, this can have multiple causes:
- `initialization_delay` is too big on the Tracker. Makes the TrackedObject stay on initializing for too long, `3` is usually a good value to start with.
- `distance_threshold` is too small on the Tracker. Prevents the Detections to be matched with the correct TrackedObject. The best value depends on the distance used.
- Incorrect `distance_function` on the Tracker. Some distances might not be valid in some cases, for instance, if using IoU but the objects in your video move so quickly that there is never an overlap between the detections of consecutive frames. Try different distances, `frobenius` or `create_normalized_mean_euclidean_distance` are good starting points.
- Incorrect `distance_function` on the Tracker. Some distances might not be valid in some cases, for instance, if using IoU but the objects in your video move so quickly that there is never an overlap between the detections of consecutive frames. Try different distances, `euclidean` or `create_normalized_mean_euclidean_distance` are good starting points.
- Objects take **too long to disappear**. Lower `hit_counter_max` on the Tracker.
- Points or bounding boxes **jitter too much**. Increase `R` (measurement error) or lower `Q` (estimate or process error) on the `OptimizedKalmanFilterFactory` or `FilterPyKalmanFilterFactory`. This makes the Kalman Filter put less weight on the measurements and trust more on the estimate, stabilizing the result.
- **Camera motion** confuses the Tracker. If the camera moves, the apparent movement of objects can become too erratic for the Tracker. Use `MotionEstimator`.
- **Incorrect matches** between Detections and TrackedObjects, a couple of scenarios can cause this:
- `distance_threshold` is too big so the Tracker matches Detections to TrackedObjects that are simply too far. Lower the threshold until you fix the error, the correct value will depend on the distance function that you're using.
- Mismatches when objects overlap. In this case, tracking becomes more challenging, usually, the quality of the detection degrades causing one of the objects to be missed or creating a single big detection that includes both objects. On top of the detection issues, the tracker needs to decide which detection should be matched to which TrackedObject which can be error-prone if only considering spatial information. The solution is not easy but incorporating the notion of the appearance similarity based on some kind of embedding to your distance_finction can help.
- Mismatches when objects overlap. In this case, tracking becomes more challenging, usually, the quality of the detection degrades causing one of the objects to be missed or creating a single big detection that includes both objects. On top of the detection issues, the tracker needs to decide which detection should be matched to which TrackedObject which can be error-prone if only considering spatial information. The solution is not easy but incorporating the notion of the appearance similarity based on some kind of embedding to your distance_function can help.
- Can't **recover** an object **after occlusions**. Use ReID distance, see [this demo](https://github.com/tryolabs/norfair/tree/master/demos/reid) for an example but for real-world use you will need a good ReID model that can provide good embeddings.
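
Following up on the distance-function and camera-motion suggestions in the list above, here is an illustrative sketch (not part of this diff; it assumes the `create_normalized_mean_euclidean_distance` factory and `MotionEstimator` from Norfair, a 1280x720 video, and a hypothetical detector) of how a resolution-normalized distance and motion compensation can be wired into the Tracker.

```python
# Illustrative sketch only (not part of this diff): a resolution-normalized
# distance plus camera-motion compensation, as suggested in the list above.
from norfair import Tracker, Video
from norfair.camera_motion import MotionEstimator
from norfair.distances import create_normalized_mean_euclidean_distance

video = Video(input_path="video.mp4")  # hypothetical input video
frame_height, frame_width = 720, 1280  # assumed resolution of that video

# Coordinates are scaled by the frame size, so the threshold is a small
# fraction rather than a pixel count (0.05 is only a starting point).
distance_function = create_normalized_mean_euclidean_distance(frame_height, frame_width)
tracker = Tracker(distance_function=distance_function, distance_threshold=0.05)

motion_estimator = MotionEstimator()
for frame in video:
    coord_transformations = motion_estimator.update(frame)
    detections = []  # replace with your detector's output converted to norfair.Detection
    tracked_objects = tracker.update(
        detections, coord_transformations=coord_transformations
    )
```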
2 changes: 1 addition & 1 deletion norfair/__init__.py
@@ -6,7 +6,7 @@
>>> from norfair import Detection, Tracker, Video, draw_tracked_objects
>>> detector = MyDetector() # Set up a detector
>>> video = Video(input_path="video.mp4")
>>> tracker = Tracker(distance_function="frobenious", distance_threshold=50)
>>> tracker = Tracker(distance_function="euclidean", distance_threshold=50)
>>> for frame in video:
>>> detections = detector(frame)
>>> norfair_detections = [Detection(points) for points in detections]