New vectorized distance functions. #211

Merged (5 commits) on Nov 18, 2022
13 changes: 3 additions & 10 deletions README.md
@@ -27,7 +27,7 @@ Using Norfair, you can add tracking capabilities to any detector with just a few

- Supports moving camera, re-identification with appearance embeddings, and n-dimensional object tracking (see [Advanced features](#advanced-features)).

- The function used to calculate the distance between tracked objects and detections is defined by the user, enabling the implementation of different tracking strategies.
- Norfair provides several predefined distance functions to compare tracked objects and detections. The distance functions can also be defined by the user, enabling the implementation of different tracking strategies.

- Fast. The only thing bounding inference speed will be the detection network feeding detections to Norfair.

@@ -98,14 +98,7 @@ Most tracking demos are showcased with vehicles and pedestrians, but the detecto

## How it works

Norfair works by estimating the future position of each point based on its past positions. It then tries to match these estimated positions with newly detected points provided by the detector. For this matching to occur, Norfair can rely on any distance function specified by the user of the library. Therefore, each object tracker can be made as simple or as complex as needed.

The following is an example of a particularly simple distance function calculating the Euclidean distance between tracked objects and detections. This is possibly the simplest distance function you could use in Norfair, as it uses just one single point per detection/object.

```python
def euclidean_distance(detection, tracked_object):
return np.linalg.norm(detection.points - tracked_object.estimate)
```
Norfair works by estimating the future position of each point based on its past positions. It then tries to match these estimated positions with newly detected points provided by the detector. For this matching to occur, Norfair can rely on any distance function. There are some predefined distances already integrated in Norfair, and the users can also define their own custom distances. Therefore, each object tracker can be made as simple or as complex as needed.
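
To keep this passage self-contained, here is a minimal sketch (not part of this diff) of the kind of custom distance the paragraph refers to; it mirrors the Euclidean example removed above and contrasts it with the new string-based configuration, with the threshold values being illustrative only.

```python
# Minimal sketch (not part of this diff): the custom callable mirrors the
# Euclidean example that this change removes from the README.
import numpy as np
from norfair import Tracker

def euclidean_distance(detection, tracked_object):
    # Distance between the detection's points and the tracked object's estimate.
    return np.linalg.norm(detection.points - tracked_object.estimate)

# A user-defined callable can still be passed directly...
tracker_custom = Tracker(distance_function=euclidean_distance, distance_threshold=20)

# ...while a predefined, vectorized distance is now selected by name.
tracker_builtin = Tracker(distance_function="euclidean", distance_threshold=20)
```

Either form yields a valid `distance_function` for the `Tracker`; the string form simply selects one of the vectorized built-ins introduced by this PR.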

As an example we use [Detectron2](https://github.com/facebookresearch/detectron2) to get the single point detections to use with this distance function. We just use the centroids of the bounding boxes it produces around cars as our detections, and get the following results.

@@ -132,7 +125,7 @@ detector = DefaultPredictor(cfg)

# Norfair
video = Video(input_path="video.mp4")
tracker = Tracker(distance_function=euclidean_distance, distance_threshold=20)
tracker = Tracker(distance_function="euclidean", distance_threshold=20)

for frame in video:
detections = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
2 changes: 1 addition & 1 deletion demos/3d_track/Dockerfile
@@ -7,7 +7,7 @@ RUN apt-get update && \
rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade pip && \
pip install mediapipe==0.8.10.1 && \
pip install mediapipe==0.8.11 && \
pip install opencv-python==4.5.5.64

RUN pip install git+https://github.com/tryolabs/norfair.git@master#egg=norfair
3 changes: 1 addition & 2 deletions demos/alphapose/writer.py
@@ -22,7 +22,7 @@
EVAL_JOINTS = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]

detection_threshold = 0.2
keypoint_dist_threshold = None
keypoint_dist_threshold = 10


def keypoints_distance(detected_pose, tracked_pose):
@@ -262,7 +262,6 @@ def update(self):

final_result.append(result)
if self.opt.save_img or self.save_video or self.opt.vis:

img = orig_img.copy()
global keypoint_dist_threshold
keypoint_dist_threshold = img.shape[0] / 30
2 changes: 1 addition & 1 deletion demos/camera_motion/src/demo.py
@@ -224,7 +224,7 @@ def run():
)

tracker = Tracker(
distance_function="frobenius",
distance_function="euclidean",
detection_threshold=args.confidence_threshold,
distance_threshold=args.distance_threshold,
initialization_delay=args.initialization_delay,
2 changes: 1 addition & 1 deletion demos/detectron2/src/demo.py
@@ -20,7 +20,7 @@

# Norfair
video = Video(input_path=args.file)
tracker = Tracker(distance_function="frobenius", distance_threshold=20)
tracker = Tracker(distance_function="euclidean", distance_threshold=20)

for frame in video:
detections = detector(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
4 changes: 1 addition & 3 deletions demos/mmdetection/mmdetection_cars.py
@@ -3,8 +3,6 @@
from mmdet.core import get_classes

from norfair import Detection, Tracker, Video, draw_tracked_objects
from norfair.distances import frobenius
from norfair.tracker import TrackedObject

#
# MMDetection setup
@@ -27,7 +25,7 @@


tracker = Tracker(
distance_function=frobenius, distance_threshold=20, detection_threshold=0.6
distance_function="euclidean", distance_threshold=20, detection_threshold=0.6
)


4 changes: 1 addition & 3 deletions demos/mmdetection/src/demo.py
@@ -5,8 +5,6 @@
from mmdet.core import get_classes

from norfair import Detection, Tracker, Video, draw_tracked_objects
from norfair.distances import frobenius
from norfair.tracker import TrackedObject

parser = argparse.ArgumentParser(description="Track human poses in a video.")
parser.add_argument("files", type=str, nargs="+", help="Video files to process")
@@ -34,7 +32,7 @@
for input_path in args.files:

tracker = Tracker(
distance_function=frobenius, distance_threshold=20, detection_threshold=0.6
distance_function="euclidean", distance_threshold=20, detection_threshold=0.6
)

video = Video(input_path=input_path, output_path=args.output_path)
3 changes: 1 addition & 2 deletions demos/motmetrics4norfair/src/motmetrics4norfair.py
@@ -5,7 +5,6 @@

from norfair import drawing, metrics, Tracker, video
from norfair.camera_motion import MotionEstimator
from norfair.distances import iou
from norfair.filter import FilterPyKalmanFilterFactory

DETECTION_THRESHOLD = 0.5
@@ -103,7 +102,7 @@ def build_mask(frame, detections, tracked_objects):
)

tracker = Tracker(
distance_function=iou,
distance_function="iou_opt",
distance_threshold=DISTANCE_THRESHOLD,
detection_threshold=DETECTION_THRESHOLD,
pointwise_hit_counter_max=POINTWISE_HIT_COUNTER_MAX,
11 changes: 2 additions & 9 deletions demos/reid/src/demo.py
@@ -12,10 +12,6 @@
from norfair.filter import OptimizedKalmanFilterFactory


def distance_func(detection, tracked_object):
return np.linalg.norm(detection.points - tracked_object.estimate)


def embedding_distance(matched_not_init_trackers, unmatched_trackers):
snd_embedding = unmatched_trackers.last_detection.embedding

@@ -50,19 +46,16 @@ def main(
if disable_reid:
tracker = Tracker(
initialization_delay=1,
distance_function=distance_func,
distance_function="euclidean",
hit_counter_max=10,
filter_factory=OptimizedKalmanFilterFactory(),
distance_threshold=50,
past_detections_length=5,
# reid_distance_function=embedding_distance,
# reid_distance_threshold=0.5,
# reid_hit_counter_max=500,
)
else:
tracker = Tracker(
initialization_delay=1,
distance_function=distance_func,
distance_function="euclidean",
hit_counter_max=10,
filter_factory=OptimizedKalmanFilterFactory(),
distance_threshold=50,
3 changes: 1 addition & 2 deletions demos/sahi/src/demo.py
@@ -6,7 +6,6 @@
from utils import create_arg_parser, obtain_detection_model

from norfair import Detection, Tracker, Video, draw_boxes, draw_tracked_boxes
from norfair.distances import iou
from norfair.filter import OptimizedKalmanFilterFactory


@@ -48,7 +47,7 @@ def main(

tracker = Tracker(
initialization_delay=initialization_delay,
distance_function=iou,
distance_function="iou",
hit_counter_max=hit_counter_max,
filter_factory=OptimizedKalmanFilterFactory(),
distance_threshold=distance_threshold,
2 changes: 1 addition & 1 deletion demos/yolopv2/src/demo.py
@@ -96,7 +96,7 @@ def detect():

# Norfair Tracker init
tracker = Tracker(
distance_function=iou,
distance_function="iou",
distance_threshold=0.7,
)

1 change: 1 addition & 0 deletions demos/yolov4/requirements.txt
@@ -1 +1,2 @@
opencv-python==4.6.0.66
importlib-metadata==4.8.3
2 changes: 1 addition & 1 deletion demos/yolov4/src/demo.py
@@ -62,7 +62,7 @@ def get_centroid(yolo_box, img_height, img_width):
for input_path in args.files:
video = Video(input_path=input_path, output_path=args.output_path)
tracker = Tracker(
distance_function="frobenius",
distance_function="euclidean",
distance_threshold=max_distance_between_points,
)

3 changes: 1 addition & 2 deletions demos/yolov5/src/demo.py
@@ -6,7 +6,6 @@

import norfair
from norfair import Detection, Paths, Tracker, Video
from norfair.distances import frobenius, iou

DISTANCE_THRESHOLD_BBOX: float = 0.7
DISTANCE_THRESHOLD_CENTROID: int = 30
@@ -127,7 +126,7 @@ def yolo_detections_to_norfair_detections(
for input_path in args.files:
video = Video(input_path=input_path)

distance_function = iou if args.track_points == "bbox" else frobenius
distance_function = "iou_opt" if args.track_points == "bbox" else "euclidean"
distance_threshold = (
DISTANCE_THRESHOLD_BBOX
if args.track_points == "bbox"
3 changes: 1 addition & 2 deletions demos/yolov7/src/demo.py
@@ -8,7 +8,6 @@

import norfair
from norfair import Detection, Paths, Tracker, Video
from norfair.distances import frobenius, iou

DISTANCE_THRESHOLD_BBOX: float = 0.7
DISTANCE_THRESHOLD_CENTROID: int = 30
@@ -137,7 +136,7 @@ def yolo_detections_to_norfair_detections(
for input_path in args.files:
video = Video(input_path=input_path)

distance_function = iou if args.track_points == "bbox" else frobenius
distance_function = "iou_opt" if args.track_points == "bbox" else "euclidean"

distance_threshold = (
DISTANCE_THRESHOLD_BBOX
6 changes: 3 additions & 3 deletions docs/getting_started.md
@@ -44,7 +44,7 @@ from norfair import Detection, Tracker, Video, draw_tracked_objects

detector = MyDetector() # Set up a detector
video = Video(input_path="video.mp4")
tracker = Tracker(distance_function="frobenius", distance_threshold=100)
tracker = Tracker(distance_function="euclidean", distance_threshold=100)

for frame in video:
detections = detector(frame)
@@ -76,11 +76,11 @@ After inspecting the detections you might find issues with the tracking, several
- Objects take **too long to start**, this can have multiple causes:
- `initialization_delay` is too big on the Tracker. Makes the TrackedObject stay on initializing for too long, `3` is usually a good value to start with.
- `distance_threshold` is too small on the Tracker. Prevents the Detections to be matched with the correct TrackedObject. The best value depends on the distance used.
- Incorrect `distance_function` on the Tracker. Some distances might not be valid in some cases, for instance, if using IoU but the objects in your video move so quickly that there is never an overlap between the detections of consecutive frames. Try different distances, `frobenius` or `create_normalized_mean_euclidean_distance` are good starting points.
- Incorrect `distance_function` on the Tracker. Some distances might not be valid in some cases, for instance, if using IoU but the objects in your video move so quickly that there is never an overlap between the detections of consecutive frames. Try different distances, `euclidean` or `create_normalized_mean_euclidean_distance` are good starting points.
- Objects take **too long to disappear**. Lower `hit_counter_max` on the Tracker.
- Points or bounding boxes **jitter too much**. Increase `R` (measurement error) or lower `Q` (estimate or process error) on the `OptimizedKalmanFilterFactory` or `FilterPyKalmanFilterFactory`. This makes the Kalman Filter put less weight on the measurements and trust more on the estimate, stabilizing the result.
- **Camera motion** confuses the Tracker. If the camera moves, the apparent movement of objects can become too erratic for the Tracker. Use `MotionEstimator`.
- **Incorrect matches** between Detections and TrackedObjects, a couple of scenarios can cause this:
- `distance_threshold` is too big so the Tracker matches Detections to TrackedObjects that are simply too far. Lower the threshold until you fix the error, the correct value will depend on the distance function that you're using.
- Mismatches when objects overlap. In this case, tracking becomes more challenging, usually, the quality of the detection degrades causing one of the objects to be missed or creating a single big detection that includes both objects. On top of the detection issues, the tracker needs to decide which detection should be matched to which TrackedObject which can be error-prone if only considering spatial information. The solution is not easy but incorporating the notion of the appearance similarity based on some kind of embedding to your distance_finction can help.
- Mismatches when objects overlap. In this case, tracking becomes more challenging, usually, the quality of the detection degrades causing one of the objects to be missed or creating a single big detection that includes both objects. On top of the detection issues, the tracker needs to decide which detection should be matched to which TrackedObject which can be error-prone if only considering spatial information. The solution is not easy but incorporating the notion of the appearance similarity based on some kind of embedding to your distance_function can help.
- Can't **recover** an object **after occlusions**. Use ReID distance, see [this demo](https://github.com/tryolabs/norfair/tree/master/demos/reid) for an example but for real-world use you will need a good ReID model that can provide good embeddings.
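
Following up on the distance-function and camera-motion suggestions in the list above, here is an illustrative sketch (not part of this diff; it assumes the `create_normalized_mean_euclidean_distance` factory and `MotionEstimator` from Norfair, a 1280x720 video, and a hypothetical detector) of how a resolution-normalized distance and motion compensation can be wired into the Tracker.

```python
# Illustrative sketch only (not part of this diff): a resolution-normalized
# distance plus camera-motion compensation, as suggested in the list above.
from norfair import Tracker, Video
from norfair.camera_motion import MotionEstimator
from norfair.distances import create_normalized_mean_euclidean_distance

video = Video(input_path="video.mp4")  # hypothetical input video
frame_height, frame_width = 720, 1280  # assumed resolution of that video

# Coordinates are scaled by the frame size, so the threshold is a small
# fraction rather than a pixel count (0.05 is only a starting point).
distance_function = create_normalized_mean_euclidean_distance(frame_height, frame_width)
tracker = Tracker(distance_function=distance_function, distance_threshold=0.05)

motion_estimator = MotionEstimator()
for frame in video:
    coord_transformations = motion_estimator.update(frame)
    detections = []  # replace with your detector's output converted to norfair.Detection
    tracked_objects = tracker.update(
        detections, coord_transformations=coord_transformations
    )
```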
2 changes: 1 addition & 1 deletion norfair/__init__.py
@@ -6,7 +6,7 @@
>>> from norfair import Detection, Tracker, Video, draw_tracked_objects
>>> detector = MyDetector() # Set up a detector
>>> video = Video(input_path="video.mp4")
>>> tracker = Tracker(distance_function="frobenious", distance_threshold=50)
>>> tracker = Tracker(distance_function="euclidean", distance_threshold=50)
>>> for frame in video:
>>> detections = detector(frame)
>>> norfair_detections = [Detection(points) for points in detections]