# Person Following and Safe Distance Navigation for Task 2

For Task 2, the objective is to enable the robot to autonomously follow a manually selected person using visual tracking, while maintaining a safe and consistent distance. To achieve this, we combine ZED stereo vision, object tracking and robot control in a closed-loop system.


## Import Required Libraries

In [None]:
import numpy as np
import ipywidgets.widgets as widgets
import time
from IPython.display import display
import cv2
import pyzed.sl as sl

## Setup Live Display Widgets

To visualise the robot's perception in real time, we use IPython widgets to display two frames side by side: one showing the tracked target and one showing every detection. This assists with visual debugging during object detection and tracking.

In [None]:
image_display = widgets.Image(format="jpeg", width="45%")
full_display = widgets.Image(format="jpeg", width="45%")
display(widgets.HBox([image_display, full_display]))

location_coords_display = widgets.Label()
display(location_coords_display)

## Initialise ZED Camera

The ZED camera is configured with VGA resolution and the ULTRA depth mode, with depth reported in millimetres. We confirm that the camera opened successfully before continuing to the tracking logic.

In [None]:
camera = sl.Camera()
camera_params = sl.InitParameters()
camera_params.camera_resolution = sl.RESOLUTION.VGA
camera_params.depth_mode = sl.DEPTH_MODE.ULTRA
camera_params.coordinate_units = sl.UNIT.MILLIMETER

camera_status = camera.open(camera_params)
if camera_status != sl.ERROR_CODE.SUCCESS:
    # Release the camera and halt the notebook if it cannot be opened
    camera.close()
    raise RuntimeError("camera error: " + str(camera_status))

## Initialise Robot and Tracking Model

The robot hardware is controlled via a motors module, which contains essential movement functions. We initialise the robot instance and stop all motors immediately to ensure safety before any movement logic executes.
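Stopping the motors at start-up covers the beginning of a run; the same guarantee is useful at the end, including when a cell is interrupted. The following is a minimal sketch of one way to do that with a context manager. It assumes only that the robot object exposes `stop()` as used in this notebook; `FakeRobot` and `stopped_on_exit` are hypothetical names introduced for illustration, and `FakeRobot` merely records commands so the sketch runs without hardware.

```python
from contextlib import contextmanager


class FakeRobot:
    """Hypothetical stand-in for the real robot; it only records commands."""

    def __init__(self):
        self.log = []

    def forward(self, speed):
        self.log.append(("forward", speed))

    def stop(self):
        self.log.append(("stop", None))


@contextmanager
def stopped_on_exit(robot):
    """Stop the motors before the block runs and again when it ends,
    even if the block raises or the kernel is interrupted."""
    robot.stop()
    try:
        yield robot
    finally:
        robot.stop()


robot = FakeRobot()
try:
    with stopped_on_exit(robot):
        robot.forward(speed=0.6)
        raise KeyboardInterrupt  # simulate interrupting the cell mid-motion
except KeyboardInterrupt:
    pass

print(robot.log)
# [('stop', None), ('forward', 0.6), ('stop', None)]
```

With the real hardware, the same wrapper could be placed around the following loop so that an interrupted run never leaves the motors spinning.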

For tracking, we import a **main.Model** class which wraps a **pretrained YOLO object detector** (yolo11n.engine) and provides persistent tracking functionality. This includes:
- Assigning object IDs from the YOLO engine
- Selecting a target to track by ID
- Handling lost targets and re-identifying based on location and histogram similarity

This logic uses the **Ultralytics pretrained YOLO backend** to detect people (class 0) in each frame.
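The internals of `main.Model` are not shown in this notebook, so the following is only a sketch of how histogram-based re-identification could work, not the class's actual implementation. It compares a stored colour histogram of the target against candidate crops using histogram intersection. The function names, the bin count and the 0.5 threshold are all assumptions made for illustration.

```python
import numpy as np


def colour_histogram(patch, bins=8):
    """Return a normalised per-channel histogram of an image patch.

    Only the first three channels are used, so a BGRA frame works too.
    """
    channels = [
        np.histogram(patch[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(channels).astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical normalised histograms."""
    return float(np.minimum(h1, h2).sum())


def best_match(reference_hist, candidate_patches, threshold=0.5):
    """Return the index of the most similar candidate, or None if no
    candidate reaches the (assumed) similarity threshold."""
    best_index, best_score = None, threshold
    for index, patch in enumerate(candidate_patches):
        score = histogram_intersection(reference_hist, colour_histogram(patch))
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


# Synthetic crops: the lost target wore one colour, a bystander another.
target = np.full((40, 20, 3), (200, 30, 30), dtype=np.uint8)
bystander = np.full((40, 20, 3), (30, 30, 200), dtype=np.uint8)

reference = colour_histogram(target)
print(best_match(reference, [bystander, target]))  # 1
print(best_match(reference, [bystander]))          # None
```

A real re-identification step would typically combine this score with the last known location, as the list above describes, so that a similarly dressed person far from the target is not picked up by mistake.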

In [None]:
# Initialise robot
import motors
robot = motors.MotorsYukon()
robot.stop()

# Initialise model
import main
m = main.Model()

## Select Target for Tracking

We retrieve the live camera feed, annotate it with bounding boxes and prompt the user to manually select a person to track by entering the corresponding ID. Once selected, the system locks onto this target.
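Because the ID is typed by hand, the entry may be blank, non-numeric, or an ID that is not on screen. The following sketch shows one way to validate it before locking onto a target. `parse_track_id` and its `valid_ids` argument are hypothetical: the notebook does not show how the currently detected IDs can be read from the model, so that collection is assumed to be available.

```python
def parse_track_id(text, valid_ids):
    """Return the chosen ID as an int, or None if the entry should be ignored."""
    text = text.strip()
    if not text:
        return None  # blank entry: grab a new frame and ask again
    try:
        track_id = int(text)
    except ValueError:
        return None  # non-numeric entry
    # Reject IDs that do not belong to a currently detected person
    return track_id if track_id in valid_ids else None


detected = {1, 2, 5}  # assumed set of IDs visible in the current frame
print(parse_track_id(" 1 ", detected))  # 1
print(parse_track_id("", detected))     # None
print(parse_track_id("abc", detected))  # None
print(parse_track_id("7", detected))    # None
```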

In [None]:
image_mat = sl.Mat()
depth_mat = sl.Mat()
started_tracking = False

# Loop until a valid tracking target is selected by the user
while not started_tracking:
    err = camera.grab()
    if err != sl.ERROR_CODE.SUCCESS:
        continue

    camera.retrieve_image(image_mat)
    image = image_mat.get_data()

    # Display the current frame with all detected bounding boxes
    image_display.value = m.show_all_boxes(image)

    user_input = input("enter id to track (or leave blank to skip):").strip()
    if user_input == "":
        # Blank entry: grab a fresh frame and ask again
        continue
    try:
        # Set the selected ID as the tracked target
        m.tracked_id = int(user_input)
    except ValueError:
        print("invalid id: " + user_input)
        continue
    print("tracking object id " + user_input)
    started_tracking = True

## Define Robot Motion Parameters

These constants control how the robot adjusts its direction and distance when tracking:

- KEEP_X_BETWEEN = (275, 417): Horizontal range (in pixels) where the person is considered centred. Outside this, the robot turns.
- TURN_SPEED = 0.3: Speed of turning movement.
- TURN_SLEEP = 0.05: Time delay after each turn to avoid overshooting.
- KEEP_DEPTH_BETWEEN = (1500, 2000): Safe distance range (in mm) between the robot and the person.
- MOVE_SPEED = 0.6: Speed of forward/backward movement.
- MOVE_SLEEP = 0.05: Time delay after each move for controlled motion.
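The two ranges act as dead bands: the robot only turns when the person leaves the horizontal band, and only drives when the person is centred but outside the distance band. As a sketch of that priority order, the decision can be written as a pure function that is easy to test without hardware. `decide_action` is a hypothetical helper, not part of the `motors` module, and treating a non-finite depth as "stop" is an assumption of this sketch.

```python
import math

KEEP_X_BETWEEN = (275, 417)        # pixels
KEEP_DEPTH_BETWEEN = (1500, 2000)  # millimetres


def decide_action(x_centre, depth_mm):
    """Map the target's position to one of:
    'left', 'right', 'backward', 'forward' or 'stop'."""
    # Turning takes priority: distance is only corrected once centred
    if x_centre < KEEP_X_BETWEEN[0]:
        return "left"
    if x_centre > KEEP_X_BETWEEN[1]:
        return "right"
    # Assumption: an invalid depth reading means hold position
    if not math.isfinite(depth_mm):
        return "stop"
    if depth_mm < KEEP_DEPTH_BETWEEN[0]:
        return "backward"
    if depth_mm > KEEP_DEPTH_BETWEEN[1]:
        return "forward"
    return "stop"


print(decide_action(100, 1800.0))        # left
print(decide_action(500, 1800.0))        # right
print(decide_action(340, 1200.0))        # backward
print(decide_action(340, 2300.0))        # forward
print(decide_action(340, 1750.0))        # stop
print(decide_action(340, float("nan")))  # stop
```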

In [None]:
KEEP_X_BETWEEN = (275, 417)
TURN_SPEED = 0.3
TURN_SLEEP = 0.05
KEEP_DEPTH_BETWEEN = (1500, 2000)
MOVE_SPEED = 0.6
MOVE_SLEEP = 0.05

## Real-Time Person-Following Logic

The robot runs a continuous loop:
- Captures the latest RGB and depth data.
- Tracks the selected person using the model.
- Calculates their horizontal position and distance.
- Commands the robot to turn or move forward/backward accordingly.
- If the target is lost, the robot halts immediately.

Live image feedback with bounding boxes is updated at every iteration.
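The distance step can be sensitive to a single bad pixel, since ZED depth maps mark invalid pixels with NaN or infinite values. One possible mitigation, sketched here, is to take the median of the valid readings in a small window around the box centre. `robust_depth` and its `half_window` parameter are hypothetical names chosen for this example, and the depth map is assumed to be a 2-D NumPy array indexed as `depth[y, x]`.

```python
import numpy as np


def robust_depth(depth, x, y, half_window=2):
    """Median of the finite depth values in a small patch around (x, y).

    Returns None if the centre is outside the map or no pixel is valid.
    """
    height, width = depth.shape[:2]
    if not (0 <= x < width and 0 <= y < height):
        return None
    # Clip the window to the image borders
    x0, x1 = max(x - half_window, 0), min(x + half_window + 1, width)
    y0, y1 = max(y - half_window, 0), min(y + half_window + 1, height)
    patch = depth[y0:y1, x0:x1]
    valid = patch[np.isfinite(patch)]
    if valid.size == 0:
        return None
    return float(np.median(valid))


# Synthetic 5x5 depth map in millimetres with two invalid pixels
depth = np.full((5, 5), 1800.0, dtype=np.float32)
depth[2, 2] = np.nan
depth[2, 3] = np.inf

print(robust_depth(depth, 2, 2))   # 1800.0
print(robust_depth(depth, 10, 2))  # None
print(robust_depth(np.full((5, 5), np.nan, dtype=np.float32), 2, 2))  # None
```

The window size trades robustness against the risk of mixing in background pixels when the person is far away and the bounding box is small.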

In [None]:
# Continuously grab frames and update robot movement based on the tracked person's position.
# Interrupt the kernel to end tracking; the finally block then stops the robot.
try:
    while True:
        err = camera.grab()
        if err != sl.ERROR_CODE.SUCCESS:
            continue

        # Retrieve image and depth data from the camera
        camera.retrieve_image(image_mat)
        image = image_mat.get_data()

        camera.retrieve_measure(depth_mat, measure=sl.MEASURE.DEPTH)
        depth = depth_mat.get_data()

        tracked_box = m.track(image, return_type="corners")

        if tracked_box is False:
            # If the person is not detected, stop the robot and show the image
            image_display.value = bytes(cv2.imencode('.jpg', image)[1])
            robot.stop()
        else:
            # Calculate the centre of the tracked bounding box
            x_centre = int((tracked_box[0] + tracked_box[2]) / 2)
            y_centre = int((tracked_box[1] + tracked_box[3]) / 2)
            # Draw on a copy so the frame passed to show_all_boxes stays unmodified
            image_rect = cv2.rectangle(
                image.copy(),
                (int(tracked_box[0]), int(tracked_box[1])),
                (int(tracked_box[2]), int(tracked_box[3])),
                (255, 0, 0),
                4
            )
            image_display.value = bytes(cv2.imencode('.jpg', image_rect)[1])

            # Turn if the tracked person is outside KEEP_X_BETWEEN
            if x_centre < KEEP_X_BETWEEN[0]:
                robot.left(speed=TURN_SPEED)
                time.sleep(TURN_SLEEP)
                print("turning left   ", end="\r")
            elif x_centre > KEEP_X_BETWEEN[1]:
                robot.right(speed=TURN_SPEED)
                time.sleep(TURN_SLEEP)
                print("turning right  ", end="\r")
            else:
                # Person is centred: correct the distance using the depth at the box centre
                depth_value = depth[y_centre, x_centre]
                print(depth_value)
                if np.isnan(depth_value):
                    # Invalid depth reading: hold position rather than guess
                    robot.stop()
                    print("stopping       ", end="\r")
                elif depth_value < KEEP_DEPTH_BETWEEN[0]:
                    robot.backward(speed=MOVE_SPEED)
                    time.sleep(MOVE_SLEEP)
                    print("moving back    ", end="\r")
                elif depth_value > KEEP_DEPTH_BETWEEN[1]:
                    robot.forward(speed=MOVE_SPEED)
                    time.sleep(MOVE_SLEEP)
                    print("moving forward ", end="\r")
                else:
                    # Person is centred and at a safe distance, so stop moving
                    robot.stop()
                    print("stopping       ", end="\r")

        full_display.value = m.show_all_boxes(image)
except KeyboardInterrupt:
    pass
finally:
    # Stop the motors and close the camera when tracking ends
    robot.stop()
    camera.close()

[2025-05-07 11:43:38 UTC][ZED][INFO] Logging level INFO
[2025-05-07 11:43:38 UTC][ZED][INFO] Logging level INFO
[2025-05-07 11:43:38 UTC][ZED][INFO] Logging level INFO
[2025-05-07 11:43:39 UTC][ZED][INFO] [Init]  Depth mode: ULTRA
[2025-05-07 11:43:40 UTC][ZED][INFO] [Init]  Camera successfully opened.
[2025-05-07 11:43:40 UTC][ZED][INFO] [Init]  Camera FW version: 1523
[2025-05-07 11:43:40 UTC][ZED][INFO] [Init]  Video mode: VGA@100
[2025-05-07 11:43:40 UTC][ZED][INFO] [Init]  Serial Number: S/N 38559894
Loading yolo11n.engine for TensorRT inference...
[05/07/2025-12:43:45] [TRT] [I] Loaded engine size: 12 MiB
[05/07/2025-12:43:45] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[05/07/2025-12:43:45] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18, now: CPU 0, GPU 28 (MiB)


enter id to track (or leave blank to skip): 1


tracking object id 1
2156.092
2018.8047
1830.5281
1704.7145
1749.1368
1740.268
1774.8849
1816.8275
1862.2815
1939.7996
2001.7268
2134.2073
2121.3403
2085.6265
2021.1686
1919.5089
1827.12
1800.7797
1726.8352
1604.1819
1497.5181
1465.9336
1488.2257
1607.3795
1615.3778
1637.4213
1627.6641
1597.1511
1521.4459
1466.526
1471.689
1618.6321
1841.6927
1870.7651
1763.7526
1702.1322
1890.4225
1951.4983
1923.1707
1885.2657
nan
2292.3325
2214.5847
2125.4294
2071.1846
2067.7747
2011.3257
2028.9952
2102.1106
2057.8076
2062.0317
2058.5247
2279.5198
2067.6777
2086.6863
2139.0874
2209.0857
2196.0532
2200.8257
2260.8318
2291.4739
2299.9092