# Visual SLAM Trilogy
## Part I: The Frontend

~~In this SLAM hands-on lecture, we will implement a visual SLAM system that has an OpenCV frontend, a GTSAM backend, and a loop closure module based on a bag-of-words approach.~~

In a land where visions intertwine with ancient algorithms, we embark on a quest to forge a **visual SLAM system**. With the wisdom of **OpenCV** and the deep magics of **GTSAM** at our command, we shall weave a loop closure module, crafted from the mystical **bag-of-words** algorithm. This noble endeavor calls upon the brave to unlock the secrets of perception, guiding our creation through the unseen paths of the world.

![lord_of_slam](assets/lord_of_slam.webp)

## Now, back to reality

## Overview

The overview of our SLAM system is depicted below, simplified by certain assumptions:

1. Odometry:
    - Assumes an odometry trajectory is provided to our SLAM system.
    - In practice, this typically involves a Kalman filter fusing the IMU and encoder data.
2. Frontend:
    - Processes the raw sensor data and extracts relevant features for optimization.
    - Associates each measurement with a specific landmark (3D point).
    - Provides initial values for the backend variables.
3. Mapping:
    - Uses a minimal sparse map.
    - Could be replaced with an occupancy grid map (OGM) or even 3D Gaussian Splatting in the future.
4. Backend:
    - Solves the maximum a posteriori (MAP) estimation problem.
    - Feeds information back to loop closure.
5. Loop closure:
    - Acts as a long-term tracking module (in contrast to the short-term tracking module in the frontend).
    - Implemented with a visual bag-of-words algorithm.

![slam_overview](assets/slam_overview.png)

## Dataset

We will use the abandoned_factory P006 sequence from the TartanAir dataset to test the system. TartanAir is a simulated dataset with diverse environments and ground-truth data, which makes it well suited for testing and evaluating our system. To get started, we need the camera intrinsics, extrinsics, and data format information, which can be found here: https://github.com/castacks/tartanair_tools/blob/master/data_type.md.
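
To make the pose format concrete: each line of a TartanAir trajectory file holds `tx ty tz qx qy qz qw`. The sketch below converts one such line into a 4x4 homogeneous transform using only NumPy. The helper name `pose_line_to_matrix` and the sample line are made up for illustration; the notebook itself relies on `TartanAirLoader` for this step.

```python
import numpy as np

def pose_line_to_matrix(line: str) -> np.ndarray:
    """Convert one 'tx ty tz qx qy qz qw' line into a 4x4 homogeneous transform."""
    tx, ty, tz, qx, qy, qz, qw = (float(v) for v in line.split())
    # normalize the quaternion to guard against rounding in the text file
    q = np.array([qx, qy, qz, qw])
    x, y, z, w = q / np.linalg.norm(q)
    # standard unit-quaternion to rotation-matrix conversion
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [tx, ty, tz]
    return T

# hypothetical sample line: 90 degree rotation about z, translation (1, 2, 3)
sample = "1.0 2.0 3.0 0.0 0.0 0.70710678 0.70710678"
T = pose_line_to_matrix(sample)
print(np.round(T, 3))
```

The rotation block of the result maps the x axis onto the y axis, as expected for a quarter turn about z.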

## Implementation

In this notebook, we will walk through the implementation of the frontend step-by-step, while visualizing the output of each step. Specifically, we will cover the following topics:

- Loading the dataset
- Selecting keyframes
- Extracting features and tracking them across frames
- Removing outlier matches
- Assigning global IDs to features

## I. Dependencies

### 1. Install Python libraries

Please use Python >= 3.9.

In [None]:
# # install the minslam package in "editable" mode
# !pip install -e ..

# # install other libraries
# !pip install numpy spatialmath-python opencv-python matplotlib gtsam ipympl evo plotly nbformat

### 2. Import libraries and dataset
Please download the [abandoned_factory P006 dataset](https://drive.google.com/file/d/1Q_fSI0U-IMfv90lyE1Uh78KV2QJheHbv/view?usp=share_link) and extract it into a folder named "data", so that the dataset path checked in the next cell exists.

In [None]:
# this block should run without error

# dataset
import os

# test if we can find the dataset
dataset_folder = '../data/tartanair/scenes/abandonedfactory/Easy/P006'
print('Check if the data folder exists:',os.path.exists(dataset_folder))

# visualization
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# frontend
import numpy as np
from spatialmath import *
import cv2
import matplotlib.pyplot as plt

# backend
import gtsam
from gtsam.symbol_shorthand import L, X

# our slam implementation
from minslam.data_loader import TartanAirLoader, plot_trajectory
from minslam.frontend import Frontend
from minslam.params import Params

## II. Frontend

### 1. Load images and trajectory

In [None]:
traj_filename = 'pose_left.txt'
traj_path = os.path.join(dataset_folder, traj_filename)
print('Loading trajectory from', traj_path)

# Have a look at the trajectory file
print('First 3 lines of TartanAir trajectory file:')
with open(traj_path, 'r') as f:
    print(''.join(f.readlines()[:3])) # tx ty tz qx qy qz qw

# load a trajectory
print('First 3 SE3 poses:')
dataset = TartanAirLoader(dataset_folder)
gt_poses = dataset._load_traj('tum', traj_filename, add_timestamps=True)
dataset.set_ground_truth(gt_poses)
print(gt_poses[:3])

# add noise to the gt and set the noisy poses as odometry
odom_poses = dataset.add_noise(gt_poses, [1e-4, 3e-4], [1e-3, 1e-3], seed=100)
dataset.set_odometry(odom_poses)

# plot the trajectory in 3d
gt_traj = np.array([p.t for p in gt_poses])
odom_traj = np.array([p.t for p in odom_poses])
fig = go.Figure()
n_keyframes = 300
plot_trajectory(odom_traj[:n_keyframes], 'odom', fig)
plot_trajectory(gt_traj[:n_keyframes], 'gt', fig)
fig.show()

# load an example frame (index 50)
dataset.set_curr_index(50)
color, depth = dataset.read_current_rgbd()

# show color and depth horizontally
print('color image data type:', color.dtype)
print('depth image data type:', depth.dtype)
print('depth image range:', f'{depth.min()} - {depth.max()}')

fig_color = px.imshow(color[:,:,::-1])
fig_color.update_traces(hoverinfo="x+y+z", name="")

clipped_depth = depth.clip(0, 40)
fig_depth = px.imshow(clipped_depth, color_continuous_scale='gray')
fig_depth.update_traces(hoverinfo="x+y+z", name="")

fig_color.show()
fig_depth.show()

### 2. Keyframe selection

Instead of processing every incoming frame, we select a subset of "keyframes" to reduce computation. Consecutive keyframes must be separated by enough motion to allow landmark triangulation. To enforce this, we define the distance between two odometry poses as $\lVert \operatorname{Log}(X_{i}^{-1} X_{i+1}) \rVert$, the norm of the relative transform expressed in the tangent space, and we add the new frame as a keyframe if this distance exceeds a specified threshold.
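
As a rough illustration of this distance, the sketch below implements the SE(3) logarithm with NumPy and applies a threshold to a few candidate poses. It is not the `minslam` implementation: the helper names (`se3_log`, `pose_distance`, `make_T`), the threshold value, and the choice to take a plain Euclidean norm over the stacked translation and rotation parts are all assumptions made for this example.

```python
import numpy as np

def se3_log(T):
    """Log map of a 4x4 rigid transform, returned as the 6-vector [rho, omega].

    Sketch only: rotations very close to 180 degrees are not handled.
    """
    R, t = T[:3, :3], T[:3, 3]
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-4:
        # small-angle Taylor expansion
        W = 0.5 * (R - R.T)
        coeff = 1.0 / 12.0
    else:
        W = (R - R.T) * theta / (2.0 * np.sin(theta))  # skew matrix of omega
        coeff = (1.0 - theta * np.sin(theta) / (2.0 * (1.0 - cos_theta))) / theta**2
    omega = np.array([W[2, 1], W[0, 2], W[1, 0]])
    V_inv = np.eye(3) - 0.5 * W + coeff * (W @ W)
    return np.concatenate([V_inv @ t, omega])

def pose_distance(T_a, T_b):
    """|| Log(T_a^{-1} T_b) || for two 4x4 poses."""
    return np.linalg.norm(se3_log(np.linalg.inv(T_a) @ T_b))

def make_T(yaw, t):
    """Helper for the example: rotation about z by `yaw`, then translation t."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T

last_keyframe = make_T(0.0, [0.0, 0.0, 0.0])
candidates = [
    make_T(0.0, [0.1, 0.0, 0.0]),  # small translation
    make_T(0.0, [0.3, 0.0, 0.0]),  # larger translation
    make_T(0.5, [0.0, 0.0, 0.0]),  # pure rotation of 0.5 rad
]
threshold = 0.2  # hypothetical value, mixes metres and radians
flags = [bool(pose_distance(last_keyframe, T) > threshold) for T in candidates]
print(flags)
```

Because the norm mixes translation and rotation units, a pure rotation can trigger a keyframe just as a translation can; whether that is desirable depends on the application.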

The figure below illustrates the effect of increasing the threshold on the number of keyframes we obtain:

In [None]:
# initialize our frontend implementation
params = Params('../params/tartanair.yaml')
frontend = Frontend(params)

# The keyframe_selection function accepts a pose and returns a boolean.
# The first frame is always a keyframe, then we check if the motion is
# large enough.
pose = dataset.read_current_odometry()
is_keyframe = frontend.keyframe_selection(pose)

# What if we set the threshold to different values?
traces = []
for threshold in np.arange(0, 1.1, 0.2):
    frontend.params['frontend']['keyframe']['threshold'] = threshold
    keyframe_selections = np.zeros(100)
    for i in range(100):
        dataset.set_curr_index(i)
        pose = dataset.read_current_odometry()
        keyframe_selections[i] = frontend.keyframe_selection(pose)
        if keyframe_selections[i]:
            # only the pose matters for this analysis; color and depth are the images loaded earlier
            frontend.add_keyframe(pose, color, depth)
    
    # Generating x values (frame IDs where keyframes are selected)
    x_vals = np.arange(100)[keyframe_selections == 1]
    y_vals = np.full(x_vals.shape, threshold) # Y value is constant as the threshold

    # Add a trace for each threshold
    traces.append(go.Scatter(x=x_vals, y=y_vals, mode='markers', name=f'threshold={round(threshold, 1)}',
                             hoverinfo='text', text=['Frame ID: %d' % i for i in x_vals]))

# Create the figure with all traces
fig = go.Figure(data=traces)

# Update layout
fig.update_layout(title='Keyframe Selection Threshold Analysis',
                  xaxis_title='Frame ID',
                  yaxis_title='Threshold',
                  height=600, width=800)

# Display the figure
fig.show()

### 3. Extract features and generate matches

To detect and describe features in the image, we use the [Scale-Invariant Feature Transform (SIFT)](https://en.wikipedia.org/wiki/Scale-invariant_feature_transform). Other options such as ORB, FAST, and AKAZE are also available. To track the features, one option is a brute-force matcher, which compares every pair of feature descriptors and keeps the best matches. Another option is [optical flow](https://docs.opencv.org/3.4/d4/dee/tutorial_optical_flow.html), which tracks how image patches (3x3) move across frames. Optical flow provides better tracking results and a more consistent runtime when the motion between frames is small.

Here we are using the brute-force matcher.
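
To show what "compare each pair of feature descriptors" means, here is a minimal NumPy sketch of brute-force matching on synthetic descriptors. The function `brute_force_match` is a hypothetical stand-in, not the `minslam` code, and the mutual nearest-neighbour filter is an assumption added for robustness (it mirrors the idea behind the `crossCheck` option of `cv2.BFMatcher`).

```python
import numpy as np

def brute_force_match(desc_a, desc_b, cross_check=True):
    """Match rows of desc_a to rows of desc_b by smallest L2 distance.

    Returns a list of (index_a, index_b, distance) tuples.
    """
    # pairwise distances, shape (len(desc_a), len(desc_b))
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    best_b_for_a = dist.argmin(axis=1)
    best_a_for_b = dist.argmin(axis=0)
    matches = []
    for i, j in enumerate(best_b_for_a):
        # cross-check: keep the pair only if each is the other's nearest neighbour
        if cross_check and best_a_for_b[j] != i:
            continue
        matches.append((i, int(j), float(dist[i, j])))
    return matches

rng = np.random.default_rng(0)
desc_prev = rng.normal(size=(5, 128))  # stand-in for 128-D SIFT descriptors
perm = np.array([2, 0, 4, 1, 3])       # current feature k corresponds to previous feature perm[k]
desc_curr = desc_prev[perm] + 0.01 * rng.normal(size=(5, 128))

matches = brute_force_match(desc_prev, desc_curr)
for i, j, d in matches:
    print(f'prev {i} -> curr {j} (distance {d:.3f})')
```

The cost grows with the product of the two descriptor counts, which is why brute-force matching becomes expensive for large feature sets.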

In [None]:
# clear previous states
params = Params('../params/tartanair.yaml')
frontend = Frontend(params)

# add a keyframe
dataset.set_curr_index(100)
pose = dataset.read_current_odometry()
color, depth = dataset.read_current_rgbd()
frontend.add_keyframe(pose, color, depth)

# extract features
frontend.extract_features()
frontend.assign_global_id()
fig = frontend.plot_features()
fig.show()

In [None]:
# add another keyframe
dataset.set_curr_index(150)
pose = dataset.read_current_odometry()
color, depth = dataset.read_current_rgbd()
frontend.add_keyframe(pose, color, depth)

# match features
frontend.extract_features()
frontend.match_features('bruteforce')
fig = frontend.plot_matches(plot_id=False)
fig.show()

print('number of matches before outlier rejection:', len(frontend.curr_frame.matches))

### 4. Remove match outliers

To remove incorrect matches, we use [`cv2.findFundamentalMat`](https://docs.opencv.org/3.4/d9/d0c/group__calib3d.html#ga59b0d57f46f8677fb5904294a23d404a). This function estimates the fundamental matrix $F$, which encodes the [epipolar geometry](https://web.stanford.edu/class/cs231a/course_notes/03-epipolar-geometry.pdf) between the two views: a correct match $(x, x')$ in homogeneous pixel coordinates satisfies $x'^\top F x = 0$. Matches that fit this model poorly are flagged as outliers.

![epipolar geometry](assets/epipolar%20geometry.png)
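
The sketch below illustrates how a match can be scored against the epipolar model. It differs from the notebook's pipeline in one important way: `cv2.findFundamentalMat` estimates the fundamental matrix from the matches themselves with a robust method, whereas here the matrix is built from a known synthetic camera motion so that the example stays self-contained. The intrinsics, the motion, the Sampson error metric, and the threshold are all assumed values chosen for illustration.

```python
import numpy as np

def skew(v):
    """3-vector -> 3x3 skew-symmetric matrix."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def sampson_distance(F, pts1, pts2):
    """First-order geometric error of each match w.r.t. x2^T F x1 = 0 (squared pixels)."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    Fx1 = x1 @ F.T   # rows are F @ x1
    Ftx2 = x2 @ F    # rows are F^T @ x2
    num = np.sum(x2 * Fx1, axis=1) ** 2
    den = Fx1[:, 0]**2 + Fx1[:, 1]**2 + Ftx2[:, 0]**2 + Ftx2[:, 1]**2
    return num / den

# hypothetical pinhole intrinsics and relative motion (X2 = R @ X1 + t)
K = np.array([[320.0, 0.0, 320.0], [0.0, 320.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.05), np.sin(0.05)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([-0.5, 0.0, 0.0])
K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv

def project(X):
    """Project 3D points in the camera frame to pixel coordinates."""
    uvw = X @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

rng = np.random.default_rng(1)
X1 = np.column_stack([rng.uniform(-2, 2, 8), rng.uniform(-2, 2, 8), rng.uniform(4, 8, 8)])
X2 = X1 @ R.T + t
pts1, pts2 = project(X1), project(X2)
pts2[3] += [0.0, 25.0]  # corrupt one match to simulate a wrong association

errors = sampson_distance(F, pts1, pts2)
inlier_mask = errors < 1.0  # hypothetical threshold in squared pixels
print(inlier_mask)
```

Only the corrupted match at index 3 is rejected; the other matches satisfy the constraint up to floating-point error.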

In [None]:
# after outlier rejection, the incorrect matches should be gone
frontend.eliminate_outliers()
fig = frontend.plot_matches(plot_id=False)
fig.show()

### 5. Assign global ID

Then, we assign a global ID to each tracked feature. A feature keeps the same global ID in every frame in which it is tracked.
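
One way to picture this bookkeeping is sketched below: a matched feature inherits the ID of its counterpart in the previous keyframe, and every unmatched feature receives a fresh ID. The function `assign_global_ids` and its arguments are hypothetical and only illustrate the idea; `frontend.assign_global_id()` may be implemented differently.

```python
from itertools import count

def assign_global_ids(num_curr, matches, prev_ids, id_counter):
    """Return one global ID per current-frame feature.

    matches: (prev_index, curr_index) pairs that survived outlier rejection.
    prev_ids: global IDs of the previous keyframe's features.
    id_counter: iterator yielding fresh, never-used IDs.
    """
    curr_ids = [None] * num_curr
    # matched features inherit the ID from the previous keyframe
    for prev_idx, curr_idx in matches:
        curr_ids[curr_idx] = prev_ids[prev_idx]
    # unmatched features are new landmarks and get fresh IDs
    for k in range(num_curr):
        if curr_ids[k] is None:
            curr_ids[k] = next(id_counter)
    return curr_ids

id_counter = count()
ids_kf0 = assign_global_ids(3, [], [], id_counter)  # first keyframe: all features are new
ids_kf1 = assign_global_ids(4, [(0, 2), (2, 0)], ids_kf0, id_counter)
print(ids_kf0)  # [0, 1, 2]
print(ids_kf1)  # [2, 3, 0, 4]
```

Since the IDs stay fixed, they can later serve as landmark keys when the same feature is observed from several keyframes.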

In [None]:
# clear previous states
params = Params('../params/tartanair.yaml')
frontend = Frontend(params)

# add a keyframe
dataset.set_curr_index(100)
pose = dataset.read_current_odometry()
color, depth = dataset.read_current_rgbd()
frontend.add_keyframe(pose, color, depth)
frontend.extract_features()
frontend.assign_global_id()

# add another keyframe
dataset.set_curr_index(150)
pose = dataset.read_current_odometry()
color, depth = dataset.read_current_rgbd()
frontend.add_keyframe(pose, color, depth)
frontend.extract_features()
frontend.match_features()
frontend.eliminate_outliers()
frontend.assign_global_id()

fig = frontend.plot_matches(plot_id=True)
fig.show()

### 6. Test the frontend

Finally, we can put all the steps together into a working frontend!

In [None]:
# clear previous states
params = Params('../params/tartanair.yaml')
frontend = Frontend(params)
dataset.set_curr_index(100)


fig, ax = plt.subplots()
im = ax.imshow(np.zeros([480, 1280, 3]))

# run the whole pipeline once; the frame_num argument is required by FuncAnimation but unused here
def run_once(frame_num):
    pose = dataset.read_current_odometry()
    while not frontend.keyframe_selection(pose):
        if not dataset.load_next_frame():
            break
        pose = dataset.read_current_odometry()
    color, depth = dataset.read_current_rgbd()
    frontend.add_keyframe(pose, color, depth)
    print(f'--- Added keyframe {frontend.frame_id} (seq id: {dataset.curr_index}) ---')
    frontend.extract_features()
    if frontend.frame_id > 0:
        frontend.match_features()
        frontend.eliminate_outliers()
    frontend.assign_global_id()
    # do not show the plot
    plt.ioff()

    img = frontend.plot_matches(fig=fig, plot_id=True, matplot=True)
    im.set_data(img)
    return [im]

In [None]:
# generate tracking animation
from matplotlib.animation import FuncAnimation
anim = FuncAnimation(fig, run_once, frames=100)
# saving with writer='ffmpeg' requires ffmpeg to be installed and on the PATH
anim.save('recitation_tracking.mp4', writer='ffmpeg', fps=10)