# 0. Preliminaries

Before starting, here are some prerequisites for this tutorial.

### 💻 Environment requirements
This project was tested with:

- Linux OS (Windows Subsystem for Linux _**might**_ work but we do not officially support it)
- 64G RAM
- NVIDIA GTX 1080 Ti 11G, NVIDIA V100 32G, NVIDIA A40 48G
- CUDA 11.8 and 12.1
- conda 23.3.1

### 🏗  Installation
As indicated in our [README](../README.md), simply run [`install.sh`](install.sh) to install all dependencies in a new conda environment 
named `spt`. 

```bash
# Creates a conda env named 'spt' and installs dependencies
./install.sh
```

### 👩‍💻 Coding experience
Being familiar with the following is _**mandatory**_:
- [Python](https://www.python.org/)
- [Jupyter](https://jupyter.org/)
- [PyTorch](https://pytorch.org/docs/stable/index.html/)
- [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/)

Knowledge of the following would also be _**nice to have**_:
- [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/)
- [Hydra](https://hydra.cc/docs/intro/)
- [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template)

Finally, having a look at our [README](../README.md) would help you better _**navigate our code structure**_.

### 🧑‍🎓 Machine learning experience
Whether you intend to **simply understand, make use of, or extend** our method, we **strongly encourage you to read (and cite) our paper [_Efficient 3D Semantic Segmentation with Superpoint Transformer_](https://arxiv.org/abs/2306.08045)** (ICCV 2023).

Besides, if you are not very familiar with 3D deep learning and self-attention, some important papers might provide a bit more context for this work:
- [Transformer](https://arxiv.org/abs/1706.03762) (NeurIPS 2017)
- [PointNet](https://arxiv.org/abs/1612.00593) (CVPR 2017)
- [Superpoint Graph](https://arxiv.org/abs/1711.09869) (CVPR 2018)

# 1. Introduction

### 👉 [Introductory slides](../media/superpoint_transformer_tutorial.pdf)

This tutorial will demonstrate how to use Superpoint Transformer (SPT) on your own point cloud data. 

In our running example, we will use a large point cloud from the [Vancouver LiDAR 2022](https://opendata.vancouver.ca/explore/dataset/lidar-2022/map/?location=12,49.25683,-123.14421) dataset and run inference on it with SPT pretrained on [DALES](https://udayton.edu/engineering/research/centers/vision_lab/research/was_data_analysis_and_processing/dale.php), a similar dataset for which we officially provide pretrained weights.

<p align="center">
    <img width="33%" src="../media/dales/sem_gt_demo.png">
</p>

Although they both cover similar urban areas, the DALES and Vancouver datasets are far from identical: their semantic segmentation classes differ, their sensor noise and resolution may differ, and Vancouver provides pointwise RGB colors and LiDAR intensity, while DALES only has LiDAR intensity. For these reasons, we will likely want to parametrize and train SPT on Vancouver data rather than just using a DALES-pretrained model. This tutorial will give you guidelines on how to proceed.

In particular, we will cover the following:

&emsp;✅ reading and visualizing raw point clouds using `Data` objects <br/>
&emsp;✅ running inference on custom data using a pretrained SPT <br/>
&emsp;✅ parametrizing the preprocessing of the hierarchical superpoint partition on custom data <br/>
&emsp;✅ training SPT on a custom dataset <br/>

# 2. Reading and visualizing raw point clouds using `Data` objects

### 2.1. Preparing a `Data` reader

Before anything, you will need to define a reader function that parses your raw point cloud files (eg LAS, PLY, txt, ...) and returns a `Data` object holding your points and associated attributes.

Our `Data` object is a simple class based on [PyG's `Data` object](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html#torch_geometric.data.Data) for holding point clouds (or more generally graphs) with convenient utilities.

Below we provide a ready-to-use example of such a reader for parsing LAS files from the [Vancouver LiDAR 2022](https://opendata.vancouver.ca/explore/dataset/lidar-2022/map/?location=12,49.25683,-123.14421) dataset.

> **Tip 💡**: You can find inspiration from other point cloud readers implemented for our supported datasets in `src.datasets`. In particular, for PLY format, you may want to have a look at the source code for DALES and KITTI-360.

In [None]:
import os
import sys

# Add the project's files to the python path
# file_path = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))  # for .py script
file_path = os.path.dirname(os.path.abspath(''))  # for .ipynb notebook
sys.path.append(file_path)

import laspy
import torch
from src.data import Data
from src.utils.color import to_float_rgb


def read_vancouver_tile(
        filepath, 
        xyz=True, 
        rgb=True, 
        intensity=True, 
        semantic=True, 
        instance=False,
        remap=True, 
        max_intensity=600):
    """Read a Vancouver tile saved as LAS.

    :param filepath: str
        Absolute path to the LAS file
    :param xyz: bool
        Whether XYZ coordinates should be saved in the output Data.pos
    :param rgb: bool
        Whether RGB colors should be saved in the output Data.rgb
    :param intensity: bool
        Whether intensity should be saved in the output Data.intensity
    :param semantic: bool
        Whether semantic labels should be saved in the output Data.y
    :param instance: bool
        Whether instance labels should be saved in the output Data.obj
    :param remap: bool
        Whether semantic labels should be mapped from their Vancouver ID
        to their train ID
    :param max_intensity: float
        Maximum value used to clip intensity signal before normalizing 
        to [0, 1]
    """
    # Create an empty Data object
    data = Data()
    
    las = laspy.read(filepath)

    # Populate data with point coordinates 
    if xyz:
        # Apply the scale provided by the LAS header
        pos = torch.stack([
            torch.tensor(las[axis])
            for axis in ["X", "Y", "Z"]], dim=-1)
        pos *= las.header.scale
        pos_offset = pos[0]
        data.pos = (pos - pos_offset).float()
        data.pos_offset = pos_offset

    # Populate data with point RGB colors
    if rgb:
        # RGB stored in uint16 lives in [0, 65535]
        data.rgb = to_float_rgb(torch.stack([
            torch.FloatTensor(las[axis].astype('float32') / 65535)
            for axis in ["red", "green", "blue"]], dim=-1))

    # Populate data with point LiDAR intensity
    if intensity:
        # Heuristic to bring the intensity distribution in [0, 1]
        data.intensity = torch.FloatTensor(
            las['intensity'].astype('float32')
        ).clip(min=0, max=max_intensity) / max_intensity

    # Populate data with point semantic segmentation labels
    if semantic:
        y = torch.LongTensor(las['classification'])
        # ID2TRAINID is defined in the next cell, which must be run before
        # this function is called
        data.y = torch.from_numpy(ID2TRAINID)[y] if remap else y

    # Populate data with point panoptic segmentation labels
    if instance:
        raise NotImplementedError("The dataset does not contain instance labels.")

    return data
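
Why does the reader subtract `pos[0]` before calling `.float()`? Projected coordinates can be in the millions of meters (eg UTM northings), a range where float32 can no longer represent centimeters. The standard-library sketch below illustrates this, along with the intensity clipping used above. It emulates float32 rounding with `struct` instead of PyTorch, and the coordinates are made-up values, not taken from an actual Vancouver tile.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', value))[0]


def center_coordinates(points):
    """Subtract the first point from every point (like `pos - pos[0]`),
    then round the result to float32 (like `.float()`)."""
    offset = points[0]
    centered = [
        tuple(to_float32(c - o) for c, o in zip(p, offset))
        for p in points]
    return centered, offset


def normalize_intensity(values, max_intensity=600):
    """Clip raw intensities to [0, max_intensity], then rescale to [0, 1]."""
    return [min(max(v, 0), max_intensity) / max_intensity for v in values]


# Two made-up points 12.3 cm apart, in large projected coordinates (meters)
points = [(490000.0, 5458000.0, 70.0), (490000.0, 5458000.123, 70.0)]

# Rounding the raw Y coordinate to float32 loses the 12.3 cm offset...
raw_y = to_float32(points[1][1])
# ...whereas centering in float64 first preserves it
centered, offset = center_coordinates(points)

print(raw_y)            # 5458000.0
print(centered[1][1])   # close to 0.123
print(normalize_intensity([-5, 300, 1200]))  # [0.0, 0.5, 1.0]
```

This is also why `pos_offset` is kept in the `Data` object: it allows mapping centered coordinates back to their original frame.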

Often, we need to remap the raw labels provided in a dataset to another set of labels to be used for training. 
In the next cell, we define some global variables for remapping Vancouver class indices, along with the corresponding customized class names and colors for downstream visualization.

> **Tip 💡**: As described in our [datasets documentation](../docs/datasets.md/#semantic-label-format), we consider labels in `[0, num_classes - 1]` to be valid classes and use the `num_classes` label for void/ignored/unlabeled points (whatever you call it). Check out the [documentation](../docs/datasets.md/#semantic-label-format) for more details.

In [None]:
import numpy as np

# Number of classes in the dataset (excluding void/unlabeled/ignored)
VANCOUVER_NUM_CLASSES = 6

# Mapping from original Vancouver class IDs to train IDs
ID2TRAINID = np.asarray([
    VANCOUVER_NUM_CLASSES,  # 0 Not used         ->  6 Ignored
    5,                      # 1 Other            ->  5 Other
    0,                      # 2 Ground           ->  0 Ground
    3,                      # 3 Low vegetation   ->  3 Low vegetation
    VANCOUVER_NUM_CLASSES,  # 4 Unknown / Noise  ->  6 Ignored
    2,                      # 5 High vegetation  ->  2 High vegetation
    4,                      # 6 Building         ->  4 Buildings
    VANCOUVER_NUM_CLASSES,  # 7 Unknown / Noise  ->  6 Ignored
    VANCOUVER_NUM_CLASSES,  # 8 Unknown / Noise  ->  6 Ignored
    1])                     # 9 Water            ->  1 Water

# Class names (including void/unlabeled/ignored last)
VANCOUVER_CLASS_NAMES = [
    'Ground',
    'Water',
    'High vegetation',
    'Low vegetation',
    'Buildings',
    'Other',
    'Ignored']

# Class color palette (including void/unlabeled/ignored last)
VANCOUVER_CLASS_COLORS = np.asarray([
    [243, 214, 171],
    [169, 222, 249],
    [ 70, 115,  66],
    [204, 213, 174],
    [214,  66,  54],
    [186, 160, 164],
    [  0,   0,   0]])
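
The remapping in `read_vancouver_tile()` is a table lookup: `torch.from_numpy(ID2TRAINID)[y]` replaces each raw code `c` with `ID2TRAINID[c]`. A plain-Python equivalent, using the same table and a few made-up raw codes, looks like this:

```python
# Number of valid classes; label 6 is reserved for void/ignored points
NUM_CLASSES = 6

# Same lookup table as ID2TRAINID, as a plain list indexed by raw LAS code
id2trainid = [NUM_CLASSES, 5, 0, 3, NUM_CLASSES, 2, 4, NUM_CLASSES, NUM_CLASSES, 1]


def remap_labels(raw_labels, lut):
    """Replace each raw classification code c with lut[c]."""
    return [lut[c] for c in raw_labels]


# Made-up raw codes: ground, ground, building, high veg., water, noise, other, unused
raw = [2, 2, 6, 5, 9, 7, 1, 0]
train = remap_labels(raw, id2trainid)
print(train)  # [0, 0, 4, 2, 1, 6, 5, 6]

# Points mapped to NUM_CLASSES are treated as void/ignored
num_valid = sum(label < NUM_CLASSES for label in train)
print(num_valid)  # 6
```

Note that a raw code of 10 or more falls outside the table and raises an `IndexError` here; out-of-bounds tensor indexing fails similarly, so tiles containing extra classification codes would need a longer table.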

### 2.2. `Data` visualization

We can now download tiles from [Vancouver LiDAR 2022](https://opendata.vancouver.ca/explore/dataset/lidar-2022/map/?location=12,49.25672,-123.14434) and read their content into a `Data` object.

In [None]:
filepath = '/path/to/your/vancouver.las'
data = read_vancouver_tile(filepath)

We have created a `Data` object containing our point cloud and associated attributes. 
Let's have a closer look at it !

The basic `Data.__repr__()` will show the attributes (ie keys) in Data and their respective shapes.

In [None]:
data

You can check the number of points (ie nodes) in a `Data` object with `data.num_points` (or `data.num_nodes`).

In [None]:
data.num_points

You can check the list of attributes stored in a `Data` object with `data.keys`.

In [None]:
data.keys

We provide a [Plotly](https://plotly.com/python)-based tool for visualizing `Data` objects. To use it, simply call `data.show()`. This function offers many options for customizing your plot. We will see later on that it can also be used for visualizing hierarchical superpoint partitions held in `NAG` objects.

First, let's visualize the whole point cloud contained in `Data` (this may take a couple of seconds if your cloud has $\sim10^5$ points or more).
We can pass our `class_names` and `class_colors` to `show()` to customize how semantic segmentation labels are displayed.

In [None]:
data.show(class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS)

By default, the point cloud is subsampled to `max_points=50000` points to reduce the visualization computation time.
To get a clearer, high-resolution view, you can increase `max_points` or visualize smaller scenes.
For instance, you can display only a spherical crop of the point cloud by specifying a `center` and a `radius`.

In [None]:
data.show(center=[425, 282, 15], radius=30, keys=['intensity'], class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS)

> **Tips 💡**
> - More info on our `Data` structure ? 👉 see [`docs/data_structures.md`](../docs/data_structures.md), our source code in `src.data.data`, and the [PyG Data documentation](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.data.Data.html#torch_geometric.data.Data) it builds upon
> - More info on our `show()` visualization tool ? 👉 see [`docs/visualization.md`](../docs/visualization.md) and our source code in `src.visualization`

# 3. Tiling very large point clouds

Sometimes, aerial or terrestrial LiDAR acquisition campaigns produce point cloud files covering extremely large areas 🐘. 
For instance, DALES and Vancouver datasets provide tiles at 50 pts/m² resolution spanning 0.25 km² and 1 km², respectively.

While Superpoint Transformer is quite scalable, the CPU and GPU memories 💾 still put a limit on how big of a scene can be processed at once.
Since we usually do not _need_ to jointly process all the points in a 500m radius for understanding the scene semantics, it is safe to **tile these into smaller chunks of manageable size**.

We propose two tiling strategies in this project: 
- tiling along the XY coordinate system axes with `SampleXYTiling` 👉 when your clouds already have simple, convex, axis-aligned horizontal layouts like DALES or Vancouver
- recursively tiling along the principal XY components with `SampleRecursiveMainXYAxisTiling` 👉 when your clouds have complex horizontal layouts like KITTI-360

Let's visualize the impact of the tilings on our current `Data` object (we will run the below example on subsampled data for the sake of faster visualization).

> **Tip 💡**: You can skip this section if your `Data` is not that large (eg $\sim 10^6$ points or fewer with a 24G-32G GPU 🦋). You can still adjust the tiling later on to suit your point cloud size and hardware capabilities if you run into memory issues.

In [None]:
from src.transforms import SampleXYTiling, GridSampling3D
from src.data import Batch

# Tile the cloud into `xy_tiling` XY-oriented chunks of equal horizontal 
# span
xy_tiling = (4, 2)

# Voxelize the point cloud with 10 m voxels, only for the sake of faster
# computation and visualization here
data_10m = GridSampling3D(10)(data)

# Compute each chunk
chunks = []
for x in range(xy_tiling[0]):
    for y in range(xy_tiling[1]):
        # Extract the chunk at (x, y) in the tiling grid
        chunk = SampleXYTiling(x=x, y=y, tiling=xy_tiling)(data_10m)

        # Add a 'tile' attribute to the points for visualization
        chunk.tile = torch.full((chunk.num_points,), x * xy_tiling[1] + y)

        # Store the chunk for later aggregation
        chunks.append(chunk)

# Aggregate all chunk `Data` objects into one big `Data` object
data_tiled = Batch.from_data_list(chunks)

# Show the resulting `Data` with the 'tile' attribute
data_tiled.show(keys='tile')
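
To make the tile numbering above concrete, here is a plain-Python sketch of axis-aligned XY tiling. It assumes that tiles split the XY bounding box of the cloud into cells of equal span (as the comment in the cell above suggests) and numbers them `x * tiling[1] + y`, like the `tile` attribute. `xy_tile_ids` is a hypothetical helper, not the implementation of `SampleXYTiling`.

```python
def xy_tile_ids(points, tiling):
    """Assign each (x, y, ...) point to a cell of a regular XY grid spanning
    the bounding box, numbered x_idx * tiling[1] + y_idx."""
    nx, ny = tiling
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Cell spans; fall back to 1.0 for a degenerate (zero-width) extent
    dx = (max(xs) - x_min) / nx or 1.0
    dy = (max(ys) - y_min) / ny or 1.0
    ids = []
    for x, y in zip(xs, ys):
        # Clamp so that points on the max border land in the last cell
        i = min(int((x - x_min) / dx), nx - 1)
        j = min(int((y - y_min) / dy), ny - 1)
        ids.append(i * ny + j)
    return ids


# Made-up 100 m x 50 m cloud, tiled like `xy_tiling = (4, 2)` above
points = [(0, 0, 0), (99, 0, 0), (0, 49, 0), (100, 50, 0), (30, 30, 0)]
print(xy_tile_ids(points, (4, 2)))  # [0, 6, 1, 7, 3]
```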

In [None]:
from src.transforms import SampleRecursiveMainXYAxisTiling, GridSampling3D
from src.data import Batch

# Recursively tile the cloud into `2**pc_tiling` chunks with respect to
# principal components of the XY coordinates
pc_tiling = 3

# Voxelize the point cloud only for the sake of faster computation and 
# visualization here
data_5m = GridSampling3D(5)(data)

# Compute each chunk 
chunks = []
for x in range(2**pc_tiling):
    # Extract the chunk at x in the recursive tiling
    chunk = SampleRecursiveMainXYAxisTiling(x=x, steps=pc_tiling)(data_5m)

    # Add a 'tile' attribute to the points for visualization
    chunk.tile = torch.full((chunk.num_points,), x)
    
    # Store the chunk for later aggregation
    chunks.append(chunk)

# Aggregate all chunk `Data` objects into one big `Data` object
data_tiled = Batch.from_data_list(chunks)

# Show the resulting `Data` with the 'tile' attribute
data_tiled.show(keys='tile')
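
Similarly, the recursive principal-axis tiling can be sketched as: compute the main direction of the XY coordinates with a closed-form 2D PCA, cut the cloud in two along that direction, and recurse `steps` times to obtain `2**steps` chunks. Cutting at the median projection is our assumption (it yields chunks with equal point counts); `SampleRecursiveMainXYAxisTiling` may place the cut differently, for instance at the middle of the extent.

```python
import math


def principal_axis(points):
    """Unit direction of largest variance of 2D points (closed-form 2x2 PCA)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the major axis of the 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)


def recursive_xy_tiling(points, steps):
    """Recursively halve `points` along their main XY axis, returning
    2**steps chunks (lists of points)."""
    if steps == 0:
        return [points]
    if len(points) < 2:
        # Too few points to split: pad with empty chunks
        return [points] + [[] for _ in range(2 ** steps - 1)]
    ux, uy = principal_axis(points)
    # Sort by projection onto the main axis and cut at the median
    ordered = sorted(points, key=lambda p: p[0] * ux + p[1] * uy)
    half = len(ordered) // 2
    return (recursive_xy_tiling(ordered[:half], steps - 1)
            + recursive_xy_tiling(ordered[half:], steps - 1))


# Made-up elongated diagonal strip of 64 points
points = [(i, i + 0.1 * (i % 3)) for i in range(64)]
chunks = recursive_xy_tiling(points, steps=3)
print([len(c) for c in chunks])  # [8, 8, 8, 8, 8, 8, 8, 8]
```

Because the cut direction follows the data rather than the X and Y axes, elongated or rotated layouts (like a street-following acquisition) are split across their length instead of into mostly empty axis-aligned cells.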

Since the Vancouver point cloud is XY-axis aligned and has a simple square XY layout, we choose to use `SampleXYTiling` here.
**For the rest of this tutorial, we will work on one of the chunks of the original point cloud.**
Feel free to adjust the tiling method and the chosen tile to your dataset.

In [None]:
from src.transforms import SampleXYTiling

# Extract the chunk at (x, y) in the tiling grid
data = SampleXYTiling(x=1, y=1, tiling=3)(data)

# 4. Using a pretrained model for inference

We provide pretrained weights and preprocessing parametrization for several datasets (see [README](../README.md) and [datasets documentation](../docs/datasets.md)). Since the Vancouver dataset is fairly similar to DALES, we would like to check how a DALES-pretrained SPT would fare on our present `Data` object.

As mentioned in the [introductory slides](../media/superpoint_transformer_tutorial.pdf), running inference with a pretrained SPT requires more than just the model weights. Indeed, we also need to apply to our `Data` the same `pre_transform` and `on_device_transform` as the ones used for training the model.

### 4.1. Instantiating transforms from `configs/`

We will first need to recover the transforms used in the DALES experiments, as provided in `configs/experiment`, using [Hydra](https://hydra.cc/docs/intro/). 
In the next cell, we show how to use the `init_config()` utility to get the **exact configuration used for training the released DALES model**.

> **Tips 💡**
> - More info on how `configs/` & [Hydra](https://hydra.cc/docs/intro/) work ? 👉 see the [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template) repository
> - More info on a specific experiment's settings ? 👉 explore our configuration files in `configs/`, these are fairly commented 😉

In [None]:
from src.utils import init_config

cfg = init_config(overrides=[f"experiment=semantic/dales"])

This `cfg` is an [omegaconf](https://omegaconf.readthedocs.io) `DictConfig` object. It contains all the necessary hyperparameters for reproducing the pretraining experiment: dataset, model structure, training recipe, etc. We can explore its content just like a basic dictionary, or a simple object.

In [None]:
cfg.keys()

The parametrization of the transforms is specified in the datamodule config in `cfg.datamodule`.
We can instantiate the transforms from an [omegaconf](https://omegaconf.readthedocs.io) `DictConfig` object without instantiating the whole dataset by using the `instantiate_datamodule_transforms()` utility.

In [None]:
from src.transforms import instantiate_datamodule_transforms

transforms_dict = instantiate_datamodule_transforms(cfg.datamodule)
transforms_dict

The transforms are chained operations applied to a `Data` or a `NAG` object. Their order and parametrization play a significant role, and modifying them may have non-negligible downstream effects. **These must be thought of as part of the model itself**.

### 4.2. Applying transforms

As explained in the [introductory slides](../media/superpoint_transformer_tutorial.pdf), we will be using `pre_transform` and `on_device_test_transform` to reproduce the behavior of the pretrained model at inference time.

> **Note 🤓**: In the next cell, we manually apply `NAGRemoveKeys()` transforms after the `pre_transform`. This is because we need to mimic the full behavior of the pretraining `Dataset`: after the `pre_transform` is executed, the preprocessed `NAG` is saved to disk. When later read from disk by the `Dataset`, only the `point_load_keys` attributes of `NAG[0]` and the `segment_load_keys` attributes of `NAG[i], i>0` are loaded. This mechanism ensures we only load what is strictly necessary during training, hence saving I/O time. Since we are running the `pre_transform` manually here, we need to account for this mechanism and discard the preprocessed attributes that the DALES dataset did not read from disk. These can be found in `cfg.datamodule.point_load_keys` and `cfg.datamodule.segment_load_keys`.

In [None]:
# Apply pre-transforms
nag = transforms_dict['pre_transform'](data)

# Simulate the dataset's I/O behavior, where only the `point_load_keys`
# and `segment_load_keys` attributes are loaded from disk
from src.transforms import NAGRemoveKeys
nag = NAGRemoveKeys(level=0, keys=[k for k in nag[0].keys if k not in cfg.datamodule.point_load_keys])(nag)
nag = NAGRemoveKeys(level='1+', keys=[k for k in nag[1].keys if k not in cfg.datamodule.segment_load_keys])(nag)

# Move to device
nag = nag.cuda()

# Apply on-device transforms
nag = transforms_dict['on_device_test_transform'](nag)

The output of the transforms is no longer a `Data` object, but a `NAG`. This is the data structure we use to carry around **point clouds** and **hierarchical superpoint partitions**. 

Essentially, it is a list of `Data` objects, each representing a partition level:
- `nag[0]` is $P_0$, the (voxelized) points
- `nag[i]` is $P_i$, the $\text{i}^\text{th}$ superpoint partition level 

At each level $i>0$, the `edge_index` and `edge_attr` attributes carry the **superpoint adjacency graph** and corresponding **adjacency features**.

> **Tip 💡** More info on our `NAG` structure ? 👉 see [`docs/data_structures.md`](../docs/data_structures.md) and source code in `src.data.nag`

In [None]:
nag

Let's visualize the impact of the transforms on the data on a small area for high-resolution display. Note we can display the nodes and edges of the superpoint graphs by passing `show(centroids=True, h_edge=True)`.

In [None]:
nag.show(class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS, center=[485, 505, 0], radius=20, keys=nag[0].keys, centroids=True, h_edge=True)

Now that we have preprocessed our data, we can run inference with the pretrained model.

> **Tip 💡**: If you want to store your progress to disk, both `Data` and `NAG` have `.save()` and `.load()` methods specially designed with fast I/O and disk usage in mind 😉.

### 4.3. Instantiating a pretrained model from `configs/` and a `*.ckpt`

Similar to the transforms, we will use the DALES experiment configuration files to instantiate the **pretrained model**. 
This time, the part of the [omegaconf](https://omegaconf.readthedocs.io) `DictConfig` object we are interested in is stored under `cfg.model`.

As stated in the [README](../README.md), the pretrained weights for our models can be recovered from [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8042712.svg)](https://doi.org/10.5281/zenodo.8042712).

In [None]:
import hydra 
from src.utils import init_config

# Path to the checkpoint file downloaded from https://zenodo.org/records/8042712
ckpt_path = "/path/to/your/superpoint_transformer.ckpt"

cfg = init_config(overrides=[f"experiment=semantic/dales"])

# Instantiate the model and load pretrained weights
model = hydra.utils.instantiate(cfg.model)
model = model._load_from_checkpoint(ckpt_path)

### 4.4. Applying SPT

Now everything is ready for running our inference ! 

In [None]:
# Set the model in inference mode on the same device as the input
model = model.eval().to(nag.device)

# Inference, returns a task-specific output object carrying predictions
with torch.no_grad():
    output = model(nag)

The output of the model is a `SemanticSegmentationOutput` object. It is a simple class dedicated to holding onto predictions in `output.semantic_pred` and facilitating certain basic post-processing operations such as metrics computation. 

In [None]:
output.semantic_pred.shape, nag.num_points

As stated in the [introductory slides](../media/superpoint_transformer_tutorial.pdf), it is important to remember that, by default, **SPT outputs predictions on the $P_1$ level** (ie `nag[1]`). Since the superpoints $P_1$ are assumed to be semantically pure, simply classifying those is equivalent to classifying each point in the scene. In doing so, we save a lot of computation and memory during training.

Yet, at inference time, we often want the predictions at the voxel level $P_0$ (ie `nag[0]`) or even at the full-resolution of the raw input cloud. 
To this end, we simply need to distribute the $P_1$ predictions to the lower partition levels.
The `SemanticSegmentationOutput.voxel_semantic_pred()` and `SemanticSegmentationOutput.full_res_semantic_pred()` methods were designed just for that ! 

In the next cell, we will convert $P_1$ predictions into $P_0$ predictions.

> **Tip 💡**: For **full-resolution predictions**, see our [`demo.ipynb` notebook](../notebooks/demo.ipynb), and have a look at [`src.utils.output_semantic.py`](../src/utils/output_semantic.py#L140). Remember that if you have applied a tiling to your data, your full-resolution predictions will be given for the tile at hand and not the original point cloud.

> **Note 🤓**: Although SPT does make predictions as $P_1$ node classifications, all losses and metrics are properly computed so as to take into account the true labels assigned to full-resolution points. To make these efficient, our pipeline always tracks the **histogram of ground truth labels** for each voxel in $P_0$ and superpoint in $P_i, i>0$.

In [None]:
# Compute the level-0 (voxel-wise) semantic segmentation predictions 
# based on the predictions on level-1 superpoints and save those for 
# visualization in the level-0 Data under the 'semantic_pred' attribute
nag[0].semantic_pred = output.voxel_semantic_pred(super_index=nag[0].super_index)
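
Distributing predictions down the hierarchy boils down to indexing: if each element of a level stores the index of its parent (as `nag[0].super_index` does for $P_0$), the parent predictions can be gathered with that index. The toy sketch below uses plain lists and a hypothetical `point_voxel_index` for the full-resolution step; it illustrates the idea only and is not the code of `voxel_semantic_pred()` or `full_res_semantic_pred()`.

```python
def distribute_predictions(parent_pred, parent_index):
    """Give each child element the prediction of its parent.

    parent_index[k] is the parent of child k, so this is the plain-list
    equivalent of the tensor indexing `parent_pred[parent_index]`.
    """
    return [parent_pred[s] for s in parent_index]


# Toy hierarchy: 3 superpoints in P_1, 6 voxels in P_0, 9 raw points
p1_pred = [2, 0, 4]                         # predicted class per superpoint
voxel_super_index = [0, 0, 1, 2, 2, 1]      # parent superpoint of each voxel
point_voxel_index = [0, 0, 1, 2, 3, 3, 4, 5, 5]  # hypothetical voxel of each raw point

p0_pred = distribute_predictions(p1_pred, voxel_super_index)
full_res_pred = distribute_predictions(p0_pred, point_voxel_index)
print(p0_pred)        # [2, 2, 0, 4, 4, 0]
print(full_res_pred)  # [2, 2, 2, 0, 4, 4, 4, 0, 0]
```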

Let's visualize the resulting predictions on a small area for high-resolution display.

Note that since the model was trained on DALES classes, the predicted labels do not align with those of our Vancouver dataset. 
For better visualization, we will use the DALES `CLASS_NAMES` and `CLASS_COLORS`.

In [None]:
from src.datasets.dales import CLASS_NAMES as DALES_CLASS_NAMES
from src.datasets.dales import CLASS_COLORS as DALES_CLASS_COLORS

nag.show(class_names=DALES_CLASS_NAMES, class_colors=DALES_CLASS_COLORS, center=[485, 505, 0], radius=20)

We can see that the DALES-pretrained model is actually doing a pretty good job on the Vancouver dataset !

Still, the DALES classes and Vancouver classes are not the same. If we are particularly interested in identifying Vancouver classes such as _low vegetation_, or _water_, we will need to train a dedicated model on the Vancouver data. Besides, we may also want to adjust the preprocessing steps in `pre_transform`: different parameters may produce partitions that better respect the semantic boundaries of Vancouver classes.

# 5. Parametrizing the partition

### 5.1. Assessing partition quality

There are many ways to parametrize your preprocessing `pre_transform`, and finding a good setting for a new dataset is usually a matter of 'just trying'. 
Still, there are some **guidelines for what a _good_ partition might be**:
- **⚡ efficiency** -  it must simplify the scene by having as few superpoints as possible 👉 measured with `NAG.level_ratios`
> `NAG.level_ratios` computes the ratio of the number of elements between successive partition levels. 
- **🎯 accuracy** - it must respect the semantic boundaries of objects 👉 measured with `Data.semantic_segmentation_oracle()`
> `Data.semantic_segmentation_oracle()` computes the semantic segmentation metrics of a hypothetical _oracle_ model capable of predicting the majority label for each superpoint. To compute this, we use the fact that labels in `nag[i].y` are stored as histograms, which allows for computing _exact_ full-resolution metrics (even accounting for the voxelization of $P_0$).
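
To see why label histograms allow for exact oracle metrics, consider the sketch below. Each segment predicts its majority valid class, every point in the segment inherits it, and the confusion matrix is filled with the histogram counts. The handling of void points (last bin dropped), ties (first class wins) and classes absent from both ground truth and predictions (skipped) are assumptions of this sketch, not necessarily the conventions of `semantic_segmentation_oracle()`.

```python
def oracle_miou(histograms, num_classes):
    """mIoU of an oracle predicting each segment's majority valid class.

    histograms[s][c] counts the points of class c in segment s; the last
    bin (index num_classes) counts void points, which are ignored.
    """
    # confusion[t][p]: number of points of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for hist in histograms:
        valid = hist[:num_classes]
        if sum(valid) == 0:
            continue  # segment made only of void points
        pred = max(range(num_classes), key=lambda c: valid[c])
        for true_class, count in enumerate(valid):
            confusion[true_class][pred] += count

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp
        if tp + fn + fp > 0:  # skip classes absent from labels and predictions
            ious.append(tp / (tp + fn + fp))
    return sum(ious) / len(ious) if ious else float('nan')


# 3 made-up segments, 2 valid classes + 1 void bin
hists = [[9, 1, 0], [0, 10, 5], [2, 8, 0]]
print(oracle_miou(hists, num_classes=2))
```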

#### 🔮 **Rules of thumb** - _Don't take these for granted for any dataset but aiming for those can get you started._

We usually aim for:
- $\frac{|P_0|}{|P_1|} \in [30, 50]$
- $\frac{|P_i|}{|P_{i+1}|} \in [3, 10],\quad i > 0$
- $\text{oracle mIoU on } P_1 > 0.95$

Beyond these quantified measurements, it is also important that you **visualize your partitions** and, given your own domain expertise, check whether they make sense for the task you are interested in.
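
The level ratios and rule-of-thumb targets above amount to simple arithmetic. The sketch below reproduces it on hypothetical level sizes and a hypothetical oracle mIoU, without calling `NAG.level_ratios`.

```python
def level_ratios(level_sizes):
    """Ratios |P_i| / |P_{i+1}| between successive partition levels."""
    return [a / b for a, b in zip(level_sizes[:-1], level_sizes[1:])]


def check_rules_of_thumb(level_sizes, oracle_miou_p1):
    """Compare a partition against the rough targets listed above."""
    ratios = level_ratios(level_sizes)
    return {
        'P0/P1 in [30, 50]': 30 <= ratios[0] <= 50,
        'Pi/Pi+1 in [3, 10] for i > 0': all(3 <= r <= 10 for r in ratios[1:]),
        'oracle mIoU on P1 > 0.95': oracle_miou_p1 > 0.95,
    }


# Hypothetical sizes: 1.2M voxels, then 30k, 6k and 1k superpoints
sizes = [1_200_000, 30_000, 6_000, 1_000]
print(level_ratios(sizes))  # [40.0, 5.0, 6.0]
print(check_rules_of_thumb(sizes, oracle_miou_p1=0.93))
```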

Let's check the efficiency and accuracy of the current partition on the `NAG` at hand.

> **Tip 💡**: In practice you would want to compute and accumulate these values on your entire dataset, or at least on several representative tiles. Here we only compute these on a single tile for simplicity. Scaling the present single-tile study to multiple tiles will be up to you, but we would recommend you implement your own `Dataset` for that (see next section) 😉

In [None]:
# Ratio of sizes of successive partition levels
nag.level_ratios

In [None]:
# Oracle semantic segmentation metrics on P_1
nag[1].semantic_segmentation_oracle(VANCOUVER_NUM_CLASSES)

In [None]:
# Oracle semantic segmentation metrics on P_0
nag[0].semantic_segmentation_oracle(VANCOUVER_NUM_CLASSES)

As we can see, the partition is not so bad, but we may want to improve the $P_1$ oracle mIoU a little.

### 5.2. Adjusting the partition parameters

As mentioned in the [introductory slides](../media/superpoint_transformer_tutorial.pdf), the `pre_transform` typically includes the following steps:

| Operation | `Transform` |
| :------ | :------ |
| Voxelization | [`GridSampling3D`](../src/transforms/sampling.py#L59C7-L59C21) |
| Neighbor search | [`KNN`](../src/transforms/neighbors.py#L9)  |
| Elevation estimation | [`GroundElevation`](../src/transforms/point.py#L223) |
| Pointwise local geometric features | [`PointFeatures`](../src/transforms/point.py#L17) |
| Adjacency graph | [`AdjacencyGraph`](../src/transforms/graph.py#L24) |
| Hierarchical partition | [`CutPursuitPartition`](../src/transforms/partition.py#L23) |
| Superpoint-wise handcrafted features | [`SegmentFeatures`](../src/transforms/graph.py#L75) |
| Superpoint adjacency graph and features | [`RadiusHorizontalGraph`](../src/transforms/graph.py#L548) |

In [None]:
transforms_dict['pre_transform']

Let's have a quick look at how some of these operations affect the `Data` object. To this end, we will re-read the raw data to start from scratch.

In [None]:
from src.transforms import *

data = read_vancouver_tile(filepath)
data = SampleXYTiling(x=1, y=1, tiling=3)(data)

#### 5.2.1. Voxelization

`GridSampling3D(size=...)` voxelizes a point cloud with voxels of the given `size`. This first step is not specific to Superpoint Transformer: it is shared by most point cloud preprocessing pipelines, even those that are not explicitly voxel-based. It **mitigates sampling density disparities** and reduces the size of the point cloud, hence **reducing downstream compute and memory costs**.

> **Tip 💡**: Keep the [Nyquist-Shannon theorem](https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem) in mind when deciding on a voxel size. You typically want your voxel size to be **at most half the size of the smallest structure you want to characterize**. This puts an upper bound on the voxel sizes you should consider. Still, using smaller voxels (ie higher point resolution) usually comes with higher model performance at the expense of compute and memory efficiency.

> **Note 🤓**: `GridSampling3D` offers advanced mechanisms for aggregating point attributes inside each voxel, based on their nature:
> - mean aggregation (eg for float values like the position or colors)
> - last encountered value (eg for identical values like the batch index)
> - histogram (eg for semantic segmentation labels)
> - voting for dominant value (eg for semantic segmentation labels, superpoint indices)
> - merging into a `Cluster` object (eg for full-resolution point indices)
> - unit-normalized vector combination (eg for normals)
> 
> See the [source code](../src/transforms/sampling.py#L59C7-L59C21) for more details.
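
For intuition, here is a minimal plain-Python sketch of voxel grid sampling covering two of the aggregation modes listed above: mean positions and label histograms (with `num_bins` playing the role of `hist_size`). It is not the `GridSampling3D` implementation, which handles many more attributes and runs on tensors.

```python
import math
from collections import defaultdict


def grid_sample(points, labels, size, num_bins):
    """Voxelize points: average positions and build label histograms per voxel.

    points: list of (x, y, z); labels: ints in [0, num_bins - 1].
    Returns (mean positions, histograms), one entry per occupied voxel.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])   # x, y, z sums and count
    hists = defaultdict(lambda: [0] * num_bins)
    for (x, y, z), label in zip(points, labels):
        # Integer voxel coordinates of the point
        key = (math.floor(x / size), math.floor(y / size), math.floor(z / size))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1
        hists[key][label] += 1
    keys = sorted(sums)  # deterministic voxel order
    means = [tuple(sums[k][i] / sums[k][3] for i in range(3)) for k in keys]
    return means, [hists[k] for k in keys]


# 3 made-up points with 10 cm voxels and 7 bins (6 classes + void)
points = [(0.01, 0.02, 0.0), (0.05, 0.06, 0.0), (0.12, 0.0, 0.0)]
labels = [0, 6, 1]
means, hists = grid_sample(points, labels, size=0.1, num_bins=7)
print(len(means))  # 2
print(hists)       # [[1, 0, 0, 0, 0, 0, 1], [0, 1, 0, 0, 0, 0, 0]]
```

Keeping a histogram rather than a single majority label is what later allows exact full-resolution metrics, since no label information is lost when points are merged.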

In [None]:
data_voxelized = GridSampling3D(size=1, hist_key='y', hist_size=VANCOUVER_NUM_CLASSES + 1)(data)
data_voxelized

In [None]:
data.num_nodes / data_voxelized.num_nodes

In our case the already-selected DALES voxel resolution of 10 cm is well-adapted for Vancouver, so we will keep it as is.

In [None]:
data = GridSampling3D(size=0.1, hist_key='y', hist_size=VANCOUVER_NUM_CLASSES + 1)(data)

#### 5.2.2. Neighbor search

`KNN(k=..., r_max=...)` searches for the `k` nearest neighbors of each point, within a maximum radius of `r_max`. Contrary to basic K-NN search, the radius constraint prevents spurious neighborhoods in very sparse areas of the point cloud. By design, this approach implies that **points may not all have the same number of neighbors**, depending on the local geometry and density. Our pipeline is capable of dealing with neighborhoods of uneven sizes, without resorting to artificial subsampling or oversampling strategies.

Applying `KNN` will store the results in `neighbor_index` and `neighbor_distance` attributes. Missing neighbors are indicated as `-1` in `neighbor_index`.
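
The radius-capped K-NN search can be sketched with brute force as below. It pads missing neighbors with `-1` in the index, as described above; padding distances with `-1.0` is a choice of this sketch, not necessarily what `KNN` stores. Being quadratic in the number of points, it is only meant for toy inputs.

```python
import math


def knn_radius(points, k, r_max):
    """Brute-force k-nearest neighbors restricted to a radius r_max.

    Returns (neighbor_index, neighbor_distance) as lists of length-k rows;
    missing neighbors are padded with index -1 and distance -1.0.
    """
    neighbor_index, neighbor_distance = [], []
    for i, p in enumerate(points):
        # All other points, sorted by increasing distance to p
        candidates = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        # Keep at most k of them, and only those within r_max
        kept = [(d, j) for d, j in candidates[:k] if d <= r_max]
        pad = k - len(kept)
        neighbor_index.append([j for _, j in kept] + [-1] * pad)
        neighbor_distance.append([d for d, _ in kept] + [-1.0] * pad)
    return neighbor_index, neighbor_distance


# 3 close points and 1 isolated point
points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (10, 10, 10)]
index, distance = knn_radius(points, k=2, r_max=2)
num_neighbors = [sum(j >= 0 for j in row) for row in index]
print(index)          # [[1, 2], [0, 2], [0, 1], [-1, -1]]
print(num_neighbors)  # [2, 2, 2, 0]
```

Counting the non-negative entries of each row, as in `num_neighbors`, is the same idea as `data.neighbor_index.ge(0).sum(dim=1)` used later in this section.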

The neighbors are used for two things in the preprocessing pipeline:
- computing local geometric features with `PointFeatures`, later used by `CutPursuitPartition` as pointwise signal for the superpoint partition
- computing the adjacency graph with `AdjacencyGraph`, later used by `CutPursuitPartition` as the graph on which the superpoint partition is computed

> **Note 🤓**: Our fast `KNN` implementation internally relies on [`FRNN`](https://github.com/lxxue/FRNN) which is optimized for **GPU-based neighbor search**. While it offers considerable speedups compared to other off-the-shelf neighbor search libraries, its installation has proven challenging for some users. We might move to a slightly-slower-but-more-stable CPU-based [`nanoflann`](https://github.com/jlblancoc/nanoflann/tree/c4c4daf6bb9bda9890fb58324282016b4184d887) implementation in the future. If you are having trouble installing `FRNN`, check [related solved issues](https://github.com/drprojects/superpoint_transformer/issues?q=frnn) in the repository.

In [None]:
data = KNN(k=25, r_max=2)(data)
data

In [None]:
(data.neighbor_index == -1).sum() / data.num_nodes

Let's visualize the number of neighbors per point when `r_max` is smaller than the value set for DALES.

In [None]:
data.num_neighbors = data.neighbor_index.ge(0).sum(dim=1)
data.show(class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS, center=[485, 505, 0], radius=20, keys='num_neighbors')

#### 5.2.3. Elevation estimation

`GroundElevation` is used to look for the ground among the points, to then infer point `elevation`. Indeed, the elevation is a more informative feature than the `z` coordinate of points for semantic parsing. For real-life large point cloud acquisitions, the absolute `z` value usually carries no meaning, but the _relative `z`_ with respect to the ground does (the same holds for absolute `x` and `y` values).

To find the ground, we simply use the [RANSAC](https://en.wikipedia.org/wiki/Random_sample_consensus) algorithm. `GroundElevation(threshold=..., scale=...)` searches for the ground as a planar surface located within `threshold` of the lowest point in the cloud. The pointwise distance to this plane is then computed and normalized by `scale`. `threshold` should be tuned for environments where other large planar surfaces may affect the RANSAC ground search (e.g. ceilings, building roofs, bridges, below-ground water surfaces, ...).

> **Note 🤓**: Using RANSAC to represent the ground surface is a _coarse and error-prone_ strategy. While it was sufficient for the benchmark datasets used in our paper, more advanced tools should be used for non-planar outdoor terrain or multi-floor indoor scans.
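The ground search described above can be sketched as a textbook RANSAC plane fit in NumPy. This is not the `GroundElevation` source: the function name, the `n_iter` and `inlier_dist` parameters, and the use of a signed point-to-plane distance are all assumptions made for illustration.

```python
import numpy as np

def ransac_ground_elevation(pos, threshold, scale, n_iter=100, inlier_dist=0.1, seed=0):
    """Hypothetical sketch: RANSAC ground plane, then normalized point-to-plane distance."""
    rng = np.random.default_rng(seed)
    # Only points within `threshold` of the lowest point are ground candidates
    cand = pos[pos[:, 2] <= pos[:, 2].min() + threshold]
    best_count, best_plane = -1, None
    for _ in range(n_iter):
        # Hypothesis: the plane through 3 random candidate points
        p0, p1, p2 = cand[rng.choice(len(cand), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue  # collinear sample, no unique plane
        normal = normal / norm
        offset = -normal @ p0
        # Score: number of candidates lying close to the plane
        count = np.sum(np.abs(cand @ normal + offset) < inlier_dist)
        if count > best_count:
            best_count, best_plane = count, (normal, offset)
    if best_plane is None:
        raise ValueError("no valid ground plane found")
    normal, offset = best_plane
    # Orient the normal upwards so points above the ground get positive values
    if normal[2] < 0:
        normal, offset = -normal, -offset
    return (pos @ normal + offset) / scale

# Flat ground at z=0 plus two points 3 m above it
ground = np.array([[x, y, 0.0] for x in range(10) for y in range(10)])
objects = np.array([[2.0, 2.0, 3.0], [5.0, 5.0, 3.0]])
pos = np.vstack([ground, objects])
elevation = ransac_ground_elevation(pos, threshold=1.0, scale=2.0)
```

Restricting candidates to the lowest `threshold` band is what keeps a large roof or ceiling from winning the inlier count, which is why `threshold` matters in scenes with other big planar surfaces.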

In [None]:
data = GroundElevation(threshold=5, scale=20)(data)

In [None]:
import seaborn as sns

sns.displot(data.elevation)

In [None]:
data.show(class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS, keys='elevation')

For the relatively-flat Vancouver dataset, it seems the DALES parametrization of `GroundElevation` is good enough.

#### 5.2.4. Pointwise local geometric features

`PointFeatures` computes some handcrafted geometric features characterizing each point's neighborhood. The following features are currently supported:
- RGB color
- HSV color
- LAB color
- density
- linearity
- planarity
- scattering
- verticality
- normal
- length
- surface
- volume
- curvature

These features should be computed with the superpoint partition in mind: they will be the **criteria based on which points will or will not be grouped together** by the cut-pursuit algorithm.

Which features are useful for your problem depends on your classes of interest. For instance, when studying anthropic structures, planarity and linearity are very important. Note that the robustness and expressivity of these computed geometric features depend on your `KNN` parametrization. If your point clouds come with RGB colors, converting them to HSV or LAB color spaces may help capture object boundaries (cf. the [SLIC](https://ieeexplore.ieee.org/document/6205760) paper).
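Linearity, planarity and scattering are usually derived from the eigenvalues of the neighborhood covariance matrix. The sketch below uses the common textbook definitions (with λ1 ≥ λ2 ≥ λ3) and one common definition of verticality based on the normal; these are assumptions and may differ in detail from what `PointFeatures` actually computes.

```python
import numpy as np

def eigenfeatures(neighbors):
    """Hypothetical sketch: eigenvalue-based features of one (K, 3) neighborhood."""
    # Covariance of the centered neighborhood
    centered = neighbors - neighbors.mean(axis=0)
    cov = centered.T @ centered / len(neighbors)
    # eigh returns eigenvalues in ascending order, so l3 <= l2 <= l1
    evals, evecs = np.linalg.eigh(cov)
    l3, l2, l1 = np.clip(evals, 0, None)
    if l1 <= 0:
        return 0.0, 0.0, 0.0, 0.0  # degenerate neighborhood (single repeated point)
    linearity = (l1 - l2) / l1
    planarity = (l2 - l3) / l1
    scattering = l3 / l1
    # Normal = eigenvector of the smallest eigenvalue;
    # verticality is 0 for a horizontal surface, 1 for a vertical one
    normal = evecs[:, 0]
    verticality = 1.0 - abs(normal[2])
    return linearity, planarity, scattering, verticality

# A vertical wall (x-z plane) and a horizontal floor (x-y plane)
a, b = np.meshgrid(np.arange(5.0), np.arange(5.0))
wall = np.stack([a.ravel(), np.zeros(25), b.ravel()], axis=1)
floor = np.stack([a.ravel(), b.ravel(), np.zeros(25)], axis=1)
wall_feats = eigenfeatures(wall)
floor_feats = eigenfeatures(floor)
```

Both surfaces are perfectly planar, yet verticality separates them: this is the kind of contrast the cut-pursuit algorithm can exploit to place a superpoint boundary between a facade and the ground.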

Interestingly, Vancouver has RGB colors, which was not the case for DALES. Let's see if using these instead of the LiDAR intensity improves the partition. 

> **Note 🤓**: `PointFeatures` supports various strategies for geometric feature computation. By default, all neighbors produced by `KNN` are used. One may also specify `PointFeatures(k_min=...)`: points with fewer than `k_min` neighbors then receive `0` geometric features, which mitigates the low quality of features computed on too-small neighborhoods. Besides, `PointFeatures(k_step=..., k_min_search=...)` searches, for each point, for the optimal neighborhood size among the available neighbors, based on eigenfeature entropy (following this [paper](https://isprs-annals.copernicus.org/articles/II-3/181/2014/isprsannals-II-3-181-2014.pdf)).
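The entropy-based neighborhood size selection can be illustrated as follows. This minimal sketch assumes the Shannon entropy of the normalized covariance eigenvalues is minimized over sizes `k_min, k_min + k_step, ...`; the function names and parameter semantics are hypothetical and do not reproduce the exact meaning of `k_min`, `k_step` and `k_min_search` in `PointFeatures`.

```python
import numpy as np

def eigenentropy(neighbors):
    """Shannon entropy of the normalized covariance eigenvalues of a (K, 3) neighborhood."""
    centered = neighbors - neighbors.mean(axis=0)
    evals = np.linalg.eigvalsh(centered.T @ centered / len(neighbors))
    evals = np.clip(evals, 1e-12, None)  # avoid log(0)
    e = evals / evals.sum()
    return float(-(e * np.log(e)).sum())

def optimal_k(pos, neighbor_index, k_min, k_step):
    """Hypothetical sketch: per point, the neighborhood size with the lowest eigenentropy."""
    best_k = np.zeros(len(pos), dtype=np.int64)  # 0 means "not enough neighbors"
    for i, nbrs in enumerate(neighbor_index):
        nbrs = nbrs[nbrs >= 0]  # drop missing neighbors marked -1
        best = np.inf
        for k in range(k_min, len(nbrs) + 1, k_step):
            # Neighborhood = the point itself + its k nearest neighbors
            h = eigenentropy(pos[np.concatenate(([i], nbrs[:k]))])
            if h < best:
                best, best_k[i] = h, k
    return best_k

# Points on a line; only point 0 has neighbors
line = np.stack([np.arange(10.0), np.zeros(10), np.zeros(10)], axis=1)
neighbor_index = np.full((10, 9), -1)
neighbor_index[0] = np.arange(1, 10)
best_k = optimal_k(line, neighbor_index, k_min=3, k_step=3)
```

Low entropy means one or two eigenvalues dominate, i.e. a clearly linear or planar structure, so minimizing it favors neighborhood sizes where the local geometry is least ambiguous.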

In [None]:
data = PointFeatures(keys=('elevation', 'rgb', 'hsv', 'linearity', 'planarity', 'scattering', 'verticality'))(data)

In [None]:
data.show(class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS, center=[485, 505, 0], radius=20, keys=data.keys)

#### 5.2.5. Adjacency graph

`AdjacencyGraph` computes the adjacency graph on which the superpoint partition will be computed. It relies on the output of `KNN` to find neighbors for each point. `AdjacencyGraph(k=..., w=...)` stores the edges of the `k`-NN graph in `Data.edge_index`, along with edge weights in `Data.edge_attr` to be used in the partition (the larger an edge's weight, the harder it is to separate the corresponding points).
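Turning padded neighbor lists into a graph amounts to building a `(2, E)` edge list in COO format, the layout PyTorch Geometric uses for `edge_index`. The sketch below is hypothetical: the function name is ours, and it assigns the same weight `w` to every edge, whereas the actual `AdjacencyGraph` weighting scheme may differ (e.g. depend on distances).

```python
import numpy as np

def knn_edges(neighbor_index, k, w):
    """Hypothetical sketch: COO edge list from the first k neighbors of each point."""
    nbr = neighbor_index[:, :k]
    # Source = the point itself, repeated once per neighbor slot
    src = np.repeat(np.arange(len(nbr)), nbr.shape[1])
    dst = nbr.reshape(-1)
    keep = dst >= 0  # skip missing neighbors marked -1
    edge_index = np.stack([src[keep], dst[keep]])  # shape (2, E)
    # Assumption: constant weight for every edge
    edge_attr = np.full(edge_index.shape[1], float(w))
    return edge_index, edge_attr

neighbor_index = np.array([[1, 2], [0, -1], [-1, -1]])
edge_index, edge_attr = knn_edges(neighbor_index, k=2, w=1)
```

Because missing neighbors are simply dropped, sparse points get fewer edges rather than spurious long-range ones.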

In [None]:
data = AdjacencyGraph(k=10, w=1)(data)

#### 5.2.6. Hierarchical partition

`CutPursuitPartition` is where the actual superpoint partition occurs. The [parallel cut-pursuit](https://arxiv.org/abs/1905.02316) algorithm is used to partition the adjacency graph based on point features. A regularization term rules the trade-off between "many-superpoints-with-homogeneous-content" and "few-superpoints-with-heterogeneous-content".

In `CutPursuitPartition(regularization=..., spatial_weight=..., k_adjacency=..., cutoff=...)`, `regularization` holds a list of increasing float values, one per level, yielding coarser and coarser hierarchical superpoint partitions. `spatial_weight` indicates how much importance point coordinates carry with respect to point features when grouping points: the larger the weight, the more spatial coordinates take over, and the more tessellated-looking the partition. `k_adjacency` prevents superpoints from staying isolated. `cutoff` sets the minimum number of points per superpoint at each partition level: too-small superpoints are merged with other superpoints.
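For reference, the trade-off ruled by `regularization` can be written as the piecewise-constant approximation problem from the SPT paper, solved by cut-pursuit on the adjacency graph $\mathcal{G} = (\mathcal{P}, \mathcal{E})$, with $f_i$ the features of point $i$, $w_{i,j}$ the edge weights from `AdjacencyGraph`, and $\lambda$ the regularization. Superpoints are the connected components of constant value in the solution. Our reading, which is an assumption, is that `spatial_weight` scales point coordinates appended to $f_i$, and that each coarser level solves the same problem on the previous level's superpoint graph with the next $\lambda$ in the list.

$$
x^\star \in \arg\min_{x \in \mathbb{R}^{D \times |\mathcal{P}|}} \; \sum_{i \in \mathcal{P}} \lVert x_i - f_i \rVert^2 \; + \; \lambda \sum_{(i,j) \in \mathcal{E}} w_{i,j} \, \left[ x_i - x_j \neq 0 \right]
$$

A larger $\lambda$ makes every cut edge more expensive, hence fewer and larger superpoints; a smaller $\lambda$ lets the solution follow $f$ more closely, hence more and smaller superpoints.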

Before computing the partition, we need to copy all the features we want to use for the partition to the `x` attribute (`CutPursuitPartition` will blindly use whatever it finds in `x`). To this end, we use the `AddKeysTo` transform.
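Conceptually, gathering features into `x` is a column-wise concatenation of per-point attributes in a fixed key order. The sketch below uses a plain dictionary of arrays instead of a `Data` object, and the function name `add_keys_to_x` is hypothetical; it does not reproduce any other behavior of `AddKeysTo`.

```python
import numpy as np

def add_keys_to_x(attrs, keys):
    """Hypothetical sketch: concatenate per-point attributes into an (N, F) matrix."""
    # 1D attributes of shape (N,) become (N, 1); 2D attributes (N, C) are kept as-is
    cols = [np.asarray(attrs[key], dtype=float).reshape(len(attrs[key]), -1) for key in keys]
    return np.concatenate(cols, axis=1)

attrs = {
    'linearity': np.array([0.1, 0.2]),
    'elevation': np.array([1.0, 2.0]),
    'rgb': np.array([[1, 2, 3], [4, 5, 6]]),
}
x = add_keys_to_x(attrs, ['linearity', 'elevation', 'rgb'])
```

Since the partition only sees the resulting matrix, adding or removing a key directly changes which boundaries cut-pursuit can detect.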

You can play with the features used with `AddKeysTo` and `CutPursuitPartition` parameters, and see how it impacts your partition metrics.

In [None]:
# Copy desired features to `x`
data = AddKeysTo(keys=['linearity', 'planarity', 'scattering', 'elevation'], to='x', delete_after=False)(data)

# Compute the hierarchical partition
nag = CutPursuitPartition(
    regularization=[0.1, 0.2], 
    spatial_weight=[0.1, 0.01], 
    cutoff=[10, 30], 
    iterations=15, 
    k_adjacency=10)(data)

In [None]:
# Ratio of sizes of successive partition levels
nag.level_ratios

In [None]:
# Oracle semantic segmentation metrics
nag[1].semantic_segmentation_oracle(VANCOUVER_NUM_CLASSES)

In [None]:
nag.show(class_names=VANCOUVER_CLASS_NAMES, class_colors=VANCOUVER_CLASS_COLORS, center=[485, 505, 0], radius=20)

#### 5.2.7. Superpoint-wise handcrafted features

Once the hierarchical partition has been computed, `SegmentFeatures` builds some superpoint-wise features at each partition level. These are basic descriptors that can be used to help the model characterize the superpoints or the connection between superpoints (see `RadiusHorizontalGraph`).
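Superpoint-wise descriptors are typically obtained by scattering pointwise values onto their superpoint. The sketch below computes a few simple examples (point count, centroid, bounding-box size) from a hypothetical `super_index` array assigning each point to a superpoint; these are illustrations and not necessarily the features `SegmentFeatures` computes.

```python
import numpy as np

def segment_features(pos, super_index):
    """Hypothetical sketch: per-superpoint count, centroid and bounding-box size."""
    n_seg = super_index.max() + 1  # assumes contiguous superpoint indices
    count = np.bincount(super_index, minlength=n_seg)
    # Centroid: scatter-sum positions, then divide by point counts
    centroid = np.zeros((n_seg, 3))
    np.add.at(centroid, super_index, pos)
    centroid /= count[:, None]
    # Axis-aligned bounding box via scatter-min / scatter-max
    lo = np.full((n_seg, 3), np.inf)
    hi = np.full((n_seg, 3), -np.inf)
    np.minimum.at(lo, super_index, pos)
    np.maximum.at(hi, super_index, pos)
    return count, centroid, hi - lo

pos = np.array([[0.0, 0, 0], [2.0, 0, 0], [5.0, 5, 5]])
super_index = np.array([0, 0, 1])
count, centroid, size = segment_features(pos, super_index)
```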

#### 5.2.8. Superpoint adjacency graph and features

`RadiusHorizontalGraph` computes the superpoint adjacency graphs and stores them in the `edge_index` and `edge_attr` of each partition level. These are the graphs used by SPT to propagate information between nodes with self-attention. 

In particular, the `gap`, `k_min` and `k_max` parameters of `RadiusHorizontalGraph(gap=..., k_min=..., k_max=...)` rule how far each superpoint is allowed to look within each partition level. You can think of this as the **"kernel size" of the attention mechanism**. While increasing `gap` may improve semantic segmentation performance, be aware that it also increases the number of edges in the adjacency graph, which directly impacts computation and memory efficiency.
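To see why a larger `gap` means more edges, here is a deliberately simplified, hypothetical sketch (not the `RadiusHorizontalGraph` algorithm): two superpoints are linked when their closest points are within `gap` of each other, each superpoint keeps at most `k_max` such neighbors, and superpoints with fewer than `k_min` candidates fall back to their `k_min` closest superpoints. The brute-force distances make it suitable for toy inputs only.

```python
import numpy as np

def superpoint_graph(pos, super_index, gap, k_min, k_max):
    """Hypothetical sketch: undirected superpoint edges based on point-to-point gaps."""
    n_seg = super_index.max() + 1
    # Minimum point-to-point distance between every pair of superpoints
    d = np.linalg.norm(pos[:, None] - pos[None], axis=-1)
    seg_gap = np.full((n_seg, n_seg), np.inf)
    for a in range(n_seg):
        for b in range(n_seg):
            if a != b:
                seg_gap[a, b] = d[np.ix_(super_index == a, super_index == b)].min()
    edges = set()
    for a in range(n_seg):
        order = np.argsort(seg_gap[a])[: n_seg - 1]  # other superpoints, closest first
        close = [b for b in order if seg_gap[a, b] <= gap]
        # Keep at most k_max close neighbors, or fall back to the k_min closest
        chosen = close[:k_max] if len(close) >= k_min else list(order[:k_min])
        edges.update((min(a, b), max(a, b)) for b in chosen)
    return np.array(sorted(edges)).T.reshape(2, -1)

# Three superpoints along the x axis
pos = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [3.0, 0, 0], [10.0, 0, 0]])
super_index = np.array([0, 0, 1, 1, 2])
small = superpoint_graph(pos, super_index, gap=1.5, k_min=0, k_max=5)
large = superpoint_graph(pos, super_index, gap=8.0, k_min=0, k_max=5)
```

With the small `gap`, only the two touching superpoints are connected; the large `gap` also connects the distant one, illustrating how the attention "kernel" and the edge count grow together.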

Once we are happy with our partition parametrization, we will want to deploy the preprocessing to our entire dataset and train on it.

# 6. Training on your own `Dataset`

### 6.1. Creating your own `Dataset`

To make the most of the codebase capabilities, your dataset must inherit from the `BaseDataset` class and follow a certain structure. See the [datasets documentation](../docs/datasets.md) for how to implement your own dataset.

### 6.2. Parametrizing your transforms

We have seen above how to configure your `pre_transform`. Since these will be executed once on your dataset at preprocessing time, you will not need to re-run them at each experiment if you leave the parameters unchanged.

Still, you will also need to parametrize your `on_device_train_transform` and `on_device_val_transform` (usually we fix `on_device_test_transform=on_device_val_transform`).
Exploring these is outside of the scope of this tutorial, but several design choices can have an impact on your model performance and memory consumption.

Have a look at how we configured the already-existing datasets in `configs/datamodule/semantic/` for reference. Besides, the source code of all transforms is fairly well documented. Make sure you read it to understand what they do!

> **Tips 💡**:
> - It is possible to parametrize SPT to **train and infer on an 11G GPU 💾**. To this end, you can have a look at the existing '*_11G' configs in `configs/experiment/semantic` to see how we did for supported datasets.
> - We provide a **detailed list of suggestions for troubleshooting CUDA memory errors** in the [README](https://github.com/drprojects/superpoint_transformer/tree/master?tab=readme-ov-file#cuda-out-of-memory-errors).

### 6.3. Training and testing

Have a look at the [README](https://github.com/drprojects/superpoint_transformer/tree/master) for basic training and testing commands.
Refer to the documentation of the [lightning-hydra-template](https://github.com/ashleve/lightning-hydra-template) to make the most of its _**many available functionalities**_!