# Exercise 2: Shape Classification and Segmentation

**Submission Deadline**: 31.05.2022, 23:55

In this exercise, we will dive into Machine Learning on 3D shapes by taking a look at shape classification and segmentation.

## 2.0. Running this notebook
We recommend running this notebook on a local CUDA-compatible GPU. You can also train on the CPU; it will just take longer.

We describe two options for executing the training parts of this exercise below: Using Google Colab or running it locally on your machine. If you are not planning on using Colab, just skip forward to Local Execution.

### Google Colab

If you don't have access to a GPU and don't wish to train on the CPU, you can use Google Colab. Keep in mind, however, that inline visualization of shapes and inline images did not work on Colab when we tried it.
Alternatively, you can train the networks on Colab, download the checkpoint, and visualize inference locally.

If you are using Google Colab, upload the exercise folder (containing `exercise_2.ipynb`, the directory `exercise_2`, and the file `requirements.txt`) as `3d-machine-learning` to Google Drive (make sure you don't upload extracted dataset files).
Then open the notebook `exercise_2.ipynb` in Colab using `File > Open Notebook > Upload`.

Next you'll need to run these two cells for setting up the environment. Before you do that make sure your instance has a GPU.

In [1]:
# import os
# from google.colab import drive
# drive.mount('/content/drive', force_remount=True)
#
# # We assume you uploaded the exercise folder in root Google Drive folder
#
# !cp -r /content/drive/MyDrive/3d-machine-learning 3d-machine-learning/
# os.chdir('/content/3d-machine-learning/')
# print('Installing requirements')
# !pip install -r requirements.txt
#
# # Make sure you restart runtime when directed by Colab

Run this cell after restarting your Colab runtime.

In [1]:
# import os
# import sys
# import torch
# os.chdir('/content/3d-machine-learning/')
# sys.path.insert(1, "/content/3d-machine-learning/")
# print('CUDA availability:', torch.cuda.is_available())

### Local Execution

If you run this notebook locally, you first have to install the Python dependencies again. They are the same as for exercise 1, so you can reuse the environment you used last time. If you use [poetry](https://python-poetry.org), you can also simply reinstall everything (`poetry install`) and then run this notebook via `poetry run jupyter notebook`.

### Imports

The following imports should work regardless of whether you are using Colab or local execution.

In [1]:
%load_ext autoreload
%autoreload 2
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import k3d
import trimesh
import torch

Use the next cell to test whether a GPU was detected by pytorch.

In [4]:
torch.cuda.is_available()

True

## 2.1. ShapeNet Terms of Use

We provide pre-processed shapes from the [ShapeNet](//shapenet.org) database for this exercise. ShapeNet is an ongoing effort to establish a richly-annotated, large-scale dataset of 3D shapes and is a collaborative effort between researchers at Princeton, Stanford and TTIC.

<img src="exercise_2/images/shapenet.png" alt="shapenet" style="width: 512px;"/>

In order to be able to use the data, we ask you to read and agree to their Terms of Use as stated below (this is a requirement for passing this exercise):

1. Researcher shall use the Database only for non-commercial research and educational purposes.
2. Princeton University and Stanford University make no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose.
3. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify Princeton University and Stanford University, including their employees, Trustees, officers and agents, against any and all claims arising from Researcher's use of the Database, including but not limited to Researcher's use of any copies of copyrighted 3D models that he or she may create from the Database.
4. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions.
5. Princeton University and Stanford University reserve the right to terminate Researcher's access to the Database at any time.
6. If Researcher is employed by a for-profit, commercial entity, Researcher's employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.
7. The law of the State of New Jersey shall apply to all disputes under this agreement.

To agree, simply type `I agree to the Terms of Use` in the next code cell.

In [5]:
# I agree to the Terms of Use

## 2.2. A simple 3D CNN with pytorch

Here, we will create a very simple 3D Convolutional Neural Network on some toy data. This is meant as a quick introduction into the pytorch framework if you haven't used it yet. It will cover everything you need to know for the following parts of the exercise; if you want to go into a bit more detail about the framework, a good place to start is the official [pytorch quickstart tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html).

A very useful resource if you want to know more about a certain class or function is the [official docs](https://pytorch.org/docs/stable/index.html). Take a minute to look at the documentation of a [Tensor](https://pytorch.org/docs/stable/tensors.html?highlight=tensor#torch.Tensor), which is the data structure pytorch uses throughout its APIs. It is comparable to a numpy array, with added functionality such as the ability to move seamlessly between CPU host memory and CUDA device memory and the integrated autograd functionality for automatic differentiation. Most importantly, you can create a tensor from a numpy array with `torch.from_numpy(array)` and go the other way with `array = tensor.numpy()`.
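As a small illustration of the conversion in both directions, here is a minimal sketch. The variable names are made up for this example, and it falls back to the CPU when no GPU is detected.

```python
import numpy as np
import torch

# A NumPy array and a tensor created from it share the same memory on the CPU.
array = np.arange(6, dtype=np.float32).reshape(2, 3)
tensor = torch.from_numpy(array)

# Modifying the tensor in place is therefore visible through the original array.
tensor[0, 0] = 42.0

# Pick a device; on a GPU, .to(device) copies the data into device memory.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
tensor_on_device = tensor.to(device)

# .numpy() only works for CPU tensors, so move the data back first.
array_back = tensor_on_device.cpu().numpy()
print(array[0, 0], array_back.shape, tensor.dtype)
```

Note that `torch.from_numpy` keeps the dtype of the array, so a `float64` array becomes a `float64` tensor unless you convert it explicitly.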

Our objective with this 3D CNN is simply to classify whether a given SDF contains a sphere or a torus. The data is generated the same way as in the last exercise (we randomize the shape parameters a bit so the task is not too trivial). Each sample has a scalar label associated with it, either 0 or 1, depending on the shape it contains.

All of the implementation in this part will take place in `exercise_2/simple_nn.py`. This makes sense for now but note that we would usually spread things over multiple different files like in the following exercise parts.

The important bits we need for every training in pytorch are:
1. The **dataset** implementation. This is a class responsible for providing the data used by the training procedure sample-by-sample. In its simplest form, it just loads data samples (e.g. point clouds or voxel grids) from disk. However, it is usually also used to transform raw data from disk into the correct format and to apply various kinds of augmentations.
2. The network definition ("**model**") that we want to train. It specifies the network structure (number and type of layers, activation functions, normalization layers etc.) as well as how exactly input data is processed.
3. The **loss function**: For classification, this is usually a Cross Entropy Loss; we will also see reconstruction losses like l1 and l2 in exercise 3.
4. The **optimizer**: Usually chosen from a set of pre-defined classes like SGD or ADAM.
5. The **training loop**: For each batch of data, it does the following: Loads data from the dataloader, passes it through the model ("forward pass"), computes the loss, calculates the gradients ("backward pass"), and finally, adjusts the network weights using the optimizer.

Let's walk through everything step-by-step:

### (a) Dataset

We start by implementing the data source, which is called `Dataset` in pytorch. Take a look at the `SimpleDataset` class; it contains everything we have to implement to make it work:
1. The `__init__` function that takes parameters and prepares the dataset by setting paths etc. Here, the one thing you always have to do is to load at least a list of samples you want to use - they don't have to be loaded from disk yet, but you need to know which and how many samples there are. This usually also depends on the current split - you use a different set of samples for training than for validation or testing.
2. The `__len__` function that returns the total number of samples. In most cases, this will simply be the length of the list of samples you prepared in `__init__`.
3. The `__getitem__` function. This is where the actual data loading takes place. The function gets an index between 0 and one less than the length you returned with `__len__`. Your job is then to return some data corresponding to the index. You do not need to worry about putting data onto the GPU here yet; instead, you load the data and return a tuple containing numpy arrays and other data like sample ids you might need in your training loop. Overall, you should take the index parameter in this function, find the sample id it belongs to, load that data, and then return it in a format that can be used in the training loop.

Note that it is also fine to load all data into a list in the `__init__` function for small datasets like our toy dataset in this exercise part. You would then just index this pre-loaded data in your `__getitem__` function.
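The three functions described above can be sketched on an unrelated toy example. `RandomVolumeDataset` and its random data are hypothetical and only illustrate the structure; this is not the `SimpleDataset` you are asked to implement.

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader


class RandomVolumeDataset(Dataset):
    """Hypothetical toy dataset: random 8^3 volumes with alternating labels."""

    def __init__(self, split):
        assert split in ('train', 'val')
        # Decide which and how many samples exist; small enough to keep in memory.
        num_samples = 16 if split == 'train' else 4
        rng = np.random.default_rng(0)
        self.volumes = rng.standard_normal((num_samples, 8, 8, 8)).astype(np.float32)
        self.labels = np.arange(num_samples) % 2

    def __len__(self):
        return len(self.volumes)

    def __getitem__(self, index):
        # Return plain NumPy data; moving to the GPU happens in the training loop.
        return self.volumes[index], int(self.labels[index])


dataset = RandomVolumeDataset('train')
loader = DataLoader(dataset, batch_size=4, shuffle=False)
volumes, labels = next(iter(loader))
print(len(dataset), volumes.shape, labels)
```

The `DataLoader` stacks the NumPy arrays returned by `__getitem__` into batched tensors, which is why the dataset itself does not need to deal with tensors or devices.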

Add and implement functions `__init__`, `__len__`, and `__getitem__` in class `SimpleDataset`. 

First, implement `__init__`: It takes a single parameter (called `split`) that determines if we are using train or val split at the moment. Based on that, it should generate toy data: 4096 samples if the split is train and 1024 if it is val. Use `generate_toy_data` from `exercise_2/util/toy_data.py` for this which generates two numpy arrays: The first one containing sphere and torus SDFs generated with randomized parameters and the second one classification labels, a 0 if the corresponding SDF in the first array is a sphere and 1 if it is a torus.

Then, implement `__len__` by simply returning the length of the data you generated in `__init__`.

Lastly, implement `__getitem__` which takes an integer index and returns a tuple: A numpy array containing the input SDF volume as generated by `generate_toy_data` before and a scalar value that represents the target class label for this volume.

Note: You have to add an additional dimension of size 1 to the input volume you return in `__getitem__` to make everything work later on, i.e. instead of returning a numpy array of shape (32, 32, 32), you should return an array of shape (1, 32, 32, 32).
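The extra dimension acts as the channel dimension: 3D convolutions in pytorch expect input of shape batch x channels x depth x height x width, and the batch dimension is added later by the data loader. A minimal sketch of two equivalent ways to add it, using a placeholder array of zeros:

```python
import numpy as np

# Placeholder volume standing in for one SDF grid.
volume = np.zeros((32, 32, 32), dtype=np.float32)

# Two equivalent ways to prepend a channel dimension of size 1.
with_channel = volume[np.newaxis, ...]
also_with_channel = np.expand_dims(volume, axis=0)

print(with_channel.shape)       # (1, 32, 32, 32)
print(also_with_channel.shape)  # (1, 32, 32, 32)
```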

Test your implementation with the checks below:

In [8]:
from exercise_2.simple_nn import SimpleDataset

# Create datasets with train and val splits
train_dataset = SimpleDataset('train')
val_dataset = SimpleDataset('val')

# Test lengths
print(f'Length of train set: {len(train_dataset)}')  # expected output: 4096
print(f'Length of val set: {len(val_dataset)}')  # expected output: 1024

# Get sample at index 0
train_sample = train_dataset[0]
print(train_sample[0].shape)  # Expected output (1, 32, 32, 32) (the leading 1 is important for later)
print(f"Class = {train_sample[1]}")  # Expected output: Scalar value 0
val_sample = val_dataset[-1]
print(val_sample[0].shape)  # Expected output (1, 32, 32, 32) (the leading 1 is important for later)
print(f"Class = {val_sample[1]}")  # Expected output: Scalar value 1

Generating toy data ...
Generating toy data ...
Length of train set: 4096
Length of val set: 1024
(1, 32, 32, 32)
Class = 0
(1, 32, 32, 32)
Class = 1


In [9]:
from exercise_2.util.visualization import visualize_sdf

train_sample = train_dataset[0]
visualize_sdf(train_sample[0].squeeze(0), filename=Path.cwd() / 'toy_data.ply')

Creating SDF visualization for 32^3 grid ...
Exported to D:\TUM\ML3D\E2\exercise_2\toy_data.ply


### (b) Model
The model is defined in class `SimpleModel`. Two functions are important here: `__init__` which sets up the architecture by instantiating layers with the correct parameters and adapting them to possible input parameters and `forward` which takes an input tensor (we call it `x` here). The output of `forward` is another tensor that contains the result of the network - in our case, this is a vector of logits which will then be used for classification. It could also be a volume of labels if you do segmentation or a volume of the same size as the input if you do reconstruction.
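To make the term "vector of logits" concrete, here is a minimal NumPy sketch with made-up values for a batch of four samples. It shows how logits relate to class probabilities, predicted labels, and the cross entropy loss; `torch.nn.CrossEntropyLoss` performs the softmax step internally, so the model returns raw logits.

```python
import numpy as np

# Hypothetical logits for 4 samples and 2 classes (0 = sphere, 1 = torus).
logits = np.array([[2.0, -1.0], [0.5, 1.5], [-0.3, 0.2], [3.0, 0.1]])
targets = np.array([0, 1, 0, 0])

# Softmax over the class dimension, shifted for numerical stability.
shifted = logits - logits.max(axis=1, keepdims=True)
probabilities = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Cross entropy: negative log probability of the correct class, averaged.
loss = -np.log(probabilities[np.arange(len(targets)), targets]).mean()

# The predicted class is the index of the largest logit.
predicted = logits.argmax(axis=1)
accuracy = (predicted == targets).mean()
print(predicted, accuracy, loss)
```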

Analogous to `forward`, you could also define a custom `backward` function that describes how to calculate the gradients for your model. However, this is not necessary for most architectures as pytorch's autograd implementation takes care of all of that.

For now, we will implement a very simple model: We stack together three 3D Convolution layers and have a fully connected layer at the end for classification. This is a schematic for the overall model structure:
<img src="exercise_2/images/simplenn.png" alt="simplenn_architecture" style="width: 512px;"/>

Each layer starts with a 3d convolution, followed by a batch normalization layer and a ReLU activation function. We left the implementation of the first layer in there; your task is to fill out the missing parts in `__init__` for layers 2 and 3 (use the same parameters as for the first convolution: kernel size 4, stride 3, padding 1). Each convolution should double the number of feature channels, such that you get a volume of size 16 x 1 x 1 x 1 after layer 3. You also need to define the ReLU and the fully connected layer (for this, use `torch.nn.Linear` to reduce the number of dimensions from 16 to 2).
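The spatial sizes and parameter counts can be checked by hand before writing any layers. The sketch below uses the standard convolution output size formula (dilation 1) and assumes the channel progression 1, 4, 8, 16, which is inferred from the parameter counts in the expected model summary; the helper names are made up for this example.

```python
def conv_output_size(size, kernel_size=4, stride=3, padding=1):
    """Spatial output size of a convolution along one axis (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def conv3d_params(in_channels, out_channels, kernel_size=4):
    """Kernel weights plus one bias per output channel."""
    return in_channels * out_channels * kernel_size ** 3 + out_channels


size = 32
channels = [1, 4, 8, 16]  # assumed from the model summary
sizes = []
total_params = 0
for c_in, c_out in zip(channels[:-1], channels[1:]):
    size = conv_output_size(size)
    sizes.append(size)
    total_params += conv3d_params(c_in, c_out)
    total_params += 2 * c_out  # batch norm: one scale and one shift per channel
total_params += 16 * 2 + 2  # final linear layer from 16 to 2 features

print(sizes)         # [11, 4, 1]
print(total_params)  # 10614
```

The final spatial size of 1 confirms the 16 x 1 x 1 x 1 volume after layer 3, and the total matches the expected parameter count in the sanity check.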

In `forward`, you can repeat the first line to move the input tensor through all three convolutional layers. Then, reshape the resulting tensor to dimension batchsize x 16 (the batch size is always the first dimension) and apply the linear layer to it. Return the result as-is.

Hint: The ease of debugging is one of the main reasons why pytorch is preferred in research over alternatives like tensorflow. You can set breakpoints in the `forward` function and watch as the input tensor gets modified by each layer to find bugs or things like wrongly calculated convolution parameters.

Use the following sanity checks to verify your model:

In [10]:
from exercise_2.simple_nn import SimpleModel
from exercise_2.util.model import summarize_model

simple_nn = SimpleModel()
print(summarize_model(simple_nn))  # Expected: Rows 0-8 and TOTAL = 10614

input_tensor = torch.randn(32, 1, 32, 32, 32)
predictions = simple_nn(input_tensor)

print('Output tensor shape: ', predictions.shape)  # Expected: torch.Size([32, 2])

  | Name  | Type        | Params
--------------------------------------
0 | conv1 | Conv3d      | 260   
1 | bn1   | BatchNorm3d | 8     
2 | conv2 | Conv3d      | 2056  
3 | bn2   | BatchNorm3d | 16    
4 | conv3 | Conv3d      | 8208  
5 | bn3   | BatchNorm3d | 32    
6 | fc    | Linear      | 34    
7 | relu  | ReLU        | 0     
8 | TOTAL | SimpleModel | 10614 
Output tensor shape:  torch.Size([32, 2])


### (c) Training

We already laid out most of the code structure you need to start training once you have your dataset and model defined. This code usually does not change much between projects; this is why there exist many different libraries to simplify it even more (you can take a look at pytorch lightning for example if you are interested). We will stick to the standard pytorch way of doing things for these exercises.

The most important things left to do are:
1. Define the data loaders. They take the samples you provide in your dataset implementation and take care of loading multiple samples in parallel, shuffling the dataset, and combining samples into batches.
2. Define a loss function. For classification, this is usually `torch.nn.CrossEntropyLoss`.
3. Instantiate the optimizer. Mostly, this will be ADAM (`torch.optim.Adam`) unless you have a good reason to use another one.
4. Implement the training loop: Get a batch of data from the data loader, move it to the GPU, perform a forward pass, compute the loss, calculate gradients in the backward pass, adjust weights in an optimizer step, repeat.

Note: You have to move the following to the correct compute device (usually the GPU) by calling `.to(device)`: the model, the loss function, and the data you get from each batch.
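The four steps can be sketched end to end on stand-in data. Everything here is an assumption for illustration: the random feature vectors, the single linear layer used in place of `SimpleModel`, and the variable names. The structure of `train` in `exercise_2/simple_nn.py` may differ.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Hypothetical stand-in data: 64 random feature vectors with binary labels.
torch.manual_seed(0)
inputs = torch.randn(64, 10)
targets = (inputs.sum(dim=1) > 0).long()
train_loader = DataLoader(TensorDataset(inputs, targets), batch_size=16, shuffle=True)

# Placeholder model; in the exercise this would be SimpleModel.
model = torch.nn.Linear(10, 2).to(device)
loss_criterion = torch.nn.CrossEntropyLoss().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

model.train()
losses = []
for epoch in range(5):
    for batch_inputs, batch_targets in train_loader:
        # Move the batch to the same device as the model.
        batch_inputs = batch_inputs.to(device)
        batch_targets = batch_targets.to(device)

        optimizer.zero_grad()              # clear gradients from the previous step
        predictions = model(batch_inputs)  # forward pass
        loss = loss_criterion(predictions, batch_targets)
        loss.backward()                    # backward pass
        optimizer.step()                   # weight update
        losses.append(loss.item())

print(f'first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}')
```

Calling `optimizer.zero_grad()` in every iteration matters because pytorch accumulates gradients across `backward()` calls by default.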

Take a look at the structure of `main`, since you will also use it for the next exercise parts, and fill in code in `train` at the blanks marked with TODO. Then, start the training below. For this exercise, we don't care too much about the results since it is a very easy classification task anyway. Your model should be able to get to > 98% validation accuracy, though. You can stop the training once that is the case.

In [1]:
from exercise_2 import simple_nn

config={
    'experiment_name': '2_2_simple_nn',
    'device': 'cuda:0',  # change this to cpu if you do not have a GPU
    'batch_size': 32,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 50,
    'print_every_n': 100,
    'validate_every_n': 100
}

simple_nn.main(config)  # should have val_accuracy > 99%

Using device: cuda:0
Generating toy data ...
Generating toy data ...
[000/00099] train_loss: 0.304
[000/00099] val_loss: 0.175, val_accuracy: 95.703%
[001/00071] train_loss: 0.114
[001/00071] val_loss: 0.123, val_accuracy: 96.680%
[002/00043] train_loss: 0.056
[002/00043] val_loss: 0.106, val_accuracy: 97.266%
[003/00015] train_loss: 0.014
[003/00015] val_loss: 0.086, val_accuracy: 98.633%
[003/00115] train_loss: 0.093
[003/00115] val_loss: 0.074, val_accuracy: 97.559%
[004/00087] train_loss: 0.083
[004/00087] val_loss: 0.057, val_accuracy: 98.047%
[005/00059] train_loss: 0.057
[005/00059] val_loss: 0.081, val_accuracy: 99.023%
[006/00031] train_loss: 0.033
[006/00031] val_loss: 0.088, val_accuracy: 97.363%
[007/00003] train_loss: 0.003
[007/00003] val_loss: 0.061, val_accuracy: 98.438%
[007/00103] train_loss: 0.069
[007/00103] val_loss: 0.052, val_accuracy: 98.828%
[008/00075] train_loss: 0.046
[008/00075] val_loss: 0.049, val_accuracy: 98.730%
[009/00047] train_loss: 0.025
[009/00047

That's it! In the following parts, we will now move on to more complicated problems and more involved network models.

## 2.3. Shape Classification using 3DCNN

### (a) Download and extract voxelized ShapeNet training data

Each folder in the `exercise_2/data/ShapeNetVox32` directory represents a shape category identified by a number, e.g. `02691156`.
We provide the mapping between these numbers and the corresponding names in `exercise_2/data/shape_info.json`. Each of these shape category folders contains a number of shapes
represented as voxels and stored in a `binvox` format.

```
# contents of exercise_2/data/ShapeNetVox32

02691156/                                   # Shape category folder with all its shapes
    ├── 1a04e3eab45ca15dd86060f189eb133/    # A single shape of the category
        ├── model.binvox                    # Voxel representation of the shape in binvox format
    ├── 1a6ad7a24bb89733f412783097373bdc/   # Another shape of the category
    ├── 1a9b552befd6306cc8f2d5fe7449af61/
    ├── :                                   # And so on ...
    ├── :
02828884/                                   # Another shape category folder
02933112/                                   # In total you should have 13 shape category folders
:
:
```

In [None]:
print('Downloading ...')
!wget http://cvgl.stanford.edu/data2/ShapeNetVox32.tgz -P exercise_2/data
print('Extracting ...')
!tar -xzf exercise_2/data/ShapeNetVox32.tgz -C exercise_2/data
!rm exercise_2/data/ShapeNetVox32.tgz
print('Done.')

### (b) Dataloading and exploring the data

We already provide you with a training and validation split in files `train.txt` and `val.txt` in folder `exercise_2/data/splits/shapenet`.
All the shapes in the list `train.txt` make up the training samples, while all the samples in `val.txt` constitute the validation set.
Additionally, we provide `overfit.txt` as the set of shapes we'll use for overfitting / debugging later.

Now let's write a Pytorch Dataset class that can load this data from the disk. Check out `ShapeNetVox` class in file `exercise_2/data/shapenet.py`
for a partial implementation of such a dataset.

The dataset class is instantiated with the type of the split, e.g. `train`, `val` or `overfit` and loads all
shape names for that split as a list in its member variable `self.items`. The class also provides utility method `get_shape_voxels(shapenet_id)`
which given a `shapenet_id` of the form `<shape_class>/<shape_identifier>` returns a 32x32x32 numpy array representing the voxels of the shape.
The class also provides a list of all shape categories as the static member `ShapeNetVox.classes`.
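The relation between a `shapenet_id`, its category, and an integer label can be sketched as follows. The `classes` list here is a hypothetical three-entry subset using the category ids from the folder listing above; the real list is `ShapeNetVox.classes`, and the actual implementation should follow the docstrings.

```python
# Hypothetical subset of category ids; the real list is ShapeNetVox.classes.
classes = ['02691156', '02828884', '02933112']

# An id of the form <shape_class>/<shape_identifier>.
shapenet_id = '02691156/1a6ad7a24bb89733f412783097373bdc'
category_id, shape_id = shapenet_id.split('/')

# The label is the position of the category in the list of classes.
label = classes.index(category_id)
print(category_id, shape_id, label)
```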

Your task is to fill out the missing implementations of functions `__getitem__` and `__len__` as specified by their docstrings.
Once done, test your implementation below.

In [2]:
from exercise_2.data.shapenet import ShapeNetVox
# Let's test your implementation

In [3]:
# Create a dataset with train split
trainset = ShapeNetVox('train')

# Get length, which is a call to __len__ function
print(f'Length of train set: {len(trainset)}')  # expected output: 21705

Length of train set: 21705


In [4]:
# Create a dataset with val split and print its length
valset = ShapeNetVox('val')
print(f'Length of validation set: {len(valset)}')  # expected output: 5426

Length of validation set: 5426


In [5]:
# Visualize some shapes
from exercise_2.util.visualization import visualize_occupancy

shape_data = trainset[0]

print(f'Name: {shape_data["name"]}')  # expected output: 04379243/d120d47f8c9bc5028640bc5712201c4a
print(f'Voxel Dimensions: {shape_data["voxel"].shape}')  # expected output: (1, 32, 32, 32)
print(f'Label: {shape_data["label"]} | {ShapeNetVox.classes[shape_data["label"]]}')  # expected output: 10, 04379243

visualize_occupancy(shape_data["voxel"].squeeze(), flip_axes=True)

Name: 04379243/d120d47f8c9bc5028640bc5712201c4a
Voxel Dimensions: (1, 32, 32, 32)
Label: 10 | 04379243


Output()

In [6]:
shape_data = trainset[7]
print(f'Name: {shape_data["name"]}')  # expected output: 03211117/edfc3a8ecc5b07e9feb0fb1dff94c98a
print(f'Voxel Dimensions: {shape_data["voxel"].shape}')  # expected output: (1, 32, 32, 32)
print(f'Label: {shape_data["label"]} | {ShapeNetVox.classes[shape_data["label"]]}')  # expected output: 5, 03211117

visualize_occupancy(shape_data["voxel"].squeeze(), flip_axes=True)

Name: 03211117/edfc3a8ecc5b07e9feb0fb1dff94c98a
Voxel Dimensions: (1, 32, 32, 32)
Label: 5 | 03211117


Output()

### (a) Download and prepare the ShapeNetPointClouds dataset
We generated point clouds from ShapeNet meshes via uniform sampling. Each point cloud contains 1024 xyz points.

The data layout is basically the same as in 2.3.:
Each folder in the `exercise_2/data/ShapeNetPointClouds` directory contains one shape category represented by a number, e.g. `02691156`.
We provide the mapping between these numbers and the corresponding names in `exercise_2/data/shape_info.json`. Each of these shape category folders contains a number of shapes in obj format.

```
# contents of exercise_2/data/ShapeNetPointClouds

02691156/                                      # Shape category folder with all its shapes
    ├── 1a04e3eab45ca15dd86060f189eb133.obj    # A single shape of the category
    ├── 1a6ad7a24bb89733f412783097373bdc.obj   # Another shape of the category
    ├── :                                      # And so on ...
    ├── :
02828884/                                      # Another shape category folder
02933112/                                      # In total you should have 13 shape category folders
:
:
```

In [None]:
print('Downloading ...')
!wget https://www.christian-diller.de/ShapeNetPointClouds.zip -P exercise_2/data
print('Extracting ...')
!unzip -q exercise_2/data/ShapeNetPointClouds.zip -d exercise_2/data
!rm exercise_2/data/ShapeNetPointClouds.zip
print('Done.')

### (b) Dataset implementation

You can use the same split setup as in 2.3: `overfit.txt` for overfitting, `train.txt` for the train samples, and `val.txt` for the val samples in folder `exercise_2/data/splits/shapenet`.

The dataset implementation will therefore be very similar to the one from 2.3: Fill out the missing implementations of functions `__getitem__` and `__len__` in class `ShapeNetPoints` in `exercise_2/data/shapenet.py`.

The major difference is how the actual data is loaded: We don't have regular voxel grids anymore and instead load arrays of 1024 points each. In `__getitem__`, we now return 'points' instead of 'voxel' and for loading the point clouds, we use `get_point_cloud` instead of `get_shape_voxels`. You can load the point cloud data either by hand (since it is in the same obj format you used in exercise 1) or simply use `trimesh.load`. The point clouds you return from `__getitem__` should have shape 3 x 1024 and datatype `np.float32`.
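Vertices read from an obj file typically come as one xyz row per point, i.e. with shape 1024 x 3 and often as `float64`. A minimal sketch of bringing such an array into the required 3 x 1024 `np.float32` layout, using random stand-in data instead of a real file:

```python
import numpy as np

# Stand-in for vertices loaded from an obj file: 1024 points, one xyz row each.
rng = np.random.default_rng(0)
vertices = rng.uniform(-0.5, 0.5, size=(1024, 3))  # float64 by default

# Transpose to channels-first layout and convert to float32.
points = vertices.T.astype(np.float32)
print(points.shape, points.dtype)  # (3, 1024) float32
```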

Otherwise, the implementation is very much the same as in 2.3. Once done, test your implementation below.

In [14]:
from exercise_2.data.shapenet import ShapeNetPoints

# Create a dataset with train split
train_dataset = ShapeNetPoints('train')
val_dataset = ShapeNetPoints('val')
overfit_dataset = ShapeNetPoints('overfit')

# Get length, which is a call to __len__ function
print(f'Length of train set: {len(train_dataset)}')  # expected output: 21705
# Get length, which is a call to __len__ function
print(f'Length of val set: {len(val_dataset)}')  # expected output: 5426
# Get length, which is a call to __len__ function
print(f'Length of overfit set: {len(overfit_dataset)}')  # expected output: 64

Length of train set: 21705
Length of val set: 5426
Length of overfit set: 64


In [15]:
# Visualize some shapes
from exercise_2.util.visualization import visualize_pointcloud

shape_data = train_dataset[np.random.randint(len(train_dataset))]
print(f'Name: {shape_data["name"]}')  # output varies because the sample index is random
print(f'Voxel Dimensions: {shape_data["points"].shape}')  # expected output: (3, 1024)
print(f'Label: {shape_data["label"]} | {ShapeNetPoints.classes[shape_data["label"]]} | {ShapeNetPoints.class_name_mapping[ShapeNetPoints.classes[shape_data["label"]]]}')  # output varies with the sample, e.g. 11 | 04401088 | telephone

visualize_pointcloud(shape_data["points"].T, point_size=0.025, flip_axes=True)

Name: 04401088/c87bd717c3640f0f741e88434245c899
Voxel Dimensions: (3, 1024)
Label: 11 | 04401088 | telephone


Output()

### (c) Defining the model

The model architecture of PointNet was discussed in the lecture and is visualized below:
<img src="exercise_2/images/pointnet.png" alt="pointnet_architecture" style="width: 800px;"/>

Some hints for the actual implementation:
1. We use conv1d layers with kernel size 1 for all "shared" mlps to expand the feature channel dimension, e.g. when the input is of shape batch_size x 3 x 1024 and we apply conv1d(in_features=3, out_features=64), then we get to shape batch_size x 64 x 1024
2. The mlps in the classification network after the max pooling operation are implemented using Linear layers
3. The numbers in parentheses after mlp() in the visualization above describe the number of layers with their out_features dimension. Note, though, that the first mlp from nx3 to nx64 is expressed as two layers in the original tensorflow code but can be implemented as a single conv1d layer going from 3 to 64 features in the pytorch version.
4. We define all layers up to and including the max pooling operation as the `PointNetEncoder`. The architecture of the model head depends on the task we are trying to solve: Either classification (`PointNetClassification`, used in this part of the exercise) or segmentation (`PointNetSegmentation`, used in 2.5).
5. ReLU and Batch Norms are applied after each layer, except after the last classification layer. In the last layer before the max operation, we only apply Batch Norm but no ReLU.
6. Dropout is applied for classification only after the second Linear layer, before the Batch Norm.
7. The TNets are basically small PointNets.
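Hints 1 and 4 can be illustrated with a minimal sketch of a single shared layer followed by max pooling. This is not the full encoder: it uses only one 3 to 64 layer for brevity, and the variable names are made up for this example.

```python
import torch

torch.manual_seed(0)
points = torch.randn(8, 3, 1024)  # batch_size x 3 x num_points

# Kernel size 1 applies the same 3 -> 64 linear map to every point independently.
shared_mlp = torch.nn.Conv1d(3, 64, kernel_size=1)
features = shared_mlp(points)
print(features.shape)  # torch.Size([8, 64, 1024])

# Max pooling over the point dimension yields one global feature per shape.
global_feature = features.max(dim=2).values
print(global_feature.shape)  # torch.Size([8, 64])

# Shuffling the points does not change the pooled feature.
permutation = torch.randperm(1024)
shuffled_feature = shared_mlp(points[:, :, permutation]).max(dim=2).values
print(torch.allclose(global_feature, shuffled_feature, atol=1e-5))
```

Because every point passes through the same weights and the maximum ignores ordering, the global feature is invariant to the order of the input points, which is what makes this design suitable for unordered point clouds.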

Implement the missing parts of the PointNet architecture in `TNet`, `PointNetEncoder`, and `PointNetClassification`, as indicated by the TODOs. All of them are located in `exercise_2/model/pointnet.py`. Use the following code cell to sanity check your implementation:

In [26]:
from exercise_2.model.pointnet import PointNetClassification
from exercise_2.util.model import summarize_model

pointnet = PointNetClassification(13)
print(summarize_model(pointnet))  # Expected: Rows 0-40 and TOTAL = 3464534

input_tensor = torch.randn(8, 3, 1024)
predictions = pointnet(input_tensor)

print('Output tensor shape: ', predictions.shape)  # Expected: 8, 13
num_trainable_params = sum(p.numel() for p in pointnet.parameters() if p.requires_grad) / 1e6
print(f'Number of trainable params: {num_trainable_params:.2f}M')  # Expected: ~3M

   | Name                                  | Type                   | Params 
-----------------------------------------------------------------------------------
0  | encoder                               | PointNetEncoder        | 2803529
1  | encoder.input_transform_net           | TNet                   | 803081 
2  | encoder.input_transform_net.conv1     | Sequential             | 384    
3  | encoder.input_transform_net.conv1.0   | Conv1d                 | 256    
4  | encoder.input_transform_net.conv1.1   | BatchNorm1d            | 128    
5  | encoder.input_transform_net.conv1.2   | ReLU                   | 0      
6  | encoder.input_transform_net.conv2     | Sequential             | 8576   
7  | encoder.input_transform_net.conv2.0   | Conv1d                 | 8320   
8  | encoder.input_transform_net.conv2.1   | BatchNorm1d            | 256    
9  | encoder.input_transform_net.conv2.2   | ReLU                   | 0      
10 | encoder.input_transform_net.conv3     | Sequential   

### (d) Training Script and Overfitting

You can now go to the train script in `train_pointnet_classification.py` and fill in the missing pieces as in 2.3. Then, verify that your training works by overfitting to a few samples below.

In [39]:
from exercise_2.training import train_pointnet_classification
config = {
    'experiment_name': '2_4_pointnet_classification_overfitting',
    'device': 'cuda:0',                   # change this to cpu if you do not have a GPU
    'is_overfit': True,                   # True since we're doing overfitting
    'batch_size': 32,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 1000,
    'print_every_n': 100,
    'validate_every_n': 100,
}

train_pointnet_classification.main(config)  # should be able to get ~0 loss, 100% accuracy

Using device: cuda:0
[049/00001] train_loss: 0.556
[049/00001] val_loss: 0.589, val_accuracy: 73.438%
[099/00001] train_loss: 0.123
[099/00001] val_loss: 0.087, val_accuracy: 96.875%
[149/00001] train_loss: 0.082
[149/00001] val_loss: 0.112, val_accuracy: 96.875%
[199/00001] train_loss: 0.044
[199/00001] val_loss: 0.002, val_accuracy: 100.000%
[249/00001] train_loss: 0.024
[249/00001] val_loss: 0.004, val_accuracy: 100.000%
[299/00001] train_loss: 0.015
[299/00001] val_loss: 0.001, val_accuracy: 100.000%
[349/00001] train_loss: 0.007
[349/00001] val_loss: 0.009, val_accuracy: 98.438%
[399/00001] train_loss: 0.047
[399/00001] val_loss: 0.096, val_accuracy: 93.750%
[449/00001] train_loss: 0.045
[449/00001] val_loss: 0.004, val_accuracy: 98.438%
[499/00001] train_loss: 0.016
[499/00001] val_loss: 0.002, val_accuracy: 100.000%
[549/00001] train_loss: 0.038
[549/00001] val_loss: 0.008, val_accuracy: 100.000%
[599/00001] train_loss: 0.029
[599/00001] val_loss: 0.001, val_accuracy: 100.000%
[

### (e) Training over the entire training set

Once your overfitting completes successfully, you can move on to training on the entire dataset again.

In [40]:
from exercise_2.training import train_pointnet_classification
config = {
    'experiment_name': '2_4_pointnet_classification_generalization',
    'device': 'cuda:0',                    # change this to cpu if you do not have a GPU
    'is_overfit': False,
    'batch_size': 32,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 10,
    'print_every_n': 100,
    'validate_every_n': 250,
}

train_pointnet_classification.main(config)  # Should be able to get > 92% accuracy on the val set

Using device: cuda:0
[000/00099] train_loss: 1.164
[000/00199] train_loss: 0.699
[000/00249] val_loss: 0.004, val_accuracy: 84.077%
[000/00299] train_loss: 0.570
[000/00399] train_loss: 0.506
[000/00499] train_loss: 0.471
[000/00499] val_loss: 0.005, val_accuracy: 89.016%
[000/00599] train_loss: 0.445
[001/00020] train_loss: 0.395
[001/00070] val_loss: 0.002, val_accuracy: 86.399%
[001/00120] train_loss: 0.426
[001/00220] train_loss: 0.398
[001/00320] train_loss: 0.348
[001/00320] val_loss: 0.003, val_accuracy: 87.726%
[001/00420] train_loss: 0.374
[001/00520] train_loss: 0.342
[001/00570] val_loss: 0.002, val_accuracy: 91.283%
[001/00620] train_loss: 0.326
[002/00041] train_loss: 0.367
[002/00141] train_loss: 0.303
[002/00141] val_loss: 0.002, val_accuracy: 89.937%
[002/00241] train_loss: 0.341
[002/00341] train_loss: 0.315
[002/00391] val_loss: 0.002, val_accuracy: 90.804%
[002/00441] train_loss: 0.278
[002/00541] train_loss: 0.303
[002/00641] train_loss: 0.274
[002/00641] val_loss: 

### (f) Inference using the trained model

In [16]:
from exercise_2.inference.infer_pointnet_classification import InferenceHandlerPointNetClassification


# create a handler for inference using a trained checkpoint
inferer = InferenceHandlerPointNetClassification('exercise_2/runs/2_4_pointnet_classification_generalization/model_best.ckpt')

In [17]:
# get shape point cloud and visualize
shape_points = ShapeNetPoints.get_point_cloud('03001627/f913501826c588e89753496ba23f2183')
print('Predicted category:', inferer.infer_single(shape_points))  # expected output: chair
visualize_pointcloud(shape_points.T, point_size=0.025, flip_axes=True)

Predicted category: chair


Output()

In [18]:
# get shape point cloud and visualize
shape_points = ShapeNetPoints.get_point_cloud('02691156/6af4383123972f2262b600da24e0965')
print('Predicted category:', inferer.infer_single(shape_points))
visualize_pointcloud(shape_points.T, point_size=0.025, flip_axes=True)

Predicted category: airplane


Output()

In [19]:
# get shape point cloud and visualize
shape_points = ShapeNetPoints.get_point_cloud('04090263/eae96ddf483e896c805d3d8e378d155e')
print('Predicted category:', inferer.infer_single(shape_points))
visualize_pointcloud(shape_points.T, point_size=0.025, flip_axes=True)

Predicted category: rifle


Output()

Make sure you submit the trained model `exercise_2/runs/2_4_pointnet_classification_generalization/model_best.ckpt` in your zip
so that we can evaluate it on the test set at our end.

## 2.5. Shape Parts Segmentation using PointNet

We now go one step further: instead of predicting a single class label for the whole shape, we want to predict, for each point of a shape, the part it belongs to. This task is called part segmentation. Conveniently, we can re-use most of the PointNet architecture from 2.4.

### (a) Download the ShapeNetPart dataset

Annotating data for segmentation is a lot of effort since labelling has to be performed within the shape for each part instead of globally for the entire shape.

Luckily, there are existing datasets we can use for this. In our case, this is the ShapeNet Part Segmentation dataset that you can download in the cell below.

In terms of data layout, the general idea of shape class identifiers and shape IDs is the same; we just have slightly different shape categories now. Also, each point cloud now has a corresponding file specifying the part class for every point.

We put the shape class labels for this dataset in `exercise_2/data/shape_parts_info.json`, analogous to `shape_info.json` from exercise parts 2.3 and 2.4.

The point cloud data is stored as `pts` files, a format that is essentially an even simpler version of `obj`: it omits the `v` in front of each line that represents a point and does not support faces. Each line therefore represents one point by its xyz coordinates, separated by spaces.
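
As a rough illustration of this layout (not the reference solution), a `pts` file and its `seg` companion could be parsed with NumPy as sketched below. The helper name `load_pts_and_seg` and the small in-memory sample are assumptions made for this example only.

```python
import io
import numpy as np


def load_pts_and_seg(pts_file, seg_file):
    """Read xyz coordinates and per-point part labels (hypothetical helper)."""
    # Each line of a pts file holds "x y z", separated by spaces
    points = np.loadtxt(pts_file, dtype=np.float32).reshape(-1, 3)
    # Each line of a seg file holds one integer part label
    labels = np.loadtxt(seg_file, dtype=np.int64).reshape(-1)
    return points, labels


# Tiny in-memory stand-ins for real files, so the sketch runs without the dataset
pts_text = io.StringIO("0.1 0.2 0.3\n0.4 0.5 0.6\n0.7 0.8 0.9\n")
seg_text = io.StringIO("1\n1\n2\n")

points, labels = load_pts_and_seg(pts_text, seg_text)
print(points.shape, labels.shape)
```

With real data you would pass the paths of a `.pts` file and the matching `.seg` file instead of the `StringIO` objects.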

```
# contents of exercise_2/data/shapenetcore_partanno_segmentation_benchmark_v0

02691156/                                         # Shape category folder with all its shapes
    ├── points                                    # All point clouds go here
        ├── 1a04e3eab45ca15dd86060f189eb133.pts   # Point cloud data
        ├── 1a32f10b20170883663e90eaf6b4ca52.pts  # Another point cloud
        :
        :
    ├── points_label                              # Part labels for each point in the corresponding pts file
        ├── 1a04e3eab45ca15dd86060f189eb133.seg   # Each line represents the local part class of a point
        ├── 1a32f10b20170883663e90eaf6b4ca52.seg  # Another segmentation file
        :
        :
    ├── seg_img                                   # Visualizations of the original mesh part segmentation
02773838/                                         # Another shape category folder
02954340/                                         # In total you should have 16 shape category folders
:
:
train_test_split/                                 # Official split IDs
```

In [None]:
print('Downloading ...')
!wget https://shapenet.cs.stanford.edu/ericyi/shapenetcore_partanno_segmentation_benchmark_v0.zip --no-check-certificate -P exercise_2/data
print('Extracting ...')
!unzip -q exercise_2/data/shapenetcore_partanno_segmentation_benchmark_v0.zip -d exercise_2/data
!rm exercise_2/data/shapenetcore_partanno_segmentation_benchmark_v0.zip
print('Done.')

### (b) Dataset implementation

You can use the same split setup as in 2.3 and 2.4: `overfit.txt` for overfitting, `train.txt` for the train samples, and `val.txt` for the val samples. This time, use the files in the folder `exercise_2/data/splits/shapenet_parts`.

The dataset implementation is similar to 2.3 and 2.4: fill out the missing implementations of `__getitem__` and `__len__` in class `ShapeNetParts` in `exercise_2/data/shapenet_parts.py`. Note that you now need to load not only the point cloud but also the per-point segmentation labels, in function `get_point_cloud_with_labels`. Since each point cloud in this dataset contains more than 1024 points, we also need to sub-sample the raw point list. Use `np.random.choice` for this: randomizing the sampling acts as augmentation, which in turn helps prevent overfitting. Make sure to pick the same indices for points and labels so that they stay in correspondence.
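
The joint sub-sampling described above can be sketched as follows. This is a minimal example under stated assumptions: the helper name `subsample` is hypothetical, and the `(3, n)` point layout is assumed from the way point clouds are used elsewhere in this notebook.

```python
import numpy as np


def subsample(points, labels, num_samples=1024):
    """Draw num_samples random points together with their labels (hypothetical helper)."""
    # points: (3, n) array, labels: (n,) array (layout assumed for this sketch)
    n = points.shape[1]
    # Draw one set of indices and reuse it, so that point i keeps label i.
    # Sampling with replacement is only used if the cloud is too small.
    idx = np.random.choice(n, num_samples, replace=n < num_samples)
    return points[:, idx], labels[idx]


# Synthetic cloud in which the label equals the original point index
points = np.arange(3 * 2500, dtype=np.float32).reshape(3, 2500)
labels = np.arange(2500)

sub_points, sub_labels = subsample(points, labels)
print(sub_points.shape, sub_labels.shape)
```

Because the first coordinate row of the synthetic cloud equals the point index, comparing `sub_points[0]` with `sub_labels` is a quick way to check that points and labels were sampled consistently.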

Once done, test your implementation below.

In [20]:
from exercise_2.data.shapenet_parts import ShapeNetParts

# Create a dataset with train split
train_dataset = ShapeNetParts('train')
val_dataset = ShapeNetParts('val')
overfit_dataset = ShapeNetParts('overfit')

# Get length, which is a call to __len__ function
print(f'Length of train set: {len(train_dataset)}')  # expected output: 12137
# Get length, which is a call to __len__ function
print(f'Length of val set: {len(val_dataset)}')  # expected output: 1870
# Get length, which is a call to __len__ function
print(f'Length of overfit set: {len(overfit_dataset)}')  # expected output: 64

Length of train set: 12137
Length of val set: 1870
Length of overfit set: 64


### (c) Modifying the PointNet Model

Take a look at the PointNet architecture again:
<img src="exercise_2/images/pointnet.png" alt="pointnet_architecture" style="width: 800px;"/>

We only cared about the blue classification part in 2.4. Now, we also want to implement the yellow part. You can re-use your encoder from 2.4. 

The idea is simple: take the n points with 64-dimensional point features from the correct layer of the encoder and concatenate to each of them the global shape descriptor you get after applying the max function. Then, implement the remaining layers as `Conv1d`s, with batchnorm and ReLU after all but the last layer. The final layer reduces the per-point dimensionality to m, which is 50 in our case since there are 50 parts overall.
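
The concatenation step can be sketched in isolation as below. Random tensors stand in for the encoder outputs, and the global descriptor size of 1024 is an assumption taken from the PointNet paper, not from the code of this exercise.

```python
import torch

batch_size, num_points = 8, 1024

# Stand-ins for encoder outputs; in the real model these come from the PointNet encoder
point_features = torch.randn(batch_size, 64, num_points)  # per-point features
global_feature = torch.randn(batch_size, 1024)            # descriptor after max pooling (size assumed)

# Repeat the global descriptor for every point, then stack along the channel axis
expanded = global_feature.unsqueeze(-1).expand(-1, -1, num_points)
combined = torch.cat([point_features, expanded], dim=1)

print(combined.shape)  # torch.Size([8, 1088, 1024])
```

Each point then carries 64 local plus 1024 global channels, which is the input the subsequent `Conv1d` layers operate on.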

Add the missing layers to `PointNetSegmentation` in `exercise_2/model/pointnet.py` and finish the implementation of the forward pass.

In [3]:
from exercise_2.model.pointnet import PointNetSegmentation
from exercise_2.util.model import summarize_model

pointnet = PointNetSegmentation(50)
print(summarize_model(pointnet))  # Expected: Rows 0-40 and TOTAL = 3533563

input_tensor = torch.randn(8, 3, 1024)
predictions = pointnet(input_tensor)

print('Output tensor shape: ', predictions.shape)  # Expected: 8, 1024, 50
num_trainable_params = sum(p.numel() for p in pointnet.parameters() if p.requires_grad) / 1e6
print(f'Number of trainable params: {num_trainable_params:.2f}M')  # Expected: ~3M

   | Name                                  | Type                 | Params 
---------------------------------------------------------------------------------
0  | encoder                               | PointNetEncoder      | 2803529
1  | encoder.input_transform_net           | TNet                 | 803081 
2  | encoder.input_transform_net.conv1     | Sequential           | 384    
3  | encoder.input_transform_net.conv1.0   | Conv1d               | 256    
4  | encoder.input_transform_net.conv1.1   | BatchNorm1d          | 128    
5  | encoder.input_transform_net.conv1.2   | ReLU                 | 0      
6  | encoder.input_transform_net.conv2     | Sequential           | 8576   
7  | encoder.input_transform_net.conv2.0   | Conv1d               | 8320   
8  | encoder.input_transform_net.conv2.1   | BatchNorm1d          | 256    
9  | encoder.input_transform_net.conv2.2   | ReLU                 | 0      
10 | encoder.input_transform_net.conv3     | Sequential           | 134144 
11 | e

### (d) Training Script and Overfitting

You can now go to the train script in `train_pointnet_segmentation.py` and fill in the missing pieces as in 2.3 and 2.4. Then, verify that your training works by overfitting to a few samples below.
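
One detail that differs from the classification case is that the loss is now computed per point. The following is a minimal sketch of how that could look, assuming the model returns raw scores of shape `(batch, points, classes)` as in the shape check above; the random tensors are placeholders, and the actual train script may organize this differently.

```python
import torch
import torch.nn as nn

num_classes = 50
predictions = torch.randn(8, 1024, num_classes)     # stand-in for the model output (B, N, C)
targets = torch.randint(0, num_classes, (8, 1024))  # stand-in per-point part labels (B, N)

loss_fn = nn.CrossEntropyLoss()
# Flatten batch and point dimensions so every point counts as one classification sample
loss = loss_fn(predictions.reshape(-1, num_classes), targets.reshape(-1))

# Per-point accuracy: fraction of points whose arg-max class matches the label
accuracy = (predictions.argmax(dim=-1) == targets).float().mean()
print(loss.item(), accuracy.item())
```

An equivalent alternative is to permute the predictions to `(B, C, N)` and pass them to `nn.CrossEntropyLoss` together with the `(B, N)` targets.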

In [21]:
from exercise_2.training import train_pointnet_segmentation
config = {
    'experiment_name': '2_5_pointnet_segmentation_overfitting',
    'device': 'cuda:0',                   # change this to cpu if you do not have a GPU
    'is_overfit': True,                   # True since we're doing overfitting
    'batch_size': 32,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 500,
    'print_every_n': 100,
    'validate_every_n': 100,
}

train_pointnet_segmentation.main(config)  # should be able to get <0.1 loss, >97% accuracy, >0.95 iou

Using device: cuda:0
[049/00001] train_loss: 1.063
[049/00001] val_loss: 0.423, val_accuracy: 86.090%, val_iou: 0.736
[099/00001] train_loss: 0.378
[099/00001] val_loss: 0.266, val_accuracy: 90.088%, val_iou: 0.825
[149/00001] train_loss: 0.247
[149/00001] val_loss: 0.175, val_accuracy: 93.205%, val_iou: 0.868
[199/00001] train_loss: 0.159
[199/00001] val_loss: 0.125, val_accuracy: 95.291%, val_iou: 0.900
[249/00001] train_loss: 0.116
[249/00001] val_loss: 0.104, val_accuracy: 95.880%, val_iou: 0.903
[299/00001] train_loss: 0.108
[299/00001] val_loss: 0.106, val_accuracy: 95.830%, val_iou: 0.909
[349/00001] train_loss: 0.116
[349/00001] val_loss: 0.685, val_accuracy: 83.928%, val_iou: 0.799
[399/00001] train_loss: 0.190
[399/00001] val_loss: 0.097, val_accuracy: 96.191%, val_iou: 0.916
[449/00001] train_loss: 0.095
[449/00001] val_loss: 0.078, val_accuracy: 96.890%, val_iou: 0.920
[499/00001] train_loss: 0.099
[499/00001] val_loss: 0.089, val_accuracy: 96.310%, val_iou: 0.918
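
The log above reports a `val_iou` value next to the accuracy. How the training script computes it is not shown in this notebook; one common definition, sketched here with a hypothetical helper `mean_part_iou`, averages the intersection over union across all parts that occur in either the prediction or the ground truth of a shape.

```python
import numpy as np


def mean_part_iou(pred, target, num_parts):
    """Average IoU over parts present in pred or target (hypothetical helper)."""
    ious = []
    for part in range(num_parts):
        pred_mask = pred == part
        target_mask = target == part
        union = np.logical_or(pred_mask, target_mask).sum()
        if union == 0:
            continue  # part absent from both prediction and ground truth, skip it
        intersection = np.logical_and(pred_mask, target_mask).sum()
        ious.append(intersection / union)
    # If no part occurs at all, treat the (empty) prediction as perfect
    return float(np.mean(ious)) if ious else 1.0


# Toy example with six points and three parts
target = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 0, 1, 2, 2, 2])
iou = mean_part_iou(pred, target, num_parts=50)
print(f'{iou:.3f}')  # 0.722
```

In the toy example the per-part IoUs are 1, 1/2 and 2/3, so the mean is 13/18.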


### (e) Training over the entire training set

Once your overfitting completes successfully, you can move on to training on the entire dataset again.

In [None]:
from exercise_2.training import train_pointnet_segmentation
config = {
    'experiment_name': '2_5_pointnet_segmentation_generalization',
    'device': 'cuda:0',                   # change this to cpu if you do not have a GPU
    'is_overfit': False,
    'batch_size': 32,
    'resume_ckpt': None,
    'learning_rate': 0.001,
    'max_epochs': 10,
    'print_every_n': 100,
    'validate_every_n': 250,
}

train_pointnet_segmentation.main(config)  # Should be able to get > 90% accuracy and > 0.8 iou on the val set

Using device: cuda:0
[000/00099] train_loss: 1.451
[000/00199] train_loss: 0.858
[000/00249] val_loss: 0.634, val_accuracy: 78.849%, val_iou: 0.672
[000/00299] train_loss: 0.746
[001/00019] train_loss: 0.590
[001/00119] train_loss: 0.515
[001/00119] val_loss: 0.445, val_accuracy: 85.615%, val_iou: 0.731
[001/00219] train_loss: 0.472
[001/00319] train_loss: 0.474
[001/00369] val_loss: 0.581, val_accuracy: 81.850%, val_iou: 0.708
[002/00039] train_loss: 0.451
[002/00139] train_loss: 0.399
[002/00239] train_loss: 0.380
[002/00239] val_loss: 0.351, val_accuracy: 87.954%, val_iou: 0.775
[002/00339] train_loss: 0.380
[003/00059] train_loss: 0.359
[003/00109] val_loss: 0.330, val_accuracy: 88.703%, val_iou: 0.788
[003/00159] train_loss: 0.337
[003/00259] train_loss: 0.334
[003/00359] train_loss: 0.344
[003/00359] val_loss: 0.325, val_accuracy: 89.268%, val_iou: 0.797
[004/00079] train_loss: 0.343
[004/00179] train_loss: 0.345
[004/00229] val_loss: 0.341, val_accuracy: 88.828%, val_iou: 0.801


### (f) Inference using the trained model

In [None]:
from exercise_2.inference.infer_pointnet_segmentation import InferenceHandlerPointNetSegmentation
from exercise_2.util.visualization import visualize_pointcloud
from matplotlib import cm, colors
import numpy as np

# create a handler for inference using a trained checkpoint
inferer = InferenceHandlerPointNetSegmentation('exercise_2/runs/2_5_pointnet_segmentation_generalization/model_best.ckpt')

In [None]:
# Get shape point cloud, predict labels, and visualize colored point cloud
shape_points = ShapeNetParts.get_point_cloud_with_labels('02691156/1c4b8662938adf41da2b0f839aba40f9')[0]
point_labels = inferer.infer_single(shape_points)
point_labels = (point_labels - min(point_labels)) / (max(point_labels) - min(point_labels))
point_colors = cm.get_cmap('hsv')(point_labels)[:, :3]
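# Pack RGB into one integer per point (0xRRGGBB), assuming visualize_pointcloud expects that layout
point_colors = np.sum((point_colors * 255).astype(int) * [256*256, 256, 1], axis=1)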
visualize_pointcloud(shape_points.T, colors=point_colors, point_size=0.025, flip_axes=True)

In [None]:
# Get shape point cloud, predict labels, and visualize colored point cloud
shape_points = ShapeNetParts.get_point_cloud_with_labels('03948459/e017cf5dac1e39b013d74211a209ce')[0]
point_labels = inferer.infer_single(shape_points)
point_labels = (point_labels - min(point_labels)) / (max(point_labels) - min(point_labels))
point_colors = cm.get_cmap('hsv')(point_labels)[:, :3]
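# Pack RGB into one integer per point (0xRRGGBB), assuming visualize_pointcloud expects that layout
point_colors = np.sum((point_colors * 255).astype(int) * [256*256, 256, 1], axis=1)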
visualize_pointcloud(shape_points.T, colors=point_colors, point_size=0.025, flip_axes=True)

In [None]:
# Get shape point cloud, predict labels, and visualize colored point cloud
shape_points = ShapeNetParts.get_point_cloud_with_labels('03790512/86b6dc954e1ca8e948272812609617e2')[0]
point_labels = inferer.infer_single(shape_points)
point_labels = (point_labels - min(point_labels)) / (max(point_labels) - min(point_labels))
point_colors = cm.get_cmap('hsv')(point_labels)[:, :3]
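# Pack RGB into one integer per point (0xRRGGBB), assuming visualize_pointcloud expects that layout
point_colors = np.sum((point_colors * 255).astype(int) * [256*256, 256, 1], axis=1)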
visualize_pointcloud(shape_points.T, colors=point_colors, point_size=0.025, flip_axes=True)

Make sure you submit the trained model `exercise_2/runs/2_5_pointnet_segmentation_generalization/model_best.ckpt` in your zip so that we can evaluate it on the test set at our end.

## Submission

This is the end of exercise 2 🙂. Please create a zip containing all files we provided, everything you modified, and all of your generated output/visualization files, including your checkpoints. Name it with your matriculation number(s) as described in exercise 1. Make sure this notebook can be run without problems. Then, submit via Moodle.

**Submission Deadline**: 31.05.2022, 23:55

## References



[1] Qi, C. et al. “Volumetric and Multi-view CNNs for Object Classification on 3D Data.” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016): 5648-5656.

[2] Qi, C. et al. “PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017): 77-85.