# PointNet

In this exercise you will implement a simple version of the PointNet architecture for point cloud processing and train it to classify point clouds from the **ModelNet10** dataset (https://modelnet.cs.princeton.edu/).

First load the necessary dependencies by executing the following code:

In [None]:
#Numpy
import numpy as np
#Library used to load the data
import h5py

#!pip install -q tensorflow-gpu==2.0.0-beta1
try:
  %tensorflow_version 2.x  # Colab only.
except Exception:
  pass

import tensorflow as tf
print(tf.__version__)

`%tensorflow_version` only switches the major version: 1.x or 2.x.
You set: `2.x  # Colab only.`. This will be interpreted as: `2.x`.


TensorFlow 2.x selected.
2.4.0


The first thing we need to do is download the data.

**ModelNet** is a dataset of CAD objects. Two of the most common benchmarks for evaluating how well a network classifies point clouds are **ModelNet10** and **ModelNet40**. ModelNet40 is composed of almost 10k training objects from 40 different classes, and ModelNet10 is composed of almost 4k training objects from 10 different classes.

We have prepared a sampled version of ModelNet10 in which each object is sampled with 512 points. In order to download these files execute the following commands:

In [None]:
!wget https://www.dropbox.com/s/449t6c267kzspfs/modelnet10.zip
!unzip modelnet10.zip

--2021-01-20 09:32:39--  https://www.dropbox.com/s/449t6c267kzspfs/modelnet10.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.67.18, 2620:100:6020:18::a27d:4012
Connecting to www.dropbox.com (www.dropbox.com)|162.125.67.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/449t6c267kzspfs/modelnet10.zip [following]
--2021-01-20 09:32:39--  https://www.dropbox.com/s/raw/449t6c267kzspfs/modelnet10.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://ucfc8fdf0a87139e4e31d1326f69.dl.dropboxusercontent.com/cd/0/inline/BHWZNmQrU7Y-8vA2gd4r2D9B1IieHs3eg3M2gucEjxW1JRMR7jWEcKDeABkYI_zitHCu2EQtGG1ngZnwlFY6UJQwa95OrFCBlDEsf-fh209vXjs3Njx0ehpP1RFjplluKBg/file# [following]
--2021-01-20 09:32:40--  https://ucfc8fdf0a87139e4e31d1326f69.dl.dropboxusercontent.com/cd/0/inline/BHWZNmQrU7Y-8vA2gd4r2D9B1IieHs3eg3M2gucEjxW1JRMR7jWEcKDeABkYI_zitHCu2EQtGG1ngZnwlFY6UJQwa95OrFCBlDEsf-

If everything went well you should see the hdf5 file in the associated files of the notebook. Now we will prepare the data for training.

First we will load the hdf5 binary file and extract the different datasets.

In [None]:
dataset = h5py.File("modelnet10.hdf5", "r")
init_x_train = dataset['train_data'][:]
init_y_train = dataset['train_categories'][:]
x_test = dataset['test_data'][:, :, :]
y_test = dataset['test_categories'][:]

print("Point cloud training:", init_x_train.shape[0])
print("Point cloud testing:", x_test.shape[0])

Point cloud training: 3991
Point cloud testing: 908


To increase the amount of training data and improve generalization, we are going to augment our training set by randomly scaling each model and adding noise to each point coordinate. This prevents the network from memorizing each model in the training set.

In [None]:
#Number of data augmentation passes.
numAugment = 5

#Initialize the random seed.
np.random.seed(0)

#Create the tensor for the augmented data.
x_train = np.zeros((init_x_train.shape[0]*numAugment, 512, 3),
                   dtype=np.float32)
y_train = np.zeros(init_x_train.shape[0]*numAugment, dtype=np.int32)

#For each model.
for curModel in range(init_x_train.shape[0]):
  #For each augmentation pass
  for i in range(numAugment):
    
    #Compute a random scaling in each axis between 0.9 and 1.1
    scaling = np.random.random((3))*0.2+0.9
    x_train[curModel*numAugment + i, :, :] = \
      init_x_train[curModel, :, :].reshape((512,3))*scaling
    
    #Apply gaussian noise to each point coordinate with a stdev of 0.02.
    jittered_data = np.clip(0.02 * np.random.randn(512, 3), -0.1, 0.1)
    x_train[curModel*numAugment + i, :, :] = \
      x_train[curModel*numAugment + i, :, :] + jittered_data
    
    #Save the label for the augmented data.
    y_train[curModel*numAugment + i] = \
      init_y_train[curModel]
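
The same augmentation can also be written without Python loops. The following is a minimal sketch, assuming the training points have shape `[M, 512, 3]` and the labels shape `[M]`; the function name `augment_point_clouds` and the toy arrays are hypothetical, and it uses NumPy's `default_rng` generator, so it will not reproduce the exact random numbers of the loop above.

```python
import numpy as np

def augment_point_clouds(points, labels, num_augment=5, seed=0):
    """Return num_augment scaled and jittered copies of every point cloud.

    points: float array of shape [M, N, 3]; labels: int array of shape [M].
    (Hypothetical helper, not part of the original notebook.)
    """
    rng = np.random.default_rng(seed)
    # Repeat each model num_augment times so its copies are contiguous,
    # matching the index curModel*numAugment + i used in the loop version.
    aug_points = np.repeat(points.astype(np.float32), num_augment, axis=0)
    aug_labels = np.repeat(labels.astype(np.int32), num_augment, axis=0)

    # One random scale per copy and per axis, uniform in [0.9, 1.1).
    scaling = rng.uniform(0.9, 1.1, size=(aug_points.shape[0], 1, 3))
    # Gaussian jitter with a stdev of 0.02, clipped to [-0.1, 0.1].
    jitter = np.clip(0.02 * rng.standard_normal(aug_points.shape), -0.1, 0.1)

    aug_points = aug_points * scaling + jitter
    return aug_points.astype(np.float32), aug_labels

# Toy data standing in for init_x_train / init_y_train.
toy_points = np.random.default_rng(1).uniform(-1.0, 1.0, size=(4, 512, 3))
toy_labels = np.array([0, 3, 3, 7])
aug_x, aug_y = augment_point_clouds(toy_points, toy_labels)
print(aug_x.shape, aug_y.shape)
```

Broadcasting the `[M*5, 1, 3]` scaling array over the point axis applies the same per-axis scale to all 512 points of a copy, which is what the loop does one model at a time.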

At this point we should have our dataset ready and we can start creating our PointNet network. The original paper contains multiple layers and transformation networks. However, in this exercise we will implement a much simpler version.

1.   First, we will create an MLP with two hidden layers (64 and 128 outputs) that transforms each point into a 128-dimensional feature vector. (Each layer: Dense + BatchNorm + ReLU.)
2.   Then, we will aggregate all the points using a max pooling operation. For this we will use the low-level API function `tf.reduce_max`, which reduces a tensor along a dimension using the max operation.
3.   Lastly, we will apply another MLP to this global feature vector with two layers (32 and 10 outputs), the last one followed by a softmax activation function. We will apply dropout in this MLP with a rate of 0.5.
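
To see why this design suits point clouds, it may help to check that a shared per-point layer followed by max pooling does not depend on the order of the points. The following is a minimal NumPy sketch, not the Keras model itself: the weights are random stand-ins, batch normalization is omitted, and `per_point.max(axis=1)` plays the role of `tf.reduce_max(x, axis=1)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: 2 point clouds with 5 points each (the exercise uses 512).
points = rng.normal(size=(2, 5, 3))

# Hypothetical weights of a single shared layer mapping 3 -> 4 features.
weights = rng.normal(size=(3, 4))
bias = rng.normal(size=(4,))

def global_feature(pts):
    # The same weights are applied to every point (matmul over the last
    # axis), which is how a Dense layer treats a [B, N, 3] input.
    per_point = np.maximum(pts @ weights + bias, 0.0)  # ReLU, [B, N, 4]
    # Max over the point axis gives one feature vector per cloud.
    return per_point.max(axis=1)  # [B, 4]

feat = global_feature(points)

# Shuffle the order of the points inside each cloud and recompute.
shuffled = points[:, rng.permutation(5), :]
feat_shuffled = global_feature(shuffled)

print(feat.shape)
print(np.allclose(feat, feat_shuffled))
```

Because the maximum over a set ignores ordering, the global feature is the same for any permutation of the input points.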

In [None]:
inputs = tf.keras.Input(shape=(512, 3), name='batch_point_cloud')

################# TODO 
#MLP to transform points to 64 dimensions
x = tf.keras.layers.Dense(64)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)

#MLP to transform points to 128 dimensions
x = tf.keras.layers.Dense(128)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)

#Max aggregation function
#At this point x has the shape [B, 512, 128], where B is the current batch size.
#We want to end with a global feature tensor with shape [B,128] getting the 
#maximum value for each of the 128 features along the 512 points (axis 1). Use
#the low-level API function tf.reduce_max for that.
x = tf.reduce_max(x , axis=1)

#Last MLP
x = tf.keras.layers.Dense(32)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)
x = tf.keras.layers.Dropout(0.5)(x)
################# END TODO 

outputs = tf.keras.layers.Dense(10, activation='softmax')(x)

Now we can train our model by executing the following commands. If everything went well, we should achieve an accuracy of around 90%.

In [None]:
#Create the model.
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='modelnet10_model')

#Compile the model.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.SGD(
                  learning_rate=0.0001, 
                  momentum=0.98),
              metrics=['accuracy'])

#Fit the model to the data.
model.fit(x_train, y_train,
          batch_size=32,
          epochs=40,
          validation_data=(x_test, y_test))

#Evaluate the model on the test data.
model.evaluate(x_test, y_test, verbose=0)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


[0.2551615238189697, 0.9096916317939758]
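
The two numbers returned by `model.evaluate` are the loss followed by the accuracy metric. As a rough illustration of what they measure, here is a minimal NumPy sketch of sparse categorical cross-entropy and accuracy on a hand-made set of softmax outputs. The function names and toy values are hypothetical, and the clipping constant only approximates what Keras does internally.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    # Mean negative log-probability assigned to the correct class.
    # labels: int array [B]; probs: softmax outputs [B, C].
    probs = np.clip(probs, eps, 1.0)
    picked = probs[np.arange(labels.shape[0]), labels]
    return float(-np.log(picked).mean())

def accuracy(labels, probs):
    # Fraction of samples whose most probable class equals the label.
    return float((probs.argmax(axis=-1) == labels).mean())

# Toy softmax outputs for 4 samples and 3 classes (made-up values).
toy_probs = np.array([[0.70, 0.20, 0.10],
                      [0.10, 0.80, 0.10],
                      [0.30, 0.40, 0.30],
                      [0.25, 0.25, 0.50]])
toy_labels = np.array([0, 1, 2, 2])

toy_loss = sparse_categorical_crossentropy(toy_labels, toy_probs)
toy_acc = accuracy(toy_labels, toy_probs)
print(toy_loss, toy_acc)
```

The labels are plain integers rather than one-hot vectors, which is why the notebook uses the `sparse_` variant of the loss.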



# PointNet (Segmentation)

In this section, we will implement a version of the PointNet architecture used to segment point clouds from the **ShapeNet Part** dataset (https://shapenet.cs.stanford.edu/iccv17/). This dataset is composed of 16K models from 16 different categories with 50 different object parts in total.

The following command will download the preprocessed dataset.

In [None]:
!wget https://www.dropbox.com/s/taos2s389ikli6d/shapenet.zip
!unzip shapenet.zip

--2021-01-20 09:35:52--  https://www.dropbox.com/s/taos2s389ikli6d/shapenet.zip
Resolving www.dropbox.com (www.dropbox.com)... 162.125.67.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.67.18|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/taos2s389ikli6d/shapenet.zip [following]
--2021-01-20 09:35:52--  https://www.dropbox.com/s/raw/taos2s389ikli6d/shapenet.zip
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc557e3e614ad264ffa298311a56.dl.dropboxusercontent.com/cd/0/inline/BHX7t6VoQue-tv1KMThIhN9tsMUlYEXc4kDynL-BPKPn3CsfVM8NuvS2wLWTFfDzzMZrX9_ffT7gxMTywwpxF5g-WtpmSSM1O3vUuNaRfZedDtFzyVi5rU13q8xkP0JHXfM/file# [following]
--2021-01-20 09:35:53--  https://uc557e3e614ad264ffa298311a56.dl.dropboxusercontent.com/cd/0/inline/BHX7t6VoQue-tv1KMThIhN9tsMUlYEXc4kDynL-BPKPn3CsfVM8NuvS2wLWTFfDzzMZrX9_ffT7gxMTywwpxF5g-WtpmSSM1O3vUuNaRfZed

Once downloaded, we will prepare the data for training:

In [None]:
dataset_shapenet = h5py.File("shapenet.hdf5", "r")

#Training data.
x_train = dataset_shapenet['train_data'][:] # 3D point coordinates.
y_train = dataset_shapenet['train_labels'][:] # Point label (0-49).

#Validation data.
x_val = dataset_shapenet['val_data'][:] # 3D point coordinates.
y_val = dataset_shapenet['val_labels'][:] # Point label (0-49).

#Test data.
x_test = dataset_shapenet['test_data'][:] # 3D point coordinates.
y_test = dataset_shapenet['test_labels'][:] # Point label (0-49).

print("Point cloud training:", x_train.shape[0])
print("Point cloud validation:", x_val.shape[0])
print("Point cloud testing:", x_test.shape[0])

Point cloud training: 12137
Point cloud validation: 1870
Point cloud testing: 2874


Now we are ready to create our network architecture. We are going to design an architecture similar to the one in the previous section, but here we will use it for point cloud segmentation. Therefore, we will need to make some changes.

Since we are making predictions per point, we will need to concatenate the global descriptor obtained by max pooling with the individual features of each point. As individual features we will use the 128 features of the second layer after applying the activation function.

Once we concatenate the local and global information, we will process these features by another hidden layer of 128 features before predicting the final probabilities.
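
The shape bookkeeping of this step can be sketched with NumPy before writing it in TensorFlow. This is a minimal illustration using random stand-in features; `np.tile` and `np.concatenate` play the roles of `tf.tile` and `tf.concat`, and the variable names are hypothetical.

```python
import numpy as np

batch, num_points, num_feats = 2, 512, 128
rng = np.random.default_rng(0)

# Stand-in for the per-point (local) features after the second layer.
per_point = rng.normal(size=(batch, num_points, num_feats))   # [B, 512, 128]

# Global descriptor: maximum of each feature over all points.
global_feat = per_point.max(axis=1)                           # [B, 128]

# Add a point axis and replicate the descriptor once per point.
tiled = np.tile(global_feat[:, np.newaxis, :],
                (1, num_points, 1))                           # [B, 512, 128]

# Every point now carries the global context next to its own features.
combined = np.concatenate([tiled, per_point], axis=2)         # [B, 512, 256]
print(combined.shape)
```

After the concatenation each point has 256 features: the first 128 are identical for all points of a cloud, and the last 128 are specific to the point.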

In [None]:
inputs = tf.keras.Input(shape=(512, 3), name='batch_point_cloud')

################# TODO 
#MLP to transform points to 64 dimensions
x = tf.keras.layers.Dense(64)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)

#MLP to transform points to 128 dimensions
x = tf.keras.layers.Dense(128)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)
#Keep the per-point (local) features for the concatenation below.
y = x

#Max aggregation function
#At this point x has the shape [B, 512, 128], where B is the current batch size.
#We want to end with a global feature tensor with shape [B,128] getting the 
#maximum value for each of the 128 features along the 512 points (axis 1). Use
#the low-level API function tf.reduce_max for that.
x = tf.reduce_max(x, axis=1)

#Concatenate
#Here we have to concatenate the global feature x and the per-point features y.
#For that we first need to reshape x to [B, 1, 128] and then use tf.tile to
#replicate it 512 times along axis 1.
z = tf.keras.layers.Reshape((1, 128))(x)
x = tf.tile(z, [1, 512, 1])
x = tf.concat([x, y], axis=2)

#Last MLP
x = tf.keras.layers.Dense(128)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)
################# END TODO 

outputs = tf.keras.layers.Dense(50, activation='softmax')(x)

Lastly, we will train the model with the **ShapeNet Part** dataset. You should obtain an accuracy of around 84% on the test set.

In [None]:
#Create the model.
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='shapenet_model')

#Compile the model.
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.SGD(
                  learning_rate=0.0001, 
                  momentum=0.98),
              metrics=['accuracy'])

#Fit the model to the data.
model.fit(x_train, y_train,
          batch_size=32,
          epochs=40,
          validation_data=(x_val, y_val))

#Evaluate the model on the test data.
model.evaluate(x_test, y_test, verbose=0)

Epoch 1/40
Epoch 2/40
Epoch 3/40
Epoch 4/40
Epoch 5/40
Epoch 6/40
Epoch 7/40
Epoch 8/40
Epoch 9/40
Epoch 10/40
Epoch 11/40
Epoch 12/40
Epoch 13/40
Epoch 14/40
Epoch 15/40
Epoch 16/40
Epoch 17/40
Epoch 18/40
Epoch 19/40
Epoch 20/40
Epoch 21/40
Epoch 22/40
Epoch 23/40
Epoch 24/40
Epoch 25/40
Epoch 26/40
Epoch 27/40
Epoch 28/40
Epoch 29/40
Epoch 30/40
Epoch 31/40
Epoch 32/40
Epoch 33/40
Epoch 34/40
Epoch 35/40
Epoch 36/40
Epoch 37/40
Epoch 38/40
Epoch 39/40
Epoch 40/40


[0.5521785616874695, 0.8241738677024841]
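
For segmentation the network outputs one probability vector per point, so the accuracy above is, as far as the Keras metric is concerned, an average over all points of all clouds. The following minimal NumPy sketch shows how per-point labels and accuracies could be derived from such outputs. The random arrays stand in for `model.predict(x_test)` and `y_test`, and the labels are assumed to have shape `[B, 512]`.

```python
import numpy as np

rng = np.random.default_rng(0)
num_parts = 50

# Stand-in for the model output: per-point softmax scores [B, 512, 50].
logits = rng.normal(size=(3, 512, num_parts))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Stand-in for the ground-truth part label of every point, shape [B, 512].
true_labels = rng.integers(0, num_parts, size=(3, 512))

# Predicted part for each point: the most probable of the 50 classes.
pred_labels = probs.argmax(axis=-1)                  # [B, 512]

# Accuracy over all points, and separately for each point cloud.
correct = pred_labels == true_labels
point_accuracy = correct.mean()
per_cloud_accuracy = correct.mean(axis=1)            # [B]
print(pred_labels.shape, per_cloud_accuracy.shape)
```

Looking at the per-cloud values can reveal whether a moderate overall accuracy comes from a few badly segmented models or from errors spread evenly across the test set.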