# Tutorial: merging camera data and extracting camera features for WOMD

This tutorial shows how to add camera tokens to the original WOMD (Waymo Open Motion Dataset) scenarios and how to extract camera features from the merged scenario proto message. Loading the LiDAR data is covered separately in the tutorial `tutorial_womd_lidar.ipynb`.

## Install

To run Jupyter Notebook locally:

```
python3 -m pip install waymo-open-dataset-tf-2-12-0==1.6.4
python3 -m pip install "notebook>=5.3" "ipywidgets>=7.5"
python3 -m pip install --upgrade "jupyter_http_over_ws>=0.0.7" && \
jupyter serverextension enable --py jupyter_http_over_ws
jupyter notebook
```

In [0]:
import os
import numpy as np
import tensorflow as tf
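# Eager execution is already the default in TF 2.x; the TF 1.x-style call
# lives under tf.compat.v1 there.
tf.compat.v1.enable_eager_execution()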

from waymo_open_dataset import dataset_pb2
from waymo_open_dataset.protos import scenario_pb2
from waymo_open_dataset.utils import womd_camera_utils

# Augmenting a WOMD scenario

To augment an original WOMD scenario with camera data for the input frames, follow three steps:
1. Load a scenario proto message from the motion data (here, the first one in the file) and read its `scenario_id` field.
2. Find the corresponding frame camera data file, which is named `{scenario_id}.tfrecord`.
3. Load the frame camera data file, which contains a scenario proto in which only the `frame_camera_tokens` field is populated, and merge it into the `frame_camera_tokens` field of the original scenario proto.
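
Step 2 amounts to building a file path from the scenario ID. A minimal sketch, assuming the camera data files sit in a single local directory; the helper name `camera_file_for_scenario` and the directory `/data/womd_camera` are hypothetical and not part of the Waymo Open Dataset API:

```python
import os


def camera_file_for_scenario(camera_dir: str, scenario_id: str) -> str:
  """Returns the expected camera data path, {camera_dir}/{scenario_id}.tfrecord."""
  return os.path.join(camera_dir, f'{scenario_id}.tfrecord')


# Hypothetical directory; the scenario ID is the one this tutorial uses.
path = camera_file_for_scenario('/data/womd_camera', 'ee519cf571686d19')
print(path)
```

In practice `scenario_id` would come from the loaded scenario proto rather than a hard-coded string.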

In [0]:
def _load_scenario_data(tfrecord_file: str) -> scenario_pb2.Scenario:
  """Load a scenario proto from a tfrecord dataset file."""
  dataset = tf.data.TFRecordDataset(tfrecord_file, compression_type='')
  data = next(iter(dataset))
  return scenario_pb2.Scenario.FromString(data.numpy())

WOMD_FILE = '/content/waymo-od/tutorial/womd_scenario_input.tfrecord'
womd_original_scenario = _load_scenario_data(WOMD_FILE)
print(f'Loaded a scenario with the scenario_id {womd_original_scenario.scenario_id}')

In [0]:
# The corresponding compressed camera data file has the name
# {scenario_id}.tfrecord. For simplicity, we rename the corresponding camera
# data file 'ee519cf571686d19.tfrecord' to be
# 'womd_lidar_and_camera_data.tfrecord'.
CAMERA_DATA_FILE = '/content/waymo-od/tutorial/womd_lidar_and_camera_data.tfrecord'
womd_camera_scenario = _load_scenario_data(CAMERA_DATA_FILE)
scenario_augmented = womd_camera_utils.add_camera_tokens_to_scenario(
    womd_original_scenario, womd_camera_scenario)
print(f'#frames = {len(scenario_augmented.frame_camera_tokens)}')
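
As a quick sanity check on a merged scenario, one can list how many tokens each camera contributes per frame. The sketch below uses a hypothetical helper, `summarize_camera_tokens`, which relies only on the `frame_camera_tokens`, `camera_tokens`, `camera_name`, and `tokens` fields used in this tutorial. It is demonstrated on a stand-in object with made-up values rather than on real WOMD data:

```python
from types import SimpleNamespace


def summarize_camera_tokens(scenario):
  """Returns (frame_index, camera_name, num_tokens) for every camera entry."""
  summary = []
  for frame_index, frame in enumerate(scenario.frame_camera_tokens):
    for camera in frame.camera_tokens:
      summary.append((frame_index, camera.camera_name, len(camera.tokens)))
  return summary


# Stand-in with the same attribute layout as the scenario proto (illustrative
# values only, not real WOMD data).
fake_scenario = SimpleNamespace(frame_camera_tokens=[
    SimpleNamespace(camera_tokens=[
        SimpleNamespace(camera_name=1, tokens=list(range(256))),
        SimpleNamespace(camera_name=2, tokens=list(range(256))),
    ]),
])
for row in summarize_camera_tokens(fake_scenario):
  print(row)
```

Calling `summarize_camera_tokens(scenario_augmented)` on the tutorial data should work the same way, since the helper only reads those four fields.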

# Extract camera features

The camera data in the WOMD proto files is a sequence of integers for each frame and each sensor. Each sensor image is encoded as 256 integers (tokens), and each integer is the row index of an entry in a pre-trained codebook. We provide the codebook with the tutorial and show how to extract the corresponding camera features.
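
Conceptually, turning tokens into features is a row lookup: token `t` selects row `t` of the codebook. The following is a minimal NumPy sketch of that idea on a toy codebook. The codebook size (8 rows, 4 columns) and the helper name `lookup_embeddings` are made up for illustration, and this is not necessarily how `womd_camera_utils.get_camera_embedding_from_codebook` is implemented:

```python
import numpy as np


def lookup_embeddings(codebook: np.ndarray, tokens: np.ndarray) -> np.ndarray:
  """Selects one codebook row per token via integer array indexing."""
  return codebook[tokens]


# Toy codebook: 8 entries, each a 4-dimensional vector (hypothetical sizes).
toy_codebook = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)
toy_tokens = np.array([3, 0, 3, 7], dtype=int)

toy_embedding = lookup_embeddings(toy_codebook, toy_tokens)
print(toy_embedding.shape)  # (4, 4): one codebook row per token
```

Repeated tokens (here, `3` twice) map to identical rows, which is what makes the token sequence a compact encoding of the image features.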

In [0]:
WOMD_CAMERA_CODEBOOK_FILE = '/content/waymo-od/tutorial/womd_camera_codebook.npy'
womd_camera_codebook = np.load(WOMD_CAMERA_CODEBOOK_FILE)

cur_frame_index = 0
for camera_tokens in scenario_augmented.frame_camera_tokens[cur_frame_index].camera_tokens:
  print(f'Camera name = {camera_tokens.camera_name}')
  tokens = np.array(camera_tokens.tokens, dtype=int)
  embedding = womd_camera_utils.get_camera_embedding_from_codebook(
      womd_camera_codebook, tokens
  )
  print(f'Embedding shape = {embedding.shape}')