
## Installation

To run Jupyter notebook locally:

```
python3 -m pip install gcsfs waymo-open-dataset-tf-2-12-0==1.6.4
python3 -m pip install "notebook>=5.3" "ipywidgets>=7.5"
python3 -m pip install --upgrade "jupyter_http_over_ws>=0.0.7" && \
jupyter serverextension enable --py jupyter_http_over_ws
jupyter notebook
```

# Overview

The Waymo Open Dataset (WOD), a comprehensive self-driving dataset, has recently undergone a major update to its format. The new format is based on the [Apache Parquet column-oriented file format](https://parquet.apache.org/docs/file-format/). This format separates the data into multiple tables, allowing users to selectively download the portion of the dataset needed for their specific use case. This modular format offers a significant advantage over the previous format by reducing the amount of data that needs to be downloaded and processed, saving time and resources.

This tutorial explores the benefits of the new format and shows how to effectively access and work with selected parts of the WOD.

Whether you are an experienced data scientist or just starting out in the field of autonomous driving, this tutorial will provide you with the information and tools you need to prepare a training dataset derived from WOD for your model.

## Navigating this tutorial

Text and code cells in this notebook use headers and cell titles which form a hierarchy to help navigate this tutorial:

- Overview
- Usage examples
  - Initial setup
  - Basic Example
  - Joining components for different applications
    - Scene or track level
      - Per-object trajectories
    - Frame level
      - Sensor data with both lidar and camera boxes
      - Sensor data with lidar and camera boxes if available
    - Object level
      - Camera image, lidar, boxes with keypoints
- Dataset Format
  - Folder structure
  - Loading raw data
  - Supported components
  - Object-oriented API
  - A relational database-like structure


Cell titles are meaningful within this hierarchy, e.g. "Joining components for different applications > Frame level > Sensor data ...". Use the "Table of contents" panel in Google Colab to see it.

The first section, "Usage examples", contains a number of motivating examples; feel free
to jump to "Dataset Format" if you'd like to learn about the technical details
first.

## v2 dataset format

Previous releases of the WOD used binary `Frame` protocol buffers serialized into TFRecord files. From now on, we refer to this data format as v1. With WOD `v2.0.0.alpha`, we are launching the v2 format of the dataset. Note that column values in v2-supported components are the same as the corresponding proto fields in `v1.4.2`. Refer to the section "List columns in all components" for the full list of supported components, and to the code under `waymo_open_dataset/v2/perception/compat_v1/` to learn more about how this correspondence is defined.
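The exact mapping is defined in `compat_v1`. As a rough, hypothetical illustration of how nested proto fields can correspond to flat column names such as `[CameraImageComponent].velocity.linear_velocity.x`, the sketch below flattens a nested Python dict. `flatten_fields` is our own illustrative name, not a library function, and we have not checked it against the real conversion code.

```python
from typing import Any, Dict


def flatten_fields(component: str, fields: Dict[str, Any]) -> Dict[str, Any]:
  """Flattens nested fields into '[Component].a.b'-style column names.

  Hypothetical helper for illustration only; it is not part of the
  waymo_open_dataset package.
  """
  columns: Dict[str, Any] = {}

  def visit(prefix: str, value: Any) -> None:
    # Recurse into nested dicts; leaves become individual columns.
    if isinstance(value, dict):
      for name, child in value.items():
        visit(f'{prefix}.{name}', child)
    else:
      columns[prefix] = value

  visit(f'[{component}]', fields)
  return columns


# A proto-like nested record (values rounded from the camera image example).
velocity = {'linear_velocity': {'x': 1.337, 'y': -1.359, 'z': 0.027}}
print(flatten_fields('CameraImageComponent', {'velocity': velocity}))
```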

## v2 API

The v2 format was designed to be usable with any library that supports reading the Apache Parquet file format, without any extra dependencies (not even protocol buffers). See the "Loading raw data" section for an example.

We provide a collection of convenience functions and dataclasses, the main one being `v2.merge`: a small wrapper around `DataFrame.merge` that exploits the v2 format conventions to significantly simplify merging multiple components into a single DataFrame object. To learn more, refer to the docstring of the `merge` function in `waymo_open_dataset/v2/dataframe_utils.py`, and see `waymo_open_dataset/v2/__init__.py` for the full surface of the v2 object-oriented API for WOD.
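To make the idea behind `v2.merge` concrete, here is a simplified, hypothetical sketch using plain pandas: it joins two component tables on every `key.` column they have in common. It deliberately ignores the grouping (`left_group`/`right_group`) and nullable options, and it is not the library's implementation; `merge_on_shared_keys` and the toy tables are our own.

```python
import pandas as pd


def merge_on_shared_keys(left: pd.DataFrame, right: pd.DataFrame,
                         how: str = 'inner') -> pd.DataFrame:
  """Joins two component tables on all `key.` columns they share.

  Hypothetical, simplified stand-in for v2.merge (no grouping, no
  nullable handling).
  """
  shared_keys = [
      c for c in left.columns if c.startswith('key.') and c in right.columns
  ]
  if not shared_keys:
    raise ValueError('No shared key columns to join on.')
  return left.merge(right, on=shared_keys, how=how)


# Toy tables: two camera images and three boxes labeled on them.
images = pd.DataFrame({
    'key.segment_context_name': ['ctx', 'ctx'],
    'key.frame_timestamp_micros': [1, 1],
    'key.camera_name': [1, 2],
    '[CameraImageComponent].image': [b'img1', b'img2'],
})
boxes = pd.DataFrame({
    'key.segment_context_name': ['ctx', 'ctx', 'ctx'],
    'key.frame_timestamp_micros': [1, 1, 1],
    'key.camera_name': [1, 1, 2],
    'key.camera_object_id': ['a', 'b', 'c'],
    '[CameraBoxComponent].type': [1, 2, 1],
})

# Joins on segment, timestamp and camera name; camera_object_id is box-only.
merged = merge_on_shared_keys(images, boxes)
print(merged[['key.camera_name', 'key.camera_object_id']])
```

Dask DataFrames expose a `merge` method with the same `on` and `how` parameters, so the same pattern should carry over to lazily loaded tables.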

# Usage examples

In [0]:
#@title Initial setup
import warnings
# Suppress FutureWarnings emitted by PyArrow, which Dask uses under the hood.
warnings.simplefilter(action='ignore', category=FutureWarning)


import tensorflow as tf
import dask.dataframe as dd
from waymo_open_dataset import v2


# Path to the directory with all components
dataset_dir = '<specify actual path>'

context_name = '10023947602400723454_1120_000_1140_000'

def read(tag: str) -> dd.DataFrame:
  """Creates a Dask DataFrame for the component specified by its tag."""
  paths = tf.io.gfile.glob(f'{dataset_dir}/{tag}/{context_name}.parquet')
  return dd.read_parquet(paths)


In [0]:
# @title Basic Example (Camera images with labels)

# Lazily read camera images and boxes.
cam_image_df = read('camera_image')
cam_box_df = read('camera_box')

# Combine DataFrames for individual components into a single DataFrame.

# Rows of cam_box_df will be grouped, so each row will have a camera image
# and all associated boxes.
image_w_box_df = v2.merge(cam_image_df, cam_box_df, right_group=True)

# Show raw data
image_w_box_df.head()

# Example of how to access data fields via the v2 object-oriented API
print(f'Available {image_w_box_df.shape[0].compute()} rows:')
for i, (_, r) in enumerate(image_w_box_df.iterrows()):
  # Create component dataclasses for the raw data
  cam_image = v2.CameraImageComponent.from_dict(r)
  cam_box = v2.CameraBoxComponent.from_dict(r)
  print(
      f'context_name: {cam_image.key.segment_context_name}'
      f' ts: {cam_image.key.frame_timestamp_micros}'
      f' camera_name: {cam_image.key.camera_name}'
      f' image size: {len(cam_image.image)} bytes.'
      f' Has {len(cam_box.key.camera_object_id)} camera labels:'
  )

  for j, (object_id, x, y) in enumerate(zip(
      cam_box.key.camera_object_id, cam_box.box.center.x, cam_box.box.center.y
  )):
    print(f'\tid: {object_id},  center: ({x:.1f}, {y:.1f}) px')
    if j > 2:
      print('\t...')
      break
  if i > 2:
    print('...')
    break

Available 979 rows:
context_name: 10023947602400723454_1120_000_1140_000 ts: 1552440195362591 camera_name: 1 image size: 387998 bytes. Has 50 camera labels:
	id: 02f3a769-021f-49b0-b954-18e4fa1d5fde,  center: (900.0, 665.7) px
	id: 0611ea50-f652-406d-942a-9ec7e3e90546,  center: (1303.6, 697.9) px
	id: 08b1e19e-b912-4963-a7bb-55e138bbc25e,  center: (1828.5, 741.5) px
	id: 11165fc3-e936-4722-9257-1574c6752eab,  center: (951.2, 658.7) px
	...
context_name: 10023947602400723454_1120_000_1140_000 ts: 1552440195362591 camera_name: 2 image size: 412300 bytes. Has 11 camera labels:
	id: 0ec04ccf-c924-45b1-ab54-cbc82a742877,  center: (703.9, 1108.3) px
	id: 144c7855-9e09-4bda-bb7a-c52a06d3075f,  center: (953.7, 628.4) px
	id: 17742b3c-df43-42a6-ba81-c67474d0990b,  center: (1052.9, 665.7) px
	id: 648739d1-b4d9-467a-8274-a68cbe294d48,  center: (1479.0, 1052.6) px
	...
context_name: 10023947602400723454_1120_000_1140_000 ts: 1552440195362591 camera_name: 4 image size: 193053 bytes. Has 1 camera la

## Joining components for different applications

Different models often require custom data preparation pipelines because each model has unique data requirements that need to be met for optimal performance. For example, one model may require lidar data to be preprocessed to remove noise or to keep only a fixed number of points, while another may require camera images to be processed for object detection.

This section shows how to prepare input data for preprocessing pipelines operating at different levels: entire run segment, scene, frame, camera or a crop for a specific object.

NOTE: Depending on the level or type of the JOIN operation used to define a table, a single row of the table may contain repeated values for some parts of the data that refer to the same original record (e.g. camera images or lidar data repeated for each object). See the examples below for details.
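A tiny pandas sketch of this effect, using made-up toy tables rather than real WOD data: joining one image with three boxes at the object level repeats the image bytes once per box. Dropping duplicates on the key columns is one way to recover a single row per image.

```python
import pandas as pd

# Toy tables (hypothetical data): one camera image and three boxes on it.
images = pd.DataFrame({
    'key.camera_name': [1],
    'image': [b'jpeg-bytes'],
})
boxes = pd.DataFrame({
    'key.camera_name': [1, 1, 1],
    'key.camera_object_id': ['a', 'b', 'c'],
})

# An object-level join repeats the same image once per box.
objects = images.merge(boxes, on='key.camera_name')
print(objects)

# Recover one row per image by dropping duplicates on the key column.
unique_images = objects.drop_duplicates(subset=['key.camera_name'])[
    ['key.camera_name', 'image']
]
print(unique_images)
```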

### Scene or track level

At this level each row of a table represents data for a single object across all frames in a scene.
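As a toy illustration of the `groupby(...).agg(list)` pattern used in the next cell, the sketch below runs it on a few hand-written rows (hypothetical values, not real WOD data). Each resulting row holds one object, and every non-key column becomes a list. The lists follow the input row order, so we sort by timestamp first; we are not certain the real tables are already sorted.

```python
import pandas as pd

# Toy lidar boxes (hypothetical data): two objects seen on a few frames.
boxes = pd.DataFrame({
    'key.segment_context_name': ['ctx'] * 5,
    'key.laser_object_id': ['obj_a', 'obj_b', 'obj_a', 'obj_a', 'obj_b'],
    'key.frame_timestamp_micros': [100, 100, 200, 300, 200],
    '[LiDARBoxComponent].box.center.x': [10.0, 5.0, 10.5, 11.0, 5.2],
})

# Sort by time so each per-object list is in temporal order, then collapse
# all rows of an object into one row whose other columns are lists.
tracks = (
    boxes.sort_values('key.frame_timestamp_micros')
    .groupby(['key.segment_context_name', 'key.laser_object_id'])
    .agg(list)
    .reset_index()
)
print(tracks)
```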

In [0]:
# @title Per-object trajectories

# Lazily read lidar boxes
lidar_box_df = read('lidar_box')
# Group all available boxes per object
lidar_box_df = (
    lidar_box_df.groupby(['key.segment_context_name', 'key.laser_object_id'])
    .agg(list)
    .reset_index()
)


# Read a single row, which contains data for all available frames.
_, row = next(iter(lidar_box_df.iterrows()))

# Create component object
lidar_box = v2.LiDARBoxComponent.from_dict(row)
print(
    f'Object {lidar_box.key.laser_object_id=} seen on'
    f' {len(lidar_box.key.frame_timestamp_micros)} frames'
)
print('Its trajectory across frames:')
print(f'\t{lidar_box.box.center.x=}')
print(f'\t{lidar_box.box.center.y=}')
print(f'\t{lidar_box.box.center.z=}')

Object lidar_box.key.laser_object_id='-ItvfksmEcYtVEcOjjRESg' seen on 90 frames
Its trajectory across frames:
	lidar_box.box.center.x=[23.267635429570873, 23.249028348451247, 23.218535695019455, 23.141356138959964, 23.041015064622115, 22.943379720318262, 22.837007592977898, 22.699640942439146, 22.55271614263438, 22.406182537455607, 22.255760798618212, 22.104388865565852, 21.926845604939444, 21.74289930834675, 21.541009551732714, 21.31035498363417, 21.068154290935126, 20.826015581797037, 20.586888317488047, 20.34230145818219, 20.08004165622333, 19.81921650300501, 19.564810605319508, 19.316562933598107, 19.065521244701813, 18.822356058548394, 18.593808704348703, 18.38374361586102, 18.177806150901233, 18.000751103629227, 17.84484811807488, 17.69976946847237, 17.564665781314034, 17.441918124226504, 17.338055039508617, 17.242434248302743, 17.160077564167295, 17.092516316946785, 17.039343006006675, 16.998662457359387, 16.976064796856008, 16.966728632773084, 16.965259847673224, 16.9837279791

### Frame level

At this level each row of a table represents a single frame with all objects in it.

In [0]:
#@title Sensor data with both lidar and camera boxes

# Lazily read DataFrames for all components.
association_df = read('camera_to_lidar_box_association')
cam_box_df = read('camera_box')
cam_img_df = read('camera_image')
lidar_box_df = read('lidar_box')
lidar_df = read('lidar')

# Join all DataFrames using matching columns
cam_image_w_box_df = v2.merge(cam_box_df, cam_img_df)
cam_obj_df = v2.merge(association_df, cam_image_w_box_df)
obj_df = v2.merge(cam_obj_df, lidar_box_df)
# Group lidar sensors (left), group labels and camera images (right) and join.
df = v2.merge(lidar_df, obj_df, left_group=True, right_group=True)

# Read a single row, which contains all data for a single frame.
_, row = next(iter(df.iterrows()))
# Create all component objects
camera_image = v2.CameraImageComponent.from_dict(row)
lidar = v2.LiDARComponent.from_dict(row)
camera_box = v2.CameraBoxComponent.from_dict(row)
lidar_box = v2.LiDARBoxComponent.from_dict(row)

print(
    f'Found {len(lidar_box.key.laser_object_id)} objects on'
    f' {lidar.key.segment_context_name=} {lidar.key.frame_timestamp_micros=}'
)
for laser_object_id, camera_object_id, camera_name in zip(
    lidar_box.key.laser_object_id,
    camera_box.key.camera_object_id,
    camera_image.key.camera_name,
):
  print(f'\t{laser_object_id=} {camera_object_id=} {camera_name=}')

Found 24 objects on lidar.key.segment_context_name='10023947602400723454_1120_000_1140_000' lidar.key.frame_timestamp_micros=1552440195362591
	laser_object_id='ZyK_iICxQsEYdJLxMFTw7w' camera_object_id='08b1e19e-b912-4963-a7bb-55e138bbc25e' camera_name=1
	laser_object_id='ZyK_iICxQsEYdJLxMFTw7w' camera_object_id='da518036-17cb-48c4-97aa-97939f989d16' camera_name=3
	laser_object_id='dNW3S4yA8s8GmltmAPV8LQ' camera_object_id='24508c28-3154-4b7f-964d-70b3ed7d5a9d' camera_name=1
	laser_object_id='8IZ7fkXm0FDeUdEuJAqSlA' camera_object_id='260f57c0-787d-4f36-bfe9-64ca9828448f' camera_name=1
	laser_object_id='brLOJzgVMhP_-kdQzIa7ng' camera_object_id='30326915-a157-4c60-9bef-0a188938c998' camera_name=1
	laser_object_id='txvVqORJ6Gcf510DdJvW4Q' camera_object_id='4286a233-4a34-4d99-b214-2d8690e92570' camera_name=1
	laser_object_id='L5HBM5tSKtNA4qHegDQk8Q' camera_object_id='4b9a9206-fbf1-4d45-8765-197759100bd6' camera_name=1
	laser_object_id='YTw2k0_-8UqolJn4WQRc2g' camera_object_id='699b6cb3-afc1-

In the example above, the same camera image is repeated for every object seen by the camera with the same `camera_name`.

In [0]:
# @title Sensor data with lidar and camera boxes if available

# Lazily read DataFrames for all components.
association_df = read('camera_to_lidar_box_association')
cam_box_df = read('camera_box')
cam_img_df = read('camera_image')
lidar_box_df = read('lidar_box')
lidar_df = read('lidar')

# Join all DataFrames using matching columns
cam_image_w_box_df = v2.merge(cam_box_df, cam_img_df)
cam_obj_df = v2.merge(association_df, cam_image_w_box_df)
# In this example camera box labels are optional, so we set left_nullable=True.
obj_df = v2.merge(cam_obj_df, lidar_box_df, left_nullable=True)
# Group lidar sensors (left), group labels and camera images (right) and join.
df = v2.merge(lidar_df, obj_df, left_group=True, right_group=True)

# Read a single row, which contains all data for a single frame.
_, row = next(iter(df.iterrows()))
# Create all component objects
camera_image = v2.CameraImageComponent.from_dict(row)
lidar = v2.LiDARComponent.from_dict(row)
camera_box = v2.CameraBoxComponent.from_dict(row)
lidar_box = v2.LiDARBoxComponent.from_dict(row)

print(
    f'Found {len(lidar_box.key.laser_object_id)} objects on'
    f' {lidar.key.segment_context_name=} {lidar.key.frame_timestamp_micros=}'
)
for laser_object_id, camera_object_id, camera_name in zip(
    lidar_box.key.laser_object_id,
    camera_box.key.camera_object_id,
    camera_image.key.camera_name,
):
  print(f'\t{laser_object_id=} {camera_object_id=} {camera_name=}')

Found 81 objects on lidar.key.segment_context_name='10023947602400723454_1120_000_1140_000' lidar.key.frame_timestamp_micros=1552440195362591
	laser_object_id='-U88NMYnocLWCh6iqZwj1g' camera_object_id=nan camera_name=nan
	laser_object_id='0VCoeT-jjrIfzTCsOWz20A' camera_object_id=nan camera_name=nan
	laser_object_id='0_HBXNo3olLueqYvkPohlg' camera_object_id=nan camera_name=nan
	laser_object_id='1nDCER_bA9py1ZPpNXecog' camera_object_id=nan camera_name=nan
	laser_object_id='2-A6zakvKX2opVnyx9gplQ' camera_object_id='a6f937a6-7ea8-4393-b636-e0560e699856' camera_name=1.0
	laser_object_id='2OYKagQRfCdaOXgU5RkMBA' camera_object_id=nan camera_name=nan
	laser_object_id='2SYmRAjI0pCOwp2XYemMBQ' camera_object_id='ba670814-995e-4ade-bc42-58b0a1d8ec8d' camera_name=1.0
	laser_object_id='2SYmRAjI0pCOwp2XYemMBQ' camera_object_id='ca9be338-79bb-4908-b4ee-5607a21b5b41' camera_name=2.0
	laser_object_id='3083QteOhZ_vSpxmP0XK-Q' camera_object_id=nan camera_name=nan
	laser_object_id='38Np8bwqcvw9KkrH3xHfpg' 
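
The `nan` values and float camera names (`camera_name=1.0`) above are what pandas produces when a join keeps rows without a match: missing values force integer columns to `float64`. Conceptually, `left_nullable=True` resembles keeping every row of the right table (a right join); we have not verified that this is exactly how `v2.merge` implements it. The toy tables below are hypothetical.

```python
import pandas as pd

# Toy tables (hypothetical data): only obj_a has a camera box.
camera_boxes = pd.DataFrame({
    'key.laser_object_id': ['obj_a'],
    'key.camera_name': [1],
})
lidar_boxes = pd.DataFrame({'key.laser_object_id': ['obj_a', 'obj_b']})

# Keep every lidar box; camera columns become NaN where there is no match.
objects = camera_boxes.merge(
    lidar_boxes, on='key.laser_object_id', how='right'
)
print(objects)
print(objects['key.camera_name'].dtype)  # NaN forces the int column to float64

# Optionally restore integer values with pandas' nullable Int64 dtype.
camera_name = objects['key.camera_name'].astype('Int64')
print(camera_name)
```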

### Object level

At this level each row of a table represents a single object at a single frame.

In [0]:
# @title Camera image, lidar, boxes with keypoints


# Lazily read DataFrames for all components.
association_df = read('camera_to_lidar_box_association')
cam_hkp_df = read('camera_hkp')
cam_box_df = read('camera_box')
cam_img_df = read('camera_image')
lidar_box_df = read('lidar_box')
lidar_df = read('lidar')

# Join all DataFrame objects for all components together.
cam_image_w_box_df = v2.merge(cam_box_df, cam_img_df)
cam_image_w_box_w_hkp_df = v2.merge(cam_image_w_box_df, cam_hkp_df)
cam_obj_df = v2.merge(association_df, cam_image_w_box_w_hkp_df)
obj_df = v2.merge(cam_obj_df, lidar_box_df)
t = v2.merge(lidar_df, obj_df, left_group=True, right_group=True)

# Create a row iterator (continue to the next cell)
it = iter(t.iterrows())

In [0]:
# Execute this cell multiple times to see data for different rows.

# Actually read the data.
_, row = next(it)

# Create all component objects
camera_hkp = v2.CameraHumanKeypointsComponent.from_dict(row)
camera_box = v2.CameraBoxComponent.from_dict(row)
camera_image = v2.CameraImageComponent.from_dict(row)
lidar_box = v2.LiDARBoxComponent.from_dict(row)
lidar = v2.LiDARComponent.from_dict(row)

print(
    f'Found {len(lidar_box.key.laser_object_id)} objects on'
    f' {lidar.key.segment_context_name=} {lidar.key.frame_timestamp_micros=}'
)
for laser_object_id, camera_object_id, camera_name, cam_kp_x in zip(
    lidar_box.key.laser_object_id,
    camera_box.key.camera_object_id,
    camera_image.key.camera_name,
    camera_hkp.camera_keypoints.keypoint_2d.location_px.x,
):
  print(
      f'\t{laser_object_id=} {camera_object_id=} {camera_name=} with'
      f' {len(cam_kp_x)} camera keypoints'
  )

Found 17 objects on lidar.key.segment_context_name='10023947602400723454_1120_000_1140_000' lidar.key.frame_timestamp_micros=1552440196462383
	laser_object_id='dNW3S4yA8s8GmltmAPV8LQ' camera_object_id='24508c28-3154-4b7f-964d-70b3ed7d5a9d' camera_name=1 with 6 camera keypoints
	laser_object_id='8IZ7fkXm0FDeUdEuJAqSlA' camera_object_id='260f57c0-787d-4f36-bfe9-64ca9828448f' camera_name=1 with 14 camera keypoints
	laser_object_id='brLOJzgVMhP_-kdQzIa7ng' camera_object_id='30326915-a157-4c60-9bef-0a188938c998' camera_name=1 with 11 camera keypoints
	laser_object_id='YTw2k0_-8UqolJn4WQRc2g' camera_object_id='699b6cb3-afc1-469f-8ac9-40978aeb511b' camera_name=1 with 11 camera keypoints
	laser_object_id='Yyu039jUMIJ4gI_2-mTkSg' camera_object_id='6cee0533-b9cd-462e-8a05-d781e8864f16' camera_name=1 with 10 camera keypoints
	laser_object_id='kWKrXcZMJ7I5K4Z4z9GdSQ' camera_object_id='7f37cfc3-1926-43a1-9af6-109ca0a5c797' camera_name=1 with 6 camera keypoints
	laser_object_id='sJlmZW9yFdc8ca--Rhk7

# Dataset Format

## Folder structure

Here are a few examples of what file paths look like, following the format `{path_to_the_dataset}/{component_tag}/{context_name}.parquet`:

```
/waymo_open_dataset/camera_image/10023947602400723454_1120_000_1140_000.parquet
...
/waymo_open_dataset/lidar_box/10023947602400723454_1120_000_1140_000.parquet
...
/waymo_open_dataset/lidar/10023947602400723454_1120_000_1140_000.parquet
```

In these examples, "waymo_open_dataset" is the path to the dataset; "camera_image", "lidar_box", and "lidar" are the string tags for the corresponding components; and "10023947602400723454_1120_000_1140_000" is the `context_name`.
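The path convention is simple enough to build and parse with the standard library. The helpers below are hypothetical (not part of the WOD package); they use `posixpath` rather than `os.path` so that forward slashes are kept on every OS, which we assume matters for bucket-style paths such as `gs://...`.

```python
import posixpath
from typing import Tuple


def component_path(dataset_dir: str, tag: str, context_name: str) -> str:
  """Builds '{dataset_dir}/{tag}/{context_name}.parquet' (hypothetical)."""
  return posixpath.join(dataset_dir, tag, f'{context_name}.parquet')


def parse_component_path(path: str) -> Tuple[str, str, str]:
  """Splits a component file path into (dataset_dir, tag, context_name)."""
  tag_dir, file_name = posixpath.split(path)
  dataset_dir, tag = posixpath.split(tag_dir)
  context_name, ext = posixpath.splitext(file_name)
  if ext != '.parquet':
    raise ValueError(f'Not a Parquet file: {path}')
  return dataset_dir, tag, context_name


path = component_path(
    '/waymo_open_dataset', 'lidar_box', '10023947602400723454_1120_000_1140_000'
)
print(path)
print(parse_component_path(path))
```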


## Loading raw data

You can use any existing library that supports Apache Parquet files to read the dataset, for example PyArrow, Pandas, or Dask. We recommend using [Dask](https://docs.dask.org/en/stable/) to access the entire dataset, because it supports larger-than-memory tables and distributed processing. If the data for a single segment (aka partition) fits into memory, you can use Pandas as well. Both libraries have very similar APIs.

In [0]:
import dask.dataframe as dd

table_path = f'{dataset_dir}/camera_image/{context_name}.parquet'
print(f'Reading a single shard from a single component {table_path}')
table = dd.read_parquet(table_path)
table.head()

Reading a single shard from a single component /tmp/wod_debug_20230212/camera_image/10023947602400723454_1120_000_1140_000.parquet


Unnamed: 0,key.segment_context_name,key.frame_timestamp_micros,key.camera_name,[CameraImageComponent].image,[CameraImageComponent].pose.transform,[CameraImageComponent].velocity.linear_velocity.x,[CameraImageComponent].velocity.linear_velocity.y,[CameraImageComponent].velocity.linear_velocity.z,[CameraImageComponent].velocity.angular_velocity.x,[CameraImageComponent].velocity.angular_velocity.y,[CameraImageComponent].velocity.angular_velocity.z,[CameraImageComponent].pose_timestamp,[CameraImageComponent].rolling_shutter_params.shutter,[CameraImageComponent].rolling_shutter_params.camera_trigger_time,[CameraImageComponent].rolling_shutter_params.camera_readout_done_time,Unnamed: 16
0,10023947602400723454_1120_000_1140_000,1552440195362591,1,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6990588408049747, 0.7150334085392328, 0.006...",1.337475,-1.359002,0.027288,-0.015392,0.012218,0.011145,1552440000.0,0.006981,1552440000.0,1552440000.0,1.0
1,10023947602400723454_1120_000_1140_000,1552440195362591,2,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6989900519876013, 0.7151025083199268, 0.006...",1.333927,-1.373261,0.024719,-0.016407,0.010103,0.017257,1552440000.0,0.009992,1552440000.0,1552440000.0,1.0
2,10023947602400723454_1120_000_1140_000,1552440195362591,4,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6987839219647801, 0.7153068814214333, 0.006...",1.339028,-1.374271,0.019406,-0.017616,0.006203,0.01399,1552440000.0,0.009992,1552440000.0,1552440000.0,1.210402
3,10023947602400723454_1120_000_1140_000,1552440195362591,3,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6991963373797294, 0.7148964117436919, 0.006...",1.332058,-1.358936,0.031292,-0.015262,0.015056,0.013473,1552440000.0,0.009992,1552440000.0,1552440000.0,1.07113
4,10023947602400723454_1120_000_1140_000,1552440195362591,5,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6992650451400215, 0.7148278952390329, 0.007...",1.332926,-1.35307,0.035072,-0.014256,0.017961,0.010754,1552440000.0,0.009992,1552440000.0,1552440000.0,1.537538


## Supported components

A **component** in the context of the Waymo Open Dataset v2 format is a set of related fields (aka table columns) that are all needed to understand each individual field. In other words, a component is a collection of related and interdependent data. For example, the camera image component includes the following fields:

 type | Column name
 --- | ---
 string | key.segment_context_name
 int64 | key.frame_timestamp_micros
 int8 | key.camera_name
 binary | [CameraImageComponent].image
 fixed_size_list<item: double>[16] | [CameraImageComponent].pose.transform
 float | [CameraImageComponent].velocity.linear_velocity.x
 float | [CameraImageComponent].velocity.linear_velocity.y
 float | [CameraImageComponent].velocity.linear_velocity.z
 double | [CameraImageComponent].velocity.angular_velocity.x
 double | [CameraImageComponent].velocity.angular_velocity.y
 double | [CameraImageComponent].velocity.angular_velocity.z
 double | [CameraImageComponent].pose_timestamp
 double | [CameraImageComponent].rolling_shutter_params.shutter
 double | [CameraImageComponent].rolling_shutter_params.camera_trigger_time
 double | [CameraImageComponent].rolling_shutter_params.camera_readout_done_time


Key columns start with the `key.` prefix and use the same names across all components. Component-specific columns start with the `[Name of a Component].` prefix. Each column holds a simple Python type or a list of elements of a simple type.
For example, a 3D vector representing the linear velocity of a camera sensor at a
specific frame is stored in 3 table columns:


 type | Column name
 --- | ---
float | [CameraImageComponent].velocity.linear_velocity.x
float | [CameraImageComponent].velocity.linear_velocity.y
float | [CameraImageComponent].velocity.linear_velocity.z
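Because the naming convention is regular, a short regular expression is enough to tell key columns from component columns and to split out the field path. `classify_column` is a hypothetical helper written for this illustration, not a WOD function.

```python
import re
from typing import Optional, Tuple

# Component columns look like '[ComponentName].field.path'.
_COMPONENT_COLUMN = re.compile(r'^\[(?P<component>[^\]]+)\]\.(?P<field>.+)$')


def classify_column(column: str) -> Tuple[str, Optional[str], str]:
  """Returns (kind, component, field_path) for a v2-style column name."""
  if column.startswith('key.'):
    return 'key', None, column[len('key.'):]
  match = _COMPONENT_COLUMN.match(column)
  if match is None:
    raise ValueError(f'Unexpected column name: {column!r}')
  return 'component', match['component'], match['field']


for name in [
    'key.camera_name',
    '[CameraImageComponent].velocity.linear_velocity.x',
]:
  print(classify_column(name))
```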

For details about each component, refer to the source code under `waymo_open_dataset/v2/perception`. To see all available components, execute the following cell:

In [0]:
# @title List columns in all components

print('Available components:')
for component, tag in v2.TAG_BY_COMPONENT.items():
  print(f'{tag}: {component.__name__}')
  schema = component.schema()
  for column, arrow_type in zip(schema.names, schema.types):
    print(f'\t{str(arrow_type):40s} {column}')

Available components:
camera_box: CameraBoxComponent
	string                                   key.segment_context_name
	int64                                    key.frame_timestamp_micros
	int8                                     key.camera_name
	string                                   key.camera_object_id
	double                                   [CameraBoxComponent].box.center.x
	double                                   [CameraBoxComponent].box.center.y
	double                                   [CameraBoxComponent].box.size.x
	double                                   [CameraBoxComponent].box.size.y
	int8                                     [CameraBoxComponent].type
	int8                                     [CameraBoxComponent].difficulty_level.detection
	int8                                     [CameraBoxComponent].difficulty_level.tracking
camera_calibration: CameraCalibrationComponent
	string                                   key.segment_context_name
	int8                        

## Object-oriented API

Users can access the dataset with the external libraries mentioned above, but the WODv2 Python library also defines an object-oriented API: a high-level abstraction that lets users work with the data in a more intuitive and streamlined manner. The API provides easy-to-use classes and methods for accessing and manipulating the data, which makes code easier to write and reduces the time required to perform complex data operations.
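To give a rough idea of what such an API does, here is a minimal, hypothetical sketch that rebuilds a small dataclass from three flat columns. The `Vec3d` class and its `from_row` method are our own stand-ins, not `v2.Vec3d` or the library's `from_dict`, which may work differently. The input values are copied from the output of the next cell.

```python
import dataclasses
from typing import Any, Mapping


@dataclasses.dataclass(frozen=True)
class Vec3d:
  """Hypothetical stand-in for a 3D vector dataclass (not v2.Vec3d)."""
  x: float
  y: float
  z: float

  @classmethod
  def from_row(cls, row: Mapping[str, Any], prefix: str) -> 'Vec3d':
    # Read '<prefix>.x', '<prefix>.y' and '<prefix>.z' from a flat row.
    return cls(**{f.name: row[f'{prefix}.{f.name}'] for f in dataclasses.fields(cls)})


row = {
    '[CameraImageComponent].velocity.linear_velocity.x': 1.3374745845794678,
    '[CameraImageComponent].velocity.linear_velocity.y': -1.359001636505127,
    '[CameraImageComponent].velocity.linear_velocity.z': 0.02728821150958538,
}
velocity = Vec3d.from_row(row, '[CameraImageComponent].velocity.linear_velocity')
print(velocity)
```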

Below are two examples:

In [0]:
# Read a single row from the table
_, row = next(iter(table.iterrows()))


# Direct access to the tabular data
def process_coordinates(x: float, y: float, z: float) -> None:
  print(f'{x=} {y=} {z=}')

# Use DataFrame row directly
process_coordinates(
    row['[CameraImageComponent].velocity.linear_velocity.x'],
    row['[CameraImageComponent].velocity.linear_velocity.y'],
    row['[CameraImageComponent].velocity.linear_velocity.z'],
)


# Access via the object-oriented API
def process_point(p: v2.Vec3d) -> None:
  print(f'{p.x=} {p.y=} {p.z=}')

image = v2.CameraImageComponent.from_dict(row)
process_point(image.velocity.linear_velocity)

x=1.3374745845794678 y=-1.359001636505127 z=0.02728821150958538
p.x=1.3374745845794678 p.y=-1.359001636505127 p.z=0.02728821150958538


## A relational database-like structure

The dataset is organized into multiple tables with multi-column keys, creating a relational database-like structure on top of Apache Parquet files. This structure combines Apache Parquet's efficient storage and retrieval of large amounts of data with the ease of use of relational databases. It enables operations like filtering, grouping, and aggregating, and the shared multi-column keys make it straightforward to link data across tables (aka joins).

In the example below, we create a table with camera images and bounding boxes for the front camera only (camera_name.FRONT = 1):

In [0]:
camera_image_df = read('camera_image')
# Keep only images from the front camera (camera_name=1).
# NOTE: We could also use predicate pushdown filters while reading the Parquet files.
# Details: https://docs.dask.org/en/stable/generated/dask.dataframe.read_parquet.html#dask.dataframe.read_parquet
camera_image_df = camera_image_df[camera_image_df['key.camera_name'] == 1]

camera_box_df = read('camera_box')
# Inner join the camera_image table with the camera_box table.
df = camera_image_df.merge(
    camera_box_df,
    on=[
        'key.segment_context_name',
        'key.frame_timestamp_micros',
        'key.camera_name',
    ],
    how='inner',
)

# Create corresponding components from the raw data.
_, row = next(iter(df.iterrows()))

camera_image = v2.CameraImageComponent.from_dict(row)
camera_box = v2.CameraBoxComponent.from_dict(row)
print(
    f'Loaded image ({len(camera_image.image)} bytes) for'
    f' {camera_image.key.segment_context_name=}'
    f' {camera_image.key.frame_timestamp_micros=}'
    f' {camera_image.key.camera_name=}'
)
print(
    'Loaded bounding box for'
    f' {camera_box.key.camera_object_id=} {camera_box.box=}'
)

Loaded image (387998 bytes) for camera_image.key.segment_context_name='10023947602400723454_1120_000_1140_000' camera_image.key.frame_timestamp_micros=1552440195362591 camera_image.key.camera_name=1
Loaded bounding box for camera_box.key.camera_object_id='02f3a769-021f-49b0-b954-18e4fa1d5fde' camera_box.box=BoxAxisAligned2d(center=Vec2d(x=900.01575, y=665.69586), size=Vec2d(x=13.894980000000032, y=22.737240000000043))


Because all components follow the same naming convention for key columns, the `v2.merge` function can automatically determine which subset of columns to use when joining two tables (SQL dialects use the term "join", while Pandas and Dask call it "merge"):

In [0]:
# Merge
df = v2.merge(read('camera_image'), read('camera_box'))

# Show the list of columns in the combined DataFrame
df.head()

Unnamed: 0,key.segment_context_name,key.frame_timestamp_micros,key.camera_name,[CameraImageComponent].image,[CameraImageComponent].pose.transform,[CameraImageComponent].velocity.linear_velocity.x,[CameraImageComponent].velocity.linear_velocity.y,[CameraImageComponent].velocity.linear_velocity.z,[CameraImageComponent].velocity.angular_velocity.x,[CameraImageComponent].velocity.angular_velocity.y,[CameraImageComponent].velocity.angular_velocity.z,[CameraImageComponent].pose_timestamp,[CameraImageComponent].rolling_shutter_params.shutter,[CameraImageComponent].rolling_shutter_params.camera_trigger_time,[CameraImageComponent].rolling_shutter_params.camera_readout_done_time,key.camera_object_id,[CameraBoxComponent].box.center.x,[CameraBoxComponent].box.center.y,[CameraBoxComponent].box.size.x,[CameraBoxComponent].box.size.y,[CameraBoxComponent].type,[CameraBoxComponent].difficulty_level.detection,[CameraBoxComponent].difficulty_level.tracking,Unnamed: 24
0,10023947602400723454_1120_000_1140_000,1552440195362591,1,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6990588408049747, 0.7150334085392328, 0.006...",1.337475,-1.359002,0.027288,-0.015392,0.012218,0.011145,1552440000.0,0.006981,1552440000.0,1552440000.0,0.0,02f3a769-021f-49b0-b954-18e4fa1d5fde,900.01575,665.69586,13.89498,22.73724,1,2.0,2.0
1,10023947602400723454_1120_000_1140_000,1552440195362591,1,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6990588408049747, 0.7150334085392328, 0.006...",1.337475,-1.359002,0.027288,-0.015392,0.012218,0.011145,1552440000.0,0.006981,1552440000.0,1552440000.0,0.0,0611ea50-f652-406d-942a-9ec7e3e90546,1303.60176,697.90695,29.05314,54.94833,2,,
2,10023947602400723454_1120_000_1140_000,1552440195362591,1,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6990588408049747, 0.7150334085392328, 0.006...",1.337475,-1.359002,0.027288,-0.015392,0.012218,0.011145,1552440000.0,0.006981,1552440000.0,1552440000.0,0.0,08b1e19e-b912-4963-a7bb-55e138bbc25e,1828.45305,741.48666,67.58013,99.15963,2,,
3,10023947602400723454_1120_000_1140_000,1552440195362591,1,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6990588408049747, 0.7150334085392328, 0.006...",1.337475,-1.359002,0.027288,-0.015392,0.012218,0.011145,1552440000.0,0.006981,1552440000.0,1552440000.0,0.0,11165fc3-e936-4722-9257-1574c6752eab,951.17454,658.74837,13.89498,15.78975,1,2.0,2.0
4,10023947602400723454_1120_000_1140_000,1552440195362591,1,b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00...,"[0.6990588408049747, 0.7150334085392328, 0.006...",1.337475,-1.359002,0.027288,-0.015392,0.012218,0.011145,1552440000.0,0.006981,1552440000.0,1552440000.0,0.0,24508c28-3154-4b7f-964d-70b3ed7d5a9d,608.85276,715.59147,67.58013,99.15963,2,,
