# Rekall Tutorial: Data Loading and Visualization

In this tutorial, you'll take a deep dive into loading data into Rekall and visualizing annotations with Vgrid. We'll walk through the "helper code" used in the Cyclist Detection tutorial.

You should complete this tutorial after the Cyclist Detection tutorial.

# Imports

So far, we've just been importing Rekall libraries in our notebooks. Now we'll also import vgrid and vgrid_jupyter (Vgrid plugin for Jupyter notebooks) into our environment. We'll also import some standard Python libraries to read data from our servers. Previously the helper code was doing this for us.

In [1]:
%load_ext autoreload
%autoreload 2

# Rekall imports
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from rekall.predicates import *

# Vgrid imports
from vgrid import VGridSpec, VideoMetadata, VideoBlockFormat, FlatFormat, SpatialType_Bbox
from vgrid_jupyter import VGridWidget

# Imports to read data from external servers.
import urllib3, requests, os, pickle, posixpath

# Preview

Let's first take a look at what the end product will look like. This code will load the cyclist detection dataset and Mask R-CNN detections from GCP and visualize them. This is Intel's CyDet dataset (hence the variable names and URLs).

### Once the visualization is up, click it to expand it. Hover over the expanded video and use `;` to play/pause the video.

In [8]:
urllib3.disable_warnings()
VIDEO_COLLECTION_BASEURL = "https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet" 
VIDEO_METADATA_FILENAME = "metadata.json"
req = requests.get(posixpath.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = sorted(req.json(), key=lambda vm: vm['filename'])

video_metadata = [
    VideoMetadata(v["filename"], v["id"], v["fps"], int(v["num_frames"]), v["width"], v["height"])
    for v in video_collection
]

maskrcnn_bbox_files = [ 'maskrcnn_bboxes_0001.pkl', 'maskrcnn_bboxes_0004.pkl' ]

maskrcnn_bboxes = []
for bbox_file in maskrcnn_bbox_files:
    req = requests.get(posixpath.join(VIDEO_COLLECTION_BASEURL, bbox_file), verify=False)
    maskrcnn_bboxes.append(pickle.loads(req.content))

# Load Mask R-CNN data into Rekall
maskrcnn_bboxes_ism = IntervalSetMapping({
    vm.id: IntervalSet([
        Interval(
            Bounds3D(
                t1 = frame_num / vm.fps,
                t2 = (frame_num + 1) / vm.fps,
                x1 = bbox[0] / vm.width,
                x2 = bbox[2] / vm.width,
                y1 = bbox[1] / vm.height,
                y2 = bbox[3] / vm.height
            ),
            payload = {
                'class': bbox[4],
                'score': bbox[5],
                'spatial_type': SpatialType_Bbox(text=bbox[4])
            }
        )
        for frame_num, bboxes_in_frame in enumerate(maskrcnn_frame_list)
        for bbox in bboxes_in_frame
    ])
    for vm, maskrcnn_frame_list in zip(video_metadata, maskrcnn_bboxes)
})

# Visualize the data
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('bboxes', maskrcnn_bboxes_ism)
    ]),
    video_endpoint = VIDEO_COLLECTION_BASEURL
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

VGridWidget(vgrid_spec={'compressed': True, 'data': b'x\x9c\xcc\xbd\xdb\xae&K\x92\x9c\xf7*D_\x0bB\x9c\x0f\xba\…

# Code Walkthrough
Now let's walk through the above code bit by bit to get an idea of what's going on.

## Load Video metadata

First we need to get some metadata about the individual videos that we're visualizing. In particular, we need to know the **FPS, duration, width, and height** of each video in order to display them using Vgrid. In our case, we've already computed these things for our driving videos, but you can also use [this script](https://github.com/scanner-research/esperlight/blob/master/create_video_metadata.py) to compute them for you (`ffmpeg`/`ffprobe` are dependencies).

### This code loads pre-computed FPS, duration, width, and height of each video and puts them into `VideoMetadata` objects:

In [9]:
urllib3.disable_warnings()
VIDEO_COLLECTION_BASEURL = "https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet" 
VIDEO_METADATA_FILENAME = "metadata.json"

req = requests.get(posixpath.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = sorted(req.json(), key=lambda vm: vm['filename'])

video_metadata = [
    VideoMetadata(
        v["filename"], id=v["id"], fps=v["fps"],
        num_frames=int(v["num_frames"]), width=v["width"], height=v["height"])
    for v in video_collection
]

Let's go line by line:

Lines 1-3 silence the TLS warnings and specify the location of the metadata. You can look at the JSON file yourself by going to https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet/metadata.json.

```Python
urllib3.disable_warnings()
VIDEO_COLLECTION_BASEURL = "https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet"
VIDEO_METADATA_FILENAME = 'metadata.json'
```
    
Lines 5-6 fetch the data with an HTTP request, parse the JSON response, and sort the videos by filename:

```Python
req = requests.get(posixpath.join(VIDEO_COLLECTION_BASEURL, VIDEO_METADATA_FILENAME), verify=False)
video_collection = sorted(req.json(), key=lambda vm: vm['filename'])
```

At this point, `video_collection` is a list of Python dictionaries, with information about each video's filename, FPS, width, height, and number of frames. We can loop through this list and construct a list of `VideoMetadata` objects with that information:

```Python
video_metadata = [
    VideoMetadata(
        v["filename"], id=v["id"], fps=v["fps"],
        num_frames=int(v["num_frames"]), width=v["width"], height=v["height"])
    for v in video_collection
]
```
    
`VideoMetadata` objects are constructed by passing in `path`, `id`, `fps`, `num_frames`, `width`, and `height`. The `path` is used by Vgrid and a fileserver to serve the video, and `id` is a key that links visual bounding box metadata to the videos.
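
As a rough illustration of the record shape this step expects, the sketch below sorts a small hand-written list by filename and checks that every field is present. Only the field names come from the code above; the two records, their values, and the names `sample_json` and `sample_collection` are made up. `num_frames` is written as a string here only to show what the `int()` call guards against; whether the real file stores it that way is an assumption.

```python
import json

# Hypothetical stand-in for the contents of metadata.json.
# Field names follow the tutorial; all values are invented.
sample_json = """
[
  {"filename": "b.mp4", "id": 1, "fps": 10.0, "num_frames": "300", "width": 1280, "height": 720},
  {"filename": "a.mp4", "id": 0, "fps": 10.0, "num_frames": "250", "width": 1280, "height": 720}
]
"""

REQUIRED_FIELDS = ("filename", "id", "fps", "num_frames", "width", "height")

# Same sort key as the tutorial cell: order videos by filename.
sample_collection = sorted(json.loads(sample_json), key=lambda vm: vm["filename"])

# Fail early with a readable message if a record lacks a field.
for record in sample_collection:
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise KeyError(f"{record.get('filename', '?')} is missing fields: {missing}")

sorted_names = [record["filename"] for record in sample_collection]
frame_counts = [int(record["num_frames"]) for record in sample_collection]
print(sorted_names, frame_counts)
```

Sorting by filename gives the list a deterministic order, which matters later when per-video detections are paired with metadata by position.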

### Aside: Visualizing videos directly.

Now that we've loaded in video-level metadata, we can visualize the videos in Vgrid directly.

#### Again, click to expand the thumbnails. Then use `;` to play the videos.

In [10]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = None, video_meta = video_metadata),
    video_endpoint = 'https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

VGridWidget(vgrid_spec={'compressed': True, 'data': b'x\x9c\xadR\xcbn\xdb0\x10\xfc\x15\x83\xbd\x06\x16\x95\xd8…

Going line by line:

Lines 1-5 specify a Vgrid spec for the widget:

```Python
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = None, video_meta = video_metadata),
    video_endpoint = 'https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet'
)
```
    
The `video_meta` option takes in a list of per-video metadata. `vis_format` specifies how the individual blocks in Vgrid should be drawn, along with what to draw on them. In this case, we are using `VideoBlockFormat` and passing in `None` for `imaps` and `video_metadata` for `video_meta`. This will automatically create one block for each video in `video_meta`. Later on, we'll see how we can use it to draw the spatial metadata as well. Finally, `video_endpoint` specifies that we should look for the videos on the `olimar` server.

Finally, line 6 creates the widget and displays it in our Jupyter environment:

```Python
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())
```
    
We pass the spec to the Vgrid widget as compressed JSON. Since it's the last line of the cell, it gets displayed below the cell automatically.
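
The widget output above begins with the bytes `x\x9c`, which is the header zlib writes at its default compression level. The sketch below shows how a JSON document can be compressed and restored that way. It illustrates the general idea only and is not Vgrid's actual implementation; the `spec` dictionary is a made-up placeholder, not Vgrid's real schema.

```python
import json
import zlib

# Hypothetical placeholder; the real spec produced by VGridSpec looks different.
spec = {"videos": [{"id": 0, "path": "a.mp4"}], "blocks": []}

# Serialize to JSON text, encode to bytes, then compress.
compressed = zlib.compress(json.dumps(spec).encode("utf-8"))

# Reverse the steps to get the original dictionary back.
restored = json.loads(zlib.decompress(compressed).decode("utf-8"))

print(compressed[:2], restored == spec)
```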

## Load Bounding Boxes from Pickle files

But as we saw earlier, we can do a lot more than just look at videos if we have some spatial metadata to associate with the videos.

**NB in your applications, you'll load data from your own data sources -- however you computed them!**

### This code loads bounding box data associated with each video from olimar:

In [13]:
maskrcnn_bbox_files = [ 'maskrcnn_bboxes_0001.pkl', 'maskrcnn_bboxes_0004.pkl' ]

maskrcnn_bboxes = []
for bbox_file in maskrcnn_bbox_files:
    req = requests.get(posixpath.join(VIDEO_COLLECTION_BASEURL, bbox_file), verify=False)
    maskrcnn_bboxes.append(pickle.loads(req.content))

Going line by line:

Line 1 specifies the names of the metadata files on the server:

```Python
maskrcnn_bbox_files = [ 'maskrcnn_bboxes_0001.pkl', 'maskrcnn_bboxes_0004.pkl' ]
```
    
In this case, we've pre-loaded bounding box metadata into `maskrcnn_bboxes_0001.pkl` and `maskrcnn_bboxes_0004.pkl`. Each Pickle file contains all the bounding boxes for the corresponding video.

Lines 3-6 make HTTP requests to the server and parse the Pickle files:

```Python
maskrcnn_bboxes = []
for bbox_file in maskrcnn_bbox_files:
    req = requests.get(posixpath.join(VIDEO_COLLECTION_BASEURL, bbox_file), verify=False)
    maskrcnn_bboxes.append(pickle.loads(req.content))
```

Line 5 specifies the path (join the base URL to the specific metadata file), and line 6 specifies that we should parse the Pickle file.
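
To make the parsing step concrete without a network connection, here is a minimal round trip through `pickle` using an in-memory byte string in place of `req.content`. The `frames` data is hypothetical. Note that `pickle.loads` can execute arbitrary code while unpickling, so it should only be used on files from a source you trust.

```python
import pickle

# Hypothetical two-frame video: frame 0 has one detection, frame 1 has none.
frames = [
    [[10.0, 20.0, 110.0, 220.0, "car", 0.98, "000000001.png"]],
    [],
]

file_bytes = pickle.dumps(frames)      # stands in for the .pkl file on the server
restored = pickle.loads(file_bytes)    # what the loop does with req.content

# The nested list structure survives the round trip unchanged.
num_frames = len(restored)
num_boxes_frame0 = len(restored[0])
print(num_frames, num_boxes_frame0)
```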

If you manually inspect the parsed objects, you'll see that each one has the following format:

```Python
[
    [
        [
            x1: float,
            y1: float,
            x2: float,
            y2: float,
            class: string,
            score: float,
            image_name: string
        ], # for each bounding box in the frame
        ...
    ], # for each frame
    ...
]
```
    
In other words, `maskrcnn_bboxes[0][frame][i]` contains Bbox `i` from frame `frame` in the first video.

In [18]:
maskrcnn_bboxes[0][10][0]

[644.10205078125,
 116.60777282714844,
 745.7400512695312,
 157.728515625,
 'car',
 0.9951943755149841,
 '000000011.png']
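
Positional indices such as `bbox[4]` are easy to mix up. One option, sketched below, is to unpack each list into a named tuple; `MaskRCNNBox` is a hypothetical helper that is not part of Rekall or Vgrid, and its field order simply follows the format described above.

```python
from typing import NamedTuple

class MaskRCNNBox(NamedTuple):
    """Hypothetical helper; field order follows the list format shown above."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str        # 'class' is a reserved word in Python, so use 'label'
    score: float
    image_name: str

# The sample detection printed by the cell above.
raw_bbox = [644.10205078125, 116.60777282714844, 745.7400512695312,
            157.728515625, 'car', 0.9951943755149841, '000000011.png']

box = MaskRCNNBox(*raw_bbox)
box_width = box.x2 - box.x1     # width in pixels
box_height = box.y2 - box.y1    # height in pixels
print(box.label, round(box_width, 1), round(box_height, 1))
```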

## Load Bounding Boxes into Rekall

Now that we've loaded our bounding boxes from the server, we can load them into Rekall.

In [19]:
maskrcnn_bboxes_ism = IntervalSetMapping({
    vm.id: IntervalSet([
        Interval(
            Bounds3D(
                t1 = frame_num / vm.fps,
                t2 = (frame_num + 1) / vm.fps,
                x1 = bbox[0] / vm.width,
                x2 = bbox[2] / vm.width,
                y1 = bbox[1] / vm.height,
                y2 = bbox[3] / vm.height
            ),
            payload = {
                'class': bbox[4],
                'score': bbox[5],
                'spatial_type': SpatialType_Bbox(text=bbox[4])
            }
        )
        for frame_num, bboxes_in_frame in enumerate(maskrcnn_frame_list)
        for bbox in bboxes_in_frame
    ])
    for vm, maskrcnn_frame_list in zip(video_metadata, maskrcnn_bboxes)
})

This cell contains Rekall's core programmatic abstractions, so let's go line by line again.
    
### The core abstraction of Rekall is an `Interval`, which contains a `Bounds` and a payload

![video_volume_v2.png](https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/videovolume_v2.png)

The `Bounds` contains 3D spatiotemporal co-ordinates (a time range plus x and y ranges), while the payload contains other metadata about each `Interval`. In this case, each `Interval` corresponds to a bounding box (`t1` and `t2` in seconds, spatial co-ordinates relative to the frame size), and the payload contains class information and the confidence score for each bounding box:

```Python
Interval(
    Bounds3D(
        t1 = frame_num / vm.fps,
        t2 = (frame_num + 1) / vm.fps,
        x1 = bbox[0] / vm.width,
        x2 = bbox[2] / vm.width,
        y1 = bbox[1] / vm.height,
        y2 = bbox[3] / vm.height
    ),
    payload = {
        'class': bbox[4],
        'score': bbox[5],
        'spatial_type': SpatialType_Bbox(text=bbox[4])
    }
)
```
    
Notice that we convert timestamps from frame numbers to seconds by dividing by FPS, and from pixel co-ordinates to frame-relative co-ordinates by dividing by width and height (`vm` is defined by the generator at the bottom of the original code).

We also set the `spatial_type` of the payload for visualization -- the `text` value that gets passed in is used to write the text on the bounding box.
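
The conversion can be checked by hand with a small worked example. The function below repeats the arithmetic from the `Bounds3D` call in plain Python; `to_relative_bounds` is a hypothetical helper, and the frame rate, frame size, and box are invented values chosen to keep the numbers round.

```python
def to_relative_bounds(frame_num, bbox, fps, width, height):
    """Convert a frame index and a pixel bbox into seconds and frame-relative coordinates."""
    return {
        "t1": frame_num / fps,          # start of the frame, in seconds
        "t2": (frame_num + 1) / fps,    # start of the next frame, in seconds
        "x1": bbox[0] / width,
        "x2": bbox[2] / width,
        "y1": bbox[1] / height,
        "y2": bbox[3] / height,
    }

# Hypothetical video properties and detection.
FPS, WIDTH, HEIGHT = 10.0, 1000, 500
pixel_bbox = [250.0, 100.0, 750.0, 400.0]   # x1, y1, x2, y2 in pixels

bounds = to_relative_bounds(10, pixel_bbox, FPS, WIDTH, HEIGHT)
print(bounds)
```

With these values, frame 10 covers roughly 1.0 to 1.1 seconds, and the box spans 0.25 to 0.75 of the frame width and 0.2 to 0.8 of its height, so every spatial co-ordinate lands between 0 and 1.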
    
### An `IntervalSet` is a collection of related `Interval`s

We create an `IntervalSet` by passing in a list of `Interval`s. In this case, we create an `IntervalSet` for each video by looping through each frame (first generator), and through all the bounding boxes for the frame (second generator):

```Python
IntervalSet(
    [
        Interval(
            ...
        )
        for frame_num, bboxes_in_frame in enumerate(maskrcnn_frame_list)
        for bbox in bboxes_in_frame
    ]
)
```

For those less familiar with Python, this code is roughly equivalent to something like this:

```Python
arr = []
for frame_num in range(0, len(maskrcnn_frame_list)):
    bboxes_in_frame = maskrcnn_frame_list[frame_num]
    for bbox in bboxes_in_frame:
        arr.append(Interval(...))
IntervalSet(arr)
```

### An `IntervalSetMapping` organizes `IntervalSet`s from different domains

Finally, we organize `IntervalSet`s from different videos using an `IntervalSetMapping`, which maps from keys to `IntervalSet`. We create one by passing in a `dict` with one entry for each metadata object in `video_metadata` (notice that we rely on `video_metadata` and `maskrcnn_bboxes` being in the same order):

```Python
maskrcnn_bboxes_ism = IntervalSetMapping({
    vm.id: IntervalSet([...])
    for vm, maskrcnn_frame_list in zip(video_metadata, maskrcnn_bboxes)
})
```

In this case, we are mapping from the video ID to the `IntervalSet` with that video's bounding boxes.
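
Because the pairing depends on list order, and `zip` silently stops at the shorter input, a quick sanity check before building the mapping can catch misaligned data. The sketch below uses plain dictionaries as stand-ins for `VideoMetadata` objects; `check_alignment` and the sample data are hypothetical.

```python
def check_alignment(metadata, per_video_frames):
    """Return a list of problems found; an empty list means the inputs line up."""
    problems = []
    if len(metadata) != len(per_video_frames):
        problems.append(
            f"{len(metadata)} videos but {len(per_video_frames)} detection lists"
        )
    for meta, frames in zip(metadata, per_video_frames):
        # More frames of detections than frames of video suggests a mismatch.
        if len(frames) > meta["num_frames"]:
            problems.append(
                f"video {meta['id']}: {len(frames)} detection frames, "
                f"but only {meta['num_frames']} video frames"
            )
    return problems

# Hypothetical stand-ins: dicts instead of VideoMetadata, empty per-frame lists.
metadata = [{"id": 0, "num_frames": 3}, {"id": 1, "num_frames": 2}]
detections = [[[], [], []], [[], []]]

aligned = check_alignment(metadata, detections)
one_missing = check_alignment(metadata, detections[:1])
print(aligned, one_missing)
```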

## Display videos with bounding box metadata in Vgrid

Finally, we can display the bounding box metadata drawn over the video metadata using Vgrid. To see more documentation about the Vgrid API, check out the [Vgrid documentation](https://github.com/scanner-research/vgrid#javascript-and-python).

In [22]:
# Visualize the data
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('bboxes', maskrcnn_bboxes_ism)
    ]),
    video_endpoint = VIDEO_COLLECTION_BASEURL
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

VGridWidget(vgrid_spec={'compressed': True, 'data': b'x\x9c\xcc\xbd\xdb\xae&K\x92\x9c\xf7*D_\x0bB\x9c\x0f\xba\…

Again, line by line:

Lines 2-8 specify a VGridSpec for the widget:

```Python
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        ('bboxes', maskrcnn_bboxes_ism)
    ]),
    video_endpoint = VIDEO_COLLECTION_BASEURL
)
```
    
This is the same as when we visualized videos on their own, except this time we are passing `imaps` into `VideoBlockFormat`. This argument expects a list of pairs of `(String, IntervalSetMapping)`. In this case, we are only passing in a single `IntervalSetMapping`.

Note that the IDs in `video_metadata` match the keys in the `maskrcnn_bboxes_ism` that we created earlier. Vgrid uses this mapping to know which bounding boxes to draw on which videos.

Line 9 passes the spec (with data) as compressed JSON and creates a VGrid Jupyter widget with the data. Since it's the last line of the cell, it gets displayed below the cell automatically.

# Visualizing Multiple Tracks

Now that we have a handle on visualizing `IntervalSetMapping`s with Vgrid, let's use some of Rekall's functionality to display a more meaningful visualization.

Let's use the filter operation to filter for different classes. This will create four different `IntervalSetMapping` objects, each of which corresponds to a different object category:

In [24]:
object_names = [
    'person',
    'car',
    'truck',
    'traffic light'
]
object_isms = [
    maskrcnn_bboxes_ism.filter(lambda interval: interval['payload']['class'] == object_name)
    for object_name in object_names
]

The above code creates a list of `IntervalSetMapping`s, each of which corresponds to a different object. The first `IntervalSetMapping` of `object_isms` contains all the bounding boxes with the `person` class, the second one contains all the bounding boxes with the `car` class, etc.
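
To see what the predicate does without Rekall installed, the sketch below applies the same test to plain dictionaries that mimic the `interval['payload']['class']` lookup. The data and the `by_class` helper are hypothetical stand-ins, not Rekall objects.

```python
# Plain-dict stand-ins for Rekall Intervals (hypothetical data).
intervals = [
    {"t1": 0.0, "t2": 0.1, "payload": {"class": "car", "score": 0.99}},
    {"t1": 0.0, "t2": 0.1, "payload": {"class": "person", "score": 0.87}},
    {"t1": 0.1, "t2": 0.2, "payload": {"class": "car", "score": 0.95}},
]

object_names = ["person", "car", "truck", "traffic light"]

def by_class(name):
    """Build a predicate that keeps only intervals of the given class."""
    return lambda interval: interval["payload"]["class"] == name

# One list of intervals per class, mirroring one IntervalSetMapping per class.
tracks = {
    name: [iv for iv in intervals if by_class(name)(iv)]
    for name in object_names
}
counts = {name: len(ivs) for name, ivs in tracks.items()}
print(counts)
```

Classes with no detections, such as `truck` in this sample, simply produce an empty track.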

This code visualizes the different `IntervalSetMapping`s. Each `IntervalSetMapping` will have a different track on the timeline and be visualized with a different color. Notice that we pass more items into the `imaps` argument of `VideoBlockFormat`.

In [25]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = VideoBlockFormat(imaps = [
        (
            object_name,
            ism
        )
        for ism, object_name in zip(object_isms, object_names)
    ]),
    video_endpoint = 'https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

VGridWidget(vgrid_spec={'compressed': True, 'data': b'x\x9c\xcc\xbd\xcb\xae%Krm\xf7+\x17\xd5\x16\x08\x7f?\xd4\…

# More Visualization Formats

Vgrid also provides a few other visualization formats. Check out Vgrid's [visualization formats](https://github.com/scanner-research/vgrid/blob/master/vgridpy/vgrid/vis_format.py) for more examples.

Here's a useful one - let's have a block for every high confidence (score above `0.99`) person detection in our videos: 

In [27]:
vgrid_spec = VGridSpec(
    video_meta = video_metadata,
    vis_format = FlatFormat(maskrcnn_bboxes_ism.filter(
        lambda interval: (interval['payload']['class'] == 'person' and
                          interval['payload']['score'] > 0.99)
    ).dilate(1.0)),
    video_endpoint = 'https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/cydet'
)
VGridWidget(vgrid_spec = vgrid_spec.to_json_compressed())

VGridWidget(vgrid_spec={'compressed': True, 'data': b'x\x9c\xe4\xbd\xcb\xeem\xcdq\xe4\xf7*\r\x8e\x05\xa1\xeeY\…

The above visualization format (`FlatFormat`) creates one Vgrid block for each `Interval`. We dilate the temporal dimension by 1 second just to compensate for inaccurate frame loading in-browser.
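
The effect of the dilation on a single detection can be sketched in plain Python. This assumes that `dilate(1.0)` extends each end of the time range by the window, so the range grows by twice the window overall; `dilate_time` is a hypothetical helper, and the frame number and frame rate are invented.

```python
def dilate_time(t1, t2, window):
    """Extend a time range by `window` seconds at each end (assumed semantics)."""
    return t1 - window, t2 + window

# A single-frame detection at frame 10 of a hypothetical 10 FPS video.
fps = 10.0
t1, t2 = 10 / fps, 11 / fps

new_t1, new_t2 = dilate_time(t1, t2, 1.0)
old_duration = t2 - t1
new_duration = new_t2 - new_t1
print(new_t1, new_t2)
```

Under that assumption a detection lasting a tenth of a second becomes a block of roughly two seconds, which gives the in-browser player some slack around the frame of interest.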

# Congratulations!

You now have an in-depth understanding of how to load data into Rekall and visualize data using Vgrid. For more ideas of what you can do with Rekall, check out the [tech report](http://www.danfu.org/projects/rekall-tech-report/) and the [API documentation](https://rekallpy.readthedocs.io/en/latest/?badge=latest).