# Rekall Tutorial: Cyclist Detection

In this tutorial, you'll learn how to use Rekall to detect a new class of objects (cyclists) from existing person and bicycle detections from Mask R-CNN.

Let's first import Rekall and a few of its important classes:

In [1]:
%load_ext autoreload
%autoreload 2
from rekall import Interval, IntervalSet, IntervalSetMapping, Bounds3D
from rekall.predicates import *

For this tutorial, we'll provide some helpers that handle loading videos and pre-computed object detections from our servers. Run this cell to load in those helpers:

In [2]:
from cyclist_tutorial_helpers import *

And now let's load up the pre-computed bounding box detections:

In [3]:
bboxes = get_maskrcnn_bboxes()

We can use the `visualize_helper` function to visualize these bounding boxes. Click on the video to expand it, and play the video by hovering over it and using `;`. You can navigate through the video by clicking through the timeline, and using the `+` and `-` buttons to zoom in or out.

In [4]:
visualize_helper([bboxes])


# Filtering on Payload
Let's give a preview of some of the things you'll be able to do with Rekall. In the above two cells we've loaded up bounding box detections over two videos, and visualized them for you.

Let's start by filtering the bounding box detections by class to look at bicycle and person detections:

In [6]:
bikes = bboxes.filter(lambda interval: interval['payload']['class'] == 'bicycle')
person = bboxes.filter(lambda interval: interval['payload']['class'] == 'person')

And now let's visualize them:

In [7]:
visualize_helper([bikes, person])


In [8]:
# Try some payload filtering functions yourself here!

# Filtering on Bounds

We are using Rekall's [Bounds3D](https://rekallpy.readthedocs.io/en/latest/source/rekall.bounds.html#rekall.bounds.Bounds3D) to represent these intervals. The intervals all have co-ordinates `t1`, `t2` (seconds), `x1`, `x2` (frame relative, between 0 and 1), and `y1`, `y2` (frame relative, between 0 and 1).
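To make the frame-relative convention concrete, here is a minimal plain-Python sketch that converts `x1`/`x2`/`y1`/`y2` values between 0 and 1 into pixel coordinates. The `to_pixels` helper and the 1280x720 frame size are hypothetical and are not part of Rekall; the box is a plain `dict` standing in for an interval's bounds.

```python
def to_pixels(bounds, frame_width, frame_height):
    """Convert frame-relative x/y bounds (0 to 1) into integer pixel coordinates."""
    return {
        'x1': round(bounds['x1'] * frame_width),
        'x2': round(bounds['x2'] * frame_width),
        'y1': round(bounds['y1'] * frame_height),
        'y2': round(bounds['y2'] * frame_height),
    }

# Hypothetical detection: visible from 0 to 10 seconds in a 1280x720 frame.
box = {'t1': 0, 't2': 10, 'x1': 0.5, 'x2': 0.75, 'y1': 0.25, 'y2': 0.5}
pixel_box = to_pixels(box, frame_width=1280, frame_height=720)
print(pixel_box)  # {'x1': 640, 'x2': 960, 'y1': 180, 'y2': 360}
```

Because the bounds are stored relative to the frame, the same interval stays valid if the video is resized.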

We can filter on the bounds co-ordinates as well:

In [9]:
visualize_helper([
    bikes.filter(lambda interval: interval['t1'] < 300),
    person.filter(lambda interval: interval['t1'] < 300)
])


In [10]:
# Try some bounds filtering functions yourself here!
visualize_helper([
    bikes.filter(lambda interval: interval['x1'] < 0.5),
    person.filter(lambda interval: interval['x1'] < 0.5)
])


# Rekall's Data Model

Now that we have a flavor of what we can do with Rekall, let's build our understanding of the data representation from the ground up. Let's first understand what `Interval`s are - these are the fundamental data structure that we use to represent any annotations in videos.

Here's a figure demonstrating what these Intervals can look like:

![video_volume_v2.png](https://storage.googleapis.com/esper/dan_olimar/rekall_tutorials/videovolume_v2.png)

Intervals are parameterized by a Bounds object (`Bounds3D` in all the intervals above), and an optional payload (face identities, word in the caption, or nested Intervals in the figure above):

```Python
# This interval has time bounds from 0 to 10 seconds, X bounds from 0.5 to 0.7 (frame-relative),
# and Y bounds from 0.6 to 0.9 (frame-relative)
new_interval = Interval(Bounds3D(
    t1 = 0,
    t2 = 10,
    x1 = 0.5,
    x2 = 0.7,
    y1 = 0.6,
    y2 = 0.9
))

# This interval has time bounds from 5 to 15 seconds, and default X and Y bounds of the whole
# frame (0 to 1 for both X and Y)
new_interval2 = Interval(Bounds3D(5, 15))

# This interval has a payload. The payload can be an arbitrary object.
new_interval3 = Interval(Bounds3D(0, 1), payload={ 'class': 'my first payload' })
                         
# We can access the co-ordinates and payload of an Interval directly
print(new_interval['t1'], new_interval['t2'], new_interval['x1'])
print(new_interval2['t1'], new_interval2['x1'])
print(new_interval3['payload'])
print(new_interval3['payload']['class'])
```

Try it yourself below!

**NB: If you're coming from the paper/tech report, the words "Label" and "Interval" are interchangeable in the code.**

In [11]:
# This interval has time bounds from 0 to 10 seconds, X bounds from 0.5 to 0.7 (frame-relative),
# and Y bounds from 0.6 to 0.9 (frame-relative)
new_interval = Interval(Bounds3D(
    t1 = 0,
    t2 = 10,
    x1 = 0.5,
    x2 = 0.7,
    y1 = 0.6,
    y2 = 0.9
))

# This interval has time bounds from 5 to 15 seconds, and default X and Y bounds of the whole
# frame (0 to 1 for both X and Y)
new_interval2 = Interval(Bounds3D(5, 15))

# This interval has a payload. The payload can be an arbitrary object.
new_interval3 = Interval(Bounds3D(0, 1), payload={ 'class': 'my first payload' })

# We can access the co-ordinates and payload of an Interval directly
print(new_interval['t1'], new_interval['t2'], new_interval['x1'])
print(new_interval2['t1'], new_interval2['x1'])
print(new_interval3['payload'])
print(new_interval3['payload']['class'])

0 10 0.5
5 0.0
{'class': 'my first payload'}
my first payload


In [12]:
# Create some Intervals yourself here!


# Associating Intervals with Events
In Rekall, we use *sets* of Intervals to represent events in videos. A single `IntervalSet` contains all occurrences of an event in a single video (all the bounding box detections, all the cyclist annotations, etc).

We can create an `IntervalSet` by passing in a list of `Interval`s:

```Python
# This IntervalSet represents all occurrences of a "made up" event in a video
my_first_intervalset = IntervalSet([
    Interval(Bounds3D(0, 10), payload = { 'class': 'made up'} ),
    Interval(Bounds3D(20, 30), payload = { 'class': 'made up'} ),
])
```

In [13]:
my_first_intervalset = IntervalSet([
    Interval(Bounds3D(0, 10), payload = { 'class': 'made up'} ),
    Interval(Bounds3D(20, 30), payload = { 'class': 'made up'} ),
])

One last thing - we want to associate each `IntervalSet` with the right video. We might have detected bikes in one video, but not the other!

We use `IntervalSetMapping` to associate `IntervalSet`s with different videos by keys. We create an `IntervalSetMapping` by passing in a `dict` from video keys to `IntervalSet`s:

```Python
my_first_ism = IntervalSetMapping({
    0: IntervalSet(...), # the IntervalSet for video 0
    2: IntervalSet(...) # the IntervalSet for video 2
})
```

`bboxes` is an `IntervalSetMapping` object that we pre-loaded with `Interval`s containing object detections from Mask-RCNN.

Similarly, `bikes` and `person` are `IntervalSetMapping` objects representing the event that a bicycle object or a person object was detected in the video.

We can look at the keys of one of the `IntervalSetMapping` objects to see what the keys in our videos are!

In [14]:
bboxes.keys()

dict_keys([0, 3])

The videos that we were looking at before have keys 0 and 3. The visualization shows the videos in sorted order, so we know that the first video is video 0, and the second video is video 3.

Now we can create a simple `IntervalSetMapping` and visualize it:

In [15]:
my_first_ism = IntervalSetMapping({
    0: my_first_intervalset,
    3: IntervalSet([Interval(Bounds3D(50, 60, 0.5, 0.8, 0.1, 0.3)), Interval(Bounds3D(100,200))])
})

In [16]:
visualize_helper([my_first_ism])


Now try it yourself below! Create some new `IntervalSetMapping` objects and visualize them on our videos.

**NB: if you try to visualize an `IntervalSet` with only a single Interval in it, the timeline may not appear until you click on the video.**

In [17]:
# Try it yourself!
visualize_helper([IntervalSetMapping({
    3: IntervalSet([Interval(Bounds3D(50, 60, 0.5, 0.8, 0.1, 0.3), payload={'class': 'temp', 'score': 1, 'spatial_type': SpatialType_Bbox(text='tmp')}), Interval(Bounds3D(100,200))])
})])


# Checkpoint: check on your understanding

At this point, you should be familiar with a few concepts:
* We represent events in videos as *sets* of intervals - objects that are defined by a Bounds object and an optional payload
* An `IntervalSet` represents all the occurrences of a particular event in a single video
* An `IntervalSetMapping` organizes multiple `IntervalSet`s and associates each one with a different video

Since we told you that `bboxes`, `bikes`, and `person` are `IntervalSetMapping` objects, you may have also surmised that we have some functions for manipulating these sets of intervals - we've already looked at a few examples of the `filter` function. But now that we know more about the underlying data representation, we can take a closer look at `filter`.

From the [documentation](https://rekallpy.readthedocs.io/en/latest/index.html#rekall.IntervalSet.filter):

```Python
def filter(self, predicate):
    """
    Filter the set and keep intervals that pass the predicate.
    
    Args:
        predicate: A function that takes an Interval and returns a bool.
        
    Returns:
        A new IntervalSet which is the filtered set.
    """
```

So `filter` expects a function that will take an `Interval` and return `True` or `False`. It runs the predicate function on every interval, and only keeps the ones where the predicate returns `True`.
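The behaviour described above can be sketched in plain Python. This is only an illustration of the semantics, not Rekall's implementation: `filter_intervals` is a hypothetical helper, and the detections are plain `dict`s standing in for `Interval` objects.

```python
def filter_intervals(intervals, predicate):
    """Return a new list with only the intervals for which predicate is True."""
    return [interval for interval in intervals if predicate(interval)]

# Stand-in detections; real Rekall Intervals expose the same keys.
detections = [
    {'t1': 0, 't2': 1, 'payload': {'class': 'bicycle'}},
    {'t1': 0, 't2': 1, 'payload': {'class': 'person'}},
    {'t1': 400, 't2': 401, 'payload': {'class': 'bicycle'}},
]

# Keep bicycles that start in the first 300 seconds.
early_bikes = filter_intervals(
    detections,
    lambda interval: interval['payload']['class'] == 'bicycle' and interval['t1'] < 300,
)
print(early_bikes)  # [{'t1': 0, 't2': 1, 'payload': {'class': 'bicycle'}}]
```

Note that the input list is left untouched; the filter builds a new collection rather than modifying the original.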

Notice that the documentation says that `filter` returns a new `IntervalSet` - that's because it's actually a function on `IntervalSet`! `IntervalSetMapping` simply reflects all the functions in `IntervalSet` and applies the functions to every `IntervalSet`. So even though we wrote the function over `IntervalSet`s, we can run them on `IntervalSetMapping` objects like `bboxes`, `bikes`, and `person`.
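Conceptually, running a per-set function over a mapping looks something like the sketch below. It uses a plain `dict` of lists in place of an `IntervalSetMapping`, and `apply_per_video` and `keep_cars` are hypothetical names chosen for illustration; how Rekall actually forwards these calls internally may differ.

```python
def apply_per_video(mapping, set_function):
    """Apply a function written for one video's intervals to every video in the mapping."""
    return {key: set_function(intervals) for key, intervals in mapping.items()}

def keep_cars(intervals):
    """A per-video function: keep only intervals whose payload class is 'car'."""
    return [i for i in intervals if i['payload']['class'] == 'car']

# Stand-in for an IntervalSetMapping with video keys 0 and 3.
mapping = {
    0: [{'t1': 0, 't2': 10, 'payload': {'class': 'car'}},
        {'t1': 0, 't2': 10, 'payload': {'class': 'godzilla'}}],
    3: [{'t1': 10, 't2': 20, 'payload': {'class': 'godzilla'}}],
}

cars_by_video = apply_per_video(mapping, keep_cars)
print({key: len(intervals) for key, intervals in cars_by_video.items()})  # {0: 1, 3: 0}
```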

## Checkpoint exercise: duplicate certain labels across the entirety of a video
Let's go through a simple exercise to check your understanding of these concepts.

We'll define a simple `IntervalSetMapping` with a few objects. We'll want you to duplicate the Intervals at time 0 of video 0 throughout videos 0 and 3, but only if the class in their payload is `car`.

In [18]:
objects_to_duplicate = IntervalSetMapping({
    0: IntervalSet([
        Interval(Bounds3D(0, 10, 0.1, 0.3, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(0, 10, 0.4, 0.6, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(0, 10, 0.7, 0.9, 0.1, 0.9), payload={ 'class': 'godzilla' }),
        Interval(Bounds3D(10, 20, 0.1, 0.3, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(10, 20, 0.4, 0.6, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(10, 20, 0.7, 0.9, 0.1, 0.9), payload={ 'class': 'godzilla' })
    ]),
    3: IntervalSet([
        Interval(Bounds3D(0, 10, 0.2, 0.4, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(0, 10, 0.5, 0.7, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(0, 10, 0.75, 0.95, 0.1, 0.9), payload={ 'class': 'godzilla' }),
        Interval(Bounds3D(10, 20, 0.2, 0.4, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(10, 20, 0.5, 0.7, 0.1, 0.9), payload={ 'class': 'car' }),
        Interval(Bounds3D(10, 20, 0.75, 0.95, 0.1, 0.9), payload={ 'class': 'godzilla' })
    ])
})

In particular, we'll want the resulting Intervals to have the following properties:
* You should have Intervals with time extents of (0, 10), (10, 20), (20, 30), etc.
* The Intervals should have the same X/Y extent and payload as the Intervals at *time 0* of video 0 that have payload 'car'.
* The Intervals should cover the entire video.

A few hints to get you started:
* You can access `IntervalSet 0` of `objects_to_duplicate` like this: `objects_to_duplicate[0]`.
* You can loop through all the video keys of `objects_to_duplicate` using an iterator: `[k for k in objects_to_duplicate]`.
* You can filter down to Intervals that start at time 0 like this: `objects_to_duplicate.filter(lambda interval: interval['t1'] == 0)`
* You can access all the Intervals in an `IntervalSet` like this: `objects_to_duplicate[0].get_intervals()`. This will return a list of Intervals sorted by time.
* For example, to get the t2 value of the last interval in `bboxes`: `bboxes[0].get_intervals()[-1]['t2']`
* You can access the bounds of an interval like so: `interval['bounds']`
* You can copy a bounds like so: `interval['bounds'].copy()`
* Remember you can use Python ranges to get a value every ten seconds: `range(0, end, 10)`
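As a small illustration of the last hint, this sketch turns a video length into ten-second time windows. The `window_starts` helper and the 35.2-second length are made up for the example.

```python
def window_starts(video_length, step=10):
    """Start times of consecutive windows, one every `step` seconds."""
    # range() needs integers, so a fractional video length is truncated first.
    return list(range(0, int(video_length), step))

starts = window_starts(35.2)
windows = [(t, t + 10) for t in starts]
print(windows)  # [(0, 10), (10, 20), (20, 30), (30, 40)]
```

The last window can run past the end of the video, which is fine when the goal is simply to cover the whole video.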

In [19]:
# Construct an IntervalSetMapping with the answer here!

selected_bounds = []
for interval in objects_to_duplicate[0].get_intervals():
    if interval['t1'] == 0 and interval['payload']['class'] == 'car':
        selected_bounds.append(interval['bounds'].copy())

results = IntervalSetMapping({
    k: IntervalSet([
        Interval(Bounds3D(t, t + 10, bound['x1'], bound['x2'], bound['y1'], bound['y2']), payload={'class': 'car'})
        for t in range(0, int(bboxes[k].get_intervals()[-1]['t2']), 10)
        for bound in selected_bounds
    ])
    for k in objects_to_duplicate
})

visualize_helper([results])


## Exercise solution:

Example solution to this exercise - don't look if you haven't tried it!

```Python
objects_at_start = objects_to_duplicate[0].filter(
    lambda interval: interval['t1'] == 0 and interval['payload']['class'] == 'car'
).get_intervals()

# Approximate each video's length by the end time of its last detection
video_lengths = {
    key: bboxes[key].get_intervals()[-1]['t2']
    for key in bboxes
}

objects_duplicated = IntervalSetMapping({
    key: IntervalSet([
        Interval(
            Bounds3D(
                t1 = t,
                t2 = t + 10,
                x1 = interval['x1'],
                x2 = interval['x2'],
                y1 = interval['y1'],
                y2 = interval['y2']
            ),
            # Carry over the payload of the original car Interval
            payload = interval['payload']
        )
        for t in range(0, int(video_lengths[key]), 10)
        for interval in objects_at_start
    ])
    for key in bboxes
})

visualize_helper([objects_duplicated])
```

# Defining New Events through Composition and Manipulation Functions for Cyclist Detection

Now that we have that basic understanding of Rekall's data representation, let's look at some more complex manipulation functions to try to make a cyclist detector.

First, let's look at the [`join`](https://rekallpy.readthedocs.io/en/latest/index.html#rekall.IntervalSet.join) function. This function computes the cross product of two `IntervalSet`s, filters the resulting pairs by a predicate, and then merges the resulting pairs back into a single `Interval` using a merge operation:

![simple_join.png](https://olimar.stanford.edu/hdd/rekall_tutorials/simple_join.png)
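The three steps (cross product, predicate, merge) can be sketched in plain Python. This is a naive illustration of the idea rather than Rekall's implementation; `naive_join` is a hypothetical helper and the intervals are plain `dict`s with made-up `name` fields.

```python
from itertools import product

def naive_join(left, right, predicate, merge_op):
    """Pair every left interval with every right interval, keep the pairs that
    pass the predicate, and merge each surviving pair into one interval."""
    return [
        merge_op(a, b)
        for a, b in product(left, right)
        if predicate(a, b)
    ]

people = [{'t1': 0, 't2': 1, 'name': 'person A'},
          {'t1': 1, 't2': 2, 'name': 'person B'}]
bicycles = [{'t1': 0, 't2': 1, 'name': 'bike A'},
            {'t1': 5, 't2': 6, 'name': 'bike B'}]

pairs = naive_join(
    people,
    bicycles,
    # Keep only pairs with identical time bounds.
    predicate=lambda a, b: a['t1'] == b['t1'] and a['t2'] == b['t2'],
    # Merge a pair into a single record that remembers both sources.
    merge_op=lambda a, b: {'t1': a['t1'], 't2': a['t2'], 'names': (a['name'], b['name'])},
)
print(pairs)  # [{'t1': 0, 't2': 1, 'names': ('person A', 'bike A')}]
```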

Here's an example of using a `join` operation to create an `IntervalSetMapping` object containing instances of a `person` bounding box overlapping with a `bicycle` bounding box.
```Python
person_intersect_bike = person.join(
    bikes,
    predicate = and_pred(
        Bounds3D.T(equal()),
        Bounds3D.X(overlaps()),
        Bounds3D.Y(overlaps())
    ),
    merge_op = lambda interval1, interval2: Interval(
        interval1['bounds'].span(interval2['bounds'])
    ),
    window = 0.0,
    progress_bar = True
)
```

This function joins `person` and `bikes`. The predicate expects a function of the following format:
```Python
def predicate(interval1, interval2):
    # return True or False
```

We provide a number of spatial and temporal predicates in Rekall, outlined [here](https://rekallpy.readthedocs.io/en/latest/source/rekall.predicates.html). In this case, we're only keeping pairs of (person detection, bike detection) if they have the same time bounds (`Bounds3D.T(equal())`), and the `X` and `Y` bounds overlap (`Bounds3D.X(overlaps())`, `Bounds3D.Y(overlaps())`). The `and_pred` wrapper takes in an arbitrary number of predicates and makes sure that all of them pass.
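To show what this combined predicate checks, here is a plain-Python stand-in. The helpers `all_of`, `same_time`, and `overlaps_on` are hypothetical and only mimic the roles of `and_pred`, `Bounds3D.T(equal())`, and `Bounds3D.X(overlaps())`/`Bounds3D.Y(overlaps())`; in particular, the strict-inequality overlap test is an assumption, and Rekall's predicates may treat touching boundaries differently.

```python
def all_of(*predicates):
    """Combine predicates: the result passes only if every predicate passes."""
    return lambda a, b: all(p(a, b) for p in predicates)

def same_time(a, b):
    """True when both intervals have identical time bounds."""
    return a['t1'] == b['t1'] and a['t2'] == b['t2']

def overlaps_on(axis):
    """Build a predicate that checks 1D overlap along 'x' or 'y'."""
    lo, hi = axis + '1', axis + '2'
    return lambda a, b: a[lo] < b[hi] and b[lo] < a[hi]

person_on_bike = all_of(same_time, overlaps_on('x'), overlaps_on('y'))

person = {'t1': 0, 't2': 1, 'x1': 0.4, 'x2': 0.6, 'y1': 0.2, 'y2': 0.8}
near_bike = {'t1': 0, 't2': 1, 'x1': 0.5, 'x2': 0.8, 'y1': 0.5, 'y2': 0.9}
far_bike = {'t1': 0, 't2': 1, 'x1': 0.7, 'x2': 0.9, 'y1': 0.5, 'y2': 0.9}

print(person_on_bike(person, near_bike))  # True
print(person_on_bike(person, far_bike))   # False
```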

The `merge_op` expects a function of this form:
```Python
def merge_op(interval1, interval2):
    # return a new Interval
```

In this case, we are returning a new Interval whose bounds span both the Intervals in the pair - basically, the minimum bounding box that covers both of them.
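The "minimum bounding box that covers both" idea can be written out as a min/max over each pair of co-ordinates. The `span` function below is a plain-Python sketch over `dict`s that mirrors what the text describes; it is not the library's `span` method itself.

```python
def span(a, b):
    """Smallest bounds (in time, X, and Y) that cover both a and b."""
    return {
        't1': min(a['t1'], b['t1']), 't2': max(a['t2'], b['t2']),
        'x1': min(a['x1'], b['x1']), 'x2': max(a['x2'], b['x2']),
        'y1': min(a['y1'], b['y1']), 'y2': max(a['y2'], b['y2']),
    }

person_box = {'t1': 0, 't2': 1, 'x1': 0.4, 'x2': 0.6, 'y1': 0.2, 'y2': 0.8}
bike_box = {'t1': 0, 't2': 1, 'x1': 0.5, 'x2': 0.8, 'y1': 0.5, 'y2': 0.9}

cyclist_box = span(person_box, bike_box)
print(cyclist_box)
# {'t1': 0, 't2': 1, 'x1': 0.4, 'x2': 0.8, 'y1': 0.2, 'y2': 0.9}
```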

We pass in a `window` of `0.0` - this is an optimization that limits the pairs in the cross product to only those Intervals whose time bounds are at most `window` seconds apart. A `window` value of `0` limits the pairs to only those that overlap in time. Since our predicate already requires the time bounds to be equal, we know that this optimization won't change our results.
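The sketch below illustrates which pairs survive a given `window`. It assumes the distance between two intervals is the gap between their time bounds (0 when they overlap or touch), which may not match Rekall's exact definition, and it checks every pair rather than pruning efficiently. `time_gap` and `candidate_pairs` are hypothetical names.

```python
def time_gap(a, b):
    """Seconds separating two intervals in time; 0 if they overlap or touch."""
    return max(0, a['t1'] - b['t2'], b['t1'] - a['t2'])

def candidate_pairs(left, right, window):
    """Pairs whose time bounds are at most `window` seconds apart."""
    return [(a, b) for a in left for b in right if time_gap(a, b) <= window]

left = [{'t1': 0, 't2': 1}, {'t1': 5, 't2': 6}]
right = [{'t1': 0, 't2': 1}, {'t1': 5.5, 't2': 6.5}, {'t1': 20, 't2': 21}]

print(len(candidate_pairs(left, right, window=0.0)))  # 2
print(len(candidate_pairs(left, right, window=5.0)))  # 4
print(len(left) * len(right))                         # 6 pairs in the full cross product
```

A smaller window means fewer pairs reach the predicate, which is where the speed-up comes from.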

Finally, we pass `progress_bar=True` just to visualize a progress bar while we wait for this computation to complete (there are a lot of detections to process).

Let's run this function and visualize the results below!

In [23]:
# Use the join above to construct bounding boxes where a person and bike overlap!
person_intersect_bike = person.join(
    bikes,
    predicate = and_pred(
        Bounds3D.T(equal()),
        Bounds3D.X(overlaps()),
        Bounds3D.Y(overlaps())
    ),
    merge_op = lambda interval1, interval2: Interval(
        interval1['bounds'].span(interval2['bounds'])
    ),
    window = 0.0,
    progress_bar = True
)


  0%|          | 0/2 [00:00<?, ?it/s]
100%|██████████| 2/2 [00:11<00:00,  5.58s/it]


In [21]:
visualize_helper([person_intersect_bike, bikes, person])


# Congratulations!

You've now used a simple Rekall query to detect bicyclists by composing person detections with bicycle detections.

Next, check out the parking space detection tutorial to take a deeper dive into some of Rekall's functions and detect empty parking spaces using nothing more than the outputs of an off-the-shelf object detector!