Reel - a Query Language for TrailDB
Reel is a small language for querying time-series of discrete events encoded as TrailDBs. The standard TrailDB API provides functionality for filtering individual events based on a Boolean query but expressing more complex queries over multiple events is left to the application, or to a language like Reel.
Reel makes it easy to express complex queries. Reel provides zero-cost abstraction over hand-written C code using the TrailDB API: Reel programs are compiled to optimization-friendly C code which can be easily accessed by any language.
What is Reel?
Consider Reel as AWK for TrailDB, following the tradition of data-driven programming. Whereas AWK programs are evaluated over lines of text, Reel programs are evaluted over a trail in TrailDB, that is, a sequence of events ordered by time.
Similar to AWK, Reel programs are series of pattern-action pairs which are evaluated for each event in the trail, in order. Typically actions involve updating internal variables of the program, such as counters, which are finally returned to the calling program or output to a CSV file.
Figuratively, this is like pulling a strip of frames (events) from a film reel (trail) and deciding what to do with each frame based on its contents (pattern-action). You can also control the movement of the film reel in the program (rewind, stop, or jump to a frame).
Reel is meant for computing metrics over series of events that happen over time. These types of queries are typically very tedious to express as SQL, and expensive to query at scale, which is the raison d'etre for TrailDB. Examples of such queries include:
Multi-event metrics - Compute metrics over multiple events, such as bounce rate, i.e. a person visiting only one page at a web site before leaving.
Funnel analysis - Analyze how many users follow a certain sequence of events within a given time window, e.g. from site visit to purchase.
Threading - A trail may include a mixture interleaved events produced by many concurrent sources. For instance, consider a person surfing the web across many web sites. You can use Reel to group events related to each web site and produce metrics for each of them separately.
Joins - Enrich events by joining them with external data based on a field or a more complex logic, and produce metrics over the enriched trails.
All the above use cases could be covered by a general-purpose language, such as Python or C, using the standard TrailDB API. This approach may be preferable for the trickiest use cases but in most cases it would be convenient to use a simpler and more expressive way to write queries.
Based on our experiences, here are some of the main motivations for using Reel:
Rapid - For its intended use, Reel is much more expressive than a low-level language like C and thus more rapid to develop. Reel compiles to C so performance is at least at the level of hand-written C, which makes its much faster than higher-level languages such as Go or Python.
Embeddable - By design, Reel queries are easy to embed in applications written in any language, either by linking them directly or by using a Foreign Function Interface, such as Python's ctypes.
Extensible - You can easily extend Reel itself by introducing new functions. You can build a library of domain-specific functions of your own on top of Reel.
Lightweight - Reel is a very simple language, so its cognitive overhead is minimal.
This tutorial introduces you to all features of Reel.
Make sure your system has Python and
Check out the Reel repo
git clone https://github.com/traildb/reel.
There are two main modes for using Reel: By default, the
takes a Reel program and a TrailDB and executes the program over the
TrailDB and outputs the results as CSV. Alternatively, you can use
reel to only compile your program for linking with an external
application (TODO: How?).
You can check that your setup works by executing
./reel doc/00-helloworld.rl doc/hedgehog1.tdb
If everything works smoothly, you should see output like
You can find all examples below in the
/doc/ directory in the Reel repo.
Most examples use a TrailDB, included in the repo at
which contains one trail that looks as follows:
Besides the standard
time (shown in the upper left corner of each
uuid fields, there are two additional fields:
hedgehog. As you can see in the illustration above, the frames are
either yellow, blue, or the color is undefined (white). Some frames have
a hedgehog. Your TrailDBs are surely more interesting but they work the
same way with Reel.
Print out hello world.
# variable declarations var Hello uint var World uint # a pattern matching the beginning of the trail begin: # actions set Hello 1 set World 2
Variable declarations are listed at the top of the program. All variables used in the program must be declared here.
beginabove, are listed at the top level and they are evaluated in order for every event. An event can match zero or more patterns, not just one.
Actions are evaluated for each matching pattern. Reel is a whitespace sensitive language, like Python. Actions need to be indended under the pattern.
Comments are prefixed by
Reel is a strongly typed language with static (compile-time) type checking.
The types are defined in the variable declarations, like
var Hello uint above.
Reel has the following scalar types:
- uint - a 64-bit unsigned integer
- item - a TrailDB item, i.e a field-value pair.
Much of the power of Reel comes from its first-class support of TrailDB items.
Literal references to TrailDB fields and items are prefixed with a dollar sign,
Variables don't have a prefix.
By convention, variables starting with an uppercase letter are shown in the output. Variables starting with an underscore are reserved.
Count the number of hedgehogs.
var Hedgehogs uint if $hedgehog: inc Hedgehogs 1
Execute the example by running
./reel doc/01-counter.rl doc/hedgehog1.tdb.
As seen in the picture above, the result should be 6 hedgehogs.
This example uses one variable,
Hedgehogs, which is incremented every
time we find a hedgehog. The pattern
if $hedgehog matches every event
that has a non-empty value in the field
$hedgehog. The action
increments its first argument, which needs to be of type
uint, by the
amount specified in the second
Count the number of colors.
var Blues uint var Yellows uint if $color $color='yellow': inc Yellows 1 if $color $color='blue': inc Blues 1
This program should output 3 blues and 5 yellows.
In contrast to the previous example, which matched any event with
a non-empty value for
$hedgehog, this example matches a specific
value of the field. Reel provides special syntax for resolving
specific values or item literals, such as
Item literals are resolved when the program is initialized using
tdb_get_item, which allows
Reel to evaluate the pattern quickly.
You need to specify both the field name,
$color, as well as the value,
yellow, since the same value may exist in multiple fields. The pattern
matches when the field
$color in the event has the value
Besides scalar types,
item, Reel supports special arrays,
called (lookup) tables, which map items to other values. Lookup tables
come in handy when dealing with fields with a high cardinality. Handling
cases like this with item literals, like in the previous example, would
not be very convenient.
Count the number of colors using a lookup table.
var Colors table:$color->uint if 1: inc Colors[$color] 1
Like the previous exaple, this example counts the number of colors but
this time using a lookup table. The variable declaration
var Colors table:$color->uint defines a table
Colors that maps all values of
uint. The table is created at the time of initialization,
so no run-time memory allocations are needed.
if 1 evaluates to true for each event. Because of this, you
can see that this example counts the number of whites as well. Can you
modify the program so that it counts only blues and yellows?
Score colors based on a preset table.
var Colors table:$color->uint var Scores table:$color->uint if 1: inc Colors[$color] Scores[$color]
A very powerful feature of Reel is that you can assign values to variables in the calling application. Hence you can use variables to parameterize behavior of the program.
Try executing the above program as usual:
./reel doc/04-preset-table.rl doc/hedgehog1.tdb
The result will be all zeroes: The program increments
on the value defined in
Scores, which is zeroes by default.
Now try this:
./reel doc/04-preset-table.rl --set Scores=@doc/scores.csv doc/hedgehog1.tdb
--set option can be used to preset values to variables. In this case,
we are populating the
Scores tables based on the values in
TrailDB is all about events over time, so Reel handles time natively.
The timestamp of the current event is accessible through the field
$time that has the type
Find pairs of hedgehogs that are not more than
window time units apart.
var Pairs uint var hedgehog_time uint var window uint if $hedgehog: if hedgehog_time and time_before hedgehog_time window: inc Pairs 1 set hedgehog_time $time else: set hedgehog_time 0
This example finds all hedgehogs that are immediately followed by
another hedgehog that comes at most
window seconds after the first
one. This is accomplished by storing the time of each hedgehog seen in
hedgehog_time and by comparing it to the current time with the
window is zero by default so no pairs are found. Try
./reel doc/04-preset-table.rl --set window=1 doc/hedgehog1.tdb
that captures the pair at 5-6 seconds. By setting the
you can also capture the pair at 1-5 seconds etc.
By default, Reel evaluates every event in the trail. You can control what events Reel sees by using TrailDB's event filter. You can also control the flow of events using control flow statements of Reel:
beginis evaluated before the first event, even when there may be no events due to event filtering. It must be the first pattern of the program.
endis evaluated after the last event, before the program exits. It must be the last pattern of the program.
stopearly exits evaluation of the trail. Note that the
endpattern is evaluated immediately after calling
nextjumps to the next event immediately, without evaluating rest of the pattens.
rewindjumps to the first event, without evaluating
setpos posjumps to the arbitrary position in the trail, stored in the
pos. You can retrieve the current position in the trail from a special variable
Count the number of blues in the trail if the last event of the trail is yellow.
var last_is_yellow uint var prev_is_yellow uint var Blues uint if last_is_yellow and if $color $color='blue': inc Blues 1 if $color $color='yellow': set prev_is_yellow 1 else: set prev_is_yellow 0 end: if prev_is_yellow and not if last_is_yellow: set last_is_yellow 1 rewind
This example first checks the color of the last event. If it is yellow,
we go back to the beginning with
rewind and count the number of blues.
After counting the blues, we need to skip the last event check in
otherwise we would be looping through the trail infinitely.
Grouping by Splitting Contexts
All the examples seen this far produce one set of metrics, that is,
one row of output. This is similar to what you would get with a SQL
statement that only consists of aggregate functions, like
select count(*), sum(field) from ... where .... Often you want to get metrics
for many groups separately, which is what you get with
group by in
With a declarative language like SQL, the query engine can figure out the groupings automatically. Reel, being more imperative in nature, requires a more explicit approach.
Reel programs manage their state in a context that contains all
variables of the program. Grouping in Reel is done by creating new
copies of the context on the fly with the
fork statement. Each
context with its variables becomes a row of output.
Groups are identified by a key that can be any scalar variable. An important feature of groupings is that by default each context, that is, each key is activated only once for each trail. The following example makes it clear why this is a good idea.
Count the number of hedgehogs grouped by color.
var Color item var Hedgehogs uint if Color $color: # this is evaluated in the child context if $hedgehog: inc Hedgehogs 1 else: # this is evaluated in the parent context if $color: fork $color: send Color $color
Reel programs with groupings have typically two branches of execution:
The parent, which takes care of creating the desired groups with
and the child contexts, or the groups, which compute the actual metrics.
We know which branch to execute based on the group label, like
above - only the children have a non-empty group label.
The example above creates a new group, keyed by the
$color field, for
each event seen with a non-empty
$color. This is necessary since we do
not know the groups in advance. It would be wasteful to evaluate all
possible groups opportunistically. The fact that Reel deduplicates groups
based on their keys ensures that we compute metrics for each group only
The actions under
fork are evaluated only when the group is activated.
A special action,
send, allows the parent to set variables in the new
child context. In this case, we set the
Color variable of the child to
correspond to the
$color that invoked the child.
The output of the program looks like follows:
Color,Hedgehogs ,0 yellow,4 blue,1
which is what you would expect based on the picture above. Reel outputs
each context as a row, including the parent context, which explains the
first row with an empty
Color. Learn how this can be avoided in
the section "Formatting Output" below.
Managing State across Multiple Trails
All the examples this far have computed metrics over a single trail. In real life, it is more typical to compute metrics over multiple trails, which Reel supports without trouble. You just need to manage state properly, as shown below.
Let's move on to another example TrailDB,
contains another trail in addition to the one we have been using this far:
reel executes the program for every trail in the TrailDB. The
begin pattern is called in the beginning of each trail but otherwise the
program state is kept intact over trails.
You can see this by executing the hedgehog counter over the two-trail TrailDB:
./reel doc/01-counter.rl doc/hedgehog2.tdb
This should output 12, as there are 12 hedgehogs in total. You can also try
./reel doc/02-colors.rl doc/hedgehog2.tdb
which shows that there are 6 yellows and 6 blues in total.
However, try to execute the program that counts the number of blues for trails that end with yellow color
./reel doc/06-rewind.rl doc/hedgehog2.tdb
This returns 6, which is not correct. The program counts the blues in the second trail, although it doesn't end with a yellow event. This is because its internal state is incorrectly carried over from the previous trail to the next one. It is easy to fix:
var last_is_yellow uint var prev_is_yellow uint var Blues uint begin: unset last_is_yellow unset prev_is_yellow if last_is_yellow and if $color $color='blue': inc Blues 1 if $color $color='yellow': set prev_is_yellow 1 else: set prev_is_yellow 0 end: if prev_is_yellow and not if last_is_yellow: set last_is_yellow 1 rewind
We added a
begin block that resets the internal state by calling
unset for the trail-specific variables. This fixes the program. Note
that we do not want to reset the
Blues counter, which is supposed to
collect statistics over multiple trails.
You should always carefully initialize any variables in
begin that are
not supposed to leak over trails.
You can post-process and output results of a Reel program arbitrarily by using its API, as described in the "Embedding Reel" section below. This section focuses on the Reel command line tool, which outputs CSV.
By convention, all variables starting with an uppercase letter are output
as columns in the CSV. The order of variable declarations defines the order
of the columns. Lookup tables starting with an uppercase letter are expanded
to columns using
variable:value notation for the column labels.
Every context becomes a row in the CSV. This is not always desirable, as seen
in the group-by example above. Often, you want to exclude the parent context
from the output, or you may decide, based on the computed metrics, that a row
should not be included in the results. To make this possible, Reel defines a
_HIDE, which if set to a non-zero value, causes the context
to be excluded from the output. Here is an example, extending the previous
var Color item var Hedgehogs uint var _HIDE uint begin: if Color: set _HIDE 0 else: set _HIDE 1 if Color $color: # this is evaluated in the child context if $hedgehog: inc Hedgehogs 1 else: # this is evaluated in the parent context if $color: fork $color: send Color $color
This program excludes the parent context from the output using the
_HIDE variable. Only
blue are output.
A Reel program compiles to a self-sufficient C object. The objects are namespaced so that you can include multiple programs in your application without name clashes. You can also compile the Reel program to a shared library and use it from other languages through their Foreign Function Interface.
TODO: Document API, examples.
Reel is designed to be easily extensible with user-defined actions. Since Reel is strongly typed, you need to provide type-specialized versions of your actions, defined as C functions, for the types that your action accepts.