Skip to content

Ares Query Language

Hongzheng Shi edited this page Dec 3, 2018 · 1 revision

Overview

AQL is an effective time series analytical query language. It does not follow the SQL syntax of SELECT FROM WHERE GROUP BY etc like other SQL-like languages. Instead it is specified in structured fields and can be carried with JSON, YAML, Golang objects etc. For instance, instead of SELECT count(*) FROM trips GROUP BY city_id, the equivalent AQL in JSON would look like:

{
  "table": "trips",
  "dimensions": [
    {
      "sqlExpression": "city_id"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)"
    }
  ]
}

The goals of AQL include:

  • To enable non tech-savvy users to compose ad hoc queries by filling (and drag-n-drop) web forms corresponding to the structured fields, instead of writing in SQL syntax.
  • To easily validate the query in real time, preventing common query mistakes. To hide complex details such as timezone conversion, time bucketization, *-to-many joins, exist queries, over counting etc.
  • To easily split/rollup time series range for effective result caching and query optimization. In AresDB, query execution plan will be generated directly from AQL without any SQL involvement.

Language Tutorial

The Basics

An AQL always queries over a main table for 1 or more measures. A measure is an aggregate specified in a SQL aggregate expression, such as count(), sum(farefx_rate), avg(distance). An example query would be:

{
  "table": "trips",
  "measures": [
    {
      "sqlExpression": "count(*)"
    }
  ]
}

The result would look like: 1234

Dimensions

An AQL query can have 0 or more dimensions. Dimensions are equivalent to the GROUP BY clause in SQL. Each dimension is specified as a SQL expression. An example query would be:

{
  "table": "trips",
  "dimensions": [
    {
      "sqlExpression": "status"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)"
    }
  ]
}

The result would look like:

completed	1234
cancelled	2345
unfulfilled	4567

Time Filter

A time series AQL query must specify a time filter as a time range on a time column. Here's an example:

{
  "table": "trips",
  "measures": [
    {
      "sqlExpression": "count(*)"
    }
  ],
  "timeFilter": {
    "column": "request_at",
    "from": "2 days ago",
    "to": "now"
  }
}

See here for the full time filter spec.

Time Dimension

Some of the dimensions in an AQL query can be time dimensions, which must be specified with time bucketizers. A time bucketizer field specifies not only the size but also the alignment of the time bucket. For instance:

{
  "table": "trips",
  "dimensions": [
    {
      "sqlExpression": "request_at",
      "timeBucketizer": "day"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)"
    }
  ]
}

The result would look like:

2017-01-01	1234
2017-01-02	2345
2017-01-03	4567

In this query each time bucket always starts at UTC day boundary (midnight), regardless of the range in time filter. See here for the full time bucketizer and time unit spec.

Timezone

AQL has built-in timezone support since we are a global business. The timezone can be specified in one of the following ways:

  • Offsets like -8:00.
  • Timezone names like America/Los_Angeles, which automatically adjust for daylight saving.
  • Timezone for a city ID, i.e., timezone(city_id).
  • The corresponding region, mega region, sub region, or country timezone for a city ID, i.e., mega_region_timezone(city_id). Notice that time filter range boundaries and time dimension bucket boundaries are always aligned to the specified timezone at their specified granularities. For instance, from 30 days ago to now plus a week time bucketizer would generate the first bucket as from midnight 30 days ago until the first Monday midnight since then, followed by more week buckets all start and end at Monday midnights, followed by the final bucket which starts at the recent Monday midnight until now, with all midnights aligned to the specified timezone.

Row Filters

Row filters can be applied at either query level or at individual measure level, the former will apply to all measures within the query. Each filter is specified as a SQL boolean expression, such as status = 'completed'. Multiple filters are conjunctive (AND) in their effects. Disjunctions are archived within the same filter using OR. Here's an example.

{
  "table": "trips",
  "measures": [
    {
      "sqlExpression": "count(*)",
      "rowFilters": ["surge_multiplier > 1"]
    },
    {
      "sqlExpression": "sum(fare)"
    },
  ],
  "rowFilters": [
    "status = 'completed' or status = 'cancelled'",
    "city_id = 1"
  ]
}

The first measure counts SF surging trips that are either completed or cancelled, while the second measure sums fare for SF trips that are either completed or cancelled.

Joins, Foreign Tables

AQL supports joins of 0 or more foreign tables. Each foreign table is specified with its name, a mandatory alias, and 1 or more joining conditions as SQL boolean expressions. For instance:

{
  "alias": "cts",
  "table": "api_cities",
  "conditions": ["cts.id = city_id"]
}

There are a couple of key differences between a typical SQL join and an AQL join:

  • Any foreign table specified for joins are on-demand at individual measure level, meaning that simply adding the table to the join section does not guarantee the join to physically happen. For each measure, the join only happens if any column of the foreign table is used (or transitively used through other foreign tables) by the measure, any dimension or any filter. Otherwise the table will be skipped from joining.
  • The system automatically detects *-to-many join based on augmented schema and join conditions, and decides whether the join is physically implemented as join or EXISTS sub query.
  • Columns of foreign tables must always be qualified with the table alias (cts.id, not id); columns without qualification are assumed to be from the main table. Here's an example of a geo-join:
{
  "table": "trips",
  "joins": [
    {
      "alias": "g",
      "table": "geofences",
      "conditions": ["geography_intersects(request_point, g.shape)"]
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)",
      "rowFilters": ["g.uuid = 0x101'"]
    }
  ]
}

Numeric Bucketizers

Non-Aggregate Queries

While AQL is designed primarily for time series aggregates, it is also possible to express non-aggregate queries, such as selection queries. This is achieved by:

  • Use 1 or any constant as the measure expression.
  • Add each field to be queried as dimensions.
  • Primary key should also be added to dimensions, since AQL post processing deduplicates based on dimensions.
  • Specify an optional limit as the maximum number of results to return. It defaults to 1 when left unspecified.

Here's an example:

{
  "table": "trips",
  "measures": [
    {
      "sqlExpression": "1"
    }
  ],
  "dimensions": [
    {
      "sqlExpression": "hex(uuid)"
    },
    {
      "sqlExpression": "status"
    },
    {
      "sqlExpression": "city_id"
    },
  ],
  "limit": 3
}

The result would look like:

 0x12345678  completed   1   1 
 0x23456789  cancelled   12  1 
 0x34567890  unfulfilled 5   1