Skip to content
Yan Cui edited this page Jun 23, 2013 · 11 revisions

Whilst I tried to make the syntactical difference between querying and scanning minimal, DynamoDB has some limitations and differences to what you can do in a Query and Scan request. For instance, whilst you can use the full range of comparison operators ("EQ", "NE", "GT", etc.) in a Scan request, only a subset of these operators ("EQ", "GT", "GE", "LT", "LE", "BEGINS_WITH" and "BETWEEN") can be used in a Query request.

To learn more about the supported operations, read this and this API doc.

When to use

The external DSL outlined below is only intended to be used to perform a Query and Scan request, the code provided in the repository only allows you to use them as such. If you're looking to Get an item by keys (hash and range), you should continue to use the existing mechanisms to do so. This library does not intent to replace all existing functionalities of the AWS SDK but to make it easy to perform Query and Scan operations.

Assumptions

The rest of this guide assumes that you have at least basic understanding of DynamoDB and its data model. If you're not familiar with DynamoDB already, please refer to its API documentation here to help getting yourself familiarized, believe me, it'll pay off in the long run because it's such an awesome product!

Syntax

Query

The query syntax outlined below is only applicable when working with AmazonDynamoDBClient and DynamoDBContext types under the Amazon.DynamoDBv2 top level namespace introduced in AWSSDK v1.5.18.0. If you're working with the mirrored types in the Amazon.DynamoDB namespace, then please refer to the syntax outlined here instead.

A Query request is only permitted if your table defines both a hash and a range key. A Query allows you to specify the hash key value and optionally specify at most one comparison against the range key value, if no range key comparison is specified then all available range keys will be returned.

The basic format of a Query can be expressed as:

SELECT AttributeName1, AttributeName2, ..., AttributeNameN
FROM   TableName
WHERE  HashKeyAttributeName = HashKeyValue
AND    RangeKeyAttributeName operator RangeKeyValue(s)
ORDER DESC
LIMIT  N
WITH   (NoConsistentRead, PageSize(n), NoReturnedCapacity, Index(IndexName, AllAttributesFlag))

where:

  • AttributeName1 to AttributeNameN are the names of the attributes you want to return with the query, and if you want to return all attributes, you can also just use asterisk (*) instead

  • TableName is the name of the table, which at the time of writing, the library does not allow table names to be any case variants of 'where' or 'limit' in order to avoid subtle bugs such as:

    SELECT * FROM WHERE ThreadId = 2012

    from creeping into your code because TableName had been parsed as "WHERE ThreadId = 2012"

  • HashKeyAttributeName and RangeKeyAttributeName are the names of the attributes used for the Hash and Range keys

  • HashKeyValue and RangeKeyValue can be either numeric (integer or floating number) or string (which must be enclosed in double quotes ("))

  • WHERE clause is non-optional, and must specify one and only one value for the hash key with the '=' operator

  • an optional comparison can be made against the range key by specifying an operator with one of more values for the operator

  • operator is one of the allowed comparison operators, i.e.

    • = : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age = 30
    • >= : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age >= 30
    • > : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age > 30
    • <= : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age <= 30
    • < : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age < 30
    • BEGINS WITH : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Subject BEGINS WITH \"Dynamo\"
    • BETWEEN .. AND .. : SELECT * FROM Employees WHERE FirstName = \"Yan\" AND Age BETWEEN 10 AND 30
  • ORDER clause is optional, and can be followed by either ASC or DESC to search forward or backward respectively. If not specified, the default behaviour (search forward) will be used.

  • LIMIT clause is optional, and can be used to limit the number of results that are returned.

  • WITH clause is optional, and can be used to set optional parameters for the query:

    • NoConsistentRead : by default the DSL will issue query requests with conssitent read, the NoConsistentRead option will force the DSL to use eventual consistency instead
    • PageSize : by default DynamoDB will return as many items as available in your table that can fit into the max 1MB return size per request, but this creates strong bursts of activities against DynamoDB which is likely to cause other operations to be throttled. You can use the PageSize option to set the max number of items that are returned per query request to reduce the amount of bursts your generate against DynamoDB
    • NoReturnedCapacity : only useful when used with the low-level AmazonDynamoDBClient, specifying this option stops DynamoDB from returning the consumed capacity unit in the response
    • Index : used to specify that the query should be made against a Local Secondary Index of the specified IndexName, the AllAttributesFlag is a boolean value (either true or false) which indicates whether the query should return ALL attributes (which might involve fetching from the table and hence consuming more capacity units) or just the PROJECTED attributes in the index.
  • the keywords SELECT, FROM, WHERE, LIMIT, ORDER, WITH and the operators are all case insensitive

Scan

A Scan request is more permissive but also more expensive to run (both in terms of latency and read units consumed which translates directly to running cost). A 'Scan' allows you to filter on arbitrary fields in your data using a wide range of comparison operators.

The basic format of a Scan can be expressed as:

SELECT AttributeName1, AttributeName2, ..., AttributeNameN
FROM   TableName
WHERE  AttributeName1 operator1 AttributeValue1
AND    AttributeName2 operator2 AttributeValue2
AND    ...
AND    AttributeNameN operatorN AttributeValueN
LIMIT  N
WITH   (PageSize(n), NoReturnedCapacity, Segments(m))

where:

  • AttributeName1 to AttributeNameN are the names of the attributes you want to return with the query, and if you want to return all attributes, you can also just use asterisk (*) instead

  • TableName is the name of the table, which at the time of writing, the library does not allow table names to be any case variants of 'where' or 'limit' in order to avoid subtle bugs such as:

    SELECT * FROM WHERE Year = 2012

    from creeping into your code because TableName had been parsed as "WHERE Year = 2012"

  • WHERE clause is optional, and can specify one or more comparisons against the values of attributes using one of the supported operators

  • if WHERE clause is not specified, then all rows in the table will be returned

  • AttributeValue1 to AttributeValueN can be either numeric (integer or floating number) or string (which must be enclosed in double quotes ("))

  • operator is one of the allowed comparison operators, i.e.

    • = : SELECT * FROM Employees WHERE FirstName = \"Yan\"
    • != : SELECT * FROM Employees WHERE FirstName != \"Yan\"
    • >= : SELECT * FROM Employees WHERE Age >= 18
    • > : SELECT * FROM Employees WHERE Age > 18
    • <= : SELECT * FROM Employees WHERE Age <= 99
    • < : SELECT * FROM Employees WHERE Age < 99
    • CONTAINS : SELECT * FROM Employees WHERE Interests CONTAINS \"DynamoDB\"
    • NOT CONTAINS : SELECT * FROM Employees WHERE LuckyNumbers NOT CONTAINS 11
    • BEGINS WITH : SELECT * FROM Employees WHERE DateJoined BEGINS WITH \"2010\"
    • IS NULL : SELECT * FROM Employees WHERE FavouriteTeam IS NULL
    • IS NOT NULL : SELECT * FROM Employees WHERE FavouriteTeam IS NOT NULL
    • BETWEEN .. AND .. : SELECT * FROM Employees WHERE Age BETWEEN 18 AND 30
    • IN (...) : SELECT * FROM Employees WHERE Age IN (18, 25, 32, 39)
  • LIMIT clause is optional, and can be used to limit the number of results that are returned.

  • WITH clause is optional, and can be used to set optional parameters for the scan:

    • PageSize : by default DynamoDB will return as many items as available in your table that can fit into the max 1MB return size per request, but this creates strong bursts of activities against DynamoDB which is likely to cause other operations to be throttled. You can use the PageSize option to set the max number of items that are returned per scan request to reduce the amount of bursts your generate against DynamoDB
    • NoReturnedCapacity : only useful when used with the low-level AmazonDynamoDBClient, specifying this option stops DynamoDB from returning the consumed capacity unit in the response
    • Segments : Amazon announced support for parallel Scans back in May 2013. To execute a scan in parallel to boost overall time required to complete the scan, you can use the Segments option. Please refer to the DynamoDB best practice guide on when to use parallal scans and how to choose the correct number of segments.
  • the keywords SELECT, FROM, WHERE, LIMIT, WITH and the operators are all case insensitive

Query and Scan Count

If all you want to find out is the number of matching items rather than the actual items themselves, then you can also use the COUNT keyword in place of SELECT.

For instance, for a query count:

COUNT  * 
FROM   TableName
WHERE  HashKeyAttributeName = HashKeyValue
AND    RangeKeyAttributeName operator RangeKeyValue(s)
LIMIT  N

or a scan count:

COUNT  * 
FROM   TableName
WHERE  AttributeName1 operator1 AttributeValue1
AND    AttributeName2 operator2 AttributeValue2
AND    ...
AND    AttributeNameN operatorN AttributeValueN
LIMIT  N

Please note that DynamoDB does not allow attribute names to be specified when performing a count operation, so a valid count query will always start with COUNT * FROM ....

Also, at the time of writing, count queries can only be used with the low-level AmazonDynamoDBClient class. This is because there's no support for performing counts from the DynamoDBContext class, which as an ORM layer I think it's sensible decision.

Usages

For now, there are extension methods for the AmazonDynamoDBClient and DynamoDBContext classes to allow you to execute a Query or Scan using a query string written in the syntax described above.

Working with the low-level AmazonDynamoDBClient class

In the DynamoDbV2.SQL.Execution namespace, there exists a number of extension methods to asynchronously or synchronously execute a Query or Scan:

/// Executes a query asynchronously and returns the results
[<Extension>]
static member QueryAsync       : AmazonDynamoDBClient * string -> Async<QueryResponse>

/// Executes a query asynchronously as a task and returns the results
[<Extension>]
static member QueryAsyncAsTask : AmazonDynamoDBClient * string -> Task<QueryResponse>

/// Executes a query synchronously and returns the results
[<Extension>]
static member Query            : AmazonDynamoDBClient * string -> QueryResponse

/// Executes a scan asynchronously and returns the results
[<Extension>]
static member ScanAsync        : AmazonDynamoDBClient * string -> Async<ScanResponse>

/// Executes a scan asynchronously as a task and returns the results
[<Extension>]
static member ScanAsyncAsTask  : AmazonDynamoDBClient * string -> Task<ScanResponse>

/// Executes a scan synchronously and returns the results
[<Extension>]
static member Scan             : AmazonDynamoDBClient * string -> ScanResponse

Examples of the query syntax can be found here.

Working with the high-level DynamoDBContext class

In the DynamoDb.SQL.Execution namespace, there are also two extension methods for DynamoDBContext class to synchronously execute a Query or Scan:

/// Executes a query synchronously and returns the results
[<Extension>]
static member ExecQuery<'T>    : DynamoDBContext * string -> IEnumerable<'T>

/// Executes a scan synchronously and returns the results
[<Extension>]
static member ExecScan<'T>     : DynamoDBContext * string -> IEnumerable<'T>

I had decided to name the methods ExecQuery and ExecScan intentionally to avoid clashing with the existing Query<T>(object hashKey) and Scan<T>(object hashKey) methods. This is because methods that take in object as argument always leave doubt in developers' minds, and can introduce subtle runtime bugs if someone had forgotten to use the DynamoDb.SQL.Execution namespace and attempted to run a Query/Scan using a string written in the above DSL syntax, in which case the DSL string will be interpreted as the hash key instead and mostly likely return nothing...

Examples of the scan syntax can be found here.

Limitations

  • the usual DynamoDB limitations also supplies here, so a maximum of 1MB of data can be returned per API call, there's currently no support to automatically use LastEvaluatedKey to make the follow up request.
  • DynamoDB supports a number of data types (String, String Set, Number, Number Set, Binary, Binary Set), but for the time being this library only supports the most commonly used data types of String and Number.