Scan API and get_account_transfers #1054

Merged
merged 4 commits into from
Dec 19, 2023

batiati (Contributor) commented Jul 21, 2023

This PR introduces the Scan API, including fuzz tests, and the new state machine operation get_account_transfers for querying the transfers of a given account.

Scan API

ScanTreeType: the basic building block of range queries. Specialized over a Tree, it seeks the range key_min..key_max in ascending or descending key order in any LSM tree.
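
As a rough illustration of the range seek (not the Zig implementation), the Python sketch below models a single sorted run of keys as a list and yields the keys in key_min..key_max (treated as inclusive here, an assumption) in either direction. A real LSM tree merges several tables and levels; the name scan_range is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Iterator, List


def scan_range(keys: List[int], key_min: int, key_max: int, ascending: bool = True) -> Iterator[int]:
    """Yield the keys in [key_min, key_max] from a sorted list, in either order."""
    lo = bisect_left(keys, key_min)   # first index with key >= key_min
    hi = bisect_right(keys, key_max)  # one past the last index with key <= key_max
    indices = range(lo, hi) if ascending else range(hi - 1, lo - 1, -1)
    for i in indices:
        yield keys[i]
```

For example, over the keys 1, 3, 5, 7, 9 a scan of 3..7 yields 3, 5, 7 ascending and 7, 5, 3 descending.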

ScanBuilderType: specialized over a Groove, it groups the ScanTreeTypes for all IndexTrees in the groove and exposes them through a common API that is not specialized by Tree (the Scan interface). It supports conditions where the predicate is a prefix of the tree's CompositeKey and the value produced is the ObjectTree's timestamp.
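
To show why a prefix condition yields timestamps in sorted order, this sketch assumes a hypothetical CompositeKey layout with the indexed field in the high bits and a 64-bit timestamp in the low bits. The names composite_key and scan_prefix are illustrative, not TigerBeetle's API.

```python
from bisect import bisect_left, bisect_right
from typing import List

TIMESTAMP_BITS = 64
TIMESTAMP_MASK = (1 << TIMESTAMP_BITS) - 1


def composite_key(prefix: int, timestamp: int) -> int:
    # Hypothetical layout: prefix in the high bits, timestamp in the low 64 bits.
    return (prefix << TIMESTAMP_BITS) | timestamp


def scan_prefix(index_keys: List[int], prefix: int) -> List[int]:
    """Return the timestamps of every sorted index entry whose prefix equals `prefix`."""
    key_min = composite_key(prefix, 0)
    key_max = composite_key(prefix, TIMESTAMP_MASK)
    lo = bisect_left(index_keys, key_min)
    hi = bisect_right(index_keys, key_max)
    # Keys sharing a prefix differ only in the timestamp, so output is timestamp-ordered.
    return [key & TIMESTAMP_MASK for key in index_keys[lo:hi]]
```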

Merge operations (union, intersection, difference) can be performed on scans regardless of their source (e.g., a ScanTreeType specialized over any Tree, or another merge operation). The Scan type holds all possible concrete implementations as a tagged Zig union, dispatched dynamically through comptime-generated code.

Note: Only merge union is currently implemented.
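
A minimal sketch of merge union, assuming each input scan yields timestamps sorted in the same direction. Python's heapq.merge stands in for the k-way merge; the Zig code dispatches over a tagged union, which this sketch does not model. The name merge_union is hypothetical.

```python
import heapq
from typing import Iterable, Iterator, Optional


def merge_union(*scans: Iterable[int], ascending: bool = True) -> Iterator[int]:
    """Union of timestamp streams that are all sorted in the same direction."""
    last: Optional[int] = None
    # heapq.merge with reverse=True expects every input sorted descending.
    for timestamp in heapq.merge(*scans, reverse=not ascending):
        # The same object may be matched by several scans; emit it once.
        if timestamp != last:
            yield timestamp
            last = timestamp
```

Because the merge only compares stream heads, it stays lazy and uses memory proportional to the number of inputs, which is also why all inputs must share one sort direction.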

ScanLookupType: implements the lookup logic for retrieving groove objects from a scan. Unlike the regular prefetch mechanism, this lookup acts only on timestamps and loads the objects directly into a provided buffer, preserving the sort order of the underlying scan. The buffer size can be used as the LIMIT operator.
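
The lookup step can be sketched as follows, assuming a dict keyed by timestamp stands in for the ObjectTree and that limit is the size of the caller-provided buffer. The name scan_lookup is hypothetical. Objects are appended in scan order, so the result keeps the scan's sort order, and the loop stops as soon as the buffer is full.

```python
from typing import Dict, Iterable, List


def scan_lookup(timestamps: Iterable[int], objects_by_timestamp: Dict[int, dict], limit: int) -> List[dict]:
    """Load up to `limit` objects, in scan order, into a result buffer."""
    buffer: List[dict] = []
    for timestamp in timestamps:
        if len(buffer) == limit:
            break  # Buffer full: its size acts as the LIMIT.
        # Look up by timestamp only; the dict stands in for the ObjectTree.
        buffer.append(objects_by_timestamp[timestamp])
    return buffer
```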

Diagram (outdated; image not available): how multiple scans can be combined to create a complex query.

API limitations

This API is intended for internal use by the state machine and is not exposed directly to end users. It has some notable limitations:

Note: All SQL examples are illustrative only; SQL is not the query language provided by the Scan API.

1. Only secondary indexes contained in IndexTree can be queried by ScanGrooveType.

Examples:

✔ Supported:

WHERE code=$1 AND ledger=$2

✖ Not supported:

WHERE id=$1 AND code=$2

Rationale: id isn't an IndexTree field. This kind of query can be split into two steps:
1 - Perform a regular lookup for id=$1.
2 - Check whether code=$2 holds for the object found in the lookup step.
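
The two-step approach can be sketched in Python, with a dict keyed by id standing in for the regular primary lookup. The function find_by_id_and_code and the object fields are hypothetical.

```python
from typing import Dict, Optional


def find_by_id_and_code(objects_by_id: Dict[int, dict], id_: int, code: int) -> Optional[dict]:
    # Step 1: a regular point lookup by id (no scan involved).
    obj = objects_by_id.get(id_)
    # Step 2: check the remaining predicate on the single object found.
    if obj is not None and obj["code"] == code:
        return obj
    return None
```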

2. The timestamp field can be used in a query for paging, but not as an index.

Examples:

✔ Supported:

WHERE code=$1 AND timestamp>$2

✖ Not supported:

WHERE timestamp>$2

Rationale: timestamp is part of the CompositeKey and can be used in operations such as =, <, and > when combined with the key's prefix (see the TimestampRange struct), but it cannot be used directly from ScanGrooveType because it isn't a secondary index. This query could instead use ScanTreeType directly over the ObjectTree.
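
A sketch of how a fixed prefix lets code=$1 AND timestamp>$2 become a single key range, assuming the same hypothetical layout (indexed field in the high bits, 64-bit timestamp in the low bits). It only approximates the role the text gives the TimestampRange struct; the function names are illustrative. Without the prefix, a timestamp bound alone would not map to one contiguous range of index keys.

```python
from bisect import bisect_left, bisect_right
from typing import List

TIMESTAMP_MASK = (1 << 64) - 1


def composite_key(prefix: int, timestamp: int) -> int:
    # Hypothetical layout: prefix in the high bits, 64-bit timestamp in the low bits.
    return (prefix << 64) | timestamp


def scan_prefix_after(index_keys: List[int], prefix: int, timestamp_after: int) -> List[int]:
    """Timestamps matching `prefix = P AND timestamp > T` in a sorted index."""
    if timestamp_after >= TIMESTAMP_MASK:
        return []  # No timestamp can exceed the maximum value.
    # Both bounds share the prefix, so the timestamp bound narrows one prefix's range.
    key_min = composite_key(prefix, timestamp_after + 1)
    key_max = composite_key(prefix, TIMESTAMP_MASK)
    lo = bisect_left(index_keys, key_min)
    hi = bisect_right(index_keys, key_max)
    return [key & TIMESTAMP_MASK for key in index_keys[lo:hi]]
```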

3. Only objects of the same Groove can be queried together.

✖ Not supported:

SELECT
    *
FROM
    Transfers
    JOIN Accounts AS DebitAccount ON Transfers.debit_account_id=DebitAccount.id
WHERE
    DebitAccount.code=$1

Rationale: As ScanGrooveType is specialized over a single Groove, joining different objects must be done in another layer.

4. Merge operations on sorted scans.

Merge operations such as union, intersection, and difference can only be performed on scans sorted in the same direction.
Since scans always yield values sorted by (prefix, timestamp), it's not possible to combine scans whose criteria aren't an exact match to the prefix.

Examples:

✔ Supported: Results sorted by timestamp:

WHERE code=$1 AND ledger=$2

✔ Supported: Results sorted by code and then timestamp:

WHERE code BETWEEN $1 AND $2

✖ Not supported:

WHERE code>$1 AND ledger=$2

Rationale: Since code>$1 isn't sorted by timestamp, it must be sorted in an auxiliary buffer prior to being intersected with ledger=$2.
Note: Sorting in an auxiliary buffer is out of the scope of this PR, since it requires an unbounded amount of memory or non-linear execution time.
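
The sketch below illustrates this rationale with made-up data. A two-pointer intersection, one common way to intersect sorted streams (not necessarily how a future intersection would be implemented), silently drops matches when one input is ordered by (code, timestamp) rather than by timestamp alone. Sorting the code>$1 output first fixes the result, but that sort is exactly the auxiliary buffer this PR leaves out of scope.

```python
from typing import List, Tuple


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Two-pointer intersection; only correct when both inputs are ascending."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Made-up (code, timestamp) index entries, in index order.
code_index: List[Tuple[int, int]] = [(1, 8), (2, 3), (2, 9), (3, 5)]
# Timestamps matching ledger=$2: a single prefix, so already ascending.
ledger_scan = [3, 5, 8]
# code>0 spans several prefixes, so timestamps come out as 8, 3, 9, 5.
code_scan = [ts for code, ts in code_index if code > 0]

wrong = intersect_sorted(code_scan, ledger_scan)          # misses 3 and 5
right = intersect_sorted(sorted(code_scan), ledger_scan)  # needs the extra sort
```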

New get_account_transfers operation

A new operation was implemented on top of the StateMachine using the Scan API to retrieve transfers related to an account_id.

Query Transfers by debit_account_id, credit_account_id, or both; sorted by timestamp in ascending or descending order; optionally starting from a specific timestamp (an exclusive bound: > or < depending on the sort order); and limited to a maximum number of transfers.
The combination of timestamp and limit can be used for pagination by passing the last timestamp received in the previous "page".
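
The filtering, ordering, exclusive timestamp bound, and limit described above can be modeled over an in-memory list. This is a hedged sketch: the Transfer fields shown, the parameter names (debits, credits, timestamp_from, limit, descending), and both helper functions are assumptions for illustration, not the client API. The second function shows the pagination pattern of resuming from the last timestamp of the previous page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transfer:
    timestamp: int
    debit_account_id: int
    credit_account_id: int


def get_account_transfers(
    transfers: List[Transfer],
    account_id: int,
    debits: bool,
    credits: bool,
    timestamp_from: Optional[int],
    limit: int,
    descending: bool = False,
) -> List[Transfer]:
    # Keep transfers where the account appears on a requested side.
    matches = [
        t for t in transfers
        if (debits and t.debit_account_id == account_id)
        or (credits and t.credit_account_id == account_id)
    ]
    matches.sort(key=lambda t: t.timestamp, reverse=descending)
    if timestamp_from is not None:
        # Exclusive bound: ">" when ascending, "<" when descending.
        matches = [
            t for t in matches
            if (t.timestamp < timestamp_from if descending else t.timestamp > timestamp_from)
        ]
    return matches[:limit]


def page_timestamps(transfers: List[Transfer], account_id: int, limit: int, descending: bool = False) -> List[List[int]]:
    """Walk every page, resuming from the last timestamp of the previous page."""
    pages: List[List[int]] = []
    cursor: Optional[int] = None
    while True:
        page = get_account_transfers(transfers, account_id, True, True, cursor, limit, descending)
        if not page:
            return pages
        pages.append([t.timestamp for t in page])
        cursor = page[-1].timestamp
```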

NOTE: The operation get_account_transfers differs from the other operations in that it always accepts exactly one Event and can return a variable number of Results (bounded by the message size).
In many places in our clients, we assumed that the maximum number of Results could be inferred from the number of Events provided, so the maximum size must now be exposed to the client side. Those places are marked in the code with a TODO label.
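
Because a get_account_transfers request always carries exactly one Event, a client cannot size its reply buffer from the Event count. A hedged sketch of the alternative bound, derived from the message size instead: the constants below (a 1 MiB message, a 256-byte header, a 128-byte Transfer) are assumptions for illustration, and results_max is a hypothetical helper.

```python
MESSAGE_SIZE_MAX = 1024 * 1024  # assumed maximum message size, in bytes
HEADER_SIZE = 256               # assumed message header size, in bytes
TRANSFER_SIZE = 128             # assumed size of one Transfer result, in bytes


def results_max(
    message_size_max: int = MESSAGE_SIZE_MAX,
    header_size: int = HEADER_SIZE,
    result_size: int = TRANSFER_SIZE,
) -> int:
    """Upper bound on Results that fit in one reply, independent of the Event count."""
    return (message_size_max - header_size) // result_size
```

Under these assumed sizes the bound works out to (1048576 - 256) / 128 = 8190 transfers per reply.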

Includes a docs section for get_account_transfers and client support with autogenerated bindings, docs, samples, and tests.

TODO:

  • macOS is segfaulting in Debug mode. I tracked the cause down to the call to Groove.init(...), but the function itself is never executed.
    EDIT: It was caused by a stack overflow when declaring a BoundedArray.

  • Unit and fuzz tests for Scan and Merge will be included during the query engine implementation.

  • get_account_transfers unit tests require changes to the test executor. We are deferring that refactor to be done along with the Query API.

batiati marked this pull request as draft July 21, 2023 20:59
batiati force-pushed the batiati-scan-api branch 2 times, most recently from 9a44f9a to 14abe83 November 3, 2023 13:13
batiati changed the title from "[WIP] Scan API" to "Scan API" Nov 27, 2023
batiati changed the title from "Scan API" to "Scan API and get_account_transfers" Nov 27, 2023
batiati marked this pull request as ready for review November 27, 2023 12:58
sentientwaffle (Member) left a comment:

LGTM! 🎉

batiati added this pull request to the merge queue Dec 19, 2023
Merged via the queue into main with commit c9ebc80 Dec 19, 2023
27 checks passed
batiati deleted the batiati-scan-api branch December 19, 2023 17:15
sentientwaffle added a commit that referenced this pull request Dec 19, 2023
As part of the [Scan API](#1054) code review, we found that `Groove.prefetch()` would invoke its callback synchronously if all of the objects could be prefetched from cache.
That was fixed as part of the same PR.

But fixing it caused an unrelated test to fail: `replica_test.zig`'s `"Cluster: repair: ack committed prepare"`.

During the `Change views. B1/B2 participate. Don't allow B2 to repair op=21.` step, even though only B1/B2 participated in the view change, A0 was still allowed to send SVC messages. Deferring `prefetch()`'s callback to the next tick made commits take slightly longer. This permutation meant that B1/B2 would finish their view change (as before), but A0 would then prod them into starting another view change.