Refactor Accounts and Transfers to u128 amounts #1157
Conversation
Interesting benchmark results, too. I remember when testing a pure block size increase, that bumped performance too due to the situation wrt compaction pacing (think the sweet spot was 256K). Would be interesting to run those again (eg, compare
Force-pushed from 1131b3f to 06d8615
Force-pushed from ade000a to 24cfb47
docs/reference/transfers.md
Outdated
* Must not overflow a 64-bit unsigned integer when converted to nanoseconds
  and summed with the transfer's timestamp (`error.overflows_timeout`)
I guess this warning is less important now that the user passes in seconds, which we convert to nanoseconds -- unless the TB cluster's clock is 400 years in the future! 😉
May as well leave it in for completeness, though.
Agreed!
I removed it from here and instead it's now mentioned in create_transfers.md#overflows_timeout
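For illustration, the rule being discussed (timeout in seconds, converted to nanoseconds and summed with the transfer's timestamp, must fit in an unsigned 64-bit integer) can be sketched in Java. This is a hedged sketch, not TigerBeetle's actual Zig implementation; the class and method names are invented for the example.

```java
import java.math.BigInteger;

public class TimeoutCheck {
    // 2^64 - 1, the maximum unsigned 64-bit value.
    static final BigInteger U64_MAX =
        BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    // timestampNs carries an unsigned 64-bit value in a Java long.
    // Returns true when timestamp + timeout (as nanoseconds) would
    // overflow u64 -- the documented overflows_timeout condition.
    static boolean overflowsTimeout(long timestampNs, long timeoutSeconds) {
        BigInteger deadline = BigInteger.valueOf(timeoutSeconds)
            .multiply(BigInteger.valueOf(1_000_000_000L))
            .add(new BigInteger(Long.toUnsignedString(timestampNs)));
        return deadline.compareTo(U64_MAX) > 0;
    }
}
```

As the comment above notes, with seconds-granularity timeouts this only trips if the cluster clock is absurdly far in the future.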
- Account balance fields are now `u128`.
- Transfer amounts are `u128`.
- Timeouts are `u32` and expressed as seconds instead of nanoseconds.
- Three `user_data` fields, as `u128`, `u64` and `u32`.
- `AccountMutable` and `AccountImmutable` are now a single groove `Account`.
- Reordered the `Transfer` and `Account` fields to keep `@alignOf(T) == 16`.
Force-pushed from 0780b7e to 6cef56f
.user_data_32 = 8,
.ledger = 9,
.code = 10,
.timestamp = 11,
🤔 Despite the name "timestamp" here, what this actually corresponds to is the groove's object tree. Imo that should take precedence over the field order, and this tree's id should be listed/assigned before `id`. (Likewise for transfers.)
> what this actually corresponds to is the groove's object tree

However, our usage of `timestamp` as an internal identifier for the groove's object tree is a secondary implementation detail. The primary usage of `timestamp` is external, as a totally ordered hybrid logical clock. Beyond that, the reason it's last is that it's the only field that's "filled in", in the sense that it's appended to the existing data.
Also, ahead of `timestamp` is `id`, which is the external identifier and which comes first (it's the "key", followed by the value). Users provide this much of the key/value, consecutively, and then the time is "stamped" on after.
> Users provide this much of the key/value, consecutively, and then the time is "stamped" on after.

This code is defining/assigning the LSM tree ids though; it doesn't relate to anything that users see. From the LSM's perspective, the object tree is the "primary" tree -- object and id trees are special cases in the groove, separate from all of the secondary index trees.
Imo, there's not much benefit that justifies not following the object's declaration order here. We already handle `timestamp` as a special case in the LSM, which is very clear; there's no need to reinforce it.
Anyway, those fields are intended to be stable IDs and may become unordered in the future if we ever change our schema. Either we will follow the declaration order or the numerical order. So, I don't have strong feelings about it.
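The stability point above (explicit ids survive later declaration reordering) can be illustrated with explicit id assignment. TigerBeetle itself is Zig, so this Java enum is purely illustrative; the id values are taken from the diff earlier in this thread.

```java
// Illustration only: assigning tree ids explicitly, as in the diff above,
// makes them stable identifiers. Reordering these declarations later, or
// reordering the struct's fields, would not change any assigned id.
enum AccountTreeId {
    USER_DATA_32(8),
    LEDGER(9),
    CODE(10),
    TIMESTAMP(11);

    final int id;

    AccountTreeId(int id) {
        this.id = id;
    }
}
```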
\\ transfer T1 A1 A3 -0 _ U1 U1 U1 _ L1 C2 _ _ _ _ _ _ _ _ exists_with_different_flags
\\ transfer T1 A3 A1 -0 _ U1 U1 U1 1 L1 C2 _ PEN _ _ _ _ _ _ exists_with_different_debit_account_id
\\ transfer T1 A1 A4 -0 _ U1 U1 U1 1 L1 C2 _ PEN _ _ _ _ _ _ exists_with_different_credit_account_id
\\ transfer T1 A1 A3 -0 _ U1 U1 U1 1 L1 C1 _ PEN _ _ _ _ _ _ exists_with_different_amount
\\ transfer T1 A1 A3 123 _ U1 U1 U1 1 L1 C2 _ PEN _ _ _ _ _ _ exists_with_different_user_data_128
\\ transfer T1 A1 A3 123 _ _ U1 U1 1 L1 C2 _ PEN _ _ _ _ _ _ exists_with_different_user_data_64
\\ transfer T1 A1 A3 123 _ _ _ U1 1 L1 C2 _ PEN _ _ _ _ _ _ exists_with_different_user_data_32
Perfect!
These test vectors are always so nice and concise.
Note: .NET tests are failing for the macOS self-hosted runner.
 * @throws IllegalStateException if not at a {@link #isValidPosition valid position}.
 * @see <a href="https://docs.tigerbeetle.com/reference/accounts/#user_data_64">user_data_64</a>
 */
public long getUserData64() {
🤔 it feels like this API makes it too easy to ignore high-order bits. Should we perhaps throw an exception here if this is not the entirety of the user data?
And this is also inconsistent with how, eg, `getDebitsPosted` works (that API makes more sense to me personally).
@batiati it feels like maybe we have an incomplete refactor here?
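The guard being suggested could look roughly like the sketch below. This is a hedged sketch of the idea, not the Java client's actual API; the class, method name, and exception choice are invented for illustration.

```java
import java.math.BigInteger;

public class UserDataGuard {
    // Hypothetical accessor guard: only return the low 64 bits when the
    // high 64 bits of the u128 user_data are zero; otherwise throw, so
    // callers cannot silently truncate a wider value.
    static long userData64OrThrow(BigInteger userData128) {
        if (userData128.signum() < 0 || userData128.bitLength() > 64) {
            throw new IllegalStateException(
                "user_data does not fit in an unsigned 64-bit value");
        }
        // longValue() reinterprets the low 64 bits as a signed long.
        return userData128.longValue();
    }
}
```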
This PR:

- Account balance fields are now `u128`.
- Transfer amounts are `u128`.
- Timeouts are `u32` and expressed as seconds instead of nanoseconds.
- Three `user_data` fields, as `u128`, `u64` and `u32`.
- `AccountMutable` and `AccountImmutable` are now a single groove `Account`.
- Reordered the `Transfer` and `Account` fields to keep `@alignOf(T) == 16`.

Direct changes:
- Docs: All changes have been reflected, and new examples have been added for the `user_data_{128|64|32}` fields.
- Benchmarks, Demos, Samples, Tests, and Repl: The code has been refactored.
- Clients: Besides the naming and types refactor, these are the specific changes for each client:
  - Node client: No extensive changes were needed, as `BigInt` was already being used for `u64` values.
  - Java client: The API now defaults all `get` accessors to `byte[]` for IDs and user data, and `BigInteger` for balances and amounts. Additionally, overloaded accessors allow the application to get/set values as a pair of primitive `long`s. This allows integrating with third-party numerical libraries without the need to allocate temporary objects like arrays or `BigInteger`s.
  - C# client: .NET 7.0 comes with support for `UInt128`. Users can easily opt in to the new `UInt128` or use a replacement type we distribute for .NET Standard 2.1 (which is supported by other runtimes like .NET Core, Xamarin, Unity, and .NET up to 6.0). Also, there are extension methods for converting to/from `BigInteger`.
  - Go client: We expose the type `Uint128` with a set of conversion functions, including `math/big.Int`. Likewise, users have the option to use `big.Int` or any other implementation to handle or format 128-bit values.

Other changes:
- `block_size` to 128K. Without this, we can't fit everything in a single block due to the size of each object and `lsm_batch_multiple`.
- Argument `--cache-grid` is now validated for the minimum size. Currently, it's `--cache-grid=256MB` instead of `128MB` due to the `block_size` changes.
- Repl parser now accepts identifiers containing numbers (but not starting with numbers). Example: `user_data_128` would be rejected by the previous version.

Benchmarks
Before:
After:
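The pair-of-longs representation described above for the Java client can be sketched as follows. This is a hedged illustration under assumed names, not the client's actual API: a `u128` is carried as two unsigned 64-bit halves, convertible to and from `BigInteger`.

```java
import java.math.BigInteger;

public class U128 {
    // Combine the least- and most-significant unsigned 64-bit halves
    // into a non-negative BigInteger.
    static BigInteger toBigInteger(long leastSignificant, long mostSignificant) {
        return new BigInteger(Long.toUnsignedString(mostSignificant))
            .shiftLeft(64)
            .or(new BigInteger(Long.toUnsignedString(leastSignificant)));
    }

    // Low 64 bits of a non-negative BigInteger, reinterpreted as a long.
    static long leastSignificant(BigInteger value) {
        return value.longValue();
    }

    // High 64 bits of a non-negative BigInteger, reinterpreted as a long.
    static long mostSignificant(BigInteger value) {
        return value.shiftRight(64).longValue();
    }
}
```

Working on the raw halves avoids allocating a `BigInteger` per field in hot paths, which matches the rationale given in the description above.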