UUIDv7 with Millisecond Precision and Clock Sequence Rollovers #40

Closed
theckman opened this issue Aug 27, 2021 · 7 comments
Labels
UUIDv7 All things UUIDv7 related

Comments

@theckman

Hello,

I was on the fence as to whether I should include this in #24, or file it as its own issue. If that other issue is better, let me know and I'll add a comment there instead.

I am working on an implementation of the latest draft in Go, and when using millisecond precision at full bore I can easily increment the clock sequence to the point of rolling over the 12-bit integer. On discovering this, I wanted to understand how the rollover should be handled, and I noticed that section 4.4.2 is worded in ways that contradict each other and ultimately give no explicit direction on how to handle the rollover.

4.4.2.  UUIDv7 Clock Sequence Usage

   UUIDv7 SHOULD utilize a monotonic sequence counter to provide
   additional sequencing guarantees when multiple UUIDv7 values are
   created in the same UNIXTS and SUBSEC timestamp.  The amount of bits
   allocates to the sequence counter depend on the precision of the
   timestamp.  For example, a more accurate timestamp source using
   nanosecond precision will require less clock sequence bits than a
   timestamp source utilizing seconds for precision.  For best
   sequencing results the sequence counter SHOULD be placed immediately
   after available sub-second bits.

   The clock sequence MUST start at zero and increment monotonically for
   each new UUIDv7 created on by the application on the same timestamp.
   When the timestamp increments the clock sequence MUST be reset to
   zero.  The clock sequence MUST NOT rollover or reset to zero unless
   the timestamp has incremented.  Care MUST be given to ensure that an
   adequate sized clock sequence is selected for a given application
   based on expected timestamp precision and expected UUIDv7 generation
   rates.

The initial issue is that the first and second paragraphs contradict each other. The first says we SHOULD utilize a monotonic sequence counter, without offering guidance on how to implement UUIDv7 without one. The second paragraph then goes on to say "The clock sequence MUST...", which, with its use of "the", indicates that there is only one type of clock sequence that should be used. This makes me think the SHOULD in the first paragraph might benefit from becoming a MUST.

The other issue I ran into is that it's unclear how to proceed when the sequence counter would roll over without the timestamp having changed. I understand that I should not permit it to roll over, but I don't know whether that's a failure condition and I should return an error, or whether I should just generate UUIDv7 values with the maximum sequence number until the timestamp changes. Would there be value in being explicit about how to handle that failure mode?

The truth is, I believe the majority of organizations will rely on publicly available UUID implementations rather than implementing their own. To me that means ambiguities are going to result in implementation differences between languages, and between libraries within a single language, which could be problematic in a polyglot environment if the different implementations have their own interpretations of the RFC.

Separately from the wording in the RFC, is there something we can do about the clock sequence hitting its maximum value when using millisecond precision? Given how easy it was to hit that maximum, this seems to strongly devalue the millisecond-precision variant of UUIDv7, to the point that I'd probably actively discourage its use in favor of microsecond precision.

Would love to hear your thoughts on both of these.

@edo1

edo1 commented Aug 27, 2021

The initial issue is that the first and second paragraphs contradict each other. The first says we SHOULD utilize a monotonic sequence counter, without offering guidance on how to implement UUIDv7 without one. The second paragraph then goes on to say "The clock sequence MUST...", which, with its use of "the", indicates that there is only one type of clock sequence that should be used. This makes me think the SHOULD in the first paragraph might benefit from becoming a MUST.

I see no problem with the wording. The clock sequence SHOULD be implemented ("this is recommended"); if it is implemented, it MUST be as described.

But I really don't like the idea of a dedicated sequence counter.

Separately from the wording in the RFC, is there something we can do about the clock sequence hitting the maximum value when using millisecond precision?

My proposal #34 addresses this issue. The timestamp + sequence number is treated as a single 64-bit number, which will be incremented if the timestamp has not increased since the last UUID was generated.
This also addresses slight timestamp drift (e.g., due to NTP sync), which the current draft does not handle.
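A minimal sketch of the combined-counter idea as described in this comment (not the text of #34 itself): the timestamp is shifted left by an assumed 12 sequence bits, and if the result is not greater than the last value issued, the last value is incremented instead. A full sequence then carries into the timestamp bits rather than rolling over, and a clock that steps backwards keeps reusing the last timestamp until real time catches up. The names combinedCounter and seqBits are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// seqBits is the assumed width of the sequence portion; the timestamp
// occupies the bits above it in the combined value.
const seqBits = 12

// combinedCounter treats (timestamp << seqBits) | seq as one number.
type combinedCounter struct {
	mu   sync.Mutex
	last uint64
}

// next takes the current clock reading ts (e.g. Unix milliseconds) and
// returns the timestamp and sequence to embed in the UUID.
func (c *combinedCounter) next(ts uint64) (uint64, uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v := ts << seqBits // new timestamp: sequence starts at zero
	if v <= c.last {
		// Same tick, or the clock moved backwards: increment the previous
		// value. Overflowing the sequence carries into the timestamp bits.
		v = c.last + 1
	}
	c.last = v
	return v >> seqBits, v & (1<<seqBits - 1)
}

func main() {
	var c combinedCounter
	fmt.Println(c.next(1000)) // 1000 0
	fmt.Println(c.next(1000)) // 1000 1
	fmt.Println(c.next(999))  // 1000 2 (clock stepped back)
	fmt.Println(c.next(1001)) // 1001 0
}
```

The trade-off is that under sustained load the embedded timestamp can run slightly ahead of the real clock, in exchange for never failing or rolling over.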

@sergeyprokhorenko

sergeyprokhorenko commented Aug 27, 2021

... when using millisecond precision at full-bore I can easily increment the clock sequence to the point of rolling over the 12 bit integer.

It seems a 15-bit clock sequence would be the best choice for a millisecond-precision timestamp. See ULID with sequence.
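The arithmetic behind the rollover concern and this suggestion: a counter that resets on every tick caps a single generator at 2^bits values per tick. The sketch below uses a hypothetical helper, maxPerSecond, to compute the per-second ceiling for a 12-bit and a 15-bit counter at millisecond precision, and for a 12-bit counter at microsecond precision. It does not check whether the draft layout actually has 15 bits to spare.

```go
package main

import "fmt"

// maxPerSecond returns the most UUIDs one generator can produce per second
// when a seqBits-wide counter resets once per tick and there are
// ticksPerSecond ticks each second (1_000 for ms, 1_000_000 for µs).
func maxPerSecond(seqBits uint, ticksPerSecond uint64) uint64 {
	return (uint64(1) << seqBits) * ticksPerSecond
}

func main() {
	fmt.Println("12-bit seq, ms precision:", maxPerSecond(12, 1_000))     // 4096000
	fmt.Println("15-bit seq, ms precision:", maxPerSecond(15, 1_000))     // 32768000
	fmt.Println("12-bit seq, µs precision:", maxPerSecond(12, 1_000_000)) // 4096000000
}
```

Roughly 4.1 million values per second is a ceiling a tight generation loop can plausibly approach on one core, which is consistent with the rollovers reported at the top of this issue; three extra bits raise it eightfold.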

@theckman
Author

I see no problem with the wording. The clock sequence SHOULD be implemented ("this is recommended"); if it is implemented, it MUST be as described.

@edo1 In that case, my initial thought is that there is some value in being a bit more explicit at the beginning of the second paragraph, with something like "When utilizing a monotonic clock sequence, it MUST start at zero..."

My proposal #34 addresses this issue.

I'll take a look at it, thanks!

@LiosK

LiosK commented Sep 7, 2021

Hi all,

I am not sure if I should introduce a new topic about UUIDv7 Clock Sequence here. Please let me know if I should open a new issue.

Is it possible to allow UUIDv7 Clock Sequence to start at a random number? The current draft says:

The clock sequence MUST start at zero

and

When the timestamp increments the clock sequence MUST be reset to zero.

As a result, I see all of the seq bits filled with zeros and simply wasted in applications that do not need many UUIDs within a millisecond or microsecond. I was considering an implementation that initializes the seq counter at a random number and increments it when necessary, but such an implementation cannot comply with the clock sequence usage in the current draft, because the MUST wording explicitly requires zero as the initial counter value.

It is true that the use of clock sequence is optional in the draft standard and we can just use the seq field as a random field in the low-frequency use cases. However, I think we can utilize a seq field more efficiently if we reset it to a random number. For example, if we initialize a 12-bit seq field with an 11-bit random integer, we can provide 11-bit extra randomness in the idle use cases while ensuring the generation of at least 2048 UUIDs within a timestamp. A pair of zero-starting seq field and random field cannot utilize the same 12 bits in this way.
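A minimal sketch of the random-start counter proposed here, assuming a 12-bit field whose value on each new timestamp starts at an 11-bit random number and then increments. The worst-case start of 2047 still leaves 2049 values, which meets the 2048-per-timestamp guarantee described above, while an idle generator contributes 11 random bits. The type and function names are hypothetical, and crypto/rand is just one reasonable randomness source.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"sync"
)

// Assumed layout: 12-bit sequence field, 11-bit random starting value.
const (
	seqBits   = 12
	seqMax    = 1<<seqBits - 1 // 4095
	startBits = 11
)

// randomStartCounter starts each timestamp at a random value and then
// increments monotonically within that timestamp.
type randomStartCounter struct {
	mu     sync.Mutex
	lastTS int64
	seq    uint16
	used   bool // false until the first value has been issued
}

// randomStart reads two bytes from crypto/rand and keeps the low 11 bits,
// giving a starting value in 0..2047.
func randomStart() (uint16, error) {
	var b [2]byte
	if _, err := rand.Read(b[:]); err != nil {
		return 0, err
	}
	return binary.BigEndian.Uint16(b[:]) & (1<<startBits - 1), nil
}

// next returns the sequence value for timestamp ts. The bool result is false
// when the field is exhausted for this timestamp; handling that is left to
// the caller.
func (c *randomStartCounter) next(ts int64) (uint16, bool, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.used || ts > c.lastTS {
		start, err := randomStart()
		if err != nil {
			return 0, false, err
		}
		c.lastTS, c.seq, c.used = ts, start, true
		return start, true, nil
	}
	if c.seq == seqMax {
		return 0, false, nil
	}
	c.seq++
	return c.seq, true, nil
}

func main() {
	var c randomStartCounter
	for i := 0; i < 3; i++ {
		seq, ok, err := c.next(1000)
		fmt.Println(seq, ok, err)
	}
}
```

Ordering is still preserved across timestamps: a new timestamp may start at a lower sequence value, but the larger timestamp bits already sort it after everything from the previous tick.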

In my opinion, the key characteristic of the clock sequence is that it must increment monotonically within the same timestamp, so other constraints (such as the initial value and timing of resetting) can be relaxed without hindering the goal of the standard.

Thanks,

@oittaa

oittaa commented Nov 29, 2021

Just a random passerby, but I noticed how Python handles the situation at the moment in UUID1. They just increment the "nanosecond" part by one if there's a sequential call within the same timestamp. The official RFC talks about handling the collisions in the clock sequence part, but I guess since that would be a huge pain in the ass, they just made sure that the timestamps are never the same. So where I'm going with this is that people are just going to ignore the RFC if it's not "easy" to implement correctly.

Or is there some huge issue I'm missing if we just drop the whole clock sequence part from UUIDv7, use full nanosecond precision, and increment the nanoseconds by one if there's a call within the same nanosecond? That is what Python does with UUIDv1, although its UUIDv1 only has 100 ns precision. Are there any realistic scenarios where someone is able to, and wants to, generate UUIDs continuously within the same nanosecond?

To play around with these ideas, I created a small test library for uuid6() and uuid7() that uses basically the same method: just increase the nanosecond part by one if, for some weird reason, it would be the same, and fill the rest with random data.
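A minimal sketch of the "bump by one" approach described in this comment, at nanosecond granularity: if the clock reading is not greater than the last timestamp handed out (same nanosecond, or the clock stepped backwards), return last+1 instead. The nsClock name is hypothetical, and the random fill and the UUIDv7 bit layout are deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// nsClock hands out strictly increasing nanosecond timestamps, so no
// separate clock sequence field is needed.
type nsClock struct {
	mu   sync.Mutex
	last int64
}

// next takes a clock reading in nanoseconds and returns it, or last+1 if the
// reading would not be strictly greater than the previous result.
func (c *nsClock) next(now int64) int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if now <= c.last {
		now = c.last + 1
	}
	c.last = now
	return now
}

func main() {
	var c nsClock
	prev := int64(0)
	for i := 0; i < 5; i++ {
		ts := c.next(time.Now().UnixNano())
		fmt.Println(ts, ts > prev) // the second column is always true
		prev = ts
	}
}
```

Because one nanosecond is far shorter than the time it takes to generate a UUID, the bump should rarely fire except after a backwards clock step, in which case the embedded timestamp briefly runs ahead of real time.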

@LiosK

LiosK commented Nov 30, 2021

That's exactly how I handled the issue recently when I experimentally implemented UUIDv7 in TypeScript. I now believe the description of clock sequence in the draft RFC is rather harmful as I have seen some implementations that naively implement the clock sequence and waste precious bits by filling them with zeros.

@fabiolimace

@LiosK , your implementation can also be listed as a prototype in https://github.com/uuid6/prototypes. You can open a PR.

7 participants