Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation? #41

Closed
theckman opened this issue Aug 27, 2021 · 4 comments
Closed

Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation? #41

theckman opened this issue Aug 27, 2021 · 4 comments
Labels
Draft 03 IETF Draft 03 Work UUIDv6 All things UUID6 related

Comments

@theckman
Copy link

theckman commented Aug 27, 2021

Hello,

I understand that this may be beyond the scope of what you intended with your draft, but since it's slated to update 4122, and it has an impact on UUIDv6, I thought I'd drop this here in case you wanted some well-intentioned scope creep. 😄

While going through and manually revalidating my implementation of UUIDv6, I realized the library I maintained was generating UUIDv1 clock sequences subtly different than how it is defined in RFC 4122. This library was forked from an earlier one, and so this implementation detail of my library originated there. Here is what section 4.1.5 of RFC 4122 says:

4.1.5.  Clock Sequence

   For UUID version 1, the clock sequence is used to help avoid
   duplicates that could arise when the clock is set backwards in time
   or if the node ID changes.

   If the clock is set backwards, or might have been set backwards
   (e.g., while the system was powered off), and the UUID generator can
   not be sure that no UUIDs were generated with timestamps larger than
   the value to which the clock was set, then the clock sequence has to
   be changed.  If the previous value of the clock sequence is known, it
   can just be incremented; otherwise it should be set to a random or
   high-quality pseudo-random value.

This says the clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time. What this means is that the number of UUIDs a stateful generator can generate is limited by the precision of the timestamp, whereas a stateless generator using a random clock sequence does not face the same limitations. That seems like an unintentional side effect of the wording of the RFC. Especially when the RFC says you should either fail to generate the UUID or block until it's safe to do so:

4.2.1.2.  System Clock Resolution

   The timestamp is generated from the system time, whose resolution may
   be less than the resolution of the UUID timestamp.

  ...

   If a system overruns the generator by requesting too many UUIDs
   within a single system time interval, the UUID service MUST either
   return an error, or stall the UUID generator until the system clock
   catches up.

So the person who originally implemented the UUIDv1 generator I now maintain, did more than the RFC said to do to avoid this failure mode and also incremented the clock sequence whenever a UUIDv1 has already been generated for this timestamp. This effectively removed the limitation on the number of UUIDv1s the stateful generator could generate, albeit while technically violating the language of the RFC.

if timeNow <= g.lastTime {
	g.clockSequence++
}

After I discovered this, I became curious about other UUIDv1 implementations in Go, and other languages, and I came to find that every stateful UUIDv1 implementation I looked at chose to not follow this section of the RFC as it was written. Instead, they either implemented it the same as the code I maintain, or they took the simpler approach of incrementing the sequence on every generation. The C implementation provided in RFC 4122 was implemented as the RFC itself was written, but that was the only stateful one I found doing so.

So that makes me wonder if we should use this opportunity to update the clock sequence generation language from 4122, so that UUIDv6 stateful generator implementers aren't also encouraged to violate the RFC as it's currently written.

I'd be interested in your thoughts.

@theckman theckman changed the title Further clarify UUIDv1 Clock Sequence Generation Further clarify UUIDv1 Clock Sequence Generation? Aug 27, 2021
@theckman theckman changed the title Further clarify UUIDv1 Clock Sequence Generation? Further clarify UUIDv1 (UUIDv6) Clock Sequence Generation? Aug 27, 2021
@kyzer-davis
Copy link
Contributor

If I read your request correctly:
You want to update some text around 4.2.1.2. System Clock Resolution to explicitly detail the method of incrementing the clock sequence if the previous timestamp is less than or equal to the current timestamp being used for the current UUIDvX generation vs stopping/blocking/stalling/erroring UUID generation altogether.

Else new timestamp or unknown previous timestamp:
Set clock sequence to random (UUIDv1/6) or explicit 0 start (UUIDv7)

@theckman
Copy link
Author

theckman commented Aug 27, 2021

You want to update some text around 4.2.1.2. System Clock Resolution to explicitly detail the method of incrementing the clock sequence if the previous timestamp is less than or equal to the current timestamp being used for the current UUIDvX generation vs stopping/blocking/stalling/erroring UUID generation altogether.

@kyzer-davis I think your summarization is almost right, with one minor comment. I wasn't thinking of it in a sense of "versus" stalling when I wrote up this issue, but this would definitely reduce the number of situations where an implementer following the RFC as written would need to do that.

For UUIDv1, it seems extremely unlikely a single stateful generator could generate enough UUIDs (4096) to loop back to the same clock sequence value without the timestamp having changed on most systems. There may be systems in use that don't have a precise clock and could generate 4096 UUIDs before the timestamp changes, so that assumption may not hold true. Considering that, we may still want to include language that explicitly guards against the clock sequence looping back around and generating duplicate UUIDs. I think either erroring or stalling when you detect you're about to generate a duplicate clock sequence is appropriate, so this is why I wouldn't summarize it as a "versus" but more of a "yes and".

I do have a similar open question (#40) as to whether we should error/stall UUIDv7s when the sequence clock is about to roll over. Maybe we want to unify the language for both to have the same behaviors. With UUIDv1/6 starting at a random value, and UUIDv7 starting at 0.

@fabiolimace
Copy link

fabiolimace commented Aug 27, 2021

TL;DR

The 'clock sequence' and the timestamp 'count' are not the same thing. They have different goals. The 'clock sequence' is a misnomer. The timestamp 'count' is just a counter with many names.

The 'count'

Many people confuse the 'clock sequence' with the 'count' used to simulate a high resolution timestamp. RFC-4122 says the following in the fourth paragraph of Section-4.2.1.2:

A high resolution timestamp can be simulated by keeping a COUNT of the number of UUIDs that have been generated with the same value of the system time, and using it to construct the low order bits of the timestamp. The count will range between zero and the number of 100-nanosecond intervals per system time interval.

Each implementation calls the 'count' with a different name. These are some names I found:

Note that all of the above 'count' implementations have the same function: simulate a high resolution timestamp. This is the common formula used, assuming a system time with millisecond precision: TIME * 10000 + COUNT. The formula is derived from this paragraph:

If UUIDs do not need to be frequently generated, the timestamp can simply be the system time multiplied by the number of 100-nanosecond intervals per system time interval.

For example, a generator that relies on a system clock that ticks every millisecond must use a 'count' variable that can be initialized to zero and increments whenever the system millisecond repeats. But it has a limit of 10,000 increment operations. When the 'count' reaches this limit, the Section-4.2.1.2 says that the generator should either throw an error or wait the clock to advance. If the generator is allowed to generate more than 10,000 UUIDs in the same millisecond, the generator would be violating the limit of 10,000,000 per second per machine:

The UUID generation algorithm described here supports very high allocation rates of up to 10 million per second per machine if necessary, so that they could even be used as transaction IDs.

The clock sequence

The clock sequence has a very different goal:

For UUID version 1, the clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or if the node ID changes.

The RFC-4122 says that the clock sequence must be changed when:

  • the clock is backwards;
  • the clock might have been set backwards;
  • the generator is not sure if the clock is backwards or not;
  • the node identifier changed;
  • node identifiers are getting mixed up.

The clock sequence 'can just be incremented' (like a sequence). But it is a MAY, not a SHOULD. I've found some implementations that always randomize the clock sequence here and here.

I think the word 'sequence' is the cause of the confusion. Although the 'clock sequence' behaves like a sequence when it is incremented, it is a random number. So, the UUID v1 has not just two parts, namely time (100-nanos of the Gregorian epoch) and place (IEEE address), but THREE: time, place and some entropy.

Some people add another task to the 'clock sequence': to be an extension of the timestamp. I don't see any problem with this, but it's not explicit in RFC-4122. Read this text from Wikipedia:

A 13- or 14-bit "uniquifying" clock sequence extends the timestamp in order to handle cases where the processor clock does not advance fast enough, or where there are multiple processors and UUID generators per node... Since the time and clock sequence total 74 bits, 274 (1.8×1022, or 18 sextillion) version-1 UUIDs can be generated per node ID, at a maximal average rate of 163 billion per second per node ID.

If a developer wants to add the task of timestamp extension, he/she just needs to replace the comparison operator '<' with '<=':

    if (timestamp <= last_time)
        clockseq++;

Thanks for reading! :)

@bradleypeabody
Copy link
Contributor

@fabiolimace This is a terrific explanation! I'm not sure exactly how, but this clarification should absolutely appear in the new draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Draft 03 IETF Draft 03 Work UUIDv6 All things UUID6 related
Projects
None yet
Development

No branches or pull requests

4 participants