Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v6] Python implementation increments timestamp instead of clock sequence #4

Closed
nerg4l opened this issue Jun 9, 2021 · 5 comments
Closed

Comments

@nerg4l
Copy link

nerg4l commented Jun 9, 2021

While having a look at the python prototype and reading the "UUIDv6 Basic Creation Algorithm" section of the draft I noticed a difference between them.

From Section 4.3.4.

  1. If the state was available, but the saved timestamp is less than
    or equal to the current timestamp, increment the clock sequence
    value.

if _last_timestamp is not None and timestamp <= _last_timestamp:
timestamp = _last_timestamp + 1
_last_timestamp = timestamp
if clock_seq is None:
clock_seq = random.getrandbits(14) # instead of stable storage

I think the implementation in the linked code segment increments the timestamp when it should increment the clock sequence instead.

@kyzer-davis
Copy link
Collaborator

Hey @nerg4l,
I had noticed this too when forking the UUIDv1 code from the official Python repo.
https://github.com/python/cpython/blob/3.9/Lib/uuid.py#L688

I have been meaning to fix it in v6 prototype but have not found the time.
After I can submit PR to have the Python folks update their code too since the UUIDv1 creation algorithm is basically the same for that step in RFC 4122: https://datatracker.ietf.org/doc/html/rfc4122#section-4.2.1

@kyzer-davis
Copy link
Collaborator

@nerg4l
I pushed a branch uuidv7-python which should fix the v1/v6 issue where timestamp is incremented instead of the clock sequence.
This is simply re-used code from my UUIDv8 timestamp check.

Sequence Counter Logic:

  • sequenceCounter always starts at 0 instead of random value Python UUIDv1 was setting before
  • Compare _last_timestamp to current timestamp for this UUID generation
  • If current timestamp is less than or equal increment sequenceCounter by one (shouldn't ever be less than)
  • If current timestamp is greater, reset sequenceCounter to 0
  • Ensure we are updating _last_timestamp and _last_sequence for future UUID generation iterations.

Testing v6 with Dev Debugs enabled seems to increment properly but let me know what you think.

@nerg4l
Copy link
Author

nerg4l commented Jun 10, 2021

I think it is better to manage the sequence this way. The previous logic altered the time segment which meant whenever someone would read it from the generated uuid they would get an incorrect result.

Just remove clock_seq from the paramtere list. Currently that parameter is not respected. It gets a new value assigned in the method.

RFC 4122 - Section 4.1.5 says the following about clock sequence:

The clock sequence MUST be originally (i.e., once in the lifetime of
a system) initialized to a random number to minimize the correlation
across systems. This provides maximum protection against node
identifiers that may move or switch from system to system rapidly.
The initial value MUST NOT be correlated to the node identifier.

I don't know how important to respect this requirement in case of v6. The 48 random bit should be enough to avoid collisions.

@kyzer-davis
Copy link
Collaborator

@nerg4l
Yeah, I just made a commit to drop clock_seq variable from v1/v6. The goal with this v6 prototype was to detail that you only needed to change a few lines of code to change an existing v1 implementation over to v6. Little did I know Python's v1 implementation was non-compliant with RFC4122...

As for the v6:
Our current 01 draft re-uses the logic from the section you specified in RFC4122. However, in v1 the Sequence also doubles as entropy since the Node usually may contain a static 48-bit MAC. But in v6 our Node is random as such there is no reason to randomize the Seq to start.

Furthermore, if I recall in testing random Seq start in v8 prototype I found that random sequence counter greatly increased the change the Seq would rollover causing out of order UUIDs. Or if the rollover isn't handled properly overflows where the Seq becomes larger than the space it was meant to encompass resulting in UUIDs larger than 128 bits.

Thus is the reason why I stated v7 and v8 like so:

The clock sequence MUST start at zero and increment monotonically for
each new UUID created on by the application on the same timestamp.
When the timestamp increments the clock sequence MUST be reset to
zero. The clock sequence MUST NOT rollover or reset to zero unless
the timestamp has incremented. Care MUST be given to ensure that an
adequate sized clock sequence is selected for a given application
based on expected timestamp precision and expected UUID generation
rates.

I will discuss with Brad and update draft 02 where required to state that the sequence should start at 0 and increase by 1. Exactly like we do in v7/v8.

@nerg4l
Copy link
Author

nerg4l commented Jun 11, 2021

Furthermore, if I recall in testing random Seq start in v8 prototype I found that random sequence counter greatly increased the change the Seq would rollover causing out of order UUIDs. Or if the rollover isn't handled properly overflows where the Seq becomes larger than the space it was meant to encompass resulting in UUIDs larger than 128 bits.

I had the same observation while writing an implementation for v6.

I will discuss with Brad and update draft 02 where required to state that the sequence should start at 0 and increase by 1. Exactly like we do in v7/v8.

I think that would be a great addition. It would also make the sequence reusable between v6, v7, and v8.

When you discuss this don't forget to take into consideration the rule for systems/languages which cannot provide 100-nanosecond timestamps (e.g. JavaScript in the browser). RFC 4122 - Section 4.2.1.2. suggests the following:

A high resolution timestamp can be simulated by keeping a count of
the number of UUIDs that have been generated with the same value of
the system time, and using it to construct the low order bits of the
timestamp. The count will range between zero and the number of
100-nanosecond intervals per system time interval.

At the moment, there are 3 ways which comes to my mind.

  • Option 1: Increment low order bits, Sequence always 0
  • Option 2: Increment sequence, Low order bits not modified
  • Option 3: Increment low order until it reaches the precision of the system time and then increment sequence

The most optimal would be Option 3 but it might add a lot of unnecessary complexity.

@nerg4l nerg4l closed this as completed Jun 17, 2021
kyzer-davis added a commit that referenced this issue Jul 12, 2021
Draft 01 Update (#2 #4 #5 and update Readme with new prototype links)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants