Clarify how to represent fractional seconds in UUIDv7 #44

Closed
stevesimmons opened this issue Dec 19, 2021 · 3 comments · Fixed by #58
Labels
Draft 03 IETF Draft 03 Work UUIDv8 All things UUIDv8 related

Comments

@stevesimmons
stevesimmons commented Dec 19, 2021

Some of the UUIDv7 implementations I've seen on GitHub have not implemented fractional seconds correctly. For instance, some put the number of milliseconds directly into the 12-bit field subsec_a, thereby using only the 0-999 subset of the full 0-4095 range.

To prevent this, could the initial descriptions of the subsec_a and subsec_b fields (around L568 of the -02.txt draft) state more explicitly that these fields must use the full range, with an example of what not to do?
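
For illustration, the difference could be sketched as follows. This is a minimal sketch rather than text from the draft: the function names are hypothetical, and rounding down when scaling is an assumption.

```python
SUBSEC_A_BITS = 12  # width of the subsec_a field


def subsec_a_raw_ms(ms: int) -> int:
    # What not to do: store the millisecond count as-is,
    # which only ever uses 0-999 of the 0-4095 range.
    return ms


def subsec_a_fraction(ms: int) -> int:
    # Treat ms/1000 as a fraction of a second and scale it to 12 bits,
    # rounding down (assumed rounding mode).
    return ms * 2 ** SUBSEC_A_BITS // 1000


for ms in (0, 500, 999):
    print(ms, subsec_a_raw_ms(ms), subsec_a_fraction(ms))
# 500 ms maps to 2048 (exactly half of 4096); 999 ms maps to 4091
```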

Currently this isn't spelled out (though discussed in issue #24). For instance:

Section 4.4.1. - UUIDv7 Timestamp Usage

Additional sub-second precision (millisecond, nanosecond,
microsecond, etc) MAY be provided for encoding and decoding in the
remaining bits in the layout. [note: but doesn't say how!]

Section 4.4.4.1. - UUIDv7 Encoding

All 12 bits of scenario subsec_a is fully dedicated to millisecond
information (msec). [note: it isn't clear what "fully dedicated" means here]

It's only once the reader gets down to L845 that the requirements are spelled out:

Section 4.4.4.2. UUIDv7 Decoding

Similarly as per Figure 2, the sub-second precision values lie within
subsec_a, subsec_b, and subsec_seq_node which are all interpreted as
sub-second information after skipping over the version (ver) and
(var) bits. These concatenated sub-second information bits are
interpreted in a way where most to least significant bits represent a
further division by two. This is the same normal place notation used
to express fractional numbers, except in binary. For example, in
decimal ".1" means one tenth, and ".01" means one hundredth. In this
subsec field, a 1 means one half, 01 means one quarter, 001 is one
eighth, etc. This scheme can work for any number of bits up to the
maximum available, and keeps the most significant data leftmost in
the bit sequence.
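
The place-value reading quoted above can be sketched in a few lines. This is only an illustration; binary_fraction is a hypothetical helper, not something defined in the draft.

```python
from fractions import Fraction


def binary_fraction(bits: str) -> Fraction:
    # Interpret a bit string as digits after a binary point:
    # the first bit is worth 1/2, the second 1/4, the third 1/8, ...
    return Fraction(int(bits, 2), 2 ** len(bits))


print(binary_fraction("1"))    # 1/2
print(binary_fraction("01"))   # 1/4
print(binary_fraction("001"))  # 1/8
print(binary_fraction("101"))  # 5/8 (one half plus one eighth)
```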

As an additional suggestion, it would be very helpful if the text could include some example UUIDs with their minimum and maximum implied timestamps, to serve as test cases in unit tests.
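
One way such minimum/maximum values could be derived for a single sub-second field is sketched below. It assumes the field holds floor(fraction * 2**bits), which is an assumption rather than wording from the draft, and implied_ns_range is a hypothetical helper name.

```python
from typing import Tuple


def implied_ns_range(subsec: int, subsec_bits: int = 12) -> Tuple[int, int]:
    # Smallest and largest nanosecond counts that would be encoded as
    # `subsec`, assuming the encoder computes ns * 2**bits // 10**9.
    scale = 2 ** subsec_bits
    lo = -(-subsec * 10 ** 9 // scale)            # ceiling division
    hi = -(-(subsec + 1) * 10 ** 9 // scale) - 1  # just below the next step
    return lo, hi


print(implied_ns_range(0))     # (0, 244140)
print(implied_ns_range(2048))  # (500000000, 500244140)
print(implied_ns_range(4095))  # (999755860, 999999999)
```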

@oittaa
oittaa commented Jan 1, 2022

I think it might be a good idea to include code examples, in at least a few of the most common programming languages, showing how to encode and decode those timestamps. Basically every programming language I know expresses high-precision timestamps as integers. For example: https://docs.python.org/3/library/time.html#time.time_ns

@oittaa
oittaa commented Jan 1, 2022

Python code example

# Enough to represent nanoseconds from time.time_ns()
SUBSEC_BITS = 30
SUBSEC_DECIMAL_DIGITS = 9


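def subsec_encode(
    value: int,
    subsec_bits: int = SUBSEC_BITS,
    subsec_decimal_digits: int = SUBSEC_DECIMAL_DIGITS,
) -> int:
    # Scale a decimal sub-second count (e.g. nanoseconds) to a binary
    # fraction of subsec_bits bits, rounding down.
    return value * 2 ** subsec_bits // 10 ** subsec_decimal_digits


def subsec_decode(
    value: int,
    subsec_bits: int = SUBSEC_BITS,
    subsec_decimal_digits: int = SUBSEC_DECIMAL_DIGITS,
) -> int:
    # Inverse scaling, rounding up (ceiling division via a negated floor),
    # so decode(encode(x)) == x whenever 2**bits >= 10**digits.
    return -(-value * 10 ** subsec_decimal_digits // 2 ** subsec_bits)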


def test_millisecond():
    print("Testing millisecond conversions.")
    for i in range(10 ** 3):
        if i % 10 ** 2 == 0:
            print(f"{i=} ...")
        assert i == subsec_decode(subsec_encode(i, 10, 3), 10, 3)


def test_microsecond():
    print("Testing microsecond conversions.")
    for i in range(10 ** 6):
        if i % 10 ** 5 == 0:
            print(f"{i=} ...")
        assert i == subsec_decode(subsec_encode(i, 20, 6), 20, 6)


def test_nanosecond():
    print("Testing nanosecond conversions.")
    for i in range(10 ** 9):
        if i % 10 ** 6 == 0:
            print(f"{i=} ...")
        assert i == subsec_decode(subsec_encode(i, 30, 9), 30, 9)


def main():
    import time

    timestamp = time.time_ns()
    unixts, ns = divmod(timestamp, 10 ** SUBSEC_DECIMAL_DIGITS)
    subsec = subsec_encode(ns)
    subsec_to_ns = subsec_decode(subsec)
    print(f"{timestamp=}")
    print(f"{unixts=}")
    print(f"{ns=}")
    print(f"{subsec=}")
    print(f"{subsec_to_ns=}")
    test_millisecond()
    test_microsecond()
    test_nanosecond()


if __name__ == "__main__":
    main()

Output is something like

timestamp=1641075060289465000
unixts=1641075060
ns=289465000
subsec=310810677
subsec_to_ns=289465000
Testing millisecond conversions.
i=0 ...

The main() and test_... functions are only there to show that it actually works; the actual documentation might include just subsec_encode() and subsec_decode(). The encoding and decoding should work correctly even if you change SUBSEC_BITS to something like the above-mentioned 12 for millisecond precision (an exact round trip then also needs SUBSEC_DECIMAL_DIGITS set to 3).
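
As a quick check of that claim, the sketch below repeats the two functions so it runs on its own and tests the round trip for different field widths. round_trips is a hypothetical helper added for illustration; the condition it demonstrates is that the field needs at least as many distinct values as the decimal range, i.e. 2**bits >= 10**digits.

```python
def subsec_encode(value: int, subsec_bits: int, subsec_decimal_digits: int) -> int:
    # Decimal sub-second count -> binary fraction, rounding down.
    return value * 2 ** subsec_bits // 10 ** subsec_decimal_digits


def subsec_decode(value: int, subsec_bits: int, subsec_decimal_digits: int) -> int:
    # Binary fraction -> decimal sub-second count, rounding up.
    return -(-value * 10 ** subsec_decimal_digits // 2 ** subsec_bits)


def round_trips(subsec_bits: int, subsec_decimal_digits: int) -> bool:
    # True if every decimal value survives encode followed by decode.
    return all(
        i == subsec_decode(
            subsec_encode(i, subsec_bits, subsec_decimal_digits),
            subsec_bits,
            subsec_decimal_digits,
        )
        for i in range(10 ** subsec_decimal_digits)
    )


print(round_trips(12, 3))  # True: 4096 values cover 1000 milliseconds
print(round_trips(10, 3))  # True: 1024 values are still enough
print(round_trips(9, 3))   # False: 512 values cannot distinguish 1000
```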

@kyzer-davis kyzer-davis added the Draft 03 IETF Draft 03 Work label Jan 31, 2022
@fabiolimace

Great code example @oittaa !

@kyzer-davis kyzer-davis added the UUIDv7 All things UUIDv7 related label Feb 7, 2022
@kyzer-davis kyzer-davis added UUIDv8 All things UUIDv8 related and removed UUIDv7 All things UUIDv7 related labels Feb 23, 2022
@kyzer-davis kyzer-davis mentioned this issue Feb 23, 2022
@kyzer-davis kyzer-davis linked a pull request Feb 23, 2022 that will close this issue
4 participants