UUID version for proprietary formats #31

broofa · 2021-08-15T14:06:35Z

Note: Using the term "proprietary format" here to refer to any uuid formats that are user-specific in a way that doesn't [yet] merit formal RFC specification. (This notion comes from the world of MIME types, where the MIME standard allows for non-standard types, given a suitable "tree prefix" on the type string.)

In #30, I've argued that the current version 8 spec is too vague to carry much meaning. It defines timestamp, node, and clockseq fields in the most general way possible, allowing arbitrary bit lengths for all fields. But in doing so, it fails to meaningfully define any of these fields. Furthermore, the spec is prefaced with a variety of provisions about when version 8 should and should not be used, all of which scream, "ONLY USE THIS FOR EXPERIMENTAL OR PROPRIETARY FORMATS".

In short, I believe version 8 is at risk of falling afoul of the classic "wanting the best of both worlds, but getting the worst" design process. It's a spec that is overly strict for proprietary use cases and overly vague for timestamp use cases. Thus, I believe the right approach is to combine the existing versions 6-8 into a single new timestamp UUID version, as discussed in issue #30.

Then, for use cases that require a proprietary format, have a new version that is dedicated to that purpose, and that makes as few assumptions about the nature of the contained proprietary format as possible. In other words, the spec should boil down to the following:

variant bits as defined in RFC4122
version bits set to 1000 (version 8, or whatever version # ends up being used)
All other bits available for user data. The one provision is that users are encouraged (but not required) to structure fields in a way that maximizes database locality, by placing the most stable fields/bits in the most significant bits of the UUID.
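As a rough illustration of how little such a spec would need to say, here is a minimal Python sketch. It assumes the application hands over its 122 bits of user data as a single integer and that those bits fill the free positions most-significant first; the function name `proprietary_uuid` is hypothetical, and the version nibble is written as 8 only because that is the number proposed above.

```python
import uuid

USER_BITS = 122  # 128 bits minus 4 version bits minus 2 variant bits


def proprietary_uuid(user_bits: int) -> uuid.UUID:
    """Place 122 bits of application data around the fixed version/variant fields.

    Hypothetical helper: the caller's most significant bits land in the UUID's
    most significant positions, so stable fields placed there sort together.
    """
    if not 0 <= user_bits < (1 << USER_BITS):
        raise ValueError("user_bits must fit in 122 bits")
    high = user_bits >> 74             # top 48 user bits -> UUID bits 127..80
    mid = (user_bits >> 62) & 0xFFF    # next 12 user bits -> UUID bits 75..64
    low = user_bits & ((1 << 62) - 1)  # last 62 user bits -> UUID bits 61..0
    value = (
        (high << 80)
        | (0b1000 << 76)  # version bits (version 8)
        | (mid << 64)
        | (0b10 << 62)    # RFC 4122 variant bits
        | low
    )
    return uuid.UUID(int=value)


print(proprietary_uuid(0))
print(proprietary_uuid((1 << USER_BITS) - 1))
```

The first call prints `00000000-0000-8000-8000-000000000000` and the second `ffffffff-ffff-8fff-bfff-ffffffffffff`: only the version nibble and the top two bits of the variant nibble are fixed, everything else belongs to the application.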


edo1 · 2021-08-15T14:14:49Z

All other bits available for user data

Is there a reason not to use version=0b0100 (v4) for this?

broofa · 2021-08-15T14:46:28Z

Is there a reason not to use version=0b0100 (v4) for this?

There is a guarantee of uniqueness that stems from using "cryptographic quality" random number sources. That guarantee breaks down anywhere version 4 uuids are generated through other means.
Users may make assumptions about how v4 uuids are distributed in the uuid "space" that affect system performance. Witness the driving issue of DB locality as it relates to this spec.
If v4 (random) uuids get commingled with v4 (proprietary) uuids, there won't be any 100% reliable way of distinguishing between them. This could be problematic if, for example, a (random) v4 uuid is parsed as a semantically meaningful (proprietary) v4 uuid.
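A short Python sketch of the third point: a reader that only has the version field to go on classifies proprietary data stamped as v4 exactly like a random v4, while a dedicated version makes the distinction explicit. The `classify` helper, the payload bytes, and the v8 value are hypothetical.

```python
import uuid


def classify(u: uuid.UUID) -> str:
    """What a reader can infer from the variant and version fields alone."""
    if u.variant != uuid.RFC_4122:
        return "not RFC 4122"
    names = {4: "random (v4)", 8: "proprietary (v8)"}
    return names.get(u.version, f"other (v{u.version})")


truly_random = uuid.uuid4()
# Hypothetical application payload squeezed into 16 bytes and labelled v4.
# Stamping the version and variant also overwrites 6 of the payload's bits.
proprietary_as_v4 = uuid.UUID(bytes=b"order#0000000042", version=4)
# A hypothetical value that uses a dedicated version 8 instead.
proprietary_as_v8 = uuid.UUID("01234567-89ab-8cde-bfff-ffffffffffff")

for label, u in [
    ("truly_random", truly_random),
    ("proprietary_as_v4", proprietary_as_v4),
    ("proprietary_as_v8", proprietary_as_v8),
]:
    print(f"{label:18} -> {classify(u)}")
```

The first two lines both report "random (v4)", which is the ambiguity described above.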

Note: One possible use case for a proprietary UUID format would be a "GIS UUID" that encodes latitude-longitude location.
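To make the "GIS UUID" idea concrete, here is a hypothetical Python sketch. The layout (32-bit fixed-point latitude, then 32-bit fixed-point longitude, then 58 random bits, all packed into the 122 free bits of a version 8 UUID) is an assumption made up for illustration, as are the function names `gis_uuid` and `gis_coords`.

```python
import os
import uuid
from typing import Tuple

LAT_BITS = LON_BITS = 32
RAND_BITS = 122 - LAT_BITS - LON_BITS  # 58 random bits keep IDs unique per location


def _to_fixed(value: float, lo: float, hi: float, bits: int) -> int:
    # Map [lo, hi] linearly onto the integers 0 .. 2**bits - 1.
    return round((value - lo) / (hi - lo) * ((1 << bits) - 1))


def _from_fixed(n: int, lo: float, hi: float, bits: int) -> float:
    return lo + n / ((1 << bits) - 1) * (hi - lo)


def _stamp_v8(user_bits: int) -> uuid.UUID:
    # 48 user bits | ver=1000 | 12 user bits | var=10 | 62 user bits
    high, mid, low = user_bits >> 74, (user_bits >> 62) & 0xFFF, user_bits & ((1 << 62) - 1)
    return uuid.UUID(int=(high << 80) | (0b1000 << 76) | (mid << 64) | (0b10 << 62) | low)


def _unstamp_v8(u: uuid.UUID) -> int:
    # Drop the version and variant bits and re-join the 122 user bits.
    v = u.int
    return ((v >> 80) << 74) | (((v >> 64) & 0xFFF) << 62) | (v & ((1 << 62) - 1))


def gis_uuid(lat: float, lon: float) -> uuid.UUID:
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    lat_n = _to_fixed(lat, -90.0, 90.0, LAT_BITS)
    lon_n = _to_fixed(lon, -180.0, 180.0, LON_BITS)
    rand = int.from_bytes(os.urandom(8), "big") >> (64 - RAND_BITS)
    return _stamp_v8((lat_n << (LON_BITS + RAND_BITS)) | (lon_n << RAND_BITS) | rand)


def gis_coords(u: uuid.UUID) -> Tuple[float, float]:
    bits = _unstamp_v8(u)
    lat_n = bits >> (LON_BITS + RAND_BITS)
    lon_n = (bits >> RAND_BITS) & ((1 << LON_BITS) - 1)
    return (
        _from_fixed(lat_n, -90.0, 90.0, LAT_BITS),
        _from_fixed(lon_n, -180.0, 180.0, LON_BITS),
    )


u = gis_uuid(51.5007, -0.1246)
print(u, gis_coords(u))
```

With 32 bits per coordinate the round-trip error is a few times 1e-8 degrees. Because latitude comes first, a B-tree only groups these IDs by latitude band, which is relevant to the index objection raised in the reply below.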

fabiolimace · 2021-08-15T15:16:35Z

UUIDv4 is expected to be generated from truly random or pseudo-random numbers. Some implementations prepend a timestamp to the UUIDv4, like COMB-GUID, but those are not strictly UUIDv4.

I think a version for proprietary formats is an interesting thing to prevent people from using UUIDv4 for this purpose. UUIDv8 can be the version for proprietary formats.
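For comparison, a rough Python sketch of the "timestamp prepended to a v4" pattern. The 48-bit millisecond Unix timestamp in the leading bits is an assumption for illustration (actual COMB-GUID implementations vary in timestamp width and placement), and `comb_like_v4` is a hypothetical name. The result still carries version 4, even though its leading bits are not random at all.

```python
import os
import time
import uuid
from typing import Optional


def comb_like_v4(ms: Optional[int] = None) -> uuid.UUID:
    """Hypothetical COMB-style ID: a 48-bit millisecond timestamp followed by random bits."""
    if ms is None:
        ms = time.time_ns() // 1_000_000
    timestamp = ms & ((1 << 48) - 1)              # keep the low 48 bits of the clock
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    # version=4 makes uuid.UUID overwrite the version and variant bits, so the
    # value claims to be a random v4 although its top 48 bits are a clock.
    return uuid.UUID(int=(timestamp << 80) | rand, version=4)


earlier, later = comb_like_v4(1_000), comb_like_v4(2_000)
print(earlier.version, later.version, earlier < later)  # prints: 4 4 True
```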

edo1 · 2021-08-15T15:25:56Z

If v4 (random) uuids get commingled with v4 (proprietary) uuids, there won't be any 100% reliable way of distinguishing between them. This could be problematic if, for example, a (random) v4 uuid is parsed as a semantically meaningful (proprietary) v4 uuid.

How could an application distinguish between the v7 sub-variants?
And IMO parsing UUIDs is a bad idea in general.

Users may make assumptions about how v4 uuids are distributed in the uuid "space" that affect system performance.

So "v8 is any btree-friendly (sortable) application-defined UUID"?

Note: One possible use case for a proprietary UUID format would be a "GIS UUID" that encodes latitude-longitude location.

I doubt this is a good idea; something like a PostgreSQL GiST index should be used instead of a B-tree.
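The "btree-friendly (sortable)" question can be illustrated with a small Python experiment. It inserts IDs into a sorted list (standing in for a B-tree index; the analogy is an assumption) and counts how often an insert lands at the very end, i.e. in the rightmost leaf. The "sortable" IDs put a hypothetical sequence number in the top 48 bits of a v8 layout; `stamp_v8` and `inserts_at_end` are illustrative names.

```python
import bisect
import os
import uuid
from typing import Iterable, List


def stamp_v8(value: int) -> uuid.UUID:
    """Force the version (1000) and variant (10) bits onto a 128-bit integer."""
    value = (value & ~(0xF << 76)) | (0b1000 << 76)
    value = (value & ~(0b11 << 62)) | (0b10 << 62)
    return uuid.UUID(int=value)


def inserts_at_end(ids: Iterable[uuid.UUID]) -> int:
    """Insert ids into a sorted list; count inserts that land at the end."""
    index: List[uuid.UUID] = []
    at_end = 0
    for u in ids:
        pos = bisect.bisect_left(index, u)  # uuid.UUID supports < comparison
        if pos == len(index):
            at_end += 1
        index.insert(pos, u)
    return at_end


N = 1_000
sortable = [stamp_v8((seq << 80) | int.from_bytes(os.urandom(10), "big")) for seq in range(N)]
random_ids = [uuid.uuid4() for _ in range(N)]
print("sortable:", inserts_at_end(sortable), "of", N)
print("random v4:", inserts_at_end(random_ids), "of", N)
```

Every sortable ID is appended at the end, while random v4 IDs land almost anywhere (on average only about ln N of them extend the end of the list), which is the locality difference being debated.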

edo1 · 2021-08-15T15:37:48Z

Some implementations prepend a timestamp to the UUIDv4, like COMB-GUID, but those are not strictly UUIDv4.

I understand this. But what is the difference (except sortability) between v4 and v8 for the reader? An application cannot rely on the internals of UUID v8 because there is no good way to know how a particular UUID was generated.

broofa · 2021-08-15T16:28:49Z

And IMO parsing UUIDs is a bad idea in general.

Alas, this isn't something any of us have control over. Users are going to do whatever they deem right. If this spec doesn't define a sandbox for users that want to encode/decode proprietary information in uuids then they're likely to do exactly what you suggest, and use version 4 (or 1 or 5 or 3 or whatever). That your first instinct was to suggest people use version 4 uuids for proprietary formats is a good demonstration of the problem.

A proprietary version would at least tell people where the guardrails are.

UUIDv8 can be the version for proprietary formats.

Are you suggesting the current v8 proposal works for this? As I noted above, I think it's overly restrictive in its current form.

fabiolimace · 2021-08-15T16:41:32Z

Are you suggesting the current v8 proposal works for this? As I noted above, I think it's overly restrictive in its current form.

IMO all restrictions can be removed from v8 except version and variant bits.

edo1 · 2021-08-15T17:56:43Z

IMO all restrictions can be removed from v8 except version and variant bits.

Even sortability is not required? IMO that approach is error-prone. The use of proprietary UUIDs should be avoided whenever possible.

My suggestion, by use case:

"just a unique identifier", v4 should be used;
This algorithm only needs a good RNG to generate a collision-free identifier.
"reproducible identifier", v5 should be used;
"globally sortable time-based identifier", v7 should be used;
This algorithm could be used to generate roughly time-sorted identifiers around the world (or monotonic centrally-generated sequences).
"proprietary sortable identifier", v8 should be used (if it is kept in the final RFC version);
"proprietary non-sortable identifier", avoid this, use v4 or v5 instead.

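For the first two cases in that list, Python's standard `uuid` module already has generators. This short sketch (an illustration, not part of the proposal) shows that v4 needs nothing but randomness while v5 is reproducible from a namespace and a name; the v7 and v8 cases are left out because they depend on the draft layouts under discussion.

```python
import uuid

# "just a unique identifier": v4 needs nothing but a good random number source
a = uuid.uuid4()
b = uuid.uuid4()
print(a.version, a == b)  # a collision is astronomically unlikely, so a == b is False in practice

# "reproducible identifier": v5 hashes a namespace UUID and a name,
# so the same inputs always produce the same UUID
first = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
second = uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
print(first.version, first == second)  # 5 True
```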
kyzer-davis · 2021-08-16T15:17:32Z

@broofa

As I noted above, I think it's overly restrictive in its current form.

The goal was exactly what you and the others mentioned, but I can relax it even more if required.

At its core, UUIDv8 is a standards-based UUID layout with 122 bits for whatever proprietary sortable identifier an application requires.

The text, figures, and definitions in that section are really there to share some creation examples and to detail best practices we have learned from working with v6/v7 on timestamp, sequence, and node ordering (to avoid sorting issues), along with considerations for timestamp length and sequence length (i.e., the more precise the timestamp, the shorter the clock sequence needs to be).

If I wanted to abstract the layout definitions even further the text definitions could be:

segment_a - Everything from the first bit up to the version (48 bits)
ver - 4 bits (1000)
segment_b - Everything from the version up to the variant (12 bits)
var - 2 bits (assuming 10)
segment_c - Everything after the variant (62 bits)

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                            segment_a                          |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |           segment_a           |  ver  |      segment_b        |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |var|                       segment_c                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                           segment_c                           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
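A minimal Python sketch of the segment layout in the figure, packing `segment_a`, `segment_b`, and `segment_c` around the fixed `ver` and `var` fields and reading them back. The helper names and the range checks are illustrative assumptions, not text from the draft.

```python
import uuid
from typing import Tuple

SEGMENT_BITS = {"segment_a": 48, "segment_b": 12, "segment_c": 62}
VERSION_8 = 0b1000
VARIANT_10 = 0b10


def pack_v8(segment_a: int, segment_b: int, segment_c: int) -> uuid.UUID:
    """Lay out segment_a | ver | segment_b | var | segment_c, most significant first."""
    for name, value in (("segment_a", segment_a), ("segment_b", segment_b), ("segment_c", segment_c)):
        bits = SEGMENT_BITS[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} must fit in {bits} bits")
    raw = (
        (segment_a << 80)     # bits 127..80
        | (VERSION_8 << 76)   # bits 79..76
        | (segment_b << 64)   # bits 75..64
        | (VARIANT_10 << 62)  # bits 63..62
        | segment_c           # bits 61..0
    )
    return uuid.UUID(int=raw)


def unpack_v8(u: uuid.UUID) -> Tuple[int, int, int]:
    """Return (segment_a, segment_b, segment_c) after checking ver and var."""
    raw = u.int
    if ((raw >> 76) & 0xF) != VERSION_8 or ((raw >> 62) & 0b11) != VARIANT_10:
        raise ValueError("not a version 8 UUID with variant bits 10")
    return raw >> 80, (raw >> 64) & 0xFFF, raw & ((1 << 62) - 1)


u = pack_v8(0x0123456789AB, 0xCDE, (1 << 62) - 1)
print(u)
print([hex(s) for s in unpack_v8(u)])
```

Here `pack_v8(0x0123456789AB, 0xCDE, (1 << 62) - 1)` gives `01234567-89ab-8cde-bfff-ffffffffffff`, and `unpack_v8` returns the three segments unchanged.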

With this in place, I can write a very simple "UUIDv8 Basic Creation Algorithm" or modify the "General algorithm for generation of UUIDv8 not defined here" from that section.
It may still be good to have a real example with concrete bit allocations, just to provide more context. Example number 2 that I currently have, "48-bit timestamp, 12-bit sequence counter, 62-bit node", seems like a good contender, but if a general definition is enough I will drop them all.
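As a hedged sketch of what example number 2 could look like in practice, here is a Python generator for a "48-bit timestamp, 12-bit sequence counter, 62-bit node" layout. The overflow policy (borrowing the next millisecond when the 12-bit counter runs out), the clamp against a clock moving backwards, and the random default node are assumptions chosen for illustration; `V8Generator` is a hypothetical name.

```python
import os
import time
import uuid
from typing import Callable, Optional


class V8Generator:
    """Hypothetical layout: 48-bit Unix ms | ver 1000 | 12-bit counter | var 10 | 62-bit node."""

    def __init__(self, node: Optional[int] = None,
                 clock: Optional[Callable[[], int]] = None) -> None:
        if node is None:
            node = int.from_bytes(os.urandom(8), "big") >> 2  # 62 random bits
        if not 0 <= node < (1 << 62):
            raise ValueError("node must fit in 62 bits")
        self.node = node
        self.clock = clock or (lambda: time.time_ns() // 1_000_000)
        self.last_ms = -1
        self.counter = 0

    def new_uuid(self) -> uuid.UUID:
        ms = max(self.clock(), self.last_ms)  # never step backwards in time
        if ms == self.last_ms:
            self.counter += 1
            if self.counter > 0xFFF:  # 12-bit counter exhausted
                ms += 1               # borrow the next millisecond
                self.counter = 0
        else:
            self.counter = 0
        self.last_ms = ms
        value = (
            ((ms & ((1 << 48) - 1)) << 80)
            | (0b1000 << 76)
            | (self.counter << 64)
            | (0b10 << 62)
            | self.node
        )
        return uuid.UUID(int=value)


gen = V8Generator()
print(gen.new_uuid())
print(gen.new_uuid())
```

This makes the timestamp/sequence trade-off mentioned earlier visible: a 12-bit counter allows 4096 IDs per millisecond per node before the generator has to borrow from the future.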

kyzer-davis mentioned this issue Aug 16, 2021

Have just one new UUID version rather than three(?) #30

Closed

broofa changed the title ~~UUID format for proprietary formats~~ UUID version for proprietary formats Aug 16, 2021

kyzer-davis mentioned this issue Jan 29, 2022

Typo: Proposed UUIDv8 implementation #46

Closed

kyzer-davis added the Draft 03 IETF Draft 03 Work label Jan 31, 2022

kyzer-davis added the UUIDv8 All things UUIDv8 related label Feb 7, 2022

kyzer-davis mentioned this issue Feb 23, 2022

Draft 03 PR #58

Merged

kyzer-davis linked a pull request Feb 23, 2022 that will close this issue

Draft 03 PR #58

Merged

kyzer-davis closed this as completed in #58 Mar 1, 2022

fabiolimace mentioned this issue Sep 13, 2023

Simplify UUIDv8 Hash-based Example ietf-wg-uuidrev/rfc4122bis#147

Closed
