Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDs sequenced by time? #75

Closed
binarykitchen opened this issue Oct 8, 2013 · 21 comments
Closed

IDs sequenced by time? #75

binarykitchen opened this issue Oct 8, 2013 · 21 comments

Comments

@binarykitchen
Copy link

Hi there

I'm using uuid.v1() for keys within my LevelDB + node.js app. Whenever I list entries from the LevelDB, the sorting happens by these keys and unfortunately it is very random.

I mean, in a boring relational DB you increment the primary keys. Whenever you query the rows, the are sorted by these primary keys as a default behavior. This is good, you always know that recently inserted rows appear on top.

But with UUIDs v1 as keys, the sequence is pretty random.

Is there a solution for that? Do I need to fine-tune some options to achieve that behavior or can this be implemented at all in a new version?

@broofa
Copy link
Member

broofa commented Oct 8, 2013

RFC 4122 specifies an explicit ordering of the fields - http://tools.ietf.org/html/rfc4122#section-4.1.2 - that puts the low bytes of the timestamp at the front of the string. I forget the rational for this - something to do with putting the most-likely-to-change fields at the front of the string to improve string comparison performance, iirc. If you want to sort by timestamp, you'll need to swap the 1st and 3rd fields (separated by '-'), then order lexicographically.

@broofa broofa closed this as completed Oct 8, 2013
@binarykitchen
Copy link
Author

I see. That's interesting and should be documented I think. How would you swap these two fields best in Javascript?

Or better, why don't you introduce a new option for that. Or introduce a new, unofficial version uuid.vX() for that?

@broofa
Copy link
Member

broofa commented Oct 8, 2013

> var id = '110ec58a-a0f2-4ac4-8393-c866d813b8d1';
undefined
> id.replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1')
"4ac4-a0f2-110ec58a-8393-c866d813b8d1"

Dealing with non-standard id formats is a slippery slope to which there is no bottom. And since this can pretty easily be bolted on outside of this module I don't see any particular reason to include it here. Sorry. :-/

@binarykitchen
Copy link
Author

:(

But you said, the RFC 4122 says to put the low bytes of the timestamp at the front??

PS: thx for the code snippet

@broofa
Copy link
Member

broofa commented Oct 8, 2013

Sorry, by "low bytes" I meant "low fields". The fields are lowest-first but within a field the hex octets are ordered highest-first. So the regex above is what you want. (But test it to make sure!)

From the RFC:

Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit string with the most significant digit first. The hexadecimal values "a" through "f" are output as lower case characters and are case insensitive on input.

@binarykitchen
Copy link
Author

I seeee ... thx. Yeah, I already tested and your regex seems to be correct. Cheers!

@xerosanyam
Copy link

Hello guys,

uuid.v4().replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1') < uuid.v4().replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1')

doesn't always result in true.
what am I missing here ?

@broofa
Copy link
Member

broofa commented May 20, 2017

@xerosanyam, v4 uuids are randomly generated.

@xerosanyam
Copy link

So there is no way to sequence IDs by time ? 🤔

@broofa
Copy link
Member

broofa commented May 21, 2017

Only for v1 (timestamped) uuids. You'd have to lexicographically order by time_hi, time_mid, time_lo , and clock_sequence fields, respectively.

const uuidV1 = require('uuid/v1');

function v1UuidCompare(a, b) {
  a = a.replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1');
  b = b.replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1');
  return a < b ? -1 : (a > b ? 1 : 0);
}

> v1UuidCompare(uuidV1(), uuidV1())
-1

Edit: Remove clockseq as primary sort field, since it's non-deterministic across process restarts.

@sm2017
Copy link

sm2017 commented Apr 16, 2019

@broofa Why you have 2 logic?

replace(/^(.{8})-(.{4})-(.{4})-(.{4})/, \'$4-$3-$2-$1\')

and

replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1')

@broofa
Copy link
Member

broofa commented Apr 16, 2019

@sm2017: Good question.

I edited my last comment to remove the four field form (the one that resulted in clockseq being the primary sort field). Readers should just stick to the replace(/^(.{8})-(.{4})-(.{4})/, '$3-$2-$1') form.

While an argument can be made for including clockseq in the sort order because it gets incremented if there's a system clock regression, the fact it is randomly initialized on process startup makes it unsuitable for inclusion in sorting. (Including it in my previous comment was a mistake on my part. My apologies for the confusion.)

What this means is that in the (very rare?) case where the system clock regresses (goes back in time), the sort order won't reflect the order in which ids were generated. I expect that's a minor concern for most folks, however.

(Note: The behavior of clockseq described here is part of the 4122 spec, where the intent is to minimize the risk of uuid collisions.)

@sm2017
Copy link

sm2017 commented Apr 17, 2019

@broofa Thanks a lot

@jonagoldman
Copy link

jonagoldman commented Oct 6, 2019

In RFC 4122 UUID v1 format, the first three number groups aaaaaaaa-bbbb-cccc are generated from the low, middle, and high parts of a timestamp.

So I think it's possible to transform a UUID v1 to an ordered UUID v1 and back like this:

const uuidV1 = require('uuid/v1');

function orderUUIDv1(regularUuid) {
	return regularUuid.replace(/^(.{4})(.{4})-(.{4})-(.{4})/, '$4$3-$1-$2');
}

function unorderUUIDv1(orderedUuid) {
	return orderedUuid.replace(/^(.{4})(.{4})-(.{4})-(.{4})/, '$3$4-$2-$1');
}

// a9ec5a07-e872-11e9-b92c-080027d0eccd
let regularUuid = uuidV1();

// 11e9e872-a9ec-5a07-b92c-080027d0eccd
let orderedUuid = orderUUIDv1(regularUuid);

// a9ec5a07-e872-11e9-b92c-080027d0eccd
let unorderedUuid = unorderUUIDv1(orderedUuid);

This works by swapping the time-low and time-high parts (the first and third groups of hexadecimal digits, respectively). Basically moving the more rapidly varying part to the right.

This is inspired by how MySQL 8 deals with UUIDs. Specifically UUID_TO_BIN and BIN_TO_UUID, and explained here.

This is also possible in PHP with Ramsey\Uuid using OrderedTimeCodec.

@broofa
Copy link
Member

broofa commented Oct 7, 2019

So I think it's possible to transform a UUID v1 to an ordered UUID v1 and back like this:

Just so we're clear, there is no provision in RFC4122 for "ordered" v1 UUIDs. Describing them as such is a misnomer. (In fact, the transformation described creates an explicitly invalid UUID, because it moves the version field as well as the time_hi field)

Better to just say "transform a v1 UUID to a sort-friendly string"

@jonagoldman
Copy link

@broofa Yes, the result is an invalid UUID string, but it can be used to store it in an optimized way in relational databases, specially MySQL.

Main disadvantages of storing regular UUIDs are:

  1. UUID is not ordered, so inserts are random and the data is scattered.
    • Fixed by transforming it to a 'sort-friendly string'.
  2. UUID has 36 characters which make it bulky. When used as a PRIMARY KEY it makes the index bigger which sometimes cannot fit into memory.
    • Fixed by storing it in BINARY.

The fact that the MySQL added this functionality in MySQL 8 is a good indicator that it is useful.

Of course there is no need to do it JavaScript, you can transform to ordered UUIDs in MySQL or PHP, but I post it here in case it is helpful to anyone.

ctavan added a commit to tc39/proposal-uuid that referenced this issue Oct 7, 2019
The analysis text could be misunderstood in a sense that all v1 UUIDs
are globally time ordered. This is not the case, since the timestamp is
split into three fields that appear in the UUIDs in reverse order, see
uuidjs/uuid#75 (comment)
for more discussion.

This improvement of the text does not change anything about the
assumptions or results presented in the analysis.
@bradennapier
Copy link

FYI I built this awhile back which does this shuffling when converting uuidv1 into a binary format: https://github.com/odo-network/binary-uuid

@rajatkeshar
Copy link

Hi, is there any ways to generate uuid in descending order?

@ctavan
Copy link
Member

ctavan commented Sep 28, 2020

What exactly are you trying to achieve, @rajatkeshar ?

@jonagoldman
Copy link

jonagoldman commented Sep 28, 2020

I think he means a version 1 UUID with the time fields swapped so the lowest bytes appear first, the middle bytes are next, and the highest bytes come last so that the UUID is sorted by creation time.

Like in Ramsey Ordered-Time UUID or MySQL 8 Ordered UUID.

Or maybe the experimental UUID version 6?

@ctavan
Copy link
Member

ctavan commented Sep 28, 2020

But generating these would still result in ascending order…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants