Customize UUIDv7 generation for database partitioning #130843

Open
sscherfke opened this issue Mar 4, 2025 · 9 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@sscherfke

sscherfke commented Mar 4, 2025

Feature or enhancement

Proposal:

Support for UUIDv7 via uuid7() has just landed in main: #89083

One use case for UUIDv7 is as a primary key in databases. Since it is time-based, it can also serve as a partition key (e.g., one partition per day). To calculate the partition range, you need to compute the "minimal" UUID for a given date, i.e., take the timestamp (e.g., 2025-04-05 00:00:00 UTC) and use all zeros for the random bits => 0196033f-4400-7000-8000-000000000000.

I'm totally fine with uuid.uuid7() not taking any arguments, but it would be cool if the building blocks for generating a UUIDv7 based on custom unix_ts_ms, counter, and tail could be exposed as well.

import datetime
from uuid import UUID

# Assumed to match the private constant of the same name in CPython's
# uuid module: version 7 in bits 76-79, RFC 4122 variant in bits 62-63.
_RFC_4122_VERSION_7_FLAGS = (7 << 76) | (0x8000 << 48)


def min_uuid7(date: datetime.datetime | None = None) -> UUID:
    # This is just for convenience and could be left out:
    if date is None:
        today = datetime.date.today()
        date = datetime.datetime(
            today.year, today.month, today.day, tzinfo=datetime.UTC
        )

    # Provide a custom timestamp and a custom counter and tail
    timestamp_ms = int(date.timestamp() * 1_000)
    counter, tail = 0, 0

    # The remainder is the same as in uuid7():
    unix_ts_ms = timestamp_ms & 0xFFFF_FFFF_FFFF
    counter_msbs = counter >> 30
    # keep the counter's 12 MSBs and clear the version bits
    counter_hi = counter_msbs & 0x0FFF
    # keep the counter's 30 LSBs and clear the variant bits
    counter_lo = counter & 0x3FFF_FFFF
    # ensure that the tail is always a 32-bit integer (by construction,
    # it is already the case, but future interfaces may allow the user
    # to specify the random tail)
    tail &= 0xFFFF_FFFF

    int_uuid_7 = unix_ts_ms << 80
    int_uuid_7 |= counter_hi << 64
    int_uuid_7 |= counter_lo << 32
    int_uuid_7 |= tail
    # by construction, the variant and version bits are already cleared
    int_uuid_7 |= _RFC_4122_VERSION_7_FLAGS
    return UUID(int=int_uuid_7)
>>> min_uuid7(datetime.datetime(2025, 4, 5, tzinfo=datetime.UTC))
UUID('0196033f-4400-7000-8000-000000000000')
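For the partitioning use case described above, the lower and upper bound of a daily partition could be derived along these lines. This is only a sketch: the names `VERSION_7_FLAGS`, `min_uuid7_for` and `daily_partition_bounds` are hypothetical and not part of the `uuid` module, and the flag constant is an assumption based on the UUIDv7 bit layout (version 7 in bits 76-79, variant `0b10` in bits 62-63).

```python
import datetime
import uuid

# Assumption: version 7 in bits 76-79, RFC 4122 variant (0b10) in bits 62-63.
VERSION_7_FLAGS = (7 << 76) | (0x8000 << 48)


def min_uuid7_for(ts_ms: int) -> uuid.UUID:
    """Smallest UUIDv7 that carries the given Unix timestamp (milliseconds)."""
    return uuid.UUID(int=((ts_ms & 0xFFFF_FFFF_FFFF) << 80) | VERSION_7_FLAGS)


def daily_partition_bounds(day: datetime.date) -> tuple[uuid.UUID, uuid.UUID]:
    """Return (inclusive lower bound, exclusive upper bound) for one UTC day."""
    start = datetime.datetime(
        day.year, day.month, day.day, tzinfo=datetime.timezone.utc
    )
    end = start + datetime.timedelta(days=1)
    # Midnight UTC is a whole number of seconds, so the conversion is exact.
    start_ms = int(start.timestamp()) * 1_000
    end_ms = int(end.timestamp()) * 1_000
    return min_uuid7_for(start_ms), min_uuid7_for(end_ms)


lower, upper = daily_partition_bounds(datetime.date(2025, 4, 5))
print(lower)  # 0196033f-4400-7000-8000-000000000000
print(upper)  # 01960865-a000-7000-8000-000000000000
```

Every UUIDv7 generated during that day then satisfies `lower <= u < upper`, so the pair can be used as a range partition boundary.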

Another useful addition might be a helper that recovers the original datetime/timestamp from a UUIDv7. I understand that this is additional code that might be slightly out of context, but such functions - like uuid7() - would probably not need to be changed, but are not trivial to implement for "normal users".

These functions could look like this:

def uuid_to_timestamp_ms(uuid: UUID) -> int:
    # UUID.version is None for non-RFC 4122 variants, so this also
    # rejects values that merely have the version 7 bits set.
    if uuid.version != 7:
        raise ValueError(f"{uuid} is not a v7 UUID.")
    return int.from_bytes(uuid.bytes[:6])


def uuid_to_datetime(uuid: UUID) -> datetime.datetime:
    ms_since_epoch = uuid_to_timestamp_ms(uuid)
    return datetime.datetime.fromtimestamp(ms_since_epoch / 1_000, tz=datetime.UTC)
>>> d = datetime.datetime(2025, 4, 5, tzinfo=datetime.UTC)
>>> u = min_uuid7(d)
>>> assert uuid_to_datetime(u) == d

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

@sscherfke sscherfke added the type-feature A feature request or enhancement label Mar 4, 2025
@picnixz picnixz added the stdlib Python modules in the Lib dir label Mar 4, 2025
@picnixz
Member

picnixz commented Mar 4, 2025

Another useful addition might be a helper that recovers the original datetime/timestamp from a UUIDv7

For this one, I plan to somehow make it work under #120878. I don't know how to make it work properly though because the notion of time_lo/time_mid/time_hi is different for UUIDv1/v6 and UUIDv7 (the first two have 60-bit timestamp, UUIDv7 has 48-bit timestamp).
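To illustrate the layout difference mentioned here, the following sketch extracts a Unix timestamp in milliseconds from either a v1 or a v7 UUID. It is not the design proposed under #120878; `extract_unix_ms` and `GREGORIAN_TO_UNIX_100NS` are hypothetical names, and v6 is deliberately left out.

```python
import uuid

# Number of 100-ns intervals between 1582-10-15 (start of the Gregorian
# calendar, the UUIDv1 epoch) and 1970-01-01 (the Unix epoch).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000


def extract_unix_ms(u: uuid.UUID) -> int:
    """Return the embedded timestamp as Unix milliseconds (v1 and v7 only)."""
    if u.version == 7:
        # v7: the top 48 bits are already Unix milliseconds.
        return u.int >> 80
    if u.version == 1:
        # v1: a 60-bit count of 100-ns intervals, split into three fields.
        time_low = u.int >> 96
        time_mid = (u.int >> 80) & 0xFFFF
        time_hi = (u.int >> 64) & 0x0FFF
        ts_100ns = (time_hi << 48) | (time_mid << 32) | time_low
        return (ts_100ns - GREGORIAN_TO_UNIX_100NS) // 10_000
    raise ValueError(f"{u} is neither a v1 nor a v7 UUID")


print(extract_unix_ms(uuid.UUID("0196033f-4400-7000-8000-000000000000")))
# 1743811200000
```

The two branches show why a shared time_lo/time_mid/time_hi notion is awkward: the fields differ in width, order, unit and epoch.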

@picnixz
Member

picnixz commented Mar 4, 2025

As for min_uuid7(), I think it's better to actually support timestamp in general, not date and datetime. To make a UTC timestamp, one could do time.mktime(time.gmtime()) (IIRC). WDYT?
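One caveat on the recalled expression: `time.mktime()` interprets its argument as local time, so `time.mktime(time.gmtime())` is only correct on machines whose local timezone is UTC. `time.time()` already returns seconds since the Unix epoch, and `calendar.timegm()` is the UTC counterpart of `time.gmtime()`. A minimal sketch (the helper name `utc_midnight_ts` is hypothetical):

```python
import calendar
import time


def utc_midnight_ts(year: int, month: int, day: int) -> int:
    """Unix timestamp (seconds) of 00:00:00 UTC on the given date."""
    # timegm() treats the tuple as UTC, independent of the local timezone.
    return calendar.timegm((year, month, day, 0, 0, 0))


now = time.time()                              # seconds since the epoch
roundtrip = calendar.timegm(time.gmtime(now))  # whole seconds, same instant

print(utc_midnight_ts(2025, 4, 5))  # 1743811200
```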

@sscherfke
Author

Using a timestamp would be okay, but the users’ mindset for this function (or at least for my use case ;-)) is "I need to create a new DB partition for today / the next month which is YYYY-MM-DD 0 o'clock. Gimme the minimal UUIDv7 for that!", so supporting datetimes would be very convenient.

@picnixz
Member

picnixz commented Mar 4, 2025

It's possible to convert the datetime object to a timestamp via .timestamp(). One reason for accepting a timestamp is essentially to make the interface more flexible for the standard library and easier to maintain (maintenance cost is something that needs to be taken into account). Also, a timestamp is timezone agnostic.
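A caller-side conversion could look like the sketch below. Note that `.timestamp()` on a naive datetime assumes local time, so this hypothetical helper (`to_unix_ms` is not an existing API) rejects naive values and uses integer arithmetic to avoid float rounding.

```python
import datetime

_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_unix_ms(moment: datetime.datetime) -> int:
    """Convert a timezone-aware datetime to Unix milliseconds."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        raise ValueError("naive datetime: attach a timezone first")
    # timedelta // timedelta yields an int, so no float is involved.
    return (moment - _EPOCH) // datetime.timedelta(milliseconds=1)


utc_midnight = datetime.datetime(2025, 4, 5, tzinfo=datetime.timezone.utc)
print(to_unix_ms(utc_midnight))  # 1743811200000
```

The same instant expressed in another timezone gives the same result, which is the sense in which the timestamp is timezone agnostic.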

I'm leaving for 10 days so I won't be able to reply except on mobile.

@sscherfke
Author

I understand your reasoning and something with a timestamp is better than nothing. :-)

@picnixz
Member

picnixz commented Mar 4, 2025

We actually had a similar discussion on whether or not to accept datetime objects for gzip in #128584. I see reasons not to but I also see reasons to. I think we need to find a balance between what would be the most useful and what would be the best solution for future compatibility (remember that once we decide on something for the standard library, it becomes kind of "frozen" and requires a deprecation period for any changes we make).

@picnixz picnixz self-assigned this Mar 4, 2025
@sergeyprokhorenko

Using a timestamp would be okay, but the users’ mindset for this function (or at least for my use case ;-)) is "I need to create a new DB partition for today / the next month which is YYYY-MM-DD 0 o'clock. Gimme the minimal UUIDv7 for that!", so supporting datetimes would be very convenient.

It is enough to take the left segment of the required length from the UUID as the partition key. The accuracy does not necessarily have to be exactly the same as 24 hours.
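The prefix approach can be sketched as follows, assuming the prefix is cut from the 48-bit millisecond timestamp at the start of the UUID. `partition_key` and `partition_width_ms` are hypothetical helper names.

```python
import uuid


def partition_key(u: uuid.UUID, prefix_bits: int) -> int:
    """Return the leftmost `prefix_bits` bits of the UUID as an integer."""
    if not 1 <= prefix_bits <= 48:
        raise ValueError("prefix must lie within the 48-bit timestamp")
    return u.int >> (128 - prefix_bits)


def partition_width_ms(prefix_bits: int) -> int:
    """Time span covered by one partition, in milliseconds."""
    return 1 << (48 - prefix_bits)


u = uuid.UUID("0196033f-4400-7000-8000-000000000000")
print(hex(partition_key(u, 24)))  # 0x19603
print(partition_width_ms(24))     # 16777216
```

Partition widths are powers of two in milliseconds (a 24-bit prefix covers about 4.66 hours, a 22-bit prefix about 18.6 hours), so partitions do not align with calendar days.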

@sergeyprokhorenko

sergeyprokhorenko commented Mar 6, 2025

Another useful addition might be a helper that recovers the original datetime/timestamp from a UUIDv7

For this one, I plan to somehow make it work under #120878. I don't know how to make it work properly though because the notion of time_lo/time_mid/time_hi is different for UUIDv1/v6 and UUIDv7 (the first two have 60-bit timestamp, UUIDv7 has 48-bit timestamp).

Nobody really needs version 6. Only version 1 and 7. See the uuid_extract_timestamp() function here for an example

@sscherfke
Author

It is enough to take the left segment of the required length from the UUID as the partition key. The accuracy does not necessarily have to be exactly the same as 24 hours.

You still need to pass in a custom timestamp/datetime to calculate that segment for the given time and in advance.
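As a sketch of that point: computing a prefix ahead of time starts from a chosen timestamp rather than from an existing UUID. The helper name `prefix_for_timestamp` is hypothetical, and the prefix is assumed to be cut from the 48-bit millisecond field.

```python
def prefix_for_timestamp(ts_ms: int, prefix_bits: int) -> int:
    """Leftmost `prefix_bits` bits that any UUIDv7 created at `ts_ms` will have."""
    if not 1 <= prefix_bits <= 48:
        raise ValueError("prefix must lie within the 48-bit timestamp")
    return (ts_ms & 0xFFFF_FFFF_FFFF) >> (48 - prefix_bits)


# 2025-04-05 00:00:00 UTC, known before any UUID for that day exists.
print(hex(prefix_for_timestamp(1_743_811_200_000, 24)))  # 0x19603
```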


3 participants