
Adding Python Prototype for v7 #2

Merged
merged 2 commits into from Jul 12, 2021


fabiolimace
Contributor

Added v7 Python prototype plus testing

@kyzer-davis
Collaborator

kyzer-davis commented May 24, 2021

Thanks for the pull. A quick glance with my morning coffee looks good. I will grab a copy this afternoon and run through some deeper reviews and some testing.

Appreciate the help!

@fabiolimace
Contributor Author

fabiolimace commented May 25, 2021

You can modify it or refuse it if you think it is not a compliant implementation.

There are some aspects of this implementation that can be problematic:

  • It does not have a clock sequence (sequence counter). I left it out to avoid extra complexity.
  • The node id changes all the time. Should it be a static random node id for the entire session?
  • The "devDebugs" if-block is incomplete. I didn't have time to finish it.
  • The subsec bits are encoded by multiplying the fractional part of the second by 2 ** subsec_bits. I assume that if decoding is done by dividing the subsec by 2 ** subsec_bits, encoding must be done by multiplying the fractional part by 2 ** subsec_bits. Is that required for encoding, or can the encoding be relaxed as long as decoding yields a value "as close to the correct value as possible"?
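A minimal sketch of the encoding described in the last bullet, assuming a 30-bit field. SUBSEC_BITS, encode_subsec and decode_subsec are illustrative names, not code from new_uuid.py: the fraction of a second is scaled by 2 ** subsec_bits and truncated, and decoding divides by the same factor.

```python
SUBSEC_BITS = 30  # assumed field width (enough for nanosecond precision)


def encode_subsec(fraction: float, bits: int = SUBSEC_BITS) -> int:
    """Scale a fraction of a second in [0, 1) to a `bits`-bit integer."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    # int() truncates toward zero, so the result is at most 2**bits - 1
    return int(fraction * 2 ** bits)


def decode_subsec(subsec: int, bits: int = SUBSEC_BITS) -> float:
    """Inverse mapping: divide the stored integer by 2**bits."""
    return subsec / 2 ** bits


fraction = 0.123456789
encoded = encode_subsec(fraction)
decoded = decode_subsec(encoded)
# Truncation never overshoots, and the error stays below one step (2**-bits)
assert 0 <= fraction - decoded < 2 ** -SUBSEC_BITS
```

Truncating keeps every encoded value inside the field, and the decoded time never runs ahead of the real one. Rounding to nearest (round(fraction * 2 ** bits)) would halve the worst-case error, but for fractions within half a step of 1.0 it produces 2 ** bits, which no longer fits; a relaxed encoder would have to clamp that case.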

@kyzer-davis
Collaborator

Ah, thanks for the heads up.
I will work in a clock sequence.
Node can change each time that is fine.
I was only doing UUID generation in these prototypes so no need to handle decoding at the moment.

@kyzer-davis
Collaborator

kyzer-davis commented Jun 10, 2021

@fabiolimace I updated v7 in branch uuidv7-python

  • Added clock sequence
  • More comments for those that may follow along with this code
  • More tests and dev debug sections
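One common way to add a clock sequence is a counter that restarts at 0 for each new timestamp and increments when several UUIDs share a timestamp. The sketch below is hypothetical: SequenceCounter, next_seq and the 8-bit default width are assumptions for illustration, not the code in the uuidv7-python branch.

```python
class SequenceCounter:
    """Hypothetical helper: numbers UUIDs that share the same timestamp."""

    def __init__(self, bits: int = 8) -> None:
        # The 8-bit default width is an assumption for illustration only
        self.max_value = (1 << bits) - 1
        self.last_ts = None
        self.seq = 0

    def next_seq(self, ts: int) -> int:
        """Return the sequence value to embed alongside timestamp `ts`."""
        if ts == self.last_ts:
            if self.seq == self.max_value:
                # A real generator might instead wait for the next clock tick
                raise OverflowError("sequence exhausted for this timestamp")
            self.seq += 1
        else:
            # New timestamp: restart the counter
            self.last_ts = ts
            self.seq = 0
        return self.seq


counter = SequenceCounter()
print([counter.next_seq(t) for t in (100, 100, 100, 101)])  # [0, 1, 2, 0]
```

This sketch raises an error when the counter is exhausted and does not guard against the clock moving backwards; a real generator would need a policy for both cases.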

Testing I did seems okay. Let me know what you think. (Note: I replaced your splices because I am terrible with bitwise operations.) I also added f4b6a3/uuid-creator to the readme table for UUIDv6.


@bradleypeabody In testing this implementation I found that we may need to update draft 01. Nanosecond (NS) precision only needs 30 bits for subsec, and our NS example in section 4.4.4.1 (UUIDv7 Encoding) used too many bits. It is worth double-checking the millisecond and microsecond examples too.

  • All 12 bits of scenario subsec_a is fully dedicated to providing
    sub-second encoding for the Nanosecond precision (nsec).
  • All 12 bits of subsec_b have been dedicated to providing sub-
    second encoding for the Nanosecond precision (nsec).
  • The first 14 bit of the subsec_seq_node dedicated to providing
    sub-second encoding for the Nanosecond precision (nsec).

It is an easy fix: we just need to give 8 bits back to the random part of subsec_seq_node and update the text like so:

  • The first 6 bits of the subsec_seq_node are dedicated to providing
    sub-second encoding for the Nanosecond precision (nsec).
  • Finally, the remaining 48 bits in the subsec_seq_node section are
    filled out with random data to pad the length and provide
    guaranteed uniqueness (rand).
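The 30-bit figure and the size of the overshoot can be checked with a few lines of arithmetic. The tuples below only restate the widths from the bullets above; the last line assumes the fraction of a second is scaled by 2 ** 30, as described earlier in this thread.

```python
NS_PER_SECOND = 10 ** 9

# Smallest width that can hold every value 0 .. 999_999_999
ns_bits = (NS_PER_SECOND - 1).bit_length()
print(ns_bits)  # 30

# Widths from the bullets above: draft 01 example vs. proposed fix
draft_split = (12, 12, 14)  # subsec_a, subsec_b, head of subsec_seq_node
fixed_split = (12, 12, 6)
print(sum(draft_split) - ns_bits)  # 8 surplus bits
print(sum(fixed_split) == ns_bits)  # True

# If the fraction of a second is scaled by 2 ** 30, each step is just under 1 ns
print(round(NS_PER_SECOND / 2 ** ns_bits, 3))  # 0.931
```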

@fabiolimace
Contributor Author

fabiolimace commented Jun 12, 2021

@kyzer-davis Now it's much better. I also wanted to replace the bitwise operations. Thanks!

I think this patch can fix the problem of bad decoding that forced the use of padding:

--- OLD/new_uuid.py
+++ NEW/new_uuid.py
@@ -170,7 +170,7 @@
 def uuid7(devDebugs=False, returnType="hex"):
     """Generates a 128-bit version 7 UUID with nanoseconds precision timestamp and random node
 
-    example: 60c26bbe-0728-7f46-9602-bcf7423f3cb7
+    example: 060c4735-8bcb-7726-a200-1fd41eaa8a29
 
     format: unixts|subsec_a|version|subsec_b|variant|subsec_seq_node
 
@@ -217,8 +217,7 @@
 
     ### Binary Conversions
     ### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
-    unixts = f'{sec:032b}'
-    unixts = unixts + "0000" # Pad end with 4 zeros to get 36-bit
+    unixts = f'{sec:036b}'
     subsec_binary = f'{subsec:030b}'
     subsec_a =  subsec_binary[:12] # Upper 12
     subsec_b_c = subsec_binary[-18:] # Lower 18
@@ -263,7 +262,7 @@
     _last_uuid_int = UUIDv7_int
 
     # Convert Hex to Int then splice in dashes
-    UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:]
+    UUIDv7_hex = f'{UUIDv7_int:032x}'
     UUIDv7_formatted = '-'.join(
         [UUIDv7_hex[:8], UUIDv7_hex[8:12], UUIDv7_hex[12:16], UUIDv7_hex[16:20], UUIDv7_hex[20:32]])
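The length problem this patch addresses can be reproduced in isolation: hex() omits leading zeros, while a format spec with an explicit width keeps them. A small sketch using the docstring example; format_uuid is a hypothetical helper that mirrors the dash splicing in new_uuid.py.

```python
def format_uuid(hex32: str) -> str:
    """Insert dashes at the 8-4-4-4-12 positions, as new_uuid.py does."""
    return '-'.join([hex32[:8], hex32[8:12], hex32[12:16], hex32[16:20], hex32[20:32]])


value = 0x060c47358bcb7726a2001fd41eaa8a29  # example UUID from the docstring above

via_hex = hex(value)[2:]      # hex() drops the leading zero nibble
via_format = f'{value:032x}'  # always exactly 32 hex digits

print(len(via_hex), len(via_format))  # 31 32
print(format_uuid(via_hex))     # 60c47358-bcb7-726a-2001-fd41eaa8a29
print(format_uuid(via_format))  # 060c4735-8bcb-7726-a200-1fd41eaa8a29
```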

If you want to test the UUID timestamps, you can apply these changes to testing_v6.py and testing_v7.py:

testing_v6.py

--- OLD/testing_v6.py
+++ NEW/testing_v6.py
@@ -1,5 +1,6 @@
 import new_uuid
 import random
+import time
 
 """
 Testing:
@@ -17,16 +18,24 @@
 showUUIDs = False # True to view the generated UUID returnType and lists
 clock_seq = None # Set Clock Sequence
 
+def extractSeconds(uuid):
+    uuid_hex = uuid.replace('-', '')
+    timestamp = uuid_hex[:12] + uuid_hex[13:16]
+    return (int(timestamp, 16) - 0x01b21dd213814000) // 10000000
+
 def v6Tests(showUUIDs):
     counter = 0
     testList = []
     masterDict = {}
+    
+    start = int(time.time())
     while counter < 1000:
         # UUIDv6 = new_uuid.uuid1(devDebugs, returnType)
         UUIDv6 = new_uuid.uuid6(devDebugs, returnType)
         testList.append(UUIDv6)
         masterDict[UUIDv6] = counter
         counter += 1
+    end = int(time.time())
 
     if showUUIDs:
         print("\n")
@@ -54,6 +63,9 @@
         if masterDict[UUID] != counter:
             failCount+=1
             print('{0}: {1}'.format(str(counter), UUID))
+        elif not (extractSeconds(UUID) >= start and extractSeconds(UUID) <= end):
+            failCount+=1
+            print('{0}: {1} {2}'.format(str(counter), UUID, time.ctime(extractSeconds(UUID))))
         counter+= 1
     if failCount == 0:
         print("+ No Failures Observed")
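To check extractSeconds() without running the generator, one can place a known timestamp in UUIDv6 field order and read it back. This is a sketch: make_v6_string is a hypothetical helper, and its default tail is an arbitrary variant/node value. It uses floor division (//) so the arithmetic stays exact; with /, the count of 100-ns ticks since 1970 (well above 2 ** 53) is first rounded to a float.

```python
GREGORIAN_OFFSET = 0x01b21dd213814000  # 100-ns ticks from 1582-10-15 to 1970-01-01


def extract_seconds_v6(uuid_str: str) -> int:
    uuid_hex = uuid_str.replace('-', '')
    # 48 high timestamp bits, skip the version nibble, then the 12 low bits
    timestamp = int(uuid_hex[:12] + uuid_hex[13:16], 16)
    # Floor division keeps everything in exact integer arithmetic
    return (timestamp - GREGORIAN_OFFSET) // 10_000_000


def make_v6_string(ts60: int, tail: str = '8000' + '0' * 12) -> str:
    """Hypothetical helper: place a 60-bit timestamp in UUIDv6 field order."""
    ts_hex = f'{ts60:015x}'
    h = ts_hex[:12] + '6' + ts_hex[12:] + tail
    return '-'.join([h[:8], h[8:12], h[12:16], h[16:20], h[20:32]])


unix_seconds = 1_623_456_789
# Last 100-ns tick of that second: the largest value that must still map back to it
ts60 = GREGORIAN_OFFSET + unix_seconds * 10_000_000 + 9_999_999
assert extract_seconds_v6(make_v6_string(ts60)) == unix_seconds
```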

testing_v7.py

--- OLD/testing_v7.py
+++ NEW/testing_v7.py
@@ -1,5 +1,6 @@
 import new_uuid
 import random
+import time
 
 """
 Testing:
@@ -17,15 +18,25 @@
 
 showUUIDs = False # True to view the generated UUID returnType and lists
 
+def extractSeconds(uuid):
+    uuid_hex = uuid.replace('-', '')
+    uuid_int = int(uuid_hex, 16)
+    uuid_bin = f'{uuid_int:0128b}'
+    time_bin = uuid_bin[:36]
+    return int(time_bin, 2)
+    
 def v7Tests(showUUIDs):
     counter = 0
     testList = []
     masterDict = {}
+    
+    start = int(time.time())
     while counter < 1000:
         UUIDv7 = new_uuid.uuid7(devDebugs, returnType)
         testList.append(UUIDv7)
         masterDict[UUIDv7] = counter
         counter += 1
+    end = int(time.time())
 
     if showUUIDs:
         print("\n")
@@ -53,6 +64,9 @@
         if masterDict[UUID] != counter:
             failCount+=1
             print('{0}: {1}'.format(str(counter), UUID))
+        elif not (extractSeconds(UUID) >= start and extractSeconds(UUID) <= end):
+            failCount+=1
+            print('{0}: {1} {2}'.format(str(counter), UUID, time.ctime(extractSeconds(UUID))))
         counter+= 1
     if failCount == 0:
         print("+ No Failures Observed")
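The v7 extractSeconds() above can also be written with a right shift instead of a 128-character binary string. This sketch assumes the left-padded layout produced by the new_uuid.py patch, where the top 36 bits hold the Unix seconds; applied to the docstring example, it recovers a mid-2021 date.

```python
import time


def extract_seconds_v7(uuid_str: str) -> int:
    """Top 36 bits of the UUID as an integer.

    Assumes the left-padded layout from the patch above, in which the
    36-bit unixts field holds the Unix seconds zero-extended on the left.
    """
    return int(uuid_str.replace('-', ''), 16) >> (128 - 36)


example = '060c4735-8bcb-7726-a200-1fd41eaa8a29'  # docstring example from the patch
seconds = extract_seconds_v7(example)
print(hex(seconds))  # 0x60c47358
print(time.strftime('%Y-%m-%d', time.gmtime(seconds)))  # 2021-06-12
```

Under the right-padded layout (four zero bits appended), the same shift returns the seconds multiplied by 16, so this time check only passes with the patched generator.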

The file testing_v8.py doesn't need to test the UUID time, since that depends on the implementation.

And thank you for the inclusion of the uuid-creator!

@kyzer-davis mentioned this pull request Jul 12, 2021
kyzer-davis added a commit that referenced this pull request Jul 12, 2021
Draft 01 Update (#2 #4 #5 and update Readme with new prototype links)
@kyzer-davis merged commit 70febd7 into uuid6:main Jul 12, 2021
@fabiolimace
Contributor Author

@kyzer-davis

I think we can avoid the timestamp padding by making two changes in the file new_uuid.py.

Change 1:

     ### Binary Conversions
     ### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
(-)  unixts = f'{sec:032b}'
(-)  unixts = unixts + "0000" # Pad end with 4 zeros to get 36-bit
     subsec_binary = f'{subsec:030b}'
     ### Binary Conversions
     ### Need subsec_a (12 bits), subsec_b (12-bits), and subsec_c (leftover bits starting subsec_seq_node)
(+)  unixts = f'{sec:036b}'
     subsec_binary = f'{subsec:030b}'

Change 2:

     # Convert Hex to Int then splice in dashes
(-)  UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:]
     UUIDv7_formatted = '-'.join(
     # Convert Hex to Int then splice in dashes
(+)  UUIDv7_hex = f'{UUIDv7_int:032x}' # int to hex
     UUIDv7_formatted = '-'.join(

After these changes the UUID is generated with the right length (36) without padding:

before: 60c26bbe-7287-f469-602b-cf7423f3cb7
after:  060c4735-8bcb-7726-a200-1fd41eaa8a29

The padding can result in a different time when one tries to call uuid.get_time().

@kyzer-davis
Collaborator

kyzer-davis commented Aug 9, 2021

@fabiolimace

After these changes the UUID is generated with the right length (36) without padding:

  • Both methods end up padding the 32-bit Unix timestamp to 36 bits. The difference is that my current implementation pads the least significant (ending) bits, while your proposed change pads the most significant (starting) bits. (Note the leading 0 in your final UUID.)
  • My preference has always been to pad in the least significant position and avoid leading 0s. I actually just published #21 earlier today detailing this in the V02 draft.

Change 2:

  • This is only required because change 1 makes hex(int(UUIDv7_bin, 2)) drop the leading 0s you padded earlier: the integer keeps no leading zeros, and hex() does not restore them. It is somewhat counter-intuitive to pad, lose the padding in that conversion, and then re-pad with f'{UUIDv7_int:032x}'.
  • With the current padding in the least significant position, you can use either UUIDv7_hex = hex(int(UUIDv7_bin, 2))[2:] or UUIDv7_hex = f'{UUIDv7_int:032x}', since both yield the same 32 hex characters.
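The two padding positions can be compared directly. This sketch uses an arbitrary example time and zeroes every field after unixts, so only the padding differs between the two values.

```python
sec = 1_623_456_789  # arbitrary example Unix time (mid-2021)

# Current implementation: append four zero bits (least significant end)
unixts_right = f'{sec:032b}' + '0000'
# Proposed patch: zero-extend to 36 bits (most significant end)
unixts_left = f'{sec:036b}'

rest = '0' * (128 - 36)  # stand-in for every field after unixts
right = int(unixts_right + rest, 2)
left = int(unixts_left + rest, 2)

# The first 9 hex digits hold the 36-bit unixts field
print(f'{right:032x}'[:9], f'{left:032x}'[:9])  # 60c3fc150 060c3fc15
# hex() drops the leading zero nibble of the left-padded value
print(len(hex(right)[2:]), len(hex(left)[2:]))  # 32 31
# Each layout needs its own shift to recover the seconds
print(right >> 96 == sec, left >> 92 == sec)  # True True
```

With low-order padding the first hex digit is non-zero for present-day timestamps, so hex() and f'{value:032x}' agree; with high-order padding the explicit width is required. Either way, a decoder has to know which layout was used before shifting the seconds back out.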

The padding can result in a different time when one tries to call uuid.get_time()

  • The current implementation of uuid.get_time() will likely not be able to parse full UUIDv7 values until it is extended. Explicitly specifying the padding position makes that future extension easier, assuming the spec is ratified as an official RFC.
  • With the current padding, a decoder can always assume the first 32 bits of a UUIDv7 are a valid 32-bit Unix timestamp. Decoding the remaining 4 bits, along with the sub-second precision found in the rest of the UUIDv7 layout, I would leave up to the implementer of the decoder.
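A sketch of such a decoder, assuming the low-order-padded layout and the 12/12/6 split of a 30-bit subsec discussed earlier in this thread. decode_v7_seconds is a hypothetical name, and it simply ignores the 4 padding bits, which is exactly the part left to implementers.

```python
def decode_v7_seconds(uuid_str: str) -> float:
    """Hypothetical decoder for the low-order-padded layout:

    unixts (32 bits + 4 padding) | subsec_a (12) | version (4) |
    subsec_b (12) | variant (2) | subsec_c (6) | remaining bits (ignored)
    """
    bits = f"{int(uuid_str.replace('-', ''), 16):0128b}"
    seconds = int(bits[:32], 2)  # first 32 bits: Unix seconds
    # bits[32:36] are the padding bits; how to use them is left open here
    subsec = int(bits[36:48] + bits[52:64] + bits[66:72], 2)  # skip version and variant bits
    return seconds + subsec / 2 ** 30


# Build a test value: 0.5 s past an arbitrary second, version 7, variant 10, random bits zeroed
sec = 1_623_456_789
subsec_binary = f'{2 ** 29:030b}'
bits = (f'{sec:032b}' + '0000' + subsec_binary[:12] + '0111'
        + subsec_binary[12:24] + '10' + subsec_binary[24:] + '0' * 56)
h = f'{int(bits, 2):032x}'
example = '-'.join([h[:8], h[8:12], h[12:16], h[16:20], h[20:32]])
print(decode_v7_seconds(example))  # 1623456789.5
```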
