Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: align CUIDv2 generation with newest spec #16

Merged
merged 2 commits into from
Feb 21, 2023

Conversation

xaevik
Copy link
Contributor

@xaevik xaevik commented Feb 19, 2023

Refactors code base to align it with the current CUIDv2 implementation. The following overall changes have occurred:

  • Private struct fields have been renamed to be more contextually accurate (e.g., _c -> _counter)
  • Calls to the insecure System.Random have been removed from Utils.GenerateRandom()
  • Implementation of Utils.GenerateIdentity() has been refactored to use only the following parts:
    • System Hostname (if unavailable, a base-16 value is generated)
    • Process ID
    • Thread ID
    • 32 bytes of secure random data are generated via Utils.GenerateRandom()
  • Counter in Cuid2.Counter was refactored:
    • Value changed from unsigned int to unsigned long
    • Initialization of the seed value now includes multiplication against the value 476782367

@xaevik xaevik added the enhancement New feature or request label Feb 19, 2023
@xaevik xaevik self-assigned this Feb 19, 2023
@xaevik xaevik force-pushed the feature/signature-refactor branch 4 times, most recently from 4d0e393 to 82c0fb4 Compare February 20, 2023 01:07
Signed-off-by: Alan Brault <alan.brault@visus.io>
@xaevik xaevik force-pushed the feature/signature-refactor branch 4 times, most recently from 0cd72c0 to bf7b745 Compare February 20, 2023 01:28
@visus-io visus-io deleted a comment from sonarcloud bot Feb 20, 2023
Signed-off-by: Alan Brault <alan.brault@visus.io>
@xaevik
Copy link
Contributor Author

xaevik commented Feb 20, 2023

@ericelliott I'd appreciate your input on these changes. I did not go forward with splitting the SHA3-512 byte array into smaller chunks before encoding. I was not comfortable with the potential loss of entropy or risks it could introduce.

So, for now, Cuid2 is still heavily affected by the performance issues of BigInteger.DivRem when tens of thousands (or hundreds of thousands) of values are generated in tight loops. Normally, I would simply use the GNU MP library (which I do in the PHP port) to get around this, but the .NET Wrapper is no longer being actively maintained.

It is unfortunate that .NET lacks strong built-in support for base conversions. They only support conversion of Base 2, 8, 10, and 16 and the performance of System.Numerics.BigInteger.DivRem is quite abysmal (System.Math.DivRem also has the same issue which is why it is not used in Cuid). I am hoping that they may address it further when .NET 8 releases as BigInteger has gone through several iterations.

Obvservations

It would have been nice if base-32 (z-base-32 variant) had been chosen instead of base-36 since it operates under powers-of-two. That would have resulted in much faster encoding as bitwise operations could have been done against the byte array instead of binary long division.

@xaevik xaevik marked this pull request as ready for review February 20, 2023 13:41
BinaryPrimitives.WriteInt64LittleEndian(buffer, _t);
BinaryPrimitives.WriteUInt32LittleEndian(buffer[..8], _c);
BinaryPrimitives.WriteInt64LittleEndian(data[..8], _timestamp);
BinaryPrimitives.WriteUInt64LittleEndian(data[^8..], _counter);

Sha3Digest digest = new(512);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this API, but new(512) seems strange to me. new what? I'm expecting to see something like Sha3Digest digest = new Sha3Digest(512);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a C# 9 language feature called a new operator. I use explicit typing when I write code so instead of var digest = new Sha3Digest(512) or Sha3Digest digest = new Sha3Digest(512) I can do Sha3Digest digest = new(512) instead since the type is known on the left side rather than the right side.

@ericelliott
Copy link

when tens of thousands (or hundreds of thousands) of values are generated in tight loops.

In my experience, this shouldn't be happening much in most apps. Typically, ids like this should be for entities that are getting persisted in a database. For use-cases that don't require persistence, maybe a simple counter would suffice, and the security and collision-resistance of Cuid2 would be overkill?

Cuid2 is intentionally a little bit slow for security. The SHA-3 algorithm itself does extra hashing rounds for security, to prevent brute-force attacks.

@ericelliott
Copy link

It would have been nice if base-32 (z-base-32 variant) had been chosen instead of base-36 since it operates under powers-of-two. That would have resulted in much faster encoding as bitwise operations could have been done against the byte array instead of binary long division.

I considered that, but a 24-character Base36 value is 4x more collision-resistant than a 24-character Base32 value.

@xaevik
Copy link
Contributor Author

xaevik commented Feb 20, 2023

when tens of thousands (or hundreds of thousands) of values are generated in tight loops.

In my experience, this shouldn't be happening much in most apps. Typically, ids like this should be for entities that are getting persisted in a database. For use-cases that don't require persistence, maybe a simple counter would suffice, and the security and collision-resistance of Cuid2 would be overkill?

Cuid2 is intentionally a little bit slow for security. The SHA-3 algorithm itself does extra hashing rounds for security, to prevent brute-force attacks.

Fair enough, as a baseline when I develop routines such as this, I push them to an upper limit (one million or more instances) as it's easier to identify any hidden bottlenecks. In this case, the slowest performer is the Base 36 encoding through BigInteger, not the hashing itself.

@xaevik xaevik merged commit eceab85 into main Feb 21, 2023
@xaevik xaevik deleted the feature/signature-refactor branch February 21, 2023 00:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants