Skip to content

nmt: switch NS convention to big-endian #67

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 30, 2023

Conversation

pepyakin
Copy link
Contributor

The canonical representation of a namespace ID is [u8; N] (N=4 for now).
However, it seems to be useful at least for debugging purposes to
parse and display the namespace as an unsigned integer.

Prior this change, the integer form was defined as little-endian, just
because it's a default. Most of modern architectures are LE by default,
wasm is LE, etc. Mostly it doesn't matter, performance differences
are neglegible either way 99% of the time. I suppose the reason for
this is that LE has a marginal advantage because of the things like
that you don't need byte shuffle when reinterpreting value type (e.g.
u32 and u16) behind the same pointer and stuff like that.

However, big endian also has it's benefits. First is a human readable
representation, which is important here since this is more of a
developer convenience representation. Another, benefit is that
BE corresponds to the byte ordering. E.g., 256 > 255 (0x0100 > 0x00ff),
which doesn't hold in LE. And we do care about the ordering of
namespaces, at present because the NMT must literally be built in
order, but in future I also think it would be a useful property to
have blobs sorted.

At the same time, the performance gains of using LE are neglegible.

So given all of that I propose we change the canonical integer representation
to BE.

Copy link
Contributor

@gabriele-0201 gabriele-0201 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this changes is needed otherwise the chain would panic if two blobs with namespace 1 and 256 are submitted in this order because they are ordered from an integer perspective but as soon as those blobs are pushed in the nmt-tree it would see the bytes as BE and if the integer were converted to bytes using LE then the workspace are not in order anymore and thus the nmt-rs library would return Error, unwrapped by sugondat-nmt because from its view they were ordered

@pepyakin pepyakin force-pushed the pep-headsup-no-submit-key branch from a531099 to ecca8d6 Compare November 28, 2023 17:03
@pepyakin pepyakin force-pushed the pep-switch-ns-convention-to-be branch from 1d56a3b to f385b17 Compare November 28, 2023 17:03
pub fn with_namespace_id(namespace_id: u32) -> Self {
Self(namespace_id)
/// Convert the given big-endian 32-bit integer to a namespace ID.
pub fn from_be_u32(namespace_id: u32) -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why from_be_u32? the main point of the function is that the u32 is converted to a BE and not converted from BE (because the memory representation of the input argument is agnostic to the core of the function)

I propose something like from_u32_to_be or better to just from_u32 and treat the byes as BE internally in the library

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_u32_be / from_u32_le would be a good consistent API.

Alternatively, we can keep only from_raw_bytes and have the caller of the API decide whether to use big-endian or little-endian.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the memory representation of the input argument is agnostic to the core of the function

You are correct and I am really happy you get it. I spent quite some time explaining this to so-called lead developers.

That said, it's still possible to view the number as it's stored in the memory. See for example u32::to_le and ::from_le. They operate on the concept of target endianess and according to that we can view this input argument as in target endianess.

So what this function is, it is a convenience function that takes a value in the target endianess and extracts it's big endian representation. Semantically, it's the same operation as u32::to_be_bytes. However, to_be_bytes takes self: u32 and returns another type [u8; 32]. In our case here, we need to take u32 and return self: [u8; N]. That's how the name from_be_u32 came about!

Now,

or better to just from_u32

We are talking about a type that is canonically represented as [u8; N]. It seems to me that it's important for the caller to know which way BE, LE or even TE (which are all reasonable guesses) the u32 is going to be treated. That's why I think it warrants to rub in the face that it's be.

I would rather

keep only from_raw_bytes and have the caller of the API decide whether to use big-endian or little-endian.

as @rphmeier suggested, than having from_u32. Now, why I would avoid doing that is I would like to introduce a canonical way of interpreting a number as a namespace so that we don't encourage using it both ways. Introducing, from_le_u32 would kinda do that I feel. And, as it maybe clear from the OP, I think BE should be enough for everybody.

That all said, I am not married on this naming/semantics/logic. In fact, I don't think it matters too much and I wrote all the above just for the sake of getting on the same page, for education and hopefully for posterity. So let me know if you still want to change the naming and/or add the LE variant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation! Now I see you're point and it makes more sense, of course I also don't push really hard for a specific name, the meaning of the conversion arrives anyway!
If I have to vote for something I would vote for from_u32_be just because it seems (at least for me) more clear at first view about what it does

Base automatically changed from pep-headsup-no-submit-key to main November 29, 2023 16:13
@pepyakin pepyakin force-pushed the pep-switch-ns-convention-to-be branch from f385b17 to bb990d7 Compare November 29, 2023 16:21
The canonical representation of a namespace ID is [u8; N] (N=4 for now).
However, it seems to be useful at least for debugging purposes to
parse and display the namespace as an unsigned integer.

Prior this change, the integer form was defined as little-endian, just
because it's a default. Most of modern architectures are LE by default,
wasm is LE, etc. Mostly it doesn't matter, performance differences
are neglegible either way 99% of the time. I suppose the reason for
this is that LE has a marginal advantage because of the things like
that you don't need byte shuffle when reinterpreting value type (e.g.
u32 and u16) behind the same pointer and stuff like that.

However, big endian also has it's benefits. First is a human readable
representation, which is important here since this is more of a
developer convenience representation. Another, benefit is that
BE corresponds to the byte ordering. E.g., 256 > 255 (0x0100 > 0x00ff),
which doesn't hold in LE. And we do care about the ordering of
namespaces, at present because the NMT must literally be built in
order, but in future I also think it would be a useful property to
have blobs sorted.

At the same time, the performance gains of using LE are neglegible.

So given all of that I propose we change the canonical integer representation
to BE.
@pepyakin pepyakin force-pushed the pep-switch-ns-convention-to-be branch from bb990d7 to c5dccf5 Compare November 30, 2023 14:09
@pepyakin pepyakin merged commit 8c884c7 into main Nov 30, 2023
@pepyakin pepyakin deleted the pep-switch-ns-convention-to-be branch November 30, 2023 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants