nmt: switch NS convention to big-endian #67

pepyakin · 2023-11-28T12:29:34Z

The canonical representation of a namespace ID is [u8; N] (N=4 for now).
However, it seems to be useful at least for debugging purposes to
parse and display the namespace as an unsigned integer.

Prior this change, the integer form was defined as little-endian, just
because it's a default. Most of modern architectures are LE by default,
wasm is LE, etc. Mostly it doesn't matter, performance differences
are neglegible either way 99% of the time. I suppose the reason for
this is that LE has a marginal advantage because of the things like
that you don't need byte shuffle when reinterpreting value type (e.g.
u32 and u16) behind the same pointer and stuff like that.

However, big endian also has it's benefits. First is a human readable
representation, which is important here since this is more of a
developer convenience representation. Another, benefit is that
BE corresponds to the byte ordering. E.g., 256 > 255 (0x0100 > 0x00ff),
which doesn't hold in LE. And we do care about the ordering of
namespaces, at present because the NMT must literally be built in
order, but in future I also think it would be a useful property to
have blobs sorted.

At the same time, the performance gains of using LE are neglegible.

So given all of that I propose we change the canonical integer representation
to BE.

pepyakin · 2023-11-28T12:29:52Z

Current dependencies on/for this PR:

main
- PR Fix sov demo dependencies #64
  - PR shim,nmt: expunge unwrap #59
    - PR shim: fix up the error message when no submit key #60
      - PR shim: ensure connected to sugondat-node #61
        
        PR shim: fix submit key loading #65
        
        PR shim: heads up about lacking submit key #66
        
        PR nmt: switch NS convention to big-endian #67 👈

This stack of pull requests is managed by Graphite.

gabriele-0201

Also, this changes is needed otherwise the chain would panic if two blobs with namespace 1 and 256 are submitted in this order because they are ordered from an integer perspective but as soon as those blobs are pushed in the nmt-tree it would see the bytes as BE and if the integer were converted to bytes using LE then the workspace are not in order anymore and thus the nmt-rs library would return Error, unwrapped by sugondat-nmt because from its view they were ordered

gabriele-0201 · 2023-11-28T17:08:47Z

sugondat-nmt/src/ns.rs

-    pub fn with_namespace_id(namespace_id: u32) -> Self {
-        Self(namespace_id)
+    /// Convert the given big-endian 32-bit integer to a namespace ID.
+    pub fn from_be_u32(namespace_id: u32) -> Self {


why from_be_u32? the main point of the function is that the u32 is converted to a BE and not converted from BE (because the memory representation of the input argument is agnostic to the core of the function)

I propose something like from_u32_to_be or better to just from_u32 and treat the byes as BE internally in the library

from_u32_be / from_u32_le would be a good consistent API.

Alternatively, we can keep only from_raw_bytes and have the caller of the API decide whether to use big-endian or little-endian.

because the memory representation of the input argument is agnostic to the core of the function

You are correct and I am really happy you get it. I spent quite some time explaining this to so-called lead developers.

That said, it's still possible to view the number as it's stored in the memory. See for example u32::to_le and ::from_le. They operate on the concept of target endianess and according to that we can view this input argument as in target endianess.

So what this function is, it is a convenience function that takes a value in the target endianess and extracts it's big endian representation. Semantically, it's the same operation as u32::to_be_bytes. However, to_be_bytes takes self: u32 and returns another type [u8; 32]. In our case here, we need to take u32 and return self: [u8; N]. That's how the name from_be_u32 came about!

Now,

or better to just from_u32

We are talking about a type that is canonically represented as [u8; N]. It seems to me that it's important for the caller to know which way BE, LE or even TE (which are all reasonable guesses) the u32 is going to be treated. That's why I think it warrants to rub in the face that it's be.

I would rather

keep only from_raw_bytes and have the caller of the API decide whether to use big-endian or little-endian.

as @rphmeier suggested, than having from_u32. Now, why I would avoid doing that is I would like to introduce a canonical way of interpreting a number as a namespace so that we don't encourage using it both ways. Introducing, from_le_u32 would kinda do that I feel. And, as it maybe clear from the OP, I think BE should be enough for everybody.

That all said, I am not married on this naming/semantics/logic. In fact, I don't think it matters too much and I wrote all the above just for the sake of getting on the same page, for education and hopefully for posterity. So let me know if you still want to change the naming and/or add the LE variant.

Thanks for the explanation! Now I see you're point and it makes more sense, of course I also don't push really hard for a specific name, the meaning of the conversion arrives anyway!
If I have to vote for something I would vote for from_u32_be just because it seems (at least for me) more clear at first view about what it does

The canonical representation of a namespace ID is [u8; N] (N=4 for now). However, it seems to be useful at least for debugging purposes to parse and display the namespace as an unsigned integer. Prior this change, the integer form was defined as little-endian, just because it's a default. Most of modern architectures are LE by default, wasm is LE, etc. Mostly it doesn't matter, performance differences are neglegible either way 99% of the time. I suppose the reason for this is that LE has a marginal advantage because of the things like that you don't need byte shuffle when reinterpreting value type (e.g. u32 and u16) behind the same pointer and stuff like that. However, big endian also has it's benefits. First is a human readable representation, which is important here since this is more of a developer convenience representation. Another, benefit is that BE corresponds to the byte ordering. E.g., 256 > 255 (0x0100 > 0x00ff), which doesn't hold in LE. And we do care about the ordering of namespaces, at present because the NMT must literally be built in order, but in future I also think it would be a useful property to have blobs sorted. At the same time, the performance gains of using LE are neglegible. So given all of that I propose we change the canonical integer representation to BE.

This was referenced Nov 28, 2023

Fix sov demo dependencies #64

Merged

shim,nmt: expunge unwrap #59

Merged

shim: fix up the error message when no submit key #60

Merged

shim: ensure connected to sugondat-node #61

Merged

shim: fix submit key loading #65

Merged

pepyakin mentioned this pull request Nov 28, 2023

shim: heads up about lacking submit key #66

Merged

pepyakin requested review from gabriele-0201 and rphmeier November 28, 2023 16:59

gabriele-0201 reviewed Nov 28, 2023

View reviewed changes

pepyakin force-pushed the pep-headsup-no-submit-key branch from a531099 to ecca8d6 Compare November 28, 2023 17:03

pepyakin force-pushed the pep-switch-ns-convention-to-be branch from 1d56a3b to f385b17 Compare November 28, 2023 17:03

gabriele-0201 reviewed Nov 28, 2023

View reviewed changes

Base automatically changed from pep-headsup-no-submit-key to main November 29, 2023 16:13

pepyakin force-pushed the pep-switch-ns-convention-to-be branch from f385b17 to bb990d7 Compare November 29, 2023 16:21

pepyakin force-pushed the pep-switch-ns-convention-to-be branch from bb990d7 to c5dccf5 Compare November 30, 2023 14:09

pepyakin merged commit 8c884c7 into main Nov 30, 2023

pepyakin deleted the pep-switch-ns-convention-to-be branch November 30, 2023 14:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

nmt: switch NS convention to big-endian #67

nmt: switch NS convention to big-endian #67

Uh oh!

pepyakin commented Nov 28, 2023

Uh oh!

pepyakin commented Nov 28, 2023

Uh oh!

gabriele-0201 left a comment

Uh oh!

gabriele-0201 Nov 28, 2023

Uh oh!

rphmeier Nov 29, 2023

Uh oh!

pepyakin Nov 29, 2023

Uh oh!

gabriele-0201 Nov 29, 2023

Uh oh!

Uh oh!

nmt: switch NS convention to big-endian #67

nmt: switch NS convention to big-endian #67

Uh oh!

Conversation

pepyakin commented Nov 28, 2023

Uh oh!

pepyakin commented Nov 28, 2023

Uh oh!

gabriele-0201 left a comment

Choose a reason for hiding this comment

Uh oh!

gabriele-0201 Nov 28, 2023

Choose a reason for hiding this comment

Uh oh!

rphmeier Nov 29, 2023

Choose a reason for hiding this comment

Uh oh!

pepyakin Nov 29, 2023

Choose a reason for hiding this comment

Uh oh!

gabriele-0201 Nov 29, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!