Add endianness as one of the pointer properties #649

Closed
andrewrk opened this issue Dec 6, 2017 · 9 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@andrewrk
Member

andrewrk commented Dec 6, 2017

A pointer is a memory address with metadata.

Here's the metadata a pointer currently has:

  • type
  • const or mutable
  • volatile or no-side-effects for load/store
  • align(x) - guaranteed alignment of the address.
    if unspecified it is the ABI alignment of the type.
  • :a:b - indicates that the value is offset a bits from the address.
    I think we can remove b because it should always be @bitSizeOf(T).
    If :a:b is omitted, a is 0 and b is @bitSizeOf(T).

Here is metadata we plan on adding in accepted proposals:

This proposal is to add yet another piece of metadata to pointers, which is endianness.

  • &.Endian.Little u32
  • &.Endian.Big u32

A target has a native endianness. When pointer endianness is unspecified, it
means the native endianness. So on x86_64, &.Endian.Little u32 is the same
as &u32.

The value Endian here can be obtained from @import("builtin").Endian.
We may decide to automatically import builtin into the global namespace,
so it would become builtin.Endian.

Just like the type of a pointer, endianness can be a comptime value:

const E = if (some_comptime_value) Endian.Little else Endian.Big;

fn read(ptr: &.E u32) -> u32 {
    return *ptr;
}

  • A load from a foreign endian pointer performs byte swapping.
  • A store to a foreign endian pointer performs byte swapping.
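
As a rough illustration of those two rules, here is a minimal C sketch of what a load and a store through a big-endian u32 pointer would amount to. The helper names load_be_u32 and store_be_u32 are hypothetical and not part of the proposal; composing the value byte by byte gives the same result on any host, which is equivalent to a byte swap on a little-endian target and a plain load on a big-endian one.

```c
#include <stdint.h>

/* Hypothetical helper: load a u32 that is stored big-endian at p.
   Works regardless of the host's native byte order. */
static uint32_t load_be_u32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: store v at p in big-endian byte order. */
static void store_be_u32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```

Under the proposal, the compiler would emit the equivalent of these helpers whenever the pointer's endianness differs from the target's.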

These pointer concepts can be combined, and make sense together:

&.Endian.Big const volatile :2 u4

Here we have a memory address that

  • const - we should not write through
  • volatile - there are side effects from reading from
  • :2 - we must bit shift the loaded value
  • u4 - we must mask only 4 bits from the loaded value
  • .Endian.Big - bit shift and mask assuming the loaded value is big endian

So how does this work with packed structs? (See #307)

const BitField = packed(Endian.Big) struct {
    a: u32,
    b: u32,
    c: u4,
    d: u4,
    e: u4,
    f: u4,
};

Here, if you take the address of each field, you get respectively:

  • a - &.Endian.Big u32.
  • b - &.Endian.Big u32.
  • c - &.Endian.Big :0 u4.
  • d - &.Endian.Big :4 u4.
  • e - &.Endian.Big :0 u4.
  • f - &.Endian.Big :4 u4.

What happened here is that the sub-byte fields have a parent integer, which
zig automatically determines based on byte boundaries.

const BitField = packed(Endian.Big) struct {
    a: u32,
    b: u32,
    data: packed(Endian.Big, u16) struct {
        c: u4,
        d: u4,
        e: u4,
        f: u4,
    },
};

Now we have explicitly decided the parent integer.

  • data.c - &.Endian.Big :0 u4.
  • data.d - &.Endian.Big :4 u4.
  • data.e - &.Endian.Big :8 u4.
  • data.f - &.Endian.Big :12 u4.
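
A minimal C sketch of a load through one of these pointers, assuming the bit offset counts from the least significant bit of the u16 parent integer (the proposal does not say which end the offset counts from). The function name load_be_u16_nibble is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: read a 4-bit field from a 16-bit parent integer
   stored big-endian at p.
   Assumption: bit_offset (0, 4, 8 or 12) counts from the least
   significant bit of the parent integer. */
static uint8_t load_be_u16_nibble(const uint8_t *p, unsigned bit_offset) {
    /* Step 1: load the parent integer, honoring big-endian byte order. */
    uint16_t parent = (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
    /* Step 2: shift by the bit offset, then mask to 4 bits. */
    return (uint8_t)((parent >> bit_offset) & 0xFu);
}
```

With the bytes 0xAB 0xCD in memory, the parent integer is 0xABCD, so under this assumption offsets 0, 4, 8 and 12 select 0xD, 0xC, 0xB and 0xA respectively.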

andrewrk added the enhancement and proposal labels Dec 6, 2017
andrewrk added this to the 0.2.0 milestone Dec 6, 2017

@PavelVozenilek

Wouldn't it be easier to provide a handful of functions that convert basic datatypes back and forth? Perhaps also a function that does this for structs and arrays, with support from the compiler.

@andrewrk
Member Author

andrewrk commented Dec 6, 2017

How does that address bit fields?

@PavelVozenilek

AFAIK endianness matters only for 16-, 32- and 64-bit integer data (not sure about 128-bit integers and floats). Bytes and byte streams are the same everywhere. Unless Zig is doing something really, really strange, bitfields could be transferred over the wire "as is".

I am not sure about the "parent integer" concept; it feels like a bad idea. Since bitfields are what one really wants here, they should probably be treated as a bytestream.

andrewrk added the accepted label Dec 8, 2017

@kyle-github

It is common to think that there are only two ways of representing multibyte values, big endian and little endian. However, that is not the case. There are definitely others. I would request that this be rethought so that it is powerful enough to allow any arbitrary byte order. I deal with alternate byte orders in the embedded/automation field.

@PavelVozenilek, it is not that simple in a lot of areas. Floats are definitely subject to byte ordering. However, they are not always subject to it in the same way. I have heard of a common manufacturer of programmable logic controllers that uses a byte order of 1032 for 32-bit ints (two big-endian 16-bit words with the words themselves in little-endian order!) and 3210 for 32-bit floats (big-endian). The numbers are the byte offsets from the start of the value in memory.
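
To make the 1032 layout concrete, here is a minimal C sketch of a load with an arbitrary byte order. It assumes one particular reading of the notation: the digits are listed in memory order, and each digit gives the significance of the byte at that address (0 is the least significant byte). That reading is consistent with 3210 being big-endian. The function name load_u32_ordered is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load a u32 whose bytes are stored in an arbitrary
   order. Assumption: order[i] is the significance (0 = least significant
   byte) of the byte found at memory offset i, so {3,2,1,0} is big-endian
   and {0,1,2,3} is little-endian. */
static uint32_t load_u32_ordered(const uint8_t *p, const uint8_t order[4]) {
    uint32_t v = 0;
    for (size_t i = 0; i < 4; i++) {
        v |= (uint32_t)p[i] << (8u * order[i]);
    }
    return v;
}
```

Under this reading, the value 0x12345678 stored with order 1032 appears in memory as the bytes 0x56 0x78 0x12 0x34.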

I like the ability to add these onto pointer values (that makes more sense than what I was thinking a few months ago with the integers themselves having byte order).

const BitField = packed(byteOrder(u32, 3,2,1,0)) struct {
    a: u32,
    b: u32,
    data: packed(byteOrder(u16, 1,0)) struct {
        c: u4,
        d: u4,
        e: u4,
        f: u4,
    },
};

I have been playing with ideas for bit structs where all the fields are specified in bits.

const controlReg = bitstruct {
     a: bitField(u32, 24,25,26,27,28,29,30,31,16,17,18,19,20,21,22,23,8,9,10,11,12,13,14,15,0,1,2,3,4,5,6,7);
     b: bitField(u32, 56,57,58,59,60,61,62,63,48,49,50...);
     c: bitField(u4, 72,73,74,75);
     d: bitField(u4, 76,77,78,79);
     e: bitField(u4, 64,65,66,67);
     f: bitField(u4, 68,69,70,71);
};

I am not very happy about the syntax. What I am trying to achieve is the ability to state exactly which bits belong to each field. In the above example, you can see that the a field is a big-endian 32-bit field. Bits in a bitstruct are considered to be like a bit array. They start with offset 0 and increase. Each field is thus given a discrete set of the bits in the bitstruct (there are some nice tricks you could do with fields that overlap and that are not that uncommon in hardware). Note that the c, d, e and f fields are ordered as they would be in a bit stream, so their bits look out of order.
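
A minimal C sketch of how a load from such a field could work, under two assumptions that the comment above does not spell out: bit k of the bit array is bit (k % 8) of byte (k / 8), and the i-th listed index supplies bit i of the field value. The function name gather_bits is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build a field value from an explicit list of bit
   positions in a byte array.
   Assumptions: array bit k is bit (k % 8) of bytes[k / 8], counting from
   the least significant bit, and bit_index[i] supplies bit i of the
   result. At most 32 bits are gathered. */
static uint32_t gather_bits(const uint8_t *bytes, const uint8_t *bit_index,
                            size_t nbits) {
    uint32_t v = 0;
    for (size_t i = 0; i < nbits && i < 32; i++) {
        unsigned k = bit_index[i];
        uint32_t bit = (uint32_t)(bytes[k / 8] >> (k % 8)) & 1u;
        v |= bit << i;
    }
    return v;
}
```

With these assumptions, the index list given for field a turns the memory bytes 0x12 0x34 0x56 0x78 into 0x12345678, which matches the description of a as a big-endian 32-bit field.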

@kristate
Contributor

kristate commented Sep 1, 2018

Excited about this.

@ghost

ghost commented Sep 1, 2018

https://www.reddit.com/r/C_Programming/comments/9bv2tx/what_do_c_programmers_think_of_the_zig_language/e56qf06/

What I'd like to see is something like i32 (int_fast32_t), i32le (int32_t which makes it clear it's only for serializing to/from little-endian), i32be (similarly). But that's just a pet peeve of mine....

andrewrk removed the enhancement label Nov 21, 2018
andrewrk modified the milestones: 0.4.0, 0.5.0 Nov 21, 2018
andrewrk removed the accepted label Apr 6, 2019

@shawnl
Contributor

shawnl commented May 18, 2019

To me this feels like hidden control flow, and if you are going to add hidden control flow, like with bit-fields, then why only provide for a few use-cases, as @kyle-github pointed out? Why not just allow overloading the assignment/load operators with pure functions? This way you can do:

  • Bit-fields
  • Circular power-of-two-sized buffers
  • mixed-endian data
  • mixed-endian on a bit-wise basis
  • and even regular expression transforms and lexing.

...Or just not using the load/assignment operators for these use-cases. Bit-fields certainly don't need those operators, as the algorithms work better if you copy out/in of the bit-field before/after. The only one here that does need them is circular power-of-two-sized buffers.

@tgschultz
Contributor

This issue was first posed back in 2017. Now that we have In/OutBitStream, and to a lesser extent PackedIntArray/Slice, I'm not sure the reasonable use cases for this feature aren't covered already without a new language feature.

Can anyone speak to use cases I may not have considered?

@squeek502
Collaborator

squeek502 commented Aug 16, 2023

One potential use case for this that I've been running into lately would be UTF-16. Being able to have a []endian(.Little) u16 slice that (1) handles littleToNative/nativeToLittle conversions for you, and (2) allows the endianness of some UTF-16 data to be retained & enforced by the slice itself seems like it'd be quite useful.
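
For comparison, this is roughly what has to be written by hand without such a slice type, as a minimal C sketch. The function names load_le_u16 and utf16le_to_native are hypothetical; they stand in for the littleToNative conversion mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: load one little-endian u16 from p, independent of
   the host's native byte order. */
static uint16_t load_le_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Hypothetical helper: convert n UTF-16LE code units at src into
   native-endian code units in dst. */
static void utf16le_to_native(const uint8_t *src, uint16_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = load_le_u16(src + 2 * i);
    }
}
```

An endian-qualified slice would move this conversion into every element load, and the type would record that the data is little-endian, so the conversion could not be forgotten or applied twice.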


7 participants