Add endianness as one of the pointer properties #649

andrewrk · 2017-12-06T15:13:12Z

A pointer is a memory address with metadata.

Here's the metadata a pointer currently has:

type
const or mutable
volatile or no-side-effects for load/store
align(x) - guaranteed alignment of the address.
if unspecified it is the ABI alignment of the type.
:a:b - indicates that the value is offset by a bits from the address.
I think we can remove b because it should always be @bitSizeOf(T)
If :a:b is omitted, a is 0 and b is @bitSizeOf(T).

Here is metadata we plan on adding in accepted proposals:

null or 0 to indicate that the pointer is null or 0 terminated (see proposal: type for null terminated pointer #265)

This proposal is to add yet another piece of metadata to pointers, which is endianness.

&.Endian.Little u32
&.Endian.Big u32

A target has a native endianness. When pointer endianness is unspecified, it
means the native endianness. So on x86_64, &.Endian.Little u32 is the same
as &u32.

The value Endian here can be obtained from @import("builtin").Endian.
We may decide to automatically import builtin into the global namespace,
so it would become builtin.Endian.

Just like the type of a pointer, endianness can be a comptime value:

const E = if (some_comptime_value) Endian.Little else Endian.Big;

fn read(ptr: &.E u32) -> u32 {
    return *ptr;
}

A load from a foreign endian pointer performs byte swapping.
A store to a foreign endian pointer performs byte swapping.
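As a rough illustration of what such a load or store would have to compute, here is a minimal sketch in present-day Zig (assuming the builtin syntax of Zig 0.11 or later). The helper names loadBig32 and storeBig32 are hypothetical and not part of the proposal. They build the value from individual bytes, so the result is the same on any host; on a little-endian target this is equivalent to a native access combined with @byteSwap.

```zig
const std = @import("std");

// Hypothetical helper: what a load through a big-endian u32 pointer
// would have to produce, independent of the host's byte order.
fn loadBig32(bytes: *const [4]u8) u32 {
    return (@as(u32, bytes[0]) << 24) |
        (@as(u32, bytes[1]) << 16) |
        (@as(u32, bytes[2]) << 8) |
        @as(u32, bytes[3]);
}

// Hypothetical helper: the matching store, most significant byte first.
fn storeBig32(bytes: *[4]u8, value: u32) void {
    bytes[0] = @truncate(value >> 24);
    bytes[1] = @truncate(value >> 16);
    bytes[2] = @truncate(value >> 8);
    bytes[3] = @truncate(value);
}

test "big-endian load and store round trip" {
    var buf = [4]u8{ 0x12, 0x34, 0x56, 0x78 };
    try std.testing.expectEqual(@as(u32, 0x12345678), loadBig32(&buf));

    storeBig32(&buf, 0xAABBCCDD);
    try std.testing.expectEqual(@as(u8, 0xAA), buf[0]);
    try std.testing.expectEqual(@as(u32, 0xAABBCCDD), loadBig32(&buf));
}
```

With pointer endianness in the type, the compiler would presumably insert this kind of code at every dereference instead of requiring explicit helper calls.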

These pointer concepts can be combined, and make sense together:

&.Endian.Big const volatile :2 u4

Here we have a memory address with these properties:

const - we should not write through it
volatile - reading from it has side effects
:2 - we must bit shift the loaded value
u4 - we must keep only 4 bits of the loaded value
.Endian.Big - the bit shift and mask assume the loaded value is big endian

So how does this work with packed structs? (See #307)

const BitField = packed(Endian.Big) struct {
    a: u32,
    b: u32,
    c: u4,
    d: u4,
    e: u4,
    f: u4,
};

Here, if you take the address of each field, you get respectively:

a - &.Endian.Big u32.
b - &.Endian.Big u32.
c - &.Endian.Big :0 u4.
d - &.Endian.Big :4 u4.
e - &.Endian.Big :0 u4.
f - &.Endian.Big :4 u4.

What happened here is that the sub-byte fields have a parent integer, which
zig automatically determines based on byte boundaries.

const BitField = packed(Endian.Big) struct {
    a: u32,
    b: u32,
    data: packed(Endian.Big, u16) struct {
        c: u4,
        d: u4,
        e: u4,
        f: u4,
    },
};

Now we have explicitly decided the parent integer.

data.c - &.Endian.Big :0 u4.
data.d - &.Endian.Big :4 u4.
data.e - &.Endian.Big :8 u4.
data.f - &.Endian.Big :12 u4.
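To make the shift-and-mask step concrete, here is a small sketch in present-day Zig (assuming Zig 0.11 or later) of reading one of these u4 fields out of a big-endian u16 parent integer. The helper name loadNibble is hypothetical, and the sketch assumes that bit offsets count up from the least significant bit of the parent integer, which the text above does not pin down.

```zig
const std = @import("std");

// Hypothetical helper, not part of the proposal.
// Assumption: bit_offset counts from the least significant bit of the
// big-endian u16 parent integer.
fn loadNibble(bytes: *const [2]u8, bit_offset: u4) u4 {
    // Reassemble the parent integer: first byte in memory is most significant.
    const parent: u16 = (@as(u16, bytes[0]) << 8) | @as(u16, bytes[1]);
    // Shift the field down, then keep only its low 4 bits.
    return @truncate(parent >> bit_offset);
}

test "fields of a big-endian u16 parent" {
    const raw = [2]u8{ 0xAB, 0xCD }; // parent integer is 0xABCD
    try std.testing.expectEqual(@as(u4, 0xD), loadNibble(&raw, 0)); // data.c
    try std.testing.expectEqual(@as(u4, 0xC), loadNibble(&raw, 4)); // data.d
    try std.testing.expectEqual(@as(u4, 0xB), loadNibble(&raw, 8)); // data.e
    try std.testing.expectEqual(@as(u4, 0xA), loadNibble(&raw, 12)); // data.f
}
```

Under the opposite numbering assumption (offsets counted from the most significant bit), the same fields would come out in reverse order, which is one reason the parent integer and its endianness need to be part of the pointer type.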


PavelVozenilek · 2017-12-06T20:00:43Z

Wouldn't it be easier to provide a handful of functions converting basic datatypes back and forth? Perhaps also a function that does it for structs and arrays, with support from the compiler.

andrewrk · 2017-12-06T20:05:45Z

How does that address bit fields?

PavelVozenilek · 2017-12-06T23:55:41Z

AFAIK endianness only matters for 16, 32 and 64 bit sized integer data (not sure about 128 bit and floats). Bytes and byte streams are the same everywhere. Unless Zig is doing something really, really strange, bitfields could be transferred over the wire "as is".

I am not sure about the "parent integer" concept; it feels like a bad idea. Since bitfields are what one really wants here, it should probably be treated as a bytestream.

kyle-github · 2017-12-10T19:50:27Z

It is common to think that there are only two ways of representing multibyte values, big endian and little endian. However, that is not the case. There are definitely others. I would request that this be rethought to make it more powerful and allow any arbitrary byte order. I deal with alternate byte orders in the embedded/automation field.

@PavelVozenilek, it is not that simple in a lot of areas. Floats are definitely subject to byte ordering. However, they are not always subject to it in the same way. I have heard of programmable logic controllers (from a common manufacturer) that use a byte order of 1032 for 32-bit ints (two big-endian 16-bit words with the words themselves in little-endian order!) and 3210 for 32-bit floats (big-endian). The numbers are the byte offsets from the start of the value in memory.
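A minimal sketch of decoding such a value in present-day Zig (assuming Zig 0.11 or later) might look like the following. The helper loadWithOrder and its order parameter are hypothetical; here order[i] is taken to be the significance (0 = least significant) of the byte stored at memory offset i, so 3210 is plain big-endian and 1032 is the mixed layout described above.

```zig
const std = @import("std");

// Hypothetical helper. order[i] is the significance (0 = least significant
// byte) of the byte found at memory offset i.
fn loadWithOrder(bytes: *const [4]u8, order: [4]u2) u32 {
    var result: u32 = 0;
    var i: usize = 0;
    while (i < 4) : (i += 1) {
        // Move each byte to the position its significance calls for.
        const shift: u5 = @as(u5, order[i]) * 8;
        result |= @as(u32, bytes[i]) << shift;
    }
    return result;
}

test "arbitrary byte orders" {
    const raw = [4]u8{ 0x12, 0x34, 0x56, 0x78 };
    // 3210: big-endian.
    try std.testing.expectEqual(@as(u32, 0x12345678), loadWithOrder(&raw, [4]u2{ 3, 2, 1, 0 }));
    // 0123: little-endian.
    try std.testing.expectEqual(@as(u32, 0x78563412), loadWithOrder(&raw, [4]u2{ 0, 1, 2, 3 }));
    // 1032: big-endian 16-bit words, low word first.
    try std.testing.expectEqual(@as(u32, 0x56781234), loadWithOrder(&raw, [4]u2{ 1, 0, 3, 2 }));
}
```

Because 1032 and 3210 are each their own inverse as permutations, reading the digits as "offset of each byte" or as "significance at each offset" gives the same result for these two orders.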

I like the ability to add these onto pointer values (that makes more sense than what I was thinking a few months ago with the integers themselves having byte order).

const BitField = packed(byteOrder(u32, 3,2,1,0)) struct {
    a: u32,
    b: u32,
    data: packed(byteOrder(u16, 1,0)) struct {
        c: u4,
        d: u4,
        e: u4,
        f: u4,
    },
};

I have been playing with ideas for bit structs where all the fields are specified in bits.

const controlReg = bitstruct {
     a: bitField(u32, 24,25,26,27,28,29,30,31,16,17,18,19,20,21,22,23,8,9,10,11,12,13,14,15,0,1,2,3,4,5,6,7);
     b: bitField(u32, 56,57,58,59,60,61,62,63,48,49,50...);
     c: bitField(u4, 72,73,74,75);
     d: bitField(u4, 76,77,78,79);
     e: bitField(u4, 64,65,66,67);
     f: bitField(u4, 68,69,70,71);
};

I am not very happy about the syntax. What I am trying to achieve is the ability to state exactly which bits belong to each field. In the above example, you can see that the a field is a big-endian 32-bit field. Bits in a bitstruct are considered to be like a bit array. They start with offset 0 and increase. Each field is thus given a discrete set of the bits in the bitstruct (there are some nice tricks you could do with fields that overlap and that are not that uncommon in hardware). Note that the c, d, e and f fields are ordered as they would be in a bit stream, so their bits look out of order.

kristate · 2018-09-01T14:53:05Z

Excited about this.

ghost · 2018-09-01T15:02:59Z

https://www.reddit.com/r/C_Programming/comments/9bv2tx/what_do_c_programmers_think_of_the_zig_language/e56qf06/

What I'd like to see is something like i32 (int_fast32_t), i32le (int32_t which makes it clear it's only for serializing to/from little-endian), i32be (similarly). But that's just a pet peeve of mine....

shawnl · 2019-05-18T02:44:00Z

To me this feels like hidden control flow, and if you are going to add hidden control flow, like with bit-fields, then why only provide for a few use-cases, as @kyle-github pointed out? Why not just allow overloading the assignment/load operators with pure functions? This way you can do:

Bit-fields
Circular power-of-two-sized buffers
mixed-endian data
mixed-endian on a bit-wise basis
and even
regular expression transforms and lexing.

...Or just not using the load/assignment operators for these use-cases. Bit-fields certainly don't need those operators, as the algorithms work better if you copy out/in of the bit-field before/after. The only one here that does need them is circular power-of-two-sized buffers.

tgschultz · 2019-05-19T02:50:40Z

This issue was first posed back in 2017. Now that we have In/OutBitStream, and to a lesser extent PackedIntArray/Slice, I'm not sure the reasonable use cases for this feature aren't covered already without a new language feature.

Can anyone speak to use cases I may not have considered?

squeek502 · 2023-08-16T09:05:31Z

One potential use case for this that I've been running into lately would be UTF-16. Being able to have a []endian(.Little) u16 slice that (1) handles littleToNative/nativeToLittle conversions for you, and (2) allows the endianness of some UTF-16 data to be retained & enforced by the slice itself seems like it'd be quite useful.
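For comparison, here is a minimal library-level sketch in present-day Zig (assuming Zig 0.11 or later) of what such a slice would do on each element access. The Utf16Le type and its methods are hypothetical names, not an existing std API; the endianness is enforced by the wrapper type instead of by the pointer type.

```zig
const std = @import("std");

// Hypothetical wrapper: a view over raw bytes that always decodes
// code units as little-endian, whatever the host byte order is.
const Utf16Le = struct {
    bytes: []const u8,

    // Number of complete 16-bit code units in the view.
    fn len(self: Utf16Le) usize {
        return self.bytes.len / 2;
    }

    // Decode the code unit at `index`, low byte first.
    fn get(self: Utf16Le, index: usize) u16 {
        const lo = self.bytes[index * 2];
        const hi = self.bytes[index * 2 + 1];
        return @as(u16, lo) | (@as(u16, hi) << 8);
    }
};

test "reading UTF-16LE code units" {
    // "h", "i", and U+20AC (euro sign) encoded as UTF-16LE.
    const view = Utf16Le{ .bytes = &[_]u8{ 0x68, 0x00, 0x69, 0x00, 0xAC, 0x20 } };
    try std.testing.expectEqual(@as(usize, 3), view.len());
    try std.testing.expectEqual(@as(u16, 0x0068), view.get(0));
    try std.testing.expectEqual(@as(u16, 0x0069), view.get(1));
    try std.testing.expectEqual(@as(u16, 0x20AC), view.get(2));
}
```

The trade-off compared to the proposed []endian(.Little) u16 is that a wrapper like this cannot be passed where a plain []u16 is expected, and every access goes through a method call.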

andrewrk added the enhancement and proposal labels Dec 6, 2017

andrewrk added this to the 0.2.0 milestone Dec 6, 2017

andrewrk added the accepted label Dec 8, 2017

andrewrk mentioned this issue Dec 19, 2017

use a builtin enum for calling conventions instead of keywords #661

Closed

andrewrk referenced this issue in Hejsil/pokemon-randomizer Jan 9, 2018

@andrewrk segmentation fault on build :)

57011e1

andrewrk mentioned this issue Jan 11, 2018

Fix endian swap parameters #682

Merged

skyfex mentioned this issue Jan 11, 2018

Shortcut (type inferrence) for naming enum values #683

Closed

andrewrk mentioned this issue Feb 19, 2018

add support for stack traces on macosx #780

Merged

Hejsil mentioned this issue Feb 27, 2018

Enum Arrays #793

Open

andrewrk modified the milestones: 0.2.0, 0.3.0 Feb 28, 2018

andrewrk modified the milestones: 0.3.0, 0.4.0 Jul 18, 2018

andrewrk removed the enhancement label Nov 21, 2018

andrewrk modified the milestones: 0.4.0, 0.5.0 Nov 21, 2018

andrewrk mentioned this issue Dec 17, 2018

Memory Mapped IO for 32-bit ARM #1834

Closed

Hejsil mentioned this issue Feb 6, 2019

More control over the data layout of tagged unions #1922

Open

andrewrk removed the accepted label Apr 6, 2019

andrewrk closed this as completed Jul 5, 2019

ghost mentioned this issue Feb 16, 2022

make packed struct always use a single backing integer, inferring it if not explicitly provided #10113

Closed

squeek502 mentioned this issue Oct 18, 2023

Windows: Deal with NT namespaced paths in GetFinalPathNameByHandle #17541

Open

squeek502 mentioned this issue Feb 3, 2024

Proposal: Represent integer endianness in the type system #18796

Closed
