New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wrong order of bits in bit fields #307

Closed
AndreaOrru opened this Issue Apr 5, 2017 · 18 comments

Comments

Projects
None yet
4 participants
@AndreaOrru
Member

AndreaOrru commented Apr 5, 2017

const VGAEntry = packed struct {
    c:  u8,
    fg: u4,
    bg: u4,
};

Expected bit layout as implemented by C compilers (c = char bits, b = bg bits, f = fg bits):
cccccccc bbbbffff

Zig produces this instead:
cccccccc ffffbbbb

Is this expected behavior?

@thejoshwolfe

This comment has been minimized.

Show comment
Hide comment
@thejoshwolfe

thejoshwolfe Apr 5, 2017

Member

here's some relevant discussion: http://stackoverflow.com/questions/19376426/order-of-fields-when-using-a-bit-field-in-c

i didn't cross check the claims there, but i expect they're right that the C spec doesn't specify the order at all. so "as implemented by C compilers" means "the precedent established by popular C compilers", not "as defined by the C spec".

In practice though, that might be irrelevant. If code is already written in C, it might be nice to do simple syntactic updates to translate it to Zig rather than carefully moving bitfields around.

This brings up a serious question: what happens when two different drivers require two different implementations of how to layout sub-byte bits? for example, a {u12,u20} has one byte that's shared between the two fields. does the u12 get the top half or the bottom half of that byte? You can't just swap the order of the fields to resolve this, because the other bytes in the fields would move too. Are all drivers designed to expect the "precedent established by popular C compilers", or do some of them require the opposite decision?

This issue looks exactly analogous to endianness, except at the bit level. are bits specified from lowest to highest or highest to lowest? I would have expected highest to lowest, like big endian, because that's how humans think about it (as shown in your diagram). Does the behavior of popular C compilers actually depend on byte endianness, or is ot always "little endian" for bit order?

Proposal: allow specifying the endianness of a packed struct, and allow specifying the byte order and bit order separately.

Member

thejoshwolfe commented Apr 5, 2017

here's some relevant discussion: http://stackoverflow.com/questions/19376426/order-of-fields-when-using-a-bit-field-in-c

i didn't cross check the claims there, but i expect they're right that the C spec doesn't specify the order at all. so "as implemented by C compilers" means "the precedent established by popular C compilers", not "as defined by the C spec".

In practice though, that might be irrelevant. If code is already written in C, it might be nice to do simple syntactic updates to translate it to Zig rather than carefully moving bitfields around.

This brings up a serious question: what happens when two different drivers require two different implementations of how to layout sub-byte bits? for example, a {u12,u20} has one byte that's shared between the two fields. does the u12 get the top half or the bottom half of that byte? You can't just swap the order of the fields to resolve this, because the other bytes in the fields would move too. Are all drivers designed to expect the "precedent established by popular C compilers", or do some of them require the opposite decision?

This issue looks exactly analogous to endianness, except at the bit level. are bits specified from lowest to highest or highest to lowest? I would have expected highest to lowest, like big endian, because that's how humans think about it (as shown in your diagram). Does the behavior of popular C compilers actually depend on byte endianness, or is ot always "little endian" for bit order?

Proposal: allow specifying the endianness of a packed struct, and allow specifying the byte order and bit order separately.

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Apr 5, 2017

Member

Is this expected behavior?

Yes. The idea is that you are specifying the exact order of fields in the struct. The order you define in the struct is c, fg, bg, and that matches the bits.

A couple other things to note:

  • If a field greater than 8 bits is byte-aligned, it is represented in memory with the endianness of the host.
  • If the field is not byte-aligned, it is represented in memory in big-endian.

for example, a {u12,u20} has one byte that's shared between the two fields.

Are you talking about a union?

Member

andrewrk commented Apr 5, 2017

Is this expected behavior?

Yes. The idea is that you are specifying the exact order of fields in the struct. The order you define in the struct is c, fg, bg, and that matches the bits.

A couple other things to note:

  • If a field greater than 8 bits is byte-aligned, it is represented in memory with the endianness of the host.
  • If the field is not byte-aligned, it is represented in memory in big-endian.

for example, a {u12,u20} has one byte that's shared between the two fields.

Are you talking about a union?

@AndreaOrru

This comment has been minimized.

Show comment
Hide comment
@AndreaOrru

AndreaOrru Apr 5, 2017

Member

As long as this is expected and consistent, the current behavior is OK for me.

+1 for customizable endianness at bit and byte level though.

Member

AndreaOrru commented Apr 5, 2017

As long as this is expected and consistent, the current behavior is OK for me.

+1 for customizable endianness at bit and byte level though.

@andrewrk andrewrk changed the title from Potentially wrong order of nibbles in packed structs to proposal: ability to set endianness at byte and bit level of bitfields Apr 6, 2017

@andrewrk andrewrk added the enhancement label Apr 6, 2017

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Apr 11, 2017

Member

Turns out in addition to byte-level endianness systems also have bit-level endianness. Current implementation of bit fields assumes big endian bits. Instead, we should have the same bit endianness as the byte endianness of the host. Then byte-alignment no longer makes a difference, and everything is as expected.

Member

andrewrk commented Apr 11, 2017

Turns out in addition to byte-level endianness systems also have bit-level endianness. Current implementation of bit fields assumes big endian bits. Instead, we should have the same bit endianness as the byte endianness of the host. Then byte-alignment no longer makes a difference, and everything is as expected.

@andrewrk andrewrk changed the title from proposal: ability to set endianness at byte and bit level of bitfields to wrong order of bits in bit fields Apr 11, 2017

@andrewrk andrewrk added bug and removed enhancement labels Apr 11, 2017

@andrewrk andrewrk added this to the 0.1.0 milestone Apr 11, 2017

@AndreaOrru

This comment has been minimized.

Show comment
Hide comment
@AndreaOrru

AndreaOrru Apr 13, 2017

Member

Here is a test case for x86(-64):

const io = @import("std").io;

const u4 = @intType(false, 4);

const Bitfields = packed struct {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
};

pub fn main() -> %void {
    var all: u64 = 0x7765443322221111;
    var bitfields = *@ptrcast(&Bitfields, &all);

    %%io.stdout.printf("f1: {X}\n", u64(bitfields.f1));
    %%io.stdout.printf("f2: {X}\n", u64(bitfields.f2));
    %%io.stdout.printf("f3: {X}\n", u64(bitfields.f3));
    %%io.stdout.printf("f4: {X}\n", u64(bitfields.f4));
    %%io.stdout.printf("f5: {X}\n", u64(bitfields.f5));
    %%io.stdout.printf("f6: {X}\n", u64(bitfields.f6));
    %%io.stdout.printf("f7: {X}\n", u64(bitfields.f7));
}

Output:

f1: 1111
f2: 2222
f3: 33
f4: 44
f5: 6     // Wrong
f6: 5     // Wrong
f7: 77
Member

AndreaOrru commented Apr 13, 2017

Here is a test case for x86(-64):

const io = @import("std").io;

const u4 = @intType(false, 4);

const Bitfields = packed struct {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
};

pub fn main() -> %void {
    var all: u64 = 0x7765443322221111;
    var bitfields = *@ptrcast(&Bitfields, &all);

    %%io.stdout.printf("f1: {X}\n", u64(bitfields.f1));
    %%io.stdout.printf("f2: {X}\n", u64(bitfields.f2));
    %%io.stdout.printf("f3: {X}\n", u64(bitfields.f3));
    %%io.stdout.printf("f4: {X}\n", u64(bitfields.f4));
    %%io.stdout.printf("f5: {X}\n", u64(bitfields.f5));
    %%io.stdout.printf("f6: {X}\n", u64(bitfields.f6));
    %%io.stdout.printf("f7: {X}\n", u64(bitfields.f7));
}

Output:

f1: 1111
f2: 2222
f3: 33
f4: 44
f5: 6     // Wrong
f6: 5     // Wrong
f7: 77
@thejoshwolfe

This comment has been minimized.

Show comment
Hide comment
@thejoshwolfe

thejoshwolfe Apr 13, 2017

Member

we should have the same bit endianness as the byte endianness of the host.

wikipedia disagrees. it looks any of the 4 combinations are possible, except maybe little byte big bit. I think we need two separate endiannesses.

the proposal for configuring the endianness(es) of a packed struct can be a separate issue. we still need to get the default native behavior correct regardless.

Member

thejoshwolfe commented Apr 13, 2017

we should have the same bit endianness as the byte endianness of the host.

wikipedia disagrees. it looks any of the 4 combinations are possible, except maybe little byte big bit. I think we need two separate endiannesses.

the proposal for configuring the endianness(es) of a packed struct can be a separate issue. we still need to get the default native behavior correct regardless.

@andrewrk andrewrk added the breaking label May 2, 2017

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk May 18, 2017

Member

The whole point of bit fields is to have a well defined in-memory layout. So we're going to have a way to specify byte endianness as well as bit-endianness. Here's the proposed syntax:

const Endian = @import("builtin").Endian;
const Bitfields = packed(Endian.Little, Endian.Big) struct {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
}

So the packed keyword takes 2 arguments which is the byte endianness and the bit endianness, respectively.

You can leave off the bit endianness if it doesn't matter, e.g. if you don't have any sub-8-bit fields. You get a compile error if bit endianness is unspecified and it does matter.

You can leave off both the byte endianness and the bit endianness if both don't matter, e.g. if you don't have any sub-8-bit fields and you also don't have any fields greater than 1 byte. You get a compile error if the byte endianness is unspecified and it does matter.

Now consider 2 cases for bit fields:

  • Serialization / deserialization, e.g. with a network stream or a file on disk
  • OS dev, where bit/byte endianness depends on the architecture

In the first case you would hard code the parameters to the packed keyword, probably Endian.Big for both.

In the second case you would do something like this:

const builtin = @import("builtin");
const Bitfields = packed(builtin.byte_endian, builtin.bit_endian) struct {
    // ...
}

Pointers to fields with foreign endianness are not normal pointers, and cannot be used where normal pointers are expected.

Member

andrewrk commented May 18, 2017

The whole point of bit fields is to have a well defined in-memory layout. So we're going to have a way to specify byte endianness as well as bit-endianness. Here's the proposed syntax:

const Endian = @import("builtin").Endian;
const Bitfields = packed(Endian.Little, Endian.Big) struct {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
}

So the packed keyword takes 2 arguments which is the byte endianness and the bit endianness, respectively.

You can leave off the bit endianness if it doesn't matter, e.g. if you don't have any sub-8-bit fields. You get a compile error if bit endianness is unspecified and it does matter.

You can leave off both the byte endianness and the bit endianness if both don't matter, e.g. if you don't have any sub-8-bit fields and you also don't have any fields greater than 1 byte. You get a compile error if the byte endianness is unspecified and it does matter.

Now consider 2 cases for bit fields:

  • Serialization / deserialization, e.g. with a network stream or a file on disk
  • OS dev, where bit/byte endianness depends on the architecture

In the first case you would hard code the parameters to the packed keyword, probably Endian.Big for both.

In the second case you would do something like this:

const builtin = @import("builtin");
const Bitfields = packed(builtin.byte_endian, builtin.bit_endian) struct {
    // ...
}

Pointers to fields with foreign endianness are not normal pointers, and cannot be used where normal pointers are expected.

@AndreaOrru

This comment has been minimized.

Show comment
Hide comment
@AndreaOrru

AndreaOrru May 20, 2017

Member

packed(builtin.byte_endian, builtin.bit_endian) wouldn't this be the default?

Member

AndreaOrru commented May 20, 2017

packed(builtin.byte_endian, builtin.bit_endian) wouldn't this be the default?

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk May 20, 2017

Member

No, because this leads to accidentally non-portable code. If the endianness matters then you have to specify.

I can see this being annoyingly verbose in OS Dev, however, so here's another syntax proposal. This would be included in the builtin import:

const module = this;
pub const Struct = enum {
    Normal,
    Packed: Endian,

    pub const Endian = struct {
        byte: module.Endian,
        bit: module.Endian,
    };
};
pub const Endian = enum {
    Big,
    Little,
};
pub const byte_endian = Endian.Little; // set by target
pub const bit_endian = Endian.Big; // set by target
pub const NativePacked = Struct.Packed { Struct.Endian { .byte = byte_endian, .bit = bit_endian, } };

And then just like the proposal for setting the tag type of enums (See #305), you could pass a Struct argument to struct, like this:

const NativePacked = @import("builtin").NativePacked;
const Bitfields = struct(NativePacked) {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
}

And then get rid of the packed keyword. Maybe this is a little more palatable.

Member

andrewrk commented May 20, 2017

No, because this leads to accidentally non-portable code. If the endianness matters then you have to specify.

I can see this being annoyingly verbose in OS Dev, however, so here's another syntax proposal. This would be included in the builtin import:

const module = this;
pub const Struct = enum {
    Normal,
    Packed: Endian,

    pub const Endian = struct {
        byte: module.Endian,
        bit: module.Endian,
    };
};
pub const Endian = enum {
    Big,
    Little,
};
pub const byte_endian = Endian.Little; // set by target
pub const bit_endian = Endian.Big; // set by target
pub const NativePacked = Struct.Packed { Struct.Endian { .byte = byte_endian, .bit = bit_endian, } };

And then just like the proposal for setting the tag type of enums (See #305), you could pass a Struct argument to struct, like this:

const NativePacked = @import("builtin").NativePacked;
const Bitfields = struct(NativePacked) {
    f1: u16,
    f2: u16,
    f3: u8,
    f4: u8,
    f5: u4,
    f6: u4,
    f7: u8,
}

And then get rid of the packed keyword. Maybe this is a little more palatable.

@AndreaOrru

This comment has been minimized.

Show comment
Hide comment
@AndreaOrru

AndreaOrru May 21, 2017

Member

I like that.

Member

AndreaOrru commented May 21, 2017

I like that.

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Aug 27, 2017

Member

It's not clear to me how to proceed on this yet. How do we know what is the bit endianness of a system? Does this concept even exist at the machine level?

I think we can find the answers to these questions but it's tricky. I'm moving it to 0.2.0.

Reminder to self:

  • update docs about structs to include bit fields
  • update docs about pointers to talk about non-byte-aligned pointers
Member

andrewrk commented Aug 27, 2017

It's not clear to me how to proceed on this yet. How do we know what is the bit endianness of a system? Does this concept even exist at the machine level?

I think we can find the answers to these questions but it's tricky. I'm moving it to 0.2.0.

Reminder to self:

  • update docs about structs to include bit fields
  • update docs about pointers to talk about non-byte-aligned pointers
@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Aug 28, 2017

Member

I think what we want here is just the ability to specify endianness of packed structs. And the default is the same endianness as the byte endianness, instead of big endian.

In short, non-byte-aligned fields are part of a larger byte-aligned field. Currently we always use big endian to put the sub-byte fields into the larger field. The proposal is to consider little or big endian when shifting bits into the larger field. I believe this would solve the problem as originally presented.

struct { }
struct(builtin.PackedLittleEndian) { }
struct(builtin.PackedBigEndian) { }
struct(builtin.PackedNativeEndian) { }

Given this, it's clear to me how to proceed.

Member

andrewrk commented Aug 28, 2017

I think what we want here is just the ability to specify endianness of packed structs. And the default is the same endianness as the byte endianness, instead of big endian.

In short, non-byte-aligned fields are part of a larger byte-aligned field. Currently we always use big endian to put the sub-byte fields into the larger field. The proposal is to consider little or big endian when shifting bits into the larger field. I believe this would solve the problem as originally presented.

struct { }
struct(builtin.PackedLittleEndian) { }
struct(builtin.PackedBigEndian) { }
struct(builtin.PackedNativeEndian) { }

Given this, it's clear to me how to proceed.

@andrewrk andrewrk modified the milestones: 0.1.0, 0.2.0 Aug 28, 2017

@kyle-github

This comment has been minimized.

Show comment
Hide comment
@kyle-github

kyle-github Sep 14, 2017

Some protocols have fields that are split in two parts. Those parts are NOT contiguous in the protocol if it is viewed as a bitstream. Some protocols (and CPUs apparently) do not have big endian or little endian data. It is mixed depending on the size of the entity. Look at things like hardware page table entries for some "interesting" examples. ModBus-based PLCs are probably the worst.

As I found out with C, packed, byte order and bit order are all separate concepts but not entirely orthogonal from each other.

Using Zig's nifty(tm) compile time features, I think you can implement something very close to Perl's pack/unpack and get semi-automated marshalling/unmarshalling.

Reading through this and the questions I posted on the forum have made me rethink whether a struct is the right abstraction for this. There is so much platform/compiler dependent behavior here...

If you want to make a syntactical representation of a foreign bit/byte format in a Zig struct, is it really a struct? Perhaps this is a different entity (new) since the layout is so important to the meaning of the data?

kyle-github commented Sep 14, 2017

Some protocols have fields that are split in two parts. Those parts are NOT contiguous in the protocol if it is viewed as a bitstream. Some protocols (and CPUs apparently) do not have big endian or little endian data. It is mixed depending on the size of the entity. Look at things like hardware page table entries for some "interesting" examples. ModBus-based PLCs are probably the worst.

As I found out with C, packed, byte order and bit order are all separate concepts but not entirely orthogonal from each other.

Using Zig's nifty(tm) compile time features, I think you can implement something very close to Perl's pack/unpack and get semi-automated marshalling/unmarshalling.

Reading through this and the questions I posted on the forum have made me rethink whether a struct is the right abstraction for this. There is so much platform/compiler dependent behavior here...

If you want to make a syntactical representation of a foreign bit/byte format in a Zig struct, is it really a struct? Perhaps this is a different entity (new) since the layout is so important to the meaning of the data?

@kyle-github

This comment has been minimized.

Show comment
Hide comment
@kyle-github

kyle-github Sep 16, 2017

Note that if you allow fields to be reordered by the compiler, you will not be able to support some kinds of polymorphism where you cast pointers to a base type in form of OO. This is somewhat common in C code. See issue #130.

I suspect that the compile time generic functions will take care of many of those cases, but not all.

kyle-github commented Sep 16, 2017

Note that if you allow fields to be reordered by the compiler, you will not be able to support some kinds of polymorphism where you cast pointers to a base type in form of OO. This is somewhat common in C code. See issue #130.

I suspect that the compile time generic functions will take care of many of those cases, but not all.

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Sep 16, 2017

Member

This issue is about packed structs. The layout of packed structs is completely defined by the programmer. Field re-ordering cannot happen for packed or extern structs.

Member

andrewrk commented Sep 16, 2017

This issue is about packed structs. The layout of packed structs is completely defined by the programmer. Field re-ordering cannot happen for packed or extern structs.

@kyle-github

This comment has been minimized.

Show comment
Hide comment
@kyle-github

kyle-github Sep 17, 2017

As I mentioned in #488, packing is potentially orthogonal from reordering (as concepts).

kyle-github commented Sep 17, 2017

As I mentioned in #488, packing is potentially orthogonal from reordering (as concepts).

@andrewrk andrewrk removed this from the 0.1.0 milestone Sep 18, 2017

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Dec 22, 2017

Member

cc @skyfex this is fixed now.

Member

andrewrk commented Dec 22, 2017

cc @skyfex this is fixed now.

@andrewrk

This comment has been minimized.

Show comment
Hide comment
@andrewrk

andrewrk Dec 22, 2017

Member

Here's an explanation for why this fix works and why we don't have to care about parent integer size:

consider:

a: u4
b: u4
c: u4
d: u4

parent integer could be 2 u8's or a u16

parent integer = u8
 * Little Endian

parent[0] = ba
parent[1] = dc

 * Big Endian

parent[0] = ab
parent[1] = cd

parent integer = u16
 * Little Endian

parent[0] = dcba

 * Big Endian

parent[0] = abcd


On LE for u8s you look at parent[1] then parent[0].
For u16 there is only parent[0]. so:
 u8: dcba
u16: dcba

On BE for u8s you look at parent[0] then parent[1].
For u16 there is only parent[0]. so:
 u8: abcd
u16: abcd

Member

andrewrk commented Dec 22, 2017

Here's an explanation for why this fix works and why we don't have to care about parent integer size:

consider:

a: u4
b: u4
c: u4
d: u4

parent integer could be 2 u8's or a u16

parent integer = u8
 * Little Endian

parent[0] = ba
parent[1] = dc

 * Big Endian

parent[0] = ab
parent[1] = cd

parent integer = u16
 * Little Endian

parent[0] = dcba

 * Big Endian

parent[0] = abcd


On LE for u8s you look at parent[1] then parent[0].
For u16 there is only parent[0]. so:
 u8: dcba
u16: dcba

On BE for u8s you look at parent[0] then parent[1].
For u16 there is only parent[0]. so:
 u8: abcd
u16: abcd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment