Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve packed semantics #19660

Open
ikskuh opened this issue Apr 15, 2024 · 7 comments
Open

Improve packed semantics #19660

ikskuh opened this issue Apr 15, 2024 · 7 comments
Labels
frontend Tokenization, parsing, AstGen, Sema, and Liveness. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@ikskuh
Copy link
Contributor

ikskuh commented Apr 15, 2024

This is a plan for hammer out the semantics for bit packed types in the Zig language.

Let's start with a tiny syntactical change:

(1) Rename packed to bitpacked

This change will reflect the true meaning of both packed struct and packed union more than packed and reduces confusion about the type of packing that is performed (as other languages use packed for byte packing).

This should be trivial to implement and update via zig fmt.

(2) Improve bitpacked union semantics

Right now, the semantics of packed unions are kinda … underdefined:

A packed union has well-defined in-memory layout and is eligible to be in a packed struct.

Let's change that definition to:

A bitpacked union has a well-defined in-memory layout and, similar to a bitpacked struct, is backed by a single integer of the size of the largest union member.
bitpacked union(T) can be used to explicitly specify the size of the union.

All union members are aligned such that the LSB of all types align. This means that

const T = packed union(u32) {
    a: u32,
    b: u8,
};
var t: T = .{ .a = 1 };
std.debug.assert(t.b == 1);

holds true for both big and little endian systems.

This also means that not all union members share the same address, which is true for extern union.

(3) No changes to bitpacked struct

Except for the rename of packed to bitpacked, no changes are done here.

(4) Introduce a new type opaquebits(N)

This new type opaquebits(N) is an opaque bit bag that has no semantics defined except for "it takes N bits of space".

This type can be used for storing fully opaque data and the only way to interpret its content is by using @bitCast from and to it.

(5) Introduce a new type class std.builtin.Type.BitPacked

(This is definitly a stretch goal, and might require further work)

This type can be the backing type for both bitpacked union and bitpacked struct, which is similar to a regular struct/union type in that it has typed fields, but lacks most other properties.

The BitPacked type is a mix of structure and union and inspired by C# StructLayout.Explicit and FieldOffset, which allow construction of both structures and unions of an arbitrary layout.

This means that it's also legal to allow overlapping fields. For a union type, all bit_offset values are 0, for a structure, all bit_offset fields are layed out such that all fields are back-to-back.

Each field in a BitPacked type has an explicit bit offset and backing type:

pub const BitPackedField = struct {
    name: [:0]const u8,
    type: type,
    bit_offset: u16,
    default_value: ?*const anyopaque,
};

pub const BitPacked = struct {
    backing_integer: type,
    fields: []const BitPackedField,
    decls: []const Declaration,
};

It is not legal to create a type with unnamed bits.

Each BitPacked type behaves in codegen and on ABI boundaries as if it would be BitPacked.backing_integer.

(6) Incorporate #19395

Use bitpacked(T) struct/bitpacked(T) union instead of bitpacked struct(T) bitpacked union as those are more uniform in what the (T) actually modifies.

Use Case

One example where i'd like to have seen some of these features would be the EDID Video Input Definition:

Video Input Definition Layout

const VideoInputDefinition = bitpacked(u8) struct
{
    config: bitpacked(u7) union {
        analog: bitpacked(u7) struct {
            serrations: Support,
            composite_sync_on_green: Support,
            composite_sync_on_horiz: Support,
            separate_sync: Support,
            setup: enum(u1) {
                blank_is_black = 0,
                pedestal = 1,
            },
            signal_level: enum(u2) {
                model0 = 0b000, //  0.700 : 0.300 : 1.000 V p-p
                model1 = 0b001, //  0.714 : 0.286 : 1.000 V p-p
                model2 = 0b010, //  1.000 : 0.400 : 1.400 V p-p
                model3 = 0b011, //  0.700 : 0.000 : 0.700 V p-p
            },
        },
        digital: bitpacked(u7) struct  {
            standard: enum(u4) {
                none = 0,
                dvi = 1,
                hdmi_a = 2,
                hdmi_b = 3,
                mddi = 4,
                display_port = 5,
            },
            color_depth: enum(u3) {
                undefined = 0b000, // Color Bit Depth is undefined
                color6 = 0b001, // 6 Bits per Primary Color
                color8 = 0b010, // 8 Bits per Primary Color
                color10 = 0b011, // 10 Bits per Primary Color
                color12 = 0b100, // 12 Bits per Primary Color
                color14 = 0b101, // 14 Bits per Primary Color
                color16 = 0b110, // 16 Bits per Primary Color
            },
        },
    },
    type: enum(u1) { analog=0, digital=1 };
};

For additional type safety, config could've been done as opaquebits(7) and a fn unpack() union(enum){…} function could be done that returns either/or contained structure definition.

Some other structures in EDID contain even more nested bit field definitions on a singular byte or integer, so a more improved support for bitpacked data can help for sure.

Summary

For me, (1) to (4) are things that are definitly useful and would improve language semantics. (5) is definitly more of a stretch goal.

@mlugg mlugg added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. frontend Tokenization, parsing, AstGen, Sema, and Liveness. labels Apr 15, 2024
@Snektron
Copy link
Collaborator

(4) Introduce a new type opaquebits(N)

Whats the rationale that favors this over enum(N) { _ }? The main difference I can see is that it does not allow == and !=.

@rohlem
Copy link
Contributor

rohlem commented Apr 15, 2024

bitpacked union(T) can be used to explicitly specify the size of the union.

Here T has a completely different meaning to union(T), where it is the union's tag type.
See also my proposal #19395 to resolve this by instead putting the backing type (T) after the packed/bitpacked keyword.


Each field in a BitPacked type has an explicit bit offset and backing type:

The code instead shows backing_integer being not in BitPackedField but in BitPacked - either the text or the code are incorrect (either it's per-field or per-aggregate for all fields).


@Snektron From what I understand it forces using @bitCast instead of @intFromEnum and @enumFromInt.
Besides that it seems pretty much equivalent, only that the enum can further be extended by declarations.
(It doesn't seem like opaquebits(N) is meant to have a {} scope after it, but that's not fully clear to me.)

@Vexu Vexu added this to the 0.13.0 milestone Apr 15, 2024
@Vexu
Copy link
Member

Vexu commented Apr 15, 2024

I like 1-3, I don't see a need for 4, and 5 seems like a complex feature that combines unions and structs that implements #985 but only through a builtin, not syntax.

@ikskuh
Copy link
Contributor Author

ikskuh commented Apr 15, 2024

Whats the rationale that favors this over enum(N) { _ }? The main difference I can see is that it does not allow == and !=.

I don't see a need for 4

For me, it's something that prevents accidential comparability and allows for types that must not have a defined value (thus, cannot be constructed from zig land). Use case here would be modelling microcontroller register values where some fields have the value "don't ever touch these bits, do not write anything else than what you've read"

(It doesn't seem like opaquebits(N) is meant to have a {} scope after it, but that's not fully clear to me.)

Yeah, it is similar to anyopaque, but it has to be sizeable. Maybe opaquebit5 or something would also work.

Each field in a BitPacked type has an explicit bit offset and backing type:
The code instead shows backing_integer being not in BitPackedField but in BitPacked - either the text or the code are incorrect (either it's per-field or per-aggregate for all fields).

No, both are correct. The backing type of a BitPackedField is BitPackedField.type, the backing type of the full struct is BitPacked.backing_integer

bitpacked union(T) can be used to explicitly specify the size of the union.
Here T has a completely different meaning to union(T), where it is the union's tag type.
See also my proposal #19395 to resolve this by instead putting the backing type (T) after the packed/bitpacked keyword.

True. Will adapt the original post to respect that, it's reasonable.

5 seems like a complex feature that combines unions and structs that implements #985 but only through a builtin, not syntax.

I can see that, it's not something that is necessarily needed, but would help in some cases where fields might overlap. But after some considerations, this could also be fully solved by casting between different instances of types

@Khitiara
Copy link

5 might actually be helpful for some osdev interop things where a packed struct field represents the upper bits of an address, aligned with the start of the struct but with lower bits used for other things

@ikskuh
Copy link
Contributor Author

ikskuh commented Apr 15, 2024

5 might actually be helpful for some osdev interop things where a packed struct field represents the upper bits of an address, aligned with the start of the struct but with lower bits used for other things

That was just an example, opaquebit5 is similar to u5, so ``opaquebit31oropaquebit32` would be valid as well

@Khitiara
Copy link

That was just an example, opaquebit5 is similar to u5, so ``opaquebit31oropaquebit32` would be valid as well

not sure what this actually means, i meant point 5 of having something like structlayout.explicit but at bit resolution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
frontend Tokenization, parsing, AstGen, Sema, and Liveness. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

6 participants