use cases for alignment, packing and byte order #488

@kyle-github

@andrewrk is probably tired of this by now, but I think it might be worth at least recording some use cases that make other proposals for struct handling less complete.

One of the problems is that there are cases where you really want to control the exact representation of data in memory and others where you just want to use a higher-level concept.

Here is a short list (not at all complete) of some reasons why you would want to control some aspect of the in-memory representation:

  1. (Alignment) Hardware twiddling: memory-mapped I/O. There are cases where a 32-bit value is mapped at some location in memory but that location is NOT 32-bit aligned. Or, vice versa, a location requires an alignment bigger than the natural alignment of the object. This one is not specific to structs.
  2. (packing and field ordering) Saving memory. If I want to really put a lot of stuff in memory, I want to have the tightest packing possible.
  3. (internal field alignment and overall structure ordering) Controlling cache effects. There are circumstances where controlling the size and ordering of fields in a structure can be used to optimize cache behavior. You can get really strange data structures when you start thinking about column-oriented data. I may even want to have the compiler switch from array-of-struct to struct-of-array representation. That kind of representation is used in things like geometry (in games etc.), in-memory databases etc.
  4. (alignment, field ordering and byte ordering) Reading/writing binary data to a stream such as a file or network connection. A TCP/IP packet's headers are in big-endian order. PNG and JPEG specify specific byte orders. Some wide character encodings have specific byte orders.
  5. (alignment, padding) Padding allocations for "invisible" metadata or other hacks. This is much less useful in Zig than C, but there may be cases not covered in Zig. I use this hack in my own simple C ref counting implementation to store the ref count info before the data block.
  6. (field ordering) Polymorphic pointer handling. C guarantees that a pointer to a struct is also a pointer to the first (lexical) field in the struct. That is used to do a form of polymorphism by having a parent struct be the first field in a child struct. Then a pointer to a child can be cast to a parent and any parent function can be called. Reordering fields stops that from working.
  7. (field ordering, padding, alignment) Atomic access. You may want to specify that all fields in a struct are going to be accessed via atomic instructions. In that case, it is likely that the CAS-like instructions the CPU offers only work on a very limited set of sizes. Any fields smaller than the smallest CAS instruction size would need to be padded out so that you do not atomically access multiple small fields at once by accident. That might not be deadly to your program's correctness, but it could cause a lot of contention.
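
To make item 6 concrete, here is a minimal C sketch of the first-field polymorphism trick. The type and function names (`shape`, `circle`, `shape_id`, `circle_id`) are made up for illustration. It only works because C keeps the first declared member at offset 0, which is exactly what field reordering would break.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "parent" type; all names here are illustrative only. */
struct shape {
    int id;
};

/* "Child" type: the parent must be the FIRST member for the cast to be valid. */
struct circle {
    struct shape base;
    double radius;
};

/* C guarantees there is no padding before the first member. */
static_assert(offsetof(struct circle, base) == 0, "parent must sit at offset 0");

/* A "parent" function that knows nothing about circles. */
static int shape_id(const struct shape *s) {
    return s->id;
}

/* Upcast: a pointer to the child, suitably converted, points at its first member. */
static int circle_id(const struct circle *c) {
    return shape_id((const struct shape *)c);
}
```

If a compiler were free to move `base` away from offset 0, the cast in `circle_id` would silently read the wrong bytes.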

Alignment above is actually two things. There is the alignment of the entity itself (where the struct or value starts) and there is the alignment of fields within the entity (in the case of a struct). Field alignment can be native (aligned to the natural alignment of the field itself), some minimum, or even a maximum. Some fields are aligned to a boundary that is not a nice multiple or fraction of the field's size. For instance, the weird x87 long-ish double is 80 bits wide.
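
For reference, both kinds of alignment can be inspected and requested in C11. This is a sketch only: the struct names are invented, and 64 is an assumed cache-line size rather than a universal constant.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Field alignment: the compiler pads after `tag` so `value` is naturally aligned. */
struct sample {
    char tag;
    uint32_t value;
};

/* Entity alignment: request 64-byte alignment for the struct itself.
   64 is an assumed cache-line size, not something the language defines. */
struct counter {
    alignas(64) uint64_t hits;
};

static_assert(offsetof(struct sample, value) % alignof(uint32_t) == 0,
              "value is naturally aligned");
static_assert(alignof(struct counter) == 64, "struct is over-aligned");
static_assert(sizeof(struct counter) % 64 == 0, "size is padded to the alignment");
```

What C does not give you here is the opposite direction for a single field (under-alignment) in a portable way; that is left to compiler extensions.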

Padding overlaps with alignment to some degree. You may want to pad a struct out to a certain alignment at all times. I do this kind of garbage in C:

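```c
#include <stdint.h>

/* lock_t is my own lock type, defined elsewhere. */
struct rc {
    int count;
    lock_t lock;
    void (*cleanup_func)(void *data);

    /* Flexible array member: takes no space itself, but forces the struct
       to be padded out to the strictest alignment among these types. */
    union {
        uint8_t dummy_u8;
        uint16_t dummy_u16;
        uint32_t dummy_u32;
        uint64_t dummy_u64;
        double dummy_double;
        void *dummy_ptr;
        void (*dummy_func)(void);
    } dummy_align[];
};
```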

I could not find another way to make sure that it would pad out this struct to be properly aligned. Probably not all of the dummy fields are necessary, but I found enough questionable things in my searches that I decided that it was better to do this hack with a belt and suspenders.
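
For comparison, C11 might allow a shorter spelling of the same idea using `max_align_t` and `alignas`. This is a sketch under assumptions, not a drop-in replacement: `rc_header`, `payload` and `rc_alloc` are made-up names, and the lock field is left out because `lock_t` is not defined here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical ref-count header; the data block lives in `payload`. */
struct rc_header {
    int count;
    void (*cleanup_func)(void *data);
    /* Flexible array member aligned for any scalar type (C11). */
    alignas(max_align_t) unsigned char payload[];
};

static_assert(offsetof(struct rc_header, payload) % alignof(max_align_t) == 0,
              "payload starts on a maximally aligned boundary");
static_assert(sizeof(struct rc_header) % alignof(max_align_t) == 0,
              "header size is a multiple of the maximum alignment");

/* Allocate header plus data block in one piece; returns NULL on failure. */
static struct rc_header *rc_alloc(size_t payload_size) {
    struct rc_header *h = malloc(sizeof *h + payload_size);
    if (h != NULL) {
        h->count = 1;
        h->cleanup_func = NULL;
    }
    return h;
}
```

Whether this is any less of a hack is debatable; the point is that the language makes you spell out by hand something the compiler already knows.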

Suppose you have a struct with a lot of bool fields. You may put them in the struct in a lexical order that matches how you might use them. In many cases it would be nice to tell the compiler to pack them all into a few bytes at the beginning of the struct for size. Given the extreme difference in speed between a cache hit and miss, packing data tightly so that more fits in the cache can have a huge effect on some algorithms.
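
In C you have to do that grouping by hand with bit-fields, roughly as sketched below. The struct and field names are hypothetical, and the exact sizes depend on the ABI, so the only claim checked is that the grouped version is no larger.

```c
#include <assert.h>
#include <stdbool.h>

/* Flags declared next to the data they describe: each bool is followed by padding. */
struct widget_loose {
    bool visible;
    int width;
    bool enabled;
    int height;
    bool dirty;
};

/* Same information with the flags grouped by hand into one bit-field unit. */
struct widget_tight {
    unsigned visible : 1;
    unsigned enabled : 1;
    unsigned dirty : 1;
    int width;
    int height;
};

static_assert(sizeof(struct widget_tight) <= sizeof(struct widget_loose),
              "grouping the flags should not make the struct larger");
```

The cost is that the declaration order no longer matches how the fields are used, which is the part it would be nice to hand off to the compiler.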

It would be nice to be able to control all these "knobs" on a data element. On all data, alignment of the targets of pointers would be useful to control. With structs, it would be good to be able to control:

  • alignment of the struct itself
  • padding of the struct itself (often related to alignment so perhaps redundant)
  • order of fields (or lack of an order constraint)
  • field padding
  • field alignment (related to the above)
  • field byte order (big endian and little endian are two of the most common possibilities but definitely not all of them)
  • field combination (the bool example)
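
As an illustration of the byte-order knob, this is roughly what has to be written by hand in C today to read big-endian fields from a stream. The `wire_header` layout (a 2-byte port followed by a 4-byte sequence number) is a hypothetical format invented for the example, not a real protocol header.

```c
#include <assert.h>
#include <stdint.h>

/* Decode big-endian integers byte by byte, independent of host byte order. */
static uint16_t load_be16(const uint8_t *p) {
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t load_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical header: on the wire it is 6 bytes with no padding. */
struct wire_header {
    uint16_t port;
    uint32_t seq;
};

enum { WIRE_HEADER_SIZE = 6 };

/* The in-memory struct may be larger than the wire format because of padding,
   so it cannot simply be overlaid on the buffer. */
static_assert(sizeof(struct wire_header) >= WIRE_HEADER_SIZE,
              "in-memory layout is at least as large as the wire layout");

static struct wire_header parse_header(const uint8_t buf[WIRE_HEADER_SIZE]) {
    struct wire_header h;
    h.port = load_be16(buf);
    h.seq = load_be32(buf + 2);
    return h;
}
```

With per-field control over byte order, padding and alignment, the struct declaration alone could describe the wire format and the hand-written loaders would go away.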

I thought I would try to tie all these thoughts into one place.
