organize the concepts of enum and union #618

Closed
andrewrk opened this Issue Nov 17, 2017 · 17 comments

Member

andrewrk commented Nov 17, 2017

  • Right now we have enum which can be "dumb" enums where it's just named number values.
    • It's planned to be able to specify the integer tag type and integer tag values #305
    • extern enum makes sense in this case
  • enum can also have associated types, which makes it a "tagged union". This is very close to unions; the only difference is that with unions the tag is a secret field only used by debug safety. It's silly that the initialization syntax for these kinds of enums is different than for unions. (And the union init syntax is better since it mirrors structs.)
    • extern enum does not make sense in this case
  • union works like a C union, except we have debug safety to make sure you don't access the wrong field
    • extern union maps to a C union, and disables the safety field.

Bottom line: enum does too much and it acts too much like union. It violates "only one way to do things".

Solution:

  • enum is only for "dumb" enums. you can specify the tag type and tag values.
  • Add enum union which is a union that always has the tag field. extern enum union works, it makes (in C) struct Foo { enum { ... } tag; union { ... } payload; }. Init syntax is the same as union.
  • You can specify the integer tag type and integer tag values just like enum.
  • Allow switch to work with enum union, like how it currently does with enums with payloads.
  • Types cannot be left off of enum union. You have to use void when you want void.
  • Initialization of an enum union looks exactly like a union. You might have to use Foo { .field = {} } for void types.
  • enum union creates a sub-type which is a dumb enum type. It's the type of the tag. You get a value of this type if you do Foo.field.
  • An enum union can implicitly cast to its enum tag type. This means you can do e.g. foo == Foo.field.

Now there's only one way to do things. If you need a dumb enum, use enum. Otherwise if enum union fits your use case, use that. Otherwise, use the flexibility that union provides.
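For readers who think in C, here is a rough sketch of the layout the proposal describes for extern enum union (a tag enum next to a payload union) and of switching on the tag. The names FooTag, Foo and foo_as_double are hypothetical, and this shows only the C side, not how Zig would implement it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C view of an `extern enum union` named Foo. */
enum FooTag { FOO_INT, FOO_FLOAT, FOO_NONE };

struct Foo {
    enum FooTag tag;          /* always present, unlike a plain union's debug-only tag */
    union {
        int32_t int_val;
        double float_val;
        /* FOO_NONE stands in for a void field: it carries no payload */
    } payload;
};

/* Switching on the tag is the C analogue of `switch` over an enum union. */
static double foo_as_double(struct Foo f) {
    switch (f.tag) {
    case FOO_INT:   return (double)f.payload.int_val;
    case FOO_FLOAT: return f.payload.float_val;
    case FOO_NONE:  return 0.0;
    }
    assert(false && "invalid tag");
    return 0.0;
}
```

Initialization with designated initializers, e.g. `struct Foo a = { .tag = FOO_INT, .payload = { .int_val = 16 } };`, is close in spirit to the struct-like init syntax mentioned above.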

@andrewrk andrewrk added this to the 0.3.0 milestone Nov 17, 2017

PavelVozenilek commented Nov 17, 2017

union works like a C union, except we have debug safety to make sure you don't access the wrong field

Low-level data fiddling is a big use case for C unions. "Debug safety" would be harmful here. One example is using tagged pointers.

In some situations the tag selector is already present somewhere else.

Member

andrewrk commented Nov 17, 2017

Zig unions with safety are compatible with a tag selector being somewhere else. The safety tag is omitted in release-fast builds.

You're going to have to elaborate on exactly how safety would be harmful because it prevents only what would be undefined behavior. I suspect that the "low level data fiddling" you are thinking of would be a Type Based Alias Analysis violation.
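To make the idea of a safety tag concrete, here is a minimal C sketch of a union that records its active field only in checked builds and verifies it on access. It is an illustration under assumptions: the names CheckedUnion, set_i, get_i and so on are hypothetical, and NDEBUG stands in for a release-fast build. It is not a description of Zig's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

enum ActiveField { ACTIVE_I, ACTIVE_F };

struct CheckedUnion {
    union {
        int32_t i;
        float f;
    } u;
#ifndef NDEBUG
    enum ActiveField active;  /* safety tag, present only in checked builds */
#endif
};

static void set_i(struct CheckedUnion *c, int32_t v) {
    c->u.i = v;
#ifndef NDEBUG
    c->active = ACTIVE_I;
#endif
}

static void set_f(struct CheckedUnion *c, float v) {
    c->u.f = v;
#ifndef NDEBUG
    c->active = ACTIVE_F;
#endif
}

static int32_t get_i(const struct CheckedUnion *c) {
    /* assert() expands to nothing when NDEBUG is defined */
    assert(c->active == ACTIVE_I);
    return c->u.i;
}

static float get_f(const struct CheckedUnion *c) {
    assert(c->active == ACTIVE_F);
    return c->u.f;
}
```

In a checked build, calling get_f after set_i trips the assertion; with NDEBUG defined, both the tag and the check disappear.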

Member

thejoshwolfe commented Nov 17, 2017

Or are we talking about a situation where the exact size and layout of the fields is important, in which case use a packed something.

PavelVozenilek commented Nov 17, 2017

I am guessing (it is not documented) that if there are safety checks one could not do the "casting via union" as in C.

Example: an advanced allocator with memory guards around the block and with a lot of metadata inside. This can be manipulated by incrementing/decrementing a pointer, but as this way is closed in Zig, raw unions may be usable here.

Another example is easy extraction of the lowest bits from a tagged pointer and of the true pointer value itself.

I think all of the above could be accomplished via @ptrToInt casting and back, but why make it more cumbersome than it needs to be?
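As an illustration of the pointer-to-integer route for tagged pointers, here is a small C sketch using uintptr_t. The helper names and the two-bit mask are hypothetical. It assumes pointees are at least 4-byte aligned and relies on the (implementation-defined, but common) property that a pointer survives the round trip through uintptr_t once the low bits are cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Low bits that are free when pointees are at least 4-byte aligned (assumption). */
#define TAG_MASK ((uintptr_t)0x3)

/* Pack a small tag into the unused low bits of a pointer. */
static void *tag_ptr(void *p, unsigned tag) {
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0);  /* pointer must be sufficiently aligned */
    assert(tag <= TAG_MASK);         /* tag must fit in the free bits */
    return (void *)(bits | tag);
}

/* Extract the tag stored in the low bits. */
static unsigned ptr_tag(void *tagged) {
    return (unsigned)((uintptr_t)tagged & TAG_MASK);
}

/* Recover the original pointer by clearing the tag bits. */
static void *ptr_untag(void *tagged) {
    return (void *)((uintptr_t)tagged & ~TAG_MASK);
}
```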

Member

andrewrk commented Nov 17, 2017

"casting via union" as in C.

This is undefined behavior, at least in C99.

It's not more cumbersome than it needs to be. You modify which field of the union is active by assigning a new value to the entire union, rather than a specific field. The use case with an advanced allocator is handled, no problem.
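A rough C sketch of the two points above, with hypothetical names: memcpy copies an object's bytes without reading an inactive union member, and assigning a whole union value switches which member is in use. The sketch assumes float is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bytes of a float as an integer without union punning. */
static uint32_t float_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);  /* sketch assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

union Number {
    int32_t i;
    float f;
};

/* Replace the whole union value; afterwards f is the member in use. */
static void number_set_float(union Number *n, float v) {
    *n = (union Number){ .f = v };
}
```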

PavelVozenilek commented Nov 17, 2017

I'll probably go the @ptrToInt / @intToPtr way, for allocators and containers. It still looks like the easiest one.

In this thread I got concerned by the emphasis on safety by limiting expressibility. Things like leaks, runaway pointers or nulls can be caught trivially; complex code is more resistant to change.


A couple of ideas about the enums:

  1. Enums could be defined inline, like:
fn foo() -> enum { This, That}  { .. }
var x : foo.return_type = foo();

fn bar( x : enum { A, B, C }) { ... }
var x : bar.x  = ...
bar(x);

No need to invent yet another name, and the enum definition is next to its intended use, not somewhere far away, prone to misuse.

  2. Enums could include numbers and/or numeric ranges:
fn get_bits_size() -> enum { 8, 16, 32, 64, NOT_APPLICABLE } { ... }

fn amazonian_indian_counting() -> enum { 1 .. 3, MANY, DONT_KNOW } { ... }

kyle-github commented Nov 18, 2017

Could someone provide some examples? I think I get it, but... I have been wrong more than once :-)

Other languages show sum types with different syntax:

const MyType = i32 | f64 | foo;

So the solution above from @andrewrk is going to be:

const MyType = enum union {
       i_val: i32;
       f_val: f64;
       foo_val: foo;
};

var a: MyType = { .i_val = 16 };
var b: MyType = { .f_val = 3.14159 };

a = b; <--- fails to compile?

const ASTNode = enum union {
      name: NameNode;
      number:NumberNode;
      ....
};

Is that close?

How does switch work with this? The ASTNode example is pretty common.

@PavelVozenilek you have some good examples there. I have made use of tagged pointers in some code (particularly when implementing VMs such as in Smalltalk or Java). While it is not portable to use unions, since the code is supposed to be very, very tied to a specific machine type, this is not a problem in practice.

One thing I would like to see is the ability to convert something into a bit array such that I can use it to access specific bits directly. Then tagged pointers would be a breeze.

hasenj commented Nov 23, 2017

What will happen to the implicit error union types, e.g. %u8?

Member

andrewrk commented Dec 1, 2017

Instead of enum union, we'll have this:

const Letter = enum {
    A,
    B,
    C,
};
const Payload = union(Letter) {
    A: i32,
    B: f64,
    C: bool,
};

This gives the union a tag field, which has the type of the enum given.

Like a switch statement on an enum value, if you fail to enumerate all the enum fields in a union declaration, it is a compile error.

union(T) implicitly casts to T.

So,

  • enum lets you do enum(IntType) #305
  • struct lets you do packed struct(endianness) #307
  • union lets you do union(EnumType) (this issue)

What will happen to the implicit error union types, e.g. %u8?

Error unions are independent from this issue.

PavelVozenilek commented Dec 1, 2017

In the example:

const Letter = enum {
    A,
    B,
    C,
};
const Payload = union(Letter) {
    A: i32,
    B: f64,
    C: bool,
};

Does the Letter type have some separate use? Like being able to extract the tag value, do a switch on the union's tag (not on the data type), or get and use the tag's datatype?

jido commented Dec 1, 2017

Does the enum have to be defined ahead of time or can it be inferred from the union definition, as in:

const Payload = union(enum) {
    A: i32,
    B: f64,
    C: bool,
};

Also this syntax does not give name to the alternatives, is that intentional? Can you do
const x: Payload = 5.35; const y: f64 = x;
?

Member

andrewrk commented Dec 1, 2017

The enum type has a separate use:

  • You can explicitly cast the union type to the enum type to get the tag value.
  • You can check for tag equality.
  • You can assign custom tag values in the enum.

const x = Payload { .B = 5.35 };
const y = x.B;
assert(Letter(x) == Letter.B);

At least for now, the enum will have to be defined ahead of time.

@andrewrk andrewrk added the accepted label Dec 3, 2017

Member

andrewrk commented Dec 3, 2017

Also make it so that enums support signed integer tag types.

Member

andrewrk commented Dec 3, 2017

On second thought, I'm going to accept @jido's proposal. So you can do:

const Payload = union(enum) {
    A: i32,
    B: f64,
    C: bool,
};

And this automatically creates the enum. You can also do:

const Payload = union(enum(u32)) {
    A: i32 = 3,
    B: f64 = 10,
    C: bool = 100,
};

And now you've configured the integer tag type of the enum, and specified the tag values. Then you can access the automatically created enum type with @TagType(Payload), and you can access the integer type u32 with @TagType(@TagType(Payload)).

A union(enum) with all fields void is equivalent to enum.
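In C terms, the union(enum(u32)) example corresponds roughly to the sketch below: an enum carrying the explicit tag values 3, 10 and 100, stored in a 32-bit tag field next to the payload union. The C names (PayloadTag, payload_b, payload_tag) are hypothetical; the tag is stored as uint32_t because standard C before C23 gives no way to fix an enum's underlying type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tag values taken from the example: A = 3, B = 10, C = 100. */
enum PayloadTag { PAYLOAD_A = 3, PAYLOAD_B = 10, PAYLOAD_C = 100 };

struct Payload {
    uint32_t tag;   /* plays the role of the u32 integer tag type */
    union {
        int32_t a;
        double b;
        bool c;
    } as;
};

/* Build a Payload whose active field is B. */
static struct Payload payload_b(double v) {
    struct Payload p = { .tag = PAYLOAD_B, .as = { .b = v } };
    return p;
}

/* Analogue of casting the union to its tag type. */
static enum PayloadTag payload_tag(struct Payload p) {
    assert(p.tag == PAYLOAD_A || p.tag == PAYLOAD_B || p.tag == PAYLOAD_C);
    return (enum PayloadTag)p.tag;
}
```

Here payload_tag plays the part of @TagType(Payload) values, and the uint32_t field plays the part of @TagType(@TagType(Payload)).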

jido commented Dec 3, 2017

Ha, and what if you want to give a default value to the members of the union? That is what it looks like with the latter syntax. It is confusing.

PavelVozenilek commented Dec 3, 2017

This syntax

const Payload = union(enum(u32)) {
    A: i32 = 3,
    B: f64 = 10,
    C: bool = 100,
};

feels really strange.

Edit: the visual pattern

name : type = value

should be reserved for variable/const definition with initialization.

Member

andrewrk commented Dec 3, 2017

Ha, and what if you want to give a default value to the members of the union?

You can't do that. Neither structs nor unions support giving a default value to a field.

That is what it looks like with the latter syntax. It is confusing.

The other option we have is: if you want to change the enum tag values, you would have to specify the enum separately.

@andrewrk andrewrk closed this in 0ad1239 Dec 4, 2017

andrewrk added a commit that referenced this issue Dec 4, 2017
