Proposal: Decl Literals #9938

Closed
SpexGuy opened this issue Oct 12, 2021 · 18 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@SpexGuy
Contributor

SpexGuy commented Oct 12, 2021

Enum literals are a powerful and extremely useful feature of Zig. This proposal changes their definition slightly to make them more useful in a wider variety of cases, and renames them to "Decl Literals". I'll start with a thorough description of the feature, and then end with a discussion of the tradeoffs of this proposal.

Description

Part 1: Decl Literals

The current procedure for enum literal casting is to look in the target type for an enum field matching the name of the literal. I propose to generalize that to look instead for a type field (decl or enum/union tag) matching the name of the literal. With this change, decl literals can be coerced to any namespace type. This can be especially useful for modeling default values and common presets. For example:

const DeloreanOptions = packed struct {
    enable_flux_capacitor: bool,
    target_speed_mph: u7,
    enable_remote_control: bool = false,

    pub const time_travel: DeloreanOptions = .{
        .enable_flux_capacitor = true,
        .target_speed_mph = 88,
    };

    pub const highway: DeloreanOptions = .{
        .enable_flux_capacitor = false,
        .target_speed_mph = 60,
    };
};

pub fn startDelorean(options: DeloreanOptions) void { ... }

test {
    // coerce from decl literal to constant
    startDelorean(.time_travel);

    // late binding coercion is supported
    const highway_literal = .highway;
    startDelorean(highway_literal);

    // explicit instantiation still works too.
    startDelorean(.{
        .enable_flux_capacitor = true,
        .target_speed_mph = 88,
        .enable_remote_control = true,
    });
}
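For anyone who wants to try Part 1, here is a minimal runnable sketch. It assumes a compiler that implements decl literals (Zig 0.14.0 or later, to my knowledge). `targetSpeed` is a hypothetical stand-in for `startDelorean`, returning a value so the test has something to check. The late-binding form (`const highway_literal = .highway;`) is deliberately left out, since it depends on the untyped-literal coercion this proposal describes.

```zig
const std = @import("std");

const DeloreanOptions = packed struct {
    enable_flux_capacitor: bool,
    target_speed_mph: u7,
    enable_remote_control: bool = false,

    // Presets live as declarations in the struct's own namespace.
    pub const time_travel: DeloreanOptions = .{
        .enable_flux_capacitor = true,
        .target_speed_mph = 88,
    };

    pub const highway: DeloreanOptions = .{
        .enable_flux_capacitor = false,
        .target_speed_mph = 60,
    };
};

// Hypothetical stand-in for startDelorean: returns the requested speed.
fn targetSpeed(options: DeloreanOptions) u7 {
    return options.target_speed_mph;
}

test "presets resolve through the parameter type" {
    // The parameter type supplies the namespace for `.time_travel`.
    try std.testing.expectEqual(@as(u7, 88), targetSpeed(.time_travel));

    // Explicit instantiation still works.
    const custom: DeloreanOptions = .{
        .enable_flux_capacitor = true,
        .target_speed_mph = 88,
        .enable_remote_control = true,
    };
    try std.testing.expect(custom.enable_remote_control);
}
```

Run it with `zig test` on the file.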

Part 2: Operations on Decl Literals

We can further define a couple of operations on decl literals, to take advantage of their ability to infer a namespace:

2A: .decl_literal()

Calling a decl literal does the following operations:

  1. Require a result type.
  2. Look up the decl literal in %1
  3. call %2
  4. coerce %3 to %1 if not forwarding result location

This can remove repetition in initialization code:

var array: std.ArrayList(u32) = .init(allocator);
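A self-contained sketch of the call form, again assuming a compiler with decl literal support. `Counter` is a hypothetical type used in place of `std.ArrayList`, so the example does not depend on any particular version of the std API.

```zig
const std = @import("std");

// Hypothetical type with a conventional `init` constructor.
const Counter = struct {
    count: u32,
    step: u32,

    pub fn init(step: u32) Counter {
        return .{ .count = 0, .step = step };
    }

    pub fn next(self: *Counter) u32 {
        self.count += self.step;
        return self.count;
    }
};

test "call through a decl literal" {
    // The annotation supplies the result type, `.init` is looked up in
    // Counter, and the call's result initializes `c`.
    var c: Counter = .init(5);
    try std.testing.expectEqual(@as(u32, 5), c.next());
    try std.testing.expectEqual(@as(u32, 10), c.next());
}
```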

2B: .decl_literal{ .field = value, ... }

Instantiating a decl literal with this syntax does the following:

  1. Require a result type.
  2. Require that the result type is a union
  3. Look up a field in %1 named decl_literal
  4. Use the struct literal to initialize field %3 of the result location

This extends the current special-case initialization for void tags to work for struct tags as well.

test {
    // init void tag (this works already)
    var t: std.builtin.TypeInfo = .Type;

    // init struct tag
    t = .Pointer{
        .size = .One,
        .is_const = true,
        .is_volatile = false,
        .is_allowzero = false,
        .alignment = 0,
        .address_space = .generic,
        .child = u32,
        .sentinel = @as(?u32, null),
    };

    // init struct tag with late binding
    const tag = if (comptime something()) .Fn else .BoundFn;
    t = tag{
        .calling_convention = .Unspecified,
        .alignment = 0,
        .is_generic = false,
        .is_var_args = false,
        .return_type = u32,
        .args = &[_]FnArg{},
    };
}

Discussion

1: Decl Literals

An extremely common pattern in C when building a bitfield enum is to create extra named constants for common sets of flags. These defaults often behave like a de-facto enum, with custom specifications being very uncommon. Zig's solution to bitfields is to use packed structs. However, a packed struct can have only one default (.{}), which in the case of a bitfield is usually reserved for the zero value. You can declare default values as decls in the bitfield namespace, but in doing so you lose a lot of the ergonomics that those decls might provide, because every use has to spell out the full namespace, e.g. obj.foo(some.deeply.nested.package.FooFlags.flushCpuCaches).

This friction causes a conflict when specifying field defaults. You can either specify defaults so that .{} is a useful value, or specify defaults so that fields must be correctly initialized. These two things are often not the same. The second one is safer, but the first is often more ergonomic. With decl literals, there is an ergonomic alternative for useful default values which lets .{} syntax be reserved for intentional initialization.

There is an additional tradeoff between modeling such a structure as a packed struct or an extensible enum. In theory, the packed struct is better on nearly all metrics. It documents the bit meanings, reflection code can understand it, and it's clearer and easier to make custom variants. But in the current language, the common case of using a preset is much less ergonomic with a packed struct than an enum. This feature solves that tradeoff, making packed struct the clear choice.

The std lib and stage 2 compiler don't make heavy use of this sort of bitfield API, but it's common in C/C++ libraries and their zig bindings. Some examples:

https://github.com/SpexGuy/Zig-ImGui/blob/1469da84a3d90e9d96a87690f0202475b0f875df/zig/imgui.zig#L53-L97

https://github.com/MasterQ32/SDL.zig/blob/f3a3384e6a7b268eccb4aa566e952b05ff7eebfc/src/wrapper/sdl.zig#L43-L56

I don't believe that this pattern comes from the language design of C, but instead from the high information density of bitfields. This property carries over to Zig, so there shouldn't be any reason that these sorts of APIs wouldn't be desirable in Zig. I suspect the current lack of them comes from the lack of ergonomics surrounding these features, not because there are "better" patterns that we choose to use instead.

2A: Call syntax

I really like this syntax for initialization, and I think it's a consistent extension of the var x: T = .{} syntax. With the current pattern,

const value = package.SomeType.init(4);

The reader does not necessarily know that the type of value is package.SomeType. This is usually true by convention, but careful readers and editor tools cannot know for sure. In contrast, with the new syntax:

const value: package.SomeType = .init(4);

The reader and tools now know for sure that value must be of type package.SomeType. This syntax conveys extra information, and is consistent with a preference for x: T = .{} over x = T{}.

Examples of code this would affect are everywhere, but here are some examples from the std lib and stage 2:


zig/lib/std/bit_set.zig

Lines 428 to 430 in f42725c

pub fn iterator(self: *const Self, comptime options: IteratorOptions) Iterator(options) {
    return Iterator(options).init(&self.masks, last_item_mask);
}

 pub fn iterator(self: *const Self, comptime options: IteratorOptions) Iterator(options) { 
     return .init(&self.masks, last_item_mask); 
 } 

zig/src/Compilation.zig

Lines 1445 to 1451 in f42725c

.emit_analysis = options.emit_analysis,
.emit_docs = options.emit_docs,
.work_queue = std.fifo.LinearFifo(Job, .Dynamic).init(gpa),
.c_object_work_queue = std.fifo.LinearFifo(*CObject, .Dynamic).init(gpa),
.astgen_work_queue = std.fifo.LinearFifo(*Module.File, .Dynamic).init(gpa),
.keep_source_files_loaded = options.keep_source_files_loaded,
.use_clang = use_clang,

            .emit_analysis = options.emit_analysis,
            .emit_docs = options.emit_docs,
            .work_queue = .init(gpa),
            .c_object_work_queue = .init(gpa),
            .astgen_work_queue = .init(gpa),
            .keep_source_files_loaded = options.keep_source_files_loaded,
            .use_clang = use_clang,

zig/src/codegen/spirv.zig

Lines 247 to 261 in f42725c

pub fn init(spv: *SPIRVModule) DeclGen {
    return .{
        .spv = spv,
        .air = undefined,
        .liveness = undefined,
        .args = std.ArrayList(ResultId).init(spv.gpa),
        .next_arg_index = undefined,
        .inst_results = InstMap.init(spv.gpa),
        .blocks = BlockMap.init(spv.gpa),
        .current_block_label_id = undefined,
        .code = std.ArrayList(Word).init(spv.gpa),
        .decl = undefined,
        .error_msg = undefined,
    };
}

    pub fn init(spv: *SPIRVModule) DeclGen {
        return .{
            .spv = spv,
            .air = undefined,
            .liveness = undefined,
            .args = .init(spv.gpa),
            .next_arg_index = undefined,
            .inst_results = .init(spv.gpa),
            .blocks = .init(spv.gpa),
            .current_block_label_id = undefined,
            .code = .init(spv.gpa),
            .decl = undefined,
            .error_msg = undefined,
        };
    }

// alu instructions
try expect_opcode(0x07, Insn.add(.r1, 0));
try expect_opcode(0x0f, Insn.add(.r1, .r2));
try expect_opcode(0x17, Insn.sub(.r1, 0));
try expect_opcode(0x1f, Insn.sub(.r1, .r2));
try expect_opcode(0x27, Insn.mul(.r1, 0));
try expect_opcode(0x2f, Insn.mul(.r1, .r2));
try expect_opcode(0x37, Insn.div(.r1, 0));
try expect_opcode(0x3f, Insn.div(.r1, .r2));

    // alu instructions
    try expect_opcode(0x07, .add(.r1, 0));
    try expect_opcode(0x0f, .add(.r1, .r2));
    try expect_opcode(0x17, .sub(.r1, 0));
    try expect_opcode(0x1f, .sub(.r1, .r2));
    try expect_opcode(0x27, .mul(.r1, 0));
    try expect_opcode(0x2f, .mul(.r1, .r2));
    try expect_opcode(0x37, .div(.r1, 0));
    try expect_opcode(0x3f, .div(.r1, .r2));

There may be an argument that this is too implicit, and removes information that would have previously been available. However, it is still clear where to look for the relevant function, and it's clear that a function call is being made. It's also clearer now what the return type of the function is, where that was not known before. So I think this change is still reasonable.
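The two positions used in the examples above, a return statement and a struct field initializer, can be sketched in a runnable form. This assumes a compiler with decl literal support; `WorkQueue`, `Pipeline` and `makeQueue` are hypothetical names, not the types from Compilation.zig.

```zig
const std = @import("std");

// Hypothetical queue type with an `init` constructor.
const WorkQueue = struct {
    capacity: usize,

    pub fn init(capacity: usize) WorkQueue {
        return .{ .capacity = capacity };
    }
};

// Hypothetical aggregate; the field types are declared here, so the
// initializer below does not have to repeat them.
const Pipeline = struct {
    work_queue: WorkQueue,
    astgen_work_queue: WorkQueue,
};

fn makeQueue() WorkQueue {
    // The function's return type is the result type for `.init`.
    return .init(8);
}

test "result type from return type and field type" {
    const p: Pipeline = .{
        // The field type is the result type for `.init`.
        .work_queue = .init(16),
        .astgen_work_queue = makeQueue(),
    };
    try std.testing.expectEqual(@as(usize, 16), p.work_queue.capacity);
    try std.testing.expectEqual(@as(usize, 8), p.astgen_work_queue.capacity);
}
```

In both positions the reader finds the function by looking at the declared type of the destination, which is the lookup rule described in 2A.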

2B: Union struct init syntax

This syntax could be used in a large number of places in the std lib and stage 2 compiler. Search for the regex \.\{ \.\w+ = \.\{ to find them. Some examples for convenience:


zig/src/AstGen.zig

Lines 701 to 704 in f42725c

.data = .{ .@"unreachable" = .{
    .safety = true,
    .src_node = gz.nodeIndexToRelative(node),
} },

 .data = .@"unreachable"{ 
     .safety = true, 
     .src_node = gz.nodeIndexToRelative(node), 
 },

zig/src/AstGen.zig

Lines 6079 to 6082 in f42725c

.data = .{ .switch_capture = .{
    .switch_inst = switch_block,
    .prong_index = undefined,
} },

 .data = .switch_capture{ 
     .switch_inst = switch_block, 
     .prong_index = undefined, 
 }, 

Because the void tag syntax works, I intuitively expected the proposed syntax to work as well. So I think this feature has a certain amount of consistency on its side. However, it also has some significant drawbacks:

  • It makes multiple ways to initialize a union
  • It only works for structs or fixed size arrays

There are alternatives, but I don't like them either:

- The above but also .tag{ value } initializes tag to value

  • Kind of strange, we don't allow braced init anywhere else. Also it's ambiguous for an array type of length 1.

- const u: U = .tag = value;

  • This just drops the .{}. Also it's difficult to read, and it's a new syntactic form which would now be allowed in non-typechecked code.

- const u: U = .tag: value;

  • This is inconsistent, : specifies types in all other situations, not values.

- const u: U = .tag value;

  • This one looks kind of cool: val = .tag .{ .x = 4, .y = 6 };. But we don't use bare word order like this anywhere else in the language. It's probably ambiguous with something.

- const u: U = .tag(init_expr);

  • Ambiguous with a function call, would kind of break the "function calls look like function calls" rule. If we were going to use any of these options, this would be my preference. But I don't think it's needed.

Because of this, I don't think 2B should be accepted. But I wanted to put it out there anyway for completeness.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Oct 12, 2021
@andrewrk andrewrk added this to the 0.10.0 milestone Oct 12, 2021
@ifreund
Member

ifreund commented Oct 12, 2021

As another data point, 2A would significantly reduce the verbosity of using the type-safe binding for libwayland's wl_listener I came up with. The status quo code is https://github.com/ifreund/river/blob/4b94b9c0839eb75e5a8d3eeaf26e85e516a89015/river/XdgToplevel.zig#L47-L64

destroy: wl.Listener(*wlr.XdgSurface) = wl.Listener(*wlr.XdgSurface).init(handleDestroy),
map: wl.Listener(*wlr.XdgSurface) = wl.Listener(*wlr.XdgSurface).init(handleMap),
unmap: wl.Listener(*wlr.XdgSurface) = wl.Listener(*wlr.XdgSurface).init(handleUnmap),
new_popup: wl.Listener(*wlr.XdgPopup) = wl.Listener(*wlr.XdgPopup).init(handleNewPopup),
new_subsurface: wl.Listener(*wlr.Subsurface) = wl.Listener(*wlr.Subsurface).init(handleNewSubsurface),

// Listeners that are only active while the view is mapped
ack_configure: wl.Listener(*wlr.XdgSurface.Configure) =
    wl.Listener(*wlr.XdgSurface.Configure).init(handleAckConfigure),
commit: wl.Listener(*wlr.Surface) = wl.Listener(*wlr.Surface).init(handleCommit),
request_fullscreen: wl.Listener(*wlr.XdgToplevel.event.SetFullscreen) =
    wl.Listener(*wlr.XdgToplevel.event.SetFullscreen).init(handleRequestFullscreen),
request_move: wl.Listener(*wlr.XdgToplevel.event.Move) =
    wl.Listener(*wlr.XdgToplevel.event.Move).init(handleRequestMove),
request_resize: wl.Listener(*wlr.XdgToplevel.event.Resize) =
    wl.Listener(*wlr.XdgToplevel.event.Resize).init(handleRequestResize),
set_title: wl.Listener(*wlr.XdgSurface) = wl.Listener(*wlr.XdgSurface).init(handleSetTitle),
set_app_id: wl.Listener(*wlr.XdgSurface) = wl.Listener(*wlr.XdgSurface).init(handleSetAppId),

@g-w1
Contributor

g-w1 commented Oct 13, 2021

I like the idea, a few technical questions:
Can an enum literal coerce to a decl literal:

const z = .z;
var a: SomeType = z;

not sure if this should work, although the syntax is the same.

What is @TypeOf(.f())? Does this even work like enum literals?
Thanks

@InKryption
Contributor

Although it is not explicitly stated, I assume this would also extend to declarations in enums. If so, it could enable slightly better cohesion between normal enum values and enum value "aliases", which are common in C APIs like Vulkan. E.g.

const std = @import("std");
const VkResult = enum(i32) {
    VK_SUCCESS = 0,
    VK_NOT_READY = 1,
    VK_TIMEOUT = 2,
    VK_EVENT_SET = 3,
    VK_EVENT_RESET = 4,
    VK_INCOMPLETE = 5,
    VK_ERROR_OUT_OF_HOST_MEMORY = -1,
    VK_ERROR_OUT_OF_DEVICE_MEMORY = -2,
    VK_ERROR_INITIALIZATION_FAILED = -3,
    VK_ERROR_DEVICE_LOST = -4,
    VK_ERROR_MEMORY_MAP_FAILED = -5,
    VK_ERROR_LAYER_NOT_PRESENT = -6,
    VK_ERROR_EXTENSION_NOT_PRESENT = -7,
    VK_ERROR_FEATURE_NOT_PRESENT = -8,
    VK_ERROR_INCOMPATIBLE_DRIVER = -9,
    VK_ERROR_TOO_MANY_OBJECTS = -10,
    VK_ERROR_FORMAT_NOT_SUPPORTED = -11,
    VK_ERROR_FRAGMENTED_POOL = -12,
    VK_ERROR_UNKNOWN = -13,
    VK_ERROR_OUT_OF_POOL_MEMORY = -1000069000,
    VK_ERROR_INVALID_EXTERNAL_HANDLE = -1000072003,
    VK_ERROR_FRAGMENTATION = -1000161000,
    VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS = -1000257000,
    VK_ERROR_SURFACE_LOST_KHR = -1000000000,
    VK_ERROR_NATIVE_WINDOW_IN_USE_KHR = -1000000001,
    VK_SUBOPTIMAL_KHR = 1000001003,
    VK_ERROR_OUT_OF_DATE_KHR = -1000001004,
    VK_ERROR_INCOMPATIBLE_DISPLAY_KHR = -1000003001,
    VK_ERROR_VALIDATION_FAILED_EXT = -1000011001,
    VK_ERROR_INVALID_SHADER_NV = -1000012000,
    VK_ERROR_INVALID_DRM_FORMAT_MODIFIER_PLANE_LAYOUT_EXT = -1000158000,
    VK_ERROR_NOT_PERMITTED_EXT = -1000174001,
    VK_ERROR_FULL_SCREEN_EXCLUSIVE_MODE_LOST_EXT = -1000255000,
    VK_THREAD_IDLE_KHR = 1000268000,
    VK_THREAD_DONE_KHR = 1000268001,
    VK_OPERATION_DEFERRED_KHR = 1000268002,
    VK_OPERATION_NOT_DEFERRED_KHR = 1000268003,
    VK_PIPELINE_COMPILE_REQUIRED_EXT = 1000297000,
    pub const VK_ERROR_OUT_OF_POOL_MEMORY_KHR: @This() = .VK_ERROR_OUT_OF_POOL_MEMORY;
    pub const VK_ERROR_INVALID_EXTERNAL_HANDLE_KHR: @This() = .VK_ERROR_INVALID_EXTERNAL_HANDLE;
    pub const VK_ERROR_FRAGMENTATION_EXT: @This() = .VK_ERROR_FRAGMENTATION;
    pub const VK_ERROR_INVALID_DEVICE_ADDRESS_EXT: @This() = .VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS;
    pub const VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS_KHR: @This() = .VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS;
    pub const VK_ERROR_PIPELINE_COMPILE_REQUIRED_EXT: @This() = .VK_PIPELINE_COMPILE_REQUIRED_EXT;
};
test {
    const expected_result: VkResult = .VK_ERROR_PIPELINE_COMPILE_REQUIRED_EXT;
    const actual_result: VkResult = .VK_PIPELINE_COMPILE_REQUIRED_EXT;
    try std.testing.expectEqual(expected_result, actual_result);
}

But then, would this also enable switching on the alias literals? Obviously, switching on the actual value and the alias would be a compile error, the same as having duplicate switch cases. But it's worth considering.

@SpexGuy
Contributor Author

SpexGuy commented Oct 13, 2021

Can an enum literal coerce to a decl literal?

This proposal renames enum literals to decl literals, so they are already one and the same. A decl literal will resolve to an enum value when coerced to an enum type with a matching field name.

What is @TypeOf(.f())? Does this even work like enum literals?

This is a compile error because there is no result type to bind .f to. @TypeOf(@as(T, .f())) is well formed, and is a compile error if T.f() does not return something that coerces to T.

would this also enable switching on the alias literals?

Yes, for the same reason switching on enum literals works now. The switch target expressions are coerced to the target type (which would now see aliases), they are all calculated at compile time, and then they are checked for uniqueness and exhaustiveness. So if decl literals are implemented they should "just work" in switches with no extra work.

@Snektron
Collaborator

Snektron commented Oct 17, 2021

However, it also has some significant drawbacks:

I would also argue that the syntax

const tag = if (comptime something()) .Fn else .BoundFn;
t = tag{

kind of undermines the explicit-ness required by a regular union assignment.

One other idea would be to extend this to also the left-hand side of struct literals:

const tag = if (comptime something()) .Fn else .BoundFn;
t = .{tag = .{...}};

but i don't think that is very nice either.

@ikskuh
Contributor

ikskuh commented Oct 17, 2021

I think that while this proposal is a good idea per se, i really dislike it for my vision of the Zig language.

I find the code examples using this proposal way less clear and require way more knowledge of the whole codebase.

Imho, this proposal contradicts

  • Only one obvious way to do things.
  • Favor reading code over writing code.
  • Communicate intent precisely.

.work_queue = .init(gpa) does not convey at all what type work_queue is. Is it a std.TailQueue? A thread safe queue? Just a linked list? Ring buffer?

Status quo syntax usually answers these questions by looking at the same source file, as i either have a qualified name init (.work_queue = ArrayList(u8).init(gpa)) or i have a variable that has a specified type.

I feel like this is a step away from the goals of Zig

@SpexGuy
Contributor Author

SpexGuy commented Oct 18, 2021

@MasterQ32 Do you feel the same way about part 1, or is it just 2A that bothers you?

Personally, I could take or leave 2A, it's really only fixing a minor inconvenience. But I think part 1 is really important for the ergonomics of any library that makes heavy use of bit flags.

In defense of 2A though, IMO the information that is removed is not relevant to the locations from which it has been removed. Specifically, the language makes no attempt to specify field types on struct initializers. For example:

some_struct = .{ // no indication of the type of some_struct
  .num_items = 4, // no idea what kind of number this is
  .dispatch_type = .disable, // no idea what enum this is
  .extra_value = getExtra(), // no indication of the type of extra_value.  In case of coercion, it may not even match the return type of getExtra().
  .tag = util.makeTag(), // false positive: .tag is not of type `util`.
};

There are many existing cases where actual types are not apparent within struct initializers, so I don't feel that it's inconsistent or a major loss to drop that information in a few more cases. It's not really important to the initializer whether or not work_queue is a TailQueue or a thread safe queue, if the initialization code is the same either way.

However, your example of status quo also does not necessarily indicate what type work_queue is. It only tells you in what namespace to look up the function which creates it. init functions are not required to return @This(). The required code to indicate the type of work_queue is

.work_queue = @as(ArrayList(u8), ArrayList(u8).init(gpa));

If indicating the type is your goal, this proposal makes that easier:

.work_queue = @as(ArrayList(u8), .init(gpa));

@ikskuh
Contributor

ikskuh commented Oct 20, 2021

@MasterQ32 Do you feel the same way about part 1, or is it just 2A that bothers you?

After typing some code with this proposal in mind, i have found some very nice use cases indeed for Part 1. So i revisit my thing and say: Let's do Part 1 for sure

@ghost

ghost commented Feb 11, 2022

This doesn’t quite sit right with me. It requires a result type for resolution, but decls aren’t required to be of the same type as their container, so it only works in the specific case where they happen to be coercible. Say I write some code using this feature, then I refactor so the decl is now of a different type; now every occurrence of this feature is broken, so I rewrite into type-on-the-right style. If, later, I decide this was a bad change for whatever reason, all of the existing code still works, and I see no reason to change them as they are not any less neat like that. This “hysteresis” of correct style under refactorings just seems very un-Zig-like to me.

And while I’m all about ergonomics enforcing correctness, the cases presented in 1 and 2A (ignoring 2B because I agree with your assessment of it) are only necessarily unergonomic if a type-on-the-left style is enforced, which it is not in general in Zig. That said, I am curious to hear xq’s cases, and I’m not certain how to address ifreund’s case (though I’m not certain it needs to be addressed — I’m not aware of any possible sketchy shortcuts). Just that to me, this feature would need to meet a very high threshold of utility to be justified.

@ghost

ghost commented Feb 11, 2022

There are two ways to look at this feature:

  1. Consistently allow elision of the namespace if it can be inferred unambiguously. Previously this only worked for fields and variants.
  2. A thin layer of syntax sugar for the special case where a struct has constants and methods of the same type as itself.

From the first point of view, we should probably do it; from the second, we probably shouldn't. Personally, I still lean towards 1., but not very strongly.

The "style hysteresis" issue is an interesting point too, though it strikes me as somewhat theoretical. How often would such a change really happen? In the intended use cases (constructor methods and pre-configured structs) the type is what it is by construction. One particular exception I can think of is changing the type of a constructor from T to !T. I'm not sure we'd want to allow var x: T = try .init() to cover that case 😄.

@mlugg
Member

mlugg commented Mar 21, 2024

One thing I'd like to pick up on here is the support for late binding coercion, rather than this being a specific syntax form. Is this support really necessary? I feel that in any scenario where you could use this, the intention would start becoming unclear, to the point where it would be preferable to write the code as you would today, i.e. probably with @field.

In that case, rather than changing how the type currently known as @Type(.EnumLiteral) works, we can simply special-case the syntax form. There is precedent for this in field calls, where foo.bar() performs a field call but (foo.bar)() does not; and in nested pointer casts, where @ptrCast(@alignCast(x)) works despite there not technically being an intermediary result type.

Thus, I propose to simply special-case the syntactic form of enum (/decl) literals when a result type is provided. So:

const S = struct {
    z: u32,
    const default_val: S = .{ .z = 123 };
    fn init() S {
        return default_val;
    }
};

// this works
const x0: S = .default_val;

// this does not
const x_lit = .default_val;
const x1: S = x_lit;

// this works
const y0: S = .init();

// this does not
const y_lit = .init;
const y1: S = y_lit();

I feel that this is a rather less sweeping language change: it's a relatively simple extension of our general preference for type annotations over explicitly typed expressions (const x: S = .{} over const x = S{}, and const y: u32 = @intCast(z) over const y = @as(u32, @intCast(z))). As a nice bonus, this is also super easy to implement - just playing around for fun, I'm pretty sure I've got it working with 4 files changed, 61 insertions(+), 3 deletions(-).

@DerpMcDerp

If this proposal gets implemented, you can special case "constructor" syntax for functions with no name so we can get rid of the .init convention:

const Vec2 = struct {
	x: f32,
	y: f32,
	pub inline fn @""(x: f32, y: f32) Vec2 {
		return .{ .x = x, .y = y };
	}
};

const pt1 = Vec2.(1, 2);
const pt2: Vec2 = .(3, 4);

@silversquirl
Contributor

@DerpMcDerp that seems like rather confusing syntax, and I don't really see any benefit over .init(1, 2) personally.

Also, sometimes you have multiple init functions (eg. ArrayList's init() and initCapacity()) and having an explicit "constructor syntax" in the language would make it more difficult to properly name those variants.

@InKryption
Contributor

InKryption commented Apr 15, 2024

Zero-length identifiers are currently illegal, so that wouldn't even work in status quo. Even if they were to become legal again, this proposal does not propose any changes to the rules around accessing declarations, so your example would most likely end up as:

const Vec2 = struct {
    x: f32,
    y: f32,
    pub inline fn @""(x: f32, y: f32) Vec2 {
        return .{ .x = x, .y = y };
    }
};

const pt1 = Vec2.@""(1, 2);
const pt2: Vec2 = .@""(3, 4);

@silversquirl
Contributor

A few examples of how real Zig APIs (mostly from std) could benefit from this proposal:

  • mach-gpu uses wrappers for several WebGPU structs to convert slices to pointer + len. This leads to unwieldy gpu.VertexState.init(.{}), gpu.FragmentState.init(.{}), etc. which would be much more readable as .init(.{}), as the types are obvious from the usage.
  • std.ArrayListUnmanaged provides default initialization via default field values, however this doesn't actually make sense. Initializing either items or capacity while leaving the other as the default is always a bug.
    It would make more sense for ArrayListUnmanaged to provide a default decl for initialization, and leave the fields without default values, but that's too unwieldy to be worth it currently. With decl literals, that change would result in a safer and more explicit API.
  • struct {
        set: std.StaticBitSet(10) = std.StaticBitSet(10).initFull(),
    }
    (as well as many other types that require similar duplication, such as std.enums.EnumArray and friends)
  • Similar to the previous example:
    const S = struct {
        a: std.HashMap(K, V, Context, 80),
    };
    const s: S = .{
        .a = std.HashMap(K, V, Context, 80).init(allocator),
    };
    As well as many other similar cases.

All of these cases would result in improved clarity and less possibility for bugs from this proposal, similar to how .{ ... } syntax helps readability by avoiding duplication.
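The ArrayListUnmanaged point above (an initialization decl instead of default field values) can be sketched as follows, assuming a compiler with decl literal support. `Buffer` and `Owner` are hypothetical types written for this example. They are not the std containers, and `empty` is just the name chosen here for the preset.

```zig
const std = @import("std");

// Hypothetical container. The fields have no default values, so a
// partial initializer such as `.{ .capacity = 4 }` is a compile error
// instead of a silent bug.
const Buffer = struct {
    items: []const u8,
    capacity: usize,

    // The single sanctioned "empty" state lives in a declaration.
    pub const empty: Buffer = .{ .items = &.{}, .capacity = 0 };
};

// Hypothetical owner type: the field default uses the preset without
// repeating the type name.
const Owner = struct {
    buf: Buffer = .empty,
};

test "default decl instead of default field values" {
    const o: Owner = .{};
    try std.testing.expectEqual(@as(usize, 0), o.buf.items.len);
    try std.testing.expectEqual(@as(usize, 0), o.buf.capacity);
}
```

The design choice is that `.{}` stops being a valid `Buffer`, while `.empty` stays about as short to write.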

@mlugg
Member

mlugg commented May 3, 2024

In addition, here are a few cases I hit frequently in the compiler itself.

  • As a common, but not very significant, one: the fromInterned methods on Type and Value are used quite frequently. return Value.fromInterned(ip_index) is a little worse than return .fromInterned(ip_index); the specific type is not relevant and just adds noise to the line. This is even more true in function call arguments, since a small amount of visual noise can really hurt a line's legibility in some cases. (With an API restructure that we ought to do, this point will also apply to Air.Ref.fromInterned.)

  • A more significant example is InternPool.Alignment. When operating on log2 units of alignment, we want to use Alignment.fromLog2Units to convert back, for documentation purposes, even though it is just -- in the doc comment's words -- a "glorified @enumFromInt". However, it is in practice very common to write @enumFromInt. The reason is very simple: this is the path of least resistance. Alignment.fromLog2Units, or even InternPool.Alignment.fromLog2Units, is pretty unwieldy to type when the result type is already known (e.g. when we're writing a struct initialization), so contributors will often just write @enumFromInt without a second thought. This leads to confusion when reading such code. Decl literals would mean no path were easier than another, so would encourage more correct API usage (i.e. .foo_align = .fromLog2Units(log_align)) by eliminating needless verbosity.

Here's another point (no longer related to the compiler implementation): this solves an issue which could return if we bring back return value RLS paired with pinned types. Let me elaborate.

Return value RLS (whose fate is undecided) alongside #7769 (accepted) gives us the ability to directly return a value whose memory address is a "part" of its value (e.g. it embeds self-referential pointers). The typical use case for this would probably be init functions on such structs. Today, these would be constructed with const x = MyPinnedFoo.init();. However, there's a problem with this line: it actually can't apply RLS! Today, stack allocations with inferred types can not apply RLS to the initialization expression [1]. So, this would emit a compile error, since the pinned struct value is copied. You would have to write const x: MyPinnedFoo = MyPinnedFoo.init();, which I think everyone would agree is a bit ugly. OTOH, with this proposal implemented and in widespread usage, the author would probably have written const x: MyPinnedFoo = .init(); in the first place, sidestepping the problem entirely! This is an example of how type annotations are a fundamentally good thing: when a variable's definition is going to mention its own type, it is desirable for it to be in a type annotation rather than the init expression whenever reasonably possible. This is both easier and faster for the compiler to solve, and more versatile at the language level.

[1]: this is a necessary restriction, because PTR on the final alloc type (in the case of multiple initialization peers) could result in RLS demanding an impossible coercion. For instance, in the code const x = if (runtime_condition) init_u16() else init_u32(), the constant x is assigned type u32, so no correct u16 result pointer can be provided to init_u16.
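The peer type resolution behaviour in the footnote can be checked with a small test. This is only a sketch of the typing rule, not of RLS itself. `init_u16` and `init_u32` are given trivial bodies here as an assumption, and the condition is forced to be runtime-known by taking the variable's address.

```zig
const std = @import("std");

// Trivial bodies, assumed for illustration only.
fn init_u16() u16 {
    return 1;
}

fn init_u32() u32 {
    return 2;
}

test "inferred type is the peer-resolved type" {
    // Taking the address keeps the condition runtime-known, so both
    // branches are analyzed and take part in peer type resolution.
    var runtime_condition = true;
    _ = &runtime_condition;

    const x = if (runtime_condition) init_u16() else init_u32();

    // x is a u32 even when the u16 branch is the one taken, so the
    // u16 result has to be widened rather than written in place.
    try std.testing.expect(@TypeOf(x) == u32);
    try std.testing.expectEqual(@as(u32, 1), x);
}
```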

@mlugg
Member

mlugg commented May 27, 2024

In a language design meeting on 2024/05/18 between myself, @andrewrk, @SpexGuy, and @thejoshwolfe, the variant of this proposal described in my previous comment was accepted, but with an important caveat. I'll slap on the label for now, but please read this comment for details.

The main point of contention for acceptance of this proposal was the issue of namespace ambiguity in enums (and unions). Today, the syntax .foo always refers to an enum/union field; this is important because such containers can have fields and declarations with the same names (e.g. literally every field of std.builtin.Type). Under this proposal, it could also refer to the declaration, and indeed, that would be more consistent with MyType.foo, which "prefers" decls over enum fields; but that would make the field effectively inaccessible (you could technically access it by coercing an untyped enum literal, but that's plainly ridiculous). On the other hand, making decl literals prefer fields would preserve flexibility, but is a quite inconsistent and subtle rule.

As such, the following conclusion was reached. We will attempt to introduce the following rule: fields and declarations of a container share a namespace, and thus cannot have the same name. It is as yet undecided if this rule will apply to all containers, or just unions/enums (i.e. it is undecided whether it will apply to structs). If this rule can be introduced without decreasing the quality of Zig code in the wild, then this rule will become a part of the language specification, and the Decl Literals proposal is accepted (for now). Otherwise, this proposal may have to be tweaked or thrown out entirely.
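
To make the rule concrete, here is a minimal sketch of how field and declaration literals coexist when names do not clash. The Mode type and its members are hypothetical, and the sketch assumes a compiler with decl literals implemented (Zig 0.14 or later).

```zig
const std = @import("std");

// Hypothetical enum used only for illustration.
const Mode = enum {
    fast,
    safe,

    // The name differs from every field, so `.default` can only
    // refer to this declaration.
    pub const default: Mode = .safe;

    // Under the shared-namespace rule, a declaration named like a field,
    // e.g. `pub const fast: Mode = .safe;`, would be rejected.
};

test "field literal and decl literal" {
    const a: Mode = .fast; // enum field
    const b: Mode = .default; // declaration, found through the result type
    try std.testing.expect(a == .fast);
    try std.testing.expect(b == .safe);
}
```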

Now that I've outlined the state of affairs, I'll quickly summarize the other points discussed during the meeting. I unfortunately didn't take notes, so this is all from memory (if any attendees want to add any points I forgot to mention, by all means do!).

  • Type inference (of any form) typically improves writeability and refactorability, but can hinder readability, depending on the context.
  • Common use case of writing x: T = .foo is fine; readability concern is centered around function parameters (what does foo(.init(...)) initialize?) and struct fields (what does .{ .foo = .default } default?).
  • The particular concern here is increased mental load; you need to keep more of the codebase in your head, or otherwise jump around to check types more often.
    • Counterpoint: that's already the case today for enum literals, and isn't a problem in practice. It's very rare to see e.g. fooBar(std.builtin.Endian.little) over fooBar(.little).
  • Sometimes readability / understandability can be better, because the new syntax tells you more. const x: T = .foo gives you strictly more information than const x = T.foo, because in the former case you know that foo is declared on T and has type T.
  • Under this proposal, you lose the ability to know for sure that something is an enum field without checking the source.
  • Simple potential concrete use case: switch (endian) { .native => ..., .foreign => ... }.
    • This could seem a bit weird to a Zig programmer today, but is completely intuitive to read!
  • Case study: can someone unfamiliar with the feature understand what code using it does? (we recruited a helper)
    • Answer: yes!
  • This proposal helps with defaulting struct fields, which is going to become very important when std.ArrayListUnmanaged etc are converted to follow the guidance on default struct field values!
    • my_field: ArrayListUnmanaged(u32) = ArrayListUnmanaged(u32).empty is annoying to read
    • ...and this can get much worse for e.g. AutoArrayHashMapUnmanaged with a complex context (see fields on InternPool in the compiler)
  • Accepting this proposal brings Zig as a whole over its complexity budget; we should aim to simplify the language elsewhere.
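
The struct-field defaulting point can be sketched as follows. Buffer, State, and their members are hypothetical stand-ins (not std.ArrayListUnmanaged), chosen so the example does not depend on any particular standard library version; it assumes Zig 0.14 or later.

```zig
const std = @import("std");

// Hypothetical container type with an `empty` preset and an `init` function.
fn Buffer(comptime T: type) type {
    return struct {
        items: []const T,

        pub const empty: @This() = .{ .items = &.{} };

        pub fn init(items: []const T) @This() {
            return .{ .items = items };
        }
    };
}

const State = struct {
    // Without decl literals: `pending: Buffer(u32) = Buffer(u32).empty,`
    pending: Buffer(u32) = .empty,
};

test "decl literals as field defaults and initializers" {
    const s: State = .{};
    try std.testing.expectEqual(@as(usize, 0), s.pending.items.len);

    // The type annotation supplies the namespace in which `init` is looked up.
    const b: Buffer(u32) = .init(&.{ 1, 2, 3 });
    try std.testing.expectEqual(@as(usize, 3), b.items.len);
}
```

The type is written once, in the annotation, which is the readability gain the bullet about ArrayListUnmanaged describes.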

@mlugg added the "accepted" label May 27, 2024
@expikr
Contributor

expikr commented Jun 5, 2024

First question: if namespace exclusion is to apply to structs, will shadowing rules apply too?

const Vec3 = struct {
    x: f32, y: f32, z: f32, // occupies Vec3.x, Vec3.y, Vec3.z
    pub fn init(x: f32, y: f32, z: f32) Vec3 { // error: function parameter shadows declaration
        return .{ .x=x, .y=y, .z=z };
    }
};

Second question: how much of the decl usecase can be achieved by instead providing a @Here() builtin that makes the result location type available in-situ?

my_field: ArrayListUnmanaged(u32) = @Here().empty
startDelorean(@Here().time_travel);
var array: std.ArrayList(u32) = @Here().init(allocator);

mlugg added a commit to mlugg/zig that referenced this issue Aug 28, 2024
The compiler actually doesn't need any functional changes for this: Sema
does reification based on the tag indices of `std.builtin.Type` already!
So, no zig1.wasm update is necessary.

This change is necessary to disallow name clashes between fields and
decls on a type, which is a prerequisite of ziglang#9938.
mlugg added a commit to mlugg/zig that referenced this issue Aug 28, 2024
This is a mini-proposal which is accepted as part of ziglang#9938.

The compiler and standard library need some changes to obey this rule.
mlugg added a commit to mlugg/zig that referenced this issue Aug 31, 2024
This is mainly useful in conjunction with Decl Literals (ziglang#9938).

Resolves: ziglang#19777
@andrewrk andrewrk modified the milestones: 0.16.0, 0.14.0 Aug 31, 2024
@mlugg mlugg closed this as completed in 6e3e23a Sep 1, 2024