Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Payload Capture Syntax and Semantics #15086

Open
ZoloZithro opened this issue Mar 26, 2023 · 9 comments
Open

Payload Capture Syntax and Semantics #15086

ZoloZithro opened this issue Mar 26, 2023 · 9 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@ZoloZithro
Copy link

ZoloZithro commented Mar 26, 2023

Zig Version

0.11.0-dev.2194+626a75bbc

Steps to Reproduce and Observed Behavior

Example 1:

const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    var xs: [3]u8 = .{ 0, 1, 2 };
    for (xs, 0..) |x, idx| {
        try expect(x == xs[idx]);     // Loop invariant for all 'for loops'.
        x +=1;
        try expect(x == xs[idx]);     // Loop invariant for all 'for loops'.
        try expect(xs[idx] == idx+1); // Example specific expect.
    }
}

Doesn't compile, reporting this error:

for.zig:8:11: error: cannot assign to constant
        x +=1;
        ~~^~~

Example 2:

const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    var b: ?i32 = 5;
    if (b) |value| {
        value += 1;
    }
    try expect(b.? == 6);
}

Doesn't compile, reporting this error:

if.zig:7:15: error: cannot assign to constant
        value += 1;
        ~~~~~~^~~~

Example 3:

// Modified example from Zig Learn: https://ziglearn.org/chapter-1/#unions
const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    const Tagged = union(enum) { a: u8, b: f32, c: bool };
    var value = Tagged{ .a = 10 };
    switch (value) {
        .a => |byte| byte += 1,
        .b => |float| float *= 2,
        .c => |b| b = !b,
    }
    try expect(value.a == 11);
}

Doesn't compile, reporting this error:

switch.zig:9:27: error: cannot assign to constant
        .a => |byte| byte += 1,
                     ~~~~~^~~~

Example 4:

// Modified example from:
// https://zig.news/kristoff/easy-interfaces-with-zig-0100-2hc5

const std = @import("std");
const print = std.debug.print;
const expect = std.testing.expect;

const Cat = struct {
    anger_level: usize,
    hp: usize,

    pub fn talk(self: Cat) void {
        print("Cat: meow! (anger lvl {})\n", .{self.anger_level});
    }

    pub fn eat(self: *Cat) void {
        self.hp +=1;
        if (self.anger_level > 0) {
            self.anger_level -= 1;
        }
    }
};

const Dog = struct {
    name: []const u8,
    hp: usize,

    pub fn talk(self: Dog) void {
        print("{s} the dog: bark!\n", .{self.name});
    }

    pub fn eat(self: *Dog) void {
        self.hp +=1;
    }
};

const Snake = struct {
    pub fn slither(self: Snake) void {
        _ = self;
    }
};

const Animal = union(enum) {
    cat: Cat,
    dog: Dog,
    snake: Snake,

    pub fn talk(self: Animal) void {
        switch(self) {
            .snake => print("Ssss~~~\n", .{}),
            inline else => |animal| animal.talk(),
        }
    }

    pub fn eat(self: *Animal) void {
        switch(self.*) {
            .snake => void,
            inline else => |animal| animal.eat(),
        }
    }
};

pub fn main() !void {
    var tom: Animal = Animal { .cat = Cat { .anger_level = 10, .hp = 100 }};
    tom.talk();
    tom.eat();
    tom.talk();
    try expect(tom.cat.anger_level == 9);
}

Doesn't compile, reporting this error:

interface.zig:58:43: error: expected type '*interface.Cat', found '*const interface.Cat'
            inline else => |animal| animal.eat(),
                                    ~~~~~~^~~~
interface.zig:58:43: note: cast discards const qualifier
interface.zig:16:22: note: parameter type declared here
    pub fn eat(self: *Cat) void {
                     ^~~~

Expected Behavior

I expect examples 1, 2, 3, and 4 above to compile and, when run, return from main successfully.

Reasoning

The expected behaviour above is clean, clear in intent, and abides by the 'Zig Zen'; fixing the syntax and semantics of payload capturing.

Everything you capture in Zig is already in scope, which means you already have everything you need without capturing at all. Let's say that again, everything you capture in Zig is already in scope.

This means that anything you capture has a type, a value, and usually has been defined to be either const or var. In the cases when the value you capture hasn't been specified as const or var (i.e. capturing a function's return value) the simple syntax of |var val| can be used, when you want the value to be mutable, and be const by default.

There is no point forcing payload captures to be const, as they currently are, because you already have access to the payload without capturing; this means, regardless of the payload capture being const, if my payload is var I can mutate it. Exploiting this fact, you can get inconsistencies between the payload and the captured value; the loop-invariant present in Example 1 of the observed behaviour above is there for this reason.

Motivation

Previously, there were also issues with payload capturing. However, recent changes outlined in the second note of #14671 introducing this syntax:

for (&xs) |*xs| { ... }

and proposed future changes in #14734 (comment) have really highlighted them.

The core of the problem (and confusion) is that while payload capturing is professed to not be/use a lambda function, it acts like a nerfed lambda function that is jerry-rigged into various language constructs to request a normal capture or a pointer capture.

Yes, this can be made to work like this, but the real question is should it work like this. Since payload capturing doesn't use lambda functions, the current and future proposed syntax reveals compiler details of how the capturing is performed.

Since payload capturing doesn't use lambda functions, it doesn't have to be implemented like that; it is just a new language construct that can have any syntax and semantics we want. If this is the case, then why don't we choose syntax and semantics that better match Zig's philosophy and zen.

In fact, based on my understanding of how payload capturing is being implemented, except for the cases when you capture the return value of a function, there is a significant unnecessary overhead; possibly to the extent that their use will be avoided except where needed.

Hopefully someone can confirm this: every capture declares memory, either the size of the payload or the size of a pointer, and performs a read and a write of that size. This means that when using a for construct to loop over an array of size n, you have an unnecessary overhead of n reads and n writes (I'm assuming and hoping that memory for only one array element [or pointer] is declared and then reused). Every time you capture an optional, instead of a simple comparison you have an additional memory declaration, read, and write.

However, since payload capturing doesn't use lambda functions, we can do better. We can have cleaner syntax, more consistent semantics, and even eliminate some unnecessary overhead.

Idea Illustration

I think there are a least two ways to achieve the expected behaviour outlined; possibly even a combination of the two.

The first possibility heavily makes use of the existing implementation; when performing the payload capture (x) |value|:

  • If x is a var, then capture a pointer to x and perform one level of dereferencing on value automatically.
  • If x is a const, then the compiler will choose whether to make a copy of x or capture a pointer to x and perform one level of dereferencing on value automatically, depending on the size of x and which is more performant.

The second possibility is syntactic sugar, which is commonly used in other languages in moderation, especially for language constructs like switch, match, and case of, which are replaced with if else constructs. This possibility removes the unnecessary overhead entirely where it can be used by making an in-place replacement. The following examples correspond to the observed behaviour examples above.

Example 1.2

const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    var xs: [3]u8 = .{ 0, 1, 2 };
    {
        var idx: usize = 0;
        while (idx < xs.len) : (idx += 1) {
            xs[idx] += 1;
            try expect(xs[idx] == idx+1); // Example specific expect.
        }
    }
}

Example 2.2

const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    var b: ?i32 = 5;
    if (b != null)  {
        b.? += 1;
    }
    try expect(b.? == 6);
}

Example 3.2

// Modified example from Zig Learn: https://ziglearn.org/chapter-1/#unions
const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    const Tagged = union(enum) { a: u8, b: f32, c: bool };
    var value = Tagged{ .a = 10 };
    switch (value) {
        .a => value.a += 1,
        .b => value.b *= 2,
        .c => value.c = !value.c,
    }
    try expect(value.a == 11);
}

Conclusion

I want to start by saying that Zig's developers and community have been doing a great job and that the Ziglings project is wonderfully made; it is entertaining, informative, and has a good pedagogy.

Personally, I have seen many cases where Zig code is easier to read, write, and understand without using capturing than it is when using capturing. I think this is something no one in the Zig community wants to see and that the expected behaviour above conforms perfectly to Zig's principles and zen.

@ZoloZithro ZoloZithro added the bug Observed behavior contradicts documented or intended behavior label Mar 26, 2023
@mlugg
Copy link
Member

mlugg commented Mar 27, 2023

I just want to check you're not missing this: you know every one of your examples can be made to work simply by using a by-ref capture, right? e.g:

var b: ?i32 = 5;
if (b) |*value| {
    value.* += 1;
}
try expect(b.? == 6);

If you're asking why this isn't the default behaviour, IMO the most convincing point is that it's a footgun. Mutating a value is a loop is actually a fairly rare thing to want to do, so it makes sense to make it explicit in the cases where it is your intent. That means it's impossible to accidentally do it in what was meant to be a simple loop only reading data. (Indeed, if value were mutable in a simple loop over b above, I wouldn't as a language user expect it to refer back to the payload of b, but instead to be a separate variable.) This is a similar justification as to why function parameters are immutable.

@sampersand
Copy link

Imo the best way would be if (b) |&value|. The * syntax makes me think we're dereferencing value, not getting the address of it.

@hryx
Copy link
Sponsor Contributor

hryx commented Mar 27, 2023

I'm struggling to identify the problem being addressed in this proposal. There seems to be some assumption here that immutable captures necessarily create in-memory copies. I don't think that's necessarily the case. The compiler should be able to make optimizations to immutable captures similar to those of pass-by-value function parameters. So maybe this is just a matter of clarifying the semantics of captures.

the simple syntax of |var val| can be used

This appears to be the same as #9696

try to consider the expected behavior above without bias, or at least acknowledging the bias

General feedback: Please don't presuppose that the maintainers of any project will be closed-minded about proposals. It adds artificial controversy to your issue which ironically only makes it harder to evaluate the actual content. Zig language improvements are being continuously driven by real-world use cases of people not working on the compiler.

@silversquirl
Copy link
Contributor

I have seen many cases where Zig code is easier to read, write, and understand without using capturing

Could you share some of these examples? Would be good to see some real Zig code that support this proposal

@ZoloZithro
Copy link
Author

ZoloZithro commented Mar 27, 2023

@hryx Your right, it was my mistake to presuppose; I'll try to avoid that in the future. As an aside, I like your proposal in #1717.

I'm struggling to identify the problem being addressed in this proposal. There seems to be some assumption here that immutable captures necessarily create in-memory copies. I don't think that's necessarily the case. The compiler should be able to make optimizations to immutable captures similar to those of pass-by-value function parameters. So maybe this is just a matter of clarifying the semantics of captures.

I think clarifying the semantics would be very helpful, as I said above, I'm hoping someone can confirm the exact behaviour. If there is no overhead to payload capturing then that would great. We would just be left with syntax that's confusing; my expected behaviour seems to make it clear without a downside? Maybe I'm missing the downside?

Mutating a value is a loop is actually a fairly rare thing to want to do, so it makes sense to make it explicit in the cases where it is your intent.

@mlugg Yes, I'm aware every one of my examples can be made to work by using a by-ref capture. If Zig wants to replace c, then it needs to replace c in the field of high-performance computing (e.g. scientific computing) too; in these fields you basically want to mutate your array elements every time you loop. In these fields, you don't just consider reads, writes, and memory declarations, but even your floating point operations; a little overhead can be multiplied billions of times.

If you're asking why this isn't the default behaviour, IMO the most convincing point is that it's a footgun.

I don't see how this is the case, as mentioned above:

This means that anything you capture has a type, a value, and usually has been defined to be either const or var.

I have already declared my data to be var or const, it doesn't make sense to do it again in the same function; as explained above too:

There is no point forcing payload captures to be const, as they currently are, because you already have access to the payload without capturing; this means, regardless of the payload capture being const, if my payload is var I can mutate it.

This is a similar justification as to why function parameters are immutable.

I don't think this applies, payload capturing is just a language construct within a function, it shouldn't behave like a function, lambda, or iterator; besides (possible?) performance issues, this is the issue, the current syntax and semantics is similar to these three things combined.

@ZoloZithro
Copy link
Author

ZoloZithro commented Mar 27, 2023

Could you share some of these examples?

Switches on tagged unions are really good examples; capturing here also decouples the naming of the captured value from the tag that was selected, meaning it is up to the programmer to choose a name (hopefully the same name as matching union field) which adds to the complexity and makes it more difficult to read. Though, this isn't much of an issue when the tag type of a tagged union is inferred. I've seen an example where they didn't match and to read it I had to refer to the tagged union definition; if I remember correctly, it wasn't real world code, but I wouldn't be surprised if I stumble upon it again (after all many examples people learn from aren't real world code).

@silversquirl Is the following easier to read than Example 3.2 above.

/ Modified example from Zig Learn: https://ziglearn.org/chapter-1/#unions
const std = @import("std");
const expect = std.testing.expect;

pub fn main() !void {
    const Tagged = union(enum) { a: u8, b: f32, c: bool };
    var value = Tagged{ .a = 10 };
    switch (value) {
        .a => |*byte| byte.* += 1,
        .b => |*float| float.* *= 2,
        .c => |*b| b.* = !b.*,
    }
    try expect(value.a == 11);
}

If you experiment with switches and ifs by writing a version of code without captures, and compare, it'll likely look cleaner.

Another simple example was already posted by @mlugg above, is the following clearer than Example 2.2 above?

var b: ?i32 = 5;
if (b) |*value| {
    value.* += 1;
}
try expect(b.? == 6);

I read this as "I pointer capture value from b, if b is not null, and increment the value, pointed to by value, by 1". Compared to Example 2.2, which I read as "If b is not null, increment the value of b". I prefer Example 2.2 over this and Example 2; if I had to say: when comparing how well Zig Zen is fulfilled, Example 2.2 would be best.

@kubkon
Copy link
Member

kubkon commented Mar 31, 2023

@ZoloZithro if you could ensure to add Zig syntax highlighting to your snippets, it would make following your examples a lot easier. FWIW you can do it with "```zig" when opening the code snippet in MD.

@silversquirl
Copy link
Contributor

Personally I'd be inclined to say those examples are more readable, or at the very least more explicit (which generally makes things easier to read and understand).
Either way they're very simple examples; I'd be interested in seeing some examples of real Zig code from real Zig projects where you've found it hard to understand what's going on because of the way captures work.

@andrewrk andrewrk added proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. and removed bug Observed behavior contradicts documented or intended behavior labels Apr 10, 2023
@andrewrk andrewrk changed the title Payload Capture Syntax and Semantics: Violating Zig Zen Payload Capture Syntax and Semantics Apr 10, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Apr 10, 2023
@andrewrk
Copy link
Member

I'll let this proposal slide since it's well written (despite the reference to "zig zen" which I find distasteful as an argument for anything).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

7 participants