add another way of passing non-copyable things as parameters #733

andrewrk · 2018-02-02T03:58:01Z

Right now you have to pass structs and other non-copyable things as a pointer. As a consolation prize, we allow implicitly casting T to &const T. This causes problems in a number of ways. One example is with generics, where it seems like you could take a T parameter but really you would want a &const T, and then if you try to look at the type, it's a pointer. Or if you use a var parameter, zig has to automatically pass a &const T instead which is counter-intuitive.

This proposal is to allow a function like this:

const Foo = struct {
    x: i32,
    y: i32,
};

fn callee(foo: Foo) {
}

test "aoeu" {
    callee(Foo {.x = 1, .y = 2});
}

For lack of a better name, I'm going to call this "passing arguments by const reference".

To the callee, foo looks like a value, the same as if you did const foo = Foo {.x = 1, .y = 2}; on the first line of the body of the function. However, it is not a by-value parameter, because the caller does not necessarily make a copy. Zig would be free to pass the parameter by value, perhaps if it is smaller than some number of bytes, or pass it by reference. The caller guarantees that the bytes of foo will not change for the lifetime of the function.

This allows Zig to use the "noalias" optimization on the const reference pointer.

Zig could figure out that this should be a compile error:

const Foo = struct {
    x: i32,
    y: i32,
};

fn callee(foo: Foo, another: &Foo) {
    another.x += 1;
}

test "aoeu" {
    var foo = Foo {.x = 1, .y = 2};
    callee(foo, &foo);
}

Zig knows that arg 1 will be passed as a pointer under the hood, and it knows that arg 2 is the same pointer. So another.x += 1 violates the noalias rules.

However this could be obfuscated enough that zig could not figure out this causes undefined behavior, so runtime safety is in order.

What this looks like is, in a function with args passed this way - we note the pointer ranges of the const references and noalias arguments. That is, the pointer of the argument + sizeOf(the_struct_or_arg).

When a mutable pointer is indexed, we look at the address of the indexed element, and check if it's in the range of any of the noalias ranges. If it is, that's a runtime safety panic, because there shouldn't be a mutable pointer to that address.

With this proposal, I think we should remove the implicit cast of T to &const T. Almost every place that currently uses &const T should be changed to use this new arg passing convention.

Related: #670

The text was updated successfully, but these errors were encountered:

zesterer · 2018-02-02T04:09:25Z

Makes a lot of sense.

The original syntax takes a problem that should be dealt with by the optimiser (namely that data should be passed by reference for performance reasons) and places it in the hands of the developer, someone that is - in most cases - woefully unequipped to handle such optimisation decisions.

The new syntax leaves it up to the optimiser what it thinks it should or shouldn't pass by reference. At the end of the day, the optimizer knows best. There's no loss of semantic expressiveness either since a &const T is no more useful than a T to the callee. In addition, it also simplifies the syntax.

andrewrk · 2018-02-11T04:25:38Z

This is a reversal of #103

The example from there is:

var global_array: [1024]i32 = undefined;

fn dont_modify_param(param: [1024]i32) i32 {
    const y = param[0];
    global_array[0] = 0;
    const x = param[0];
    return x + y;
}

test "uh oh" {
    dont_modify_param(global_array);
}

how we fix this is, every write through a global pointer will have the safety check. So on the line global_array[0] = 0; we would get a panic "write modifies const parameter".

if you wanted to be able to do this, you'd have to make param an explicit pointer:

var global_array: [1024]i32 = undefined;

fn dont_modify_param(param: &const [1024]i32) i32 {
    const y = (*param)[0];
    global_array[0] = 0;
    const x = (*param)[0];
    return x + y;
}

test "uh oh" {
    dont_modify_param(&global_array);
}

Now this will trigger a similar safety check, panic "write invalidates alias rules". By default, modifying global variables is not allowed to modify any data directly pointed to by parameters.

To fix it:

fn dont_modify_param(alias param: &const [1024]i32) i32 {

Now param can alias anything, including global variables and other parameters. So writing through a global variable, or another parameter could potentially change the bytes param points to.

This is all for optimization. Gotta go fast.

This is better done after some pointer reform changes I plan to make. I'm about to open that issue.

andrewrk · 2018-04-29T18:22:55Z

Once this is done, I'd like to change std.mem.Allocator.create:

    fn create(self: &Allocator, init: var) !&@typeOf(init) {
        const T = @typeOf(init);
        if (@sizeOf(T) == 0) return &{};
        const slice = try self.alloc(T, 1);
        const ptr = &slice[0];
        *ptr = init;
        return ptr;
    }

So you would use it like this:

      const defer_node = try arena.create(ast.Node.Defer {
          .base = ast.Node {
              .id = ast.Node.Id.Defer,
              .comments = comments,
          },
          .defer_token = token,
          .kind = switch (token.id) {
              Token.Id.Keyword_defer => ast.Node.Defer.Kind.Unconditional,
              Token.Id.Keyword_errdefer => ast.Node.Defer.Kind.Error,
              else => unreachable,
          },
          .expr = undefined,
      });

If you wanted to leave the value undefined:

const ptr = try allocator.create(Foo(undefined));

This more closely matches variable declaration, where you explicitly have to use undefined to avoid initializing memory.

andrewrk · 2018-06-13T00:28:42Z

These aliasing safety checks are tricky. Here's an example of code that runtime safety should catch:

const Point = struct {
    x: i32,
    y: i32,
};
const Indirect = struct {
    p: *Point,
};

test "alias" {
    var pt = Point {
        .x = 1,
        .y = 2,
    };
    var indirect = Indirect {
        .p = &pt,
    };
    foo(&pt.x, &indirect);
}

fn foo(a: *i32, b: *Indirect) void {
    const v1 = a.*;
    b.p.x += 1;
    const v2 = a.*;
    // aliasing rules assert that v1 == v2
    // however v1 == 1 and v2 == 2
}

The expected crash message would be: memory store aliases parameter 'a'

I think this example means that we would have to put the runtime safety checks before every store instruction. So the performance penalty for this safety in every function is O(number_of_pointer_params * number_of_store_instructions). Might be costly, but worth a try.

0joshuaolson1 · 2018-06-13T02:51:47Z

Would ReleaseFast remove the checks?

andrewrk · 2018-06-13T03:29:41Z

Yes, and add noalias llvm property to the parameters, making the above example undefined behavior.

andrewrk · 2018-06-13T19:56:58Z

Here's an adversarial example against the current proposal:

var global_array: [1024]i32 = undefined;

fn dont_modify_param(param: [1024]i32) i32 {
    const y = param[0];
    innocent_function();
    const x = param[0];
    // we would expect x == y
    return x + y;
}

fn innocent_function() void {
    global_array[0] = 2;
}

test "uh oh" {
    global_array[0] = 1;
    dont_modify_param(global_array);
}

here's the proposed debug safety:

var global_array: [1024]i32 = undefined;
var __zig_safety_global_array: usize = 0; // generated safety global

fn dont_modify_param(param: [1024]i32) i32 {

    const y = param[0];
    innocent_function();
    const x = param[0];
    // we would expect x == y
    return x + y;
}

fn innocent_function() void {
    {
        // generated safety code for modifying global variables
        const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Xchg, @maxValue(usize));
        if (prev != 0)
            @panic("data race on global variable 'global_array'");
    }

    global_array[0] = 2;

    {
        // generated safety code
        const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Xchg, 0);
        if (prev != @maxValue(usize))
            @panic("data race on global variable 'global_array'");
    }
}

test "uh oh" {
    global_array[0] = 1;

    {
        // generated safety code, because we passed a mutable global variable to a
        // parameter value that isn't supposed to be mutated
        const prev = @atomicRmw(usize, &__zig_safety_global_array, builtin.AtomicRmwOp.Add, 1);
        if (prev == @maxValue(usize))
            @panic("data race on global variable 'global_array'");
    }

    dont_modify_param(global_array);

    {
        // generated safety code
        const prev = @atomicRmw(isize, &__zig_safety_global_array, builtin.AtomicRmwOp.Add, -1);
        if (prev == @maxValue(usize))
            @panic("data race on global variable 'global_array'");
    }
}

This is neat because in addition to solving the above example, it also detects data races of globals.

OK but then here's an adversarial example against this:

var global_array: [1024]i32 = undefined;

fn dont_modify_param(param: [1024]i32) i32 {
    const y = param[0];
    innocent_function();
    const x = param[0];
    // we would expect x == y
    return x + y;
}

fn innocent_function() void {
    modify(&global_array[0]);
}

fn modify(p: *i32) void {
    p.* = 2;
}

test "uh oh" {
    global_array[0] = 1;
    dont_modify_param(global_array);
}

So, debug safety is yet to be solved for this. And there are more adversarial examples to come up with, for example ones that use pointer references in structs rather than global variables.

andrewrk · 2018-06-14T02:34:16Z

I created #1108 for the rather ambitious "noalias on everything" ideas here. What's left in this issue is the ability to "pass by non-copying value", which is when the parameter type is an aggregate type such as a struct or array. In this case, the semantics are:

The value acts like a value (not a pointer)
The parameter cannot be modified (same as other parameters)
- However, one is not guaranteed that the value does not change.
It is not guaranteed whether it is passed by reference or copy. One must not rely on either way.
- In practice, for performance reasons, small types will be passed by copy and large types will be passed by reference.
If you take the address of such a parameter, the lifetime of the address is the lifetime of the function call (because it might be the address of a copy)

For async functions the semantics are the same. Async functions have to copy information into their coroutine frame, so a "pass by non-copying value" usually does a copy, making the data available for the lifetime of the async function. However, if Zig can prove that the lifetime of the coroutine frame is eclipsed by the lifetime of the calling function or async function, then Zig my pass a reference.

And with that, there is no safety to add to Zig; instead there are new semantics for Zig programmers to be aware of. However these semantics are designed to be what C programmers already expect from most of the code they already write. So this issue is reasonably easy to solve now.

closes #733

ghost · 2018-06-14T08:25:32Z

The parameter cannot be modified (same as other parameters)
However, one is not guaranteed that the value does not change.

This sounds a bit confusing to me.
Do you mean „it just acts as a normal value, you can modify it, but the „source“ is not modified, only a copy of it is modified“?

alexnask · 2018-06-14T08:29:12Z

@monouser7dig
What I got from it is that in a multi-threaded context, the value of the argument may change while the function is executed (aka you don't always get a copy, you could still be pointing to the original) but I may be wrong.

ghost · 2018-06-14T08:43:05Z

@alexnask that makes sense, thanks

tiehuis · 2018-06-14T11:11:29Z

Just want to clarify my thoughts and understanding. Please correct me if I'm overlooking something.

fn add(a: MyStruct) MyStruct;

This is equivalent to C, except for the fact that &a is not guaranteed to be a new local copy in the function. Further, since this may be a reference a user is required to ensure that there will be no concurrent modification at the call-site for the passed in value. If modification is a possibility, then they need to handle perform an explicit memcpy at the call site.

On when to use *const MyStruct vs. MyStruct, we can be a bit more lenient than we usually would in C. In general, if a type is never modified and we always create new instances when a transformation needs to occur then prefer copy, otherwise prefer pointers.

An example in std where we would want copy semantics would be in the current Complex type. Since the value is considered immutable once constructed and we always create new copies. It is like a primitive type.

An example where we don't want this is in the big.Int type. It is arguable that it is okay for the const parameters but consistency probably suggests to keep the current *const T behavior.

Maybe a loose rule like the following is useful in the common case?

Don't mix *T and T parameters for a type in its 'method' calls.
If you need to mix, prefer using *T and *const T.

ghost · 2018-06-14T11:17:41Z

How I think about values and references it would make sense if it would actually behave just like a value in c, you can modify it but this modification is only local.
To eliminate the need for cpp style const& zig could automatically pass the value as a pointer and thus eliminate the copy as long as the value is not modified in the function.

But detecting IF the value is modified in the function was one of the problem to begin with right? so maybe this is not possible.

andrewrk · 2018-06-14T16:07:05Z

Added breaking label back because actually it does change how var args are passed.

@tiehuis

Your understanding is correct.

fn add(a: MyStruct) MyStruct;

There's one more thing I want to do with this, and that's the well defined copy elision (#287) but in this case probably solved with named return value. It would look something like this:

fn add(a: MyStruct) result: MyStruct;

At the callsite if you did this: x = add(x) then inside add it is likely that: &a == &result

Idea here being that zig never does memcpy for you. (copying a pointer-sized integer because the struct is only 1 field is a different matter). This would also protect pointer fields that point to other fields - if we had some way to annotate that they must not be copied.

On when to use *const MyStruct vs. MyStruct, we can be a bit more lenient than we usually would in C. In general, if a type is never modified and we always create new instances when a transformation needs to occur then prefer copy, otherwise prefer pointers.

I'm thinking that the process would look like: prefer MyStruct, unless you need a guarantee that the value is in fact a pointer, in which case use *const MyStruct. The former gives the fewest guarantees to the user and the most flexibility to the compiler, so it would be preferred if the user did not need strong guarantees.

Regarding Complex, if we had the return value copy elision semantics, when copies would be made would be entirely up to the callsite. For example,

var a = Complex(f32).new(5, 3); // fn new() directly initializes a 
var b = Complex(f32).new(2, 7); // fn new() directly initializes b
var c = a.add(b); // fn add() directly initializes c
b = b.add(c); // no copies! add() directly reads and writes b
b = b.add(c).add(c); // hidden stack variable for the intermediate expression, but the final add() writes directly to b

As for big int, can you elaborate more on what you mean?

Currently in the big int code, mutable pointers are used for the return values, but it is my vision to achieve the above semantics with return value copy elision. So there would not really be any *Int arguments. Almost exclusively Int. divFloor would be:

pub fn divFloor(a: Int, b: Int) !result: tuple{Int,Int}

The result of this being that we could have similar code to the above Complex code. At the callsite of divFloor:

    var a = try Int.initSet(al, -3);
    defer a.deinit();

    var b = try Int.initSet(al, -5);
    defer b.deinit();

    var q, var r = try Int.divFloor(a, b);
    defer q.deinit();
    defer r.deinit();

Here you would want to avoid intermediate expressions since they would produce resource leaks.

Anyway this depends on 2 features we don't have yet. So what Big Int would really look like is just the *const Int turned into Int.

What would be the goal of the loose rule you proposed?

andrewrk · 2018-06-14T17:40:44Z

One more note: @Hejsil points out in #670 that the problem with automatically passing structs as *const (status quo before this proposal) is that you cannot distinguish between a type which is a pointer, or a type which has been automatically converted to a pointer. The Big Int implementation currently relies on this behavior, but I think it has to change.

closes #733

kristate · 2018-06-20T12:52:46Z

In std/mem.zig on line 42, there is a comment referencing this issue:

    /// Call destroy with the result
    /// TODO once #733 is solved, this will replace create
    pub fn construct(self: *Allocator, init: var) Error!*@typeOf(init) {

@andrewrk, has this been solved? what is the next step here?

…iglang#733

BarabasGitHub · 2018-09-09T16:40:58Z

Regarding Complex, if we had the return value copy elision semantics, when copies would be made would be entirely up to the callsite. For example,

var a = Complex(f32).new(5, 3); // fn new() directly initializes a 
var b = Complex(f32).new(2, 7); // fn new() directly initializes b
var c = a.add(b); // fn add() directly initializes c
b = b.add(c); // no copies! add() directly reads and writes b
b = b.add(c).add(c); // hidden stack variable for the intermediate expression, but the final a

Removing the copy here actually doesn't achieve anything I think.

b = b.add(c); // no copies! add() directly reads and writes b

It might actually be better to make a copy and treat the assigned b here as a new variable. (see https://en.wikipedia.org/wiki/Static_single_assignment_form)

This really only is an issue for large structs which you probably shouldn't be returning from a function if you don't want them to be copied anyway imho. It's also not like it can copy large chunks of memory easily like in C++ where you have hidden copy operations with constructors and assignment operators.

And when the variable to assign to is in some struct (or a global) it'll (at least in the case of complex) have to read all the values from memory, do some operations and write them back anyway, so it doesn't matter a whole lot if it makes a 'copy'.

Oh and maybe it matters when you pass structs by reference (as an optimization), but then you shouldn't be modifying the input when creating the result.

So all in all I fail to see the use of copy elision in Zig to be honest.

andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Feb 2, 2018

andrewrk added this to the 0.2.0 milestone Feb 2, 2018

andrewrk added the accepted This proposal is planned. label Feb 3, 2018

andrewrk added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Feb 10, 2018

andrewrk mentioned this issue Feb 11, 2018

Pointer Reform #770

Closed

andrewrk mentioned this issue Feb 26, 2018

allow implicit cast from S to ?&const S #785

Merged

andrewrk modified the milestones: 0.2.0, 0.3.0 Feb 28, 2018

This was referenced Mar 7, 2018

remove workaround on windows which disables byval parameters #536

Closed

error return traces missing entries from async functions #821

Closed

tiehuis mentioned this issue Apr 24, 2018

Add initial complex-number support #949

Merged

andrewrk referenced this issue in tiehuis/zig-bn May 10, 2018

Perform alias check in copy itself and nop if so

0e87b58

andrewrk mentioned this issue Jun 13, 2018

noalias on all parameters by default (with debug safety); ability to specify mayalias #1108

Open

andrewrk removed the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Jun 14, 2018

andrewrk mentioned this issue Jun 14, 2018

allow passing by non-copying value #1109

Merged

andrewrk added a commit that referenced this issue Jun 14, 2018

allow passing by non-copying value

66f5f77

closes #733

andrewrk added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Jun 14, 2018

andrewrk added a commit that referenced this issue Jun 16, 2018

allow passing by non-copying value

59b3dc8

closes #733

andrewrk closed this as completed in #1109 Jun 17, 2018

andrewrk mentioned this issue Jun 17, 2018

Discussion: The standard way of writing generic functions #670

Closed

kristate added a commit to kristate/zig that referenced this issue Jun 20, 2018

std.mem: remove allocator create in favor of construct; ref ziglang#733

457c0f0

kristate added a commit to kristate/zig that referenced this issue Jun 20, 2018

std: update stdlib to match updated allocator create signature; ref z…

71db8df

…iglang#733

andrewrk mentioned this issue Sep 3, 2018

remove implicit cast from T to *const T #1465

Closed

tgschultz mentioned this issue Oct 18, 2018

Add std.meta #1662

Merged

andrewrk mentioned this issue Nov 19, 2018

Allow implicit cast from *T to *const ?*T #836

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add another way of passing non-copyable things as parameters #733

add another way of passing non-copyable things as parameters #733

andrewrk commented Feb 2, 2018

zesterer commented Feb 2, 2018 •

edited

andrewrk commented Feb 11, 2018

andrewrk commented Apr 29, 2018 •

edited

andrewrk commented Jun 13, 2018

0joshuaolson1 commented Jun 13, 2018

andrewrk commented Jun 13, 2018

andrewrk commented Jun 13, 2018 •

edited

andrewrk commented Jun 14, 2018

ghost commented Jun 14, 2018

alexnask commented Jun 14, 2018

ghost commented Jun 14, 2018

tiehuis commented Jun 14, 2018

ghost commented Jun 14, 2018

andrewrk commented Jun 14, 2018 •

edited

andrewrk commented Jun 14, 2018

kristate commented Jun 20, 2018

BarabasGitHub commented Sep 9, 2018

add another way of passing non-copyable things as parameters #733

add another way of passing non-copyable things as parameters #733

Comments

andrewrk commented Feb 2, 2018

zesterer commented Feb 2, 2018 • edited

andrewrk commented Feb 11, 2018

andrewrk commented Apr 29, 2018 • edited

andrewrk commented Jun 13, 2018

0joshuaolson1 commented Jun 13, 2018

andrewrk commented Jun 13, 2018

andrewrk commented Jun 13, 2018 • edited

andrewrk commented Jun 14, 2018

ghost commented Jun 14, 2018

alexnask commented Jun 14, 2018

ghost commented Jun 14, 2018

tiehuis commented Jun 14, 2018

ghost commented Jun 14, 2018

andrewrk commented Jun 14, 2018 • edited

andrewrk commented Jun 14, 2018

kristate commented Jun 20, 2018

BarabasGitHub commented Sep 9, 2018

zesterer commented Feb 2, 2018 •

edited

andrewrk commented Apr 29, 2018 •

edited

andrewrk commented Jun 13, 2018 •

edited

andrewrk commented Jun 14, 2018 •

edited