
Proposal: Limit scope of undefined values (and introduce automatic coercion on function boundaries) #3328

Closed
rohlem opened this issue Sep 27, 2019 · 16 comments

@rohlem
Contributor

rohlem commented Sep 27, 2019

This was originally going to be a response to #3325, but it's more general in scope. I then thought I'd comment on #1947, but that is more of a tracking issue than an open question. So I think a new issue is the best fit; feel free to close it if it seems too similar.
If you're unfamiliar, please read #1947 for details on undefined values (and when they trigger potentially-checked UB); also see the test scenario fixed by 34d02f2 for an example of how undefined is currently "unsafe". (Explanation: the cast (?*T)(*T) is implemented as a no-op, so if the *T value is undefined, its bit pattern may be 0, which makes the resulting ?*T null.)
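
To make the linked scenario concrete, here is a minimal sketch of the failure mode (hypothetical code, not the exact linked test case):

fn giveUndefinedPtr() *u32 {
    var p: *u32 = undefined; // bit pattern unspecified; may happen to be 0
    return p;
}

fn observe() bool {
    // The *u32 -> ?*u32 coercion is a bit-level no-op, so if the undefined
    // bit pattern is 0, `maybe` compares equal to null, even though the
    // source type promises "never null".
    const maybe: ?*u32 = giveUndefinedPtr();
    return maybe == null;
}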

Motivation

As I see it, types are generally useful to readers in that they expose certain invariants about values:

  • i3 only permits the range [-4, 3]
  • Unions and (status-quo non-open) enums always have exactly one of their members "active" / as the value (=> no else prong needed in a switch that covers all cases)
  • Non-optional (non-allowzero) pointers are never null.
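
As a small illustration of code relying on these invariants (sketch only):

const E = enum { a, b, c };

fn describe(e: E) u8 {
    // Exhaustive switch: the enum invariant means no else prong is needed.
    return switch (e) {
        E.a => 0,
        E.b => 1,
        E.c => 2,
    };
}

fn readThrough(p: *u32) u32 {
    // Non-optional pointer: the type promises it is never null,
    // so no null check is needed before dereferencing.
    return p.*;
}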

Within this formal model of program behaviour, non-conforming and undefined values (further shortened to "invalid" values) don't really exist (at least I'd prefer if they didn't). But as an optimization, and as an ergonomic feature for uncovering program logic errors, it still makes sense for Zig to expose the concept of undefined, never-to-be-used values.
The issue, in my eyes, is this optimization leaking into the program logic, for example by (probably non-exhaustive):

  • dereferencing pointers to "invalid" values (that violate type invariants - I'm actually uncertain if this might already be checked in some cases)
  • unchecked function arguments (evidently, see test case linked above)

Opinion/Discussion

IMO it would be helpful to have (more granular) well-defined boundaries establishing where exactly these invariants hold.
The precedent set by @ptrCast-ing 0 to a non-optional *T, and Zig's Zen of handling errors and edge cases, suggest that a detected invalid value should trigger a safety panic in safe build modes (/ in scopes with runtime safety enabled).

Side note: The type system might even currently be an obstacle, as I wouldn't be surprised if e.g. a comparison of a u4 value against 15 (u4 > 15) were optimized out, even though at runtime such an invalid value can manifest. Maybe a builtin @isValid, or some std.meta function for handling these edge cases, could improve the situation.
If we accept this as "wontfix", then that implies that an extern fn(x: ?*c_void) taking an optional pointer could be safer than an extern fn(x: *c_void) taking a non-optional one, because it can check for null. (Although, with pointers, there are still plenty of "valid"-looking addresses in unmapped memory that you cannot check so easily.)

Proposal

As functions seem to me a sensible unit of program analysis, I propose introducing, for every function call (and return), a "coercion" step that replaces every "invalid" argument value (and return value) with a "valid" fallback (which is still semantically undefined, as far as analysis and runtime safety checks are concerned). (I'm undecided about inline functions; it seems likely that an argument could be made either way.)
Thereby function calls become a "validation boundary"; every function can assume its arguments are valid. (And every caller can assume the return value is valid.)
(Note: Doing this at every function call might sound too expensive, but since every function's inputs are now always "valid", it effectively only applies to functions which can introduce invalid values, mainly via keyword undefined, and only to affected variables, which partially reflects an already-accepted proposed step of analysis.)
EDIT3: As @SpexGuy pointed out below, indirection through a pointer also requires this validation step, and if we don't want to track all pointers written to within a function, this means validation on every write of an invalid (because locally-undefined) value through a pointer (i.e. ensuring that we write a valid-but-undefined value).
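
To sketch what the argument-validation step could mean in practice, here is a hand-rolled equivalent for a single pointer argument (under the proposal the compiler would insert this automatically; validated, callee and caller are made-up names):

fn validated(p: *u32) *u32 {
    // If the locally-undefined bit pattern happens to violate the type
    // invariant (here: it is 0), replace it with a valid-but-still-
    // semantically-undefined placeholder before crossing the call boundary.
    if (@ptrToInt(p) == 0) return @intToPtr(*u32, @alignOf(u32));
    return p;
}

fn callee(p: *u32) void {
    // Under the proposal, callee may assume p satisfies its type invariants.
    _ = p;
}

fn caller() void {
    var p: *u32 = undefined;
    callee(validated(p)); // "coercion"/validation on the call boundary
}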

For Zig functions with C-compatible linkage (exported functions that can be called externally, "from C" / via dynamic linking, etc.), it would be useful to either provide a wrapper function factory in std.meta, or (if that is somehow infeasible) have a language-generated assertion as part of the function prologue, checking that Zig's type invariants hold for the given arguments. Note that the first option would be opt-in, so it could be considered less safe. (The same applies to extern functions' return values, which would require this check on the caller side.)
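
For illustration, a hand-written version of such a checking wrapper might look like this (hypothetical sketch; no such facility exists in std.meta today, and fooImpl is a made-up Zig-side implementation):

fn fooImpl(x: *u32) u32 {
    return x.*;
}

// Exported entry point: checks Zig's type invariants on arguments coming
// from outside (e.g. from C), then forwards to the actual implementation.
export fn foo(x: ?*u32) u32 {
    const checked = x orelse @panic("non-null pointer argument was null");
    return fooImpl(checked);
}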

Following this proposal, (non-extern) callers would guarantee that non-optional pointers are non-null, solving #3325 for function arguments (although I don't mean to argue against a nonnull_or_undefined attribute, for release build modes). Local variables and local "sources of undefined values" still need to be tracked in an analysis-step, as already proposed in #211 .

EDIT2: While this proposal essentially talks about "safety" features (as correct programs should never observably access undefined values), I do think the idea behind it might still end up influencing optimized build modes.
Specifically, it may be interesting to opt-in to this behaviour via a function-scope builtin like a more specialized @setRuntimeSafety, e.g. @enableArgumentValidityCheck.
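
Usage could then look roughly like this (the builtin name is the hypothetical one suggested above):

fn checkArgs(p: *u32) u32 {
    @enableArgumentValidityCheck(true); // hypothetical, analogous to @setRuntimeSafety
    return p.*;
}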

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 27, 2019
@andrewrk andrewrk added this to the 0.6.0 milestone Sep 27, 2019
@kyle-github

How would @isValid look? Something like this?

fn @isValid(t: type, v: var) bool
if (@isValid(i4, 83)) { ... } else { ... }

@SpexGuy
Contributor

SpexGuy commented Oct 22, 2020

This definitely needs to be solved, but I'm not sure that validating on function boundaries is enough. Consider this code:

const Pun = extern union {
    integer: usize,
    pointer: *u32,
};

fn bar(p: *Pun) void {
    p.* = Pun{ .integer = 0 };
}

fn foo() *u32 {
    var p: Pun = undefined;
    bar(&p);
    return p.pointer;
}

Here, in bar, the compiler knows that p is in the integer state at the end, so can validate and see that 0 is valid. But in foo, the compiler does not know the state of the union, so cannot perform this conversion. It's not until later when the value is read that the compiler can do the fixup.

Thinking more about pointers, it's also important to consider that they can have defined values but would cause a segfault if dereferenced. So it's never safe to perform a fixup through a pointer unless the code is already reading from the pointer. Because of this, the compiler can't just generate checks at every function boundary but must generate them at every read. In some cases, if the optimizer can prove that the memory wasn't altered, it may be able to remove some of these checks. But in the general case that's difficult because of pointer aliasing.

The only "leak" I'm aware of for this problem is the (?*T)(*T) cast mentioned in the OP, so it would maybe be sufficient to only add these checks when performing this specific conversion.

But we also have the problem of type punning between *T and ?*T through @ptrCast or extern union. If the nullability check for that is well defined, then we have an even bigger problem of requiring the check on both reads and writes!

const Pun = extern union {
    nullable: ?*u32,
    nonnull: *u32,
};

fn foo() void {
    var p: Pun = .{ .nonnull = undefined };
    if (p.nullable == null) {
        // is this well defined?
    }
}

@rohlem
Contributor Author

rohlem commented Oct 23, 2020

(quick recap, probably unnecessary)
In my interpretation, undefined is a semantically distinct state from all defined values (which #211 proposes to track as runtime safety mechanism).
In practice (currently, and without this runtime safety mechanism), undefined values may lie outside of the valid states of the type of the expression that produced them, which violates the invariants the type system promises developers.
This can happen via a reinterpreting operation, i.e. when:

  • bitcasting, via @bitcast, @ptrCast, extern union, packed union, or other means
  • interacting with memory (that we can't prove we correctly initialized ourselves) across ABI boundaries:
    • on the heap, via indirection or global variables
    • on the stack, e.g. when C code calls a function (or returns from one) passing NULL where the Zig declaration uses a non-nullable *T parameter or return type (see the sketch right after this list).
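
For the last point, a minimal illustration (takePtr is a made-up name):

// Zig side: non-nullable parameter, exported with C linkage.
export fn takePtr(p: *u32) void {
    // If the C caller does takePtr(NULL), p already violates its type
    // invariant here, even though no Zig code ever wrote an invalid value.
    _ = p;
}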

(actual comment)
I assume by fixup you mean the validation step, @SpexGuy; see below. To reiterate: in general, in my opinion there is nothing the compiler could/should do about invalid values, unless they come from a local undefined. They violate the assertions of the type system, so if Zig were to check for them, the only default handling strategy that seems appropriate to me would be to @panic. We (I) only want to coerce undefined values to be in a valid range, so that developers can rely on the type system for interfaces.
As you point out, plenty of valid non-null pointers still point to garbage, so if your undefined pointer value ends up being 0, you wouldn't want a reinterpreting bitcast to replace it with a different value.
Note that this is different from non-reinterpreting conversions, which need to preserve the invariants of the type system (or @panic, for inputs outside of their domain).

So, to clarify my proposal with my understanding of / suggestion in your examples:

  • Example 1:
    • foo locally knows that p is at first undefined. To pass &p to bar, it has to validate p (choose a value that would be a valid .pointer, or a valid .integer state).
    • (bar locally knows that p points to a valid state, but not which state (.integer or .pointer). If it were to read either state, this could be a reinterpreting operation, see below.)
    • foo locally knows that bar leaves p either unmodified, or in a valid state (but not in which state (.integer or .pointer)).
    • Accessing p.pointer is a reinterpreting operation. Zig should imo check the type invariants (here: cannot be null) and @panic if they are violated.
      (Note: This is just Zig upholding its type system, not technically part of my proposal. Not doing this just means we allow undefined behaviour of having a non-null pointer hold 0.)
    • At the end of the function, foo needs to validate that what it returns is a *u32. (Note: Because it validated all sources of invalid values (= sources of undefined values), this is a no-op.)
    • (If foo didn't call bar, p.pointer would be undefined, and it would have to validate it by replacing a 0 by a different undefined value.)
  • Example 2:
    • foo locally knows that p is in the state .nonnull with undefined value, validation not strictly required.
    • Reading p.nullable is a reinterpreting operation. Zig currently guarantees that the bit-pattern of null is 0. Because we didn't require immediate validation, we allow the invalid state of {.nonnull = 0}, which would lead to .nullable being null.
    • If we instead moved validation boundaries to expressions, we would have forced the initialization to write a valid value, and != null would have been preserved.
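
A rough sketch of what expression-level validation would amount to in Example 2 (hand-written here for illustration; the placeholder address is arbitrary):

const std = @import("std");

const Pun = extern union {
    nullable: ?*u32,
    nonnull: *u32,
};

fn foo() void {
    // Status quo: the undefined bit pattern may be 0, so p.nullable may read as null.
    var p: Pun = .{ .nonnull = undefined };

    // Validation at the initialization expression would mean writing some
    // valid (non-zero) placeholder instead of leaving the bits unspecified:
    p = Pun{ .nonnull = @intToPtr(*u32, @alignOf(u32)) };

    // The reinterpreting read now preserves the "non-null" invariant:
    std.debug.assert(p.nullable != null);
}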

Note that whatever validation boundary we choose, be it functions or expressions, basically sets the baseline performance gain of undefined over concrete values before optimizations.

While writing this I got a clear (and status-quo-actionable) idea for @isValid that doesn't depend on this proposal; follow-up issue incoming.

@cajw1

cajw1 commented Oct 24, 2020

Magic values being generated at magic times is too difficult a mental model for me. Doesn't ZenOfZig imply I should be the one generating the values, explicitly, where and when I want?

I see this as two different problems, both solvable at compile-time:

  • extern union
  • undefined

@SpexGuy, re your extern union example, instead of runtime magic, what about:

=> The compiler will not allow primitive types with value constraints (not all possible bit permutations that fit into @sizeOf(T) are valid for T) in an extern union. Unless (maybe, less KISS) the compiler can prove that no illegal observable values can result from any use thereof, because each field's legal values are legal values for each other field.

(Maybe the C to Zig transpiler needs to generate extern unions that would violate such constraints, but maybe they are fine for all C types?)

So no extern union { nonzeroptr: *u32, zeroisgood: usize, } because 0 is not allowed for nonzeroptr but is allowed for zeroisgood.

What about extern union { getme: u2, gotcha: u8, } (... both have @sizeOf 1, but all reads of u2 seem to be bitmasked, so all observable values of u2 should be fine)?

As for undefined, I fail when trying to think of it as a runtime value.

What is the value of a var of type T in state undefined?

(Currently) it does not even obey the loosest possible interpretation of value:
"any value, ie any permutation of on and off bits that can fit in the space allocated for var of @sizeof(T), even if that value is semantically illegal for T".

I think this is most concisely demonstrated by @vi's example from #1947 (comment): (undefined & ~undefined != 0) can be true at runtime.

Or less concisely by this snippet in --release-fast:

const std = @import("std");
pub fn main() !void {
    var v1 : u32 = 2;
    std.debug.warn("defined v1 is = {}\n",.{v1});
    var v2: i32 = undefined;
    if(v2 >= 1){
        std.debug.warn("undefined v2 is >=1: {}\n",.{v2});
    } else if(v2 < 1){
        std.debug.warn("undefined v2 is < 1: {}\n",.{v2});
    } else { 
        std.debug.warn("undefined v2 is ???: {}\n",.{v2});
    } 
}
// ==> using release-fast this prints: undefined v2 is ???: 2
// ==> using release-safe this prints: undefined v2 is < 1: -1431655766

So when thought of as a value, undefined has unusual properties, with complicated rules required when used (see #1947 and the nonnull_or_undefined issue #3325).

Why not redefine undefined to be a pure compile-time constraint:

  • you are only allowed to assign undefined to a var when declaring it;
  • a var declared as undefined cannot be used before it has been defined;
  • if the compiler cannot prove at compile-time that every use of var is preceded by a define of var, it is a compile-time error.

This would make it easier to reason about, and would allow a simplification of the language: you could remove = undefined altogether:

var undefinedvar : u32 = undefined; // with proposed change: not allowed to use until you have defined it
var uninitializedvar: u32; // currently illegal, why not allow and require a define before use (provable at compile-time)

Maybe a new concept "any legal value for that type, I don't care which one, let the compiler choose the fastest one right now" requiring a new keyword anylegalval would be helpful:

var initializedvar : i32 = anylegalval; // compiler, initialize this for me as fast as possible

The compiler knows that for an i32 it is a no-op to choose anylegalval, but for a u32 it needs to ensure >= 0; for a non-null pointer it can either refuse to do anything (although any non-zero value is legal, even if dereferencing it would segfault), ...

Et voilà, KISS, no nonnull_or_undefined, and no hidden runtime magic.

What about (from the documentation):

var answer: T = undefined;
return if (@addWithOverflow(T, a, b, &answer)) error.Overflow else answer;

Ok, maybe the undefined keyword is still needed in one place: to declare parameters to be uninitialized (for the compiler to check that if they are used within the function, they must be defined first). So @addWithOverflow's last parameter would be declared along the lines of fn addWithOverflow(-, -, -, *undefined T).

@andrewrk
Member

andrewrk commented Oct 26, 2020

Here is a real-world bug caused by this status-quo design flaw of Zig: #6790

You can see that this tempts people to hack around the issue: #6825

The Zig code is:

export fn stage2_fetch_file(
    stage1: *Module,
    path_ptr: [*]const u8,
    path_len: usize,
    result_len: *usize,
) ?[*]const u8 {
    const comp = @intToPtr(*Compilation, stage1.userdata);
    const file_path = path_ptr[0..path_len];
    const max_file_size = std.math.maxInt(u32);
    const contents = comp.stage1_cache_manifest.addFilePostFetch(file_path, max_file_size) catch return null;
    result_len.* = contents.len;
    return contents.ptr;
}

The C++ code is:

Error file_fetch(CodeGen *g, Buf *resolved_path, Buf *contents_buf) {
    size_t len;
    const char *contents = stage2_fetch_file(&g->stage1, buf_ptr(resolved_path), buf_len(resolved_path), &len);
    if (contents == nullptr)
        return ErrorFileNotFound;
    buf_init_from_mem(contents_buf, contents, len);
    return ErrorNone;
}

Can you spot the bug? When Zig loads the file and the length is 0, it uses undefined for the pointer. So when the length is 0, the C++ code is branching on an undefined value, which is actually Undefined Behavior, the worst kind of footgun.

@tadeokondrak
Contributor

When zig loads the file, and the length is 0, it uses undefined for the pointer.

This itself might be the problem here: we could make reading .ptr when .len is 0 not UB, requiring the compiler to use an invalid but not-undefined value for .ptr.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 26, 2020
@frmdstryr
Contributor

Well, I knew it was a hack lol, but I'm still confused about what is going on here, if someone wants to clarify...

Isn't Zig using null and not undefined here? I thought null was a known value of 0?

@rohlem
Contributor Author

rohlem commented Oct 26, 2020

@frmdstryr When comp.stage1_cache_manifest.addFilePostFetch returns a 0-length slice (without error), that slice (stored in contents) has an undefined .ptr value that is returned.

@andrewrk
Member

@frmdstryr oh I knew you knew it was a hack, I'm sorry - I didn't mean to throw shade. I'm saying it's not your fault, the language has a flaw.

@cajw1

cajw1 commented Oct 26, 2020

Beginner's reaction: why are slices not defined to have optional pointers (slice.ptr: ?*T) if that is a legal concept for zero-length slices? I guess it would increase the friction of using them.

@rohlem
Contributor Author

rohlem commented Oct 26, 2020

@cajw1 One reason I can think of is that we already have a way to express a slice without contents, by setting its .len to 0. If its .ptr can also be null, then which should you check before using it? Always checking both requires an extra operation to get the same information you can currently get by just looking at .len.
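
A small sketch of the status-quo convention (a single .len check suffices; firstByteOrZero is a made-up helper):

fn firstByteOrZero(s: []const u8) u8 {
    // Status quo: checking .len alone covers the "no contents" case; if .ptr
    // could additionally be null, callers would have to decide whether to
    // check s.ptr, s.len, or both.
    if (s.len == 0) return 0;
    return s[0];
}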

@cajw1

cajw1 commented Oct 26, 2020

@rohlem I think you would only have to check one, either .len or .ptr, before using the slice, the advantage would be that you wouldn't be allowed to shoot yourself in the foot because either choice would work. (Of course with a non-optional ptr, if the compiler warned you when using an invalid .ptr, you would learn to only use .len...)

As an aside: maybe there is a more general problem/temptation to use undefined instead of either invalid or optional.

@frmdstryr
Contributor

I should probably take this onto IRC instead of derailing this thread... but in #1947 it says:

Note that for slicing operator, if the start is 0, the pointer value is not read, which makes this expression defined: (([*]u8)(undefined))[0..0]

So if I understand this correctly... the reason the zero-length slice has an undefined .ptr is that the compiler decides it can just optimize out reading the pointer and leave whatever was at the result location in memory before?

Can this occur if running with debug/release-safe?

@rohlem
Contributor Author

rohlem commented Oct 26, 2020

@frmdstryr Without reading the implementation of .addFilePostFetch, I would assume that for returning an empty slice no memory is allocated, so .ptr has no "real" value. (I don't know whether such a slicing expression is involved in it or not.)
So the Zig function would also have to return null in the case of .len == 0, so that the C++ code can safely check against nullptr before checking the returned length.
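
A minimal, self-contained sketch of that guard (fetchBytesExample and loadContents are made-up names; whether the C++ caller then distinguishes "empty file" from "file not found" is a separate question):

// Simplified stand-in for stage2_fetch_file: never hand an undefined .ptr
// across the ABI boundary.
export fn fetchBytesExample(result_len: *usize) ?[*]const u8 {
    const contents: []const u8 = loadContents();
    result_len.* = contents.len;
    if (contents.len == 0) return null; // guard: an empty slice's .ptr may be undefined
    return contents.ptr;
}

fn loadContents() []const u8 {
    return ""; // placeholder for the real file contents
}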

@SpexGuy
Contributor

SpexGuy commented Nov 5, 2020

The decision made in #1831 holds here as well. In terms of guarantees in release-fast mode, undefined is a special value, and coercion from T to ?T yields undefined if the input is undefined. However, #63 and #211 track debug checks for undefined to help catch these problems. I think that makes this issue a duplicate of those. Please reopen if you disagree.

@SpexGuy SpexGuy closed this as completed Nov 5, 2020
@rohlem
Contributor Author

rohlem commented Nov 5, 2020

I don't see it as a duplicate, but I agree that the decision as I understand it rejects this proposal, so I'm fine with it being closed.

@andrewrk andrewrk removed this from the 0.8.0 milestone Nov 6, 2020
@andrewrk andrewrk added this to the 0.7.0 milestone Nov 6, 2020