New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syntax flaw: return type #760

Closed
andrewrk opened this Issue Feb 9, 2018 · 65 comments

Comments

Projects
None yet
@andrewrk
Member

andrewrk commented Feb 9, 2018

Fundamentally we have this ambiguity:

fn foo() A { }

Is this a function that returns A with an empty body? Or is this a function that returns the struct literal A{} and we're about to parse the body?

Another example

fn foo() error {}

Function that returns an error, or empty error set, and we're about to find the function body?

fn foo() error{}!void {
}

The above is not allowed because zig thinks the return value is error and the function has an empty body, and then is surprised by the !. So we have to do this:

fn foo() (error{}!void) {
}

Unfortunate. And the compile errors for forgetting to put a return type on a function point at the wrong place.

Let's come up with a better syntax for return types. All of these issues are potentially on the table:

  • could reintroduce -> or => if it helps.
  • could change the struct literal syntax / error set declaration syntax
  • multiple return values are potentially in zig's future. but also maybe not.
  • could potentially go back to allowing leaving off the return type for void return types.
  • error set inference - which currently looks like this: fn foo() !T {} - is special syntax that belongs to a function definition that marks the function as having an inferred error set. It doesn't necessarily have to be specified as it currently is by leaving off the error set of the ! binary operator.

The syntax change should be satisfying, let's make this the last time everybody has to update all their return type code.

@andrewrk andrewrk added the breaking label Feb 9, 2018

@andrewrk andrewrk added this to the 0.2.0 milestone Feb 9, 2018

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Feb 10, 2018

I'm mentioning it for the sake of completeness, not as a serious suggestion, but the ambiguity can be resolved with arbitrary look-ahead / back-tracking.

For tooling reasons though you probably want zig's grammar to be LL(1) or LR(1).

@hasenj

This comment has been minimized.

hasenj commented Feb 10, 2018

As an aside, I do like having the -> before the return type, just for consistency with similarly recent languages (Swift, Rust) that employ this syntax. It makes reading code a bit easier; I find.

That said, for this issue it seems the problem is figuring out where the type declaration ends and the bod begins. The main flaw is that { is a valid token inside the return type declaration.

Possible solutions I see:

  1. Introduce a token to designate the end of return type and start of body. For example, =, as in:

     const abc = fn(x: X, y: Y) -> Z = {
         // function body
     }
    

    I don't think this is a very good idea. It deviates too much from what everyone is used to in similar languages, and adds a needless clutter to a lot of code just to avoid an ambiguity that only arises some of the time.

  2. Forbid { inside the return type, unless surrounded by parenthesis. For example:

     fn(...) -> (A{}) { the return type is `A{}`
     }
    
      fn(...) -> Z { // the return type is just `Z`
      }
    

    This avoids the ambiguity without requiring the (..) to be always present in the return type, but potentially violates the "only one obvious way of doing things", and can cause "code style" wars, with one camp saying "you should always use parenthesis around your return types" and another one saying "never put parenthesis around return types, and never put { inside the return types; always use type aliases".

    Which brings me to the next idea:

  3. Forbid { inside the return type completely; require the use of type aliases.

    If you need a function return a type that requires { in its definition, the only way to return such a type is to give an alias first. This can be good for readability, and avoids the introduction of having multiple ways of doing the same thing.

I don't have other ideas at the moment.

@raulgrell

This comment has been minimized.

Contributor

raulgrell commented Feb 27, 2018

@hasenj Your first suggestion is pretty ok if we can declutter it a bit. With number 3, readability can be hindered by forcing everything to have a name/alias. Number 2 seems like a good compromise.

As an iteration on the first, using => to denote the start of the function body:

const S = struct {
    someFn: fn(x: X, y: Y) error{} ! Z,
    foo: u8
}

fn bar(x: X, y: Y) error{} ! Z => {
     // function body
}

// Single expression functions!
fn barWhenXEquals2(y: Y)  !Z => try bar(2, y);

const s = S {.someFn = bar, .foo = 0 };
@andrewrk

This comment has been minimized.

Member

andrewrk commented Feb 27, 2018

@raulgrell taking your => suggestion to its natural conclusion:

if x + y > 10 => {

} else {

}

if foo() => |value| {

} else |err| {

}

while it : |node| it = node.next => {

}

while x < 10 : x += 1 => {

}

switch expressions already match this pattern.

I'm not so sure about this. I'd like to explore changing container initialization syntax and container declaration syntax to disambiguate.

@andrewrk

This comment has been minimized.

Member

andrewrk commented Feb 27, 2018

To clarify:

() on if, while, switch, for, etc is solving the same syntactic ambiguity. So for certain we will do one of these two things:

  • require () on return types, just like if and friends
  • solve the syntactic ambiguity a different way, and drop the () on if and friends
@raulgrell

This comment has been minimized.

Contributor

raulgrell commented Feb 27, 2018

I'd love to rid if and friends of their parentheses, so my vote is strongly for the =>.

The curly brackets for container initialization is common in C, C++, Rust, C#. On the other hand, of those, only Rust allows the omission of the parentheses in if statements.

Alternative container initialization methods include:

const S = struct {a: u8, b: u8};

// Some of these would probably mean adding named parameters in function calls

// keyword - it's a familiar keyword, but doesn't really look right in the language
const s = new S(.a = 0, .b = 1);

// builtin type function - more consistent with other language features. other function names
// can include make, create, init, initialize
const s = S.new(.a = 0, .b = 1);

// Callable type with named parameters - builtin types are callable, so there's some
// consistency here.
const s = S(.a = 0, .b = 1);

// We could probably do with compile time enforcement of Capitalized struct names if
// calling with parentheses or square brackets. On the other hand, it would mean
// comptime created types could look awkward

fn V(comptime T: type) type {
 return struct { value: T }
}

const vs = V(.T = S)(.a = 0, .b = 1);

// We could do square brackets on type is initialization
const s = S[.a = 0, .b = 1];
const vs = V(.T = S)[.a = 0, .b = 1];

// named initializers - I'm not sure how we'd define an initializer outside of the struct's scope.
// I do have a slightly more complicated idea that involves enum arrays, but I'll
// get to that later
const T = struct {
      a: u8,
      b: u8,

      new zeroes => { .a = 0, .b = 0 }
}

const t_zeroes = T::zeroes;

// builtin function
const t_ones = @new(T) { .a = 1, .b = 1};
// with named args
const t_ones = @new(T, .a = 1, .b = 1};

@raulgrell

This comment has been minimized.

Contributor

raulgrell commented Feb 27, 2018

This is the cleanest iteration of the above ideas I have found, using the new keyword and changing declaration syntax.

Declaration syntax changes to use parentheses instead of curly brackets, and square brackets instead of parentheses. Initialization requires new (T) { .fields = t }. Somewhat inspired by #661

const S = struct(a: u8, b: u8);
const s = new(S) {.a = 0, .b = 0};

fn G(comptime T: type) type { return struct( a: T ); }
const g = new (G(S)) { .a = s };

const E = enum(A, B);
const TE = enum[u8] (A = 3, B = 4);

error NoMem;
const errorSet = error(NoMem);

const C = union[E] ( A: u8, B: void);
const c = new (C) { .A = 0 };

const U = union[enum] (Number: f64, String: []const u8, None)
const n = new (U) { .Number = 0.99};

// This a function that returns A with an empty body
fn foo() A { }

// This is a function that returns the struct literal and an empty body
fn foo() struct(a: u8) { }

// This is a function that returns the struct literal A() and an empty body
fn foo() A() { }

// This is a function that returns an error with an empty body
fn foo() error {}

// This is a function that returns empty error set and an empty body
fn foo() error() {}

// This is fine
fn foo() error()!void {}

Actually, with these changes to declaration, I think initialization syntax can stay the same. I'm just not sure about the "struct literal A()" bit.

const s = S {.a = 0, .b = 0};
const g = G(S) { .a = s };
const c = C { .A = 0 };
const n = U { .Number = 0.99};

@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Feb 28, 2018

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 5, 2018

Here's my proposal:

  • Add the requirement of . before the { when instantiating types and values
  • Remove the requirement of () for if, while, for, switch, etc.
const Foo = struct.{
    a: i32,
    b: i32,
};

const Id = enum.{
    One,
    Two,
};

const Blah = union(enum).{
    Derp,
    Bar: struct.{
        aoeu: f64,
    },
};

const FileError = error.{
    OutOfMemory,
    AccessDenied,
};

test "aoeu" {
    var array = []u8.{'a', 'b', 'c', 'd'};

    var x = Foo.{
        .a = 1234,
        .b = 5678,
    };

    if x.a == 1234 {
        while true {
            eatStinkyCheese();
        }
    }
}

This fixes all the parsing ambiguity. Now { proceeded by a . is always type/value instantiation, and { not proceeded by a . is always the beginning of a block.

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 5, 2018

It looks a little awkward but I guess that's no reason not to do it. That said, I didn't see discussion on reintroducing ->. Are there drawbacks besides extra typing?

@andrewrk

This comment has been minimized.

Member

andrewrk commented Mar 5, 2018

Are there drawbacks besides extra typing?

Not really. I removed -> because with mandatory return types, it was rendered unnecessary.

-> does not solve the return type syntax problem though. Whether or not we have ->, there is still the fn foo() -> error { } { } ambiguity.

@bnoordhuis

This comment has been minimized.

Member

bnoordhuis commented Mar 5, 2018

Ah right. And mandating parens around the return type makes for bad error messages if you forget them. I see the issue now.

@raulgrell

This comment has been minimized.

Contributor

raulgrell commented Mar 5, 2018

I was wondering whether it would be beneficial to change the built in field/reflection syntax from using the regular scope accessor . to a new operator like ::, ie:

Function.return_type becomes Function::return_type and slice.len becomes slice::len. This signals to the reader that whatever you're accessing is something built-in.

Then keep your proposal and make container instantiation requiring a :: followed by a {

const Blah = union(enum)::{
    Derp,
    Bar: struct::{
        aoeu: f64,
    },
};
@jido

This comment has been minimized.

jido commented Mar 8, 2018

Ok with the proposal, but:

var x = Foo.{
        .a = 1234,
        .b = 5678,
    };

Do we still need dot a, dot b here? We already have a dot after Foo.

@raulgrell

This comment has been minimized.

Contributor

raulgrell commented Mar 30, 2018

Only because this was mentioned in a separate issue, I just wanted to voice my support for your most recent proposal - disregard my :: suggestions as I've come to appreciate the value of the universal . operator.

Regarding @jido's point about the dot (ha, pun), I think they are consistent with the rest of the language.

// Here there is no dot because the identifier is being declared
const Foo = struct.{
    a: i32,
    b: i32,
};

// Here, we would have a dot because the name already exists
var x = Foo.{
    .a = 1234,
    .b = 5678;
};

If we look at consider the enum type inference in switches proposed in #683, they mention of issues with shadowing without the .. This would also be consistent with other parts of the language

const Value = enum(u32).{
    Hundred = 100,
    Thousand = 1000,
    Million = 1000000,
};

const v = Value.Hundred;
switch (v) {
    .Hundred => {},
    .Thousand => {},
    .Million => {}
}

@Hejsil Hejsil referenced this issue Apr 3, 2018

Merged

Self hosted parser completion #873

32 of 33 tasks complete

@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Jul 18, 2018

@Hejsil Hejsil closed this in #1628 Oct 15, 2018

@andrewrk

This comment has been minimized.

Member

andrewrk commented Oct 15, 2018

Thanks for doing that work, @Hejsil .

To those who prefer the previous syntax, I empathize because I also prefer the previous syntax. However parsing ambiguities are simply not acceptable. And so we're stuck with the least worst solution. Proposals for different syntax/solutions are welcome, provided that they are better than status quo and do not introduce any syntax ambiguities.

@allochi

This comment has been minimized.

allochi commented Oct 15, 2018

@andrewrk as I mention in my previous comment in #1628, using the dot here harm readability, as it's used usually as a member access operators, here it's overloaded with two extra meaning define and new.

If I'm getting this right, the problem is to solve the ambiguity of expressions like this

fn foo() A {}
fn foo() error{}!void {}

All what we need is to surround the return type with symbols that are not used somewhere else, how about

fn foo() `A {}`
fn foo() `error{}!void` {}

To clarify:

() on if, while, switch, for, etc is solving the same syntactic ambiguity. So for certain we will do one of these two things:

  • require () on return types, just like if and friends
  • solve the syntactic ambiguity a different way, and drop the () on if and friends

I much prefer to require () on return types and keep it with if, while, switch, for, etc, like it is with C like languages than create a new meaning for .

@andrewrk

This comment has been minimized.

Member

andrewrk commented Oct 15, 2018

@allochi I'm going to amend your proposal and change it from surrounding the return type with backticks to surrounding it with parentheses. I think that's actually a reasonable proposal. If we end up doing named return values to solve #287, then parentheses around return values is especially reasonable.

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 15, 2018

How about in case of ambiguity use a dot?

fn foo() A {} // function returns A
fn foo() A{} {} // compile error
fn foo() A.{} {} // function returns A{} struct literal

Now everything stays the same for the most part, assuming this works

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 15, 2018

Here's how I'm thinking it:

{} is always the body of what's to the left of it.

const A = struct {} // works fine
fn foo() {} // easily detects that return type is missing

The function decl parsing code is special cased to expect the {} to always refer to its body, and it must be escaped if you want to refer to something else.

fn foo() A.{} {}

@andrewrk

This comment has been minimized.

Member

andrewrk commented Oct 18, 2018

I approve of #760 (comment). anyerror is a great name for the primitive type that represents the global error set.

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 18, 2018

So it's "fn foo() anyerror {}"?

For shorter typing you could have "fn foo() error {}" and "const FileOpenError = errorset {};"

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 18, 2018

@UniqueID1 It is encouraged/good style to use error sets more than error, so I think it makes sense to change error.

@Manuzor

This comment has been minimized.

Manuzor commented Oct 18, 2018

If the standalone error is the only case that causes ambiguity then I also think special casing it is a much better solution than anything else mentioned here before. The identifier anyerror states the intent very clearly.

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 18, 2018

@Hejsil Wanting to have (subjectively) uglier syntax for encouragement reasons is a valid viewpoint, if I'm not simplifying your viewpoint too much. I think it's in a way risky. I think it'll encourage people to switch over from the global error set at the wrong time for the wrong reason.

When I say "at the wrong time" I'm channeling my inner Jonathan Blow who has a way of writing really imperfect code at the right time which works for him to stay hyper productive. So if you're not handling errors yet, and you might not for years, then you'll use the global error set to keep yourself from going off on a tangent such as having to define the errors correctly before you'll ever need to handle them.

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

@UniqueID1 I see your concern, but I'm pretty sure inferred error sets cover 99% of this use case (fast prototyping), because in 99% of cases you (or the compiler), knows which errors can be returned from a function.

If you don't care, then just add ! to the return type of all functions that can fail. It's easier to type, and more correct than anyerror.

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 19, 2018

Ahh okay, also anyerror is very descriptive so that's a plus for it. It's actually not bad at all and it tells the newbies exactly what it does.

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

Alright, seems like we might have found a better solution than what was merged in #1628. Any objections should be voiced as soon as possible, as I wanna revert #1628, and start implementing the proposed solution.

One downside to this is that that we have to reject #1659.

@thejoshwolfe

This comment has been minimized.

Member

thejoshwolfe commented Oct 19, 2018

// Note, that <Expr> "." <Symbol> is a <TypeExpr>. > This is therefor possible:
fn a() A{.T = u8}.T {}

This doesn't seem right. Everything else about the proposed TypeExpr seems good, but I think this example should fail to parse.

The formal grammar specification for Zig isn't always up to date, but it might be able to shed light on this if it's updated with this proposal.

This is also not very important and shouldn't block any forward progress. This is just something to iron out before 1.0.0.

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

I don't think we can have that example fail as simple cases like fn a() A.B {} and fn a() A().B {} should work.

My plan was to work on #208, but it seems we still haven't reached a conclusion on this (and this issue seemed to be related, so I switched my focus to this).

I'll probably work towards having the grammar up to date, as that will help with the stage2 parser rewrite when that is gonna happen. Then add some kind of test that ensures that it stays up to date (have the grammar as a bison parser, and run this parser on all Zig src code in this repo on every commit). This will also help when prototyping new syntax.

@binary132

This comment has been minimized.

binary132 commented Oct 19, 2018

@Hejsil, I really like this proposal (#760 (comment)) a lot better than the one using dot to disambiguate. 😁

I don't love the anyerror name but I'm not quite clear on the semantics so I hesitate to comment. It seems kind of abstract and poorly-defined though. This is where Go uses a builtin interface type error that can easily be implemented by any type. Nested / chained errors are also a really common need and all of Go, Rust, C, and C++ have awkward tooling and conventions around that problem. Using a special name for a special error type to imply that it can contain other error values (?) just feels icky and non-obvious.

Is there a place for discussing the error naming or semantics? Please feel free to disregard, just my two cents as an outside observer. But maybe there is an even better solution here.

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

@allochi

This comment has been minimized.

allochi commented Oct 19, 2018

@Hejsil thanks a lot for being so communicative and agile toward this issue.

I'm glad that we are getting to agree on a solution for this, I don't mind parentheses in if, for, ..., I admit I got used not to use them in Go for example, but I don't mind them in other languages, and since these are expressions in zig it kind of works well, take for example this from the standard library

const n = if (self.pos + bytes.len <= self.slice.len)
    bytes.len
  else
    self.slice.len - self.pos;

// -- vs --

const n = if self.pos + bytes.len <= self.slice.len
    bytes.len
  else
    self.slice.len - self.pos;

I don't know about the others, but with parentheses it seems to be easier to read.

Now regarding the global error set, why name it anyerror? why not errorset or errors wouldn't that be more descriptive of what it is?

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

@allochi There are a few reasons for anyerror:

  • error{A,B} is an error set, so I don't think we should rename the error to errorset as that is just confusing.
  • There is a low chance of anyerror to collid with identifiers, as the standard style is snake_case (any_error).
  • anyerror communicates quite well if you don't know the language:
// `a` could return any error or u8. 
fn a() anyerror!u8 { ... }

// `a` could return errors or u8. (I could read `fn a() !u8` the same way)
fn a() errors!u8 { ... }

// `a` could return all errors or u8. (It could return all errors at the same time?)
fn a() allerrors!u8 { ... }
@allochi

This comment has been minimized.

allochi commented Oct 19, 2018

@Hejsil Thanks, no strong feeling againstanyerror, it seems to be fine.

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 19, 2018

error{A,B} is an error set, so I don't think we should rename the error to errorset as that is just confusing.
I mean I don't know about that.

const A = struct {}; // a struct
const A = errorset {}; // an error set

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

@UniqueID1 Sorry, that was a confusing statement. I'm talking about this rename:

fn a() error{}!u8 {} -> fn a() error{}!u8 {}  
fn a() error!u8 {}   -> fn a() anyerror!u8 {}

We could even rename both examples:

fn a() error{}!u8 {} -> fn a() errorset{}!u8 {}  
fn a() error!u8 {}   -> fn a() anyerror!u8 {}

But that's not the goal of this issue.

@Hejsil

This comment has been minimized.

Member

Hejsil commented Oct 19, 2018

Or maybe best:

// Leave `error` as is. This allows error.A to work without parser hacks
fn a() error { return error.A; }

// Rename `error` for error sets
const E = errorset {A, B};
@allochi

This comment has been minimized.

allochi commented Oct 19, 2018

@Hejsil Yeah, this is what I was thinking of too, it make sense this way.

@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 20, 2018

That was my proposal.

For shorter typing you could have "fn foo() error {}" and "const FileOpenError = errorset {};"

Edit: ..maybe, I don't know. I'm confused haha.

@Manuzor

This comment has been minimized.

Manuzor commented Oct 20, 2018

Let me get this straight. The new keyword errorset would be used to declare a new error set, just like struct and friends. The existing keyword error would be used to return a specific error, like return error.OutOfMemory.

If that is correct, then I think there is no need for anyerror anymore.

// no errors from this function.
fn foo() void {
    // ...
}

// inferred errorset from this function.
fn bar() !void {
    // ...
}

// explicit errorset that is declared inline.
fn baz() errorset{ OutOfMemory, FileNotFOund }!void {
    // ...
}

// new explicit errorset
const TheErrors = errorset{ OutOfMemory, FileNotFound };
fn qux() TheErrors!void {
    // ...
}

// status quo to return "any error" still works.
fn corge() error!void {
    // ...
}

// Not sure about this one, but it's certainly not ambiguous.
fn waldo() error.OutOfMemory!void {
    // ...
}
@UniqueID1

This comment has been minimized.

UniqueID1 commented Oct 20, 2018

I believe that there are 3 options

// keywords are error and errorset
fn foo() error {}
const E = errorset {};
// keywords are anyerror and error
fn foo() anyerror {}
const E = error {};

// just the error keyword
fn foo() error {}
fn foo() (error {A,B}) { ... }
const E = error {};

I prefer error + errorset over anyerror + error. However, with functions accepting TypeExpr instead of Expr, I believe we could also just have the error keyword and use parenthesis in some cases as shown above.

Edit: I see that @andrewrk finds the parenthesis unfortunate (first post, 4th example). My bad Hejsil, forgot about that. In that case having two keywords might be better, and it might be clearer overall anyways. Personally I love error + errorset. Whatever the case, Hejsil's change with functions accepting TypeExpr is absolutely excellent. Super cool change!

@binary132

This comment has been minimized.

binary132 commented Oct 20, 2018

I agree with @UniqueID1. If there's an ambiguity in parsing inline return types in fn signatures, the right way to resolve the ambiguity is by explicit grouping. ISTM this would only be necessary when declaring a type inline in a fn signature's return type, and it makes it more readable, so the increased write-time effort is minimal compared to the benefits.

a) It does not reducing readability overall, and in fact enhances it. Otherwise, fn signatures can read like a run-on sentence. Easy for humans to parse from left to right.
b) Does not require an awkward new special-purposed builtin keyword.
c) Familiar and known to be usable in Go.

@Hejsil

This comment has been minimized.

Member

Hejsil commented Nov 13, 2018

The . syntax has been reverted as of #1685, which implements option 2 in this comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment