return type inference #447

Closed
andrewrk opened this issue Sep 8, 2017 · 22 comments
Labels
proposal: This issue suggests modifications. If it also has the "accepted" label then it is planned.

@andrewrk
Member

andrewrk commented Sep 8, 2017

use case:

fn max(a: var, b: var) var {
    return if (a > b) a else b;
}
@andrewrk andrewrk added the enhancement label (Solving this issue will likely involve adding new logic or components to the codebase.) Sep 8, 2017
@andrewrk andrewrk added this to the 0.2.0 milestone Sep 8, 2017
@thejoshwolfe
Sponsor Contributor

I'm worried that this is too easy. var is much easier to type than &const T, which means authors are incentivized to omit function return types out of laziness. This will lead to a paradigm shift where Zig functions can optionally declare their return type, rather than optionally inferring their return type.

Lessons learned from Python and Haskell say that explicit return types greatly improve readability, so I'm opposed to this proposal as it is.

A counter proposal would be some syntax that is sufficiently painful to use, so that authors are incentivized not to use it, except where it's really the best solution. Perhaps:

fn max(a: var, b: var) -> @typeOf(this.bodyExpression) {
    if (a > b) a else b
}

That still seems too easy to me. And of course in order to make that work and make sense, we'd be dragging in a whole lot of other features with this, so I don't recommend this either.

Here's another way to increase pain:

fn max(a: var, b: var) -> var {
    @setReturnTypeInferrable(this);
    if (a > b) a else b
}

And another way:

fn max(a: var, b: var) -> @inferReturnType(this) {
    if (a > b) a else b
}

All of these approaches are fundamentally flawed, because they're all a fixed snippet of code you could paste in without thinking. The thinking is what we want from the authors. We want the authors to document the return types when possible.

Here's another idea:

fn max(a: var, b: var) ->
        if (@isComptime(a) and @isComptime(b))
            @typeOf(if (a > b) a else b)
        else
            @typeOf(a, b)
{
    if (a > b) a else b
}

Now THAT's painful! (See also #439.)

Can we come up with any examples beyond min() and max()? That last thing I wrote there is actually not that bad of a solution in my opinion.

@tiehuis tiehuis added the proposal label (This issue suggests modifications. If it also has the "accepted" label then it is planned.) Sep 15, 2017
@hasenj

hasenj commented Sep 29, 2017

D has this feature and I think it significantly reduces readability of code and documentation. The standard library is full of functions returning auto. I call a function, it returns something, I have no idea what that something is, and no idea what I can or can't do with it.

@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Oct 19, 2017
@andrewrk andrewrk added the accepted label (This proposal is planned.) and removed the enhancement label (Solving this issue will likely involve adding new logic or components to the codebase.) Feb 23, 2018
@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Feb 28, 2018
@jaccarmac

Since so many things in Zig are lazily compiled the following suggestion might be a bit tricky.

It seems that most of the worries above are about reducing readability, esp. of documentation/APIs. If the return type is inferred then the documentation/autocomplete hints could contain the inferred type instead of var, no?

@andrewrk andrewrk changed the title from "ability to infer return types of generic functions" to "return type inference" Nov 16, 2018
@andrewrk andrewrk removed the accepted label (This proposal is planned.) Nov 16, 2018
@andrewrk andrewrk modified the milestones: 0.4.0, 0.5.0 Nov 16, 2018
@hryx hryx mentioned this issue May 12, 2019
@hryx
Sponsor Contributor

hryx commented May 12, 2019

Just leaving a note here that came up while doing the stage2 parser rewrite.

FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? (KEYWORD_var / TypeExpr)

var is currently a grammatically accepted return type, but I see now that it is not implemented.

The grammar does not include Keyword_var as a choice for PrimaryTypeExpr, but the original iterative stage2 parser treats it as one. If inferred return types are to be supported, then that behavior actually kinda makes sense, if I understand correctly.

If var were to be included in PrimaryTypeExpr, I think this could be a valid update to the FnProto rule:

- FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? (KEYWORD_var / TypeExpr)
+ FnProto <- FnCC? KEYWORD_fn IDENTIFIER? LPAREN ParamDeclList RPAREN ByteAlign? LinkSection? EXCLAMATIONMARK? TypeExpr

hryx added a commit to hryx/zig that referenced this issue May 12, 2019
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 May 15, 2019
@zimmi
Contributor

zimmi commented Jul 10, 2019

Java opted to allow type inference only for local variables. The reasoning was that return type inference makes it too easy to accidentally break API compatibility when changing implementation details (especially when such methods call each other: changing a method deep down might change the API return type at a distance). There was debate over whether private (non-pub) fields and methods should allow type inference because they are not part of the API, but ultimately it was decided against, to keep the rules simple. Zig could make a different tradeoff here, but I think the argument for enforcing explicit commitment to the API (the public one, at least) of a module has a lot of weight.

Zig already infers errors. Following the argument above, it might make sense to enforce explicitly listing the possible errors at API boundaries as well, but that's another issue.

@shawnl shawnl mentioned this issue Aug 2, 2019
@andrewrk
Member Author

andrewrk commented Aug 7, 2019

Here's a good use case for this: 2cd5e55. With this issue implemented, the Min function from that commit would not be needed.

@thejoshwolfe
Sponsor Contributor

I'll repeat my question from above, as this still seems to be an important question:

> Can we come up with any examples beyond min() and max()?

@kevinw

kevinw commented Aug 13, 2019

> Can we come up with any examples beyond min() and max()?

Implementing a multi-typed glGetUniform for hypothetical GL bindings might be a nice example. Note that in WebGL, the return type varies.

@thejoshwolfe
Sponsor Contributor

> glGetUniform

I don't understand how the bindings layer could figure out the type at compile time. I've never used GL shaders, but it seems like there are a few layers of runtime values that are getting in the way of determining the type at compile time.

Is there a statically typed language binding that knows the type at compile time?

@kevinw

kevinw commented Aug 14, 2019

Ah, I see that you might be right! I was thinking you'd get the "inferred" type at the call site, like

var line_width:f32 = try gl.GetUniform("line_width");

And then it would error out if line_width wasn't an f32. But this might be over-complicating things...
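For comparison, the status-quo way to let the caller choose the result type is to pass the type as a comptime parameter, so the return type stays explicit in the signature. A minimal sketch, assuming a Zig release with `anytype`-era syntax; `readSetting` is a hypothetical name, and `std.fmt.parseInt` merely stands in for the GL query, which is not modeled here:

```zig
const std = @import("std");

// Hypothetical helper: the caller names the result type T explicitly,
// and a value that does not fit T is reported as an error at runtime.
fn readSetting(comptime T: type, text: []const u8) !T {
    return std.fmt.parseInt(T, text, 10);
}

test "caller picks the result type" {
    const width = try readSetting(u32, "42");
    try std.testing.expect(width == 42);
    // A value that does not fit the requested type is an error.
    try std.testing.expectError(error.Overflow, readSetting(u8, "300"));
}
```

The type check still happens at runtime here, as in the line_width example, but the signature documents what comes back.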

@ghost

ghost commented Aug 14, 2019

For a use case, how about a function that loads a file at comptime (using @embedFile) and puts some info from the file in the return type? (e.g. an image loaded at comptime, which returns [w*h]u8)

I could almost have done that here, if I decided not to enforce a fixed width and height.

I'm not actually sure I like the idea of return type inference though, I just thought this was a funny idea.

@thejoshwolfe
Sponsor Contributor

I like where you're going with that use case, @dbandstra. I can imagine a case that's even more extreme, where you load an assets bundle from a .tar.xz at compile time.

const assets = loadAssets("assets.tar.xz");
fn loadAssets(comptime filename: []const u8) [getEntryCount(filename)]Asset {
    // do everything at compile time
}

In order to implement getEntryCount(), you'd need to decompress and iterate over the entire archive (at compile time), and then to implement loadAssets() you'd need to decompress and iterate over it all again. That's real bad.

Consider this workaround that works (untested) in status quo:

const assets = LoadAssetsT("assets.tar.xz").value;
fn LoadAssetsT(comptime filename: []const u8) type {
    comptime var assets = [_]Asset{};
    // read the assets at compile time
    while (something) {
        assets = assets ++ [_]Asset{entry};
    }
    return struct {
        const value: [assets.len]Asset = assets;
    };
}

That trick only works for entirely comptime values.

This reminds me a lot of C++ templates, like std::is_same. Very unergonomic.

I wonder if the compromise is that functions with inferred return type have to be entirely run at compile time.

@tomc1998
Contributor

tomc1998 commented Sep 29, 2019

Here's a use case I just ran into:

Constructing anything where the structure needs to be altered through the course of development, but will be constant at runtime (so there's no need for switches / function pointers).

I was just writing some noise generation functions (to generate maps for games procedurally), and it's incredibly useful to have composable noise functions so you can combine them quickly and experiment with different combinations. Here are the two I was using:

/// A radial weight, which returns higher values for points closer to the center
const RadialWeight = struct {
    ...
    pub fn gen(self: @This(), x: f32, y: f32) f32 {
    }
};

/// A gradient noise function, for smooth noise
const SimplexNoise = struct {
    ...
    pub fn gen(self: @This(), x: f32, y: f32) f32 {
    }
};

Here are two really common functions. The intended usage is to set up the noise with some parameters (for example, with RadialWeight you'd give a radius and a width/height; with SimplexNoise you'd give a scaling factor), then call the gen() function repeatedly with different points in 2D.

Simplex noise will generate something similar to this, for reference -> https://i.stack.imgur.com/LNK39.png

I want to combine various scales of simplex noise, plus a radial weight, to generate an island map (where higher values indicate a higher elevation). Here's an example -> https://shanee.io/imagesT/blog/island-generation/mask_with_height.png

This generation requires a lot of playing around with different combinations of noise, different weights, etc - this may change through development. So I created a third type of noise, for combining any two noises:

pub fn CombinedNoise(comptime Noise1: type, comptime Noise2: type) type {
    ...
}

You can probably guess how this works. I can combine 3 noises by nesting them:

const MyNoise = CombinedNoise(CombinedNoise(SimplexNoise, RadialWeight), SimplexNoise);

I can't put this experimental noise generation in a function, however, because I need to know the return type for the function, which means duplicating the code - one version of the code to work out the types, and another version of the code to actually create the values.

Here's the actual function WITH inferred return types - try and figure out the return type & write it down yourself, as an exercise to the reader ;)

pub fn createNoise(seed: usize, map_size: f32) var {
    const SIMPLEX_SCALE = 1.0 / 90.0;
    const rad = map_size / 2.0;
    // Gen base radial noise with some extra octaves of simplex noise
    const base_noise = CombinedNoise(RadialWeight, SimplexNoise)
        .init(RadialWeight{ .cx = rad, .cy = rad, .r = rad },
              SimplexNoise.init(seed, SIMPLEX_SCALE), 0.7);
    const octave_1 = CombinedNoise(@typeOf(base_noise), SimplexNoise)
        .init(base_noise, SimplexNoise.init(seed, SIMPLEX_SCALE * 2.0), 0.92);
    const octave_2 = CombinedNoise(@typeOf(octave_1), SimplexNoise)
        .init(octave_1, SimplexNoise.init(seed, SIMPLEX_SCALE * 3.0), 0.95);
    const octave_3 = CombinedNoise(@typeOf(octave_2), SimplexNoise)
        .init(octave_2, SimplexNoise.init(seed, SIMPLEX_SCALE * 3.0), 0.98);
    const final_noise = octave_3;
    return final_noise;
}

Now imagine 'hey, I actually want another octave of noise' later on in development.

This might seem like an esoteric example, but would hold true for anything where you want to be able to compose functions at compile time, but may want to alter that composition through DEVELOPMENT (not runtime).

The solution I've had to use for this (and the solution I'd use in C) is a big switch statement and a heap-allocated array of enums that I loop through. Now the burden of figuring out the types is left to runtime code rather than the compiler. This is the perfect example of something that would run faster in C++ (and be much nicer to maintain), because the facilities of the language let me compose these things in a reasonable way.

I'd also mention that figuring the types out manually would likely be less readable than just putting 'var' there.

@m-r-hunt
Contributor

Just weighing in: I agree that allowing var return types adds too much potential for unreadable or hard-to-understand types. I think that using @typeOf in the return type gives enough flexibility. For the min/max example:

fn max(a: var, b: @typeOf(a)) @typeOf(a) {
    return if (a > b) a else b;
}

If allowing a and b to have different types and figuring out a compatible type is really necessary, you can write a separate type function to do so and use that:

fn CompatibleNumericType(a: type, b: type) type {
    //...
    return SomeType;
}

fn max(a: var, b: var) CompatibleNumericType(@typeOf(a), @typeOf(b)) {
    return if (a > b) a else b;
}

Clunky? Maybe. But it's a lot more explicit about what's going on, and should encourage people to stick to simpler forms unless they really need complicated stuff.

This also works today with no modification as far as I'm aware.
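For readers on later Zig releases, where `var` parameters are spelled `anytype` and `@typeOf` is spelled `@TypeOf`, the mixed-type case can be sketched without a separate type function, because `@TypeOf` with several arguments yields their peer-resolved type. This is a minimal sketch under that version assumption, not the code from this comment:

```zig
const std = @import("std");

// Assumes a Zig release with `anytype` and multi-argument `@TypeOf`.
// The return type is the common type both arguments coerce to.
fn max(a: anytype, b: anytype) @TypeOf(a, b) {
    return if (a > b) a else b;
}

test "max over mixed integer widths" {
    const small: u8 = 3;
    const large: u16 = 300;
    const m = max(small, large);
    // u8 and u16 peer-resolve to u16.
    try std.testing.expect(@TypeOf(m) == u16);
    try std.testing.expect(m == 300);
}
```

The return type is still written in the signature, so the reader can see how it is derived without reading the body.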

@frmdstryr
Contributor

Some variant of this would be helpful for translating C macro functions.

@momumi
Contributor

momumi commented Jan 6, 2020

For functions that return comptime arrays/strings, it might be nice to use [_] to infer the length of the returned array:

/// convert comptime string literal to utf16 string
fn utf16(comptime utf8: []const u8) [_]u16 {
    // ...
}
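In status quo, the length has to be written out as an expression over the comptime parameters. A minimal sketch, assuming ASCII-only input so that the output length equals `s.len`; `asciiToUtf16` is a hypothetical helper for illustration, not a std function, and a full UTF-8 conversion would need a separate length computation:

```zig
const std = @import("std");

// Hypothetical helper: assumes every input byte is ASCII (< 0x80), so each
// byte maps to exactly one UTF-16 code unit and the length is s.len.
fn asciiToUtf16(comptime s: []const u8) [s.len]u16 {
    var out: [s.len]u16 = undefined;
    var i: usize = 0;
    while (i < s.len) : (i += 1) {
        out[i] = s[i]; // u8 widens to u16 implicitly
    }
    return out;
}

test "array length comes from the comptime argument" {
    const wide = asciiToUtf16("hey");
    try std.testing.expect(@TypeOf(wide) == [3]u16);
    try std.testing.expect(wide[0] == 'h');
}
```

With `[_]u16` as proposed, the `s.len` expression in the signature would be inferred from the body instead.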

@hryx
Sponsor Contributor

hryx commented Jan 8, 2020

This proposal might interact with the language change introduced with #2749.

@courajs
Contributor

courajs commented Jun 1, 2020

This also popped up when I was trying to be generic over function types:

const std = @import("std");

pub fn main() anyerror!void {
  thing(a);
  thing(b);
}

fn thing(f: fn()var) void {
  f();
}
fn a() i32 {
  std.debug.warn("a", .{});
  return 4;
}
fn b() *const[_]u8 {
  std.debug.warn("b", .{});
  return "hey";
}
./src/main.zig:8:13: error: TODO implement inferred return types https://github.com/ziglang/zig/issues/447
fn thing(f: fn()var) void {
            ^
./src/main.zig:4:3: note: referenced here
  thing(a);
  ^

@bb010g

bb010g commented Jul 17, 2020

You can work around this with some wrappers:

pub const WrappedAnytype = struct {
    val: anytype,
};

pub fn wrap_val(val: anytype) WrappedAnytype {
    return WrappedAnytype { .val = val };
}
pub fn unwrap_val(wrapped_val: anytype) @TypeOf(wrapped_val.val) {
    return wrapped_val.val;
}

pub fn wrapped_anytype_decrement(val: anytype) WrappedAnytype {
    return wrap_val(val - 1);
}
pub fn anytype_decrement(val: anytype) @TypeOf(unwrap_val(wrapped_anytype_decrement(val))) {
    return unwrap_val(wrapped_anytype_decrement(val));
}

comptime {
    @compileLog(anytype_decrement(0));
}

Output:

| -1
<source>:20:5: error: found compile log statement
    @compileLog(anytype_decrement(0));
    ^
Compiler returned: 1

If you're willing to deal with anonymous struct & function literals and awkward @call syntax, here's a helper:

pub fn unwrap_anytype_func_wrapped(func: anytype) WrappedAnytype {
    const Closure = struct {
        fn func(options: @import("std").builtin.CallOptions, args: anytype) @TypeOf(unwrap_val(@call(options, func, args))) {
            return unwrap_val(@call(options, func, args));
        }
    };
    return wrap_val(Closure.func);
}
pub fn unwrap_anytype_func(func: anytype) @TypeOf(unwrap_val(unwrap_anytype_func_wrapped(func))) {
    return unwrap_val(unwrap_anytype_func_wrapped(func));
}

pub const anytype_decrement_call = unwrap_anytype_func(struct {
    fn anytype_decrement(val: anytype) WrappedAnytype {
        return wrap_val(val - 1);
    }
}.anytype_decrement);

comptime {
    @compileLog(anytype_decrement_call(.{}, .{0}));
}

Output:

| -1
<source>:42:5: error: found compile log statement
    @compileLog(anytype_decrement_call(.{}, .{0}));
    ^
Compiler returned: 1

@SpexGuy
Contributor

SpexGuy commented Jul 17, 2020

@bb010g That only works when executing the function at compile time. Structs containing var/anytype are comptime-only so the function is always implicitly evaluated at compile time. This won't compile with your example:

export fn foo(x: i32) i32 {
    return anytype_decrement(x);
}

@andrewrk
Member Author

andrewrk commented Oct 9, 2020

There is no current plan for return type inference. This is a simplification of the language for the person reading the code as well as the compiler implementation.

@ArborealAnole

Return type inference would be very useful.

Manual calculation is often entirely redundant, providing no additional information while taking 5-25 LOC and sometimes nearly doubling the size of the function. So function creation is discouraged, leading to suboptimal code organization and code duplication.

And if you make functions to calculate the type - that's still more work, more code - and why will the reader go to the type calculating function when it's just as easy to know the type from reading the body?

There's no way this helps the reader when you are just calculating the return type with logic parallel to the body, using .field_type, .return_type, or @TypeOf at each break. There's also no way it helps the compiler. It has to infer the type if you assign a block of code to an identifier, so surely it can already do this?
