Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursion detection is over aggressive #10967

Closed
bpeake13 opened this issue Feb 22, 2022 · 4 comments
Closed

Recursion detection is over aggressive #10967

bpeake13 opened this issue Feb 22, 2022 · 4 comments
Labels
bug Observed behavior contradicts documented or intended behavior

Comments

@bpeake13
Copy link

Zig Version

0.9.1

Steps to Reproduce

Unsure if this is a bug or a feature request, it feels like a bug because this behavior does work with extern c functions, and other functions that could potentially be recursive, but cannot suspend.

Probably can be produced with other code, i discovered it while using this library:
https://github.com/kubkon/zig-yaml

Compile the following code using said library:

fn yamlTestFunc() void {
    _ = yaml.Yaml.load(std.heap.page_allocator, "test") catch unreachable;
}

pub const io_mode = .evented;
pub const yaml = @import("zyaml");
pub const std = @import("std");

pub fn main() !void {
  nosuspend yamlTestFunc();
}

The code fails to compile, stating that yamlTestFunc references a recursive async function. This code also fails with the same errors even if we we call with no_async, reference through a function pointer, or attempt to start a new thread that calls that function.

Example:

pub fn main() !void {
  var t = try std.Thread.spawn(yamlTestFunc, .{});
  t.join();
}

This will fail, stating a recursive error, even though the function is call from another thread.

Expected Behavior

These scenarios should not cause compiler errors. In some scenarios we can promise that the function does not need to be async, as it has no suspends, so recursion should be acceptable. This behavior would mirror that of calling an extern c from an async block, which is allowed.

The scenarios where the function is not actually called from the async frame should not result in errors, as it does not break any async recursion rules.

Actual Behavior

Any function that is detected as being recursive, that is referenced from an async frame, no matter if it is actually called or used, results in a compiler error.

@bpeake13 bpeake13 added the bug Observed behavior contradicts documented or intended behavior label Feb 22, 2022
@rohlem
Copy link
Contributor

rohlem commented Feb 23, 2022

I didn't try compiling the code, but here's a suggested solution.
I don't think the issue is invalid though, if nothing else the error message should improved to make this explanation superfluous (assuming it isn't already).


In some scenarios we can promise that the function does not need to be async, as it has no suspends, so recursion should be acceptable.

I think this is a mismatch in expectations. Zig currently aims for making all recursion explicit, to enable calculating a static bound for all stack growth over the stack's lifetime. See issues #157 and #1006. This would limit programs from "naturally" using recursion.
To make recursion still available, there was originally a separate builtin @newStackCall that would be called with a caller-allocated buffer to use as the new stack memory. There was however an issue in the implementation (though I don't think semantics?), see #3268 , upon which it was removed (for now).

This in itself has no connection to async functions (from what I can tell). However, the way async is specified in Zig, that mechanism also requires allocation of the function's stack (type @Frame(f) of fn f), which is either done implicitly by the async keyword, or explicitly by the user to be passed to @asyncCall.

So, the actual issue is the fact that you're calling recursive code (in particular, it seems yaml.Value.fromNode contains calls of itself) which is not allowed by Zig.
One solution could be to refactor the function to be non-recursive outside of tail calls (which can be forced by @call with .{.modifier = .always_tail}). For this example it doesn't seem feasible though, since it look like the code may arbitrarily nest both maps and lists - maybe supplying a comptime-known upper bound would be acceptable.

Then the only other solution (for the forseeable future) would be to:

  1. Make the function async (though I'm not sure if that maybe happens implicitly with (2.)?
  2. Explicitly allocate the stack for each recursive call of the function (@Frame(yaml.Value.fromNode)), from within itself (or one of its callee-s)
  3. Write recursive calls using @asyncCall, using that allocated stack. (I think adding nosuspend here, as well as in your calling code, would be correct.)

This should also force you to handle allocation failure of new stack frames, which the code (from what I can tell) currently doesn't consider.

@SpexGuy
Copy link
Contributor

SpexGuy commented Feb 23, 2022

The zig-yaml library was apparently not built to support evented io mode. You will need to open an issue with it instead, or disable evented io.

Recursion into a function which has no suspends is allowed. But remember that calling an async function counts as a suspend. Evented io turns many core io functions into suspend points through this mechanism.

@bpeake13
Copy link
Author

@rohlem That makes sense. Would that mean in the future calls to extern c functions and calls to function pointers would require setting the stack size in some way, or allocating a stacklet?

I'll mark this as closed.

@rohlem
Copy link
Contributor

rohlem commented Mar 19, 2022

@bpeake13 The way I remember it, function pointer types would need to be enhanced with the stack size the caller must provide, and external functions would be similarly be annotated the stack size they require. Again, there should be some way to provide a manually-allocated stack.

However, to clarify, please also consider the comment from @SpexGuy above; they're an active developer on the project (unlike me), so probably more knowledgeable. While our comments seem to contradict, I'm talking based on what I remember the discussion being when I heard it mentioned 1-2 years ago.

I believe that, as they state it, currently recursion may still be allowed, and your current issue might be resolved by using blocking io-mode (defining pub const io_mode = .blocking;, or not defining it at all), or modifying the library as a workaround. (In that case, I believe marking your callsite with nosuspend should have solved the issue originally, but didn't, which would be a bug.)

The Zig compiler is currently being rewritten (named stage2) in Zig itself, so the old compiler (stage1) is not given as much attention. As soon as the new compiler is fully-usable (2-3 months at most?), I expect many of the longer-standing language design questions to be addressed, which includes solidifying the design around recursion. Then we'll see whether the old plan still holds up (if I remembered it correctly) or it will need to be changed again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior
Projects
None yet
Development

No branches or pull requests

3 participants