Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack allocation often harder than heap allocation #13640

Open
enderger opened this issue Nov 23, 2022 · 8 comments
Open

Stack allocation often harder than heap allocation #13640

enderger opened this issue Nov 23, 2022 · 8 comments
Labels
use case Describes a real use case that is difficult or impossible, but does not propose a solution.
Milestone

Comments

@enderger
Copy link

Zig Version

0.11.0-dev.184+c4f7663c9

Steps to Reproduce and Observed Behavior

  1. Create a structure which contains a pointer to itself
  2. Try to instantiate said structure without either allocating on the heap or creating an imperative list of substructures

Expected Behavior

The language heavily pushes for stack allocation to be easier than heap allocation, but currently doesn't provide any way to just allocate memory on the stack inline. This either means that you have to write out a load of disjointed variables to express recursive structures (as pointers cannot derive their mutability from their parent) or allocate large amounts of unmanaged memory on the heap. This is problematic, as you either need to write structures that take up a large heap memory and mechanistically free all memory at once (forcing far more data onto the heap than actually needed), have horribly disjointed and hard to follow initialization, or leak memory. To fix this, there probably should be a builtin which creates a pointer to a hidden variable initialized to a constant in the current scope (in other words, expanding the current stack frame and getting a pointer) so that stack based constructs are able to take advantage of struct initializer syntax without either calculating the size of the buffer needed for an allocator or having a massive number of variables.

@enderger enderger added the bug Observed behavior contradicts documented or intended behavior label Nov 23, 2022
@nektro
Copy link
Contributor

nektro commented Nov 23, 2022

const S = struct {
    field: i32,
    me: *S = undefined,
};

pub fn main() void {
    var the = S{ .field = 42 };
    the.me = &the;
}

@enderger
Copy link
Author

enderger commented Nov 23, 2022

More meant a case like this:

const SExpr = union(enum) {
  Atom: []const u8,
  Pair: struct {
     car: *SExpr,
     cdr: *SExpr,
  }
}

The problem arises when you try to initialize deeply nested pairs, since you now need a variable to hold every SExpr without allocating it on the heap.

@enderger
Copy link
Author

Sorry for the oddly specific example, this problem has made quite a lot of trouble when trying to implement a parser for S expressions since every part of the tree has to be assigned to a named variable whose sole purpose is to hold one part without putting it on the heap.

@dee0xeed
Copy link

While doing some exercises I permanently bumped into similar (or not very similar, not sure) problems. With pure stack allocation we do not know actual address of an entity (and therefore addresses of all its' sub-entities) until return from .init(). This may significantly affect data structures design and also initialization sequence - they can be more simple in case of heap allocation.

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 27, 2022

If it helps, you can use heap code to allocate out of a stack buffer with code like this:

const needed_sexprs = 40;
var mem: [@sizeOf(SExpr) * needed_sexprs]u8 align(@alignOf(SExpr)) = undefined;
var fba = std.heap.FixedBufferAllocator.init(&mem);
const tree = buildTree(fba.allocator()) catch @panic("stack mem not big enough");

@mlugg
Copy link
Member

mlugg commented Dec 1, 2022

While doing some exercises I permanently bumped into similar (or not very similar, not sure) problems. With pure stack allocation we do not know actual address of an entity (and therefore addresses of all its' sub-entities) until return from .init(). This may significantly affect data structures design and also initialization sequence - they can be more simple in case of heap allocation.

There are proposed language features to deal with this issue. Improved result location semantics (#2765) will allow you to access the return address of a function, meaning you effectively get a pointer to it for free, and pinned structs (#7769) will allow you to enforce that the struct is never directly copied (to make sure the result location stuff is being used properly and the value is never accidentally copied, invalidating the pointers). Until those are implemented, of course, you can just pass out pointers into init functions if needed.

@dee0xeed
Copy link

dee0xeed commented Dec 1, 2022

Improved result location semantics (#2765) will allow you to access the return address of a function, meaning you effectively get a pointer to it for free

I guess this is organized in a way similar to something like this:

  • on the caller side
    ** reserve a space for the result (on stack, as if it is another one local var)
    ** push parameters on stack
    ** make a call
  • on the callee side:
    ** we know the location of the result and just fill it

Hence addresses are known from the very beginning and are valid until the caller returns.
Did I get the idea right?

@mlugg
Copy link
Member

mlugg commented Dec 4, 2022

<snip>
Did I get the idea right?

Kind of, yeah. What happens is that the caller passes as an implicit argument a pointer to the memory the return value should occupy - this actually probably already happens internally in basically any standard imperative language, it's also how most (all?) C calling conventions handle large return values.

Passing a pointer rather than just implicitly writing to memory below the stack frame allows result locations to be chained through multiple calls. For instance, if you have this code:

fn f() MyStruct {
    return .{
        // ...
    };
}

fn g() MyStruct {
    return f();
}

pub fn main() void {
    const foo = g();
    // ...
}

Result location semantics guarantee zero intermediate copies, since g and then f will be internally passed a pointer to foo's stack memory. The improved semantics would allow f to directly access that pointer, likely through a builtin, so foo could be self-referential.

@andrewrk andrewrk added use case Describes a real use case that is difficult or impossible, but does not propose a solution. and removed bug Observed behavior contradicts documented or intended behavior labels Jan 23, 2023
@andrewrk andrewrk added this to the 1.0.0 milestone Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
use case Describes a real use case that is difficult or impossible, but does not propose a solution.
Projects
None yet
Development

No branches or pull requests

6 participants