Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: PoC: an alternative to tail calls (for Xtensa) #74

Closed
wants to merge 2 commits into from

Conversation

igrr
Copy link
Contributor

@igrr igrr commented Jan 28, 2020

As pointed out in #28 (comment), Xtensa port suffers from lack of tail call optimization in the compiler. Tail calls are possible on the ESP8266 (but not implemented in GCC) and aren't possible on the ESP32, due to the ABI limitation.

This PR aims to provide an alternative to tail calls as means of chaining primitive operations.

The idea is to modify the operation function signature as follows:

m3_ret_struct_t OP(pc_t _pc, u64 * _sp, m3reg_t _r0, f64 _fp0);

where return structure m3_ret_struct_t has the same layout (in registers) as the input arguments:

typedef struct {
    union {
        pc_t pc;
        m3ret_t err;
    };
    m3stack_t sp;
    m3reg_t r0;
    f64 fp0;
} m3_ret_struct_t;

No tail calls are performed, instead all the operations are called from the following loop:

    while (true) {
        m3_ret_struct_t rs = ((IM3Operation) *_pc)(_pc+1, _sp, _r0, _fp0);
        _pc = rs.pc;
        _sp = rs.sp;
        _r0 = rs.r0;
        _fp0 = rs.fp0;
        if (_sp == NULL) {
            return rs.err;
        }
    }

Zero/non-zero _sp is used to indicate whether execution should be continued (operation wants a tail call) or the loop should return (operation wants to return).

The theory (which I haven't tested yet) is that the C loop above can be implemented in a few assembly instructions. This is because the return values after function call are already in the right registers. So we only need to check if _sp is zero, increment the PC, and jump to the PC again.

At the moment this modification seems to pass the spec tests.

Another change is converting _mem into a global value. This is needed to make the operation arguments fit into the registers, as on Xtensa only 6 32-bit registers are used for argument passing. The rest of the arguments would have to be spilled onto the stack. If necessary, _mem can be made thread-local instead of global.

@igrr igrr requested a review from vshymanskyy January 28, 2020 02:06
@igrr
Copy link
Contributor Author

igrr commented Jan 28, 2020

Unfortunately Xtensa GCC doesn't use registers for return structures larger than 4 words. In this case the structure is 6 words long, and is passed on the stack. This means that the approach is still functional but may be not super efficient.

I have done a quick performance check using fib(30) function:

  • master: task stack used: 13908 bytes, time elapsed: 5400ms
  • this branch: task stack used: 5588 bytes, time elapsed: 5520ms

So it seems that even this non-optimal C version isn't that much slower than the version in master, but some stack savings are observed.

@vshymanskyy
Copy link
Member

vshymanskyy commented Jan 28, 2020

That's why I love this project. It allows to run quick experiments, often with unexpected results ;)

@igrr
Copy link
Contributor Author

igrr commented Jan 28, 2020

@vshymanskyy I think i have finished with the experiments from my side, next step will depend on your feedback about:

  • changes related to wrapping tail calls into macros, and returning structs instead (required as a replacement for tail calls on Xtensa)
  • changes related to moving _mem and _fp0 arguments into global variables or special registers (optional, result in minor performance improvement and code size reduction).

The PR shows that such changes can be hidden behind configuration macros, and enabled only for certain platforms. I can do further cleanup, at the moment this is only a rough PoC.

An alternative to this set of changes would be to implement all the basic functions in assembly, bypassing the limitations of the ABI. With help of some macros, this seems to be a viable task. However until the interpreter can be considered stable, it may be a premature thing to do.

@vshymanskyy
Copy link
Member

@igrr Thanks much for your efforts. I have in plans to create a POC with Labels-As-Values based execution. This is much more involved, but should be interesting.
Let's have this PR open for some time, until we have something to compare with.

@igrr
Copy link
Contributor Author

igrr commented Jan 31, 2020

@vshymanskyy Would you like some help with Lables-as-values approach? Or maybe I can make a PR with assembly implementation of operations for Xtensa, which would be less invasive — i will only need to separate declarations in m3_exec.h from the implementations.

I'd like to make Xtensa support more practically useable in near term, as currently it is limited by the stack requirements. Any suggestion how to proceed with this would be appreciated!

@igrr
Copy link
Contributor Author

igrr commented Mar 3, 2020

@vshymanskyy gentle bump; did you get any progress with labels-as-values? Would you be okay with separation of declarations/implementations in m3_exec.h, as an interim solution? At least this would allow me to push the work on Xtensa port forward.

@tve
Copy link

tve commented Jul 28, 2020

I know I'm coming late to the party, is there any interest in pursuing any of the approaches here? I'm interested in good esp32 support...

@igrr
Copy link
Contributor Author

igrr commented Jul 29, 2020

I don't mind rebasing this PR and resolving this conflicts, if @vshymanskyy approves the proposed changes in general.

@vshymanskyy
Copy link
Member

Thanks for your interest guys. Let me review it once again!

@vshymanskyy
Copy link
Member

I've been evaluating different dispatch methods recently.
Check out this repository: https://github.com/vshymanskyy/interp

@vshymanskyy
Copy link
Member

I have some progress on this.
Still need to figure out some obscure bugs, but I like how it looks like after restructuring.

Base automatically changed from master to main March 15, 2021 23:06
@vshymanskyy
Copy link
Member

Now tracked by #241

@vshymanskyy vshymanskyy closed this Jun 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants