WIP: PoC: an alternative to tail calls (for Xtensa) #74
Unfortunately, Xtensa GCC doesn't use registers for return structures larger than 4 words. In this case the structure is 6 words long and is passed on the stack. This means that the approach is still functional but may not be very efficient. I have done a quick performance check using
So it seems that even this non-optimal C version isn't much slower than the version in master, and some stack savings are observed.
That's why I love this project. It lets you run quick experiments, often with unexpected results ;)
@vshymanskyy I think I have finished with the experiments on my side; the next step will depend on your feedback about:
The PR shows that such changes can be hidden behind configuration macros and enabled only for certain platforms. I can do further cleanup; at the moment this is only a rough PoC. An alternative to this set of changes would be to implement all the basic functions in assembly, bypassing the limitations of the ABI. With the help of some macros, this seems to be a viable task. However, until the interpreter can be considered stable, it may be premature.
@igrr Thanks very much for your efforts. I plan to create a PoC with
@vshymanskyy Would you like some help with the labels-as-values approach? Or maybe I can make a PR with an assembly implementation of the operations for Xtensa, which would be less invasive: I would only need to separate the declarations in m3_exec.h from the implementations. I'd like to make Xtensa support more practically usable in the near term, as it is currently limited by the stack requirements. Any suggestions on how to proceed would be appreciated!
@vshymanskyy gentle bump; have you made any progress with labels-as-values? Would you be okay with separating the declarations from the implementations in m3_exec.h as an interim solution? At least this would allow me to push the work on the Xtensa port forward.
I know I'm coming late to the party; is there any interest in pursuing any of the approaches here? I'm interested in good ESP32 support...
I don't mind rebasing this PR and resolving the conflicts, if @vshymanskyy approves the proposed changes in general.
Thanks for your interest, guys. Let me review it once again!
I've been evaluating different dispatch methods recently.
I have some progress on this.
Now tracked by #241
As pointed out in #28 (comment), the Xtensa port suffers from a lack of tail call optimization in the compiler. Tail calls are possible on the ESP8266 (but not implemented in GCC) and aren't possible on the ESP32, due to an ABI limitation.
This PR aims to provide an alternative to tail calls as a means of chaining primitive operations.
The idea is to modify the operation function signature as follows:
where the return structure m3_ret_struct_t has the same layout (in registers) as the input arguments:
No tail calls are performed; instead, all the operations are called from the following loop:
Zero/non-zero _sp is used to indicate whether execution should continue (the operation wants a tail call) or the loop should return (the operation wants to return).
The theory (which I haven't tested yet) is that the C loop above can be implemented in a few assembly instructions. This is because the return values after a function call are already in the right registers. So we only need to check whether _sp is zero, increment the PC, and jump to the PC again.
At the moment this modification seems to pass the spec tests.
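As a rough illustration of the scheme described above, here is a minimal, self-contained sketch in portable C. Every name in it (slot_t, ret_struct_t, op_fn_t, op_const, op_add_imm, op_return, run, run_demo) is hypothetical and chosen for the example; it is not the PR's actual code, and the field set (pc, sp, a 64-bit integer register, a double) is an assumption that happens to add up to 6 words with 32-bit pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the PR's actual code. */
typedef union slot slot_t;

/* Same fields as the operation arguments. With 32-bit pointers this is
   1 + 1 + 2 + 2 = 6 words (an assumption made for this sketch). */
typedef struct {
    const slot_t *pc;   /* slot holding the next operation */
    int64_t      *sp;   /* NULL means "stop the loop" */
    int64_t       r0;   /* integer register */
    double        fp0;  /* floating-point register */
} ret_struct_t;

typedef ret_struct_t (*op_fn_t)(const slot_t *pc, int64_t *sp,
                                int64_t r0, double fp0);

/* A code slot holds either an operation pointer or an immediate. */
union slot {
    op_fn_t op;
    int64_t imm;
};

/* r0 = immediate; continue with the slot after the immediate. */
static ret_struct_t op_const(const slot_t *pc, int64_t *sp,
                             int64_t r0, double fp0)
{
    (void)r0;
    ret_struct_t r = { pc + 1, sp, pc->imm, fp0 };
    return r;
}

/* r0 += immediate; continue with the slot after the immediate. */
static ret_struct_t op_add_imm(const slot_t *pc, int64_t *sp,
                               int64_t r0, double fp0)
{
    ret_struct_t r = { pc + 1, sp, r0 + pc->imm, fp0 };
    return r;
}

/* Ask the loop to stop by returning a NULL sp. */
static ret_struct_t op_return(const slot_t *pc, int64_t *sp,
                              int64_t r0, double fp0)
{
    (void)sp;
    ret_struct_t r = { pc, NULL, r0, fp0 };
    return r;
}

/* The dispatch loop: fetch the operation, step the PC past it, call it,
   and feed the returned state back in until sp comes back as NULL. */
static ret_struct_t run(const slot_t *pc, int64_t *sp, int64_t r0, double fp0)
{
    ret_struct_t r = { pc, sp, r0, fp0 };
    do {
        op_fn_t op = r.pc->op;
        r = op(r.pc + 1, r.sp, r.r0, r.fp0);
    } while (r.sp != NULL);
    return r;
}

/* Tiny program: r0 = 2; r0 += 3; return. */
static int64_t run_demo(void)
{
    static const slot_t program[] = {
        { .op = op_const },   { .imm = 2 },
        { .op = op_add_imm }, { .imm = 3 },
        { .op = op_return },
    };
    int64_t stack[4] = { 0 };
    return run(program, stack, 0, 0.0).r0;
}
```

In this sketch no operation ever calls the next one, so the native stack depth stays constant regardless of how many operations are chained. Whether the struct is actually returned in registers depends on the target ABI and compiler.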
Another change is converting _mem into a global value. This is needed to make the operation arguments fit into registers, as on Xtensa only six 32-bit registers are used for argument passing; the remaining arguments would have to be spilled onto the stack. If necessary, _mem can be made thread-local instead of global.
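To make the global versus thread-local choice concrete, here is a small sketch of how the memory base pointer could be moved out of the argument list. The names g_mem, DEMO_MEM_THREAD_LOCAL, set_linear_memory, store_i32 and load_i32 are hypothetical and not taken from the PR; the thread-local variant assumes a C11 compiler with _Thread_local support.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names; a sketch of the idea, not the PR's implementation. */
#if defined(DEMO_MEM_THREAD_LOCAL)
static _Thread_local uint8_t *g_mem;  /* one linear-memory base per thread (C11) */
#else
static uint8_t *g_mem;                /* single global linear-memory base */
#endif

/* Set once before entering the interpreter loop instead of passing the
   base pointer to every operation. */
static void set_linear_memory(uint8_t *base)
{
    g_mem = base;
}

/* Operations read the base from the global rather than from an argument.
   memcpy keeps the access valid for unaligned offsets. */
static void store_i32(uint32_t offset, int32_t value)
{
    memcpy(g_mem + offset, &value, sizeof value);
}

static int32_t load_i32(uint32_t offset)
{
    int32_t value;
    memcpy(&value, g_mem + offset, sizeof value);
    return value;
}
```

The trade-off is that a plain global ties the interpreter to one memory at a time, while the thread-local variant allows one per thread at the cost of whatever thread-local access costs on the target.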