chapter1: Clarify "nop before call" paragraph #4
has generated some questions on reddit and #performance on the Gophers Slack. Could you please expand on this more?
So here's the boring, abridged backstory:
Unfortunately, I cannot recall the name of this book for the life of me, nor the reason that was given for this pattern to exist. Googling for tricky usages of NOP instructions mostly turns up security-related material, and I don't remember this being related to security. But then again, I've got no source at all to back this up, and I might as well be talking complete nonsense here, so...
Anyway, I was kinda hoping that someone with better assembly skills than me would be able to shed some light on this.
Now, on the bright side, your question made me go back into the code to look for more clues.
0x003a NOP                               ;; 0x3a
0x003a CALL runtime.morestack_noctxt(SB) ;; 0x3a too
Now this is just some abstract assembly that can and will be modified by the linker in many ways, but still, this looks odd. So I went a bit deeper.
Whose description sounds very much like what we're looking at. We do need somewhere to land, and we seem to be 0 bytes...
In the end, this might not be related at all to what made me write this in the first place. Heh.
Sorry I cannot help you more here. Thanks for the great question though :)
This does not seem to be a generic "NOP-before-CALL" situation, but rather a fix-up for the stacksplit epilogue that keeps the recorded stack pointer adjustment correct (for debugging purposes only, it seems):
// Now we are at the end of the function, but logically
// we are still in function prologue. We need to fix the
// SP data and PCDATA.
spfix := obj.Appendp(last, newprog)
spfix.As = obj.ANOP
spfix.Spadj = -framesize
As you have observed, NOP does not map to a machine code NOP but rather is simply ignored when generating machine code. But the
Also note that on certain architectures, the stacksplit check is in the prologue, with a jump to skip it, so such a fix-up is unnecessary (i.e. instead of "jump to stacksplit (in epilogue) if there is not enough space", the emitted code reads "jump over stacksplit (in prologue) if there is enough space"). e.g. when compiling with
MOVW R31, R3
CALL runtime.morestack_noctxt(SB) ; no NOP before CALL
P.S. The "doing so can lead to very dark places" case you are thinking of might refer to the practice of appending NOPs after any branching instruction on architectures that have branch delay slots, as opposed to this case.
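To make the zero-size NOP point a bit more concrete, here is a rough toy model of the bookkeeping, not the real assembler. The `Inst` and `Row` types, `layout`, `demoProg`, the instruction sizes, and the `+framesize` adjustment on RET are all assumptions made up for illustration; only the idea of a NOP that emits no bytes but carries `Spadj = -framesize` comes from the snippet quoted above.

```go
package main

import "fmt"

// Inst is a hypothetical, simplified stand-in for an assembler instruction.
type Inst struct {
	Name  string
	Size  int // bytes of machine code emitted (0 for a pseudo-instruction)
	Spadj int // change applied to the tracked SP offset after this instruction
}

// Row records what a debugger-style table would hold for one instruction.
type Row struct {
	Name    string
	Offset  int // byte offset at which the instruction starts
	SPDelta int // tracked SP offset in effect when the instruction starts
}

// layout assigns offsets and tracks the SP offset across the program.
func layout(prog []Inst) []Row {
	rows := make([]Row, 0, len(prog))
	pc, sp := 0, 0
	for _, in := range prog {
		rows = append(rows, Row{Name: in.Name, Offset: pc, SPDelta: sp})
		pc += in.Size
		sp += in.Spadj
	}
	return rows
}

// demoProg builds a made-up function body; all sizes are invented.
func demoProg(framesize int) []Inst {
	return []Inst{
		{"SUBQ $frame, SP", 4, framesize},  // prologue allocates the frame
		{"<body>", 10, 0},                  // function body
		{"ADDQ $frame, SP", 4, -framesize}, // epilogue frees the frame
		{"RET", 1, framesize},              // toy assumption: code after RET is assumed to run with the frame
		{"NOP (spfix)", 0, -framesize},     // emits nothing, only corrects the tracked SP
		{"CALL morestack", 5, 0},           // logically still in the prologue: no frame yet
	}
}

func main() {
	for _, r := range layout(demoProg(24)) {
		fmt.Printf("0x%04x  sp=%-3d %s\n", r.Offset, r.SPDelta, r.Name)
	}
}
```

In this sketch the NOP and the CALL end up at the same offset, which matches the listing earlier in the thread, and the tracked SP offset at the CALL is back to zero only because of the NOP's adjustment.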
Thanks for the explanation as well as the links @zliuva, it all makes perfect sense once you've read that code! I was so focused on my
I do remember reading about delay slots a few years back, so that might be it yes; I'll have to dig further.
Thanks again for your pointers!