chapter1: Clarify "nop before call" paragraph #4

dgryski opened this Issue Mar 5, 2018 · 4 comments


3 participants

dgryski commented Mar 5, 2018

The section

The NOP instruction just before the CALL exists so that the prologue doesn't jump directly onto a
CALL instruction. On some platforms, doing so can lead to very dark places; it's common practice to
set-up a noop instruction right before the actual call and land on this NOP instead.

has generated some questions on reddit and #performance on the Gophers Slack. Could you please expand on this more?

teh-cmc changed the title from Clarify "nop before call" paragraph to chapter1: Clarify "nop before call" paragraph Mar 5, 2018



teh-cmc commented Mar 5, 2018

So here's the boring, abridged backstory:
A few years back, I was reading various books about the Linux kernel. One of these books was particularly heavy on assembly, and specifically mentioned this pattern of landing a JMP on a NOP if the next instruction happens to be a CALL.
It looked so odd to me that I've kept a vivid memory of it for all this time; and when I saw this same pattern again in the output of the Go compiler, it clicked immediately.

Unfortunately I cannot recall the name of this book for the life of me, nor the reason that was given for this pattern to exist. Googling for tricky usages of NOP instructions mostly redirects to security-related stuff, and I don't remember this being related to security. But then again, I've got no source at all to back this up, and I might well be talking complete nonsense here, so...

Anyway, I was kinda hoping that someone with better assembly skills than me would be able to shed some light on this.

Now, on the bright side, your question made me go back into the code to look for more clues.
I noticed that the NOP instruction inserted by the compiler is marked as being 0 bytes long, instead of the 1-byte instruction you'd expect:

0x003a NOP				    ;; 0x3a
0x003a CALL runtime.morestack_noctxt(SB)    ;; 0x3a too

Now this is just some abstract assembly that can and will be modified by the linker in many ways, but still, this looks odd. So I went a bit deeper.
Grepping through the codebase in search of weird-looking NOP instructions, I stumbled upon this:

// The NOP is needed to give the jumps somewhere to land.
// It is a liblink NOP, not an ARM64 NOP: it encodes to 0 instruction bytes.
q = q1

Whose description sounds very much like what we're looking at. We do need somewhere to land, and we seem to be 0 bytes...
Maybe that's a start?

In the end, this might not be related at all to what made me write this in the first place. Heh.

Sorry I cannot help you more here. Thanks for the great question though :)


zliuva commented Apr 11, 2018

This does not seem to be a generic "NOP-before-CALL" situation, but rather a fix-up for the stacksplit epilogue that keeps the recorded stack pointer adjustment correct (for debugging purposes only, it seems):

e.g. on x86 and arm64

// Now we are at the end of the function, but logically
// we are still in function prologue. We need to fix the
// SP data and PCDATA.
spfix := obj.Appendp(last, newprog)
spfix.As = obj.ANOP
spfix.Spadj = -framesize

As you have observed, this NOP does not map to a machine-code NOP; it is simply ignored when machine code is generated. Its Spadj value is still used, however, so that the tracked SP adjustment stays correct.
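To make the idea concrete, here is a toy Go model of a pseudo-NOP that emits no bytes but still changes the bookkeeping. It is only a sketch: the `Prog`, `Entry`, `layout` and `buildToyFunc` names, the instruction sizes and the frame size are all made up for illustration, and none of this is the real cmd/internal/obj code. It also assumes that the linear tracker re-adds the frame size after RET, which is an assumption of the sketch rather than something established in this thread.

```go
package main

import "fmt"

// Op is a hypothetical, simplified opcode name (not the cmd/internal/obj type).
type Op string

// Prog is a toy stand-in for one assembler instruction.
type Prog struct {
	As    Op
	Size  int // encoded size in bytes; 0 means "emits no machine code"
	Spadj int // how this instruction changes the tracked SP adjustment
}

// Entry is one row of the toy PC-to-SP-adjustment table.
type Entry struct {
	PC    int // offset of the instruction's first encoded byte
	As    Op
	Spadj int // adjustment in effect when this instruction starts executing
}

// layout assigns PCs and builds the table. Zero-size instructions get no
// row of their own, but their Spadj still updates the running value.
func layout(progs []Prog) []Entry {
	var table []Entry
	pc, spadj := 0, 0
	for _, p := range progs {
		if p.Size > 0 {
			table = append(table, Entry{PC: pc, As: p.As, Spadj: spadj})
		}
		pc += p.Size
		spadj += p.Spadj
	}
	return table
}

// buildToyFunc lays out a toy function whose stack-growth call sits after RET.
func buildToyFunc(framesize int, withFix bool) []Entry {
	progs := []Prog{
		{As: "SUBSP", Size: 4, Spadj: +framesize}, // allocate the frame
		{As: "CALL body", Size: 5},
		{As: "ADDSP", Size: 4, Spadj: -framesize}, // free the frame
		// Assumption of this sketch: the tracker re-adds the frame after
		// RET, because code placed there is normally reached by a jump
		// from inside the body, where the frame is still allocated.
		{As: "RET", Size: 1, Spadj: +framesize},
	}
	if withFix {
		// The fix-up: no bytes are emitted, only the bookkeeping changes.
		progs = append(progs, Prog{As: "NOP", Size: 0, Spadj: -framesize})
	}
	progs = append(progs, Prog{As: "CALL morestack", Size: 5})
	return layout(progs)
}

func main() {
	for _, withFix := range []bool{false, true} {
		fmt.Printf("with fix-up: %v\n", withFix)
		for _, e := range buildToyFunc(16, withFix) {
			fmt.Printf("  0x%04x  %-14s spadj=%d\n", e.PC, e.As, e.Spadj)
		}
	}
}
```

In this model the morestack call lands at the same offset with or without the NOP, which mirrors the two instructions sharing 0x3a in the listing above. The only difference is the adjustment recorded for it: the frame size without the fix-up, 0 with it.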

Also note that on certain architectures, the stacksplit code is in the prologue, with a jump that skips it, so no such fix-up is necessary. That is, instead of "jump to stacksplit (in epilogue) if there is not enough space", the emitted code reads "jump over stacksplit (in prologue) if there is enough space". For example, compiling with GOOS=linux GOARCH=mips generates the following:

MOVW	R31, R3
CALL	runtime.morestack_noctxt(SB) ; no NOP before CALL

As for Spadj, it seems to be used to maintain a mapping between each PC and the SP adjustment in effect at that PC. [1][2] You can access this information through go tool compile -d pctab=pctospadj. (Note that for the example provided in the chapter, framesize happens to be 0, so the effect of this fix-up is not noticeable.)
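As a rough illustration of what such a PC-to-SP-adjustment mapping lets a consumer do, here is a minimal lookup sketch. The `spadjRun` type, the `lookupSpadj` function and the table values are hypothetical, and the flat sorted slice is a simplification chosen for readability; it is not the encoding the Go toolchain actually uses.

```go
package main

import (
	"fmt"
	"sort"
)

// spadjRun is a hypothetical table row: from StartPC (inclusive) up to the
// next run's StartPC, SP has moved by Spadj bytes relative to function entry.
type spadjRun struct {
	StartPC int
	Spadj   int
}

// lookupSpadj returns the adjustment in effect at pc.
// runs must be sorted by StartPC. The second result is false when pc lies
// before the first run.
func lookupSpadj(runs []spadjRun, pc int) (int, bool) {
	// Find the first run that starts after pc; the run before it covers pc.
	i := sort.Search(len(runs), func(i int) bool { return runs[i].StartPC > pc })
	if i == 0 {
		return 0, false
	}
	return runs[i-1].Spadj, true
}

func main() {
	// Made-up values: a 16-byte frame that is live between offsets 0x04 and 0x0d.
	runs := []spadjRun{
		{StartPC: 0x00, Spadj: 0},
		{StartPC: 0x04, Spadj: 16},
		{StartPC: 0x0d, Spadj: 0},
	}
	for _, pc := range []int{0x00, 0x05, 0x0e} {
		adj, ok := lookupSpadj(runs, pc)
		fmt.Printf("pc=0x%02x spadj=%d ok=%v\n", pc, adj, ok)
	}
}
```

Something like a debugger or stack unwinder could use this kind of lookup to work out where the frame starts for any given PC, which is why a wrong value past the end of the function body would matter even though the generated code itself runs fine.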

P.S. The "doing so can lead to very dark places" case you are thinking of might refer to the practice of appending NOPs after any branching instruction on architectures that have branch delay slots, as opposed to this case.
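For readers who have not met delay slots before, here is a toy Go simulator of the idea. It is a sketch under stated assumptions: the three-instruction machine, the `instr` type and the `run` function are invented for illustration and do not model any real ISA in detail. The one rule it implements is that the instruction right after a jump always executes before control reaches the jump target.

```go
package main

import "fmt"

// instr is a hypothetical instruction for a toy accumulator machine.
type instr struct {
	op  string // "add", "jmp" or "nop"
	arg int    // amount for add, target index for jmp
}

// run executes prog on a machine with one branch delay slot and returns
// the final accumulator value.
func run(prog []instr) int {
	acc := 0
	pc := 0
	pending := -1 // branch target waiting for its delay slot to finish
	for pc < len(prog) {
		in := prog[pc]
		next := pc + 1
		if pending >= 0 {
			// This instruction sits in the delay slot: it still runs,
			// and only then does control move to the branch target.
			next = pending
			pending = -1
		}
		switch in.op {
		case "add":
			acc += in.arg
		case "jmp":
			pending = in.arg
		case "nop":
			// does nothing
		}
		pc = next
	}
	return acc
}

func main() {
	// The jump is meant to skip "add 100", but the delay slot executes it.
	unpadded := []instr{{"jmp", 2}, {"add", 100}, {"add", 1}}
	// A NOP fills the delay slot, so "add 100" really is skipped.
	padded := []instr{{"jmp", 3}, {"nop", 0}, {"add", 100}, {"add", 1}}

	fmt.Println("unpadded:", run(unpadded))
	fmt.Println("padded:  ", run(padded))
}
```

On such a machine the padding NOP is a real, encoded instruction that changes behaviour, which is quite different from the zero-byte bookkeeping NOP discussed in the rest of this comment.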



teh-cmc commented Apr 17, 2018

Thanks for the explanation as well as the links @zliuva, it all makes perfect sense once you've read that code! I was so focused on my CALL backstory that I didn't even think to look at the implementation on the compiler's part, heh.
Also I wasn't aware of that -d flag, it's a real gold mine; definitely gonna help a ton with future chapters.
Anyway, mystery solved, then. This indeed has nothing to do with the CALL.

I do remember reading about delay slots a few years back, so that might be it yes; I'll have to dig further.

Thanks again for your pointers!



teh-cmc commented Apr 17, 2018

I've added a link to this discussion in the relevant part of chapter 1; and will be closing this now.
Don't hesitate to add more comments even if it's closed!

Hopefully I'll take the time to update chapter 1 with all these learnings some day.

teh-cmc closed this Apr 17, 2018
