Support frame pointers on S390x architecture #24

Open: tmcgilchrist wants to merge 6 commits into trunk from s390x_frame_pointers
@tmcgilchrist tmcgilchrist commented Dec 11, 2024

Preliminary work to support frame pointers and perf on s390x

@tmcgilchrist


OCaml stack frame layout on s390x (with frame pointers)

Each OCaml function allocates: lay %r15, -frame_size(%r15)
Frame pointer convention: {backchain, r14} at SP+0, SP+8

  High addresses
  │                                                      │
  │  ┌───────────────────────────────────┐               │
  │  │         caller's frame            │               │
  │  │             ...                   │               │
  │  ├───────────────────────────────────┤ ◄── caller SP │
  │  │  +0  backchain ───────────────────│───► caller's caller SP
  │  │  +8  r14 (return into caller's    │               │
  │  │       caller)                     │               │
  │  │ +16  locals / spills              │               │
  │  │      ...                          │               │
  │  │      outgoing stack args (if any) │               │
  │  ├───────────────────────────────────┤ ◄── current SP
  │  │  +0  backchain ───────────────────│───► caller SP
  │  │  +8  r14 (return into caller)     │               │
  │  │ +16  local slot 0                 │               │
  │  │ +24  local slot 1                 │               │
  │  │      ...                          │               │
  │  │      outgoing stack args (if any) │               │
  │  ├───────────────────────────────────┤ ◄── after ENTER_FUNCTION
  │  │  +0  backchain ───────────────────│───► current SP
  │  │  +8  r14 (return into current)    │               │
  │  └───────────────────────────────────┘               │
  │                                                      │
  Low addresses (stack grows down)

frame_size = backchain_reserve(16) + locals + spills + outgoing_args
aligned to 8 bytes (weird but ok)

slot_offset(Local n) = n + 16
slot_offset(Outgoing n) = n + 16
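
The two formulas above can be sketched in C as follows. This is only an illustration of the arithmetic, not the emitter code from this PR: the names `BACKCHAIN_RESERVE`, `align8`, `frame_size` and `slot_offset` are hypothetical, and it assumes every quantity (including `n`) is a byte count.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes reserved at the bottom of every frame for {backchain, r14}. */
#define BACKCHAIN_RESERVE 16

/* Round n up to the next multiple of 8. */
static size_t align8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* frame_size = backchain_reserve(16) + locals + spills + outgoing_args,
   aligned to 8 bytes. All arguments are byte counts (assumption). */
static size_t frame_size(size_t locals, size_t spills, size_t outgoing_args)
{
    size_t size = align8(BACKCHAIN_RESERVE + locals + spills + outgoing_args);
    assert(size % 8 == 0 && size >= BACKCHAIN_RESERVE);
    return size;
}

/* slot_offset(n) = n + 16: shift a slot past the {backchain, r14} pair.
   The same shift is applied to Local and Outgoing slots. */
static size_t slot_offset(size_t n)
{
    return n + BACKCHAIN_RESERVE;
}
```

Under these assumptions a function with no locals still gets a 16-byte frame, and the first slot lands at SP+16, just above the saved r14.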

Contrast with C frame layout (-mbackchain, no -mpacked-stack):
──────────────────────────────────────────────────────────────

    ┌─────────────────────────────────────┐
    │         C caller's frame            │
    │  +0   backchain                     │
    │  +8   ...reserved...                │
    │  +48  r6 save slot                  │ ◄── register
    │  +56  r7 save slot                  │     save area
    │  ...                                │     (160 bytes,
    │ +112  r14 save slot ◄── return addr │     filled by
    │ +120  r15 save slot                 │     callee)
    │ +128  f0,f2,f4,f6 save slots        │
    │  ...                                │
    ├─────────────────────────────────────┤ ◄── C callee SP
    │  +0   backchain ────────────────────│───► C caller's SP
    │  +8   ...                           │
    │       (callee's own frame)          │

    fi->prev    = backchain = caller's SP  ✓
    fi->retaddr = *(SP+8)   = NOT r14      ✗  (r14 is at caller's SP + 112)
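
To make the offsets in the C layout concrete, here is a rough C model of the 160-byte register save area. The struct `c_save_area`, the macro `GPR_SLOT` and the helper `c_return_address` are hypothetical names for illustration only; the field offsets follow the diagram above, and everything between +8 and +47 is lumped together as reserved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the save area at the bottom of a C frame
   (-mbackchain, no -mpacked-stack). */
struct c_save_area {
    uint64_t backchain;    /* +0:   caller's SP */
    uint64_t reserved[5];  /* +8:   reserved / not the return address */
    uint64_t gpr[10];      /* +48:  save slots for r6 .. r15 */
    uint64_t fpr[4];       /* +128: save slots for f0, f2, f4, f6 */
};

/* Byte offset of the save slot for general register r (6 <= r <= 15). */
#define GPR_SLOT(r) (offsetof(struct c_save_area, gpr) + ((r) - 6) * sizeof(uint64_t))

static_assert(sizeof(struct c_save_area) == 160, "save area is 160 bytes");
static_assert(GPR_SLOT(14) == 112, "r14 save slot is at +112");
static_assert(GPR_SLOT(15) == 120, "r15 save slot is at +120");
static_assert(offsetof(struct c_save_area, fpr) == 128, "FPR slots start at +128");

/* Return address of the function that owns `callee`: the callee stored
   r14 in its caller's save area, so follow the backchain first. */
static uint64_t c_return_address(const struct c_save_area *callee)
{
    const struct c_save_area *caller =
        (const struct c_save_area *)(uintptr_t)callee->backchain;
    return caller->gpr[14 - 6];
}
```

Reading `*(SP+8)` on such a frame yields `reserved[0]`, which is why a walker that assumes the OCaml `{SP+0, SP+8}` pair gets the wrong return address here.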

Frame pointer walker (fp_backtrace.c):

    struct frame_info { prev, retaddr }  maps directly to {SP+0, SP+8}

    fi->prev    = backchain  = caller's SP     ✓
    fi->retaddr = *(SP+8)    = r14             ✓

    Walker: fi = fi->prev  (follow backchain to next frame)
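
A minimal sketch of that walker loop, assuming every frame on the chain follows the OCaml `{SP+0, SP+8}` convention and that the chain ends with a NULL backchain. `walk_frames` and its parameters are hypothetical and are not the actual code in fp_backtrace.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maps directly onto {SP+0, SP+8} of an OCaml frame. */
struct frame_info {
    struct frame_info *prev;  /* backchain: caller's SP */
    uintptr_t retaddr;        /* saved r14 */
};

/* Follow the backchain starting at `fi`, storing at most `max` return
   addresses into `out`. Returns the number of addresses stored.
   Assumption: a NULL backchain marks the outermost frame. */
static size_t walk_frames(const struct frame_info *fi, uintptr_t *out, size_t max)
{
    size_t n = 0;
    assert(out != NULL || max == 0);
    while (fi != NULL && n < max) {
        out[n++] = fi->retaddr;  /* *(SP+8) */
        fi = fi->prev;           /* follow backchain to the next frame */
    }
    return n;
}
```

The `max` bound is the only protection against a corrupt or cyclic chain in this sketch; a real walker would likely need stricter validation of each backchain value.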

The two conventions are incompatible at SP+8. OCaml puts r14 there; C (-mbackchain) keeps
reserved/unrelated data there and saves r14 at offset 112 of the caller's register save area (caller's SP + 112).

The stack-walking code in fp_backtrace.c and runtime/fiber.c needs to understand the convention and may have to walk through C frames. I've tried a heuristic based on frame size (the distance from SP to the backchain), assuming that OCaml frames will usually be smaller: anything of 160 bytes or more is treated as C code using -mbackchain, and smaller frames are treated as OCaml. This is quite a hack and not what we want.
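
For reference, the heuristic described above could look roughly like this. It is a sketch of the idea, not the code in this branch: the names and constants are hypothetical, and it assumes "frame size" means the distance from the current SP to the backchain value stored at SP+0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define C_SAVE_AREA_SIZE 160  /* minimum size of a C frame with -mbackchain */
#define OCAML_R14_OFFSET 8    /* OCaml: r14 at SP+8 */
#define C_R14_OFFSET     112  /* C: r14 at caller's SP + 112 */

/* Guess whether the frame at `sp` belongs to C code: frames of 160 bytes
   or more are assumed to be C, smaller ones OCaml. */
static bool looks_like_c_frame(uintptr_t sp, uintptr_t backchain)
{
    assert(backchain > sp);  /* stack grows down, caller is above */
    return backchain - sp >= C_SAVE_AREA_SIZE;
}

/* Address from which the walker should load the return address. */
static uintptr_t retaddr_slot(uintptr_t sp, uintptr_t backchain)
{
    return looks_like_c_frame(sp, backchain)
        ? backchain + C_R14_OFFSET
        : sp + OCAML_R14_OFFSET;
}
```

The weakness is visible in the code: an OCaml function with 160 bytes or more of locals and spills is misclassified as C, and the walker then reads an unrelated slot.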

Options:

  1. -mpacked-stack, which puts r14 at SP+8 in GCC-compiled frames. We would get support in the C runtime and in the C code we compile ourselves. The downside is that Linux distros don't build with it, and perhaps perf doesn't know how to use it anyway?
  2. Use -mbackchain and the backchain+112 convention exclusively on s390x, and force OCaml to allocate larger frames than necessary, wasting memory. This is supported on Ubuntu but on no other distros; those fall back to DWARF unwinding, which is known to be problematic.
