Enhancement: New Inline Assembly #5241

Open
ghost opened this issue May 1, 2020 · 9 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

ghost commented May 1, 2020

Copied over from #215; credit for the inspiration goes to that thread. Thanks also to @MasterQ32 and @kubkon for help extending it to support stack machine architectures. See #7561 for standalone assembler improvements.

New Inline Assembly

asm volatile? {bindings}? body? : post_expression?

TL;DR: Benefits over Status Quo

  • No mandatory sections -- flexible to any application
  • Components are listed in evaluation order
  • First-class support for stack machine architectures
  • First-class support for floating point, vector, and as yet unforeseen register types
  • Operands have types
  • Named inputs are optional
  • Input/output characteristics fully customisable
  • Not bound to an input/output model
  • Can access program symbols and call functions safely
  • Volatility is inferred in most cases
  • Concise, flexible wildcard syntax
  • Substitution syntax easier to scan and less likely to clash with native symbols
  • Open to architecture-specific extensions
  • Communicates stack-relevant metadata to compiler
  • Can be automatically distinguished from status quo; no sneaky breakages

Stack Machines

This syntax has first-class support for stack machine architectures such as WebAssembly, the JVM, and @MasterQ32's SPU Mk. II. It accomplishes this with a novel batch-push and -pop mechanism for marshaling between Zig and the stack. Because there are significant differences between register and stack machine architectures, a new .paradigm() method is defined on builtin.Arch, which returns an enum with the variants .register and .stack. (NOTE: supporting stack machines with LLVM is a very hard problem -- maybe defer to stage 2?)

Meta

At least one of body or post expression must be present. The expression inherits block/statement status from the post expression if present, and defaults to statement if not.

Volatile

This block has side effects, and may not be optimised away if its value is not used. Implied by a return type of void or noreturn, or a mutable symbol binding -- so, in practice, very rarely used.
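
The inference rule above can be written down as a small predicate. This is only a sketch in Python, not part of the proposal: the function name `is_volatile` and the representation of symbol bindings as `(name, is_const)` pairs are assumptions made for illustration.

```python
def is_volatile(explicit, return_type, symbol_bindings):
    """Decide whether an asm block must be treated as volatile.

    explicit        -- True if the `volatile` keyword was written
    return_type     -- name of the block's result type, e.g. "void" or "u32"
    symbol_bindings -- iterable of (name, is_const) pairs (hypothetical shape)
    """
    if explicit:
        return True
    # A void or noreturn block can only be useful for its side effects.
    if return_type in ("void", "noreturn"):
        return True
    # A mutable symbol binding may be written to by the assembly.
    return any(not is_const for _name, is_const in symbol_bindings)
```

Under this reading, a block that returns a value and binds only const symbols (for instance a `?func` binding, which implies const) is the one case where `volatile` would need to be written out.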

Bindings

There are three types of bindings: operand, symbol, and clobber. All of them use specially formatted comptime strings to interface with assembly, as in status quo. This decision was made as integrating the required functionality into Zig itself would have required either breaking several guidelines or introducing special constructs with no other use cases.

Operand

An operand binding has the form "operand" name: type = value. Within the block, ?(name) then refers to an operand compatible with Zig type type, initially with value value, which may be a register (integer, float, or vector), a datum literal (only integer in every ISA I'm aware of), a stack top (array with size a multiple of stack alignment), or a processor condition code (boolean). type must be coercible to all of name's uses in the block, taking into account sign- or zero-extension and lane width/count if applicable, and may be omitted if the type of value is known. In addition, value may be omitted if initialisation is not needed, and name may be omitted if only initialisation is needed. The type of the binding must be derivable -- that is, at least one of type or value must be present (this also means that operand and symbol bindings are syntactically distinct). Stack pushes and pops must be declared separately -- see below. Condition codes may not be initialised (type must be present and must be bool). operand may be a wildcard, as described below.

Symbol

A symbol binding has the form "type" const? symbol, where symbol is a program symbol in scope. type is a wildcard indicating the type of symbol, which could be a variable or a function. Within the block, ?(symbol) then refers to the assembly program entity corresponding to the Zig program construct (which need not be an exported symbol -- it may be an internal label, a simple address, or even the referenced data itself on stack machines). A const annotation indicates an immutable binding -- this may be safety-checked by comparing the value at the associated address before and after the block. (NOTE: In some assemblies, many label operations are actually macros, which expand to multiple instructions and relocations -- we'd need some way of propagating this information through the compilation pipeline from codegen to linking.)

Clobber

A clobber is simply "location", which may be a literal or a wildcard.

Wildcards

Wildcards indicate that a binding has special properties, and give the compiler freedom to fill in some details. Wildcards start with ? and run the length of the binding string. A literal ? is escaped with another one, for symmetry with in-block syntax. Wildcards may be followed by architecture-dependent :options to place restrictions on their resolution -- for instance, ?reg:abcd for a legacy x86 register on x86_64, or ?int:lo12 for a 12-bit integer immediate on RISC-V. Options may change the type of a binding -- for instance, "?tmp:all" callconv(.fast) is a clobber that binds all caller-saved registers under the fast calling convention.
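
To make the string rules concrete, here is a minimal sketch in Python of how a binding string might be classified. It is not taken from the proposal: the function name `parse_binding` and the returned tuple shape are invented for illustration, and allowing several `:`-separated options is an assumption (the text only shows one option per wildcard).

```python
def parse_binding(text):
    """Classify a binding string as a wildcard or a literal location.

    Returns ("wildcard", name, options) or ("literal", location, []).
    """
    # A leading "??" is an escaped literal "?", not a wildcard.
    if text.startswith("?") and not text.startswith("??"):
        # The wildcard runs the whole length of the string; anything
        # after a ":" is treated as an architecture-dependent option.
        name, *options = text[1:].split(":")
        return ("wildcard", name, options)
    return ("literal", text.replace("??", "?"), [])
```

With this sketch, `parse_binding("?reg:abcd")` gives `("wildcard", "reg", ["abcd"])`, while `parse_binding("rax")` stays a literal.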

The following wildcards are defined:

Operand

  • ?reg
    Arbitrary register. Register machine architectures only. value may be an integer, a float, or an int/float vector, of any architecturally-supported width and length.
  • ?tmp
    Arbitrary caller-saved register under current calling convention. See above. May be annotated with callconv to specify a different calling convention.
  • ?sav
    Arbitrary callee-saved register under current calling convention. See above.
  • ?lit
    Literal. value must be comptime-known, and may be any architecturally-supported literal type.
  • ?psh
    Array. value must be provided. Length * element size must be a multiple of platform stack alignment; elements must be size-compatible with stack cells if applicable. Pushed onto the stack at block entry, leftmost element topmost. Only one allowed per block. This is the only way of marshaling non-symbol values into assembly on stack machines.
  • ?pop
    Uninitialised array (value must not be provided). See above. Popped from the stack on block exit, topmost element leftmost. This is the only way of marshaling non-symbol values out of assembly on stack machines.
  • ?stg
    Additional stack growth, i.e. growth not already accounted for by ?psh or function calls, in bytes. name, type omitted. value must be comptime-known. (NOTE: This does not imply that the stack pointer has a different value before and after the block -- in fact, unless it is listed as a clobber, this is not allowed.)
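
The ordering and alignment rules for ?psh and ?pop can be modelled with a plain list standing in for the machine stack. This is a sketch in Python under stated assumptions, not part of the proposal: the helper names `push_batch` and `pop_batch` are hypothetical, and sizes are given in bytes.

```python
def push_batch(stack, values, element_size, stack_alignment):
    """Model ?psh: push an array at block entry, leftmost element topmost."""
    if (len(values) * element_size) % stack_alignment != 0:
        raise ValueError("array size is not a multiple of the stack alignment")
    # Push right to left so that values[0] ends up on top of the stack.
    for value in reversed(values):
        stack.append(value)


def pop_batch(stack, count):
    """Model ?pop: pop an array at block exit, topmost element leftmost."""
    return [stack.pop() for _ in range(count)]
```

Because the two orders mirror each other, pushing an array and then popping the same number of cells returns the array unchanged, which is what makes the pair usable for marshaling in both directions.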

Symbol

  • ?locl
    Local variable. Stack machine only.
  • ?argm
    Argument of current function. Stack machine only. Implies const.
  • ?glob
    Global variable.
  • ?thdl
    Thread-local variable.
  • ?comp
    Comptime-known variable/constant. Substitution semantics of a literal. Implies const.
  • ?func
    Function. Registers symbol in this block's call graph. Implies const.

Clobber

  • ?memory
    Unspecified memory.
  • ?status
    Processor status flags.

Body

The assembly code itself, as a comptime string. For symbol scoping purposes, treated as a separate file, i.e. declared symbols do not leak to the rest of the program and elsewhere-defined symbols are not visible except through bindings. May be omitted if only values of registers are desired.

Bound operands and symbols are accessed within the block by enclosing their names in ?(). This syntax was chosen as the ? character is far less commonly used in assembly languages than %, and pairs well with the theme of an unknown resolution -- additionally, parentheses are less likely to have semantic significance than square brackets, so the code is easier to scan. Accessing an unbound name in this manner is a compile error. As with wildcards, names may be modified with :options, for instance ?(r:hi) to access the high byte of register r, or ?(i:x) to print integer i in hexadecimal. A literal ? is escaped with another one, as regular escaping is not possible in multiline strings.
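
As an illustration of the substitution rules, the following Python sketch expands `?(name)`, `?(name:option)` and `??` in a body string. It is an assumption-laden model rather than the proposal's implementation: `substitute` and the `resolved` mapping are hypothetical names, only the `x` (hexadecimal) option is implemented, and the compile error for unbound names is modelled as a `KeyError`.

```python
import re

# Either an escaped "??" or a "?(name)" / "?(name:option)" reference.
_TOKEN = re.compile(r"\?\?|\?\(([A-Za-z_]\w*)(?::(\w+))?\)")


def substitute(body, resolved):
    """Expand references in `body` using `resolved` (name -> resolved text or int)."""

    def replace(match):
        if match.group(0) == "??":
            return "?"  # escaped literal question mark
        name, option = match.group(1), match.group(2)
        if name not in resolved:
            # The proposal makes this a compile error.
            raise KeyError(f"unbound name in asm body: {name}")
        value = resolved[name]
        if option is None:
            return str(value)
        if option == "x":
            return format(value, "x")  # print an integer in hexadecimal
        raise NotImplementedError(f"option not modelled in this sketch: {option}")

    return _TOKEN.sub(replace, body)
```

For example, `substitute("mv ?(stack_ptr), sp", {"stack_ptr": "a0"})` yields `"mv a0, sp"`, where the register `a0` stands in for whatever the compiler would have chosen.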

Post Expression

An expression evaluated after the body, using the final values of all bindings. Becomes the value of the whole block. Preceded by a colon. May be omitted without ambiguity, in which case the return type is void. This permits us to return as many values as we like, in whatever format and location we choose. Moreover, we don't have to specify the exact lifetimes of all of our inputs and outputs to appease the optimiser -- we can decide for ourselves how our values are allocated and consumed.

Examples

Simple, bindless assembly is simple:

comptime assert(builtin.arch == .x86_64);

// No unused names, types on everything
asm { "rax": u64 = 60, "rdi": u64 = 0 } "syscall";

// No unnecessary detail
starting_stack_ptr = asm { "rsp" sp: usize } : sp;

More involved assembly is logical:

// Using #1717 syntax because that proposal has been accepted
// -- this proposal does not depend on #1717
const vendorId = fn () [3]u32 {
    comptime assert(builtin.arch == .x86_64);

    // Multiple return values, anyone?
    return asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : .{ b, c, d };
};

// In case we have trouble getting RLS working, we can do it directly
const vendorId2 = fn (result: *[3]u32) void {
    comptime assert(builtin.arch == .x86_64);

    // void return type implies volatile
    asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : {
        result[0] = b;
        result[1] = c;
        result[2] = d;
    }
};

A simple bare-metal OS entry point on RISC-V:

const stack_height = 16 * 1024;
var stack: [stack_height]usize = undefined;

const _start = fn callconv(.naked) () noreturn {
    comptime assert(builtin.arch == .riscv64);

    asm {
        "?func" kmain,
        "?glob" stack,

        "?reg" stack_size: usize = stack_height,
        "?int" slot_shift: usize = @ctz(@sizeOf(usize)),
        "sp", "ra", "t1",
    }
    \\ slli ?(stack_size), ?(stack_size), ?(slot_shift)
    \\ la sp, ?(stack)
    \\ add sp, sp, ?(stack_size)
    \\ call ?(kmain)
    : unreachable;
};

const kmain = fn () noreturn {
    // kernel kernel kernel
};

POSIX startcode (adapted from lib/std/start.zig):

const _start = fn callconv(.naked) () noreturn {
    if (builtin.os.tag == .wasi) {
        std.os.wasi.proc_exit(@call(.{ .modifier = .always_inline }, callMain, .{}));
    }

    asm {
        "?reg" stack_ptr: [*]usize,
    // Much more compact and local
    } switch (builtin.arch) {
        .x86_64 => "mov ?(stack_ptr), rsp",
        .i386 => "mov ?(stack_ptr), esp",
        .aarch64, .aarch64_be, .arm => "mov ?(stack_ptr), sp",
        .riscv64 => "mv ?(stack_ptr), sp",
        .mips, .mipsel => (
          \\ .set noat
          \\ move ?(stack_ptr), $sp
        ),
        else => @compileError("unsupported arch"),
    }
    // By the time we get here, we have the stack pointer
    // -- so, no global required
    : @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{ stack_ptr });
};

daurnimator added the proposal label May 2, 2020
ghost changed the title from "Enhancement: New Inline Asm Syntax" to "Enhancement: New Inline Assembly Syntax" May 3, 2020
This was referenced May 4, 2020
Vexu added this to the 0.7.0 milestone May 6, 2020
andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 27, 2020
ghost changed the title from "Enhancement: New Inline Assembly Syntax" to "Enhancement: New Inline Assembly" Jan 24, 2021
andrewrk modified the milestones: 0.8.0, 0.9.0 May 19, 2021

lerno commented Aug 8, 2021

Compared to GCC style syntax this is much more verbose. So would it really make sense to do this?

What people really like is MSVC style, where clobbers, register allocation etc are mostly inferred by the compiler. This requires lots of good defaults, but is great to work with: downside is a lot of loss of control, which could be regained with some clever added optional constraints instead. That said it might violate the Zig explicitness goal.

So if the ideal isn't achievable, why pick a style that is completely new for Zig, rather than the de facto standard?

lerno commented Aug 9, 2021

Also, everything is still strings, so it's basically leaving everything stringly typed. Compare this to Rust's asm, where something like xmm_reg at least looks like, and behaves as, an actual constant rather than an arbitrary string. Likewise, registers in this proposal are just strings.

I suspect people will just consider this a confusing, hobbled and verbose version of GCC inline asm.

Compare this:

return asm {
        "eax": u32 = 0,
        "ebx" b: u32,
        "ecx" c: u32,
        "edx" d: u32,
        "?memory",
    } "cpuid"
    : .{ b, c, d };

To

static inline void cpuid(int code, uint32_t* a, uint32_t* d)
{
    asm volatile ( "cpuid" : "=a"(*a), "=d"(*d) : "0"(code) : "ebx", "ecx" );
}

You don't like that? Here's another one using code = 1 without the memory:

int a = 0x1, b, c, d;
asm ( "cpuid"  : "=a" (a), "=b" (b), "=c" (c), "=d" (d) : "0" (a) );

So what is the benefit?

lerno commented Aug 9, 2021

For comparison, the Rust new asm: https://doc.rust-lang.org/beta/unstable-book/library-features/asm.html

N00byEdge commented Feb 1, 2022

I like this syntax. A lot. But there is one thing here that looks a little weird to me:

const _start = fn callconv(.naked) () noreturn {
    if (builtin.os.tag == .wasi) {
        std.os.wasi.proc_exit(@call(.{ .modifier = .always_inline }, callMain, .{}));
    }

    asm {
        "?reg" stack_ptr: [*]usize,
    // Much more compact and local
    } switch (builtin.arch) {
        .x86_64 => "mov ?(stack_ptr), rsp",
        .i386 => "mov ?(stack_ptr), esp",
        .aarch64, .aarch64_be, .arm => "mov ?(stack_ptr), sp",
        .riscv64 => "mv ?(stack_ptr), sp",
        .mips, .mipsel => (
          \\ .set noat
          \\ move ?(stack_ptr), $sp
        ),
        else => @compileError("unsupported arch"),
    }
    // By the time we get here, we have the stack pointer
    // -- so, no global required
    : @call(.{ .modifier = .never_inline }, posixCallMainAndExit, .{ stack_ptr });
};

In here, I just want to grab the value of a register. I don't care what the mov instruction looks like, and I believe that should be left to the compiler to figure out. Should it be allowed to leave the asm body empty and replace "?reg" with a switch on the arch that returns "rsp" etc. instead?

ghost commented Feb 1, 2022

Hmm, I hadn’t thought of that. My instinct is to allow this, but I’m not sure if this would lead to parsing ambiguity. If not, then sure.

andrewrk modified the milestones: 0.10.0, 0.11.0 Apr 16, 2022
andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023

ethindp commented Jul 28, 2023

I have an alternative proposal that, I think, will be much clearer, and far different from GCC inline asm, or any other asm syntax I've seen. And it won't be just string hackery, either. I imagine this proposal will take a long time to actually implement, but it'll be much, much clearer, and very elegant, and fits Zig's Zen (whereas the current proposal doesn't).

The general idea is asm blocks. The syntax is similar, but with some significant differences:

  • As is currently done, asm blocks begin with the keyword asm, followed by the optional keyword volatile. Then, either:
    • a parenthesized list of semicolon-separated inputs, outputs, and clobber specifications followed by a block; or
    • a block.
    • Inputs, outputs, and clobbers are specified before any assembly statements. (I use "assembly statements" here deliberately, see below.) Input specifications take the form inputs: element1, element2, element3, ...; output and clobber specifications are similar but using the keyword outputs or clobbers, respectively.
    • Input, output, and clobber specifications take input, output, or clobber elements. An input, output, or clobber element can be either of the following:
      • the form arg = value, where arg can be a register or variable, and value can either be a register, variable, or 'memory'; or
      • the form value, which can either be a register or variable, or for the case of clobbers, 'memory'.
    • In the second specification form, the (actual) value is the only way to refer to said value in assembly statements; the first form could be considered a "renaming" of the item.
    • The input/output/clobber specifications are optional, but if parentheses come before the block, at least one of those specifications must be provided. There can only be one of all of the specifications in any given asm block.
    • If the input or output is a compound data type (array, slice, struct, union, ...), that entire compound data type is considered as the input or output; you cannot use a constituent field or element of that data type as an input or output on its own.
  • The block contains assembly statements. An assembly statement can either be an assignment statement or instruction.
  • In the case of an assignment statement, the form is a = b;, just as in Zig. However, assignments must be "split" assignments; that is, you cannot do a[2..5] = 3;. This is because assignment statements are directly translated into loads and stores, and this version of the syntax doesn't allow for multi-loads and stores because that would be quite complex depending on the architecture, and this proposal is already complex as is. Perhaps we can change this in the future, but I think this is an acceptable limitation for now.
  • The LHS of an assignment statement must be a register, dereference, or constituent element or field of a compound data type. You cannot write to arbitrary memory using this construction; for that, you have to use actual instructions. This is mainly because allowing arbitrary address writes and reads would look quite odd (at least in my opinion), and the usual way of doing this is to load the address into a register and then write to it that way.
  • In the case of instructions, these look like function calls. This sticks to Zig's "favor reading code over writing code" ideal, and also makes things easier for people who aren't experts with inline assembly. For example, the instruction vpcmpltud k3, ymm3, ymm0 would be translated into vpcmpltud(k3, ymm3, ymm0);. Similarly, the ARM instruction LDR r0, [r1] would be translated into ldr(r0, &r1);. (I'm unsure how to translate an instruction like STMFD sp!, {r0-r3, lr} into this syntax, and would appreciate assistance to refine it.)
  • Labels have the same syntax as in Zig; same for referring to them in assembly statements (e.g. jmp :do_something).
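
To show how mechanical the call-form mapping is, here is a small Python sketch that lowers a call-style instruction statement back to conventional assembly text. It is only a model of the two examples given above, not part of the proposal: the name `lower_statement` is hypothetical, `&reg` is assumed to always mean a simple `[reg]` dereference, and assignment statements, labels, and nested expressions are not handled.

```python
import re

# mnemonic(arg, arg, ...) with an optional trailing semicolon
_CALL = re.compile(r"^\s*(\w+)\((.*)\);?\s*$")


def lower_statement(statement):
    """Turn 'ldr(r0, &r1);' into 'ldr r0, [r1]'."""
    match = _CALL.match(statement)
    if match is None:
        raise ValueError(f"not a call-form instruction: {statement!r}")
    mnemonic, raw_args = match.group(1), match.group(2)
    operands = []
    for arg in (part.strip() for part in raw_args.split(",")):
        if not arg:
            continue  # instruction without operands, e.g. cpuid()
        # Assumption: a leading & marks a register dereference.
        operands.append(f"[{arg[1:]}]" if arg.startswith("&") else arg)
    return f"{mnemonic} {', '.join(operands)}".strip()
```

The instruction forms listed as open questions elsewhere in this comment (register ranges, masking, memory offsets) are exactly the ones this simple mapping cannot express.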

To provide an example in action, here's the classic CPUID on x86, from Agner Fog's asmlib library, which uses a parameter as a return value (but for this example we just allocate it on the stack and use that). The original example is as follows:

cpuid_ex:
%IFDEF   WINDOWS
; parameters: rcx = abcd, edx = a, r8d = c
        push    rbx
        xchg    rcx, r8
        mov     eax, edx
        cpuid                          ; input eax, ecx. output eax, ebx, ecx, edx
        mov     [r8],    eax
        mov     [r8+4],  ebx
        mov     [r8+8],  ecx
        mov     [r8+12], edx
        pop     rbx
%ENDIF        
%IFDEF   UNIX
; parameters: rdi = abcd, esi = a, edx = c
        push    rbx
        mov     eax, esi
        mov     ecx, edx
        cpuid                          ; input eax, ecx. output eax, ebx, ecx, edx
        mov     [rdi],    eax
        mov     [rdi+4],  ebx
        mov     [rdi+8],  ecx
        mov     [rdi+12], edx
        pop     rbx
%ENDIF        
        ret

We'll drop the prologue, and the example in this proposed syntax becomes:

fn cpuid(a: u32, c: u32) [4]u32 {
    var abcd: [4]u32 = undefined;
    asm(inputs: a, c; outputs: abcd; clobbers: eax, ebx, ecx, edx) {
        // Load the leaf and subleaf, mirroring the original's movs into eax and ecx
        eax = a;
        ecx = c;
        cpuid();
        // These are memory-based movs
        abcd[0] = eax;
        abcd[1] = ebx;
        abcd[2] = ecx;
        abcd[3] = edx;
    }
    return abcd;
}

As another example, take a more complex one, loading the GDT (sorry if this isn't quite valid, I'm not the most skilled at this):

Original:

load_gdt:
    push %rbp
    mov %rsp, %rbp
    sub $32, %rsp
    mov 8(%rsp), %rax
    lgdt (%rax)
    pushq $0x08
    lea reload_segment_regs(%rip), %rax
    push %rax
    lretq
reload_segment_regs:
    mov $0x10, %ax
    mov %ax, %ds
    mov %ax, %es
    mov %ax, %fs
    mov %ax, %gs
    mov %ax, %ss
    mov %rbp, %rsp
    pop %rbp
    ret

In this syntax, this becomes:

fn load_gdt(gdt: usize) void {
    asm(inputs: gdt) {
        rax = gdt;
        lgdt(&rax);
        push(0x08);
        lea(rax, :reload_segment_regs); // labels are always PIC/PIE unless `build.zig` explicitly indicates that the executable is not position independent
        push(rax);
        lret(); // long return
    reload_segment_regs:
        // Register-immediate load
        ax = 0x10;
        // register-register load and store
        ds = ax;
        es = ax;
        fs = ax;
        gs = ax;
        ss = ax;
    }
}

Like I said, this definitely needs refinement and I think that this will take a long time to completely implement. However, I think that this is, most likely, the proposal that upholds Zig's Zen and doesn't make inline assembly look like a complete and utter mess. This syntax has the benefit of giving the compiler a lot of information about what you're trying to do, so it could very well optimize your loads/stores into something using AVX or NEON if possible. What I'm unsure about are things like:

  • explicit pointer size indicators (e.g. dword ptr)
  • instruction prefixes (rex64, rex.w, etc.) (though we could perhaps make these just another function call)
  • ARM ranged loads/stores
  • AVX masking and broadcasting (e.g. vmovdqu8 zmm16{k1}{z}, [rsi] or vpaddd ymm4 {k2}, ymm4, dword ptr [ADD1] {1to8})
  • Memory offset (mov rbx, [rax + 0x32] for instance) (though maybe we could do mov(rbx, &(rax op...));?)

If you guys want to help refine this I'd appreciate it. I know that some of the syntactic elements that this introduces are unorthodox, and are quite different from Zig's normal syntax, but I did try to stay as close to Zig as possible while compromising on the fact that this was inline assembly and I didn't really have much of a choice. For the RHS of an assignment statement, most valid expressions are allowed, barring multi-loads or stores; I was thinking that you could even call built-ins as well. When this happens, the load/store would be a multi-load/store, but would finish as a single load/store; for example, if you used xmm0 = @sqrt(...), and @sqrt resulted in the VSQRTSD instruction, the compiler would translate your code appropriately to execute VSQRTSD and then would do a final load into xmm0. Conversely, if it resulted in a libc function call, the compiler would issue the appropriate instructions for a function call, then attempt to store the result in xmm0; if it couldn't, an error would result.

I understand that this syntax would result in "behind your back" instructions in certain instances. In the case of the aforementioned built-in function call idea, discarding that for now would be perfectly reasonable. The assignment statement thing was to eliminate the minutiae of a ton of movs, or whatever the target uses for loads and stores, and to instead allow the programmer to focus on what they really are using inline assembly to accomplish. Obviously, if you really wanted to, you could fall back to setting everything up yourself; this syntax does not prevent you from using any instruction that the target supports, even if there is a more "abstract" syntax available.

lerno commented Jul 31, 2023

@ethindp You might want to draw some inspiration from how C3 does it: https://c3-lang.org/asm/ It creates a very simple, regular grammar and infers clobbers.

ethindp commented Jul 31, 2023

@lerno That's an interesting syntax, but IMO it's not as clear as mine (but mine is more complex since I'm trying to be as flexible as possible).

lerno commented Jul 31, 2023

@ethindp Yes, the focus is trying to be as cheap as possible to implement for various variants of asm.


6 participants