inline assembly improvements #215

Open
andrewrk opened this issue Nov 18, 2016 · 19 comments

Comments

@andrewrk andrewrk commented Nov 18, 2016

This inline assembly does exit(0) on x86_64 linux:

    asm volatile ("syscall"
        : [ret] "={rax}" (-> usize)
        : [number] "{rax}" (60),
            [arg1] "{rdi}" (0)
        : "rcx", "r11");

Here are some flaws:

  • 60 and 0 are number literals and need to be cast to a type to be valid. If you don't cast them, the compiler hits an assertion failure. Assembly syntax should include types for inputs.
  • [number], [arg1], and [ret] are unused, which is awkward.
  • We need multiple return values (see #83).
  • Do we really need this complicated constraint syntax? Maybe we can operate on inputs and outputs.
  • Let's go digging into some real-world inline assembly code to see the use cases.
  • When we get errors from parsing assembly, we don't attach them to the offset within the assembly string. See #2080.
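
For comparison, here is a minimal sketch of how GCC-style extended asm in C spells the same register constraints and clobbers. It is an illustration only, not part of the Zig design: the name raw_getpid is hypothetical, and it issues getpid (syscall 39 on x86_64 Linux) instead of exit (60) so that the call returns. Other targets are assumed to fall back to libc's getpid().

```c
#include <unistd.h>

/* Hypothetical helper: returns the caller's PID via a raw `syscall`
 * instruction on x86_64 Linux, and via libc everywhere else. */
static long raw_getpid(void) {
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)  /* output: rax, typed by the C variable */
                     : "a"(39L)   /* input: rax = 39 (getpid), typed by the literal */
                     : "rcx", "r11", "memory"); /* syscall overwrites rcx and r11 */
    return ret;
#else
    return (long)getpid();
#endif
}
```

In C the operand types come from the expressions in parentheses, which is roughly what the first bullet asks for; the optional symbolic operand names can simply be left out when the template never refers to them.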

@andrewrk andrewrk commented Nov 18, 2016

One idea:

const result = asm volatile ("rax" number: usize, "rdi" arg1: usize, "rcx", "r11")
    -> ("rax" ret: usize)  "syscall" (60, 0);

This shuffles the syntax around and makes it more like a function call. Clobbers are extra "inputs" that don't have a name and a type. The register names are still clunky.

This proposal also operates on the assumption that all inline assembly can operate on inputs and outputs.

@andrewrk andrewrk added this to the 0.1.0 milestone Nov 18, 2016

@andrewrk andrewrk commented Nov 18, 2016

@ofelas can I get your opinion on this proposal?

@ofelas ofelas commented Nov 18, 2016

Right, you really made me think here. I haven't done that much asm in zig yet, but here are a few snippets that I've used on x86. They primarily struggle with the issue of multiple return values. The examples below may not be correct; I always end up spending some time reading the GCC manuals when doing inline asm in C, and it isn't always straightforward.

I just skimmed through the discussion over at Rust users and Rust inline assembly, they seem to have similar discussions and it seems that the asm feature may not be used that much. If you really need highly optimized or complex asm wouldn't you break out to asm (or possibly llvm ir)?

I guess what we have to play with is what LLVM provides, at least as long as zig has a tight connection to it (It seems there are discussions on also supporting Cretonne in Rust according to the LLVM Weekly).

With the above proposal, would I write the PPC eieio (and isync, sync) like this: _ = asm volatile () -> () "eieio" (); and old style: _ = asm volatile ("eieio");? This may typically be available as an intrinsic barrier, I guess. I think I read somewhere that the _ would be the same as Nim's discard; it may not be needed, as this asm doesn't return anything.

Not sure I answered your question...

inline fn rdtsc() -> u64 {
    var low: u32 = undefined;
    var high: u32 = undefined;
    // output in eax and edx, could probably movl edx, fingers x'ed...
    low = asm volatile ("rdtsc" : [low] "={eax}" (-> u32));
    high = asm volatile ("movl %%edx,%[high]" : [high] "=r" (-> u32)); 
    ((u64(high) << 32) | (u64(low)))
}

The above is obviously a kludge. I initially hoped to write it more like this. It does, however, feel strange having to specify the outputs twice, both on the lhs and inside the asm outputs, with the potential of mixing up the order, which may be important.

inline fn rdtsc() -> u64 {
    // output in eax and edx
    var low: u32 = undefined;
    var high: u32 = undefined;
    low, high = asm
        // no sideeffects
        ("rdtsc"
         : [low] "={eax}" (-> u32), [high] "={edx}" (-> u32)
         : // No inputs
         : // No clobbers
         );
    ((u64(high) << 32) | (u64(low)))
}

Or possibly like this, without having to initialize the output-only parameters to undefined or 0:

inline fn rdtsc() -> u64 {
    // output in eax and edx
    const (low: u32, high: u32) = asm
        // no sideeffects
        ("rdtsc"
         : [low] "={eax}" (-> u32), [high] "={edx}" (-> u32)
         : // No inputs
         : // No clobbers
         );
    ((u64(high) << 32) | (u64(low)))
}

I've also tinkered with the cpuid instruction which is particularly nasty;

inline fn cpuid(f: u32) -> u32 {
    // See: https://en.wikipedia.org/wiki/CPUID, there's a boatload of variations...
    var id: u32 = 0;
    if (f == 0) {
        // Multiple outputs (as an ASCII string) which we mark as clobbered and just leave untouched
        return asm volatile ("cpuid" : [id] "={eax}" (-> u32): [eax] "{eax}" (f) : "ebx", "ecx", "edx");
    } else {
        return asm volatile ("cpuid" : [id] "={eax}" (-> u32): [eax] "{eax}" (f));
    }
}

@andrewrk andrewrk commented Nov 18, 2016

With the proposal, rdtsc would look like this in zig:

fn rdtsc() -> u64 {
    const low, const high = asm () -> ("eax" low: u32, "edx" high: u32) "rdtsc" ();
    ((u64(high) << 32) | (u64(low)))
}

This seems like an improvement.
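
As a point of reference, GCC-style extended asm in C already handles the two-output case by listing both registers as outputs of one statement. A minimal sketch, assuming an x86 target; the name read_tsc is hypothetical, and the non-x86 fallback that returns 0 is only there so the snippet compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: reads the time-stamp counter with a single asm
 * statement that has two outputs (eax -> low, edx -> high). */
static uint64_t read_tsc(void) {
#if defined(__x86_64__) || defined(__i386__)
    uint32_t low, high;
    /* volatile keeps the compiler from merging repeated reads */
    __asm__ volatile("rdtsc" : "=a"(low), "=d"(high));
    return ((uint64_t)high << 32) | low;
#else
    return 0; /* assumption: no TSC equivalent on this target */
#endif
}
```

The outputs are bound to variables inside the asm statement itself, so there is no separate left-hand side whose order could get out of sync with the output list.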

cpuid with the proposal. I propose that instead of naming the function after the assembly instruction, we name it after the information we want. So let's choose one of the use cases, get vendor id.

fn vendorId() -> (result: [12]u8) {
    const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
    const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
    const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
    // the vendor string is laid out in ebx, edx, ecx order
    *a, *b, *c = asm () -> ("ebx" a: u32, "edx" b: u32, "ecx" c: u32) "cpuid" ();
}

Once again volatile not necessary here. cpuid doesn't have side effects, we only want to extract information from the assembly.

So far, so good. Any more use cases?

@ofelas ofelas commented Nov 18, 2016

Yes, that ain't too shabby, so with the correct input in eax it is;

fn vendorId() -> (result: [12]u8) {
    const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
    const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
    const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
    // in: eax=0; out: eax=max accepted eax value (clobbered/ignored), string in ebx, edx, ecx
    *a, *b, *c = asm ("eax" func: u32) -> ("ebx" a: u32, "edx" b: u32, "ecx" c: u32, "eax") "cpuid" (0);
}
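
For comparison, a minimal sketch of the vendor-id query in GCC-style extended asm in C, assuming an x86_64 target. The name vendor_id and the "unknown-arch" fallback are hypothetical. One detail worth noting: cpuid leaf 0 returns the twelve vendor bytes in ebx, edx, ecx order, so the registers are copied out in that order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes the 12-byte CPU vendor string plus a NUL
 * terminator into out[13]. */
static void vendor_id(char out[13]) {
#if defined(__x86_64__)
    uint32_t eax = 0; /* leaf 0: vendor string */
    uint32_t ecx = 0; /* sub-leaf, unused by leaf 0 but set for determinism */
    uint32_t ebx, edx;
    /* "+a" and "+c" mark eax/ecx as read-write: inputs that cpuid overwrites */
    __asm__ volatile("cpuid" : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx));
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
#else
    memcpy(out, "unknown-arch", 12); /* assumption: placeholder for other targets */
#endif
    out[12] = '\0';
}
```

Because eax is both the leaf selector and an output, it is expressed here as a read-write operand rather than as an input plus a clobber.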

Would something like this be possible, ignoring my formatting?

result = asm ( // inputs
        "=r" cnt: usize = count,
        "=r" lhs: usize = &left,
        "=r" rhs: usize = &right,
        "=r" res: u8 = result,
        // clobbers
        "al", "rcx", "cc")
        -> ( // outputs
        "=r" res)
        // multiline asm string
        \\movq %[count], %rcx
        \\1:
        \\movb -1(%[lhs], %rcx, 1), %al
        \\xorb -1(%[rhs], %rcx, 1), %al
        \\orb %al, %[res]
        \\decq %rcx
        \\jnz 1b
        // args/parameters
        (count, &left, &right, result);

@andrewrk andrewrk commented Nov 19, 2016

> Yes, that ain't too shabby, so with the correct input in eax it is;

Ah right, nice catch.

I like putting the values of the inputs above as you did. Then we don't need them below.

Is the count arg necessary just to have the movq instruction? It seems like we could pass that in a register.

And then finally, result should be an output instead of an input, right?

So it would look like this:

const result = asm ( // inputs
        "{rcx}" cnt: usize = count,
        "=r" lhs: usize = &left,
        "=r" rhs: usize = &right,
        // clobbers
        "al", "rcx", "cc")
        -> ( // outputs
        "=r" res: u8)
        // multiline asm string
        \\1:
        \\movb -1(%[lhs], %rcx, 1), %al
        \\xorb -1(%[rhs], %rcx, 1), %al
        \\orb %al, %[res]
        \\decq %rcx
        \\jnz 1b
);

This is a good example of why we should retain the constraint syntax, since we might want {rcx} or =r.

@ofelas ofelas commented Nov 19, 2016

I'm not too familiar with x86 asm; I nicked that example from the Rust discussions. In this case rcx (and ecx in 32-bit) is a loop counter, somewhat similar to ctr on PowerPC, so the movq, decq, jnz sequence drives the loop. As long as that condition is met, it probably doesn't matter. Maybe it could have been done with the loop instruction, which decrements and tests at the same time.

result is both an input and an output, like if you were updating a cksum or similar where you would feed in an initial or intermediate value that you want to update.
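
To make the input-and-output case concrete, here is a minimal sketch of the same loop in GCC-style extended asm in C, where a "+r" constraint marks an operand as read-write. It assumes an x86_64 target with a plain C fallback elsewhere; the name xor_accumulate is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ORs the XOR of each byte pair into `acc` and returns
 * the updated value, so a caller can feed in an intermediate result. */
static uint8_t xor_accumulate(const uint8_t *lhs, const uint8_t *rhs,
                              size_t count, uint8_t acc) {
    if (count == 0) return acc;
#if defined(__x86_64__)
    __asm__ volatile(
        "1:\n\t"
        "movb -1(%[lhs], %[cnt], 1), %%al\n\t"
        "xorb -1(%[rhs], %[cnt], 1), %%al\n\t"
        "orb %%al, %[acc]\n\t"
        "decq %[cnt]\n\t"
        "jnz 1b\n\t"
        : [acc] "+r"(acc), [cnt] "+r"(count) /* read-write operands */
        : [lhs] "r"(lhs), [rhs] "r"(rhs)     /* read-only inputs */
        : "rax", "cc", "memory");            /* al is the scratch register */
#else
    while (count--) acc |= (uint8_t)(lhs[count] ^ rhs[count]);
#endif
    return acc;
}
```

Here the loop counter is also a read-write operand rather than a fixed rcx, which leaves the register choice to the compiler and removes the initial movq.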

Are you planning to support all the various architecture specific input/output/clobber constraints and indirect inputs/outputs present in LLVM?

@kiljacken kiljacken commented Dec 9, 2016

Another avenue to go down is the MSVC way of doing inline assembly. M$ does a smart augmented assembly, where you can transparently access C/C++ variables from the assembly. An example would be a memcpy implementation:

void
CopyMemory(u8* Dst, u8* Src, memory_index Length)
{
	__asm {
		mov rsi, Src
		mov rdi, Dst
		mov rcx, Length
		rep movsb
	}
}

It provides a really nice experience. However, MSVC isn't smart about the registers, so all registers used are backed up to the stack before emitting the assembly, and are then restored after the assembly. This avoids the mess of having to specify clobbered registers, but at the cost of a fair bit of performance.

The smart syntax is awesome, but it might be hard to fit with an LLVM backend if you do not want to write an entire assembler as well.

@andrewrk andrewrk added this to the 0.2.0 milestone Apr 21, 2017
@andrewrk andrewrk removed this from the 0.1.0 milestone Apr 21, 2017

@dd86k dd86k commented Oct 19, 2017

As kiljacken says, I personally really, really enjoy the Intel syntax over GAS, as D has done it (except for GDC, which is based on GCC). I'm only assuming it'll be harder to implement an MSVC-style inline assembly feature.

@andrewrk andrewrk commented Oct 19, 2017

The end game is we will have our own assembly syntax, like D, which will end up being compiled to llvm compatible syntax. It's just a lot of work.

I at first tried to use the Intel syntax but llvm support for it is buggy and some of the instructions are messed up to the point of having silent bugs.

@SamTebbs33 SamTebbs33 commented Aug 31, 2019

Points 1 and 2 in the OP seem to be solved.

@EleanorNB EleanorNB commented May 1, 2020

OUTDATED

This has been split off into #5241. This comment will no longer be updated.

New Inline Asm Syntax

asm (arches) (bindings, clobbers) (:return_register|void|noreturn) { local_labels body } (else ...)? + config? (somewhere)

Arches

An optional list of target architectures. If this is null, the block is assumed to be for all architectures (an assembler error is always a compile error). Otherwise, one of these must match builtin.arch, or an else branch must be present. This is a list rather than a single value as some architectures have mutually compatible subsets (e.g. 8086/x86/x86_64, MIPS/RISC-V).

Bindings and Clobbers

Bindings have the form "register" name: type = init_value. name can be _, if the register is desired only for initialisation. name can also be a variable in scope, in which case type and init_value are omitted, and changes to this register's value are taken as changes to the variable. init_value can be undefined, in which case type can be omitted (it doesn't matter much in assembly anyway), unless name is the return register (more on this later). Clobbers are simply "register".

Return Register

A binding can be nominated as the return value, with :name. (Allowing :"register" would cause parsing ambiguity, and this can be trivially done with a binding anyway.) void and noreturn are also allowed. Reaching the end of a noreturn block is safety-checked UB.

Local Labels

A list of local labels. Formatted as strings.

Local labels are unique to the block: %(label) matches %(label) within the block, and is guaranteed not to match anything else in the program. They are listed within the braces of the body because they really don't make sense outside that context.
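
The closest existing analogue I know of is the %= token in GCC-style extended asm, which expands to a number unique to each asm statement instance, so labels built from it cannot collide when the block is inlined more than once. A minimal sketch in C, assuming an x86_64 target with a plain C fallback; the name count_iterations and the label names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: counts loop iterations and returns n. The loop exists
 * only so that the block needs two local labels. */
static inline uint64_t count_iterations(uint64_t n) {
    uint64_t iterations = 0;
#if defined(__x86_64__)
    __asm__ volatile(
        "testq %[n], %[n]\n\t"
        "jz .Lend%=\n"       /* %= makes the label unique per asm instance */
        ".Lloop%=:\n\t"
        "incq %[it]\n\t"
        "decq %[n]\n\t"
        "jnz .Lloop%=\n"
        ".Lend%=:\n"
        : [n] "+r"(n), [it] "+r"(iterations)
        :
        : "cc");
#else
    while (n--) iterations++;
#endif
    return iterations;
}
```

The difference from the proposal is that %= is spliced in by hand at every use, whereas a declared label list would let the compiler check that each %(label) was actually listed.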

Body

The assembly code itself, as a string. If this fails to assemble, it's a compile error.

The following macros are defined:

  • %[name]
    Register, as specified in bindings section.
  • %(label)
    Label, as listed in local labels section.
  • @[variable]
    Pre-mangled global variable name. Used to reference globals. See #5211.
  • @(function)
    Pre-mangled function name. Used to call functions. See #5211.

A literal % or @ is escaped with another one: %% or @@. Strictly speaking, if we're substituting text, only one of @[] and @() is needed -- but, if we want to integrate the assembler with the compiler, the distinction may be important, so I've listed both.

Else

If arches is non-null and none of the listed architectures match builtin.arch, this is compiled instead. Can be used to switch on architectures, optimise a specific architecture only, or simply @compileError. If this is not present, a target mismatch is a compile error.

N.B.: An else branch is only allowed if arches is non-null. This decision was made because, when you set arches to null, either you know execution will never reach this point on the wrong architecture, or you only care about compiling for a specific architecture. In the former case, you definitely want an unexpected architecture to be a compile error; and in the latter, to support a new architecture, the laziest thing you can do is start caring.

Config

Configuration is passed in a pragma (#5239) with the following fields:

  • impure
    This block has side effects.
  • stack(n)
    This block allocates n bytes on the stack. Defaults to 0.
  • calls(funcs)
    This block calls the functions listed in funcs. Defaults to .{}.

Example

const builtin = @import("builtin");

const fib_asm = fn (n: u32) u32 {
    return asm (.{ builtin.Arch.riscv64, builtin.Arch.riscv32 }) @{
        stack(12),
        calls(.{ fib_iter }),
    } (
        "a0" this  : u32 = 0,
        "a1" next  : u32 = 1,
        "=r" to_go : u32 = n,
    ) :this {
        .{ "loop", "end" }

        \\%(loop):
        \\  beqz %[to_go], %(end)
        // We can do function pro/epi at callsite!
        \\  addi sp, -12
        \\  sd ra, 0(sp)
        \\  sw %[to_go], 8(sp)
        \\  call @(fib_iter)
        \\  lw %[to_go], 8(sp)
        \\  ld ra, 0(sp)
        \\  addi sp, 12
        \\  addi %[to_go], -1
        \\  j %(loop)
        \\%(end):
    } else @compileError("Your machine could be better");
};

// Actually returns two values, but the compiler has no way to express that
const fib_iter = fn @{callconv(.Naked)} (this: u32, next: u32) void {
    // No need to check architecture -- we'll only call this from fib_asm
    asm (null) @{impure} (
        "a0" this,
        "a1" next,
        "=r" temp = undefined,
    ) void {
        .{}

        \\  add %[temp], %[this], %[next]
        \\  mv %[this], %[next]
        \\  mv %[next], %[temp]
    };
};

TL;DR: Benefits over Status Quo

  • If any of the sections are missed, the compiler can detect exactly which ones
  • Order of mandatory components has a logical progression, just like function declaration
  • Option to tie to target architecture
  • Registers have types
  • Can express non-returning and valueless assembly
  • Can reference global variables and call functions
  • Won't unexpectedly jump to random points in the program
  • Communicates metadata to compiler, but does not require it
  • Provides alternative for unsupported architectures
  • Can be automatically distinguished from status quo, albeit with some lookahead
  • Can be automatically derived from status quo

@EleanorNB EleanorNB commented May 1, 2020

Ok, sorry, I changed it. I can't help it, I'm a perfectionist.

@EleanorNB EleanorNB commented May 1, 2020

Ok, it's a living document. I'll admit it.

@EleanorNB EleanorNB commented May 1, 2020

I've split it off into its own issue. See above.

@EleanorNB EleanorNB commented May 8, 2020

Hey @andrewrk -- given the emphasis on stabilisation in this release cycle, should we take the time to get this right now, so we're not stuck with it forever?

@EleanorNB EleanorNB commented May 10, 2020

Hey, I did a fairly major rework of #5241 recently. Now there's a more powerful constraint syntax.

@andrewrk andrewrk commented Jun 9, 2020

Possible inspiration from Rust: New inline assembly syntax available in nightly

@EleanorNB EleanorNB commented Jun 10, 2020

For those who want to look further into that, there's more here.

There's a lot of good stuff there. The two deal-breakers for me are contextually repurposed syntax (out is not a function, reg is not a variable) and behind-the-scenes non-configurable action (assigning outputs). I've updated #5241 with the good stuff.

@andrewrk andrewrk removed this from the 0.7.0 milestone Oct 9, 2020
@andrewrk andrewrk added this to the 0.8.0 milestone Oct 9, 2020
@andrewrk andrewrk removed this from the 0.8.0 milestone Apr 21, 2021
@andrewrk andrewrk added this to the 0.9.0 milestone Apr 21, 2021
@andrewrk andrewrk removed this from the 0.9.0 milestone May 19, 2021
@andrewrk andrewrk added this to the 0.10.0 milestone May 19, 2021