inline assembly improvements #215
One idea: const result = asm volatile ("rax" number: usize, "rdi" arg1: usize, "rcx", "r11")
-> ("rax" ret: usize) "syscall" (60, 0); This shuffles the syntax around and makes it more like a function call. Clobbers are extra "inputs" that don't have a name and a type. The register names are still clunky. This proposal also operates on the assumption that all inline assembly can operate on inputs and outputs. |
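For comparison, here is roughly how the same inputs, output, and clobbers are spelled with GCC/Clang extended asm in C. This is only a sketch: `raw_syscall1` and `raw_syscall_supported` are hypothetical helper names, and the asm path is assumed to apply only on x86_64 Linux (elsewhere the helper just returns -1). The `rcx` and `r11` clobbers are there because the `syscall` instruction itself overwrites those two registers.

```c
/* Hypothetical helper: reports whether the inline-asm path below is compiled in. */
static int raw_syscall_supported(void) {
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: a one-argument Linux syscall on x86_64.
 * Inputs:   rax = syscall number, rdi = first argument.
 * Output:   rax = return value.
 * Clobbers: rcx and r11 (overwritten by the syscall instruction), plus memory. */
static long raw_syscall1(long number, long arg1) {
#if defined(__x86_64__) && defined(__linux__) && defined(__GNUC__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(number), "D"(arg1)
                      : "rcx", "r11", "memory");
    return ret;
#else
    (void)number;
    (void)arg1;
    return -1; /* no asm path on this target */
#endif
}
```

With this helper, `raw_syscall1(60, 0)` would correspond to the `exit(0)` call in the proposal; syscall 39 (`getpid` on x86_64 Linux) is a harmless one to try first.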
@ofelas can I get your opinion on this proposal? |
Right, you really made me think here. I haven't done that much asm in Zig yet; here are a few that I've used on x86. They primarily struggle with the issue of multiple return values. The examples below may not be correct: I always end up spending some time reading the GCC manuals when doing inline asm in C, as it isn't always straightforward. I just skimmed through the discussions over at Rust users and Rust inline assembly; they seem to have similar discussions, and it seems that the asm feature may not be used that much. If you really need highly optimized or complex asm, wouldn't you break out to asm (or possibly LLVM IR)? I guess what we have to play with is what LLVM provides, at least as long as Zig has a tight connection to it (it seems there are discussions on also supporting Cretonne in Rust, according to LLVM Weekly). With the above proposal would I write the PPC Not sure I answered your question...
The above is obviously a kludge. I initially hoped to write it more like this; it does, however, feel strange having to specify the outputs twice, both on the lhs and inside the asm outputs, with the potential of mixing up the order, which may be important.
Or possibly like this, not having to initialise the output-only parameters with undefined/zeroes/0:
I've also tinkered with the
|
With the proposal, fn rdtsc() -> u64 {
const low, const high = asm () -> ("eax" low: u32, "edx" high: u32) "rdtsc" ();
((u64(high) << 32) | (u64(low)))
} This seems like an improvement. cpuid with the proposal. I propose that instead of naming the function after the assembly instruction, we name it after the information we want. So let's choose one of the use cases, get vendor id. fn vendorId() -> (result: [12]u8) {
const a: &u32 = (&u32)(&result[0 * @sizeOf(u32)]);
const b: &u32 = (&u32)(&result[1 * @sizeOf(u32)]);
const c: &u32 = (&u32)(&result[2 * @sizeOf(u32)]);
*a, *b, *c = asm () -> ("ebx" a: u32, "ecx" b: u32, "edx" c: u32) "cpuid" ();
} Once again So far, so good. Any more use cases? |
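One detail worth pinning down for the vendor ID case: CPUID leaf 0 expects 0 in `eax` on entry and returns the twelve vendor bytes in the order `ebx`, `edx`, `ecx` (not `ebx`, `ecx`, `edx`). A minimal C sketch of the multi-output form with GCC/Clang extended asm follows; `pack_vendor` and `vendor_id` are hypothetical names, and the asm path is assumed to be x86_64 only.

```c
#include <stdint.h>

/* Pack the three CPUID leaf-0 registers into a NUL-terminated 12-byte string.
 * Bytes are extracted by shifting, so this does not depend on host endianness. */
static void pack_vendor(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13]) {
    const uint32_t regs[3] = { ebx, edx, ecx }; /* hardware order */
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
    out[12] = '\0';
}

/* Returns 1 and fills `out` on x86_64 with GCC/Clang; returns 0 elsewhere. */
static int vendor_id(char out[13]) {
#if defined(__x86_64__) && defined(__GNUC__)
    uint32_t eax, ebx, ecx, edx;
    /* Four outputs; "0" and "2" tie the leaf and subleaf inputs to eax and ecx. */
    __asm__ volatile ("cpuid"
                      : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                      : "0"(0u), "2"(0u));
    (void)eax; /* highest supported leaf, unused here */
    pack_vendor(ebx, edx, ecx, out);
    return 1;
#else
    out[0] = '\0';
    return 0;
#endif
}
```

Keeping the packing step separate from the asm makes the byte order testable without executing `cpuid` at all.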
Yes, that ain't too shabby, so with the correct input in
Would something like this be possible, ignoring my formatting?
|
Ah right, nice catch. I like putting the values of the inputs above as you did. Then we don't need them below. Is the movq instruction for the count arg necessary? It seems like we could pass that as a register. And finally, result should be an output instead of an input, right? So it would look like this: const result = asm ( // inputs
"{rcx}" cnt: usize = count,
"=r" lhs: usize = &left,
"=r" rhs: usize = &right,
// clobbers
"al", "rcx", "cc")
-> ( // outputs
"=r" res: u8)
// multiline asm string
\\1b:
\\movb -1(%[lhs], %rcx, 1), %al
\\xorb -1(%[rhs], %rcx, 1), %al
\\orb %al, %[res]
\\decq %rcx
\\jnz 1b
); This is a good example of why we should retain the constraint syntax, since we might want |
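As a reference for what the loop above appears to compute, here is a plain C version: it ORs together the XOR of each byte pair, walking from the last byte to the first, so the result is zero only when the two buffers match. The name `xor_accumulate` is hypothetical. Two differences from the asm are deliberate assumptions: the accumulator starts at zero explicitly (the `orb` reads `res`, so the asm version would need it initialised and treated as read-write), and a count of zero is handled, which the `decq`/`jnz` loop as written would not do.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the first `count` bytes of both buffers are equal, nonzero otherwise.
 * Mirrors the asm loop: index with count-1, XOR the pair, OR into the accumulator. */
static uint8_t xor_accumulate(const uint8_t *left, const uint8_t *right, size_t count) {
    uint8_t res = 0;
    while (count != 0) {
        res |= (uint8_t)(left[count - 1] ^ right[count - 1]);
        count--;
    }
    return res;
}
```

A function like this is handy as an oracle when checking a hand-written asm version against random inputs.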
Not too familiar with the
Are you planning to support all the various architecture specific input/output/clobber constraints and indirect inputs/outputs present in LLVM? |
Another avenue to go down is the MSVC way of doing inline assembly. M$ does a smart augmented assembly, where you can transparently access C/C++ variables from the assembly. An example would be a memcpy implementation: void
CopyMemory(u8* Dst, u8* Src, memory_index Length)
{
__asm {
mov rsi, Src
mov rdi, Dst
mov rcx, Length
rep movsb
}
} It provides a really nice experience. However, MSVC isn't smart about the registers, so all registers used are backed up to the stack before emitting the assembly, and are then restored after the assembly. This avoids the mess of having to specify clobbered registers, but at the cost of a fair bit of performance. The smart syntax is awesome, but it might be hard to fit with an LLVM backend if you do not want to write an entire assembler as well. |
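For contrast, the same copy written with GCC/Clang constraints might look like the sketch below. Instead of the compiler saving every register, the three registers that `rep movsb` modifies are declared as read-write operands. `copy_bytes` is a hypothetical name, the asm path is assumed to be x86_64 only, and it relies on the direction flag being clear, which the usual calling conventions guarantee.

```c
#include <stddef.h>

/* Copies `len` bytes from src to dst.
 * On x86_64, rdi/rsi/rcx are passed as "+D"/"+S"/"+c" because rep movsb
 * advances both pointers and counts rcx down to zero. */
static void copy_bytes(void *dst, const void *src, size_t len) {
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(len)
                      :
                      : "memory");
#else
    /* Portable fallback for other targets. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (len--)
        *d++ = *s++;
#endif
}
```

The trade-off discussed above is visible here: nothing is spilled that does not need to be, but the author has to know which registers the instruction touches.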
As kiljacken says, I personally really, really enjoy the Intel syntax over GAS as D has done it (except for GDC, which is based on GCC). I'm only assuming it'll be harder to implement a MSVC-styled inline assembly feature. |
The end game is we will have our own assembly syntax, like D, which will end up being compiled to llvm compatible syntax. It's just a lot of work. I at first tried to use the Intel syntax but llvm support for it is buggy and some of the instructions are messed up to the point of having silent bugs. |
That's a constructive point! Thanks:)
Oh, I had forgotten the godbolt and surely didn't know it supports msvc there too! Thanks again. |
Could Zig start supporting Intel's assembler syntax like LLVM's clang does? |
As far as I know, Intel asm syntax support is broken in LLVM, and GCC has an incomplete implementation too. I forget exactly what Intel asm support in GCC lacks, but I've stumbled on that issue and had to revert to AT&T syntax. If I remember correctly, it was dereferencing named inline asm constraints that doesn't work in GCC/LLVM. |
That's right man. I got an answer that can partially help me for Zig...
|
@isoux, to me the link gives no answer and says:
...
It would be interesting to know at least the name of the Discord channel. |
It's #os-dev in the Zig discord: https://discord.gg/zig |
My apologies, I forgot that the link points to https://discord... where only registered users can access. |
I don't know if this is interesting for Zig, seeing as Zig's now working on multiple backends where possibly some sort of distinct asm syntax might be used. So what I noticed when I was doing research on inline asm is that it was possible to create a uniform asm syntax that would cover pretty much all architectures.
What you see here is x64 of course, but it works similarly for aarch64. The syntax is Where An "arg" in my language is one of
This AST is then taken and translated into whatever the backend's format is; it also handles things like auto-detecting clobbers. In the LLVM backend case this generates the constraints and the asm string. Because the AST is the same regardless of arch, and the whole translation can be fairly declaratively described, it should be significantly less work to do this than to have an inline asm which is unique per architecture. It can be type checked and validated in a fairly generic way. This is already implemented for C3, but with the following restrictions: (1) the entire set of instructions is not done yet; I did the proof of concept, then left it to others to fill out the list of instructions. (2) Labels are lacking. Those should be added. That said, it should be possible to play around with what's there. |
And how does your asm syntax handle clobbers? How does it cast types of inputs and outputs? What you're proposing is just a version of a black box inline asm that msvc had for x86. They had to abandon it in x86_64 because it was bad. I see it as a no go. It might look nice, but its performance and applicability is very low. Sure, it will work for syscalls, but for something like crypto instructions and complex optimizations, no. |
The syntax does not handle clobbers. But the compiler will calculate the minimal set of clobbers as it knows the set of clobbers for each instruction. Each instruction has a tiny definition in the source code, e.g. reg_instr("movzbq", "w:r64/mem, r8/mem");
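To make the idea of computing clobbers from per-instruction definitions concrete, here is a small C sketch. It is not C3's implementation: the table, the bit names, and `clobbers_for` are all hypothetical, and only implicit register writes are modelled (registers written through explicit operands would have to be added from the operand list). An unknown mnemonic is assumed to clobber everything, as a pessimistic default.

```c
#include <stddef.h>
#include <string.h>

/* One bit per tracked register; CLOB_ALL is the pessimistic "everything" set. */
enum {
    CLOB_RAX = 1 << 0, CLOB_RBX = 1 << 1, CLOB_RCX = 1 << 2,
    CLOB_RDX = 1 << 3, CLOB_R11 = 1 << 4, CLOB_FLAGS = 1 << 5,
    CLOB_ALL = (1 << 6) - 1
};

struct instr_def {
    const char *mnemonic;
    unsigned implicit_writes; /* registers the instruction writes without naming them */
};

/* Hypothetical definition table; a real one would cover the whole instruction set. */
static const struct instr_def INSTR_TABLE[] = {
    { "syscall", CLOB_RAX | CLOB_RCX | CLOB_R11 },
    { "rdtsc",   CLOB_RAX | CLOB_RDX },
    { "cpuid",   CLOB_RAX | CLOB_RBX | CLOB_RCX | CLOB_RDX },
    { "decq",    CLOB_FLAGS },
};

/* Union of the implicit writes of every instruction in the block. */
static unsigned clobbers_for(const char *const *mnemonics, size_t n) {
    unsigned mask = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned found = CLOB_ALL; /* unknown instruction: assume the worst */
        for (size_t j = 0; j < sizeof INSTR_TABLE / sizeof INSTR_TABLE[0]; j++) {
            if (strcmp(mnemonics[i], INSTR_TABLE[j].mnemonic) == 0) {
                found = INSTR_TABLE[j].implicit_writes;
                break;
            }
        }
        mask |= found;
    }
    return mask;
}
```

The design choice here is that missing knowledge widens the clobber set rather than narrowing it, so an incomplete table costs performance but not correctness.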
I am not proposing anything. I am just explaining how it works in C3, to offer some food for thought regarding the future Zig inline assembly improvements. Also, you are wrong about what it does, in particular, you should note that I say:
x86 MSVC inline asm just clobbers everything at input/output. Also the inline asm of MSVC retains the full MASM syntax, which adds quite a bit of complexity to the lexer and parser, not to mention then having to also support aarch64 later on. In addition to this, due to the compiler not being able to reason about the asm (even when auto-detecting clobbers), it is a fairly reasonable solution to look at intrinsics. But the problem with intrinsics is that you need so many of them.
It was bad for the above reasons yes. The C3 asm shares none of those problems (except of course the compiler not being able to reason about the internals of the asm aside from the clobbers and in/out parameters) Given this inline asm: asm
{
movl x, 4;
movl [gp], x;
movl x, $eax;
movq [&z], 33;
} The following LLVM IR is generated: %4 = load ptr, ptr %gp, align 8
%5 = call i32 asm alignstack "movl $$4, $0\0Amovl $0, ($2)\0Amovl %eax, $0\0Amovq $$33, $1\0A", "=&r,=*m,r,~{flags},~{dirflag},~{fspr}"(ptr elementtype(i64) %z, ptr %4) We can try something simpler, e.g. asm
{
movl x, $eax;
} Now this can be a bit confusing as I use intel ordering but the AT&T suffixes (you can do whatever you want with this) Anyway this yields %4 = call i32 asm alignstack "movl %eax, $0\0A", "=r,~{flags},~{dirflag},~{fspr}"()
store i32 %4, ptr %x, align 4 Like we want. I mean this is just the rough outlines to explain how it's done. So it's fairly straightforward:
|
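The constraint strings in the LLVM IR above follow a simple layout: output constraints first, then input constraints, then each clobber wrapped as `~{name}`, all comma-separated. A minimal C sketch of that last assembly step is shown here; `append_piece` and `build_constraints` are hypothetical names, and this only joins strings that some earlier pass is assumed to have already chosen.

```c
#include <stdio.h>
#include <string.h>

/* Appends prefix+piece+suffix to buf, preceded by a comma if buf is non-empty.
 * Returns 0 on success, -1 if the result would not fit in `cap` bytes. */
static int append_piece(char *buf, size_t cap, const char *prefix,
                        const char *piece, const char *suffix) {
    size_t used = strlen(buf);
    int n = snprintf(buf + used, cap - used, "%s%s%s%s",
                     used ? "," : "", prefix, piece, suffix);
    return (n < 0 || (size_t)n >= cap - used) ? -1 : 0;
}

/* Builds "outputs,inputs,~{clobber},..." into buf. Returns 0 on success, -1 on overflow. */
static int build_constraints(char *buf, size_t cap,
                             const char *const *outputs, size_t n_out,
                             const char *const *inputs, size_t n_in,
                             const char *const *clobbers, size_t n_clob) {
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < n_out; i++)
        if (append_piece(buf, cap, "", outputs[i], "") != 0) return -1;
    for (size_t i = 0; i < n_in; i++)
        if (append_piece(buf, cap, "", inputs[i], "") != 0) return -1;
    for (size_t i = 0; i < n_clob; i++)
        if (append_piece(buf, cap, "~{", clobbers[i], "}") != 0) return -1;
    return 0;
}
```

Fed one `=r` output and the three clobbers `flags`, `dirflag`, `fspr`, this reproduces the constraint string of the simpler `movl x, $eax` example.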
Note that there are even more subtle uses of constraints available, such as pinning variables to particular registers. This can be achieved by inserting pseudo-instructions, e.g. @pin(x, $eax); or some such. I have not implemented this myself, but it is an obvious improvement one could build on. Or simply retain the string-based inline asm for the cases when more exact control is needed. |
Interesting. Thank you for in-depth explanation. I will look into it more. |
C3 documentation says:
@lerno, may I ask you something? Please excuse me if I sound dumb. How does the C3 compiler detect the ISA used in the Personally, I would like to be able to indicate the targeted ISA explicitly. E.g. to make the current core sleep on x86 and MIPS platforms, it would be nice to have something like this:
... so only the specific asm-block gets built on a given architecture (if any). |
Assembly can be used to call or jump to external code that the compiler has no awareness of. For example, the DOS API via INT 21. Manually specifying clobbers seems unavoidable in that type of situation. |
Absolutely. Right now I wrap it in a compile time if, which works fine since they all parse the same way. I think such an inline annotation is useful though, but there is a question whether such annotation is just on architecture or is also switched on CPU features for example. |
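In C the equivalent of wrapping each asm block in a compile-time `if` is a preprocessor switch on the target, sketched below. `spin_hint` is a hypothetical name. Since instructions that actually put a core to sleep are typically privileged, this sketch substitutes the user-mode spin-wait hints (`pause` on x86, `yield` on AArch64) as a stand-in, and it switches on architecture only, not on CPU features.

```c
/* Emits the target's spin-wait hint, if one is known, and returns the
 * mnemonic that was compiled in ("none" when there is no asm path). */
static const char *spin_hint(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile ("pause" ::: "memory");
    return "pause";
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ volatile ("yield" ::: "memory");
    return "yield";
#else
    return "none"; /* unknown target: no asm block is built at all */
#endif
}
```

Only one branch is ever parsed by the assembler for a given target, which is the property asked for above.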
Certainly, and there are probably several ways one could handle this. Off the top of my head:
|
I would want a robust inline asm instead of easy one for most use cases. |
The question is "what is robust" though. It is a well known problem that string based inline asm has zero checks that it actually uses the correct clobbers etc. At least in my research I found pages discussing bugs in asm blocks due to people either misunderstanding the constraints or mistakenly forgetting about some. The problem here of course being that they're only observable in specific scenarios, so it would occur almost randomly depending on the final register allocation and code ordering. So one also has to ask oneself: is manual clobbering LESS or MORE likely to be correct than automatic clobbering? If one is worried about things like INT21, then perhaps start with a pessimistic clobber, allowing the user to ease the restrictions. This is something that should be discussed. |
Robust in the sense that Zig can have one inline asm syntax that can do everything without compromises.
|
Instead of text asm we could perhaps have a and then language-level inline asm would only be text that's generated at comptime by the functions |
@nektro Do you mean essentially making a mini DSL using compile time functions corresponding to the particular instructions? |
This is an immense amount of work to make even these simple intrinsics FOR EVERY ARCH, not counting adding safety. Also, what do you mean by LLVM forgoing optimization for inline asm? It does everything in its power to make the assembly block performant based on inputs, outputs, and clobbers. It won't touch your asm code, for the reasons you (the programmer) wrote it in the first place. |
Why not just make asm blocks with a zig-like DSL? Reuse as much zig syntax as possible: asm volatile {
mov (rax, ecx);
} IMO the asm syntax shouldn't be annoying or ridiculously complicated. As it currently stands, the current asm syntax definitely doesn't convey intent precisely, nor do any of the proposals that I've seen on this thread -- they all just make everything more confusing to me, and I assume it's confusing to the majority of others who try to write it, let alone read it. Though this isn't a problem unique to Zig. @eLeCtrOssSnake You want a "robust" solution for asm syntax; many of us want an easy to read and write one too. The only way you could achieve a robust one is to write your own assembler. It'll take a lot of work, but that's inevitable anyway unless you don't want anything checking the correctness of your code and just "assuming" you're an assembly genius who always knows what they're doing. Sure, LLVM is good at optimizing asm blocks with inputs/outputs/clobber specs, but it sacrifices readability and writability and it doesn't actually check the constraint codes you use other than verifying they're correct. |
The MSVC way is good; I know that D and Pascal do something like it. But it lacks any constraint mechanism. But that could probably be added. Point is, if we're going to stick with ridiculously unreadable inline assembly I might as well just forgo inline asm and just write a bunch of asm in a big .s file for the architecture I'm trying to support; that at least is readable and comprehensible. It doesn't seem too unreasonable to expect a similar guarantee from my programming language. |
Would one even need to specify clobbers if every arch and instruction were supported? Is there a scenario where we wouldn't know which regs were clobbered? |
I see a lot of newly proposed syntax to structure the inputs, outputs, and clobbers of an inline assembly. This is all in my opinion unreadable and difficult for a beginner to understand. I believe this could all be implemented reusing existing Zig syntax of anonymous structs, tuples, and enum literals: // asm [volatile] (
// assembly string,
// anonymous struct of ins
// anonymous struct of outs
// anonymous struct (tuple) of clobbers
// )
pub fn syscall3(number: usize, arg1: usize, arg2: usize, arg3: usize) usize {
return asm volatile (
"syscall",
.{
.rax = .{ number, "number" },
.rdi = .{ arg1, "arg1" },
.rsi = .{ arg2, "arg2" },
.rdx = .{ arg3, "arg3" },
},
.{ .rax = usize },
.{ .rcx, .r11 },
);
} I apologize for using a simple syscall as an example, but I do not know assembly, and I am afraid I could not write a correct string of more complex assembly. I hope it gets my idea of the syntax across. Giving the inputs types could either be done through the type of the variable passed in or another argument of the tuple. Since this uses entirely valid existing Zig syntax, this could even use a builtin instead of a keyword. I am, however, unsure of using a builtin due to the inability to use the volatile keyword. Volatility could be accomplished through a boolean flag, but I would much prefer the usage of the existing volatile keyword. |
@Spitz7279 I'm not exactly sure how what you're proposing is any more readable than a syntax like the one I proposed previously. I mean, yes, that reuses (some) Zig syntax, but at this point why not make the syntax just automatically deduce templated arguments or something? |
My proposed usage reuses all valid Zig syntax. Outside of an inline assembly block, it would be valid Zig that you could parse on your own. It also retains the comptime known string of assembly, allowing the developer to use |
This inline assembly does exit(0) on x86_64 Linux:
Here are some flaws:
- [number], [arg1], [ret] unused, and that is awkward.
- When we get errors from parsing assembly, we don't attach them to the offset from within the assembly string. See: connect inline assembly errors from zig back to the source #2080