arch/x86: Changed system call ABI #4452

Open: wants to merge 1 commit into base: x86-mmu

Conversation

Ioan-Cristian
Contributor

@Ioan-Cristian Ioan-Cristian commented Jun 3, 2025

Pull Request Overview

This pull request changes the system call ABI on x86. Instead of passing arguments and return values on the stack, they are passed in registers. The following registers are used for argument/return values: ebx, ecx, edx, edi.

This change is needed to simplify the implementation of the MMU on the x86 architecture, where the virtual address space of a process doesn't match the virtual address space of the kernel. In that case, writing/reading from stack led to page faults due to missing pages.

The new system call ABI mimics the Linux kernel's ABI and Tock's RISC-V ABI.
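As a rough illustration of the register assignment described above, here is a minimal sketch in plain Rust that models the register image at the `int 0x40` boundary. The `SyscallRegs` type and the `pack` and `unpack_allow` helpers are hypothetical names invented for this sketch, not code from the PR. Placing the system call class in `eax` is an inference from the disassembly quoted later in this thread; the class and return-variant numbers are the Tock 2.0 values from TRD104.

```rust
/// Register image at the `int 0x40` boundary (hypothetical type, illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SyscallRegs {
    eax: u32, // system call class (inferred from the disassembly, not stated in the PR text)
    ebx: u32, // argument 0 / return value 0
    ecx: u32, // argument 1 / return value 1
    edx: u32, // argument 2 / return value 2
    edi: u32, // argument 3 / return value 3
}

const CLASS_ALLOW_RO: u32 = 4; // TRD104 class number for read-only allow
const FAILURE_U32_U32: u32 = 2; // TRD104 return variant
const SUCCESS_U32_U32: u32 = 130; // TRD104 return variant (0x82)

/// Place the class and the four arguments in the registers named by the PR.
fn pack(class: u32, args: [u32; 4]) -> SyscallRegs {
    SyscallRegs { eax: class, ebx: args[0], ecx: args[1], edx: args[2], edi: args[3] }
}

/// Decode an allow return: Ok((old pointer, old length)) or Err(error code).
/// Returns None for a variant that allow is not expected to produce.
fn unpack_allow(regs: &SyscallRegs) -> Option<Result<(u32, u32), u32>> {
    match regs.ebx {
        SUCCESS_U32_U32 => Some(Ok((regs.ecx, regs.edx))),
        FAILURE_U32_U32 => Some(Err(regs.ecx)),
        _ => None,
    }
}

fn main() {
    // allow-ro(driver = 1, buffer = 1, pointer, length); the pointer is a made-up address.
    let call = pack(CLASS_ALLOW_RO, [1, 1, 0x2000_0100, 12]);
    println!("{:x?}", call);

    let ret = SyscallRegs { eax: 0, ebx: SUCCESS_U32_U32, ecx: 0, edx: 0, edi: 0 };
    println!("{:?}", unpack_allow(&ret));
}
```

Nothing in this model touches process memory, which is the property the PR relies on: the kernel only reads and writes the saved register context.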

Testing Strategy

This pull request was tested by running two libtock-rs applications on QEMU.

TODO or Help Wanted

No help needed.

Documentation Updated

  • No updates required.

Formatting

  • Ran make prepush.

@alexandruradovici
Contributor

We are preparing a set of several pull requests to unify the usage of MPU and MMU, possibly with paging capability, for Tock. This pull request is required to simplify the work.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 3, 2025

I also decreased the value of MIN_APP_BRK, since the upcall arguments are passed in registers rather than on stack.

bradjc
bradjc previously approved these changes Jun 3, 2025
brghena
brghena previously approved these changes Jun 3, 2025
ppannuto
ppannuto previously approved these changes Jun 3, 2025
@ppannuto ppannuto added the last-call Final review period for a pull request. label Jun 3, 2025
@alevy
Member

alevy commented Jun 3, 2025

Will/does this also require changes to libtock-rs/libtock-c for x86?

@Ioan-Cristian
Contributor Author

Will/does this also require changes to libtock-rs/libtock-c for x86?

There is no upstream x86 support for libtock-c nor libtock-rs. There is a pending PR for libtock-rs that attempts to introduce x86 support for the old ABI. Once it is merged, I will create a PR for the new ABI. The required changes are minimal.

@alevy
Member

alevy commented Jun 4, 2025

@reynoldsbd @HMiyaziwala any reason changing the ABI in this way would be a problem?

@reynoldsbd
Contributor

On mobile so I haven't had a chance to review this PR yet. But the motivation behind the current ABI was to mirror cdecl, which had the specific benefit of making upcalls easier to implement and allowing the usermode callback to be a plain cdecl function. It also made the crt0 easier to write.

I've got no special attachment to the ABI, however I would prefer to preserve the "upcall handler being regular cdecl" feature. So maybe it's possible to pass args in registers for syscalls but preserve the cdecl compatibility for upcalls?

I also recognize our team has not yet published our libtock-c changes for x86. I'll try to make that happen soon, because I think it would be good to see the proposed ABI changes in both repos.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 4, 2025

@reynoldsbd

It also made the crt0 easier to write.

I will take care of porting the crt0 to the new system call ABI.

I've got no special attachment to the ABI, however I would prefer to preserve the "upcall handler being regular cdecl" feature. So maybe it's possible to pass args in registers for syscalls but preserve the cdecl compatibility for upcalls?

There are two upcall layers in my libtock-rs implementation: a naked function that pushes the arguments it receives from the kernel onto the stack, and the regular upcall that it then calls.
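A minimal sketch of that two-layer idea, written as ordinary Rust so it runs anywhere. The real entry point would be a naked assembly function; here `upcall_entry` is a hypothetical stand-in that receives the register values and calls a regular C-ABI handler, leaving it to the compiler to place the arguments according to the C calling convention. That the kernel delivers upcall arguments in ebx, ecx, edx and edi is an assumption based on the register list in the PR description.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Register values left by the kernel for an upcall (assumed layout).
struct UpcallRegs {
    ebx: u32, // upcall argument 0
    ecx: u32, // upcall argument 1
    edx: u32, // upcall argument 2
    edi: u32, // userdata
}

/// A regular C-ABI upcall handler, as application code would write it.
type Upcall = extern "C" fn(u32, u32, u32, u32);

/// Stand-in for the naked entry point: forward the register values to the
/// C-ABI handler. On 32-bit x86 this call passes the arguments on the stack.
fn upcall_entry(regs: &UpcallRegs, handler: Upcall) {
    handler(regs.ebx, regs.ecx, regs.edx, regs.edi);
}

/// Records what the handler saw, so the behaviour can be checked.
static LAST_ARGS: [AtomicU32; 4] = [
    AtomicU32::new(0),
    AtomicU32::new(0),
    AtomicU32::new(0),
    AtomicU32::new(0),
];

extern "C" fn record_upcall(a0: u32, a1: u32, a2: u32, userdata: u32) {
    for (slot, value) in LAST_ARGS.iter().zip([a0, a1, a2, userdata]) {
        slot.store(value, Ordering::SeqCst);
    }
}

fn main() {
    let regs = UpcallRegs { ebx: 10, ecx: 20, edx: 30, edi: 0x2000_0040 };
    upcall_entry(&regs, record_upcall);
    let seen: Vec<u32> = LAST_ARGS.iter().map(|a| a.load(Ordering::SeqCst)).collect();
    println!("{:x?}", seen);
}
```

With this split, application handlers keep their plain C signature; only the small entry shim knows about the register-based kernel interface.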

Why can't the "kernel upcall" have its arguments pushed on the stack directly by the kernel?

  1. set_process_function() sets the next function to be executed.
  2. With the previous ABI, set_process_function() uses the process' stack pointer to push arguments.
  3. Process' stack pointer is a user virtual pointer.
  4. In the MMU-capable Tock, when set_process_function() is called by the kernel, the MMU configuration is not necessarily set for that process.

[1] ^ [2] ^ [3] ^ [4] ==> the kernel accesses a virtual pointer that is not mapped. ==> a kernel fault is triggered ==> the entire OS shuts down.
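The chain above can be modelled in a few lines of Rust. This is a toy sketch, not the kernel's code: `SavedContext`, `UserStack`, `KernelFault` and both functions are hypothetical names, the addresses are made up, and the stack variant is simplified (for instance, it pushes no return address). The point it illustrates is only that the stack variant needs the process memory to be reachable, while the register variant writes to kernel-owned saved state.

```rust
#[derive(Debug, PartialEq)]
enum KernelFault {
    UnmappedUserPointer(u32),
}

/// Saved user context of a process, owned by the kernel.
#[derive(Debug, Default)]
struct SavedContext {
    eip: u32,
    esp: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
    edi: u32,
}

/// Process stack memory; reachable only while that process's MMU setup is active.
struct UserStack {
    base: u32,
    bytes: Vec<u8>,
}

/// Old ABI (simplified): push the four arguments through the user stack pointer.
/// `active` is None when the MMU is not configured for this process.
fn set_function_stack_abi(
    ctx: &mut SavedContext,
    active: Option<&mut UserStack>,
    entry: u32,
    args: [u32; 4],
) -> Result<(), KernelFault> {
    let stack = active.ok_or(KernelFault::UnmappedUserPointer(ctx.esp))?;
    for arg in args.iter().rev() {
        let new_esp = ctx.esp.checked_sub(4).ok_or(KernelFault::UnmappedUserPointer(ctx.esp))?;
        let off = new_esp
            .checked_sub(stack.base)
            .ok_or(KernelFault::UnmappedUserPointer(new_esp))? as usize;
        let slot = stack
            .bytes
            .get_mut(off..off + 4)
            .ok_or(KernelFault::UnmappedUserPointer(new_esp))?;
        slot.copy_from_slice(&arg.to_le_bytes());
        ctx.esp = new_esp;
    }
    ctx.eip = entry;
    Ok(())
}

/// New ABI: only the kernel-owned saved context is written.
fn set_function_register_abi(ctx: &mut SavedContext, entry: u32, args: [u32; 4]) {
    ctx.ebx = args[0];
    ctx.ecx = args[1];
    ctx.edx = args[2];
    ctx.edi = args[3];
    ctx.eip = entry;
}

fn main() {
    let mut ctx = SavedContext { esp: 0x2000_1000, ..Default::default() };
    // The process is not the one currently mapped: the stack variant faults.
    println!("{:?}", set_function_stack_abi(&mut ctx, None, 0x3000_0100, [1, 2, 3, 4]));
    // The register variant succeeds regardless of the MMU state.
    set_function_register_abi(&mut ctx, 0x3000_0100, [1, 2, 3, 4]);
    println!("{:x?}", ctx);
}
```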

Why don't we change the kernel's code such that when set_process_function() is invoked, the MMU is set up for that process?

  1. Inefficient, as the process may be scheduled later, which means that the kernel needs to change the MMU configuration again.
  2. Inefficient, as the userspace-kernel boundary needs to check whether every stack memory access is within process' virtual address space.
  3. Harder to implement than changing the system call ABI.
  4. Since there is no upstream x86 support for libtock-c and libtock-rs, nobody cares about the ABI change.
  5. Have only one point where process memory is accessed, namely grants. Easier to test/certify Tock correctly handles userspace buffers.
  6. Linux is the best kernel ever. Why wouldn't we try to mimic it? :)

@alevy
Member

alevy commented Jun 6, 2025

  1. Since there is no upstream x86 support for libtock-c and libtock-rs, nobody cares about the ABI change.

Just noting that this part in particular, is not true. The ABI is part of a stable yet, and upstream libtock-c / libtock-rs so it's technically fine to just change it, but there are thousands? hundreds of thousands? millions? of devices in the field using the current ABI. So it's not true that "nobody care about the ABI change"

@Ioan-Cristian
Contributor Author

The ABI is part of a stable yet

Did you mean "The ABI is not part of a stable release yet"?

there are thousands? hundreds of thousands? millions? of devices in the field using the current ABI.

Is this a general statement about ABI changes? If so, you are right. Is this a statement about x86 Tock's ABI change? I doubt the veracity of your statement. Thousands of devices? Maybe. Hundreds of thousands or millions? That looks like an overestimate to me.

Also, Tock doesn't support remote updates, so it is impossible to break any running device. Since remote updates of the kernel are impossible, the apps are either flashed together with the kernel, or separately thanks to the dynamic application loader. This means that the ABI of the kernel is known prior to application loading. A developer can compile the apps for a specific kernel ABI.

A minor problem might be closed-source Tock apps. In this case, a developer cannot recompile them, so an ABI change is indeed problematic. Should we care about closed-source apps? Isn't open-source the better way to develop?

Lastly, do we really care about every downstream project using Tock? Due to Tock's licence, it is actually impossible to know which projects/organizations use Tock. And apart from Microsoft, I really doubt there is anyone using the x86 port. I also assume they have full control over the apps they flash with Tock.

@reynoldsbd
Contributor

The high end of @alevy's statement is indeed what we are looking at. Unfortunately I'm not able to share any more detail than that. It is also true that we have full control to update our kernel and apps in lockstep, so this PR does not present any risk of unexpected breakage.

What I will say as a general statement is that my team at Microsoft has a certain amount of capacity to work with and contribute to upstream Tock, and the more breakage we have to absorb the less time we have to participate upstream. Even some of the seemingly simple refactoring within tock and libtock-c repos over the past year have taken away from our capacity to help with the x86 upstream efforts. For example, I'm currently spending a ton of time wrestling with the Makefile-related changes in libtock-c over the past year, which is slowing down my ability to upstream the x86 port for libtock-c.

That's the angle I'm coming from, and that's why I asked about libtock-c/-rs updates. I personally don't care as much about the ABI itself changing if it truly needs to, but I'd much prefer if the change was transparent to our downstream use case (i.e. that we don't have to rewrite all our usermode driver wrappers for the new ABI).

Couple of side notes that maybe aren't directly relevant to this PR:

  • Our team has been doing some exploration around paging on x86 and RISC-V and this is a big area of interest for us. Is there somewhere we can go to understand the approach that the larger Tock community is taking towards enabling paging? We'd love to get involved!
  • I have a question about some comments you made above in reference to set_process_function(). It sounds like you're adjusting the ABI in an attempt to avoid translating virtual addresses within the kernel. On our end, we think it may be difficult to avoid making the entire kernel "aware" of virtual addresses and to perform translation of any pointers being passed between kernel and usermode.
    • While it may indeed be possible to revise the ABI in such a way that avoids translation for upcalls, the situation is more complicated for drivers that operate on usermode data buffers (esp. where DMA is involved). I'd love to understand how this would work in your proposal where virtual memory is enabled.
    • I'll also point out that accessing process memory doesn't necessarily require the corresponding page table to actually be set up or active. You can still do "offline" translation between process and kernel address spaces by traversing the page tables to compute an address that is accessible from kernel code.
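To make the "offline" translation mentioned in the last bullet concrete, here is a minimal sketch of a software walk over 32-bit non-PAE x86 page tables (10-bit directory index, 10-bit table index, 12-bit offset). Physical memory is modelled as a `HashMap` so the example runs on any host; `PhysMem` and `translate` are hypothetical names, and the 4 MiB case ignores the PSE-36 extension bits. A real kernel would still need its own way to reach the resulting physical address from kernel code.

```rust
use std::collections::HashMap;

const PRESENT: u32 = 1 << 0; // P bit, same position in directory and table entries
const PAGE_SIZE_4M: u32 = 1 << 7; // PS bit in a page-directory entry

/// Toy physical memory: 32-bit words keyed by physical address.
type PhysMem = HashMap<u32, u32>;

/// Walk the two-level tables rooted at `page_dir` without loading them into CR3.
/// Returns the physical address for `vaddr`, or None if it is not mapped.
fn translate(mem: &PhysMem, page_dir: u32, vaddr: u32) -> Option<u32> {
    // Level 1: page-directory entry, indexed by bits 31..22.
    let pde = *mem.get(&(page_dir + (vaddr >> 22) * 4))?;
    if pde & PRESENT == 0 {
        return None;
    }
    if pde & PAGE_SIZE_4M != 0 {
        // 4 MiB page: the directory entry maps the region directly.
        return Some((pde & 0xffc0_0000) | (vaddr & 0x003f_ffff));
    }
    // Level 2: page-table entry, indexed by bits 21..12.
    let table = pde & 0xffff_f000;
    let pte = *mem.get(&(table + ((vaddr >> 12) & 0x3ff) * 4))?;
    if pte & PRESENT == 0 {
        return None;
    }
    Some((pte & 0xffff_f000) | (vaddr & 0x0000_0fff))
}

fn main() {
    let mut mem = PhysMem::new();
    let page_dir = 0x1000;
    // Directory slot 1 points at a page table located at 0x2000.
    mem.insert(page_dir + 4, 0x2000 | PRESENT);
    // Table slot 2 maps to the physical frame at 0x0008_0000.
    mem.insert(0x2000 + 2 * 4, 0x0008_0000 | PRESENT);

    println!("{:x?}", translate(&mem, page_dir, 0x0040_2abc));
    println!("{:x?}", translate(&mem, page_dir, 0x0080_0000));
}
```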

Like I said, these are more like side notes. Maybe there's a better venue (Slack channel or GH discussion?) to discuss x86 and paging?

@reynoldsbd
Contributor

FWIW here's my working branch to add i386 support to libtock-c. I don't think it's quite ready for PR yet, but all ABI-related pieces are there and should provide a good baseline to evaluate these proposed ABI changes.

tock/libtock-c@master...reynoldsbd:libtock-c:i386

@alevy
Member

alevy commented Jun 11, 2025

@Ioan-Cristian

Did you mean "The ABI is not part of a stable release yet"?

Yes, sorry and thanks for the correction.

As for the remainder of the comments, I think this became too adversarial.

In general, yes, we care about downstream projects---both those that have not yet upstreamed all relevant support as well as those that do not necessarily contribute code upstream but otherwise engage with the community. There is of course a limit, but we don't want to make life harder for deploying projects unnecessarily.

ABI stability is important not specifically because of the potential for online updates (which do happen, actually) or separate updates of the kernel and applications (which are not deployed anywhere AFAIK, but they are certainly on the roadmap for several projects), but because applications and the kernel are developed and compiled separately, and the only way of ensuring an application works with a kernel is the stability of the ABI. Subtle and less subtle changes can (and do) break things in ways that can significantly increase the burden on testing and thus increase the burden to, for example, track upstream Tock.

OK, none of this is to say that this PR is bad, or shouldn't be accepted (a note/question about that to come in a follow up comment, I'd like to put this high-level discussion of what's worth discussing to bed). It's a good PR and the effort to support actual virtual memory is welcome and the challenge appreciated!

But, in considering whether changes are the right ones, it's important to keep in mind how they impact the ecosystem, not only what is strictly in one repo upstream right now. In this case, the problem is that because libtock-c does not yet have upstream support for x86, it is not immediately obvious what changes to libtock-c will be exactly needed to support the change in system ABI and what impact those changes will have (on performance, code size, application APIs, etc).

@alevy alevy added the P-Significant This is a substantial change that requires review from all core developers. label Jun 11, 2025
Member

@alevy alevy left a comment

OK, now for a more substantive question and comment about these changes:

First, it's good to reconsider the ABI when significantly new features are going to be added to an architecture, like is happening in this PR. It's also good to try and unify the ABI and implementation for MMU and MPU-like architectures (though not strictly necessary if it makes sense for them to use different ABIs).

But I'm not convinced this ABI is the right one. Avoiding pushing to the stack avoids having to align the kernel and process MMUs, which maybe saves some cycles (eventually the MMU will need to be adjusted before switching to the process and/or back to the kernel, so why not just do it earlier? the actual switch is just a single update to crt0, no?). However, if the system call ABI does not match the C calling convention (which both C and Rust use for external function calls) it would cost additional copying and cycles in userspace and/or the kernel to translate between the calling convention and the Tock ABI.

Moreover, dealing with the kernel not necessarily having a process's pages in its VM map is going to need to be dealt with for allow buffers, so dealing with it in this way might just be kicking the can down the road.

I don't believe MMU vs. MPU has any impact on the calling convention which, for 32-bit x86 is the same (see the System-V ABI specification for example https://www.sco.com/developers/devspecs/abi386-4.pdf). On 64-bit x86, of course, the calling conventions are different, and indeed most arguments are stuck in registers (certainly enough for Tock system calls).

@Ioan-Cristian
Contributor Author

@reynoldsbd

I'm currently spending a ton of time wrestling with the Makefile-related changes in libtock-c over the past year

libtock-c build system is a complete mess. It looks more complicated than it should be. However, I expect source-related changes to be far easier to manage once the i386 build is fixed.

we don't have to rewrite all our usermode driver wrappers for the new ABI

You won't for libtock-rs. Unfortunately, I couldn't compile libtock-c. I've tried both crosstool-ng and a manual toolchain setup; both failed to build libstdc++.a. Would you mind pushing the commands used to build libstdc++.a, possibly with a Dockerfile?

the larger Tock community is taking towards enabling paging

As far as I know, there is no one else involved with MMU support for Tock.

we think it may be difficult to avoid making the entire kernel "aware" of virtual addresses

The kernel is aware of virtual addresses. Actually, the entire kernel will be aware of virtual addresses. This is different from arch code, that normally shouldn't be aware of virtual addresses.

the situation is more complicated for drivers that operate on usermode data buffers (esp. where DMA is involved)

When a process makes an upcall, the kernel checks if it is in bounds, then translates its starting user virtual address to kernel virtual address. This should be in theory the sole point where offline translation is performed.

You can still do "offline" translation between process and kernel address spaces by traversing the page tables to compute an address that is accessible from kernel code.

See my previous answer.

@Ioan-Cristian
Contributor Author

@alevy

this became too adversarial.

Didn't intend to be adversarial. In French culture, people speak more frankly than in the US, so we might be perceived as rude by you, Americans. We also love defending our viewpoints. :)

In general, yes, we care about downstream projects

I never said we shouldn't care about downstream projects. I said we should not care about EVERY downstream project. The MIT and Apache licences allow organisations to use Tock without notice. As a consequence, it is impossible to know every downstream user of Tock, their needs and their capability to adapt to changes.

both those that have not yet upstreamed all relevant support as well as those do not necessarily contribute code upstream but otherwise engage with the community

Here I have a different vision, but I would just stop here. Don't want to be "adversarial" again. :)

online updates (which do happen, actually)

Can you update Tock online without an access to a debugger and using a communication protocol such as Ethernet or SPI?

Subtle and less subtle changes can (and do) break things in ways that can significantly increase the burden on testing

For libtock-rs, changes are minor, a simple git pull should simply work. The change has been tested by running several applications on QEMU. Maybe people at Microsoft can test the PR against real hardware and confirm that everything still works.

As for libtock-c, check my answer to reynoldsbd.

OK, none of this is to say that this PR is bad, or shouldn't be accepted

Depending on your discussions, this PR may be bad or not, and may be accepted or not. Check my next comment for more details. :)

it's important to keep in mind how they impact the ecosystem, not only what is strictly in one repo upstream right now. In this case, the problem is that because libtock-c does not yet have upstream support for x86, it is not immediately obvious what changes to libtock-c will be exactly needed to support the change in system ABI and what impact those changes will have (on performance, code size, application APIs, etc).

x86 support is recent. I expect very few projects to use it. Also, the people who use the x86 port didn't bother to come up with a well formatted PR, neither for Tock nor for libtock-c.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 12, 2025

@alevy

eventually the MMU will need to be adjusted before switching to the process and/or back to the kernel, so why not just do it earlier?

I have already said why this is bad. A process may not be scheduled immediately, so the kernel has to set up the MMU just to schedule an upcall or set system call return values.

the actual switch is just a single update to crt0, no?

What do you mean by crt0? Did you mean CR0 (control register 0)? The page map base address is held in CR3. And no, changing the MMU configuration is not done by changing CR3 to point to another table. The current upstream x86 Tock MMU support, which emulates an MPU, uses a single table. Every time a process is going to be run, the table is changed. A similar approach is used for ARMv8-A. Why this approach? For a few reasons:

  1. A priori, the kernel doesn't know how much virtual memory a process needs. This means that its table should have a variable length. So the kernel should have the possibility to dynamically allocate memory for the table. I decided to avoid this, as this would have meant an even more complex PR.
  2. Using a per-process table means that kernel pages should also be stored in its table, right? The current MMU implementation that's used for emulating an MPU uses 4KiB non-PAE page translation in legacy mode, which is a two-level translation, with level 1 translation having a granule of 2MiB or 4MiB. Tock is small enough that such a granule suffices, but if it gets bigger, multiple level 1 entries would be required, and that for each process. Also, if the process' memory is sparse, a per-process table would have many unused entries for each process.
  3. Using a global table that is constantly updated was easier to implement (at least for me). Initially, there was no plan for MMU and a global table better reflected the way MPUs work, which for ARMv*-M and RV architectures have a global table of MPU entries. It can also save memory when there are hundreds of processes running.

How expensive is a table change? Quite expensive for several reasons:

  1. Memory writes have to be non-temporal so they can be seen by the page-walker.
  2. Depending on the size of the process' virtual address space, many writes can be involved.
  3. Since x86 doesn't support ASIDs (yet), the entire TLB must be flushed. This drawback doesn't exist on the ARMv8-A port which uses ASIDs.
  4. Memory barriers and an instruction barrier (on ARMv8-A) stall the pipeline.
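The cost argument can be sketched with a toy model of a single global table that is rewritten on every switch. `GlobalTable`, its counters and `switch_to` are hypothetical names, and the entry values are made up; the model only counts entry writes and full TLB flushes, leaving out barriers and non-temporal stores. It compares setting up the MMU eagerly just to schedule an upcall against leaving the table alone until the process actually runs.

```rust
const ENTRIES: usize = 1024;

/// One global table shared by all processes (toy model).
struct GlobalTable {
    entries: [u32; ENTRIES],
    entry_writes: usize,
    tlb_flushes: usize,
}

impl GlobalTable {
    fn new() -> Self {
        GlobalTable { entries: [0; ENTRIES], entry_writes: 0, tlb_flushes: 0 }
    }

    /// Replace the user mappings of `previous` with those of `next`.
    /// Each slice lists (entry index, entry value) pairs; indices must be < ENTRIES.
    fn switch_to(&mut self, previous: &[(usize, u32)], next: &[(usize, u32)]) {
        for &(idx, _) in previous {
            self.entries[idx] = 0; // unmap the old process
            self.entry_writes += 1;
        }
        for &(idx, entry) in next {
            self.entries[idx] = entry; // map the new process
            self.entry_writes += 1;
        }
        self.tlb_flushes += 1; // without ASIDs, every cached translation is dropped
    }
}

fn main() {
    let proc_a = [(16, 0x0010_0007), (17, 0x0011_0007)];
    let proc_b = [(32, 0x0020_0007)];

    // Eager: A is mapped, the kernel maps B only to schedule an upcall, then resumes A.
    let mut eager = GlobalTable::new();
    eager.switch_to(&[], &proc_a);
    eager.switch_to(&proc_a, &proc_b);
    eager.switch_to(&proc_b, &proc_a);

    // Lazy: the upcall is recorded in B's saved registers, so the table is untouched.
    let mut lazy = GlobalTable::new();
    lazy.switch_to(&[], &proc_a);

    println!("eager: {} writes, {} flushes", eager.entry_writes, eager.tlb_flushes);
    println!("lazy:  {} writes, {} flushes", lazy.entry_writes, lazy.tlb_flushes);
}
```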

However, if the system call ABI does not match the C calling convention (which both C and Rust use for external function calls) it would cost additional copying and cycles in userspace and/or the kernel to translate between the calling convention and the Tock ABI.

Look at subscribe disassembly from libtock-c:

00000000 <subscribe>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   57                      push   %edi
   4:   56                      push   %esi
   5:   53                      push   %ebx
   6:   83 ec 0c                sub    $0xc,%esp
   9:   8b 55 08                mov    0x8(%ebp),%edx
   c:   ff 75 18                push   0x18(%ebp)
   f:   ff 75 14                push   0x14(%ebp)
  12:   ff 75 10                push   0x10(%ebp)
  15:   ff 75 0c                push   0xc(%ebp)
  18:   b8 01 00 00 00          mov    $0x1,%eax
  1d:   cd 40                   int    $0x40
  1f:   5b                      pop    %ebx
  20:   5f                      pop    %edi
  21:   5e                      pop    %esi
  22:   59                      pop    %ecx
  23:   89 f8                   mov    %edi,%eax
  25:   81 fb 82 00 00 00       cmp    $0x82,%ebx
  2b:   75 0d                   jne    3a <subscribe+0x3a>
  2d:   89 fb                   mov    %edi,%ebx
  2f:   89 f7                   mov    %esi,%edi
  31:   be 01 00 00 00          mov    $0x1,%esi
  36:   31 c0                   xor    %eax,%eax
  38:   eb 17                   jmp    51 <subscribe+0x51>
  3a:   83 fb 02                cmp    $0x2,%ebx
  3d:   75 08                   jne    47 <subscribe+0x47>
  3f:   89 f3                   mov    %esi,%ebx
  41:   89 cf                   mov    %ecx,%edi
  43:   31 f6                   xor    %esi,%esi
  45:   eb 0a                   jmp    51 <subscribe+0x51>
  47:   83 ec 0c                sub    $0xc,%esp
  4a:   6a 01                   push   $0x1
  4c:   e8 fc ff ff ff          call   4d <subscribe+0x4d>
  51:   89 f1                   mov    %esi,%ecx
  53:   88 0a                   mov    %cl,(%edx)
  55:   89 5a 04                mov    %ebx,0x4(%edx)
  58:   89 7a 08                mov    %edi,0x8(%edx)
  5b:   89 42 0c                mov    %eax,0xc(%edx)
  5e:   89 d0                   mov    %edx,%eax
  60:   8d 65 f4                lea    -0xc(%ebp),%esp
  63:   5b                      pop    %ebx
  64:   5e                      pop    %esi
  65:   5f                      pop    %edi
  66:   5d                      pop    %ebp
  67:   c2 04 00                ret    $0x4

You see that despite having the same ABI, the userspace still pushes the registers on the stack just before issuing the system call. This is done because some of the registers will be clobbered by the system call so they have to be saved on the stack.

Also, if you care about performance and memory size, why don't you define system calls in header files so they can be inlined by the compiler? In theory, that could avoid arguments being pushed on the stack twice in some cases.

For comparison, Console::write_str() with the old ABI:

  300216:       55                      push   %ebp
  300217:       53                      push   %ebx
  300218:       57                      push   %edi
  300219:       56                      push   %esi
  30021a:       83 ec 0c                sub    $0xc,%esp
  30021d:       83 64 24 04 00          andl   $0x0,0x4(%esp)
  300222:       8b 54 24 28             mov    0x28(%esp),%edx
  300226:       8b 74 24 24             mov    0x24(%esp),%esi
  30022a:       31 ff                   xor    %edi,%edi
  30022c:       47                      inc    %edi
  30022d:       6a 04                   push   $0x4
  30022f:       5b                      pop    %ebx
  300230:       89 d5                   mov    %edx,%ebp
  300232:       89 f9                   mov    %edi,%ecx
  300234:       55                      push   %ebp
  300235:       56                      push   %esi
  300236:       51                      push   %ecx
  300237:       57                      push   %edi
  300238:       89 d8                   mov    %ebx,%eax
  30023a:       cd 40                   int    $0x40
  30023c:       5f                      pop    %edi
  30023d:       59                      pop    %ecx
  30023e:       5e                      pop    %esi
  30023f:       5d                      pop    %ebp
  300240:       83 ff 02                cmp    $0x2,%edi
  300243:       74 67                   je     3002ac <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h767dad58e918f97cE+0x96>
  300245:       b9 41 01 30 00          mov    $0x300141,%ecx
  30024a:       8d 7c 24 04             lea    0x4(%esp),%edi
  30024e:       31 db                   xor    %ebx,%ebx
  300250:       43                      inc    %ebx
  300251:       89 de                   mov    %ebx,%esi
  300253:       57                      push   %edi
  300254:       51                      push   %ecx
  300255:       56                      push   %esi
  300256:       53                      push   %ebx
  300257:       89 d8                   mov    %ebx,%eax
  300259:       cd 40                   int    $0x40
  30025b:       5b                      pop    %ebx
  30025c:       5e                      pop    %esi
  30025d:       59                      pop    %ecx
  30025e:       5f                      pop    %edi
  30025f:       83 fb 02                cmp    $0x2,%ebx
  300262:       74 4f                   je     3002b3 <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h767dad58e918f97cE+0x9d>
  300264:       31 c9                   xor    %ecx,%ecx
  300266:       41                      inc    %ecx
  300267:       6a 02                   push   $0x2
  300269:       5e                      pop    %esi
  30026a:       31 ff                   xor    %edi,%edi
  30026c:       89 cb                   mov    %ecx,%ebx
  30026e:       57                      push   %edi
  30026f:       52                      push   %edx
  300270:       53                      push   %ebx
  300271:       51                      push   %ecx
  300272:       89 f0                   mov    %esi,%eax
  300274:       cd 40                   int    $0x40
  300276:       59                      pop    %ecx
  300277:       5b                      pop    %ebx
  300278:       5a                      pop    %edx
  300279:       5f                      pop    %edi
  30027a:       09 cb                   or     %ecx,%ebx
  30027c:       0f 94 c0                sete   %al
  30027f:       81 f9 80 00 00 00       cmp    $0x80,%ecx
  300285:       74 06                   je     30028d <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h767dad58e918f97cE+0x77>
  300287:       b1 01                   mov    $0x1,%cl
  300289:       84 c0                   test   %al,%al
  30028b:       74 2c                   je     3002b9 <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h767dad58e918f97cE+0xa3>
  30028d:       31 f6                   xor    %esi,%esi
  30028f:       46                      inc    %esi
  300290:       6a 00                   push   $0x0
  300292:       6a 00                   push   $0x0
  300294:       6a 00                   push   $0x0
  300296:       56                      push   %esi
  300297:       b8 00 00 00 00          mov    $0x0,%eax
  30029c:       cd 40                   int    $0x40
  /* MANY MORE LINES */

You read arguments from the stack, do a bit of computation and then push them again on the stack.

Console::write_str() with the new ABI:

00300237 <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h3f97909dec5136b8E>:
  300237:       55                      push   %ebp
  300238:       53                      push   %ebx
  300239:       57                      push   %edi
  30023a:       56                      push   %esi
  30023b:       83 ec 08                sub    $0x8,%esp
  30023e:       8b 74 24 24             mov    0x24(%esp),%esi
  300242:       83 24 24 00             andl   $0x0,(%esp)
  300246:       8b 54 24 20             mov    0x20(%esp),%edx
  30024a:       31 db                   xor    %ebx,%ebx
  30024c:       43                      inc    %ebx
  30024d:       6a 04                   push   $0x4
  30024f:       58                      pop    %eax
  300250:       89 d9                   mov    %ebx,%ecx
  300252:       89 f7                   mov    %esi,%edi
  300254:       cd 40                   int    $0x40
  300256:       83 fb 02                cmp    $0x2,%ebx
  300259:       74 53                   je     3002ae <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h3f97909dec5136b8E+0x77>
  30025b:       89 e7                   mov    %esp,%edi
  30025d:       31 c0                   xor    %eax,%eax
  30025f:       40                      inc    %eax
  300260:       89 c3                   mov    %eax,%ebx
  300262:       89 c1                   mov    %eax,%ecx
  300264:       ba 3c 01 30 00          mov    $0x30013c,%edx
  300269:       cd 40                   int    $0x40
  30026b:       83 fb 02                cmp    $0x2,%ebx
  30026e:       74 45                   je     3002b5 <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h3f97909dec5136b8E+0x7e>
  300270:       31 db                   xor    %ebx,%ebx
  300272:       43                      inc    %ebx
  300273:       6a 02                   push   $0x2
  300275:       58                      pop    %eax
  300276:       89 d9                   mov    %ebx,%ecx
  300278:       89 f2                   mov    %esi,%edx
  30027a:       31 ff                   xor    %edi,%edi
  30027c:       cd 40                   int    $0x40
  30027e:       09 d9                   or     %ebx,%ecx
  300280:       0f 94 c0                sete   %al
  300283:       81 fb 80 00 00 00       cmp    $0x80,%ebx
  300289:       74 06                   je     300291 <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h3f97909dec5136b8E+0x5a>
  30028b:       b2 01                   mov    $0x1,%dl
  30028d:       84 c0                   test   %al,%al
  30028f:       74 2a                   je     3002bb <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h3f97909dec5136b8E+0x84>
  300291:       31 f6                   xor    %esi,%esi
  300293:       46                      inc    %esi
  300294:       89 f3                   mov    %esi,%ebx
  300296:       8d 05 a4 02 30 00       lea    0x3002a4,%eax
  30029c:       50                      push   %eax
  30029d:       b8 00 00 00 00          mov    $0x0,%eax
  3002a2:       cd 40                   int    $0x40
  3002a4:       83 3c 24 00             cmpl   $0x0,(%esp)
  3002a8:       74 ea                   je     300294 <_ZN76_$LT$libtock_console..ConsoleWriter$LT$S$GT$$u20$as$u20$core..fmt..Write$GT$9write_str17h3f97909dec5136b8E+0x5d>
  /* MANY MORE LINES */

You read the arguments from the stack and do some computation with them, without needing to push them back on the stack.

Binary size with old ABI:

   text    data     bss     dec     hex filename
   1537       0     256    1793     701 console

Binary size with new ABI:

   text    data     bss     dec     hex filename
   1477       0     256    1733     6c5 console

You saved 60 bytes of text. Why? Because xoring and moving registers around needs 2 bytes per instruction, while reading from the stack needs 3 bytes, plus the additional overhead of constantly pushing registers on the stack (1 byte per push).
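As a quick sanity check of the figures above, the following minimal Rust sketch recomputes the totals and the text difference from the two `size` rows. `SizeRow` and `text_saved` are hypothetical names used only for this illustration; the numbers are the ones quoted above.

```rust
/// One row of `size` output: section sizes in bytes.
/// (Hypothetical helper type, used only for this illustration.)
struct SizeRow {
    text: u32,
    data: u32,
    bss: u32,
}

impl SizeRow {
    /// `size` reports `dec` as the sum of the three sections.
    fn total(&self) -> u32 {
        self.text + self.data + self.bss
    }
}

/// Bytes of text saved when going from `old` to `new`.
fn text_saved(old: &SizeRow, new: &SizeRow) -> u32 {
    old.text - new.text
}

fn main() {
    let old_abi = SizeRow { text: 1537, data: 0, bss: 256 };
    let new_abi = SizeRow { text: 1477, data: 0, bss: 256 };

    println!("old total: {} (0x{:x})", old_abi.total(), old_abi.total());
    println!("new total: {} (0x{:x})", new_abi.total(), new_abi.total());
    println!("text saved: {} bytes", text_saved(&old_abi, &new_abi));
}
```

The totals match the `dec` and `hex` columns reported for both binaries, so the whole difference comes from the text section.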

Also, the performance implication of translating from function call ABI to system call ABI is insignificant. It takes at most a few cycles, since the processor can perform register renaming with movs and xors. Also, most modern CPUs have more ALUs than AGUs, so they can compute more than they can load and store. For instance, my machine, an AMD 15h CPU, has 4 ALUs and just 2 AGUs. Not to mention that, supposing the old ABI didn't require any push/read, the few cycles required for the translation are insignificant compared to the ~70-150 cycles required by the int 0x40 instruction. If you add the time required to handle the system call, caring about ABI translation falls into the category of micro-optimisation.

Moreover, dealing with the kernel not necessarily having a process's pages in its VM map is going to need to be dealt with for allow buffers, so dealing with it in this way might just be kicking the can down the road.

Check my answer to reynoldsbd.

32-bit x86 is the same (see the System-V ABI specification for example

Function call ABI != system call ABI. Both Linux and FreeBSD use the ABI pointed out by man syscall. The sole benefit of having the same ABI is relevant to ARMv*-M architectures, where the hardware takes care of pushing the relevant registers such that an exception handler can be written as a simple C function. This advantage doesn't exist on x86 where the hardware doesn't respect the platform's ABI. It is also nullified by the fact that Tock's handlers are still written in inline assembly.
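To make the distinction concrete, here is a minimal Rust sketch of how a kernel-side handler could decode a system call purely from saved registers, without touching the process stack. It assumes, per the PR description, that ebx, ecx, edx and edi carry the four arguments, and, as the disassembly above suggests, that eax carries the system call class number. `SavedRegisters`, `SyscallClass` and `decode` are hypothetical names for this illustration, not Tock's actual types.

```rust
/// Tock system call classes, numbered as in Tock's system call documentation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyscallClass {
    Yield,
    Subscribe,
    Command,
    ReadWriteAllow,
    ReadOnlyAllow,
    Memop,
    Exit,
}

impl SyscallClass {
    fn from_number(n: u32) -> Option<Self> {
        match n {
            0 => Some(Self::Yield),
            1 => Some(Self::Subscribe),
            2 => Some(Self::Command),
            3 => Some(Self::ReadWriteAllow),
            4 => Some(Self::ReadOnlyAllow),
            5 => Some(Self::Memop),
            6 => Some(Self::Exit),
            _ => None,
        }
    }
}

/// Hypothetical snapshot of the registers saved on entry to the handler.
#[derive(Debug, Clone, Copy, Default)]
struct SavedRegisters {
    eax: u32, // assumed: system call class
    ebx: u32, // argument 0
    ecx: u32, // argument 1
    edx: u32, // argument 2
    edi: u32, // argument 3
}

/// Decode the class and the four arguments from registers only.
fn decode(regs: &SavedRegisters) -> Option<(SyscallClass, [u32; 4])> {
    let class = SyscallClass::from_number(regs.eax)?;
    Some((class, [regs.ebx, regs.ecx, regs.edx, regs.edi]))
}

fn main() {
    // Mirrors the `int $0x40` at 30027c in the listing: eax = 2, ebx = 1,
    // ecx = 1, edi = 0. The value in edx (5) is a made-up length.
    let regs = SavedRegisters { eax: 2, ebx: 1, ecx: 1, edx: 5, edi: 0 };
    match decode(&regs) {
        Some((class, args)) => println!("{:?} with arguments {:?}", class, args),
        None => println!("unknown system call class {}", regs.eax),
    }
}
```

Nothing in `decode` dereferences a user pointer, which is the property that matters when the process stack is not mapped in the kernel's address space.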

I hope I'll be able to share the PR with MMU support so we can discuss concrete code rather than ethereal topics and suppositions.

@alexandruradovici
Contributor

Here is my view on this:

  • using registers seems a better solution, at least for the moment, except for upcalls, which require user space changes
  • using the stack for upcall arguments has another drawback: the space for the arguments might straddle two pages, depending on what the stack pointer points to, which means two table traversals and at least one page map if the kernel does not map the whole RAM linearly
  • one option would be to use the stack, but copy the arguments to the stack only right before switching to the process, after the MMU has been properly configured for the process; this implies that at least part of the kernel memory (that stores the process state) will be in the process's page table
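The page-straddling case in the second point can be sketched as follows. This is a minimal illustration assuming 4 KiB pages and a 16-byte argument area (four 32-bit words); both values, as well as the name `pages_touched` and the example addresses, are assumptions made for the example.

```rust
/// Assumed page size: 4 KiB, the standard small page on 32-bit x86.
const PAGE_SIZE: u64 = 4096;

/// Number of pages covered by the byte range [start, start + len).
fn pages_touched(start: u32, len: u32) -> u64 {
    if len == 0 {
        return 0;
    }
    // Widen to u64 so that start + len cannot overflow.
    let first_page = u64::from(start) / PAGE_SIZE;
    let last_page = (u64::from(start) + u64::from(len) - 1) / PAGE_SIZE;
    last_page - first_page + 1
}

fn main() {
    // Hypothetical stack pointers; 16 bytes of upcall arguments are assumed.
    let args_len = 16;
    for esp in [0x0030_0FE0u32, 0x0030_0FF8u32] {
        println!(
            "esp = {:#010x}: argument area spans {} page(s)",
            esp,
            pages_touched(esp, args_len)
        );
    }
}
```

With the second stack pointer the argument area crosses a page boundary, so the kernel would have to resolve two pages instead of one before writing the arguments.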

@Ioan-Cristian Ioan-Cristian mentioned this pull request Jun 12, 2025
14 tasks
@reynoldsbd
Contributor

So at least from my side, I have a broader underlying question about the overall MMU implementation being proposed:

From your comments (@Ioan-Cristian and @alexandruradovici) it sounds like you envision a very flexible and general-purpose MMU subsystem which dynamically allocates pages and configures page tables at runtime. (Please correct me if that's untrue! This is why I mentioned kicking off a larger discussion about MMU, because right now I'm just reading between the lines of your comments.)

In contrast, the limited MMU implementation currently in master was designed with up-front allocation and linear mapping in mind. This was very intentional on our part for a number of reasons, the biggest of which is that we view dynamic allocation as a reliability and timing hazard. So we took advantage of the Tock application model - specifically the fact that the full list of apps is generally known already when the kernel is booting - to produce what we feel is a more reliable and deterministic design with page tables that are linearly mapped and which do not change at runtime.

To be clear, I don't think either of these viewpoints is wrong. Much of the community may favor a dynamic MMU, and at the same time some deployments may prefer stronger guarantees about reliability and determinism (certainly the case for Pluton, and quite possibly valuable for other applications like industrial or automotive). Hence my comment about a broader MMU conversation. In true Rust fashion, I'm confident we'll be able to find some solution that puts the choice into the user's (i.e. board author's) control.

Back to the topic of the present ABI PR: I agree with @alevy's comment, this feels like kicking the can down the road. Regardless of whether the page tables are crafted statically or dynamically, I would still assert the same two observations from my previous comment:

  1. The motivation for this PR seems to be an attempt to avoid having to translate between kernel and app page tables, but I would still contend this is unavoidable. There are numerous capsules and hardware drivers which will need to access process memory, so the problem needs to be solved regardless of whether the ABI uses stack or registers.
  2. I agree that swapping (or rewriting) page tables and performing a TLB flush each time the kernel needs to access process memory would be much too costly. Would you be open to brainstorming a solution that allows the kernel to access app memory without incurring such a cost?

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 12, 2025

@reynoldsbd

Regardless of whether the page tables are crafted statically or dynamically

The MMU support I have implemented is designed for static page tables. @alexandruradovici's desire for dynamic page tables is more of a long-term goal.

but I would still contend this is unavoidable

It is unavoidable for allow buffers. It is avoidable for the x86 userspace-kernel boundary. Can you please provide instructions on how to compile a toolchain for i386-elf, along with newlib and libstdc++.a, preferably in Dockerfile format? I didn't manage to compile libtock-c applications for the i386-elf target. I've tried both crosstool-ng and a manual toolchain setup.

I agree that swapping (or rewriting) page tables and performing a TLB flush each time the kernel needs to access process memory would be much too costly. Would you be open to brainstorming a solution that allows the kernel to access app memory without incurring such a cost?

The kernel does not need to swap or rewrite page tables, or perform TLB flushes, when it wants to access process memory. Instead, it has to perform an offline translation to find the kernel's equivalent of a user virtual pointer. This is done each time a buffer is allowed. @alevy suggested setting up the MMU earlier in do_process(). That is indeed very costly. As a last resort, if we don't reach a consensus regarding the new ABI, I would change the userspace-kernel boundary so that it takes additional arguments, which represent the start of the process memory, its break and its stack pointer as kernel virtual pointers.
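A minimal sketch of such an offline translation, assuming the process memory is mapped as one contiguous region in both address spaces (consistent with static page tables). `ProcessMapping` and `translate`, and all addresses in `main`, are hypothetical and do not reflect the actual implementation.

```rust
/// Hypothetical description of one process's memory, as seen from both sides.
struct ProcessMapping {
    /// Start of the process memory in the process's virtual address space.
    user_base: u32,
    /// Start of the same memory in the kernel's virtual address space.
    kernel_base: u32,
    /// Size of the region in bytes.
    size: u32,
}

impl ProcessMapping {
    /// Translate a user virtual pointer to the kernel's equivalent pointer.
    /// Returns None if [user_ptr, user_ptr + len) is not fully inside the region.
    fn translate(&self, user_ptr: u32, len: u32) -> Option<u32> {
        let offset = user_ptr.checked_sub(self.user_base)?;
        let end = offset.checked_add(len)?;
        if end > self.size {
            return None;
        }
        self.kernel_base.checked_add(offset)
    }
}

fn main() {
    // Made-up addresses: an 8 KiB process region.
    let mapping = ProcessMapping {
        user_base: 0x0030_0000,
        kernel_base: 0x0120_0000,
        size: 0x2000,
    };

    // A 16-byte buffer allowed by the process.
    match mapping.translate(0x0030_013c, 16) {
        Some(kernel_ptr) => println!("kernel pointer: {:#010x}", kernel_ptr),
        None => println!("buffer is outside the process memory"),
    }
}
```

The translation is pure arithmetic on addresses, so no page table is modified and no TLB flush is needed; the bounds check rejects buffers that leave the process region.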

Comment on lines 12 to 14
kernel = { path = "../../kernel" }
tock-cells = { path = "../../libraries/tock-cells" }
tock-registers = { path = "../../libraries/tock-register-interface" }
Contributor

Looking for thoughts here, @Ioan-Cristian.

Does all of the x86 crate need to be duplicated? I had hoped that we could have one shared x86 crate, and then have two additional crates (x86 flat memory space and x86 virtual memory space) that build on top of it and each only have a few files that differ. For this PR, the only difference would be the boundary stuff. In the future, additional files could be pulled out of shared and instead differentiated for the two as necessary for implementing virtual memory.

However, you're the one actually doing the work. So maybe I'm totally missing something about why that's way too much work, totally unworkable, or bad for some other reason. What do you think?

Contributor Author

Common code may be shared. I don't like the approach of using two different crates, so I was lazy and simply duplicated the code.

The reasons I don't like the approach of using two different crates:

  1. They should be eventually merged, either by adopting the new ABI or by dropping it.
  2. It doesn't seem natural to have two x86 platforms, because, in reality, there is only one x86 platform.
  3. Duplicating the arch crate results in duplicating the chip and board crates. This means that the idea of using a shared crate for common code and two separate crates for specific code will propagate to the chip and board crates.

As a consequence, I don't think it is worth the effort of separating common and distinct code. Eventually, the two x86 platforms will be merged by choosing one ABI over another.

Contributor

It doesn't seem natural to have two x86 platforms, because, in reality, there is only one x86 platform.

Do you not think that there will eventually be two separate platforms, one with virtual memory and one without?

Eventually, the two x86 platforms will be merged by choosing one ABI over another.

I do agree that even if there are two separate platforms, we would likely want to choose one ABI for both.

Contributor Author

@Ioan-Cristian Ioan-Cristian Jun 20, 2025

Do you not think that there will eventually be two separate platforms, one with virtual memory and one without?

Isn't a platform with an MMU a superset of a platform with an MPU? As for the new memory management system, it works with both MPUs and MMUs with few changes. As proof, I was able to run the same applications, unmodified, on ARMv6-M, RV32I and ARMv8-A, without changing any capsule at all. There are minor changes to the chip, arch and board crates, but nothing significant. Even x86 doesn't require any big change in order to work with the new memory management system. It just happens that one of the changes, the ABI one, is a breaking one for userspace.

As a result, I don't think Tock will need to have two separate x86 platforms, as the x86 with virtual memory provides all the functionality of the x86 without virtual memory. The only downside is a slight increase in binary size for x86, but given how small Tock is, this is negligible.

@Ioan-Cristian
Contributor Author

@alevy @brghena Is this PR still marked as waiting-on-author because you forgot to remove the label? Or are there unanswered questions for me?

@brghena
Contributor

brghena commented Jun 25, 2025

Removed the waiting-on-author tag for now.

A request: could you split this into two (or more if necessary) commits? One that duplicates the x86 stuff and one that makes changes to it. As-is, I can't tell which files actually have changes.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 25, 2025

@brghena

A request: could you split this into two (or more if necessary) commits? One that duplicates the x86 stuff and one that makes changes to it. As-is, I can't tell which files actually have changes.

Done.

@bradjc
Contributor

bradjc commented Jun 25, 2025

In hindsight, having single crates for chips was, in my opinion, a mistake. We should have used a two-crate structure for all chips, one for the chip family and one for a specific MCU.

I don't support merging this with the duplicated code and pushing the refactoring work on to the next contributor. It's not fair to someone who wants to fix something in the shared x86 code who either has to implement the prudent chip/chip-family split or duplicate their work. It's also not fair to reviewers who have to keep track of which patches need to be duplicated for the fork to not diverge.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 25, 2025

In hindsight, having single crates for chips was, in my opinion, a mistake. We should have used a two-crate structure for all chips, one for the chip family and one for a specific MCU.

I don't support merging this with the duplicated code and pushing the refactoring work on to the next contributor. It's not fair to someone who wants to fix something in the shared x86 code who either has to implement the prudent chip/chip-family split or duplicate their work. It's also not fair to reviewers who have to keep track of which patches need to be duplicated for the fork to not diverge.

I also think it is a bad idea to duplicate code in this way, especially since the goal is to merge them one day. That's why I suggested a new branch, virtual_memory, where Tock developers can experiment with an MMU-capable kernel. After that branch gets stable enough, it can be merged into the main branch.

Branden suggested creating a common crate, x86, then two distinct ones, x86-flat and x86-virtual. However, this would also result in three chip crates (common + flat + virtual) and three board crates (common + flat + virtual). I have not implemented this approach because:

  1. lazy
  2. there is only one true x86 chip, namely x86-virtual

In a previous comment, I explained why there is no need for two x86 platforms. Since it has been ignored, I assumed Tock core WG really wants to have two x86 platforms.

Another approach is to use conditional compilation. This one has two disadvantages:

  1. code becomes harder to understand
  2. doesn't play well with Rust tools, such as cargo test and cargo clippy

@alevy
Member

alevy commented Jun 25, 2025

I wouldn't mind a feature branch

@brghena
Contributor

brghena commented Jun 25, 2025

The downsides of branches are:

  • No one else keeps them up-to-date for you (probably less of a concern for the x86 stuff)
  • They are not discoverable by others (although you can point people to them)

So to me, it's not clear that a feature branch is any better than just keeping the code in your own downstream repo.

However, we did find them really useful for the final push on Tock ethernet. Merging a huge change request was going to be hard, but merging little pieces into a feature branch was reviewable. Then the merge from the feature branch into master was quick because we'd already reviewed all the pieces within it.

I do think a feature branch is better than duplication.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 25, 2025

@alevy If everyone agrees with a feature branch, can you create it, please? I don't have the required permissions to do it by myself.

@Ioan-Cristian
Contributor Author

Ioan-Cristian commented Jun 25, 2025

Also, if I understand correctly, the main branch of Tock is designed for:

  1. stability for downstream users
  2. development

Aren't the two goals mutually exclusive? Why doesn't Tock have multiple branches, a stable one for downstream users and an unstable one for development? Or why doesn't Tock use tags with the guarantee that a tagged commit may be safely used by downstream users, while untagged commits are subject to fast and various changes?

@alevy alevy changed the base branch from master to x86-mmu June 26, 2025 17:15
@alevy
Member

alevy commented Jun 26, 2025

Created x86-mmu and changed the base of this PR to that branch. I think it's worth reverting this PR back to the non-duplicated version (sorry for the churn) and then we can merge.

Instead of pushing and popping arguments on the stack, arguments are
passed in registers as on Linux x86 and Tock RISC-V. This new ABI
simplifies the implementation of virtual memory.

Signed-off-by: Ioan-Cristian CÎRSTEA <ioan.cirstea@oxidos.io>
@Ioan-Cristian
Contributor Author

@alevy Reverted the PR.

sorry for the churn

No problems on my side!

Contributor

@brghena brghena left a comment

Okay, this is now going into a separate branch. The changes themselves look fine. So, I think we should merge this now.

Then later, after other work on MMU support progresses, we can revisit the question of whether these are the right ABI changes before merging into master.

@brghena brghena added the last-call Final review period for a pull request. label Jun 30, 2025
@brghena
Contributor

brghena commented Jun 30, 2025

Also, if I understand correctly, the main branch of Tock is designed for:

  1. stability for downstream users
  2. development

Aren't the two goals mutually exclusive? Why doesn't Tock have multiple branches, a stable one for downstream users and an unstable one for development? Or why doesn't Tock use tags with the guarantee that a tagged commit may be safely used by downstream users, while untagged commits are subject to fast and various changes?

To answer this question, Tock does the latter option. It has tagged releases that may be safely used by downstream users, while the master branch includes many recent PRs with less testing.

But given that the master branch eventually becomes the next release, we do still care about stability for downstream users in it. It's not that things can't ever change, but that we want to be cautious about those changes.

Labels
last-call Final review period for a pull request. P-Significant This is a substantial change that requires review from all core developers.
7 participants