Introduce UK_LLSYSCALL_R_U_DEFINE and register clone with it #1175

Merged
merged 22 commits into from Dec 23, 2023

Conversation

mogasergiu

@mogasergiu mogasergiu commented Nov 24, 2023

Prerequisite checklist

  • Read the contribution guidelines regarding submitting new changes to the project;
  • Tested your changes against relevant architectures and platforms;
  • Ran the checkpatch.uk on your commit series before opening this PR;
  • Updated relevant documentation.

Base target

  • Architecture(s): x86_64, arm64
  • Platform(s): kvm
  • Application(s): N/A

Additional configuration

Description of changes

Some runtimes assume that the child of clone inherits full context of the parent (except for scratch registers).
Introduce struct uk_syscall_regs:

struct uk_syscall_regs {
       struct __regs regs;
       struct ukarch_ulctx ulctx;
       __u8 ectx[UKARCH_ECTX_SAVE_MAX_SIZE];
       __u8 pad[UK_SYSCALL_PAD_SIZE];
};

A structure made up of architecture specific structures:

  • struct __regs for general purpose registers
  • new struct ukarch_ulctx, meant to help the kernel keep track of certain special registers that an application may change and that we may need to swap with our own when entering a system call, e.g. the TLS pointer in bincompat, or the x86_64 gs_base register, which an application may change through the arch_prctl system call.
  • ectx, meant to represent the architecture-specific slot that the extended context is saved to and restored from when entering/exiting a system call
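
The layout described above can be sketched in portable C. This is only an illustration: the `demo_` types, the 64-byte alignment and the 512-byte save area are assumptions made for the example, not the values Unikraft uses, and the padding rule shown (round the total size up to the save-area alignment) is one plausible reading of the description.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, chosen only for this sketch */
#define DEMO_ECTX_ALIGN	64
#define DEMO_ECTX_SIZE	512

/* Hypothetical stand-in for struct __regs */
struct demo_regs {
	uint64_t gpr[21];
};

/* Hypothetical stand-in for struct ukarch_ulctx */
struct demo_ulctx {
	uint64_t tls_base;
	uint64_t gs_base;
};

#define DEMO_ROUND_UP(v, a)	((((v) + (a) - 1) / (a)) * (a))
#define DEMO_UNPADDED_SIZE					\
	(sizeof(struct demo_regs) + sizeof(struct demo_ulctx) +	\
	 DEMO_ECTX_SIZE)
/* Bytes needed to make the total size a multiple of the alignment */
#define DEMO_PAD_SIZE						\
	(DEMO_ROUND_UP(DEMO_UNPADDED_SIZE, DEMO_ECTX_ALIGN) -	\
	 DEMO_UNPADDED_SIZE)

struct demo_syscall_regs {
	struct demo_regs regs;
	struct demo_ulctx ulctx;
	uint8_t ectx[DEMO_ECTX_SIZE];
	uint8_t pad[DEMO_PAD_SIZE];
};

/* Catch a mis-sized structure at build time */
_Static_assert(sizeof(struct demo_syscall_regs) % DEMO_ECTX_ALIGN == 0,
	       "demo_syscall_regs must be a multiple of the ECTX alignment");
```

With these assumed sizes the padding is non-zero; a real definition would also have to handle the case where no padding is needed, since a zero-length array is not standard C.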

Introduce UK_LLSYSCALL_R_U_DEFINE, a system call registration macro alternative that offers a system call access to the caller/parent's full struct uk_syscall_regs context. Two system calls shall be registered: clone and arch_prctl. Additionally, uk_syscall6_r and ukplat_syscall_handler shall now make use of this generic structure.

Thus, now clone is able to make the child inherit full context of the parent if desired and satisfy picky runtimes.

Depends on #1173 and #1174

@mogasergiu mogasergiu requested review from a team as code owners November 24, 2023 17:44
@mogasergiu mogasergiu removed request for a team November 24, 2023 17:58
@mogasergiu mogasergiu force-pushed the smoga/uk_syscall_regs branch 4 times, most recently from 5d54c9a to 1773bc2 Compare November 26, 2023 15:57
@mogasergiu mogasergiu changed the title Introduce UK_SYSCALL_R_U_DEFINE and register clone with it Introduce UK_LLSYSCALL_R_U_DEFINE and register clone with it Nov 26, 2023
@michpappas michpappas left a comment

@mogasergiu amazing as always 😎 I am leaving some first comments. I might provide additional feedback after porting my work. Thanks!

PS: As with #1174 perhaps we can move the Arm stuff to a separate PR 🙏🏼

@razvand razvand self-assigned this Dec 23, 2023
@razvand razvand changed the base branch from staging to staging-1175 December 23, 2023 13:33
@razvand razvand merged commit be08b3a into unikraft:staging-1175 Dec 23, 2023
10 checks passed
razvand pushed a commit that referenced this pull request Dec 23, 2023
Define an architecture specific userland context that an architecture
can use to store things it may want to preserve or switch between
when entering/exiting Unikraft context during normal userland
execution (e.g. running in conjunction with binary syscalls).

For now, this is defined as a struct with a base field for both
ARM64 and x86_64, namely the `tpidr_el0` and `fs_base` registers
respectively.

Additionally, x86_64 has one more specific field:
- `gs_base`, that refers to the value of the `gs_base` register of
the application before calling the system call

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Add basic setter/getter operations for the `fs_base` field of the
userland context structure that is meant to represent the userspace
TLS.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Add basic setter/getter operations for the `tpidr_el0` field of the
userland context structure that is meant to represent the userspace
TLS.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement two basic methods:
- TLS switch to Unikraft: stores the current userland TLS into the `fs_base`
field of the userland context and updates the active TLS to that of
Unikraft
- TLS switch to userland: undoes what the switch to Unikraft did, by setting
the active TLS to the one that was previously stored in `fs_base`
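
The two operations can be sketched as follows. This is a simulation, not the actual implementation: the `demo_` names are hypothetical and a plain variable stands in for the hardware TLS register.

```C
#include <assert.h>

/* Hypothetical stand-in for the userland context */
struct demo_ulctx {
	unsigned long fs_base;
};

/* Simulates the hardware TLS register (e.g. fs_base on x86_64) */
static unsigned long demo_active_tls;

/* Save the userland TLS pointer and activate the Unikraft one */
static void demo_ulctx_switch_uk(struct demo_ulctx *u, unsigned long uk_tls)
{
	u->fs_base = demo_active_tls;
	demo_active_tls = uk_tls;
}

/* Undo the switch above: reactivate the saved userland TLS pointer */
static void demo_ulctx_switch_ul(const struct demo_ulctx *u)
{
	demo_active_tls = u->fs_base;
}
```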

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement two basic methods:
- TLS switch to Unikraft: stores the current userland TLS into the `tpidr_el0`
field of the userland context and updates the active TLS to that of
Unikraft
- TLS switch from Unikraft: undoes what the previous switch did, by
setting the active TLS as the one that was previously stored in
`tpidr_el0`

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement switch from/to functions to be used in conjunction
with the userland context.

The switch to Unikraft operation assumes that it can only be called
from within Unikraft context (i.e. not directly by the app) and, as
such, the app's `gs_base` register value was preserved within
`X86_MSR_KERNEL_GS_BASE` following a `swapgs` on syscall entry.
This value will be thus stored in the current userland
context.

The switch to userland operation assumes the same thing and, therefore,
it will set `X86_MSR_GS_BASE` to the current `struct lcpu` and
`X86_MSR_KERNEL_GS_BASE` to the app's preserved `gs_base` register
within the userland context of the app's thread we are currently
switching to.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement basic methods for getting/setting the `gs_base` field of
`struct ukarch_sysregs`.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement `switch_ul`/`switch_uk` functions to be used in conjunction
with the system registers. Useful when wanting to ensure consistency
between register states of the application versus Unikraft core's.

This will also switch the current TLS pointer to the one that was
stored in the userland context.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Define a new structure, `struct uk_syscall_ctx`, which is currently
composed of the following fields:
- a `struct __regs regs` field meant to contain the register context
of the userland caller before executing the `syscall` instruction
- a `struct ukarch_sysregs sysregs` field meant to contain the architecture
specific context containing only the registers that have to be preserved,
kept track of and switched between during system call entry/exit.
- a slot big enough to hold the saving of the architecture specific
extended context
- padding to ensure that the structure is aligned end-to-end, w.r.t. the
alignment of the extended context saving area.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
It may come in handy to be able to know the offsets of each register
inside the `struct __regs` structure during more fragile states of
execution, usually coded in assembly.

Therefore, move the guarding `#ifdef` below the register definitions
so that assembly files may be able to include this header and benefit
from the aforementioned macros.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Allow catching missized formats of the `struct __regs` at build time
by using `UK_CTASSERT` on them.
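
A minimal sketch of the idea, assuming a made-up three-register frame: offsets are kept as plain macros so that assembly could use them, and a build-time assertion ties them to the C structure. `DEMO_CTASSERT` is a hypothetical stand-in built on C11 `_Static_assert`, not the definition of `UK_CTASSERT`.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets written as plain macros so assembly files could use them too */
#define DEMO_REGS_OFFSETOF_R0	0
#define DEMO_REGS_OFFSETOF_R1	8
#define DEMO_REGS_OFFSETOF_SP	16
#define DEMO_REGS_SIZEOF	24

/* Hypothetical register frame, not the real struct __regs */
struct demo_regs {
	uint64_t r0;
	uint64_t r1;
	uint64_t sp;
};

/* Fail the build when the macros and the structure disagree */
#define DEMO_CTASSERT(cond)	_Static_assert(cond, #cond)

DEMO_CTASSERT(offsetof(struct demo_regs, sp) == DEMO_REGS_OFFSETOF_SP);
DEMO_CTASSERT(sizeof(struct demo_regs) == DEMO_REGS_SIZEOF);
```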

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
This is no longer used and unnecessarily adds a bit of difficulty
when it comes to adding new system call registration macros. Thus,
remove it.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Add a macro for `__attribute__((naked))`. This tells the compiler
to generate code without prologue and epilogue code. This comes in handy
when writing functions that consist only of inline assembly. Caution is
required, however, as the ABI must still be respected and the registers
that the callee is expected to preserve must still be preserved.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement an x86 inline assembly macro, `UK_SYSCALL_USC_PROLOGUE_DEFINE`,
that defines a function named after `pname`. This function switches to the
auxiliary stack and pushes and stores the caller's context before passing it
as an argument to a function whose name is given by the second argument,
`fname`.

This function shall be defined as a `__naked` function, meaning the
compiler does not provide us with an ABI compliant prologue/epilogue.
Although we supposedly don't touch the callee preserved registers in
this function, fully optimized images may inline the `fname` function
and end up messing up the ABI. Therefore, make double sure this is avoided
by restoring the callee preserved registers before switching back to the
initial stack and returning.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement an ARM inline assembly macro, `UK_SYSCALL_USC_PROLOGUE_DEFINE`,
that defines a function named after `pname`. This function switches to the
auxiliary stack and pushes and stores the caller's context before passing it
as an argument to a function whose name is given by the second argument,
`fname`.

Although we supposedly don't touch the callee preserved registers in
this function, fully optimized images may inline the `fname` function
and end up messing the ABI. Therefore, make double sure this is avoided
by restoring the callee preserved registers before switching back to the
initial stack and returning.

To further emphasize to the compiler that we do not want anything
else in this function other than the prologue, optimize it with O3.

Notice that we make use of `TPIDRRO_EL0` as a scratch register so that
we can temporarily preserve x0's value. This should be safe as the
application/user themselves cannot change this register's value,
which means this register's value will always be known by us and we
could restore it anytime.

Examples of OSes making use of `TPIDRRO_EL0` include Zephyr, which uses
this register to hold its `struct __cpu`, Windows, which uses it to
hold the current CPU number, and Linux, which uses it as both a scratch
register and a secondary thread ID.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Introduce an assembly written method `uk_syscall_ctx_popall` that
may be used in conjunction with a context that may be saved through a
`struct uk_syscall_ctx` structure. This method assumes that `%rsp`
points to the aforementioned structure and, furthermore, makes the
assumption that it is already aligned in memory to what this structure
should be (i.e. alignment of ECTX). Furthermore, and most importantly,
this method assumes that `gs_base` is set to the current LCPU's
corresponding `struct lcpu` pointer, as it ends in a `swapgs`.

`uk_syscall_ctx_popall` simply disables IRQs and restores the
extended context, userland context (`gs_base`/`fs_base` at the moment)
and the general purpose registers, only to then finish with a `swapgs`
to swap the Unikraft/application `gs_base` values and an `iretq` to
"teleport" into the expected context.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Introduce an assembly written method `uk_syscall_ctx_popall` that
may be used in conjunction with a context that may be saved through a
`struct uk_syscall_ctx` structure. This method assumes that `sp`
points to the aforementioned structure and, furthermore, makes the
assumption that it is already aligned in memory to what this structure
should be (i.e. alignment of ECTX).

`uk_syscall_ctx_popall` simply disables IRQs and restores the
extended context, userland context (`tpidr_el0` only at the moment)
and the general purpose registers, only to then `eret` to "teleport"
to the expected context.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Implement `UK_LLSYSCALL_R_U_DEFINE`, a `struct uk_syscall_ctx`
alternative to `UK_LLSYSCALL_R_DEFINE`. System calls that get
registered through this macro shall be called through an inline
assembly preamble. This preamble will preserve the
`struct uk_syscall_ctx` of the caller and pass it to the
registered syscall.

The system call function will be declared with the registered
arguments and, additionally, with a hidden `struct uk_syscall_ctx *usr`
argument that will contain the aforementioned saved context. This
is very useful when it comes to system calls such as `fork` or `clone`
that may want to duplicate the caller's/parent's context into the child.

E.g.
Consider the usual registration with `UK_LLSYSCALL_R_DEFINE`:
```C
UK_LLSYSCALL_R_DEFINE(type, sysname, type0, arg0, type1, arg1)
{
	...
}
```
This will end up creating two global scope symbols:
```
uk_syscall_e_sysname
uk_syscall_r_sysname
```
These symbols will end up calling a statically declared
`__uk_syscall_r_sysname`/`__uk_syscall_e_sysname` behind the scenes
with the necessary arguments and this will be the actual function
executing the code between the brackets `{ }`.

Now consider the new variant, `UK_LLSYSCALL_R_U_DEFINE`:
```C
UK_LLSYSCALL_R_U_DEFINE(type, sysname, type0, arg0, type1, arg1)
{
	...
}
```
Behind the scenes this will create four global scope symbols:
```
uk_syscall_e_sysname
uk_syscall_r_sysname
uk_syscall_e_u_sysname
uk_syscall_r_u_sysname
```
`uk_syscall_r_sysname` and `uk_syscall_e_sysname` will actually be
the assembly preamble that stores the caller's context in the form of
`struct uk_syscall_ctx` and then they call `uk_syscall_r_u_sysname`
and `uk_syscall_e_u_sysname` respectively, which represent the actual
code written between brackets `{ }`. These, just like before, will call
`__uk_syscall_r_u_sysname`/`__uk_syscall_e_u_sysname`, which represent
a key useful difference from their alternative without the `_u`:
- The `__uk_syscall_r_sysname`/`__uk_syscall_e_sysname` will have the
following signature:
```C
static inline
type __uk_syscall_r_sysname(type0 arg0, type1 arg1);
static inline
type __uk_syscall_e_sysname(type0 arg0, type1 arg1);
```
- The `__uk_syscall_r_u_sysname`/`__uk_syscall_e_u_sysname` will have the
following signature:
```C
static inline __used
type __uk_syscall_r_u_sysname(struct uk_syscall_ctx *usr,
			      type0 arg0, type1 arg1);
static inline __used
type __uk_syscall_e_u_sysname(struct uk_syscall_ctx *usr,
			      type0 arg0, type1 arg1);
```
Thus, the programmer can use the `_U_` system call registration variant
if they require access to the full context of the caller/parent.
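
As a rough, portable illustration of the hidden `usr` argument, the sketch below uses a simplified macro. Everything prefixed with `demo_`/`DEMO_` is hypothetical: the real macro is variadic and relies on an assembly preamble to save actual registers, whereas here a plain C function fabricates the context.

```C
#include <assert.h>

/* Hypothetical, heavily simplified caller context */
struct demo_syscall_ctx {
	long callee_saved;
};

/* Stand-in for the assembly preamble, which would save real registers */
static struct demo_syscall_ctx demo_capture_ctx(void)
{
	struct demo_syscall_ctx ctx = { .callee_saved = 40 };

	return ctx;
}

/*
 * Illustrative two-argument variant: the public symbol captures the
 * context and forwards it as a hidden first argument named `usr`.
 */
#define DEMO_LLSYSCALL_R_U_DEFINE(rtype, name, t0, a0, t1, a1)		\
	static rtype demo_impl_r_u_##name(				\
		struct demo_syscall_ctx *usr, t0 a0, t1 a1);		\
	rtype demo_syscall_r_##name(t0 a0, t1 a1)			\
	{								\
		struct demo_syscall_ctx ctx = demo_capture_ctx();	\
									\
		return demo_impl_r_u_##name(&ctx, a0, a1);		\
	}								\
	static rtype demo_impl_r_u_##name(				\
		struct demo_syscall_ctx *usr, t0 a0, t1 a1)

/* The body sees `usr` although it is not in the argument list */
DEMO_LLSYSCALL_R_U_DEFINE(long, sum, long, a, long, b)
{
	return usr->callee_saved + a + b;
}
```

Calling `demo_syscall_r_sum(1, 1)` goes through the wrapper, so the body runs with `usr` pointing at the captured context.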

Finally, add equivalent `ARG_MAP`/`PRINTD` and similar macros to
accommodate this new type of registration.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Make it so that `ukplat_syscall_handler` now receives a
`struct uk_syscall_ctx` pointer as its argument instead.

This structure shall be saved on the stack by the early assembly
entry for each architecture. Furthermore, prefer `sysregs` API
over other methods of touching the context.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Being the center of all system call dispatching, it may be useful
to let `uk_syscall6_r` receive a `struct uk_syscall_ctx` as its
argument, containing the full context of the caller/parent.

Thus, if a system call has its `_u_` variant defined, that will be
the one that is used instead and the argument passed to it is
the aforementioned `struct uk_syscall_ctx` structure.
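
A hedged sketch of such a dispatcher follows. The syscall numbers, the context fields and all `demo_` names are invented for the example; only the idea of preferring the `_u_` variant and handing it the context comes from the description above.

```C
#include <assert.h>
#include <errno.h>

#define DEMO_NR_GETPID	1	/* made-up syscall numbers */
#define DEMO_NR_CLONE	2

/* Hypothetical context: syscall number, first argument, one saved register */
struct demo_syscall_ctx {
	long nr;
	long arg0;
	long saved_reg;
};

/* Plain handler: no access to the caller's context */
static long demo_r_getpid(void)
{
	return 1;
}

/* `_u_` style handler: receives the context in addition to its argument */
static long demo_r_u_clone(struct demo_syscall_ctx *usr, long flags)
{
	return usr->saved_reg + flags;
}

/* Dispatcher: prefer the `_u_` variant where a syscall defines one */
static long demo_syscall6_r_u(struct demo_syscall_ctx *usr)
{
	switch (usr->nr) {
	case DEMO_NR_CLONE:
		return demo_r_u_clone(usr, usr->arg0);
	case DEMO_NR_GETPID:
		return demo_r_getpid();
	default:
		return -ENOSYS;
	}
}
```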

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Register `clone` through `UK_LLSYSCALL_R_U_DEFINE` to have full
access to the `struct uk_syscall_ctx` context of the caller/parent.
This way we will be able to give the child the same exact register
context as the parent, except, of course, for the obvious registers
(e.g. `%rax` equal to `0` on `x86_64` and `x0` equal to `0`
on `ARM64`).

Some runtimes, such as the Go one, may expect some registers
to be exactly the same as those of the parent.

Registers that may not be preserved may be the scratch ones
(e.g. `%r11` on x86).
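
The inheritance rule can be sketched like this, assuming a hypothetical four-register frame. The `demo_` names and the convention that a zero `child_sp` keeps the parent's stack pointer are assumptions of the example.

```C
#include <assert.h>

/* Hypothetical subset of an x86_64 register frame */
struct demo_regs {
	unsigned long rax;	/* return value register */
	unsigned long rbx;	/* callee-saved */
	unsigned long r12;	/* callee-saved */
	unsigned long rsp;
};

/* Give the child the parent's registers, with clone returning 0 in it */
static void demo_clone_child_regs(struct demo_regs *child,
				  const struct demo_regs *parent,
				  unsigned long child_sp)
{
	*child = *parent;
	child->rax = 0;
	if (child_sp)	/* assumed: 0 means "keep the parent's stack pointer" */
		child->rsp = child_sp;
}
```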

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175
razvand pushed a commit that referenced this pull request Dec 23, 2023
Now that we rely fully on `struct uk_syscall_ctx` and
`struct ukarch_sysregs`, we no longer need these TLS variables.

Therefore, remove them and any references of them from the codebase.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io>
Reviewed-by: Michalis Pappas <michalis@unikraft.io>
Approved-by: Razvan Deaconescu <razvand@unikraft.io>
GitHub-Closes: #1175

/* We now have in SP the trap stack and in x0 the auxiliary stack */
EXCHANGE_SP_WITH_X0 /* Switch them */
/* Restore old SP we stored before system call check */
ldr x0, [x0, #-16]
str x0, [sp, #-16] /* Store old SP in auxiliary stack */
str x0, [sp, #__SP_OFFSET] /* Store old SP in auxiliary stack */
b 1f
0:
/* Restore x0 */

Why not restore x0 in the case of a system call

Member Author

Yeah you're right, x0 should have been restored on syscalls as well of course. I must have somehow wrongly changed this during the review process and not noticed. The restoration of x0 should have been placed after the 1: label instead.

Contributor

@mogasergiu , would this require an update (PR) from your side?

Member Author

Yes I could look into it after I return from my vacation, unless someone else would like to have a go at it as well. Though the fix should be rather simple: moving the restoration of x0 two lines below should suffice if I see this right.

Member Author

Done #1256

michpappas added a commit to michpappas/unikraft that referenced this pull request Feb 3, 2024
Interim fix for unikraft#1175

Signed-off-by: Michalis Pappas <michalis@unikraft.io>
michpappas added a commit to michpappas/unikraft that referenced this pull request Feb 3, 2024
Interim fix for unikraft#1175

Signed-off-by: Michalis Pappas <michalis@unikraft.io>