Introduce `UK_LLSYSCALL_R_U_DEFINE` and register `clone` with it #1175
@mogasergiu amazing as always 😎 I am leaving some first comments. I might provide additional feedback after porting my work. Thanks!
PS: As with #1174 perhaps we can move the Arm stuff to a separate PR 🙏🏼
Define an architecture specific userland context that an architecture can use to store things it may want to preserve or switch between when entering/exiting Unikraft context during normal userland execution (e.g. running in conjunction with binary syscalls). For now, this is defined as a struct with a base field for both ARM64 and x86_64, namely the `tpidr_el0` and `fs_base` registers respectively. Additionally, x86_64 has one more specific field: - `gs_base`, that refers to the value of the `gs_base` register of the application before calling the system call Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Add basic setter/getter operations for the `fs_base` field of the userland context structure that is meant to represent the userspace TLS. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Add basic setter/getter operations for the `tpidr_el0` field of the userland context structure that is meant to represent the userspace TLS. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement two basic methods:
- TLS switch to Unikraft: stores the current userland TLS into the `fs_base` field of the userland context and updates the active TLS to that of Unikraft
- TLS switch to userland: undoes what the switch to Unikraft did, by setting the active TLS to the one that was previously stored in `fs_base`
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement two basic methods:
- TLS switch to Unikraft: stores the current userland TLS into the `tpidr_el0` field of the userland context and updates the active TLS to that of Unikraft
- TLS switch from Unikraft: undoes what the previous switch did, by setting the active TLS to the one that was previously stored in `tpidr_el0`
Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
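The two TLS switch operations described in the commits above can be sketched in portable C. This is only an illustration under stated assumptions: `struct demo_sysregs`, `demo_hw_tls`, `demo_switch_uk` and `demo_switch_ul` are hypothetical names, and the hardware register (`fs_base` on x86_64, `tpidr_el0` on ARM64) is simulated with a plain variable instead of being accessed through the architecture's own instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the userland context (TLS base field only). */
struct demo_sysregs {
	uint64_t tls_base; /* plays the role of fs_base / tpidr_el0 */
};

/* Simulated hardware TLS register; real code would touch the register. */
static uint64_t demo_hw_tls;

/* Switch to Unikraft: remember the userland TLS, then activate Unikraft's. */
static void demo_switch_uk(struct demo_sysregs *sr, uint64_t uk_tls)
{
	assert(sr != NULL);
	sr->tls_base = demo_hw_tls;
	demo_hw_tls = uk_tls;
}

/* Switch to userland: reactivate the TLS that was stored on the way in. */
static void demo_switch_ul(const struct demo_sysregs *sr)
{
	assert(sr != NULL);
	demo_hw_tls = sr->tls_base;
}
```

Calling `demo_switch_uk()` followed by `demo_switch_ul()` on the same context leaves the simulated register holding the original userland value.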
Implement switch from/to functions to be used in conjunction with the userland context. The switch to Unikraft operation assumes that it can only be called from within Unikraft context (i.e. not directly by the app) and, as such, the app's `gs_base` register value was preserved within `X86_MSR_KERNEL_GS_BASE` following a `swapgs` on syscall entry. This value will be thus stored in the current userland context. The switch to userland operation assumes the same thing and, therefore, it will set `X86_MSR_GS_BASE` to the current `struct lcpu` and `X86_MSR_KERNEL_GS_BASE` to the app's preserved `gs_base` register within the userland context of the app's thread we are currently switching to. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement basic methods for getting/setting the `gs_base` field of `struct ukarch_sysregs`. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement `switch_ul`/`switch_uk` functions to be used in conjunction with the system registers. Useful when wanting to ensure consistency between register states of the application versus Unikraft core's. This will also switch the current TLS pointer to the one that was stored in the userland context. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Define a new structure, `struct uk_syscall_ctx`, which is currently composed of the following fields: - a `struct __regs regs` field meant to contain the register context of the userland caller before executing the `syscall` instruction - a `struct ukarch_sysregs sysregs` field meant to contain the architecture specific context containing only the registers that have to be preserved, kept track of and switched between during system call entry/exit. - a slot big enough to hold the saving of the architecture specific extended context - padding to ensure that the structure is aligned end-to-end, w.r.t. the alignment of the extended context saving area. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
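A rough sketch of the layout constraint described in this commit, assuming made-up sizes: the `demo_` types, `DEMO_ECTX_ALIGN` and `DEMO_ECTX_SIZE` are hypothetical, and the padding is left to the compiler through `alignas` rather than written out by hand as the commit describes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the real ones depend on the architecture's ECTX. */
#define DEMO_ECTX_ALIGN 64
#define DEMO_ECTX_SIZE  512

struct demo_regs {           /* stands in for struct __regs */
	uint64_t gpr[21];
};

struct demo_sysregs {        /* stands in for struct ukarch_sysregs */
	uint64_t fs_base;
	uint64_t gs_base;
};

struct demo_syscall_ctx {
	struct demo_regs regs;       /* caller's general purpose registers */
	struct demo_sysregs sysregs; /* registers switched on entry/exit */
	/* Extended context slot; alignas makes the compiler add the padding. */
	alignas(DEMO_ECTX_ALIGN) uint8_t ectx[DEMO_ECTX_SIZE];
};

/* The ECTX slot starts on an aligned offset... */
static_assert(offsetof(struct demo_syscall_ctx, ectx) % DEMO_ECTX_ALIGN == 0,
	      "ectx slot must be aligned");
/* ...and the whole structure is aligned end-to-end. */
static_assert(sizeof(struct demo_syscall_ctx) % DEMO_ECTX_ALIGN == 0,
	      "context size must be a multiple of the ECTX alignment");
```

With this property, a context pushed at an aligned stack address keeps its extended context area aligned as well.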
It may come in handy to be able to know the offsets of each register inside the `struct __regs` structure during more fragile states of execution, usually coded in assembly. Therefore, move the guarding `#ifdef` below the register definitions so that assembly files may be able to include this header and benefit from the aforementioned macros. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Allow catching missized formats of the `struct __regs` at build time by using `UK_CTASSERT` on them. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
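The idea from the two commits above (offset macros visible to assembly, plus a build-time size check) could look roughly like the following. All `DEMO_` names and the three-register structure are hypothetical, and `DEMO_CTASSERT` is a stand-in built on C11 `static_assert`, not the actual `UK_CTASSERT` definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets kept outside the C-only guard so assembly files can use them. */
#define DEMO_REGS_OFFSETOF_R15 0
#define DEMO_REGS_OFFSETOF_RAX 8
#define DEMO_REGS_OFFSETOF_RSP 16
#define DEMO_REGS_SIZEOF       24

#ifndef __ASSEMBLY__
struct demo_regs {
	uint64_t r15;
	uint64_t rax;
	uint64_t rsp;
};

/* Hypothetical compile-time assertion helper. */
#define DEMO_CTASSERT(cond) static_assert(cond, #cond)

/* The build fails if the structure and the macros ever drift apart. */
DEMO_CTASSERT(sizeof(struct demo_regs) == DEMO_REGS_SIZEOF);
DEMO_CTASSERT(offsetof(struct demo_regs, rax) == DEMO_REGS_OFFSETOF_RAX);
DEMO_CTASSERT(offsetof(struct demo_regs, rsp) == DEMO_REGS_OFFSETOF_RSP);
#endif /* !__ASSEMBLY__ */
```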
This is no longer used and unnecessarily adds a bit of difficulty when it comes to adding new system call registration macros. Thus, remove it. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Add a macro for `__attribute__((naked))`. This tells the compiler to generate code without prologue and epilogue code, which comes in handy when writing functions that consist only of inline assembly. Caution is still required, as the ABI must be respected and the registers that the callee is expected to preserve must still be preserved. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement an x86 inline assembly macro, `UK_SYSCALL_USC_PROLOGUE_DEFINE`, that defines a function named after `pname`. This function switches to the auxiliary stack and pushes and stores the caller's context before passing it as argument to a function whose name is given by the second argument, `fname`. This function shall be defined as a `__naked` function, meaning the compiler does not provide us with an ABI compliant prologue/epilogue. Although we supposedly don't touch the callee preserved registers in this function, fully optimized images may inline the `fname` function and end up breaking the ABI. Therefore, make doubly sure this is avoided by restoring the callee preserved registers before switching back to the initial stack and returning. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement an ARM inline assembly macro, `UK_SYSCALL_USC_PROLOGUE_DEFINE`, that defines a function named after `pname`. This function switches to the auxiliary stack and pushes and stores the caller's context before passing it as argument to a function whose name is given by the second argument, `fname`. Although we supposedly don't touch the callee preserved registers in this function, fully optimized images may inline the `fname` function and end up breaking the ABI. Therefore, make doubly sure this is avoided by restoring the callee preserved registers before switching back to the initial stack and returning. To further emphasize to the compiler that we do not want anything else in this function other than the prologue, optimize it with O3. Notice that we make use of `TPIDRRO_EL0` as a scratch register so that we can temporarily preserve x0's value. This should be safe as the application/user themselves cannot change this register's value, which means this register's value will always be known by us and we could restore it anytime. Examples of OSes making use of `TPIDRRO_EL0` include Zephyr, which uses this register to hold its `struct __cpu`, Windows, which uses it to hold the current CPU number, and Linux, which uses it as both a scratch register and a secondary thread ID. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Introduce an assembly written method `uk_syscall_ctx_popall` that may be used in conjunction with a context that may be saved through a `struct uk_syscall_ctx` structure. This method assumes that `%rsp` points to the aforementioned structure and, furthermore, makes the assumption that it is already aligned in memory to what this structure should be (i.e. alignment of ECTX). Furthermore, and most importantly, this method assumes that `gs_base` is set to the current LCPU's corresponding `struct lcpu` pointer as it ends in a `swapgs`. `uk_syscall_ctx_popall` simply disables IRQs and restores the extended context, userland context (`gs_base`/`fs_base` at the moment) and the general purpose registers, only to then finish with a `swapgs` to swap the Unikraft/application `gs_base` values and an `iretq` to "teleport" into the expected context. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Introduce an assembly written method `uk_syscall_ctx_popall` that may be used in conjunction with a context that may be saved through a `struct uk_syscall_ctx` structure. This method assumes that `sp` points to the aforementioned structure and, furthermore, makes the assumption that it is already aligned in memory to what this structure should be (i.e. alignment of ECTX). `uk_syscall_ctx_popall` simply disables IRQs and restores the extended context, userland context (`tpidr_el0` only at the moment) and the general purpose registers, only to then `eret` to "teleport" into the expected context. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Implement `UK_LLSYSCALL_R_U_DEFINE`, a `struct uk_syscall_ctx` alternative to `UK_LLSYSCALL_R_DEFINE`. System calls that get registered through this macro shall have an inline assembly preamble call them. This preamble will preserve the `struct uk_syscall_ctx` of the caller and pass it to the registered syscall. The system call function will be declared with the registered arguments and, additionally, with a hidden `struct uk_syscall_ctx *usr` argument that will contain the aforementioned saved context. This is very useful when it comes to system calls such as `fork` or `clone` that may want to duplicate the caller's/parent's context into the child.

E.g. consider the usual registration with `UK_LLSYSCALL_R_DEFINE`:
```C
UK_LLSYSCALL_R_DEFINE(type, sysname, type0, arg0, type1, arg1)
{
	...
}
```
This will end up creating two global scope symbols:
```
uk_syscall_e_sysname
uk_syscall_r_sysname
```
These symbols will end up calling a statically declared `__uk_syscall_r_sysname`/`__uk_syscall_e_sysname` behind the scenes with the necessary arguments and this will be the actual function executing the code between the brackets `{ }`.

Now consider the new variant, `UK_LLSYSCALL_R_U_DEFINE`:
```C
UK_LLSYSCALL_R_U_DEFINE(type, sysname, type0, arg0, type1, arg1)
{
	...
}
```
Behind the scenes this will create four global scope symbols:
```
uk_syscall_e_sysname
uk_syscall_r_sysname
uk_syscall_e_u_sysname
uk_syscall_r_u_sysname
```
`uk_syscall_r_sysname` and `uk_syscall_e_sysname` will actually be the assembly preamble that stores the caller's context in the form of a `struct uk_syscall_ctx` and then they call `uk_syscall_r_u_sysname` and `uk_syscall_e_u_sysname` respectively, which represent the actual code written between brackets `{ }`. These, just like before, will call `__uk_syscall_r_u_sysname`/`__uk_syscall_e_u_sysname`, which represent a key useful difference from their alternative without the `_u`:
- The `__uk_syscall_r_sysname`/`__uk_syscall_e_sysname` will have the following signature:
```C
static inline type __uk_syscall_r_sysname(type0 arg0, type1 arg1);
static inline type __uk_syscall_e_sysname(type0 arg0, type1 arg1);
```
- The `__uk_syscall_r_u_sysname`/`__uk_syscall_e_u_sysname` will have the following signature:
```C
static inline __used type __uk_syscall_r_u_sysname(struct uk_syscall_ctx *usr, type0 arg0, type1 arg1);
static inline __used type __uk_syscall_e_u_sysname(struct uk_syscall_ctx *usr, type0 arg0, type1 arg1);
```
Thus, the programmer can use the `_U_` system call registration variant if they require access to the full context of the caller/parent.

Finally, add equivalent `ARG_MAP`/`PRINTD` and similar macros to accommodate this new type of registration.

Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
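To make the "hidden `usr` argument" idea more concrete, here is a heavily simplified sketch of a registration macro for exactly two arguments. It is not the real `UK_LLSYSCALL_R_U_DEFINE`: `DEMO_SYSCALL_R_U_DEFINE`, `struct demo_syscall_ctx` and the `peek` system call are hypothetical, and the assembly preamble that captures the caller's context is left out, so the caller hands the context in directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, tiny stand-in for struct uk_syscall_ctx. */
struct demo_syscall_ctx {
	long rax;
	long rip;
};

/*
 * Declares the inner static function with the hidden `usr` parameter,
 * emits a global wrapper taking raw `long` arguments, and then opens the
 * definition of the inner function so the user's `{ ... }` body follows.
 */
#define DEMO_SYSCALL_R_U_DEFINE(rtype, name, t0, a0, t1, a1)		\
	static inline rtype __demo_syscall_r_u_##name(			\
		struct demo_syscall_ctx *usr, t0 a0, t1 a1);		\
	rtype demo_syscall_r_u_##name(struct demo_syscall_ctx *usr,	\
				      long a0, long a1)			\
	{								\
		return __demo_syscall_r_u_##name(usr, (t0)a0, (t1)a1);	\
	}								\
	static inline rtype __demo_syscall_r_u_##name(			\
		struct demo_syscall_ctx *usr, t0 a0, t1 a1)

/* The body can use `usr` although it is not in the argument list. */
DEMO_SYSCALL_R_U_DEFINE(long, peek, long, which, long, bias)
{
	assert(usr != NULL);
	return (which ? usr->rip : usr->rax) + bias;
}
```

The body refers to `usr` even though the registration line only lists `which` and `bias`, which mirrors how the `_U_` variant exposes the saved context to the system call implementation.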
Make it so that `ukplat_syscall_handler` now receives a `struct uk_syscall_ctx` pointer as its argument instead. This structure shall be saved on the stack by the early assembly entry for each architecture. Furthermore, prefer `sysregs` API over other methods of touching the context. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
Being the center of all system call dispatching, it may be useful to let `uk_syscall6_r` receive a `struct uk_syscall_ctx` as its argument, containing the full context of the caller/parent. Thus, if a system call has its `_u_` variant defined, that will be the one that is used instead and the argument passed to it is the aforementioned `struct uk_syscall_ctx` structure. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
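The dispatch rule described here (use the `_u_` variant when a system call has one) can be illustrated with a small table-driven dispatcher. This is an assumption-laden sketch: the table, the `demo_` names and the two-argument handlers are invented for illustration and do not reflect how `uk_syscall6_r` is actually generated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical context: system call number plus raw arguments. */
struct demo_syscall_ctx {
	long sysno;
	long arg[6];
};

typedef long (*demo_plain_fn)(long, long);
typedef long (*demo_ctx_fn)(struct demo_syscall_ctx *, long, long);

struct demo_entry {
	demo_plain_fn plain;  /* regular handler */
	demo_ctx_fn with_ctx; /* `_u_` style handler, may be NULL */
};

static long demo_add(long a, long b)
{
	return a + b;
}

static long demo_echo_sysno(struct demo_syscall_ctx *usr, long a, long b)
{
	(void)a;
	(void)b;
	return usr->sysno; /* only reachable because the context is passed */
}

static const struct demo_entry demo_table[] = {
	{ demo_add, NULL },        /* 0: no context variant */
	{ NULL, demo_echo_sysno }, /* 1: context variant only */
};

/* Dispatcher: prefer the context-aware handler when one is registered. */
static long demo_syscall_r(struct demo_syscall_ctx *usr)
{
	size_t n = sizeof(demo_table) / sizeof(demo_table[0]);

	assert(usr != NULL);
	if (usr->sysno < 0 || (size_t)usr->sysno >= n)
		return -ENOSYS;
	if (demo_table[usr->sysno].with_ctx)
		return demo_table[usr->sysno].with_ctx(usr, usr->arg[0],
						       usr->arg[1]);
	return demo_table[usr->sysno].plain(usr->arg[0], usr->arg[1]);
}
```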
Register `clone` through `UK_LLSYSCALL_R_U_DEFINE` to have full access to the `struct uk_syscall_ctx` context of the caller/parent. This way we will be able to give the child the exact same register context as the parent, except, of course, for the obvious registers (e.g. `%rax` equal to `0` on `x86_64` and `x0` equal to `0` on `ARM64`). Some runtimes, such as the Go one, may expect some registers to be exactly the same as those of the parent. Registers that may not be preserved are the scratch ones (e.g. `%r11` on x86). Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
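A minimal sketch of what "same register context as the parent, except for the obvious registers" could mean in code, using x86_64 register names. The structures and `demo_child_ctx_init` are hypothetical and much smaller than the real `struct uk_syscall_ctx`; the only assumptions carried over from `clone` semantics are that the child sees a return value of `0` and runs on the supplied stack when one is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced register set. */
struct demo_regs {
	uint64_t rax, rbx, r12, rsp, rip;
};

struct demo_syscall_ctx {
	struct demo_regs regs;
	uint64_t fs_base; /* userland TLS pointer */
};

/* Build the child's context from the parent's saved context. */
static void demo_child_ctx_init(struct demo_syscall_ctx *child,
				const struct demo_syscall_ctx *parent,
				uint64_t child_sp)
{
	assert(child != NULL && parent != NULL);

	*child = *parent;     /* inherit everything by default */
	child->regs.rax = 0;  /* the child observes clone() returning 0 */
	if (child_sp != 0)
		child->regs.rsp = child_sp; /* run on the child's own stack */
}
```

Everything not explicitly overridden, including callee-saved registers such as `rbx` and `r12`, reaches the child unchanged, which is the property the commit says runtimes like Go's may rely on.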
Now that we rely fully on `struct uk_syscall_ctx` and `struct ukarch_sysregs`, we no longer need these TLS variables. Therefore, remove them and any references of them from the codebase. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Reviewed-by: Michalis Pappas <michalis@unikraft.io> Approved-by: Razvan Deaconescu <razvand@unikraft.io> GitHub-Closes: #1175
  /* We now have in SP the trap stack and in x0 the auxiliary stack */
  EXCHANGE_SP_WITH_X0 /* Switch them */
  /* Restore old SP we stored before system call check */
  ldr x0, [x0, #-16]
- str x0, [sp, #-16] /* Store old SP in auxiliary stack */
+ str x0, [sp, #__SP_OFFSET] /* Store old SP in auxiliary stack */
  b 1f
  0:
  /* Restore x0 */
Why not restore x0 in the case of a system call?
Yeah you're right, x0 should have been restored on syscalls as well of course. I must have somehow wrongly changed this during the review process and not noticed. The restoration of x0 should have been placed after the `1:` label instead.
@mogasergiu , would this require an update (PR) from your side?
Yes I could look into it after I return from my vacation, unless someone else would like to have a go at it as well. Though the fix should be rather simple: moving the restoration of x0 two lines below should suffice if I see this right.
Done #1256
Interim fix for unikraft#1175 Signed-off-by: Michalis Pappas <michalis@unikraft.io>
Prerequisite checklist
- `checkpatch.uk` on your commit series before opening this PR;

Base target
- `x86_64`, `arm64`
- `kvm`

Additional configuration

Description of changes
Some runtimes assume that the child of `clone` inherits the full context of the parent (except for scratch registers).

Introduce `struct uk_syscall_regs`: a structure made up of architecture specific structures:
- `struct __regs` for general purpose registers
- `struct ukarch_ulctx`, meant to help the kernel keep track of certain special registers an application may change and which we may want to know of in case we have to swap them with ours when entering a system call. E.g. the TLS pointer in bincompat, or the `x86_64` `gs_base` register, which an application may change through the `arch_prctl` system call.
- `ectx`, meant to represent the architecture specific slot where we may want to save/restore to/from when entering/exiting a system call

Introduce `UK_LLSYSCALL_R_U_DEFINE`, a system call registration macro alternative that offers a system call access to the caller/parent's full `struct uk_syscall_regs` context. Two system calls shall be registered: `clone` and `arch_prctl`. Additionally, `uk_syscall6_r` and `ukplat_syscall_handler` shall now make use of this generic structure.

Thus, `clone` is now able to make the child inherit the full context of the parent if desired and satisfy picky runtimes.

Depends on #1173 and #1174