-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/ukschedcoop: Add early check for runnable thread in idle thread #941
lib/ukschedcoop: Add early check for runnable thread in idle thread #941
Conversation
17b567d
to
3f28479
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good on my side 🚀
I won't add the review tag, since I'm added as a co-author.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The patch looks fine to me.
A side note: sti
has a delayed effect that won't kick in until the next instruction, meaning IRQs cannot fire in the middle of a sti; hlt
sequence with the exception of NMIs on some CPUs (which Linux handles in this fashion). There might be an alternative way to solve the issue without having to set the clock every idle tick.
Reviewed-by: Tu Dinh Ngoc dinhngoc.tu@irit.fr
Hey, thanks for the feedback! Interesting, I did not know that, definetely valuable information and I should have checked before for how Linux handles this, indeed. One thing though, is that with this change, the same bug is also fixed on ARM. In any case, the commit message must definetely be updated to at least change the What is your opinion on this, @andreittr ? It would be great to avoid having to re-configure the timer at every idle tick. [1] https://github.com/qemu/qemu/blob/master/target/i386/cpu.c#L7424 |
Yes, the correct way to solve this is to use the "interrupt shadow" mechanism as @dinhngtu noted. The x86 |
Ok, but, as I said, I would like the fix to apply to ARM as well. Besides, can IRQ's not fire before the idle thread is even switched to? |
True.
I mean, there has to be some way on ARM to properly halt and wait for interrupts. Otherwise, timers would be basically unusable. The current approach of deciding to wait for interrupts (in the scheduler), enable the interrupts, do some stuff, and then doing the actual "wait-for-interrupts" is just very wrong. |
Agreed. I'd have to check how others do it on ARM. However, at this point in time, as you both guys also pointed out, the problem ends up not being in the halting instruction sequence. But rather before it, because IRQ's can end up being fired between before switching and our if statement. Maybe something like placing the scheduler call a bit earlier would be an idea, but even then, IRQ's can fire after we reach it, meaning that we would also have to have IRQ's disabled at that point. I propose something similar to this diff --git a/lib/ukschedcoop/schedcoop.c b/lib/ukschedcoop/schedcoop.c
index 4655c4bc8..f18c2b479 100644
--- a/lib/ukschedcoop/schedcoop.c
+++ b/lib/ukschedcoop/schedcoop.c
@@ -188,9 +188,16 @@ static __noreturn void idle_thread_fn(void *argp)
{
struct schedcoop *c = (struct schedcoop *) argp;
__nsec now, wake_up_time;
+ unsigned long flags;
UK_ASSERT(c);
+ flags = ukplat_lcpu_save_irqf();
+
+ /* Quick early check to see if, in the meantime, we got a runnable thread */
+ if (UK_TAILQ_FIRST(&c->run_queue)) {
+ ukplat_lcpu_restore_irqf(flags);
+ schedcoop_schedule(&c->sched);
+ }
+
for (;;) {
/*
* FIXME: We assume that `uk_sched_thread_gc()` is non-blocking
@@ -215,12 +222,12 @@ static __noreturn void idle_thread_fn(void *argp)
if (!wake_up_time || wake_up_time > now) {
if (wake_up_time) {
- ukplat_lcpu_halt_to(wake_up_time);
+ time_block_until(wake_up_time);
} else {
ukplat_lcpu_halt_irq();
- ukplat_lcpu_enable_irq();
}
+ ukplat_lcpu_restore_irqf(flags);
/* handle pending events if any */
ukplat_lcpu_irqs_handle_pending();
} This works on my side. I hope it does not just work because I am missing something though. The flags might not be needed AFAICT though, maybe just enabling/disabling would suffice. I am waiting for your guys' opinions. |
@mogasergiu The "is-work-available" check has to be within the loop. Otherwise, it will only work the first time. So the loop should be a Also I would rather modify Using save/restore is fine and IMHO usually less bugprone. |
Oops, yes, you are correct. So I was, indeed, missing something 😆. Thank you!
Ah, yes, I forgot to disable IRQ's after exiting the scheduler. For whatever reason, I thought it was a
Indeed, that would be a good idea. Would help avoid future mistakes, or at least in being more cautious.
Thanks a lot for the valuable input
|
@mschlumpp Actually, one thing that I have just noticed though, is that this |
We can also just add a save/restore pair to Is there any benefit of exposing the raw platform |
Wouldn't
Every platform seems to have its own implementation for it. The main reason I want to do this is because libraries cannot include headers from |
It was already used for the common code. In that case we still have at least one sanity check there (the irq-disabled assertion).
That would increase the risk of using "unchecked" version accidentally. I also do not see a reason why libraries would want to opt-in the potential risk of running into the bug that this PR is about. The documentation of
If you use this function you must disable IRQs or it will not do what you want. |
So you want
They have different signatures, I am not sure how likely that is to happen. The
Ok, this I agree with, the use of this function would be weird without IRQ's disabled, sure. Right then, the second commit shall be just making |
My point is that the function would still wait for interrupts, but in a very unreliable way if you don't disable interrupts. Otherwise, you would have to explicitly check that the wake-interrupt was the correct timer interrupt. Or at least mention that there are "spurious" wakeups in the documentation.
Yes.
Okay, it is possible I misunderstood you. I though you wanted to rename
Sounds good, or
Huh? You mean at the call-site? |
Good idea!
Yes.
Did I misunderstand your idea? This is what I thought we agreed upon. diff --git a/plat/common/lcpu.c b/plat/common/lcpu.c
index 2af4b5621..e02b3983c 100644
--- a/plat/common/lcpu.c
+++ b/plat/common/lcpu.c
@@ -189,11 +189,9 @@ void __noreturn ukplat_lcpu_halt(void)
- void ukplat_lcpu_halt_to(__nsec until)
+ void ukplat_lcpu_halt_until(__nsec until)
{
- unsigned long flags;
+ UK_ASSERT(ukplat_lcpu_irqs_disabled());
- flags = ukplat_lcpu_save_irqf();
time_block_until(until);
- ukplat_lcpu_restore_irqf(flags);
}
Is this not what you meant? diff --git a/lib/posix-time/time.c b/lib/posix-time/time.c
index b6fc71636..97ec47131 100644
--- a/lib/posix-time/time.c
+++ b/lib/posix-time/time.c
@@ -56,9 +56,12 @@
static void __spin_wait(__nsec nsec)
{
__nsec until = ukplat_monotonic_clock() + nsec;
+ unsigned long flags;
+ flags = ukplat_lcpu_save_irqf();
while (until > ukplat_monotonic_clock())
- ukplat_lcpu_halt_to(until);
+ ukplat_lcpu_halt_until(until);
+ ukplat_lcpu_restore_irqf(flags);
}
#endif |
3f28479
to
49d0ed1
Compare
Timer IRQ
can kick us out of halt
states
✅ Checkpatch passed Beep boop! I ran Unikraft's
|
@mschlumpp, I added you as the assignee for this PR. Approve it if / when it's all good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: putting the review on hold for now
49d0ed1
to
defcd91
Compare
Make sure that `IRQ`'s are disabled when calling `ukplat_lcpu_halt_to`. Furthermore, do not surround the call to `time_block_until` with `IRQ` `save`/`restore` and inform the function user through a `NOTE` to take care of that himself. Therefore, with this commit, change every other reference to this function accordingly. Furthermore, change the name of the function to reflect this. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Suggested-by: Marco Schlumpp <marco@unikraft.io>
Since we enable IRQ's before context switching, there is a chance that IRQ's may fire in the short time frame between the enabling of IRQ's in the scheduler before context switching and disabling of IRQ's before entering the idle thread's halting state. To fix this, make sure we disable IRQ's at the beginning of the idle thread's halting loop and do a quick early check for runnable threads alongside the one for exited threads. Co-authored-by: Stefan Jumarea <stefanjumarea02@gmail.com> Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Signed-off-by: Stefan Jumarea <stefanjumarea02@gmail.com>
defcd91
to
0365bc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, now everything seems to work fine. Thanks again!
Approved-by: Marco Schlumpp marco@unikraft.io
Since we enable IRQ's before context switching, there is a chance that IRQ's may fire in the short time frame between the enabling of IRQ's in the scheduler before context switching and disabling of IRQ's before entering the idle thread's halting state. To fix this, make sure we disable IRQ's at the beginning of the idle thread's halting loop and do a quick early check for runnable threads alongside the one for exited threads. Co-authored-by: Stefan Jumarea <stefanjumarea02@gmail.com> Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Signed-off-by: Stefan Jumarea <stefanjumarea02@gmail.com> Reviewed-by: Tu Dinh Ngoc <dinhngoc.tu@irit.fr> Approved-by: Marco Schlumpp <marco@unikraft.io> Tested-by: Unikraft CI <monkey@unikraft.io> GitHub-Closes: #941
Make sure that `IRQ`'s are disabled when calling `ukplat_lcpu_halt_to`. Furthermore, do not surround the call to `time_block_until` with `IRQ` `save`/`restore` and inform the function user through a `NOTE` to take care of that himself. Therefore, with this commit, change every other reference to this function accordingly. Furthermore, change the name of the function to reflect this. Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Suggested-by: Marco Schlumpp <marco@unikraft.io> Reviewed-by: Tu Dinh Ngoc <dinhngoc.tu@irit.fr> Approved-by: Marco Schlumpp <marco@unikraft.io> Tested-by: Unikraft CI <monkey@unikraft.io> GitHub-Closes: unikraft#941
Since we enable IRQ's before context switching, there is a chance that IRQ's may fire in the short time frame between the enabling of IRQ's in the scheduler before context switching and disabling of IRQ's before entering the idle thread's halting state. To fix this, make sure we disable IRQ's at the beginning of the idle thread's halting loop and do a quick early check for runnable threads alongside the one for exited threads. Co-authored-by: Stefan Jumarea <stefanjumarea02@gmail.com> Signed-off-by: Sergiu Moga <sergiu@unikraft.io> Signed-off-by: Stefan Jumarea <stefanjumarea02@gmail.com> Reviewed-by: Tu Dinh Ngoc <dinhngoc.tu@irit.fr> Approved-by: Marco Schlumpp <marco@unikraft.io> Tested-by: Unikraft CI <monkey@unikraft.io> GitHub-Closes: unikraft#941
lib/ukschedcoop: Add early check for runnable thread in idle thread
Since we enable IRQ's before context switching, there is a chance that
IRQ's may fire in the short time frame between the enabling of IRQ's
in the scheduler before context switching and disabling of IRQ's before
entering the idle thread's halting state.
To fix this, make sure we disable IRQ's at the beginning of the idle
thread's halting loop and do a quick early check for runnable threads
alongside the one for exited threads.
Co-authored-by: Stefan Jumarea stefanjumarea02@gmail.com
Signed-off-by: Sergiu Moga sergiu@unikraft.io
Signed-off-by: Stefan Jumarea stefanjumarea02@gmail.com
Prerequisite checklist
checkpatch.uk
on your commit series before opening this PR;Base target
x86_64
or N/A]kvm
,xen
or N/A]app-python3
or N/A]Additional configuration
Description of changes