API k_delayed_work_submit_to_queue() makes a delayed_work unusable #5952
The test case seems a little weird. It looks like this line has a boolean inversion:
My guess is that this was intended to busy-wait until the handler executes (itself a fragile idea, since it will only work if the work_q has a higher priority than the main thread). But instead it's waiting until the work item is pending. And since it was just submitted with a non-zero timeout, it's guaranteed to be pending here, so this while() will only run the test once. What happens, then, is that your code falls straight through to a second submission of the same item, which, per the docs, succeeds as a no-op. But then, indeed, there's a bug: the handler never actually executes after that second submission, and in fact the third submission then fails.
As discovered in zephyrproject-rtos#5952 ...a duplicate call to k_delayed_work_submit_to_queue() on a work item whose timeout had expired but which had not yet executed (i.e. it was pending in the queue for the active work queue thread) would fail, because the cancellation step wouldn't clear the PENDING bit, causing the resubmission to see the object in an invalid state. Trivially fixed by adding a bit clear.

It also turns out that the behavior of the code doesn't match the docs, which state that a PENDING work item is not supposed to be cancelled at all. Fix the docs to remove that.

And on yet further review, it turns out that there's no way to make a test like the one in the linked bug threadsafe. The work queue does no synchronization by design, so if the user code does no external synchronization it might very well clobber the running handler. Added a sentence to the docs to reflect this gotcha.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Never mind, the inversion was mine, not yours: "pending" in this context means "the timeout is finished and we're in the main work queue", not "pending on timeout". I see what you're doing now, and indeed the code failed to resubmit a pending object, even though it was written to handle exactly that case. There's a fix for this specific case up at #6189.

Note that careful review shows that this particular usage can't ever be safe, though. The work queue itself does no synchronization, so it's not legal to call this function to "restart the timeout" without external synchronization in the general case. I cleaned up the docs to reflect that. But the fix does seem to make your particular case work. Be very careful.
Calling k_delayed_work_submit_to_queue() repeatedly breaks the delayed_work object.
test case:
run:
-22 is -EINVAL; let's check the docs:
Is this a bug, or am I doing something wrong?