-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Resolving the issue of fdcheck reporting errors when an epoll file descriptor (fd) is closed #16431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tors Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
Why is fdcheck implemented in the kernel, the kernel should not have to care about such things. Shouldn't it be implemented at the user interface i.e. libc ? |
@pussuw The implementation of fdcheck originally resides in libc https://github.com/apache/nuttx/blob/889a26db1d229ee462bc8ea075c2330dc5f79b94/libs/libc/misc/lib_fdcheck.c But we need to perform the conversion from file descriptor (fd) to file structure pointer (filep) within the kernel, as this conversion cannot be done in user space. And, the parameter for system calls is an fd, nothing else, and the filep is also a kernel data structure. |
Yes the problem here is that the data is in filep. After #16361 you could do this magic in userspace. |
I completely agree with @pussuw here! There is nice cleanup available for the FDs in PR #16361, which fixes the operation to be acc. to the posix. Please don't inject debug functionality which should NOT be inside kernel in there. Integrate first the #16361, and then implement the debug functionality in where it belongs. That is, in the kernel-userspace interface, not directly in the kernel. |
@pussuw #16361 would definitely be the best if we could cover this type of scenario. Therefore, our priority is to ensure that the current design is sound. You can continue to rework based on #16361 . |
Thanks, I do really prefer fixing the file descriptor subsystem like @xiaoxiang781216 suggested in #16326 . It is better to do a comprehensive fix that as far as I can see, will solve most of your problems as well. |
@pussuw @jlaitine |
Yes, I agree with patches 1 and 4, this is why I did not comment on them, those could be merged with no disputes from my side |
@pussuw here is our plan:
Do you think this plan reasonable? |
Yes, all agree #16361 is the right direction to go.
The patchset doesn't change or add any new debugging functionality, but fix the nasty bug found by fdsan/fdcheck. |
Allow users to operate poll in the kernel using the file_poll approach, as file is protected with reference counting, making it more secure. Signed-off-by: dongjiuzhu1 <dongjiuzhu1@xiaomi.com>
@pussuw please review. For the previous two patches, we made enhancements or implementations based on your PR. |
Yes, thank you. I'm OoO until tomorrow so don't have time to read github much until then. |
Sorry, it will go to tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
Resolving the issue of fdcheck reporting errors when an epoll file descriptor (fd) is closed.
The primary cause of the error is as follows:
When a task exits by calling exit, it closes all file descriptors (fds) associated with the current task.
When the epoll fd is closed, if there are still fds in the epoll queue that have values greater than the epoll fd, these fds may be closed first. This can lead to fdcheck failures when call epoll_close
To resolve this, we should use a file-based approach instead of directly managing fds in epoll.
Impact
fix epoll fdcheck issue
Testing
open vela monkey