-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fs/vfs: Separate file descriptors from file descriptions #16361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Marking as draft as this change has exposed a leak in inode refcount handling. It causes crashes when running test_nuttx_fs_dup201(). I'll find the root cause and fix this separately. |
Added a new patch into this PR that fixes the inode refcnt leak for pseudofiles. |
I will do more testing tomorrow, but marking this as ready for review. |
Reminder to self: remove the CONFIG_FS_REFCOUNT Kconfig option, it's blocking CI it seems. |
@@ -466,6 +466,15 @@ struct file | |||
off_t f_pos; /* File position */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we drop CONFIG_FS_REFCOUNT as you mention in pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I forgot to remove the option, I'll remove it tomorrow.
One option of course if we want to keep CONFIG_FS_REFCOUNT we must prevent file_close altogether, because it will never be safe to call file_close.
The simplest case is stdout/stderr/stdin, now the file is shared by ALL children of init, and no duplicate file structs are created. So when a process exits, it is NOT SAFE to free stdout/stderr/stdin for that process if there is no reference counter protecting them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anchao what is your opinion on this ? Should I leave the option and prevent file closing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refcount is required to implement the POSIX semantic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anchao what is your opinion on this ? Should I leave the option and prevent file closing ?
I still suggest keeping the configuration of CONFIG_FS_REFCOUNT. This can reserve more space and performance on devices with limited resources. Currently, in the projects I'm maintaining, the support for CONFIG_FS_REFCOUNT has been turned off because there are basically no cases of process termination or file descriptor closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some basic tests with CONFIG_FS_REFCOUNT=n and at least there are no crashes, but as files cannot be freed, the memory consumption increases with every file open()'d or dup()'d.
Closing the file explicitly with close() / file_close() obviously works but is dangerous. If this is acceptable I'll leave the CONFIG_FS_REFCOUNT option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what the difference is with the current version. Why file could not be released if CONFIG_FS_REFCOUNT=n? Shouldn't the user control the fds life cycle? Is there some paths without close?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation had a separate file struct for each file descriptor number. Even for the duplicated stdin/stdout/stderr every new task group created new file structs.
The new implementation does not create new file structs when a fd is duplicated, a duplicated fd simply REFERS to a filep. So with stdio, the new task group REFERS to the parent's stdio files. The reference counter protects the file struct so it does not get freed before it is not needed any more.
This means when either the parent or child exits, the stdio files cannot be directly closed, only the reference count can be decremented. If the ref counter is disabled, the shared file struct cannot be closed safely, ever.
Same applies to descriptors created by dup/dup2/dup3. The file struct to which oldfd and newfd points to can never be safely freed / closed without the reference counter, as with the new implementation they point to the same file struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your detailed explanation. I wonder if there could be a better implementation to balance the current requirement. I need to spend some time studying this PR, and I will get back to you over the weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memory leak will always happen without refcount. so, it isn't optional with the new implementation
dc209b3
to
3618f59
Compare
This comment was marked as resolved.
This comment was marked as resolved.
I've carried out some fdsan / fdcheck basic tests, and it seems to be okay. |
@pussuw fdcheck have some bug, i have fixed in internal of team, you can try these PR: |
15a5ca7
to
2f44a44
Compare
@pussuw please rebase this PR, i will try them on my local enviroment. |
The current branch should be conflict free, I don't get conflicts locally nor does gh detect any |
Is this ready for merge? I have seen crashes probably caused by file descriptor race conditions. |
@Donny9 is testing and refining the patch, the verified patchset will be sent out tomorrow. |
I am experiencing kernel crashes related to sockets, sendmsg->psock_sendmsg . This PR solves the issue; is this now good to be merged? |
please wait a moment, because we found that mountpt_operations::dup doesn't need in the new approach and dup function need more change. @Donny9 is refining the patchset and do the internal testing. |
Does this have something to do with file_dup and file_dup2 (or file pointer duplication in general)? They both seem to fit very poorly into the new approach but did not know whether they can be removed or not. I also suspected that the ops::dup() is redundant but did not know for sure if some kernel component depends on it. Good that you are making a cleanup round for them, but I'd like to remind that even in its current state these patches should not break existing functionality and they fix a massive kernel side crash (for us at least). |
Could you elaborate a bit why can't this be integrated and further optimizations / cleanups done on top of this? Does this patchset break something? For me it is fixing a serious issue, so I'd rather see this going in first, and other changes later. Why are we waiting for some cleanup pathces before integrating a PR which fixes a kernel crash? Moreover, this is a large patchset, and it is tested as it is now. Waiting increases the risk of things breaking if there are new patches pushed in, and this needs to be re-based several times. Please, move forward with this already. |
So that the same code can be used with and without spinlocks. Signed-off-by: Ville Juven <ville.juven@unikie.com>
When a new pseudofile is created, the inode reference count needs to be bumped to protect the node. Signed-off-by: Ville Juven <ville.juven@unikie.com>
Currently the code is dumped into one massive file; fs_files. Move the different logical parts into their own files. Signed-off-by: Ville Juven <ville.juven@unikie.com>
This patch is a rework of the NuttX file descriptor implementation. The goal is two-fold: 1. Improve POSIX compliance. The old implementation tied file description to inode only, not the file struct. POSIX however dictates otherwise. 2. Fix a bug with descriptor duplication (dup2() and dup3()). There is an existing race condition with this POSIX API that currently results in a kernel side crash. The crash occurs when a partially open / closed file descriptor is duplicated. The reason for the crash is that even if the descriptor is closed, the file might still be in use by the kernel (due to e.g. ongoing write to file). The open file data is changed by file_dup3() and this causes a crash in the device / drivers themselves as they lose access to the inode and private data. The fix is done by separating struct file into file and file descriptor structs. The file struct can live on even if the descriptor is closed, fixing the crash. This also fixes the POSIX issue, as two descriptors can now point to the same file. Signed-off-by: Ville Juven <ville.juven@unikie.com>
@pussuw If it's not convenient to grant the permissions, I can submit a new Pull Request (PR) for review instead. |
Hi @Donny9 it is not possible to add permissions to tiiuae/nuttx, you can push the patches to another branch and make a new PR. |
you don't give the full write permission of your nuttx git to @Donny9, but the branch which already upstream to nuttx like this: |
Not sure I have rights to do even that. Let's just move to the new PR. |
Summary
This PR is a rework of the NuttX file descriptor implementation. The
goal is two-fold:
to inode only, not the file struct. POSIX however dictates otherwise.
an existing race condition with this POSIX API that currently results
in a kernel side crash.
The crash occurs when a partially open / closed file descriptor is
duplicated. The reason for the crash is that even if the descriptor is
closed, the file might still be in use by the kernel (due to e.g. ongoing
write to file). The open file data is changed by file_dup3() and this
causes a crash in the device / drivers themselves as they lose access to
the inode and private data.
The fix is done by separating struct file into file and file descriptor
structs. The file struct can live on even if the descriptor is closed,
fixing the crash. This also fixes the POSIX issue, as two descriptors
can now point to the same file.
Impact
The impact is big, this is basically a re-write of the file descriptor subsystem, so the possibility of regression is big.
Using Kconfig option CONFIG_FS_REFCOUNT=n is now potentially very dangerous, as files cannot be closed safely any
more. This option is reserved for targets that only open() files and start processes. If a process with open files exits, it will NOT release its open files as it is not safe to do so.
User: The impact to user is opaque
Documentation: Documentation impact is not clear, depends on the level of documentation for the old vfs implementation
Security: none
Build system: none
Testing
Targets:
MPFS target with KERNEL_BUILD=y, SMP=y and 100+ threads
rv-virt:nsh64
rv-virt:knsh64
rv-virt:smp64
rv-virt:ksmp64
Tests:
ostest
test_nuttx_fs_dup01();
test_nuttx_fs_dup201();
test_nuttx_fs_fcntl01();
test_nuttx_fs_fcntl02();
test_nuttx_fs_fcntl03();
The motivation for this change can be found here:
#16326
There are also open questions about some calls to e.g. file_dup2, it was not clear to me how to handle these.