Skip to content

fs/vfs: Separate file descriptors from file descriptions #16361

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

pussuw
Copy link
Contributor

@pussuw pussuw commented May 12, 2025

Summary

This PR is a rework of the NuttX file descriptor implementation. The
goal is two-fold:

  1. Improve POSIX compliance. The old implementation tied file description
    to inode only, not the file struct. POSIX however dictates otherwise.
  2. Fix a bug with descriptor duplication (dup2() and dup3()). There is
    an existing race condition with this POSIX API that currently results
    in a kernel side crash.

The crash occurs when a partially open / closed file descriptor is
duplicated. The reason for the crash is that even if the descriptor is
closed, the file might still be in use by the kernel (due to e.g. ongoing
write to file). The open file data is changed by file_dup3() and this
causes a crash in the device / drivers themselves as they lose access to
the inode and private data.

The fix is done by separating struct file into file and file descriptor
structs. The file struct can live on even if the descriptor is closed,
fixing the crash. This also fixes the POSIX issue, as two descriptors
can now point to the same file.

Impact

The impact is big, this is basically a re-write of the file descriptor subsystem, so the possibility of regression is big.
Using Kconfig option CONFIG_FS_REFCOUNT=n is now potentially very dangerous, as files cannot be closed safely any
more. This option is reserved for targets that only open() files and start processes. If a process with open files exits, it will NOT release its open files as it is not safe to do so.

User: The impact to user is opaque
Documentation: Documentation impact is not clear, depends on the level of documentation for the old vfs implementation
Security: none
Build system: none

Testing

Targets:
MPFS target with KERNEL_BUILD=y, SMP=y and 100+ threads
rv-virt:nsh64
rv-virt:knsh64
rv-virt:smp64
rv-virt:ksmp64

Tests:
ostest
test_nuttx_fs_dup01();
test_nuttx_fs_dup201();
test_nuttx_fs_fcntl01();
test_nuttx_fs_fcntl02();
test_nuttx_fs_fcntl03();

The motivation for this change can be found here:
#16326

There are also open questions about some calls to e.g. file_dup2, it was not clear to me how to handle these.

@pussuw pussuw marked this pull request as draft May 12, 2025 13:12
@github-actions github-actions bot added Area: Tooling Area: Networking Effects networking subsystem Area: File System File System issues Area: OS Components OS Components issues Size: XL The size of the change in this PR is very large. Consider breaking down the PR into smaller pieces. labels May 12, 2025
@pussuw
Copy link
Contributor Author

pussuw commented May 12, 2025

Marking as draft as this change has exposed a leak in inode refcount handling. It causes crashes when running test_nuttx_fs_dup201().

I'll find the root cause and fix this separately.

@pussuw pussuw changed the title fs_vfs: Separate file descriptors from file descriptions fs/vfs: Separate file descriptors from file descriptions May 12, 2025
@xiaoxiang781216 xiaoxiang781216 requested a review from Donny9 May 12, 2025 13:28
@pussuw pussuw force-pushed the fs_fdlist_rewrite branch from 2fe1c71 to 6bd86ce Compare May 12, 2025 15:03
@pussuw
Copy link
Contributor Author

pussuw commented May 12, 2025

Added a new patch into this PR that fixes the inode refcnt leak for pseudofiles.

@pussuw
Copy link
Contributor Author

pussuw commented May 12, 2025

I will do more testing tomorrow, but marking this as ready for review.

@pussuw pussuw marked this pull request as ready for review May 12, 2025 15:04
@pussuw pussuw force-pushed the fs_fdlist_rewrite branch from 6bd86ce to 9bae58d Compare May 12, 2025 15:09
@pussuw
Copy link
Contributor Author

pussuw commented May 12, 2025

Reminder to self: remove the CONFIG_FS_REFCOUNT Kconfig option, it's blocking CI it seems.

@@ -466,6 +466,15 @@ struct file
off_t f_pos; /* File position */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we drop CONFIG_FS_REFCOUNT as you mention in pr

Copy link
Contributor Author

@pussuw pussuw May 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I forgot to remove the option, I'll remove it tomorrow.

One option of course if we want to keep CONFIG_FS_REFCOUNT we must prevent file_close altogether, because it will never be safe to call file_close.

The simplest case is stdout/stderr/stdin, now the file is shared by ALL children of init, and no duplicate file structs are created. So when a process exits, it is NOT SAFE to free stdout/stderr/stdin for that process if there is no reference counter protecting them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anchao what is your opinion on this ? Should I leave the option and prevent file closing ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refcount is required to implement the POSIX semantic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@anchao what is your opinion on this ? Should I leave the option and prevent file closing ?

I still suggest keeping the configuration of CONFIG_FS_REFCOUNT. This can reserve more space and performance on devices with limited resources. Currently, in the projects I'm maintaining, the support for CONFIG_FS_REFCOUNT has been turned off because there are basically no cases of process termination or file descriptor closure.

Copy link
Contributor Author

@pussuw pussuw May 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some basic tests with CONFIG_FS_REFCOUNT=n and at least there are no crashes, but as files cannot be freed, the memory consumption increases with every file open()'d or dup()'d.

Closing the file explicitly with close() / file_close() obviously works but is dangerous. If this is acceptable I'll leave the CONFIG_FS_REFCOUNT option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what the difference is with the current version. Why file could not be released if CONFIG_FS_REFCOUNT=n? Shouldn't the user control the fds life cycle? Is there some paths without close?

Copy link
Contributor Author

@pussuw pussuw May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous implementation had a separate file struct for each file descriptor number. Even for the duplicated stdin/stdout/stderr every new task group created new file structs.

The new implementation does not create new file structs when a fd is duplicated, a duplicated fd simply REFERS to a filep. So with stdio, the new task group REFERS to the parent's stdio files. The reference counter protects the file struct so it does not get freed before it is not needed any more.

This means when either the parent or child exits, the stdio files cannot be directly closed, only the reference count can be decremented. If the ref counter is disabled, the shared file struct cannot be closed safely, ever.

Same applies to descriptors created by dup/dup2/dup3. The file struct to which oldfd and newfd points to can never be safely freed / closed without the reference counter, as with the new implementation they point to the same file struct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your detailed explanation. I wonder if there could be a better implementation to balance the current requirement. I need to spend some time studying this PR, and I will get back to you over the weekend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory leak will always happen without refcount. so, it isn't optional with the new implementation

@pussuw pussuw force-pushed the fs_fdlist_rewrite branch from 9bae58d to 6d420f0 Compare May 13, 2025 06:51
@github-actions github-actions bot removed the Area: Networking Effects networking subsystem label May 13, 2025
@pussuw pussuw force-pushed the fs_fdlist_rewrite branch 3 times, most recently from dc209b3 to 3618f59 Compare May 13, 2025 07:33
jerpelea
jerpelea previously approved these changes May 13, 2025
@pussuw

This comment was marked as resolved.

@hujun260
Copy link
Contributor

nsh> fdinfo

pid:0
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:1
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:2
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:3
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:4
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040036b76 0x00000000400358e7 0x00000000400054d2 0x0000000040006dc8 0x0000000040003633 0x000000004000378
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040036b76 0x00000000400358e7 0x00000000400054d2 0x0000000040006dc8 0x0000000040003633 0x000000004000378
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040036b76 0x00000000400358e7 0x00000000400054d2 0x0000000040006dc8 0x0000000040003633 0x000000004000378
 3   3073    0    5         /proc//        0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038478 0x000000004005ad3d 0x00000000400291aa 0x0000000040028b11 0x0000000040022ec
 4   1025    3    0                        0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038478 0x0000000040028d61 0x0000000040026394 0x000000004002925e 0x0000000040028b1

Backtrace should now be working. At least it's doing something.

fdsan / fdcheck might be broken. If someone could help me test them it would be appreciated. @hujun260 looks like you authored fdcheck, could you take a look please.

I've carried out some fdsan / fdcheck basic tests, and it seems to be okay.

@Donny9
Copy link
Contributor

Donny9 commented May 14, 2025

nsh> fdinfo

pid:0
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:1
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:2
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:3
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038390 0x00000000400053a2 0x0000000040002ce6 0x0000000040002613 0x000078a2faa2a1c
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053ba 0x0000000040002ce6 0x000000004000261
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040035240 0x0000000040035abf 0x0000000040035aea 0x00000000400053c9 0x0000000040002ce6 0x000000004000261
 
pid:4
FD  OFLAGS  TYPE POS       PATH           BACKTRACE
0   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040036b76 0x00000000400358e7 0x00000000400054d2 0x0000000040006dc8 0x0000000040003633 0x000000004000378
 1   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040036b76 0x00000000400358e7 0x00000000400054d2 0x0000000040006dc8 0x0000000040003633 0x000000004000378
 2   3       1    0         /dev/console   0x0000000040053878 0x0000000040036b20 0x0000000040036b76 0x00000000400358e7 0x00000000400054d2 0x0000000040006dc8 0x0000000040003633 0x000000004000378
 3   3073    0    5         /proc//        0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038478 0x000000004005ad3d 0x00000000400291aa 0x0000000040028b11 0x0000000040022ec
 4   1025    3    0                        0x0000000040053878 0x00000000400356e4 0x0000000040038072 0x0000000040038478 0x0000000040028d61 0x0000000040026394 0x000000004002925e 0x0000000040028b1

Backtrace should now be working. At least it's doing something.

fdsan / fdcheck might be broken. If someone could help me test them it would be appreciated. @hujun260 looks like you authored fdcheck, could you take a look please.

@pussuw fdcheck have some bug, i have fixed in internal of team, you can try these PR:
#16374
#16375
apache/nuttx-apps#3075

@Donny9
Copy link
Contributor

Donny9 commented May 23, 2025

@pussuw please rebase this PR, i will try them on my local enviroment.
image

@pussuw
Copy link
Contributor Author

pussuw commented May 23, 2025

@pussuw please rebase this PR, i will try them on my local enviroment. image

The current branch should be conflict free, I don't get conflicts locally nor does gh detect any

@jpaali
Copy link
Contributor

jpaali commented Jun 2, 2025

Is this ready for merge? I have seen crashes probably caused by file descriptor race conditions.

@xiaoxiang781216
Copy link
Contributor

Is this ready for merge? I have seen crashes probably caused by file descriptor race conditions.

@Donny9 is testing and refining the patch, the verified patchset will be sent out tomorrow.

@jlaitine
Copy link
Contributor

jlaitine commented Jun 4, 2025

I am experiencing kernel crashes related to sockets, sendmsg->psock_sendmsg .

This PR solves the issue; is this now good to be merged?

@Donny9 , @xiaoxiang781216 ?

@xiaoxiang781216
Copy link
Contributor

I am experiencing kernel crashes related to sockets, sendmsg->psock_sendmsg .

This PR solves the issue; is this now good to be merged?

@Donny9 , @xiaoxiang781216 ?

please wait a moment, because we found that mountpt_operations::dup doesn't need in the new approach and dup function need more change. @Donny9 is refining the patchset and do the internal testing.

@pussuw
Copy link
Contributor Author

pussuw commented Jun 5, 2025

I am experiencing kernel crashes related to sockets, sendmsg->psock_sendmsg .
This PR solves the issue; is this now good to be merged?
@Donny9 , @xiaoxiang781216 ?

please wait a moment, because we found that mountpt_operations::dup doesn't need in the new approach and dup function need more change. @Donny9 is refining the patchset and do the internal testing.

Does this have something to do with file_dup and file_dup2 (or file pointer duplication in general)? They both seem to fit very poorly into the new approach but did not know whether they can be removed or not. I also suspected that the ops::dup() is redundant but did not know for sure if some kernel component depends on it.

Good that you are making a cleanup round for them, but I'd like to remind that even in its current state these patches should not break existing functionality and they fix a massive kernel side crash (for us at least).

@jlaitine
Copy link
Contributor

jlaitine commented Jun 6, 2025

@xiaoxiang781216 ,

Could you elaborate a bit why can't this be integrated and further optimizations / cleanups done on top of this? Does this patchset break something? For me it is fixing a serious issue, so I'd rather see this going in first, and other changes later.

Why are we waiting for some cleanup pathces before integrating a PR which fixes a kernel crash?

Moreover, this is a large patchset, and it is tested as it is now. Waiting increases the risk of things breaking if there are new patches pushed in, and this needs to be re-based several times.

Please, move forward with this already.

pussuw added 4 commits June 6, 2025 11:28
So that the same code can be used with and without spinlocks.

Signed-off-by: Ville Juven <ville.juven@unikie.com>
When a new pseudofile is created, the inode reference count needs to
be bumped to protect the node.

Signed-off-by: Ville Juven <ville.juven@unikie.com>
Currently the code is dumped into one massive file; fs_files. Move the
different logical parts into their own files.

Signed-off-by: Ville Juven <ville.juven@unikie.com>
This patch is a rework of the NuttX file descriptor implementation. The
goal is two-fold:

1. Improve POSIX compliance. The old implementation tied file description
to inode only, not the file struct. POSIX however dictates otherwise.
2. Fix a bug with descriptor duplication (dup2() and dup3()). There is
an existing race condition with this POSIX API that currently results
in a kernel side crash.

The crash occurs when a partially open / closed file descriptor is
duplicated. The reason for the crash is that even if the descriptor is
closed, the file might still be in use by the kernel (due to e.g. ongoing
write to file). The open file data is changed by file_dup3() and this
causes a crash in the device / drivers themselves as they lose access to
the inode and private data.

The fix is done by separating struct file into file and file descriptor
structs. The file struct can live on even if the descriptor is closed,
fixing the crash. This also fixes the POSIX issue, as two descriptors
can now point to the same file.

Signed-off-by: Ville Juven <ville.juven@unikie.com>
@pussuw pussuw force-pushed the fs_fdlist_rewrite branch from 2f44a44 to 825ab63 Compare June 6, 2025 08:28
@Donny9
Copy link
Contributor

Donny9 commented Jun 9, 2025

@pussuw
Hello, could you please grant me write permissions to your GitHub repository? I've made updates to the code locally and need to sync them up, but I'm currently encountering permission issues.
image
image

If it's not convenient to grant the permissions, I can submit a new Pull Request (PR) for review instead.

@pussuw
Copy link
Contributor Author

pussuw commented Jun 9, 2025

Hi @Donny9 it is not possible to add permissions to tiiuae/nuttx, you can push the patches to another branch and make a new PR.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Jun 9, 2025

Hi @Donny9 it is not possible to add permissions to tiiuae/nuttx, you can push the patches to another branch and make a new PR.

you don't give the full write permission of your nuttx git to @Donny9, but the branch which already upstream to nuttx like this:
image
anyway, let's continue review on the new and close this pr.

@pussuw
Copy link
Contributor Author

pussuw commented Jun 9, 2025

Hi @Donny9 it is not possible to add permissions to tiiuae/nuttx, you can push the patches to another branch and make a new PR.

you don't give the full write permission of your nuttx git to @Donny9, but the branch which already upstream to nuttx like this: image anyway, let's continue review on the new and close this pr.

Not sure I have rights to do even that. Let's just move to the new PR.

@pussuw pussuw closed this Jun 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: File System File System issues Area: OS Components OS Components issues Area: Tooling Size: XL The size of the change in this PR is very large. Consider breaking down the PR into smaller pieces.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants