sort: deduplicate file descriptors in merge mode #11961

Open
nonontb wants to merge 7 commits into uutils:main from nonontb:feature/sort-input-fd-optimization

@nonontb nonontb commented Apr 23, 2026

What This Does

This PR makes sort -m (merge mode) keep fewer files open, possibly the minimum needed.

The Problem

Before:

If you ran sort -m file.txt file.txt file.txt, the program eagerly opened file.txt three times, once for each time it appeared on the command line.
With lots of duplicates or a tight system limit on open files, this could fail.

If you tried to merge a file that was also your output file, the program had to create a temporary copy behind the scenes, using one more file.
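
As an illustration of the output-aliasing case, here is a minimal sketch of how a program might detect that an input path refers to the same file as the output path. The helper name `is_same_file` is hypothetical and is not taken from this PR; it compares canonicalized paths using only the standard library, so it resolves `.`/`..` components and symlinks but would not detect hard links.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns true when `input` and `output` resolve to the same path on disk.
/// Hypothetical helper for illustration; not the PR's implementation.
fn is_same_file(input: &Path, output: &Path) -> io::Result<bool> {
    // An output file that does not exist yet cannot alias an existing input.
    if !output.exists() {
        return Ok(false);
    }
    // canonicalize resolves symlinks and relative components (hard links are not detected).
    Ok(fs::canonicalize(input)? == fs::canonicalize(output)?)
}
```

A caller could run this check for each input against the `-o` target and treat a match as one more duplicate of an already known file instead of creating a temporary copy.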

The GNU version has no issue running the test in #5714.

The Fix

Now the program opens each unique file only once, and lazily. It uses mmap (via the memmap2 crate, which requires unsafe) so that a single FD serves all duplicates of an input file, including the output file when it is reused as an input.
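
The "open each unique file once, and lazily" idea can be sketched with the standard library alone. This is not the PR's code: `InputCache`, its `opens` counter, and the use of `Rc<File>` are assumptions made for illustration, and the sketch shares a plain file handle where the PR uses a memory map.

```rust
use std::collections::HashMap;
use std::fs::{self, File};
use std::io;
use std::path::{Path, PathBuf};
use std::rc::Rc;

/// Hypothetical cache that hands out one shared handle per unique file.
#[derive(Default)]
struct InputCache {
    handles: HashMap<PathBuf, Rc<File>>,
    /// Number of real open() calls performed so far.
    opens: usize,
}

impl InputCache {
    /// Opens `path` on first use only; later requests reuse the same handle.
    fn get(&mut self, path: &Path) -> io::Result<Rc<File>> {
        // Key on the canonical path so "a.txt" and "./a.txt" share one entry.
        let key = fs::canonicalize(path)?;
        if let Some(handle) = self.handles.get(&key) {
            return Ok(Rc::clone(handle));
        }
        let handle = Rc::new(File::open(&key)?);
        self.opens += 1;
        self.handles.insert(key, Rc::clone(&handle));
        Ok(handle)
    }
}
```

With this shape, requesting the same path three times performs one open. A shared `File` also shares its read cursor, so each merge stream would need its own offset (for example through positional reads); a memory map avoids that problem because each reader can slice the same mapped bytes independently.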

Result

Fixes #5714

New tests are added for merging duplicate files.

github-actions Bot commented Apr 23, 2026

GNU testsuite comparison:

Skipping an intermittent issue tests/cut/bounded-memory (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tail/symlink (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/cp/sparse-to-pipe is no longer failing!
Congrats! The gnu test tests/tail/pipe-f2 is no longer failing!
Congrats! The gnu test tests/basenc/bounded-memory is now passing!
Congrats! The gnu test tests/rm/many-dir-entries-vs-OOM is now passing!
Note: The gnu test tests/misc/write-errors was skipped on 'main' but is now failing.
Skip an intermittent issue tests/pr/bounded-memory (was skipped on 'main', now failing)


codspeed-hq Bot commented Apr 23, 2026

Merging this PR will not alter performance

✅ 309 untouched benchmarks
⏩ 46 skipped benchmarks [1]


Comparing nonontb:feature/sort-input-fd-optimization (44259aa) with main (33b9156)

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@@ -0,0 +1,3 @@
1
Contributor

please generate the files on the fly

Author

Done
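
For readers unfamiliar with the request above, generating a fixture "on the fly" means writing the input file from inside the test instead of committing it to the repository. A minimal sketch using only the standard library follows; the helper name `write_sorted_fixture` and the file contents are hypothetical and are not the code that was pushed to this PR.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Writes the numbers 1..=lines, one per line, to `dir/name` and returns the path.
/// Hypothetical test helper for illustration only.
fn write_sorted_fixture(dir: &Path, name: &str, lines: u32) -> io::Result<PathBuf> {
    let path = dir.join(name);
    // Build already sorted content so the file is a valid input for a merge.
    let body: String = (1..=lines).map(|n| format!("{n}\n")).collect();
    fs::write(&path, body)?;
    Ok(path)
}
```

A test can call this in its temporary directory and then pass the returned path to the command several times to exercise the duplicate-input case.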

@nonontb nonontb force-pushed the feature/sort-input-fd-optimization branch from d8afcf6 to c233373 Compare April 24, 2026 13:50
@nonontb nonontb force-pushed the feature/sort-input-fd-optimization branch from c233373 to 44259aa Compare April 24, 2026 15:41

Development

Successfully merging this pull request may close these issues.

sort opens too many files

2 participants