sort: deduplicate file descriptors in merge mode #11961
nonontb wants to merge 7 commits into uutils:main
Conversation
New tests are added for merging duplicate files.
GNU testsuite comparison:
Merging this PR will not alter performance.
sylvestre reviewed on Apr 23, 2026
@@ -0,0 +1,3 @@
Contributor
please generate the files on the fly
Force-pushed d8afcf6 to c233373 (… alias Mmap as MemoryMap), then c233373 to 44259aa.
What This Does
This PR makes `sort -m` (merge mode) open as few files as possible by deduplicating repeated inputs.
The Problem
Before:
If you ran `sort -m file.txt file.txt file.txt`, the program eagerly opened `file.txt` three times, once for every occurrence on the command line.
With many duplicates, or a tight system limit on open files, this could fail.
If you tried to merge a file that was also your output file, the program had to create a temporary copy behind the scenes, consuming one more descriptor.
The GNU version has no issue running the test in #5714.
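To make the old behavior concrete, here is a minimal stdlib-only Rust sketch (Unix-only, file name and helper are hypothetical, not code from this PR): opening the same path once per command-line occurrence consumes one descriptor per occurrence.

```rust
use std::collections::HashSet;
use std::fs::File;
use std::io::Write;
use std::os::fd::AsRawFd;
use std::path::Path;

/// Open every path eagerly (the old behavior) and count how many
/// distinct file descriptors that consumes.
fn count_eager_fds<P: AsRef<Path>>(paths: &[P]) -> std::io::Result<usize> {
    let files: Vec<File> = paths.iter().map(File::open).collect::<Result<_, _>>()?;
    let fds: HashSet<i32> = files.iter().map(|f| f.as_raw_fd()).collect();
    Ok(fds.len())
}

fn main() -> std::io::Result<()> {
    // Generate the input file on the fly, per the review comment above.
    let path = std::env::temp_dir().join("sort_m_fd_demo.txt");
    File::create(&path)?.write_all(b"a\nb\n")?;

    // Passing the same file three times costs three descriptors,
    // which can hit the process fd limit with many duplicates.
    let n = count_eager_fds(&[&path, &path, &path])?;
    assert_eq!(n, 3);
    println!("{n} descriptors for 1 unique file");

    std::fs::remove_file(&path)?;
    Ok(())
}
```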
The Fix
Now the program opens each unique file only once, and lazily. It uses a memory map (the `memmap2` crate, which involves `unsafe` code) so that a single file descriptor backs all duplicates of an input file, including the case where the output file is reused as an input.
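A stdlib-only sketch of the same deduplication idea, with a shared in-memory buffer standing in for the `memmap2::Mmap` used by the PR (function and variable names are hypothetical, not the PR's actual code):

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use std::rc::Rc;

/// Resolve each input lazily to a buffer shared by all duplicates.
/// In the PR the shared object is a `memmap2::Mmap`, so one open
/// file descriptor backs every occurrence of the same path.
fn dedup_inputs<P: AsRef<Path>>(paths: &[P]) -> std::io::Result<Vec<Rc<Vec<u8>>>> {
    let mut cache: HashMap<PathBuf, Rc<Vec<u8>>> = HashMap::new();
    let mut out = Vec::with_capacity(paths.len());
    for p in paths {
        // Canonicalize so different spellings of the same file collide.
        let key = fs::canonicalize(p)?;
        let buf = match cache.get(&key) {
            Some(b) => Rc::clone(b), // duplicate: reuse, no new open
            None => {
                let b = Rc::new(fs::read(&key)?); // first use: read once
                cache.insert(key, Rc::clone(&b));
                b
            }
        };
        out.push(buf);
    }
    Ok(out)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("sort_m_dedup_demo.txt");
    fs::write(&path, b"a\nb\n")?;

    // Three logical inputs, one shared backing buffer.
    let inputs = dedup_inputs(&[&path, &path, &path])?;
    assert_eq!(inputs.len(), 3);
    assert!(Rc::ptr_eq(&inputs[0], &inputs[2]));
    println!("3 inputs share 1 backing buffer");

    fs::remove_file(&path)?;
    Ok(())
}
```

Because the contents are held independently of the path once loaded (as with a memory map), the data stays readable even if the output path is later truncated for writing, which is what lets the PR drop the behind-the-scenes temporary copy.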
Result
Fix #5714