Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hardlink: passing the same file multiple times results in .hardlink-temporary files being left #1602

Closed
elyscape opened this issue Feb 10, 2022 · 8 comments

Comments

@elyscape
Copy link

When the same filepath is provided to hardlink multiple times, it leaves .hardlink-temporary files around:

$ echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 35261410
f2 35261411
f3 35261416
$ hardlink -c f1 f2 f3 f3
Mode:           real
Files:          4
Linked:         3 files
Compared:       0 xattrs
Compared:       2 files
Saved:          12 B
Duration:       0.000145 seconds
$ stat -c '%n %i' f*
f1 35261410
f2 35261410
f3 35261410
f3.hardlink-temporary 35261410
$ rm f*
$ echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 35261421
f2 35261423
f3 35261425
$ hardlink -c f1 f2 f3
Mode:           real
Files:          3
Linked:         2 files
Compared:       0 xattrs
Compared:       2 files
Saved:          12 B
Duration:       0.000100 seconds
$ stat -c '%n %i' f*
f1 35261421
f2 35261421
f3 35261421

While obviously it's better for the user not to specify the same file multiple times, hardlink should probably handle this better.

@elyscape
Copy link
Author

See also RHBZ#2025157.

@elyscape
Copy link
Author

elyscape commented Feb 10, 2022

Update: this note in the manpages for rename(2) explains what's happening:

If oldpath and newpath are existing hard links referring to the same file, then rename() does nothing, and returns a success status.

I was going to say that the solution is to compare inode numbers when checking if a file is a candidate for replacement, and report that it isn't if they're the same, but it turns out that's already happening:

a->st.st_ino != b->st.st_ino &&

This means that what's going on is a TOCTOU issue; the struct stat for each file is retrieved before processing begins:

for (; optind < argc; optind++) {
if (nftw(argv[optind], inserter, 20, FTW_PHYS) == -1)

If the file represented by the underlying path changes because an earlier path was an alias, this doesn't get updated, and the file_may_link_to check gets bypassed.

Probably the solution here is to run each requested path through realpath(3) and deduplicate the results before passing them into nftw(3).

@ferdnyc
Copy link
Contributor

ferdnyc commented Feb 13, 2022

@elyscape Nice find!

Impressively, this bug even occurs if you specify a directory twice on the command line. It doesn't even have to have an ambiguous name.

$ mkdir hltest
$ for i in $(seq 1 4); do echo value > hltest/f$i; done
$ hardlink -c hltest hltest                            
Mode:           real
Files:          8
Linked:         6 files
Compared:       0 xattrs
Compared:       3 files
Saved:          18 B
Duration:       0.000174 seconds
$ ls hltest
f1  f2.hardlink-temporary  f3.hardlink-temporary  f4.hardlink-temporary
f2  f3                     f4

@ferdnyc
Copy link
Contributor

ferdnyc commented Feb 13, 2022

I was going to say that the solution is to compare inode numbers when checking if a file is a candidate for replacement, and report that it isn't if they're the same, but it turns out that's already happening:

The problem, I think, is that while inode numbers are also checked when initially adding entries to the binary tree, paths to the same inode are identified and captured (presumably to track existing hardlinks), but that's done without any checks to ensure that the same path isn't added multiple times:

if (*node != fil) {
/* Already known inode, add link to inode information */
assert((*node)->st.st_dev == sb->st_dev);
assert((*node)->st.st_ino == sb->st_ino);
fil->links->next = (*node)->links;
(*node)->links = fil->links;

The linked-list of paths to update for a given inode i will end up with two entries for /path/to/file_for_inode_i. (AFAICT it's supposed to only hold different paths to that inode.)

Then, when the time comes to point "all" of the old paths to a new inode via the link-and-rename tango, the rename() will be attempted twice to the same path, causing the second one to fail — er, succeed-without-effect — in the manner you described.

In fact, specify the same paths three times on the command line and you can get even the link() to fail:

$ # [same setup as before]
$ hardlink -c hltest hltest hltest
hardlink: cannot link hltest/f1 to hltest/f2.hardlink-temporary: File exists
hardlink: cannot link hltest/f1 to hltest/f3.hardlink-temporary: File exists
hardlink: cannot link hltest/f1 to hltest/f4.hardlink-temporary: File exists
hardlink: cannot link hltest/f2 to hltest/f3.hardlink-temporary: File exists
hardlink: cannot link hltest/f2 to hltest/f4.hardlink-temporary: File exists
hardlink: cannot link hltest/f3 to hltest/f4.hardlink-temporary: File exists
Mode:           real
Files:          12
Linked:         6 files
Compared:       0 xattrs
Compared:       6 files
Saved:          18 B
Duration:       0.000331 seconds

@karelzak
Copy link
Collaborator

Thanks, guys for debugging! Please, try the current git tree to verify the bugfix.

@elyscape
Copy link
Author

This unfortunately doesn't solve the case where aliasing paths are passed in:

[root@1888bab03696 hltest]# hardlink -V
hardlink from util-linux 2.38-rc1
[root@1888bab03696 hltest]# echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 2766692
f2 2766693
f3 2766694
[root@1888bab03696 hltest]# hardlink -vvvc f1 f2 f3 f3
Scanning [device/inode/links]:
     1: [143/2766692/1] f1
     2: [143/2766693/1] f2
     3: [143/2766694/1] f3
     4: [143/2766694/1] f3
Skipped f3 (specified more than once)
Linking f1 to f2 (-6 B)
Linking f1 to f3 (-6 B)
Mode:                     real
Method:                   sha256
Files:                    4
Linked:                   2 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    12 B
Duration:                 0.000342 seconds
[root@1888bab03696 hltest]# stat -c '%n %i' f*
f1 2766692
f2 2766692
f3 2766692
[root@1888bab03696 hltest]# rm -f f*
[root@1888bab03696 hltest]# echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 2766692
f2 2766693
f3 2766694
[root@1888bab03696 hltest]# hardlink -vvvc f1 f2 f3 ./f3
Scanning [device/inode/links]:
     1: [143/2766692/1] f1
     2: [143/2766693/1] f2
     3: [143/2766694/1] f3
     4: [143/2766694/1] ./f3
Linking f1 to f2 (-6 B)
Linking f1 to ./f3 (-6 B)
Linking f1 to f3 (-6 B)
Mode:                     real
Method:                   sha256
Files:                    4
Linked:                   3 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    12 B
Duration:                 0.000490 seconds
[root@1888bab03696 hltest]# stat -c '%n %i' f*
f1 2766692
f2 2766692
f3 2766692
f3.hardlink-temporary 2766692
[root@1888bab03696 hltest]# rm -f f*
[root@1888bab03696 hltest]# echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 2766692
f2 2766693
f3 2766694
[root@1888bab03696 hltest]# hardlink -vvvc f1 f2 f3 ./f3 "${PWD}/f3"
Scanning [device/inode/links]:
     1: [143/2766692/1] f1
     2: [143/2766693/1] f2
     3: [143/2766694/1] f3
     4: [143/2766694/1] ./f3
     5: [143/2766694/1] /root/hltest/f3
Linking f1 to f2 (-6 B)
Linking f1 to /root/hltest/f3 (-6 B)
Linking f1 to ./f3 (-6 B)
Linking f1 to f3 (-6 B)
hardlink: cannot link f1 to f3.hardlink-temporary: File exists
Mode:                     real
Method:                   sha256
Files:                    5
Linked:                   3 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    12 B
Duration:                 0.000469 seconds
[root@1888bab03696 hltest]# stat -c '%n %i' f*
f1 2766692
f2 2766692
f3 2766692
f3.hardlink-temporary 2766692

The easiest solution for this is probably to pass the paths through realpath(3) at some point before comparing them.

@karelzak
Copy link
Collaborator

The easiest solution for this is probably to pass the paths through realpath(3) at some point before comparing them.

Good point. It seems good enough to call realpath() in main() before the code calls nftw(). I'll implement it.

karelzak added a commit that referenced this issue Feb 17, 2022
Addresses: #1602
Signed-off-by: Karel Zak <kzak@redhat.com>
@elyscape
Copy link
Author

That did the trick! Thanks for fixing this.

[root@10152f54d3f1 hltest]# hardlink -V
hardlink from util-linux 2.38-rc1
[root@10152f54d3f1 hltest]# echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 2375078
f2 2375079
f3 2375080
[root@10152f54d3f1 hltest]# hardlink -vvvc f1 f2 f3 f3
Scanning [device/inode/links]:
     1: [143/2375078/1] /root/hltest/f1
     2: [143/2375079/1] /root/hltest/f2
     3: [143/2375080/1] /root/hltest/f3
     4: [143/2375080/1] /root/hltest/f3
Skipped /root/hltest/f3 (specified more than once)
Linking /root/hltest/f1 to /root/hltest/f2 (-6 B)
Linking /root/hltest/f1 to /root/hltest/f3 (-6 B)
Mode:                     real
Method:                   sha256
Files:                    4
Linked:                   2 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    12 B
Duration:                 0.000354 seconds
[root@10152f54d3f1 hltest]# stat -c '%n %i' f*
f1 2375078
f2 2375078
f3 2375078
[root@10152f54d3f1 hltest]# rm -f f*
[root@10152f54d3f1 hltest]# echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 2375078
f2 2375079
f3 2375080
[root@10152f54d3f1 hltest]# hardlink -vvvc f1 f2 f3 ./f3
Scanning [device/inode/links]:
     1: [143/2375078/1] /root/hltest/f1
     2: [143/2375079/1] /root/hltest/f2
     3: [143/2375080/1] /root/hltest/f3
     4: [143/2375080/1] /root/hltest/f3
Skipped /root/hltest/f3 (specified more than once)
Linking /root/hltest/f1 to /root/hltest/f2 (-6 B)
Linking /root/hltest/f1 to /root/hltest/f3 (-6 B)
Mode:                     real
Method:                   sha256
Files:                    4
Linked:                   2 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    12 B
Duration:                 0.000215 seconds
[root@10152f54d3f1 hltest]# stat -c '%n %i' f*
f1 2375078
f2 2375078
f3 2375078
[root@10152f54d3f1 hltest]# rm -f f*
[root@10152f54d3f1 hltest]# echo value >f1 ; echo value >f2 ; echo value >f3 ; stat -c '%n %i' f*
f1 2375078
f2 2375079
f3 2375080
[root@10152f54d3f1 hltest]# hardlink -vvvc f1 f2 f3 ./f3 "${PWD}/f3"
Scanning [device/inode/links]:
     1: [143/2375078/1] /root/hltest/f1
     2: [143/2375079/1] /root/hltest/f2
     3: [143/2375080/1] /root/hltest/f3
     4: [143/2375080/1] /root/hltest/f3
Skipped /root/hltest/f3 (specified more than once)
     5: [143/2375080/1] /root/hltest/f3
Skipped /root/hltest/f3 (specified more than once)
Linking /root/hltest/f1 to /root/hltest/f2 (-6 B)
Linking /root/hltest/f1 to /root/hltest/f3 (-6 B)
Mode:                     real
Method:                   sha256
Files:                    5
Linked:                   2 files
Compared:                 0 xattrs
Compared:                 2 files
Saved:                    12 B
Duration:                 0.000314 seconds
[root@10152f54d3f1 hltest]# stat -c '%n %i' f*
f1 2375078
f2 2375078
f3 2375078

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants