option to link to always create hard links to first found file #3016

Open
Thunder7IsAvailable opened this issue May 7, 2024 · 3 comments

@Thunder7IsAvailable

Thunder7IsAvailable commented May 7, 2024

I'd like an option to tell hardlink to always link to the file found first in the arguments.

Use case:

When a large directory move/rename is backed up using rsnapshot, identical files are no longer linked. If you just run hardlink over the rsnapshot repository, chances are good that the 'new' files are hard-linked to the old files. Thus, on the next rsnapshot cycle, they are again not linked but written as new files.

None of the existing options allow for this. jdupes has

-L --link-hard
replace all duplicate files with hardlinks to the first file in each set of duplicates

where the 'first file' part is what I'm after here. At the moment, I de-duplicate a large rsnapshot repository in two steps: hardlink for everything except the alpha.0 directory, then jdupes for the alpha.0 and alpha.1 directories. The jdupes part takes longer for 2 directories than hardlink does for 20, so I'd like this option in hardlink, please.

It looks like storing the global 'optind' variable in a new priority field in the file struct and then adding an option to compare that field may be enough. This is as yet untested:

--- hardlink.org 2024-05-07 11:20:32.089756539 +0200
+++ hardlink.c 2024-05-07 11:29:04.871035637 +0200
@@ -107,4 +107,5 @@
#endif
} *links;
+ int priority;
};

@@ -164,4 +165,5 @@
* @minimise: Chose the file with the lowest link count as master
* @keep_oldest: Choose the file with oldest timestamp as master (default = FALSE)
+ * @keep_first: Choose the file which was found first when scanning (default = FALSE)
* @dry_run: Specifies whether hardlink should not link files (default = FALSE)
* @min_size: Minimum size of files to consider. (default = 1 byte)
@@ -183,4 +185,5 @@
unsigned int minimise:1;
unsigned int keep_oldest:1;
+ unsigned int keep_first:1;
unsigned int dry_run:1;
uintmax_t min_size;
@@ -200,4 +203,5 @@
.respect_xattrs = FALSE,
.keep_oldest = FALSE,
+ .keep_first = FALSE,
.min_size = 1,
.cache_size = 10*1024*1024
@@ -673,4 +677,6 @@
if (res == 0 && opts.minimise)
res = CMP(b->st.st_nlink, a->st.st_nlink);
+ if (res == 0 && opts.keep_first)
+ res = CMP(a->priority, b->priority);
if (res == 0)
res = opts.keep_oldest ? CMP(b->st.st_mtime, a->st.st_mtime)
@@ -876,4 +882,5 @@
fil->links->dirname = rootbasesz;
fil->links->next = NULL;
+ fil->priority = optind;

memcpy(fil->links->path, fpath, pathlen);
@@ -1174,4 +1181,6 @@
fputs(_(" -n, --dry-run don't actually link anything\n"), out);
fputs(_(" -o, --ignore-owner ignore owner changes\n"), out);
+ fputs(_(" -F, --keep-first keep the first file found during scanning of multiple equal files\n"
+ " (lower precedence than minimize/maximize)\n"), out);
fputs(_(" -O, --keep-oldest keep the oldest file of multiple equal files\n"
" (lower precedence than minimize/maximize)\n"), out);
@@ -1225,4 +1234,5 @@
{"maximize", no_argument, NULL, 'm'},
{"minimize", no_argument, NULL, 'M'},
+ {"keep-first", no_argument, NULL, 'F'},
{"keep-oldest", no_argument, NULL, 'O'},
{"exclude", required_argument, NULL, 'x'},
@@ -1274,4 +1284,7 @@
opts.keep_oldest = TRUE;
break;
+ case 'F':
+ opts.keep_first = TRUE;
+ break;
case 'f':
opts.respect_name = TRUE;
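
To make the idea behind the patch easier to follow, here is a minimal standalone sketch, not code from hardlink.c. It assumes the trees are scanned in a loop that advances `optind`, so a file found under an earlier argument gets a smaller priority value. All `demo_*` names, `DEMO_CMP`, and the trimmed-down struct are hypothetical stand-ins. The context lines in the diff (for example the minimise comparison) suggest that the real comparator may treat a positive result as favoring `a`; if so, the operand order of the priority comparison would need checking. The sketch sidesteps that question by sorting ascending and taking the first entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Three-way comparison; a stand-in for the CMP() macro used in the patch. */
#define DEMO_CMP(a, b) ((a) > (b) ? 1 : ((a) < (b) ? -1 : 0))

/* Hypothetical, trimmed-down stand-in for hardlink's struct file. */
struct demo_file {
	const char *path;	/* tree the file was found under */
	int priority;		/* value of optind while that tree was scanned */
};

/*
 * Parse a single -F flag, then record one entry per remaining argument.
 * Assumption: each tree is walked while optind points at its argument,
 * so an earlier tree gets a smaller priority value.
 * Returns the number of entries written to out.
 */
static size_t demo_scan_args(int argc, char **argv, int *keep_first,
			     struct demo_file *out, size_t max)
{
	size_t n = 0;
	int c;

	optind = 1;		/* start option parsing from the first argument */
	*keep_first = 0;
	while ((c = getopt(argc, argv, "F")) != -1)
		if (c == 'F')
			*keep_first = 1;

	for (; optind < argc && n < max; optind++, n++) {
		out[n].path = argv[optind];
		out[n].priority = optind;
	}
	return n;
}

/* qsort callback: the smallest priority value (earliest tree) sorts first. */
static int demo_cmp(const void *pa, const void *pb)
{
	const struct demo_file *a = pa, *b = pb;

	assert(a && b);
	return DEMO_CMP(a->priority, b->priority);
}

/* Return the entry this sketch would keep as the link master. */
static const struct demo_file *demo_pick_master(struct demo_file *files, size_t n)
{
	assert(n > 0);
	qsort(files, n, sizeof(*files), demo_cmp);
	return &files[0];
}
```

Called with an argument vector such as `hardlink -F alpha.0 alpha.1 alpha.2`, `demo_scan_args()` assigns the three trees increasing priority values, and `demo_pick_master()` returns the entry for the tree named first regardless of the order in which the candidates are stored.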

@karelzak
Collaborator

I like the idea, but we need a better description of the feature. It should be "first tree wins" rather than "first file wins".

Can you prepare a patch and create a pull-request? Thanks.

@Thunder7IsAvailable
Author

That sounds much simpler than it is. I don't have a GPG key and I'm not willing to create one (and especially maintain it) for this simple patch, then install some other programs and so on.

Here are the changed files.

hardlink.1.adoc.txt
hardlink.c.txt

@karelzak
Collaborator

We also have a mailing list (util-linux@vger.kernel.org) for people who don't want to use GitHub :-) It's fine to use git format-patch and send the result to the list (you do not have to subscribe).

Anyway, adding the files (as you did) is also possible. I'd prepare a git commit from it. Thanks!

karelzak added a commit to karelzak/util-linux-work that referenced this issue May 15, 2024
Based on patch from discussion at util-linux#3016

Signed-off-by: Karel Zak <kzak@redhat.com>