Move Queue #864

Closed
dgalli1 opened this issue Jan 14, 2021 · 40 comments

@dgalli1

dgalli1 commented Jan 14, 2021

Hi,
I'm currently trying to write a script that moves files to the correct path policy folder overnight.
I have ignorepponrename set to true so that I can move files over Samba.

But I would still like to move every file to the correct hard drive via a cron job.

Is it somehow possible to create a simple log entry for every file that was not moved to the correct path because of the ignorepponrename option? I thought about layering another FUSE filesystem on top of mergerfs just to log this one operation, but that seems like overkill for this task. Are there any other options you can think of?

dgalli1 changed the title from "Move Que" to "Move Queue" on Jan 14, 2021
@trapexit
Owner

I'm not really sure I understand what you're doing and why.

Why do you need to move files around? What policy are you using?

@dgalli1
Author

dgalli1 commented Jan 14, 2021

So I'm moving files around via my file explorer, but I don't want to wait for the moves because that delays my sorting work a lot.
This already works because of the "ignorepponrename" option.

But I still want the files on the correct hard drive, so I thought I could move them at night via a cron job.
But I don't know which files have to go where (of course I could reimplement your policies somehow, but this seems like a lot of work. Or do this via a ) So I wondered whether there is a way to get a log of all rename actions that were not moved between hard drives because of the "ignorepponrename" option.

I don't know if there is a way; I just thought I'd ask you before I try anything.
I hope this explanation is clearer.

@trapexit
Owner

I'm sorry but I'm still not really following. You're telling me you want strict path preservation (why?) but you enabled ignorepponrename which undermines that and seem to be fine with that.

So you don't want that? You want to strictly keep files on a drive based on the underlying path? Wouldn't that happen simply by "moving" files between relevant paths? mergerfs will return EXDEV to the mv command or whatever is trying the move which should force it to copy and remove keeping the paths in place.

@dgalli1
Author

dgalli1 commented Jan 14, 2021

> So you don't want that? You want to strictly keep files on a drive based on the underlying path? Wouldn't that happen simply by "moving" files between relevant paths? mergerfs will return EXDEV to the mv command or whatever is trying the move which should force it to copy and remove keeping the paths in place.

I want a strict path policy. But I also want to "move" stuff fast even when the command returns EXDEV. (This is obviously not possible.) But if I could somehow log it when mergerfs tried to move a file but returned EXDEV, I could move the files to the designated path-preserving position later.

Example:
1.
sda:
folder1/file.mp4

sdb:
folder2/

mv folder1/file.mp4 /folder2
This will now return EXDEV and copy & delete the file instead of moving it.
sdb:
folder2/file.mp4
sda:
folder1/

But what i want is this:
sda:
folder1/
folder2/file.mp4

And in addition to this, I would like this non-path-preserving behavior to be recorded in a log file somehow.

So that i can do this when i'm not using my computer:
sda:
folder1/

sdb:
folder2/file.mp4

I totally understand if this is out of scope.
I should be able to do the same with inotify by checking when a new folder/path is created that already exists on another drive. But I thought I'd ask here first before going that route.

@trapexit
Owner

I feel like I'm being dense but I'm still not getting it. Or at least why.

You want to go through a list of rename/link calls that fail and then manually copy&unlink the files out of band rather than just letting it happen when the original rename/link was called?
Why? Why not let the software doing the high level move do the right thing on the fly? What I really want to understand is what the problem is with doing it live? The system isn't performant enough to handle copying and other things at the same time? It is much better to describe the ultimate problem and not solutions. Offering solutions anchors the conversation without allowing me to really flush out the possible solutions.

Also... I have to ask why you care which drive you're writing files to? You don't have backup and want to limit the type of files lost if a drive dies? You add and remove drives regularly for other purposes?

@dgalli1
Author

dgalli1 commented Jan 14, 2021

> Why? Why not let the software doing the high level move do the right thing on the fly? What I really want to understand is what the problem is with doing it live? The system isn't performant enough to handle copying and other things at the same time? It is much better to describe the ultimate problem and not solutions. Offering solutions anchors the conversation without allowing me to really flush out the possible solutions.

It's not really that important; I just don't like waiting for my move operations while sorting through a lot of files.
It takes me a lot longer to sort a folder than it would if I got instant feedback on my new folder structure.

> Also... I have to ask why you care which drive you're writing files to? You don't have backup and want to limit the type of files lost if a drive dies? You add and remove drives regularly for other purposes?

Yup, I don't have the same level of redundancy on all my drives, which is why I want the files to end up on the correct HDD.

@trapexit
Owner

OK. Now I understand the situation. In the future I really suggest describing the problem abstractly because it's much easier to understand the situation and the possible solutions.

Your proposal wouldn't be a move queue so much as an error log. The ideal situation would be for mergerfs to do everything but that would require massive changes as it would have to fake the location of files which isn't something it does now and wouldn't be fun to instrument into the code. It would touch everything and I'm not sure would be useful for anything else.

What if there was a setting where mergerfs, when an EXDEV occurred, instead created a symlink to the source? For a rename it's a little weird because you expect the source to disappear but for a link it wouldn't be bad. Then you could walk the filesystem and do a copy and rename on the underlying branch whenever you wish and all the info is there to do it. You'd need to be careful though. A copy to temp file and then rename over the symlink. For rename it could even move the file to a known hidden directory on the branch or something to at least get it out of the directory it is in. This would be a lot easier to implement and I think offer you basically what you want in terms of the information. The only downside is that you have to crawl the branches.
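
A minimal sketch of the "copy to temp file, then rename over the symlink" step described above, assuming it is run directly on the underlying branch paths rather than through the mergerfs mount. The function name materialize_symlink and the temp-file prefix are hypothetical; nothing here is part of mergerfs or mergerfs-tools.

```python
import os
import shutil
import tempfile


def materialize_symlink(link_path: str) -> str:
    """Replace the symlink at link_path with a real copy of its target.

    Returns the resolved path of the original target, which is removed
    once the copy is in place.
    """
    if not os.path.islink(link_path):
        raise ValueError(f"{link_path} is not a symlink")

    target = os.path.realpath(link_path)
    link_dir = os.path.dirname(os.path.abspath(link_path))

    # Copy into a temp file in the same directory as the symlink so the
    # final rename stays on one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=link_dir, prefix=".tmp_move_")
    os.close(fd)
    try:
        shutil.copy2(target, tmp_path)   # file data plus basic metadata
        os.replace(tmp_path, link_path)  # replaces the symlink itself
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    os.unlink(target)  # the data now lives where the symlink used to be
    return target
```

Because the rename happens last, a reader of the directory sees either the symlink or the finished copy, never a partially written file.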

@dgalli1
Author

dgalli1 commented Jan 14, 2021

> What if there was a setting where mergerfs, when an EXDEV occurred, instead created a symlink to the source?

Seems good, but if I understand your proposal correctly, this would mean that the source file keeps existing until I run my script? The issue I have with this is that it's hard to keep track of which files I have already moved.
Or did I misunderstand you and you meant the other way around? Move the file and create a symlink from the original location to the new one?

Also won't this fuck with programmes that expect a file to get moved (Ex. Sonarr)?

@trapexit
Owner

There are 2 situations to consider. A link and a rename call. Both can fail due to source and destination being on different mounts.

For link: if an EXDEV happens then create a symlink instead
For rename: if an EXDEV happens try to link the file to some known location outside the directory it is in, symlink the destination to that new "hidden" file, unlink the original file.

If for some reason the symlink or second rename fails then return EXDEV.
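
The rename fallback outlined above can be illustrated in user space. This is only a sketch of the idea, not mergerfs code: the function name rename_or_symlink, the injectable rename parameter, and the hidden directory name ".hidden_moves" are all assumptions made for the example.

```python
import errno
import os


def rename_or_symlink(branch_root, old, new, rename=os.rename,
                      hidden_dir=".hidden_moves"):
    """Try a plain rename; on EXDEV, park the file and symlink to it.

    branch_root is the root of the branch that holds `old`.
    hidden_dir is a hypothetical name for the parking directory.
    """
    try:
        rename(old, new)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise

    # Park the file under the hidden directory on its own branch,
    # keeping its path relative to the branch root.
    rel = os.path.relpath(old, branch_root)
    hidden = os.path.join(branch_root, hidden_dir, rel)

    linked = False
    try:
        os.makedirs(os.path.dirname(hidden), exist_ok=True)
        os.link(old, hidden)     # hard link, so no data is copied
        linked = True
        os.symlink(hidden, new)  # destination points at the parked file
    except OSError:
        if linked:
            os.unlink(hidden)    # undo only what this call created
        # If the fallback fails too, report EXDEV like a plain rename.
        raise OSError(errno.EXDEV, os.strerror(errno.EXDEV)) from None

    os.unlink(old)  # the original name disappears, as after a rename
    return "symlinked"
```

The original name is removed only after both the hard link and the symlink exist, so a failure part-way through leaves the source file untouched.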

Could it cause issues with software that call link or rename? Sure. None of this stuff is perfect because we're talking about low level behaviors and the client software could always be doing something like... rename a file and then check the target to see if something about it is true. Is that likely? I don't believe so.

That said... I don't know for sure this will work. Whenever working with the type of file and changing it the kernel sometimes doesn't like it. I'll have to basically implement this to see if it'd work.

@dgalli1
Author

dgalli1 commented Jan 15, 2021

Alright, now I get what you're trying to do. This sounds like a plan 👍
I will submit a pull request to mergerfs-tools once you get around to implementing this. (Although I have to learn Python first ;) ).

> For rename: if a EXDEV happens try to link the file to some known location outside the directory it is in, symlink the destination to that new "hidden" file, unlink the original file.

The only issue I can think of with symlinking is Docker mount points not being able to follow the symlinks because they don't have access to the whole mergerfs mount. But as long as the symlinks are relative, it shouldn't be too much of an issue.

@trapexit
Owner

It would depend on what / how you have it mounted. The move directory will probably have to be configurable so you can do as you please.

@trapexit
Owner

Yeah. Unfortunately, the kernel doesn't like trying to create a link and getting told the file is a symlink. There may be a way to work around this but I'll need to mess around with it more.

@trapexit
Owner

OK. I got link working. Relative and absolute symlinks are created when an EXDEV happens. Still needs to be cleaned up but ready for testing if you're interested. rename will take more work given I need to manage the movement of the oldpath. I don't know if you have the interest or ability to really test the link feature but I can push the branch if interested.

@trapexit
Owner

Will you have time to look at this? I don't want to release it without some 3rd party testing. I will likely have the rename done soon. I've had a bit of coder's block recently and ended up rewriting some of the rename logic (which is a bit complicated.)

@dgalli1
Author

dgalli1 commented Jan 26, 2021

Oh i'm sorry i completely missed your last comment.
Sure i can test it if you branch it out. Can't guarantee any feedback before Friday, i have a deadline coming up.

@trapexit
Owner

No rush. Just wanted to confirm you're around and still interested :)

I'll ping this thread when I have something on the branch to test.

@trapexit
Owner

Not done but testable

link-symlink branch

git clone https://github.com/trapexit/mergerfs.git -b link-symlink

options are:

  • link-exdev=passthrough | rel-symlink | abs-symlink
  • rename-exdev=passthrough | rel-symlink | abs-symlink

It doesn't properly handle if there are multiple files to link or rename but as of now I'm simply going to do one of them and remove the others. To do more than that is a bit complicated. For mergerfs 3 I think I need to reconsider rename and link a bit. Separate "found 1 files" and "found N>1 files" behaviors.

For link... it tries what it does today and if that fails it will make a symlink instead.
For rename... it tries what it does today and if that fails with exdev it will create on the branch it is on "/.mergerfs_rename_exdev/relative/path/to/file" and then symlink to it.

I'll update the branch with any changes. Please put it through the paces to see if this works for you. I've not tried it with any more complex software. Just "mv" to ensure it works. I don't know if radarr or similar would freak out if it finds a symlink where it thinks it should be a link or renamed file.
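
For the later cleanup pass, a script would first need to find the symlinks that rename-exdev left behind. A minimal sketch, assuming the test-branch layout described above, where every such symlink's target passes through a ".mergerfs_rename_exdev" directory. The function name find_exdev_symlinks is hypothetical.

```python
import os

# Directory name taken from the description of the test branch above.
EXDEV_DIR = ".mergerfs_rename_exdev"


def find_exdev_symlinks(root):
    """Yield (link_path, target) for symlinks pointing into EXDEV_DIR."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the hidden directory itself.
        dirnames[:] = [d for d in dirnames if d != EXDEV_DIR]
        # os.walk lists symlinks to directories under dirnames and all
        # other symlinks under filenames, so check both.
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            if EXDEV_DIR in target.split(os.sep):
                yield path, target
```

Matching on a whole path component rather than a substring avoids false positives from file names that merely contain the directory name.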

@dgalli1
Author

dgalli1 commented Feb 1, 2021

> It doesn't properly handle if there are multiple files to link or rename but as of now I'm simply going to do one of them and remove the others. To do more than that is a bit complicated. For mergerfs 3 I think I need to reconsider rename and link a bit. Separate "found 1 files" and "found N>1 files" behaviors.

Did I understand this correctly? It doesn't work when I have the same file on multiple HDDs (or at least a file with the same path)?

Anyways it compiled fine, i will put it through the wringer this evening.

@trapexit
Owner

trapexit commented Feb 1, 2021

Define "works". I'm saying it doesn't do anything overly complicated when it encounters multiple files to link / rename.

This is a general problem with mergerfs. How do you manage the fact that the POSIX standard is all about working on one thing but mergerfs has to possibly work on more than one?

What happens if you have:

  • /drive0/a/foo
  • /drive1/a/foo
  • /drive2/b/

That's everything and then a rename("/a/foo","/b/foo") is requested with path preservation enabled? Today an EXDEV is returned unless a "create" call to "/b/foo" would end up on /drive0 or /drive1. In that case it creates /drive0/b/ and tries the rename again. But then you have the other "/a/foo" lying around, so that is removed. There are other situations that need to be taken into account too. Like... what if "/drive2/b/foo" exists? That has to be removed.

Basically... it links or renames all that it can (using create to see if the drives that fail might be viable anyway) and removes those it can't if at least one succeeds.

So back to this patch. After I wrote the above message I added the removal like above on success and extra files. But like I mentioned it doesn't do anything beyond that. It doesn't try to be extra smart... like...

  • /drive0/a/
  • /drive1/a/
  • /drive2/b/foo
  • /drive3/b/foo

rename("/b/foo","/a/foo") resulting in 2 symlinks on /drive{0,1}/a/foo to the hidden /drive{2,3}/b/foo. That's a lot more complicated.

@dgalli1
Author

dgalli1 commented Feb 1, 2021

Absolute Path:
Moving/Link Via Console:
Seems to work flawlessly.

SFTP Connection via Dolphin:
Moving the file "test" from folder public_data/sonstiges/ to public_data/Serien/The Expanse/

Result:
File is gone, can't find it.
Symlink looks like this:
ls -al test/public_data/Serien/The Expanse
lrwxrwxrwx 1 root root 56 Feb 1 18:55 test -> /srv/dev-disk-by-label-data02/public_data/sonstiges/test

SMB via Dolphin:
Complains that the file can't be renamed (I guess that one was expected).

Rel Symlink:
Same results on all the test cases i tried.
The symbolic link via SFTP looks like this: ../../sonstiges/test

Same Results for abs and relative:

SSHFS:
Works as expected for both link and move

Sonarr/Radarr:
It doesn't even move the file but copies it, so this is no worry.

Nextcloud:
Seems to work just fine.

What do you think: are the issues with SMB/SFTP fixable, or are they caused by Dolphin's implementation? (I don't need them, just asking.)

@dgalli1
Author

dgalli1 commented Feb 1, 2021

> Define "works". I'm saying it doesn't do anything overly complicated when it encounters multiple files to link / rename.

Thanks a lot for the detailed explanation. Now I understand what you meant. At least for me this is of no concern, because I try to avoid cases like this anyway.

Thanks a lot for taking the time to implement this feature, this will help me immensely with my workflow.

I think as it stands right now i will implement a script to resolve the symlinks and move the files. Are you alright with a PR on mergerfs-tools or should i put them in a separate repository, so that its clear that I'm maintaining it?

@trapexit
Owner

trapexit commented Feb 1, 2021

I'd need to understand what the issues you saw are before I can comment. The only device I have KDE on at the moment is the PinePhone I just got. I could try that + something on my other desktops that are similar. The layout of your tests isn't super clear but I think I got it.

re: the tool... sure. feel free to submit a PR when you have something if you're fine with that. Is there anything extra needed to know what symlinks to process?

@trapexit
Owner

trapexit commented Feb 1, 2021

> Thanks a lot for taking the time to implement this feature, this will help me immensely with my workflow.

No problem. Glad to help out.

@trapexit
Owner

trapexit commented Feb 1, 2021

  1. make sure you're using the latest version of the branch.
  2. Can you tell me explicitly which ones don't seem to work as you would expect them to? "rename-exdev=X; renamed Y, Z -> got symlink value A"
  3. Just dawned on me that there are two kinds of abs values that might be useful. One that points to the mergerfs mount and one (like now) that points to the full original path. Not sure what names to give them though. abs-base-symlink and abs-pool-symlink?

@dgalli1
Author

dgalli1 commented Feb 1, 2021

> make sure you're using the latest version of the branch.

Just double-checked, and I was indeed on the latest commit. But just to be clear, the issue only arises when moving files around via a Samba share.

> Can you tell me explicitly which one's don't seem to work as you would expect them to? "rename-exdev=X; renamed Y, Z -> got

This only happens when using SFTP via Dolphin:
rename-exdev=rel-symlink
renamed "/root/git/mergerfs/build//test/public_data/sonstiges", "/root/git/mergerfs/build/test/public_data/Serien/The Expanse/test"
Resulting Symlink:
lrwxrwxrwx 1 root root 20 Feb 2 00:31 test -> ../../sonstiges/test

rename-exdev=abs-symlink
renamed "/root/git/mergerfs/build//test/public_data/sonstiges", "/root/git/mergerfs/build/test/public_data/Serien/The Expanse/test"
Resulting Symlink:
lrwxrwxrwx 1 root root 56 Feb 2 00:39 test -> /srv/dev-disk-by-label-data02/public_data/sonstiges/test

> Just dawned on me that there are two kinds of abs values that might be useful. One that points to the mergerfs mount and one (like now) that points to the full original path. Not sure what names to give them though. abs-base-symlink and abs-pool-symlink?

Seems like a good idea. I think the names you gave are pretty clear.

@trapexit
Owner

trapexit commented Feb 1, 2021

I don't know how you're not having symlinks with .mergerfs_rename_exdev in the name. It is hardcoded in the algo. The file is moved to "/branch/.mergerfs_rename_exdev/RELPATH". Any relative or absolute link would have to include it. What you're showing me looks like links. Not renames.

I'm using SFTP with nemo and it works as expected. Same with SMB on windows and nemo.

@trapexit
Owner

trapexit commented Feb 5, 2021

If you would pull the latest and retest. I'm still not positive what you did to produce the above.

@trapexit
Owner

trapexit commented Feb 6, 2021

Just dawned on me that the abs-base-symlink would cause issues with renaming of directories. Rather than trying to manage two behaviors I'm thinking maybe of just having the rel-symlink and abs-symlink (which is the pool version).

@trapexit
Owner

trapexit commented Feb 6, 2021

Actually... link is fine since you can't link directories. Maybe I'll leave link with the 3 options and have just the 2 for rename.

@dgalli1
Author

dgalli1 commented Feb 6, 2021

Alright, I will test it as soon as I have rebuilt my pool from backups.
I accidentally ran "make clean" while I had my mergerfs pool mounted in the build directory 🤦 ....

> Just dawned on me that the abs-base-symlink would cause issues with renaming of directories. Rather than trying to manage two behaviors I'm thinking maybe of just having the rel-symlink and abs-symlink (which is the pool version).

All for it as well; one more option adds a lot of complexity. To be honest, I think the most useful setting for me would be abs-base-symlink. That way I could mount the .mergerfs folder separately into my Docker containers without exposing the whole mergerfs pool, so the links stay usable even inside a container.

@trapexit
Owner

trapexit commented Feb 7, 2021

The relative link would work for that too. I could leave in the abs-base-symlink but it'd break directories. Or I'd have to create a feature for regular files and one for dirs.

@dgalli1
Author

dgalli1 commented Feb 21, 2021

Alright i figured out what i did, you were right the path it generated was indeed created because of the link option.
I retried the sftp stuff again with this command:
./mergerfs -o allow_other,use_ino,rename-exdev=rel-symlink /srv/dev-disk-by-label-data0* test/

And everything i tested worked perfectly.

What i did the last time was this:
./mergerfs -o allow_other,use_ino,rename-exdev=rel-symlink,link-exdev=rel-symlink /srv/dev-disk-by-label-data0* test/

And i only get the issue mentioned above if i have rename-exdev and link-exdev enabled. So i guess Dolphin triggers a Link somehow while renaming files.

@dgalli1
Author

dgalli1 commented Feb 21, 2021

> The relative link would work for that too. I could leave in the abs-base-symlink but it'd break directories. Or I'd have to create a feature for regular files and one for dirs.

No need, the other options work as well for me; it's just a little bit more work.

@trapexit
Owner

> And i only get the issue mentioned above if i have rename-exdev and link-exdev enabled. So i guess Dolphin triggers a Link somehow while renaming files.

You can strace it to see exactly what calls it's making.

@dgalli1
Author

dgalli1 commented Feb 21, 2021

Alright i tried to do this. But i believe this strace is worthless and i somehow have to get one from the kio daemon that is internally mounting the sftp filesystem. Or does this help you?
https://pastebin.com/yNKnkibq

@trapexit
Owner

trapexit commented Feb 21, 2021

renameat2(AT_FDCWD, "season01-poster_test.jpg", AT_FDCWD, "../Snowpiercer/test.jpg", RENAME_NOREPLACE) = 0

I don't have rename2 implemented but that behavior is in effect a link and unlink but atomically so it makes sense the kernel asked mergerfs to link.
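
To illustrate why a no-replace rename behaves like a link followed by an unlink, here is a small non-atomic emulation. It is a sketch for explanation only (the function name rename_noreplace is hypothetical); the real renameat2 call with RENAME_NOREPLACE performs the operation atomically in the kernel.

```python
import os


def rename_noreplace(old: str, new: str) -> None:
    """Move old to new, raising FileExistsError if new already exists.

    Unlike renameat2(..., RENAME_NOREPLACE), this is two separate steps,
    so both names exist briefly in between.
    """
    os.link(old, new)  # fails with EEXIST instead of overwriting new
    os.unlink(old)     # drop the old name once the new one exists
```

With link-exdev enabled, the link step is where a cross-branch destination would be turned into a symlink, which matches the link-style symlinks seen in the earlier test results.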

@dgalli1
Author

dgalli1 commented Feb 21, 2021

By the way, I just realized that the same behavior also occurs when using FileZilla.
Anyway, it seems like KIO is not doing anything wrong. These are the SFTP logs from the server side:

Feb 21 17:38:53 openmediavault.local sftp-server[25044]: lstat name "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: debug1: request 293: sent attrib have 0xf
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: statvfs "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: lstat name "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer/tvshow_testus.nfo"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: sent status No such file
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: rename old "/root/git/mergerfs/build/test/public_data/Serien/Marvel's Agents of S.H.I.E.L.D/tvshow_testus.nfo" new "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer/tvshow_testus.nfo"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: sent status Success
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: opendir "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: debug1: request 297: sent handle handle 0
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: debug1: request 298: readdir "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer" (handle 0)
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: debug1: request 298: sent names count 11
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: readlink "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer/tvshow_help.nfo"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: debug1: request 299: sent names count 1
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: stat name "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer/tvshow_help.nfo"
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: sent status No such file
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: debug1: request 301: readdir "/root/git/mergerfs/build/test/public_data/Serien/Snowpiercer" (handle 0)
Feb 21 17:38:53 openmediavault.local sftp-server[25044]: sent status End of file

@trapexit
Owner

So are we good? Is there anything not working as expected/needed?

@dgalli1
Author

dgalli1 commented Feb 21, 2021

For me everything seems well.

@trapexit
Owner

OK. I committed #883. Will be in the next version. I refactored a lot of code so I want to put it through some more testing before release.
