gunzip won't unzip a symlink #2

Closed
rsharris opened this issue Apr 1, 2021 · 5 comments

@rsharris

rsharris commented Apr 1, 2021

(This doesn't seem like it needs to be addressed in the short term, if at all).

I tried "FastK orange.fa.gz", where orange.fa.gz was a symlink. The result was:

gunzip: ./orange.fa.gz is not a regular file
FastK: Cannot get stats for ./orange.fa

Apparently gunzip refuses to operate on symlinks, so it never creates the unzipped file. I'm guessing the downstream code in FastK doesn't notice that gunzip failed; a later sanity check then finds that the unzipped file doesn't exist.
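The error message suggests that gunzip inspects the symlink itself rather than the file it points to. A minimal C sketch of that distinction, assuming (not confirmed in this thread) that gunzip performs an lstat(2)-style check when run without -c or -f: `lstat` describes the link, while `stat` follows it to the target. Both helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper: 1 if `path` itself is a symlink (as lstat sees it),
 * 0 if it is anything else, -1 if lstat fails (e.g. no such path). */
static int is_symlink(const char *path)
{
    struct stat sb;
    if (lstat(path, &sb) != 0)
        return -1;
    return S_ISLNK(sb.st_mode) ? 1 : 0;
}

/* Hypothetical helper: stat follows symlinks, so this reports on the
 * target. 1 if the target is a regular file, 0 if it is something else,
 * -1 if stat fails (e.g. a dangling link). */
static int target_is_regular(const char *path)
{
    struct stat sb;
    if (stat(path, &sb) != 0)
        return -1;
    return S_ISREG(sb.st_mode) ? 1 : 0;
}
```

For a symlink like orange.fa.gz pointing at a real gzip file, `is_symlink` returns 1 while `target_is_regular` also returns 1: a tool that checks only the lstat result sees "not a regular file" even though the data it points to is perfectly readable.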

This would be an issue for the use case where the user has read access to a shared directory of gzipped read data or assemblies, but doesn't have write access. They can't give FastK a path to the original (because, I presume, it would try to write the unzipped file in that directory). Traditionally a symlink would be the 'right' solution, to avoid wasted disk space. But perhaps disk deduplication technology makes this less of an issue?

I suspect this is only an issue for gzip'd files. I assume for the other compressed formats you are able to decompress on-the-fly and don't need to write an uncompressed file.

Bob H

@richarddurbin
Collaborator

richarddurbin commented Apr 1, 2021 via email

@rsharris
Author

rsharris commented Apr 1, 2021

Yep. In my own stuff I usually use gzip -dc and pipe it into the tool, so there's no file created.

When the program expects a filename, this can still be accomplished with <(gzip -dc ZZ.gz). (I'm not sure what that's called in Unix lingo: named pipe or process substitution?) I'm not sure what string bash passes in argv for that, but whatever it is, fopen must end up returning a stream attached to the process running gzip.

But those only give the program the uncompressed file as a read-once stream, anyway.
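For reference, bash calls <(...) process substitution; on Linux it typically passes a path such as /dev/fd/63 in argv, and on systems without /dev/fd it falls back to a named FIFO. Either way, opening that path yields a pipe, which is why the data can be read only once. A minimal sketch, with hypothetical helper names, of how a program that needs more than one pass over its input might detect this with fstat(2) before relying on rewinding:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of an input source. */
enum input_kind { INPUT_ERROR = -1, INPUT_REGULAR, INPUT_PIPE, INPUT_OTHER };

/* A regular file can be rewound and re-read; a pipe or FIFO (what <(...)
 * typically yields) delivers its data only once. */
static enum input_kind classify_input(int fd)
{
    struct stat sb;
    if (fstat(fd, &sb) != 0)
        return INPUT_ERROR;
    if (S_ISREG(sb.st_mode))
        return INPUT_REGULAR;
    if (S_ISFIFO(sb.st_mode))
        return INPUT_PIPE;
    return INPUT_OTHER; /* e.g. a character device such as /dev/null */
}

/* Open `path` read-only and classify it. Note that opening an unconnected
 * named FIFO blocks until a writer appears; /dev/fd/N from process
 * substitution is already connected, so it does not block. */
static enum input_kind classify_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return INPUT_ERROR;
    enum input_kind kind = classify_input(fd);
    close(fd);
    return kind;
}
```

A multi-pass tool could refuse INPUT_PIPE arguments up front with a clear message, or copy them to a temporary file first, rather than failing partway through.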

@thegenemyers
Owner

thegenemyers commented Apr 1, 2021 via email

@rsharris
Author

rsharris commented Apr 1, 2021

I'm absolutely willing to work around this! I only reported it so it would be a known issue. (See the first line in the first post).

I'm probably hitting the edge cases because, as I get my feet wet with the package, I've been running small, not-very-realistic test cases so I can quickly understand how the pieces of the package fit together. I'm just about done with that now.

@thegenemyers
Owner

Thanks, Richard, you gave me enough insight to fix this problem. FastK now uses -c (I thought -k would signal the same thing, but no) to do the unpacking and writes the result to the -P temp directory used for sorting. It seems to work just fine.

Bob, Richard, let me know if there are any further problems.
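The fix described in the last comment can be sketched in C roughly as follows. The thread does not show FastK's actual code, so the helper names, the use of system(), and the single-quote quoting (which assumes paths contain no single quotes) are all assumptions. The sketch decompresses with `gzip -dc` into a scratch directory standing in for the -P temp directory and, unlike the behaviour reported at the top of the thread, checks gzip's exit status immediately instead of discovering the failure later.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: build "<tmpdir>/<basename without .gz>" from a path
 * such as "data/orange.fa.gz". Returns 0 on success, -1 if it won't fit. */
static int unzipped_path(const char *src, const char *tmpdir,
                         char *out, size_t outlen)
{
    const char *base = strrchr(src, '/');
    base = base ? base + 1 : src;
    size_t len = strlen(base);
    if (len > 3 && strcmp(base + len - 3, ".gz") == 0)
        len -= 3;
    int n = snprintf(out, outlen, "%s/%.*s", tmpdir, (int)len, base);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Run a shell command and return its exit status,
 * or -1 if it could not be run or did not exit normally. */
static int run_checked(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Decompress `src` into `tmpdir` via "gzip -dc ... > dst", so nothing is
 * written next to the (possibly read-only, possibly symlinked) original.
 * On failure, report it, delete any partial output, and return -1. */
static int unzip_to_tmpdir(const char *src, const char *tmpdir,
                           char *dst, size_t dstlen)
{
    char cmd[4096];
    if (unzipped_path(src, tmpdir, dst, dstlen) != 0)
        return -1;
    int n = snprintf(cmd, sizeof cmd, "gzip -dc '%s' > '%s'", src, dst);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;
    if (run_checked(cmd) != 0) {
        fprintf(stderr, "decompression of %s failed\n", src);
        remove(dst);
        return -1;
    }
    return 0;
}
```

For example, `unzip_to_tmpdir("shared/orange.fa.gz", "/tmp", dst, sizeof dst)` would aim to produce /tmp/orange.fa while leaving the shared directory untouched; reading from standard output with -c is consistent with the report above that this works where plain gunzip refused the symlink.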
