
Handle reserved characters in filepaths #1629

Open: Jaliborc wants to merge 4 commits into master


Jaliborc commented May 16, 2019

These two added lines of code let WebTorrent handle torrents and files whose paths use UNIX or POSIX reserved characters, as the majority of torrent clients do.

It uses a tiny package I created to replace reserved characters with visually similar Unicode characters.
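
As a rough illustration of the idea (not the code of the package used in this pull request), a look-alike replacement could be sketched as follows. The mapping table, `replaceReserved` and `sanitizePath` are hypothetical names and choices made for this example only.

```javascript
// Hypothetical mapping, NOT the table used by the package in this pull request.
const LOOKALIKES = {
  ':': '\uA789', // MODIFIER LETTER COLON
  '/': '\u2215', // DIVISION SLASH
  '\\': '\u29F5', // REVERSE SOLIDUS OPERATOR
  '*': '\u2217', // ASTERISK OPERATOR
  '?': '\uFF1F', // FULLWIDTH QUESTION MARK
  '"': '\u201D', // RIGHT DOUBLE QUOTATION MARK
  '<': '\u2039', // SINGLE LEFT-POINTING ANGLE QUOTATION MARK
  '>': '\u203A', // SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
  '|': '\u2223' // DIVIDES
}

// Replace reserved characters in ONE path segment (a file or folder name).
function replaceReserved (segment) {
  return segment.replace(/[:\/\\*?"<>|]/g, ch => LOOKALIKES[ch])
}

// Apply the replacement to every segment of a torrent-relative path,
// leaving the '/' separators between segments untouched.
function sanitizePath (torrentPath) {
  return torrentPath.split('/').map(replaceReserved).join('/')
}
```

Calling `sanitizePath('someFolder/This is some file: with a colon.mp4')` keeps the folder separator and swaps only the colon for its look-alike.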

See issue #1618

Jaliborc added 4 commits May 14, 2019

welcome bot commented May 16, 2019

🙌 Thanks for opening this pull request! You're awesome.

Contributor

jimmywarting commented May 16, 2019

Hmm, I think we shouldn't touch or edit the filename or path like that. The browser can handle whatever. A UNIX system should be able to save UNIX characters, just as a POSIX system should be able to save POSIX characters.

If I want to transfer files from Mac to Mac, I would want to preserve the characters in the path/name.

So should it be handled in the place where you actually read/write file(s) to/from the disk?
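
One way to picture this suggestion is a sketch where the torrent's own name is never modified and a disk-safe name is derived only when writing, depending on the platform. The character lists and the names `RESERVED_BY_PLATFORM`, `toDiskName` and `describeFile` are assumptions for illustration, not part of WebTorrent.

```javascript
// Assumed reserved-character sets per platform (illustrative, not exhaustive).
const RESERVED_BY_PLATFORM = {
  win32: /[<>:"\/\\|?*]/g,
  posix: /\//g // only the path separator; NUL is ignored in this sketch
}

// Derive the name used on disk for ONE file or folder name.
// Unknown platforms fall back to the POSIX rule.
function toDiskName (name, platform) {
  const reserved = RESERVED_BY_PLATFORM[platform] || RESERVED_BY_PLATFORM.posix
  return name.replace(reserved, '_')
}

// Keep the untouched torrent name next to the name used for disk I/O.
function describeFile (name, platform) {
  return { name, diskName: toDiskName(name, platform) }
}
```

In Node.js, `describeFile(name, process.platform)` would then leave a colon alone on macOS and replace it only on Windows.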

Author

Jaliborc commented May 18, 2019

You need to do it. All end-user torrent clients do it as well. Currently, WebTorrent tries to write the files with the reserved characters in their names, which fails. For example:
cache_directory_given/someFolder/This is some file: with a colon.mp4

On Windows, this actually creates a 0 KB file at cache_directory_given/someFolder/This is some file, which WebTorrent then cannot write to, so the file is never downloaded. To be able to download these files, I had to edit WebTorrent like this, because the public API offered no way to make the client work... and this should work by default anyway.

Author

Jaliborc commented May 18, 2019

For reference, BitTorrent replaces every single reserved character with an underscore. This implementation is smarter, as most of the time the change isn't even visible to the eye.

Contributor

jimmywarting commented May 18, 2019

I'm not saying we shouldn't do it. I just said it could be done in another way.
We also have to think about what would happen if the torrent had several similar files.

What if a torrent had these 4 files?

  • This is some file: with a colon.mp4
  • This is some file_ with a colon.mp4
  • This is some file with a colon.mp4
  • This is some file

(Maybe it would be worth having both an original and a renamed path/name.)
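
To make the concern concrete, here is a small sketch that applies a naive underscore replacement to the four names above and reports which originals end up sharing a renamed name. The helpers `underscore`, `renameAll` and `findCollisions` are hypothetical, and the `{ original, renamed }` shape is just one way to keep both names.

```javascript
const files = [
  'This is some file: with a colon.mp4',
  'This is some file_ with a colon.mp4',
  'This is some file with a colon.mp4',
  'This is some file'
]

// Naive strategy: every reserved character becomes an underscore.
const underscore = name => name.replace(/[<>:"\/\\|?*]/g, '_')

// Keep both names: `original` from the torrent, `renamed` for the disk.
function renameAll (names, rename) {
  return names.map(original => ({ original, renamed: rename(original) }))
}

// Group originals by renamed name; return only groups with more than one member.
function findCollisions (entries) {
  const groups = new Map()
  for (const entry of entries) {
    const group = groups.get(entry.renamed) || []
    group.push(entry.original)
    groups.set(entry.renamed, group)
  }
  return [...groups.values()].filter(group => group.length > 1)
}

const collisions = findCollisions(renameAll(files, underscore))
```

With the underscore strategy, the first two names collapse into the same renamed name, so `collisions` contains one group holding both of them.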

Author

Jaliborc commented May 18, 2019

I do see your point, but I think you have misunderstood this implementation. Most clients simply replace reserved characters with an underscore, as in the list in your comment, apparently without issue. This one goes one step further: This is some file: with a colon.mp4 becomes This is some file꞉ with a colon.mp4. The difference is not visible, but the colon was replaced with a native Canadian character used by the Inuit.

I have a server currently processing over 400,000 torrents and have not found a single one using any of the replacement characters. But yes, in theory a duplicate is still possible, even though ridiculously unlikely. My suggestion for this case would be:

  1. Define an escape character
  2. Replace the reserved characters as in this pull request
  3. Check if replacement creates duplicates
  4. If so, append the escape character to the replaced character
  5. Repeat from step 3

Only a dynamic approach can guarantee that there are never duplicates. It wouldn't hurt to also define a property holding the original path, of course, which users can access if they need it.
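
The five steps above could be sketched roughly as follows. This is one possible reading, not code from the pull request: the escape character `~`, the reduced replacement table and the names `replaceWithEscapes` and `assignDiskNames` are all assumptions. Names without reserved characters are claimed first, so an untouched name is never displaced by a renamed one.

```javascript
const ESCAPE = '~' // step 1: arbitrary escape character chosen for this sketch
const RESERVED_CHARS = /[<>:"\/\\|?*]/g
// Reduced table for brevity: a look-alike for ':' and '_' for everything else.
const REPLACEMENTS = { ':': '\uA789' }

// Steps 2 and 4: replace each reserved character, followed by `depth` escapes.
function replaceWithEscapes (name, depth) {
  return name.replace(RESERVED_CHARS, ch =>
    (REPLACEMENTS[ch] || '_') + ESCAPE.repeat(depth))
}

// Returns a Map from original name to the name used on disk,
// which also keeps the original name accessible.
function assignDiskNames (names) {
  const result = new Map()
  const taken = new Set()

  // Names without reserved characters stay as they are and are claimed first.
  for (const name of names) {
    if (replaceWithEscapes(name, 0) === name) {
      result.set(name, name)
      taken.add(name)
    }
  }

  for (const name of names) {
    if (result.has(name)) continue
    let depth = 0
    let candidate = replaceWithEscapes(name, depth)
    // Steps 3 to 5: while the candidate duplicates a taken name, add one escape.
    while (taken.has(candidate)) {
      depth += 1
      candidate = replaceWithEscapes(name, depth)
    }
    result.set(name, candidate)
    taken.add(candidate)
  }
  return result
}
```

The loop always terminates because each round lengthens the candidate, while the set of taken names is finite.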

Author

Jaliborc commented May 23, 2019

@jimmywarting I found an additional issue, and for this one I don't have a good suggestion. It's very rare, but some torrent file paths exceed the maximum length allowed by some operating systems (cough, Windows 10 again). This results in weird undefined behavior: the file is written, but it cannot be renamed, copies cannot be created under most circumstances, and many applications/processes cannot find it (Windows Explorer does, though).
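
Detection, at least, is straightforward. The sketch below flags torrent paths that would exceed the classic Win32 `MAX_PATH` limit of 260 characters (terminating NUL included) once joined to a download directory. It assumes long-path support is not enabled, and `findTooLongPaths` is a hypothetical helper, not a WebTorrent API.

```javascript
// Classic Win32 limit, counted in UTF-16 code units, including the terminating NUL.
const WINDOWS_MAX_PATH = 260

// Return the torrent-relative paths whose full path would not fit in `limit`.
function findTooLongPaths (downloadDir, torrentPaths, limit = WINDOWS_MAX_PATH) {
  return torrentPaths.filter(relativePath => {
    const fullPath = `${downloadDir}\\${relativePath}`
    return fullPath.length + 1 > limit // + 1 for the terminating NUL
  })
}
```

What to do with the flagged files (shorten, skip, or warn) is left open here, as it is in the comment above.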

Contributor

jimmywarting commented May 23, 2019

eh, what is the maximum length allowed?
