option to sanitize the user name (%(uploader)s) ... #483

Open
Albretch opened this issue Oct 21, 2012 · 9 comments

Comments

@Albretch Albretch commented Oct 21, 2012

I had pointed out an issue already about using --output TEMPLATE options
~
#414
~
Here is another one: there should be an option to sanitize the user name (%(uploader)s) as well. I think the best way to do this is by escaping all characters that are not safe for filesystems (even capital letters, for VFAT, since pen drives come preformatted as VFAT)
~
Check out for example:
~
youtube.com/watch?v=q8K7GmEW7Lw
~
that user name includes exclamation marks "!" which makes both vlc and mplayer stumble on it even if you use quotes
~
vlc "!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv"
bash: !Thiago: event not found
~
mplayer "!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv"
bash: !Thiago: event not found
~
thanks
lbrtchx

@Plaque-fcc Plaque-fcc commented Oct 21, 2012

Either use a backslash escape (mplayer \!\!\!Thiago\ …) or use single quotation marks (') to prevent the shell from expanding your input.

But sanitizing the (uploader) field is quite a nice proposal, since one may wish to have it sorted in some special way.


@Plaque-fcc Plaque-fcc commented Oct 21, 2012

bash: !Thiago: event not found

I guess that this reply relates neither to mplayer nor vlc. Of
course, if you use youtube-dl with a URI which contains ampersands
(«&») and don't quote the URI or escape the ampersand, you'll get
youtube-dl running in the background, nothing else.

Author

@Albretch Albretch commented Oct 22, 2012

vlc "!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv"
bash: !Thiago: event not found
~
if you use youtube-dl with an URI which contains ampersands
(«&») and don't quote the URI or escape the ampersand ...
~
but there are no ampersands on that path and I did quote it?
~
lbrtchx

Contributor

@Tailszefox Tailszefox commented Oct 22, 2012

That was an example to say that some characters have special meanings for Bash, and thus the problem is related neither to VLC, mplayer nor youtube-dl.

Exclamation points also have a special meaning, and like @Plaque-fcc said they need to either be escaped or quoted with single-quotes; double-quotes will still make bash try to interpret them. So you should instead do vlc '!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv' and you'll be good to go.

Collaborator

@FiloSottile FiloSottile commented Oct 22, 2012

The ampersand thing was an example; what @Plaque-fcc wanted to say is that what you encountered is a bash shell feature (like &), and you need to either quote with ' or escape the !. So:

vlc \!\!\!Thiago\ Clarinete\ Flauta\ Sax/q8K7GmEW7Lw.flv

or

vlc '!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv'
Author

@Albretch Albretch commented Oct 22, 2012

~
we keep talking past each other ;-) (which is OK, it is part of
looking at an issue from different perspectives). My point is: OK,
there are issues relating to the file systems, the bash/shell
interpreter(s) and media players, but if you guys can simply fix all
those issues with a flag, why not solve them all at once? Say:
~
youtube-dl --full-fs-compliance ...
~
then all characters (including capital letters (remember vfat)) will be escaped
~
Python is not one of the languages I code in, but I would bet it
has libraries to escape character sequences with a simple function
call, as you can do in ANSI C, C++, and Java
~
Quite honestly, I find it naive to use a free-text token, which people,
very naturally indeed, could fill in to their heart's content, for any kind
of naming/referring strategy
~
thank you very much guys for maintaining this wonderful piece of code
lbrtchx

Contributor

@Tailszefox Tailszefox commented Oct 22, 2012

Oh, I think you are quite right regarding the original issue, actually. The point was to explain that the issue doesn't come from youtube-dl or the players, but it wasn't meant to undermine your point, I think it's still perfectly valid.

The issue is, the name of the uploader should be sanitized to prevent that sort of thing from happening. After all, the "stitle" template is already sanitized, so as far as I can tell it would only be a matter of applying the function sanitize_filename to that field as well. Maybe the function should even be called on the whole filename after it's been constructed with an additional option?
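
A minimal sketch of that idea, under stated assumptions: `sanitize_field` is a hypothetical stand-in and not youtube-dl's actual `sanitize_filename`, and the set of replaced characters, the `expand_template` helper, and the sample `info` dict are all made up for illustration.

```python
import re

# Hypothetical stand-in for a sanitizer; NOT youtube-dl's sanitize_filename.
# Characters assumed unsafe here: path separators, NUL, and a few characters
# that shells or FAT-style filesystems treat specially.
_UNSAFE = re.compile(r'[/\\\x00!?*:|"<>]')


def sanitize_field(value):
    """Replace each assumed-unsafe character with an underscore."""
    return _UNSAFE.sub('_', value)


def expand_template(template, info, sanitized_keys=('uploader', 'title')):
    """Expand an output template after sanitizing the selected fields."""
    safe_info = dict(info)
    for key in sanitized_keys:
        if key in safe_info:
            safe_info[key] = sanitize_field(safe_info[key])
    return template % safe_info


info = {  # sample values taken from the example in this thread
    'uploader': '!!!Thiago Clarinete Flauta Sax',
    'id': 'q8K7GmEW7Lw',
    'ext': 'flv',
}
print(expand_template('%(uploader)s/%(id)s.%(ext)s', info))
# prints: ___Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv
```

Sanitizing each field before it is substituted means a "/" inside an uploader name can never be mistaken for a directory separator that the template itself introduced.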

@Plaque-fcc Plaque-fcc commented Oct 22, 2012

Adding an option to sanitize the whole filename in a post-processing step would be the single shot that could make other sanitizations unnecessary when used.

I support this, as it can provide meaningful functionality.
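
One caveat of a whole-filename pass is worth sketching. The function below is hypothetical (the name `sanitize_whole_path` and the character set are assumptions, not anything youtube-dl provides); it sanitizes every component of an already expanded path and shows what such a pass can no longer see.

```python
import re

# Assumed set of characters to replace inside one path component.
_UNSAFE_IN_COMPONENT = re.compile(r'[\\\x00!?*:|"<>]')


def sanitize_whole_path(path):
    """Sanitize every '/'-separated component of an already expanded path."""
    parts = path.split('/')
    return '/'.join(_UNSAFE_IN_COMPONENT.sub('_', part) for part in parts)


print(sanitize_whole_path('!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv'))
# prints: ___Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv

# A '/' that came from a field value is indistinguishable from a template
# separator at this stage, so it silently adds a directory level.
print(sanitize_whole_path('AC/DC fan/q8K7GmEW7Lw.flv'))
# prints: AC/DC fan/q8K7GmEW7Lw.flv
```

So a post-processing pass can handle characters such as "!" in one shot, but path separators inside field values would still have to be handled before the template is expanded.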

Contributor

@phihag phihag commented Oct 22, 2012

When an application that you call cannot handle arbitrary strings (except, maybe, ones containing NUL characters), that application is buggy and should be fixed.

But I agree that we should provide filesystem-safe strings in the output template (and probably default to them). If we can identify broken (= Cannot support Unicode minus /, \, NUL) file systems like FAT, I'm also fine with conforming to their notions and encodings. Since I personally have no interest in using DOS-era file systems, I'll leave it up to the community (that's you) to implement and test the proposed sanitization.

Filesystem-safe strings will not solve your original problem, which is improper encoding into shells. We can offer a flag to remove all characters not in a minimal set of characters - say [A-Z0-9_,], but I'm pretty sure there's a shell or language somewhere in which , cannot occur in unescaped strings. So there's really no way around proper encoding.
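
To illustrate what such a minimal-set flag would do, here is a rough sketch. The helper name `restrict` is hypothetical; only the character class `[A-Z0-9_,]` comes from the comment above.

```python
import re


def restrict(value):
    """Drop every character outside the minimal set [A-Z0-9_,]."""
    return re.sub(r'[^A-Z0-9_,]', '', value)


print(restrict('!!!Thiago Clarinete Flauta Sax'))  # prints: TCFS
print(repr(restrict('!!!')))                       # prints: ''
```

The result is trivially safe to type, but lowercase letters and spaces are lost, different names can collapse to the same string, and a name made only of removed characters becomes empty, so a real flag would presumably need a fallback for that case.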

If you're calling vlc/mplayer in an automated script, encode properly (in shells, that usually entails some kind of escaping, for example putting the variable in quotes in a shell script). If you're calling vlc/mplayer interactively, you need to know your shell's escaping rules. Most shells can help you significantly if you press Tab. For example, a modern bash will automatically complete

vlc !                   # press Tab
vlc \!\!\!Thiago\ Clarinete\ Flauta\ Sax/q8K7GmEW7Lw.flv

Alternatively, you can use wildcards, like

vlc *Clarinete*/q8K7GmEW7Lw*
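
For the automated-script case, a minimal Python sketch of "encode properly" could look like the following. It assumes a POSIX shell for the quoting part, and it uses the Python interpreter as a stand-in for the player so that it runs without vlc installed.

```python
import shlex
import subprocess
import sys

path = '!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv'

# Option 1: build a shell command line with the argument quoted
# for POSIX shells.
command_line = 'vlc ' + shlex.quote(path)
print(command_line)
# prints: vlc '!!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv'

# Option 2: skip the shell entirely by passing an argument list.
# The Python interpreter stands in for the player here; it just echoes
# the argument it received.
result = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.argv[1])', path],
    capture_output=True, text=True, check=True,
)
print(result.stdout.rstrip('\n'))
# prints: !!!Thiago Clarinete Flauta Sax/q8K7GmEW7Lw.flv
```

With the argument-list form no shell ever parses the file name, so "!", spaces, and "&" need no escaping at all.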