youtube 404 error due to bad regex #346

Closed
fenollp opened this issue May 12, 2012 · 6 comments

Contributor

@fenollp fenollp commented May 12, 2012

Hi,

Rationale:
When watching a YouTube video embedded on another website, clicking the YouTube icon in the SWF player opens the video's page. I usually pass that page's URL to youtube-dl. But sometimes the URL carries extra info about where I was in the video's timeline; it shows up as some wild '#!' or the like… and so it 404s.
More generally, if v isn't the first GET parameter, the download fails.

URL: http://www.youtube.com/watch?feature=player_embedded&v=0w_xEUoK79o#!
youtube-dl --version: 2012.02.27
python --version: Python 2.7.1

Well, I like regexes (yep, I'm totally into SM) and I don't really know how to help.
In order to fix this: as soon as you've found out that you're fed a YouTube link, look for something like “v=([0-9a-zA-Z+_-]{11})”.
That's it.
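
A minimal sketch of the suggestion above, in Python since that is what youtube-dl is written in. The helper name `find_video_id` is hypothetical and this is not youtube-dl's actual extraction code; it only illustrates that searching for the `v` parameter anywhere in the URL makes the parameter order irrelevant. The pattern is used exactly as proposed, so it would also match inside a longer parameter name that happens to end in `v`.

```python
import re

# Pattern copied from the suggestion: "v=" followed by an 11-character ID.
VIDEO_ID_RE = re.compile(r'v=([0-9a-zA-Z+_-]{11})')


def find_video_id(url):
    """Return the 11-character ID found after 'v=', or None if there is none."""
    match = VIDEO_ID_RE.search(url)  # search, not match: 'v=' may sit anywhere
    return match.group(1) if match else None


# 'v' as the second parameter, followed by a '#!' fragment
print(find_video_id('http://www.youtube.com/watch?feature=player_embedded&v=0w_xEUoK79o#!'))
# 'v' as the first parameter
print(find_video_id('http://www.youtube.com/watch?v=0w_xEUoK79o&feature=player_embedded'))
```

Both calls return the same ID, whichever position the `v` parameter is in.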

And thanks for all the fish!

Contributor Author

@fenollp fenollp commented May 12, 2012

Well, I finally managed to get to the one line (: 1175)
_VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?!view_play_list|my_playlists|artist|playlist)(?:(?:(?:v|embed|e)/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))?)?([0-9A-Za-z_-]+)(?(1).+)?$'
What the hell?
I mean, all those (?: … ) are not catching useful info. If I may, I would do
_VALID_URL = r'(:?//[^/]+youtu\.?be[^/]+/[^/]+)v=([0-9a-zA-Z+_-]{11})'
where [^/] represents a non-slash character. All YouTube video IDs fit in a string of eleven chars. Lovely, huh?

Whatcha think?

Collaborator

@FiloSottile FiloSottile commented May 15, 2012

This is a really badly formatted Gist that explains the regex. The world is never as simple as it looks, guy.
https://gist.github.com/2176911

Anyway, I am looking into your bug, thanks for reporting it!

Contributor Author

@fenollp fenollp commented May 15, 2012

I enjoy being corrected! (remember the S&M part…) and yeah, I see what you did there. I'll try to keep the negative lookaround on the playlists & the different params allowed:
_VALID_URL = r'(:?//[^/]+youtu\.?be[^/]+/)?(:?[^/]+(?!view_play_list|my_playlists|artist|playlist)(:?(?:v|embed|e)/|[^/]+v=))?([0-9a-zA-Z+_-]{11})'
It may be too ugly. Let's split it into 3 blocks:
(:?//[^/]+youtu\.?be[^/]+/)? ⟺ a YT website or none
(:?[^/]+(?!view_play_list|my_playlists|artist|playlist)(:?(?:v|embed|e)/|[^/]+v=))? ⟺ no playlists, and v/ or v=
([0-9a-zA-Z+_-]{11}) ⟺ then the ID (note the '+' char. It may only be there to shape a blinking head :)

Voilà. It works fine with my link and those in your Gist. And sorry for that badly formatted one!
Thanks for reading.

Contributor Author

@fenollp fenollp commented May 15, 2012

Oh man this is stupid.
It turns out to be a shell problem. Add quotes around your URL and all is well.
The unquoted & is interpreted by the shell (/usr/bin/env bash here), and the regex is not the cause.
Still, the issue is that you end up with this kind of URL just by clicking in the player's area to open a new window. It's a practical issue, if you see what I mean.
<3
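
To make the shell behaviour concrete, here is a small Python sketch. The `unquoted_argument` helper is hypothetical and only a rough stand-in for what a shell does with an unquoted `&` (it is not a real shell parser); `shlex.quote` is the standard-library function for producing a safely quoted argument.

```python
import shlex

url = 'http://www.youtube.com/watch?feature=player_embedded&v=0w_xEUoK79o#!'


def unquoted_argument(raw):
    """Rough simulation (an assumption): an unquoted '&' ends the command,
    so the program only receives the text before the first '&'."""
    return raw.split('&', 1)[0]


# What the program would receive without quotes: the v= parameter is gone.
seen_by_program = unquoted_argument(url)
print(seen_by_program)

# shlex.quote wraps the URL in single quotes so the shell passes it through whole.
command = 'youtube-dl ' + shlex.quote(url)
print(command)
```

With the `v=` parameter cut off before the program ever sees it, no regex on the program's side can recover the video ID, which is why quoting the URL is the fix.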

Collaborator

@FiloSottile FiloSottile commented May 23, 2012

Don't worry, happens often :)
However, I don't quite understand the remaining issue, or what youtube-dl could do about it.

@fenollp fenollp closed this Apr 18, 2014
Contributor

@phihag phihag commented Apr 21, 2014

For reference, we have done something about it; the above command now outputs

ERROR: Did you forget to quote the URL? Remember that & is a meta character in most shells, so you want to put the URL in quotes, like youtube-dl "http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc" or simply youtube-dl BaW_jenozKc .
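
As an illustration of how such a check could work, here is a hedged Python sketch. It is not youtube-dl's actual implementation: the function names `looks_truncated` and `quoting_hint` and the detection rule (a YouTube `/watch` URL whose query string has no `v` parameter) are assumptions made for this example.

```python
from urllib.parse import urlparse, parse_qs


def looks_truncated(url):
    """Hypothetical heuristic: a youtube.com /watch URL without a 'v' parameter
    was probably cut off by the shell at an unquoted '&'."""
    parts = urlparse(url)
    if not parts.netloc.endswith('youtube.com') or parts.path != '/watch':
        return False
    return 'v' not in parse_qs(parts.query)


def quoting_hint(url):
    """Return a hint for the user if the URL looks truncated, else None."""
    if looks_truncated(url):
        return 'ERROR: Did you forget to quote the URL?'
    return None


print(quoting_hint('http://www.youtube.com/watch?feature=foo'))
print(quoting_hint('http://www.youtube.com/watch?feature=foo&v=BaW_jenozKc'))
```

The first call returns the hint because the `v` parameter is missing; the second returns `None` because the URL arrived intact.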
