Youtube link regex fails for youtu.be links. #55

Closed
edelooff opened this issue Jan 10, 2011 · 2 comments
Labels
bug

Comments


edelooff commented Jan 10, 2011

Given the example link "http://youtu.be/watch?v=COiIC3A0ROM" and the `_VALID_URL` regex in the Jan 7 checkout, `groups()` returns the following, so "watch" is taken as the video ID:
('http://youtu.be/', 'watch')

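The bug is a closing parenthesis that is placed too late. There should be an extra one right after the ".com/" part, closing the group that matches the domain (`youtu.be/` or `\w+.youtube.com/`). Without it, the `v/` and `watch?v=` path alternatives belong only to the youtube.com branch, so for a youtu.be link the URL part stops right after the domain.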
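To make the effect of that parenthesis concrete, here is a small sketch. The Jan 7 pattern itself is not quoted in this issue, so `BUGGY_VALID_URL` below is a hypothetical reconstruction: the corrected pattern with the domain group closed only after the path alternatives. It reproduces the `groups()` result reported above.

```python
import re

# Hypothetical reconstruction of the broken pattern (not quoted in this issue):
# the group starting at "youtu\.be/|" is only closed after the path
# alternatives, so 'v/' and 'watch?v=' belong to the youtube.com branch alone.
BUGGY_VALID_URL = (
    r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/'
    r'(?:(?:v/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=))))?'
    r'([0-9A-Za-z_-]+)(?(1).+)?$'
)

# youtube.com links still work, because the path is part of that branch.
print(re.match(BUGGY_VALID_URL, 'http://www.youtube.com/watch?v=COiIC3A0ROM').groups())
# ('http://www.youtube.com/watch?v=', 'COiIC3A0ROM')

# For youtu.be, group 1 stops at the domain, 'watch' is taken as the video ID,
# and the conditional (?(1).+)? swallows the rest of the URL.
print(re.match(BUGGY_VALID_URL, 'http://youtu.be/watch?v=COiIC3A0ROM').groups())
# ('http://youtu.be/', 'watch')
```

Moving that parenthesis to right after `.com/` makes the path alternatives apply to both domains.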

The regular expression that "works for me" is as follows:
`_VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?:(?:v/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=)))?([0-9A-Za-z_-]+)(?(1).+)?$'`


edelooff commented Jan 10, 2011

An annotated version of the above. It is not as verbose as it could be, but it explains the individual parts:

import re

_VALID_URL = re.compile(r"""
^(                          # Start matching an optional URL.
(?:https?://)?              # The scheme may be 'http', 'https' or missing.
(?:youtu\.be/|              # The domain can be youtu.be ...
(?:\w+\.)?youtube           # ... or any youtube subdomain ...
(?:-nocookie)?\.com/)       # ... with or without -nocookie in the domain name.
(?:(?:v/)|                  # The path might start with 'v/', ...
(?:(?:watch(?:_popup)?      # ... or with 'watch', with or without '_popup' ...
(?:\.php)?)?                # ... and might have a trailing '.php'.
(?:\?|\#!?)                 # Info may be in the query string or the fragment ...
(?:.+&)?                    # ... and may contain leading query arguments.
v=)))?                      # But it WILL contain a 'v=' for the video ID!
([0-9A-Za-z_-]+)            # Capture the video ID (group 2).
(?(1).+)?$                  # If the video ID was in a URL, match the rest.
""", re.VERBOSE)

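The annotated form can be checked the same way. This sketch lays the pattern out under `re.VERBOSE` (grouped slightly differently from the comment above) and again uses a hypothetical `video_id` helper. Under `re.VERBOSE`, whitespace is ignored and an unescaped `#` starts a comment, so the `#!` fragment marker has to be written as `\#`; the last lines show that leaving it unescaped makes compilation fail.

```python
import re

# Verbose form of the corrected pattern. Under re.VERBOSE, whitespace is
# ignored and an unescaped '#' starts a comment, so the '#!' fragment
# marker is written as '\#'.
VERBOSE_VALID_URL = re.compile(r"""
    ^(                                          # group 1: optional URL prefix
      (?:https?://)?                            # scheme, may be missing
      (?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)
      (?:(?:v/)|                                # '/v/<id>' style path, or ...
         (?:(?:watch(?:_popup)?(?:\.php)?)?     # ... an optional watch page,
            (?:\?|\#!?)                         # a query string or fragment,
            (?:.+&)?v=))                        # other arguments, then 'v='
    )?
    ([0-9A-Za-z_-]+)                            # group 2: the video ID
    (?(1).+)?$                                  # rest of the URL, if any
""", re.VERBOSE)


def video_id(url):
    """Hypothetical helper: return the video ID (group 2), or None."""
    mobj = VERBOSE_VALID_URL.match(url)
    return mobj.group(2) if mobj else None


print(video_id('http://youtu.be/watch?v=COiIC3A0ROM'))     # COiIC3A0ROM
print(video_id('http://www.youtube.com/#!v=COiIC3A0ROM'))  # COiIC3A0ROM

# With the '#' left unescaped, the rest of that line becomes a comment,
# the group is never closed and compilation fails.
try:
    re.compile(r"(?:\?|#!?)", re.VERBOSE)
except re.error as exc:
    print('re.error:', exc)
```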

rg3 (collaborator) commented Jan 10, 2011

Fixed. Thanks.

This issue was closed.