Fixed #69. sanitize_url no longer crashes on unparsable urls.
Also optimized the code to bypass parsing when not in safe_mode and return
immediately upon failure rather than continue parsing when in safe_mode.

Note that in Python 2.7+ more urls may fail than in older versions because
IPv6 support was added to urlparse, and it apparently misidentifies some
urls as IPv6 when they are not. Since this only applies to safe_mode now,
I don't really care.
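
One way the parse failure described above can surface is a stray square bracket in the network location, which urlparse treats as a malformed IPv6 literal. The snippet below is a minimal sketch, not a reproduction of issue #69. It assumes Python 3, where the old `urlparse` module lives in `urllib.parse`, and `can_parse` is a hypothetical helper that is not part of Markdown.

```python
from urllib.parse import urlparse  # Python 3 home of the Python 2 `urlparse` module


def can_parse(url):
    """Return True if urlparse accepts the URL, False if it raises ValueError."""
    try:
        urlparse(url)
    except ValueError:
        return False
    return True


print(can_parse("http://[::1]:8080/"))  # well-formed IPv6 literal -> True
print(can_parse("http://[oops/path"))   # unbalanced '[' in the netloc -> False
```

Catching `ValueError` around the call is the same guard the commit adds to `sanitize_url`.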
Waylan Limberg authored and mdirolf committed Jan 14, 2012
1 parent 12baab2 commit 35930e0
Showing 1 changed file with 18 additions and 9 deletions.
27 changes: 18 additions & 9 deletions markdown/inlinepatterns.py
@@ -311,20 +311,29 @@ def sanitize_url(self, url):
         `username:password@host:port`.
         """
+        if not self.markdown.safeMode:
+            # Return immediately bypassing parsing.
+            return url
+
+        try:
+            scheme, netloc, path, params, query, fragment = url = urlparse(url)
+        except ValueError:
+            # Bad url - so bad it couldn't be parsed.
+            return ''
+
         locless_schemes = ['', 'mailto', 'news']
-        scheme, netloc, path, params, query, fragment = url = urlparse(url)
-        safe_url = False
-        if netloc != '' or scheme in locless_schemes:
-            safe_url = True
+        if netloc == '' or scheme not in locless_schemes:
+            # This fails regardless of anything else.
+            # Return immediately to save additional processing
+            return ''
 
         for part in url[2:]:
             if ":" in part:
-                safe_url = False
+                # Not a safe url
+                return ''
 
-        if self.markdown.safeMode and not safe_url:
-            return ''
-        else:
-            return urlunparse(url)
+        # Url passes all tests. Return url as-is.
+        return urlunparse(url)
 
 class ImagePattern(LinkPattern):
     """ Return a img element from the given match. """
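
The control flow this commit introduces (bypass when not in safe mode, bail out on a parse error, then return early on each failed check) can be sketched as a standalone function. This is an illustration under stated assumptions, not the project's code: it targets Python 3's `urllib.parse`, and `sanitize_url_sketch` with its `safe_mode` parameter are hypothetical stand-ins for the method and `self.markdown.safeMode`. One deliberate difference: the diff joins the two clauses of the early-return test with `or`, whereas the removed check `netloc != '' or scheme in locless_schemes` negates to an `and` by De Morgan's law. The sketch uses `and`, which is an assumption about the intent and not something the commit states.

```python
from urllib.parse import urlparse, urlunparse

# Schemes that are allowed to have no network location (taken from the diff).
LOCLESS_SCHEMES = ('', 'mailto', 'news')


def sanitize_url_sketch(url, safe_mode=True):
    """Return the URL if it passes the safe-mode checks, else an empty string."""
    if not safe_mode:
        # Outside safe mode, skip parsing entirely.
        return url

    try:
        parts = urlparse(url)
    except ValueError:
        # So malformed that it could not be parsed.
        return ''

    # Assumption: `and` is the strict negation of the removed check.
    if parts.netloc == '' and parts.scheme not in LOCLESS_SCHEMES:
        return ''

    # A colon in path, params, query, or fragment marks the URL as unsafe.
    for part in parts[2:]:
        if ':' in part:
            return ''

    return urlunparse(parts)


print(repr(sanitize_url_sketch("http://example.com/path")))  # kept
print(repr(sanitize_url_sketch("javascript:alert(1)")))      # rejected
print(repr(sanitize_url_sketch("http://[oops/path")))        # unparsable, rejected
```

With the `or` form as committed, a URL such as `http://example.com/path` would also be rejected, because `http` is not in the locless list.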
