Description
Background
RFC 3986 defines the authority component as follows:
authority = [ userinfo "@" ] host [ ":" port ]
The WHATWG URL Standard says:
An opaque-host-and-port string must be either the empty string or: a valid opaque-host string, optionally followed by U+003A (:) and a URL-port string.
A scheme-relative-special-URL string must be "//", followed by a valid host string, optionally followed by U+003A (:) and a URL-port string, optionally followed by a path-absolute-URL string.
The Bug
ParseResult.from_string will parse a port number which is not preceded by a colon. This conflicts with both specifications.
Minimal Reproducible Example
import rfc3986
# Arguments are (URL, encoding, strict, lazy_normalize)
parsed_url = rfc3986.ParseResult.from_string('scheme://[v1.ip]8000/path', 'utf-8', True, False)
print("Host: " + str(parsed_url.host)) # prints 'Host: [v1.ip]'
print("Port: " + str(parsed_url.port)) # prints 'Port: 8000'
Cause
This is the regex used to parse the authority component in misc.py:
SUBAUTHORITY_MATCHER = re.compile(
    (
        "^(?:(?P<userinfo>{})@)?"  # userinfo
        "(?P<host>{})"  # host
        ":?(?P<port>{})?$"  # port
    ).format(
        abnf_regexp.USERINFO_RE, abnf_regexp.HOST_PATTERN, abnf_regexp.PORT_RE
    )
)
This bug results from the first '?' character in the port portion of the regex, :?(?P<port>{})?$. That '?' makes the colon optional independently of the optional port number, so a port can match with no colon in front of it. According to both specs, however, a port number must always be preceded by a colon.
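One possible fix, sketched below, is to put the colon and the port inside a single optional group so that neither can match without the other. This is only an illustration: `USERINFO_RE`, `HOST_PATTERN`, and `PORT_RE` here are simplified stand-ins that I made up so the snippet runs on its own, not the real definitions from abnf_regexp, and `CURRENT` and `PROPOSED` are hypothetical names.

```python
import re

# Simplified stand-in patterns (assumptions for this sketch only; the real
# definitions live in abnf_regexp and are more thorough).
USERINFO_RE = r"[A-Za-z0-9._~%!$&'()*+,;=:-]+"
HOST_PATTERN = r"\[[^\]]*\]|[A-Za-z0-9._~%-]*"
PORT_RE = r"[0-9]{1,5}"

# Same shape as the existing matcher: colon and port are optional separately.
CURRENT = re.compile(
    (
        "^(?:(?P<userinfo>{})@)?"  # userinfo
        "(?P<host>{})"  # host
        ":?(?P<port>{})?$"  # port
    ).format(USERINFO_RE, HOST_PATTERN, PORT_RE)
)

# Sketch of a fix: the colon and the port share one optional group.
PROPOSED = re.compile(
    (
        "^(?:(?P<userinfo>{})@)?"  # userinfo
        "(?P<host>{})"  # host
        "(?::(?P<port>{}))?$"  # ":" port, only as a pair
    ).format(USERINFO_RE, HOST_PATTERN, PORT_RE)
)

for authority in ("[v1.ip]8000", "[v1.ip]:8000", "example.com"):
    old = CURRENT.match(authority)
    new = PROPOSED.match(authority)
    print(
        authority,
        "| current port:", old.group("port") if old else "no match",
        "| proposed port:", new.group("port") if new else "no match",
    )
```

With these stand-in patterns, the current shape accepts `[v1.ip]8000` with port `8000`, while the proposed shape rejects it and still accepts `[v1.ip]:8000`. One thing to keep in mind: this sketch also rejects a trailing colon with no digits (e.g. `example.com:`), even though RFC 3986 defines `port = *DIGIT` and so permits an empty port. Whether that case should stay accepted is a separate decision.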