HTMLParser incorrectly decodes string as HTML entities #11798
Comments
HTML5 allows entities without a semicolon. Thus this is perfectly legal.

Albeit very stupid.

I guess it's PornFlip that provides broken HTML.
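The rule from the first comment can be reproduced with Python's standard `html.unescape`. It recognizes the legacy HTML5 entity names that may omit the semicolon and, when no exact name matches, uses the longest such name that prefixes the text. The manifest URL below is hypothetical; only its `&sectime` parameter matters.

```python
import html

# Hypothetical manifest URL carrying a 'sectime' query parameter.
url = 'https://example.com/manifest.mpd?id=1&sectime=30'

# 'sect' is a legacy entity name that HTML5 accepts without ';', so the
# longest-prefix match turns '&sect' into '§' and leaves 'ime=30' behind.
print(html.unescape(url))       # https://example.com/manifest.mpd?id=1§ime=30

# With the semicolon the reference is unambiguous.
print(html.unescape('&sect;'))  # §

# Escaping the ampersand as '&amp;' keeps the parameter intact.
print(html.unescape(url.replace('&', '&amp;')))  # prints the original URL
```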
Please follow the guide below
- Put an `x` into all the boxes [ ] relevant to your issue (like this: [x]).
- Make sure you are using the latest version: run `youtube-dl --version` and ensure your version is 2017.01.18. If it's not, read this FAQ entry and update. Issues with outdated versions will be rejected.

Before submitting an issue, make sure you have:
What is the purpose of your issue?
Description of your issue, suggested solution and other information
Due to the reliance on HTMLParser, any site, such as PornFlip (see #11795), whose MPD manifest URL contains the string `&sectime` gets it incorrectly decoded to `§ime`. I have no knowledge of why HTMLParser does this, as § HTML-encoded is `&sect;` (note the semicolon, like other HTML-encoded symbols). Because of this, I had to rely on extracting the MP4 links rather than simply calling `_parse_html5_media_entries`. I felt that I needed to inform the developers of this, in case this or a similar problem affects any current or future developer.
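One possible workaround, sketched here as an assumption rather than youtube-dl's actual fix, follows the HTML5 specification's special case for attribute values: a named reference without a trailing `;` that is directly followed by `=` or an ASCII alphanumeric character is left as literal text "for historical reasons", so `&sectime` inside an attribute such as a media URL should survive untouched. `unescape_attribute` and the example.com URL are hypothetical; the code relies only on the standard library's `html.unescape` and `html.entities.html5`.

```python
import html
import re
from html.entities import html5

# '&' followed by a decimal or hex numeric reference, or by an alphanumeric
# name, each with an optional trailing ';'. The name run is greedy, so the
# character after a semicolon-less name is never alphanumeric.
_CHARREF = re.compile(r'&(#[0-9]+;?|#[xX][0-9a-fA-F]+;?|[A-Za-z0-9]+;?)')


def unescape_attribute(value):
    """Hypothetical helper (not part of youtube-dl or the standard library):
    decode character references the way HTML5 does inside attribute values."""

    def repl(m):
        ref = m.group(1)
        if ref.startswith('#'):
            # Numeric references are not affected by the attribute rule.
            return html.unescape(m.group(0))
        if ref in html5:
            # Exact match such as 'sect;' or a legacy name such as 'amp'.
            if not ref.endswith(';') and value.startswith('=', m.end()):
                # Legacy name directly followed by '=': keep it literal.
                return m.group(0)
            return html5[ref]
        # Only a shorter legacy prefix could match (e.g. 'sect' in 'sectime'),
        # and it would be followed by an alphanumeric character, so the
        # reference is left as literal text.
        return m.group(0)

    return _CHARREF.sub(repl, value)


url = 'https://example.com/manifest.mpd?id=1&sectime=30'
print(unescape_attribute(url))                      # URL printed unchanged
print(unescape_attribute('a&amp;b &sect; &#167;'))  # a&b § §
```

Matching the whole alphanumeric run first is the design choice that makes this short: any shorter legacy name hidden inside the run (like `sect`) is necessarily followed by an alphanumeric character, which is exactly the case the spec tells the parser to leave alone.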