ArteTV support broken by unicode encoded JSON URLs #21619
Comments
Duplicates #21614 I guess.
Verbose log
Description
Since yesterday (2019/07/02), video URLs on the arte.tv website that had been working until then can no longer be downloaded.
I don't know what changed, but I figured out that the extractor, in `_extract_from_webpage()`, looks for a JSON URL with `find_iframe_url()` by searching for a pattern starting with `<iframe`, but matches nothing because the JSON data is unicode-escaped in the response stored in the `webpage` variable, so the tag appears as `\\u003Ciframe` instead.

I found that by unescaping the retrieved webpage as follows, the expected behavior (i.e. videos getting downloaded) is restored:
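To illustrate the idea, here is a minimal standalone sketch of that kind of unescaping. It is not the actual change made in the extractor: the helper name `unescape_unicode`, the sample `webpage` string, and the `example.invalid` URL are all hypothetical, and the code assumes Python 3.

```python
import re


def unescape_unicode(text):
    """Replace literal \\uXXXX sequences with the characters they encode.

    Hypothetical helper for illustration only. It handles four-digit
    escapes one at a time and does not combine surrogate pairs.
    """
    return re.sub(
        r'\\u([0-9a-fA-F]{4})',
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Made-up response fragment: the raw string keeps the backslashes literal,
# the way they would appear in an escaped page body.
webpage = r'{"embed": "\u003Ciframe src=https://example.invalid/player\u003E\u003C/iframe\u003E"}'

unescaped = unescape_unicode(webpage)

# A search for a pattern starting with "<iframe" only succeeds after unescaping.
print('<iframe' in webpage)
print('<iframe' in unescaped)
```

A regular expression is used here instead of a codec round trip so that any non-ASCII characters already present in the page are left untouched.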
I don't know whether this is the piece of the extractor that got broken by a recent change in the website response, but I can confirm it restores video downloading functionality for the URLs that recently stopped working.
Please note that most `_TESTS` URLs are now outdated and it's better to test with fresh URLs from https://www.arte.tv.