ArteTV support broken by unicode encoded JSON URLs #21619

Closed
desseim opened this issue Jul 3, 2019 · 1 comment

@desseim desseim commented Jul 3, 2019

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2019.07.02
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.arte.tv/fr/videos/065424-071-A/blow-up-c-est-quoi-claudia-cardinale/']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2019.07.02
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.18362
[debug] exe versions: ffmpeg N-90169-gf4709f1b7b, ffprobe N-90169-gf4709f1b7b
[debug] Proxy map: {}
[arte.tv:+7] blow-up-c-est-quoi-claudia-cardinale: Downloading webpage
[download] Downloading playlist: None
[arte.tv:+7] playlist None: Collected 0 video ids (downloading 0 of them)
[download] Finished downloading playlist: None

Description

Since yesterday (2019/07/02), arte.tv video URLs that had been working until then can no longer be downloaded.
I don't know what changed on the site, but I traced the failure to _extract_from_webpage(): it calls find_iframe_url() to locate the JSON URL by searching for a pattern starting with <iframe. Nothing matches, because the JSON data in the response stored in the webpage variable is unicode-escaped, so the tag appears as \\u003Ciframe instead.
Unescaping the retrieved webpage as follows restores the expected behavior (i.e. videos get downloaded):

--- arte.old.py
+++ arte.new.py
@@ -215,1 +215,1 @@
-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, video_id, encoding="unicode_escape")

I don't know whether this is the part of the extractor that a recent change in the website's response broke, but I can confirm that the change restores downloading for the URLs that recently stopped working.
Please note that most _TESTS URLs are now outdated, so it's better to test with fresh URLs from https://www.arte.tv.
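
The mismatch described above can be reproduced outside youtube-dl. The following is only a minimal sketch: the page fragment, the example.invalid URL, the IFRAME_RE pattern and the unescape() helper are all made up for illustration, and this find_iframe_url() is a simplified stand-in for the extractor's real function rather than a copy of it.

```python
import re

# Hypothetical page fragment: the iframe markup is embedded in a JSON string,
# so "<" arrives as the six literal characters \u003C (raw string keeps them).
webpage = r'{"html": "\u003Ciframe src=\"https://example.invalid/player?json_url=x\"\u003E"}'

# Simplified stand-in pattern (assumption, not the extractor's actual regex).
IFRAME_RE = r'<iframe[^>]+src=(["\'])(?P<url>.+?)\1'


def find_iframe_url(page):
    """Return the src of the first <iframe ...> tag, or None if none is found."""
    match = re.search(IFRAME_RE, page)
    return match.group('url') if match else None


def unescape(page):
    """Decode \\uXXXX and backslash escapes with the unicode_escape codec.

    Encoding to ASCII first keeps this sketch honest: it only handles
    pages that contain no raw non-ASCII characters.
    """
    return page.encode('ascii').decode('unicode_escape')


print(find_iframe_url(webpage))            # escaped page: the pattern cannot match
print(find_iframe_url(unescape(webpage)))  # unescaped page: the src URL is found
```

One caveat with this approach: when decoding bytes, the unicode_escape codec treats every non-escaped byte as Latin-1, so raw UTF-8 text elsewhere in a page (accented titles, for example) would likely come out garbled. That does not matter for locating the iframe URL, but it may matter for any other field read from the same webpage string.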

@dstftw dstftw closed this Jul 3, 2019
@dstftw dstftw added the duplicate label Jul 3, 2019

@desseim desseim commented Jul 3, 2019

Duplicates #21614 I guess.
Sorry I didn't catch it when reporting.

@ytdl-org ytdl-org locked and limited conversation to collaborators Jul 3, 2019