Tele5: Download fails, "Unable to extract video ID" #24553

Closed
darkstar opened this issue Mar 31, 2020 · 19 comments

@darkstar darkstar commented Mar 31, 2020

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2020.03.24
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--merge-output-format', 'mkv', 'https://www.tele5.de/filme/mega-alligators/', '--verbose']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2019.11.28
[debug] Python version 3.6.3 (CPython) - Windows-7-6.1.7601-SP1
[debug] exe versions: ffmpeg N-82664-g801b5c1, ffprobe N-82664-g801b5c1
[debug] Proxy map: {}
[Tele5] mega-alligators: Downloading webpage
ERROR: Unable to extract video id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update
  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "d:\bin\python36\lib\site-packages\youtube_dl\YoutubeDL.py", line 796, in extract_info
    ie_result = ie.extract(url)
  File "d:\bin\python36\lib\site-packages\youtube_dl\extractor\common.py", line 530, in extract
    ie_result = self._real_extract(url)
  File "d:\bin\python36\lib\site-packages\youtube_dl\extractor\tele5.py", line 53, in _real_extract
    r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
  File "d:\bin\python36\lib\site-packages\youtube_dl\extractor\common.py", line 1014, in _html_search_regex
    res = self._search_regex(pattern, string, name, default, fatal, flags, group)
  File "d:\bin\python36\lib\site-packages\youtube_dl\extractor\common.py", line 1005, in _search_regex
    raise RegexNotFoundError('Unable to extract %s' % _name)
youtube_dl.utils.RegexNotFoundError: Unable to extract video id; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version;
 see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Description

Tele5 changed something on their backend tonight. Yesterday I was able to download just fine, but today it just gives that error. This happens with all movies on https://www.tele5.de/filme/online/ as of today, even those that I was able to successfully download yesterday.

Note that this is not a duplicate of #22810: the error message is different, so I thought it would be better to open a new issue instead of piggybacking on the old one.

@darkstar darkstar commented Mar 31, 2020

The format of the video ID seems to have changed:

<div id="player_X7QbljEQ" class="jwplayer jw-reset jw-state-paused jw-skin-seven jw-stretch-uniform jw-flag-aspect-mode jw-breakpoint-4" tabindex="0" aria-label="Video Player" style="width: 100%;">

youtube-dl has the following regex in youtube_dl/extractor/tele5.py which only looks for numbers as video ID:

                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
                 r'\s+id\s*=\s*["\']player_(\d{6,})',
                 r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
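As a quick check of why the lookup fails, here is a minimal standalone sketch (plain `re`, not youtube-dl code) that runs the second pattern above against the new player markup. The `RELAXED` pattern is only an illustrative assumption of what a looser match could look like, not a confirmed fix.

```python
import re

# Player element as it now appears on the page (class list shortened).
html = '<div id="player_X7QbljEQ" class="jwplayer jw-reset" tabindex="0">'

# Pattern currently used in tele5.py: six or more digits only.
DIGITS_ONLY = r'\s+id\s*=\s*["\']player_(\d{6,})'
# Hypothetical relaxed pattern: any characters up to the closing quote.
RELAXED = r'\s+id\s*=\s*["\']player_([^"\']{6,})'

old_match = re.search(DIGITS_ONLY, html)
new_match = re.search(RELAXED, html)

print(old_match)           # None: 'X7QbljEQ' is not a run of six or more digits
print(new_match.group(1))  # X7QbljEQ
```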

I tried simply hacking the regex to allow arbitrary strings as the ID, and changed the nexx extractor in the same way, but it seems it's not that simple.

This is the patch I tried:

diff --git a/youtube_dl/extractor/nexx.py b/youtube_dl/extractor/nexx.py
index 586c1b7..3de4a84 100644
--- a/youtube_dl/extractor/nexx.py
+++ b/youtube_dl/extractor/nexx.py
@@ -24,7 +24,7 @@ class NexxIE(InfoExtractor):
                             nexx:(?:(?P<domain_id_s>\d+):)?|
                             https?://arc\.nexx\.cloud/api/video/
                         )
-                        (?P<id>\d+)
+                        (?P<id>.+)
                     '''
     _TESTS = [{
         # movie
diff --git a/youtube_dl/extractor/tele5.py b/youtube_dl/extractor/tele5.py
index 33a7208..b4b1581 100644
--- a/youtube_dl/extractor/tele5.py
+++ b/youtube_dl/extractor/tele5.py
@@ -48,9 +48,9 @@ class Tele5IE(InfoExtractor):
             display_id = self._match_id(url)
             webpage = self._download_webpage(url, display_id)
             video_id = self._html_search_regex(
-                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
-                 r'\s+id\s*=\s*["\']player_(\d{6,})',
-                 r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
+                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\']([^"]+)',
+                 r'\s+id\s*=\s*["\']player_([^"]{6,})',
+                 r'\bdata-id\s*=\s*["\']([^"]{6,})'), webpage, 'video id')

         return self.url_result(
             'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,

@AndrewMBL AndrewMBL commented Mar 31, 2020

It looks like they are using JWplayer, not NEXX. I'll have a look at it.

@PeerVanTiersken PeerVanTiersken commented Mar 31, 2020

The browser fetches the manifest below, and downloading it with youtube-dl works. Judging by the domain name, it still appears to be nexx:

https://tele5nexx.akamaized.net/ba8ff4e5-b1a4-4ede-95ae-b8729afb9afb/1679093_src.ism/Manifest(format=m3u8-aapl-v3)

@AndrewMBL AndrewMBL commented Mar 31, 2020

My bad, I completely missed that. I've got a fix that works based on JWPlatformIE rather than NexxIE, as that seemed the more straightforward approach.

@martin54 martin54 commented Mar 31, 2020

@AndrewMBL : nevertheless, it is correct that jwplayer is involved, as the page first loads the nexx id from the JSON reply of cdn.jwplayer.com:
GET https://cdn.jwplayer.com/v2/media/X7QbljEQ
=> "nexx_id":"1679012", and a manifest URL like the one above:
"file":"https://tele5nexx.akamaized.net/2225c59f-49b6-4480-a84d-a69f81ed9954/1679012_src.ism/Manifest(format=m3u8-aapl-v3)"
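The lookup described above can be sketched offline as follows. This is only an illustration: `SAMPLE_REPLY` is a hand-written stand-in for the cdn.jwplayer.com reply, the nesting of `nexx_id` under `playlist[0]` is an assumption, and `nexx_id_from_jwplayer` is a hypothetical helper name.

```python
import json

# Hand-written stand-in for the JSON reply; only the "nexx_id" value is taken
# from the thread, the surrounding structure is assumed.
SAMPLE_REPLY = json.loads('''
{
  "playlist": [
    {"nexx_id": "1679012"}
  ]
}
''')


def nexx_id_from_jwplayer(info):
    """Return the nexx id from a parsed JW Player media reply, or None."""
    playlist = info.get('playlist') or []
    if not playlist:
        return None
    # .get() avoids a KeyError when the entry is absent.
    return playlist[0].get('nexx_id')


print(nexx_id_from_jwplayer(SAMPLE_REPLY))  # 1679012
print(nexx_id_from_jwplayer({}))            # None
```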

@martin54 martin54 commented Mar 31, 2020

FYI: For the given example URL, this code solves the problem:

from ..utils import ExtractorError

... replacing the first few lines of _real_extract(): ...

    def _real_extract(self, url):
        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
        video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]

        if not video_id:
            display_id = self._match_id(url)
            webpage = self._download_webpage(url, display_id)
            video_id = self._html_search_regex(
                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\']([^"\']+)',
                 r'\s+id\s*=\s*["\']player_([^"\']{6,})',
                 r'\bdata-id\s*=\s*["\']([^"\']{6,})'), webpage, 'JWplayer video id') # NEW: get new JWplayer video id
        
        # NEW: translate the new JWplayer video id into the old nexx video id:
        info = self._download_json('https://cdn.jwplayer.com/v2/media/%s' % video_id, display_id)
        video_id = info['playlist'][0]['nexx_id'] # TODO: no idea, how to use: info.get()
        if not video_id:
            error = '%s: Cannot get nexx video id' % display_id
            raise ExtractorError(error, expected=True) # needs: from ..utils import ExtractorError

(the new/changed code is marked with "# NEW:")

@darkstar darkstar commented Mar 31, 2020

@martin54 martin54 commented Mar 31, 2020

Maybe it makes sense to note somewhere that this invalidates the video IDs already recorded via --download-archive?
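To make the archive concern concrete, here is a small sketch. It assumes the archive file stores one line per video, made of the lowercased extractor key and the video id; `archive_entry` is a hypothetical helper, not a youtube-dl function, and the ids are the pair quoted earlier in this thread.

```python
def archive_entry(extractor_key, video_id):
    """Build an archive line as '<lowercased extractor key> <video id>' (assumed format)."""
    return '%s %s' % (extractor_key.lower(), video_id)


# Entry recorded when the video was resolved through the Nexx extractor.
archived = {archive_entry('Nexx', '1679012')}

# Entry the same video would get when resolved through a JW Player extractor.
candidate = archive_entry('JWPlatform', 'X7QbljEQ')

print(candidate in archived)  # False: the video would be downloaded again
```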

@AndrewMBL AndrewMBL commented Apr 1, 2020

Hmm, I think your approach is better, as it preserves the video ID.

@AndrewMBL AndrewMBL commented Apr 1, 2020

@martin54 this fix fails on playlist items, but with a small change it works great and preserves the video IDs.

    def _real_extract(self, url):
        qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
        video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
        # Set display_id up front: it is needed even when the video id comes from the URL.
        display_id = self._match_id(url)

        if not video_id:
            webpage = self._download_webpage(url, display_id)
            video_id = self._html_search_regex(
                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\']([^"\']+)',
                 r'\s+id\s*=\s*["\']player_([^"\']{6,})',
                 r'\bdata-id\s*=\s*["\']([^"\']{6,})'), webpage, 'JWplayer video id')  # NEW: get new JWplayer video id

        # NEW: translate the new JWplayer video id into the old nexx video id:
        info = self._download_json('https://cdn.jwplayer.com/v2/media/%s' % video_id, display_id)
        # Use .get() so a missing entry reaches the error below instead of raising KeyError.
        video_id = (info.get('playlist') or [{}])[0].get('nexx_id')
        if not video_id:
            error = '%s: Cannot get nexx video id' % display_id
            raise ExtractorError(error, expected=True)  # Needs: from ..utils import ExtractorError

        return self.url_result(
            'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
            ie=NexxIE.ie_key(), video_id=video_id)

@PeerVanTiersken PeerVanTiersken commented Apr 1, 2020

@AndrewMBL your fix below works for me as well, on different videos. Thanks. I worry this is just the beginning of a migration away from nexx, but the future will tell.

https://raw.githubusercontent.com/AndrewMBL/youtube-dl/388a8c1bb42c957b2120bd2986cb262ac6ff059b/youtube_dl/extractor/tele5.py

@martin54 martin54 commented Apr 1, 2020

Everything has advantages and disadvantages.

  • Fetching the nexx id from cdn.jwplayer.com is a quick-and-dirty hack that keeps the video ids "backward compatible". But it is not very future-proof if Tele5 moves away from nexx, and when that happens the video ids will become invalid anyway.
  • Also, nobody knows whether Tele5 will keep maintaining the important metadata at nexx (which ytdl uses for title, alt_title etc.), as the main source is now probably the jwplayer JSON.
  • From that point of view, the first jwplayer approach probably isn't a bad choice at all.

@tomesi tomesi commented Apr 3, 2020

Hi guys! First of all, you're doing a great job!
Would it be possible to quickly implement the temporary solution so we can use ytdl in the meantime? The videos disappear from their page after a few days.

@seligmanns seligmanns commented Apr 3, 2020

Hi all,
yesterday I tried the patch. I checked out the latest release and replaced the tele5.py extractor.
[Tele5] gewalt: Downloading webpage
[Tele5] gewalt: Downloading JSON metadata
[Nexx] 1680615: Downloading JSON metadata
[Nexx] 1680615: Downloading m3u8 information
[Nexx] 1680615: Downloading MPD manifest
[Nexx] 1680615: Downloading ISM manifest
ERROR: unable to download video data: HTTP Error 403: Forbidden
It seems that again they changed something - or did I miss something?
Stefan

@Knut-HH Knut-HH commented Apr 3, 2020

the patch worked for me

@martin54 martin54 commented Apr 3, 2020

@seligmanns :

  1. It would help if you provided the full command line you're executing and the verbose log.
  2. Did you see HTTP Error 403: Forbidden? Were you able to play back the video on the website when you tried to download it?
  3. Which patch did you use? Please provide the full URL.

@seligmanns seligmanns commented Apr 4, 2020

@martin54 : thank you for trying to assist

my command line: ./youtube-dl https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/der-flugkoerper/ --verbose
simultaneous streaming of the video works
This is the patch I used: https://raw.githubusercontent.com/AndrewMBL/youtube-dl/388a8c1bb42c957b2120bd2986cb262ac6ff059b/youtube_dl/extractor/tele5.py

Here is the verbose log:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/der-flugkoerper/', u'--verbose']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.03.24
[debug] Python version 2.7.17 (CPython) - Linux-5.0.0-32-generic-x86_64-with-LinuxMint-19.3-tricia
[debug] exe versions: none
[debug] Proxy map: {}
[Tele5] der-flugkoerper: Downloading webpage
[Tele5] der-flugkoerper: Downloading JSON metadata
[Nexx] 1680766: Downloading JSON metadata
[Nexx] 1680766: Downloading m3u8 information
[Nexx] 1680766: Downloading MPD manifest
[Nexx] 1680766: Downloading ISM manifest
[debug] Default format spec: best/bestvideo+bestaudio
[debug] Invoking downloader on u'http://tele5nexx.akamaized.net/b00323c8-723a-4116-8b7c-6445662cfdda/1680766_src_1024x576_1500.mp4'
ERROR: unable to download video data: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 1926, in process_info
    success = dl(filename, info_dict)
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 1865, in dl
    return fd.download(name, info)
  File "./youtube-dl/youtube_dl/downloader/common.py", line 366, in download
    return self.real_download(filename, info_dict)
  File "./youtube-dl/youtube_dl/downloader/http.py", line 341, in real_download
    establish_connection()
  File "./youtube-dl/youtube_dl/downloader/http.py", line 109, in establish_connection
    ctx.data = self.ydl.urlopen(request)
  File "./youtube-dl/youtube_dl/YoutubeDL.py", line 2238, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden


@martin54 martin54 commented Apr 5, 2020

@seligmanns : Using the same patch and an up-to-date nexx.py, I got a different log. In particular, the URL is different:
[debug] Invoking downloader on u'http://tele5nexx.akamaized.net/.../Manifest...

When I removed all external tools from PATH, I got the same error as you.
With the following tools, the download was OK:
[debug] exe versions: avconv v13_dev0-1440-g34c1133, avprobe v13_dev0-1440-g34c1133

Therefore, it looks more like a "problem" in nexx.py or hlsnative (or missing tools) than in tele5.py.
You could also try the other tele5 patch, which uses JWPlatformIE.
Below is my --verbose log; the --print-traffic output is here: https://pastebin.com/bMi06QNu

> python -m youtube_dl  https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/der-flugkoerper/ --verbose
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/der-flugkoerper/', '--verbose']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2019.12.25
[debug] Python version 3.8.1 (CPython) - Windows-10-10.0.18362-SP0
[debug] exe versions: avconv v13_dev0-1440-g34c1133, avprobe v13_dev0-1440-g34c1133, ffmpeg N-96875-g017bdeec70, ffprobe N-96875-g017bdeec70, rtmpdump 2.4-20190330-gc5f04a5-GnuTLS_3.6.10-x86_64-static
[debug] Proxy map: {}
[Tele5] der-flugkoerper: Downloading webpage
[Tele5] der-flugkoerper: Downloading JSON metadata
[Nexx] 1680766: Downloading JSON metadata
[Nexx] 1680766: Downloading m3u8 information
[Nexx] 1680766: Downloading MPD manifest
[Nexx] 1680766: Downloading ISM manifest
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'http://tele5nexx.akamaized.net/b00323c8-723a-4116-8b7c-6445662cfdda/1680766_src.ism/QualityLevels(1498491)/Manifest(video,format=m3u8-aapl)'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 460
[download] Destination: Der Flugkörper-1680766.fazure-hls-1678.mp4
[download] 100% of 512.08MiB in 02:14
[debug] Invoking downloader on 'http://tele5nexx.akamaized.net/b00323c8-723a-4116-8b7c-6445662cfdda/1680766_src.ism/QualityLevels(128001)/Manifest(aac_UND_2_128,format=m3u8-aapl)'
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 459
[download] Destination: Der Flugkörper-1680766.fazure-hls-138.mp4
[download] 100% of 44.87MiB in 01:04
[ffmpeg] Merging formats into "Der Flugkörper-1680766.mp4"
[debug] ffmpeg command line: ffmpeg -y -loglevel "repeat+info" -i "file:Der Flugkörper-1680766.fazure-hls-1678.mp4" -i "file:Der Flugkörper-1680766.fazure-hls-138.mp4" -c copy -map "0:v:0" -map "1:a:0" "file:Der Flugkörper-1680766.temp.mp4"
Deleting original file Der Flugkörper-1680766.fazure-hls-1678.mp4 (pass -k to keep)
Deleting original file Der Flugkörper-1680766.fazure-hls-138.mp4 (pass -k to keep)

====================

>python -m youtube_dl  https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/der-flugkoerper/ --verbose --print-traffic
--> https://pastebin.com/bMi06QNu
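Since the result seems to depend on which external tools are found on PATH, a quick way to see what is visible from Python is sketched below. This is only a convenience check, not part of youtube-dl; `find_tools` is a hypothetical helper, and `shutil.which` requires Python 3.3 or later.

```python
import shutil

# Tool names taken from the "exe versions" lines in the logs above.
TOOLS = ('ffmpeg', 'ffprobe', 'avconv', 'avprobe')


def find_tools(names=TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in sorted(find_tools().items()):
    print('%-8s %s' % (name, path or 'not found'))
```

If every entry prints "not found", the environment matches the "exe versions: none" case from the failing log.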

@martin54 martin54 commented Apr 16, 2020

Unfortunately, this fix no longer works for very new videos on Tele5 (since about Thursday, 16 April), as they no longer carry the nexx_id (the entry is missing in the JSON from cdn.jwplayer.com).
Example:
https://www.tele5.de/star-trek/das-nachste-jahrhundert/
-> https://cdn.jwplayer.com/v2/media/bRADqmXB

I changed the code to support both: nexx (if available), otherwise falling back to native JWplayer support via a 'jwplatform:' URL. The full modified code of tele5.py is available here:
https://pastebin.com/iDJg9cHq

The main logic looks like this:

nexx_video_id = info['playlist'][0].get('nexx_id')  # .get(): the entry is missing for new videos
if not nexx_video_id:
    result = self.url_result(
        'jwplatform:%s' % JWplayer_video_id,
        ie=JWPlatformIE.ie_key(), video_id=JWplayer_video_id, video_title=title)
else:
    NEXX_URL_PATTERN = 'https://api.nexx.cloud/v3/759/videos/byid/%s'
    result = self.url_result(
        NEXX_URL_PATTERN % nexx_video_id, # return Nexx URL, which must be further processed by NexxIE ...
        ie=NexxIE.ie_key(), video_id=nexx_video_id, video_title=title)

return result

Please note: the full code also supports playlists, so there are more differences from the master version. I also didn't check for coding style etc. PS: the code isn't fully based on the latest master version, e.g. the test cases are outdated :-(

As a very negative side effect of the switch to JWplayer, the nexx property "%%(alt_title)s" is no longer available for filenames (it was the only reliable source for episode numbers).
The "raw" JSON from cdn.jwplayer.com still has something similar in playlist[0].subtitle ("Star Trek - TNG S02E29"), but that is not extracted into the info.json file (it's empty there).
(The properties for season and episode number are not available there either, but as they are probably still not reliable, that doesn't matter.)
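If the subtitle string were available, the season and episode numbers could be recovered from it roughly as sketched below. This assumes the subtitle always ends in an "SxxEyy" tag like the single example quoted above, which is not verified for other videos; `parse_episode` is a hypothetical helper name.

```python
import re

# Matches a season/episode tag such as "S02E29".
EPISODE_RE = re.compile(r'\bS(?P<season>\d+)E(?P<episode>\d+)\b', re.IGNORECASE)


def parse_episode(subtitle):
    """Return (season, episode) as ints, or None if no SxxEyy tag is present."""
    match = EPISODE_RE.search(subtitle or '')
    if not match:
        return None
    return int(match.group('season')), int(match.group('episode'))


print(parse_episode('Star Trek - TNG S02E29'))  # (2, 29)
print(parse_episode('Mega Alligators'))         # None
```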

bbepis referenced this issue in animelover1984/youtube-dl May 14, 2020