[Redtube] Handle additional indirection in playlists #29318

Open
wants to merge 7 commits into
base: master

dirkf
Contributor

@dirkf dirkf commented Jun 16, 2021

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Issue #29305 reported that the Redtube extractor had stopped working.

The site has started to send JSON playlist data where the extractor expected M3U8. This patch checks for the extra level of indirection and iterates over such playlist data if found.
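
As a rough illustration of the check described above (not the code in this PR), the sketch below classifies a response body as either a direct M3U8 playlist or a JSON list that points to further media URLs. The function name `unwrap_media_response` and the assumption that each JSON entry carries a `videoUrl` key are hypothetical and chosen only for the example.

```python
import json


def unwrap_media_response(body):
    """Classify a response body (hypothetical helper, not part of youtube-dl).

    Returns ('m3u8', body) for a direct playlist, ('json', entries) for a
    JSON list of dicts that carry a 'videoUrl' (assumed key name), and
    ('unknown', body) for anything else.
    """
    text = body.lstrip()
    # Every M3U8 playlist starts with the #EXTM3U tag
    if text.startswith('#EXTM3U'):
        return 'm3u8', body
    try:
        data = json.loads(text)
    except ValueError:
        return 'unknown', body
    if isinstance(data, list):
        # Keep only dict entries that actually provide a URL to follow
        entries = [e for e in data if isinstance(e, dict) and e.get('videoUrl')]
        return 'json', entries
    return 'unknown', body
```

A caller would then iterate over the returned entries and fetch each `videoUrl` as a playlist or media file in its own right, which is the extra level of indirection.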

Fixes #29299,
fixes #29305,
fixes #29484,
fixes #30443.

dstftw previously requested changes Jun 16, 2021
youtube_dl/extractor/redtube.py (outdated, resolved)
@dirkf dirkf requested a review from dstftw July 7, 2021 13:03
@dirkf dirkf force-pushed the df-rt-patch branch 3 times, most recently from 35671cb to 7cd31c4 Compare September 26, 2021 18:19
@dirkf dirkf changed the title from "Handle additional indirection in Redtube playlists" to "[Redtube] Handle additional indirection in playlists" Sep 26, 2021
@nicolaasjan

I applied this fix, but this is what I get now:

ytd -v --ignore-config https://www.redtube.com/40022171
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--ignore-config', 'https://www.redtube.com/40022171']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2023.07.20
[debug] Lazy loading extractors enabled
[debug] Single file build
[debug] Python 3.8.10 (CPython x86_64 64bit) - Linux-5.4.0-153-generic-x86_64-with-glibc2.29 - OpenSSL 1.1.1f  31 Mar 2020 - glibc 2.31
[debug] exe versions: ffmpeg N-111355-g68e9d2835f-Nico-20230707, ffprobe N-111355-g68e9d2835f-Nico-20230707, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[RedTube] 40022171: Downloading webpage
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'
[download] Destination: Slender Teen Masturbates After Working in Her Garden-40022171.unknown_video
[download] 100% of 42.00B in 00:00

@dirkf
Contributor Author

dirkf commented Jul 21, 2023

I guess the problem here is different from what the PR was addressing. Both yt-dl master and yt-dlp stable find the inline GIF in the player element instead of either of these:

      'mediaDefinitions': [
        {
          'format': 'hls',
          'videoUrl': '/media/hls?s=eyJ2a2V5Ijo0MDAyMjE3MSwicyI6IjVkYmRjZWQ4MTY5NTlkOWUyYjdjZmM4ODZlZDdlOWZjOTE3Y2ViYWMwNDRjODZkMmJmMjIzYzUxY2MxODFjODUiLCJndCI6MTY4OTk0OTYyMSwiZSI6ZmFsc2V9',
          'remote': true
        },
        {
          'format': 'mp4',
          'videoUrl': '/media/mp4?s=eyJ2a2V5Ijo0MDAyMjE3MSwicyI6IjVkYmRjZWQ4MTY5NTlkOWUyYjdjZmM4ODZlZDdlOWZjOTE3Y2ViYWMwNDRjODZkMmJmMjIzYzUxY2MxODFjODUiLCJndCI6MTY4OTk0OTYyMSwiZSI6ZmFsc2V9',
          'remote': true
        }
      ],

@nicolaasjan

This comment was marked as outdated.

@dirkf
Contributor Author

dirkf commented Jul 21, 2023

Let me have a look ...

The problem seems to be that

  1. the media URLs are relative: they lack the https: scheme and the host
  2. the extractor falls back to a value that isn't tested for being a valid URL.

$ python -m youtube_dl -v -F 'https://www.redtube.com/40022171'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.redtube.com/40022171']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 4a644dafd
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1u  30 May 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[RedTube] 40022171: Downloading webpage
[RedTube] 40022171: Downloading m3u8 information
[info] Available formats for 40022171:
format code  extension  resolution note
hls-0        mp4        unknown    
mp4          mp4        unknown    (best)
$

I'll just push this to the PR.
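
For readers following along, the two problems listed above can be sketched with the standard library alone. This is a minimal illustration, not the change pushed to the PR: `absolute_media_url` is a hypothetical helper that resolves a relative media URL against the page URL and rejects anything that does not end up as an http(s) URL, such as the inline `data:` GIF seen in the earlier log.

```python
from urllib.parse import urljoin, urlparse


def absolute_media_url(page_url, media_url):
    """Resolve media_url against page_url (hypothetical helper).

    Returns the absolute URL, or None if the input is empty or the result
    is not an http(s) URL with a host.
    """
    if not media_url:
        return None
    # urljoin leaves absolute URLs alone and fills in scheme/host otherwise
    resolved = urljoin(page_url, media_url)
    parts = urlparse(resolved)
    if parts.scheme in ('http', 'https') and parts.netloc:
        return resolved
    return None


page = 'https://www.redtube.com/40022171'
print(absolute_media_url(page, '/media/mp4?s=abc'))
print(absolute_media_url(page, 'data:image/gif;base64,R0lG'))
```

The first call yields an absolute URL on the page's host; the second returns `None`, so a fallback value like the GIF would never be passed to the downloader.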

@dirkf
Contributor Author

dirkf commented Jul 21, 2023

And now combined:

$ python -m youtube_dl -v -f mp4-240 --test 'https://www.redtube.com/40022171'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-f', u'mp4-240', u'--test', u'https://www.redtube.com/40022171']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 4a644dafd
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1u  30 May 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[RedTube] 40022171: Downloading webpage
[RedTube] 40022171: Downloading JSON metadata
[RedTube] 40022171: Downloading m3u8 information
[RedTube] 40022171: Downloading m3u8 information
[RedTube] 40022171: Downloading m3u8 information
[RedTube] 40022171: Downloading m3u8 information
[RedTube] 40022171: Downloading m3u8 information
[RedTube] 40022171: Downloading JSON metadata
[debug] Invoking downloader on u'https://dv-ph.rdtcdn.com/videos/202107/21/391630841/240P_1000K_391630841.mp4?ttl=1689965474&ri=307200&rs=776&hash=1536c8d5dded0853ba3fc9032e535459'
[download] Destination: Slender Teen Masturbates After Working in Her Garden-40022171.mp4
[download] 100% of 10.00KiB in 00:00
$

But after more testing I see that some of the old media URLs are giving 404, so, with additional checking:

$ python -m youtube_dl -v -F 'https://www.redtube.com/38864951'
[debug] System config: [u'--prefer-ffmpeg']
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: [u'-v', u'-F', u'https://www.redtube.com/38864951']
[debug] Encodings: locale UTF-8, fs UTF-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: 4a644dafd
[debug] Python 2.7.18 (CPython i686 32bit) - Linux-4.4.0-210-generic-i686-with-Ubuntu-16.04-xenial - OpenSSL 1.1.1u  30 May 2023 - glibc 2.15
[debug] exe versions: avconv 4.3, avprobe 4.3, ffmpeg 4.3, ffprobe 4.3
[debug] Proxy map: {}
[RedTube] 38864951: Downloading webpage
[RedTube] 38864951: Downloading JSON metadata
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading JSON metadata
[RedTube] 38864951: Checking mp4-1080 video format URL
[RedTube] 38864951: Checking mp4-720 video format URL
[RedTube] 38864951: Checking mp4-480 video format URL
[RedTube] 38864951: mp4-480 video format URL is invalid, skipping: HTTP Error 404: Not Found
[RedTube] 38864951: Checking mp4-240 video format URL
[info] Available formats for 38864951:
format code  extension  resolution note
mp4-240      mp4        240p       
mp4-720      mp4        720p       
mp4-1080     mp4        1080p      (best)
$

Maybe these formats no longer exist, or some cookie action is needed that's being blocked under #32450, or just a Referer header. More testing/investigation is needed.
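
The "Checking ... video format URL" step in the log above amounts to probing each candidate URL and dropping the ones that fail. The sketch below shows that idea in isolation; it is an assumption-laden stand-in rather than the extractor's code. `keep_valid_formats` and `head_ok` are hypothetical names, and `head_ok` uses a plain HEAD request, which some servers may reject even for URLs that would download fine.

```python
import urllib.request


def head_ok(url, timeout=10):
    """Return True if a HEAD request for url succeeds (hypothetical probe)."""
    request = urllib.request.Request(url, method='HEAD')
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except OSError:
        # URLError and HTTPError (e.g. 404) are subclasses of OSError
        return False


def keep_valid_formats(formats, is_valid_url):
    """Return the formats whose 'url' passes is_valid_url, in original order."""
    return [f for f in formats if f.get('url') and is_valid_url(f['url'])]
```

Passing the probe in as a parameter keeps the filtering logic testable without network access; in real use one would call `keep_valid_formats(formats, head_ok)`.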

* URLs need host://domain default
* use traverse_obj()
@nicolaasjan

But after more testing I see that some of the old media URLs are giving 404, so, with additional checking:

$ python -m youtube_dl -v -F 'https://www.redtube.com/38864951'

That page shows:

[screenshot: screenshot_20230722]

@nicolaasjan

nicolaasjan commented Jul 22, 2023

And this is what I get with yt-dlp giving it the master.m3u8 URL and providing referrer and cookies:

yt-dlp --ignore-config --cookies-from-browser firefox --no-part --restrict-filenames -N 4 --user-agent "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0" --referer "https://www.redtube.com/" "https://ev-h-ph.rdtcdn.com/hls/videos/201912/07/266878312/,1080P_4000K,720P_4000K,480P_2000K,240P_400K,_266878312.mp4.urlset/master.m3u8?validfrom=1689989959&validto=1689997159&hdl=-1&hash=9De0V3USQZEzj9MGc%2FnRrtv9%2FCk%3D&"
Extracting cookies from firefox
Extracted 124 cookies from firefox
[generic] Extracting URL: https://ev-h-ph.rdtcdn.com/hls/videos/201912/07/266878312/,1080P_4000K,720P_4000K,480P_2000K,240P...Gc%2FnRrtv9%2FCk%3D&
[generic] master: Downloading webpage
ERROR: [generic] None: Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>); please report this issue on  https://github.com/yt-dlp/yt-dlp/issues?q= , filling out the appropriate issue template. Confirm you are on the latest version using  yt-dlp -U

@dirkf
Contributor Author

dirkf commented Jul 22, 2023

It seems that the HLS formats have gone 404 but depending on unknown factors some of the mp4 formats may be available, as shown above. I guess that the site player uses the HLS formats.

@nicolaasjan

Now I get an error running:
python "devscripts/make_lazy_extractors.py" "youtube_dl/extractor/lazy_extractors.py":

python "devscripts/make_lazy_extractors.py" "youtube_dl/extractor/lazy_extractors.py"
WARNING: Lazy loading extractors is an experimental feature that may not always work
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.13) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
Traceback (most recent call last):
  File "/media/ext3-data/git/youtube-dl/youtube_dl/extractor/__init__.py", line 4, in <module>
    from .lazy_extractors import *
ModuleNotFoundError: No module named 'youtube_dl.extractor.lazy_extractors'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "devscripts/make_lazy_extractors.py", line 23, in <module>
    from youtube_dl.compat import compat_register_utf8
  File "/media/ext3-data/git/youtube-dl/youtube_dl/__init__.py", line 14, in <module>
    from .options import (
  File "/media/ext3-data/git/youtube-dl/youtube_dl/options.py", line 8, in <module>
    from .downloader.external import list_external_downloaders
  File "/media/ext3-data/git/youtube-dl/youtube_dl/downloader/__init__.py", line 23, in <module>
    from .niconico import NiconicoDmcFD
  File "/media/ext3-data/git/youtube-dl/youtube_dl/downloader/niconico.py", line 11, in <module>
    from ..extractor.niconico import NiconicoIE
  File "/media/ext3-data/git/youtube-dl/youtube_dl/extractor/__init__.py", line 9, in <module>
    from .extractors import *
  File "/media/ext3-data/git/youtube-dl/youtube_dl/extractor/extractors.py", line 69, in <module>
    from .ard import (
  File "/media/ext3-data/git/youtube-dl/youtube_dl/extractor/ard.py", line 8, in <module>
    from .generic import GenericIE
  File "/media/ext3-data/git/youtube-dl/youtube_dl/extractor/generic.py", line 67, in <module>
    from .redtube import RedTubeIE
  File "/media/ext3-data/git/youtube-dl/youtube_dl/extractor/redtube.py", line 49
    }, {
     ^
SyntaxError: invalid syntax

@dirkf
Contributor Author

dirkf commented Jul 22, 2023

Should work now: a }, was missing after the cut-and-paste to the web.

@nicolaasjan

https://www.redtube.com/38864951
is still failing here (other URLs work).
I guess this particular page is broken?

If so, maybe not include it in _TESTS.

ytd -v --ignore-config https://www.redtube.com/38864951
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '--ignore-config', 'https://www.redtube.com/38864951']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2023.07.22.1
[debug] Lazy loading extractors enabled
[debug] Single file build
[debug] Python 3.8.10 (CPython x86_64 64bit) - Linux-5.4.0-153-generic-x86_64-with-glibc2.29 - OpenSSL 1.1.1f  31 Mar 2020 - glibc 2.31
[debug] exe versions: ffmpeg N-111355-g68e9d2835f-Nico-20230707, ffprobe N-111355-g68e9d2835f-Nico-20230707, phantomjs 2.1.1, rtmpdump 2.4
[debug] Proxy map: {}
[RedTube] 38864951: Downloading webpage
[RedTube] 38864951: Downloading JSON metadata
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading m3u8 information
WARNING: Failed to download m3u8 information: HTTP Error 404: Not Found
[RedTube] 38864951: Downloading JSON metadata
[RedTube] 38864951: Checking mp4-1080 video format URL
[RedTube] 38864951: mp4-1080 video format URL is invalid, skipping: HTTP Error 404: Not Found
[RedTube] 38864951: Checking mp4-720 video format URL
[RedTube] 38864951: mp4-720 video format URL is invalid, skipping: HTTP Error 404: Not Found
[RedTube] 38864951: Checking mp4-480 video format URL
[RedTube] 38864951: mp4-480 video format URL is invalid, skipping: HTTP Error 404: Not Found
[RedTube] 38864951: Checking mp4-240 video format URL
[RedTube] 38864951: mp4-240 video format URL is invalid, skipping: HTTP Error 404: Not Found
ERROR: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl.orig/youtube_dl/YoutubeDL.py", line 862, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/bin/youtube-dl.orig/youtube_dl/YoutubeDL.py", line 958, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/local/bin/youtube-dl.orig/youtube_dl/extractor/common.py", line 564, in extract
    ie_result = self._real_extract(url)
  File "/usr/local/bin/youtube-dl.orig/youtube_dl/extractor/redtube.py", line 137, in _real_extract
    self._sort_formats(formats)
  File "/usr/local/bin/youtube-dl.orig/youtube_dl/extractor/common.py", line 1443, in _sort_formats
    raise ExtractorError('No video formats found')
youtube_dl.utils.ExtractorError: No video formats found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

@dirkf
Contributor Author

dirkf commented Jul 22, 2023

It worked after several tries for me. There's some random issue: it succeeds maybe 60% of the time. The site serves, unpredictably, various sets of CDN links, and one or more sets may be broken. It could be good to replace the test if it can't be made to work reliably -- not that there aren't hundreds of bad download tests already.
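
Since a fresh page load may return a different, working set of CDN links, one possible workaround is simply to repeat the extraction a few times. The sketch below illustrates that idea only; nothing in this thread says the PR does this. `extract_with_retries` and its `extract_once` callable are hypothetical names.

```python
def extract_with_retries(extract_once, attempts=3):
    """Call extract_once() until it returns a non-empty list of formats.

    Returns (formats, attempts_used); formats is empty if every attempt
    came back with nothing usable.
    """
    for attempt in range(1, attempts + 1):
        formats = extract_once()
        if formats:
            return formats, attempt
    return [], attempts
```

A bounded retry like this hides the flakiness rather than explaining it, so it would suit a download test better than it would serve as a diagnosis.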

$ python test/test_download.py TestDownload.test_RedTube
[RedTube] 38864951: Downloading webpage
[RedTube] 38864951: Downloading JSON metadata
[RedTube] 38864951: Downloading m3u8 information
[RedTube] 38864951: Downloading m3u8 information
[RedTube] 38864951: Downloading m3u8 information
[RedTube] 38864951: Downloading m3u8 information
[RedTube] 38864951: Downloading m3u8 information
[RedTube] 38864951: Downloading JSON metadata
[RedTube] 38864951: Checking mp4-1080 video format URL
[RedTube] 38864951: Checking mp4-720 video format URL
[RedTube] 38864951: Checking mp4-480 video format URL
[RedTube] 38864951: mp4-480 video format URL is invalid, skipping: HTTP Error 404: Not Found
[RedTube] 38864951: Checking mp4-240 video format URL
[info] Writing video description metadata as JSON to: test_RedTube_38864951.info.json
[debug] Invoking downloader on u'https://ev-ph.rdtcdn.com/videos/201912/07/266878312/1080P_4000K_266878312.mp4?validfrom=1690027459&validto=1690034659&rate=50000k&burst=300k&hash=XpPJ6%2FRGCWjyUBScJVHpMhyM6Wg%3D'
[download] Destination: test_RedTube_38864951.mp4
[download] 100% of 10.00KiB in 00:00
.
----------------------------------------------------------------------
Ran 1 test in 7.682s

OK
$

@gamer191

I haven't looked at this code, but I'm pretty sure that line 64 is supposed to be https, since http://www.redtube.com is a 301 redirect.
