[Tumblr] some links returning Unable to download webpage: HTTP Error 403: Forbidden #29585

someziggyman · 2021-07-18T10:38:04Z

Checklist

I'm reporting a broken site support
I've verified that I'm running youtube-dl version 2021.06.06
I've checked that all provided URLs are alive and playable in a browser
I've checked that all URLs and arguments with special characters are properly quoted or escaped
I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl -v -F https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Git HEAD: 7d37d0970
[debug] Python version 3.9.6 (CPython) - macOS-11.4-arm64-arm-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[Tumblr] 656964996113301504: Downloading webpage
[Tumblr] 656964996113301504: Downloading iframe page
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/opt/homebrew/Cellar/youtube-dl/2021.6.6/libexec/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/opt/homebrew/Cellar/youtube-dl/2021.6.6/libexec/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

Test link:
https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via
The same post in a slightly different URL format:
https://everythingfox.tumblr.com/post/656964996113301504/embed

However, these links work, even though the structure seems to be the same (subdomain, post, ID, video name):
https://dumbasscats.tumblr.com/post/638777506589229056/a-true-captain-goes-down-with-his-ship-via-reddit
https://cuteanimalshare.tumblr.com/post/656841552268869632/who-doesnt-like-ginger-cats

dirkf · 2021-07-19T12:36:26Z

The failing URL https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via needs the Referer header to be added when fetching the iframe URL (the value being the URL of the original page).
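As a rough illustration of the fix described above, the sketch below builds a request for the iframe that carries the original post URL as its Referer. It uses only the standard library, not yt-dl's own helpers; `build_iframe_request` and the iframe URL are hypothetical names chosen for the example, and the real iframe URL would be read from the post page.

```python
import urllib.request


def build_iframe_request(iframe_url, page_url):
    """Return a Request for iframe_url that sends page_url as its Referer."""
    return urllib.request.Request(iframe_url, headers={'Referer': page_url})


page_url = 'https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via'
# Hypothetical iframe URL, for illustration only.
iframe_url = 'https://example.invalid/video-iframe'

request = build_iframe_request(iframe_url, page_url)
# Passing `request` to urllib.request.urlopen() would send the Referer header.
```

Whether the server accepts the request with this header alone is an assumption based on the comment above; other headers or cookies may also matter.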

Also, the page has 10 video iframes, but the extractor only finds the first (top) one. The extractor should default to selecting the first video unless a playlist is requested, but, because --yes-playlist isn't distinguishable from failing to say --no-playlist, there is no way for yt-dl to do that.
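The ambiguity can be sketched with `argparse`. This is an illustration under stated assumptions, not yt-dl's actual option parser: `make_parser` and the `noplaylist` destination are names chosen for the example. With a boolean default, passing no flag and passing --yes-playlist give the same value; a hypothetical tri-state default of `None` would tell them apart.

```python
import argparse


def make_parser(default):
    """Build a parser where both flags write to one shared destination."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--no-playlist', dest='noplaylist',
                        action='store_true', default=default)
    parser.add_argument('--yes-playlist', dest='noplaylist',
                        action='store_false', default=default)
    return parser


# Two-state: "no flag given" and "--yes-playlist" both yield False.
two_state = make_parser(False)
# Hypothetical tri-state: "no flag given" yields None instead.
tri_state = make_parser(None)

same = (two_state.parse_args([]).noplaylist
        == two_state.parse_args(['--yes-playlist']).noplaylist)
unset = tri_state.parse_args([]).noplaylist
```

With the tri-state variant, an extractor could in principle treat `None` as "take the first video" and `False` as "the user asked for the playlist".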

Fixes ytdl-org/youtube-dl#29585 Authored by: foghawk

dirkf · 2022-02-12T14:29:43Z

Most extractors follow the browser's access paths, so that we know the extracted item corresponds to the resource indicated by the extracted URL.

When using an API that isn't directly invoked in the browser access path, we need to understand what metadata is available, in case the webpage needs to be searched for missing fields, and to what extent the API is supported/documented.

In this case, just pulling the yt-dlp fixes looks like a simple solution and would avoid duplicate code.

dirkf · 2022-02-12T14:58:44Z

If the site/app has a function like that, I'd count it as a documented API. But such deep link URLs can be handled by adding an extractor, or by extending an existing URL pattern. The default approach I described is what it is because yt-dl pre-dates the smartphone app era.

Of course, yt-dl has its own custom links, such as using just the YT ID, or ytsearchall:..., or kaltura:partner:id.
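To make the idea of a custom link concrete, here is a minimal sketch of recognising a `kaltura:partner:id` style link with a regular expression. The pattern, the `parse_pseudo_url` helper, and the sample values are assumptions made for illustration (including the guess that the partner part is numeric); they are not copied from any yt-dl extractor.

```python
import re

# Illustrative pattern only: scheme-like prefix, numeric partner, then an ID.
PSEUDO_URL = re.compile(r'^kaltura:(?P<partner_id>\d+):(?P<id>[0-9a-z_]+)$')


def parse_pseudo_url(link):
    """Return the named parts of a kaltura:partner:id link, or None."""
    match = PSEUDO_URL.match(link)
    return match.groupdict() if match else None


# Sample values are made up for the example.
parsed = parse_pseudo_url('kaltura:1234:0_abcd')
not_custom = parse_pseudo_url('https://everythingfox.tumblr.com/post/656964996113301504/embed')
```

Extending an existing URL pattern for a deep link would follow the same shape: add an alternative to the expression and keep the named groups the rest of the code relies on.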

foghawk mentioned this issue Jan 30, 2022

[tumblr] Fix 403 errors; handle Vimeo embeds yt-dlp/yt-dlp#2542

Merged


pukkandan pushed a commit to yt-dlp/yt-dlp that referenced this issue Jan 31, 2022

[tumblr] Fix 403 errors and handle vimeo embeds (#2542)

403be2e

Fixes ytdl-org/youtube-dl#29585 Authored by: foghawk
