[Tumblr] some links returning Unable to download webpage: HTTP Error 403: Forbidden #29585

Open
5 tasks done
someziggyman opened this issue Jul 18, 2021 · 3 comments

Comments


someziggyman commented Jul 18, 2021

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2021.06.06
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl -v -F https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', '-F', 'https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.06.06
[debug] Git HEAD: 7d37d0970
[debug] Python version 3.9.6 (CPython) - macOS-11.4-arm64-arm-64bit
[debug] exe versions: none
[debug] Proxy map: {}
[Tumblr] 656964996113301504: Downloading webpage
[Tumblr] 656964996113301504: Downloading iframe page
ERROR: Unable to download webpage: HTTP Error 403: Forbidden (caused by <HTTPError 403: 'Forbidden'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see  https://yt-dl.org/update  on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/opt/homebrew/Cellar/youtube-dl/2021.6.6/libexec/lib/python3.9/site-packages/youtube_dl/extractor/common.py", line 634, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/opt/homebrew/Cellar/youtube-dl/2021.6.6/libexec/lib/python3.9/site-packages/youtube_dl/YoutubeDL.py", line 2288, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 523, in open
    response = meth(req, response)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 632, in http_response
    response = self.parent.error(
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 561, in error
    return self._call_chain(*args)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/opt/homebrew/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

Test link:
https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via
The same post in a slightly different URL format:
https://everythingfox.tumblr.com/post/656964996113301504/embed

However, these links work, even though the structure seems to be the same (subdomain, post, ID, video name):
https://dumbasscats.tumblr.com/post/638777506589229056/a-true-captain-goes-down-with-his-ship-via-reddit
https://cuteanimalshare.tumblr.com/post/656841552268869632/who-doesnt-like-ginger-cats

Contributor

dirkf commented Jul 19, 2021

The failing URL https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via needs the Referer header to be added when fetching the iframe URL (the value being the URL of the original page).
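
As a rough illustration of the point above, here is a minimal standard-library sketch of attaching a Referer header to the iframe request. It is not the extractor's actual code: `build_iframe_request` is a hypothetical helper and `iframe_url` is a placeholder, since the real iframe URL does not appear in the log. No network access is performed.

```python
import urllib.request


def build_iframe_request(iframe_url, page_url):
    """Return a Request for iframe_url that carries page_url as the Referer.

    Hypothetical helper for illustration; not part of youtube-dl.
    """
    return urllib.request.Request(iframe_url, headers={"Referer": page_url})


# URL of the original post, taken from the report.
page_url = "https://everythingfox.tumblr.com/post/656964996113301504/i-am-fierce-via"
# Placeholder: the real iframe URL is not shown in the verbose log.
iframe_url = "https://example.invalid/video/iframe"

req = build_iframe_request(iframe_url, page_url)
# The request object now holds the header that would be sent on open().
print(req.get_header("Referer"))
```

Whether the server checks anything beyond the Referer value is not established in this thread.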

Also, the page has 10 video iframes, but the extractor only finds the first (top) one. The extractor should default to selecting the first video unless a playlist is requested, but because passing --yes-playlist is indistinguishable from simply omitting --no-playlist, there is no way for yt-dl to do that.
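
To make the multiple-iframe point concrete, here is a small sketch that collects every iframe `src` from a page and optionally keeps only the first. The regex, the `find_iframe_urls` helper, its `first_only` flag and the sample HTML are all assumptions for illustration; the Tumblr extractor's real pattern and the page's real markup may differ.

```python
import re

# Illustrative pattern: matches src="..." or src='...' inside an <iframe> tag.
IFRAME_SRC_RE = re.compile(
    r'<iframe\b[^>]*?\bsrc=(["\'])(?P<url>.+?)\1', re.IGNORECASE
)


def find_iframe_urls(html, first_only=False):
    """Return iframe src URLs in document order.

    first_only=True mimics picking only the top video on the page.
    """
    urls = [m.group("url") for m in IFRAME_SRC_RE.finditer(html)]
    return urls[:1] if first_only else urls


# Made-up markup standing in for a post with several embedded videos.
sample_html = (
    '<iframe src="https://example.invalid/a"></iframe>'
    "<p>text</p>"
    "<iframe class=\"x\" src='https://example.invalid/b'></iframe>"
)

all_urls = find_iframe_urls(sample_html)
top_url = find_iframe_urls(sample_html, first_only=True)
```

A regex is used here only to keep the sketch short; it would also match attributes such as `data-src`, so real code would need a stricter pattern or an HTML parser.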

Contributor

dirkf commented Feb 12, 2022

Most extractors follow the browser's access paths, so that we know the extracted item corresponds to the resource indicated by the extracted URL.

When using an API that isn't directly invoked in the browser access path, we need to understand what metadata is available, in case the webpage needs to be searched for missing fields, and to what extent the API is supported/documented.

In this case, just pulling the yt-dlp fixes looks like a simple solution and would avoid duplicate code.

Contributor

dirkf commented Feb 12, 2022

If the site/app has a function like that, I'd count it as a documented API. But such deep link URLs can be handled by adding an extractor, or by extending an existing URL pattern. The default approach I described arose because yt-dl predates the smartphone app era.

Of course, yt-dl has its own custom links, such as using just the YT ID, or ytsearchall:..., or kaltura:partner:id.
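
For readers unfamiliar with how such custom links can be told apart, here is a toy sketch of pattern-based dispatch. The patterns and the `classify` function are illustrative assumptions only; they are not copied from yt-dl's extractors, whose real URL patterns may be stricter or looser.

```python
import re

# Illustrative patterns only; not yt-dl's actual URL patterns.
PATTERNS = {
    "kaltura": re.compile(r"kaltura:(?P<partner_id>\d+):(?P<id>\w+)$"),
    "ytsearchall": re.compile(r"ytsearchall:(?P<query>.+)$"),
}


def classify(url):
    """Return (scheme_name, captured_fields) for the first matching pattern."""
    for name, pattern in PATTERNS.items():
        match = pattern.match(url)
        if match:
            return name, match.groupdict()
    return None


kaltura_result = classify("kaltura:1234:0_abcd")
search_result = classify("ytsearchall:foxes")
```

Extending support to a new deep-link format would then amount to adding one more pattern, which is the "extend an existing URL pattern" route mentioned above.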
