[patreon] fix some vimeo embed downloads #9613

Merged: bashonly merged 7 commits into yt-dlp:master on Apr 7, 2024


johnvictorfs
Contributor

johnvictorfs commented Apr 4, 2024

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This change prioritizes downloading the m3u8 attachment over Vimeo embeds (or embeds in general), since in some cases the embed may no longer exist while the m3u8 video attachment still does.

This change adds checks to verify whether external embeds in Patreon posts are still valid before trying to download them (they may have been deleted from Vimeo, Dropbox, etc. while still being attached to the Patreon post). If an embed is invalid, it tries the other options.
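A minimal sketch of the fallback order, for illustration only: this is not the code in yt_dlp/extractor/patreon.py. The helper name pick_source, the is_valid callable and the candidate ordering (external embed first, Patreon-hosted m3u8 attachment as the fallback) are all hypothetical; in the extractor the validity check would be a network request to the embed URL rather than a lookup in a set.

```python
from typing import Callable, Iterable, Optional


def pick_source(candidates: Iterable[str], is_valid: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate URL that passes `is_valid`, or None.

    Hypothetical helper: `candidates` is assumed to be ordered by preference,
    e.g. the external embed first and the Patreon m3u8 attachment after it.
    """
    for url in candidates:
        if is_valid(url):
            return url
    # No usable source at all; a real extractor would raise an error here
    return None


# Usage with a stubbed check: the Vimeo embed from the example is treated as deleted,
# and the second URL is a placeholder standing in for the m3u8 attachment
deleted = {'https://player.vimeo.com/video/391366663'}
print(pick_source(
    ['https://player.vimeo.com/video/391366663', 'https://example.com/attachment.m3u8'],
    is_valid=lambda url: url not in deleted))  # prints https://example.com/attachment.m3u8
```

Keeping the check as a plain callable separates "which source do we prefer" from "how do we decide a source is dead", which is exactly the part the rest of this thread ends up changing.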

Example using python -m yt_dlp https://www.patreon.com/posts/hunter-x-hunter-34007913 (this video is public, so no cookies or account are needed to test it):

# Before change, finds a vimeo embed (that no longer exists) and fails to download it
python -m yt_dlp https://www.patreon.com/posts/hunter-x-hunter-34007913
[Patreon] Extracting URL: https://www.patreon.com/posts/hunter-x-hunter-34007913
[Patreon] 34007913: Downloading API JSON
[vimeo] Extracting URL: https://player.vimeo.com/video/391366663?app_id=122963#__youtubedl_smuggle=%7B%22referer%22%3A+%2...%2Fpatreon.com%22%7D
[vimeo] 391366663: Downloading webpage
ERROR: [vimeo] 391366663: Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: Not Found>)
# After change: checks for vimeo embed, fails, then tries to download the m3u8 file attachment
python -m yt_dlp https://www.patreon.com/posts/hunter-x-hunter-34007913
[Patreon] Extracting URL: https://www.patreon.com/posts/hunter-x-hunter-34007913
[Patreon] 34007913: Downloading API JSON
[Patreon] 34007913: Downloading webpage
WARNING: [Patreon] Unable to download webpage: HTTP Error 405: Method Not Allowed
[Patreon] 34007913: Downloading webpage
WARNING: [Patreon] Unable to download webpage: HTTP Error 404: Not Found
[Patreon] 34007913: Downloading m3u8 information
[info] 34007913: Downloading 1 format(s): 4712
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 284
[download] Destination: Hunter x Hunter | Kurapika DESTROYS Uvogin!!! [34007913].mp4
[download]  10.2% of ~ 721.42MiB at    2.61MiB/s ETA 03:26 (frag 28/284)

Possibly fixes #8702 (I don't have access to the post in that issue's example, so I couldn't test it, but it seems very similar to my test case).

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

@pukkandan
Member

pukkandan commented Apr 4, 2024

> which in some cases may no longer exist and only the m3u8 video attachment does.

Do you know why this happens? Uploader deletes vimeo video? Can the reverse scenario happen?

Also, if both work, we should prioritize the video embed, correct?

@johnvictorfs
Contributor Author

johnvictorfs commented Apr 4, 2024

> which in some cases may no longer exist and only the m3u8 video attachment does.

> Do you know why this happens? Uploader deletes vimeo video? Can the reverse scenario happen?

@pukkandan I'm not 100% sure, but yeah, I believe the uploader deleting just the Vimeo video (or Vimeo removing it) could be the cause. The reverse scenario seems really unlikely to me (although I'm not a creator on Patreon, so I can't test that): since a Patreon-hosted video would be removed on Patreon itself, I would expect it to be removed from the post correctly in that case.

Edit: by the reverse scenario I assume you mean the Patreon attachment itself being removed (while its URL is still listed as if it were there) but the (Vimeo/other) embed still working, right?

> Also, if both work, we should prioritize the video embed, correct?

But yes, in that scenario prioritizing the embed would make more sense. I could change it so it still tries the embed first and, if that fails, tries the other options. I'll look into that a bit later, thanks!

@pukkandan
Member

We can make a HeadRequest to the Vimeo URL and fall back to m3u8 if that fails
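A rough sketch of that idea using only Python's standard library (urllib.request), not yt-dlp's own networking code. The helper name url_is_reachable and the injectable opener parameter are assumptions, added so the check can be exercised without network access. Any HTTP error (such as the 404 for the deleted video in the log above) or network error is treated as "not reachable", which would be the signal to fall back to the m3u8.

```python
import urllib.request
from typing import Callable


def url_is_reachable(url: str, method: str = 'HEAD',
                     opener: Callable = urllib.request.urlopen,
                     timeout: float = 10.0) -> bool:
    """Hypothetical probe: send `method` to `url` and report whether it succeeded."""
    request = urllib.request.Request(url, method=method)
    try:
        # urlopen raises HTTPError for non-2xx responses, so getting a response back means success
        with opener(request, timeout=timeout):
            return True
    except OSError:
        # HTTPError and URLError both subclass OSError, as do socket timeouts
        return False
```

In this sketch the caller would run url_is_reachable(embed_url) first and only hand the embed to the Vimeo extractor when it returns True. Note that the result depends on the host actually supporting the HEAD method.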

@johnvictorfs
Contributor Author

johnvictorfs commented Apr 4, 2024

> We can make a HeadRequest to the Vimeo URL and fall back to m3u8 if that fails

@pukkandan I changed the code to use this approach. Is it correct?

I don't have any public posts I could share that fit this case (having both a working embed and a file attachment), but it seems to work with some private posts I have access to.

[screenshot]

I also haven't found any working Vimeo embeds on Patreon to test with (only Dropbox ones), but it worked when I temporarily hard-coded a Vimeo URL in the code as if it were attached to the post:

[screenshot]

My original test (with a public post) also still works fine:

python -m yt_dlp https://www.patreon.com/posts/hunter-x-hunter-34007913
[Patreon] Extracting URL: https://www.patreon.com/posts/hunter-x-hunter-34007913
[Patreon] 34007913: Downloading API JSON
[Patreon] 34007913: Downloading webpage
WARNING: [Patreon] Unable to download webpage: HTTP Error 405: Method Not Allowed
[Patreon] 34007913: Downloading webpage
WARNING: [Patreon] Unable to download webpage: HTTP Error 404: Not Found
[Patreon] 34007913: Downloading m3u8 information
[info] 34007913: Downloading 1 format(s): 4712
[hlsnative] Downloading m3u8 manifest
[hlsnative] Total fragments: 284
[download] Destination: Hunter x Hunter | Kurapika DESTROYS Uvogin!!! [34007913].mp4
[download]  10.2% of ~ 721.42MiB at    2.61MiB/s ETA 03:26 (frag 28/284)

@bashonly
Member

bashonly commented Apr 4, 2024

> HTTP Error 405: Method Not Allowed

this means a HEAD request won't work. IMO just use GET for both

@johnvictorfs
Contributor Author

> HTTP Error 405: Method Not Allowed

> this means a HEAD request won't work. IMO just use GET for both

@bashonly Yeah, you're right, I missed that: I had only tested vimeo.com links and not player.vimeo.com. I'm changing both to GET requests, thanks.
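The switch to GET can be sketched with the standard library as well. This is not yt-dlp's networking layer, and embed_is_alive, fake_host and the opener parameter are hypothetical names. fake_host mimics a server that answers HEAD with 405 Method Not Allowed but serves the page to GET (the thread suggests player.vimeo.com behaves this way), so a GET probe reports the page as alive where a HEAD probe would have reported it as dead.

```python
import io
import urllib.error
import urllib.request
from typing import Callable


def embed_is_alive(url: str, opener: Callable = urllib.request.urlopen,
                   timeout: float = 10.0) -> bool:
    """Hypothetical probe that always uses GET, since some hosts reject HEAD with 405."""
    request = urllib.request.Request(url)  # no data and no method, so urllib sends GET
    try:
        # The response is closed without calling .read(); only the status code matters here
        with opener(request, timeout=timeout):
            return True
    except OSError:  # HTTPError (e.g. 404 for a deleted video), URLError, timeouts
        return False


def fake_host(request, timeout):
    """Stand-in for a server that refuses HEAD but serves the page to GET."""
    if request.get_method() == 'HEAD':
        raise urllib.error.HTTPError(request.full_url, 405, 'Method Not Allowed', None, None)
    return io.BytesIO(b'<html>video page</html>')


print(embed_is_alive('https://example.com/video/1', opener=fake_host))  # True
```

The trade-off is that GET may start transferring a response body, which is why the sketch closes the response immediately without reading it.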

bashonly added the site-bug (Issue with a specific website) label Apr 4, 2024
Review comments on yt_dlp/extractor/patreon.py (2 threads, outdated, resolved)
bashonly added the pending-review (PR needs a review) label Apr 6, 2024
Review comments on yt_dlp/extractor/patreon.py (2 threads, outdated, resolved)
Grub4K removed the pending-review (PR needs a review) label Apr 7, 2024
bashonly and others added 3 commits April 7, 2024 15:42
Co-authored-by: Simon Sawicki <accounts@grub4k.xyz>
bashonly merged commit 36b240f into yt-dlp:master Apr 7, 2024
6 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Labels: site-bug (Issue with a specific website)

Development: successfully merging this pull request may close these issues:
  • Wrong video URL extracted from Patreon post

4 participants