[ie/crunchyroll] Remove initial state scrape #7632

Grub4K · 2023-07-18T13:59:44Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This PR removes the initial state scraping, saving one request. The initial state payload was removed from the webpage, and so we hardcode these values instead.

It also tries to add a more helpful message for the 403 workaround (#7442).

Fixes #7624

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 06d280f`

Summary

🛠️🌐✅

Improve Crunchyroll extraction by using better API parameters, handling errors, and fixing a bug.

Crunchyroll extracts
With client IDs, locale
Cloudflare cut off

Walkthrough

Add and use class attributes for client IDs and locale codes
Refactor and improve base extractor methods
Fix minor bug in beta extractor URL regex
Add test case for beta extractor with German language URL
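
As a rough illustration of the first walkthrough item, hardcoding values as class attributes instead of scraping them from the page might look like the sketch below. The class name, attribute names, URL pattern, and every value are placeholders chosen for illustration; they are not the ones used in this PR, and real locale prefixes may be longer than two letters.

```python
import re


class CrunchyrollSketch:
    # Hypothetical placeholder values; the real client IDs and locale
    # table live in the extractor and are not reproduced here.
    _CLIENT_ID = ('placeholder-anonymous-id', 'placeholder-logged-in-id')
    _LOCALES = {
        None: 'en-US',  # no language prefix in the URL
        'de': 'de-DE',
        'fr': 'fr-FR',
    }
    # Simplified URL pattern with an optional two-letter language prefix
    # (an assumption made for this sketch).
    _URL_RE = re.compile(
        r'https?://(?:www\.)?crunchyroll\.com/'
        r'(?:(?P<lang>[a-z]{2})/)?watch/(?P<id>\w+)')

    @classmethod
    def parse(cls, url):
        """Return (video id, locale) using only the hardcoded table."""
        match = cls._URL_RE.match(url)
        if not match:
            raise ValueError(f'Unsupported URL: {url}')
        lang = match.group('lang')
        # Unknown prefixes fall back to the default locale
        locale = cls._LOCALES.get(lang, cls._LOCALES[None])
        return match.group('id'), locale
```

Because the lookup needs no page download, resolving the locale costs no extra request, which is the saving the description mentions.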

ProDev2 · 2023-07-18T18:46:34Z

Works perfectly for me! Thank you. Also I checked the code and it shouldn't cause any merge conflicts with my PR (#7009) after you merge it to master.

Grub4K · 2023-07-18T22:31:23Z

This PR and build aim to fix both the 403 and the initial state error.
If working correctly, the 403 should no longer be present.
If it still is, please let me know the specific region and OS that failed in a comment with a verbose (-v) log.

If it fails, there is still a way to do the workaround as discussed here:

Extract your current User-Agent from the browser
- This can be done using google/duckduckgo/... by searching for "my user-agent"
Open a crunchyroll.com page in the browser
Load cookies from the browser into yt-dlp
- Use either --cookies-from-browser or --cookies
Pass the previously extracted User-Agent to yt-dlp
- Use --user-agent <your_user_agent>
Additionally, pass the flag to disable the automatic workaround
- --extractor-arg crunchyrollbeta:ua_workaround
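
The manual steps above boil down to a single command line. Here is a minimal sketch that assembles it in Python. The helper name, the default browser, and the example URL and User-Agent are assumptions for illustration; the flags are copied as written in the list above.

```python
import shlex


def build_workaround_command(url, user_agent, browser='firefox'):
    """Assemble the yt-dlp invocation for the manual 403 workaround.

    `browser` is an assumption: use whichever browser the cookies and
    the User-Agent were taken from.
    """
    return [
        'yt-dlp',
        '-v',  # verbose log, useful when reporting a failure
        '--cookies-from-browser', browser,
        '--user-agent', user_agent,
        # Flag spelled as in the steps above
        '--extractor-arg', 'crunchyrollbeta:ua_workaround',
        url,
    ]


if __name__ == '__main__':
    cmd = build_workaround_command(
        'https://www.crunchyroll.com/watch/EXAMPLE',  # placeholder URL
        'Mozilla/5.0 (placeholder User-Agent)',
    )
    # shlex.join quotes the User-Agent so the line can be pasted
    # into a POSIX shell
    print(shlex.join(cmd))
```

Building the command as a list keeps the User-Agent, which contains spaces and parentheses, as one argument.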

To test, first get the build in one of the following ways:

update directly from a regular build: yt-dlp --update-to Grub4K/yt-dlp@2023.07.18.215323
download it from the pr-branch pre-release
run from source: git clone git@github.com:Grub4K/yt-dlp.git, then git switch fix/crunchy-initial-state inside the cloned directory
install using pip: python3 -m pip install -U pip setuptools wheel, then python3 -m pip install --force-reinstall https://github.com/Grub4K/yt-dlp/archive/fix/crunchy-initial-state.tar.gz

Afterwards, try and run through the above steps, letting me know of any further errors.

bashonly · 2023-08-27T11:51:56Z

@hajimekun Doesn't sound like it would be related. You can open a new issue if you think it should be looked into

Authored by: Grub4K

Grub4K added 3 commits July 18, 2023 03:03

Implement direct language extraction

1b1c1ad

Add more helpful token auth error message

3538de5

Clarify Cloudflare workaround steps

06d280f

Attempt at automated 403 workaround

c558917

Grub4K added the site-enhancement, site-bug, and needs-testing labels Jul 18, 2023

This was referenced Jul 18, 2023

[crunchyroll] Error 403 Forbidden for all URLs #7442

Closed

Crunchyroll ERROR: Unable to extract initial state #7624

Closed

pukkandan assigned Grub4K Jul 19, 2023

Why did we ever think that would work

82f1a12

Grub4K merged commit 9b16762 into yt-dlp:master Jul 20, 2023
13 checks passed

Grub4K removed the needs-testing label Jul 20, 2023

Grub4K deleted the fix/crunchy-initial-state branch July 20, 2023 20:18

Rafawell mentioned this pull request Aug 6, 2023

[crunchyroll] Download audio language in Japanese and/or French #7741

Closed


aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/crunchyroll] Remove initial state extraction (yt-dlp#7632)

24160e2

Authored by: Grub4K
