[extractor/dropout] Dropout season pagination #7304

Merged
merged 9 commits into from Jun 21, 2023

OverlordQ
Contributor

@OverlordQ OverlordQ commented Jun 13, 2023

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Handles pagination for seasons with more than 24 episodes. Also fixes test cases with accurate episode counts for seasons.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

馃 Generated by Copilot at b7d9739

Summary

📄🔗🧪

Improved the DropoutSeasonIE extractor to handle multiple pages of episodes. Updated the tests and the yt_dlp/extractor/dropout.py file accordingly.

We're scraping the web for some videos to see
We use DropoutSeasonIE
But the pages are many and we need to fetch more
So we add pagination with get_element_by_attribute

Walkthrough

  • Import get_element_by_attribute function from utils.py to use for pagination
  • Update playlist_count for DropoutSeasonIE extractor test cases to reflect current number of episodes for "Dimension 20: Fantasy High"
  • Add new test case for DropoutSeasonIE extractor to cover multi-season series with pagination, such as "Breaking News No Laugh Newsroom"
  • Initialize page_num variable to 1 in _real_extract method of DropoutSeasonIE extractor to keep track of current page number
  • Add pagination logic to _real_extract method of DropoutSeasonIE extractor using get_element_by_attribute, get_elements_by_class, and url_result functions to download and extract all episodes of a paginated series
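
The walkthrough above can be illustrated without yt-dlp itself. This is only a rough sketch and not the code from this PR: FAKE_PAGES, fetch_page, extract_episode_urls, has_next_page, the "episode-link" class and the rel="next" marker are all hypothetical stand-ins for the real site markup and for helpers such as get_elements_by_class.

```python
import re

# Hypothetical stand-in for the site: page number -> HTML fragment.
# The markup, class name and rel="next" marker are invented for illustration.
FAKE_PAGES = {
    1: '<a class="episode-link" href="/videos/ep-1"></a>'
       '<a class="episode-link" href="/videos/ep-2"></a>'
       '<a rel="next" href="?page=2"></a>',
    2: '<a class="episode-link" href="/videos/ep-3"></a>',
}


def fetch_page(page_num):
    """Hypothetical downloader; a real extractor would issue an HTTP request."""
    return FAKE_PAGES[page_num]


def extract_episode_urls(html):
    """Collect hrefs of anchors carrying the (assumed) episode-link class."""
    return re.findall(r'<a class="episode-link" href="([^"]+)"', html)


def has_next_page(html):
    """Assumed convention: every page except the last contains rel="next"."""
    return 'rel="next"' in html


def collect_season(fetch=fetch_page):
    page_num = 1  # start at the first page, as in the walkthrough
    urls = []
    while True:
        html = fetch(page_num)
        urls.extend(extract_episode_urls(html))
        if not has_next_page(html):
            break
        page_num += 1
    return urls


season_urls = collect_season()
```

The loop stops on the first page that carries no next-page marker, so a single-page season costs exactly one request.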

@OverlordQ OverlordQ changed the title Dropout pagination [extractor/dropout] Dropout season pagination Jun 13, 2023
@pukkandan
Member

Try to return a PagedList instead, or at least a generator.
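
To illustrate the plain-generator option from this comment, here is a minimal sketch. It does not use yt-dlp's PagedList classes. PAGES, fetch_page and iter_episodes are hypothetical, and it assumes a page past the end simply yields no entries.

```python
import itertools

fetched_pages = []  # records which pages were actually requested

# Hypothetical data: three pages of episode identifiers.
PAGES = {1: ['ep-1', 'ep-2'], 2: ['ep-3', 'ep-4'], 3: ['ep-5']}


def fetch_page(page_num):
    """Hypothetical fetcher; returns an empty list past the last page."""
    fetched_pages.append(page_num)
    return PAGES.get(page_num, [])


def iter_episodes():
    """Yield episodes lazily, requesting a page only when it is needed."""
    for page_num in itertools.count(1):
        entries = fetch_page(page_num)
        if not entries:
            return
        yield from entries


# Only the first three entries are consumed, so page 3 is never requested.
first_three = list(itertools.islice(iter_episodes(), 3))
```

Because nothing is fetched until the consumer asks for the next entry, a caller that only wants the first few items of a playlist avoids downloading the remaining pages.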

@OverlordQ
Copy link
Contributor Author

Second pass. The try/except seems hacky, but it was the only way I saw to avoid barfing when len(page 1) == page_max

@bashonly bashonly added the site-bug (Issue with a specific website) and pending-fixes (PR has had changes requested) labels Jun 14, 2023
@OverlordQ OverlordQ requested a review from bashonly June 14, 2023 16:19
@OverlordQ
Contributor Author

OverlordQ commented Jun 14, 2023

Changes made, but still the nested exception issue. Is passing expected_status=400 to _download_webpage the expected way to handle this case?

Comment on lines 212 to 213
webpage = self._download_webpage(
    f'{url}?page={page}', season_id, note=f'Downloading page {page}')
Member

My bad, I was wrong. yt-dlp will catch the error and continue with playlist extraction, but the extractor & tests do fail. Let's do this then:

Suggested change
- webpage = self._download_webpage(
-     f'{url}?page={page}', season_id, note=f'Downloading page {page}')
+ webpage = self._download_webpage(
+     f'{url}?page={page}', season_id, note=f'Downloading page {page}', expected_status=400)
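
The idea behind the suggestion can be sketched in isolation. This is a simplified model, not yt-dlp's implementation: HTTPError, RESPONSES, download_webpage and episodes are hypothetical, and it assumes the server answers HTTP 400 for a page past the last one, as the discussion suggests.

```python
import re


class HTTPError(Exception):
    """Hypothetical error type raised for unexpected HTTP failures."""

    def __init__(self, status):
        super().__init__(f'HTTP Error {status}')
        self.status = status


# Hypothetical server: pages 1 and 2 exist, anything beyond answers 400.
RESPONSES = {
    1: (200, '<li>ep-1</li><li>ep-2</li>'),
    2: (200, '<li>ep-3</li>'),
}


def download_webpage(page, expected_status=None):
    """Return the body; raise on error statuses unless explicitly expected."""
    status, body = RESPONSES.get(page, (400, '<p>Bad Request</p>'))
    if status >= 400 and status != expected_status:
        raise HTTPError(status)
    return body


def episodes(page):
    """Treat a 400 as an ordinary page that happens to hold no episodes."""
    html = download_webpage(page, expected_status=400)
    return re.findall(r'<li>(.*?)</li>', html)
```

With the status declared as expected, a request past the last page yields an empty result instead of an exception, so the caller needs no try/except to detect the end of the listing.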

@bashonly bashonly removed the pending-fixes PR has had changes requested label Jun 14, 2023
@bashonly bashonly added the pending-review PR needs a review label Jun 14, 2023
@bashonly bashonly removed the pending-review PR needs a review label Jun 21, 2023
@bashonly bashonly merged commit db22142 into yt-dlp:master Jun 21, 2023
11 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024
Labels
site-bug Issue with a specific website

3 participants