[BitChute] - Fix 'uploader' and playlist tests md5 result #8507
Instead of reopening essentially the same PR, feel free to edit the description or add commits to the old PR.
Sorry. These pull requests were created while my account was flagged by GitHub. The
copilot summary can be ignored (it's honestly quite pointless) and I will manually approve the workflows. As long as pushing to the old branch works, feel free to keep doing that.
Note that I had to manually approve the workflows for this PR as well.
Thanks @Grub4K for the clarification.
yt_dlp/extractor/bitchute.py
Outdated
@@ -126,7 +126,7 @@ def _real_extract(self, url):
             'title': self._html_extract_title(webpage) or self._og_search_title(webpage),
             'description': self._og_search_description(webpage, default=None),
             'thumbnail': self._og_search_thumbnail(webpage),
-            'uploader': clean_html(get_element_by_class('owner', webpage)),
+            'uploader': self._search_regex(r'<p\sclass=\"name.+Channel\">([^<]+)<', webpage, 'uploader'),
How about something like this:
-            'uploader': self._search_regex(r'<p\sclass=\"name.+Channel\">([^<]+)<', webpage, 'uploader'),
+            'channel': clean_html(get_element_by_class('name', details)),
+            'uploader': clean_html(get_element_by_class('creator', details)),
with
details = get_element_by_class('details', webpage) or ''
Potentially, the channel_url and uploader_url could also be extracted, as these can also differ.
I made a couple of changes and added the channel_url and uploader_url as you suggested.
I think both channel and uploader should be added, as they can differ as seen here.
Thanks for the example @seproDev. I added the channel extraction as well, since the information was already there.
yt_dlp/extractor/bitchute.py
Outdated
details = get_element_by_class('details', webpage) or ''
uploader_path = extract_attributes(
    get_element_html_by_class('spa', get_element_html_by_class('creator', details)) or '').get('href')
channel_path = extract_attributes(
    get_element_html_by_class('spa', get_element_html_by_class('name', details)) or '').get('href')
- all of this "details" extraction should be moved down near the return for better code flow/readability
- uploader_path and channel_path are currently fatal because get_element_html_by_class can return None and throws when its html input is None
- matching base_url in _VALID_URL is overkill, we can always just use https://www.bitchute.com
- use utils.urljoin
- move duplicated code into a function
details = get_element_by_class('details', webpage) or ''
uploader_html = get_element_html_by_class('creator', details) or ''
channel_html = get_element_html_by_class('name', details) or ''

def construct_url(html):
    path = extract_attributes(get_element_html_by_class('spa', html) or '').get('href')
    return urljoin('https://www.bitchute.com', path)

return {
    # ...
    'uploader': clean_html(uploader_html),
    'channel': clean_html(channel_html),
    'uploader_url': construct_url(uploader_html),
    'channel_url': construct_url(channel_html),
    # ...
}
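For illustration only, here is a minimal stdlib-only sketch of the non-fatal pattern suggested above: every lookup that may return None is followed by `or ''`, so a missing element yields None fields instead of an exception. The helpers `element_html_by_class`, `href_of`, `text_of` and `extract_owner_fields`, as well as the sample markup, are hypothetical stand-ins written for this example. They are not yt-dlp's implementations, and the real BitChute page structure may differ.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'https://www.bitchute.com'


def element_html_by_class(class_name, html):
    # Hypothetical stand-in for a class-based lookup: returns the HTML of the
    # first element whose class attribute contains class_name, or None.
    # It raises TypeError when html is None, which is the failure mode
    # the review comment warns about.
    pattern = rf'<(\w+)[^>]*\bclass="[^"]*\b{re.escape(class_name)}\b[^"]*"[^>]*>.*?</\1>'
    match = re.search(pattern, html, re.DOTALL)
    return match.group(0) if match else None


def href_of(html):
    # Return the first href attribute value found in html, or None.
    match = re.search(r'href="([^"]*)"', html)
    return match.group(1) if match else None


def text_of(html):
    # Strip tags and surrounding whitespace; None when nothing is left.
    text = re.sub(r'<[^>]+>', '', html).strip()
    return text or None


def construct_url(html):
    # The `or ''` keeps a missing link element from becoming a None input.
    path = href_of(element_html_by_class('spa', html) or '')
    return urljoin(BASE_URL, path) if path else None


def extract_owner_fields(webpage):
    details = element_html_by_class('details', webpage) or ''
    uploader_html = element_html_by_class('creator', details) or ''
    channel_html = element_html_by_class('name', details) or ''
    return {
        'uploader': text_of(uploader_html),
        'channel': text_of(channel_html),
        'uploader_url': construct_url(uploader_html),
        'channel_url': construct_url(channel_html),
    }


# Made-up markup, only shaped like the class names mentioned in the review.
SAMPLE = (
    '<div class="details">'
    '<p class="name"><a href="/channel/example-channel/" class="spa">Example Channel</a></p>'
    '<p class="creator"><a href="/profile/example-user/" class="spa">Example User</a></p>'
    '</div>'
)

if __name__ == '__main__':
    print(extract_owner_fields(SAMPLE))
    print(extract_owner_fields('<html></html>'))
```

With the guards in place, a page that lacks the "details" block produces a dict of None values rather than raising, which matches the non-fatal behaviour the review asks for.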
Thanks for the suggestions @bashonly. I applied those and reverted the base_url change.
Interested in this being merged.
Closes yt-dlp#8492 Authored by: SirElderling
Description of your pull request and other information
This pull request contains two changes:
uploader name from the webpage
Fixes #8492
Copilot Summary
🤖 Generated by Copilot at 0f97a23
Summary
🐛🛠️📝
Improve BitChute extractor by fixing uploader name and test case.
Walkthrough
BitChuteIE using regex (link)
BitChuteChannelIE test case to match website changes (link)