[BitChute] - Fix 'uploader' and playlist tests md5 result #8507

SirElderling · 2023-11-03T16:40:11Z

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

This pull request contains two changes:

Fixed the code that retrieves the uploader name from the webpage
Fixed the md5 hash for one of the playlist tests.

Fixes #8492

Template

Before submitting a pull request make sure you have:

At least skimmed through contributing guidelines including yt-dlp coding conventions
Searched the bugtracker for similar pull requests
Checked the code with flake8 and ran relevant tests

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Fix or improvement to an extractor (Make sure to add/update tests)
New extractor (Piracy websites will not be accepted)
Core bug fix/improvement
New feature (It is strongly recommended to open an issue first)

Copilot Summary

`🤖 Generated by Copilot at 0f97a23`

Summary

🐛🛠️📝

Improve BitChute extractor by fixing uploader name and test case.

We're sailing on the BitChute sea, me hearties, yo ho ho
We're scraping all the data we can find, yo ho ho
We've fixed the uploader and the description too
We've made the extractor more reliable and true

Walkthrough

Fix uploader name extraction for BitChuteIE using regex
Update description hash for BitChuteChannelIE test case to match website changes

Grub4K · 2023-11-03T16:43:10Z

Instead of reopening essentially the same PR, feel free to edit the description or add commits to the old PR.

SirElderling · 2023-11-03T16:50:11Z

> Instead of reopening essentially the same PR, feel free to edit the description or add commits to the old PR.

Sorry. These pull requests were created while my account was flagged by GitHub. The checks and the Copilot summary were not working, so I decided to recreate them.

Grub4K · 2023-11-03T16:51:54Z

The Copilot summary can be ignored (it's honestly quite pointless), and I will manually approve the workflows. As long as pushing to the old branch works, feel free to keep doing that.

Grub4K · 2023-11-03T16:52:50Z

Note that I had to manually approve the workflows for this PR as well

SirElderling · 2023-11-03T16:53:39Z

Thanks @Grub4K for the clarification.

seproDev · 2023-11-07T04:53:19Z

yt_dlp/extractor/bitchute.py

@@ -126,7 +126,7 @@ def _real_extract(self, url):
            'title': self._html_extract_title(webpage) or self._og_search_title(webpage),
            'description': self._og_search_description(webpage, default=None),
            'thumbnail': self._og_search_thumbnail(webpage),
-            'uploader': clean_html(get_element_by_class('owner', webpage)),
+            'uploader': self._search_regex(r'<p\sclass=\"name.+Channel\">([^<]+)<', webpage, 'uploader'),
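To illustrate why a regex over raw HTML can be brittle, here is a small sketch that runs the pattern from the diff against hypothetical markup (invented for this example, not taken from bitchute.com). Because `.+` is greedy, when a single line contains more than one `...Channel">`, the capture lands on the last occurrence rather than the intended one; a lazy `.+?` picks the first. The `SAMPLE` markup and the `uploader_from` helper are assumptions for illustration only.

```python
import re

# Pattern from the diff above, copied verbatim.
GREEDY = r'<p\sclass=\"name.+Channel\">([^<]+)<'
# Same pattern with a lazy quantifier, for comparison.
LAZY = r'<p\sclass=\"name.+?Channel\">([^<]+)<'

# Hypothetical single-line markup; NOT copied from bitchute.com.
SAMPLE = (
    '<p class="name"><a class="spa" title="View Channel">First Channel</a></p>'
    '<p class="other"><a class="spa" title="Related Channel">Second Channel</a></p>'
)


def uploader_from(pattern, html):
    """Hypothetical helper: return the first capture group, or None if no match."""
    m = re.search(pattern, html)
    return m.group(1) if m else None


print(uploader_from(GREEDY, SAMPLE))  # Second Channel (greedy .+ runs to the last match)
print(uploader_from(LAZY, SAMPLE))    # First Channel
```

A lazy quantifier only narrows the problem: both patterns still depend on the exact attribute layout of the page, which may be why the review below moves to a class-based lookup.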


How about something like this:

Suggested change

-            'uploader': self._search_regex(r'<p\sclass=\"name.+Channel\">([^<]+)<', webpage, 'uploader'),
+            'channel': clean_html(get_element_by_class('name', details)),
+            'uploader': clean_html(get_element_by_class('creator', details)),

with

        details = get_element_by_class('details', webpage) or ''

Potentially, the channel_url and uploader_url could also be extracted, as these can also differ.
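A rough, self-contained sketch of the class-based approach from the suggestion above. It does not use yt-dlp: `text_by_class` is a hypothetical stdlib (`html.parser`) stand-in for `clean_html(get_element_by_class(...))`, and the `DETAILS` markup is invented, assuming (as the suggested change implies) that the channel name sits in a `name` element and the uploader in a `creator` element inside the details block. It shows the two values being read separately, and that they can differ.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {'area', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'wbr'}


class _FirstClassText(HTMLParser):
    """Collect the text inside the first element whose class list contains `cls`."""

    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.depth = 0      # > 0 while inside the matched element
        self.done = False   # set once the matched element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.cls in (dict(attrs).get('class') or '').split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def text_by_class(cls, html):
    """Hypothetical stand-in: whitespace-normalised text of the first match, or None."""
    parser = _FirstClassText(cls)
    parser.feed(html or '')
    parser.close()
    return ' '.join(''.join(parser.parts).split()) or None


# Invented markup for illustration; the real page structure may differ.
DETAILS = '''
<div class="details">
  <p class="name"><a class="spa" href="/channel/example-channel/">Example Channel</a></p>
  <p class="creator"><a class="spa" href="/profile/example-user/">Example User</a></p>
</div>
'''

print(text_by_class('name', DETAILS))     # Example Channel
print(text_by_class('creator', DETAILS))  # Example User
```

Scoping the lookup to the `details` block (rather than the whole `webpage`) is what keeps a `name` class used elsewhere on the page from being picked up.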

SirElderling · Nov 8, 2023

I made a couple of changes and added channel_url and uploader_url as you suggested.

seproDev · Nov 8, 2023

I think both channel and uploader should be added, since they can differ, as seen here.

SirElderling · Nov 8, 2023

Thanks for the example @seproDev. I added the channel extraction as well, since the information was already there.

bashonly · 2023-11-08T14:29:06Z

yt_dlp/extractor/bitchute.py

+        details = get_element_by_class('details', webpage) or ''
+        uploader_path = extract_attributes(
+            get_element_html_by_class('spa', get_element_html_by_class('creator', details)) or '').get('href')
+        channel_path = extract_attributes(
+            get_element_html_by_class('spa', get_element_html_by_class('name', details)) or '').get('href')
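The failure mode raised in the review can be reproduced without yt-dlp. In this sketch, `element_html_by_class` is a hypothetical, regex-based stand-in for `get_element_html_by_class`: like the real helper (per the review comment below), it returns `None` when nothing matches and raises when its `html` argument is `None`, here because `re.search` rejects `None`. The markup is invented. The `or ''` fallback on the inner call is what keeps the nested lookup non-fatal.

```python
import re

# Invented markup that has no "creator" element, to trigger the failure.
DETAILS = '<div class="details"><p class="name">x</p></div>'


def element_html_by_class(cls, html):
    """Hypothetical stand-in: outer HTML of the first element with class `cls`, or None.

    re.search raises TypeError when `html` is None, mirroring the fatal case.
    """
    pattern = r'<(\w+)[^>]*\sclass="[^"]*\b%s\b[^"]*"[^>]*>.*?</\1>' % re.escape(cls)
    m = re.search(pattern, html, re.DOTALL)
    return m.group(0) if m else None


# Unguarded: the inner call returns None, so the outer call receives None and raises.
try:
    element_html_by_class('spa', element_html_by_class('creator', DETAILS))
except TypeError as exc:
    print('unguarded call failed:', exc)

# Guarded: falling back to '' turns the missing element into a plain None result.
print(element_html_by_class('spa', element_html_by_class('creator', DETAILS) or ''))  # None
```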


all of this "details" extraction should be moved down near the return for better code flow/readability

uploader_path and channel_path are currently fatal: the inner get_element_html_by_class call can return None, and get_element_html_by_class throws when its html input is None

matching base_url in _VALID_URL is overkill; we can always just use https://www.bitchute.com

use utils.urljoin

move duplicated code into a function

        details = get_element_by_class('details', webpage) or ''
        uploader_html = get_element_html_by_class('creator', details) or ''
        channel_html = get_element_html_by_class('name', details) or ''

        def construct_url(html):
            path = extract_attributes(get_element_html_by_class('spa', html) or '').get('href')
            return urljoin('https://www.bitchute.com', path)

        return {
            # ...
            'uploader': clean_html(uploader_html),
            'channel': clean_html(channel_html),
            'uploader_url': construct_url(uploader_html),
            'channel_url': construct_url(channel_html),
            # ...
        }
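One reason to prefer `utils.urljoin` over calling the standard library directly: `urllib.parse.urljoin` falls back to the base URL when the path is empty or `None`, so a missing `href` would silently become `https://www.bitchute.com`. yt-dlp's `urljoin` returns `None` in that case instead. The sketch below shows the stdlib behaviour next to a hypothetical `join_path` helper that applies the same kind of guard; it is an illustration, not yt-dlp's implementation, and the example path is invented.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.bitchute.com'


def join_path(path, base=BASE_URL):
    """Hypothetical helper: join an href onto the base URL, or return None if there is no href."""
    if not isinstance(path, str) or not path:
        return None
    return urljoin(base, path)


# The stdlib quietly returns the base URL when the path is missing.
print(urljoin(BASE_URL, None))                 # https://www.bitchute.com
print(join_path(None))                         # None
print(join_path('/channel/example-channel/'))  # https://www.bitchute.com/channel/example-channel/
```

Returning `None` lets the extractor simply omit `uploader_url` or `channel_url` when the link is absent, instead of reporting the site's home page as the channel.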

SirElderling · Nov 8, 2023

Thanks for the suggestions @bashonly. I applied those and reverted the base_url change.


sefabey · 2023-12-06T17:35:35Z

Interested in this being merged.

Closes yt-dlp#8492 Authored by: SirElderling

[BitChute] - Fix 'uploader' and playlist md5 result

0f97a23

Grub4K mentioned this pull request Nov 3, 2023

[RadioComercial] Add extractor #8508

Merged


bashonly self-requested a review November 3, 2023 20:05

seproDev added the site-bug Issue with a specific website label Nov 4, 2023

SirElderling changed the title ~~[BitChute] - Fix 'uploader' and playlist md5 result~~ [BitChute] - Fix 'uploader' and playlist tests md5 result Nov 4, 2023

seproDev reviewed Nov 7, 2023


[BitChute] - Fix 'uploader', add 'channel_url' and 'uploader_url'

25d7cac

SirElderling requested a review from seproDev November 8, 2023 13:51

bashonly reviewed Nov 8, 2023


[BitChute] - code cleanup following suggestions; added 'channel' extraction

7ae8311

bashonly self-requested a review November 26, 2023 03:24

minor

c2bd5ee

bashonly approved these changes Dec 6, 2023


bashonly self-assigned this Dec 6, 2023

bashonly merged commit b1a1ec1 into yt-dlp:master Dec 11, 2023
15 checks passed

gonzalezjo pushed a commit to gonzalezjo/yt-dlp that referenced this pull request Dec 12, 2023

[ie/bitchute] Fix and improve metadata extraction (yt-dlp#8507)

8adcaf0

Closes yt-dlp#8492 Authored by: SirElderling

SirElderling deleted the bitchute branch December 18, 2023 06:25

SirElderling restored the bitchute branch December 18, 2023 06:25

SirElderling deleted the bitchute branch December 18, 2023 06:25

aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024

[ie/bitchute] Fix and improve metadata extraction (yt-dlp#8507)

b00e4ca

Closes yt-dlp#8492 Authored by: SirElderling

