[extractor/mx3] Add extractor #8736

Merged
merged 14 commits into from Jan 21, 2024

martinxyz
Contributor

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

Add a simple, basic extractor for mx3.ch.

(mx3.ch is a site that hosts music uploaded by bands from or in Switzerland. As a first approximation I'd say it's government funded.)

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

@seproDev seproDev added the site-request Request to support a new website label Dec 9, 2023
@bashonly bashonly self-requested a review December 12, 2023 00:08
@seproDev
Collaborator

I think it would be better to extract metadata from: https://mx3.ch/t/1LIY.json
And to also include the other formats:

  • https://mx3.ch/tracks/1Cru/player_asset (128 kbps mp3)
  • https://mx3.ch/tracks/1Cru/player_asset?quality=hd (320 kbps mp3)
  • https://mx3.ch/tracks/1Cru/player_asset?quality= (source wav file)
  • https://mx3.ch/tracks/1C6E/download (source download. Not available for all files)
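A rough sketch of how three of the endpoints listed above could be turned into format entries. The helper name `build_formats` and the preference numbers are assumptions made for illustration; only the URL suffixes and the bitrate notes come from the list.

```python
def build_formats(track_url):
    """Build candidate format dicts for one track URL (illustrative only)."""
    # (format_id, URL suffix, note, relative preference).
    # The preference numbers are hypothetical; higher means "prefer this one".
    variants = [
        ('default', '/player_asset', '128 kbps mp3', 1),
        ('hd', '/player_asset?quality=hd', '320 kbps mp3', 2),
        ('download', '/download', 'source download, not always available', 3),
    ]
    return [{
        'format_id': format_id,
        'url': f'{track_url}{suffix}',
        'format_note': note,
        'quality': quality,
    } for format_id, suffix, note, quality in variants]


formats = build_formats('https://mx3.ch/tracks/1Cru')
# Pick the entry with the highest preference value.
best = max(formats, key=lambda f: f['quality'])
```

Since the download endpoint does not exist for every track, a real extractor would still need to check availability before offering it.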

@seproDev seproDev added the pending-fixes PR has had changes requested label Dec 19, 2023
@martinxyz
Contributor Author

martinxyz commented Dec 26, 2023

Thanks for looking at this. I missed that there is a JSON endpoint; I'm using it now where possible. Sadly, it is missing the genre info (which I would be tempted to drop if it were the only thing), and it also doesn't say whether the "download" format is available.

I have added multiple formats now. The MIME types are all over the place (different video formats, mp3, wav). I'm not confident about hardcoding a bitrate into the info. I've removed my previous code that hardcoded the file extension to 'mp3' for audio and 'mp4' for video.

I tried setting 'ext' to None (hoping to trigger the default code that makes a HEAD request and then guesses the extension), but the testing framework doesn't seem to like that. So I've added the HEAD request directly into the extractor. It works, but I wonder if there is a better way. (Ideally, the result of the HEAD request would also fill in the file size info, etc. But maybe that's too much; let's first have a working extractor.)
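For illustration, the extension-guessing step after such a HEAD request boils down to mapping the Content-Type header to a file extension. The sketch below leaves out the network call and uses a small hand-written table; the names `MIME_TO_EXT` and `ext_from_content_type` are hypothetical, and yt-dlp has its own utilities for this mapping that are not used here.

```python
# Hypothetical lookup table; a real extractor would rely on yt-dlp's own helpers.
MIME_TO_EXT = {
    'audio/mpeg': 'mp3',
    'audio/wav': 'wav',
    'audio/x-wav': 'wav',
    'video/mp4': 'mp4',
    'video/quicktime': 'mov',
}


def ext_from_content_type(content_type, default=None):
    """Map a Content-Type header value to a file extension."""
    if not content_type:
        return default
    # Drop parameters such as "; charset=..." and normalise the case.
    mime = content_type.split(';', 1)[0].strip().lower()
    return MIME_TO_EXT.get(mime, default)
```

Returning a default instead of raising keeps optional formats from breaking extraction when the server reports an unexpected type.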

The track IDs on neo.mx3.ch and volksmusik.mx3.ch do not work on mx3.ch.
The sites even require users to create a separate login.

And also extract "composer" and "performer".
Some other extractors use lists too, but it doesn't work well with filename
templates.
Comment on lines 47 to 62
    add_format({
        'url': f'{track_url}/player_asset',
        'format_id': 'default',
        'quality': 1,
    }, fatal=True)
    # the formats below don't always exist
    add_format({
        'url': f'{track_url}/player_asset?quality=hd',
        'format_id': 'hd',
        'quality': 10,
    }, fatal=False)
    add_format({
        'url': f'{track_url}/download',
        'format_id': 'download',
        'quality': 11,
    }, fatal=False)
Collaborator

IMO yt-dlp should always download the highest-quality format by default. We also prefer the source file in other extractors, such as Vimeo. If you don't want to download the highest-quality format, you can use -f or -S.

Suggested change
    add_format({
        'url': f'{track_url}/player_asset',
        'format_id': 'default',
        'quality': 1,
    })
    add_format({
        'url': f'{track_url}/player_asset?quality=hd',
        'format_id': 'hd',
        'quality': 10,
    })
    add_format({
        'url': f'{track_url}/download',
        'format_id': 'download',
        'quality': 11,
    })
    add_format({
        'url': f'{track_url}/player_asset?quality=source',
        'format_id': 'source',
        'quality': 11,
    })

Contributor Author

Hm. I can see your point. But for me there is simply no difference in quality between a high-bitrate MP3 and WAV. For video I would list the formats and pick one manually, but for music I use the "Open With" browser extension (non-interactive) and check the download folder later. I'm going to use -f"best[ext!=wav][ext!=flac][filesize<50M]" -x now, so it will work for me if you add the format.

Collaborator

If you only want this format selection for this site, just use -f hd/default. This will download hd if available and otherwise fall back to default.
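The fallback behaviour of a selector like hd/default can be pictured with a simplified model. This is not yt-dlp's selector implementation: it only handles plain format IDs separated by slashes, and the function name `select_by_preference` is made up for this sketch.

```python
def select_by_preference(formats, selector):
    """Return the first format whose ID appears in a slash-separated selector."""
    by_id = {f['format_id']: f for f in formats}
    # Try each alternative from left to right and stop at the first match.
    for format_id in selector.split('/'):
        if format_id in by_id:
            return by_id[format_id]
    return None


with_hd = [{'format_id': 'default'}, {'format_id': 'hd'}]
without_hd = [{'format_id': 'default'}]

picked_hd = select_by_preference(with_hd, 'hd/default')
picked_default = select_by_preference(without_hd, 'hd/default')
```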

Comment on lines 64 to 74
    return {
        'id': track_id,
        'formats': formats,
        'artist': ', '.join(artists),
        'genre': genre,
        **traverse_obj(json, {
            'title': ('title', {str}),
            'composer': ('composer_name', {str}),
            'thumbnail': (('picture_url_xlarge', 'picture_url'), {url_or_none}),
        }, get_all=False),
    }
Collaborator

@seproDev seproDev Jan 20, 2024

Okay, I wrote a function to extract more metadata fields. For artist/performer, how about we split this across artist and album_artist, with a fallback for artist.

Suggested change
    more_info = get_element_by_class('single-more-info', webpage)

    def get_info_field(name):
        return self._html_search_regex(
            rf'<dt[^>]*>\s*{name}\s*</dt>\s*<dd[^>]*>(.*?)</dd>',
            more_info, name, default=None, flags=re.DOTALL)

    return {
        'id': track_id,
        'formats': formats,
        'genre': self._html_search_regex(
            r'<div\b[^>]+class="single-band-genre"[^>]*>([^<]+)</div>', webpage, 'genre', fatal=False),
        'release_year': int_or_none(get_info_field('Year of creation')),
        'description': get_info_field('Description'),
        'tags': try_call(lambda: get_info_field('Tag').split(', '), list),
        **traverse_obj(data, {
            'title': ('title', {str}),
            'artist': (('performer_name', 'artist'), {str}),
            'album_artist': ('artist', {str}),
            'composer': ('composer_name', {str}),
            'thumbnail': (('picture_url_xlarge', 'picture_url'), {url_or_none}),
        }, get_all=False),
    }
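To see what the dt/dd pattern in the suggestion matches, here is a standalone sketch using only the standard library. The sample markup is invented for the example (the real page structure is not shown in this thread), and this `get_info_field` takes the markup as a second argument instead of going through the extractor's `_html_search_regex`.

```python
import html
import re

# Invented sample markup; only the dt/dd pairing mirrors the suggested regex.
SAMPLE = '''
<dl class="single-more-info">
  <dt>Year of creation</dt>
  <dd>2021</dd>
  <dt>Tag</dt>
  <dd>Rock, Indie &amp; Pop</dd>
</dl>
'''


def get_info_field(name, markup):
    """Return the text of the <dd> that follows the <dt> labelled `name`."""
    match = re.search(
        rf'<dt[^>]*>\s*{re.escape(name)}\s*</dt>\s*<dd[^>]*>(.*?)</dd>',
        markup, flags=re.DOTALL)
    if not match:
        return None
    # Strip any nested tags, then decode HTML entities.
    text = re.sub(r'<[^>]+>', '', match.group(1))
    return html.unescape(text).strip()


year = get_info_field('Year of creation', SAMPLE)
tags = get_info_field('Tag', SAMPLE).split(', ')
missing = get_info_field('Description', SAMPLE)
```

A missing field yields None rather than an error, which is why the suggestion wraps the tag split in try_call.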

Contributor Author

The artist/album_artist split seems to fit pretty well; I like it. I slightly preferred the filenames I got previously with the format string. I'll get a few duplicated artist names now, but it's not too bad really.

I've updated the tests to match. I noticed that https://neo.mx3.ch/t/1hpd kind of has a description, but they put it all into the credits field; I'm not sure if we want to add that.
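The artist/album_artist split with its fallback can be approximated in plain Python as below. This is a sketch of the intended behaviour, not of traverse_obj itself; the helper names `split_artists` and `first_str` are hypothetical, while the keys `performer_name` and `artist` are the ones used in the suggestion.

```python
def split_artists(track):
    """Derive 'artist' and 'album_artist' from a track metadata dict."""
    def first_str(*keys):
        # Return the first value that is present and is a string.
        for key in keys:
            value = track.get(key)
            if isinstance(value, str):
                return value
        return None

    return {
        # Prefer the performer; fall back to the band/artist name.
        'artist': first_str('performer_name', 'artist'),
        'album_artist': first_str('artist'),
    }


with_performer = split_artists({'artist': 'Band', 'performer_name': 'Singer'})
without_performer = split_artists({'artist': 'Band'})
```

When no performer is given, both fields carry the same name, which is the duplication mentioned above.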

@seproDev seproDev added pending-review PR needs a review and removed pending-fixes PR has had changes requested labels Jan 20, 2024
@bashonly
Member

Are there overlapping IDs between the 3 sites? Could this just be 1 extractor?

@seproDev
Collaborator

There are collisions: 1g2T (neo), 1g2T (volksmusik)

@bashonly bashonly removed the pending-review PR needs a review label Jan 21, 2024
@seproDev seproDev merged commit 5a63454 into yt-dlp:master Jan 21, 2024
6 checks passed
aalsuwaidi pushed a commit to aalsuwaidi/yt-dlp that referenced this pull request Apr 21, 2024