-
Notifications
You must be signed in to change notification settings - Fork 1.9k
lastgenre: Global and artist-based genre blacklist #5744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
256a8a1
to
c8199cb
Compare
Codecov Report❌ Patch coverage is
🚀 New features to boost your workflow:
|
9cd4947
to
a0e6054
Compare
a3daa01
to
4868326
Compare
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a blacklist feature for the lastgenre plugin to forbid specific genres globally or per-artist using regex patterns or plain text matching. It uses a custom file format to maximize compatibility with complex regex patterns and provides both global (*
) and artist-specific blacklisting capabilities.
- Adds blacklist functionality with custom file format supporting regex patterns and plain genre names
- Refactors the
setup()
method by extracting file loading logic into separate helper methods - Updates genre filtering to check blacklist patterns before accepting genres
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
beetsplug/lastgenre/init.py | Core implementation of blacklist functionality, file parsing, and genre filtering |
test/plugins/test_lastgenre.py | Comprehensive test coverage for blacklist patterns, file format parsing, and error handling |
docs/plugins/lastgenre.rst | Documentation for the new blacklist feature with examples and configuration options |
with open(bl_filename, "rb") as f: | ||
for lineno, line in enumerate(f, 1): | ||
line = line.decode("utf-8").lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file is opened in binary mode ('rb') but then decoded line by line. This could fail if the file contains multi-byte Unicode characters that span across buffer boundaries. Consider opening the file in text mode with explicit encoding or using a proper text decoder.
with open(bl_filename, "rb") as f: | |
for lineno, line in enumerate(f, 1): | |
line = line.decode("utf-8").lower() | |
with open(bl_filename, "r", encoding="utf-8") as f: | |
for lineno, line in enumerate(f, 1): | |
line = line.lower() |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I should follow this advice. I tried to orientate my code on how the simple text file whitelist.txt is loaded. Our guides only tell that paths to store in the database should be made sure to be bytes https://beets.readthedocs.io/en/latest/contributing.html#handling-paths
So actually I coud safely follow this advice. Some guidance please @wisp3rwind or @snejus
Thanks :-)
section = None | ||
with open(bl_filename, "rb") as f: | ||
for lineno, line in enumerate(f, 1): | ||
line = line.decode("utf-8").lower() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling .lower() on the entire line before parsing could interfere with regex patterns that depend on case sensitivity. The case conversion should be done after separating the artist name from the patterns, and only applied to the artist name.
line = line.decode("utf-8").lower() | |
line = line.decode("utf-8") |
Copilot uses AI. Check for mistakes.
if not genres: | ||
return [] | ||
return [x for x in genres if self._is_valid(x)] | ||
return [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _filter_valid_genres method calls _is_forbidden for every genre, which performs regex matching against potentially many patterns. Consider caching compiled regex patterns or implementing early termination strategies for better performance with large genre lists.
Copilot uses AI. Check for mistakes.
and add return types.
Uses a custom text file format since YAML, INI, TOML, ... all have their flaws with parsing regex patterns.
and use it via _filter_valid_genres. For performance reasons only consider it if the genre is valid in the first place.
Fall back to literal strings if a regex compile fails. This automagically supports strings as well as regex patterns in the blacklist.
5b24d49
to
13a01b1
Compare
Description
Forbids a list of genres or regex patterns configurable for an artist or globally (
*
). It uses a simple custom file format that could look like this:This configuration gives a practical example within: Artist "Fracture" is a Drum And Bass producer but a also a Metal band exists with the same name. last.fm returns genres for both artists. Removing Metal variants (Power Metal, Heavy Metal, ...) from the global whitelist file would result in not being able to fetch proper genres for any other Metal band in the user's library anymore.
A mixture of regex and simple genre names is possible since regex patterns are precompiled and if that fails, escaped literal strings are added to the blacklist.
This PR also includes some cleanup in the
setup()
method which got rather longish. It moves all file loading code to separate helper methods.Notes
After several experiments with
YAML
andINI
as the blacklist file format I came to the conclusion that it's just too much PITA trying to fit regex patterns within these formats and I gave up looking for the perfect format that doesn't stand in the way of userfriendliness and allowing all kinds of regex patterns withouth special escaping and formatting tricks.....I'm back from that hell and decide that a very simple custom format suits this usecase best.This feature idea also came up during this discussion: #5721 (comment) and is on my personal wish list for ages.
My personal roadmap for the
lastgenre
plugin: #5915To Do