Improve Media Selection Logic for Duplicate Resolution #31

Open · troykelly opened this issue Dec 1, 2023 · 2 comments
Labels: enhancement (New feature or request)

Comments

@troykelly (Owner)

Challenge Description

Our EmbyDedupe script identifies duplicates purely from the media metadata available through Emby's API. It then selects one media item to keep and marks the rest for deletion. The selection of the 'best' copy among duplicates currently relies mainly on basic attributes such as resolution and bitrate. This could be improved by adopting a more sophisticated multi-criteria rating system that covers a broader range of quality metrics, such as codec efficiency, audio quality, frame rate, and HDR presence.
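For reference, the current behaviour described above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual EmbyDedupe code: the grouping key and the dict fields (`provider_key`, `width`, `height`, `bitrate`) are placeholders rather than Emby API field names.

```python
# Hypothetical sketch of the current baseline: group duplicates by a
# metadata key and keep the copy with the highest (resolution, bitrate).
# The dict keys ("id", "provider_key", "width", "height", "bitrate") are
# illustrative placeholders, not Emby's actual API field names.
from collections import defaultdict


def group_duplicates(items):
    """Group items that share the same (hypothetical) provider_key."""
    groups = defaultdict(list)
    for item in items:
        groups[item["provider_key"]].append(item)
    # Only groups with more than one item contain duplicates.
    return [g for g in groups.values() if len(g) > 1]


def baseline_keep(group):
    """Pick the copy with the most pixels, breaking ties on bitrate."""
    return max(group, key=lambda m: (m["width"] * m["height"], m["bitrate"]))


items = [
    {"id": "a", "provider_key": "movie-1", "width": 1920, "height": 1080, "bitrate": 8_000_000},
    {"id": "b", "provider_key": "movie-1", "width": 3840, "height": 2160, "bitrate": 15_000_000},
    {"id": "c", "provider_key": "movie-2", "width": 1280, "height": 720, "bitrate": 4_000_000},
]
for group in group_duplicates(items):
    keep = baseline_keep(group)
    delete = [m["id"] for m in group if m is not keep]
    print(keep["id"], delete)  # prints: b ['a']
```

A tuple key like this only looks at bitrate when resolutions tie, which is exactly the limitation the proposal is trying to address.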

Objective

The goal is to improve the duplicate-resolution logic so that it makes a better-informed decision when selecting the highest-quality media item to retain. This requires a weighted scoring system in which each media attribute contributes to a composite 'quality score' for each media item. The item with the highest score is presumed to be the best-quality copy and is retained, while the others are marked for deletion.
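One way to write the composite score down, assuming each criterion $i$ is first mapped to a sub-score $s_i(m)$ between 0 and 1 and the weights $w_i$ sum to 1 (this normalisation is an assumption, not something the issue specifies). Here $G$ is one group of duplicates and $k$ is the number of criteria:

```latex
Q(m) = \sum_{i=1}^{k} w_i \, s_i(m), \qquad 0 \le s_i(m) \le 1, \qquad \sum_{i=1}^{k} w_i = 1
\qquad
m^{*} = \arg\max_{m \in G} Q(m), \qquad \text{delete } G \setminus \{m^{*}\}
```

For example, with two criteria weighted $w = (0.6, 0.4)$, a copy with sub-scores $(1.0, 0.5)$ gets $Q = 0.8$ and is kept over a copy with sub-scores $(0.5, 1.0)$, which gets $Q = 0.7$.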

Criteria for Quality Assessment

Key criteria for the scoring system should include, but not be limited to:

  • Resolution: Both width and height dimensions.
  • Video Codec Efficiency: The relative compression efficiency of video codecs such as H.264, HEVC (H.265), VP9, and AV1, since newer codecs can deliver comparable quality at lower bitrates.
  • Audio Quality: Channel count, audio codec type, and bitrate.
  • File Size: Larger files generally suggest higher quality, but this should carry less weight than other criteria, because a more efficient codec can deliver the same quality in a smaller file.
  • Frame Rate: The actual frame rate reported for the media, with a preference for higher rates.
  • HDR Presence: Whether the video is HDR, since HDR generally improves viewing quality.
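Before these criteria can be weighted against each other, each one needs to be mapped onto a common scale. Below is a minimal sketch, assuming every sub-score is normalised to the range 0 to 1. The codec efficiency factors, audio codec ranking, reference resolution and size bounds are placeholder assumptions for discussion, not measured values.

```python
# Illustrative sub-score functions, each mapping one attribute to [0, 1].
# All constants below are placeholder assumptions, not agreed values.
import math

# Assumed relative compression efficiency (higher = more quality per bit).
CODEC_EFFICIENCY = {"h264": 0.5, "vp9": 0.75, "hevc": 0.8, "h265": 0.8, "av1": 1.0}

# Assumed ranking of audio codecs.
AUDIO_CODEC_RANK = {"mp3": 0.3, "aac": 0.5, "ac3": 0.6, "eac3": 0.7, "dts": 0.8, "truehd": 1.0, "flac": 1.0}


def clamp01(x):
    return max(0.0, min(1.0, x))


def resolution_score(width, height, ref=(3840, 2160)):
    """Pixel count relative to a reference resolution (UHD by default)."""
    return clamp01((width * height) / (ref[0] * ref[1]))


def codec_score(codec):
    """Unknown codecs get a low default score instead of raising."""
    return CODEC_EFFICIENCY.get(codec.lower(), 0.25)


def audio_score(channels, codec, bitrate, max_channels=8, max_bitrate=1_500_000):
    """Average of channel count, codec rank and bitrate, each capped at 1."""
    parts = [
        clamp01(channels / max_channels),
        AUDIO_CODEC_RANK.get(codec.lower(), 0.2),
        clamp01(bitrate / max_bitrate),
    ]
    return sum(parts) / len(parts)


def size_score(size_bytes, min_bytes=100 * 1024**2, max_bytes=80 * 1024**3):
    """Log-scaled between min_bytes and max_bytes, so each doubling of the
    file size adds the same amount to the score."""
    if size_bytes <= min_bytes:
        return 0.0
    return clamp01(math.log(size_bytes / min_bytes) / math.log(max_bytes / min_bytes))


def frame_rate_score(fps, max_fps=60.0):
    return clamp01(fps / max_fps)


def hdr_score(has_hdr):
    return 1.0 if has_hdr else 0.0
```

Log-scaling is one way to keep file size from dominating: going from 1 GB to 2 GB raises the score by the same amount as going from 10 GB to 20 GB.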

Other factors, such as colour depth, the presence of subtitles, and multiple language tracks, may also be considered where relevant.
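These optional factors could be folded in as one more sub-score. A hypothetical sketch follows; the caps (12-bit colour, 5 subtitle languages, 3 audio languages) and the equal weighting of the three parts are assumptions.

```python
# Hypothetical extra sub-scores for colour depth, subtitles and language
# tracks. The caps used here are illustrative assumptions.


def bit_depth_score(bits):
    """8-bit maps to 0, 10-bit to 0.5, 12-bit (or more) to 1."""
    return max(0.0, min(1.0, (bits - 8) / 4))


def track_count_score(count, cap):
    """Linear in the number of tracks up to the cap, flat beyond it."""
    return min(count, cap) / cap if cap > 0 else 0.0


def extras_score(bit_depth, subtitle_langs, audio_langs):
    """Equal-weight average of colour depth and distinct subtitle/audio languages."""
    parts = [
        bit_depth_score(bit_depth),
        track_count_score(len(set(subtitle_langs)), cap=5),
        track_count_score(len(set(audio_langs)), cap=3),
    ]
    return sum(parts) / len(parts)
```

Counting distinct languages rather than raw tracks avoids rewarding a file that simply carries the same subtitle language twice.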

Discussion Points

Before we implement these changes, we need to address several considerations:

  1. Determining the appropriate weight for each criterion based on how much it contributes to perceived media quality.
  2. Ensuring the system is flexible enough to handle future updates or new media attributes.
  3. Evaluating the computational complexity of the new selection logic and its impact on the script's performance, especially when dealing with large libraries.
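Points 2 and 3 could be addressed together with a small registry of scoring functions. In the hypothetical sketch below (`SCORERS`, `register` and `composite_score` are illustrative names, not existing EmbyDedupe code), adding an attribute means registering one function, weights come from configuration, and unknown or zero-weight criteria are ignored.

```python
# Sketch of a pluggable scorer registry: new attributes are added by
# registering a function, and weights live in config rather than code.
# All names and item keys here are hypothetical.
from typing import Callable, Dict, Mapping

Scorer = Callable[[Mapping], float]
SCORERS: Dict[str, Scorer] = {}


def register(name: str):
    """Decorator that adds a sub-score function under a config name."""
    def wrap(fn: Scorer) -> Scorer:
        SCORERS[name] = fn
        return fn
    return wrap


@register("resolution")
def _resolution(item):
    return min(1.0, item["width"] * item["height"] / (3840 * 2160))


@register("hdr")
def _hdr(item):
    return 1.0 if item.get("hdr") else 0.0


def composite_score(item: Mapping, weights: Mapping[str, float]) -> float:
    """Weighted mean of registered sub-scores; unknown names are skipped.

    Cost is O(k) per item for k criteria, so scoring n items is O(n * k):
    one linear pass with no pairwise comparisons.
    """
    active = {n: w for n, w in weights.items() if n in SCORERS and w > 0}
    total = sum(active.values())
    if total == 0:
        return 0.0
    return sum(w * SCORERS[n](item) for n, w in active.items()) / total
```

Because each item is scored independently, the cost grows linearly with library size; for large libraries, fetching metadata from the Emby API is likely to cost far more than the arithmetic.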

I would appreciate feedback on the proposed changes, including any additional criteria that might be relevant and any potential pitfalls we should be aware of. Let's refine the approach into robust media-selection logic that reliably keeps the highest-quality content.

Action Items

  • Discuss and finalize the criteria and their respective weights for the quality assessment formula.
  • Update the determine_items_to_delete function to incorporate the new weighted scoring system.
  • Test and validate the new media selection logic to ensure its accuracy and efficiency.
  • Document the changes and their rationale for future reference and maintenance.
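As a starting point for the second action item, here is one possible shape for an updated `determine_items_to_delete`. The real function's signature and data structures may differ; this sketch assumes each duplicate group is a list of dicts with an `id` and precomputed sub-scores under a hypothetical `subscores` key, and the weights are placeholders.

```python
# Hypothetical shape of determine_items_to_delete using a composite score.
# The real function's signature and data may differ: here each duplicate
# group is a list of dicts with an "id" and sub-scores in [0, 1].
from typing import List, Mapping

# Placeholder weights for discussion, not agreed values.
DEFAULT_WEIGHTS = {
    "resolution": 0.30,
    "codec": 0.20,
    "audio": 0.15,
    "hdr": 0.15,
    "frame_rate": 0.10,
    "size": 0.10,
}


def quality_score(subscores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of sub-scores; a missing sub-score counts as 0."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * subscores.get(name, 0.0) for name, w in weights.items()) / total


def determine_items_to_delete(
    groups: List[List[dict]], weights: Mapping[str, float] = DEFAULT_WEIGHTS
) -> List[str]:
    """Keep the highest-scoring item in each group; return the other ids.

    Ties on score go to the lexicographically larger id, so the result does
    not depend on the order in which items were returned.
    """
    to_delete: List[str] = []
    for group in groups:
        keep = max(group, key=lambda m: (quality_score(m["subscores"], weights), m["id"]))
        to_delete.extend(m["id"] for m in group if m is not keep)
    return to_delete


groups = [
    [
        {"id": "a", "subscores": {"resolution": 1.0, "codec": 0.8, "hdr": 1.0}},
        {"id": "b", "subscores": {"resolution": 0.25, "codec": 0.5, "size": 1.0}},
    ],
    [{"id": "c", "subscores": {"resolution": 0.5}}],  # single item: nothing deleted
]
print(determine_items_to_delete(groups))  # prints: ['b']
```

Separating the scoring from the keep/delete decision also makes the testing action item easier, since each sub-score and the final selection can be checked independently.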
troykelly added the enhancement (New feature or request) label Dec 1, 2023
troykelly self-assigned this Dec 1, 2023
@mgaulton commented May 2, 2024

Based on my own usage, I wonder if you could add a "targeted" quality option to the criteria.
For example, for some things I only need 480p and don't want anything higher.
For other things, I want 720p/1080p and nothing higher.
Or just keep the highest quality available within a size range.
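A targeted option could be layered on top of the quality score: filter each duplicate group down to the copies that fit the target, then keep the best of those. In this sketch the `Target` fields, the precomputed `score` key and the fallback rule (keep the lowest-resolution copy when nothing fits) are all assumptions.

```python
# Sketch of a "targeted quality" option: keep the best copy that fits a
# target rather than the best copy overall. Target fields, the "score"
# key and the fallback rule are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    max_height: Optional[int] = None  # e.g. 480 or 1080; None means no cap
    min_size: Optional[int] = None    # bytes
    max_size: Optional[int] = None    # bytes

    def accepts(self, item: dict) -> bool:
        if self.max_height is not None and item["height"] > self.max_height:
            return False
        if self.min_size is not None and item["size"] < self.min_size:
            return False
        if self.max_size is not None and item["size"] > self.max_size:
            return False
        return True


def pick_keeper(group: List[dict], target: Target) -> dict:
    """Best-scoring copy within the target; if none fits, the lowest-resolution copy.

    The fallback is a design assumption: when every copy exceeds the target,
    keeping the smallest one stays closest to the user's intent.
    """
    eligible = [m for m in group if target.accepts(m)]
    if eligible:
        return max(eligible, key=lambda m: m["score"])
    return min(group, key=lambda m: (m["height"], m["size"]))
```

With no fields set, `Target()` accepts everything, so the default behaviour is still "keep the highest available".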

@troykelly (Owner, Author)

Ah, interesting.
I've been using Sonarr/Radarr etc. to do that, and the dedupe script just to get rid of the massive bulk and duplication, but a target is a good idea.
Given that the script works on one library at a time, though, it would be a target for the whole library.
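Since a target would apply to a whole library, one possible configuration shape is a mapping from library name to target, with unlisted libraries falling back to "keep the best". The library names and the `max_height` key are hypothetical.

```python
# Hypothetical per-library targets, reflecting that the script processes
# one library at a time. Library names and keys are placeholders.
LIBRARY_TARGETS = {
    "Kids TV": {"max_height": 480},
    "Movies": {"max_height": 1080},
    "4K Movies": {},  # no cap: keep the best available copy
}


def target_for_library(library_name, targets=LIBRARY_TARGETS):
    """Return the target for a library, defaulting to no constraints."""
    return targets.get(library_name, {})


def within_target(item, target):
    """True if the item's height does not exceed the library's cap."""
    cap = target.get("max_height")
    return cap is None or item["height"] <= cap
```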
