Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SponsorBlock merge algorithm improvements #999

Merged
merged 2 commits into from
Sep 19, 2021
Merged

SponsorBlock merge algorithm improvements #999

merged 2 commits into from
Sep 19, 2021

Conversation

nihil-admirari
Copy link
Contributor

@nihil-admirari nihil-admirari commented Sep 16, 2021

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Multiple small fixes:

  1. Restore a commit from A better SponsorBlock #360 (@pukkandan force-pushed the sponsor-block branch from eb3c813 to e10ef4c 15 days ago). Please consider using --force-with-lease the next time.
  2. Standardized SHA256 hash sums.
  3. Hash sums for TAR archive are currently missing. Incorporated into [build] Improve release process #880.
  4. Tiny chapters were unconditionally appended to the previous chapter. Now, if possible, tiny normal chapters are combined with normals and sponsors with sponsors.

Since the fixes are tiny, they are combined into one pull request. Cherry-pick whatever you find useful.

Signed-off-by: nihil-admirari <50202386+nihil-admirari@users.noreply.github.com>
@pukkandan
Copy link
Member

pukkandan commented Sep 16, 2021

Restore a commit from A better SponsorBlock #360 (@pukkandan force-pushed the sponsor-block branch from eb3c813 to e10ef4c 15 days ago). Please consider using --force-with-lease the next time.

Sorry about that. I don't remember whether I pushed with/without leash, but I do remember seeing and reviewing that commit. So not sure what happened. No point in mulling over it now - I'll try to be more careful in the future.

Tiny chapters were unconditionally appended to the previous chapter. Now, if possible, tiny normal chapters are combined with normals and sponsors with sponsors.
Repeats of the same sponsor category are now accounted for when determining the smallest sponsor within the overlap.

I don't currently have time to properly test this. But if you have tested everything and is confident, I can merge it

Standardized SHA256 hash sums. Fixes SHA2-256SUMS cannot be verified using standard Linux tools #385. It's been a long time since GNU-style release SHA512 sums #383: everyone has updated, so format change should not be an issue.
Hash sums for TAR archive are currently missing.

I had pretty much forgotten about this. I'll cherrypick it

When resuming a download of e.g. a playlist, --embed-metadata overwrites previously downloaded videos even if metadata has already been embedded and haven't really changed. A new --no-override-metadata (enabled by default) saves a lot of disk operations.

Can't the same be said about most other postprocessing operations? We'll need to think about how to implement this without needing a new switch for each one

PS: btw, why not just use download-archive for this?

@nihil-admirari
Copy link
Contributor Author

But if you have tested everything and is confident, I can merge it

There are unit test for all of these changes. However, after reviewing the code once more, I decided that summing durations is not needed. Since sponsors are no longer merged into one, but rather separated into before overlap, overlap, and after overlap, the overlap part rarely contains more than two sponsors. Even in edge cases when it does, picking the absolute smallest contributor is probably better than picking the category with lowest total duration. I've added a test for smallest chapter determination anyway.

I had pretty much forgotten about this. I'll cherrypick it

I forgot to remove old format handling code in the updater, which is now fixed.

PS: btw, why not just use download-archive for this?

Didn't know that --download-archive disables post-processing completely. Indeed, it makes the last commit unecessary.

…sors

Signed-off-by: nihil-admirari <50202386+nihil-admirari@users.noreply.github.com>
@nihil-admirari nihil-admirari changed the title Hashing + sponsor merge algo changes + override metadata SponsorBlock merge algorithm improvements Sep 18, 2021
@nihil-admirari
Copy link
Contributor Author

nihil-admirari commented Sep 18, 2021

I dropped hashing-related commits. Only SponsorBlock changes remain.

@pukkandan pukkandan merged commit c6af2dd into yt-dlp:master Sep 19, 2021
@nihil-admirari nihil-admirari deleted the various_features branch September 19, 2021 16:30
nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants