
Handle/Organize Large Archive Files #21300

Closed
QBcrusher opened this issue Jun 4, 2019 · 3 comments

@QBcrusher QBcrusher commented Jun 4, 2019

Checklist

  • I'm reporting a feature request
  • I've verified that I'm running youtube-dl version 2019.05.20
  • I've searched the bugtracker for similar feature requests including closed ones

Description

I'm currently working with a large archive file (almost 2 million entries spanning close to 300 channels). I've noticed that my update script seems to scan through a channel's videos more quickly when that channel's entries are close to each other in the archive file. I think it'd be helpful to have a way to scan through the archive file and group the entries together by channel.

Possibly this could be done by extracting the entries into a separate text file for each channel and then merging them back together, or maybe even by supporting output templates in the archive parameter so that you can use multiple archive files within the same script.

I'm not a very good coder so I don't know the easiest way to do this, but I'm sure it's possible somehow. If it's already possible, feel free to close the topic. Thanks!
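One workaround that needs no changes to youtube-dl is to keep a separate archive file per channel and pass it with the existing `--download-archive` option, one run per channel. A rough sketch of such a wrapper (the `channels.txt` file, the `archives/` directory, and the helper names are all made up for this example):

```python
import re
import subprocess
from pathlib import Path


def archive_name(url):
    """Map a channel URL to a filesystem-safe archive filename."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", url.strip()) + ".txt"


def update_channels(channel_file="channels.txt", archive_dir="archives"):
    """Run youtube-dl once per channel, each run with its own archive,
    so a lookup only ever touches that channel's own entries."""
    out = Path(archive_dir)
    out.mkdir(exist_ok=True)
    for url in Path(channel_file).read_text().splitlines():
        url = url.strip()
        if not url:
            continue
        subprocess.run(
            ["youtube-dl", "--download-archive",
             str(out / archive_name(url)), url],
            check=False,  # keep going even if one channel fails
        )


# Usage (assuming channels.txt lists one channel URL per line):
# update_channels("channels.txt", "archives")
```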

@QBcrusher QBcrusher added the request label Jun 4, 2019
Collaborator

@dstftw dstftw commented Jun 4, 2019

Putting entries close to each other (whatever that means) won't have any effect in general, assuming a random video.
Also, managing the granularity of archive files is your responsibility as a user.
Output templates will not be allowed for download archive file paths, for obvious reasons.

@dstftw dstftw closed this Jun 4, 2019
Author

@QBcrusher QBcrusher commented Jun 4, 2019

By "close to each other" I meant video IDs from the same channel being near each other in the archive file (so video 1 is on line 1000, video 2 is on line 1001, etc.).

This has to be relevant to some degree, because my connection is steady yet some channels run through quickly and some very slowly. The only difference is that some channels' IDs are grouped together in the archive (because the whole channel was downloaded at once), while others are spread apart: they were updated over time, so their IDs got written to the archive with other channels' IDs in between.

Author

@QBcrusher QBcrusher commented Jun 4, 2019

I mean, you are the expert, so I could be wrong, but if scanning the archive file is anything like the "find" feature in Notepad++, it's faster to find a string of text that's 10 lines down from the current one than one that's 1,000,000 lines down. It might only be half a second, but that half second adds up if you're doing it hundreds of times.

Maybe I'm fundamentally misunderstanding how checking for video IDs in the archive file works, but it seems obvious that IDs spread apart by thousands of lines would take longer to find than ones that are grouped together.
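Whether position in the file matters comes down to how the archive is checked. If each check re-scans the file top to bottom, an entry near the top is found sooner (and a miss costs a full pass); if the whole file is loaded into a set once up front, position is irrelevant. A rough sketch of both strategies in plain Python (illustrative only; this is not youtube-dl's actual code, and `archive.txt` is a made-up path):

```python
def scan_archive(path, archive_id):
    """Re-read the file for every check: the cost grows with the line
    number where the entry sits, or the file length on a miss."""
    with open(path, encoding="utf-8") as f:
        return any(line.strip() == archive_id for line in f)


def load_archive(path):
    """Read the file once into a set: every later membership test is
    O(1) on average, regardless of where the entry is in the file."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f}


# Usage sketch:
# archive = load_archive("archive.txt")
# "youtube dQw4w9WgXcQ" in archive  # position-independent lookup
```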
