Handle/Organize Large Archive Files #21300
Comments
Putting entries close to each other (whatever this means) won't have any effect in general, assuming random videos.
By "close to each other" I meant that video IDs from the same channel sit near each other in the archive file (so video 1 is on line 1000, video 2 is on line 1001, etc.). This has to be relevant to some degree, because my connection is steady, yet some channels run through quickly and others very slowly. The only difference is that some of the IDs are grouped together (because I downloaded a whole channel at once), while others are spread apart because those channels are updated over time and their IDs get written to the archive interleaved with IDs from other channels that were updated in between.
I mean, you are the expert, so I could be wrong, but if scanning the archive file is anything like the "Find" feature in Notepad++, it's faster to find a string of text that's 10 lines down from the current one than one that's 1,000,000 lines down. It might only be half a second, but that half second adds up if you are doing it hundreds of times. Maybe I'm fundamentally misunderstanding how checking for video IDs in the archive file works, but it seems obvious that IDs spread apart by thousands of lines would take longer to find than ones that are grouped together.
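To make the two positions in this exchange concrete, here is a minimal Python sketch of two ways a membership check against an archive file could work. It assumes the archive is a plain text file with one entry per line; the function names `linear_contains` and `load_archive` are made up for illustration and are not taken from youtube-dl's code.

```python
def linear_contains(archive_path, entry):
    """Scan the file from the top and stop at the first matching line.

    The cost grows with how far down the file the entry sits, and a
    missing entry costs a full pass over the file.
    """
    with open(archive_path, encoding="utf-8") as archive_file:
        for line in archive_file:
            if line.strip() == entry:
                return True
    return False


def load_archive(archive_path):
    """Read every non-empty line once into a set.

    After this one-time pass, `entry in archive` takes roughly the same
    time wherever the entry appeared in the file.
    """
    with open(archive_path, encoding="utf-8") as archive_file:
        return {line.strip() for line in archive_file if line.strip()}
```

With a scan that restarts from the top for every video, an entry's distance from the start of the file matters, not its distance from the previously checked entry. With the set-based variant, the ordering of the file should not matter at all.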
Description
I'm currently working with a large archive file (almost 2 million entries spread across close to 300 channels). I've noticed that my update script seems to scan through a channel's videos more quickly when the entries are close to each other in the archive file. I think it'd be helpful to have a way to scan through the archive file and group the entries together by channel.
This could possibly be done by extracting the entries to a separate text file for each channel and then merging them together, or maybe even by supporting output templates in the archive parameter so that you can use multiple archive files with the same script.
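As a rough illustration of the grouping idea, here is a minimal Python sketch. It assumes each archive line ends with the video ID (for example `youtube <id>`) and that a mapping from video ID to channel is available from somewhere else, since the archive file itself does not record the channel. The `group_by_channel` function and the `channel_of` mapping are hypothetical names, not an existing youtube-dl feature.

```python
from collections import defaultdict


def group_by_channel(archive_lines, channel_of):
    """Return the archive entries reordered so each channel's entries are adjacent.

    archive_lines: iterable of archive lines, assumed to end with the video ID.
    channel_of: hypothetical dict mapping video ID -> channel name.
    Channels keep the order in which they first appear; entries keep
    their original order within a channel.
    """
    groups = defaultdict(list)
    for line in archive_lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        video_id = entry.split()[-1]
        groups[channel_of.get(video_id, "unknown")].append(entry)
    return [entry for channel in groups for entry in groups[channel]]
```

For example, `group_by_channel(["youtube a1", "youtube b1", "youtube a2"], {"a1": "A", "a2": "A", "b1": "B"})` returns `["youtube a1", "youtube a2", "youtube b1"]`. Writing each group to its own file instead of concatenating them would give the one-file-per-channel variant.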
I'm not a very good coder so I don't know the easiest way to do this, but I'm sure it's possible somehow. If it's already possible, feel free to close the topic. Thanks!