Improve efficiency of upload when not checking changes #38
This pull request greatly improves the efficiency of the script when `MANAGE_CHANGES` is `False`. For large sets of media, this can be the difference between a run that takes seconds and one that takes minutes, hours, or longer.
The script currently enumerates all files (`self.grabNewFiles()` returns every file) and then runs `self.uploadFile` on each path. Inside `self.uploadFile`, the database is loaded and queried, and `MANAGE_CHANGES` is checked. With `MANAGE_CHANGES` set to `False`, this is a no-op for existing images: a lot of processing happens only to end up ignoring the file. For a set of 2000 images of which 5 are new, this expensive work runs 2000 times, once per path.

This PR runs the upload only for new files, i.e., those not yet in the database. It computes the set difference between the complete list of files from
`self.grabNewFiles()` and the list of paths from the database, then runs the upload on only those paths. In the example above, the code makes just the 5 calls needed to upload the new files and ignores the rest.

If `MANAGE_CHANGES` is `True`, the code behaves as it did before, since it must check every image for hash changes.
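The filtering step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PR's actual code: `paths_to_upload` and `run_uploads` are hypothetical helpers, the known paths are assumed to have been read from the database in a single query up front, and the `upload` callback stands in for `self.uploadFile`. The set difference is written as an order-preserving filter against a `set` so uploads still happen in enumeration order.

```python
from typing import Callable, Iterable


def paths_to_upload(all_paths: Iterable[str], known_paths: Iterable[str]) -> list[str]:
    """Hypothetical helper: paths on disk that are not yet in the database.

    Building a set once makes each membership test O(1) on average, so the
    whole difference costs O(len(all_paths) + len(known_paths)) instead of
    one database lookup per path.
    """
    known = set(known_paths)
    # Keep the original enumeration order so uploads stay deterministic.
    return [p for p in all_paths if p not in known]


def run_uploads(
    all_paths: list[str],
    known_paths: list[str],
    upload: Callable[[str], None],
    manage_changes: bool,
) -> int:
    """Hypothetical driver: upload files and return the number of upload calls."""
    if manage_changes:
        # Every file must be checked for hash changes, so visit all of them.
        targets = all_paths
    else:
        # Only files missing from the database need any work at all.
        targets = paths_to_upload(all_paths, known_paths)
    for path in targets:
        upload(path)
    return len(targets)


if __name__ == "__main__":
    files = [f"img_{i:04d}.jpg" for i in range(2000)]
    in_db = files[:1995]  # pretend 1995 files are already recorded
    print(run_uploads(files, in_db, lambda p: None, manage_changes=False))
    print(run_uploads(files, in_db, lambda p: None, manage_changes=True))
```

With 2000 files of which 1995 are already recorded, the sketch makes 5 upload calls when `manage_changes` is `False` and all 2000 when it is `True`, matching the example in the description.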