Feature Request: Scan SmugMug account for missing files and duplicates #21
Comments
I don't know if there's a single endpoint that provides this info, but you can certainly calculate it by looping over the albums and their photos, as each photo/video has a size field. As for your main request: how would you like to identify the duplicated/missing photos? By their file name? Or also by the directory tree they are in? And maybe their size too?
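The looping approach described above could be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the `Album` and `Image` types and the `totalSize` function are hypothetical stand-ins for whatever structures the app already fills while walking the account, and the only thing assumed from the discussion is that each photo/video carries a size.

```go
package main

import "fmt"

// Image is a hypothetical record for one photo or video.
// Only the size field is assumed to come from the remote metadata.
type Image struct {
	FileName string
	Size     int64 // size in bytes
}

// Album is a hypothetical record for one gallery and its files.
type Album struct {
	Path   string
	Images []Image
}

// totalSize adds up the size of every image in every album.
func totalSize(albums []Album) int64 {
	var total int64
	for _, a := range albums {
		for _, img := range a.Images {
			total += img.Size
		}
	}
	return total
}

func main() {
	// Made-up sample data, for illustration only.
	albums := []Album{
		{Path: "/Travel", Images: []Image{
			{FileName: "a.jpg", Size: 100},
			{FileName: "b.jpg", Size: 200},
		}},
		{Path: "/Family", Images: []Image{
			{FileName: "c.mp4", Size: 300},
		}},
	}
	fmt.Println("total bytes:", totalSize(albums))
}
```

Since the walk over albums already happens during a normal backup run, a tally like this would add no extra requests.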
The short answer to your question is that I believe the dupes/missing files must be identified/matched by md5. This may prove to be problematic with videos (which I believe are re-encoded by smugmug on upload), but let's address that later after the basics are nailed down.
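To make the md5 matching idea concrete, here is a rough sketch of grouping files by checksum and reporting any checksum that appears in more than one location. The `Entry` type and `findDuplicates` function are hypothetical names introduced for illustration, and the sketch assumes an md5 value is already known for each file; how that value is obtained is left open.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical record for one file on the account.
// MD5 is assumed to be available for every file.
type Entry struct {
	AlbumPath string
	FileName  string
	MD5       string
}

// findDuplicates groups entries by MD5 and returns only the
// checksums that occur in more than one location.
func findDuplicates(entries []Entry) map[string][]string {
	byHash := make(map[string][]string)
	for _, e := range entries {
		loc := e.AlbumPath + "/" + e.FileName
		byHash[e.MD5] = append(byHash[e.MD5], loc)
	}
	dupes := make(map[string][]string)
	for hash, locs := range byHash {
		if len(locs) > 1 {
			sort.Strings(locs) // stable ordering for reporting
			dupes[hash] = locs
		}
	}
	return dupes
}

func main() {
	// Made-up sample data, for illustration only.
	entries := []Entry{
		{AlbumPath: "/Travel", FileName: "a.jpg", MD5: "aaa"},
		{AlbumPath: "/Family", FileName: "copy-of-a.jpg", MD5: "aaa"},
		{AlbumPath: "/Family", FileName: "b.jpg", MD5: "bbb"},
	}
	for hash, locs := range findDuplicates(entries) {
		fmt.Println(hash, locs)
	}
}
```

Matching on checksum rather than file name catches renamed copies, but as noted it would miss videos whose bytes change on upload.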
I found an old dgrin blog post that showed where one could find the disk usage on the account in the Stats page in Account Settings, but this appears to be currently unavailable. I think this reflects a third feature request that I have:
Request 3: add support for tallying up disk usage stats for all files in all galleries, and dump the results in a
Below I will tie together how I think this should work, along with my main requests 1 and 2, which are all related in subtle ways, and which I will restate below.
Request 1: add support for a new run mode that will not copy any files, but find and identify photos and videos that are in a local directory tree (distinct from smugmug_backup's
Request 2: add support for finding and identifying photos and videos that reside in multiple locations on the SmugMug account.
I believe Requests 2 and 3 can be easily and efficiently "baked into" the standard run mode you already have, since you're already walking through all files in all galleries and fetching the required information about each. So, I believe Requests 2 and 3 can be solved by:
A. always generate and dump the aforementioned top-level
Request 1 actually requires a new run mode that would be specified in your .toml file, since it needs at least one other configuration item to specify the local directory to scan for dupes and cross-reference against the smugmug_backup archive. Since this is a completely new behavior, I would propose to add a new configuration section called something like
This new mode would scan each specified local dir and compute a similar database file containing the filename, size, and computed md5 sum of each local photo or video file found in the tree. For each leaf file, it would look for an md5 match in the smugmug_backup archive (located at
After getting through a few full runs of this app, and figuring out how
In the long run, I think this won't be sufficient or convenient for implementing the md5 checking I originally suggested, and I still feel it's wise to create a simple database each run to track md5 and other info about each transferred file. This database will be useful for debugging purposes and for implementing future features. It shouldn't cause any performance burden, since I imagine this application is completely I/O bound, waiting for network transfers to complete most of the time.
As I mentioned in the minor documentation PR I just raised, I'm no Go developer, but I am an experienced Embedded SW Engineer turned DevOps guy who has gotten pretty deeply into the Packer source code lately (another Go app). I'd be curious to get your opinions on the overall changeset I'm proposing here, the scope of the changes, and your interest in working on them. I'd also be willing to ramp up on Go and help out with the workload if you're interested in accepting help.
I think there are a lot of people who want these features; it's just that most SmugMug users wouldn't think to look on GitHub for solutions. The SmugMug and other photography forums are full of people looking for solutions to problems caused by shaky uploaders and tools. This tool, backing up our cloud, is already almost the perfect companion to a SmugMug account. Providing some basic introspection and troubleshooting features (dupe and missing-file detection) really would make it the perfect companion.
I am currently running your fabulous app against my SmugMug account to download its massive, who-knows-how-many-terabytes content to a local disk*. Kudos to you for creating this project, well done sir!
I'm actually using this app as a starting point for solving another problem that it seems well-suited for: I would like to identify photos and videos that are:
I do not trust the various SmugMug auto uploader solutions, and feel they leave behind blocks of images and videos from time to time. Before I free up local disk space by deleting photos and videos, I want a higher level of confidence that they are all present on my SmugMug account.
I also do not trust any of the SmugMug uploaders' ability to detect and omit duplicates. I sometimes get a lot of duplicates when the auto uploader has stopped uploading for a time and I manually upload large chunks of photos to compensate, but the set I upload overlaps with ones that have already been uploaded.