Attempt local SHA1 hash of content before uploading #67
Comments
I was looking for something like this because I want to scan my own 7-DVD map set locally to identify anything you may currently be missing in the archive. Chances are there's nothing new if this already includes what I'd uploaded to rushbase before it closed. It would be good to have a tool that could walk a local directory against the data here and build an upload queue of some kind, with a separate frontend for folks who want that. In the past I've used existing tools like clrmamepro (built to scan ROM files, but it can be leveraged for other filesets) to catalog the files, ~13k in all, which is similar to what this project looks to do in its metadata.
Is your map set a collection of zip/rar/archive files, as may have been originally distributed/uploaded by authors, or are they unpacked? The Unreal Archive is currently set up to deal with the former: "original" archives as distributed by content authors. Unfortunately it doesn't have a good mechanism for dealing with individual "loose" content files at the moment. If you want to try it out, there's a pre-built binary distribution available at: https://code.shrimpworks.za.net/artefacts/net/shrimpworks/unreal-archive/latest/unreal-archive-latest.zip This can be used to scan a directory for new or unknown content, roughly as shown below.
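A hypothetical invocation (the `scan` command and `--content-path` option here are assumptions on my part; check the tool's own usage/help output for the current syntax):

```
./unreal-archive scan --content-path=/path/to/unreal-archive-data /path/to/your/maps
```

It should report anything it considers new or unknown relative to the existing data set.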
Maybe as a starting point, give that a try and see what it finds? With the caveat that it expects to find archive files, not .unr files: if you feed it .unr files, they will be hashed independently and considered unique compared to existing content within archives, so everything would appear "new".
Ok thanks, I'll take a look at that tool in the first instance; if need be I can knock up something to do the same anyway, but it sounds like it already exists.
All of the above; I was predominantly focused on release archives and variations, and then documenting the contents, which is how the tooling I was using at the time works. I also only scanned CRC32 (for some reason). I've upped the old DATs for reference here, where every
Had a go at the tool and ran into some issues pulling the archive data; I'll have another look later. I ended up writing my own tool in Python just to do the basic check I needed done: slower to pull the data, but it ran quickly enough for me, i.e. SHA1 hash the local archive and look the hash up in the unreal-archive data. I have ~1500 potentially new files or variations on known ones. Noting it seems past me didn't retain the release casing for filenames and lowercased everything, which is annoying. I'll have a read on how to bulk submit, or I may just up them and you can grab them.
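For reference, a minimal sketch of that kind of check in Python (the assumption that the data repo's YAML entries carry a `hash:` field holding the SHA1 is mine; the real field name and layout may differ):

```python
import hashlib
import pathlib

def sha1_of(path, bufsize=1 << 20):
    """SHA1-hash a file in chunks so large archives aren't loaded into memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def known_hashes(data_root):
    """Collect SHA1s from the data repo by scraping 'hash:' lines out of its YAML."""
    known = set()
    for yml in pathlib.Path(data_root).rglob("*.yml"):
        for line in yml.read_text(errors="ignore").splitlines():
            line = line.strip()
            if line.startswith("hash:"):  # assumed field name; may differ
                known.add(line.split(":", 1)[1].strip().strip('"'))
    return known

def find_new(local_dir, data_root):
    """Yield local archives whose SHA1 isn't present in the archive data."""
    known = known_hashes(data_root)
    for p in sorted(pathlib.Path(local_dir).rglob("*")):
        if p.is_file() and sha1_of(p) not in known:
            yield p
```

Anything yielded by `find_new()` would be a candidate for the upload queue.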
Cool. Unfortunate it didn't just work; what errors or problems did you encounter getting the archive data? It should automatically download it if you don't manually git checkout the data and provide a content path. Anyway, if you've managed to narrow down just the new or variant content, and if you have somewhere to upload it, even temporarily, where I can grab it, I'm happy to do a bulk indexing on my end.
There was an initial error that I didn't capture, but now when trying to download the data I run into this:

using
Interesting, thanks. I'll have to validate that; I may well have broken it in the recent refactors 😬
@shrimpza tried messaging you on Discord to discuss uploads but it's being a pain; reach out if you can.
Perhaps we can SHA1 hash and look up content before uploading.
This would allow validation of at least some duplicates before wasting time and resources uploading to the server.
submit/index.html should be able to load and hash files selected by the user, then perform some sort of request to the backend to look up content by hash. This would likely also involve generating a new file structure based on hashes, perhaps with meta redirects to the appropriate content pages (a rough sketch of such a structure follows).
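As an illustration of that hash-based file structure idea (the `by-hash/` layout and two-level fan-out below are hypothetical choices of mine, not an existing scheme in the project), a generator could emit one tiny meta-redirect page per known content hash:

```python
import pathlib

# Hypothetical layout: by-hash/ab/cd/<full-sha1>.html, fanned out on the
# first hash characters to keep individual directory sizes manageable.
REDIRECT = '<!DOCTYPE html><meta http-equiv="refresh" content="0; url={url}">'

def write_hash_redirects(hash_to_url, out_root="by-hash"):
    """For each known SHA1, write a page that redirects to its content page."""
    root = pathlib.Path(out_root)
    for sha1, url in hash_to_url.items():
        dest = root / sha1[:2] / sha1[2:4] / f"{sha1}.html"
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_text(REDIRECT.format(url=url))
```

The submit page could then hash a selected file client-side and issue a plain GET for `by-hash/<sha1>.html`: a hit redirects to the existing content page, while a 404 would mean the content is unknown and worth uploading.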
References: