Efficient hashing #6
Instead of the adler32 algorithm, we could use Fletcher's checksum algorithm to reduce the computation cost. Another approach: we could add a cache-like layer to return the hash of repeated values directly, though that would consume some extra RAM.
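For reference, a minimal Fletcher-16 sketch in Go. The 16-bit variant and the language are just illustrative here; for large files the project would more likely want Fletcher-32 or Fletcher-64, which follow the same two-running-sums structure:

```go
package main

import "fmt"

// fletcher16 computes the Fletcher-16 checksum of data.
// It keeps two running sums over 8-bit blocks; compared with
// Adler-32 it trades a slightly weaker check for cheaper
// arithmetic (modulo 255 instead of modulo 65521).
func fletcher16(data []byte) uint16 {
	var sum1, sum2 uint16
	for _, b := range data {
		sum1 = (sum1 + uint16(b)) % 255
		sum2 = (sum2 + sum1) % 255
	}
	return sum2<<8 | sum1
}

func main() {
	// "abcde" is the standard Fletcher-16 test vector.
	fmt.Printf("%#04x\n", fletcher16([]byte("abcde"))) // prints 0xc8f0
}
```

In practice the modulo can be deferred and applied every few hundred bytes to go faster, but the straightforward form above is the easiest to verify first.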
Yes, and it would be nice to explore the Fletcher algorithm.
I'm interested in minimizing the time/CPU resources required
to calculate checksums of 2GB-10GB files.
Also, it would be great if you could tell me the format of the data. How are we storing it? Is it a JSON file?
No, the data are binary. We are not storing any data; we are transferring it from
site A to site B.
What do you think about a BitTorrent-like protocol? That protocol has a built-in mechanism for automatically verifying each chunk's integrity after download.
I don't think we need the BitTorrent protocol per se, since we're mostly interested
in the use case of transferring files from a single site to another site, rather
than from multiple sites. Also, I want to explore event streaming following our
file format and I'm not sure it would fit in this protocol, but I will keep it in
mind.
FYI, the files we're transferring are ROOT files, see https://root.cern.ch/
and there is a Go interface for ROOT I/O:
https://godoc.org/go-hep.org/x/hep/rootio
For each file to transfer we need to obtain its hash; so far we read each file end-to-end, which has an impact on RAM utilization. Study whether this can be avoided, or find a better way to obtain a reliable hash while minimizing the impact on RAM. For example, seek to multiple places in the file and obtain the hash of some chunks of the data.