Efficient hashing #6
Instead of the adler32 algorithm, we could use Fletcher's checksum algorithm to reduce the computation cost. Another approach: we could add a cache-like layer to return the hash of repeated values directly, though that would consume some extra RAM.
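For reference, a minimal Fletcher-16 sketch in Go. The 16-bit variant and the language are just illustrative here; for large files the project would more likely want Fletcher-32 or Fletcher-64, which follow the same two-running-sums structure:

```go
package main

import "fmt"

// fletcher16 computes the Fletcher-16 checksum of data.
// It keeps two running sums over 8-bit blocks; compared with
// Adler-32 it trades a slightly weaker check for cheaper
// arithmetic (modulo 255 instead of modulo 65521).
func fletcher16(data []byte) uint16 {
	var sum1, sum2 uint16
	for _, b := range data {
		sum1 = (sum1 + uint16(b)) % 255
		sum2 = (sum2 + sum1) % 255
	}
	return sum2<<8 | sum1
}

func main() {
	// "abcde" is the standard Fletcher-16 test vector.
	fmt.Printf("%#04x\n", fletcher16([]byte("abcde"))) // prints 0xc8f0
}
```

In practice the modulo can be deferred and applied every few hundred bytes to go faster, but the straightforward form above is the easiest to verify first.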
Yes, and it would be nice to explore the Fletcher algorithm.
I'm interested in minimizing the time/CPU resources required
to calculate checksums of 2GB-10GB files.
Also, it would be great if you could tell me the format of the data. How are we storing it? Is it a JSON file?
No, the data are binary. We are not storing any data; we are transferring it from
site A to site B.
What do you think about a BitTorrent-like protocol? That protocol has a built-in mechanism for automatically verifying each chunk's integrity after download.
I don't think we need the BitTorrent protocol per se, since we're mostly interested
in the use case of transferring files from a single site to another site, rather
than from multiple sites. Also, I want to explore event streaming following our
file format and I'm not sure it would fit in this protocol, but I will keep it in
mind.
FYI, the files we're transferring are ROOT files, see https://root.cern.ch/
and there is a Go interface for ROOT I/O:
https://godoc.org/go-hep.org/x/hep/rootio
For each file to transfer we need to obtain its hash; so far we read each file end-to-end, which has an impact on RAM utilization. Study whether this can be avoided, or find a better way to obtain a reliable hash while minimizing the impact on RAM. For example, seek to multiple places in the file and obtain the hash of some chunks of the data.