Memory usage and NAT traversal questions #430
Comments
It seems to me that mega.nz is different because it doesn't need to be able to access the file in memory. They don't need to analyze the whole file in the browser, just stream it to the server. Maybe I am missing something here, but it doesn't seem like the same can be achieved when you need to generate an MD5 hash of the entire file. I'm not familiar with the actual MD5 algorithm, but maybe we can write our own MD5 implementation that accepts a stream, so we can release memory after the algorithm has consumed it.
It seems theoretically possible to calculate an MD5 hash from chunks of data, little by little: http://www.openssl.org/docs/manmaster/crypto/md5.html
It's actually SHA-1: torrents have an optional MD5 field, but the info hash is SHA-1. Here is an untested streaming implementation I found: http://pajhome.org.uk/crypt/md5/contrib/sha1_stream.js And here is a good step-by-step guide to implementing SHA-1: http://m.metamorphosite.com/one-way-hash-encryption-sha1-data-software
Thanks Eric! That actually sounds like something we should always do when hashing files of unknown size!
I am interested in this problem as well. Any suggestions on where I can look to help?
It looks like feross is using simple-sha1, which basically just wraps https://github.com/srijs/rusha in a synchronous manner. Rusha appears to take Buffers, which is nice, but that is probably the source of the large memory load, because the browser has to read the whole file into a buffer for Rusha. What we need is an implementation that takes a stream, or accepts chunks as input. Here is a gist of someone using SHA-1 from CryptoJS to read a file in chunks and create a hash: https://gist.github.com/npcode/11282867 This may be a lot slower (Rusha looks fast), but it might be worth it if it saves memory. Chrome hasn't been my friend memory-wise recently.

I think the lines that would need updating are here, https://github.com/feross/webtorrent/blob/master/lib/torrent.js#L1103, but I am not sure. Maybe I will have some time this week to attempt a pull request. I haven't been developing much; work's been crazy, then my mom got attacked with an axe. Hoping to get back onto a normal schedule soon.

@feross are the seeded files being held in memory elsewhere, or were you able to store them locally and read them out as pieces when seeding? If they're being written to storage, it would probably be a good idea to combine the file-to-local-storage step and the SHA-1 step. Can you point out where that is being done?
I think he can use some kind of HTML5 local storage thingy, right? I haven't had the time to read the JS source code, as I'm fighting with my basic Java implementation myself, but I don't think @feross puts everything in memory. By the way, is that local storage unlimited? Now that I think about it, if webtorrent can store GBs of files on the HDD, any malicious code could too! Really time to read the code :D
@xpierro I think it's in https://github.com/feross/webtorrent/blob/master/lib/torrent.js#L321 It looks like it uses https://github.com/mafintosh/memory-chunk-store and https://github.com/feross/immediate-chunk-store. I looked through their source and I don't see any kind of local storage or IndexedDB stuff. Still trying to track down where this might happen, or maybe it all stays in memory? Still not sure. Webtorrent has been split into a lot of smaller repos, which is good, but it's making it really hard to find where this kind of logic lives, especially the parts that are swapped out based on environment.

Here is a memory profile of seeding a file on instant.io, which makes me think that the file is entirely read into memory, then dumped somewhere. The big spike is what needs working on. But I'm still having a hard time tracking down where the SHA-1 happens on that data and where any kind of storage happens.
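For what it's worth, both of those stores expose the same tiny put/get callback interface, which is exactly the swap-out point mentioned above. A hypothetical minimal version just to illustrate the shape (real stores call back asynchronously and may persist chunks somewhere other than memory):

```javascript
// Hypothetical minimal chunk store illustrating the put/get callback shape
// that memory-chunk-store and immediate-chunk-store follow. Everything is
// kept in an array here purely for illustration; a disk- or IndexedDB-backed
// store could implement the same two methods.
class SimpleChunkStore {
  constructor (chunkLength) {
    this.chunkLength = chunkLength // fixed size of every chunk but the last
    this.chunks = []
  }

  put (index, buf, cb) {
    this.chunks[index] = buf
    cb(null)
  }

  get (index, cb) {
    const buf = this.chunks[index]
    if (!buf) return cb(new Error('chunk ' + index + ' not found'))
    cb(null, buf)
  }
}
```

Swapping the storage backend then only means swapping which object implements `put` and `get`, which would explain why the environment-specific logic is hard to find in any single repo.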
I have been thinking more about this, and it may be that webtorrent just keeps the pointer supplied by the file input, then grabs buffers from it in chunks, similarly to how it's done in the SHA-1 gist: https://gist.github.com/npcode/11282867 I need to spend some time analyzing how that works, but if so, this could be easier than I thought.
This could be related: a2800276/bncode#18 (comment)
+1 on the NAT issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

Got this email from a user. Moving to GitHub so the maximum number of people can benefit from the ensuing discussion.
Note: please don't email me support questions. Post them as GitHub issues so others have a chance to help. Thanks!