
Memory usage and NAT traversal questions #430

Closed
feross opened this issue Sep 11, 2015 · 12 comments

@feross (Member) commented Sep 11, 2015

Got this email from a user. Moving to GitHub so the maximum number of people can benefit from the ensuing discussion.

Note: please don't email me support questions. Post them as GitHub issues so others have a chance to help. Thanks!

Hi Feross,

I am looking for the best way to fix the two issues below in instant.io, and I would appreciate any clues or advice in case you have already looked into them.

1. Memory consumption right after drag-and-drop is at least as large as the added file. This causes my Ubuntu laptop with 4 GB of RAM to become totally unresponsive within just a few minutes of dragging and dropping a 5 GB file. In your current implementation, is it expected to be possible to create and seed a torrent without loading the whole file into memory, and without copying it to another place on the hard drive? I.e., similar to how it is implemented at https://mega.nz/.
2. NAT is not traversed when both peers are behind different home routers with different public IPs and neither open ports nor UPnP. In this case the transfer goes only through a third-party server at limited speed, while uTorrent transfers directly between the same peers at the full speed of the slower ISP, without any third-party server. I understand this should work in WebTorrent based on the browser's WebRTC NAT traversal capabilities, using the ICE/TURN/STUN servers defined in https://instant.io/rtcConfig, but is it expected to work in your current implementation right now, or are you aware of any issues with it?

Sincerely yours,
Sergiy
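For context on the NAT question, the STUN/TURN setup the email refers to boils down to an ICE configuration object passed to the browser. This is a sketch only: the server URLs and credentials below are placeholders (assumptions), not the actual servers behind https://instant.io/rtcConfig.

```javascript
// Hypothetical ICE configuration of the shape https://instant.io/rtcConfig
// would return. All URLs and credentials here are placeholders.
const rtcConfig = {
  iceServers: [
    // STUN lets a peer discover its public address, enabling direct connections
    { urls: 'stun:stun.example.com:3478' },
    // TURN relays traffic when hole punching fails
    // (this is the "third-party server on limited speed" case from the email)
    {
      urls: 'turn:turn.example.com:3478',
      username: 'user',
      credential: 'secret'
    }
  ]
}

// In the browser, a config like this is handed to the peer connection:
//   new RTCPeerConnection(rtcConfig)
```

If only STUN is reachable and both routers use symmetric NAT, hole punching can fail, which would explain transfers falling back to the relay.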
@feross feross added the question label Sep 11, 2015
@ericwooley (Contributor) commented Sep 11, 2015

It seems to me that mega.nz is different because it doesn't need to access the file in memory. It doesn't need to analyze the whole file in the browser, just stream it to the server.

Maybe I am missing something here, but it doesn't seem like the same can be achieved when you need to generate an MD5 hash of the entire file. I'm not familiar with the internals of the MD5 algorithm, but maybe we can write our own MD5 implementation that accepts a stream, so we can release memory after the algorithm has consumed each chunk.

@xpierro commented Sep 11, 2015

It seems theoretically possible to calculate an MD5 hash from chunks of data, little by little: http://www.openssl.org/docs/manmaster/crypto/md5.html

@ericwooley (Contributor) commented Sep 11, 2015

@feross Can I get a hint as to where the hashing is happening? I looked through torrent, and I am having a hard time finding it.

It's actually SHA-1: there is an optional `md5sum` field in the torrent format, but the info hash is SHA-1. Still seems possible to me, though. In fact, it appears it is: http://stackoverflow.com/questions/2495994/can-sha-1-algorithm-be-computed-on-a-stream-with-low-memory-footprint

Here is an untested implementation I found: http://pajhome.org.uk/crypt/md5/contrib/sha1_stream.js

Here is a good step-by-step guide to implementing SHA-1: http://m.metamorphosite.com/one-way-hash-encryption-sha1-data-software

@xpierro commented Sep 12, 2015

Thanks Eric! That actually sounds like something we should always do when hashing files of unknown size!

@shavyg2 commented Sep 13, 2015

I am interested in this problem as well. Any suggestions on where I can look to help?

@ericwooley (Contributor) commented Sep 14, 2015

It looks like feross is using simple-sha1, which basically just wraps https://github.com/srijs/rusha in a synchronous manner.

Rusha appears to take Buffers, which is nice, but that is probably the source of the large memory load, because the browser reads the whole file into a buffer for Rusha. What we need is an implementation that takes a stream, or accepts chunks as input.

Here is a gist of someone using SHA-1 from CryptoJS to read a file in chunks and create a SHA-1 hash: https://gist.github.com/npcode/11282867

This may be a lot slower (Rusha looks like it's fast), but it might be worth it if it saves memory. Chrome hasn't been my friend memory-wise recently.

I think the lines that would need updating are here: https://github.com/feross/webtorrent/blob/master/lib/torrent.js#L1103

but I am not sure. Maybe I will have some time this week to attempt a pull request. I haven't been developing much (work's been crazy, then my mom got attacked with an axe), so I'm hoping to get back onto a normal schedule soon.

@feross are the seeded files being held in memory elsewhere, or were you able to store them locally and read them out as pieces when seeding? If they are being written to storage, it would probably be a good idea to combine the file-to-local-storage step with the SHA-1 step. Can you point out where that is being done?

@xpierro commented Sep 17, 2015

I think he can use some kind of HTML5 local storage, right? I haven't had the time to read the JS source code, as I'm fighting with my own basic Java implementation, but I don't think @feross puts everything in memory.

By the way, is that local storage unlimited? Now that I think about it, if WebTorrent can store GBs of files on the HDD, any malicious code could too!

Really time to read the code :D

@ericwooley (Contributor) commented Sep 17, 2015

@xpierro I think it's in https://github.com/feross/webtorrent/blob/master/lib/torrent.js#L321

Looks like it uses https://github.com/mafintosh/memory-chunk-store and https://github.com/feross/immediate-chunk-store. I looked through their source, and I don't see any local storage or IndexedDB usage. I'm still trying to track down where this might happen, or maybe it all stays in memory? Webtorrent has been split into a lot of smaller repos, which is good, but it makes it really hard to find where this kind of logic lives, especially the parts that are swapped out based on environment.

Here is a memory profile of seeding a file on instant.io, which makes me think the file is being read entirely into memory, then dumped somewhere. The big spike is what needs working on.
[Screenshot: Chrome memory profile, 17 Sep 2015, showing a large allocation spike while seeding]
It looks like the torrent gets the file stream from here: https://github.com/feross/webtorrent/blob/master/lib/torrent.js#L1161, called from here: https://github.com/feross/webtorrent/blob/master/index.js#L204

But I'm still having a hard time tracing down where the SHA-1 happens on that data, and where any kind of storage happens.

@ericwooley (Contributor) commented Sep 17, 2015

I have been thinking more about this, and it may be that webtorrent just keeps the File pointer supplied by the file input, then grabs buffers in chunks from it, similar to how it's done in the SHA-1 gist (https://gist.github.com/npcode/11282867). I need to spend some time analyzing how that works, but if so, this could be easier than I thought.

@arestov commented Oct 30, 2015

This could be related: a2800276/bncode#18 (comment)

@stale stale bot added the stale label May 3, 2018
@webtorrent webtorrent deleted a comment from stale bot May 4, 2018
@stale stale bot removed the stale label May 4, 2018
@oleiba commented Jun 11, 2018

+1 on the NAT issue.

@stale stale bot commented Sep 9, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Sep 9, 2018
@stale stale bot closed this Sep 16, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Dec 15, 2018
6 participants