
Client tries to download entire file in one request #1213

Closed
Jedian opened this issue Oct 25, 2017 · 27 comments
@Jedian commented Oct 25, 2017

What version of WebTorrent?
WebTorrent/0.98.19, according to the webseed apache log

What operating system and Node.js version?
Ubuntu 16.04
Node.js v5.12.0

What browser and version? (if using WebTorrent in the browser)
Not using WebTorrent in the browser.

What did you expect to happen?
I wrote a little script that uses the create-torrent module to, well, create a torrent file from a text file (169 MB). At the start, the file is only available from a webseed.

I expected the client to make range requests of the correct piece length, and not get stuck :(.

What actually happened?
When I try to download it with the WebTorrent client, it does one big request, as if downloading the entire file in a single request. Trying to debug, I added this snippet to client.add():

    torrent.on('download', function(bytes){
        console.log('just got %d bytes', bytes);
    });

The output is:
just got 176917572 bytes, which is the entire file size, while the pieceLength specified in create-torrent is 65536, so it doesn't make any sense.
Then it gets stuck: it stops making requests and no more output is produced.

Watching the apache access.log while executing the client, I got only this:
127.0.0.1 - - [25/Oct/2017:10:19:57 -0200] "GET /filename.txt HTTP/1.1" 200 429899 "-" "WebTorrent/0.98.19 (https://webtorrent.io)"
Apparently, 429899 is the size of the content in the response. (And HTTP code 200 means it's not partial; a partial response would be HTTP code 206.)

I tried with some other clients (Transmission and a libtorrent-based Python one) and it works perfectly.
With some other files even WebTorrent works, but not with this one.
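A quick sanity check for webseed behavior is whether the server's reply is genuinely partial. This hypothetical helper (not part of webtorrent; the name is made up for illustration) classifies a response by its status code and Content-Range header:

```javascript
// Sketch: decide whether an HTTP response actually honored a byte-range
// request. 206 plus a matching Content-Range means partial content; a 200,
// as in the Apache access.log above, means the server sent the whole entity.
function honorsRange (statusCode, headers, start, end) {
  if (statusCode !== 206) return false
  // Content-Range looks like: "bytes 176881664-176917571/176917572"
  const match = /^bytes (\d+)-(\d+)\//.exec(headers['content-range'] || '')
  return match !== null &&
    Number(match[1]) === start &&
    Number(match[2]) === end
}
```

Against a well-behaved server this returns true for the range in question; the 200 response Apache logged would return false.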

@DiegoRBaquero (Member) commented Oct 25, 2017

Can you set DEBUG=* and post the full log please?

@Jedian (Author) commented Oct 26, 2017

Thanks for the response!
Sure thing. Here it is.

https://pastebin.com/TVwCzQCP

I aborted the script when it stopped spamming things. If you need me to give it more time, just ask.

@DiegoRBaquero (Member) commented Oct 26, 2017

I can only see 35k bytes being requested in the log, not the full file. Can you point me to the offending line in the log, please?

@Jedian (Author) commented Oct 27, 2017

Hmm, maybe this issue's title is not accurate, then.
Even though it does seem to be requesting 35k bytes (at line 51), at line 82 it reports it got 169 MB.
My console.log snippet is at line 85, saying the same.

At first, I thought it could be some problem related to CORS, but I discarded that possibility when I tested the following command:

curl localhost/candido.txt -H 'range:bytes=176881664-176917571'

And I got exactly 35908 bytes. (The range header worked).
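For what it's worth, that range is exactly the torrent's last piece. The arithmetic below (a sketch using the file size and piece length reported earlier in this thread) reproduces the numbers in the curl command:

```javascript
// Piece arithmetic with the sizes reported in this thread: the 35908-byte
// range being requested is exactly the torrent's final piece.
const fileLength = 176917572  // bytes, from the 'download' event output
const pieceLength = 65536     // from the create-torrent options

const lastPieceIndex = Math.floor((fileLength - 1) / pieceLength)
const lastPieceStart = lastPieceIndex * pieceLength
const lastPieceSize = fileLength - lastPieceStart

console.log(lastPieceStart, fileLength - 1, lastPieceSize)
// → 176881664 176917571 35908, matching the Range header in the curl test
```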

@DiegoRBaquero (Member) commented Oct 27, 2017

That's weird. Line 82 is an issue; it's wrong.

I'm seeing in lines 5x that the file is being opened and verified (it's already downloaded), so that's why it's being 'downloaded' instantly. Line 82 is the thing we must check.

@Jedian (Author) commented Oct 30, 2017

Hey, I ran the tests again, this time remembering to delete the file before trying to download. (I also changed the tracker, but that shouldn't matter here.)
https://pastebin.com/vqA3PSYv
In the new log output, the request for 35k bytes is at line 51; then it reports it got 169 MB at line 73.
What other tests would be useful for debugging? What should be done here?

@DiegoRBaquero (Member) commented Oct 30, 2017

Is the file fully downloaded by the time it reports the 'downloaded' event?

Did you delete the folder & file in /tmp/webtorrent?

Can you log the 'done' event and maybe the other ones too?

@DiegoRBaquero (Member) commented Oct 30, 2017

Also, what webserver are you using? Does it support range requests?

@Jedian (Author) commented Oct 30, 2017

Is the file fully downloaded by the time it reports the 'downloaded' event?

Nope, the file size is 0 bytes.

Did you delete the folder & file in /tmp/webtorrent?

Yes.

Can you log the 'done' event and maybe the other ones too?

Here it is. I am now logging the 'download', 'infoHash', 'metadata', 'ready', 'warning', 'error', 'wire', 'noPeers' and 'done' events, but 'download' is the only one emitted (line 82). :(
https://pastebin.com/Gmq3ws2J

Also, what webserver are you using? Does it support range requests?

I am using apache2, and my tests to check range support worked. I did the following (simulating the same request as on line 51):

curl localhost/candido.txt -H 'range:bytes=176881664-176917571'

And it got exactly 35908 bytes.

@DiegoRBaquero (Member) commented Oct 30, 2017

What does the access.log in Apache look like for that candido.txt file?

This issue seems related to the 'download' event being emitted with the wrong size, because the actual full file hasn't been downloaded.

@Jedian (Author) commented Oct 30, 2017

Only this one line appears:

127.0.0.1 - - [30/Oct/2017:20:29:26 -0200] "GET /candido.txt HTTP/1.1" 200 429899 "-" "WebTorrent/0.98.19 (https://webtorrent.io)"
@Jedian (Author) commented Nov 6, 2017

Any suggestions? I can provide more tests if needed...

@DiegoRBaquero (Member) commented Nov 6, 2017

Line 70 shows that webconn got all the data, even though only a range was requested on line 51.

See: https://github.com/webtorrent/webtorrent/blob/master/lib/webconn.js#L116

How about trying another webserver, e.g. nginx?
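A defensive check along these lines could catch the mismatch at the webseed connection. This is a hypothetical sketch, not webtorrent's actual webconn.js logic: if the server replies 200 with the full body, slice out the requested span; anything else of the wrong length is treated as a failed request.

```javascript
// Hypothetical guard for a webseed response: if the server ignored the
// Range header and replied 200 with the whole file, recover by slicing
// out the span that was requested, instead of passing the entire body
// downstream as if it were the range.
function extractRequestedRange (statusCode, body, start, end) {
  const expected = end - start + 1
  if (statusCode === 206 && body.length === expected) return body
  if (statusCode === 200 && body.length >= end + 1) {
    // Full-entity response: the requested span starts at byte `start`.
    return body.slice(start, end + 1)
  }
  return null // length mismatch: let the caller treat it as a failure
}
```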

@Jedian (Author) commented Nov 8, 2017

I think it would be weird if that worked, because when I open the same .torrent file in Transmission, it downloads nicely...

But I tried it (nginx). The download finally completed, but the downloaded file is different from the file I created the torrent from (running "cmp" on the two reports byte differences). This doesn't happen with Transmission, which is still downloading nicely... This time the log was too big for pastebin, so I am attaching it here:
log_file.txt

@DiegoRBaquero (Member) commented Nov 8, 2017

Hmmmm, it looks like it has to do with the webserver. Can you please post the logs of both Apache and nginx as they happen during the download?

@Jedian (Author) commented Nov 13, 2017

Of course! Here they are. The Apache log file shows 4 failed tries, while the nginx log file shows a "successful" download.
logapache2.txt
lognginx.txt

@DiegoRBaquero (Member) commented Nov 13, 2017

Yeah, this is an Apache error. For some reason it's sending the whole file, as if it didn't support the range request from WebTorrent. Unfortunately, this will be harder to debug, as it now has to do with Apache.

@Jedian (Author) commented Nov 14, 2017

What about nginx, which sends the responses correctly, yet at the end the file is corrupted? Is there a way to debug that?

@DiegoRBaquero (Member) commented Nov 14, 2017

Do the pieces from nginx get verified according to the logs?

@Jedian (Author) commented Nov 16, 2017

Yes, checking the log, I could see every piece index being verified. But the files are different 😢

@transitive-bullshit (Member) commented Nov 25, 2017

Are you seeding and downloading the file entirely from localhost? If so, this could exacerbate any possible internal race conditions, since the order of all the different callbacks in this scenario would be harder to reason about.

@Jedian (Author) commented Nov 28, 2017

Yes, I was seeding and downloading entirely from localhost. I tried manually throttling my local bandwidth so it would be slow and safe, but that didn't work.

But reading forums around the internet, I found something that may be useful for you guys:
https://www.ruby-forum.com/topic/1998601
Apparently, "There is an issue with gzipping and ranges - what should be done first.
[...] there is no order description in RFC [...]. Apache (at least 2.3.8) gzips content first and then process ranges on gzipped body and according gzipped content length [...]"

What was happening is that my file (169 MB), when gzipped, was only 418 KB, and the first range request the WebTorrent client made was exactly this one: '176881664-176917571'. As this range didn't exist in the gzipped file, Apache would just send the whole file. That explains why the client got all the data in one download event (and why the response code was 200), but it doesn't explain why the client gets choked.

Then, for testing purposes, I disabled gzipping in Apache: the file was downloaded successfully (with no choking, I mean), as with nginx, but it was also corrupted. 😢
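If disabling gzip globally is too broad, one way to keep mod_deflate away from just the seeded file is a per-file override (a sketch, assuming Apache with mod_env enabled and the 'candido.txt' filename from this thread):

```apache
# Make sure mod_deflate never compresses the seeded file, so byte ranges
# apply to the raw bytes the torrent's piece hashes were computed from.
<Files "candido.txt">
    SetEnv no-gzip 1
</Files>
```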

@bertofer commented Nov 28, 2017

Hi, I had a similar problem, but only with Chrome. I am using a webseed, and sometimes the files were being downloaded entirely when requesting a range. That broke webtorrent completely, as it was expecting X number of bytes but was receiving the entire file.

After some debugging, I realized the Range header was being removed from the request by Chrome itself. That is, on the way to the cache, the Range header is sometimes removed in strange scenarios (when multiple cache accesses for the same URL with different Range headers were in flight at the same time, one starting before the previous one could resolve).

I opened an issue in Chromium and it was confirmed to be a bug in the way Chromium handles the cache. It has already been fixed, but I'm not sure when the fix will reach Chrome -> https://bugs.chromium.org/p/chromium/issues/detail?id=775014

To confirm whether this is what is happening to you, there are a couple of options:

  • Try another browser (Firefox?)
  • Disable the cache in Chrome's developer tools

If it works in those scenarios, this is probably the bug you're seeing.

Hope it helps

@Jedian (Author) commented Nov 28, 2017

I'm really grateful for your attention and willingness to help, but I'm afraid this may not help me, because I am not using any browser. I am just using Node.js on the client and a webserver (I've tried Apache and nginx) for webseeding. No browsers involved.

Anyway, thanks! :)

@DiegoRBaquero (Member) commented Nov 28, 2017

In the browser, the browser handles gzip; in Node, WebTorrent itself doesn't. Don't gzip by default in Apache, or use nginx, which appears to work.

Please close the issue once you consider it resolved.

@Jedian (Author) commented Nov 29, 2017

Errm... Actually, my download is still not working, even without gzip by default in Apache or nginx. In both cases the file gets corrupted.

I got an interesting logfile when running the client with the (corrupted) file already downloaded:
https://pastebin.com/uza7KEXR

It says that piece 2698 is invalid (line 2759), retries downloading it, and then says it is verified. But if I run the same thing again, it will once more say that piece 2698 is invalid...

What could it be?

@feross (Member) commented Apr 30, 2018

This issue appears stale, since it has been open for a while with no activity. I'm going to close it for now. If this is still an issue, please feel free to leave a comment or open a new issue.

@feross feross closed this Apr 30, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Jul 29, 2018