JSON body response is being truncated #537

Open
kristophM opened this issue Aug 18, 2016 · 6 comments

@kristophM

kristophM commented Aug 18, 2016

Hello, I'm pulling a large amount of data while paginating through an API (250 records per page, about 500 pages per hour, with Typhoeus max concurrency set to 20). For the first 20 minutes I can read JSON strings from response.body without any problem. After that, however, I get truncated responses, and without the closing braces of the JSON I can't parse the string.

Any idea why this may be happening? I'm running this in a worker process, not a web process. I can provide more code if that would be helpful. I have tried both the latest version of Typhoeus at the time of writing and 0.5.0.
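One way to make the failure visible on the client side, sketched under assumptions: treat a body that fails to parse as a failed page instead of letting the exception stop the worker. `parse_page` is a hypothetical helper name, not part of Typhoeus; it relies only on Ruby's standard `json` library.

```ruby
require 'json'

# Hypothetical helper (not part of Typhoeus): parse one page body and
# report a parse failure instead of raising.
# Returns [records, nil] on success, or [nil, error_message] on failure.
def parse_page(body)
  [JSON.parse(body), nil]
rescue JSON::ParserError => e
  [nil, "unparseable body (#{body.bytesize} bytes): #{e.message}"]
end

complete  = '{"records": [1, 2, 3]}'
truncated = '{"records": [1, 2'

records, error = parse_page(complete)
puts records.inspect unless error

records, error = parse_page(truncated)
puts error if error
```

A page that comes back with an error message could then be logged and queued again rather than dropped.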

@Zapotek
Contributor

Zapotek commented Oct 26, 2016

Hello,

Do you have the #response_headers, #return_code and #return_message for the incomplete responses?
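For anyone wanting to gather these fields, a minimal sketch: `code`, `return_code`, `return_message`, `response_headers` and `body` are accessors on `Typhoeus::Response`, while the `diagnostics` helper and the `StubResponse` struct are made up here so the snippet runs without the gem or a network connection.

```ruby
# Hypothetical helper: collect the fields asked for above into one hash
# that can be logged. `response` is anything that responds to the same
# accessors as a Typhoeus::Response.
def diagnostics(response)
  {
    code:             response.code,
    return_code:      response.return_code,
    return_message:   response.return_message,
    response_headers: response.response_headers,
    body_bytes:       response.body.to_s.bytesize
  }
end

# Stand-in for a real response so the sketch runs offline.
StubResponse = Struct.new(:code, :return_code, :return_message,
                          :response_headers, :body, keyword_init: true)

stub = StubResponse.new(
  code: 200,
  return_code: :partial_file,
  return_message: 'Transferred a partial file',
  response_headers: "HTTP/1.1 200 OK\r\nContent-Length: 145550\r\n\r\n",
  body: '{"records": [1, 2'
)

puts diagnostics(stub).inspect
```

In a real run, the helper could be called from a request's `on_complete` block for every response whose `return_code` is not `:ok`.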

@cs0511

cs0511 commented Feb 19, 2018

Hi @Zapotek,

I have the same issue. Here are my #response_headers, #return_code and #return_message:

[13] pry(main)> result.response_headers
=> "HTTP/1.1 200 OK\r\nServer: nginx/1.10.3 (Ubuntu)\r\nDate: Mon, 19 Feb 2018 03:53:46 GMT\r\nContent-Type: application/json; charset=UTF-8\r\nContent-Length: 145550\r\nConnection: keep-alive\r\nVary: Accept-Encoding\r\nAccess-Control-Allow-Headers: Authorization\r\nEtag: \"585d7faf80a9e5b54c69e0efe43ef4e9246908ae\"\r\nAccess-Control-Allow-Origin: *\r\nAccess-Control-Allow-Methods: GET, OPTIONS\r\n\r\n"
[14] pry(main)> result.return_code
=> :partial_file
[15] pry(main)> result.return_message
=> "Transferred a partial file"
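The headers above advertise `Content-Length: 145550`, and libcurl reports a partial file when the amount of data delivered does not match the size the server announced. A rough sketch of checking this by hand follows. `missing_bytes` is a hypothetical helper, and the comparison only holds when the body has not been transparently decompressed, since `Content-Length` counts bytes on the wire.

```ruby
# Hypothetical helper: how many bytes short of the advertised
# Content-Length is the body? Returns nil when no Content-Length header
# is present (for example with chunked transfer encoding).
def missing_bytes(raw_headers, body)
  expected = raw_headers[/^Content-Length:\s*(\d+)/i, 1]
  return nil unless expected

  expected.to_i - body.bytesize
end

raw_headers = "HTTP/1.1 200 OK\r\n" \
              "Content-Type: application/json; charset=UTF-8\r\n" \
              "Content-Length: 145550\r\n\r\n"

# Illustrative body size; the actual truncated length was not reported.
body = 'x' * 100_000

puts missing_bytes(raw_headers, body)
```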

@kwasimensah
Contributor

I'm getting this too, and I think it's because easy_cleanup isn't being called.

It seems like you're using Auto pointers to detect when the easy handle is no longer referenced https://github.com/typhoeus/ethon/blob/ab052b6a317309b6ae8be7b538738b138b11437a/lib/ethon/easy/operations.rb#L13

However, according to curl's docs (https://curl.haxx.se/libcurl/c/curl_easy_cleanup.html), easy_cleanup might invoke some progress callbacks, which I'm assuming is to deal with a race between multi_info_read saying you're done and the write callback actually being called.

The problem is that you call easy.complete while still holding a reference to the easy handle (https://github.com/typhoeus/ethon/blob/ab052b6a317309b6ae8be7b538738b138b11437a/lib/ethon/multi/operations.rb#L151), so the final bytes of a message might not get processed. I can try testing this with a PR in a couple of hours when I have more time.

@kwasimensah
Contributor

D'oh. I poked at this and I don't think that's the issue :( But I am still seeing this bug.

Response Headers:

{"Server"=>"nginx/1.10.3 (Ubuntu)", "Date"=>"Tue, 20 Feb 2018 21:41:38 GMT", "Content-Type"=>"application/json; charset=UTF-8", "Transfer-Encoding"=>"chunked", "Connection"=>"keep-alive", "Vary"=>"Accept-Encoding", "Access-Control-Allow-Origin"=>"*", "Cache-Control"=>"public, max-age=3600"}

This came with a 200 status. The response body is large, since this is part of a scraping task.

@kwasimensah
Contributor

I'm at a breakpoint where this happens right now, and easy.return_code in check is :recv_error. However, nothing seems to actually act on that.

easy.log_inspect output:

"EASY effective_url=http://api.tvmaze.com/shows/793?embed%5B%5D=seasons&embed%5B%5D=episodes response_code=200 return_code=recv_error total_time=5500.674963"

It looks like https://github.com/typhoeus/typhoeus/blob/master/lib/typhoeus/adapters/faraday.rb#L102 should also check if resp.return_code is anything but :ok.
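A sketch of the kind of check suggested here, written independently of the Faraday adapter's actual code: fail loudly whenever libcurl's return code is anything other than `:ok`, even if the HTTP status is 200. `IncompleteResponseError`, `ensure_complete!` and `FakeResponse` are hypothetical names; `return_code`, `return_message` and `effective_url` are existing `Typhoeus::Response` accessors, and the return message in the stub is illustrative.

```ruby
# Hypothetical error class for transfers libcurl did not finish cleanly.
class IncompleteResponseError < StandardError; end

# Raise unless libcurl reported :ok; otherwise hand the response back.
def ensure_complete!(response)
  return response if response.return_code == :ok

  raise IncompleteResponseError,
        "#{response.effective_url}: #{response.return_code} " \
        "(#{response.return_message})"
end

# Stand-in for a Typhoeus::Response so the sketch runs offline.
FakeResponse = Struct.new(:effective_url, :code, :return_code,
                          :return_message, keyword_init: true)

broken = FakeResponse.new(
  effective_url: 'http://api.tvmaze.com/shows/793?embed%5B%5D=seasons&embed%5B%5D=episodes',
  code: 200,
  return_code: :recv_error,
  return_message: 'Failure when receiving data from the peer' # illustrative
)

begin
  ensure_complete!(broken)
rescue IncompleteResponseError => e
  puts "would retry: #{e.message}"
end
```

Raising here keeps a truncated body from being handed to the JSON parser as if the 200 status meant the transfer had completed.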

@kwasimensah
Contributor

As for what's causing the :recv_error: the client is sending a RST ACK to the server, so the socket close isn't coming from the server. I'm not sure what's causing that yet.
