
Conversation

@MannyClicks
Contributor

Hello Yuji,

this PR fixes a bug that occurs when the Content-Length exceeds max_int.
The crash was verified and reproduced on macOS.

Thanks for creating this nice library, very helpful for us. ❤️

@yhirose
Owner

yhirose commented Apr 12, 2019

@MannyClicks, thank you for the really good bug report!

After pondering over your PR, I came to the conclusion that it should convert the Content-Length to size_t, because the value will eventually be used as size_t with basic_string& assign( size_type count, CharT ch ) at this line.

So I am thinking of removing get_header_value_int and adding the following method instead.

inline size_t get_header_value_size_t(const Headers &headers, const char *key, size_t def = 0)

When it comes to the conversion from string to size_t, we could follow the suggestion in the following Stack Overflow post.
https://stackoverflow.com/questions/34043894/convert-string-to-size-t#answer-34044121
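A minimal sketch of such a conversion (using std::strtoull with overflow detection rather than the stringstream approach; the function name is illustrative, not from the library):

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Parse a Content-Length header value into a uint64_t.
// Returns `def` on malformed input or overflow (hypothetical helper).
inline uint64_t parse_content_length(const char *s, uint64_t def = 0) {
  if (!s || !*s) return def;
  errno = 0;
  char *end = nullptr;
  unsigned long long v = std::strtoull(s, &end, 10);
  // ERANGE signals overflow; end == s means no digits were consumed;
  // trailing garbage after the digits is also rejected.
  if (errno == ERANGE || end == s || *end != '\0') return def;
  return static_cast<uint64_t>(v);
}
```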

I'll let you know when it's ready. Thanks a lot!

@yhirose
Owner

yhirose commented Apr 12, 2019

Also, I am thinking of adding a length check against the actual size of the size_t type on the user's environment.

@MannyClicks
Contributor Author

Thanks for the kind words.

I recommend not using size_t directly. That choice would clamp the range of Content-Length to 32 bits on 32-bit platforms; any upload larger than 4GB would break the server. Also, I assume the STL throws an exception if you try to convert a string value exceeding the size_t range to a size_t.
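For reference, the standard conversion functions do throw on overflow: std::stoull raises std::out_of_range when the value exceeds the range of unsigned long long. A small sketch (the helper name is illustrative):

```cpp
#include <stdexcept>
#include <string>

// Returns true if `s` holds a number too large for unsigned long long.
// std::stoull throws std::out_of_range in that case (hypothetical helper).
inline bool overflows_ull(const std::string &s) {
  try {
    (void)std::stoull(s);
    return false;
  } catch (const std::out_of_range &) {
    return true;
  }
}
```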

My PR guarantees an unsigned 64-bit type on all platforms.
You can still handle the result gracefully on 32-bit platforms, if that is what you need to do.

@yhirose
Owner

yhirose commented Apr 12, 2019

@MannyClicks, thanks for the clear information. I now understand that the size_t conversion shouldn't be done. I have two more questions to fully understand this PR.

(1) Can we use uint64_t instead of unsigned long long for consistency? (It's because uint64_t is already used in various places in httplib.h, like here.)

(2) Should we fix read_content_with_length to work properly on 32-bit platforms? (I am using httplib.h in 32-bit Windows applications...)

Here is my thought regarding it:

The return value from get_header_value_uint64 is passed to the size_t len parameter of read_content_with_length. So if the value exceeds 4GB on a 32-bit platform, it will be truncated, since size_t is only 32 bits there. The following code will then read the content only up to the truncated length, and the rest of the content will just remain on the socket. That would cause problems.

Also, out cannot hold the entire content, because the maximum std::string buffer size is 32 bits on a 32-bit platform.

cpp-httplib/httplib.h

Lines 822 to 838 in 07ed076

inline bool read_content_with_length(Stream &strm, std::string &out, size_t len,
                                     Progress progress) {
  out.assign(len, 0);
  size_t r = 0;
  while (r < len) {
    auto n = strm.read(&out[r], len - r);
    if (n <= 0) { return false; }
    r += n;
    if (progress) {
      if (!progress(r, len)) { return false; }
    }
  }
  return true;
}

So I am thinking of changing the type of the size_t len parameter in read_content_with_length to uint64_t. Also, if a request comes with huge content (> 4GB) on a 32-bit platform, it should read the whole content anyway, but read_content_with_length should return false, because out cannot hold the entire content.
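That proposal could be sketched roughly as follows. This is a minimal illustration, not the actual httplib.h code: the Stream interface and MemStream here are simplified stand-ins, and the progress callback is omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// Simplified stand-in for httplib's Stream interface (assumption).
struct Stream {
  virtual int read(char *ptr, size_t size) = 0;
  virtual ~Stream() = default;
};

// Sketch of the proposed change: len is uint64_t. If the content cannot
// fit in a std::string on this platform, drain the stream but fail.
inline bool read_content_with_length(Stream &strm, std::string &out,
                                     uint64_t len) {
  if (len > static_cast<uint64_t>(std::numeric_limits<size_t>::max())) {
    char buf[4096];
    uint64_t r = 0;
    while (r < len) {
      auto chunk =
          static_cast<size_t>(std::min<uint64_t>(sizeof(buf), len - r));
      auto n = strm.read(buf, chunk);
      if (n <= 0) break;
      r += static_cast<uint64_t>(n);
    }
    return false; // out cannot hold the entire content
  }
  auto slen = static_cast<size_t>(len);
  out.assign(slen, 0);
  size_t r = 0;
  while (r < slen) {
    auto n = strm.read(&out[r], slen - r);
    if (n <= 0) { return false; }
    r += static_cast<size_t>(n);
  }
  return true;
}

// Tiny in-memory stream for demonstration (hypothetical).
struct MemStream : Stream {
  std::string data;
  size_t pos = 0;
  explicit MemStream(std::string d) : data(std::move(d)) {}
  int read(char *ptr, size_t size) override {
    size_t n = std::min(size, data.size() - pos);
    if (n == 0) return 0;
    std::memcpy(ptr, data.data() + pos, n);
    pos += n;
    return static_cast<int>(n);
  }
};
```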

Does it make sense to you?

@MannyClicks
Contributor Author

Sure, happy to help the project 👍

RE 1: yes, I would have, but from looking at the project style it didn't seem to me that you use stdint.h types. uint64_t would be the correct type, as per my explanation above.

RE 2: I think it should load only a user-set maximum into the buffer on 32-bit platforms and keep the rest on the socket. However, you'd need to inform the user that another read is necessary. Keep in mind that 32-bit processes have a 4GB memory limit, which many people forget these days. It makes handling the stream much more difficult than on a 64-bit platform. I think an OK-ish alternative for 32-bit platforms is to just reject such a big request (which happens naturally if what I propose next is implemented). A better alternative would be to load the content from the stream only in the request path handler set up in the server, if the user wants to. Then the user can build their own chunked stream.

Another thing I noticed in our stress testing today: we have to reject streams whose content length exceeds a maximum size specified in the server instance. Otherwise, bad clients can create 40GB requests, effectively killing our process.
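Such a guard could be as simple as checking the declared Content-Length against a configured limit before reading the body. A sketch under that assumption (ServerLimits and should_reject are illustrative names, not library API):

```cpp
#include <cstdint>

// Illustrative server-side limit configuration (assumption).
struct ServerLimits {
  uint64_t max_content_length = 8ULL * 1024 * 1024; // e.g. 8 MB default
};

// Returns true if the request should be rejected before reading the body
// (the caller would respond with 413 Payload Too Large and close).
inline bool should_reject(const ServerLimits &limits, uint64_t content_length) {
  return content_length > limits.max_content_length;
}
```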

BTW, one thing I dislike about the API is that we use std::string everywhere for data buffers. I would prefer std::vector<uint8_t>.

@yhirose
Owner

yhirose commented Apr 12, 2019

Thanks for the further explanation. I removed get_header_value_int, which is no longer used, changed the return type of get_header_value_uint64 from unsigned long long to uint64_t, and merged the PR. Thanks for your contribution!

As for RE 2, I'll ponder over those options.

Another thing I noticed in our stress testing today: we have to reject streams whose content length exceeds a maximum size specified in the server instance. Otherwise, bad clients can create 40GB requests, effectively killing our process.

BTW, one thing I dislike about the API is that we use std::string everywhere for data buffers. I would prefer std::vector<uint8_t>.

Thanks for the suggestions, too. I made corresponding tickets. (Of course, you are welcome to implement these. 😄)

@MannyClicks
Contributor Author

Sounds good. Expect another PR soon! 😄
