validate max_allowed_packet on the client side #79
Conversation
My understanding is that the builder API already splits packets up based on a maximum size. I think the issue is that the max packet size is currently hard coded in the packet_parser.h. If I'm remembering the protocol correctly, I think the correct fix for this is to properly pass a custom-configured maximum packet size down into the various parts of trilogy that are currently assuming the hard coded value. And maybe fall back to the hard coded value (16MB) as a default?
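As a sketch of that fallback idea (the helper and macro names here are hypothetical, not trilogy's actual API), the effective limit could be resolved in one place and threaded down to the builder:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: resolve the packet-size limit to pass down into the
 * builder. 16MB mirrors the value currently hard-coded in packet_parser.h. */
#define SKETCH_DEFAULT_MAX_PACKET (16u * 1024u * 1024u)

static uint64_t sketch_effective_max_packet(uint64_t configured)
{
    /* 0 means "not configured": fall back to the hard-coded default */
    return configured != 0 ? configured : SKETCH_DEFAULT_MAX_PACKET;
}
```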
That's correct. Unfortunately the FYI 16MB is the maximum possible value for
Unfortunately, max_allowed_packet doesn't refer to the size of a packet in that sense; it refers to the size of the whole message minus continuation headers.
Ah that's right, sorry for missing that 😅 The length portion of the packet segment header is definitely only 24 bits.
Be that as it may, it appears that we're only checking packet length here when sending a query. But the maximum packet size applies at the protocol level, so the builder API is probably where the check should happen. It would be a breaking change to the C ABI, but it's the right place to do it IMO. This allows future extensions of this codebase to continue to operate according to spec without having to remember to add their own additional checks for packet length. We actually designed the builder API for this exact purpose: to ensure packets are built to spec.
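A minimal sketch of what a builder-level check could look like — the struct and function names below are illustrative, not trilogy's real builder API:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ERR_MAX_PACKET_EXCEEDED (-1)

/* Illustrative builder state: tracks total payload bytes written,
 * excluding the 4-byte continuation headers. */
typedef struct {
    size_t len;
    size_t max_allowed_packet; /* 0 = no client-side limit */
} sketch_builder_t;

/* Reject any write that would push the message past the configured limit,
 * so every command type gets the check for free. */
static int sketch_builder_write(sketch_builder_t *b, const void *data, size_t data_len)
{
    if (b->max_allowed_packet != 0 && b->len + data_len > b->max_allowed_packet)
        return SKETCH_ERR_MAX_PACKET_EXCEEDED;
    (void)data; /* real code would append the bytes to the shared buffer here */
    b->len += data_len;
    return 0;
}
```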
I demonstrated that in the test_packet_size_greater_than_trilogy_max_packet_len test, where max_allowed_packet is greater than TRILOGY_MAX_PACKET_LEN and everything still works right up until max_allowed_packet is reached.
We should probably test when a packet is smaller than max_allowed_packet as well. My understanding is that the protocol's configurable maximum packet size exists so clients in memory-constrained environments can tell the server to limit packet sizes. Not that it necessarily matters for us in this library, but I think the server uses MAX(net_buffer_length, max_allowed_packet).
I'm also reading that as of MySQL 8.0 the maximum packet size allowed is 1GB, so we should probably check for that in code as well.
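A hedged sketch of such a cap — the 1GB figure is MySQL's documented upper bound for max_allowed_packet, but the helper and macro names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* MySQL's documented upper bound for max_allowed_packet is 1GB. */
#define SKETCH_MAX_PACKET_CAP (1024u * 1024u * 1024u)

/* Clamp a user-configured value so we never accept a limit the
 * server could not honour anyway. */
static uint64_t sketch_clamp_max_packet(uint64_t configured)
{
    return configured > SKETCH_MAX_PACKET_CAP ? SKETCH_MAX_PACKET_CAP : configured;
}
```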
When we originally built trilogy, we were really only taking into account the immediate needs of our production systems at GitHub, but in a way that could be made reusable and open sourced down the road. So it's great to see things like this get added over time. Thanks for getting this one taken care of!
@@ -526,8 +526,12 @@ int trilogy_ping_recv(trilogy_conn_t *conn) { return read_generic_response(conn)

int trilogy_query_send(trilogy_conn_t *conn, const char *query, size_t query_len)
{
    int err = 0;
    // we have to allow for 2 extra bytes due to the mysql protocol overhead; null byte + command byte(?)
    if (conn->socket->opts.max_allowed_packet != 0 && query_len > conn->socket->opts.max_allowed_packet - 2) {
I could be missing something here, but the protocol header is 4 bytes, not 2, right?
We've run some experiments, and 2 offset bytes were enough not to trigger a server-side error, regardless of max_allowed_packet.
experiment on current main:

    set_max_allowed_packet(4 * 1024 * 1024)

    client = new_tcp_client
    create_test_table(client)
    client.query "TRUNCATE trilogy_test"

    query_overhead = "INSERT INTO trilogy_test (blob_test) VALUES ('')".bytesize
    byte_offset = 2
    payload_size = 4 * 1024 * 1024 - query_overhead - byte_offset

    result = client.query "INSERT INTO trilogy_test (blob_test) VALUES ('#{"x" * payload_size}')"       # works
    result = client.query "INSERT INTO trilogy_test (blob_test) VALUES ('#{"x" * (payload_size + 1)}')" # doesn't work
I don't really have an explanation why it worked but I think @matthewd has a very rough idea.
If the maximum size is indeed only counting the message portion of the packet frame, then subtracting a 4 byte offset makes sense to me. I'll let someone else chime in though.
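For reference, here's the framing arithmetic as I understand it (a sketch, not trilogy code): the message is split into segments of at most 0xFFFFFF payload bytes, each prefixed by a 4-byte header, and a payload that exactly fills a segment forces an empty trailing segment.

```c
#include <assert.h>
#include <stddef.h>

#define SEGMENT_MAX 0xFFFFFFu /* 24-bit length field: 16MB - 1 */

/* Number of packet segments (and thus 4-byte headers) a payload needs:
 * n / SEGMENT_MAX full segments, plus one trailing segment that is
 * either partial or (when n is an exact multiple) empty. */
static size_t segment_count(size_t n)
{
    return n / SEGMENT_MAX + 1;
}

/* Bytes actually sent on the wire for a payload of n bytes. */
static size_t wire_size(size_t n)
{
    return n + 4 * segment_count(n);
}
```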
I agree. We even wanted to do that at first, but it was hard or even impossible, and we assumed that in practice only queries will exceed that limit. We weren't sure if the buffers we're building are pushed to the server on the fly or only after the whole message is split into buffers. In the first case it's not really possible to figure out the size of the whole message, since:

If you think those arguments aren't true, then I think we should revisit moving this change to the buffer builder. We only considered really long hostnames, database names, or passwords combined with a fairly small
Yup, that's exactly what
It felt a bit expensive and kinda excessive, so I didn't do it, but I can add that spec if you really insist.
This may be true today, but as I linked to in my pull request adding prepared statement/binary protocol support, there are other command types that can potentially breach the limit. The replication protocol is another area of concern, were someone to add support for it in the future. So having the check at a lower level, shared across all of these various higher-level APIs, would be nice.

As for the design of trilogy with regard to packet buffers: we very intentionally designed it such that only one packet will be parsed or generated at a time. Never more than one, and never both parsing and generation at the same time. The MySQL protocol is somewhat synchronous in that you must issue a command, then read its response before issuing another command. That's technically not quite true (as I worded it) with pipelining. I'm still not sure that's even a thing, but even with pipelining, if you issue 5 commands you must read 5 responses after, and they'll show up in the order the commands were issued.

As for how that works at a technical level in trilogy: when building a command (say, a query), the command packet buffer is built in memory in its entirety, containing all of the packet sub-frames, before being handed back to the caller. At that point, the caller can use the non-blocking or blocking APIs to write it to the network. And when reading packets off the network, the entire packet is read into memory before being returned to the caller. To save on allocations, the API shares a single buffer for both parsing and generation, and it lasts for the lifetime of the connection until

All of that is the long way of saying it's safe to assume we can keep track of the overall packet size (minus the header(s)) in the builder. As for parsing, I assume the server would never send a packet larger than the maximum configured size, but maybe it's worth putting a check in there as well to prevent a potential DoS?
🤦 so sorry, I should have seen that test. Thanks! For the 1GB check, maybe the test for this could be conditionally enabled somehow? That way it would only run if specifically requested. These days, I don't imagine a 1GB allocation would be an issue on modern developer machines. But I totally understand the concern here. I'll defer to your judgement.
closing in favour of #84
This PR achieves the same goal as rails/rails#48226, and is a better version of what #74 intended, but it's done in the client layer.
Motivation:
Trilogy doesn't do anything to protect against sending queries bigger than the max packet size. Depending on timing and query size, sending a giant query (the default max is 64MB…) could result in Trilogy::Error: trilogy_query_send: TRILOGY_CLOSED_CONNECTION, Errno::ECONNRESET: Connection reset by peer - trilogy_query_send, or Trilogy::DatabaseError: trilogy_query_recv: 1153 Got a packet bigger than 'max_allowed_packet' bytes.
Details
This PR simply shifts the error to the client side, so it can be raised without attempting to send the query, which used to result in aborting the connection.
It will also improve visibility for this kind of error. Before, we weren't always getting the corresponding error message because sometimes the connection was aborted before we received it from the server.
merge after #78