
vttablet: stream consolidation #7752

Merged 12 commits into vitessio:master from vmg/stream-consolidation on Apr 7, 2021
Conversation

@vmg (Collaborator) commented Mar 26, 2021

Description

Happy Friday everyone! Here's the first iteration of a feature I believe is going to have a very significant impact in some of Vitess' largest deployments.

Here's the problem we're trying to solve: thundering herds. That's basically it. Sometimes databases suffer a usage pattern called a "thundering herd" where hundreds/thousands of queries happen at the same time, and all these queries are identical. Most of the time, this is caused either by a deployment, or by a cache that has been cleared or has expired.

Vitess already does a very good job of handling thundering herds in vttablet with our query consolidation engine, which prevents identical point queries from going to the upstream MySQL server: only one of the queries goes upstream, and all the others wait for the initial query to return. The issue is that our current consolidation engine does not support streaming queries -- understandable, because this is a hard problem to solve, but also particularly bad, because streaming queries are by definition more expensive than point queries, so consolidating them is particularly beneficial to performance. In practice, these queries often "block" one of our MySQL streaming connections for a long time, and since we have a limited number of connections from vttablet to MySQL, this results in a snowball effect where every query in the herd takes longer than the previous one, because it needs to wait for all the previous queries to pick up a connection and release it back to the pool.
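For readers unfamiliar with the existing engine, the point-query consolidation idea boils down to something like the following minimal sketch (illustrative names only, not Vitess's actual code): the first caller of a given SQL string becomes the leader and goes to MySQL, while identical concurrent callers wait and share its result.

    // A minimal sketch of point-query consolidation. Illustrative only.
    package consolidate

    import "sync"

    // Result is a stand-in for a real result set.
    type Result struct {
        Rows [][]string
    }

    type pending struct {
        wg     sync.WaitGroup
        result *Result
        err    error
    }

    // Consolidator deduplicates identical in-flight point queries.
    type Consolidator struct {
        mu       sync.Mutex
        inFlight map[string]*pending
    }

    func NewConsolidator() *Consolidator {
        return &Consolidator{inFlight: make(map[string]*pending)}
    }

    // Do executes exec for sql at most once concurrently: the first caller
    // (the leader) goes to MySQL; identical concurrent callers block and
    // receive the leader's result instead of issuing their own query.
    func (c *Consolidator) Do(sql string, exec func() (*Result, error)) (*Result, error) {
        c.mu.Lock()
        if p, ok := c.inFlight[sql]; ok {
            c.mu.Unlock()
            p.wg.Wait() // follower: wait for the leader to finish
            return p.result, p.err
        }
        p := &pending{}
        p.wg.Add(1)
        c.inFlight[sql] = p
        c.mu.Unlock()

        p.result, p.err = exec() // leader: the only call that reaches MySQL

        c.mu.Lock()
        delete(c.inFlight, sql)
        c.mu.Unlock()
        p.wg.Done()
        return p.result, p.err
    }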

So, let's try to solve this problem: this PR implements a consolidation engine for streaming queries. I'm not sure whether this is a novel algorithm, but I haven't found any prior art on it, so I had to cook it up myself. It was tricky to get right, and it's definitely complex, but the results seem worth it.

The consolidation engine for streaming queries is configurable with a given amount of memory, because it essentially acts as a cache. When a streaming query starts, the engine keeps track of it and of all the intermediate rows it's returning from MySQL. If another identical stream shows up, it automatically catches up to the original stream with the data that has been cached in memory, and any new rows streamed from MySQL are fanned out to it. Once the original stream has grown large enough in memory, we clear its intermediate cache and new queries can no longer catch up to it, but all the existing followers keep receiving new rows as they come in.

In practice this works very well: with the default settings of 128MB total to be used for consolidation, and 2MB max for any consolidation stream, we can capture massive thundering herds even if they come with a latency differential of up to 250ms. The consolidation engine is smart, so when an identical query comes in that can no longer catch up to the existing consolidation, it automatically promotes itself to consolidation leader for any follow-up queries. A herd of 10,000 identical queries spread over 4s can be handled with only 4 MySQL streams, as opposed to, huh, 10,000.
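A rough sketch of the leader/follower mechanics described above follows; the type names, the chunk shape, and the buffer accounting are my own simplification, not the code in this PR.

    // Illustrative sketch of stream consolidation: a leader streams chunks
    // from MySQL, keeps an in-memory catch-up buffer until it grows too
    // large, and fans every new chunk out to its followers.
    package streamconsolidate

    import "sync"

    // Chunk is a stand-in for a partial streaming result set.
    type Chunk struct {
        Rows [][]string
    }

    type streamInFlight struct {
        mu          sync.Mutex
        buffer      []*Chunk // catch-up cache; nil once it grows too large
        buffered    int      // rows buffered so far (stand-in for a byte budget)
        maxBuffered int
        followers   []chan *Chunk
        done        bool
    }

    // join replays the catch-up buffer through callback and registers a
    // channel for future chunks. It returns false when the stream can no
    // longer be caught up with; the caller then becomes a new leader.
    func (s *streamInFlight) join(callback func(*Chunk) error) (<-chan *Chunk, bool) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.done || s.buffer == nil {
            return nil, false
        }
        for _, c := range s.buffer {
            _ = callback(c) // replay what the leader has already streamed
        }
        ch := make(chan *Chunk, 16)
        s.followers = append(s.followers, ch)
        return ch, true
    }

    // publish is called by the leader for every chunk received from MySQL.
    func (s *streamInFlight) publish(c *Chunk) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.buffer != nil {
            s.buffer = append(s.buffer, c)
            s.buffered += len(c.Rows)
            if s.buffered > s.maxBuffered {
                s.buffer = nil // stop accepting new followers; existing ones stay
            }
        }
        for i, ch := range s.followers {
            if ch == nil {
                continue
            }
            select {
            case ch <- c: // fan out the new chunk
            default:
                // Follower can't keep up; the real engine disconnects it
                // with an explicit error instead of blocking the leader.
                close(ch)
                s.followers[i] = nil
            }
        }
    }

    // finish is called by the leader once MySQL has no more rows.
    func (s *streamInFlight) finish() {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.done = true
        for _, ch := range s.followers {
            if ch != nil {
                close(ch)
            }
        }
    }

In this sketch, a query whose join fails simply starts its own stream and becomes the leader that later arrivals consolidate onto, which is the promotion behavior described above.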

The performance impact of this optimization is hard to graph, because the previous behavior is very pathological and the new behavior is very straightforward. As a sample:

[Graph: per-query latency of the thundering-herd benchmark, current vttablet (orange) vs. stream consolidation (blue)]

This is a thundering herd pattern that is performing an expensive query (1100 rows taking up to 2MB in total) from up to a thousand simultaneous connections. The latency pattern for the current version of vttablet (seen in orange) is all over the place: depending on the ordering that the scheduler gave to the incoming queries, the earliest queries take 30ms to finish, while the latest ones go all the way to 1.4 seconds -- this is because, again, every query needs to wait for all the previous ones to release the borrowed MySQL connections.

The consolidation engine (seen in blue), well, it consolidates the 1000 queries into a single stream, so they all finish simultaneously between 40 and 50ms.

This is it for now: I'm putting this up early for review/testing while I'm away next week for Easter. The code may have bugs, but the testing is comprehensive, with integration tests for all error behaviors (including clients that lag behind -- that was tricky to get right) and end-to-end tests, with full -race coverage.

Outstanding Issues

Nothing big. My main concern is that we're allocating more memory, because the existing MySQL streaming client was reusing the rows of every Result set after yielding it (it's no longer safe to do this, because the result set is kept in memory temporarily to allow other queries to catch up). I've worked around the issue conservatively; an ideal approach would switch the MySQL connection to using a memory pool for row reuse.
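For illustration, the memory-pool idea could look roughly like the sketch below, using sync.Pool and stand-in types; this is not the change that eventually landed.

    // Sketch of pooling row buffers so the streaming connection can reuse
    // allocations instead of allocating a fresh slice per chunk.
    package rowpool

    import "sync"

    // Row is a stand-in for sqltypes.Row.
    type Row []string

    var rowBufPool = sync.Pool{
        New: func() interface{} { return make([]Row, 0, 256) },
    }

    // getRows returns an empty slice with pooled capacity for the next chunk.
    func getRows() []Row {
        return rowBufPool.Get().([]Row)[:0]
    }

    // putRows recycles a slice. The important invariant for the consolidator:
    // a buffer must not be recycled while it is still cached for catch-up or
    // referenced by a follower that hasn't consumed it yet.
    func putRows(rows []Row) {
        rowBufPool.Put(rows)
    }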

The other outstanding issue is that the results of the stream callbacks are now immutable -- understandably, as they're now shared between clients. I had to lift a keyspace-replacement mutation up the stack so it doesn't happen more than once across the consolidated queries, and the result was clean, but if somebody introduces another of these mutations higher in the stack in the future, it'll break things. The Go type system does not let us model this pointer as immutable because it uses type theory from 1973, so we'll have to keep an eye on the race detector.
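To make the constraint concrete, the safe pattern is "mutate once, then share": any rewrite of the result happens before fan-out, and every callback afterwards treats the pointer as read-only. The types and names below are illustrative, not the PR's code.

    // Illustrative sketch of the mutate-once rule for consolidated callbacks.
    package fanout

    // Result stands in for the real shared result type; ReplaceKeyspace
    // stands in for the keyspace rewrite mentioned above.
    type Result struct {
        Keyspace string
    }

    func (r *Result) ReplaceKeyspace(ks string) { r.Keyspace = ks }

    // fanOut applies the single allowed mutation before the result is shared.
    // From that point on every callback sees the same pointer and must not
    // modify it; the race detector is the only guard rail Go gives us here.
    func fanOut(r *Result, replaceKeyspace string, callbacks []func(*Result) error) error {
        if replaceKeyspace != "" {
            r.ReplaceKeyspace(replaceKeyspace) // happens exactly once
        }
        for _, cb := range callbacks {
            if err := cb(r); err != nil {
                return err
            }
        }
        return nil
    }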

Related Issue(s)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Impacted Areas in Vitess

Components that this PR will affect:

  • Query Serving
  • VReplication
  • Cluster Management
  • Build/CI
  • VTAdmin

@vmg (Collaborator, Author) commented Mar 26, 2021

Unfortunate hiccup: the CI server doesn't have enough concurrency to handle the 1000-client thundering herd in the stress tests, either with or without consolidation. I need to ponder this a bit; maybe the test needs to be simplified for CI altogether.

@dweitzman (Member) commented Mar 27, 2021

Seems like a generally good thing. The edge case I wonder about is what happens if two gates are caught up and streaming the same query, then one of them isn't able to receive results quickly (maybe the application is thrashing and paging through results very slowly).

I haven't looked through the code, but I'm assuming that a single slow or stuck consolidated gate would fill up some buffer and then potentially block other gates from receiving results too?

If that's happening in production, seems like someone would be able to mitigate by setting ConsolidatorStreamTotalSize to 0 and restarting the tablets (to keep OLTP query consolidation and disable OLAP consolidation).

@vmg (Collaborator, Author) commented Apr 5, 2021

Back from my holiday!

The edge case I wonder about is what happens if two gates are caught up and streaming the same query, then one of them isn't able to receive results quickly

This is explicitly handled and tested! Any clients that fall behind so much that they cannot catch up will eventually be disconnected with a specific error message. https://github.com/vitessio/vitess/pull/7752/files#diff-5fc9fba5ef5ce1a6a2744d9d42243b740ac8f76b29b06c16abcd805bb416673cR245-R282

vmg force-pushed the vmg/stream-consolidation branch from 1d8bae5 to 11621b2 on April 5, 2021
vmg added 5 commits April 5, 2021 12:01
@vmg (Collaborator, Author) commented Apr 5, 2021

The tests are now reliably green. It turned out to be quite hard to get these concurrent tests to always pass when run under the race detector, because it slows them down.

@vmg (Collaborator, Author) commented Apr 5, 2021

I've worked around the issue conservatively; an ideal approach would switch the MySQL connection to using a memory pool for row reuse.

I've now implemented this feature in aa59b88, which should reduce memory allocations both when using and when not using the consolidator.

vmg added 3 commits April 5, 2021 18:16
@systay (Collaborator) left a comment

Always a learning experience reading through your code :)

systay merged commit dadfdd5 into vitessio:master on Apr 7, 2021
Comment on lines +301 to +304
if replaceKeyspace != "" {
result.ReplaceKeyspace(replaceKeyspace)
}
return callback(result)
@harshit-gangal (Member) commented:

Replace keyspace does not need to be called always, as not all queries will be for keyspace tables; it can be an information_schema query or a _vt query.

There is a similar check in the Execute method, where we check whether dbname is different from the keyspace name, and only replace with the keyspace name if field.Database == dbname.
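For clarity, the kind of guard being described is roughly the following; this is a paraphrase with a stand-in Field type, not the actual Execute code.

    package sketch

    // Field stands in for the field metadata of a result set.
    type Field struct {
        Database string
    }

    // replaceKeyspaceIfNeeded only rewrites the database name when the
    // backing db name differs from the keyspace name, and only for fields
    // that actually belong to that db, so information_schema or _vt
    // results are left alone.
    func replaceKeyspaceIfNeeded(fields []*Field, dbName, keyspaceName string) {
        if dbName == keyspaceName {
            return // nothing to rewrite
        }
        for _, f := range fields {
            if f.Database == dbName {
                f.Database = keyspaceName
            }
        }
    }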

@vmg (Collaborator, Author) replied:

@harshit-gangal this is the same behavior that we had before the PR; I've just refactored it by lifting it from the caller. I'm pretty sure we're only replacing the keyspace when it's required (see https://github.com/vitessio/vitess/pull/7752/files#diff-799d4762d9e64026cb623db2defeb53209df055a99c1106f535006a89e7b812aR282-R285, which I've lifted from https://github.com/vitessio/vitess/pull/7752/files#diff-28fc9df9d28d69ca625b85e18a6fa652d61bfaf6de03012ce2d8d0338363d70eL808-L817). Can you double-check?

@vmg (Collaborator, Author) replied:

(unless the behavior before the PR was already broken, in which case I'll gladly fix it now 😄)

@harshit-gangal (Member) replied:

I realised that; it was broken before this PR.

Comment on lines +333 to +335
if replaceKeyspace != "" {
result.ReplaceKeyspace(replaceKeyspace)
}
@harshit-gangal (Member) commented:

Same as above.
