cache: track memory usage more closely #8804
Merged
Description
This is a set of fixes arising from an internal PlanetScale issue, in which importing a large SQL dump into a Vitess cluster was causing the `vtgates` to OOM: the import queries themselves were using more memory than we were expecting for the operation.

The actual data being imported is not relevant to the issue, so it won't be discussed here. The only meaningful thing about the import is that the resulting `INSERT` SQL statements were extremely large.

The first thing I did to debug the increase in memory usage during the import was graphing the heap usage and RSS of the `vtgate` process while the import was running:

Although an increase in memory usage is to be expected for the actual duration of the import, since the incoming SQL queries are massive and must be processed in the `vtgate`, the most salient point from this graph is that neither the RSS of the process nor the heap allocation of the Go runtime recovers after the import is done.

In theory, the resulting heap size after the import should be 32mb (from the query plan cache that has been populated by the import) + 12mb (the idle state at the start of the import), i.e. roughly 44mb. The heap, however, sits at 100mb after the import: more than twice the expected size. We're keeping memory somewhere we shouldn't.
The fixes
I'm not going to dive into the actual debugging process because it took a week of effort and it's too specific to explain in detail. I did find, however, two major issues that compounded into that 100mb+ heap; as expected, they were related to the way we were calculating the in-memory size of the `engine.Plan` objects being stored in the plans cache.

Here are the issues broken down:
**sizegen: calculate malloc block sizes** (0106bc4): the most obvious thing when measuring the cache memory usage for all `Plan` objects is that their actual heap usage was always ~20% larger than the size calculated by our `CachedSize` code. In retrospect, this was a glaring issue that I introduced in the very first version of the `sizegen` code generator, but it didn't become apparent until now: our code generator calculates sizes at the level of the Go type system (e.g. a struct has `unsafe.Sizeof` bytes, a byte slice has `cap(slice)` bytes, etc.), but when our program actually runs, all these data structures are allocated on the Go heap, and the Go runtime uses a block-based allocator with size classes, where every allocation is rounded up to the closest size class.

The list of size classes is hardcoded and hidden in the Go runtime, but using the power of `go:linkname`, we can access the Go runtime's `runtime.roundupsize` helper, which is the exact same function the memory allocator uses to round up `malloc` sizes at runtime. By calling this function explicitly, we can adjust all the memory calculations in our generated code.
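As a rough sketch of the mechanism (not the exact generated code; `RuntimeAllocSize` is an illustrative name), the linkname trampoline looks something like this:

```go
// Package hack exposes internals of the Go runtime. A body-less function
// declaration like roundupsize below only compiles if the package also
// contains a (possibly empty) .s file.
package hack

import (
	_ "unsafe" // required for go:linkname
)

// roundupsize is the runtime helper that maps a requested allocation size to
// the allocator size class that will actually be used. The signature shown
// here matches the Go versions current at the time of this PR; newer runtimes
// may change it, so this must be re-checked on Go upgrades.
//
//go:linkname roundupsize runtime.roundupsize
func roundupsize(size uintptr) uintptr

// RuntimeAllocSize rounds a type-system-level size up to what the allocator
// will really reserve on the heap for that allocation.
func RuntimeAllocSize(size int64) int64 {
	return int64(roundupsize(uintptr(size)))
}
```

With a helper like this, the generated `CachedSize` code can wrap every computed allocation in a round-up call; for instance, a struct whose fields add up to 72 bytes is reported as 80 bytes, because 80 is the nearest allocator size class.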
**sqlparser: do not pin table names in the heap** (1c4ee65): after the block size fix, the calculated sizes for most queries were very close to the visualized heap usage, but the largest `INSERT` queries from our sample set were still showing a difference of up to 50% between calculated and real memory usage. Debugging this took several core dumps, but the answer was obvious in retrospect: the Go runtime shares the underlying memory for all `string` instances that have been sliced from one another (a common pattern in garbage-collected languages that greatly reduces memory allocations), but in our case this became an issue when keeping chunks of the parsed AST for large queries. In this specific example, an incoming query as large as 2MB was being parsed by our SQL parser. The resulting AST was processed in full and discarded, as expected, with one single exception: we were keeping a reference to a string containing the name of the table in the original query. This 7-byte string was stored in our cache, and since it was a subslice of the original query, it was pinning 2MB+ of data on the heap just so we could access the underlying 7 bytes.

This was the source of the vast majority of the extra, unaccounted-for memory usage. There is no way for our size calculation code to notice that a 7-byte string was actually backed by 2MB in the heap. This is a common and recurring issue in garbage-collected languages with no easy fix, other than being more explicit and attentive with memory-shared strings. For this specific case, we've fixed the issue by introducing a `hack.StringClone` helper that creates a copy of a string without sharing the underlying storage. By changing our `TableIdentifier` helper to use it, we've fixed the memory usage calculations for dozens of different planning primitives in our planner, which were all keeping references to the table's name.
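To illustrate the idea (a minimal sketch of what a clone helper does, not necessarily the exact Vitess implementation), the following program clones a substring so that the tiny table name no longer keeps the whole query alive:

```go
package main

import (
	"fmt"
	"strings"
)

// cloneString returns a copy of s backed by its own allocation, so the result
// does not share memory with whatever larger buffer s was sliced from.
// (Go 1.18+ ships strings.Clone, which does exactly this.)
func cloneString(s string) string {
	var b strings.Builder
	b.Grow(len(s))
	b.WriteString(s)
	return b.String()
}

func main() {
	// Stand-in for a ~2MB INSERT statement.
	hugeQuery := "INSERT INTO customers VALUES " + strings.Repeat("(1),", 1<<19)

	// hugeQuery[12:21] is the 9-byte table name "customers"; kept as-is, it
	// would hold the entire ~2MB backing array reachable for as long as the
	// name lives in the plan cache. Cloning copies just those 9 bytes.
	tableName := cloneString(hugeQuery[12:21])
	fmt.Println(tableName) // customers
}
```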
**executor: use a hash for plan cache keys** (7b459d1): this last optimization is not related to size calculations, but it shows a noticeable reduction in peak RSS usage for `vtgate` during high load. When caching queries, the cache key was being built by prepending a prefix to the full text of the query. Since strings are immutable in Go, any string concatenation causes a brand new memory allocation. For a 2MB input query, the resulting `len(prefix) + len(query)` allocation is extremely wasteful, particularly since our cache is already designed to store the hash of the cached keys rather than the full keys.

We can work around this by using a SHA256 of the query instead of the full query when building the key. In practice it uniquely identifies the query just the same, but it does not create a temporary large allocation that would increase the total RSS of the system under load.
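A sketch of the approach (the function and prefix names here are illustrative, not the actual vtgate code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// planCacheKey derives a fixed-size cache key from the query instead of
// building a prefix+query string. The digest is 32 bytes (64 hex characters)
// no matter how large the incoming query is, so no multi-megabyte key string
// is ever built or retained.
func planCacheKey(prefix, query string) string {
	h := sha256.New()
	h.Write([]byte(prefix))
	// A production version might feed the query into the hash in chunks to
	// also avoid this transient []byte conversion of the full query text.
	h.Write([]byte(query))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	query := "INSERT INTO customers VALUES " + strings.Repeat("(1),", 1<<19)
	fmt.Println(planCacheKey("ks@primary+", query))
}
```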
Results
Here's the resulting memory usage graph for the import after these optimizations:
It's looking good. Most notably, the heap usage now tracks the designed size for the plan cache (32mb) within a <2% margin of error. The peak RSS usage during the import is significantly lower, and the stable RSS usage after the import is done has also decreased severely, although it does not match the runtime heap usage because the Go runtime is not eager to return memory to the OS, even after it is no longer being used. `vtgate` could be adjusted to aggressively return memory so as to reduce RSS during idle periods, which may make sense for some Vitess Kubernetes deployments, although I fear it may result in reduced performance in many real-world scenarios. This will be discussed in a separate issue in the future.

cc @deepthi
cc @systay for the sizegen changes
Related Issue(s)
Checklist
Deployment Notes