
cache: track memory usage more closely #8804

Merged · 5 commits merged into vitessio:main · Sep 14, 2021

Conversation

@vmg (Collaborator) commented Sep 13, 2021

Description

This is a set of fixes arising from an internal PlanetScale issue, in which importing a large SQL dump into a Vitess cluster was causing the vtgates to OOM: the import queries themselves were using more memory than we were expecting for the operation.

The actual data being imported is not relevant to the issue so it won't be discussed here. The only meaningful thing about the import is that the resulting INSERT SQL statements were extremely large.

The first thing I did to debug the increase in memory usage during the import was graphing the heap usage and RSS of the vtgate process while the import was running:

[graph: vtgate heap usage and RSS during the import, before the fixes]

An increase in memory usage is to be expected for the actual duration of the import, since the incoming SQL queries are massive and must be processed in the vtgate. The most salient point of this graph, however, is that neither the process RSS nor the Go runtime's heap allocation recovers after the import is done.

In theory, the resulting heap size after the import should be roughly 44mb: 32mb from the query plan cache that has been populated by the import, plus the 12mb of idle-state heap measured at the start of the import. The heap, however, sits at 100mb after the import: more than twice the expected size. We're keeping memory somewhere we shouldn't.

The fixes

I'm not going to dive into the actual debugging process, because it took a week of effort and is too specific to explain in detail. I did, however, find two major issues that compounded into that 100mb+ heap; as expected, they were related to the way we were calculating the in-memory size of the engine.Plan objects being stored in the plans cache.

Here are the issues broken down:

  • sizegen: calculate malloc block sizes (0106bc4): the most obvious thing when measuring the cache memory usage of all Plan objects was that their actual heap usage was always ~20% larger than the size calculated by our CachedSize code. In retrospect, this was a glaring issue that I introduced in the very first version of the sizegen code generator, but it didn't become apparent until now: our code generator calculates sizes at the type-system level for Go (e.g. a struct takes unsafe.Sizeof bytes, a byte slice takes cap(slice) bytes, etc.), but when our program actually runs, all these data structures are allocated on the Go heap, and the Go runtime uses a block-based allocator with size classes, where every allocation is rounded up to the closest size class.

    The list of size classes is hardcoded and hidden in the Go runtime, but using the power of go:linkname, we can access the runtime's runtime.roundupsize helper, the exact same function the memory allocator uses to round up malloc sizes at runtime. By calling this function explicitly, we can adjust all the memory-size calculations in our generated code, as sketched below.
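    A hedged sketch of what this can look like (helper names are illustrative, and runtime.roundupsize's signature has changed across Go releases; this matches Go 1.17-era runtimes):

```go
package hack

import (
	_ "unsafe" // required for go:linkname
)

// roundupsize links against the private helper that the Go allocator uses
// to round a requested allocation up to its size class. Note: a body-less
// declaration like this typically also needs an empty .s file in the
// package to satisfy the compiler.
//
//go:linkname roundupsize runtime.roundupsize
func roundupsize(size uintptr) uintptr

// RuntimeAllocSize reports how many bytes the heap will actually dedicate
// to an allocation request of the given size.
func RuntimeAllocSize(size int64) int64 {
	return int64(roundupsize(uintptr(size)))
}
```

    Generated CachedSize methods can then wrap every raw figure (unsafe.Sizeof of a struct, cap of a slice, and so on) in such a helper before summing.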

  • sqlparser: do not pin table names in the heap (1c4ee65): after the block-size fix, the calculated sizes for most queries were very close to the observed heap usage, but the largest INSERT queries in our sample set still showed a difference of up to 50% between calculated and real memory usage. Debugging this took several core dumps, but the answer was obvious in retrospect: the Go runtime shares the underlying memory of all string instances that have been sliced from one another (a common pattern in garbage-collected languages that greatly reduces memory allocations), and in our case this became an issue when keeping chunks of the parsed AST of large queries. In this specific example, an incoming query as large as 2MB was parsed by our SQL parser, and the resulting AST was processed in full and discarded, as expected, with one exception: we were keeping a reference to a string containing the name of the table in the original query. This 7-byte string was stored in our cache, and since it was a subslice of the original query, it was pinning 2MB+ of data on the heap just so we could access those 7 bytes.

    This was the source of the vast majority of the discrepancy: there is no way for our size-calculation code to notice that a 7-byte string is actually backed by 2MB on the heap. This is a common and recurring issue in garbage-collected languages with no easy fix, other than being explicit and attentive with memory-shared strings. For this specific case, we've fixed the issue by introducing a hack.StringClone helper (sketched below) that creates a copy of a string without sharing the underlying storage. By changing our TableIdentifier helper to use it, we've fixed the memory-usage calculations for dozens of different planning primitives in our planner, which were all keeping references to the table's name.
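    A minimal sketch of what such a clone helper does (the actual implementation may differ; since Go 1.18 the standard library ships strings.Clone with the same semantics):

```go
package hack

import "strings"

// StringClone returns a copy of s backed by freshly allocated memory, so
// that retaining the result does not pin the (possibly much larger)
// buffer that s was sliced from.
func StringClone(s string) string {
	var b strings.Builder
	b.Grow(len(s))   // one allocation of exactly len(s) bytes
	b.WriteString(s) // copies the bytes out of the shared backing array
	return b.String()
}
```

    With this in place, caching the clone of a 7-byte table name keeps 7 bytes alive instead of the 2MB query it was sliced from.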

  • executor: use a hash for plan cache keys (7b459d1): this last optimization is not related to size calculations, but it yields a noticeable reduction in vtgate's peak RSS under high load. The key under which a query plan was cached was built by prepending a prefix to the full text of the query. Since strings are immutable in Go, any string concatenation causes a brand-new memory allocation; for a 2MB input query, the resulting len(prefix) + len(query) allocation is extremely wasteful, particularly since our cache is already designed to store the hash of the cached keys, as opposed to the full keys.

    We can work around this by keying on a SHA256 of the query instead of the full query text, as sketched below. In practice it uniquely identifies the query just the same, but it avoids a large temporary allocation that would increase the total RSS of the system under load.
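    A hedged sketch of the idea (function and key layout are illustrative, not the exact vtgate code):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// planCacheKey derives a fixed-size cache key from the key prefix and the
// query text, instead of allocating a brand-new len(prefix)+len(query)
// string via concatenation.
func planCacheKey(prefix, query string) string {
	h := sha256.New()
	h.Write([]byte(prefix))
	h.Write([]byte(query))                // streams the query into the hash state
	return hex.EncodeToString(h.Sum(nil)) // 64 hex chars, even for a 2MB query
}

func main() {
	fmt.Println(planCacheKey("ks@primary:", "INSERT INTO t (id) VALUES (1)"))
}
```

    The []byte conversions still make transient copies, but unlike the concatenated key, those die immediately and are never retained by the cache.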

Results

Here's the resulting memory usage graph for the import after these optimizations:

[graph: vtgate heap usage and RSS during the import, after the fixes]

It's looking good. Most notably, the heap usage now tracks the designed size of the plan cache (32mb) within a <2% margin of error. The peak RSS usage during the import is significantly lower, and the stable RSS after the import is done has also decreased sharply, although it does not match the runtime heap usage because the Go runtime is not eager to return memory to the OS, even after it is no longer being used.

vtgate could be adjusted to return memory aggressively so as to reduce RSS when idle, which may make sense for some Vitess Kubernetes deployments, although I fear it may result in reduced performance in many real-world scenarios. This will be discussed in a separate issue in the future.
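For illustration only (this PR does not implement it), the aggressive approach could be as simple as a periodic call to the standard library's debug.FreeOSMemory:

```go
package memutil

import (
	"runtime/debug"
	"time"
)

// FreeMemoryLoop forces a GC cycle on every tick and asks the runtime to
// return freed pages to the OS, trading extra CPU work for lower idle RSS.
func FreeMemoryLoop(interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			debug.FreeOSMemory()
		case <-stop:
			return
		}
	}
}
```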

cc @deepthi
cc @systay for the sizegen changes

Related Issue(s)

  • Not linked (PlanetScale internal)

Checklist

  • Should this PR be backported?
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@vmg vmg changed the title Vmg/cache memory cache: track memory usage more closely Sep 13, 2021
@vmg vmg force-pushed the vmg/cache-memory branch 2 times, most recently from 0873373 to 6884f07 on September 13, 2021 at 16:05
@harshit-gangal (Member) left a comment

LGTM

@deepthi (Member) left a comment

Beautiful work :)
Let's leave it open for a day for any other comments and then merge.

vmg added 5 commits September 14, 2021 12:01
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
Signed-off-by: Vicent Marti <vmg@strn.cat>
@frouioui frouioui added the Benchmark me Add label to PR to run benchmarks label Sep 14, 2021
@vmg vmg merged commit 9df4dc7 into vitessio:main Sep 14, 2021