
Thanos Receive memory usage spikes extremely high after restart (9GB before restart, 93GB after) #2107

Closed
rdzimmer-zz opened this issue Feb 6, 2020 · 5 comments

Comments

@rdzimmer-zz

Thanos, Prometheus and Golang version used:

bin/thanos --version
thanos, version 0.10.1 (branch: HEAD, revision: bdcc35842f30ada375f65aaf748a104b43d56672)
  build user:       circleci@4e51e880cd24
  build date:       20200124-07:36:32
  go version:       go1.13.1

Object Storage Provider: MinIO

What happened: After a restart of the VM running the Thanos Receive component, the memory of my Thanos Receive exploded from around 9GB (based on ps and go_memstats_sys_bytes stats) to 93GB!

What you expected to happen: Thanos Receive should be able to restart using the same amount of memory it was using before the restart.

How to reproduce it (as minimally and precisely as possible): Run a relatively large workload and restart Thanos. Watch the memory usage.

Full logs to relevant components:

Anything else we need to know:
Originally my VM had only 32GB of memory. Thanos Receive OOM'ed after a restart, so I increased the VM to 64GB. It OOM'ed again, so I went to 128GB. The other processes on the VM (MinIO, Thanos Store/Compact/Query, Prometheus) contribute < 1GB to the system memory usage.
I have 12 vCPUs on this system.

I am running a scale test workload that generates 2.19 million datapoints each minute. The workload can add cardinality to the time series, but to rule that out I have been running with a static 2.19 million time series (metric name/label combinations), one datapoint per time series each minute. I cleared out my test environment to ensure that old time series were not polluting the test. Based on this, my original theory that fluctuating time series cardinality was causing the issue (perhaps by replaying and loading every time series in its history) was incorrect.

During the test, the go_memstats_alloc_bytes fluctuates between 4 and 8GB. However, after the restart it exploded to 88GB. While the internal allocation does reset back to the 4-8GB range, the system RSS go_memstats_sys_bytes does not (not a surprise). It remains around 93GB.
[screenshots: go_memstats_alloc_bytes and go_memstats_sys_bytes over the restart]
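For reference, these numbers can be pulled straight from the Prometheus HTTP API; a minimal sketch, assuming Prometheus listens on localhost:9090 and the receive target carries a job="thanos-receive" label (both are placeholders for this setup):

```sh
# Query the Go runtime memory metrics discussed above.
# The Prometheus address and the job label are assumptions; adjust for your setup.
PROM=http://localhost:9090
for metric in go_memstats_alloc_bytes go_memstats_sys_bytes; do
  curl -s "${PROM}/api/v1/query" \
    --data-urlencode "query=${metric}{job=\"thanos-receive\"}" |
    jq -r --arg m "$metric" '.data.result[] | "\($m) \(.metric.instance) \(.value[1])"'
done
```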

I have some additional script automation to monitor system stats with ps, iotop, sar, etc. From this I can see that the CPU usage of Thanos Receive was very high; the CPU % column is the percentage of a single CPU core (so 1081% is 10.81 cores). Using iotop I can see that the majority of the disk IO was a large amount of writes by Thanos Receive (P-Read and P-Write are KB read/written per minute).
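The automation is roughly along these lines (a simplified sketch, not the exact script; the process name, sampling interval, and output format are assumptions):

```sh
# Sample RSS (KB) and CPU% of the thanos process once a minute.
while true; do
  ts=$(date '+%F %T')
  ps -C thanos -o rss=,pcpu=,comm= | while read -r rss cpu comm; do
    echo "${ts} ${comm} rss_kb=${rss} cpu_pct=${cpu}"
  done
  sleep 60
done
```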
[screenshot: Thanos Receive system statistics]

MinIO did a large amount of work for a few minutes at the end of Thanos Receive's startup, mostly disk writes (around 23GB of them).
[screenshot: MinIO system statistics]

Environment:

  • OS (e.g. from /etc/os-release): Red Hat Enterprise Linux Server release 7.7 (Maipo)
  • Kernel (e.g. uname -a): 3.10.0-1062.el7.x86_64 #1 SMP Thu Jul 18 20:25:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
@squat
Member

squat commented Feb 6, 2020

Hi @rdzimmer, unfortunately this is currently a known issue in Thanos Receive. It occurs whenever the WAL must be replayed to write a whole block to disk, which normally happens during a restart if there is an incomplete block. I think we might have another open issue on this topic, so maybe we can consolidate the discussion. Nevertheless, it is an implementation issue in the Prometheus library we are using to read the WAL. We are actively investigating, as there's no reason that a restart should be the limiting factor in the system :p. We're also happy to review a contribution :)
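A rough way to gauge how much data the next restart will have to replay is to check the size of the WAL under the receive data directory (the path below is a placeholder for whatever --tsdb.path points at; the exact layout can differ between versions):

```sh
# Size of the write-ahead log that gets replayed on startup.
# /var/thanos/receive is a placeholder for your --tsdb.path value.
du -sh /var/thanos/receive/wal
ls -lh /var/thanos/receive/wal | tail
```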

-Lucas

@rdzimmer-zz
Author

Thanks @squat! Looks like #1624 is probably the duplicate you're referencing. Not sure how I missed it yesterday (once I added "WAL" to the search it was obvious). Happy to close this and keep using that one. I'll see if there's any contribution I or my team can make.

@rdzimmer-zz
Author

One thought I just had: I'm guessing the reason I saw mostly writes and not reads is that in this case I didn't reboot the VM, just Thanos Receive, so the data was most likely still in the system file cache. I'll set up another test and do a full reboot to confirm. Not sure if that really helps things, but it at least explains the results better.
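If a full reboot turns out to be awkward to schedule, dropping the Linux page cache before restarting the process should produce a similarly read-heavy replay (a generic Linux technique, not specific to Thanos; requires root):

```sh
# Flush dirty pages to disk, then drop the page cache (plus dentries and inodes)
# so the WAL replay has to read from disk instead of the filesystem cache.
sync
echo 3 > /proc/sys/vm/drop_caches
```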

@RahulArora31

Experiencing the same problem. After the restart the memory consumption went from 6Gi to 47Gi, and the process then crashed due to insufficient memory on the node.

@JoseRIvera07

I'm facing the same problem. After a restart the memory went from 14Gi to 64Gi, and the process was then OOM-killed by the system.
