
receive: v0.18.0 memory leak (v0.16.0 regression) #3726

Closed

sepich opened this issue Jan 16, 2021 · 33 comments

@sepich
Contributor

sepich commented Jan 16, 2021

Thanos, Prometheus and Golang version used:
thanosio/thanos:v0.17.2

Object Storage Provider:
GCS

What happened:
I'm trying to upgrade from v0.16.0 to v0.17.2 and I see that thanos-receive memory is "leaking":
[memory usage graph]
Here are 3 thanos-receive pods in a hashring with equal load; the red and blue lines are v0.16.0. At 18:30 I restart thanos-receive-0 (orange line) as v0.17.2, then at 19:33 restart it back to v0.16.0.
The GC load profile also differs between versions:
[GC load graph]

Related:
#3265

What you expected to happen:
Stable memory usage.

How to reproduce it (as minimally and precisely as possible):
Only reproducible on production with 80k samples/s per thanos-receive pod.
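
For reference, a rough load-generation sketch for anyone trying to reproduce this outside production. This is not the production setup; the endpoint URL and port, the metric name, the series count per request, and the sleep interval are all placeholder assumptions to tune toward the ~80k samples/s above:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"time"

	"github.com/golang/snappy"
	"github.com/prometheus/prometheus/prompb"
)

func main() {
	for {
		// Remote write uses millisecond timestamps.
		ts := time.Now().UnixNano() / int64(time.Millisecond)
		req := prompb.WriteRequest{}
		// 1000 series per request, 10 requests/s = ~10k samples/s; scale up as needed.
		for i := 0; i < 1000; i++ {
			req.Timeseries = append(req.Timeseries, prompb.TimeSeries{
				Labels: []prompb.Label{
					{Name: "__name__", Value: "load_test_metric"},
					{Name: "series", Value: fmt.Sprintf("%d", i)},
				},
				Samples: []prompb.Sample{{Value: 1, Timestamp: ts}},
			})
		}
		raw, err := req.Marshal() // gogo-generated marshaller on the prompb types
		if err != nil {
			panic(err)
		}
		// Remote-write bodies are snappy-compressed protobuf.
		body := snappy.Encode(nil, raw)
		httpReq, err := http.NewRequest("POST", "http://thanos-receive:19291/api/v1/receive", bytes.NewReader(body))
		if err != nil {
			panic(err)
		}
		httpReq.Header.Set("Content-Encoding", "snappy")
		httpReq.Header.Set("Content-Type", "application/x-protobuf")
		httpReq.Header.Set("X-Prometheus-Remote-Write-Version", "0.1.0")
		resp, err := http.DefaultClient.Do(httpReq)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
		time.Sleep(100 * time.Millisecond)
	}
}

Thanos receive listens for remote write on --remote-write.address; 19291 is the port used in the docs' examples.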

Full logs to relevant components:
Attaching pprof heap.zip, captured right before the second restart.
heap.zip

@sepich
Contributor Author

sepich commented Feb 7, 2021

Rechecked on:

thanos, version 0.18.0 (branch: HEAD, revision: 60d45a02d46858a38013283b578017a171cf7b82)
  build user:       circleci@8ddf80c1eb30
  build date:       20210127-12:29:07
  go version:       go1.15.7
  platform:         linux/amd64

The result is still the same (v0.16.0 before 17:43):
[memory usage graphs]
Attaching pprof for v0.18.0 if needed:
heap.zip
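
For anyone who wants to grab the same kind of profile themselves, a minimal sketch, assuming the receive pod exposes the standard net/http/pprof endpoints on its HTTP port (10902 by default; the host name here is a placeholder):

package main

import (
	"io"
	"net/http"
	"os"
)

func main() {
	// gc=1 asks the runtime to run a GC first, so the snapshot shows
	// live objects rather than collectible garbage.
	resp, err := http.Get("http://thanos-receive-0:10902/debug/pprof/heap?gc=1")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("heap.pb.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Inspect the result with: go tool pprof heap.pb.gz
	if _, err := io.Copy(f, resp.Body); err != nil {
		panic(err)
	}
}

go tool pprof can also be pointed at the URL directly instead of a saved file.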

@kakkoyun
Member

Sorry for the late response @sepich, is this still an issue with the latest Thanos version?

@sepich
Contributor Author

sepich commented Feb 12, 2021

The result for the latest v0.18.0 is above.
Anyway, I've retested with thanosio/thanos:master-2021-02-11-7b09e30c and memory leaks to 24GB even faster.

@sepich sepich changed the title receive: memory leak (v0.16.0 regression) receive: v0.18.0 memory leak (v0.16.0 regression) Feb 12, 2021
@kakkoyun
Member

Thanks again. We're going to dedicate some time to investigating this.
We had a similar issue that I thought we had fixed. Anyway, we'll check it out. Feel free to give us a hand; help wanted.

cc @squat @bwplotka

@kakkoyun
Member

Possible duplicate of #3471

@luizrojo
Contributor

I am also facing this issue with the same behavior; downgrading to v0.16.0 fixes the possible leak.

As we can see below:

Screen Shot 2021-02-23 at 10 49 37

Until 9h40 all nodes were running v0.18.0; at that time I had all receive nodes restarted.

At 9h45 all nodes were back up, still on v0.18.0, and we can see a rapid memory increase.

At 10h17 I downgraded 2 nodes, to v0.17.2 and v0.16.0 (ip-10-184-125-10 to v0.17.2 and ip-10-184-125-13 to v0.16.0).

At 10h35 the v0.17.2 node starts to increase its memory usage, following the v0.18.0 nodes' behavior, but the v0.16.0 node keeps memory usage stable.

Here we can see the heap memory usage:

Screen Shot 2021-02-23 at 10 50 27

@bwplotka
Member

Ack, so this means something changed between 0.17.2 and 0.18?

Let's check whether the v0.19.0-rc.0 I am cutting this week helps; then we can look closer at the commit log and bisect at the commit level, especially since you can reproduce the problem so reliably 🤗

@luizrojo
Contributor

@bwplotka I believe something changed between 0.16.0 and 0.17.2 and then apparently got worse on 0.18.0.

On 0.17.2 we can see it takes longer to start building up memory usage, but it does start to increase, following the same behavior as 0.18.0.

I'll update one of the instances to v0.19.0-rc.0 and get back to you with more info.

Right now I have all nodes on v0.16.0 and memory is as stable as it can get:

Screen Shot 2021-02-23 at 12 12 23

Screen Shot 2021-02-23 at 12 12 55

At 11h18 I had all instances downgraded, and at 11h34 I got metrics ingestion back on.

@bwplotka
Member

I would start bisecting the commits between 0.16.0 and 0.17.2 🤗 That would be helpful.

@luizrojo
Contributor

luizrojo commented Feb 23, 2021

I just ran a couple more tests and did not have to go very far through the image tags to notice a pattern change in memory consumption.

Here is a screenshot of the memory graphs; the green line is the test instance and the yellow line is the instance running 0.16.0.

The first 3 big slopes are versions 0.17.0, 0.17.2 and 0.18.0, consecutively.

Starting at 15h49 I'm using master-2020-10-26-8447f621, the first image tag after 0.16.0.

The pattern changes, and it looks like after every GC, the consumption increases a little.

Screen Shot 2021-02-23 at 16 19 09

I'll do some more testing tomorrow
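
A quick way to watch that ratcheting pattern from inside any Go process is to poll the runtime's memory stats; a minimal, self-contained sketch (not Thanos code, just the standard runtime API):

package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	var m runtime.MemStats
	for range time.Tick(5 * time.Second) {
		runtime.ReadMemStats(&m)
		// If the heap-in-use floor keeps rising across GC cycles while
		// load stays constant, something is retaining memory.
		log.Printf("heap_inuse=%dMiB heap_idle=%dMiB num_gc=%d",
			m.HeapInuse>>20, m.HeapIdle>>20, m.NumGC)
	}
}

HeapInuse here is the same number behind the go_memstats_heap_inuse_bytes metric mentioned later in this thread.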

@bwplotka
Member

There are cool findings made by @svenwltr:
#3471 (comment)

@jmichalek132
Contributor

We are experiencing the same issue (with both v0.17.2 and v0.18.0), but the downgrade to v0.16.0 seems to help as suggested.

Screenshot_2021-03-01 Thanos Receive - Grafana

The fall in memory usage between 13:20 and 14:40 is caused by the instances being OOM-killed.
The fall in memory usage between 14:50 and 15:20 is caused by the rollout of v0.16.0. After that the memory usage is lower and stable.

@dhohengassner

We are facing the same issue. Glad this issue exists.
Downgrading to 0.16.0 stabilized our system as well 👍

Looking forward to a fix!

@jmichalek132
Contributor

Tried v0.19.0-rc.0 to see if the issue is still there; it looks like it is. (Upgraded from 0.16.0 to the rc at 14:30.)
Screenshot_2021-03-02 Thanos Receive - Grafana

@bwplotka
Member

bwplotka commented Mar 2, 2021

Thanks a lot!

Anyone can help, but I would love to find out what's wrong this week, ideally before 0.19.0; let's see.

We got a profile; let's see if it is helpful:

(pprof) top 10
Showing nodes accounting for 421.25GB, 96.30% of 437.43GB total
Dropped 572 nodes (cum <= 2.19GB)
Showing top 10 nodes out of 47
      flat  flat%   sum%        cum   cum%
  199.27GB 45.55% 45.55%   199.27GB 45.55%  github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal
   80.06GB 18.30% 63.86%    80.06GB 18.30%  github.com/golang/snappy.Decode
   75.49GB 17.26% 81.11%    77.47GB 17.71%  github.com/thanos-io/thanos/pkg/receive.(*Writer).Write
   25.18GB  5.76% 86.87%    25.18GB  5.76%  bytes.makeSlice
   18.15GB  4.15% 91.02%    18.64GB  4.26%  github.com/thanos-io/thanos/pkg/receive.(*Handler).forward
   18.07GB  4.13% 95.15%   217.34GB 49.69%  github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*WriteRequest).Unmarshal
    2.71GB  0.62% 95.77%     2.71GB  0.62%  github.com/prometheus/prometheus/tsdb/encoding.(*Decbuf).UvarintStr (inline)
    1.97GB  0.45% 96.22%     4.62GB  1.06%  github.com/prometheus/prometheus/tsdb/record.(*Decoder).Series
    0.33GB 0.076% 96.29%     2.87GB  0.66%  github.com/prometheus/prometheus/tsdb.(*Head).loadWAL
    0.03GB 0.0064% 96.30%     2.32GB  0.53%  github.com/prometheus/prometheus/tsdb.(*Head).loadWAL.func6

Ideally we pinpoint the commit which introduced the regression 🤗 The problem will be if that's the TSDB update (it probably is).

@jmichalek132
Contributor

Profiles captured via conprof:


github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal

/home/circleci/project/pkg/store/storepb/prompb/types.pb.go

  Total:     22.64TB    22.64TB (flat, cum) 46.00%

github.com/thanos-io/thanos/pkg/receive.(*Writer).Write

/home/circleci/project/pkg/receive/writer.go

  Total:      6.77TB     6.88TB (flat, cum) 13.98%

google.golang.org/grpc.(*parser).recvMsg

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go

  Total:      4.77TB     4.77TB (flat, cum)  9.70%

github.com/thanos-io/thanos/pkg/store/storepb.(*WriteRequest).Marshal

/home/circleci/project/pkg/store/storepb/rpc.pb.go

  Total:      4.77TB     4.77TB (flat, cum)  9.69%

github.com/golang/snappy.Decode

/home/circleci/go/pkg/mod/github.com/golang/snappy@v0.0.2/decode.go

  Total:      1.79TB     1.79TB (flat, cum)  3.63%

github.com/thanos-io/thanos/pkg/receive.(*Handler).forward

/home/circleci/project/pkg/receive/handler.go

  Total:      1.72TB     3.31TB (flat, cum)  6.72%

google.golang.org/grpc/internal/transport.(*http2Client).Write

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:      1.60TB     1.60TB (flat, cum)  3.25%

github.com/thanos-io/thanos/pkg/store/storepb.(*WriteRequest).Unmarshal

/home/circleci/project/pkg/store/storepb/rpc.pb.go

  Total:      1.18TB    17.34TB (flat, cum) 35.22%

internal/reflectlite.Swapper

/usr/local/go/src/internal/reflectlite/swapper.go

  Total:    937.76GB   937.76GB (flat, cum)  1.86%

github.com/thanos-io/thanos/pkg/receive.hash

/home/circleci/project/pkg/receive/hashring.go

  Total:    603.91GB     1.51TB (flat, cum)  3.06%

bytes.makeSlice

/usr/local/go/src/bytes/buffer.go

  Total:    459.36GB   459.36GB (flat, cum)  0.91%

github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*WriteRequest).Unmarshal

/home/circleci/project/pkg/store/storepb/prompb/remote.pb.go

  Total:    325.43GB     6.80TB (flat, cum) 13.82%

github.com/prometheus/prometheus/tsdb/record.(*Decoder).Series

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/record/record.go

  Total:    138.39GB   138.39GB (flat, cum)  0.27%

google.golang.org/grpc/internal/transport.(*decodeState).processHeaderField

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http_util.go

  Total:     87.70GB    87.70GB (flat, cum)  0.17%

google.golang.org/grpc/internal/transport.(*http2Client).newStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:     87.38GB   111.23GB (flat, cum)  0.22%

golang.org/x/net/http2.(*Framer).readMetaFrame.func1

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20201110031124-69a78807bb2b/http2/frame.go

  Total:     85.90GB    85.90GB (flat, cum)  0.17%

google.golang.org/grpc/internal/transport.(*http2Client).createHeaderFields

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:     85.67GB    95.15GB (flat, cum)  0.19%

context.WithValue

/usr/local/go/src/context/context.go

  Total:     85.18GB    85.18GB (flat, cum)  0.17%

github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/querier.go

  Total:     75.53GB    75.58GB (flat, cum)  0.15%

google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:     72.84GB   174.99GB (flat, cum)  0.35%

google.golang.org/grpc.newClientStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/stream.go

  Total:     62.68GB   332.85GB (flat, cum)  0.66%

github.com/go-kit/kit/log.With

/home/circleci/go/pkg/mod/github.com/go-kit/kit@v0.10.0/log/log.go

  Total:     46.79GB    46.79GB (flat, cum) 0.093%

github.com/prometheus/prometheus/tsdb/index.(*Writer).writePostingsToTmpFiles

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/index/index.go

  Total:     43.93GB    45.88GB (flat, cum) 0.091%

golang.org/x/net/http2.(*Framer).readMetaFrame

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20201110031124-69a78807bb2b/http2/frame.go

  Total:     40.48GB   127.44GB (flat, cum)  0.25%

google.golang.org/grpc/internal/transport.(*controlBuffer).executeAndPut

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/controlbuf.go

  Total:     36.12GB    39.32GB (flat, cum) 0.078%

context.(*cancelCtx).Done

/usr/local/go/src/context/context.go

  Total:     33.51GB    33.51GB (flat, cum) 0.066%

context.WithDeadline

/usr/local/go/src/context/context.go

  Total:     31.74GB    51.55GB (flat, cum)   0.1%

google.golang.org/grpc/internal/transport.newWriteQuota

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/flowcontrol.go

  Total:     28.73GB    28.73GB (flat, cum) 0.057%

github.com/thanos-io/thanos/pkg/receive.(*Handler).replicate

/home/circleci/project/pkg/receive/handler.go

  Total:     23.55GB    72.43GB (flat, cum)  0.14%

github.com/thanos-io/thanos/pkg/tracing.StartSpan

/home/circleci/project/pkg/tracing/tracing.go

  Total:     22.55GB    45.18GB (flat, cum)  0.09%

github.com/prometheus/prometheus/tsdb.(*Head).appender

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:     22.28GB    22.30GB (flat, cum) 0.044%

github.com/prometheus/prometheus/tsdb/index.(*MemPostings).addFor

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/index/postings.go

  Total:     20.98GB    20.98GB (flat, cum) 0.042%

github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next.func1

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/querier.go

  Total:     19.84GB    19.84GB (flat, cum) 0.039%

time.AfterFunc

/usr/local/go/src/time/sleep.go

  Total:     19.81GB    19.81GB (flat, cum) 0.039%

github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:     19.62GB    49.24GB (flat, cum) 0.098%

github.com/thanos-io/thanos/pkg/receive.(*Handler).fanoutForward

/home/circleci/project/pkg/receive/handler.go

  Total:     18.12GB   123.33GB (flat, cum)  0.24%

google.golang.org/grpc/internal/transport.(*http2Server).writeHeaderLocked

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:     17.57GB    20.73GB (flat, cum) 0.041%

google.golang.org/grpc/internal/transport.(*http2Server).WriteStatus

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:     17.49GB    37.89GB (flat, cum) 0.075%

github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.newClientSpanFromContext

/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/tracing/opentracing/client_interceptors.go

  Total:     17.46GB    36.64GB (flat, cum) 0.073%

github.com/prometheus/prometheus/tsdb/index.(*MemPostings).Delete

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/index/postings.go

  Total:     16.26GB    16.26GB (flat, cum) 0.032%

google.golang.org/grpc/internal/transport.(*http2Client).NewStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:     15.73GB   233.06GB (flat, cum)  0.46%

github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:     15.57GB    18.77GB (flat, cum) 0.037%

golang.org/x/net/http2.parseHeadersFrame

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20201110031124-69a78807bb2b/http2/frame.go

  Total:     14.25GB    14.25GB (flat, cum) 0.028%

google.golang.org/grpc/internal/transport.(*recvBuffer).put

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/transport.go

  Total:     13.98GB    13.98GB (flat, cum) 0.028%

google.golang.org/grpc.(*Server).processUnaryRPC

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go

  Total:     13.16GB    24.68TB (flat, cum) 50.12%

google.golang.org/grpc.(*clientStream).newAttemptLocked

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/stream.go

  Total:     12.72GB    12.72GB (flat, cum) 0.025%

github.com/grpc-ecosystem/go-grpc-prometheus.newClientReporter

/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/client_reporter.go

  Total:     12.72GB    12.72GB (flat, cum) 0.025%

github.com/grpc-ecosystem/go-grpc-prometheus.newServerReporter

/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_reporter.go

  Total:     12.67GB    12.67GB (flat, cum) 0.025%

google.golang.org/grpc.(*clientStream).SendMsg

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/stream.go

  Total:     12.66GB     6.39TB (flat, cum) 12.97%

github.com/prometheus/prometheus/tsdb/chunkenc.(*XORChunk).iterator

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/chunkenc/xor.go

  Total:     12.15GB    12.15GB (flat, cum) 0.024%


github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal

/home/circleci/project/pkg/store/storepb/prompb/types.pb.go

  Total:      2.11TB     2.11TB (flat, cum) 45.68%

github.com/thanos-io/thanos/pkg/receive.(*Writer).Write

/home/circleci/project/pkg/receive/writer.go

  Total:    644.39GB   655.40GB (flat, cum) 13.88%

github.com/thanos-io/thanos/pkg/store/storepb.(*WriteRequest).Marshal

/home/circleci/project/pkg/store/storepb/rpc.pb.go

  Total:    453.94GB   453.94GB (flat, cum)  9.61%

google.golang.org/grpc.(*parser).recvMsg

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go

  Total:    452.94GB   452.99GB (flat, cum)  9.59%

github.com/golang/snappy.Decode

/home/circleci/go/pkg/mod/github.com/golang/snappy@v0.0.2/decode.go

  Total:    172.04GB   172.04GB (flat, cum)  3.64%

github.com/thanos-io/thanos/pkg/receive.(*Handler).forward

/home/circleci/project/pkg/receive/handler.go

  Total:    163.47GB   314.08GB (flat, cum)  6.65%

google.golang.org/grpc/internal/transport.(*http2Client).Write

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:    151.62GB   151.90GB (flat, cum)  3.22%

github.com/thanos-io/thanos/pkg/store/storepb.(*WriteRequest).Unmarshal

/home/circleci/project/pkg/store/storepb/rpc.pb.go

  Total:    111.61GB     1.61TB (flat, cum) 34.83%

internal/reflectlite.Swapper

/usr/local/go/src/internal/reflectlite/swapper.go

  Total:     87.29GB    87.29GB (flat, cum)  1.85%

github.com/thanos-io/thanos/pkg/receive.hash

/home/circleci/project/pkg/receive/hashring.go

  Total:     55.46GB   142.74GB (flat, cum)  3.02%

bytes.makeSlice

/usr/local/go/src/bytes/buffer.go

  Total:     44.68GB    44.68GB (flat, cum)  0.95%

github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*WriteRequest).Unmarshal

/home/circleci/project/pkg/store/storepb/prompb/remote.pb.go

  Total:     30.47GB   654.21GB (flat, cum) 13.86%

github.com/prometheus/prometheus/tsdb/record.(*Decoder).Series

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/record/record.go

  Total:     19.57GB    19.57GB (flat, cum)  0.41%

github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:     15.57GB    18.77GB (flat, cum)   0.4%

google.golang.org/grpc/internal/transport.(*decodeState).processHeaderField

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http_util.go

  Total:      8.16GB     8.16GB (flat, cum)  0.17%

google.golang.org/grpc/internal/transport.(*http2Client).newStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:      8.02GB    10.23GB (flat, cum)  0.22%

google.golang.org/grpc/internal/transport.(*http2Client).createHeaderFields

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:      8.02GB     8.89GB (flat, cum)  0.19%

github.com/prometheus/prometheus/tsdb/index.(*MemPostings).addFor

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/index/postings.go

  Total:      7.96GB     7.96GB (flat, cum)  0.17%

golang.org/x/net/http2.(*Framer).readMetaFrame.func1

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20201110031124-69a78807bb2b/http2/frame.go

  Total:      7.93GB     7.93GB (flat, cum)  0.17%

context.WithValue

/usr/local/go/src/context/context.go

  Total:      7.93GB     7.93GB (flat, cum)  0.17%

google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:      6.74GB    16.29GB (flat, cum)  0.34%

github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/querier.go

  Total:         6GB     6.01GB (flat, cum)  0.13%

google.golang.org/grpc.newClientStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/stream.go

  Total:      5.82GB    30.78GB (flat, cum)  0.65%

github.com/go-kit/kit/log.With

/home/circleci/go/pkg/mod/github.com/go-kit/kit@v0.10.0/log/log.go

  Total:      4.26GB     4.26GB (flat, cum)  0.09%

golang.org/x/net/http2.(*Framer).readMetaFrame

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20201110031124-69a78807bb2b/http2/frame.go

  Total:      3.77GB    11.81GB (flat, cum)  0.25%

github.com/prometheus/prometheus/tsdb/index.(*Writer).writePostingsToTmpFiles

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/index/index.go

  Total:      3.47GB     3.64GB (flat, cum) 0.077%

google.golang.org/grpc/internal/transport.(*controlBuffer).executeAndPut

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/controlbuf.go

  Total:      3.15GB     3.45GB (flat, cum) 0.073%

context.(*cancelCtx).Done

/usr/local/go/src/context/context.go

  Total:      3.14GB     3.14GB (flat, cum) 0.066%

context.WithDeadline

/usr/local/go/src/context/context.go

  Total:      2.98GB     4.85GB (flat, cum)   0.1%

github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:      2.95GB     6.55GB (flat, cum)  0.14%

google.golang.org/grpc/internal/transport.newWriteQuota

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/flowcontrol.go

  Total:      2.69GB     2.69GB (flat, cum) 0.057%

github.com/prometheus/prometheus/tsdb.(*Head).appender

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:      2.50GB     2.50GB (flat, cum) 0.053%

github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:      2.49GB    12.46GB (flat, cum)  0.26%

github.com/prometheus/prometheus/tsdb/index.(*MemPostings).Delete

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/index/postings.go

  Total:      2.47GB     2.47GB (flat, cum) 0.052%

github.com/thanos-io/thanos/pkg/receive.(*Handler).replicate

/home/circleci/project/pkg/receive/handler.go

  Total:      2.26GB     6.79GB (flat, cum)  0.14%

github.com/thanos-io/thanos/pkg/tracing.StartSpan

/home/circleci/project/pkg/tracing/tracing.go

  Total:      2.07GB     4.19GB (flat, cum) 0.089%

time.AfterFunc

/usr/local/go/src/time/sleep.go

  Total:      1.86GB     1.87GB (flat, cum)  0.04%

github.com/prometheus/prometheus/tsdb/chunkenc.(*bstream).writeBits

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/chunkenc/bstream.go

  Total:      1.75GB     1.75GB (flat, cum) 0.037%

github.com/prometheus/prometheus/tsdb/chunkenc.(*XORChunk).iterator

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/chunkenc/xor.go

  Total:      1.73GB     1.73GB (flat, cum) 0.037%

github.com/thanos-io/thanos/pkg/receive.(*Handler).fanoutForward

/home/circleci/project/pkg/receive/handler.go

  Total:      1.68GB    11.44GB (flat, cum)  0.24%

google.golang.org/grpc/internal/transport.(*http2Server).WriteStatus

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:      1.64GB     3.55GB (flat, cum) 0.075%

google.golang.org/grpc/internal/transport.(*http2Server).writeHeaderLocked

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:      1.63GB     1.91GB (flat, cum)  0.04%

github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next.func1

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/querier.go

  Total:      1.60GB     1.60GB (flat, cum) 0.034%

github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.newClientSpanFromContext

/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/tracing/opentracing/client_interceptors.go

  Total:      1.56GB     3.32GB (flat, cum)  0.07%

google.golang.org/grpc/internal/transport.(*recvBuffer).put

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/transport.go

  Total:      1.44GB     1.44GB (flat, cum)  0.03%

google.golang.org/grpc/internal/transport.(*http2Client).NewStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:      1.40GB    21.53GB (flat, cum)  0.46%

github.com/prometheus/prometheus/tsdb.seriesHashmap.set

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20201119142752-3ad25a6dc3d9/tsdb/head.go

  Total:      1.35GB     1.35GB (flat, cum) 0.029%

golang.org/x/net/http2.parseHeadersFrame

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20201110031124-69a78807bb2b/http2/frame.go

  Total:      1.35GB     1.35GB (flat, cum) 0.029%

google.golang.org/grpc.(*Server).processUnaryRPC

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go

  Total:      1.23GB     2.28TB (flat, cum) 49.55%

google.golang.org/grpc.(*clientStream).newAttemptLocked

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/stream.go

  Total:      1.18GB     1.18GB (flat, cum) 0.025%

@jmichalek132
Contributor

jmichalek132 commented Mar 9, 2021

So yesterday I deployed new instances of Prometheus and Thanos receive for this test, with the same amount of traffic as before when I hit the leak. First with v0.17.2, then before midnight I downgraded to v0.16.0.
Screenshot_2021-03-09 Thanos Receive - Grafana
So it seems that the downgrade does lead to a significant reduction in memory usage.
But the memory profiles collected via conprof just before the downgrade are different compared to last time.
This time

github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal

isn't in the top 10.
Screenshot_2021-03-09 thanos alloc_space

Full profile before the downgrade:

compress/flate.NewWriter
/usr/local/go/src/compress/flate/deflate.go
  Total:      8.63GB    13.61GB (flat, cum) 56.32%
compress/flate.(*compressor).init
/usr/local/go/src/compress/flate/deflate.go
  Total:      4.99GB     4.99GB (flat, cum) 20.63%
runtime/pprof.StartCPUProfile
/usr/local/go/src/runtime/pprof/pprof.go
  Total:      1.30GB     1.30GB (flat, cum)  5.39%
runtime/pprof.allFrames
/usr/local/go/src/runtime/pprof/proto.go
  Total:   1007.67MB  1007.67MB (flat, cum)  4.07%
google.golang.org/grpc/internal/transport.newFramer
/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http_util.go
  Total:    757.67MB   762.17MB (flat, cum)  3.08%
github.com/prometheus/procfs.FS.Stat
/go/pkg/mod/github.com/prometheus/procfs@v0.1.3/stat.go
  Total:    696.48MB     1.98GB (flat, cum)  8.18%
runtime/pprof.(*profileBuilder).emitLocation
/usr/local/go/src/runtime/pprof/proto.go
  Total:    685.01MB     2.61GB (flat, cum) 10.81%
strings.Fields
/usr/local/go/src/strings/strings.go
  Total:    623.45MB   623.45MB (flat, cum)  2.52%
runtime/pprof.writeHeapInternal
/usr/local/go/src/runtime/pprof/pprof.go
  Total:    550.47MB     5.15GB (flat, cum) 21.29%
github.com/prometheus/client_golang/prometheus.processMetric
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:    519.56MB     1.41GB (flat, cum)  5.85%
bytes.makeSlice
/usr/local/go/src/bytes/buffer.go
  Total:    480.22MB   480.22MB (flat, cum)  1.94%
github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:    478.36MB     1.96GB (flat, cum)  8.12%
runtime/pprof.(*protobuf).strings
/usr/local/go/src/runtime/pprof/protobuf.go
  Total:    295.40MB   307.69MB (flat, cum)  1.24%
os.(*File).readdirnames
/usr/local/go/src/os/dir_unix.go
  Total:    288.42MB   288.42MB (flat, cum)  1.17%
github.com/prometheus/client_golang/prometheus.NewDesc
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/desc.go
  Total:    244.01MB   244.01MB (flat, cum)  0.99%
github.com/prometheus/client_golang/prometheus.(*histogram).Write
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/histogram.go
  Total:    230.01MB   230.01MB (flat, cum)  0.93%
github.com/prometheus/client_golang/prometheus.checkMetricConsistency
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:    207.01MB   207.01MB (flat, cum)  0.84%
bufio.(*Scanner).Scan
/usr/local/go/src/bufio/scan.go
  Total:    198.15MB   198.15MB (flat, cum)   0.8%
regexp.(*bitState).reset
/usr/local/go/src/regexp/backtrack.go
  Total:    187.23MB   187.23MB (flat, cum)  0.76%
github.com/prometheus/client_golang/prometheus.populateMetric
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/value.go
  Total:    141.51MB   141.51MB (flat, cum)  0.57%
runtime/debug.ReadGCStats
/usr/local/go/src/runtime/debug/garbage.go
  Total:    117.13MB   117.13MB (flat, cum)  0.47%
golang.org/x/net/trace.NewEventLog
/go/pkg/mod/golang.org/x/net@v0.0.0-20200822124328-c89045814202/trace/events.go
  Total:    114.69MB   114.69MB (flat, cum)  0.46%
github.com/prometheus/procfs.parseCPUStat
/go/pkg/mod/github.com/prometheus/procfs@v0.1.3/stat.go
  Total:    100.51MB   188.51MB (flat, cum)  0.76%
github.com/prometheus/client_golang/prometheus.(*wrappingCollector).Collect
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/wrap.go
  Total:     94.01MB    94.01MB (flat, cum)  0.38%
github.com/prometheus/client_golang/prometheus/internal.NormalizeMetricFamilies
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/internal/metric.go
  Total:     84.06MB    84.06MB (flat, cum)  0.34%
github.com/prometheus/client_golang/prometheus.(*wrappingMetric).Write
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/wrap.go
  Total:        79MB   177.51MB (flat, cum)  0.72%
compress/flate.(*huffmanEncoder).generate
/usr/local/go/src/compress/flate/huffman_code.go
  Total:     77.67MB    77.67MB (flat, cum)  0.31%
github.com/prometheus/client_golang/prometheus.(*goCollector).Collect
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/go_collector.go
  Total:     77.42MB   214.05MB (flat, cum)  0.86%
runtime/pprof.(*protobuf).varint
/usr/local/go/src/runtime/pprof/protobuf.go
  Total:     72.46MB    72.46MB (flat, cum)  0.29%
sync.(*Pool).pinSlow
/usr/local/go/src/sync/pool.go
  Total:     63.59MB    63.59MB (flat, cum)  0.26%
net/textproto.(*Reader).ReadMIMEHeader
/usr/local/go/src/net/textproto/reader.go
  Total:     57.51MB    57.51MB (flat, cum)  0.23%
regexp.(*Regexp).FindAllStringIndex.func1
/usr/local/go/src/regexp/regexp.go
  Total:     50.01MB    50.01MB (flat, cum)   0.2%
fmt.newScanState
/usr/local/go/src/fmt/scan.go
  Total:        48MB    55.01MB (flat, cum)  0.22%
net/http.newBufioWriterSize
/usr/local/go/src/net/http/server.go
  Total:     43.63MB    48.64MB (flat, cum)   0.2%
github.com/prometheus/prometheus/tsdb/index.NewFileWriter
/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200922180708-b0145884d381/tsdb/index/index.go
  Total:     36.01MB    36.01MB (flat, cum)  0.15%
fmt.(*ss).floatToken
/usr/local/go/src/fmt/scan.go
  Total:        34MB       34MB (flat, cum)  0.14%
runtime/pprof.writeHeapProto
/usr/local/go/src/runtime/pprof/protomem.go
  Total:     33.50MB     4.61GB (flat, cum) 19.07%
net/http.(*conn).readRequest
/usr/local/go/src/net/http/server.go
  Total:     32.50MB   203.08MB (flat, cum)  0.82%
net/http.readRequest
/usr/local/go/src/net/http/request.go
  Total:     31.01MB   120.03MB (flat, cum)  0.48%
net/http.newBufioReader
/usr/local/go/src/net/http/server.go
  Total:     28.61MB    32.11MB (flat, cum)  0.13%
github.com/prometheus/prometheus/tsdb/index.NewWriter
/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200922180708-b0145884d381/tsdb/index/index.go
  Total:     24.01MB    60.02MB (flat, cum)  0.24%
runtime/pprof.writeRuntimeProfile
/usr/local/go/src/runtime/pprof/pprof.go
  Total:     21.19MB     2.84GB (flat, cum) 11.74%
github.com/prometheus/client_golang/prometheus.NewConstMetric
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/value.go
  Total:        21MB       21MB (flat, cum) 0.085%
net/url.parse
/usr/local/go/src/net/url/url.go
  Total:     20.50MB    20.50MB (flat, cum) 0.083%
os.lstatNolog
/usr/local/go/src/os/stat_unix.go
  Total:        19MB       28MB (flat, cum)  0.11%
runtime/pprof.writeMutex
/usr/local/go/src/runtime/pprof/pprof.go
  Total:     18.75MB     1.41GB (flat, cum)  5.82%
github.com/prometheus/client_golang/prometheus.checkSuffixCollisions
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:     18.50MB    18.50MB (flat, cum) 0.075%
context.WithCancel
/usr/local/go/src/context/context.go
  Total:        18MB       38MB (flat, cum)  0.15%
github.com/prometheus/common/expfmt.glob..func1
/go/pkg/mod/github.com/prometheus/common@v0.13.0/expfmt/text_create.go
  Total:     17.57MB    17.57MB (flat, cum) 0.071%
syscall.anyToSockaddr
/usr/local/go/src/syscall/syscall_linux.go
  Total:        16MB       16MB (flat, cum) 0.065%

Full profile from this morning after the downgrade:

compress/flate.NewWriter
/usr/local/go/src/compress/flate/deflate.go
  Total:      9.66GB    15.18GB (flat, cum) 55.74%
compress/flate.(*compressor).init
/usr/local/go/src/compress/flate/deflate.go
  Total:      5.52GB     5.52GB (flat, cum) 20.25%
runtime/pprof.StartCPUProfile
/usr/local/go/src/runtime/pprof/pprof.go
  Total:      1.45GB     1.45GB (flat, cum)  5.34%
runtime/pprof.allFrames
/usr/local/go/src/runtime/pprof/proto.go
  Total:      1.20GB     1.20GB (flat, cum)  4.42%
google.golang.org/grpc/internal/transport.newFramer
/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http_util.go
  Total:    868.66MB   873.66MB (flat, cum)  3.13%
runtime/pprof.(*profileBuilder).emitLocation
/usr/local/go/src/runtime/pprof/proto.go
  Total:    806.57MB     3.04GB (flat, cum) 11.15%
github.com/prometheus/procfs.FS.Stat
/go/pkg/mod/github.com/prometheus/procfs@v0.1.3/stat.go
  Total:    795.99MB     2.19GB (flat, cum)  8.03%
strings.Fields
/usr/local/go/src/strings/strings.go
  Total:    691.46MB   691.46MB (flat, cum)  2.48%
runtime/pprof.writeHeapInternal
/usr/local/go/src/runtime/pprof/pprof.go
  Total:       648MB     5.94GB (flat, cum) 21.83%
github.com/prometheus/client_golang/prometheus.processMetric
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:    611.57MB     1.62GB (flat, cum)  5.94%
github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:    517.38MB     2.21GB (flat, cum)  8.10%
bytes.makeSlice
/usr/local/go/src/bytes/buffer.go
  Total:    494.39MB   494.39MB (flat, cum)  1.77%
runtime/pprof.(*protobuf).strings
/usr/local/go/src/runtime/pprof/protobuf.go
  Total:    348.78MB   365.20MB (flat, cum)  1.31%
os.(*File).readdirnames
/usr/local/go/src/os/dir_unix.go
  Total:    337.22MB   338.72MB (flat, cum)  1.21%
github.com/prometheus/client_golang/prometheus.(*histogram).Write
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/histogram.go
  Total:    271.52MB   271.52MB (flat, cum)  0.97%
github.com/prometheus/client_golang/prometheus.NewDesc
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/desc.go
  Total:    259.01MB   259.01MB (flat, cum)  0.93%
github.com/prometheus/client_golang/prometheus.checkMetricConsistency
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/registry.go
  Total:    241.59MB   241.59MB (flat, cum)  0.87%
bufio.(*Scanner).Scan
/usr/local/go/src/bufio/scan.go
  Total:    205.73MB   205.73MB (flat, cum)  0.74%
regexp.(*bitState).reset
/usr/local/go/src/regexp/backtrack.go
  Total:    190.90MB   190.90MB (flat, cum)  0.68%
github.com/prometheus/client_golang/prometheus.populateMetric
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/value.go
  Total:    157.51MB   157.51MB (flat, cum)  0.56%
runtime/debug.ReadGCStats
/usr/local/go/src/runtime/debug/garbage.go
  Total:    149.78MB   150.28MB (flat, cum)  0.54%
golang.org/x/net/trace.NewEventLog
/go/pkg/mod/golang.org/x/net@v0.0.0-20200822124328-c89045814202/trace/events.go
  Total:    137.33MB   137.33MB (flat, cum)  0.49%
github.com/prometheus/client_golang/prometheus.(*wrappingCollector).Collect
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/wrap.go
  Total:    109.01MB   109.01MB (flat, cum)  0.39%
github.com/prometheus/procfs.parseCPUStat
/go/pkg/mod/github.com/prometheus/procfs@v0.1.3/stat.go
  Total:    109.01MB   209.01MB (flat, cum)  0.75%
compress/flate.(*huffmanEncoder).generate
/usr/local/go/src/compress/flate/huffman_code.go
  Total:    100.72MB   100.72MB (flat, cum)  0.36%
runtime/pprof.(*protobuf).varint
/usr/local/go/src/runtime/pprof/protobuf.go
  Total:     94.15MB    94.15MB (flat, cum)  0.34%
github.com/prometheus/client_golang/prometheus/internal.NormalizeMetricFamilies
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/internal/metric.go
  Total:     87.57MB    87.57MB (flat, cum)  0.31%
github.com/prometheus/client_golang/prometheus.(*goCollector).Collect
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/go_collector.go
  Total:     87.47MB   260.75MB (flat, cum)  0.94%
github.com/prometheus/client_golang/prometheus.(*wrappingMetric).Write
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/wrap.go
  Total:     82.50MB   196.51MB (flat, cum)   0.7%
sync.(*Pool).pinSlow
/usr/local/go/src/sync/pool.go
  Total:     62.59MB    62.59MB (flat, cum)  0.22%
net/textproto.(*Reader).ReadMIMEHeader
/usr/local/go/src/net/textproto/reader.go
  Total:     62.51MB    63.02MB (flat, cum)  0.23%
github.com/prometheus/prometheus/tsdb/index.NewFileWriter
/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200922180708-b0145884d381/tsdb/index/index.go
  Total:     60.02MB    60.02MB (flat, cum)  0.22%
regexp.(*Regexp).FindAllStringIndex.func1
/usr/local/go/src/regexp/regexp.go
  Total:     49.51MB    49.51MB (flat, cum)  0.18%
fmt.newScanState
/usr/local/go/src/fmt/scan.go
  Total:     47.50MB    52.51MB (flat, cum)  0.19%
fmt.(*ss).floatToken
/usr/local/go/src/fmt/scan.go
  Total:     43.50MB    43.50MB (flat, cum)  0.16%
net/http.newBufioWriterSize
/usr/local/go/src/net/http/server.go
  Total:     40.13MB    47.14MB (flat, cum)  0.17%
github.com/prometheus/prometheus/tsdb/index.NewWriter
/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20200922180708-b0145884d381/tsdb/index/index.go
  Total:     40.01MB   100.03MB (flat, cum)  0.36%
net/http.(*conn).readRequest
/usr/local/go/src/net/http/server.go
  Total:     37.01MB   211.57MB (flat, cum)  0.76%
net/http.readRequest
/usr/local/go/src/net/http/request.go
  Total:     35.51MB   128.53MB (flat, cum)  0.46%
runtime/pprof.writeHeapProto
/usr/local/go/src/runtime/pprof/protomem.go
  Total:     35.50MB     5.31GB (flat, cum) 19.50%
net/http.newBufioReader
/usr/local/go/src/net/http/server.go
  Total:     34.13MB    37.64MB (flat, cum)  0.13%
github.com/prometheus/common/expfmt.glob..func1
/go/pkg/mod/github.com/prometheus/common@v0.13.0/expfmt/text_create.go
  Total:     27.11MB    27.11MB (flat, cum) 0.097%
runtime/pprof.writeRuntimeProfile
/usr/local/go/src/runtime/pprof/pprof.go
  Total:     24.71MB     3.16GB (flat, cum) 11.60%
github.com/prometheus/client_golang/prometheus.NewConstMetric
/go/pkg/mod/github.com/prometheus/client_golang@v1.7.1/prometheus/value.go
  Total:     24.50MB    24.50MB (flat, cum) 0.088%
runtime/pprof.writeMutex
/usr/local/go/src/runtime/pprof/pprof.go
  Total:     20.27MB     1.52GB (flat, cum)  5.57%
os.lstatNolog
/usr/local/go/src/os/stat_unix.go
  Total:        20MB    25.50MB (flat, cum) 0.091%
context.WithCancel
/usr/local/go/src/context/context.go
  Total:     19.50MB       43MB (flat, cum)  0.15%
net/url.parse
/usr/local/go/src/net/url/url.go
  Total:        17MB       17MB (flat, cum) 0.061%
net.(*netFD).accept
/usr/local/go/src/net/fd_unix.go
  Total:     16.50MB       40MB (flat, cum)  0.14%
syscall.ByteSliceFromString
/usr/local/go/src/syscall/syscall.go
  Total:        16MB       16MB (flat, cum) 0.057%

@bwplotka
Member

BTW, which metrics exactly are those memory graphs based on? (It matters, especially before Go 1.16.)

@jmichalek132
Contributor

Memory used is

container_memory_usage_bytes

from cAdvisor, and Memory Used based on Thanos metrics is:

go_memstats_heap_inuse_bytes

@bwplotka
Member

bwplotka commented Mar 10, 2021

container_memory_usage_bytes

is inflated, see https://www.bwplotka.dev/2019/golang-memory-monitoring/

Some great ideas from Cortex experience are to use the following options (adding these env vars to the Thanos process; see the sketch after this list):

  • GODEBUG=madvdontneed=1, which mimics what Go 1.16 will do by default (and what was done before Go 1.12). This improves the observability side of things.
  • GOGC=50 (default is 100). This makes the GC run more often. For memory-heavy containers we want that: we prefer latency and CPU cost over OOMs.
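
For completeness, a minimal sketch of what those knobs roughly map to in the runtime API, for Go processes you build yourself (the env vars are the right tool for Thanos itself; this is just illustrative):

package main

import "runtime/debug"

func main() {
	// Equivalent of GOGC=50: trigger a collection when the heap grows
	// 50% over the live set, i.e. roughly twice as often as default.
	debug.SetGCPercent(50)

	// There is no direct API equivalent of GODEBUG=madvdontneed=1, but
	// FreeOSMemory forces a GC and returns as much memory to the OS as
	// possible, which similarly makes RSS track the live heap closely.
	debug.FreeOSMemory()
}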

@bwplotka
Member

bwplotka commented Mar 11, 2021

While ingesting ~10 million series (replicated), 0.19.0-rc.1 actually looks better, not worse:

[memory usage graph]

@jmichalek132
Contributor

jmichalek132 commented Mar 11, 2021

Screenshot 2021-03-11 at 16 33 37
Tested versions in order:

  1. v0.17.2 -> profile
  2. v0.16.0 -> profile
  3. v0.19.0-rc.0 -> profile
  4. 1fff9a7 -> profile
  5. c534b6d -> profile
  6. b452888 -> profile
  7. 9875340 -> profile
  8. f494a99 -> profile
  9. v0.16.0 -> profile
  10. 2e12840 -> profile

@tomleb

tomleb commented Mar 11, 2021

While ingesting ~10 millions series (replicated) on 0.19.0-rc.1 looks actually better not worse:

Is this 10 million samples per scrape or total? 0.19.0-rc.1 also leaks for me with ~1 million samples per scrape. (Prometheus 2.22.2)

@jmichalek132
Contributor

I am not sure that the bugfix that was merged into the 0.16.0 release branch made it back into the main branch.

@metalmatze
Member

@jmichalek132 I think you're right. It was never merged back into v0.17+ but instead entirely replaced with the ZLabel.
https://github.com/thanos-io/thanos/commits/v0.17.0/pkg/store/labelpb/label.go

I'm wondering whether @bwplotka based that work on the fixed label or ignored it and did an entire rewrite.
Either way, with the benchmarks posted it shouldn't show up anymore; it seems rather unlikely. 🤔

@jmichalek132
Contributor

jmichalek132 commented Mar 11, 2021

Memory after deployment of v0.19.0-rc.1.

Screenshot_2021-03-11 Thanos Receive Copy - Grafana(1)
And the

github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal

popped up again in the profile.
Screenshot_2021-03-11 thanos alloc_space

TOP


github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*TimeSeries).Unmarshal

/home/circleci/project/pkg/store/storepb/prompb/types.pb.go

  Total:      1.95TB     1.95TB (flat, cum) 46.75%

github.com/thanos-io/thanos/pkg/receive.(*Writer).Write

/home/circleci/project/pkg/receive/writer.go

  Total:    616.01GB   626.44GB (flat, cum) 14.67%

google.golang.org/grpc.(*parser).recvMsg

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go

  Total:    453.55GB   453.57GB (flat, cum) 10.62%

github.com/thanos-io/thanos/pkg/store/storepb.(*WriteRequest).Marshal

/home/circleci/project/pkg/store/storepb/rpc.pb.go

  Total:    337.71GB   337.71GB (flat, cum)  7.91%

github.com/thanos-io/thanos/pkg/receive.(*Handler).forward

/home/circleci/project/pkg/receive/handler.go

  Total:    152.65GB   290.83GB (flat, cum)  6.81%

github.com/golang/snappy.Decode

/home/circleci/go/pkg/mod/github.com/golang/snappy@v0.0.3-0.20201103224600-674baa8c7fc3/decode.go

  Total:    126.02GB   126.02GB (flat, cum)  2.95%

github.com/thanos-io/thanos/pkg/store/storepb.(*WriteRequest).Unmarshal

/home/circleci/project/pkg/store/storepb/rpc.pb.go

  Total:    112.75GB     1.61TB (flat, cum) 38.66%

google.golang.org/grpc/internal/transport.(*http2Client).Write

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:    112.66GB   112.87GB (flat, cum)  2.64%

internal/reflectlite.Swapper

/usr/local/go/src/internal/reflectlite/swapper.go

  Total:     80.62GB    80.62GB (flat, cum)  1.89%

github.com/thanos-io/thanos/pkg/receive.hash

/home/circleci/project/pkg/receive/hashring.go

  Total:     49.27GB   129.89GB (flat, cum)  3.04%

bytes.makeSlice

/usr/local/go/src/bytes/buffer.go

  Total:     30.01GB    30.01GB (flat, cum)   0.7%

github.com/thanos-io/thanos/pkg/store/storepb/prompb.(*WriteRequest).Unmarshal

/home/circleci/project/pkg/store/storepb/prompb/remote.pb.go

  Total:     22.34GB   480.65GB (flat, cum) 11.26%

github.com/prometheus/prometheus/tsdb/record.(*Decoder).Series

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/record/record.go

  Total:     18.23GB    18.23GB (flat, cum)  0.43%

github.com/prometheus/prometheus/tsdb.(*Head).processWALSamples

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/head.go

  Total:     13.80GB    16.67GB (flat, cum)  0.39%

github.com/prometheus/prometheus/tsdb/index.(*MemPostings).addFor

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/index/postings.go

  Total:      8.98GB     8.98GB (flat, cum)  0.21%

golang.org/x/net/http2.(*Framer).readMetaFrame.func1

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20210119194325-5f4716e94777/http2/frame.go

  Total:      7.41GB     7.41GB (flat, cum)  0.17%

context.WithValue

/usr/local/go/src/context/context.go

  Total:      6.99GB     6.99GB (flat, cum)  0.16%

google.golang.org/grpc/internal/transport.(*decodeState).processHeaderField

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http_util.go

  Total:      6.94GB     6.94GB (flat, cum)  0.16%

google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:      6.81GB    16.11GB (flat, cum)  0.38%

google.golang.org/grpc/internal/transport.(*http2Client).newStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:      6.03GB     7.70GB (flat, cum)  0.18%

google.golang.org/grpc/internal/transport.(*http2Client).createHeaderFields

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_client.go

  Total:      6.01GB     6.67GB (flat, cum)  0.16%

github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/querier.go

  Total:      5.68GB     5.69GB (flat, cum)  0.13%

google.golang.org/grpc.newClientStream

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/stream.go

  Total:      4.29GB    23.07GB (flat, cum)  0.54%

github.com/go-kit/kit/log.With

/home/circleci/go/pkg/mod/github.com/go-kit/kit@v0.10.0/log/log.go

  Total:      3.96GB     3.96GB (flat, cum) 0.093%

github.com/prometheus/prometheus/tsdb/index.(*Writer).writePostingsToTmpFiles

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/index/index.go

  Total:      3.29GB     3.42GB (flat, cum)  0.08%

golang.org/x/net/http2.(*Framer).readMetaFrame

/home/circleci/go/pkg/mod/golang.org/x/net@v0.0.0-20210119194325-5f4716e94777/http2/frame.go

  Total:      3.13GB    10.63GB (flat, cum)  0.25%

google.golang.org/grpc/internal/transport.(*controlBuffer).executeAndPut

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/controlbuf.go

  Total:      3.07GB     3.29GB (flat, cum) 0.077%

context.(*cancelCtx).Done

/usr/local/go/src/context/context.go

  Total:      2.80GB     2.80GB (flat, cum) 0.066%

github.com/prometheus/prometheus/tsdb.(*memSeries).cutNewHeadChunk

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/head.go

  Total:      2.78GB     6.20GB (flat, cum)  0.15%

context.WithDeadline

/usr/local/go/src/context/context.go

  Total:      2.75GB     4.53GB (flat, cum)  0.11%

github.com/thanos-io/thanos/pkg/server/http/middleware.RequestID.func1

/home/circleci/project/pkg/server/http/middleware/request_id.go

  Total:      2.74GB   700.44GB (flat, cum) 16.40%

github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/head.go

  Total:      2.40GB    13.33GB (flat, cum)  0.31%

google.golang.org/grpc/internal/transport.newWriteQuota

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/flowcontrol.go

  Total:      2.37GB     2.37GB (flat, cum) 0.056%

github.com/prometheus/prometheus/tsdb/index.(*MemPostings).Delete

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/index/postings.go

  Total:      2.32GB     2.32GB (flat, cum) 0.054%

github.com/thanos-io/thanos/pkg/tracing.StartSpan

/home/circleci/project/pkg/tracing/tracing.go

  Total:      1.95GB     3.86GB (flat, cum)  0.09%

github.com/oklog/ulid.Monotonic

/home/circleci/go/pkg/mod/github.com/oklog/ulid@v1.3.1/ulid.go

  Total:      1.93GB     1.93GB (flat, cum) 0.045%

time.AfterFunc

/usr/local/go/src/time/sleep.go

  Total:      1.77GB     1.77GB (flat, cum) 0.042%

github.com/prometheus/prometheus/tsdb/chunkenc.(*XORChunk).iterator

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/chunkenc/xor.go

  Total:      1.68GB     1.68GB (flat, cum) 0.039%

google.golang.org/grpc/internal/transport.(*http2Server).writeHeaderLocked

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:      1.65GB     1.93GB (flat, cum) 0.045%

github.com/thanos-io/thanos/pkg/receive.(*Handler).replicate

/home/circleci/project/pkg/receive/handler.go

  Total:      1.62GB     4.83GB (flat, cum)  0.11%

google.golang.org/grpc/internal/transport.(*http2Server).WriteStatus

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go

  Total:      1.62GB     3.59GB (flat, cum) 0.084%

github.com/thanos-io/thanos/pkg/receive.(*Handler).fanoutForward

/home/circleci/project/pkg/receive/handler.go

  Total:      1.57GB    10.58GB (flat, cum)  0.25%

github.com/prometheus/prometheus/tsdb.(*blockBaseSeriesSet).Next.func1

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/querier.go

  Total:      1.54GB     1.54GB (flat, cum) 0.036%

github.com/minio/minio-go/v7.Client.putObjectMultipartStreamFromReadAt

/home/circleci/go/pkg/mod/github.com/minio/minio-go/v7@v7.0.10/api-put-object-streaming.go

  Total:      1.50GB     1.50GB (flat, cum) 0.035%

github.com/prometheus/prometheus/tsdb.(*Head).appender

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/head.go

  Total:      1.45GB     1.45GB (flat, cum) 0.034%

github.com/prometheus/prometheus/tsdb/chunkenc.(*bstream).writeBits

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/chunkenc/bstream.go

  Total:      1.43GB     1.43GB (flat, cum) 0.033%

github.com/prometheus/prometheus/tsdb.seriesHashmap.set

/home/circleci/go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20210215121130-6f488061dfb4/tsdb/head.go

  Total:      1.29GB     1.29GB (flat, cum)  0.03%

github.com/grpc-ecosystem/go-grpc-middleware/tracing/opentracing.newClientSpanFromContext

/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/tracing/opentracing/client_interceptors.go

  Total:      1.20GB     2.50GB (flat, cum) 0.059%

google.golang.org/grpc.(*Server).processUnaryRPC

/home/circleci/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go

  Total:      1.19GB     2.30TB (flat, cum) 55.06%

github.com/grpc-ecosystem/go-grpc-prometheus.newServerReporter

/home/circleci/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_reporter.go

  Total:      1.19GB     1.19GB (flat, cum) 0.028%

@bwplotka
Member

What's the latest? 🤗

@bwplotka
Member

We discussed some of this today's Contributor Hours: https://docs.google.com/document/d/137XnxfOT2p1NcNUq6NWZjwmtlSdA6Wyti86Pd6cyQhs/edit#heading=h.dmnpchivqkn9 I will try to look on this a bit.

@bwplotka
Member

Investigation: I found that #3327 is on v0.16.0 but not on master. I think the merge back from v0.16.0 failed.

The long-term fix was merged, but it looks like not properly, as this issue is exactly the same as #3265.

Long-term fix attempt no. 1: #3279
Long-term "fix" merged: #3330, which might not actually be fixing this. Let me unpack this (:

@bwplotka
Member

Thank you all for your patience and help. It's kind of silly, but the fix was already in a PR that was never merged: #3334

The up-to-date fix is available here: #3943 and will be part of v0.19.0 🚀

Some learnings for when we investigate such issues:

  • Find the regression commit. I found that things were working fine for @jmichalek132 only after this commit:

[screenshot of the commit]

So it was easy to tell that this "quick fix" never made it to master 🤗

So it was as easy as porting #3334 (stripped of unrelated changes) and ensuring we have profiles that back up our thinking, e.g. these:

Before fix: https://share.polarsignals.com/68255aa/
After fix: https://share.polarsignals.com/fbffd26/

Notice the big 200MB chunk that does not exist anymore 🤗
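
To make the mechanism concrete: a minimal, self-contained sketch of the retention pattern behind the leak, assuming the zero-copy label behaviour described in #3265 (this is not the actual Thanos code; in Go, any small slice or string that aliases a large buffer keeps the whole backing array reachable):

package main

import "fmt"

// leakyLabel aliases the incoming request buffer: the returned slice
// shares its backing array, so the GC must keep the entire buffer
// alive for as long as the "label" lives.
func leakyLabel(req []byte) []byte {
	return req[:10]
}

// copiedLabel detaches from the buffer: after the copy, the request
// buffer becomes collectible as soon as the caller drops it. Copying
// labels before they become long-lived is what the ported fix restores.
func copiedLabel(req []byte) []byte {
	l := make([]byte, 10)
	copy(l, req)
	return l
}

func main() {
	req := make([]byte, 8<<20) // an 8 MiB incoming write request
	a := leakyLabel(req)       // pins all 8 MiB
	b := copiedLabel(req)      // pins only 10 bytes
	fmt.Println(len(a), len(b))
}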

@luizrojo
Contributor

Awesome news, @bwplotka!

Looking forward to v0.19.

@mxmorin

mxmorin commented Mar 25, 2021

I've tested 0.19-rc2 and the issue is fixed.
Many thanks!

@bwplotka
Member

Done then. I also see a lot of capacity to reduce resource usage after v0.19.0. Let's release it and iterate. Thanks all for the help.
