Memory leak while using kubernetes_logs source #6673

Closed
karlmartink opened this issue Mar 9, 2021 · 26 comments
Labels
domain: performance Anything related to Vector's performance source: kubernetes_logs Anything `kubernetes_logs` source related type: bug A code related bug.

Comments

@karlmartink

Vector Version

0.11.1-alpine

running on Kubernetes

v1.19.3

Vector Configuration File

    [sources.kube_logs_1]
      type = "kubernetes_logs"
      exclude_paths_glob_patterns = ["**/calico-node/**"]
      annotation_fields.container_name = "container_name"
      annotation_fields.container_image = "container_image"
      annotation_fields.pod_ip = "pod_ip"
      annotation_fields.pod_name = "pod_name"
      annotation_fields.pod_namespace = "namespace_name"
      annotation_fields.pod_node_name = "pod_node_name"
      annotation_fields.pod_uid = "pod_uid"

    [transforms.kube_logs_remapped]
      type = "remap"
      inputs = ["kube_logs_1"]
      source = '''
      .full_message = .message
      .message = "pod_stderr_stdout"
      .source = .pod_node_name
      del(.file)
      del(.pod_node_name)
      del(.kubernetes)
      '''
    [transforms.add_region_fields_to_logs]
      type = "add_fields"
      inputs = ["vector_1", "kube_logs_remapped"]
      fields.env = "<%= @environment %>"
      fields.region = "<%= @region %>"
      fields.region_domain = "<%= @region_domain %>"

    [sinks.graylog_gelf]
      type = "http"
      inputs = ["add_region_fields_to_logs"]
      uri = xxx
      encoding.codec = "ndjson"
      compression = "none"
      batch.max_bytes = 4096
      #batch.timeout_secs = 1
      #buffer.max_events = 1
      #buffer.type = "memory"
      tls.verify_hostname = false

Debug Output

https://gist.github.com/karlmartink/095979bec3dea7d91430c91a842d3927

Expected Behavior

Stable memory usage, or a slight increase until some maximum level is reached.

Actual Behavior

Container memory and CPU usage grow slowly over time until the maximum limits are reached and the service is killed, while the number of processed events stays the same.

Example Data

POD CPU and Memory usage
Screenshot 2021-03-09 at 15 14 19

Processed events
Screenshot 2021-03-09 at 15 14 28

Additional Context

I am using Vector mostly to collect Kubernetes logs and manipulate them with some transforms. After that they are forwarded to Graylog via HTTP to a GELF input.

@karlmartink karlmartink added the type: bug A code related bug. label Mar 9, 2021
@binarylogic binarylogic added domain: performance Anything related to Vector's performance source: kubernetes_logs Anything `kubernetes_logs` source related labels Mar 9, 2021
@karlmartink
Author

karlmartink commented Mar 17, 2021

Meanwhile I updated to Vector version 0.12-debian and the memory leak stayed the same.
I split the Vector deployment into two separate services:

  • vector-kube-collector - collects logs from Kubernetes and forwards them to vector-aggregator.
  • vector-aggregator - receives logs from kube-collector and forwards them to Graylog via HTTP.

The discovery I made is that when I disabled the HTTP sink and replaced it with a blackhole sink, the memory leak stopped.
So it seems the culprit of the leak is the HTTP sink.

Running with Blackhole sink
Screenshot 2021-03-17 at 13 43 48

Running with HTTP sink
Screenshot 2021-03-17 at 09 00 14

vector-kube-collector configuration file

# Read Kubernetes logs from files
[sources.kube_logs_1]
  type = "kubernetes_logs"
  exclude_paths_glob_patterns = ["**/calico-node/**"]
  annotation_fields.container_name = "container_name"
  annotation_fields.container_image = "container_image"
  annotation_fields.pod_ip = "pod_ip"
  annotation_fields.pod_name = "pod_name"
  annotation_fields.pod_namespace = "namespace_name"
  annotation_fields.pod_node_name = "pod_node_name"
  annotation_fields.pod_uid = "pod_uid"

# Transform Kube logs and remove unused fields
[transforms.kube_logs_remapped]
  type = "remap"
  inputs = ["kube_logs_1"]
  source = '''
  .full_message = .message
  .message = "pod_stderr_stdout"
  .source = .pod_node_name
  del(.file)
  del(.pod_node_name)
  del(.kubernetes)
  '''

# Forward logs to vector-aggregator service
[sinks.vector_aggr]
  type = "vector"
  inputs = ["kube_logs_remapped"]
  address = "vector-aggregator.monitoring.svc.cluster.local:31014"`

vector-aggregator configuration file

# Receive logs from Vector agents
[sources.vector_1]
  type = "vector"
  address = "${POD_IP}:31014"

# Add environment, region and region_domain fields
[transforms.add_region_fields_to_logs]
  type = "add_fields"
  inputs = ["vector_1"]
  fields.env = "<%= @environment %>"
  fields.region = "<%= @region %>"
  fields.region_domain = "<%= @region_domain %>"

# Sink logs into a blackhole (stands in for the Graylog HTTP sink in this test)
[sinks.blackhole]
  type = "blackhole"
  inputs = ["add_region_fields_to_logs"]

@blt
Contributor

blt commented Mar 17, 2021

I haven't had a chance to triage, but this does strike me as relevant to the conversation in #6781.

@karlmartink
Author

As this memory leak issue is the last blocker for deploying Vector to my production environment, I continued experimenting with different versions and images. All graphs represent "Average pod memory usage".

0.12-debian + HTTP sink
Screenshot 2021-03-18 at 11 02 18

0.12-debian + Blackhole sink
Screenshot 2021-03-18 at 11 03 19

0.12.1-distroless-libc + HTTP sink
Screenshot 2021-03-18 at 11 04 14

@blt if there is any combination that you would like me to test then let me know.

I will continue experimenting with HTTP sink buffer settings and see if they have any effect on the memory leak.

@karlmartink
Author

Some further testing revealed that enabling Prometheus metrics with the following configuration made things a lot worse... this may be related to #6781.

[sources.internal_metrics]
  type = "internal_metrics"

[sinks.prom_metrics]
  type = "prometheus_exporter"
  inputs = ["internal_metrics"]
  address = "${POD_IP}:8080"
  default_namespace = "vector"

0.12.1-distroless-libc + HTTP sink + Prometheus exporter
Screenshot 2021-03-19 at 09 58 43

@KHiis

KHiis commented Mar 19, 2021

+1 on the issue. Is there any known workaround or anything we could look forward to?

@binarylogic
Contributor

@KHiis we are actively investigating and will report back as we learn. We're hoping to get to the root cause next week.

@blt
Contributor

blt commented Mar 19, 2021

@karlmartink @KHiis apologies for the delay in getting back to you, and thank you for the detailed issue. The fact that sinking into the blackhole does not demonstrate this issue suggests to me that Vector is buffering as it backs up into the HTTP sink, but the growth when you enable Prometheus surprises me. @karlmartink out of curiosity, if you enable Prometheus and sink into the blackhole, does the issue still persist?

Also, if you have the prometheus sink switched on I would be very curious to know about the following bits of telemetry for each step in your config:

  • events_in_total
  • events_out_total
  • processed_bytes_total

If one of the steps is indeed backing up and causing buffering, I'd expect this triplet to fall off steeply for one of them. We are also aware of #6305, where Vector will hold file descriptors open if the kubernetes_logs source is unable to drain quickly enough. Do you have a measure of your open FDs? We have also, in 0.12, introduced debug logging that shows which steps in the config are under high load. If you enable debug logging -- -vvv -- can you send me log output with lines that match utilization? The closer a component is to 1, the busier it is, so that might give us some clue as well.
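For reference, the utilization value from that debug logging can be read as the fraction of wall-clock time a component spends busy over each observation window, so values near 1 mean the component is saturated. A minimal conceptual sketch of that ratio, as an illustration of the idea only and not Vector's actual implementation:

use std::time::Duration;

// Conceptual sketch: utilization as the fraction of wall-clock time a
// component spent doing work within an observation window.
fn utilization(busy: Duration, window: Duration) -> f64 {
    busy.as_secs_f64() / window.as_secs_f64()
}

fn main() {
    // Busy for ~4.27s of a 5s window: close to saturated (~0.85).
    println!("{:.4}", utilization(Duration::from_millis(4268), Duration::from_secs(5)));
    // Busy for 5ms of the same window: nearly idle (0.001).
    println!("{:.4}", utilization(Duration::from_millis(5), Duration::from_secs(5)));
}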

@karlmartink
Author

@blt here is some more telemetry. I grouped the graphs by "component_type" to visualise them accurately, as I have 3 pods running for vector-aggregator.

0.12.1-distroless-libc + HTTP sink + Prometheus exporter
Screenshot 2021-03-22 at 13 35 18

CPU usage graph for the same period
Screenshot 2021-03-22 at 14 16 26

Some utilization info from debug output:

Mar 22 12:34:05.906 DEBUG transform{component_kind="transform" component_name=add_region_fields_to_logs component_type=add_fields}: vector::utilization: utilization=0.8536889391132572
Mar 22 12:34:05.907 TRACE vector: Beep.
Mar 22 12:34:05.907 DEBUG sink{component_kind="sink" component_name=blackhole component_type=blackhole}: vector::utilization: utilization=0.007456749021618303
Mar 22 12:34:10.907 DEBUG transform{component_kind="transform" component_name=add_region_fields_to_logs component_type=add_fields}: vector::utilization: utilization=0.08562899173450227
Mar 22 12:34:10.907 DEBUG sink{component_kind="sink" component_name=blackhole component_type=blackhole}: vector::utilization: utilization=0.0010800648283539548
Mar 22 12:34:10.907 TRACE vector: Beep.
Mar 22 12:34:20.907 DEBUG transform{component_kind="transform" component_name=add_region_fields_to_logs component_type=add_fields}: vector::utilization: utilization=0.00107175880550649
Mar 22 12:34:20.907 DEBUG sink{component_kind="sink" component_name=blackhole component_type=blackhole}: vector::utilization: utilization=0.0003146771524068778
Mar 22 12:34:25.906 DEBUG transform{component_kind="transform" component_name=add_region_fields_to_logs component_type=add_fields}: vector::utilization: utilization=0.0003256409642432362
Mar 22 12:34:25.908 TRACE vector: Beep.
Mar 22 12:34:25.908 DEBUG sink{component_kind="sink" component_name=blackhole component_type=blackhole}: vector::utilization: utilization=0.00027920071659492116

I enabled prometheus exporter to blackhole sink. Will post results soon.

@karlmartink
Author

karlmartink commented Mar 23, 2021

0.12.1-distroless-libc
"internal_metrics" enabled
Kubernetes logs received via Vector source
Blackhole sink

Screenshot 2021-03-23 at 08 37 10
Screenshot 2021-03-23 at 08 37 16

As can be seen, sinking into the blackhole again does not demonstrate the issue.

Unfortunately I don't have an open FD metric to provide. But as the graphs show, Kubernetes log collection works without any problems.

@blt
Contributor

blt commented Mar 23, 2021

@karlmartink this is great, thank you. @eeyun it strikes me that this might actually be related to the work you're doing these days.

@karlmartink
Author

karlmartink commented Mar 24, 2021

I really hope this fixes the issue.
Ready to test whenever it is merged.
Meanwhile, if there is anything else I can test, let me know.

@karlmartink
Author

@blt Any progress on this issue or any other metrics I can provide?

@blt
Contributor

blt commented Mar 29, 2021

@karlmartink this has been a confounding one. I don't have any news yet, but it's still being worked on.

@blt
Contributor

blt commented Apr 2, 2021

Alright. I have been able to reproduce the issue in a very minimal setup. Here's my vector config:

data_dir = "/tmp"

[sources.stdin]
type = "stdin"

[sinks.http]
type = "http"
inputs = ["stdin"]
compression = "none"
encoding.codec = "json"
healthcheck.enabled = false
batch.max_bytes = 128
uri = "http://127.0.0.1:8080/"

Which I sink into a small local HTTP server, though any web server will do. My run script is:

#!/usr/bin/env bash

set -o errexit
set -o pipefail
set -o nounset
# set -o xtrace

RUSTFLAGS="-g" cargo build --release --no-default-features --features "sources-stdin,sinks-http"

yes "msg" | ./target/release/vector --config vector-tohttp.toml

I can observe that Vector does gradually increase its memory by a few bytes here and there, which does seem roughly correlated to the number of outgoing requests Vector makes, which is why I have the batch size set so low in my config. Unfortunately there's not a direct correlation, and the accumulation process does take a while, as your graphs show, so we do apologize for the slow progress here. We've just finished an upgrade of Vector's tokio which, as @lukesteensen pointed out to me, should resolve some known sources of fragmentation. I'll be running Vector with the above config and setup under valgrind; this profile should be clearer now, but it will take some time to get results.

@blt
Contributor

blt commented Apr 3, 2021

The upgrade to tokio 1.0 has really paid off. @jszwedko, relevant to our recent work: if you pull this massif dump open you'll see it records Vector at 1.5 GB of heap, and the majority of that is in the tracing subsystem. Massif out:

massif.out.zip

Screenshot:

Screenshot from 2021-04-02 17-31-26

@karlmartink we have an undocumented environment variable, DISABLE_INTERNAL_METRICS_TRACING_INTEGRATION, which, if you set it to "true" in your environment, will disable the code paths that lead into this area. I'm going to re-run my massif setup with this set and report back, but just FYI.

@karlmartink
Author

@blt I appreciate the feedback.

Had some time to do some testing, but unfortunately I did not see any improvement on the memory usage side with the DISABLE_INTERNAL_METRICS_TRACING_INTEGRATION flag.

0.12.2-distroless-libc
HTTP sink
No prometheus exporter
DISABLE_INTERNAL_METRICS_TRACING_INTEGRATION = true
~6-hour timespan

Screenshot 2021-04-05 at 19 42 28

@blt
Contributor

blt commented Apr 5, 2021

@karlmartink Alright, good news. If you pop open this massif dump there's a very clear indication that Vector is putting a lot of memory into metrics-rs histograms. This is almost surely an unfortunate consequence of the bucket sizes chosen by metrics-rs and the granularity of the data we're pushing into them.

I hope to have a fix here today or tomorrow.

massif.out.1324348.zip

@blt
Contributor

blt commented Apr 5, 2021

Related to #6461 and #6456.

@blt
Contributor

blt commented Apr 5, 2021

Ah, yep, I believe I see the problem. We bridge our internal and metrics-rs metrics here:

https://github.com/timberio/vector/blob/267265a739bfd33f8b45d2fbcbf8155571bd7524/src/metrics/mod.rs#L118-L126

The important bit is from_metric_kv:

https://github.com/timberio/vector/blob/267265a739bfd33f8b45d2fbcbf8155571bd7524/src/event/metric.rs#L292-L316

We're calling read_histogram here, which leaves the underlying samples in place -- the metrics-rs histogram is a linked list of fixed-size blocks of samples that grows as you add more samples -- whereas metrics-rs's own exporters call read_histogram_with_clear, a function that frees the histogram's internal storage. Experimentation shows this doesn't quite do the trick, but it does help some. We're behind current metrics-rs and may be hitting a bug that upstream has fixed.
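To make the difference concrete, here is a small self-contained Rust sketch of the two read strategies described above. A plain Vec stands in for the metrics-rs linked list of sample blocks; this illustrates the idea and is not the metrics-rs API itself:

// Conceptual sketch (not the metrics-rs API): reading samples without
// draining them lets storage grow without bound, while a draining read
// releases it.
struct SampleHistogram {
    samples: Vec<f64>,
}

impl SampleHistogram {
    fn new() -> Self {
        Self { samples: Vec::new() }
    }

    fn record(&mut self, value: f64) {
        self.samples.push(value);
    }

    // Analogue of the non-draining read described above: observe the samples
    // but leave them in place, so storage keeps growing as values arrive.
    fn read(&self) -> &[f64] {
        &self.samples
    }

    // Analogue of the clearing read described above: hand the samples to the
    // exporter and release the underlying storage.
    fn read_and_clear(&mut self) -> Vec<f64> {
        std::mem::take(&mut self.samples)
    }
}

fn main() {
    let mut h = SampleHistogram::new();
    for i in 0..10_000 {
        h.record(i as f64);
    }
    let seen = h.read().len();              // 10_000 samples still held
    let drained = h.read_and_clear().len(); // 10_000 samples handed off, storage freed
    println!("seen={seen} drained={drained} remaining={}", h.read().len());
}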

@karlmartink
Author

It seems you have made solid progress on the issue, which I am really glad about.
If there is anything else I can test from my side, I am happy to help out.

blt added a commit that referenced this issue Apr 8, 2021
This commit introduces our own metrics-rs Handle implementation, one that takes constant space per metric type. This significantly reduces our happy-path memory use under load and should resolve the memory leaks noted in the closing issues. The leak culprit is the metrics-rs Histogram combined with tracing integration. Their implementation relies on an atomic linked list of fixed-size blocks to store samples. The storage required by this method grows until a quiescent period is found, a period that happens regularly when metrics-rs is not integrated with tracing and does not happen at present for reasons unknown. However, in our testing we've seen that a single histogram under heavy load can consume approximately 200 MB before being garbage collected, a burden Vector doesn't need to bear, as our own histogram needs are served by a simple counting-bucket arrangement. Our own histogram uses a fixed, boxed slice of atomic ints as buckets. This is the shape of the data that Vector was exposing already, and we remove a little internal translation.

I believe this PR will significantly reduce the memory use of a loaded Vector, leak notwithstanding.

REF #6673
Closes #6461 #6456

Signed-off-by: Brian L. Troutwine brian@troutwine.us
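
A minimal Rust sketch of the constant-space bucket arrangement the commit message describes: a fixed, boxed slice of atomic counters, one per bucket. The bucket bounds below are made up for illustration and are not the bounds Vector ships with:

use std::sync::atomic::{AtomicU64, Ordering};

struct BucketedHistogram {
    bounds: Box<[f64]>,       // upper bound of each bucket
    counts: Box<[AtomicU64]>, // one counter per bucket, plus an overflow bucket
}

impl BucketedHistogram {
    fn new(bounds: Vec<f64>) -> Self {
        let counts = (0..=bounds.len())
            .map(|_| AtomicU64::new(0))
            .collect::<Vec<_>>()
            .into_boxed_slice();
        Self {
            bounds: bounds.into_boxed_slice(),
            counts,
        }
    }

    // Recording a sample bumps exactly one counter, so memory use stays
    // constant no matter how many samples are recorded.
    fn record(&self, value: f64) {
        let idx = self
            .bounds
            .iter()
            .position(|&b| value <= b)
            .unwrap_or(self.bounds.len()); // overflow bucket
        self.counts[idx].fetch_add(1, Ordering::Relaxed);
    }

    // An exporter reads a snapshot of the counters; nothing has to be
    // cleared or garbage collected afterwards.
    fn snapshot(&self) -> Vec<u64> {
        self.counts.iter().map(|c| c.load(Ordering::Relaxed)).collect()
    }
}

fn main() {
    let h = BucketedHistogram::new(vec![0.001, 0.01, 0.1, 1.0, 10.0]);
    for i in 0..1_000_000u64 {
        h.record((i % 100) as f64 / 50.0); // values in [0, 2)
    }
    println!("{:?}", h.snapshot()); // footprint is the same as after one sample
}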
@blt
Contributor

blt commented Apr 8, 2021

@karlmartink well, good news. I am reasonably certain that #7014 will address a good portion, if not the whole, of your problem. Our nightly build job starts at 4 AM UTC, so by 6 AM UTC there ought to be a build available with #7014 included. When you get a chance, can you try out the nightly and let us know how it goes?

@karlmartink
Author

@blt that's awesome!
I deployed the new nightly version and will monitor the resource usage over the weekend. Will report back soon.

@karlmartink
Author

It seems that the issue is fixed! @blt thank you for the support.

Screenshot 2021-04-12 at 09 12 28

@jszwedko
Member

jszwedko commented Apr 12, 2021

Nice! This seems to have fixed a large portion of the unbounded memory growth. However, looking at those new graphs it seems like there is still steady memory growth, just much slower.

@blt
Contributor

blt commented Apr 12, 2021

@karlmartink excellent news! If you're satisfied I'll close this ticket out then.

@jszwedko yes, agreed. I think some of that is explicable by allocator fragmentation but I'll do a long massif run to get more concrete details, at least for the minimal configs we used to diagnose the more serious leak here.

@karlmartink
Author

This can be closed from my side.
If there is anything else I can help with testing in the future I am happy to do so.
