[Question] Backpressure propagation poses a risk to all components (anti-pattern, cascading failure: a single component's failure leads to failure of all components) #19938

@slavanovak

Description

Question: Can you please confirm that it is expected behavior that, when the buffer of one sink is full, buffering starts to happen in all buffers in the chain until the buffer of the related source becomes full and the source starts to slow down?

(Note: the buffering model is explained at https://vector.dev/docs/about/under-the-hood/architecture/buffering-model/)

As "source" can have multiple "sinks" then single "sink" saturation (buffer is full) leads to other "sink"s to be impacted as "source" is slowed down on ingestion of events.

Notes:

  • Timestamps are not exact and may differ by 10-30 seconds in the screenshots below

Evidence

[screenshot]

Network failure (dropping packets) was simulated with https://github.com/ModusCreateOrg/slow

root@minikube:~# ./slow -p 10%
command=slow
bandwidth=100kbps
latency=
Changing existing queuing discipline
root@minikube:~# date
Fri Feb 23 13:47:04 UTC 2024
root@minikube:~# ./slow reset
resetting queueing discipline
root@minikube:~# date
Fri Feb 23 13:55:36 UTC 2024
root@minikube:~# 
Timeline
  • From before the test until 13:47:10 UTC: no packet drops; rate: 12K events per minute
  • From 13:47:10 UTC to 13:55:36 UTC: dropping packets at a 10% rate using "slow"; the event rate is not consistent
  • From 13:55:36 UTC onward: no packet drops; rate: 12K events per minute
Source (my_source_id) as "demo_logs"

Configuration for 12K events per minute:

...
    my_source_id:
      type: demo_logs
      format: json
      interval: 0.005
...
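(For reference, the arithmetic behind the 12K figure: an interval of 0.005 s means 1 / 0.005 = 200 events per second, i.e. 200 × 60 = 12,000 events per minute.)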
[screenshot]
Humio_1 (humio_kubernetes_logs_output_mirror)

Humio_1 (humio_kubernetes_logs_output_mirror) did not have a full buffer.

[screenshots]

Humio query:

 minikube
| "tags.component_id" = "humio_kubernetes_logs_output_mirror" | gauge
| name != "buffer_max_event_size"
| name != utilization
| name != "buffer_byte_size"
Humio_2 (humio_kubernetes_logs_output)

Humio_2 (humio_kubernetes_logs_output) had a full buffer.

[screenshots]

Humio query:

 minikube
| "tags.component_id" = "humio_kubernetes_logs_output" | gauge
| name != utilization
| name != "buffer_byte_size"
| name != "buffer_max_event_size"
| name = "buffer_events"

Problem

Why did the saturation of "humio_2" lead to an impact on "humio_1"?
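
Not an answer to the "why", but for comparison: if the goal is to keep one slow sink from throttling the shared source, the buffering-model doc describes non-blocking alternatives. A minimal sketch against the configuration below, assuming the standard buffer options (when_full, disk buffers) apply to the humio_logs sink:

humio_kubernetes_logs_output_mirror:
  type: humio_logs
  inputs:
    - transform_kubernetes_logs_input
    - my_source_id
  token: X
  encoding:
    codec: json
  buffer:
    when_full: drop_newest  # shed events at this sink instead of blocking upstream
    # Alternatively, absorb outages with a disk buffer (written under data_dir):
    # type: disk
    # max_size: 1073741824  # bytes

The trade-off is explicit either way: drop_newest sacrifices this sink's completeness to protect the others, while a disk buffer only postpones blocking until its allotment fills.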

Configuration

# Base config - https://github.com/vectordotdev/helm-charts/blob/develop/charts/vector/values.yaml
role: Agent
serviceAccount:
  create: true
  name: ${serviceAccountName}
podHostNetwork: true
podPriorityClassName: system-node-critical
image:
  repository: timberio/vector
  tag: "nightly-debian"
  sha: ${sha}
env:
  - name: HUMIO_TOKEN_KUBERNETES_LOGS_ENV
    valueFrom:
      secretKeyRef:
        name: vector-humio-token-kubernetes-logs-secret-name
        key: HUMIO_TOKEN_KUBERNETES_LOGS
  - name: VECTOR_LOG
    value: debug
resources: ${resources} 
podMonitor:
  enabled: true
  jobLabel: app.kubernetes.io/name
  port: prom-exporter
  path: /metrics
  interval:
  scrapeTimeout:
  relabelings: [ ]
  metricRelabelings: [ ]
  honorLabels: false
  honorTimestamps: true
customConfig:
  data_dir: /vector-data-dir
  api:
    enabled: true
    address: 127.0.0.1:8686
    playground: false
  sources:
    kubernetes_logs_input:
      type: kubernetes_logs
      ingestion_timestamp_field: .vector_ingest_timestamp
    my_source_id:
      type: demo_logs
      format: json
      interval: 0.005
    vector_internal_logs:
      type: internal_logs
    vector_internal_metrics_poc:  
      type: internal_metrics
      scrape_interval_secs: 20
  transforms:
    transform_kubernetes_logs_input:
      inputs: 
        - kubernetes_logs_input
      type: remap
      source: |-
        del(.file)
...
  sinks:
    humio_kubernetes_logs_output:
      type: humio_logs
      inputs:
        - transform_kubernetes_logs_input
        - my_source_id
      token: "$${HUMIO_TOKEN_KUBERNETES_LOGS_ENV}"
      encoding:
        codec: json
    humio_kubernetes_logs_output_mirror:
      type: humio_logs
      inputs:
        - transform_kubernetes_logs_input
        - my_source_id
      token: X
      encoding:
        codec: json
      acknowledgements:
        enabled: true
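      # With end-to-end acknowledgements enabled, sources that support them
      # will not acknowledge an event upstream until this sink confirms
      # delivery (assumption based on Vector's acknowledgement docs).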
    humio_vector_internal_logs_output:
      type: humio_logs
      inputs:
        - vector_internal_logs
      token: X
      encoding:
        codec: json
    humio_vector_internal_metrics_output:
      type: humio_metrics
      inputs:
        - vector_internal_metrics_poc
      token: X

***

(⎈|minikube:default)16:13:19:slava.novak:vector-poc-replacement-of-cribl [pull_commit] $:minikube start
😄  minikube v1.31.2 on Darwin 13.5.1 (arm64)
🎉  minikube 1.32.0 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.32.0
💡  To disable this notice, run: 'minikube config set WantUpdateNotification false'

✨  Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔄  Restarting existing docker container for "minikube" ...
🐳  Preparing Kubernetes v1.27.4 on Docker 24.0.4 ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Version

root@minikube:/# vector -V
vector 0.37.0 (aarch64-unknown-linux-gnu 1470f1a 2024-02-21 04:01:39.270562243)
root@minikube:/#

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response
