'notifier.url' always set even when 'notifier.blackhole' is set to true #894

Closed
hconnan opened this issue Feb 26, 2024 · 7 comments
Labels: bug

hconnan commented Feb 26, 2024

Hello,

The `notifier.url` parameter is no longer required by default in VictoriaMetrics.

To disable notifications, the `notifier.blackhole` parameter must be set to true. Since VictoriaMetrics 1.96, when this parameter is set, the `notifier.url` parameter cannot be set at the same time (source).
However, in the server-deployment template of the victoria-metrics-alert chart, `notifier.url` is always set regardless of the extra arguments.

How to reproduce it? Deploy vmalert with the `notifier.blackhole` extra argument set to true.

Fix suggestion: add a condition so that the `notifier.url` parameter is only set when neither the `-notifier.blackhole` nor the `-notifier.config` extra argument is present.
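A minimal sketch of such a guard, assuming a Helm deployment template where extra flags come from `.Values.server.extraArgs` and the notifier URL from `.Values.server.notifier.alertmanager.url` (both value paths are illustrative assumptions, not the chart's confirmed structure):

```yaml
# Hypothetical excerpt from the server deployment template:
# only emit -notifier.url when the user did not request blackhole mode or a notifier config file.
{{- $extraArgs := default dict .Values.server.extraArgs }}
{{- if not (or (hasKey $extraArgs "notifier.blackhole") (hasKey $extraArgs "notifier.config")) }}
- -notifier.url={{ .Values.server.notifier.alertmanager.url }}
{{- end }}
```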

zekker6 (Contributor) commented Feb 28, 2024

@hconnan Chart release v0.9.2 allows using -notifier.blackhole correctly. Could you check if this release helps in your case?
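For reference, a hedged sketch of how the flag would be passed via that chart's values (the `server.extraArgs` path is an assumption about the victoria-metrics-alert chart's layout, not confirmed here):

```yaml
# Hypothetical values fragment for the victoria-metrics-alert chart:
# run vmalert in blackhole mode without configuring any notifier URL.
server:
  extraArgs:
    notifier.blackhole: "true"
```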

hconnan (Author) commented Mar 5, 2024

Hey! Thanks for the quick response!
Could you please release a new version of [victoria-metrics-k8s-stack](https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-k8s-stack)? I need an updated version of it in order to check whether it helps in my case.

EDIT
OK, I just saw that a new version of victoria-metrics-k8s-stack has been released. Let me check.

hconnan (Author) commented Mar 7, 2024

It does not work. I got this error: `failed to init: failed to init notifier: only one of -notifier.blackhole, -notifier.url and -notifier.config flags must be specified`
I saw you added a fix, but it seems that, somewhere, `vmalert.alertmanager.urls` is set and it is empty by default. With your fix, I still get the `notifier.url=` parameter. I am not sure the existing condition is enough.
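To illustrate the conflict (an assumed rendering, not actual chart output), the vmalert container ends up with both flags, which is exactly the combination the error message rejects; rendering the chart locally with `helm template` and inspecting the generated args is a quick way to confirm which flags end up set:

```yaml
# Hypothetical args rendered for the vmalert container:
args:
  - -notifier.blackhole=true   # requested via extraArgs
  - -notifier.url=             # empty default still counts as "specified" and triggers the error
```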

zekker6 (Contributor) commented Mar 7, 2024

@hconnan Could you please share the values file that reproduces this error for you (with any sensitive information removed)?

hconnan (Author) commented Mar 7, 2024

Sure.

```yaml
alertmanager:
  enabled: false
coreDns:
  enabled: false
defaultDashboardsEnabled: false
defaultRules:
  create: false
  rules:
    alertmanager: false
    etcd: false
    general: false
    k8s: false
    kubeApiserver: false
    kubeApiserverAvailability: false
    kubeApiserverBurnrate: false
    kubeApiserverHistogram: false
    kubeApiserverSlos: false
    kubePrometheusGeneral: false
    kubePrometheusNodeRecording: false
    kubeScheduler: false
    kubeStateMetrics: false
    kubelet: false
    kubernetesApps: false
    kubernetesResources: false
    kubernetesStorage: false
    kubernetesSystem: false
    network: false
    node: false
    vmagent: false
    vmcluster: false
    vmhealth: false
    vmsingle: false
fullnameOverride: vm-cluster
grafana:
  enabled: false
kube-state-metrics:
  enabled: true
  nameOverride: kube-state-metrics-staging
  namespaces: monitoring-staging
  rbac:
    useClusterRole: true
  replicas: 1
kubeApiServer:
  enabled: false
kubeControllerManager:
  enabled: false
kubeEtcd:
  enabled: false
kubeProxy:
  enabled: false
kubeScheduler:
  enabled: false
kubelet:
  enabled: false
prometheus-node-exporter:
  enabled: false
serviceAccount:
  annotations:
    iam.gke.io/gcp-service-account: xxx
  create: true
  name: vm-cluster
victoria-metrics-operator:
  enabled: false
vmagent:
  enabled: true
  spec:
    externalLabels:
      cluster: xxx
      entity: xxx
      environment: staging
    extraArgs:
      enableTCP6: "true"
      promscrape.suppressScrapeErrorsDelay: 120s
    ignoreNamespaceSelectors: false
    image:
      tag: v1.99.0
    inlineRelabelConfig:
    - source_labels:
      - service
      target_label: gce_instance
    nodeSelector:
      cloud.google.com/gke-nodepool: xxx
    replicaCount: 2
    resources:
      limits:
        cpu: 100m
        memory: 300Mi
      requests:
        cpu: 50m
        memory: 200Mi
    scrapeInterval: 30s
    selectAllByDefault: true
    serviceScrapeNamespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: monitoring-staging
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: monitoring
vmalert:
  enabled: true
  ingress:
    enabled: true
    hosts:
    - vmalert-staging.victoria-metrics
    ingressClassName: traefik
    tls:
    - secretName: tls-certs
  spec:
    extraArgs:
      notifier.blackhole: "true"
    image:
      tag: v1.99.0
    inlineRelabelConfig:
    - source_labels:
      - service
      target_label: gce_instance
    logFormat: json
    logLevel: INFO
    nodeSelector:
      cloud.google.com/gke-nodepool: xxx
    replicaCount: 2
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
      requests:
        cpu: 50m
        memory: 100Mi
    selectAllByDefault: true
    tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: monitoring
vmcluster:
  enabled: true
  ingress:
    insert:
      enabled: true
      hosts:
      - vminsert-staging.victoria-metrics
      ingressClassName: traefik
      tls:
      - secretName: tls-certs
    select:
      enabled: true
      hosts:
      - vmselect-staging.victoria-metrics
      ingressClassName: traefik
      tls:
      - secretName: tls-certs
    storage:
      enabled: false
  spec:
    replicationFactor: 2
    retentionPeriod: "1"
    serviceAccountName: vm-cluster
    vminsert:
      extraArgs:
        maxLabelsPerTimeseries: "10000000"
      image:
        tag: v1.99.0-cluster
      nodeSelector:
        cloud.google.com/gke-nodepool: xxx
      replicaCount: 2
      resources:
        limits:
          cpu: "1.5"
          memory: 1000Mi
        requests:
          cpu: "1"
          memory: 500Mi
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: monitoring
    vmselect:
      extraArgs:
        search.maxSeries: "1000000"
        search.maxUniqueTimeseries: "0"
      image:
        tag: v1.99.0-cluster
      nodeSelector:
        cloud.google.com/gke-nodepool: xxx
      replicaCount: 2
      resources:
        limits:
          cpu: "1"
          memory: 1000Mi
        requests:
          cpu: "0.5"
          memory: 500Mi
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: monitoring
    vmstorage:
      containers:
      - command:
        - /bin/sh
        - -c
        - |
          sleep 40
          while true; do
              # every hour we create a snapshot and upload it to latest
              /vmbackup-prod \
                  -storageDataPath=/vm-data \
                  -snapshot.createURL=http://localhost:8482/snapshot/create \
                  -dst=gs://xxx/vmstorage-snapshots/latest/monitoring-staging-$POD_NAME
              # if it's 5am we also upload the daily snapshot
              if [ $(date +%H) -eq "05" ]; then
                 /vmbackup-prod \
                    -storageDataPath=/vm-data \
                    -snapshot.createURL=http://localhost:8482/snapshot/create \
                    -dst=gs://xxx/vmstorage-snapshots/daily-$(date +%d-%m-%Y)/monitoring-staging-$POD_NAME
              fi
              sleep 1h
          done
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        image: victoriametrics/vmbackup:v1.99.0
        name: hourly-sidecar-backup
        volumeMounts:
        - mountPath: /vm-data
          name: vmstorage-db
      extraArgs:
        search.maxUniqueTimeseries: "0"
      image:
        tag: v1.99.0-cluster
      nodeSelector:
        cloud.google.com/gke-nodepool: xxx
      replicaCount: 3
      resources:
        limits:
          cpu: "4"
          memory: 12000Mi
        requests:
          cpu: "4"
          memory: 12000Mi
      storage:
        volumeClaimTemplate:
          spec:
            resources:
              requests:
                storage: 1000Gi
      tolerations:
      - effect: NoSchedule
        key: dedicated
        operator: Equal
        value: monitoring
vmsingle:
  enabled: false
```

So there is no `alertmanager.urls` set anywhere, and you can see that `notifier.blackhole` is set to true.

zekker6 (Contributor) commented Mar 7, 2024

@hconnan The chart I referred to in this comment was actually victoria-metrics-alert, not victoria-metrics-k8s-stack.
Let me also check the k8s-stack chart and apply a similar fix there.

zekker6 transferred this issue from VictoriaMetrics/helm-charts Mar 7, 2024
zekker6 added a commit that referenced this issue Mar 7, 2024
…blackhole is set by user

Previously, setting "notifier.blackhole" and not using any notifiers would lead to CrashLoopBackoff because vmalert would receive an empty "notifier.url" value.

Updates:
- #894
- #813
zekker6 added a commit that referenced this issue Mar 7, 2024
…blackhole is set by user

Previously, setting "notifier.blackhole" and not using any notifiers would lead to CrashLoopBackoff because vmalert would receive an empty "notifier.url" value.

Updates:
- #894
- #813
f41gh7 pushed a commit that referenced this issue Mar 7, 2024
…blackhole is set by user

Previously, setting "notifier.blackhole" and not using any notifiers would lead to CrashLoopBackoff because vmalert would receive an empty "notifier.url" value.

Updates:
- #894
- #813
hconnan (Author) commented Mar 15, 2024

All is good for me! Great job! Thank you very much 😄 🥳

hconnan closed this as completed Mar 15, 2024