FYI: this issue also occurs on the producer side. When there is a glitch or other issue with the upstream Kafka, I start to see this message:
2022-08-03T06:52:00.031666Z ERROR sink{component_kind="sink" component_id=out_fluent_kafka component_type=kafka component_name=out_fluent_kafka}:request{request_id=3794715}: vector_core::stream::driver: Service call failed. error=KafkaError (Message production error: Fatal (Local: Fatal error)) request_id=3794715
After that, I see no output into Kafka until I restart Vector (the messages seem to be simply lost).
At the very least, a liveness probe should probably kill such an instance. Unfortunately, vector_events_discarded_total was not populated (might be a bug, right?). But if I compare vector_events_in_total with vector_events_out_total, I can clearly see that there was an issue.
So either I'll extend the liveness probe or, ideally, Vector's health endpoint should reflect this kind of issue.
Prometheus query giving a percentage for each sink. It basically measures the share of "in-flight events" (events a sink received but did not send out):
(sum(rate(vector_events_in_total{component_kind="sink", component_name!="prometheus_exporter"}[5m])) by (site,pod,component_name) - sum(rate(vector_events_out_total{component_kind="sink", component_name!="prometheus_exporter"}[5m])) by (site,pod,component_name))*100/sum(rate(vector_events_in_total{component_kind="sink", component_name!="prometheus_exporter"}[5m])) by (site,pod,component_name) > 0
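For anyone who wants to extend the liveness probe without going through Prometheus, the same comparison can be sketched directly against two scrapes of Vector's Prometheus exporter output. This is only a rough sketch: the helper names `sink_counters` and `stalled_percent` are made up for illustration, the parser handles only simple `name{labels} value` lines, and it assumes the exporter exposes `vector_events_in_total` and `vector_events_out_total` with `component_kind` and `component_name` labels, as in the query above.

```python
import re
from collections import defaultdict

# Matches simple sample lines such as: metric_name{label="value",...} 123
# Simplification: label values containing "}" are not handled.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

_METRICS = {"vector_events_in_total": "in", "vector_events_out_total": "out"}


def sink_counters(exposition_text):
    """Return {component_name: {"in": float, "out": float}} for sink components."""
    counters = defaultdict(lambda: {"in": 0.0, "out": 0.0})
    for line in exposition_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if not match or match.group("name") not in _METRICS:
            continue  # skips comments, blank lines, and unrelated metrics
        labels = dict(_LABEL.findall(match.group("labels")))
        if labels.get("component_kind") != "sink":
            continue
        key = _METRICS[match.group("name")]
        counters[labels.get("component_name", "")][key] += float(match.group("value"))
    return dict(counters)


def stalled_percent(before, after):
    """Percentage of events received but not emitted between two scrapes, per sink."""
    result = {}
    for sink, now in after.items():
        prev = before.get(sink, {"in": 0.0, "out": 0.0})
        delta_in = now["in"] - prev["in"]
        delta_out = now["out"] - prev["out"]
        if delta_in > 0:  # avoid dividing by zero for idle sinks
            result[sink] = (delta_in - delta_out) * 100.0 / delta_in
    return result
```

A probe could scrape the exporter twice, a few minutes apart, and fail when a sink's percentage stays above some threshold; both the threshold and the interval would need tuning per deployment.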
Problem
When Kafka has under-replicated partitions, Vector crashes after a while.
With rack awareness disabled, the error looks like this:
https://github.com/fede1024/rust-rdkafka/blob/master/src/topic_partition_list.rs#L201
With rack awareness enabled:
which is probably this issue: confluentinc/librdkafka#3569 (vector: #8750)
For the first issue, Vector should probably handle this kind of error and re-initialize the client without panicking.
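To illustrate what "re-initialize the client without panicking" could mean, here is a language-neutral sketch of the pattern in Python. It is not Vector's or rust-rdkafka's code: `FatalClientError`, `ReinitializingClient`, and the `factory` callable are all hypothetical names, and a real implementation would have to decide which errors actually count as fatal.

```python
class FatalClientError(Exception):
    """Hypothetical marker for an error the current client cannot recover from."""


class ReinitializingClient:
    """Wraps a client and rebuilds it from a factory when a fatal error occurs."""

    def __init__(self, factory, max_rebuilds=3):
        self._factory = factory          # callable returning a fresh client
        self._max_rebuilds = max_rebuilds
        self._client = factory()
        self.rebuilds = 0                # total rebuilds, useful as a metric

    def send(self, message):
        attempts = 0
        while True:
            try:
                return self._client.send(message)
            except FatalClientError:
                if attempts >= self._max_rebuilds:
                    raise                # give up and surface the error
                attempts += 1
                self.rebuilds += 1
                self._client = self._factory()  # drop the broken client
```

Bounding the number of rebuilds per call keeps a permanently broken upstream from turning into an endless loop; the error still surfaces, but as a handled failure rather than a crash.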
Configuration
Version
vector 0.21.1 (x86_64-unknown-linux-gnu)
Debug Output
No response
Example Data
No response
Additional Context
No response
References
fede1024/rust-rdkafka#279