Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
TL;DR: The Elasticsearch sink drops events unintentionally, even when acknowledgements are enabled, if loading AWS credentials fails.
It also drops events when the basic auth user does not have permissions (403 status code).
I tried to be concise; please let me know if I can provide any extra information that I may have missed. Thanks in advance.
Context:
Vector runs on AWS ECS Fargate as a service (1 to 3 tasks, with autoscaling enabled).
A Kafka source is configured.
Elasticsearch and AWS S3 sinks are configured using the aws auth strategy.
End-to-end acknowledgements are enabled.
Description:
Vector runs smoothly until a load increase arrives on Kafka (for us, Kubernetes Velero backups every hour). Some of those spikes drop nothing, others drop a few events, and some drop a lot.
But the same errors always appear in the logs:
Things that have been tried:
Increasing CPU resources: improved things a little, but we still hit the issue.
Scaling horizontally: also improved things, but we can still reproduce it.
Configuring IMDS timeouts: nothing changed.
Workaround:
Switching to basic authentication is the only way we found to avoid dropping events when those spikes come.
Proposal:
Vector should be able to handle credential errors and apply back pressure instead of dropping events when:
The AWS credential provider fails to load credentials (EcsContainer in this case).
The HTTP response status is 403 Forbidden due to lack of user permissions.
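To make the proposal concrete, here is a minimal sketch in Python of the behaviour being asked for. It is not Vector's actual code: `CredentialsLoadError`, `Response`, `should_retry`, and `send_with_backpressure` are hypothetical names invented for illustration. The idea is that a credentials failure or a 403 is treated as retriable, so the batch is held and retried with backoff (which stalls the pipeline upstream) rather than being dropped.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


class CredentialsLoadError(Exception):
    """Hypothetical stand-in for a credentials provider failure."""


@dataclass
class Response:
    """Hypothetical minimal HTTP response: only the status code matters here."""
    status: int


def should_retry(error: Optional[Exception], response: Optional[Response]) -> bool:
    """Decide whether a failed attempt should be retried instead of dropped."""
    # Credentials could not be loaded: the request was never sent, so retry.
    if isinstance(error, CredentialsLoadError):
        return True
    # 403 is included because this issue proposes treating it as retriable.
    if response is not None and (response.status in (403, 429) or response.status >= 500):
        return True
    return False


def send_with_backpressure(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the batch was delivered.

    The caller should acknowledge the batch to the source only on True,
    so an undelivered batch is never acknowledged.
    """
    for attempt in range(max_attempts):
        error, response = None, None
        try:
            response = send()
        except CredentialsLoadError as exc:
            error = exc
        if not should_retry(error, response):
            return response is not None and 200 <= response.status < 300
        # Exponential backoff; blocking here is what slows the upstream.
        sleep(base_delay * 2 ** attempt)
    return False
```

With end-to-end acknowledgements, returning False here would mean the Kafka offsets are not committed, so the events can be re-read later instead of being lost.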
Troubleshooting:
It seems Vector uses the AWS Rust SDK to sign the request to OpenSearch but apparently loads the credentials on every single request and does not use the cache that is defined?
It seems this could fit under #10870, but the error loading AWS credentials from the aws-sdk may be a different thing here? Not 100% sure, to be honest.
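For reference, this is roughly the caching behaviour I would expect, sketched in Python. It is only an illustration under my own assumptions, not the AWS SDK's or Vector's implementation: `Credentials`, `CachingProvider`, and `refresh_margin` are hypothetical names. Credentials are loaded once, reused until shortly before they expire, and a failed refresh falls back to the cached value while it is still valid.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Credentials:
    """Hypothetical credentials record with an absolute expiry timestamp."""
    access_key: str
    expires_at: float


class CachingProvider:
    """Wraps a loader so it is only called when the cached value is near expiry."""

    def __init__(
        self,
        load: Callable[[], Credentials],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._load = load
        self._margin = refresh_margin
        self._clock = clock
        self._cached: Optional[Credentials] = None

    def get(self) -> Credentials:
        now = self._clock()
        needs_refresh = (
            self._cached is None
            or self._cached.expires_at - now <= self._margin
        )
        if needs_refresh:
            try:
                self._cached = self._load()
            except Exception:
                # A failed refresh is tolerated while the cached
                # credentials have not actually expired yet.
                if self._cached is None or self._cached.expires_at <= now:
                    raise
        return self._cached
```

With something like this, a load spike would cause at most one call to the ECS container credentials endpoint per refresh window rather than one per request.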
Configuration
Version
0.36.0 -> 0.37.0
Debug Output
Example Data
No response
Additional Context
Vector is running at AWS ECS Fargate.
References
debug
#15196