query: Add store.response-timeout #928
Conversation
👍 looks like a sensible mitigation. Would be great to look into the underlying store issue as a follow-up.
LGTM, just:
- flag name suggestion
- CI test suggestion
@bwplotka I've introduced https://github.com/improbable-eng/thanos/pull/928/files#diff-e7ba964cd2afa54767a2adc1bac73e67R438
All good, and I would be happy to merge if not for the confusion around the gRPC metric we add here. See my comment.
Also, I am happy to add this periodic test in CircleCI for this env; can you add a GitHub issue on me for that?
Still, before merging we need to remove the metrics from here, I guess?
Yup, also made a PR for the metrics: grpc-ecosystem/go-grpc-prometheus#71
I've squashed it.
Nice, just one minor nit, not a blocker though. LGTM, thanks!
Changes
Improves handling of timed-out data.

Adds `--store.response-timeout` in order to have a timeout on a single stream operation. Sometimes gRPC streams get stuck and don't send any data for long periods of time, e.g. a simple gRPC Store API which just sleeps (see the sketch below).
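To make that failure mode concrete, here is a hypothetical "stuck" Store API of the kind described above: its `Series` handler never sends a frame, so the querier's `stream.Recv()` blocks indefinitely. This is an illustrative sketch, not code from this PR; `sleepyStore` and the one-hour sleep are made up, and the remaining `storepb.StoreServer` methods are omitted.

```go
package example

import (
	"time"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// sleepyStore is a hypothetical Store API implementation that gets "stuck":
// Series never sends a SeriesResponse frame, it just sleeps, so any client
// blocked in stream.Recv() waits indefinitely.
type sleepyStore struct{}

func (s *sleepyStore) Series(_ *storepb.SeriesRequest, _ storepb.Store_SeriesServer) error {
	time.Sleep(time.Hour) // simulate a stream that stops sending data
	return nil
}

// Info, LabelNames, and LabelValues are omitted for brevity; a real
// storepb.StoreServer must implement them too.
```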
Right now the only option to time out such stuck streams in gRPC is to set `context.WithTimeout()`. The issue is that it is a global timeout for the whole request:
- if we set it to 10s, we won't ever get results from Thanos Store (backed by object storage);
- if we set it to 120s, we will always be waiting up to 120s for the HTTP queries to finish (as we will be waiting for the slowest store), which is :/
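For illustration, a minimal sketch of this status quo, assuming a `storepb.StoreClient` and an already-built `SeriesRequest` (the `fetchSeries` name and the 120s value are placeholders, not code from this PR): the single deadline covers the entire stream, so it has to be sized for the slowest legitimate store.

```go
package example

import (
	"context"
	"io"
	"time"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

// fetchSeries shows the global-timeout approach: one context deadline for
// the whole Series stream, with no limit on how long a single Recv may take.
func fetchSeries(client storepb.StoreClient, req *storepb.SeriesRequest) error {
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Second)
	defer cancel()

	stream, err := client.Series(ctx, req)
	if err != nil {
		return err
	}
	for {
		resp, err := stream.Recv() // a stuck stream blocks here until the
		if err == io.EOF {         // global 120s deadline finally expires
			return nil
		}
		if err != nil {
			return err
		}
		_ = resp // process the response
	}
}
```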
This fix adds a timeout on every single data-receive operation (sketched below).
Ref: #928
Thanos Query will log when the timeout is reached.
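A hedged sketch of the idea behind the flag (not the PR's exact implementation): run each `Recv` in a goroutine and race it against a per-operation timer, so a stuck stream fails fast while a slow-but-still-streaming store keeps going. `recvWithTimeout` and its error message are made up for illustration.

```go
package example

import (
	"context"
	"fmt"
	"time"

	"github.com/improbable-eng/thanos/pkg/store/storepb"
)

type recvResult struct {
	resp *storepb.SeriesResponse
	err  error
}

// recvWithTimeout bounds a single receive operation instead of the whole
// stream. On timeout the caller should cancel the stream's context so the
// leaked Recv goroutine exits.
func recvWithTimeout(ctx context.Context, stream storepb.Store_SeriesClient, timeout time.Duration) (*storepb.SeriesResponse, error) {
	ch := make(chan recvResult, 1)
	go func() {
		r, err := stream.Recv()
		ch <- recvResult{resp: r, err: err}
	}()

	t := time.NewTimer(timeout)
	defer t.Stop()

	select {
	case r := <-ch:
		return r.resp, r.err
	case <-t.C:
		return nil, fmt.Errorf("failed to receive any data in %s from store", timeout)
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```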
Verification
Tested locally on our cluster, with the flag both turned on and turned off (default).
Unit tests.