Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add query bytes fetched along with other series stats in Query Response #7390

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

christopherzli
Copy link

Add query bytes fetched along with other series stats in Query Response
Want to seek approval before I added changelog

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

Verification

curl http://localhost:10902/api/v1/query?query='COUNT(up)'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1716515107.878,"173"]}],"analysis":{},"seriesStatsCounter":{"Series":174,"Chunks":174,"Samples":2248,"Bytes":101533}}}

@christopherzli
Copy link
Author

it is to address this issue: #7365

add thanos stats queried

Signed-off-by: Christopher Li <christopher.li@databricks.com>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a change on API response. I wonder if we want to put this under a FF in the API request.

@christopherzli
Copy link
Author

It is a change on API response. I wonder if we want to put this under a FF in the API request.

sure that sounds like an idea. Another route I want to chase down is if we want to include in other metadata format like http headers, do we have any guidelines/examples on where we should put?

@yeya24
Copy link
Contributor

yeya24 commented May 24, 2024

Sorry I am a little bit unsure what you want to achieve here with http headers. Can you elaborate it more?

Do you plan to log the info or ?

@christopherzli
Copy link
Author

christopherzli commented May 24, 2024

Our use case is to export this stat information along with query response, and it could be part of response or pass with http headers.
We want to pass along that information to query frontend and finally return as part of http response or header as well so that we can do our query cost attribution to see how much data is fetched for each query.
We want to seek suggestion on what is the best option to export this piece of information for query cost attribution.
We previously use M3DB as our metrics engine and there was query bytes fetched information in their response header.
The reason I put that in part of Thanos query data response struct instead of http header is that I do not see any existing use of http header as part of query response.

Warnings []error `json:"warnings,omitempty"`
QueryAnalysis queryTelemetry `json:"analysis,omitempty"`
Warnings []error `json:"warnings,omitempty"`
SeriesStatsCounter storepb.SeriesStatsCounter `json:"seriesStatsCounter,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm against this, it's a feature that could be so much more useful. For example, if you have a query like sum(a) + sum(b) then how do you know which selector downloaded more data and what should be the focus of optimizations? We could do much better. In fact, there's a mechanism/place where this kind of info should be exposed.

Please take a look at the Analyze functionality and focus there. A good plan of action would be first to add some mechanism through context.Context that would allow capturing more granular data about the fan-out. This could be used in the PromQL engine to "surface" this data through the Analyze() call, and then this could be shown in the UI next to each selector.

Copy link
Author

@christopherzli christopherzli Jun 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @GiedriusS in our case our usecase is much simpler: we simply want to know how expensive each query is and attribute our total query cost to each user based on # of bytes fetched/# of timeseries.
I could follow @yeya24 's suggestion on making it a command line flag on querier.
The method you mentioned is an overkill for this usecase and could be put as a TODO, what do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants