Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time consumed by each operator #289

Merged
merged 51 commits into from
Aug 4, 2023
Merged

Conversation

nishchay-veer
Copy link
Contributor

@nishchay-veer nishchay-veer commented Jun 18, 2023

This pull request enhances the codebase by introducing the model.OperatorTelemetry interface, allowing for better tracking and analysis of CPU time usage by operators.
• Added a new interface called model.OperatorTelemetry, which includes the methods AddCPUTimeTaken and CPUTimeTaken for each operator.
• The AddCPUTimeTaken method allows recording the CPU time taken by an operator during its execution.
• Included a new test TestQueryAnalyze to verify that the query analysis produces valid results, including non-zero CPU time values, for different types of queries.

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
@yeya24
Copy link
Contributor

yeya24 commented Jun 18, 2023

This only tracks the time consumed when initializing the operator. I think what we need is mainly time consumed when calling series, Next, etc

the TimingOperator struct wraps another VectorOperator and includes a startTime field to track the start time of each Next method call. Inside the Next method implementation of TimingOperator, it captures the current time using time.Now() and stores it in the startTime field. Then, it calls the Next method of the wrapped operator.

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
added a boolean flag recordTime to enable or disable recording of time taken.If it is enabled, then I have added the code to record the time taken to the duration field. Also modified the constantMetric to include a new label called .

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
The changes are done for capturing observability information for operators in a clean and modular way. By using embedding instead of wrapping, it allows for more granular data capture without creating cross-references between operators.

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! I think we need to make a few things clear before we proceed with this.

execution/function/scalar.go Outdated Show resolved Hide resolved
execution/scan/literal_selector.go Outdated Show resolved Hide resolved
execution/scan/literal_selector.go Outdated Show resolved Hide resolved
execution/function/scalar.go Outdated Show resolved Hide resolved
execution/function/scalar.go Outdated Show resolved Hide resolved
execution/model/operator.go Show resolved Hide resolved
execution/function/scalar.go Outdated Show resolved Hide resolved
The Analyze method returns an AnalyzeOutputNode, which represents the analysis of the query execution. This analysis can include performance metrics, such as CPU time, memory usage, or other relevant statistics. The Analyze method allows for capturing observability information during query execution to assess performance

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
engine/engine.go Outdated Show resolved Hide resolved
engine/engine.go Outdated Show resolved Hide resolved
engine/engine.go Show resolved Hide resolved
execution/execution.go Outdated Show resolved Hide resolved
execution/scan/literal_selector.go Outdated Show resolved Hide resolved
engine/engine.go Outdated Show resolved Hide resolved
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

}
func (o *numberLiteralSelector) AddCPUTimeTaken(t time.Duration) {
o.t.AddCPUTimeTaken(t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest just embedding model.OperatorTelemetry directly so that we wouldn't have to write these functions. It would simply be possible to call AddCPUTimeTaken directly: https://go.dev/play/p/sYvyjlMaZh0

engine/engine.go Outdated Show resolved Hide resolved
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
engine/engine_test.go Outdated Show resolved Hide resolved
execution/function/absent.go Outdated Show resolved Hide resolved
engine/engine_test.go Show resolved Hide resolved
engine/engine_test.go Outdated Show resolved Hide resolved
}

func (o *scalarFunctionOperator) Analyze() (model.OperatorTelemetry, []model.ObservableVectorOperator) {
return o, []model.ObservableVectorOperator{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return o.next here.

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
next: nextOps[0],
funcExpr: funcExpr,
}, nil
}
r.OperatorTelemetry = &model.NoopTimingInformation{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can implement this logic once and then pass it to each operator. This way you would avoid copy/paste.

if obsOperator, ok := operator.(model.ObservableVectorOperator); ok {
obsOperators[i] = obsOperator
} else {
obsOperators[i] = nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only odd Analyze(). Let's do this instead:

obsOperators := make([]model.ObservableVectorOperator, 0, len(c.operators))
for i, operator := range c.operators {
  if obsOperator, ok := operator.(model.ObservableVectorOperator); ok {
    obsOperators = append(obsOperators, obsOperator)
  }
}
return c, obsOperators


func (ti *NoopTimingInformation) AddCPUTimeTaken(t time.Duration) {}

type TimingInformation struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future we will be using this struct for not just capturing CPU timing information so maybe it would be worth changing names in this PR before merging?

I suggest something like TrackedTelemetry and NoopTelemetry?

engine/engine.go Outdated
}

type AnalyzeOutputNode struct {
OperatorTelemetry model.OperatorTelemetry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add OperatorName string like in ExplainOutputNode

…emetry information

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think now this is in a good place, some comments!

engine/engine.go Outdated

func analyzeVector(obsv model.ObservableVectorOperator) *AnalyzeOutputNode {
telemetry, obsVectors := obsv.Analyze()
name, _ := obsv.(model.VectorOperator).Explain()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to typecast here.

Suggested change
name, _ := obsv.(model.VectorOperator).Explain()
name, _ := obsv.Explain()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not add the name to the original struct? This will be quite costly 😅

engine/engine.go Show resolved Hide resolved
engine/engine.go Outdated

func analyzeVector(obsv model.ObservableVectorOperator) *AnalyzeOutputNode {
telemetry, obsVectors := obsv.Analyze()
name, _ := obsv.(model.VectorOperator).Explain()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might add some additional overhead, but we can iterate on that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can return name as well in Analyze() (string, OperatorTelemetry, []ObservableVectorOperator) for each operator ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can add the name to the struct and add a method to OperatorTelemetry like Name() string which would return the operator's name. 🤔 I think that would make the most sense

Comment on lines 36 to 40
func SetTelemetry(t model.OperatorTelemetry, opts *query.Options) {
t = &model.NoopTelemetry{}
if opts.EnableAnalysis {
t = &model.TrackedTelemetry{}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes lint to fail. Let's return simply,

Suggested change
func SetTelemetry(t model.OperatorTelemetry, opts *query.Options) {
t = &model.NoopTelemetry{}
if opts.EnableAnalysis {
t = &model.TrackedTelemetry{}
}
func SetTelemetry(opts *query.Options) model.OperatorTelemetry {
if opts.EnableAnalysis {
return &model.TrackedTelemetry{}
}
return &model.NoopTelemetry{}

Comment on lines 59 to 60
}
SetTelemetry(r.OperatorTelemetry, opts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With comment above we can then use it as,

Suggested change
}
SetTelemetry(r.OperatorTelemetry, opts)
OperatorTelemetry: SetTelemetry(opts),
}

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Copy link
Member

@saswatamcode saswatamcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Awesome work!
Let's merge and iterate on this! :)

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
@GiedriusS GiedriusS merged commit 99d2be3 into thanos-io:main Aug 4, 2023
6 checks passed
Player256 pushed a commit to Player256/promql-engine that referenced this pull request Aug 16, 2023
* CPU time consumed by each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Time consumed by operator when calling Next

the TimingOperator struct wraps another VectorOperator and includes a startTime field to track the start time of each Next method call. Inside the Next method implementation of TimingOperator, it captures the current time using time.Now() and stores it in the startTime field. Then, it calls the Next method of the wrapped operator.

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Generalise timing operator for scalar and aggregate

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Added recordTime and duration field in numberLiteralSelector

added a boolean flag recordTime to enable or disable recording of time taken.If it is enabled, then I have added the code to record the time taken to the duration field. Also modified the constantMetric to include a new label called .

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* embedding instead of wrapping operator inside of operator

The changes are done for capturing observability information for operators in a clean and modular way. By using embedding instead of wrapping, it allows for more granular data capture without creating cross-references between operators.

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Added Analyze method in ExplainableQuery interface

The Analyze method returns an AnalyzeOutputNode, which represents the analysis of the query execution. This analysis can include performance metrics, such as CPU time, memory usage, or other relevant statistics. The Analyze method allows for capturing observability information during query execution to assess performance

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Added analyze function for local testing

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Include ObservableVectorOperator in building the operator tree

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Minor changes for type assertion

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Type casting into model.ObservableVectorOperator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Initialised TimingInformation to avoid nil case

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Next call when Analyze query

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Fixed some checks failing

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Embed struct

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Test code for Query Analyze

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Embed model.OperatorTelemetry

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Assertion function for non-zero CPU Time

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Adding model.OperatorTelemetry to each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Capturing CPUTime of each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* engine: actually execute the query

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* Find time consumed by each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removed unnecessary type casting

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Added analyze function for local testing

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Initialised TimingInformation to avoid nil case

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Next call when Analyze query

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Adding model.OperatorTelemetry to each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Capturing CPUTime of each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Find time consumed by each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removed unnecessary type casting

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Fixed few minor nits

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Added model.OperatorTelemetry to noArgFunOperator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removing type checks for model.TimingInformation

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removed type checks

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removed type checks in TestQueryAnalyze

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Fixed few minors issues

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* modified TestQueryAnalyze to check for child operators

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* linters check passed

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Added operatorTelemetry to each operator

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Changed TimingInformation to TrackedTelemetry for recording other telemetry information

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Remove  nil case in slice of operators

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* removing unnecessary typecasts

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Set name of operators

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removed Explain overheads from AnalyzeOutputNode

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* added setName() method to Analyze

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* fixed engine_test

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Removed name from NoopTelemetry

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>

* Rename CPU Time -> Execution time

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* engine: clean up last CPUTime reference

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

* execution: rename more fields

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>

---------

Signed-off-by: nishchay-veer <nishchayveer19@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Nishchay Veer <99465982+nishchay-veer@users.noreply.github.com>
Co-authored-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants