export opentelemetry spans with batch processor #6733

Closed
wants to merge 3 commits

Conversation

@theoclark (Author):

We have observed slowdowns of up to 50% when running tritonserver with OpenTelemetry tracing enabled. We traced this to the SimpleSpanProcessor's synchronous export, and the slowdowns go away when switching to the BatchSpanProcessor.
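For context, a minimal sketch of what the switch looks like with the OpenTelemetry C++ SDK. EXPORT_QUEUE_SIZE matches the constant added in this PR; the OTLP/HTTP exporter, the 5000 ms schedule delay, and the 512 batch size are assumptions for illustration, not necessarily the values the PR ends up using.

// Sketch only: replace the simple (synchronous) processor with a batch
// processor. Delay and batch-size values are illustrative assumptions.
#include <chrono>
#include <memory>
#include <utility>

#include "opentelemetry/exporters/otlp/otlp_http_exporter_factory.h"
#include "opentelemetry/sdk/trace/batch_span_processor_factory.h"
#include "opentelemetry/sdk/trace/batch_span_processor_options.h"
#include "opentelemetry/sdk/trace/tracer_provider.h"
#include "opentelemetry/sdk/trace/tracer_provider_factory.h"

namespace otlp = opentelemetry::exporter::otlp;
namespace otel_trace_sdk = opentelemetry::sdk::trace;

static const int EXPORT_QUEUE_SIZE = 5000;

std::unique_ptr<otel_trace_sdk::TracerProvider> MakeBatchedTracerProvider()
{
  // OTLP/HTTP exporter, as currently used by tritonserver for span export.
  auto exporter = otlp::OtlpHttpExporterFactory::Create();

  // Spans are buffered in an in-memory queue and exported asynchronously in
  // batches, instead of blocking the request path on every span end.
  otel_trace_sdk::BatchSpanProcessorOptions opts;
  opts.max_queue_size = EXPORT_QUEUE_SIZE;
  opts.schedule_delay_millis = std::chrono::milliseconds(5000);
  opts.max_export_batch_size = 512;

  auto processor = otel_trace_sdk::BatchSpanProcessorFactory::Create(
      std::move(exporter), opts);
  return otel_trace_sdk::TracerProviderFactory::Create(std::move(processor));
}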

@@ -48,6 +48,9 @@ namespace otel_resource = opentelemetry::sdk::resource;

namespace triton { namespace server {

static const int EXPORT_QUEUE_SIZE = 5000;

Contributor:

Does it make sense to make these options configurable through cmdline args?
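For illustration, such values could be threaded through from whatever argument parsing the server already does. Everything below is hypothetical: the ParsedBatchOptions struct, its field names, and its defaults are placeholders, not existing tritonserver options.

// Hypothetical sketch: map already-parsed cmdline values onto
// BatchSpanProcessorOptions. ParsedBatchOptions is a placeholder type, not a
// real tritonserver structure.
#include <chrono>

#include "opentelemetry/sdk/trace/batch_span_processor_options.h"

namespace otel_trace_sdk = opentelemetry::sdk::trace;

struct ParsedBatchOptions {
  size_t max_queue_size = 5000;         // would come from a cmdline arg
  size_t schedule_delay_ms = 5000;      // would come from a cmdline arg
  size_t max_export_batch_size = 512;   // would come from a cmdline arg
};

otel_trace_sdk::BatchSpanProcessorOptions
ToProcessorOptions(const ParsedBatchOptions& parsed)
{
  otel_trace_sdk::BatchSpanProcessorOptions opts;
  opts.max_queue_size = parsed.max_queue_size;
  opts.schedule_delay_millis =
      std::chrono::milliseconds(parsed.schedule_delay_ms);
  opts.max_export_batch_size = parsed.max_export_batch_size;
  return opts;
}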

@@ -40,6 +40,8 @@
#include "opentelemetry/sdk/resource/resource.h"
#include "opentelemetry/sdk/trace/processor.h"
#include "opentelemetry/sdk/trace/simple_processor_factory.h"

Contributor:

If we are removing the SimpleSpanProcessorFactory, it makes sense to remove this include as well.

@oandreeva-nv (Contributor) left a comment:

Hi @theoclark, thank you for this PR! I left some comments. May I also ask you to write some tests for your code? OpenTelemetry tests are located here: https://github.com/triton-inference-server/server/tree/main/qa/L0_trace. Could you please also send us a signed CLA?

@oandreeva-nv self-assigned this Dec 26, 2023

@theoclark (Author):

Thanks for the review, will make some changes. I work for Speechmatics and I believe we've already signed the CLA? Any specific tests you had in mind?

@oandreeva-nv (Contributor):

Got it, thanks for letting me know about the CLA. Regarding tests, it would be nice to add the following:

  • Tests for the new cmdline args, e.g. set a small queue size (3 or 4) and verify that the OTEL collector only receives all 4 traces after Triton has received 4 traces. Let me know if that makes sense.
  • Adjustments to the existing tests. Since we are removing the SimpleSpanProcessor, the current tests will fail unless the BatchSpanProcessor defaults are set to imitate the SimpleSpanProcessor.

@oandreeva-nv (Contributor):

Hi @theoclark, I've pushed a PR with a major refactor of the tests: #6785

If you are interested, I can pick up this PR as well and add tests myself.

@theoclark (Author):

Hi @oandreeva-nv, thanks, that would be great. I hadn't got around to writing the tests myself.

@theoclark (Author):

@oandreeva-nv we've also noticed in other parts of our code that exporting traces over HTTP can have a bigger performance impact than gRPC. Is there a reason tritonserver uses HTTP rather than gRPC that I'm not aware of?

@oandreeva-nv (Contributor):

@theoclark The scope of what we could implement initially was limited. gRPC is on our roadmap; I'll try to make it happen in the 24.02 release, since there is one more ask for a gRPC exporter.
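For reference, the OpenTelemetry C++ SDK already ships an OTLP/gRPC exporter factory, so the eventual change would mostly be a swap at the exporter-creation step. This is a sketch of that SDK call only, not the planned Triton implementation.

// Sketch only: create an OTLP/gRPC exporter instead of OTLP/HTTP. The SDK's
// default gRPC endpoint is localhost:4317; a real integration would wire this
// to Triton's existing trace configuration instead.
#include "opentelemetry/exporters/otlp/otlp_grpc_exporter_factory.h"

namespace otlp = opentelemetry::exporter::otlp;

auto MakeGrpcExporter()
{
  auto exporter = otlp::OtlpGrpcExporterFactory::Create();
  // The exporter then feeds the BatchSpanProcessor exactly as in the HTTP case.
  return exporter;
}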

@oandreeva-nv (Contributor):

Hi @theoclark, I'll close this PR since #6842 was merged; it includes your commits from this PR and lists you as a co-author.

@theoclark (Author):

Thanks @oandreeva-nv for your work on this. Great that it's been merged.
