
Clarification Needed for Queueing in FileStorage and Memory Usage #38279

justinlikeswhiskey opened this issue Feb 28, 2025 · 2 comments

@justinlikeswhiskey

Component(s)

extension/storage/filestorage

Describe the issue you're reporting

1. The Sizing and Scaling Recommendations from the Splunk OTel Collector state that the OTel Collector can handle around 15,000 spans, 20,000 datapoints, or 10,000 logs per second with a single CPU and 2 GiB of memory: https://docs.splunk.com/observability/en/gdi/opentelemetry/sizing.html#sizing-and-scaling

However, we need further clarification on which internal metrics should be used to measure throughput against these numbers. For example, the Collector exposes three relevant metric families:

otelcol_receiver_accepted_*
otelcol_receiver_refused_*
otelcol_scraper_errored_*

With the above recommendations in mind, which metric should we use for this calculation? Would it be otelcol_receiver_accepted_* alone, or an aggregate/sum of all otelcol_receiver_* metrics combined?
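To make the question concrete, here is a sketch of the two candidate calculations from hypothetical counter samples (all values below are illustrative, not real data; the metric names in the comments are the ones listed above):

```python
# Hypothetical cumulative counter values scraped 60 s apart from the
# Collector's internal telemetry (illustrative numbers only).
accepted_t0, accepted_t1 = 1_200_000, 2_100_000  # otelcol_receiver_accepted_spans
refused_t0, refused_t1 = 5_000, 8_000            # otelcol_receiver_refused_spans
interval_seconds = 60

# Interpretation 1: rate of spans the receivers actually accepted.
accepted_rate = (accepted_t1 - accepted_t0) / interval_seconds

# Interpretation 2: rate of spans offered to the Collector
# (accepted + refused), i.e. the total load the deployment must absorb.
offered_rate = accepted_rate + (refused_t1 - refused_t0) / interval_seconds

print(f"accepted: {accepted_rate:.0f} spans/s, offered: {offered_rate:.0f} spans/s")
```

The two interpretations can diverge significantly under backpressure, which is why it matters which one the sizing guidance assumes.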

2. Regarding the queue_size: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md#configuration

So the documentation states that sending_queue.queue_size is:
The maximum number of batches stored to disk can be controlled using sending_queue.queue_size parameter (which, similarly as for in-memory buffering, defaults to 1000 batches).
However, when you look at the settings, there is no place where the batch size itself is specified; we only see the combination of num_seconds, requests_per_second, and requests_per_batch. There is also a link to the batch processor, but that only gives a generic number for send_batch_size and doesn't say whether the unit is items, bytes, MiB, etc.

Can we get clarification on this?
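For reference, the three variable names above suggest the sizing guidance reduces to simple arithmetic; a sketch assuming the formula is queue_size = num_seconds * requests_per_second / requests_per_batch, with made-up inputs (none of these values come from a real deployment):

```python
# Assumed sizing formula, inferred from the variable names in the
# exporterhelper README; verify against the current docs before relying on it.
num_seconds = 300          # outage window to survive, in seconds (assumed)
requests_per_second = 100  # average incoming requests per second (assumed)
requests_per_batch = 10    # average requests grouped into one batch (assumed)

# queue_size counts batches held in the queue, not bytes or MiB.
queue_size = num_seconds * requests_per_second / requests_per_batch
print(int(queue_size))
```

Even with this sketch, the open question stands: what determines the size of one batch, and in what unit?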

3. Regarding sizing/scaling, do we have any insight into how processors, connectors, and other components affect CPU and memory? I'm assuming processors could be memory-heavy when handling a massive volume of logs, in addition to sheer ingestion volume.
@justinlikeswhiskey justinlikeswhiskey added the needs triage New item requiring triage label Feb 28, 2025
Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@atoulme atoulme added discussion needed Community discussion needed and removed needs triage New item requiring triage labels Mar 8, 2025
@atoulme

atoulme commented Mar 8, 2025

I will bring it up to a SIG meeting to get the ball rolling.

@atoulme atoulme self-assigned this Mar 8, 2025