Description
Component(s)
extension/storage/filestorage
Describe the issue you're reporting
- The sizing and scaling recommendations for the Splunk OTel Collector state that the Collector can handle around 15,000 spans, 20,000 datapoints, or 10,000 logs per second with a single CPU and 2 GiB of memory: https://docs.splunk.com/observability/en/gdi/opentelemetry/sizing.html#sizing-and-scaling
However, we need further clarification on which metrics should be used for this calculation. For example, the Collector exposes three relevant families of internal metrics:
  - `otelcol_receiver_accepted_*`
  - `otelcol_receiver_refused_*`
  - `otelcol_scraper_errored_*`

With those recommendations in mind, which metric should we use for the calculation? Would it be only `otelcol_receiver_accepted_*`, or an aggregate/sum across all of the `otelcol_receiver_*` metrics combined? (The sketch below shows the two interpretations we are weighing.)
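To make this concrete, here is a minimal sketch of the two interpretations, written as Prometheus recording rules, assuming the Collector's internal metrics are scraped by Prometheus under these names (depending on the Collector version they may carry a `_total` suffix). Only the spans case is shown; datapoints and logs would be analogous:

```yaml
groups:
  - name: otelcol-throughput-sketch
    rules:
      # Interpretation 1: count only what the receivers accepted
      - record: otelcol:receiver_spans_accepted:rate1m
        expr: sum(rate(otelcol_receiver_accepted_spans[1m]))
      # Interpretation 2: count everything offered to the receivers
      # (accepted + refused), i.e. an aggregate over otelcol_receiver_*
      - record: otelcol:receiver_spans_offered:rate1m
        expr: >
          sum(rate(otelcol_receiver_accepted_spans[1m]))
          + sum(rate(otelcol_receiver_refused_spans[1m]))
```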
- Regarding `sending_queue.queue_size`: https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md#configuration
The documentation describes `sending_queue.queue_size` as follows:
> The maximum number of batches stored to disk can be controlled using sending_queue.queue_size parameter (which, similarly as for in-memory buffering, defaults to 1000 batches).
However, looking at the available settings, the size of an individual batch is never actually specified; we only see that `queue_size` should be derived from `num_seconds`, `requests_per_second`, and `requests_per_batch`. There is also a link to the batch processor, but that only gives a generic number for `send_batch_size` and doesn't say whether the unit is bytes, MiB, or something else.
Can we get clarification on this? A rough sketch of the configuration we're asking about follows below.
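For reference, here is a minimal, hedged sketch of the configuration in question, combining the batch processor, the persistent sending queue, and the file_storage extension. The endpoint and directory are hypothetical, the numeric values are the documented defaults as we understand them, and the comments mark exactly what we'd like confirmed:

```yaml
extensions:
  file_storage:
    directory: /var/lib/otelcol/file_storage   # hypothetical path

receivers:
  otlp:
    protocols:
      grpc:

processors:
  batch:
    # Our understanding: send_batch_size is counted in items
    # (spans / metric data points / log records), not bytes or MiB --
    # this is part of what we'd like confirmed.
    send_batch_size: 8192

exporters:
  otlp:
    endpoint: backend.example.com:4317          # hypothetical endpoint
    sending_queue:
      enabled: true
      # Documented default of 1000 "batches" -- but the size of one batch
      # (and therefore the disk/memory footprint) is what is unclear to us.
      queue_size: 1000
      storage: file_storage                     # persist the queue on disk

service:
  extensions: [file_storage]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
```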
- Regarding sizing/scaling, do we have any insight into how processors, connectors, and other components affect CPU and memory? I'm assuming processors could be memory-heavy when handling a massive volume of logs, in addition to the cost of sheer ingestion volume.