-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/stanza] Prevent data loss due to LogEmitter
's buffer
#35456
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Removing |
This changes the Log Emitter to run the `consumerFunc` on the whole batch, instead of splitting the batch into individual entries and calling `consumerFunc` on each of them. This doesn't change much while the Log Emitter has its own `batch` buffer, but if we remove the `batch` buffer (see open-telemetry#35456), this should prevent the performance drop described in open-telemetry#35454.
This changes the Log Emitter to run the `consumerFunc` on the whole batch, instead of splitting the batch into individual entries and calling `consumerFunc` on each of them. This doesn't change much while the Log Emitter has its own `batch` buffer, but if we remove the `batch` buffer (see open-telemetry#35456), this should prevent the performance drop described in open-telemetry#35454.
This changes the Log Emitter to run the `consumerFunc` on the whole batch, instead of splitting the batch into individual entries and calling `consumerFunc` on each of them. This doesn't change much while the Log Emitter has its own `batch` buffer, but if we remove the `batch` buffer (see open-telemetry#35456), this should prevent the performance drop described in open-telemetry#35454.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
Related to open-telemetry#38054. The File Log receiver benchmark has a scope that is larger than both the [FIle consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go) and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go). Compared to the File input benchmark, the File Log receiver benchmark includes: - translating of Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)). - batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103). This new benchmark should be useful when comparing performance with and without batching in LogEmitter, see open-telemetry#35456.
…38171) #### Description Related to #38054. The File Log receiver benchmark has a scope that is larger than both the [File consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go) and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go). Compared to the File input benchmark, the scope of File Log receiver benchmark includes: - translating of Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)). - batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103). This new benchmark should help us measure the performance impact of [removing batching from LogEmitter](#35456) after it is [added in File consumer](#35455). #### Link to tracking issue - Needed for #35456
#### Description Removes batching in LogEmitter to prevent data loss during ungraceful shutdown of the collector. See #35456 for details. This is done behind a feature gate, as it may have a negative performance impact, depending on user configuration. See the added documentation for the feature gate. On implementation side, this was done by renaming the existing `LogEmitter` struct to `BatchingLogEmitter` and introducing a new `SynchronousLogEmitter`, see `pkg/stanza/adapter/emitter.go`. #### Link to tracking issue - Fixes #35456 #### Testing Added unit tests in `pkg/stanza/adapter/emitter_test.go`. Adapted the benchmarks `pkg/stanza/adapter/receiver_test.go` to run for both the existing BatchingLogEmitter and the new SynchronousLogEmitter. #### Documentation Added documentation for the feature gate.
Component(s)
pkg/stanza
Is your feature request related to a problem? Please describe.
This issue is created as a result of discussion in #31074.
During an non-graceful shutdown of the collector, logs kept in the LogEmitter's buffer (the batch field) are lost. This is because they are not persisted and are already marked as sent by the File consumer.
Describe the solution you'd like
Remove the buffer in the LogEmitter and make LogEmitter synchronously emit every log/batch of logs received down the collector pipeline.
This will likely introduce a performance impact if implemented without other changes. This should only be done after:
LogEmitter
#35454Describe alternatives you've considered
Additional context
See discussion in #31074 (comment) and further comments.
The text was updated successfully, but these errors were encountered: