Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Prevent data loss due to LogEmitter's buffer #35456

Closed
andrzej-stencel opened this issue Sep 27, 2024 · 4 comments · Fixed by #38428
Closed

[pkg/stanza] Prevent data loss due to LogEmitter's buffer #35456

andrzej-stencel opened this issue Sep 27, 2024 · 4 comments · Fixed by #38428
Assignees
Labels
enhancement New feature or request pkg/stanza

Comments

@andrzej-stencel
Copy link
Member

Component(s)

pkg/stanza

Is your feature request related to a problem? Please describe.

This issue is created as a result of discussion in #31074.

During an non-graceful shutdown of the collector, logs kept in the LogEmitter's buffer (the batch field) are lost. This is because they are not persisted and are already marked as sent by the File consumer.

Describe the solution you'd like

Remove the buffer in the LogEmitter and make LogEmitter synchronously emit every log/batch of logs received down the collector pipeline.

This will likely introduce a performance impact if implemented without other changes. This should only be done after:

Describe alternatives you've considered

  • Instead of removing the buffering, have it persisted so that the data is not lost during non-graceful shutdown.
  • Change the logic of marking logs as sent, so that logs are only marked as sent when they are actually successfully sent out to next consumer in the collector pipeline.

Additional context

See discussion in #31074 (comment) and further comments.

@andrzej-stencel andrzej-stencel added enhancement New feature or request needs triage New item requiring triage labels Sep 27, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@andrzej-stencel
Copy link
Member Author

Removing needs triage label as this was discussed with the code owner.

andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this issue Nov 8, 2024

Verified

This commit was signed with the committer’s verified signature.
KinectTheUnknown David-Joseph Xayavong
This changes the Log Emitter to run the `consumerFunc`
on the whole batch, instead of splitting the batch into individual entries
and calling `consumerFunc` on each of them.

This doesn't change much while the Log Emitter has its own `batch` buffer,
but if we remove the `batch` buffer (see open-telemetry#35456),
this should prevent the performance drop described in open-telemetry#35454.
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this issue Nov 8, 2024

Verified

This commit was signed with the committer’s verified signature.
KinectTheUnknown David-Joseph Xayavong
This changes the Log Emitter to run the `consumerFunc`
on the whole batch, instead of splitting the batch into individual entries
and calling `consumerFunc` on each of them.

This doesn't change much while the Log Emitter has its own `batch` buffer,
but if we remove the `batch` buffer (see open-telemetry#35456),
this should prevent the performance drop described in open-telemetry#35454.
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this issue Nov 19, 2024

Verified

This commit was signed with the committer’s verified signature.
KinectTheUnknown David-Joseph Xayavong
This changes the Log Emitter to run the `consumerFunc`
on the whole batch, instead of splitting the batch into individual entries
and calling `consumerFunc` on each of them.

This doesn't change much while the Log Emitter has its own `batch` buffer,
but if we remove the `batch` buffer (see open-telemetry#35456),
this should prevent the performance drop described in open-telemetry#35454.
Copy link
Contributor

github-actions bot commented Dec 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 2, 2024
@djaglowski djaglowski removed the Stale label Dec 3, 2024
Copy link
Contributor

github-actions bot commented Feb 3, 2025

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Feb 3, 2025
andrzej-stencel added a commit to andrzej-stencel/opentelemetry-collector-contrib that referenced this issue Feb 25, 2025

Verified

This commit was signed with the committer’s verified signature.
KinectTheUnknown David-Joseph Xayavong
Related to open-telemetry#38054.

The File Log receiver benchmark has a scope that is larger than both
the [FIle consumer benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go)
and the [File input benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go).
Compared to the File input benchmark, the File Log receiver benchmark includes:
- translating of Stanza entries to pdata logs ([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching of logs in [LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should be useful when comparing performance with and without batching in LogEmitter,
see open-telemetry#35456.
@andrzej-stencel andrzej-stencel self-assigned this Feb 25, 2025
djaglowski pushed a commit that referenced this issue Mar 3, 2025

Verified

This commit was signed with the committer’s verified signature.
KinectTheUnknown David-Joseph Xayavong
…38171)

#### Description

Related to
#38054.

The File Log receiver benchmark has a scope that is larger than both the
[File consumer
benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/fileconsumer/benchmark_test.go)
and the [File input
benchmark](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/input/file/benchmark_test.go).
Compared to the File input benchmark, the scope of File Log receiver
benchmark includes:
- translating of Stanza entries to pdata logs
([converter.ConvertEntries](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/adapter/converter.go#L20)).
- batching of logs in
[LogEmitter](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/a826350bab9388e7ab8179f1e02c68177d83f0b4/pkg/stanza/operator/helper/emitter.go#L103).

This new benchmark should help us measure the performance impact of
[removing batching from
LogEmitter](#35456)
after it is [added in File
consumer](#35455).

#### Link to tracking issue

- Needed for
#35456
djaglowski pushed a commit that referenced this issue Mar 7, 2025

Verified

This commit was signed with the committer’s verified signature.
KinectTheUnknown David-Joseph Xayavong
#### Description

Removes batching in LogEmitter to prevent data loss during ungraceful
shutdown of the collector. See
#35456
for details.

This is done behind a feature gate, as it may have a negative
performance impact, depending on user configuration. See the added
documentation for the feature gate.

On implementation side, this was done by renaming the existing
`LogEmitter` struct to `BatchingLogEmitter` and introducing a new
`SynchronousLogEmitter`, see `pkg/stanza/adapter/emitter.go`.

#### Link to tracking issue

- Fixes
#35456

#### Testing

Added unit tests in `pkg/stanza/adapter/emitter_test.go`.

Adapted the benchmarks `pkg/stanza/adapter/receiver_test.go` to run for
both the existing BatchingLogEmitter and the new SynchronousLogEmitter.

#### Documentation

Added documentation for the feature gate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/stanza
Projects
None yet
2 participants