file_chunk: add stricter checks for broken meta files #4998


Merged
merged 4 commits into fluent:master from file_chunk on Jun 27, 2025

Conversation

@Watson1978 (Contributor) commented Jun 9, 2025

Which issue(s) this PR fixes:

What this PR does / why we need it:
This PR improves meta file corruption checking.

The meta file contains at least the following field values.

data = @metadata.to_h.merge({
  id: @unique_id,                                       # unique chunk ID
  s: (update ? @size + @adding_size : @size),           # chunk size
  c: @created_at,                                       # creation time
  m: (update ? Fluent::Clock.real_now : @modified_at),  # last modification time
})

This PR reinforces #1874.

Without this change, the following error can occur every time Fluentd is launched with a broken meta file:

2025-06-06 12:11:26 +0900 [error]: unexpected error while checking flushed chunks. ignored. error_class=NoMethodError error="undefined method '<' for nil"
  2025-06-06 12:11:26 +0900 [error]: /Users/watson/src/fluentd/lib/fluent/plugin/output.rb:1479:in 'block in Fluent::Plugin::Output#enqueue_thread_run'
  2025-06-06 12:11:26 +0900 [error]: /Users/watson/src/fluentd/lib/fluent/plugin/buffer.rb:548:in 'block in Fluent::Plugin::Buffer#enqueue_all'
  2025-06-06 12:11:26 +0900 [error]: /Users/watson/src/fluentd/lib/fluent/plugin/buffer.rb:542:in 'Array#each'
  2025-06-06 12:11:26 +0900 [error]: /Users/watson/src/fluentd/lib/fluent/plugin/buffer.rb:542:in 'Fluent::Plugin::Buffer#enqueue_all'
  2025-06-06 12:11:26 +0900 [error]: /Users/watson/src/fluentd/lib/fluent/plugin/output.rb:1479:in 'Fluent::Plugin::Output#enqueue_thread_run'
  2025-06-06 12:11:26 +0900 [error]: /Users/watson/src/fluentd/lib/fluent/plugin_helper/thread.rb:78:in 'block in Fluent::PluginHelper::Thread#thread_create'

If the timekey value is corrupted, the above error occurs.
Since there is no appropriate way to check timekey directly, this PR checks the id, c, and m fields instead: when timekey is broken, the other fields are likely to be broken as well.

It is possible for @size to legitimately be 0, so it cannot be used for this check.
@unique_id, @created_at, and @modified_at are set when a FileChunk is initialized, so they definitely have values.
I think these fields should always be written to the meta file.

So, this PR adds a check for the id, c, and m fields (a rough sketch follows below).
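
For illustration only, here is a minimal sketch of this kind of presence check; the helper name and the error handling are hypothetical, not the exact code merged in this PR:

# Hypothetical helper, not the actual implementation: given the Hash parsed
# from a meta file, decide whether the file should be treated as broken.
def broken_meta?(data)
  return true unless data.is_a?(Hash)
  # id, c, and m are always written by FileChunk, so a missing field indicates
  # a corrupted meta file rather than a value that legitimately defaults.
  %i[id c m].any? { |key| data[key].nil? }
end

# Callers could then discard or rebuild the chunk instead of silently
# falling back to default values, e.g.:
#   raise "broken meta file" if broken_meta?(parsed_meta)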

Previously, Fluentd operated with default values if the metadata was broken.
However, that can miss the corruption and result in unexpected errors.
So, this PR enhances the detection of broken metadata files instead of falling back to default values.

This change remains backward compatible with the v0.14 behavior.

Docs Changes:
Not necessarily required.

Release Note:
buf_file: reinforce buffer file corruption check

@Watson1978 force-pushed the file_chunk branch 4 times, most recently from 2487088 to ae638cb on June 10, 2025 08:16
@Watson1978 marked this pull request as ready for review June 11, 2025 04:08
@Watson1978 requested review from kenhys and daipom June 11, 2025 04:27
@daipom added this to the v1.19.0 milestone Jun 26, 2025
@cosmo0920 (Contributor) left a comment

This metadata could be handled as a hash(map)-like data structure.
So, this is just my two cents, but it seems we could add a shasum hash of the record contents to that metadata.
Is it feasible?

This reverts commit f40c07f.
@Watson1978 (Contributor, Author)

I think the metadata is extensible.
However, we would have to verify the cost of the shasum calculation and confirm the resulting behavior...
It's tough to find the time to do that now...

@daipom (Contributor) commented Jun 27, 2025

This metadata could be handled as a hash(map)-like data structure. So, this is just my two cents, but it seems we could add a shasum hash of the record contents to that metadata. Is it feasible?

Thanks @cosmo0920 !

It would certainly be technically feasible to introduce a checksum for the chunk in the metadata.
However, the implementation cost will likely be significant.
For now, I'm considering this and #5003 as relatively easy improvements in the upcoming v1.19 release.
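
For reference, a rough sketch of what storing such a checksum could look like; this is purely illustrative, the :sha field name is made up, and nothing like this is implemented in this PR:

require 'digest'

# Purely illustrative sketch of the checksum idea; the :sha field is hypothetical.
chunk_payload = File.binread('buffer.b0123.log')   # chunk contents (example path)
meta = { id: '0123', s: 42, c: Time.now.to_f, m: Time.now.to_f }

# Write side: store a digest of the payload alongside the other meta fields.
meta_with_sum = meta.merge(sha: Digest::SHA256.hexdigest(chunk_payload))

# Read side: a mismatch between the stored and recomputed digest would indicate
# that the chunk data itself (not only the meta file) is corrupted.
corrupted = meta_with_sum[:sha] != Digest::SHA256.hexdigest(chunk_payload)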

@daipom (Contributor) left a comment

Thanks! LGTM!

@daipom merged commit 1bc8a75 into fluent:master on Jun 27, 2025
14 of 16 checks passed
@Watson1978 deleted the file_chunk branch on July 4, 2025 05:09