Add Fluent Bit source and sink operators #3461

Merged
merged 52 commits into from Oct 6, 2023


tobim
Member

@tobim tobim commented Aug 15, 2023

This will add fluent-bit loaders and savers.

This PR is at a very early stage right now; the actual functionality is not implemented yet.

We include a custom fluent-bit-minimal.h from fluent/fluent-bit#7165 (comment) because the public API of fluent-bit is currently unusable. This will hopefully be included upstream in the near future.

Business Logic

  1. Implement first scaffold to test fluent-bit library
  2. Write Fluent Bit input plugin
  3. Implement fluentbit sink operator
  4. Implement fluentbit source operator
  5. Add logic that converts Fluent Bit JSON to table slices
  6. Apply decoding heuristic
  7. Fix crash on invalid sink, e.g., fluent-bit xxx
  8. Add integration tests

Scaffolding

  1. Get initial scaffold compiling
  2. Reconsider rolling our own input plugin, as opposed to pushing JSON through flb_lib_push
  3. Fix Docker build
  4. Static libraries are currently not installed and scattered across many archive files. We need to manually pass them to the linker command line.
  5. The few dependencies that are not vendored need to be linked in manually for the static build.

@tobim tobim added the connector Loader and saver label Aug 15, 2023
@tobim tobim changed the title from "Topic/fluent bit" to "Add a fluent-bit connector" Aug 15, 2023
@mavam
Member

mavam commented Aug 17, 2023

That build system work is quite a feat!

I had started something locally using crtp_operator. Are saver/loader the right abstractions? Aren't we going from void to events directly? I haven't studied the fluent-bit API in depth; perhaps there's a MsgPack parser/printer here as well.

@tobim
Member Author

tobim commented Aug 17, 2023

I had started something locally using crtp_operator. Are saver/loader the right abstractions? Aren't we going from void to events directly? I haven't studied the fluent-bit API in depth; perhaps there's a MsgPack parser/printer here as well.

I just assumed that we would go to bytes first, but looking at the included minimal API, it seems to make more sense to go directly to events.

This is an implementation choice that you should make as the main implementer of the feature.

@mavam
Member

mavam commented Aug 17, 2023

This is an implementation choice that you should make as the main implementer of the feature.

Alrighty, can I take it from here?

@mavam
Member

mavam commented Aug 18, 2023

@tobim I've split the outstanding todos into two tasklists for clearer responsibilities. In particular, please review da417f1, which I needed to get things to link on macOS.

I'm surprised that the Docker build fails, as the script that installs the dev deps runs through just fine, and the plain Debian CI build works. Here I have no intuition how to proceed and would probably not invest any time myself. Any help would be appreciated.

@mavam mavam changed the title from "Add a fluent-bit connector" to "Add a Fluent Bit operator" Aug 31, 2023
@mavam mavam added operator Source, transformation, and sink and removed connector Loader and saver labels Aug 31, 2023
@mavam
Member

mavam commented Sep 2, 2023

I managed to get rid of any custom Fluent Bit plugins. This now works with a stock Fluent Bit package that you get on your system.

The consequence is that we now have to go through JSON when using Fluent Bit as sink (Fluent Bit input), because the library function flb_lib_push only supports JSON. We're also restricted to JSON when using Fluent Bit as source (= Fluent Bit output), although here Fluent Bit already provides us with the option of parsing the raw MsgPack.

Net effect: we have the potential to tune performance as source operator, but not as sink operator.

I've asked @edsiper in Slack:

[E]duardo, would you accept a patch that enhances in_lib and flb_lib_push to accept MsgPack in addition to JSON? It's already possible to consume MsgPack with out_lib, but not yet in_lib. This would make the library API symmetric.

I'm envisioning no API change, just making this work:

in_ffd = flb_input(ctx, "lib", NULL);
flb_input_set(ctx, in_ffd, "format", "msgpack", NULL);
flb_lib_push(ctx, in_ffd, msgpack_buf, msgpack_buf_len);

With a bit of luck, this will be a welcome future contribution as it doesn't change anything about the public API.
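
To illustrate the JSON path described above, here is a minimal sketch in C of rendering one event as text before handing it to flb_lib_push. It assumes that the `lib` input expects each record as a JSON array of the form `[timestamp, {record}]`, which is my reading of the Fluent Bit library documentation. The helper name `format_flb_json` and the single `message` field are hypothetical and not part of Fluent Bit or of this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Fluent Bit or Tenzir.
 * Renders one event as `[<timestamp>, {"message": "<text>"}]`, the record
 * layout that the `lib` input is assumed to accept.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the message would need escaping or the buffer is too small. */
static int format_flb_json(char *buf, size_t cap, long ts, const char *message) {
  assert(buf != NULL && message != NULL);
  /* This sketch does not implement JSON escaping, so it rejects quotes and
   * backslashes instead of emitting invalid JSON. Control characters are
   * not checked. */
  if (strpbrk(message, "\"\\") != NULL)
    return -1;
  int n = snprintf(buf, cap, "[%ld, {\"message\": \"%s\"}]", ts, message);
  if (n < 0 || (size_t)n >= cap)
    return -1; /* encoding error or truncated output */
  return n;
}
```

With a running Fluent Bit context, the result would then be passed along as `flb_lib_push(ctx, in_ffd, buf, n)`, using the same `ctx` and `in_ffd` as in the snippet above. Serializing to text on every push is the overhead that a MsgPack-capable in_lib would avoid.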

@mavam mavam linked an issue Sep 2, 2023 that may be closed by this pull request
@tobim
Member Author

tobim commented Sep 15, 2023

I'm surprised that the Docker build fails, as the script that installs the dev deps runs through just fine, and the plain Debian CI build works. Here I have no intuition how to proceed and would probably not invest any time myself. Any help would be appreciated.

The regular Debian CI job does not build any plugins except for web. Plugins are built in a dedicated job matrix, to which I just added fluent-bit.

The issue in the Docker build was quite hairy and took me longer to fix than the static build. Behold: https://github.com/tenzir/tenzir/pull/3461/files#diff-4f95d2e348b638a037c80301070df229e817aaba51d9cd3b868976c7ae96c695R37-R69

@mavam
Member

mavam commented Sep 16, 2023

CI error:

CMake Error at CMakeLists.txt:29 (dependency_summary):
  Unknown CMake command "dependency_summary".

@tobim IIRC we don't have that function in plugins. @dominiklohmann once told me to just remove it.

@tobim
Member Author

tobim commented Sep 16, 2023

@tobim IIRC we don't have that function in plugins. @dominiklohmann once told me to just remove it.

This has been bugging us often enough. I refactored the functionality so it can be reused for standalone plugins in the latest commit.

@mavam mavam force-pushed the topic/fluent-bit branch 2 times, most recently from 86a7c67 to 89a3c90 September 16, 2023 18:51
@mavam mavam added the integration Integration with third-party tools label Sep 17, 2023
tobim and others added 25 commits October 6, 2023 09:32
The plugin scaffold requires static plugins to have identical plugin
and project names.
We need to work around the limitations of the upstream build and
installation logic, which is done by collecting all necessary
archives into the plugin linker command line. See the code comments
for details.
The fluent bit shared library contains a private copy of jemalloc
which can't be loaded with `dlopen()` at runtime. We work around
the issue by preloading fluent bit with the Tenzir binary when
possible.

The proper fix would be to use the system jemalloc instead of vendoring
a private copy in the fluent-bit package.
These are no longer necessary because there's no more shared state between our
plugin and a third-party Fluent Bit plugin. Phew.
Co-authored-by: Daniel Kostuj <daniel.kostuj@tenzir.com>
The official fluent-bit package contains a bundled copy of jemalloc
that poses two problems: First, it is built in a way that makes it
impossible to load `libfluent-bit.so` with `dlopen()`. And second,
the way it initializes thread local variables creates problems with
the jemalloc present in `libarrow.so`, specifically it causes a
segfault when initializing the s3 filesystem.

I'm trying to resolve this issue upstream at
fluent/fluent-bit#8005, but until it is
fixed we have to rely on this fallback.

Signed-off-by: Tobias Mayer <tobim@fastmail.fm>
@Dakostu Dakostu merged commit 3b4b4b4 into main Oct 6, 2023
38 checks passed
@Dakostu Dakostu deleted the topic/fluent-bit branch October 6, 2023 09:15
Labels: feature (New functionality), integration (Integration with third-party tools), operator (Source, transformation, and sink)