Implement a RabbitMQ-based AMQP connector #3546

Merged
merged 23 commits into from Nov 6, 2023

mavam
Member

@mavam mavam commented Oct 2, 2023

This PR adds a RabbitMQ connector, making it possible to produce and consume messages via AMQP.

Definition of Done

  1. Implement saver
  2. Implement loader
  3. Write operator docs
  4. Get CI to build
  5. Add <url> as optional positional argument
  6. Expose specific consumer and producer controls
  7. Fix confusion of routing key vs. queue name
  8. Consider renaming plugin to amqp
  9. Add changelog entry
  10. Implement --set options (after merging fluent-bit PR that factors the implementation)

@mavam mavam added the feature (New functionality) and connector (Loader and saver) labels Oct 2, 2023
@mavam
Member Author

mavam commented Oct 3, 2023

@satta not urgent, but when you get a chance I'd be curious to hear your feedback on the rabbitmq connector. When I went through the C library abstractions, it felt pretty natural to implement a source as a consumer and a sink as a producer. Does that work for you?

Also, I've exposed only a few basic tuning knobs, like queue, exchange, and channel in the operator invocation itself. See the README at https://github.com/tenzir/tenzir/blob/cc6bfdfc5c9a32b4dc948a7a7e2ebeeff2285999/web/docs/connectors/rabbitmq.md.

Settings like hostname, port, vhost, etc., are orthogonal and part of the plugin-specific configuration file, as my intuition is that these are mostly shared between operators. As with kafka, it's possible to override them via --set as well.

Finally, I thought about adding an optional, positional argument that is a URL of the form:

amqp://[$USERNAME[:$PASSWORD]@]$HOST[:$PORT]/[$VHOST]

I'm not sure how useful it is right now. Exposing this probably only makes sense when using multiple different RabbitMQ deployments. So I'd only add that if it's really needed and a pain to work with --set or the config file, as it clutters the operator invocation.
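For illustration, a minimal C++ sketch of splitting the proposed URL form into its components. The `amqp_url` struct and `parse_amqp_url` function are hypothetical, the defaults (`guest`/`guest`, port 5672, vhost `/`) are assumptions modeled on common RabbitMQ defaults, and percent-decoding is omitted. rabbitmq-c also ships `amqp_parse_url`, which would likely be the more robust choice inside the plugin itself.

```cpp
// Sketch only: split amqp://[user[:password]@]host[:port]/[vhost] into parts.
// Assumptions: defaults guest/guest, port 5672, vhost "/"; no percent-decoding.
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

struct amqp_url {
  std::string user = "guest";
  std::string password = "guest";
  std::string host;
  std::uint16_t port = 5672;
  std::string vhost = "/";
};

std::optional<amqp_url> parse_amqp_url(std::string_view url) {
  constexpr std::string_view scheme = "amqp://";
  if (url.substr(0, scheme.size()) != scheme)
    return std::nullopt;
  url.remove_prefix(scheme.size());
  amqp_url result;
  // Everything after the first '/' is the vhost; an empty vhost keeps the default.
  auto slash = url.find('/');
  auto authority = url.substr(0, slash);
  if (slash != std::string_view::npos && slash + 1 < url.size())
    result.vhost = std::string{url.substr(slash + 1)};
  // Optional "user[:password]@" prefix; a user without ':' gets an empty password.
  if (auto at = authority.rfind('@'); at != std::string_view::npos) {
    auto userinfo = authority.substr(0, at);
    authority.remove_prefix(at + 1);
    auto colon = userinfo.find(':');
    result.user = std::string{userinfo.substr(0, colon)};
    result.password = colon == std::string_view::npos
                        ? std::string{}
                        : std::string{userinfo.substr(colon + 1)};
  }
  // Optional ":port" suffix, restricted to decimal digits in [0, 65535].
  if (auto colon = authority.find(':'); colon != std::string_view::npos) {
    auto digits = authority.substr(colon + 1);
    authority = authority.substr(0, colon);
    if (digits.empty())
      return std::nullopt;
    unsigned long port = 0;
    for (char c : digits) {
      if (c < '0' || c > '9')
        return std::nullopt;
      port = port * 10 + static_cast<unsigned long>(c - '0');
      if (port > 65535)
        return std::nullopt;
    }
    result.port = static_cast<std::uint16_t>(port);
  }
  if (authority.empty())
    return std::nullopt;
  result.host = std::string{authority};
  return result;
}
```

With this sketch, `amqp://alice:secret@mq.example.com:5673/prod` yields user `alice`, host `mq.example.com`, port 5673, and vhost `prod`. Treating a user without `:password` as having an empty password is a choice of the sketch, not something the URL form above prescribes.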

@satta
Contributor

satta commented Oct 3, 2023

> @satta not urgent, but when you get a chance I'd be curious to hear your feedback on the rabbitmq connector. When I went through the C library abstractions, it felt pretty natural to implement a source as a consumer and a sink as a producer. Does that work for you?

Sounds good to me, but I guess I'll need to play around with it to get a better idea of how it feels in practice.

> Also, I've exposed only a few basic tuning knobs, like queue, exchange, and channel in the operator invocation itself. See the README at https://github.com/tenzir/tenzir/blob/cc6bfdfc5c9a32b4dc948a7a7e2ebeeff2285999/web/docs/connectors/rabbitmq.md.

Yup, these are the most important ones. Definitely mandatory is some way of specifying if a queue is to be temporary (i.e. removed when the client disconnects) or persistent (storing incoming deliveries until the client comes back). Both styles of behaviour can be useful in practice. This configuration is usually done with the 'auto_delete' option that is either used when declaring a queue, or via a server-side policy (which is based on a name pattern and does not require any client parameterization). I'll have to test whether this parameter can be specified on the client side using the --set parameter. Is there a list of the supported keys allowed in this context? The README still lists XXX.

Also note that declaring queues only makes sense for sources; sinks (i.e. components that send to RabbitMQ) can only specify exchanges; this should be appropriately handled. I haven't looked at the code yet, but that's one point that comes to mind.

> Settings like hostname, port, vhost, etc., are orthogonal and part of the plugin-specific configuration file, as my intuition is that these are mostly shared between operators. As with kafka, it's possible to override them via --set as well.

This is maybe a simplification that suggests a bit too obviously how the plugin should be used: it kind of assumes that there is one cluster per node that is usually interacted with. Might be the case for me, but maybe not for everyone.

> Finally, I thought about adding an optional, positional argument that is a URL of the form:
>
> amqp://[$USERNAME[:$PASSWORD]@]$HOST[:$PORT]/[$VHOST]

That would probably be better (and also what many other client interfaces do). I'd prefer this, but I can see one also has to keep clarity of the operators in mind.

I'll try to get my build environment up and running to test it out and send/receive some data. It might also be helpful to allow access to message headers, e.g., to determine whether the payload needs to be decompressed. Not sure how such OOB values would fit into the pipeline pattern Tenzir uses.

@satta
Contributor

satta commented Oct 3, 2023

FTR: In order to get the plugin to build with Debian's librabbitmq-dev (which is rabbitmq-c) I had to change some paths since in Debian the headers are named differently:

diff --git a/plugins/rabbitmq/src/plugin.cpp b/plugins/rabbitmq/src/plugin.cpp
index 79c91a06d9..bfe1d13c67 100644
--- a/plugins/rabbitmq/src/plugin.cpp
+++ b/plugins/rabbitmq/src/plugin.cpp
@@ -12,8 +12,8 @@
 
 #include <caf/expected.hpp>
 
-#include <rabbitmq-c/amqp.h>
-#include <rabbitmq-c/tcp_socket.h>
+#include <amqp.h>
+#include <amqp_tcp_socket.h>
 
 using namespace std::chrono_literals;
 

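Rather than patching the include paths per distribution, a C++17 `__has_include` check could pick whichever rabbitmq-c header layout is installed. This is a sketch, not what the plugin does: the `TENZIR_AMQP_HEADER_LAYOUT` macro and `amqp_header_layout()` helper are hypothetical, and the final branch only exists so the snippet compiles without rabbitmq-c (a real build would use `#error` there).

```cpp
// Sketch: select the available rabbitmq-c header layout at compile time.
#if __has_include(<rabbitmq-c/amqp.h>)
#  include <rabbitmq-c/amqp.h>        // newer layout (as in the plugin)
#  include <rabbitmq-c/tcp_socket.h>
#  define TENZIR_AMQP_HEADER_LAYOUT "rabbitmq-c/"
#elif __has_include(<amqp.h>)
#  include <amqp.h>                   // flat layout (as in Debian's librabbitmq-dev)
#  include <amqp_tcp_socket.h>
#  define TENZIR_AMQP_HEADER_LAYOUT "flat"
#else
// Placeholder so the snippet compiles standalone; use #error in a real build.
#  define TENZIR_AMQP_HEADER_LAYOUT "none"
#endif

#include <string_view>

// Reports which layout was picked, e.g., for a debug log line.
constexpr std::string_view amqp_header_layout() {
  return TENZIR_AMQP_HEADER_LAYOUT;
}
```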
@mavam
Member Author

mavam commented Oct 3, 2023

> Definitely mandatory is some way of specifying if a queue is to be temporary (i.e. removed when the client disconnects) or persistent (storing incoming deliveries until the client comes back). Both styles of behaviour can be useful in practice. This configuration is usually done with the 'auto_delete' option that is either used when declaring a queue, or via a server-side policy (which is based on a name pattern and does not require any client parameterization).

These are currently hard-coded for the consumer before I declare the queue:

    auto passive = amqp_boolean_t{0};
    auto durable = amqp_boolean_t{0};
    auto exclusive = amqp_boolean_t{0};
    auto auto_delete = amqp_boolean_t{1};
    // and before consume
    auto no_local = amqp_boolean_t{0};
    auto no_ack = amqp_boolean_t{1};

And this for the producer:

    auto mandatory = amqp_boolean_t{0};
    auto immediate = amqp_boolean_t{0};

Which of those should I expose to the operator?
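One way to expose these knobs, sketched under assumptions: the hard-coded flags move into option structs whose defaults match the values above (a temporary, auto-deleted queue with auto-ack), and a key/value map, e.g., coming from `--set`, overrides individual fields. All type and function names here are hypothetical.

```cpp
// Hypothetical option structs; defaults mirror the hard-coded values above.
#include <map>
#include <optional>
#include <string>

struct queue_options {
  bool passive = false;
  bool durable = false;
  bool exclusive = false;
  bool auto_delete = true; // temporary queue by default
};

struct consume_options {
  bool no_local = false;
  bool no_ack = true;
};

struct publish_options {
  bool mandatory = false;
  bool immediate = false;
};

// Parses "true"/"false"; anything else is rejected.
std::optional<bool> parse_bool(const std::string& s) {
  if (s == "true")
    return true;
  if (s == "false")
    return false;
  return std::nullopt;
}

// Applies overrides to the consumer-side options. Returns false on an unknown
// key or malformed value (options may then be partially updated).
bool apply_overrides(const std::map<std::string, std::string>& kvs,
                     queue_options& q, consume_options& c) {
  for (const auto& [key, value] : kvs) {
    auto b = parse_bool(value);
    if (!b)
      return false;
    if (key == "passive") q.passive = *b;
    else if (key == "durable") q.durable = *b;
    else if (key == "exclusive") q.exclusive = *b;
    else if (key == "auto_delete") q.auto_delete = *b;
    else if (key == "no_local") c.no_local = *b;
    else if (key == "no_ack") c.no_ack = *b;
    else return false;
  }
  return true;
}
```

A persistent consumer would then set `durable` to `true` and `auto_delete` to `false`. Note that RabbitMQ 3.0 and later no longer support the `immediate` flag on `basic.publish`, so that one may not be worth exposing.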

> Also note that declaring queues only makes sense for sources; sinks (i.e. components that send to RabbitMQ) can only specify exchanges; this should be appropriately handled. I haven't looked at the code yet, but that's one point that comes to mind.

Yep, that's the way I've implemented it.

> That would probably be better (and also what many other client interfaces do). I'd prefer this, but I can see one also has to keep clarity of the operators in mind.

Okay, I'll make the URL an optional positional argument.

> It might also be helpful to allow access to message headers, e.g., to determine whether the payload needs to be decompressed. Not sure how such OOB values would fit into the pipeline pattern Tenzir uses.

We can expose the list of headers simply in the schema layout, e.g., as headers: list<record<key: string, value: string>> or whatever the headers look like. We can also make it such that only a given flag introduces the extra headers into the schema.
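A rough sketch of this event-level idea: each header becomes one `{key, value}` record, and the `headers` field is only populated when a flag is set. The names (`message_event`, `make_event`, `include_headers`) are hypothetical, header values are assumed to be strings, and this is not Tenzir's schema API. It also presumes the connector produces events rather than raw bytes, a limitation raised later in this thread.

```cpp
// Hypothetical event shape: headers as list<record<key: string, value: string>>.
#include <optional>
#include <string>
#include <utility>
#include <vector>

struct header_record {
  std::string key;
  std::string value;
};

struct message_event {
  std::string payload;
  // Only set when headers were requested via the flag.
  std::optional<std::vector<header_record>> headers;
};

message_event make_event(
  std::string payload,
  const std::vector<std::pair<std::string, std::string>>& raw_headers,
  bool include_headers) {
  message_event event{std::move(payload), std::nullopt};
  if (include_headers) {
    std::vector<header_record> records;
    records.reserve(raw_headers.size());
    for (const auto& [key, value] : raw_headers)
      records.push_back({key, value});
    event.headers = std::move(records);
  }
  return event;
}
```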

@mavam
Member Author

mavam commented Oct 3, 2023

> I'll have to test whether this parameter can be specified on the client side using the --set parameter. Is there a list of the supported keys allowed in this context? The README still lists XXX.

Fixed this. Now looks as follows:

[screenshot omitted]

@satta
Contributor

satta commented Oct 3, 2023

>> Definitely mandatory is some way of specifying if a queue is to be temporary (i.e. removed when the client disconnects) or persistent (storing incoming deliveries until the client comes back). Both styles of behaviour can be useful in practice. This configuration is usually done with the 'auto_delete' option that is either used when declaring a queue, or via a server-side policy (which is based on a name pattern and does not require any client parameterization).
>
> These are currently hard-coded for the consumer before I declare the queue:
>
>     auto passive = amqp_boolean_t{0};
>     auto durable = amqp_boolean_t{0};
>     auto exclusive = amqp_boolean_t{0};
>     auto auto_delete = amqp_boolean_t{1};
>     // and before consume
>     auto no_local = amqp_boolean_t{0};
>     auto no_ack = amqp_boolean_t{1};
>
> And this for the producer:
>
>     auto mandatory = amqp_boolean_t{0};
>     auto immediate = amqp_boolean_t{0};
>
> Which of those should I expose to the operator?

I'd suggest all of them! They are required to implement various use cases in which the connecting client plays a specific role and needs to behave accordingly.

It's surely fine to set the defaults as above (corresponding to a typical temporary consumer) but they should be adjustable IMHO.

We'd also need a routing key for the consumer, in case one wants to bind to a topic exchange.

>> Also note that declaring queues only makes sense for sources; sinks (i.e. components that send to RabbitMQ) can only specify exchanges; this should be appropriately handled. I haven't looked at the code yet, but that's one point that comes to mind.
>
> Yep, that's the way I've implemented it.

Great 👍🏻

>> It might also be helpful to allow access to message headers, e.g., to determine whether the payload needs to be decompressed. Not sure how such OOB values would fit into the pipeline pattern Tenzir uses.
>
> We can expose the list of headers simply in the schema layout, e.g., as headers: list<record<key: string, value: string>> or whatever the headers look like. We can also make it such that only a given flag introduces the extra headers into the schema.

The headers are, IIRC, indeed key-value pairs of strings. So that would work.

@satta
Contributor

satta commented Oct 3, 2023

I have not yet been able to receive data from a local instance. The connection and channel are set up, but queue handling still seems to be a bit off.

I'm not sure why amqp_queue_declare() appears to be always called with an empty queue name, and the queue name is later used as the routing_key to bind to the exchange (which are different things). The routing key is not used with all exchange types, and BTW should also be configurable when one wants to bind to an exchange dynamically.

Also, if one is not using a temporary queue, one might not need to care about the exchange since the binding has already been set up and one only needs the queue name.

I'd suggest to:

  • Use the given queue name when declaring the queue. Only use declare->queue (which I assume is the automatically given name for an anonymous queue) if no queue name has been specified.
  • Make the exchange parameter optional. Only accept a routing key parameter and bind to an exchange when an exchange has been set -- because that means that the client wants to set up the binding.
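The two suggestions above can be sketched as a small planning step that decides which AMQP calls to make before touching rabbitmq-c. This is a hypothetical illustration: `consumer_config`, `consumer_plan`, and `plan_consumer` are invented names, and no actual `queue.declare` or `queue.bind` calls are issued.

```cpp
// Hypothetical consumer setup plan following the two suggestions above.
#include <optional>
#include <string>

struct consumer_config {
  std::string queue;                   // empty: let the broker generate a name
  std::optional<std::string> exchange; // unset: queue is assumed already bound
  std::string routing_key;             // only meaningful with an exchange
};

struct consumer_plan {
  std::string declare_name; // name passed to queue.declare
  bool bind = false;        // whether to issue queue.bind
  std::string bind_exchange;
  std::string bind_routing_key;
};

consumer_plan plan_consumer(const consumer_config& cfg) {
  consumer_plan plan;
  // Use the given queue name; an empty name requests a server-generated one.
  plan.declare_name = cfg.queue;
  // Bind only when the client explicitly asked for an exchange.
  if (cfg.exchange) {
    plan.bind = true;
    plan.bind_exchange = *cfg.exchange;
    plan.bind_routing_key = cfg.routing_key;
  }
  return plan;
}
```

After declaring, the queue name returned by the broker (`declare->queue` in the plugin code) is either the requested name or the server-generated one, so that is the name to pass on to the bind and consume calls.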

@satta
Contributor

satta commented Oct 3, 2023

I also tried:

diff --git a/plugins/rabbitmq/src/plugin.cpp b/plugins/rabbitmq/src/plugin.cpp
index 79c91a06d9..6584218b79 100644
--- a/plugins/rabbitmq/src/plugin.cpp
+++ b/plugins/rabbitmq/src/plugin.cpp
@@ -237,14 +237,14 @@ public:
   // TODO: need a better name for this function.
   auto consume(amqp_channel_t channel, std::string_view exchange,
                std::string_view queue) -> caf::error {
-    TENZIR_DEBUG("declaring queue");
+    TENZIR_DEBUG("declaring queue {}", queue);
     auto passive = amqp_boolean_t{0};
     auto durable = amqp_boolean_t{0};
     auto exclusive = amqp_boolean_t{0};
     auto auto_delete = amqp_boolean_t{1};
     auto arguments = amqp_empty_table;
     auto* declare
-      = amqp_queue_declare(conn_, channel, amqp_empty_bytes, passive, durable,
+      = amqp_queue_declare(conn_, channel, as_amqp_bytes(queue), passive, durable,
                            exclusive, auto_delete, arguments);
     if (auto err = to_error(amqp_get_rpc_reply(conn_)))
       return err;

but just got a segfault, apparently during error logging:

...
Thread 18 "caf.thread" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffda7f26c0 (LWP 2150108)]
0x00007fffe1137c51 in operator()(_ZZNK6tenzir7plugins8rabbitmq12_GLOBAL__N_115rabbitmq_loader11instantiateERNS_22operator_control_planeEENUlNS2_14connector_argsENS2_11amqp_engineEE_clES6_S7_.Frame *) (frame_ptr=0x616000140480)
    at ./plugins/rabbitmq/src/plugin.cpp:425
...

@mavam
Member Author

mavam commented Oct 3, 2023

> I'm not sure why amqp_queue_declare() appears to be always called with an empty queue name, and the queue name is later used as the routing_key to bind to the exchange (which are different things). The routing key is not used with all exchange types, and BTW should also be configurable when one wants to bind to an exchange dynamically.

I hear you. I took this approach from the official example at https://github.com/alanxz/rabbitmq-c/blob/master/examples/amqp_consumer.c. It confused me as well. I thought "it's the way to do it", but I am not surprised it doesn't work.

However, this worked for me locally after starting RabbitMQ:

tenzir 'from rabbitmq'
tenzir 'show operators | to rabbitmq'

I'm not quite sure what to make of it. At first I wanted to get the scaffold in place before going deeper. I guess it's now time to do that. :-)

@mavam
Member Author

mavam commented Oct 3, 2023

@satta I exposed a bunch more options for saver and loader, plus added the ability to provide an optional URL. Mind taking a look at the Markdown file whether the new options work for you?

I'll take a look at the routing-key vs. queue name next.

EDIT: I fixed the confusion of declared queues vs. routing keys. We now only allow setting routing keys for the publisher, and queue name plus routing key for the consumer. Moreover, the default queue name is the empty string, which results in randomly generated queue names by the AMQP server. The -q flag allows for setting a dedicated queue name.

The -r flag defaults to tenzir. Should it be the empty string?

@mavam
Member Author

mavam commented Oct 4, 2023

@satta we have a problem with injecting headers into the loader: the reason is that the loader only forwards blocks of bytes to a parser. However, the headers are structured data that we can't simply add in the current framework. For example, if the payload is CSV or JSON, there might be a chance to simply add header fields. But what do you do if the payload is a PCAP file? It's simply not possible to inject structured data into a stream of bytes.

I'm not quite sure how we can solve this.

  • Perhaps a side-channel, so that a loader can also produce events that will be fed into the downstream operator? But then we lose the binding of the headers to a message.
  • Expose the headers as metrics? Then we can't make in-stream decisions based on their values.

I would need to understand how important this is. Then we should discuss the solution space with @dominiklohmann. The only option I see currently is we make it possible to pass structured, per-chunk metadata when we communicate between connector and format.
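A minimal sketch of the last option, structured per-chunk metadata: each chunk carries its bytes together with a key/value map, so the format side can inspect, e.g., a content-encoding entry for exactly the bytes it is parsing. `chunk_with_metadata` and `needs_gzip_decompression` are hypothetical, and the `content-encoding` key is an assumption about how such metadata might be named.

```cpp
// Hypothetical chunk type pairing raw bytes with per-message metadata.
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct chunk_with_metadata {
  std::vector<std::byte> bytes;
  std::map<std::string, std::string> metadata; // e.g., AMQP headers
};

// Example decision a format could make per chunk.
bool needs_gzip_decompression(const chunk_with_metadata& chunk) {
  auto it = chunk.metadata.find("content-encoding");
  return it != chunk.metadata.end() && it->second == "gzip";
}
```

Unlike a side channel, the metadata stays bound to the message it came from; unlike metrics, it remains available for in-stream decisions.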

@mavam
Member Author

mavam commented Oct 4, 2023

@tobim mind taking a quick look at the Nix build failure in CI? I can't make sense of the seemingly unrelated Perl linker errors.

@tobim
Member

tobim commented Oct 4, 2023

> @tobim mind taking a quick look at the Nix build failure in CI? I can't make sense of the seemingly unrelated Perl linker errors.

That seems to be an error in the rabbitmq-c package definition. The following patch should fix it:

diff --git a/nix/overlay.nix b/nix/overlay.nix
index 09fdf28ece..b77cebadf8 100644
--- a/nix/overlay.nix
+++ b/nix/overlay.nix
@@ -182,6 +182,13 @@ in {
         configureFlags = old.configureFlags ++ ["--enable-prof" "--enable-stats"];
         doCheck = !isStatic;
       });
+  rabbitmq-c =
+    if !isStatic
+    then prev.rabbitmq-c
+    else
+      prev.rabbitmq-c.override {
+        xmlto = null;
+      };
   tenzir-source = inputs.nix-filter.lib.filter {
     root = ./..;
     include = [

mavam and others added 2 commits October 4, 2023 09:33
Co-authored-by: Tobias Mayer <tobim@fastmail.fm>
@satta
Contributor

satta commented Oct 31, 2023

tcache_bin_flush_edatas_lookup seems jemalloc-related. I installed jemalloc-dev (5.3.0) from Debian, maybe that caused an issue. Will rebuild with jemalloc disabled.

@dominiklohmann
Member

@satta there's a known issue with the AWS C++ SDK used in Arrow before version 13; can you double-check whether you're running the newest version of Arrow?

@satta
Contributor

satta commented Oct 31, 2023

> @satta there's a known issue with the AWS C++ SDK used in Arrow before version 13; can you double-check whether you're running the newest version of Arrow?

At least I'm on 13:

-- Arrow version: 13.0.0
-- Found the Arrow shared library: /usr/lib/x86_64-linux-gnu/libarrow.so.1300.0.0
-- Found the Arrow import library: ARROW_IMPORT_LIB-NOTFOUND
-- Found the Arrow static library: /usr/lib/x86_64-linux-gnu/libarrow.a

Disabling jemalloc didn't make a difference.

@mavam mavam marked this pull request as ready for review October 31, 2023 12:46
@satta
Contributor

satta commented Nov 1, 2023

FYI regarding the segfault, removing libtenzir/builtins/connectors/s3.cpp made it work again for me. Not a permanent solution, of course, but it allows me to continue testing.

@mavam
Member Author

mavam commented Nov 2, 2023

@Dakostu is there an upstream issue tracking this leak?

@Dakostu
Member

Dakostu commented Nov 2, 2023

@mavam @satta
(testing this on Linux)
My Arrow version is also 13.0.0:

[cmake] -- Arrow version: 13.0.0
[cmake] -- Found the Arrow shared library: /usr/local/lib/libarrow.so.1300.0.0
[cmake] -- Found the Arrow import library: ARROW_IMPORT_LIB-NOTFOUND
[cmake] -- Found the Arrow static library: 

For some reason, my static library doesn't get shown? Not available? Nonetheless, I was able to build this branch and launch the tenzir-node without any segfaults:

./tenzir-node
      _____ _____ _   _ ________ ____       
     |_   _| ____| \ | |__  /_ _|  _ \      
       | | |  _| |  \| | / / | || |_) |     
       | | | |___| |\  |/ /_ | ||  _ <      
       |_| |_____|_| \_/____|___|_| \_\     

        v4.3.0-157-g125d873746-dirty        
Visit https://app.tenzir.com to get started.

[08:50:10.333] loaded configuration file: /home/dakostu/.config/tenzir/tenzir.yaml
[08:50:10.726] node is listening on 127.0.0.1:5158

And the s3 connector works for me:

./tenzir 'from s3 sentinel-cogs/sentinel-s2-l2a-cogs/1/C/CV/2023/1/S2B_1CCV_20230101_0_L2A/tileinfo_metadata.json | write json'        
[08:51:45.209] loaded configuration file: /home/dakostu/.config/tenzir/tenzir.yaml
{
  "path": "tiles/1/C/CV/2023/1/1/0",
  "timestamp": "2023-01-01T21:05:55.632000",
  "utmZone": 1,
(...)

For the s3 connector, we just use two calls to Arrow's S3 filesystem to (de)initialize everything. Any issues arising "under the hood" of Arrow (especially the strange arrow_strptime call chain) are unfortunately out of our hands.

@satta Just to make sure - did you install Arrow manually? What were the options for your Arrow installation?

@Dakostu
Member

Dakostu commented Nov 2, 2023

@mavam No upstream issue yet. This smells like an issue for the Arrow repository to me. But I need to know what @satta's Arrow setup looks like, because I can't reproduce this.

@satta
Contributor

satta commented Nov 2, 2023

> @satta Just to make sure - did you install Arrow manually? What were the options for your Arrow installation?

I got the debs from their repo:

$ cat /etc/apt/sources.list.d/apache-arrow.sources
Types: deb deb-src
URIs: https://apache.jfrog.io/artifactory/arrow/debian/
Suites: bookworm
Components: main
Signed-By: /usr/share/keyrings/apache-arrow-apt-source.gpg

$ dpkg -l libarrow1300 libarrow-dev
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name               Version      Architecture Description
+++-==================-============-============-======================================================
ii  libarrow-dev:amd64 13.0.0-1     amd64        Apache Arrow is a data processing library for analysis
ii  libarrow1300:amd64 13.0.0-1     amd64        Apache Arrow is a data processing library for analysis

In Debian, the -dev packages usually ship the headers, the version-independent symlink to the library, and also a static library. Since I installed the libarrow-dev package, I have it.

BTW Arrow 14 seems to be available just now from the repo. Would it make sense to try that?

@Dakostu
Member

Dakostu commented Nov 2, 2023

Our Dockerfile & CI on GitHub are also using Debian packages to install Arrow (even the same version), and during automated tests the segfault does not happen. Something seems strange here.
@satta Installing Arrow 14 might be worth a try, even though I haven't tried it myself.

@satta
Contributor

satta commented Nov 2, 2023

> Our Dockerfile & CI on GitHub are also using Debian packages to install Arrow (even the same version), and during automated tests the segfault does not happen. Something seems strange here.
> @satta Installing Arrow 14 might be worth a try, even though I haven't tried it myself.

OK I'll give it a try. Here we go:

-- Arrow version: 14.0.0
-- Found the Arrow shared library: /usr/lib/x86_64-linux-gnu/libarrow.so.1400.0.0
-- Found the Arrow import library: ARROW_IMPORT_LIB-NOTFOUND
-- Found the Arrow static library: /usr/lib/x86_64-linux-gnu/libarrow.a

Just in case it helps, here's the ldd output of the previous binary that I found in my terminal history:

$ ldd bin/tenzir
	linux-vdso.so.1 (0x00007fffb3bef000)
	libasan.so.8 => /lib/x86_64-linux-gnu/libasan.so.8 (0x00007f0167600000)
	libfluent-bit.so => /usr/lib/fluent-bit/libfluent-bit.so (0x00007f0166200000)
	libtenzir.so.2819.0 => /home/satta/tmp/tenzir/build/lib/libtenzir.so.2819.0 (0x00007f015c800000)
	libcaf_openssl.so.0.18.7 => /home/satta/tmp/tenzir/build/lib/libcaf_openssl.so.0.18.7 (0x00007f016be9d000)
	libcaf_io.so.0.18.7 => /home/satta/tmp/tenzir/build/lib/libcaf_io.so.0.18.7 (0x00007f015c000000)
	libcaf_core.so.0.18.7 => /home/satta/tmp/tenzir/build/lib/libcaf_core.so.0.18.7 (0x00007f015b200000)
	libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f0167d56000)
	libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f015ac00000)
	libyaml-cpp.so.0.7 => /lib/x86_64-linux-gnu/libyaml-cpp.so.0.7 (0x00007f016be47000)
	libxxhash.so.0 => /lib/x86_64-linux-gnu/libxxhash.so.0 (0x00007f016be32000)
	libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f0167ca6000)
	libspdlog.so.1.10 => /lib/x86_64-linux-gnu/libspdlog.so.1.10 (0x00007f016757c000)
	libfmt.so.9 => /lib/x86_64-linux-gnu/libfmt.so.9 (0x00007f016755c000)
	libsimdjson.so.16 => /home/satta/tmp/tenzir/build/lib/libsimdjson.so.16 (0x00007f015c6d1000)
	libarrow.so.1300 => /lib/x86_64-linux-gnu/libarrow.so.1300 (0x00007f0158600000)
	libboost_filesystem.so.1.81.0 => /lib/x86_64-linux-gnu/libboost_filesystem.so.1.81.0 (0x00007f0167538000)
	libboost_atomic.so.1.81.0 => /lib/x86_64-linux-gnu/libboost_atomic.so.1.81.0 (0x00007f016be26000)
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0158200000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0166121000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f015c6b1000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f015841f000)
	libyaml-0.so.2 => /lib/x86_64-linux-gnu/libyaml-0.so.2 (0x00007f015c690000)
	libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007f015bf30000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f015c671000)
	libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f015c654000)
	libpq.so.5 => /lib/x86_64-linux-gnu/libpq.so.5 (0x00007f015b1ab000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f016bf7c000)
	libre2.so.9 => /lib/x86_64-linux-gnu/libre2.so.9 (0x00007f015b132000)
	liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f015bf01000)
	libunwind.so.8 => /lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f015bee5000)
	libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f015b103000)
	libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f015b0d2000)
	librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f015b0b3000)
	libssh2.so.1 => /lib/x86_64-linux-gnu/libssh2.so.1 (0x00007f015abbf000)
	libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f016610d000)
	libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f015ab6d000)
	libldap-2.5.so.0 => /lib/x86_64-linux-gnu/libldap-2.5.so.0 (0x00007f015ab0e000)
	liblber-2.5.so.0 => /lib/x86_64-linux-gnu/liblber-2.5.so.0 (0x00007f015b0a3000)
	libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f015aa52000)
	libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f015b096000)
	libbrotlienc.so.1 => /lib/x86_64-linux-gnu/libbrotlienc.so.1 (0x00007f015816f000)
	libprotobuf.so.32 => /lib/x86_64-linux-gnu/libprotobuf.so.32 (0x00007f0157e00000)
	libutf8proc.so.2 => /lib/x86_64-linux-gnu/libutf8proc.so.2 (0x00007f0157da9000)
	libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f015b083000)
	liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f0158149000)
	libabsl_bad_optional_access.so.20220623 => /lib/x86_64-linux-gnu/libabsl_bad_optional_access.so.20220623 (0x00007f016be17000)
	libabsl_str_format_internal.so.20220623 => /lib/x86_64-linux-gnu/libabsl_str_format_internal.so.20220623 (0x00007f0158130000)
	libabsl_time.so.20220623 => /lib/x86_64-linux-gnu/libabsl_time.so.20220623 (0x00007f0157d97000)
	libabsl_strings.so.20220623 => /lib/x86_64-linux-gnu/libabsl_strings.so.20220623 (0x00007f0157d79000)
	libabsl_strings_internal.so.20220623 => /lib/x86_64-linux-gnu/libabsl_strings_internal.so.20220623 (0x00007f0167ca0000)
	libabsl_throw_delegate.so.20220623 => /lib/x86_64-linux-gnu/libabsl_throw_delegate.so.20220623 (0x00007f0167531000)
	libabsl_time_zone.so.20220623 => /lib/x86_64-linux-gnu/libabsl_time_zone.so.20220623 (0x00007f0157d5f000)
	libabsl_bad_variant_access.so.20220623 => /lib/x86_64-linux-gnu/libabsl_bad_variant_access.so.20220623 (0x00007f0166108000)
	libsnappy.so.1 => /lib/x86_64-linux-gnu/libsnappy.so.1 (0x00007f0157d53000)
	libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007f0157d47000)
	libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f0157c00000)
	libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f0157a4a000)
	libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f0157800000)
	libhogweed.so.6 => /lib/x86_64-linux-gnu/libhogweed.so.6 (0x00007f01577b7000)
	libnettle.so.8 => /lib/x86_64-linux-gnu/libnettle.so.8 (0x00007f0157769000)
	libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f01576e8000)
	libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f015760e000)
	libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f0157a1d000)
	libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f015c64c000)
	libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f0157600000)
	libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f01575dd000)
	libabsl_int128.so.20220623 => /lib/x86_64-linux-gnu/libabsl_int128.so.20220623 (0x00007f01575d6000)
	libabsl_base.so.20220623 => /lib/x86_64-linux-gnu/libabsl_base.so.20220623 (0x00007f015bedd000)
	libabsl_raw_logging_internal.so.20220623 => /lib/x86_64-linux-gnu/libabsl_raw_logging_internal.so.20220623 (0x00007f015841a000)
	libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f01575ae000)
	libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f015747a000)
	libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f0157465000)
	libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f015745e000)
	libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f015744d000)
	libabsl_spinlock_wait.so.20220623 => /lib/x86_64-linux-gnu/libabsl_spinlock_wait.so.20220623 (0x00007f0157448000)
	libffi.so.8 => /lib/x86_64-linux-gnu/libffi.so.8 (0x00007f015743c000)

I can't see anything that stands out, but I guess you know better what to expect ;)

@satta
Contributor

satta commented Nov 2, 2023

> @satta Installing Arrow 14 might be worth a try, even though I haven't tried it myself.

Segfault still present:

$ ./bin/tenzir
tenzir-v4.3.0-140-ge4c9341587: Error: signal 11 (Segmentation fault)
0x7f71d3b09c0b: (fatal_handler+0x93)
0x7f71ca05afd0: (__sigaction+0x40)
0x7f71d865a26e: (tcache_bin_flush_edatas_lookup.constprop.0+0x17e)
0x7f71d865bb97: (je_tcache_bin_flush_small+0xb7)
0x7f71d85f7207: (je_sdallocx_default+0x537)
0x7f71cbcb9eb6: (arrow_strptime+0x5a8a16)
0x7f71cbca5a0e: (arrow_strptime+0x59456e)
0x7f71cbca5b62: (arrow_strptime+0x5946c2)
0x7f71cbc704bf: (arrow_strptime+0x55f01f)
0x7f71cbbf1e95: (arrow_strptime+0x4e09f5)
0x7f71cbbc9bfa: (arrow_strptime+0x4b875a)
0x7f71cbb9661a: (arrow_strptime+0x48517a)
0x7f71cbb34497: (arrow_strptime+0x422ff7)
0x7f71cbae1f9d: (arrow_strptime+0x3d0afd)
0x7f71cb5fde80: (arrow::fs::EnsureS3Initialized()+0x250)
0x55972456cda9: (tenzir::plugins::s3::plugin::initialize(tenzir::detail::vector_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tenzir::data, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tenzir::data> >, tenzir::detail::stable_map_policy> const&, tenzir::detail::vector_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tenzir::data, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tenzir::data> >, tenzir::detail::stable_map_policy> const&)+0x10f)
0x7f71d5225dcb: (tenzir::plugins::initialize(caf::actor_system_config&)+0x31c4)
0x5597252c2689: (main+0x2252)
0x7f71ca0461ca: (__libc_init_first+0x8a)
0x7f71ca046285: (__libc_start_main+0x85)
0x559723db65f1: (_start+0x21)
zsh: segmentation fault  ./bin/tenzir

@satta
Contributor

satta commented Nov 2, 2023

FTR the issue was fixed by using a workaround in the FluentBit library, see fluent/fluent-bit#8011

@Dakostu Dakostu self-requested a review November 2, 2023 13:57
Member

@Dakostu Dakostu left a comment


I set up a basic RabbitMQ server and was able to send/receive events. I approve, with some comments left.

plugins/amqp/src/plugin.cpp (outdated, resolved)
plugins/amqp/src/plugin.cpp (outdated, resolved)
plugins/amqp/src/plugin.cpp (resolved)
plugins/amqp/src/plugin.cpp (resolved)
web/docs/connectors/amqp.md (outdated, resolved)
plugins/amqp/src/plugin.cpp (outdated, resolved)
plugins/amqp/src/plugin.cpp (outdated, resolved)
@mavam mavam enabled auto-merge November 4, 2023 08:19
@mavam mavam force-pushed the topic/rabbitmq branch 2 times, most recently from fa02274 to 3fbe680 November 6, 2023 11:02
@dominiklohmann dominiklohmann merged commit 19a72b6 into main Nov 6, 2023
11 of 12 checks passed
@dominiklohmann dominiklohmann deleted the topic/rabbitmq branch November 6, 2023 13:07
Labels
connector Loader and saver feature New functionality
Projects
None yet
5 participants