Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugging output not working as expected #80

Closed
rsmmr opened this issue Jan 10, 2020 · 6 comments
Closed

Debugging output not working as expected #80

rsmmr opened this issue Jan 10, 2020 · 6 comments
Assignees

Comments

@rsmmr
Copy link
Member

rsmmr commented Jan 10, 2020

Not sure if I'm missing something, but I'm not getting the debug output I'd expect.

According to an old commit, we have 3 environment vars:

        BROKER_DEBUG_LEVEL=<level>
            Set the CAF logging level explicitly.

        BROKER_DEBUG_COMPONENT_FILTER
            Set the CAF component filter explicitly.

        BROKER_DEBUG_VERBOSE=1
            Switch back to old settings: logging at level DEBUG, no
            component filtering.

However, no matter how I set these, I don't get any Broker-level messages, but only caf.flow DEBUG level messages. Would be good to review how debug logging is working currently and document how to activate it.

Note I noticed this outside of Zeek with the new zeek-agent. That one hardcodes the debug level to -1 currently, which I changed to 2 when trying this. See also zeek/zeek-agent#2

@jsiwek
Copy link
Contributor

jsiwek commented Jan 10, 2020

Running with BROKER_DEBUG_LEVEL=trace is what seems to work for me in Zeek/Broker (didn't check into zeek-agent or other log levels). Examples:

Here's a release Zeek build with no special CAF logging configuration:

$ ./configure --build-type=release && make
...
$ BROKER_DEBUG_LEVEL=trace  zeek -r ~/Desktop/get.trace
$ cat broker_78238_1578680818358148000.log
142 caf DEBUG actor0 0x700001db8000 caf.logger log_first_line logger.cpp:613 level = trace, node = 782D1884932E7041B05E3337493D71E11B8BBD00#78238, component-blacklist = []
142 broker INFO actor0 0x11aed2dc0 broker.endpoint endpoint endpoint.cc:195 creating endpoint
142 broker INFO actor0 0x11aed2dc0 ,%20std.vector<topic> subscriber subscriber.cc:156 creating subscriber for topic(s) []
142 broker TRACE actor7 0x700001aa6000 broker operator() core_actor.cc:399 ENTRY filter = []
142 broker TRACE actor7 0x700001aa6000 broker.detail.core_policy add_worker core_policy.cc:365 ENTRY filter = []
...

A standalone Broker debug build (don't think debug matters, just what I happened to build) and again without special CAF logging configuration (looks like it defaults to QUIET logging level):

$ ./configure --build-type=debug && make install
...
$ c++ -std=c++11 -I<broker-install>/include/ listener.cc -L<broker-install>lib/ -lbroker -lcaf_core -lcaf_openssl -lcaf_io -o listener
$ BROKER_DEBUG_LEVEL=trace ./listener
...
$ cat broker_79295_1578681249263050000.log 
165 caf DEBUG actor0 0x70000ce22000 caf.logger log_first_line logger.cpp:613 level = trace, node = 782D1884932E7041B05E3337493D71E11B8BBD00#79295, component-blacklist = []
165 broker INFO actor0 0x119d18dc0 broker.endpoint endpoint endpoint.cc:195 creating endpoint
165 broker INFO actor0 0x119d18dc0 ,%20std.vector<topic> subscriber subscriber.cc:156 creating subscriber for topic(s) ["/topic/test"]
165 broker INFO actor0 0x119d18dc0 broker.endpoint listen endpoint.cc:237 listening on :9999 (no SSL)
...

@jsiwek
Copy link
Contributor

jsiwek commented Jan 10, 2020

I was also testing current master branches, not sure if it differs from behavior of release versions.

@rsmmr
Copy link
Member Author

rsmmr commented Jan 14, 2020

The plan is to add the capability to switch on two levels of Broker-side debug output:

  1. INFO: high-level connections activity (setup, handshake, teardown, any errors or warnings, such as version differences)
  2. DEBUG: A dump of all messages going over the Broker sessions.

Turning that on will work through an environment variable and through config files, and it will work in both release & debug builds.

Goal is to get this into Zeek 3.1 still.

@Neverlord
Copy link
Member

A standalone Broker debug build (don't think debug matters, just what I happened to build) and again without special CAF logging configuration (looks like it defaults to QUIET logging level)

CAF defaults to quiet, but the logger is always available since recent CAF versions.

Turning that on will work through an environment variable and through config files, and it will work in both release & debug builds.

Broker uses its own set of logging macros. They are independent of whatever CAF_LOG_LEVEL may be. Currently, we always compile Broker with all logging branches. At runtime, Broker has to "pay" one if-branch per log statement to check whether the log level at runtime permits logging the event. Notably, all BROKER_TRACE statements currently have to check twice (at function entry and exit).

So far I did not measure how much overhead these extra branches amount to. Also, there's currently no build option for removing them. Maybe we should run some benchmarks to see whether it's worth it to add a BROKER_LOG_LEVEL in order to remove trace statements from release builds.

warnings, such as version differences

I'm working on this, but as it turns out: CAF doesn't propagate this properly. CAF only signals sec::disconnect_during_handshake. But this error actually can be any of these things:

  • an actual, unexpected TCP disconnect
  • an application ID mismatch (e.g., broker-v1 vs broker-v2)
  • a CAF version mismatch (e.g. CAF 0.16.x vs CAF 0.17.x)

Broker needs to knows which of the three it is, because only in one case we'd try to reconnect.

Goal is to get this into Zeek 3.1 still.

I'll try to get this fixed on the CAF side with an 0.17.4 bugfix release in time.

@Neverlord
Copy link
Member

With the patch to CAF (submitted as PR 1023), I now see this output when trying to connect two incompatible Broker versions:

starting to peer with localhost:8080 retry: 1000000000ns [synchronous]
initiating connection to localhost:8080 (SSL)
remote endpoint unavailable: system_error(incompatible_application_id)

Tested by changing the application ID for one Broker endpoint on the command line. We already log this error in core_actor.cc as error, so there are no changes to Broker necessary. The CAF change plus PR #80 should make it easier to troubleshoot similar issues.

I'll also do a pass over Broker to improve usefulness of INFO and DEBUG logs, but I'd prefer to do that on the "new" code in topic/neverlord/multi-hop-routing. Otherwise, we'd have to do the same work twice because the core actor has been redesigned for the ALM feature.

@jsiwek
Copy link
Contributor

jsiwek commented Feb 6, 2020

Think we've got everything to call this done now: improved configuration for Broker logs and now using CAF 0.17.4 branch to better report version mismatches. Or re-open / correct me if wrong.

@jsiwek jsiwek closed this as completed Feb 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants