Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove zeek::Batch and simplify API #64

Closed
wants to merge 4 commits into from

Conversation

Neverlord
Copy link
Member

The Zeek-side batching was implemented in response to observing performance degradation with Broker compared to the old communication backend. Batching multiple Zeek messages into a single Broker message and making the content opaque to Broker by exchanging binary data (serial_data) mitigated the performance issues. Since then, we made several performance improvements. Most notably, Broker now uses copy-on-write messaging in order to avoid unnecessary copy overheads.

Latest benchmark results all seem to indicate that we no longer need the Zeek-side batching. Hence, this commit essentially undoes the batching:

  • Remove broker::zeek::Message::Batch and broker::zeek:: Batch
  • Consolidate the various valid functions and make them static
    • Previously, these member functions allowed to check whether an object contains valid data
    • Now, the static valid function checks whether it's safe to create an object in the first place
  • Drop opaque serial_data members in favor of using Broker's data::vector directly

This API change also requires a patch in Zeek (see accompanying PR in zeek/zeek).

Neverlord added a commit to zeek/zeek that referenced this pull request Oct 19, 2019
The Zeek-side batching was implemented in response to observing
performance degradation with Broker compared to the old communication
backend. This seems no longer necessary, so we can remove the opaque
data as well as the batching messages in Broker (see
zeek/broker#64).

Instead of serializing into opaque buffers, we simply convert the
threading values directly to `broker::data` and vice versa.
@Neverlord
Copy link
Member Author

@jsiwek I also had to update the Python bindings (7627657). It seems like throwing exceptions is fine with pybind11, but please tell me if you prefer some other error strategy. We could also expose the new valid and from functions instead to mirror the C++ API. I don't know how many people / existing scripts rely on the interface, so I didn't want a change the Python API proactively.

@jsiwek
Copy link
Contributor

jsiwek commented Oct 21, 2019

@Neverlord nice job, the changes look good conceptually, but before I do the merge, can you check into why the Travis build fails for this Broker PR ?

And after that, can you update your topic/neverload/batching branch in the Zeek repo to set the Broker submodule pointing at this same branch? I'll end up updating it myself during merge, but always nice if I can see beforehand whether the PR against Zeek, zeek/zeek#644, has any surprising CI results, too.

@rsmmr
Copy link
Member

rsmmr commented Oct 22, 2019

Yeah, this is neat. I was hoping we could benchmark it a bit more before merging, but not sure when that can happen. We don't have the new testbed in place yet, and not sure if we can find somebody who'd be able to run old vs new in a production environment. However, I'm a bit reluctant to just go ahead, as it would be difficult to back this out again in case it did unexpectedly cause trouble.

@Neverlord
Copy link
Member Author

before I do the merge, can you check into why the Travis build fails for this Broker PR ?

It failed because of the documentation examples. Fixed.

can you update your topic/neverload/batching branch in the Zeek repo to set the Broker submodule pointing at this same branch?

Done. The CI isn't running automatically for me, but when it eventually runs I'll address any build issues in Zeek.

I was hoping we could benchmark it a bit more before merging

Me too, for the exact reasons you're pointing out. I think we should merge it only after having confidence in it.

Also, this PR removes the opaque blobs Zeek is currently sending around and thus gives Broker subscribers full access to Zeek log messages. I'm sure there are some cool use cases for doing analysis using that data source in real time. We shouldn't merge it, get (some) people excited and then swap it back out again.

Neverlord added a commit to zeek/zeek that referenced this pull request Jan 31, 2020
The Zeek-side batching was implemented in response to observing
performance degradation with Broker compared to the old communication
backend. This seems no longer necessary, so we can remove the opaque
data as well as the batching messages in Broker (see
zeek/broker#64).

Instead of serializing into opaque buffers, we simply convert the
threading values directly to `broker::data` and vice versa.
@jsiwek
Copy link
Contributor

jsiwek commented Feb 3, 2020

As mentioned in zeek/zeek#644, removal of Zeek-side batching is on hold until after CAF 0.18. The followup issue to track that: zeek/zeek#771

@jsiwek jsiwek closed this Feb 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants