Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve main event loop #264

Closed
jsiwek opened this issue Feb 6, 2019 · 8 comments
Closed

Improve main event loop #264

jsiwek opened this issue Feb 6, 2019 · 8 comments
Assignees
Labels
Complexity: Modest A cup of tea and an evening (or two) with Zeek. Type: Enhancement Type: Project A self-contained project — for example an intern project, a tech evaluation, or prototyping
Milestone

Comments

@jsiwek
Copy link
Contributor

jsiwek commented Feb 6, 2019

The main event loop has always been a bit of a nuisance -- primary problems seem to be that devs find it hard to understand and also there's high-ish CPU utilization even in situations where it should be mostly idling.

Here's an old reference with other thoughts/ideas that I'm not sure are still relevant: https://bro-tracker.atlassian.net/browse/BIT-1388

A concrete project would be to test out porting the event loop to use something like libev or libuv (or other event library) and seeing what results from it.

@jsiwek jsiwek added Complexity: Modest A cup of tea and an evening (or two) with Zeek. Type: Enhancement Type: Project A self-contained project — for example an intern project, a tech evaluation, or prototyping labels Feb 6, 2019
@timwoj timwoj self-assigned this Aug 8, 2019
@timwoj
Copy link
Contributor

timwoj commented Aug 14, 2019

After digging into the code and doing a survey of the available libraries out there, we settled on libuv (https://github.com/libuv/libuv). It handles file descriptors directly, which means it can handle all of the current input sources. It also has built-in support for DNS lookups which means the code in nb_dns.cc can likely all be removed.

@timwoj
Copy link
Contributor

timwoj commented Aug 16, 2019

I ran into what might be an issue with libuv: libuv/libuv#2428

@timwoj
Copy link
Contributor

timwoj commented Aug 27, 2019

Here's the layout of the work to be done and a priority list.

Work to be done:

  • Import libuv into source directory
    • Options in configure to pass a path to libuv
    • Make git submodule in 3rdparty repo
    • CMake configuration to build it
    • ❗️ Update required CMake version to 3.0 or greater, since libuv requires it. (Require cmake 3 #571)
  • Rewrite IOSource to use the libuv API internally
    • Support two types of sources
      • Non-selectable file descriptor (ex: pcap file)
      • Selectable file descriptor (ex: pcap interface)
    • Rewrite pcap source as proof-of-concept
  • Rework existing internal sources to use the new sources
  • Rework existing external IO sources to use the new API (@sethhall sent me a list of the majority of them)
  • Reimplement packet dumping
  • Loop in main
    • Replace with a call to uv_run
    • Rework setup/teardown of IO sources
  • Rework DNS module to use uv methods
  • Rework Timers module to use uv methods
  • Benchmarking
    • @rsmmr said that he talked to Ram about using IXIA for this. I need to follow up.
    • Need a baseline measurement
    • Continual benchmarking throughout the process of rework

Priority list:

  • Benchmarking setup and a baseline measurement
  • Git submodule for libuv
  • Main IO loop
  • Inserting old sources into main loop
  • New IO source API
  • Reworking old sources to new API
  • Packet dumping
  • Timers
  • DNS

@timwoj
Copy link
Contributor

timwoj commented Aug 28, 2019

Another issue with using libuv that we need to resolve is that they require CMake 3.x. The current limiting factor for this is CentOS 7, which comes with 2.8.12 by default. CMake 3 is available via the cmake3 package. We'll need to bump our required version of CMake for zeek.

@rsmmr
Copy link
Member

rsmmr commented Sep 9, 2019

Updating to cmake3 seems reasonable, that's been our for 5 years. I'll create a separate ticket for that.

@rsmmr rsmmr mentioned this issue Sep 9, 2019
@timwoj
Copy link
Contributor

timwoj commented Sep 16, 2019

Some observations from a week of working with libuv:

Poll timeout is set to zero if we use uv_idle handles, which causes the loop to just spin forever and use 100% of the CPU. If not using uv_idle handles and with the default UV_RUN_DEFAULT mode, the poll timeout is either infinite or the time that the next uv_timer handle will fire. The easy solution is to just use uv_prepare handles everywhere instead of uv_idle handles, but that causes the poll to just hang forever if there's nothing to be read on any of the actual file descriptors.

This is fine for things that have real file descriptors, since the poll will block for packet sources that are interfaces and for broker, or it will fire based on a timer being ready. The existing thread manager code doesn't return any file descriptors and so it currently doesn't have a say in when select would return, and the same would continue. The threading manager will be called with every loop, but only when something else wakes up the poll.

I haven't dug super-far into how the existing timers work yet. libuv timers fire based on a millisecond timeout from when uv_timer_start is called, and can optionally be scheduled to repeat some number of milliseconds as well. Right now I'm assuming I can work our existing timers into that sort of architecture, but I'll likely get to that this week.

The real problem with the looping is in the case of pcap files. With pcap files, we actually want the loop to spin as fast as it can, only it will ignore packet data when in psuedo-realtime mode and the timestamp hasn't been met yet. I'm considering just making this a uv_idle handle and letting it spin.

See http://docs.libuv.org/en/v1.x/design.html#the-i-o-loop for more info on how the poll loop works.

@timwoj
Copy link
Contributor

timwoj commented Sep 16, 2019

The things I haven't really dug into yet:

  • Timers
  • DNS. Right now I think I can replace whole swaths of DNS_Mgr with calls to uv_getaddrinfo and uv_getnameinfo, but keep all of the name caching that exists.

@jsiwek jsiwek added this to Unassigned / Todo in Release 3.1.0 via automation Sep 19, 2019
@jsiwek jsiwek added this to the 3.1.0 milestone Sep 19, 2019
@jsiwek jsiwek moved this from Unassigned / Todo to Assigned / In Progress in Release 3.1.0 Sep 19, 2019
@rsmmr rsmmr moved this from Assigned / In Progress to Pending Review / Merge in Release 3.1.0 Jan 23, 2020
@jsiwek
Copy link
Contributor Author

jsiwek commented Jan 31, 2020

This is done via #681

@jsiwek jsiwek closed this as completed Jan 31, 2020
Release 3.1.0 automation moved this from Pending Review / Merge to Done Jan 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Complexity: Modest A cup of tea and an evening (or two) with Zeek. Type: Enhancement Type: Project A self-contained project — for example an intern project, a tech evaluation, or prototyping
Projects
No open projects
Release 3.1.0
  
Done
Development

No branches or pull requests

3 participants