Make low-level analyzers pluggable #248

J-Gras · 2019-01-21T08:45:13Z

Implement a plugin interface for low-level analyzers like GOOSE (see #76). Requirements will be collected on zeek-dev mailinglist.

rsmmr · 2019-06-13T14:38:33Z

Work on this is going to start soon at KIT.

rsmmr · 2020-01-09T18:25:58Z

@J-Gras what's the state here?

J-Gras · 2020-01-10T15:35:21Z

The interface design and benchmarks are completed while the implementation is work in progress. The master thesis should be finished end of January.

timwoj · 2020-04-08T16:02:49Z

@J-Gras Are there any updates on this?

J-Gras · 2020-04-15T15:15:31Z

The thesis is finished and a POC can be found here: https://github.com/peet1993/zeek/tree/llpoc
So far, there is no support for tunneling.

timwoj · 2020-04-17T22:36:06Z

I rebased and merged the entire thing onto a local branch on the main repo, under topic/timw/peet1993-llpoc. I'll dig into it next week.

timwoj · 2020-04-28T00:43:02Z

Here's my notes on the code. I have a lot of questions.

TL;DR: The overall design seems good. It follows pretty closely how we do plugins, especially the L3 analyzers. It just has some quirks that make it not work currently, and places where the code can be cleaned up.

I marked the one big question mark where I think it all falls apart.

LL Analyzers can be instantiated as plugins.
- An initial set comes built-in to zeek that includes Ethernet, ARP, Ethernet, PPPOE, IP4, IP6, ICMP and ICMP6
  - I’m not a huge fan of all of this being sort of outside of the existing code in a plugin with a separate build directory, configure, etc. They’re alongside the hooks-plugin code in src/plugins, but is that the right place for them? Ideally we could merge them in like we do with the L3 protocol analyzers.
  - The Ethernet plugin looks like it duplicates a bunch of code from Packet::Layer2() but that function still exists with no changes on the branch. I think the intention is for the llanalyzers to replace that code entirely.
- These are registered in llanalyzer::Manager via the normal plugin/component system
llanalyzer::Manager is the main dispatch for llanalyzers.
- iosource::Packet calls llanalyzer::Manager::processPacket() to unpack the layer 2 header off the packet
- By the time the code has passed through the llanalyzer tree it should be fully setup to go into the L3 analyzers, similar to how Packet::ProcessLayer2() functioned.
- Manager contains a ProtocolAnalyzerSet, which is initialized after scripts by a configuration that defines a tree of protocols.
- Manager is mostly just a wrapper around the PAS beyond registering/enabling/disabling the analyzer plugins
- Manager::processPacket() takes a packet, looks up the next layer’s type, asks PAS for the analyzer associated with that ID, and calls analyze on it. It then repeats this until it runs out of headers to process.
llanalyzer:ProtocolAnalyzerSet is where the actual work takes place.
- PAS is initialized via a llanalyzer::Config object by llanalyzer::Manager::InitPostScript()
  - The config contains a tree of mappings between an identifier (usually the link type from the packet?) and the string name of an analyzer plugin.
  - Currently, the protocol configuration tree is hard-coded. If someone adds a new protocol via another plugin, how does it get added to the tree?
  - The PAS constructor instantiates any protocol analyzers needed based on the definitions in the config object and stores a pointer to the root object.
- ProtocolAnalyzerSet::getDispatcher creates a mapping between a dispatcher name (a plugin name or “ROOT” for the root configuration) and a tree of Dispatcher objects for all of the configuration tree underneath that name.
  - This is initialized during the PAS constructor by calling it with the “ROOT” dispatcher.
    - ⭐ This part looks broken. It causes an InternalWarning when it tries to look up the ROOT dispatcher, but then the tree never gets built underneath it. I added some logging to the dispatcher's Register method and it never prints anything.
  - There’s a number of implementations for this Dispatcher object, but the one used hard coded at compile time. Was the intention to implement them all and then find the best one and remove the rest?
  - This looks like the third copy of this tree in some way. Can we merge these all together somehow?
  - This dispatcher object is used by PAS::dispatch to walk down through the tree of analyzers that need to be called by Manager::processPacket().
General notes
- Is there any reason why Manager and ProtocolAnalyzerSet can’t be merged into a single class? I get the reasoning to separate the areas of concern.
- The configuration object and how that tree is built is a gigantic question mark. How do people add new protocols outside of editing the core Zeek code to update that tree?
- Right now this is failing about half of the btests. Even just calling Zeek manually on $TRACES/http/get.trace fails, because Manager::processPacket() never finds an analyzer for any packets.
- I haven’t dug into the Dispatcher classes at all mostly because I don’t understand the reasons why all of them exist. The code right now is using UniversalDispatcher which seems to use some sort of custom-built hash table? Can all of this just be replaced with our internal Dict object? For that matter it looks like all of these are keyed on the identifier, which is the link type from the packet. Are we expecting a large number of collisions on these keys?
- This code needs a formatting pass to match the Zeek style guide but that’s a much smaller concern.

J-Gras · 2020-04-28T15:59:50Z

Thanks a lot for having a look! I will try to explain some of the design aspects.

 * LL Analyzers can be instantiated as plugins.
   
   * An initial set comes built-in to zeek that includes Ethernet, ARP, Ethernet, PPPOE, IP4, IP6, ICMP and ICMP6
     
     * I’m not a huge fan of all of this being sort of outside of the existing code in a plugin with a separate build directory, configure, etc. They’re alongside the hooks-plugin code in src/plugins, but is that the right place for them? Ideally we could merge them in like we do with the L3 protocol analyzers.

I agree. A set of common analyzers should be contained in zeek. That should be done similar to the static plugins at application layer.

     * The Ethernet plugin looks like it duplicates a bunch of code from Packet::Layer2() but that function still exists with no changes on the branch. I think the intention is for the llanalyzers to replace that code entirely.

I'll leave the details to Peter. Likely, the branch still contains both paths, as Peter did some measurements to compare his architecture and the old path.

   * These are registered in llanalyzer::Manager via the normal plugin/component system

 * llanalyzer::Manager is the main dispatch for llanalyzers.
   
 ...

 * llanalyzer:ProtocolAnalyzerSet is where the actual work takes place.
   
   * PAS is initialized via a llanalyzer::Config object by llanalyzer::Manager::InitPostScript()
     
     * The config contains a tree of mappings between an identifier (usually the link type from the packet?) and the string name of an analyzer plugin.

The architecture is designed to support multiple layers and tunneling (without maintaining the context so far). Figure 4.3 b) in the thesis shows an example.

     * Currently, the protocol configuration tree is hard-coded. If someone adds a new protocol via another plugin, how does it get added to the tree?

Here the thesis ran out of time. In the thesis an approach using dedicated configuration files is explained. Another option would be to use the script-land.

     * The PAS constructor instantiates any protocol analyzers needed based on the definitions in the config object and stores a pointer to the root object.
   * ProtocolAnalyzerSet::getDispatcher creates a mapping between a dispatcher name (a plugin name or “ROOT” for the root configuration) and a tree of Dispatcher objects for all of the configuration tree underneath that name.
     
     * This is initialized during the PAS constructor by calling it with the “ROOT” dispatcher.
       
       * star This part looks broken. It causes an `InternalWarning` when it tries to look up the ROOT dispatcher, but then the tree never gets built underneath it. I added some logging to the dispatcher's `Register` method and it never prints anything.

That must be a path issue. I had to install zeek and the plugin to make it work. I will have a look to see whether I can clean this up.

     * There’s a number of implementations for this Dispatcher object, but the one used hard coded at compile time. Was the intention to implement them all and then find the best one and remove the rest?

Correct. That was the research part of the thesis. The document contains the numbers and elaborates on the setup of Peter's benchmarks.

     * This looks like the third copy of this tree in some way. Can we merge these all together somehow?

Here I cannot follow. What would be the three copies?

     * This dispatcher object is used by PAS::dispatch to walk down through the tree of analyzers that need to be called by Manager::processPacket().

 * General notes
   
   * Is there any reason why Manager and ProtocolAnalyzerSet can’t be merged into a single class? I get the reasoning to separate the areas of concern.

Again, I'll leave that to Peter.

   * The configuration object and how that tree is built is a gigantic question mark. How do people add new protocols outside of editing the core Zeek code to update that tree?

Unfortunately, this hasn't been implemented so far, as described above. The general idea is: Zeek looks for analyzers in a special path, each analyzer comes with a configuration file, which is loaded by zeek and the tree is build based on that (Section 4.3).

   * Right now this is failing about half of the btests. Even just calling Zeek manually on $TRACES/http/get.trace fails, because Manager::processPacket() never finds an analyzer for any packets.

I guess this needs a major cleanup. To get the POC running I installed zeek and the demo analyzer that wraps the original stack. To make btest work, paths need to be handled properly.

   * I haven’t dug into the Dispatcher classes at all mostly because I don’t understand the reasons why all of them exist. The code right now is using UniversalDispatcher which seems to use some sort of custom-built hash table? Can all of this just be replaced with our internal Dict object? For that matter it looks like all of these are keyed on the identifier, which is the link type from the packet. Are we expecting a large number of collisions on these keys?

This is thoroughly discussed in the thesis. The architecture was designed to support multiple layers, hence the tree. For each protocol, we expected a sparse set of identifiers to be dispatched. The thesis evaluated different data structures with respect to time complexity and memory consumption. Although we are well aware of the root of all evil, this was the research part of the thesis.

   * This code needs a formatting pass to match the Zeek style guide but that’s a much smaller concern.

timwoj · 2020-04-28T23:01:23Z

I'll admit I hadn't read the thesis yet and dove straight into the code from the technical side. Let me read through it today and I'll get back to my comments tomorrow.

J-Gras · 2020-04-29T17:23:12Z

I think Peter moved the original zeek code for processing lower layers in the Ethernet llanalyzer for testing purposes. I moved that code into zeek like it's done for the application layer analyzers and renamed it to WrapperAnalyzer. This can be found in https://github.com/J-Gras/zeek/tree/llpoc. Let me know if that works for you.

poettig · 2020-05-03T16:34:27Z

Hey everyone,

first of all, sorry that it took me so long to answer, I didn't get around to it earlier. I hope I can clear up all questions. Most of them are already answered precisely by Jan, so not as much left for me :)

I’m not a huge fan of all of this being sort of outside of the existing code in a plugin with a separate build directory, configure, etc. They’re alongside the hooks-plugin code in src/plugins, but is that the right place for them? Ideally we could merge them in like we do with the L3 protocol analyzers.

To be honest, I'm not either. Initially, I analyzed how you are doing plugins already and tried build on that. I copied the hooks-plugin example from the existing code, checked how it works and modified it to work for my use case. This results in all plugins being dynamic and being built separately, like a plugin author would to. Included plugins should definitely be static, but I didn't invest the time to change that as time was scarce already. I like the example implementation that Jan posted a few days ago, this should also solve the problem that building the PAS didn't work for you.

The Ethernet plugin looks like it duplicates a bunch of code from Packet::Layer2() but that function still exists with no changes on the branch. I think the intention is for the llanalyzers to replace that code entirely.

As Jan already said, it is really just a copy of what Zeek is already doing internally. I did that to analyze how usage of the plugin interface compared to internal code affects the monitor performance by running the layer 2 analysis once with the internal code and once with the plugin-based code. In both approaches, the packet was handed to the internal layer 3 analysis of Zeek afterwards. I found that the impact of the plugin interface is measureable but small for this simple case. You can find the details in section 6.2 of the thesis.

By the time the code has passed through the llanalyzer tree it should be fully setup to go into the L3 analyzers, similar to how Packet::ProcessLayer2() functioned.

That is not entirely correct. The code is flexible enough to allow pluggable layer 3 analyzers as well, so the internal layer 3 analysis of Zeek can be replaced by plugins as well. This also makes it possible to make the included ARP analyzer better integrated into the analysis flow, right now it is basically treated as a special case during layer 3 protocol detection (Sessions.cc:172). I demonstrated this with a PPP-session trace, producing a logfile of all metadata for each packet (MACs, IP-Addresses etc.) using the analyzers provided in the llpoc-analyzers plugin. These analyzers output information into the debug.log if "-B llpoc" is enabled, it looks like this:

579531442.294118/1588521771.243316 [llpoc] Analyzing packet 12, ts=1579531442.302...
1579531442.294118/1588521771.243321 [llpoc] Found Ethernet layer: ETH_TYPE=0x8864, SRC_MAC=f8:f2:1e:47:16:60, DST_MAC=e0:28:6d:83:0d:72
1579531442.294118/1588521771.243325 [llpoc] Found PPPOE layer! Next is 0x21
1579531442.294118/1588521771.243328 [llpoc] Found IPv4 layer: SRC_IP=91.123.103.200, DST_IP=217.160.127.200
1579531442.294118/1588521771.243331 [llpoc] Done, last found layer identifier was 0x6.
1579531442.301816/1588521771.243355 [llpoc] Analyzing packet 13, ts=1579531442.302...
1579531442.301816/1588521771.243360 [llpoc] Found Ethernet layer: ETH_TYPE=0x8864, SRC_MAC=f8:f2:1e:47:16:60, DST_MAC=e0:28:6d:83:0d:72
1579531442.301816/1588521771.243364 [llpoc] Found PPPOE layer! Next is 0x21
1579531442.301816/1588521771.243367 [llpoc] Found IPv4 layer: SRC_IP=91.123.103.200, DST_IP=217.160.127.200
1579531442.301816/1588521771.243370 [llpoc] Done, last found layer identifier was 0x6.

To make this work, you'll have to uncomment all "configuration.addMapping" lines in llanalyzer::Manager and swap the commented-out method "Ethernet::analyze" (Ethernet.cc:180) with the currently-not-commented-out one.
The code also theoretically supports layer 4 analyzers, but that was outside of scope for the thesis.

This part looks broken. It causes an InternalWarning when it tries to look up the ROOT dispatcher, but then the tree never gets built underneath it. I added some logging to the dispatcher's Register method and it never prints anything.

This has to be a path issue as Jan already mentioned. This process worked for me:

Build Zeek
Run ./configure --zeek-dist=<path_to_zeek_root> inside of plugins/llpoc-analyzers
Build the plugin .so
Add plugin path to the zeek environment variables and run

The reason you are only seeing a warning instead of an error and Zeek refusing to run is that I forgot to check if the created dispatching tree actually contains a root analyzer, the code only checks if there was a dispatcher object created (ProtocolAnalyzerSet.cc:23). When there are no analyzers (because the plugin cannot be found), a dispatching object is created without any content. Sorry about that :(

This looks like the third copy of this tree in some way. Can we merge these all together somehow?

I only see two "copies": The Config object containing the configured mapping and the PAS, which is created from the configuration. It therefore also is not really a copy, as the Config object is just an internal representation of the user-provided configuration and the PAS is the actually usable system with dispatchers. The PAS gets recursively created from the configuration. Can you elaborate what you mean?

Is there any reason why Manager and ProtocolAnalyzerSet can’t be merged into a single class? I get the reasoning to separate the areas of concern.

These are separated classes because I derived my code from the already existing code for application analyzer plugins which uses the Manager-based approach. I actually noticed that the Manager doesn't really do much in this context, but found it useful to have a separation between "machine" and "operator" if that analogy makes sense. The Manager builds the PAS from the configuration and then "uses" the PAS (which is basically a finite state machine) for the dispatching-and-analysis process through processPacket().

The configuration object and how that tree is built is a gigantic question mark. How do people add new protocols outside of editing the core Zeek code to update that tree?

Yeah, as Jan said, implementation of the configuration was sadly not possible because of time constraints. I can work something out if this helps, but it is probably a good idea to talk about the concept I worked out first, if you like it or if you would handle it differently.

This code needs a formatting pass to match the Zeek style guide but that’s a much smaller concern.

It does :) I stuck to the formatting I was used to.

I hope this could clear up some of the confusion. I'm now also part of the Zeek Slack, if you have any quick questions just shoot me a DM (Peter Oettig) :)

timwoj · 2020-05-05T19:07:57Z

Thanks for the updates @peet1993! @rsmmr and I talked about this for a while on our call last week, and I think the first steps for this are:

Merge this into a new branch on the zeek repo and rebase it on top of master (already done).
Get the plugin integrated as part of the baseline code.
Work on integrating llanalyzer into the existing analyzer chain (probably inserting it into net_packet_dispatch) and calling the existing analyzers from it where it makes sense.

From there I can work on how to rework the configuration tree stuff to be automatically built as analyzers are added, which seems to be the bulk of the work left here.

J-Gras · 2020-05-05T22:16:03Z

Get the plugin integrated as part of the baseline code.

In case you mean making it a "static plugin", this commit moves the existing code to a static one.

Work on integrating llanalyzer into the existing analyzer chain (probably inserting it into net_packet_dispatch) and calling the existing analyzers from it where it makes sense.

Am I right, that the first step would be to switch to the dynamic interface without introducing additional features? One example for a "new feature" might be how to deal with information collected by ll-analyzers. For example, at the moment L2 addresses are handed to the upper layers in a way that might not work for new analyzers.

From there I can work on how to rework the configuration tree stuff to be automatically built as analyzers are added, which seems to be the bulk of the work left here.

I think all we need is a table per ll-analyzer. I looked into implementing this using the configuration framework. In case the "include mechanism" that was mentioned in the thesis is not of high priority, this should be straight forward.

timwoj · 2020-05-05T22:26:28Z

In case you mean making it a "static plugin", this commit moves the existing code to a static one.

No, I mean moving the plugin code out of plugins/llpoc-plugin into the src/llanalyzer directory so that it builds as part of the zeek build instead of as a separate step. I've actually already done this in the last hour or so and can confirm that it works correctly in that I can process (at the very least) a bit of HTTP traffic and it all analyzes correctly. My intention for doing this is that the "base set" of protocols (the ones currently provided as part of the plugin) should be part of the Zeek code and not something separate.

Am I right, that the first step would be to switch to the dynamic interface without introducing additional features? One example for a "new feature" might be how to deal with information collected by ll-analyzers. For example, at the moment L2 addresses are handed to the upper layers in a way that might not work for new analyzers.

You're probably right here. The real trick is going to be sorting out how to go from, for example, the IP4 llanalyzer down into the Session code, which is effectively the front-end for the current set of L3 analyzers. The main goal I think is to end up with something where llanalyzer::Manager::Process() is called directly from net_packet_dispatch, and everything is handled from there.

As a starting point, just making the IP4/IP6 analyzers call the Sessions analyzers directly is easiest to prove it works, and then go from there in adding Sessions to the analyzer chain directly.

I think all we need is a table per ll-analyzer. I looked into implementing this using the configuration framework. In case the "include mechanism" that was mentioned in the thesis is not of high priority, this should be straight forward.

Yep, shouldn't be too bad.

J-Gras · 2020-05-05T22:52:43Z

In case you mean making it a "static plugin", this commit moves the existing code to a static one.

No, I mean moving the plugin code out of plugins/llpoc-plugin into the src/llanalyzer directory so that it builds as part of the zeek build instead of as a separate step. I've actually already done this in the last hour or so and can confirm that it works correctly in that I can process (at the very least) a bit of HTTP traffic and it all analyzes correctly. My intention for doing this is that the "base set" of protocols (the ones currently provided as part of the plugin) should be part of the Zeek code and not something separate.

That's what I did? I guess my wording is just confusing. I moved the plugins to src/llanalyzer/protocol and made them build as part of zeek. I call this "static plugin".

timwoj · 2020-05-05T22:57:03Z

That's what I did? I guess my wording is just confusing. I moved the plugins to ´src/llanalyzer/protocol´ and made them build as part of zeek. I call this "static plugin".

I looked at the commit and it basically just moved the code from ProcessLayer2() into a plugin instead of moving the LLPOC code over. After our chat on Slack I see now that the LLPOC doesn't actually implement all of ProcessLayer2(). There's bits missing, like mpls support.

Let's discuss this more tomorrow on Slack and see if we can nail down a good path forward for all of it.

lealog · 2021-03-09T16:45:15Z

Does it mean that 802.11 is already supported?

timwoj · 2021-03-09T16:48:39Z

@lealog There aren't packet analyzers written for 802.11 yet, but the framework merged in this work enables someone to write them. Unfortunately, the core Zeek team probably doesn't have the bandwidth work on those right now. It'd be an interesting project for someone from the community to pick up though.

jsiwek added Type: Enhancement Area: Plugins Area: Protocol Analysis labels Jan 22, 2019

jsiwek added the Type: Project A self-contained project — for example an intern project, a tech evaluation, or prototyping label Feb 6, 2019

rsmmr assigned J-Gras Jun 13, 2019

timwoj mentioned this issue Apr 13, 2020

Support for Non-IP protocols #910

Closed

timwoj self-assigned this May 5, 2020

timwoj mentioned this issue May 6, 2020

Define coding style and a clang-format config #386

Closed

timwoj added this to the 3.2.0 milestone Jul 1, 2020

timwoj added this to Unassigned / Todo in Release 3.2.0 via automation Jul 1, 2020

J-Gras mentioned this issue Jul 3, 2020

Pluggable low-level analyzers #1046

Merged

4 tasks

timwoj linked a pull request Jul 3, 2020 that will close this issue

Pluggable low-level analyzers #1046

Merged

4 tasks

rsmmr removed this from Unassigned / Todo in Release 3.2.0 Jul 17, 2020

rsmmr added this to Unassigned / Todo in Release 4.0.0 via automation Jul 17, 2020

rsmmr modified the milestones: 3.2.0, 4.0.0 Jul 17, 2020

jsiwek moved this from Unassigned / Todo to Assigned / In Progress in Release 4.0.0 Aug 10, 2020

rsmmr mentioned this issue Aug 12, 2020

Move UDP/TCP/ICMP to packet analyzers #1114

Closed

timwoj mentioned this issue Aug 18, 2020

Move IP handling to Packet Analysis architecture #1131

Closed

timwoj closed this as completed in #1046 Sep 23, 2020

Release 4.0.0 automation moved this from Assigned / In Progress to Done Sep 23, 2020

philrz mentioned this issue Jun 30, 2021

Seeking guidance for users that want to analyze 802.11 #1641

Closed

lealog mentioned this issue Jul 1, 2021

802.11 protocol analysis brimdata/brimcap#107

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make low-level analyzers pluggable #248

Make low-level analyzers pluggable #248

J-Gras commented Jan 21, 2019

rsmmr commented Jun 13, 2019

rsmmr commented Jan 9, 2020

J-Gras commented Jan 10, 2020

timwoj commented Apr 8, 2020 •

edited

Loading

J-Gras commented Apr 15, 2020

timwoj commented Apr 17, 2020

timwoj commented Apr 28, 2020

J-Gras commented Apr 28, 2020

timwoj commented Apr 28, 2020

J-Gras commented Apr 29, 2020

poettig commented May 3, 2020

timwoj commented May 5, 2020

J-Gras commented May 5, 2020

timwoj commented May 5, 2020

J-Gras commented May 5, 2020 •

edited

Loading

timwoj commented May 5, 2020

lealog commented Mar 9, 2021

timwoj commented Mar 9, 2021

Make low-level analyzers pluggable #248

Make low-level analyzers pluggable #248

Comments

J-Gras commented Jan 21, 2019

rsmmr commented Jun 13, 2019

rsmmr commented Jan 9, 2020

J-Gras commented Jan 10, 2020

timwoj commented Apr 8, 2020 • edited Loading

J-Gras commented Apr 15, 2020

timwoj commented Apr 17, 2020

timwoj commented Apr 28, 2020

J-Gras commented Apr 28, 2020

timwoj commented Apr 28, 2020

J-Gras commented Apr 29, 2020

poettig commented May 3, 2020

timwoj commented May 5, 2020

J-Gras commented May 5, 2020

timwoj commented May 5, 2020

J-Gras commented May 5, 2020 • edited Loading

timwoj commented May 5, 2020

lealog commented Mar 9, 2021

timwoj commented Mar 9, 2021

timwoj commented Apr 8, 2020 •

edited

Loading

J-Gras commented May 5, 2020 •

edited

Loading