Skip to content

Support parsing of concatenated PCAPs #3513

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Sep 15, 2023
Merged

Support parsing of concatenated PCAPs #3513

merged 17 commits into from
Sep 15, 2023

Conversation

mavam
Copy link
Member

@mavam mavam commented Sep 14, 2023

This PR equips the pcap parser with the capability to parse a stream of concatenated PCAPS.

This means that you can now do the following:

cat *.pcap | tenzir 'read pcap'

@mavam mavam added the format Parser and printer label Sep 14, 2023
@mavam mavam force-pushed the topic/micro-pcaps branch 2 times, most recently from 0442bd0 to 978de0b Compare September 14, 2023 14:24
@netantho
Copy link
Contributor

I know it's still in draft, but some very early testing, some compilation error on my side

$ docker build . -f Dockerfile --target tenzir-node-de
[...]
FAILED: libtenzir/CMakeFiles/libtenzir_builtins.dir/builtins/formats/pcap.cpp.o                                                              
/usr/bin/g++-12 -DBOOST_ATOMIC_DYN_LINK -DBOOST_ATOMIC_NO_LIB -DBOOST_FILESYSTEM_DYN_LINK -DBOOST_FILESYSTEM_NO_LIB -DFMT_SHARED -DSIMDJSON_T
HREADS_ENABLED=1 -DSPDLOG_COMPILED_LIB -DSPDLOG_FMT_EXTERNAL -DSPDLOG_SHARED_LIB -DTENZIR_ENABLE_BUILTINS -DXXH_ACCEPT_NULL_INPUT_POINTER=1 -
DXXH_INLINE_ALL=1 -I/tmp/tenzir/build/libtenzir/include -I/tmp/tenzir/libtenzir/include -isystem /tmp/tenzir/libtenzir/aux/caf/libcaf_core -i
system /tmp/tenzir/build/libtenzir/aux/caf/libcaf_core -isystem /tmp/tenzir/libtenzir/aux/caf/libcaf_io -isystem /tmp/tenzir/build/libtenzir/
aux/caf/libcaf_io -isystem /tmp/tenzir/libtenzir/aux/caf/libcaf_openssl -isystem /tmp/tenzir/build/libtenzir/aux/caf/libcaf_openssl -isystem 
/tmp/tenzir/build/libtenzir/libtenzir-fbs/include -isystem /tmp/tenzir/libtenzir/aux/simdjson/include -O3 -DNDEBUG -ftemplate-backtrace-limit
=0 -fno-omit-frame-pointer -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -Wno-unknown-warning -Wall -Wextra -pedantic -Werror=switch -Werror=
odr -Wundef -Wno-redundant-move -Wno-error=bool-compare -fdebug-prefix-map=/tmp/tenzir=. -std=gnu++20 -MD -MT libtenzir/CMakeFiles/libtenzir_
builtins.dir/builtins/formats/pcap.cpp.o -MF libtenzir/CMakeFiles/libtenzir_builtins.dir/builtins/formats/pcap.cpp.o.d -o libtenzir/CMakeFile
s/libtenzir_builtins.dir/builtins/formats/pcap.cpp.o -c /tmp/tenzir/libtenzir/builtins/formats/pcap.cpp                                      
In file included from /tmp/tenzir/libtenzir/include/tenzir/detail/logger.hpp:19,                                                             
                 from /tmp/tenzir/libtenzir/include/tenzir/logger.hpp:42,                                                                    
                 from /tmp/tenzir/libtenzir/include/tenzir/pattern.hpp:13,                                                                   
                 from /tmp/tenzir/libtenzir/include/tenzir/data.hpp:20,                                                                      
                 from /tmp/tenzir/libtenzir/include/tenzir/concept/convertible/data.hpp:16,                                                  
                 from /tmp/tenzir/libtenzir/include/tenzir/argument_parser.hpp:11,                                                           
                 from /tmp/tenzir/libtenzir/builtins/formats/pcap.cpp:9:                                                                     
/tmp/tenzir/libtenzir/builtins/formats/pcap.cpp: In instantiation of 'tenzir::plugins::pcap::{anonymous}::pcap_parser::instantiate(tenzir::ge
nerator<caf::intrusive_ptr<tenzir::chunk> >, tenzir::operator_control_plane&) const::<lambda(auto:161&, tenzir::generator<caf::intrusive_ptr<
tenzir::chunk> >, bool)> [with auto:161 = tenzir::operator_control_plane]':                                                                  
/tmp/tenzir/libtenzir/builtins/formats/pcap.cpp:319:16:   required from here                                                                 
/tmp/tenzir/libtenzir/builtins/formats/pcap.cpp:267:38: error: cannot bind packed field 'packet.tenzir::pcap::packet_record::header.tenzir::p
cap::packet_header::captured_packet_length' to 'unsigned int&'                                                                               
  267 |                        packet.header.captured_packet_length);                                                                        
      |                        ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~                                                                          
/tmp/tenzir/libtenzir/builtins/formats/pcap.cpp:266:11: note: in expansion of macro 'TENZIR_DEBUG'                  
  266 |           TENZIR_DEBUG("reading packet data of size {}",                                                                             
      |           ^~~~~~~~~~~~                                                                                                               
cc1plus: note: unrecognized command-line option '-Wno-unknown-warning' may have been intended to silence earlier diagnostics

@mavam
Copy link
Member Author

mavam commented Sep 14, 2023

@netantho fixed now. Turns out GCC treats packed structs differently.

@netantho
Copy link
Contributor

@netantho fixed now. Turns out GCC treats packed structs differently.

Thanks, indeed fixed. Still crunching the numbers but early results suggest a nice performance boost, thanks so much @mavam and team! 🙏

@mavam mavam marked this pull request as ready for review September 15, 2023 08:16
@mavam
Copy link
Member Author

mavam commented Sep 15, 2023

@dominiklohmann this is ready from a functional perspective. I'm still getting hangs in the executor (which you already know). Here's what causes the hang for me:

cat tenzir/integration/data/pcap/vlan-* > /tmp/concatenated-trace.pcap
tenzir 'read pcap -e' < /tmp/concatenated-trace.pcap

Other than that, it looks like @netantho got this working already.

@mavam mavam added the improvement An incremental enhancement of an existing feature label Sep 15, 2023
@mavam
Copy link
Member Author

mavam commented Sep 15, 2023

This works flawlessly now after c3d4064.

@mavam mavam enabled auto-merge September 15, 2023 14:11
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me and this works well in practice now.

@netantho
Copy link
Contributor

netantho commented Sep 15, 2023

Code looks good to me and this works well in practice now.

Confirming that it works, even another ~30% perf gained with this afternoon's commits. Command tested:

cat *.pcap | /usr/bin/docker run --network=host --mount type=bind,source=`pwd`,target=/mnt -i $1 --config=/mnt/tenzir.yaml "read pcap | extend provenance_provider=\"$provider\", provenance_region=\"$region\", provenance_sensor=\"$sensor\" | import"

I'm seeing the number of events increased with:

/usr/bin/docker run --network=host --mount type=bind,source=`pwd`,target=/mnt -i 206b21c05cbd "export | summarize count(.)"

Also seeing expected output with:

/usr/bin/docker run --network=host --mount type=bind,source=`pwd`,target=/mnt -i 206b21c05cbd "export | head"

@mavam mavam merged commit 6a83168 into main Sep 15, 2023
@mavam mavam deleted the topic/micro-pcaps branch September 15, 2023 19:03
@netantho
Copy link
Contributor

netantho commented Sep 17, 2023

That was a great perf improvement PR: Seeing 2x improvement on a set of 2-15MB pcap files, and fewer stuck pcap import pipelines (maybe even disappeared at one import thread at a time)

@dominiklohmann
Copy link
Member

dominiklohmann commented Sep 17, 2023

fewer stuck pcap import pipelines (maybe even disappeared at one import thread at a time)

That should be gone entirely as of c3d4064 which I pushed onto this PR. It was a really stupid regression that I introduced shortly before we released v4.1.0.

In hindsight, we probably should add a separate changelog entry to document the bug fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format Parser and printer improvement An incremental enhancement of an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants