Improve json parser, add null type, and various fixes #3503

Merged
merged 15 commits into from
Sep 21, 2023

jachris
Contributor

@jachris jachris commented Sep 11, 2023

Improve reliability

This PR improves the reliability of the JSON parser by swapping out one of its core components with another implementation. Previously, the parser could crash or produce erroneous output in some situations.

Improve accuracy

The new parser tries to parse all incoming events as accurately as possible, without sticky type promotion. For example, if a field is almost always a record, but an integer in a few other events, then the new parser yields events whose types match their input (either record or integer, depending on the event). The previous approach would have produced records up to the first batch that sees the integer, and after that always strings (due to promotion rules). Assuming that null fields are considered the same as non-existing fields, the new approach should always yield events that exactly match their input type (see footnote 1).
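To make the difference concrete, here is a minimal Python sketch of per-event typing. It is not the parser's actual code: `kind`, `schema`, and `batches` are hypothetical helpers, nested schemas are reduced to a coarse type name, and a new batch is simply started whenever the schema of consecutive events changes.

```python
from itertools import groupby


def kind(value):
    """Return a coarse type name for a parsed JSON value (hypothetical helper)."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "record"


def schema(event):
    """Schema of one event; null fields are treated like missing fields."""
    return tuple(sorted((k, kind(v)) for k, v in event.items() if v is not None))


def batches(events):
    """Group consecutive events that share a schema into one batch."""
    return [(s, list(group)) for s, group in groupby(events, key=schema)]


events = [
    {"x": {"a": 1}},
    {"x": {"a": 2}},
    {"x": 42},        # the rare integer
    {"x": {"a": 3}},  # back to a record, no promotion to string
]

result = batches(events)
for s, group in result:
    print(s, len(group))
```

In this toy model the integer event gets its own batch and the following record is typed as a record again, which is the non-sticky behavior described above.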

Improve performance

We will be benchmarking this PR against the status quo with this dataset and read suricata | summarize count(.) as our pipeline. Previously, this took around 15.7s on my machine. Now, we are done after 8.5s. That is more than 180% of the original event processing rate! Compare this to jq -s length eve.json, which takes 7.6s. Thus, we are only off by 10%, even though we arguably do a lot more work.
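The ratios behind these claims can be checked with a few lines of Python. Only the three timings come from the text above; the variable names are illustrative.

```python
old_s, new_s, jq_s = 15.7, 8.5, 7.6  # wall-clock seconds quoted above

# Events per second scale inversely with run time for the same input.
rate_ratio = old_s / new_s        # new rate relative to the old rate
slower_than_jq = new_s / jq_s     # new run time relative to jq
jq_time_saving = 1 - jq_s / new_s # how much shorter jq's run is

print(f"rate: {rate_ratio:.2f}x, vs jq: {slower_than_jq:.2f}x, jq saves {jq_time_saving:.1%}")
```

The new run reaches about 1.85 times the old processing rate. Relative to jq, the gap is about 11% to 12% depending on which run time is used as the baseline.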

Add --raw

Sometimes (depending on the input stream), the JSON parser produces many small batches due to frequent type switches. This PR also adds the --raw flag (initially built for testing purposes), which disables type inference for strings (which can normally be parsed as IP addresses, durations, etc.) and numbers (for which we generally use int64 or double). With --raw, we can parse documents where the JSON types of all events are compatible into a single batch. However, this means that, for example, IP addresses are parsed as a string instead of an ip.
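The effect of --raw on batching can be sketched in Python as follows. This is an illustration under assumptions, not the parser's logic: `infer_string` only recognizes IP addresses, the `"number"` kind is a placeholder (the description does not say which numeric type raw mode uses), and one batch is counted per run of equal types.

```python
import ipaddress


def infer_string(text):
    """Guess a richer type for a JSON string; only IP addresses are handled here."""
    try:
        ipaddress.ip_address(text)
        return "ip"
    except ValueError:
        return "string"


def field_type(value, raw=False):
    """Type of one JSON value, with or without inference (hypothetical helper)."""
    if isinstance(value, str):
        return "string" if raw else infer_string(value)
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        if raw:
            return "number"  # placeholder name, see lead-in
        return "int64" if isinstance(value, int) else "double"
    return type(value).__name__


def count_batches(values, raw=False):
    """Count runs of consecutive values with the same type (one batch per run)."""
    kinds = [field_type(v, raw) for v in values]
    return sum(1 for i, k in enumerate(kinds) if i == 0 or k != kinds[i - 1])


values = ["10.0.0.1", "example.org", "10.0.0.2", 1, 2.5]
print(count_batches(values), count_batches(values, raw=True))
```

With inference, every type switch (ip, string, ip, int64, double) starts a new batch. In raw mode the same values collapse into one run of strings and one run of numbers.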

Add null type

Although the null value is an inhabitant of every type (due to implicit nullability), there are situations where having a dedicated null type is useful, for example if an expression always evaluates to null (e.g., field does not exist). It is also necessary to faithfully represent empty lists (which were previously not parsed at all), which is why this is part of this PR.
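One way to picture the role of a dedicated null type is as the neutral element when combining types. The Python sketch below assumes that an empty list is typed as a list whose element type is null. The `list<null>` spelling and both helper names are assumptions for illustration.

```python
def unify(a, b):
    """Combine two type names; 'null' is compatible with every other type."""
    if a == "null":
        return b
    if b == "null":
        return a
    if a == b:
        return a
    raise TypeError(f"cannot unify {a} and {b}")


def list_type(element_types):
    """Type of a list given its element types; an empty list stays list<null>."""
    elem = "null"
    for t in element_types:
        elem = unify(elem, t)
    return f"list<{elem}>"


print(list_type([]))                          # list<null>
print(list_type(["int64", "null", "int64"]))  # list<int64>
```

Without a null type there is no element type to assign to `[]`, which is consistent with the statement that empty lists could previously not be represented.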

Allow empty records

Previously, empty records were neither parsed nor handled "correctly" by operators. For example, dropping all fields previously meant that the whole event was dropped instead of yielding an empty event. Furthermore, allowing them is also necessary to faithfully represent nested empty records.
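The change in observable behavior when dropping all fields can be modeled with plain dictionaries. This toy Python sketch only mirrors the description above; `drop` and `drop_previous` are hypothetical names, not operators from the code base.

```python
def drop(events, fields):
    """Remove the given fields; an event may end up as an empty record."""
    return [{k: v for k, v in e.items() if k not in fields} for e in events]


def drop_previous(events, fields):
    """Model of the old behavior: events left without any field disappear."""
    return [e for e in drop(events, fields) if e]


events = [{"a": 1, "b": 2}, {"a": 3}]
print(drop(events, {"a", "b"}))           # [{}, {}]
print(drop_previous(events, {"a", "b"}))  # []
```

The event count is preserved in the first case, so a downstream count still sees two events.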

Miscellaneous

  • Introduce type_kind as an alternative to the uint8 type index
  • Fix transform_columns, which previously transformed null records to non-null records with null fields (see diff)
  • Some other small fixes and improvements here and there
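The transform_columns fix concerns the difference between a null record and a non-null record whose fields are all null. The Python sketch below models a column of records as a list in which `None` stands for a null record. `transform_record_column` and its `preserve_null` switch are hypothetical and do not mirror the real function's signature.

```python
def transform_record_column(records, fields, fn, preserve_null=True):
    """Apply fn to every field of each record in a column of records."""
    out = []
    for rec in records:
        if rec is None:
            # A null record must stay null. With preserve_null=False this
            # reproduces the described bug: a record with only null fields.
            out.append(None if preserve_null else {name: None for name in fields})
        else:
            out.append({name: fn(rec.get(name)) for name in fields})
    return out


def double(value):
    """Example transformation that leaves null values untouched."""
    return None if value is None else value * 2


records = [{"a": 1, "b": 2}, None]
print(transform_record_column(records, ["a", "b"], double))
print(transform_record_column(records, ["a", "b"], double, preserve_null=False))
```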

Tasklist

  • Changelog
  • Converge with try_atom, atom, try_data, data (and perhaps try_record?)
  • Address remaining TODOs that are important enough
  • Small cleanup (and consider shrinking test output with 7k lines)
  • Create followup issue for now overloaded meaning of type::operator bool()
  • Create followup issue to look at transform_columns() with flatten() (this only contains a workaround)
  • Consider adding field to record even if value is always null

Footnotes

  1. There is an edge case where a list inside a single event contains mixed types, for example {"foo": [{"bar": []}, {"bar": 42}]}. Because there currently are no union types, this cannot be accurately represented. The implementation contains some special logic that casts the conflicting types into strings, such that we get {"foo": [{"bar": "[]"}, {"bar": "42"}]} for our example. But this behavior is limited to a single event, unlike previously.
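The string fallback from the footnote can be approximated in Python as shown below. This is a sketch of the idea only: `resolve_list_conflicts` is a hypothetical helper, conflicts are detected on coarse Python type names, and `json.dumps` stands in for whatever rendering the implementation uses.

```python
import json


def resolve_list_conflicts(records):
    """Within one list of records, stringify fields whose types disagree."""
    kinds = {}
    for rec in records:
        for name, value in rec.items():
            if value is not None:  # null is compatible with every type
                kinds.setdefault(name, set()).add(type(value).__name__)
    conflicted = {name for name, seen in kinds.items() if len(seen) > 1}
    return [
        {
            name: json.dumps(value) if name in conflicted and value is not None else value
            for name, value in rec.items()
        }
        for rec in records
    ]


event = {"foo": [{"bar": []}, {"bar": 42}]}
fixed = {"foo": resolve_list_conflicts(event["foo"])}
print(json.dumps(fixed))  # {"foo": [{"bar": "[]"}, {"bar": "42"}]}
```

Because the function only looks at the records of one list, a conflict in one event does not affect how later events are typed.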

@jachris jachris force-pushed the topic/series-builder branch from de14e1c to b16cad0 Compare September 11, 2023 11:10
@jachris jachris added the feature, bug, format, operator, and engine labels Sep 11, 2023
@jachris jachris changed the title Add series_builder, add null type, fix json parser and allow empty records Add series_builder, add null type, fix json parser and other small fixes Sep 11, 2023
@jachris jachris changed the title Add series_builder, add null type, fix json parser and other small fixes Add series_builder, add null type, fix json parser and other fixes Sep 11, 2023
@jachris jachris changed the title Add series_builder, add null type, fix json parser and other fixes Imrpove json parser, add null type, and various fixes Sep 11, 2023
@jachris jachris changed the title Imrpove json parser, add null type, and various fixes Improve json parser, add null type, and various fixes Sep 11, 2023
@jachris jachris force-pushed the topic/series-builder branch 2 times, most recently from a35cf10 to ad8a307 Compare September 11, 2023 11:49
@jachris jachris force-pushed the topic/series-builder branch from ad8a307 to c0ef1b0 Compare September 11, 2023 12:26
@jachris jachris force-pushed the topic/series-builder branch 8 times, most recently from c3f6897 to ab9dd40 Compare September 15, 2023 14:30
@jachris
Contributor Author

jachris commented Sep 15, 2023

Marking this as ready for review now. I'm not 100% happy with everything (in particular, there are still some small API issues remaining with regard to conversions and infallibility), but I think it's good enough for a review.

I will also review the diff myself once more and do some additional cleanup (see tasklist above).

@jachris jachris marked this pull request as ready for review September 15, 2023 14:32
@netantho
Contributor

Great perf improvement PR: Seeing 2x improvement on import time for Suricata eve.json file, thanks!

@netantho
Contributor

netantho commented Sep 17, 2023

Edit: Solved in latest commits.

Previous problem:

I'm getting a crash for another dataset. I can reproduce it reliably and can share the eve.json file privately.

Client logs:

# zstdcat "*/eve.json.zst" | /usr/bin/docker run --network=host --mount type=bind,source=`pwd`,target=/mnt -i $1 --config=/mnt/tenzir.yaml "read suricata | extend provenance_filename=\"eve.json\", provenance_provider=\"$provider\", provenance_region=\"$region\", provenance_sensor=\"$sensor\", ruleset_etfree_date=\"$ruleset_etfree_date\", suricata_version=\"$suricata_version\" | import"
[...]
[20:21:17.513] requesting field memuse of length 1 to finish and leave 0
[20:21:17.513] series builder got request to finish but leave 0
[20:21:17.513] requesting field memcap of length 1 to finish and leave 0
[20:21:17.513] series builder got request to finish but leave 0
[20:21:17.513] requesting field file_store of length 1 to finish and leave 0
[20:21:17.513] series builder got request to finish but leave 0
[20:21:17.513] finishing 1 records with 1 fields
[20:21:17.513] requesting field open_files of length 1 to finish and leave 0
[20:21:17.513] series builder got request to finish but leave 0
/tmp/tenzir/libtenzir/src/series_builder.cpp:1035: assertion failed 'builder_->kind().is<null_type>()'
0x7f4b0ecebaf5: (std::_Function_handler<void (caf::scheduled_actor*, caf::exit_msg&), caf::scheduled_actor::set_exit_handler<tenzir::shutdown<tenzir::policy::parallel>(caf::event_based_actor*, std::vector<caf::actor, std::allocator<caf::actor> >)::{lambda(caf::exit_msg const&)#1}>(tenzir::shutdown<tenzir::policy::parallel>(caf::event_based_actor*, std::vector<caf::actor, std::allocator<caf::actor> >)::{lambda(caf::exit_msg const&)#1})::{lambda(caf::scheduled_actor*, caf::exit_msg&)#1}>::_M_invoke(std::_Any_data const&, caf::scheduled_actor*&&, caf::exit_msg&)+0x1d95)
0x7f4b0ecd0312: (tenzir::detail::field_ref::atom(std::variant<caf::none_t, bool, long, unsigned long, double, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::basic_string_view<char, std::char_traits<char> >, tenzir::ip, tenzir::subnet, unsigned char>)+0x612)
0x7f4b0ece0152: (tenzir::builder_ref::try_atom(std::variant<caf::none_t, bool, long, unsigned long, double, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::time_point<std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::basic_string_view<char, std::char_traits<char> >, tenzir::ip, tenzir::subnet, unsigned char>)+0x42)
0x555ab9e2a465: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x1b5f5)                                                                                                        
0x555ab9e3d08f: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2e21f)                                                                                                       
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                      
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                      
0x555ab9e3cde2: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2df72)                                                                                                      
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                      
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                       
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                       
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                             
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                             
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                             
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                             
0x555ab9e3fdda: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x30f6a)                                                                                                      
0x555ab9e488cb: (arrow::Result<std::shared_ptr<arrow::StructArray> >::~Result()+0x2eb)                                                                                     
0x555ab9e3523b: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x263cb)                                                                                                      
0x7f4b0eabff66: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0xdd6)          
0x7f4b0eac58b3: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0x6723)           
0x7f4b0eacdcd0: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0xeb40)           
0x7f4b0e18c138: (caf::scheduled_actor::categorize(caf::mailbox_element&)+0x948)                                                                                                                       
0x7f4b0e18c700: (caf::scheduled_actor::consume(caf::mailbox_element&)+0x210)                                                                                                                          
0x7f4b0e18cb3c: (caf::scheduled_actor::reactivate(caf::mailbox_element&)+0x1c)                                                                                                                        
0x7f4b0e18da7b: (caf::scheduled_actor::resume(caf::execution_unit*, unsigned long)+0xe2b)                                                                                                             
0x7f4b0e0932ae: (void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)+0x2e5e) 
0x7f4b0e09355e: (void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)+0x310e) 
0x7f4b0b9b04a3: (std::error_code::default_error_condition() const+0x33)                            
0x7f4b0b685044: (pthread_condattr_setpshared+0x4d4)                                                
0x7f4b0b704860: (__clone+0x40)                   
tenzir-v4.1.0: Error: signal 6 (Aborted)                                                           
0x7f4b0ea57cfc: (fatal_handler+0x3c)             
0x7f4b0b637fd0: (__sigaction+0x40)               
0x7f4b0b686d3c: (pthread_key_delete+0x14c)                                                         
0x7f4b0b637f32: (gsignal+0x12)                   
0x7f4b0b622472: (abort+0xd3)                     
0x7f4b0ecebafa: (std::_Function_handler<void (caf::scheduled_actor*, caf::exit_msg&), caf::scheduled_actor::set_exit_handler<tenzir::shutdown<tenzir::policy::parallel>(caf::event_based_actor*, std::
vector<caf::actor, std::allocator<caf::actor> >)::{lambda(caf::exit_msg const&)#1}>(tenzir::shutdown<tenzir::policy::parallel>(caf::event_based_actor*, std::vector<caf::actor, std::allocator<caf::ac
tor> >)::{lambda(caf::exit_msg const&)#1})::{lambda(caf::scheduled_actor*, caf::exit_msg&)#1}>::_M_invoke(std::_Any_data const&, caf::scheduled_actor*&&, caf::exit_msg&)+0x1d9a)                     
0x7f4b0ecd0312: (tenzir::detail::field_ref::atom(std::variant<caf::none_t, bool, long, unsigned long, double, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::time_point<std::
chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::basic_string_view<char, std::char_traits<char> >, tenzir::ip, tenzir::subnet, unsigned char>)+0x612)     
0x7f4b0ece0152: (tenzir::builder_ref::try_atom(std::variant<caf::none_t, bool, long, unsigned long, double, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::time_point<std::ch
rono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::basic_string_view<char, std::char_traits<char> >, tenzir::ip, tenzir::subnet, unsigned char>)+0x42)        
0x555ab9e2a465: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x1b5f5)                                                                                                        
0x555ab9e3d08f: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2e21f)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3cde2: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2df72)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3fdda: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x30f6a)                                                                                                        
0x555ab9e488cb: (arrow::Result<std::shared_ptr<arrow::StructArray> >::~Result()+0x2eb)                                                                                                                
0x555ab9e3523b: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x263cb)                                                                                                        
0x7f4b0eabff66: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0xdd6)            
0x7f4b0eac58b3: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0x6723)           
0x7f4b0eacdcd0: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0xeb40)           
0x7f4b0e18c138: (caf::scheduled_actor::categorize(caf::mailbox_element&)+0x948)                                                                                                                       
0x7f4b0e18c700: (caf::scheduled_actor::consume(caf::mailbox_element&)+0x210)                                                                                                                          
0x7f4b0e18cb3c: (caf::scheduled_actor::reactivate(caf::mailbox_element&)+0x1c)                                                                                                                        
0x7f4b0e18da7b: (caf::scheduled_actor::resume(caf::execution_unit*, unsigned long)+0xe2b)                                                                                                             
0x7f4b0e0932ae: (void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)+0x2e5e) 
0x7f4b0e09355e: (void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)+0x310e) 
0x7f4b0b9b04a3: (std::error_code::default_error_condition() const+0x33)                            
0x7f4b0b685044: (pthread_condattr_setpshared+0x4d4)                                                
0x7f4b0b704860: (__clone+0x40)                   
tenzir-v4.1.0: Error: signal 11 (Segmentation fault)                                               
0x7f4b0ea57cfc: (fatal_handler+0x3c)             
0x7f4b0b637fd0: (__sigaction+0x40)               
0x7f4b0b62250f: (abort+0x170)                    
0x7f4b0ecebafa: (std::_Function_handler<void (caf::scheduled_actor*, caf::exit_msg&), caf::scheduled_actor::set_exit_handler<tenzir::shutdown<tenzir::policy::parallel>(caf::event_based_actor*, std::
vector<caf::actor, std::allocator<caf::actor> >)::{lambda(caf::exit_msg const&)#1}>(tenzir::shutdown<tenzir::policy::parallel>(caf::event_based_actor*, std::vector<caf::actor, std::allocator<caf::ac
tor> >)::{lambda(caf::exit_msg const&)#1})::{lambda(caf::scheduled_actor*, caf::exit_msg&)#1}>::_M_invoke(std::_Any_data const&, caf::scheduled_actor*&&, caf::exit_msg&)+0x1d9a)                     
0x7f4b0ecd0312: (tenzir::detail::field_ref::atom(std::variant<caf::none_t, bool, long, unsigned long, double, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::time_point<std::
chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::basic_string_view<char, std::char_traits<char> >, tenzir::ip, tenzir::subnet, unsigned char>)+0x612)     
0x7f4b0ece0152: (tenzir::builder_ref::try_atom(std::variant<caf::none_t, bool, long, unsigned long, double, std::chrono::duration<long, std::ratio<1l, 1000000000l> >, std::chrono::time_point<std::ch
rono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::basic_string_view<char, std::char_traits<char> >, tenzir::ip, tenzir::subnet, unsigned char>)+0x42)        
0x555ab9e2a465: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x1b5f5)                                                                                                        
0x555ab9e3d08f: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2e21f)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3cde2: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2df72)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3cc28: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x2ddb8)                                                                                                        
0x555ab9e3f6cc: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x3085c)                                                                                                        
0x555ab9e3fdda: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x30f6a)                                                                                                        
0x555ab9e488cb: (arrow::Result<std::shared_ptr<arrow::StructArray> >::~Result()+0x2eb)                                                                                                                
0x555ab9e3523b: (std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_weak_release()+0x263cb)                                                                                                        
0x7f4b0eabff66: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0xdd6)            
0x7f4b0eac58b3: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0x6723)           
0x7f4b0eacdcd0: (void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<caf::typed_actor<caf::result<void> (tenzir::atom::start, std::vector<caf::act
or, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzi
r::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intr
usive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenz
ir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, fmt::v9::formatter<caf::typed_actor
<caf::result<void> (tenzir::atom::start, std::vector<caf::actor, std::allocator<caf::actor> >), caf::result<void> (tenzir::atom::pause), caf::result<void> (tenzir::atom::resume), caf::result<void> (
tenzir::atom::pull, caf::typed_actor<caf::result<void> (tenzir::atom::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vecto
r<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_ptr<tenzir::chunk> > >)>, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000000l> >), caf::result<void> (tenzir::ato
m::push, std::vector<tenzir::table_slice, std::allocator<tenzir::table_slice> >), caf::result<void> (tenzir::atom::push, std::vector<caf::intrusive_ptr<tenzir::chunk>, std::allocator<caf::intrusive_
ptr<tenzir::chunk> > >)>, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&)+0xeb40)           
0x7f4b0e18c138: (caf::scheduled_actor::categorize(caf::mailbox_element&)+0x948)                                                                                                                       
0x7f4b0e18c700: (caf::scheduled_actor::consume(caf::mailbox_element&)+0x210)                                                                                                                          
0x7f4b0e18cb3c: (caf::scheduled_actor::reactivate(caf::mailbox_element&)+0x1c)                                                                                                                        
0x7f4b0e18da7b: (caf::scheduled_actor::resume(caf::execution_unit*, unsigned long)+0xe2b)                                                                                                             
0x7f4b0e0932ae: (void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)+0x2e5e) 
0x7f4b0e09355e: (void std::vector<double, std::allocator<double> >::_M_realloc_insert<double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&&)+0x310e) 
0x7f4b0b9b04a3: (std::error_code::default_error_condition() const+0x33)                            
0x7f4b0b685044: (pthread_condattr_setpshared+0x4d4)                                                
0x7f4b0b704860: (__clone+0x40)                   

Node logs:

[...]
[20:21:13.183] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:13.183] checking N6tenzir9bool_typeE and N6tenzir9bool_typeE
[20:21:13.183] checking N6tenzir9bool_typeE and N6tenzir9bool_typeE
[20:21:13.183] checking N6tenzir9bool_typeE and N6tenzir9bool_typeE
[20:21:13.183] checking N6tenzir9bool_typeE and N6tenzir9bool_typeE
[20:21:13.183] checking N6tenzir9bool_typeE and N6tenzir9bool_typeE
[20:21:14.740] checking N6tenzir9time_typeE and N6tenzir9time_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir7ip_typeE and N6tenzir7ip_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir7ip_typeE and N6tenzir7ip_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE
[20:21:14.740] checking N6tenzir9bool_typeE and N6tenzir9bool_typeE
[20:21:14.740] checking N6tenzir11string_typeE and N6tenzir11string_typeE
[20:21:14.740] checking N6tenzir10int64_typeE and N6tenzir10int64_typeE

@netantho
Contributor

netantho commented Sep 17, 2023

Edit: Solved in latest commits.

Previous problem:

On another eve.json file, I'm getting a stuck pipeline (reproducible as well):

# zstdcat hetzner-baa-hel1-1_2022-06-08_1654646402732806772_suricata_7-0_default_free_20230901/eve.json.zst | /usr/bin/docker run --network=host --mount type=bind,source=`pwd`,target=/mnt -i d545a2dc2990 --config=/mnt/tenzir.yaml "read suricata | extend provenance_filename=\"eve.json\", provenance_provider=\"$provider\", provenance_region=\"$region\", provenance_sensor=\"$sensor\", ruleset_etfree_date=\"$ruleset_etfree_date\", suricata_version=\"$suricata_version\" | import"
[20:43:50.953] loaded configuration file: "/mnt/tenzir.yaml"
[20:43:50.968] client connects to 0.0.0.0:5158
[20:43:50.968] client connected to node at 0.0.0.0:5158
warning: !! convert_error: unable to convert a0a0a0a into an uint64
 = note: from `read "json", parser_args(*selector("suricata", "event_type"), null, "", null, true, false, false)`
warning: !! convert_error: unable to convert a0a0a0a into an uint64
 = note: from `read "json", parser_args(*selector("suricata", "event_type"), null, "", null, true, false, false)`
warning: !! convert_error: unable to convert a0a0a0a into an uint64
 = note: from `read "json", parser_args(*selector("suricata", "event_type"), null, "", null, true, false, false)`
warning: !! convert_error: unable to convert a0a0a0a into an uint64
 = note: from `read "json", parser_args(*selector("suricata", "event_type"), null, "", null, true, false, false)`
warning: !! convert_error: unable to convert a0a0a0a into an uint64
 = note: from `read "json", parser_args(*selector("suricata", "event_type"), null, "", null, true, false, false)`
[20:43:51.249] finishing events due to conflict: requested string but got int64
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] finishing 5 records with 11 fields
[20:43:51.249] requesting field timestamp of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field flow_id of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field pcap_cnt of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field event_type of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field src_ip of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field src_port of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field dest_ip of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field dest_port of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field proto of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field pkt_src of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field ike of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] finishing 5 records with 8 fields
[20:43:51.249] requesting field version_major of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field version_minor of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field init_spi of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field resp_spi of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field message_id of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field exchange_type of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field payload of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] list got request to finish 5 and leave 1
[20:43:51.249] ending offset of list is 15 of 19
[20:43:51.249] series builder got request to finish but leave 4
[20:43:51.249] requesting field ikev1 of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] finishing 5 records with 4 fields
[20:43:51.249] requesting field doi of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field encrypted_payloads of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field client of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] finishing 5 records with 1 fields
[20:43:51.249] requesting field proposals of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] list got request to finish 5 and leave 1
[20:43:51.249] ending offset of list is 5 of 6
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] finishing 5 records with 10 fields
[20:43:51.249] requesting field alg_enc of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field alg_enc_raw of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field unknown of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field unknown_raw of length 5 to finish and leave 0
[20:43:51.249] series builder got request to finish but leave 0
[20:43:51.249] requesting field alg_hash of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field alg_hash_raw of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field alg_auth of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field alg_auth_raw of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field alg_dh of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field alg_dh_raw of length 6 to finish and leave 1
[20:43:51.249] series builder got request to finish but leave 1
[20:43:51.249] requesting field server of length 5 to finish and leave 0
[20:43:51.249] series builder got request to finish but leave 0
[20:43:51.249] finishing 5 records with 0 fields
[20:43:51.249] finishing events due to conflict: requested int64 but got string
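The "finishing events due to conflict" lines above suggest that the builder closes the current batch whenever a field's type no longer matches what the batch has seen so far, while keeping the event that triggered the conflict for the next batch ("finishing 5 records", "leave 1"). The following is only a toy sketch of that idea in Python, not the actual series builder: the `ToyBatcher` class and the `field_x` field name are hypothetical, the conflict is checked per event rather than mid-event, and nested records and lists are not inspected.

```python
from typing import Any


class ToyBatcher:
    """Groups events into batches in which every field keeps a single type."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []  # events in the open batch
        self.types: dict[str, type] = {}  # field name -> type seen in the open batch
        self.finished: list[list[dict[str, Any]]] = []  # closed batches

    def add(self, event: dict[str, Any]) -> None:
        # A conflict is a non-null value whose type differs from the type
        # already recorded for that field in the open batch.
        conflict = any(
            value is not None
            and name in self.types
            and self.types[name] is not type(value)
            for name, value in event.items()
        )
        if conflict:
            # Close everything collected so far; the new event starts a new batch.
            self.flush()
        for name, value in event.items():
            if value is not None:
                self.types.setdefault(name, type(value))
        self.rows.append(event)

    def flush(self) -> None:
        if self.rows:
            self.finished.append(self.rows)
        self.rows = []
        self.types = {}


batcher = ToyBatcher()
for i in range(5):
    batcher.add({"field_x": i})  # five events with an integer field
batcher.add({"field_x": "a0a0a0a"})  # sixth event carries a string instead
batcher.flush()
print([len(batch) for batch in batcher.finished])  # → [5, 1]
```

Null values never trigger a flush in this sketch, which mirrors the PR's stated assumption that null fields are treated like missing fields.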

Member

@dominiklohmann dominiklohmann left a comment

Feedback from an initial review, more coming later or tomorrow.

@jachris jachris force-pushed the topic/series-builder branch 2 times, most recently from 4f5b849 to 9de7db2 Compare September 18, 2023 18:08
@jachris jachris force-pushed the topic/series-builder branch from 9de7db2 to 8a27325 Compare September 19, 2023 10:15
Member

@dominiklohmann dominiklohmann left a comment

This PR is so massive that I think we should merge it once the remaining TODOs are addressed and the follow-up issues are written, but only shortly after today's v4.2 release. That will give us some time to catch any outstanding issues. I went through the code, and except for the things you already noted, I'm really happy with this. The performance looks really good as well.

I also managed to port my YAML format PR to the new API rather easily, and it now works as expected. I'll hold off on pushing it until this is merged so you can rebase more freely.

Please add changelog entries about the following things:

  • Performance and correctness improvements of the JSON parser.
  • Support for empty lists and empty records, and the implied change in behavior for drop and select.

@jachris jachris changed the base branch from main to topic/tui September 19, 2023 13:19
@jachris jachris changed the base branch from topic/tui to main September 19, 2023 13:20
@jachris jachris force-pushed the topic/series-builder branch from 912ff0c to 88f76e0 Compare September 19, 2023 13:23
@jachris
Contributor Author

jachris commented Sep 19, 2023

I have added a counting benchmark to the PR description. We went from 15.7s to 8.5s, which is roughly 1.85x the original event processing rate. 🚀

Also, this pushes us very close to jq -s length, which takes 7.6s for the same task.
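For reference, the ratios behind these numbers can be checked with a few lines of Python. This is just arithmetic on the three timings quoted in this thread; the variable names are made up for illustration.

```python
# Wall-clock timings in seconds, as quoted in the comment above.
old_parser_s = 15.7
new_parser_s = 8.5
jq_s = 7.6

# Same input in less time: the rate ratio is the inverse of the time ratio.
speedup = old_parser_s / new_parser_s
# Fraction of extra wall-clock time the new parser needs compared to jq.
overhead_vs_jq = new_parser_s / jq_s - 1

print(f"speedup over old parser: {speedup:.2f}x")  # → 1.85x
print(f"extra time vs jq: {overhead_vs_jq:.0%}")  # → 12%
```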

@jachris jachris force-pushed the topic/series-builder branch from 15d750b to 089a8b1 Compare September 20, 2023 10:12
@jachris jachris enabled auto-merge September 21, 2023 09:02
@jachris jachris force-pushed the topic/series-builder branch from 42d1b4f to fd26cff Compare September 21, 2023 09:40
@jachris jachris merged commit 610addb into main Sep 21, 2023
@jachris jachris deleted the topic/series-builder branch September 21, 2023 11:24
Labels: bug (Incorrect behavior), engine (Core pipeline and storage engine), feature (New functionality), format (Parser and printer), operator (Source, transformation, and sink)

4 participants