Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the scaffolding for new logical operators #3004

Merged
merged 57 commits into from
Mar 16, 2023
Merged

Conversation

dominiklohmann
Copy link
Member

In a three-day Hackathon with @jachris, we've implemented the new scaffolding for logical operators as designed in RFC-001, and implemented a local executor that works without spinning up a new thread.

The goal for this PR is to provide the scaffolding such that follow-up PRs can make the following changes:

  • Migrate the remaining operators to the new logical operator plugin API.
  • Make use of the new local executor in the /query endpoint, the import and export commands, and the partition transformer.
  • Remove import and export pipelines defined in the configuration file (they will return at a later point in time).
  • Implement a multi-threaded pipeline executor as a replacement for the local executor.

For reviewing, we recommend the following order of files:

  • element_type.hpp to get an understanding of element types and batches, and their type-erased version
  • logical_operator.hpp to get an understanding of the logical operator API.
  • logical_pipeline.hpp to get an understanding of how logical operators compose into a logical pipeline.
  • physical_operator.hpp, where.cpp, and head.cpp to get an understanding of how operators transform data (including two examples).
  • logical_pipeline.cpp to see how we connect producers to be able to execute pipelines.

During review, please pay attention to the desired destruction order of producer, physical operator, logical operator, and operator control plane.

We recommend pairing with us for the review, especially for logical_pipeline::make_local_executor().

@dominiklohmann dominiklohmann added the feature New functionality label Mar 9, 2023
@dominiklohmann dominiklohmann requested a review from tobim March 9, 2023 14:54
@dominiklohmann dominiklohmann force-pushed the topic/rfc001 branch 2 times, most recently from 924c769 to 3604108 Compare March 9, 2023 15:08
@dominiklohmann dominiklohmann marked this pull request as ready for review March 9, 2023 15:08
Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First few review comments to make them visible. No reason to wait until I'm through to the end.

libvast/builtins/operators/extend.cpp Outdated Show resolved Hide resolved
libvast/src/system/make_pipelines.cpp Show resolved Hide resolved
libvast/src/table_slice.cpp Show resolved Hide resolved
libvast/include/vast/element_type.hpp Show resolved Hide resolved
libvast/include/vast/logical_operator.hpp Outdated Show resolved Hide resolved
libvast/include/vast/physical_operator.hpp Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Show resolved Hide resolved
Coroutines are not always driven to the first suspension point. This
meant that `logical_pipeline::make_local_executor` was prone to UAF.
In contrast, the free function captures it by-value immediately.
Copy link
Member

@tobim tobim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operator abstraction should be able to handle all cases, at least for vast exec. I expect the implementation to hold up for now, but we may discover that not all our assumptions hold when we implement hosted pipelines.

libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
libvast/src/logical_pipeline.cpp Outdated Show resolved Hide resolved
@jachris jachris enabled auto-merge March 16, 2023 11:17
@jachris jachris merged commit c03b454 into master Mar 16, 2023
@jachris jachris deleted the topic/rfc001 branch March 16, 2023 11:55
jachris added a commit that referenced this pull request Mar 29, 2023
This PR makes some changes to the implementation of RFC-001 (#3004). The
main goal is to simplify operator definitions and internal execution
details.

- The instantiation state is now only stored by the instantiated
generator.
- There is no more `physical_operator`/`logical_operator` separation in
the code base, because we have only one intermediate representation for
both (though we can still use these terms in explanations).
- Operators can be copied without a `parse(to_string())` round-trip.
- Schema-specific dispatching is an implementation detail of the
operator, but we provide a class that makes this convenient. This
drastically simplifies the execution engine.
- Polymorphic operators are now possible. For instance, `head` could be
made to work on both event and byte streams.
- The empty pipeline is therefore now also correctly typed, same as
`pass`.

Pipeline serialization is left for a follow-up PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants