Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a sort operator #3155

Merged
merged 12 commits into from May 17, 2023
Merged

Implement a sort operator #3155

merged 12 commits into from May 17, 2023

Conversation

dominiklohmann
Copy link
Member

@dominiklohmann dominiklohmann commented May 16, 2023

This is a very basic implementation of a sort operator that comes with a few fundamental limitations:

  • All data is kept in memory
  • There is no windowing support
  • Extension types are not supported for sorting (ip, subnet, enum).
  • Sorting only works by one column at a time
  • The resulting slices of the operator have a size of 1, which leads to very slow data access in following operators

But other than that it works as you'd expect.

Tasks

Edit tasklist title
Beta Give feedback Tasklist Tasks, more options

Delete tasklist

Delete tasklist block?
Are you sure? All relationships in this tasklist will be removed.
  1. Implement in-memory sorting
    Options
  2. Write follow-up stories for the above-mentioned limitations
    Options
  3. Write tests
    Options
  4. Write documentation (including known limitations)
    Options
  5. Add a changelog entry
    Options

@dominiklohmann dominiklohmann added the feature New functionality label May 16, 2023
This is a very basic implementation of a sort operator that comes with a
few fundamental limitations:
- All data is kept in memory
- There is no windowing support
- Extension types are not supported for sorting (ip, subnet, enum).
- Sorting only works by one column at a time
- The resulting slices of the operator have a size of 1, which leads to
  very slow data access in following operators

But other than that it works as you'de xpect.
@mavam
Copy link
Member

mavam commented May 17, 2023

@dominiklohmann I pushed a commit with an updated documentation.

@dominiklohmann dominiklohmann marked this pull request as ready for review May 17, 2023 11:15
@mavam
Copy link
Member

mavam commented May 17, 2023

One thing I noticed during testing: just sort by itself doesn't work, even if the input has just a single field. I would argue we note this down and tackle it during the other follow-ups.

@mavam
Copy link
Member

mavam commented May 17, 2023

Added another commit to test nulls-first. (Currently failing)

@dominiklohmann
Copy link
Member Author

One thing I noticed during testing: just sort by itself doesn't work, even if the input has just a single field. I would argue we note this down and tackle it during the other follow-ups.

The sensible default if the field is omitted is likely sort . asc nulls-last, but we can't express that in the expression language yet. I've added a story for this.

@dominiklohmann dominiklohmann requested a review from mavam May 17, 2023 14:34
Copy link
Member

@mavam mavam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this works as expected. I added tests and they now work.

Unfortunately I lack the Arrow depth to review what the code does in detail. If you need a second pair of eyes from @tenzir/pipelines-squad, I'd ask there. Otherwise my approval is only based on high-level skimming and testing.

libvast/builtins/operators/sort.cpp Show resolved Hide resolved
libvast/include/vast/type.hpp Outdated Show resolved Hide resolved
@dominiklohmann dominiklohmann merged commit 6519597 into main May 17, 2023
41 checks passed
@dominiklohmann dominiklohmann deleted the topic/sort-operator branch May 17, 2023 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New functionality
Projects
None yet
2 participants