Add simdjson dependency to libvast #1246
Instead of re-opening, feel free to rebase and force-push next time. We don't mind force pushes before reviews within a PR.
OK, I get it.
@@ -55,6 +55,11 @@ in {
configureFlags = old.configureFlags ++ [ "--enable-prof" "--enable-stats" ];
});
broker = final.callPackage ./broker {inherit stdenv; python = final.python3;};
simdjson = prev.simdjson.overrideAttrs (old: {
There are plans to make a static library the default build output, but simdjson currently builds a shared library, so a small adjustment is required.
@dominiklohmann, @tobim Please review the changes I've made to nix*.
Force-pushed from c480e5f to 58175a1.
Force-pushed from 6865aa5 to 49ac953.
Co-authored-by: tobim <tobim+github@fastmail.fm>
Not bad, figuring out the Nix stuff so fast. It's not particularly newcomer-friendly.
This is unnecessary after 5b4b052, which I accidentally pushed to master directly. Sorry about that one.
Remove check whether config file is a regular file
Co-authored-by: Matthias Vallentin <matthias@tenzir.com>
You almost make that sound like a "feature", @tobim. 😉
Handle Arrow decoder errors gracefully
Gracefully deal with JSON to data conversion errors
Use the ServerTester fixture for testing dump
Make table slice encodings printable
I've tested this out locally and it works great; it's already a lot faster than the current parser in real-world usage. Really great work so far!
Code-wise, I really like the approach you've taken (and documented in comments) with the conversion from JsonType to VastType. It does its job well, and after toying with the code a bit, I think it is the way to go rather than implementing a visit-based approach.
Do you need any more input, or do you want us to look at anything specific in depth, e.g., do you need further help with the test issues?
@dominiklohmann Currently I'm stuck on the "Node suricata alert simdjson" integration test, which compares CSV export results. Apart from that, I'm going to take a fresh look at the code, and that is it for phase 2.
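The PR's actual conversion code is not reproduced in this thread. As a rough, hypothetical illustration of the trade-off mentioned above, the C++ sketch below maps type tags with one direct switch instead of a visitor. The enums json_kind and vast_kind and the function to_vast_kind are invented for this example; the JSON kinds are only loosely modeled on the element kinds a DOM-style parser like simdjson distinguishes, and the target kinds are named after VAST's basic types.

```cpp
// Hypothetical JSON element kinds, loosely modeled on the kinds a DOM-style
// parser such as simdjson distinguishes (this is NOT simdjson's actual enum).
enum class json_kind { array, object, int64, uint64, floating, string, boolean, null };

// Hypothetical target type tags, named after VAST's basic data types.
enum class vast_kind { list, record, integer, count, real, string, boolean, none };

// A single exhaustive switch instead of a visitor: each JSON kind maps to
// exactly one target kind, and -Wswitch flags any enumerator left unhandled.
constexpr vast_kind to_vast_kind(json_kind k) {
  switch (k) {
    case json_kind::array:    return vast_kind::list;
    case json_kind::object:   return vast_kind::record;
    case json_kind::int64:    return vast_kind::integer;  // signed integer
    case json_kind::uint64:   return vast_kind::count;    // unsigned integer
    case json_kind::floating: return vast_kind::real;
    case json_kind::string:   return vast_kind::string;
    case json_kind::boolean:  return vast_kind::boolean;
    case json_kind::null:     return vast_kind::none;
  }
  return vast_kind::none;  // only reached for out-of-range enum values
}

// The mapping is constexpr, so it can be checked at compile time.
static_assert(to_vast_kind(json_kind::object) == vast_kind::record);
static_assert(to_vast_kind(json_kind::uint64) == vast_kind::count);
```

Keeping the whole mapping in one exhaustive switch makes it easy to read at a glance, whereas a visitor spreads the same decisions across several overloads.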
Sounds good. I'll take an in-depth look at the integration test issue tomorrow.
@ngrodzitski I reviewed this at length today, and discussed with @mavam what we want to do with regards to the different escaping behaviors of the old and the new json readers.
To summarize, we can move forward with this PR. Please update the remaining reference files accordingly. You can run Once the CI is green I would like to merge this into the epic branch, do a few fixups and write a changelog entry and merge the epic branch into master. Phase 3 and 4 can then happen in separate PRs that branch off master. Side note: The performance looks to be greatly improved!
OK, will do. Nevertheless, I'd like to note that this tolerance for newlines in CSV can lead to errors.
if
To compare against the reference file, the output is sorted line by line:
The catch is that the following wrong output (A with record2 and B with record1)
will be sorted and compared to the reference without errors. That is unlikely to be a problem now, but it shows that this comparison method is not reliable in general under such conditions.
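The concrete example from the original comment is not preserved, but the failure mode can be reproduced with hypothetical data. The sketch assumes two CSV records whose quoted second field contains a newline, and a comparison that, as described above, sorts physical lines before comparing. The helpers split_lines and equal_after_line_sort are illustrative only; they are not the integration test framework's code.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Split text into physical lines on '\n'; a trailing newline does not
// produce an extra empty line.
std::vector<std::string> split_lines(const std::string& text) {
  std::vector<std::string> lines;
  std::istringstream in{text};
  for (std::string line; std::getline(in, line);)
    lines.push_back(line);
  return lines;
}

// Order-insensitive comparison by sorted lines. It ignores how lines group
// into records, so a record spanning several lines can be torn apart and
// reassembled differently without the comparison noticing.
bool equal_after_line_sort(const std::string& actual,
                           const std::string& expected) {
  auto a = split_lines(actual);
  auto b = split_lines(expected);
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

With expected output `A,"x` / `record1"` / `B,"y` / `record2"` and a wrong output that swaps record1 and record2, both inputs split into the same four lines, so equal_after_line_sort reports a match even though the records differ.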
That's absolutely correct. We sort the lines in the integration test framework before comparing them, which cannot work with this. I'll discuss adapting the CSV writer with the team in tomorrow's standup.
It's possible to deactivate the sorting, which could be an option if the output is guaranteed to be deterministic.
Even though unescaped CR/LF is legal per the spec, we could decide to escape newlines, because most CSV parsers are not that fancy. This would be an option to the As of right now, I'd go with the simplest path forward.
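A minimal sketch of what such an escaping option could do, assuming a hypothetical escape_newlines/unescape_newlines pair rather than any existing VAST writer option: CR and LF become the two-character sequences \r and \n, and backslashes are doubled so the transformation stays reversible. This deliberately departs from RFC 4180, which allows line breaks inside double-quoted fields.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: escape backslash, CR, and LF so that every CSV
// record occupies exactly one physical line.
std::string escape_newlines(std::string_view field) {
  std::string out;
  out.reserve(field.size());
  for (char c : field) {
    switch (c) {
      case '\\': out += "\\\\"; break;  // keep the escape reversible
      case '\n': out += "\\n"; break;
      case '\r': out += "\\r"; break;
      default: out += c; break;
    }
  }
  return out;
}

// Inverse of escape_newlines. Unknown escapes and a dangling trailing
// backslash are copied through unchanged.
std::string unescape_newlines(std::string_view field) {
  std::string out;
  out.reserve(field.size());
  for (std::size_t i = 0; i < field.size(); ++i) {
    const char c = field[i];
    if (c != '\\' || i + 1 == field.size()) {
      out += c;
      continue;
    }
    const char next = field[++i];
    switch (next) {
      case 'n': out += '\n'; break;
      case 'r': out += '\r'; break;
      case '\\': out += '\\'; break;
      default:  // unknown escape: keep both characters
        out += '\\';
        out += next;
        break;
    }
  }
  return out;
}
```

With escaping enabled, line-based tools (including a sorted comparison) always see whole records; the cost is that every consumer has to undo the escaping.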
Force-pushed from a82d94e to 2c65736.
We'll fix the CSV issue separately from the simdjson-related PRs, so you don't need to worry about it. I'll merge this into the epic branch today, and merge the epic branch into master after doing some fixups and writing a changelog entry. Phase 3 should then be a new PR based off of master.
I think the reason for
I'll merge this into the epic branch now and fix the remaining CI issue myself in the epic branch; it's likely a permission issue. No big deal.
📔 Description
Restore PR with rebased epic.
📝 Checklist
🎯 Review Instructions