Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fast_float to parse reals #2332

Merged
merged 14 commits into from
Jun 9, 2022
Merged

Conversation

tobim
Copy link
Member

@tobim tobim commented Jun 7, 2022

Do not merge before #2321

This replaces the internal float parsing code with the https://github.com/fastfloat/fast_float library. The main benefit is more correct parsing and support for scientific notation.

📝 Checklist

  • All user-facing changes have changelog entries.
  • The changes are reflected on docs.tenzir.com/vast, if necessary.
  • The PR description contains instructions for the reviewer, if necessary.

🎯 Review Instructions

File by file. parseable/numeric/real.{ch}pp are probably the most interesting starting point. Be aware the real is semantically the old real_opt_dot. A parser that requires a dot does not exist in fast_float so we don't have that any more. The only code that really wanted this was the data parser, but a small modification was enough to change to the regular real parser.

@tobim tobim added the bug Incorrect behavior label Jun 7, 2022
@tobim tobim requested a review from a team June 7, 2022 21:31
@tobim tobim force-pushed the story/sc-20915/integrate-fast_float branch from 2e1a30e to ff7e46e Compare June 7, 2022 21:34
Copy link
Member

@mavam mavam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to work overall. I have some minor comments only.

@dominiklohmann, could you review the CMake? @dispanser, maybe the nix 🧙?

libvast/src/concept/parseable/numeric/real.cpp Outdated Show resolved Hide resolved
libvast/include/vast/concept/parseable/numeric/real.hpp Outdated Show resolved Hide resolved
libvast/include/vast/concept/parseable/vast/data.hpp Outdated Show resolved Hide resolved
libvast/test/format/json.cpp Show resolved Hide resolved
libvast/test/parseable.cpp Show resolved Hide resolved
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve of the CMake and Nix changes except for the missing SBOM and docs update.

.gitmodules Show resolved Hide resolved
@tobim tobim force-pushed the story/sc-20915/integrate-fast_float branch from ff7e46e to 651f47c Compare June 8, 2022 09:21
@tobim
Copy link
Member Author

tobim commented Jun 8, 2022

The summarize unit test fails because of a mismatch. I tried to reproduce but I get very different results:

$ build/gcc/debug/bin/vast -N import zeek < libvast_test/artifacts/logs/zeek/conn.log
$ build/gcc/debug/bin/vast -N export json --numeric-durations > all.jsonl
$ cat all.jsonl | jq -s 'map(.ts |= .[:-16]) | group_by(.ts)[] | map(.duration) | add'
0.877989
1909.2635140000007
64567.938712
303.8997809999999
83.041812
238.96152800000002
188.27766699999995
279.100037
105.96806399999998
122.157704
8431.890083999999
19559.80153700003
27274.072281000037
4112.946189000001
4008.919659000001
5156.49655
6213.606477000004
4524.929005999997
278.3215950000001
1519.8252199999986
28236.221959000002
2149.467891
303.1990909999997
355.74103599999984
467.3227489999999
165.38586599999996
246.9962219999997

@dominiklohmann can you clarify what to do to get the correct values?

@dominiklohmann
Copy link
Member

@dominiklohmann can you clarify what to do to get the correct values?

Exactly what you did. You just need to go back from the numeric durations to the integral value in nanoseconds.

Another easy way to do this for durations is to just change printable/vast/json.hpp to always print durations in nanoseconds.

@tobim tobim force-pushed the story/sc-20915/integrate-fast_float branch from 651f47c to 48dfc78 Compare June 9, 2022 14:38
Copy link
Member

@dominiklohmann dominiklohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! Thanks for the rounding bug fix as well in the summarize plugin that we discussed earlier.

plugins/summarize/summarize.cpp Show resolved Hide resolved
Base automatically changed from topic/hackathon-rebuild-command to master June 9, 2022 15:44
Real rounding introduces issues. Consider a summarization to day
granularity, real rounding would put events after noon into the
bucket of the next day. This is very unintuitive and should be
considered a bug.
@tobim tobim force-pushed the story/sc-20915/integrate-fast_float branch from 48dfc78 to 7f71207 Compare June 9, 2022 15:58
@tobim tobim enabled auto-merge June 9, 2022 16:03
@tobim tobim merged commit 0162365 into master Jun 9, 2022
@tobim tobim deleted the story/sc-20915/integrate-fast_float branch June 9, 2022 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants