Remap runtime is heavily dominated by clone/drop times #6770
Interesting. It looks like there are two main culprits.

First is the clone we need to make to ensure the event isn't mutated by the condition. A similar clone is made by the transform so we can roll back on error.

The other culprit is the way the Target trait is defined - https://github.com/timberio/vector/blob/master/src/event/log_event.rs#L247. Fetching a value from the event returns the value itself rather than a reference, which requires a clone. This is partly because the VRL Value type is different from the Vector Value type, so a VRL Value has to be created and returned. Also, for metric events the Value doesn't actually exist in the event, so it has to be created and returned; if the method returned a reference, there would be nothing to own the value the reference points at.

The Target trait is essentially the interface between Vector and VRL, and I suspect most of the gains can be made by rethinking how it works: first, by only copying data when it is modified, and second, by providing some way for it to hand out references to Vector Values when possible and to own (and cache) created data when references aren't possible.
As noted in #6787, if we adjust the internal representation of VRL's map to be cheaper to clone -- going through various pointer types and even a simple vec of tuples, essentially -- the runtime of this configuration:
is dominated by clone costs of
The other configuration pays similar costs but does so across multiple threads in the default vector setup, and is faster as a result. If we make vector single-threaded they're more or less the same cost. I believe this strongly suggests that merely making clones cheap -- as is attempted in #6787 -- is not sufficient; we must remove clones. Pushing forward with #5374 strikes me as the next logical step.
Alright, an update here. The work that @FungusHumungus is doing to bring #5374 up to parity with master is ongoing and important to fixing this. Additionally, experimentation in https://github.com/blt/pipelines helps confirm that we do need to be operating on batches inside of vector, not individual events.
A while has passed since this issue was last updated. There have been many performance-related improvements to VRL in the meantime, including new path-lookup code and a shared […]. It might be worth profiling this again, or closing the issue if it is superseded by other issues we've filed since.
Closing since the investigation here is complete. When we pick up performance work again it'll likely start anew with new profiles. |
@lukesteensen and I have been profiling vector 6193e68 across two configs, seen here and here. Our hypothesis was that the nohop configuration would perform better than the multihop configuration, but the opposite is true. For instance:
The multihop configuration runs ~10 seconds faster than the nohop, the hypothesis being that remap is a serial bottleneck in the nohop variant, whereas in multihop its costs are spread across multiple threads. This is borne out by the flamegraphs of the two configs, seen here. Notably, the diff flamegraph does not show a negative difference except for slightly elevated backtrace generation in the nohop variant.
The following screenshot shows the major contributors to runtime:
For the purposes of this issue we can see remap in two places:
Note the high degree to which drops and BTreeMap clones appear in this flamegraph. VRL does a non-trivial amount of cloning internally and these costs do show up.
cc @timberio/vector-processing