Improved performance and benchmarks #1
base: main
Thanks for the feedback! I love this iteration. Let me review this a bit deeper this weekend.
I've pushed an integration test that spawns 4 threads, each sending 1000 messages, and tried 3 options:
I'm not an expert in UDP, so I'm not sure why I couldn't send more than 1000 messages per thread. I tried 2000 (although nothing in between 1000 and 2000) and the test hung, so I'm not sure whether this is saturating the UDP port or something else. Here are the results on my machine (YMMV):
I think the proposed code has not yet reached its full potential, as it's still allocating memory for a secondary buffer.
```rust
for prefix in block_list {
    if line.starts_with(prefix) {
        continue 'outer;
    }
}
```
The current implementation (like the initial one) runs in time linear in the size of the block list.
Since this is the main functionality, it makes sense to optimize it. The check can be made independent of the block list size (its cost depending only on the length of the line being checked), for example by using a trie (prefix tree) data structure.
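A minimal sketch of the trie idea in Rust, assuming lines and block-list entries are handled as raw bytes (`&[u8]`). `PrefixTrie`, `insert` and `matches_prefix_of` are hypothetical names, not code from this PR. A lookup walks at most one node per byte of the line and stops at the first stored prefix it reaches, so its cost does not grow with the number of entries in the block list:

```rust
use std::collections::HashMap;

/// Hypothetical byte-wise trie (prefix tree) holding blocked prefixes.
#[derive(Default)]
pub struct PrefixTrie {
    children: HashMap<u8, PrefixTrie>,
    /// True if a stored prefix ends at this node.
    terminal: bool,
}

impl PrefixTrie {
    pub fn new() -> Self {
        Self::default()
    }

    /// Adds a blocked prefix, creating one node per new byte.
    pub fn insert(&mut self, prefix: &[u8]) {
        let mut node = self;
        for &b in prefix {
            node = node.children.entry(b).or_default();
        }
        node.terminal = true;
    }

    /// Returns true if any stored prefix is a prefix of `line`.
    pub fn matches_prefix_of(&self, line: &[u8]) -> bool {
        let mut node = self;
        if node.terminal {
            return true; // an empty prefix blocks every line
        }
        for b in line {
            match node.children.get(b) {
                Some(next) => node = next,
                None => return false, // no stored prefix continues with this byte
            }
            if node.terminal {
                return true; // a stored prefix ends here
            }
        }
        false
    }
}
```

Using a `HashMap` per node keeps the sketch short but costs memory; a sorted `Vec` of children or a fixed 256-entry table are common alternatives, and any of them would need benchmarking against the current `Vec` + `starts_with()` loop before switching.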
In my first commit I was actually using a BTreeSet to store the keys, but it required exact matches and was slower than the Vec + starts_with() approach, though I think that was due to other slow areas. I've read about tries before, but I'm not experienced with them, so feel free to propose some samples if you have any 😃
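As a possible middle ground, an exact-match set can still answer prefix queries if it is probed with each prefix of the incoming line rather than with the whole line. The sketch below assumes byte-slice lines and uses a hypothetical `PrefixSet` wrapper around `BTreeSet<Vec<u8>>`. The number of lookups is bounded by the length of the longest blocked prefix, each costing O(log n) comparisons, instead of one `starts_with()` call per block-list entry:

```rust
use std::collections::BTreeSet;

/// Hypothetical wrapper: an exact-match set reused for prefix matching.
pub struct PrefixSet {
    prefixes: BTreeSet<Vec<u8>>,
    /// Length of the longest stored prefix; longer probes cannot match.
    max_len: usize,
}

impl PrefixSet {
    pub fn new<I, P>(prefixes: I) -> Self
    where
        I: IntoIterator<Item = P>,
        P: AsRef<[u8]>,
    {
        let prefixes: BTreeSet<Vec<u8>> =
            prefixes.into_iter().map(|p| p.as_ref().to_vec()).collect();
        let max_len = prefixes.iter().map(Vec::len).max().unwrap_or(0);
        Self { prefixes, max_len }
    }

    /// True if some stored prefix is a prefix of `line`.
    pub fn is_blocked(&self, line: &[u8]) -> bool {
        let limit = self.max_len.min(line.len());
        // `Vec<u8>: Borrow<[u8]>`, so each probe is a slice and allocates nothing.
        (0..=limit).any(|n| self.prefixes.contains(&line[..n]))
    }
}
```

`PrefixSet::new(&block_list)` should also accept the existing `Vec<String>`, since `&String` implements `AsRef<[u8]>`.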
I'm currently focusing on a way to remove the extra `BytesMut` allocation, which I think is entirely possible, leaving the prefix match as the only remaining bottleneck.
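One way to drop the secondary buffer is to filter the received bytes in place: kept lines are shifted towards the front of the same buffer with `slice::copy_within`, and the buffer is truncated at the end. A minimal sketch, assuming newline-separated messages in a `Vec<u8>`; `retain_lines` is a hypothetical helper, not part of this PR:

```rust
/// Hypothetical helper: keeps only the lines for which `keep` returns true,
/// compacting them inside `buf` instead of copying into a second buffer.
pub fn retain_lines<F>(buf: &mut Vec<u8>, mut keep: F)
where
    F: FnMut(&[u8]) -> bool,
{
    let mut read = 0; // start of the line currently being inspected
    let mut write = 0; // end of the kept (compacted) data
    while read < buf.len() {
        // End of this line, including its trailing '\n' if there is one.
        let end = match buf[read..].iter().position(|&b| b == b'\n') {
            Some(i) => read + i + 1,
            None => buf.len(),
        };
        let line = &buf[read..end];
        let content = line.strip_suffix(b"\n").unwrap_or(line);
        if keep(content) {
            // write <= read, and copy_within handles overlapping ranges.
            buf.copy_within(read..end, write);
            write += end - read;
        }
        read = end;
    }
    buf.truncate(write);
}
```

`BytesMut` dereferences to `[u8]` and also has a `truncate` method, so the same approach should carry over to the codec's buffer, although this sketch only exercises `Vec<u8>`.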
The current implementation has a few downsides:

- … `String`; this requires allocations that could be avoided.

Although not all of the downsides were addressed and further improvements are possible, this MR proposes the following improvements:

- … `String` …
- Switch from `full` to limited features in the Tokio dependency to improve compile times and binary size
- Use `tokio_util` to build a codec and use `UdpFramed` to make the code more ergonomic
- Use a `BytesMut` buffer (provided by `tokio_util`) to drop the unwanted byte slices and concatenate the desired ones. With `tokio_util`, a `BytesMut` buffer that can automatically grow if needed is provided "for free" (as in: no need to manually manage buffers)
- Use a `BTreeSet<Bytes>` instead of a `Vec<String>` for faster comparisons. This comes at the expense of only working for exact matches, but could be changed to use regex