Fixes #17387 - SubnetService#find_subnet is O(log n) now. #476
Conversation
Another change: treat subnets as a special case in the #find_network method (no need to binary-search when I can directly match the hash key). This is before the shortcut was in place (binary search only):
And this is with the subnet shortcut in place:
I can speed up #find_network a little bit more by doing the IP-to-subnet check myself (I already have the IP and subnet addresses as ints; I only need the subnet mask as an int to do the check).
> I can speed up #find_network a little bit more by doing the IP-to-subnet check myself (I already have the IP and subnet addresses as ints; I only need the subnet mask as an int to do the check).
Please do, my profiling shows it might improve performance.
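The integer check being discussed can be sketched in a few lines. This is an illustration, not the code from the PR: `ipv4_to_i` mirrors the helper referenced in the diff hunks below, and `in_subnet?` is a hypothetical name for the mask-and-compare test.

```ruby
# Convert a dotted-quad string to a 32-bit integer by folding in each octet.
# (Illustrative re-implementation of the ipv4_to_i helper seen in the diff.)
def ipv4_to_i(address)
  address.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# An IP belongs to a subnet when masking it with the netmask yields the
# subnet's network address: two integer ops, no IPAddr allocation.
def in_subnet?(ip_i, network_i, netmask_i)
  (ip_i & netmask_i) == network_i
end

network = ipv4_to_i('192.168.1.0')
netmask = ipv4_to_i('255.255.255.0')
in_subnet?(ipv4_to_i('192.168.1.42'), network, netmask) # => true
in_subnet?(ipv4_to_i('10.0.0.1'),     network, netmask) # => false
```

Avoiding the per-call IPAddr construction is the whole point: the mask-and-compare runs on plain Fixnums, which is why it shows up in the profiles below.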
Would you like to include any benchmark scripts in this commit? They don't have to run as part of the test suite, but may be useful for future regression testing. Mine are in the 17387-find-subnet-wb branch on my fork and contain scripts for both add_subnet and find_network.
I experimented with trie storage of subnets to see if it would be better than a binary search. The performance should be more linear as the number of subnets increases:
Base:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)          612.695 (±14.4%) i/s -   5.672k in  9.972658s
find_subnet (5 subnets, 1000 hosts)          99.585 (±11.0%) i/s - 980.000 in  9.995022s
find_subnet (50 subnets, 10000 hosts)         9.239 (±10.8%) i/s -  92.000 in 10.082209s
find_subnet (500 subnets, 100000 hosts)       0.816 (± 0.0%) i/s -   9.000 in 11.036854s
find_subnet (5000 subnets, 1000000 hosts)     0.068 (± 0.0%) i/s -   1.000 in 14.690730s
With trie:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)          177.660 (±11.8%) i/s -   1.740k in  9.986557s
find_subnet (5 subnets, 1000 hosts)          34.654 (±14.4%) i/s - 340.000 in 10.008077s
find_subnet (50 subnets, 10000 hosts)         3.468 (± 0.0%) i/s -  35.000 in 10.154915s
find_subnet (500 subnets, 100000 hosts)       0.347 (± 0.0%) i/s -   4.000 in 11.522769s
find_subnet (5000 subnets, 1000000 hosts)     0.034 (± 0.0%) i/s -   1.000 in 29.217843s
though performance is currently worse - probably because it's not optimised at all (the tree could be compressed, the code improved, among other things). If that interests you, it's in my 17387-find-subnet-wb-trie branch, but given how it's performing right now, I'm happy with the bsearch here.
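For reference, the structure experimented with above can be sketched as a plain (uncompressed) binary trie keyed on address bits. This is an illustrative sketch, not the code in the linked branch; all names here (`TrieNode`, `PrefixTrie`, `ipv4_to_i`) are assumptions.

```ruby
def ipv4_to_i(address)
  address.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# One trie node per address bit; a node holding a subnet marks the end
# of an inserted network prefix.
class TrieNode
  attr_accessor :subnet
  attr_reader :children
  def initialize
    @children = [nil, nil]
    @subnet = nil
  end
end

class PrefixTrie
  def initialize
    @root = TrieNode.new
  end

  # Insert a subnet under the first prefix_len bits of its network address.
  def insert(network_i, prefix_len, subnet)
    node = @root
    prefix_len.times do |i|
      bit = (network_i >> (31 - i)) & 1
      node.children[bit] ||= TrieNode.new
      node = node.children[bit]
    end
    node.subnet = subnet
  end

  # Walk the address bit by bit, remembering the deepest subnet seen:
  # that is the longest matching prefix (what #find_subnet needs).
  def longest_match(address_i)
    node = @root
    best = node.subnet
    32.times do |i|
      node = node.children[(address_i >> (31 - i)) & 1]
      break if node.nil?
      best = node.subnet unless node.subnet.nil?
    end
    best
  end
end
```

Being uncompressed, this walks up to 32 nodes per lookup, which is consistent with the numbers above being worse than the bsearch; path compression would cut the walk short at the first point where only one subnet remains.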
@@ -35,22 +41,44 @@ def add_subnets(*subnets)
   end

   def delete_subnet(subnet_address)
-    m.synchronize { subnets.delete(subnet_address) }
+    m.synchronize { subnets.delete(ipv4_to_i(subnet_address)); subnet_keys.delete(ipv4_to_i(subnet_address)) }
Split this block onto multiple lines instead of using ;
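One way the suggested split could look, as a self-contained sketch (the class, its stores, and `ipv4_to_i` are stand-ins for the real SubnetService internals, not the PR's code). It also hoists the repeated `ipv4_to_i` call out of the block:

```ruby
# Stand-in for the service: a mutex-guarded hash of subnets plus the
# ordered key set the bsearch needs.
class SubnetStore
  attr_reader :subnets, :subnet_keys

  def initialize
    @m = Mutex.new
    @subnets = {}
    @subnet_keys = []
  end

  def ipv4_to_i(address)
    address.split('.').inject(0) { |acc, o| (acc << 8) | o.to_i }
  end

  def add(subnet_address, value)
    key = ipv4_to_i(subnet_address)
    @m.synchronize do
      @subnets[key] = value
      @subnet_keys << key
    end
  end

  def delete_subnet(subnet_address)
    key = ipv4_to_i(subnet_address)   # computed once instead of twice
    @m.synchronize do                 # one statement per line, no ';'
      @subnets.delete(key)
      @subnet_keys.delete(key)
    end
  end
end
```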
                     ::Proxy::MemoryStore.new, ::Proxy::MemoryStore.new, ::Proxy::MemoryStore.new)
   end

   def add_subnet(subnet)
     m.synchronize do
       raise Proxy::DHCP::Error, "Unable to add subnet #{subnet}" if find_subnet(subnet.network)
       key = ipv4_to_i(subnet.network) & ipv4_to_i(subnet.netmask)
ipv4_to_i(subnet.network) can be replaced by subnet.ipaddr.to_i (if you add an attr_reader), which seems to have a small performance benefit here. I think IPAddr already calculates this value, and ipv4_to_i has a slight cost.
Base:
Calculating -------------------------------------
add_subnet (1)         31.170k (±17.0%) i/s - 288.249k in  9.820534s
add_subnet (5)          7.418k (±12.6%) i/s -  71.478k in  9.938092s
add_subnet (50)       767.327  (±10.4%) i/s -   7.548k in  9.992509s
add_subnet (500)       74.870  (± 9.3%) i/s - 742.000  in 10.011340s
add_subnet (1000)      37.684  (± 8.0%) i/s - 375.000  in 10.009325s
add_subnet (15000)      2.304  (± 0.0%) i/s -  23.000  in 10.058093s
Memory stats
Total objects allocated: 77750679
Changed:
Calculating -------------------------------------
add_subnet (1)         33.397k (±17.1%) i/s - 308.262k in  9.820294s
add_subnet (5)          8.128k (±12.0%) i/s -  78.568k in  9.937083s
add_subnet (50)       845.850  (± 8.6%) i/s -   8.355k in  9.993047s
add_subnet (500)       82.899  (± 6.0%) i/s - 825.000  in  9.999682s
add_subnet (1000)      40.876  (± 7.3%) i/s - 407.000  in 10.017958s
add_subnet (15000)      2.575  (± 0.0%) i/s -  26.000  in 10.158017s
Memory stats
Total objects allocated: 66702210
Total heap pages allocated: 760
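The suggested change could look roughly like this. It is a sketch under assumed names: a Subnet that already builds an IPAddr in its constructor simply keeps it and exposes it via the proposed attr_reader, so callers reuse IPAddr's already-parsed integer instead of re-parsing the dotted-quad string.

```ruby
require 'ipaddr'

# Hypothetical minimal Subnet: the real class has more to it, but the
# relevant part is that the IPAddr is parsed once and then reused.
class Subnet
  attr_reader :network, :netmask, :ipaddr  # :ipaddr is the proposed addition

  def initialize(network, netmask)
    @network = network
    @netmask = netmask
    @ipaddr  = IPAddr.new("#{network}/#{netmask}")  # string parsed once, here
  end
end

subnet = Subnet.new('192.168.1.0', '255.255.255.0')
key = subnet.ipaddr.to_i  # no per-call string-to-integer conversion
```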
   end
   if a_slice.size == 2
     return all_subnets[a_slice.first] if all_subnets[a_slice.first].include?(address)
Replacing the Subnet#include? calls to avoid IPAddr would probably be a good idea, as you suggest. Creating new IPAddr objects from the address argument is taking approximately 25% of the wall time when calling find_subnet 1M times.
I would, I thought it was quite useful. I'll cherry-pick it from your branch.
I'll definitely take a look.
Out of curiosity, what are you using for profiling?
ruby-prof (domcleal@de5dd3c) and then KCacheGrind to view the resulting file.
Calculating -------------------------------------
add_subnet (1)         31.170k (±17.0%) i/s - 288.249k in  9.820534s
add_subnet (5)          7.418k (±12.6%) i/s -  71.478k in  9.938092s
add_subnet (50)       767.327  (±10.4%) i/s -   7.548k in  9.992509s
add_subnet (500)       74.870  (± 9.3%) i/s - 742.000  in 10.011340s
add_subnet (1000)      37.684  (± 8.0%) i/s - 375.000  in 10.009325s
add_subnet (15000)      2.304  (± 0.0%) i/s -  23.000  in 10.058093s
Memory stats
Total objects allocated: 77750679
Total heap pages allocated: 760
Adds 200 hosts to each /24 subnet and runs find_subnet for each host.
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)          612.695 (±14.4%) i/s -   5.672k in  9.972658s
find_subnet (5 subnets, 1000 hosts)          99.585 (±11.0%) i/s - 980.000 in  9.995022s
find_subnet (50 subnets, 10000 hosts)         9.239 (±10.8%) i/s -  92.000 in 10.082209s
find_subnet (500 subnets, 100000 hosts)       0.816 (± 0.0%) i/s -   9.000 in 11.036854s
find_subnet (5000 subnets, 1000000 hosts)     0.068 (± 0.0%) i/s -   1.000 in 14.690730s
(cherry picked from commit c3cb604b281ffa9584a4ccbd17f7c973b214f5dc)
There were the following issues with the commit message:
If you don't have a ticket number, please create an issue in Redmine. More guidelines are available in Coding Standards or on the Foreman wiki. This message was auto-generated by Foreman's prprocessor.
I spent a good chunk of the day today trying to optimize the trie-based implementation (see the results here: https://github.com/witlessbird/smart-proxy/tree/find_subnet_speedup_trie). Unfortunately, IP lookup is still about twice as slow compared to binary search, and at this point I don't know if I can make the implementation any faster. Please see results below.
For comparison, the binary-search based implementation (note that memory usage and object allocations are lower here too, which might contribute to the overall speed): [EDIT]: object allocations for find_subnet are actually higher here, while the memory usage is down (probably lots of hash keys, or something).
Yeah, I think any optimisation of the trie would be over the top and probably make this rather complex to maintain. I'm pretty certain it is possible to optimise though, since it's the same kind of longest-prefix matching algorithm that routers use to search routing tables (i.e. given a set of network prefixes and masks, find the prefix that matches a given IP - this is #find_subnet). There's a lot of literature and known ways to search these quickly.

Anyway, I came up with an alternative to the binary search which also scales near-linearly for any number of subnets (for a fixed network address size) and is much simpler than the trie. Using the existing hash of integer prefixes to subnets, take the input IP address and zero out the least significant bit, check the hash for a matching prefix, zero out the next least significant bit, and repeat. Eventually you'll find a network prefix that matches the input IP address prefix, then just compare the netmask to check the IP is within the range. At worst, for an IPv4 address, you'd perform 32 hash lookups (for a /0 prefix), and usually significantly fewer.
Before:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)            1.256k (± 9.6%) i/s -  12.413k in  9.993943s
find_subnet (5 subnets, 1000 hosts)         189.718  (± 8.4%) i/s -   1.884k in 10.003885s
find_subnet (50 subnets, 10000 hosts)        15.123  (± 6.6%) i/s - 151.000  in 10.000912s
find_subnet (500 subnets, 100000 hosts)       1.239  (± 0.0%) i/s -  13.000  in 10.497013s
find_subnet (5000 subnets, 1000000 hosts)     0.101  (± 0.0%) i/s -   2.000  in 19.721190s
Memory stats
Total objects allocated: 228393390
Total heap pages allocated: 4310

After:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)            1.492k (± 9.1%) i/s -  14.754k in  9.992266s
find_subnet (5 subnets, 1000 hosts)         296.580  (± 8.1%) i/s -   2.945k in 10.000121s
find_subnet (50 subnets, 10000 hosts)        28.152  (± 7.1%) i/s - 280.000  in 10.006165s
find_subnet (500 subnets, 100000 hosts)       2.823  (± 0.0%) i/s -  29.000  in 10.276872s
find_subnet (5000 subnets, 1000000 hosts)     0.274  (± 0.0%) i/s -   3.000  in 10.959625s
Memory stats
Total objects allocated: 134584664
Total heap pages allocated: 4285

Commits are at: https://github.com/domcleal/smart-proxy/commits/17387-find-subnet-wb. If you like I can submit this as a PR instead for proper review? I also optimised the
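The bit-zeroing lookup described above can be sketched as follows. Names are assumptions for illustration: `subnets` stands in for the existing hash of integer network prefixes, and `SubnetEntry` for a subnet that carries its netmask as an integer.

```ruby
# Hypothetical minimal subnet record: network address and netmask as ints.
SubnetEntry = Struct.new(:network_i, :netmask_i)

def ipv4_to_i(address)
  address.split('.').inject(0) { |acc, o| (acc << 8) | o.to_i }
end

# Check the hash for the address itself (/32), then progressively zero
# the least significant bits, checking after each: at most 33 lookups.
def find_subnet(subnets, address_i)
  candidate = address_i
  33.times do |bit|
    entry = subnets[candidate]
    # a prefix hit only counts if the netmask confirms membership
    return entry if entry && (address_i & entry.netmask_i) == entry.network_i
    candidate &= ~(1 << bit)  # zero the next least significant bit
  end
  nil
end
```

The final netmask comparison matters: a hash key can coincide with a zeroed-out candidate without the address actually falling inside that subnet's range.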
We might be hitting a native-code vs. Ruby performance penalty: the trie should be fast (especially after replacing recursion with loops, arrays with bitmaps, etc.), but the hash table is implemented in C. Could you please submit a PR with your implementation of find_subnet? It's faster most of the time (it should be faster for add_subnet too, as it doesn't need to maintain an ordered key set) and simpler too.
Ah, that's a very good point.
Yes, add_subnet performance did increase as a result of not needing to add to the set. Thanks for checking, I'll submit a PR later once I've tidied and checked the changes.
Closing in favour of #477.
Thanks for your work on this PR!
Np, thanks for looking into this with me.
No description provided.