Fixes #17387 - SubnetService#find_subnet is O(log n) now. #476
Conversation
Another change: treat subnets as a special case in the #find_network method (no need to binary-search when I can directly match the hash key). This is before the shortcut was in place (binary search only):
And this is with the subnet shortcut in place:
I can speed up #find_network a little bit more by doing the IP-to-subnet check myself (I already have the IP and subnet addresses as ints; I only need the subnet mask as an int to do the check).
> I can speed up #find_network a little bit more by doing the IP-to-subnet check myself (I already have the IP and subnet addresses as ints; I only need the subnet mask as an int to do the check).
Please do, my profiling shows it might improve performance.
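The integer check being discussed can be sketched in a few lines. This is an illustration, not the code from the PR: `ipv4_to_i` mirrors the helper referenced in the diff hunks below, and `in_subnet?` is a hypothetical name for the mask-and-compare test.

```ruby
# Convert a dotted-quad string to a 32-bit integer by folding in each octet.
# (Illustrative re-implementation of the ipv4_to_i helper seen in the diff.)
def ipv4_to_i(address)
  address.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# An IP belongs to a subnet when masking it with the netmask yields the
# subnet's network address: two integer ops, no IPAddr allocation.
def in_subnet?(ip_i, network_i, netmask_i)
  (ip_i & netmask_i) == network_i
end

network = ipv4_to_i('192.168.1.0')
netmask = ipv4_to_i('255.255.255.0')
in_subnet?(ipv4_to_i('192.168.1.42'), network, netmask) # => true
in_subnet?(ipv4_to_i('10.0.0.1'),     network, netmask) # => false
```

Avoiding the per-call IPAddr construction is the whole point: the mask-and-compare runs on plain Fixnums, which is why it shows up in the profiles below.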
Would you like to include any benchmark scripts in this commit? They don't have to run as part of the test suite, but may be useful for future regression testing. Mine are in the 17387-find-subnet-wb branch on my fork and contain scripts for both add_subnet and find_network.
I experimented with trie storage of subnets to see if it would be better than a binary search. The performance should be more linear as the number of subnets increases:
Base:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)          612.695 (±14.4%) i/s -   5.672k in  9.972658s
find_subnet (5 subnets, 1000 hosts)          99.585 (±11.0%) i/s - 980.000 in  9.995022s
find_subnet (50 subnets, 10000 hosts)         9.239 (±10.8%) i/s -  92.000 in 10.082209s
find_subnet (500 subnets, 100000 hosts)       0.816 (± 0.0%) i/s -   9.000 in 11.036854s
find_subnet (5000 subnets, 1000000 hosts)     0.068 (± 0.0%) i/s -   1.000 in 14.690730s
With trie:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)          177.660 (±11.8%) i/s -   1.740k in  9.986557s
find_subnet (5 subnets, 1000 hosts)          34.654 (±14.4%) i/s - 340.000 in 10.008077s
find_subnet (50 subnets, 10000 hosts)         3.468 (± 0.0%) i/s -  35.000 in 10.154915s
find_subnet (500 subnets, 100000 hosts)       0.347 (± 0.0%) i/s -   4.000 in 11.522769s
find_subnet (5000 subnets, 1000000 hosts)     0.034 (± 0.0%) i/s -   1.000 in 29.217843s
though performance is currently worse - probably because it's not optimised at all (the tree could be compressed, the code improved, among other things). If that interests you, it's in my 17387-find-subnet-wb-trie branch, but given how it's performing right now, I'm happy with the bsearch here.
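For reference, the structure experimented with above can be sketched as a plain (uncompressed) binary trie keyed on address bits. This is an illustrative sketch, not the code in the linked branch; all names here (`TrieNode`, `PrefixTrie`, `ipv4_to_i`) are assumptions.

```ruby
def ipv4_to_i(address)
  address.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# One trie node per address bit; a node holding a subnet marks the end
# of an inserted network prefix.
class TrieNode
  attr_accessor :subnet
  attr_reader :children
  def initialize
    @children = [nil, nil]
    @subnet = nil
  end
end

class PrefixTrie
  def initialize
    @root = TrieNode.new
  end

  # Insert a subnet under the first prefix_len bits of its network address.
  def insert(network_i, prefix_len, subnet)
    node = @root
    prefix_len.times do |i|
      bit = (network_i >> (31 - i)) & 1
      node.children[bit] ||= TrieNode.new
      node = node.children[bit]
    end
    node.subnet = subnet
  end

  # Walk the address bit by bit, remembering the deepest subnet seen:
  # that is the longest matching prefix (what #find_subnet needs).
  def longest_match(address_i)
    node = @root
    best = node.subnet
    32.times do |i|
      node = node.children[(address_i >> (31 - i)) & 1]
      break if node.nil?
      best = node.subnet unless node.subnet.nil?
    end
    best
  end
end
```

Being uncompressed, this walks up to 32 nodes per lookup, which is consistent with the numbers above being worse than the bsearch; path compression would cut the walk short at the first point where only one subnet remains.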
@@ -35,22 +41,44 @@ def add_subnets(*subnets)
   end

   def delete_subnet(subnet_address)
-    m.synchronize { subnets.delete(subnet_address) }
+    m.synchronize { subnets.delete(ipv4_to_i(subnet_address)); subnet_keys.delete(ipv4_to_i(subnet_address)) }
Split this block onto multiple lines instead of using ;
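One way the suggested split could look, as a self-contained sketch (the class, its stores, and `ipv4_to_i` are stand-ins for the real SubnetService internals, not the PR's code). It also hoists the repeated `ipv4_to_i` call out of the block:

```ruby
# Stand-in for the service: a mutex-guarded hash of subnets plus the
# ordered key set the bsearch needs.
class SubnetStore
  attr_reader :subnets, :subnet_keys

  def initialize
    @m = Mutex.new
    @subnets = {}
    @subnet_keys = []
  end

  def ipv4_to_i(address)
    address.split('.').inject(0) { |acc, o| (acc << 8) | o.to_i }
  end

  def add(subnet_address, value)
    key = ipv4_to_i(subnet_address)
    @m.synchronize do
      @subnets[key] = value
      @subnet_keys << key
    end
  end

  def delete_subnet(subnet_address)
    key = ipv4_to_i(subnet_address)   # computed once instead of twice
    @m.synchronize do                 # one statement per line, no ';'
      @subnets.delete(key)
      @subnet_keys.delete(key)
    end
  end
end
```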
                     ::Proxy::MemoryStore.new, ::Proxy::MemoryStore.new, ::Proxy::MemoryStore.new)
   end

   def add_subnet(subnet)
     m.synchronize do
       raise Proxy::DHCP::Error, "Unable to add subnet #{subnet}" if find_subnet(subnet.network)
       key = ipv4_to_i(subnet.network) & ipv4_to_i(subnet.netmask)
ipv4_to_i(subnet.network) can be replaced by subnet.ipaddr.to_i (if you add an attr_reader), which seems to have a small performance benefit here. I think IPAddr already calculates this value, and ipv4_to_i has a slight cost.
Base:
Calculating -------------------------------------
add_subnet (1)         31.170k (±17.0%) i/s - 288.249k in  9.820534s
add_subnet (5)          7.418k (±12.6%) i/s -  71.478k in  9.938092s
add_subnet (50)       767.327  (±10.4%) i/s -   7.548k in  9.992509s
add_subnet (500)       74.870  (± 9.3%) i/s - 742.000  in 10.011340s
add_subnet (1000)      37.684  (± 8.0%) i/s - 375.000  in 10.009325s
add_subnet (15000)      2.304  (± 0.0%) i/s -  23.000  in 10.058093s
Memory stats
Total objects allocated: 77750679
Changed:
Calculating -------------------------------------
add_subnet (1)         33.397k (±17.1%) i/s - 308.262k in  9.820294s
add_subnet (5)          8.128k (±12.0%) i/s -  78.568k in  9.937083s
add_subnet (50)       845.850  (± 8.6%) i/s -   8.355k in  9.993047s
add_subnet (500)       82.899  (± 6.0%) i/s - 825.000  in  9.999682s
add_subnet (1000)      40.876  (± 7.3%) i/s - 407.000  in 10.017958s
add_subnet (15000)      2.575  (± 0.0%) i/s -  26.000  in 10.158017s
Memory stats
Total objects allocated: 66702210
Total heap pages allocated: 760
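The suggested change could look roughly like this. It is a sketch under assumed names: a Subnet that already builds an IPAddr in its constructor simply keeps it and exposes it via the proposed attr_reader, so callers reuse IPAddr's already-parsed integer instead of re-parsing the dotted-quad string.

```ruby
require 'ipaddr'

# Hypothetical minimal Subnet: the real class has more to it, but the
# relevant part is that the IPAddr is parsed once and then reused.
class Subnet
  attr_reader :network, :netmask, :ipaddr  # :ipaddr is the proposed addition

  def initialize(network, netmask)
    @network = network
    @netmask = netmask
    @ipaddr  = IPAddr.new("#{network}/#{netmask}")  # string parsed once, here
  end
end

subnet = Subnet.new('192.168.1.0', '255.255.255.0')
key = subnet.ipaddr.to_i  # no per-call string-to-integer conversion
```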
   end
   if a_slice.size == 2
     return all_subnets[a_slice.first] if all_subnets[a_slice.first].include?(address)
Replacing the Subnet#include? calls to avoid IPAddr would probably be a good idea, as you suggest. Creating new IPAddr objects from the address argument is taking approximately 25% of the wall time when calling find_subnet 1M times.
I would, I thought it was quite useful. I'll cherry-pick it from your branch.
I'll definitely take a look.
Out of curiosity, what are you using for profiling?
ruby-prof (domcleal@de5dd3c) and then KCacheGrind to view the resulting file.
Calculating -------------------------------------
add_subnet (1)         31.170k (±17.0%) i/s - 288.249k in  9.820534s
add_subnet (5)          7.418k (±12.6%) i/s -  71.478k in  9.938092s
add_subnet (50)       767.327  (±10.4%) i/s -   7.548k in  9.992509s
add_subnet (500)       74.870  (± 9.3%) i/s - 742.000  in 10.011340s
add_subnet (1000)      37.684  (± 8.0%) i/s - 375.000  in 10.009325s
add_subnet (15000)      2.304  (± 0.0%) i/s -  23.000  in 10.058093s
Memory stats
Total objects allocated: 77750679
Total heap pages allocated: 760
Adds 200 hosts to each /24 subnet and runs find_subnet for each host.
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)          612.695 (±14.4%) i/s -   5.672k in  9.972658s
find_subnet (5 subnets, 1000 hosts)          99.585 (±11.0%) i/s - 980.000 in  9.995022s
find_subnet (50 subnets, 10000 hosts)         9.239 (±10.8%) i/s -  92.000 in 10.082209s
find_subnet (500 subnets, 100000 hosts)       0.816 (± 0.0%) i/s -   9.000 in 11.036854s
find_subnet (5000 subnets, 1000000 hosts)     0.068 (± 0.0%) i/s -   1.000 in 14.690730s
(cherry picked from commit c3cb604b281ffa9584a4ccbd17f7c973b214f5dc)
There were the following issues with the commit message:
If you don't have a ticket number, please create an issue in Redmine. More guidelines are available in Coding Standards or on the Foreman wiki. This message was auto-generated by Foreman's prprocessor.
I spent a good chunk of the day today trying to optimize the trie-based implementation (see the results here: https://github.com/witlessbird/smart-proxy/tree/find_subnet_speedup_trie). Unfortunately, IP lookup is still about twice as slow compared to binary search, and at this point I don't know if I can make the implementation any faster. Please see results below.
For comparison, the binary-search based implementation (note that memory usage and object allocations are lower here too, which might contribute to the overall speed): [EDIT]: object allocations for find_subnet are actually higher here, while the memory usage is down (probably lots of hash keys, or something).
Yeah, I think any optimisation of the trie would be over the top and probably make this rather complex to maintain. I'm pretty certain it is possible to optimise though, since it's the same kind of longest-prefix matching algorithm that routers use to search routing tables (i.e. given a set of network prefixes and masks, find the prefix that matches a given IP - this is #find_subnet). There's a lot of literature and known ways to search these quickly.

Anyway, I came up with an alternative to the binary search which also scales near-linearly for any number of subnets (for a fixed network address size) and is much simpler than the trie. Using the existing hash of integer prefixes to subnets, take the input IP address and zero out the least significant bit, check the hash for a matching prefix, zero out the next least significant bit, and repeat. Eventually you'll find a network prefix that matches the input IP address prefix, then just compare the netmask to check the IP is within the range. At worst, for an IPv4 address, you'd perform 32 hash lookups (for a /0 prefix), and usually significantly fewer.
Before:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)            1.256k (± 9.6%) i/s -  12.413k in  9.993943s
find_subnet (5 subnets, 1000 hosts)         189.718  (± 8.4%) i/s -   1.884k in 10.003885s
find_subnet (50 subnets, 10000 hosts)        15.123  (± 6.6%) i/s - 151.000  in 10.000912s
find_subnet (500 subnets, 100000 hosts)       1.239  (± 0.0%) i/s -  13.000  in 10.497013s
find_subnet (5000 subnets, 1000000 hosts)     0.101  (± 0.0%) i/s -   2.000  in 19.721190s
Memory stats
Total objects allocated: 228393390
Total heap pages allocated: 4310

After:
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)            1.492k (± 9.1%) i/s -  14.754k in  9.992266s
find_subnet (5 subnets, 1000 hosts)         296.580  (± 8.1%) i/s -   2.945k in 10.000121s
find_subnet (50 subnets, 10000 hosts)        28.152  (± 7.1%) i/s - 280.000  in 10.006165s
find_subnet (500 subnets, 100000 hosts)       2.823  (± 0.0%) i/s -  29.000  in 10.276872s
find_subnet (5000 subnets, 1000000 hosts)     0.274  (± 0.0%) i/s -   3.000  in 10.959625s
Memory stats
Total objects allocated: 134584664
Total heap pages allocated: 4285

Commits are at: https://github.com/domcleal/smart-proxy/commits/17387-find-subnet-wb. If you like I can submit this as a PR instead for proper review? I also optimised the
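The bit-zeroing lookup described above can be sketched as follows. Names are assumptions for illustration: `subnets` stands in for the existing hash of integer network prefixes, and `SubnetEntry` for a subnet that carries its netmask as an integer.

```ruby
# Hypothetical minimal subnet record: network address and netmask as ints.
SubnetEntry = Struct.new(:network_i, :netmask_i)

def ipv4_to_i(address)
  address.split('.').inject(0) { |acc, o| (acc << 8) | o.to_i }
end

# Check the hash for the address itself (/32), then progressively zero
# the least significant bits, checking after each: at most 33 lookups.
def find_subnet(subnets, address_i)
  candidate = address_i
  33.times do |bit|
    entry = subnets[candidate]
    # a prefix hit only counts if the netmask confirms membership
    return entry if entry && (address_i & entry.netmask_i) == entry.network_i
    candidate &= ~(1 << bit)  # zero the next least significant bit
  end
  nil
end
```

The final netmask comparison matters: a hash key can coincide with a zeroed-out candidate without the address actually falling inside that subnet's range.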
We might be hitting a native-code vs. Ruby performance penalty: the trie should be fast (especially after replacing recursion with loops, arrays with bitmaps, etc.), but the hash table is implemented in C. Could you please submit a PR with your implementation of find_subnet? It's faster most of the time (it should be faster for add_subnet too, as it doesn't need to maintain an ordered key set) and simpler too.
Ah, that's a very good point.
Yes, add_subnet performance did increase as a result of not needing to add to the set. Thanks for checking, I'll submit a PR later once I've tidied and checked the changes.
Closing in favour of #477.
Thanks for your work on this PR!
Np, thanks for looking into this with me.
No description provided.