Fixes #17387 - SubnetService#find_subnet is O(log n) now. #476

Closed
wants to merge 4 commits into from
Conversation

dmitri-d
Member

No description provided.

@dmitri-d
Member Author

dmitri-d commented Nov 17, 2016

Another change: treat subnet addresses as a special case in the #find_network method (no need to binary-search when I can directly match the hash key). This is before the shortcut was in place (binary search only):

add_subnet (1)     22.799k (±19.9%) i/s -    207.331k in   9.844120s
add_subnet (5)      3.787k (±10.9%) i/s -     36.732k in   9.965606s
add_subnet (50)    318.338  (± 9.4%) i/s -      3.142k in   9.994796s
add_subnet (500)     25.258  (± 4.0%) i/s -    252.000  in  10.014617s
add_subnet (1000)     10.775  (± 0.0%) i/s -    108.000  in  10.042336s
add_subnet (15000)      0.135  (± 0.0%) i/s -      2.000  in  14.780711s

And this is with the subnet shortcut in place:

add_subnet (1)     26.995k (±18.8%) i/s -    246.420k in   9.823332s
add_subnet (5)      6.473k (±12.9%) i/s -     62.277k in   9.946560s
add_subnet (50)    674.130  (±10.2%) i/s -      6.630k in   9.993145s
add_subnet (500)     66.002  (± 7.6%) i/s -    656.000  in  10.010883s
add_subnet (1000)     33.124  (± 6.0%) i/s -    330.000  in  10.023744s
add_subnet (15000)      2.164  (± 0.0%) i/s -     22.000  in  10.195299s
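
For readers following along, here is a minimal sketch of the idea behind the shortcut, not the code from this PR. `SubnetIndex`, `Subnet`, and this `ipv4_to_i` are hypothetical stand-ins, and the binary search assumes the stored subnets do not overlap.

```ruby
# Hypothetical names throughout; a sketch of the technique, not the PR's code.
Subnet = Struct.new(:network, :mask) do
  # Integer-only membership test.
  def include?(ip)
    (ip & mask) == network
  end
end

def ipv4_to_i(dotted)
  dotted.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

class SubnetIndex
  def initialize
    @subnets = {} # integer network address => Subnet
    @keys = []    # the same keys, kept sorted for binary search
  end

  def add(network, netmask)
    mask = ipv4_to_i(netmask)
    key = ipv4_to_i(network) & mask
    # Insert at the position that keeps @keys sorted.
    pos = @keys.bsearch_index { |k| k >= key } || @keys.size
    @keys.insert(pos, key)
    @subnets[key] = Subnet.new(key, mask)
  end

  def find(address)
    ip = ipv4_to_i(address)
    # Shortcut: a subnet address is itself a hash key, so no search is needed.
    return @subnets[ip] if @subnets.key?(ip)

    # Otherwise binary-search for the greatest key <= ip and verify membership.
    pos = @keys.bsearch_index { |k| k > ip } || @keys.size
    return nil if pos.zero?

    candidate = @subnets[@keys[pos - 1]]
    candidate.include?(ip) ? candidate : nil
  end
end
```

The shortcut matters most for `add_subnet`, which looks up a subnet address before inserting it, so every such check becomes a single hash access.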

@dmitri-d
Member Author

I can speed up #find_network a little bit more by doing ip to subnet check myself (I already have ip and subnet addresses as ints, I only need subnet mask as int to do the check).
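
As an illustration of that check (a sketch with hypothetical helper names, not the code that ended up in the branch): once the address, the subnet address, and the netmask are all integers, membership is a single mask-and-compare.

```ruby
# Hypothetical helper: dotted-quad string to 32-bit integer.
def ipv4_to_i(dotted)
  dotted.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# An address belongs to a subnet when its network bits equal the subnet address.
def in_subnet?(ip_int, network_int, mask_int)
  (ip_int & mask_int) == network_int
end

network = ipv4_to_i('192.168.1.0')
mask    = ipv4_to_i('255.255.255.0')

in_subnet?(ipv4_to_i('192.168.1.77'), network, mask) # inside the /24
in_subnet?(ipv4_to_i('192.168.2.1'), network, mask)  # outside the /24
```

No IPAddr objects are created per lookup, which is where the saving would come from.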

Contributor

@domcleal domcleal left a comment

> I can speed up #find_network a little bit more by doing ip to subnet check myself (I already have ip and subnet addresses as ints, I only need subnet mask as int to do the check).

Please do, my profiling shows it might improve performance.

  1. Would you like to include any benchmark scripts in this commit? They don't have to run as part of the test suite, but may be useful for future regression testing. Mine are in the 17387-find-subnet-wb branch on my fork and contain scripts for both add_subnet and find_network.

  2. I experimented with trie storage of subnets to see if it would be better than a binary search. The performance should be more linear as the number of subnets increases:

Base:

Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                        612.695  (±14.4%) i/s -      5.672k in   9.972658s
find_subnet (5 subnets, 1000 hosts)
                         99.585  (±11.0%) i/s -    980.000  in   9.995022s
find_subnet (50 subnets, 10000 hosts)
                          9.239  (±10.8%) i/s -     92.000  in  10.082209s
find_subnet (500 subnets, 100000 hosts)
                          0.816  (± 0.0%) i/s -      9.000  in  11.036854s
find_subnet (5000 subnets, 1000000 hosts)
                          0.068  (± 0.0%) i/s -      1.000  in  14.690730s

With trie:

Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                        177.660  (±11.8%) i/s -      1.740k in   9.986557s
find_subnet (5 subnets, 1000 hosts)
                         34.654  (±14.4%) i/s -    340.000  in  10.008077s
find_subnet (50 subnets, 10000 hosts)
                          3.468  (± 0.0%) i/s -     35.000  in  10.154915s
find_subnet (500 subnets, 100000 hosts)
                          0.347  (± 0.0%) i/s -      4.000  in  11.522769s
find_subnet (5000 subnets, 1000000 hosts)
                          0.034  (± 0.0%) i/s -      1.000  in  29.217843s

Performance is currently worse, though, probably because the trie isn't optimised at all (the tree could be compressed and the code improved, among other things). If that interests you, it's in my 17387-find-subnet-wb-trie branch, but given how it's performing right now, I'm happy with the bsearch here.
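
For context, a minimal uncompressed binary trie for longest-prefix matching could look like the sketch below. This is an illustration only, not the implementation in the branch mentioned above; `PrefixTrie` and its methods are hypothetical names.

```ruby
# Hypothetical sketch of an uncompressed binary trie over IPv4 prefix bits.
class PrefixTrie
  Node = Struct.new(:children, :value)

  def initialize
    @root = Node.new([nil, nil], nil)
  end

  # network: integer address; prefix_len: number of leading bits that matter.
  def insert(network, prefix_len, value)
    node = @root
    prefix_len.times do |i|
      bit = (network >> (31 - i)) & 1
      node = (node.children[bit] ||= Node.new([nil, nil], nil))
    end
    node.value = value
  end

  # Walks the address bits from the most significant end and remembers the
  # deepest node that carries a value (the longest matching prefix).
  def lookup(ip)
    node = @root
    best = node.value
    32.times do |i|
      node = node.children[(ip >> (31 - i)) & 1]
      break if node.nil?

      best = node.value unless node.value.nil?
    end
    best
  end
end
```

Each lookup walks up to 32 nodes in Ruby code, one per bit, which is consistent with an unoptimised trie losing to a binary search over a few thousand keys.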

@@ -35,22 +41,44 @@ def add_subnets(*subnets)
   end

   def delete_subnet(subnet_address)
-    m.synchronize { subnets.delete(subnet_address) }
+    m.synchronize { subnets.delete(ipv4_to_i(subnet_address)); subnet_keys.delete(ipv4_to_i(subnet_address)) }
Contributor

Split this block onto multiple lines instead of using ;
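
One way the suggestion could look, sketched with a hypothetical stand-in class (`SubnetStore`, its instance variables, and this `ipv4_to_i` are assumptions, not the project's code):

```ruby
require 'monitor'

# Hypothetical stand-in for the service under review; only the shape of the
# synchronized block matters here.
class SubnetStore
  attr_reader :subnets, :subnet_keys

  def initialize
    @m = Monitor.new
    @subnets = {}     # integer network address => subnet object
    @subnet_keys = [] # assumed: a separate collection of the same keys
  end

  def ipv4_to_i(dotted)
    dotted.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
  end

  def add_subnet(subnet_address, subnet)
    key = ipv4_to_i(subnet_address)
    @m.synchronize do
      @subnets[key] = subnet
      @subnet_keys << key
    end
  end

  def delete_subnet(subnet_address)
    key = ipv4_to_i(subnet_address) # converted once, outside the lock
    @m.synchronize do
      @subnets.delete(key)
      @subnet_keys.delete(key)
    end
  end
end
```

Besides dropping the `;`, the do/end form makes it natural to compute the key once instead of calling the conversion twice.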

      ::Proxy::MemoryStore.new, ::Proxy::MemoryStore.new, ::Proxy::MemoryStore.new)
  end

  def add_subnet(subnet)
    m.synchronize do
      raise Proxy::DHCP::Error, "Unable to add subnet #{subnet}" if find_subnet(subnet.network)
      key = ipv4_to_i(subnet.network) & ipv4_to_i(subnet.netmask)
Contributor

ipv4_to_i(subnet.network) can be replaced by subnet.ipaddr.to_i (if you add an attr_reader), which seems to have a small performance benefit here. I think IPAddr already calculates this value, and ipv4_to_i has a slight cost.

Base:

Calculating -------------------------------------
      add_subnet (1)     31.170k (±17.0%) i/s -    288.249k in   9.820534s
      add_subnet (5)      7.418k (±12.6%) i/s -     71.478k in   9.938092s
     add_subnet (50)    767.327  (±10.4%) i/s -      7.548k in   9.992509s
    add_subnet (500)     74.870  (± 9.3%) i/s -    742.000  in  10.011340s
   add_subnet (1000)     37.684  (± 8.0%) i/s -    375.000  in  10.009325s
  add_subnet (15000)      2.304  (± 0.0%) i/s -     23.000  in  10.058093s
Memory stats
Total objects allocated: 77750679

Changed:

Calculating -------------------------------------
      add_subnet (1)     33.397k (±17.1%) i/s -    308.262k in   9.820294s
      add_subnet (5)      8.128k (±12.0%) i/s -     78.568k in   9.937083s
     add_subnet (50)    845.850  (± 8.6%) i/s -      8.355k in   9.993047s
    add_subnet (500)     82.899  (± 6.0%) i/s -    825.000  in   9.999682s
   add_subnet (1000)     40.876  (± 7.3%) i/s -    407.000  in  10.017958s
  add_subnet (15000)      2.575  (± 0.0%) i/s -     26.000  in  10.158017s
Memory stats
Total objects allocated: 66702210
Total heap pages allocated: 760
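
To illustrate the substitution suggested in this comment, here is a small sketch. The `Subnet` class below is a hypothetical simplification (how the real class builds its IPAddr is an assumption), and `ipv4_to_i` is a stand-in for the project's helper.

```ruby
require 'ipaddr'

# Hypothetical stand-in for the project's helper.
def ipv4_to_i(dotted)
  dotted.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# Simplified, assumed shape of a subnet that keeps an IPAddr internally
# and exposes it through an attr_reader.
class Subnet
  attr_reader :network, :netmask, :ipaddr

  def initialize(network, netmask)
    @network = network
    @netmask = netmask
    # IPAddr masks the address with the netmask when it is constructed.
    @ipaddr = IPAddr.new("#{network}/#{netmask}")
  end
end

subnet = Subnet.new('192.168.10.0', '255.255.255.0')

key_via_helper = ipv4_to_i(subnet.network) & ipv4_to_i(subnet.netmask)
key_via_ipaddr = subnet.ipaddr.to_i # already masked, no string parsing
```

Both expressions yield the same integer key; the second reuses work IPAddr has already done, which may explain the lower allocation count in the "Changed" run.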

  end
  if a_slice.size == 2
    return all_subnets[a_slice.first] if all_subnets[a_slice.first].include?(address)
Contributor

Replacing the Subnet#include? calls to avoid IPAddr would probably be a good idea as you suggest. Creating new IPAddr objects from the address argument is taking approximately 25% of the wall time when calling find_subnet 1M times.

@dmitri-d
Member Author

> 1. Would you like to include any benchmark scripts in this commit? They don't have to run as part of the test suite, but may be useful for future regression testing. Mine are in the 17387-find-subnet-wb branch on my fork and contain scripts for both add_subnet and find_network.

I would, I thought it was quite useful. I'll cherry-pick it from your branch.

> I experimented with trie storage of subnets to see if it would be better than a binary search...

I'll definitely take a look.

@dmitri-d
Member Author

Out of curiosity, what are you using for profiling?

@domcleal
Contributor

> Out of curiosity, what are you using for profiling?

ruby-prof (domcleal@de5dd3c) and then KCacheGrind to view the resulting file.

dmitri-d and others added 3 commits November 18, 2016 16:04
Calculating -------------------------------------
      add_subnet (1)     31.170k (±17.0%) i/s -    288.249k in   9.820534s
      add_subnet (5)      7.418k (±12.6%) i/s -     71.478k in   9.938092s
     add_subnet (50)    767.327  (±10.4%) i/s -      7.548k in   9.992509s
    add_subnet (500)     74.870  (± 9.3%) i/s -    742.000  in  10.011340s
   add_subnet (1000)     37.684  (± 8.0%) i/s -    375.000  in  10.009325s
  add_subnet (15000)      2.304  (± 0.0%) i/s -     23.000  in  10.058093s
Memory stats
Total objects allocated: 77750679
Total heap pages allocated: 760
Adds 200 hosts to each /24 subnet and runs find_subnet for each host.

Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                        612.695  (±14.4%) i/s -      5.672k in   9.972658s
find_subnet (5 subnets, 1000 hosts)
                         99.585  (±11.0%) i/s -    980.000  in   9.995022s
find_subnet (50 subnets, 10000 hosts)
                          9.239  (±10.8%) i/s -     92.000  in  10.082209s
find_subnet (500 subnets, 100000 hosts)
                          0.816  (± 0.0%) i/s -      9.000  in  11.036854s
find_subnet (5000 subnets, 1000000 hosts)
                          0.068  (± 0.0%) i/s -      1.000  in  14.690730s

(cherry picked from commit c3cb604b281ffa9584a4ccbd17f7c973b214f5dc)
@theforeman-bot
Member

There were the following issues with the commit message:

  • 3701e1f must be in the format fixes #redmine_number - brief description
  • commit message for 3701e1f is not wrapped at 72nd column
  • commit message for 3701e1f is not wrapped at 72nd column
  • 2669bbc must be in the format fixes #redmine_number - brief description

If you don't have a ticket number, please create an issue in Redmine.

More guidelines are available in Coding Standards or on the Foreman wiki.


This message was auto-generated by Foreman's prprocessor

@dmitri-d
Member Author

  • Replaced IPAddr#include? with a local check.
  • Added benchmarks from 17387-find-subnet-wb-trie

@theforeman-bot
Member

There were the following issues with the commit message:

  • 3701e1f must be in the format fixes #redmine_number - brief description
  • commit message for 3701e1f is not wrapped at 72nd column
  • commit message for 3701e1f is not wrapped at 72nd column
  • 2669bbc must be in the format fixes #redmine_number - brief description
  • 30abcd2 must be in the format fixes #redmine_number - brief description

If you don't have a ticket number, please create an issue in Redmine.

More guidelines are available in Coding Standards or on the Foreman wiki.


This message was auto-generated by Foreman's prprocessor

@dmitri-d
Member Author

dmitri-d commented Nov 21, 2016

I spent a good chunk of the day today trying to optimize the trie-based implementation (see the results here: https://github.com/witlessbird/smart-proxy/tree/find_subnet_speedup_trie). Unfortunately, IP lookup is still about twice as slow as with binary search, and at this point I don't know if I can make the implementation any faster. Please see the results below.

Calculating -------------------------------------
      add_subnet (1)     23.918k (±12.8%) i/s -    225.813k in   9.843063s
      add_subnet (5)      5.656k (± 9.3%) i/s -     55.005k in   9.950457s
     add_subnet (50)    582.296  (± 9.4%) i/s -      5.737k in   9.994719s
    add_subnet (500)     56.424  (±10.6%) i/s -    557.000  in  10.014349s
   add_subnet (1000)     28.647  (± 7.0%) i/s -    286.000  in  10.031897s
  add_subnet (15000)      1.931  (± 0.0%) i/s -     20.000  in  10.365617s
Memory stats
Total objects allocated: 68038558
Total heap pages allocated: 2459
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                        490.860  (± 8.4%) i/s -      4.831k in   9.997520s
find_subnet (5 subnets, 1000 hosts)
                        100.014  (± 6.0%) i/s -    997.000  in  10.005266s
find_subnet (50 subnets, 10000 hosts)
                          9.900  (± 0.0%) i/s -     99.000  in  10.021060s
find_subnet (500 subnets, 100000 hosts)
                          0.976  (± 0.0%) i/s -     10.000  in  10.246424s
find_subnet (5000 subnets, 1000000 hosts)
                          0.096  (± 0.0%) i/s -      1.000  in  10.371006s
Memory stats
Total objects allocated: 87300471
Total heap pages allocated: 7235

For comparison, the binary-search-based implementation (note that memory usage and object allocations are lower here too, which might contribute to the overall speed):

[EDIT]: object allocations for find_subnet are actually higher here, while the memory usage is down (probably lots of hash keys, or something).

Calculating -------------------------------------
      add_subnet (1)     32.187k (±17.5%) i/s -    294.913k in   9.794147s
      add_subnet (5)      8.042k (±14.1%) i/s -     76.814k in   9.929182s
     add_subnet (50)    832.096  (±10.3%) i/s -      8.186k in   9.992165s
    add_subnet (500)     81.789  (± 7.3%) i/s -    814.000  in  10.004160s
   add_subnet (1000)     40.334  (± 5.0%) i/s -    403.000  in  10.023466s
  add_subnet (15000)      2.614  (± 0.0%) i/s -     27.000  in  10.340966s
Memory stats
Total objects allocated: 47389684
Total heap pages allocated: 758
Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                          1.071k (± 9.0%) i/s -     10.596k in   9.993791s
find_subnet (5 subnets, 1000 hosts)
                        162.130  (± 8.6%) i/s -      1.610k in  10.003799s
find_subnet (50 subnets, 10000 hosts)
                         13.148  (± 7.6%) i/s -    132.000  in  10.055822s
find_subnet (500 subnets, 100000 hosts)
                          1.065  (± 0.0%) i/s -     11.000  in  10.325334s
find_subnet (5000 subnets, 1000000 hosts)
                          0.087  (± 0.0%) i/s -      1.000  in  11.439908s
Memory stats
Total objects allocated: 170046915
Total heap pages allocated: 4294

@domcleal
Contributor

Yeah, I think any optimisation of the trie would be over the top and probably make this rather complex to maintain.

I'm pretty certain it is possible to optimise though, since it's the same kind of longest-prefix matching algorithm that routers use to search routing tables (i.e. a set of network prefixes and masks, find the prefix that matches a given IP - this is #find_subnet). There's a lot of literature and known ways to search these quickly.

Anyway, I came up with an alternative for the binary search which also scales linearly for any number of subnets (for a fixed network address size) and is much simpler than the trie.

Using the existing hash of integer prefixes to subnets, take the input IP address and zero out the least significant bit, check the hash for a matching prefix, zero out the next least significant bit and repeat. Eventually you'll find a network prefix that matches the input IP address prefix, then just compare the netmask to check the IP is within the range.
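
A minimal sketch of this lookup, for illustration only (it is not the code in the linked branch; `Subnet`, `find_subnet`, and `ipv4_to_i` here are hypothetical, and continuing the loop after a failed netmask check is a choice made for this sketch):

```ruby
# Hypothetical stand-ins; only the lookup loop illustrates the technique.
Subnet = Struct.new(:network, :mask)

def ipv4_to_i(dotted)
  dotted.split('.').inject(0) { |acc, octet| (acc << 8) | octet.to_i }
end

# subnets: Hash of integer network address => Subnet
def find_subnet(subnets, ip)
  candidate = ip
  0.upto(32) do |bit|
    subnet = subnets[candidate]
    # A key matched; the netmask comparison confirms the address is in range.
    return subnet if subnet && (ip & subnet.mask) == subnet.network

    # Zero out the next least significant bit and try again.
    candidate &= ~(1 << bit)
  end
  nil
end

subnets = {
  ipv4_to_i('10.0.0.0')    => Subnet.new(ipv4_to_i('10.0.0.0'), ipv4_to_i('255.0.0.0')),
  ipv4_to_i('192.168.1.0') => Subnet.new(ipv4_to_i('192.168.1.0'), ipv4_to_i('255.255.255.0'))
}

find_subnet(subnets, ipv4_to_i('192.168.1.200')) # the 192.168.1.0 subnet
find_subnet(subnets, ipv4_to_i('192.168.2.1'))   # nil, no stored subnet contains it
```

The number of hash lookups is bounded by the address width, so the cost of a single lookup does not grow with the number of stored subnets.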

At worst for an IPv4 address you'd perform 32 hash lookups (for a /0 prefix), and usually significantly fewer. Before:

Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                          1.256k (± 9.6%) i/s -     12.413k in   9.993943s
find_subnet (5 subnets, 1000 hosts)
                        189.718  (± 8.4%) i/s -      1.884k in  10.003885s
find_subnet (50 subnets, 10000 hosts)
                         15.123  (± 6.6%) i/s -    151.000  in  10.000912s
find_subnet (500 subnets, 100000 hosts)
                          1.239  (± 0.0%) i/s -     13.000  in  10.497013s
find_subnet (5000 subnets, 1000000 hosts)
                          0.101  (± 0.0%) i/s -      2.000  in  19.721190s
Memory stats
Total objects allocated: 228393390
Total heap pages allocated: 4310

After:

Calculating -------------------------------------
find_subnet (1 subnets, 200 hosts)
                          1.492k (± 9.1%) i/s -     14.754k in   9.992266s
find_subnet (5 subnets, 1000 hosts)
                        296.580  (± 8.1%) i/s -      2.945k in  10.000121s
find_subnet (50 subnets, 10000 hosts)
                         28.152  (± 7.1%) i/s -    280.000  in  10.006165s
find_subnet (500 subnets, 100000 hosts)
                          2.823  (± 0.0%) i/s -     29.000  in  10.276872s
find_subnet (5000 subnets, 1000000 hosts)
                          0.274  (± 0.0%) i/s -      3.000  in  10.959625s
Memory stats
Total objects allocated: 134584664
Total heap pages allocated: 4285

Commits are at: https://github.com/domcleal/smart-proxy/commits/17387-find-subnet-wb. If you like, I can submit this as a PR instead for proper review?

I also optimised the ipv4_to_i(subnet.netmask) call which was repeated and came up in profiling as relatively expensive.

@dmitri-d
Member Author

We might be hitting a native code vs. Ruby performance penalty: a trie should be fast (especially after replacing recursion with loops, arrays with bitmaps, etc.), but the hash table is implemented in C.

Please submit a PR with your implementation of find_subnet? It's faster most of the time (should be faster for add_subnet too, as it doesn't need to maintain an ordered key set) and simpler too.

@domcleal
Contributor

> We might be hitting a native code vs. Ruby performance penalty: a trie should be fast (especially after replacing recursion with loops, arrays with bitmaps, etc.), but the hash table is implemented in C.

Ah, that's a very good point.

> Please submit a PR with your implementation of find_subnet? It's faster most of the time (should be faster for add_subnet too, as it doesn't need to maintain an ordered key set) and simpler too.

Yes, add_subnet performance did increase as a result of not needing to add to the set. Thanks for checking, I'll submit a PR later once I've tidied and checked the changes.

@dmitri-d
Member Author

closing in favour of #477.

@dmitri-d dmitri-d closed this Nov 22, 2016
@domcleal
Contributor

Thanks for your work on this PR!

@dmitri-d
Member Author

Np, thanks for looking into this with me.

@dmitri-d dmitri-d deleted the find_subnet_speedup branch January 31, 2017 11:12