Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookup fails #3

Closed
alaz opened this issue Apr 11, 2019 · 3 comments
Closed

Lookup fails #3

alaz opened this issue Apr 11, 2019 · 3 comments

Comments

@alaz
Copy link

alaz commented Apr 11, 2019

Hi,

I am an author of Legitbot, it's a library to detect whether a request comes from a valid Web crawler or from some kind of impersonator. I am using "segment_tree" to quickly lookup IPs in a list of Facebook ranges. And we have a problem when looking for a valid IP and getting negative result back: alaz/legitbot#10

I tried to come up with a smaller test case, but frankly it does not look too small. Here I have the full list of Facebook IP ranges, then I'm building a segment tree and the lookup fails:

a = ["69.63.176.0/20", "66.220.144.0/20", "66.220.144.0/21", "69.63.184.0/21", "69.63.176.0/21", "74.119.76.0/22", "69.171.255.0/24", "173.252.64.0/18", "69.171.224.0/19", "69.171.224.0/20", "103.4.96.0/22", "69.63.176.0/24", "173.252.64.0/19", "173.252.70.0/24", "31.13.64.0/18", "31.13.24.0/21", "66.220.152.0/21", "66.220.159.0/24", "69.171.239.0/24", "69.171.240.0/20", "31.13.64.0/19", "31.13.64.0/24", "31.13.65.0/24", "31.13.67.0/24", "31.13.68.0/24", "31.13.69.0/24", "31.13.70.0/24", "31.13.71.0/24", "31.13.72.0/24", "31.13.73.0/24", "31.13.74.0/24", "31.13.75.0/24", "31.13.76.0/24", "31.13.77.0/24", "31.13.96.0/19", "31.13.66.0/24", "173.252.96.0/19", "69.63.178.0/24", "31.13.78.0/24", "31.13.79.0/24", "31.13.80.0/24", "31.13.82.0/24", "31.13.83.0/24", "31.13.84.0/24", "31.13.85.0/24", "31.13.86.0/24", "31.13.87.0/24", "31.13.88.0/24", "31.13.89.0/24", "31.13.90.0/24", "31.13.91.0/24", "31.13.92.0/24", "31.13.93.0/24", "31.13.94.0/24", "31.13.95.0/24", "69.171.253.0/24", "69.63.186.0/24", "31.13.81.0/24", "179.60.192.0/22", "179.60.192.0/24", "179.60.193.0/24", "179.60.194.0/24", "179.60.195.0/24", "185.60.216.0/22", "45.64.40.0/22", "185.60.216.0/24", "185.60.217.0/24", "185.60.218.0/24", "185.60.219.0/24", "129.134.0.0/16", "157.240.0.0/16", "157.240.8.0/24", "157.240.0.0/24", "157.240.1.0/24", "157.240.2.0/24", "157.240.3.0/24", "157.240.4.0/24", "157.240.5.0/24", "157.240.6.0/24", "157.240.7.0/24", "157.240.9.0/24", "157.240.10.0/24", "157.240.16.0/24", "157.240.19.0/24", "157.240.11.0/24", "157.240.12.0/24", "157.240.13.0/24", "157.240.14.0/24", "157.240.15.0/24", "157.240.17.0/24", "157.240.18.0/24", "157.240.20.0/24", "157.240.21.0/24", "157.240.22.0/24", "157.240.23.0/24", "157.240.0.0/17", "69.171.250.0/24", "157.240.24.0/24", "157.240.25.0/24", "199.201.64.0/24", "199.201.65.0/24", "199.201.64.0/22", "204.15.20.0/22", "157.240.192.0/24", "129.134.0.0/17", "157.240.198.0/24"]

irb(main):061:0> ss = SegmentTree.new(a.map {|ip| [IPAddr.new(ip).to_range, true]})
=> SegmentTree(31.13.24.0..204.15.23.255)

irb(main):062:0> ss.find(IPAddr.new('69.171.251.1'))
=> nil

But If I select only some part of ranges from the same list, then lookup in the resulting segment tree succeeds:

irb(main):063:0> s = SegmentTree.new(a.select {|ip| /^69.171/ === ip}.map {|ip| [IPAddr.new(ip).to_range, true]})
=> SegmentTree(69.171.224.0..69.171.255.255)
irb(main):064:0> s.find(IPAddr.new('69.171.251.1'))
=> #<SegmentTree::Segment:0x00007fb4c8211978 @range=#<IPAddr: IPv4:69.171.240.0/255.255.240.0>..#<IPAddr: IPv4:69.171.255.255/255.255.240.0>, @value=true>

The most evident observation is that some ranges overlap and the list is not sorted. I shall appreciate your help on this issue.

Regards,
Alexander.

@kirichkov
Copy link

I've been investigating the issue myself, and this is most definitely caused by the overlap of IP address ranges, using different netmasks.

I've pinpointed the issue to

def matches?(x, low_idx, high_idx, idx) #:nodoc:

When the IP ranges are sorted, it just happens so that on the first run the left (Index 51) is #<SegmentTree::Segment:0x00007f97a9c112c0 @range=#<IPAddr: IPv4:69.171.250.0/255.255.255.0>..#<IPAddr: IPv4:69.171.250.255/255.255.255.0>, @value=true>

the right (Index 53) is #<SegmentTree::Segment:0x00007f97a9c12ee0 @range=#<IPAddr: IPv4:69.171.255.0/255.255.255.0>..#<IPAddr: IPv4:69.171.255.255/255.255.255.0>, @value=true>

and segment (Index 52) is #<SegmentTree::Segment:0x00007f97a9c11f90 @range=#<IPAddr: IPv4:69.171.253.0/255.255.255.0>..#<IPAddr: IPv4:69.171.253.255/255.255.255.0>, @value=true>

But because the enquired IP address is further two places to the left at index 50 #<SegmentTree::Segment:0x00007faf0dc564f8 @range=#<IPAddr: IPv4:69.171.240.0/255.255.240.0>..#<IPAddr: IPv4:69.171.255.255/255.255.240.0>, @value=true>,
That's because it falls into a larger range, which starts earlier.

The binary search gives up, as it expects 69.171.251.1 to be after the current left, while in fact it is not.

With the current implementation of SegmentTree using a sorted array and binary search, I don't see how this can be solved in here. Perhaps the list of IPs can be broken up beforehand into the least-common-denominator netmask and passed to SegmentTree as a list of same-sized netmask ranges.

@take-five
Copy link
Owner

Hi, thanks for reporting and investigating the bug!

So it seems that binary search implementation is fundamentally incorrect and the library requires an overhaul. I'm not sure if I'll have enough time to do that in the nearest future.

In the meantime, can you try an interval tree for searching the IP-address? https://github.com/misshie/interval-tree seems to fit your needs. I tested it on your dataset and it seems to work fine:

subnets = ["69.63.176.0/20", "66.220.144.0/20", "66.220.144.0/21", "69.63.184.0/21", "69.63.176.0/21", "74.119.76.0/22", "69.171.255.0/24", "173.252.64.0/18", "69.171.224.0/19", "69.171.224.0/20", "103.4.96.0/22", "69.63.176.0/24", "173.252.64.0/19", "173.252.70.0/24", "31.13.64.0/18", "31.13.24.0/21", "66.220.152.0/21", "66.220.159.0/24", "69.171.239.0/24", "69.171.240.0/20", "31.13.64.0/19", "31.13.64.0/24", "31.13.65.0/24", "31.13.67.0/24", "31.13.68.0/24", "31.13.69.0/24", "31.13.70.0/24", "31.13.71.0/24", "31.13.72.0/24", "31.13.73.0/24", "31.13.74.0/24", "31.13.75.0/24", "31.13.76.0/24", "31.13.77.0/24", "31.13.96.0/19", "31.13.66.0/24", "173.252.96.0/19", "69.63.178.0/24", "31.13.78.0/24", "31.13.79.0/24", "31.13.80.0/24", "31.13.82.0/24", "31.13.83.0/24", "31.13.84.0/24", "31.13.85.0/24", "31.13.86.0/24", "31.13.87.0/24", "31.13.88.0/24", "31.13.89.0/24", "31.13.90.0/24", "31.13.91.0/24", "31.13.92.0/24", "31.13.93.0/24", "31.13.94.0/24", "31.13.95.0/24", "69.171.253.0/24", "69.63.186.0/24", "31.13.81.0/24", "179.60.192.0/22", "179.60.192.0/24", "179.60.193.0/24", "179.60.194.0/24", "179.60.195.0/24", "185.60.216.0/22", "45.64.40.0/22", "185.60.216.0/24", "185.60.217.0/24", "185.60.218.0/24", "185.60.219.0/24", "129.134.0.0/16", "157.240.0.0/16", "157.240.8.0/24", "157.240.0.0/24", "157.240.1.0/24", "157.240.2.0/24", "157.240.3.0/24", "157.240.4.0/24", "157.240.5.0/24", "157.240.6.0/24", "157.240.7.0/24", "157.240.9.0/24", "157.240.10.0/24", "157.240.16.0/24", "157.240.19.0/24", "157.240.11.0/24", "157.240.12.0/24", "157.240.13.0/24", "157.240.14.0/24", "157.240.15.0/24", "157.240.17.0/24", "157.240.18.0/24", "157.240.20.0/24", "157.240.21.0/24", "157.240.22.0/24", "157.240.23.0/24", "157.240.0.0/17", "69.171.250.0/24", "157.240.24.0/24", "157.240.25.0/24", "199.201.64.0/24", "199.201.65.0/24", "199.201.64.0/22", "204.15.20.0/22", "157.240.192.0/24", "129.134.0.0/17", "157.240.198.0/24"]

ranges = subnets.map do |ip|
  range = ip.to_range
  (range.begin.to_i..range.end.to_i) # interval-tree can't work with IPAddr, hence converting to integer
end

tree = IntervalTree::Tree.new(ranges)
ip_to_search = IPAddr.new('69.171.251.1')

found = tree.search(ip_to_search.to_i).size > 1

@alaz
Copy link
Author

alaz commented Apr 11, 2019

Thank you for the suggestion!

@alaz alaz closed this as completed Apr 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants