Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #19205 - Add IPMI IP scanning to BMC API #519

Closed
wants to merge 1 commit into from
Closed

Fixes #19205 - Add IPMI IP scanning to BMC API #519

wants to merge 1 commit into from

Conversation

Deltik
Copy link
Contributor

@Deltik Deltik commented Apr 6, 2017

This commit adds a new feature to smart-proxy's bmc module. This new
feature is an IP address range scanner that returns a list of IP
addresses that respond to IPMI ping packets on UDP port 623.

The use case for this is the discovery of IPMI hosts. Once the API
method has returned the list of responding IPMI hosts, another
application (perhaps Foreman in the future) can iterate over the list to
add the associated servers into Foreman as Foreman hosts and/or
configure the IPMI hosts in existing Foreman hosts' interfaces.

Scanning for IPMI hosts would be an important first step to onboard new
servers into Foreman if the current data center infrastructure
management does not yet have a reliable source of truth of which servers
have which IPMI hosts. Knowing which IP addresses are IPMI hosts would
simplify finding servers and IPMI NICs that are not in Foreman.

New API methods:

  • GET /bmc/scan
    Shows available resources /scan/range and /scan/cidr

  • GET /bmc/scan/range
    Shows usage on specifying a beginning IP address and an ending IP
    address for making a scan request

  • GET /bmc/scan/cidr
    Shows usage on specifying an IP address and its netmask in dot
    decimal format or prefixlen format for making a scan request

  • GET /bmc/scan/range/:address_first/:address_last
    Performs an IPMI ping scan from :address_first to :address_last and
    returns the result in key "result" of a JSON hash

  • GET /bmc/scan/cidr/:address/:netmask
    Performs an IPMI ping scan in the CIDR range of :address/:netmask,
    where :netmask is in decimal format (e.g. "255.255.255.0") or in
    prefixlen format (e.g. "24")

New classes:

  • Proxy::BMC::BaseScanner
    The interface for IP range scanning

  • Proxy::BMC::IPMIScanner
    An IP range scanner for IPMI. Implements Proxy::BMC::BaseScanner.
    Loosely based on ipmiutil_discover: http://ipmiutil.sourceforge.net/
    This class is used when the bmc_provider or bmc_default_provider is
    set to "freeipmi" or "ipmitool" because those imply IPMI BMC.

New configurable items in "bmc.yml":

  • :bmc_scanner_max_range_size
    The largest number of IP addresses that may be scanned in one
    request. Set this value to 0 to turn off the scanner. Defaults to
    65536, which covers a CIDR prefixlen of /16.

  • :bmc_scanner_max_threads_per_request
    The maximum number of IP addresses to scan at the same time, per
    request. Defaults to 500. Before increasing this number, you should
    increase the system limit on the maximum number of open files for the
    Smart Proxy process.

  • :bmc_scanner_socket_timeout_seconds
    How many seconds to wait for each scanned IP address to respond
    before timing out. Defaults to 1.

This commit adds a new feature to smart-proxy's bmc module. This new
feature is an IP address range scanner that returns a list of IP
addresses that respond to IPMI ping packets on UDP port 623.

The use case for this is the discovery of IPMI hosts. Once the API
method has returned the list of responding IPMI hosts, another
application (perhaps Foreman in the future) can iterate over the list to
add the associated servers into Foreman as Foreman hosts and/or
configure the IPMI hosts in existing Foreman hosts' interfaces.

Scanning for IPMI hosts would be an important first step to onboard new
servers into Foreman if the current data center infrastructure
management does not yet have a reliable source of truth of which servers
have which IPMI hosts. Knowing which IP addresses are IPMI hosts would
simplify finding servers and IPMI NICs that are not in Foreman.

New API methods:

 - GET /bmc/scan
   Shows available resources /scan/range and /scan/cidr

 - GET /bmc/scan/range
   Shows usage on specifying a beginning IP address and an ending IP
   address for making a scan request

 - GET /bmc/scan/cidr
   Shows usage on specifying an IP address and its netmask in dot
   decimal format or prefixlen format for making a scan request

 - GET /bmc/scan/range/:address_first/:address_last
   Performs an IPMI ping scan from :address_first to :address_last and
   returns the result in key "result" of a JSON hash

 - GET /bmc/scan/cidr/:address/:netmask
   Performs an IPMI ping scan in the CIDR range of :address/:netmask,
   where :netmask is in decimal format (e.g. "255.255.255.0") or in
   prefixlen format (e.g. "24")

New classes:

 - Proxy::BMC::BaseScanner
   The interface for IP range scanning

 - Proxy::BMC::IPMIScanner
   An IP range scanner for IPMI. Implements Proxy::BMC::BaseScanner.
   Loosely based on ipmiutil_discover: http://ipmiutil.sourceforge.net/
   This class is used when the bmc_provider or bmc_default_provider is
   set to "freeipmi" or "ipmitool" because those imply IPMI BMC.

New configurable items in "bmc.yml":

 - :bmc_scanner_max_range_size
   The largest number of IP addresses that may be scanned in one
   request. Set this value to 0 to turn off the scanner.  Defaults to
   65536, which covers a CIDR prefixlen of /16.

 - :bmc_scanner_max_threads_per_request
   The maximum number of IP addresses to scan at the same time, per
   request. Defaults to 500.  Before increasing this number, you should
   increase the system limit on the maximum number of open files for the
   Smart Proxy process.

 - :bmc_scanner_socket_timeout_seconds
   How many seconds to wait for each scanned IP address to respond
   before timing out. Defaults to 1.
@stbenjam
Copy link
Member

stbenjam commented Apr 6, 2017

[test]

Copy link
Member

@dmitri-d dmitri-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

While this approach might work, I think there's a pretty high likelihood of the call to smart-proxy timing out on larger ranges and/or slower/busier networks. Did you try pinging large number of hosts (a few thousand)? How long did it take?

You may want to consider separating scanning request and retrieving scan results into separate calls: the former would initiate an async job, the latter would fetch its results.

Ultimately, I'm not sure if this belongs in the smart-proxy core, however. AFAIK this doesn't integrate with any current or planned Foreman functionality, and it doesn't support all of available BMC providers (not sure if this is even possible). A dedicated smart-proxy (and Forman?) plugin is probably a better fit. @stbenjam: Any thoughts on this?

!selections.nil?
end

# Not used because of slowness
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the code that's not being used.

socket.connect(address.to_s, 623)
socket.send([0x6, 0x0, 0xff, 0x6, 0x0, 0x0, 0x11, 0xbe, 0x80, 0x0, 0x0, 0x0].pack('C*'), 0)
selections = IO.select([socket], nil, nil, (Proxy::BMC::Plugin.settings.bmc_scanner_socket_timeout_seconds || 1))
socket.close
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leaking file descriptors in case an exception occurs on sending or in select call.

max_threads = [sockets.length / 2, 1].max
# Clean up sockets
sockets.each do |sock|
sock.close
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is leaking file descriptors on any error other than Errno::EMFILE or when no error occur.

end

# Determine maximum number of threads
def calculate_max_threads
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a good approach: smart-proxy usually shares the system with other systems that oftentimes support network infrastructure. This will starving them of file descriptors/possibly cause failures.

def scan_threaded_to_list
return false if !valid?
pinged = Array.new
pool = Concurrent::ThreadPoolExecutor.new(max_threads: calculate_max_threads)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While a couple hundred of threads aren't going to kill a system:

  • they can starve other proxy plugins or processes running on the system
  • a thread per connection is an overkill/is wasteful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest batching addresses, use IO#select + timeout to check for responses in each batch, and use a pool of 10-20 workers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had this before I opted to do a thread per connection.

      def scan_io_to_list
        pinged = Array.new
        sockets = Array.new
        begin
          @range.each do |address|
            socket = UDPSocket.new
            socket.connect(address.to_s, 623)
            socket.send("\x6\0\xff\x6\0\0\x11\xbe\x80\0\0\0", 0)
            sockets << socket
          end
        rescue Errno::EMFILE => error
          p error
          p sockets.length
          exit
        end
        while selections = IO.select(sockets, nil, nil, 1)
          selections[0].each do |socket|
            pinged << socket.peeraddr[2]
            socket.close
          end
        end
        pinged
      end

I agree with you that there should be a hybrid of threads and IO#select.

retry
end
socket.connect(address.to_s, 623)
socket.send([0x6, 0x0, 0xff, 0x6, 0x0, 0x0, 0x11, 0xbe, 0x80, 0x0, 0x0, 0x0].pack('C*'), 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment describing the contents of this packet, together with a link to some docs about the call this is making?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From ipmiutil idiscover:

/*
 * send_ping_pkt:
 * RMCP Ping buffer, sent as a UDP packet to port 0x026f.
 * rmcp.ver     = 0x06          // RMCP Version 1.0
 * rmcp.resvd   = 0x00          // RESERVED
 * rmcp.seq     = 0xff          // no RMCP ACK 
 * rmcp.class   = 0x06          // RMCP_CLASS_ASF
 * asf.iana     = 0x000011BE    // ASF_RMCP_IANA (hi-lo)
 * asf.type     = 0x80          // ASF_TYPE_PING
 * asf.tag      = 0x00          // ASF sequence number
 * asf.resvd    = 0x00          // RESERVED
 * asf.len      = 0x00
 */

socket = UDPSocket.new
rescue Errno::EMFILE
logger.warn "IPMIScanner: Ran out of free file descriptors while creating UDPSocket! Consider increasing file open limit."
retry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a potential to loop for while. I'd suggest putting a hard limit on number of retries and handle file descriptor starvation by reducing the number of simultaneously contacted addresses instead.

end

def valid?
@range.is_a? Range
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of validating ip range size in constructor, I would suggest saving ip range parameters there, and delay the actual range initialization until later (you are doing an explicit validity check anyway).

For example, something like this might work:

def range
  @range ||= (@address_first..@address_last)
  raise "Range too large." if @range.first(scanner_max_range_size+1).size > scanner_max_range_size
end

This will allow you to use exceptions too.

@Deltik
Copy link
Contributor Author

Deltik commented Apr 14, 2017

@witlessbird: I do agree that this functionality should be moved into a separate plugin, since nothing in Foreman can make use of this, so I'll be looking into making a plugin tentatively called "smart_proxy_onboard".

In case you were still curious, here are some performance tests:

Test machine:

         OS: CentOS Linux 7 (Core)
     Kernel: 3.10.0-514.6.1.el7.x86_64
    Product: VMware Virtual Platform
         VM: VMware
        CPU: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
      nproc: 4
        RAM: 3791MiB
  • 256 addresses scanned, 34 IPMI hosts responded:

    • command to count IPMI hosts responded:
      time curl -k --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/foreman.pem --key /etc/puppetlabs/puppet/ssl/private_keys/foreman.pem -H "Accept: application/json" https://10.10.145.47:8443/bmc/scan/range/10.246.0.0/10.246.0.255 -X GET | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | wc -l
      
    • command to time API responses:
      for i in {1..50} ; do { time curl -sk --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/foreman.pem --key /etc/puppetlabs/puppet/ssl/private_keys/foreman.pem -H "Accept: application/json" https://10.10.145.47:8443/bmc/scan/range/10.246.0.0/10.246.0.255 -X GET ;} 2>&1 | grep '^real' 2>&1 ; done
      
    • samples: 50
    • five-number summary and mean, in seconds:
       Min.   :1.484  
       1st Qu.:1.524  
       Median :1.534  
       Mean   :1.540  
       3rd Qu.:1.556  
       Max.   :1.692
      
    • standard deviation: 0.03350714
  • 2044 addresses scanned, 486 IPMI hosts responded:

    • command to count IPMI hosts responded:
      time curl -k --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/foreman.pem --key /etc/puppetlabs/puppet/ssl/private_keys/foreman.pem -H "Accept: application/json" https://10.10.145.47:8443/bmc/scan/range/10.14.64.2/10.14.71.253 -X GET | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | wc -l
      
    • command to time API responses:
      for i in {1..50} ; do { time curl -sk --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/foreman.pem --key /etc/puppetlabs/puppet/ssl/private_keys/foreman.pem -H "Accept: application/json" https://10.10.145.47:8443/bmc/scan/range/10.14.64.2/10.14.71.253 -X GET ;} 2>&1 | grep '^real' 2>&1 ; done
      
    • samples: 50
    • five-number summary and mean, in seconds:
       Min.   :4.342  
       1st Qu.:4.357  
       Median :4.372  
       Mean   :4.411  
       3rd Qu.:4.459  
       Max.   :4.564
      
    • standard deviation: 0.06653151
  • 65536 addresses scanned, 46 IPMI hosts responded:

    • command to benchmark:
      time curl -k --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/foreman.pem --key /etc/puppetlabs/puppet/ssl/private_keys/foreman.pem -H "Accept: application/json" https://10.10.145.47:8443/bmc/scan/range/10.246.0.0/10.246.255.255 -X GET | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | wc -l
      
    • samples: 1
    • response time: 132.629 seconds

If file descriptors are exhausted using the default settings, Smart Proxy uses up to 10.0% of the RAM on the test machine. This means that raising the file descriptor limit would probably make it more likely for an OOM condition to happen. A solution that integrates this in a hybrid with fewer threads could be more optimal.

To compare, one of the slowest Smart Proxy BMC API calls is a Rubyipmi timeout for a non-existent IPMI host, which takes 20 seconds, like so:

root@ops-am2dev-foreman0001 [~]# time curl -k --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/foreman.pem --key /etc/puppetlabs/puppet/ssl/private_keys/foreman.pem -H "Accept: application/json" https://10.10.145.47:8443/bmc/10.246.0.72/chassis/power/status -X GET -u 'admin:password'                                                                                                                                    
{"action":"status","result":null}
real    0m20.379s
user    0m0.176s
sys     0m0.028s

@logicminds
Copy link
Contributor

Is this something we should implement into rubyipmi and perhaps use native functionality in ipmitool and freeipmi?

@Deltik
Copy link
Contributor Author

Deltik commented Apr 14, 2017

@logicminds: This functionality is missing from both IPMItool and FreeIPMI. What I did instead was port some of the code from ipmiutil idiscover and try to make it significantly faster by opening many sockets at once instead of going down the list of addresses sequentially.

@dmitri-d
Copy link
Member

@Deltik: thanks for collecting the stats. Looks pretty good; however if you are going to keep bmc_scanner_max_range_size setting, I think you should consider decoupling scan requests from responses: scanning a /15 subnet (quite exotic, but possible) would likely result in request to proxy timing out.
Also: let me know if you have questions or such about smart-proxy or plugin development.

@logicminds: Personally I always prefer native api calls to parsing shell output, so this looks better to me. Having said that, it's up to @Deltik which approach to take.

@Deltik
Copy link
Contributor Author

Deltik commented Apr 14, 2017

@lzap: Is there any potential for this code to be coupled in with smart_proxy_discovery?

Imagine a process before Foreman Discovery where servers with IPMI are set to PXE boot en masse into the Foreman Discovery Image.

@dmitri-d
Copy link
Member

dmitri-d commented May 4, 2017

closing in favour of #521

@dmitri-d dmitri-d closed this May 4, 2017
@Deltik
Copy link
Contributor Author

Deltik commented May 4, 2017

@witlessbird: This pull request is unrelated to #521, but I suppose this one should remain closed anyway because the code should be a plugin instead.

@dmitri-d
Copy link
Member

dmitri-d commented May 4, 2017

@Deltik: my apologies.

@dmitri-d
Copy link
Member

dmitri-d commented May 5, 2017

@Deltik: Forgot to ask yesterday if you need any help or have questions re: plugins and such...

Deltik pushed a commit to Deltik/smart_proxy_onboard that referenced this pull request May 5, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 5, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 5, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 5, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
Deltik added a commit to Deltik/smart_proxy_onboard that referenced this pull request May 6, 2017
Reimplementation of 0bb903c43c19a389ecd80f5c75877d2962b897c6
@Deltik
Copy link
Contributor Author

Deltik commented May 8, 2017

@witlessbird: I do in fact need some help with the https://github.com/Deltik/smart_proxy_onboard plugin.

Smart Proxy isn't loading the route map that I have specified at https://github.com/Deltik/smart_proxy_onboard/blob/7bcd7304b500f4ac76062fc2e181c7e1a9f9aca0/lib/smart_proxy_onboard/http_config.ru, and I do not know how to troubleshoot this.

How does one load up a console for Smart Proxy? Why aren't the new API methods from the plugin recognized?

@dmitri-d
Copy link
Member

dmitri-d commented May 8, 2017

@Deltik: I'm sorry I missed your pm earlier today. I'll try to take a look at the code today...

@dmitri-d
Copy link
Member

dmitri-d commented May 8, 2017

@Deltik: I created two issues in https://github.com/Deltik/smart_proxy_onboard repository, the mounting point one is the reason for you not seeing any of your api end points.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants