XrdCl issues on dual-stack hosts #326
Comments
I will change the underlying code to make the decision on configured interfaces with a DNS fallback. I want to point out that this is not a slam-dunk solution. We differentiated how a server and a client determine IP availability because for servers you can specify which interfaces are relevant while you cannot do so for a client. Why is that important? Because some sites, for one reason or another, define non-routable interfaces that, for all intents and purposes, look like they should work. Generally, admins automatically register working interfaces in DNS for client-type machines, so basing it on DNS was a safer choice. Clearly, that was not done in this case. So, if the client machine has weird interfaces we may be fooled as well. Though, I suppose it's unlikely that client-type machines will have weird interfaces. We shall see.
Hi Andy, To be clear - about half of the sites we've encountered didn't have DNS configured in a way that is compatible with the client. Further, it's not really DNS - for example, if there's an IPv4 address in /etc/hosts, glibc will skip any DNS lookups. What about the other half of the fix: have the client detect when the Brian
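To make the point about name resolution concrete, here is a minimal sketch of a hostname-based address-family check. It is not the XrdNetUtils code; the names HostFamilies, familiesForName and familiesForLocalHostname are hypothetical and exist only for illustration. It relies on getaddrinfo, which goes through the system resolver configuration (on many glibc systems /etc/hosts is consulted before DNS), so the answer reflects how the hostname happens to be registered rather than which interfaces exist.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result type: which address families a name resolves to.
struct HostFamilies {
    bool ipv4 = false;
    bool ipv6 = false;
};

// Resolve a name through the system resolver and record the families seen.
// Whatever source answers first (for example /etc/hosts) decides the result.
HostFamilies familiesForName(const char* name) {
    HostFamilies out;
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;      // accept both IPv4 and IPv6 answers
    hints.ai_socktype = SOCK_STREAM;  // avoid one entry per socket type
    addrinfo* res = nullptr;
    if (getaddrinfo(name, nullptr, &hints, &res) != 0) {
        return out;                   // unresolvable: neither family detected
    }
    for (addrinfo* p = res; p != nullptr; p = p->ai_next) {
        if (p->ai_family == AF_INET)  out.ipv4 = true;
        if (p->ai_family == AF_INET6) out.ipv6 = true;
    }
    freeaddrinfo(res);
    return out;
}

// Apply the same check to the local hostname, as a hostname-based
// detection scheme would.
HostFamilies familiesForLocalHostname() {
    char name[256] = {0};
    if (gethostname(name, sizeof(name) - 1) != 0) {
        return HostFamilies{};
    }
    return familiesForName(name);
}
```

On a host whose name maps only to an IPv4 address in /etc/hosts, familiesForLocalHostname() would typically report IPv4 only, even if a working IPv6 interface is configured.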
OK, so let me ramble here because there is a lot of history behind what we did. That might work but the socket information has been abstracted out by the time we need to set this information. So, adding it in based on the socket would be a relatively large change. I suppose I could argue that it would be a hack anyway, given that people really need to configure IPv6 correctly (clearly that is not what happens 50% of the time). So, hack is relative to sites that just don't do it right and how do we compensate for that. The answer is that it will never be absolutely right, sigh. I am still prone to just base it on interfaces as it will get us to the 10-20% edge cases where people don't correctly configure the interfaces. We found that to happen most often in VMs but then that's what a lot of worker nodes are becoming. I suppose we can just say that if you have configured a usable IPv6 interface you must register it in DNS or in /etc/hosts (if you are still prone to using /etc/hosts - another big problem). I recall Lukasz and I having a lot of heated discussions about this because it became very clear to us that sites configure IPv6 in a myriad of ways and we couldn't possibly capture all of them and maybe we should just give up and assume a base level of configuration. That, of course, proved to be unworkable as well and part of the reason is that we got slammed in the WLCG IPv6 readiness meeting for proposing it because it would not properly capture IPv6-only clients. At the moment I don't have any easy solution here. Either we instruct people to "register" all of their interfaces (which they should be doing) or we use the configured interface method (which has problems of its own). Basing it on socket connection is very problematic given the code base. So, which poison do you want?
Andy, I think that part of why we worked together so well is that we have never really had a non-heated discussion ;) Anyways, I was trying to make a conscious effort not to comment on these - this is none of my business anymore after all, but since it concerns XrdCl and I was called by name, I will add my two cents. I am and have always been of the opinion that trying too hard to work around sysadmins' mistakes is a strategy that definitely makes many users happy, but it will quite often backfire. If a host has misconfigured network interfaces it really is not XRootD's problem. This statement is meant to support both sides of the discussion in certain aspects. :)
Hi, I'm sorry, but where on earth does this rule about "properly" configured hosts exist (as in, a host is properly configured if and only if the hostname has a resolving DNS entry)? As far as I can see, no such rule exists for IPv4. Why does it exist for IPv6? As a few examples where hostnames may not work:
This issue is pretty bad: we now have to consider disabling IPv6 for all sites because IPv4 stops working for a large number of dual-stack sites. I concede that the existing setup probably looked like the correct approach when it was written: at that point, how sites would deploy IPv6 was a bit more theoretical. Here's my logic for using interface-based semantics:
In fact, using the same logic, we probably should identify a client as dual-stack if there is a public IPv6 address and only a private IPv4 address (given the prevalence of NATs for IPv4 and the scarcity of NATs for IPv6). Brian
What I meant to say is that XRootD will often try to query DNS because it deems the network interfaces misconfigured and tries to work around that. Sometimes it works, sometimes it doesn't. People have asked for conflicting things on different occasions.
My opinion on this has always been: trust the host configuration and let the admins deal with the consequences.
Brian, I completely agree with you that this has to be solved and soon. I am just trying to come up with a solution that doesn't cause other, perhaps more serious, problems (though this problem is pretty serious in and of itself). So, let's review the options:
I think this pretty much covers the solution space. Now we just need to implement one of them. Let me work a bit more on seeing how to pass the connection detail up the call stack since, as you point out, that is the least problematic solution but, obviously, does not cover all possible configurations.
OK, I found a relatively easy way to make the connection test. Essentially, if we determine that we are not dual stacked based on DNS and if we have an IPv4 address but the end-point connection is family AF_INET6 then we mark ourselves as dual stacked. Is there more we should consider?
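The connection test described in this comment can be sketched as a small predicate. This is only an illustration of the stated rule, not the code that went into XRootD; StackInfo and isDualStack are hypothetical names, and how the two input flags are obtained is left out.

```cpp
#include <sys/socket.h>  // AF_INET, AF_INET6

// Hypothetical summary of what the client already knows about itself.
struct StackInfo {
    bool dnsSaysDualStack;  // outcome of the existing DNS-based check
    bool hasIPv4Address;    // the client has at least one IPv4 address
};

// Returns true if the client should report itself as dual stacked.
// Rule from the comment above: when DNS says "not dual stacked" but we
// hold an IPv4 address and the end-point connection is AF_INET6, both
// stacks evidently work, so mark the client as dual stacked.
bool isDualStack(const StackInfo& info, int endpointFamily) {
    if (info.dnsSaysDualStack) {
        return true;
    }
    return info.hasIPv4Address && endpointFamily == AF_INET6;
}
```

For example, isDualStack({false, true}, AF_INET6) yields true, while the same client connected over AF_INET stays classified by the DNS result alone.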
Hi Andy, So, basically you've found a way to do solution (3) above? The logic you mention sounds correct. My preference would be to do both (2) and (3) independently. I suppose we could hide (2) behind, as you say, "yet another" envvar; I too dislike that due to the same reasons you mention. Brian
Yes, solution (3) is possible to do and has been done. I am looking into how to decouple (2). The likely approach (so as not to break the ABI) is to define a new query type, queryINIF. When used it will base the lookup on ifconfig. The default lookup, queryINET, bases it on DNS. If we do add an envvar it would be used to decide which query to actually use.
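A rough sketch of how an environment variable could select between the two query types follows. Only the names queryINET and queryINIF come from the comment above; the NetQuery enum, the variable name EXAMPLE_XRD_NETQUERY, and the value "interfaces" are assumptions made up for this illustration and do not describe the actual XRootD interface.

```cpp
#include <cstdlib>
#include <cstring>

// Assumed representation of the two lookup strategies named in the thread.
enum class NetQuery {
    queryINET,  // default: decide from DNS registration
    queryINIF   // new: decide from configured interfaces
};

// Map a setting to a query type; anything unrecognized keeps the default,
// so existing deployments behave as before.
NetQuery chooseQuery(const char* setting) {
    if (setting != nullptr && std::strcmp(setting, "interfaces") == 0) {
        return NetQuery::queryINIF;
    }
    return NetQuery::queryINET;
}

// Read the (hypothetical) environment variable and pick the query type.
NetQuery chooseQueryFromEnv() {
    return chooseQuery(std::getenv("EXAMPLE_XRD_NETQUERY"));
}
```

Keeping queryINET as the fallback mirrors the stated goal of adding the new behaviour without changing what existing callers get.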
Forgot to include a "fix" in the patch I pushed. But the fix is now in the repo and will be included in 4.3.
XrdCl currently has a serious bug when used on dual-stack hosts. The code for detecting network stack support currently looks at the node's hostname and resolves all available addresses for that hostname (which is incorrect: there's no guarantee the hostname is in any way connected to DNS or network address resolution).
If XrdNetUtils doesn't detect an IPv6 address for the hostname, XrdCl will claim to not be dual-stack when logging into a redirector. If it connects to the Xrootd server via IPv6, then the redirector will (incorrectly) identify the worker node as IPv6-only. This results in errors for CMS as the dual-stack host won't be redirected to IPv4-only servers.
I would suggest a twofold fix:
getifaddrs instead of relying on the configuration of the worker node's hostname.