xrdfs locate -r issue #40
Comments
Apparently, the client has been put into an infinite redirection loop and decided to give up. IMO, kXR_locate should never redirect. It's a server/federation config issue.
We have two top-level redirectors that are peers, so a file that can't be found anywhere fails with "too many redirections". We are fully aware of that; we set it up that way intentionally. The issue here is that the chain of redirections should never start, because the endpoint does have the file, as proven by xrdcp-ing it from the endpoint. I don't know whether kXR_locate should redirect or not, but I fully expected that xrdfs locate would not redirect.
I am finishing up work to automatically avoid redirection loops in peered clusters. The way ATLAS is currently configured guarantees redirection loops when a file is not found at all.
As far as having the file, we keep going over and over this. You know that the endpoint has the file because your view is above/outside the system. If the system says you don't have the file, then the system is correct from its perspective. Most likely the cmsd at that endpoint is not working, which seems to be the most common reason when this happens.
As for locate redirections, sometimes it needs to be done and sometimes not. For regional federations, locate redirection is required; otherwise you will, many times, be unable to locate files even when they logically exist in the federation. However, peered locate redirections are not really appropriate. So that's the approach I'm taking: regional locates will redirect, peered locates will not. I may, though, add a flag that requests peered redirection while doing a locate.
Andy
Are you saying that at 23 of our endpoints cmsd is constantly not working and everything works ok?
I think this is no longer an issue.
We have 23 FAX endpoints that exhibit this problem:
I try to locate a file that we know exists at each of the sites by running "xrdfs server.name locate -r theFile". These endpoints wait some time (~1 min) and then fail with the message "too many redirections".
At the same time, a simple xrdcp of the same file from each of the sites works.
Two examples:
Here both commands work:
xrdcp -f -np -d 1 root://grid-cert-03.roma1.infn.it:1094//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-roma1/user.HironoriIto.xrootd.infn-roma1-1M - > /dev/null
xrdfs grid-cert-03.roma1.infn.it:1094 locate -r //atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-roma1/user.HironoriIto.xrootd.infn-roma1
[::141.108.38.30]:1094 Server Read
Here the second command does not find the file and redirects upstream:
xrdcp -f -np -d 1 root://xrootd-atlas.cr.cnaf.infn.it:1094//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-t1/user.HironoriIto.xrootd.infn-t1-1M - > /dev/null
xrdfs xrootd-atlas.cr.cnaf.infn.it:1094 locate -r //atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-t1/user.HironoriIto.xrootd.infn-t1
[FATAL] Redirect limit has been reached
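The comparison above can be repeated across many endpoints with a small script. This is only a sketch: the function names (classify, check_endpoint) and the argument layout are made up for illustration, it passes the same path to both commands, and it assumes that xrdcp and xrdfs return a non-zero exit status on failure. The command-line flags are the ones used in the examples above.

```sh
#!/bin/sh
# Sketch: for each endpoint, compare "xrdcp" with "xrdfs ... locate -r".
# Hypothetical usage: sh check_locate.sh //atlas/path/to/file host1:1094 host2:1094

# Map the two exit codes to a one-word verdict.
classify() {
    copy_rc=$1
    locate_rc=$2
    if [ "$copy_rc" -eq 0 ] && [ "$locate_rc" -ne 0 ]; then
        echo "MISMATCH"       # file is readable, but locate fails (the case reported here)
    elif [ "$copy_rc" -eq 0 ]; then
        echo "OK"             # both commands succeeded
    else
        echo "COPY_FAILED"    # file could not be read from this endpoint at all
    fi
}

# Run both commands against one endpoint and print "<endpoint> <verdict>".
check_endpoint() {
    endpoint=$1
    path=$2
    # Assumption: a non-zero exit status means the command failed.
    xrdcp -f -np -d 1 "root://${endpoint}/${path}" - > /dev/null 2>&1
    copy_rc=$?
    xrdfs "$endpoint" locate -r "$path" > /dev/null 2>&1
    locate_rc=$?
    echo "$endpoint $(classify "$copy_rc" "$locate_rc")"
}

# Only run when a path and at least one endpoint are given.
if [ "$#" -ge 2 ]; then
    path=$1
    shift
    for endpoint in "$@"; do
        check_endpoint "$endpoint" "$path"
    done
fi
```

Endpoints reported as MISMATCH are the ones where the file can be copied but locate fails, which is the behaviour described in this issue.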
Full list of sites may be found here: http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?view=FAX%20endpoints#currentView=FAX%2520endpoints&highlight=false
Ilija