xrdfs locate -r issue #40

Closed
ivukotic opened this issue Sep 4, 2013 · 5 comments

Comments

@ivukotic

ivukotic commented Sep 4, 2013

We have 23 FAX endpoints that exhibit this problem:
I try to locate a file that we know exists at each of the sites by running "xrdfs server.name locate -r theFile". These endpoints wait for some time (~1 min) and then fail with the message "too many redirections".
At the same time, a simple xrdcp of the same file from each of these sites works.

Two examples.
Here both commands work:
xrdcp -f -np -d 1 root://grid-cert-03.roma1.infn.it:1094//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-roma1/user.HironoriIto.xrootd.infn-roma1-1M - > /dev/null

xrdfs grid-cert-03.roma1.infn.it:1094 locate -r //atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-roma1/user.HironoriIto.xrootd.infn-roma1
[::141.108.38.30]:1094 Server Read

Here the second command does not find the file and redirects upstream:
xrdcp -f -np -d 1 root://xrootd-atlas.cr.cnaf.infn.it:1094//atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-t1/user.HironoriIto.xrootd.infn-t1-1M - > /dev/null

xrdfs xrootd-atlas.cr.cnaf.infn.it:1094 locate -r //atlas/dq2/user/HironoriIto/user.HironoriIto.xrootd.infn-t1/user.HironoriIto.xrootd.infn-t1
[FATAL] Redirect limit has been reached

Full list of sites may be found here: http://dashb-atlas-ssb.cern.ch/dashboard/request.py/siteview?view=FAX%20endpoints#currentView=FAX%2520endpoints&highlight=false

Ilija

@ljanyst
Contributor

ljanyst commented Sep 4, 2013

Apparently, the client was put into an infinite redirection loop and gave up. IMO, kXR_locate should never redirect.

It's a server/federation config issue.
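To make the failure mode concrete, here is a toy Python model (not XRootD code) of a client chasing redirects between two peered redirectors that both fail to report the file. The names `Redirector`, `handle_locate`, `locate`, and the limit of 16 hops are hypothetical, chosen only for illustration; the real client's limit and message may differ.

```python
class RedirectLimitReached(Exception):
    """Raised when the toy client stops following redirects."""


class Redirector:
    """Toy node that either reports a file or redirects to its peer (hypothetical)."""

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)  # what this node's cluster reports having
        self.peer = None         # assigned after both peers exist

    def handle_locate(self, path):
        if path in self.files:
            return ("found", self.name)
        return ("redirect", self.peer)


def locate(start, path, redirect_limit=16):
    """Follow redirects from `start`, giving up after `redirect_limit` hops."""
    node = start
    hops = 0
    while True:
        kind, target = node.handle_locate(path)
        if kind == "found":
            return target
        if target is None:
            return None  # nowhere left to redirect: the file is simply absent
        hops += 1
        if hops > redirect_limit:
            raise RedirectLimitReached(f"gave up after {redirect_limit} redirects")
        node = target


# Two peered top-level redirectors, neither of which reports the file.
a = Redirector("redirector-A")
b = Redirector("redirector-B")
a.peer, b.peer = b, a

try:
    locate(a, "/atlas/missing")
except RedirectLimitReached as err:
    print(err)  # gave up after 16 redirects
```

If either redirector starts reporting the file (for example after `b.files.add("/atlas/missing")`), the same call returns `"redirector-B"` after one hop. In this model the loop only appears when the file is reported nowhere, which includes the case where an endpoint physically holds the file but does not report it.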

ghost assigned abh3 Sep 4, 2013
@ivukotic
Author

ivukotic commented Sep 4, 2013

We have two top-level redirectors that are peers, so a file that can't be found anywhere fails with "too many redirections". We are fully aware of that; we set it up that way intentionally.

The issue here is that the chain of redirections should never start, because the endpoint does have the file, as proven by xrdcp-ing it from the endpoint.

I don't know if kXR_locate should redirect or not, but I did fully expect that xrdfs locate would not redirect.

@abh3
Member

abh3 commented Sep 4, 2013

I am finishing up work to automatically avoid redirection loops in peered clusters. The way ATLAS is currently configured guarantees redirection loops when a file is not found at all. As for whether the endpoint has the file, we keep going over this again and again. You know that it has the file because your view is above/outside the system. If the system says you don't have the file, then the system is correct from its perspective. Most likely the cmsd at that endpoint is not working; that seems to be the most common reason when this happens.

As for locate redirections, sometimes they need to be done and sometimes not. That is, for regional federations, locate redirection is required; otherwise you will often be unable to locate files even when they logically exist in the federation. However, peered locate redirections are not really appropriate. So that's the approach I'm taking: regional locates will redirect, peered locates will not. I may also add a flag that requests peered redirection while doing a locate.

Andy
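The proposed policy can be sketched in the same toy style. This Python model is an illustration only: `Node`, `regional_parent`, `peers`, and `allow_peer_redirect` are hypothetical names, the flag merely stands in for the optional request mentioned above, and the visited-set guard is one simple way to avoid redirection loops, not necessarily how xrootd does it. The model also assumes "regional" redirection means going up to a parent redirector and "peered" redirection means going sideways to a peer, and that a node's `files` set is what its cluster reports.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy federation node (hypothetical model, not the XRootD implementation)."""
    name: str
    files: set = field(default_factory=set)    # what this node's cluster reports
    regional_parent: Optional["Node"] = None   # redirector one level up
    peers: list = field(default_factory=list)  # top-level peer redirectors


def locate(node, path, allow_peer_redirect=False, visited=None):
    """Return the name of a node reporting `path`, or None if not found.

    Regional (upward) redirection is always followed; redirection to peers
    happens only when `allow_peer_redirect` is set. `visited` breaks loops.
    """
    if visited is None:
        visited = set()
    if node.name in visited:
        return None  # already asked this node: stop instead of looping
    visited.add(node.name)
    if path in node.files:
        return node.name
    if node.regional_parent is not None:
        found = locate(node.regional_parent, path, allow_peer_redirect, visited)
        if found is not None:
            return found
    if allow_peer_redirect:
        for peer in node.peers:
            found = locate(peer, path, allow_peer_redirect, visited)
            if found is not None:
                return found
    return None


# Two peered top-level redirectors and one site below the first.
eu = Node("eu-redirector", files={"/atlas/f1"})
us = Node("us-redirector", files={"/atlas/f2"})
eu.peers.append(us)
us.peers.append(eu)
site = Node("site", regional_parent=eu)

print(locate(site, "/atlas/f1"))                                # eu-redirector
print(locate(site, "/atlas/f2"))                                # None
print(locate(site, "/atlas/f2", allow_peer_redirect=True))      # us-redirector
print(locate(site, "/atlas/none", allow_peer_redirect=True))    # None
```

In this sketch a file that exists nowhere makes the lookup end with `None` instead of bouncing between the two peers until a redirect limit is hit, while peer lookups happen only when explicitly requested.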


@ivukotic
Author

ivukotic commented Sep 4, 2013

Are you saying that at 23 of our endpoints the cmsd is constantly not working, yet everything works OK?
I can get the file by asking the endpoint directly, and I can get it by asking an upstream redirector, all while the cmsd thinks it does not have the file?

alja referenced this issue in alja/xrootd Apr 1, 2014
Adapted Primary vertex producer to use new consumes API
@abh3
Member

abh3 commented Sep 14, 2014

I think this is no longer an issue.
