No retry after a broken server is hit #138

Closed
tommasoboccali opened this issue Sep 8, 2014 · 15 comments

Comments

@tommasoboccali

Ciao, in CMS we just moved our SAM tests to a newer version, in order to solve issues we had previously with a 3_1 xrootd version.

Now we use:

Tool info as configured in location /tmp/tboccali/CMSSW_7_1_7
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Name : xrootd
Version : 3.2.4-cms2
++++++++++++++++++++

What we see is that after a connection to a broken server (certificate problems, most probably), a retry does not seem to be issued:

http://dashb-cms-sum.cern.ch/dashboard/request.py/getMetricResultDetails?hostName=cmsrm-cream03.roma1.infn.it&flavour=CREAM-CE&metric=org.cms.WN-xrootd-fallback&timeStamp=2014-09-08T08:25:41Z

is this expected? Brian told me we had a patch from you for a similar problem, so I wanted to make sure this is NOT the expected behavior.

thanks a lot

tom

@tommasoboccali
Author

Some more precise info: in the redirector used, there is more than one server able to provide the file.

I link here 2 logs from xrdcp -d 3:

  • one which ends up on the same server -> fail
  • one which ends up on a good server at the same redirection level -> all ok

https://www.dropbox.com/s/m0ranydm6ook2sl/log.fail?dl=0
https://www.dropbox.com/s/wctvaeczz2gra57/log.ok?dl=0

thanks

tom

@abh3
Member

abh3 commented Sep 8, 2014

Hi Tom,

Indeed that is not the expected behaviour with the patch I gave to Brian.
He said he would include the backported 3.2 patch in the CMS software
suite. Without the patch, you will indeed see that a retry is not
performed.

Andy

@bbockelm
Contributor

bbockelm commented Sep 9, 2014

Rats - it looks like the patch got clobbered in our move to 4.0.3 (and subsequent revert back to 3.2.4). Andy, what's the ticket again? I can't find the old patch.

@ljanyst
Contributor

ljanyst commented Sep 9, 2014

What was the reason for the revert?

@bbockelm
Contributor

bbockelm commented Sep 9, 2014

Build issue with ROOT; supposedly fixed in 5.34.20, although 5.34.20 has unrelated bugs that prevent CMS from using it.

So, we wait again.

@xrootd-dev

Hi Brian,

I believe the patch is:

6184cd0

I will have to verify that this is all of it (I think it is). Did I send
you a patch file for 3.2.4?

Andy

@tommasoboccali
Author

From the comment, the linked patch seems to solve the multiple DNS problems, not the "not enough retries" one. Is it the same issue?

tom

@xrootd-dev

Hi Tom,

Yes, I believe it’s the same issue. The loop would simply exit and not continue on to the next redirector.

Andy
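
To make the failure mode described above concrete, here is a minimal sketch in Python. It is not the xrootd client code and does not reproduce the patch; the host names, `ConnectError`, and `fake_connect` are hypothetical stand-ins. It only contrasts a loop that exits on the first failed host with one that moves on to the next candidate.

```python
from typing import Callable, Optional, Sequence


class ConnectError(Exception):
    """Raised by the hypothetical connect function when a host is unusable."""


def open_exit_on_failure(hosts: Sequence[str],
                         connect: Callable[[str], str]) -> Optional[str]:
    """Problem pattern: the loop exits as soon as one host fails."""
    for host in hosts:
        try:
            return connect(host)
        except ConnectError:
            break  # gives up; the remaining candidates are never tried
    return None


def open_try_next(hosts: Sequence[str],
                  connect: Callable[[str], str]) -> Optional[str]:
    """Expected pattern: a failure moves on to the next candidate."""
    for host in hosts:
        try:
            return connect(host)
        except ConnectError:
            continue  # try the next host in the list
    return None


def fake_connect(host: str) -> str:
    # Hypothetical stand-in: one host is "broken" (e.g. a certificate problem).
    if host == "broken.example.org":
        raise ConnectError(host)
    return f"connected to {host}"


# Assumed candidate list, with the broken host happening to come first.
HOSTS = ["broken.example.org", "good.example.org"]

if __name__ == "__main__":
    print(open_exit_on_failure(HOSTS, fake_connect))
    print(open_try_next(HOSTS, fake_connect))
```

With the broken host first in the list, the first function returns `None` while the second reaches the good host, which mirrors the difference between the two logs linked earlier in the thread.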

@tommasoboccali
Author

Uhm, but in this case the desired behavior is to use another server under the same redirector... I would like to try, but currently the broken server is no longer broken, and breaking it on purpose is a bit ... nasty ;)

tom

@abh3
Member

abh3 commented Sep 10, 2014

Hi Tom,

I don't think you need to try. The patch should address the problems you
saw. We just need to get it back into the CMS software suite and are
just waiting for Brian to figure out what needs to be done.

Andy

@xrootd-dev

OK, I am still waiting to hear what you really need to get the patch back into your
general release. Could you please tell me?

@xrootd-dev

Hi Brian,

Please find the back-ported patch for this problem at:

http://www.slac.stanford.edu/~abh/xrootd-3.2.4/

I will keep it there until the end of the year in case you lose it.

Andy

@bbockelm
Contributor

Pull request is in to the CMSSW team; just waiting for them to pick it up.

In terms of Xrootd 4.0.3 -- they are currently trying to integrate ROOT 5.34.21.

@ljanyst
Contributor

ljanyst commented Sep 17, 2014

For 4.0.3 - Great! Please let me know if you spot any problems.

@abh3
Member

abh3 commented Dec 3, 2014

Seems like no problems have been hit after this has been fixed. So, I am closing this.
