xrootd5:: incompatible communication between v4 and v5? #1324
I've seen https://xrootd.slac.stanford.edu/doc/dev50/R5-Issues.htm, but even with every path declaration commented out it still does not work; see the nopath directory in the CernBox share linked in the original report.
Given the breakage, I would suggest pulling it from EPEL (where there are no old versions, only the latest is present) and keeping it only in the xrootd repos for testing new versions. Those repos were what let me recover the server, because I could not get back to v4 otherwise.
Well, this is completely new to us. The error would suggest that the
cms_monperf script is in error. So, please send me (a) the directive you
used to specify the cms_monperf script and (b) a pointer to that script.
Andy
On Mon, 9 Nov 2020, Adrian Sevcenco wrote:
So, I tried xrootd 5 and so far I got this:
1. The cms.perf directive is incompatible with the existing cms_monPerf script that was running in v4;
in cmslog I get this:
`201110 03:16:56 9534 Meter: Perf monitor returned invalid output:`
2. Even if I comment it out, I get this in cmslog:
```
201110 03:18:16 10200 Meter: Found 2 filesystem(s); 216TB total (77% util); 51TB free (25TB max)
------ ***@***.*** phase 2 server initialization completed.
------ cmsd ***@***.***:8853 initialization completed.
201110 03:18:16 10223 Start: Waiting for primary server to login.
```
and in xrdlog:
```
201110 03:18:17 10232 sysConfig: Configured as HTTP(s) data server.
------ HTTP protocol initialization completed.
------ xrootd ***@***.***:1094 initialization completed.
```
and everything is stuck: the server does not register with the redirector.
The logs can be seen here: https://cernbox.cern.ch/index.php/s/qzeEISru1gbL3uS
I would have expected some breakage in the ALICE xrootd plugins, but not for it to be completely dysfunctional...
Any idea what is going on? Under which conditions has v5 worked successfully?
Thank you!
Ah, one more thing. The cmsmonperf error is unrelated to the server not
registering (cmsmonperf errors are not treated as fatal). In this case, the
companion xrootd has not connected to its cmsd, so the cmsd is waiting for
that to happen. Nothing will proceed until it does. So, what is going
on with the xrootd?
So, the monPerf script does not matter: the problem persists even with its directive commented out.
Correct; as I said, the monperf directive has nothing to do with the
companion xrootd not connecting to its cmsd. This is the first report of
any issues in this regard. So, while I look at what might be the issue
with the monperf script, let's concentrate on why the xrootd doesn't want
to connect. Did it never start up? The log will tell us
everything.
Andy
On Mon, 9 Nov 2020, Adrian Sevcenco wrote:
The actual script that was used so far, and that worked in v4, is https://github.com/xrootd/xrootd/blob/master/utils/cms_monPerf
@abh3 See my CernBox share from my initial submission: the nopath directory in it contains the configuration and logs from a run with the {pid,admin}path directives commented out and the processes running in debug mode.
OK, so according to the xrootd log, it connected to the cmsd. I've seen this before, and it usually turns out that a phantom cmsd is still running on that host. So it did connect, just not to the cmsd you expected it to connect to. Could you check what is actually running? A ps -ef with a grep will usually illuminate the problem, unless the "phantom" cmsd was killed but is still executing because the kernel can't get rid of it. That is rare, but it does happen.
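The "ps -ef with a grep" check can be made a little more reliable by matching on the executable name instead of a substring, so the grep itself and unrelated commands mentioning "cmsd" are not counted. The sketch below is a hypothetical helper (not part of XRootD); it assumes the standard 8-column `ps -ef` layout (UID PID PPID C STIME TTY TIME CMD) and simply lists every cmsd it finds.

```python
import subprocess
from typing import List, Tuple


def find_daemons(ps_output: str, name: str = "cmsd") -> List[Tuple[str, str, str]]:
    """Return (user, pid, command) for each `ps -ef` line whose executable is `name`.

    Matching the executable's basename (not a substring, as grep does) avoids
    counting the grep itself or other commands that merely mention "cmsd".
    """
    found = []
    for line in ps_output.splitlines()[1:]:  # skip the "UID PID PPID ..." header
        fields = line.split(None, 7)  # ps -ef has 8 columns; CMD may contain spaces
        if len(fields) < 8:
            continue
        user, pid, cmd = fields[0], fields[1], fields[7]
        executable = cmd.split()[0].rsplit("/", 1)[-1]
        if executable == name:
            found.append((user, pid, cmd))
    return found


if __name__ == "__main__":
    try:
        out = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run ps -ef: {exc}")
    else:
        daemons = find_daemons(out)
        for user, pid, cmd in daemons:
            print(f"{pid:>8} {user:<10} {cmd}")
        # One cmsd per xrootd is the usual setup; an extra, unexpected one is
        # the "phantom cmsd" case (unless you deliberately run several instances).
        if len(daemons) > 1:
            print(f"warning: {len(daemons)} cmsd processes found")
```

On Linux with procps, `ps -C cmsd -o pid,stat,args` additionally shows each process's state, which can help spot a cmsd that was killed but has not yet gone away.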
Ah, a few other things to get a clearer picture. I assume you are not using containers or virtual machines for any of this; if that is not the case, please let me know what the actual running setup is. Also, could you go back to the logs and post what the running log shows several minutes after what has been posted? From the logs I can't tell where the cutoff was. According to the messages, the cmsd put up an accept a full second before the corresponding connect, which is odd.
@adriansev: We will address this issue ASAP and, if necessary, push a patch to EPEL. We run regular tests of an xrootd/cmsd setup with XRootD 5, so another option is to try running a very basic xrootd+cmsd setup to see whether it works for you (it should; it works in our test suite), and then bisect over your config file to see which part causes the issue. BTW, would it be possible for you to run our release candidates in your test suite so we can detect possible problems early in the future?
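Bisecting over the config file, as suggested above, means starting from a minimal working xrootd+cmsd configuration and adding back halves of the remaining directives until the one that breaks registration is isolated. The sketch below is a hypothetical helper, not an XRootD tool. It assumes exactly one directive is responsible and that directives can be added independently; `works` is a placeholder callback you would implement yourself (write the lines to a config file, restart the daemons, check whether the server registers with the redirector).

```python
from typing import Callable, List, Sequence


def bisect_config(base: Sequence[str], suspects: Sequence[str],
                  works: Callable[[List[str]], bool]) -> str:
    """Return the single directive in `suspects` that breaks an otherwise working setup.

    Assumes `base` alone works, `base + suspects` fails, and exactly one
    suspect directive causes the failure on its own (a simplification).
    """
    candidates = list(suspects)
    if not candidates:
        raise ValueError("no suspect directives to bisect")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if works(list(base) + first_half):
            candidates = candidates[mid:]  # first half is harmless; culprit is later
        else:
            candidates = first_half  # adding the first half already breaks it
    return candidates[0]


if __name__ == "__main__":
    base = ["directive-base"]
    suspects = [f"directive-{i}" for i in range(1, 9)]

    def fake_works(lines: List[str]) -> bool:
        # Stand-in for a real restart-and-check; pretends directive-6 is the culprit.
        return "directive-6" not in lines

    print(bisect_config(base, suspects, fake_works))
```

With 8 suspect directives this needs only 3 restarts instead of up to 8 when removing directives one at a time; if several directives interact, the single-culprit assumption fails and the result should be treated only as a starting point.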
FWIW, I am using:
with
@simonmichal Well, I don't have a test suite; this was tried in production :D. Moreover, at the moment I don't have even the simplest hardware to set up a test storage system.
Any more information on the question I asked: that the xrootd did connect to some cmsd, just not the cmsd you thought it would connect to?
@abh3 Sorry, I forgot to answer :( This is a bona fide production server on which I am testing, so no containers. Also, the logs stop there; nothing moves after that point. We can move the discussion to private mail, and I can give you access to a sacrificial server.
Sounds good to me. Just let me know what you want me to do. I want to get
to the bottom of this as much as you do.
@abh3 @simonmichal I updated xrootd from xroot-testing to 5.0.3-0.rc1 and it seems the problem is solved: the server is seen by and registered with the redirector. Waiting for the EPEL release... I think I can close this ticket.
Apparently, this has been solved by fetching the latest release.