xrootd starts up in half-alive state when encountering certificate error #1674
A couple of things:
a) Could you open a GitHub ticket for this issue? (The text in this message
is sufficient for the ticket.)
b) As for a solution, I am surprised that this causes a zombie state.
Would it be possible to attach gdb to the process when it enters such a
state and then generate a stack trace of all running threads via gdb
(i.e. 'thread apply all bt')? Please make sure the debug RPM is installed
so that we get actual statement numbers.
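For anyone who wants to script the stack-trace request above, here is a minimal Python sketch. It only wraps the standard gdb options `-p`, `-batch` and `-ex`. The helper names are made up for illustration, and the pid file location is taken from the `-s /run/xrootd/xrootd-t2wisc.pid` option visible in the reporter's log, so adjust it for your site.

```python
import subprocess
from typing import List


def read_pid(pid_file: str) -> int:
    """Read the process id that xrootd wrote to its pid file (the -s option)."""
    with open(pid_file, "r", encoding="ascii") as handle:
        return int(handle.read().strip())


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a non-interactive gdb call that prints a backtrace of every thread."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid: int, timeout: float = 120.0) -> str:
    """Attach gdb to the process, dump all thread backtraces, and detach.

    Usually needs root (or the same user as the process) and the debug
    symbols installed, otherwise the frames carry no source locations.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout
```

Calling `capture_backtraces(read_pid("/run/xrootd/xrootd-t2wisc.pid"))` while the server is stuck should return text that can be pasted into the ticket.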
On Fri, 8 Apr 2022, Carl Vuosalo wrote:
With XRootD 5.4.1 running on CentOS Linux 7.9.2009, we have found that at start-up, the xrootd server can enter a zombie state if it encounters a certificate error. In this state, cmsd runs and responds normally, and xrootd keeps running but does not respond to any requests. Any request to it just waits until a timeout occurs.
This problem occurred when an XRootD restart coincided with an update of the certificates and CRLs. Because the certificates and CRLs were in an intermediate state, xrootd encountered an error while trying to read them. But rather than terminating, it entered a non-responsive state and kept running.
Since the error is recognized by xrootd, we request a code change so that it terminates when encountering this error rather than continuing on in a zombie state.
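Until such a change exists, a site could narrow the window with a check that runs before xrootd is started. The following is only a sketch of that idea, not xrootd code: the function name is hypothetical, and the CA directory must be supplied by the caller because the log does not show which directory was scanned. It flags entries that cannot be opened, such as a symlink whose target has been removed mid-update. A race is still possible if the update begins after the check has passed.

```python
import os
from typing import List


def unreadable_entries(ca_dir: str) -> List[str]:
    """Return the names in ca_dir that cannot be opened for reading.

    Subdirectories are skipped. A dangling symlink (for example a
    *.signing_policy link whose target is temporarily missing) is reported.
    """
    bad = []
    for name in sorted(os.listdir(ca_dir)):
        path = os.path.join(ca_dir, name)
        if os.path.isdir(path):
            continue
        try:
            with open(path, "rb") as handle:
                handle.read(1)
        except OSError:
            bad.append(name)
    return bad
```

A wrapper script could refuse to start the server, or retry after a short sleep, whenever the returned list is not empty.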
A portion of the xrootd.log during start-up shows the error:
220325 09:45:43 17060 sysTPCInitialize: Will load configuration for the TPC handler from /etc/xrootd/xrootd-t2wisc.cfg
=====> http.desthttps yes
220325 09:45:43 17060 TPC_TempCA: Reloading the list of CAs and CRLs in directory
220325 09:45:43 17060 TPC_Maintenance: Failed to open certificate file MD-Grid-CA-T.signing_policy No such file or directory
220325 09:45:43 17060 TPC_Config: CAs / CRL generation for libcurl failed.
220325 09:45:53 17164 cms_Receive: localhost 0 bytes on 1174667264
220325 09:45:53 17164 cms_setStatus: /var/run/xrootd/t2wisc/.olb/olbd.super sent resume event
220325 09:45:53 17164 cms_setStatus: Manager /var/run/xrootd/t2wisc/.olb/olbd.super resumed
220326 03:07:01 17080 /usr/bin/xrootd -l /var/log/xrootd/xrootd.log -c /etc/xrootd/xrootd-t2wisc.cfg -k fifo -s /run/xrootd/xrootd-t2wisc.pid -n t2wisc
220326 03:07:01 17080 Copr. 2004-2012 Stanford University, xrd version v5.4.1
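Since the failure is visible in the log, a monitoring job could watch for it until the server itself exits on this error. Below is a rough Python sketch. The line layout (date, time, a numeric id, a component name, then the message) is inferred from the excerpt above rather than from any xrootd documentation, and the function name is hypothetical. It also assumes each log record sits on one line, which may not hold for a log that has been re-wrapped by a mail client.

```python
import re
from typing import Iterable, List, Tuple

# Layout assumed from the excerpt: "YYMMDD HH:MM:SS <id> <component>: <message>"
LOG_LINE = re.compile(r"^(\d{6}) (\d{2}:\d{2}:\d{2}) (\d+) ([^:\s]+): (.*)$")
FAILURE_TEXT = "CAs / CRL generation for libcurl failed"


def find_ca_failures(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (date, time, id) for every TPC_Config CA/CRL generation failure."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # banner lines, config echoes, and similar
        date, time, ident, component, message = match.groups()
        if component == "TPC_Config" and FAILURE_TEXT in message:
            hits.append((date, time, int(ident)))
    return hits
```

Feeding it the open log file, e.g. `find_ca_failures(open("/var/log/xrootd/xrootd.log"))`, would give a list that an alerting script can act on when it is not empty.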
Never mind on cutting an issue as I see you already have (was misled by the email). Anyway, a stack trace would help to speed up the resolution here.
cvuosalo changed the title from "xrootd starts up in zombie state when encountering certificate error" to "xrootd starts up in half-alive state when encountering certificate error" on Apr 12, 2022