Segfault and kernel traps with xrdcopy #990

Closed
samuambroj opened this issue May 17, 2019 · 1 comment

@samuambroj

Dear Xrootd Team,

We are experiencing segfaults and kernel traps when running xrdcopy. The problem was first observed two weeks ago, and we enabled core dumps immediately afterwards. It appeared again tonight, occurring 7 times between 00:30 and 05:35 (CEST). A core dump was saved for each of these seven failures:

$ ls -ltc core.*
-rw------- 1 dcache-mon users 169820160 May 17 05:32 core.13221
-rw------- 1 dcache-mon users 169877504 May 17 05:17 core.10943
-rw------- 1 dcache-mon users 102694912 May 17 05:02 core.8971
-rw------- 1 dcache-mon users 136257536 May 17 02:32 core.20364
-rw------- 1 dcache-mon users 136257536 May 17 02:17 core.18044
-rw------- 1 dcache-mon users 119476224 May 17 02:02 core.16144
-rw------- 1 dcache-mon users 186654720 May 17 00:32 core.3559

The logged information in /var/log/messages:

# grep kernel /var/log/messages
May 17 00:32:46  kernel: traps: xrdcopy[3596] general protection ip:7f653460acd4 sp:7f65306c4c10 error:0 in libXrdCl.so.2.0.0[7f6534545000+11e000]
May 17 02:02:31  kernel: xrdcopy[16193]: segfault at 2 ip 00007fbef46e946c sp 00007fbef0803b50 error 4 in libXrdCl.so.2.0.0[7fbef4684000+11e000]
May 17 02:32:31  kernel: xrdcopy[20399]: segfault at 2 ip 00007f645f48146c sp 00007f645b59bb50 error 4 in libXrdCl.so.2.0.0[7f645f41c000+11e000]
May 17 05:02:31  kernel: xrdcopy[8998]: segfault at 2 ip 00007f43563f746c sp 00007f4352511b50 error 4 in libXrdCl.so.2.0.0[7f4356392000+11e000]
May 17 05:17:46  kernel: xrdcopy[10986]: segfault at 2 ip 00007fea0295d46c sp 00007fe9fea77b50 error 4 in libXrdCl.so.2.0.0[7fea028f8000+11e000]
May 17 05:32:31  kernel: traps: xrdcopy[13257] general protection ip:7f3203a09cd4 sp:7f31ffac3c10 error:0 in libXrdCl.so.2.0.0[7f3203944000+11e000]

Note that we have not been able to find any entry in /var/log/messages for the core dump at 02:17.
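Since each kernel message reports both the faulting instruction pointer and the base address at which libXrdCl.so.2.0.0 was mapped, the offset of the crash inside the library can be computed by subtraction. The following is a minimal sketch of that arithmetic; the function name `fault_offset` and the regular expression are illustrative and only assume the two message shapes shown above.

```python
import re

# Matches both kernel message shapes seen in the log above:
#   "... segfault at 2 ip 00007f... sp ... in lib[base+size]"
#   "... general protection ip:7f... sp:... in lib[base+size]"
PATTERN = re.compile(
    r"\bip[: ](?P<ip>[0-9a-f]+) .* in "
    r"(?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (library, offset of the faulting ip inside its mapping), or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base
    # Sanity check: the ip must fall inside the reported mapping.
    if not 0 <= offset < size:
        return None
    return m.group("lib"), offset


if __name__ == "__main__":
    # Two lines copied verbatim from the log above.
    samples = [
        "May 17 00:32:46  kernel: traps: xrdcopy[3596] general protection "
        "ip:7f653460acd4 sp:7f65306c4c10 error:0 in "
        "libXrdCl.so.2.0.0[7f6534545000+11e000]",
        "May 17 02:02:31  kernel: xrdcopy[16193]: segfault at 2 "
        "ip 00007fbef46e946c sp 00007fbef0803b50 error 4 in "
        "libXrdCl.so.2.0.0[7fbef4684000+11e000]",
    ]
    for line in samples:
        lib, offset = fault_offset(line)
        print(lib, hex(offset))
```

Applied to the six lines above, this gives only two distinct offsets: 0x6546c for all four segfaults and 0xc5cd4 for both general protection traps. Assuming the reported mapping starts at file offset 0 of the library (not confirmed here), these offsets could be passed to `addr2line -f -C -e` against the installed libXrdCl.so.2.0.0 with xrootd-debuginfo present to look up the corresponding source lines.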

The xrdcopy command is run with debug level 3, and the lines containing errors are the following:

[2019-05-17 05:32:01.417937 +0200][Debug  ][PostMaster        ] [***********************:1094 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Window: 1800
[2019-05-17 05:32:01.438923 +0200][Debug  ][PostMaster        ] [[**************************]:23573 #0] Stream parameters: Network Stack: IPAuto, Connection Window: 120, ConnectionRetry: 5, Stream Error Window: 1800
[2019-05-17 05:32:31.422823 +0200][Debug  ][XRootD            ] [[**************************]:23573] Handling error while processing kXR_read (handle: 0x00000000, offset: 67108864, size: 16777216): [ERROR] Operation expired.
[2019-05-17 05:32:31.422838 +0200][Error  ][XRootD            ] [[**************************]:23573] Unable to get the response to request kXR_read (handle: 0x00000000, offset: 67108864, size: 16777216)
[2019-05-17 05:32:31.422975 +0200][Debug  ][ExDbgMsg          ] [[**************************]:23573] Calling MsgHandler: 0x1982000 (message: kXR_read (handle: 0x00000000, offset: 67108864, size: 16777216) ) with status: [ERROR] Operation expired.
[2019-05-17 05:32:31.423016 +0200][Dump   ][File              ] [0x1978da0@root://****************************************************************7A78A3F8.root] File state error encountered. Message kXR_read (handle: 0x00000000, offset: 67108864, size: 16777216) returned with [ERROR] Operation expired
[2019-05-17 05:32:31.423039 +0200][Error  ][File              ] [0x1978da0@root://****************************************************************47A78A3F8.root] Fatal file state error. Message kXR_read (handle: 0x00000000, offset: 67108864, size: 16777216) returned with [ERROR] Operation expired
[2019-05-17 05:32:31.423065 +0200][Dump   ][File              ] [0x1978da0@root://****************************************************************47A78A3F8.root] Failing message kXR_read (handle: 0x00000000, offset: 67108864, size: 16777216) with [ERROR] Operation expired
[2019-05-17 05:32:31.423212 +0200][Debug  ][Utility           ] Unable read 16777216 bytes at 67108864 from root://**************************************************************47A78A3F8.root: [ERROR] Operation expired

The xrootd related packages installed on the client machine (note that xrootd-debuginfo was installed two weeks ago):

# rpm -qa | grep -i xroo
xrootd-client-libs-4.9.1-1.el7.x86_64
gfal2-plugin-xrootd-2.16.1-1.el7.x86_64
xrootd-libs-4.9.1-1.el7.x86_64
xrootd-client-4.9.1-1.el7.x86_64
nordugrid-arc-plugins-xrootd-5.4.3-1.el7.x86_64
xrootd-debuginfo-4.9.1-1.el7.x86_64

We can send you the 7 core dumps and the detailed xrdcopy output of the last failure. The total uncompressed size is around 1 GB.

Best,
Samuel

@simonmichal
Contributor

@samuambroj: sorry for the late response, somehow I didn't notice your issue earlier. Do you have the corresponding stack traces, including line numbers?
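One way such traces might be produced from the saved core files is to run gdb in batch mode against each of them. The sketch below assumes that gdb is installed on the client machine and that the binary lives at /usr/bin/xrdcopy (an assumption, adjust as needed); the helper names are illustrative. With xrootd-debuginfo installed, the backtraces should include file names and line numbers.

```python
import subprocess
import sys

# Assumed location of the xrdcopy binary; not confirmed in this thread.
DEFAULT_EXE = "/usr/bin/xrdcopy"


def gdb_backtrace_cmd(core, exe=DEFAULT_EXE):
    """Build a gdb command line that prints a backtrace of every thread."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", exe, core]


def backtrace(core, exe=DEFAULT_EXE):
    """Run gdb on one core file and return whatever it printed to stdout."""
    result = subprocess.run(
        gdb_backtrace_cmd(core, exe),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero on a truncated core; keep its output
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python3 bt.py core.13221 core.10943 ...
    for core in sys.argv[1:]:
        print("=====", core, "=====")
        print(backtrace(core))
```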
