Seg fault when SFS_STARTED replied by OFS layer #340

Closed
esindril opened this issue Mar 9, 2016 · 2 comments

@esindril
Contributor

esindril commented Mar 9, 2016

For the CASTOR xrootd plugin we got the following crash:

Program terminated with signal 11, Segmentation fault.
#0  0x00000031a863c68a in XrdXrootdCBJob::DoIt (this=0x7f17d8203620) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdCallBack.cc:155
155    if (eInfo->getErrCB()) eInfo->getErrCB()->Done(Result, eInfo);
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.7.x86_64 libgcc-4.4.7-16.el6.x86_64 libstdc++-4.4.7-16.el6.x86_64 libuuid-2.17.2-12.18.el6.x86_64 openssl-1.0.1e-42.el6_7.4.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) p eInfo
$1 = (XrdOucErrInfo *) 0x7f1885596120
(gdb) p *eInfo
$2 = {_vptr.XrdOucErrInfo = 0x31a86a227f, ErrInfo = {static Max_Error_Len = 2048, static Path_Offset = 1024, user = 0x7f17d83090e0 "xrdmgr.137281:21@ccxrtli223.in2p3.fr", ucap = 0, code = 13, 
    message = "\000\000\000\000\000\000\000\000\354fY\205\030\177\000\000\bgY\205\030\177\000\000d\000\000\000\000\000\000\000(gY\205\030\177", '\000' <repeats 58 times>, "pbY\205\030\177\000\000\377\377\377\377\377\377\377\377\340fY\205\030\177", '\000' <repeats 26 times>, "XfY\205\030\177\000\000\000\000\000\000\000\000\000\000 ", '\000' <repeats 31 times>"\257, \t", '\000' <repeats 46 times>"\377, \377\377\377", '\000' <repeats 12 times>, "XfY\205\030\177\000\000s", '\000' <repeats 15 times>"\214, \"j\250\061", '\000' <repeats 23 times>, "\026", '\000' <repeats 423 times>, "2735", '\000' <repeats 40 times>, "\030\000\000\000\060\000\000\000\240gY\205\030\177\000\000\340fY\205\030\177", '\000' <repeats 194 times>, "`eY\205\030"..., static uVMask = -1, static uAsync = -2147483648, static uUrlOK = 1073741824, static uMProt = 536870912, static uReadR = 268435456, static uIPv4 = 134217728, static uIPv64 = 67108864, 
    static uPrip = 33554432}, ErrCB = 0x1, {ErrCBarg = 6357880, ErrEnv = 0x610378}, mID = 2735, dOff = 0, reserved = 0, dataBuff = 0x0}
(gdb) p *eInfo->ErrCB
Cannot access memory at address 0x1
(gdb) p eInfo->ErrCB
$3 = (XrdOucEICB *) 0x1
(gdb) bt
#0  0x00000031a863c68a in XrdXrootdCBJob::DoIt (this=0x7f17d8203620) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdCallBack.cc:155
#1  0x00000031a9263cc5 in XrdScheduler::Run (this=0x6102b8) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:333
#2  0x00000031a9263eb9 in XrdStartWorking (carg=<value optimized out>) at /usr/src/debug/xrootd/xrootd/src/Xrd/XrdScheduler.cc:85
#3  0x00000031a92270af in XrdSysThread_Xeq (myargs=0x7f17d826ea30) at /usr/src/debug/xrootd/xrootd/src/XrdSys/XrdSysPthread.cc:86
#4  0x00000031a7607aa1 in ?? ()
#5  0x00007f1886a83700 in ?? ()
#6  0x0000000000000000 in ?? ()
(gdb) f 0
#0  0x00000031a863c68a in XrdXrootdCBJob::DoIt (this=0x7f17d8203620) at /usr/src/debug/xrootd/xrootd/src/XrdXrootd/XrdXrootdCallBack.cc:155
155    if (eInfo->getErrCB()) eInfo->getErrCB()->Done(Result, eInfo);
(gdb) p *eInfo
$4 = {_vptr.XrdOucErrInfo = 0x31a86a227f, ErrInfo = {static Max_Error_Len = 2048, static Path_Offset = 1024, user = 0x7f17d83090e0 "xrdmgr.137281:21@ccxrtli223.in2p3.fr", ucap = 0, code = 13, 
    message = "\000\000\000\000\000\000\000\000\354fY\205\030\177\000\000\bgY\205\030\177\000\000d\000\000\000\000\000\000\000(gY\205\030\177", '\000' <repeats 58 times>, "pbY\205\030\177\000\000\377\377\377\377\377\377\377\377\340fY\205\030\177", '\000' <repeats 26 times>, "XfY\205\030\177\000\000\000\000\000\000\000\000\000\000 ", '\000' <repeats 31 times>"\257, \t", '\000' <repeats 46 times>"\377, \377\377\377", '\000' <repeats 12 times>, "XfY\205\030\177\000\000s", '\000' <repeats 15 times>"\214, \"j\250\061", '\000' <repeats 23 times>, "\026", '\000' <repeats 423 times>, "2735", '\000' <repeats 40 times>, "\030\000\000\000\060\000\000\000\240gY\205\030\177\000\000\340fY\205\030\177", '\000' <repeats 194 times>, "`eY\205\030"..., static uVMask = -1, static uAsync = -2147483648, static uUrlOK = 1073741824, static uMProt = 536870912, static uReadR = 268435456, static uIPv4 = 134217728, static uIPv64 = 67108864, 
    static uPrip = 33554432}, ErrCB = 0x1, {ErrCBarg = 6357880, ErrEnv = 0x610378}, mID = 2735, dOff = 0, reserved = 0, dataBuff = 0x0}
(gdb) q

And the corresponding messages from the log file:

160309 15:29:52 12804 XrdInet: Accepted connection from 21@ccxrtli223.in2p3.fr
160309 15:29:52 12804 XrootdXeq: xrdmgr.137281:21@ccxrtli223.in2p3.fr pub IPv4 login
160309 15:29:52 time=1457533792.576015 func=open                     level=INFO  logid=630d12de-e603-11e5-97a5-00259004e876 unit=ds@lxfsrd45a01.cern.ch:1094 tid=139740849350400 source=XrdxCastor2Ofs:607   tident=xrdmgr.137281:21@ccxrtli223.in2p3.fr path=/castor/cern.ch/alice/raw/global/2015/12/13/10/15000246991020.1400.root, opaque=tpc.key=00073d57400fe57c56e03360&tpc.org=alienmas.58748@pcalienstorage2.cern.ch, isRW=0, open_mode=4000, file_ptr=0x7f17d83671d0
160309 15:29:52 12804 xrdmgr.137281:21@ccxrtli223.in2p3.fr castor2ofs_open: 4000-40600 fn=/srv/castor/01/05/1491166405@castorns.22098012203
160309 15:29:52 time=1457533792.578211 func=open                     level=INFO  logid=630d12de-e603-11e5-97a5-00259004e876 unit=ds@lxfsrd45a01.cern.ch:1094 tid=139740849350400 source=XrdxCastor2Ofs:704   tident=xrdmgr.137281:21@ccxrtli223.in2p3.fr rc=-512 msg="open delayed by the OFS layer, client will retry"
160309 15:29:52 12804 xrdmgr.137281:21@ccxrtli223.in2p3.fr castor2ofs_close: use=0 fn=dummy
160309 15:30:04 2735 castor2ofs_TPC: localhost tpc grant by alienmas.58748@pcalienstorage2.cern.ch expired for /srv/castor/01/05/1491166405@castorns.22098012203
160309 15:30:07 2735 castor2ofs_TPC: localhost tpc grant by alienmas.58748@pcalienstorage2.cern.ch expired for /srv/castor/01/05/1491166405@castorns.22098012203
160309 15:30:07 14668 XrootdsendResp: xrdmgr.137281:21@ccxrtli223.in2p3.fr open file async resp aborted; user gone.
160309 15:31:29 11234 Starting on Linux 2.6.32-573.7.1.el6.x86_64
Copr.  2004-2012 Stanford University, xrd version v20150924-c00a228
@abh3
Member

abh3 commented Apr 20, 2016

Could you check the code that handles the return of the SFS_STARTED indicator (perhaps provide a snippet here)? This crash is usually caused by not waiting for the wait notification to be sent to the client. There is a protocol for handling that, and the standard callback API implements it. However, if you are not using that API (i.e. you are doing it manually), then you have to handle it yourself.
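
The ordering requirement described above can be illustrated with a small standalone model. This is only a sketch and is not XRootD code: `DeferredReply`, `waitNotificationSent`, `complete`, and `run_demo` are hypothetical names invented for the illustration, and the real plugin interface may look quite different. The sketch only shows the idea that the thread producing the final result must block until the "wait" notification has gone out to the client, and must deliver the result exactly once.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a deferred reply; NOT the XRootD callback API.
class DeferredReply {
public:
    // Called by the (mock) framework once the "wait" response has been
    // sent to the client.
    void waitNotificationSent() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            waitSent_ = true;
            events_.push_back("wait-sent");
        }
        cv_.notify_all();
    }

    // Called by the worker thread when the real result is ready.
    // Blocks until the wait notification is out, so the final reply
    // can never overtake it.
    void complete(int result) {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return waitSent_; });
        assert(!completed_ && "a deferred reply must be completed exactly once");
        completed_ = true;
        events_.push_back("final:" + std::to_string(result));
    }

    // Returns a copy of the recorded event order.
    std::vector<std::string> events() {
        std::lock_guard<std::mutex> lock(mtx_);
        return events_;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    bool waitSent_ = false;
    bool completed_ = false;
    std::vector<std::string> events_;
};

// Hypothetical driver: the worker tries to complete immediately, but is
// held back until the framework side reports that the wait was sent.
std::vector<std::string> run_demo() {
    DeferredReply reply;
    std::thread worker([&reply] { reply.complete(0); });
    reply.waitNotificationSent();
    worker.join();
    return reply.events();
}
```

Calling `run_demo()` from a `main` function records the wait notification before the final result regardless of which thread runs first, because the worker sleeps on the condition variable until the flag is set. A manual implementation that skips this wait can deliver the result while the framework still owns the reply state, which is the kind of race suggested above as a likely cause.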

@esindril
Contributor Author

This is no longer an issue in XRootD 4.3.0 with the new XRootD Castor plugin deployed.
