
tests/bugs/ec/bug-1161886.t fails and leaves a core #4525


Description

Reported by @pranithk (Member):
```
tests/bugs/ec/../../include.rc: line 348: 604199 Segmentation fault      (core dumped) tests/bugs/ec/bug-1161886 192.168.122.213 patchy /file2
not ok  15 [     37/   3076] <  39> 'tests/bugs/ec/bug-1161886 192.168.122.213 patchy /file2' -> ''
ok  16 [     36/      9] <  40> '^0$ stat -c %s /mnt/glusterfs/0/file2'
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
getfattr: Removing leading '/' from absolute path names
ok  17 [     23/      9] <  41> '^Y$ check_ec_size file2 0'
Failed 1/17 subtests

Test Summary Report
-------------------
tests/bugs/ec/bug-1161886.t (Wstat: 0 Tests: 17 Failed: 1)
  Failed test:  15
```

Based on git bisect, the first bad commit is:

```
ba5bfeb119a9e7cbf294ba9c5076ac0792598855 is the first bad commit
commit ba5bfeb119a9e7cbf294ba9c5076ac0792598855
Author: mohit84 <moagrawa@redhat.com>
Date:   Mon Oct 21 23:06:14 2024 +0530

    rpc: Improve rpc clnt connection cleanup process (#4329)

    During the first rpc clnt submission we take the rpc reference and
    register the call_bail function for the timer thread. The timer thread
    call call_bail function every 10s basis. In case if a client trigger a
    shutdown request it try to call rpc_clnt_connection_cleanup to cleanup
    the rpc connection.The rpc_clnt_connection would not be able to cleanup
    the rpc connection successfully due to the cleanup_started flag being set by
    the upper xlator. The rpc reference will be unref only after trigger
    a call_bail function so basically if somehow call_bail is triggered just
    before start a shutdown process the application has to wait for 10s
    to cleanup the rpc connection eventually the process becomes slow.

    Solution: Unref the rpc object based on the conn->timer/conn->reconnect
    pointer value as we are doing the same for ping_timer. These pointer are always
    modified under the critical section so we can assume if pointer is valid it means
    rpc reference is also valid.

    Fixes: #4320
    credits: Xavi Hernandez <xhernandez@redhat.com>
    Change-Id: Ib947b8bfcbe1b49e1ed05a50a84de6f92afbca13

    Signed-off-by: Mohit Agrawal <moagrawa@redhat.com>

 rpc/rpc-lib/src/rpc-clnt.c | 30 +++++++++++-------------------
 1 file changed, 11 insertions(+), 19 deletions(-)
```
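The commit message's rule says that a non-NULL conn->timer/conn->reconnect pointer, modified only under the critical section, means the pending event still holds an rpc reference. A few lines of C can model that rule. The code is a single-threaded illustration built on stated assumptions. struct conn, struct timer_event, timer_cancel(), timer_dequeue() and the conn_* helpers are hypothetical stand-ins, not glusterfs functions. The sketch also assumes that a cancel reports whether it actually removed the event; the commit does not say this. The example shows which side drops the event's reference in each ordering.

```c
/*
 * Hypothetical sketch: struct conn, struct timer_event and every function
 * below are illustrative stand-ins, not the glusterfs API. Rule modelled:
 * "conn->reconnect is non-NULL" <=> "the pending timer event owns one ref".
 */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct timer_event {
    int queued;                    /* 1 while waiting in the timer queue */
};

struct conn {
    pthread_mutex_t lock;          /* plays the role of the critical section */
    struct timer_event *reconnect; /* set only while the event holds a ref */
    int refs;                      /* stand-in for the rpc_clnt refcount */
};

static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;

static void timer_add(struct timer_event *ev)
{
    pthread_mutex_lock(&timer_lock);
    ev->queued = 1;
    pthread_mutex_unlock(&timer_lock);
}

/* Assumed contract: returns 0 only if the event was still queued. */
static int timer_cancel(struct timer_event *ev)
{
    int ret = -1;
    pthread_mutex_lock(&timer_lock);
    if (ev->queued) {
        ev->queued = 0;
        ret = 0;
    }
    pthread_mutex_unlock(&timer_lock);
    return ret;
}

/* The timer thread picks the event up; from here on, a cancel fails. */
static int timer_dequeue(struct timer_event *ev)
{
    int got;
    pthread_mutex_lock(&timer_lock);
    got = ev->queued;
    ev->queued = 0;
    pthread_mutex_unlock(&timer_lock);
    return got;
}

static void conn_init(struct conn *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->reconnect = NULL;
    c->refs = 1;                   /* the owner's reference */
}

static void conn_arm(struct conn *c, struct timer_event *ev)
{
    pthread_mutex_lock(&c->lock);
    c->refs++;                     /* reference handed to the event */
    c->reconnect = ev;
    timer_add(ev);
    pthread_mutex_unlock(&c->lock);
}

/* Timer callback: it was dequeued, so the event's reference is its own. */
static void conn_reconnect_cb(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->reconnect = NULL;
    /* ... reconnect work; the object is alive because a ref is held ... */
    assert(c->refs > 0);
    c->refs--;
    pthread_mutex_unlock(&c->lock);
}

/* Cleanup: drop the event's reference only if the cancel really worked. */
static void conn_cleanup(struct conn *c)
{
    pthread_mutex_lock(&c->lock);
    if (c->reconnect != NULL && timer_cancel(c->reconnect) == 0) {
        c->reconnect = NULL;
        assert(c->refs > 0);
        c->refs--;
    }
    /* otherwise the callback is in flight and drops the reference itself */
    pthread_mutex_unlock(&c->lock);
}
```

Consider two orderings. If conn_arm() runs and conn_cleanup() follows while the event is still queued, refs ends at 1 and the pointer is cleared. If conn_arm() and timer_dequeue() both run before conn_cleanup(), the pointer stays set and refs stays at 2 until conn_reconnect_cb() drops it to 1. If both paths dropped the reference, the callback would keep working on an object that the cleanup side already considers released.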

Activity

pranithk self-assigned this on Apr 22, 2025.

pranithk commented on Apr 22, 2025


Coredump:

```
Core was generated by `tests/bugs/ec/bug-1161886 192.168.122.213 patchy /file2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f7f5746e980 in rpc_clnt_reconnect (conn_ptr=0x7f7f38046028)
    at /home/pk/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:357
357         ctx = clnt->ctx;
[Current thread is 1 (Thread 0x7f7f44dff6c0 (LWP 417664))]
Missing rpms, try: dnf --enablerepo='*debug*' install libattr-debuginfo-2.5.2-4.fc41.x86_64
(gdb) p clnt
$1 = (struct rpc_clnt *) 0x170
(gdb) bt
#0  0x00007f7f5746e980 in rpc_clnt_reconnect (conn_ptr=0x7f7f38046028)
    at /home/pk/glusterfs/rpc/rpc-lib/src/rpc-clnt.c:357
#1  0x00007f7f5731f096 in gf_timer_proc (data=0xee0ed88)
    at /home/pk/glusterfs/libglusterfs/src/timer.c:152
#2  0x00007f7f5751f148 in start_thread (arg=<optimized out>) at pthread_create.c:448
#3  0x00007f7f575a30cc in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78
```
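In the backtrace, gf_timer_proc calls rpc_clnt_reconnect with clnt equal to 0x170, which does not look like a valid heap pointer. One plausible reading is that the reconnect timer fired after the memory behind conn_ptr had been released, but this issue does not confirm that. A small stress harness can exercise an ordering bug of that kind. Everything in it is hypothetical: struct obj stands in for the rpc object and struct event for its timer event. Each round races a timer thread against a cleanup thread. The harness checks that exactly one side drops the event's reference, so each object is freed exactly once. Building it with -pthread and, optionally, -fsanitize=address is an assumption about how one might run it.

```c
/* Hypothetical harness: struct obj models the rpc object and struct event
 * its timer event; neither is glusterfs code. Exactly one of the two
 * racing threads must drop the event's reference in every round. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct obj {
    pthread_mutex_t lock;
    int refs;                 /* owner's ref + the queued event's ref */
    int work;                 /* touched by the callback, like clnt->ctx */
};

struct event {
    pthread_mutex_t lock;     /* the timer queue's lock; outlives the obj */
    int queued;               /* 1 until dequeued or cancelled */
    struct obj *target;
};

static int frees;             /* updated only under frees_lock */
static pthread_mutex_t frees_lock = PTHREAD_MUTEX_INITIALIZER;

static void obj_unref(struct obj *o)
{
    int last;
    pthread_mutex_lock(&o->lock);
    assert(o->refs > 0);      /* a double unref would trip this */
    last = (--o->refs == 0);
    pthread_mutex_unlock(&o->lock);
    if (last) {
        pthread_mutex_destroy(&o->lock);
        free(o);
        pthread_mutex_lock(&frees_lock);
        frees++;
        pthread_mutex_unlock(&frees_lock);
    }
}

/* Timer thread: only a successful dequeue grants the event's reference. */
static void *timer_thread(void *arg)
{
    struct event *ev = arg;
    struct obj *o = NULL;
    pthread_mutex_lock(&ev->lock);
    if (ev->queued) {
        ev->queued = 0;
        o = ev->target;
    }
    pthread_mutex_unlock(&ev->lock);
    if (o != NULL) {
        pthread_mutex_lock(&o->lock);
        o->work++;            /* safe: the event's reference is still held */
        pthread_mutex_unlock(&o->lock);
        obj_unref(o);
    }
    return NULL;
}

/* Cleanup thread: drop the event's ref only if the cancel succeeded. */
static void *cleanup_thread(void *arg)
{
    struct event *ev = arg;
    struct obj *o = ev->target;   /* kept alive by the owner's reference */
    int cancelled;
    pthread_mutex_lock(&ev->lock);
    cancelled = ev->queued;
    ev->queued = 0;
    pthread_mutex_unlock(&ev->lock);
    if (cancelled)
        obj_unref(o);             /* the callback will never run */
    obj_unref(o);                 /* the owner's reference */
    return NULL;
}

/* Runs n independent races and returns how many objects were freed. */
static int stress(int n)
{
    frees = 0;
    for (int i = 0; i < n; i++) {
        struct obj *o = calloc(1, sizeof(*o));
        struct event ev;
        pthread_t timer, cleanup;
        if (o == NULL)
            return -1;
        pthread_mutex_init(&o->lock, NULL);
        o->refs = 2;
        pthread_mutex_init(&ev.lock, NULL);
        ev.queued = 1;
        ev.target = o;
        pthread_create(&timer, NULL, timer_thread, &ev);
        pthread_create(&cleanup, NULL, cleanup_thread, &ev);
        pthread_join(timer, NULL);
        pthread_join(cleanup, NULL);
        pthread_mutex_destroy(&ev.lock);
    }
    return frees;
}
```

stress(200) should return 200. Suppose cleanup_thread instead dropped both references unconditionally. Then the rounds in which the timer thread wins would touch freed memory. AddressSanitizer would be expected to report that in some runs, although a race like this is not guaranteed to surface on every execution.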
A commit referencing this issue was added on Apr 22, 2025: b606532.
This issue was added to the Gluster 11.3 milestone on Jul 1, 2025.
      tests/bugs/ec/bug-1161886.t fails and leaves a core · Issue #4525 · gluster/glusterfs