In shard_delete_shards, the deletion of each entry is started by a call to shard_delete_shards_of_entry. However, when the first deletion fails:
```
[2025-05-29 02:07:22.125974] D [MSGID: 0] [shard.c:3764:shard_delete_shards] 0-test_volume-shard: Initiating deletion of shards of gfid c81ade0e-5a83-46a3-a40f-6736131d4d56.
[2025-05-29 02:07:22.126700] D [MSGID: 0] [shard.c:3445:__shard_delete_shards_of_entry] 0-test_volume-shard: base file = c81ade0e-5a83-46a3-a40f-6736131d4d56, shard-block-size=67108864, file-size=818348032, shard_count=12
[2025-05-29 02:07:22.127604] D [MSGID: 0] [shard.c:3482:__shard_delete_shards_of_entry] 0-test_volume-shard: deleting 12 shards starting from block 1 of gfid c81ade0e-5a83-46a3-a40f-6736131d456.
[2025-05-29 02:07:22.131320] E [MSGID: 133010] [shard.c:2420:shard_common_lookup_shards_cbk] 0-test_volume-shard: Lookup on shard 12 failed. Base file gfid = c81ade0e-5a83-46a3-a40f-6736131d4d56 [Stale file handle]
[2025-05-29 02:07:22.131340] E [MSGID: 133020] [shard.c:3038:shard_post_lookup_shards_unlink_handler] 0-test_volume-shard: failed to delete shards of c81ade0e-5a83-46a3-a40f-6736131d4d56 [Stale file handle]
[2025-05-29 02:07:22.131553] E [MSGID: 133021] [shard.c:3774:shard_delete_shards] 0-test_volume-shard: Failed to clean up shards of gfid c81ade0e-5a83-46a3-a40f-6736131d4d56 [Stale file handle]
```
All subsequent deletions then fail with the same error:
```
# grep -rnw shard_delete_shards /var/log/glusterfs/home-cjh-mnt.log |head
14624:[2025-05-29 02:04:43.094298] E [MSGID: 133021] [shard.c:3767:shard_delete_shards] 0-test_volume-shard: Failed to clean up shards of gfid c81ade0e-5a83-46a3-a40f-6736131d4d56 [Stale file handle]
14626:[2025-05-29 02:04:43.095675] E [MSGID: 133021] [shard.c:3767:shard_delete_shards] 0-test_volume-shard: Failed to clean up shards of gfid 379056e7-6b6c-46e4-90cc-1b62d6358e89 [Stale file handle]
14628:[2025-05-29 02:04:43.097117] E [MSGID: 133021] [shard.c:3767:shard_delete_shards] 0-test_volume-shard: Failed to clean up shards of gfid 9d01e837-704f-487d-bb13-bf5cedd71e8d [Stale file handle]
14630:[2025-05-29 02:04:43.098637] E [MSGID: 133021] [shard.c:3767:shard_delete_shards] 0-test_volume-shard: Failed to clean up shards of gfid 25bde759-e09e-4507-a182-e78c8b56fb86 [Stale file handle]
14632:[2025-05-29 02:04:43.100096] E [MSGID: 133021] [shard.c:3767:shard_delete_shards] 0-test_volume-shard: Failed to clean up shards of gfid 7544ee70-f4c4-452c-b469-7d59219bd3ed [Stale file handle]
```
The reason is that cleanup_frame->local is shared by all deletions. In shard_common_resolve_shards, local->op_ret is checked first. It was set to -1 when the deletion of c81ade0e-5a83-46a3-a40f-6736131d4d56 failed and stays unchanged until all deletions finish, so every subsequent deletion exits shard_regulated_shards_deletion early because local->op_ret < 0. Since shard_regulated_shards_deletion returns local->op_errno as the final result, all deletions fail with the same error.
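For illustration, here is a minimal, self-contained C model of this failure mode (not the actual shard.c code; the `local` struct and function names are hypothetical stand-ins): one shared `local` carries `op_ret` across every entry, so a single failure poisons all later entries.

```c
#include <stdio.h>
#include <errno.h>

/* Hypothetical stand-in for shard_local_t; only the two fields
 * relevant to this bug are modeled. */
struct local {
    int op_ret;
    int op_errno;
};

/* Models shard_delete_shards_of_entry() for one entry. */
static void delete_shards_of_entry(struct local *local, int entry_id)
{
    if (local->op_ret < 0) {
        /* Models the early-exit check in shard_common_resolve_shards():
         * a stale op_ret left over from a PREVIOUS entry aborts this
         * entry too, and its op_errno is reported as this entry's error. */
        printf("entry %d: aborted, inherited op_errno=%d\n",
               entry_id, local->op_errno);
        return;
    }
    if (entry_id == 0) {
        /* Simulate the "Stale file handle" failure on the first entry. */
        local->op_ret = -1;
        local->op_errno = ESTALE;
        printf("entry %d: failed, op_errno=%d\n", entry_id, local->op_errno);
        return;
    }
    printf("entry %d: deleted\n", entry_id);
}

int main(void)
{
    /* One local reused for every entry, like cleanup_frame->local. */
    struct local shared = {0};
    for (int i = 0; i < 3; i++) {
        /* Uncommenting the next line models the minimal fix: give each
         * entry a fresh op_ret/op_errno before its deletion starts. */
        /* shared.op_ret = 0; shared.op_errno = 0; */
        delete_shards_of_entry(&shared, i);
    }
    return 0;
}
```

Run as-is, this prints the cascade seen in the logs above (entry 0 fails, entries 1 and 2 abort with the inherited ESTALE); with the reset uncommented, only entry 0 fails.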
Activity
shard: ensure each deletion process uses a separate local->op_ret
pranithk commented on Sep 2, 2025
@chen1195585098 As per the code, eventually the cleanup should happen when the next syncop is created, right? Is that also not happening?
chen1195585098 commented on Sep 3, 2025
Cleanup will happen, but a more fine-grained cleanup is required.
From the code, variables such as call_frame_t *cleanup_frame in each shard_delete_shards synctask are independently initialized and cleaned up after the task finishes. That works well. But there may be several sub-delete-processes (shard_delete_shards_of_entry) within one synctask, and I think the key variables that can stop deletion in these sub-delete-processes should not interfere with each other.
pranithk commented on Sep 3, 2025
@chen1195585098 The fix you gave would work. But I am seeing a pattern that is a bit bad, where the same frame is being reused; different kinds of problems can happen in the future if something else is not reset properly on failure. It is probably better to create a separate frame for each shard_delete_shards_of_entry with the local values initialized. That will fix this class of bugs completely. Do you want to attempt it? Otherwise I can merge this PR and send a different PR for future purposes. Let me know.
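As a rough illustration of the frame-per-entry pattern being discussed, here is a small self-contained C sketch (the `frame` and `local` types below are hypothetical models, not the real GlusterFS structures or API): each entry gets its own freshly zeroed state, so nothing can leak from one deletion into the next.

```c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

/* Hypothetical stand-ins for shard_local_t and a call frame. */
struct local {
    int op_ret;
    int op_errno;
};

struct frame {
    struct local *local;
};

static struct frame *frame_create(void)
{
    struct frame *f = calloc(1, sizeof(*f));
    if (!f)
        return NULL;
    /* Zeroed per-entry state: op_ret starts at 0 for every entry. */
    f->local = calloc(1, sizeof(*f->local));
    if (!f->local) {
        free(f);
        return NULL;
    }
    return f;
}

static void frame_destroy(struct frame *f)
{
    if (!f)
        return;
    free(f->local);
    free(f);
}

/* Models shard_delete_shards_of_entry() with its own frame. */
static void delete_shards_of_entry(struct frame *frame, int entry_id)
{
    if (entry_id == 0) {
        /* Simulate a "Stale file handle" failure on the first entry. */
        frame->local->op_ret = -1;
        frame->local->op_errno = ESTALE;
        printf("entry %d: failed, op_errno=%d\n", entry_id,
               frame->local->op_errno);
        return;
    }
    printf("entry %d: deleted\n", entry_id);
}

int main(void)
{
    for (int i = 0; i < 3; i++) {
        struct frame *f = frame_create(); /* fresh frame per entry */
        if (!f)
            return 1;
        delete_shards_of_entry(f, i);
        frame_destroy(f); /* nothing survives into the next entry */
    }
    return 0;
}
```

The trade-off is an extra allocation per entry, in exchange for ruling out this whole class of stale-state bugs rather than resetting fields one by one.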
pranithk commented on Sep 3, 2025
It is fine, I will run some tests on this and merge it if they pass. I can attempt this PR when I get time.
chen1195585098 commented on Sep 4, 2025
Sure, I will do that. @pranithk
Reusing the cleanup_frame variable seemed strange, but when I found this issue I assumed it was intentional, so to keep the overall process consistent I made only minimal changes.
I will try to raise a new PR later to completely fix this class of bugs.
pranithk commented on Sep 4, 2025
@chen1195585098 I will merge this PR for now. You can send it when you get time.
chen1195585098 commented on Sep 4, 2025
OK.
shard: ensure each deletion process uses a separate local->op_ret (gl…