
loop device detach (losetup -d) hung #356

Open
hongyuntw opened this issue Feb 15, 2024 · 1 comment

hongyuntw commented Feb 15, 2024

Hi, I've encountered an issue where losetup -d hangs when detaching a loop device. Here are the steps to reproduce:

  1. Create, format, and mount a loop device:
       dd if=/dev/zero of=./x.img count=400 bs=1M
       LOOP_DEVICE=$(losetup --find --show --partscan ./x.img) && echo $LOOP_DEVICE
       mkfs.ext4 -F $LOOP_DEVICE
       mkdir -p /mnt/tests/ && mount $LOOP_DEVICE /mnt/tests/
  2. Set up a snapshot: dbdctl setup-snapshot $LOOP_DEVICE /mnt/tests/.cow 0
  3. Destroy the snapshot: dbdctl destroy 0
  4. Unmount the device: umount /mnt/tests
  5. Detach the loop device (hangs here): losetup -d $LOOP_DEVICE
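
Because losetup -d blocks inside the kernel, running step 5 directly ties up the terminal. Below is a minimal sketch, assuming bash on Linux, of a wrapper that starts the command in the background and, once a time limit passes, reports the task's scheduler state and (if readable) its kernel stack instead of waiting forever. `run_with_hang_report` is a hypothetical helper written for this report; it is not part of dbdctl or util-linux.

```bash
#!/usr/bin/env bash
# run_with_hang_report LIMIT CMD [ARGS...]  (hypothetical helper)
# Runs CMD in the background. If CMD is still running after LIMIT seconds,
# prints its state letter from /proc/PID/stat and, when readable, its kernel
# stack from /proc/PID/stack, then returns 124. It never signals CMD: a task
# in uninterruptible sleep ("D") would not act on the signal anyway.
run_with_hang_report() {
    local limit=$1
    shift
    "$@" &
    local pid=$!
    local waited=0 state
    while kill -0 "$pid" 2>/dev/null; do
        if (( waited >= limit )); then
            # In /proc/PID/stat the state letter follows the ")" closing comm.
            state=$(sed -E 's/^.*\) ([A-Za-z]).*$/\1/' "/proc/$pid/stat" 2>/dev/null) || state=unknown
            echo "pid $pid ($*) still running after ${limit}s, state: ${state:-unknown}" >&2
            # /proc/PID/stack is normally readable only by root.
            cat "/proc/$pid/stack" >&2 2>/dev/null || true
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # CMD finished within the limit: propagate its exit status.
    wait "$pid"
}
```

For step 5 this would be run as root as `run_with_hang_report 30 losetup -d "$LOOP_DEVICE"`. The return value 124 mirrors the convention of `timeout(1)`; the wrapper deliberately does not try to kill the command, because a task sleeping uninterruptibly does not act on signals, even SIGKILL, until it wakes up.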

I used gdb to debug the kernel and traced where the hang occurs. When the loop device is detached and no one else is using it, the kernel's loop_clr_fd (in loop.c) calls __loop_clr_fd, which in turn calls blk_mq_freeze_queue. That is where it hangs.
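
The same location can also be checked without gdb. Here is a small sketch, assuming bash and a Linux /proc, that lists every task in uninterruptible sleep together with /proc/PID/wchan, the kernel function the task is waiting in. Run from a second terminal while losetup -d hangs, losetup would be expected to appear in the list. `list_d_state_tasks` is a hypothetical name.

```bash
#!/usr/bin/env bash
# list_d_state_tasks  (hypothetical helper)
# Prints "PID<TAB>COMM<TAB>WCHAN" for each task in "D" (uninterruptible) state.
# WCHAN may read "0" when the kernel hides symbol information.
list_d_state_tasks() {
    local stat line pid comm rest state wchan
    for stat in /proc/[0-9]*/stat; do
        # A task can exit between the glob and the read; skip it quietly.
        { read -r line < "$stat"; } 2>/dev/null || continue
        pid=${line%% *}
        comm=${line#*"("}       # drop everything up to the first "("
        comm=${comm%")"*}       # drop everything from the last ")"
        rest=${line##*") "}     # fields after comm, starting with the state
        state=${rest%% *}
        if [[ $state == D ]]; then
            wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
            printf '%s\t%s\t%s\n' "$pid" "$comm" "${wchan:-?}"
        fi
    done
    return 0
}
```

As root, `echo w > /proc/sysrq-trigger` additionally dumps the full kernel stacks of all blocked tasks to the kernel log (readable with `dmesg`), which should show the __loop_clr_fd path for the stuck losetup if the analysis above holds.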

The hang is caused by abnormal reference count changes in the loop device's request queue.
Here is the screenshot:

[image: debugging screenshot with the relevant values highlighted in red boxes]

In the second red box, you can see that the value of lo->lo_queue->q_usage_counter->data inexplicably increased from 1 to 22, which is very strange. Over a few experiments, I found that it sometimes rises above 100. As a result, lo->lo_queue can never be frozen.
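
For context, blk_mq_freeze_queue starts freezing the queue and then waits until q_usage_counter drops to zero, so every reference that is taken and never released keeps losetup -d waiting forever. As far as I know the counter itself is not exposed through sysfs, but /sys/block/<dev>/inflight reports the read and write requests the block layer considers in flight. The sketch below, assuming bash and the standard sysfs layout, sums those two numbers; if it reads 0 while losetup -d is stuck, that would suggest the extra references are not held by ordinary in-flight requests. `inflight_total` and its `SYSFS_ROOT` override (there only so it can be tested against a fake tree) are hypothetical.

```bash
#!/usr/bin/env bash
# inflight_total DEVICE  (hypothetical helper)
# Prints reads + writes from /sys/block/<name>/inflight for a device path such
# as /dev/loop0. SYSFS_ROOT is an assumed override for testing (default /sys).
inflight_total() {
    local name=${1##*/}                  # /dev/loop0 -> loop0
    local file="${SYSFS_ROOT:-/sys}/block/$name/inflight"
    local reads writes
    # The file holds two whitespace-separated counters: reads, then writes.
    read -r reads writes < "$file" || return 1
    echo $((reads + writes))
}
```

While step 5 hangs, `inflight_total "$LOOP_DEVICE"` can be run from another terminal.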

I suspect this issue might be related to changes in the kernel's loop device driver. Two commits seem particularly relevant, but I am not sure whether the root cause is related to them:
Commit 1
Commit 2

Additionally, the hang only occurs when we perform setup, destroy, and umount before detaching. If we follow the sequence setup -> destroy -> detach -> umount, or setup -> umount -> detach -> destroy, losetup -d does not hang. This is because our module is still using the loop device in those cases, so loop_clr_fd does not call __loop_clr_fd.
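
Whether losetup -d detaches immediately depends on whether anything else still has the device open: with other openers, loop_clr_fd only marks the device to be cleared automatically on last close instead of calling __loop_clr_fd right away. A mount is one such opener and can be checked from user space; an open held directly by a kernel module does not appear in the mount table. Here is a sketch, assuming bash and awk, that tests whether a device is the source of any mount. `is_mounted_source` and its `MOUNTS_FILE` override (for testing only) are hypothetical.

```bash
#!/usr/bin/env bash
# is_mounted_source DEVICE  (hypothetical helper)
# Succeeds if DEVICE is the source (first field) of any mount table entry.
# MOUNTS_FILE is an assumed override for testing (default /proc/self/mounts).
is_mounted_source() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' \
        "${MOUNTS_FILE:-/proc/self/mounts}"
}
```

For example, `is_mounted_source "$LOOP_DEVICE" && echo "still mounted: detach should be deferred"` before step 5 distinguishes the hanging order from the two orders that detach while the filesystem is still mounted.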

This may affect kernel versions 5.16 and above; I have confirmed it on Fedora 34 (5.16.19 / 5.17.12) and Fedora 38 (6.2).

This error does not seem to affect physical disks, but I am not sure whether it also affects the reference count of a disk's request queue.

hongyuntw changed the title from "loop device detach (losetup -d) hung for kernel 5.17+" to "loop device detach (losetup -d) hung" on Feb 15, 2024
Swistusmen added the bug label on Feb 15, 2024
Swistusmen (Collaborator) commented:

Hi man, thanks for raising this issue. We will look into it. Sorry, the whole team currently has other priorities, but that should change soon, and we will come back to this and to your PR.
