I am using Proxmox 8 with ZFS 2.2.7. Sometimes we have the problem that a subvol (container dataset) cannot be destroyed. For example:

```
TASK ERROR: zfs error: cannot destroy 'tank01/subvol-122-disk-0': dataset is busy
```
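When this happens we already check the obvious ZFS-side holders (snapshots, clones, user holds). Roughly like this, using the dataset name from the error above (a sketch, not our exact script):

```
# Look for snapshots, clones, and descendants under the dataset
zfs list -t all -r tank01/subvol-122-disk-0
zfs get origin,mounted,mountpoint tank01/subvol-122-disk-0

# User holds live on snapshots, not on the dataset itself
zfs list -H -t snapshot -o name -r tank01/subvol-122-disk-0 \
  | xargs -r zfs holds
```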
We are 100% sure that no process is accessing this dataset. I saw the same behavior on another server with a container template: it could not be destroyed, although it had never been cloned or used, and again nothing was accessing it. We verified this with strace and debugged down to the process level.
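Part of that process-level check is scanning mount namespaces, since a mount held in a private namespace (e.g. by a container or a systemd unit with private mounts) can keep a dataset busy while host-level lsof/fuser show nothing. A sketch of that check (illustrative commands, dataset name from the error above):

```
# Find processes whose mount namespace still references the dataset;
# these do not show up in host-level lsof/fuser output
grep -l subvol-122-disk-0 /proc/*/mounts 2>/dev/null

# Cross-check against the host's own view of mounts
findmnt | grep subvol-122-disk-0
```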
We tried to reproduce it actively. For days I ran a test program that repeatedly created 50 containers, let them run for 5 minutes, rebooted them, let them run another 5 minutes, then shut them down and destroyed them; the loop looked roughly like the sketch below. This ran on a physical server and also in a virtual environment whose resources were restricted so severely that an I/O wait of approx. 80-90% was generated. But I couldn't reproduce it.
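Sketch of the loop (the VMID range, storage name, and template path are placeholders, not our real values):

```
#!/bin/bash
# Reproduction attempt: create/start 50 CTs, run, reboot, run, destroy.
TEMPLATE=/var/lib/vz/template/cache/some-template.tar.zst

for i in $(seq 9000 9049); do
    pct create "$i" "$TEMPLATE" --rootfs tank01:8
    pct start "$i"
done
sleep 300                                   # let them run 5 minutes
for i in $(seq 9000 9049); do pct reboot "$i"; done
sleep 300                                   # another 5 minutes
for i in $(seq 9000 9049); do
    pct stop "$i"
    pct destroy "$i"                        # never hit "dataset is busy" here
done
```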
If the behavior occurs again on production servers, how can we debug it to find the cause? What is the best way to proceed?
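Concretely: when it next happens, what should we capture before the state changes? Our rough plan is something like the following (the output paths and the dbgmsg step are our own assumptions):

```
# Snapshot the relevant state immediately after a failed destroy
zpool events -v                          > /root/busy-zpool-events.txt
zfs get all tank01/subvol-122-disk-0     > /root/busy-zfs-get-all.txt
grep -l subvol-122-disk-0 /proc/*/mounts 2>/dev/null \
                                         > /root/busy-ns-holders.txt
# Internal ZFS debug ring buffer (needs zfs_dbgmsg_enable=1)
cat /proc/spl/kstat/zfs/dbgmsg           > /root/busy-zfs-dbgmsg.txt
```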
Thanks a lot
Best Regards
Boospy