
zfs send | zfs receive fails on zvols that have additional snapshots #692

Closed
ryao opened this issue Apr 23, 2012 · 8 comments

@ryao
Contributor

ryao commented Apr 23, 2012

If I have rpool/zvol@X and rpool/zvol@Y and I try to send rpool/zvol@Y to another pool, rpool/zvol@X will be transferred and then a failure will occur. This happened to me a few times before I realized what was going on. The following illustrates the problem:

zfs send -DRvp rpool/KVM@backup | ssh -C 192.168.1.3 zfs receive -Fudv rpool/BACKUP/vserver

Password: sending from @ to rpool/KVM@backup
sending from @ to rpool/KVM/vm_gentoo@backup

receiving full stream of rpool/KVM@backup into rpool/BACKUP/vserver/KVM@backup
received 44.2KB stream in 1 seconds (44.2KB/sec)
receiving full stream of rpool/KVM/vm_gentoo@backup into rpool/BACKUP/vserver/KVM/vm_gentoo@backup
sending from @ to rpool/KVM/minix@install
received 13.8GB stream in 1032 seconds (13.7MB/sec)
receiving full stream of rpool/KVM/minix@install into rpool/BACKUP/vserver/KVM/minix@install
sending from @install to rpool/KVM/minix@backup
sending from @ to rpool/KVM/win7@install
cannot remove device links for 'rpool/BACKUP/vserver/KVM/minix': dataset is busy
received 521MB stream in 38 seconds (13.7MB/sec)

In this case, both rpool/KVM/minix@backup and rpool/KVM/minix@install exist. Trying to send either results in an error. I opted to delete rpool/KVM/minix@backup, roll back to rpool/KVM/minix@install and then rename rpool/KVM/minix@install to rpool/KVM/minix@backup. Had this been a dataset, both snapshots would have been transferred without a problem.
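
Roughly, the workaround looked like this (a sketch of the steps above; it assumes no clones depend on @backup and that the zvol is not otherwise in use):

# Drop the newer snapshot, roll back to @install, then rename it to @backup.
zfs destroy rpool/KVM/minix@backup
zfs rollback rpool/KVM/minix@install
zfs rename rpool/KVM/minix@install rpool/KVM/minix@backup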

@GregorKopka
Contributor

Did rpool/BACKUP/vserver/KVM/minix exist at that time, and if so: was it in use?

@ryao
Contributor Author

ryao commented Apr 24, 2012

It did not exist prior to the transfer.

@ryao
Contributor Author

ryao commented Apr 24, 2012

./lib/libzfs/libzfs_dataset.c calls an ioctl with ZFS_IOC_REMOVE_MINOR. When multiple snapshots of a zvol are encountered, this fails and the above error message is printed. It would seem that zv_open_count is being increased for each descendant snapshot. I am still looking into how that happens.

@ryao
Contributor Author

ryao commented Apr 24, 2012

I also deleted the additional snapshots, which worked as a hack to resolve the issue. I subsequently tried (and failed) to reproduce this with new snapshots. This occurred following the use of sync=disabled on a zvol and system instability. I guess that some sort of high-level corruption occurred, which deleting the old snapshots fixed.

With that said, I cannot identify the cause of zv_open_count being set to a value greater than zero. I will leave the decision to close this to Brian.

@behlendorf
Contributor

The zv_open_count is incremented/decremented when the zvol is opened/closed by any userspace process. This behavior is similar to Solaris but not identical, so there might be an issue here. In particular, I don't believe snapshots of zvols are accessible under Solaris.
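
For reference, one quick way to check from userspace whether some process is holding a zvol node open (the dataset path below is only an example):

# Resolve the /dev/zvol symlink to the underlying /dev/zdN node, then list
# the processes that have it open.
dev=$(readlink -f /dev/zvol/rpool/KVM/minix)
fuser -v "$dev"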

Anyway, let's leave the issue open until we resolve the root cause.

@PascalLauly

Snapshots of zvols are accessible under Solaris, and sending rpool/zvol@Y to another pool works. I use this feature on OpenSolaris to back up volumes like this:
(ssh 192.168.0.1 "zfs send -R rpool/zvol@Y" ) | zfs receive -d rpool

So there is a linux issue here.

@dverite
Contributor

dverite commented May 25, 2012

I've been investigating this error in a case where it occurs consistently and found some interesting facts.
The command is: zfs receive -d ibackup2 < data where data is a 10 GB stream.
It fails after a while with:

cannot remove device links for 'ibackup2/linux_master32': dataset is busy

First, when commenting out the udev rule in 60-zvol.rules, the error disappears and the receive operation completes successfully (except that the links inside /dev/zvol are not created, which incidentally produces the warnings /dev/zvol/x may not be immediately available).
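
(To disable the rule for testing, one option is simply to move the file aside and reload udev; the rules file may live under /lib/udev/rules.d/ or /etc/udev/rules.d/ depending on how ZFS was installed:)

# Move the zvol rule aside so udev stops creating /dev/zvol/* links,
# then reload the rules. Adjust the path to match your installation.
mv /lib/udev/rules.d/60-zvol.rules /lib/udev/rules.d/60-zvol.rules.disabled
udevadm control --reload-rules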

Also, when tracing the code with gdb and letting it pause just before stepping into zvol_remove_link(), or when introducing a sleep(1) before the call at libzfs/libzfs_sendrecv.c:2617, the error no longer occurs.

It may be relevant that the input stream contains volumes with partitions. The whole set of links actually created at the end of the operation when it's successful is:

linux_master32
linux_master32@20110207
linux_master32@20110207-part1
linux_master32@20110207-part2
linux_master32@20110207-part3
linux_master32@20110207-part4
linux_master32@last_snap
linux_master32@last_snap-part1
linux_master32@last_snap-part2
linux_master32@last_snap-part3
linux_master32@last_snap-part4
linux_master32-part1
linux_master32-part2
linux_master32-part3
linux_master32-part4
linux_master32@snap
linux_master32@snap-part1
linux_master32@snap-part2
linux_master32@snap-part3
linux_master32@snap-part4

I've observed that zfs receive creates /dev/zvol/ibackup2/linux_master32, then /dev/zvol/ibackup2/linux_master32@20110207, and then removes /dev/zvol/ibackup2/linux_master32 before continuing on to the next snapshot. It is apparently this removal of /dev/zvol/ibackup2/linux_master32 that fails with the EBUSY error. When it doesn't fail at the point of linux_master32@20110207, it will at linux_master32@snap or linux_master32@last_snap.

For the reasons above, I suspect that the root cause of the problem is a race condition between the asynchronous creation of this link by udev and its almost immediate destruction by zfs receive. The code in libzfs_dataset.c:zvol_create_link_common() has a loop waiting for the device link to appear, but maybe it's not sufficient?
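
To illustrate the kind of synchronization that seems to be missing (a sketch of the pattern only, not the actual libzfs code; the device path is taken from the log above):

# Wait for the udev-created link to appear, and for udev's pending event
# queue to drain, before anything tries to remove the link.
link=/dev/zvol/ibackup2/linux_master32
for i in $(seq 1 50); do
    [ -e "$link" ] && break
    sleep 0.1
done
udevadm settle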

Hope this helps.

@behlendorf
Contributor

Slightly reworked, but the intent of the patch remains the same. Merged into master.
