
zfs send | zfs receive fails on zvols that have additional snapshots #692

Closed
ryao opened this issue Apr 23, 2012 · 8 comments

@ryao
Contributor

ryao commented Apr 23, 2012

If I have rpool/zvol@X and rpool/zvol@Y and I try to send rpool/zvol@Y to another pool, rpool/zvol@X will be transferred and then a failure will occur. This happened to me a few times before I realized what was going on. The following illustrates the problem:

zfs send -DRvp rpool/KVM@backup | ssh -C 192.168.1.3 zfs receive -Fudv rpool/BACKUP/vserver

Password: sending from @ to rpool/KVM@backup
sending from @ to rpool/KVM/vm_gentoo@backup

receiving full stream of rpool/KVM@backup into rpool/BACKUP/vserver/KVM@backup
received 44.2KB stream in 1 seconds (44.2KB/sec)
receiving full stream of rpool/KVM/vm_gentoo@backup into rpool/BACKUP/vserver/KVM/vm_gentoo@backup
sending from @ to rpool/KVM/minix@install
received 13.8GB stream in 1032 seconds (13.7MB/sec)
receiving full stream of rpool/KVM/minix@install into rpool/BACKUP/vserver/KVM/minix@install
sending from @install to rpool/KVM/minix@backup
sending from @ to rpool/KVM/win7@install
cannot remove device links for 'rpool/BACKUP/vserver/KVM/minix': dataset is busy
received 521MB stream in 38 seconds (13.7MB/sec)

In this case, both rpool/KVM/minix@backup and rpool/KVM/minix@install exist. Trying to send either results in an error. I opted to delete rpool/KVM/minix@backup, roll back to rpool/KVM/minix@install and then rename rpool/KVM/minix@install to rpool/KVM/minix@backup. Had this been a dataset, both snapshots would have been transferred without a problem.
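
Roughly, the workaround looked like this (a sketch of the steps above; it assumes no clones depend on @backup and that the zvol is not otherwise in use):

# Drop the newer snapshot, roll back to @install, then rename it to @backup.
zfs destroy rpool/KVM/minix@backup
zfs rollback rpool/KVM/minix@install
zfs rename rpool/KVM/minix@install rpool/KVM/minix@backup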

@GregorKopka
Contributor

Did rpool/BACKUP/vserver/KVM/minix exist at that time, and if so: was it in use?

@ryao
Contributor Author

ryao commented Apr 24, 2012

It did not exist prior to the transfer.

@ryao
Contributor Author

ryao commented Apr 24, 2012

./lib/libzfs/libzfs_dataset.c calls an ioctl with ZFS_IOC_REMOVE_MINOR. When multiple snapshots of a zvol are encountered, this fails and the above error message is printed. It would seem that zv_open_count is being increased for each descendant snapshot. I am still looking into how that happens.

@ryao
Contributor Author

ryao commented Apr 24, 2012

I also deleted the additional snapshots, which worked as a hack to resolve the issue. I subsequently tried (and failed) to reproduce this with new snapshots. This occurred following the use of sync=disabled on a zvol and system instability. I guess that some sort of high-level corruption occurred, which deleting the old snapshots fixed.

With that said, I cannot identify the cause of zv_open_count being set to a value greater than zero. I will leave the decision to close this to Brian.

@behlendorf
Contributor

The zv_open_count is incremented/decremented when the zvol is opened/closed by any userspace process. This behavior is similar to Solaris but not identical, so there might be an issue here. In particular, I don't believe snapshots of zvols are accessible under Solaris.
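
For reference, one quick way to check from userspace whether some process is holding a zvol node open (the dataset path below is only an example):

# Resolve the /dev/zvol symlink to the underlying /dev/zdN node, then list
# the processes that have it open.
dev=$(readlink -f /dev/zvol/rpool/KVM/minix)
fuser -v "$dev"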

Anyway, let's leave the issue open until we resolve the root cause.

@PascalLauly

Snapshots of zvols are accessible under Solaris, and sending rpool/zvol@Y to another pool works. I use this feature on OpenSolaris to back up volumes like this:
(ssh 192.168.0.1 "zfs send -R rpool/zvol@Y" ) | zfs receive -d rpool

So there is a linux issue here.

@dverite
Contributor

dverite commented May 25, 2012

I've been investigating this error in a case where it occurs consistently and found some interesting facts.
The command is: zfs receive -d ibackup2 < data where data is a 10 GB stream.
It fails after a while with:

cannot remove device links for 'ibackup2/linux_master32': dataset is busy

First, when commenting out the udev rule in 60-zvol.rules, the error disappears and the receive operation completes successfully (except that the links inside /dev/zvol are not created, which incidentally produces the warnings /dev/zvol/x may not be immediately available).
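
(To disable the rule for testing, one option is simply to move the file aside and reload udev; the rules file may live under /lib/udev/rules.d/ or /etc/udev/rules.d/ depending on how ZFS was installed:)

# Move the zvol rule aside so udev stops creating /dev/zvol/* links,
# then reload the rules. Adjust the path to match your installation.
mv /lib/udev/rules.d/60-zvol.rules /lib/udev/rules.d/60-zvol.rules.disabled
udevadm control --reload-rules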

Also, when tracing the code with gdb and letting it pause just before stepping into zvol_remove_link(), or when introducing a sleep(1) before the call at libzfs/libzfs_sendrecv.c:2617, the error no longer occurs.

It may be relevant that the input stream contains volumes with partitions. The whole set of links actually created at the end of the operation when it's successful is:

linux_master32
linux_master32@20110207
linux_master32@20110207-part1
linux_master32@20110207-part2
linux_master32@20110207-part3
linux_master32@20110207-part4
linux_master32@last_snap
linux_master32@last_snap-part1
linux_master32@last_snap-part2
linux_master32@last_snap-part3
linux_master32@last_snap-part4
linux_master32-part1
linux_master32-part2
linux_master32-part3
linux_master32-part4
linux_master32@snap
linux_master32@snap-part1
linux_master32@snap-part2
linux_master32@snap-part3
linux_master32@snap-part4

I've observed that zfs receive creates /dev/zvol/ibackup2/linux_master32, then /dev/zvol/ibackup2/linux_master32@20110207, and then removes /dev/zvol/ibackup2/linux_master32 before continuing on to the next snapshot. It is apparently this removal of /dev/zvol/ibackup2/linux_master32 that fails with the EBUSY error. When it doesn't fail at the point of linux_master32@20110207, it will at linux_master32@snap or linux_master32@last_snap.

For the reasons above, I suspect that the root cause of the problem is a race condition between the asynchronous creation of this link by udev and its almost immediate destruction by zfs receive. The code in libzfs_dataset.c:zvol_create_link_common() has a loop waiting for the device link to appear, but maybe it's not sufficient?
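
To illustrate the kind of synchronization that seems to be missing (a sketch of the pattern only, not the actual libzfs code; the device path is taken from the log above):

# Wait for the udev-created link to appear, and for udev's pending event
# queue to drain, before anything tries to remove the link.
link=/dev/zvol/ibackup2/linux_master32
for i in $(seq 1 50); do
    [ -e "$link" ] && break
    sleep 0.1
done
udevadm settle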

Hope this helps.

@behlendorf
Contributor

Slightly reworked, but the intent of the patch remains the same. Merged into master.
