
zfs 0.7 hangs receiving 16M block (reproducible) #7365

Closed
fling- opened this issue Mar 29, 2018 · 2 comments

Comments


fling- commented Mar 29, 2018

System information

Type                  Version/Name
Distribution Name     Gentoo
Distribution Version  17.1
Linux Kernel          4.15.3
Architecture          amd64
ZFS Version           0.7.6-1
SPL Version           0.7.6-1

Describe the problem you're observing

zfs hangs receiving 16M blocks when I try to migrate data between pools.
It looks like I'm stuck in the middle of the migration, as I can't send without -L because of #6224.

Describe how to reproduce the problem

Using https://gist.github.com/fling-/c66bf1e4a082b5cf9cd4d1106fe6e2bc

Include any warning/errors/backtraces from the system logs

# zfs send -LRe studio/gentoo@old-pool | mbuffer -L -m 512M | zfs recv -u new-root/gentoo
in @  0.0 KiB/s, out @  0.0 KiB/s, 2248 MiB total, buffer 100% full^C

recv hangs in D state:

952 pts/11   D+     0:03 zfs recv -u new-root/gentoo

zrav commented Apr 3, 2018

If it is acceptable to re-send all the data using <=128k blocks, as a workaround you could restart sending from scratch without -L. #6224 only occurs when you incrementally send small blocks with large blocks already present on the target.
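A sketch of that workaround, using the dataset names from the report. The command is only constructed and echoed here rather than executed; the key difference from the original reproducer is the absence of `-L`, so every block is re-sent at the <=128K default recordsize.

```shell
# Hypothetical sketch of the workaround: a fresh full send without -L
# (large blocks). Dataset names copied from the original report.
src="studio/gentoo@old-pool"
dst="new-root/gentoo"

# -R: replication stream; -e: embedded blocks; no -L, so blocks are <=128K.
cmd="zfs send -Re $src | mbuffer -m 512M | zfs recv -u $dst"
echo "$cmd"
```

Note that this re-transfers the entire dataset, which is why it is only acceptable if the extra time and bandwidth are tolerable.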

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 7, 2018
When using 16MB blocks the send/recv queues aren't quite big
enough.  This change leaves the default 16M queue size, which is
a good value for most pools, but additionally ensures that the
queue sizes are at least twice the allowed zfs_max_recordsize.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#7365
@behlendorf

@fling- thank you for the excellent reproducer; it made it straightforward to reproduce the issue and identify the root cause. I've opened #7404 with a proposed fix.

tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Apr 16, 2018
When using 16MB blocks the send/recv queues aren't quite big
enough.  This change leaves the default 16M queue size, which is
a good value for most pools, but additionally ensures that the
queue sizes are at least twice the allowed zfs_max_recordsize.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7365 
Closes openzfs#7404
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue May 4, 2018
When using 16MB blocks the send/recv queues aren't quite big
enough.  This change leaves the default 16M queue size, which is
a good value for most pools, but additionally ensures that the
queue sizes are at least twice the allowed zfs_max_recordsize.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#7365 
Closes openzfs#7404
tonyhutter pushed a commit that referenced this issue May 10, 2018
When using 16MB blocks the send/recv queues aren't quite big
enough.  This change leaves the default 16M queue size, which is
a good value for most pools, but additionally ensures that the
queue sizes are at least twice the allowed zfs_max_recordsize.

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #7365
Closes #7404
3 participants