Replication fails when receive side apparently hangs #136
Comments
I merged the RPC rewrite today: #111
Also, this issue has been encountered before in #104
As can be seen from the commits above, work on this issue is progressing in branch master...problame/overlapping-dataset-hierarchy-improvements. Could you please verify that the issue is fixed or at least mitigated?
I'm running zrepl 0.1.0rc3 on FreeBSD 12, replicating over an ssh connection to an Ubuntu Linux VM that also runs zrepl 0.1.0rc3. Replication fails about 95% of the time. Sometimes it fails right away; sometimes it runs for a few minutes and then fails. I tried a manual
zfs send | ssh backup zfs recv
command and the replication succeeded, so I'm confident the problem is in zrepl and is probably not a Linux/FreeBSD incompatibility. The only thing I've noticed is that the transfer rate drops to 0 for several seconds before it fails. I thought the backup hard drive might be hanging for a few seconds on writes and tripping up zrepl, but I could not reproduce any write hangs with a dd test, so I think something else is tripping it up. The error message is always the same: "unconsumed input stream". I should also note that during the manual send/recv the transfer rate also dropped to 0 for a few seconds several times, but each time it recovered and continued. So I'm not sure whether there is a problem with my I/O, but if there is, zrepl times out almost immediately and fails.
This is the log.
Mar 12 22:16:52 fileserver zrepl[39020]: job=machines subsystem=repl msg="receive request failed (might also be error on sender)" step="zfspool/oldmachines@zrepl_20190312_234215_000 (full)" err="handler error with unconsumed input stream: zfs exited with error: exit status 1\nstderr:\ncannot receive new filesystem stream: checksum mismatch or incomplete stream\n" fs=zfspool/oldmachines invocation=5 errType=*streamrpc.RemoteEndpointError