Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FreeBSD >= 12.2, < 13.0: Go runtime deadlocks and/or panics on multicore systems #411

Closed
problame opened this issue Jan 8, 2021 · 62 comments
Labels
Milestone

Comments

@problame
Copy link
Member

problame commented Jan 8, 2021

Happened 2 weeks after upgrading my main server to FreeBSD 12.2.
Made a goroutine dump of the receiving side.
Most interesting stack:

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xa6b4f6 0xa6f205 0xa71f3e 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44							/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64							/usr/local/go/src/sync/waitgroup.go:130
#	0xa6b4f5	github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage+0x3b5	/home/circleci/project/rpc/dataconn/stream/stream_conn.go:109
#	0xa6f204	github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv+0x204			/home/circleci/project/rpc/dataconn/dataconn_client.go:87
#	0xa71f3d	github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1+0x7d		/home/circleci/project/rpc/dataconn/dataconn_client.go:162

But:

  • no zfs process on the receiving side
  • other stacks show that the connection was probably still active (heartbeat goroutine existed)
  • sadly didn't capture sockstat output

Maybe I'm hitting the same bug that @mdtancsa had/has on CURRENT.
Let's see whether it happens again at some point.
This machine has hyperthreading disabled but does the (experimental) parallel replication.


goroutine profile: total 27
1 @ 0x40c2d4 0x46aefd 0xa9ae25 0x46e681
#	0x46aefc	os/signal.signal_recv+0x9c	/usr/local/go/src/runtime/sigqueue.go:147
#	0xa9ae24	os/signal.loop+0x24		/usr/local/go/src/os/signal/signal_unix.go:23

1 @ 0x4397a5 0x4065af 0x4061eb 0xa56178 0xa939ff 0xa93366 0xae22c2 0x46e681
#	0xa56177	github.com/zrepl/zrepl/replication/driver.Do.func2+0x77		/home/circleci/project/replication/driver/replication_driver.go:280
#	0xa939fe	github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do+0x37e	/home/circleci/project/daemon/job/active.go:467
#	0xa93365	github.com/zrepl/zrepl/daemon/job.(*ActiveSide).Run+0x425	/home/circleci/project/daemon/job/active.go:424
#	0xae22c1	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141		/home/circleci/project/daemon/daemon.go:248

1 @ 0x4397a5 0x4065af 0x4061eb 0xa57888 0x46e681
#	0xa57887	github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1+0x47	/home/circleci/project/replication/driver/replication_stepqueue.go:85

1 @ 0x4397a5 0x4065af 0x4061eb 0xa7ccd8 0x46e681
#	0xa7ccd7	github.com/zrepl/zrepl/rpc.NewClient.func1.1+0x37	/home/circleci/project/rpc/rpc_client.go:60

1 @ 0x4397a5 0x4065af 0x4061eb 0xae1d14 0x46e681
#	0xae1d13	github.com/zrepl/zrepl/daemon.Run.func1+0x33	/home/circleci/project/daemon/daemon.go:37

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d81e5 0x4d81c3 0x53e80f 0x55214e 0x70b2a2 0x4f6f51 0x70b4f3 0x708315 0x70e5df 0x70e5ea 0x56a922 0x4d3627 0x8fb1c9 0x8fb17a 0x8fba45 0x918f52 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54					/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44					/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d81e4	internal/poll.(*pollDesc).waitRead+0x1a4				/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d81c2	internal/poll.(*FD).Read+0x182						/usr/local/go/src/internal/poll/fd_unix.go:159
#	0x53e80e	net.(*netFD).Read+0x4e							/usr/local/go/src/net/fd_posix.go:55
#	0x55214d	net.(*conn).Read+0x8d							/usr/local/go/src/net/net.go:182
#	0x70b2a1	crypto/tls.(*atLeastReader).Read+0x61					/usr/local/go/src/crypto/tls/conn.go:779
#	0x4f6f50	bytes.(*Buffer).ReadFrom+0xb0						/usr/local/go/src/bytes/buffer.go:204
#	0x70b4f2	crypto/tls.(*Conn).readFromUntil+0xf2					/usr/local/go/src/crypto/tls/conn.go:801
#	0x708314	crypto/tls.(*Conn).readRecordOrCCS+0x114				/usr/local/go/src/crypto/tls/conn.go:608
#	0x70e5de	crypto/tls.(*Conn).readRecord+0x15e					/usr/local/go/src/crypto/tls/conn.go:576
#	0x70e5e9	crypto/tls.(*Conn).Read+0x169						/usr/local/go/src/crypto/tls/conn.go:1252
#	0x56a921	bufio.(*Reader).Read+0x221						/usr/local/go/src/bufio/bufio.go:227
#	0x4d3626	io.ReadAtLeast+0x86							/usr/local/go/src/io/io.go:314
#	0x8fb1c8	io.ReadFull+0x88							/usr/local/go/src/io/io.go:333
#	0x8fb179	golang.org/x/net/http2.readFrameHeader+0x39				/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:237
#	0x8fba44	golang.org/x/net/http2.(*Framer).ReadFrame+0xa4				/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:492
#	0x918f51	google.golang.org/grpc/internal/transport.(*http2Client).reader+0x171	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1218

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d81e5 0x4d81c3 0x53e80f 0x55214e 0x7c7638 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54		/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44		/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d81e4	internal/poll.(*pollDesc).waitRead+0x1a4	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d81c2	internal/poll.(*FD).Read+0x182			/usr/local/go/src/internal/poll/fd_unix.go:159
#	0x53e80e	net.(*netFD).Read+0x4e				/usr/local/go/src/net/fd_posix.go:55
#	0x55214d	net.(*conn).Read+0x8d				/usr/local/go/src/net/net.go:182
#	0x7c7637	net/http.(*connReader).backgroundRead+0x57	/usr/local/go/src/net/http/server.go:690

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d9d9c 0x4d9d7e 0x53fd85 0x55c132 0x55af65 0x7d2466 0xae23cc 0xae2362 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54					/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44					/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d9d9b	internal/poll.(*pollDesc).waitRead+0x1fb				/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d9d7d	internal/poll.(*FD).Accept+0x1dd					/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x53fd84	net.(*netFD).accept+0x44						/usr/local/go/src/net/fd_unix.go:172
#	0x55c131	net.(*TCPListener).accept+0x31						/usr/local/go/src/net/tcpsock_posix.go:139
#	0x55af64	net.(*TCPListener).Accept+0x64						/usr/local/go/src/net/tcpsock.go:261
#	0x7d2465	net/http.(*Server).Serve+0x265						/usr/local/go/src/net/http/server.go:2937
#	0xae23cb	net/http.Serve+0x8b							/usr/local/go/src/net/http/server.go:2498
#	0xae2361	github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop.func1+0x21	/home/circleci/project/daemon/pprof.go:74

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d9d9c 0x4d9d7e 0x53fd85 0x562572 0x560805 0x7d2466 0xae19c5 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54				/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44				/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d9d9b	internal/poll.(*pollDesc).waitRead+0x1fb			/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d9d7d	internal/poll.(*FD).Accept+0x1dd				/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x53fd84	net.(*netFD).accept+0x44					/usr/local/go/src/net/fd_unix.go:172
#	0x562571	net.(*UnixListener).accept+0x31					/usr/local/go/src/net/unixsock_posix.go:162
#	0x560804	net.(*UnixListener).Accept+0x64					/usr/local/go/src/net/unixsock.go:260
#	0x7d2465	net/http.(*Server).Serve+0x265					/usr/local/go/src/net/http/server.go:2937
#	0xae19c4	github.com/zrepl/zrepl/daemon.(*controlJob).Run.func5+0x44	/home/circleci/project/daemon/control.go:168

1 @ 0x4397a5 0x44982f 0x838e07 0x46e681
#	0x838e06	github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1+0x3c6	/home/circleci/project/daemon/logging/trace/trace_chrometrace.go:146

1 @ 0x4397a5 0x44982f 0x90971b 0x909f53 0x92999b 0x46e681
#	0x90971a	google.golang.org/grpc/internal/transport.(*controlBuffer).get+0x11a	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317
#	0x909f52	google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x1d2	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435
#	0x92999a	google.golang.org/grpc/internal/transport.newHTTP2Client.func3+0x7a	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:328

1 @ 0x4397a5 0x44982f 0x919885 0x46e681
#	0x919884	google.golang.org/grpc/internal/transport.(*http2Client).keepalive+0x2e4	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1299

1 @ 0x4397a5 0x44982f 0x95e319 0x46e681
#	0x95e318	google.golang.org/grpc.(*ccBalancerWrapper).watcher+0x118	/go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:122

1 @ 0x4397a5 0x44982f 0x9619e5 0xa7ce8d 0x46e681
#	0x9619e4	google.golang.org/grpc.(*ClientConn).WaitForStateChange+0x104	/go/pkg/mod/google.golang.org/grpc@v1.17.0/clientconn.go:417
#	0xa7ce8c	github.com/zrepl/zrepl/rpc.NewClient.func1+0x18c		/home/circleci/project/rpc/rpc_client.go:67

1 @ 0x4397a5 0x44982f 0xa49ee9 0xa46f8d 0xa90d3f 0xa90cfe 0x46e681
#	0xa49ee8	github.com/zrepl/zrepl/daemon/snapper.wait+0x1c8			/home/circleci/project/daemon/snapper/snapper.go:376
#	0xa46f8c	github.com/zrepl/zrepl/daemon/snapper.(*Snapper).Run+0x24c		/home/circleci/project/daemon/snapper/snapper.go:158
#	0xa90d3e	github.com/zrepl/zrepl/daemon/snapper.(*PeriodicOrManual).Run+0x5e	/home/circleci/project/daemon/snapper/snapper_all.go:23
#	0xa90cfd	github.com/zrepl/zrepl/daemon/job.(*modePush).RunPeriodic+0x1d		/home/circleci/project/daemon/job/active.go:132

1 @ 0x4397a5 0x44982f 0xa57c3b 0x46e681
#	0xa57c3a	github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2+0x15a	/home/circleci/project/replication/driver/replication_stepqueue.go:92

1 @ 0x4397a5 0x44982f 0xa6864e 0x46e681
#	0xa6864d	github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats+0x1ad	/home/circleci/project/rpc/dataconn/heartbeatconn/heartbeatconn.go:84

1 @ 0x4397a5 0x44982f 0xa6fdfd 0xa7ae11 0xa5e0c3 0xa5a85f 0xa57554 0x82683d 0xa521ce 0xa567bc 0x46e681
#	0xa6fdfc	github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv+0x35c			/home/circleci/project/rpc/dataconn/dataconn_client.go:183
#	0xa7ae10	github.com/zrepl/zrepl/rpc.(*Client).Receive+0xd0				/home/circleci/project/rpc/rpc_client.go:108
#	0xa5e0c2	github.com/zrepl/zrepl/replication/logic.(*Step).doReplication+0x462		/home/circleci/project/replication/logic/replication_logic.go:620
#	0xa5a85e	github.com/zrepl/zrepl/replication/logic.(*Step).Step+0x3e			/home/circleci/project/replication/logic/replication_logic.go:187
#	0xa57553	github.com/zrepl/zrepl/replication/driver.(*fs).do.func5+0x1f3			/home/circleci/project/replication/driver/replication_driver.go:641
#	0x82683c	github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile+0x5c			/home/circleci/project/util/chainlock/chainlock.go:41
#	0xa521cd	github.com/zrepl/zrepl/replication/driver.(*fs).do+0x94d			/home/circleci/project/replication/driver/replication_driver.go:634
#	0xa567bb	github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1+0x13b	/home/circleci/project/replication/driver/replication_driver.go:433

1 @ 0x4397a5 0x44982f 0xa98809 0x46e681
#	0xa98808	github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1+0x108	/home/circleci/project/daemon/job/active.go:438

1 @ 0x4397a5 0x44982f 0xadd465 0xae22c2 0x46e681
#	0xadd464	github.com/zrepl/zrepl/daemon.(*controlJob).Run+0x744	/home/circleci/project/daemon/control.go:172
#	0xae22c1	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141	/home/circleci/project/daemon/daemon.go:248

1 @ 0x4397a5 0x44982f 0xadf032 0xae1429 0x874fa2 0x5ce502 0x5cf11e 0x8754cd 0x8754be 0xb23bc5 0x4393a9 0x46e681
#	0xadf031	github.com/zrepl/zrepl/daemon.Run+0xb11			/home/circleci/project/daemon/daemon.go:111
#	0xae1428	github.com/zrepl/zrepl/daemon.glob..func1+0x48		/home/circleci/project/daemon/main.go:16
#	0x874fa1	github.com/zrepl/zrepl/cli.(*Subcommand).run+0xe1	/home/circleci/project/cli/cli.go:105
#	0x5ce501	github.com/spf13/cobra.(*Command).execute+0x2c1		/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760
#	0x5cf11d	github.com/spf13/cobra.(*Command).ExecuteC+0x2fd	/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846
#	0x8754cc	github.com/spf13/cobra.(*Command).Execute+0x2c		/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
#	0x8754bd	github.com/zrepl/zrepl/cli.Run+0x1d			/home/circleci/project/cli/cli.go:152
#	0xb23bc4	main.main+0x24						/home/circleci/project/main.go:24
#	0x4393a8	runtime.main+0x208					/usr/local/go/src/runtime/proc.go:204

1 @ 0x4397a5 0x44982f 0xae0788 0x46e681
#	0xae0787	github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop+0xe7	/home/circleci/project/daemon/pprof.go:45

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xa5684a 0x82683d 0xa513f2 0xa4f7ba 0xa55108 0x82683d 0xa5564f 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44							/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64							/usr/local/go/src/sync/waitgroup.go:130
#	0xa56849	github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func2+0x29	/home/circleci/project/replication/driver/replication_driver.go:437
#	0x82683c	github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile+0x5c			/home/circleci/project/util/chainlock/chainlock.go:41
#	0xa513f1	github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems+0x311	/home/circleci/project/replication/driver/replication_driver.go:436
#	0xa4f7b9	github.com/zrepl/zrepl/replication/driver.(*attempt).do+0x79			/home/circleci/project/replication/driver/replication_driver.go:301
#	0xa55107	github.com/zrepl/zrepl/replication/driver.Do.func1.1+0x47			/home/circleci/project/replication/driver/replication_driver.go:219
#	0x82683c	github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile+0x5c			/home/circleci/project/util/chainlock/chainlock.go:41
#	0xa5564e	github.com/zrepl/zrepl/replication/driver.Do.func1+0x3ce			/home/circleci/project/replication/driver/replication_driver.go:218

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xa6b4f6 0xa6f205 0xa71f3e 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44							/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64							/usr/local/go/src/sync/waitgroup.go:130
#	0xa6b4f5	github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage+0x3b5	/home/circleci/project/rpc/dataconn/stream/stream_conn.go:109
#	0xa6f204	github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv+0x204			/home/circleci/project/rpc/dataconn/dataconn_client.go:87
#	0xa71f3d	github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1+0x7d		/home/circleci/project/rpc/dataconn/dataconn_client.go:162

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xae206d 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44				/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64				/usr/local/go/src/sync/waitgroup.go:130
#	0xae206c	github.com/zrepl/zrepl/daemon.(*jobs).wait.func1+0x2c	/home/circleci/project/daemon/daemon.go:144

1 @ 0x4397a5 0x46a998 0x46a96e 0x475edd 0xa57de8 0x46e681
#	0x46a96d	sync.runtime_notifyListWait+0xcd					/usr/local/go/src/runtime/sema.go:513
#	0x475edc	sync.(*Cond).Wait+0x9c							/usr/local/go/src/sync/cond.go:56
#	0xa57de7	github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func3+0x67	/home/circleci/project/replication/driver/replication_stepqueue.go:121

1 @ 0x4688dd 0xabc582 0xabc345 0xab8f32 0xac6925 0xac8205 0x7ceaa4 0x7d09cd 0x7d20a3 0x7cd8ad 0x46e681
#	0x4688dc	runtime/pprof.runtime_goroutineProfileWithLabels+0x5c	/usr/local/go/src/runtime/mprof.go:716
#	0xabc581	runtime/pprof.writeRuntimeProfile+0xe1			/usr/local/go/src/runtime/pprof/pprof.go:724
#	0xabc344	runtime/pprof.writeGoroutine+0xa4			/usr/local/go/src/runtime/pprof/pprof.go:684
#	0xab8f31	runtime/pprof.(*Profile).WriteTo+0x3f1			/usr/local/go/src/runtime/pprof/pprof.go:331
#	0xac6924	net/http/pprof.handler.ServeHTTP+0x384			/usr/local/go/src/net/http/pprof/pprof.go:256
#	0xac8204	net/http/pprof.Index+0x944				/usr/local/go/src/net/http/pprof/pprof.go:367
#	0x7ceaa3	net/http.HandlerFunc.ServeHTTP+0x43			/usr/local/go/src/net/http/server.go:2042
#	0x7d09cc	net/http.(*ServeMux).ServeHTTP+0x1ac			/usr/local/go/src/net/http/server.go:2417
#	0x7d20a2	net/http.serverHandler.ServeHTTP+0xa2			/usr/local/go/src/net/http/server.go:2843
#	0x7cd8ac	net/http.(*conn).serve+0x8ac				/usr/local/go/src/net/http/server.go:1925

@problame problame added the bug label Jan 8, 2021
@problame problame added this to the 0.3.2 milestone Jan 8, 2021
@mdtancsa
Copy link

mdtancsa commented Jan 8, 2021 via email

@problame problame changed the title freebsd 12.2: irreproducible lockup on sink side freebsd 12.2: irreproducible lockup on push side Jan 10, 2021
@problame
Copy link
Member Author

Happened again:


  • push side *
goroutine profile: total 27
1 @ 0x40c2d4 0x46aefd 0xa9ae25 0x46e681
#	0x46aefc	os/signal.signal_recv+0x9c	/usr/local/go/src/runtime/sigqueue.go:147
#	0xa9ae24	os/signal.loop+0x24		/usr/local/go/src/os/signal/signal_unix.go:23

1 @ 0x4397a5 0x4065af 0x4061eb 0xa56178 0xa939ff 0xa93366 0xae22c2 0x46e681
#	0xa56177	github.com/zrepl/zrepl/replication/driver.Do.func2+0x77		/home/circleci/project/replication/driver/replication_driver.go:280
#	0xa939fe	github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do+0x37e	/home/circleci/project/daemon/job/active.go:467
#	0xa93365	github.com/zrepl/zrepl/daemon/job.(*ActiveSide).Run+0x425	/home/circleci/project/daemon/job/active.go:424
#	0xae22c1	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141		/home/circleci/project/daemon/daemon.go:248

1 @ 0x4397a5 0x4065af 0x4061eb 0xa57888 0x46e681
#	0xa57887	github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1+0x47	/home/circleci/project/replication/driver/replication_stepqueue.go:85

1 @ 0x4397a5 0x4065af 0x4061eb 0xa7ccd8 0x46e681
#	0xa7ccd7	github.com/zrepl/zrepl/rpc.NewClient.func1.1+0x37	/home/circleci/project/rpc/rpc_client.go:60

1 @ 0x4397a5 0x4065af 0x4061eb 0xae1d14 0x46e681
#	0xae1d13	github.com/zrepl/zrepl/daemon.Run.func1+0x33	/home/circleci/project/daemon/daemon.go:37

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d81e5 0x4d81c3 0x53e80f 0x55214e 0x70b2a2 0x4f6f51 0x70b4f3 0x708315 0x70e5df 0x70e5ea 0x56a922 0x4d3627 0x8fb1c9 0x8fb17a 0x8fba45 0x918f52 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54					/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44					/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d81e4	internal/poll.(*pollDesc).waitRead+0x1a4				/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d81c2	internal/poll.(*FD).Read+0x182						/usr/local/go/src/internal/poll/fd_unix.go:159
#	0x53e80e	net.(*netFD).Read+0x4e							/usr/local/go/src/net/fd_posix.go:55
#	0x55214d	net.(*conn).Read+0x8d							/usr/local/go/src/net/net.go:182
#	0x70b2a1	crypto/tls.(*atLeastReader).Read+0x61					/usr/local/go/src/crypto/tls/conn.go:779
#	0x4f6f50	bytes.(*Buffer).ReadFrom+0xb0						/usr/local/go/src/bytes/buffer.go:204
#	0x70b4f2	crypto/tls.(*Conn).readFromUntil+0xf2					/usr/local/go/src/crypto/tls/conn.go:801
#	0x708314	crypto/tls.(*Conn).readRecordOrCCS+0x114				/usr/local/go/src/crypto/tls/conn.go:608
#	0x70e5de	crypto/tls.(*Conn).readRecord+0x15e					/usr/local/go/src/crypto/tls/conn.go:576
#	0x70e5e9	crypto/tls.(*Conn).Read+0x169						/usr/local/go/src/crypto/tls/conn.go:1252
#	0x56a921	bufio.(*Reader).Read+0x221						/usr/local/go/src/bufio/bufio.go:227
#	0x4d3626	io.ReadAtLeast+0x86							/usr/local/go/src/io/io.go:314
#	0x8fb1c8	io.ReadFull+0x88							/usr/local/go/src/io/io.go:333
#	0x8fb179	golang.org/x/net/http2.readFrameHeader+0x39				/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:237
#	0x8fba44	golang.org/x/net/http2.(*Framer).ReadFrame+0xa4				/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:492
#	0x918f51	google.golang.org/grpc/internal/transport.(*http2Client).reader+0x171	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1218

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d9d9c 0x4d9d7e 0x53fd85 0x55c132 0x55af65 0x7d2466 0xae23cc 0xae2362 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54					/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44					/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d9d9b	internal/poll.(*pollDesc).waitRead+0x1fb				/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d9d7d	internal/poll.(*FD).Accept+0x1dd					/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x53fd84	net.(*netFD).accept+0x44						/usr/local/go/src/net/fd_unix.go:172
#	0x55c131	net.(*TCPListener).accept+0x31						/usr/local/go/src/net/tcpsock_posix.go:139
#	0x55af64	net.(*TCPListener).Accept+0x64						/usr/local/go/src/net/tcpsock.go:261
#	0x7d2465	net/http.(*Server).Serve+0x265						/usr/local/go/src/net/http/server.go:2937
#	0xae23cb	net/http.Serve+0x8b							/usr/local/go/src/net/http/server.go:2498
#	0xae2361	github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop.func1+0x21	/home/circleci/project/daemon/pprof.go:74

1 @ 0x4397a5 0x43257b 0x468cf5 0x4d71a5 0x4d9d9c 0x4d9d7e 0x53fd85 0x562572 0x560805 0x7d2466 0xae19c5 0x46e681
#	0x468cf4	internal/poll.runtime_pollWait+0x54				/usr/local/go/src/runtime/netpoll.go:220
#	0x4d71a4	internal/poll.(*pollDesc).wait+0x44				/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4d9d9b	internal/poll.(*pollDesc).waitRead+0x1fb			/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4d9d7d	internal/poll.(*FD).Accept+0x1dd				/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x53fd84	net.(*netFD).accept+0x44					/usr/local/go/src/net/fd_unix.go:172
#	0x562571	net.(*UnixListener).accept+0x31					/usr/local/go/src/net/unixsock_posix.go:162
#	0x560804	net.(*UnixListener).Accept+0x64					/usr/local/go/src/net/unixsock.go:260
#	0x7d2465	net/http.(*Server).Serve+0x265					/usr/local/go/src/net/http/server.go:2937
#	0xae19c4	github.com/zrepl/zrepl/daemon.(*controlJob).Run.func5+0x44	/home/circleci/project/daemon/control.go:168

1 @ 0x4397a5 0x44982f 0x838e07 0x46e681
#	0x838e06	github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1+0x3c6	/home/circleci/project/daemon/logging/trace/trace_chrometrace.go:146

1 @ 0x4397a5 0x44982f 0x90971b 0x909f53 0x92999b 0x46e681
#	0x90971a	google.golang.org/grpc/internal/transport.(*controlBuffer).get+0x11a	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317
#	0x909f52	google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x1d2	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435
#	0x92999a	google.golang.org/grpc/internal/transport.newHTTP2Client.func3+0x7a	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:328

1 @ 0x4397a5 0x44982f 0x919885 0x46e681
#	0x919884	google.golang.org/grpc/internal/transport.(*http2Client).keepalive+0x2e4	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1299

1 @ 0x4397a5 0x44982f 0x95e319 0x46e681
#	0x95e318	google.golang.org/grpc.(*ccBalancerWrapper).watcher+0x118	/go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:122

1 @ 0x4397a5 0x44982f 0x9619e5 0xa7ce8d 0x46e681
#	0x9619e4	google.golang.org/grpc.(*ClientConn).WaitForStateChange+0x104	/go/pkg/mod/google.golang.org/grpc@v1.17.0/clientconn.go:417
#	0xa7ce8c	github.com/zrepl/zrepl/rpc.NewClient.func1+0x18c		/home/circleci/project/rpc/rpc_client.go:67

1 @ 0x4397a5 0x44982f 0xa49ee9 0xa46f8d 0xa90d3f 0xa90cfe 0x46e681
#	0xa49ee8	github.com/zrepl/zrepl/daemon/snapper.wait+0x1c8			/home/circleci/project/daemon/snapper/snapper.go:376
#	0xa46f8c	github.com/zrepl/zrepl/daemon/snapper.(*Snapper).Run+0x24c		/home/circleci/project/daemon/snapper/snapper.go:158
#	0xa90d3e	github.com/zrepl/zrepl/daemon/snapper.(*PeriodicOrManual).Run+0x5e	/home/circleci/project/daemon/snapper/snapper_all.go:23
#	0xa90cfd	github.com/zrepl/zrepl/daemon/job.(*modePush).RunPeriodic+0x1d		/home/circleci/project/daemon/job/active.go:132

1 @ 0x4397a5 0x44982f 0xa57c3b 0x46e681
#	0xa57c3a	github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2+0x15a	/home/circleci/project/replication/driver/replication_stepqueue.go:92

1 @ 0x4397a5 0x44982f 0xa6864e 0x46e681
#	0xa6864d	github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats+0x1ad	/home/circleci/project/rpc/dataconn/heartbeatconn/heartbeatconn.go:84

1 @ 0x4397a5 0x44982f 0xa6fdfd 0xa7ae11 0xa5e0c3 0xa5a85f 0xa57554 0x82683d 0xa521ce 0xa567bc 0x46e681
#	0xa6fdfc	github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv+0x35c			/home/circleci/project/rpc/dataconn/dataconn_client.go:183
#	0xa7ae10	github.com/zrepl/zrepl/rpc.(*Client).Receive+0xd0				/home/circleci/project/rpc/rpc_client.go:108
#	0xa5e0c2	github.com/zrepl/zrepl/replication/logic.(*Step).doReplication+0x462		/home/circleci/project/replication/logic/replication_logic.go:620
#	0xa5a85e	github.com/zrepl/zrepl/replication/logic.(*Step).Step+0x3e			/home/circleci/project/replication/logic/replication_logic.go:187
#	0xa57553	github.com/zrepl/zrepl/replication/driver.(*fs).do.func5+0x1f3			/home/circleci/project/replication/driver/replication_driver.go:641
#	0x82683c	github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile+0x5c			/home/circleci/project/util/chainlock/chainlock.go:41
#	0xa521cd	github.com/zrepl/zrepl/replication/driver.(*fs).do+0x94d			/home/circleci/project/replication/driver/replication_driver.go:634
#	0xa567bb	github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1+0x13b	/home/circleci/project/replication/driver/replication_driver.go:433

1 @ 0x4397a5 0x44982f 0xa98809 0x46e681
#	0xa98808	github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1+0x108	/home/circleci/project/daemon/job/active.go:438

1 @ 0x4397a5 0x44982f 0xadd465 0xae22c2 0x46e681
#	0xadd464	github.com/zrepl/zrepl/daemon.(*controlJob).Run+0x744	/home/circleci/project/daemon/control.go:172
#	0xae22c1	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141	/home/circleci/project/daemon/daemon.go:248

1 @ 0x4397a5 0x44982f 0xadf032 0xae1429 0x874fa2 0x5ce502 0x5cf11e 0x8754cd 0x8754be 0xb23bc5 0x4393a9 0x46e681
#	0xadf031	github.com/zrepl/zrepl/daemon.Run+0xb11			/home/circleci/project/daemon/daemon.go:111
#	0xae1428	github.com/zrepl/zrepl/daemon.glob..func1+0x48		/home/circleci/project/daemon/main.go:16
#	0x874fa1	github.com/zrepl/zrepl/cli.(*Subcommand).run+0xe1	/home/circleci/project/cli/cli.go:105
#	0x5ce501	github.com/spf13/cobra.(*Command).execute+0x2c1		/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760
#	0x5cf11d	github.com/spf13/cobra.(*Command).ExecuteC+0x2fd	/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846
#	0x8754cc	github.com/spf13/cobra.(*Command).Execute+0x2c		/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
#	0x8754bd	github.com/zrepl/zrepl/cli.Run+0x1d			/home/circleci/project/cli/cli.go:152
#	0xb23bc4	main.main+0x24						/home/circleci/project/main.go:24
#	0x4393a8	runtime.main+0x208					/usr/local/go/src/runtime/proc.go:204

1 @ 0x4397a5 0x44982f 0xae0788 0x46e681
#	0xae0787	github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop+0xe7	/home/circleci/project/daemon/pprof.go:45

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xa5684a 0x82683d 0xa513f2 0xa4f7ba 0xa55108 0x82683d 0xa5564f 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44							/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64							/usr/local/go/src/sync/waitgroup.go:130
#	0xa56849	github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func2+0x29	/home/circleci/project/replication/driver/replication_driver.go:437
#	0x82683c	github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile+0x5c			/home/circleci/project/util/chainlock/chainlock.go:41
#	0xa513f1	github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems+0x311	/home/circleci/project/replication/driver/replication_driver.go:436
#	0xa4f7b9	github.com/zrepl/zrepl/replication/driver.(*attempt).do+0x79			/home/circleci/project/replication/driver/replication_driver.go:301
#	0xa55107	github.com/zrepl/zrepl/replication/driver.Do.func1.1+0x47			/home/circleci/project/replication/driver/replication_driver.go:219
#	0x82683c	github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile+0x5c			/home/circleci/project/util/chainlock/chainlock.go:41
#	0xa5564e	github.com/zrepl/zrepl/replication/driver.Do.func1+0x3ce			/home/circleci/project/replication/driver/replication_driver.go:218

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xa6b4f6 0xa6f069 0xa71f3e 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44							/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64							/usr/local/go/src/sync/waitgroup.go:130
#	0xa6b4f5	github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage+0x3b5	/home/circleci/project/rpc/dataconn/stream/stream_conn.go:109
#	0xa6f068	github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv+0x68				/home/circleci/project/rpc/dataconn/dataconn_client.go:74
#	0xa71f3d	github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1+0x7d		/home/circleci/project/rpc/dataconn/dataconn_client.go:162

1 @ 0x4397a5 0x44a1e5 0x44a1ce 0x46a6e5 0x4796c5 0xae206d 0x46e681
#	0x46a6e4	sync.runtime_Semacquire+0x44				/usr/local/go/src/runtime/sema.go:56
#	0x4796c4	sync.(*WaitGroup).Wait+0x64				/usr/local/go/src/sync/waitgroup.go:130
#	0xae206c	github.com/zrepl/zrepl/daemon.(*jobs).wait.func1+0x2c	/home/circleci/project/daemon/daemon.go:144

1 @ 0x4397a5 0x46a998 0x46a96e 0x475edd 0xa57de8 0x46e681
#	0x46a96d	sync.runtime_notifyListWait+0xcd					/usr/local/go/src/runtime/sema.go:513
#	0x475edc	sync.(*Cond).Wait+0x9c							/usr/local/go/src/sync/cond.go:56
#	0xa57de7	github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func3+0x67	/home/circleci/project/replication/driver/replication_stepqueue.go:121

1 @ 0x4688dd 0xabc582 0xabc345 0xab8f32 0xac6925 0xac8205 0x7ceaa4 0x7d09cd 0x7d20a3 0x7cd8ad 0x46e681
#	0x4688dc	runtime/pprof.runtime_goroutineProfileWithLabels+0x5c	/usr/local/go/src/runtime/mprof.go:716
#	0xabc581	runtime/pprof.writeRuntimeProfile+0xe1			/usr/local/go/src/runtime/pprof/pprof.go:724
#	0xabc344	runtime/pprof.writeGoroutine+0xa4			/usr/local/go/src/runtime/pprof/pprof.go:684
#	0xab8f31	runtime/pprof.(*Profile).WriteTo+0x3f1			/usr/local/go/src/runtime/pprof/pprof.go:331
#	0xac6924	net/http/pprof.handler.ServeHTTP+0x384			/usr/local/go/src/net/http/pprof/pprof.go:256
#	0xac8204	net/http/pprof.Index+0x944				/usr/local/go/src/net/http/pprof/pprof.go:367
#	0x7ceaa3	net/http.HandlerFunc.ServeHTTP+0x43			/usr/local/go/src/net/http/server.go:2042
#	0x7d09cc	net/http.(*ServeMux).ServeHTTP+0x1ac			/usr/local/go/src/net/http/server.go:2417
#	0x7d20a2	net/http.serverHandler.ServeHTTP+0xa2			/usr/local/go/src/net/http/server.go:2843
#	0x7cd8ac	net/http.(*conn).serve+0x8ac				/usr/local/go/src/net/http/server.go:1925

1 @ 0x483385 0x4824ba 0x4dc74c 0x4dc71e 0x4daf67 0x4d817e 0x53e80f 0x55214e 0x7c7638 0x46e681
#	0x483384	syscall.Syscall+0x4				/usr/local/go/src/syscall/asm_unix_amd64.s:19
#	0x4824b9	syscall.read+0x59				/usr/local/go/src/syscall/zsyscall_freebsd_amd64.go:949
#	0x4dc74b	syscall.Read+0x4b				/usr/local/go/src/syscall/syscall_unix.go:187
#	0x4dc71d	internal/poll.(*FD).Read.func1+0x1d		/usr/local/go/src/internal/poll/fd_unix.go:155
#	0x4daf66	internal/poll.ignoringEINTR+0x26		/usr/local/go/src/internal/poll/fd_unix.go:567
#	0x4d817d	internal/poll.(*FD).Read+0x13d			/usr/local/go/src/internal/poll/fd_unix.go:155
#	0x53e80e	net.(*netFD).Read+0x4e				/usr/local/go/src/net/fd_posix.go:55
#	0x55214d	net.(*conn).Read+0x8d				/usr/local/go/src/net/net.go:182
#	0x7c7637	net/http.(*connReader).backgroundRead+0x57	/usr/local/go/src/net/http/server.go:690

% ps aux
USER    PID %CPU %MEM    VSZ   RSS TT  STAT STARTED       TIME COMMAND
zrepl 64995  9.7  0.4 721504 68096  -  I    29Dec20 1875:33.07 /usr/home/zrepl/zrepl_ansible/zrepl --config zrepl.yml daemon
zrepl  9965  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 15065  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 21919  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 27965  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 30625  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 30726  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 32145  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 35634  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 37348  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 38850  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 41212  0.0  0.0      0     0  -  Z    30Dec20    0:00.01 <defunct>
zrepl 58716  0.0  0.0  10844  1924  -  Is    2Jan21    0:00.18 daemon: /usr/home/zrepl/zrepl_ansible/zrepl[60525] (daemon)
zrepl 60525  0.0  0.4 723552 70136  -  I     2Jan21   69:32.21 /usr/home/zrepl/zrepl_ansible/zrepl --config zrepl.yml daemon
zrepl 87272  0.0  0.3 722208 51932  -  I    30Dec20   24:50.65 /usr/home/zrepl/zrepl_ansible/zrepl --config zrepl.yml daemon
zrepl 17351  0.0  0.0  14000  4264  0  S    11:02      0:00.02 -zsh (zsh)
zrepl 84557  0.0  0.0  11752  2672  0  R+   11:07      0:00.00 ps aux
% procstat 15065          
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM        
15065 64995 64855 64855     0   1 root     -         FreeBSD ELF64 zfs
% sockstat      
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS      
zrepl    zrepl      60525 3  dgram  -> ??
zrepl    zrepl      60525 7  stream /home/zrepl/zrepl_ansible/run/control
zrepl    zrepl      60525 9  tcp6   sender::1:14084 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      60525 10 tcp4   10.123.237.40:22345   *:*
zrepl    zrepl      60525 13 tcp6   sender::1:59165 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      87272 3  dgram  -> ??
zrepl    zrepl      87272 8  tcp6   sender::1:64078 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      87272 10 tcp6   sender::1:41868 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 3  dgram  -> ??
zrepl    zrepl      64995 7  stream /home/zrepl/zrepl_ansible/run/control
zrepl    zrepl      64995 8  tcp4   10.123.237.40:9091    *:*
zrepl    zrepl      64995 9  tcp4   10.123.237.40:9092    *:*
zrepl    zrepl      64995 10 tcp6   sender::1:44736 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 11 tcp4   10.123.237.40:9091    10.123.237.70:45446
zrepl    zrepl      64995 12 tcp6   sender::1:30584 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 13 tcp6   sender::1:30586 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 15 tcp6   sender::1:40123 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 16 tcp6   sender::1:34749 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 17 tcp6   sender::1:49949 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 19 tcp6   sender::1:65393 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 20 tcp6   sender::1:39804 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 23 tcp6   sender::1:41492 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 27 tcp6   sender::1:58824 r:e:c:e:i:v:e:r:2342
zrepl    zrepl      64995 28 tcp6   sender::1:10242 r:e:c:e:i:v:e:r:2342

  • receiver * (Linux)
% ss -t
State   Recv-Q   Send-Q                               Local Address:Port                   Peer Address:Port           
...
ESTAB   0        0          [receiver]:2342         [sender]:64078          
ESTAB   0        0          [receiver]:2342         [sender]:14084   

=> only two connections on receiver !?

ps aux | grep zfs

And no zfs processes on the receiver.

Goroutine dump:

goroutine profile: total 30
2 @ 0x43c065 0x43487b 0x46bfb5 0x4dc3a5 0x4dd3e5 0x4dd3c3 0x53ee0f 0x553ace 0x710ba2 0x4fd551 0x710df3 0x70dc15 0x713edf 0x713eea 0x570222 0x4d8247 0x901109 0x9010ba 0x901985 0x9238a5 0x976f1b 0x988d45 0x471b01
#	0x46bfb4	internal/poll.runtime_pollWait+0x54						/usr/local/go/src/runtime/netpoll.go:220
#	0x4dc3a4	internal/poll.(*pollDesc).wait+0x44						/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4dd3e4	internal/poll.(*pollDesc).waitRead+0x1a4					/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4dd3c2	internal/poll.(*FD).Read+0x182							/usr/local/go/src/internal/poll/fd_unix.go:159
#	0x53ee0e	net.(*netFD).Read+0x4e								/usr/local/go/src/net/fd_posix.go:55
#	0x553acd	net.(*conn).Read+0x8d								/usr/local/go/src/net/net.go:182
#	0x710ba1	crypto/tls.(*atLeastReader).Read+0x61						/usr/local/go/src/crypto/tls/conn.go:779
#	0x4fd550	bytes.(*Buffer).ReadFrom+0xb0							/usr/local/go/src/bytes/buffer.go:204
#	0x710df2	crypto/tls.(*Conn).readFromUntil+0xf2						/usr/local/go/src/crypto/tls/conn.go:801
#	0x70dc14	crypto/tls.(*Conn).readRecordOrCCS+0x114					/usr/local/go/src/crypto/tls/conn.go:608
#	0x713ede	crypto/tls.(*Conn).readRecord+0x15e						/usr/local/go/src/crypto/tls/conn.go:576
#	0x713ee9	crypto/tls.(*Conn).Read+0x169							/usr/local/go/src/crypto/tls/conn.go:1252
#	0x570221	bufio.(*Reader).Read+0x221							/usr/local/go/src/bufio/bufio.go:227
#	0x4d8246	io.ReadAtLeast+0x86								/usr/local/go/src/io/io.go:314
#	0x901108	io.ReadFull+0x88								/usr/local/go/src/io/io.go:333
#	0x9010b9	golang.org/x/net/http2.readFrameHeader+0x39					/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:237
#	0x901984	golang.org/x/net/http2.(*Framer).ReadFrame+0xa4					/go/pkg/mod/golang.org/x/net@v0.0.0-20190613194153-d28f0bde5980/http2/frame.go:492
#	0x9238a4	google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams+0x84	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:429
#	0x976f1a	google.golang.org/grpc.(*Server).serveStreams+0xda				/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:681
#	0x988d44	google.golang.org/grpc.(*Server).handleRawConn.func1+0x44			/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:643

2 @ 0x43c065 0x43487b 0x46bfb5 0x4dc3a5 0x4dd3e5 0x4dd3c3 0x53ee0f 0x553ace 0x7cd3ad 0x56fba5 0x5708fd 0x570b34 0x75270c 0x7c754a 0x7c7579 0x7ce83a 0x7d2f05 0x471b01
#	0x46bfb4	internal/poll.runtime_pollWait+0x54		/usr/local/go/src/runtime/netpoll.go:220
#	0x4dc3a4	internal/poll.(*pollDesc).wait+0x44		/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4dd3e4	internal/poll.(*pollDesc).waitRead+0x1a4	/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4dd3c2	internal/poll.(*FD).Read+0x182			/usr/local/go/src/internal/poll/fd_unix.go:159
#	0x53ee0e	net.(*netFD).Read+0x4e				/usr/local/go/src/net/fd_posix.go:55
#	0x553acd	net.(*conn).Read+0x8d				/usr/local/go/src/net/net.go:182
#	0x7cd3ac	net/http.(*connReader).Read+0x1ac		/usr/local/go/src/net/http/server.go:798
#	0x56fba4	bufio.(*Reader).fill+0x104			/usr/local/go/src/bufio/bufio.go:101
#	0x5708fc	bufio.(*Reader).ReadSlice+0x3c			/usr/local/go/src/bufio/bufio.go:360
#	0x570b33	bufio.(*Reader).ReadLine+0x33			/usr/local/go/src/bufio/bufio.go:389
#	0x75270b	net/textproto.(*Reader).readLineSlice+0x6b	/usr/local/go/src/net/textproto/reader.go:58
#	0x7c7549	net/textproto.(*Reader).ReadLine+0xa9		/usr/local/go/src/net/textproto/reader.go:39
#	0x7c7578	net/http.readRequest+0xd8			/usr/local/go/src/net/http/request.go:1012
#	0x7ce839	net/http.(*conn).readRequest+0x199		/usr/local/go/src/net/http/server.go:984
#	0x7d2f04	net/http.(*conn).serve+0x704			/usr/local/go/src/net/http/server.go:1851

2 @ 0x43c065 0x44c16f 0x90f97b 0x9101b3 0x931357 0x471b01
#	0x90f97a	google.golang.org/grpc/internal/transport.(*controlBuffer).get+0x11a	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317
#	0x9101b2	google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x1d2	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435
#	0x931356	google.golang.org/grpc/internal/transport.newHTTP2Server.func2+0xd6	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:276

2 @ 0x43c065 0x44c16f 0x927a7c 0x471b01
#	0x927a7b	google.golang.org/grpc/internal/transport.(*http2Server).keepalive+0x21b	/go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_server.go:921

1 @ 0x40e1b4 0x46e1bd 0xaa1b65 0x471b01
#	0x46e1bc	os/signal.signal_recv+0x9c	/usr/local/go/src/runtime/sigqueue.go:147
#	0xaa1b64	os/signal.loop+0x24		/usr/local/go/src/os/signal/signal_unix.go:23

1 @ 0x43c065 0x40848f 0x4080cb 0xa79014 0x471b01
#	0xa79013	github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve.func1+0x73	/home/circleci/project/rpc/dataconn/dataconn_server.go:80

1 @ 0x43c065 0x40848f 0x4080cb 0xa80265 0x471b01
#	0xa80264	github.com/zrepl/zrepl/rpc/transportmux.Demux.func1+0x64	/home/circleci/project/rpc/transportmux/transportmux.go:116

1 @ 0x43c065 0x40848f 0x4080cb 0xa8448c 0x471b01
#	0xa8448b	github.com/zrepl/zrepl/rpc.NewServer.func1.2+0x4b	/home/circleci/project/rpc/rpc_server.go:63

1 @ 0x43c065 0x40848f 0x4080cb 0xae8b94 0x471b01
#	0xae8b93	github.com/zrepl/zrepl/daemon.Run.func1+0x33	/home/circleci/project/daemon/daemon.go:37

1 @ 0x43c065 0x40848f 0x4080cb 0xae93a8 0x471b01
#	0xae93a7	github.com/zrepl/zrepl/daemon.(*prometheusJob).Run.func1+0x47	/home/circleci/project/daemon/prometheus.go:75

1 @ 0x43c065 0x40848f 0x40810b 0xa77845 0xa84a92 0x471b01
#	0xa77844	github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve+0x264	/home/circleci/project/rpc/dataconn/dataconn_server.go:104
#	0xa84a91	github.com/zrepl/zrepl/rpc.NewServer.func4+0x51			/home/circleci/project/rpc/rpc_server.go:86

1 @ 0x43c065 0x40848f 0x40810b 0xa7ae45 0x975af6 0xa7bc59 0xa847cf 0x471b01
#	0xa7ae44	github.com/zrepl/zrepl/rpc/netadaptor.Listener.Accept+0x184			/home/circleci/project/rpc/netadaptor/authlistener_netlistener_adaptor.go:79
#	0x975af5	google.golang.org/grpc.(*Server).Serve+0x215					/go/pkg/mod/google.golang.org/grpc@v1.17.0/server.go:556
#	0xa7bc58	github.com/zrepl/zrepl/rpc/grpcclientidentity/grpchelper.NewServer.func1+0x78	/home/circleci/project/rpc/grpcclientidentity/grpchelper/authlistener_grpc_adaptor_wrapper.go:67
#	0xa847ce	github.com/zrepl/zrepl/rpc.NewServer.func1+0x1ee				/home/circleci/project/rpc/rpc_server.go:72

1 @ 0x43c065 0x43487b 0x46bfb5 0x4dc3a5 0x4def9c 0x4def7e 0x540385 0x55ddf2 0x55ca25 0x8d8e06 0xa95792 0xa7e409 0xa8076e 0x471b01
#	0x46bfb4	internal/poll.runtime_pollWait+0x54						/usr/local/go/src/runtime/netpoll.go:220
#	0x4dc3a4	internal/poll.(*pollDesc).wait+0x44						/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4def9b	internal/poll.(*pollDesc).waitRead+0x1fb					/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4def7d	internal/poll.(*FD).Accept+0x1dd						/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x540384	net.(*netFD).accept+0x44							/usr/local/go/src/net/fd_unix.go:172
#	0x55ddf1	net.(*TCPListener).accept+0x31							/usr/local/go/src/net/tcpsock_posix.go:139
#	0x55ca24	net.(*TCPListener).AcceptTCP+0x64						/usr/local/go/src/net/tcpsock.go:248
#	0x8d8e05	github.com/zrepl/zrepl/tlsconf.(*ClientAuthListener).Accept+0x45		/home/circleci/project/tlsconf/tlsconf.go:67
#	0xa95791	github.com/zrepl/zrepl/transport/tls.tlsAuthListener.Accept+0x71		/home/circleci/project/transport/tls/serve_tls.go:69
#	0xa7e408	github.com/zrepl/zrepl/rpc/versionhandshake.HandshakeListener.Accept+0x48	/home/circleci/project/rpc/versionhandshake/versionhandshake_transport_wrappers.go:50
#	0xa8076d	github.com/zrepl/zrepl/rpc/transportmux.Demux.func2+0xcd			/home/circleci/project/rpc/transportmux/transportmux.go:152

1 @ 0x43c065 0x43487b 0x46bfb5 0x4dc3a5 0x4def9c 0x4def7e 0x540385 0x55ddf2 0x55cbc5 0x7d7c66 0xae7f85 0xae7f2b 0xae9142 0x471b01
#	0x46bfb4	internal/poll.runtime_pollWait+0x54				/usr/local/go/src/runtime/netpoll.go:220
#	0x4dc3a4	internal/poll.(*pollDesc).wait+0x44				/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4def9b	internal/poll.(*pollDesc).waitRead+0x1fb			/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4def7d	internal/poll.(*FD).Accept+0x1dd				/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x540384	net.(*netFD).accept+0x44					/usr/local/go/src/net/fd_unix.go:172
#	0x55ddf1	net.(*TCPListener).accept+0x31					/usr/local/go/src/net/tcpsock_posix.go:139
#	0x55cbc4	net.(*TCPListener).Accept+0x64					/usr/local/go/src/net/tcpsock.go:261
#	0x7d7c65	net/http.(*Server).Serve+0x265					/usr/local/go/src/net/http/server.go:2937
#	0xae7f84	net/http.Serve+0x204						/usr/local/go/src/net/http/server.go:2498
#	0xae7f2a	github.com/zrepl/zrepl/daemon.(*prometheusJob).Run+0x1aa	/home/circleci/project/daemon/prometheus.go:82
#	0xae9141	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141		/home/circleci/project/daemon/daemon.go:248

1 @ 0x43c065 0x43487b 0x46bfb5 0x4dc3a5 0x4def9c 0x4def7e 0x540385 0x55ddf2 0x55cbc5 0x7d7c66 0xae924c 0xae91e2 0x471b01
#	0x46bfb4	internal/poll.runtime_pollWait+0x54					/usr/local/go/src/runtime/netpoll.go:220
#	0x4dc3a4	internal/poll.(*pollDesc).wait+0x44					/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4def9b	internal/poll.(*pollDesc).waitRead+0x1fb				/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4def7d	internal/poll.(*FD).Accept+0x1dd					/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x540384	net.(*netFD).accept+0x44						/usr/local/go/src/net/fd_unix.go:172
#	0x55ddf1	net.(*TCPListener).accept+0x31						/usr/local/go/src/net/tcpsock_posix.go:139
#	0x55cbc4	net.(*TCPListener).Accept+0x64						/usr/local/go/src/net/tcpsock.go:261
#	0x7d7c65	net/http.(*Server).Serve+0x265						/usr/local/go/src/net/http/server.go:2937
#	0xae924b	net/http.Serve+0x8b							/usr/local/go/src/net/http/server.go:2498
#	0xae91e1	github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop.func1+0x21	/home/circleci/project/daemon/pprof.go:74

1 @ 0x43c065 0x43487b 0x46bfb5 0x4dc3a5 0x4def9c 0x4def7e 0x540385 0x564232 0x5624c5 0x7d7c66 0xae8845 0x471b01
#	0x46bfb4	internal/poll.runtime_pollWait+0x54				/usr/local/go/src/runtime/netpoll.go:220
#	0x4dc3a4	internal/poll.(*pollDesc).wait+0x44				/usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#	0x4def9b	internal/poll.(*pollDesc).waitRead+0x1fb			/usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#	0x4def7d	internal/poll.(*FD).Accept+0x1dd				/usr/local/go/src/internal/poll/fd_unix.go:394
#	0x540384	net.(*netFD).accept+0x44					/usr/local/go/src/net/fd_unix.go:172
#	0x564231	net.(*UnixListener).accept+0x31					/usr/local/go/src/net/unixsock_posix.go:162
#	0x5624c4	net.(*UnixListener).Accept+0x64					/usr/local/go/src/net/unixsock.go:260
#	0x7d7c65	net/http.(*Server).Serve+0x265					/usr/local/go/src/net/http/server.go:2937
#	0xae8844	github.com/zrepl/zrepl/daemon.(*controlJob).Run.func5+0x44	/home/circleci/project/daemon/control.go:168

1 @ 0x43c065 0x44c16f 0x83e607 0x471b01
#	0x83e606	github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1+0x3c6	/home/circleci/project/daemon/logging/trace/trace_chrometrace.go:146

1 @ 0x43c065 0x44c16f 0xa7eee7 0xa7918c 0x471b01
#	0xa7eee6	github.com/zrepl/zrepl/rpc/transportmux.(*demuxListener).Accept+0x106	/home/circleci/project/rpc/transportmux/transportmux.go:51
#	0xa7918b	github.com/zrepl/zrepl/rpc/dataconn.(*Server).Serve.func2+0xab		/home/circleci/project/rpc/dataconn/dataconn_server.go:92

1 @ 0x43c065 0x44c16f 0xa7eee7 0xa7af9f 0x471b01
#	0xa7eee6	github.com/zrepl/zrepl/rpc/transportmux.(*demuxListener).Accept+0x106	/home/circleci/project/rpc/transportmux/transportmux.go:51
#	0xa7af9e	github.com/zrepl/zrepl/rpc/netadaptor.Listener.handleAccept+0x9e	/home/circleci/project/rpc/netadaptor/authlistener_netlistener_adaptor.go:94

1 @ 0x43c065 0x44c16f 0xa838d9 0xa9dd4a 0xae9142 0x471b01
#	0xa838d8	github.com/zrepl/zrepl/rpc.(*Server).Serve+0x358		/home/circleci/project/rpc/rpc_server.go:117
#	0xa9dd49	github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run+0x449	/home/circleci/project/daemon/job/passive.go:190
#	0xae9141	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141		/home/circleci/project/daemon/daemon.go:248

1 @ 0x43c065 0x44c16f 0xae42e5 0xae9142 0x471b01
#	0xae42e4	github.com/zrepl/zrepl/daemon.(*controlJob).Run+0x744	/home/circleci/project/daemon/control.go:172
#	0xae9141	github.com/zrepl/zrepl/daemon.(*jobs).start.func1+0x141	/home/circleci/project/daemon/daemon.go:248

1 @ 0x43c065 0x44c16f 0xae5eb2 0xae82a9 0x87a7a2 0x5d3e02 0x5d4a1e 0x87accd 0x87acbe 0xb2aa05 0x43bc69 0x471b01
#	0xae5eb1	github.com/zrepl/zrepl/daemon.Run+0xb11			/home/circleci/project/daemon/daemon.go:111
#	0xae82a8	github.com/zrepl/zrepl/daemon.glob..func1+0x48		/home/circleci/project/daemon/main.go:16
#	0x87a7a1	github.com/zrepl/zrepl/cli.(*Subcommand).run+0xe1	/home/circleci/project/cli/cli.go:105
#	0x5d3e01	github.com/spf13/cobra.(*Command).execute+0x2c1		/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760
#	0x5d4a1d	github.com/spf13/cobra.(*Command).ExecuteC+0x2fd	/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846
#	0x87accc	github.com/spf13/cobra.(*Command).Execute+0x2c		/go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
#	0x87acbd	github.com/zrepl/zrepl/cli.Run+0x1d			/home/circleci/project/cli/cli.go:152
#	0xb2aa04	main.main+0x24						/home/circleci/project/main.go:24
#	0x43bc68	runtime.main+0x208					/usr/local/go/src/runtime/proc.go:204

1 @ 0x43c065 0x44c16f 0xae7608 0x471b01
#	0xae7607	github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop+0xe7	/home/circleci/project/daemon/pprof.go:45

1 @ 0x43c065 0x44cb25 0x44cb0e 0x46d9a5 0x47cde5 0xae8eed 0x471b01
#	0x46d9a4	sync.runtime_Semacquire+0x44				/usr/local/go/src/runtime/sema.go:56
#	0x47cde4	sync.(*WaitGroup).Wait+0x64				/usr/local/go/src/sync/waitgroup.go:130
#	0xae8eec	github.com/zrepl/zrepl/daemon.(*jobs).wait.func1+0x2c	/home/circleci/project/daemon/daemon.go:144

1 @ 0x46bb5d 0xac32e2 0xac30a5 0xabfc72 0xacd7a5 0xacf085 0x7d42a4 0x7d61cd 0x7d78a3 0x7d30ad 0x471b01
#	0x46bb5c	runtime/pprof.runtime_goroutineProfileWithLabels+0x5c	/usr/local/go/src/runtime/mprof.go:716
#	0xac32e1	runtime/pprof.writeRuntimeProfile+0xe1			/usr/local/go/src/runtime/pprof/pprof.go:724
#	0xac30a4	runtime/pprof.writeGoroutine+0xa4			/usr/local/go/src/runtime/pprof/pprof.go:684
#	0xabfc71	runtime/pprof.(*Profile).WriteTo+0x3f1			/usr/local/go/src/runtime/pprof/pprof.go:331
#	0xacd7a4	net/http/pprof.handler.ServeHTTP+0x384			/usr/local/go/src/net/http/pprof/pprof.go:256
#	0xacf084	net/http/pprof.Index+0x944				/usr/local/go/src/net/http/pprof/pprof.go:367
#	0x7d42a3	net/http.HandlerFunc.ServeHTTP+0x43			/usr/local/go/src/net/http/server.go:2042
#	0x7d61cc	net/http.(*ServeMux).ServeHTTP+0x1ac			/usr/local/go/src/net/http/server.go:2417
#	0x7d78a2	net/http.serverHandler.ServeHTTP+0xa2			/usr/local/go/src/net/http/server.go:2843
#	0x7d30ac	net/http.(*conn).serve+0x8ac				/usr/local/go/src/net/http/server.go:1925

1 @ 0x7ccde1 0x471b01
#	0x7ccde0	net/http.(*connReader).backgroundRead+0x0	/usr/local/go/src/net/http/server.go:689



Looking into this later today

@problame
Copy link
Member Author

@mdtancsa next time it hangs up, could you run ps aux | grep defunct on the sender before restarting?

@mdtancsa
Copy link

mdtancsa commented Jan 10, 2021 via email

@problame
Copy link
Member Author

Another (maybe interesting) observation: the 'CLOSED' sockets on the sender still have some bytes in their Recv-Q. Not sure whether that's normal.

netstat -n -p tcp -W | grep 2342
tcp6       0      0 sender.59165                    receiver.2342    CLOSED     
tcp6      63      0 sender.14084                    receiver.2342    CLOSED     
tcp6      24      0 sender.41868                    receiver.2342    CLOSED     
tcp6       0      0 sender.64078                    receiver.2342    ESTABLISHED
tcp6       0      0 sender.30586                    receiver.2342    CLOSED     
tcp6     177      0 sender.30584                    receiver.2342    CLOSED     
tcp6     177      0 sender.10242                    receiver.2342    CLOSED     
tcp6     177      0 sender.65393                    receiver.2342    CLOSED     
tcp6     177      0 sender.39804                    receiver.2342    CLOSED     
tcp6     177      0 sender.58824                    receiver.2342    CLOSED     
tcp6     177      0 sender.41492                    receiver.2342    CLOSED     
tcp6     177      0 sender.49949                    receiver.2342    CLOSED     
tcp6     177      0 sender.34749                    receiver.2342    CLOSED     
tcp6     207      0 sender.40123                    receiver.2342    CLOSED     
tcp6      63      0 sender.44736                    receiver.2342    CLOSED     

@problame
Copy link
Member Author

Investigation with tcpdump shows that there was periodic heartbeat connectivity on the wire.


The far more ... interesting aspect that I hadn't noticed initially was that there were two zrepl processes running on the sender. Apparently, something got messed up with my internal deployment.
Killed both now, let's see if the problem returns with just one daemon running.

@problame
Copy link
Member Author

Interesting, another lockup.
Daemon does not respond in zrepl status or pprof on, hence attached via gdb.
Lots of stacks in the stepQueue and/or waiting for the parent FS to complete replication.
Never seen this behavior before.
sockstat shows one TCP connection in CLOSED state.
procstat fd PID_OF_DAEMON says the socket FD is still open though.
No new traffic / connection attempts visible in tcpdump.

Going to do one more restart and see if it locks up again.
If it does, going to try pinning the daemon to one core (HT disabled) and see if it locks up.
Also going to try to provoke lockup by changing cpuset repeatedly while the daemon is running.

@mdtancsa
Copy link

Thats the same behaviour I see. I need to do a kill -9 of the process as zrepl status fails to attach

@problame
Copy link
Member Author

Yes, needed to do the same.

@problame problame changed the title freebsd 12.2: irreproducible lockup on push side freebsd 12.2: lockup on push side Jan 10, 2021
@problame
Copy link
Member Author

Another lockup much like the first one:

  • receiver seems fine (Linux)
  • no zfs processes on sender
  • tcpdump shows periodic back-and-forth of 39byte payload (it's TLS so there is likely some overhead)
  • procstat fd PID_OF_DAEMON + cross-correlation with netstat shows one socket to the peer in CLOSED, one in OPEN state
    • i suppose one is grpc and one is the hung data transfer?
  • daemon responds to ./zrepl --config zrepl.yml pprof listen on 10.123.237.40:22345 but when connecting to the address we get connection refused, sockstat doesn't show the socket opened

Stacktrace

(dlv) grs -t
  Goroutine 1 - User: /home/circleci/project/daemon/daemon.go:111 github.com/zrepl/zrepl/daemon.Run (0xadf032) [select 447062h2m38.236211777s]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000044982f in runtime.selectgo
             at /usr/local/go/src/runtime/select.go:338
         2  0x0000000000adf032 in github.com/zrepl/zrepl/daemon.Run
             at /home/circleci/project/daemon/daemon.go:111
         3  0x0000000000ae1429 in github.com/zrepl/zrepl/daemon.glob..func1
             at /home/circleci/project/daemon/main.go:16
         4  0x0000000000874fa2 in github.com/zrepl/zrepl/cli.(*Subcommand).run
             at /home/circleci/project/cli/cli.go:105
         5  0x0000000000875a92 in github.com/zrepl/zrepl/cli.(*Subcommand).run-fm
             at /home/circleci/project/cli/cli.go:100
         6  0x00000000005ce502 in github.com/spf13/cobra.(*Command).execute
             at /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760
         7  0x00000000005cf11e in github.com/spf13/cobra.(*Command).ExecuteC
             at /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846
         8  0x00000000008754cd in github.com/spf13/cobra.(*Command).Execute
             at /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
         9  0x00000000008754cd in github.com/zrepl/zrepl/cli.Run
             at /home/circleci/project/cli/cli.go:152
        10  0x0000000000b23bc5 in main.main
             at /home/circleci/project/main.go:24
        (truncated)
  Goroutine 2 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [force gc (idle) 447013h17m59.221412004s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x0000000000439645 in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x0000000000439645 in runtime.forcegchelper
            at /usr/local/go/src/runtime/proc.go:255
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 3 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [GC sweep wait]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x00000000004245bb in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x00000000004245bb in runtime.bgsweep
            at /usr/local/go/src/runtime/mgcsweep.go:182
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 4 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [GC scavenge wait]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000042271c in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x000000000042271c in runtime.bgscavenge
            at /usr/local/go/src/runtime/mgcscavenge.go:314
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 7 - User: /usr/local/go/src/runtime/sigqueue.go:147 os/signal.signal_recv (0x46aefd) (thread 100469)
        0  0x000000000046fd1e in runtime.sys_umtx_op
            at /usr/local/go/src/runtime/sys_freebsd_amd64.s:21
        1  0x000000000046ca40 in runtime.systemstack_switch
            at /usr/local/go/src/runtime/asm_amd64.s:330
        2  0x00000000004337fd in runtime.futexsleep
            at /usr/local/go/src/runtime/os_freebsd.go:155
        3  0x000000000040c0c6 in runtime.notetsleep_internal
            at /usr/local/go/src/runtime/lock_futex.go:182
        4  0x000000000040c2ec in runtime.notetsleepg
            at /usr/local/go/src/runtime/lock_futex.go:236
        5  0x000000000046aefd in os/signal.signal_recv
            at /usr/local/go/src/runtime/sigqueue.go:147
        6  0x0000000000a9ae25 in os/signal.loop
            at /usr/local/go/src/os/signal/signal_unix.go:23
        7  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 17 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [finalizer wait 447013h31m59.254036836s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x0000000000419309 in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x0000000000419309 in runtime.runfinq
            at /usr/local/go/src/runtime/mfinal.go:175
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 19 - User: /home/circleci/project/daemon/daemon.go:37 github.com/zrepl/zrepl/daemon.Run.func1 (0xae1d14) [chan receive 447062h2m38.239049104s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x00000000004065af in runtime.chanrecv
            at /usr/local/go/src/runtime/chan.go:577
        2  0x00000000004061eb in runtime.chanrecv1
            at /usr/local/go/src/runtime/chan.go:439
        3  0x0000000000ae1d14 in github.com/zrepl/zrepl/daemon.Run.func1
            at /home/circleci/project/daemon/daemon.go:37
        4  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 26 - User: /home/circleci/project/daemon/snapper/snapper.go:376 github.com/zrepl/zrepl/daemon/snapper.wait (0xa49ee9) [select 447013h33m59.259420414s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000a49ee9 in github.com/zrepl/zrepl/daemon/snapper.wait
            at /home/circleci/project/daemon/snapper/snapper.go:376
        3  0x0000000000a46f8d in github.com/zrepl/zrepl/daemon/snapper.(*Snapper).Run
            at /home/circleci/project/daemon/snapper/snapper.go:158
        4  0x0000000000a90d3f in github.com/zrepl/zrepl/daemon/snapper.(*PeriodicOrManual).Run
            at /home/circleci/project/daemon/snapper/snapper_all.go:23
        5  0x0000000000a90d3f in github.com/zrepl/zrepl/daemon/job.(*modePush).RunPeriodic
            at /home/circleci/project/daemon/job/active.go:132
        6  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 33 - User: /home/circleci/project/daemon/logging/trace/trace_chrometrace.go:146 github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1 (0x838e07) [select 447013h33m59.259916766s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000838e07 in github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1
            at /home/circleci/project/daemon/logging/trace/trace_chrometrace.go:146
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 35 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [select 447062h2m38.240235502s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x000000000046729d in runtime.ensureSigM.func1
            at /usr/local/go/src/runtime/signal_unix.go:875
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 42 - User: /home/circleci/project/replication/driver/replication_driver.go:280 github.com/zrepl/zrepl/replication/driver.Do.func2 (0xa56178) [chan receive 447017h33m2.635620379s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x00000000004065af in runtime.chanrecv
            at /usr/local/go/src/runtime/chan.go:577
        2  0x00000000004061eb in runtime.chanrecv1
            at /usr/local/go/src/runtime/chan.go:439
        3  0x0000000000a56178 in github.com/zrepl/zrepl/replication/driver.Do.func2
            at /home/circleci/project/replication/driver/replication_driver.go:280
        4  0x0000000000a939ff in github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do
            at /home/circleci/project/daemon/job/active.go:467
        5  0x0000000000a93366 in github.com/zrepl/zrepl/daemon/job.(*ActiveSide).Run
            at /home/circleci/project/daemon/job/active.go:424
        6  0x0000000000ae22c2 in github.com/zrepl/zrepl/daemon.(*jobs).start.func1
            at /home/circleci/project/daemon/daemon.go:248
        7  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 43 - User: /usr/local/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0x46a6e5) [semacquire 447062h2m38.241209061s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044a1e5 in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x000000000044a1e5 in runtime.semacquire1
            at /usr/local/go/src/runtime/sema.go:144
        3  0x000000000046a6e5 in sync.runtime_Semacquire
            at /usr/local/go/src/runtime/sema.go:56
        4  0x00000000004796c5 in sync.(*WaitGroup).Wait
            at /usr/local/go/src/sync/waitgroup.go:130
        5  0x0000000000ae206d in github.com/zrepl/zrepl/daemon.(*jobs).wait.func1
            at /home/circleci/project/daemon/daemon.go:144
        6  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 51 - User: /home/circleci/project/daemon/control.go:172 github.com/zrepl/zrepl/daemon.(*controlJob).Run (0xadd465) [select 447062h2m38.241681468s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000add465 in github.com/zrepl/zrepl/daemon.(*controlJob).Run
            at /home/circleci/project/daemon/control.go:172
        3  0x0000000000ae22c2 in github.com/zrepl/zrepl/daemon.(*jobs).start.func1
            at /home/circleci/project/daemon/daemon.go:248
        4  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 52 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x468cf5) [IO wait 447062h2m33.739599871s]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000043257b in runtime.netpollblock
             at /usr/local/go/src/runtime/netpoll.go:436
         2  0x0000000000468cf5 in internal/poll.runtime_pollWait
             at /usr/local/go/src/runtime/netpoll.go:220
         3  0x00000000004d71a5 in internal/poll.(*pollDesc).wait
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
         4  0x00000000004d9d9c in internal/poll.(*pollDesc).waitRead
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
         5  0x00000000004d9d9c in internal/poll.(*FD).Accept
             at /usr/local/go/src/internal/poll/fd_unix.go:394
         6  0x000000000053fd85 in net.(*netFD).accept
             at /usr/local/go/src/net/fd_unix.go:172
         7  0x000000000055c132 in net.(*TCPListener).accept
             at /usr/local/go/src/net/tcpsock_posix.go:139
         8  0x000000000055af65 in net.(*TCPListener).Accept
             at /usr/local/go/src/net/tcpsock.go:261
         9  0x00000000007f8edc in net/http.(*onceCloseListener).Accept
             at <autogenerated>:1
        10  0x00000000007d2466 in net/http.(*Server).Serve
             at /usr/local/go/src/net/http/server.go:2937
        (truncated)
  Goroutine 59 - User: /home/circleci/project/daemon/pprof.go:45 github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop (0xae0788) [select 447013h9m33.296021172s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000ae0788 in github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop
            at /home/circleci/project/daemon/pprof.go:45
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 62 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x468cf5) [IO wait 447062h2m38.246104536s]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000043257b in runtime.netpollblock
             at /usr/local/go/src/runtime/netpoll.go:436
         2  0x0000000000468cf5 in internal/poll.runtime_pollWait
             at /usr/local/go/src/runtime/netpoll.go:220
         3  0x00000000004d71a5 in internal/poll.(*pollDesc).wait
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
         4  0x00000000004d9d9c in internal/poll.(*pollDesc).waitRead
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
         5  0x00000000004d9d9c in internal/poll.(*FD).Accept
             at /usr/local/go/src/internal/poll/fd_unix.go:394
         6  0x000000000053fd85 in net.(*netFD).accept
             at /usr/local/go/src/net/fd_unix.go:172
         7  0x000000000055c132 in net.(*TCPListener).accept
             at /usr/local/go/src/net/tcpsock_posix.go:139
         8  0x000000000055af65 in net.(*TCPListener).Accept
             at /usr/local/go/src/net/tcpsock.go:261
         9  0x00000000007f8edc in net/http.(*onceCloseListener).Accept
             at <autogenerated>:1
        10  0x00000000007d2466 in net/http.(*Server).Serve
             at /usr/local/go/src/net/http/server.go:2937
        (truncated)
  Goroutine 63 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x468cf5) [IO wait]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000043257b in runtime.netpollblock
             at /usr/local/go/src/runtime/netpoll.go:436
         2  0x0000000000468cf5 in internal/poll.runtime_pollWait
             at /usr/local/go/src/runtime/netpoll.go:220
         3  0x00000000004d71a5 in internal/poll.(*pollDesc).wait
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
         4  0x00000000004d9d9c in internal/poll.(*pollDesc).waitRead
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
         5  0x00000000004d9d9c in internal/poll.(*FD).Accept
             at /usr/local/go/src/internal/poll/fd_unix.go:394
         6  0x000000000053fd85 in net.(*netFD).accept
             at /usr/local/go/src/net/fd_unix.go:172
         7  0x0000000000562572 in net.(*UnixListener).accept
             at /usr/local/go/src/net/unixsock_posix.go:162
         8  0x0000000000560805 in net.(*UnixListener).Accept
             at /usr/local/go/src/net/unixsock.go:260
         9  0x00000000007f8edc in net/http.(*onceCloseListener).Accept
             at <autogenerated>:1
        10  0x00000000007d2466 in net/http.(*Server).Serve
             at /usr/local/go/src/net/http/server.go:2937
        (truncated)
  Goroutine 69 - User: /home/circleci/project/daemon/prometheus.go:75 github.com/zrepl/zrepl/daemon.(*prometheusJob).Run.func1 (0xae2528) [chan receive 447062h2m38.24868852s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x00000000004065af in runtime.chanrecv
            at /usr/local/go/src/runtime/chan.go:577
        2  0x00000000004061eb in runtime.chanrecv1
            at /usr/local/go/src/runtime/chan.go:439
        3  0x0000000000ae2528 in github.com/zrepl/zrepl/daemon.(*prometheusJob).Run.func1
            at /home/circleci/project/daemon/prometheus.go:75
        4  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 106 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [GC worker (idle)]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000041cf7f in runtime.gcBgMarkWorker
            at /usr/local/go/src/runtime/mgc.go:1891
        2  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 183 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [GC worker (idle)]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000041cf7f in runtime.gcBgMarkWorker
            at /usr/local/go/src/runtime/mgc.go:1891
        2  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 202 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [GC worker (idle)]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000041cf7f in runtime.gcBgMarkWorker
            at /usr/local/go/src/runtime/mgc.go:1891
        2  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 211 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x4397a5) [GC worker (idle)]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000041cf7f in runtime.gcBgMarkWorker
            at /usr/local/go/src/runtime/mgc.go:1891
        2  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 1554 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x468cf5) [IO wait]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000043257b in runtime.netpollblock
             at /usr/local/go/src/runtime/netpoll.go:436
         2  0x0000000000468cf5 in internal/poll.runtime_pollWait
             at /usr/local/go/src/runtime/netpoll.go:220
         3  0x00000000004d71a5 in internal/poll.(*pollDesc).wait
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
         4  0x00000000004d81e5 in internal/poll.(*pollDesc).waitRead
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
         5  0x00000000004d81e5 in internal/poll.(*FD).Read
             at /usr/local/go/src/internal/poll/fd_unix.go:159
         6  0x000000000053e80f in net.(*netFD).Read
             at /usr/local/go/src/net/fd_posix.go:55
         7  0x000000000055214e in net.(*conn).Read
             at /usr/local/go/src/net/net.go:182
         8  0x00000000007c7bad in net/http.(*connReader).Read
             at /usr/local/go/src/net/http/server.go:798
         9  0x000000000056a2a5 in bufio.(*Reader).fill
             at /usr/local/go/src/bufio/bufio.go:101
        10  0x000000000056affd in bufio.(*Reader).ReadSlice
             at /usr/local/go/src/bufio/bufio.go:360
        (truncated)
  Goroutine 2606107 - User: /home/circleci/project/rpc/rpc_client.go:60 github.com/zrepl/zrepl/rpc.NewClient.func1.1 (0xa7ccd8) [chan receive 447017h33m2.6457672s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x00000000004065af in runtime.chanrecv
            at /usr/local/go/src/runtime/chan.go:577
        2  0x00000000004061eb in runtime.chanrecv1
            at /usr/local/go/src/runtime/chan.go:439
        3  0x0000000000a7ccd8 in github.com/zrepl/zrepl/rpc.NewClient.func1.1
            at /home/circleci/project/rpc/rpc_client.go:60
        4  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606110 - User: /usr/local/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0x46a6e5) [semacquire 447017h33m2.646132333s]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000044a1e5 in runtime.goparkunlock
             at /usr/local/go/src/runtime/proc.go:312
         2  0x000000000044a1e5 in runtime.semacquire1
             at /usr/local/go/src/runtime/sema.go:144
         3  0x000000000046a6e5 in sync.runtime_Semacquire
             at /usr/local/go/src/runtime/sema.go:56
         4  0x00000000004796c5 in sync.(*WaitGroup).Wait
             at /usr/local/go/src/sync/waitgroup.go:130
         5  0x0000000000a5684a in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func2
             at /home/circleci/project/replication/driver/replication_driver.go:437
         6  0x000000000082683d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
             at /home/circleci/project/util/chainlock/chainlock.go:41
         7  0x0000000000a513f2 in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems
             at /home/circleci/project/replication/driver/replication_driver.go:436
         8  0x0000000000a4f7ba in github.com/zrepl/zrepl/replication/driver.(*attempt).do
             at /home/circleci/project/replication/driver/replication_driver.go:301
         9  0x0000000000a55108 in github.com/zrepl/zrepl/replication/driver.Do.func1.1
             at /home/circleci/project/replication/driver/replication_driver.go:219
        10  0x000000000082683d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
             at /home/circleci/project/util/chainlock/chainlock.go:41
        (truncated)
  Goroutine 2606123 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1299 google.golang.org/grpc/internal/transport.(*http2Client).keepalive (0x919885) [select]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000919885 in google.golang.org/grpc/internal/transport.(*http2Client).keepalive
            at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1299
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606124 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x468cf5) [IO wait]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000043257b in runtime.netpollblock
             at /usr/local/go/src/runtime/netpoll.go:436
         2  0x0000000000468cf5 in internal/poll.runtime_pollWait
             at /usr/local/go/src/runtime/netpoll.go:220
         3  0x00000000004d71a5 in internal/poll.(*pollDesc).wait
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
         4  0x00000000004d81e5 in internal/poll.(*pollDesc).waitRead
             at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
         5  0x00000000004d81e5 in internal/poll.(*FD).Read
             at /usr/local/go/src/internal/poll/fd_unix.go:159
         6  0x000000000053e80f in net.(*netFD).Read
             at /usr/local/go/src/net/fd_posix.go:55
         7  0x000000000055214e in net.(*conn).Read
             at /usr/local/go/src/net/net.go:182
         8  0x000000000070b2a2 in crypto/tls.(*atLeastReader).Read
             at /usr/local/go/src/crypto/tls/conn.go:779
         9  0x00000000004f6f51 in bytes.(*Buffer).ReadFrom
             at /usr/local/go/src/bytes/buffer.go:204
        10  0x000000000070b4f3 in crypto/tls.(*Conn).readFromUntil
             at /usr/local/go/src/crypto/tls/conn.go:801
        (truncated)
  Goroutine 2606125 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x90971b) [select]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x000000000090971b in google.golang.org/grpc/internal/transport.(*controlBuffer).get
            at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317
        3  0x0000000000909f53 in google.golang.org/grpc/internal/transport.(*loopyWriter).run
            at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435
        4  0x000000000092999b in google.golang.org/grpc/internal/transport.newHTTP2Client.func3
            at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:328
        5  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606129 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:122 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x95e319) [select 447017h33m2.64827772s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x000000000095e319 in google.golang.org/grpc.(*ccBalancerWrapper).watcher
            at /go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:122
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606130 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/clientconn.go:417 google.golang.org/grpc.(*ClientConn).WaitForStateChange (0x9619e5) [select 447017h33m2.648574162s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x00000000009619e5 in google.golang.org/grpc.(*ClientConn).WaitForStateChange
            at /go/pkg/mod/google.golang.org/grpc@v1.17.0/clientconn.go:417
        3  0x0000000000a7ce8d in github.com/zrepl/zrepl/rpc.NewClient.func1
            at /home/circleci/project/rpc/rpc_client.go:67
        4  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606131 - User: /home/circleci/project/daemon/job/active.go:438 github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1 (0xa98809) [select 447017h33m2.648971316s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000a98809 in github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1
            at /home/circleci/project/daemon/job/active.go:438
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606157 - User: /home/circleci/project/replication/driver/replication_stepqueue.go:85 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1 (0xa57888) [chan receive 447017h33m2.649286834s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x00000000004065af in runtime.chanrecv
            at /usr/local/go/src/runtime/chan.go:577
        2  0x00000000004061eb in runtime.chanrecv1
            at /usr/local/go/src/runtime/chan.go:439
        3  0x0000000000a57888 in github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1
            at /home/circleci/project/replication/driver/replication_stepqueue.go:85
        4  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606158 - User: /home/circleci/project/replication/driver/replication_stepqueue.go:92 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2 (0xa57c3b) [select 447017h30m53.269222262s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000a57c3b in github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2
            at /home/circleci/project/replication/driver/replication_stepqueue.go:92
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
* Goroutine 2606159 - User: /usr/local/go/src/runtime/proc.go:312 sync.runtime_notifyListWait (0x46a998) [sync.Cond.Wait 447017h30m34.871382794s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000046a998 in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x000000000046a998 in sync.runtime_notifyListWait
            at /usr/local/go/src/runtime/sema.go:513
        3  0x0000000000475edd in sync.(*Cond).Wait
            at /usr/local/go/src/sync/cond.go:56
        4  0x0000000000a57de8 in github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func3
            at /home/circleci/project/replication/driver/replication_stepqueue.go:121
        5  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2606202 - User: /home/circleci/project/rpc/dataconn/dataconn_client.go:183 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv (0xa6fdfd) [select 447017h30m56.666543539s]
         0  0x00000000004397a5 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:307
         1  0x000000000044982f in runtime.selectgo
             at /usr/local/go/src/runtime/select.go:338
         2  0x0000000000a6fdfd in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv
             at /home/circleci/project/rpc/dataconn/dataconn_client.go:183
         3  0x0000000000a7ae11 in github.com/zrepl/zrepl/rpc.(*Client).Receive
             at /home/circleci/project/rpc/rpc_client.go:108
         4  0x0000000000a5e0c3 in github.com/zrepl/zrepl/replication/logic.(*Step).doReplication
             at /home/circleci/project/replication/logic/replication_logic.go:620
         5  0x0000000000a5a85f in github.com/zrepl/zrepl/replication/logic.(*Step).Step
             at /home/circleci/project/replication/logic/replication_logic.go:187
         6  0x0000000000a57554 in github.com/zrepl/zrepl/replication/driver.(*fs).do.func5
             at /home/circleci/project/replication/driver/replication_driver.go:641
         7  0x000000000082683d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
             at /home/circleci/project/util/chainlock/chainlock.go:41
         8  0x0000000000a521ce in github.com/zrepl/zrepl/replication/driver.(*fs).do
             at /home/circleci/project/replication/driver/replication_driver.go:634
         9  0x0000000000a567bc in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1
             at /home/circleci/project/replication/driver/replication_driver.go:433
        10  0x000000000046e681 in runtime.goexit
             at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2613746 - User: /home/circleci/project/rpc/dataconn/heartbeatconn/heartbeatconn.go:84 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6864e) [select]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044982f in runtime.selectgo
            at /usr/local/go/src/runtime/select.go:338
        2  0x0000000000a6864e in github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats
            at /home/circleci/project/rpc/dataconn/heartbeatconn/heartbeatconn.go:84
        3  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 2613748 - User: /usr/local/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0x46a6e5) [semacquire 447017h30m53.270913541s]
        0  0x00000000004397a5 in runtime.gopark
            at /usr/local/go/src/runtime/proc.go:307
        1  0x000000000044a1e5 in runtime.goparkunlock
            at /usr/local/go/src/runtime/proc.go:312
        2  0x000000000044a1e5 in runtime.semacquire1
            at /usr/local/go/src/runtime/sema.go:144
        3  0x000000000046a6e5 in sync.runtime_Semacquire
            at /usr/local/go/src/runtime/sema.go:56
        4  0x00000000004796c5 in sync.(*WaitGroup).Wait
            at /usr/local/go/src/sync/waitgroup.go:130
        5  0x0000000000a6b4f6 in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage
            at /home/circleci/project/rpc/dataconn/stream/stream_conn.go:109
        6  0x0000000000a6f069 in github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv
            at /home/circleci/project/rpc/dataconn/dataconn_client.go:74
        7  0x0000000000a71f3e in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1
            at /home/circleci/project/rpc/dataconn/dataconn_client.go:162
        8  0x000000000046e681 in runtime.goexit
            at /usr/local/go/src/runtime/asm_amd64.s:1374
[37 goroutines]

@problame
Copy link
Member Author

FYI, I still run into hangs on my RELENG_12 server. However, I can drastically reduce the instances by pining the process to just 2 CPUs via cpuset -l0,1 -p

@mdtancsa is that equivalent to 12 + what becomes the next 12.X or is it whatever the latest released 12.X is?
I.e.: were you running something > 12.1 and < 12.2 when you first experiences the crashes?

@mdtancsa
Copy link

FYI, I still run into hangs on my RELENG_12 server. However, I can drastically reduce the instances by pining the process to just 2 CPUs via cpuset -l0,1 -p

@mdtancsa is that equivalent to 12 + what becomes the next 12.X or is it whatever the latest released 12.X is?
I.e.: were you running something > 12.1 and < 12.2 when you first experiences the crashes?

@problame yes, RELENG_12 is a point in time snapshot of the src for what eventually becomes future 12.x releases. 12.1R, 12.2R, 12.3R etc. IIRC, I started to see the hangs when I upgraded my server from 11.x to 12. It is also referred to as 12 STABLE, and is also considered "stable" in that its generally bug fixes and minor enhancements to the code that will not break the branch's ABI. Generally, the commits are very conservative and are definitely not experimental. Development is always done in the branch number above, also referred to as HEAD. So right now FreeBSD 13 (also HEAD or '.') is the development branch, and any bug fixes discovered there get "MFC'd" or "Merged From Current".

@problame
Copy link
Member Author

Interesting, I was always confused between RELENG_X and X-STABLE.

Assuming it's a FreeBSD bug, it will have wandered through STABLE into RELENG into 12.2.
And your upgrade from 11 to 12 STABLE might have already included the regression.

I solicited help on Twitter yesterday.
If you can find the from- and to-revisions of the update that caused you problems, please post them here (sorry if I asked for that months ago, I cannot find the issue comment).

@mdtancsa
Copy link

mdtancsa commented Jan 13, 2021 via email

@problame
Copy link
Member Author

Thanks for following up, maybe this information is useful for a FreeBSD dev to do a bisect.

@problame
Copy link
Member Author

FWIW I used cpuset to limit the push-side daemon to one core (no HT) 5 days ago, no lockup since.
Here's a few machine stats:

$ sysctl hw.model
hw.model: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

$ grep cpu_microcode /boot/loader.conf
cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"

$ pkg info devcpu-data
devcpu-data-1.28
Name           : devcpu-data
Version        : 1.28
Installed on   : Thu Jan  9 15:13:40 2020 CET
Origin         : sysutils/devcpu-data
Architecture   : FreeBSD:12:*
Prefix         : /usr/local
Categories     : sysutils
Licenses       : EULA
Maintainer     : sbruno@FreeBSD.org
WWW            : UNKNOWN
Comment        : Intel and AMD CPUs microcode updates
Annotations    :
	repo_type      : binary
	repository     : FreeBSD
Flat size      : 5.96MiB
Description    :
This port supplies microcode updates for use with cpuctl(4) microcode
update facility.  These could be used to keep your processor's firmware
up-to-date.


@mdtancsa could you post the output of these commands as well?

@mdtancsa
Copy link

mdtancsa commented Jan 18, 2021 via email

@mafredri
Copy link
Contributor

mafredri commented Jan 19, 2021

I'm having the same (or a similar) issue on Linux. That is, both clients are running Debian Buster (sender amd64 and receiver arm64) and the sender becomes stuck during replication.

What happens:

  • Sender is stuck on sending a specific dataset
  • zfs recv is running (waiting) on receiver, no zfs processes on sender
  • Connection seems uninterrupted (restarting zrepl on receiver side wakes the sender up and triggers an error)
  • Enabling concurrent replication (ZREPL_REPLICATION_EXPERIMENTAL_REPLICATION_CONCURRENCY=4) does not help but allows the other datasets to finish (or until all concurrent workers have become "stuck")
  • Stopping zrepl (systemctl stop zrepl) on the sender takes a long time (only when "stuck")

I've captured the following goroutine stacks on the sender via delve:

(dlv) goroutines -t
  Goroutine 1 - User: /src/daemon/daemon.go:111 github.com/zrepl/zrepl/daemon.Run (0xae5eb2) [select 446601h12m16.062777053s]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000044c16f in runtime.selectgo
	     at /usr/local/go/src/runtime/select.go:338
	 2  0x0000000000ae5eb2 in github.com/zrepl/zrepl/daemon.Run
	     at /src/daemon/daemon.go:111
	 3  0x0000000000ae82a9 in github.com/zrepl/zrepl/daemon.glob..func1
	     at /src/daemon/main.go:16
	 4  0x000000000087a7a2 in github.com/zrepl/zrepl/cli.(*Subcommand).run
	     at /src/cli/cli.go:105
	 5  0x000000000087b292 in github.com/zrepl/zrepl/cli.(*Subcommand).run-fm
	     at /src/cli/cli.go:100
	 6  0x00000000005d3e02 in github.com/spf13/cobra.(*Command).execute
	     at /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:760
	 7  0x00000000005d4a1e in github.com/spf13/cobra.(*Command).ExecuteC
	     at /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:846
	 8  0x000000000087accd in github.com/spf13/cobra.(*Command).Execute
	     at /go/pkg/mod/github.com/spf13/cobra@v0.0.2/command.go:794
	 9  0x000000000087accd in github.com/zrepl/zrepl/cli.Run
	     at /src/cli/cli.go:152
	10  0x0000000000b2aa05 in main.main
	     at /src/main.go:24
	(truncated)
  Goroutine 2 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [force gc (idle) 446581h6m21.32047136s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000043bf05 in runtime.goparkunlock
	    at /usr/local/go/src/runtime/proc.go:312
	2  0x000000000043bf05 in runtime.forcegchelper
	    at /usr/local/go/src/runtime/proc.go:255
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 3 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [GC sweep wait]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x00000000004267bb in runtime.goparkunlock
	    at /usr/local/go/src/runtime/proc.go:312
	2  0x00000000004267bb in runtime.bgsweep
	    at /usr/local/go/src/runtime/mgcsweep.go:182
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 4 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [GC scavenge wait]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000042491c in runtime.goparkunlock
	    at /usr/local/go/src/runtime/proc.go:312
	2  0x000000000042491c in runtime.bgscavenge
	    at /usr/local/go/src/runtime/mgcscavenge.go:314
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 5 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [finalizer wait 446581h2m39.647048063s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000041b509 in runtime.goparkunlock
	    at /usr/local/go/src/runtime/proc.go:312
	2  0x000000000041b509 in runtime.runfinq
	    at /usr/local/go/src/runtime/mfinal.go:175
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 8 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [select 446601h12m16.072243617s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x000000000046a2ba in runtime.ensureSigM.func1
	    at /usr/local/go/src/runtime/signal_unix.go:875
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 10 - User: /src/daemon/snapper/snapper.go:376 github.com/zrepl/zrepl/daemon/snapper.wait (0xa50d89) [select 446581h4m21.29221205s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a50d89 in github.com/zrepl/zrepl/daemon/snapper.wait
	    at /src/daemon/snapper/snapper.go:376
	3  0x0000000000a4de2d in github.com/zrepl/zrepl/daemon/snapper.(*Snapper).Run
	    at /src/daemon/snapper/snapper.go:158
	4  0x0000000000a97a7f in github.com/zrepl/zrepl/daemon/snapper.(*PeriodicOrManual).Run
	    at /src/daemon/snapper/snapper_all.go:23
	5  0x0000000000a97a7f in github.com/zrepl/zrepl/daemon/job.(*modePush).RunPeriodic
	    at /src/daemon/job/active.go:132
	6  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 19 - User: /src/daemon/logging/trace/trace_chrometrace.go:146 github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1 (0x83e607) [select 446581h4m21.294500353s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x000000000083e607 in github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1
	    at /src/daemon/logging/trace/trace_chrometrace.go:146
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 21 - User: /usr/local/go/src/runtime/sigqueue.go:147 os/signal.signal_recv (0x46e1bd) (thread 2645)
	0  0x0000000000473983 in runtime.futex
	    at /usr/local/go/src/runtime/sys_linux_amd64.s:588
	1  0x0000000000435626 in runtime.futexsleep
	    at /usr/local/go/src/runtime/os_linux.go:45
	2  0x000000000040dfa6 in runtime.notetsleep_internal
	    at /usr/local/go/src/runtime/lock_futex.go:182
	3  0x000000000040e1cc in runtime.notetsleepg
	    at /usr/local/go/src/runtime/lock_futex.go:236
	4  0x000000000046e1bd in os/signal.signal_recv
	    at /usr/local/go/src/runtime/sigqueue.go:147
	5  0x0000000000aa1b65 in os/signal.loop
	    at /usr/local/go/src/os/signal/signal_unix.go:23
	6  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 24 - User: /src/daemon/pprof.go:45 github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop (0xae7608) [select 446601h12m16.079575441s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000ae7608 in github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop
	    at /src/daemon/pprof.go:45
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 25 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004def9c in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004def9c in internal/poll.(*FD).Accept
	     at /usr/local/go/src/internal/poll/fd_unix.go:394
	 6  0x0000000000540385 in net.(*netFD).accept
	     at /usr/local/go/src/net/fd_unix.go:172
	 7  0x0000000000564232 in net.(*UnixListener).accept
	     at /usr/local/go/src/net/unixsock_posix.go:162
	 8  0x00000000005624c5 in net.(*UnixListener).Accept
	     at /usr/local/go/src/net/unixsock.go:260
	 9  0x00000000007fe6dc in net/http.(*onceCloseListener).Accept
	     at <autogenerated>:1
	10  0x00000000007d7c66 in net/http.(*Server).Serve
	     at /usr/local/go/src/net/http/server.go:2937
	(truncated)
  Goroutine 34 - User: /src/daemon/daemon.go:37 github.com/zrepl/zrepl/daemon.Run.func1 (0xae8b94) [chan receive 446601h12m16.096107089s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x00000000004080cb in runtime.chanrecv1
	    at /usr/local/go/src/runtime/chan.go:439
	3  0x0000000000ae8b94 in github.com/zrepl/zrepl/daemon.Run.func1
	    at /src/daemon/daemon.go:37
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 38 - User: /src/daemon/control.go:172 github.com/zrepl/zrepl/daemon.(*controlJob).Run (0xae42e5) [select 446601h12m16.098265966s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000ae42e5 in github.com/zrepl/zrepl/daemon.(*controlJob).Run
	    at /src/daemon/control.go:172
	3  0x0000000000ae9142 in github.com/zrepl/zrepl/daemon.(*jobs).start.func1
	    at /src/daemon/daemon.go:248
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 39 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait 446601h12m16.100230117s]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004def9c in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004def9c in internal/poll.(*FD).Accept
	     at /usr/local/go/src/internal/poll/fd_unix.go:394
	 6  0x0000000000540385 in net.(*netFD).accept
	     at /usr/local/go/src/net/fd_unix.go:172
	 7  0x000000000055ddf2 in net.(*TCPListener).accept
	     at /usr/local/go/src/net/tcpsock_posix.go:139
	 8  0x000000000055cbc5 in net.(*TCPListener).Accept
	     at /usr/local/go/src/net/tcpsock.go:261
	 9  0x00000000007fe6dc in net/http.(*onceCloseListener).Accept
	     at <autogenerated>:1
	10  0x00000000007d7c66 in net/http.(*Server).Serve
	     at /usr/local/go/src/net/http/server.go:2937
	(truncated)
  Goroutine 51 - User: /src/replication/driver/replication_driver.go:280 github.com/zrepl/zrepl/replication/driver.Do.func2 (0xa5d018) [chan receive 446596h34m50.27280688s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x00000000004080cb in runtime.chanrecv1
	    at /usr/local/go/src/runtime/chan.go:439
	3  0x0000000000a5d018 in github.com/zrepl/zrepl/replication/driver.Do.func2
	    at /src/replication/driver/replication_driver.go:280
	4  0x0000000000a9a73f in github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do
	    at /src/daemon/job/active.go:467
	5  0x0000000000a9a0a6 in github.com/zrepl/zrepl/daemon/job.(*ActiveSide).Run
	    at /src/daemon/job/active.go:424
	6  0x0000000000ae9142 in github.com/zrepl/zrepl/daemon.(*jobs).start.func1
	    at /src/daemon/daemon.go:248
	7  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 53 - User: /src/daemon/job/snapjob.go:111 github.com/zrepl/zrepl/daemon/job.(*SnapJob).Run (0xa9eb75) [select 446581h10m21.4429538s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a9eb75 in github.com/zrepl/zrepl/daemon/job.(*SnapJob).Run
	    at /src/daemon/job/snapjob.go:111
	3  0x0000000000ae9142 in github.com/zrepl/zrepl/daemon.(*jobs).start.func1
	    at /src/daemon/daemon.go:248
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 55 - User: /src/daemon/job/snapjob.go:111 github.com/zrepl/zrepl/daemon/job.(*SnapJob).Run (0xa9eb75) [select 446581h12m21.484796613s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a9eb75 in github.com/zrepl/zrepl/daemon/job.(*SnapJob).Run
	    at /src/daemon/job/snapjob.go:111
	3  0x0000000000ae9142 in github.com/zrepl/zrepl/daemon.(*jobs).start.func1
	    at /src/daemon/daemon.go:248
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 56 - User: /usr/local/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0x46d9a5) [semacquire 446601h12m16.112545025s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044cb25 in runtime.goparkunlock
	    at /usr/local/go/src/runtime/proc.go:312
	2  0x000000000044cb25 in runtime.semacquire1
	    at /usr/local/go/src/runtime/sema.go:144
	3  0x000000000046d9a5 in sync.runtime_Semacquire
	    at /usr/local/go/src/runtime/sema.go:56
	4  0x000000000047cde5 in sync.(*WaitGroup).Wait
	    at /usr/local/go/src/sync/waitgroup.go:130
	5  0x0000000000ae8eed in github.com/zrepl/zrepl/daemon.(*jobs).wait.func1
	    at /src/daemon/daemon.go:144
	6  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 58 - User: /src/daemon/snapper/snapper.go:376 github.com/zrepl/zrepl/daemon/snapper.wait (0xa50d89) [select 446581h10m21.448046759s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a50d89 in github.com/zrepl/zrepl/daemon/snapper.wait
	    at /src/daemon/snapper/snapper.go:376
	3  0x0000000000a4de2d in github.com/zrepl/zrepl/daemon/snapper.(*Snapper).Run
	    at /src/daemon/snapper/snapper.go:158
	4  0x0000000000a524bb in github.com/zrepl/zrepl/daemon/snapper.(*PeriodicOrManual).Run
	    at /src/daemon/snapper/snapper_all.go:23
	5  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 80 - User: /src/daemon/prometheus.go:75 github.com/zrepl/zrepl/daemon.(*prometheusJob).Run.func1 (0xae93a8) [chan receive 446601h12m16.116383188s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x00000000004080cb in runtime.chanrecv1
	    at /usr/local/go/src/runtime/chan.go:439
	3  0x0000000000ae93a8 in github.com/zrepl/zrepl/daemon.(*prometheusJob).Run.func1
	    at /src/daemon/prometheus.go:75
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 134 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [GC worker (idle)]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000041f17f in runtime.gcBgMarkWorker
	    at /usr/local/go/src/runtime/mgc.go:1891
	2  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 135 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [GC worker (idle)]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000041f17f in runtime.gcBgMarkWorker
	    at /usr/local/go/src/runtime/mgc.go:1891
	2  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 136 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [GC worker (idle)]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000041f17f in runtime.gcBgMarkWorker
	    at /usr/local/go/src/runtime/mgc.go:1891
	2  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 137 - User: /usr/local/go/src/runtime/proc.go:307 runtime.gopark (0x43c065) [GC worker (idle)]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000041f17f in runtime.gcBgMarkWorker
	    at /usr/local/go/src/runtime/mgc.go:1891
	2  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 175 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004dd3e5 in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004dd3e5 in internal/poll.(*FD).Read
	     at /usr/local/go/src/internal/poll/fd_unix.go:159
	 6  0x000000000053ee0f in net.(*netFD).Read
	     at /usr/local/go/src/net/fd_posix.go:55
	 7  0x0000000000553ace in net.(*conn).Read
	     at /usr/local/go/src/net/net.go:182
	 8  0x00000000007cd3ad in net/http.(*connReader).Read
	     at /usr/local/go/src/net/http/server.go:798
	 9  0x000000000056fba5 in bufio.(*Reader).fill
	     at /usr/local/go/src/bufio/bufio.go:101
	10  0x00000000005708fd in bufio.(*Reader).ReadSlice
	     at /usr/local/go/src/bufio/bufio.go:360
	(truncated)
  Goroutine 316058 - User: /src/rpc/rpc_client.go:60 github.com/zrepl/zrepl/rpc.NewClient.func1.1 (0xa83b78) [chan receive 446596h34m50.300511041s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x00000000004080cb in runtime.chanrecv1
	    at /usr/local/go/src/runtime/chan.go:439
	3  0x0000000000a83b78 in github.com/zrepl/zrepl/rpc.NewClient.func1.1
	    at /src/rpc/rpc_client.go:60
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316170 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:122 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x964939) [select 446596h34m50.302680643s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000964939 in google.golang.org/grpc.(*ccBalancerWrapper).watcher
	    at /go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:122
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316171 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/clientconn.go:417 google.golang.org/grpc.(*ClientConn).WaitForStateChange (0x968005) [select 446596h34m50.30479362s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000968005 in google.golang.org/grpc.(*ClientConn).WaitForStateChange
	    at /go/pkg/mod/google.golang.org/grpc@v1.17.0/clientconn.go:417
	3  0x0000000000a83d2d in github.com/zrepl/zrepl/rpc.NewClient.func1
	    at /src/rpc/rpc_client.go:67
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316172 - User: /src/daemon/job/active.go:438 github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1 (0xa9f549) [select 446596h34m50.306584194s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a9f549 in github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1
	    at /src/daemon/job/active.go:438
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316174 - User: /usr/local/go/src/runtime/sema.go:56 sync.runtime_Semacquire (0x46d9a5) [semacquire 446596h34m50.308230243s]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000044cb25 in runtime.goparkunlock
	     at /usr/local/go/src/runtime/proc.go:312
	 2  0x000000000044cb25 in runtime.semacquire1
	     at /usr/local/go/src/runtime/sema.go:144
	 3  0x000000000046d9a5 in sync.runtime_Semacquire
	     at /usr/local/go/src/runtime/sema.go:56
	 4  0x000000000047cde5 in sync.(*WaitGroup).Wait
	     at /usr/local/go/src/sync/waitgroup.go:130
	 5  0x0000000000a5d6ea in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func2
	     at /src/replication/driver/replication_driver.go:437
	 6  0x000000000082c03d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
	     at /src/util/chainlock/chainlock.go:41
	 7  0x0000000000a58292 in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems
	     at /src/replication/driver/replication_driver.go:436
	 8  0x0000000000a5665a in github.com/zrepl/zrepl/replication/driver.(*attempt).do
	     at /src/replication/driver/replication_driver.go:301
	 9  0x0000000000a5bfa8 in github.com/zrepl/zrepl/replication/driver.Do.func1.1
	     at /src/replication/driver/replication_driver.go:219
	10  0x000000000082c03d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
	     at /src/util/chainlock/chainlock.go:41
	(truncated)
  Goroutine 316190 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1299 google.golang.org/grpc/internal/transport.(*http2Client).keepalive (0x91fca5) [select]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x000000000091fca5 in google.golang.org/grpc/internal/transport.(*http2Client).keepalive
	    at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1299
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316191 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004dd3e5 in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004dd3e5 in internal/poll.(*FD).Read
	     at /usr/local/go/src/internal/poll/fd_unix.go:159
	 6  0x000000000053ee0f in net.(*netFD).Read
	     at /usr/local/go/src/net/fd_posix.go:55
	 7  0x0000000000553ace in net.(*conn).Read
	     at /usr/local/go/src/net/net.go:182
	 8  0x0000000000710ba2 in crypto/tls.(*atLeastReader).Read
	     at /usr/local/go/src/crypto/tls/conn.go:779
	 9  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	     at /usr/local/go/src/bytes/buffer.go:204
	10  0x0000000000710df3 in crypto/tls.(*Conn).readFromUntil
	     at /usr/local/go/src/crypto/tls/conn.go:801
	(truncated)
  Goroutine 316192 - User: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317 google.golang.org/grpc/internal/transport.(*controlBuffer).get (0x90f97b) [select]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x000000000090f97b in google.golang.org/grpc/internal/transport.(*controlBuffer).get
	    at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:317
	3  0x00000000009101b3 in google.golang.org/grpc/internal/transport.(*loopyWriter).run
	    at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/controlbuf.go:435
	4  0x000000000092ffbb in google.golang.org/grpc/internal/transport.newHTTP2Client.func3
	    at /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:328
	5  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316440 - User: /src/replication/driver/replication_stepqueue.go:85 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1 (0xa5e728) [chan receive 446596h34m50.317776569s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x00000000004080cb in runtime.chanrecv1
	    at /usr/local/go/src/runtime/chan.go:439
	3  0x0000000000a5e728 in github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1
	    at /src/replication/driver/replication_stepqueue.go:85
	4  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316441 - User: /src/replication/driver/replication_stepqueue.go:92 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2 (0xa5eadb) [select 446594h22m14.706312627s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a5eadb in github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2
	    at /src/replication/driver/replication_stepqueue.go:92
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316442 - User: /usr/local/go/src/runtime/proc.go:312 sync.runtime_notifyListWait (0x46dc58) [sync.Cond.Wait 446594h16m35.333529759s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000046dc58 in runtime.goparkunlock
	    at /usr/local/go/src/runtime/proc.go:312
	2  0x000000000046dc58 in sync.runtime_notifyListWait
	    at /usr/local/go/src/runtime/sema.go:513
	3  0x00000000004795fd in sync.(*Cond).Wait
	    at /usr/local/go/src/sync/cond.go:56
	4  0x0000000000a5ec88 in github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func3
	    at /src/replication/driver/replication_stepqueue.go:121
	5  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316511 - User: /src/rpc/dataconn/dataconn_client.go:183 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv (0xa76c9d) [select 446595h59m7.404478201s]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000044c16f in runtime.selectgo
	     at /usr/local/go/src/runtime/select.go:338
	 2  0x0000000000a76c9d in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv
	     at /src/rpc/dataconn/dataconn_client.go:183
	 3  0x0000000000a81cb1 in github.com/zrepl/zrepl/rpc.(*Client).Receive
	     at /src/rpc/rpc_client.go:108
	 4  0x0000000000a64f63 in github.com/zrepl/zrepl/replication/logic.(*Step).doReplication
	     at /src/replication/logic/replication_logic.go:620
	 5  0x0000000000a616ff in github.com/zrepl/zrepl/replication/logic.(*Step).Step
	     at /src/replication/logic/replication_logic.go:187
	 6  0x0000000000a5e3f4 in github.com/zrepl/zrepl/replication/driver.(*fs).do.func5
	     at /src/replication/driver/replication_driver.go:641
	 7  0x000000000082c03d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
	     at /src/util/chainlock/chainlock.go:41
	 8  0x0000000000a5906e in github.com/zrepl/zrepl/replication/driver.(*fs).do
	     at /src/replication/driver/replication_driver.go:634
	 9  0x0000000000a5d65c in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1
	     at /src/replication/driver/replication_driver.go:433
	10  0x0000000000471b01 in runtime.goexit
	     at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316514 - User: /src/rpc/dataconn/dataconn_client.go:183 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv (0xa76c9d) [select 446596h5m55.41550604s]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000044c16f in runtime.selectgo
	     at /usr/local/go/src/runtime/select.go:338
	 2  0x0000000000a76c9d in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv
	     at /src/rpc/dataconn/dataconn_client.go:183
	 3  0x0000000000a81cb1 in github.com/zrepl/zrepl/rpc.(*Client).Receive
	     at /src/rpc/rpc_client.go:108
	 4  0x0000000000a64f63 in github.com/zrepl/zrepl/replication/logic.(*Step).doReplication
	     at /src/replication/logic/replication_logic.go:620
	 5  0x0000000000a616ff in github.com/zrepl/zrepl/replication/logic.(*Step).Step
	     at /src/replication/logic/replication_logic.go:187
	 6  0x0000000000a5e3f4 in github.com/zrepl/zrepl/replication/driver.(*fs).do.func5
	     at /src/replication/driver/replication_driver.go:641
	 7  0x000000000082c03d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
	     at /src/util/chainlock/chainlock.go:41
	 8  0x0000000000a5906e in github.com/zrepl/zrepl/replication/driver.(*fs).do
	     at /src/replication/driver/replication_driver.go:634
	 9  0x0000000000a5d65c in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1
	     at /src/replication/driver/replication_driver.go:433
	10  0x0000000000471b01 in runtime.goexit
	     at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 316516 - User: /src/rpc/dataconn/dataconn_client.go:183 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv (0xa76c9d) [select 446595h56m59.666319641s]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000044c16f in runtime.selectgo
	     at /usr/local/go/src/runtime/select.go:338
	 2  0x0000000000a76c9d in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv
	     at /src/rpc/dataconn/dataconn_client.go:183
	 3  0x0000000000a81cb1 in github.com/zrepl/zrepl/rpc.(*Client).Receive
	     at /src/rpc/rpc_client.go:108
	 4  0x0000000000a64f63 in github.com/zrepl/zrepl/replication/logic.(*Step).doReplication
	     at /src/replication/logic/replication_logic.go:620
	 5  0x0000000000a616ff in github.com/zrepl/zrepl/replication/logic.(*Step).Step
	     at /src/replication/logic/replication_logic.go:187
	 6  0x0000000000a5e3f4 in github.com/zrepl/zrepl/replication/driver.(*fs).do.func5
	     at /src/replication/driver/replication_driver.go:641
	 7  0x000000000082c03d in github.com/zrepl/zrepl/util/chainlock.(*L).DropWhile
	     at /src/util/chainlock/chainlock.go:41
	 8  0x0000000000a5906e in github.com/zrepl/zrepl/replication/driver.(*fs).do
	     at /src/replication/driver/replication_driver.go:634
	 9  0x0000000000a5d65c in github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1
	     at /src/replication/driver/replication_driver.go:433
	10  0x0000000000471b01 in runtime.goexit
	     at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 411513 - User: /usr/local/go/src/io/pipe.go:57 io.(*pipe).Read (0x4d9aa7) [select 446596h5m55.421461993s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x00000000004d9aa7 in io.(*pipe).Read
	    at /usr/local/go/src/io/pipe.go:57
	3  0x00000000004da2ac in io.(*PipeReader).Read
	    at /usr/local/go/src/io/pipe.go:134
	4  0x00000000004d89e3 in io.(*LimitedReader).Read
	    at /usr/local/go/src/io/io.go:455
	5  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	    at /usr/local/go/src/bytes/buffer.go:204
	6  0x00000000004d88df in io.copyBuffer
	    at /usr/local/go/src/io/io.go:395
	7  0x0000000000a7418f in io.Copy
	    at /usr/local/go/src/io/io.go:368
	8  0x0000000000a7418f in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage.func2
	    at /src/rpc/dataconn/stream/stream_conn.go:102
	9  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 411577 - User: /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:84 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6f4ee) [select]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a6f4ee in github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats
	    at /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:84
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 411578 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004dd3e5 in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004dd3e5 in internal/poll.(*FD).Read
	     at /usr/local/go/src/internal/poll/fd_unix.go:159
	 6  0x000000000053ee0f in net.(*netFD).Read
	     at /usr/local/go/src/net/fd_posix.go:55
	 7  0x0000000000553ace in net.(*conn).Read
	     at /usr/local/go/src/net/net.go:182
	 8  0x0000000000710ba2 in crypto/tls.(*atLeastReader).Read
	     at /usr/local/go/src/crypto/tls/conn.go:779
	 9  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	     at /usr/local/go/src/bytes/buffer.go:204
	10  0x0000000000710df3 in crypto/tls.(*Conn).readFromUntil
	     at /usr/local/go/src/crypto/tls/conn.go:801
	(truncated)
  Goroutine 411579 - User: /src/rpc/dataconn/stream/stream.go:245 github.com/zrepl/zrepl/rpc/dataconn/stream.readStream (0xa710e5) [chan receive 446596h5m55.428661691s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x000000000040810b in runtime.chanrecv2
	    at /usr/local/go/src/runtime/chan.go:444
	3  0x0000000000a710e5 in github.com/zrepl/zrepl/rpc/dataconn/stream.readStream
	    at /src/rpc/dataconn/stream/stream.go:245
	4  0x0000000000a72325 in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage
	    at /src/rpc/dataconn/stream/stream_conn.go:106
	5  0x0000000000a75f09 in github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv
	    at /src/rpc/dataconn/dataconn_client.go:74
	6  0x0000000000a78dde in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1
	    at /src/rpc/dataconn/dataconn_client.go:162
	7  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 421848 - User: /usr/local/go/src/io/pipe.go:57 io.(*pipe).Read (0x4d9aa7) [select 446595h59m7.42620171s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x00000000004d9aa7 in io.(*pipe).Read
	    at /usr/local/go/src/io/pipe.go:57
	3  0x00000000004da2ac in io.(*PipeReader).Read
	    at /usr/local/go/src/io/pipe.go:134
	4  0x00000000004d89e3 in io.(*LimitedReader).Read
	    at /usr/local/go/src/io/io.go:455
	5  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	    at /usr/local/go/src/bytes/buffer.go:204
	6  0x00000000004d88df in io.copyBuffer
	    at /usr/local/go/src/io/io.go:395
	7  0x0000000000a7418f in io.Copy
	    at /usr/local/go/src/io/io.go:368
	8  0x0000000000a7418f in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage.func2
	    at /src/rpc/dataconn/stream/stream_conn.go:102
	9  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 421915 - User: /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:84 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6f4ee) [select]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a6f4ee in github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats
	    at /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:84
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 421916 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004dd3e5 in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004dd3e5 in internal/poll.(*FD).Read
	     at /usr/local/go/src/internal/poll/fd_unix.go:159
	 6  0x000000000053ee0f in net.(*netFD).Read
	     at /usr/local/go/src/net/fd_posix.go:55
	 7  0x0000000000553ace in net.(*conn).Read
	     at /usr/local/go/src/net/net.go:182
	 8  0x0000000000710ba2 in crypto/tls.(*atLeastReader).Read
	     at /usr/local/go/src/crypto/tls/conn.go:779
	 9  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	     at /usr/local/go/src/bytes/buffer.go:204
	10  0x0000000000710df3 in crypto/tls.(*Conn).readFromUntil
	     at /usr/local/go/src/crypto/tls/conn.go:801
	(truncated)
  Goroutine 421917 - User: /src/rpc/dataconn/stream/stream.go:245 github.com/zrepl/zrepl/rpc/dataconn/stream.readStream (0xa710e5) [chan receive 446595h59m7.435328996s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x000000000040810b in runtime.chanrecv2
	    at /usr/local/go/src/runtime/chan.go:444
	3  0x0000000000a710e5 in github.com/zrepl/zrepl/rpc/dataconn/stream.readStream
	    at /src/rpc/dataconn/stream/stream.go:245
	4  0x0000000000a72325 in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage
	    at /src/rpc/dataconn/stream/stream_conn.go:106
	5  0x0000000000a75f09 in github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv
	    at /src/rpc/dataconn/dataconn_client.go:74
	6  0x0000000000a78dde in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1
	    at /src/rpc/dataconn/dataconn_client.go:162
	7  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 427432 - User: /usr/local/go/src/io/pipe.go:57 io.(*pipe).Read (0x4d9aa7) [select 446595h56m59.692058939s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x00000000004d9aa7 in io.(*pipe).Read
	    at /usr/local/go/src/io/pipe.go:57
	3  0x00000000004da2ac in io.(*PipeReader).Read
	    at /usr/local/go/src/io/pipe.go:134
	4  0x00000000004d89e3 in io.(*LimitedReader).Read
	    at /usr/local/go/src/io/io.go:455
	5  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	    at /usr/local/go/src/bytes/buffer.go:204
	6  0x00000000004d88df in io.copyBuffer
	    at /usr/local/go/src/io/io.go:395
	7  0x0000000000a7418f in io.Copy
	    at /usr/local/go/src/io/io.go:368
	8  0x0000000000a7418f in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage.func2
	    at /src/rpc/dataconn/stream/stream_conn.go:102
	9  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 427601 - User: /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:84 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6f4ee) [select]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000044c16f in runtime.selectgo
	    at /usr/local/go/src/runtime/select.go:338
	2  0x0000000000a6f4ee in github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats
	    at /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:84
	3  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
  Goroutine 427602 - User: /usr/local/go/src/runtime/netpoll.go:220 internal/poll.runtime_pollWait (0x46bfb5) [IO wait]
	 0  0x000000000043c065 in runtime.gopark
	     at /usr/local/go/src/runtime/proc.go:307
	 1  0x000000000043487b in runtime.netpollblock
	     at /usr/local/go/src/runtime/netpoll.go:436
	 2  0x000000000046bfb5 in internal/poll.runtime_pollWait
	     at /usr/local/go/src/runtime/netpoll.go:220
	 3  0x00000000004dc3a5 in internal/poll.(*pollDesc).wait
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
	 4  0x00000000004dd3e5 in internal/poll.(*pollDesc).waitRead
	     at /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
	 5  0x00000000004dd3e5 in internal/poll.(*FD).Read
	     at /usr/local/go/src/internal/poll/fd_unix.go:159
	 6  0x000000000053ee0f in net.(*netFD).Read
	     at /usr/local/go/src/net/fd_posix.go:55
	 7  0x0000000000553ace in net.(*conn).Read
	     at /usr/local/go/src/net/net.go:182
	 8  0x0000000000710ba2 in crypto/tls.(*atLeastReader).Read
	     at /usr/local/go/src/crypto/tls/conn.go:779
	 9  0x00000000004fd551 in bytes.(*Buffer).ReadFrom
	     at /usr/local/go/src/bytes/buffer.go:204
	10  0x0000000000710df3 in crypto/tls.(*Conn).readFromUntil
	     at /usr/local/go/src/crypto/tls/conn.go:801
	(truncated)
  Goroutine 427603 - User: /src/rpc/dataconn/stream/stream.go:245 github.com/zrepl/zrepl/rpc/dataconn/stream.readStream (0xa710e5) [chan receive 446595h56m59.699777616s]
	0  0x000000000043c065 in runtime.gopark
	    at /usr/local/go/src/runtime/proc.go:307
	1  0x000000000040848f in runtime.chanrecv
	    at /usr/local/go/src/runtime/chan.go:577
	2  0x000000000040810b in runtime.chanrecv2
	    at /usr/local/go/src/runtime/chan.go:444
	3  0x0000000000a710e5 in github.com/zrepl/zrepl/rpc/dataconn/stream.readStream
	    at /src/rpc/dataconn/stream/stream.go:245
	4  0x0000000000a72325 in github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage
	    at /src/rpc/dataconn/stream/stream_conn.go:106
	5  0x0000000000a75f09 in github.com/zrepl/zrepl/rpc/dataconn.(*Client).recv
	    at /src/rpc/dataconn/dataconn_client.go:74
	6  0x0000000000a78dde in github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1
	    at /src/rpc/dataconn/dataconn_client.go:162
	7  0x0000000000471b01 in runtime.goexit
	    at /usr/local/go/src/runtime/asm_amd64.s:1374
[51 goroutines]
(dlv) goroutines -s
  Goroutine 1 - Start: /usr/local/go/src/runtime/proc.go:114 runtime.main (0x43ba60) [select 446601h12m23.179860606s]
  Goroutine 2 - Start: /usr/local/go/src/runtime/proc.go:246 runtime.forcegchelper (0x43be40) [force gc (idle) 446581h6m28.433742674s]
  Goroutine 3 - Start: /usr/local/go/src/runtime/mgcsweep.go:156 runtime.bgsweep (0x426680) [GC sweep wait]
  Goroutine 4 - Start: /usr/local/go/src/runtime/mgcscavenge.go:252 runtime.bgscavenge (0x424680) [GC scavenge wait]
  Goroutine 5 - Start: /usr/local/go/src/runtime/mfinal.go:161 runtime.runfinq (0x41b460) [finalizer wait 446581h2m46.756149774s]
  Goroutine 8 - Start: /usr/local/go/src/runtime/signal_unix.go:858 runtime.ensureSigM.func1 (0x46a0c0) [select 446601h12m23.180150095s]
  Goroutine 10 - Start: /src/daemon/job/active.go:131 github.com/zrepl/zrepl/daemon/job.(*modePush).RunPeriodic (0xa97a20) [select 446581h4m28.398886395s]
  Goroutine 19 - Start: /src/daemon/logging/trace/trace_chrometrace.go:135 github.com/zrepl/zrepl/daemon/logging/trace.init.2.func1 (0x83e240) [select 446581h4m28.398938095s]
  Goroutine 21 - Start: /usr/local/go/src/os/signal/signal_unix.go:21 os/signal.loop (0xaa1b40) (thread 2645)
  Goroutine 24 - Start: /src/daemon/pprof.go:40 github.com/zrepl/zrepl/daemon.(*pprofServer).controlLoop (0xae7520) [select 446601h12m23.180353534s]
  Goroutine 25 - Start: /src/daemon/control.go:167 github.com/zrepl/zrepl/daemon.(*controlJob).Run.func5 (0xae8800) [IO wait]
  Goroutine 34 - Start: /src/daemon/daemon.go:36 github.com/zrepl/zrepl/daemon.Run.func1 (0xae8b60) [chan receive 446601h12m23.18051466s]
  Goroutine 38 - Start: /src/daemon/daemon.go:244 github.com/zrepl/zrepl/daemon.(*jobs).start.func1 (0xae9000) [select 446601h12m23.18056586s]
  Goroutine 39 - Start: /src/daemon/daemon.go:244 github.com/zrepl/zrepl/daemon.(*jobs).start.func1 (0xae9000) [IO wait 446601h12m23.180674936s]
  Goroutine 51 - Start: /src/daemon/daemon.go:244 github.com/zrepl/zrepl/daemon.(*jobs).start.func1 (0xae9000) [chan receive 446596h34m57.346520254s]
  Goroutine 53 - Start: /src/daemon/daemon.go:244 github.com/zrepl/zrepl/daemon.(*jobs).start.func1 (0xae9000) [select 446581h10m28.514109144s]
  Goroutine 55 - Start: /src/daemon/daemon.go:244 github.com/zrepl/zrepl/daemon.(*jobs).start.func1 (0xae9000) [select 446581h12m28.554441785s]
  Goroutine 56 - Start: /src/daemon/daemon.go:143 github.com/zrepl/zrepl/daemon.(*jobs).wait.func1 (0xae8ec0) [semacquire 446601h12m23.180942913s]
  Goroutine 58 - Start: /src/daemon/snapper/snapper_all.go:21 github.com/zrepl/zrepl/daemon/snapper.(*PeriodicOrManual).Run (0xa52460) [select 446581h10m28.51429652s]
  Goroutine 80 - Start: /src/daemon/prometheus.go:74 github.com/zrepl/zrepl/daemon.(*prometheusJob).Run.func1 (0xae9360) [chan receive 446601h12m23.181049039s]
  Goroutine 134 - Start: /usr/local/go/src/runtime/mgc.go:1862 runtime.gcBgMarkWorker (0x41f080) [GC worker (idle)]
  Goroutine 135 - Start: /usr/local/go/src/runtime/mgc.go:1862 runtime.gcBgMarkWorker (0x41f080) [GC worker (idle)]
  Goroutine 136 - Start: /usr/local/go/src/runtime/mgc.go:1862 runtime.gcBgMarkWorker (0x41f080) [GC worker (idle)]
  Goroutine 137 - Start: /usr/local/go/src/runtime/mgc.go:1862 runtime.gcBgMarkWorker (0x41f080) [GC worker (idle)]
  Goroutine 175 - Start: /usr/local/go/src/net/http/server.go:1794 net/http.(*conn).serve (0x7d2800) [IO wait]
  Goroutine 316058 - Start: /src/rpc/rpc_client.go:59 github.com/zrepl/zrepl/rpc.NewClient.func1.1 (0xa83b40) [chan receive 446596h34m57.347161646s]
  Goroutine 316170 - Start: /go/pkg/mod/google.golang.org/grpc@v1.17.0/balancer_conn_wrappers.go:120 google.golang.org/grpc.(*ccBalancerWrapper).watcher (0x964820) [select 446596h34m57.347236209s]
  Goroutine 316171 - Start: /src/rpc/rpc_client.go:57 github.com/zrepl/zrepl/rpc.NewClient.func1 (0xa83ba0) [select 446596h34m57.347287871s]
  Goroutine 316172 - Start: /src/daemon/job/active.go:437 github.com/zrepl/zrepl/daemon/job.(*ActiveSide).do.func1 (0xa9f440) [select 446596h34m57.347337672s]
  Goroutine 316174 - Start: /src/replication/driver/replication_driver.go:196 github.com/zrepl/zrepl/replication/driver.Do.func1 (0xa5c120) [semacquire 446596h34m57.34738676s]
  Goroutine 316190 - Start: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1265 google.golang.org/grpc/internal/transport.(*http2Client).keepalive (0x91f9c0) [select]
  Goroutine 316191 - Start: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:1196 google.golang.org/grpc/internal/transport.(*http2Client).reader (0x91f200) [IO wait]
  Goroutine 316192 - Start: /go/pkg/mod/google.golang.org/grpc@v1.17.0/internal/transport/http2_client.go:326 google.golang.org/grpc/internal/transport.newHTTP2Client.func3 (0x92ff40) [select]
  Goroutine 316440 - Start: /src/replication/driver/replication_stepqueue.go:84 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func1 (0xa5e6e0) [chan receive 446596h34m57.347588161s]
  Goroutine 316441 - Start: /src/replication/driver/replication_stepqueue.go:90 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func2 (0xa5e980) [select 446594h22m21.734475945s]
  Goroutine 316442 - Start: /src/replication/driver/replication_stepqueue.go:116 github.com/zrepl/zrepl/replication/driver.(*stepQueue).Start.func3 (0xa5ec20) [sync.Cond.Wait 446594h16m42.360244492s]
  Goroutine 316511 - Start: /src/replication/driver/replication_driver.go:428 github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1 (0xa5d520) [select 446595h59m14.429808475s]
  Goroutine 316514 - Start: /src/replication/driver/replication_driver.go:428 github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1 (0xa5d520) [select 446596h6m2.436981863s]
  Goroutine 316516 - Start: /src/replication/driver/replication_driver.go:428 github.com/zrepl/zrepl/replication/driver.(*attempt).doFilesystems.func1 (0xa5d520) [select 446595h57m6.68462078s]
  Goroutine 411513 - Start: /src/rpc/dataconn/stream/stream_conn.go:99 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage.func2 (0xa740c0) [select 446596h6m2.437083714s]
  Goroutine 411577 - Start: /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:76 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6f340) [select]
  Goroutine 411578 - Start: /src/rpc/dataconn/stream/stream_conn.go:63 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames (0xa71f80) [IO wait]
  Goroutine 411579 - Start: /src/rpc/dataconn/dataconn_client.go:160 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1 (0xa78d60) [chan receive 446596h6m2.437323416s]
  Goroutine 421848 - Start: /src/rpc/dataconn/stream/stream_conn.go:99 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage.func2 (0xa740c0) [select 446595h59m14.430279603s]
  Goroutine 421915 - Start: /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:76 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6f340) [select]
  Goroutine 421916 - Start: /src/rpc/dataconn/stream/stream_conn.go:63 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames (0xa71f80) [IO wait]
  Goroutine 421917 - Start: /src/rpc/dataconn/dataconn_client.go:160 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1 (0xa78d60) [chan receive 446595h59m14.430470155s]
  Goroutine 427432 - Start: /src/rpc/dataconn/stream/stream_conn.go:99 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).ReadStreamedMessage.func2 (0xa740c0) [select 446595h57m6.685231709s]
  Goroutine 427601 - Start: /src/rpc/dataconn/heartbeatconn/heartbeatconn.go:76 github.com/zrepl/zrepl/rpc/dataconn/heartbeatconn.(*Conn).sendHeartbeats (0xa6f340) [select]
  Goroutine 427602 - Start: /src/rpc/dataconn/stream/stream_conn.go:63 github.com/zrepl/zrepl/rpc/dataconn/stream.(*Conn).readFrames (0xa71f80) [IO wait]
  Goroutine 427603 - Start: /src/rpc/dataconn/dataconn_client.go:160 github.com/zrepl/zrepl/rpc/dataconn.(*Client).ReqRecv.func1 (0xa78d60) [chan receive 446595h57m6.685480549s]
[51 goroutines]

If you believe this is a separate issue, I'll open a new one.

@problame
Copy link
Member Author

Connection seems uninterrupted (restarting zrepl on receiver side wakes the sender up and triggers an error)

So the lockup in the sender goes away when the sender wakes up because the connection times out?

@problame
Copy link
Member Author

Also:

  • can you reproduce the lockup situation?
  • how often does the lockup occur?
  • when the daemon is locked up in the state you describe, what is the state of the TCP connections? ss -nt (please filter the output to only contain the zrepl connections)

@mafredri
Copy link
Contributor

mafredri commented Jan 19, 2021

So the lockup in the sender goes away when the sender wakes up because the connection times out?

It never times out unless I stop zrepl on the receiver, when I do stop it, the lockup goes away immediately. That makes me think the connection closure is the cause for the wakeup.

can you reproduce the lockup situation?

I seem to be able to reproduce it quite often when there is a backlog of snapshots (don't know the exact value, but let's say 50+ / dataset, 23 datasets). Just now I finished working through a backlog of 800+ snapshots/dataset (backlog was due to a lockup), during that time I restarted zrepl (due to new lockups) probably around 10 times to work through it all.

how often does the lockup occur?

It can work fine for weeks on end or lock up almost immediately, the lockups in normal operation seem a bit sporadic, but they do seem to happen often when the receiver (machine) has been restarted, but occasionally without restarts.

when the daemon is locked up in the state you describe, what is the state of the TCP connections?

I'll try to reproduce and get back to you on this, if my memory serves from inspecting it at one point, most were listed as established, some possibly waiting.

@problame
Copy link
Member Author

@mafredri thanks for the detailed response. Since I have seen similar stack traces on FreeBSD, we might be in luck in the sense that it's not an issue in FreeBSD or Linux. I noticed yesterday evening that my upgrade to FreeBSD 12.2 might have coincided with deployment of a zrepl build built by Go 1.15. Previous builds were Go 1.14. I'll deploy a 1.14-based build today and see whether there are any new lockups.

@mafredri
Copy link
Contributor

mafredri commented Jan 20, 2021

Good lead, I'll be sure to give the new build a try. I also managed to reproduce this again to get the ss output:

  • I stopped zrepl on the receiver side for a few hours
  • Started it back up and backlog was 16 snaps per dataset
  • Concurrent replication was enabled, and only one dataset got stuck (at snap 8/16)
❯ ss -nt
ESTAB  0       0                 [sender]:42092         [receiver]:8888
ESTAB  0       0                 [sender]:40510         [receiver]:8888

@problame
Copy link
Member Author

So before deploying the 1.14-based build, I wanted to see if I could provoke the lockup on FreeBSD for the 1.15-based build again by expanding the CPU mask to all available CPUs. It took two days, but it locked up today.

  • There is one CLOSED network connection and some rpc-related goroutines. That's nothing new, we've seen this before.
  • But here is also the following goroutine stuck in malloc, even though there's plenty of free memory (not a memory leak or anything):
  (dlv) bt
 0  0x000000000046fd1e in runtime.sys_umtx_op
    at /usr/local/go/src/runtime/sys_freebsd_amd64.s:21
 1  0x000000000046ca40 in runtime.systemstack_switch
    at /usr/local/go/src/runtime/asm_amd64.s:330
 2  0x000000000041b99a in runtime.gcStart
    at /usr/local/go/src/runtime/mgc.go:1316
 3  0x000000000040d850 in runtime.mallocgc
    at /usr/local/go/src/runtime/malloc.go:1153
 4  0x000000000040e0b8 in runtime.newobject
    at /usr/local/go/src/runtime/malloc.go:1195
 5  0x0000000000837417 in encoding/base64.NewEncoder
    at /usr/local/go/src/encoding/base64/base64.go:258
 6  0x0000000000837417 in github.com/zrepl/zrepl/daemon/logging/trace.genID
    at /home/circleci/project/daemon/logging/trace/trace_genID.go:30
 7  0x0000000000834e47 in github.com/zrepl/zrepl/daemon/logging/trace.WithSpan
    at /home/circleci/project/daemon/logging/trace/trace.go:307
 8  0x0000000000836b33 in github.com/zrepl/zrepl/daemon/logging/trace.WithSpanFromStackUpdateCtx
    at /home/circleci/project/daemon/logging/trace/trace_convenience.go:17
 9  0x00000000009c2c51 in github.com/zrepl/zrepl/util/semaphore.(*S).Acquire
    at /home/circleci/project/util/semaphore/semaphore.go:26
10  0x00000000009dd385 in github.com/zrepl/zrepl/endpoint.ListAbstractionsStreamed.func3.1
    at /home/circleci/project/endpoint/endpoint_zfs_abstraction.go:584
11  0x000000000083971b in github.com/zrepl/zrepl/daemon/logging/trace.WithTaskGroup.func1
    at /home/circleci/project/daemon/logging/trace/trace_convenience.go:54
12  0x00000000009dd5d5 in github.com/zrepl/zrepl/endpoint.ListAbstractionsStreamed.func3
    at /home/circleci/project/endpoint/endpoint_zfs_abstraction.go:583
13  0x000000000046e681 in runtime.goexit
    at /usr/local/go/src/runtime/asm_amd64.s:1374

I think I remember that when I tried to reproduce @mdtancsa 's problems on FreeBSD, the stack trace that got me thinking it's a Go runtime / FreeBSD issue was exactly one of these.

So there is most likely a Go runtime / FreeBSD bug, because being blocked forever on a mutex somewhere in Go's malloc can't be zrepl's fault.
So I'm going to do deploy the 1.14 based build and see how it plays out.

The other question is whether the Linux lockups are due to the same bug. I don't think they are. But let's keep them in this thread for now until we are certain what causes the FreeBSD bug.

@problame problame added platform and removed bug labels Apr 17, 2021
@problame problame modified the milestones: 0.4, 0.5 Apr 17, 2021
@mdtancsa
Copy link

mdtancsa commented Apr 17, 2021 via email

@problame problame pinned this issue Apr 18, 2021
@problame problame changed the title freebsd 12.2: lockup on push side FreeBSD >= 12.2: Go runtime deadlocks and/or panics on multicore systems Apr 18, 2021
@sporkman
Copy link

I just setup zrepl last night with a very simple snapshot and sink to a second local pool and found my "zrepl status" this morning was complaining it couldn't connect to the daemon.

Looking around, I found zrepl had crashed:

Jun 13 07:17:12 nnj3 zrepl[43371]: fatal: morestack on g0
Jun 13 07:17:12 nnj3 zrepl[43371]: SIGTRAP: trace trap
Jun 13 07:17:12 nnj3 zrepl[43371]: PC=0x4731c2 m=10 sigcode=1
Jun 13 07:17:12 nnj3 zrepl[43371]:
Jun 13 07:17:12 nnj3 zrepl[43371]: goroutine 0 [idle]:
Jun 13 07:17:12 nnj3 zrepl[43371]: runtime.abort()
Jun 13 07:17:12 nnj3 zrepl[43371]: 	/usr/local/go/src/runtime/asm_amd64.s:854 +0x2
Jun 13 07:17:12 nnj3 zrepl[43371]: runtime.morestack()
Jun 13 07:17:12 nnj3 zrepl[43371]: 	/usr/local/go/src/runtime/asm_amd64.s:425 +0x25
Jun 13 07:17:12 nnj3 zrepl[43371]:
Jun 13 07:17:12 nnj3 zrepl[43371]: goroutine 4 [running]:
Jun 13 07:17:12 nnj3 kernel: pid 43820 (zrepl), jid 0, uid 0: exited on signal 11 (core dumped)

I know nothing of "go", so not sure what else to provide here.

This is FreeBSD 12.2-p6.

@problame
Copy link
Member Author

@sporkman yeah, that looks like a manifestation of this bug.

@woodsb02 any chance of linking this issue, and the cpuset workaround, in UPDATING or similar?

@woodsb02
Copy link
Contributor

FYI, @lcook has taken over as maintainer of the FreeBSD zrepl port.

@lcook - please refer query from @problame above :)

@mdtancsa
Copy link

TBH, I think using CPUset to limit the amount of CPUs might cause other problems with clients giving up if you have a lot of clients. ( I have 444 datasets from 20 different servers). The way I use it on the server side is to run the program in a shell script that logs the failure of the daemon and just restarts it. About ~ 5% the daemon deadlocks and I kill and restart it when its in that condition. I check its status by connecting to the prometheus port to see if its responding. If its not, I kill it and restart the daemon

@avg-I
Copy link

avg-I commented Dec 13, 2021

FWIW, recently I "found" a regression in stable/12 when hunting for a memory corruption that seemed to be related to copy-on-write after a fork. I put found in quotes, because the regression was introduced and fixed a long time ago, but while the regression was merged to stable/12, the fix was not.

@mdtancsa
Copy link

FWIW, recently I "found" a regression in stable/12 when hunting for a memory corruption that seemed to be related to copy-on-write after a fork. I put found in quotes, because the regression was introduced and fixed a long time ago, but while the regression was merged to stable/12, the fix was not.

Interesting, is the proper fix in RELENG_12 now ? At the time of the bug, we did migrate from RELENG_11 to RELENG_12, but it was hard to tell as the bug started to manifest well after the migration and we also started to add more clients etc etc. If there is a patch to test, I would be happy to give it a spin.

@avg-I
Copy link

avg-I commented Dec 13, 2021

The problem isn't fixed yet, but hopefully will be very soon: https://reviews.freebsd.org/D33413

@mdtancsa
Copy link

The problem isn't fixed yet, but hopefully will be very soon: https://reviews.freebsd.org/D33413

OK, thanks! Is it in RELENG_13 as well ? We were slotted to update the box soon. But it would be nice to try this fix first to see if this is indeed the cause of the problems.

@avg-I
Copy link

avg-I commented Dec 13, 2021

The problem was fixed in CURRENT months before 13 was branched off and 13.0 was released.

@problame
Copy link
Member Author

@avg-I is this review the same thing that @emaste is talking about in this thread on the Go issue tracker? golang/go#46272 (comment)

@emaste
Copy link

emaste commented Dec 14, 2021

@problame no - @avg-I's review is an issue that was fixed before 13.0 released: freebsd/freebsd-src@3fd989da. The fix is now brought back to the stable/12 branch: freebsd/freebsd-src@1820ca2 and will be in 12.4. I expect we will have an errata update (in early January, after the holidays) for 12.3. If the change fixes this #411 issue it's FreeBSD >=12.2 <13.0 that's affected.

The issue I mention in golang/go#46272 was just fixed in FreeBSD main freebsd/freebsd-src@73b357b and hasn't yet been cherry-picked anywhere else. It should make it to the FreeBSD stable branches in the next day or two, and be included in the same errata update as the above.

@mdtancsa
Copy link

I have been running the commit for the past 5 days and no deadlocks or crashes. Over the weekend, the server in question runs quite a busy load and normally would result in at least 3-4 crashes a day. I would say this fixed the issue for me at this point.
FreeBSD 12.3-STABLE #14 r371237 Tue Dec 14 11:48:29 EST 2021
Thank you @avg-I for fixing this long standing bug!

@emaste
Copy link

emaste commented Dec 20, 2021

Thanks for checking @mdtancsa.

@problame problame changed the title FreeBSD >= 12.2: Go runtime deadlocks and/or panics on multicore systems FreeBSD >= 12.2, < 13.0: Go runtime deadlocks and/or panics on multicore systems Dec 29, 2021
@problame problame changed the title FreeBSD >= 12.2, < 13.0: Go runtime deadlocks and/or panics on multicore systems FreeBSD >= 12.2, < 13.0 / 12.3: Go runtime deadlocks and/or panics on multicore systems Dec 29, 2021
@problame
Copy link
Member Author

Since I upgraded to 13 a few weeks ago, I have run zrepl without errors and without a trimmed down cpuset. So, one more sign that the fix that @emaste linked to above did indeed fix this issue #411.

Given @mdtancsa 's positive experience on 12.3, I'll close this issue.
I will change the title to >= 12.2, < 13.0, and update once the errata for 12.3 comes out.

@problame problame changed the title FreeBSD >= 12.2, < 13.0 / 12.3: Go runtime deadlocks and/or panics on multicore systems FreeBSD >= 12.2, < 13.0: Go runtime deadlocks and/or panics on multicore systems Dec 29, 2021
@emaste
Copy link

emaste commented Jan 12, 2022

Errata updates with the fix are now available via freebsd-update with the fix.

  • FreeBSD 12.2-RELASE-p12
  • FreeBSD 12.3-RELEASE-p1

@problame problame unpinned this issue Aug 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants