Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash with 0.5.0 on FreeBSD #629

Closed
ZeWaren opened this issue Sep 19, 2022 · 7 comments
Closed

Crash with 0.5.0 on FreeBSD #629

ZeWaren opened this issue Sep 19, 2022 · 7 comments
Labels
Milestone

Comments

@ZeWaren
Copy link

ZeWaren commented Sep 19, 2022

Greetings,

A couple of my zrepl instances periodically crash.

When started, they work properly and they crash after 1-2 days.

System:

# uname -a
FreeBSD randomserver 13.0-RELEASE-p7 FreeBSD 13.0-RELEASE-p7 #0: Mon Jan 31 18:24:03 UTC 2022     root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

Version:

# zrepl version
client: zrepl version=v0.5.0 go=go1.17.6 GOOS=freebsd GOARCH=amd64 Compiler=gc
server: zrepl version=v0.5.0 go=go1.17.6 GOOS=freebsd GOARCH=amd64 Compiler=gc

Crash summary:

Sep 19 02:44:58 randomserver zrepl[91190]: panic: end task: 1 active child tasks (run daemon with env var ZREPL_TRACE_DEBUG_ENABLED=1 for more details)
Sep 19 02:44:58 randomserver zrepl[91190]: : end task: task still has active child tasks
Sep 19 02:44:58 randomserver zrepl[91190]:
Sep 19 02:44:58 randomserver zrepl[91190]: goroutine 2174246 [running]:
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1.1(0xc0126f5680)
Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/daemon/logging/trace/trace.go:249 +0x525
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1()
Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/daemon/logging/trace/trace.go:272 +0x38
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1()
Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:41 +0x37
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run.func1({0xf25198, 0xc000426840}, {0xf1d4f0, 0xc0004f9220}, 0xc000426870)
Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/daemon/job/passive.go:179 +0x2ed
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/rpc.NewServer.func1.1({0xf25198, 0xc000426840}, {0xf1d540, 0xc0004f9200}, 0xc000426870)

Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/rpc/rpc_server.go:56 +0xcf
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/rpc/grpcclientidentity.NewInterceptors.func1({0xf25198, 0xc006af7c80}, {0xd63480, 0xc00470f140}, 0xc0004f90c0, 0xc0008c0b70)
Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/rpc/grpcclientidentity/authlistener_grpc_adaptor.go:136 +0x4b5
Sep 19 02:44:58 randomserver zrepl[91190]: github.com/zrepl/zrepl/replication/logic/pdu._Replication_ReplicationCursor_Handler({0xdba820, 0xc0000a2100}, {0xf25198, 0xc006af7c80}, 0xc007f54ae0, 0xc000599680)
Sep 19 02:44:58 randomserver zrepl[91190]:        github.com/zrepl/zrepl/replication/logic/pdu/pdu_grpc.pb.go:240 +0x16a
Sep 19 02:44:58 randomserver zrepl[91190]: google.golang.org/grpc.(*Server).processUnaryRPC(0xc000172700, {0xf32740, 0xc0000c5200}, 0xc00024ac00, 0xc0005997d0, 0x1489a60, 0x0)
Sep 19 02:44:58 randomserver zrepl[91190]:        google.golang.org/grpc@v1.35.0/server.go:1217 +0xe28
Sep 19 02:44:58 randomserver zrepl[91190]: google.golang.org/grpc.(*Server).handleStream(0xc000172700, {0xf32740, 0xc0000c5200}, 0xc00024ac00, 0x0)
Sep 19 02:44:58 randomserver zrepl[91190]:        google.golang.org/grpc@v1.35.0/server.go:1540 +0xcb3
Sep 19 02:44:58 randomserver zrepl[91190]: google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc0003d40b0, 0xc000172700, {0xf32740, 0xc0000c5200}, 0xc00024ac00)
Sep 19 02:44:58 randomserver zrepl[91190]:        google.golang.org/grpc@v1.35.0/server.go:878 +0xad
Sep 19 02:44:58 randomserver zrepl[91190]: created by google.golang.org/grpc.(*Server).serveStreams.func1
Sep 19 02:44:58 randomserver zrepl[91190]:        google.golang.org/grpc@v1.35.0/server.go:876 +0x1ec

Crash details: zrepl-crash.log

With a few hours up to a couple days, the crash is reproductible, so I'd be happy to try things in order to fix the issue.

@silenius
Copy link

I'm having the same issue on:

~/ sudo zrepl version      
client: zrepl version= go=go1.18.3 GOOS=freebsd GOARCH=amd64 Compiler=gc
server: zrepl version= go=go1.18.3 GOOS=freebsd GOARCH=amd64 Compiler=gc

~/ freebsd-version
13.1-RELEASE

logs:

Sep 23 11:53:05 srv zrepl[87052]: job=sync_srv_zsync subsystem=rpc.data span=Uv38$qQ4a$qQ4a.jQgg msg="cannot write send stream" err="frameconn: shutting down" 
Sep 23 11:53:05 srv zrepl[87051]: 2022-09-23T11:53:05+02:00 [ERRO][sync_srv_zsync][rpc.data][Uv38$qQ4a$qQ4a.jQgg]: cannot write send stream err="frameconn: shutting down"
Sep 23 13:30:49 srv sudo[97585]:   julien : 1 incorrect password attempt ; TTY=pts/13 ; PWD=/usr/home/julien ; USER=root ; COMMAND=/usr/bin/vi /jails/zsync/usr/local/etc/zrepl
Sep 23 13:33:21 srv zrepl[97624]: job=sync_srv_zsync subsystem=rpc.data span=Uv38$mDoH$mDoH.UU3+ msg="cannot write send stream" err="frameconn: shutting down" 
Sep 23 13:33:21 srv zrepl[97623]: 2022-09-23T13:33:21+02:00 [ERRO][sync_srv_zsync][rpc.data][Uv38$mDoH$mDoH.UU3+]: cannot write send stream err="frameconn: shutting down"
Sep 23 16:06:52 srv sudo[26089]:   julien : 3 incorrect password attempts ; TTY=pts/10 ; PWD=/usr/home/julien ; USER=root ; COMMAND=/usr/bin/vi /usr/local/etc/zrepl/zrepl.yml
Sep 24 06:53:03 srv zrepl[26470]: panic: end task: 1 active child tasks (run daemon with env var ZREPL_TRACE_DEBUG_ENABLED=1 for more details)
Sep 24 06:53:03 srv zrepl[26470]: : end task: task still has active child tasks
Sep 24 06:53:03 srv zrepl[26470]: 
Sep 24 06:53:03 srv zrepl[26470]: goroutine 1732039 [running]:
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1.1(0xc000ab9b00)
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/daemon/logging/trace/trace.go:249 +0x451
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/daemon/logging/trace.WithTask.func1()
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/daemon/logging/trace/trace.go:272 +0x2b
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/daemon/logging/trace.WithTaskAndSpan.func1()
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/daemon/logging/trace/trace_convenience.go:41 +0x2e
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/daemon/job.(*PassiveSide).Run.func1({0xe71f60?, 0xc000bfac90?}, {0xe6fc88, 0xc000bff780}, 0xc000bfacc0)
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/daemon/job/passive.go:179 +0x209
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/rpc.NewServer.func1.1({0xe71f60, 0xc000bfac90}, {0xe6fcd8?, 0xc000bff760}, 0xbfd500?)
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/rpc/rpc_server.go:56 +0xb7
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/rpc/grpcclientidentity.NewInterceptors.func1({0xe71f60, 0xc000bfabd0}, {0xc96220?, 0xc000ba8a40}, 0xc000bff620, 0xc00038ef48)
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/rpc/grpcclientidentity/authlistener_grpc_adaptor.go:136 +0x3c3
Sep 24 06:53:03 srv zrepl[26470]: github.com/zrepl/zrepl/replication/logic/pdu._Replication_ReplicationCursor_Handler({0xcf3380?, 0xc000338080}, {0xe71f60, 0xc000bfabd0}, 0xc000c05860, 0xc0004c8870)
Sep 24 06:53:03 srv zrepl[26470]:       github.com/zrepl/zrepl/replication/logic/pdu/pdu_grpc.pb.go:240 +0x138
Sep 24 06:53:03 srv zrepl[26470]: google.golang.org/grpc.(*Server).processUnaryRPC(0xc000110700, {0xe76128, 0xc000b56f00}, 0xc0006a4b00, 0xc0004c89c0, 0x1392a40, 0x0)
Sep 24 06:53:03 srv zrepl[26470]:       google.golang.org/grpc@v1.35.0/server.go:1217 +0xc8f
Sep 24 06:53:03 srv zrepl[26470]: google.golang.org/grpc.(*Server).handleStream(0xc000110700, {0xe76128, 0xc000b56f00}, 0xc0006a4b00, 0x0)
Sep 24 06:53:03 srv zrepl[26470]:       google.golang.org/grpc@v1.35.0/server.go:1540 +0xa1b
Sep 24 06:53:03 srv zrepl[26470]: google.golang.org/grpc.(*Server).serveStreams.func1.2()
Sep 24 06:53:03 srv zrepl[26470]:       google.golang.org/grpc@v1.35.0/server.go:878 +0x98
Sep 24 06:53:03 srv zrepl[26470]: created by google.golang.org/grpc.(*Server).serveStreams.func1
Sep 24 06:53:03 srv zrepl[26470]:       google.golang.org/grpc@v1.35.0/server.go:876 +0x28a

@problame
Copy link
Member

I suspect this is a duplicate of #607 fixed by 193abbe which is already in master. I have been meaning to issue a zrepl release for months now, but life is quite busy right now.

Can you cherry pick that patch and try it out on your side?
Also, run the daemon with env var ZREPL_TRACE_DEBUG_ENABLED=1 to make zrepl capture & dump more information next time it crashes.

@problame problame added the bug label Sep 25, 2022
@problame problame added this to the 0.6 milestone Sep 25, 2022
@ZeWaren
Copy link
Author

ZeWaren commented Sep 29, 2022

Thanks for the info.

I'm now running a patched version of the latest port with 193abbe as a test.

We'll see in a few days if it helped.

@silenius
Copy link

silenius commented Oct 3, 2022

I'm also running a patched version and it seems to fix the issue. Maybe it would be worth to release a 0.5.1 with this patch ..?

@ZeWaren
Copy link
Author

ZeWaren commented Oct 4, 2022

It's been 5 days and my instance is also still running. The patch seems to be working so far.

@ahjohannessen
Copy link

A release would be appreciated 🙏

@problame
Copy link
Member

problame commented Oct 6, 2022

Yes, a release will happen this month. Sorry, very busy with personal stuff ATM.

Closing this as duplicate, see above.

@problame problame closed this as completed Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants