@vitarb vitarb commented Nov 5, 2020

Recently we added a connection backoff resetter to mitigate long gRPC backoff intervals (look here for details).

The resetter uses a thread pool that periodically resets the connection backoff, so that threads detect backends faster.
Initially, the threads in that pool were created by the default thread factory, which produces user threads (as opposed to daemon threads), so they blocked JVM shutdown when the main program failed.

This caused unwanted behavior: on our Java canary it showed up as a "stuck JVM" with a crashed main thread whenever the process was unable to start up properly (e.g. could not register the namespace).

With this change, the resetter threads are daemon threads and no longer block JVM exit.
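
For illustration, here is a minimal sketch of the idea, not the actual code from this PR. It assumes a single-threaded scheduled executor; the class name `DaemonResetterSketch`, the helpers `daemonFactory` and `newResetter`, the thread name prefix, and the 10-second period are all hypothetical, and the `Runnable` stands in for whatever call actually resets the gRPC connection backoff.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonResetterSketch {

  /** Hypothetical factory: names each thread and marks it as a daemon. */
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
      // Daemon threads do not keep the JVM alive once all user threads finish.
      t.setDaemon(true);
      return t;
    };
  }

  /** Hypothetical helper: runs resetTask periodically on a daemon thread. */
  static ScheduledExecutorService newResetter(Runnable resetTask, long periodSeconds) {
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(daemonFactory("backoff-resetter"));
    executor.scheduleWithFixedDelay(resetTask, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    return executor;
  }

  public static void main(String[] args) {
    // Placeholder task; a real resetter would reset the channel's connection backoff here.
    newResetter(() -> System.out.println("resetting connection backoff"), 10);
    // main returns without shutting the executor down. Because the pool thread
    // is a daemon, the JVM can still exit; with the default thread factory the
    // pool thread would be a user thread and would keep the process alive.
  }
}
```

With `Executors.defaultThreadFactory()` in place of `daemonFactory(...)`, the same program would hang after `main` returns (or throws), which matches the "stuck JVM" symptom described above.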

Example canary startup log after the change when the backend is unavailable (expected behavior: the JVM exits if it is unable to create a namespace):

Exception in thread "main" io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at io.temporal.api.workflowservice.v1.WorkflowServiceGrpc$WorkflowServiceBlockingStub.registerNamespace(WorkflowServiceGrpc.java:2574)
	at io.temporal.canary.Initializer.registerNamespace(Initializer.java:150)
	at io.temporal.canary.Initializer.<init>(Initializer.java:74)
	at io.temporal.canary.Canary.start(Canary.java:22)
	at io.temporal.canary.Main.main(Main.java:42)
Caused by: io.grpc.netty.shaded.io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /127.0.0.1:7233
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
	at io.grpc.netty.shaded.io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
	at io.grpc.netty.shaded.io.netty.channel.unix.Socket.finishConnect(Socket.java:243)
	at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:672)
	at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:649)
	at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:529)
	at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
	at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
	at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:832)

Execution failed for task ':Main.main()'.
> Process 'command '/home/fury/.jdks/openjdk-14.0.1/bin/java'' finished with non-zero exit value 1

@vitarb vitarb requested a review from mastermanu November 5, 2020 03:31
@mastermanu mastermanu changed the title Use deamon threads for connection backoff resetor Use deamon threads for connection backoff resetter Nov 5, 2020
@vitarb vitarb requested a review from mfateev November 5, 2020 03:34
@mastermanu mastermanu changed the title Use deamon threads for connection backoff resetter Use daemon threads for connection backoff resetter Nov 5, 2020
@vitarb vitarb force-pushed the use-deamon-threads branch from 5bb4c7d to 2d830f1 Compare November 5, 2020 03:45
@vitarb vitarb merged commit 370bb6e into temporalio:master Nov 5, 2020
@vitarb vitarb deleted the use-deamon-threads branch November 5, 2020 03:54