[Bug] Worker.finalize_shutdown seems to hang when poll never succeeded due to server permission failure #667

Open
cretz opened this issue Jan 10, 2024 · 1 comment
Labels
bug Something isn't working

cretz commented Jan 10, 2024

Describe the bug

Originally reported at temporalio/sdk-python#459.

To reproduce, first run a local Temporal server, then edit https://github.com/temporalio/samples-go/blob/main/grpc-proxy/proxy-server/main.go so that the gRPC dial call reads:

	// Imports used here beyond grpc and insecure: "context", "strings",
	// "google.golang.org/grpc/codes", and "google.golang.org/grpc/status".
	grpcClient, err := grpc.Dial(
		upstreamFlag,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		// grpc.WithUnaryInterceptor(clientInterceptor),
		// Reject every WorkflowService Poll* call; pass all other calls through.
		grpc.WithUnaryInterceptor(func(
			ctx context.Context, method string,
			req, reply any,
			cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
			if strings.HasPrefix(method, "/temporal.api.workflowservice.v1.WorkflowService/Poll") {
				return status.Error(codes.PermissionDenied, "DENIED!")
			}
			return invoker(ctx, method, req, reply, cc, opts...)
		}),
	)

This simulates a permission-denied response from upstream. Run the proxy, e.g. go run ./grpc-proxy/proxy-server; by default it listens on 127.0.0.1:8081 and forwards to 127.0.0.1:7233. Then run the Python script from that issue, pointed at the proxy:

import asyncio
from temporalio import activity, client, workflow, worker

@activity.defn
async def a() -> None:
    pass


@workflow.defn
class Workflow:
    @workflow.run
    async def run(self) -> None:
        pass


async def main():
    c = await client.Client.connect(
        "127.0.0.1:8081",
        rpc_metadata={"authorization": "wrong_token"},
        # tls=True,
    )
    w = worker.Worker(
        c,
        task_queue="default",
        activities=[a],
        workflows=[Workflow],
    )
    await w.run()


if __name__ == "__main__":
    asyncio.run(main())

After about a minute, the worker fails fatally because of too many permission denials. However, the worker process then reaches await finalize_shutdown and hangs there. Feel free to add logs inside the run call in .venv/Lib/site-packages/temporalio/worker/_worker.py to see how far it gets.
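
As an alternative to editing the installed package, the run call could be bounded with a deadline so the script reports whether it returned, raised, or got stuck. The following is only a rough sketch: `run_with_deadline` is a hypothetical helper that is not part of temporalio, and `hangs_forever` and `fails` are stand-ins for the two behaviors of `w.run()`.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_deadline(
    run: Callable[[], Awaitable[None]], deadline_s: float
) -> str:
    """Start run() and report "returned", "raised <Type>", "cancelled", or "hung".

    Hypothetical diagnostic helper; not part of temporalio.
    """
    task = asyncio.ensure_future(run())
    # asyncio.wait does not cancel the task when the timeout elapses.
    done, _pending = await asyncio.wait({task}, timeout=deadline_s)
    if not done:
        # Best-effort cancel; deliberately not awaited, since the call under
        # test may not respond to cancellation.
        task.cancel()
        return "hung"
    if task.cancelled():
        return "cancelled"
    err = task.exception()
    if err is not None:
        return f"raised {type(err).__name__}"
    return "returned"


async def hangs_forever() -> None:
    # Stand-in for a run() that never completes.
    await asyncio.Event().wait()


async def fails() -> None:
    # Stand-in for a run() that surfaces the fatal worker error.
    raise RuntimeError("Workflow worker failed")


if __name__ == "__main__":
    print(asyncio.run(run_with_deadline(hangs_forever, 0.05)))
    print(asyncio.run(run_with_deadline(fails, 0.05)))
```

In the script above, this would mean replacing `await w.run()` with something like `print(await run_with_deadline(w.run, 120))`, where 120 seconds is an assumed value chosen to exceed the roughly one minute before the fatal error. If the underlying call does not respond to cancellation, `asyncio.run` may still block on exit, so this only reports the hang and does not work around it.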

pwalessi-dell commented May 6, 2024

It looks like there is a race condition or deadlock. Sometimes when I run it, I get RuntimeError: Workflow worker failed and the test app exits. Other times, it hangs at the line worker.finalize_shutdown().await; in finalize_shutdown in worker.rs.
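
Since the outcome varies between runs, it may help to run the reproduction several times and tally how often it exits versus hangs. A minimal sketch, assuming the process is treated as hung once it exceeds a timeout; `classify_run` and `tally` are hypothetical helper names, not part of any Temporal SDK.

```python
import subprocess
from collections import Counter
from typing import Sequence


def classify_run(cmd: Sequence[str], timeout_s: float) -> str:
    """Run cmd once; return "hung" on timeout, otherwise "exit <code>"."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "hung"
    return f"exit {proc.returncode}"


def tally(cmd: Sequence[str], runs: int, timeout_s: float) -> Counter:
    """Repeat the command and count how many runs fall into each outcome."""
    return Counter(classify_run(cmd, timeout_s) for _ in range(runs))
```

For example, if the Python script from the issue were saved as `repro.py` (a hypothetical filename), `tally([sys.executable, "repro.py"], runs=10, timeout_s=180.0)` would return a count per outcome. The timeout needs to be comfortably longer than the time the worker takes to hit the fatal error, otherwise runs that would have exited are miscounted as hangs.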
