Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't close Gateway with ongoing requests #1518

Closed
Zelldon opened this issue Oct 16, 2018 · 2 comments · Fixed by #1763 or #1801
Closed

Can't close Gateway with ongoing requests #1518

Zelldon opened this issue Oct 16, 2018 · 2 comments · Fixed by #1763 or #1801
Assignees

Comments

@Zelldon
Copy link
Member

Zelldon commented Oct 16, 2018

Scenario:

  • Start a Broker with embedded Gateway
  • Create a Job
  • Create new Worker
  • Stop the broker with the Gateway
  • -> Deadlock Gateway can't be closed
    clientRule.getJobClient().newCreateCommand().jobType("test").send().join();
    final RecordingJobHandler jobHandler = new RecordingJobHandler();
    clientRule.getJobClient().newWorker().jobType("test").handler(jobHandler).open();
    brokerRule.stopBroker();

If we remove line 2 and 3 it works.

@menski
Copy link
Contributor

menski commented Oct 17, 2018

Problem

This is not related to job workers but in general if there is an ongoing request during shutdown. The reason for this is the order in which we shutdown our components:

  1. Service Container close, which closes all stream processors, i.e. no requests are processed and answered anymore
    https://github.com/zeebe-io/zeebe/blob/1f7f5103b71c32904a8d87ff49828f6889b4ce2b/broker-core/src/main/java/io/zeebe/broker/system/SystemContext.java#L218
  2. Close the gateway
    1. First we close the broker client, which drops all broker requests without failing them
      https://github.com/zeebe-io/zeebe/blob/1f7f5103b71c32904a8d87ff49828f6889b4ce2b/gateway/src/main/java/io/zeebe/gateway/Gateway.java#L109
    2. Shutdown the gRPC server and await its termination
      https://github.com/zeebe-io/zeebe/blob/1f7f5103b71c32904a8d87ff49828f6889b4ce2b/gateway/src/main/java/io/zeebe/gateway/Gateway.java#L114-L116

The server.shutdown() should ensure a graceful shutdown with allowing ongoing requests to finish. This will never terminate if there where open requests in our scenario as we already closed the stream processors. And the requests will also never fail as we already closed the broker client.

Possible solution

No matter what solution we choose we have to close the gRPC server before closing the broker client in the gateway. As the client is required to complete the ongoing requests. This will at least ensure that requests timeout and the server can terminate, even if the stream processors are already shutdown.
To allow graceful shutdowns of the embedded gateway we have to make sure to close the gateway as first component before stopping anything else, so we ensure that all in flight requests are completed before shutting down the internal processing.

As a quick dirty workaround we can use server.shutdownNow() to also terminate ongoing gRPC requests.

@menski menski changed the title Can't close Gateway with an active Job worker Can't close Gateway with ongoing requests Oct 17, 2018
@ghost ghost assigned Zelldon Oct 17, 2018
@ghost ghost added needs review and removed ready labels Oct 17, 2018
menski added a commit that referenced this issue Oct 22, 2018
chore(qa): remove topic subscription from it tests

 * replace in most test cases the ClientRule with the
 GrpcClientRule
 * rename ClientRule to GatewayRule
 * create AssertionHelper for the usage of the RecordExporter
 * #1518 ignores some tests where gateway can't be closed
 * to remove gateway from clustering rule etc we have to do this #1520

closes #1512
closes #1525
@menski menski assigned MiguelPires and unassigned Zelldon Nov 29, 2018
@ghost ghost added needs review and removed in progress labels Dec 7, 2018
ghost pushed a commit that referenced this issue Dec 11, 2018
1763: fix(broker-core): close gateway before service containers r=MiguelPires a=MiguelPires

The gateway was being closed after the service containers so it would hang since these would never terminate. I separated the gateway's resource releasing delegate from the others, so we could close it before the service containers.
closes #1518 


Co-authored-by: Miguel Pires <miguel.pires@camunda.com>
@ghost ghost closed this as completed in #1763 Dec 11, 2018
@ghost ghost removed the Status: Needs Review label Dec 11, 2018
@Zelldon
Copy link
Member Author

Zelldon commented Dec 17, 2018

There are still some tests ignored

@Zelldon Zelldon reopened this Dec 17, 2018
@ghost ghost assigned menski Dec 17, 2018
@ghost ghost added the Status: In Progress label Dec 17, 2018
ghost pushed a commit that referenced this issue Dec 18, 2018
1801: Reenable reprocessing tests related to job activation and incidents r=Zelldon a=menski

closes #1518 
related to #1033 


Co-authored-by: Sebastian Menski <sebastian.menski@camunda.com>
@ghost ghost closed this as completed in #1801 Dec 18, 2018
@ghost ghost removed the Status: Needs Review label Dec 18, 2018
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants