
Possible race condition when canceling a workflow instance #4352

Closed
Zelldon opened this issue Apr 22, 2020 · 1 comment · Fixed by #4590
Assignees
saig0

Labels
kind/bug (Categorizes an issue or PR as a bug) · scope/broker (Marks an issue or PR to appear in the broker section of the changelog) · severity/mid (Marks a bug as having a noticeable impact but with a known workaround) · support (Marks an issue as related to a customer support request)

Comments

@Zelldon
Member

Zelldon commented Apr 22, 2020

Describe the bug

A cloud user reported that Operate seemed to be out of sync. The real problem, however, was that the workflow instance got stuck while it was being canceled.

Imagine the following workflow:

[image: multiBug workflow diagram]

Task B is completed, and while the completion is processed and the next sequence flow is taken, the user cancels the workflow instance. What can happen now is that the cancellation does not clean up all scopes correctly and the instance gets stuck. In our case, only Task A and the sub-process were terminated correctly; the multi-instance body and the workflow instance were still alive.

To Reproduce

This is also reproducible via an engine unit test and the following process: multiBug.bpmn.txt.
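
For reference, the attached model could be built roughly like this with the Zeebe model builder. This is a minimal sketch, assuming the 0.23 builder API (zeebeTaskType, zeebeInputCollection) and hypothetical element ids ("fork", "join", "task-a", "task-b"); the exact structure is inferred from the test below and the fix in #4590.

// Hypothetical sketch of the multiBug model: a parallel multi-instance
// embedded sub-process that forks into task A and task B and joins again
// at a parallel gateway.
final var model =
    Bpmn.createExecutableProcess("process")
        .startEvent()
        .subProcess(
            "sub-process",
            s ->
                s.multiInstance(
                    m -> m.parallel().zeebeInputCollection("items").zeebeInputElement("item")))
        .embeddedSubProcess()
        .startEvent()
        .parallelGateway("fork")
        .serviceTask("task-a", t -> t.zeebeTaskType("a"))
        .parallelGateway("join")
        .moveToNode("fork")
        .serviceTask("task-b", t -> t.zeebeTaskType("b"))
        .connectTo("join")
        .endEvent()
        .subProcessDone()
        .endEvent()
        .done();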

Test
/*
 * Copyright © 2020  camunda services GmbH (info@camunda.com)
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *        http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.
 *
 */

package io.zeebe.engine.processor.workflow.multiinstance;

import io.zeebe.engine.util.EngineRule;
import io.zeebe.engine.util.RecordToWrite;
import io.zeebe.model.bpmn.Bpmn;
import io.zeebe.protocol.record.intent.JobIntent;
import io.zeebe.protocol.record.intent.WorkflowInstanceIntent;
import io.zeebe.protocol.record.value.BpmnElementType;
import io.zeebe.test.util.record.RecordingExporter;
import io.zeebe.test.util.record.RecordingExporterTestWatcher;
import java.util.Arrays;
import java.util.stream.Collectors;
import org.junit.ClassRule;
import org.junit.Rule;
import org.junit.Test;

public class MultiInstanceBugTest {

  @ClassRule public static final EngineRule ENGINE = EngineRule.singlePartition();

  public static final String TASK_ELEMENT_ID = "task";
  private static final String PROCESS_ID = "process";
  private static final String SUB_PROCESS_ELEMENT_ID = "sub-process";
  private static final String JOB_TYPE = "test";
  private static final String INPUT_COLLECTION = "items";
  private static final String INPUT_ELEMENT = "item";

  @Rule
  public final RecordingExporterTestWatcher recordingExporterTestWatcher =
      new RecordingExporterTestWatcher();

  @Test
  public void shouldTerminateWorkflowInstanceOnCancel() {
    // given: deploy the workflow from the attached BPMN file
    final var resourceAsStream =
        MultiInstanceBugTest.class.getResourceAsStream("/workflows/multiBug.bpmn");
    final var bpmnModelInstance = Bpmn.readModelFromStream(resourceAsStream);
    ENGINE.deployment().withXmlResource(bpmnModelInstance).deploy();

    final long workflowInstanceKey =
        ENGINE
            .workflowInstance()
            .ofBpmnProcessId(PROCESS_ID)
            .withVariable(INPUT_COLLECTION, Arrays.asList(10, 20, 30))
            .create();

    final var instanceRecordValueRecord =
        RecordingExporter.workflowInstanceRecords()
            .withIntent(WorkflowInstanceIntent.ELEMENT_ACTIVATED)
            .withElementType(BpmnElementType.PROCESS)
            .getFirst();

    // wait until both jobs are created, then complete task B
    final var taskA =
        RecordingExporter.jobRecords().withIntent(JobIntent.CREATED).withType("a").getFirst();
    final var taskB =
        RecordingExporter.jobRecords().withIntent(JobIntent.CREATED).withType("b").getFirst();
    ENGINE.writeRecords(RecordToWrite.command().job(JobIntent.COMPLETE).key(taskB.getKey()));

    RecordingExporter.jobRecords().withIntent(JobIntent.COMPLETED).withType("b").getFirst();
    //    ENGINE.stop();

    // when: cancel the workflow instance while the completion of task B is still
    // being processed
    ENGINE.writeRecords(
        RecordToWrite.command()
            .key(workflowInstanceKey)
            .workflowInstance(
                WorkflowInstanceIntent.CANCEL, instanceRecordValueRecord.getValue()));

    //    ENGINE.start();

    // then: the whole instance should be terminated
    final var instanceCanceled =
        RecordingExporter.workflowInstanceRecords()
            .withIntent(WorkflowInstanceIntent.ELEMENT_TERMINATED)
            .withElementType(BpmnElementType.PROCESS)
            .getFirst();

    // collect all terminated elements for inspection
    final var terminatedElements =
        RecordingExporter.workflowInstanceRecords()
            .withIntent(WorkflowInstanceIntent.ELEMENT_TERMINATED)
            .collect(Collectors.toList());
  }
}

Be aware that this is a race condition, which means that the test might not fail on the first try.
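
The commented-out ENGINE.stop() / ENGINE.start() lines hint at a way to provoke the interleaving more reliably: stop the stream processor, write both the job-complete command and the cancel command to the log, and only then resume processing. A hedged sketch, assuming writeRecords accepts multiple records in one call:

// Variation of the test above: put both commands on the log before the
// processor resumes, so the cancel is processed right after the completion
// of task B instead of racing with it by chance.
ENGINE.stop();
ENGINE.writeRecords(
    RecordToWrite.command().job(JobIntent.COMPLETE).key(taskB.getKey()),
    RecordToWrite.command()
        .key(workflowInstanceKey)
        .workflowInstance(
            WorkflowInstanceIntent.CANCEL, instanceRecordValueRecord.getValue()));
ENGINE.start();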

Expected behavior

The workflow instance can be terminated without any problems.

Log/Stacktrace
We have extracted the records from the failed scenario; you can find them in records.txt.

We see that only the task and the sub-process are terminated and that the sequence flow after task B is taken. In fact, the same sequence flow seems to be taken twice, but with different scope ids, which may be related to the problem. Be aware that we cannot share the actual BPMN process here to protect our user, so the output above does not match the model shown in the example.

However, if we run the test, we can see similar output:
output-test.txt
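
One way to make the stuck scopes visible is to diff the activated element instances against those that reached a terminal state. A minimal sketch using the same RecordingExporter API as the test above; the record limit of 500 is an assumed bound, not part of the original test:

// Hedged sketch: compute which activated element instances never completed
// or terminated. In the failing scenario, this set still contains the
// multi-instance body and the workflow instance itself.
final var records =
    RecordingExporter.workflowInstanceRecords().limit(500).collect(Collectors.toList());

final var stuck = new java.util.HashSet<Long>();
for (final var record : records) {
  if (record.getIntent() == WorkflowInstanceIntent.ELEMENT_ACTIVATED) {
    stuck.add(record.getKey());
  } else if (record.getIntent() == WorkflowInstanceIntent.ELEMENT_COMPLETED
      || record.getIntent() == WorkflowInstanceIntent.ELEMENT_TERMINATED) {
    stuck.remove(record.getKey());
  }
}
System.out.println("stuck element instances: " + stuck);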

Environment:

  • Zeebe Version: 0.23.0
@Zelldon Zelldon added kind/bug Categorizes an issue or PR as a bug scope/broker Marks an issue or PR to appear in the broker section of the changelog labels Apr 22, 2020
@npepinpe npepinpe added severity/low Marks a bug as having little to no noticeable impact for the user Impact: Usability and removed severity/low Marks a bug as having little to no noticeable impact for the user labels Apr 27, 2020
@menski menski added the support Marks an issue as related to a customer support request label Apr 30, 2020
@menski
Contributor

menski commented Apr 30, 2020

Support Case: https://jira.camunda.com/browse/SUPPORT-7623

Waiting for customer to prioritize

@saig0 saig0 added severity/mid Marks a bug as having a noticeable impact but with a known workaround and removed Severity: Major labels May 15, 2020
@saig0 saig0 self-assigned this May 25, 2020
ghost pushed a commit that referenced this issue May 26, 2020
4590: chore(engine): migrate sub-process processor r=saig0 a=saig0

# Description

* migrate sub-process processor
* fix termination of an embedded sub-process with a waiting token on a joining parallel gateway
* clean up tests for embedded sub-process

## Related issues

closes #4474 
closes #4400 
closes #4352


Co-authored-by: Philipp Ossler <philipp.ossler@gmail.com>
@ghost ghost closed this as completed in d2e92a6 May 26, 2020
This issue was closed.