Some workflows cannot be retried due to "java.lang.IllegalStateException: Duplicate key" #1552

Open
ysmazda opened this issue Apr 7, 2021 · 3 comments

Comments

@ysmazda

ysmazda commented Apr 7, 2021

Certain workflow definitions cause an error that prevents the workflow from being retried. No retry attempt is created because the server fails with "java.lang.IllegalStateException: Duplicate key".
I can retry such workflows from the command line with digdag retry --all, but not with digdag retry --resume.
I don't know the exact conditions that trigger the issue, but here are two workflow definitions that reproduce it.

+tasks:
  _retry: 1

  +task_1:
    echo>: task 1
  
  +task_2:
    fail>: task 2

+tasks:
  _retry: 1

  +task_1:
    echo>: task 1
  
  +task_2:
    echo>: task 2

  +task_3:
    fail>: task 3

These workflows fail on purpose, to reproduce the issue.
When I try to retry the attempt from the digdag WebUI, no new attempt is created. The digdag server returns 500 (Internal Server Error) and logs the following.

digdag server error log
2021-04-07 12:40:30.608 +0900 [ERROR] (XNIO-1 task-27): UT005023: Exception handling request to /api/attempts
org.jboss.resteasy.spi.UnhandledException: java.lang.IllegalStateException: Duplicate key +repro+tasks+task_1 (attempted merging values ResumingTask{sourceTaskId=35, fullName=+repro+tasks+task_1, config={"echo>":"task 1"}, updatedAt=2021-04-07T03:24:20Z, subtaskConfig={}, exportParams={}, resetStoreParams=[], storeParams={}, report=TaskReport{inputs=[], outputs=[]}, error={}} and ResumingTask{sourceTaskId=37, fullName=+repro+tasks+task_1, config={"echo>":"task 1"}, updatedAt=2021-04-07T03:24:20Z, subtaskConfig={}, exportParams={}, resetStoreParams=[], storeParams={}, report=TaskReport{inputs=[], outputs=[]}, error={}})
        at org.jboss.resteasy.core.ExceptionHandler.handleApplicationException(ExceptionHandler.java:78)
        at org.jboss.resteasy.core.ExceptionHandler.handleException(ExceptionHandler.java:222)
        at org.jboss.resteasy.core.SynchronousDispatcher.writeException(SynchronousDispatcher.java:179)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:422)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:213)
        at org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:228)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
        at org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
        at io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85)
        at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:129)
        at io.digdag.guice.rs.server.undertow.UndertowServer$SetListenAddressNameServletFilter.doFilter(UndertowServer.java:139)
        at io.undertow.servlet.core.ManagedFilter.doFilter(ManagedFilter.java:61)
        at io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(FilterHandler.java:131)
        at io.undertow.servlet.handlers.FilterHandler.handleRequest(FilterHandler.java:84)
        at io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62)
        at io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36)
        at io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:131)
        at io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46)
        at io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64)
        at io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60)
        at io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77)
        at io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
        at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:292)
        at io.undertow.servlet.handlers.ServletInitialHandler.access$100(ServletInitialHandler.java:81)
        at io.undertow.servlet.handlers.ServletInitialHandler$2.call(ServletInitialHandler.java:138)
        at io.undertow.servlet.handlers.ServletInitialHandler$2.call(ServletInitialHandler.java:135)
        at io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:48)
        at io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)
        at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:272)
        at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
        at io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:104)
        at io.undertow.server.Connectors.executeRootHandler(Connectors.java:202)
        at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:805)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.lang.IllegalStateException: Duplicate key +repro+tasks+task_1 (attempted merging values ResumingTask{sourceTaskId=35, fullName=+repro+tasks+task_1, config={"echo>":"task 1"}, updatedAt=2021-04-07T03:24:20Z, subtaskConfig={}, exportParams={}, resetStoreParams=[], storeParams={}, report=TaskReport{inputs=[], outputs=[]}, error={}} and ResumingTask{sourceTaskId=37, fullName=+repro+tasks+task_1, config={"echo>":"task 1"}, updatedAt=2021-04-07T03:24:20Z, subtaskConfig={}, exportParams={}, resetStoreParams=[], storeParams={}, report=TaskReport{inputs=[], outputs=[]}, error={}})
        at java.base/java.util.stream.Collectors.duplicateKeyException(Collectors.java:133)
        at java.base/java.util.stream.Collectors.lambda$uniqKeysMapAccumulator$1(Collectors.java:180)
        at java.base/java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
        at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
        at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
        at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
        at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:578)
        at io.digdag.core.workflow.TaskControl.addTasks(TaskControl.java:127)
        at io.digdag.core.workflow.TaskControl.addInitialTasksExceptingRootTask(TaskControl.java:62)
        at io.digdag.core.workflow.WorkflowExecutor.lambda$storeTasks$4(WorkflowExecutor.java:388)
        at io.digdag.core.database.DatabaseSessionStoreManager$DatabaseSessionControlStore.insertRootTask(DatabaseSessionStoreManager.java:1555)
        at io.digdag.core.workflow.WorkflowExecutor.storeTasks(WorkflowExecutor.java:386)
        at io.digdag.core.workflow.WorkflowExecutor.lambda$submitTasks$3(WorkflowExecutor.java:316)
        at io.digdag.core.database.DatabaseSessionStoreManager$DatabaseSessionControlStore.putAndLockSession(DatabaseSessionStoreManager.java:1612)
        at io.digdag.core.database.DatabaseSessionStoreManager$DatabaseSessionStore.lambda$putAndLockSession$3(DatabaseSessionStoreManager.java:1254)
        at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:192)
        at io.digdag.core.database.BasicDatabaseStoreManager.transaction(BasicDatabaseStoreManager.java:180)
        at io.digdag.core.database.DatabaseSessionStoreManager$DatabaseSessionStore.putAndLockSession(DatabaseSessionStoreManager.java:1252)
        at io.digdag.core.workflow.WorkflowExecutor.submitTasks(WorkflowExecutor.java:300)
        at io.digdag.core.workflow.WorkflowExecutor.submitWorkflow(WorkflowExecutor.java:219)
        at io.digdag.server.rs.AttemptResource.lambda$startAttempt$6(AttemptResource.java:270)
        at io.digdag.core.database.ThreadLocalTransactionManager.begin(ThreadLocalTransactionManager.java:263)
        at io.digdag.server.rs.AttemptResource.startAttempt(AttemptResource.java:244)
        at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invokeMain(DigdagTimedMethodInterceptor.java:58)
        at io.digdag.server.metrics.DigdagTimedMethodInterceptor.invoke(DigdagTimedMethodInterceptor.java:31)
        at jdk.internal.reflect.GeneratedMethodAccessor139.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:140)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249)
        at org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:236)
        at org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:406)
        ... 37 common frames omitted

The following examples do NOT cause the issue.

+tasks:
  +task_1:
    echo>: task 1
  
  +task_2:
    fail>: task 2

+tasks:
  _retry: 1

  +task_1:
    fail>: task 1
  
  +task_2:
    echo>: task 2

Based on these observations, I think the issue occurs when all of the following conditions hold.

  • _retry is set on a task
  • the task has two or more subtasks
  • a subtask other than the first one has failed
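
The hypothesis above can be written as a predicate and checked against the four example workflows. This is only a sketch of the reporter's conjecture; the class and method names are hypothetical and nothing here is Digdag code.

```java
import java.util.Arrays;
import java.util.List;

public class RetryResumeHypothesis {
    // Hypothetical helper: subtaskFailed.get(i) is true when the i-th subtask
    // failed in the original attempt.
    static boolean expectsDuplicateKey(boolean retrySetOnParent, List<Boolean> subtaskFailed) {
        // Conditions 1 and 2: _retry on the parent, and at least two subtasks.
        if (!retrySetOnParent || subtaskFailed.size() < 2) {
            return false;
        }
        // Condition 3: some subtask other than the first one failed.
        return subtaskFailed.subList(1, subtaskFailed.size()).contains(true);
    }

    public static void main(String[] args) {
        // The two workflows reported to trigger the error.
        System.out.println(expectsDuplicateKey(true, Arrays.asList(false, true)));
        System.out.println(expectsDuplicateKey(true, Arrays.asList(false, false, true)));
        // The two workflows reported NOT to trigger the error.
        System.out.println(expectsDuplicateKey(false, Arrays.asList(false, true)));
        System.out.println(expectsDuplicateKey(true, Arrays.asList(true, false)));
    }
}
```

The predicate agrees with all four examples, but four data points do not confirm that these conditions are necessary or sufficient.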

Versions

$ digdag --version
0.10.0
 
$ java --version
openjdk 11.0.7 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.7+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.7+10, mixed mode)

$ sw_vers
ProductName:    macOS
ProductVersion: 11.2.1
BuildVersion:   20D74
@hiroyuki-sato
Contributor

Hello, @ysmazda
Thank you for reporting the issue.

I could reproduce this error in my environment.

  • macOS: 11.2.3
  • Digdag: 0.10.0

Here are the steps to reproduce.

  1. digdag server -m
  2. clone retry_test
  3. digdag push retry_test
  4. digdag start retry_test retry_test --session now
  5. digdag retry 1 --resume --latest-revision

It seems that _retry creates tasks that have the same name.
That would explain the duplicate key error.
Caused by: java.lang.IllegalStateException: Duplicate key ResumingTask
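
The `Collectors.uniqKeysMapAccumulator` frame in the stack trace is consistent with a `Collectors.toMap` call that has no merge function and receives two entries with the same full task name. The following is a minimal sketch of that failure mode, not Digdag's actual code: `Resuming` is a hypothetical stand-in for `ResumingTask`, and the keep-latest variant is only one possible policy, not a proposed fix.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {
    // Hypothetical stand-in for ResumingTask, reduced to two fields seen in the log.
    static final class Resuming {
        final long sourceTaskId;
        final String fullName;

        Resuming(long sourceTaskId, String fullName) {
            this.sourceTaskId = sourceTaskId;
            this.fullName = fullName;
        }
    }

    // Two-argument toMap: throws IllegalStateException when two tasks share a fullName.
    static Map<String, Resuming> indexStrict(List<Resuming> tasks) {
        return tasks.stream().collect(Collectors.toMap(t -> t.fullName, t -> t));
    }

    // Three-argument toMap: the merge function resolves collisions.
    // Keeping the larger sourceTaskId is an assumption made for illustration.
    static Map<String, Resuming> indexKeepLatest(List<Resuming> tasks) {
        return tasks.stream().collect(Collectors.toMap(
                t -> t.fullName,
                t -> t,
                (a, b) -> a.sourceTaskId >= b.sourceTaskId ? a : b));
    }

    public static void main(String[] args) {
        // Same name and source task ids as in the reported error message.
        List<Resuming> tasks = Arrays.asList(
                new Resuming(35, "+repro+tasks+task_1"),
                new Resuming(37, "+repro+tasks+task_1"));
        try {
            indexStrict(tasks);
        } catch (IllegalStateException e) {
            System.out.println("strict: " + e.getMessage());
        }
        Resuming kept = indexKeepLatest(tasks).get("+repro+tasks+task_1");
        System.out.println("lenient keeps sourceTaskId=" + kept.sourceTaskId);
    }
}
```

Which of the duplicated tasks should win, if any, depends on the intended resume behavior.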

@yoyama , @komamitsu Could you take a look when you get a chance?


@seiyab
Contributor

seiyab commented Apr 24, 2021

I think the expected result is not obvious, and it should be discussed before this problem can be solved.

Examples

Workflow definition

+tasks:
  _retry: 1

  +task_1:
    echo>: task 1
  
  +task_2:
    fail>: task 2

Retry failed from failed task

+tasks+task_2
+tasks+task_2

Retry failed from the nearest ancestor with _retry

+tasks+task_1
+tasks+task_2
+tasks+task_1
+tasks+task_2

Retry failed from failed task, and _retry as if it were the first attempt

+tasks+task_2
+tasks+task_1
+tasks+task_2
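
The three candidate behaviors can be compared by generating the sequence of executed tasks for each one. This is a sketch under stated assumptions: the failing subtask is assumed to fail again on every run, each run stops at that subtask, and the `Policy` names and `plan` method are hypothetical labels for the three options above, not Digdag APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ResumePlans {
    // Hypothetical names for the three options listed above.
    enum Policy { FROM_FAILED_TASK, FROM_RETRY_ANCESTOR, FAILED_TASK_THEN_FULL_RETRY }

    // Returns the tasks executed by a resumed attempt, in order, for the initial
    // run plus `retries` retries triggered by _retry.
    static List<String> plan(Policy policy, List<String> subtasks, int failedIndex, int retries) {
        List<String> executed = new ArrayList<>();
        for (int run = 0; run <= retries; run++) {
            int start;
            switch (policy) {
                case FROM_FAILED_TASK:
                    start = failedIndex;                    // every run restarts at the failed task
                    break;
                case FROM_RETRY_ANCESTOR:
                    start = 0;                              // every run restarts the whole group
                    break;
                default:
                    start = (run == 0) ? failedIndex : 0;   // resume once, then retry the group
                    break;
            }
            // Each run stops at the subtask that fails again.
            executed.addAll(subtasks.subList(start, failedIndex + 1));
        }
        return executed;
    }

    public static void main(String[] args) {
        List<String> subtasks = Arrays.asList("+tasks+task_1", "+tasks+task_2");
        for (Policy p : Policy.values()) {
            System.out.println(p + ": " + plan(p, subtasks, 1, 1));
        }
    }
}
```

With `_retry: 1` and `+task_2` failing, the three policies yield the three listings above, in the same order.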

@KentFujii
Contributor

KentFujii commented Apr 22, 2022

I’m sure that "Retry failed from failed task" is the expected behavior for solving this problem.
My suggestion is that the two workflow definitions below should produce the same result, for idempotency.

+tasks:
  _retry: 1
  
  +task_1:
    sh>: echo "task 1"
  
  +task_2:
    sh>: exit 1

+tasks:
  
  +task_1:
    _retry: 1
    sh>: echo "task 1"
  
  +task_2:
    _retry: 1
    sh>: exit 1

