Directory extraction task in workflow extracts unexpected items #1816

Open
ishizuka-mihoko opened this issue Sep 21, 2023 · 2 comments

@ishizuka-mihoko

I have a question about Digdag workflows.
In a .dig file, I have the following configuration:

_export:
  plugin:
    repositories:
      - https://jitpack.io/
    dependencies:
      - com.github.takemikami:digdag-plugin-shresult:0.0.3

+find_dirs:
  sh_result>: |
    find ./ -maxdepth 1 -type d -exec basename {} \; | grep -v "^.$" | grep -E '^\[' | sort -u | tr '\n' ',' | sed 's/,$//'
  destination_variable: dirs
  stdout_format: text

+call_dig:
  for_each>:
    dir: "${dirs.split(',')}"
  _parallel: true
  _do:
    +loop_dig:
      call>: ${dir}/unload.dig

In the ‘+find_dirs’ task, I’m attempting to extract directory names that start with ‘[’ and store them in the ‘dirs’ variable. However, during Digdag execution, there are instances where items other than directory names starting with ‘[’ are being stored in ‘dirs’.
Is this potentially a bug in Digdag? I would appreciate your confirmation.
Here are the ‘dirs’ values that I want to extract and the ‘dirs’ values that were mistakenly extracted in this operation:

Desired ‘dirs’:
dirs: [foo1]bar1,[foo2]bar2,[foo3]bar3,[foo4]bar4
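For comparison, a list in the desired format can also be built with a shell glob alone, without `find` and `grep`. This is only a sketch of an alternative, not the command the workflow runs: `list_bracket_dirs` is a hypothetical helper name, and it assumes directory names contain no commas or newlines.

```shell
#!/bin/sh
# Hypothetical helper: print a comma-separated list of the immediate
# subdirectories of $1 whose names start with '['.
list_bracket_dirs() {
  (
    cd "$1" || exit 1
    out=
    # '\[' is a literal bracket; the trailing '/' limits matches to directories.
    # Note: unlike 'find -type d', this also matches symlinks to directories.
    for d in \[*/; do
      [ -d "$d" ] || continue   # skips the unexpanded pattern when nothing matches
      name=${d%/}               # drop the trailing slash added by the glob
      out="${out:+$out,}$name"  # join with commas, no trailing comma
    done
    printf '%s' "$out"
  )
}
```

Because the glob expands in sorted order and the loop joins the names itself, the `sort`, `tr`, and `sed` stages are not needed in this variant.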

Actual ‘dirs’ value, example 1 (a log line echoing the command from the executed workflow’s .dig file was captured):

dirs: >+ 
2023-09-14 21:45:48 +0000 [INFO] (1913@[0:test_wf]+test_coordinator+call_dig^sub^sub+find_dirs) io.digdag.core.agent.OperatorManager: sh_result>: find ./ -maxdepth 1 -type d -exec basename {} \; | grep -v "^.$" | grep -E '^\[' | sort -u | tr '\n' ',' | sed 's/,$//' 

Actual ‘dirs’ value, example 2 (a log line from another project running at the same time was captured):

dirs: >- 
2023-09-07 09:15:02 +0000 [INFO] (5117@[0:cost_alert]+cost_change+scripts+notice) io.digdag.core.agent.OperatorManager: sh>: cd cost_alert python3 main.py -t increase [foo1]bar1,[foo2]bar2,[foo3]bar3,[foo4]bar4
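
Both bad values above contain a Digdag log line rather than a plain comma-separated list. Until the cause is known, one defensive option is to validate the value before looping over it. A minimal sketch, assuming every expected entry starts with `[` and contains no commas or whitespace; `is_valid_dirs` is a hypothetical helper and is not part of Digdag or the plugin.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if $1 is a non-empty, single-line,
# comma-separated list whose entries all start with '['.
is_valid_dirs() {
  # Reject any value that contains a newline (the bad values are multi-line).
  case $1 in
    *'
'*) return 1 ;;
  esac
  # grep -x requires the whole line to match: '[...' entries separated by commas.
  printf '%s\n' "$1" | grep -Eqx '\[[^,[:space:]]*(,\[[^,[:space:]]*)*'
}
```

A check like this could run as its own step and fail the task when the value is malformed, so that a bad value stops the run instead of reaching `call>`.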
@hiroyuki-sato
Contributor

Hello, @ishizuka-mihoko

To investigate this, we need details about your environment and the steps to reproduce the issue. I tried to reproduce the problem, but it works as expected in my environment.

The sh_result> operator is not part of the Digdag project. It is a third-party plugin.

  • digdag: 0.10.5
  • Java 1.8 (Zulu)
  • OS: macOS 13.5.2
find * -type f -print
[foo1]/unload.dig
[foo2]/unload.dig
[foo3]/unload.dig
hoge/unload.dig
test.dig
_export:
  plugin:
    repositories:
      - https://jitpack.io/
    dependencies:
      - com.github.takemikami:digdag-plugin-shresult:0.0.3

+find_dirs:
  sh_result>: |
    find . -maxdepth 1 -type d -exec basename {} \; | grep -v "^.$" | grep -E '^\[' | sort -u | tr '\n' ',' | sed 's/,$//'
  destination_variable: dirs
  stdout_format: text

+call_dig:
  for_each>:
    dir: "${dirs.split(',')}"
  _parallel: true
  _do:
    +loop_dig:
      call>: ${dir}/unload.dig
cat */*.dig
+tasks1:
  echo>: foo1/unload.dig
+tasks1:
  echo>: foo2/unload.dig
+tasks1:
  echo>: foo3/unload.dig
+tasks1:
  echo>: hoge/unload.dig
digdag run -a test.dig
2023-09-22 00:09:24 +0900: Digdag v0.10.5
2023-09-22 00:09:24 +0900 [WARN] (main): Reusing the last session time 2023-09-21T00:00:00+00:00.
2023-09-22 00:09:24 +0900 [INFO] (main): Using session /private/tmp/hoge/.digdag/status/20230921T000000+0000.
2023-09-22 00:09:24 +0900 [INFO] (main): Starting a new session project id=1 workflow name=test session_time=2023-09-21T00:00:00+00:00
2023-09-22 00:09:25 +0900 [INFO] (0018@[0:default:1:1]+test+find_dirs): sh_result>: find . -maxdepth 1 -type d -exec basename {} \; | grep -v "^.$" | grep -E '^\[' | sort -u | tr '\n' ',' | sed 's/,$//'

2023-09-22 00:09:25 +0900 [INFO] (0018@[0:default:1:1]+test+call_dig): for_each>: {dir=["[foo1]","[foo2]","[foo3]"]}
2023-09-22 00:09:25 +0900 [INFO] (0018@[0:default:1:1]+test+call_dig^sub+for-0=dir=0=%5Bfoo1%5D+loop_dig): call>: [foo1]/unload.dig
2023-09-22 00:09:25 +0900 [INFO] (0020@[0:default:1:1]+test+call_dig^sub+for-0=dir=1=%5Bfoo2%5D+loop_dig): call>: [foo2]/unload.dig
2023-09-22 00:09:25 +0900 [INFO] (0021@[0:default:1:1]+test+call_dig^sub+for-0=dir=2=%5Bfoo3%5D+loop_dig): call>: [foo3]/unload.dig
2023-09-22 00:09:25 +0900 [INFO] (0018@[0:default:1:1]+test+call_dig^sub+for-0=dir=1=%5Bfoo2%5D+loop_dig^sub+tasks1): echo>: foo2/unload.dig
foo2/unload.dig
2023-09-22 00:09:25 +0900 [INFO] (0018@[0:default:1:1]+test+call_dig^sub+for-0=dir=2=%5Bfoo3%5D+loop_dig^sub+tasks1): echo>: foo3/unload.dig
foo3/unload.dig
2023-09-22 00:09:25 +0900 [INFO] (0018@[0:default:1:1]+test+call_dig^sub+for-0=dir=0=%5Bfoo1%5D+loop_dig^sub+tasks1): echo>: foo1/unload.dig
foo1/unload.dig
Success. Task state is saved at /private/tmp/hoge/.digdag/status/20230921T000000+0000 directory.
  * Use --session <daily | hourly | "yyyy-MM-dd[ HH:mm:ss]"> to not reuse the last session time.
  * Use --rerun, --start +NAME, or --goal +NAME argument to rerun skipped tasks.
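
To separate the shell pipeline from the plugin, the same command can also be run repeatedly outside Digdag. A minimal sketch, assuming a scratch directory with the layout listed above; `run_pipeline` and the iteration count of 20 are illustrative choices. If the output never varies here, the pipeline itself is less likely to be the source of the stray log lines.

```shell
#!/bin/sh
# Recreate the test layout in a scratch directory (names taken from the listing above).
workdir=$(mktemp -d)
for d in '[foo1]' '[foo2]' '[foo3]' hoge; do
  mkdir "$workdir/$d"
  : > "$workdir/$d/unload.dig"
done

# Same pipeline as the +find_dirs task, run from the scratch directory.
run_pipeline() {
  ( cd "$workdir" &&
    find . -maxdepth 1 -type d -exec basename {} \; | grep -v "^.$" | grep -E '^\[' | sort -u | tr '\n' ',' | sed 's/,$//' )
}

# Run it repeatedly and count any result that differs from the first one.
expected=$(run_pipeline)
mismatches=0
i=0
while [ "$i" -lt 20 ]; do
  [ "$(run_pipeline)" = "$expected" ] || mismatches=$((mismatches + 1))
  i=$((i + 1))
done
echo "first result: $expected"
echo "mismatches: $mismatches"
```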

@ishizuka-mihoko
Author

ishizuka-mihoko commented Sep 22, 2023

Hello, @hiroyuki-sato
Thank you for your comment. Here is my environment:
digdag: 0.10.4
macOS: 12.6

> I tried to reproduce the problem, but it works as expected in my environment.

In my environment too, it generally works well, but occasionally, I observe behavior where it extracts incorrect information.

> The sh_result> operator is not part of the Digdag project. It is a third-party plugin.

The issue may be in the plugin itself. Depending on what I find, I may contact the plugin's maintainers for help.
