Tasks should run dependsOn before hashing inputs #8051

Open
1 task done
JavaScriptBach opened this issue Apr 26, 2024 · 4 comments
Labels
kind: bug, needs: team input, owned-by: turborepo

Comments

@JavaScriptBach

Verify canary release

  • I verified that the issue exists in the latest Turborepo canary release.

Link to code that reproduces this issue

https://github.com/JavaScriptBach/turbo-caching-bug

What package manager are you using / does the bug impact?

Yarn v2/v3/v4 (node_modules linker only)

What operating system are you using?

Linux

Which canary version will you have in your reproduction?

1.13.3-canary.4

Describe the Bug

If I have a task A whose input is a generated file produced by another task B, I have to run task A through Turbo twice before it is cached.

What I think is happening

I think it's because on the first invocation:

  1. Turbo hashes the inputs of task A, but the generated file doesn't exist yet.
  2. Turbo runs the dependency (task B), which produces the generated file.
  3. Turbo runs the original task (task A) and stores the result under the now-stale hash.

On second invocation:

  1. Turbo sees that the generated file now exists, so the hash differs from the one computed on the first run. The lookup misses and Turbo runs the original task again.
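
The ordering suspected above can be illustrated with a toy model. This is only a sketch and not Turborepo's actual implementation: the in-memory "disk", the `fnv1a` stand-in hash, the `test.js` source file, and all function names are hypothetical. Only `my-codegen.txt` comes from the linked repro.

```typescript
// Toy model of the suspected ordering; NOT Turborepo's real code.
type Files = Map<string, string>; // path -> contents (hypothetical in-memory "disk")

// Small FNV-1a string hash, standing in for a real content hash.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Hash only the declared inputs that exist on disk at this moment.
function hashInputs(files: Files, inputs: string[]): string {
  const present = inputs
    .filter((path) => files.has(path))
    .sort()
    .map((path) => `${path}=${files.get(path)}`);
  return fnv1a(present.join("\n"));
}

// Task B (codegen): writes the generated file.
function runCodegen(files: Files): void {
  files.set("my-codegen.txt", "generated");
}

// Declared inputs of task A; "test.js" is a made-up source file.
const testInputs = ["test.js", "my-codegen.txt"];

// Suspected current order: hash task A's inputs, then run its dependency.
function hashBeforeDeps(files: Files): string {
  const key = hashInputs(files, testInputs);
  runCodegen(files);
  return key;
}

// Proposed order: run the dependency first, then hash task A's inputs.
function hashAfterDeps(files: Files): string {
  runCodegen(files);
  return hashInputs(files, testInputs);
}

function freshCheckout(): Files {
  return new Map([["test.js", "source"]]);
}

const diskA = freshCheckout();
const first = hashBeforeDeps(diskA);  // generated file missing when hashed
const second = hashBeforeDeps(diskA); // generated file present when hashed
console.log("hash before deps, keys match:", first === second);

const diskB = freshCheckout();
const firstB = hashAfterDeps(diskB);
const secondB = hashAfterDeps(diskB);
console.log("hash after deps, keys match:", firstB === secondB);
```

In this model the first ordering yields a different cache key on the second run, because the set of existing inputs changed between the two hash computations. The second ordering yields the same key on both runs.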

Why I think it's a bug

To get correct caching behavior, I currently have to exclude all generated files from the inputs. This is unintuitive because the generated files are conceptually inputs to my task. Furthermore, I've already told Turbo that the task depends on my codegen task.

It would be nice for Turbo to handle this, perhaps by running all dependsOn tasks before hashing the inputs to the original task?

Expected Behavior

Everything is cached after running Turbo once.

To Reproduce

See the linked repo.

Additional context

No response

JavaScriptBach added the kind: bug, needs: triage, and owned-by: turborepo labels on Apr 26, 2024
@peplin

peplin commented Apr 26, 2024

Here's an example of 3 sequential turbo runs from the linked repository. You can see how it takes 3 runs to get to FULL TURBO:

$ turbo my-test --summarize
• Running my-test
• Remote caching disabled
my-codegen: cache miss, executing fb830814bce3d882
my-codegen:
my-test: cache miss, executing 2ef6da69cc2b018d
my-test:

  Tasks:    2 successful, 2 total
 Cached:    0 cached, 2 total
   Time:    373ms
Summary:    /Users/peplin/dev/turbo-caching-bug/.turbo/runs/2fejXqPcSUhLBGZsQgZ3Flk77ml.json


$ node_modules/.bin/turbo my-test --summarize
• Running my-test
• Remote caching disabled
my-codegen: cache hit, replaying logs fb830814bce3d882
my-codegen:
my-test: cache miss, executing fcb34ccdb1cf4377
my-test:

  Tasks:    2 successful, 2 total
 Cached:    1 cached, 2 total
   Time:    242ms
Summary:    /Users/peplin/dev/turbo-caching-bug/.turbo/runs/2fejY73vyb4eLqszBlFWviHymzE.json


$ node_modules/.bin/turbo my-test --summarize
• Running my-test
• Remote caching disabled
my-codegen: cache hit (outputs already on disk), replaying logs fb830814bce3d882
my-codegen:
my-test: cache hit, replaying logs fcb34ccdb1cf4377
my-test:

  Tasks:    2 successful, 2 total
 Cached:    2 cached, 2 total
   Time:    73ms >>> FULL TURBO
Summary:    /Users/peplin/dev/turbo-caching-bug/.turbo/runs/2fejYAIPKlkQD5auUDsWA7FLt2M.json

Here are the 3 summary files:

1.json
2.json
3.json

1.json does not include my-codegen.txt in its inputs, so the hash for my-test differs from the one in run 2 (2ef6da69cc2b018d vs. fcb34ccdb1cf4377). I would expect the first run to be a cache miss, but with a hash that matches the second run.
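
A comparison like this can be scripted. The sketch below assumes each task entry in a summary file carries a task identifier, a hash, and a map from input path to file hash. That shape is an assumption here, so check it against your own `.turbo/runs/*.json` files. The `TaskSummary` type and `diffInputs` helper are hypothetical, and the file hashes and `package.json` entry in the sample data are placeholders. Only the two task hashes and `my-codegen.txt` come from the runs above.

```typescript
// Assumed (unverified) shape of one task entry in a run summary.
interface TaskSummary {
  taskId: string;
  hash: string;
  inputs: Record<string, string>; // input path -> file hash
}

interface InputDiff {
  added: string[];   // inputs present only in the later run
  removed: string[]; // inputs present only in the earlier run
  changed: string[]; // inputs present in both runs with different hashes
}

// Compare the input maps of the same task across two runs.
function diffInputs(before: TaskSummary, after: TaskSummary): InputDiff {
  const beforePaths = Object.keys(before.inputs);
  const afterPaths = Object.keys(after.inputs);
  return {
    added: afterPaths.filter((p) => !(p in before.inputs)).sort(),
    removed: beforePaths.filter((p) => !(p in after.inputs)).sort(),
    changed: beforePaths
      .filter((p) => p in after.inputs && before.inputs[p] !== after.inputs[p])
      .sort(),
  };
}

// Sample data: the file hashes are placeholders, not values from the real summaries.
const run1: TaskSummary = {
  taskId: "my-test",
  hash: "2ef6da69cc2b018d",
  inputs: { "package.json": "placeholder-a" },
};
const run2: TaskSummary = {
  taskId: "my-test",
  hash: "fcb34ccdb1cf4377",
  inputs: { "package.json": "placeholder-a", "my-codegen.txt": "placeholder-b" },
};

const inputDiff = diffInputs(run1, run2);
console.log(inputDiff.added); // [ 'my-codegen.txt' ]
```

With real summaries, the two `TaskSummary` values would be read from the JSON files instead of written inline.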

@weyert
Contributor

weyert commented Apr 28, 2024

Yeah, I think I have a similar problem, but in my case the codegen task is in the same package, e.g. using:

{
  "$schema": "https://turbo.build/schema.json",
  "extends": ["//"],
  "pipeline": {
    "generate": {
      "outputMode": "new-only",
      "inputs": [
        "src/**/*.yml"
      ],
      "outputs": ["src/**/*", "!src/**/*.yml"],
      "cache": true
    },
    "build": {
      "outputMode": "new-only",
      "inputs": [
        "!src/**/*.yml"
      ],
      "outputs": ["lib/**"],
      "dependsOn": ["generate"],
      "cache": true
    }
  }
}

@NicholasLYang NicholasLYang added needs: team input Filter for core team meetings and removed needs: triage New issues get this label. Remove it after triage labels Apr 30, 2024
@mattico

mattico commented Aug 29, 2024

I also ran into this issue and made a repro of my own before I found yours: https://github.com/mattico/turborepo-repro

I can verify this issue still exists with version 2.0.15-canary.4.

@Leksat

Leksat commented Sep 6, 2024

Just ran into this issue. Confirming that 2.1.2-canary.0 is affected.


6 participants