-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cached files differ from original command execution's output files #7410
Comments
I've been digging at this trying to replicate the issue without using a next build and have come up with nothing helpful, but in the process I found something very interesting... Simply manually unarchiving the files from the turbo cache The command I'm using to unarchive the files is: This manually unarchived version also works perfectly. No server errors like when I try to run a server based on the files restored from turbo's cache. So this tells me that the issue is probably one or both of:
Just quickly looking at the code I feel like
The idea being that by copying the files and following symlinks, the resulting build would be devoid of both hard links (due to copying the files) and symlinks (due to the The results here at least partially confirm my hunch. Before making this change in my big monorepo project the diff showed both:
After making this change to copy the files and dereference links, all the files which were present in the original but not in the cached version were fixed and the diff now only shows loads of files which are only present in the cached version. I was hoping that this would at least get things working again (and could maybe be a temporary workaround), however, doing the file copying seems to screw up the next build. Trying to run the server using the copied/dereferenced version using either the original files or the turbo cached version errors out with:
So to summarize, this issue is almost certainly due to how Turbo unarchives its caches and handles the output files. I'm still trying to find the precise file/directory conditions which cause these issues. |
Just went through tested turbo versions to see if there was a point in the past where this issue didn't exist and then where it appeared. Here's a timeline of what I found:
I looked through the 1.11.0 changes and this one stuck out to me: #6662 I tried downgrading my monorepo project to |
Thanks for the exceptionally detailed write-up! I am going to work through some of this and see if I can identify a fix. |
Hey, had a look at the code directly and diff you identified and tried to write some tests but can't trigger the behaviour. Your repro-repo is private I think (I am getting a 404). Mind flipping it to public? |
D'oh... yeah that was a mistake. It's public now. |
Any update? Have you been able to reproduce the issue in the repo? |
Possibly related to #6823. See if you can replicate the error message from that issue on Turbo 1.11. |
Awesome!! Is there a way I can install that PR's version locally in my big monorepo project so I can do a more thorough test? Is there a snapshot version or something I can use? |
#7628) ### Description Fixes #7410 This is just #6662, but applied to directories. See [Entry::path](https://docs.rs/tar/latest/tar/struct.Entry.html#method.path) for explanation as to why one should never rely on a header's path. ### Testing Instructions In the first commit added a failing unit test, test passes after fix. Also tested against reproduction: https://github.com/trappar/turbo-cache-missing-output-files Just need to update `reproduce-issue.sh` to use the `turbo` in this branch: ``` ... Diffing the original force-build's .next directory with the cache-hit-build's .next directory [0 olszewski@chriss-mbp] /tmp/turbo-cache-missing-output-files $ ``` Closes TURBO-2532
You can checkout that PR and I'm cutting a canary you can use in your project by just changing the |
Just ran my tests against my monorepo project and I can confirm that this fixed part of the issue - the part that showed up in my repro repo, but this doesn't actually fix the bulk of the problems. Would you like me to open a new issue? I'm honestly not sure if I'll be able to make a repro of the issues I'm seeing, but here's the diff which I assembled using a similar script to the one from my repro repo:
|
Interesting. Are those diffs a regression compared to the previous canary or are those files missing in the cache hit on both Are the files missing from the cache hit all just regular files? |
As far as I can tell those diffs are a subset of the diffs that I originally saw when testing 1.12.4. I'll downgrade to 1.12.5-canary.0 and try rerunning. Will update this comment once I have those results. I think there are two distinct bugs causing these issues. The patterns I saw seemed to be something like:
Update: Here's the diff output on 1.12.5-canary.0:
Yeesh, that's long. LMK if you want me to remove it at some point to keep this manageable. Diffing the two diff outputs shows me that it looks like all the "Only in output/cache-hit-build" lines go away with |
I was able to update dependencies in the repro repo in such a way that I think it's now reproducing the whole problem, not just that one part of it! I'm now getting the following output from the
Note that I also updated to the Sorry I didn't manage to capture that part of it before! |
No worries, thanks for updating the repro. I believe #7633 should fix the remaining diff issues. |
Thanks! Let me know when there's a new canary version I can test against. |
### Description Fix for secondary problem discovered in #7410 Again, reading the docs for [`Entry::link_name`](https://docs.rs/tar/latest/tar/struct.Entry.html#method.link_name) it is recommended against using a header as it might have an incomplete/differing link name from the actual entry. `restore_symlink` already uses this method over accessing the link name via the header so it doesn't need to be updated. ### Testing Instructions Added failing unit test in first commit. The test adds a symlink to a directory with a long path that gets restored before the target has been restored. This results in us hitting the `topologically_restore_symlinks` codepath which contained the bug. Also tested against updates to https://github.com/trappar/turbo-cache-missing-output-files which also trigger this behavior. Closes TURBO-2539
|
That seems to have done it! As of this version I'm once again able to run my server based purely on a cache-restored build without issue! 🎉 For full transparency, I am still seeing one line in the diff output:
But I'm fairly confident that issue has nothing to do with Turbo. I tried downgrading to % pwd
/Users/trappar/[redacted]/.next/standalone/node_modules/.pnpm/node_modules
% ls -al
total 0
drwxr-xr-x 10 trappar staff 320 Mar 5 17:14 .
drwxr-xr-x 204 trappar staff 6528 Mar 5 17:14 ..
drwxr-xr-x 3 trappar staff 96 Mar 5 17:14 @opentelemetry
lrwxr-xr-x 1 trappar staff 54 Mar 5 17:14 caniuse-lite -> ../caniuse-lite@1.0.30001528/node_modules/caniuse-lite
lrwxr-xr-x 1 trappar staff 114 Mar 5 17:14 next -> ../next@14.0.3_react-dom@18.3.0-canary-627b7abd6-20230911_react@18.3.0-canary-627b7abd6-20230911/node_modules/next
lrwxr-xr-x 1 trappar staff 60 Mar 5 17:14 react -> ../react@18.3.0-canary-627b7abd6-20230911/node_modules/react
lrwxr-xr-x 1 trappar staff 34 Mar 5 17:14 sharp -> ../sharp@0.31.3/node_modules/sharp
lrwxr-xr-x 1 trappar staff 51 Mar 5 17:14 supports-color -> ../supports-color@7.2.0/node_modules/supports-color
lrwxr-xr-x 1 trappar staff 42 Mar 5 17:14 uglify-js -> ../uglify-js@3.17.4/node_modules/uglify-js
lrwxr-xr-x 1 trappar staff 71 Mar 5 17:14 webpack -> ../webpack@5.89.0_@swc+core@1.3.83_esbuild@0.17.19/node_modules/webpack
% ls -al .. | grep caniuse
drwxr-xr-x 3 trappar staff 96 Mar 5 17:14 caniuse-lite@1.0.30001585 Notice that the link points at In the cache restored version the symlink still exists, so I think Turbo is working perfectly and diff is just complaining since it isn't able to find the file. Maybe this is a PNPM or Next.js bug? Either way it's not causing any issues so I'd say this is fixed! Feel free to close! |
Ah, we intentionally don't restore broken symlinks, but I think that's acceptable? If we run into a tool that depends on broken symlinks existing we can reconsider that decision. Again, thanks for the great reproduction/write up/testing of the fixes! |
Verify canary release
Link to code that reproduces this issue
https://github.com/trappar/turbo-cache-missing-output-files
What package manager are you using / does the bug impact?
pnpm
What operating system are you using?
Mac
Which canary version will you have in your reproduction?
1.12.4
Describe the Bug
When building my app with turbo, I have it configured to cache the
.next
directory. I noticed that if I build, then delete the.next
directory, then recover it from my turbo cache - the resulting files differ from the original files I had after building.Expected Behavior
The files cached by turbo cache should precisely match the original output files.
To Reproduce
The reproduction repo I linked has a
reproduce-issue.sh
script that you can run to see the issue. Since this is a minimal reproduction repo, the number of divergent files between the fresh and cached copies are minimal. In a large enterprise monorepo I work on, the number of files not being copied is massive and is causing our servers to error/crash when we try to run a build which was recalled from the turbo cache.Just for context, here's all that I'm doing in that script:
And this is producing the following list of file differences:
Additional context
The only context where I've seen this happen is a Next.js app using the "standalone" output mode. I suspect this is because the directory I'm trying to cache has long file paths and complex/internal links. It's possible that this could be distilled down even further into a reproduction that simply creates a directory structure like what Next.js creates, but this is probably the best repo since it's a fairly common application configuration.
I know this issue can be reproduced on Ubuntu and on Mac since I'm seeing these kinds of failures in CI, and I've reproduced it locally in that reproduction repo. I'm curious if this bug will happen on all OSes (windows?)
TURBO-2414
The text was updated successfully, but these errors were encountered: