-
-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Build cache should be reusable between git checkouts #673
Comments
Happy to help with a PR on this :-) |
Another option to workaround this is to use a git hook like Danny Lin's Or there are simpler options just for mtime, like
The downsides of those are that it can be slow, and its extra complexity for users to get caching between checkouts working. The question then is should this be something shadow-cljs fixes or something users fix with their own solutions like the above ? For example shadow-cljs might need to do hashing of all the files to work out what has modified instead of using mtime which might have performance implications for shadow-cljs itself ? |
I think the best way to handle this would be to md5/sha1 actual files only, so it keeps the old logic for |
jar/zip files should contain crc32 checksums of all files in zip header. So potentially it could be pretty fast to check that. Just my $.02 |
Interesting. I don't know much about crc32 but does the same content always produce the same checksum? So if you have the same file but in |
Yes, AFAIK. The only problem could be collisions since the checksum is only 32bits. Meaning that you have two files with different content but giving you the same CRC and you decide to use cached version. But I think the likelihood is very slim because we are dealing here with text files and nobody is going to attack this system with collisions. Also you would be comparing multiple CRCs (of a list of files), so under normal circumstances the change is even lower when touching multiple files. I would go through list of files in the zip header and hash all filenames and CRC fileds into one md5/sha1 and use that as content hash (maybe also include other userful fields from the header like file size?). Filenames should be included because you want to cover cases when people rename files without changing their content, I guess. |
It should be fine as long as the CRC is unique enough that it doesn't clash with similar content. It is never used as a lookup value and only used in combination with the same file. So the only problem would be if two "similar" files result in the same CRC. Dunno how likely it is to get the same CRC with small changes like changing I can just md5/sha1 the whole jar file as a whole too. It is fine to invalidate everything in that file if that hash changes, no need to test individual files. |
I released I only tested this briefly though so would be nice if someone could confirm that actually does what its supposed to in actual CI environments. |
I've tested this and it works with GitHub Actions for two internal (private repo) apps. Reduced compile times from ~46s to ~8s which is great 👍 😄 Thanks @thheller ! However I cannot get it to work for |
@superstructor if I read this right this uses the git sha as the cache key? Doesn't that invalidate the cache for each commit? |
Partially correct but a key point is the addition of a Using a git sha as the cache key ensures that the latest cache is always uploaded with GitHub Actions. One could probably use something that changes less often than the git sha such as a hash of all source files, compiler config, package.json etc to decide if to upload the cache from the current build. It is just that so many things could influence the state of the cache I opted for always uploading it regardless. Then using I've confirmed .shadow-cljs is being correctly uploaded and downloaded on builds. The same configuration works on a couple of private projects significantly reducing build times. Just on the re-frame-10x example it doesn't speed up builds for some reason. I'll need to investigate more to find out if this difference is due to something I am doing or within shadow-cljs itself. |
It is probably enough to hash everything that affects dependencies only. The cache otherwise will then already check the sha1 of everything else so uploading a cache for each commit seems overkill. Should be fine with just |
Steps to reproduce:
The text was updated successfully, but these errors were encountered: