CLJS output is undeterministic #958

Closed
abbec opened this issue Nov 17, 2021 · 10 comments

@abbec

abbec commented Nov 17, 2021

Hi,

When building with any Closure Compiler optimization level (simple or advanced) and with the cache turned off, the output differs on every build.

mv out/index.js out/index-old.js && npx shadow-cljs release function && diff -u --color out/index.js out/index-old.js

results in diffs like this

-cljs.core.async.partition_by.cljs$core$IFn$_invoke$arity$3=function(a,b,c){var d=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(c),e=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(1);cljs.core.async.impl.dispatch.run(function(){var f=function(){var g=function(h){var m=h[1];if(7===m)return h[2]=h[2],h[1]=3,new cljs.core.Keyword(null,"recur","recur",-437573268);if(1===m){var l=[];m=new cljs.core.Keyword("cljs.core.async","nothing","cljs.core.async/nothing",-69252123);h[7]=l;h[8]=m;h[2]=null;
+cljs.core.async.partition_by.cljs$core$IFn$_invoke$arity$3=function(a,b,c){var d=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(c),e=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(1);cljs.core.async.impl.dispatch.run(function(){var f=function(){var g=function(h){var m=h[1];if(7===m)return h[2]=h[2],h[1]=3,new cljs.core.Keyword(null,"recur","recur",-437573268);if(1===m){var l=[];m=new cljs.core.Keyword("cljs.core.async","nothing","cljs.core.async/nothing",-69252123);h[7]=m;h[8]=l;h[2]=null;

where we can see that the local variables l and m are switched at the end of the line.

This is a problem for us since we use Nix to build this output and want the same input to produce identical output to make it properly cacheable.

When turning off optimizations, the output is deterministic.

Saw this as well. Does shadow-cljs need to send a seed somewhere to the closure compiler?

@abbec
Author

abbec commented Nov 17, 2021

I can add that compile with the whitespace-only optimization level also generates nondeterministic code...

@abbec
Author

abbec commented Nov 17, 2021

Actually, scratch that, it seems that it is the cljs JS generation that is not stable. Running

 npx shadow-cljs compile function && mv .shadow-cljs/builds/function/dev/out/cljs-runtime/main.js .shadow-cljs/builds/function/dev/out/cljs-runtime/main-old.js && npx shadow-cljs compile function && diff -u --color ./.shadow-cljs/builds/function/dev/out/cljs-runtime/main.js ./.shadow-cljs/builds/function/dev/out/cljs-runtime/main-old.js

generates diffs like

-return cljs.core.async.impl.ioc_helpers.run_state_machine_wrapped(state__9471__auto__);
+return cljs.core.async.impl.ioc_helpers.run_state_machine_wrapped(state__9316__auto__);

abbec changed the title from "Closure compiler output is undeterministic" to "CLJS output is undeterministic" on Nov 17, 2021

@thheller
Owner

Deterministic output is not supported. Neither ClojureScript nor the Closure Compiler really supports it, and it would require substantial changes in both to get it. Not something shadow-cljs can address in any way on its own.

@thheller
Owner

To further clarify: The compiler uses gensym-based symbols heavily (e.g. state# in the cljs macro example above). gensym uses an incrementing integer to make the symbol unique. This means the number you get is kinda non-deterministic, since parallel compilation with many threads can affect it, as can overall compilation order. So adding a new namespace or removing one would affect that sequence.
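
A rough illustration of the counter effect, as a JavaScript sketch. The names makeGensym and compileAll are made up for this example and are not how ClojureScript implements gensym; the only point is that one shared counter ties each generated name to compilation order.

```javascript
// Hypothetical stand-in for gensym: one counter shared by the whole compile.
function makeGensym() {
  let counter = 0;
  return (prefix) => `${prefix}__${++counter}__auto__`;
}

// "Compile" namespaces in the given order; each asks for some generated symbols.
function compileAll(namespaces) {
  const gensym = makeGensym();
  const output = {};
  for (const ns of namespaces) {
    output[ns.name] = Array.from({ length: ns.symbols }, () => gensym("state"));
  }
  return output;
}

const main = { name: "main", symbols: 1 };
const util = { name: "util", symbols: 2 };

// Same sources, different compilation order.
const first = compileAll([main, util]);
const second = compileAll([util, main]);

console.log(first.main[0]);  // state__1__auto__
console.log(second.main[0]); // state__3__auto__
```

Nothing in main changed between the two runs, yet its generated symbol did, which is the same kind of diff as state__9471__auto__ vs state__9316__auto__ above.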

The Closure Compiler does have a mechanism to store previously shortened names so they get deterministic names back on further compiles. By not using the cache you are also not using that feature. But even with that, results are pretty much entirely random. :advanced moves a lot of code around, and just using a function differently or in different places may affect output chunks and as such any "signature".
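
To make the stored-names idea concrete, here is a minimal JavaScript sketch. It is not the Closure Compiler's actual renaming logic; shortName and renameWithMap are hypothetical helpers that only show why feeding the previous build's name map back in keeps existing names stable.

```javascript
// Turn an index into a short name: 0 -> "a", 1 -> "b", ..., 26 -> "aa".
function shortName(index) {
  let name = "";
  let n = index;
  do {
    name = String.fromCharCode(97 + (n % 26)) + name;
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return name;
}

// Assign short names, reusing entries from a previous build's map when present.
function renameWithMap(longNames, previousMap = new Map()) {
  const map = new Map();
  const used = new Set(previousMap.values());
  let next = 0;
  for (const longName of longNames) {
    if (previousMap.has(longName)) {
      map.set(longName, previousMap.get(longName));
      continue;
    }
    let candidate;
    do {
      candidate = shortName(next++);
    } while (used.has(candidate));
    used.add(candidate);
    map.set(longName, candidate);
  }
  return map;
}

const build1 = renameWithMap(["app.core/init", "app.core/render"]);

// In the next build a new function shows up before the existing ones.
const names2 = ["app.core/boot", "app.core/init", "app.core/render"];
const withoutMap = renameWithMap(names2);
const withMap = renameWithMap(names2, build1);

console.log(withoutMap.get("app.core/init")); // b (shifted by the new function)
console.log(withMap.get("app.core/init"));    // a (kept from build1)
```

This only stabilizes names. It does nothing about code being moved or inlined differently, which is why the output can still change.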

Long story short: Deterministic output is not supported.

The only thing that should decide when to use a cache is the inputs (i.e. your package-lock.json and shadow-cljs.edn or so), not the output.

@abbec
Author

abbec commented Nov 17, 2021

Thanks for the explanation!

Unfortunately I do not decide how Nix does caching; it essentially requires the output to be the same if the input is the same. However, it is good to know that this is expected behavior, so we can work around it :)

@abbec
Author

abbec commented Nov 17, 2021

It also creates problems with systems like Terraform, which use a hash of the output to know whether they are supposed to replace an uploaded archive or not.

@thheller
Owner

Maybe I misunderstand something but how does Nix use any cache if it has to compile to know if it can use a cache? That kinda sounds like the opposite of any other caching mechanism I ever heard of or used? I mean isn't the point of a cache to skip the expensive part (ie. the compilation)?

@abbec
Author

abbec commented Nov 17, 2021

Actually, deterministic output is not a strict requirement for getting caching (unless using the new CAS derivations). CAS derivations enable deduplication, early cutoff in build systems, and unprivileged closure copying, so it is more of a nice-to-have. We will work around the Terraform issue in some other way.

@thheller
Owner

Sorry, I don't know what any of that means.

All caching shadow-cljs does is based on fairly strict checking of the inputs. If they are all equal, no compilation is done, assuming the .shadow-cljs/builds cache exists to verify all this. The output after :advanced is still not guaranteed to be 100% deterministic but should be close to it. If there are changes anywhere, the cache is invalidated for those sources and in turn the entire output is invalidated, since :advanced is a whole-program build. So for any kind of CI build system I recommend keeping the .shadow-cljs/builds cache dir alive between builds.

shadow-cljs also offers a couple of options for "fingerprinting" outputs, but I have no clue how this plays into Terraform since I have not used that either. You can sort of create a shortcut and skip a CLJS build if none of your "inputs" change (e.g. build config, package-lock.json, shadow-cljs.edn, *.cljs). Unless of course you are using side-effecting macros; then all bets are off, since they can do anything and there is no way for shadow-cljs to keep track of it.
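
The input-based shortcut could look roughly like the Node sketch below. It is not a shadow-cljs feature: fingerprintInputs, needsBuild, recordBuild and the stamp file name are all assumptions made for illustration, and the demo uses throwaway files in a temp directory rather than a real project.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hash the paths and contents of the input files in a fixed (sorted) order.
function fingerprintInputs(files) {
  const hash = crypto.createHash("sha256");
  for (const file of [...files].sort()) {
    hash.update(file);
    hash.update("\0");
    hash.update(fs.readFileSync(file));
    hash.update("\0");
  }
  return hash.digest("hex");
}

// True when no fingerprint is stored yet or the inputs changed since then.
function needsBuild(files, stampFile) {
  if (!fs.existsSync(stampFile)) return true;
  return fs.readFileSync(stampFile, "utf8") !== fingerprintInputs(files);
}

// Call this only after a successful build.
function recordBuild(files, stampFile) {
  fs.writeFileSync(stampFile, fingerprintInputs(files));
}

// Demo with stand-in inputs (hypothetical file contents).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "inputs-"));
const edn = path.join(dir, "shadow-cljs.edn");
const src = path.join(dir, "main.cljs");
const stamp = path.join(dir, ".inputs.sha256");
fs.writeFileSync(edn, "{:builds {}}");
fs.writeFileSync(src, "(ns main)");

const firstRun = needsBuild([edn, src], stamp);  // no stamp yet
recordBuild([edn, src], stamp);
const secondRun = needsBuild([edn, src], stamp); // inputs unchanged
fs.writeFileSync(src, "(ns main) (def x 1)");
const thirdRun = needsBuild([edn, src], stamp);  // a source changed

console.log(firstRun, secondRun, thirdRun); // true false true
```

A tool that wants a stable hash (such as the Terraform case mentioned earlier) could key on this input fingerprint instead of hashing the compiled output. As noted above, side-effecting macros would not be caught by a check like this.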

@abbec
Author

abbec commented Nov 22, 2021

Thanks a lot for all the information! This is not a huge issue and there are a lot of ways we can work around it, but it is a good point about the fingerprinting options! Thanks again!
