CLJS output is undeterministic #958

abbec · 2021-11-17T07:37:24Z

Hi,

When building with any optimization level of the Closure Compiler (simple or advanced) and with the cache turned off, the output differs on every build.

mv out/index.js out/index-old.js && npx shadow-cljs release function && diff -u --color out/index.js out/index-old.js

results in diffs like this

-cljs.core.async.partition_by.cljs$core$IFn$_invoke$arity$3=function(a,b,c){var d=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(c),e=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(1);cljs.core.async.impl.dispatch.run(function(){var f=function(){var g=function(h){var m=h[1];if(7===m)return h[2]=h[2],h[1]=3,new cljs.core.Keyword(null,"recur","recur",-437573268);if(1===m){var l=[];m=new cljs.core.Keyword("cljs.core.async","nothing","cljs.core.async/nothing",-69252123);h[7]=l;h[8]=m;h[2]=null;
+cljs.core.async.partition_by.cljs$core$IFn$_invoke$arity$3=function(a,b,c){var d=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(c),e=cljs.core.async.chan.cljs$core$IFn$_invoke$arity$1(1);cljs.core.async.impl.dispatch.run(function(){var f=function(){var g=function(h){var m=h[1];if(7===m)return h[2]=h[2],h[1]=3,new cljs.core.Keyword(null,"recur","recur",-437573268);if(1===m){var l=[];m=new cljs.core.Keyword("cljs.core.async","nothing","cljs.core.async/nothing",-69252123);h[7]=m;h[8]=l;h[2]=null;

where we can see that the local variables l and m are switched at the end of the line.

This is a problem for us since we use Nix to build this output and want the same input to produce identical output to make it properly cacheable.

When turning off optimizations, the output is deterministic.

Saw this as well. Does shadow-cljs need to send a seed somewhere to the closure compiler?

abbec · 2021-11-17T07:42:23Z

I can add that compiling with the whitespace-only optimization level also generates non-deterministic code...

abbec · 2021-11-17T07:51:45Z

Actually, scratch that, it seems that it is the cljs JS generation that is not stable. Running

 npx shadow-cljs compile function && mv .shadow-cljs/builds/function/dev/out/cljs-runtime/main.js .shadow-cljs/builds/function/dev/out/cljs-runtime/main-old.js && npx shadow-cljs compile function && diff -u --color ./.shadow-cljs/builds/function/dev/out/cljs-runtime/main.js ./.shadow-cljs/builds/function/dev/out/cljs-runtime/main-old.js

generates diffs like

-return cljs.core.async.impl.ioc_helpers.run_state_machine_wrapped(state__9471__auto__);
+return cljs.core.async.impl.ioc_helpers.run_state_machine_wrapped(state__9316__auto__);

thheller · 2021-11-17T08:11:39Z

Deterministic output is not supported. Neither ClojureScript nor the Closure Compiler really supports it, and it would require substantial changes in both to get it. Not something shadow-cljs can address in any way on its own.

thheller · 2021-11-17T08:19:23Z

To further clarify: The compiler uses gensym-based symbols heavily (e.g. state# in the cljs macro behind the example above). gensym uses an incrementing integer to make the symbol unique. This means the number you get is kinda non-deterministic, as parallel compilation with many threads can affect it, as can the overall compilation order. So adding a new namespace or removing one would affect that sequence.
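As a rough illustration of why the numbers shift, here is a toy model of a gensym-style counter. It is only a sketch and not the ClojureScript implementation: makeGensym, compile, and the app.main / app.util namespaces are hypothetical names made up for the example. The only thing it assumes from the explanation above is a single shared integer that increments on every generated symbol.

```javascript
// Toy model of a gensym-style counter; NOT the actual ClojureScript code.
function makeGensym() {
  let counter = 0; // one shared, ever-incrementing integer
  return (prefix) => `${prefix}__${++counter}__auto__`;
}

// "Compile" hypothetical namespaces in the given order. Each namespace
// requests a fixed number of generated symbols from the shared counter.
function compile(order, symbolsPerNs) {
  const gensym = makeGensym();
  const names = {};
  for (const ns of order) {
    names[ns] = [];
    for (let i = 0; i < symbolsPerNs[ns]; i++) {
      names[ns].push(gensym("state"));
    }
  }
  return names;
}

const symbolsPerNs = { "app.main": 1, "app.util": 2 };

// Same sources, different compilation order.
const runA = compile(["app.util", "app.main"], symbolsPerNs);
const runB = compile(["app.main", "app.util"], symbolsPerNs);

console.log(runA["app.main"][0]); // state__3__auto__
console.log(runB["app.main"][0]); // state__1__auto__
```

The source of app.main is identical in both runs, yet its generated symbol differs because another namespace consumed counter values first. With parallel compilation the order is effectively decided by thread scheduling.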

The Closure Compiler does have a mechanism to store previously shortened names so they get deterministic names back on further compiles. By not using the cache you are also not using that feature. But even with that, results are pretty much entirely random. :advanced moves a lot of code around, and just using a function differently or in different places may affect output chunks and as such any "signature".

Long story short: Deterministic output is not supported.

The only thing that should decide when to use a cache is the inputs (i.e. use your package-lock.json and shadow-cljs.edn or so). Not the output.

abbec · 2021-11-17T08:39:01Z

Thanks for the explanation!

Unfortunately I do not decide how Nix does caching, it essentially requires output to be the same if input is the same. However it is good to know that this is expected behavior, then we can work around it :)

abbec · 2021-11-17T08:40:41Z

It also creates problems with systems like Terraform, which use a hash of the output to decide whether to replace an uploaded archive.

thheller · 2021-11-17T08:45:58Z

Maybe I misunderstand something but how does Nix use any cache if it has to compile to know if it can use a cache? That kinda sounds like the opposite of any other caching mechanism I ever heard of or used? I mean isn't the point of a cache to skip the expensive part (ie. the compilation)?

abbec · 2021-11-17T09:05:31Z

Actually, deterministic output is not a strict requirement for getting caching (unless using the new CAS derivations). CAS derivations enable deduplication, early cutoff in build systems, and unprivileged closure copying, so it is more of a nice-to-have. We will work around the Terraform issue in some other way.

thheller · 2021-11-17T09:28:40Z

Sorry, I don't know what any of that means.

All caching shadow-cljs does is based on fairly strict checking of the inputs. If they are all equal, no compilation is done, assuming the .shadow-cljs/builds cache exists to verify all this. The output after :advanced is still not guaranteed to be 100% deterministic but should be close to it. If there are changes anywhere, the cache is invalidated for those sources and in turn the entire output is invalidated, since :advanced is a whole-program build. So for any kind of CI build system I recommend keeping the .shadow-cljs/builds cache dir alive between builds.

shadow-cljs also offers a couple of options for "fingerprinting" outputs, but I have no clue how this plays into Terraform since I have not used that either. You can sort of create a shortcut and skip a CLJS build if none of your "inputs" change (e.g. build config, package-lock.json, shadow-cljs.edn, *.cljs). Unless of course you are using side-effecting macros, in which case all bets are off, since they can do anything and there is no way for shadow-cljs to keep track of it.
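A minimal sketch of such an input-based shortcut, written as a small Node script. This is not a shadow-cljs feature: fingerprintInputs, needsRebuild, and the stamp file are hypothetical names for the example, and the list of input paths is something you would have to assemble yourself. It only uses Node's built-in crypto and fs modules.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// Hash the path and contents of every input file. Paths are sorted first so
// the result does not depend on the order in which they were listed.
function fingerprintInputs(paths) {
  const hash = crypto.createHash("sha256");
  for (const p of [...paths].sort()) {
    hash.update(p);
    hash.update("\0"); // separator between path and contents
    hash.update(fs.readFileSync(p));
    hash.update("\0"); // separator between files
  }
  return hash.digest("hex");
}

// Compare the current fingerprint with the one recorded by the previous
// build. A missing stamp file always means "rebuild".
function needsRebuild(paths, stampFile) {
  const current = fingerprintInputs(paths);
  const previous = fs.existsSync(stampFile)
    ? fs.readFileSync(stampFile, "utf8").trim()
    : null;
  return { current, rebuild: current !== previous };
}
```

A build wrapper could call needsRebuild with the input files, run npx shadow-cljs release function only when rebuild is true, and write current to the stamp file after a successful build. The same hash could also serve as the change marker for a deploy tool in place of a hash of the output. As noted above, this cannot see what side-effecting macros read, so it is only safe when the listed files really are all of the inputs.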

abbec · 2021-11-22T07:16:38Z

Thanks a lot for all the information! This is not a huge issue and there are a lot of ways we can work around it, but it is a good point about the fingerprinting options! Thanks again!

abbec changed the title ~~Closure compiler output is undeterministic~~ CLJS output is undeterministic Nov 17, 2021

thheller closed this as completed Nov 17, 2021
